Approval and Score Voting are Intrinsically Tactical
My previous post was a large-scale comparison of approaches to voting based on modeling voters and simulating elections. I ran into a specific wrinkle there that I want to comment on from a more theoretical point of view. One question I set out to explore was how much voters benefit from tactical voting — that is, filling out their ballot by anticipating how others will vote and voting with that in mind, rather than merely giving a sincere expression of their own preferences. To do that, I wanted to compare sincere versus tactical voting as a strategy in each system.
That comparison turned out to be ill-defined for some systems. It’s ultimately impossible to give a definitive answer for what it means to vote sincerely in an approval or score-based voting system. These voting systems force voters to make tactical choices because they do not even permit ballots that simply reflect a voter’s preferences in a straightforward way. One must make tactical decisions in order to vote at all.
I’ll explore this from two perspectives: the challenge of threshold-setting in approval voting, and the challenge of choosing a scale of voter satisfaction in score voting. Ultimately, I’ll make the claim that these are different ways of expressing the same fundamental problem.
Approval voting requires tactical threshold-setting.
You are voting in an election between three candidates: Alice, Bob, and Casey. You’re a big fan of Alice, and would love to see her elected. Bob is terrible: he kicks puppies after letting them poop on your lawn and not cleaning it up. Casey is alright; you’re not excited by them, but it wouldn’t be a disaster to see them elected. You arrive at the ballot box, and see this:
Vote for as many as you like:
[ ] Alice
[ ] Bob
[ ] Casey
What do you do? Clearly, you vote for Alice, and you don’t vote for Bob (that sassa frassin’ dirty no good puppy-kicker). But what about Casey? If you don’t vote for Casey, and then Casey comes in second just barely behind Bob, you’ll regret the decision. If you do vote for Casey, and then Casey edges out Alice for the win, you’ll also regret the decision.
This is an example of a tactical voting problem. But there’s something else going on, too. In most situations, we can think about tactical voting by comparing tactical voting to sincere voting. In this case, though, which choice is more sincere? There’s simply no good answer. You can vote for Casey to differentiate them from Bob, or you can not vote for Casey to differentiate them from Alice, but the ballot doesn’t let you explain that you prefer Alice over Casey and prefer Casey over Bob, so you are forced to make a tactical decision: which of those preferences should you express, and which should you keep to yourself?
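Here’s a minimal sketch of that dilemma in Python. Everything in it is an assumption made for illustration: the vote totals are invented, the function name `approval_winner` is hypothetical, and “you” are modeled as a bloc of three like-minded voters so that the outcomes never end in a tie.

```python
def approval_winner(other_totals, my_approvals, bloc_size=3):
    """Add the bloc's approvals to everyone else's totals and return the winner."""
    totals = dict(other_totals)
    for name in my_approvals:
        totals[name] += bloc_size
    # The invented totals below are chosen so that first place is never tied.
    return max(totals, key=totals.get)

# Hypothetical approval totals from all other voters in two possible elections.
bob_leads = {"Alice": 40, "Bob": 50, "Casey": 49}
casey_leads = {"Alice": 50, "Bob": 30, "Casey": 52}

for label, others in [("Bob leads", bob_leads), ("Casey leads", casey_leads)]:
    for ballot in (["Alice"], ["Alice", "Casey"]):
        print(label, ballot, "->", approval_winner(others, ballot))
```

In the first election, withholding approval from Casey hands the win to Bob. In the second, approving Casey costs Alice the win. The same ballot is right in one case and wrong in the other, and nothing about your preferences changed between them.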
One could argue that voters “sincerely” approve or disapprove of certain candidates, and an approval ballot can sincerely express this. However, this oversimplifies how voters perceive candidates. Preferences are relative: it’s rare to find a candidate so perfect that a voter couldn’t prefer someone else, or so bad that a voter couldn’t imagine anyone worse. Factoring a voter’s overall level of approval (their general optimism or pessimism about politicians, for instance) into the effectiveness of their vote would be an affront to democratic principles. Everyone’s vote should count equally, regardless of their general attitude toward politics. A ballot is an expression of relative preference, not overall sentiment. In the context of approval voting, therefore, there is no objectively sincere way to decide which candidates should receive a voter’s approval.
Limited-precision score voting requires tactical threshold-setting.
The same argument applies to score-based voting systems whenever they offer limited precision for scoring candidates. We might try to fix the example above by offering an intermediate option: 1 star means you can’t stand the candidate, 2 stars means they are alright, and 3 stars means you love them. But now enter Donna, a fourth candidate who feels a bit scummy. You’d prefer Casey to Donna, but Donna is still far better than Bob. With four candidates and only three ratings, at least two candidates must share a rating, so you’re back in the same dilemma: you cannot merely express your preferences without making a tactical decision.
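To make the counting explicit, here’s a small sketch that enumerates every 3-star ballot consistent with the preference order Alice, Casey, Donna, Bob. The function name `order_preserving_ballots` is hypothetical, and “consistent” is my assumption here: it means never rating a less-preferred candidate above a more-preferred one.

```python
from itertools import product

preference_order = ["Alice", "Casey", "Donna", "Bob"]  # best to worst
STARS = (1, 2, 3)

def order_preserving_ballots(order, levels):
    """All ballots that never rate a less-preferred candidate above a more-preferred one."""
    ballots = []
    for stars in product(levels, repeat=len(order)):
        # Ratings must be non-increasing as we move down the preference order.
        if all(a >= b for a, b in zip(stars, stars[1:])):
            ballots.append(dict(zip(order, stars)))
    return ballots

ballots = order_preserving_ballots(preference_order, STARS)
# Ballots that keep all four candidates distinct: there cannot be any.
strict = [b for b in ballots if len(set(b.values())) == len(b)]

print(len(ballots), "consistent ballots,", len(strict), "with no ties")
```

Every consistent ballot erases at least one of your preferences, and which one to erase is the tactical choice.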
Score voting requires a tactical preference scale.
As stated above, it might seem that intrinsic tactical voting only matters when there are fewer rating choices than candidates. This isn’t the case, though. Even with essentially unlimited precision, voting systems that average voters’ scores are still inherently tactical.
Your local election is being held again, with Alice, Bob, and Casey all running for the second time. (Donna decided not to run again.) This time, the ballot reads as follows:
Rate each candidate from 1 to 100.
[ ___ ] Alice
[ ___ ] Bob
[ ___ ] Casey
You love Alice, and you’re happy to assign her a rating of 100. Bob is terrible, and clearly gets a 1. But is Casey a 25? 50? 75? The election will be decided by averaging the scores for each candidate, so if you rate Casey too high, they might edge out Alice for the win, but too low and they might be edged out by Bob.
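A rough sketch of how that plays out, assuming a tiny electorate of ten voters. The score totals from the other nine voters are invented for illustration, and `score_winner` is a hypothetical helper, not code from my simulations.

```python
def score_winner(other_sums, my_scores, n_others=9):
    """Average every candidate's scores over all voters and return the top one."""
    n_voters = n_others + 1
    averages = {
        name: (other_sums[name] + my_scores[name]) / n_voters
        for name in other_sums
    }
    # The invented totals below are chosen so that first place is never tied.
    return max(averages, key=averages.get)

# Hypothetical score totals from nine other voters in two possible elections.
casey_threatens_alice = {"Alice": 560, "Bob": 300, "Casey": 620}
bob_threatens_casey = {"Alice": 300, "Bob": 660, "Casey": 600}

for casey_score in (25, 50, 75):
    mine = {"Alice": 100, "Bob": 1, "Casey": casey_score}
    print(
        casey_score,
        score_winner(casey_threatens_alice, mine),
        score_winner(bob_threatens_casey, mine),
    )
```

With these numbers, only a low score for Casey keeps Alice in front in the first election, and only a high score keeps Bob out in the second. Which score is “right” depends entirely on what everyone else does.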
It’s a little less obvious here that the decision of how to rate Casey is inherently tactical. Nevertheless, I’d argue that it is, because the scale on which to rate candidates is not well-defined.
An aside about pitch
Because “satisfaction” and “happiness” are such nebulous terms, it’s easier to explain what I mean with something more concrete. Let’s talk about the pitch of a musical note, which is also a matter of perception, but comes with a precise selection of units of measure to investigate.
- To a musician, at least in the modern western world, pitch of musical notes is often measured in steps. Each consecutive key on a piano keyboard (including the black keys) is a half-step of difference in pitch. The distance from A2 to A4, for instance, is 24 keys, or 12 steps.
- In physics, pitch is represented by frequency, and measured in Hertz: the number of oscillations per second of the sound wave that’s produced. A2 oscillates 110 times per second, while A4 oscillates 440 times per second. That’s a difference of 330 Hertz.
- Let’s consider the note C4 (also called middle C). It’s 15 keys, or 7.5 steps, above A2. It oscillates about 262 times per second, which is 152 Hertz above A2.
Suppose a musician and a physicist are asked to rate the relative pitch of A2, C4, and A4 on a scale from 1 to 100. They both assign A2 a score of 1 because it’s the lowest pitch, and A4 a score of 100 because it’s the highest pitch. But how do they score the C4? The musician might look at the number of steps of difference: 7.5 out of 12, which is a score of about 63. The physicist might look at the frequency difference: 152 out of 330, which is a score of about 47.
Why do they reach different results? They are considering pitch on different scales with different rates of growth. These aren’t the rather boring differences we see in distance, either: whether you’re measuring in inches, centimeters, or light-years, twice as far is still twice as far. But frequency grows exponentially relative to steps, so it increases much faster in higher octaves. Conversely, steps grow logarithmically with frequency, so they increase much faster at lower frequencies and then slow down. Crucially, neither of these measurements is the right one all the time; it’s a matter of choosing a perspective and carefully defining what you’re measuring.
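The arithmetic behind those two scores can be sketched in a few lines. The helper names `to_score` and `half_steps_above_a2` are mine, the mapping onto the 1 to 100 range is assumed to be linear, and I use the rounded 262 Hz figure for C4 from above. The step formula assumes standard twelve-tone equal temperament.

```python
import math

def to_score(value, low, high, lo_score=1, hi_score=100):
    """Linearly map a value from [low, high] onto the ballot's 1-100 range."""
    fraction = (value - low) / (high - low)
    return lo_score + fraction * (hi_score - lo_score)

def half_steps_above_a2(freq_hz):
    """Piano keys above A2 (110 Hz): each octave doubles the frequency and spans 12 keys."""
    return 12 * math.log2(freq_hz / 110)

# Musician: C4 is 15 keys above A2, out of the 24 keys from A2 to A4.
musician = to_score(15, 0, 24)
# Physicist: C4 is about 262 Hz, between 110 Hz and 440 Hz.
physicist = to_score(262, 110, 440)

print(round(musician), round(physicist))
print(half_steps_above_a2(262), half_steps_above_a2(440))
```

The logarithm in `half_steps_above_a2` is the whole story: the two perspectives measure the same notes, but one scale is a logarithm of the other, so their midpoints fall in different places.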
But what about voters and elections?
That kind of scaling issue, where there are different scales that change at different rates, is very common when we deal with perception and subjective experience, whether it’s the pitch of a sound, the brightness of a light… or, far more so, experiences of happiness or pain or satisfaction. These experiences don’t live on one definitive scale where we can compare relative distances or take averages. Rather, the scale itself is a matter of perspective, and the more subjective the experience, the harder it is to define that perspective.
So how do you rate Casey on your scored ballot? Maybe you pick a logarithmic scale, analogous to musical steps, and Casey receives a score of 63. Or maybe you pick an exponential scale, similar to frequency, and Casey receives a 47. Neither of these is a fundamentally sincere or insincere way to vote, because the ballot didn’t tell us which scale to measure on. Each simply reflects a point of view about what satisfaction means and what scale it’s best measured on.
But they do have tactical consequences: choosing the logarithmic scale that rates Casey as a 63 means using your ballot more to stop Bob from winning, while accepting that you’re doing less to help Alice beat Casey. Conversely, choosing the exponential scale that rates Casey as a 47 means using your ballot more to help Alice, and accepting that you’re doing less to help Casey beat Bob if Alice isn’t the winner.
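One way to see the trade-off is to look at how many points each ballot contributes to the two head-to-head margins. This is only a sketch: `pairwise_pull` is a hypothetical helper, and treating the score difference as the ballot’s “pull” on a pairwise contest is my framing of how averaged scores behave.

```python
def pairwise_pull(ballot):
    """Points this ballot adds to each head-to-head margin when scores are summed."""
    return {
        "Alice over Casey": ballot["Alice"] - ballot["Casey"],
        "Casey over Bob": ballot["Casey"] - ballot["Bob"],
    }

logarithmic = {"Alice": 100, "Bob": 1, "Casey": 63}
exponential = {"Alice": 100, "Bob": 1, "Casey": 47}

print(pairwise_pull(logarithmic))
print(pairwise_pull(exponential))
```

The two pulls always add up to the same total, 99 points. Choosing a scale only decides how that fixed amount of influence is split between the two contests.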
Once again, you’re being forced to make a choice that has no sincere answer, but definitely has tactical implications. The tactics are intrinsic to the voting system.
Tactical thresholds and scales are the same thing.
These might initially seem like two very different phenomena, but I’d argue they are two manifestations of the same thing. In the first election, when you were asked to make a choice whether to approve of Casey (grouping them with Alice) or disapprove (grouping them with Bob), one way to look at this is that you were asked whether Casey is more similar to Bob or Alice in terms of how satisfied you’d be with their election.
Notice that if you adopt the logarithmic scale, where Casey scores a 63, you’re likely to consider the most sincere answer to be grouping Casey with Alice, and therefore giving them your approval. On the other hand, if you adopt the exponential scale and rate Casey a 47, you’re likely to have a tough choice, but ultimately conclude it’s more sincere to group them with Bob and not give them your approval.
In this way, the threshold-setting problem is just a consequence of the scale-setting problem. Any threshold you choose effectively defines a scale where that threshold is the midpoint between the two extremes. The precision still matters, but only in the sense that rounding error further exaggerates the difference between the scales. That is its own separate weakness, but the fact that voting is intrinsically tactical ultimately comes from the scale-setting problem in both cases.
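As a small illustration, here is what happens if we collapse each scored ballot into an approval ballot by approving everyone above the midpoint of the scale. The midpoint rule and the name `approvals_from_scores` are assumptions of this sketch, not a standard conversion.

```python
def approvals_from_scores(scores, low=1, high=100):
    """Approve every candidate scored above the midpoint of the scale."""
    midpoint = (low + high) / 2
    return sorted(name for name, score in scores.items() if score > midpoint)

logarithmic = {"Alice": 100, "Bob": 1, "Casey": 63}
exponential = {"Alice": 100, "Bob": 1, "Casey": 47}

print(approvals_from_scores(logarithmic))
print(approvals_from_scores(exponential))
```

The same preferences produce two different approval ballots, and the only thing that changed was the choice of scale.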
This has consequences.
This originally came up, for me, because it made it difficult to say what it means to compare approval, score, and STAR voting systems in my simulations with sincere ballots. These voting systems do very well on many measures, such as maximizing satisfaction of voters and making decisions consistent with democratic principles like majority rule. However, when it comes to the goal of minimizing tactical voting, there’s a problem because non-tactical ballots simply do not exist. I attempted to approximate a “sincere” ballot by making these tactical choices arbitrarily, but this was rightly criticized as sub-optimal in many cases.
But outside the challenges of implementing my simulations, it has consequences for real elections, as well. An important goal in comparing election systems is to minimize the significance of tactical voting, since not all voters are equipped to vote tactically. But what does that mean when there’s no such thing as a non-tactical vote? For the same reason that I struggled to perform my analysis, voters who haven’t followed election polls and strategy closely may struggle to know how to vote at all.
With approval and score-based voting, voters are asked to cast ballots in a way that inherently involves tactical decisions, leaving no escape valve for sincere expression. Honestly, this can feel more like playing a complex board game than like a serious assessment of voter preferences. What implications might this have for voters’ decisions on whether to vote, or their confidence in the legitimacy of election results? I don’t have those answers, but they are questions worth considering.