Spelunky is probably my favorite PC video game of 2013 (pending the day I finally give Divekick a try). There’s a lot of good stuff going on in the gameplay – find your way down through randomly assembled platforming configurations and collect as much treasure as you can. You have a clear goal, but how to pursue that goal is ambiguous, the ways hazards can combine are treacherous, and the ghostly dread of the soft time limit adds a sharp tension to every level you attempt. It has all the trappings of a great game you can play basically forever. However, there was one feature that really stood out to me when I began playing: the Daily Challenge. A master server randomly generates a single configuration for the day, and all players get exactly one chance to score on it. It nicely counters the random arrangement and allows players to compete directly on an equal footing while keeping the core gameplay completely intact.
I loved it – it was exciting to boot up the game each day to try my hand at the day’s challenge. Knowing that my one shot for the day was on the line added even more tension to the run, and it really brought out my best. I had to play smart – I had to know when to take a big risk with low resources and when to just move on, when to hold ’em and when to fold ’em. It seemed like the game was at its best. Then I began to see a disturbing trend in the high score list for each Daily Challenge. That’s when I realized that I was playing it all wrong and came to a surprising conclusion:
Daily mode as implemented in Spelunky is actually a bad idea.
When to Hold ’Em and When to Fold ’Em
Why the 180? It has to do with Spelunky’s design and the way daily mode changes how you play. Spelunky is largely about risk management. This shows up in several forms throughout the design, but it mostly ties back to resources. There’s a crate behind a rock wall – if I use a bomb to get it, it might pay off with more bombs, which is always good. It might give ropes, which may or may not be good depending on how many ropes I have. It might be a great item, or it might be an item I already have, which means using my bomb was a waste. You have a risky choice, but the payoff is predetermined; you just have to decide whether to buy in or not.
This manifests itself in other ways, too. You can sometimes bomb a wall open or use a rope to climb a cliff to get guaranteed treasure. If you can see it on your screen, you even know exactly how much. Since bombs and ropes can be bought, you can use that knowledge to quantify how far ahead you came out. But that assessment assumes that you’re going to find a shop that carries bombs or ropes on a later level, and that you won’t hit a situation that requires that bomb or rope before you reach the next shop. You might use your last rope to get 2,500 points only to find a spot on the next level where a rope would’ve gotten you 4,500, maybe more. Maybe you even fall into a pit that requires a rope to escape, and now your run is just over. You had no way of knowing for sure what lay ahead, so you have to plan for future levels and situations you haven’t even seen yet. You have to draw on your overall experience to know which risks tend to pay off, and when.
A major long-term risk you have to decide on is the secret levels. In each “world” (1 through 4), there’s an item that requires some extra effort to get, and the items must (mostly) be acquired in sequence. If you miss an item along the way, your attempt at secret world 5 fails, and each step is progressively riskier. Getting the Udjat Eye in the Mines might require some extra ropes and bombs to secure the locked chest and its key. Getting the Ankh in the Jungle requires a $50,000 buy-in, which is deducted directly from your score. Alternatively, you could steal it, which requires killing seven shopkeepers, arguably the hardest and certainly the most volatile enemies in the game, plus you’ll be fighting more of them throughout the rest of your run. Getting the Hedjet in the Ice Caves requires you to sacrifice the extra life the Ankh would otherwise grant you, which takes away a powerful safety net, and you must be willing to give up any items or weapons you might be carrying. Getting the Scepter in the Temple requires you to fight Anubis, a powerful enemy with homing one-hit-kill projectiles. Then getting the Book of the Dead in the Temple’s City of Gold requires you to fight Anubis II, who will follow you to every subsequent level until killed. Even after all this extra effort and risk, reaching secret world 5 demands even more precision as you fight Olmec and ride him down into the lava to reach the door. Doing all this gives you access not only to the City of Gold (described above) but also to the secret world, Hell, which gives you four more levels’ worth of treasure to try for and another boss with a guaranteed haul of bonus treasure in his room. It’s the riskiest thing in the game, but it is extremely well rewarded.
So far it seems like we’re doing pretty well as far as game dynamics go. You make the best decisions you can with limited information, and when your run is over, you get a place on the leader board that gives you feedback on your decisions. Maybe you got to 3-1 with $200,000, and you made it into the top 1,000 players. Then you look at the top slots: loads of them ended on level 5-4. All of them approach a million dollars, and some top two million. Every single one of them shot the works, went for broke, and got the big payoff. Then it hits you:
If you want to win at Spelunky, you have to do the same.
Every Hand is All In!
You have to take every risk you possibly can. You have to play dangerously at all times. You have to bet the farm on every hand. If you decline a risk that would’ve paid off, if you fail to bomb a wall that leads to treasure you had no other way of knowing was there, you’ve lost automatically, because someone else in the world with exactly the same information as you (i.e., not much) did take the risk, and it paid off for them. It’s not that you’re being punished for guessing wrong – it’s that you’re forced to guess that every risk will pay off. You have to be as greedy as possible and just hope it all works out, because the top ranks of the leader board are inevitably full of even greedier people who don’t even think about playing it safe. And really, they shouldn’t: if they play it safe, someone else among the thousands of daily participants will beat them. With that many players, somebody’s long shots are bound to come in.
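To put rough numbers on that, here is a minimal Python sketch. It assumes, purely for illustration, that a run is a chain of independent gambles that each pay off with the same probability for every player. Real Spelunky risks are neither independent nor equally likely, and the function name is hypothetical. Even when a single player has almost no chance of winning every bet, the chance that somebody in a large field does climbs quickly with the number of players.

```python
def chance_someone_sweeps(players: int, bets: int, p_win: float) -> float:
    """Probability that at least one of `players` players wins all `bets` gambles.

    Illustrative assumption: every gamble is independent and pays off with
    the same probability `p_win` for everyone.
    """
    p_one_sweeps = p_win ** bets                        # one player wins every gamble
    p_nobody_sweeps = (1.0 - p_one_sweeps) ** players   # every player misses at least once
    return 1.0 - p_nobody_sweeps


if __name__ == "__main__":
    # Hypothetical run: ten coin-flip risks, for fields of various sizes.
    for n in (1, 100, 1_000, 10_000):
        print(f"{n:>6} players: {chance_someone_sweeps(n, 10, 0.5):.4f}")
```

With ten coin-flip risks, a lone player sweeps roughly once in a thousand runs. In a field of a few thousand, it becomes very likely that somebody does, and that somebody tops the leader board.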
And thus the risk-management game breaks down. We’ve optimized the strategy into maximizing risk at all times. You always have to try for the secret levels, even though the Ankh is a great safety net and fighting Anubis is so dangerous, because without 5-1 through 5-4, chances are you just can’t compete. You always have to rob and kill every shopkeeper you see, no matter how dangerous the setup is, because you need all the bombs they carry to blow up as much of the City of Gold as you can (every tile you destroy spawns more treasure) and still have enough bombs to collect the column of gems in King Yama’s chamber on 5-4. And forget about buying bombs instead: not only does buying deduct from your score, but the shopkeepers themselves drop still more gold when killed. That’s a deficit you just can’t afford.
And if you have to do all these things, that means everyone is doing them. So there’s not much reason to believe that the players in the top slots of the leader board are much better at the risk-management game than anyone else. In fact, there is no risk-management game anymore – the optimal strategy is known, and everyone just has to execute it. On top of it all, a dedicated player can cheat by watching someone else play through the Daily Challenge before making his own attempt, removing nearly all the uncertainty from his decisions.
You want proof that Daily Challenges have problems measuring decision skill? Just check the Daily Challenge leader boards. You’d think that if it’s a good measurement of skill, you’d see a lot of repeat names among the top ranks. As of the time of this writing, of the 56 entries for the top slots for the Daily Challenge (the top eight for each of the previous seven days), only 7 of the 52 total names are repeated. And from what I remember, when the game first released on PC (when the most people were playing and competition was fiercest), that number was typically zero. Even today, within the past week, no one person holds the top slot more than once. I’m not saying these players aren’t skilled, but I do think that perhaps this mode measures the wrong skill.
High Score By Suicide
The Daily Challenge isn’t the only mode where the risk-management game breaks down. What about the regular Adventure mode? If decision-making is already reduced to a formula in the Daily Challenge, the problem is ten times worse when you’ve got unlimited chances to set a high score. Even though we know what it takes to get the absolute best scores, there will still be occasional moments in the Daily Challenge when you decide that you just can’t pull off that major risk and need to play it safe. Not so in Adventure mode! Now that our only directive is to get the highest score ever, any time our optimal strategy of maximized risk doesn’t pay off, we can just start over! Did you lose two of your starting four hit points on level 1-1? Suicide. Did you botch a bomb placement or not get the item you wanted from a crate? Suicide. Does the first shop lack a weapon you can use to kill the shopkeeper? Suicide.
This isn’t anything remotely like managing risk. You don’t have to deal with the consequences of your actions or with challenging configurations. This is the game on easy mode: just keep starting over until things happen exactly the way you want them to. Anyone who’s watched Spelunky on Twitch TV knows that, by and large, this is the primary way the game is played. In fact, that’s exactly how we got the highest video-documented Spelunky score ever: Bananasaurus_Rex’s amazing $3.1 million run was predicated on hours and hours of restarting the game in order to get a plasma cannon (a 0.1% drop from a crate that isn’t even guaranteed to be present) on level 1-1 and a shop with a jetpack on level 1-2. Does that mean that every run from now on that doesn’t go for that once-in-a-lifetime seed is just a waste of time? Without a scoring rubric that gives value to scores other than the all-time best, the answer is unavoidably yes.
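How many restarts does a seed like that take? Restarting until something happens is a geometric process, so if one attempt succeeds with probability p, the expected number of attempts is 1/p. The Python sketch below uses the 0.1% drop rate quoted above. It treats the chance of a crate appearing on 1-1 as an unknown parameter, set to 1.0 because the real figure isn't given here. That is the most generous case, so the results are lower bounds. The sketch also ignores the jetpack-shop requirement entirely, and the names are hypothetical.

```python
import math


def expected_attempts(p_success: float) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    return 1.0 / p_success


def attempts_for_confidence(p_success: float, confidence: float) -> int:
    """Smallest n such that at least one of n tries succeeds with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))


if __name__ == "__main__":
    P_CRATE_ON_1_1 = 1.0          # assumption: real value unknown; 1.0 gives lower bounds
    P_PLASMA_FROM_CRATE = 0.001   # the 0.1% drop rate quoted in the text
    p = P_CRATE_ON_1_1 * P_PLASMA_FROM_CRATE
    print("expected restarts:", expected_attempts(p))
    print("restarts for a 50% chance:", attempts_for_confidence(p, 0.5))
    print("restarts for a 90% chance:", attempts_for_confidence(p, 0.9))
```

Even in that generous case, the expected count is about a thousand restarts, and the plasma cannon was only one of the two conditions that run needed.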
At this point, the game has more in common with a slot machine than a risk-management engine. Forever waiting for the best seed ever in order to justify your new high score is not only tedious but also largely fruitless, when you consider that someone else hunting for the perfect seed might get an even better one than you did. In fact, that’s the very reason we have the Daily Challenge mode to begin with – to mitigate the random seed’s effect on performance! Without that mitigation, it’s hard to argue that the high-score holder is necessarily the world’s best player. While he’s certainly well-versed and skilled at the game, all his world record really tells us about Bananasaurus_Rex is that he may simply be the most patient player.
The same is true for speed runs, a type of play for which Spelunky is ill-suited. I’ve been watching Twitch streams of players with quite impressive speed-running skills, but invariably they abandon a run the moment it goes the least bit wrong, especially in the early stages. Maybe the player made a mistake, maybe the seed was unfavorable, but either way the player can simply choose not to deal with even the tiniest hiccup. The problem is the same whether you’re running for score or for speed: as long as you’re shooting for the best result, all failed attempts are essentially discarded and unscored, so your sub-optimal attempts never reflect on how the game measures your skill. It’s just as well – the only way to count your failed attempts would be as losses, and the better the score or time you’re trying to beat, the more often you just plain lose.
Dealing With Your Mistakes
Don’t let the article so far convince you that I don’t like Spelunky. It isn’t a bad game by any means – it’s just that the scoring mechanism is a little broken. It encourages behavior that runs contrary to how the game seems meant to be played. But we can fix this! We just need some way to reward the player for sticking with his run no matter how bad it gets. That seems to have been the original goal of the Daily Challenge – to make you deal with your mistakes and make the most of them.
A while ago I was talking to a friend of mine who’s an SCA member. He participates in ranked archery shoots, and he told me how they work. When you shoot a flight of six arrows at one of many qualifying events, you can choose to publish that score. Your ranking is determined by the average of your top three scores. What’s more, the averaged scores have to fall within a designated twelve-month period, the “tourney season” if you will. When the season ends, you start over, and your old scores go away completely no matter how high they were. When I heard this, I realized right away that SCA archery tested a quality that Spelunky doesn’t: consistency. It isn’t enough to get a high score once. It isn’t enough that you got the highest of scores three years ago. What matters is, can you keep doing it? I realized that’s exactly what Spelunky needed.
So how do we make Spelunky test your consistency? I started with the idea of averaging your scores together. SCA archery is onto something good, but it still has the same problem of motivating players to ditch scoring attempts early. If your first arrow scores zero points, that might already be enough to make you abandon the whole flight! Also, since you can shoot as much as you want all year long but only submit your top three scores, you can run into some strange scenarios. Maybe archer Robin only had time to shoot three times during the year, but averaged an exceptional score across those three flights. Meanwhile, archer Joe shoots five hundred flights, most of which are well under Robin’s average. But every once in a while, Joe gets a little lucky and shoots higher than Robin’s average. Maybe he does so three times all year long. Who’s the better archer? The scoring rubric says Joe, but it seems pretty obvious that Robin has more skill. Robin’s scores are 100% consistent, whereas Joe beats Robin’s average less than 1% of the time.
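Below is a minimal Python sketch of the archery rule as described: the average of the three best published scores inside a twelve-month season. It runs on invented numbers for Robin and Joe, and the point values and dates are hypothetical. It only shows how a best-three rule rewards volume, while a plain mean over every flight tells a different story.

```python
from datetime import date
from statistics import fmean

Score = tuple[date, int]  # (day of the shoot, points for the flight)


def season_ranking(scores: list[Score], start: date, end: date) -> float:
    """Average of the three best scores shot within [start, end]."""
    in_season = [pts for day, pts in scores if start <= day <= end]
    best_three = sorted(in_season, reverse=True)[:3]
    if len(best_three) < 3:
        raise ValueError("fewer than three scores in the season")
    return fmean(best_three)


SEASON = (date(2013, 1, 1), date(2013, 12, 31))

# Hypothetical archers. Robin's old 2011 score falls outside the season and is ignored.
robin = [(date(2011, 5, 1), 90)] + [(date(2013, 6, 1), 50)] * 3
joe = [(date(2013, 6, 1), 30)] * 497 + [(date(2013, 9, 1), 55)] * 3

if __name__ == "__main__":
    for name, scores in (("Robin", robin), ("Joe", joe)):
        in_season = [p for d, p in scores if SEASON[0] <= d <= SEASON[1]]
        print(name, "ranking:", season_ranking(scores, *SEASON),
              "mean of every flight:", round(fmean(in_season), 2))
```

The best-three rule ranks Joe first (55 vs. 50), even though Joe's average across all of his flights is barely above 30.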
One solution I’ve devised, and one that I’m currently pioneering on my Twitch channel, is “Average of Ten.” All you need to do is play the game whenever you like and track the average score of your ten most recent consecutive runs. When a score is more than ten games old, you bump it off the list like it never existed. This already has a lot going for it. Foremost, it rewards consistency. Sure, you scored $2.1 million on that one magic run six months ago. Who cares? Can you keep doing it? You can no longer ride on the coattails of past performance – how good are you right this second? That’s what I really care about. It also makes every single run matter. Are you down to one hit point on the very first level? Tough. Make the best of it, and minimize the impact it has on your average. Turn it around despite all odds. Can you do that? Only the average of ten can tell you. It also feels very much like a ladder system. One bad game will hurt, but it won’t completely kill you the way it does in the Daily Challenge. Just keep playing well, and eventually the bad score will drop off, leaving your average none the worse for it. Conversely, when you get an uncharacteristically high score, you only gain so much from it, and without repeat performances, even that small boost will quickly go away. It also counters the random seed pretty well, since it averages ten games. Sure, you’ll get the occasional particularly nasty seed, but ten times in a row? Not likely. And even if it does happen by freak accident, you’ll climb right back out once your bad luck runs out. If nothing else, it feels like playing a Daily Challenge every time you play the game! I’ve been testing this method with a lot of success. It changes the way you play the game, you can easily track it on your own without the game’s help, and with some effort you can even adapt it to speed runs!
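The bookkeeping is easy to automate. A minimal Python sketch follows, and the class name is hypothetical. The rule above leaves one detail open: what to report before ten runs exist. This version simply averages whatever runs it has so far, though waiting until the window fills would be an equally valid choice. Every run is recorded, including the ones you would rather forget. A `collections.deque` with `maxlen=10` drops the oldest run automatically.

```python
from collections import deque
from statistics import fmean


class AverageOfTen:
    """Rolling average over the most recent `window` consecutive runs."""

    def __init__(self, window: int = 10) -> None:
        # A bounded deque discards the oldest score when a new one arrives.
        self.runs: deque[int] = deque(maxlen=window)

    def record(self, score: int) -> None:
        # Every run counts, including deaths and early suicides.
        self.runs.append(score)

    def average(self) -> float:
        # Assumption: before the window fills, average whatever runs exist.
        return fmean(self.runs) if self.runs else 0.0


if __name__ == "__main__":
    tracker = AverageOfTen()
    for _ in range(10):
        tracker.record(100_000)        # a steady player
    tracker.record(2_100_000)          # one magic run
    print(tracker.average())           # the spike lifts the average...
    for _ in range(10):
        tracker.record(100_000)
    print(tracker.average())           # ...until it rolls off the window
```

One outlier moves the average by only a tenth of its excess, and after ten more runs it has no effect at all. This is the "you only gain so much from it" behavior described above.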
“Average of Ten” is hardly the only way to effectively rank players in a single-player game. Dinofarm Games’ upcoming title Auro also gives you a randomized level every time you play, but instead of asking you to get the highest score ever, Auro sets a target score based on your skill level. Reach it and you earn points toward graduating to a higher skill level, with a higher ranking to show for it and a harder score to try for. Win several times in a row, and your rank increases faster. Lose, and expect to see your rank drop. Now here is a single-player game that rewards consistency. It gives you the most honest assessment of your skill, and it gives you something to compare with your friends and rivals! On top of that, every single game has a reasonable, attainable score goal that keeps games from becoming prohibitively long, like the seven-and-a-half-hour marathon session that poor Bananasaurus_Rex had to endure. Auro’s scheme is designed to produce quick games that still give you extremely reliable feedback. This idea is so strong that I feel it should be the standard for scoring and ranking single-player games. Keep an eye out for this title when it releases in August!
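For comparison, here is a toy target-score ladder in Python in the spirit of the description above. It has a per-level target, progress toward promotion, faster progress on win streaks, and a drop after losses. Every number in it (targets, points needed, streak bonus, loss penalty) is invented for illustration. This is not Auro’s actual formula, which isn’t described here.

```python
from dataclasses import dataclass


@dataclass
class TargetLadder:
    """Hypothetical target-score ladder; all constants are invented."""
    level: int = 1
    progress: int = 0
    streak: int = 0
    points_to_promote: int = 3

    def target(self) -> int:
        # Hypothetical: each level asks for a proportionally higher score.
        return 1_000 * self.level

    def play(self, score: int) -> None:
        if score >= self.target():
            self.streak += 1
            self.progress += self.streak      # consecutive wins count for more
            if self.progress >= self.points_to_promote:
                self.level += 1
                self.progress = 0
        else:
            self.streak = 0
            self.progress -= 1
            if self.progress < 0:             # losing with no cushion costs a level
                self.level = max(1, self.level - 1)
                self.progress = 0


if __name__ == "__main__":
    ladder = TargetLadder()
    for score in (1_000, 1_200):   # two wins in a row: 1 + 2 progress points
        ladder.play(score)
    print("level after two wins:", ladder.level)
    ladder.play(500)               # misses the new, higher target
    print("level after a loss:", ladder.level)
```

Each game is scored only against its own target, so no single lucky run can dominate the ranking. A seven-hour game also buys you nothing more than a short one that clears the target.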
So what about Daily Challenge mode itself? I don’t think it’s beyond help by any means. Earlier I mentioned that there are thousands of players for each Daily Challenge. Really, that’s the root of the problem: when you have thousands of participants, only one guy gets to win, and that’s the dynamic that makes everyone go for broke every game. But what if you changed the Daily Challenge into an individualized Player to Player Challenge? Imagine if, instead of a worldwide free-for-all, the game matched you with someone close to your skill level and gave a pre-seeded dungeon to just the two of you. Now you don’t have to worry about someone out of thousands lucking out with a payout on every long-shot bet – you have just one opponent and a reasonable expectation that he might get killed if he takes any unwise risks. It turns out that the problem with Daily Challenge mode wasn’t its rules; it was its scope. Giving the same seed to just two people gives them an opportunity to properly manage risk while still retaining a good chance of winning, even if they fall short of the million-dollar mark. Add a ranking system and an unranked “Challenge a Friend” option to the mix, and this mode is ready to ship! It’s too bad we can’t do this on our own, like we can with the “Average of Ten” method – I’d play Player to Player Challenge all the time.
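Matchmaking for such a mode could start very simply. The Python sketch below makes several assumptions: the rating source (for instance, an Average of Ten score), the player names, and the pairing rule are all hypothetical. Players are sorted by rating, neighbors in that ordering are paired, and each pair shares one freshly generated seed. With an odd number of players, the lowest-rated one is left over for the next round.

```python
import random


def pair_by_skill(ratings: dict[str, float], rng: random.Random) -> list[tuple[str, str, int]]:
    """Pair adjacent players by rating and give each pair one shared dungeon seed."""
    ordered = sorted(ratings, key=lambda name: ratings[name], reverse=True)
    pairs = []
    # zip stops at the shorter slice, so an odd player out is simply skipped.
    for first, second in zip(ordered[0::2], ordered[1::2]):
        pairs.append((first, second, rng.getrandbits(32)))
    return pairs


if __name__ == "__main__":
    ratings = {"ana": 410_000, "bo": 95_000, "cy": 380_000, "dee": 120_000, "ed": 20_000}
    for a, b, seed in pair_by_skill(ratings, random.Random(2013)):
        print(f"{a} vs {b} on seed {seed}")
```

Here "ana" faces "cy" and "dee" faces "bo", while "ed" waits for the next round. Each pair plays a shared dungeon against a single rival of similar skill instead of against thousands.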
Does It Matter?
It may seem to some that I’m overanalyzing a simple scoring mechanism, but it’s anything but simple. Score is the single most important part of Spelunky. It represents the goal of the game, the force that drives you to do anything at all. It’s what gives you feedback on your actions and measures your skill. That makes scoring solely responsible for the way we play the game, and the way we’ve been playing Spelunky isn’t exactly ideal, as shown above. We’ve largely reduced it to execution, and while execution is certainly a skill, it’s just not as interesting as risk-management decisions. Risk management is what separates Spelunky from Super Mario Bros. Once you beat a Mario game, you can do it again any time without much trouble. But Spelunky is never truly beaten. Sure, it has an “ending” after Olmec or King Yama, but dying on level 2-4 is just as valid an ending as beating either boss, provided your score is up to snuff. And since score is the only thing that matters, there’s always room to improve. If you’re only playing the execution game, you’re only playing half the game. Whether you can do something is a short-term skill. Whether you should do something is a long-term skill, one you can spend a long time cultivating, and one that gives Spelunky life far beyond completing the last level. If your game is all about execution, you’re missing out on the best part.
Feel free to join me on my Twitch stream as I play my “Average of Ten” scoring rubric. You can even download an Excel scorecard from me and play it yourself! I’d love to see what kind of scores you can get, so do drop by and let me know on my stream!