A big part of my job here around TouchArcade is acting as a bit of an air traffic controller. It’s like playing a real-world version of Flight Control, only instead of routing planes around, it’s new games to writers, support issues to tech people, and other things along those lines. I’ve got an extensive Rube Goldberg machine of triggered alerts and other crazy things to help automate this as much as I reasonably can, but every day I spend a huge amount of my time just keeping track of what’s going on in the world of iOS gaming as a whole. That also involves following our own awesome community, as well as other places on the internet where anyone might be talking about iOS games. One thing that consistently comes up is general confusion surrounding how we rate our games. The farther you get away from TouchArcade, the nastier people get about it, but the prevailing wisdom is something along the lines of “lol toucharcade loves everything lol.” It’s part of a bigger problem of people not really understanding how we rate the games we review, and more importantly, why we rate them the way we do. I posted a lengthy response to the latest discussion I found about this, but it seemed a little weird to not also post a similar thing here, as unless you listen to our podcast, chances are you’ve never gotten to the nuts and bolts of the TouchArcade review rating system.
Before I get started with all this, I’ll come straight out and say it: I wish we didn't score games at all. A long time ago in a galaxy far, far away we just posted reviews of iPhone games and you had to read the review and get an idea of whether or not you'd like a game based off the text. We eventually started doing monthly "Best of" roundups that picked a small handful of what we thought were the best games of that particular month, and that was the closest we ever really came to rating anything. Eventually reader pressure combined with Metacritic opening up an iOS category on their site forced our hands when it came to assigning scores to our game reviews as we published them. We went with a five star system because that's what the App Store uses and it felt like it fit in with what we were doing here.
The problem any kind of numerical rating scheme introduces when reviewing iOS games is that we’re shoehorning a system with roots in traditional games and games journalism onto a platform that’s unlike anything that gamers (and developers) have really ever experienced before. The quantity of games on the App Store, as well as the massive gulf in quality between the absolute best games and the absolute worst games, makes utilizing conventional knowledge about how game review scores should work incredibly difficult. On a busy week, the App Store will have more new games released than a console might see in an entire year. Cynical followers of mobile gaming will often claim that “99% of mobile games are total sh*t.” So, assume just for a moment that that number is true. From the start of the App Store to today we’ve seen 562,513 games released, and 1% of that would mean that 5,625 games are presumably good, or at least, not “total sh*t.” If you take a slightly more liberal stance towards the quality of mobile gaming and think that even 2% of all the games on the App Store are good, that’s over 11,000 different titles worth at least some sliver of our time. If you’re willing to be more generous in your percentage of “good” mobile games, that number just continues to balloon.
For the sake of comparison, the PlayStation 2, which according to my research had the largest game library of any console, saw 3,874 games released in all regions across its entire lifecycle. As of this writing, we have reviewed 3,448 different iOS games. At the pace we’re going, at some point early this year we will have published the coverage equivalent of reviewing every single PS2 game ever released. Using our numbers from before, 3,448 games is actually 0.6% of the games that have been released on the App Store. Taking this a step further, that means that in order for a game to be reviewed on TouchArcade, it needs to be in the 99.4th percentile of App Store quality. With that in mind, how do you establish any kind of baseline of what the absolute worst game we review should be rated, and what should be the top end? Before we even write the first sentence of the review, we’ve already established that the game we’re writing about is among the best of the best, and there’s really no other platform we can even look to to inform us of how to handle this sort of thing. We’re skimming such a premium level of cream off the App Store that if you compared these percentages and this kind of quality curve to the Nintendo 64 library, our entire rating system would need to exist in the minor difference between the score we’d give Goldeneye and what we’d rate Ocarina of Time.
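For anyone who wants to check the math above, here is a minimal sketch of the arithmetic. The figures are the ones quoted in this post; the variable and function names are just illustrative.

```python
# Figures quoted in the post; the names below are illustrative only.
TOTAL_APP_STORE_GAMES = 562_513
REVIEWED_GAMES = 3_448
PS2_LIBRARY = 3_874


def share_count(total: int, percent: int) -> int:
    """Number of games in `percent` percent of `total`, rounded down."""
    return total * percent // 100


# How many games count as "good" if only 1% or 2% of the catalog is.
good_at_1_percent = share_count(TOTAL_APP_STORE_GAMES, 1)
good_at_2_percent = share_count(TOTAL_APP_STORE_GAMES, 2)

# Share of the catalog that has been reviewed, as a percentage.
reviewed_share = REVIEWED_GAMES / TOTAL_APP_STORE_GAMES * 100

# If only that top slice gets reviewed, this is the implied percentile cutoff.
percentile_cutoff = 100 - reviewed_share

# Reviews still needed to match the size of the PS2 library.
reviews_to_match_ps2 = PS2_LIBRARY - REVIEWED_GAMES

print(good_at_1_percent, good_at_2_percent)
print(round(reviewed_share, 1), round(percentile_cutoff, 1))
print(reviews_to_match_ps2)
```

The percentile figure rests on the same simplification the post makes: that the games reviewed are exactly the top slice of the catalog by quality.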
That all seems really crazy, and just doesn’t really work in any practical sense. Instead, we’ve chosen to look at our review scale less as a "how perfect is this game" measure and more as a "how much do we recommend downloading this game" measure. That's the only way we can get any level of granularity between the phenomenal games we write about, as again, anything that we post about on any level is good enough to be worth checking out and is easily among the best stuff the App Store has to offer. You saw the numbers from earlier; it isn’t even hyperbole on any level to say something like that. Additionally, like everyone else writing about video games, we have fairly limited resources with regards to how many games we can even cover. We try to intelligently manage those resources, resulting in us really only ever reviewing the absolute best of the best iOS games a vast majority of the time. This makes things feel a bit more timeless too, as old five star reviews still stand up as games I'd totally recommend downloading even years later, which is important as we've seen multiple generational leaps in graphical fidelity as iOS hardware continues to be released and get faster annually. With that said, I totally agree it's not a perfect system by any stretch of the imagination. No system that distills a complex opinion down to a number can ever be.
A great recent example of there not really being a good solution to this problem is a review I wrote for Grand Theft Auto: Liberty City Stories [$6.99]. I gave it five stars, and from the outside looking in, it’s totally understandable to be like, "Wow, TouchArcade is saying that arguably the weakest game in the modern GTA lineup is a flawless game?" Do I think it's a perfect game? Of course not. GTA: San Andreas [$6.99] is an objectively bigger and potentially an overall substantially better game. It's also available on the App Store, and it even sells for the same price. However, given the massive amounts of terrible games on the App Store, do I recommend downloading GTA: Liberty City Stories over nearly everything else? Absolutely. In fact, if you follow TouchArcade and have downloaded the other GTA games as they've been released, you badly need this one too because it's frickin' GTA in your pocket. It’s a game with a real story, full voiceovers, great gameplay, and a level of depth and complexity that’s rarely seen on the App Store. Hell, it's still worth the $6.99 even if all you ever do is get on a motorcycle and run from cops. So, through that lens, five stars makes total sense for that game and it’s hard to argue for anything lower.
Just glancing at the headline, looking at the score, and assuming we rate our reviews like everyone else inevitably causes the whole "lol TouchArcade likes everything lol" thing. I'm really not sure how to better communicate the intent of our rating system, as, really, it's true. We like everything we post about. We're in an incredibly unique situation in that we can choose to ignore everything that's bad and still not have enough bandwidth to possibly post about everything that’s good on the App Store. Thirty games looked compelling enough to be worth including in our “Out Now” post yesterday. Comparatively, it appears that there are 14 vaguely high profile console games coming out in the entire month of January. Even if you added in every random below-the-radar indie console title, it still wouldn’t come close to what we’ll see this month.
If we were a console site, we’d review all or most of those 14 games, and in turn we’d likely have a very wide spread of scores, as some of those games inevitably will be good and some will be bad. Instead, we’ll actively choose to write about the five or so best games that came out today alone, amongst a pool of hundreds that were released that aren’t even worth mentioning. If we had a massive staff and the resources to review every single one of those games that were released today, the five best titles would stand out as amazing, incredible titles that are every bit worth the highest marks on our scoring scale. The problem is, in doing that, well over 99% of our content would serve absolutely no purpose beyond making those good games feel more special, like their high rating is appropriate, or a five star rating from TouchArcade "means something." Also, I'm really not sure what value TouchArcade as a site would have, as why would you visit a web site to find an avalanche of terrible games?
I've thought about steering things away from the star system and instead toward some kind of recommend-o-meter sort of thing that better illustrates our overall intent. The problem with anything we do is that, at the end of the day, it matters little what crazy system you come up with; it all still comes down to some kind of numerical value representing your opinion regardless of how it’s all displayed. Again, since we actively choose to only write about games we think are good, interesting, or otherwise among the best the App Store has to offer, anything that we review is naturally going to swing higher on a scoring scale. It makes no sense to slam great games that might not be as good as other slightly greater games with a substantially lower score just so we have a wider spread of numerical ratings. Similarly, it makes just as little sense to keep our top-end scores behind some "break in case of emergency" piece of glass that we only touch when something truly groundbreaking comes along once or twice a year. Things get even more complicated when you add review aggregators into the mix.
I have friends who ran a console news/review site for a while that had a four point scale that I really liked. It was "Don't bother," "Try it," "Buy it," and "Classic." I thought this was really clever because it dumped the granularity of deciding if something is a 78 out of 100 or a 79 out of 100, or averaging crazy feature score matrices where you assigned scores to things like graphics, sound, “fun factor,” and other meaningless categories to arrive at a final score. With their method, you knew that a game totally wasn't worth your time, was worth maybe grabbing through Redbox or Gamefly to try out, was worth picking up, or was one of the best games of the year. It seems like a fantastic idea on paper, and similarly does a great job of conveying what we're trying to do with the TouchArcade rating system.
Sadly, this has one significant and very real drawback: Review aggregators are a thing, and regardless of how weird you get with your scoring system, everything eventually will be boiled down to a 100 point scale for aggregated average scores. In this example, a rating of “Buy it” comes with a very strong recommendation, but it would be counted on Metacritic as a 75 out of 100, unintentionally and significantly dragging down the average score of titles they thoroughly enjoyed. By Metacritic standards, their recommendation would be numerically distilled into a rating that's well below what people see as a "good" score. That seems broken too. You could say, "Eh, screw the review aggregators, who needs 'em, Metacritic does more harm than good," and while that’s certainly an argument you could make, we're still a very tiny site in the grand scheme of things. Those clicks and backlinks are valuable when it comes to site traffic, Google page ranking, and other things you have to think about when you're running a web site while facing the reality that you’re also responsible for the health of a small business with employees who need to be paid and who have families that need to be fed.
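To make the aggregator problem concrete, here is a rough sketch of how a four point scale could get flattened onto 100 points. The linear mapping is an assumption on my part: it happens to reproduce the 75 for "Buy it" mentioned above, but it is not a description of how Metacritic actually converts scores. The other outlets' scores are made up purely for illustration.

```python
from statistics import mean

# The four point scale described above, from worst to best.
FOUR_POINT_SCALE = ["Don't bother", "Try it", "Buy it", "Classic"]


def to_hundred_point(label: str, scale: list[str]) -> int:
    """Map a label to a 100 point score, ASSUMING evenly spaced steps.

    This is a hypothetical conversion, not an aggregator's real formula.
    """
    rank = scale.index(label) + 1  # 1 = worst, len(scale) = best
    return round(rank * 100 / len(scale))


buy_score = to_hundred_point("Buy it", FOUR_POINT_SCALE)

# Hypothetical scores from three other outlets for the same game.
other_outlets = [90, 88, 92]

average_without = mean(other_outlets)
average_with = mean(other_outlets + [buy_score])

print(buy_score)
print(average_without, average_with)
```

Under these assumptions, a strong "Buy it" recommendation ends up pulling the aggregate down rather than up, which is the drawback described above.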
At the end of the day, there’s really no great solution. I think what we have works pretty well, although at times it feels like our intent has been communicated rather poorly. A fantastic way to assign a numerical score to a subjective opinion of anything doesn’t really exist, and the challenges we deal with on iOS as a platform put us in a unique situation that makes applying the conventional logic of what other outlets are doing or have done incredibly difficult. Hopefully that sheds some light on things!