An Example of How To Use Statistics using Contenders NA S1

September 25th
written by Barroi

This document is an example of how stats can be interpreted. It is neither a perfect guideline nor entirely flawless, but it should contain information on how stats should and should not be used. I write stats-based performance reports on a regular basis and I am probably by far the most experienced person in the scene when it comes to using Overwatch esports statistics. No one asked me to do this, but I felt like people could learn from that experience. This post will not be something to entertain the general reddit user, but since I do not dislike people looking at ads I decided to post it on Winston’s Lab (disable AdBlock by the way, thanks <3).

I encourage anyone who wants to work with Overwatch esports statistics to read this. It is not the holy grail of stats-usage; it has good and bad points. Try to find the good parts and use them to improve yourself.

I want to preface this by saying that you cannot judge everything and everyone by using statistics. Stats are merely a tool that can help you, but in the end the human brain (if used correctly) will always be superior. And at this point I want to clarify that I don’t want to blame or shame anyone for doing something “wrong”. I just want to help anyone improve who makes it their goal to work with OW statistics in the future.

Fight Winrate is a beast metric

This metric is one of the most useful, if not the most useful metrics in many ways. In case you wonder how I define “fights” read my article on it.

First of all, it is a great replacement for normal map-winrate. If you look at data from official games you will always have the issue that a player only played a very limited number of maps, and it gets even more limited if you look at a player’s numbers on only one hero. In that case it gets especially murky, because map-winrate is the time you have spent on a hero on maps that you ultimately won, divided by the total time you spent on that hero. The problem is that you might switch to certain heroes only because you think that a victory is very certain (because of the first attack round) or because you need to mix it up to have a chance at all. Because of that, map-winrates can be kind of questionable from time to time. Your fight winrate, though, is not influenced by whether you win a map or not; it simply is the percentage of teamfights you have won. You can argue that drawn-out 20-kill fights influence that dataset in a weird way, but in the end those fights are rather rare, and even with them I think FWin% is more useful than pure mapWin%. FWin% also captures the margin of victory in a way that Win% can’t: you win more fights if you totally destroy a team, but the map winrate remains the same.
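To make the difference between the two metrics concrete, here is a minimal sketch. The `MapRecord` structure, its field names and the sample numbers are invented for illustration and are not how Winston’s Lab stores or computes its data.

```python
from dataclasses import dataclass


@dataclass
class MapRecord:
    """One map played on a given hero (hypothetical structure)."""
    seconds_on_hero: float  # time spent on the hero on this map
    map_won: bool           # did the player's team win the map?
    fights_won: int         # teamfights won while on the hero
    fights_total: int       # all teamfights, including lost and tied ones


def map_winrate(records):
    """Time on the hero on won maps divided by total time on the hero."""
    total = sum(r.seconds_on_hero for r in records)
    if total == 0:
        return None
    won = sum(r.seconds_on_hero for r in records if r.map_won)
    return won / total


def fight_winrate(records):
    """Share of all teamfights that were won, regardless of map result."""
    total = sum(r.fights_total for r in records)
    if total == 0:
        return None
    return sum(r.fights_won for r in records) / total


# Invented sample: one stomp, one close win, one loss.
records = [
    MapRecord(600, True, 9, 10),
    MapRecord(600, True, 6, 11),
    MapRecord(300, False, 4, 9),
]
print(map_winrate(records), fight_winrate(records))
```

Both won maps count the same towards the map winrate, while the stomp contributes far more won fights than the close win. That is the margin-of-victory effect described above.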

The aspect that makes this a really good metric is another one, though. Fight Winrate is by far the best “Control-Stat” (well, ok, time played is better if the time is below 30 minutes, but in general FWin% is way more useful). What I mean by Control-Stat is that it can tell you if a player’s stats have the potential to be “boosted” because of good teammates or poor opposition, or lower than they should be because of bad teammates or overwhelming opposition. This does not mean that this has to be the case, but it has the potential to be.

There are some numbers you have to keep in mind when using FWin% as a control stat: On average a player wins 46% of his fights (remember, fights can be tied; read my article on it). Half of all players have a fight winrate between 41% and 51.5%, so anything below or above that range can be considered an outlier. You might even want to consider <43.5% or >50.5% a “small outlier” (one third of the players fall between those two values).
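Those thresholds can be written down as a small helper. This is only a sketch: the function name and the labels are made up, and how the exact boundary values are treated is an assumption.

```python
def classify_fwin(fwin_pct):
    """Label a fight winrate (in percent) using the thresholds from the text.

    Boundary handling (e.g. whether exactly 41.0 counts as an outlier)
    is an assumption made for this sketch.
    """
    if fwin_pct < 41.0:
        return "low outlier"
    if fwin_pct < 43.5:
        return "small low outlier"
    if fwin_pct <= 50.5:
        return "normal"
    if fwin_pct <= 51.5:
        return "small high outlier"
    return "high outlier"


for value in (40.0, 42.0, 46.0, 51.0, 52.5):
    print(value, classify_fwin(value))
```

A label like “high outlier” only says that the stats have the potential to be boosted, not that they are.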

Let’s look at an example from Contenders NA S1. Please note that I am writing this article after week 5. Link to query.

Red marks the players that have an abnormally high FWin%, blue means relatively low. So, if you try to evaluate Effect you might want to say “Effect is one of the best Tracers in the world”, but if you then contextualize his high FWin% in combination with the (arguably weak) opposition he faced, you should continue that sentence with “but his current statistics could very well be boosted by the fact that he played bad teams”. Again, it doesn’t necessarily mean they are boosted; the data just suggests that they might be. What you have to do “manually” is think about which teams he faced. If he played in Contenders, he will probably perform worse against the Lunatic-Hais of this world. If those stats were the result of playing teams considered top tier, then he could just be insane, or there could be another reason.

That other reason would be that his teammates are insanely good and he just reaps the benefits from that. Since I couldn’t think of an example from Contenders, here is an example from APEX. In S3, Esca’s Sombra and Soldier stats had the potential to be artificially boosted, as suggested by the 52%+ FWin%. In Season 4 his stats on both heroes got worse while his FWin% became much more normal. Especially his performance on Soldier (blue) really dropped off a cliff; one more death/10 is huge (that said, he only played that hero for 40min in S4, which is an ok amount to make this statement, but you would wish for it to be more). The conclusion is that his numbers were partially boosted (whether it was because of bad opposition or good teammates is for you to decide). Note as well how map winrate could not have predicted that at all.

Let’s go back to the Contenders example, here is the picture once again: Tracers in Contenders NA S1

The data suggests that J3sus’ numbers could be a lot better on a better team; you can base that assumption on his low FWin% and high PTK (another great Control-Stat). Another very interesting player to look at is Carpe: his FWin% is 10% lower than Effect’s, but he already stands out because of his high K/10 and KPU. It would be an interesting experiment if we could replace Effect with Carpe (ignoring all those side effects like communication) and see how his performance changes. I would assume that he might be able to at least come close to Effect’s numbers. All in all, Effect is so far ahead numbers-wise that he will probably stay ahead of Carpe even in an experiment like that. Note that Carpe and Effect are both Korean, btw ;).

Context, Context, Context

Earlier I said Effect’s numbers are probably higher than they would be if he were playing Lunatic-Hai. Context is one of the most important things that is currently not represented in those numbers. It seems like a rather obvious thing to say, but it’s also easy to forget, so better remind yourself to always contextualize your data. Look at who someone played against, who he played with, and what the meta was at the time. Take as much info into account as possible. You will not be able to remember/know everything, but as long as you try to use what you can think of you are fine.

The most important factor is probably the opposition faced, so at least make sure to think of that. FWin% is a great Control-Stat to help you a bit with that, but it won’t be able to tell you everything. PTK is a Control-Stat that can help you determine how influential a player was on a good/bad team. If you have 3 more K/10 but your PTK is 10% lower than that of your general heroX player, then you might want to reconsider calling that player a god.

Again, numbers help, but you need to use your brain to make them work. Incorporate them with the context you can think of and your own opinion of a player, the “eye-test” if you will, and go from there.

Time Played as a Control-Stat

I mentioned earlier that time played is a great Control-Stat if you look at low timePlayed numbers. To get more specific: Just disregard everything that does not meet a certain threshold! You might want to say “but I could at least use those numbers as a point of speculation” and this is true to some degree, but in the end those numbers really don’t tell you anything. More than that, people could easily think that you are using those numbers to make a strong case for something. I myself fell into that trap and felt like an idiot because of it.

It is just a lot safer to not use those numbers at all; there are fewer mistakes to be made. If you still want to use low-time numbers, make sure to make it absolutely clear that those numbers don’t represent anything. I sometimes use numbers for 15-30 minutes of playtime if I feel like I have to, but I always include something like “I really really really need more data, those numbers are worth as much as speculation about the weather in 12 weeks” in that evaluation.

The thresholds I found very reasonable are the following: Don’t ever use anything with below 15min playtime. Between 15 and 30 min you still can’t draw conclusions, but if you have to use those numbers, do so very, very carefully. 30-45 minutes starts becoming a slightly reasonable place, but if you want to draw big conclusions the player/hero combo you are looking at should have at least 1 hour of data to it. Above 2 hours of data is, in my opinion, a really good place to be (but data is often too limited to get here). But even if you are at one hour or 2 hours of data, don’t disregard the context. If you got one hour of playtime by playing a single opponent, then that fact should probably influence your evaluation process quite heavily.
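As a rough sketch, the playtime thresholds could be encoded like this. The function name and tier labels are invented. The text does not say anything specific about 45-60 minutes, so treating that range like 30-45 minutes is an assumption, as is the handling of the exact boundary values.

```python
def playtime_tier(minutes):
    """Map minutes played on a player/hero combo to a reliability tier.

    Assumptions of this sketch: 45-60 minutes is treated like 30-45,
    and boundary values fall into the higher tier.
    """
    if minutes < 15:
        return "do not use"
    if minutes < 30:
        return "no conclusions, heavy caveats only"
    if minutes < 60:
        return "slightly reasonable"
    if minutes < 120:
        return "enough for bigger conclusions"
    return "really good"


for m in (10, 20, 40, 75, 150):
    print(m, playtime_tier(m))
```

Even the best tier says nothing about context. One hour against a single opponent still needs to be weighed by hand.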

Example: Winstons

Now that you know what Control-Stats are, let us look at some more interesting numbers in regards to player-evaluation. As an example we take a look at the Winston players in Contenders. Link to query.

Knoxxx (blue) has the highest Rating and if we take a look at the actual numbers we will see two things. 1) He has nice above average K/10 and D/10 and 2) he seems to really know when to engage and when not to, because his First-Kill and First-Death ratios are magnificent. In case you don’t know, the team that gets the first kill of a fight wins that fight in 78% of all fights. Drawing first blood is arguably more important than having an ult advantage of 2 ults (I will write an article on that one day).

If your FK is 12.5% that means that you draw first blood in every 8th fight. 10% obviously means you do so in only every 10th fight, so a small difference in percentages is already a huge difference in impact if you consider the 78% chance to win a fight. First Kills alone are not what makes Knoxxx’s numbers so interesting; it rather is the pairing with the low FD ratio. The data strongly suggests that he is really intelligent when it comes to choosing when to engage. And please note how I phrased that: the data “suggests” this, and obviously one has to confirm that on his own.
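The arithmetic behind that comparison, as a small sketch. The 80-fight sample size is an arbitrary number chosen for illustration, and applying the 78% figure directly to one player’s first kills is a simplification.

```python
FIRST_KILL_FIGHT_WINRATE = 0.78  # from the text: first kill -> 78% fight wins


def fights_per_first_kill(fk_ratio):
    """How many fights it takes on average to draw first blood once."""
    return 1 / fk_ratio


def expected_first_kills(fk_ratio, fights):
    """Expected number of first kills over a given number of fights."""
    return fk_ratio * fights


fights = 80  # arbitrary illustrative sample size
for fk in (0.125, 0.10):
    first_kills = expected_first_kills(fk, fights)
    print(
        f"FK {fk:.1%}: every {fights_per_first_kill(fk):.0f}th fight, "
        f"{first_kills:.0f} first kills in {fights} fights, "
        f"about {first_kills * FIRST_KILL_FIGHT_WINRATE:.1f} of those fights won"
    )
```

Going from 10% to 12.5% is only 2.5 percentage points, but it is 25% more first kills over the same number of fights.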

The other player I picked out here is iRemiix. The reason is that I really like to categorize players as “aggressive” or “passive” if possible. To me an aggressive player is someone who gets a lot of kills but also dies much more than normal. The way I use passive is for a player who kills less than the average but also dies way less often. I think the best hero to do this with is Soldier as he can be used aggressively as a semi front line in dive comps or passively as someone who stands in the back in Control comps. I personally think that there are some players that are naturally better at playing an aggressive style and some are naturally suited to the defensive style. I think for example, that HarryHook is (or rather was, who knows about now) one of the best passive Soldiers in the world.
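One way to sketch that aggressive/passive split in code is shown below. The function, the 10% margin that stands in for “much more” and “way less”, and the sample numbers are all assumptions for illustration, not values from the data.

```python
def play_style(k10, d10, avg_k10, avg_d10, margin=0.10):
    """Rough aggressive/passive label from kills and deaths per 10 minutes.

    `margin` (hypothetical, default 10%) decides how far from the average
    a value has to be before it counts as clearly higher or lower.
    """
    if k10 > avg_k10 * (1 + margin) and d10 > avg_d10 * (1 + margin):
        return "aggressive"   # lots of kills, but dies much more than normal
    if k10 < avg_k10 and d10 < avg_d10 * (1 - margin):
        return "passive"      # fewer kills than average, dies way less often
    return "unclassified"


# Invented numbers: hero average of 20 K/10 and 6 D/10.
print(play_style(24.0, 7.5, 20.0, 6.0))
print(play_style(18.0, 4.5, 20.0, 6.0))
print(play_style(20.5, 6.1, 20.0, 6.0))
```

Most players will land in “unclassified”, which is fine. The label is only worth using when a player clearly leans one way.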

As for Winston, it is harder to conceptualize players this way, but iRemiix might be a fairly aggressive player. He gets a ton of kills but also dies more often; especially his FirstDeath ratio is pretty high. What slightly contradicts this assumption, though, is his low FWin%. This suggests that his deaths might only be this high because he faced overwhelming opposition (or his teammates were letting him down). Whether that is true remains to be seen. What makes iRemiix even more interesting is that he has a rather high KPU. Taking all of that into account, it seems like iRemiix could perform even better if plugged into a better team (assuming that would be possible).

As a side note, having Tanks that get a lot of kills seems to be rather important if you want to win championships. Or rather, not relying on your DPS players too much is what wins you titles. At least this is what the data suggests, and I feel like in the West people put too much emphasis on having good DPS players that totally carry your team, even though this generally does not seem to work, whereas the idea of having good tanks that can carry your team seems to have a higher success rate. If you want to learn more about that, read up on the PTK Model.

Lucio

Let’s start by saying this: Don’t judge Lucios by their stats! In my opinion the biggest quality for a Lucio player to have is their shot-calling ability. If a Lucio is fragging out that’s a fun little info, but don’t use that to say that a Lucio “is good”.

I found it really hard to come up with a metric that is useful for Lucios and the only remote success I had was with the creation of UOOF (ults outside of fights), sometimes called “Wasted ults”. This number tells you what percentage of ults was used when not being in a teamfight situation. It turns out that Lucios that use more ults outside of fights (probably mostly before fights start, so “too early”) lose more, what a revelation.
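As a minimal sketch of the idea behind UOOF: assume every ult use is already tagged with whether it happened during a teamfight (that tagging, the function name and the sample list are hypothetical). The metric is then just a share.

```python
def uoof(ult_used_in_fight):
    """Share of ults used outside of fights.

    `ult_used_in_fight` is a list of booleans, one per ult use:
    True if the ult was used during a teamfight, False otherwise.
    """
    if not ult_used_in_fight:
        return None
    outside = sum(1 for in_fight in ult_used_in_fight if not in_fight)
    return outside / len(ult_used_in_fight)


# Invented sample: four ults, one of them used before the fight started.
print(uoof([True, True, False, True]))
```

The hard part in practice is deciding where a fight starts and ends, which depends on the fight definition and is not covered here.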

Every other metric for Lucio is basically complete garbage for evaluating a player’s performance. Kills and Deaths, for example, are correlated with winrate to such a great degree that you can completely disregard those as caused by the winrate (in theory that statement could be false, even the other way around, but I think it’s safe to say that it’s not entirely unreasonable to assume that statement is true).

Presenting Stats in a relative way

If you present statistics, it helps a whole bunch if you can put the numbers in relation to general numbers. Saying player X on hero Y is an “above average (or top 10%, bottom 5%) player on this patch in regards to his kills” is worth so much more than saying he “gets 7.48 Kills/10”. A person not dealing with those numbers on a daily basis, and even those who do, will not be able to do anything with 7.48. Is it good, is it bad? Who knows.
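A rough sketch of turning a raw number into such a phrase is shown below. The comparison values are invented, the cut-offs are one possible choice, and “above average” is approximated here as “better than more than half of the comparison group”, which is an assumption.

```python
def percentile_rank(value, population):
    """Share of the comparison group that is strictly below `value`."""
    below = sum(1 for v in population if v < value)
    return below / len(population)


def describe(value, population):
    """Relative phrase for a stat where higher is better (e.g. K/10)."""
    p = percentile_rank(value, population)
    if p >= 0.90:
        return "top 10%"
    if p < 0.05:
        return "bottom 5%"
    if p > 0.5:
        return "above average"
    if p < 0.5:
        return "below average"
    return "average"


# Invented K/10 values of a comparison group.
comparison = [5.1, 5.6, 6.0, 6.3, 6.7, 6.9, 7.1, 7.3, 7.6, 8.2]
print(describe(7.48, comparison))
```

For a stat where lower is better, such as deaths, you would flip the comparison. The wider the comparison group, the more the phrase actually means.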

When I try to evaluate players I always try to evaluate them in regards to others in the recent past (normally I choose something like the last 3 months, sometimes even less). There is even a page on Winston’s Lab that already does all of this for you. It is the one I used for all the screenshots above, called “Compare and Rank Players”. Here is how it works: Filter 1 is for the players you want to look at; select an event, a team or even a certain player. Filter 2 is the data you want to compare it to. Here I normally just put in a time constraint, such that the data only consists of the last 3 months. If you want to go a step further you could do that and also only include tournaments that you think are top tier, for example.

To further limit what data you will see, you can put in a time criterion and select heroes in “Options” on the right. It will then color code the players’ stats in regards to where they stack up. The CFP column tells you how many players met the criteria given by the options and filter 2, i.e. how many players were used to create the color coding.

Now, sometimes if you look at data you will find out that all the players you take a look at (selected through filter 1) are very average when compared to a broader data set (selected by filter 2). Then you should, most of the time, not make an effort to pick out the best average player and talk of him as good. In that situation just stating that there was no one that was really outstanding already is a fair summary. Sometimes you look at players that are mostly bad, but one of them is an average player. Then this player is not suddenly super good because he’s so outstanding in relation to the bad players, he is still average.

I think you get the gist of it. What I think sums this up fairly well is: Try to use phrases like “above average” and “top 10%” instead of the pure numbers, because people will understand them more easily. And don’t limit your data set too much; you want to know how good players are in a broader sense. Building a team that is better than most tryouts but still only average will not be very satisfying in the long term.

Using statistics as negative evidence

Something I kind of alluded to earlier is that statistics are great if used as negative evidence. Don’t be afraid to look for numbers that contradict your theory. If you only gather stats that support it you are only looking for evidence, but it doesn’t matter how much evidence you have, you can never be 100% sure that something is the case. The additional value one piece of evidence gives you when you already have 10 pieces of evidence, is far less than the value you get from finding a piece of negative evidence.

Control-Stats are basically the stats you want to use for this. If you really cannot find anything that contradicts your theory, that’s good, and if you find something, that doesn’t necessarily mean that you are outright wrong; it just adds perspective. If you want to see an example of how to use stats as negative evidence, go back to the “Example: Winstons” section and look at the second to last paragraph.

Don’t give in to your inner confirmation bias, look at things scientifically.

Conclusion

First of all, apologies: I think this whole document was a bit messy and could have been structured a lot better.

Again, I want to make this as clear as possible: don’t take stats as the ultimate evaluation method! That ultimate method is still using your brain. To make stats useful at all you have to always look at the context. Not taking that into consideration only makes you a fool.

Sometimes comparisons of stats, or statistics on their own, are completely unreasonable to consider/present. If that is the case, don’t even try using numbers, just state your opinion on things. If you used numbers to say “xyz might be the case”, people would only assume that you actually strongly believe it to be the case and that those stats are strong evidence for it. I see what you are trying to do there, but it’s really better to not use them at all.

So if you have come to this point of the article and actually read every single word and thought about them (thinking about it is really the important part), then good for you and thanks! It tells me that you are dedicated to learning about those things, and if you were disciplined enough to read all of this then I am sure you will be able to become better informed and make fewer mistakes over time. Because remember, it is not about becoming perfect, as this article was not perfect; it is about making fewer mistakes.