Ultimate Kill Power Ratings in Overwatch (Escort/Hybrid Maps)

April 09th


Tactical Visor is the Most Efficient Ultimate

Edited by Chase Wassenar @RedShirtKing

Thanks to Winston’s Lab, I’ve had the opportunity to break down teamfight data from professional Overwatch matches in an effort to quantify exactly how effective each ultimate is. For this analysis, I examined data from the Winston’s Lab database (4,733 “fights”) on escort and hybrid maps from December 1st, 2016 to March 7th, 2017 (Season 3 and Season 4). Ultimates vary in power by game mode, so separating the data by game type is necessary. Only ultimates used at least 280 times were included to ensure a meaningful sample size. The cutoff of 280 was chosen because it corresponds to a 95% confidence interval roughly 1 kill wide. Sadly, there was not enough data to accurately estimate ultimate ratings for many off-meta picks. Eat your heart out, Junkrat mains.

My first analysis used linear regression to predict the kill difference (Attack Kills - Defense Kills) at the end of a fight based on each ultimate used. Here are the initial results:

| Ultimate User | Kill Rating | Confidence Interval (95%) | Sample Size |
|---|---|---|---|
| Attack Soldier | 1.53 | (1.10 to 1.97) | 282 |
| Attack Reinhardt | 1.51 | (1.24 to 1.78) | 1006 |
| Attack Genji | 1.42 | (1.01 to 1.84) | 327 |
| Defense Zarya | 1.42 | (1.09 to 1.75) | 632 |
| Attack Zarya | 1.35 | (1.03 to 1.67) | 601 |
| Attack Lucio | 1.32 | (1.03 to 1.61) | 786 |
| Attack Ana | 1.27 | (1.02 to 1.52) | 1082 |
| Attack Roadhog | 1.06 | (0.73 to 1.39) | 552 |
| Attack Tracer | 1.02 | (0.69 to 1.36) | 493 |
| Defense Reinhardt | -1.01 | (0.76 to 1.26) | 1099 |
| Defense Ana | -1.01 | (0.77 to 1.25) | 1322 |
| Defense Soldier | -0.98 | (0.65 to 1.31) | 541 |
| Defense Lucio | -0.91 | (0.61 to 1.22) | 689 |
| Defense Roadhog | -0.38 | (0.06 to 0.71) | 557 |
| Defense Tracer | -0.17 | (-0.22 to 0.57) | 321 |

Figure 1

Ultimates from the attacking side are overvalued across the board using linear regression. Earthshatter doesn’t get 50% better just from being on attack. To go from this initial analysis to the final one, we need to determine why attack ultimates are overvalued. Let’s take a look at how the kill difference distributions change depending on which team wins the fight:


Figure 2

The analysis reveals that when attacking teams win fights on hybrid and escort maps, they have a larger kill difference on average than defending teams do when they win (4.03 versus 2.89). This makes sense: when the defense gets a single pick or more, the attacking team is incentivized to fall back, regroup, and not allow the defending team to charge their ultimates. In contrast, when the attacking team gets a man advantage, the defending team needs to slow the progress of the payload and does so by throwing their bodies at the payload and stalling for as long as possible. The desire to stop the payload leads to more deaths per lost fight on defense, and this asymmetry explains why the initial linear regression coefficients above are so attack biased.

Before moving any further, we need to examine “neutral” fights where both teams use zero ultimates to measure and account for any inherent imbalance between attack and defense. In this context, the team that gets more kills is considered the fight winner, and a fight where both teams get the same number of kills is a tie. We can then contrast this baseline with how each side’s mean win rate changes as ultimates are used:


Figure 3

Clearly, there’s an imbalance here: defending teams win many more neutral fights. Fortunately for attacking teams, the balance of power begins to shift in their favor as more ultimates are used by both teams. The marginal value graph shows it doesn’t make sense to use three more ultimates than the other team, as the marginal value is diminished in those cases. Ideally, attacking teams should seek to use one or two more ultimates than the defending team to maximize value, while defending teams should seek to use exactly one more ultimate than attacking teams and two more if necessary.

However, the above analysis assumes all ultimates are worth the same amount, which isn’t accurate. Now, I will use logistic regression to determine how ultimate usage affects each team’s likelihood of winning a teamfight. This method gives us a coefficient for each ultimate, yielding a formula that predicts how often attack or defense will win based on the ultimates used. This formula correctly predicts 68.2% of fights, ignoring ties and fights where zero ultimates are used. Interestingly, there is a statistically significant opportunity cost associated with using zero ultimates in any given fight, and it is larger for the attacking team. This is likely due to the defense typically having a superior position before the fight begins.
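As a rough illustration of how such a formula turns ultimate usage into a prediction, here is a minimal sketch of a logistic model. The intercept and coefficients below are hypothetical placeholders chosen only to make the example run; the fitted values are not published in this article.

```python
import math

# HYPOTHETICAL values for illustration only; these are not the fitted
# coefficients from the article. Positive values favor the attacking team.
INTERCEPT = -0.25
COEFFICIENTS = {
    ("attack", "Earthshatter"): 0.60,
    ("defense", "Earthshatter"): -0.50,
    ("attack", "Dragonblade"): 0.55,
}


def attack_win_probability(ultimates_used):
    """Modeled probability that the attacking team wins the fight.

    `ultimates_used` is a list of (side, ultimate) pairs used in the fight.
    """
    log_odds = INTERCEPT + sum(COEFFICIENTS[u] for u in ultimates_used)
    return 1.0 / (1.0 + math.exp(-log_odds))


def predicted_winner(ultimates_used):
    """Predict the side whose modeled win probability exceeds 50%."""
    return "attack" if attack_win_probability(ultimates_used) > 0.5 else "defense"


fight = [("attack", "Earthshatter"), ("defense", "Earthshatter")]
print(predicted_winner(fight), round(attack_win_probability(fight), 3))
```

Each ultimate shifts the log-odds of an attack win up or down, so the same ultimate is worth a different number of percentage points depending on what else is used in the fight.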

To remove opportunity cost from the calculation, this analysis compares fights with two ultimates versus one rather than one versus zero. For example, to measure Dragonblade’s worth, I found the attacking team’s expected win percentage when using Dragonblade and Earthshatter against a defending team using just Earthshatter (57.7%). Then I subtracted the expected attack win percentage in a fight where both teams use just Earthshatter (43.5%) to find the value of Dragonblade (14.2%). The big five ultimates (Tactical Visor, Earthshatter, Graviton Surge, Nanoboost, and Sound Barrier) were used as references because they have the largest sample sizes, and the differences were averaged. To figure out how many kills each ultimate is worth, I used linear regression to find the slope of a line of best fit relating expected attack win% to actual net kills (12.8% is worth 1 net kill).
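The Dragonblade arithmetic above can be sketched in a few lines. This uses only the single Earthshatter reference quoted in the text (the article averages over five reference ultimates), and the function names are my own.

```python
WIN_PCT_PER_NET_KILL = 12.8  # regression slope quoted in the article


def win_pct_diff(win_pct_with_ultimate, win_pct_reference):
    """Attack win% gained by adding the ultimate to a reference fight."""
    return win_pct_with_ultimate - win_pct_reference


def net_kill_rating(diff_in_win_pct):
    """Convert a win% difference into net kills using the quoted slope."""
    return diff_in_win_pct / WIN_PCT_PER_NET_KILL


# Dragonblade + Earthshatter vs. Earthshatter (57.7%), against the
# Earthshatter vs. Earthshatter reference fight (43.5%).
dragonblade_diff = win_pct_diff(57.7, 43.5)
dragonblade_rating = net_kill_rating(dragonblade_diff)
print(round(dragonblade_diff, 1), round(dragonblade_rating, 2))  # 14.2 1.11
```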

| Attack | Attack Win% Diff | Net Kill Rating | Confidence Interval (95%) | Sample Size |
|---|---|---|---|---|
| Dragonblade | 14.2% | 1.11 | (0.65 to 1.57) | 325 |
| Tactical Visor | 16.9% | 1.32 | (0.82 to 1.82) | 282 |
| Pulse Bomb | 5.3% | 0.41 | (0.04 to 0.78) | 481 |
| Earthshatter | 14.2% | 1.11 | (0.91 to 1.31) | 995 |
| Whole Hog | 10.3% | 0.80 | (0.46 to 1.14) | 547 |
| Graviton Surge | 19.7% | 1.54 | (1.22 to 1.86) | 599 |
| Nanoboost | 14.2% | 1.11 | (0.92 to 1.30) | 1080 |
| Sound Barrier | 15.7% | 1.22 | (0.96 to 1.48) | 785 |

| Defense | Attack Win% Diff | Net Kill Rating | Confidence Interval (95%) | Sample Size |
|---|---|---|---|---|
| Tactical Visor | -15.9% | -1.24 | (-1.59 to -1.24) | 540 |
| Earthshatter | -11.6% | -0.90 | (-1.08 to -0.72) | 1086 |
| Graviton Surge | -18.0% | -1.40 | (-1.71 to -1.09) | 631 |
| Nanoboost | -13.0% | -1.01 | (-1.15 to -0.87) | 1318 |
| Sound Barrier | -12.6% | -0.98 | (-1.27 to -0.69) | 687 |

Figure 4

There’s still some difference between attack and defense, but the values produced this way are much more reasonable. I will now take a weighted average to estimate how much each ultimate is worth on a per-use basis.

| Ultimate | Net Kill Rating | Winrate Added | New Attack Winrate | Relative Winrate |
|---|---|---|---|---|
| Graviton Surge | +1.47 | 18.8% | 63.6% | +42.0% |
| Tactical Visor | +1.27 | 16.2% | 61.1% | +36.3% |
| Sound Barrier | +1.11 | 14.2% | 59.0% | +31.7% |
| Dragonblade* | +1.11 | 14.2% | 59.0% | +31.7% |
| Nanoboost | +1.06 | 13.6% | 58.4% | +30.3% |
| Earthshatter | +1.00 | 12.8% | 57.7% | +28.7% |
| Whole Hog* | +0.80 | 10.3% | 55.1% | +22.9% |
| Pulse Bomb* | +0.41 | 5.3% | 50.1% | +11.7% |
| Even Fight | +0.00 | | 44.8% | |

Figure 5

*Contains data from only attacking teams. Despite a large enough sample, Whole Hog and Pulse Bomb from defending teams were found to have too high a p-value to include.
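The weighting scheme is not spelled out in the text. A minimal sketch, assuming each side's rating is weighted by its Figure 4 sample size and that defense ratings count by magnitude, comes out at the Figure 5 values for the ultimates checked here:

```python
def weighted_net_kill_rating(attack_rating, attack_n, defense_rating, defense_n):
    """Sample-size-weighted average of attack and defense net kill ratings.

    Assumption: weights are the Figure 4 sample sizes. Defense ratings are
    negative in Figure 4 (they lower the attack win rate), so the magnitude
    is used.
    """
    total = attack_rating * attack_n + abs(defense_rating) * defense_n
    return total / (attack_n + defense_n)


# Inputs copied from Figure 4.
print(round(weighted_net_kill_rating(1.54, 599, -1.40, 631), 2))   # Graviton Surge: 1.47
print(round(weighted_net_kill_rating(1.32, 282, -1.24, 540), 2))   # Tactical Visor: 1.27
print(round(weighted_net_kill_rating(1.11, 1080, -1.01, 1318), 2)) # Nanoboost: 1.06
```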

“Net Kill Rating” measures the average impact that using an ultimate has on the kill difference between the two teams at the end of the fight. Sorry Flame, but it looks like Earthshatter isn’t the best ultimate in the game after all. “Winrate Added” is the absolute increase, in percentage points, that an ultimate adds to its team’s chance of winning a team fight. For example, using Graviton Surge increases the chances of Zarya’s team winning by 18.8 percentage points on average (from 44.8% to 63.6%). “Relative Winrate” takes the new 63.6% and divides it by the original 44.8% to show that Zarya’s ultimate will allow you to win a fight 42% more often than fighting without it.
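To make the difference between the absolute and relative figures concrete, here is a small sketch of the Graviton Surge example. The function names are illustrative, and the 44.8% baseline is the "Even Fight" attack win rate from Figure 5.

```python
BASELINE_ATTACK_WINRATE = 44.8  # "Even Fight" attack win rate, in percent


def new_attack_winrate(winrate_added):
    """Baseline win rate plus the percentage points the ultimate adds."""
    return BASELINE_ATTACK_WINRATE + winrate_added


def relative_winrate_gain(winrate_added):
    """Percent increase in win rate relative to the baseline."""
    ratio = new_attack_winrate(winrate_added) / BASELINE_ATTACK_WINRATE
    return (ratio - 1.0) * 100.0


# Graviton Surge adds 18.8 percentage points.
print(round(new_attack_winrate(18.8), 1))     # 63.6
print(round(relative_winrate_gain(18.8), 1))  # 42.0
```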

It’s important to note that context does change an ultimate’s value, as noted in Figure 3. The net kill ratings in Figure 5 maintain a high accuracy when both teams use at least one ultimate, but do not apply to fights where one team saves all their ultimates.

Looking at numbers like these can give us insight into how freely players should use certain ultimates. Given Earthshatter’s (+1.00) rating, using it to secure a solo kill is an average use of the ability, and using Graviton Surge to secure a double kill is worth it despite how long it takes to charge.

Zarya’s ultimate being the best on a per-use basis is not surprising given the number of team wipes it creates. There’s still one big difference that Figure 5 doesn’t take into account: ultimates have different charge rates. To account for charge rates, I calculated the average time between ultimate uses (average time to charge + average time held) using a weighted average by game type (our data is 62.1% hybrid and 37.9% escort). I divided the average length of a single attacking phase (393s) by this time between uses to determine how many ultimates each hero gets on average in half of a hybrid/escort map. Then we can multiply this by each ultimate’s net kill rating to calculate how many net kills each ultimate creates on average over a single attacking phase.

| Ultimate | Time between Uses (s) | Ults Per Phase | Net Kills Per Ult | Net Kills Per Phase | Tier List |
|---|---|---|---|---|---|
| Tactical Visor | 125 | 3.13 | +1.27 | +3.97 | A |
| Graviton Surge | 149 | 2.64 | +1.47 | +3.87 | A |
| Nanoboost | 114 | 3.46 | +1.06 | +3.66 | A |
| Dragonblade | 124 | 3.16 | +1.11 | +3.51 | A |
| Earthshatter | 132 | 2.99 | +1.00 | +2.99 | B |
| Sound Barrier | 162 | 2.43 | +1.11 | +2.70 | B |
| Whole Hog | 149 | 2.63 | +0.80 | +2.11 | C |
| Pulse Bomb | 91 | 4.30 | +0.41 | +1.76 | C |

Figure 6
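The per-phase calculation can be sketched as follows, using the rounded times and ratings published in Figure 6. Because those inputs are rounded, the results land close to the table's values but do not always match them to the second decimal.

```python
ATTACK_PHASE_SECONDS = 393  # average attacking phase length from the article


def ults_per_phase(seconds_between_uses):
    """Average number of ultimate uses in one attacking phase."""
    return ATTACK_PHASE_SECONDS / seconds_between_uses


def net_kills_per_phase(seconds_between_uses, net_kills_per_ult):
    """Ultimate uses per phase times the net kills each use is worth."""
    return ults_per_phase(seconds_between_uses) * net_kills_per_ult


# (time between uses in seconds, net kills per ult), copied from Figure 6.
figure_6_inputs = {
    "Tactical Visor": (125, 1.27),
    "Graviton Surge": (149, 1.47),
    "Pulse Bomb": (91, 0.41),
}

for name, (seconds, rating) in figure_6_inputs.items():
    uses = ults_per_phase(seconds)
    kills = net_kills_per_phase(seconds, rating)
    print(f"{name}: {uses:.2f} ults per phase, {kills:+.2f} net kills per phase")
```

Pulse Bomb shows the trade-off clearly: it charges fastest of the eight, yet its low per-use rating still leaves it last in net kills per phase.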

Interestingly, Soldier: 76’s Tactical Visor is statistically the most efficient ultimate over the course of a map. Evidently even pros benefit from being able to say, “Look Ma, no aim!”

I created a tier list for ultimates using this metric because everyone loves tier lists: A Tier (+3.50 to +4.50), B Tier (+2.50 to +3.50), and C Tier (+1.50 to +2.50). More ultimates will be added to the list as more matches are played and more data is made available. Winston, D.Va, Pharah, and Zenyatta should be added next. In my next article, I will repeat this analysis for Control maps and attempt to separate out the data from Season 4 if sample size permits.
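For reference, the tier cutoffs can be written as a small lookup. The ranges as stated share their endpoints, so this sketch assumes a value sitting exactly on a boundary goes to the higher tier; that tie-break is my assumption, not something the article specifies.

```python
def tier(net_kills_per_phase):
    """Map a net-kills-per-phase value to the article's tier letters.

    Assumption: a value exactly on a shared boundary (e.g. +3.50) is
    placed in the higher tier. Values outside all ranges return None.
    """
    if 3.50 <= net_kills_per_phase <= 4.50:
        return "A"
    if 2.50 <= net_kills_per_phase < 3.50:
        return "B"
    if 1.50 <= net_kills_per_phase < 2.50:
        return "C"
    return None


print(tier(3.97), tier(2.99), tier(1.76))  # A B C
```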

Disclaimer: Many Lucios were killed in the making of this article.

Steve Cafmeyer

Steve Cafmeyer is currently head coach for Team Reakt Overwatch and has done analysis work for Team Reunited in the past. He founded his own website, LCSPredict.com, dedicated to predicting match outcomes and providing power ratings for professional League of Legends. He enjoys spectating quality Overwatch and analyzing data in order to create competitive advantages. His peak SR is 3950. One day he will close Excel long enough to get GM.