In Part I, we talked about the criteria we wanted to satisfy to ensure that a metric was good, and briefly assessed the results of our beta test of the new version of TMI. The conclusion I came to after that testing was that, in short, it needed more work.
I don’t know that it’s entirely true to say that I went “back to the drawing board,” so much as I went back to my slew of equations and mulled over what I could tweak in them to fix the problems. To recap, the formula I was using was:
$$\large {\rm Beta\_TMI} = c_1 \ln \left [ 1 + \frac{c_2}{N} \sum_{i=1}^N e^{F(MA_i-1)} \right ],$$
with $F=10$, $c_1=500$ and $c_2=e^{10}$.
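For concreteness, here’s roughly what that computation looks like in code. This is just a sketch of the formula above (not the Simulationcraft source), and the health-normalized moving-average array it takes as input is hypothetical:

```python
import math

def beta_tmi(ma):
    """Beta_TMI = c1 * ln(1 + (c2/N) * sum(exp(F*(MA_i - 1)))),
    with F = 10, c1 = 500, c2 = e^10. `ma` is the array of
    health-normalized T-second moving averages."""
    F, c1, c2 = 10.0, 500.0, math.exp(10.0)
    n = len(ma)
    return c1 * math.log(1.0 + (c2 / n) * sum(math.exp(F * (m - 1.0)) for m in ma))
```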
One of the problems I was running into was a conflict between constraints. If you look back at the last blog post, you’ll see that constraint #6 was that the numbers had to stay reasonable. Mentally, I had converted this constraint to be “should have a fixed range of a few thousand,” possibly up to 10 or 20 thousand at a maximum. So I was rigidly trying to keep the score down around a few thousand.
But the obvious solution to the stat weight problem was to increase $c_1$, which increases the slope of the graph. That makes a small change in spike size a more significant change in TMI, and gives you larger stat weights. Multiply $c_1$ by ten, and your stat weights all get multiplied by 10. Seems simple enough.
Except that in the beta test, I got data with TMIs ranging from a few hundred to over 12 thousand. So if I multiply by ten, I’m looking at TMIs ranging from around a thousand to over 120 thousand, which is a much larger range. And a factor of ten still wouldn’t have fixed everything thanks to the “knee” in the graph, because if your TMI was on the really low end you could still get garbage stat weights.
It felt like the two constraints were at odds with one another, and both were at odds with a third, somewhat self-imposed constraint: I wanted to keep the zero-bounding effect that the “1+” in the brackets produced. Because without that, the score could go negative, which is odd. After all, what does it mean when your arbitrary FICO-like metric goes negative? Which just led back to more fussing over the fact that I was still pretty light on “meaning” in this metric to begin with.
It was a conversation with a colleague that led me to the solution. While discussing the stat weight issues, and how I could tweak the equation to fix them, he mentioned that he would rather have a metric with large numbers that had an obvious meaning than a nicely-constrained metric that didn’t. We were talking in terms of percentages of health, and it was only at that point that the answer hit me. Within a day of that conversation, I made all of the changes I needed to give TMI a meaning.
Asking The Right Question
As is often the case, the answer had been staring me in the face the entire time. I’ve been looking at this graph (in various different incarnations, with various different constants) for the last few months:

Simulated TMI data using the Beta_TMI formula. Red is the uniform damage case, blue is the single-spike case, and green is pseudo-random combat data.
What that conversation led me to realize was that I was asking the wrong question. I was trying to figure out what combination of constants I needed to keep the numbers “reasonable.” But my definition of “reasonable” was vague and arbitrary. So it’s no surprise that what I was getting out was also… vague and arbitrary.
What I should have been doing was trying to come up with a score that does a better job of communicating to the user how big those spikes were. Because that, by definition, would be “reasonable” no matter what size the numbers were.
In other words, the question I should have been asking was “how can I tweak this equation so that the number it spits out has a simple and intuitive relationship to the spike size, expressed in a scale that the user can not only easily understand, but easily remember?”
And the answer, which was clear after that conversation, was to use percent health.
To illustrate, let’s flip that graph around its diagonal, so that instead of plotting TMI vs. $MA_{\rm max}$, we’re plotting $MA_{\rm max}$ vs. TMI.
At a given TMI value, the $MA_{\rm max}$ values we get from the random combat simulation always fall below the blue single-spike line. In other words, at a TMI of X, you can confidently say that the maximum spike you will take is of size Y. It could be smaller, of course – you could take a few spikes that are a little smaller than Y and get the same score. But you can be absolutely sure it isn’t above Y.
So we just need to find a way to make the relationship between X and Y obvious, such that someone can look at a TMI of e.g. 20k and immediately know how large of a damage spike that is, as a percentage of their health.
We could use a one-to-one relationship, such that a TMI of 100 meant you were taking spikes that were 100% of your health. That would correspond to a slope of 100, or a $c_1$ of 10. But that would give us even smaller stat weights, which is a problem. We could literally end up with a plot in Simulationcraft where every single one of your stat weights was 0.00.
It would be nice to keep using factors of ten. Bumping it up to a slope of 1000 doesn’t work. That’s a $c_1$ of 100, which is still smaller than what we used in Beta_TMI. A slope of 10000, or a $c_1$ of 1000, is only a factor of two improvement over Beta_TMI, so our stat weights will still be sloppy.
But a slope of 100k… that might just work. A TMI of 100k would mean that your maximum spikes were around 100% of your health. If your TMI went up to 120k, you’d immediately know that the spikes are now about 120% of your health. Easy. Intuitive. Now we’re getting somewhere. The stat weights would also be 20x as large as they were for Beta_TMI, ensuring that we would get good unnormalized weights even with two decimal places of precision.
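To make the slope arithmetic explicit: in the single-spike picture, the score grows linearly with spike size at a rate of $c_1 F$, so a slope of 100k means $c_1 = 10^4$, and (ignoring the constant offset, which we’ll pin down below)

$$\large {\rm TMI} \approx c_1 F \, MA_{\rm max} = 10^4 \cdot 10 \cdot MA_{\rm max} = 10^5 \, MA_{\rm max},$$

putting a spike equal to 100% of your health right at 100k TMI, with each additional percent of health worth another 1k.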
So, assuming we’re happy with that, it locks down our $c_1$ at $10^4$, so that every percentage of health corresponds to 1k TMI. Now we just have to look at the formula and figure out what else, if anything, needs to be changed.
Narrowing the Field
The very first thing I did after coming to this realization is toss out the “1+” in the formula. While I liked zero-bounding when we were treating this metric like a FICO score, it suddenly has no relevance if the metric has a distinct and clear meaning. Removing it allows for negative TMI values, but those negative values actually mean something now! If you end up with a TMI of -10k, it means that you were out-healing your damage intake by so much that the largest “spike” you ever took was smaller than your incoming healing in that time window. It also tells you exactly how much smaller: 10% of your health. While it’s not a situation we’ll run into that often, I suspect, it actually has meaning. There’s no sense obscuring that information with zero-bounding.
Which just leaves the question of what to do with $c_2$. Let’s look at the equation after removing the “1+”:
$$\large {\rm TMI} = c_1 \ln \left [ \frac{c_2}{N} \sum_{i=1}^N e^{F(MA_i-1)} \right ] $$
If we make the single-spike approximation, i.e. that we can replace the sum with a single $e^{F(MA_{\rm max}-1)}$, we get:
$$\large \begin{align} {\rm TMI_{SS}} &= c_1 (\ln c_2 - \ln N) + c_1 F (MA_{\rm max} - 1) \\ &= c_1 F \, MA_{\rm max} + c_1 ( \ln c_2 - \ln N - F ) \end{align}$$
just as before. Now that we’ve removed the “1+” from the formula, the single-spike approximation isn’t limited to large spikes anymore, so this is valid for any value of $MA_{\rm max}$.
Remember that in our single-spike approximation, $c_2$ controlled the y-intercept of the plot. And now that this y-intercept isn’t being artificially modified by zero-bounding, it actually has some meaning. It’s the value of $MA_{\rm max}$ at which our TMI is zero.
And given our convention that X*1000 TMI is a spike that’s X% of our health, a TMI of zero should mean that we take spikes that are 0% of our health. In other words, this should happen at $MA_{\rm max}=0$. So we want our y-intercept to be zero, or
$$\large c_1 ( \ln c_2 - \ln N - F ) = 0 .$$
Since $c_1$ can’t be zero, there’s only one way to accomplish this: $c_2 = N e^F.$ I was already using $e^F$ for $c_2$ in Beta_TMI, so this wasn’t totally unexpected. In fact, I figured out quite a while ago that the choice of $e^F$ for $c_2$ was equivalent to simplifying the term inside the sum:
$$\large \frac{e^F}{N}\sum_{i=1}^N e^{F(MA_i-1)} = \frac{1}{N}\sum_{i=1}^N e^{F\cdot MA_i}.$$
Defining $c_2=Ne^F$ would also eliminate the $1/N$ factor in front of the sum. However, there’s a problem here: I don’t want to eliminate it. That $1/N$ is serving an important purpose: normalizing the metric for fight length. For example, let’s consider two simulations, one three minutes long and the other six minutes long. We’ll assume the boss is identical in both cases, so the magnitude and frequency of spikes are identical. In theory, the metric should give you nearly identical results for both, because the amount of danger is identical. A fight that’s twice as long should have roughly twice as many large spikes, but they’re spread over twice as much time.
But a longer fight will have more terms in the sum for a particular bin size, and a shorter fight will have fewer terms. So the sum will be approximately twice as large for the longer fight. The $1/N$ cancels that effect because $N$ would also be twice as large. If we get rid of that $1/N$, then the longer fight will seem significantly more dangerous than the shorter one. In other words, it would cause the metric to vary significantly with fight length, which isn’t good.
So I decided to define $c_2$ slightly differently. Rather than $Ne^F$, I chose to use $N_0e^F$, where $N_0$ is a default fight length. This means that we’re normalizing the fight length to $N_0$ rather than eliminating the dependence entirely, which should mean much smaller fluctuations in the metric across a large range of fight lengths. Since the default fight length in SimC is 450 seconds, that seemed like an obvious choice for $N_0$.
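To see the difference numerically before getting to the SimC results, here’s a quick sketch. This is my own toy example rather than anything in SimC, and the boss pattern (an 80%-of-health spike window every 30 seconds) is invented purely for illustration:

```python
import math

def tmi(ma, n0=None):
    """1e4 * ln( scale * sum(exp(10 * MA_i)) ), where scale is 1 for the
    unnormalized c2 = N*e^F choice and N0/N for the c2 = N0*e^F choice."""
    n = len(ma)
    scale = 1.0 if n0 is None else n0 / n
    return 1e4 * math.log(scale * sum(math.exp(10 * m) for m in ma))

def toy_fight(length_s):
    """Hypothetical boss: an 80%-of-health spike window every 30 seconds, 1 s bins."""
    ma = [0.0] * length_s
    for t in range(15, length_s, 30):
        ma[t] = 0.80
    return ma

for length in (180, 450, 600):
    ma = toy_fight(length)
    print(length, round(tmi(ma) / 1000, 1), round(tmi(ma, n0=450) / 1000, 1))
# The unnormalized score drifts upward with fight length (roughly 98k to 110k here),
# while the N0-normalized score stays essentially flat (about 107k).
```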
To illustrate that graphically, I fired up Visual Studio and coded the new metric into Simulationcraft, with and without the normalization. I then ran a character through for fight lengths ranging from 100s to 600s. Here are the results:

Comparison of normalized ($N_0/N$) and unnormalized versions of the TMI metric. Vertical axis is in thousands.
The difference is pretty clear. The version where $c_2=Ne^F$ varies from a little under 65k TMI to around 86k TMI. The normalized version where $c_2 = N_0e^F=450e^F$ varies much less, from about 80k to a little over 83k, with most of that variation happening for fights that are shorter than four minutes long (i.e. not that common). This version is stable enough that it should work well for combat log analysis sites, where we’d expect a wide variety of encounter lengths.
There was one final change I felt I should make, and it’s not to the formula per se but to the definition of $MA$. If you recall from the last post, we defined it as follows:
$$\large MA_i = \frac{T_0}{T}\sum_{j=1}^{T / dt} D_{i+j-1} / H.$$
This definition normalizes for two things: player health (by dividing by $H$) and window size (by multiplying by $T_0/T$). The latter is the part I wanted to change.
The reason we originally multiplied by $T_0/T$ was to allow the user to specify a shorter time window $T$ over which to calculate spikes, for example in cases where you were getting a large heal every 5 seconds, but were fighting a boss who could kill you in 3 or 4 seconds in between those heals. This normalization meant that it calculated the moving average over $T$-second intervals, but always scaled the total damage up to what it would be if that damage intake rate were sustained for $T_0$ seconds. Doing this kept the metric from varying significantly with window size, as we discussed last year.
But that particular normalization doesn’t make sense anymore now that the metric is representing a real quantity. If my TMI is a direct reflection of spike size, then I’d expect it to go up or down fairly significantly as I change the window size. If I take X damage in a 6-second time window, but only X/2 damage in a 3-second time window, then I want my TMI to drop by a factor of 2 when I drop the window size from 6 seconds to 3 seconds as well.
In other words, I want TMI to accurately reflect what percentage of my health I lose in the window I’m considering. If I want to analyze a 3-second window, then I want to know what percentage of my health the boss can take off in that 3 seconds, not how much he would take off if he had 6 seconds.
So we’re entirely eliminating the time-window normalization in the definition of $MA_i$. That seems to match people’s intuition for how the time-window control should work anyway (this topic has come up before, including in the comments of the Crowdsourcing TMI post), so it’s a win on multiple fronts.
Bringing it all Together
Now, we have all the pieces we need to construct a formal definition for TMI v2.0. I’ll update the TMI Standard Reference Document with the rigorous details, but since we’ve already discussed many of them, I’m only going to summarize it here. Assume we start with an array $D$ containing the damage we take in every time bin of size $dt$, and the player has health $H$.
The moving average array is now defined as
$$\large MA_i = \frac{1}{H}\sum_{j=1}^{T / dt} D_{i+j-1}.$$
In other words, it’s the array in which each element is the $T$-second moving sum of damage taken, normalized to player health $H$.
We then take this array and use it to calculate TMI as follows:
$$\large {\rm TMI} = 10^4 \ln \left [ \frac{N_0}{N}\sum_{i=1}^N e^{10 MA_i} \right ] ,$$
where $N$ is the length of the $MA$ array, or equivalently the fight length divided by $dt$, and $N_0=450/dt$ is the “default” array size corresponding to a fight length of 450 seconds.
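Putting the two definitions together, a minimal Python sketch of the whole calculation looks something like this. It’s my own illustration of the formulas above rather than the SimC implementation, and the function name and arguments are hypothetical:

```python
import math

def tmi_v2(damage, health, dt=1.0, window=6.0):
    """TMI v2.0 from a per-bin damage-taken array.

    damage : damage taken in each bin of width dt seconds
    health : the player's maximum health
    window : the spike window T, in seconds
    """
    w = int(round(window / dt))    # bins per window, T/dt
    n = len(damage)                # N, the length of the MA array
    n0 = 450.0 / dt                # N0, the bin count of the default 450 s fight
    # MA_i: T-second moving sum of damage, normalized to player health.
    # (Windows that run past the end of the fight are simply truncated here.)
    ma = [sum(damage[i:i + w]) / health for i in range(n)]
    return 1e4 * math.log((n0 / n) * sum(math.exp(10 * m) for m in ma))
```

Feeding it the same damage array with a smaller window should shrink the score accordingly, which is exactly the behavior we wanted from the change to $MA$ above.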
But Does It Work?
To illustrate how this works, let’s look at some examples using Simulationcraft. I coded the new formula into my local copy and ran some tests. Here are two reports, both against the T16H25 boss, using my own character and the T16H Protection Warrior profile:
The very first thing I looked at was the stat weights:
Much, much better. This was with 25k iterations, but even 10k iterations gave us reasonable (if noisy) stat weights. The error bars here are all pretty reasonable, and it wouldn’t be hard to increase the precision by bumping it up to 50k iterations if we wanted to. The warrior profile’s stat weights are similarly high-precision.
We could also look at the TMI distribution:
Again, much nicer looking than before. We’re still getting a bit of skew here, but that mostly has to do with being slightly overgeared for the boss definition. The warrior profile exhibits even stronger skew, but tests run with characters of lower gear levels (and thus higher average TMI values) show very little skew.
I also wanted to see exactly how well the TMI value reflected maximum spike size, and what (if any) difference there was. So you may have noticed that I’ve enhanced the tanking section of the SimC report a little bit by adding some new columns:
In short, SimC now also records the “Maximum Spike Damage,” or MSD, for each iteration and calculates the maximum, minimum, and mean MSD value. It reports this information in units of “percentage of player health” right alongside the DTPS and TMI information that you’re used to getting. Lest the multiple “max” modifiers be confusing: the MSD for one iteration is the biggest spike you take that iteration, and the “MSD Max” is the largest spike you take out of all iterations.
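For the curious, the bookkeeping behind those columns is straightforward. Here’s a sketch of what it could look like (not the actual SimC code; msd_summary and its argument are made up for illustration):

```python
from statistics import mean

def msd_summary(iterations):
    """Maximum Spike Damage stats across iterations.
    `iterations` is a list of health-normalized MA arrays, one per iteration."""
    per_iter = [100.0 * max(ma) for ma in iterations]    # MSD of each iteration, in % of health
    return min(per_iter), mean(per_iter), max(per_iter)  # MSD Min / MSD Mean / MSD Max
```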
You may be wondering, at this point, if this isn’t all superfluous. If I can code SimC to report the biggest spike, why wouldn’t we want to use that directly? What does TMI add that we can’t get from MSD?
The answer is continuity. MSD uses a max() function to isolate the absolute biggest spike in each iteration. Which is fine, but often misleading. For example, let’s consider two different tanks, one of which takes a single spike that’s 90% of their health, and another that takes one 90% spike and three or four 89% spikes. Assume nothing else in the encounter is remotely threatening them. Their MSD values will be identical, because it ignores all but the largest spike. But it’s clear that the second tank is in more danger, because he’s taking a large spike more frequently, and the TMI value will accurately reflect that.
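To put rough numbers on that example (assuming a default-length fight so that $N \approx N_0$, treating each spike as a single window, and neglecting the small contribution from all the quiet windows):

$$\large \begin{align} {\rm TMI_A} &\approx 10^4 \ln \left [ e^{10\cdot 0.90} \right ] = 90{\rm k}, \\ {\rm TMI_B} &\approx 10^4 \ln \left [ e^{9} + 3e^{8.9} \right ] \approx 10^4 \left ( 9 + \ln 3.7 \right ) \approx 103{\rm k}. \end{align}$$

Both tanks report an MSD of 90%, but the second tank’s TMI comes out roughly 13k higher, which is exactly the frequency information that the max() throws away.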
That continuity also translates into generating better and more reliable stat weights. A stat that reduces the frequency of 90% spikes without eliminating them would be given a garbage stat weight if we tried to scale over MSD, because MSD doesn’t retain any information about frequency. However, we know that stats like hit and expertise are strong partly because they reduce spike frequency. TMI reflects that accurately while MSD simply can’t.
MSD is still useful though, in that having both TMI and MSD gives us additional information about our spike patterns. It also gives us a convenient way to compare the two to see how TMI works.
First, take a look at the TMI Max and MSD Max values. You’ll notice they mimic each other pretty well: MSD Max is 150.3%, TMI Max is 151.7k. This makes sense for the extreme case because that’s when all the planets align to create your worst-case scenario, which is rare. It won’t happen multiple times per fight, so it’s a situation where you have one giant spike that dominates the score, much like our single-spike approximation. And in that approximation, TMI is roughly equal to the largest spike size, just like it should be.
Comparing the mean TMI value (just “TMI” on the table) to the MSD mean shows a little bit of a gap: MSD Mean is 69.5%, TMI mean is 82.8k. The TMI is about 13k above where you’d expect it to be based on the single-spike model. That’s because of spike frequency. You wouldn’t normally expect to take one giant spike in an encounter and nothing else; the more common case is to take several spikes of similar magnitude over that 450 seconds. If we’re taking 3-4 of those spikes, then that’s going to raise the TMI value a little bit compared to the situation where we only take one. That’s exactly what’s happening here.
Mathematically, if we take $n$ spikes of similar size instead of just one, we expect the TMI to be larger than the single-spike value by about $10^4\ln(n)$ (see the worked line below). In this simulation the TMI is about 13k higher than the MSD mean would suggest, so $\ln(n)\approx 1.3$, or $n\approx 3.8$. In other words, on average we’re taking roughly four spikes every 450 seconds, each of which is about 69.5% of our health. That’s pretty useful information – in fact, I may add it to the table in the future if people would like SimC to calculate it for them.
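More explicitly, if the sum is dominated by $n$ spikes of roughly equal size (again taking $N \approx N_0$ and neglecting the quiet windows), the logarithm just picks up an extra $\ln(n)$:

$$\large {\rm TMI}_n \approx 10^4 \ln \left [ n \, e^{10 MA_{\rm max}} \right ] = 10^5 \, MA_{\rm max} + 10^4 \ln n = {\rm TMI_{SS}} + 10^4 \ln n .$$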
You can see that the gap grows considerably for the minimum TMI and MSD values. The MSD Min is only about 31% while the minimum TMI is ~66k. Again, this comes down to frequency. Large spikes tend to be infrequent due to statistics, because they require failing to avoid several attacks in a row. But as we eliminate those (either by gearing, or in this case, by lucky RNG on one iteration) we’re left with smaller, more frequent spikes. In the extreme limit, you could imagine a scenario where you alternated between taking a full hit and avoiding every second attack, in which case you’d have loads of really tiny spikes. So what we’re seeing at this end of the distribution is that the low-TMI iterations are built out of many of these smaller spikes rather than one dominant one; plugging the numbers into the same relation suggests a few dozen of them.
This behavior also has a more subtle, but rather important meaning. TMI is really good at prioritizing large spikes and giving you stat weights that preferentially eliminate them. Once you eliminate those spikes, it automatically shifts to prioritizing the next-biggest spikes, and so on. If you smooth your damage intake sufficiently that you’re taking a lot of moderately-sized spikes, it naturally tries to reduce the frequency of those spikes. In other words, if you’ve successfully eliminated the danger of isolated spikes, it automatically starts optimizing you for DTPS. So it seamlessly fuses spike mitigation and DTPS into a metric that shifts the goalposts based on your biggest concern, as determined by the combat data.
A lot of those ideas can be seen graphically, as well. Here’s a plot showing data generated with my own character pitted against the T16H25 boss. We’re plotting MSD (which I was originally calling “Max Moving Average”) against the reported TMI score. To generate this plot, I used a variety of window sizes. At each window size, I recorded the minimum, mean, and maximum TMI and MSD values. The dotted line is the expected relationship, i.e. 100k TMI = 100% max health.
Generally speaking, as we increase or decrease the window size, the MSD and TMI should similarly increase or decrease. That’s certainly happening for the maximum MSD and TMI values, which should be expected. And in that limit, we see that TMI and MSD mostly agree and lie close to the dotted line.
However, the mean values show a much smaller spread, and the minimum values show almost no spread. It turns out that this is the fault of EF’s crazy scaling. A paladin in this level of gear is basically self-sufficient against the T16H25 boss, so changing the window size doesn’t have a large effect unless we consider the most extreme cases. If we’re out-healing the boss, then a longer window won’t cause a noticeable increase in damage intake or spike size. At the very low end, where the minimum TMI & MSD values show up, we’re basically plotting window-edge effects.
The results look a lot cleaner if we consider a player that’s undergeared for the boss (and of a class that doesn’t have a strong self-healing mechanic, like a warrior):
This is one of the warriors who submitted multiple data sets for the beta test. He’s got an average ilvl of 517, which is well below what would be needed to comfortably survive the 25H boss. As a result, his TMI values are fairly high, with even the smallest values being over 200k. As you can see, though, all of the values cluster nicely around the equivalence line, meaning that the TMI value is a very good representation of his expected spike size. Also note that the colors are more evenly distributed on this plot. That’s because the window size adjustment is working properly here. The lowest values are from simulations with a window size of 2 seconds, while the largest ones are using a window size of 10 seconds. And the data is pretty linear: double the window size, and you double the MSD and TMI.
Report Card
So this final version of the metric seems to be hitting all the right notes. Let’s get our checklist out and grade it on each of the criteria we set out to satisfy.
- Accurately representing danger: Pass. There’s really no difference between this version and the beta version in this category. If anything, this may be a bit better since it no longer has the “knee” obfuscating danger for smaller spikes.
- Work seamlessly: Pass. Apart from coding the metric into SimC, it took no additional tweaks to get it to work properly with the default plotting and analysis tools.
- Generate useful stat weights: Pass. The stat weights are being generated properly and to sufficient precision to identify differences between the stats, without having to normalize. It will generate useful stat weights even in low-damage regimes thanks to the removal of the “knee,” and it automatically adapts to generate DTPS-like results when you’ve done all you can for smoothing. Massive improvement in this category.
- Useful statistics: Pass. Again, not much difference between this version and Beta_TMI, at least in this category.
- Easily interpreted: Pass. This is the most important improvement. If I get a TMI score of 80k, I immediately know that I’m in danger of taking spikes that are up to 80% of my health. I don’t need to do any mental math to figure it out, just replace a “k” with a “%” and I’m there. No need to look back to a blog post or remember a funny conversion factor. As long as I know what TMI is, I know what it means.
- Numbers should be reasonable: Pass. While the numbers aren’t technically small, I think it’s fair to say that they’re reasonable. After Mists, everyone is comfortable working in thousands (“I do 400k DPS and have 500k health”), so I don’t think the nomenclature will be confusing. The biggest issue with the original TMI was that it varied wildly by orders of magnitude due to small changes, which can’t happen in this new form. Going from 75k to 125k has a clear and obvious meaning, and won’t throw anyone for a loop, unlike going from 75k to 18.3M (an equivalent change in Old_TMI).
I’ll admit that I may be a little biased when it comes to grading my own metric, but I don’t think you can argue that I’m being unfairly kind in any of these categories. I set up clear expectations for what I wanted in each category, and made sure the metric met them. If it hadn’t, you probably wouldn’t be reading about it, because I’d have tossed it like Beta_TMI and continued working on it until I found a version that did.
But keep in mind that this doesn’t mean the metric is flawless. It just means that we haven’t discovered what (if any) its flaws are yet. As the logging sites get on-board with the new metric and implement it, we’ll be able to look for differences between real-world performance and Simulationcraft results and identify the causes. And if we do find problems, we’ll adjust it as necessary to fix them.
Looking Forward
It shouldn’t be much of a surprise that I’m very happy with TMI 2.0. It finally has a solid meaning, and will be far simpler to explain to players discovering it for the first time. It’s a vast improvement over the original version of the metric in so many ways that it’s hard to even compare the two.
And by giving the metric a clear meaning, we’ve opened up a number of new possible applications. For example, let’s say you sim your character and get a TMI of 85k. You and your healers now know they need to be prepared for you to take a spike that’s around 85% of your health at any given moment. Which leads directly into the question, “how much healing do I need to ensure survival?”
If your healer is a druid, you might consider how many Rejuvenation ticks you can rely on in a 6-second window and how much healing that will be. If it’s 20% of your health, then you (and your healer!) immediately have an estimate of how much on-demand healer throughput you’ll need to keep you safe. Or if you have multiple HoTs, and they sum up to about 50% of your health in that time window, your healers know that as long as they keep you HoT-ted up, they can spend their GCDs elsewhere and just spot-heal you when you hit 50% health.
In other words, TMI may be a tanking metric, but it’s got the potential to have a meaning for (and be useful to) your healers as well.
Extend this idea even further: TMI was originally defined as only including self-healing effects, not external heals. The new definition can be much looser, because it still has a meaning if you include external heals. Adding a healer to your simulation may reduce your TMI, but the end result is still meaningful because it tells you how large a spike you took with a healer focusing on you.
Likewise, a combat logging site might report your regular TMI and an “ETMI” or Effective TMI, which includes outside healing. And that ETMI would tell you something slightly different – what was the biggest spike you took and survived (or not!) on that pull. If your ETMI is less than 50k you’re never really in much danger. If your ETMI is pushing 90k or 100k (and you didn’t die), it means you’re getting awfully close to dying at least a few times in that encounter, which may warrant some investigation. You could then analyze your own logs and your healers’ logs to figure out why that’s happening and determine ways to improve it.
I’m really excited to see where this goes over the next few months. For now, though, I’m going to focus on getting the foundations in place. I’ve already coded the new metric into Simulationcraft, so as of the next release (547-3) all TMI calculations will use the new formula.
I also plan on working with both WarcraftLogs and AskMrRobot, both of whom have expressed an interest in implementing TMI, to get it up and running on their logging sites. And I’ll be updating the standard reference document shortly with a rigorous definition of the standard to facilitate that.