Data and Dice

Cartwright · July 29, 2025, 2:46pm

To build out list analysis, I’m attempting to create a general set of rules that categorize armies into archetypes. I previously used k-means clustering, but that approach isn’t very intuitive—and it’s hard to explain why a list ends up in a certain group. So I’m switching to a rules-based system instead.

Here’s the idea. For each tournament, I scale every list to a 2300-point baseline so they’re easier to compare. Then I look at a few key stats: expected melee damage, ranged attacks, average speed, total units, unit strength, and a few others. Based on how those numbers stack up—especially compared to percentile cutoffs—I assign one of several archetypes:

Current Archetype Rules

Alpha Strike: Fast and dangerous. Either above the 75th percentile both for speed and expected damage, or above the 90th percentile of speed.
Gun Line: Above 50 ranged attacks.
Trash: Swarms of cheap units: either high on both unit count (16) and unit strength (US, 27), or extreme on just one of them (17 or 28, respectively).
Grind: Defensive lists with low offense that take a ton of shots to remove (shots to six nerve above 395).
Mixed Arms (Shambling-heavy): Moderate shooting (at least 19 shots) and at least two Shambling units
Mixed Arms: Moderate shooting (at least 19 shots) but more flexible overall.
Balanced: Anything that doesn’t fit neatly above.
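
For anyone who wants to poke at the rules, here is a minimal sketch of how they could be written as a first-match-wins function. This is not the code from my script. The field names, the `Cutoffs` container, the rule order, and the choice of inclusive comparisons for the Trash numbers are all assumptions made for illustration, and the "low offense" part of Grind is left out because no cutoff is given for it.

```python
from dataclasses import dataclass


@dataclass
class ListStats:
    """Per-list stats, assumed already scaled to 2300 points (hypothetical field names)."""
    speed: float
    expected_damage: float
    ranged_attacks: float
    unit_count: float
    unit_strength: float
    shots_to_six_nerve: float
    shambling_units: int


@dataclass
class Cutoffs:
    """Percentile cutoffs computed elsewhere from the dataset (hypothetical names)."""
    speed_p75: float
    speed_p90: float
    damage_p75: float


def classify(s: ListStats, c: Cutoffs) -> str:
    # Assumption: rules are checked in the listed order and the first match wins.
    fast_and_dangerous = s.speed > c.speed_p75 and s.expected_damage > c.damage_p75
    if fast_and_dangerous or s.speed > c.speed_p90:
        return "Alpha Strike"
    if s.ranged_attacks > 50:
        return "Gun Line"
    # Assumption: the Trash numbers are inclusive minimums.
    if (s.unit_count >= 16 and s.unit_strength >= 27) \
            or s.unit_count >= 17 or s.unit_strength >= 28:
        return "Trash"
    if s.shots_to_six_nerve > 395:
        return "Grind"
    if s.ranged_attacks >= 19:
        if s.shambling_units >= 2:
            return "Mixed Arms (Shambling-heavy)"
        return "Mixed Arms"
    return "Balanced"
```

Because the first match wins, the order matters: a fast list with 60 ranged attacks would land in Alpha Strike rather than Gun Line under this ordering.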

Behind the Scenes

Stats are scaled to 2300 points.
Thresholds are based on percentiles from a large dataset (e.g., the top 25% for speed is one of the Alpha Strike cutoffs).
All this is handled in a script (generate_dataset_for_tourney_comparison.py), which adds the archetype to each list.
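
As a rough sketch of the two steps above (not the contents of generate_dataset_for_tourney_comparison.py), scaling and percentile cutoffs could look like the following. The function names are hypothetical, the scaling is assumed to be linear in points, and the "inclusive" interpolation method is just one reasonable choice.

```python
from statistics import quantiles

BASELINE_POINTS = 2300


def scale_to_baseline(value: float, list_points: float) -> float:
    """Linearly rescale a count-like stat from the list's points total to 2300.

    Assumption: scaling is linear; an average such as speed may not need it.
    """
    return value * BASELINE_POINTS / list_points


def percentile_cutoff(values: list[float], pct: int) -> float:
    """Return the pct-th percentile (1..99) of the observed values."""
    # quantiles(..., n=100) returns 99 cut points; index pct - 1 is the pct-th.
    return quantiles(values, n=100, method="inclusive")[pct - 1]
```

For example, 40 ranged attacks in a 2000-point list scale to 46 at 2300 points, and `percentile_cutoff(speeds, 75)` would give the speed cutoff used in the Alpha Strike rule.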

I’ve hardcoded the current thresholds based on past events, but the plan is to update them over time as more data rolls in.

What changes would you make to capture the list archetypes better in a rules-based system?

Boss_Salvage · July 29, 2025, 6:43pm

Big same! I think it shows that alpha strike has faded pretty substantially in the face of NS and Dwarf hammer-and-anvil lists built for scenario and responsiveness.

We’ve seen some RFO lists relying heavily on tooled-up fight wagon legions and helstriker hordes in the last couple of years; however, the competitive set seems to have given up on that concept (and/or those players didn’t qualify or want to travel to Reno this year). See above on alpha strike re: helstrikers being put aside.

Sceleris · July 29, 2025, 7:33pm

Certainly the alpha GL list and the shooty/alpha SK one do struggle against the Dwarf/NS trash type.

Possibly the GL/SK thing is slightly tilted from my pov, since a couple of really good players use them and do very well, so the results may reflect the players more than how good the lists themselves are?

The individual region breakdown is interesting. Given the geographical size of the US, players will mostly run into the same group of opponents, so local metas and styles develop. Then you either follow, look to run counter-meta, or just do your own thing?

I think the army type breakdown is pretty sensible.

The shot count (does that include magic?) at the US Masters was really gradual until you hit about the top 10. Is that common at other events?

Is anything like effective shot range doable? There is a lot of difference between AD with decimators, flame throwers and fireballs; dwarfs with xbows, sharpshooters and cannons; and a mobile shooting list like NA with frostclaws, bolt throwers and 18" steady aim.

Cartwright · July 29, 2025, 9:07pm

Yep, for simplicity it takes the highest number of offensive magic shots (so lb4 and fireball 8 becomes 8 shots). It’s possible to do effective shot range, but it gets super messy because of edge cases (artillery with multiple shot options). I always try to strike a balance between presenting a lot of data and keeping it accessible. I worry that’s so much detail that even the data diehards will zone out.
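
To make the magic rule concrete, here is a tiny sketch of the "highest spell only" idea. The function name and the spell dictionary are hypothetical, and it assumes the rule is applied per caster.

```python
def counted_magic_shots(offensive_spells: dict[str, int]) -> int:
    """Count only the largest offensive spell value for one caster.

    Assumption: the rule applies per caster; a caster with no offensive spells counts as 0.
    """
    return max(offensive_spells.values(), default=0)


# The example from the post: lb4 and fireball 8 count as 8 shots.
print(counted_magic_shots({"lb": 4, "fireball": 8}))
```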

Cartwright · July 31, 2025, 9:49pm

Alright, one more US Masters data dump, this time looking at units: