Data and Dice

Cartwright · July 29, 2025, 2:46pm

To build out list analysis, I’m attempting to create a general set of rules that categorize armies into archetypes. I previously used k-means clustering, but that approach isn’t very intuitive—and it’s hard to explain why a list ends up in a certain group. So I’m switching to a rules-based system instead.

Here’s the idea. For each tournament, I scale every list to a 2300-point baseline so they’re easier to compare. Then I look at a few key stats: expected melee damage, ranged attacks, average speed, total units, unit strength, and a few others. Based on how those numbers stack up—especially compared to percentile cutoffs—I assign one of several archetypes:

Current Archetype Rules

Alpha Strike: Fast and dangerous. Either above the 75th percentile for both speed and expected damage, or above the 90th percentile for speed alone.
Gun Line: More than 50 ranged attacks.
Trash: Swarms of cheap units. Either high on both unit count (16) and unit strength (US, 27), or extreme on just one of them (17 units or 28 US).
Grind: Defensive lists with low offense that take a ton of shots to remove (shots to six nerve above 395).
Mixed Arms (Shambling-heavy): Moderate shooting (at least 19 shots) and at least two Shambling units
Mixed Arms: Moderate shooting (at least 19 shots) but more flexible overall.
Balanced: Anything that doesn’t fit neatly above.
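
As a rough sketch, the rules above could be written as an ordered chain of checks where the first match wins. Several things here are assumptions rather than details from the script: the ListStats field names, the evaluation order (taken from the order of the list), and whether each cutoff is inclusive. The Alpha Strike cutoffs are passed in because their numeric values aren't given above, and the "low offense" half of Grind is left out because no threshold is stated for it.

```python
from dataclasses import dataclass


@dataclass
class ListStats:
    """Per-list stats, already scaled to 2300 points (field names are hypothetical)."""
    speed: float
    expected_damage: float
    ranged_attacks: float
    unit_count: float
    unit_strength: float
    shots_to_remove: float  # the "shots to six nerve" figure
    shambling_units: int


def classify(s, speed_p75, speed_p90, damage_p75):
    """Return the first archetype whose rule matches, checked in the listed order."""
    # Alpha Strike: fast and dangerous, or just extremely fast.
    if (s.speed > speed_p75 and s.expected_damage > damage_p75) or s.speed > speed_p90:
        return "Alpha Strike"
    if s.ranged_attacks > 50:
        return "Gun Line"
    # Trash: high on both counts, or extreme on either one (inclusive cutoffs assumed).
    if (s.unit_count >= 16 and s.unit_strength >= 27) or s.unit_count >= 17 or s.unit_strength >= 28:
        return "Trash"
    if s.shots_to_remove > 395:
        return "Grind"
    if s.ranged_attacks >= 19:
        if s.shambling_units >= 2:
            return "Mixed Arms (Shambling-heavy)"
        return "Mixed Arms"
    return "Balanced"
```

Because the first match wins, a fast list with 60 ranged attacks would land in Alpha Strike rather than Gun Line. Making that priority explicit is one way to keep the assignments explainable.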

Behind the Scenes

Stats are scaled to 2300 points.
Thresholds are based on percentiles from a large dataset (e.g., the 75th percentile for speed is one of the Alpha Strike cutoffs).
All this is handled in a script (generate_dataset_for_tourney_comparison.py), which adds the archetype to each list.
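
For illustration, the scaling step might look something like the sketch below. It assumes simple linear scaling by points, and it assumes that only additive totals get scaled while averages such as speed are left alone. Neither assumption is confirmed above, and the stat names and the scale_to_baseline helper are hypothetical, not taken from the script.

```python
BASELINE_POINTS = 2300

# Assumption: totals grow roughly linearly with points; averages such as speed do not.
ADDITIVE_STATS = {"expected_damage", "ranged_attacks", "unit_count", "unit_strength"}


def scale_to_baseline(stats, list_points, baseline=BASELINE_POINTS):
    """Return a copy of stats with additive totals rescaled to the baseline size."""
    if list_points <= 0:
        raise ValueError("list_points must be positive")
    factor = baseline / list_points
    return {
        name: value * factor if name in ADDITIVE_STATS else value
        for name, value in stats.items()
    }
```

Under these assumptions, a 1150-point list with 20 ranged attacks would be treated as having 40 at the 2300-point baseline, while its average speed stays unchanged.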

I’ve hardcoded the current thresholds based on past events, but the plan is to update them over time as more data rolls in.
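
One possible way to refresh the percentile cutoffs as data comes in, instead of hardcoding them, is to recompute them from the scaled stats of all lists collected so far. This is only a sketch using the standard library; the percentile_cutoffs helper is hypothetical, and the choice of the "inclusive" interpolation method is an assumption.

```python
from statistics import quantiles


def percentile_cutoffs(values, percentiles=(75, 90)):
    """Map each requested percentile to its cutoff in `values`."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(values, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percentiles}
```

Running this on the speed values of every list in the dataset would give the two speed cutoffs used by Alpha Strike, and the same call on expected damage would give the damage cutoff.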

What changes would you make to capture the list archetypes better in a rules-based system?