username | status | games |
---|---|---|
GOBBluth89 | ever_owned | 135 |
GOBBluth89 | own | 122 |
Predicting Board Game Collections
GOBBluth89’s Collection
About
This report details the results of training and evaluating a classification model for predicting games for a user’s boardgame collection.
To view games predicted by the model, go to Section 5.
Collection
The data in this project comes from BoardGameGeek.com. The data used is at the game level, where an individual observation contains features about a game, such as its publisher, categories, and playing time, among many others.
I train a classification model at the user level to learn the relationship between game features and games that a user owns - what predicts a user’s collection?
I evaluate the model’s performance on a training set of historical games via resampling, then validate its performance on a held-out set of newer releases. I then refit the model on the combined training and validation sets and predict upcoming releases in order to find new games that the user is most likely to add to their collection.
username | years | type | own: no | own: yes |
---|---|---|---|---|
GOBBluth89 | -3500-2021 | train | 26263 | 99 |
GOBBluth89 | 2022-2023 | valid | 10288 | 20 |
GOBBluth89 | 2024-2028 | test | 8590 | 3 |
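The year-based split shown above can be sketched in base R. This is a minimal illustration rather than the report's actual code: the toy `games` data frame, its `yearpublished` column, and the `assign_split()` helper are all hypothetical, and only the year cutoffs are taken from the table.

```r
# Hypothetical helper: label each game by the split its publication year falls in.
# Cutoffs mirror the table above (train through 2021, valid 2022-2023, test after).
assign_split <- function(yearpublished, train_end = 2021, valid_end = 2023) {
  cut(
    yearpublished,
    breaks = c(-Inf, train_end, valid_end, Inf),
    labels = c("train", "valid", "test")
  )
}

# Toy data standing in for the real game-level data
games <- data.frame(
  game_id = 1:6,
  yearpublished = c(-3500, 1995, 2021, 2022, 2023, 2025),
  own = c(0, 1, 0, 1, 0, 0)
)
games$type <- assign_split(games$yearpublished)

# Count not-owned (0) and owned (1) games in each split
table(games$type, games$own)
```

Splitting on publication year, rather than at random, keeps newer releases out of training, which matches the goal of predicting games that have not been released yet.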
Types of Games
What types of games does the user own? The following plot displays the most frequent publishers, mechanics, designers, artists, etc. that appear in the user’s collection.
Show the code
collection |>
  filter(own == 1) |>
  collection_by_category(
    games = games_raw
  ) |>
  plot_collection_by_category() +
  ylab("feature")
The following plot shows the years in which games in the user’s collection were published. This can usually indicate when someone first entered the hobby.
Games in Collection
What games does the user currently have in their collection? The following table can be used to examine games the user owns, along with some helpful information for selecting the right game for a game night!
Use the filters above the table to sort/filter based on information about the game, such as year published, recommended player counts, or playing time.
Show the code
collection |>
  filter(own == 1) |>
  prep_collection_datatable(
    games = games_raw
  ) |>
  filter(!is.na(image)) |>
  collection_datatable()
Modeling
I’ll now examine the predictive models trained on the user’s collection.
For an individual user, I train a predictive model on their collection in order to predict whether a user owns a game. The outcome, in this case, is binary: does the user have a game listed in their collection or not? This is the setting for training a classification model, where the model aims to learn the probability that a user will add a game to their collection based on its observable features.
How does a model learn what a user is likely to own? The training process is a matter of examining historical games and finding patterns that exist between game features (designers, mechanics, playing time, etc) and games in the user’s collection.
I make use of many potential features for games, the vast majority of which are dummies indicating the presence or absence of things such as a publisher, artist, or designer. The “standard” BGG features for every game contain information that is typically listed on the box: its playing time, player counts, and recommended minimum age.
I train models to predict whether a user owns a game based only on information that could be observed about the game at its release: playing time, player count, mechanics, categories, genres, and selected designers, artists, and publishers. I do not make use of BGG community information, such as its average rating, weight, or number of user ratings. This is to ensure the model can predict newly released games without relying on information from the BGG community.
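The dummy features mentioned above can be illustrated with a small base R sketch. The `designers` data frame and its contents are invented for illustration and are not the project's actual data or feature pipeline; the point is only to show what a presence/absence indicator looks like.

```r
# Toy long-format data: one row per game/designer pair (values are illustrative)
designers <- data.frame(
  game_id = c(1, 1, 2, 3),
  designer = c("Uwe Rosenberg", "Vlaada Chvatil", "Uwe Rosenberg", "Eric Lang")
)

designer_names <- sort(unique(designers$designer))
game_ids <- sort(unique(designers$game_id))

# One column per designer: 1 if the designer is credited on the game, else 0
dummies <- sapply(designer_names, function(d) {
  as.integer(game_ids %in% designers$game_id[designers$designer == d])
})
rownames(dummies) <- game_ids

dummies
```

Each game becomes one row and each designer one 0/1 column; the same idea extends to publishers, artists, mechanics, and categories.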
What Predicts A Collection?
A predictive model gives us more than just predictions. We can also ask, what did the model learn from the data? What predicts the outcome? In the case of predicting a boardgame collection, what did the model find to be predictive of games a user has in their collection?
To answer this, I examine the coefficients from a logistic regression model with ridge regularization (which I will refer to as a penalized logistic regression).
Positive values indicate that a feature increases a user’s probability of owning/rating a game, while negative values indicate a feature decreases the probability. To be precise, the coefficients indicate the effect of a particular feature on the log-odds of a user owning a game.
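To make the log-odds interpretation concrete, the model can be written out as follows. This is a sketch of a standard ridge-penalized logistic regression; the symbols ($p_i$, $y_i$, $x_{ij}$, $\beta_j$, $\lambda$) are notation introduced here, and the exact scaling of the penalty term depends on the software's convention.

```latex
\log\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
\hat{\beta} = \arg\min_{\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \Big]
  + \lambda \sum_{j=1}^{k} \beta_j^2
```

Here $p_i$ is the probability that the user owns game $i$, and $y_i$ is 1 if they do and 0 otherwise. For a dummy feature, a coefficient of $\beta_j = 0.5$ multiplies the odds of ownership by $e^{0.5} \approx 1.65$ when the feature is present, holding the other features fixed.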
The following visualization shows the path of each feature as it enters the model, with highly influential features tending to enter the model early with large positive or negative effects. The dotted line indicates the level of regularization that was selected during tuning.
Show the code
model_glmnet |>
  pluck("wflow", 1) |>
  trace_plot.glmnet(max.overlaps = 30) +
  facet_wrap(~ params$username)
Partial Effects
What are the effects of individual features?
Use the buttons below to examine the effects different types of predictors had in predicting the user’s collection.
Assessment
How well did the model do in predicting the user’s collection?
This section contains a variety of visualizations and metrics for assessing the performance of the model(s). If you’re not particularly interested in predictive modeling, skip down further to the predictions from the model.
The following table displays the model’s performance in resampling on the training set, on the validation set, and on a holdout set of upcoming games.
Show the code
metrics |>
  mutate_if(is.numeric, round, 3) |>
  pivot_wider(
    names_from = c(".metric"),
    values_from = c(".estimate")
  ) |>
  gt::gt() |>
  gt::sub_missing() |>
  gt_options()
username | wflow_id | type | .estimator | mn_log_loss | roc_auc | pr_auc |
---|---|---|---|---|---|---|
GOBBluth89 | glmnet | resamples | binary | 0.017 | 0.944 | 0.206 |
GOBBluth89 | glmnet | test | binary | 0.005 | 0.923 | 0.016 |
GOBBluth89 | glmnet | valid | binary | 0.012 | 0.893 | 0.055 |
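The mn_log_loss values above are small partly because owned games are so rare. As a rough reference point, here is a base R sketch that computes the log loss of a naive baseline that predicts the overall ownership rate for every game. The `mean_log_loss()` helper is written here for illustration and is not the function used in the report; the counts are the training-set counts from the table earlier in this report.

```r
# Mean log loss for a binary outcome; probabilities are clipped to avoid log(0)
mean_log_loss <- function(truth, prob, eps = 1e-15) {
  prob <- pmin(pmax(prob, eps), 1 - eps)
  -mean(truth * log(prob) + (1 - truth) * log(1 - prob))
}

# Training-set counts of owned and not-owned games
n_yes <- 99
n_no <- 26263
truth <- c(rep(1, n_yes), rep(0, n_no))

# Naive baseline: predict the overall ownership rate for every game
base_rate <- n_yes / (n_yes + n_no)
baseline <- mean_log_loss(truth, rep(base_rate, length(truth)))
baseline
```

With so few owned games, this baseline comes out at roughly 0.025, so log loss values near zero are expected here and are best read relative to that reference rather than in absolute terms.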
An easy way to visually examine the performance of a classification model is to view a separation plot.
I plot the predicted probabilities from the model for every game (during resampling) from lowest to highest. I then overlay a blue line for any game that the user does own. A good classifier is one that is able to separate the blue (games owned by the user) from the white (games not owned by the user), with most of the blue occurring at the highest probabilities (left side of the chart).
Show the code
preds |>
  filter(type %in% c("resamples", "valid")) |>
  plot_separation(outcome = params$outcome)
I can more formally assess how well each model did in resampling by looking at the area under the ROC curve (roc_auc). A perfect model would receive a score of 1, while a model that predicts no better than chance scores 0.5. What counts as a good score depends on the setting, but generally anything in the 0.8 to 0.9 range is very good, while the 0.7 to 0.8 range is perfectly acceptable.
Show the code
preds |>
  nest(data = -c(username, wflow_id, type)) |>
  mutate(
    roc_curve = map(
      data,
      safely(~ .x |> safe_roc_curve(truth = params$outcome))
    )
  ) |>
  mutate(result = map(roc_curve, ~ .x |> pluck("result"))) |>
  select(username, wflow_id, type, result) |>
  unnest(result) |>
  plot_roc_curve()
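The area under the ROC curve discussed above has a useful interpretation: it is the probability that a randomly chosen owned game receives a higher predicted probability than a randomly chosen game the user does not own. The base R sketch below computes it that way, using the rank-sum identity. The `roc_auc_rank()` helper and the toy vectors are hypothetical and are not part of the report's code.

```r
# AUC via the rank-sum (Mann-Whitney) identity; tied predictions get average ranks
roc_auc_rank <- function(truth, prob) {
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  r <- rank(prob)
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 1 = owned, 0 = not owned
truth <- c(0, 0, 1, 0, 1, 1)
prob <- c(0.01, 0.10, 0.35, 0.40, 0.80, 0.90)

roc_auc_rank(truth, prob)
```

In this toy example, 8 of the 9 owned/not-owned pairs are ordered correctly, so the score is 8/9. A model that assigns the same probability to every game scores 0.5.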
Top Games in Training
What were the model’s top games in the training set?
Show the code
preds |>
  filter(type == "resamples") |>
  prep_predictions_datatable(
    games = games,
    outcome = params$outcome
  ) |>
  predictions_datatable(
    outcome = params$outcome,
    remove_description = TRUE,
    remove_image = TRUE,
    pagelength = 15
  )
Top Games in Validation
What were the model’s top games in the validation set?
Show the code
preds |>
  filter(type %in% c("valid")) |>
  prep_predictions_datatable(
    games = games,
    outcome = params$outcome
  ) |>
  predictions_datatable(
    outcome = params$outcome,
    remove_description = TRUE,
    remove_image = TRUE,
    pagelength = 15
  )
Top Games by Year
The following table displays the model’s top 15 games for each of the 15 most recent years in the training and validation sets.
Show the code
preds |>
  filter(type %in% c("resamples", "valid")) |>
  top_n_preds(
    games = games,
    outcome = params$outcome,
    top_n = 15,
    n_years = 15
  ) |>
  gt_top_n(collection = collection |> prep_collection())
Rank | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Middle-Earth Quest | Runewars | A Game of Thrones: The Board Game (Second Edition) | Descent: Journeys in the Dark (Second Edition) | Glass Road | Star Wars: Imperial Assault | Pandemic Legacy: Season 1 | Star Wars: Rebellion | Stop Thief! | Star Wars: X-Wing (Second Edition) | Maracaibo | Unmatched: Little Red Riding Hood vs. Beowulf | Unmatched: Battle of Legends, Volume Two | The Lord of the Rings: The Card Game – Revised Core Set | Undaunted: Battle of Britain |
2 | Chaos in the Old World | Battles of Westeros | Mansions of Madness | Star Wars: X-Wing Miniatures Game | NFL Game Day | Pandemic: The Cure | Blood Rage | Agricola (Revised Edition) | Century: Spice Road | The Lord of the Rings: The Card Game – Two-Player Limited Edition Starter | Unmatched Game System | Unmatched: Cobble & Fog | Brian Boru: High King of Ireland | Unmatched: Redemption Row | Unmatched: Teen Spirit |
3 | Kuhhandel Master | Space Hulk: Death Angel – The Card Game | The Lord of the Rings: The Card Game | Galaxy Trucker: Anniversary Edition | Eldritch Horror | Nyakuza | Forbidden Stars | Junk Art | Gloomhaven | Azul: Stained Glass of Sintra | Unmatched: Battle of Legends, Volume One | Unmatched: Jurassic Park – InGen vs Raptors | Boonlake | Unmatched: Hell's Kitchen | Unmatched: For King and Country |
4 | Age of Conan: The Strategy Board Game | Troyes | Gears of War: The Board Game | Rex: Final Days of an Empire | Caverna: The Cave Farmers | Pandemic: Contagion | Star Wars: X-Wing Miniatures Game – The Force Awakens Core Set | Sherlock Holmes Consulting Detective: Jack the Ripper & West End Adventures | Twilight Imperium: Fourth Edition | Root | Unmatched: Robin Hood vs. Bigfoot | Unmatched: Buffy the Vampire Slayer | Savannah Park | Unmatched: Jurassic Park – Dr. Sattler vs. T. Rex | Unmatched: Brains and Brawn |
5 | Warhammer: Invasion | DungeonQuest (Third Edition) | Rune Age | Wiz-War (Eighth Edition) | BattleLore: Second Edition | Alchemists | 7 Wonders Duel | Scythe | My Little Scythe | Camel Up (Second Edition) | Star Wars: Outer Rim | Undaunted: North Africa | Unfathomable | Unmatched: Houdini vs. The Genie | The Witcher: Old World |
6 | Cyclades | 7 Wonders | Letters from Whitechapel | Keyflower | Impulse | Five Tribes: The Djinns of Naqala | Star Wars: Armada | Terraforming Mars | Folklore: The Affliction | Fireball Island: The Curse of Vul-Kar | Tapestry | Century: Golem Edition – An Endless World | Railroad Ink Challenge: Lush Green Edition | SPYBAM | Rough Draft |
7 | Chronicle | Dominant Species | Dark Moon | Android: Netrunner | Relic | Camel Up | The King Is Dead | A Feast for Odin | Century: Golem Edition | Rising Sun | Undaunted: Normandy | Gloomhaven: Jaws of the Lion | Galaxy Trucker (Second Edition) | アンドーンテッド:ノルマンディー・プラス (Undaunted: Normandy Plus) | Empire's End |
8 | Ubongo 3D | Wars of the Roses: Lancaster vs. York | Dungeon Fighter | Star Wars: The Card Game | Rococo | Akrotiri | Mombasa | New Angeles | Spirit Island | The Estates | Subtext | Merv: The Heart of the Silk Road | Railroad Ink Challenge: Shining Yellow Edition | Undaunted: Stalingrad | Arkendom Conquista Starter Set |
9 | Endeavor | High Frontier | Quarriors! | Agricola: All Creatures Big and Small | Lewis & Clark: The Expedition | Fields of Arle | Oh My Goods! | Arkham Horror: The Card Game | Star Wars: Destiny – Two-Player Game | War Chest | Century: A New World | Eclipse: Second Dawn for the Galaxy | Great Plains | Frosthaven | Witchcraft! |
10 | The Adventurers: The Temple of Chac | Forbidden Island | King of Tokyo | Antike Duellum | Euphoria: Build a Better Dystopia | Warhammer 40,000: Conquest | Codenames | Great Western Trail | Azul | Cosmic Encounter: 42nd Anniversary Edition | Era: Medieval Age | The Search for Planet X | Bloodborne: The Board Game | Foundations of Rome | Century: Big Box |
11 | Small World | Sid Meier's Civilization: The Board Game | Dungeon Petz | Clash of Cultures | Nations | Spyfall | Viticulture Essential Edition | Love Letter: Premium Edition | Sherlock Holmes Consulting Detective: Vanishing from Hyde Park | Newton | Century: Golem Edition – Eastern Mountains | New York Zoo | Ark Nova | Foundations of Rome (Emperor Edition) | Chase Chess |
12 | Wings of War: WW2 Deluxe set | Glen More | Belfort | Il Vecchio | Spyrium | Port Royal | A Game of Thrones: The Card Game (Second Edition) | Sakura Arms | Downforce | Everdell | The King's Dilemma | Lost Ruins of Arnak | Arkham Horror: The Card Game (Revised Edition) | Return to Dark Tower | Unmatched Adventures: Tales to Amaze |
13 | EVE: Conquests | Earth Reborn | Mage Knight Board Game | Merchant of Venus (Second Edition) | Francis Drake | La Granja | Watson & Holmes | Captain Sonar | Bunny Kingdom | Western Legends | The Taverns of Tiefenthal | Dune: Imperium | Corrosion | Agricola 15 | Mad Mars |
14 | Wings of War: Fire from the Sky | Labyrinth: The War on Terror, 2001 – ? | Eclipse: New Dawn for the Galaxy | Terra Mystica | Sails of Glory | Star Wars: Empire vs. Rebellion | Steampunk Rally | Star Wars: Destiny | Indulgence | Century: Eastern Wonders | The Isle of Cats | Star Wars: Armada – Galactic Republic Fleet Starter | Cascadia | Bardsung | General Orders: World War II |
15 | Imperial 2030 | Mousquetaires du Roy | A Few Acres of Snow | Pax Porfiriana | Cube Quest | Antike II | Kraftwagen | DOOM: The Board Game | Sherlock Holmes Consulting Detective: Carlton House & Queen's Park | Coimbra | Marvel Champions: The Card Game | Star Wars: Armada – Separatist Alliance Fleet Starter | Oath | ISS Vanguard | Pirate Tales |
Predictions
New and Upcoming Games
What were the model’s top predictions for new and upcoming board game releases?
Show the code
new_preds |>
  filter(type == "upcoming") |>
  # imposing a minimum threshold to filter out games with no info
  filter(usersrated >= 1) |>
  # removing this goddamn boxing game that has every mechanic listed
  filter(game_id != 420629) |>
  prep_predictions_datatable(
    games = games_new,
    outcome = params$outcome
  ) |>
  predictions_datatable(outcome = params$outcome)
Older Games
What were the model’s top predictions for older games?