Predicting Board Game Collections

GOBBluth89’s Collection

Author

Phil Henrickson

Published

10/31/24

About

This report details the results of training and evaluating a classification model for predicting games for a user’s board game collection.

Note

To view games predicted by the model, go to Section 5.

Collection

The data in this project comes from BoardGameGeek.com. The data used is at the game level, where an individual observation contains features about a game, such as its publisher, categories, and playing time, among many others.

I train a classification model at the user level to learn the relationship between game features and games that a user owns - what predicts a user’s collection?

username     status       games
GOBBluth89   ever_owned   107
GOBBluth89   own          102

I evaluate the model’s performance on a training set of historical games via resampling, then validate it on a set-aside set of newer releases. I then refit the model on the combined training and validation sets and predict upcoming releases to find new games that the user is most likely to add to their collection.

username     years        type    own: no   own: yes
GOBBluth89   -3500-2020   train   24428     90
GOBBluth89   2021-2022    valid   9885      12
GOBBluth89   2023-2028    test    9099      NA
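
As a rough illustration of the year-based split in the table above, the following base R sketch assigns each game to a set by publication year. The toy data, the `assign_split()` helper, and its arguments are hypothetical and are not the functions used in this report; only the cutoffs (train through 2020, validation 2021-2022, later years as upcoming) come from the table.

```r
# Hypothetical toy data: one row per game with its publication year.
games_toy <- data.frame(
  game_id = 1:6,
  yearpublished = c(1995, 2019, 2020, 2021, 2022, 2024)
)

# Assumed helper (not from the report): label each game by publication year.
# Cutoffs mirror the table: train through 2020, validate on the next two
# years, and treat everything later as upcoming releases.
assign_split <- function(year, end_train_year = 2020, valid_years = 2) {
  ifelse(
    year <= end_train_year, "train",
    ifelse(year <= end_train_year + valid_years, "valid", "test")
  )
}

games_toy$type <- assign_split(games_toy$yearpublished)
table(games_toy$type)
```

Splitting by time rather than at random means the validation set resembles the real task: predicting games released after the ones the model has seen.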

Types of Games

What types of games does the user own? The following plot displays the most frequent publishers, mechanics, designers, artists, etc. that appear in the user’s collection.

collection |>
        filter(own == 1) |>
        collection_by_category(
                games = games_raw
        ) |>
        plot_collection_by_category()+
        ylab("feature")

The following plot shows the years in which games in the user’s collection were published. This can usually indicate when someone first entered the hobby.

Games in Collection

What games does the user currently have in their collection? The following table can be used to examine games the user owns, along with some helpful information for selecting the right game for a game night!

Use the filters above the table to sort/filter based on information about the game, such as year published, recommended player counts, or playing time.

collection |>
        filter(own == 1) |>
        prep_collection_datatable(
                games = games_raw
        ) |>
        filter(!is.na(image)) |>
        collection_datatable()

Modeling

I’ll now examine the predictive models trained on the user’s collection.

For an individual user, I train a predictive model on their collection in order to predict whether a user owns a game. The outcome, in this case, is binary: does the user have a game listed in their collection or not? This is the setting for training a classification model, where the model aims to learn the probability that a user will add a game to their collection based on its observable features.

How does a model learn what a user is likely to own? The training process is a matter of examining historical games and finding patterns that exist between game features (designers, mechanics, playing time, etc) and games in the user’s collection.

I make use of many potential features for games, the vast majority of which are dummies indicating the presence or absence of things such as a publisher, artist, or designer. The “standard” BGG features for every game contain information that is typically listed on the box: its playing time, player counts, or recommended minimum age.
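
To make the idea of dummy features concrete, here is a minimal base R sketch that turns a list of mechanics per game into 0/1 indicator columns. The toy games and the `make_dummies()` helper are hypothetical and are not the report's preprocessing code.

```r
# Hypothetical toy data: each game lists zero or more mechanics.
mechanics <- list(
  game_a = c("Deck Building", "Cooperative"),
  game_b = c("Worker Placement"),
  game_c = character(0)
)

# Assumed helper (not from the report): one column per distinct value,
# 1 if the game has it and 0 otherwise.
make_dummies <- function(x) {
  lv <- sort(unique(unlist(x)))
  out <- matrix(0L, nrow = length(x), ncol = length(lv),
                dimnames = list(names(x), lv))
  for (i in seq_along(x)) {
    out[i, lv %in% x[[i]]] <- 1L
  }
  out
}

dummies <- make_dummies(mechanics)
dummies
```

The same approach would apply to publishers, designers, or artists; in practice, rare values are often dropped or pooled to keep the number of columns manageable.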

Note

I train models to predict whether a user owns a game based only on information that could be observed about the game at its release: playing time, player count, mechanics, categories, genres, and selected designers, artists, and publishers. I do not make use of BGG community information, such as its average rating, weight, or number of user ratings. This is to ensure the model can predict newly released games without relying on information from the BGG community.

What Predicts A Collection?

A predictive model gives us more than just predictions. We can also ask, what did the model learn from the data? What predicts the outcome? In the case of predicting a boardgame collection, what did the model find to be predictive of games a user has in their collection?

To answer this, I examine the coefficients from a logistic regression model with ridge regularization (which I will refer to as a penalized logistic regression).

Positive values indicate that a feature increases a user’s probability of owning/rating a game, while negative values indicate a feature decreases the probability. To be precise, the coefficients indicate the effect of a particular feature on the log-odds of a user owning a game.
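
A short sketch of what a log-odds coefficient means in terms of probability. The intercept and coefficient below are made-up values chosen for illustration, not estimates from the fitted model.

```r
# Hypothetical values, not taken from the fitted model.
intercept <- -6       # baseline log-odds of owning a game
coef_feature <- 1.5   # coefficient on a binary feature, e.g. a publisher dummy

# plogis() is the inverse logit: it maps log-odds to a probability.
p_without <- plogis(intercept)
p_with <- plogis(intercept + coef_feature)

# When the feature is present, the odds of ownership are multiplied
# by exp(coefficient).
odds_ratio <- exp(coef_feature)

c(p_without = p_without, p_with = p_with, odds_ratio = odds_ratio)
```

Because ownership is rare, even a large positive coefficient can leave the predicted probability small in absolute terms; what changes is the game's ranking relative to other games.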

The following visualization shows the path of each feature as it enters the model, with highly influential features tending to enter the model early with large positive or negative effects. The dotted line indicates the level of regularization that was selected during tuning.

model_glmnet |> 
        pluck("wflow", 1) |>
        trace_plot.glmnet(max.overlaps = 30)+
        facet_wrap(~params$username)
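
The shrinkage that such a trace plot visualizes can be sketched in base R by fitting a ridge-penalized logistic regression at several penalty levels. This is an illustration on simulated data using `optim()`, not the glmnet workflow used in this report; the data, the `penalized_nll()` and `fit_ridge()` helpers, and the penalty scaling are all assumptions.

```r
set.seed(1)

# Simulated data (illustrative only): 200 games, 3 binary features.
n <- 200
x <- matrix(rbinom(n * 3, 1, 0.3), ncol = 3)
colnames(x) <- c("feature_a", "feature_b", "feature_c")
y <- rbinom(n, 1, plogis(-1 + 2 * x[, 1] - 1.5 * x[, 2]))

# Mean negative log-likelihood plus a ridge penalty on the slopes
# (the intercept, beta[1], is not penalized).
penalized_nll <- function(beta, x, y, lambda) {
  eta <- beta[1] + x %*% beta[-1]
  -mean(y * eta - log1p(exp(eta))) + lambda * sum(beta[-1]^2)
}

# Fit the model at one penalty level and return the slope coefficients.
fit_ridge <- function(lambda) {
  fit <- optim(rep(0, ncol(x) + 1), penalized_nll,
               x = x, y = y, lambda = lambda, method = "BFGS")
  setNames(fit$par[-1], colnames(x))
}

# One row of coefficients per penalty level, from weakest to strongest.
lambdas <- c(0.001, 0.01, 0.1, 1)
path <- t(sapply(lambdas, fit_ridge))
rownames(path) <- paste0("lambda=", lambdas)
round(path, 3)
```

As the penalty grows, the coefficients are pulled toward zero; tuning selects the penalty level that predicts best in resampling.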

Partial Effects

What are the effects of individual features?

Use the buttons below to examine the effects different types of predictors had in predicting the user’s collection.

Assessment

How well did the model do in predicting the user’s collection?

This section contains a variety of visualizations and metrics for assessing the performance of the model(s). If you’re not particularly interested in predictive modeling, skip down further to the predictions from the model.

The following table displays the model’s performance in resampling on the training set, on the validation set, and on a holdout set of upcoming games.

metrics |>
        mutate_if(is.numeric, round, 3) |>
        pivot_wider(
                names_from = c(".metric"),
                values_from = c(".estimate")) |>
        gt::gt() |>
        gt::sub_missing() |>
        gt_options()
username     wflow_id   type        .estimator   mn_log_loss   roc_auc   pr_auc
GOBBluth89   glmnet     resamples   binary       0.017         0.943     0.170
GOBBluth89   glmnet     test        binary       0.003
GOBBluth89   glmnet     valid       binary       0.008         0.926     0.071
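
For readers unfamiliar with the `mn_log_loss` column, here is a minimal base R sketch of mean log loss. The `mean_log_loss()` helper and the toy vectors are hypothetical; the report's metrics come from its own modeling pipeline, not from this function.

```r
# Hypothetical outcomes (1 = owned) and predicted probabilities of ownership.
truth <- c(0, 0, 1, 0, 1)
prob_yes <- c(0.01, 0.05, 0.60, 0.02, 0.10)

# Mean log loss: the average negative log-probability assigned to the
# observed class. Probabilities are clipped away from 0 and 1 to avoid log(0).
mean_log_loss <- function(truth, prob, eps = 1e-15) {
  prob <- pmin(pmax(prob, eps), 1 - eps)
  -mean(truth * log(prob) + (1 - truth) * log(1 - prob))
}

mean_log_loss(truth, prob_yes)

# With a rare outcome, predicting a low probability for every game
# already gives a small log loss.
baseline <- mean_log_loss(c(rep(0, 99), 1), rep(0.01, 100))
baseline
```

This is why a small log loss alone says little when very few games are owned, and why it helps to read it alongside ranking metrics such as roc_auc and pr_auc.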

An easy way to visually examine the performance of a classification model is to view a separation plot.

I plot the predicted probabilities from the model for every game (during resampling) from lowest to highest. I then overlay a blue line for any game that the user does own. A good classifier is one that is able to separate the blue (games owned by the user) from the white (games not owned by the user), with most of the blue occurring at the highest probabilities (left side of the chart).

preds |>
        filter(type %in% c('resamples', 'valid')) |>
        plot_separation(outcome = params$outcome)
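
The ordering behind a separation plot can be sketched in a few lines of base R. The toy vectors are hypothetical, and the text strip is only a stand-in for the actual plot; which end of the axis holds the highest probabilities is a presentation choice.

```r
# Hypothetical outcomes (1 = owned) and predicted probabilities for ten games.
owned <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0)
prob <- c(0.02, 0.10, 0.65, 0.01, 0.05, 0.50, 0.40, 0.03, 0.80, 0.04)

# Sort games by predicted probability; a separation plot draws one thin
# bar per game in this order, colored by the true outcome.
ord <- order(prob)
sep <- data.frame(
  position = seq_along(ord),
  prob = prob[ord],
  owned = owned[ord]
)

# Text version of the strip: "|" marks an owned game, "." a game not owned.
strip <- paste(ifelse(sep$owned == 1, "|", "."), collapse = "")
strip
```

A well-separated model clusters the owned markers at the high-probability end of the strip, with few of them scattered among the low-probability games.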

I can more formally assess how well each model did in resampling by looking at the area under the ROC curve (roc_auc). A perfect model would receive a score of 1, while a model that cannot predict the outcome will default to a score of 0.5. What counts as a good score depends on the setting, but generally anything in the 0.8 to 0.9 range is very good, while the 0.7 to 0.8 range is perfectly acceptable.

preds |>
        nest(data = -c(username, wflow_id, type)) |>
        mutate(roc_curve = map(data, safely( ~ .x |> safe_roc_curve(truth = params$outcome)))) |>
        mutate(result = map(roc_curve, ~ .x |> pluck("result"))) |>
        select(username, wflow_id, type, result) |>
        unnest(result) |>
        plot_roc_curve()
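
The area under the ROC curve can also be read as the probability that a randomly chosen owned game receives a higher predicted probability than a randomly chosen game that is not owned. The base R sketch below computes it from ranks (the Mann-Whitney formulation); the `roc_auc_rank()` helper and the toy data are hypothetical and are not the metric code used in this report.

```r
# Hypothetical outcomes (1 = owned) and predicted probabilities.
truth <- c(0, 0, 1, 0, 1, 0, 0, 1)
prob <- c(0.10, 0.40, 0.35, 0.05, 0.80, 0.20, 0.50, 0.70)

# AUC via the Mann-Whitney rank statistic: the share of (owned, not owned)
# pairs in which the owned game has the higher predicted probability.
# rank() assigns average ranks to ties, so a tied pair counts as one half.
roc_auc_rank <- function(truth, prob) {
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  r <- rank(prob)
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- roc_auc_rank(truth, prob)
auc
```

In this toy example, 13 of the 15 possible pairs are ordered correctly, which is where the score comes from.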

Top Games in Training

What were the model’s top games in the training set?

preds |>
        filter(type == 'resamples') |>
        prep_predictions_datatable(
                games = games, 
                outcome = params$outcome
        ) |>
        predictions_datatable(
                outcome = params$outcome,
                remove_description = TRUE,
                remove_image = TRUE,
                pagelength = 15)

Top Games in Validation

What were the model’s top games in the validation set?

preds |>
        filter(type %in% c("valid")) |>
        prep_predictions_datatable(
                games = games,
                outcome = params$outcome
        ) |>
        predictions_datatable(
                outcome = params$outcome,
                remove_description = TRUE,
                remove_image = TRUE,
                pagelength = 15)

Top Games by Year

The following table displays the model’s top 15 games for each of the last 15 years in the training and validation sets (2008 to 2022).

preds |>
        filter(type %in% c('resamples', 'valid')) |>
        top_n_preds(
                games = games,
                outcome = params$outcome,
                top_n = 15,
                n_years = 15
        ) |>
        gt_top_n(collection = collection |> prep_collection())
Rank 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
1 Battlestar Galactica: The Board Game Middle-Earth Quest Earth Reborn Mansions of Madness Wiz-War (Eighth Edition) Caverna: The Cave Farmers Star Wars: Imperial Assault Star Wars: X-Wing Miniatures Game – The Force Awakens Core Set Star Wars: Rebellion Gloomhaven Cosmic Encounter: 42nd Anniversary Edition Unmatched: Battle of Legends, Volume One Unmatched: Little Red Riding Hood vs. Beowulf Unmatched: Battle of Legends, Volume Two The Lord of the Rings: The Card Game – Revised Core Set
2 A Game of Thrones: The Card Game Warhammer: Invasion Battles of Westeros A Game of Thrones: The Board Game (Second Edition) Descent: Journeys in the Dark (Second Edition) NFL Game Day Alchemists Pandemic Legacy: Season 1 Agricola (Revised Edition) Century: Spice Road The Lord of the Rings: The Card Game – Two-Player Limited Edition Starter Unmatched: Robin Hood vs. Bigfoot Unmatched: Jurassic Park – InGen vs Raptors Boonlake Unmatched: Redemption Row
3 Android Age of Conan: The Strategy Board Game Runewars The Lord of the Rings: The Card Game Galaxy Trucker: Anniversary Edition Glass Road Pandemic: Contagion Star Wars: Armada Junk Art Stop Thief! Rising Sun Unmatched Game System Unmatched: Cobble & Fog Experior Existencial Unmatched: Hell's Kitchen
4 Space Alert Bunny Bunny Moose Moose War of the Ring Collector's Edition Rune Age Android: Netrunner BattleLore: Second Edition Pandemic: The Cure The King Is Dead Scythe Twilight Imperium: Fourth Edition Everdell Maracaibo Unmatched: Buffy the Vampire Slayer Galaxy Trucker (Second Edition) Unmatched: Jurassic Park – Dr. Sattler vs. T. Rex
5 Dixit Endeavor Space Hulk: Death Angel – The Card Game Letters from Whitechapel Star Wars: The Card Game Relic Spyfall Blood Rage Sherlock Holmes Consulting Detective: Jack the Ripper & West End Adventures My Little Scythe Concordia Venus Star Wars: Outer Rim Gloomhaven: Jaws of the Lion Arkham Horror: The Card Game (Revised Edition) Unmatched: Houdini vs. The Genie
6 Mutant Chronicles Collectible Miniatures Game Chaos in the Old World DungeonQuest (Third Edition) Gears of War: The Board Game Star Wars: X-Wing Miniatures Game Blueprints Port Royal Forbidden Stars Terraforming Mars Bunny Kingdom Newton Era: Medieval Age Century: Golem Edition – An Endless World Mind MGMT: The Psychic Espionage “Game.” アンドーンテッド:ノルマンディー・プラス (Undaunted: Normandy Plus)
7 Wasabi! Revolution! Sid Meier's Civilization: The Board Game King of Tokyo Pax Porfiriana Eldritch Horror Akrotiri Mysterium Dead of Winter: The Long Night Dungeon of Mandom VIII Railroad Ink: Blazing Red Edition Aftermath Undaunted: North Africa Crowded Cave Adventures Undaunted: Stalingrad
8 Call of Cthulhu: The Card Game Small World Merchants & Marauders Mage Knight Board Game Il Vecchio Impulse Fields of Arle Ashes Reborn: Rise of the Phoenixborn Arkham Horror: The Card Game Azul Root Tapestry Hues and Cues Cascadia Bardsung
9 Le Havre At the Gates of Loyang Escape from the Aliens in Outer Space Dark Moon Rex: Final Days of an Empire Ici Londres Deception: Murder in Hong Kong Mombasa Codenames: Deep Undercover Fallout Fireball Island: The Curse of Vul-Kar Marvel Champions: The Card Game Star Wars: Armada – Galactic Republic Fleet Starter Bloodborne: The Board Game Frosthaven
10 World of Warcraft: The Adventure Game Chronicle Wars of the Roses: Lancaster vs. York Ora et Labora Love Letter Blood Bound Camel Up Oh My Goods! Sakura Arms Indulgence Blackout: Hong Kong The Isle of Cats Star Wars: Armada – Separatist Alliance Fleet Starter Pendulum Fighters Quacks & Co.: Quedlinburg Dash
11 Cosmic Encounter Shipyard Merkator A Few Acres of Snow Smash Up Suburbia + Inc. The Battle at Kemble's Cascade Love Letter: Adventure Time Perdition's Mouth: Abyssal Rift Century: Golem Edition Camel Up (Second Edition) Blitzkrieg!: World War Two in 20 Minutes Alma Mater Unfathomable Return to Dark Tower
12 Snow Tails Tarantel Tango The Hobbit Ascending Empires Merchant of Venus (Second Edition) Tash-Kalar: Arena of Legends Roll for the Galaxy Specter Ops Love Letter: Premium Edition Downforce Azul: Stained Glass of Sintra Century: Golem Edition – Eastern Mountains The Pet Cemetery Railroad Ink Challenge: Lush Green Edition Agricola 15
13 Senji Skyline 3000 Horus Heresy Space Empires 4X Clash of Cultures The Ravens of Thri Sahashri AquaSphere A Game of Thrones: The Card Game (Second Edition) Let Them Eat Cake Folklore: The Affliction Brikks Silver & Gold Project: ELITE Thakhi: la senda de los dioses Resist!
14 Ice Flow Cyclades Lords of Scotland Belfort Suburbia Sails of Glory Blue Moon Legends Arboretum Iberia Herbaceous Dinosaur Tea Party Century: A New World Merv: The Heart of the Silk Road Evil Corp Warhammer: The Horus Heresy – Age of Darkness
15 Toledo Arcana Labyrinth: The War on Terror, 2001 – ? Discworld: Ankh-Morpork Scripts and Scribes: The Dice Game Hanamikoji Isle of Trains Mafia de Cuba Inis Flick 'em Up!: Giant Edition Neon Gods Dune Rogue Dungeon Ark Nova Planet Unknown
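
A table like the one above boils down to ranking games by predicted probability within each publication year and keeping the top few. The base R sketch below shows one way to do that; the toy data, the `.pred_yes` column name, and the `top_n_by_year()` helper are assumptions for illustration and are not the report's `top_n_preds()` function.

```r
# Hypothetical predictions: one row per game.
preds_toy <- data.frame(
  name = c("A", "B", "C", "D", "E", "F"),
  yearpublished = c(2021, 2021, 2021, 2022, 2022, 2022),
  .pred_yes = c(0.10, 0.40, 0.25, 0.05, 0.30, 0.20),
  stringsAsFactors = FALSE
)

# Assumed helper: rank games within each year by predicted probability
# (highest first) and keep the top n per year.
top_n_by_year <- function(df, n = 2) {
  ranked <- df[order(df$yearpublished, -df$.pred_yes), ]
  ranked$rank <- ave(
    ranked$.pred_yes, ranked$yearpublished,
    FUN = function(p) rank(-p, ties.method = "first")
  )
  ranked[ranked$rank <= n, c("yearpublished", "rank", "name")]
}

top2 <- top_n_by_year(preds_toy, n = 2)

# Wide layout: one row per rank, one column per year.
wide <- reshape(top2, idvar = "rank", timevar = "yearpublished",
                direction = "wide")
wide
```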

Predictions

New and Upcoming Games

What were the model’s top predictions for new and upcoming board game releases?

new_preds |>
        filter(type == 'upcoming') |>
        # imposing a minimum threshold to filter out games with no info
        filter(usersrated >= 1) |>
        # removing this goddamn boxing game that has every mechanic listed
        filter(game_id != 420629) |>
        prep_predictions_datatable(
                games = games_new,
                outcome = params$outcome
        ) |>
        predictions_datatable(outcome = params$outcome)

Older Games

What were the model’s top predictions for older games?