Predicting Board Game Collections

aboardgamebarrage’s Collection

Author

Phil Henrickson

Published

May 21, 2025

About

This report details the results of training and evaluating a classification model for predicting games for a user’s boardgame collection.

Note

To view games predicted by the model, go to Section 5.

Collection

The data in this project comes from BoardGameGeek.com. The data used is at the game level, where an individual observation contains features about a game, such as its publisher, categories, and playing time, among many others.

I train a classification model at the user level to learn the relationship between game features and games that a user owns - what predicts a user’s collection?

username	status	games
aboardgamebarrage	ever_owned	457
aboardgamebarrage	own	236
aboardgamebarrage	rated	330

I evaluate the model’s performance on a training set of historical games via resampling, then validate its performance on a set-aside set of newer releases. I then refit the model on the combined training and validation sets and predict upcoming releases to find new games that the user is most likely to add to their collection.

username	years	type	own: no	own: yes
aboardgamebarrage	-3500-2021	train	26170	192
aboardgamebarrage	2022-2023	valid	10267	41
aboardgamebarrage	2024-2028	test	8590	3
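
The year-based split summarized in the table above can be sketched in base R. This is only an illustration: the data frame `games_toy` and its columns `yearpublished` and `own` are hypothetical stand-ins, and the cutoff years are simply read off the table rather than taken from the project's code.

```r
# Hypothetical toy data: one row per game, with publication year and ownership flag.
games_toy <- data.frame(
    game_id = 1:6,
    yearpublished = c(1995, 2019, 2021, 2022, 2023, 2025),
    own = c(1, 0, 1, 0, 1, 0)
)

# Assumed cutoffs, read from the year ranges in the table above.
train_end <- 2021
valid_end <- 2023

# cut() uses right-closed intervals: (-Inf, 2021], (2021, 2023], (2023, Inf).
games_toy$type <- cut(
    games_toy$yearpublished,
    breaks = c(-Inf, train_end, valid_end, Inf),
    labels = c("train", "valid", "test")
)

# Count not-owned (0) and owned (1) games within each split.
split_counts <- table(games_toy$type, games_toy$own)
print(split_counts)
```

Splitting on publication year rather than at random keeps newer releases out of training, which mirrors how the model is actually used: to score games that did not exist when it was fit.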

Types of Games

What types of games does the user own? The following plot displays the most frequent publishers, mechanics, designers, artists, etc. that appear in the user’s collection.

collection |>
    filter(own == 1) |>
    collection_by_category(
        games = games_raw
    ) |>
    plot_collection_by_category() +
    ylab("feature")

The following plot shows the years in which games in the user’s collection were published. This can usually indicate when someone first entered the hobby.

Games in Collection

What games does the user currently have in their collection? The following table can be used to examine games the user owns, along with some helpful information for selecting the right game for a game night!

Use the filters above the table to sort/filter based on information about the game, such as year published, recommended player counts, or playing time.

collection |>
    filter(own == 1) |>
    prep_collection_datatable(
        games = games_raw
    ) |>
    filter(!is.na(image)) |>
    collection_datatable()

Modeling

I’ll now examine the predictive models trained on the user’s collection.

For an individual user, I train a predictive model on their collection in order to predict whether a user owns a game. The outcome, in this case, is binary: does the user have a game listed in their collection or not? This is the setting for training a classification model, where the model aims to learn the probability that a user will add a game to their collection based on its observable features.

How does a model learn what a user is likely to own? The training process is a matter of examining historical games and finding patterns that exist between game features (designers, mechanics, playing time, etc) and games in the user’s collection.

I make use of many potential features for games, the vast majority of which are dummies indicating the presence or absence of things such as a publisher, artist, or designer. The “standard” BGG features for every game contain information that is typically listed on the box: its playing time, player counts, or recommended minimum age.
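
To make the idea of dummy features concrete, here is a minimal base R sketch that turns a list of mechanics per game into 0/1 indicator columns. The game names and mechanics are hypothetical toy values, and this is not necessarily how the project's own preprocessing builds its features.

```r
# Hypothetical toy data: each game lists one or more mechanics.
mechanics <- list(
    "Game A" = c("Pattern Building", "Tile Placement"),
    "Game B" = c("Area Majority", "Variable Player Powers"),
    "Game C" = c("Hand Management", "Variable Player Powers")
)

# Every distinct value becomes a candidate dummy column.
levels_all <- sort(unique(unlist(mechanics)))

# One row per game, one 0/1 column per mechanic.
dummies <- t(vapply(
    mechanics,
    function(x) as.integer(levels_all %in% x),
    integer(length(levels_all))
))
colnames(dummies) <- levels_all
print(dummies)
```

With thousands of publishers, designers, and artists, most of these columns are zero for most games, which is one reason a regularized model is a natural fit.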

Note

I train models to predict whether a user owns a game based only on information that could be observed about the game at its release: playing time, player count, mechanics, categories, genres, and selected designers, artists, and publishers. I do not make use of BGG community information, such as its average rating, weight, or number of user ratings. This is to ensure the model can predict newly released games without relying on information from the BGG community.

What Predicts A Collection?

A predictive model gives us more than just predictions. We can also ask, what did the model learn from the data? What predicts the outcome? In the case of predicting a boardgame collection, what did the model find to be predictive of games a user has in their collection?

To answer this, I examine the coefficients from a logistic regression model with ridge regularization (which I will refer to as a penalized logistic regression).

Positive values indicate that a feature increases a user’s probability of owning/rating a game, while negative values indicate a feature decreases the probability. To be precise, the coefficients indicate the effect of a particular feature on the log-odds of a user owning a game.
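
As a rough illustration of what a coefficient on the log-odds scale means, the sketch below converts log-odds into probabilities. The intercept and coefficient are made-up values chosen for the example; they are not estimates from the fitted model.

```r
# Hypothetical values for illustration only; not taken from the fitted model.
intercept <- -5.0     # baseline log-odds of owning a game
coef_feature <- 1.2   # coefficient on a 0/1 feature, e.g. a particular designer

# plogis() is the inverse logit: it maps log-odds to a probability.
p_without <- plogis(intercept)
p_with <- plogis(intercept + coef_feature)

# exp(coefficient) is the multiplicative change in the odds of owning.
odds_ratio <- exp(coef_feature)

cat(sprintf("P(own) without feature: %.4f\n", p_without))
cat(sprintf("P(own) with feature:    %.4f\n", p_with))
cat(sprintf("Odds ratio:             %.2f\n", odds_ratio))
```

Because the baseline probability of owning any given game is small, even a large positive coefficient moves the probability by only a few percentage points; the odds ratio is often the easier quantity to compare across features.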

The following visualization shows the path of each feature as it enters the model, with highly influential features tending to enter the model early with large positive or negative effects. The dotted line indicates the level of regularization that was selected during tuning.

model_glmnet |>
    pluck("wflow", 1) |>
    trace_plot.glmnet(max.overlaps = 30) +
    facet_wrap(~ params$username)

Partial Effects

What are the effects of individual features?

Use the buttons below to examine the effects different types of predictors had in predicting the user’s collection.

Assessment

How well did the model do in predicting the user’s collection?

This section contains a variety of visualizations and metrics for assessing the performance of the model(s). If you’re not particularly interested in predictive modeling, skip down further to the predictions from the model.

The following table displays the model’s performance in resampling on the training set, on the validation set, and on a holdout set of upcoming games.

metrics |>
    # round numeric columns for display
    mutate(across(where(is.numeric), \(x) round(x, 3))) |>
    # spread metrics into one column each
    pivot_wider(
        names_from = .metric,
        values_from = .estimate
    ) |>
    gt::gt() |>
    gt::sub_missing() |>
    gt_options()

username	wflow_id	type	.estimator	mn_log_loss	roc_auc	pr_auc
aboardgamebarrage	glmnet	resamples	binary	0.034	0.894	0.111
aboardgamebarrage	glmnet	test	binary	0.010	0.951	0.003
aboardgamebarrage	glmnet	valid	binary	0.024	0.825	0.021
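
Because owned games are so rare, these metrics are easier to read next to a no-information baseline. The sketch below uses the owned and not-owned counts reported earlier for each split to compute the share of owned games (roughly the PR AUC expected from ranking games at random) and the log loss of always predicting that share. This is a back-of-the-envelope comparison, not part of the original analysis, and the resamples row is only approximately comparable to the train counts.

```r
# Counts of not-owned / owned games per split, copied from the split table in this report.
splits <- data.frame(
    type = c("train", "valid", "test"),
    no = c(26170, 10267, 8590),
    yes = c(192, 41, 3)
)

# Share of games the user owns in each split.
splits$prevalence <- splits$yes / (splits$no + splits$yes)

# Log loss of a constant prediction equal to the prevalence
# (a no-information baseline for mn_log_loss).
splits$baseline_log_loss <- with(
    splits,
    -(prevalence * log(prevalence) + (1 - prevalence) * log(1 - prevalence))
)

print(transform(
    splits,
    prevalence = round(prevalence, 4),
    baseline_log_loss = round(baseline_log_loss, 4)
))
```

With only three owned games in the test set, any metric computed there is very noisy and should be read with caution.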

An easy way to visually examine the performance of a classification model is to view a separation plot.

I plot the predicted probabilities from the model for every game (during resampling) from lowest to highest. I then overlay a blue line for any game that the user does own. A good classifier is one that is able to separate the blue (games owned by the user) from the white (games not owned by the user), with most of the blue occurring at the highest probabilities (left side of the chart).
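
The ordering behind a separation plot can be sketched with base R graphics. The predictions and ownership flags below are hypothetical toy values, and this is not the `plot_separation()` helper used in the report. Note that this sketch draws the highest probabilities on the right; the report's chart may orient the axis differently.

```r
# Hypothetical toy data: predicted probabilities and whether the game is owned.
pred <- c(0.02, 0.10, 0.08, 0.01, 0.05, 0.03, 0.40, 0.04, 0.65, 0.30)
own <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 1)

# Order games from lowest to highest predicted probability.
ord <- order(pred)
sep <- data.frame(
    position = seq_along(pred),
    pred = pred[ord],
    own = own[ord]
)

# Empty canvas, then a blue vertical line for each owned game,
# then the sorted predicted probabilities drawn on top.
plot(
    sep$position, sep$pred,
    type = "n", ylim = c(0, 1),
    xlab = "Games ordered by predicted probability",
    ylab = "Predicted probability"
)
abline(v = sep$position[sep$own == 1], col = "blue")
lines(sep$position, sep$pred)
```

In a well-separated model the blue lines cluster at the high-probability end, as they do in this toy example.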

preds |>
    filter(type %in% c("resamples", "valid")) |>
    plot_separation(outcome = params$outcome)

I can more formally assess how well each model did in resampling by looking at the area under the ROC curve (roc_auc). A perfect model would receive a score of 1, while a model that cannot predict the outcome would score about 0.5. What counts as a good score depends on the setting, but generally anything in the 0.8 to 0.9 range is very good, while the 0.7 to 0.8 range is perfectly acceptable.
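
One way to build intuition for the ROC AUC is that it equals the probability that a randomly chosen owned game receives a higher predicted probability than a randomly chosen game the user does not own. The base R sketch below computes it from ranks; `auc_rank()` is a hypothetical helper written for this example and is not the function used to produce the metrics in this report.

```r
# Rank-based AUC (equivalent to the Mann-Whitney U statistic, scaled to [0, 1]).
# truth: 0/1 vector of outcomes; pred: predicted probabilities.
auc_rank <- function(truth, pred) {
    r <- rank(pred)  # ties receive average ranks
    n_pos <- sum(truth == 1)
    n_neg <- sum(truth == 0)
    (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical toy data.
own <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 1)
pred <- c(0.02, 0.10, 0.08, 0.01, 0.05, 0.03, 0.40, 0.04, 0.65, 0.30)

auc_toy <- auc_rank(own, pred)

# A perfect ranking, and a model that gives every game the same score.
auc_perfect <- auc_rank(c(0, 0, 1, 1), c(0.1, 0.2, 0.8, 0.9))
auc_constant <- auc_rank(c(0, 0, 1, 1), rep(0.5, 4))

print(c(toy = auc_toy, perfect = auc_perfect, constant = auc_constant))
```

The last two calls reproduce the reference points described above: a perfect ranking scores 1, and a model that cannot distinguish games scores 0.5.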

preds |>
    nest(data = -c(username, wflow_id, type)) |>
    mutate(
        roc_curve = map(
            data,
            safely(~ .x |> safe_roc_curve(truth = params$outcome))
        )
    ) |>
    mutate(result = map(roc_curve, ~ .x |> pluck("result"))) |>
    select(username, wflow_id, type, result) |>
    unnest(result) |>
    plot_roc_curve()

Top Games in Training

What were the model’s top games in the training set?

preds |>
    filter(type == "resamples") |>
    prep_predictions_datatable(
        games = games,
        outcome = params$outcome
    ) |>
    predictions_datatable(
        outcome = params$outcome,
        remove_description = TRUE,
        remove_image = TRUE,
        pagelength = 15
    )

Top Games in Validation

What were the model’s top games in the validation set?

preds |>
    filter(type %in% c("valid")) |>
    prep_predictions_datatable(
        games = games,
        outcome = params$outcome
    ) |>
    predictions_datatable(
        outcome = params$outcome,
        remove_description = TRUE,
        remove_image = TRUE,
        pagelength = 15
    )

Top Games by Year

The following table displays the model’s top 15 games for each of the 15 most recent years in the training and validation sets.

preds |>
    filter(type %in% c("resamples", "valid")) |>
    top_n_preds(
        games = games,
        outcome = params$outcome,
        top_n = 15,
        n_years = 15
    ) |>
    gt_top_n(collection = collection |> prep_collection())

Rank	2009	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021	2022	2023
1	Dominion: Intrigue	Innovation	A Fake Artist Goes to New York	The Resistance: Avalon	Impulse	Maskmen	Mottainai	Scythe	Azul	Cosmic Encounter: 42nd Anniversary Edition	Dune	Project L	Brian Boru: High King of Ireland	SPYBAM	TerraFyte
2	Telestrations	Earth Reborn	Omen: A Reign of War	Archipelago	A Study in Emerald	Nyakuza	7 Wonders Duel	Insider	Dungeon of Mandom VIII	Mr. Face	Noctiluca	Nidavellir	Kemet: Blood and Sand – Kickstarter Edition	Cat in the Box: Deluxe Edition	La Granja: Deluxe Master Set
3	Steam	Dominion: Big Box	Discworld: Ankh-Morpork	Terra Mystica	Hemloch: Vault of Darkness	Spyfall	The Game + The Game on Fire	Omen: Edge of the Aegean	WOO	Brass: Birmingham	Chocolate Factory	Mezo	For Sale Autorama	Kongkang: The Wild Party	The Big Crunch
4	Chaos in the Old World	Politico: The Fall of Caesar	The Castles of Burgundy	Rex: Final Days of an Empire	Kobayakawa	Chimera	Elysium	Turin Market	Merchants of Muziris	Rising Sun	SCOUT	Omen: Heir to the Dunes	Import / Export: Definitive Edition	Make the Difference	Rafter Five
5	Hansa Teutonica	Time's Up! Family	Vanuatu	Tooth & Nail: Factions	Francis Drake	Red7	The King Is Dead	Junk Art	One Night Ultimate Alien	Pencil Nose!	Omen: Fires in the East	Insider Black	Kemet: Blood and Sand	White Elephant: A Gift Exchange	The Same Game
6	Samurai: The Card Game	7 Wonders	Puerto Rico	Machi Koro	Five Cucumbers	La Isla	Trans-Siberian Railroad	Reign of Cthulhu	Startups	TOKYO METRO	Obscurio	The Red Cathedral	The Diamond Swap	Revive	CoGNaC
7	Greed Incorporated	London	A Game of Thrones: The Board Game (Second Edition)	Targi	Dominion: Special Edition	Tricks & Deserts	TROLL	Game of Thrones: The Iron Throne	The Quest for El Dorado	The Mind	Fafnir	Hello Neighbor: The Secret Neighbor Party Game	Bad Company	Hunch!	Avalonia
8	Jaipur	In a Grove	City Tycoon	The Great Zimbabwe	Relic	Kingdom Builder: Big Box	GEM	Lorenzo il Magnifico	Troika	VOID	Nanty Narking	Hues and Cues	Evil Corp	1877: Stockholm Tramways	The White Castle
9	Time's Up! Academy	Bhazum	Friday	Fleet	Habe fertig	The Nile Ran Red	Rights	Akua	Breaking Bad: The Board Game	Just One	Detective: City of Angels	Merv: The Heart of the Silk Road	Lorenzo il Magnifico: Big Box	Order Overload: Cafe	Disrupt
10	Finca	Irondale	Artus	Uchronia	The Valkyrie Incident	Power Grid Deluxe: Europe/North America	Keep	Neolithic	Calimala	Newton	We Need to Talk	The Cost	Biblios: Quill and Parchment	A Game of Thrones: B'Twixt	Spears and Bones
11	The Resistance	Runewars	Takenoko	Android: Infiltration	Room 25	Spike	Het Koninkrijk Dominion	Let Them Eat Cake	Innovation Deluxe	Moneybags	One Night Ultimate Super Heroes	The Game of Fuzzy Logic	Mercado de Lisboa	Gateway Island	Yarr!: Stranded Scoundrels
12	American Rails	De Vulgari Eloquentia	Mundus Novus	Kemet	Age of Assassins	Port Royal	One Night Ultimate Werewolf: Daybreak	Hit Z Road	Import / Export Captain Edition	Everdell	TOKYO GAME SHOW	13 Monsters	Moon Adventure	Desamparados: Stalingrado	Craft my Agenda
13	Kuhhandel Master	Mystery Express	Pictomania	Tzolk'in: The Mayan Calendar	Glass Road	Imperial Settlers	Hordes of Grimoor	One Night Ultimate Vampire	The Game: Face to Face	Decrypto	Miskatonic University: The Restricted Collection	Sacred Rites	Dirge: The Rust Wars	GridL	Zombie Sniper
14	Automobile	The Hobbit	Tournay	The Manhattan Project	Amerigo	Pandemic: Contagion	Codenames	Codenames: Deep Undercover	Import / Export	The Binding of Isaac: Four Souls	The North	Long Live the King: A Game of Secrecy and Subterfuge	Dune: Betrayal	The Middle Ages	Doodle Heist
15	Filipino Fruit Market	High Frontier	Singapore	Serenissima (Second Edition)	Lewis & Clark: The Expedition	The Battle at Kemble's Cascade	Watson & Holmes	SYNOD	Cartouche Dynasties	Camel Up (Second Edition)	Mega Empires: The West	TOKYO TSUKIJI MARKET	Moving Pictures	Blood on the Clocktower	Pizzachef
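
A table like the one above can be built by ranking games within each publication year and then reshaping to one column per year. The base R sketch below shows the idea on hypothetical toy data; the column name `.pred_yes` and the cutoff `top_n` are assumptions made for the example, and this is not the `top_n_preds()` helper used in the report.

```r
# Hypothetical toy predictions: one row per game.
preds_toy <- data.frame(
    name = paste("Game", LETTERS[1:6]),
    yearpublished = c(2022, 2022, 2022, 2023, 2023, 2023),
    .pred_yes = c(0.20, 0.05, 0.40, 0.10, 0.30, 0.02)
)
top_n <- 2

# Sort by year, then by descending predicted probability.
sorted <- preds_toy[order(preds_toy$yearpublished, -preds_toy$.pred_yes), ]

# Rank games within each year (1 = highest predicted probability).
sorted$rank <- ave(sorted$.pred_yes, sorted$yearpublished, FUN = seq_along)

# Keep the top games per year and spread years into columns.
top <- sorted[sorted$rank <= top_n, c("rank", "yearpublished", "name")]
wide <- reshape(top, idvar = "rank", timevar = "yearpublished", direction = "wide")
print(wide)
```

Ranking within year rather than overall keeps a handful of prolific years from crowding out the rest of the table.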

Predictions

New and Upcoming Games

What were the model’s top predictions for new and upcoming board game releases?

new_preds |>
    filter(type == "upcoming") |>
    # imposing a minimum threshold to filter out games with no info
    filter(usersrated >= 1) |>
    # removing this goddamn boxing game that has every mechanic listed
    filter(game_id != 420629) |>
    prep_predictions_datatable(
        games = games_new,
        outcome = params$outcome
    ) |>
    predictions_datatable(outcome = params$outcome)

Older Games

What were the model’s top predictions for older games?