Decomposition of Reinforcement Learning Deficits in Disordered Gambling via Drift Diffusion Modeling and Functional Magnetic Resonance Imaging

Gambling disorder is associated with deficits in reward-based learning, but the underlying computational mechanisms are still poorly understood. Here, we examined this issue using a stationary reinforcement learning task in combination with computational modeling and functional magnetic resonance imaging (fMRI) in individuals who regularly participate in gambling (n = 23; seven fulfilled one to three DSM-5 criteria for gambling disorder, sixteen fulfilled four or more) and matched controls (n = 23). As predicted, the gambling group exhibited substantially reduced accuracy, whereas overall response times (RTs) were not reliably different between groups. We then applied reinforcement learning drift diffusion models (RLDDMs) in combination with hierarchical Bayesian parameter estimation to shed light on the computational underpinnings of this performance deficit. In both groups, an RLDDM in which both non-decision time and decision threshold (boundary separation) changed over the course of the experiment accounted for the data best. The model showed good parameter and model recovery, and posterior predictive checks revealed that, in both groups, the model accurately reproduced the evolution of accuracies and RTs over time. Modeling revealed that, compared to controls, the learning impairment in the gambling group was linked to a more rapid reduction in decision thresholds over time and a reduced impact of value differences on the drift rate. The gambling group also showed shorter non-decision times. The fMRI analyses replicated effects of prediction error coding in the ventral striatum and value coding in the ventro-medial prefrontal cortex, but there was no credible evidence for group differences in these effects. Taken together, our findings show that reinforcement learning impairments in disordered gambling are linked to both maladaptive decision threshold adjustments and a reduced consideration of option values in the choice process.
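To make the model class described above more concrete, the following is a minimal simulation sketch of an RLDDM in Python. It is not the authors' implementation. The function names, the reward probabilities, the linear mapping from value difference to drift rate, and the exponential change of decision threshold and non-decision time across trials are all illustrative assumptions; the text only states that value differences affect the drift rate and that threshold and non-decision time change over the course of the experiment.

```python
import math
import random


def simulate_ddm_trial(drift, boundary, ndt, rng, dt=0.001, noise_sd=1.0, max_time=5.0):
    """Simulate one diffusion trial with Euler steps.

    Returns (hit_upper, rt): hit_upper is 1 if the upper boundary was reached
    (or the process was closer to it at timeout), else 0.
    """
    x = boundary / 2.0  # unbiased starting point (assumption)
    t = 0.0
    step_sd = noise_sd * math.sqrt(dt)
    while 0.0 < x < boundary and t < max_time:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    hit_upper = 1 if x >= boundary / 2.0 else 0
    return hit_upper, ndt + t


def simulate_rlddm(n_trials, lr, v_coeff, a0, a_slope, ndt0, ndt_slope,
                   p_reward=(0.8, 0.2), seed=0):
    """Simulate a two-option learning task with a hypothetical RLDDM.

    Option 0 is mapped to the upper boundary. p_reward holds assumed
    stationary reward probabilities for options 0 and 1.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]  # initial action values (assumption)
    trials = []
    for t in range(n_trials):
        # Drift rate scales linearly with the value difference (assumption).
        drift = v_coeff * (q[0] - q[1])
        # Threshold and non-decision time change across trials (assumed form).
        boundary = a0 * math.exp(a_slope * t)
        ndt = ndt0 * math.exp(ndt_slope * t)
        hit_upper, rt = simulate_ddm_trial(drift, boundary, ndt, rng)
        chosen = 0 if hit_upper == 1 else 1
        reward = 1.0 if rng.random() < p_reward[chosen] else 0.0
        # Delta-rule update driven by the reward prediction error.
        q[chosen] += lr * (reward - q[chosen])
        trials.append((chosen, rt, reward))
    return trials


if __name__ == "__main__":
    data = simulate_rlddm(n_trials=60, lr=0.1, v_coeff=3.0, a0=1.5,
                          a_slope=-0.005, ndt0=0.3, ndt_slope=-0.002)
    accuracy = sum(1 for chosen, _, _ in data if chosen == 0) / len(data)
    print(f"proportion of choices of the better option: {accuracy:.2f}")
```

In this sketch, a more negative `a_slope` corresponds to a faster reduction of the decision threshold over trials, and a smaller `v_coeff` corresponds to a weaker impact of value differences on the drift rate, which are the two effects the abstract links to the learning impairment.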

Wiehler & Peters: Decomposition of reinforcement learning deficits in disordered gambling via drift diffusion modeling and functional magnetic resonance imaging.

Supplemental material
Supplemental Figure 1. Parameter recovery simulation results for RLDDM8 (dual learning rates, modulated decision threshold and non-decision time). a) Generating vs. estimated single-subject parameters across all 10 simulations. b) Control group parameter means. c) Gambling group parameter means. d) Control group parameter standard deviations. e) Gambling group parameter standard deviations. In b-e, squares denote the generating parameter, and vertical lines denote the 95% highest posterior density interval of the parameter estimate.

Supplemental Figure 2. Parameter recovery simulation results for RLDDM4 (single learning rate, modulated decision threshold and modulated non-decision time). a) Generating vs. estimated single-subject parameters across all 10 simulations. Panels b-

Supplemental Figure 3. Model recovery results. In addition to the best-fitting model (RLDDM8), model recovery focused on those models that exhibited overlap with RLDDM8 in terms of the 95% CI of the -elpd score in at least one group. This was the case for RLDDMs 4 and 6. We simulated n = 20 full datasets from each of the three models and re-fit the simulated data with all nine models from our model space. Plotted is the percentage of simulations in which the true data-generating model was recovered (True model) and the percentage of simulations in which some other model accounted for the data best (Other). Recovery was successful in > 70% of simulations for both RLDDM4 and RLDDM8, whereas it was < 50% for RLDDM6. Note that chance level is 11.11% (one in nine models).

Supplemental Figure 4. Individual-subject posterior predictive checks for the control group and RLDDM8. Plotted are individual-subject observed RT distributions (blue histograms) and model-simulated RT distributions (grey lines, smoothed histograms of 1k RT distributions simulated from the model's posterior distribution).

Supplemental Figure 5. Individual-subject posterior predictive checks for the gambling disorder group and RLDDM8. Plotted are individual-subject observed RT distributions (red histograms) and model-simulated RT distributions (grey lines, smoothed histograms of 1k RT distributions simulated from the model's posterior distribution).

Supplemental Figure 6. Posterior predictive checks for RT changes over the course of learning in individual control group participants. Black lines denote observed mean RTs per trial bin. Solid blue lines denote mean RTs across 1k simulated data sets from the RLDDM8 posterior distribution. Dashed lines denote the +/-95% percentile of the simulated RTs.

Supplemental Figure 10. Upper panels: Softmax model posterior distributions of group mean learning rates (a) and softmax inverse temperatures (b) for controls (blue) and gamblers (red). Lower panels: posterior group differences per parameter (controls - gamblers). Solid (thin) horizontal lines in the lower panels denote 85% (95%) highest posterior density intervals. Note that learning rates were fitted in standard normal space [-3, 3] as plotted here, and were back-transformed to the interval [0, 1] via the standard normal cumulative distribution function (the inverse probit transform).

RLDDM4
Supplemental Table 3. Group differences: Mean posterior group differences in model parameters (Mdiff) and Bayes factors testing for directional effects (dBF). dBF values > 1 quantify the degree of evidence for a reduction in a parameter in gamblers vs. controls compared to the evidence for an increase. dBF values < 1 reflect the reverse.

[...] of evidence for a reduction in a parameter in gamblers vs. controls compared to the evidence for an increase. dBF values < 1 reflect the reverse. Directional test refers to tests for directional effects of the learning rate difference performed separately per group.
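Two quantities mentioned in the captions can be illustrated with a short Python sketch: the back-transformation of learning rates from standard normal space to [0, 1], and the directional Bayes factor (dBF). This is a hedged illustration, not the authors' code. It assumes that the back-transformation is the standard normal cumulative distribution function, which is the mapping that takes values from standard normal space to [0, 1], and that a dBF can be approximated as the ratio of posterior samples of the group difference (controls - gamblers) above versus below zero. Both function names are hypothetical.

```python
from statistics import NormalDist


def back_transform_learning_rate(eta_normal_space):
    """Map a learning rate from standard normal space to [0, 1].

    Assumes the standard normal CDF as the back-transformation.
    """
    return NormalDist().cdf(eta_normal_space)


def directional_bayes_factor(diff_samples):
    """Approximate a dBF from posterior samples of (controls - gamblers).

    Values > 1 favour a reduction of the parameter in gamblers; values < 1
    favour an increase. Uses raw sample counts (an assumption; a density
    estimate over the posterior could be used instead).
    """
    above = sum(1 for d in diff_samples if d > 0)
    below = sum(1 for d in diff_samples if d < 0)
    if below == 0:
        return float("inf")  # all posterior mass lies above zero
    return above / below


if __name__ == "__main__":
    # Hypothetical posterior samples, for illustration only.
    samples = [0.4, 0.1, -0.2, 0.3, 0.5, -0.1, 0.2, 0.6]
    print(directional_bayes_factor(samples))
    print(back_transform_learning_rate(-1.0))
```

With sample counts, the estimate becomes unstable when almost all posterior mass lies on one side of zero, so very large or very small dBF values from such a sketch should be read as "strong evidence" rather than as precise numbers.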