The authors have declared that no competing interests exist.

‡ These authors are joint senior authors on this work.

Decision formation recruits many brain regions, but the procedure they jointly execute is unknown. Here we characterize its essential composition, using as a framework a novel recursive Bayesian algorithm that makes decisions based on spike-trains with the statistics of those in sensory cortex (MT). Using it to simulate the random-dot-motion task, we demonstrate it quantitatively replicates the choice behaviour of monkeys, whilst predicting losses of otherwise usable information from MT. Its architecture maps to the recurrent cortico-basal-ganglia-thalamo-cortical loops, whose components are all implicated in decision-making. We show that the dynamics of its mapped computations match those of neural activity in the sensorimotor cortex and striatum during decisions, and forecast those of basal ganglia output and thalamus. This also predicts which aspects of neural dynamics are and are not part of inference. Our single-equation algorithm is probabilistic, distributed, recursive, and parallel. Its success at capturing anatomy, behaviour, and electrophysiology suggests that the mechanism implemented by the brain has these same characteristics.

Decision-making is central to cognition. Abnormally-formed decisions characterize disorders like over-eating, Parkinson’s and Huntington’s diseases, OCD, addiction, and compulsive gambling. Yet, a unified account of decision-making has, hitherto, remained elusive. Here we show the essential composition of the brain’s decision mechanism by matching experimental data from monkeys making decisions, to the knowable function of a novel statistical inference algorithm. Our algorithm maps onto the large-scale architecture of decision circuits in the primate brain, replicating the monkeys’ choice behaviour and the dynamics of the neural activity that accompany it. Validated in this way, our algorithm establishes a basic framework for understanding the mechanistic ingredients of decision-making in the brain, and thereby, a basic platform for understanding how pathologies arise from abnormal function.

Decisions rely on evidence that is collected for, accumulated about, and contrasted between available options. Neural activity consistent with evidence accumulation over time has been reported in parietal and frontal sensorimotor cortex [

Multiple models of decision making match aspects of recorded choice behaviour, associated neural activity or both [

Here we test the hypothesis that the brain implements an approximation to an exact inference algorithm for decision making. We show that the algorithm reproduces behaviour quantitatively while the dynamics of its inner variables match those of corresponding neural signals on the random dot motion task—a highly developed paradigm to probe decision formation. By doing so, we predict how experimentally-acquired snapshots of neural activity map onto inference operations. We show this mapping accounts for the involvement of full recurrent cortico-subcortical loops in decision making. Evidence accumulation is thus predicted to occur over the entire loops, not just within cortex. Introducing this algorithm enables us to predict which aspects of neural activity are necessary for inference—hence decision-making—and which are not. For instance, recent data questioned whether non-increasing cortical firing rates encode evidence accumulation during decisions [

Our algorithm explains the decision-correlated experimental data more comprehensively than any prior model, thus introducing a new, cohesive formal framework to interpret it. Collectively, our analyses and simulations indicate that mammalian decision-making is implemented as a probabilistic, recursive, parallel procedure distributed across the cortico-basal-ganglia-thalamo-cortical loops.

We tested our algorithm against behavioural and electrophysiological data recorded in sensorimotor cortex [

(a) Fixed duration task for MT recordings [

During the dot motion task, neurons in the middle-temporal visual area (MT) respond more vigorously to visual stimuli moving in their “preferred” direction than in the opposite “null” direction [

Normative algorithms are useful benchmarks to test how well the brain approximates an optimal probabilistic computation. The family of the multi-hypothesis sequential probability ratio test (MSPRT) [

We now conceptually review the MSPRT and introduce the rMSPRT (

Circles joined by arrows are the Bayes’ rule. All

The MSPRT is a special case of the rMSPRT (in its general form in Eqs

Inference using recursive and non-recursive forms of Bayes’ rule gives the same results (
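The equivalence between the recursive and non-recursive forms can be checked numerically. The sketch below is not the authors' implementation. It assumes independent observations, Gaussian likelihoods with hypothetical means (instead of the lognormal ISI likelihoods used in the paper), and an illustrative loop delay `DELTA`. The recursive form uses the posterior from `DELTA` steps back as the prior for the most recent `DELTA` observations, and falls back to the fixed prior while fewer than `DELTA` observations are available.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, DELTA = 3, 12, 3                 # hypotheses, observations, loop delay (all illustrative)
means = np.array([0.0, 0.5, 1.0])      # hypothetical Gaussian mean under each hypothesis
x = rng.normal(means[2], 1.0, size=T)  # data generated under the last hypothesis

def log_lik(obs):
    """Log-likelihood of a block of observations under each hypothesis.
    Unit variance; hypothesis-independent constants are dropped."""
    return np.array([-0.5 * np.sum((obs - m) ** 2) for m in means])

def normalise(log_p):
    """Turn unnormalised log-probabilities into a probability vector."""
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

log_prior = np.log(np.full(N, 1.0 / N))  # flat fixed priors

# Non-recursive form: fixed prior times the likelihood of the whole sequence.
batch = [normalise(log_prior + log_lik(x[:t])) for t in range(1, T + 1)]

# Recursive form: the posterior at time t - DELTA is re-used as the prior
# for the last DELTA observations.
recursive = []
for t in range(1, T + 1):
    if t <= DELTA:
        post = normalise(log_prior + log_lik(x[:t]))
    else:
        past_post = recursive[t - DELTA - 1]  # posterior at time t - DELTA
        post = normalise(np.log(past_post) + log_lik(x[t - DELTA:t]))
    recursive.append(post)

print(np.allclose(batch, recursive))
```

Both lists hold the posterior over hypotheses at every time step. The recursive version only ever touches the last `DELTA` observations, which is the memory advantage the recursion provides.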

The hypothesis that the brain approximates an exact inference algorithm during decision formation is so far untested. This requires showing how uncertain sensory spike-trains can be transformed into the experimentally recorded choices. We do so here for the first time by comparing the predicted choice reaction times of the (r)MSPRT to those of monkeys performing the random dot motion task. We sought to account for the reaction time dependence on three factors: the coherence of the dot motion, the number of decision alternatives, and the trial’s outcome (error, correct). We use a particular instance of rMSPRT (Eqs

We assume that during the random dot motion task (

Using these data-determined MT statistics, the (r)MSPRT predicts that the mean decision time on the dot motion task is a decreasing function of coherence (

(a) Comparison of the mean reaction time of monkeys for 2 and 4 alternatives (lines) with that predicted by (r)MSPRT (markers), both for correct trials. Red line: assumed 250 ms of non-decision time. Simulation values are means over 100 Monte Carlo experiments each comprising 3200, 4800 total trials for

The (r)MSPRT framework suggests that decision times directly depend on the discrimination information in the evidence. Discrimination information here is measured as the divergence between pairs of distributions of ISIs (those in
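As a rough illustration of how such a divergence can be computed, the sketch below assumes that the measure is the Kullback-Leibler divergence from the preferred to the null ISI distribution, and that both are lognormal and specified by a mean and a standard deviation. The choice of divergence, its direction, and the function names are assumptions made for the example. The numbers are the 25.6% coherence entries of the parameter table: the preferred statistics, the null statistics in Ω, and one of the depleted null sets in Ω_{d}.

```python
import numpy as np

def lognormal_params(mean, sd):
    """Mean and SD of log(ISI) for a lognormal with the given mean and SD."""
    s2 = np.log(sd ** 2 / mean ** 2 + 1.0)
    return np.log(mean) - 0.5 * s2, np.sqrt(s2)

def kl_lognormal(p, q):
    """KL divergence D(p || q) between two lognormals, each given as (mean, sd).
    It equals the KL divergence between the underlying normals of log(ISI)."""
    mp, sp = lognormal_params(*p)
    mq, sq = lognormal_params(*q)
    return np.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5

preferred = (37.7, 28.0)      # preferred-direction mean and SD, 25.6% coherence
null_full = (70.2, 37.2)      # null-direction statistics in the set Ω
null_depleted = (62.0, 34.9)  # null-direction statistics in one depleted set Ω_d

info_full = kl_lognormal(preferred, null_full)
info_depleted = kl_lognormal(preferred, null_depleted)
print(info_full, info_depleted)
```

Under these assumptions the depleted pair of distributions carries less discrimination information per observation than the original pair, so more observations are needed to reach the same level of confidence.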

To verify if this information loss alone could account for the monkeys’ deviation from the (r)MSPRT upper bounds, we depleted the discrimination information of its input distributions to exactly match the estimated monkey loss in

Using these information-depleted statistics, the mean reaction times predicted by the (r)MSPRT in correct trials closely match those of monkeys (

(a, b) Mean reaction time of monkeys (lines) with 99% Chebyshev confidence intervals (shading) and (r)MSPRT predictions for correct (a; _{d}). (r)MSPRT results are means of 100 simulations with 3200, 4800 total trials each for

The reaction time distributions of the algorithm closely resemble those of monkeys in that they are positively skewed and exhibit shorter right tails for higher coherence levels (

The above shows that the (r)MSPRT family of exact inference algorithms can account for the dependence of choice reaction times on task difficulty, trial outcome, and the number of alternatives. But replicating behaviour alone does not tell us if the brain implements a similar computation during decisions. We thus asked whether the inner variables of the rMSPRT could account for the known dynamics of neural activity in cortex and striatum during the dot-motion task. To answer this, we must first map its components to a neural circuit. The rMSPRT is the first probabilistic model of decision able to handle recursion and arbitrary signal delays, which means that in principle it could map to a range of feedback neural circuits. Because cortex [

In the visuo-motor system, MT projects to the lateral intra-parietal area (LIP) and frontal eye fields (FEF), two ‘sensorimotor cortex’ areas. The basal ganglia receive topographically organized afferent projections [

Multiple parallel recurrent loops connecting cortex, basal ganglia and thalamus can be traced anatomically [

Our mapping of computations within the rMSPRT to the cortico-basal-ganglia-thalamo-cortical loop is shown in

(a) Mapping of the negative logarithm of rMSPRT components from _{pq}, from _{yb} + _{bu} + _{uy} with the requirement Δ ≥ 1.

Also, negative log-posteriors will tend to decrease for the best supported hypothesis and increase otherwise. This is consistent with the idea of basal ganglia output nuclei (

Lastly, our mapping of rMSPRT provides an account for the spatially diffuse cortico-thalamic projection [

The mapping of rMSPRT to cortico-subcortical circuits produces key, testable predictions. First, that sensorimotor areas like LIP or FEF in the cortex evaluate the plausibility of all available alternatives in parallel, based on the evidence produced by MT, and join this to any initial bias. Second, that as these signals traverse the basal ganglia, they compete, resulting in a decision variable per alternative. Third, that the basal ganglia output nuclei use these to assess whether to make a final choice and what alternative to pick. Fourth, that decision variables are returned to sensorimotor cortex via thalamus, to become a fresh bias carrying all conclusions on the decision so far. The rMSPRT thus predicts that evidence accumulation happens uninterruptedly in the overall, large-scale loop, rather than in a single site.

With the mapping above, we can compare the dynamics of rMSPRT computations to those of recorded activity during decision-making in area LIP and striatum. We first consider the dynamics around decision initiation. During the dot motion task, the mean firing rate of LIP neurons deviates from baseline into a stereotypical dip soon after stimulus onset, possibly indicating the reset of a neural integrator [

(a, b) Mean population firing rate of LIP neurons during correct trials on the reaction-time version of the dot motion task (19 neurons). By convention, inRF trials are those when recorded neurons had the motion-cued target inside their response field (solid lines); outRF trials are those when that target was outside the neuron’s response field (dashed lines). (a) Aligned at stimulus onset, starting at the stereotypical dip, illustrating the “ramp-and-fork” pattern between average inRF and outRF responses. (b) Aligned at saccade onset (vertical dashed line). (c, d) Mean time course of the model sensorimotor cortex in rMSPRT aligned at decision initiation (c; _{d}. For simplicity the (r)MSPRT is simulated in discrete time steps, but these have an interpretation in continuous time (see _{i}(_{yb}) = 0 for all _{i}) = −log (1/

(a-j)

The model LIP (sensorimotor cortex) in rMSPRT captures each of these properties: activity ramps from the start of the accumulation, forks between putative in- and out-RF responses, and scales with the number of alternatives (

The rMSPRT embodies a mechanistic explanation for the ramp-and-fork pattern in the two cases of

The rMSPRT further predicts that the scaling of activity in sensorimotor sites by the number of alternatives is due to cortico-subcortical loops becoming transiently organized as

The rMSPRT also captures key features of dynamics at decision termination. For inRF trials, the mean firing rate of LIP neurons peaks at or very close to the time of saccade onset (

This prediction may explain an apparent paradox of LIP activity. The peri-saccadic population firing rate peak in LIP during inRF trials (

In the rMSPRT, the striatum relays the input from sensorimotor cortex as an inhibitory drive for downstream basal ganglia nuclei. The rMSPRT has three free parameters that shape the ramp-and-fork of its inner variables, but do not alter inference. We have set their value to show that mapped variables can match the pattern in sensorimotor cortical neural dynamics (see

LIP and striatal firing rates are also modulated by dot-motion coherence (

Our proposed mapping of the rMSPRT’s components (

In detection tasks like visually- or memory-guided ones, the decision cues are extremely obvious. Hence, the accompanying recorded neural-activity transients may be argued to encode very short evidence-accumulations. After all, the accumulation of a single observation (

For visuo-motor thalamus, rMSPRT predicts that the time course of the mean firing rate will exhibit a ramp-and-fork pattern similar to that in LIP (

Understanding how a neural system implements an algorithm is complicated by the need to identify which features are core to executing the algorithm, and which are imposed by the constraints of implementing computations using neural elements—for example, that neurons cannot have negative firing rates, so cannot straightforwardly represent negative numbers. The three free parameters in the rMSPRT allow us to propose which functional and anatomical properties of the cortico-basal-ganglia-thalamo-cortical loop are workarounds within these constraints, but do not affect inference.

One free parameter enforces the baseline activity that LIP neurons maintain before and during the initial stimulus presentation (

Each solid and dashed set of lines is the mean of correct trials in a single Monte Carlo experiment, with 800 total trials, 25.6% coherence and _{yu} = 0, _{yu}. (c) Varying the data scaling factor,

The second free parameter, _{yt}, sets the strength of the spatially diffuse projection from cortex to thalamus. Varying this weight changes the forking between inRF and outRF computations but does not affect inference (

Traditionally, evidence accumulation is exclusively associated with increasing firing rates during decision, and previous studies have questioned whether the often-observed decision-correlated yet non-increasing firing rates (

We tested the hypothesis that the brain approximates exact inference for decision making. We did so by showing that a novel recursive form of the MSPRT, the rMSPRT, uniquely accounts for both monkey choice behaviour and the corresponding neural dynamics in cortex and striatum, while its architecture matches that of the cortico-subcortical decision circuits.

The recursive computation implied by the looped cortico-basal-ganglia-thalamo-cortical architecture has several advantages over local or feedforward computations. First, recursion makes trial-to-trial adaptation of decisions possible. Priors determined by previous stimulation (fed-back posteriors) can bias upcoming similar decisions towards the expected best choice, even before any new evidence is collected. This can shorten reaction times in future familiar settings without compromising accuracy. Second, recursion provides a robust memory. A posterior fed back as a prior is a sufficient statistic of all past evidence observations. That is, it has taken ‘on-board’ all sensory information since the decision onset. In rMSPRT, the sensorimotor cortex needs only to keep track of observations in a moving time window of maximum width Δ (the delay around the cortico-subcortical loop) rather than keeping track of the entire sequence of observations. For a physical substrate subject to dynamics and leakage, like a neuron in LIP or FEF, this has obvious advantages: it would reduce the demand for keeping a perfect record (

The rMSPRT decides faster than monkeys in the same conditions because monkeys do not make full use of the discrimination information available in their MT (

Excellent matches to monkeys’ performance in both correct and error trials, and hence their speed-accuracy trade-offs, were obtained solely by accounting for lost information in the evidence streams. No noise was added within the rMSPRT itself. Prior experimental work reported perfect, noiseless evidence integration by both rat and human subjects performing an auditory task, attributing all effects of noise on task performance to the variability in the sensory input [

Neurons in LIP, FEF [

A random dot stimulus pulse delivered earlier in a trial has a bigger impact on LIP firing rate than a later one [

rMSPRT qualitatively replicates the ramp-and-fork pattern for individual coherence levels and given number of alternatives,

One such constraint is that these brain regions engage in multiple other computations, some of which are likely orthogonal to solving the random dot motion task. The neural activity recorded during decision tasks may then be a transformation of inference computations, by mixing them with all other simultaneous computations. Consistent with this, the successful fitting of previous computational models to neural data [

Inputs to the rMSPRT were determined solely from MT responses during the dot-motion task, and it has only three free parameters, none of which affect inference. It is thus surprising that it renders emergent predictions that are consistent with experimental data. First, our information-depletion procedure used exclusively statistics from correct trials. Yet, after depletion, rMSPRT matches monkey behaviour in correct

The rMSPRT contains all previous instances of the MSPRT [

Biophysical models that directly address neural implementations of decision making are predominantly based on winner-take-all competition between neurons representing different hypotheses [

Mapping any formal algorithm to a neural substrate implies proposing assumed computational contributions for the components of the substrate. In mapping the rMSPRT we made two broad classes of assumptions. First, as explained above, that individual substrates implement multiple functions either simultaneously or under different stimulation scenarios (

Our second class of assumptions is that the omitted connections into and within the basal ganglia may not contribute to the computations essential to inference with cortical inputs. Of note, we have omitted in our mapping the projections from thalamus to striatum [

Demonstrating the compatibility of anatomical pathways with the mapping of the (r)MSPRT is the subject of ongoing research. Success has been achieved in the expansion of the basal-ganglia mapping of the MSPRT to include the pathway from striatum to globus pallidus pars externa and that from the latter to SNr, where the same inference could be done without those pathways [

We sought to characterize the neural mechanism that underlies decisions using a normative algorithm, the rMSPRT, as a framework. We find it remarkable that, starting from data-constrained spike-trains, our monolithic statistical test can simultaneously account for much of the anatomy, behaviour, and electrophysiology of decision-making. While it is not plausible that the brain implements any specific algorithm exactly, our results suggest that the essential composition of its underlying decision mechanism includes the following. First, that the mechanism is probabilistic in nature: the brain utilizes the uncertainty in neural signals, rather than suffering from it. Second, that the mechanism works entirely ‘on-line’, continuously updating representations of hypotheses that can be queried at any time to make a decision. Third, that this processing is distributed, recursive, and parallel, producing a decision variable for each available hypothesis. And fourth, that this recursion allows the mechanism to adapt to the observed statistics of the environment in an unsupervised manner, as it can re-use updated probabilities about hypotheses as priors for upcoming decisions. With the currently available range of experimental studies giving us local snapshots of cortical and subcortical activity during decision-making tasks, the rMSPRT shows us how, where, and when these snapshots fit into a complete inference procedure.

Behavioural and neural data were collected in three previous studies [

Three rhesus macaques (

Two macaques per study learned to fixate their gaze on a central fixation point (

For comparability across databases, we only analysed data from trials with coherence levels of 3.2, 6.4, 12.8, 25.6, and 51.2%, unless otherwise stated. We used data from all neurons recorded in such trials. Our datasets contained between 189 and 213 visual-motion-sensitive MT neurons (see

| Parameter set | | Ω, Ω_{d} | Ω, Ω_{d} | Ω | Ω | Ω_{d}, | Ω_{d}, | Ω_{d}, | Ω_{d}, |
|---|---|---|---|---|---|---|---|---|---|
| Coherence % | No. neurons | μ_{*} | σ_{*} | μ_{0} | σ_{0} | μ_{0} | σ_{0} | μ_{0} | σ_{0} |
| 3.2 | 206 | 54.1 | 33.1 | 59.4 | 34.5 | 59.0 | 34.4 | 58.5 | 34.3 |
| 6.4 | 211 | 52.0 | 32.2 | 62.9 | 35.3 | 60.6 | 34.7 | 59.8 | 34.4 |
| 12.8 | 213 | 46.1 | 30.5 | 65.5 | 36.1 | 60.3 | 34.6 | 58.3 | 34.0 |
| 25.6 | 208 | 37.7 | 28.0 | 70.2 | 37.2 | 62.0 | 34.9 | 59.9 | 34.3 |
| 51.2 | 189 | 29.9 | 26.0 | 83.5 | 40.6 | 75.5 | 38.5 | 71.8 | 37.4 |

Second column: number of neurons for which data were available per coherence. The subscript _{*} denotes that dots were moving towards the preferred motion direction of the MT neuron, whereas _{0} denotes that they were moving in the opposite, null direction. The parameter set, Ω (computed here from MT data) or Ω_{d} (after information depletion), to which each value corresponds is noted above it. Note that, due to the information depletion required to produce Ω_{d}, _{0}_{0}

To estimate moving statistics of neural activity we first computed the spike count over a 20 ms window sliding every 1 ms, per trial. The moving mean firing rate per neuron per condition was then the mean spike count over the valid bins of all trials divided by the width of this window; the standard deviation was estimated analogously. LIP and striatal recordings were either aligned at the onset of the stimulus or of the saccade; after or before these (respectively), data was only valid for a period equal to the reaction time per trial. The population moving mean firing rate is the mean of single-neuron moving means over valid bins; analogously, the population moving variance of the firing rate is the mean of single neuron moving variances. For clarity, population statistics were then smoothed by convolving them with a Gaussian kernel with a 10 ms standard deviation. The resulting smoothed population moving statistics for MT are in
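The moving-rate and smoothing steps described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes spike times in milliseconds, binned at 1 ms, and marks invalid bins with NaN. The function names, and the truncation of the Gaussian kernel at four standard deviations, are choices made here for the example.

```python
import numpy as np

def moving_rate(spike_times_ms, duration_ms, window_ms=20, step_ms=1):
    """Spike count in a sliding window divided by the window width (spikes/s)."""
    spikes = np.zeros(duration_ms)                 # one bin per millisecond
    idx = np.floor(spike_times_ms).astype(int)
    idx = idx[(idx >= 0) & (idx < duration_ms)]
    np.add.at(spikes, idx, 1)                      # handles repeated indices
    counts = np.convolve(spikes, np.ones(window_ms), mode="valid")
    return counts[::step_ms] / (window_ms / 1000.0)

def mean_over_valid(rates):
    """Mean over rows (trials or neurons), ignoring bins marked invalid with NaN."""
    return np.nanmean(rates, axis=0)

def gaussian_smooth(x, sd_ms=10.0):
    """Convolve with a unit-area Gaussian kernel (truncated at 4 SD)."""
    half = int(4 * sd_ms)
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sd_ms) ** 2)
    kernel /= kernel.sum()
    return np.convolve(x, kernel, mode="same")

# Usage: a perfectly regular 50 spikes/s train over one second.
regular = np.arange(0.0, 1000.0, 20.0)
rate = moving_rate(regular, duration_ms=1000)
smoothed = gaussian_smooth(rate)
```

Away from the edges of the record, where the `mode="same"` convolution is affected by zero padding, the smoothed estimate of this regular train stays at its true rate.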

Analogous procedures were used to compute the moving mean of the computations within simulated algorithms, per time step, rather than over a moving window. These are shown up to the median of termination observations plus 3 time steps.

Let _{1}(_{C}(_{j}(_{j}(

_{*}_{0}_{c} in correct trials, as in this diagram. In discrete time it takes an average of 〈_{c} vector observations, _{i}(_{c}, required by the simpler, discrete-time MSPRT (here 7 in both cases), which carries to our identically-performing rMSPRT; this is true under equal input channel statistics (_{*}, _{0}, _{*}, _{0}), data distributions (_{c}, and multiply this by the minimum mean ISI, _{*}_{c}, hence _{0} and 〈_{e}, to get _{e} (

There are _{i} (_{i}|_{i}|_{i}) and likelihoods _{i}); however, after some time Δ ∈ {1, 2, …}, it will re-use past posteriors, _{i}|_{i}) of the segment of _{i}|

If say Δ = 2, in the first time step,

By

Note that we are still using the initial fixed priors _{i}). Now, for

According to the product rule, we can segment the probability of the sequence

And, since

If we substitute the likelihood in

It is evident that the rightmost factor is _{i}|

So, in general:

Ahead we use three key results from [

It is apparent that the critical computations in _{i} will bear its same index _{i}(_{i}(_{*}, rather than from _{0}, that is assumed to have originated _{k}(

When _{i}|

Now, for our likelihood functions to work upon a statistical structure like that produced by neurons in MT we need to be more specific. Inter-spike intervals (ISI) in MT during the random dot motion task are best described as lognormally distributed [_{*} and _{0} are lognormal and that they are specified by means _{*} and _{0}, and standard deviations _{*} and _{0}, respectively. We can then put together the logarithm of Eqs _{i}(^{2} = log(^{2}/^{2} + 1) with appropriate subindices _{*}, _{0}.
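The conversion from the mean and standard deviation of the ISIs to the parameters of the lognormal can be sketched as below. The variance relation is the one quoted above; the expression for the mean of log(ISI) is the standard lognormal identity. The names `mu`, `sigma`, `m` and `s` are labels chosen for this example and may not match the paper's notation.

```python
import numpy as np

def lognormal_params(mu, sigma):
    """Parameters (m, s) of log(ISI) given the mean mu and SD sigma of the ISI."""
    s2 = np.log(sigma ** 2 / mu ** 2 + 1.0)  # variance of log(ISI)
    m = np.log(mu) - 0.5 * s2                # mean of log(ISI)
    return m, np.sqrt(s2)

def lognormal_logpdf(x, mu, sigma):
    """Log-density of an ISI x under a lognormal with mean mu and SD sigma."""
    m, s = lognormal_params(mu, sigma)
    return -np.log(x * s * np.sqrt(2.0 * np.pi)) - (np.log(x) - m) ** 2 / (2.0 * s ** 2)

# Usage with the 25.6% coherence statistics from the parameter table:
# a short ISI is better explained by the preferred than by the null distribution.
isi = 20.0
log_f_pref = lognormal_logpdf(isi, 37.7, 28.0)
log_f_null = lognormal_logpdf(isi, 70.2, 37.2)
print(log_f_pref - log_f_null)
```

The difference of the two log-densities is the log-likelihood ratio contributed by a single ISI. Sums of such terms are what the sequential test accumulates.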

The terms _{0}Δ in _{i}(

We now take the logarithm of _{i}(_{i}|_{i}(_{i}(

The rMSPRT itself takes the form:
_{1}, …, _{N}}, giving a more general formulation.

According to our mapping of rMSPRT to the cortico-subcortical loops (

It houses a constant baseline _{bu} − _{uy}), which in turn is the delayed cortical input to the thalamus

Here we have chosen _{bu}) to be a scaled average of cortical contributions; nevertheless, any other hypothesis-independent function of them can be picked instead if necessary. Such a choice would not affect inference and would render similar results.

The definitions above introduce two free parameters _{yu} ∈ [0, 1) that have the purpose of shaping the dynamics of the computations within rMSPRT during decision formation. The range of _{yu} ensures that the value of computations in the cortico-thalamo-cortical, positive-feedback loop is not amplified to the point of disrupting inference in the overall loop. Crucially, since both parameters are hypothesis-independent, neither affects inference.

For rMSPRT decisions to be comparable to those of monkeys, they must exhibit the same error rate, ^{2} > 0.99) to the behavioural psychometric curves from the analysed LIP database, including 0, 9, and 72.4% coherence for this purpose. This resulted in:

Since monkeys are trained to be unbiased regarding choosing either target, initial priors for rMSPRT are set flat (_{i}) = 1/_{j} (_{0} and _{0}, respectively; the exception was a single channel where the sampled distribution was specified by _{*} and _{*}. This models the fact that MT neurons respond more vigorously to visual motion in their preferred direction compared to motion in a null direction,
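The simulation setup described above can be sketched in a few lines. This is a simplified, non-recursive MSPRT and not the authors' code: it assumes one lognormal ISI per channel per time step, places the preferred-direction channel at index 0, and stops when the largest posterior reaches a fixed threshold of 0.95. That threshold is a hypothetical value. In the paper the thresholds are set so that the error rates match those of the monkeys.

```python
import numpy as np

def lognormal_params(mean, sd):
    """Mean and SD of log(ISI) for a lognormal with the given mean and SD."""
    s2 = np.log(sd ** 2 / mean ** 2 + 1.0)
    return np.log(mean) - 0.5 * s2, np.sqrt(s2)

def msprt_trial(rng, n_alt, pref, null, threshold=0.95, max_steps=10_000):
    """One decision. Returns (chosen channel, number of observations used)."""
    m1, s1 = lognormal_params(*pref)
    m0, s0 = lognormal_params(*null)
    log_post = np.full(n_alt, -np.log(n_alt))   # flat initial priors, 1/N
    for t in range(1, max_steps + 1):
        # log(ISI) samples: channel 0 is "preferred", all others are "null".
        log_x = rng.normal(m0, s0, size=n_alt)
        log_x[0] = rng.normal(m1, s1)
        # Hypothesis i says channel i is preferred and the rest are null, so
        # up to a common factor its likelihood is f_pref(x_i) / f_null(x_i).
        llr = (np.log(s0 / s1)
               - (log_x - m1) ** 2 / (2 * s1 ** 2)
               + (log_x - m0) ** 2 / (2 * s0 ** 2))
        log_post = log_post + llr
        log_post -= np.logaddexp.reduce(log_post)  # normalise (Bayes' rule)
        if log_post.max() >= np.log(threshold):
            break
    return int(np.argmax(log_post)), t

rng = np.random.default_rng(1)
pref, null = (37.7, 28.0), (70.2, 37.2)         # 25.6% coherence, set Ω
summary = {}
for n_alt in (2, 4):
    trials = [msprt_trial(rng, n_alt, pref, null) for _ in range(500)]
    choices, steps = map(np.array, zip(*trials))
    summary[n_alt] = ((choices == 0).mean(), steps.mean())
    print(n_alt, summary[n_alt])
```

Each entry of `summary` holds the proportion of correct choices and the mean number of observations per decision. Following the paper's interpretation of discrete steps, the latter would be multiplied by the shortest mean ISI to obtain a mean decision time for correct trials, to which a non-decision time is added before comparison with reaction times.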

To parameterize the input stochastic processes and likelihood functions of rMSPRT, we estimated the means _{*}_{0} _{*}_{0}_{*} indicates the condition when dots were predominantly moving in the direction preferred by the neuron. The subscript _{0} indicates when they were moving against it. We dub this parameter set Ω, and report it in _{*}, in our notation, and dashed ones are _{0}, per coherence.

We use _{yu} = 0.4, _{yb}, _{yu}, _{bu}, _{uy} = 1 (hence Δ = 3) in all simulations, unless otherwise stated. The value of latencies was set to 1 for simplicity. The values of the first three free parameters come from a manual tuning exercise with the aim of revealing a pattern in the model LIP akin to the ramp-and-fork one in LIP recordings; note that such a two-segment pattern is already guaranteed by the two cases of

The statistics in all simulations were from either the Ω or Ω_{d} parameter sets as noted per case. Note that the statistics actually used are those extracted from MT in

We have defined rMSPRT to operate over a discrete time line; however, the brain operates over continuous time. [_{c}, required by the (r)MSPRT, can be interpreted as the mean decision time
_{*}_{*} < _{0})—as in MT [_{0}_{*}_{e} is the mean decision sample size for error trials. An instance of rMSPRT capable of making choices upon sequences of spike-trains is straightforward from the formal framework above and that introduced by [

We outline here how we use the monkeys’ reaction times on correct trials and the properties of the rMSPRT, to estimate the amount of discrimination information lost by the animals. That is, the gap between all the information available in the responses of MT neurons, as fully used by the rMSPRT (parameter set Ω), and the fraction of such information actually used by monkeys.

The expected number of observations to reach a correct decision for (r)MSPRT, 〈_{c}, depends on two quantities. First, the mean total discrimination information required for the decision, _{*} to _{0}

The product of our Monte Carlo estimate of 〈_{c} in the rMSPRT (

The ‘mean decision sample size’ of monkeys—hence the superscript ^{m}—within this framework corresponds to

Expression _{0} closer to _{*} per condition, assuming 250 ms of non-decision time; critically, simulations like those in

An example of the results of information depletion in one condition is in _{*} and _{*} fixed. Then, we iteratively reduce or increase the differences |_{0} − _{*}| and |_{0} − _{*}| by the same proportion, until we get new parameters _{0} and _{0} that, together with _{*} and _{*}, specify preferred (‘preferred’ in _{d} and reported in
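The depletion step can be sketched as a one-dimensional search. The example below is an illustration under stated assumptions, not the authors' procedure: it keeps the preferred statistics fixed, scales both differences to the null statistics by the same proportion `k`, measures information with a Kullback-Leibler divergence between lognormals, and uses bisection over `k` in [0, 1] to reach a target. The target used in the usage lines (half of the original divergence) is arbitrary. In the paper the target comes from the information loss estimated from the monkeys' reaction times.

```python
import numpy as np

def lognormal_params(mean, sd):
    """Mean and SD of log(ISI) for a lognormal with the given mean and SD."""
    s2 = np.log(sd ** 2 / mean ** 2 + 1.0)
    return np.log(mean) - 0.5 * s2, np.sqrt(s2)

def kl_lognormal(p, q):
    """KL divergence D(p || q) between two lognormals given as (mean, sd)."""
    mp, sp = lognormal_params(*p)
    mq, sq = lognormal_params(*q)
    return np.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5

def deplete(pref, null, target, iters=100):
    """Shrink the null statistics towards the preferred ones until the
    divergence equals `target`. Returns (proportion k, new null statistics)."""
    lo, hi = 0.0, 1.0          # k = 0: identical distributions; k = 1: original
    for _ in range(iters):
        k = 0.5 * (lo + hi)
        shrunk = (pref[0] + k * (null[0] - pref[0]),
                  pref[1] + k * (null[1] - pref[1]))
        if kl_lognormal(pref, shrunk) > target:
            hi = k             # still too much information: shrink further
        else:
            lo = k             # too little information: back off
    return k, shrunk

pref, null = (37.7, 28.0), (70.2, 37.2)   # 25.6% coherence, set Ω
target = 0.5 * kl_lognormal(pref, null)   # arbitrary target for illustration
k, null_depleted = deplete(pref, null, target)
print(k, null_depleted)
```

Scaling both differences by the same `k` mirrors the text's requirement that they change by the same proportion. The search needs the target to lie between zero and the original divergence.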

The slight deviation of the mean reaction times of (r)MSPRT

Monkey mean reaction times from


Example mean cortico-thalamic contribution, _{bu}) (red), compared to the mean thalamic output during inRF settings (solid blue) and outRF ones (dashed blue) for 25% coherence and


Parallel computations for


With the method described in the main article, we computed the mean firing rate of every MT neuron in our data facing dots moving in its preferred or null motion directions at six coherence levels (%): 0, 3.2, 6.4, 12.8, 25.6, 51.2 (randomly assigning 0% trials between directions). For every neuron, every 1 ms bin, we conducted a linear regression per motion direction of the form _{s}_{s} are the intercept and the coefficient for the coherence contribution, respectively. We then applied a _{s}’s (again, one per MT neuron) we got per direction, equals 0. Here we show the corresponding


We thank Anne Churchland, Roozbeh Kiani, Michael Shadlen, Long Ding, and Joshua Gold for sharing their experimental data and the Humphries lab (Abhinav Singh, Mathew Evans, and Silvia Maggi), Rafal Bogacz, and Long Ding for discussions.