
The authors have declared that no competing interests exist.

Conceived and designed the experiments: LT AMC CS MB. Performed the experiments: LT AMC DA CS. Analyzed the data: LT AMC CS MB. Wrote the paper: LT MB.

An essential goal of sensory systems neuroscience is to characterize the functional relationship between neural responses and external stimuli. Of particular interest are the nonlinear response properties of single cells. Inherently linear approaches such as generalized linear modeling can nevertheless be used to fit nonlinear behavior by choosing an appropriate feature space for the stimulus. This requires, however, that one has already obtained a good understanding of a cell's nonlinear properties, whereas more flexible approaches are necessary for the characterization of unexpected nonlinear behavior. In this work, we present a generalization of some frequently used generalized linear models which enables us to automatically extract complex stimulus-response relationships from recorded data. We show that our model can lead to substantial quantitative and qualitative improvements over generalized linear and quadratic models, which we illustrate on the example of primary afferents of the rat whisker system.

To account for the stochasticity inherent to neural responses, single cells as well as populations of cells are often characterized in terms of a probabilistic model. A popular choice for this task is the generalized linear model (GLM) and related approaches

Several approaches have been suggested to overcome the limitations of the generalized linear model. A natural extension is given by

An alternative approach is offered by nonparametric methods such as

Here, we explore a different tradeoff. We derive a much more flexible neuron model for single cells which can, at least in principle, approximate arbitrary dependencies on the stimulus. The model can be viewed as generalizing generalized linear and quadratic models, but in contrast to quadratic models cannot easily be reduced to a GLM by choosing a different representation of the stimulus. Nonlinear stimulus features are directly learned from the data by maximizing the model's likelihood and do not need to be hand picked. The number of parameters of the model still grows only quadratically with the dimensionality of the stimulus, and the complexity of the model can be tuned to take into account the cell's complexity and the amount of data available. We demonstrate that optimizing this model is feasible in practice and can lead to a better fit than either generalized linear or quadratic models.

In the following—after briefly reviewing generalized linear and quadratic models—we introduce a new model for single cell responses and discuss its properties.

All experimental and surgical procedures were carried out in accordance with the policy on the use of animals in neuroscience research of the Society for Neuroscience and the German law.

In a GLM it is assumed that the output

A special case of the GLM applicable to our problem is, for instance, the linear-nonlinear-Bernoulli (LNB) model, where the exponential family is given by the Bernoulli distribution. As nonlinearity we might choose the sigmoidal logistic function,
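The LNB model described above can be sketched in a few lines of code. The filter, stimulus, and bias below are arbitrary illustrative values, not fitted parameters from the paper:

```python
import numpy as np

def lnb_spike_probability(stimulus, w, b):
    """Linear-nonlinear-Bernoulli model: the stimulus is filtered
    linearly and the result is passed through a logistic sigmoid,
    yielding the probability of a spike in the current time bin."""
    z = w @ stimulus + b                # linear stage
    return 1.0 / (1.0 + np.exp(-z))     # sigmoidal logistic function

# hypothetical 10-dimensional stimulus representation and random filter
rng = np.random.default_rng(0)
w = rng.standard_normal(10)
x = rng.standard_normal(10)

p = lnb_spike_probability(x, w, 0.0)
assert 0.0 < p < 1.0  # a valid Bernoulli parameter
```

A spike in each bin would then be drawn from a Bernoulli distribution with parameter `p`.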

Let us consider the distribution over the stimulus

The unconstrained quadratic model is equivalent to a GLM with a quadratic extension of the feature space, since
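This equivalence is easy to verify numerically: a quadratic function of the stimulus becomes a linear function of an expanded feature vector containing all pairwise products. A minimal sketch with an arbitrary 4-dimensional stimulus:

```python
import numpy as np

def quadratic_features(x):
    """Map a stimulus x to [x, vec(x x^T)], so that any quadratic
    function of x is a linear function of the features."""
    return np.concatenate([x, np.outer(x, x).ravel()])

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
A = rng.standard_normal((4, 4))  # quadratic-form matrix
b = rng.standard_normal(4)       # linear filter

# quadratic form evaluated directly...
direct = x @ A @ x + b @ x
# ...equals a dot product in the expanded feature space
w = np.concatenate([b, A.ravel()])
assert np.isclose(direct, w @ quadratic_features(x))
```

The price of this reduction is that the feature space, and hence the number of GLM weights, grows quadratically with the stimulus dimensionality.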

The generative point of view immediately suggests generalizations by loosening the assumptions of Gaussian distributed spike-triggered and non-spike-triggered stimuli. In the following, we consider mixtures of Gaussians as possible candidates,
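The generative construction can be sketched as follows: model the spike-triggered and non-spike-triggered stimulus distributions as mixtures of Gaussians and obtain the spike probability via Bayes' rule. This is a simplified illustration of the idea, not the exact parameterization used in the paper:

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Multivariate normal density (dense covariance, small dimension)."""
    d = x - mean
    norm = np.sqrt((2 * np.pi) ** len(x) * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)

def spike_probability(x, spike_mix, nospike_mix, prior_spike=0.5):
    """Posterior spike probability from Bayes' rule applied to
    mixture-of-Gaussians models of the spike-triggered and
    non-spike-triggered stimulus distributions. Each mixture is a
    list of (weight, mean, covariance) triples."""
    def density(x, mix):
        return sum(w * gauss_pdf(x, m, c) for w, m, c in mix)
    p1 = density(x, spike_mix) * prior_spike
    p0 = density(x, nospike_mix) * (1.0 - prior_spike)
    return p1 / (p1 + p0)

# toy example: one spike-triggered component at the origin,
# one non-spike-triggered component centered away from it
spike_mix = [(1.0, np.zeros(2), np.eye(2))]
nospike_mix = [(1.0, 2.0 * np.ones(2), np.eye(2))]
assert spike_probability(np.zeros(2), spike_mix, nospike_mix) > 0.5
```

With a single Gaussian per class this construction reduces to the quadratic model; additional spike-triggered components are what give the STM its extra flexibility.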

In the same manner that we have derived a model for the neuron's dependency on the stimulus, we can incorporate dependence on other features as well. Let

Taken together, the input to the sigmoid nonlinearity (

It is instructive to compare

Assuming a single non-spike-triggered mixture component as in

To reduce the number of parameters, we can employ the same trick as for the quadratic model and replace the matrices
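The effect of this factorization can be illustrated numerically: replacing a full quadratic-form matrix by a low-rank expansion turns the quadratic form into a weighted sum of squared filter responses. The filter count and stimulus dimension below are arbitrary illustrative choices:

```python
import numpy as np

def factored_quadratic(x, U, beta):
    """Low-rank quadratic form x^T (U diag(beta) U^T) x, computed as a
    weighted sum of squared filter responses. The parameter count
    grows linearly rather than quadratically in dim(x)."""
    return beta @ (U.T @ x) ** 2

rng = np.random.default_rng(2)
U = rng.standard_normal((10, 3))   # 3 shared quadratic filters
beta = rng.standard_normal(3)      # per-component filter weights
x = rng.standard_normal(10)

# the factored form agrees with the equivalent full matrix
A = U @ np.diag(beta) @ U.T
assert np.isclose(x @ A @ x, factored_quadratic(x, U, beta))
```

Sharing the filters `U` across mixture components while letting each component have its own weights `beta` is what keeps the factored STM compact.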

We tested our model on spike trains obtained from 18 whisker-sensitive trigeminal ganglion cells of adult Sprague-Dawley rats. Recordings were made with a single electrode (sampling frequency: 20 kHz, bandpass filter: 300–5000 Hz). Manual stimulation was used to identify which whisker the neuron innervated as well as the approximate preferred direction of the whisker, after which the whisker was placed inside a plastic tube driven by a metal stimulator arm. The stimulator arm was programmed to deliver low-pass filtered (100 Hz) Gaussian white noise stimulation along the neuron's preferred movement direction. Stimulation was divided into 50

Stimuli corresponding to the spike trains are shown at the top. The first row below the stimulus shows spike trains and interspike interval distributions generated by one

We extracted 10 ms windows from the stimulus and reduced their dimensionality by keeping the first 10 principal components (
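This preprocessing step can be sketched as follows. The window length in samples and the stimulus length are hypothetical; only the structure (sliding windows followed by projection onto leading principal components) mirrors the procedure described above:

```python
import numpy as np

def stimulus_windows(stimulus, window_len):
    """Extract overlapping windows of the stimulus preceding each bin."""
    n = len(stimulus) - window_len + 1
    return np.stack([stimulus[i:i + window_len] for i in range(n)])

def pca_reduce(X, n_components):
    """Project rows of X onto their leading principal components."""
    Xc = X - X.mean(axis=0)
    # right singular vectors = eigenvectors of the covariance,
    # sorted by decreasing explained variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(3)
stim = rng.standard_normal(1000)      # hypothetical stimulus trace
X = stimulus_windows(stim, 20)        # window length is illustrative
Z = pca_reduce(X, 10)                 # keep the first 10 components
assert Z.shape == (981, 10)
```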

Filters of generalized linear models were first trained assuming a sigmoid nonlinearity. Together with a Bernoulli output distribution, this guarantees a concave log-likelihood such that an optimal solution can be found. Afterwards, we replaced the sigmoid nonlinearity with a more flexible nonlinearity consisting of a sum of Gaussian blobs,
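Such a nonlinearity can be sketched as a weighted sum of Gaussian bumps along the filter output. In this simplified illustration the bump centers and width are assumed fixed, so that only the weights remain to be fit:

```python
import numpy as np

def blob_nonlinearity(z, weights, centers, width):
    """Flexible 1-D nonlinearity: a weighted sum of Gaussian bumps
    evaluated at the filter output z. With fixed centers and width,
    fitting the weights is a linear problem."""
    return float(weights @ np.exp(-0.5 * ((z - centers) / width) ** 2))

# illustrative choice: 7 evenly spaced bumps covering the filter range
centers = np.linspace(-3.0, 3.0, 7)
weights = np.full(7, 1.0 / 7)
assert blob_nonlinearity(0.0, weights, centers, 1.0) > 0.0
```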

The parameters of the STM (

We used between three and five components for the spike-triggered distribution and one or two components for the non-spike-triggered distribution; these settings were found to work well in preliminary runs on a different but related dataset with similar stimuli. Using two non-spike-triggered components increased the stability of the optimization for some cells. Finally, factored STMs were trained discriminatively using limited-memory BFGS with randomly initialized parameters.

All models were trained on the 50 unfrozen trials and performance was evaluated based on the 50 frozen trials.

We compare the performance of the generalized linear, quadratic, and spike-triggered mixture model (STM) for different cells, both qualitatively and quantitatively, and find in both comparisons that the STM can lead to substantial improvements.

The insets show peristimulus time histograms (PSTHs) corresponding to the spike trains. The variance explained (R^{2}) by the generalized linear model, quadratic model, and STM was 0.15, 0.26, and 0.47, respectively.

Similar conclusions can be drawn by looking at spike-triggered distributions (

The plots show spike-triggered histograms of positions

To get a better intuition for the degree of nonlinearity of a cell, we can compare the cell's spike-triggered distribution with the spike-triggered distribution of the best matching linear model. In the given examples, the linear model is unable to reproduce the spike-triggered distributions of the cells displayed in

To quantify the performance of the different models, we estimate the
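For a Bernoulli output distribution, the cross-entropy reduces to the average negative log-likelihood of the observed binary spike train under the model's predicted spike probabilities. A minimal sketch, reported in bits per bin (the choice of units and the toy data are illustrative):

```python
import numpy as np

def bernoulli_cross_entropy(spikes, p, eps=1e-12):
    """Average negative log-likelihood (bits per bin) of binary spike
    observations under predicted spike probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)  # guard against log(0)
    nll = -(spikes * np.log2(p) + (1 - spikes) * np.log2(1 - p))
    return float(nll.mean())

spikes = np.array([0, 1, 0, 0, 1])
p_model = np.array([0.1, 0.8, 0.2, 0.1, 0.9])
p_chance = np.full(5, 0.5)

# a model tracking the data beats a constant-rate prediction
assert bernoulli_cross_entropy(spikes, p_model) < \
       bernoulli_cross_entropy(spikes, p_chance)
```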

The cross-entropy can be used to lower-bound the mutual information between stimuli and spikes,

The spike train's mutual information with the stimulus can be decomposed as follows

Linear, quadratic and spike-triggered mixture models (STM) were evaluated on 8 slowly adapting cells (

In addition to comparing different models, we can also compute and compare our model's performance to the cross-entropy of a PSTH, which has also been called

While the performance of the PSTH does not give us a guaranteed lower bound on the achievable cross-entropy, it gives us a very optimistic estimate of the performance that can be achieved by a model which does not take spike history into account. We found that the PSTH yielded a significantly lower cross-entropy than an STM without history dependency (

PSTHs for model cells were estimated from 1000 spike trains (sampling spike trains was necessary since the models depend on the spike history) using the same kernel as for the real cell. The variance explained (
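The two quantities involved here can be sketched as follows: a PSTH estimated by trial-averaging and kernel smoothing, and the fraction of its variance explained by a prediction. The kernel and the toy spike trains are illustrative, not those used for the real cells:

```python
import numpy as np

def psth(spike_trains, kernel):
    """Estimate a PSTH by averaging binary spike trains over trials
    and smoothing the average rate with a kernel."""
    rate = np.asarray(spike_trains).mean(axis=0)
    return np.convolve(rate, kernel, mode='same')

def variance_explained(target, prediction):
    """Fraction of the target PSTH's variance captured by a prediction."""
    resid = target - prediction
    return float(1.0 - resid.var() / target.var())

# illustrative Gaussian smoothing kernel, normalized to sum to one
k = np.exp(-0.5 * np.linspace(-3, 3, 11) ** 2)
k /= k.sum()

rng = np.random.default_rng(4)
trains = rng.integers(0, 2, size=(1000, 200))  # 1000 toy spike trains
r = psth(trains, k)
assert np.isclose(variance_explained(r, r), 1.0)  # perfect prediction
```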

The high firing rate of the cells and the resulting abundance of data allowed us to neglect regularization and overfitting issues. The training set contained on average about 25,000 spikes for SA cells and 6,700 spikes for RA cells. However, typically much less data is available.

To counter overfitting, different approaches to regularization can be taken. We already suggested reducing the number of parameters of the STM via factorization and parameter sharing (

The factored STM was trained with different random subsets of the training trials and evaluated on all test trials for one SA cell (

For the SA cell, the performance of the factored STM started to decrease more rapidly as soon as fewer than 5,000 spikes were present in the training set. However, even with 2,500 spikes the average performance was still much better than the performance of a quadratic model trained on the entire dataset. For the RA cell, the performance started to deteriorate at about 2,000 spikes. Note that the performance of the linear model worsened at a similar rate. Reducing the number of parameters further by using half the spike history or six instead of ten principal components to represent the stimulus did not help to improve performance in the regime of few data points. The performance might, however, be improved by choosing suitable priors for the parameters, which we did not explore here.

Training with half the dataset of the RA cell (about

We have shown that a spike-triggered mixture model can lead to better performance than either linear or quadratic models, which we illustrated on the example of rat primary afferents. A possible explanation for the improved performance might be that our model can better cope with a cell's adaptation to the stimulus. Because the firing rate of our model is effectively a maximum over a number of quadratic models, the model is able to respond differently in different regions of the stimulus space. Our model may yield even bigger improvements when applied to cells higher up the hierarchy—such as cortical cells—where highly nonlinear dependencies on the stimulus are to be expected

Here, we chose to give up on the constraint of convexity to be able to build a more flexible neuron model. In practice, non-convex or even multimodal likelihoods do not have to be an issue. Many local optima of the STM likelihood are created simply by permutations of the parameters of the different mixture components and are therefore unproblematic. We found that initializing mixture models with EM and fine-tuning with an off-the-shelf optimizer worked well for our data and the performance of the resulting model was stable across different initializations. The parameters of the factored variant of the STM (

Alternatively, we could have used support vector machines, kernel logistic regression (KLR)

KLR is equivalent to a linear-nonlinear-Bernoulli model with a cleverly chosen feature space whose dimensionality grows with the number of data points. Hence, one advantage of KLR is that its objective function is convex. Advantages of a parametric model like the one presented in this paper are more readily interpretable parameters and lower computational costs when the number of training points is large. Ultimately, whether kernel based methods or a generative approach should be preferred presumably depends on whether one has a better intuition of what represents a good kernel for the input space, or a better intuition of what represents a good characterization of the spike-triggered distribution.

The idea of using spike-triggered distributions to construct and motivate neuron models is not new. However, most work in this direction has focused on spike-triggered averages and covariances

Yet another related approach is to use feed-forward neural networks

STMs can easily be extended to model populations of neurons similar to how GLMs are extended to populations by introducing coupling filters

Code for fitting STMs is provided at

Gradients for the log-likelihoods of the STM and factored STM.

(PDF)