The authors have declared that no competing interests exist.

Recurrent neural networks (RNNs) provide state-of-the-art performances in a wide variety of tasks that require memory. These performances can often be achieved thanks to gated recurrent cells such as gated recurrent units (GRU) and long short-term memory (LSTM). Standard gated cells share a layer internal state to store information at the network level, and long term memory is shaped by network-wide recurrent connection weights. Biological neurons on the other hand are capable of holding information at the cellular level for an arbitrary long amount of time through a process called bistability. Through bistability, cells can stabilize to different stable states depending on their own past state and inputs, which permits the durable storing of past information in neuron state. In this work, we take inspiration from biological neuron bistability to embed RNNs with long-lasting memory at the cellular level. This leads to the introduction of a new bistable biologically-inspired recurrent cell that is shown to strongly improves RNN performance on time-series which require very long memory, despite using only cellular connections (all recurrent connections are from neurons to themselves, i.e. a neuron state is not influenced by the state of other neurons). Furthermore, equipping this cell with recurrent neuromodulation permits to link them to standard GRU cells, taking a step towards the biological plausibility of GRU. With this link, this work paves the way for studying more complex and biologically plausible neuromodulation schemes as gating mechanisms in RNNs.


Recurrent neural networks (RNNs) provide state-of-the-art performance in a wide variety of tasks that require memory. This performance can often be achieved thanks to gated recurrent cells such as gated recurrent units (GRUs) and long short-term memory (LSTM). Standard gated cells share a layer internal state to store information at the network level, and long-term memory is shaped by network-wide recurrent connection weights. Biological neurons, on the other hand, are capable of holding information at the cellular level for an arbitrarily long amount of time through a process called bistability. Through bistability, cells can stabilize to different stable states depending on their own past state and inputs, which permits the durable storage of past information in the neuron state. In this work, we take inspiration from biological neuron bistability to embed RNNs with long-lasting memory at the cellular level. This leads to the introduction of a new bistable, biologically inspired recurrent cell that is shown to strongly improve RNN performance on time-series that require very long memory, despite using only cellular connections (all recurrent connections are from neurons to themselves, i.e. a neuron's state is not influenced by the state of other neurons). Furthermore, equipping this cell with recurrent neuromodulation makes it possible to link it to standard GRU cells, taking a step towards the biological plausibility of GRUs. With this link, this work paves the way for studying more complex and biologically plausible neuromodulation schemes as gating mechanisms in RNNs.

Recurrent neural networks (RNNs) have been widely used in the past years, providing excellent performance on many problems requiring memory, such as sequence-to-sequence modeling, speech recognition, and neural translation. These achievements are often the result of the development of the long short-term memory (LSTM [

In parallel, there has been an increased interest in assessing the biological plausibility of neural networks. There has not only been a lot of interest in spiking neural networks [

RNNs combine simple cellular dynamics and a rich, highly recurrent network architecture. The recurrent network architecture enables the encoding of complex memory patterns in the connection weights. These memory patterns rely on global feedback interconnections of large neuronal populations. Such global feedback interconnections are difficult to tune, and they can be a source of vanishing or exploding gradients during training, which is a major drawback of RNNs. In biological networks, a significant part of advanced computing is handled at the cellular level, mitigating the burden at the network level. Each neuron type can switch between several complex firing patterns, including e.g. spiking, bursting, and bistability. In particular, bistability is the ability of a neuron to switch between two stable outputs depending on input history. It is a form of cellular memory [

In this work, we propose a new biologically motivated bistable recurrent cell (BRC), which embeds classical RNNs with local cellular memory rather than global network memory. More precisely, BRCs are built such that their hidden recurrent state does not directly influence other neurons (i.e. they are not recurrently connected to other cells). To make cellular bistability compatible with the RNN feedback architecture, a BRC is constructed by taking a feedback control perspective on biological neuron excitability [

We show that, despite having only cellular temporal connections, BRCs provide decent performance on standard benchmarks and outperform classic RNN cells such as GRUs and LSTMs on benchmarks with datasets requiring long-term memory, highlighting the importance of bistability. To further improve BRC performance, we endow the cells with recurrent neuromodulation, leading to a new neuromodulated bistable recurrent cell (nBRC). We carry out a thorough analysis of the performance of nBRCs against state-of-the-art cells and show that they are the top performers when long-term memory requirements are important.

RNNs have been widely used to tackle many problems having a temporal structure. In such problems, the relevant information can only be captured by processing observations over multiple time-steps. More formally, a time-series can be defined as a sequence [x_0, …, x_T] with x_t = f(x_{t−1}, u_t); x_0 is a constant and

Biological neurons are intrinsically dynamical systems that can exhibit a wide variety of firing patterns. In this work, we focus on the control of bistability, which corresponds to the coexistence of two stable states at the neuronal level. Bistable neurons can switch between their two stable states in response to transient inputs [

Complex neuron firing patterns are often modeled by systems of ordinary differential equations (ODEs). Translating ODEs into an artificial neural network algorithm often leads to mixed results due to increased complexity and the difference in modeling language. Another approach to model neuronal dynamics is to use a control systems viewpoint [

A neuronal feedback diagram focusing on a single time-scale, which is sufficient for bistability, is illustrated in the corresponding figure. Presynaptic voltages V_pre are combined at the input level to create a synaptic current I_syn. Neuron-intrinsic dynamics are modeled by the negative feedback interconnection of a nonlinear function I_int = f(V_post), called the IV curve in neurophysiology, which outputs an intrinsic current I_int that adds to I_syn to create the membrane current I_m. The slope of f(V_post) determines the feedback gain, a positive slope leading to negative feedback and a negative slope to positive feedback. I_m is then integrated by the postsynaptic neuron membrane to modify its output voltage V_post.

To model controllable bistability in RNNs, we start by drawing two main comparisons between this feedback structure and the GRU equations. In GRUs, the terms W_z h_{t−1}, W_r h_{t−1} and W_h h_{t−1} show that each cell uses the internal state of other neurons to compute its own state without going through synaptic connections. In biological neurons, by contrast, the intrinsic dynamics defined by I_int are constrained to depend only on the neuron's own state V_post, and the influence of other neurons comes only through the synaptic compartment (I_syn), or through neuromodulation.

To enforce this cellular feedback constraint in the GRU equations and to endow them with bistability, we propose to update h_t as follows:
h_t = c_t ⊙ h_{t−1} + (1 − c_t) ⊙ tanh(U x_t + a_t ⊙ h_{t−1}), with a_t = 1 + tanh(U_a x_t + w_a ⊙ h_{t−1}) and c_t = σ(U_c x_t + w_c ⊙ h_{t−1}). a_t corresponds to the feedback parameter, with a_t ∈ ]0, 2[ (as tanh(⋅) ∈ ]−1, 1[). c_t corresponds to the update gate in GRUs and plays the role of the membrane capacitance.
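As an illustration, one BRC layer update, h_t = c_t ⊙ h_{t−1} + (1 − c_t) ⊙ tanh(U x_t + a_t ⊙ h_{t−1}), can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: the function and parameter names (`brc_step`, `U`, `Ua`, `Uc`, `wa`, `wc`) are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def brc_step(x, h_prev, U, Ua, Uc, wa, wc):
    """One BRC time-step for a layer of neurons.

    U, Ua, Uc are input weight matrices; wa and wc are *vectors*
    (Hadamard products), so the only recurrent connections are from
    each neuron to itself.
    """
    a = 1.0 + np.tanh(Ua @ x + wa * h_prev)  # feedback parameter, in ]0, 2[
    c = sigmoid(Uc @ x + wc * h_prev)        # update gate ("capacitance")
    return c * h_prev + (1.0 - c) * np.tanh(U @ x + a * h_prev)
```

Neurons with a_t > 1 can latch information (bistability), while a_t < 1 makes their memory fade.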

The main differences between a BRC and a GRU are twofold. First, each neuron has its own internal state h_t that is not directly affected by the internal state of the other neurons. Indeed, due to the four instances of h_{t−1} coming from Hadamard products, the only temporal connections existing in layers of BRCs are from neurons to themselves. This enforces the memory to be only cellular. Second, the feedback parameter a_t is allowed to take values in the range ]0, 2[ rather than ]0, 1[. This allows the cell to switch between monostability (a_t ≤ 1) and bistability (a_t > 1).

To further improve the performance of BRCs, one can relax the cellular memory constraint. By creating a dependency of a_t and c_t on the output of other neurons of the layer, one can build a kind of recurrent layer-wise neuromodulation. We refer to this modified version of the BRC as an nBRC, standing for recurrently neuromodulated BRC. The update rule for the nBRC is the same as for the BRC; the only difference lies in a_t and c_t, which are neuromodulated as follows: a_t = 1 + tanh(U_a x_t + W_a h_{t−1}) and c_t = σ(U_c x_t + W_c h_{t−1}), where W_a and W_c are full matrices.

The update rule of nBRCs being that of BRCs (

This recurrent neuromodulation scheme brings the update rule even closer to that of standard GRUs. This is highlighted when comparing the nBRC and GRU equations, the main difference being a_t belonging to ]0, 2[ (and thus possibly being greater than 1). A relaxed cellular memory constraint is also ensured, as each neuron's past state h_{t−1} only directly influences its own current state and not the state of other neurons of the layer (Hadamard product in the h_t update, rather than the term W_h h_{t−1} found in
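An illustrative sketch of the corresponding nBRC step (all names are ours, not the paper's): the gates a_t and c_t now read the whole layer's previous output through full matrices `Wa` and `Wc`, while `h_prev` still enters the state update only element-wise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nbrc_step(x, h_prev, U, Ua, Uc, Wa, Wc):
    """One nBRC time-step: layer-wise recurrent neuromodulation of the
    gates (full matrices Wa, Wc), cellular memory in the state update."""
    a = 1.0 + np.tanh(Ua @ x + Wa @ h_prev)  # neuromodulated feedback
    c = sigmoid(Uc @ x + Wc @ h_prev)        # neuromodulated update gate
    # h_prev appears only in element-wise products: memory stays cellular
    return c * h_prev + (1.0 - c) * np.tanh(U @ x + a * h_prev)
```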

Finally, let us note that to be consistent with the biological model presented in Section 3,

To demonstrate the impact of bistability in RNNs, we tackle four problems. The first is a one-dimensional toy problem, the second is a two-dimensional denoising problem, the third is the permuted sequential MNIST problem, and the fourth is a variation of the third benchmark. All benchmarks correspond to a supervised setting. The network is presented with a time-series and is asked to output a prediction (regression for the first two benchmarks, classification for the others) after having received the last element(s) of the time-series, x_T. Note that for the second benchmark the regression is carried over multiple time-steps (sequence-to-sequence), whereas this prediction is given in a single time-step after receiving x_T for the other benchmarks. We first show that the introduction of bistability in recurrent cells is especially useful for datasets in which only time-series with long time-dependencies are available. We achieve this by comparing the results of BRCs and nBRCs to those of other recurrent cells. We use LSTMs [

For the first two problems, training sets comprise 40000 samples, and performance is evaluated on test sets generated with 50000 samples. For the permuted MNIST benchmarks, the standard train and test sets are used. All averages and standard deviations reported were computed over three different seeds. We found only minor variations between runs, and thus believe that three runs are sufficient to capture the performance of the different architectures. For all benchmarks, networks are composed of two layers of 128 neurons. Different recurrent cells are always tested on identical architectures (i.e. same number of layers and neurons). We used the tensorflow [ ] framework. A learning rate of 10^{−3} is used for training all networks, with a mini-batch size of 100. The source code for carrying out the experiments is available at

In this benchmark, the network is presented with a one-dimensional time-series of T time-steps. After receiving the last element x_T, the network output should approximate the first element x_0, a task that is well suited for capturing the capacity of recurrent cells to learn long temporal dependencies when T is large. x_0 is sampled from a normal distribution
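A plausible generator for this copy-first-input benchmark can be sketched as follows. The function name, the shapes, and the choice to sample every element (not just x_0) i.i.d. from a standard normal distribution are our assumptions, not necessarily the authors' exact setup.

```python
import numpy as np

def copy_first_input_batch(n_samples, T, rng=None):
    """Generate (inputs, targets): inputs are length-T series of normal
    samples; the target is the first element of each series."""
    if rng is None:
        rng = np.random.default_rng()
    x = rng.standard_normal((n_samples, T, 1))  # (batch, time, features)
    y = x[:, 0, 0]                              # value to remember
    return x, y
```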

Results are shown after 50 epochs and for different values of the time-series length T.

| T | BRC | nBRC | GORU | LSTMCell | GRUCell | LMU |
|---|---|---|---|---|---|---|
| 5 | 0.005±0.001 | 0.000±0.000 | 0.000±0.000 | 0.000±0.000 | 0.000±0.000 | 0.000±0.000 |
| 50 | 0.082±0.027 | 0.002±0.000 | 0.019±0.009 | 0.000±0.000 | 0.997±0.005 | 0.000±0.000 |
| 300 | 0.086±0.014 | 0.010±0.003 | 0.308±0.050 | 1.002±0.009 | 0.876±0.190 | 0.000±0.000 |
| 600 | 0.099±0.029 | 0.009±0.002 | 0.323±0.068 | 0.989±0.008 | 0.999±0.017 | 0.002±0.001 |

The copy-first-input benchmark is interesting as a means to highlight the memorisation capacity of a recurrent neural network, but it does not test its ability to successfully exploit complex relationships between different elements of the input signal to predict the output. In the denoising benchmark, the network is presented with a two-dimensional time-series of T time-steps. Five relevant time-steps t_1, …, t_5, for which data should be remembered, are sampled uniformly in {0, …, T − N − 1}. The first dimension of the input is a cue: x_t[1] = 1 if t ∈ {t_1, …, t_5}, x_t[1] = 0 if t is not relevant and t < T − 5, and x_t[1] = −1 otherwise (i.e. during the last five time-steps, signalling the network to output the stored values). Note that the parameter N forces all relevant time-steps to occur at least N steps before the end of the series (for T − N ≤ t < T − 5, x_t[1] = 0).

The second dimension is a data-stream, generated as for the copy-first-input benchmark. The target output is 0 at every time-step except the last five, at which the output of the neural network at time-step t should reproduce, in order, the values observed at the relevant time-steps.
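A sketch of a generator matching this description; the channel layout (here 0-indexed, cue first), the cue values, and the handling of N (called `forget_lag` below) are our reading of the text, not the authors' exact implementation.

```python
import numpy as np

def denoising_sample(T, n_relevant=5, forget_lag=0, rng=None):
    """One sample of the denoising benchmark: channel 0 is the cue
    (1 = store this value, -1 = output phase, 0 = ignore), channel 1 is
    the noisy data stream. The target is the sequence of cued values."""
    if rng is None:
        rng = np.random.default_rng()
    data = rng.standard_normal(T)
    # relevant steps are drawn at least forget_lag + n_relevant steps
    # before the end of the series
    t_rel = np.sort(rng.choice(T - forget_lag - n_relevant,
                               size=n_relevant, replace=False))
    cue = np.zeros(T)
    cue[t_rel] = 1.0
    cue[-n_relevant:] = -1.0          # output phase
    x = np.stack([cue, data], axis=1)  # shape (T, 2)
    y = data[t_rel]                    # values to reproduce at the end
    return x, y
```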

As one can see in

Results are shown with and without a constraint on the location of relevant inputs, after 50 epochs. When the constraint is active, relevant inputs cannot appear in the N last time-steps of the series (x_t[1] ≠ 1, ∀t ≥ T − N).

| N | BRC | nBRC | GORU | LSTM | GRU | LMU |
|---|---|---|---|---|---|---|
| 5 | 0.579±0.033 | 0.016±0.003 | 0.000±0.000 | 0.655±0.463 | 0.001±0.000 | 1.004±0.006 |
| 200 | 0.614±0.119 | 0.071±0.078 | 1.004±0.003 | 0.996±0.005 | 0.995±0.003 | 1.000±0.003 |

In this benchmark, the network is presented with the MNIST images, where pixels are shown, one by one, as a time-series. It differs from the regular sequential MNIST in that the pixels are shuffled, so that they are not shown in top-left to bottom-right order. This benchmark is known to be more challenging than the regular one. Indeed, shuffling makes time-dependencies more complex by introducing lags between pixels that are close together in the image, making structure in the time-series harder to find. MNIST images comprise 784 pixels (28 by 28), requiring dynamics over hundreds of time-steps to be learned.
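The pixel shuffling itself is straightforward: one fixed random permutation is drawn once and applied to every flattened image. A sketch (the function name and seeding convention are ours):

```python
import numpy as np

def permute_pixels(images, seed=0):
    """Apply one fixed random permutation to flattened 28x28 images,
    yielding 784-step pixel sequences (permuted sequential MNIST)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(28 * 28)          # same permutation for all images
    flat = images.reshape(len(images), -1)   # (n_images, 784)
    return flat[:, perm]
```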

| | BRC | nBRC | GORU | LSTMCell | GRUCell | LMU |
|---|---|---|---|---|---|---|
| Acc. | 0.662±0.007 | 0.908±0.006 | 0.902±0.004 | 0.910±0.002 | 0.908±0.004 | 0.969±0.001 |
| F1 | 0.655±0.007 | 0.906±0.005 | 0.897±0.008 | 0.907±0.003 | 0.902±0.006 | 0.965±0.002 |

In this benchmark, we use the same permutation of pixels in the MNIST images as for the previous benchmark. We then feed the pixels to the RNNs line by line, thus allowing one to test the networks with a higher input dimension (28 in this case). Furthermore, to highlight once again the interest of bistability, we add

Images are fed to the recurrent network line by line and

| | N | BRC | nBRC | GORU | LSTMCell | GRUCell | LMU |
|---|---|---|---|---|---|---|---|
| Acc. | 72 | 0.968±0.001 | 0.973±0.001 | 0.977±0.000 | 0.977±0.002 | 0.977±0.002 | 0.969±0.001 |
| F1 | 72 | 0.967±0.001 | 0.972±0.001 | 0.972±0.000 | 0.976±0.002 | 0.974±0.001 | 0.941±0.002 |
| Acc. | 472 | 0.960±0.001 | 0.972±0.002 | 0.198±0.021 | 0.562±0.328 | 0.591±0.388 | 0.961±0.003 |
| F1 | 472 | 0.956±0.001 | 0.972±0.002 | 0.083±0.0031 | 0.454±0.453 | 0.495±0.477 | 0.898±0.013 |

Finally, to test the capacity of the network on variable-length sequences, we also test a variation of this benchmark, which we call permuted variable-sequential-line MNIST. In this variation, a random number

Images are fed to the recurrent network line by line and

| | N | BRC | nBRC | GORU | LSTMCell | GRUCell | LMU |
|---|---|---|---|---|---|---|---|
| Acc. | 472 | 0.958±0.002 | 0.970±0.001 | 0.148±0.015 | 0.630±0.318 | 0.540±0.426 | 0.180±0.002 |
| F1 | 472 | 0.954±0.000 | 0.967±0.001 | 0.022±0.001 | 0.491±0.465 | 0.451±0.474 | 0.062±0.036 |

Until now, we have looked at the learning performance of bistable recurrent cells. It is, however, interesting to take a deeper look at the dynamics of such cells to understand whether or not bistability is actually used by the network. To this end, we pick a random time-series from the denoising benchmark and analyse some properties of a_t and c_t. For this analysis, we train a network with 4 layers of 100 neurons each, allowing the analysis of a deeper network than those used in the benchmarks. Note that the performance of this network is similar to that reported in the benchmark tables. We track, per layer, the proportion of bistable neurons (a_t > 1) and the average value of c_t. The dynamics of these parameters show that they are well used by the network, and three main observations should be made. First, as relevant inputs are presented to the network, the proportion of bistable neurons tends to increase in layers 2 and 3, effectively storing information and thus confirming the interest of introducing bistability for long-term memory. As more information needs to be stored, the network leverages the power of bistability by increasing the number of bistable neurons. Second, as relevant inputs are presented to the network, the average value of c_t tends to increase in layer 3, effectively making the network less and less sensitive to new inputs. Third, one can observe a transition regime when a relevant input is shown. Indeed, there is a sharp decrease in the average value of c_t, effectively making the network extremely sensitive to the current input, which allows for its efficient memorization.
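The two quantities tracked in this analysis can be computed directly from a layer's gate activations. A minimal sketch (the function name is ours); it assumes `a_t` and `c_t` are the per-neuron gate values of one layer at one time-step:

```python
import numpy as np

def layer_stats(a_t, c_t):
    """Proportion of bistable neurons (a_t > 1) and mean update gate c_t
    for one layer at one time-step."""
    return float(np.mean(a_t > 1.0)), float(np.mean(c_t))
```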

Layer numbering increases as layers get deeper (i.e. layer i corresponds to the ith layer of the network). The 5 time-steps at which a relevant input is shown to the model are clearly distinguishable by the behaviour of those measures alone.

In this paper, we introduced two important concepts from the biological brain into recurrent neural networks: cellular memory and bistability. This led to the development of two new cells, called the Bistable Recurrent Cell (BRC) and the recurrently neuromodulated Bistable Recurrent Cell (nBRC), which proved to be very efficient on several datasets requiring long-term memory, on which the performance of classical recurrent cells such as GRUs and LSTMs was poor. Furthermore, through the similarities between nBRCs and standard GRUs, we highlighted that gating mechanisms can be linked to biological neuromodulation.

As future work, it would be of interest to study more complex and biologically plausible neuromodulation schemes and to identify what types of new gated architectures could emerge from them. A biologically well-motivated example would be the use of a neuromodulatory network [

Furthermore, we note that even though we focused on supervised benchmarks in this paper, bistable cells might be of great use in reinforcement learning (RL), and more precisely in RL problems with sparse rewards. These problems are known to be extremely hard to solve, on the one hand due to the difficulty of exploration, and on the other hand due to the difficulty of remembering relevant information over long periods of time. Bistable cells are a promising avenue for tackling the latter, and might be a worthwhile path to explore.

Here we analyse the dynamics of a single BRC neuron in the absence of input, i.e. with x_t = 0 ∀t.

We show that the update rule has a unique stable equilibrium point (x_0, h_0) = (0, 0) for a < 1 and that it undergoes a pitchfork bifurcation at a_pf = 1, by verifying the conditions on the map
F(h_t) = c h_t + (1 − c) tanh(a h_t), with c ∈ ]0, 1[ [

The stability of (x_0, h_0) for different values of a is determined by the derivative of F at the equilibrium.

The equilibrium point is stable if dF(h_t)/dh_t ∈ [0, 1[, singular if dF(h_t)/dh_t = 1, and unstable if dF(h_t)/dh_t ∈ ]1, + ∞[. We have
dF(h_t)/dh_t = c + (1 − c) a (1 − tanh²(a h_t)),
which evaluates to c + (1 − c) a at h_t = 0, so that (x_0, h_0) is stable for a < 1.

It follows that for a < 1, the system converges to the unique equilibrium (x_0, h_0). Uniqueness holds because any equilibrium satisfies h = tanh(a h) and, for a < 1, |tanh(a h)| ≤ a|h| < |h| for all h ≠ 0; it is also consistent with the monotonicity of F(h_t) (dF(h_t)/dh_t > 0 ∀h_t).

For a > 1, (x_0, h_0) is unstable, and there exist two stable points (x_0, ±h_1) whose basins of attraction are defined by h_t ∈ ]−∞, h_0[ for −h_1 and h_t ∈ ]h_0, + ∞[ for h_1.
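This bifurcation can be checked numerically by iterating the input-free update with fixed scalar a and c (in a trained BRC these depend on inputs and state); the helper below is our sketch:

```python
import math

def iterate_brc(h0, a, c=0.5, steps=500):
    """Iterate the input-free BRC map h <- c*h + (1-c)*tanh(a*h)."""
    h = h0
    for _ in range(steps):
        h = c * h + (1.0 - c) * math.tanh(a * h)
    return h

# a < 1: every trajectory collapses onto the unique equilibrium h = 0.
# a > 1: the origin is unstable and trajectories settle on +-h_1,
#        whose sign depends on which basin the initial state lies in.
```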