
Wrote the paper: ABB MCWvR. Performed the research, using analytic mathematics, computer programming, and literature review: ABB. Conceived and designed the project: MCWvR. Performed computer simulations: MCWvR.

The authors have declared that no competing interests exist.

There is evidence that biological synapses have a limited number of discrete weight states. Memory storage with such synapses behaves quite differently from synapses with unbounded, continuous weights, as old memories are automatically overwritten by new memories. Consequently, there has been substantial discussion about how this affects learning and storage capacity. In this paper, we calculate the storage capacity of discrete, bounded synapses in terms of Shannon information. We use this to optimize the learning rules and investigate how the maximum information capacity depends on the number of synapses, the number of synaptic states, and the coding sparseness. Below a certain critical number of synapses per neuron (comparable to numbers found in biology), we find that storage is similar to that of unbounded, continuous synapses. Hence, discrete synapses do not necessarily have lower storage capacity.

It is believed that the neural basis of learning and memory is a change in the strength of synaptic connections between neurons. Much theoretical work on this topic assumes that the strength, or weight, of a synapse may vary continuously and be unbounded. More recent studies have considered synapses that have a limited number of discrete states. In dynamical models of such synapses, old memories are automatically overwritten by new memories, and it has previously been difficult to optimize performance using standard capacity measures, for stronger learning typically implies faster forgetting. Here, we propose an information theoretic measure of the storage capacity of such forgetting systems, and use it to optimize the learning rules. We find that for parameters comparable to those found in biology, the capacity of discrete synapses is similar to that of unbounded, continuous synapses, provided the number of synapses per neuron is limited. Our findings are relevant for experiments investigating the precise nature of synaptic changes during learning, and also pave the way for further work on building biologically realistic memory models.

Memory in biological neural systems is believed to be stored in the synaptic weights. Numerous computational models of such memory systems have been constructed in order to study their properties and to explore potential hardware implementations. Storage capacity and optimal learning rules have been studied both for single-layer associative networks and for related architectures.

However, in biology, as well as in potential hardware, synaptic weights can presumably only take values between certain bounds. Furthermore, synapses might be restricted to a limited number of synaptic states; e.g., the synapse might be binary. Although binary synapses might have limited storage capacity, they can be made more robust to biochemical noise than continuous synapses.

Networks with bounded synapses have the palimpsest property, i.e. old memories decay automatically as they are overwritten by new ones.

It is common to use the signal-to-noise ratio (SNR) to quantify memory storage in neural networks.

However, for discrete, bounded synapses, performance must be characterized differently, and we therefore turn to Shannon information.

We model a single neuron, and investigate how information capacity depends on the number of synapses and the number of synaptic states. We find that below a critical number of synapses, the total capacity is linear in the number of synapses, while for more synapses the capacity grows only as the square root of the number of synapses per neuron. This critical number is dependent on the sparseness of the patterns stored, as well as on the number of synaptic states. Furthermore, when increasing the number of synaptic states, the information initially grows linearly with the number of states, but saturates for many states. Interestingly, for biologically realistic parameters, capacity is just at this critical point, suggesting that the number of synapses per neuron is limited to prevent sub-optimal learning. Finally, the capacity measure allows direct comparison of discrete with continuous synapses, showing that under the right conditions their capacities are comparable.

The single neuron learning paradigm we consider is as follows: at each time-step during the learning phase, a binary pattern is presented and the synapses are updated in an unsupervised manner with a stochastic learning rule. High inputs lead to potentiation, and low inputs to depression of the synapses. Note that if we assume that the inputs cause sufficient post-synaptic activity, the learning rule can be thought of as Hebbian: high (low) pre-synaptic activity paired with post-synaptic activity leads to potentiation (depression). After the learning phase, the neuron is tested with both learned and novel patterns, and it has to perform a recognition task and decide which patterns were learned and which are novel. Alternatively, one can use a (supervised) association task in which some patterns have to be associated with a high output, and others with a low output. This gives qualitatively similar results (see below).
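This learning-and-testing paradigm can be sketched in a few lines of code. The following is a minimal illustration, not the paper's actual implementation; the number of synapses, the sparseness, the transition probabilities, and the pattern count are arbitrary choices, and binary synapses with full-strength updates are assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000                      # number of synapses (illustrative)
p = 0.5                       # coding sparseness: P(input line is high)
p_plus, p_minus = 1.0, 1.0    # potentiation / depression probabilities
T = 50                        # patterns presented during learning

w = rng.integers(0, 2, N)                      # binary weights, states {0, 1}
patterns = (rng.random((T, N)) < p).astype(int)

for x in patterns:                             # unsupervised stochastic updates
    pot = (x == 1) & (rng.random(N) < p_plus)  # high input -> potentiate
    dep = (x == 0) & (rng.random(N) < p_minus) # low input  -> depress
    w[pot] = 1
    w[dep] = 0

# Testing phase: the summed input for a learned vs a novel pattern.
h_learned = patterns[-1] @ w                   # most recently learned pattern
novel = (rng.random(N) < p).astype(int)
h_novel = novel @ w
print(h_learned, h_novel)
```

With these fastest-learning settings the weights simply mirror the most recent pattern, so the recognition signal for that pattern clearly exceeds the novel-pattern signal, while older patterns are already forgotten.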

More precisely, we consider the following setup: a single neuron receives binary inputs $x_a$ through $N$ synapses with weights $w_a$.

Binary input vectors are presented, in which each element $x_a$ is high with probability $p$, the coding sparseness, and low otherwise.

Each synapse occupies one of $W$ discrete weight states.

Finally, we note that this setup is of course an abstraction of biological memory storage. For instance, biological coding is believed to be sparse, but the relation between our definition of sparseness and sparseness as measured in biology is indirect.

After learning, the neuron is tested on learned and novel patterns. Presentation of a learned pattern yields a signal which is on average larger than for a novel pattern. Presentation of an unlearned random pattern yields the signal $h_u = Np\langle w\rangle_{\infty}$, where $w^{\infty}$ denotes the equilibrium weight distribution. The angular brackets stand for an average over many realizations of the system.

Because the synapses are assumed independent and learning is stochastic, the learning is defined by Markov transition matrices $M^{+}$ ($M^{-}$) for high (low) inputs. The distribution of the weights immediately after a high (low) input is $M^{\pm}w^{\infty}$. As $t$ subsequent uncorrelated patterns are learned, this distribution relaxes back to equilibrium according to $M^{t}M^{\pm}w^{\infty}$, where $M = pM^{+} + (1-p)M^{-}$ is the average update matrix. Note that the equilibrium distribution $w^{\infty}$ is the normalized eigenvector of $M$ with eigenvalue 1. The neuron's output is the summed input $h = \sum_a x_a w_a$.
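The Markov description can be made concrete with a small numerical sketch. The hop-by-one transition rule and the parameter values below are illustrative assumptions, not the paper's optimized matrices; the matrices are column-stochastic, $M_{ij} = P(j \to i)$:

```python
import numpy as np

W = 5        # number of synaptic states
p = 0.5      # sparseness
q = 0.1      # hop probability per update (an illustrative choice)

# Potentiation hops one state up with probability q; depression one state down.
M_plus = np.eye(W) * (1 - q) + np.diag(np.full(W - 1, q), -1)
M_plus[-1, -1] = 1.0                 # top state cannot be potentiated further
M_minus = np.eye(W) * (1 - q) + np.diag(np.full(W - 1, q), 1)
M_minus[0, 0] = 1.0                  # bottom state cannot be depressed further

M = p * M_plus + (1 - p) * M_minus   # average update matrix

# Equilibrium distribution: eigenvector of M with eigenvalue 1.
vals, vecs = np.linalg.eig(M)
w_inf = np.real(vecs[:, np.argmax(np.real(vals))])
w_inf /= w_inf.sum()

# Mean weight after a potentiating update, followed by t further patterns.
states = np.arange(W)
for t in [0, 10, 100]:
    dist = np.linalg.matrix_power(M, t) @ (M_plus @ w_inf)
    print(t, states @ dist)
```

The printed means show the signal left by a stored pattern relaxing back toward the equilibrium value as further uncorrelated patterns overwrite it.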

We define the SNR for the pattern stored $t$ time-steps ago as the difference between its mean signal and the mean novel-pattern signal, divided by the standard deviation of the novel-pattern signal.

In the testing phase we measure the mutual information in the neuron's output about whether a test pattern is learned or a novel, unlearned pattern. Given an equal likelihood of the test pattern being some learned pattern ($\ell$) or an unlearned pattern ($u$), the information follows from the output distributions $P_{\ell}(h)$ and $P_{u}(h)$.

In general the full distributions $P_{\ell}$ and $P_{u}$ are not Gaussian, but when the number of synapses is large, $P_{\ell}$ and $P_{u}$ are well approximated by Gaussians, and the information becomes a function of the SNR alone.
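Under the Gaussian approximation, the information carried by the output about the learned/novel distinction depends only on the SNR. One possible numerical implementation is shown below (unit variances, equal priors, and a simple grid integration are our simplifying assumptions):

```python
import numpy as np

def info_bits(snr):
    """Mutual information (bits) between the pattern class (learned vs novel,
    equal priors) and the output h, for unit-variance Gaussians whose means
    are separated by `snr`."""
    h = np.linspace(-10.0, 10.0 + abs(snr), 4001)
    p_l = np.exp(-0.5 * (h - snr) ** 2) / np.sqrt(2.0 * np.pi)  # learned
    p_u = np.exp(-0.5 * h ** 2) / np.sqrt(2.0 * np.pi)          # novel
    p_h = 0.5 * (p_l + p_u)                                     # output density
    # I(class; h) = sum_c P(c) * integral p(h|c) log2[ p(h|c) / p(h) ] dh
    integrand = 0.5 * p_l * np.log2(p_l / p_h) + 0.5 * p_u * np.log2(p_u / p_h)
    return integrand.sum() * (h[1] - h[0])

for snr in [0.0, 1.0, 4.0, 10.0]:
    print(snr, info_bits(snr))
```

The information vanishes at zero SNR and saturates at one bit per test pattern for large SNR, as expected for a binary (learned/novel) decision.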

As the patterns are independent, the total information is the sum of the information over all patterns presented during learning. We number the patterns using discrete time: the time associated with each pattern is the age of the pattern at the end of the learning phase, as measured by the number of patterns that have been subsequently presented. The total information per synapse, $I_S$, is obtained by summing the information over all patterns and dividing by the number of synapses $N$.

Storage capacity depends on the transition matrices $M^{+}$ and $M^{-}$. To find the maximal storage capacity we need to optimize these matrices, and this optimization depends on the sparseness, the number of synapses, and the number of states per synapse. Because these are Markov transition matrices, their columns must each sum to one, leaving $W(W-1)$ free parameters per matrix.

In the case of binary synapses ($W = 2$) we write the learning matrices in terms of a potentiation probability $p_{+}$ (a high input switches a low synapse to the high state with probability $p_{+}$) and a depression probability $p_{-}$ (a low input switches a high synapse to the low state with probability $p_{-}$). The values of $p_{+}$ and $p_{-}$ that maximize the information depend on the sparsity $p$. For dense patterns, $0.11 < p \le 0.5$, $(p_{+}, p_{-}) = (1,1)$ maximizes the information. In this case the synapse is modified every time-step and only stores the most recent pattern; the information stored on one pattern drops to zero as soon as the next pattern is learned. This leads to the equilibrium weight distribution $w^{\infty} = (1-p, p)^{T}$ and a capacity of $I_S = 0.115$ bits.

For sparser patterns the optimum is $p_{+} = 1$ and $p_{-} \approx 2p$, which leads to the equilibrium weight distribution $w^{\infty} = (2/3, 1/3)^{T}$.
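These equilibrium distributions are easy to verify numerically. The sketch below assumes column-stochastic two-state matrices with states ordered (low, high); for sparseness $p = 0.05$ and the rule $p_{+} = 1$, $p_{-} = 2p$ it recovers a distribution close to $(2/3, 1/3)$:

```python
import numpy as np

def equilibrium(p, p_plus, p_minus):
    """Equilibrium (low, high) occupancy of a binary synapse with
    column-stochastic update matrices and sparseness p."""
    M_plus = np.array([[1 - p_plus, 0.0],
                       [p_plus,     1.0]])    # high input: low -> high w.p. p_plus
    M_minus = np.array([[1.0, p_minus],
                        [0.0, 1 - p_minus]])  # low input: high -> low w.p. p_minus
    M = p * M_plus + (1 - p) * M_minus        # average update matrix
    vals, vecs = np.linalg.eig(M)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

# Dense coding with fastest learning: weights mirror the last pattern.
print(equilibrium(0.5, 1.0, 1.0))      # close to (0.5, 0.5)

# Sparse coding with p_plus = 1, p_minus = 2p.
p = 0.05
print(equilibrium(p, 1.0, 2 * p))      # close to (2/3, 1/3)
```

Balancing the potentiation flow $p\,P(\text{low})$ against the depression flow $(1-p)\,2p\,P(\text{high})$ gives $P(\text{low}) = 2(1-p)P(\text{high})$, which approaches $(2/3, 1/3)$ for small $p$.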

We next consider the limit of many synapses, for which the initial SNR is high. With Equation 8 we find that the optimal transition probabilities become small, so that learning is slow and memories decay slowly, and the equilibrium weight distribution approaches $w^{\infty} = (1/2, 1/2)^{T}$.

To verify the above results, and to examine the information between the large and small $N$ limits, we numerically optimized $p_{+}$ and $p_{-}$. We find there is a smooth interpolation between the two limiting cases, and a good match with the theory. For a given sparsity, there is a critical number of synapses beyond which the addition of further synapses does not substantially improve the information capacity; this critical number scales roughly as $p^{-1}$.

[Figure: Information storage capacity per synapse versus the number of synaptic inputs, for dense and sparse patterns.]

In terms of

We compare the storage capacity found here with that of a Willshaw net.

Since a learned pattern definitely yields the maximal signal, whereas an unlearned pattern does so only with some probability, the information per synapse $I_S$ follows as a function of the number of stored patterns. Given the number of synapses and the sparsity, one can optimize the information with respect to the number of patterns. In the limit of few synapses and sparse patterns, one can achieve $I_S = 0.11$ bits, which is several times higher than the storage we obtain for our model when coding is sparse. However, as the number of synapses increases, storage decays as $N^{-1}$, which is much faster than the $N^{-1/2}$ decay found here. (Aside: Willshaw obtains a maximum capacity of $I_S = 0.69$ bits within his framework.)

Next, we examine whether storage capacity increases as the number of synaptic states increases. Even in the small and large $N$ limits, the optimization is now harder, because the equilibrium distribution $w^{\infty}$ of a general Markov matrix has no simple closed form.

In the dense case, symmetry implies that the optimal potentiation and depression matrices are mirror images of each other, $(M^{-})_{ij} = (M^{+})_{W+1-i,\,W+1-j}$. In the limit of many synapses, the optimal learning rule takes a simple form.

Perhaps one would expect optimal storage if, in equilibrium, synapses were uniformly distributed, thus making equal use of all the states. However, the equilibrium weight distribution is peaked at both ends, and low and flat in the middle, $w^{\infty} \propto (1, a, \ldots, a, 1)^{T}$ with $a < 1$.

In the sparse case there seems to be no simple optimal transfer matrix, even in the large $N$ limit, but we can extract the scaling of $I_S$ from our analytic and numerical results. A formula consistent with the binary synapse information (Equation 14), as well as with the case of dense patterns (Equation 17), is Equation 18. As its derivation requires $p_{+}$ and $p_{-}$ to be small for it to be accurate, Equation 18 is valid when the number of synapses is large.

[Figure: Information storage capacity per synapse versus the number of synaptic states.]

For large $W$ the information eventually saturates.

Finally we study, for large $N$, a simple stochastic learning rule in which a high (low) input moves the synapse one state up (down) with probability $p_{+}$ ($p_{-}$); i.e., only transitions between neighboring states occur.

Given that simple stochastic learning performs almost as well as the optimal learning rule, we wondered how well a simple deterministic learning rule performs in comparison. In that case, synapses are always potentiated or depressed; there is no stochastic element, i.e. $p_{+} = p_{-} = 1$. Each synapse then performs a random walk between the bounds, and the memory decay time grows as $W^{2}$. Although the information grows faster with $W$, deterministic learning is inferior when the number of synapses is large.
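A quick way to see the effect of deterministic updates is to look at the relaxation of the average update matrix. The sketch below is an illustration under dense coding, with deterministic one-step hops standing in for $p_{+} = p_{-} = 1$; it estimates the memory decay time from the second-largest eigenvalue modulus of $M$, which for this rule grows roughly as $W^{2}$:

```python
import numpy as np

def decay_time(W):
    """Decay time constant (in patterns) for deterministic one-step hops
    under dense coding (p = 0.5), from the second-largest eigenvalue of M."""
    up = np.diag(np.ones(W - 1), -1)     # column-stochastic: state j -> j + 1
    up[W - 1, W - 1] = 1.0               # top state stays put under potentiation
    down = np.diag(np.ones(W - 1), 1)    # state j -> j - 1
    down[0, 0] = 1.0                     # bottom state stays put under depression
    M = 0.5 * up + 0.5 * down            # average update matrix, dense coding
    lam = np.sort(np.abs(np.linalg.eigvals(M)))[-2]   # second-largest modulus
    return -1.0 / np.log(lam)            # time for the signal to decay by 1/e

for W in [4, 8, 16, 32]:
    print(W, decay_time(W))
```

Doubling the number of states roughly quadruples the decay time, the signature of diffusive (random-walk) forgetting between the weight bounds.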

The above results raise the question whether binary synapses are much worse than continuous synapses. It is interesting to note that even continuous, unbounded synapses can store only a limited amount of information. We consider a setup analogous to that of Dayan and Willshaw, with continuous, unbounded weights updated by an additive learning rule. One finds a capacity of $I_S \approx 0.11$ bits per synapse; notably, this capacity is independent of the number of synapses and of the sparseness.

In all the above the neuron's task was to correctly recognize patterns that were learned before. We wondered if our results generalize to a case in which the neuron has to associate one half of the patterns with a low output and the other half of the patterns with a high output. This is a supervised learning paradigm which is specified by defining what happens when the input is high/low and the desired output is high/low. In other words, there are four learning matrices.

The capacities for the recognition task and the association task are comparable.

We have studied pattern storage using discrete, bounded synapses. Learning rules for these synapses can be defined by stochastic transition matrices.

Given optimal learning we find two regimes for the information storage capacity: 1. When the number of synapses is small, information per synapse is constant and approximately independent of the number of synaptic states. 2. When the number of synapses is large, capacity per synapse decreases as $N^{-1/2}$, but increases linearly with the number of synaptic states before saturating.

The implications for biology depend on the precise nature of single neuron computation. If a neuron can only compute the sum of all its inputs, then we might conclude the following. As synapses are metabolically expensive, a neuron should not maintain many more synapses than the critical number beyond which additional synapses add little information.

Furthermore, our results predict that when synapses are binary, coding is sparse, and learning is optimized, at equilibrium about 67% of synapses should occupy the low state. This is not far off the experimental figure of 80%.

We have directly compared discrete to continuous synapses. For few synapses and dense coding, binary synapses can store up to 0.11 bits of information, which is comparable to the maximal capacity of continuous synapses. However, for sparse coding and many synapses per neuron, the capacity of binary synapses is reduced. Hence, if one considered only information storage, one would conclude that, unsurprisingly, unbounded synapses perform better than binary synapses. However, in unbounded synapses, weight decay mechanisms must be introduced to prevent runaway, so the information storage capacity is necessarily reduced in on-line learning.

Finally, it is worth noting that although using Shannon information is a principled way to measure storage, it is unclear whether it is the best measure of performance for all biological scenarios.

To obtain the information capacity numerically, we used Matlab and implemented the following process. For a given number of synaptic states, number of synapses, and sparsity, we used Matlab's fminsearchbnd to search through the parameter space of all possible transfer matrices $M^{+}$ and $M^{-}$. That is, all matrix elements were constrained to take values between 0 and 1, and all columns were required to sum to 1. For each set of transfer matrices we first obtained the equilibrium weight distribution $w^{\infty}$ as the eigenvector with eigenvalue 1 of the average update matrix $M = pM^{+} + (1-p)M^{-}$.
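The same procedure can be sketched in Python. The snippet below is a simplified stand-in for the Matlab optimization: it treats only binary synapses, replaces fminsearchbnd by a coarse grid search, and uses an approximate SNR-based Gaussian information measure; the function names and all parameter values are our own illustrative choices:

```python
import numpy as np

N, p, T = 1000, 0.5, 60      # synapses, sparseness, pattern ages summed over

def info_bits(snr):
    # Information (bits) of a two-class Gaussian channel with unit variance.
    h = np.linspace(-10.0, 10.0 + abs(snr), 1001)
    pl = np.exp(-0.5 * (h - snr) ** 2) / np.sqrt(2.0 * np.pi)
    pu = np.exp(-0.5 * h ** 2) / np.sqrt(2.0 * np.pi)
    ph = 0.5 * (pl + pu)
    integrand = 0.5 * pl * np.log2(pl / ph) + 0.5 * pu * np.log2(pu / ph)
    return integrand.sum() * (h[1] - h[0])

def total_info(p_plus, p_minus):
    # Summed information over pattern ages for a binary synapse population,
    # using an approximate SNR based on the equilibrium noise.
    M_plus = np.array([[1 - p_plus, 0.0], [p_plus, 1.0]])
    M_minus = np.array([[1.0, p_minus], [0.0, 1 - p_minus]])
    M = p * M_plus + (1 - p) * M_minus
    vals, vecs = np.linalg.eig(M)
    w_inf = np.real(vecs[:, np.argmax(np.real(vals))])
    w_inf /= w_inf.sum()
    m_inf = w_inf[1]                              # equilibrium P(high)
    sigma = np.sqrt(N * p * m_inf * (1 - m_inf))  # novel-pattern noise
    dist, total = M_plus @ w_inf, 0.0
    for _ in range(T):
        total += info_bits(N * p * (dist[1] - m_inf) / sigma)
        dist = M @ dist                           # later patterns overwrite
    return total

# Coarse grid search standing in for the bound-constrained optimization.
best = max((total_info(a, b), a, b)
           for a in np.linspace(0.05, 1.0, 12)
           for b in np.linspace(0.05, 1.0, 12))
print(best)
```

Even this crude search reproduces the qualitative finding that, for many synapses, partial (slow) updates outperform the fastest-learning rule $(p_{+}, p_{-}) = (1, 1)$.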

In particular, in the case of many weight states (large $W$), the number of free parameters grows as $W(W-1)$ per matrix, and the optimization becomes computationally demanding.

Our results can also be compared to the so-called cascade model, which was recently proposed to have high SNR and slow memory decay.

Finally, we explored how well the Gaussian approximation worked. We calculated the full multinomial distribution of the total input.

We thank Henning Sprekeler, Peter Latham, Jesus Cortes, David Sterratt, Guy Billings, and Robert Urbanczik for discussion.