The authors have declared that no competing interests exist.

Conceived and designed the experiments: JG. Performed the experiments: JG. Analyzed the data: JG. Wrote the paper: JG CE.

The Semantic Pointer Architecture (SPA) is a proposal for specifying the computations and architectural elements needed to account for cognitive functions. By means of the Neural Engineering Framework (NEF), this proposal can be realized in a spiking neural network. However, in any such network each SPA transformation will accumulate noise. By increasing the accuracy of common SPA operations, the overall network performance can be increased considerably. Moreover, the representations in such networks exhibit a trade-off between being able to represent all possible values and representing only the most likely values, but with high accuracy. We derive a heuristic to find a near-optimal point in this trade-off. This allows us to improve the accuracy of common SPA operations by up to 25 times. Ultimately, it allows for a reduction in the number of neurons and a more efficient use of both traditional and neuromorphic hardware, which we demonstrate here.

The Neural Engineering Framework (NEF) [

The SPA employs a small number of mathematical operations (discussed in detail later). When those operations are implemented in a neural network, each transformation will accumulate noise. Thus, any improvement to the accuracy of these operations will improve the overall network performance by a significant factor. The goal of this work is to improve the accuracy of such computations for a given amount of neural resources. We will show that such an improvement is possible by considering the distribution of values represented in neural ensembles. By allowing a less accurate representation of rarely occurring values, we can considerably improve the representation of common values. For example, if almost all represented values have a magnitude below one, it is not useful to have neurons tuned to represent values with a larger magnitude. In fact, we derive a heuristic to determine a near-optimal trade-off between being able to represent all possibly occurring values and being able to represent only the most likely values. Identifying this heuristic allows for much more efficient simulation of neural components that perform typical SPA operations because fewer neurons can be used while maintaining the error level. Large-scale models require optimizations of this sort because they are computationally costly to simulate. For instance, the Spaun model required 2.5 h of simulation time for each second of simulated time. With fewer neurons, model simulation can run faster, or larger models can be run without increased hardware requirements. As well, such optimizations make it more feasible to implement computationally useful networks on non-traditional, neuromorphic hardware.

While traditional von Neumann computers approach physical limits, neuromorphic platforms promise continuing speed-ups and a far better energy efficiency. Projects like CAVIAR [

Choudhary et al. [

The paper is organized as follows: First, we give an introduction to the Neural Engineering Framework and the Semantic Pointer Architecture before describing our optimization methods in detail. Following that, we test the methods first with computer simulations of standard leaky integrate-and-fire neurons and then with a SpiNNaker neuromorphic implementation using fixed-point calculations. To show the applicability to large scale models, we demonstrate the improvement on a recent model of the

The methods in this paper are applicable to the Neural Engineering Framework (NEF) [

Representation: Populations of neurons represent vectors by non-linear encoding and linear decoding.

Transformation: Functions of time-varying variables can be computed by an alternative linear decoding of the vector represented by a neural population.

Dynamics: Represented values can be treated as state variables (see Principle 1). A dynamical system of these state variables can be implemented with recurrent connections. Necessary nonlinearities can be computed with Principle 2.

Here we focus on the first two principles. NEF models typically begin by suggesting a description of the cognitive system using time-varying, real-valued vectors and transformations of these vectors. The NEF specifies a population encoding given a group of neurons for such a vector. It also specifies how to approximately decode the vector or a transformation of it from the population coding of such a group of neurons. By combining encoding and decoding, populations of neurons can be connected to create networks to transmit and process information. In the following we will detail these steps (see also

The NEF specifies how an input (upper left) is encoded by a population of neurons with individual tuning curves. The orientation of the rising flank of the tuning curve corresponds to the preferred direction \(\mathbf{e}_i\) of the neuron (either −1 or +1 in this one-dimensional example). The encoding equation will typically produce spike trains for the neurons (top right). To decode the represented value, the spike trains are filtered (corresponding to synaptic filtering) and then a linear weighted sum is computed (bottom right). Using different sets of decoding weights, non-linear transformations can be implemented (bottom left).

The representation of a time-varying vector \(\mathbf{x}(t)\) is defined by the nonlinear encoding \(a_i(t) = G_i\!\left[\alpha_i \,\mathbf{e}_i \cdot \mathbf{x}(t) + J_{\mathrm{bias},i}\right]\), with \(G_i\) the neural activation function, \(\alpha_i\) a gain factor, \(\mathbf{e}_i\) the neuron's encoder or preferred direction vector, and \(J_{\mathrm{bias},i}\) the background current to the neuron. The dot product \(\mathbf{e}_i \cdot \mathbf{x}(t)\) determines how strongly the input drives the neuron, so that each neuron responds most strongly to inputs aligned with its preferred direction.

Depending on the activation function \(G_i\), the activity \(a_i(t)\) can be a continuous firing rate or a spike train \(a_i(t) = \sum_k \delta(t - t_k)\) expressed as a sum of Dirac delta \(\delta\) functions for all spike times \(t_k\). A specific neuromorphic hardware platform may dictate a particular activation function.

For simplicity, we assume \(a_i\) to be a rate approximation in the following equations. The value represented by a group of \(N\) neurons can be approximately decoded from the activities \(a_i\) as \(\hat{x}_k(t) = \sum_{i=1}^{N} a_i(t)\, d_{ik}\) with decoding weights \(d_{ik}\). Together, the encoding and decoding equations specify the first NEF core principle of representation.
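As a rough illustration of the two equations above, the encode/decode cycle can be sketched in a few lines of NumPy. This is a deliberately simplified stand-in, not Nengo's implementation: it uses a rectified-linear rate function in place of an LIF curve, plain unregularized least squares, and illustrative gain and bias ranges.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 100

# Illustrative encoding parameters: preferred directions e_i (+/-1 in
# one dimension), gains alpha_i, and bias currents J_bias,i.
encoders = rng.choice([-1.0, 1.0], size=n_neurons)
gains = rng.uniform(0.5, 2.0, size=n_neurons)
biases = rng.uniform(-1.0, 1.0, size=n_neurons)

def rates(x):
    """a_i = G[alpha_i * e_i * x + J_bias,i] with a rectified-linear G."""
    return np.maximum(gains * encoders * x + biases, 0.0)

# Linear decoders from a least-squares fit over evaluation points.
eval_points = np.linspace(-1.0, 1.0, 200)
A = np.array([rates(x) for x in eval_points])       # activity matrix
decoders, *_ = np.linalg.lstsq(A, eval_points, rcond=None)

x_hat = rates(0.5) @ decoders  # decoded estimate of x = 0.5
```

With enough neurons, the decoded estimate closely tracks the encoded value, which is the essence of the representation principle.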

The decoding weights are typically found by a least-squares optimization of the difference between the represented values and their decoded estimates.

To solve the least-squares problem for the decoding weights, we define the matrices \(\Gamma_{ij} = \sum_q a_i(\mathbf{x}_q)\, a_j(\mathbf{x}_q)\) and \(\Upsilon_{ik} = \sum_q a_i(\mathbf{x}_q)\, x_{qk}\) over a set of evaluation points \(\mathbf{x}_q\); the decoding weights are then given by \(\mathbf{D} = \Gamma^{-1} \Upsilon\).

Next, we consider the principle of transformation. The decoding weights for an arbitrary function \(f(\mathbf{x})\) can be found with the same least-squares optimization by substituting \(f(\mathbf{x}_q)\) for \(\mathbf{x}_q\) as the target values.

Individual ensembles can be connected to form larger functional networks in the NEF. To do so, we determine the decoding weights for the desired transformation from the presynaptic ensemble. The decoded value can then be fed to and encoded in the postsynaptic ensemble. Mathematically, this gives an outer product of the presynaptic decoding weights and postsynaptic encoders which specifies the synaptic connection weights from the presynaptic to the postsynaptic ensemble.

Because the third principle of dynamics is not relevant to the methods in this paper, we skip a detailed description for the sake of brevity and refer the interested reader to [

Many cognitive models implemented with the NEF, including Spaun, use the Semantic Pointer Architecture (SPA) [

The binding produces a new vector dissimilar to the original vectors. From the resulting vector, the operands can be approximately recovered by a circular convolution with the involution of the other operand. That is, \((\mathbf{a} \circledast \mathbf{b}) \circledast \mathbf{b}^{*} \approx \mathbf{a}\), where \(\mathbf{b}^{*}\) denotes the involution of \(\mathbf{b}\).

Using addition and circular convolution, multiple semantic pointers can be stored and retrieved within a single vector. Given semantic pointers for SQUARE, CIRCLE, BLUE, and RED, a scene with a blue square and a red circle could be represented as \(\mathbf{BLUE} \circledast \mathbf{SQUARE} + \mathbf{RED} \circledast \mathbf{CIRCLE}\).
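To make binding and unbinding concrete, here is a small NumPy sketch with randomly generated stand-ins for the SQUARE, CIRCLE, BLUE, and RED pointers; the circular convolution is computed via the FFT and the involution by index reversal.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 64

def semantic_pointer():
    """Random unit-length vector standing in for a semantic pointer."""
    v = rng.standard_normal(D)
    return v / np.linalg.norm(v)

def bind(a, b):
    """Circular convolution computed via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=D)

def involution(a):
    """(a_0, a_{D-1}, ..., a_1): the approximate inverse for unbinding."""
    return np.concatenate(([a[0]], a[:0:-1]))

square, circle, blue, red = (semantic_pointer() for _ in range(4))
scene = bind(blue, square) + bind(red, circle)

# Unbinding BLUE from the scene yields a noisy version of SQUARE:
recovered = bind(scene, involution(blue))
sims = {name: float(np.dot(recovered, v))
        for name, v in [("SQUARE", square), ("CIRCLE", circle),
                        ("BLUE", blue), ("RED", red)]}
```

The recovered vector is only approximately SQUARE, but its similarity to SQUARE clearly exceeds the similarity to the other pointers, which is what allows clean-up memories to identify it.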

This characterization of structured representation in the SPA has been used to build a variety of spiking neural models that simulate simple linguistic parsing [

To calculate the optimal decoding weights in the NEF it is necessary to invert an \(N \times N\) matrix for an ensemble of \(N\) neurons. By splitting a \(D\)-dimensional representation across \(D\) ensembles of \(N/D\) neurons each, this computation is in \(O(N^3/D^2)\) instead of \(O(N^3)\). Note that in most models based on the NEF and SPA, linear combinations of vector components, which allow this optimization, are extremely common. Moreover, the gap of \(O(N^3)\) vs. \(O(N^3/D^2)\) widens as the dimensionality grows.

Furthermore, if we increase the number of dimensions \(D\), the number of neurons has to grow accordingly to keep the representational error constant, so a single ensemble would require inverting a matrix whose size grows with \(D^2\) to achieve this goal. By splitting up the vector into separately represented sub-vectors, the individual matrix inversions keep a fixed size and the total cost grows only linearly with \(D\).

In the NEF, each ensemble is usually optimized over a certain radius \(r\): the evaluation points used to solve for the decoding weights are sampled from a hypersphere of radius \(r\), so that the ensemble accurately represents values up to that magnitude.

In the following we present a method to optimize the ensemble radius in these cases to increase the network accuracy.

The radius of the hypersphere from which evaluation points are chosen is central to the optimization proposed in this paper. If we choose the radius too small, values might fall outside of this hypersphere and cannot be represented well because the neural ensemble has not been optimized for values in this range. If, however, we choose the radius too large, some proportion of the evaluation points will be used to cover an irrelevant part of the input space, i.e., a part where no values have to be represented. We want to find the radius with the best trade-off between these two effects.

We proceed by defining an approximate error function of the radius. By minimizing this error function, a nearly optimal radius can be obtained. There are three factors contributing to the representation error:

Distortion \(E_{x > r}\) from points that fall outside of the optimization radius \(r\),

Distortion \(E_{x \le r}\) from points that fall inside of the optimization radius \(r\),

Noise from the spiking and random fluctuations of the neurons.

In the following we derive expressions for the static distortion given by the first two error contributions listed above. The neuron noise is assumed to be independent of the radius \(r\) and is therefore not part of the optimization.

Let \(\mathbf{x} = (x_1, \dots, x_D)\) be a random vector with unit length, distributed uniformly on the surface of the \(D\)-dimensional unit hypersphere.

This distribution allows us to derive the distribution of the length of low-dimensional subvectors of \(\mathbf{x}\).

The cumulative distribution function of the length \(\ell\) of a \(d\)-dimensional subvector is given by the regularized incomplete beta function, \(P(\ell \le r) = I_{r^2}\!\left(\tfrac{d}{2}, \tfrac{D-d}{2}\right)\).
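This length distribution can be checked numerically. The sketch below samples unit vectors and compares the squared length of a \(d\)-dimensional subvector against its expected value of \(d/D\), which follows from the squared length being Beta\((d/2, (D-d)/2)\)-distributed — a standard result for uniformly distributed unit vectors.

```python
import numpy as np

rng = np.random.default_rng(3)
D, d, n_samples = 64, 4, 50_000

# Uniform samples from the surface of the D-dimensional unit hypersphere.
x = rng.standard_normal((n_samples, D))
x /= np.linalg.norm(x, axis=1, keepdims=True)

# Squared length of the first d components; Beta(d/2, (D-d)/2)
# distributed, hence a mean of d / D.
sq_len = np.sum(x[:, :d] ** 2, axis=1)
mean_sq_len = float(sq_len.mean())
```

For a 64-dimensional semantic pointer, a 4-dimensional subvector thus carries only 1/16 of the squared length on average, which is why a radius well below 1 suffices.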

Assuming that every point outside of the optimization radius \(r\) is decoded to the closest point at the radius, the resulting distortion is \((\lVert \mathbf{x} \rVert - r)^2\). (Most neuron models used with the NEF will not do a hard cut-off at the radius, but saturate more slowly. Thus, values outside of the radius will still be represented, though with a distortion of approximately this magnitude.)

The exact distortion inside of the radius is given by the mean squared decoding error \(E_{x \le r} = \frac{1}{Q} \sum_{q=1}^{Q} \lVert \mathbf{x}_q - \hat{\mathbf{x}}_q \rVert^2\), where the \(\mathbf{x}_q\) are evaluation points sampled from the \(D\)-dimensional unit hypersphere scaled to the radius \(r\), and the \(\hat{\mathbf{x}}_q\) are the corresponding decoded estimates.

Weighting both error contributions by the probability of representing a value in the respective domains gives the complete error function: \(E(r) = P(\ell \le r)\, E_{x \le r}(r) + \int_{r}^{1} (\ell - r)^2\, p(\ell)\, \mathrm{d}\ell\), where \(p(\ell)\) is the density of the subvector length \(\ell\).

It is assumed that each vector component is stored in an individual ensemble.
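A minimal sketch of the resulting radius optimization, under simplifying assumptions: the distortion inside the radius is taken to scale as \(r^2\) with an arbitrary illustrative constant `c_inside` standing in for the ensemble-specific distortion, and the tail term is estimated by Monte-Carlo sampling of subvector lengths rather than from the analytical distribution.

```python
import numpy as np

rng = np.random.default_rng(4)
D, d = 64, 1

# Monte-Carlo sample of the lengths of d-dimensional subvectors of
# uniformly distributed D-dimensional unit vectors.
x = rng.standard_normal((100_000, D))
x /= np.linalg.norm(x, axis=1, keepdims=True)
lengths = np.linalg.norm(x[:, :d], axis=1)

def error(r, c_inside=1e-3):
    """Approximate error as a function of the radius r.

    c_inside is an illustrative constant for the distortion inside
    the radius, assumed to scale as r**2; the second term penalizes
    values outside the radius with the overshoot (length - r)**2.
    """
    return c_inside * r ** 2 + np.mean(np.maximum(lengths - r, 0.0) ** 2)

radii = np.linspace(0.05, 1.0, 96)
best_r = float(radii[np.argmin([error(r) for r in radii])])
```

For a single component of a 64-dimensional unit vector, the minimum lands well below the default radius of 1, illustrating the trade-off described above.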

To validate the derived error function we performed a number of simulations using the Nengo neural simulator [

To empirically determine the distortion we used the rate approximation of leaky integrate-and-fire (LIF) neurons given by \(G[J] = \left[\tau_{\mathrm{ref}} - \tau_{RC} \ln\!\left(1 - \frac{J_{\mathrm{th}}}{J}\right)\right]^{-1}\) for \(J > J_{\mathrm{th}}\) and \(G[J] = 0\) otherwise, with refractory period \(\tau_{\mathrm{ref}} = 2\,\mathrm{ms}\), membrane time constant \(\tau_{RC} = 20\,\mathrm{ms}\), and threshold current \(J_{\mathrm{th}}\).
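This rate approximation is straightforward to evaluate directly; the sketch below uses the parameter values above and assumes a threshold current of 1 (Nengo's usual convention, not stated explicitly here).

```python
import numpy as np

def lif_rate(J, tau_ref=0.002, tau_rc=0.020, J_th=1.0):
    """LIF rate approximation: 0 below the threshold current J_th
    (assumed to be 1 here), else 1/(tau_ref - tau_rc*ln(1 - J_th/J))."""
    J = np.asarray(J, dtype=float)
    out = np.zeros_like(J)
    active = J > J_th
    out[active] = 1.0 / (tau_ref - tau_rc * np.log1p(-J_th / J[active]))
    return out

firing = lif_rate(np.array([0.5, 2.0, 3.0]))  # rates for three currents
```

Sub-threshold currents yield zero rate, and the rate grows monotonically with the input current while saturating towards \(1/\tau_{\mathrm{ref}}\).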

The results for different parameter sets together with the analytical error estimate are plotted in the corresponding figure; the analytical estimate stays within \(10^{-3}\) of the empirical error.

Error bars on the scatter points denote the 95% confidence intervals. The empirical error is the mean of 20 trials with a duration of 10 s each. See the text for details on how the empirical error was obtained. Unless otherwise noted in the title of the individual plot, the simulations were performed with

Next we tested the accuracy of representation with spiking LIF neurons using the same neuronal parameters as for the rate neurons (\(\tau_{\mathrm{ref}} = 2\,\mathrm{ms}\), \(\tau_{RC} = 20\,\mathrm{ms}\), random voltage thresholds).

Each subplot shows the results for a fixed dimensionality. In each subplot results with the default and optimized radius for a baseline number of neurons per dimension are given and also the result for the optimized radius and heuristically reduced neuron number (see text for details).

In the test cases, the RMSE was reduced by a factor of 2.3 to 4.6 compared to using a non-optimized radius. As the mean square error is proportional to \(1/N\) for \(N\) neurons, an RMSE reduction by a factor \(f\) allows reducing the neuron number by a factor of \(f^2\) while maintaining the original error level.
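The neuron-reduction heuristic amounts to a one-line computation; `reduced_neurons` is a hypothetical helper name used here for illustration.

```python
def reduced_neurons(n_baseline, rmse_factor):
    """With the mean square error proportional to 1/N, an RMSE
    reduction by rmse_factor allows dividing the neuron count by
    rmse_factor**2 at an unchanged error level."""
    return max(1, round(n_baseline / rmse_factor ** 2))

# e.g. a baseline of 200 neurons and the smallest observed improvement:
n_opt = reduced_neurons(200, 2.3)
```

Applied to the observed improvement factors of 2.3 to 4.6, this gives neuron reductions of roughly 5- to 21-fold.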

To show that the radius optimization not only improves representational accuracy, but also the accuracy of transformations, we tested it with a circular convolution network. The same random and slowly varying input vector (a normalized random vector with white-noise components) was used as one operand, and the second operand was fixed to a random unitary vector. Otherwise, the same procedure was used as in the representation test.

The circular convolution in Nengo is computed by taking the discrete Fourier transform (DFT), multiplying the Fourier coefficients in individual ensembles, and calculating the inverse discrete Fourier transform (IDFT). This characterization of the computation in the state space results in a simple, 2-layer feedforward network. The Nengo default implementation uses a normalization factor of 1 for the DFT and a factor of \(1/D\) for the IDFT.
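As a sanity check on these normalization factors (NumPy's `np.fft` defaults use exactly a factor of 1 for the forward transform and \(1/D\) for the inverse), the element-wise product of Fourier coefficients reproduces the direct definition of circular convolution:

```python
import numpy as np

rng = np.random.default_rng(5)
D = 8
a, b = rng.standard_normal(D), rng.standard_normal(D)

# Direct O(D**2) definition of circular convolution.
direct = np.array([sum(a[j] * b[(i - j) % D] for j in range(D))
                   for i in range(D)])

# State-space version: DFT (factor 1), element-wise product of the
# Fourier coefficients, IDFT (factor 1/D).
via_dft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real
```

In the neural network, only the element-wise products have to be computed by ensembles; the DFT and IDFT are linear and can be folded into the connection weights.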

The resulting distributions of the RMSE with respect to the analytical circular convolution are shown in

Each subplot shows the results for a fixed dimensionality. In each subplot results with the default and optimized radius for a baseline number of neurons per dimension are given and also the result for the optimized radius and heuristically reduced neuron number (see text for details).

In the test cases the RMSE was reduced by a factor ranging from 1.4 to 1.8. We applied the same heuristic as in the representation test to reduce the neuron number and included the results in

In addition to the circular convolution for binding, the SPA uses dot products for comparison of semantic pointers. In Nengo a dot product is implemented by multiplying the vector components in individual parabolic multiplier networks and having the outputs project to a single ensemble to generate a sum. The default ensemble radius is 1.

Each subplot shows the results for a fixed dimensionality. In each subplot results with the default and optimized radius for a baseline number of neurons per dimension are given and also the result for the optimized radius and heuristically reduced neuron number (see text for details).

Again this optimization allows for the use of far fewer neurons with the optimized radius. We set a lower limit of five neurons per dimension, as fewer neurons could easily give rise to additional error sources. The results of reducing the number of neurons by 10 to 20 times to this lower limit are depicted in the same figure. The error of the optimized dot product is still clearly below that of the default implementation.

As discussed in the introduction, these optimizations are helpful for improving the accuracy of computations being performed in resource-limited neuromorphic hardware. Here we verify that these optimizations hold on physical neuromorphic hardware by using the SpiNNaker platform [

The results in

The results with the default and optimized radius for a baseline of 200 neurons per dimension are given and also the result for the optimized radius and heuristically reduced neuron number (see text for details).

We have shown the effectiveness of the optimization method for single ensembles. To verify the improvements on larger scale models, we employ the optimization method on an NEF model of the

The original results were obtained with the optimization methods presented in this paper with a single ensemble for each vector component. If we disable the optimization of the radius and use the Nengo default, the effect on the model performance is detrimental (see

Results for

We derived a mostly analytical approximation of the representational error of an NEF ensemble as a function of its radius. This can be used to find a near-optimal radius for the representation of a low-dimensional subvector of a high-dimensional semantic pointer. Doing so is an efficient operation, as only the activity matrix has to be evaluated for the candidate radii.

The method provides a number of important improvements. First, it can be used instead of the default radius which is only optimized for a specific neuron type and parameter set. Second, it allows us to obtain a radius without relying on trial and error methods or a rule of thumb. Third, fewer resources (e.g. simulated neurons) are needed to achieve the same level of performance as the error is reduced with a well-chosen radius.

Using the optimized radius yields a reduction of the RMSE by up to a factor of 1.8 in the case of circular convolution and up to a factor of 25.7 in the case of a dot product, compared to the current Nengo default implementation. Both operations are frequently used in cognitive models built with the Semantic Pointer Architecture. On the SpiNNaker neuromorphic platform the reduction by a factor of 1.3 is more moderate, but still allowed a reduction of the number of neurons by 40%. Also, note that only 25 dimensions were used on the SpiNNaker platform. As current limitations with the hardware implementation are overcome, allowing higher-dimensional semantic pointers, the usefulness of the presented method is expected to increase. In general, the variance of the individual unit vector components will decrease with increasing vector dimensionality, as the distribution of vector component lengths will shift to smaller values and decrease in width. That allows for a smaller radius and increases the benefit of the radius optimization.

In cognitive models a number of these operations are often used in sequence, resulting in the accumulation of error. When a smaller error is introduced by the individual operations, it is possible to build larger cognitive networks without negative functional consequences due to accumulated error. Alternatively, a reduction in the number of neurons is possible while keeping the error constant. This allows for a more efficient use of resources, including neuromorphic hardware, to run even larger models or allow for more processing in power-sensitive applications.

More generally, we have demonstrated the potential of adapting neural network parameters to the distribution of the input signals to the specific neural subsystems. In SPA models it is common to have high-dimensional vectors split up into subvectors, consequently we focussed our optimization on this particular input structure. However, future work can focus on differently structured input which should allow for related derivations of error functions that can be optimized in a similar way.

Similarly, the analysis in this paper used the _{2} norm as error measure. We expect future work to consider other cases in which different error norms might be more appropriate. The choice of norm determines the trade-off that is being made between having a majority of small errors and a few large errors versus all errors being similar in magnitude. Similar optimizations should be achievable in these cases.

By considering the probability distribution of represented values within the Semantic Pointer Architecture, we were able to derive a method for determining an optimized radius for neural networks constructed using the Neural Engineering Framework. Depending on the hardware platform and the transformation computed, neuron numbers could be reduced by 40% up to 97.5% while still achieving performance comparable to unoptimized networks. Ultimately, this allows the simulation of more complex networks, as the hardware can be used more efficiently.

We are planning to include the proposed methods in the Nengo neural network simulator in the future.
