Digital silicon neurons rely upon implementing a set of defined differential equations using arithmetic hardware. These equations may be implemented in software on a microprocessor, as in SpiNNaker [108], but this introduces area, energy and performance overheads. By using a dedicated architecture and a defined datapath, the mathematical operations can be streamlined to reduce these overheads. This approach has been adopted by multiple groups, utilizing both FPGA and ASIC devices [122][138][130][119]. The high-speed operation of silicon circuits in comparison to biology allows multiple virtual neurons to be multiplexed across a single processing core datapath. Each neuron is then allocated a specific time frame for its operation through the datapath. A neuron’s parameters are stored in memory whilst other virtual neurons are being updated. This design concept is illustrated in Figure 4.7(a).
The number of virtual neurons allocated to a single processing core will impact the area, energy, latency and throughput of the system [127]. This parameter was defined previously in Section 3.2.3 as the system granularity and will be referred to as n.
Figure 4.7: Digital neuron design. (a) A processing core may implement n virtual neurons through a time-multiplexing approach. (b) Maximum resource design. As many functional units as possible are utilized to reduce latency and provide maximum throughput. (c) Minimum resource design. Virtual neurons are updated sequentially, increasing system latency and reducing throughput, but utilizing fewer area resources.
The datapath structure consists of a combination of arithmetic logic units and memory elements. Each neural equation may be implemented using a variety of structures depending upon the design specification. For instance, if a high throughput is required with little concern for area and energy consumption then a maximum resource approach, such as illustrated by Figure 4.7(b), should be used [124]. Alternatively, to reduce area and energy consumption the datapath could be constrained to use a minimum number of resources (see Figure 4.7(c)). Next, the impact of the datapath structure and the granularity upon the system requirements is considered.
4.3.1 Implementation Parameters
Area As defined by Cassidy et al. [127] the area consumption of a neuron processing core is dependent upon the area of the datapath and the area of the parameter memory (4.1).
$$\text{Area}_{\text{total}} = \text{Area}_{\text{datapath}} + \text{Area}_{\text{memory}} \qquad (4.1)$$
With increasing granularity n the area of the datapath will remain fixed, whilst the area of the memory will increase linearly. Therefore, the area per neuron will asymptotically reduce to the memory area consumed by an individual virtual neuron’s parameters. As such, to reduce the area overhead a large granularity should be utilized.
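The asymptotic behaviour can be written out explicitly. As a sketch, assume the memory area is exactly n times a fixed per-neuron parameter area, written here with the illustrative symbol $A_{\text{param}}$ (not a symbol from the original analysis). Dividing (4.1) by n then gives:

$$\frac{\text{Area}_{\text{total}}}{n} = \frac{\text{Area}_{\text{datapath}}}{n} + A_{\text{param}} \;\longrightarrow\; A_{\text{param}} \quad \text{as } n \to \infty$$

The datapath term is amortized across the virtual neurons, so only the per-neuron parameter storage remains at large granularity.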
Further, the analysis of Cassidy et al. [127] can be extended to include an optimization function for the datapath structure. By utilizing a maximum resource approach the datapath can be fully pipelined, increasing the throughput and the number of virtual neurons per processing core. In contrast, a minimum resource approach may reduce the datapath size but limits the number of virtual neurons, thereby requiring more processing cores to be implemented for a given network size. This leads to the following relationship (4.2):
$$\text{TotalArea} = \text{ProcessingCoreArea} \times \frac{\text{NetworkSize}}{n} \qquad (4.2)$$
Equation (4.2) states that the total area consumed is the size of a processing core multiplied by the number of processing cores that are required to compute the function. Area of the datapath may be estimated by calculating the total number of arithmetic units required. Area of the memory units may be estimated by calculating the number of bits required to represent a neuron’s parameters.
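The two area relationships can be combined in a short numerical sketch. The function names, the arbitrary area units and the example values below are all hypothetical. The sketch also makes two assumptions that are not stated in the text: the number of processing cores is rounded up to a whole number, and the datapath area is held constant while n is varied.

```python
import math


def core_area(datapath_area, param_area_per_neuron, n):
    """Area of one processing core, following (4.1).

    The memory area is assumed to be n times a fixed per-neuron
    parameter area (an assumption made for this sketch).
    """
    return datapath_area + n * param_area_per_neuron


def total_area(datapath_area, param_area_per_neuron, n, network_size):
    """Total area following (4.2).

    The core count is rounded up to a whole number of cores,
    which is an assumption added for this sketch.
    """
    cores = math.ceil(network_size / n)
    return cores * core_area(datapath_area, param_area_per_neuron, n)


# Hypothetical values in arbitrary area units, for illustration only.
DATAPATH_AREA = 1000.0
PARAM_AREA = 10.0
NETWORK_SIZE = 1024

for n in (1, 16, 256, 1024):
    area = total_area(DATAPATH_AREA, PARAM_AREA, n, NETWORK_SIZE)
    print(f"n={n:5d}  total={area:12.1f}  per neuron={area / NETWORK_SIZE:9.2f}")
```

With these illustrative values the area per neuron falls towards the per-neuron parameter area as n grows. In practice the choice of datapath structure also bounds the achievable n, which this sketch does not model.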
Energy Energy dissipation is a combination of static and dynamic power [139]. Static power relates to parasitic leakage currents within transistors. Its impact can therefore be reduced by reducing the overall number of transistors in a circuit. Alternatively, power-gating techniques can be used to fully disable transistors when they are not required.
Dynamic power is the power required to switch the gates in a circuit between voltage levels at a set frequency. Its relationship is shown in (4.3), where P is the dynamic power, C is the gate capacitance, V is the voltage of the transistor and f is the average switching frequency.
$$P = C V^{2} f \qquad (4.3)$$
Reducing the frequency of operation of the circuit will lead to power reduction. Also, a lower frequency allows for a lower voltage, which has a quadratic effect upon the dynamic power consumption.
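A minimal numerical sketch of this effect follows, assuming the standard CMOS form P = C V² f for the dynamic power relationship. The capacitance, voltages and frequencies are hypothetical, and the example simply assumes the circuit still meets timing at the lower voltage.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic power under the assumed form P = C * V**2 * f."""
    return capacitance * voltage ** 2 * frequency


# Hypothetical operating points, for illustration only.
C = 1e-9  # switched capacitance in farads (assumed value)

baseline = dynamic_power(C, 1.0, 100e6)         # 1.0 V at 100 MHz
half_freq = dynamic_power(C, 1.0, 50e6)         # frequency halved only
half_freq_low_v = dynamic_power(C, 0.8, 50e6)   # frequency halved, voltage lowered

print(f"baseline:                   {baseline:.4f} W")
print(f"half frequency:             {half_freq:.4f} W ({half_freq / baseline:.2f}x)")
print(f"half frequency, 0.8 V:      {half_freq_low_v:.4f} W ({half_freq_low_v / baseline:.2f}x)")
```

Halving the frequency alone halves the dynamic power, whereas also lowering the voltage from 1.0 V to 0.8 V brings it to roughly a third of the baseline, which reflects the quadratic dependence on voltage.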
Lowering the operating frequency will limit the number of virtual neurons per processing core. Therefore to reach a certain network size more processing cores will be required. This will increase the static power consumption.
A maximum resource datapath may utilize a slower operating frequency due to its higher throughput, reducing the dynamic power consumption. However, the increased number of transistors required will contribute towards increased static power consumption. Finding the optimal energy performance will rely upon locating a sweet-spot in the relationship between static and dynamic power per neuron. This is illustrated later in the thesis by Figure 4.22, where the static and dynamic power performance of a neuron processing core are shown as the granularity of that processing core is varied.
Latency Latency is the time required to update a single virtual neuron. A maximum resource approach will provide a theoretical lower bound upon the latency.
To operate in biological real time, the number of virtual neurons per neuron processing core should be constrained such that the total time to update all virtual neurons does not exceed the differential equation time step period. Equally, the simulation can be accelerated by reducing the latency.
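The real-time constraint can be sketched as a simple cycle budget. The function names and numbers below are hypothetical. The sketch assumes that virtual neurons are updated one after another, each occupying the datapath for a fixed number of clock cycles. For a fully pipelined datapath the effective cost would instead approach one cycle per neuron plus the pipeline fill time.

```python
def max_virtual_neurons(clock_hz, time_step_us, cycles_per_update):
    """Largest n whose sequential updates fit within one time step.

    clock_hz          -- datapath clock frequency in Hz (integer)
    time_step_us      -- differential equation time step in microseconds (integer)
    cycles_per_update -- clock cycles the datapath is occupied per neuron
                         (an assumed, fixed cost for this sketch)
    """
    cycles_available = (clock_hz * time_step_us) // 1_000_000
    return cycles_available // cycles_per_update


def acceleration_factor(clock_hz, time_step_us, cycles_per_update, n):
    """Ratio of the time step to the time spent updating n neurons.

    A value of 1.0 is real time; larger values mean the simulation
    runs faster than biological time.
    """
    return (clock_hz * time_step_us) / (1_000_000 * n * cycles_per_update)


# Hypothetical operating point, for illustration only.
CLOCK_HZ = 100_000_000   # 100 MHz
TIME_STEP_US = 100       # 100 microsecond time step
CYCLES_PER_UPDATE = 20   # assumed cost of one neuron update

n_max = max_virtual_neurons(CLOCK_HZ, TIME_STEP_US, CYCLES_PER_UPDATE)
print("maximum virtual neurons for real time:", n_max)
print("acceleration with n = 100:",
      acceleration_factor(CLOCK_HZ, TIME_STEP_US, CYCLES_PER_UPDATE, 100))
```

Allocating fewer virtual neurons than the real-time maximum leaves spare cycles in each time step, which can either be used to run the simulation faster than real time or be traded for a lower clock frequency.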
Throughput Throughput is the rate at which virtual neurons are updated. A maximum resource approach will offer an upper bound on throughput of 1 virtual neuron updated per clock cycle if all operations can be pipelined. Reducing available resources will reduce throughput significantly, especially if the datapath is no longer pipelined. The global throughput can be increased by including more neuron processing cores.
In the following sections the optimization of two different neural models is considered: a Hodgkin-Huxley model with a focus upon closed-loop in vitro experiments, and an Izhikevich model suitable for large-scale network simulations.