
As mentioned earlier, the computation of the CNNs is shared between the accelerator and the ARM cores. We still need a way to extract the structure and the weights of a model and load them into our system. We chose to transfer the model to the ZYBO board in two steps because the instructions of the accelerator are very long and complex, so they cannot easily be read or modified by hand. A further aim of this work was to make the accelerator as flexible as possible.

Thus, the first step is to encode the Keras model, a Python object, into two simple text files, one for the structure and one for the weights. The software on the ZYBO board then parses the structure file and creates the complex instructions of the accelerator. Thereafter, it copies the weights from the weights file into the appropriate storage. This allows the user to write a structure description manually and even to set the weights by hand if needed.

A model in Keras is an object with many attributes, most of which are not needed when only inference is accelerated. Our parsing of a Keras model is centered on convolution, which follows from the structure of the accelerator. As discussed previously, its components are placed one after the other. Although the max pooling and zero padding modules can be activated or deactivated, the convolution always takes place and is always followed by ReLU activation.

Consequently, when our algorithm parses a Keras model, it looks only for the convolution layers. It then checks whether their input is zero padded and whether they are followed by max pooling. This information is encoded as a text file in which each row stores all the settings needed to describe one convolution layer. Those settings include the width and height of the input feature maps, the number of input and output feature maps and, finally, an ON/OFF indication for zero padding and max pooling.
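The row encoding described above can be illustrated with a minimal Python sketch. It does not use Keras itself: the `Layer` record is a hypothetical stand-in for a Keras layer, the ReLU activation is assumed to be part of the convolution layer, and the space-separated field order is an assumption, since the exact file format is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Hypothetical stand-in for a Keras layer (not the Keras API)."""
    kind: str          # "zero_pad", "conv", "max_pool", or any other kind
    in_width: int = 0  # width of the input feature maps (conv only)
    in_height: int = 0 # height of the input feature maps (conv only)
    in_maps: int = 0   # number of input feature maps (conv only)
    out_maps: int = 0  # number of output feature maps (conv only)


def encode_conv_rows(layers: List[Layer]) -> List[str]:
    """Produce one text row per convolution layer.

    Assumed field order: width height in_maps out_maps pad_on pool_on.
    """
    rows = []
    for i, layer in enumerate(layers):
        if layer.kind != "conv":
            continue  # only convolution layers produce a row
        # Zero padding is ON if the layer right before is a padding layer.
        padded = i > 0 and layers[i - 1].kind == "zero_pad"
        # Max pooling is ON if the layer right after is a pooling layer.
        pooled = i + 1 < len(layers) and layers[i + 1].kind == "max_pool"
        rows.append(
            f"{layer.in_width} {layer.in_height} "
            f"{layer.in_maps} {layer.out_maps} {int(padded)} {int(pooled)}"
        )
    return rows
```

For a model made of a padded and pooled convolution followed by a plain convolution, this sketch yields two rows, one per convolution layer, with the padding and pooling flags set only on the first.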

Figure 44: Structure description text file for model B.

A sample structure description text file can be seen in figure 44. The file also contains information about the fully connected layers. This information is used by the software that runs on the ARM cores: it instructs the software on how to compute the rest of the neural network, which follows the convolutional part. In this case, the settings include the number of input and output neurons, together with a number that indicates which activation is to be applied at each layer's output.
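To illustrate how a numeric activation indicator could drive the software computation of a fully connected layer, the following is a minimal Python sketch. The activation codes (0 for none, 1 for ReLU, 2 for softmax), the presence of bias terms and the function names are assumptions made for illustration; the actual codes and the ARM implementation are not specified here.

```python
import math
from typing import List


def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def softmax(values: List[float]) -> List[float]:
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical mapping from the activation number in the file to a function.
ACTIVATIONS = {0: lambda values: list(values), 1: relu, 2: softmax}


def dense_forward(x: List[float], weights: List[List[float]],
                  biases: List[float], activation_code: int) -> List[float]:
    """Compute one fully connected layer.

    weights[i][j] connects input neuron i to output neuron j, so the
    number of input and output neurons is given by the matrix shape.
    """
    outputs = []
    for j in range(len(biases)):
        acc = biases[j]
        for i, x_i in enumerate(x):
            acc += x_i * weights[i][j]
        outputs.append(acc)
    return ACTIVATIONS[activation_code](outputs)
```

With such a table, each fully connected row of the description only needs to carry the two neuron counts and one small integer, which keeps the text file easy to write by hand.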

The second step in setting up the accelerator is to handle the text files on the ZYBO board. The structure description is combined with information about the feature map buffers, which are located in the DDR3 SDRAM memory, and the number of bytes that are exchanged. This complexity arises from the fact that the convolver computes a full pass of the convolutional part autonomously. Thus, the instructions must combine the settings that enable the controller to control the data stream with the commands that the Datamover module needs.

During the initialization of the convolver, the program allocates the feature map buffers in the SDRAM memory. In order to save space, we use a ping-pong configuration for the buffers, in which two buffers swap roles at each layer's computation. If, for a given layer, buffer A holds the input feature maps and buffer B stores the output, then for the next layer B will be the input buffer and A the output buffer. In this way, we save space by using only two buffers instead of one buffer per convolution layer. This configuration is possible due to the sequential nature of our simple CNNs: since each layer uses only the output of the previous one, we can overwrite data that will not be used again.
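The role swapping of the ping-pong configuration can be sketched in a few lines of Python. The function name and the base addresses in the usage example are hypothetical; the sketch only shows how the source and destination buffers alternate from layer to layer.

```python
from typing import List, Tuple


def ping_pong_roles(num_layers: int, base_a: int,
                    base_b: int) -> List[Tuple[int, int]]:
    """Return (input_base, output_base) for each convolution layer.

    Buffer A is the input of the first layer; the two buffers then
    swap roles after every layer.
    """
    roles = []
    src, dst = base_a, base_b
    for _ in range(num_layers):
        roles.append((src, dst))
        src, dst = dst, src  # the output of this layer feeds the next one
    return roles


# Usage with hypothetical base addresses for buffers A and B.
for layer, (src, dst) in enumerate(ping_pong_roles(3, 0x1000, 0x2000)):
    print(f"layer {layer}: read 0x{src:X}, write 0x{dst:X}")
```

One consequence of this scheme is that the location of the final feature maps depends on the parity of the layer count: with an odd number of convolution layers the result ends up in buffer B, otherwise in buffer A.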

The memory allocation is followed by the compilation of the instructions. The CPU can now compile the instructions together with the known base addresses and offsets. When they are ready, the program writes them into the Instructions BRAM, which is also connected to the accelerator's controller. The same happens for the weights. They do not require any further processing, however, so they can be copied directly from the text file to the weight memory that is depicted in figure 36. Finally, we set the number of convolution layers directly in a register of the controller via an AXI4-Lite interface. This is needed in order to make the controller aware of the number of instructions it has to execute.

Figure 45: The complete object tracking system. Similarly to figure 40, PS includes the predefined hardware of the Zynq SoC and PL indicates its FPGA fabric.
