CN109086879B - Method for realizing dense connection neural network based on FPGA
- Publication number: CN109086879B
- Application number: CN201810729915.4A
- Authority
- CN
- China
- Prior art keywords
- layer
- input
- feature map
- output
- channels
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Description
Technical Field
The invention belongs to the field of image processing, and in particular relates to a method for implementing a densely connected neural network based on an FPGA.
Background Art
In the field of image processing, the convolutional neural network (CNN) has become the dominant machine learning method. As neural networks have developed and network depth has increased, network performance and recognition accuracy have improved greatly. However, as networks grow deeper, the vanishing of input information and gradients gradually becomes the dominant factor preventing further accuracy gains in convolutional neural networks. Research has shown that convolutional neural networks can express features more accurately and more efficiently if shorter connections are used between layers close to the input and layers close to the output.
A densely connected neural network connects all layers whose feature maps have the same size, so as to ensure maximum information flow between network layers. Each layer concatenates the outputs of all preceding layers as its input and passes its own output to all subsequent layers. The densely connected neural network alleviates the vanishing-gradient problem, enhances feature propagation, and enables feature reuse; since it does not need to relearn redundant feature maps, it requires fewer parameters than a traditional convolutional neural network.
An FPGA (Field-Programmable Gate Array) is a high-speed, high-density semi-custom circuit in the field of application-specific integrated circuits. The chips produced by FPGA manufacturers contain no configuration information; users configure them according to their actual needs and implement the required functions by making full use of the on-chip resources.
A traditional convolutional neural network has too many parameters and a complex network structure, which makes it run slowly. A new network architecture is therefore needed that reduces the number of parameters and makes full use of the parallel-computing characteristics of the FPGA to increase the running speed of the convolutional neural network.
Summary of the Invention
The purpose of the present invention is to provide a method for implementing a densely connected neural network based on an FPGA. By concatenating the outputs of all preceding layers as the input of the current layer, features are reused and the width of each network layer can be reduced without sacrificing algorithm accuracy, thereby reducing the number of parameters. By designing the data transmit/receive logic of each layer of the neural network, data reuse is strengthened, the number of data transfers is reduced, data transfer efficiency is improved, and the parallel computation of the FPGA is fully exploited to increase the running speed of the neural network.
In order to achieve the above purpose, the solution of the present invention is:
A method for implementing a densely connected neural network based on an FPGA, comprising the following steps:
Step 1: Divide the entire convolutional neural network into multiple dense connection blocks;
Step 2: Design a convolution operation unit using the on-chip DSP resources, LUTs, and logic resources of the FPGA, and then design the FPGA-side convolution operation module;
Step 3: Design the overall data transmit/receive logic of the neural network, comprising seven parts: Input Feature Map, Send Buffer, convolution operation module, Receive Buffer, Output Feature Map, Dense Block Buffer, and Max Buffer;
Step 4: Determine the storage area sizes required for the Input Feature Map, Output Feature Map, and Dense Block Buffer according to the amount of input and output data of each layer of the densely connected neural network, and determine the storage area sizes required for the Send Buffer and Receive Buffer according to the Block size and the parallelism of the convolution operation units;
Step 5: Design the data transmit/receive logic of each layer of the densely connected neural network in detail according to its characteristics.
With the above scheme, the present invention can be implemented on the Xilinx ZYNQ-702N hardware platform. The present invention designs a convolutional neural network based on dense connections; through the dense connections, the features of preceding layers are reused, and the width of each network layer can be reduced without sacrificing algorithm accuracy, thereby reducing the number of parameters. By designing the data transmit/receive logic of each layer of the neural network, data reuse is strengthened, the number of data transfers is reduced, data transfer efficiency is improved, and the parallel computation of the FPGA is fully exploited to improve the running efficiency of the neural network.
Brief Description of the Drawings
Figure 1 shows the convolutional neural network structure designed on the basis of dense connections;
Figure 2 is a schematic diagram of the convolution unit design;
Figure 3 shows the overall data transmit/receive logic of the neural network;
Figure 4 is a schematic diagram of the Block division of a Feature Map;
Figure 5 is a schematic diagram of how the padding operation is applied to the Output Feature Map;
here, (a) shows padding being added to a single Map, and (b) shows all padded Maps being stored in order into the Output Feature Map;
Figure 6 is a schematic diagram of buffering each layer of a dense connection block into the Dense Block Buffer.
Detailed Description of Embodiments
The technical solution and beneficial effects of the present invention will be described in detail below with reference to the accompanying drawings.
The present invention provides a method for implementing a densely connected neural network based on an FPGA, comprising the following steps:
Step 1: Based on dense connections, divide the entire convolutional neural network into multiple dense connection blocks, using dense connection blocks to replace the adjacent layers with identical feature-map sizes in a traditional convolutional neural network, as shown in Figure 1. The design rules are:
(1) The feature maps of all layers within a dense connection block have the same size;
(2) Each layer within a dense connection block concatenates the outputs of all preceding layers as its input and passes its own output to all subsequent layers in the block (a sketch of this rule follows the list);
(3) Dense connection blocks are connected to each other through a convolution-pooling layer.
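As an illustration of rule (2), the minimal C sketch below shows a block whose layers each consume the concatenation of the block input and all earlier outputs. The per-layer computation, the buffer name, and all sizes are illustrative assumptions, not values from this description.

```c
/* Minimal sketch of dense-connection rule (2): each layer's input is the
 * channel-wise concatenation of the block input and all earlier layer
 * outputs. The real per-layer convolution is abstracted away; all names
 * and sizes below are illustrative assumptions. */
#include <stdio.h>
#include <string.h>

#define MAP_SIZE       8   /* feature-map height/width inside the block     */
#define BLOCK_INPUT_CH 4   /* channels entering the dense connection block  */
#define GROWTH_CH      2   /* output channels produced by each layer        */
#define NUM_LAYERS     3   /* layers inside the dense connection block      */
#define MAX_CH (BLOCK_INPUT_CH + NUM_LAYERS * GROWTH_CH)

/* Stand-in for the real convolution layer: averages its input channels. */
static void layer_forward(const float *in, int in_ch, float *out)
{
    for (int p = 0; p < MAP_SIZE * MAP_SIZE; ++p) {
        float acc = 0.0f;
        for (int c = 0; c < in_ch; ++c)
            acc += in[c * MAP_SIZE * MAP_SIZE + p];
        for (int c = 0; c < GROWTH_CH; ++c)
            out[c * MAP_SIZE * MAP_SIZE + p] = acc / in_ch;
    }
}

int main(void)
{
    /* concat[] plays the role of the Dense Block Buffer: block input first,
     * then every layer's output appended channel-wise. */
    static float concat[MAX_CH * MAP_SIZE * MAP_SIZE];
    int ch = BLOCK_INPUT_CH;                  /* channels currently buffered */

    for (int l = 0; l < NUM_LAYERS; ++l) {
        float out[GROWTH_CH * MAP_SIZE * MAP_SIZE];
        layer_forward(concat, ch, out);       /* input = block input + all earlier outputs */
        memcpy(concat + ch * MAP_SIZE * MAP_SIZE, out, sizeof(out));
        ch += GROWTH_CH;                      /* the next layer sees one more output */
        printf("layer %d: consumed %d channels, buffer now holds %d\n",
               l + 1, ch - GROWTH_CH, ch);
    }
    return 0;
}
```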
Step 2: Using the resources on the FPGA, design the convolution operation unit as shown in Figure 2, and then design the FPGA-side convolution operation module. The convolution operation module places P convolution operation units in parallel on the FPGA side to increase the degree of parallelism, where P takes the maximum value allowed by the FPGA-side resource constraints.
Convolution operation unit: multiply the m*m convolution kernel element-wise with the corresponding positions of an m*m region of the feature map, add up the products, and then add the bias to obtain one output value. The feature maps and weights to be computed are sent to the convolution operation module in turn for the multiply-accumulate operation, and the computation results are sent to the output buffer.
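A minimal sketch of what a single convolution operation unit computes is given below; the fixed kernel size m = 3 and the function name are assumptions for illustration, and on the FPGA the module would instantiate P such units in parallel.

```c
/* Sketch of one convolution operation unit: an m*m window of the feature map
 * is multiplied element-wise with an m*m kernel, the products are summed, and
 * the bias is added to give one output value. m = 3 is an assumed example. */
#define M 3

float conv_unit(const float window[M][M], const float kernel[M][M], float bias)
{
    float acc = bias;                               /* start from the bias   */
    for (int r = 0; r < M; ++r)
        for (int c = 0; c < M; ++c)
            acc += window[r][c] * kernel[r][c];     /* multiply-accumulate   */
    return acc;                                     /* one output value      */
}
```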
Step 3: Design the overall data transmit/receive logic of the neural network, comprising seven parts: Input Feature Map, Send Buffer, convolution operation module, Receive Buffer, Output Feature Map, Dense Block Buffer, and Max Buffer. As shown in Figure 3, the transmit/receive logic is as follows:
(1) Divide the data in the Input Feature Map according to the Block size, and send P Blocks at a time to the Send Buffer in Feature Map order, as shown in Figure 4. Suppose the input image size is 320*320, the number of input channels input_channels = 4, and P = 64; then P/input_channels = 16 Blocks of each Map can be sent per transfer. The first transfer is sent in the order 1_1, 1_2, 1_3, 1_4, 2_1, 2_2, 2_3, 2_4, ..., 16_1, 16_2, 16_3, 16_4, where i_j denotes the i-th Block of the j-th Map, and subsequent transfers follow the same pattern (see the sketch after this list);
(2) Send the Map data in the Send Buffer to the convolution operation module via DMA;
(3) Send the weights and biases corresponding to the Feature Map Blocks already sent to the Send Buffer, P convolution kernels at a time;
(4) Send the weights and biases in the Send Buffer to the convolution operation module via DMA and start the convolution operation;
(5) Transfer the computation results to the Receive Buffer via DMA;
(6) After a batch of Feature Maps and weights has been convolved, if max pooling is not required, the computation results received in the Receive Buffer are transferred to the Output Feature Map in Feature Map order, and each received Block is placed at the corresponding position of the corresponding Map;
If max pooling is required, the computation results received in the Receive Buffer are transferred to the Max Buffer in Feature Map order, each received Block is placed at the corresponding position of the corresponding Map, and the data in the Max Buffer is max pooled and sent to the Output Feature Map;
In a dense connection block, every layer concatenates the outputs of all preceding layers in the Dense Block Buffer and stores them in the Input Feature Map as the input of this layer, and the output of this layer is passed to all subsequent layers in the block through the Dense Block Buffer. The detailed design of the dense connection block is given in Step 5.
(7) Swap the Input Feature Map and Output Feature Map pointers, so that the output of the previous layer becomes the input of the next layer.
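The Block ordering of item (1) can be illustrated with the short sketch below, which reproduces the example numbers from the text (input_channels = 4, P = 64) and simply prints the order of the first transfer; how a Block maps to pixel data is left abstract.

```c
/* Sketch of the Step 3 (1) send order: P Blocks per transfer, interleaved
 * across the input channels (1_1, 1_2, ..., 1_4, 2_1, ...). The numbers
 * mirror the example in the text (input_channels = 4, P = 64). */
#include <stdio.h>

int main(void)
{
    const int input_channels = 4;
    const int P = 64;
    const int blocks_per_map = P / input_channels;   /* 16 Blocks of each Map per transfer */

    printf("first transfer:\n");
    for (int i = 1; i <= blocks_per_map; ++i)        /* Block index within a Map */
        for (int j = 1; j <= input_channels; ++j)    /* Map (channel) index      */
            printf("Block %d of Map %d\n", i, j);
    return 0;
}
```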
Step 4: Determine the storage area sizes required for the Input Feature Map, Output Feature Map, and Dense Block Buffer according to the amount of input and output data of each layer of the densely connected neural network, and determine the storage area sizes required for the Send Buffer and Receive Buffer according to the Block size and the parallelism of the convolution operation units (a worked sizing example follows the formulas below).
(1) Input Feature Map:
max{ input_featureMapSize_l^2 * input_channels_l | l = 0, 1, ..., n }
where input_featureMapSize_l is the size of the input feature map of layer l, input_channels_l is the number of input channels of layer l, and n is the number of layers of the entire convolutional neural network.
(2) Output Feature Map:
max{ output_featureMapSize_l^2 * output_channels_l | l = 0, 1, ..., n }
where output_featureMapSize_l is the size of the output feature map of layer l, output_channels_l is the number of output channels of layer l, and n is the number of layers of the entire convolutional neural network.
(3) Dense Block Buffer:
max{ input_featureMapSize_i^2 * input_channels_i + Σ_{j=1..n_i} output_featureMapSize_j^2 * output_channels_j | i = 1, 2, ..., m }
where input_featureMapSize_i is the size of the input feature map of the first layer of the i-th dense connection block, input_channels_i is its number of input channels, j indexes the j-th layer of the i-th dense connection block, n_i is the number of layers of the i-th dense connection block, output_featureMapSize_j is the size of the output feature map of the j-th layer of the i-th dense connection block, output_channels_j is its number of output channels, and m is the number of dense connection blocks.
(4) Send Buffer, Receive Buffer: blockSize^2 * P
where blockSize is the size of the Blocks into which the Input Feature Map is divided, and P is the number of convolution operation units.
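The following sketch works through the Step 4 sizing formulas for a small made-up network; all layer dimensions and the blockSize value are illustrative assumptions, not values taken from the patent.

```c
/* Worked example of the Step 4 sizing formulas. The three-layer dimensions
 * and the blockSize below are made-up illustrations, not patent values. */
#include <stdio.h>

int main(void)
{
    const int n = 3;                                  /* number of layers           */
    const int in_size[]  = {320, 160,  80};           /* input feature-map sizes    */
    const int in_ch[]    = {  4,  64, 128};           /* input channel counts       */
    const int out_size[] = {320, 160,  80};           /* output feature-map sizes   */
    const int out_ch[]   = { 64, 128, 256};           /* output channel counts      */

    long in_buf = 0, out_buf = 0;
    for (int l = 0; l < n; ++l) {
        long need_in  = (long)in_size[l]  * in_size[l]  * in_ch[l];
        long need_out = (long)out_size[l] * out_size[l] * out_ch[l];
        if (need_in  > in_buf)  in_buf  = need_in;     /* max over all layers */
        if (need_out > out_buf) out_buf = need_out;
    }

    const int blockSize = 80, P = 64;                  /* 320/80 = 4, so 16 Blocks per Map */
    long sr_buf = (long)blockSize * blockSize * P;     /* Send Buffer / Receive Buffer     */

    printf("Input Feature Map  : %ld values\n", in_buf);
    printf("Output Feature Map : %ld values\n", out_buf);
    printf("Send/Receive Buffer: %ld values\n", sr_buf);
    return 0;
}
```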
Step 5: Design the data transmit/receive logic of each layer of the densely connected neural network in detail according to its characteristics, including the specific designs for convolutional layers with input_channels < P, convolutional layers with input_channels > P, max pooling layers, and dense connection blocks.
(1) For a convolutional layer with input_channels < P, P Blocks are sent at a time for the convolution operation, and the results obtained are the Blocks of the corresponding output layer.
(2) For a convolutional layer with input_channels > P, P Blocks are sent at a time for the convolution operation and the results are temporarily stored in the Max Buffer. While the number of completed transfers is less than input_channels/P, the Blocks corresponding to the results of each convolution operation are added together and temporarily stored in the Max Buffer; once the number of transfers reaches input_channels/P, the Blocks corresponding to the convolution results are added together and stored in the Output Feature Map. Here input_channels must be an integer multiple of P (a sketch of this accumulation follows this step).
(3) If the next layer is a max pooling layer, the computation results received in the Receive Buffer of the current layer are transferred to the Max Buffer in Feature Map order, each received Block is placed at the corresponding position of the corresponding Map, and after all output data of the current layer has been transferred to the Max Buffer the pooling operation (m*m max pooling, stride = m) is performed:
The data in the Max Buffer is divided into blocks of m*m, and for each block the maximum of its m*m values is sent to the corresponding position in the Output Feature Map (a pooling sketch follows this step).
(4) If padding needs to be added for the next layer, a continuous storage region S of the Output Feature Map is set to zero before the current layer sends Map data to the Send Buffer, where S = (output_featureMapSize + padding)^2 * output_channels, output_featureMapSize is the size of the output feature map of the current layer, padding is the number of rows/columns of padding to be added, and output_channels is the number of output channels of the current layer. As shown in Figure 5, (a) illustrates adding padding to a single Map, and (b) illustrates storing all padded Maps in order into the Output Feature Map. Following the storage order of the Output Feature Map shown in Figure 5(b), padding is added to each Output Feature Map by storing the output Blocks received in the Receive Buffer at the corresponding positions in Output Feature Map order (a sketch follows this step).
(5) For a dense connection block, when the computation results of the i-th layer are stored in the Output Feature Map, a storage region large enough to hold the block input together with the outputs of all preceding layers must be reserved, so that the concatenation of those outputs can serve as the input of the next layer. The size of this region is determined by input_channels_1 (the number of input channels of the first layer of the dense connection block), map_size (the feature map size), i (the index of the current layer within the block), output_channels_1 (the number of output channels of the first layer), and m (the number of layers of the dense connection block).
As shown in Figure 6, the feature map input of the dense connection block and the feature map outputs of the layers in the block are stored in the Dense Block Buffer in sequence. For the input of layer i+1, the data already held in the Dense Block Buffer (the block input and the outputs of layers 1 through i-1) is concatenated with the output of layer i and stored in the Input Feature Map as the input of that layer.
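A sketch of the channel-group accumulation described in item (2) above is given below; the conv_group() stub stands in for one DMA round trip to the convolution operation module, and the Block size is an illustrative assumption.

```c
/* Sketch of the item (2) accumulation: when input_channels > P the layer is
 * processed in input_channels / P groups of P channels, the Block-wise partial
 * results of each group are summed, and only the final sum is written to the
 * Output Feature Map. conv_group() is a dummy stand-in for one DMA round trip
 * to the convolution operation module; the Block size is an assumed example. */
#include <string.h>

#define BLOCK_VALUES 400                   /* e.g. a 20*20 Block (illustrative) */

/* Stub: partial result of one group of P input channels for one output Block. */
static void conv_group(int group, float partial[BLOCK_VALUES])
{
    for (int k = 0; k < BLOCK_VALUES; ++k)
        partial[k] = (float)(group + 1);   /* dummy partial sums */
}

void accumulate_block(int input_channels, int P, float out_block[BLOCK_VALUES])
{
    float acc[BLOCK_VALUES];               /* plays the role of the Max Buffer */
    memset(acc, 0, sizeof(acc));

    int groups = input_channels / P;       /* input_channels must be a multiple of P */
    for (int g = 0; g < groups; ++g) {
        float partial[BLOCK_VALUES];
        conv_group(g, partial);            /* results received from the FPGA side */
        for (int k = 0; k < BLOCK_VALUES; ++k)
            acc[k] += partial[k];          /* add the Block-wise partial results */
    }
    memcpy(out_block, acc, sizeof(acc));   /* final sum -> Output Feature Map */
}
```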
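Item (3)'s m*m max pooling with stride m can be sketched as follows; the function operates on one feature map held in the Max Buffer and writes each window's maximum to the pooled output, with the map size assumed to be a multiple of m.

```c
/* Sketch of item (3): m*m max pooling with stride m over one feature map held
 * in the Max Buffer. "size" (the map edge length) is assumed to be a multiple
 * of m, matching the stride-m pooling described above. */
void max_pool(const float *map, int size, int m, float *pooled)
{
    int out_size = size / m;
    for (int r = 0; r < out_size; ++r) {
        for (int c = 0; c < out_size; ++c) {
            float best = map[(r * m) * size + (c * m)];
            for (int i = 0; i < m; ++i)                 /* scan the m*m window */
                for (int j = 0; j < m; ++j) {
                    float v = map[(r * m + i) * size + (c * m + j)];
                    if (v > best) best = v;
                }
            pooled[r * out_size + c] = best;            /* max -> Output Feature Map slot */
        }
    }
}
```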
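Finally, a sketch of the padding placement of item (4): a region of the Output Feature Map is zeroed and each output value is then written at an offset that leaves a zero border, so the padding is produced by the write pattern itself. The even split of the padded rows/columns between the two sides is an assumption, and the value-by-value store stands in for the Block-wise copy.

```c
/* Sketch of item (4): a continuous region of the Output Feature Map is zeroed,
 * and each output value is then stored at an offset that leaves a zero border,
 * so the padding is produced by the write pattern. Splitting the padded
 * rows/columns evenly between the two sides is an assumption. */
#include <string.h>

void store_with_padding(const float *result, int out_size, int pad,
                        int out_channels, float *out_map)
{
    int padded = out_size + pad;            /* padded map edge length          */
    int border = pad / 2;                   /* zero rows/columns on each side  */

    /* zero the continuous region S = (out_size + pad)^2 * out_channels */
    memset(out_map, 0, sizeof(float) * (size_t)padded * padded * out_channels);

    for (int ch = 0; ch < out_channels; ++ch)
        for (int r = 0; r < out_size; ++r)
            for (int c = 0; c < out_size; ++c)
                out_map[ch * padded * padded + (r + border) * padded + (c + border)] =
                    result[ch * out_size * out_size + r * out_size + c];
}
```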
The above embodiments are only intended to illustrate the technical idea of the present invention and cannot be used to limit its scope of protection. Any modification made on the basis of the technical solution in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.
Claims (3)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810729915.4A CN109086879B (en) | 2018-07-05 | 2018-07-05 | Method for realizing dense connection neural network based on FPGA |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109086879A CN109086879A (en) | 2018-12-25 |
| CN109086879B true CN109086879B (en) | 2020-06-16 |
Family
ID=64836940
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810729915.4A Active CN109086879B (en) | 2018-07-05 | 2018-07-05 | Method for realizing dense connection neural network based on FPGA |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109086879B (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110232358B (en) * | 2019-06-17 | 2023-02-10 | 重庆大学 | A dish recognition method based on image digital recognition |
| CN112966807B (en) * | 2019-12-13 | 2022-09-16 | 上海大学 | Convolutional neural network implementation method based on storage resource limited FPGA |
| CN110874813B (en) * | 2020-01-16 | 2020-05-05 | 湖南极点智能科技有限公司 | Image processing method, device and equipment and readable storage medium |
| CN112116071B (en) * | 2020-09-07 | 2024-07-23 | 地平线(上海)人工智能技术有限公司 | Neural network calculation method, device, readable storage medium and electronic device |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8131659B2 (en) * | 2008-09-25 | 2012-03-06 | Microsoft Corporation | Field-programmable gate array based accelerator system |
| US10831444B2 (en) * | 2016-04-04 | 2020-11-10 | Technion Research & Development Foundation Limited | Quantized neural network training and inference |
| CN106228240B (en) * | 2016-07-30 | 2020-09-01 | 复旦大学 | Deep convolution neural network implementation method based on FPGA |
| CN107229967B (en) * | 2016-08-22 | 2021-06-15 | 赛灵思公司 | Hardware accelerator and method for realizing sparse GRU neural network based on FPGA |
| CN107239823A (en) * | 2016-08-12 | 2017-10-10 | 北京深鉴科技有限公司 | A kind of apparatus and method for realizing sparse neural network |
| US10762426B2 (en) * | 2016-08-12 | 2020-09-01 | Beijing Deephi Intelligent Technology Co., Ltd. | Multi-iteration compression for deep neural networks |
| CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
| CN107392309A (en) * | 2017-09-11 | 2017-11-24 | 东南大学—无锡集成电路技术研究所 | A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA |
Also Published As
| Publication number | Publication date |
|---|---|
| CN109086879A (en) | 2018-12-25 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |