CN109784489B - Convolutional neural network IP core based on FPGA - Google Patents

Convolutional neural network IP core based on FPGA

Info

Publication number
CN109784489B
CN109784489B (application CN201910038533.1A)
Authority
CN
China
Prior art keywords
core
convolution
neural network
input
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910038533.1A
Other languages
Chinese (zh)
Other versions
CN109784489A (en)
Inventor
常瀛修
廖立伟
曹健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yu Dunshan
SCHOOL OF SOFTWARE AND MICROELECTRONICS PEKING UNIVERSITY
Original Assignee
Yu Dunshan
SCHOOL OF SOFTWARE AND MICROELECTRONICS PEKING UNIVERSITY
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yu Dunshan, SCHOOL OF SOFTWARE AND MICROELECTRONICS PEKING UNIVERSITY filed Critical Yu Dunshan
Priority to CN201910038533.1A priority Critical patent/CN109784489B/en
Publication of CN109784489A publication Critical patent/CN109784489A/en
Application granted granted Critical
Publication of CN109784489B publication Critical patent/CN109784489B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses an FPGA-based convolutional neural network IP core, aiming at accelerating the operation of convolutional neural networks on a field-programmable gate array (FPGA). Following the basic model of a convolutional neural network, the invention specifically comprises a convolution operation IP core, a pooling operation IP core, a fully-connected operation IP core, a bubbling-method convolutional layer, a bubbling-method pooling layer, a fully-connected layer, a feature map storage module, and a parameter storage module. The IP cores support the construction of convolutional neural networks of different scales: IP cores of different types and quantities are instantiated according to the required network model. Different neural network layers are built by instantiating IP cores, and the parallelism of the FPGA is fully exploited to accelerate convolutional neural network operation. The IP cores are designed in the Verilog HDL language, enabling migration across different FPGAs. The invention greatly improves the operation speed and efficiency of the convolutional neural network and reduces its processing power consumption.

Description

Convolutional neural network IP core based on FPGA
Technical Field
The invention relates to the field of hardware acceleration of convolutional neural networks, and in particular to the design of an FPGA-based convolutional neural network IP core.
Background
With the recent rise and advancement of machine learning, deep learning, and artificial intelligence, the artificial neural network (ANN) has developed continuously and, as an artificial-intelligence field where bioscience and computer science cross-pollinate, draws attention from both academia and industry. Early artificial neural networks resembled the structure of the biological nervous system: computational structures mimicking human cerebral neurons were proposed in the mid-20th century. The dendritic branches of a neuron are modeled as multiple input data, the axon as a single output datum, and the axon's nerve-signal output is realized through a fixed data transformation, namely linear weighting.
Because the thresholds and weights of such linear weightings were set manually, the best result was not necessarily achieved. In the 1970s, researchers argued that the perceptron model could not solve linearly inseparable problems; the computing power of the time was too low to realize multilayer neural network models, and neural network research entered a low tide.
In the late 1980s, to address the heavy computation and linear inseparability problems of neural networks, researchers proposed the back-propagation algorithm, which greatly reduced neural network training time. To this day, back-propagation remains the mainstream algorithm for neural network training. Training deep neural networks nevertheless remained difficult, because computational resources were still insufficient.
In 1989, researchers proposed LeNet-5, the first true convolutional neural network (CNN), and the convolutional neural network gradually became the most widely used of the various deep neural networks. With the development of neural network algorithms, convolutional neural networks are widely applied in image and pattern recognition, object detection, semantic segmentation, and related fields.
Because of its wide application, the convolutional neural network has gradually drawn attention from academia and industry; it has clear advantages in image processing, especially in scaling of image size, feature map extraction, and the like. To fit industrial applications, the learning and classification abilities of convolutional neural networks are continuously improved, so their structures gradually become more complicated; large-scale, deep networks need large numbers of samples to train their parameters, making the computation of the training process enormous. The massive training parameters of large-scale, deep convolutional neural networks demand higher storage resources, high-throughput data processing, and high parallelism, so control-flow architectures based on the traditional computer structure cannot exploit the characteristics of convolutional neural networks. NVIDIA's CUDA (Compute Unified Device Architecture) and Google's TensorFlow framework support high-performance numerical computation on GPUs and relieve the computing pressure on general-purpose CPUs to some extent, but their development and manufacturing cost and energy-efficiency ratio cannot meet the requirements of low power and high performance, and, limited by volume and portability, they hardly support convolutional neural network applications on terminals.
The field-programmable gate array (FPGA) has emerged as a semi-custom circuit in the field of application-specific integrated circuits (ASICs). The FPGA combines the high performance and high integration of the ASIC with the flexibility of a user-programmable device; it is characterized by reconfigurability, high performance and integration, and large headroom for hardware upgrades. Thanks to this reconfigurability, and in the current absence of a general-purpose chip architecture dedicated to convolutional neural networks, the FPGA matches the high-performance numerical computation of the convolutional neural network while avoiding the low energy-efficiency ratio of single-threaded processing on general-purpose CPUs and GPUs. FPGA products also reach the market quickly; at a time when neural network structures change day by day they can be deployed rapidly, avoiding the poor flexibility of ASIC chips, which can only be designed for a specific algorithm.
Existing FPGA acceleration schemes are roughly as follows:
1. Low-power, high-performance accelerator designs that raise memory access bandwidth by stacking a small number of processing elements (PEs); with few pipelines, however, data throughput is low, and the convolution process may even contain redundant pipeline stages.
2. Building convolution kernels of different sizes from smaller processing elements (PEs), which avoids computational bottlenecks but lengthens convolution latency and somewhat limits peak computing performance.
3. Frequency-domain accelerators that use variable-size OaA (overlap-and-add) convolution kernels to reduce the number of convolutions and improve generality across kernel sizes; but because the OaA stage is built from fixed-size FFTs, the convolution process requires zero-padding at FFT edges and tiling of the kernels, again lengthening convolution latency.
4. Hybrid neural network processors that accelerate different neural networks on demand through a 16x16 reconfigurable heterogeneous PE array, using adaptive bit-width configuration to cut power and raise efficiency. Such reconfigurable computing platforms exist mainly because the market has few general-purpose neural network processors, and accelerators are usually designed for one neural network or one network model. A reconfigurable platform can adapt to most existing neural networks or models, including CNN, FCN, and RNN, improving the platform's energy-efficiency ratio. However, because the platform hardens CNN, FCN, and RNN together, it occupies substantial logic resources, so the power consumption of its computing process cannot be reduced effectively; its scalability is low, its demands on FPGA resources and performance are high, and development cost rises.
In summary, the convolutional neural network and its terminal application market are very broad. Although network models are adapted to different application scenarios, the basic components of a CNN (convolution, pooling, full connection, activation functions, and so on) change little, so an FPGA can cope with CNNs for different scenarios. On the one hand, an IP core designed for the convolutional neural network contains these basic components; designed in the Verilog HDL language, it is easy to deploy on different FPGAs and embedded systems and is highly portable. On the other hand, hardware design, modification, and debugging are a threshold for software and algorithm engineers unfamiliar with hardware, which can increase cost and working hours for enterprises. To lower this threshold and reduce the time cost of enterprises and researchers, the IP cores designed for the convolutional neural network provide callable interfaces, making it convenient for users to build different convolutional neural network models on different FPGAs, which is of great significance in supporting hardware acceleration of convolutional neural networks.
Disclosure of Invention
The invention provides an FPGA-based convolutional neural network IP core, aiming to build the hardware structure of a convolutional neural network in an FPGA quickly and conveniently, accelerate the feedforward operation of the convolutional neural network, lower the hardware design threshold for software and algorithm engineers, and facilitate the development and verification of algorithms and terminal products.
The intended application scenario of the invention takes the FPGA as the accelerator hardware platform, with a reconfigurable convolutional neural network model and a data stream covering convolution, pooling, full connection, activation functions, and the convolutional neural network feature maps. It improves operational performance and efficiency while keeping power consumption low. The FPGA has a standard interface configuration and supports extension of the convolutional neural network.
In order to achieve the purpose, the invention adopts the following technical scheme:
the convolutional neural network IP core based on the FPGA is characterized in that the specific IP core and the composition module comprise a convolutional operation IP core, a pooling operation IP core, a full-connection operation IP core, a bubbling method convolutional layer, a bubbling method pooling layer, a full-connection layer, a feature map storage module and a parameter storage module.
The neural network layer formed by the IP cores is internally interconnected with the parameter storage module and the feature map storage module, and a hardware structure formed by the neural network layer, the parameter storage module and the feature map storage module is consistent with a required convolutional neural network algorithm structure.
The convolutional neural network IP core based on the FPGA is characterized in that the convolution operation IP core comprises an input feature map buffer, a weight parameter buffer, a multiplier, an adder, and an activation function module;
1) The convolution operation IP core reads the feature points of the feature map row by row, one per clock cycle, forming a data stream. The input feature map buffer is a register group with configurable depth, used to shift and buffer the input feature map data stream; its depth is modified according to the numbers of rows and columns of the input feature map to support convolutional neural networks of different scales, and the input feature map data at fixed address intervals are connected to the multipliers.
2) The weight parameter buffer is a register group with configurable depth, used to shift and buffer the weight parameters; it is held fixed once the register group is filled with weight parameters. Its hardware structure is the same as that of the input feature map buffer, i.e., the weight parameter data at the same fixed address intervals are connected to the multipliers.
3) The multipliers and adders connected to the fixed address intervals of the input feature map buffer and the weight parameter buffer form multiply-add pairs; together with the feature points of the feature map and the corresponding weight parameters, they form the complete convolution operation.
4) The convolution operation IP core adopts the common ReLU activation function; the activation function module is a MUX multiplexer, equivalent to the formula f(x) = max(0, x).
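To make the activation module concrete, here is a minimal Verilog sketch of the ReLU-as-multiplexer described above; the module and port names are illustrative assumptions, not taken from the patent:

```verilog
// Minimal sketch of the ReLU activation module as a MUX multiplexer.
// Assumes signed fixed-point data; module/port names are illustrative.
module relu_mux #(
    parameter WIDTH = 16
)(
    input  wire signed [WIDTH-1:0] din,   // convolution result x
    output wire signed [WIDTH-1:0] dout   // f(x) = max(0, x)
);
    // The sign bit selects between zero and the input value.
    assign dout = din[WIDTH-1] ? {WIDTH{1'b0}} : din;
endmodule
```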
The convolution neural network IP core based on the FPGA is characterized in that the pooling operation IP core comprises an input feature map buffer and a comparator;
1) the input characteristic diagram buffer is a register group with configurable depth, is used for shifting and buffering the input characteristic diagram data stream, can modify the depth of the register group according to the number of rows and columns of the input characteristic diagram to support convolutional neural networks with different scales, and the input characteristic diagram data of a fixed address interval is connected with the comparator.
2) The input port of the comparator is connected with 4 fixed address intervals of the input feature map buffer, and the output port is the maximum value of the 4 data.
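A minimal Verilog sketch of such a 2x2 max-pooling comparator follows; the widths, names, and the unsigned comparison (valid for post-ReLU, non-negative data) are assumptions:

```verilog
// Sketch of the 2x2 max-pooling comparator: four taps from the input
// feature map buffer, output is their maximum. Names are assumptions.
module max4 #(
    parameter WIDTH = 16
)(
    input  wire [WIDTH-1:0] d00, d01,  // taps IC*0+0, IC*0+1
    input  wire [WIDTH-1:0] d10, d11,  // taps IC*1+0, IC*1+1
    output wire [WIDTH-1:0] max_out    // output feature point
);
    wire [WIDTH-1:0] m0 = (d00 > d01) ? d00 : d01;
    wire [WIDTH-1:0] m1 = (d10 > d11) ? d10 : d11;
    assign max_out = (m0 > m1) ? m0 : m1;
endmodule
```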
The convolution neural network IP core based on the FPGA is characterized in that the full-connection operation IP core comprises a counter, an accumulator, a multiplier and an adder;
1) the multiplier receives the input node data and the one-to-one corresponding weight parameters, completing the multiplication of each input node with its weight parameter.
2) The accumulator is composed of a register and an adder, and the operation result of the multiplier is input into the register of the accumulator and is accumulated with the multiplication result of the next clock cycle to form multiply-accumulate operation.
3) The counter controls the iteration period of the multiply-accumulate operation; one iteration period means all input nodes and their corresponding weight parameters complete the multiply-accumulate operation, yielding the data of one output node. After one iteration period finishes, the counter gates the multiply-accumulate result, which is added to the bias parameter corresponding to that output node.
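The multiply-accumulate datapath described in items 1) to 3) can be sketched in Verilog as follows; all names, widths, and the valid handshake are illustrative assumptions rather than the patent's interface:

```verilog
// Sketch of the fully-connected multiply-accumulate datapath: a
// multiplier feeds an accumulator register; a counter closes one
// iteration period after N products and folds in the bias.
module fc_mac #(
    parameter WIDTH = 16,
    parameter N     = 256,          // number of input nodes
    parameter ACCW  = 2*WIDTH + 8   // accumulator width (headroom for N sums)
)(
    input  wire                     clk,
    input  wire                     rst,
    input  wire signed [WIDTH-1:0]  f_data,   // input node value
    input  wire signed [WIDTH-1:0]  w_data,   // matching weight parameter
    input  wire signed [WIDTH-1:0]  bias,     // bias of the current output node
    output reg  signed [ACCW-1:0]   outnode,  // output node value
    output reg                      valid     // pulses when outnode updates
);
    reg  [31:0]               cnt;            // iteration counter
    reg  signed [ACCW-1:0]    acc;            // accumulator register
    wire signed [2*WIDTH-1:0] prod = f_data * w_data;

    always @(posedge clk) begin
        if (rst) begin
            cnt <= 0; acc <= 0; valid <= 1'b0;
        end else if (cnt == N-1) begin        // one iteration period done
            outnode <= acc + prod + bias;     // fold in last product and bias
            acc     <= 0;
            cnt     <= 0;
            valid   <= 1'b1;
        end else begin
            acc   <= acc + prod;              // multiply-accumulate
            cnt   <= cnt + 1;
            valid <= 1'b0;
        end
    end
endmodule
```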
The convolution neural network IP core based on the FPGA is characterized in that the bubbling convolution layer comprises a bubbling controller and a convolution operation IP core;
1) The three clock sequences of the bubbling-method controller are the read input feature map sequence, the read weight parameter sequence, and the convolution process sequence. The waiting interval of the read input feature map sequence and the read weight parameter sequence is the time during which the input feature map data and the weight parameter data respectively enter the convolution operation IP core's input feature map buffer and weight parameter buffer; the convolution results in this interval are invalid. The holding interval of the read weight parameter sequence means the weight parameter buffer has been filled with weight parameter data and is held fixed; during this interval the input feature map data stream flows through the input feature map buffer, which is the actual convolution processing, and the convolution results of this interval are partially valid. The waiting interval of the convolution process sequence is the same as for the read input feature map and read weight parameter sequences. A valid interval in the convolution process indicates that the output feature map data are valid, i.e., valid results produced when the convolution kernel does not cross rows; an invalid interval indicates that the output feature map data are invalid, i.e., invalid results produced when the convolution kernel crosses rows. If the convolution process is regarded as liquid and the invalid intervals as gas, keeping only the valid results is equivalent to discharging excess gas from the liquid, hence the name bubbling.
2) And stacking convolution operation IP cores with the same number as the depth of the input feature map to form a single convolution core of the convolutional layer, instantiating the convolution core required by the convolutional layer, and forming the bubbling convolutional layer together with the bubbling controller.
The convolution neural network IP core based on the FPGA is characterized in that the bubbling method pooling layer comprises a bubbling method controller and a pooling operation IP core;
1) The four clock sequences of the bubbling-method controller are a read input feature map sequence, a column valid sequence, a row valid sequence, and a pooling process sequence. The waiting interval of the four clock sequences is the time during which the input feature map data enter the input feature map buffer of the pooling operation IP core; the pooling results in this interval are invalid. Each valid pooling result of the column valid sequence is separated by an invalid pooling result, indicating that the column stride of the pooling filter is 2; each valid row of the row valid sequence is separated by an invalid row, indicating that the row stride of the pooling filter is 2. If the pooling process is regarded as liquid and the invalid pooling results of the column valid and row valid sequences as gas, keeping only the valid results is equivalent to discharging excess gas from the liquid; similar to the bubbling-method convolutional layer controller, this is called bubbling.
2) And stacking the pooling operation IP cores with the same number as the depth of the input feature map to form a single pooling core of the pooling layer, and forming the bubbling method pooling layer together with the bubbling method controller.
The convolution neural network IP core based on the FPGA is characterized in that the full-connection layer comprises a full-connection layer controller and a full-connection operation IP core;
1) The fully-connected layer controller comprises a weight parameter address reader, a bias parameter address reader, a counter, and an input feature map address reader. The input feature map address reader and the weight parameter address reader each output an address signal per clock cycle, reading the one-to-one corresponding input node and weight parameter data into the fully-connected operation IP core. The bias parameter address reader is controlled by the counter: after the multiply-accumulate of the first output node completes, it reads the bias parameter corresponding to that output node.
2) And the control port of the full-connection layer controller is in point-to-point connection with the corresponding ports of the characteristic map storage module and the parameter storage module and forms a full-connection layer together with the full-connection operation IP core.
The convolutional neural network IP core based on the FPGA is characterized in that the feature map storage module comprises global feature map caches corresponding to all dimensions, whose internal structure is composed of shift buffers. The buffer depth is the number of feature map rows multiplied by the number of feature map columns, and the number of buffers is the depth of the three-dimensional feature map. The input port of the feature map storage module is connected point-to-point with the address port corresponding to each layer's controller, and the output port is connected with the input ports of each neural network layer's IP cores.
The convolutional neural network IP core based on the FPGA is characterized in that the parameter storage module comprises a global weight parameter and a bias parameter of the convolutional neural network model and is interconnected with each neural network layer.
Drawings
FIG. 1 is a system block diagram of the IP core of the FPGA-based convolutional neural network of the present invention;
FIG. 2 is a block diagram of the hardware structure and internal and external components of the convolution operation IP core of the present invention;
FIG. 3 is a schematic diagram of the timing control logic of the bubble convolutional layer of the present invention;
FIG. 4 is a block diagram of the bubble convolution layer hardware structure and internal and external components of the present invention;
FIG. 5 is a block diagram of the hardware structure and internal and external components of the pooling operation IP core of the present invention;
FIG. 6 is a schematic diagram of the timing control logic of the bubbling pooling layer of the present invention;
FIG. 7 is a block diagram of the bubbling-method pooling layer hardware structure and internal and external components of the present invention;
FIG. 8 is a schematic diagram of a fully-connected neural network according to the present invention;
FIG. 9 is a diagram of the hardware architecture of a fully-connected arithmetic IP core of the present invention;
FIG. 10 is a block diagram of the hardware architecture and internal and external components of the fully-connected layer controller according to the present invention;
FIG. 11 is a block diagram of the fully-connected layer hardware structure and internal and external components of the present invention;
fig. 12 is a schematic diagram of a power measurement result during FPGA operation.
Detailed Description
By analyzing the basic characteristics of the convolutional neural network, studying the current research at home and abroad together with its advantages and disadvantages, and exploiting the high parallelism, high energy-efficiency ratio, and reconfigurability of the FPGA, the invention designs FPGA-based convolutional neural network IP cores that accelerate the feedforward propagation of the convolutional neural network in its 3 computation-intensive aspects: convolution, pooling, and full connection. IP cores designed in the Verilog HDL language can efficiently build the required hardware structures with minimal logic resources and are easily ported to different types of FPGAs.
The following basic unit definitions are first given for the detailed description and its mathematical expressions:

Table 1 CNN basic unit definitions of the invention

| Symbol | Definition |
| --- | --- |
| IC | number of columns of the input feature map |
| IR | number of rows of the input feature map |
| ID | depth (number of channels) of the input feature map |
| CC | number of columns of the convolution kernel |
| CR | number of rows of the convolution kernel |
| n | number of convolution kernels in a convolutional layer |
Referring to fig. 1, a neural network accelerator composed of convolutional neural network IP cores based on FPGA includes a convolutional operation IP core, a pooling operation IP core, a full-connection operation IP core, a bubble convolution layer, a bubble pooling layer, a full-connection layer, a feature map storage module, and a parameter storage module.
Fig. 2 specifically describes an internal hardware structure of the convolution operation IP core and a connection manner of external component modules.
The convolution operation IP core comprises an input feature map buffer, a weight parameter buffer, a multiplier, an adder, and an activation function module. Before convolution begins, all parameters of the trained CNN are stored in the parameter storage module in BRAM to improve parameter access efficiency. The parameters IC and CR inside the convolution IP core are modified manually to match the size of the current convolutional layer. When convolution starts, the feature point pixel values in the feature map storage module pass row by row, as a data stream, through the input feature map buffer inside the convolution operation IP core. For the first convolutional layer, the data stream is the binarized and scaled real-time image stream collected by a camera. CC corresponds to a light shaded portion of the input feature map buffer; there are CR such portions in the buffer, and two adjacent portions are separated by IC-CC unit registers. The depth of the configurable input feature map buffer is (CR-1)xIC+CC; IC is modified to adapt to different input feature map sizes, supporting the construction of convolutional neural networks of different scales.
Similarly, the weight parameters in the parameter storage module also enter the weight parameter buffer inside the IP core as a data stream; the structure of this buffer is identical to that of the input feature map buffer. The weight parameters are stored with (IC-CC) zeros inserted after every CC valid weight parameters, and are read sequentially during the convolution process.
The weight parameter buffer stays unchanged after the data are stored, while the input feature map buffer keeps shifting to buffer the entire input feature map data stream. The two shift buffers have fixed address intervals: {ICx0+0, ICx0+1, ..., ICx0+(CC-1)}, {ICx1+0, ICx1+1, ..., ICx1+(CC-1)}, ..., {ICx(CR-1)+0, ICx(CR-1)+1, ..., ICx(CR-1)+(CC-1)}. The number of addresses inside each brace is the number of convolution kernel columns, and the number of braces is the number of kernel rows. The fixed address intervals of the two shift buffers are connected to the multiply-add units, forming the abstract convolution operation IP core.
The convolution process runs from the moment the first feature point of the input feature map enters the input feature map buffer until the last feature point enters; equivalently, the convolution kernel shifts from the upper-left corner to the lower-right corner of the feature map, scanning the whole input feature map.
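A hedged Verilog sketch of this shift buffer with its fixed-address taps follows; the parameter defaults and the flattened window bus are assumptions, not the patent's actual code:

```verilog
// Sketch of the input feature map shift buffer: depth (CR-1)*IC+CC,
// with the CR x CC window taken from fixed address intervals IC*r+c.
module fmap_linebuf #(
    parameter WIDTH = 16,
    parameter IC    = 28,   // input feature map columns
    parameter CR    = 5,    // kernel rows
    parameter CC    = 5     // kernel columns
)(
    input  wire                   clk,
    input  wire                   en,      // stream one feature point per cycle
    input  wire [WIDTH-1:0]       din,
    output wire [CR*CC*WIDTH-1:0] window   // CR x CC taps to the multipliers
);
    localparam DEPTH = (CR-1)*IC + CC;
    reg [WIDTH-1:0] buf_r [0:DEPTH-1];
    integer i;

    always @(posedge clk) if (en) begin
        buf_r[0] <= din;
        for (i = 1; i < DEPTH; i = i + 1)
            buf_r[i] <= buf_r[i-1];        // shift the data stream
    end

    genvar r, c;
    generate
        for (r = 0; r < CR; r = r + 1) begin : g_row
            for (c = 0; c < CC; c = c + 1) begin : g_col
                // fixed address interval IC*r+c feeds one multiplier
                assign window[(r*CC+c)*WIDTH +: WIDTH] = buf_r[IC*r + c];
            end
        end
    endgenerate
endmodule
```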
The bubble method convolutional layer needs to be matched with a bubble method controller, and details of the bubble method timing control logic are described with reference to fig. 3:
Referring to fig. 3(a), the three clock sequences of the bubbling-method controller are a read input feature map sequence, a read weight parameter sequence, and a convolution process sequence. The waiting intervals of the three clock sequences are the time during which the input feature map data and the weight parameter data respectively enter the input feature map buffer and the weight parameter buffer inside the convolution operation IP core; the convolution results in these intervals are invalid. This process delays clk_wait = (CR-1)xIC + CC - 1 clock cycles. The holding interval of the read weight parameter sequence means the weight parameter buffer has been filled with weight parameter data and is held fixed; during this interval the input feature map data stream passes through the input feature map buffer, which is the actual convolution processing, and the convolution results of this interval are partially valid.
The valid interval of the convolution process sequence indicates that the output feature map data are valid, corresponding to the blank valid output feature map data of fig. 3(b), which are stored immediately; per row this takes clk_val = IC - CC + 1 clock cycles. The invalid interval of the convolution process sequence indicates that the output feature map data are invalid, i.e., the invalid data produced when the convolution kernel crosses rows, corresponding to the grey invalid output feature map data in fig. 3(b), which are filtered out; per row this takes clk_unval = CC - 1 clock cycles. If the convolution process is regarded as liquid and the invalid intervals as gas, keeping only the valid results is equivalent to discharging excess gas from the liquid, hence the name bubbling. The total convolution delay is:
clk_conv_total = clk_wait + {[(IC-CC+1) + (CC-1)]x(IR-CR) + (IC-CC+1)} = ICxIR clock cycles.
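The delay bookkeeping above can be expressed as a small valid-flag generator. The following Verilog sketch is an assumption-laden illustration: names are invented, the modulo is written for clarity (a synthesizable design would keep a column counter), and end-of-frame handling after ICxIR cycles is omitted:

```verilog
// Sketch of the bubbling valid flag for the convolution stream.
// After the initial wait of (CR-1)*IC+CC-1 cycles, each image row
// yields IC-CC+1 valid results followed by CC-1 invalid "bubbles".
module conv_bubble_ctrl #(
    parameter IC = 28,  // input feature map columns
    parameter CR = 5,   // kernel rows
    parameter CC = 5    // kernel columns
)(
    input  wire clk,
    input  wire rst,
    output wire valid   // high when the convolution result is valid
);
    localparam WAIT = (CR-1)*IC + CC - 1;
    reg  [31:0] cnt;                        // cycles since start
    wire [31:0] col = (cnt - WAIT) % IC;    // position within the row

    always @(posedge clk)
        if (rst) cnt <= 32'd0;
        else     cnt <= cnt + 32'd1;

    // Valid once the buffers are full and while the kernel window
    // does not cross a row boundary.
    assign valid = (cnt >= WAIT) && (col <= IC - CC);
endmodule
```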
Referring to fig. 4, the feature map storage module has registered the input feature map of the current convolutional layer, whose depth is ID. The hardware structure of a single convolution kernel (dashed frame) is constructed by instantiating ID general convolution operation IP cores; the n convolution kernels of the convolutional layer are then instantiated to increase the number of channels, achieving kernel-parallel convolution operation. The instantiated convolution operation IP cores, convolution kernels, and bubbling-method controller construct CNN convolutional layers of different sizes. The feature map produced by the convolution operation is again stored in the feature map storage module.
Referring to fig. 5, the hardware structure of the pooling operation IP core is similar to that of the convolution operation IP core, and the output feature map data of the previous layer stored in the feature map storage module is first read into the input feature map buffer of the pooling operation IP core in a data stream manner, where the buffer is a register group with configurable depth, and the depth is IC × (CR-1) + CC. Parameters IC and CR of the pooling operation IP core are manually modified to adapt to different input characteristic diagram sizes, so that the convolutional neural network building with different scales is supported.
When the input feature map data stream continuously flows into the input feature map buffer, the sliding translation of the pooling operation IP core is realized. The invention defines the size of the pooling filter to be 2 x 2, the step size is 2, so the fixed address interval of the shift register is: { IC × 0+0, IC × 0+1}, { IC × 1+0, IC × 1+1} are connected to the input ports of the comparators. The comparator takes the 4 maximum data values as the characteristic points of the output characteristic diagram and stores the characteristic points in the characteristic diagram storage module.
The bubbling-method pooling layer must be paired with a bubbling-method controller; the bubbling timing control logic of the pooling layer is detailed with reference to fig. 6:
Referring to fig. 6, the bubbling timing control logic of the pooling layer has 4 clock sequences: a read input feature map sequence, a column valid sequence, a row valid sequence, and a pooling process sequence. The pooling layer reads the input feature map data row by row from the feature map storage module into the pooling operation IP core. The waiting intervals of the 4 clock sequences indicate that the input feature map data stream is filling the buffer inside the pooling operation IP core, delaying clk_wait = (CR-1)xIC + CC - 1 clock cycles.
Referring to fig. 6, reading the input signature sequence shaded region indicates that the input signature data has filled the buffer of the IP core of the pooling operation, and the pooling process begins. The bubble method controller makes the input characteristic diagram input a data column by column and row by row into the buffer memory of the pooling operation IP core according to a clock cycle, and this process actually causes the covered part of the filter to move by step size 1, which is not in accordance with the present invention. Therefore, the column valid sequence selects or rejects the pooled output data, starting from the buffer of the IP core for which the input feature map data fills up the pooled operation, the first result is valid data of the output feature map, the result obtained after the next clock cycle enters the buffer is invalid data, and so on. The column valid sequence shaded region represents column valid data of the output feature map.
The step size of the pooling filter is 2, when the buffer shifts to store data according to the original clock period, the step size after crossing rows of the covered part of the filter is actually 1, so the output characteristic diagram data of the next row after crossing rows is invalid data, refer to the blank area in the valid sequence of row 6.
The sequence of pooling processes is similar to the sequence of convolution processes, and referring to fig. 6, the pooling process is regarded as liquid, the invalid data in the column valid sequence and the invalid data in the row valid sequence, i.e., the blank space, are regarded as gas, and the pooling process is equivalent to a process of exhausting air in the liquid, and is therefore called bubbling. The shaded regions of the pooling process sequence represent valid data, which is stored in the feature map storage module.
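Analogously to the convolution case, the column and row valid decisions can be sketched as counter parity checks. The following Verilog is illustrative only; the names, the division/modulo style (for clarity, not synthesis), and the default sizes are assumptions:

```verilog
// Sketch of the pooling bubbling flags: 2x2 filter, stride 2, so a
// window result is kept only on even columns and even rows and when
// the window does not cross a row boundary.
module pool_bubble_ctrl #(
    parameter IC = 24,  // input feature map columns
    parameter CR = 2,   // pooling filter rows
    parameter CC = 2    // pooling filter columns
)(
    input  wire clk,
    input  wire rst,
    output wire keep    // high for pooling results that are stored
);
    localparam WAIT = (CR-1)*IC + CC - 1;
    reg  [31:0] cnt;
    wire [31:0] pos = cnt - WAIT;
    wire [31:0] col = pos % IC;   // column valid sequence: even columns
    wire [31:0] row = pos / IC;   // row valid sequence: even rows

    always @(posedge clk)
        if (rst) cnt <= 32'd0;
        else     cnt <= cnt + 32'd1;

    assign keep = (cnt >= WAIT)
               && (col[0] == 1'b0)     // column stride 2
               && (row[0] == 1'b0)     // row stride 2
               && (col <= IC - CC);    // drop row-crossing windows
endmodule
```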
For the hardware structure of bubbling-method pooling, refer to fig. 7: the bubbling-method pooling layer comprises a pooling core and a bubbling-method controller. ID pooling operation IP cores are instantiated to construct the pooling filter, and a bubbling-method controller is instantiated to control the pooling filter's processing of the input feature map. A CNN pooling layer has only one pooling filter, but the number of pooling operation IP cores composing the filter is not fixed: a pooling layer generally follows a convolutional layer, and the depth of the output feature map produced by the convolutional layer varies, so the number of pooling operation IP cores in the filter must be modified manually.
The instantiated pooling operation IP cores, pooling filter, and bubbling-method controller construct CNN pooling layers of different sizes. The pooled feature map is again stored in the feature map storage module.
The last layer of a CNN model is usually a fully-connected layer; the algorithm model of the fully-connected layer is shown in fig. 8. FIG. 8(a) shows a single-layer fully-connected neural network with n input nodes and m output nodes, where x represents the input node values and y represents the output node values. FIG. 8(b) takes the y1 output node as an example; the numerical result of any output node j is calculated by the following formula:
yj = x1xw1j + x2xw2j + ... + xnxwnj + biasj
In the CNN feedforward process, the fully-connected layer functions as a "classifier". Operations such as the convolutional and pooling layers map the raw data to a feature space, and the fully-connected layer maps the learned "distributed feature representation" to the sample label space. The result of the fully-connected layer is put through a softmax function for probability distribution calculation to obtain an effective image classification, which realizes the final image recognition.
Fig. 9 specifically illustrates an internal hardware structure of the fully-connected arithmetic IP core.
The fully-connected operation IP core is responsible for the operations on the weight parameters and the input node data, and outputs the operation result, the value of an output node, through the Outnode port. The fully-connected operation IP core comprises a counter, an accumulator, a multiplier, and an adder. Each input node has a one-to-one weight parameter with respect to each output node, and each output node has a one-to-one bias parameter. With reference to fig. 8(b), the value of the first output node is calculated by the one-to-one multiply-add of the n input nodes with the first n of the nxm weight parameters, plus the bias; similarly, the value of the second output node is calculated from the n input nodes and the second n of the nxm weight parameters, plus the bias.
Referring to fig. 9, n data of the input nodes need to be read circularly m times, and the weight parameter data are read sequentially, so that the data of each input node and the weight parameter can be ensured to be in one-to-one correspondence. The first clock period full-connection operation IP core inputs a first pair of input node data x through f data and w _ data ports respectively1And weight parameter w1,x1And w1Firstly, performing multiplication calculation and inputting a result into an accumulator for temporary storage; second clock cycle input x2And w2Performing multiplication calculation and adding x in accumulator1×w1And accumulating the result, and temporarily storing the accumulated result in an accumulator. By analogy, after n clock cycles, the accumulator completes n times of multiplication accumulation operation, the counter controls the gate switch to open and output the accumulation result, and then the accumulation result is added with the Bias parameter (Bias), and the final result is output through the egress port. The result is calculated from the following formula:
y1=x1×w1+x2×w2+...+xn×wn+bias1
n clock cycles may result in the first of the output nodes. Similarly, the second output node needs to read in the input node data and the weight parameter again, and the result of the second output node is calculated through n clock cycles. Since the number of output nodes is m, it takes n × m clock cycles to obtain the results of all output nodes.
The fully-connected operation IP core is responsible for taking in the input node data and the corresponding weight parameters and computing; among the final m output node data, only the magnitudes need to be compared, which finally completes the feedforward process of the fully-connected layer.
Referring to fig. 10, a hardware structure of the fully-connected layer controller is specifically described, and details of a sequential control logic of the fully-connected layer controller are described in conjunction with fig. 8:
the full-connection layer controller is responsible for the operations of data, weight parameters, bias parameters, memory access and the like of the input nodes. The hardware structure of the fully-connected layer controller is shown in fig. 10. The input characteristic diagram address reading device and the weight parameter address reading device output an address signal in each clock cycle, respectively read one data from the characteristic diagram storage module and the parameter storage module and simultaneously enter the full-connection operation IP core. The offset parameter reading addresser needs to be controlled by a counter, and after all input node parameters and weight parameters of the first output node are read, the offset parameter corresponding to the first output node is read.
With reference to fig. 8, it is assumed that the number of input nodes of the full connection layer is n, the number of output nodes is m, the cycle period of the input feature map address reading device is n clock cycles, the cycle period of the weight parameter address reading device is n × m clock cycles, and the offset parameter address reading device reads one data for every n clock cycles. The time required for the full link layer to complete the computation is n × m clock cycles.
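A minimal Verilog sketch of these three address generators follows; the port names and widths are assumptions rather than the patent's interface:

```verilog
// Sketch of the three address generators of the fully-connected
// layer controller: the input feature map address cycles every N,
// the weight address runs sequentially to N*M, and the bias address
// advances once per N cycles.
module fc_ctrl #(
    parameter N = 256,  // input nodes
    parameter M = 10    // output nodes
)(
    input  wire                  clk,
    input  wire                  rst,
    output reg [$clog2(N)-1:0]   f_addr,  // input node address
    output reg [$clog2(N*M)-1:0] w_addr,  // weight parameter address
    output reg [$clog2(M)-1:0]   b_addr   // bias parameter address
);
    always @(posedge clk) begin
        if (rst) begin
            f_addr <= 0; w_addr <= 0; b_addr <= 0;
        end else begin
            w_addr <= w_addr + 1;          // weights read sequentially
            if (f_addr == N-1) begin
                f_addr <= 0;               // re-read the n input nodes
                b_addr <= b_addr + 1;      // next output node's bias
            end else begin
                f_addr <= f_addr + 1;
            end
        end
    end
endmodule
```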
Referring to fig. 11, the instantiated fully-connected operation IP cores and fully-connected layer controller construct CNN fully-connected layers of different sizes. Fig. 11(a) is a schematic diagram of a single-layer fully-connected neural network, and fig. 11(b) is the hardware structure diagram of that layer. Here x1, x2, ..., xn denote the n input nodes of the fully-connected layer, y1 denotes one of the output nodes, and w1, w2, ..., wn denote the weight parameters used to compute the y1 output node.
In the CNN network structure, the layer above the fully-connected layer is usually a pooling or convolutional layer, so the two- or three-dimensional output feature map must be converted into a one-dimensional data stream for storage. The fully-connected layer controller cyclically reads the n input feature map data from the feature map storage module into the fully-connected operation IP core through the f_data port. Meanwhile, the nxm weight parameters and m bias parameters of the fully-connected layer are reshaped into one-dimensional matrices of shape (nxm, 1) and (m, 1); the fully-connected layer controller reads the weight and bias parameter data sequentially from the parameter memory into the fully-connected operation IP core through the w_data and Bias ports, respectively.
A fully-connected neural network needs a softmax layer for probability distribution calculation during training, but in the CNN feedforward process different pictures can be classified simply by comparing the magnitudes of the results, so no softmax layer is needed in the hardware structure.
Through the steps, a convolutional neural network hardware structure of any scale can be constructed by using the IP core of the convolutional neural network based on the FPGA to accelerate the feedforward process of the convolutional neural network, but a peripheral module is still required to support the convolutional neural network accelerator. The peripheral module comprises a feature map storage module and a parameter storage module.
The feature map storage module caches the global feature maps of all dimensions; its internal structure consists of shift buffers built from register groups with configurable depth. A single shift buffer caches one unit-depth two-dimensional feature map of the three-dimensional feature map; the buffer depth is the number of feature map rows multiplied by the number of columns, and the number of buffers is the depth of the three-dimensional feature map. The input port of the feature map storage module is connected point-to-point with the address ports of each layer's controller, and the output port is connected with the input ports of each neural network layer's IP cores.
The parameter storage module is composed of a read-only memory, weight parameters and bias parameters of each neural network layer trained by a PC end are stored in the parameter storage module as initial files of the read-only memory, an input port of the parameter storage module is connected with an address port corresponding to each layer of controller to access data, and an output port of the parameter storage module is connected with a corresponding data input port of each layer of IP core to read data.
The constructed convolutional neural network hardware accelerator requires an external Micro Control Unit (MCU) to regulate and control the convolution process. The MCU control system is described in detail with reference to fig. 1:
TABLE 2 MCU control states

| State | Description |
| --- | --- |
| IDLE | Idle state, also the initial state of the state machine in the MCU |
| Start | Start state; zeroes the registers in each convolutional layer and the feature map storage module |
| Conv1 | The first convolutional layer's enable signal is pulled high to perform the convolution operation |
| Pool1 | The first convolutional layer's enable signal is pulled low and the enable signal of the pooling layer following it is pulled high to perform the pooling operation |
| FCN | The enable signals of the convolutional and pooling layers in the neural network are pulled low and the fully-connected layer's enable signal is pulled high to perform the fully-connected operation |
Table 3 shows the MCU state jump conditions and assembly instruction protocol

| Present state | Next state | Jump condition |
| --- | --- | --- |
| IDLE | Start | The external switch of the FPGA is pulled high, indicating that an image begins to enter the convolutional neural network accelerator |
| Start | Conv1 | On-chip storage completes binarization and storage of the image; the assembly instruction is [00...0001], whose bit width is the number of neural network layers |
| Conv1 | Pool1 | The convolution operation completes after ICxIR clock cycles; the assembly instruction is [00...0010] |
| Pool1 | Conv2 | The pooling operation completes; the assembly instruction is [00...0100] |
| Conv2 | Pool2 | The convolution operation completes after ICxIR clock cycles; the assembly instruction is [00...1000] |
| ... | ... | ... |
| Poolx | FCN | After the last pooling layer finishes, the fully-connected layer operation proceeds; the assembly instruction is [10...0000] |
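The state table above suggests a simple one-hot sequencer. The following Verilog sketch is an assumption-based illustration of such an MCU state machine, not the patent's actual controller; the Start state's register clearing is omitted for brevity:

```verilog
// Sketch of the MCU layer-sequencing state machine: a one-hot enable
// word (the "assembly instruction") with one bit per network layer,
// shifted left as each layer reports completion.
module cnn_mcu #(
    parameter LAYERS = 5
)(
    input  wire              clk,
    input  wire              rst,
    input  wire              switcher,    // external start switch
    input  wire [LAYERS-1:0] layer_done,  // per-layer completion flags
    output reg  [LAYERS-1:0] layer_en     // one-hot layer enables
);
    localparam IDLE = 1'b0, RUN = 1'b1;
    reg state;

    always @(posedge clk) begin
        if (rst) begin
            state    <= IDLE;
            layer_en <= {LAYERS{1'b0}};
        end else case (state)
            IDLE: if (switcher) begin
                state    <= RUN;
                layer_en <= {{LAYERS-1{1'b0}}, 1'b1};  // [00..0001] = Conv1
            end
            RUN: if (|(layer_en & layer_done)) begin
                if (layer_en[LAYERS-1]) begin          // FCN finished
                    state    <= IDLE;
                    layer_en <= {LAYERS{1'b0}};
                end else begin
                    layer_en <= layer_en << 1;         // enable next layer
                end
            end
        endcase
    end
endmodule
```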
The present invention takes the MNIST handwritten digit set as the verification data set; the CNN model structure is given in Table 4. Prototype verification of delay time, performance, and efficiency was carried out on an Intel Xeon E3-1230 V2 CPU, an NVIDIA Quadro 4000 GPU, and a De2i-150 FPGA, respectively.
TABLE 4 CNN model

| Neural network layer | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| ID | 1 | 3 | 3 | 6 | - |
| n | 3 | 1 | 6 | 1 | - |
| IC=IR | 28 | 24 | 12 | 8 | - |
| CC=CR | 5 | 2 | 5 | 2 | - |
| Step size | 1 | 2 | 1 | 2 | - |
| Layer type | Convolution | Max pooling | Convolution | Max pooling | Fully connected |
TABLE 5 Convolutional neural network feedforward delay comparison across hardware platforms

| Neural network layer | This design | GPU | CPU |
| --- | --- | --- | --- |
| First layer (proportion) | 7.84x10^-6 s (31.0%) | 8.52x10^-3 s (31.4%) | 1.13x10^-2 s (30.4%) |
| Second layer (proportion) | 5.76x10^-6 s (22.8%) | 6.63x10^-3 s (24.4%) | 1.25x10^-2 s (33.7%) |
| Third layer (proportion) | 1.44x10^-6 s (5.7%) | 9.00x10^-3 s (33.2%) | 0.74x10^-2 s (19.9%) |
| Fourth layer (proportion) | 0.64x10^-6 s (2.5%) | 2.58x10^-3 s (9.5%) | 0.56x10^-2 s (15.1%) |
| Fifth layer (proportion) | 9.60x10^-6 s (38.0%) | 0.40x10^-3 s (1.5%) | 0.03x10^-2 s (0.9%) |
| Total time | 25.28x10^-6 s (100%) | 27.13x10^-3 s (100%) | 3.71x10^-2 s (100%) |
| Ratio | 1.00x | 1073.18x | 1468.35x |
TABLE 6 Performance of the accelerator composed of FPGA-based convolutional neural network IP cores

TABLE 7 Power consumption and efficiency of the accelerator composed of FPGA-based convolutional neural network IP cores
Referring to fig. 12, the power of a hardware accelerator built from the FPGA-based convolutional neural network IP cores was measured on the FPGA, using a TECMAN model TM9 miniature power meter. The FPGA consumes 1.69 W when performing image recognition on a single picture, corresponding to the data in Table 7.

Claims (7)

1. The convolutional neural network IP core based on the FPGA is characterized in that the specific IP core and the composition module comprise a convolutional operation IP core, a pooling operation IP core, a full-connection operation IP core, a bubbling method convolutional layer, a bubbling method pooling layer, a full-connection layer, a feature map storage module and a parameter storage module; the neural network layer formed by each IP core is internally interconnected with the parameter storage module and the characteristic map storage module, and a hardware structure formed by the neural network layer, the parameter storage module and the characteristic map storage module is consistent with a required convolutional neural network algorithm structure;
the bubbling method pooling layer comprises a bubbling method controller and a pooling operation IP core:
1) the four clock sequences of the bubble method controller are respectively a read input characteristic diagram sequence, a column effective sequence, a row effective sequence and a pooling process sequence; the waiting intervals of the four clock sequences are the time when the input feature map data enter the input feature map buffer of the pooling operation IP core, and the interval pooling result is invalid; each effective pooling result of the column effective sequence is separated by an ineffective pooling result, which indicates that the step length of the pooling filter column is 2; the result of each effective line of the line effective sequence is separated by the result of one ineffective line, and the step length of the pooling filtering line is 2; assuming that the pooling process is liquid and the ineffective pooling results of the column active sequences and the row active sequences are gas, the pooling process is equivalent to discharging excessive gas from the liquid, so the method is called bubbling;
2) stacking pooling operation IP cores with the same number as the depth of the input feature map to form a single pooling core of a pooling layer, and forming a bubbling method pooling layer together with a bubbling method controller;
the bubbling method rolling layer can be obtained by the same method as the bubbling method pooling layer.
2. The FPGA-based convolutional neural network IP core of claim 1, wherein the convolutional operation IP core comprises an input feature map buffer, a weight parameter buffer, a multiplier, an adder, and an activation function module:
1) the convolution operation IP core reads the feature points in the feature map one by one according to rows in each clock cycle to form a data stream; the input characteristic diagram buffer is a register group with configurable depth, is used for shifting and buffering the data stream of the input characteristic diagram, modifies the depth of the register group according to the number of rows and columns of the input characteristic diagram, supports convolutional neural networks with different scales, and the input characteristic diagram data of a fixed address interval is connected with a multiplier;
2) the weight parameter buffer is a register group with configurable depth and is used for shifting and buffering weight parameters, and the weight parameter buffer is fixed after the register group is filled with the weight parameters; the hardware structure of the weight parameter buffer is the same as that of the input characteristic diagram buffer, namely, the weight parameter data of the same fixed address interval is connected with the multiplier;
3) the multiplier and the adder connected with the input feature map buffer and the weight parameter buffer fixed address interval form a multiplication-addition pair, and the feature points in the feature map and the corresponding weight parameters form complete convolution operation;
4) the convolution operation IP core adopts a common ReLU activation function, and the activation function module is a MUX multiplexer, which is equivalent to the formula f(x) = max(0, x).
3. The FPGA-based convolutional neural network IP core of claim 1, wherein the pooling operation IP core comprises an input profile buffer and a comparator:
1) the input characteristic diagram buffer is a register group with configurable depth, is used for shifting and buffering the input characteristic diagram data stream, can modify the depth of the register group according to the number of rows and columns of the input characteristic diagram to support convolutional neural networks with different scales, and the input characteristic diagram data of a fixed address interval is connected with the comparator;
2) the input port of the comparator is connected with 4 fixed address intervals of the input feature map buffer, and the output port is the maximum value of the 4 data.
4. The FPGA-based convolutional neural network IP core of claim 1, wherein the fully-connected arithmetic IP core comprises a counter, an accumulator, a multiplier and an adder:
1) the multiplier inputs the input node data and the weight parameters which correspond one to one respectively to complete the multiplication operation of the input node and the weight parameters;
2) the accumulator is composed of a register and an adder, and the operation result of the multiplier is input into the register of the accumulator and is accumulated with the multiplication result of the next clock cycle to form multiply-accumulate operation;
3) the counter controls the iteration period of the multiply-accumulate operation, and after the iteration period is completed, the counter controls the addition of the multiply-accumulate result and the offset parameter corresponding to the output node.
5. The FPGA-based convolutional neural network IP core of claim 1, wherein the bubbling convolutional layer comprises a bubbling controller and convolution operation IP cores:
1) the bubbling controller follows three clock sequences: a read-input-feature-map sequence, a read-weight-parameter sequence and a convolution-process sequence; the waiting interval of the read-input-feature-map sequence and the read-weight-parameter sequence is the time during which the input feature map data and the weight parameter data enter, respectively, the input feature map buffer and the weight parameter buffer of the convolution operation IP core, and the convolution results in this interval are invalid; in the holding interval of the read-weight-parameter sequence, the weight parameter buffer is filled with weight parameter data and held for a fixed time while the input feature map data stream passes through the input feature map buffer; this holding interval is the actual convolution process, and its convolution results are partially valid; the waiting interval of the convolution-process sequence is the same as that of the read-input-feature-map and read-weight-parameter sequences; a valid interval of the convolution process indicates that the output feature map data are valid, i.e. valid results produced by a convolution window that does not cross rows; an invalid interval indicates that the output feature map data are invalid, i.e. invalid results produced by a convolution window that crosses rows; if the convolution process is regarded as a liquid and the invalid intervals as gas, keeping only the valid results is equivalent to expelling excess gas from the liquid, hence the name "bubbling";
2) convolution operation IP cores equal in number to the depth of the input feature map are stacked to form a single convolution kernel of the convolutional layer; the convolution kernels required by the convolutional layer are instantiated and, together with the bubbling controller, constitute the bubbling convolutional layer.
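The essence of the "bubbling" scheme of claim 5 is a flag that marks cross-row windows as invalid so they can be squeezed out of the result stream. A minimal sketch of such a valid-flag generator follows; the column counter, the parameter values and the port names are our assumptions, and pipeline start-up (the first kernel-height rows) is not handled.

```verilog
// Hypothetical sketch of the valid-flag ("bubbling") logic of claim 5.
// Streaming a row-major feature map through a shift-register window yields
// K-1 junk results wherever the window straddles two rows; a column counter
// flags them invalid so downstream logic can squeeze these bubbles out.
module bubble_valid #(
    parameter ROW_W = 28,  // feature-map row length (assumption)
    parameter K     = 3    // convolution kernel width (assumption)
) (
    input  wire clk,
    input  wire rst,
    input  wire conv_en,      // a new convolution result appears this cycle
    output wire result_valid  // high only for windows inside one row
);
    reg [$clog2(ROW_W)-1:0] col;  // column of the window's rightmost pixel

    always @(posedge clk) begin
        if (rst)
            col <= 0;
        else if (conv_en)
            col <= (col == ROW_W-1) ? 0 : col + 1;
    end

    // A window whose left edge falls in the previous row is a "bubble".
    assign result_valid = conv_en && (col >= K-1);
endmodule
```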
6. The FPGA-based convolutional neural network IP core of claim 1, wherein the fully-connected layer comprises a fully-connected layer controller and a fully-connected operation IP core:
1) the fully-connected layer controller comprises a weight parameter read addresser, a bias parameter read addresser, a counter and an input feature map read addresser; the input feature map read addresser and the weight parameter read addresser output an address signal in each clock cycle, so that the one-to-one corresponding input node data and weight parameter data are read and fed into the fully-connected operation IP core; the bias parameter read addresser is controlled by the counter to read the bias parameter corresponding to an output node once the multiply-accumulate for that output node is completed;
2) the control ports of the fully-connected layer controller are connected point-to-point with the corresponding ports of the feature map storage module and the parameter storage module, and together with the fully-connected operation IP core they form the fully-connected layer.
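Claim 6's controller is essentially three address generators tied to one counter. The sketch below illustrates one plausible arrangement; N_IN, N_OUT and every port name are assumptions rather than the patent's actual interface.

```verilog
// Hypothetical sketch of the fully-connected layer controller of claim 6:
// node and weight read addressers step together every clock cycle, and a
// counter-driven bias addresser advances once per finished output node.
module fc_controller #(
    parameter N_IN  = 1024,  // input nodes per output node (assumption)
    parameter N_OUT = 10     // number of output nodes (assumption)
) (
    input  wire                          clk,
    input  wire                          rst,
    input  wire                          run,
    output reg  [$clog2(N_IN)-1:0]       node_addr,    // feature-map address
    output reg  [$clog2(N_IN*N_OUT)-1:0] weight_addr,  // weight address
    output reg  [$clog2(N_OUT)-1:0]      bias_addr,    // bias address
    output wire                          last_in       // ends one iteration
);
    assign last_in = (node_addr == N_IN-1);

    always @(posedge clk) begin
        if (rst) begin
            node_addr <= 0; weight_addr <= 0; bias_addr <= 0;
        end else if (run) begin
            weight_addr <= weight_addr + 1;   // one weight per cycle
            if (last_in) begin
                node_addr <= 0;               // re-sweep the input nodes
                bias_addr <= bias_addr + 1;   // bias of the next output node
            end else begin
                node_addr <= node_addr + 1;
            end
        end
    end
endmodule
```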
7. The FPGA-based convolutional neural network IP core of claim 1, wherein the feature map storage module comprises one global feature map buffer for each dimension, internally structured as a shift buffer; the buffer depth is the total number of feature points of the feature map, i.e. rows x columns, and the number of buffers equals the depth of the three-dimensional feature map; the input ports of the feature map storage module are connected point-to-point to the address ports of the controller of each layer, and the output ports are connected to the input ports of the IP core of each neural network layer.
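Claim 7's per-channel global buffer can likewise be read as one long shift register per channel. The following sketch (hypothetical interface; DEPTH is rows x columns of one channel) shows the idea; a real implementation would more likely map this storage to block RAM than to discrete registers.

```verilog
// Hypothetical sketch of one per-channel global feature-map buffer of
// claim 7: a shift buffer whose depth is rows x columns of one channel.
// One instance is made per channel of the 3-D feature map; the oldest
// entry streams out to the next layer's IP core while new data shifts in.
module fmap_shift_buffer #(
    parameter DATA_W = 16,     // fixed-point word width (assumption)
    parameter DEPTH  = 28*28   // rows x columns of one channel (assumption)
) (
    input  wire              clk,
    input  wire              shift_en,
    input  wire [DATA_W-1:0] din,
    output wire [DATA_W-1:0] dout
);
    reg [DATA_W-1:0] mem [0:DEPTH-1];
    integer i;
    always @(posedge clk) begin
        if (shift_en) begin
            for (i = DEPTH-1; i > 0; i = i - 1)
                mem[i] <= mem[i-1];
            mem[0] <= din;
        end
    end
    assign dout = mem[DEPTH-1];
endmodule
```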
CN201910038533.1A 2019-01-16 2019-01-16 Convolutional neural network IP core based on FPGA Active CN109784489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910038533.1A CN109784489B (en) 2019-01-16 2019-01-16 Convolutional neural network IP core based on FPGA

Publications (2)

Publication Number Publication Date
CN109784489A (en) 2019-05-21
CN109784489B (en) 2021-07-30

Family

ID=66500538

Families Citing this family (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263925B (en) * 2019-06-04 2022-03-15 电子科技大学 Hardware acceleration implementation device for convolutional neural network forward prediction based on FPGA
CN110543939B (en) * 2019-06-12 2022-05-03 电子科技大学 Hardware acceleration realization device for convolutional neural network backward training based on FPGA
TWI719512B (en) * 2019-06-24 2021-02-21 瑞昱半導體股份有限公司 Method and system for algorithm using pixel-channel shuffle convolution neural network
CN110399591B (en) * 2019-06-28 2021-08-31 苏州浪潮智能科技有限公司 Data processing method and device based on convolutional neural network
CN110489077B (en) * 2019-07-23 2021-12-31 瑞芯微电子股份有限公司 Floating point multiplication circuit and method of neural network accelerator
CN110390392B (en) * 2019-08-01 2021-02-19 上海安路信息科技有限公司 Convolution parameter accelerating device based on FPGA and data reading and writing method
CN110472442A (en) * 2019-08-20 2019-11-19 厦门理工学院 A kind of automatic detection hardware Trojan horse IP kernel
CN112506087A (en) * 2019-09-16 2021-03-16 阿里巴巴集团控股有限公司 FPGA acceleration system and method, electronic device, and computer-readable storage medium
CN110717588B (en) * 2019-10-15 2022-05-03 阿波罗智能技术(北京)有限公司 Apparatus and method for convolution operation
CN110782022A (en) * 2019-10-31 2020-02-11 福州大学 Method for implementing small neural network for programmable logic device mobile terminal
CN110780923B (en) * 2019-10-31 2021-09-14 合肥工业大学 Hardware accelerator applied to binary convolution neural network and data processing method thereof
CN110929860B (en) * 2019-11-07 2020-10-23 深圳云天励飞技术有限公司 Convolution acceleration operation method and device, storage medium and terminal equipment
CN110991632B (en) * 2019-11-29 2023-05-23 电子科技大学 Heterogeneous neural network calculation accelerator design method based on FPGA
US20210174198A1 (en) * 2019-12-10 2021-06-10 GM Global Technology Operations LLC Compound neural network architecture for stress distribution prediction
CN112966807B (en) * 2019-12-13 2022-09-16 上海大学 Convolutional neural network implementation method based on storage resource limited FPGA
CN111210019B (en) * 2020-01-16 2022-06-24 电子科技大学 Neural network inference method based on software and hardware cooperative acceleration
CN111242289B (en) * 2020-01-19 2023-04-07 清华大学 Convolutional neural network acceleration system and method with expandable scale
CN111242295B (en) * 2020-01-20 2022-11-25 清华大学 Method and circuit capable of configuring pooling operator
EP4094194A1 (en) 2020-01-23 2022-11-30 Umnai Limited An explainable neural net architecture for multidimensional data
CN111339027B (en) * 2020-02-25 2023-11-28 中国科学院苏州纳米技术与纳米仿生研究所 Automatic design method of reconfigurable artificial intelligent core and heterogeneous multi-core chip
CN111325327B (en) * 2020-03-06 2022-03-08 四川九洲电器集团有限责任公司 Universal convolution neural network operation architecture based on embedded platform and use method
CN111427838B (en) * 2020-03-30 2022-06-21 电子科技大学 Classification system and method for dynamically updating convolutional neural network based on ZYNQ
CN111563582A (en) * 2020-05-06 2020-08-21 哈尔滨理工大学 Method for realizing and optimizing accelerated convolution neural network on FPGA (field programmable Gate array)
CN111582451B (en) * 2020-05-08 2022-09-06 中国科学技术大学 Image recognition interlayer parallel pipeline type binary convolution neural network array architecture
CN111860781A (en) * 2020-07-10 2020-10-30 逢亿科技(上海)有限公司 Convolutional neural network feature decoding system realized based on FPGA
CN111914999B (en) * 2020-07-30 2024-04-19 云知声智能科技股份有限公司 Method and equipment for reducing calculation bandwidth of neural network accelerator
CN111797982A (en) * 2020-07-31 2020-10-20 北京润科通用技术有限公司 Image processing system based on convolution neural network
CN112100118B (en) * 2020-08-05 2021-09-10 中科驭数(北京)科技有限公司 Neural network computing method, device and storage medium
CN111931925B (en) * 2020-08-10 2024-02-09 西安电子科技大学 Acceleration system of binary neural network based on FPGA
CN112346704B (en) * 2020-11-23 2021-09-17 华中科技大学 Full-streamline type multiply-add unit array circuit for convolutional neural network
CN112435270B (en) * 2020-12-31 2024-02-09 杭州电子科技大学 Portable burn depth identification equipment and design method thereof
CN112732638B (en) * 2021-01-22 2022-05-06 上海交通大学 Heterogeneous acceleration system and method based on CTPN network
CN112926733B (en) * 2021-03-10 2022-09-16 之江实验室 Special chip for voice keyword detection
CN113301221B (en) * 2021-03-19 2022-09-09 西安电子科技大学 Image processing method of depth network camera and terminal
CN112905213B (en) * 2021-03-26 2023-08-08 中国重汽集团济南动力有限公司 Method and system for realizing ECU (electronic control Unit) refreshing parameter optimization based on convolutional neural network
CN113065647B (en) * 2021-03-30 2023-04-25 西安电子科技大学 Calculation-storage communication system and communication method for accelerating neural network
CN115145839A (en) * 2021-03-31 2022-10-04 广东高云半导体科技股份有限公司 Deep convolution accelerator and method for accelerating deep convolution by using same
CN113344179B (en) * 2021-05-31 2022-06-14 哈尔滨理工大学 IP core of binary convolution neural network algorithm based on FPGA
CN113298237A (en) * 2021-06-23 2021-08-24 东南大学 Convolutional neural network on-chip training accelerator based on FPGA
CN113762491B (en) * 2021-08-10 2023-06-30 南京工业大学 Convolutional neural network accelerator based on FPGA
CN113778940B (en) * 2021-09-06 2023-03-07 电子科技大学 High-precision reconfigurable phase adjustment IP core based on FPGA
CN113762480B (en) * 2021-09-10 2024-03-19 华中科技大学 Time sequence processing accelerator based on one-dimensional convolutional neural network
WO2023131252A1 (en) * 2022-01-06 2023-07-13 深圳鲲云信息科技有限公司 Data flow architecture-based image size adjustment structure, adjustment method, and image resizing method and apparatus
CN114781629B (en) * 2022-04-06 2024-03-05 合肥工业大学 Hardware accelerator of convolutional neural network based on parallel multiplexing and parallel multiplexing method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A kind of programmable convolutional neural networks Crypto Coprocessor IP Core
CN109086867A (en) * 2018-07-02 2018-12-25 武汉魅瞳科技有限公司 A kind of convolutional neural networks acceleration system based on FPGA

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106462800A (en) * 2014-04-11 2017-02-22 谷歌公司 Parallelizing the training of convolutional neural networks
CN107392218B (en) * 2017-04-11 2020-08-04 创新先进技术有限公司 Vehicle loss assessment method and device based on image and electronic equipment
CN108229670B (en) * 2018-01-05 2021-10-08 中国科学技术大学苏州研究院 Deep neural network acceleration platform based on FPGA
CN108665059A (en) * 2018-05-22 2018-10-16 中国科学技术大学苏州研究院 Convolutional neural networks acceleration system based on field programmable gate array
CN109032781A (en) * 2018-07-13 2018-12-18 重庆邮电大学 A kind of FPGA parallel system of convolutional neural networks algorithm

Also Published As

Publication number Publication date
CN109784489A (en) 2019-05-21

Similar Documents

Publication Publication Date Title
CN109784489B (en) Convolutional neural network IP core based on FPGA
CN107578098B (en) Neural network processor based on systolic array
US20190087713A1 (en) Compression of sparse deep convolutional network weights
US20170236053A1 (en) Configurable and Programmable Multi-Core Architecture with a Specialized Instruction Set for Embedded Application Based on Neural Networks
CN107239824A (en) Apparatus and method for realizing sparse convolution neutral net accelerator
CN111210019B (en) Neural network inference method based on software and hardware cooperative acceleration
CN113051216B (en) MobileNet-SSD target detection device and method based on FPGA acceleration
CN110991631A (en) Neural network acceleration system based on FPGA
CN110766127B (en) Neural network computing special circuit and related computing platform and implementation method thereof
Li et al. A multistage dataflow implementation of a deep convolutional neural network based on FPGA for high-speed object recognition
CN110543939A (en) hardware acceleration implementation framework for convolutional neural network backward training based on FPGA
Hoffmann et al. A survey on CNN and RNN implementations
Liu et al. FPGA-NHAP: A general FPGA-based neuromorphic hardware acceleration platform with high speed and low power
Xiao et al. FPGA implementation of CNN for handwritten digit recognition
Chen et al. A 67.5 μJ/prediction accelerator for spiking neural networks in image segmentation
Lien et al. Sparse compressed spiking neural network accelerator for object detection
Chen et al. Cerebron: A reconfigurable architecture for spatiotemporal sparse spiking neural networks
Duan et al. Energy-efficient architecture for FPGA-based deep convolutional neural networks with binary weights
Mukhopadhyay et al. Systematic realization of a fully connected deep and convolutional neural network architecture on a field programmable gate array
CN110738317A (en) FPGA-based deformable convolution network operation method, device and system
CN114462587A (en) FPGA implementation method for photoelectric hybrid computing neural network
Clere et al. FPGA based reconfigurable coprocessor for deep convolutional neural network training
Bai et al. An OpenCL-based FPGA accelerator with the Winograd’s minimal filtering algorithm for convolution neuron networks
Wen FPGA-Based Deep Convolutional Neural Network Optimization Method
Chang et al. A Real-Time 1280×720 Object Detection Chip With 585 MB/s Memory Traffic

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant