CN108416422B - FPGA-based convolutional neural network implementation method and device - Google Patents


Info

Publication number
CN108416422B
CN108416422B
Authority
CN
China
Prior art keywords: data, input, module, neural network, processing
Prior art date
Legal status
Active
Application number
CN201810074941.8A
Other languages
Chinese (zh)
Other versions
CN108416422A (en)
Inventor
罗聪
万文涛
梁洁
Current Assignee
Nationz Technologies Inc
Original Assignee
Nationz Technologies Inc
Priority date
Filing date
Publication date
Application filed by Nationz Technologies Inc filed Critical Nationz Technologies Inc
Publication of CN108416422A publication Critical patent/CN108416422A/en
Application granted granted Critical
Publication of CN108416422B publication Critical patent/CN108416422B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an FPGA-based convolutional neural network implementation method and device. The method comprises: initializing the editable resources of the FPGA to generate the functional modules required to implement the model; loading the weight data of each processing level of the convolutional neural network model into the memory of the FPGA and associating state registers of the FPGA with the processing levels; storing the data to be processed into the memory through the memory controller of the FPGA; and finally reading the parameters of the state registers, determining the processing level to be run, and having that level complete its processing of the data, until all processing levels of the convolutional neural network model to be implemented have run in sequence, whereupon the processing result corresponding to the data to be processed is output. Throughout this process the convolutional neural network is implemented by the FPGA hardware and no longer depends on software, which solves the problem that conventional convolutional neural network techniques rely on software implementations.

Description

FPGA-based convolutional neural network implementation method and device
Technical Field
The invention relates to the field of Field Programmable Gate Arrays (FPGAs), and in particular to an FPGA-based convolutional neural network implementation method and device.
Background
With the explosive growth of artificial intelligence, deep learning has become an effective means of extracting valuable information from massive amounts of data, and convolutional neural networks have attracted attention because their weights can be reused. Most convolutional neural networks are implemented in software: the data volume is large, the demands on hardware computing power are high, implementations depend on powerful cloud computing capacity, and power consumption is considerable.
Disclosure of Invention
The invention provides an FPGA (Field Programmable Gate Array)-based convolutional neural network implementation method and device, which solve the problem that conventional convolutional neural network techniques depend on software implementations.
To solve this technical problem, the invention adopts the following technical solution:
An FPGA-based convolutional neural network implementation method comprises the following steps:
initializing the editable resources of the FPGA to generate an input buffer module, an output buffer module, an input control module, an output control module, a neural network processing unit, a data reading module and an operation control module;
loading the weight data of each processing level of the convolutional neural network model to be implemented into the memory of the FPGA, and associating the state registers of the FPGA with the processing levels;
storing the data to be processed into the memory through the memory controller of the FPGA;
the operation control module reads the parameters of the state register, determines the processing level to be run, and controls the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module so that the processing level to be run completes its processing of the data, until all processing levels of the convolutional neural network model to be implemented have run in sequence, and then outputs the processing result corresponding to the data to be processed.
Further, when the processing level to be run is a convolution computation level, the operation control module controlling the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module to complete that level's processing of the data comprises the following steps:
controlling the data reading module to read, through the memory controller, the weight data and input data corresponding to the convolution computation level stored in the memory, and to store them into the input buffer module;
controlling the input control module to input the weight data and input data stored in the input buffer module into the neural network processing unit;
controlling the neural network processing unit to compute on the input data using the weight data and to output a computation result;
controlling the output control module to store the computation result into the output buffer module;
and controlling the memory controller to read the computation result from the output buffer module and store it into the memory.
Further, when the processing level to be run is a pooling operation level, the operation control module controlling the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module to complete that level's processing of the data comprises the following steps:
controlling the data reading module to read, through the memory controller, the input data corresponding to the pooling operation level stored in the memory, and to store it into the input buffer module;
controlling the input control module to divide the input data stored in the input buffer module into a plurality of pooling windows and to input the data from the pooling windows into the neural network processing unit in sequence;
controlling the neural network processing unit to perform max-pooling comparison on the input data and to output a comparison result;
controlling the output control module to store the comparison result into the output buffer module;
and controlling the memory controller to read the comparison result from the output buffer module and store it into the memory.
Further, when the processing level to be run is a connection operation level, the operation control module controlling the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module to complete that level's processing of the data comprises the following steps:
determining the output data of the other processing layers that corresponds to the input data of the current processing layer;
configuring the storage addresses of the other processing layers' output data in the memory as the input addresses of the current processing layer's input data;
and controlling the data reading module to read, through the memory controller, the input data corresponding to the input addresses stored in the memory, and to store it into the input buffer module.
Further, when the processing level to be run is a reorganization operation level, the operation control module controlling the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module to complete that level's processing of the data comprises the following steps:
controlling the data reading module to read, through the memory controller, the input data corresponding to the reorganization operation level stored in the memory, and to store it into the input buffer module;
controlling the input control module to input the input data stored in the input buffer module into the neural network processing unit;
controlling the neural network processing unit to perform the reorganization operation on the input data and to output a reorganization result;
controlling the output control module to store the reorganization result into the output buffer module;
controlling the memory controller to read the reorganization result from the output buffer module and store it into the memory;
and establishing a mapping between the storage address of the input data in the memory and the storage address of the reorganization result in the memory.
Further, when the processing level to be run is a classification operation level, the operation control module controlling the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module to complete that level's processing of the data comprises the following steps:
controlling the data reading module to read, through the memory controller, the input data corresponding to the classification operation level stored in the memory, and to store it into the input buffer module;
controlling the input control module to input the input data stored in the input buffer module into the neural network processing unit as an input feature vector;
controlling the neural network processing unit to perform the classification computation on the input data and to output a detection result;
controlling the output control module to store the detection result into the output buffer module;
and controlling the memory controller to read the detection result from the output buffer module and output it.
Further, before the data to be processed is stored into the memory through the memory controller of the FPGA, the method further comprises:
judging whether the data to be processed meets the computational requirements of the convolutional neural network model to be implemented;
if not, performing normalization and/or bilinear interpolation on the data to be processed until the requirements are met;
and storing the processed data into the memory.
An FPGA-based convolutional neural network implementation apparatus, comprising:
the initialization module, configured to initialize the editable resources of the FPGA and generate an input buffer module, an output buffer module, an input control module, an output control module, a neural network processing unit, a data reading module and an operation control module; the operation control module is configured to read the parameters of the state registers, determine the processing level to be run, and control the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module so that the processing level to be run completes its processing of the data, until all processing levels of the convolutional neural network model to be implemented have run in sequence, and to output the processing result corresponding to the data to be processed;
the loading module, configured to load the weight data of each processing level of the convolutional neural network model into the memory of the FPGA, associate the state registers of the FPGA with the processing levels, and store the data to be processed into the memory through the memory controller of the FPGA.
Further, the neural network processing unit includes a plurality of processing units for processing data in parallel.
Further, the input buffer module comprises two input storage units, and the two input storage units are used for buffering input data and/or weight data of the neural network processing unit in a ping-pong double-buffering mode; and/or the output buffer module comprises two output storage units, and the two output storage units are used for buffering output data of the neural network processing unit in a ping-pong double-buffering mode.
Advantageous effects
The invention provides an FPGA-based convolutional neural network implementation method and device. The method comprises: initializing the editable resources of the FPGA to generate an input buffer module, an output buffer module, an input control module, an output control module, a neural network processing unit, a data reading module and an operation control module; loading the weight data of each processing level of the convolutional neural network model to be implemented into the memory of the FPGA and associating state registers of the FPGA with the processing levels; storing the data to be processed into the memory through the memory controller of the FPGA; and finally having the operation control module read the parameters of the state registers, determine the processing level to be run, and control the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module so that that level completes its processing of the data, until all processing levels of the convolutional neural network model to be implemented have run in sequence, whereupon the processing result corresponding to the data to be processed is output. Throughout this process the convolutional neural network is implemented by the FPGA hardware and no longer depends on software, which solves the problem that conventional convolutional neural network techniques rely on software implementations.
Drawings
Fig. 1 is a flowchart of a convolutional neural network implementation method according to a first embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a convolutional neural network implementation device according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a convolutional neural network implementation device according to a third embodiment of the present invention;
Fig. 4 is a schematic diagram of a convolutional neural network implementation method according to the third embodiment of the present invention;
Fig. 5 is a schematic diagram of a ping-pong double buffer according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a neural network processing unit according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of a max-pooling operation according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of logic control according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of data reorganization according to an embodiment of the present invention.
Detailed Description
The invention is applicable to all terminal devices equipped with FPGA chips, including PCs, mobile phones, tablets, self-service deposit machines and the like. The invention is described in further detail below with reference to the drawings by means of specific embodiments.
Embodiment 1:
Fig. 1 is a flowchart of the FPGA-based convolutional neural network implementation method according to the first embodiment of the present invention. Referring to fig. 1, the method of this embodiment comprises the following steps:
S101: and initializing editable resources of the FPGA to generate an input buffer module, an output buffer module, an input control module, an output control module, a neural network processing unit, a data reading module and an operation control module.
The editable resources of the FPGA can be configured into any functional module as required. When the device is initialized, the editable resources of the FPGA are configured into the functional modules necessary to implement the convolutional neural network model, and the convolutional neural network function is then realized on this basis to process data.
In the invention, the input buffer module and the output buffer module both buffer data using a ping-pong double-buffering mechanism, and the neural network processing unit comprises a plurality of PEs (Processing Elements) that process data in parallel; these are described in Embodiment 3.
In the present invention, the neural network processing units are time-multiplexed, i.e. they play different roles at different processing levels.
S102: and loading weight data of each processing level in the convolutional neural network model to be realized into a memory storage of the FPGA, and associating a state register of the FPGA with the processing level.
A convolutional neural network model generally comprises a plurality of processing layers. The convolutional neural network model of the third embodiment of the invention comprises twenty-two convolution layers, five max-pooling layers, two connection layers, one reorganization layer, one classification layer and one preprocessing layer, thirty-two processing layers in total, which together process input picture data in real time and output detection results.
In order to identify the processing levels, the invention associates state registers with the processing levels. The association may be implemented either by providing a plurality of state registers, each corresponding to one processing level, or by providing a single state register that is updated in real time as the computation proceeds.
S103: and storing the data to be processed into a memory through a memory controller of the FPGA.
The data to be processed refers to data, such as image data, that is to be processed by the convolutional neural network.
Because the data that different convolutional neural network models can process is limited, before the data to be processed is stored into the memory through the memory controller of the FPGA, it must be judged whether the data meets the computational requirements of the convolutional neural network model to be implemented; if not, normalization and/or bilinear interpolation are applied to the data until the requirements are met, and the processed data is then stored into the memory.
S104: the operation control module reads the parameters of the state register, determines the to-be-operated processing level, controls the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module to finish the processing of the to-be-operated processing level until the sequential operation of all the processing levels of the convolutional neural network model to be realized is finished, and outputs the processing result corresponding to the to-be-processed data.
The step mainly comprises the steps that an operation control module controls each functional module to process input data according to parameters of a state register, the convolutional neural network processing is completed, and specific implementation of the convolutional neural network processing is different for different kinds of processing layers.
When the processing level to be run is a convolution computation level, step S104 comprises:
controlling the data reading module to read, through the memory controller, the weight data and input data corresponding to the convolution computation level stored in the memory, and to store them into the input buffer module;
controlling the input control module to input the weight data and input data stored in the input buffer module into the neural network processing unit;
controlling the neural network processing unit to compute on the input data using the weight data and to output a computation result;
controlling the output control module to store the computation result into the output buffer module;
and controlling the memory controller to read the computation result from the output buffer module and store it into the memory.
When the processing level to be run is a pooling operation level, step S104 comprises:
controlling the data reading module to read, through the memory controller, the input data corresponding to the pooling operation level stored in the memory, and to store it into the input buffer module;
controlling the input control module to divide the input data stored in the input buffer module into a plurality of pooling windows and to input the data from the pooling windows into the neural network processing unit in sequence;
controlling the neural network processing unit to perform max-pooling comparison on the input data and to output a comparison result;
controlling the output control module to store the comparison result into the output buffer module;
and controlling the memory controller to read the comparison result from the output buffer module and store it into the memory.
When the processing level to be run is a connection operation level, step S104 comprises:
determining the output data of the other processing layers that corresponds to the input data of the current processing layer;
configuring the storage addresses of the other processing layers' output data in the memory as the input addresses of the current processing layer's input data;
and controlling the data reading module to read, through the memory controller, the input data corresponding to the input addresses stored in the memory, and to store it into the input buffer module.
When the processing level to be run is a reorganization operation level, step S104 comprises:
controlling the data reading module to read, through the memory controller, the input data corresponding to the reorganization operation level stored in the memory, and to store it into the input buffer module;
controlling the input control module to input the input data stored in the input buffer module into the neural network processing unit;
controlling the neural network processing unit to perform the reorganization operation on the input data and to output a reorganization result;
controlling the output control module to store the reorganization result into the output buffer module;
controlling the memory controller to read the reorganization result from the output buffer module and store it into the memory;
and establishing a mapping between the storage address of the input data in the memory and the storage address of the reorganization result in the memory.
When the processing level to be run is a classification operation level, step S104 comprises:
controlling the data reading module to read, through the memory controller, the input data corresponding to the classification operation level stored in the memory, and to store it into the input buffer module;
controlling the input control module to input the input data stored in the input buffer module into the neural network processing unit as an input feature vector;
controlling the neural network processing unit to perform the classification computation on the input data and to output a detection result;
controlling the output control module to store the detection result into the output buffer module;
and controlling the memory controller to read the detection result from the output buffer module and output it.
This embodiment provides an FPGA-based convolutional neural network implementation method. The method comprises: initializing the editable resources of the FPGA to generate an input buffer module, an output buffer module, an input control module, an output control module, a neural network processing unit, a data reading module and an operation control module; loading the weight data of each processing level of the convolutional neural network model to be implemented into the memory of the FPGA and associating state registers of the FPGA with the processing levels; storing the data to be processed into the memory through the memory controller of the FPGA; and finally having the operation control module read the parameters of the state registers, determine the processing level to be run, and control the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module so that that level completes its processing of the data, until all processing levels of the convolutional neural network model to be implemented have run in sequence, whereupon the processing result corresponding to the data to be processed is output. Throughout this process the convolutional neural network is implemented by the FPGA hardware and no longer depends on software, which solves the problem that conventional convolutional neural network techniques rely on software implementations.
Embodiment 2:
Fig. 2 is a schematic structural diagram of the convolutional neural network implementation device according to the second embodiment of the present invention. Referring to fig. 2, the convolutional neural network implementation device 2 of this embodiment comprises:
the initialization module 21, configured to initialize the editable resources of the FPGA and generate an input buffer module, an output buffer module, an input control module, an output control module, a neural network processing unit, a data reading module and an operation control module; the operation control module is configured to read the parameters of the state registers, determine the processing level to be run, and control the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module so that the processing level to be run completes its processing of the data, until all processing levels of the convolutional neural network model to be implemented have run in sequence, and to output the processing result corresponding to the data to be processed;
the loading module 22, configured to load the weight data of each processing level of the convolutional neural network model to be implemented into the memory of the FPGA, associate the state registers of the FPGA with the processing levels, and store the data to be processed into the memory through the memory controller of the FPGA.
Embodiment 3:
the present embodiment will be described taking an example of input data as a picture.
This embodiment implements a deep-learning convolutional neural network model in hardware; the platform is an SNPS-DX7 FPGA development board from Synopsys (Synopsys, Inc., USA). Specifically, the trained weight parameters of the convolutional neural network model are first loaded into the DDR (Double Data Rate synchronous dynamic random access memory) of the FPGA development board, and the input data is preprocessed in the preprocessing module and transferred into the DDR. The weight parameters and input data of the current network layer are then continuously fetched from the DDR by a DMA (Direct Memory Access) unit and delivered to the NPU (neural network processing unit) for parallel computation; the output data of each layer, which is the input of the next layer, is stored back into the DDR through the output buffer module. Finally, the data feature vectors for which all convolution operations have been completed are passed to the classification module to complete the feature classification computation.
Specifically, as shown in fig. 3, the apparatus provided in this embodiment comprises an input end A, an input end B, a preprocessing unit 301, a DDR controller 302 (i.e. the memory controller above), a DDR memory 303 (i.e. the memory above), a DMA unit 304 for reading and writing weight data, a buffer unit 305 for buffering weight data, a buffer unit 306 for buffering input data, a DMA unit 307 for reading and writing input data, an input control module 308, an NPU unit 309, an output control module 310, an output buffer module 311 for buffering output data, an operation control module 312 and a classification computation unit 313. The buffer unit 305 and the buffer unit 306 together form the input buffer module described above. The operation control module 312 comprises an instruction unit 3121, a decoder 3122 and a control logic unit 3123, where the instruction unit 3121 receives instruction data, the decoder 3122 decodes the instruction data, and the control logic unit 3123 outputs the corresponding control instructions according to the decoding result.
The DDR controller 302 controls the connection and data transfer between the DDR memory and the other modules, including storage control of the input data, read control when the DMA reads DDR data, storage control of the output data of the hardware computation, and read control of the final output feature-vector data.
The input buffer module and the output buffer module operate in ping-pong double-buffer mode. As shown in fig. 5, a buffer module comprises a first buffer unit 51, a second buffer unit 52 and a selection control unit 53. The selection control unit 53 selects which buffer unit input data is written to, controls which buffer unit outputs its buffered data, and outputs a flag signal that identifies the current input/output state of the two buffers. Specifically, the input buffer comprises a weight data buffer and an input data buffer. The weight data buffer uses ping-pong double buffering to cache the weight data, bias values and regularization parameters of the current layer: while the weight data of one buffer area participates in the computation, the DMA unit 304 loads data into the other buffer area, reducing the waiting time for data loading. Correspondingly, the input data buffer also uses ping-pong double buffering for the input data of the current layer: while one buffer area participates in the computation, the DMA unit 307 loads data into the other. The output data buffer likewise uses ping-pong double buffering: while the feature-map data computed by the NPU is being written into one buffer area, the other buffer area, already filled, writes its data into the DDR memory.
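To make the handshake concrete, the following minimal Python sketch models the two buffer areas and the flag signal of fig. 5; the class and method names are illustrative assumptions rather than identifiers used by the patent.

```python
class PingPongBuffer:
    """Minimal model of the ping-pong double buffer of fig. 5."""

    def __init__(self, depth):
        self.banks = [[None] * depth, [None] * depth]
        self.flag = 0  # flag signal: bank currently feeding the compute unit

    def load_bank(self):
        # Bank the DMA may fill while the other bank is being computed on.
        return self.banks[1 - self.flag]

    def compute_bank(self):
        # Bank currently read by the NPU.
        return self.banks[self.flag]

    def swap(self):
        # Issued when the NPU finishes a tile and the DMA load has completed.
        self.flag = 1 - self.flag
```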
As shown in fig. 6, the NPU unit 309 comprises P × P parallel processing elements PE (PE0 to PEn) that compute the multiply/add/subtract operations of the convolution process in parallel. The intermediate result of one channel's computation with the convolution kernel is stored in a temporary register; the result of the next channel's computation with the convolution kernel is added to the intermediate result and written back to the temporary register, and this repeats until all channels have been computed against the convolution kernel, after which the BN (Batch Normalization) operation is performed on the resulting data.
The batch-regularization (BN) operation follows the expression

y = γ · (x − μ) / √(σ² + ε) + β

where μ and σ² are the mean and variance of the input data, γ is a weight, β is a correction value, and ε is a constant that ensures numerical stability; the three parameters γ, β and ε are obtained through cloud training.
After the batch-regularized BN operation, the data passes through the activation function y = (x > 0) ? x : 0.1x; that is, y = x when x is greater than 0 and y = 0.1x otherwise, where x is the input of the NPU unit and y is its output.
Finally, the computed results, i.e. the new feature-map data, are stored into the DDR memory through the output buffer module.
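Taken together, the BN expression and the activation function above amount to the following element-wise computation, sketched here in Python with NumPy; the function names are assumptions for illustration.

```python
import numpy as np

def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    # y = gamma * (x - mean) / sqrt(var + eps) + beta
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def leaky_activation(x):
    # y = (x > 0) ? x : 0.1x
    return np.where(x > 0.0, x, 0.1 * x)
```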
The input control module 308 and the output control module 310 route the data transfers between modules. Specifically, the input control module 308 rearranges the data in the data buffers according to the NPU unit's data-input interface and delivers the data to the correct input ports, and the output control module 310 rearranges the NPU's output data according to the output buffer's input interface and delivers it to the correct ports.
The operation control module 312 controls the logic state of the whole system: by reading the current state register it determines which level of the deep convolutional neural network is currently being computed, and it then executes the logic-control instructions of the corresponding state and steers the movement of the data.
As shown in fig. 4, the method provided in this embodiment includes the following steps:
s401: and acquiring weight data of the model, and loading the weight data into the DDR memory.
And acquiring weight data of the trained deep convolutional neural network from the cloud, and loading the weight data into the DDR memory of the FPGA development board through the USB.
Specifically, a yolo (a deep-learning detection algorithm) convolutional neural network is trained with GPU (Graphics Processing Unit) acceleration, the weight data of the trained face-detection model is obtained, and the weight data is loaded into the DDR memory of the FPGA development board through USB (Universal Serial Bus).
S402: the input data is preprocessed and stored in the DDR memory.
The method comprises the following steps: carrying out normalization processing on input data to enable the input data to meet calculation requirements; carrying out bilinear interpolation processing on input data to enable the picture size to meet the calculation requirement; the preprocessed input data is stored in the DDR memory.
Specifically, the input picture data is subjected to normalization preprocessing, the gray value is divided by 255 to be normalized to be between 0 and 1, the size of the input picture data is rearranged to 416 x 416 by adopting a bilinear interpolation method, the input picture size requirement of the yolo convolutional neural network is met, and then the input picture data is stored in a DDR memory.
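A minimal NumPy sketch of this preprocessing step for a single-channel picture follows; the function name and the edge-pixel handling are illustrative assumptions.

```python
import numpy as np

def preprocess(img, size=416):
    """Normalize gray values to [0, 1] and resize to size x size bilinearly."""
    img = img.astype(np.float32) / 255.0
    h, w = img.shape
    ys = np.linspace(0.0, h - 1.0, size)
    xs = np.linspace(0.0, w - 1.0, size)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    # Bilinear blend of the four neighbouring source pixels.
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```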
S403: the current status register is read and the corresponding processing hierarchy is determined.
And reading the current state register, judging which layer of the calculation depth convolutional neural network the current state is in, and executing a logic control instruction under the corresponding state to control the operation of data. Defining n state registers R0, R1, …, R (n-1), and Rn, wherein each register stores state data corresponding to the current layer, which means that the whole deep convolutional neural network needs to operate R0 to Rn common n layers of network, and control logic reads the registers according to the sequence, executes a logic control function of the corresponding layer, controls the flow direction of the whole hardware data, and completes the calculation of the deep convolutional neural network.
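The register-driven sequencing can be pictured with the short Python sketch below; the layer-state fields and the dispatch table are assumptions made for illustration, not the patent's instruction encoding.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class LayerState:
    layer_type: str        # "conv", "pool", "route", "reorg" or "classify"
    params: dict = field(default_factory=dict)  # per-layer configuration

def run_network(states: List[LayerState],
                dispatch: Dict[str, Callable[[LayerState], None]]) -> None:
    # Read the state registers R0..Rn in order and execute the control
    # logic of the corresponding layer, as in steps S403/S404.
    for state in states:
        dispatch[state.layer_type](state)
```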
S404: and calling the corresponding processing hierarchy to process the data.
When the current state register corresponds to convolution calculation, the convolution calculation operation is executed, and the step comprises the following steps: loading weight parameters of a convolution layer and input data into a parallel convolution processing unit PE, setting a floating point number (32 bit)/fixed point number (16 bit) matrix with the weight parameters of k, wherein the input data is a floating point number (32 bit)/fixed point number (16 bit) matrix with the weight parameters of a, the sliding step length is 1, and the number of the parallel convolution processing units PE is P, so that the convolution sum of P input data and the weight can be calculated simultaneously; the convolution layer calculation comprises multiplication and accumulation operation of weight and input data, batch regularization BN calculation operation, offset addition and activation function activation, one featuremap is obtained after the input data of one convolution kernel and a plurality of input channels are calculated, and then the next convolution kernel and the input data are calculated after the convolution kernel and the input data are stored in a memory until the calculation of one layer of the deep convolution neural network is completed.
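The following NumPy sketch is a sequential reference model of this convolution level (the hardware computes P windows in parallel, and BN is omitted here for brevity); the shapes and names are illustrative assumptions.

```python
import numpy as np

def conv_level(x, w, bias):
    """Reference model of one convolution level.
    x: (C, A, A) input, w: (F, C, k, k) kernels, stride 1, no padding."""
    C, A, _ = x.shape
    F, _, k, _ = w.shape
    n = A - k + 1
    out = np.empty((F, n, n), dtype=np.float32)
    for f in range(F):
        acc = np.zeros((n, n), dtype=np.float32)      # temporary register
        for c in range(C):                            # accumulate per channel
            for i in range(n):
                for j in range(n):
                    acc[i, j] += np.sum(x[c, i:i+k, j:j+k] * w[f, c])
        acc += bias[f]                                # bias addition
        out[f] = np.where(acc > 0, acc, 0.1 * acc)    # leaky activation
    return out
```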
When the current state register corresponds to a pooling operation, the max-pooling operation is executed. This step comprises: the input data is an A × A floating-point/fixed-point matrix, the sliding stride is s, and the number of parallel convolution processing elements is P × P; using the max-pooling operation, the input data is divided into (A/s) × (A/s) pooling windows, P × P input values at corresponding positions are loaded from the pooling windows each time, and after s × s cycles P × P max-pooling results are output.
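In NumPy, the same window maxima that the PE array produces in parallel can be computed as follows, assuming the window size and the sliding stride are both s as described.

```python
import numpy as np

def max_pool(x, s):
    """Divide an (A, A) input into (A//s) x (A//s) windows of size s x s
    and keep the maximum of each window."""
    n = x.shape[0] // s
    return x[:n * s, :n * s].reshape(n, s, n, s).max(axis=(1, 3))
```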
When the current state register corresponds to a connection operation, the connection-layer operation is executed. The connection layer takes the output data of one or two previously computed layers as the input data of the current layer, so it suffices to reload the DDR addresses of the earlier layers' output data as the input addresses of the current layer's input data to complete the connection-layer operation.
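Because the connection layer only re-points addresses, it reduces to a few lines; the dictionary-based address book in this Python sketch is an illustrative assumption.

```python
def route_layer(layer_outputs, source_ids):
    """Connection (route) level: no data is copied; the DDR addresses of
    earlier layers' outputs become the input addresses of the current layer.
    layer_outputs: dict mapping layer id -> DDR address of its output.
    source_ids: the one or two earlier layers this layer reads from."""
    return [layer_outputs[i] for i in source_ids]
```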
When the current state register corresponds to a reorganization-layer operation, the reorganization layer splits and reorganizes the current layer. With the original input data of size 2h × 2w × 2c and a stride of 2, the reorganization-layer operation yields an output feature map of h × w × 8c; an address-mapping unit must be added to map the original addresses to the new addresses at which the data is stored, and this data serves as the input data of the next layer.
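One space-to-depth mapping consistent with the 2h × 2w × 2c to h × w × 8c reshape is sketched below in NumPy; the exact ordering of the 2 × 2 blocks within the new channels is an assumption, since the patent fixes it through its address-mapping unit.

```python
import numpy as np

def reorg(x, stride=2):
    """Reorganization level: (2h, 2w, 2c) -> (h, w, 8c) for stride 2."""
    H, W, C = x.shape
    h, w = H // stride, W // stride
    x = x.reshape(h, stride, w, stride, C)
    x = x.transpose(0, 2, 1, 3, 4)   # gather each stride x stride block
    return x.reshape(h, w, stride * stride * C)
```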
S405: judging whether the network layer calculation is finished, if so, executing step S406, otherwise, returning to step S403.
When the current status register is the classified layer calculation, determining that the network layer calculation is finished, and executing step S406; when the current status register is not the classification layer calculation, it is determined that the network layer calculation is not ended, and step S403 is executed.
S406: and executing the classification layer computing operation and outputting a result.
The classification layer is calculated by taking the calculation results after the previous operations of each convolution layer, pooling layer, connecting layer, recombination layer and the like as the input feature vectors of the layer, obtaining the detection results through classification calculation and outputting the detection results.
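The patent describes this level only as a classification computation over the collected feature vectors; as a purely illustrative stand-in (not the actual detection head of the embodiment), a fully-connected projection followed by a softmax could look like this:

```python
import numpy as np

def classify(features, w, b):
    """features: concatenated feature vector; w: (classes, dim); b: (classes,)."""
    scores = w @ features + b
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()
```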
This embodiment implements a complex deep convolutional neural network in FPGA hardware, so that a deep convolutional neural network model that is highly dependent on powerful cloud computing can run on a local terminal; real-time data processing no longer depends on the network, which solves the problem that large, complex deep convolutional neural networks cannot run on hardware terminals. At the same time, the invention can process deep convolutional neural networks with more complex structures and more network layers, adapts to current deep-learning algorithms, and can process convolution, pooling, connection and reorganization layers. Compared with previous methods, the convolution layer can handle the batch-regularization BN operation and the leaky activation function, and the connection and reorganization layers are added, so the approach is at the forefront of the field.
Furthermore, this embodiment can process input graphic data and weight data in floating point (32-bit) or fixed point (16-bit); by switching the internal multiply, add and subtract units between floating-point and fixed-point operation, deep-learning algorithm models of different data types can be processed, giving high flexibility. Converting a deep-learning algorithm from the floating-point data type to the hardware fixed-point data type reduces the volume of weight and intermediate-result data without greatly changing the computational accuracy.
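One possible float-to-fixed conversion is sketched below; the Q8.8-style split between integer and fractional bits is an assumption, as the embodiment only specifies a 16-bit fixed-point word.

```python
import numpy as np

def to_fixed16(x, frac_bits=8):
    """Quantize float32 data to 16-bit fixed point with frac_bits fractional bits."""
    scaled = np.round(np.asarray(x, dtype=np.float32) * (1 << frac_bits))
    return np.clip(scaled, -32768, 32767).astype(np.int16)

def from_fixed16(q, frac_bits=8):
    """Recover an approximate float32 value from the fixed-point word."""
    return q.astype(np.float32) / (1 << frac_bits)
```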
Furthermore, in this embodiment the input image data or video-frame data is processed directly by the preprocessing module, the processed data is fed into the convolutional neural network for the layer-by-layer operations, the data produced by the network layers is input as the feature vectors of the classification layer, the final classification detection computation is performed, and the result is output, completing real-time detection of faces in images or video frames. The whole process runs in local FPGA hardware without networking; compared with traditional CPU and GPU schemes the power consumption is greatly reduced, and the configuration is more flexible in adapting to current deep-learning algorithms.
For example, a yolo convolutional neural network model with twenty-two convolution layers, five max-pooling layers, two connection layers, one reorganization layer, one classification layer and one preprocessing module performs real-time processing of input picture data and outputs detection results. The input picture is resized to 416 × 416 after preprocessing, the convolution kernel sizes are 3 × 3 and 1 × 1, and the pooling-layer stride is 2 × 2. The yolo convolutional neural network is used to train a face-detection model in the cloud with GPU acceleration, and picture data is input.
For the max-pooling operation, as shown in fig. 7: the input data is an A × A floating-point/fixed-point matrix, the sliding stride is s, and the number of parallel convolution processing elements PE is P × P. The max-pooling operation divides the input data into (A/s) × (A/s) pooling windows; P × P input values at corresponding positions are loaded from the pooling windows each time, and after s × s cycles P × P max-pooling results are output. This is the max-pooling procedure.
As shown in fig. 8, the logic control states of the operation control module include:
(1) read_reg, the state-register logic: the whole deep convolutional neural network runs the layers R0, R1, …, R(n-1), Rn; the state data of every network layer is stored in the state registers R0 through Rn, and the whole network is run by reading the current state-register values in sequence.
(2) conv, the convolution-operation control logic: the convolution operation is completed through the logic-control states data preparation (idle), data initialization (init), data operation (datamode), batch-regularization operation (BN), activation function (Active) and data output (output).
(3) pool, the max-pooling control logic: the max-pooling operation is completed through the logic-control states data preparation (idle), data initialization (init), maximum comparison (MAX), temporary-value write (write) and data output (output).
(4) route, the connection-layer control logic: through address data preparation (idle) and address loading (Load addr), the DDR address of an earlier layer's output data is used as the input address of the current layer's input data, completing the connection-layer operation.
(5) reorder, the reorganization-layer control logic: after address data preparation (idle), address computation (Count addr) and mapping (reorder), the data is rearranged; as shown in fig. 9, input data of 2h × 2w × 2c is mapped into new data of h × w × 8c as the input data of the next layer.
The present invention also provides a computer-readable storage medium storing one or more programs which, when executed, implement the steps of the methods provided by all embodiments of the present invention.
As can be seen from the implementation of the above embodiments, the present invention has the following advantages:
The invention provides an FPGA-based convolutional neural network implementation method and device. The method comprises: initializing the editable resources of the FPGA to generate an input buffer module, an output buffer module, an input control module, an output control module, a neural network processing unit, a data reading module and an operation control module; loading the weight data of each processing level of the convolutional neural network model to be implemented into the memory of the FPGA and associating state registers of the FPGA with the processing levels; storing the data to be processed into the memory through the memory controller of the FPGA; and finally having the operation control module read the parameters of the state registers, determine the processing level to be run, and control the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module so that that level completes its processing of the data, until all processing levels of the convolutional neural network model to be implemented have run in sequence, whereupon the processing result corresponding to the data to be processed is output. Throughout this process the convolutional neural network is implemented by the FPGA hardware and no longer depends on software, which solves the problem that conventional convolutional neural network techniques rely on software implementations.
The foregoing is a further detailed description of the invention in connection with specific embodiments, and the invention is not to be considered limited to this description. Those skilled in the art may make several simple deductions or substitutions without departing from the spirit of the invention, and these should be considered within the scope of the invention.

Claims (10)

1. An FPGA-based convolutional neural network implementation method, characterized by comprising the following steps:
initializing editable resources of the FPGA to generate an input cache module, an output cache module, an input control module, an output control module, a neural network processing unit, a data reading module and an operation control module;
loading weight data of each processing level in a convolutional neural network model to be realized into a memory storage of the FPGA, and associating a state register of the FPGA with the processing level;
storing data to be processed into the memory through a memory controller of the FPGA;
and the operation control module reads the parameters of the state register, determines a to-be-operated processing level, controls the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit and the data reading module to finish the processing of the to-be-operated processing level on the data until the sequential operation of all the processing levels of the to-be-realized convolutional neural network model is finished, and outputs a processing result corresponding to the to-be-processed data.
2. The convolutional neural network implementation method of claim 1, wherein when the processing hierarchy to be operated is a convolutional computation hierarchy, the operation control module controlling the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit, and the data reading module to complete the processing of the data by the processing hierarchy to be operated comprises:
the data reading module is controlled to read weight data and input data corresponding to the convolution computation level stored in the memory through the memory controller, and the weight data and the input data are stored in the input cache module;
the input control module is controlled to input the weight data and the input data stored by the input buffer module into the neural network processing unit;
controlling the neural network processing unit to calculate the input data by using the weight data, and outputting a calculation result;
controlling the output control module to store the calculation result into the output buffer module;
and controlling the memory controller to read the calculation result in the output buffer module and store the calculation result into the memory.
3. The convolutional neural network implementation method of claim 1, wherein when the processing hierarchy to be operated is a pooled operation hierarchy, the operation control module controlling the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit, and the data reading module to complete the processing of the data by the processing hierarchy to be operated comprises:
controlling the data reading module to read the input data corresponding to the pooling operation level stored in the memory storage through the memory controller, and storing the input data into the input cache module;
the input control module is controlled to divide the input data stored by the input cache module into a plurality of pooling windows, and the input data are sequentially input into the neural network processing unit from the pooling windows;
controlling the neural network processing unit to carry out maximum pooling comparison on input data and outputting a comparison result;
controlling the output control module to store the comparison result into the output buffer module;
and controlling the memory controller to read the comparison result in the output buffer module and store the comparison result into the memory.
4. The convolutional neural network implementation method of claim 1, wherein when the processing hierarchy to be operated is a link operation hierarchy, the operation control module controls the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit, and the data reading module to complete the processing of the data by the processing hierarchy to be operated, including:
determining output data of other processing layers corresponding to the input data of the current processing layer;
the storage addresses of the output data of the other processing layers in the memory are configured as the input addresses of the input data of the current processing layer;
and controlling the data reading module to read the input data corresponding to the input address stored in the memory through the memory controller, and storing the input data into the input cache module.
5. The convolutional neural network implementation method of claim 1, wherein when the processing hierarchy to be operated is a reorganization operation hierarchy, the operation control module controls the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit, and the data reading module to complete the processing of the data by the processing hierarchy to be operated, including:
Controlling the data reading module to read the input data corresponding to the reorganization operation level stored in the memory storage through the memory controller, and storing the input data into the input cache module;
the input control module is controlled to input the input data stored by the input buffer module into the neural network processing unit;
controlling the neural network processing unit to carry out recombination operation on the input data and outputting a recombination result;
controlling the output control module to store the reorganization result into the output buffer module;
controlling the memory controller to read the reorganization result in the output buffer module and store the reorganization result into the memory;
and establishing a mapping between the storage address of the input data in the memory storage and the storage address of the recombination result in the memory storage.
6. The convolutional neural network implementation method of claim 1, wherein when the processing hierarchy to be operated is a classification operation hierarchy, the operation control module controlling the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit, and the data reading module to complete the processing of the data by the processing hierarchy to be operated comprises:
Controlling the data reading module to read input data corresponding to the classification operation level stored in the memory storage through the memory controller, and storing the input data into the input cache module;
the input control module is controlled to input the input data stored by the input buffer module as an input characteristic vector into the neural network processing unit;
controlling the neural network processing unit to perform classification calculation on the input data and outputting a detection result;
controlling the output control module to store the detection result into the output buffer module;
and controlling the memory controller to read the detection result in the output buffer module and output the detection result.
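Note: claim 6 leaves the classification calculation abstract; a fully-connected layer followed by softmax is the standard realization, sketched below (the function name and shapes are assumptions):

    import numpy as np

    def classify(feature_vector: np.ndarray, weights: np.ndarray, bias: np.ndarray):
        """Fully-connected layer plus softmax; the argmax index serves as the
        detection result that the output control module stores and outputs."""
        logits = weights @ feature_vector + bias
        exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
        probs = exp / exp.sum()
        return int(np.argmax(probs)), probs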
7. The convolutional neural network implementation method of any one of claims 1-6, further comprising, before storing the data to be processed into the memory through the memory controller of the FPGA:
determining whether the data to be processed meets the computation requirements of the convolutional neural network model to be implemented;
if not, performing normalization and/or bilinear interpolation on the data to be processed until the computation requirements are met;
and storing the processed data into the memory.
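Note: claim 7's preprocessing can be read as "resize with bilinear interpolation, then normalize". A minimal Python sketch under that reading, for a single-channel image (the target size and the 255.0 scale are assumptions):

    import numpy as np

    def preprocess(image, target_hw, scale=255.0):
        """Bilinear-resize an (H, W) image to target_hw, then normalize to [0, 1]."""
        h, w = image.shape
        th, tw = target_hw
        ys, xs = np.linspace(0, h - 1, th), np.linspace(0, w - 1, tw)
        y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
        y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
        wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
        # Weighted average of the four neighbouring source pixels.
        top = image[y0][:, x0] * (1 - wx) + image[y0][:, x1] * wx
        bottom = image[y1][:, x0] * (1 - wx) + image[y1][:, x1] * wx
        return (top * (1 - wy) + bottom * wy) / scale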
8. An FPGA-based convolutional neural network implementation device, comprising:
an initialization module, configured to initialize the editable resources of the FPGA and generate an input buffer module, an output buffer module, an input control module, an output control module, a neural network processing unit, a data reading module, and an operation control module;
a loading module, configured to load the weight data of each processing level in the convolutional neural network model to be implemented into the memory of the FPGA, associate a state register of the FPGA with the processing levels, and store the data to be processed into the memory through the memory controller of the FPGA;
wherein the operation control module is configured to read the parameter of the state register, determine the processing level to be executed, and control the input buffer module, the output buffer module, the input control module, the output control module, the neural network processing unit, and the data reading module to complete the processing of the data by that level, until all processing levels of the convolutional neural network model to be implemented have run in sequence, and to output a processing result corresponding to the data to be processed.
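Note: a hedged software model of the device of claim 8, showing how the operation control module could walk the state register through the processing levels; every name here is illustrative, not an interface from the patent:

    # Each level object wraps one processing level; the state register selects
    # the next level until the whole model has run, mirroring claim 8's control flow.
    def run_model(levels, state_register, memory):
        result = None
        while state_register.current < len(levels):
            level = levels[state_register.current]      # processing level to be executed
            inputs = level.fetch(memory)                # data reading module + input buffer
            outputs = level.compute(inputs)             # neural network processing unit
            result = level.write_back(memory, outputs)  # output buffer + memory controller
            state_register.current += 1                 # advance the state register
        return result                                   # result for the data to be processed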
9. The convolutional neural network implementation device of claim 8, wherein the neural network processing unit comprises a plurality of processing units configured to process data in parallel.
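Note: the parallelism of claim 9 can be modeled in software by sharding the convolution kernels across processing units; the thread pool below merely stands in for the FPGA's truly concurrent units, and the convolve callable is assumed:

    from concurrent.futures import ThreadPoolExecutor

    def parallel_convolve(kernel_shards, feature_map, convolve, n_units=8):
        """Each processing unit computes the output channels for its shard of
        kernels, shortening the channel loop by roughly a factor of n_units."""
        with ThreadPoolExecutor(max_workers=n_units) as pool:
            futures = [pool.submit(convolve, feature_map, k) for k in kernel_shards]
            return [f.result() for f in futures]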
10. The convolutional neural network implementation device of claim 8 or 9, wherein the input buffer module comprises two input storage units configured to buffer the input data and/or weight data of the neural network processing unit in a ping-pong double-buffering manner; and/or the output buffer module comprises two output storage units configured to buffer the output data of the neural network processing unit in a ping-pong double-buffering manner.
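Note: a minimal model of the ping-pong double buffering of claim 10. In hardware the fill and drain proceed concurrently; this sequential Python sketch shows only the role swapping between the two storage units:

    def ping_pong(tiles, load, compute):
        """Alternate two storage units: while the processing unit drains one,
        the memory controller fills the other with the next tile."""
        units = [None, None]
        fill, drain = 0, 1
        units[drain] = load(tiles[0])              # prime the first storage unit
        results = []
        for nxt in tiles[1:]:
            units[fill] = load(nxt)                # memory side fills one unit
            results.append(compute(units[drain]))  # PE side drains the other
            fill, drain = drain, fill              # swap the two roles
        results.append(compute(units[drain]))      # the final tile has no successor
        return results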
CN201810074941.8A 2017-12-29 2018-01-25 FPGA-based convolutional neural network implementation method and device Active CN108416422B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711485144 2017-12-29
CN2017114851440 2017-12-29

Publications (2)

Publication Number Publication Date
CN108416422A CN108416422A (en) 2018-08-17
CN108416422B true CN108416422B (en) 2024-03-01

Family

ID=63126240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810074941.8A Active CN108416422B (en) 2017-12-29 2018-01-25 FPGA-based convolutional neural network implementation method and device

Country Status (2)

Country Link
CN (1) CN108416422B (en)
WO (1) WO2019127838A1 (en)

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874605A (en) * 2018-08-31 2020-03-10 北京嘉楠捷思信息技术有限公司 Image recognition processing method and device
CN109214506B (en) * 2018-09-13 2022-04-15 深思考人工智能机器人科技(北京)有限公司 Convolutional neural network establishing device and method based on pixels
CN109272113B (en) * 2018-09-13 2022-04-19 深思考人工智能机器人科技(北京)有限公司 Convolutional neural network establishing device and method based on channel
CN110929855B (en) * 2018-09-20 2023-12-12 合肥君正科技有限公司 Data interaction method and device
CN109272109B (en) * 2018-10-30 2020-07-17 北京地平线机器人技术研发有限公司 Instruction scheduling method and device of neural network model
CN109446996B (en) * 2018-10-31 2021-01-22 智慧眼科技股份有限公司 Face recognition data processing device and method based on FPGA
CN109542513B (en) * 2018-11-21 2023-04-21 山东浪潮科学研究院有限公司 Convolutional neural network instruction data storage system and method
CN109740732B (en) * 2018-12-27 2021-05-11 深圳云天励飞技术有限公司 Neural network processor, convolutional neural network data multiplexing method and related equipment
CN109948789A (en) * 2019-03-21 2019-06-28 百度在线网络技术(北京)有限公司 Data load method and device for convolutional neural networks
CN110032374B (en) * 2019-03-21 2023-04-07 深兰科技(上海)有限公司 Parameter extraction method, device, equipment and medium
CN109919312B (en) * 2019-03-29 2021-04-23 北京智芯微电子科技有限公司 Operation method and device of convolutional neural network and DPU
CN110058943B (en) * 2019-04-12 2021-09-21 三星(中国)半导体有限公司 Memory optimization method and device for electronic device
CN110097174B (en) * 2019-04-22 2021-04-20 西安交通大学 Method, system and device for realizing convolutional neural network based on FPGA and row output priority
CN110110850A (en) * 2019-04-29 2019-08-09 山东浪潮人工智能研究院有限公司 FPGA-based implementation method for forward and backward reusable processing units
CN110378470B (en) * 2019-07-19 2023-08-18 Oppo广东移动通信有限公司 Optimization method and device for neural network model and computer storage medium
CN110636221A (en) * 2019-09-23 2019-12-31 天津天地人和企业管理咨询有限公司 System and method for super frame rate of sensor based on FPGA
CN110738317A (en) * 2019-10-17 2020-01-31 中国科学院上海高等研究院 FPGA-based deformable convolution network operation method, device and system
CN112749778B (en) * 2019-10-29 2023-11-28 北京灵汐科技有限公司 Neural network mapping method and device under strong synchronization
CN112784952B (en) * 2019-11-04 2024-03-19 珠海格力电器股份有限公司 Convolutional neural network operation system, method and equipment
CN110826507B (en) * 2019-11-11 2022-08-23 北京百度网讯科技有限公司 Face detection method, device, equipment and storage medium
CN112819022B (en) * 2019-11-18 2023-11-07 同方威视技术股份有限公司 Image recognition device and image recognition method based on neural network
CN111126309A (en) * 2019-12-26 2020-05-08 长沙海格北斗信息技术有限公司 Convolutional neural network architecture method based on FPGA and face recognition method thereof
CN113111995A (en) * 2020-01-09 2021-07-13 北京君正集成电路股份有限公司 Method for shortening model reasoning and model post-processing operation time
CN111260050B (en) * 2020-01-19 2023-03-07 中国电子科技集团公司信息科学研究院 Method and device for controlling convolutional neural network to process data
CN111416743B (en) * 2020-03-19 2021-09-03 华中科技大学 Convolutional network accelerator, configuration method and computer readable storage medium
CN111427838B (en) * 2020-03-30 2022-06-21 电子科技大学 Classification system and method for dynamically updating convolutional neural network based on ZYNQ
CN115380292A (en) * 2020-04-03 2022-11-22 北京希姆计算科技有限公司 Data storage management device and processing core
CN111445420B (en) * 2020-04-09 2023-06-06 北京爱芯科技有限公司 Image operation method and device of convolutional neural network and electronic equipment
CN113673664B (en) * 2020-05-14 2023-09-12 杭州海康威视数字技术股份有限公司 Data overflow detection method, device, equipment and storage medium
CN111783971B (en) * 2020-07-02 2024-04-09 上海赛昉科技有限公司 Highly flexibly configurable data post-processor for deep neural network
CN111931925B (en) * 2020-08-10 2024-02-09 西安电子科技大学 Acceleration system of binary neural network based on FPGA
CN112070217B (en) * 2020-10-15 2023-06-06 天津大学 Internal storage bandwidth optimization method of convolutional neural network accelerator
CN112270252A (en) * 2020-10-26 2021-01-26 西安工程大学 Multi-vehicle target identification method for improving YOLOv2 model
CN112434635B (en) * 2020-12-02 2024-02-09 深圳龙岗智能视听研究院 Convolutional neural network feature extraction method, system, embedded device and medium
CN112541583A (en) * 2020-12-16 2021-03-23 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Neural network accelerator
CN112766478B (en) * 2021-01-21 2024-04-12 中国电子科技集团公司信息科学研究院 FPGA (field programmable Gate array) pipeline structure oriented to convolutional neural network
CN113222107A (en) * 2021-03-09 2021-08-06 北京大学 Data processing method, device, equipment and storage medium
CN112990157B (en) * 2021-05-13 2021-08-20 南京广捷智能科技有限公司 Image target identification acceleration system based on FPGA
CN113379047B (en) * 2021-05-25 2024-04-05 北京微芯智通科技合伙企业(有限合伙) System and method for realizing convolutional neural network processing
CN117112452B (en) * 2023-08-24 2024-04-02 上海合芯数字科技有限公司 Register simulation configuration method, device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10942671B2 (en) * 2016-04-25 2021-03-09 Huawei Technologies Co., Ltd. Systems, methods and devices for a multistage sequential data process
CN106250939B (en) * 2016-07-30 2020-07-24 复旦大学 Handwritten character recognition method based on FPGA + ARM multilayer convolutional neural network
CN106875012B (en) * 2017-02-09 2019-09-20 武汉魅瞳科技有限公司 A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017185418A1 (en) * 2016-04-29 2017-11-02 北京中科寒武纪科技有限公司 Device and method for performing neural network computation and matrix/vector computation
CN106228238A (en) * 2016-07-27 2016-12-14 中国科学技术大学苏州研究院 Method and system for accelerating deep learning algorithms on a field programmable gate array platform
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 FPGA-based deep convolutional neural network implementation method
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A programmable convolutional neural network crypto coprocessor IP core
CN106959937A (en) * 2017-03-30 2017-07-18 中国人民解放军国防科学技术大学 A vectorized implementation method for convolution matrices oriented to GPDSP

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A high performance FPGA-based accelerator for large-scale convolutional neural networks; Huimin Li et al.; 2016 26th International Conference on Field Programmable Logic and Applications; pp. 1-9 *
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks; Chen Zhang et al.; Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays; pp. 161-170 *
FPGA Implementation of a Convolutional Code Encoder and Viterbi Decoder; Sun Lei; Information Technology; Vol. 27, No. 10; pp. 7-9, 22 *
FPGA-based Implementation of the Softmax Layer of a Convolutional Neural Network; Li Li et al.; Modern Computer (Professional Edition); pp. 21-24 *

Also Published As

Publication number Publication date
CN108416422A (en) 2018-08-17
WO2019127838A1 (en) 2019-07-04

Similar Documents

Publication Publication Date Title
CN108416422B (en) FPGA-based convolutional neural network implementation method and device
US11263007B2 (en) Convolutional neural network hardware acceleration device, convolutional calculation method, and storage medium
CN109543832B (en) Computing device and board card
CN107832843B (en) Information processing method and related product
CN110097174B (en) Method, system and device for realizing convolutional neural network based on FPGA and row output priority
US9411726B2 (en) Low power computation architecture
CN109522052B (en) Computing device and board card
CN107169563B (en) Processing system and method applied to two-value weight convolutional network
CN111310904B (en) Apparatus and method for performing convolutional neural network training
KR102470264B1 (en) Apparatus and method for performing reverse training of a fully-connected layer neural network
CN107341542B (en) Apparatus and method for performing recurrent neural networks and LSTM operations
CN108229671B (en) System and method for reducing storage bandwidth requirement of external data of accelerator
CN108629406B (en) Arithmetic device for convolutional neural network
CN107766079B (en) Processor and method for executing instructions on processor
CN111105023A (en) Data stream reconstruction method and reconfigurable data stream processor
US20230196113A1 (en) Neural network training under memory restraint
CN109711540B (en) Computing device and board card
CN110232665B (en) Maximum pooling method and device, computer equipment and storage medium
CN111488976B (en) Neural network computing device, neural network computing method and related products
CN111488963A (en) Neural network computing device and method
CN111368967B (en) Neural network computing device and method
US20220108203A1 (en) Machine learning hardware accelerator
CN115222028A (en) One-dimensional CNN-LSTM acceleration platform based on FPGA and implementation method
CN112396072B (en) Image classification acceleration method and device based on ASIC (application specific integrated circuit) and VGG16
CN111368987B (en) Neural network computing device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant