CN111104124A - PyTorch framework-based rapid deployment method of a convolutional neural network on an FPGA - Google Patents

PyTorch framework-based rapid deployment method of a convolutional neural network on an FPGA

Info

Publication number
CN111104124A
Authority
CN
China
Prior art keywords
layer
neural network
fpga
file
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911084126.0A
Other languages
Chinese (zh)
Other versions
CN111104124B (en)
Inventor
姜宏旭
韩琪
刘晓戬
李波
张永华
林珂玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201911084126.0A
Publication of CN111104124A
Application granted
Publication of CN111104124B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/60 Software deployment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/067 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using optical means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The invention discloses a rapid deployment method of a convolutional neural network on an FPGA (field programmable gate array) based on the PyTorch framework, which comprises establishing a model rapid mapping mechanism, constructing a reconfigurable computing unit, and carrying out an adaptive processing flow based on rule mapping. When a convolutional neural network is defined under the PyTorch framework, the model rapid mapping mechanism is established through the construction of naming rules. Optimization strategy calculation is performed under hardware resource constraints, a template library based on the hardware optimization strategy is established, and a reconfigurable computing unit is constructed at the FPGA end. Finally, in the adaptive processing flow based on rule mapping, the complex network model file is decomposed at the FPGA end, the network is abstracted into a directed acyclic graph, and a neural network accelerator is generated, realizing an integrated flow from the PyTorch model file to FPGA deployment. The method establishes the directed acyclic graph of the network through the model rapid mapping mechanism; during FPGA deployment only the hardware design variables need to be input to complete the deployment process, making the method simple and highly general.

Description

PyTorch framework-based rapid deployment method of a convolutional neural network on an FPGA
Technical Field
The invention belongs to the technical field of hardware acceleration of convolutional neural networks, and relates to a rapid deployment method of a convolutional neural network on an FPGA (field programmable gate array) based on the PyTorch framework.
Background
In recent years, convolutional neural networks have been widely used in fields such as natural language processing and computer vision. Working with a neural network generally comprises two phases: training and testing. Training is the process of extracting model parameters from training data and a neural network model (ResNet, RNN, etc.) using a CPU or a GPU. Testing checks the results of running test data through a trained model (the neural network model plus the model parameters). For these tasks, frameworks such as PyTorch and TensorFlow uniformly abstract the data and operations involved in the training process into usable frameworks.
At present, deep learning models deployed on FPGAs in industry mostly obtain the network structure by parsing a Caffe prototxt file and reading the corresponding parameter values from the caffemodel file (e.g., DeePhi Tech, Horizon Robotics, and SenseTime). The PyTorch framework defines network structures simply and flexibly and is widely used in academia. However, since a PyTorch model file does not contain the topology information of the network, a model trained in PyTorch can be deployed only after being converted into a Caffe model through a tool such as ONNX; the ONNX tool supports only conventional Caffe layers and cannot convert custom layers defined in PyTorch. Even after a successful conversion, considerable effort is required to align the PyTorch outputs with the Caffe outputs.
Therefore, providing a rapid deployment method of a convolutional neural network based on the PyTorch framework on an FPGA is a technical problem urgently needing to be solved by those skilled in the art.
Disclosure of Invention
In order to achieve the above purpose, the invention provides a rapid deployment method of a convolutional neural network based on the PyTorch framework on an FPGA, which facilitates rapidly deploying on the FPGA a convolutional neural network trained under the PyTorch framework. The method mainly comprises establishing a model fast mapping mechanism, constructing a reconfigurable computing unit, and carrying out an adaptive processing flow based on rule mapping. An efficient and convenient naming rule is created so that the upper- and lower-layer topological order of the model is stored in the model file; the complex model file is then decomposed, and each layer is stored as a renamed binary file, completing the establishment of the model fast mapping mechanism. Next, optimization strategy calculation is carried out under hardware resource constraints, and the optimization strategy adopted for FPGA deployment is selected according to the current hardware resources. Meanwhile, a template library based on the hardware optimization strategy is established, which contains the basic convolutional neural network structures; the corresponding template files in the library are called directly during FPGA deployment. Finally, a specific structure is created: the FPGA end reads the weight file of each layer, updates the structure according to the configuration information, and stores the name of the next layer so that the information of the next layer to be executed can be found among the weight files. Thereby, rapid deployment on the FPGA of a convolutional neural network under the PyTorch framework is completed. The specific scheme for achieving the purpose is as follows:
step one, establishing a model fast mapping mechanism: naming each layer of the convolutional neural network model topology under the PyTorch framework according to the input and output order of the upper and lower layers, decomposing and storing each layer of the neural network model file obtained after model training, and completing the establishment of the network topology of the model file under the PyTorch framework;
step two, constructing a reconfigurable computing unit, which comprises optimization strategy calculation under hardware resource constraints and establishment of a template library based on the hardware optimization strategy, and is used for generating the reconfigurable computing unit at the FPGA end;
and step three, in the adaptive processing flow based on rule mapping, analyzing the configuration information of each layer in the neural network model file, performing adaptive configuration of the FPGA control logic through the reconfigurable computing unit at the FPGA end, and finally generating the neural network accelerator.
Preferably, the first step specifically includes:
(1) constructing naming rules for each layer of the model in the PyTorch framework, and renaming each layer of the model according to the naming rules;
(2) training the renamed network model to obtain a neural network model file with a network topological structure;
(3) decomposing the neural network model file, storing each layer as a renamed binary file, and completing the establishment of the model fast mapping mechanism.
Preferably, in the step (1), each layer in the convolutional neural network is named, and the naming rule is: name of this layer + name of the lower layer + configuration information.
Preferably, the convolutional layer configuration information is: convolution kernel size _ step size _ zero padding; the pooling layer configuration information is: pooling window size _ step size _ zero padding; the BN layer and the activation layer need no configuration information.
Preferably, in the step (3), in the neural network model file decomposition and storage stage, the trained neural network model is first propagated forward once; each time a layer of the neural network model is read, the parameters of that layer are stored in a binary file whose file name is the name of the corresponding layer. In particular, when a layer has no parameters, it is saved as an empty binary file.
Preferably, the second step specifically includes:
(1) performing optimization strategy calculation under hardware resource constraints, and selecting the optimization strategy adopted for FPGA deployment according to the current hardware resources;
(2) establishing a template library based on the hardware optimization strategy, and directly calling the corresponding template files in the template library when deploying on the FPGA.
Preferably, the hardware optimization strategy in step (1) includes setting the feature map block size, the input feature map parallelism, and the output feature map parallelism.
Preferably, the template library based on the hardware optimization strategy in the step (2) mainly includes a convolution module, a BN layer module, an activation layer module, a pooling module, a fully connected layer calculation module, and an input/output feature map buffer module.
Preferably, the third step specifically includes: creating a structure, reading the weight file of the corresponding layer at the FPGA end, parsing the configuration information from the name of the weight file, updating the structure according to the configuration information, and storing the name of the next layer so that the information of the next layer to be executed can be found among the weight files.
Preferably, the information maintained by the structure includes hardware optimization parameters, convolutional layer configuration parameters, BN layer configuration parameters, pooling layer configuration parameters, and the names of the current layer and the lower layer.
Preferably, the structure comprises the following parameters: feature map block size, output feature map parallelism, input feature map parallelism, convolutional layer flag bit, BN layer flag bit, activation layer flag bit, pooling layer flag bit, fully connected layer flag bit, convolution kernel size of the convolutional layer, sliding step size of the convolution window, zero padding number of the convolutional layer input feature map, pooling window size of the pooling layer, sliding step size of the pooling window, zero padding number of the pooling layer input feature map, and calculation kernel size of the fully connected layer.
Compared with the prior art, the invention has the following beneficial effects:
1. A model fast mapping mechanism is established through efficient and convenient naming rules and a mechanism for decomposing and storing the complex model file; the convolutional neural network topology information is thus stored in the model parameter files, which solves the problem that a model file obtained through PyTorch framework training does not contain network topology information.
2. By performing optimization strategy calculation under hardware resource constraints, an appropriate optimization acceleration strategy can conveniently be selected for different hardware. Meanwhile, the template library based on the hardware optimization strategy supports the common operations in convolutional neural networks and has a certain universality and extensibility.
3. An adaptive processing flow based on rule mapping is established: the network can be abstracted into a directed acyclic graph in the network inference stage on the FPGA, the FPGA control logic can be configured adaptively according to the specific structure, and the steps requiring human participation in FPGA deployment are reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only embodiments of the invention, and that for a person skilled in the art, other drawings can be obtained from the provided drawings without inventive effort.
FIG. 1 is a flowchart of the rapid deployment method of a convolutional neural network based on the PyTorch framework on an FPGA according to the present invention;
fig. 2 is a diagram illustrating the input/output feature map buffer module constructed by the reconfigurable computing unit according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating generation of a network topology directed acyclic graph in an adaptive processing flow based on rule mapping according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of the rapid deployment method of a convolutional neural network based on the PyTorch framework on an FPGA, the design and implementation of the method are mainly divided into three parts: establishing the model fast mapping mechanism, constructing the reconfigurable computing unit, and the adaptive processing flow based on rule mapping.
S1 Establishing the model fast mapping mechanism
Firstly, each layer of the model is renamed when the convolutional neural network model is defined under the PyTorch framework, following the naming pattern "name of this layer + name of the lower layer + configuration information", with "name of this layer" and "name of the lower layer" separated by two underscores. For example, a convolutional layer with a convolution kernel size of 3 × 3, a step size of 1, and a padding of 0, whose output feeds conv2, is named "conv1__conv2__3_1_0". In this way, when the neural network model is built under the PyTorch framework, the upper- and lower-layer topology of the model is saved in the layer names, which facilitates the next step of converting the PyTorch model file into binary files recognized by the FPGA. In particular, when the output of a layer feeds several next layers, the layer is named "name of this layer + lower layer 1 + lower layer 2 … + configuration information", for example "conv2__conv3__conv4__3_2_1". After the convolutional neural network model under the PyTorch framework has been named in this way, model training is carried out, so that the upper- and lower-layer topological order of the model is stored in the model file.
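As a minimal sketch of this naming rule (the toy network and its layer names below are illustrative examples, not taken from the patent), the layers of a PyTorch model can be registered under rule-encoded names so that the keys of the saved model file carry the topology:

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Toy network whose layer names follow the rule
    "this layer__next layer(s)__kernel_stride_padding"."""
    def __init__(self):
        super().__init__()
        # conv1 feeds pool1; 3x3 kernel, stride 1, padding 0
        self.add_module("conv1__pool1__3_1_0",
                        nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=0))
        # pool1 feeds conv2; 2x2 window, stride 2, padding 0
        self.add_module("pool1__conv2__2_2_0",
                        nn.MaxPool2d(kernel_size=2, stride=2, padding=0))
        # conv2 feeds relu2; with two successors the name would instead be
        # of the form "conv2__relu2__relu3__3_2_1"
        self.add_module("conv2__relu2__3_2_1",
                        nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1))
        # activation layers carry no configuration suffix; "out" is an
        # illustrative token marking the network output
        self.add_module("relu2__out", nn.ReLU())

    def forward(self, x):
        for layer in self.children():  # plain layer-sequence forward pass
            x = layer(x)
        return x

Saving this model with torch.save(TinyNet().state_dict(), "model.pth") yields parameter keys such as "conv1__pool1__3_1_0.weight", so the upper- and lower-layer topological order survives inside the model file.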
Then, the obtained neural network model file is loaded into the convolutional neural network model under the PyTorch framework for one forward propagation; each time a layer of the model is read, its parameters are saved into a binary bin file whose name is the layer name. In particular, when a layer has no parameters, it is saved as an empty binary file.
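The decomposition stage can be sketched on the host side as follows, reusing the TinyNet example above. The patent performs the decomposition during one forward propagation; for a plain layer sequence, iterating the named children of the loaded model produces the same per-layer files (decompose_model and the output directory are illustrative assumptions):

import os
import numpy as np
import torch

def decompose_model(model, out_dir="weights"):
    """Save each layer as one binary file named after the layer, so the
    file names alone carry the topology and the configuration."""
    os.makedirs(out_dir, exist_ok=True)
    for name, layer in model.named_children():
        path = os.path.join(out_dir, name + ".bin")
        params = [p.detach().cpu().numpy().ravel() for p in layer.parameters()]
        if params:  # weights first, then bias, as flat float32 data
            np.concatenate(params).astype(np.float32).tofile(path)
        else:       # parameter-free layers (pooling, activation): empty file
            open(path, "wb").close()

model = TinyNet()
model.load_state_dict(torch.load("model.pth"))
decompose_model(model)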
For the convolutional layer, the configuration information is "convolutional kernel size _ step _ padding", for the pooling layer, the configuration information is "pooling window size _ step _ padding", and for the BN layer and the activation function layer, no configuration information is needed.
S2 Constructing the reconfigurable computing unit
Construction of the reconfigurable computing unit comprises optimization strategy calculation under hardware resource constraints and establishment of a template library based on the hardware optimization strategy. The specific steps of the optimization strategy calculation under the hardware resource constraints are shown in the following table:
[Table, provided as an image in the original publication: pseudocode of the optimization strategy search under hardware resource constraints.]
The input feature map size and the numbers of input and output channels of the neural network are traversed, searching for Tm (output feature map parallelism), Tn (input feature map parallelism), Tr (height of the feature map block), and Tc (width of the feature map block) that satisfy the following formulas. If the resources occupied by the candidate hardware design parameter group are less than the total hardware resources and the current overall computation latency is the minimum so far, the parameter group is saved; otherwise the search continues. Finally, the hardware design parameters achieving the minimum latency are output as the optimization strategy under the hardware resource constraints.
[Formulas (1) and (2), provided as images in the original publication, give the overall computation latency LAT and the BRAM usage as functions of R, C, M, N, K, Bw, BS and the tiling parameters Tm, Tn, Tr, Tc.]
DSP=18×Tm×Tn (3)
In the formulas, LAT represents the overall computation latency under the current optimization strategy, BRAM represents the storage resource occupation under the current optimization strategy, and DSP represents the DSP computation resource occupation under the current optimization strategy. R represents the height of the feature map, C the width of the feature map, M the number of output channels, N the number of input channels, K the convolution kernel size, Bw the data computation bit width, and BS the size of each BRAM block in the FPGA.
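Formulas (1) and (2) survive only as images in this text, so the following sketch pairs the known DSP constraint of formula (3) with stand-in latency and buffer models of the usual tiled-accelerator form; the LAT and BRAM expressions below are assumptions for illustration, not the patent's exact formulas:

import math
from itertools import product

def search_strategy(R, C, M, N, K, Bw=16, BS=18 * 1024,
                    DSP_total=900, BRAM_total=1000):
    """Exhaustively search the tiling parameters (Tm, Tn, Tr, Tc) and keep
    the resource-feasible group with the minimum overall latency."""
    best = None
    for Tm, Tn in product(range(1, M + 1), range(1, N + 1)):
        if 18 * Tm * Tn > DSP_total:  # formula (3): DSP usage
            continue
        for Tr, Tc in product(range(1, R + 1), range(1, C + 1)):
            # assumed latency model: number of tiles x cycles per tile
            lat = (math.ceil(M / Tm) * math.ceil(N / Tn) *
                   math.ceil(R / Tr) * math.ceil(C / Tc) *
                   Tr * Tc * K * K)
            # assumed BRAM model: ping-pong input, weight and output buffers
            bram = 2 * (math.ceil(Tn * (Tr + K - 1) * (Tc + K - 1) * Bw / BS) +
                        math.ceil(Tm * Tn * K * K * Bw / BS) +
                        math.ceil(Tm * Tr * Tc * Bw / BS))
            if bram <= BRAM_total and (best is None or lat < best[0]):
                best = (lat, {"Tm": Tm, "Tn": Tn, "Tr": Tr, "Tc": Tc})
    return best

In practice the four nested ranges would be pruned (for example, stepped in powers of two) to keep the search tractable.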
The template library based on the hardware optimization strategy comprises a convolution module, a BN layer module, an activation layer module, a pooling module, a fully connected layer calculation module, and an input/output feature map buffer module. For the entire reconfigurable computing unit, the reconfigurable parameters include the feature map block size, the input feature map parallelism, the output feature map parallelism, and so on, as shown in the following table:
[Table, provided as images in the original publication: the reconfigurable parameters of each module in the template library.]
The convolution calculation module comprises a block input feature map buffer, a weight parameter buffer, multipliers, and adders. Before convolution starts, the convolution calculation module is configured according to the convolutional layer configuration parameters, including the convolution kernel size, the sliding step of the convolution window, and the input and output feature map parallelism. After configuration is completed, Tm × Tn × C_k × C_k weights are loaded into the weight parameter buffer in a data stream mode, and Tn input feature map blocks of size Tr × Tc are likewise loaded into the input feature map buffer. For the sliding-window convolution calculation, the module provides a convolution calculation unit composed of C_k × C_k DSPs. When the convolution operation starts, C_k × C_k input feature values from each of the Tn input channels, together with the same number of weights, are input into the convolution calculation unit; after the multiply-accumulate calculation is completed, the window slides over the block input feature maps and the next group of feature values and weights is loaded for convolution calculation. This continues until the window has slid over the whole Tr × Tc block; after the calculation is finished, the convolution output feature maps are stored in the output feature map buffer of the reconfigurable computing IP core.
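The dataflow of this unit can be modelled functionally as below, a NumPy sketch of computing one Tr × Tc output block (the function conv_tile and its argument layout are illustrative, not from the patent):

import numpy as np

def conv_tile(in_tile, weights, stride=1):
    """Functional model of the convolution unit: in_tile has shape
    (Tn, H, W), already padded; weights has shape (Tm, Tn, C_k, C_k).
    Mirrors the sliding-window multiply-accumulate over one output block."""
    Tn, H, W = in_tile.shape
    Tm, _, C_k, _ = weights.shape
    Tr = (H - C_k) // stride + 1
    Tc = (W - C_k) // stride + 1
    out = np.zeros((Tm, Tr, Tc), dtype=in_tile.dtype)
    for r in range(Tr):      # slide the C_k x C_k window over the block
        for c in range(Tc):
            win = in_tile[:, r * stride:r * stride + C_k,
                             c * stride:c * stride + C_k]
            # one step of the PE array: Tm x (Tn * C_k * C_k) MACs in parallel
            out[:, r, c] = np.tensordot(weights, win,
                                        axes=([1, 2, 3], [0, 1, 2]))
    return out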
The pooling calculation module samples the feature maps output by the convolution. Before pooling starts, the module is configured according to the pooling layer configuration parameters, including the pooling window size and the output feature map parallelism. After configuration, the Tm output feature maps obtained from the convolution calculation are loaded into the pooling layer input feature map buffer. When the pooling operation starts, the P_k × P_k feature values at the same position of each of the Tm feature maps are input into a comparison unit, and the comparison unit stores the maximum of these P_k × P_k values as the feature value of the output feature map in the output feature map buffer of the reconfigurable computing IP core.
The fully connected calculation module is similar to the convolution calculation module in hardware structure and calculation process, and comprises an input feature map buffer, a weight parameter buffer, multipliers, and adders. Before the fully connected layer calculation starts, the module is configured according to the fully connected layer configuration parameters, including the calculation kernel size of the fully connected layer and the input and output feature map parallelism. After configuration, Tm × Tn × F_k × F_k weights are loaded into the weight parameter buffer in a data stream mode, and Tn input feature map blocks of size Tr × Tc are likewise loaded into the input feature map buffer; if the input feature map of the fully connected layer is smaller than Tr × Tc, the whole input feature map is loaded into the buffer. When the fully connected calculation starts, F_k × F_k input feature values from each of the Tn input channels are input into the fully connected calculation unit together with the same number of weights. After the Tm output feature maps are calculated, they are added to the corresponding Tm offsets and output to the output feature map buffer as the result of the fully connected calculation module.
The input/output feature map buffer module comprises input feature map ping-pong RAMs and output feature map ping-pong RAMs, as shown in fig. 2. Before calculation starts, the first group of input feature maps is stored in I_ram1. After calculation starts, the reconfigurable computing IP core reads the first group of input feature maps for calculation while the second group of input feature maps is stored into I_ram2; after the first group has been processed, the IP core reads the data of I_ram2 for calculation while the third group of input feature maps is stored into I_ram1. This reduces the waiting time for transmitting the input feature maps. Similarly, after the reconfigurable computing IP core finishes computing, the first group of output feature maps is stored into O_ram1; while the first group of data is being output, the second group of output feature maps is written to O_ram2, and so on, the two output feature map buffers alternating so as to reduce the waiting time for transmitting the output feature maps.
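A minimal software sketch of this ping-pong timeline follows; on the FPGA the load and compute engines are concurrent hardware, and here a background thread stands in for the DMA load (load_tile and compute are assumed callables):

from concurrent.futures import ThreadPoolExecutor

def pingpong_run(load_tile, compute, n_tiles):
    """Double buffering: while the compute unit works on the tile held in
    one RAM, the next tile is loaded into the other RAM."""
    bufs = [None, None]  # stand-ins for I_ram1 and I_ram2
    results = []
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_tile, 0)      # prefetch tile 0
        for i in range(n_tiles):
            bufs[i % 2] = pending.result()         # wait for tile i to arrive
            if i + 1 < n_tiles:                    # start loading tile i+1
                pending = loader.submit(load_tile, i + 1)
            results.append(compute(bufs[i % 2]))   # compute overlaps the load
    return results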
S3 Adaptive processing flow based on rule mapping
A specific structure is first created, as shown below:
[Code, provided as an image in the original publication: the definition of the structure maintained at the FPGA end.]
The information maintained by the structure includes hardware optimization parameters, convolutional layer configuration parameters, BN layer configuration parameters, pooling layer configuration parameters, and the names of the current layer and the lower layer. The FPGA end reads the weight file of the current layer, parses the layer configuration information from the weight file name, updates the structure, and stores the name of the next layer so that the information of the next layer to be executed can be found among the weight files, as sketched below.
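A host-side mirror of this structure and of the file-name parsing may clarify the mechanism; the field names and the helper parse_layer_name are illustrative assumptions, since the original structure definition appears only as an image:

from dataclasses import dataclass

@dataclass
class LayerDesc:
    """Host-side mirror of the structure maintained at the FPGA end."""
    name: str = ""
    next_names: tuple = ()
    is_conv: bool = False
    is_pool: bool = False
    kernel: int = 0
    stride: int = 0
    padding: int = 0

def parse_layer_name(filename):
    """Recover topology and configuration from a weight file name such as
    "conv1__pool1__3_1_0.bin": this layer, successor(s), kernel_stride_pad."""
    parts = filename.removesuffix(".bin").split("__")
    desc = LayerDesc(name=parts[0])
    if "_" in parts[-1]:  # a trailing configuration field is present
        desc.kernel, desc.stride, desc.padding = map(int, parts[-1].split("_"))
        desc.next_names = tuple(parts[1:-1])
    else:                 # BN/activation layers: no configuration suffix
        desc.next_names = tuple(parts[1:])
    desc.is_conv = desc.name.startswith("conv")
    desc.is_pool = desc.name.startswith("pool")
    return desc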
In the inference stage of the FPGA neural network, as shown in fig. 3, the weight file name of the current layer is first parsed at the FPGA end and the structure is updated. The weights and the input feature maps of the layer are then read from the off-chip memory; whether the layer is a convolutional layer is judged according to the maintained structure, and if so, the convolution calculation module in the template library based on the hardware optimization strategy is called; if not, whether it is a pooling layer is judged, and the pooling calculation module in the template library is called in the same way. After likewise judging whether an activation operation or a fully connected operation is involved, the data are written into the output buffer and then written back from the output buffer to the off-chip memory. Through this series of judgment and execution logic, the given neural network model topology is abstracted, via the maintained structure, into a directed acyclic graph of the network execution, thereby configuring the order of layer-by-layer serial execution.
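Combining the parser above with the judgment logic of fig. 3 gives the following sketch of the layer-by-layer execution loop (run_network and the module stubs stand in for the template library calls):

import os

def run_network(first_layer, weight_dir="weights"):
    """Walk the directed acyclic graph encoded in the weight file names,
    dispatching each layer to the matching template module."""
    name = first_layer
    while name is not None:
        fname = next((f for f in os.listdir(weight_dir)
                      if f.split("__")[0] == name), None)
        if fname is None:   # no weight file for this name: reached the output
            break
        desc = parse_layer_name(fname)  # update the maintained structure
        if desc.is_conv:
            pass  # call the convolution calculation module here
        elif desc.is_pool:
            pass  # call the pooling calculation module here
        # ... likewise for activation / fully connected layers, then write
        # the output buffer back to the off-chip memory
        name = desc.next_names[0] if desc.next_names else None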
In the adaptive processing flow based on rule mapping, the FPGA reads the generated weight file names, parses them to maintain the specific structure, abstracts the directed acyclic graph of the network within the judgment logic, and then calls each module in the template library based on the hardware optimization strategy; through this process the rapid deployment flow from a convolutional neural network model under the PyTorch framework to the FPGA is completed adaptively.
The rapid deployment method of a convolutional neural network based on the PyTorch framework on an FPGA provided by the present invention is described in detail above. A specific example is applied herein to explain the principle and implementation of the present invention, and the description of the above embodiment is only intended to help understand the method of the present invention and its core idea; meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A rapid deployment method of a convolutional neural network on an FPGA based on the PyTorch framework, characterized by comprising the following steps:
step one, establishing a model fast mapping mechanism: naming each layer of the convolutional neural network model topology under the PyTorch framework according to the input and output order of the upper and lower layers, decomposing and storing each layer of the neural network model file obtained after model training, and completing the establishment of the network topology of the model file under the PyTorch framework;
step two, constructing a reconfigurable computing unit, which comprises optimization strategy calculation under hardware resource constraints and establishment of a template library based on the hardware optimization strategy, and is used for generating the reconfigurable computing unit at the FPGA end;
and step three, in the adaptive processing flow based on rule mapping, analyzing the configuration information of each layer in the neural network model file, performing adaptive configuration of the FPGA control logic through the reconfigurable computing unit at the FPGA end, and finally generating the neural network accelerator.
2. The method according to claim 1, wherein the first step specifically comprises:
(1) constructing naming rules for each layer of the model in the PyTorch framework, and renaming each layer of the model according to the naming rules;
(2) training the renamed network model to obtain a neural network model file with a network topological structure;
(3) decomposing the neural network model file, storing each layer as a renamed binary file, and completing the establishment of the model fast mapping mechanism.
3. The method according to claim 2, wherein in the step (1), each layer in the convolutional neural network is named, and the naming rule is: name of this layer + name of the lower layer + configuration information.
4. The method according to claim 3, wherein the convolutional layer configuration information is: convolution kernel size _ step size _ zero padding; the pooling layer configuration information is: pooling window size _ step size _ zero padding; the BN layer and the activation layer need no configuration information.
5. The method according to claim 2, wherein in the step (3), in the neural network model file decomposition and storage stage, the trained neural network model is first propagated forward once; each time a layer of the neural network model is read, the parameters of that layer are stored in a binary file whose file name is the name of the corresponding layer.
6. The method according to claim 1, wherein the second step specifically comprises:
(1) performing optimization strategy calculation under hardware resource constraints, and selecting the optimization strategy adopted for FPGA deployment according to the current hardware resources;
(2) establishing a template library based on the hardware optimization strategy, and directly calling the corresponding template files in the template library when deploying on the FPGA.
7. The rapid deployment method of a convolutional neural network based on the PyTorch framework on an FPGA according to claim 6, wherein the hardware optimization strategy in step (1) includes setting the feature map block size, the input feature map parallelism, and the output feature map parallelism.
8. The rapid deployment method of a convolutional neural network based on the PyTorch framework on an FPGA according to claim 6, wherein the template library based on the hardware optimization strategy in step (2) mainly comprises a convolution module, a BN layer module, an activation layer module, a pooling module, a fully connected layer calculation module, and an input/output feature map buffer module.
9. The rapid deployment method of a convolutional neural network based on the PyTorch framework on an FPGA according to claim 1, wherein the third step specifically comprises: creating a structure, reading the weight file of the corresponding layer at the FPGA end, parsing the configuration information from the name of the weight file, updating the structure according to the configuration information, and storing the name of the next layer so that the information of the next layer to be executed can be found among the weight files.
10. The rapid deployment method of a convolutional neural network based on the PyTorch framework on an FPGA according to claim 9, wherein the information maintained by the structure includes hardware optimization parameters, convolutional layer configuration parameters, BN layer configuration parameters, pooling layer configuration parameters, and the names of the current layer and the lower layer.
CN201911084126.0A 2019-11-07 2019-11-07 PyTorch framework-based rapid deployment method of convolutional neural network on FPGA Active CN111104124B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911084126.0A CN111104124B (en) 2019-11-07 2019-11-07 PyTorch framework-based rapid deployment method of convolutional neural network on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911084126.0A CN111104124B (en) 2019-11-07 2019-11-07 PyTorch framework-based rapid deployment method of convolutional neural network on FPGA

Publications (2)

Publication Number Publication Date
CN111104124A true CN111104124A (en) 2020-05-05
CN111104124B CN111104124B (en) 2021-07-20

Family

ID=70420627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911084126.0A Active CN111104124B (en) PyTorch framework-based rapid deployment method of convolutional neural network on FPGA

Country Status (1)

Country Link
CN (1) CN111104124B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931913A (en) * 2020-08-10 2020-11-13 西安电子科技大学 Caffe-based deployment method of convolutional neural network on FPGA
CN112596718A (en) * 2020-12-24 2021-04-02 中国航空工业集团公司西安航空计算技术研究所 Hardware code generation and performance evaluation method
CN113222121A (en) * 2021-05-31 2021-08-06 杭州海康威视数字技术股份有限公司 Data processing method, device and equipment
CN113778459A (en) * 2021-09-08 2021-12-10 北京航空航天大学杭州创新研究院 Operator library design method for deploying optimization on FPGA and DSP
CN113780542A (en) * 2021-09-08 2021-12-10 北京航空航天大学杭州创新研究院 FPGA-oriented multi-target network structure construction method
CN114219083A (en) * 2021-04-26 2022-03-22 无锡江南计算技术研究所 Automatic deep learning model conversion method facing Caffe2 training based on ONNX
CN115018062A (en) * 2022-05-30 2022-09-06 南京航空航天大学 Convolutional neural network accelerator based on FPGA
WO2023015500A1 (en) * 2021-08-11 2023-02-16 Baidu.Com Times Technology (Beijing) Co., Ltd. Multiple-model heterogeneous computing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748913A (en) * 2017-11-09 2018-03-02 睿魔智能科技(东莞)有限公司 A kind of general miniaturization method of deep neural network
CN108416438A (en) * 2018-05-30 2018-08-17 济南浪潮高新科技投资发展有限公司 A kind of convolutional neural networks hardware module dispositions method
CN108805272A (en) * 2018-05-03 2018-11-13 东南大学 A kind of general convolutional neural networks accelerator based on FPGA
CN109460827A (en) * 2018-11-01 2019-03-12 郑州云海信息技术有限公司 A kind of deep learning environment is built and optimization method and system
WO2019126585A1 (en) * 2017-12-21 2019-06-27 Paypal, Inc Robust features generation architecture for fraud modeling

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748913A (en) * 2017-11-09 2018-03-02 睿魔智能科技(东莞)有限公司 A kind of general miniaturization method of deep neural network
WO2019126585A1 (en) * 2017-12-21 2019-06-27 Paypal, Inc Robust features generation architecture for fraud modeling
CN108805272A (en) * 2018-05-03 2018-11-13 东南大学 A kind of general convolutional neural networks accelerator based on FPGA
CN108416438A (en) * 2018-05-30 2018-08-17 济南浪潮高新科技投资发展有限公司 A kind of convolutional neural networks hardware module dispositions method
CN109460827A (en) * 2018-11-01 2019-03-12 郑州云海信息技术有限公司 A kind of deep learning environment is built and optimization method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YU Zijian et al., "FPGA-based Convolutional Neural Network Accelerator", Computer Engineering *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931913A (en) * 2020-08-10 2020-11-13 西安电子科技大学 Caffe-based deployment method of convolutional neural network on FPGA
CN111931913B (en) * 2020-08-10 2023-08-01 西安电子科技大学 Deployment method of convolutional neural network on FPGA (field programmable gate array) based on Caffe
CN112596718A (en) * 2020-12-24 2021-04-02 中国航空工业集团公司西安航空计算技术研究所 Hardware code generation and performance evaluation method
CN112596718B (en) * 2020-12-24 2023-04-14 中国航空工业集团公司西安航空计算技术研究所 Hardware code generation and performance evaluation method
CN114219083A (en) * 2021-04-26 2022-03-22 无锡江南计算技术研究所 Automatic deep learning model conversion method facing Caffe2 training based on ONNX
CN113222121A (en) * 2021-05-31 2021-08-06 杭州海康威视数字技术股份有限公司 Data processing method, device and equipment
CN113222121B (en) * 2021-05-31 2023-08-29 杭州海康威视数字技术股份有限公司 Data processing method, device and equipment
WO2023015500A1 (en) * 2021-08-11 2023-02-16 Baidu.Com Times Technology (Beijing) Co., Ltd. Multiple-model heterogeneous computing
CN113778459A (en) * 2021-09-08 2021-12-10 北京航空航天大学杭州创新研究院 Operator library design method for deploying optimization on FPGA and DSP
CN113780542A (en) * 2021-09-08 2021-12-10 北京航空航天大学杭州创新研究院 FPGA-oriented multi-target network structure construction method
CN113780542B (en) * 2021-09-08 2023-09-12 北京航空航天大学杭州创新研究院 Method for constructing multi-target network structure facing FPGA
CN115018062A (en) * 2022-05-30 2022-09-06 南京航空航天大学 Convolutional neural network accelerator based on FPGA

Also Published As

Publication number Publication date
CN111104124B (en) 2021-07-20

Similar Documents

Publication Publication Date Title
CN111104124B (en) PyTorch framework-based rapid deployment method of convolutional neural network on FPGA
US20180204110A1 (en) Compressed neural network system using sparse parameters and design method thereof
Pelikan et al. Estimation of distribution algorithms
CN115456160A (en) Data processing method and data processing equipment
Swiecicka et al. Multiprocessor scheduling and rescheduling with use of cellular automata and artificial immune system support
Dong et al. Multi-surrogate-based Differential Evolution with multi-start exploration (MDEME) for computationally expensive optimization
CN113033811A (en) Processing method and device of two-quantum-bit logic gate
EP3614312B1 (en) System, method and processing program for determining a calculation technique
CN114595580B (en) Complex workflow engine method meeting optimization design of large flexible blade
CN110796233A (en) Self-adaptive compression method of deep residual convolution neural network based on transfer learning
US20200226458A1 (en) Optimizing artificial neural network computations based on automatic determination of a batch size
US11461656B2 (en) Genetic programming for partial layers of a deep learning model
CN116401552A (en) Classification model training method and related device
Xiao et al. An efficient algorithm for dynamic shortest path tree update in network routing
Batyuk et al. Streaming process discovery method for semi-structured business processes
CN115599918B (en) Graph enhancement-based mutual learning text classification method and system
Di Puglia Pugliese et al. Dynamic programming for spanning tree problems: application to the multi-objective case
CN111914083A (en) Statement processing method, device and storage medium
CN116306424A (en) PISA architecture chip resource arrangement method based on dynamic amplification layer-by-layer optimization algorithm with adjustable level margin improvement
CN112214683B (en) Mixed recommendation model processing method, system and medium based on heterogeneous information network
WO2021238734A1 (en) Method for training neural network, and related device
CN111931913B (en) Deployment method of convolutional neural network on FPGA (field programmable gate array) based on Caffe
CN111709275B (en) Deep network construction method for Affordance reasoning
Gelle et al. Constraint satisfaction methods for applications in engineering
US7305373B1 (en) Incremental reduced error pruning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant