WO2022001126A1 - FPGA-based neural network operation method, apparatus, and device - Google Patents

FPGA-based neural network operation method, apparatus, and device

Info

Publication number
WO2022001126A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
fpga
sub-model
data
on-chip memory
Prior art date
Application number
PCT/CN2021/076835
Other languages
French (fr)
Chinese (zh)
Inventor
仝培霖
朱克峰
赵红博
Original Assignee
浪潮(北京)电子信息产业有限公司
Priority date
Filing date
Publication date
Application filed by 浪潮(北京)电子信息产业有限公司
Publication of WO2022001126A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7817Specially adapted for signal processing, e.g. Harvard architectures

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to an FPGA-based neural network operation method, apparatus, and device.
  • Because an FPGA (Field Programmable Gate Array) can perform data operations in parallel and its flexible structure enables pipelined data processing, FPGAs are currently often used to perform inference operations on trained neural network models.
  • The on-chip memory of an FPGA is a storage medium located on the FPGA chip that offers high data read/write efficiency, so when a neural network model is inferred on an FPGA, the on-chip memory is used to supply data to the computing logic. However, because the rated resources of FPGA on-chip memory are limited, the neural network model may not fit entirely into the on-chip memory of a single FPGA when inference operations are performed, making it difficult to ensure the overall efficiency of FPGA-based inference on the neural network model.
  • The purpose of the present application is to provide an FPGA-based neural network operation method that relatively ensures the overall efficiency of FPGA-based inference on a neural network model.
  • To this end, the present application provides an FPGA-based neural network operation method, comprising: acquiring a neural network model; counting the on-chip memory capacities corresponding to multiple FPGAs; splitting the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; allocating the sub-models to the on-chip memory of the corresponding FPGAs; and setting the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and controlling the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • Splitting the neural network model into sub-models with corresponding data volumes includes: counting the layer data volume of each network layer in the neural network model; calculating, in turn, the target network layers corresponding to each FPGA based on the layer data volume of each network layer and the on-chip memory capacity of each FPGA; and splitting the neural network model to obtain the sub-models corresponding to the target network layers.
  • the amount of layer data includes the amount of parameters and the amount of process data; wherein, the amount of process data is the amount of data generated when the corresponding network layer performs the neural network operation.
  • the parameter quantities and process data quantities of each network layer in the neural network model are counted, including:
  • the corresponding parameter amount is obtained based on the number of filters, the number of channels and the size of the convolution kernel in each network layer in the neural network model, and the corresponding process data amount is obtained based on the number of filters in each network layer and the size of the intermediate data.
  • Before counting the on-chip memory capacities corresponding to multiple FPGAs, the method further includes: determining whether the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA; and if it is, performing the step of counting the on-chip memory capacities corresponding to multiple FPGAs.
  • When the data type of the parameters in the neural network model is floating point, the method further includes converting that data type before splitting, and splitting the neural network model into sub-models with corresponding data volumes then means splitting the neural network model whose parameter data type has been converted.
  • Converting the data type of the parameters in the neural network model from floating point to fixed point includes: obtaining the maximum parameter value of each channel in the neural network model; calculating the quantization coefficient of each channel according to the corresponding maximum parameter value; and converting the data type of the parameters in the corresponding channel from floating point to fixed point based on each quantization coefficient.
  • The present application further provides an FPGA-based neural network operation apparatus, comprising:
  • a model acquisition module, configured to acquire a neural network model;
  • a memory statistics module, configured to count the on-chip memory capacities corresponding to multiple FPGAs;
  • a model splitting module, configured to split the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA;
  • a model allocation module, configured to allocate the sub-models to the on-chip memory of the corresponding FPGAs; and
  • a model execution module, configured to set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and to control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • The present application further provides an FPGA-based neural network operation device, including: a memory for storing a computer program; and a processor configured to implement the steps of the above FPGA-based neural network operation method when executing the computer program.
  • the present application also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned FPGA-based neural network computing method are implemented.
  • The FPGA-based neural network operation method provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memories of multiple FPGAs, the method avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
  • The present application also provides an FPGA-based neural network operation apparatus, device, and storage medium, whose beneficial effects are the same as described above.
  • FIG. 1 is a flowchart of an FPGA-based neural network computing method disclosed in an embodiment of the application
  • FIG. 2 is a flowchart of an FPGA-based neural network computing method disclosed in an embodiment of the application
  • FIG. 3 is a schematic structural diagram of an FPGA-based neural network computing device disclosed in an embodiment of the present application.
  • The on-chip memory of an FPGA is a storage medium located on the FPGA chip that offers high data read/write efficiency, so when a neural network model is inferred on an FPGA, the on-chip memory is used to supply data to the computing logic. However, because the rated resources of FPGA on-chip memory are limited, the neural network model may not fit entirely into the on-chip memory of a single FPGA when inference operations are performed, making it difficult to ensure the overall efficiency of FPGA-based inference on the neural network model.
  • The core of the present application is to provide an FPGA-based neural network operation method that relatively ensures the overall efficiency of FPGA-based inference on a neural network model.
  • an embodiment of the present application discloses an FPGA-based neural network computing method, including:
  • Step S10: Obtain a neural network model.
  • It should be noted that the neural network model obtained in this step is a neural network model used to perform inference operations; that is, it is a model that, after being trained through a convolutional neural network, is used to analyze data in real scenarios.
  • Step S11: Count the on-chip memory capacities corresponding to multiple FPGAs.
  • The focus of this step is to count the respective on-chip memory capacities of multiple FPGAs; that is, in this embodiment the number of FPGAs used to perform the neural network operations should be greater than 1, and on this basis each FPGA provides its own on-chip memory resources for cooperative computation on the neural network model.
  • The purpose of counting the on-chip memory capacities corresponding to multiple FPGAs in this step is to allocate, in subsequent steps, sub-models of corresponding data volumes to the on-chip memory of each FPGA according to its capacity.
  • Since there is no ordering dependency between the step of acquiring the neural network model and the step of counting the on-chip memory capacities corresponding to multiple FPGAs, the execution order of step S10 and step S11 in this embodiment is not fixed; they may also be executed simultaneously, which should be determined according to the actual situation and is not specifically limited here.
  • Step S12: Split the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA.
  • The data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA.
  • After counting the on-chip memory capacities corresponding to multiple FPGAs, this step splits the complete neural network model into sub-models that match the on-chip memory capacity of each FPGA, and the sub-models can be recombined into the complete neural network model.
  • It should be emphasized that the data volume of each sub-model is not greater than the on-chip memory capacity of its corresponding FPGA, that is, the data volume of the sub-model split out for an FPGA should be less than or equal to that FPGA's on-chip memory capacity, which ensures that the FPGA can properly execute the operations of its sub-model.
  • Step S13: Allocate the sub-models to the on-chip memory of the corresponding FPGAs.
  • After splitting the neural network model into sub-models with corresponding data volumes, this step allocates the sub-models to the on-chip memory of the corresponding FPGAs, so that in subsequent steps each FPGA executes the sub-model held in its own on-chip memory.
  • Step S14: Set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • After allocating the sub-models to the on-chip memory of the corresponding FPGAs, this step sets the data flow direction between the FPGAs according to the execution sequence of the sub-models and controls the FPGAs in turn to perform the neural network operations based on their sub-models in that order.
  • The purpose is to ensure that data flows between the FPGAs and that this data flow is consistent with the execution order of the sub-models in the FPGAs, so that executing the sub-models in sequence has the same effect as executing the complete neural network model, thereby ensuring the reliability of the FPGA-based neural network operation.
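  • To make this data-flow arrangement concrete, the following Python sketch chains the FPGAs in the sub-models' execution order so that each device's output becomes the next device's input; the FpgaDevice interface (load/run methods) is a purely hypothetical stand-in for whatever driver API a deployment actually uses.

```python
from typing import Sequence


class FpgaDevice:
    """Hypothetical wrapper around one FPGA board's driver API."""

    def load(self, sub_model) -> None:
        """Write the sub-model's data into this FPGA's on-chip memory."""
        raise NotImplementedError

    def run(self, inputs):
        """Execute the loaded sub-model on the FPGA and return its outputs."""
        raise NotImplementedError


def run_pipeline(devices: Sequence[FpgaDevice], sub_models, inputs):
    # Step S13: allocate each sub-model to its FPGA's on-chip memory up front.
    for device, sub_model in zip(devices, sub_models):
        device.load(sub_model)
    # Step S14: data flows through the FPGAs in the sub-models' execution order,
    # so running the chain is equivalent to running the complete model.
    data = inputs
    for device in devices:
        data = device.run(data)
    return data
```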
  • In summary, the FPGA-based neural network operation method provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memories of multiple FPGAs, the method avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
  • In a preferred implementation, before counting the on-chip memory capacities corresponding to multiple FPGAs, the method further includes: determining whether the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA; and if it is, performing the step of counting the on-chip memory capacities corresponding to multiple FPGAs.
  • The key point of this implementation is that, after the neural network model is acquired, it is further determined whether its data volume is greater than the on-chip memory capacity of one FPGA; only when it is greater is the step of counting the on-chip memory capacities corresponding to multiple FPGAs performed, while when the data volume of the neural network model is less than or equal to the on-chip memory capacity of one FPGA, it is only necessary to allocate the neural network model in its entirety to the on-chip memory of one FPGA.
  • This implementation further ensures the flexibility of implementing the FPGA-based neural network operation process.
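  • This pre-check amounts to a simple guard before the multi-FPGA path; a minimal sketch with illustrative names might look like this (model_volume and fpga_mems are assumed to be byte counts).

```python
def plan_deployment(model_volume: int, fpga_mems: list[int]) -> dict:
    """Choose between single-FPGA and multi-FPGA deployment of the model."""
    if model_volume <= fpga_mems[0]:
        # The whole model fits: allocate it to a single FPGA's on-chip memory.
        return {"mode": "single", "device": 0}
    # Otherwise count the capacities of multiple FPGAs and split the model.
    return {"mode": "multi", "capacities": list(fpga_mems)}
```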
  • an embodiment of the present application discloses an FPGA-based neural network computing method, including:
  • Step S20: Obtain a neural network model.
  • Step S21: Count the on-chip memory capacities corresponding to multiple FPGAs.
  • Step S22: Count the layer data volume of each network layer in the neural network model.
  • It should be noted that the focus of this embodiment is to split the neural network model into sub-models in units of the network layers of the model.
  • In this step, the layer data volume of each network layer in the neural network model is counted first; the purpose is to allocate, in subsequent steps, a sub-model composed of a corresponding number of network layers to each FPGA according to its on-chip memory capacity.
  • Step S23: Calculate, in turn, the target network layers corresponding to each FPGA based on the layer data volume of each network layer and the on-chip memory capacity of each FPGA.
  • After counting the layer data volume of each network layer, this step calculates in turn the target network layers corresponding to each FPGA based on the data volume of each network layer, that is, the layer data volume, and the on-chip memory capacity of each FPGA.
  • The number of target network layers corresponding to each FPGA is determined by the on-chip memory capacity of that FPGA.
  • Step S24: Split the neural network model to obtain the sub-models corresponding to the target network layers.
  • The data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA.
  • After calculating the target network layers corresponding to each FPGA in turn, this step splits the neural network model into sub-models consisting of the target network layers.
  • As a preferred implementation, when the number of target network layers corresponding to an FPGA is greater than 1 and the target network layers are adjacent, splitting the neural network model to obtain the sub-models corresponding to the target network layers may specifically be splitting out sub-models that each contain all of their target network layers. This relatively reduces the number of sub-models, and thus the number of sub-model invocations when each FPGA performs the neural network operations based on its corresponding sub-model, improving the efficiency of executing the neural network operations.
  • Step S25: Allocate the sub-models to the on-chip memory of the corresponding FPGAs.
  • Step S26: Set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • This embodiment further ensures the network integrity of the neural network model segments included in the sub-model, and relatively improves the reliability of controlling each FPGA to perform neural network operations based on the corresponding sub-model.
  • For example, before performing neural network operations on FPGAs, the number of available FPGA devices and the on-chip memory capacity of each device need to be obtained to facilitate model splitting. Suppose there are N (N>1) FPGA devices available with memory sizes M[1]...M[N], the neural network model has len layers, the parameters of each layer are W[1]...W[len], and the intermediate data volume of each layer is A[1]...A[len]. The layer data volumes can then be accumulated and compared in a loop: when accumulating up to layer i makes the data volume exceed the memory of the first FPGA device, the sub-model containing layers 1 to i-1 is assigned to the first FPGA device; the loop then accumulates and compares again starting from layer i, until at layer j the accumulated volume exceeds the memory of the second FPGA device, and the sub-model containing layers i to j-1 is assigned to the second FPGA device; and so on, until all parameters have been assigned to FPGA devices.
  • If there are not enough FPGA devices, the sub-models are assigned to the devices cyclically and are first placed in off-chip memory; when a device finishes its current computation, the sub-model to be executed next is read into that device's on-chip memory, and in this way the FPGAs are controlled in turn to perform the neural network operations based on their corresponding sub-models according to the execution sequence.
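  • The cyclic accumulation described above can be sketched in Python as follows; the function names and the greedy packing of consecutive layers are illustrative assumptions based on the description, not code from the application.

```python
def split_layers(layer_sizes: list[int], fpga_mems: list[int]):
    """Greedily pack consecutive layers into FPGA devices.

    layer_sizes[i] corresponds to W[i] + A[i], the parameter plus intermediate
    data volume of layer i; fpga_mems[k] corresponds to M[k], the on-chip
    memory capacity of FPGA device k.  Returns (device_index, first_layer,
    last_layer) triples; device indices wrap around when devices run out, in
    which case later sub-models would start out in off-chip memory.
    """
    assignments = []
    device, start, acc = 0, 0, 0
    for i, size in enumerate(layer_sizes):
        if acc + size > fpga_mems[device % len(fpga_mems)] and i > start:
            # Layers start..i-1 form the sub-model for the current device.
            assignments.append((device % len(fpga_mems), start, i - 1))
            device, start, acc = device + 1, i, 0
        if size > fpga_mems[device % len(fpga_mems)]:
            raise ValueError(f"layer {i} alone exceeds the device's on-chip memory")
        acc += size
    assignments.append((device % len(fpga_mems), start, len(layer_sizes) - 1))
    return assignments
```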
  • the layer data volume includes parameter volume and process data volume; wherein, the process data volume is the data volume of data generated during the neural network operation performed by the corresponding network layer.
  • It should be noted that the key point of this implementation is that the layer data volume of each network layer in the neural network model further includes the parameter volume and the process data volume of that layer, where the parameter volume refers to the data volume of the parameters used by the network layer when performing the neural network operation, and the process data volume refers to the data volume of the data generated while the network layer performs the neural network operation.
  • the parameter amount and the process data amount of each network layer in the neural network model are counted, including:
  • the corresponding parameter amount is obtained based on the number of filters, the number of channels and the size of the convolution kernel in each network layer in the neural network model, and the corresponding process data amount is obtained based on the number of filters in each network layer and the size of the intermediate data.
  • That is, the parameter volume of each network layer in the neural network model is obtained statistically from the number of filters, the number of channels, and the convolution kernel size of the corresponding network layer, and the process data volume of each network layer is obtained statistically from the number of filters and the size of the intermediate data of the corresponding network layer.
  • This embodiment further ensures the overall accuracy of the process of calculating the target network layer corresponding to each FPGA, thereby ensuring the reliability of the FPGA-based neural network operation.
  • For example, if the number of channels of a certain network layer in the neural network model is C, the number of filters is N, and the length and width of the convolution kernel are both K (that is, the convolution kernel size is K*K), then the parameter amount of that network layer is C*K*K*N; and if the length and width of the intermediate data are W and H respectively (that is, the intermediate data size is W*H), then the process data amount of that network layer is N*W*H.
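  • Under this accounting, a small helper can turn the per-layer shape information into the W[i] and A[i] figures used by the splitting sketch above; the bytes_per_element argument is an assumption (e.g. 4 for float32 parameters, 1 after int8 quantization), since the text does not fix a data width.

```python
def layer_volumes(channels: int, filters: int, kernel: int,
                  out_w: int, out_h: int, bytes_per_element: int = 4):
    """Return (parameter volume, process data volume) of a conv layer in bytes.

    The parameter count is C*K*K*N and the intermediate (process) data count
    is N*W*H, following the example in the text.
    """
    param_volume = channels * kernel * kernel * filters * bytes_per_element
    process_volume = filters * out_w * out_h * bytes_per_element
    return param_volume, process_volume
```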
  • Preferably, when the data type of the parameters in the neural network model is floating point, before splitting the neural network model into sub-models with corresponding data volumes, the method further includes: converting the data type of the parameters in the neural network model from floating point to fixed point; splitting the neural network model into sub-models with corresponding data volumes then means splitting the model whose parameter data type has been converted.
  • The focus of this implementation is that, when the data type of the parameters in the neural network model is floating point, the data type of the parameters is converted from floating point to fixed point before the model is split into sub-models, which relatively reduces the overall data volume of the neural network model; this in turn relatively reduces the overall data volume of the FPGA-based neural network operations and relatively improves the overall efficiency of the neural network operation.
  • Converting the data type of the parameters in the neural network model from floating point to fixed point includes: obtaining the maximum parameter value of each channel in the neural network model; calculating the quantization coefficient of each channel according to the corresponding maximum parameter value; and converting the data type of the parameters in the corresponding channel from floating point to fixed point based on each quantization coefficient.
  • In this implementation, the maximum parameter value of each channel in the neural network model is obtained first, and the quantization coefficient of each channel is then calculated from that maximum value.
  • Each quantization coefficient is used as the conversion weight between data types; that is, according to each quantization coefficient, the data type of the parameters in the corresponding channel of the neural network model is converted from floating point to fixed point.
  • In this way, the maximum parameter value of each channel serves as the limiting factor of the data bit range of the fixed-point data when the parameters of that channel are converted from floating point to fixed point, which further ensures the accuracy of converting the data type of the parameters in the neural network model from floating point to fixed point.
  • In terms of data, the neural network model can be divided into two parts: one part is the parameters of the neural network model, referred to here as weights; the other part is the intermediate data processed inside the neural network model, referred to here as activation values.
  • Quantizing the neural network model means converting the model data from floating point to fixed point. First, the weights need to be quantized to 2^n, and then the activation values are quantized.
  • Quantizing the activation values also requires determining their threshold range; since the thresholds differ between tasks, the threshold range must be obtained in combination with the dataset before the activation values are quantized.
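  • A minimal per-channel weight quantization sketch consistent with this description is shown below. The int8 target width and the symmetric scheme are assumptions, and "quantized to 2^n" is interpreted here as rounding each channel's quantization coefficient up to a power of two; the application itself only states that per-channel coefficients are derived from each channel's maximum parameter value.

```python
import numpy as np


def quantize_weights_per_channel(weights: np.ndarray, bits: int = 8):
    """Convert float weights (shape: filters x channels x K x K) to fixed point.

    For each output channel, a quantization coefficient is derived from the
    channel's maximum absolute parameter value and rounded up to a power of
    two, then the weights are mapped to signed integers of the given width.
    """
    q_max = 2 ** (bits - 1) - 1                        # e.g. 127 for int8
    flat = weights.reshape(weights.shape[0], -1)
    max_abs = np.abs(flat).max(axis=1)                 # per-channel maximum value
    # Smallest power of two 2**n such that max_abs / 2**n <= q_max.
    exponents = np.ceil(np.log2(np.maximum(max_abs, 1e-12) / q_max))
    scales = 2.0 ** exponents                          # one coefficient per channel
    quantized = np.clip(np.round(flat / scales[:, None]), -q_max - 1, q_max)
    return quantized.astype(np.int8).reshape(weights.shape), scales
```

  • Activation values would be quantized afterwards, using threshold ranges determined on the task's dataset as noted above; that step is not sketched here.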
  • An embodiment of the present application provides an FPGA-based neural network operation apparatus, including:
  • a model acquisition module 10, configured to acquire a neural network model;
  • a memory statistics module 11, configured to count the on-chip memory capacities corresponding to multiple FPGAs;
  • a model splitting module 12, configured to split the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA;
  • a model allocation module 13, configured to allocate the sub-models to the on-chip memory of the corresponding FPGAs; and
  • a model execution module 14, configured to set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and to control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
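  • Mirroring this module decomposition, a thin orchestration skeleton could wire the five modules into one flow; the class and method names below are illustrative assumptions, not part of the application.

```python
class NeuralNetworkRunner:
    """Skeleton wiring the five modules listed above into one flow."""

    def __init__(self, acquire, count_memory, split, allocate, execute):
        self.acquire = acquire            # model acquisition module 10
        self.count_memory = count_memory  # memory statistics module 11
        self.split = split                # model splitting module 12
        self.allocate = allocate          # model allocation module 13
        self.execute = execute            # model execution module 14

    def run(self, inputs):
        model = self.acquire()
        capacities = self.count_memory()
        sub_models = self.split(model, capacities)
        devices = self.allocate(sub_models)
        return self.execute(devices, sub_models, inputs)
```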
  • The FPGA-based neural network operation apparatus provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • By having multiple FPGAs jointly provide on-chip memory resources, the apparatus avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
  • An embodiment of the present application also provides an FPGA-based neural network operation device, including: a memory for storing a computer program; and a processor configured to implement the steps of the above FPGA-based neural network operation method when executing the computer program.
  • The FPGA-based neural network operation device provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • By having multiple FPGAs jointly provide on-chip memory resources, the device avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
  • embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned FPGA-based neural network computing method are implemented.
  • The computer-readable storage medium provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • By having multiple FPGAs jointly provide on-chip memory resources, the computer-readable storage medium avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
  • The FPGA-based neural network operation method, apparatus, device, and storage medium provided by the present application have been described above in detail.
  • the various embodiments in the specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments, and the same and similar parts between the various embodiments can be referred to each other.
  • For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively brief, and reference may be made to the description of the method for relevant details. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can also be made to the present application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Logic Circuits (AREA)

Abstract

An FPGA-based neural network operation method, apparatus, and device, and a storage medium. The method comprises: obtaining a neural network model (S10); counting on-chip memory capacities corresponding to a plurality of FPGAs (S11); according to the on-chip memory capacity of each FPGA, splitting the neural network model into sub-models having corresponding data volumes (S12), wherein the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; distributing the sub-models to on-chip memories of the corresponding FPGAs (S13); and setting a data flow direction among the corresponding FPGAs according to an execution sequence among the sub-models, and sequentially controlling the FPGAs to execute neural network operation on the basis of the corresponding sub-models according to the execution sequence (S14). The method ensures the overall efficiency of executing the reasoning operation of the neural network model on the basis of the FPGAs.

Description

FPGA-based neural network operation method, apparatus, and device
This application claims priority to the Chinese patent application filed with the China Patent Office on June 30, 2020, with application number CN202010614610.6 and entitled "FPGA-based neural network operation method, apparatus, and device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to an FPGA-based neural network operation method, apparatus, and device.
Background
Because an FPGA (Field Programmable Gate Array) can perform data operations in parallel and its flexible structure enables pipelined data processing, FPGAs are currently often used to perform inference operations on trained neural network models.
The on-chip memory of an FPGA is a storage medium located on the FPGA chip that offers high data read/write efficiency, so when a neural network model is inferred on an FPGA, the on-chip memory is used to supply data to the computing logic. However, because the rated resources of FPGA on-chip memory are limited, the neural network model may not fit entirely into the on-chip memory of a single FPGA when inference operations are performed, making it difficult to ensure the overall efficiency of FPGA-based inference on the neural network model.
It can therefore be seen that providing an FPGA-based neural network operation method that relatively ensures the overall efficiency of FPGA-based inference on a neural network model is a problem to be solved by those skilled in the art.
Summary of the Invention
The purpose of the present application is to provide an FPGA-based neural network operation method that relatively ensures the overall efficiency of FPGA-based inference on a neural network model.
To solve the above technical problem, the present application provides an FPGA-based neural network operation method, comprising:
acquiring a neural network model;
counting the on-chip memory capacities corresponding to multiple FPGAs;
splitting the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, wherein the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA;
allocating the sub-models to the on-chip memory of the corresponding FPGAs; and
setting the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and controlling the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
Preferably, splitting the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA includes:
counting the layer data volume of each network layer in the neural network model;
calculating, in turn, the target network layers corresponding to each FPGA based on the layer data volume of each network layer and the on-chip memory capacity of each FPGA; and
splitting the neural network model to obtain the sub-models corresponding to the target network layers.
Preferably, the layer data volume includes a parameter volume and a process data volume, where the process data volume is the volume of data generated while the corresponding network layer performs the neural network operation.
Preferably, counting the parameter volume and the process data volume of each network layer in the neural network model includes:
obtaining the parameter volume of each network layer from the number of filters, the number of channels, and the convolution kernel size of that layer, and obtaining the process data volume from the number of filters and the size of the intermediate data of that layer.
Preferably, before counting the on-chip memory capacities corresponding to multiple FPGAs, the method further includes:
determining whether the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA; and
if the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA, performing the step of counting the on-chip memory capacities corresponding to multiple FPGAs.
Preferably, when the data type of the parameters in the neural network model is floating point, before splitting the neural network model into sub-models with corresponding data volumes, the method further includes:
converting the data type of the parameters in the neural network model from floating point to fixed point;
and splitting the neural network model into sub-models with corresponding data volumes includes:
splitting the neural network model whose parameter data type has been converted into sub-models with corresponding data volumes.
Preferably, converting the data type of the parameters in the neural network model from floating point to fixed point includes:
obtaining the maximum parameter value of each channel in the neural network model;
calculating the quantization coefficient of each channel according to the corresponding maximum parameter value; and
converting the data type of the parameters in the corresponding channel of the neural network model from floating point to fixed point based on each quantization coefficient.
In addition, the present application also provides an FPGA-based neural network operation apparatus, comprising:
a model acquisition module, configured to acquire a neural network model;
a memory statistics module, configured to count the on-chip memory capacities corresponding to multiple FPGAs;
a model splitting module, configured to split the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, wherein the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA;
a model allocation module, configured to allocate the sub-models to the on-chip memory of the corresponding FPGAs; and
a model execution module, configured to set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and to control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
In addition, the present application also provides an FPGA-based neural network operation device, comprising:
a memory for storing a computer program; and
a processor configured to implement the steps of the above FPGA-based neural network operation method when executing the computer program.
In addition, the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above FPGA-based neural network operation method are implemented.
The FPGA-based neural network operation method provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence. By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memories of multiple FPGAs, the method avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model. In addition, the present application also provides an FPGA-based neural network operation apparatus, device, and storage medium, whose beneficial effects are the same as described above.
Brief Description of the Drawings
FIG. 1 is a flowchart of an FPGA-based neural network operation method disclosed in an embodiment of the present application;
FIG. 2 is a flowchart of an FPGA-based neural network operation method disclosed in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an FPGA-based neural network operation apparatus disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.
The on-chip memory of an FPGA is a storage medium located on the FPGA chip that offers high data read/write efficiency, so when a neural network model is inferred on an FPGA, the on-chip memory is used to supply data to the computing logic. However, because the rated resources of FPGA on-chip memory are limited, the neural network model may not fit entirely into the on-chip memory of a single FPGA when inference operations are performed, making it difficult to ensure the overall efficiency of FPGA-based inference on the neural network model.
For this reason, the core of the present application is to provide an FPGA-based neural network operation method that relatively ensures the overall efficiency of FPGA-based inference on a neural network model.
To enable those skilled in the art to better understand the solution of the present application, the present application is further described in detail below with reference to the drawings and specific embodiments.
Referring to FIG. 1, an embodiment of the present application discloses an FPGA-based neural network operation method, comprising:
Step S10: Obtain a neural network model.
It should be noted that the neural network model obtained in this step is a neural network model used to perform inference operations; that is, it is a model that, after being trained through a convolutional neural network, is used to analyze data in real scenarios.
Step S11: Count the on-chip memory capacities corresponding to multiple FPGAs.
It should be noted that the focus of this step is to count the respective on-chip memory capacities of multiple FPGAs; that is, in this embodiment the number of FPGAs used to perform the neural network operations should be greater than 1, and on this basis each FPGA provides its own on-chip memory resources for cooperative computation on the neural network model. The purpose of counting the on-chip memory capacities corresponding to multiple FPGAs in this step is to allocate, in subsequent steps, sub-models of corresponding data volumes to the on-chip memory of each FPGA according to its capacity.
In addition, since there is no ordering dependency between the step of acquiring the neural network model and the step of counting the on-chip memory capacities corresponding to multiple FPGAs, the execution order of step S10 and step S11 in this embodiment is not fixed; they may also be executed simultaneously, which should be determined according to the actual situation and is not specifically limited here.
Step S12: Split the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA.
The data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA.
After counting the on-chip memory capacities corresponding to multiple FPGAs, this step splits the complete neural network model into sub-models that match the on-chip memory capacity of each FPGA, and the sub-models can be recombined into the complete neural network model. It should be emphasized that the data volume of each sub-model is not greater than the on-chip memory capacity of its corresponding FPGA, that is, the data volume of the sub-model split out for an FPGA should be less than or equal to that FPGA's on-chip memory capacity, which ensures that the FPGA can properly execute the operations of its sub-model.
Step S13: Allocate the sub-models to the on-chip memory of the corresponding FPGAs.
After splitting the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, this step allocates the sub-models to the on-chip memory of the corresponding FPGAs, so that in subsequent steps each FPGA executes the sub-model held in its own on-chip memory.
Step S14: Set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
After allocating the sub-models to the on-chip memory of the corresponding FPGAs, this step sets the data flow direction between the FPGAs according to the execution sequence of the sub-models and controls the FPGAs in turn to perform the neural network operations based on their sub-models in that order. The purpose is to ensure that data flows between the FPGAs and that this data flow is consistent with the execution order of the sub-models in the FPGAs, so that executing the sub-models in sequence has the same effect as executing the complete neural network model, thereby ensuring the reliability of the FPGA-based neural network operation.
The FPGA-based neural network operation method provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence. By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memories of multiple FPGAs, the method avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
On the basis of the above embodiment, as a preferred implementation, before counting the on-chip memory capacities corresponding to multiple FPGAs, the method further includes:
determining whether the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA; and
if the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA, performing the step of counting the on-chip memory capacities corresponding to multiple FPGAs.
The key point of this implementation is that, after the neural network model is acquired, it is further determined whether its data volume is greater than the on-chip memory capacity of one FPGA; only when it is greater is the step of counting the on-chip memory capacities corresponding to multiple FPGAs performed, while when the data volume of the neural network model is less than or equal to the on-chip memory capacity of one FPGA, it is only necessary to allocate the neural network model in its entirety to the on-chip memory of one FPGA. This implementation further ensures the flexibility of implementing the FPGA-based neural network operation process.
请参见图2所示,本申请实施例公开了一种基于FPGA的神经网络运算方法,包括:Referring to FIG. 2 , an embodiment of the present application discloses an FPGA-based neural network computing method, including:
步骤S20:获取神经网络模型。Step S20: Obtain a neural network model.
步骤S21:统计多个FPGA对应的片上内存容量。Step S21: Count on-chip memory capacities corresponding to multiple FPGAs.
步骤S22:统计神经网络模型中各网络层的层数据量。Step S22: Count the layer data amount of each network layer in the neural network model.
需要说明的是,本实施例的重点是以神经网络模型中的网络层为单位对神经网络模型进行子模型的拆分。It should be noted that the focus of this embodiment is to divide the neural network model into sub-models in units of network layers in the neural network model.
本步骤中,首先统计神经网络模型中各网络层的层数据量,目的是在后续步骤中,根据FPGA片上内存容量向FPGA划分由相应数量的网络层构成的子模型。In this step, the layer data amount of each network layer in the neural network model is counted first, and the purpose is to divide the sub-model composed of a corresponding number of network layers to the FPGA according to the FPGA on-chip memory capacity in the subsequent steps.
步骤S23:基于各网络层的层数据量以及各FPGA的片上内存容量,依次计算各FPGA对应的目标网络层。Step S23: Based on the layer data amount of each network layer and the on-chip memory capacity of each FPGA, sequentially calculate the target network layer corresponding to each FPGA.
在统计神经网络模型中各网络层的层数据量之后,本步骤进一步基于 各网络层中数据的数据量,即层数据量以及各FPGA的片上内存容量,依次计算各FPGA对应的目标网络层,此处各FPGA对应的目标网络层的数量根据相应FPGA的片上内存容量而定。After calculating the layer data volume of each network layer in the neural network model, this step further calculates the target network layer corresponding to each FPGA in turn based on the data volume of each network layer, that is, the layer data volume and the on-chip memory capacity of each FPGA. Here, the number of target network layers corresponding to each FPGA is determined according to the on-chip memory capacity of the corresponding FPGA.
步骤S24:在神经网络模型中拆分得到各目标网络层对应的子模型。Step S24: Split the neural network model to obtain sub-models corresponding to each target network layer.
其中,各子模型的数据量不大于所对应的FPGA的片上内存容量。Wherein, the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA.
需要说明的是,在依次计算各FPGA对应的目标网络层之后,本步骤进一步在神经网络模型中划分具有目标网络层的子模型。It should be noted that, after calculating the target network layers corresponding to each FPGA in turn, this step further divides the neural network model into sub-models with target network layers.
作为一种优选的实施方式,当FPGA对应的目标网络层的数量大于1并且目标网络层相邻时,在神经网络模型中拆分得到各目标网络层对应的子模型,可以具体是,在神经网络模型中拆分得到均包含各目标网络层的子模型,以此相对减少子模型的数量,进而在控制各FPGA基于相应的子模型执行神经网络运算时对子模型的调用次数,提高执行神经网络运算的效率。As a preferred embodiment, when the number of target network layers corresponding to the FPGA is greater than 1 and the target network layers are adjacent, split the neural network model to obtain sub-models corresponding to each target network layer. The network model is split to obtain sub-models that include each target network layer, so as to relatively reduce the number of sub-models, and then control the number of calls to the sub-models when each FPGA performs neural network operations based on the corresponding sub-models, improving the execution neural network. The efficiency of network operations.
Step S25: Allocate the sub-models to the on-chip memory of the corresponding FPGAs.

Step S26: Set the data flow directions between the corresponding FPGAs according to the execution order of the sub-models, and control each FPGA in turn, according to the execution order, to perform neural network operations based on its sub-model.

This embodiment further ensures the network integrity of the neural network model segments contained in the sub-models, and relatively improves the reliability of controlling each FPGA to perform neural network operations based on its sub-model.
The present implementation is described below through a specific scenario.
Before neural network operations are performed on the FPGAs, the number of available FPGA devices and the on-chip memory capacity of each FPGA device need to be obtained to facilitate model splitting. Suppose N (N > 1) FPGA devices are available with memory sizes M[1]…M[N], the neural network model has len layers, the parameters of each layer are W[1]…W[len], and the intermediate data amount of each layer is A[1]…A[len]. The layers can then be compared by cumulative accumulation: when, upon accumulating to the i-th layer, the accumulated data amount exceeds the memory of the first FPGA device, the sub-model containing layers 1 to i-1 is assigned to the first FPGA device; the cumulative comparison is then restarted from layer i until the accumulation at layer j exceeds the memory of the second FPGA device, at which point the sub-model containing layers i to j-1 is assigned to the second FPGA device; and so on, until all parameters have been allocated to FPGA devices. If the FPGA devices are insufficient, sub-models are allocated to the devices cyclically in rounds and are first placed in off-chip memory; once a device has finished its current computation, the sub-model whose operations are to be executed next is read into that device's on-chip memory. In this way, each FPGA is controlled in turn, according to the execution order, to perform neural network operations based on its sub-model.
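As a non-limiting illustration of the cumulative-accumulation comparison described above, the following Python sketch greedily assigns consecutive layers to devices. The function name split_layers_to_fpgas, the treatment of each layer's footprint as W[k] + A[k], and the cyclic reuse of devices are assumptions introduced for illustration only, not the disclosed implementation.

```python
def split_layers_to_fpgas(M, W, A):
    """Greedy split: M[d] is the on-chip memory of device d,
    W[k] and A[k] are the parameter and intermediate-data amounts of layer k.
    Returns a list of (start_layer, end_layer) index ranges, one per sub-model.
    Assumes every single layer fits into at least one device's memory."""
    assert len(W) == len(A)
    ranges, start, acc, dev = [], 0, 0, 0
    for k in range(len(W)):
        layer_cost = W[k] + A[k]           # assumed per-layer memory footprint
        if acc + layer_cost > M[dev % len(M)]:
            ranges.append((start, k - 1))  # layers start..k-1 form one sub-model
            start, acc = k, 0
            dev += 1                       # next device (cyclically if devices run out)
        acc += layer_cost
    ranges.append((start, len(W) - 1))     # last sub-model
    return ranges

# Example: 3 devices, 6 layers
print(split_layers_to_fpgas([100, 80, 120],
                            [30, 40, 20, 50, 10, 60],
                            [10, 10, 10, 10, 10, 10]))
# -> [(0, 1), (2, 2), (3, 4), (5, 5)]; the fourth sub-model would be cycled back to device 1
```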
Based on the above embodiment, as a preferred implementation, the layer data amount includes a parameter amount and a process data amount, where the process data amount is the amount of data generated while the corresponding network layer performs neural network operations.

It should be noted that the focus of this embodiment is that the layer data amount of each network layer in the neural network model further comprises the parameter amount and the process data amount of that layer. The parameter amount refers to the data amount of the parameters used by the network layer when performing neural network operations, and the process data amount refers to the data amount of the data generated while the network layer performs those operations. By further refining the layer data amount, this implementation further ensures the overall accuracy of sequentially calculating the target network layers corresponding to each FPGA based on the layer data amount of each network layer and the on-chip memory capacity of each FPGA, thereby ensuring the reliability of the FPGA-based neural network operations.
Further, as a preferred implementation, counting the parameter amount and the process data amount of each network layer in the neural network model includes:

obtaining the parameter amount of each network layer based on the number of filters, the number of channels, and the convolution kernel size of that layer, and obtaining the process data amount based on the number of filters of that layer and the size of its intermediate data.

It should be noted that, in this implementation, the parameter amount of each network layer in the neural network model is calculated from the number of filters, the number of channels, and the convolution kernel size of that layer, and the process data amount of each layer is calculated from the number of filters of that layer and the size of its intermediate data. This implementation further ensures the overall accuracy of calculating the target network layers corresponding to each FPGA, thereby ensuring the reliability of the FPGA-based neural network operations.
On the basis of the technical solution of this implementation, specifically, when a network layer in the neural network model has C channels, N filters, and a convolution kernel whose length and width are both K (i.e., the kernel size is K*K), the parameter amount of that layer is C*K*K*N; when a network layer has N filters and its intermediate data has length W and width H (i.e., the intermediate data size is W*H), the process data amount of that layer is N*W*H.
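A minimal sketch of these two formulas follows; the helper names are hypothetical, and the results are element counts (multiplying by the element size in bytes would give the memory footprint, which is an assumption not stated above).

```python
def layer_param_amount(C, K, N):
    # parameter amount of a convolution layer: C * K * K * N
    return C * K * K * N

def layer_process_amount(N, W, H):
    # process data amount: N filters each producing a W*H intermediate map
    return N * W * H

# e.g. a layer with 64 channels, 3x3 kernels and 128 filters on a 56x56 feature map
print(layer_param_amount(64, 3, 128))     # 73728 parameters
print(layer_process_amount(128, 56, 56))  # 401408 intermediate values
```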
Based on the above series of implementations, as a preferred implementation, when the data type of the parameters in the neural network model is a floating-point type, before the neural network model is split into sub-models with corresponding data amounts, the method further includes:

converting the data type of the parameters in the neural network model from the floating-point type to a fixed-point type;

and splitting the neural network model into sub-models with corresponding data amounts includes:

splitting the neural network model whose parameter data types have been converted into sub-models with corresponding data amounts.
It should be noted that the focus of this embodiment is that, when the data type of the parameters in the neural network model is a floating-point type, the parameter data type is converted from floating point to fixed point before the neural network model is split into sub-models with corresponding data amounts. This relatively reduces the overall data volume of the neural network model, which in turn relatively reduces the overall data volume of the FPGA-based neural network operations and relatively improves their overall efficiency.
Further, converting the data type of the parameters in the neural network model from the floating-point type to the fixed-point type includes:

obtaining the maximum parameter value of each channel in the neural network model;

calculating the quantization coefficient of each channel according to its maximum parameter value; and

converting the data type of the parameters in the corresponding channel of the neural network model from the floating-point type to the fixed-point type based on each quantization coefficient.
It should be noted that, when converting the data type of the parameters in the neural network model from floating point to fixed point, this implementation calculates the quantization coefficient of each channel based on the maximum parameter value of that channel, and then uses each quantization coefficient as the conversion weight between the data types; that is, the data type of the parameters in the corresponding channel is converted from floating point to fixed point according to its quantization coefficient. By using the maximum parameter value of each channel as the factor that bounds the bit-width range of the fixed-point data when converting the parameters of that channel, this implementation further ensures the accuracy of the floating-point-to-fixed-point conversion.
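One possible reading of this per-channel conversion is sketched below, assuming (as an illustration only) a power-of-two quantization coefficient 2^m per channel and an 8-bit signed fixed-point target; the function names, the rounding scheme, and the choice of n_bits are all assumptions rather than the disclosed implementation.

```python
import numpy as np

def channel_quant_coeff(channel_params, n_bits=8):
    """Choose the largest m such that max|param| * 2**(m+1) stays below 2**(n_bits-1)."""
    max_val = np.max(np.abs(channel_params))
    m = 0
    while max_val * 2 ** (m + 1) < 2 ** (n_bits - 1):
        m += 1
    return m

def quantize_channel(channel_params, m):
    """Convert floating-point parameters to fixed-point integers using coefficient 2**m."""
    return np.round(np.asarray(channel_params) * 2 ** m).astype(np.int32)

weights = [0.73, -0.12, 0.05]      # one channel's floating-point parameters
m = channel_quant_coeff(weights)   # m = 7 for this channel under an 8-bit signed target
print(m, quantize_channel(weights, m))  # 7 [ 93 -15   6]
```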
The present implementation is described below through a specific scenario.

A neural network model consists of two parts: the parameters of the model, referred to here as weights, and the intermediate data processed inside the model, referred to here as activation values. Quantizing the neural network model means converting the model data from floating point to fixed point; the weights are first quantized to 2^n, and the activation values are then quantized.
Quantizing the weights requires selecting suitable weight thresholds in combination with retraining, so as to convert the parameters of the neural network model to a fixed-point (INT) type. Because quantizing all parameters at once would cause a slight drop in accuracy, the model needs to be quantized step by step, with the remaining parameters retrained after each step. The steps are as follows:
The weights are quantized to ±2^n, i.e., ±1, ±0.5, ±0.25, and so on. Since larger weights contribute more, quantization starts from the largest values: the top 25% of the weights are quantized first, their values being set to powers of two, and the remaining values are retrained so that accuracy does not drop; the quantized values are then kept fixed, the next 25% of the largest remaining weights are quantized, and the rest are retrained so that accuracy does not drop. This cycle is repeated until all weight values are powers of two. At this point all model weights are INT values, and the activation values are processed next.
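A rough sketch of this progressive quantization is given below under stated assumptions: the helpers nearest_power_of_two and progressive_quantize are hypothetical names, and the retraining step is only represented by a placeholder callback since no training procedure is specified above.

```python
import numpy as np

def nearest_power_of_two(x):
    """Map a nonzero value to the signed power of two closest to it in log space."""
    n = np.round(np.log2(np.abs(x)))
    return np.sign(x) * 2.0 ** n

def progressive_quantize(weights, retrain, steps=4):
    """Quantize the largest 25% of the still-free weights per step, retraining in between."""
    w = np.array(weights, dtype=float)
    frozen = np.zeros(w.shape, dtype=bool)
    for _ in range(steps):
        free = np.where(~frozen)[0]
        if free.size == 0:
            break
        k = max(1, free.size // 4)                  # top 25% of the still-free weights
        top = free[np.argsort(-np.abs(w[free]))[:k]]
        w[top] = nearest_power_of_two(w[top])       # snap the selected weights to +/-2^n
        frozen[top] = True
        w = retrain(w, frozen)                      # placeholder: retrain the unfrozen weights
    return w

quantized = progressive_quantize([0.9, -0.4, 0.26, 0.07],
                                 retrain=lambda w, frozen: w)  # no-op retraining for the demo
print(quantized)  # [1.0, -0.5, 0.25, 0.0625]
```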
The quantization of the activation values also requires determining their threshold range. Since the thresholds differ across tasks, the threshold range needs to be obtained in combination with the dataset. The details are as follows:

One image is selected from each category of the dataset and inference is run with the quantized network to obtain the maximum value of each element across all channels; this yields a table of activation values and category information, which serves as the basis for comparison in the next step.
The data from the previous step are compared channel by channel to obtain the maximum value of the elements in each channel; this value is multiplied by 2 to the power of m, ensuring that the product does not exceed 2^n - 1, and the value of m is recorded. This is the quantization coefficient of the activation values of that channel. Having obtained the quantization coefficient of each channel, the mean value of m is then computed per layer: for all channels of the layer whose coefficient is greater than the mean, the mean value m is used, and channels whose coefficient is smaller than the mean keep their original value. This quantization algorithm is efficient and feasible and can greatly reduce the data volume of the neural network model.
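The channel-wise search for the activation coefficient m can be sketched as follows. In practice the per-channel maxima would come from running inference on one image per category; here they are replaced by a hypothetical channel_max list, and the 8-bit target width is also an assumption.

```python
import numpy as np

def activation_coeffs(channel_max, n_bits=8):
    """For each channel, pick the largest m with channel_max * 2**m <= 2**n_bits - 1."""
    limit = 2 ** n_bits - 1
    coeffs = []
    for cmax in channel_max:
        m = 0
        while cmax * 2 ** (m + 1) <= limit:
            m += 1
        coeffs.append(m)
    return np.array(coeffs)

def clamp_to_layer_mean(coeffs):
    """Channels above the layer mean are pulled down to the mean; the rest keep their value."""
    mean_m = int(np.mean(coeffs))
    return np.minimum(coeffs, mean_m)

channel_max = [3.2, 0.8, 12.5, 1.6]   # hypothetical per-channel activation maxima
m = activation_coeffs(channel_max)
print(m, clamp_to_layer_mean(m))      # [6 8 4 7] [6 6 4 6]
```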
Referring to FIG. 3, an embodiment of the present application provides an FPGA-based neural network operation apparatus, including:

a model acquisition module 10, configured to obtain a neural network model;

a memory statistics module 11, configured to count the on-chip memory capacities corresponding to multiple FPGAs;

a model splitting module 12, configured to split the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each FPGA, where the data amount of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA;

a model allocation module 13, configured to allocate the sub-models to the on-chip memory of the corresponding FPGAs;

a model execution module 14, configured to set the data flow directions between the corresponding FPGAs according to the execution order of the sub-models, and to control each FPGA in turn, according to the execution order, to perform neural network operations based on its sub-model.
The FPGA-based neural network operation apparatus provided by the present application obtains a neural network model, counts the on-chip memory capacities corresponding to multiple FPGAs, and then splits the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each FPGA, where the data amount of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are then allocated to the on-chip memory of the corresponding FPGAs, the data flow directions between the corresponding FPGAs are set according to the execution order of the sub-models, and each FPGA is controlled in turn, according to the execution order, to perform neural network operations based on its sub-model. By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memory of the multiple FPGAs, the apparatus further avoids the problem that the rated resources of a single FPGA's on-chip memory are relatively limited, and further ensures the overall efficiency of the FPGA-based inference operations of the neural network model.
In addition, an embodiment of the present application further provides an FPGA-based neural network operation device, including:

a memory, configured to store a computer program; and

a processor, configured to implement the steps of the above FPGA-based neural network operation method when executing the computer program.
The FPGA-based neural network operation device provided by the present application obtains a neural network model, counts the on-chip memory capacities corresponding to multiple FPGAs, and then splits the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each FPGA, where the data amount of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are then allocated to the on-chip memory of the corresponding FPGAs, the data flow directions between the corresponding FPGAs are set according to the execution order of the sub-models, and each FPGA is controlled in turn, according to the execution order, to perform neural network operations based on its sub-model. By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memory of the multiple FPGAs, the device further avoids the problem that the rated resources of a single FPGA's on-chip memory are relatively limited, and further ensures the overall efficiency of the FPGA-based inference operations of the neural network model.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above FPGA-based neural network operation method.

With the computer-readable storage medium provided by the present application, a neural network model is obtained, the on-chip memory capacities corresponding to multiple FPGAs are counted, and the neural network model is then split into sub-models with corresponding data amounts according to the on-chip memory capacity of each FPGA, where the data amount of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are then allocated to the on-chip memory of the corresponding FPGAs, the data flow directions between the corresponding FPGAs are set according to the execution order of the sub-models, and each FPGA is controlled in turn, according to the execution order, to perform neural network operations based on its sub-model. By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memory of the multiple FPGAs, the computer-readable storage medium further avoids the problem that the rated resources of a single FPGA's on-chip memory are relatively limited, and further ensures the overall efficiency of the FPGA-based inference operations of the neural network model.
The FPGA-based neural network operation method, apparatus, device, and storage medium provided by the present application have been described in detail above. The embodiments in the specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. As for the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant parts may be found in the description of the method. It should be noted that those of ordinary skill in the art may make further improvements and modifications to the present application without departing from the principles of the present application, and such improvements and modifications also fall within the protection scope of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

Claims (10)

  1. An FPGA-based neural network operation method, comprising:

    obtaining a neural network model;

    counting on-chip memory capacities corresponding to a plurality of FPGAs;

    splitting the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each of the FPGAs, wherein the data amount of each of the sub-models is not greater than the on-chip memory capacity of the corresponding FPGA;

    allocating the sub-models to the on-chip memory of the corresponding FPGAs; and

    setting data flow directions between the corresponding FPGAs according to an execution order of the sub-models, and controlling each of the FPGAs in turn, according to the execution order, to perform neural network operations based on the corresponding sub-model.
  2. The FPGA-based neural network operation method according to claim 1, wherein splitting the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each of the FPGAs comprises:

    counting a layer data amount of each network layer in the neural network model;

    calculating, in sequence, target network layers corresponding to each of the FPGAs based on the layer data amount of each of the network layers and the on-chip memory capacity of each of the FPGAs; and

    splitting the neural network model to obtain the sub-models corresponding to the target network layers.
  3. The FPGA-based neural network operation method according to claim 2, wherein the layer data amount comprises a parameter amount and a process data amount, the process data amount being an amount of data generated while the corresponding network layer performs neural network operations.
  4. The FPGA-based neural network operation method according to claim 3, wherein counting the parameter amount and the process data amount of each network layer in the neural network model comprises:

    obtaining the parameter amount based on the number of filters, the number of channels, and the convolution kernel size of each of the network layers in the neural network model, and obtaining the process data amount based on the number of filters and the size of intermediate data of each of the network layers.
  5. The FPGA-based neural network operation method according to claim 1, wherein before counting the on-chip memory capacities corresponding to the plurality of FPGAs, the method further comprises:

    determining whether a data amount of the neural network model is greater than the on-chip memory capacity of one FPGA; and

    if the data amount of the neural network model is greater than the on-chip memory capacity of one FPGA, executing the step of counting the on-chip memory capacities corresponding to the plurality of FPGAs.
  6. The FPGA-based neural network operation method according to any one of claims 1 to 5, wherein, when a data type of parameters in the neural network model is a floating-point type, before splitting the neural network model into sub-models with corresponding data amounts, the method further comprises:

    converting the data type of the parameters in the neural network model from the floating-point type to a fixed-point type;

    wherein splitting the neural network model into sub-models with corresponding data amounts comprises:

    splitting the neural network model whose parameter data types have been converted into the sub-models with corresponding data amounts.
  7. The FPGA-based neural network operation method according to claim 6, wherein converting the data type of the parameters in the neural network model from the floating-point type to the fixed-point type comprises:

    obtaining a maximum parameter value of each channel in the neural network model;

    calculating a quantization coefficient of each of the channels according to each of the maximum parameter values; and

    converting the data type of the parameters in the corresponding channel of the neural network model from the floating-point type to the fixed-point type based on each of the quantization coefficients.
  8. An FPGA-based neural network operation apparatus, comprising:

    a model acquisition module, configured to obtain a neural network model;

    a memory statistics module, configured to count on-chip memory capacities corresponding to a plurality of FPGAs;

    a model splitting module, configured to split the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each of the FPGAs, wherein the data amount of each of the sub-models is not greater than the on-chip memory capacity of the corresponding FPGA;

    a model allocation module, configured to allocate the sub-models to the on-chip memory of the corresponding FPGAs; and

    a model execution module, configured to set data flow directions between the corresponding FPGAs according to an execution order of the sub-models, and to control each of the FPGAs in turn, according to the execution order, to perform neural network operations based on the corresponding sub-model.
  9. An FPGA-based neural network operation device, comprising:

    a memory, configured to store a computer program; and

    a processor, configured to implement the steps of the FPGA-based neural network operation method according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the FPGA-based neural network operation method according to any one of claims 1 to 7.
PCT/CN2021/076835 2020-06-30 2021-02-19 Fpga-based neural network operation method, apparatus, and device WO2022001126A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010614610.6A CN111860810A (en) 2020-06-30 2020-06-30 Neural network operation method, device and equipment based on FPGA
CN202010614610.6 2020-06-30

Publications (1)

Publication Number Publication Date
WO2022001126A1 true WO2022001126A1 (en) 2022-01-06

Family

ID=72989688

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076835 WO2022001126A1 (en) 2020-06-30 2021-02-19 Fpga-based neural network operation method, apparatus, and device

Country Status (2)

Country Link
CN (1) CN111860810A (en)
WO (1) WO2022001126A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860810A (en) * 2020-06-30 2020-10-30 浪潮(北京)电子信息产业有限公司 Neural network operation method, device and equipment based on FPGA
CN114816752A (en) * 2022-04-26 2022-07-29 山东云海国创云计算装备产业创新中心有限公司 Memory management method, system, equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685202A (en) * 2018-12-17 2019-04-26 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic device
CN110413255A (en) * 2018-04-28 2019-11-05 北京深鉴智能科技有限公司 Artificial neural network method of adjustment and device
CN111274034A (en) * 2020-01-19 2020-06-12 北京奇艺世纪科技有限公司 Resource allocation method and device for model reasoning, computer equipment and storage medium
CN111860810A (en) * 2020-06-30 2020-10-30 浪潮(北京)电子信息产业有限公司 Neural network operation method, device and equipment based on FPGA

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717574B (en) * 2018-07-11 2023-07-07 杭州海康威视数字技术股份有限公司 Neural network operation method and device and heterogeneous intelligent chip
CN110209472B (en) * 2018-08-29 2023-04-07 腾讯科技(深圳)有限公司 Task data processing method and board card
US20200117978A1 (en) * 2018-10-12 2020-04-16 Alibaba Group Holding Limited Systems and methods for efficiently mapping neural networks to programmable logic devices
CN109993296B (en) * 2019-04-01 2020-12-29 安徽寒武纪信息科技有限公司 Quantitative implementation method and related product
CN111027669A (en) * 2019-10-21 2020-04-17 浙江省北大信息技术高等研究院 Method and device for realizing deep neural network on field programmable gate array

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413255A (en) * 2018-04-28 2019-11-05 北京深鉴智能科技有限公司 Artificial neural network method of adjustment and device
CN109685202A (en) * 2018-12-17 2019-04-26 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic device
CN111274034A (en) * 2020-01-19 2020-06-12 北京奇艺世纪科技有限公司 Resource allocation method and device for model reasoning, computer equipment and storage medium
CN111860810A (en) * 2020-06-30 2020-10-30 浪潮(北京)电子信息产业有限公司 Neural network operation method, device and equipment based on FPGA

Also Published As

Publication number Publication date
CN111860810A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN108280514B (en) FPGA-based sparse neural network acceleration system and design method
US10915816B2 (en) System and method of executing neural networks
WO2022001126A1 (en) Fpga-based neural network operation method, apparatus, and device
CN108572873B (en) Load balancing method and device for solving Spark data inclination problem
CN110058883B (en) CNN acceleration method and system based on OPU
CN107832839B (en) Method and apparatus for performing operations in convolutional neural networks
US20180113649A1 (en) Data processing using resistive memory arrays
US11755683B2 (en) Flexible accelerator for sparse tensors (FAST) in machine learning
WO2020133317A1 (en) Computing resource allocation technology and neural network system
CN105607952B (en) Method and device for scheduling virtualized resources
US11797830B2 (en) Flexible accelerator for sparse tensors in convolutional neural networks
US20210326687A1 (en) Neural Network System and Data Processing Technology
WO2024007849A1 (en) Distributed training container scheduling for intelligent computing
CN113515351A (en) Resource scheduling implementation method based on energy consumption and QoS (quality of service) cooperative optimization
WO2022110860A1 (en) Hardware environment-based data operation method, apparatus and device, and storage medium
CN110874625A (en) Deep neural network quantification method and device
CN110069284B (en) Compiling method and compiler based on OPU instruction set
CN116720549A (en) FPGA multi-core two-dimensional convolution acceleration optimization method based on CNN input full cache
CN111176637A (en) Schedulability analysis method of AADL model based on cache preemption delay constraint
CN112561049B (en) Resource allocation method and device of DNN accelerator based on memristor
CN112183744A (en) Neural network pruning method and device
CN115689062B (en) Photovoltaic output power prediction method based on rapid online migration neural network
CN115712506A (en) Resource allocation method and accelerator
CN113723538B (en) Cross-platform power consumption performance prediction method and system based on hierarchical migration learning
CN112989270A (en) Convolution calculating device based on hybrid parallel

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21834363

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21834363

Country of ref document: EP

Kind code of ref document: A1