WO2022001126A1 - FPGA-based neural network operation method, apparatus, and device - Google Patents

FPGA-based neural network operation method, apparatus, and device

Info

Publication number
WO2022001126A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
fpga
sub-model
data
on-chip memory
Prior art date
Application number
PCT/CN2021/076835
Other languages
French (fr)
Chinese (zh)
Inventor
仝培霖
朱克峰
赵红博
Original Assignee
浪潮(北京)电子信息产业有限公司
Priority date
Filing date
Publication date
Application filed by 浪潮(北京)电子信息产业有限公司
Publication of WO2022001126A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7817Specially adapted for signal processing, e.g. Harvard architectures

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to an FPGA-based neural network operation method, apparatus, and device.
  • Because an FPGA (Field Programmable Gate Array) can perform data operations in parallel and its flexible structure enables pipelined data processing, FPGAs are currently often used to perform inference operations on trained neural network models.
  • The on-chip memory of an FPGA is a storage medium located on the FPGA chip that offers high data read/write efficiency, so when a neural network model is inferred on an FPGA, the on-chip memory is used to supply data to the computing logic. However, because the rated resources of FPGA on-chip memory are limited, the neural network model may not fit entirely into the on-chip memory of a single FPGA when inference operations are performed, making it difficult to ensure the overall efficiency of FPGA-based inference on the neural network model.
  • The purpose of the present application is to provide an FPGA-based neural network operation method that relatively ensures the overall efficiency of FPGA-based inference on a neural network model.
  • To this end, the present application provides an FPGA-based neural network operation method, comprising: acquiring a neural network model; counting the on-chip memory capacities corresponding to multiple FPGAs; splitting the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; allocating the sub-models to the on-chip memory of the corresponding FPGAs; and setting the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and controlling the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • Splitting the neural network model into sub-models with corresponding data volumes includes: counting the layer data volume of each network layer in the neural network model; calculating, in turn, the target network layers corresponding to each FPGA based on the layer data volume of each network layer and the on-chip memory capacity of each FPGA; and splitting the neural network model to obtain the sub-models corresponding to the target network layers.
  • the amount of layer data includes the amount of parameters and the amount of process data; wherein, the amount of process data is the amount of data generated when the corresponding network layer performs the neural network operation.
  • the parameter quantities and process data quantities of each network layer in the neural network model are counted, including:
  • the corresponding parameter amount is obtained based on the number of filters, the number of channels and the size of the convolution kernel in each network layer in the neural network model, and the corresponding process data amount is obtained based on the number of filters in each network layer and the size of the intermediate data.
  • Before counting the on-chip memory capacities corresponding to multiple FPGAs, the method further includes: determining whether the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA; and if it is, performing the step of counting the on-chip memory capacities corresponding to multiple FPGAs.
  • When the data type of the parameters in the neural network model is floating point, the method further includes converting that data type before splitting, and splitting the neural network model into sub-models with corresponding data volumes then means splitting the neural network model whose parameter data type has been converted.
  • Converting the data type of the parameters in the neural network model from floating point to fixed point includes: obtaining the maximum parameter value of each channel in the neural network model; calculating the quantization coefficient of each channel according to the corresponding maximum parameter value; and converting the data type of the parameters in the corresponding channel from floating point to fixed point based on each quantization coefficient.
  • The present application further provides an FPGA-based neural network operation apparatus, comprising:
  • a model acquisition module, configured to acquire a neural network model;
  • a memory statistics module, configured to count the on-chip memory capacities corresponding to multiple FPGAs;
  • a model splitting module, configured to split the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA;
  • a model allocation module, configured to allocate the sub-models to the on-chip memory of the corresponding FPGAs; and
  • a model execution module, configured to set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and to control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • The present application further provides an FPGA-based neural network operation device, including: a memory for storing a computer program; and a processor configured to implement the steps of the above FPGA-based neural network operation method when executing the computer program.
  • the present application also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned FPGA-based neural network computing method are implemented.
  • The FPGA-based neural network operation method provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memories of multiple FPGAs, the method avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
  • The present application also provides an FPGA-based neural network operation apparatus, device, and storage medium, whose beneficial effects are the same as described above.
  • FIG. 1 is a flowchart of an FPGA-based neural network computing method disclosed in an embodiment of the application
  • FIG. 2 is a flowchart of an FPGA-based neural network computing method disclosed in an embodiment of the application
  • FIG. 3 is a schematic structural diagram of an FPGA-based neural network computing device disclosed in an embodiment of the present application.
  • The on-chip memory of an FPGA is a storage medium located on the FPGA chip that offers high data read/write efficiency, so when a neural network model is inferred on an FPGA, the on-chip memory is used to supply data to the computing logic. However, because the rated resources of FPGA on-chip memory are limited, the neural network model may not fit entirely into the on-chip memory of a single FPGA when inference operations are performed, making it difficult to ensure the overall efficiency of FPGA-based inference on the neural network model.
  • The core of the present application is to provide an FPGA-based neural network operation method that relatively ensures the overall efficiency of FPGA-based inference on a neural network model.
  • an embodiment of the present application discloses an FPGA-based neural network computing method, including:
  • Step S10: Obtain a neural network model.
  • It should be noted that the neural network model obtained in this step is a neural network model used to perform inference operations; that is, it is a model that, after being trained through a convolutional neural network, is used to analyze data in real scenarios.
  • Step S11: Count the on-chip memory capacities corresponding to multiple FPGAs.
  • The focus of this step is to count the respective on-chip memory capacities of multiple FPGAs; that is, in this embodiment the number of FPGAs used to perform the neural network operations should be greater than 1, and on this basis each FPGA provides its own on-chip memory resources for cooperative computation on the neural network model.
  • The purpose of counting the on-chip memory capacities corresponding to multiple FPGAs in this step is to allocate, in subsequent steps, sub-models of corresponding data volumes to the on-chip memory of each FPGA according to its capacity.
  • Since there is no ordering dependency between the step of acquiring the neural network model and the step of counting the on-chip memory capacities corresponding to multiple FPGAs, the execution order of step S10 and step S11 in this embodiment is not fixed; they may also be executed simultaneously, which should be determined according to the actual situation and is not specifically limited here.
  • Step S12: Split the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA.
  • The data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA.
  • After counting the on-chip memory capacities corresponding to multiple FPGAs, this step splits the complete neural network model into sub-models that match the on-chip memory capacity of each FPGA, and the sub-models can be recombined into the complete neural network model.
  • It should be emphasized that the data volume of each sub-model is not greater than the on-chip memory capacity of its corresponding FPGA, that is, the data volume of the sub-model split out for an FPGA should be less than or equal to that FPGA's on-chip memory capacity, which ensures that the FPGA can properly execute the operations of its sub-model.
  • Step S13: Allocate the sub-models to the on-chip memory of the corresponding FPGAs.
  • After splitting the neural network model into sub-models with corresponding data volumes, this step allocates the sub-models to the on-chip memory of the corresponding FPGAs, so that in subsequent steps each FPGA executes the sub-model held in its own on-chip memory.
  • Step S14: Set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • After allocating the sub-models to the on-chip memory of the corresponding FPGAs, this step sets the data flow direction between the FPGAs according to the execution sequence of the sub-models and controls the FPGAs in turn to perform the neural network operations based on their sub-models in that order.
  • The purpose is to ensure that data flows between the FPGAs and that this data flow is consistent with the execution order of the sub-models in the FPGAs, so that executing the sub-models in sequence has the same effect as executing the complete neural network model, thereby ensuring the reliability of the FPGA-based neural network operation.
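  • To make this data-flow arrangement concrete, the following Python sketch chains the FPGAs in the sub-models' execution order so that each device's output becomes the next device's input; the FpgaDevice interface (load/run methods) is a purely hypothetical stand-in for whatever driver API a deployment actually uses.

```python
from typing import Sequence


class FpgaDevice:
    """Hypothetical wrapper around one FPGA board's driver API."""

    def load(self, sub_model) -> None:
        """Write the sub-model's data into this FPGA's on-chip memory."""
        raise NotImplementedError

    def run(self, inputs):
        """Execute the loaded sub-model on the FPGA and return its outputs."""
        raise NotImplementedError


def run_pipeline(devices: Sequence[FpgaDevice], sub_models, inputs):
    # Step S13: allocate each sub-model to its FPGA's on-chip memory up front.
    for device, sub_model in zip(devices, sub_models):
        device.load(sub_model)
    # Step S14: data flows through the FPGAs in the sub-models' execution order,
    # so running the chain is equivalent to running the complete model.
    data = inputs
    for device in devices:
        data = device.run(data)
    return data
```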
  • In summary, the FPGA-based neural network operation method provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memories of multiple FPGAs, the method avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
  • In a preferred implementation, before counting the on-chip memory capacities corresponding to multiple FPGAs, the method further includes: determining whether the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA; and if it is, performing the step of counting the on-chip memory capacities corresponding to multiple FPGAs.
  • The key point of this implementation is that, after the neural network model is acquired, it is further determined whether its data volume is greater than the on-chip memory capacity of one FPGA; only when it is greater is the step of counting the on-chip memory capacities corresponding to multiple FPGAs performed, while when the data volume of the neural network model is less than or equal to the on-chip memory capacity of one FPGA, it is only necessary to allocate the neural network model in its entirety to the on-chip memory of one FPGA.
  • This implementation further ensures the flexibility of implementing the FPGA-based neural network operation process.
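  • This pre-check amounts to a simple guard before the multi-FPGA path; a minimal sketch with illustrative names might look like this (model_volume and fpga_mems are assumed to be byte counts).

```python
def plan_deployment(model_volume: int, fpga_mems: list[int]) -> dict:
    """Choose between single-FPGA and multi-FPGA deployment of the model."""
    if model_volume <= fpga_mems[0]:
        # The whole model fits: allocate it to a single FPGA's on-chip memory.
        return {"mode": "single", "device": 0}
    # Otherwise count the capacities of multiple FPGAs and split the model.
    return {"mode": "multi", "capacities": list(fpga_mems)}
```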
  • an embodiment of the present application discloses an FPGA-based neural network computing method, including:
  • Step S20: Obtain a neural network model.
  • Step S21: Count the on-chip memory capacities corresponding to multiple FPGAs.
  • Step S22: Count the layer data volume of each network layer in the neural network model.
  • It should be noted that the focus of this embodiment is to split the neural network model into sub-models in units of the network layers of the model.
  • In this step, the layer data volume of each network layer in the neural network model is counted first; the purpose is to allocate, in subsequent steps, a sub-model composed of a corresponding number of network layers to each FPGA according to its on-chip memory capacity.
  • Step S23: Calculate, in turn, the target network layers corresponding to each FPGA based on the layer data volume of each network layer and the on-chip memory capacity of each FPGA.
  • After counting the layer data volume of each network layer, this step calculates in turn the target network layers corresponding to each FPGA based on the data volume of each network layer, that is, the layer data volume, and the on-chip memory capacity of each FPGA.
  • The number of target network layers corresponding to each FPGA is determined by the on-chip memory capacity of that FPGA.
  • Step S24: Split the neural network model to obtain the sub-models corresponding to the target network layers.
  • The data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA.
  • After calculating the target network layers corresponding to each FPGA in turn, this step splits the neural network model into sub-models consisting of the target network layers.
  • As a preferred implementation, when the number of target network layers corresponding to an FPGA is greater than 1 and the target network layers are adjacent, splitting the neural network model to obtain the sub-models corresponding to the target network layers may specifically be splitting out sub-models that each contain all of their target network layers. This relatively reduces the number of sub-models, and thus the number of sub-model invocations when each FPGA performs the neural network operations based on its corresponding sub-model, improving the efficiency of executing the neural network operations.
  • Step S25: Allocate the sub-models to the on-chip memory of the corresponding FPGAs.
  • Step S26: Set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • This embodiment further ensures the network integrity of the neural network model segments included in the sub-model, and relatively improves the reliability of controlling each FPGA to perform neural network operations based on the corresponding sub-model.
  • For example, before performing neural network operations on FPGAs, the number of available FPGA devices and the on-chip memory capacity of each device need to be obtained to facilitate model splitting. Suppose there are N (N>1) FPGA devices available with memory sizes M[1]...M[N], the neural network model has len layers, the parameters of each layer are W[1]...W[len], and the intermediate data volume of each layer is A[1]...A[len]. The layer data volumes can then be accumulated and compared in a loop: when accumulating up to layer i makes the data volume exceed the memory of the first FPGA device, the sub-model containing layers 1 to i-1 is assigned to the first FPGA device; the loop then accumulates and compares again starting from layer i, until at layer j the accumulated volume exceeds the memory of the second FPGA device, and the sub-model containing layers i to j-1 is assigned to the second FPGA device; and so on, until all parameters have been assigned to FPGA devices.
  • If there are not enough FPGA devices, the sub-models are assigned to the devices cyclically and are first placed in off-chip memory; when a device finishes its current computation, the sub-model to be executed next is read into that device's on-chip memory, and in this way the FPGAs are controlled in turn to perform the neural network operations based on their corresponding sub-models according to the execution sequence.
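  • The cyclic accumulation described above can be sketched in Python as follows; the function names and the greedy packing of consecutive layers are illustrative assumptions based on the description, not code from the application.

```python
def split_layers(layer_sizes: list[int], fpga_mems: list[int]):
    """Greedily pack consecutive layers into FPGA devices.

    layer_sizes[i] corresponds to W[i] + A[i], the parameter plus intermediate
    data volume of layer i; fpga_mems[k] corresponds to M[k], the on-chip
    memory capacity of FPGA device k.  Returns (device_index, first_layer,
    last_layer) triples; device indices wrap around when devices run out, in
    which case later sub-models would start out in off-chip memory.
    """
    assignments = []
    device, start, acc = 0, 0, 0
    for i, size in enumerate(layer_sizes):
        if acc + size > fpga_mems[device % len(fpga_mems)] and i > start:
            # Layers start..i-1 form the sub-model for the current device.
            assignments.append((device % len(fpga_mems), start, i - 1))
            device, start, acc = device + 1, i, 0
        if size > fpga_mems[device % len(fpga_mems)]:
            raise ValueError(f"layer {i} alone exceeds the device's on-chip memory")
        acc += size
    assignments.append((device % len(fpga_mems), start, len(layer_sizes) - 1))
    return assignments
```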
  • the layer data volume includes parameter volume and process data volume; wherein, the process data volume is the data volume of data generated during the neural network operation performed by the corresponding network layer.
  • It should be noted that the key point of this implementation is that the layer data volume of each network layer in the neural network model further includes the parameter volume and the process data volume of that layer, where the parameter volume refers to the data volume of the parameters used by the network layer when performing the neural network operation, and the process data volume refers to the data volume of the data generated while the network layer performs the neural network operation.
  • the parameter amount and the process data amount of each network layer in the neural network model are counted, including:
  • the corresponding parameter amount is obtained based on the number of filters, the number of channels and the size of the convolution kernel in each network layer in the neural network model, and the corresponding process data amount is obtained based on the number of filters in each network layer and the size of the intermediate data.
  • That is, the parameter volume of each network layer in the neural network model is obtained statistically from the number of filters, the number of channels, and the convolution kernel size of the corresponding network layer, and the process data volume of each network layer is obtained statistically from the number of filters and the size of the intermediate data of the corresponding network layer.
  • This embodiment further ensures the overall accuracy of the process of calculating the target network layer corresponding to each FPGA, thereby ensuring the reliability of the FPGA-based neural network operation.
  • For example, if the number of channels of a certain network layer in the neural network model is C, the number of filters is N, and the length and width of the convolution kernel are both K (that is, the convolution kernel size is K*K), then the parameter amount of that network layer is C*K*K*N; and if the length and width of the intermediate data are W and H respectively (that is, the intermediate data size is W*H), then the process data amount of that network layer is N*W*H.
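  • Under this accounting, a small helper can turn the per-layer shape information into the W[i] and A[i] figures used by the splitting sketch above; the bytes_per_element argument is an assumption (e.g. 4 for float32 parameters, 1 after int8 quantization), since the text does not fix a data width.

```python
def layer_volumes(channels: int, filters: int, kernel: int,
                  out_w: int, out_h: int, bytes_per_element: int = 4):
    """Return (parameter volume, process data volume) of a conv layer in bytes.

    The parameter count is C*K*K*N and the intermediate (process) data count
    is N*W*H, following the example in the text.
    """
    param_volume = channels * kernel * kernel * filters * bytes_per_element
    process_volume = filters * out_w * out_h * bytes_per_element
    return param_volume, process_volume
```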
  • Preferably, when the data type of the parameters in the neural network model is floating point, before splitting the neural network model into sub-models with corresponding data volumes, the method further includes: converting the data type of the parameters in the neural network model from floating point to fixed point; splitting the neural network model into sub-models with corresponding data volumes then means splitting the model whose parameter data type has been converted.
  • The focus of this implementation is that, when the data type of the parameters in the neural network model is floating point, the data type of the parameters is converted from floating point to fixed point before the model is split into sub-models, which relatively reduces the overall data volume of the neural network model; this in turn relatively reduces the overall data volume of the FPGA-based neural network operations and relatively improves the overall efficiency of the neural network operation.
  • Converting the data type of the parameters in the neural network model from floating point to fixed point includes: obtaining the maximum parameter value of each channel in the neural network model; calculating the quantization coefficient of each channel according to the corresponding maximum parameter value; and converting the data type of the parameters in the corresponding channel from floating point to fixed point based on each quantization coefficient.
  • In this implementation, the maximum parameter value of each channel in the neural network model is obtained first, and the quantization coefficient of each channel is then calculated from that maximum value.
  • Each quantization coefficient is used as the conversion weight between data types; that is, according to each quantization coefficient, the data type of the parameters in the corresponding channel of the neural network model is converted from floating point to fixed point.
  • In this way, the maximum parameter value of each channel serves as the limiting factor of the data bit range of the fixed-point data when the parameters of that channel are converted from floating point to fixed point, which further ensures the accuracy of converting the data type of the parameters in the neural network model from floating point to fixed point.
  • In terms of data, the neural network model can be divided into two parts: one part is the parameters of the neural network model, referred to here as weights; the other part is the intermediate data processed inside the neural network model, referred to here as activation values.
  • Quantizing the neural network model means converting the model data from floating point to fixed point. First, the weights need to be quantized to 2^n, and then the activation values are quantized.
  • Quantizing the activation values also requires determining their threshold range; since the thresholds differ between tasks, the threshold range must be obtained in combination with the dataset before the activation values are quantized.
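  • A minimal per-channel weight quantization sketch consistent with this description is shown below. The int8 target width and the symmetric scheme are assumptions, and "quantized to 2^n" is interpreted here as rounding each channel's quantization coefficient up to a power of two; the application itself only states that per-channel coefficients are derived from each channel's maximum parameter value.

```python
import numpy as np


def quantize_weights_per_channel(weights: np.ndarray, bits: int = 8):
    """Convert float weights (shape: filters x channels x K x K) to fixed point.

    For each output channel, a quantization coefficient is derived from the
    channel's maximum absolute parameter value and rounded up to a power of
    two, then the weights are mapped to signed integers of the given width.
    """
    q_max = 2 ** (bits - 1) - 1                        # e.g. 127 for int8
    flat = weights.reshape(weights.shape[0], -1)
    max_abs = np.abs(flat).max(axis=1)                 # per-channel maximum value
    # Smallest power of two 2**n such that max_abs / 2**n <= q_max.
    exponents = np.ceil(np.log2(np.maximum(max_abs, 1e-12) / q_max))
    scales = 2.0 ** exponents                          # one coefficient per channel
    quantized = np.clip(np.round(flat / scales[:, None]), -q_max - 1, q_max)
    return quantized.astype(np.int8).reshape(weights.shape), scales
```

  • Activation values would be quantized afterwards, using threshold ranges determined on the task's dataset as noted above; that step is not sketched here.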
  • An embodiment of the present application provides an FPGA-based neural network operation apparatus, including:
  • a model acquisition module 10, configured to acquire a neural network model;
  • a memory statistics module 11, configured to count the on-chip memory capacities corresponding to multiple FPGAs;
  • a model splitting module 12, configured to split the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA;
  • a model allocation module 13, configured to allocate the sub-models to the on-chip memory of the corresponding FPGAs; and
  • a model execution module 14, configured to set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and to control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
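  • Mirroring this module decomposition, a thin orchestration skeleton could wire the five modules into one flow; the class and method names below are illustrative assumptions, not part of the application.

```python
class NeuralNetworkRunner:
    """Skeleton wiring the five modules listed above into one flow."""

    def __init__(self, acquire, count_memory, split, allocate, execute):
        self.acquire = acquire            # model acquisition module 10
        self.count_memory = count_memory  # memory statistics module 11
        self.split = split                # model splitting module 12
        self.allocate = allocate          # model allocation module 13
        self.execute = execute            # model execution module 14

    def run(self, inputs):
        model = self.acquire()
        capacities = self.count_memory()
        sub_models = self.split(model, capacities)
        devices = self.allocate(sub_models)
        return self.execute(devices, sub_models, inputs)
```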
  • The FPGA-based neural network operation apparatus provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • By having multiple FPGAs jointly provide on-chip memory resources, the apparatus avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
  • An embodiment of the present application also provides an FPGA-based neural network operation device, including: a memory for storing a computer program; and a processor configured to implement the steps of the above FPGA-based neural network operation method when executing the computer program.
  • The FPGA-based neural network operation device provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • By having multiple FPGAs jointly provide on-chip memory resources, the device avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
  • embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above-mentioned FPGA-based neural network computing method are implemented.
  • The computer-readable storage medium provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
  • By having multiple FPGAs jointly provide on-chip memory resources, the computer-readable storage medium avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
  • The FPGA-based neural network operation method, apparatus, device, and storage medium provided by the present application have been described above in detail.
  • the various embodiments in the specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments, and the same and similar parts between the various embodiments can be referred to each other.
  • For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively brief, and reference may be made to the description of the method for relevant details. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can also be made to the present application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Logic Circuits (AREA)

Abstract

An FPGA-based neural network operation method, apparatus, and device, and a storage medium. The method comprises: obtaining a neural network model (S10); counting on-chip memory capacities corresponding to a plurality of FPGAs (S11); according to the on-chip memory capacity of each FPGA, splitting the neural network model into sub-models having corresponding data volumes (S12), wherein the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; distributing the sub-models to on-chip memories of the corresponding FPGAs (S13); and setting a data flow direction among the corresponding FPGAs according to an execution sequence among the sub-models, and sequentially controlling the FPGAs to execute neural network operation on the basis of the corresponding sub-models according to the execution sequence (S14). The method ensures the overall efficiency of executing the reasoning operation of the neural network model on the basis of the FPGAs.

Description

FPGA-based neural network operation method, apparatus, and device
This application claims priority to the Chinese patent application filed with the China Patent Office on June 30, 2020, with application number CN202010614610.6 and entitled "FPGA-based neural network operation method, apparatus, and device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to an FPGA-based neural network operation method, apparatus, and device.
Background
Because an FPGA (Field Programmable Gate Array) can perform data operations in parallel and its flexible structure enables pipelined data processing, FPGAs are currently often used to perform inference operations on trained neural network models.
The on-chip memory of an FPGA is a storage medium located on the FPGA chip that offers high data read/write efficiency, so when a neural network model is inferred on an FPGA, the on-chip memory is used to supply data to the computing logic. However, because the rated resources of FPGA on-chip memory are limited, the neural network model may not fit entirely into the on-chip memory of a single FPGA when inference operations are performed, making it difficult to ensure the overall efficiency of FPGA-based inference on the neural network model.
It can therefore be seen that providing an FPGA-based neural network operation method that relatively ensures the overall efficiency of FPGA-based inference on a neural network model is a problem to be solved by those skilled in the art.
Summary of the Invention
The purpose of the present application is to provide an FPGA-based neural network operation method that relatively ensures the overall efficiency of FPGA-based inference on a neural network model.
To solve the above technical problem, the present application provides an FPGA-based neural network operation method, comprising:
acquiring a neural network model;
counting the on-chip memory capacities corresponding to multiple FPGAs;
splitting the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, wherein the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA;
allocating the sub-models to the on-chip memory of the corresponding FPGAs; and
setting the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and controlling the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
Preferably, splitting the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA includes:
counting the layer data volume of each network layer in the neural network model;
calculating, in turn, the target network layers corresponding to each FPGA based on the layer data volume of each network layer and the on-chip memory capacity of each FPGA; and
splitting the neural network model to obtain the sub-models corresponding to the target network layers.
Preferably, the layer data volume includes a parameter volume and a process data volume, where the process data volume is the volume of data generated while the corresponding network layer performs the neural network operation.
Preferably, counting the parameter volume and the process data volume of each network layer in the neural network model includes:
obtaining the parameter volume of each network layer from the number of filters, the number of channels, and the convolution kernel size of that layer, and obtaining the process data volume from the number of filters and the size of the intermediate data of that layer.
Preferably, before counting the on-chip memory capacities corresponding to multiple FPGAs, the method further includes:
determining whether the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA; and
if the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA, performing the step of counting the on-chip memory capacities corresponding to multiple FPGAs.
Preferably, when the data type of the parameters in the neural network model is floating point, before splitting the neural network model into sub-models with corresponding data volumes, the method further includes:
converting the data type of the parameters in the neural network model from floating point to fixed point;
and splitting the neural network model into sub-models with corresponding data volumes includes:
splitting the neural network model whose parameter data type has been converted into sub-models with corresponding data volumes.
Preferably, converting the data type of the parameters in the neural network model from floating point to fixed point includes:
obtaining the maximum parameter value of each channel in the neural network model;
calculating the quantization coefficient of each channel according to the corresponding maximum parameter value; and
converting the data type of the parameters in the corresponding channel of the neural network model from floating point to fixed point based on each quantization coefficient.
In addition, the present application also provides an FPGA-based neural network operation apparatus, comprising:
a model acquisition module, configured to acquire a neural network model;
a memory statistics module, configured to count the on-chip memory capacities corresponding to multiple FPGAs;
a model splitting module, configured to split the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, wherein the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA;
a model allocation module, configured to allocate the sub-models to the on-chip memory of the corresponding FPGAs; and
a model execution module, configured to set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and to control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
In addition, the present application also provides an FPGA-based neural network operation device, comprising:
a memory for storing a computer program; and
a processor configured to implement the steps of the above FPGA-based neural network operation method when executing the computer program.
In addition, the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above FPGA-based neural network operation method are implemented.
The FPGA-based neural network operation method provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence. By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memories of multiple FPGAs, the method avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model. In addition, the present application also provides an FPGA-based neural network operation apparatus, device, and storage medium, whose beneficial effects are the same as described above.
Brief Description of the Drawings
FIG. 1 is a flowchart of an FPGA-based neural network operation method disclosed in an embodiment of the present application;
FIG. 2 is a flowchart of an FPGA-based neural network operation method disclosed in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an FPGA-based neural network operation apparatus disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.
The on-chip memory of an FPGA is a storage medium located on the FPGA chip that offers high data read/write efficiency, so when a neural network model is inferred on an FPGA, the on-chip memory is used to supply data to the computing logic. However, because the rated resources of FPGA on-chip memory are limited, the neural network model may not fit entirely into the on-chip memory of a single FPGA when inference operations are performed, making it difficult to ensure the overall efficiency of FPGA-based inference on the neural network model.
For this reason, the core of the present application is to provide an FPGA-based neural network operation method that relatively ensures the overall efficiency of FPGA-based inference on a neural network model.
To enable those skilled in the art to better understand the solution of the present application, the present application is further described in detail below with reference to the drawings and specific embodiments.
Referring to FIG. 1, an embodiment of the present application discloses an FPGA-based neural network operation method, comprising:
Step S10: Obtain a neural network model.
It should be noted that the neural network model obtained in this step is a neural network model used to perform inference operations; that is, it is a model that, after being trained through a convolutional neural network, is used to analyze data in real scenarios.
Step S11: Count the on-chip memory capacities corresponding to multiple FPGAs.
It should be noted that the focus of this step is to count the respective on-chip memory capacities of multiple FPGAs; that is, in this embodiment the number of FPGAs used to perform the neural network operations should be greater than 1, and on this basis each FPGA provides its own on-chip memory resources for cooperative computation on the neural network model. The purpose of counting the on-chip memory capacities corresponding to multiple FPGAs in this step is to allocate, in subsequent steps, sub-models of corresponding data volumes to the on-chip memory of each FPGA according to its capacity.
In addition, since there is no ordering dependency between the step of acquiring the neural network model and the step of counting the on-chip memory capacities corresponding to multiple FPGAs, the execution order of step S10 and step S11 in this embodiment is not fixed; they may also be executed simultaneously, which should be determined according to the actual situation and is not specifically limited here.
Step S12: Split the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA.
The data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA.
After counting the on-chip memory capacities corresponding to multiple FPGAs, this step splits the complete neural network model into sub-models that match the on-chip memory capacity of each FPGA, and the sub-models can be recombined into the complete neural network model. It should be emphasized that the data volume of each sub-model is not greater than the on-chip memory capacity of its corresponding FPGA, that is, the data volume of the sub-model split out for an FPGA should be less than or equal to that FPGA's on-chip memory capacity, which ensures that the FPGA can properly execute the operations of its sub-model.
Step S13: Allocate the sub-models to the on-chip memory of the corresponding FPGAs.
After splitting the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, this step allocates the sub-models to the on-chip memory of the corresponding FPGAs, so that in subsequent steps each FPGA executes the sub-model held in its own on-chip memory.
Step S14: Set the data flow direction between the corresponding FPGAs according to the execution sequence of the sub-models, and control the FPGAs in turn to perform the neural network operations based on their corresponding sub-models according to that sequence.
After allocating the sub-models to the on-chip memory of the corresponding FPGAs, this step sets the data flow direction between the FPGAs according to the execution sequence of the sub-models and controls the FPGAs in turn to perform the neural network operations based on their sub-models in that order. The purpose is to ensure that data flows between the FPGAs and that this data flow is consistent with the execution order of the sub-models in the FPGAs, so that executing the sub-models in sequence has the same effect as executing the complete neural network model, thereby ensuring the reliability of the FPGA-based neural network operation.
The FPGA-based neural network operation method provided by the present application acquires a neural network model and counts the on-chip memory capacities corresponding to multiple FPGAs; it then splits the neural network model into sub-models with corresponding data volumes according to the on-chip memory capacity of each FPGA, where the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are allocated to the on-chip memory of the corresponding FPGAs, the data flow direction between the FPGAs is set according to the execution sequence of the sub-models, and the FPGAs are then controlled in turn to perform the neural network operations based on their corresponding sub-models according to that sequence. By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memories of multiple FPGAs, the method avoids the problem of the relatively limited rated resources of a single FPGA's on-chip memory and thereby ensures the overall efficiency of FPGA-based inference on the neural network model.
On the basis of the above embodiment, as a preferred implementation, before counting the on-chip memory capacities corresponding to multiple FPGAs, the method further includes:
determining whether the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA; and
if the data volume of the neural network model is greater than the on-chip memory capacity of a single FPGA, performing the step of counting the on-chip memory capacities corresponding to multiple FPGAs.
The key point of this implementation is that, after the neural network model is acquired, it is further determined whether its data volume is greater than the on-chip memory capacity of one FPGA; only when it is greater is the step of counting the on-chip memory capacities corresponding to multiple FPGAs performed, while when the data volume of the neural network model is less than or equal to the on-chip memory capacity of one FPGA, it is only necessary to allocate the neural network model in its entirety to the on-chip memory of one FPGA. This implementation further ensures the flexibility of implementing the FPGA-based neural network operation process.
请参见图2所示,本申请实施例公开了一种基于FPGA的神经网络运算方法,包括:Referring to FIG. 2 , an embodiment of the present application discloses an FPGA-based neural network computing method, including:
步骤S20:获取神经网络模型。Step S20: Obtain a neural network model.
步骤S21:统计多个FPGA对应的片上内存容量。Step S21: Count on-chip memory capacities corresponding to multiple FPGAs.
步骤S22:统计神经网络模型中各网络层的层数据量。Step S22: Count the layer data amount of each network layer in the neural network model.
需要说明的是,本实施例的重点是以神经网络模型中的网络层为单位对神经网络模型进行子模型的拆分。It should be noted that the focus of this embodiment is to divide the neural network model into sub-models in units of network layers in the neural network model.
本步骤中,首先统计神经网络模型中各网络层的层数据量,目的是在后续步骤中,根据FPGA片上内存容量向FPGA划分由相应数量的网络层构成的子模型。In this step, the layer data amount of each network layer in the neural network model is counted first, and the purpose is to divide the sub-model composed of a corresponding number of network layers to the FPGA according to the FPGA on-chip memory capacity in the subsequent steps.
步骤S23:基于各网络层的层数据量以及各FPGA的片上内存容量,依次计算各FPGA对应的目标网络层。Step S23: Based on the layer data amount of each network layer and the on-chip memory capacity of each FPGA, sequentially calculate the target network layer corresponding to each FPGA.
在统计神经网络模型中各网络层的层数据量之后,本步骤进一步基于 各网络层中数据的数据量,即层数据量以及各FPGA的片上内存容量,依次计算各FPGA对应的目标网络层,此处各FPGA对应的目标网络层的数量根据相应FPGA的片上内存容量而定。After calculating the layer data volume of each network layer in the neural network model, this step further calculates the target network layer corresponding to each FPGA in turn based on the data volume of each network layer, that is, the layer data volume and the on-chip memory capacity of each FPGA. Here, the number of target network layers corresponding to each FPGA is determined according to the on-chip memory capacity of the corresponding FPGA.
步骤S24:在神经网络模型中拆分得到各目标网络层对应的子模型。Step S24: Split the neural network model to obtain sub-models corresponding to each target network layer.
其中,各子模型的数据量不大于所对应的FPGA的片上内存容量。Wherein, the data volume of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA.
需要说明的是,在依次计算各FPGA对应的目标网络层之后,本步骤进一步在神经网络模型中划分具有目标网络层的子模型。It should be noted that, after calculating the target network layers corresponding to each FPGA in turn, this step further divides the neural network model into sub-models with target network layers.
作为一种优选的实施方式,当FPGA对应的目标网络层的数量大于1并且目标网络层相邻时,在神经网络模型中拆分得到各目标网络层对应的子模型,可以具体是,在神经网络模型中拆分得到均包含各目标网络层的子模型,以此相对减少子模型的数量,进而在控制各FPGA基于相应的子模型执行神经网络运算时对子模型的调用次数,提高执行神经网络运算的效率。As a preferred embodiment, when the number of target network layers corresponding to the FPGA is greater than 1 and the target network layers are adjacent, split the neural network model to obtain sub-models corresponding to each target network layer. The network model is split to obtain sub-models that include each target network layer, so as to relatively reduce the number of sub-models, and then control the number of calls to the sub-models when each FPGA performs neural network operations based on the corresponding sub-models, improving the execution neural network. The efficiency of network operations.
Step S25: Allocate the sub-models to the on-chip memory of the corresponding FPGAs.

Step S26: Set the data flow directions between the corresponding FPGAs according to the execution order of the sub-models, and control each FPGA in turn, according to the execution order, to perform neural network operations based on its sub-model.

This embodiment further ensures the network integrity of the neural network model segments contained in the sub-models, and relatively improves the reliability of controlling each FPGA to perform neural network operations based on its sub-model.
The present implementation is described below through a specific scenario.
Before neural network operations are performed on the FPGAs, the number of available FPGA devices and the on-chip memory capacity of each FPGA device need to be obtained to facilitate model splitting. Suppose N (N > 1) FPGA devices are available with memory sizes M[1]…M[N], the neural network model has len layers, the parameters of each layer are W[1]…W[len], and the intermediate data amount of each layer is A[1]…A[len]. The layers can then be compared by cumulative accumulation: when, upon accumulating to the i-th layer, the accumulated data amount exceeds the memory of the first FPGA device, the sub-model containing layers 1 to i-1 is assigned to the first FPGA device; the cumulative comparison is then restarted from layer i until the accumulation at layer j exceeds the memory of the second FPGA device, at which point the sub-model containing layers i to j-1 is assigned to the second FPGA device; and so on, until all parameters have been allocated to FPGA devices. If the FPGA devices are insufficient, sub-models are allocated to the devices cyclically in rounds and are first placed in off-chip memory; once a device has finished its current computation, the sub-model whose operations are to be executed next is read into that device's on-chip memory. In this way, each FPGA is controlled in turn, according to the execution order, to perform neural network operations based on its sub-model.
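As a non-limiting illustration of the cumulative-accumulation comparison described above, the following Python sketch greedily assigns consecutive layers to devices. The function name split_layers_to_fpgas, the treatment of each layer's footprint as W[k] + A[k], and the cyclic reuse of devices are assumptions introduced for illustration only, not the disclosed implementation.

```python
def split_layers_to_fpgas(M, W, A):
    """Greedy split: M[d] is the on-chip memory of device d,
    W[k] and A[k] are the parameter and intermediate-data amounts of layer k.
    Returns a list of (start_layer, end_layer) index ranges, one per sub-model.
    Assumes every single layer fits into at least one device's memory."""
    assert len(W) == len(A)
    ranges, start, acc, dev = [], 0, 0, 0
    for k in range(len(W)):
        layer_cost = W[k] + A[k]           # assumed per-layer memory footprint
        if acc + layer_cost > M[dev % len(M)]:
            ranges.append((start, k - 1))  # layers start..k-1 form one sub-model
            start, acc = k, 0
            dev += 1                       # next device (cyclically if devices run out)
        acc += layer_cost
    ranges.append((start, len(W) - 1))     # last sub-model
    return ranges

# Example: 3 devices, 6 layers
print(split_layers_to_fpgas([100, 80, 120],
                            [30, 40, 20, 50, 10, 60],
                            [10, 10, 10, 10, 10, 10]))
# -> [(0, 1), (2, 2), (3, 4), (5, 5)]; the fourth sub-model would be cycled back to device 1
```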
Based on the above embodiment, as a preferred implementation, the layer data amount includes a parameter amount and a process data amount, where the process data amount is the amount of data generated while the corresponding network layer performs neural network operations.

It should be noted that the focus of this embodiment is that the layer data amount of each network layer in the neural network model further comprises the parameter amount and the process data amount of that layer. The parameter amount refers to the data amount of the parameters used by the network layer when performing neural network operations, and the process data amount refers to the data amount of the data generated while the network layer performs those operations. By further refining the layer data amount, this implementation further ensures the overall accuracy of sequentially calculating the target network layers corresponding to each FPGA based on the layer data amount of each network layer and the on-chip memory capacity of each FPGA, thereby ensuring the reliability of the FPGA-based neural network operations.
Further, as a preferred implementation, counting the parameter amount and the process data amount of each network layer in the neural network model includes:

obtaining the parameter amount of each network layer based on the number of filters, the number of channels, and the convolution kernel size of that layer, and obtaining the process data amount based on the number of filters of that layer and the size of its intermediate data.

It should be noted that, in this implementation, the parameter amount of each network layer in the neural network model is calculated from the number of filters, the number of channels, and the convolution kernel size of that layer, and the process data amount of each layer is calculated from the number of filters of that layer and the size of its intermediate data. This implementation further ensures the overall accuracy of calculating the target network layers corresponding to each FPGA, thereby ensuring the reliability of the FPGA-based neural network operations.
On the basis of the technical solution of this implementation, specifically, when a network layer in the neural network model has C channels, N filters, and a convolution kernel whose length and width are both K (i.e., the kernel size is K*K), the parameter amount of that layer is C*K*K*N; when a network layer has N filters and its intermediate data has length W and width H (i.e., the intermediate data size is W*H), the process data amount of that layer is N*W*H.
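A minimal sketch of these two formulas follows; the helper names are hypothetical, and the results are element counts (multiplying by the element size in bytes would give the memory footprint, which is an assumption not stated above).

```python
def layer_param_amount(C, K, N):
    # parameter amount of a convolution layer: C * K * K * N
    return C * K * K * N

def layer_process_amount(N, W, H):
    # process data amount: N filters each producing a W*H intermediate map
    return N * W * H

# e.g. a layer with 64 channels, 3x3 kernels and 128 filters on a 56x56 feature map
print(layer_param_amount(64, 3, 128))     # 73728 parameters
print(layer_process_amount(128, 56, 56))  # 401408 intermediate values
```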
Based on the above series of implementations, as a preferred implementation, when the data type of the parameters in the neural network model is a floating-point type, before the neural network model is split into sub-models with corresponding data amounts, the method further includes:

converting the data type of the parameters in the neural network model from the floating-point type to a fixed-point type;

and splitting the neural network model into sub-models with corresponding data amounts includes:

splitting the neural network model whose parameter data types have been converted into sub-models with corresponding data amounts.
It should be noted that the focus of this embodiment is that, when the data type of the parameters in the neural network model is a floating-point type, the parameter data type is converted from floating point to fixed point before the neural network model is split into sub-models with corresponding data amounts. This relatively reduces the overall data volume of the neural network model, which in turn relatively reduces the overall data volume of the FPGA-based neural network operations and relatively improves their overall efficiency.
Further, converting the data type of the parameters in the neural network model from the floating-point type to the fixed-point type includes:

obtaining the maximum parameter value of each channel in the neural network model;

calculating the quantization coefficient of each channel according to its maximum parameter value; and

converting the data type of the parameters in the corresponding channel of the neural network model from the floating-point type to the fixed-point type based on each quantization coefficient.
It should be noted that, when converting the data type of the parameters in the neural network model from floating point to fixed point, this implementation calculates the quantization coefficient of each channel based on the maximum parameter value of that channel, and then uses each quantization coefficient as the conversion weight between the data types; that is, the data type of the parameters in the corresponding channel is converted from floating point to fixed point according to its quantization coefficient. By using the maximum parameter value of each channel as the factor that bounds the bit-width range of the fixed-point data when converting the parameters of that channel, this implementation further ensures the accuracy of the floating-point-to-fixed-point conversion.
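One possible reading of this per-channel conversion is sketched below, assuming (as an illustration only) a power-of-two quantization coefficient 2^m per channel and an 8-bit signed fixed-point target; the function names, the rounding scheme, and the choice of n_bits are all assumptions rather than the disclosed implementation.

```python
import numpy as np

def channel_quant_coeff(channel_params, n_bits=8):
    """Choose the largest m such that max|param| * 2**(m+1) stays below 2**(n_bits-1)."""
    max_val = np.max(np.abs(channel_params))
    m = 0
    while max_val * 2 ** (m + 1) < 2 ** (n_bits - 1):
        m += 1
    return m

def quantize_channel(channel_params, m):
    """Convert floating-point parameters to fixed-point integers using coefficient 2**m."""
    return np.round(np.asarray(channel_params) * 2 ** m).astype(np.int32)

weights = [0.73, -0.12, 0.05]      # one channel's floating-point parameters
m = channel_quant_coeff(weights)   # m = 7 for this channel under an 8-bit signed target
print(m, quantize_channel(weights, m))  # 7 [ 93 -15   6]
```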
The present implementation is described below through a specific scenario.

A neural network model consists of two parts: the parameters of the model, referred to here as weights, and the intermediate data processed inside the model, referred to here as activation values. Quantizing the neural network model means converting the model data from floating point to fixed point; the weights are first quantized to 2^n, and the activation values are then quantized.
Quantizing the weights requires selecting suitable weight thresholds in combination with retraining, so as to convert the parameters of the neural network model to a fixed-point (INT) type. Because quantizing all parameters at once would cause a slight drop in accuracy, the model needs to be quantized step by step, with the remaining parameters retrained after each step. The steps are as follows:
The weights are quantized to ±2^n, i.e., ±1, ±0.5, ±0.25, and so on. Since larger weights contribute more, quantization starts from the largest values: the top 25% of the weights are quantized first, their values being set to powers of two, and the remaining values are retrained so that accuracy does not drop; the quantized values are then kept fixed, the next 25% of the largest remaining weights are quantized, and the rest are retrained so that accuracy does not drop. This cycle is repeated until all weight values are powers of two. At this point all model weights are INT values, and the activation values are processed next.
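A rough sketch of this progressive quantization is given below under stated assumptions: the helpers nearest_power_of_two and progressive_quantize are hypothetical names, and the retraining step is only represented by a placeholder callback since no training procedure is specified above.

```python
import numpy as np

def nearest_power_of_two(x):
    """Map a nonzero value to the signed power of two closest to it in log space."""
    n = np.round(np.log2(np.abs(x)))
    return np.sign(x) * 2.0 ** n

def progressive_quantize(weights, retrain, steps=4):
    """Quantize the largest 25% of the still-free weights per step, retraining in between."""
    w = np.array(weights, dtype=float)
    frozen = np.zeros(w.shape, dtype=bool)
    for _ in range(steps):
        free = np.where(~frozen)[0]
        if free.size == 0:
            break
        k = max(1, free.size // 4)                  # top 25% of the still-free weights
        top = free[np.argsort(-np.abs(w[free]))[:k]]
        w[top] = nearest_power_of_two(w[top])       # snap the selected weights to +/-2^n
        frozen[top] = True
        w = retrain(w, frozen)                      # placeholder: retrain the unfrozen weights
    return w

quantized = progressive_quantize([0.9, -0.4, 0.26, 0.07],
                                 retrain=lambda w, frozen: w)  # no-op retraining for the demo
print(quantized)  # [1.0, -0.5, 0.25, 0.0625]
```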
The quantization of the activation values also requires determining their threshold range. Since the thresholds differ across tasks, the threshold range needs to be obtained in combination with the dataset. The details are as follows:

One image is selected from each category of the dataset and inference is run with the quantized network to obtain the maximum value of each element across all channels; this yields a table of activation values and category information, which serves as the basis for comparison in the next step.
The data from the previous step are compared channel by channel to obtain the maximum value of the elements in each channel; this value is multiplied by 2 to the power of m, ensuring that the product does not exceed 2^n - 1, and the value of m is recorded. This is the quantization coefficient of the activation values of that channel. Having obtained the quantization coefficient of each channel, the mean value of m is then computed per layer: for all channels of the layer whose coefficient is greater than the mean, the mean value m is used, and channels whose coefficient is smaller than the mean keep their original value. This quantization algorithm is efficient and feasible and can greatly reduce the data volume of the neural network model.
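The channel-wise search for the activation coefficient m can be sketched as follows. In practice the per-channel maxima would come from running inference on one image per category; here they are replaced by a hypothetical channel_max list, and the 8-bit target width is also an assumption.

```python
import numpy as np

def activation_coeffs(channel_max, n_bits=8):
    """For each channel, pick the largest m with channel_max * 2**m <= 2**n_bits - 1."""
    limit = 2 ** n_bits - 1
    coeffs = []
    for cmax in channel_max:
        m = 0
        while cmax * 2 ** (m + 1) <= limit:
            m += 1
        coeffs.append(m)
    return np.array(coeffs)

def clamp_to_layer_mean(coeffs):
    """Channels above the layer mean are pulled down to the mean; the rest keep their value."""
    mean_m = int(np.mean(coeffs))
    return np.minimum(coeffs, mean_m)

channel_max = [3.2, 0.8, 12.5, 1.6]   # hypothetical per-channel activation maxima
m = activation_coeffs(channel_max)
print(m, clamp_to_layer_mean(m))      # [6 8 4 7] [6 6 4 6]
```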
Referring to FIG. 3, an embodiment of the present application provides an FPGA-based neural network operation apparatus, including:

a model acquisition module 10, configured to obtain a neural network model;

a memory statistics module 11, configured to count the on-chip memory capacities corresponding to multiple FPGAs;

a model splitting module 12, configured to split the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each FPGA, where the data amount of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA;

a model allocation module 13, configured to allocate the sub-models to the on-chip memory of the corresponding FPGAs;

a model execution module 14, configured to set the data flow directions between the corresponding FPGAs according to the execution order of the sub-models, and to control each FPGA in turn, according to the execution order, to perform neural network operations based on its sub-model.
The FPGA-based neural network operation apparatus provided by the present application obtains a neural network model, counts the on-chip memory capacities corresponding to multiple FPGAs, and then splits the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each FPGA, where the data amount of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are then allocated to the on-chip memory of the corresponding FPGAs, the data flow directions between the corresponding FPGAs are set according to the execution order of the sub-models, and each FPGA is controlled in turn, according to the execution order, to perform neural network operations based on its sub-model. By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memory of the multiple FPGAs, the apparatus further avoids the problem that the rated resources of a single FPGA's on-chip memory are relatively limited, and further ensures the overall efficiency of the FPGA-based inference operations of the neural network model.
In addition, an embodiment of the present application further provides an FPGA-based neural network operation device, including:

a memory, configured to store a computer program; and

a processor, configured to implement the steps of the above FPGA-based neural network operation method when executing the computer program.
The FPGA-based neural network operation device provided by the present application obtains a neural network model, counts the on-chip memory capacities corresponding to multiple FPGAs, and then splits the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each FPGA, where the data amount of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are then allocated to the on-chip memory of the corresponding FPGAs, the data flow directions between the corresponding FPGAs are set according to the execution order of the sub-models, and each FPGA is controlled in turn, according to the execution order, to perform neural network operations based on its sub-model. By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memory of the multiple FPGAs, the device further avoids the problem that the rated resources of a single FPGA's on-chip memory are relatively limited, and further ensures the overall efficiency of the FPGA-based inference operations of the neural network model.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above FPGA-based neural network operation method.

With the computer-readable storage medium provided by the present application, a neural network model is obtained, the on-chip memory capacities corresponding to multiple FPGAs are counted, and the neural network model is then split into sub-models with corresponding data amounts according to the on-chip memory capacity of each FPGA, where the data amount of each sub-model is not greater than the on-chip memory capacity of the corresponding FPGA; the sub-models are then allocated to the on-chip memory of the corresponding FPGAs, the data flow directions between the corresponding FPGAs are set according to the execution order of the sub-models, and each FPGA is controlled in turn, according to the execution order, to perform neural network operations based on its sub-model. By having multiple FPGAs jointly provide on-chip memory resources, dividing the complete neural network model into multiple sub-models, and allocating the sub-models to the on-chip memory of the multiple FPGAs, the computer-readable storage medium further avoids the problem that the rated resources of a single FPGA's on-chip memory are relatively limited, and further ensures the overall efficiency of the FPGA-based inference operations of the neural network model.
The FPGA-based neural network operation method, apparatus, device, and storage medium provided by the present application have been described in detail above. The embodiments in the specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. As for the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant parts may be found in the description of the method. It should be noted that those of ordinary skill in the art may make further improvements and modifications to the present application without departing from the principles of the present application, and such improvements and modifications also fall within the protection scope of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

Claims (10)

  1. An FPGA-based neural network operation method, comprising:

    obtaining a neural network model;

    counting on-chip memory capacities corresponding to a plurality of FPGAs;

    splitting the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each of the FPGAs, wherein the data amount of each of the sub-models is not greater than the on-chip memory capacity of the corresponding FPGA;

    allocating the sub-models to the on-chip memory of the corresponding FPGAs; and

    setting data flow directions between the corresponding FPGAs according to an execution order of the sub-models, and controlling each of the FPGAs in turn, according to the execution order, to perform neural network operations based on the corresponding sub-model.
  2. The FPGA-based neural network operation method according to claim 1, wherein splitting the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each of the FPGAs comprises:

    counting a layer data amount of each network layer in the neural network model;

    calculating, in sequence, target network layers corresponding to each of the FPGAs based on the layer data amount of each of the network layers and the on-chip memory capacity of each of the FPGAs; and

    splitting the neural network model to obtain the sub-models corresponding to the target network layers.
  3. The FPGA-based neural network operation method according to claim 2, wherein the layer data amount comprises a parameter amount and a process data amount, the process data amount being an amount of data generated while the corresponding network layer performs neural network operations.
  4. The FPGA-based neural network operation method according to claim 3, wherein counting the parameter amount and the process data amount of each network layer in the neural network model comprises:

    obtaining the parameter amount based on the number of filters, the number of channels, and the convolution kernel size of each of the network layers in the neural network model, and obtaining the process data amount based on the number of filters and the size of intermediate data of each of the network layers.
  5. The FPGA-based neural network operation method according to claim 1, wherein before counting the on-chip memory capacities corresponding to the plurality of FPGAs, the method further comprises:

    determining whether a data amount of the neural network model is greater than the on-chip memory capacity of one FPGA; and

    if the data amount of the neural network model is greater than the on-chip memory capacity of one FPGA, executing the step of counting the on-chip memory capacities corresponding to the plurality of FPGAs.
  6. The FPGA-based neural network operation method according to any one of claims 1 to 5, wherein, when a data type of parameters in the neural network model is a floating-point type, before splitting the neural network model into sub-models with corresponding data amounts, the method further comprises:

    converting the data type of the parameters in the neural network model from the floating-point type to a fixed-point type;

    wherein splitting the neural network model into sub-models with corresponding data amounts comprises:

    splitting the neural network model whose parameter data types have been converted into the sub-models with corresponding data amounts.
  7. The FPGA-based neural network operation method according to claim 6, wherein converting the data type of the parameters in the neural network model from the floating-point type to the fixed-point type comprises:

    obtaining a maximum parameter value of each channel in the neural network model;

    calculating a quantization coefficient of each of the channels according to each of the maximum parameter values; and

    converting the data type of the parameters in the corresponding channel of the neural network model from the floating-point type to the fixed-point type based on each of the quantization coefficients.
  8. An FPGA-based neural network operation apparatus, comprising:

    a model acquisition module, configured to obtain a neural network model;

    a memory statistics module, configured to count on-chip memory capacities corresponding to a plurality of FPGAs;

    a model splitting module, configured to split the neural network model into sub-models with corresponding data amounts according to the on-chip memory capacity of each of the FPGAs, wherein the data amount of each of the sub-models is not greater than the on-chip memory capacity of the corresponding FPGA;

    a model allocation module, configured to allocate the sub-models to the on-chip memory of the corresponding FPGAs; and

    a model execution module, configured to set data flow directions between the corresponding FPGAs according to an execution order of the sub-models, and to control each of the FPGAs in turn, according to the execution order, to perform neural network operations based on the corresponding sub-model.
  9. An FPGA-based neural network operation device, comprising:

    a memory, configured to store a computer program; and

    a processor, configured to implement the steps of the FPGA-based neural network operation method according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the FPGA-based neural network operation method according to any one of claims 1 to 7.
PCT/CN2021/076835 2020-06-30 2021-02-19 Fpga-based neural network operation method, apparatus, and device WO2022001126A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010614610.6A CN111860810A (en) 2020-06-30 2020-06-30 Neural network operation method, device and equipment based on FPGA
CN202010614610.6 2020-06-30

Publications (1)

Publication Number Publication Date
WO2022001126A1 true WO2022001126A1 (en) 2022-01-06

Family

ID=72989688

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076835 WO2022001126A1 (en) 2020-06-30 2021-02-19 Fpga-based neural network operation method, apparatus, and device

Country Status (2)

Country Link
CN (1) CN111860810A (en)
WO (1) WO2022001126A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860810A (en) * 2020-06-30 2020-10-30 浪潮(北京)电子信息产业有限公司 Neural network operation method, device and equipment based on FPGA
CN114816752A (en) * 2022-04-26 2022-07-29 山东云海国创云计算装备产业创新中心有限公司 Memory management method, system, equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685202A (en) * 2018-12-17 2019-04-26 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic device
CN110413255A (en) * 2018-04-28 2019-11-05 北京深鉴智能科技有限公司 Artificial neural network method of adjustment and device
CN111274034A (en) * 2020-01-19 2020-06-12 北京奇艺世纪科技有限公司 Resource allocation method and device for model reasoning, computer equipment and storage medium
CN111860810A (en) * 2020-06-30 2020-10-30 浪潮(北京)电子信息产业有限公司 Neural network operation method, device and equipment based on FPGA

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717574B (en) * 2018-07-11 2023-07-07 杭州海康威视数字技术股份有限公司 Neural network operation method and device and heterogeneous intelligent chip
CN110209472B (en) * 2018-08-29 2023-04-07 腾讯科技(深圳)有限公司 Task data processing method and board card
US20200117978A1 (en) * 2018-10-12 2020-04-16 Alibaba Group Holding Limited Systems and methods for efficiently mapping neural networks to programmable logic devices
CN109993296B (en) * 2019-04-01 2020-12-29 安徽寒武纪信息科技有限公司 Quantitative implementation method and related product
CN111027669A (en) * 2019-10-21 2020-04-17 浙江省北大信息技术高等研究院 Method and device for realizing deep neural network on field programmable gate array

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413255A (en) * 2018-04-28 2019-11-05 北京深鉴智能科技有限公司 Artificial neural network method of adjustment and device
CN109685202A (en) * 2018-12-17 2019-04-26 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic device
CN111274034A (en) * 2020-01-19 2020-06-12 北京奇艺世纪科技有限公司 Resource allocation method and device for model reasoning, computer equipment and storage medium
CN111860810A (en) * 2020-06-30 2020-10-30 浪潮(北京)电子信息产业有限公司 Neural network operation method, device and equipment based on FPGA

Also Published As

Publication number Publication date
CN111860810A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN108280514B (en) FPGA-based sparse neural network acceleration system and design method
US10915816B2 (en) System and method of executing neural networks
WO2022001126A1 (en) Fpga-based neural network operation method, apparatus, and device
CN108572873B (en) Load balancing method and device for solving Spark data inclination problem
CN110058883B (en) CNN acceleration method and system based on OPU
CN107832839B (en) Method and apparatus for performing operations in convolutional neural networks
US20180113649A1 (en) Data processing using resistive memory arrays
US11755683B2 (en) Flexible accelerator for sparse tensors (FAST) in machine learning
WO2020133317A1 (en) Computing resource allocation technology and neural network system
CN105607952B (en) Method and device for scheduling virtualized resources
US11797830B2 (en) Flexible accelerator for sparse tensors in convolutional neural networks
US20210326687A1 (en) Neural Network System and Data Processing Technology
WO2024007849A1 (en) Distributed training container scheduling for intelligent computing
CN113515351A (en) Resource scheduling implementation method based on energy consumption and QoS (quality of service) cooperative optimization
WO2022110860A1 (en) Hardware environment-based data operation method, apparatus and device, and storage medium
CN110874625A (en) Deep neural network quantification method and device
CN110069284B (en) Compiling method and compiler based on OPU instruction set
CN116720549A (en) FPGA multi-core two-dimensional convolution acceleration optimization method based on CNN input full cache
CN111176637A (en) Schedulability analysis method of AADL model based on cache preemption delay constraint
CN112561049B (en) Resource allocation method and device of DNN accelerator based on memristor
CN112183744A (en) Neural network pruning method and device
CN115689062B (en) Photovoltaic output power prediction method based on rapid online migration neural network
CN115712506A (en) Resource allocation method and accelerator
CN113723538B (en) Cross-platform power consumption performance prediction method and system based on hierarchical migration learning
CN112989270A (en) Convolution calculating device based on hybrid parallel

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21834363

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21834363

Country of ref document: EP

Kind code of ref document: A1