CN111738423A - Method and device for compiling neural network model, storage medium and electronic equipment - Google Patents
- Publication number
- CN111738423A (application CN202010601610.2A)
- Authority
- CN
- China
- Prior art keywords
- feasible
- feature map
- parameter
- neural network
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
Embodiments of the invention provide a method and device for compiling a neural network model, a storage medium, and electronic equipment, relating to the field of deep learning. The method comprises the following steps: acquiring original feature map parameters of each convolution layer in the neural network model; splitting the original feature map parameters according to an input-output parameter relational expression and a memory capacity to obtain a feasible feature map parameter set for each convolution layer; determining, from the feasible feature map parameter set of each convolution layer, a corresponding target feature map parameter with the highest data transfer efficiency; and generating an executable file for the neural network model according to the target feature map parameters corresponding to each convolution layer. By finding the parameter with the highest data transfer efficiency for each convolution layer individually, the overall data reuse rate of the neural network model is improved, the computation amount of the corresponding executable file at run time is reduced, and the running efficiency is improved.
Description
Technical Field
The application relates to the field of deep learning, in particular to a compiling method and device of a neural network model, a storage medium and electronic equipment.
Background
Deep learning enables machines to simulate human activities such as seeing, hearing, and thinking, solves many complex pattern recognition problems, and has made great progress in technical fields such as natural language processing, image recognition, speech recognition, data mining, and personalized recommendation.
How to construct a neural network model and how to compile the constructed model are core links of deep learning. At present, when a constructed neural network model is compiled into an executable file, the large parameter count of the model causes the compiled executable file to perform a large amount of computation at run time and to run inefficiently. The compilation process of the neural network model therefore needs to be optimized to improve the execution efficiency of the compiled executable file, but current optimization methods are relatively simple, and the running efficiency of the compiled executable file remains low.
Disclosure of Invention
The objects of the present application include, for example, providing a compiling method and device for a neural network model, a storage medium, and electronic equipment, which can compile the neural network model into an executable file, reduce the computation amount of the executable file at run time, and improve its running efficiency.
The embodiment of the application can be realized as follows:
In a first aspect, an embodiment provides a method for compiling a neural network model, the method comprising: acquiring original feature map parameters of each convolution layer in the neural network model; splitting the original feature map parameters according to an input-output parameter relational expression and a memory capacity to obtain a feasible feature map parameter set for each convolution layer; determining a corresponding target feature map parameter for each convolution layer from the feasible feature map parameter set of that convolution layer, where the target feature map parameter has the highest corresponding data transfer efficiency; and generating an executable file for the neural network model according to the target feature map parameter corresponding to each convolution layer.
In an alternative embodiment, the original feature map parameters include original output feature map parameters, and the step of splitting the original feature map parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible feature map parameter set for each convolution layer comprises: splitting the original output feature map parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible output feature map parameter set for each convolution layer; determining a feasible input feature map parameter set for each convolution layer according to the feasible output feature map parameter set of that convolution layer and the input-output parameter relational expression; and determining the feasible feature map parameter set of each convolution layer according to its feasible output feature map parameter set and feasible input feature map parameter set.
In an optional embodiment, the step of determining a corresponding target feature map parameter for each convolutional layer from the feasible feature map parameter set of that layer comprises: determining the data repeated-loading amount corresponding to each feasible feature map parameter according to the value of each feasible feature map parameter in the set; and taking the feasible feature map parameter with the minimum data repeated-loading amount as the target feature map parameter corresponding to that convolutional layer.
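As a minimal illustration of this selection step — using a hypothetical cost model with made-up field names, and the simplifying assumption that each pass over a group of output channels re-reads the whole input feature map while each spatial tile re-reads the whole weight set — picking the target feature map parameter could look like:

```python
import math

def repeated_load(tile, layer):
    """Estimate the total data loaded for one feasible output tile
    (illustrative cost model; 'layer' holds original sizes)."""
    oc, oh, ow = tile
    n_c = math.ceil(layer["s_ouf_c"] / oc)   # tiles along output channels
    n_h = math.ceil(layer["s_ouf_h"] / oh)   # tiles along height
    n_w = math.ceil(layer["s_ouf_w"] / ow)   # tiles along width
    inf_size = layer["inf_c"] * layer["inf_h"] * layer["inf_w"]
    wt_size = layer["s_ouf_c"] * layer["inf_c"] * layer["wt_h"] * layer["wt_w"]
    # Each channel-tile pass re-reads the input; each spatial tile re-reads weights.
    return n_c * inf_size + n_h * n_w * wt_size

def pick_target(feasible, layer):
    """Return the feasible tile with the minimum data repeated-loading amount."""
    return min(feasible, key=lambda t: repeated_load(t, layer))
```

Under this model, a layer with a small feature map and many weights favors tiles that keep the weights resident, matching the multiplexing trade-off discussed in the background section.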
In an optional embodiment, the neural network model includes a plurality of convolutional layers, and the step of determining a corresponding target feature map parameter for each convolutional layer from its feasible feature map parameter set comprises: determining the data repeated-loading amount corresponding to each feasible feature map parameter according to the value of each feasible feature map parameter in the feasible feature map parameter set of a target convolutional layer; determining an optimal feature map parameter from the feasible feature map parameter set of the target convolutional layer according to a preset number of processing cores and the data repeated-loading amounts, where the target convolutional layer is any one of the plurality of convolutional layers, and the optimal feature map parameter is the feasible feature map parameter with the highest data transfer efficiency in the set; and taking the optimal feature map parameter as the target feature map parameter corresponding to the target convolutional layer.
In an optional embodiment, the step of generating an executable file for the neural network model according to the target feature map parameters corresponding to each convolutional layer comprises: generating a three-dimensional Direct Memory Access (DMA) data transfer instruction according to the target feature map parameter corresponding to each convolution layer, and generating the executable file of the neural network model according to the three-dimensional DMA data transfer instructions.
In an alternative embodiment, the method further comprises: executing the executable file to implement the data processing function of the neural network model.
In a second aspect, an embodiment provides a compiling apparatus for a neural network model, comprising: an acquisition module configured to acquire the original feature map parameters of each convolution layer in the neural network model; a splitting module configured to split the original feature map parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible feature map parameter set for each convolution layer, the splitting module being further configured to determine, from the feasible feature map parameter set of each convolution layer, a corresponding target feature map parameter with the highest data transfer efficiency; and a generating module configured to generate an executable file for the neural network model according to the target feature map parameter corresponding to each convolution layer.
In an alternative embodiment, the original feature map parameters include original output feature map parameters; the splitting module is configured to split the original output feature map parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible output feature map parameter set for each convolution layer; the splitting module is further configured to determine a feasible input feature map parameter set for each convolution layer according to the feasible output feature map parameter set and the input-output parameter relational expression; and the splitting module is further configured to determine the feasible feature map parameter set of each convolution layer according to its feasible output and feasible input feature map parameter sets.
In an optional embodiment, the splitting module is configured to determine, according to the value of each feasible feature map parameter in the feasible feature map parameter set of each convolutional layer, the data repeated-loading amount corresponding to each feasible feature map parameter; the splitting module is further configured to take the feasible feature map parameter with the minimum data repeated-loading amount as the target feature map parameter corresponding to that convolutional layer.
In an optional embodiment, the neural network model includes a plurality of convolutional layers, and the splitting module is configured to determine, according to the value of each feasible feature map parameter in the feasible feature map parameter set of a target convolutional layer, the data repeated-loading amount corresponding to each feasible feature map parameter; the splitting module is further configured to determine an optimal feature map parameter from the feasible feature map parameter set of the target convolutional layer according to the preset number of processing cores and the data repeated-loading amounts, where the target convolutional layer is any one of the plurality of convolutional layers, and the optimal feature map parameter is the feasible feature map parameter with the highest data transfer efficiency in the set; the splitting module is further configured to take the optimal feature map parameter as the target feature map parameter corresponding to the target convolutional layer.
In an optional embodiment, the generating module is configured to generate a three-dimensional DMA data transfer instruction according to a target feature map parameter corresponding to each convolutional layer, and generate an executable file of the neural network model according to the three-dimensional DMA data transfer instruction.
In an optional embodiment, the apparatus further comprises an execution module, configured to execute the executable file to implement the data processing function of the neural network model.
In a third aspect, embodiments provide a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements a method of compiling a neural network model according to any one of the preceding embodiments.
In a fourth aspect, an embodiment provides an electronic device, including a processor and a memory, where the memory stores machine-readable instructions, and the processor is configured to execute the machine-readable instructions to implement the method for compiling a neural network model according to any one of the foregoing embodiments.
The beneficial effects of the embodiments of the application include, for example: according to the compiling method of the neural network model, the original feature map parameters of each convolution layer in the neural network model can be split, and a corresponding target feature map parameter with the highest data transfer efficiency is found for each convolution layer, thereby improving the overall data reuse rate of the neural network model, reducing the computation amount of the corresponding executable file at run time, and improving the running efficiency of the executable file.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for compiling a neural network model according to an embodiment of the present disclosure;
fig. 3 is a flowchart of S110 in the method for compiling a neural network model according to the embodiment of the present application;
fig. 4 is a flowchart of S120 in the method for compiling a neural network model according to the embodiment of the present application;
fig. 5 is another flowchart of S120 in the method for compiling a neural network model according to the embodiment of the present application;
FIG. 6 is another flowchart of a method for compiling a neural network model provided in the embodiments of the present application;
fig. 7 is a functional block diagram of a compiling apparatus of a neural network model according to an embodiment of the present application.
Icon: 100-an electronic device; 110-a memory; 120-a processor; 130-a bus; 140-a communication interface; 200-compiling apparatus of the neural network model; 210-an acquisition module; 220-a splitting module; 230-a generating module; 240-an execution module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
It should be noted that the features of the embodiments of the present application may be combined with each other without conflict.
In the process of implementing the technical solution of the embodiment of the present application, the inventors of the present application find that:
At present, a constructed neural network model is usually compiled into an executable file by a compiler. Because the quantity of convolution layer parameters in the neural network model is large, if the model is compiled directly without optimization, the compiled executable file performs a large amount of computation at run time and runs inefficiently. Therefore, the compiler often needs to optimize the convolution operations by splitting the feature map (splitting an operation in a convolution layer into a plurality of sub-operations that the processor computes independently) and by multiplexing the weights or the feature maps (data multiplexing methods).
However, current feature map splitting methods are usually implemented only at the hardware level, where the splitting and data multiplexing method is fixed. For a neural network model comprising a plurality of convolutional layers, the prior art can only multiplex the weights for all of the model's convolutional layers, or only multiplex the feature maps for all of them. Yet for the convolution operations of different convolutional layers, different feature map splitting methods and different data multiplexing methods affect the computation amount of the layer differently. When the feature map data volume is large and the weight data volume is small, multiplexing the feature map and repeatedly loading the weights reduces the overall data loading volume and the convolution computation of the convolutional layer; when the feature map is small and the weight data volume is large, multiplexing the weights and repeatedly loading the feature map reduces the overall data loading volume and the convolution computation of the convolutional layer.
In a common neural network model, the feature map gradually becomes smaller as the convolutional layers execute, so both cases can occur within one model. If only weight multiplexing, or only feature map multiplexing, is applied to all of the model's convolutional layers, the overall data loading volume of the convolutional layers cannot be minimized. That is, the current optimization methods are simple, and the running efficiency of the compiled executable file is still very low.
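A toy calculation (with made-up sizes in KB, not taken from the application) illustrates why a single fixed multiplexing strategy cannot suit both early and late layers:

```python
def total_load(fmap, weights, n_tiles, reuse):
    """Total data loaded when the kept operand stays resident in SRAM and
    the other operand is re-loaded once per tile (simplified model)."""
    if reuse == "feature":          # feature map resident, weights re-loaded
        return fmap + weights * n_tiles
    else:                           # weights resident, feature map re-loaded
        return fmap * n_tiles + weights

# Early layer: large feature map (1024 KB), small weights (16 KB), 8 tiles.
early_feat = total_load(1024, 16, 8, "feature")   # 1024 + 16*8
early_wt   = total_load(1024, 16, 8, "weight")    # 1024*8 + 16

# Late layer: small feature map (16 KB), large weights (1024 KB), 8 tiles.
late_feat = total_load(16, 1024, 8, "feature")    # 16 + 1024*8
late_wt   = total_load(16, 1024, 8, "weight")     # 16*8 + 1024
```

With these numbers, multiplexing the feature map is roughly 7x cheaper for the early layer, while multiplexing the weights is roughly 7x cheaper for the late layer, so a per-layer choice beats any fixed strategy.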
Therefore, in order to remedy the above drawbacks, embodiments of the present application provide a method and an apparatus for compiling a neural network model, a storage medium, and an electronic device, which can compile the neural network model into an executable file, reduce the computation amount of the executable file at run time, and improve its running efficiency. It should be noted that the defects of the above prior-art solutions are results obtained after the inventors' practice and careful study; therefore, the discovery process of the above problems, and the solutions proposed below by the embodiments of the present application, should be regarded as the inventors' contribution to the present application.
Referring to fig. 1, a block diagram of an electronic device 100 according to an embodiment of the present disclosure is shown. The electronic device 100 may include a memory 110, a processor 120, a bus 130, and a communication interface 140, the memory 110, the processor 120, and the communication interface 140 being electrically connected to each other, directly or indirectly, to enable transmission or interaction of data. For example, the components may be electrically connected to each other via one or more buses 130 or signal lines. Processor 120 may process information and/or data related to the compilation of the neural network model to perform one or more of the functions described herein. For example, the processor 120 may obtain an original feature map parameter of each convolution layer in the neural network model, and compile the neural network model according to the data, thereby implementing the compiling method of the neural network model provided by the present application.
The Memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 120 may be an integrated circuit chip having signal processing capabilities. The processor 120 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It will be appreciated that the configuration shown in FIG. 1 is merely illustrative and that the electronic device 100 may include more or fewer components than shown in FIG. 1 or may have a different configuration than shown in FIG. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof. For example, the electronic device 100 may be a neural network accelerator, a server, a computer, a mobile phone, a tablet, a cloud platform, etc., and therefore, the present application is not limited to the specific type of the electronic device 100.
For convenience of understanding, the following embodiments of the present application will specifically describe a compiling method of a neural network model provided in the embodiments of the present application by taking the electronic device 100 shown in fig. 1 as an example, and referring to the drawings. Referring to fig. 2, fig. 2 is a flowchart illustrating a method for compiling a neural network model according to an embodiment of the present disclosure. The compiling method of the neural network model may be applied to the electronic device 100 described above, and the compiling method of the neural network model may include the following steps:
s100, acquiring the original characteristic diagram parameters of each convolution layer in the neural network model.
In some possible embodiments, the electronic device 100 may acquire the pre-established neural network model from a storage medium of another device (e.g., a background server, a cloud server, or the like), or may also acquire the pre-established neural network model from a storage medium of the electronic device, and therefore, the present application does not limit the manner of acquiring the neural network model.
After obtaining the neural network model, the electronic device 100 may calculate the original feature map parameters (e.g., the input feature map parameters and the output feature map parameters of the convolutional layers) of each convolutional layer in the neural network model according to the global input feature map size and the operator calculation parameters (e.g., stride parameter, kernel size parameter, etc.) of the neural network model.
In addition, the raw feature map parameters of each convolution layer in the neural network model may also be pre-stored in the storage medium of the electronic device 100, and when "acquiring the raw feature map parameters of each convolution layer in the neural network model", it is only necessary to acquire the raw feature map parameters of each convolution layer in the pre-stored neural network model from the storage medium.
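As a sketch of how the original feature map parameters could be derived from the global input feature map size and the operator calculation parameters (the function names and tuple layout here are illustrative, using the standard convolution output-size formula):

```python
def conv_out_hw(in_hw, kernel, stride, pad):
    """Standard convolution output size: floor((in + 2*pad - k) / s) + 1."""
    return (in_hw + 2 * pad - kernel) // stride + 1

def propagate_shapes(global_input, layers):
    """Derive each conv layer's original input/output feature map parameters
    from the global input shape and per-layer operator parameters.

    global_input: (channels, height, width)
    layers: list of (out_channels, kernel, stride, pad) tuples
    """
    c, h, w = global_input
    shapes = []
    for n, k, s, pad in layers:
        oh = conv_out_hw(h, k, s, pad)
        ow = conv_out_hw(w, k, s, pad)
        shapes.append({"inf": (c, h, w), "ouf": (n, oh, ow)})
        c, h, w = n, oh, ow  # this layer's output feeds the next layer
    return shapes
```

For example, a 3x224x224 input through a 7x7/stride-2/pad-3 layer yields a 112x112 output feature map, which then becomes the next layer's input.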
And S110, splitting the original characteristic diagram parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible characteristic diagram parameter set of each convolution layer.
In some possible embodiments, since when the neural network model is compiled into an executable file, the input, output, and weight sub-blocks of the executable file all need to be loaded into an SRAM (Static Random-Access Memory) space for storage, the above Memory capacity may be a storage capacity of an SRAM of the electronic device 100. Of course, in other possible embodiments, the above-mentioned memory capacity may also be a preset capacity or a partial capacity in the SRAM storage capacity, which is not limited in this application.
According to the basic principle of convolution calculation, there is a fixed correspondence between the input feature map parameters and the output feature map parameters of each convolutional layer of the neural network model; the input-output parameter relational expression is consistent with this fixed correspondence (that is, the fixed correspondence may serve as the input-output parameter relational expression). Therefore, after obtaining the original feature map parameters of each convolution layer in the neural network model, the electronic device 100 may split them according to the input-output parameter relational expression and the memory capacity to obtain a feasible feature map parameter set for each convolution layer.
The original feature map parameters may include original input feature map parameters and original output feature map parameters. When splitting the original feature map parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible feature map parameter set for each convolution layer, the splitting may be based on the original input feature map parameters, on the original output feature map parameters, or on both at the same time; the present application therefore does not limit the specific splitting mode.
In order to further improve the operation efficiency of the method provided by the present application, in some possible embodiments, the original feature map parameters in the present application may include original output feature map parameters, and for how to "split the original feature map parameters according to the input-output parameter relation and the memory capacity to obtain a feasible feature map parameter set for each convolutional layer", on the basis of fig. 2, please refer to fig. 3, S110 may include:
S110A, splitting the original output characteristic diagram parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible output characteristic diagram parameter set of each convolution layer.
In some possible embodiments, the original input feature map parameters included in the original feature map parameters may comprise: the channel dimension parameter inf_c, height dimension parameter inf_h, and width dimension parameter inf_w of the input feature map. The original output feature map parameters may comprise: the channel dimension parameter ouf_c, height dimension parameter ouf_h, and width dimension parameter ouf_w of the output feature map. The original feature map parameters may further include the weight parameters: the weight channel parameter wt_c, weight dimension parameter wt_n, weight height parameter wt_h, and weight width parameter wt_w.
In the basic principle of convolution calculation, the channel dimension parameter ouf _ c of the output feature map and the weight dimension parameter wt _ n of the weight parameter have a corresponding fixed relationship, and the height dimension parameter ouf _ h and the width dimension parameter ouf _ w of the output feature map and the height dimension parameter inf _ h and the width dimension parameter inf _ w of the input feature map have a corresponding fixed relationship, respectively. From these fixed relationships, the following equation 1 can be derived:
the method comprises the following steps that pad is the zero padding number of convolution operation, s is a convolution stepping parameter, and both pad and s are parameter values analyzed according to a neural network model, namely, pad and s can be regarded as preset parameter values;
Furthermore, since the input, output, and weight sub-blocks of the executable file generated by compiling all need to be loaded into the SRAM space for storage, the maximum size of each sub-block needs to be smaller than the corresponding SRAM capacity. Assuming that the SRAM capacities for the input feature map, the weights, and the output feature map of the accelerator are inf_L, wt_L, and ouf_L respectively, the sub-block sizes need to satisfy the following formula 2:
inf_c*inf_h*inf_w < inf_L; wt_n*wt_c*wt_h*wt_w < wt_L; ouf_c*ouf_h*ouf_w < ouf_L. (Formula 2)
in particular, when the input feature map, the weight, and the output feature map share the capacity of the accelerator SRAM, assuming that the capacity of the accelerator SRAM is L, the above equation 2 may be:
inf_c*inf_h*inf_w+ouf_c*ouf_h*ouf_w+wt_n*wt_c*wt_h*wt_w<L;
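A sketch of this shared-capacity check for one candidate output sub-block (an illustrative function, combining the inequality above with the relations of equation 1 and the substitutions wt_n = ouf_c, wt_c = inf_c):

```python
def fits_in_sram(tile, inf_c, wt_h, wt_w, s, pad, L):
    """Check whether the input, output, and weight sub-blocks for one
    candidate output tile fit together in a shared SRAM of capacity L."""
    oc, oh, ow = tile
    inf_h = (oh - 1) * s + wt_h - 2 * pad        # equation 1
    inf_w = (ow - 1) * s + wt_w - 2 * pad
    inf_sz = inf_c * inf_h * inf_w               # input sub-block
    ouf_sz = oc * oh * ow                        # output sub-block
    wt_sz = oc * inf_c * wt_h * wt_w             # weights: wt_n = ouf_c, wt_c = inf_c
    return inf_sz + ouf_sz + wt_sz < L
```

The per-buffer variant with separate capacities inf_L, wt_L, and ouf_L would simply check the three terms against their own limits instead of their sum.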
For convolution calculation, the width and height of the convolution input feature map must be larger than the weight width and height, and the sub-block size of the output feature map must be smaller than or equal to the original size. Therefore, assuming the original size of the output feature map is s_ouf_c * s_ouf_h * s_ouf_w (i.e., the original channel, height, and width dimensions of the output feature map are s_ouf_c, s_ouf_h, and s_ouf_w respectively), the following formula 3 can be obtained:
inf_h > wt_h; inf_w > wt_w; 0 < ouf_c <= s_ouf_c; 0 < ouf_h <= s_ouf_h; 0 < ouf_w <= s_ouf_w. (Formula 3)
further, by combining the above equations 1, 2, and 3, an input/output parameter relation can be obtained as follows:
To guarantee the integrity of the accumulation that computes one output point, that is, to avoid splitting that accumulation across several calculations and having to add an intermediate data buffer, the unknowns inf_c, wt_h and wt_w in the above formula all keep their original values (i.e., inf_c, wt_h and wt_w are known quantities). It follows that only ouf_c, ouf_h and ouf_w in the above formula are unknowns, and their upper and lower bounds are fixed. According to the input-output parameter relational expression and the memory capacity L, ouf_c, ouf_h and ouf_w are treated as unknowns and their possible values are traversed, yielding the various splitting size combinations of the output feature map (i.e., the feasible output feature map parameter set of each convolution layer).
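The traversal just described can be sketched as follows. The helper name and the simplified size expression are assumptions; as stated above, inf_c, wt_h and wt_w are held at their original values, and the weight sub-block dimensions follow the fixed correspondences (wt_n from ouf_c, wt_c from inf_c):

```python
from itertools import product

def feasible_output_splits(s_ouf_c, s_ouf_h, s_ouf_w,
                           inf_c, wt_h, wt_w, s, pad, L):
    """Traverse candidate (ouf_c, ouf_h, ouf_w) sub-block sizes and keep
    those whose input/output/weight sub-blocks fit together in a shared
    SRAM of capacity L elements (hypothetical helper sketching S110A)."""
    feasible = []
    for ouf_c, ouf_h, ouf_w in product(range(1, s_ouf_c + 1),
                                       range(1, s_ouf_h + 1),
                                       range(1, s_ouf_w + 1)):
        # Invert the convolution relation to get the input sub-block size.
        inf_h = (ouf_h - 1) * s + wt_h - 2 * pad
        inf_w = (ouf_w - 1) * s + wt_w - 2 * pad
        if inf_h < wt_h or inf_w < wt_w:
            continue  # the input sub-block must cover the kernel
        size = (inf_c * inf_h * inf_w       # input sub-block
                + ouf_c * ouf_h * ouf_w     # output sub-block
                + ouf_c * inf_c * wt_h * wt_w)  # weight sub-block
        if size < L:
            feasible.append((ouf_c, ouf_h, ouf_w))
    return feasible
```

With a generous capacity every combination is feasible; with a tight capacity only the smaller sub-blocks survive.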
It should be understood that S110A can be regarded as the first step of "splitting the original feature map parameters based on the original output feature map parameters to obtain a feasible feature map parameter set for each convolutional layer". Since splitting based on the original output feature map parameters avoids splitting the accumulation process of calculating one output point into several calculations and adding an intermediate data cache (i.e., S110A effectively avoids caching intermediate results), it improves the operating efficiency of the method provided by the present application.
S110B, determining the feasible input characteristic map parameter set of each convolution layer according to the feasible output characteristic map parameter set of each convolution layer and the input-output parameter relational expression.
After the feasible output feature map parameter set of each convolution layer is obtained, the feasible input feature map parameters and the feasible weight parameters corresponding to each split size combination can be calculated (that is, the feasible input feature map parameter set of each convolution layer is determined) according to the feasible output feature map parameter set of each convolution layer and by combining the formula 1.
S110C, determining the feasible feature map parameter set of each convolution layer according to the feasible output feature map parameter set and the feasible input feature map parameter set of each convolution layer.
After the feasible output characteristic diagram parameter set and the feasible input characteristic diagram parameter set of each convolution layer are determined, the feasible output characteristic diagram parameter set and the feasible input characteristic diagram parameter set of each convolution layer are respectively merged, and then the feasible characteristic diagram parameter set of each convolution layer can be determined.
Referring to fig. 2 again, in S120, a corresponding target feature map parameter is determined for each convolution layer from the feasible feature map parameter set of each convolution layer; the data transfer efficiency corresponding to the target characteristic diagram parameters is highest.
In the present embodiment, after performing S100 to S110, each convolution layer corresponds to one feasible feature map parameter set, and each feasible feature map parameter set may include a plurality of feasible feature map parameters. In order to reduce the computation amount of the executable file compiled by the neural network model during the operation and improve the operation efficiency of the executable file, for a certain feasible feature map parameter set, the feasible feature map parameter with the highest data transfer efficiency in a plurality of feasible feature map parameters included in the feasible feature map parameter set can be calculated and used as the target feature map parameter corresponding to the feasible feature map parameter set. By analogy, the above operation can be performed on each feasible feature map parameter set, and finally, a corresponding target feature map parameter is determined for each convolution layer from the feasible feature map parameter set of each convolution layer.
It should be understood that, since the present application can find a corresponding target feature map parameter with the highest data transfer efficiency for each convolutional layer, for a neural network model comprising a plurality of convolutional layers the present application can find the weight-multiplexing or feature-map-multiplexing mode with the highest data transfer efficiency (i.e., data multiplexing rate) for each convolutional layer. That is to say, the present application can find the best data multiplexing mode for each convolutional layer of the neural network model, thereby improving the overall data multiplexing rate of the model, reducing the operation amount of the corresponding executable file at runtime, and improving its operating efficiency (that is, convolution calculations of different sizes can each be given the best splitting combination and data multiplexing mode, so the overall data multiplexing rate of the network is improved and inference efficiency is improved). In other words, the method can calculate the optimal splitting size for a convolution calculation of any size and flexibly split different types of convolution calculations within the same network, improving the overall data reuse rate of the network and thus indirectly improving network inference efficiency.
Therefore, the method provided by the application can reduce the operation amount of the executable file compiled by the neural network model during operation, and improve the operation efficiency of the executable file. In addition, the method provided by the application is flexible to realize, the supported convolution calculation size is not limited by the hardware support size, and the method can support large-size convolution operation.
In some possible embodiments, for how to determine a corresponding target feature map parameter for each convolutional layer from the feasible feature map parameter set of each convolutional layer, referring to fig. 4, S120 may include, based on fig. 2:
S120A, determining the data repetitive loading quantity corresponding to each feasible feature map parameter according to the value of each feasible feature map parameter in the feasible feature map parameter set of each convolutional layer.
In some possible embodiments, after the various splitting size combinations of the feasible output feature map parameter set, the feasible input feature map parameter set and the feasible weight parameters are obtained (that is, after the feasible feature map parameter set of each convolution layer is obtained), note that when no splitting is performed, the input feature map, the weights and the output feature map each undergo only one DDR (Double Data Rate SDRAM) to SRAM (or SRAM to DDR) data transmission; any loading of a whole data portion beyond one time is counted as repetition. When splitting is performed, the data repeat load amount is calculated differently depending on the splitting size combination, as follows:
in the calculation method 1, when only the output channel is split (that is, the channel dimension parameter inf _ c, the height dimension parameter inf _ h, and the width dimension parameter inf _ w of the input feature map, the channel dimension parameter ouf _ c, the height dimension parameter ouf _ h, and the width dimension parameter ouf _ w of the output feature map are kept unchanged while the weight channel parameter wt _ c, the weight dimension parameter wt _ n, the weight height parameter wt _ h, and the weight width parameter wt _ w are split), the feature map is preferably multiplexed on the plurality of convolutional layers of the neural network model. In this case, although the weight needs to be loaded a plurality of times, the weights loaded a plurality of times are independent of each other, and there is no duplication. Further, in the case of a multiplexed signature, all data need only be loaded once. Therefore, when only the output channel is divided, the data is repeatedly loaded by the amount F1Is zero.
In calculation method 2, when both the output width and the output height are split (that is, inf_c, inf_h and inf_w of the input feature map and ouf_c, ouf_h and ouf_w of the output feature map are split, while the weight parameters wt_c, wt_n, wt_h and wt_w are kept unchanged), weight multiplexing is the better choice for the convolutional layers of the neural network model. Similar to calculation method 1, the feature map needs to be loaded multiple times while the weights need to be loaded only once. However, since the width and height are split, the scanning characteristic of convolution introduces fixed repeated data into the input feature map; in this case the repeated data amount F2 can be calculated using the following formula:
where s is the convolution stride parameter, h_num is the number of sub-blocks into which the height (H) dimension of the input feature map is split, and w_num is the number of sub-blocks into which the width (W) dimension of the input feature map is split.
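The F2 formula itself is shown only as an image in the source; a hedged reconstruction, based on the overlap ("halo") that adjacent input sub-blocks share under the convolution scanning characteristic, might look like:

```python
def repeat_load_f2(h_num, w_num, inf_c, inf_h, inf_w, wt_h, wt_w, s):
    """Hedged reconstruction of F2 (the source shows it only as an image):
    with stride s and kernel wt_h x wt_w, adjacent input sub-blocks overlap
    by (wt - s) rows/columns, so the repeated data is the total size of the
    overlap bands along the H and W dimensions."""
    overlap_h = max(wt_h - s, 0)   # rows shared by vertically adjacent blocks
    overlap_w = max(wt_w - s, 0)   # columns shared by horizontally adjacent blocks
    return inf_c * ((h_num - 1) * overlap_h * inf_w
                    + (w_num - 1) * overlap_w * inf_h)
```

Note that when the stride is at least the kernel size the sub-blocks do not overlap and the reconstruction yields zero, matching the intuition that no input data is re-scanned.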
In calculation method 3, when the output width, height and channel are split simultaneously (that is, inf_c, inf_h and inf_w of the input feature map, ouf_c, ouf_h and ouf_w of the output feature map, and the weight parameters wt_c, wt_n, wt_h and wt_w are all split), one sub-feature map needs to be convolved with a plurality of weight sub-blocks, and one weight sub-block likewise needs to be convolved with a plurality of sub-feature maps. In this case, if the feature map is multiplexed, the feature map is loaded only once but the feature map repeat load caused by the splitting must still be counted, and the weights need to be loaded repeatedly h_num*w_num-1 times in their entirety, so the data repeat amount formula is:
F3 = (h_num*w_num - 1)*ouf_c*inf_c*wt_h*wt_w + F2;
If the weights are multiplexed instead, the weights are loaded only once, while the feature map needs to be repeatedly loaded (wt_num-1) times in its entirety, and the data repeat amount formula is:
F′3 = (wt_num - 1)*inf_c*inf_h*inf_w + wt_num*F2;
and wt _ num is the number of subblocks split by the weight in the N dimension, and is repeated data generated by splitting the W dimension and the H dimension of the input characteristic diagram.
And S120B, taking the feasible feature map parameter with the minimum data repetitive loading amount in the feasible feature map parameter set of each convolution layer as the target feature map parameter corresponding to each convolution layer.
In other possible embodiments, in order to further improve the operation efficiency of the method provided in the present application, the neural network model includes a plurality of convolutional layers, and for how to "determine a corresponding target feature map parameter for each convolutional layer from the set of feasible feature map parameters of each convolutional layer", with reference to fig. 5, S120 may include, on the basis of fig. 2:
s120a, determining the data repetitive load quantity corresponding to each feasible feature map parameter according to the value of each feasible feature map parameter in the feasible feature map parameter set of the target convolutional layer.
In this embodiment, as to how to determine the data repeat load amount corresponding to each feasible feature map parameter according to the value of each feasible feature map parameter in the feasible feature map parameter set of the target convolutional layer, refer to S120A described above; details are not repeated here.
S120b, determining an optimal feature map parameter from the feasible feature map parameter set of the target convolutional layer according to the preset number of processing cores and the data repetition loading amount; the target convolutional layer is any one of the plurality of convolutional layers, and the optimal characteristic map parameter is a feasible characteristic map parameter with the highest data transfer efficiency in a feasible characteristic map parameter set of the target convolutional layer.
In some possible embodiments, after the data repeat load amounts F of the various splitting size combinations are counted, the feasible feature map parameter with the highest data transfer efficiency may be determined by jointly considering the data repeat load amount and the MAC resource utilization rate, that is, the optimal splitting combination is the one with less data repeat loading and higher MAC resource utilization.
Hardware MAC resource utilization is associated with the particular hardware layout. In the hardware design assumed here, the MAC resources are divided into N_pe groups, each called a PE (logic core), and each PE computes the data of one output channel, so efficiency is highest when the number of output channels is a multiple of N_pe. The data transfer efficiency of any two feasible feature map parameters in the feasible feature map parameter set of the target convolutional layer can then be compared by the following formula:
Here, the two feasible feature map parameters in the feasible feature map parameter set of the target convolutional layer are denoted group1 and group2, respectively. In the formula, T_s is the time to transfer a unit of data and T_p is the time to compute a single-channel convolution of the given size; both T_s and T_p can be measured in advance. s_ouf_c is the original number of output channels; ouf_c_group1 and ouf_c_group2 are the maximum numbers of output channels of group1 and group2, respectively; ouf_clast_group1 and ouf_clast_group2 are the numbers of output channels of the last sub-block produced by each splitting. The floor and ceiling operations round downwards and upwards, respectively. When the calculation result w of the formula is greater than zero, group2 is the better choice; otherwise, group1 is the better choice.
Then, by traversing the feasible feature map parameter set of the target convolutional layer and comparing its members pairwise according to the above formula, an optimal feature map parameter can be determined from the set.
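The comparison formula is shown only as an image in the source; the sketch below captures the stated selection criterion (less data repeat loading, then better utilization of the N_pe processing cores) in a simplified, hypothetical form rather than reproducing the exact formula:

```python
import math

def pick_best_split(candidates, n_pe, s_ouf_c):
    """Hedged sketch of S120b: among feasible splits, prefer the one with
    the lowest data repeat load, breaking ties by how well the output
    channel sub-block size keeps all n_pe PEs busy. `candidates` maps
    (ouf_c, ouf_h, ouf_w) -> data repeat load (names are assumptions)."""
    def pe_rounds(ouf_c):
        # Number of PE rounds needed to cover all s_ouf_c original output
        # channels; an ouf_c that is a multiple of n_pe wastes no PE lanes.
        blocks = math.ceil(s_ouf_c / ouf_c)
        return blocks * math.ceil(ouf_c / n_pe)
    return min(candidates,
               key=lambda k: (candidates[k], pe_rounds(k[0])))
```

For example, with 8 PEs and 16 original output channels, a split producing 8-channel sub-blocks beats a 6-channel split at equal repeat load, because every PE round runs fully occupied.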
And S120c, taking the optimal feature map parameters as target feature map parameters corresponding to the target convolutional layer.
Since the target convolutional layer is any one of the plurality of convolutional layers, the processing of S120a, S120b, and S120c is performed for the feasible feature map parameter set of each convolutional layer, so that a corresponding target feature map parameter with the highest data transfer efficiency can be determined for each convolutional layer.
Referring to fig. 2 again, in S130, an executable file is generated for the neural network model according to the target feature map parameters corresponding to each convolutional layer.
In some possible embodiments, for how "generating an executable file for the neural network model according to the target feature map parameters corresponding to each convolutional layer", S130 may include: and generating a three-dimensional Direct Memory Access (DMA) data carrying instruction according to the target characteristic diagram parameters corresponding to each convolution layer, and generating an executable file of the neural network model according to the three-dimensional DMA data carrying instruction. The three-dimensional DMA data transfer command can be understood as a three-dimensional DMA data transfer command.
It can be understood that transferring the feature map with three-dimensional DMA allows each sub-block to be seamlessly stitched into its three-dimensional logical position directly while the sub-feature maps are written out, without additional slice/concat operations.
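As an illustration of what such a three-dimensional transfer descriptor might contain (the field and function names are assumptions, not the patent's instruction format), a strided-transfer descriptor per sub-block lets each sub-feature map land at its final offset in the full tensor:

```python
from dataclasses import dataclass

@dataclass
class Dma3dTransfer:
    """Illustrative 3D DMA descriptor: one stride per dimension on each
    side lets a (c, h, w) sub-block be copied between a contiguous SRAM
    buffer and its scattered location inside the full DDR tensor, so the
    sub-blocks tile the output without extra slice/concat passes."""
    src_addr: int        # source base address (elements)
    dst_addr: int        # destination base address (elements)
    sizes: tuple         # (c, h, w) extents of the sub-block
    src_strides: tuple   # per-dimension element strides on the source side
    dst_strides: tuple   # per-dimension element strides on the destination side

def emit_transfers(sub_blocks):
    """Generate one descriptor per (src, dst, shape, src_strides,
    dst_strides) tuple; a sketch of instruction emission in S130."""
    return [Dma3dTransfer(src, dst, shape, s_str, d_str)
            for (src, dst, shape, s_str, d_str) in sub_blocks]
```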
Further, in order to implement the data processing function of the neural network model, referring to fig. 6 on the basis of fig. 2, after S130, the method may further include:
and S140, executing the executable file to realize the data processing function of the neural network model.
In this embodiment, the data processing function of the neural network model may be natural language processing, image recognition, voice recognition, data mining, personalized recommendation, and the like. Furthermore, the present application splits the feature map offline in the compiler, which reduces runtime processing logic, lowers hardware logic complexity, and reduces implementation cost.
It should be understood that by the compiling method of the neural network model provided by the application, the original feature map parameters of each convolution layer in the neural network model can be split, and a corresponding target feature map parameter with the highest data transfer efficiency is found for each convolution layer, so that the overall data reuse rate of the neural network model is improved, the operation amount of an executable file corresponding to the neural network model during operation is reduced, and the operation efficiency of the executable file is improved.
In order to execute the corresponding steps in the above embodiments and their various possible implementations, an implementation of a neural network model compiling apparatus is given below; please refer to fig. 7, which shows a functional block diagram of the neural network model compiling apparatus provided by the embodiment of the present application. It should be noted that the basic principle and the technical effects of the compiling apparatus 200 of the neural network model provided in this embodiment are the same as those of the above embodiments; for the sake of brevity, parts not mentioned in this embodiment may refer to the corresponding contents in the above embodiments. The neural network model compiling apparatus 200 includes: an acquisition module 210, a splitting module 220, a generation module 230 and an execution module 240.
Alternatively, the modules may be stored in a memory in the form of software or Firmware (Firmware) or be fixed in an Operating System (OS) of the electronic device 100 provided in the present application, and may be executed by a processor in the electronic device 100. Meanwhile, data, codes of programs, and the like required to execute the above modules may be stored in the memory.
The obtaining module 210 may be configured to obtain raw feature map parameters of each convolutional layer in the neural network model.
It is to be appreciated that acquisition module 210 can be utilized to support electronic device 100 in performing the aforementioned S100, and/or other processes for the techniques described herein.
The splitting module 220 may be configured to split the original feature map parameters according to the input/output parameter relation and the memory capacity, so as to obtain a feasible feature map parameter set of each convolutional layer.
It is to be appreciated that the splitting module 220 can be utilized to support the electronic device 100 in performing the aforementioned S110, and/or the like, and/or other processes for the techniques described herein.
The splitting module 220 may be further configured to determine a corresponding target feature map parameter for each convolutional layer from the set of feasible feature map parameters of each convolutional layer; and the data handling efficiency corresponding to the target characteristic diagram parameters is highest.
It is to be appreciated that the splitting module 220 can be utilized to support the electronic device 100 in performing the above-described S120, and/or the like, and/or other processes for the techniques described herein.
The generating module 230 may be configured to generate an executable file for the neural network model according to the target feature map parameters corresponding to each convolutional layer.
It will be appreciated that generation module 230 may be used to support electronic device 100 in performing the above-described S130, and/or the like, and/or other processes for the techniques described herein.
The execution module 240 may be used to execute the executable file to implement the data processing functions of the neural network model.
It is to be appreciated that the execution module 240 can be utilized to support the electronic device 100 in performing the aforementioned S140, and/or the like, and/or other processes for the techniques described herein.
In some possible embodiments, the original feature map parameters in the present application may include original output feature map parameters, and for how to split the original feature map parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible feature map parameter set of each convolution layer, the splitting module 220 may be configured to split the original output feature map parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible output feature map parameter set of each convolution layer; the splitting module 220 may also be configured to determine a feasible input feature map parameter set of each convolution layer according to the feasible output feature map parameter set of each convolution layer and the input-output parameter relational expression; the splitting module 220 may be further configured to determine a feasible feature map parameter set for each convolutional layer according to the feasible output feature map parameter set and the feasible input feature map parameter set for each convolutional layer.
It is to be appreciated that the splitting module 220 may be utilized to support the electronic device 100 in performing the above-described S110A, S110B, S110C, and/or other processes for the techniques described herein.
In some possible embodiments, for how to determine a corresponding target feature map parameter for each convolutional layer from the feasible feature map parameter set of each convolutional layer, the splitting module 220 may be configured to determine a data repetition load amount corresponding to each feasible feature map parameter according to a value of each feasible feature map parameter in the feasible feature map parameter set of each convolutional layer; the splitting module 220 may further be configured to use the feasible feature map parameter with the minimum data repetition loading amount in the feasible feature map parameter set of each convolutional layer as the target feature map parameter corresponding to each convolutional layer.
It is to be appreciated that the split module 220 can be utilized to support the electronic device 100 in performing the above-described S120A, S120B, and/or the like, and/or other processes for the techniques described herein.
In other possible embodiments, in order to further improve the operating efficiency of the method provided by the present application, the neural network model includes a plurality of convolutional layers, and as to how to "determine a corresponding target feature map parameter for each convolutional layer from the feasible feature map parameter set of each convolutional layer", the splitting module 220 may be configured to determine, according to a value of each feasible feature map parameter in the feasible feature map parameter set of the target convolutional layer, a data repetition loading amount corresponding to each feasible feature map parameter; the splitting module 220 may also be configured to determine an optimal feature map parameter from the feasible feature map parameter set of the target convolutional layer according to the preset number of processing cores and the data repetition loading amount; the target convolutional layer is any one of the plurality of convolutional layers, and the optimal characteristic map parameter is a feasible characteristic map parameter with the highest data transfer efficiency in a feasible characteristic map parameter set of the target convolutional layer; the splitting module 220 may further be configured to use the optimal feature map parameter as a target feature map parameter corresponding to the target convolutional layer.
It is to be appreciated that the splitting module 220 may be utilized to support the electronic device 100 in performing the above-described S120a, S120b, S120c, and/or other processes for the techniques described herein.
Based on the above method embodiment, the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the method for compiling a neural network model.
Specifically, the storage medium may be a general storage medium such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the above compiling method of the neural network model can be performed, thereby addressing the problem that current optimization methods are simplistic and the compiled executable file still runs inefficiently, and achieving the purposes of compiling the neural network model into an executable file, reducing the operation amount of the executable file at runtime, and improving its operating efficiency.
To sum up, the embodiment of the present application provides a compiling method, an apparatus, a storage medium and an electronic device for a neural network model, where the method includes: acquiring an original characteristic diagram parameter of each convolution layer in the neural network model; splitting the original characteristic diagram parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible characteristic diagram parameter set of each convolution layer; determining a corresponding target characteristic map parameter for each convolution layer from the feasible characteristic map parameter set of each convolution layer; the data handling efficiency corresponding to the target characteristic diagram parameters is highest; and generating an executable file for the neural network model according to the target characteristic diagram parameters corresponding to each convolution layer. According to the compiling method of the neural network model, the original characteristic map parameters of each convolution layer in the neural network model can be split, a corresponding target characteristic map parameter with the highest data carrying efficiency is found for each convolution layer, the overall data reuse rate of the neural network model is further improved, the operation amount of an executable file corresponding to the neural network model in operation is reduced, and the operation efficiency of the executable file is improved.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A method of compiling a neural network model, the method comprising:
acquiring an original characteristic diagram parameter of each convolution layer in the neural network model;
splitting the original characteristic diagram parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible characteristic diagram parameter set of each convolution layer;
determining a corresponding target characteristic map parameter for each convolution layer from the feasible characteristic map parameter set of each convolution layer; the data handling efficiency corresponding to the target characteristic diagram parameters is highest;
and generating an executable file for the neural network model according to the target characteristic diagram parameters corresponding to each convolution layer.
2. The method of claim 1, wherein the raw feature map parameters comprise raw output feature map parameters;
the step of splitting the original characteristic diagram parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible characteristic diagram parameter set of each convolution layer comprises the following steps:
splitting the original output characteristic diagram parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible output characteristic diagram parameter set of each convolution layer;
determining a feasible input characteristic diagram parameter set of each convolution layer according to the feasible output characteristic diagram parameter set of each convolution layer and the input-output parameter relational expression;
and determining a feasible feature map parameter set of each convolutional layer according to the feasible output feature map parameter set and the feasible input feature map parameter set of each convolutional layer.
3. The method of claim 2, wherein the step of determining a corresponding target profile parameter for each convolutional layer from the set of feasible profile parameters for each convolutional layer comprises:
determining the data repeated loading capacity corresponding to each feasible feature map parameter according to the value of each feasible feature map parameter in the feasible feature map parameter set of each convolutional layer;
and taking the feasible characteristic diagram parameter with the minimum data repeated loading amount in the feasible characteristic diagram parameter set of each convolution layer as the target characteristic diagram parameter corresponding to each convolution layer.
4. The method of claim 2, wherein the neural network model comprises a plurality of convolutional layers, and wherein the step of determining a corresponding target feature map parameter for each convolutional layer from the feasible feature map parameter set of each convolutional layer comprises:
determining the repeated data loading amount corresponding to each feasible feature map parameter according to the value of each feasible feature map parameter in the feasible feature map parameter set of a target convolutional layer;
determining an optimal feature map parameter from the feasible feature map parameter set of the target convolutional layer according to a preset number of processing cores and the repeated data loading amount, wherein the target convolutional layer is any one of the plurality of convolutional layers, and the optimal feature map parameter is the feasible feature map parameter with the highest data transfer efficiency in the feasible feature map parameter set of the target convolutional layer;
and taking the optimal feature map parameter as the target feature map parameter corresponding to the target convolutional layer.
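Claim 4 additionally weighs the preset number of processing cores. One plausible reading, sketched below with illustrative names and a hypothetical cost model, is that ties on repeated loading are broken toward tile counts that leave no core idle in the final round:

```python
import math

def pick_optimal_tile(out_h, kernel, stride, feasible, num_cores):
    """Among feasible tile heights, prefer low reload, then full core use.

    The cost is a pair (reload, idle): reload counts halo rows loaded
    twice, idle counts cores left without a tile in the last round.
    """
    def cost(th):
        n_tiles = math.ceil(out_h / th)
        reload = (n_tiles - 1) * max(kernel - stride, 0)
        idle = (-n_tiles) % num_cores      # idle cores in the final round
        return (reload, idle)
    return min(feasible, key=cost)
```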
5. The method of claim 1, wherein the step of generating an executable file for the neural network model according to the target feature map parameters corresponding to each convolutional layer comprises:
generating a three-dimensional Direct Memory Access (DMA) data transfer instruction according to the target feature map parameters corresponding to each convolutional layer, and generating the executable file of the neural network model according to the three-dimensional DMA data transfer instruction.
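A three-dimensional DMA transfer is commonly described by a contiguous burst repeated along two strided axes, here rows and channel planes of a tile. The descriptor below is a hypothetical sketch of such an instruction's fields, not the patent's actual instruction format:

```python
from dataclasses import dataclass

@dataclass
class Dma3dDescriptor:
    """Illustrative 3-D DMA descriptor: one contiguous burst repeated
    along two strided axes (rows within a plane, then channel planes)."""
    src_base: int
    row_bytes: int      # contiguous bytes per burst (tile width)
    row_count: int      # bursts per plane (tile height)
    row_stride: int     # bytes between row starts in the full feature map
    plane_count: int    # channel planes to move
    plane_stride: int   # bytes between channel planes

def descriptor_for_tile(base, tile_w, tile_h, fm_w, fm_h, channels, elem_size=1):
    """Build the descriptor for one tile of a fm_w x fm_h x channels map."""
    return Dma3dDescriptor(
        src_base=base,
        row_bytes=tile_w * elem_size,
        row_count=tile_h,
        row_stride=fm_w * elem_size,
        plane_count=channels,
        plane_stride=fm_w * fm_h * elem_size,
    )
```

Because the tile's position and strides are fixed at compile time, a compiler can emit one such descriptor per tile and address computation disappears from the runtime inner loop.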
6. The method of claim 1, further comprising:
executing the executable file to implement the data processing function of the neural network model.
7. An apparatus for compiling a neural network model, comprising:
an acquisition module, configured to acquire the original feature map parameters of each convolutional layer in the neural network model;
a splitting module, configured to split the original feature map parameters according to an input-output parameter relational expression and a memory capacity to obtain a feasible feature map parameter set for each convolutional layer;
the splitting module being further configured to determine a corresponding target feature map parameter for each convolutional layer from the feasible feature map parameter set of each convolutional layer, the target feature map parameter being the one with the highest data transfer efficiency;
and a generating module, configured to generate an executable file for the neural network model according to the target feature map parameter corresponding to each convolutional layer.
8. The apparatus of claim 7, wherein the original feature map parameters comprise original output feature map parameters;
the splitting module is configured to split the original output feature map parameters according to the input-output parameter relational expression and the memory capacity to obtain a feasible output feature map parameter set for each convolutional layer;
the splitting module is further configured to determine a feasible input feature map parameter set for each convolutional layer according to the feasible output feature map parameter set of that layer and the input-output parameter relational expression;
and the splitting module is further configured to determine a feasible feature map parameter set for each convolutional layer according to its feasible output feature map parameter set and feasible input feature map parameter set.
9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of compiling a neural network model according to any one of claims 1 to 6.
10. An electronic device comprising a processor and a memory, the memory storing machine readable instructions, the processor being configured to execute the machine readable instructions to implement the method of compiling a neural network model of any one of claims 1-6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010601610.2A CN111738423A (en) | 2020-06-28 | 2020-06-28 | Method and device for compiling neural network model, storage medium and electronic equipment |
PCT/CN2020/135681 WO2022001014A1 (en) | 2020-06-28 | 2020-12-11 | Neural network model compilation method and apparatus, storage medium, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010601610.2A CN111738423A (en) | 2020-06-28 | 2020-06-28 | Method and device for compiling neural network model, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111738423A true CN111738423A (en) | 2020-10-02 |
Family
ID=72651518
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010601610.2A Pending CN111738423A (en) | 2020-06-28 | 2020-06-28 | Method and device for compiling neural network model, storage medium and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111738423A (en) |
WO (1) | WO2022001014A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022001014A1 (en) * | 2020-06-28 | 2022-01-06 | 湖南国科微电子股份有限公司 | Neural network model compilation method and apparatus, storage medium, and electronic device |
CN114239803A (en) * | 2021-12-13 | 2022-03-25 | 北京地平线机器人技术研发有限公司 | Method and device for compiling neural network model, electronic equipment and storage medium |
CN114625378A (en) * | 2022-03-28 | 2022-06-14 | 北京地平线机器人技术研发有限公司 | Method and device for compiling neural network model and computer readable storage medium |
CN116415103A (en) * | 2023-06-09 | 2023-07-11 | 之江实验室 | Data processing method, device, storage medium and electronic equipment |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115982110B (en) * | 2023-03-21 | 2023-08-29 | 北京探境科技有限公司 | File running method, file running device, computer equipment and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106650922A (en) * | 2016-09-29 | 2017-05-10 | 清华大学 | Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system |
US20190251424A1 (en) * | 2018-02-13 | 2019-08-15 | Beijing Kuangshi Technology Co., Ltd. | Operation apparatus, operation execution device and operation execution method |
CN110555516A (en) * | 2019-08-27 | 2019-12-10 | 上海交通大学 | FPGA-based YOLOv2-tiny neural network low-delay hardware accelerator implementation method |
CN110929860A (en) * | 2019-11-07 | 2020-03-27 | 深圳云天励飞技术有限公司 | Convolution acceleration operation method and device, storage medium and terminal equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111738423A (en) * | 2020-06-28 | 2020-10-02 | 湖南国科微电子股份有限公司 | Method and device for compiling neural network model, storage medium and electronic equipment |
- 2020-06-28: CN application CN202010601610.2A filed, published as CN111738423A, status Pending
- 2020-12-11: WO application PCT/CN2020/135681 filed, published as WO2022001014A1, status Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2022001014A1 (en) | 2022-01-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111738423A (en) | Method and device for compiling neural network model, storage medium and electronic equipment | |
US20210224654A1 (en) | Batch Processing In A Neural Network Processor | |
US20190164045A1 (en) | Method and apparatus for performing operation of convolutional layer in convolutional neural network | |
TWI627593B (en) | Rotating data for neural network computations | |
CN109656623B (en) | It executes the method and device of convolution algorithm operation, generate the method and device of instruction | |
US20190065958A1 (en) | Apparatus and Methods for Training in Fully Connected Layers of Convolutional Networks | |
US11461632B2 (en) | Method and apparatus for adapting parameters of neural network | |
JP7488185B2 (en) | Image Transformation for Machine Learning | |
WO2017116924A1 (en) | Neural network training performance optimization framework | |
CN113994350A (en) | Generating parallel computing schemes for neural networks | |
Bottleson et al. | clcaffe: Opencl accelerated caffe for convolutional neural networks | |
US9767074B2 (en) | Method and device for fast fourier transform | |
CN111091572B (en) | Image processing method and device, electronic equipment and storage medium | |
CN110390075A (en) | Matrix preprocess method, device, terminal and readable storage medium storing program for executing | |
CN112001491A (en) | Search method and device for determining neural network architecture for processor | |
CN114792124A (en) | Implementing dilated convolutions in hardware | |
CN113254867A (en) | Automatic configuration template generation method and device, server and storage medium | |
CN116090518A (en) | Feature map processing method and device based on systolic operation array and storage medium | |
CN114461390A (en) | Evaluation method combining multi-dimensional analysis and critical path method and related device | |
CN117413280A (en) | Convolution with kernel expansion and tensor accumulation | |
CN110347511B (en) | Geographic distributed process mapping method and device containing privacy constraint conditions and terminal | |
US8887115B1 (en) | Assigning method, recording medium, information processing apparatus, and analysis system | |
CN113066038A (en) | Image evaluation method and device, electronic equipment and computer storage medium | |
CN111832714A (en) | Operation method and device | |
CN114115804B (en) | Multiplier conversion method, system, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||