CN111738424A - Neural network processing method, neural network processing device, electronic equipment and storage medium

Neural network processing method, neural network processing device, electronic equipment and storage medium

Info

Publication number
CN111738424A
CN111738424A
Authority
CN
China
Prior art keywords
splitting
neural network
convolutional layer
mode
convolution
Prior art date
Legal status
Granted
Application number
CN202010612070.8A
Other languages
Chinese (zh)
Other versions
CN111738424B (en)
Inventor
曾华
唐荔
唐剑
Current Assignee
Hunan Goke Microelectronics Co Ltd
Original Assignee
Hunan Goke Microelectronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Hunan Goke Microelectronics Co Ltd filed Critical Hunan Goke Microelectronics Co Ltd
Priority to CN202010612070.8A priority Critical patent/CN111738424B/en
Publication of CN111738424A publication Critical patent/CN111738424A/en
Application granted granted Critical
Publication of CN111738424B publication Critical patent/CN111738424B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of neural networks and provides a neural network processing method, a neural network processing device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a neural network to be processed, wherein the neural network comprises at least one convolutional layer and each convolutional layer corresponds to an input feature map and convolution parameters; determining a splitting mode and a data multiplexing mode for each convolutional layer according to the input feature map and the convolution parameters of that convolutional layer; and splitting the convolution operation of the neural network according to the splitting mode and the data multiplexing mode of each convolutional layer to obtain a splitting result of the convolution operation of the neural network. Compared with the prior art, the embodiment of the invention determines a splitting mode and a data multiplexing mode separately for each convolutional layer, so that each convolutional layer can split its convolution calculation according to the splitting mode and data multiplexing mode suited to its own input feature map, thereby improving the calculation acceleration efficiency of the neural network.

Description

Neural network processing method, neural network processing device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of neural network technologies, and in particular, to a neural network processing method and apparatus, an electronic device, and a storage medium.
Background
In existing neural network accelerators, because the network layers of a neural network contain a large number of parameters, a compiler often needs to split the feature map of a convolutional layer when processing the neural network: by splitting, and by reusing either the weights or the feature map (data multiplexing), the convolution operation of one convolutional layer is split into a plurality of sub-operations that are computed independently, and the accelerated operation of the neural network is finally realized.
In the prior art, feature map splitting and data multiplexing are implemented in hardware, but this implementation does not achieve high calculation acceleration efficiency for the neural network.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a neural network processing method, apparatus, electronic device, and storage medium, which can improve the calculation acceleration efficiency of a neural network.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
In a first aspect, this embodiment provides a neural network processing method, where the method includes: acquiring a neural network to be processed, wherein the neural network comprises at least one convolutional layer and each convolutional layer corresponds to an input feature map and convolution parameters; determining a splitting mode and a data multiplexing mode for each convolutional layer according to the input feature map and the convolution parameters of that convolutional layer; and splitting the convolution operation of the neural network according to the splitting mode and the data multiplexing mode of each convolutional layer to obtain a splitting result of the convolution operation of the neural network.
In a second aspect, this embodiment provides a neural network processing apparatus including an obtaining module, a determining module, and a splitting module, where the obtaining module is configured to obtain a neural network to be processed, the neural network comprising at least one convolutional layer, each convolutional layer corresponding to an input feature map and convolution parameters; the determining module is configured to determine a splitting mode and a data multiplexing mode for each convolutional layer according to the input feature map and the convolution parameters of that convolutional layer; and the splitting module is configured to split the convolution operation of the neural network according to the splitting mode and the data multiplexing mode of each convolutional layer to obtain a splitting result of the convolution operation of the neural network.
In a third aspect, the present embodiment provides an electronic device, including: one or more processors; memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement a neural network processing method as in any one of the preceding embodiments.
In a fourth aspect, this embodiment provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the neural network processing method according to any one of the foregoing embodiments.
Compared with the prior art, embodiments of the present invention provide a neural network processing method, apparatus, electronic device, and storage medium in which a splitting mode and a data multiplexing mode are determined separately for each convolutional layer, so that each convolutional layer can split its convolution calculation according to the splitting mode and data multiplexing mode suited to its own input feature map, thereby improving the calculation acceleration efficiency of the neural network.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 shows a flowchart of a neural network processing method according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating another neural network processing method according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating another neural network processing method according to an embodiment of the present invention.
Fig. 4 shows a block diagram of a neural network processing device according to an embodiment of the present invention.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Reference numerals: 10 - electronic device; 11 - memory; 12 - communication interface; 13 - processor; 14 - bus; 100 - neural network processing device; 110 - obtaining module; 120 - determining module; 130 - splitting module; 140 - operation module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inside", and "outside", if used, indicate orientations or positional relationships based on those shown in the drawings or those in which the product of the present invention is normally used; they are used only for convenience and simplicity of description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
When an existing neural network accelerator accelerates a neural network, the neural network generally needs to be compiled by a compiler first. Because the network layers of the neural network involve many parameters, the amount of data handled during operation of the neural network is very large; therefore, during compilation the compiler usually splits the feature map of the neural network and splits the operation of the neural network into a plurality of mutually independent sub-operations, from which the operation of the whole neural network is finally realized.
In the prior art, splitting of the feature map is usually realized at the hardware level; by splitting the feature map and reusing the weights or the feature map, the convolution operation of a large feature map can be realized, but this acceleration approach suffers from low acceleration efficiency.
Based on this problem, the inventors conducted intensive research and found that the splitting mode and the data multiplexing mode of such a hardware implementation are fixed, that is, they are the same for all convolutional layers, whereas the sizes of the input feature maps of the convolutional layers are often different; applying the same splitting mode to input feature maps of different sizes makes the splitting of some convolutional layers neither reasonable nor efficient. The inventors also found that different splitting modes and different multiplexing modes affect the operation differently. For example, when the data volume of the input feature map is large and the data volume of the weights is small, a multiplexing mode that reuses the feature map and repeatedly loads the weights results in a smaller overall data loading volume; when the input feature map is small and the weight data volume is large, a multiplexing mode that reuses the weights and repeatedly loads the input feature map results in a smaller overall data loading volume. Through the above analysis, the inventors found that using the same splitting mode and multiplexing mode for every convolutional layer is the reason why the acceleration efficiency of the neural network is not high.
Based on this cause of the problem, the inventors propose a neural network processing method, device, electronic device, and storage medium that determine, for each convolutional layer, a splitting mode and a data multiplexing mode of that convolutional layer, so that each convolutional layer can split its convolution calculation according to the splitting mode and data multiplexing mode suited to its own input feature map, thereby improving the calculation acceleration efficiency of the neural network. This is described in detail below.
Referring to fig. 1, fig. 1 is a flowchart illustrating a neural network processing method according to an embodiment of the present invention, where the method includes the following steps:
step S101, obtaining a neural network to be processed, wherein the neural network comprises at least one convolution layer, and each convolution layer corresponds to an input feature map and convolution parameters.
In this embodiment, a neural network (NN), also called an Artificial Neural Network (ANN) or a connection model, is an algorithmic mathematical model that simulates the behavioral characteristics of animal neural networks to perform distributed parallel information processing.
In this embodiment, the Convolutional Neural Network (CNN) is a typical neural network: a Feedforward Neural Network (FNN) that includes convolution calculation and has a deep structure. A convolutional neural network includes convolutional layers, pooling layers, and fully-connected layers. The convolution parameters include the weights, also called convolution kernels or filters; the step size (stride), which indicates how far the weights move in the horizontal and vertical directions of the input picture at each step of the convolution; and the boundary filling parameter (pad), which specifies the padding added around the edges of the image.
Step S102, determining the splitting mode and the data multiplexing mode of each convolutional layer according to the input feature map and the convolution parameters of that convolutional layer.
In this embodiment, the splitting mode of a convolutional layer may split the input feature map along any one, two, or all three of its height, width, and channel dimensions.
The data multiplexing mode of the convolutional layer may be either a weight multiplexing mode or a feature map multiplexing mode.
Step S103, splitting the convolution operation of the neural network according to the splitting mode and the data multiplexing mode of each convolution layer to obtain the splitting result of the convolution operation of the neural network.
According to the neural network processing method provided by the embodiment of the invention, a splitting mode and a data multiplexing mode are determined for each convolutional layer, so that each convolutional layer can split its convolution calculation according to the splitting mode and data multiplexing mode suited to its own input feature map, thereby improving the calculation acceleration efficiency of the neural network.
On the basis of fig. 1, an embodiment of the present invention further provides a neural network processing method that obtains the splitting mode and the data multiplexing mode of each convolutional layer by splitting the output feature map of each convolutional layer. Referring to fig. 2, which shows a flowchart of another neural network processing method provided in an embodiment of the present invention, step S102, described here for any target convolutional layer in the neural network, includes the following sub-steps:
In sub-step S1021, the size of the output feature map of the target convolutional layer is determined according to the step size of the target convolutional layer, the number of weights, and the size of the input feature map, wherein the output feature map is obtained from the input feature map through the convolution operation of the target convolutional layer.
In this embodiment, the convolution mode differs according to the restriction on the range over which the weights slide during convolution, and there are three main convolution modes: full mode, same mode, and valid mode. The method for calculating the size of the output feature map differs between convolution modes, and the present invention is not limited to a specific convolution mode. Taking same mode as an example, if the input feature map size is W_in * H_in and the convolution step size is s, then the output feature map size W_out * H_out satisfies W_out = ceil(W_in / s) and H_out = ceil(H_in / s), where ceil() is the rounding-up function. Taking valid mode as an example, if the input feature map size is W_in * H_in, the convolution step size is s, and the weight size is k, then the output feature map size W_out * H_out satisfies W_out = ceil((W_in - k + 1) / s) and H_out = ceil((H_in - k + 1) / s).
In addition, the output feature map has channels in addition to its height and width, and the number of channels of the output feature map equals the number of weights.
As can be seen from the above description, the size of the output feature map can be obtained from the size of the input feature map and the convolution parameters of the corresponding convolutional layer, and conversely the size of the input feature map can be obtained from the size of the output feature map and the convolution parameters of the corresponding convolutional layer.
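To make the relationship above concrete, the following Python sketch computes the output feature map size for the same and valid convolution modes; the function name and the restriction to these two modes are assumptions made only for illustration and are not part of the claimed method:

```python
import math

def output_size(in_w, in_h, stride, k=1, mode="same"):
    """Output feature map width/height for one convolutional layer.

    mode="same":  out = ceil(in / stride)
    mode="valid": out = ceil((in - k + 1) / stride), where k is the weight size.
    """
    if mode == "same":
        return math.ceil(in_w / stride), math.ceil(in_h / stride)
    if mode == "valid":
        return (math.ceil((in_w - k + 1) / stride),
                math.ceil((in_h - k + 1) / stride))
    raise ValueError("unsupported convolution mode")

# 56x56 input, stride 1, 1x1 weight -> 56x56 output in both modes
print(output_size(56, 56, 1, k=1, mode="same"))   # (56, 56)
print(output_size(56, 56, 1, k=1, mode="valid"))  # (56, 56)
```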
In sub-step S1022, the output feature map of the target convolutional layer is split according to preset splitting modes and preset data multiplexing modes, so as to obtain a first splitting mode and a target multiplexing mode of the output feature map of the target convolutional layer that satisfy preset conditions.
In this embodiment, there are various ways to split the output feature map of the target convolutional layer. For example, an output feature map of 512*56*56 (channels * height * width) can be split into 7 output feature sub-graphs of 288*8*56 and 7 output feature sub-graphs of 224*8*56, which is one splitting mode, or into 8 output feature sub-graphs of 336*7*56 and 8 output feature sub-graphs of 176*7*56, which is another splitting mode. In this embodiment, the first splitting mode is the splitting mode, among the splitting modes of the output feature map of the target convolutional layer, that satisfies the preset conditions, and the target multiplexing mode is the data multiplexing mode, among the data multiplexing modes of the output feature map of the target convolutional layer, that satisfies the preset conditions.
In this embodiment, the output feature map of the target convolutional layer may have a plurality of candidate splitting modes and data multiplexing modes. The preset conditions include a first preset condition and a second preset condition; only a splitting mode that simultaneously satisfies the first preset condition and the second preset condition can be used as the first splitting mode, and correspondingly only a data multiplexing mode that simultaneously satisfies the first preset condition and the second preset condition can be used as the target multiplexing mode. As a specific embodiment, the method for obtaining the first splitting mode and the target multiplexing mode may be:
firstly, splitting the output characteristic diagram of the target convolutional layer according to each preset splitting mode to obtain a plurality of initial splitting results meeting a first preset condition.
In this embodiment, the initial splitting result is a splitting result that satisfies a first preset condition among a plurality of splitting results split according to a plurality of preset splitting manners, and when the splitting result satisfies the first preset condition, it is correspondingly determined that the splitting manner corresponding to the splitting result satisfies the first preset condition.
In this embodiment, the processing procedure of each preset splitting manner is the same, for convenience of description, any one target splitting manner of the multiple preset splitting manners is described, and for any one target splitting manner, the method for obtaining the initial splitting result meeting the first preset condition may be:
firstly, splitting an output characteristic graph of a target convolutional layer according to a target splitting mode to obtain a splitting result of the output characteristic graph, wherein the splitting result of the output characteristic graph comprises the size of an output characteristic subgraph obtained after splitting the output characteristic graph and the number of the output characteristic subgraphs.
In this embodiment, the splitting results obtained after splitting the output feature map of the target convolutional layer according to different preset splitting modes are also different. For example, for an output feature map with 512 channels, a width of 56, and a height of 56, if only the width is split, the splitting mode that splits it into pieces of width 7 and the splitting mode that splits it into pieces of width 8 produce output feature sub-graphs of different sizes and in different numbers.
Secondly, the size of the corresponding input feature sub-graphs after splitting the input feature map, and the size of the sub-weights after splitting the weights, are obtained according to the size of the output feature sub-graphs, the step size, and the pad.
In this embodiment, the size (height and width) of the corresponding input feature sub-graph can be obtained from the size (height and width), the step size, and the pad of the output feature sub-graph, according to the relationship between the input feature map and the output feature map described in sub-step S1021. Owing to the nature of convolution, the number of channels of the output feature sub-graphs equals the number of weights, so the weights can be split according to those channel counts to obtain the sizes of the sub-weights. For example, if the weights are 512 (number) * 256 (channels) * 1 (height) * 1 (width) and the numbers of channels of the output feature sub-graphs are 288 and 224, then the weights are split into 2 sub-weights: 288*256*1*1 and 224*256*1*1.
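A minimal sketch of the weight split described in this example, assuming the 512*256*1*1 weight tensor and the 288/224 channel split given above (the function name is illustrative only):

```python
def split_weights(wt_n, wt_c, wt_h, wt_w, out_channel_splits):
    """Split a weight tensor of shape (number, channels, height, width) along
    the number dimension according to the channel counts of the output
    feature sub-graphs."""
    assert sum(out_channel_splits) == wt_n
    return [(n, wt_c, wt_h, wt_w) for n in out_channel_splits]

# 512*256*1*1 weights, output feature sub-graphs with 288 and 224 channels
print(split_weights(512, 256, 1, 1, [288, 224]))
# -> [(288, 256, 1, 1), (224, 256, 1, 1)]
```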
Thirdly, if the size of each output feature sub-graph is smaller than a first preset value, the size of each input feature sub-graph is smaller than a second preset value, and the size of each sub-weight is smaller than a third preset value, the splitting result of the output feature map is used as an initial splitting result that satisfies the first preset condition.
In this embodiment, the output feature sub-graphs, the input feature sub-graphs, and the sub-weights may each use a different Static Random-Access Memory (SRAM), and the first, second, and third preset values are respectively the maximum SRAM capacities available to the output feature sub-graphs, the input feature sub-graphs, and the sub-weights. As another implementation, the output feature sub-graphs, the input feature sub-graphs, and the sub-weights may share an SRAM; in that case the sum of their sizes must be smaller than the maximum capacity of the shared SRAM.
As a specific embodiment, in order to obtain an initial splitting result satisfying the first preset condition as quickly as possible, the maximum size of the output feature sub-graph or the input feature sub-graph may be determined by using the capacities available to the output feature sub-graph, the input feature sub-graph, and the sub-weights as the limiting conditions. For example, suppose the input feature map is 256*56*56 (channels * height * width), the weights are 512*256*1*1 (number * channels * height * width), pad = 0, stride = 1, and the input feature map, the weights, and the output feature map each have exclusive use of 128 KB of SRAM. When the splitting mode splits only the height and the channel dimensions, one splitting procedure may be:
(1) Calculate the output feature map size, obtaining 512*56*56.
(2) Let the maximum height of the input feature sub-graph be h_in. From 256 * h_in * 56 < 128 * 1024, the maximum value of h_in is calculated as 9.14; h_in is then aligned to 8 (this alignment is optional), so the maximum h_in is 8.
(3) Calculate the maximum height h_out of the output feature sub-graph from h_in: h_out = (h_in - h_wt + pad) / stride + 1, where h_wt is the weight height (equal to 1), pad = 0, and stride = 1, giving h_out = 8.
(4) Calculate the maximum channel count c_out of the output feature sub-graph from the SRAM capacity available to the output feature map: c_out * h_out * 56 < 128 * 1024. Substituting h_out = 8 from step (3) gives c_out = 292.57; aligning c_out to 8 and rounding down gives a maximum output feature sub-graph channel count c_out of 288.
Through the above steps, the sizes of the largest input feature sub-graph, sub-weight, and output feature sub-graph after splitting are obtained: input feature sub-graph 256*8*56, sub-weight 288*256*1*1, output feature sub-graph 288*8*56. The result of this splitting mode is: input feature sub-graphs: 7 of size 256*8*56; sub-weights: one of 288*256*1*1 and one of 224*256*1*1; output feature sub-graphs: 7 of size 288*8*56 and 7 of size 224*8*56. On this basis, the height of the output feature sub-graph, and hence of the input feature sub-graph, can be gradually reduced to obtain different splitting modes and correspondingly different initial splitting results.
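Steps (1) to (4) above can be sketched in a few lines of Python; the assumption of one byte per data element and the align-to-8 helper are illustrative choices made for this example only:

```python
SRAM_BYTES = 128 * 1024   # capacity available to each of input, weights, output
BYTES_PER_ELEM = 1        # assumption: 8-bit data

inf_c, inf_h, inf_w = 256, 56, 56          # input feature map
wt_n, wt_c, wt_h, wt_w = 512, 256, 1, 1    # weights
pad, stride = 0, 1

def align_down(x, a=8):
    return (x // a) * a

# (2) maximum height of the input feature sub-graph under the SRAM limit
h_in = align_down(SRAM_BYTES // (inf_c * inf_w * BYTES_PER_ELEM))        # 8

# (3) corresponding maximum height of the output feature sub-graph
h_out = (h_in - wt_h + pad) // stride + 1                                 # 8

# (4) maximum channel count of the output feature sub-graph (width stays 56)
c_out = align_down(SRAM_BYTES // (h_out * inf_w * BYTES_PER_ELEM))        # 288

print(h_in, h_out, c_out)   # 8 8 288
```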
Secondly, according to the preset splitting mode of each initial splitting result, a corresponding initial multiplexing mode is determined from the plurality of preset data multiplexing modes, and the data repeated loading amount of the initial splitting result is calculated according to the corresponding multiplexing mode.
In this embodiment, the output feature map and the input feature map may each include two dimensions, the height dimension and the width dimension, or three dimensions: the height dimension, the width dimension, and the channel dimension. A dimension combination is any combination of the two or three dimensions. Taking three dimensions as an example, there are 7 combinations in total: (height), (width), (channel), (height, width), (height, channel), (width, channel), and (height, width, channel). Each initial splitting result corresponds to one dimension combination. For example, if the initial splitting result splits only the height of the output feature map, its dimension combination is (height); if it splits the height and width, its dimension combination is (height, width); if it splits the height, width, and channel, its dimension combination is (height, width, channel).
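For illustration only, the 7 dimension combinations listed above can be enumerated as follows:

```python
from itertools import combinations

dims = ("height", "width", "channel")
combos = [c for r in range(1, len(dims) + 1) for c in combinations(dims, r)]
print(len(combos))   # 7
print(combos)
# [('height',), ('width',), ('channel',), ('height', 'width'),
#  ('height', 'channel'), ('width', 'channel'), ('height', 'width', 'channel')]
```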
As a specific implementation manner, the method for calculating the data duplication loading amount of the initial splitting result may have the following three scenarios:
(1) If the dimension combination corresponding to the initial splitting result includes only the channel dimension, the data repeated loading amount of the initial splitting result is calculated in the feature map multiplexing mode.
In this embodiment, if the dimension combination corresponding to the initial splitting result includes only the channel dimension, that is, only the channels of the output feature map are split, the feature map multiplexing mode is the optimal mode. Although the weights need to be loaded multiple times, the weights loaded each time are independent of each other and nothing is loaded twice, so the data repeated loading amount F1 is zero.
(2) And if the dimension combination corresponding to the initial splitting result does not comprise the channel dimension, calculating the data repeated loading capacity of the initial splitting result by adopting a weight multiplexing mode.
In this embodiment, taking three dimensions as an example, the dimension combinations that exclude the channel dimension are: (height), (width), and (height, width). In this case the weight multiplexing mode is preferably adopted: the input feature map needs to be loaded multiple times while the weights only need to be loaded once. However, because the width and/or height are split, fixed repeated data is introduced into the input feature map owing to the scanning characteristic of convolution, and in this case the repeated data amount F2 can be calculated using the following formula:
F2=(wt_h-s)*inf_w*(h_num-1)+(wt_w-s)*inf_h*(w_num-1)+(wt_h-s)*(wt_w-s)*(h_num-1)*(w_num-1)*2
where wt_h is the height of the weight, s is the step size, inf_w is the width of the input feature map, h_num is the number of sub input feature maps into which the input feature map is split along the height dimension, wt_w is the width of the weight, inf_h is the height of the input feature map, and w_num is the number of sub input feature maps into which the input feature map is split along the width dimension.
It should be noted that if the dimension combination is (height), the value of w_num in the formula is 0, and if the dimension combination is (width), the value of h_num is 0.
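A literal transcription of the formula for F2, with the h_num / w_num convention just described kept as stated (variable names follow the text; the example values are assumptions):

```python
def repeated_data_f2(wt_h, wt_w, s, inf_h, inf_w, h_num, w_num):
    """F2: repeated input-feature data introduced by splitting along the
    height/width dimensions when the weights are reused (loaded only once).
    Per the text, w_num is taken as 0 when only the height dimension is
    split, and h_num is taken as 0 when only the width dimension is split."""
    return ((wt_h - s) * inf_w * (h_num - 1)
            + (wt_w - s) * inf_h * (w_num - 1)
            + (wt_h - s) * (wt_w - s) * (h_num - 1) * (w_num - 1) * 2)

# Example: 3x3 weight, stride 1, 56x56 input split into 7 pieces along both
# the height and the width dimensions
print(repeated_data_f2(3, 3, 1, 56, 56, h_num=7, w_num=7))   # 1632
```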
(3) And if the dimension combination corresponding to the initial splitting result comprises the channel dimension and any one of the height dimension and the width dimension, respectively calculating the data repeated loading capacity of the initial splitting result by adopting a characteristic diagram multiplexing mode and a weight multiplexing mode.
In this embodiment, taking three dimensions as an example, the dimension combination including both the channel dimension and any one of the height dimension and the width dimension includes: (height dimension, channel dimension), (width dimension, channel dimension), (height dimension, width dimension, channel dimension). At this time, each splitting mode respectively calculates the corresponding data repeated loading amount according to the characteristic diagram multiplexing mode and the weight multiplexing mode.
In this embodiment, in this splitting manner, one input feature subgraph needs to be convolved with a plurality of sub-weights, and one sub-weight also needs to be convolved with a plurality of input feature subgraphs.
When the feature map multiplexing mode is adopted, the input feature map is loaded only once, apart from the repeated loading caused by splitting, but the whole set of weights needs to be reloaded h_num * w_num - 1 times; the data repeated loading amount in this case is:
F3=(h_num*w_num-1)*ouf_c*inf_c*wt_h*wt_w+F2
where ouf_c is the number of channels of the output feature map and inf_c is the number of channels of the input feature map.
When the weight multiplexing mode is adopted, the weights are loaded only once, but the whole feature map needs to be reloaded wt_num - 1 times; the data repeated loading amount in this case is:
F3′=(wt_num-1)*inf_c*inf_h*inf_w+wt_num*F2
here, wt _ num is the number of sub-weights whose weights are divided in the number dimension, and as can be seen from the characteristics of convolution, wt _ num is ouf _ c.
And finally, taking the preset splitting mode of the initial splitting result corresponding to the data repeated loading quantity meeting the second preset condition as the first splitting mode, and taking the initial multiplexing mode of the initial splitting result corresponding to the data repeated loading quantity meeting the second preset condition as the target multiplexing mode.
In this embodiment, at least two ways are provided for determining whether the data reload amount satisfies the second preset condition:
(1) The smallest data repeated loading amount among all initial splitting results is judged to be the data repeated loading amount that satisfies the second preset condition.
In this way, the embodiment of the invention calculates the corresponding data repeated loading amount for each splitting mode of the convolutional layer and its corresponding data multiplexing mode, and selects the splitting mode and multiplexing mode corresponding to the smallest data repeated loading amount as the first splitting mode and the target multiplexing mode. The splitting mode and data multiplexing mode of the target convolutional layer obtained from the first splitting mode and the target multiplexing mode therefore minimize the data repeated loading amount of the sub-operations into which the target convolutional layer is split, improving the acceleration efficiency of the convolution operation of that convolutional layer.
(2) And judging whether the data repeated loading amount meets a second preset condition or not according to the number of the hardware computing units and the data repeated loading amount. The specific judgment method may be:
first, the number of hardware computing units is obtained.
In this embodiment, one hardware computing unit is responsible for computing one channel of the output feature map. When the number of channels of the split output feature map is an integral multiple of the number of hardware computing units, the hardware computing efficiency can be exploited to the greatest extent and is highest. For example, if there are 8 hardware computing units, the hardware computing efficiency is highest when the number of channels of the split output feature map is an integral multiple of 8.
And secondly, judging the alignment state of the preset splitting mode corresponding to each initial splitting result relative to the hardware computing units according to the number of the hardware computing units.
In this embodiment, if the number of channels of the output feature subgraph or the number of channels of the sub-weight obtained by splitting according to the preset splitting mode corresponding to the initial splitting result is an integral multiple of the hardware computing unit, it is determined that the alignment state of the preset splitting mode corresponding to the initial splitting result relative to the hardware computing unit is alignment, otherwise, the alignment state is non-alignment.
And thirdly, calculating a comprehensive evaluation value of the data repeated load of each initial splitting result according to a preset weight ratio of the data repeated load to the alignment state.
In this embodiment, in order to exploit the computing efficiency of the hardware more fully, a trade-off needs to be made between the data repeated loading amount and alignment with the number of hardware computing units, that is, between giving more weight to the data repeated loading amount or to the alignment. When the data repeated loading amount carries more weight, a smaller data repeated loading amount takes priority in satisfying the second preset condition; when alignment with the number of hardware computing units carries more weight, the data repeated loading amount corresponding to an aligned splitting mode is judged to satisfy the second preset condition.
As a specific embodiment, a weight ratio between the data repeated loading amount and the alignment state may be preset, and the comprehensive evaluation value of each initial splitting result is calculated according to this weight ratio. For example, suppose the data repeated loading amount of initial splitting result 1 is 300 and its alignment state is aligned, while the data repeated loading amount of initial splitting result 2 is 100 and its alignment state is not aligned. If the weight ratio of the data repeated loading amount to the alignment state is 1/4, the comprehensive evaluation value of initial splitting result 1 is 300 * (1/4) + 1 * (3/4) = 75.75, and the comprehensive evaluation value of initial splitting result 2 is 100 * (1/4) + 0 * (3/4) = 25.
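A sketch of the comprehensive evaluation value using the 1/4 : 3/4 weight ratio from the example above (the function name and the 0/1 encoding of the alignment state are assumptions made for illustration):

```python
def comprehensive_evaluation(reload_amount, aligned, reload_weight=0.25):
    """Weighted combination of the data repeated loading amount and the
    alignment state (1 if aligned to the number of hardware computing
    units, else 0), using the example's 1/4 : 3/4 ratio by default."""
    return reload_amount * reload_weight + (1 if aligned else 0) * (1 - reload_weight)

print(comprehensive_evaluation(300, True))    # 75.75
print(comprehensive_evaluation(100, False))   # 25.0
```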
It should be noted that, if a dimension combination corresponding to an initial splitting result includes both a channel dimension and any one of a height dimension and a width dimension, at this time, one initial splitting result corresponds to two data repeat load amounts, and a corresponding comprehensive evaluation value needs to be calculated for each data repeat load amount corresponding to the initial splitting result.
And finally, determining the data repeated loading amount corresponding to the initial splitting result with the maximum comprehensive evaluation value as the data repeated loading amount meeting the second preset condition.
In sub-step S1023, a weight splitting mode of the target convolutional layer and a second splitting mode of the input feature map are determined according to the first splitting mode.
In this embodiment, the number of channels of the split output feature map in the first splitting mode equals the number of weights in the corresponding split of the target convolutional layer's weights. From the sizes and the number of the output feature sub-graphs in the first splitting mode, the sizes and the number of the corresponding input feature sub-graphs can be obtained, and finally the second splitting mode of the input feature map, comprising the sizes and the number of the input feature sub-graphs, is obtained.
The embodiment of the invention splits the output feature map and derives the splitting of the input feature map from the splitting of the output feature map, so that the output feature sub-graphs produced from the split input feature sub-graphs can be directly and seamlessly spliced into the complete output feature map. This avoids the extra computation on output feature sub-graphs that would be needed to splice them into a complete output feature map if the input feature map were split directly, and thus avoids the reduction in the acceleration efficiency of the convolutional layer that such extra computation would cause.
In sub-step S1024, the weight splitting mode and the second splitting mode are used as the splitting mode of the target convolutional layer, and the target multiplexing mode is used as the data multiplexing mode of the target convolutional layer.
In this embodiment, the splitting manner of the target convolutional layer includes a weight splitting manner of the target convolutional layer and a second splitting manner of the input feature map of the target convolutional layer.
On one hand, the neural network processing method provided by the embodiment of the invention offers two reasonable ways to determine the splitting mode of each convolutional layer: one minimizes the data repeated loading amount of each convolutional layer, thereby improving the acceleration efficiency of the convolutional layer; the other weighs the data repeated loading amount against full utilization of the hardware resources, so that each convolutional layer is split reasonably and the acceleration efficiency of the convolutional layer is improved. On the other hand, the splitting of the input feature map is derived from the splitting of the output feature map, which avoids the extra computation on output feature sub-graphs that direct splitting of the input feature map would require and thus avoids the resulting reduction in the acceleration efficiency of the convolutional layer.
In this embodiment, after the splitting mode and the data multiplexing mode of each convolutional layer have been determined, the convolution operation of each convolutional layer still needs to be split according to its splitting mode and data multiplexing mode, the resulting sub-operations need to be executed, and the results of all sub-operations of each convolutional layer need to be spliced so that the convolution operation of that layer is completed and its output obtained. Therefore, an embodiment of the present invention further provides a neural network processing method; referring to fig. 3, which shows a flowchart of another neural network processing method provided in an embodiment of the present invention, the method includes the following steps:
in step S201, a DMA command for transferring data of the input feature subgraph of each convolution layer is generated according to the size of the input feature subgraph of each sub-operation of each convolution layer and the number of the input feature subgraphs.
In this embodiment, a Direct Memory Access (DMA) instruction is used to transfer data of an input feature sub-graph for performing a sub-operation to a Memory for performing a corresponding operation.
Step S202, according to the data multiplexing method of each convolution layer, adjusting the operation sequence of each sub-operation of the convolution layer to generate the calculation instruction of the sub-operation.
In this embodiment, one convolution layer can be divided into a plurality of sub-operations, and the operation order of each sub-operation in the plurality of sub-operations of the convolution layer is adjusted according to the data multiplexing method of the convolution layer to generate a calculation instruction of each sub-operation.
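One plausible way to express the reordering of sub-operations according to the data multiplexing mode is sketched below; the function and mode names are assumptions made for illustration, not the patented instruction format:

```python
def order_sub_operations(input_subgraphs, sub_weights, reuse="weight"):
    """Return the (input sub-graph, sub-weight) pairs of one convolutional
    layer in the order implied by the data multiplexing mode.

    reuse="weight":      keep one sub-weight resident and sweep all input
                         sub-graphs past it before loading the next sub-weight.
    reuse="feature_map": keep one input sub-graph resident and sweep all
                         sub-weights past it before loading the next sub-graph.
    """
    if reuse == "weight":
        return [(f, w) for w in sub_weights for f in input_subgraphs]
    if reuse == "feature_map":
        return [(f, w) for f in input_subgraphs for w in sub_weights]
    raise ValueError("unknown data multiplexing mode")

# e.g. 7 input feature sub-graphs and 2 sub-weights from the earlier example
pairs = order_sub_operations(list(range(7)), ["wt0", "wt1"], reuse="feature_map")
print(len(pairs))   # 14
```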
Step S203, according to the DMA instruction and the calculation instruction of each sub-operation of each convolutional layer, completing the convolutional operation of each convolutional layer to obtain the operation result of the neural network.
In this embodiment, the convolution operation of each convolutional layer may be split into a plurality of sub-operations. For each convolutional layer, each sub-operation is performed using the corresponding input feature sub-graph and sub-weight to obtain the result of that sub-operation, and the results of all sub-operations of the convolutional layer are spliced to obtain the output feature map corresponding to the input feature map of that layer. In the neural network, the input of the first convolutional layer may be a given picture, and the input of each subsequent convolutional layer is the output of the previous convolutional layer; when all convolutional layers of the neural network perform their convolution operations in sequence, the operation result of the neural network is finally obtained.
According to the neural network processing method provided by the embodiment of the invention, a corresponding data-moving DMA instruction and a calculation instruction are generated for each split sub-operation according to the splitting result of the convolution operation of each convolutional layer, thereby realizing efficient convolution operation of each convolutional layer.
In order to execute the corresponding steps of the above embodiments and their possible implementations, a block diagram of a neural network processing apparatus applied to an electronic device is given below. Referring to fig. 4, fig. 4 shows a block diagram of a neural network processing apparatus 100 applied to an electronic device according to an embodiment of the present invention. It should be noted that the basic principles and technical effects of the neural network processing apparatus 100 applied to the electronic device 10 provided in this embodiment are the same as those of the above embodiments; for brevity, matters not mentioned in this embodiment can be found in the corresponding content of the above embodiments.
The neural network processing device 100 includes:
an obtaining module 110, configured to obtain a neural network to be processed, where the neural network includes at least one convolutional layer, and each convolutional layer corresponds to an input feature map and convolution parameters.
Specifically, the convolution parameters include a weight and a step size, and for any target convolution layer in the neural network, the obtaining module 110 is specifically configured to: determining the size of an output characteristic diagram of the target convolutional layer according to the step length of the target convolutional layer, the number of weights and the size of the input characteristic diagram, wherein the output characteristic diagram is obtained by performing convolution operation on the input characteristic diagram through the target convolutional layer; splitting the output characteristic diagram of the target convolutional layer according to a preset splitting mode and a preset data multiplexing mode to obtain a first splitting mode and a target multiplexing mode of the output characteristic diagram of the target convolutional layer meeting preset conditions; determining a weight splitting mode of the target convolutional layer and a second splitting mode of the input characteristic diagram according to the first splitting mode; the weight splitting mode and the second splitting mode of the target convolutional layer are used as splitting modes of the target convolutional layer, and the target multiplexing mode is used as a data multiplexing mode of the target convolutional layer.
Specifically, the preset splitting manner is multiple, the preset data multiplexing manner is multiple, the preset condition includes a first preset condition and a second preset condition, and the obtaining module 110 is specifically configured to, when the first splitting manner and the target multiplexing manner of the output characteristic diagram of the target convolutional layer meeting the preset condition are obtained by splitting the output characteristic diagram of the target convolutional layer according to the preset splitting manner and the preset data multiplexing manner: splitting the output characteristic diagram of the target convolutional layer according to each preset splitting mode to obtain a plurality of initial splitting results meeting a first preset condition; determining a corresponding initial multiplexing mode from a plurality of preset data multiplexing modes according to the preset splitting mode of each initial splitting result, and calculating the data repeated loading capacity of the initial splitting result according to a target multiplexing mode; and taking the preset splitting mode of the initial splitting result corresponding to the data repeated loading quantity meeting the second preset condition as a first splitting mode, and taking the initial multiplexing mode of the initial splitting result corresponding to the data repeated loading quantity meeting the second preset condition as a target multiplexing mode.
Specifically, the convolution parameters further include a boundary filling parameter pad, the obtaining module 110 performs splitting on the output feature map of the target convolution layer according to each preset splitting mode, and when a plurality of initial splitting results satisfying a first preset condition are obtained, the obtaining module 110 specifically splits the output feature map of the target convolution layer according to any one of the plurality of preset splitting modes, and when a processing of an initial splitting result satisfying the first preset condition is obtained, the obtaining module 110 is specifically configured to: splitting the output characteristic graph of the target convolutional layer according to a target splitting mode to obtain a splitting result of the output characteristic graph, wherein the splitting result of the output characteristic graph comprises the size of the output characteristic subgraph and the number of the output characteristic subgraphs obtained after the output characteristic graph is split; obtaining the size of the input characteristic subgraph after splitting the corresponding input characteristic subgraph and the size of the sub-weight after splitting the weight according to the size, the step length and the pad of the output characteristic subgraph; and if the size of any output feature sub-graph is smaller than a first preset value, the size of any input feature sub-graph is smaller than a second preset value, and the size of any sub-weight is smaller than a third preset value, taking the splitting result of the output feature graph as an initial splitting result meeting a first preset condition.
Specifically, the output feature map includes a channel dimension, a height dimension, and a width dimension, the preset splitting manners include all the dimension combinations of the three dimensions, each initial splitting result corresponds to one dimension combination, the preset data multiplexing manner includes a feature map multiplexing manner and a weight multiplexing manner, the obtaining module 110 determines a corresponding initial multiplexing manner from the preset data multiplexing manners according to the preset splitting manner of each initial splitting result, and when the data repeat load amount of the initial splitting result is calculated according to the target multiplexing manner, the obtaining module is specifically configured to: if the dimension combination corresponding to the initial splitting result only comprises channel dimensions, calculating the data repeated loading capacity of the initial splitting result in a characteristic diagram multiplexing mode; if the dimension combination corresponding to the initial splitting result does not comprise the channel dimension, calculating the data repeated loading capacity of the initial splitting result in a weight multiplexing mode; and if the dimension combination corresponding to the initial splitting result comprises the channel dimension and any one of the height dimension and the width dimension, respectively calculating the data repeated loading capacity of the initial splitting result by adopting a characteristic diagram multiplexing mode and a weight multiplexing mode.
Specifically, before executing the preset splitting manner corresponding to the data reload amount meeting the second preset condition as the first splitting manner and the preset data multiplexing manner corresponding to the data reload amount meeting the second preset condition as the target multiplexing manner, the obtaining module 110 is further specifically configured to: and judging the data repeated loading capacity corresponding to the minimum initial splitting result as the data repeated loading capacity meeting the second preset condition.
Specifically, before executing the preset splitting manner corresponding to the data reload amount meeting the second preset condition as the first splitting manner and the preset data multiplexing manner corresponding to the data reload amount meeting the second preset condition as the target multiplexing manner, the obtaining module 110 is further specifically configured to: acquiring the number of hardware computing units; judging the alignment state of the preset splitting mode corresponding to each initial splitting result relative to the hardware computing unit according to the number of the hardware computing units; calculating a comprehensive evaluation value of the data repeated load of each initial splitting result according to a preset weight proportion of the data repeated load to the alignment state; and judging the data repeated loading amount corresponding to the initial splitting result with the maximum comprehensive evaluation value as the data repeated loading amount meeting the second preset condition.
The determining module 120 is configured to determine the splitting mode of each convolutional layer and the data multiplexing mode of the convolutional layer according to the input feature map and the convolution parameters of the convolutional layer.
The splitting module 130 is configured to split the convolution operation of the neural network according to the splitting mode and the data multiplexing mode of each convolutional layer, so as to obtain a splitting result of the convolution operation of the neural network.
The operation module 140 is configured to: generate a Direct Memory Access (DMA) instruction for moving the data of the input feature subgraphs of each convolutional layer according to the size and the number of the input feature subgraphs of each sub-operation of the convolutional layer; adjust the operation sequence of the sub-operations of each convolutional layer according to the data multiplexing mode of the convolutional layer to generate a calculation instruction for each sub-operation; and complete the convolution operation of each convolutional layer according to the DMA instructions and the calculation instructions of its sub-operations, so as to obtain the operation result of the neural network.
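The sketch below illustrates how the two multiplexing modes translate into different sub-operation orderings. The tuple-based "instructions" and the loop structures are assumptions introduced for exposition; a real accelerator would emit its own DMA and compute instruction formats.

def emit_layer_program(input_tiles, weight_slices, reuse_mode):
    program = []
    if reuse_mode == 'feature_map_reuse':
        # Keep one input feature subgraph resident and stream every weight slice
        # past it, so the feature data is loaded only once.
        for ft in input_tiles:
            program.append(('DMA_LOAD_FMAP', ft))
            for wt in weight_slices:
                program.append(('DMA_LOAD_WEIGHT', wt))
                program.append(('CONV_COMPUTE', ft, wt))
    else:
        # Weight multiplexing: keep one weight slice resident and stream every
        # input feature subgraph past it, so the weights are loaded only once.
        for wt in weight_slices:
            program.append(('DMA_LOAD_WEIGHT', wt))
            for ft in input_tiles:
                program.append(('DMA_LOAD_FMAP', ft))
                program.append(('CONV_COMPUTE', ft, wt))
    return program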
Referring to Fig. 5, Fig. 5 is a block diagram of an electronic device 10 according to an embodiment of the present invention. The electronic device 10 may be a physical computer such as a host or a server, a host group composed of a plurality of hosts or a server group composed of a plurality of servers, a virtual host or virtual server (or a group thereof) that realizes the same functions as a physical computer, or a tablet computer, mobile terminal, or other device. The electronic device 10 includes a memory 11, a communication interface 12, a processor 13 and a bus 14; the memory 11, the communication interface 12 and the processor 13 are connected by the bus 14.
The memory 11 is used to store a program, such as the neural network processing device 100 in Fig. 4. The neural network processing device 100 includes at least one software functional module that can be stored in the memory 11 in the form of software or firmware, and the processor 13 executes the program after receiving an execution instruction, so as to implement the neural network processing method disclosed in the above embodiments.
The memory 11 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile memory, such as at least one disk storage. Alternatively, the memory 11 may be a storage device built into the processor 13 or a storage device independent of the processor 13.
Communication with other external devices is realized through at least one communication interface 12, which may be wired or wireless.
The bus 14 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus is represented in Fig. 5 by a single double-headed arrow, but this does not mean that there is only one bus or only one type of bus.
The processor 13 may be a neural network accelerator; the neural network accelerator may have a compiling function to compile the neural network and then execute the compiled result, thereby completing the convolution operation of the neural network. The processor 13 may be an integrated circuit chip having signal processing capabilities. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 13 or by instructions in the form of software. The processor 13 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the neural network processing method according to any one of the foregoing embodiments.
In summary, embodiments of the present invention provide a neural network processing method, apparatus, electronic device and storage medium. The method includes: acquiring a neural network to be processed, where the neural network includes at least one convolutional layer and each convolutional layer corresponds to an input feature map and convolution parameters; determining the splitting mode of each convolutional layer and the data multiplexing mode of the convolutional layer according to the input feature map and the convolution parameters of the convolutional layer; and splitting the convolution operation of the neural network according to the splitting mode and the data multiplexing mode of each convolutional layer to obtain a splitting result of the convolution operation of the neural network. Compared with the prior art, the embodiments of the present invention determine, for each convolutional layer, a splitting mode and a data multiplexing mode of that layer, so that each convolutional layer can be split for convolution calculation according to the splitting mode of the input feature map and the data multiplexing mode suitable for that layer, thereby improving the computation acceleration efficiency of the neural network.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any change or substitution that can easily be conceived by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (11)

1. A neural network processing method, the method comprising:
acquiring a neural network to be processed, wherein the neural network comprises at least one convolutional layer, and each convolutional layer corresponds to an input feature map and convolution parameters;
determining the splitting mode of each convolutional layer and the data multiplexing mode of the convolutional layer according to the input feature map and the convolution parameters of the convolutional layer;
and splitting the convolution operation of the neural network according to the splitting mode and the data multiplexing mode of each convolution layer to obtain a splitting result of the convolution operation of the neural network.
2. The neural network processing method according to claim 1, wherein the convolution parameters include weights and a step length, and the step of determining the splitting mode of each convolutional layer and the data multiplexing mode of the convolutional layer according to the input feature map and the convolution parameters of the convolutional layer comprises:
performing the following processing on any target convolutional layer in the neural network:
determining the size of an output feature map of the target convolutional layer according to the step length of the target convolutional layer, the number of weights and the size of an input feature map, wherein the output feature map is obtained by performing convolution operation on the input feature map through the target convolutional layer;
splitting the output feature map of the target convolutional layer according to a preset splitting mode and a preset data multiplexing mode to obtain a first splitting mode and a target multiplexing mode of the output feature map of the target convolutional layer that satisfy preset conditions;
determining a weight splitting mode of the target convolutional layer and a second splitting mode of the input feature map according to the first splitting mode;
and taking the weight splitting mode and the second splitting mode as the splitting mode of the target convolutional layer, and taking the target multiplexing mode as the data multiplexing mode of the target convolutional layer.
3. The neural network processing method according to claim 2, wherein there are a plurality of preset splitting modes and a plurality of preset data multiplexing modes, the preset conditions include a first preset condition and a second preset condition, and the step of splitting the output feature map of the target convolutional layer according to the preset splitting mode and the preset data multiplexing mode to obtain the first splitting mode and the target multiplexing mode of the output feature map of the target convolutional layer that satisfy the preset conditions comprises:
splitting the output feature map of the target convolutional layer according to each preset splitting mode to obtain a plurality of initial splitting results satisfying the first preset condition;
determining a corresponding initial multiplexing mode from the plurality of preset data multiplexing modes according to each initial splitting result, and calculating the data repeat load amount of the initial splitting result according to the initial multiplexing mode;
and taking the preset splitting mode of the initial splitting result corresponding to the data repeat load amount satisfying the second preset condition as the first splitting mode, and taking the initial multiplexing mode of the initial splitting result corresponding to the data repeat load amount satisfying the second preset condition as the target multiplexing mode.
4. The neural network processing method according to claim 3, wherein the convolution parameters further include a boundary padding parameter pad, and the step of splitting the output feature map of the target convolutional layer according to each preset splitting mode to obtain a plurality of initial splitting results satisfying the first preset condition comprises:
splitting the output feature map of the target convolutional layer according to any one of the preset splitting modes, referred to as a target splitting mode, to obtain an initial splitting result satisfying the first preset condition, wherein the processing comprises the following steps:
splitting the output feature map of the target convolutional layer according to the target splitting mode to obtain a splitting result of the output feature map, wherein the splitting result of the output feature map comprises the size of the output feature subgraphs obtained after the output feature map is split and the number of the output feature subgraphs;
obtaining, according to the size of the output feature subgraph, the step length and the pad, the size of the corresponding input feature subgraph obtained after the input feature map is split and the size of the sub-weight obtained after the weight is split;
and if the size of any one output feature subgraph is smaller than a first preset value, the size of any one input feature subgraph is smaller than a second preset value, and the size of any one sub-weight is smaller than a third preset value, taking the splitting result of the output feature map as an initial splitting result satisfying the first preset condition.
5. The neural network processing method according to claim 3, wherein the output feature map includes a channel dimension, a height dimension and a width dimension, the preset splitting modes include combinations of the three dimensions, each initial splitting result corresponds to one dimension combination, the preset data multiplexing modes include a feature map multiplexing mode and a weight multiplexing mode, and the step of determining the corresponding initial multiplexing mode from the plurality of preset data multiplexing modes according to each initial splitting result and calculating the data repeat load amount of the initial splitting result according to the initial multiplexing mode comprises:
if the dimension combination corresponding to the initial splitting result includes only the channel dimension, calculating the data repeat load amount of the initial splitting result using the feature map multiplexing mode;
if the dimension combination corresponding to the initial splitting result does not include the channel dimension, calculating the data repeat load amount of the initial splitting result using the weight multiplexing mode;
and if the dimension combination corresponding to the initial splitting result includes the channel dimension and either the height dimension or the width dimension, calculating the data repeat load amount of the initial splitting result using the feature map multiplexing mode and the weight multiplexing mode respectively.
6. The neural network processing method according to claim 5, wherein before the step of taking the preset splitting mode corresponding to the data repeat load amount satisfying the second preset condition as the first splitting mode and taking the preset data multiplexing mode corresponding to the data repeat load amount satisfying the second preset condition as the target multiplexing mode, the method further comprises:
determining the smallest data repeat load amount among the initial splitting results as the data repeat load amount satisfying the second preset condition.
7. The neural network processing method according to claim 5, wherein before the step of taking the preset splitting mode corresponding to the data repeat load amount satisfying the second preset condition as the first splitting mode and taking the preset data multiplexing mode corresponding to the data repeat load amount satisfying the second preset condition as the target multiplexing mode, the method further comprises:
acquiring the number of hardware computing units;
judging, according to the number of hardware computing units, the alignment state of the preset splitting mode corresponding to each initial splitting result relative to the hardware computing units;
calculating a comprehensive evaluation value for each initial splitting result according to preset weights of the data repeat load amount and the alignment state;
and determining the data repeat load amount corresponding to the initial splitting result with the largest comprehensive evaluation value as the data repeat load amount satisfying the second preset condition.
8. The neural network processing method according to claim 1, wherein the splitting result of the convolution operation of the neural network includes an operation splitting result of each convolutional layer of the neural network, the operation splitting result of each convolutional layer includes a plurality of sub-operations, each sub-operation corresponds to a size of the input feature subgraph and a number of the input feature subgraphs, and the method further comprises:
generating a Direct Memory Access (DMA) instruction for moving the data of the input feature subgraphs of each convolutional layer according to the size of the input feature subgraph of each sub-operation of the convolutional layer and the number of the input feature subgraphs;
adjusting the operation sequence of each sub-operation of the convolutional layer according to the data multiplexing mode of each convolutional layer to generate a calculation instruction of the sub-operation;
and according to the DMA instruction and the calculation instruction of each sub-operation of each convolutional layer, completing the convolution operation of each convolutional layer to obtain an operation result of the neural network.
9. An apparatus for neural network processing, the apparatus comprising:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a neural network to be processed, the neural network comprises at least one convolutional layer, and each convolutional layer corresponds to an input feature map and a convolution parameter;
the determining module is used for determining the splitting mode of the convolutional layer and the data multiplexing mode of the convolutional layer according to the input characteristic diagram and the convolution parameters of each convolutional layer;
and the splitting module is used for splitting the convolution operation of the neural network according to the splitting mode and the data multiplexing mode of each convolution layer to obtain the splitting result of the convolution operation of the neural network.
10. An electronic device, characterized in that the electronic device comprises:
one or more processors;
memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the neural network processing method of any one of claims 1-8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, performs the neural network processing method according to any one of claims 1 to 8.
CN202010612070.8A 2020-06-29 2020-06-29 Neural network processing method and device, electronic equipment and storage medium Active CN111738424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010612070.8A CN111738424B (en) 2020-06-29 2020-06-29 Neural network processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010612070.8A CN111738424B (en) 2020-06-29 2020-06-29 Neural network processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111738424A true CN111738424A (en) 2020-10-02
CN111738424B CN111738424B (en) 2023-12-26

Family

ID=72653601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010612070.8A Active CN111738424B (en) 2020-06-29 2020-06-29 Neural network processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111738424B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023045638A1 (en) * 2021-09-26 2023-03-30 寒武纪(西安)集成电路有限公司 Computing device, method for implementing convolution operation by using computing device, and related product
CN115936086A (en) * 2023-01-09 2023-04-07 苏州浪潮智能科技有限公司 Acceleration method, device, equipment and medium based on deep neural network

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160148078A1 (en) * 2014-11-20 2016-05-26 Adobe Systems Incorporated Convolutional Neural Network Using a Binarized Convolution Layer
US20180285715A1 (en) * 2017-03-28 2018-10-04 Samsung Electronics Co., Ltd. Convolutional neural network (cnn) processing method and apparatus
CN107451653A (en) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Computational methods, device and the readable storage medium storing program for executing of deep neural network
CN107451654A (en) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Acceleration operation method, server and the storage medium of convolutional neural networks
CN107316079A (en) * 2017-08-08 2017-11-03 珠海习悦信息技术有限公司 Processing method, device, storage medium and the processor of terminal convolutional neural networks
US20190171930A1 (en) * 2017-12-05 2019-06-06 Samsung Electronics Co., Ltd. Method and apparatus for processing convolution operation in neural network
CN109919311A (en) * 2019-03-13 2019-06-21 北京地平线机器人技术研发有限公司 The method for generating instruction sequence, the method and apparatus for executing neural network computing
CN110705687A (en) * 2019-09-05 2020-01-17 北京三快在线科技有限公司 Convolution neural network hardware computing device and method
CN110717583A (en) * 2019-09-30 2020-01-21 上海寒武纪信息科技有限公司 Convolution circuit, processor, chip, board card and electronic equipment
CN110909874A (en) * 2019-11-22 2020-03-24 迪爱斯信息技术股份有限公司 Convolution operation optimization method and device of neural network model
CN111126309A (en) * 2019-12-26 2020-05-08 长沙海格北斗信息技术有限公司 Convolutional neural network architecture method based on FPGA and face recognition method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张雨丰; 郑忠龙; 刘华文; 向道红; 何小卫; 李知菲; 何依然; KHODJA ABD ERRAOUF: "Lightweight Convolutional Neural Network Based on Feature Map Splitting" (基于特征图切分的轻量级卷积神经网络), Pattern Recognition and Artificial Intelligence (模式识别与人工智能), no. 03

Also Published As

Publication number Publication date
CN111738424B (en) 2023-12-26

Similar Documents

Publication Publication Date Title
US11907830B2 (en) Neural network architecture using control logic determining convolution operation sequence
US11960566B1 (en) Reducing computations for data including padding
JP7025441B2 (en) Scheduling of neural network processing
US10884707B1 (en) Transpose operations using processing element array
KR20170133364A (en) Batch processing in a neural network processor
CN114026569A (en) Extended convolution using systolic arrays
WO2022001014A1 (en) Neural network model compilation method and apparatus, storage medium, and electronic device
CN111738424A (en) Neural network processing method, neural network processing device, electronic equipment and storage medium
CN113313243A (en) Method, device and equipment for determining neural network accelerator and storage medium
KR20200100190A (en) Image Transformation for Machine Learning
CN112884137A (en) Hardware implementation of neural network
US11501145B1 (en) Memory operation for systolic array
CN111523642A (en) Data reuse method, operation method and device and chip for convolution operation
CN117271136A (en) Data processing method, device, equipment and storage medium
CN111767243A (en) Data processing method, related device and computer readable medium
WO2020026475A1 (en) Neural network processor, neural network processing method, and program
US20110238945A1 (en) Apparatus and method for generating code overlay
CN111027688A (en) Neural network calculator generation method and device based on FPGA
CN115481717A (en) Method for operating neural network model, readable medium and electronic device
KR20160037737A (en) Scheduler computing device, data node of distributed computing system having the same, and method thereof
KR20220098341A (en) Neural network operation method, apparatus, electronic device and storage medium
JP2023513608A (en) Address generation method and unit, deep learning processor, chip, electronic device and computer program
CN112884138A (en) Hardware implementation of neural network
CN116157807A (en) Elastic bottleneck architecture for variable convolution operations
CN111832714A (en) Operation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant