CN112949831B - Depth-first data scheduling method, system and equipment based on block convolution


Info

Publication number
CN112949831B
CN112949831B (application number CN202110315074.4A)
Authority
CN
China
Prior art keywords
layer
size
block
convolution
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110315074.4A
Other languages
Chinese (zh)
Other versions
CN112949831A (en)
Inventor
尹志刚
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202110315074.4A
Publication of CN112949831A
Application granted
Publication of CN112949831B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/046 Forward inferencing; Production systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of convolutional neural networks, and particularly relates to a depth-first data scheduling method, system and device based on block convolution, aiming at solving the problem that conventional convolutional model calculation must proceed layer by layer and requires a large amount of memory to store intermediate-result feature maps, which makes such models unsuitable for deployment on all-hardware devices. The invention comprises the following steps: dividing the input feature image into a plurality of blocks and calling each block one by one to perform convolution or max pooling to generate the next-layer feature map; if the next-layer feature map reaches the preset block size, continuing to call the next layer to obtain a deeper feature map; if it is smaller than the preset block size, returning to layer 0 to call the next block, until the inference process is completed. The invention avoids the memory consumption caused by storing a large number of intermediate convolution-layer results and improves the inference efficiency of convolutional models on all-hardware devices.

Description

Depth-first data scheduling method, system and equipment based on block convolution
Technical Field
The invention belongs to the field of convolutional neural networks, and particularly relates to a depth-first data scheduling method, system and device based on block convolution.
Background
With the continuous development of deep learning technology, models represented by convolutional neural networks have achieved good results in fields such as image classification and object detection, and are widely applied in daily life. However, the feature map of each convolutional layer in a convolutional neural network is usually large, so layer-by-layer convolution occupies a large amount of memory, while the memory of all-hardware devices is usually limited; this makes convolutional models difficult to deploy on all-hardware devices and limits the application of convolutional neural networks to a certain extent. In addition, layer-by-layer convolution can only start the next layer after the previous layer has finished, which offers little flexibility and can waste resources on all-hardware devices.
At present, methods such as model pruning and quantization can reduce the memory occupied during forward inference to a certain extent, but memory can still run short when the model is large. It is therefore necessary to design a convolution and scheduling method for all-hardware devices so that convolutional models run efficiently on resource-limited all-hardware devices.
Disclosure of Invention
In order to solve the above problems in the prior art, namely that existing data scheduling on all-hardware devices must convolve the whole image at once, and processing the whole image occupies too much memory to be suitable for deployment on all-hardware devices, the invention provides a depth-first data scheduling method based on block convolution, which comprises the following steps:
step S100, dividing the feature map feature0 of layer 0 into m × n blocks of a preset size B, setting a coordinate index (X, Y), initializing (X, Y) = (0, 0), and setting the layer number j of the feature map to 0;
step S200, if unprocessed blocks exist in the feature map featurej of layer j, inputting block (X, Y) of featurej into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0 of layer 0, performing the operation of step S200 on the next block (X, Y) of feature0, that is, letting X = NX, Y = NY, j = 0, and going to step S200;
if no unprocessed block exists in feature0, data scheduling is finished, and the deepest-layer feature map is the forward inference result;
step S300, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, repeating the operation of step S200 for the next block (X, Y) of featurej, i.e., letting X = NX, Y = NY, and going to step S200; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, setting the size-B feature map featurej+1 as 1 block and repeating the operation of step S200, i.e., letting j = j+1, and going to step S200.
In some preferred embodiments, the size of the layer-0 feature image feature0 is set to H × W = 2^(M·k) × 2^(N·k), the block size is preset to B = 2^k × 2^k, and B is smaller than the size of feature0.
In some preferred embodiments, NX and NY are obtained as follows:
X is represented as m bits in binary, from low to high: x_0, x_1, x_2, ..., x_{m-1}; Y is represented as m bits in binary, from low to high: y_0, y_1, y_2, ..., y_{m-1}. The binary representations of coordinate X and coordinate Y are two-dimensionally shuffled to generate the shuffled coordinate x_0, y_0, x_1, y_1, x_2, y_2, ..., x_{m-1}, y_{m-1}; 1 is added to the shuffled coordinate, which is then two-dimensionally de-shuffled to generate the coordinates NX and NY. X and Y of each layer are computed independently, and the calling sequence is generated by this method of obtaining NX and NY.
In some preferred embodiments, before the feature map passes through a convolutional layer each time, the method further comprises: performing edge zero padding on each block.
For a single-channel feature map, the edge zero-padding size is calculated as:
pad_w = ((out_w - 1) × stride_w + kernel_w - in_w) / 2
pad_h = ((out_h - 1) × stride_h + kernel_h - in_h) / 2
where (in_w, in_h) denotes the size of the input feature map, (out_w, out_h) the size of the convolutional layer's output feature map, (kernel_w, kernel_h) the convolution kernel size, (stride_w, stride_h) the stride of the convolution kernel in the width and height directions, and (pad_w, pad_h) the edge zero-padding size.
In some preferred embodiments, each time the overall size B_{j+1} of featurej+1 reaches B, convolution and max pooling continue to be performed, and the result of featurej is not saved.
In some preferred embodiments, the method of the present invention may directly call a convolution kernel for convolution as long as the data is completely prepared.
On the other hand, the invention provides a depth-first data scheduling system based on block convolution, which comprises an image dividing module, a feature image convolution module and a depth-first calling module;
the image dividing module divides the feature map feature0 of layer 0 into m × n blocks of a preset size B, sets the coordinate index (X, Y), initializes (X, Y) to (0, 0), and sets the layer number j of the feature map to 0;
if unprocessed blocks exist in the feature map featurej of layer j, the feature image convolution module inputs block (X, Y) of featurej into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0 of layer 0, the function of the feature image convolution module is performed on the next block (X, Y) of feature0, that is, X = NX, Y = NY, j = 0, and control is transferred to the feature image convolution module;
if no unprocessed block exists in feature0, data scheduling is completed, and the deepest-layer feature map is the forward inference result;
the depth-first calling module is configured so that, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, the function of the feature image convolution module is repeated for the next block (X, Y) of featurej, namely X = NX, Y = NY, and control is transferred to the feature image convolution module; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, the size-B feature map featurej+1 is set as 1 block and the function of the feature image convolution module is repeated, namely j = j+1, and control is transferred to the feature image convolution module.
In a third aspect of the present invention, an electronic device is provided, including: at least one processor; and a memory communicatively coupled to at least one of the processors; wherein the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the above depth-first data scheduling method based on block convolution.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, where the computer-readable storage medium stores computer instructions to be executed by a computer to implement the above depth-first data scheduling method based on block convolution.
The invention has the beneficial effects that:
(1) According to the depth-first data scheduling method based on block convolution of the present invention, depth-first scheduling of block convolution replaces the conventional calling method that must convolve the image layer by layer, thereby avoiding the memory consumption caused by storing a large number of intermediate convolution-layer results and improving the inference efficiency of convolutional models on all-hardware devices.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flowchart illustrating an embodiment of a depth-first data scheduling method based on block convolution according to the present invention;
FIG. 2 is a schematic diagram illustrating block zero padding of a feature map in an embodiment of a depth-first data scheduling method based on block convolution according to the present invention;
FIG. 3 is a schematic diagram illustrating the principle of depth-first scheduling according to an embodiment of the depth-first data scheduling method based on block convolution according to the present invention;
FIG. 4 is a schematic diagram illustrating the principle of two-dimensional shuffling and de-shuffling according to an embodiment of the depth-first data scheduling method based on block convolution of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention discloses a depth-first data scheduling method based on block convolution, which comprises the following steps:
step S100, dividing the feature map feature0 of layer 0 into m × n blocks of a preset size B, setting a coordinate index (X, Y), initializing (X, Y) = (0, 0), and setting the layer number j of the feature map to 0;
step S200, if unprocessed blocks exist in the feature map featurej of layer j, inputting block (X, Y) of featurej into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0 of layer 0, performing the operation of step S200 on the next block (X, Y) of feature0, that is, letting X = NX, Y = NY, j = 0, and going to step S200;
if no unprocessed block exists in feature0, data scheduling is finished, and the deepest-layer feature map is the forward inference result;
step S300, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, repeating the operation of step S200 for the next block (X, Y) of featurej, i.e., letting X = NX, Y = NY, and going to step S200; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, setting the size-B feature map featurej+1 as 1 block and repeating the operation of step S200, i.e., letting j = j+1, and going to step S200.
In order to more clearly describe the depth-first data scheduling method based on block convolution of the present invention, the following describes each step in the embodiment of the present invention in detail with reference to fig. 1.
The depth-first data scheduling method based on block convolution according to the first embodiment of the present invention includes steps S100 to S300, and the steps are described in detail as follows:
step S100, dividing the feature map feature0 of layer 0 into m × n blocks of a preset size B, setting a coordinate index (X, Y), initializing (X, Y) = (0, 0), and setting the layer number j of the feature map to 0;
in this embodiment, as shown in FIG. 2, before the feature map passes through a convolutional layer each time, zero padding is performed on each block. The left side of FIG. 2 shows an H × W feature map partitioned according to the preset block size, and the right side of FIG. 2 illustrates the edge zero padding performed on each partitioned block.
For a single-channel feature map, the edge zero-padding size is calculated as:
pad_w = ((out_w - 1) × stride_w + kernel_w - in_w) / 2
pad_h = ((out_h - 1) × stride_h + kernel_h - in_h) / 2
where (in_w, in_h) denotes the size of the input feature map, (out_w, out_h) the size of the convolutional layer's output feature map, (kernel_w, kernel_h) the convolution kernel size, (stride_w, stride_h) the stride of the convolution kernel in the width and height directions, and (pad_w, pad_h) the edge zero-padding size.
For each block, the feature size input to the convolution must be set equal to the feature size output by the convolution, i.e., (out_w, out_h) = (in_w, in_h). Once the convolution kernel size and the convolution stride are known, the edge zero-padding sizes in the width and height directions can be computed from the formula above. Taking FIG. 2 as an example, if (in_w, in_h) of each block is (5, 5), the kernel size (kernel_w, kernel_h) is (3, 3), and the stride (stride_w, stride_h) is (1, 1), then to satisfy an output size (out_w, out_h) of (5, 5), the formula gives (pad_w, pad_h) = (1, 1), i.e., one column of zeros is added on each side in the width direction and one row of zeros on each side in the height direction.
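For illustration, this padding computation can be written out as a short sketch (a minimal Python sketch; the function name pad_size and its tuple-based signature are illustrative, not part of the patent):

```python
def pad_size(in_wh, kernel_wh, stride_wh, out_wh):
    """Per-direction edge zero padding: pad = ((out - 1) * stride + kernel - in) / 2.

    Assumes the numerator is even, i.e. the padding is symmetric on both sides.
    """
    return tuple(((o - 1) * s + k - i) // 2
                 for i, k, s, o in zip(in_wh, kernel_wh, stride_wh, out_wh))

# Worked example from FIG. 2: 5x5 blocks, 3x3 kernel, stride (1, 1), 5x5 output
assert pad_size((5, 5), (3, 3), (1, 1), (5, 5)) == (1, 1)
```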
In the present embodiment, for example, the size of the layer-0 feature image feature0 is set to H × W = 2^(M·k) × 2^(N·k), the block size is preset to B = 2^k × 2^k, and B is smaller than the size of feature0. The feature map size remains unchanged after passing through a convolutional layer, while a max pooling layer halves the side length of the feature map, so only two situations can occur during actual operation: in the first case, the side length of the feature map is larger than B, and the feature map can be divided into several blocks according to B; in the second case, the side length of the feature map is smaller than B, and convolution continues at the actual size.
If unprocessed blocks exist in the feature map featurej of layer j, block (X, Y) of featurej is input into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0 of layer 0, performing the operation of step S200 on the next block (X, Y) of feature0, that is, letting X = NX, Y = NY, j = 0, and going to step S200;
if no unprocessed block exists in feature0, data scheduling is finished, and the deepest-layer feature map is the forward inference result;
step S300, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, repeating the operation of step S200 for the next block (X, Y) of featurej, i.e., letting X = NX, Y = NY, and going to step S200; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, setting the size-B feature map featurej+1 as 1 block and repeating the operation of step S200, i.e., letting j = j+1, and going to step S200.
Depending on the size of the input image and the number of max pooling layers in the model, the size of the last-layer feature map may be larger than, equal to, or smaller than the preset block size B.
In this embodiment, NX and NY may be obtained through a preset ordering table, or according to the two-dimensional shuffling method proposed in this application; the two-dimensional shuffling method may be executed in all-hardware form by a two-dimensional shuffling apparatus, as shown in FIG. 4. Specifically:
the arrangement of X being in binary representationm bits, x from low to high0,x1,x2,......,xm-1(ii) a Y is configured as m bits in binary representation, from low to high being Y0,y1,y2,......,ym-1(ii) a Performing two-dimensional shuffling on binary representations of the coordinate X and the coordinate Y to generate a two-dimensional shuffling coordinate X0,y0,x1,y1,x2,y2,......,xm-1,ym-1Adding 1 to the two-dimensional mixed-arranging coordinate, then performing two-dimensional mixed-arranging to generate a coordinate NX and a coordinate NY, wherein X and Y of each layer are independently calculated, and the calling sequence is generated by an NX and NY obtaining method.
Taking the layer-0 feature map as an example, as shown in FIG. 3, feature0 on the left of FIG. 3 is a feature map composed of m × n blocks. First, block (0,0) is called for convolution to obtain feature1(0,0); feature1(0,0) is exactly one block in size, so execution continues to the next layer. FIG. 3 shows that the next layer is a maxpool layer, after which block (0,0) of feature2 is one fourth of a block in size; at this point the next layer is not executed, and the process returns to feature0.
After block (0,0), block (1,0), block (0,1) and block (1,1) of feature0 are scheduled in turn through convolution and maxpool, block (0,0) of feature2 reaches one block in size, as shown on the right of FIG. 3. At this point the process does not return to feature0 for scheduling; instead, block (0,0) of feature2 is called and execution continues downward until a maxpool layer makes the feature smaller than one block in size. Then block (2,0), block (3,0), block (2,1) and block (3,1) of feature0 are called in turn, and so on, until the forward inference process is completed.
in this embodiment, as long as the data preparation is completed, the convolution kernel can be called directly for convolution.
In the present embodiment, each time the overall size B_{j+1} of featurej+1 reaches B, convolution and max pooling continue to be performed, and the result of featurej is not saved.
In the conventional feature map processing method, all of feature0 is generally convolved to obtain the complete next-layer feature1, and only then is the next layer convolved. This conventional approach must occupy a large amount of memory to store feature1, and the pooling step likewise requires memory to store feature2. In contrast, the present application does not store the result of feature1 in memory but uses it directly, as an intermediate result, for the next layer's operation. If the next layer is again a convolutional layer, block (0,0) of feature2 is obtained, which satisfies one block size (no max pooling has occurred) and can therefore continue through the next layer without being saved. If the next layer is a max pooling layer, block (0,0) of feature2 is obtained at only one quarter of a block in size; the process then backtracks to block (1,0) of feature0 for convolution, and this backtracking repeats until the size of feature2 reaches one block.
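To make the depth-first backtracking concrete, the following is a minimal single-channel NumPy sketch of the scheduling, under stated assumptions: convolutional layers are represented by size-preserving stand-ins (the scheduling, not the convolution arithmetic, is what is being illustrated), and the names schedule, maxpool2, buffers and results are illustrative:

```python
import numpy as np

def maxpool2(x):
    """2x2 max pooling: halves each side of a block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def schedule(blk, x, y, j, layers, buffers, B, results):
    """Push one block depth-first through layers[j:], backtracking to the
    caller whenever the output no longer fills a full B x B block."""
    if j == len(layers):
        results[(x, y)] = blk                  # deepest-layer feature map block
        return
    out = layers[j](blk)
    if out.shape[0] == B:                      # convolution output: full block
        schedule(out, x, y, j + 1, layers, buffers, B, results)
        return
    # max-pooling output: a quarter block; stash it until its 3 neighbours arrive
    key = (x // 2, y // 2)
    quad = buffers[j].setdefault(key, {})
    quad[(x % 2, y % 2)] = out
    if len(quad) == 4:                         # four quarters -> one full block
        full = np.vstack([np.hstack([quad[(0, 0)], quad[(1, 0)]]),
                          np.hstack([quad[(0, 1)], quad[(1, 1)]])])
        del buffers[j][key]
        schedule(full, key[0], key[1], j + 1, layers, buffers, B, results)

# Demo: conv, maxpool, conv, maxpool on a 32x32 feature0 with B = 8 (4x4 grid),
# scheduled in the shuffled (Z-order) sequence of FIG. 3.
B, m = 8, 4
feature0 = np.random.rand(m * B, m * B).astype(np.float32)
layers = [lambda b: b, maxpool2, lambda b: b, maxpool2]  # conv stand-ins keep size
buffers, results = [dict() for _ in layers], {}
for x, y in [(0,0), (1,0), (0,1), (1,1), (2,0), (3,0), (2,1), (3,1),
             (0,2), (1,2), (0,3), (1,3), (2,2), (3,2), (2,3), (3,3)]:
    schedule(feature0[y * B:(y + 1) * B, x * B:(x + 1) * B],
             x, y, 0, layers, buffers, B, results)
assert results[(0, 0)].shape == (B, B)         # 32x32 through two poolings -> 8x8
```

In this sketch, each layer's buffer holds at most three quarter-blocks per assembly position instead of a whole intermediate feature map, which is the memory saving the method aims at.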
A depth-first data scheduling system based on block convolution according to a second embodiment of the present invention includes: an image dividing module, a feature image convolution module and a depth-first calling module;
the image dividing module divides the feature map feature0 of layer 0 into m × n blocks of a preset size B, sets the coordinate index (X, Y), initializes (X, Y) to (0, 0), and sets the layer number j of the feature map to 0;
In this embodiment, the system further includes an image zero-padding module configured to perform edge zero padding on each block before the feature map passes through the convolutional layer each time. For a single-channel feature map, the edge zero-padding size is calculated as:
pad_w = ((out_w - 1) × stride_w + kernel_w - in_w) / 2
pad_h = ((out_h - 1) × stride_h + kernel_h - in_h) / 2
where (in_w, in_h) denotes the size of the input feature map, (out_w, out_h) the size of the convolutional layer's output feature map, (kernel_w, kernel_h) the convolution kernel size, (stride_w, stride_h) the stride of the convolution kernel in the width and height directions, and (pad_w, pad_h) the edge zero-padding size.
If unprocessed blocks exist in the feature map featurej of the jth layer, the feature image convolution module inputs the blocks (X, Y) of featurej into the next layer of network for operation; the next layer network is any one of a convolutional layer or a maximum pooling layer; if the next layer of network is a convolutional layer, generating a block with the size of B, namely a block (X, Y) of featurej +1 with the size of B; if the next network is the maximum pooling layer, the generation size is
Figure GDA0003160898010000103
Block of, i.e. size
Figure GDA0003160898010000111
Block (X, Y) of featurej +1 of (1);
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0 of layer 0, the function of the feature image convolution module is performed on the next block (X, Y) of feature0, that is, X = NX, Y = NY, j = 0, and control is transferred to the feature image convolution module;
if no unprocessed block exists in feature0, data scheduling is completed, and the deepest-layer feature map is the forward inference result;
the depth-first calling module is configured so that, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, the function of the feature image convolution module is repeated for the next block (X, Y) of featurej, namely X = NX, Y = NY, and control is transferred to the feature image convolution module; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, the size-B feature map featurej+1 is set as 1 block and the function of the feature image convolution module is repeated, namely j = j+1, and control is transferred to the feature image convolution module.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the depth-first data scheduling system based on block convolution according to the foregoing embodiment is only illustrated by the division of the functional modules, and in practical applications, the functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An electronic device according to a third embodiment of the present invention is characterized by including: at least one processor; and a memory communicatively coupled to at least one of the processors; wherein the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the above depth-first data scheduling method based on block convolution.
A computer-readable storage medium according to a fourth embodiment of the present invention is characterized in that the computer-readable storage medium stores computer instructions to be executed by a computer to implement the above depth-first data scheduling method based on block convolution.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (9)

1. A depth-first data scheduling method based on block convolution is characterized by comprising the following steps:
step S100, dividing the feature map feature0 of layer 0 into p × q blocks of a preset size B, setting a coordinate index (X, Y), initializing (X, Y) = (0, 0), and setting the layer number j of the feature map to 0;
step S200, if unprocessed blocks exist in the feature map featurej of layer j, inputting block (X, Y) of featurej into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0, performing the operation of step S200 on the next block (X, Y) of feature0 of layer 0, that is, letting X = NX, Y = NY, j = 0, and going to step S200; X = NX and Y = NY are obtained specifically as follows: X is represented as m bits in binary, from low to high x_0, x_1, x_2, ..., x_{m-1}; Y is represented as m bits in binary, from low to high y_0, y_1, y_2, ..., y_{m-1}; the binary representations of coordinate X and coordinate Y are two-dimensionally shuffled to generate the shuffled coordinate x_0, y_0, x_1, y_1, x_2, y_2, ..., x_{m-1}, y_{m-1}; 1 is added to the shuffled coordinate, which is then two-dimensionally de-shuffled to generate the coordinates NX and NY; X and Y of each layer are computed independently, and the calling sequence is generated by this method of obtaining NX and NY;
if no unprocessed block exists in feature0, data scheduling is finished, and the deepest-layer feature map is the forward inference result;
step S300, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, repeating the operation of step S200 for the next block (X, Y) of featurej, i.e., letting X = NX, Y = NY, and going to step S200; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, setting the size-B feature map featurej+1 as 1 block and repeating the operation of step S200, i.e., letting j = j+1, and going to step S200.
2. The method of claim 1, further comprising setting the size of the layer-0 feature image feature0 to H × W = 2^(p·k) × 2^(q·k) and presetting the block size to B = 2^k × 2^k, with B smaller than the size of feature0.
3. The method of claim 1, wherein before the feature map passes through the convolutional layer each time, the method further comprises: performing edge zero padding on each block;
for a single-channel feature map, the edge zero-padding size is calculated as:
pad_w = ((out_w - 1) × stride_w + kernel_w - in_w) / 2
pad_h = ((out_h - 1) × stride_h + kernel_h - in_h) / 2
where (in_w, in_h) denotes the size of the input feature map, (out_w, out_h) the size of the convolutional layer's output feature map, (kernel_w, kernel_h) the convolution kernel size, (stride_w, stride_h) the stride of the convolution kernel in the width and height directions, and (pad_w, pad_h) the edge zero-padding size.
4. The block convolution-based depth-first data scheduling method of claim 1, wherein each time the overall size B_{j+1} of featurej+1 reaches B, convolution and max pooling continue to be performed, and the result of featurej is not saved.
5. The method of claim 1, wherein a convolution kernel can be called directly for convolution as long as data preparation is completed.
6. A system for block convolution based depth-first data scheduling, the system comprising: the system comprises an image dividing module, a characteristic image convolution module and a depth priority calling module;
the image dividing module is used for dividing the feature map feature0 of layer 0 into p × q blocks of a preset size B, setting the coordinate index (X, Y), initializing (X, Y) to (0, 0), and setting the layer number j of the feature map to 0;
if unprocessed blocks exist in the feature map featurej of layer j, the feature image convolution module inputs block (X, Y) of featurej into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0, the function of the feature image convolution module is performed on the next block (X, Y) of feature0 of layer 0, that is, X = NX, Y = NY, j = 0, and control is transferred to the feature image convolution module; X = NX and Y = NY are obtained specifically as follows: X is represented as m bits in binary, from low to high x_0, x_1, x_2, ..., x_{m-1}; Y is represented as m bits in binary, from low to high y_0, y_1, y_2, ..., y_{m-1}; the binary representations of coordinate X and coordinate Y are two-dimensionally shuffled to generate the shuffled coordinate x_0, y_0, x_1, y_1, x_2, y_2, ..., x_{m-1}, y_{m-1}; 1 is added to the shuffled coordinate, which is then two-dimensionally de-shuffled to generate the coordinates NX and NY; X and Y of each layer are computed independently, and the calling sequence is generated by this method of obtaining NX and NY;
if no unprocessed block exists in feature0, data scheduling is completed, and the deepest-layer feature map is the forward inference result;
the depth-first calling module is configured so that, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, the function of the feature image convolution module is repeated for the next block (X, Y) of featurej, namely X = NX, Y = NY, and control is transferred to the feature image convolution module; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, the size-B feature map featurej+1 is set as 1 block and the function of the feature image convolution module is repeated, namely j = j+1, and control is transferred to the feature image convolution module.
7. The block convolution-based depth-first data scheduling system of claim 6, further comprising an image zero-padding module configured to perform edge zero padding on each block, after feature0 is divided into p × q blocks and before the feature map passes through the convolutional layer each time;
for a single-channel feature map, the edge zero-padding size is calculated as:
pad_w = ((out_w - 1) × stride_w + kernel_w - in_w) / 2
pad_h = ((out_h - 1) × stride_h + kernel_h - in_h) / 2
where (in_w, in_h) denotes the size of the input feature map, (out_w, out_h) the size of the convolutional layer's output feature map, (kernel_w, kernel_h) the convolution kernel size, (stride_w, stride_h) the stride of the convolution kernel in the width and height directions, and (pad_w, pad_h) the edge zero-padding size.
8. An electronic device, comprising: at least one processor; and a memory communicatively coupled to at least one of the processors; wherein the memory stores instructions executable by the processor for execution by the processor to implement the block convolution based depth-first data scheduling method of any of claims 1-5.
9. A computer-readable storage medium storing computer instructions for execution by the computer to implement the block convolution-based depth-first data scheduling method of any one of claims 1 to 5.
CN202110315074.4A 2021-03-24 2021-03-24 Depth-first data scheduling method, system and equipment based on block convolution Active CN112949831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110315074.4A CN112949831B (en) 2021-03-24 2021-03-24 Depth-first data scheduling method, system and equipment based on block convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110315074.4A CN112949831B (en) 2021-03-24 2021-03-24 Depth-first data scheduling method, system and equipment based on block convolution

Publications (2)

Publication Number Publication Date
CN112949831A (en) 2021-06-11
CN112949831B (en) 2021-10-01

Family

ID=76227703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110315074.4A Active CN112949831B (en) 2021-03-24 2021-03-24 Depth-first data scheduling method, system and equipment based on block convolution

Country Status (1)

Country Link
CN (1) CN112949831B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107437110B (en) * 2017-07-11 2021-04-02 中国科学院自动化研究所 Block convolution optimization method and device of convolutional neural network
CN108875904A (en) * 2018-04-04 2018-11-23 北京迈格威科技有限公司 Image processing method, image processing apparatus and computer readable storage medium
CN109858495B (en) * 2019-01-16 2023-09-22 五邑大学 Feature extraction method and device based on improved convolution block and storage medium thereof

Also Published As

Publication number Publication date
CN112949831A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN115186821B (en) Core particle-oriented neural network inference overhead estimation method and device and electronic equipment
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
CN112073221B (en) Method and device for realizing network node sequencing
CN112711422A (en) Optimization method and system for neural network compiling
CN112163601B (en) Image classification method, system, computer device and storage medium
CN114492782B (en) On-chip core compiling and mapping method and device of neural network based on reinforcement learning
CN115421897B (en) Core particle-oriented deep neural network pipeline parallel scheduling method and device
CN112819157B (en) Neural network training method and device, intelligent driving control method and device
CN115660078A (en) Distributed computing method, system, storage medium and electronic equipment
CN111652330A (en) Image processing method, device, system, electronic equipment and readable storage medium
CN112200300A (en) Convolutional neural network operation method and device
CN115860081B (en) Core algorithm scheduling method, system, electronic equipment and storage medium
CN116075821A (en) Form convolution and acceleration
CN111931927B (en) Method and device for reducing occupation of computing resources in NPU
CN116720551B (en) Convolution acceleration method and convolution accelerator of impulse neural network
CN115186802A (en) Block sparse method and device based on convolutional neural network and processing unit
Fobel et al. A scalable, serially-equivalent, high-quality parallel placement methodology suitable for modern multicore and GPU architectures
CN113986816A (en) Reconfigurable computing chip
CN115017773A (en) Dimension reduction method of three-dimensional grid model, electronic equipment and medium
CN112949831B (en) Depth-first data scheduling method, system and equipment based on block convolution
CN110533161A (en) A kind of characteristic pattern processing method based on layering group convolutional neural networks
CN116501325A (en) Operator processing method and computer equipment
CN117808101A (en) Neural network reasoning method, system and storage medium based on FPGA
CN116522844B (en) Circuit dividing method, circuit node voltage calculating method, terminal and storage medium
WO2024191479A1 (en) Dynamic uncompression for channel-separable operation in neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant