CN112949831B - Depth-first data scheduling method, system and equipment based on block convolution


Info

Publication number
CN112949831B
CN112949831B (application number CN202110315074.4A)
Authority
CN
China
Prior art keywords
layer
size
block
convolution
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110315074.4A
Other languages
Chinese (zh)
Other versions
CN112949831A (en)
Inventor
尹志刚
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202110315074.4A
Publication of CN112949831A
Application granted
Publication of CN112949831B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/046 Forward inferencing; Production systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of convolutional neural networks, and particularly relates to a depth-first data scheduling method, system and device based on block convolution, aiming at solving the problem that conventional convolutional model calculation must proceed layer by layer and requires a large amount of memory to store intermediate-result feature maps, which makes such models unsuitable for deployment on all-hardware devices. The invention comprises the following steps: dividing the input feature image into a plurality of blocks and calling each block one by one to perform convolution or max pooling to generate the next-layer feature map; if the next-layer feature map reaches the preset block size, continuing to call the next layer to obtain a deeper feature map; if it is smaller than the preset block size, returning to layer 0 to call the next block, until the inference process is completed. The invention avoids the memory consumption caused by storing a large number of intermediate convolution-layer results and improves the inference efficiency of convolutional models on all-hardware devices.

Description

Depth-first data scheduling method, system and equipment based on block convolution
Technical Field
The invention belongs to the field of convolutional neural networks, and particularly relates to a depth-first data scheduling method, system and device based on block convolution.
Background
With the continuous development of deep learning technology, models represented by convolutional neural networks have achieved good results in fields such as image classification and object detection, and are widely applied in daily life. However, the feature map of each convolutional layer in a convolutional neural network is usually large, so layer-by-layer convolution occupies a large amount of memory, while the memory of all-hardware devices is usually limited; this makes convolutional models difficult to deploy on all-hardware devices and limits the application of convolutional neural networks to a certain extent. In addition, layer-by-layer convolution can only start the next layer after the previous layer has finished, which offers little flexibility and can waste resources on all-hardware devices.
At present, methods such as model pruning and quantization can reduce the memory occupied during forward inference to a certain extent, but memory can still run short when the model is large. It is therefore necessary to design a convolution and scheduling method for all-hardware devices so that convolutional models run efficiently on resource-limited all-hardware devices.
Disclosure of Invention
In order to solve the above problems in the prior art, namely that existing data scheduling on all-hardware devices must convolve the whole image at once, and processing the whole image occupies too much memory to be suitable for deployment on all-hardware devices, the invention provides a depth-first data scheduling method based on block convolution, which comprises the following steps:
step S100, dividing the feature map feature0 of layer 0 into m × n blocks of a preset size B, setting a coordinate index (X, Y), initializing (X, Y) = (0, 0), and setting the layer number j of the feature map to 0;
step S200, if unprocessed blocks exist in the feature map featurej of layer j, inputting block (X, Y) of featurej into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0 of layer 0, performing the operation of step S200 on the next block (X, Y) of feature0, that is, letting X = NX, Y = NY, j = 0, and going to step S200;
if no unprocessed block exists in feature0, data scheduling is finished, and the deepest-layer feature map is the forward inference result;
step S300, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, repeating the operation of step S200 for the next block (X, Y) of featurej, i.e., letting X = NX, Y = NY, and going to step S200; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, setting the size-B feature map featurej+1 as 1 block and repeating the operation of step S200, i.e., letting j = j+1, and going to step S200.
In some preferred embodiments, the size of the layer-0 feature image feature0 is set to H × W = 2^(M·k) × 2^(N·k), the block size is preset to B = 2^k × 2^k, and B is smaller than the size of feature0.
In some preferred embodiments, NX and NY are obtained as follows:
X is represented as m bits in binary, from low to high: x_0, x_1, x_2, ..., x_{m-1}; Y is represented as m bits in binary, from low to high: y_0, y_1, y_2, ..., y_{m-1}. The binary representations of coordinate X and coordinate Y are two-dimensionally shuffled to generate the shuffled coordinate x_0, y_0, x_1, y_1, x_2, y_2, ..., x_{m-1}, y_{m-1}; 1 is added to the shuffled coordinate, which is then two-dimensionally de-shuffled to generate the coordinates NX and NY. X and Y of each layer are computed independently, and the calling sequence is generated by this method of obtaining NX and NY.
In some preferred embodiments, before the feature map passes through a convolutional layer each time, the method further comprises: performing edge zero padding on each block.
For a single-channel feature map, the edge zero-padding size is calculated as:
pad_w = ((out_w - 1) × stride_w + kernel_w - in_w) / 2
pad_h = ((out_h - 1) × stride_h + kernel_h - in_h) / 2
where (in_w, in_h) denotes the size of the input feature map, (out_w, out_h) the size of the convolutional layer's output feature map, (kernel_w, kernel_h) the convolution kernel size, (stride_w, stride_h) the stride of the convolution kernel in the width and height directions, and (pad_w, pad_h) the edge zero-padding size.
In some preferred embodiments, each time the overall size B_{j+1} of featurej+1 reaches B, convolution and max pooling continue to be performed, and the result of featurej is not saved.
In some preferred embodiments, the method of the present invention may directly call a convolution kernel for convolution as long as the data is completely prepared.
On the other hand, the invention provides a depth-first data scheduling system based on block convolution, which comprises an image dividing module, a feature image convolution module and a depth-first calling module;
the image dividing module divides the feature map feature0 of layer 0 into m × n blocks of a preset size B, sets the coordinate index (X, Y), initializes (X, Y) to (0, 0), and sets the layer number j of the feature map to 0;
if unprocessed blocks exist in the feature map featurej of layer j, the feature image convolution module inputs block (X, Y) of featurej into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0 of layer 0, the function of the feature image convolution module is performed on the next block (X, Y) of feature0, that is, X = NX, Y = NY, j = 0, and control is transferred to the feature image convolution module;
if no unprocessed block exists in feature0, data scheduling is completed, and the deepest-layer feature map is the forward inference result;
the depth-first calling module is configured so that, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, the function of the feature image convolution module is repeated for the next block (X, Y) of featurej, namely X = NX, Y = NY, and control is transferred to the feature image convolution module; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, the size-B feature map featurej+1 is set as 1 block and the function of the feature image convolution module is repeated, namely j = j+1, and control is transferred to the feature image convolution module.
In a third aspect of the present invention, an electronic device is provided, including: at least one processor; and a memory communicatively coupled to at least one of the processors; wherein the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the above depth-first data scheduling method based on block convolution.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, where the computer-readable storage medium stores computer instructions to be executed by a computer to implement the above depth-first data scheduling method based on block convolution.
The invention has the beneficial effects that:
(1) According to the depth-first data scheduling method based on block convolution of the present invention, depth-first scheduling of block convolution replaces the conventional calling method that must convolve the image layer by layer, thereby avoiding the memory consumption caused by storing a large number of intermediate convolution-layer results and improving the inference efficiency of convolutional models on all-hardware devices.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flowchart illustrating an embodiment of a depth-first data scheduling method based on block convolution according to the present invention;
FIG. 2 is a schematic diagram illustrating block zero padding of a feature map in an embodiment of a depth-first data scheduling method based on block convolution according to the present invention;
FIG. 3 is a schematic diagram illustrating the principle of depth-first scheduling according to an embodiment of the depth-first data scheduling method based on block convolution according to the present invention;
FIG. 4 is a schematic diagram illustrating the principle of two-dimensional shuffling and de-shuffling according to an embodiment of the depth-first data scheduling method based on block convolution of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention discloses a depth-first data scheduling method based on block convolution, which comprises the following steps:
step S100, dividing the feature map feature0 of layer 0 into m × n blocks of a preset size B, setting a coordinate index (X, Y), initializing (X, Y) = (0, 0), and setting the layer number j of the feature map to 0;
step S200, if unprocessed blocks exist in the feature map featurej of layer j, inputting block (X, Y) of featurej into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0 of layer 0, performing the operation of step S200 on the next block (X, Y) of feature0, that is, letting X = NX, Y = NY, j = 0, and going to step S200;
if no unprocessed block exists in feature0, data scheduling is finished, and the deepest-layer feature map is the forward inference result;
step S300, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, repeating the operation of step S200 for the next block (X, Y) of featurej, i.e., letting X = NX, Y = NY, and going to step S200; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, setting the size-B feature map featurej+1 as 1 block and repeating the operation of step S200, i.e., letting j = j+1, and going to step S200.
In order to more clearly describe the depth-first data scheduling method based on block convolution of the present invention, the following describes each step in the embodiment of the present invention in detail with reference to fig. 1.
The depth-first data scheduling method based on block convolution according to the first embodiment of the present invention includes steps S100 to S300, and the steps are described in detail as follows:
step S100, dividing the feature map feature0 of layer 0 into m × n blocks of a preset size B, setting a coordinate index (X, Y), initializing (X, Y) = (0, 0), and setting the layer number j of the feature map to 0;
in this embodiment, as shown in FIG. 2, before the feature map passes through a convolutional layer each time, zero padding is performed on each block. The left side of FIG. 2 shows an H × W feature map partitioned according to the preset block size, and the right side of FIG. 2 illustrates the edge zero padding performed on each partitioned block.
For a single-channel feature map, the edge zero-padding size is calculated as:
pad_w = ((out_w - 1) × stride_w + kernel_w - in_w) / 2
pad_h = ((out_h - 1) × stride_h + kernel_h - in_h) / 2
where (in_w, in_h) denotes the size of the input feature map, (out_w, out_h) the size of the convolutional layer's output feature map, (kernel_w, kernel_h) the convolution kernel size, (stride_w, stride_h) the stride of the convolution kernel in the width and height directions, and (pad_w, pad_h) the edge zero-padding size.
For each block, the feature size input to the convolution must be set equal to the feature size output by the convolution, i.e., (out_w, out_h) = (in_w, in_h). Once the convolution kernel size and the convolution stride are known, the edge zero-padding sizes in the width and height directions can be computed from the formula above. Taking FIG. 2 as an example, if (in_w, in_h) of each block is (5, 5), the kernel size (kernel_w, kernel_h) is (3, 3), and the stride (stride_w, stride_h) is (1, 1), then to satisfy an output size (out_w, out_h) of (5, 5), the formula gives (pad_w, pad_h) = (1, 1), i.e., one column of zeros is added on each side in the width direction and one row of zeros on each side in the height direction.
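For illustration, this padding computation can be written out as a short sketch (a minimal Python sketch; the function name pad_size and its tuple-based signature are illustrative, not part of the patent):

```python
def pad_size(in_wh, kernel_wh, stride_wh, out_wh):
    """Per-direction edge zero padding: pad = ((out - 1) * stride + kernel - in) / 2.

    Assumes the numerator is even, i.e. the padding is symmetric on both sides.
    """
    return tuple(((o - 1) * s + k - i) // 2
                 for i, k, s, o in zip(in_wh, kernel_wh, stride_wh, out_wh))

# Worked example from FIG. 2: 5x5 blocks, 3x3 kernel, stride (1, 1), 5x5 output
assert pad_size((5, 5), (3, 3), (1, 1), (5, 5)) == (1, 1)
```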
In the present embodiment, for example, the size of the layer-0 feature image feature0 is set to H × W = 2^(M·k) × 2^(N·k), the block size is preset to B = 2^k × 2^k, and B is smaller than the size of feature0. The feature map size remains unchanged after passing through a convolutional layer, while a max pooling layer halves the side length of the feature map, so only two situations can occur during actual operation: in the first case, the side length of the feature map is larger than B, and the feature map can be divided into several blocks according to B; in the second case, the side length of the feature map is smaller than B, and convolution continues at the actual size.
If unprocessed blocks exist in the feature map featurej of layer j, block (X, Y) of featurej is input into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0 of layer 0, performing the operation of step S200 on the next block (X, Y) of feature0, that is, letting X = NX, Y = NY, j = 0, and going to step S200;
if no unprocessed block exists in feature0, data scheduling is finished, and the deepest-layer feature map is the forward inference result;
step S300, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, repeating the operation of step S200 for the next block (X, Y) of featurej, i.e., letting X = NX, Y = NY, and going to step S200; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, setting the size-B feature map featurej+1 as 1 block and repeating the operation of step S200, i.e., letting j = j+1, and going to step S200.
Depending on the size of the input image and the number of max pooling layers in the model, the size of the last-layer feature map may be larger than, equal to, or smaller than the preset block size B.
In this embodiment, NX and NY may be obtained through a preset ordering table, or according to the two-dimensional shuffling method proposed in this application; the two-dimensional shuffling method may be executed in all-hardware form by a two-dimensional shuffling apparatus, as shown in FIG. 4. Specifically:
the arrangement of X being in binary representationm bits, x from low to high0,x1,x2,......,xm-1(ii) a Y is configured as m bits in binary representation, from low to high being Y0,y1,y2,......,ym-1(ii) a Performing two-dimensional shuffling on binary representations of the coordinate X and the coordinate Y to generate a two-dimensional shuffling coordinate X0,y0,x1,y1,x2,y2,......,xm-1,ym-1Adding 1 to the two-dimensional mixed-arranging coordinate, then performing two-dimensional mixed-arranging to generate a coordinate NX and a coordinate NY, wherein X and Y of each layer are independently calculated, and the calling sequence is generated by an NX and NY obtaining method.
Taking the layer-0 feature map as an example, as shown in FIG. 3, feature0 on the left of FIG. 3 is a feature map composed of m × n blocks. First, block (0,0) is called for convolution to obtain feature1(0,0); feature1(0,0) is exactly one block in size, so execution continues to the next layer. FIG. 3 shows that the next layer is a maxpool layer, after which block (0,0) of feature2 is one fourth of a block in size; at this point the next layer is not executed, and the process returns to feature0.
After block (0,0), block (1,0), block (0,1) and block (1,1) of feature0 are scheduled in turn through convolution and maxpool, block (0,0) of feature2 reaches one block in size, as shown on the right of FIG. 3. At this point the process does not return to feature0 for scheduling; instead, block (0,0) of feature2 is called and execution continues downward until a maxpool layer makes the feature smaller than one block in size. Then block (2,0), block (3,0), block (2,1) and block (3,1) of feature0 are called in turn, and so on, until the forward inference process is completed.
in this embodiment, as long as the data preparation is completed, the convolution kernel can be called directly for convolution.
In the present embodiment, each time the overall size B_{j+1} of featurej+1 reaches B, convolution and max pooling continue to be performed, and the result of featurej is not saved.
In the conventional feature map processing method, all of feature0 is generally convolved to obtain the complete next-layer feature1, and only then is the next layer convolved. This conventional approach must occupy a large amount of memory to store feature1, and the pooling step likewise requires memory to store feature2. In contrast, the present application does not store the result of feature1 in memory but uses it directly, as an intermediate result, for the next layer's operation. If the next layer is again a convolutional layer, block (0,0) of feature2 is obtained, which satisfies one block size (no max pooling has occurred) and can therefore continue through the next layer without being saved. If the next layer is a max pooling layer, block (0,0) of feature2 is obtained at only one quarter of a block in size; the process then backtracks to block (1,0) of feature0 for convolution, and this backtracking repeats until the size of feature2 reaches one block.
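To make the depth-first backtracking concrete, the following is a minimal single-channel NumPy sketch of the scheduling, under stated assumptions: convolutional layers are represented by size-preserving stand-ins (the scheduling, not the convolution arithmetic, is what is being illustrated), and the names schedule, maxpool2, buffers and results are illustrative:

```python
import numpy as np

def maxpool2(x):
    """2x2 max pooling: halves each side of a block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def schedule(blk, x, y, j, layers, buffers, B, results):
    """Push one block depth-first through layers[j:], backtracking to the
    caller whenever the output no longer fills a full B x B block."""
    if j == len(layers):
        results[(x, y)] = blk                  # deepest-layer feature map block
        return
    out = layers[j](blk)
    if out.shape[0] == B:                      # convolution output: full block
        schedule(out, x, y, j + 1, layers, buffers, B, results)
        return
    # max-pooling output: a quarter block; stash it until its 3 neighbours arrive
    key = (x // 2, y // 2)
    quad = buffers[j].setdefault(key, {})
    quad[(x % 2, y % 2)] = out
    if len(quad) == 4:                         # four quarters -> one full block
        full = np.vstack([np.hstack([quad[(0, 0)], quad[(1, 0)]]),
                          np.hstack([quad[(0, 1)], quad[(1, 1)]])])
        del buffers[j][key]
        schedule(full, key[0], key[1], j + 1, layers, buffers, B, results)

# Demo: conv, maxpool, conv, maxpool on a 32x32 feature0 with B = 8 (4x4 grid),
# scheduled in the shuffled (Z-order) sequence of FIG. 3.
B, m = 8, 4
feature0 = np.random.rand(m * B, m * B).astype(np.float32)
layers = [lambda b: b, maxpool2, lambda b: b, maxpool2]  # conv stand-ins keep size
buffers, results = [dict() for _ in layers], {}
for x, y in [(0,0), (1,0), (0,1), (1,1), (2,0), (3,0), (2,1), (3,1),
             (0,2), (1,2), (0,3), (1,3), (2,2), (3,2), (2,3), (3,3)]:
    schedule(feature0[y * B:(y + 1) * B, x * B:(x + 1) * B],
             x, y, 0, layers, buffers, B, results)
assert results[(0, 0)].shape == (B, B)         # 32x32 through two poolings -> 8x8
```

In this sketch, each layer's buffer holds at most three quarter-blocks per assembly position instead of a whole intermediate feature map, which is the memory saving the method aims at.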
A depth-first data scheduling system based on block convolution according to a second embodiment of the present invention includes: an image dividing module, a feature image convolution module and a depth-first calling module;
the image dividing module divides the feature map feature0 of layer 0 into m × n blocks of a preset size B, sets the coordinate index (X, Y), initializes (X, Y) to (0, 0), and sets the layer number j of the feature map to 0;
In this embodiment, the system further includes an image zero-padding module configured to perform edge zero padding on each block before the feature map passes through the convolutional layer each time. For a single-channel feature map, the edge zero-padding size is calculated as:
pad_w = ((out_w - 1) × stride_w + kernel_w - in_w) / 2
pad_h = ((out_h - 1) × stride_h + kernel_h - in_h) / 2
where (in_w, in_h) denotes the size of the input feature map, (out_w, out_h) the size of the convolutional layer's output feature map, (kernel_w, kernel_h) the convolution kernel size, (stride_w, stride_h) the stride of the convolution kernel in the width and height directions, and (pad_w, pad_h) the edge zero-padding size.
If unprocessed blocks exist in the feature map featurej of the jth layer, the feature image convolution module inputs the blocks (X, Y) of featurej into the next layer of network for operation; the next layer network is any one of a convolutional layer or a maximum pooling layer; if the next layer of network is a convolutional layer, generating a block with the size of B, namely a block (X, Y) of featurej +1 with the size of B; if the next network is the maximum pooling layer, the generation size is
Figure GDA0003160898010000103
Block of, i.e. size
Figure GDA0003160898010000111
Block (X, Y) of featurej +1 of (1);
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0 of layer 0, the function of the feature image convolution module is performed on the next block (X, Y) of feature0, that is, X = NX, Y = NY, j = 0, and control is transferred to the feature image convolution module;
if no unprocessed block exists in feature0, data scheduling is completed, and the deepest-layer feature map is the forward inference result;
the depth-first calling module is configured so that, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, the function of the feature image convolution module is repeated for the next block (X, Y) of featurej, namely X = NX, Y = NY, and control is transferred to the feature image convolution module; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, the size-B feature map featurej+1 is set as 1 block and the function of the feature image convolution module is repeated, namely j = j+1, and control is transferred to the feature image convolution module.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the depth-first data scheduling system based on block convolution according to the foregoing embodiment is only illustrated by the division of the functional modules, and in practical applications, the functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An electronic device according to a third embodiment of the present invention is characterized by including: at least one processor; and a memory communicatively coupled to at least one of the processors; wherein the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the above depth-first data scheduling method based on block convolution.
A computer-readable storage medium according to a fourth embodiment of the present invention is characterized in that the computer-readable storage medium stores computer instructions to be executed by a computer to implement the above depth-first data scheduling method based on block convolution.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (9)

1. A depth-first data scheduling method based on block convolution is characterized by comprising the following steps:
step S100, dividing the feature map feature0 of layer 0 into p × q blocks of a preset size B, setting a coordinate index (X, Y), initializing (X, Y) = (0, 0), and setting the layer number j of the feature map to 0;
step S200, if unprocessed blocks exist in the feature map featurej of layer j, inputting block (X, Y) of featurej into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0, performing the operation of step S200 on the next block (X, Y) of feature0 of layer 0, that is, letting X = NX, Y = NY, j = 0, and going to step S200; X = NX and Y = NY are obtained specifically as follows: X is represented as m bits in binary, from low to high x_0, x_1, x_2, ..., x_{m-1}; Y is represented as m bits in binary, from low to high y_0, y_1, y_2, ..., y_{m-1}; the binary representations of coordinate X and coordinate Y are two-dimensionally shuffled to generate the shuffled coordinate x_0, y_0, x_1, y_1, x_2, y_2, ..., x_{m-1}, y_{m-1}; 1 is added to the shuffled coordinate, which is then two-dimensionally de-shuffled to generate the coordinates NX and NY; X and Y of each layer are computed independently, and the calling sequence is generated by this method of obtaining NX and NY;
if no unprocessed block exists in feature0, data scheduling is finished, and the deepest-layer feature map is the forward inference result;
step S300, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, repeating the operation of step S200 for the next block (X, Y) of featurej, i.e., letting X = NX, Y = NY, and going to step S200; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, setting the size-B feature map featurej+1 as 1 block and repeating the operation of step S200, i.e., letting j = j+1, and going to step S200.
2. The method of claim 1, further comprising setting the size of the layer-0 feature image feature0 to H × W = 2^(p·k) × 2^(q·k) and presetting the block size to B = 2^k × 2^k, with B smaller than the size of feature0.
3. The method of claim 1, wherein before the feature map passes through the convolutional layer each time, the method further comprises: performing edge zero padding on each block;
for a single-channel feature map, the edge zero-padding size is calculated as:
pad_w = ((out_w - 1) × stride_w + kernel_w - in_w) / 2
pad_h = ((out_h - 1) × stride_h + kernel_h - in_h) / 2
where (in_w, in_h) denotes the size of the input feature map, (out_w, out_h) the size of the convolutional layer's output feature map, (kernel_w, kernel_h) the convolution kernel size, (stride_w, stride_h) the stride of the convolution kernel in the width and height directions, and (pad_w, pad_h) the edge zero-padding size.
4. The block convolution-based depth-first data scheduling method of claim 1, wherein each time the overall size B_{j+1} of featurej+1 reaches B, convolution and max pooling continue to be performed, and the result of featurej is not saved.
5. The method of claim 1, wherein a convolution kernel can be called directly for convolution as long as data preparation is completed.
6. A system for block convolution based depth-first data scheduling, the system comprising: the system comprises an image dividing module, a characteristic image convolution module and a depth priority calling module;
the image dividing module is used for dividing the feature map feature0 of layer 0 into p × q blocks of a preset size B, setting the coordinate index (X, Y), initializing (X, Y) to (0, 0), and setting the layer number j of the feature map to 0;
if unprocessed blocks exist in the feature map featurej of layer j, the feature image convolution module inputs block (X, Y) of featurej into the next layer of the network for operation; the next layer is either a convolutional layer or a max pooling layer; if the next layer is a convolutional layer, a block of size B is generated, namely block (X, Y) of featurej+1 with size B; if the next layer is a max pooling layer, a block of size B/4 is generated, namely block (X, Y) of featurej+1 with size B/4;
if featurej has no unprocessed blocks but unprocessed blocks remain in feature0, the function of the feature image convolution module is performed on the next block (X, Y) of feature0 of layer 0, that is, X = NX, Y = NY, j = 0, and control is transferred to the feature image convolution module; X = NX and Y = NY are obtained specifically as follows: X is represented as m bits in binary, from low to high x_0, x_1, x_2, ..., x_{m-1}; Y is represented as m bits in binary, from low to high y_0, y_1, y_2, ..., y_{m-1}; the binary representations of coordinate X and coordinate Y are two-dimensionally shuffled to generate the shuffled coordinate x_0, y_0, x_1, y_1, x_2, y_2, ..., x_{m-1}, y_{m-1}; 1 is added to the shuffled coordinate, which is then two-dimensionally de-shuffled to generate the coordinates NX and NY; X and Y of each layer are computed independently, and the calling sequence is generated by this method of obtaining NX and NY;
if no unprocessed block exists in feature0, data scheduling is completed, and the deepest-layer feature map is the forward inference result;
the depth-first calling module is configured so that, if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 is less than B, the function of the feature image convolution module is repeated for the next block (X, Y) of featurej, namely X = NX, Y = NY, and control is transferred to the feature image convolution module; if the overall size B_{j+1} of the feature map featurej+1 of layer j+1 equals B, the size-B feature map featurej+1 is set as 1 block and the function of the feature image convolution module is repeated, namely j = j+1, and control is transferred to the feature image convolution module.
7. The block convolution-based depth-first data scheduling system of claim 6, further comprising an image zero-padding module configured to perform edge zero padding on each block, after feature0 is divided into p × q blocks and before the feature map passes through the convolutional layer each time;
for a single-channel feature map, the edge zero-padding size is calculated as:
pad_w = ((out_w - 1) × stride_w + kernel_w - in_w) / 2
pad_h = ((out_h - 1) × stride_h + kernel_h - in_h) / 2
where (in_w, in_h) denotes the size of the input feature map, (out_w, out_h) the size of the convolutional layer's output feature map, (kernel_w, kernel_h) the convolution kernel size, (stride_w, stride_h) the stride of the convolution kernel in the width and height directions, and (pad_w, pad_h) the edge zero-padding size.
8. An electronic device, comprising: at least one processor; and a memory communicatively coupled to at least one of the processors; wherein the memory stores instructions executable by the processor for execution by the processor to implement the block convolution based depth-first data scheduling method of any of claims 1-5.
9. A computer-readable storage medium storing computer instructions for execution by the computer to implement the block convolution-based depth-first data scheduling method of any one of claims 1 to 5.
CN202110315074.4A 2021-03-24 2021-03-24 Depth-first data scheduling method, system and equipment based on block convolution Active CN112949831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110315074.4A CN112949831B (en) 2021-03-24 2021-03-24 Depth-first data scheduling method, system and equipment based on block convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110315074.4A CN112949831B (en) 2021-03-24 2021-03-24 Depth-first data scheduling method, system and equipment based on block convolution

Publications (2)

Publication Number Publication Date
CN112949831A (en) 2021-06-11
CN112949831B (en) 2021-10-01

Family

ID=76227703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110315074.4A Active CN112949831B (en) 2021-03-24 2021-03-24 Depth-first data scheduling method, system and equipment based on block convolution

Country Status (1)

Country Link
CN (1) CN112949831B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107437110B (en) * 2017-07-11 2021-04-02 中国科学院自动化研究所 Block convolution optimization method and device of convolutional neural network
CN108875904A (en) * 2018-04-04 2018-11-23 北京迈格威科技有限公司 Image processing method, image processing apparatus and computer readable storage medium
CN109858495B (en) * 2019-01-16 2023-09-22 五邑大学 Feature extraction method and device based on improved convolution block and storage medium thereof

Also Published As

Publication number Publication date
CN112949831A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN115186821B (en) Core particle-oriented neural network inference overhead estimation method and device and electronic equipment
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
CN112073221B (en) Method and device for realizing network node sequencing
CN112711422A (en) Optimization method and system for neural network compiling
CN112163601B (en) Image classification method, system, computer device and storage medium
CN114492782B (en) On-chip core compiling and mapping method and device of neural network based on reinforcement learning
CN115421897B (en) Core particle-oriented deep neural network pipeline parallel scheduling method and device
CN112819157B (en) Neural network training method and device, intelligent driving control method and device
CN115660078A (en) Distributed computing method, system, storage medium and electronic equipment
CN111652330A (en) Image processing method, device, system, electronic equipment and readable storage medium
CN112200300A (en) Convolutional neural network operation method and device
CN115860081B (en) Core algorithm scheduling method, system, electronic equipment and storage medium
CN116075821A (en) Form convolution and acceleration
CN111931927B (en) Method and device for reducing occupation of computing resources in NPU
CN116720551B (en) Convolution acceleration method and convolution accelerator of impulse neural network
CN115186802A (en) Block sparse method and device based on convolutional neural network and processing unit
Fobel et al. A scalable, serially-equivalent, high-quality parallel placement methodology suitable for modern multicore and GPU architectures
CN113986816A (en) Reconfigurable computing chip
CN115017773A (en) Dimension reduction method of three-dimensional grid model, electronic equipment and medium
CN112949831B (en) Depth-first data scheduling method, system and equipment based on block convolution
CN110533161A (en) A kind of characteristic pattern processing method based on layering group convolutional neural networks
CN116501325A (en) Operator processing method and computer equipment
CN117808101A (en) Neural network reasoning method, system and storage medium based on FPGA
CN116522844B (en) Circuit dividing method, circuit node voltage calculating method, terminal and storage medium
WO2024191479A1 (en) Dynamic uncompression for channel-separable operation in neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant