CN113887720B

CN113887720B - Upsampling reverse blocking mapping method

Info

Publication number: CN113887720B
Application number: CN202111148518.6A
Authority: CN
Inventors: 施先广; 胡有能; 李一涛; 何增; 马德; 岳克强
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2021-09-29
Filing date: 2021-09-29
Publication date: 2024-04-26
Anticipated expiration: 2041-09-29
Also published as: CN113887720A

Abstract

The invention discloses an up-sampling reverse blocking mapping method comprising the following steps: S1, reading input feature map data and storing it in a shift buffer; S2, locating the pixels within a block of the output feature map and mapping them to the positions of the four nearest neighboring pixels in the input feature map; S3, computing the pixel values of the output feature map in a pipelined manner, which comprises: S31, in the vertical direction, multiplying the four nearest neighboring pixels once by the column-direction parameters to obtain four intermediate values, and adding these in pairs to obtain two intermediate values; S32, multiplying the two intermediate values by the row-direction parameters to obtain two further intermediate values, and adding them once to obtain an up-sampled pixel of the output feature map; S4, returning to S1 after the data of the current block has been processed, and processing the next block; S5, after the input feature map has been processed, continuing with the next feature map according to the register instruction.

Description

Upsampling reverse blocking mapping method

Technical Field

The invention relates to the technical field of neural networks, in particular to an up-sampling reverse blocking mapping method.

Background

The emergence of deep learning algorithms has brought further breakthroughs in the application of artificial intelligence. Early on, deep learning algorithms ran mainly on servers equipped with high-performance GPUs. As deep learning has come into wider use, this approach has proved simple and effective, but it suffers from high power consumption, large physical size, and similar problems. Neural network accelerators can effectively address these problems. Up-sampling is an important part of a neural network accelerator implementation and has therefore received increasing attention and development.

The main function of up-sampling is to construct new pixels from existing ones. It is commonly used in neural network accelerators, for example to enhance a feature map or to restore its size, and it is important for correct recognition and learning in deep learning. Existing up-sampling usually uses point-to-point mapping: four adjacent pixels of the input feature map are read and one new pixel is calculated, then four pixels of the input feature map are read again and the next new pixel is calculated. The drawback is that data cannot be reused. Different pixels of the output feature map may map to the same four adjacent pixels of the input feature map, so point-to-point mapping reads the same data repeatedly, wasting time and resources.

Disclosure of Invention

To overcome the shortcomings of the prior art and achieve efficient data reuse, the invention adopts the following technical solution:

An up-sampling reverse blocking mapping method comprises the following steps:

S1, reading input feature map data, storing it in a shift buffer, and generating the blocks of the output feature map from the four nearest neighboring pixels ram[0], ram[1], ram[W], ram[W+1] of the input feature map;

S2, locating the pixels within a block of the output feature map and mapping them to positions relative to the four nearest neighboring pixels in the input feature map, wherein the different pixels of one block map to points in the input feature map that lie at different horizontal and vertical distances from the same four nearest neighboring pixels; h_param00 and h_param01 are the row-direction parameters and represent the distances in the horizontal direction, and v_param00 and v_param01 are the column-direction parameters and represent the distances in the vertical direction;

S3, computing the pixel values of the output feature map in a pipelined manner from the pixels obtained in S1 and the distance parameters obtained in S2, which comprises:

S31, in the vertical direction, multiplying the four nearest neighboring pixels once by the column-direction parameters v_param00 and v_param01 to obtain four intermediate values, and adding these in pairs to obtain two intermediate values;

S32, multiplying the two intermediate values by the row-direction parameters h_param00 and h_param01 to obtain two further intermediate values, and adding them once to obtain an up-sampled pixel of the output feature map;

S4, returning to S1 after the data of the current block has been processed, and processing the next block;

S5, after the input feature map is processed, continuing to process the next feature map according to the register instruction.

Reverse block mapping and data reuse make the data to be processed and the calculation parameters quickly available, and the two-stage multiply-add operation is then implemented as a pipeline, which increases the computation rate.
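The arithmetic of S31 and S32 can be illustrated with a minimal Python sketch. The function name `upsample_pixel` and its argument names are hypothetical, and the pairing of v_param00 with the upper pixel row and h_param00 with the left-hand intermediate value is an assumption; the text above states only that the column-direction parameters are applied first and the row-direction parameters second.

```python
def upsample_pixel(p00, p01, p10, p11,
                   v_param00, v_param01, h_param00, h_param01):
    """Two-stage multiply-add for one output pixel.

    p00, p01 stand for ram[0], ram[1] (upper row of the 2x2 neighbourhood);
    p10, p11 stand for ram[W], ram[W+1] (lower row).
    Which parameter goes with which pixel is an assumption of this sketch.
    """
    # Stage 1 (S31): four multiplications, then two additions (vertical direction).
    left = p00 * v_param00 + p10 * v_param01
    right = p01 * v_param00 + p11 * v_param01
    # Stage 2 (S32): two multiplications, then one addition (horizontal direction).
    return left * h_param00 + right * h_param01
```

With all four parameters equal to 0.5, the result is the mean of the four pixels, which is what one would expect for an output pixel that maps to the centre of the neighbourhood.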

Further, in S1, the data buffer block operates as a shift buffer. For an input feature map of size W×H with W, H ≥ 2, where W is the width and H is the height of the input feature map in pixels, the number of pixels initially loaded into the buffer block is W+2. The pixels in the first block of the output feature map are generated from the four nearest neighboring pixels of the input feature map. After all pixels in that block have been generated, new data is read in from left to right and the pixels in the second block of the output feature map are generated, and so on until all blocks of the output feature map have been generated.

Further, after the rightmost block of the output feature map has been generated, two pixels of the input feature map are read in to generate the next block of the output feature map.
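The buffering schedule for W, H ≥ 2 can be simulated in software as a rough illustration. The sketch below models the shift buffer with a fixed-length `collections.deque` holding W+2 pixels; the generator name `block_neighbourhoods` and the row-major pixel stream are assumptions of this sketch, not details taken from the hardware design.

```python
from collections import deque

def block_neighbourhoods(pixels, W, H):
    """Yield (ram[0], ram[1], ram[W], ram[W+1]) for each block, for W, H >= 2.

    `pixels` is the input feature map in row-major order (an assumption).
    """
    assert W >= 2 and H >= 2 and len(pixels) == W * H
    stream = iter(pixels)
    ram = deque(maxlen=W + 2)          # shift buffer, oldest pixel at index 0
    for _ in range(W + 2):             # initial fill of W + 2 pixels
        ram.append(next(stream))
    for row in range(H - 1):
        for col in range(W - 1):
            yield ram[0], ram[1], ram[W], ram[W + 1]
            if row == H - 2 and col == W - 2:
                break                  # last block: nothing left to read
            # One new pixel per block, two after the rightmost block of a row.
            for _ in range(2 if col == W - 2 else 1):
                ram.append(next(stream))
```

Under these assumptions every input pixel enters the buffer exactly once: W+2 pixels for the first block, one per subsequent block, and one extra at each row change, for a total of W×H reads.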

Further, for an input feature map of size W×H with W = 1, H ≥ 2, or W ≥ 2, H = 1, or W = 1, H = 1, any pixel that does not exist among the four nearest neighboring pixels of the input feature map is set to 0.

Further, in S2, the output feature map is mapped to the four nearest neighboring pixels of the input feature map by reverse block mapping. Mapping proceeds from left to right up to the rightmost block, then continues from left to right starting at the leftmost block of the next row down, and repeats in this way until the complete output feature map has been mapped. All pixels of one block of the output feature map map to the same four nearest neighboring pixels of the input feature map, while different blocks map to sets of four nearest neighboring pixels that are not entirely identical.

Further, in S2, when the size of the input feature map is W×H with W = 1, H ≥ 2, or W ≥ 2, H = 1, or W = 1, H = 1, any pixel that does not exist among the four pixels of the input feature map is set to 0.

Further, in S3, the data ram[0], ram[1], ram[W] and ram[W+1] are read out from the shift buffer.

Further, in S3, when the input feature map is W×H with W ≥ 2 and H = 1, ram[0] and ram[1] are read out from the shift buffer, and ram[W] and ram[W+1] are assigned 0; when the input feature map is W×H with W = 1 and H ≥ 2, ram[0] and ram[W] are read out from the shift buffer, and ram[1] and ram[W+1] are assigned 0; when the input feature map is W×H with W = 1 and H = 1, ram[0] is read out from the shift buffer, and ram[1], ram[W] and ram[W+1] are assigned 0.

Because the design is flexible, only a slight change to the initial data reading mode is needed to support up-sampling of a single pixel (1×1), a single column of pixels (1×H), and a single row of pixels (W×1).
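The zero substitution for degenerate input sizes can be expressed compactly. In the following sketch, `read_neighbours` is a hypothetical helper and `ram` is assumed to be an indexable buffer containing only the pixels that actually exist, so for W = 1 the pixel below ram[0] sits at buffer index W = 1.

```python
def read_neighbours(ram, W, H):
    """Return (ram[0], ram[1], ram[W], ram[W+1]), with missing pixels set to 0."""
    has_right = W >= 2   # a right-hand neighbour exists only with two or more columns
    has_below = H >= 2   # a lower neighbour exists only with two or more rows
    p00 = ram[0]
    p01 = ram[1] if has_right else 0
    p10 = ram[W] if has_below else 0
    p11 = ram[W + 1] if (has_right and has_below) else 0
    return p00, p01, p10, p11
```

The four cases in the text (W, H ≥ 2; a single row; a single column; a single pixel) then differ only in which of the two flags are set.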

The advantages of the invention are as follows:

Efficient data reuse is achieved through reverse block mapping and the shift-register buffer design; data processing is pipelined, which increases the computation speed; the computation converts floating-point values to fixed-point values, customized and optimized to the precision requirements of the specific design, which reduces the buffer space and the power consumed by multiply-add operations during data processing; in addition, the parallel scheme is simple, suits most existing neural networks, and is independent of the number of layers in the network.
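As an illustration of the floating-point to fixed-point conversion mentioned above, the sketch below carries out the two-stage multiply-add with integers only. The number of fractional bits (`FRAC_BITS = 8`), the rounding scheme, and the function names are all assumptions chosen for illustration; the text does not specify a particular word length or format.

```python
FRAC_BITS = 8   # assumed number of fractional bits; not specified in the text

def to_fixed(x):
    """Quantise a weight in [0, 1] to an unsigned fixed-point integer."""
    return int(round(x * (1 << FRAC_BITS)))

def fixed_upsample_pixel(p00, p01, p10, p11, v0, v1, h0, h1):
    """Integer-only two-stage multiply-add; v0, v1, h0, h1 are fixed-point weights."""
    left = p00 * v0 + p10 * v1          # scaled by 2**FRAC_BITS
    right = p01 * v0 + p11 * v1
    acc = left * h0 + right * h1        # scaled by 2**(2 * FRAC_BITS)
    half = 1 << (2 * FRAC_BITS - 1)     # half an LSB, so the shift rounds to nearest
    return (acc + half) >> (2 * FRAC_BITS)
```

A wider or narrower fractional field trades accuracy against multiplier width and buffer size, which is the kind of design-specific tuning the paragraph above refers to.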

Drawings

Fig. 1 is a flow chart of the method of the present invention.

FIG. 2 is a data cache diagram of the up-sampling reverse blocking mapping input feature map (W, H ≥ 2) of the present invention.

FIG. 3 is a data cache diagram of the up-sampling reverse blocking mapping input feature map (W = 1, H ≥ 2 or W ≥ 2, H = 1) of the present invention.

FIG. 4 is a data cache diagram of the up-sampling reverse blocking mapping input feature map (W = 1, H = 1) of the present invention.

FIG. 5 is a schematic diagram of a data cache block according to the present invention.

FIG. 6 is a schematic diagram of the up-sampling reverse blocking mapping scheme (W, H ≥ 2) of the present invention.

FIG. 7 is a schematic diagram of the up-sampling reverse blocking mapping scheme (W = 1, H ≥ 2 or W ≥ 2, H = 1) of the present invention.

FIG. 8 is a schematic diagram of the up-sampling reverse blocking mapping scheme (W = 1, H = 1) of the present invention.

Fig. 9 is a schematic diagram of upsampling parameter generation in the present invention.

Fig. 10 is a schematic diagram of an up-sampling data processing module in the present invention.

Detailed Description

The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.

This design provides an up-sampling method for neural network accelerators based on reverse block mapping and data reuse. Under block mapping, the four nearest neighboring pixels of the input feature map differ only partly from one block to the next, and the pixels they share are reused, so no data is read repeatedly.
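The saving from data reuse can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope comparison under two assumptions: point-to-point mapping fetches four input pixels for every output pixel with no caching, and block mapping reads each input pixel into the shift buffer exactly once. The function name `input_reads` is hypothetical.

```python
def input_reads(W, H, W_out, H_out):
    """Return (point-to-point reads, block-mapping reads) of input pixels."""
    point_to_point = 4 * W_out * H_out   # four neighbours fetched per output pixel
    blocked = W * H                      # each input pixel read once into the buffer
    return point_to_point, blocked
```

For example, up-sampling a 4×4 input to 8×8 gives 256 reads for point-to-point mapping against 16 for block mapping under these assumptions.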

As shown in fig. 1, an upsampling inverse blocking mapping method includes the following steps:

Step one: part of the feature map data is read in the agreed manner and stored in the shift buffer.

As shown in FIGS. 2-4, the input feature map buffer block holds more than just the four nearest neighboring pixels, so that data can be reused.

As shown in FIG. 5, the data buffer block operates as a shift buffer. The timing of data caching differs slightly for input feature maps of different sizes.

As shown in FIG. 2, for an input feature map of width and height W×H (W, H ≥ 2), once W+2 pixels have been loaded into the buffer block, the four nearest neighboring pixels of the input feature map that correspond to the first block of the output feature map are available for the first time, and generation of the pixels in that first block can begin. After all pixels in the block have been generated, only one more pixel needs to be read in before the pixels in the second block of the output feature map can be generated. Note that after the rightmost block of the output feature map has been generated, two pixels of the input feature map must be read in before the next block can be generated. This cycle repeats until all blocks of the output feature map have been generated.

As shown in FIG. 3, for an input feature map of size W×H (W = 1, H ≥ 2 or W ≥ 2, H = 1), four nearest neighboring pixels do not actually exist. The number of pixels initially loaded into the buffer block is 2 and the two remaining pixels are replaced by 0, after which generation of the pixels in the first block of the output feature map can begin. After all pixels in that block have been generated, only one more pixel needs to be read in before generation of the pixels in the second block can begin.

As shown in FIG. 4, for an input feature map of size W×H (W = 1, H = 1), the output feature map has only one block. The number of pixels initially loaded into the buffer block is 1 and the three remaining pixels are replaced by 0, after which generation of the output feature map can begin.

Step two: the output feature map is partitioned by reverse block mapping, the pixels within each block are located in turn, and the four parameters of each output feature map pixel are calculated at its corresponding position in the input feature map.

As shown in FIGS. 6-8, the dashed boxes mark the blocks of the output feature map and the four nearest neighboring pixels of the input feature map to which they map. As shown in FIG. 6, the output feature map is mapped to the four nearest neighboring pixels of the input feature map by reverse block mapping. Mapping proceeds from left to right up to the rightmost block, then continues from left to right starting at the leftmost block of the next row down, and repeats until the complete output feature map has been mapped. All pixels of one block of the output feature map map to the same four nearest neighboring pixels of the input feature map, while different blocks map to sets of four nearest neighboring pixels that are not entirely identical. As shown in FIGS. 7 and 8, when the input feature map size is W×H (W = 1, H ≥ 2 or W ≥ 2, H = 1) or W×H (W = 1, H = 1), four nearest neighboring pixels do not actually exist in the input feature map, and each missing pixel is replaced by a pixel with value 0.
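The traversal order described above amounts to a raster scan over blocks. As a small sketch, the hypothetical helper `block_order` lists each block together with the row-major index of the upper-left input pixel it maps to. Treating a single-row or single-column input as having one row or column of blocks is an inference from the degenerate cases above, not something the text states as a formula.

```python
def block_order(W, H):
    """List (block_row, block_col, upper-left input index) in raster order."""
    rows = max(H - 1, 1)   # a single-row input still yields one row of blocks (assumed)
    cols = max(W - 1, 1)   # likewise for a single-column input
    return [(r, c, r * W + c) for r in range(rows) for c in range(cols)]
```

Adjacent blocks in this order share two of their four input pixels, which is what allows the shift buffer to advance by a single pixel between blocks within a row.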

As shown in FIG. 9, ram[0], ram[1] or 0, ram[W] or 0, and ram[W+1] or 0 are the four nearest neighboring pixels in the input feature map, and dout is a pixel of some block of the output feature map, shown at the position to which it maps in the input feature map. As shown in FIGS. 6-9, each block of the output feature map usually contains several pixels. All pixels in the same block map to the same four nearest neighboring pixels, but the points to which different pixels map lie at different horizontal and vertical distances from those four pixels; that is, h_param00, h_param01, v_param00 and v_param01 are not entirely identical.

Step three: the data obtained in step one and the parameters obtained in step two are input into the data processing module, which calculates the pixel values of the output feature map in a pipelined manner.

The data processing module as a whole is divided into two stages of calculation, each of which combines multiplication and addition. When the input feature map is W×H with W, H ≥ 2, the data ram[0], ram[1], ram[W] and ram[W+1] are read out from the shift buffer; when the input feature map is W×H with W ≥ 2 and H = 1, ram[0] and ram[1] are read out from the shift buffer, and ram[W] and ram[W+1] are assigned 0; when the input feature map is W×H with W = 1 and H ≥ 2, ram[0] and ram[W] are read out from the shift buffer, and ram[1] and ram[W+1] are assigned 0; when the input feature map is W×H with W = 1 and H = 1, ram[0] is read out from the shift buffer, and ram[1], ram[W] and ram[W+1] are assigned 0. Because the design is flexible, only a slight change to the initial data reading mode is needed to support up-sampling of a single pixel (1×1), a single column of pixels (1×H), and a single row of pixels (W×1).
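One possible way to derive the distance parameters along a single axis is sketched below. The text does not give the coordinate mapping, so two assumptions are made and should be read as such: a corner-aligned mapping from output to input coordinates, and parameters taken as the complementary distances used in ordinary bilinear interpolation. The helper name `axis_params` is hypothetical; it would be called once with the column index to obtain (h_param00, h_param01) and once with the row index to obtain (v_param00, v_param01).

```python
from fractions import Fraction

def axis_params(dst, n_in, n_out):
    """Return (cell index, param00, param01) along one axis for output coordinate dst."""
    assert n_in >= 2 and n_out >= 2
    # Assumed corner-aligned mapping: first and last samples coincide.
    src = Fraction(dst * (n_in - 1), n_out - 1)
    cell = min(int(src), n_in - 2)   # clamp so the last sample stays in the last cell
    frac = src - cell                # distance from the left (or upper) neighbour
    return cell, 1 - frac, frac      # the two parameters always sum to 1
```

Output coordinates that yield the same cell index belong to the same block; only the two parameters change from pixel to pixel within it, in line with the observation above.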

As shown in FIG. 10, the first-stage multiply-add takes the four adjacent pixel values and the column-direction parameters (v_param00 and v_param01) as input in the vertical direction and obtains four intermediate values through one round of multiplication; these four intermediate values are then added in pairs to give two intermediate values, which completes one multiply-add operation. The second-stage multiply-add multiplies the two intermediate values produced by the first stage by the row-direction parameters (h_param00 and h_param01) respectively to obtain two further intermediate values, and adds these two values to obtain the result, which is the up-sampled pixel of the output feature map. Reverse block mapping and data reuse make the data to be processed and the calculation parameters quickly available, and the two-stage multiply-add operation is then implemented as a pipeline, which increases the computation rate.

Step four: after the data of one block has been processed, the next block is processed by repeating steps one, two and three.

Step five: after the input feature map has been processed, the next feature map is processed according to the register instruction.

The above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in those embodiments may be modified, and some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the technical solutions of the embodiments of the present invention.
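The effect of pipelining the two stages can be seen in a cycle-by-cycle software model. The sketch below is only an illustration of the idea, not the hardware module of FIG. 10: `run_pipeline` is a hypothetical name, each loop pass stands for one clock cycle, and the pairing of parameters with pixels is assumed.

```python
def run_pipeline(jobs):
    """Simulate a two-stage multiply-add pipeline, one clock cycle per loop pass.

    Each job is ((p00, p01, p10, p11), (v0, v1), (h0, h1)).
    Returns a list of (cycle, value), one entry per finished output pixel.
    """
    stage1_reg = None             # register between stage 1 and stage 2
    results = []
    feed = list(jobs) + [None]    # one extra cycle to drain the pipeline
    for cycle, job in enumerate(feed, start=1):
        # Stage 2 consumes what stage 1 produced in the previous cycle.
        if stage1_reg is not None:
            left, right, (h0, h1) = stage1_reg
            results.append((cycle, left * h0 + right * h1))
        # Stage 1 works on the new job during the same cycle.
        if job is not None:
            (p00, p01, p10, p11), (v0, v1), h = job
            stage1_reg = (p00 * v0 + p10 * v1, p01 * v0 + p11 * v1, h)
        else:
            stage1_reg = None
    return results
```

In this model N output pixels complete in N+1 cycles, because stage 1 of one pixel overlaps with stage 2 of the previous one, compared with 2N cycles if the two stages ran strictly one after the other.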

Claims

1. An up-sampling reverse blocking mapping method is characterized by comprising the following steps:

2. The up-sampling reverse blocking mapping method according to claim 1, wherein in S1 the data buffer block operates as a shift buffer; for an input feature map of size W×H with W, H ≥ 2, where W is the width and H is the height of the input feature map in pixels, the number of pixels initially loaded into the buffer block is W+2; the pixels in the first block of the output feature map are generated from the four nearest neighboring pixels of the input feature map; after all pixels in that block have been generated, new data is read in from left to right and the pixels in the second block of the output feature map are generated, and so on until all blocks of the output feature map have been generated.

3. The up-sampling reverse blocking mapping method according to claim 2, wherein after the rightmost block of the output feature map has been generated, two pixels of the input feature map are read in to generate the next block of the output feature map.

4. The up-sampling reverse blocking mapping method according to claim 2, wherein for an input feature map of size W×H with W = 1, H ≥ 2, or W ≥ 2, H = 1, or W = 1, H = 1, any pixel that does not exist among the four nearest neighboring pixels of the input feature map is set to 0.

5. The up-sampling reverse blocking mapping method according to claim 2, wherein in S2 the output feature map is mapped to the four nearest neighboring pixels of the input feature map by reverse block mapping; mapping proceeds from left to right up to the rightmost block, then continues from left to right starting at the leftmost block of the next row down, and repeats until the complete output feature map has been mapped; all pixels of one block of the output feature map map to the same four nearest neighboring pixels of the input feature map, and different blocks of the output feature map map to sets of four nearest neighboring pixels that are not entirely identical.

6. The up-sampling reverse blocking mapping method according to claim 4, wherein in S2, for an input feature map of size W×H with W = 1, H ≥ 2, or W ≥ 2, H = 1, or W = 1, H = 1, any pixel that does not exist among the four pixels of the input feature map is set to 0.

7. The up-sampling reverse blocking mapping method according to claim 5, wherein in S3 the data ram[0], ram[1], ram[W] and ram[W+1] are read out from the shift buffer.

8. The up-sampling reverse blocking mapping method according to claim 6, wherein in S3, when the input feature map is W×H with W ≥ 2 and H = 1, ram[0] and ram[1] are read out from the shift buffer, and ram[W] and ram[W+1] are assigned 0; when the input feature map is W×H with W = 1 and H ≥ 2, ram[0] and ram[W] are read out from the shift buffer, and ram[1] and ram[W+1] are assigned 0; when the input feature map is W×H with W = 1 and H = 1, ram[0] is read out from the shift buffer, and ram[1], ram[W] and ram[W+1] are assigned 0.