CN113887720B - Upsampling reverse blocking mapping method - Google Patents


Info

Publication number
CN113887720B
Authority
CN
China
Prior art keywords
feature map
block
input feature
data
pixels
Prior art date
Legal status
Active
Application number
CN202111148518.6A
Other languages
Chinese (zh)
Other versions
CN113887720A (en)
Inventor
施先广
胡有能
李一涛
何增
马德
岳克强
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
2021-09-29
Filing date
2021-09-29
Publication date
2024-04-26
Application filed by Hangzhou Dianzi University
Priority to CN202111148518.6A
Publication of CN113887720A
Application granted
Publication of CN113887720B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/04 Architecture, e.g. interconnection topology

Abstract

The invention discloses an up-sampling reverse blocking mapping method comprising the following steps: S1, reading input feature map data and storing it in a shift buffer; S2, finding the pixels in each block of the output feature map and mapping their positions relative to the four nearest-neighbor pixels in the input feature map; S3, computing the pixel values of the output feature map in a pipelined manner, comprising: S31, in the vertical direction, multiplying each of the four nearest-neighbor pixels once by a column-direction parameter and adding the four resulting products in pairs to obtain two intermediate values; S32, multiplying the two intermediate values by the row-direction parameters and adding the two products to obtain one up-sampled pixel of the output feature map; S4, after the data of a block have been processed, returning to S1 to process the next block; S5, after the input feature map has been processed, continuing with the next feature map according to the register instruction.

Description

Upsampling reverse blocking mapping method
Technical Field
The invention relates to the technical field of neural networks, in particular to an up-sampling reverse blocking mapping method.
Background
The emergence of deep learning algorithms has led to further breakthroughs in the application of artificial intelligence. Early on, deep learning algorithms ran mainly on servers with high-performance GPUs; as deep learning was applied more widely, this approach proved simple and efficient but suffered from high power consumption, large physical size and similar problems. Neural network accelerators can effectively address these problems. Up-sampling is an important part of neural network accelerator implementations and is therefore receiving increasing attention and development.
The main function of up-sampling is to construct new pixels from existing ones. It is commonly used in neural network accelerators, for example to enhance a feature map or to restore a feature map to its original size, and it is important for correct recognition and learning in deep learning. Existing up-sampling usually uses point-to-point mapping: the four adjacent pixels of the input feature map are read and one new pixel is calculated, then four pixels of the input feature map are read again to calculate the next new pixel. The drawback is that data cannot be reused. Because different pixels of the output feature map can map to the same four adjacent pixels of the input feature map, point-to-point mapping reads the same data repeatedly, wasting time and resources.
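The cost of point-to-point mapping can be made concrete with a small count. The sketch below is illustrative and not taken from the patent: it assumes an align-corners coordinate mapping and a 4×4 input upsampled to 8×8, and `neighbor_indices` and `point_to_point_reads` are hypothetical helpers that count how often each input pixel is fetched when every output pixel reads its own four neighbors.

```python
from collections import Counter


def neighbor_indices(x_out, y_out, W, H, Wo, Ho):
    """Flat indices of the 2x2 input neighborhood for one output pixel.

    Assumption: align-corners mapping src = dst * (in - 1) / (out - 1),
    with W, H, Wo, Ho >= 2; integer division gives the upper-left tap.
    """
    c = min(x_out * (W - 1) // (Wo - 1), W - 2)
    r = min(y_out * (H - 1) // (Ho - 1), H - 2)
    return [r * W + c, r * W + c + 1, (r + 1) * W + c, (r + 1) * W + c + 1]


def point_to_point_reads(W, H, Wo, Ho):
    """Count reads per input pixel when each output pixel fetches its own taps."""
    reads = Counter()
    for y in range(Ho):
        for x in range(Wo):
            reads.update(neighbor_indices(x, y, W, H, Wo, Ho))
    return reads


reads = point_to_point_reads(W=4, H=4, Wo=8, Ho=8)
print(sum(reads.values()))  # 256 reads in total (4 per output pixel)
print(len(reads))           # 16 distinct input pixels
```

Under these assumptions the point-to-point scheme issues 256 reads to touch only 16 distinct input pixels; this is the redundancy that block-wise reuse is meant to remove.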
Disclosure of Invention
To overcome these shortcomings of the prior art and achieve efficient data reuse, the invention adopts the following technical scheme:
An up-sampling reverse blocking mapping method comprises the following steps:
S1, reading input feature map data, storing it in a shift buffer, and generating the blocks of the output feature map from the four nearest-neighbor pixels ram[0], ram[1], ram[W], ram[W+1] of the input feature map;
S2, finding the pixels in each block of the output feature map and mapping their positions relative to the four nearest-neighbor pixels in the input feature map: different pixels in a block of the output feature map map to different points in the input feature map, which lie at different horizontal and vertical distances from the same four nearest-neighbor pixels; h_param00 and h_param01 are the row-direction parameters and represent the horizontal distances, and v_param00 and v_param01 are the column-direction parameters and represent the vertical distances;
S3, computing the pixel values of the output feature map in a pipelined manner from the pixels obtained in S1 and the distance parameters obtained in S2, as follows:
S31, in the vertical direction, multiplying each of the four nearest-neighbor pixels once by the column-direction parameter v_param00 or v_param01, and adding the four resulting products in pairs to obtain two intermediate values;
S32, multiplying the two intermediate values by the row-direction parameters h_param00 and h_param01, respectively, and adding the two products to obtain one up-sampled pixel of the output feature map;
S4, after the data of a block have been processed, returning to S1 to process the next block;
S5, after the input feature map has been processed, continuing with the next feature map according to the register instruction.
Reverse block mapping with data reuse supplies the data to be processed and the calculation parameters quickly, and the two-stage multiply-add operation is then carried out in a pipeline, which increases the calculation rate.
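A minimal sketch of the two-stage multiply-add in S31 and S32 follows. The patent does not state which parameter weights which tap, so the pairing used here (v_param00 with the upper row ram[0], ram[1]; v_param01 with the lower row ram[W], ram[W+1]; h_param00 with the left column; h_param01 with the right column) is an assumption, and the function names are hypothetical.

```python
def stage1_vertical(ram0, ram1, ramW, ramW1, v_param00, v_param01):
    """First stage (S31): four multiplications, two additions.

    Assumed pairing: v_param00 weights the upper row, v_param01 the lower row.
    """
    left = ram0 * v_param00 + ramW * v_param01    # left column, blended vertically
    right = ram1 * v_param00 + ramW1 * v_param01  # right column, blended vertically
    return left, right


def stage2_horizontal(left, right, h_param00, h_param01):
    """Second stage (S32): two multiplications, one addition.

    Assumed pairing: h_param00 weights the left column, h_param01 the right.
    """
    return left * h_param00 + right * h_param01


def upsample_pixel(taps, params):
    """One output pixel from taps (ram0, ram1, ramW, ramW1) and
    params (h_param00, h_param01, v_param00, v_param01)."""
    ram0, ram1, ramW, ramW1 = taps
    h_param00, h_param01, v_param00, v_param01 = params
    left, right = stage1_vertical(ram0, ram1, ramW, ramW1, v_param00, v_param01)
    return stage2_horizontal(left, right, h_param00, h_param01)


print(upsample_pixel((10, 20, 30, 40), (0.5, 0.5, 0.5, 0.5)))  # 25.0
```

Splitting the computation into a vertical pass and a horizontal pass is what allows the two stages to be placed in separate pipeline stages.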
Further, in step S1, the data buffer block uses a shift-buffer scheme. For an input feature map of size W×H with W, H ≥ 2, where W is the width of the input feature map in pixels and H is its height in pixels, W+2 pixels are initially loaded into the buffer block. The pixels of the first block of the output feature map are generated from the four nearest-neighbor pixels of the input feature map; once all pixels of that block have been generated, new data are read in from left to right and the pixels of the second block are generated, and so on until all blocks of the output feature map have been generated.
Further, after the rightmost block in a row of the output feature map has been generated, two pixels of the input feature map are read in before the next block of the output feature map is generated.
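The shift-buffer read order can be modeled in a few lines. This is a sketch assuming a row-major input stream and W, H ≥ 2; `shift_buffer_windows` is a hypothetical name, and Python's `collections.deque` with `maxlen` stands in for the hardware shift register.

```python
from collections import deque
from itertools import islice


def shift_buffer_windows(pixels, W, H):
    """Yield (ram0, ram1, ramW, ramW1) for each block of a W x H input map.

    pixels is the input feature map flattened row by row (W, H >= 2).
    The buffer holds W + 2 entries, so positions 0, 1, W and W + 1 are
    always the 2x2 neighborhood of the current block.
    """
    assert W >= 2 and H >= 2 and len(pixels) == W * H
    stream = iter(pixels)
    # Initial fill: W + 2 pixels before the first block can be generated.
    buf = deque(islice(stream, W + 2), maxlen=W + 2)
    for r in range(H - 1):
        for c in range(W - 1):
            yield buf[0], buf[1], buf[W], buf[W + 1]
            if r == H - 2 and c == W - 2:
                return  # last block: every input pixel has been read once
            # Shift in one pixel per block, or two after the rightmost block
            # so that the window skips the invalid wrap-around position.
            for _ in range(2 if c == W - 2 else 1):
                buf.append(next(stream))  # maxlen drops the oldest entry


pixels = list(range(9))  # 3 x 3 map whose values are their own flat indices
print(list(shift_buffer_windows(pixels, 3, 3)))
# [(0, 1, 3, 4), (1, 2, 4, 5), (3, 4, 6, 7), (4, 5, 7, 8)]
```

Because the buffer is exactly W+2 entries long, positions 0, 1, W and W+1 always hold the current 2×2 neighborhood, and the double shift at the end of each row steps over the window that would otherwise straddle two rows.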
Further, for an input feature map W×H with W = 1 and H ≥ 2, with W ≥ 2 and H = 1, or with W = 1 and H = 1, any of the four nearest-neighbor pixels that does not exist in the input feature map is set to 0.
Further, in step S2, the output feature map is mapped to the four nearest-neighbor pixels of the input feature map by reverse block mapping. Blocks are visited from left to right up to the rightmost block, then from left to right again starting at the leftmost block of the next row down, and so on until the whole output feature map has been mapped. All pixels in the same block of the output feature map map to the same four nearest-neighbor pixels of the input feature map, while different blocks map to sets of four nearest-neighbor pixels that are not identical.
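The traversal order can be illustrated by grouping output pixels under the block they map to. The patent does not give the coordinate transform, so this sketch assumes align-corners bilinear mapping (src = dst·(in-1)/(out-1)), evaluated in integers; `blocks_in_order` is a hypothetical helper.

```python
def blocks_in_order(W, H, Wo, Ho):
    """Group output pixels (y, x) by the input block (r, c) they map to.

    Blocks are listed left to right, then row by row from the top, following
    the reverse block mapping order. Assumes W, H, Wo, Ho >= 2 and an
    align-corners mapping (an assumption; the patent does not specify it).
    """
    groups = {}
    for y in range(Ho):
        r = min(y * (H - 1) // (Ho - 1), H - 2)      # block row
        for x in range(Wo):
            c = min(x * (W - 1) // (Wo - 1), W - 2)  # block column
            groups.setdefault((r, c), []).append((y, x))
    return [((r, c), groups.get((r, c), []))
            for r in range(H - 1) for c in range(W - 1)]


for block, members in blocks_in_order(3, 3, 5, 5):
    print(block, members)
```

Every output pixel lands in exactly one block, and all pixels of a block share the same four input taps, which is what lets those taps be fetched once per block instead of once per pixel.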
Further, in S2, when the size of the input feature map is W×H with W = 1 and H ≥ 2, with W ≥ 2 and H = 1, or with W = 1 and H = 1, any of the four pixels that does not exist in the input feature map is set to 0.
Further, in S3, data ram[0], ram[1], ram[W] and ram[W+1] are read out from the shift buffer.
Further, in step S3, when the input feature map W×H has W ≥ 2 and H = 1, ram[0] and ram[1] are read out from the shift buffer and ram[W] and ram[W+1] are set to 0; when W = 1 and H ≥ 2, ram[0] and ram[W] are read out from the shift buffer and ram[1] and ram[W+1] are set to 0; when W = 1 and H = 1, only ram[0] is read out from the shift buffer and ram[1], ram[W] and ram[W+1] are set to 0.
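The zero-substitution rules for one-pixel-wide or one-pixel-high maps can be sketched as follows. The helper names are hypothetical, and the taps are treated as logical positions (upper-left, upper-right, lower-left, lower-right) rather than buffer addresses, since ram[1] and ram[W] would name the same buffer slot when W = 1.

```python
def block_grid(W, H):
    """Block coordinates (r, c) in raster order. A map that is one pixel
    wide or high still yields one block per remaining column or row."""
    return [(r, c) for r in range(max(H - 1, 1)) for c in range(max(W - 1, 1))]


def block_taps(fmap, r, c):
    """Return (ram0, ram1, ramW, ramW1) for block (r, c) of a 2-D list fmap.

    ram0/ram1 are the upper-left/upper-right taps and ramW/ramW1 the
    lower-left/lower-right taps; taps that do not exist in a 1-wide or
    1-high map are set to 0.
    """
    H, W = len(fmap), len(fmap[0])
    ram0 = fmap[r][c]
    ram1 = fmap[r][c + 1] if W >= 2 else 0
    ramW = fmap[r + 1][c] if H >= 2 else 0
    ramW1 = fmap[r + 1][c + 1] if W >= 2 and H >= 2 else 0
    return ram0, ram1, ramW, ramW1


row = [[1, 2, 3]]          # W = 3, H = 1
column = [[1], [2], [3]]   # W = 1, H = 3
print([block_taps(row, r, c) for r, c in block_grid(3, 1)])     # [(1, 2, 0, 0), (2, 3, 0, 0)]
print([block_taps(column, r, c) for r, c in block_grid(1, 3)])  # [(1, 0, 2, 0), (2, 0, 3, 0)]
print(block_taps([[7]], 0, 0))                                   # (7, 0, 0, 0)
```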
Because the design is flexible, up-sampling of a single pixel (1×1), a single column (1×H) and a single row (W×1) can be supported with only slight changes to the initial data-reading scheme.
The invention has the following advantages:
The invention achieves efficient data reuse through reverse block mapping and a shift-register buffer design. Data processing is pipelined, which increases the calculation speed. Calculations convert floating-point values to fixed-point, with the format customized and optimized to the precision required by a specific design, which reduces the buffer size and the power consumed by multiply-add operations during data processing. In addition, the scheme is simple to parallelize, is compatible with most existing neural networks, and does not depend on the number of layers in a particular network.
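A sketch of the floating-point-to-fixed-point arithmetic, assuming the parameters are quantized to 8 fractional bits and the pixels are integers. The bit width, the round-half-up rule, the parameter pairing and the helper names are illustrative assumptions; the patent says only that the precision is customized to the design.

```python
FRAC_BITS = 8  # hypothetical parameter precision; the patent leaves it design-specific


def to_fixed(value, frac_bits=FRAC_BITS):
    """Quantize a real-valued parameter to a fixed-point integer."""
    return round(value * (1 << frac_bits))


def fixed_upsample_pixel(taps, params, frac_bits=FRAC_BITS):
    """Two-stage multiply-add using integer arithmetic only.

    taps: integer pixels (ram0, ram1, ramW, ramW1).
    params: fixed-point (h_param00, h_param01, v_param00, v_param01);
    the pairing of parameters with taps is an assumption.
    """
    ram0, ram1, ramW, ramW1 = taps
    h00, h01, v00, v01 = params
    left = ram0 * v00 + ramW * v01    # stage 1: scaled by 2**frac_bits
    right = ram1 * v00 + ramW1 * v01
    acc = left * h00 + right * h01    # stage 2: scaled by 2**(2 * frac_bits)
    half = 1 << (2 * frac_bits - 1)
    return (acc + half) >> (2 * frac_bits)  # round half up and remove the scaling


params = tuple(to_fixed(p) for p in (2 / 3, 1 / 3, 1.0, 0.0))  # (171, 85, 256, 0)
print(fixed_upsample_pixel((0, 90, 0, 90), params))  # 30
```

Only integer multipliers and adders are needed, and the product widths follow directly from the chosen number of fractional bits, which is the kind of trade-off the text describes tuning per design.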
Drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a data buffer diagram of the input feature map for up-sampling reverse block mapping according to the present invention (W, H ≥ 2).
Fig. 3 is a data buffer diagram of the input feature map for up-sampling reverse block mapping according to the present invention (W = 1, H ≥ 2 or W ≥ 2, H = 1).
Fig. 4 is a data buffer diagram of the input feature map for up-sampling reverse block mapping according to the present invention (W = 1, H = 1).
Fig. 5 is a schematic diagram of a data buffer block according to the present invention.
Fig. 6 is a schematic diagram of the up-sampling reverse block mapping scheme according to the present invention (W, H ≥ 2).
Fig. 7 is a schematic diagram of the up-sampling reverse block mapping scheme according to the present invention (W = 1, H ≥ 2 or W ≥ 2, H = 1).
Fig. 8 is a schematic diagram of the up-sampling reverse block mapping scheme according to the present invention (W = 1, H = 1).
Fig. 9 is a schematic diagram of up-sampling parameter generation in the present invention.
Fig. 10 is a schematic diagram of the up-sampling data processing module in the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
This design provides an up-sampling method for neural network accelerators based on reverse block mapping and data reuse. Under block mapping, different blocks map to sets of four nearest-neighbor input pixels that are not identical; the pixels those sets share are reused, so no pixel is read repeatedly.
As shown in fig. 1, an upsampling inverse blocking mapping method includes the following steps:
Step one: read part of the feature map data in a predetermined order and store it in the shift buffer.
As shown in Figs. 2-4, to enable data reuse, the input feature map buffer block holds more than just the current four nearest-neighbor pixels.
As shown in Fig. 5, the data buffer block uses a shift-buffer scheme, and the buffering timing differs slightly with the size of the input feature map.
As shown in Fig. 2, for an input feature map of width and height W×H (W, H ≥ 2), W+2 pixels are loaded into the buffer block. At that point the first block of the output feature map becomes available: it corresponds to the four nearest-neighbor pixels of the input feature map, and generation of the pixels in that block can begin. Once all pixels in the block have been generated, only one more pixel needs to be read in before the pixels of the second block can be generated. Note that after the rightmost block of a row has been generated, two input pixels must be read in before the next block can be generated. These steps repeat until all blocks of the output feature map have been generated.
As shown in Fig. 3, for an input feature map of size W×H with W = 1 and H ≥ 2, or W ≥ 2 and H = 1, four nearest-neighbor pixels do not actually exist. Two pixels are initially loaded into the buffer block and the remaining two are replaced by 0, after which generation of the pixels in the first block can begin; once all pixels in that block have been generated, reading one more pixel allows generation of the second block to begin.
As shown in Fig. 4, for an input feature map of size W×H with W = 1 and H = 1, the output feature map has only one block. One pixel is initially loaded into the buffer block and the remaining three are replaced by 0, after which generation of the output feature map can begin.
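The read timing above can be summarized per block. This sketch, with the hypothetical helper `read_schedule`, restates the counts given in the text for each input shape and lets one check that they add up to W×H.

```python
def read_schedule(W, H):
    """Pixels newly read into the shift buffer before each block, in block order.

    Restates the timing described in the text; the function is illustrative.
    """
    if W >= 2 and H >= 2:
        schedule = []
        for r in range(H - 1):
            for c in range(W - 1):
                if r == 0 and c == 0:
                    schedule.append(W + 2)  # initial fill
                elif c == 0:
                    schedule.append(2)      # after the rightmost block of a row
                else:
                    schedule.append(1)      # next block in the same row
        return schedule
    if W == 1 and H == 1:
        return [1]                          # single block, three taps set to 0
    n_blocks = max(W, H) - 1                # single row or single column
    return [2] + [1] * (n_blocks - 1)       # two taps read, two set to 0


for W, H in [(4, 3), (5, 1), (1, 4), (1, 1)]:
    s = read_schedule(W, H)
    print((W, H), s, sum(s) == W * H)
```

In every case the reads sum to W×H, so each input pixel enters the buffer exactly once.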
Step two: partition the output feature map into blocks by reverse block mapping, find the pixels in each block in turn, and calculate the four parameters that give the position of each output pixel relative to its four nearest-neighbor pixels in the input feature map.
As shown in Figs. 6-8, the dashed boxes mark the blocks of the output feature map and the four nearest-neighbor pixels they map to in the input feature map. As shown in Fig. 6, the output feature map is mapped to the four nearest-neighbor pixels of the input feature map by reverse block mapping: blocks are visited from left to right up to the rightmost block, then from left to right again starting at the leftmost block of the next row down, and so on until the whole output feature map has been mapped. All pixels in the same block of the output feature map map to the same four nearest-neighbor pixels of the input feature map, while different blocks map to sets of four nearest-neighbor pixels that are not identical. As shown in Figs. 7 and 8, when the input feature map size is W×H with W = 1 and H ≥ 2 or W ≥ 2 and H = 1, or with W = 1 and H = 1, the input feature map does not actually contain four nearest-neighbor pixels, and each missing pixel is replaced by a pixel with value 0.
As shown in Fig. 9, ram[0], ram[1] (or 0), ram[W] (or 0) and ram[W+1] (or 0) are the four nearest-neighbor pixels in the input feature map, and dout is a pixel in one block of the output feature map, drawn at the position to which it maps in the input feature map. As shown in Figs. 6-9, each block of the output feature map usually contains several pixels. All pixels in the same block map to the same four nearest-neighbor pixels, but the horizontal and vertical distances between each pixel's mapped point and those four pixels differ, i.e. h_param00, h_param01, v_param00 and v_param01 differ from pixel to pixel.
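The per-pixel parameters can be sketched under an assumed align-corners mapping. The patent describes the parameters only as horizontal and vertical distances, so the exact formula, the pairing (h_param01 and v_param01 as the distances from the upper-left tap, h_param00 and v_param00 as one minus those distances) and the helper name `pixel_params` are assumptions. Exact fractions keep the example free of rounding noise.

```python
from fractions import Fraction


def pixel_params(x, y, W, H, Wo, Ho):
    """Return ((r, c), (h_param00, h_param01, v_param00, v_param01)) for
    output pixel (y, x).

    Assumptions: align-corners mapping with W, H, Wo, Ho >= 2; h_param01 and
    v_param01 are the distances from the upper-left tap, and
    h_param00 = 1 - h_param01, v_param00 = 1 - v_param01.
    """
    c = min(x * (W - 1) // (Wo - 1), W - 2)  # block column (shared taps)
    r = min(y * (H - 1) // (Ho - 1), H - 2)  # block row (shared taps)
    dx = Fraction(x * (W - 1), Wo - 1) - c   # horizontal distance in [0, 1]
    dy = Fraction(y * (H - 1), Ho - 1) - r   # vertical distance in [0, 1]
    return (r, c), (1 - dx, dx, 1 - dy, dy)


block, params = pixel_params(1, 0, 2, 2, 4, 4)
print(block, [str(p) for p in params])  # (0, 0) ['2/3', '1/3', '1', '0']
```

All output pixels of a 2×2 → 4×4 upsampling fall in block (0, 0) and share the same four taps, while each receives its own distance parameters, matching the relationship shown in Fig. 9.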
Step three: input the pixel data obtained in step one and the parameters obtained in step two into the data processing module, which computes the pixel values of the output feature map in a pipelined manner.
The data processing module performs two stages of calculation, each consisting of multiplications followed by additions. When the input feature map is W×H with W, H ≥ 2, ram[0], ram[1], ram[W] and ram[W+1] are read out from the shift buffer; when W ≥ 2 and H = 1, ram[0] and ram[1] are read out from the shift buffer and ram[W] and ram[W+1] are set to 0; when W = 1 and H ≥ 2, ram[0] and ram[W] are read out from the shift buffer and ram[1] and ram[W+1] are set to 0; when W = 1 and H = 1, only ram[0] is read out from the shift buffer and ram[1], ram[W] and ram[W+1] are set to 0. Because the design is flexible, up-sampling of a single pixel (1×1), a single column (1×H) and a single row (W×1) can be supported with only slight changes to the initial data-reading scheme.
As shown in Fig. 10, in the first-stage multiply-add, the four neighboring pixel values and the column-direction parameters (v_param00 and v_param01) are input in the vertical direction, and one multiplication per pixel yields four intermediate values; these are added in pairs to give two intermediate values, completing the first multiply-add. In the second-stage multiply-add, the two intermediate values from the first stage are multiplied by the row-direction parameters (h_param00 and h_param01), respectively, to give two products, which are added to produce the result: an up-sampled pixel of the output feature map. Reverse block mapping with data reuse supplies the data to be processed and the calculation parameters quickly, and pipelining the two-stage multiply-add increases the calculation rate.
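The pipelining of the two multiply-add stages can be sketched at cycle level. The single pipeline register between the stages, the parameter pairing (v_param00 with the upper row, h_param00 with the left column) and the function name are assumptions made for illustration; the point is that once the pipeline is full, one output pixel completes every cycle while the next pixel's first stage runs.

```python
def pipelined_upsample(jobs):
    """Cycle-level sketch of a two-stage multiply-add pipeline.

    jobs: list of (taps, params), taps = (ram0, ram1, ramW, ramW1) and
    params = (h_param00, h_param01, v_param00, v_param01).
    Returns (cycle, pixel) pairs: after one fill cycle, one output pixel
    leaves stage 2 every cycle while stage 1 accepts the next job.
    """
    stage_reg = None  # assumed pipeline register between stage 1 and stage 2
    results = []
    for cycle in range(len(jobs) + 1):
        # Stage 2: two multiplications, one addition on last cycle's stage-1 output.
        if stage_reg is not None:
            left, right, h00, h01 = stage_reg
            results.append((cycle, left * h00 + right * h01))
        # Stage 1: four multiplications, two additions on a new job.
        if cycle < len(jobs):
            (ram0, ram1, ramW, ramW1), (h00, h01, v00, v01) = jobs[cycle]
            stage_reg = (ram0 * v00 + ramW * v01, ram1 * v00 + ramW1 * v01, h00, h01)
        else:
            stage_reg = None
    return results


jobs = [((10, 20, 30, 40), (0.5, 0.5, 0.5, 0.5)),
        ((1, 2, 3, 4), (1, 0, 1, 0))]
print(pipelined_upsample(jobs))  # [(1, 25.0), (2, 1)]
```

With N pixels the sketch finishes in N+1 cycles rather than 2N, which is the throughput gain pipelining provides over running the two stages back to back.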
Step four: after the data of one block have been processed, process the next block by repeating steps one, two and three.
Step five: after the input feature map has been processed, continue with the next feature map according to the register instruction.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the technical solutions according to the embodiments of the present invention.

Claims (8)

1. An up-sampling reverse blocking mapping method, characterized by comprising the following steps:
S1, reading input feature map data, storing it in a shift buffer, and generating the blocks of the output feature map from the four nearest-neighbor pixels ram[0], ram[1], ram[W], ram[W+1] of the input feature map;
S2, finding the pixels in each block of the output feature map and mapping their positions relative to the four nearest-neighbor pixels in the input feature map: different pixels in a block of the output feature map map to different points in the input feature map, which lie at different horizontal and vertical distances from the same four nearest-neighbor pixels; h_param00 and h_param01 are the row-direction parameters and represent the horizontal distances, and v_param00 and v_param01 are the column-direction parameters and represent the vertical distances;
S3, computing the pixel values of the output feature map in a pipelined manner from the pixels obtained in S1 and the distance parameters obtained in S2, as follows:
S31, in the vertical direction, multiplying each of the four nearest-neighbor pixels once by the column-direction parameter v_param00 or v_param01, and adding the four resulting products in pairs to obtain two intermediate values;
S32, multiplying the two intermediate values by the row-direction parameters h_param00 and h_param01, respectively, and adding the two products to obtain one up-sampled pixel of the output feature map;
S4, after the data of a block have been processed, returning to S1 to process the next block;
S5, after the input feature map has been processed, continuing with the next feature map according to the register instruction.
2. The up-sampling reverse blocking mapping method according to claim 1, wherein in step S1, the data buffer block uses a shift-buffer scheme; for an input feature map W×H with W, H ≥ 2, where W is the width of the input feature map in pixels and H is its height in pixels, W+2 pixels are initially loaded into the buffer block; the pixels in the first block of the output feature map are generated from the four nearest-neighbor pixels of the input feature map; after all pixels in that block have been generated, new data are read in from left to right and the pixels in the second block of the output feature map are generated, and so on until all blocks of the output feature map have been generated.
3. The up-sampling reverse blocking mapping method according to claim 2, wherein after the rightmost block of the output feature map has been generated, two pixels of the input feature map are read in to generate the next block of the output feature map.
4. The up-sampling reverse blocking mapping method according to claim 2, wherein for an input feature map W×H with W = 1 and H ≥ 2, or W ≥ 2 and H = 1, or W = 1 and H = 1, any of the four nearest-neighbor pixels that does not exist in the input feature map is set to 0.
5. The up-sampling reverse blocking mapping method according to claim 2, wherein in S2, the output feature map is mapped to the four nearest-neighbor pixels of the input feature map by reverse block mapping; blocks are visited from left to right up to the rightmost block, then from left to right again starting at the leftmost block of the next row down, and so on until the whole output feature map has been mapped; all pixels in the same block of the output feature map map to the same four nearest-neighbor pixels of the input feature map, and different blocks of the output feature map map to sets of four nearest-neighbor pixels of the input feature map that are not identical.
6. The up-sampling reverse blocking mapping method according to claim 4, wherein in S2, for an input feature map of size W×H with W = 1 and H ≥ 2, or W ≥ 2 and H = 1, or W = 1 and H = 1, any of the four pixels that does not exist in the input feature map is set to 0.
7. The up-sampling reverse blocking mapping method according to claim 5, wherein in S3, data ram[0], ram[1], ram[W] and ram[W+1] are read out from the shift buffer.
8. The up-sampling reverse blocking mapping method according to claim 6, wherein in S3, when the input feature map W×H has W ≥ 2 and H = 1, ram[0] and ram[1] are read out from the shift buffer and ram[W] and ram[W+1] are set to 0; when W = 1 and H ≥ 2, ram[0] and ram[W] are read out from the shift buffer and ram[1] and ram[W+1] are set to 0; when W = 1 and H = 1, only ram[0] is read out from the shift buffer and ram[1], ram[W] and ram[W+1] are set to 0.
CN202111148518.6A 2021-09-29 2021-09-29 Upsampling reverse blocking mapping method Active CN113887720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111148518.6A CN113887720B (en) 2021-09-29 2021-09-29 Upsampling reverse blocking mapping method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111148518.6A CN113887720B (en) 2021-09-29 2021-09-29 Upsampling reverse blocking mapping method

Publications (2)

Publication Number Publication Date
CN113887720A (en) 2022-01-04
CN113887720B (en) 2024-04-26

Family

ID=79007775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111148518.6A Active CN113887720B (en) 2021-09-29 2021-09-29 Upsampling reverse blocking mapping method

Country Status (1)

Country Link
CN (1) CN113887720B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101795405A (en) * 2009-11-06 2010-08-04 杭州士兰微电子股份有限公司 H.264 high-speed luminance interpolating device and method
CN108133270A (en) * 2018-01-12 2018-06-08 清华大学 Convolutional neural networks accelerating method and device
CN110363284A (en) * 2019-06-20 2019-10-22 东南大学 Convolutional neural network hardware accelerator with a novel convolution operation acceleration module


Also Published As

Publication number Publication date
CN113887720A (en) 2022-01-04

Similar Documents

Publication Publication Date Title
CN109063825B (en) Convolutional neural network accelerator
Gao et al. Pixel transposed convolutional networks
CN107340993B (en) Arithmetic device and method
US10096134B2 (en) Data compaction and memory bandwidth reduction for sparse neural networks
CN109934331B (en) Apparatus and method for performing artificial neural network forward operations
US5740285A (en) Image reduction/enlargement technique
CN110780923B (en) Hardware accelerator applied to binary convolution neural network and data processing method thereof
CN112348870B (en) Significance target detection method based on residual error fusion
CN109993293B (en) Deep learning accelerator suitable for heap hourglass network
CN110348531B (en) Deep convolution neural network construction method with resolution adaptability and application
CN111666442B (en) Image retrieval method and device and computer equipment
CN109447897B (en) Real scene image synthesis method and system
Chervyakov et al. Residue number system-based solution for reducing the hardware cost of a convolutional neural network
Li et al. Efficient depthwise separable convolution accelerator for classification and UAV object detection
CN113362242B (en) Image restoration method based on multi-feature fusion network
CN108171328A (en) Convolution operation method and neural network processor based on the method
CN111882053B (en) Neural network model compression method based on splicing convolution
CN109993701B (en) Depth map super-resolution reconstruction method based on pyramid structure
CN113887720B (en) Upsampling reverse blocking mapping method
CN116245765A (en) Image denoising method and system based on enhanced depth expansion convolutional neural network
CN115660984A (en) Image high-definition restoration method and device and storage medium
CN110619387B (en) Channel expansion method based on convolutional neural network
CN109416757B (en) Method, apparatus and computer-readable storage medium for processing numerical data
CN110807479A (en) Neural network convolution calculation acceleration method based on Kmeans algorithm
CN112329544A (en) Gesture recognition machine learning method and system based on depth information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant