CN114612309A - Full-on-chip dynamic reconfigurable super-resolution device - Google Patents

Info

Publication number
CN114612309A
Authority
CN
China
Prior art keywords
convolution
data
circuit
length
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210512559.7A
Other languages
Chinese (zh)
Other versions
CN114612309B (en)
Inventor
常亮
赵鑫
周军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210512559.7A
Publication of CN114612309A
Application granted
Publication of CN114612309B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

A full-on-chip dynamic reconfigurable super-resolution device belongs to the technical field of image processing. The full-on-chip dynamic reconfigurable super-resolution device comprises a preprocessing circuit, an arithmetic operation circuit, an interpolation circuit and a post-processing circuit; the preprocessing circuit comprises a weight buffer, an input buffer and an input image color space conversion circuit, the arithmetic operation circuit comprises a data redistribution circuit, a convolution calculation block, a shared addition tree circuit and an interlayer buffer, the interpolation circuit comprises a nearest neighbor interpolation circuit and a temporary buffer, and the post-processing circuit comprises an output shaping circuit and an output image color space conversion circuit. The invention adopts the mapping strategy of convolution compression, convolution decomposition and PE remapping and the convolution calculation block consisting of a plurality of dynamic reconfigurable PE calculation units, greatly reduces the calculation amount of deconvolution calculation, improves the calculation efficiency of deconvolution calculation, effectively eliminates invalid calculation and avoids the problem of unbalanced calculation load.

Description

Full-on-chip dynamic reconfigurable super-resolution device
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a full-on-chip dynamically reconfigurable super-resolution device.
Background
With the development of artificial intelligence algorithms, super-resolution networks based on deep learning achieve better image reconstruction and have good application prospects in key fields such as old photo restoration, medical detection and security monitoring. However, super-resolution networks involve a large amount of computation and data access, and therefore place extremely high demands on hardware. This is mainly due to the large amount of computation in the deconvolution process. To address these problems, Chang et al. (Chang, Jung-Woo, Keon-Woo Kang, and Suk-Ju Kang. "An energy-efficient FPGA-based deconvolutional neural networks accelerator for single image super-resolution." IEEE Transactions on Circuits and Systems for Video Technology 30.1 (2018): 281-295.) proposed a TDC (transposed convolution conversion) method, which converts deconvolution into convolution operations and redistributes the computation tasks, effectively reducing the amount of deconvolution computation. However, the problems of low PE (processing element) utilization and unbalanced computation load remain, and the potential for accelerating deconvolution has not been fully exploited.
Disclosure of Invention
The invention aims to provide a full-on-chip dynamically reconfigurable super-resolution device aiming at the problems in the background art.
In order to realize the purpose, the technical scheme adopted by the invention is as follows:
a full-on-chip dynamic reconfigurable super-resolution device comprises a preprocessing circuit, an arithmetic operation circuit, an interpolation circuit and a post-processing circuit;
wherein the pre-processing circuit comprises a weight buffer, an input buffer and an input image color space conversion circuit; the weight buffer is used for caching weight data of the super-resolution network, the input buffer is used for caching input image data, the input image color space conversion circuit reads the input image data in the input buffer and converts the input image data from an RGB format into a YCbCr format, Y-channel data obtained after conversion are input into the data redistribution circuit, and Cb and Cr channel data are input into the nearest neighbor interpolation circuit;
the arithmetic operation circuit comprises a data redistribution circuit, a convolution calculation block, a shared addition tree circuit and an interlayer buffer; the data redistribution circuit reads the weight data in the weight buffer and the Y-channel data output by the input image color space conversion circuit, and redistributes the weight data and the Y-channel data according to the scaling factor and a designated mapping strategy to obtain redistributed data; the convolution calculation block receives the redistributed data and performs convolution operations on it to obtain convolution operation results; the shared addition tree circuit receives the convolution operation results and accumulates them to obtain the output feature map data of the current layer of the super-resolution network; the interlayer buffer receives and stores the output feature map data of the current layer; when the maximum number of layers of the super-resolution network has not been reached, the output feature map data is input into the data redistribution circuit as the input feature map of the next layer, and when the maximum number of layers has been reached, the output feature map data is input into the output shaping circuit;
the interpolation circuit comprises a nearest neighbor interpolation circuit and a temporary buffer; the nearest neighbor interpolation circuit interpolates the received Cb and Cr channel data using a nearest neighbor interpolation strategy to obtain interpolated feature maps; the temporary buffer receives and buffers the interpolated feature maps;
the post-processing circuit comprises an output shaping circuit and an output image color space conversion circuit; the output shaping circuit reads the output feature map data from the interlayer buffer and rearranges it to obtain sequentially ordered Y-channel data; the output image color space conversion circuit reads the sequentially ordered Y-channel data and the interpolated feature maps, and converts them into RGB format data for output.
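The color path described above (split into Y and Cb/Cr, nearest neighbor interpolation of the chroma planes, conversion back to RGB) can be sketched in Python as follows. This is only an illustration: the BT.601 full-range coefficients are an assumption, since the text names the RGB and YCbCr formats but not the conversion constants, and the function names `rgb_to_ycbcr`, `ycbcr_to_rgb` and `nearest_neighbor_upscale` are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to YCbCr (assumed BT.601 full-range coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr under the same assumed coefficients."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b


def nearest_neighbor_upscale(plane, scale):
    """Upscale a 2-D plane (list of rows) by repeating each sample scale x scale times."""
    out = []
    for row in plane:
        # Repeat every sample horizontally, then repeat the widened row vertically.
        wide = [value for value in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out
```

In this sketch only the Y plane would go through the super-resolution network, while the Cb and Cr planes are enlarged by `nearest_neighbor_upscale` with the same scaling factor before the final conversion to RGB.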
Further, the input image data is a plurality of sub-images obtained by segmenting the original image.
Further, the weight data is obtained by segmenting the original image into a plurality of sub-images and then training the super-resolution network with the segmented sub-images as the training data set.
Further, the mapping strategy comprises convolution compression, convolution decomposition and PE remapping processes, wherein the convolution compression compresses the weight data and the Y-channel data according to a scaling factor, and removes a 0 value in the Y-channel data and the weight data corresponding to the 0 value; the convolution decomposition decomposes the compressed data into convolutions of different lengths; PE remapping combines convolutions of different lengths into convolutions of fixed length and inputs them to a convolution computation block.
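The effect of convolution compression can be illustrated with a small Python sketch. It assumes that deconvolution is modeled as zero insertion followed by an ordinary sliding-window multiply-accumulate, which is a common formulation but is not spelled out in the text; the 1-D setting and the names `deconv_zero_insert`, `deconv_compressed` and `sub_kernel_shapes` are illustrative only. The compressed version skips every tap that would meet an inserted zero, and the shape counter shows which sub-kernel sizes a 9 × 9 kernel breaks into for a given scaling factor.

```python
from collections import Counter


def deconv_zero_insert(x, w, scale):
    """Reference 1-D deconvolution: insert zeros between samples, then slide w over them."""
    pad = len(w) // 2
    up = [0] * (len(x) * scale)
    for i, value in enumerate(x):
        up[i * scale] = value
    out = []
    for j in range(len(up)):
        acc = 0
        for k, wk in enumerate(w):
            idx = j + k - pad
            if 0 <= idx < len(up):
                acc += wk * up[idx]
        out.append(acc)
    return out


def deconv_compressed(x, w, scale):
    """Same result, visiting only the taps aligned with original input samples."""
    pad = len(w) // 2
    out = []
    for j in range(len(x) * scale):
        first_tap = (pad - j) % scale        # phase of this output position
        acc = 0
        for k in range(first_tap, len(w), scale):
            i = (j + k - pad) // scale       # index into the original input
            if 0 <= i < len(x):
                acc += w[k] * x[i]
        out.append(acc)
    return out


def sub_kernel_shapes(kernel, scale):
    """Count the 2-D sub-kernel shapes left after compressing a kernel x kernel deconvolution."""
    lengths = [len(range(r, kernel, scale)) for r in range(scale)]
    return Counter((rows, cols) for rows in lengths for cols in lengths)
```

For a kernel size of 9, `sub_kernel_shapes` gives sizes 5 × 5, 5 × 4, 4 × 5 and 4 × 4 at scaling factor 2, nine 3 × 3 sub-kernels at scaling factor 3, and sizes 3 × 3, 3 × 2, 2 × 3 and 2 × 2 at scaling factor 4, matching the sizes named in the text.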
Further, when the scaling factor is 2, the device can perform 4mn/9 deconvolution operations of size 9 × 9 in parallel, and the data redistribution circuit and the convolution calculation block operate as follows:
1) Convolution compression:
compress the input Y-channel data and the weight data to obtain mn/9 convolutions of size 5 × 5, mn/9 convolutions of size 5 × 4, mn/9 convolutions of size 4 × 5 and mn/9 convolutions of size 4 × 4, where m and n are positive integers;
2) Convolution decomposition:
decompose the mn/9 convolutions of size 5 × 5 into 2mn/9 convolutions of length 9 and mn/9 convolutions of length 7;
decompose the mn/9 convolutions of size 5 × 4 into 2mn/9 convolutions of length 9 and mn/9 convolutions of length 2;
decompose the mn/9 convolutions of size 4 × 5 into 2mn/9 convolutions of length 9 and mn/9 convolutions of length 2;
decompose the mn/9 convolutions of size 4 × 4 into mn/9 convolutions of length 9 and mn/9 convolutions of length 7;
3) PE remapping:
combine the convolutions of length 7 with the convolutions of length 2 to obtain 2mn/9 convolutions of length 9, and input them, together with the remaining 7mn/9 convolutions of length 9, into the convolution calculation block;
4) Deconvolution operation:
the mn input convolutions are sent to the convolution calculation block for convolution calculation, yielding mn convolution operation results.
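The bookkeeping behind the scaling-factor-2 case can be checked with a short Python sketch for a single 9 × 9 kernel. The sub-kernel sizes and the segment lengths 9, 7 and 2 come from the steps above; the number of segments per sub-kernel is inferred here from the weight counts (25 = 9 + 9 + 7, 20 = 9 + 9 + 2, 16 = 9 + 7) and should be read as an assumption, as should the names `DECOMPOSITION` and `remap`.

```python
# Segment lengths for the sub-kernels of one 9 x 9 kernel at scaling factor 2.
# Sizes and lengths follow the text; the per-sub-kernel split is inferred
# from the weight counts (assumption).
DECOMPOSITION = {
    (5, 5): [9, 9, 7],
    (5, 4): [9, 9, 2],
    (4, 5): [9, 9, 2],
    (4, 4): [9, 7],
}


def remap(decomposition):
    """Pair every length-7 segment with a length-2 segment and count PE slots."""
    for (rows, cols), segments in decomposition.items():
        if sum(segments) != rows * cols:
            raise ValueError(f"{rows}x{cols} sub-kernel loses or gains weights")
    flat = [length for segments in decomposition.values() for length in segments]
    sevens, twos = flat.count(7), flat.count(2)
    if sevens != twos:
        raise ValueError("length-7 and length-2 segments cannot be paired up")
    return {"full_length_9": flat.count(9), "paired_7_plus_2": sevens}
```

Under these assumptions one kernel occupies nine length-9 slots in total: seven that are already full and two that each hold a length-7 segment next to a length-2 segment.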
Further, when the scaling factor is 3, the device can perform mn deconvolution operations of size 9 × 9 in parallel, and the data redistribution circuit and the convolution calculation block operate as follows:
1) Convolution compression:
compress the input Y-channel data and the weight data to obtain mn convolutions of size 3 × 3;
2) Convolution decomposition:
decompose the mn convolutions of size 3 × 3 into mn convolutions of length 9;
3) PE remapping:
input the mn convolutions of length 9 into the convolution calculation block;
4) Deconvolution operation:
the mn input convolutions are sent to the convolution calculation block for convolution calculation, yielding mn convolution operation results.
Further, when the scaling factor is 4, the device can perform 16mn/9 deconvolution operations of size 9 × 9 in parallel, and the data redistribution circuit and the convolution calculation block operate as follows:
1) Convolution compression:
compress the input Y-channel data and the weight data to obtain mn/9 convolutions of size 3 × 3, mn/3 convolutions of size 3 × 2, mn/3 convolutions of size 2 × 3 and mn convolutions of size 2 × 2;
2) Convolution decomposition:
decompose the mn/9 convolutions of size 3 × 3 into mn/9 convolutions of length 9;
decompose the mn/3 convolutions of size 3 × 2 into 2mn/3 convolutions of length 3;
decompose the mn/3 convolutions of size 2 × 3 into 2mn/9 convolutions of length 3 and 2mn/3 convolutions of length 2;
decompose the mn convolutions of size 2 × 2 into 8mn/9 convolutions of length 4 and 2mn/9 convolutions of length 2;
3) PE remapping:
combine the convolutions of length 4, length 3 and length 2 to obtain 8mn/9 convolutions of length 9, and input them, together with the remaining mn/9 convolutions of length 9, into the convolution calculation block;
4) Deconvolution operation:
the mn input convolutions are sent to the convolution calculation block for convolution calculation, yielding mn convolution operation results.
Preferably, the device achieves its highest calculation efficiency when m×n is a multiple of 9.
The convolution calculation block comprises m×n dynamically reconfigurable PE computing units, and each dynamically reconfigurable PE computing unit comprises the 1st to 9th pixel points, the 1st to 9th weight data points, the 1st to 9th multipliers, the 1st to 8th adders, a first data selector and a second data selector; the product of the 1st pixel point A1 and the 1st weight data point W1 and the product of the 2nd pixel point A2 and the 2nd weight data point W2 are added in the 1st adder to obtain the 1st data; the product of the 3rd pixel point A3 and the 3rd weight data point W3 and the product of the 4th pixel point A4 and the 4th weight data point W4 are added in the 3rd adder to obtain the 2nd data; the 1st data and the 2nd data are added in the 2nd adder to obtain the 3rd data, which is input to the input end of the first data selector; one output end of the first data selector is connected to the first input end of the 4th adder, and the other output end serves as the first output of the dynamically reconfigurable PE computing unit; the product of the 6th pixel point A6 and the 6th weight data point W6 and the product of the 7th pixel point A7 and the 7th weight data point W7 are added in the 6th adder to obtain the 4th data; the product of the 5th pixel point A5 and the 5th weight data point W5 and the 4th data are added in the 5th adder to obtain the 5th data; the 5th data and the data output by the first data selector are added in the 4th adder to obtain the 6th data; the product of the 8th pixel point A8 and the 8th weight data point W8 and the product of the 9th pixel point A9 and the 9th weight data point W9 are added in the 8th adder to obtain the 7th data; the 7th data is input to the input end of the second data selector; the data at one output end of the second data selector and the 6th data are added in the 7th adder to obtain the second output of the dynamically reconfigurable PE computing unit; and the data at the other output end of the second data selector serves as the third output of the dynamically reconfigurable PE computing unit.
Further, each dynamically reconfigurable PE computing unit can implement 3 operating modes:
mode 0: outputting 1 convolution operation result with the length of 9;
mode 1: outputting 1 convolution operation result with the length of 7 and 1 convolution operation result with the length of 2;
mode 2: outputting 1 convolution operation result with the length of 4, 1 convolution operation result with the length of 3 and 1 convolution operation result with the length of 2.
The result of the first output is 1 convolution operation result with the length of 4 in the mode 2; the result of the second output is 1 convolution operation result with the length of 9 in the mode 0, or 1 convolution operation result with the length of 7 in the mode 1, or 1 convolution operation result with the length of 3 in the mode 2; the result of the third output is 1 convolution operation result with length of 2 in mode 1 or 1 convolution operation result with length of 2 in mode 2.
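A behavioral model of one dynamically reconfigurable PE computing unit may make the three modes easier to follow. The Python sketch below mirrors the adder and data selector wiring described above; the function name `pe_compute`, the use of `None` for an unused output, and the assumption that an unselected data selector branch contributes zero to the adder it feeds are illustrative choices, not details given in the text.

```python
def pe_compute(pixels, weights, mode):
    """Return (first, second, third) outputs of one PE; None marks an unused output."""
    if len(pixels) != 9 or len(weights) != 9:
        raise ValueError("a PE takes exactly 9 pixel points and 9 weight data points")
    if mode not in (0, 1, 2):
        raise ValueError("mode must be 0, 1 or 2")
    p = [a * w for a, w in zip(pixels, weights)]   # 1st to 9th multipliers

    data3 = (p[0] + p[1]) + (p[2] + p[3])          # adders 1, 3 and 2: A1..A4
    data5 = p[4] + (p[5] + p[6])                   # adders 6 and 5: A5..A7
    data7 = p[7] + p[8]                            # adder 8: A8..A9

    # First data selector: route data3 to adder 4 or out as the first output.
    if mode == 2:
        first, into_adder4 = data3, 0              # assumed: idle branch carries 0
    else:
        first, into_adder4 = None, data3
    data6 = data5 + into_adder4                    # adder 4

    # Second data selector: route data7 to adder 7 or out as the third output.
    if mode == 0:
        second, third = data6 + data7, None        # adder 7: full length-9 sum
    else:
        second, third = data6, data7               # assumed: adder 7 adds 0
    return first, second, third
```

With pixel points 1 to 9 and all weights equal to 1, mode 0 yields a single sum of 45, mode 1 yields 28 (length 7) and 17 (length 2), and mode 2 yields 10 (length 4), 18 (length 3) and 17 (length 2).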
Compared with the prior art, the invention has the beneficial effects that:
1. the full-on-chip dynamic reconfigurable super-resolution device provided by the invention adopts a mapping strategy of convolution compression, convolution decomposition and PE remapping and a convolution calculation block consisting of a plurality of dynamic reconfigurable PE calculation units, so that the calculation amount of deconvolution calculation is greatly reduced, the calculation efficiency of deconvolution calculation is improved, invalid calculation is effectively eliminated, and the problem of unbalanced calculation load is avoided.
2. According to the full on-chip dynamic reconfigurable super-resolution device provided by the invention, the input image data and the weight data are obtained by segmenting the original image into a plurality of sub-images and training, so that the data volume between layers is greatly reduced, the communication between an intermediate network layer and an off-chip memory is avoided, the full on-chip storage is realized, and the throughput of the device is improved.
Drawings
Fig. 1 is a schematic structural diagram of a full-on-chip dynamically reconfigurable super-resolution device provided by the present invention;
fig. 2 is a schematic structural diagram of a dynamic reconfigurable PE computing unit in the full on-chip dynamic reconfigurable super-resolution device provided by the present invention.
Detailed Description
The technical scheme of the invention is detailed in the following by combining the drawings and the embodiment.
Examples
Fig. 1 is a schematic structural diagram of a full on-chip dynamically reconfigurable super-resolution device provided by the present invention; the device comprises a preprocessing circuit, an arithmetic operation circuit, an interpolation circuit and a post-processing circuit;
wherein the pre-processing circuit comprises a weight buffer, an input buffer and an input image color space conversion circuit; the weight buffer is used for caching weight data of the super-resolution network, the input buffer is used for caching input image data, the input image color space conversion circuit reads the input image data in the input buffer and converts the input image data from an RGB format into a YCbCr format, Y-channel data obtained after conversion are input into the data redistribution circuit, and Cb and Cr channel data are input into the nearest neighbor interpolation circuit;
the arithmetic operation circuit comprises a data redistribution circuit, a convolution calculation block, a shared addition tree circuit and an interlayer buffer; the data redistribution circuit reads the weight data in the weight buffer and the Y-channel data output by the input image color space conversion circuit, and redistributes the weight data and the Y-channel data according to the scaling factor and a designated mapping strategy to obtain redistributed data; the convolution calculation block receives the redistributed data and performs convolution operations on it to obtain convolution operation results; the shared addition tree circuit receives the convolution operation results and accumulates them to obtain the output feature map data of the current layer of the super-resolution network; the interlayer buffer receives and stores the output feature map data of the current layer; when the maximum number of layers of the super-resolution network has not been reached, the output feature map data is input into the data redistribution circuit as the input feature map of the next layer, and when the maximum number of layers has been reached, the output feature map data is input into the output shaping circuit;
the interpolation circuit comprises a nearest neighbor interpolation circuit and a temporary buffer; the nearest neighbor interpolation circuit interpolates the received Cb and Cr channel data using a nearest neighbor interpolation strategy to obtain interpolated feature maps; the temporary buffer receives and buffers the interpolated feature maps;
the post-processing circuit comprises an output shaping circuit and an output image color space conversion circuit; the output shaping circuit reads the output feature map data from the interlayer buffer and rearranges it to obtain sequentially ordered Y-channel data; the output image color space conversion circuit reads the sequentially ordered Y-channel data and the interpolated feature maps, and converts them into RGB format data for output.
The input image data is a plurality of RGB format images having a size of 54 × 36 obtained by dividing an original image having a size of 1080 × 720 in an RGB format.
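The segmentation in this embodiment can be checked with a few lines of Python. The sketch assumes that the sizes are given as width × height and that the sub-images do not overlap, neither of which the text states explicitly; `tile_origins` is a hypothetical helper name.

```python
def tile_origins(width, height, tile_w, tile_h):
    """Top-left corners of non-overlapping tiles covering the image, row by row."""
    if width % tile_w or height % tile_h:
        raise ValueError("tile size must divide the image size exactly")
    return [(x, y)
            for y in range(0, height, tile_h)
            for x in range(0, width, tile_w)]


# Embodiment sizes: a 1080 x 720 image cut into 54 x 36 sub-images.
origins = tile_origins(1080, 720, 54, 36)
```

Under these assumptions the original image splits into 20 × 20 = 400 sub-images, each of which is small enough that its intermediate feature maps can stay in the interlayer buffer.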
The weight data is obtained by segmenting an original image into a plurality of sub-images and then training the super-resolution network on the segmented sub-images.
The convolution calculation block comprises 3 × 3 dynamically reconfigurable PE computing units, and each dynamically reconfigurable PE computing unit comprises the 1st to 9th pixel points, the 1st to 9th weight data points, the 1st to 9th multipliers, the 1st to 8th adders, a first data selector and a second data selector, as shown in FIG. 2; the product of the 1st pixel point A1 and the 1st weight data point W1 and the product of the 2nd pixel point A2 and the 2nd weight data point W2 are added in the 1st adder to obtain the 1st data; the product of the 3rd pixel point A3 and the 3rd weight data point W3 and the product of the 4th pixel point A4 and the 4th weight data point W4 are added in the 3rd adder to obtain the 2nd data; the 1st data and the 2nd data are added in the 2nd adder to obtain the 3rd data, which is input to the input end of the first data selector; one output end of the first data selector is connected to the first input end of the 4th adder, and the other output end serves as the first output of the dynamically reconfigurable PE computing unit; the product of the 6th pixel point A6 and the 6th weight data point W6 and the product of the 7th pixel point A7 and the 7th weight data point W7 are added in the 6th adder to obtain the 4th data; the product of the 5th pixel point A5 and the 5th weight data point W5 and the 4th data are added in the 5th adder to obtain the 5th data; the 5th data and the data output by the first data selector are added in the 4th adder to obtain the 6th data; the product of the 8th pixel point A8 and the 8th weight data point W8 and the product of the 9th pixel point A9 and the 9th weight data point W9 are added in the 8th adder to obtain the 7th data; the 7th data is input to the input end of the second data selector; the data at one output end of the second data selector and the 6th data are added in the 7th adder to obtain the second output of the dynamically reconfigurable PE computing unit; and the data at the other output end of the second data selector serves as the third output of the dynamically reconfigurable PE computing unit.
The scaling factor is set to 4, so that 16 deconvolution operations of size 9 × 9 can be performed in parallel; the data redistribution circuit and the convolution calculation block operate as follows:
1) Convolution compression:
compress the input Y-channel data and the weight data to obtain 1 convolution of size 3 × 3, 3 convolutions of size 3 × 2, 3 convolutions of size 2 × 3 and 9 convolutions of size 2 × 2;
2) Convolution decomposition:
decompose the 1 convolution of size 3 × 3 into 1 convolution of length 9; decompose the 3 convolutions of size 3 × 2 into 6 convolutions of length 3; decompose the 3 convolutions of size 2 × 3 into 2 convolutions of length 3 and 6 convolutions of length 2; decompose the 9 convolutions of size 2 × 2 into 8 convolutions of length 4 and 2 convolutions of length 2;
3) PE remapping:
combine the convolutions of length 4, length 3 and length 2 to obtain 8 convolutions of length 9, and input them, together with the remaining 1 convolution of length 9, into the convolution calculation block;
4) Deconvolution operation:
the 9 input convolutions are sent to the convolution calculation block for convolution calculation, yielding 9 convolution operation results; the convolution calculation block comprises 9 dynamically reconfigurable PE computing units arranged in a 3 × 3 array.
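The counts in this embodiment can be cross-checked with a short Python sketch: the compressed sub-kernels and the decomposed segments should both account for all 81 weights of a 9 × 9 kernel, and the segments should fill exactly nine PE units. The counts are taken from the steps above; the mode-by-mode greedy assignment in `assign_modes` and all names are assumptions made for illustration, not the patent's scheduling logic.

```python
from collections import Counter

# Counts quoted from the embodiment (scaling factor 4, 3 x 3 = 9 PE units).
SUB_KERNELS = {(3, 3): 1, (3, 2): 3, (2, 3): 3, (2, 2): 9}
SEGMENTS = Counter({9: 1, 4: 8, 3: 6 + 2, 2: 6 + 2})   # length -> how many

# Segment lengths one PE handles in each operating mode.
MODE_LAYOUT = {0: (9,), 1: (7, 2), 2: (4, 3, 2)}


def total_weights(sub_kernels):
    """Number of weights held by all compressed sub-kernels."""
    return sum(rows * cols * count for (rows, cols), count in sub_kernels.items())


def assign_modes(segments):
    """Fill PEs mode by mode; returns (mode of each PE, segments left unplaced)."""
    pool = Counter(segments)
    modes = []
    for mode, layout in MODE_LAYOUT.items():
        fits = min(pool[length] for length in layout)
        modes.extend([mode] * fits)
        for length in layout:
            pool[length] -= fits
    return modes, +pool   # unary plus drops entries whose count is zero


modes, leftover = assign_modes(SEGMENTS)
```

Here one PE runs in mode 0 for the single length-9 convolution and eight PEs run in mode 2, each holding one length-4, one length-3 and one length-2 segment, so no multiplier is left idle.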
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A full-on-chip dynamic reconfigurable super-resolution device is characterized by comprising a preprocessing circuit, an arithmetic operation circuit, an interpolation circuit and a post-processing circuit;
wherein the pre-processing circuit comprises a weight buffer, an input buffer and an input image color space conversion circuit; the weight buffer is used for caching weight data of the super-resolution network, the input buffer is used for caching input image data, the input image color space conversion circuit reads the input image data in the input buffer and converts the input image data from an RGB format into a YCbCr format, Y-channel data obtained after conversion are input into the data redistribution circuit, and Cb and Cr channel data are input into the nearest neighbor interpolation circuit;
the arithmetic operation circuit comprises a data redistribution circuit, a convolution calculation block, a shared addition tree circuit and an interlayer buffer; the data redistribution circuit reads the weight data in the weight buffer and the Y-channel data output by the input image color space conversion circuit, and redistributes the weight data and the Y-channel data according to the scaling factor and a designated mapping strategy to obtain redistributed data; the convolution calculation block receives the redistributed data and performs convolution operations on it to obtain convolution operation results; the shared addition tree circuit receives the convolution operation results and accumulates them to obtain the output feature map data of the current layer of the super-resolution network; the interlayer buffer receives and stores the output feature map data of the current layer; when the maximum number of layers of the super-resolution network has not been reached, the output feature map data is input into the data redistribution circuit as the input feature map of the next layer, and when the maximum number of layers has been reached, the output feature map data is input into the output shaping circuit;
the interpolation circuit comprises a nearest neighbor interpolation circuit and a temporary buffer; the nearest neighbor interpolation circuit interpolates the received Cb and Cr channel data using a nearest neighbor interpolation strategy to obtain interpolated feature maps; the temporary buffer receives and buffers the interpolated feature maps;
the post-processing circuit comprises an output shaping circuit and an output image color space conversion circuit; the output shaping circuit reads the output feature map data from the interlayer buffer and rearranges it to obtain sequentially ordered Y-channel data; the output image color space conversion circuit reads the sequentially ordered Y-channel data and the interpolated feature maps, and converts them into RGB format data for output.
2. The full on-chip dynamic reconfigurable super-resolution device according to claim 1, wherein the mapping strategy comprises convolution compression, convolution decomposition and PE remapping processes, wherein the convolution compression compresses the weight data and the Y-channel data according to a scaling factor, and removes a 0 value and the weight data corresponding to the 0 value in the Y-channel data; the convolution decomposition decomposes the compressed data into convolutions of different lengths; PE remapping combines convolutions of different lengths into convolutions of fixed length and inputs them to a convolution computation block.
3. The full on-chip dynamic reconfigurable super-resolution device according to claim 1, wherein when the scaling factor is 2, the processing procedures of the data redistribution circuit and the convolution calculation block are as follows:
1) Convolution compression:
compressing the input Y-channel data and the weight data to obtain [formula image 001] convolutions of size 5 × 5, [formula image 002] convolutions of size 5 × 4, [formula image 001] convolutions of size 4 × 5 and [formula image 001] convolutions of size 4 × 4, wherein m and n are positive integers;
2) Convolution decomposition:
decomposing the [formula image 001] convolutions of size 5 × 5 into [formula image 003] convolutions of length 9 and [formula image 002] convolutions of length 7;
decomposing the [formula image 001] convolutions of size 5 × 4 into [formula image 003] convolutions of length 9 and [formula image 002] convolutions of length 2;
decomposing the [formula image 001] convolutions of size 4 × 5 into [formula image 003] convolutions of length 9 and [formula image 002] convolutions of length 2;
decomposing the [formula image 001] convolutions of size 4 × 4 into [formula image 001] convolutions of length 9 and [formula image 002] convolutions of length 7;
3) PE remapping:
combining the convolutions of length 7 with the convolutions of length 2 to obtain [formula image 003] convolutions of length 9, and then inputting them, together with the remaining [formula image 004] convolutions of length 9, into the convolution calculation block;
4) Deconvolution operation:
the input mn convolutions are sent into the convolution calculation block for convolution calculation to obtain mn convolution operation results.
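The segment lengths recited in claim 3 are consistent with flattening each compressed convolution and cutting it into pieces of at most 9 taps. The sketch below shows that arithmetic only; the function name `decompose` and the parameter `pe_len` are hypothetical, and the rule is an assumption inferred from the lengths in claim 3 rather than a description of the circuit.

```python
def decompose(rows, cols, pe_len=9):
    """Split a rows x cols convolution into segment lengths of at most pe_len.

    The taps are treated as one flat sequence: as many full segments of
    pe_len as fit, followed by one shorter remainder segment if needed.
    """
    full, rest = divmod(rows * cols, pe_len)
    return [pe_len] * full + ([rest] if rest else [])


for size in ((5, 5), (5, 4), (4, 5), (4, 4)):
    print(size, decompose(*size))
```

The remainders are 7 and 2, which together fill exactly one length-9 slot; this is the pairing that the PE remapping step of claim 3 relies on.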
4. The full on-chip dynamic reconfigurable super-resolution device according to claim 1, wherein when the scaling factor is 3, the processing procedures of the data redistribution circuit and the convolution calculation block are as follows:
1) Convolution compression:
compressing the input Y-channel data and the weight data to obtain mn convolutions of size 3 × 3;
2) Convolution decomposition:
decomposing the mn convolutions of size 3 × 3 into mn convolutions of length 9;
3) PE remapping:
inputting the mn convolutions of length 9 into the convolution calculation block;
4) Deconvolution operation:
the input mn convolutions are sent into the convolution calculation block for convolution calculation to obtain mn convolution operation results.
5. The full on-chip dynamic reconfigurable super-resolution device according to claim 1, wherein when the scaling factor is 4, the processing procedures of the data redistribution circuit and the convolution calculation block are as follows:
1) Convolution compression:
compressing the input Y-channel data and the weight data to obtain [formula image 001] convolutions of size 3 × 3, [formula image 005] convolutions of size 3 × 2, [formula image 006] convolutions of size 2 × 3 and mn convolutions of size 2 × 2;
2) Convolution decomposition:
decomposing the [formula image 001] convolutions of size 3 × 3 into [formula image 001] convolutions of length 9;
decomposing the [formula image 005] convolutions of size 3 × 2 into [formula image 007] convolutions of length 3;
decomposing the [formula image 008] convolutions of size 2 × 3 into [formula image 009] convolutions of length 3 and [formula image 007] convolutions of length 2;
decomposing the mn convolutions of size 2 × 2 into [formula image 010] convolutions of length 4 and [formula image 011] convolutions of length 2;
3) PE remapping:
combining the convolutions of length 4, the convolutions of length 3 and the convolutions of length 2 to obtain [formula image 010] convolutions of length 9, and then inputting them, together with the remaining [formula image 001] convolutions of length 9, into the convolution calculation block;
4) Deconvolution operation:
the input mn convolutions are sent into the convolution calculation block for convolution calculation to obtain mn convolution operation results.
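The PE remapping step, in which short segments are combined into fixed length-9 convolutions, can be modelled as grouping segments according to a pattern of lengths that sums to 9, such as (4, 3, 2) for scaling factor 4 or (7, 2) for scaling factor 2. The sketch below is a software illustration only; the function name `remap` and its return format are assumptions and do not describe how the data redistribution circuit is built.

```python
from collections import Counter


def remap(segment_lengths, pattern, pe_len=9):
    """Group short segments into full PE slots following `pattern`.

    Returns the number of complete groups that can be formed and the
    sorted list of segment lengths that are left over.
    """
    if sum(pattern) != pe_len:
        raise ValueError("pattern must fill exactly one PE slot")
    available = Counter(segment_lengths)
    needed = Counter(pattern)
    # The scarcest segment length limits how many groups can be formed.
    groups = min(available[length] // count for length, count in needed.items())
    for length, count in needed.items():
        available[length] -= groups * count
    return groups, sorted(available.elements())


print(remap([4, 3, 2, 4, 3, 2], (4, 3, 2)))
print(remap([7, 2, 7, 2, 2], (7, 2)))
```

A fixed pattern is used here instead of a general bin-packing search because the claims name the specific length combinations that are merged.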
6. The full on-chip dynamic reconfigurable super-resolution device according to claim 1, wherein the convolution calculation block comprises m × n dynamic reconfigurable PE computing units; the dynamic reconfigurable PE computing unit comprises 1st to 9th pixel points, 1st to 9th weight data points, 1st to 9th multipliers, 1st to 8th adders, a first data selector and a second data selector; the product of the 1st pixel point A1 and the 1st weight data point W1 and the product of the 2nd pixel point A2 and the 2nd weight data point W2 are added in the 1st adder to obtain 1st data; the product of the 3rd pixel point A3 and the 3rd weight data point W3 and the product of the 4th pixel point A4 and the 4th weight data point W4 are added in the 3rd adder to obtain 2nd data; the 1st data and the 2nd data are added in the 2nd adder to obtain 3rd data; the 3rd data is input into the input end of the first data selector, one output end of the first data selector is connected with a first input end of the 4th adder, and the other output end serves as a first output of the dynamic reconfigurable PE computing unit; the product of the 6th pixel point A6 and the 6th weight data point W6 and the product of the 7th pixel point A7 and the 7th weight data point W7 are added in the 6th adder to obtain 4th data; the product of the 5th pixel point A5 and the 5th weight data point W5 and the 4th data are added in the 5th adder to obtain 5th data; the 5th data and the data output by the first data selector are added in the 4th adder to obtain 6th data; the product of the 8th pixel point A8 and the 8th weight data point W8 and the product of the 9th pixel point A9 and the 9th weight data point W9 are added in the 8th adder to obtain 7th data; the 7th data is input into the input end of the second data selector, and the data at one output end of the second data selector and the 6th data are added in the 7th adder to obtain a second output of the dynamic reconfigurable PE computing unit; and the data at the other output end of the second data selector serves as a third output of the dynamic reconfigurable PE computing unit.
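The data path of the dynamic reconfigurable PE computing unit in claim 6 can be followed more easily with a behavioural model. The sketch below is such a model under two assumptions that the claim does not state: the output branch of a data selector that is not chosen carries 0, and the selectors are driven by two boolean controls, here given the hypothetical names `sel1_to_adder4` and `sel2_to_adder7`.

```python
def pe_unit(pixels, weights, sel1_to_adder4, sel2_to_adder7):
    """Behavioural model of one PE: 9 multipliers, 8 adders, 2 data selectors.

    Returns (first output, second output, third output).
    """
    if len(pixels) != 9 or len(weights) != 9:
        raise ValueError("expected 9 pixel points and 9 weight data points")
    p = [a * w for a, w in zip(pixels, weights)]  # 1st to 9th multipliers

    d1 = p[0] + p[1]  # 1st adder
    d2 = p[2] + p[3]  # 3rd adder
    d3 = d1 + d2      # 2nd adder
    d4 = p[5] + p[6]  # 6th adder
    d5 = p[4] + d4    # 5th adder
    d7 = p[7] + p[8]  # 8th adder

    # Assumption: the branch of a selector that is not chosen carries 0.
    to_adder4, out1 = (d3, 0) if sel1_to_adder4 else (0, d3)
    d6 = d5 + to_adder4    # 4th adder
    to_adder7, out3 = (d7, 0) if sel2_to_adder7 else (0, d7)
    out2 = d6 + to_adder7  # 7th adder
    return out1, out2, out3


a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
w = [1] * 9
print(pe_unit(a, w, True, True))    # one length-9 convolution: (0, 45, 0)
print(pe_unit(a, w, True, False))   # lengths 7 and 2: (0, 28, 17)
print(pe_unit(a, w, False, False))  # lengths 4, 3 and 2: (10, 18, 17)
```

With both selectors routed to the adders the unit computes a single length-9 convolution; the other two settings split the same nine multipliers into the 7 + 2 and 4 + 3 + 2 groupings used in the PE remapping steps of claims 3 and 5.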
CN202210512559.7A 2022-05-12 2022-05-12 Full-on-chip dynamic reconfigurable super-resolution device Active CN114612309B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210512559.7A CN114612309B (en) 2022-05-12 2022-05-12 Full-on-chip dynamic reconfigurable super-resolution device

Publications (2)

Publication Number Publication Date
CN114612309A true CN114612309A (en) 2022-06-10
CN114612309B CN114612309B (en) 2022-10-14

Family

ID=81870355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210512559.7A Active CN114612309B (en) 2022-05-12 2022-05-12 Full-on-chip dynamic reconfigurable super-resolution device

Country Status (1)

Country Link
CN (1) CN114612309B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant