WO2019143024A1 - Super-resolution method and apparatus using line-unit operations - Google Patents
Super-resolution method and apparatus using line-unit operations
- Publication number
- WO2019143024A1 (PCT/KR2018/015733, KR2018015733W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image processing
- feature map
- convolution
- image
- line
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/01—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
Definitions
- the embodiments described below relate to an image processing method and apparatus using line unit operations.
- Ultra High Definition (UHD) video is becoming popular in UHD TV, Internet Protocol Television (IPTV) services, and smartphone applications. While many high-end TVs and smartphones support 4K UHD video, many video streams still have full high definition (FHD) resolution (1920x1080) due to legacy acquisition devices and services.
- the upscaling method is classified into two types.
- a single image upscaling method utilizes local spatial correlation within an LR image to recover lost high frequency detail.
- video upscaling methods use additional data dimensions (time) to improve performance and are expensive to compute.
- a single image up-scaling algorithm can be divided into an interpolation method and a super resolution (SR) method.
- the interpolation method uses a simple interpolation kernel such as a bilinear or bicubic kernel.
- the SR method showed better performance than the previous interpolation method.
- the basic concept of a learning-based approach is learning the mapping function from LR to HR image or video. Learning-based methods can be classified into two types.
- the first learns the LR-HR mapping using information internal to the LR image itself (based on the input image's internal information), and the second uses external LR-HR image pairs (based on an external training image set or dictionary).
- Machine learning algorithms such as sparse coding, anchored neighbors and linear mapping kernels have been proposed for SR.
- Machine learning-based methods find hand-crafted features by design and use these features to learn the mapping, whereas deep neural networks learn both the best features and the mapping themselves, simply and effectively.
- This CNN structure includes a number of layers and nonlinear functions, and is designed to perform SR and generate HR images or high quality video.
- Due to excessive multiplications and other computations, conventional CNNs are known to be difficult to implement in low-complexity hardware for real-time applications. Moreover, the analysis of computational complexity and execution time for sophisticated CNN-based SR methods has been performed at the software (SW) level on Central Processing Unit (CPU) and/or Graphics Processing Unit (GPU) platforms. In addition, when a CNN structure is implemented in SW and HW, real-time operation is difficult because a plurality of frame buffers must be used to store intermediate feature maps.
- Embodiments can provide techniques for processing images using line-by-line operations.
- an image processing apparatus includes: a receiver for receiving an image; at least one first line buffer for outputting the image line by line as image lines; a first convolution operator for generating a feature map by performing a convolution operation based on an image line; and a feature map processor for storing the feature map in units of at least one line and outputting the feature map stored in units of at least one line in a two-dimensional form.
- the first convolution operator may be implemented in a residual block for learning and outputting a residual signal.
- the first convolution operator may include at least one 1-D convolution operator for performing a 1-D (1-dimensional) convolution operation.
- the first convolution operator may include a depth-wise convolution operator and a point-wise convolution operator directly connected to the depth-wise convolution operator.
- the feature map processor may include a compressor for compressing the feature map in at least one line unit.
- the feature map processor may further include at least one second line buffer for storing feature maps compressed on a line-by-line basis.
- the feature map processor may further include a decompressor for reconstructing the feature map compressed in units of lines into a two-dimensional feature map.
- the image processing apparatus may further include a second convolution operator for performing a convolution operation based on the feature map output in a two-dimensional form.
- the second convolution operator may include at least one 2-D convolution operator for performing a 2-D (2-dimensional) convolution operation.
- the second convolution operator may include a depth-wise convolution operator and a point-wise convolution operator directly connected to the depth-wise convolution operator.
- the image processing apparatus may further include a quantizer that quantizes the result of at least one convolution operation.
- the image processing apparatus may further include a weight buffer for storing parameters used for the convolution operation.
- An image processing method includes the steps of: receiving an image; outputting the image as image lines in units of lines through at least one first line buffer; generating a feature map by performing a first convolution operation based on an image line; and processing the feature map by storing it in units of at least one line and outputting the feature map stored in units of at least one line in a two-dimensional form.
- the first convolution operation may be performed in a residual block for learning and outputting a residual signal.
- the generating may comprise performing at least one 1-D (1-dimensional) convolution operation.
- the generating step may include performing a depth-wise convolution operation and directly performing a point-wise convolution operation on the result of the depth-wise convolution operation.
- the processing may include compressing the feature map in units of at least one line.
- the processing may further include storing a feature map compressed in at least one line unit.
- the processing may further include restoring the feature map compressed in the at least one line unit into a two-dimensional feature map.
- the image processing method may further include performing a second convolution operation based on the feature map output in a two-dimensional form.
- the performing may comprise performing at least one 2-D (2-dimensional) convolution operation.
- the performing may include performing a depth-wise convolution operation and directly performing a point-wise convolution operation on the result of the depth-wise convolution operation.
- the image processing method may further include the step of quantizing the result of at least one convolution operation.
- the image processing method may further include storing a parameter used for the convolution operation.
- Figure 1 shows a schematic block diagram of an image processing apparatus according to one embodiment.
- Fig. 2 shows a schematic block diagram of the controller shown in Fig. 1.
- Fig. 3 shows a schematic block diagram of the first convolution operator shown in Fig. 2.
- Fig. 4A shows an example of a conventional depth-wise separable convolution.
- Fig. 4B shows an example of the operation of the first convolution operator shown in Fig. 2.
- Fig. 4C shows another example of the operation of the first convolution operator shown in Fig. 2.
- Fig. 5 shows a schematic block diagram of the feature map processor shown in Fig. 2.
- Fig. 6A shows an example of the operation of the feature map processor shown in Fig. 2.
- Fig. 6B shows an example of a compression algorithm of the compressor shown in Fig. 5.
- Fig. 7 shows a schematic block diagram of the second convolution operator shown in Fig. 2.
- Fig. 8 shows an example of the hardware structure of the image processing apparatus shown in Fig. 1.
- Fig. 9 shows an example of a neural network structure used by the image processing apparatus shown in Fig. 1.
- Fig. 10 shows an example of a framework for verifying the image processing apparatus shown in Fig. 1.
- Fig. 11A shows an example of the performance of the image processing apparatus shown in Fig. 1.
- Fig. 11B shows another example of the performance of the image processing apparatus shown in Fig. 1.
- Fig. 11C shows another example of the performance of the image processing apparatus shown in Fig. 1.
- 12A shows an example of an original high resolution image.
- 12B shows an example of an image processed through a bicubic method.
- 12C shows an example of an image processed through the SRCNN method.
- 12D shows an example of an image processed through the SRCNN-Ex method.
- 12E shows an example of an image processed through the FSRCNN method.
- 12F shows an example of an image processed through the FSRCNN-s method.
- 12G shows an example of an image processed through the VDSR method.
- 12H shows an example of an image processed by applying the quantized weights to the image processing apparatus shown in Fig. 1.
- 12I shows an example of an image processed by applying the quantized weights and activations to the image processing apparatus shown in Fig. 1.
- 12J shows an example of an image processed by applying the quantized weights and activations and intermediate feature map compression to the image processing apparatus shown in Fig. 1.
- 13A shows another example of the original high resolution image.
- 13B shows another example of an image processed through a bicubic method.
- 13C shows another example of an image processed through the SRCNN method.
- 13D shows another example of the image processed through the SRCNN-Ex method.
- 13E shows another example of an image processed through the FSRCNN method.
- 13F shows another example of an image processed through the FSRCNN-s method.
- 13G shows another example of the image processed through the VDSR method.
- 13H shows another example of an image processed by applying the quantized weights to the image processing apparatus shown in Fig. 1.
- 13I shows another example of an image processed by applying the quantized weights and activations to the image processing apparatus shown in Fig. 1.
- 13J shows another example of an image processed by applying the quantized weights and activations and intermediate feature map compression to the image processing apparatus shown in Fig. 1.
- SR hardware is implemented as an FPGA.
- terms such as first, second, and the like may be used to describe various elements, but the elements should not be limited by those terms.
- the terms are used only to distinguish one element from another; for example, without departing from the scope of rights according to the concepts of the embodiments, a first element may be referred to as a second element, and similarly a second element may be referred to as a first element.
- a module in this specification may mean hardware capable of performing the functions and operations according to the respective names described in this specification, computer program code capable of performing specific functions and operations, or an electronic recording medium, e.g., a processor or a microprocessor, equipped with computer program code capable of performing specific functions and operations.
- a module may mean a functional and / or structural combination of hardware for carrying out the technical idea of the present invention and / or software for driving the hardware.
- Figure 1 shows a schematic block diagram of an image processing apparatus according to one embodiment.
- the image processing apparatus 10 may receive an image and process the received image.
- the image processing apparatus 10 can process a received image to generate a high-resolution image.
- the image processing apparatus 10 can efficiently process images in low specification hardware.
- the image processing apparatus 10 can learn the neural network based on the received image.
- the image processing apparatus 10 can increase the resolution of the image using the learned neural network.
- the image processing apparatus 10 may perform a super-resolution on a low-resolution image.
- the image processing apparatus 10 can provide a hardware-friendly Convolutional Neural Network (CNN) based Super Resolution (SR) method.
- the image processing apparatus 10 may be implemented on an FPGA to convert a 2K full high definition (FHD) image to a 4K UHD (Ultra High Definition) at 60 fps.
- the image processing apparatus 10 can effectively perform the SR using the neural network in hardware with limited computation and memory space.
- the image processing apparatus 10 can process the Low Resolution (LR) input line by line while keeping the number of convolution filter parameters small. That is, by processing LR data in line units, the image processing apparatus 10 can significantly reduce the number of filter parameters compared to a conventional CNN.
- the image processing apparatus 10 can process the SR image using a cascade of 1-D (1-Dimensional) convolution.
- the image processing apparatus 10 can save the required line memory by keeping a large receptive field along the horizontal direction while keeping the vertical receptive field small.
- the line memory may include a line buffer.
- the image processing apparatus 10 can combine residual connections with depth-wise separable convolution to reduce the number of filter parameters of the neural network while minimizing the loss of SR performance.
- the image processing apparatus 10 can convert 32-bit floating-point data into fixed-point data through quantization without substantially degrading the Peak Signal to Noise Ratio (PSNR).
- the image processing apparatus 10 can compress the feature map to reduce the line memory required to store the feature map data.
- the image processing apparatus 10 includes a receiver 100 and a controller 200.
- the receiver 100 may receive the image.
- the image received by the receiver 100 may mean an image of an object formed by refraction or reflection of light.
- an image may include video, pictures, and photographs.
- Receiver 100 may receive the image in the form of pixel information.
- the receiver 100 may receive two-dimensional pixel information.
- the image may include a low resolution image and a high resolution image.
- the controller 200 can process the received image.
- the controller 200 can increase the resolution of the received image.
- the controller 200 may perform super resolution on the received image.
- the controller 200 can process an image line by line.
- the controller 200 may output an image in units of lines and increase the resolution of the image by performing operations on a line-by-line basis.
- Figure 2 shows a schematic block diagram of the controller shown in Figure 1;
- the controller 200 may include a first line buffer 210, a first convolution operator 220, and a feature map processor 230.
- the controller 200 may include a second convolution operator 240, a quantizer 250, and a weight buffer 260.
- the first line buffer 210 can output an image as an image line in units of lines.
- the first line buffer 210 may include at least one line buffer. At this time, the number of line buffers can be determined according to the size of the convolution filter.
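As an illustrative sketch (not the patent's hardware implementation), the role of a line buffer that retains the most recent K rows of a streamed image, so that a K-tall convolution window can be formed as each new row arrives, can be expressed as follows; the class and method names are assumptions:

```python
from collections import deque

class LineBuffer:
    """Keeps the last k rows of a row-streamed image (illustrative sketch)."""
    def __init__(self, k):
        self.k = k
        self.rows = deque(maxlen=k)  # oldest row is dropped automatically

    def push(self, row):
        """Feed one image row; return a k-row window once enough rows arrived."""
        self.rows.append(list(row))
        if len(self.rows) == self.k:
            return [list(r) for r in self.rows]
        return None  # not enough rows buffered yet
```

For a 3-tall filter, the first two pushes return nothing; from the third row on, each push yields a complete 3-row window, which is why the number of line buffers tracks the vertical filter size.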
- the first convolution operator 220 may generate a feature map by performing a convolution operation based on the image line.
- the feature map processor 230 can process feature maps in units of at least one line and output feature maps in units of at least one line in a two-dimensional form.
- the second convolution operator 240 can perform a convolution operation based on the feature map output in a two-dimensional form.
- the quantizer 250 may quantize the result of at least one convolution operation.
- the quantizer 250 may quantize the convolution result and filter parameters using a variety of quantization methods.
- the quantization method of the quantizer 250 may include any quantization algorithm that can convert a floating point to a fixed point, including uniform quantization and non-uniform quantization.
- the quantizer 250 may quantize at least one convolution result and filter parameters using uniform quantization.
- the image processing apparatus 10 may use a fixed-point representation via the quantizer 250 to reduce complexity.
- the quantizer 250 may convert a floating-point to fixed-point data.
- fixed-point data can be defined by the format [IL, FL], where IL is the integer length and FL is the fraction length.
- the total word length (WL) used to represent a number is the sum of the integer bits and the fraction bits: WL = IL + FL.
- a fixed-point format [IL, FL] limits the precision to FL fraction bits and the representable range to [-2^(IL-1), 2^(IL-1) - 2^(-FL)].
- the quantizer 250 may use a round-to-nearest method when converting a floating-point number to fixed point; this rounding method can be expressed by Equation (1).
- the conversion from floating point to fixed point can be expressed as shown in Equation (2) below.
- the optimal WL, IL, and FL values, found through a number of experiments, may be applied to the image processing apparatus 10 to minimize PSNR degradation on the test set relative to the floating-point data (filter parameters and activation values).
- the drop due to quantization in the network used by the image processing apparatus 10 may be very small.
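A minimal sketch of the uniform float-to-fixed conversion described above (round to nearest, then saturate to the [IL, FL] representable range); the function name is an assumption, and Equations (1) and (2) themselves are not reproduced here:

```python
def to_fixed_point(x, il, fl):
    """Quantize x to the [IL, FL] fixed-point grid (illustrative sketch)."""
    step = 2.0 ** -fl              # smallest representable increment, 2^-FL
    lo = -(2.0 ** (il - 1))        # lower bound of the representable range
    hi = 2.0 ** (il - 1) - step    # upper bound: 2^(IL-1) - 2^-FL
    q = round(x / step) * step     # round to the nearest multiple of 2^-FL
    return min(max(q, lo), hi)     # saturate out-of-range values
```

For example, with [IL, FL] = [4, 2] the grid step is 0.25 and the range is [-8.0, 7.75], so out-of-range inputs saturate rather than wrap.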
- the weight buffer 260 may store parameters used in the convolution operation.
- the above-described components can operate in a pipelined fashion.
- at least one convolution operation may operate in a pipelined fashion.
- Fig. 3 shows a schematic block diagram of the first convolution operator shown in Fig. 2.
- the first convolution operator 220 may be implemented in a residual block for learning and outputting a residual signal.
- the first convolution operator 220 may include at least one 1-D convolution operator for performing a 1-dimensional (1-dimensional) convolution operation.
- the 1-D convolution operation may be a convolution operation in which data on a line-by-line basis is input.
- a 1-D convolution operation may mean 1 x n convolution.
- the filter length n may have any integer value of 2 or more.
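The 1 x n horizontal convolution described above can be sketched as follows, assuming zero padding so the output line keeps the input length; the function and parameter names are illustrative:

```python
def conv1d_line(line, taps):
    """1 x n horizontal convolution over one image line (illustrative sketch)."""
    n = len(taps)
    pad = n // 2
    # zero-pad both ends so the output has the same length as the input line
    padded = [0.0] * pad + list(line) + [0.0] * pad
    return [sum(taps[k] * padded[i + k] for k in range(n))
            for i in range(len(line))]
```

Because the filter extends only horizontally, a single buffered line suffices, which is the property the line-unit architecture exploits.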
- the first convolution operator 220 may include a depth-wise (DW) convolution operator 221 and a point-wise (PW) convolution operator 223 directly connected to the depth-wise convolution operator 221.
- the depth-wise convolution operator 221 can perform a convolution operation in the depth direction of the feature map.
- the depth-wise convolution operator 221 can perform at least one depthwise convolution operation.
- the point-wise convolution operator 223 may perform a convolution operation on a point-by-point basis.
- the point-wise convolution operator 223 may perform at least one point-wise convolution operation.
- the depth-wise convolution operator 221 may include at least one 1-D convolution operator.
- the point-wise convolution operator 223 may perform 1x1 convolution.
- FIG. 4A shows an example of a conventional depth-wise separable convolution.
- FIG. 4B shows an example of the operation of the first convolution operator shown in FIG. 2.
- FIG. 4C shows another example of the operation of the first convolution operator shown in FIG. 2.
- the first convolution operator 220 may perform a convolution operation using depth-wise separable convolution (DSC). Accordingly, the first convolution operator 220 can achieve similar classification performance with only about 1/9 of the number of parameters compared with an existing non-separable convolution.
- the DSC may include a cascaded depthwise convolution operation, a rectified linear unit (ReLU), and a pointwise convolution operation.
- DSC can exhibit poor performance when used in regression problems such as SR.
- Batch normalization (BN) can degrade performance in regression analysis and can require relatively high computational complexity to calculate the mean and variance.
- the first convolution operator 220 can use a structure in which BN and ReLU are removed from the DSC.
- the first convolution operator 220 removes the ReLU operation between the depth-wise convolution operator 221 and the point-wise convolution operator 223, connecting them directly.
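A pure-Python sketch of the layer structure described above: a depth-wise 1-D convolution applied per channel, followed directly by a point-wise (1x1) convolution that mixes channels, with no ReLU between them. All names and the zero-padding choice are assumptions, not the patent's implementation:

```python
def depthwise_1d(feature, dw_taps):
    """Per-channel 1-D horizontal convolution. feature: list of channel lines."""
    out = []
    for ch, taps in zip(feature, dw_taps):
        n = len(taps)
        pad = n // 2
        padded = [0.0] * pad + list(ch) + [0.0] * pad
        out.append([sum(t * padded[i + k] for k, t in enumerate(taps))
                    for i in range(len(ch))])
    return out

def pointwise(feature, pw_weights):
    """1x1 convolution: pw_weights[o][c] mixes input channel c into output o."""
    length = len(feature[0])
    return [[sum(w[c] * feature[c][i] for c in range(len(feature)))
             for i in range(length)]
            for w in pw_weights]

def dsc_no_relu(feature, dw_taps, pw_weights):
    """Depth-wise conv directly followed by point-wise conv (no ReLU between)."""
    return pointwise(depthwise_1d(feature, dw_taps), pw_weights)
```

The omitted ReLU between the two stages is exactly what distinguishes this layer from the conventional DSC of Fig. 4A.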
- FIG. 4A shows a conventional DSC structure
- FIG. 4B can show a structure of a convolution layer used by the first convolution operator 220.
- Table 1 shows the results of comparing the PSNR and SSIM performances with and without the ReLU between the depth-wise convolution operator 221 and the point-wise convolution operator 223 for the Set-5 data set.
- a 3 ⁇ 3 size filter can be used for depth-wise convolution.
- some display applications, such as a timing controller (T-Con), may not be able to use line memory excessively, which can limit the use of 3x3 filters in the network.
- a large receptive field using a 3x3 or larger filter may be needed to achieve high performance in deep learning.
- the image processing apparatus 10 can make the network more compact and suitable for the hardware in which the LR input data is streamed line by line.
- the image processing apparatus 10 may apply a 1-D horizontal convolution to the first convolution operator 220.
- the first convolution operator 220 may have a rectangular receptive field that is longer in the horizontal direction and shorter in the vertical direction. In this way, the image processing apparatus 10 can make the line memory required to store the intermediate feature maps as small as possible.
- the network may need to maintain a convolution filter that is as small as possible.
- a small number of filter parameters can worsen the learning of networks including DSC and 1-D horizontal convolution.
- the image processing apparatus 10 can significantly reduce the number of filters while having excellent SR performance by inserting a residual connection in the network.
- the image processing apparatus 10 may reduce the number of filters by implementing the first convolution operator 220 in the residual block.
- implementing residual connections with 2-D convolutions may require additional line memories to store the input of the residual connection, which is also needed at the end of the connection.
- Figure 4C may represent the final DSC structure with 1-D horizontal convolutions and residual connections.
- Figure 5 shows a schematic block diagram of the feature map processor shown in Figure 2;
- the feature map processor 230 may include a compressor 231, a second line buffer 233, and a decompressor 235.
- the compressor 231 can compress the feature map in units of at least one line.
- the compressor 231 may compress the feature map in at least one direction of the width, height, and depth directions of the feature map.
- the second line buffer 233 can store the compressed characteristic map on a line-by-line basis.
- the second line buffer 233 may include at least one line buffer. At this time, the number of line buffers can be determined according to the size of a convolution filter used for performing a convolution operation.
- the decompressor 235 can restore the feature map compressed in units of lines into a two-dimensional feature map.
- Fig. 6A shows an example of the operation of the feature map processor shown in Fig. 2.
- Fig. 6B shows an example of the compression algorithm of the compressor shown in Fig. 5.
- the compressor 231 may compress the feature map through various algorithms.
- the compression algorithm may include Fixed-Length Coding and Variable-Length Coding.
- the fixed length coding algorithm may include the DXT algorithm and may include block based algorithms such as JPEG and JPEG 2000.
- the fixed length coding algorithm can be preferred in terms of hardware complexity.
- variable length coding algorithm may include Huffman and Arithmetic Coding.
- a variable length coding algorithm can be used to increase the compression rate.
- the compressor 231 can compress the feature map using one of the above-described algorithms.
- the size of the receptive field can have a significant impact on performance.
- both the horizontal and vertical receptive fields can be important.
- the feature map data output from the previous convolution layer may have to be stored in line memory before being transmitted to the next 3x3 convolution layer.
- twice the number of line memories required to store the output feature maps of the current layer may be needed.
- the use of many line memories causes many problems in chip design.
- the main problems include the increase in chip size due to the power rings used around line memories, routing congestion in place & route (P&R), and the occurrence of a voltage drop when the power ring at memory block boundaries is insufficient.
- the compressor 231 can use the feature map compression method considering various points from the hardware implementation viewpoint.
- the feature map compressed by the compressor 231 may include an intermediate feature map.
- the compressor 231 can use a very simple compression algorithm. Since compression is considered to reduce the number of line memories, the size of the logic used in the compression method may have to be smaller than the memory size required to store the intermediate feature map before compression.
- the compressor 231 can provide an efficient compression algorithm for such data characteristics.
- the compressor 231 can compress the data using only the data adjacent in the horizontal direction in order to effectively use the line memory.
- the algorithm used by the compressor 231 may be a modification of the DXT5 algorithm adapted to the CNN structure.
- DXT5 can independently compress each RGB color channel of an input consisting, illustratively, of 4x4 pixel blocks.
- the maximum value (MAX) and minimum value (MIN) of each color channel can be calculated.
- Six intermediate points can be generated through interpolation using the maximum value (MAX) and the minimum value (MIN).
- MAX, MIN, and 6 intermediate points can be defined as reference colors for compression.
- each pixel may be assigned the index value of the reference color closest to its value.
- the encoding can be completed by storing the 4x4 block of index values together with the MAX and MIN values. Each pixel in the 4x4 block takes one of eight index values, so each index can be represented by 3 bits per pixel.
- the decoding can be easily performed using the MAX, MIN, and index values in the reverse order of the encoding process. If the bits per pixel (bpp) of the RGB input is 8 bits, DXT5 has a fixed compression ratio of 2:1 for a 4x4 block.
- the compression ratio (CR) can be calculated as shown in Equation (3).
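Equation (3) is not reproduced in this text, but the fixed 2:1 ratio stated above for an 8-bpp 4x4 block can be checked with a short sketch (illustrative function name): MAX and MIN each cost 8 bits, and each of the 16 pixels costs a 3-bit index.

```python
def dxt5_block_cr(block_pixels=16, bpp=8, index_bits=3):
    """Compression ratio for one DXT5 block of a single 8-bpp channel."""
    raw = block_pixels * bpp                             # 16 * 8 = 128 bits
    compressed = bpp + bpp + block_pixels * index_bits   # MAX + MIN + indices
    return raw / compressed                              # 128 / 64 = 2.0
```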
- the compressor 231 of the image processing apparatus 10 may provide a compression method that modifies the DXT5 algorithm to minimize image quality degradation and increase CR.
- the difference between conventional DXT5 and the image processing apparatus 10 is shown in Table 2.
- the compressor 231 can fix the minimum value to 0 and calculate only the maximum value. Fixing the minimum value to 0 exploits the characteristic that the intermediate feature map data is zero or close to zero.
- the image processing apparatus 10 can reduce the bits for storing the minimum value and eliminate the logic for calculating the minimum value by fixing the minimum value to zero. Since the intermediate feature map data must be processed line by line in hardware, the block size of the feature map data can be set to 1x32.
- a 5-bit index may be assigned to a quantization level for each piece of data in a 1x32 block of the feature map.
- One index per data element can be allocated to maintain image quality.
- the 5-bit index length can be determined experimentally by examining PSNR performance as a function of the bit length of the data point indices.
- the CR of the compressor 231 can be expressed by Equation (4).
- the compressor 231 can set the divisor used for calculating the intermediate points to 32 (a power of 2) instead of 31 (2^5 - 1) to reduce hardware complexity. Thereby, the compressor 231 can calculate the midpoints with shift and add operators.
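Putting the modifications together (MIN fixed at 0, 1x32 blocks, 5-bit indices, power-of-two divisor), the compressor's behaviour can be sketched as below. The function names are illustrative, and the exact rounding used by the hardware is an assumption; dividing by 32 rather than 31 is what lets hardware derive the levels with shifts and adds.

```python
import numpy as np

def encode_block(block, index_bits=5):
    """Compress one 1x32 feature-map block of non-negative data:
    MIN is fixed at 0, only MAX is stored, and each element gets a
    5-bit index onto 32 levels spaced MAX/32 apart (MAX >> 5 in
    hardware, since 32 is a power of two)."""
    levels = 1 << index_bits                    # 32
    vmax = int(block.max())
    if vmax == 0:                               # all-zero block
        return 0, np.zeros_like(block)
    step = vmax / levels
    idx = np.clip(np.round(block / step), 0, levels - 1).astype(np.uint8)
    return vmax, idx

def decode_block(vmax, idx, index_bits=5):
    """Reconstruct the block from the stored MAX and the indices."""
    return idx.astype(np.float32) * vmax / (1 << index_bits)
```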
- FIG. 7 shows a schematic block diagram of a second convolution operator shown in FIG. 2.
- the second convolution operator 240 may include a depth-wise convolution operator 241 and a point-wise convolution operator 243 directly connected to the depth-wise convolution operator 241.
- the second convolution operator 240 may include at least one 2-D convolution operator for performing a 2-D (2-dimensional) convolution operation.
- the depth-wise convolution operator 241 can perform the convolution operation in the depth direction of the feature map.
- the depth-wise convolution operator 241 may perform at least one depthwise convolution operation.
- the point-wise convolution operator 243 may perform a convolution operation on a point-by-point basis.
- the point-wise convolution operator 243 may perform at least one point-wise convolution operation.
- the 2-D convolution operation may be a convolution operation using two-dimensional data as input.
- a 2-D convolution operation may mean m x n convolution.
- m and n may each have an arbitrary integer value of 2 or more.
- 2-D convolution used by the image processing apparatus 10 may not be limited thereto.
- the depth-wise convolution operator 241 can perform the convolution operation in the depth direction of the feature map.
- the point-wise convolution operator 243 may perform a convolution operation on a point-by-point basis.
- the depth-wise convolution operator 241 may include at least one 2-D convolution operator.
- the point-wise convolution operator 243 may perform 1x1 convolution.
- the second convolution operator 240 may perform the convolution operation using the depthwise separable convolution.
- the description of the depthwise separable convolution is the same as that given for the first convolution operator 220 and is not repeated in detail.
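The depth-wise/point-wise split described above can be sketched as follows. This is a naive reference implementation for illustration only (function name and 'valid' padding are assumptions), not the apparatus's hardware datapath.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise separable convolution: a per-channel (depth-wise)
    kh x kw convolution followed directly by a 1x1 (point-wise)
    convolution that mixes channels. x has shape (C, H, W),
    dw_kernels (C, kh, kw), pw_weights (C_out, C); 'valid' padding."""
    C, H, W = x.shape
    kh, kw = dw_kernels.shape[1:]
    # depth-wise: each channel convolved with its own kernel
    dw = np.zeros((C, H - kh + 1, W - kw + 1))
    for c in range(C):
        for i in range(dw.shape[1]):
            for j in range(dw.shape[2]):
                dw[c, i, j] = np.sum(x[c, i:i+kh, j:j+kw] * dw_kernels[c])
    # point-wise: 1x1 convolution = channel mixing at each position
    return np.tensordot(pw_weights, dw, axes=([1], [0]))  # (C_out, H', W')
```

Compared with a full C_out x C x kh x kw convolution, the parameter count drops to C x kh x kw + C_out x C, which is the source of the filter-parameter savings discussed elsewhere in this document.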
- Fig. 8 shows an example of the hardware structure of the image processing apparatus shown in Fig. 1.
- the image processing apparatus 10 can process a low-resolution image to generate a high-resolution image.
- the image processing apparatus 10 may generate a 4K UHD image from the FHD image.
- the example of FIG. 8 may represent a pipeline hardware architecture for SR.
- the example of FIG. 8 may be of two types: Type-1, with no compression of the intermediate feature map, and Type-2, with compression.
- the image processing apparatus 10 includes a first pixel information converter, a first line buffer 210, a data aligner, a depth-wise convolution operator 221, a point-wise convolution operator 223, a compressor 231, a second line buffer 233, a decompressor 235, a depth-wise convolution operator 241, a point-wise convolution operator 243, a quantizer 250, a weight buffer 260, a second pixel information converter, and a third line buffer.
- the image received by the first pixel information converter may comprise color data.
- the color data may include RGB channel data and YCbCr channel data.
- the first pixel information converter may convert the first color data into the second color data.
- the first color data may include RGB channels.
- the second color data may include a YCbCr channel.
- the first pixel information converter may convert the RGB channel of the LR input image into a YCbCr channel.
- the first line buffer 210 may illustratively include four line buffers.
- the depth-wise convolution operator 221 can perform a 1x5 convolution operation.
- the second line buffer 233 may include an even line buffer and an odd line buffer.
- the depth-wise convolution operator 241 can perform a 3x3 convolution operation.
- the second pixel information converter may convert the second color data into the first color data.
- the second pixel information converter may convert the YCbCr channel to an RGB channel.
- the weight buffer 260 may store parameters (or filter parameters) used in the convolution operation.
- the weight buffer 260 may update the parameters received from the convolution operators.
- the third line buffer may include a plurality of line buffers.
- the third line buffer may include four output line buffers.
- the outputs of all convolution operators may be quantized through the quantizer 250. All of the weighting parameters may also be quantized through the quantizer 250.
- the quantizer 250 can convert a 32-bit floating point to a 10-bit fixed point.
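The float-to-fixed conversion above can be sketched as follows. This is an assumption-laden illustration: it takes the fractional length as FL = WL - IL with a two's-complement (saturating) range, and the patent's exact bit allocation and sign handling may differ.

```python
def quantize(x, wl=10, il=2):
    """Quantize a floating-point value to signed fixed point with
    word length WL and integer length IL (fractional length
    FL = WL - IL), saturating at the representable range."""
    fl = wl - il                       # fractional bits
    scale = 1 << fl
    qmax = (1 << (wl - 1)) - 1         # largest signed code
    qmin = -(1 << (wl - 1))            # smallest signed code
    q = max(qmin, min(qmax, round(x * scale)))
    return q / scale                   # dequantized value
```

With the WL = 10, IL = 2 setting mentioned above, values are resolved to steps of 1/256 and saturate near plus or minus 2.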
- the weight buffer 260 may store the quantized weight parameters.
- the arrows may represent data paths. That is, the arrow may represent a datapath according to type-1 and a datapath according to type-2.
- the image processing apparatus 10 can operate in a pipeline structure.
- the pipeline may refer to a structure in which the output of the processing step of one data is connected to the input of the next step.
- the processing step of the connected data can be performed in several steps simultaneously or in parallel.
- the components included in the image processing apparatus 10 may operate concurrently or in parallel to process images.
- at least one convolution operation of the image processing apparatus may operate in a pipeline form.
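The line-by-line idea behind this pipeline can be illustrated with a small streaming sketch: rows arrive one at a time, only a few are buffered, and k x k patches for a 2-D convolution are emitted without ever storing a whole frame. The function name and row-list format are illustrative, not the apparatus's actual interface.

```python
from collections import deque

def stream_rows_to_patches(rows, k=3):
    """Line-buffer behaviour: keep only the last k rows in a deque;
    once k rows are present, emit all k x k patches for the current
    output line. In hardware the stages run concurrently; here they
    run sequentially for clarity."""
    buf = deque(maxlen=k)
    for row in rows:
        buf.append(row)                # oldest row is dropped automatically
        if len(buf) == k:
            w = len(row)
            # one list of k x k patches per fully buffered output line
            yield [[buf[i][j:j+k] for i in range(k)]
                   for j in range(w - k + 1)]
```

Memory use is k rows regardless of image height, which is why compressing the buffered intermediate feature maps (Type-2 above) directly shrinks the line memories.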
- Convolution operators may load the convolution filter parameters from the weight buffer 260.
- the first pixel information converter can then extract the YCbCr value from the RGB input stream.
- the first line buffer 210 may store four rows of the YCbCr LR input image for use in nearest-neighbor upscaling to obtain the interpolated image for the residual connection at the end of the network.
- the data aligner may rearrange the data of the four line buffers and input streams of the first line buffer 210 and generate 3x3 sized YCbCr LR patches.
- the Y channel of the LR patches can be fed into a 3x3 convolution layer.
- the feature map may pass through the ReLU activation function. Thereafter, the output of the ReLU function may pass through the first convolution operator 220.
- the first convolution operator 220 may generate a feature map (or an intermediate feature map).
- the compressor 231 compresses the intermediate feature map which has passed through the residual block and the ReLU, and the second line buffer 233 can store it.
- the decompressor 235 may read the data stored in the second line buffer 233 and decompress it at the timing of the data enable (DE) signal delayed by one line.
- the depth-wise convolution operator 241 performs a 3x3 convolution operation on the decompressed data.
- the point-wise convolution operator 243 performs a 1x1 convolution operation on the output of the depth-wise convolution operator 241.
- the number of feature map channels may be reduced by half, from 32 to 16.
- the feature map with the reduced channel can pass through the compressor 231, the second line buffer 233, and the decompressor 235 in sequence. Thereafter, the convolution operation may be performed again through the depth-wise convolution operator 241 and the point-wise convolution operator 243.
- the output after repeated convolution operations can be made up of four channels used to create a 2x2 HR patch in a manner similar to subpixel convolution.
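The mapping from four output channels to 2x2 HR patches can be sketched as a depth-to-space rearrangement (the pixel-shuffle operation used in subpixel convolution). This is an illustration only; the function name and the channel-to-position ordering are assumptions.

```python
import numpy as np

def depth_to_space_2x(fm):
    """Rearrange a 4-channel LR feature map (4, H, W) into one
    (2H, 2W) channel: the four channels form the four corners of
    each 2x2 HR patch."""
    c, h, w = fm.shape
    assert c == 4
    out = np.zeros((2 * h, 2 * w), dtype=fm.dtype)
    out[0::2, 0::2] = fm[0]   # top-left of each 2x2 patch
    out[0::2, 1::2] = fm[1]   # top-right
    out[1::2, 0::2] = fm[2]   # bottom-left
    out[1::2, 1::2] = fm[3]   # bottom-right
    return out
```

This keeps all convolutions at LR resolution and defers the 2x upscale to a pure data rearrangement, which suits the line-wise hardware dataflow.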
- the image processing apparatus 10 adds the 2x2 super-resolved Y data (Y_C) and the 2X nearest-neighbor-upsampled data (Y_N) to obtain the final output (Y_F).
- the Y_N data is stored in a FIFO (First-In-First-Out) buffer and can be read at the same timing as Y_C.
- the CbCr data delayed from the FIFO may also be upsampled twice based on the nearest neighbor interpolation and sent to the second pixel information converter to obtain RGB pixels.
- the two output buffers of the third line buffer can store the generated 2x2 RGB HR patches and can be sent to the display device every output clock cycle at output timing.
- Four line buffers can be used as the third line buffers to avoid read / write conflicts for 2x2 RGB HR patches using a double buffering scheme for stream processing.
- Fig. 9 shows an example of a neural network structure used by the image processing apparatus shown in Fig. 1.
- the image processing apparatus 10 can process images using a hardware-friendly CNN-based SR network.
- the image processing apparatus 10 can process the image using only a part of the color data.
- only the luminance signal (Y) channel can be input to the CNN network for processing.
- the performance of learning using only Y channel may be similar to the performance of learning using RGB channel.
- the number of parameters used for the RGB channels may be three times greater than the number used for the Y channel alone in the 2-D convolution of the first layer and the point-wise convolution of the last layer.
- the color difference signal (Cb, Cr) channel data can be upscaled using an interpolation method.
- the interpolation method may include bicubic interpolation and nearest neighbor interpolation.
- the image processing apparatus 10 can upscale using simple nearest-neighbor interpolation instead of bicubic interpolation for reduced hardware complexity and hardware efficiency. Also, to reduce complexity, the image processing apparatus 10 may learn the neural network using a residual learning technique.
- the image processing apparatus 10 can calculate the final HR image Y_F by adding the interpolated LR image Y_N and the output Y_C of the network. This can be expressed as Equation (5): Y_F = Y_N + Y_C.
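The residual combination of Equation (5) can be sketched as below, with the 2x nearest-neighbor upsample folded in. The function name is illustrative; the network producing Y_C is outside the sketch.

```python
import numpy as np

def final_hr(y_c, y_lr):
    """Equation (5) as a sketch: Y_F = Y_N + Y_C, where Y_N is the
    LR luminance upsampled 2x by nearest-neighbor interpolation and
    Y_C is the network's residual output at HR resolution."""
    y_n = np.repeat(np.repeat(y_lr, 2, axis=0), 2, axis=1)  # 2x NN upsample
    return y_n + y_c
```

Because the network only has to predict the residual Y_C on top of the cheap interpolation Y_N, the layers can stay small, which is the complexity argument made above.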
- the image processing apparatus 10 can use a combination of depth-wise separable convolutions, 1-D horizontal convolution, and a residual connection.
- the number of filter parameters may be about 21 times smaller than that of the conventional Super-Resolution Convolutional Neural Network (SRCNN)-Ex and about 4.5 times smaller than that of the Fast Super-Resolution Convolutional Neural Network (FSRCNN), while PSNR and Structural Similarity (SSIM) performance may be similar to SRCNN-Ex.
- the image processing apparatus 10 can perform a convolution operation through a 2-D convolution layer of two layers and a 1-D convolution layer.
- the 2-D convolution operation may be a 3x3 convolution operation
- the 1-D convolution operation may be a 1x5 convolution operation.
- the total receptive field size may be 7 x 15.
- Fig. 10 shows an example of a framework for verifying the image processing apparatus shown in Fig. 1.
- performance may be evaluated on popular data sets to compare the image processing apparatus 10 with bicubic interpolation and conventional CNN-based SR methods.
- the performance of the image processing apparatus 10 can be compared with software-based methods including SRCNN, SRCNN-Ex, FSRCNN, FSRCNN-s and VDSR (Very Deep Super-Resolution).
- the performance of the image processing apparatus 10 can be compared with other real-time SR hardware in view of the gate count and the operating frequency.
- a set of benchmark data that are available for learning and testing can be used.
- the SR network can be learned using 291 images consisting of 91 images from Yang et al. and 200 images from the Berkeley Segmentation Dataset.
- Test set-1 and test set-2 can be used for performance comparison.
- Test set-1 can consist of Set5, Set14, B100 and Urban100, which can often be used as SR benchmarks in many ways.
- Test Set-2 can consist of 8 4K UHD images and can be used for testing.
- PSNR and SSIM can be used as metrics for evaluation. Since SR is performed on the luminance channel of the YCbCr color space, the PSNR and SSIM can be computed using the Y channels of the reconstructed and original HR images.
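The Y-channel PSNR metric used above can be sketched as follows (an illustrative helper; the 255 peak value assumes 8-bit luminance):

```python
import numpy as np

def psnr_y(ref, test, peak=255.0):
    """PSNR on the luminance (Y) channel:
    10 * log10(peak^2 / MSE) between the original and reconstructed
    HR images; identical images give infinite PSNR."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```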
- the LR input image can be intentionally generated by downsampling the original HR image using bicubic interpolation with a scale factor of 2. Subimages of size 128 x 128 may be cropped randomly for learning.
- the LR-HR learning image pair can be augmented using rotation, reflection, and scaling.
- the weights may be initialized using a uniform distribution, and the bias may not be used to reduce the number of parameters.
- L1 loss can be used as a cost function instead of L2 loss.
- the proposed SR network can be learned using the Adam optimizer.
- the learning rate can be set to 0.0001 and reduced by a factor of 10 every 50 epochs.
- the size of the mini batch can be set to two.
- an NVIDIA Titan X GPU and a 3.4 GHz Intel Core i7-6700 CPU can be used for the learning and tests.
- the weight parameters of the SR network may be calculated in floating point during the learning phase and quantized from floating point to fixed point according to Equation 2 in the test phase.
- the compressed intermediate feature map and the final SR image in the algorithm step can be compared with the hardware simulation results designed using the golden model.
- FIG. 11A shows an example of the performance of the image processing apparatus shown in FIG. 1
- FIG. 11B shows another example of the performance of the image processing apparatus shown in FIG. 1
- weight parameters and activations may be quantized for hardware implementation. It may be important to find the appropriate quantization bit depth, since quantization of the weight parameters and activations greatly affects the quality of the output image. That is, appropriate values may be required for three parameters: the word length (WL), the integer length (IL), and the fractional length (FL). To do this, experiments can be performed by varying WL, IL, and FL over various data sets.
- 11A may represent a PSNR graph for WL and IL that quantize the weight parameter values for the Set5 data set.
- the PSNR performance of the SR network may be similar to the case with no weight or activation quantization.
- the WL bit depth can be set to 10 bits and the IL bit depth can be set to 2 bits, which can also be used for activation quantization and intermediate feature map compression.
- FIG. 11B shows the PSNR performance of the SR network over WL and IL bit depths for activation quantization. Based on the experimental results of FIG. 11B, the WL for activation quantization can be set to 14 bits, and the IL can be set to 2 bits.
- FIG. 11C can illustrate experimental results for a compression method applied to a quantized feature map to reduce line memory usage.
- Various block sizes and indices may be examined in relation to PSNR performance. The smaller the number of quantization levels of compression, the higher the compression ratio but the lower the PSNR performance.
- a block size of 32 and an index size of 5 bits can be selected as a compromise between the required line memory and the resulting PSNR.
- FIG. 12A shows an example of an original high-resolution image.
- FIG. 12B shows an example of an image processed through the bicubic method.
- FIG. 12C shows an example of an image processed through the SRCNN method.
- FIG. 12D shows an example of an image processed through the FSRCNN-s method.
- FIG. 12E shows an example of an image processed through the FSRCNN method.
- FIG. 12F shows an example of an image processed through the SRCNN-Ex method.
- FIG. 12G shows an example of an image processed through the VDSR method.
- FIG. 12H shows an example of an image processed by applying the quantized weights to the image processing apparatus shown in FIG. 1.
- FIG. 12I shows an example of an image processed by applying the quantized weights and activations to the image processing apparatus shown in FIG. 1.
- FIG. 12J shows an example of an image processed by additionally applying feature map compression to the image processing apparatus shown in FIG. 1.
- FIG. 13A shows another example of the original high-resolution image.
- FIG. 13B shows another example of the image processed through the bicubic method.
- FIG. 13C shows another example of the image processed through the SRCNN method.
- FIG. 13D shows another example of the image processed through the FSRCNN-s method.
- FIG. 13E shows another example of the image processed through the FSRCNN method.
- FIG. 13F shows another example of the image processed through the SRCNN-Ex method.
- FIG. 13G shows another example of the image processed through the VDSR method.
- FIG. 13H shows another example of the image processed by applying the quantized weights to the image processing apparatus shown in FIG. 1.
- FIG. 13I shows another example of the image processed by applying the quantized weights and activations to the image processing apparatus shown in FIG. 1.
- FIG. 13J shows another example of the image processed by additionally applying feature map compression to the image processing apparatus shown in FIG. 1.
- the performance of the image processing apparatus 10 can be compared with other CNN-based SR methods including bicubic and SRCNN, SRCNN-Ex, FSRCNN, and FSRCNN-s.
- Publicly available MATLAB (TM) source code for SRCNN, SRCNN-Ex, FSRCNN, and FSRCNN-s may be used and the image processing apparatus 10 may be implemented using PyTorch.
- the boundaries of the HR reconstruction and the original image may be excluded from the PSNR / SSIM calculation. All methods can be executed on the CPU platform.
- the execution time of the image processing apparatus 10 can be measured based on a software implementation using PyTorch.
- the image processing apparatus 10 may perform better than FSRCNN-s while using only 64% as many filter parameters as FSRCNN-s. Also, it can be confirmed that the performance of the image processing apparatus 10 does not deteriorate even after quantization of the weight parameter values and activations.
- Table 5 shows the performance comparison results of the image processing apparatus 10 and other CNN-based SR methods in terms of the average calculation time of the PSNR and SSIM of the test set 2 composed of the 4K UHD test image.
- the image processing apparatus 10 can restore a quality HR image comparable to other SR methods.
- SRCNN, SRCNN-Ex, FSRCNN, and FSRCNN-s may have relatively long run times because the public code is implemented in MATLAB and may not be optimized for the CPU platform.
- for a fair comparison, the network used by the image processing apparatus 10 can also be implemented in TensorFlow, the other codes can likewise be written in TensorFlow, and the execution times can be measured on the GPU platform.
- the execution time measured by the GPU for various CNN-based SR methods including the image processing apparatus 10 can be confirmed.
- the execution time of the image processing apparatus 10 on the GPU was measured to be about 50 ms, which is about three times faster than the FPGA implementation.
- FIGS. 12A to 12J can represent restored images and their cropped regions using five CNN-based SR methods, including bicubic interpolation, together with the image processing apparatus 10. It can be confirmed that the image processing apparatus 10 uses the smallest number of parameters, but the resulting HR image can still be perceived with sharp edges and few artifacts.
- FIGS. 13A to 13J may represent cropped regions of the reconstructed HR image for a 4K UHD resolution children image. It can be confirmed that the visual quality of the image processing apparatus 10 and the other CNN-based SR methods is similar.
- the SR hardware may be implemented on an FPGA.
- Table 6 shows the details of Lee, Yang, previous Super-Interpolation (SI) and implementation of the image processing apparatus 10.
- Lee et al. presented hardware using Lagrange interpolation with a sharpening algorithm to obtain a 4K UHD video stream from HD and FHD streams at 30 fps.
- Yang's HW architecture is based on Anchored Neighborhood Regression (ANR), which uses a dictionary to obtain intermediate images at a target resolution for generating high-frequency patches, and obtains FHD at 60 fps.
- the machine learning based SI HW architecture can be based on linear mapping using edge orientation analysis, which directly restores the HR image through high frequency reconstruction without the need for intermediate images.
- the image processing apparatus 10 may be implemented using SystemVerilog for the FPGA.
- the output clock speed of the image processing apparatus 10 may be four times the input clock speed. This may be because the ratio of FHD operating frequencies above 4K UHD is usually 1/4.
- the image processing apparatus 10 may process 4 pixels per clock cycle to support a 4K UHD video stream at 60 fps, and may be implemented according to the constraints applied at a 150 MHz target operating frequency during the synthesis and P&R steps of the Vivado Design Suite 2015.4.
- Xilinx Kintex UltraScale FPGA KCU105 evaluation board and TED's HDMI 2.0 expansion card can be used to support FHD input and 4K UHD output video interface to validate implemented SR hardware.
- the image processing apparatus 10 may be provided as two types of SR HW: type-1, with no feature map compression applied, and type-2, with feature map compression applied.
- Type-1 can use 110K slice LUTs and 102K slice registers, occupying 45.38% of the total slice LUTs and 21.08% of the total slice registers in the XCKU040 FPGA device.
- Type-2 can use 151K slice LUTs and 121K slice registers, corresponding to 62.6% of the total slice LUTs and 24.97% of the total slice registers.
- Type-1 and Type-2 can take full advantage of the 1,920 DSP blocks in the XCKU040 FPGA device on the KCU105 evaluation board.
- Type-2 can reduce on-chip memory usage (for example, block RAM on an FPGA) by about 50% compared to Type-1. On the other hand, Type-2 can use about 38% more slice LUTs and about 18% more slice registers than Type-1 to implement the two compressors 231 and six decompressors 235.
- the image processing apparatus 10 may require a larger number of line memories and gates as compared to the non-CNN based SR method, but it can recover a fairly high quality 4K UHD HR image in real time at 60 fps.
- the method according to an embodiment may be implemented in the form of a program command that can be executed through various computer means and recorded in a computer-readable medium.
- the computer readable medium may include program instructions, data files, data structures, and the like, alone or in combination.
- Program instructions to be recorded on the medium may be those specially designed and constructed for the embodiments or may be available to those skilled in the art of computer software.
- Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory.
- program instructions include machine language code such as those produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.
- the hardware device may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.
- the software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively.
- the software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or in a transmitted signal wave.
- the software may be distributed over a networked computer system and stored or executed in a distributed manner.
- the software and data may be stored on one or more computer readable recording media.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Signal Processing (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
Claims (24)
- An image processing apparatus comprising: a receiver configured to receive an image; at least one first line buffer configured to output the image as line-by-line image lines; a first convolution operator configured to generate a feature map by performing a convolution operation based on the image lines; and a feature map processor configured to store the feature map in at least one line unit and to output the feature map stored in the at least one line unit in a two-dimensional form.
- The image processing apparatus of claim 1, wherein the first convolution operator is implemented within a residual block configured to learn and output a residual signal.
- The image processing apparatus of claim 1, wherein the first convolution operator comprises at least one 1-D (1-dimensional) convolution operator configured to perform a 1-D convolution operation.
- The image processing apparatus of claim 1, wherein the first convolution operator comprises: a depth-wise convolution operator; and a point-wise convolution operator directly connected to the depth-wise convolution operator.
- The image processing apparatus of claim 1, wherein the feature map processor comprises a compressor configured to compress the feature map in at least one line unit.
- The image processing apparatus of claim 5, wherein the feature map processor further comprises at least one second line buffer configured to store the feature map compressed line by line.
- The image processing apparatus of claim 6, wherein the feature map processor further comprises a decompressor configured to restore the feature map compressed line by line into a two-dimensional feature map.
- The image processing apparatus of claim 1, further comprising a second convolution operator configured to perform a convolution operation based on the feature map output in the two-dimensional form.
- The image processing apparatus of claim 8, wherein the second convolution operator comprises at least one 2-D (2-dimensional) convolution operator configured to perform a 2-D convolution operation.
- The image processing apparatus of claim 8, wherein the second convolution operator comprises: a depth-wise convolution operator; and a point-wise convolution operator directly connected to the depth-wise convolution operator.
- The image processing apparatus of claim 1, further comprising a quantizer configured to quantize at least one convolution operation result.
- The image processing apparatus of claim 1, further comprising a weight buffer configured to store parameters used in the convolution operation.
- An image processing method comprising: receiving an image; outputting the image as line-by-line image lines through at least one first line buffer; generating a feature map by performing a first convolution operation based on the image lines; and processing the feature map by storing the feature map in at least one line unit and outputting the feature map stored in the at least one line unit in a two-dimensional form.
- The image processing method of claim 13, wherein the first convolution operation is performed within a residual block configured to learn and output a residual signal.
- The image processing method of claim 13, wherein the generating comprises performing at least one 1-D (1-dimensional) convolution operation.
- The image processing method of claim 13, wherein the generating comprises: performing a depth-wise convolution operation; and performing a point-wise convolution operation directly on a result of the depth-wise convolution operation.
- The image processing method of claim 12, wherein the processing comprises compressing the feature map in at least one line unit.
- The image processing method of claim 17, wherein the processing further comprises storing the feature map compressed in at least one line unit.
- The image processing method of claim 18, wherein the processing further comprises restoring the feature map compressed in the at least one line unit into a two-dimensional feature map.
- The image processing method of claim 13, further comprising performing a second convolution operation based on the feature map output in the two-dimensional form.
- The image processing method of claim 20, wherein the performing comprises performing at least one 2-D (2-dimensional) convolution operation.
- The image processing method of claim 20, wherein the performing comprises: performing a depth-wise convolution operation; and performing a point-wise convolution operation directly on a result of the depth-wise convolution operation.
- The image processing method of claim 13, further comprising quantizing at least one convolution operation result.
- The image processing method of claim 13, further comprising storing parameters used in the convolution operation.
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/961,688 US11412175B2 (en) | 2018-01-16 | 2018-12-12 | Super-resolution method and device using linewise operation |
AU2018357828A AU2018357828A1 (en) | 2018-01-16 | 2018-12-12 | Method and apparatus for super-resolution using line unit operation |
AU2019101273A AU2019101273A4 (en) | 2018-01-16 | 2019-10-21 | Method and apparatus for super-resolution using line unit operation |
AU2019101270A AU2019101270A4 (en) | 2018-01-16 | 2019-10-21 | Method and apparatus for super-resolution using line unit operation |
AU2019101272A AU2019101272A4 (en) | 2018-01-16 | 2019-10-21 | Method and apparatus for super-resolution using line unit operation |
AU2019101274A AU2019101274A4 (en) | 2018-01-16 | 2019-10-21 | Method and apparatus for super-resolution using line unit operation |
AU2019101271A AU2019101271A4 (en) | 2018-01-16 | 2019-10-21 | Method and apparatus for super-resolution using line unit operation |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2018-0005783 | 2018-01-16 | ||
KR20180005783 | 2018-01-16 | ||
KR10-2018-0091482 | 2018-08-06 | ||
KR1020180091482A KR102017995B1 (ko) | 2018-01-16 | 2018-08-06 | 라인 단위 연산을 이용한 초해상화 방법 및 장치 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019143024A1 true WO2019143024A1 (ko) | 2019-07-25 |
Family
ID=67301177
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2018/015733 WO2019143024A1 (ko) | 2018-01-16 | 2018-12-12 | 라인 단위 연산을 이용한 초해상화 방법 및 장치 |
Country Status (2)
Country | Link |
---|---|
AU (6) | AU2018357828A1 (ko) |
WO (1) | WO2019143024A1 (ko) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11057585B2 (en) | 2018-01-16 | 2021-07-06 | Korea Advanced Institute Of Science And Technology | Image processing method and device using line input and output |
US11956569B2 (en) | 2018-01-16 | 2024-04-09 | Korea Advanced Institute Of Science And Technology | Image processing method and device using a line-wise operation |
US11962937B2 (en) | 2018-01-16 | 2024-04-16 | Korea Advanced Institute Of Science And Technology | Method and device of super resolution using feature map compression |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210334072A1 (en) * | 2020-04-22 | 2021-10-28 | Facebook, Inc. | Mapping convolution to connected processing elements using distributed pipelined separable convolution operations |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130083844A1 (en) * | 2011-09-30 | 2013-04-04 | In Suk Chong | Coefficient coding for sample adaptive offset and adaptive loop filter |
KR20140072097A (ko) * | 2011-10-14 | 2014-06-12 | 아나로그 디바이시즈 인코포레이티드 | 동적으로 재구성가능한 파이프라인형 프리-프로세서 |
KR20160015799A (ko) * | 2014-07-31 | 2016-02-15 | 삼성전자주식회사 | 인루프 필터 파라미터 예측을 사용하는 비디오 부호화 방법 및 그 장치, 비디오 복호화 방법 및 그 장치 |
KR20170059040A (ko) * | 2015-11-19 | 2017-05-30 | 전자부품연구원 | 비디오 부호화기의 최적 모드 결정 장치 및 최적 모드 결정을 이용한 비디오 부호화 방법 |
KR20180001428A (ko) * | 2016-06-24 | 2018-01-04 | 한국과학기술원 | Cnn 기반 인루프 필터를 포함하는 부호화 방법과 장치 및 복호화 방법과 장치 |
2018
- 2018-12-12 WO PCT/KR2018/015733 patent/WO2019143024A1/ko active Application Filing
- 2018-12-12 AU AU2018357828A patent/AU2018357828A1/en active Pending

2019
- 2019-10-21 AU AU2019101272A patent/AU2019101272A4/en active Active
- 2019-10-21 AU AU2019101271A patent/AU2019101271A4/en active Active
- 2019-10-21 AU AU2019101270A patent/AU2019101270A4/en active Active
- 2019-10-21 AU AU2019101273A patent/AU2019101273A4/en active Active
- 2019-10-21 AU AU2019101274A patent/AU2019101274A4/en active Active
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11057585B2 (en) | 2018-01-16 | 2021-07-06 | Korea Advanced Institute Of Science And Technology | Image processing method and device using line input and output |
US11412175B2 (en) | 2018-01-16 | 2022-08-09 | Korea Advanced Institute Of Science And Technology | Super-resolution method and device using linewise operation |
US11956569B2 (en) | 2018-01-16 | 2024-04-09 | Korea Advanced Institute Of Science And Technology | Image processing method and device using a line-wise operation |
US11962937B2 (en) | 2018-01-16 | 2024-04-16 | Korea Advanced Institute Of Science And Technology | Method and device of super resolution using feature map compression |
US11968472B2 (en) | 2018-01-16 | 2024-04-23 | Korea Advanced Institute Of Science And Technology | Image pipeline processing method and device |
US11974069B2 (en) | 2018-01-16 | 2024-04-30 | Korea Advanced Institute Of Science And Technology | Image processing method and device using feature map compression |
Also Published As
Publication number | Publication date |
---|---|
AU2018357828A1 (en) | 2019-08-01 |
AU2019101273A4 (en) | 2019-11-28 |
AU2019101270A4 (en) | 2019-11-28 |
AU2018357828A2 (en) | 2019-11-21 |
AU2019101271A4 (en) | 2019-11-28 |
AU2019101272A4 (en) | 2019-11-28 |
AU2019101274A4 (en) | 2019-11-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019143026A1 (ko) | Image processing method and apparatus using feature map compression | |
WO2019143024A1 (ko) | Super-resolution method and apparatus using line-wise operation | |
WO2019143027A1 (ko) | Image pipeline processing method and apparatus | |
WO2017065525A2 (ko) | Method and apparatus for encoding or decoding an image | |
WO2020080698A1 (ko) | Method and apparatus for assessing the subjective quality of an image | |
WO2018030599A1 (ko) | Intra-prediction-mode-based image processing method and apparatus therefor | |
WO2020080765A1 (en) | Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image | |
WO2018038554A1 (ko) | Method and apparatus for encoding/decoding a video signal using a secondary transform | |
WO2017014585A1 (ko) | Method and apparatus for processing a video signal using a graph-based transform | |
WO2018062788A1 (ko) | Intra-prediction-mode-based image processing method and apparatus therefor | |
WO2013157825A1 (ko) | Image encoding/decoding method and apparatus | |
WO2013002619A2 (ko) | Video encoding method and apparatus with bit-depth adjustment for fixed-point transform, and video decoding method and apparatus | |
WO2014163249A1 (ko) | Video processing method and apparatus | |
WO2012044076A2 (ko) | Video encoding method and apparatus, and video decoding method and apparatus | |
WO2018124333A1 (ko) | Intra-prediction-mode-based image processing method and apparatus therefor | |
WO2018105759A1 (ko) | Image encoding/decoding method and apparatus therefor | |
WO2018131986A1 (ko) | Image encoding/decoding method and apparatus | |
WO2011071325A2 (en) | Method and apparatus for encoding and decoding image by using rotational transform | |
WO2021172834A1 (en) | Apparatus and method for performing artificial intelligence encoding and artificial intelligence decoding on image by using pre-processing | |
WO2019221472A1 (ko) | Video signal processing method and apparatus using reference samples | |
WO2017150823A1 (ko) | Video signal encoding/decoding method and apparatus therefor | |
WO2019143025A1 (ko) | Image processing method and apparatus using line input and output | |
WO2016076624A1 (ko) | Video signal processing method using a graph-based transform, and apparatus therefor | |
WO2010019002A2 (ko) | Method for generating a thumbnail image from a video frame of the H.264 standard | |
WO2020256482A1 (ko) | Transform-based image coding method and apparatus therefor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| ENP | Entry into the national phase | Ref document number: 2018357828; Country of ref document: AU; Date of ref document: 20181212; Kind code of ref document: A |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18900822; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 18900822; Country of ref document: EP; Kind code of ref document: A1 |