CN116843555B - Image interpolation method, device, electronic equipment and storage medium - Google Patents

Image interpolation method, device, electronic equipment and storage medium

Info

Publication number
CN116843555B
CN116843555B
Authority
CN
China
Prior art keywords
channel
convolution kernel
target
image
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311127116.7A
Other languages
Chinese (zh)
Other versions
CN116843555A (en)
Inventor
邱丰
徐林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rongming Microelectronics Jinan Co ltd
Original Assignee
Rongming Microelectronics Jinan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rongming Microelectronics Jinan Co ltd filed Critical Rongming Microelectronics Jinan Co ltd
Priority claimed from CN202311127116.7A
Publication of CN116843555A
Application granted
Publication of CN116843555B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4023 Decimation- or insertion-based scaling, e.g. pixel or line decimation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Abstract

The present disclosure relates to the field of image processing technologies, and in particular to an image interpolation method, an image interpolation apparatus, an electronic device, and a storage medium. The image interpolation method comprises the following steps: inputting an original image and a scaling ratio into a deep learning hardware inference accelerator, wherein the deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer and is used to scale the original image according to the scaling ratio to obtain a target image; the convolution layer performs convolution processing on the original image according to the scaling ratio and outputs N sub-images, wherein N is an integer greater than 1, and the N sub-images respectively store the pixel values of the target pixels at N relative positions in the target image; and the deconvolution layer performs a merging operation on the N sub-images according to the N relative positions to obtain the target image. According to this technical scheme, calculation efficiency and implementation flexibility can be improved, and power consumption can be reduced.

Description

Image interpolation method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image interpolation method, an image interpolation device, an electronic device, and a storage medium.
Background
In image processing, the operation of geometric transformation of an image is an operation of mapping one image into another image. In general, geometric transformations can be categorized as scaling, flipping, affine (translation or rotation), perspective, remapping, etc.
When an image is geometrically transformed, pixels that cannot be assigned directly often appear; for example, when an image is enlarged by a factor of two, many pixels have no direct mapping back to the source. In such cases an image interpolation algorithm is typically employed. In short, an image interpolation algorithm computes unknown pixel values from known pixels.
There are many image interpolation algorithms; the three most common are presented below:
(1) Nearest neighbor method (Nearest Interpolation): each pixel in the target image is mapped back to the original image, and the pixel value at the nearest integer coordinate is taken as the output pixel value. This algorithm is the fastest of the three, but an enlarged target image shows severe mosaic and obvious blocking artifacts, and a reduced target image is severely distorted.
(2) Bilinear interpolation (Bilinear Interpolation): bilinear interpolation computes each pixel in the target image from 4 (2×2) pixels in the original image. Its quality and speed fall between those of the other two methods, and it is the default algorithm in many frameworks.
(3) Bicubic interpolation (Bicubic Interpolation): bicubic interpolation computes each pixel in the target image from 16 (4×4) pixels in the original image. This algorithm produces the best results of the three but is relatively computationally expensive.
Image interpolation is generally implemented in software alone, in dedicated hardware alone, or on a GPU (graphics processing unit). Because of the large number of computations involved, a software-only implementation suffers severe latency. A hardware-only implementation cannot flexibly change the framework used to compute the interpolation or the kernel function of that framework, so the implementation is rigid and cannot fit all situations. A GPU implementation addresses the drawbacks of the first two, but traditional GPU computing is expensive and its power consumption is high. Thus, implementing an image interpolation algorithm with software, hardware, or a GPU alone is not optimal in terms of efficiency, flexibility, and power consumption.
Disclosure of Invention
The invention aims to provide an image interpolation method, an image interpolation apparatus, an electronic device, and a storage medium, which can improve calculation efficiency and implementation flexibility while reducing power consumption.
According to a first aspect of an embodiment of the present application, there is provided an image interpolation method, including:
Inputting an original image and a scaling ratio into a deep learning hardware inference accelerator, wherein the deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer; the deep learning hardware inference accelerator is used for scaling the original image according to the scaling ratio so as to obtain a target image;
the convolution layer performs convolution processing on the original image according to the scaling ratio and outputs N sub-images, wherein N is an integer greater than 1; the N sub-images respectively store pixel values of target pixels at N relative positions in the target image;
and the deconvolution layer carries out merging operation on the N sub-images according to the N relative positions to obtain the target image.
In one embodiment, before the convolution layer convolves the original image according to the scaling ratio and outputs N sub-images, the method further includes:
determining the number of channels of the target convolution kernel, the size of each channel target convolution kernel, the weight parameter of each channel target convolution kernel and the position corresponding to each channel target convolution kernel according to the scaling ratio; wherein the number of channels of the target convolution kernel is N;
and merging all the channel target convolution kernels according to the position corresponding to each channel target convolution kernel to obtain the weight parameters of the convolution layer.
In one embodiment, the determining the number of channels of the target convolution kernel, the size of each channel target convolution kernel, the weight parameter of each channel target convolution kernel, and the corresponding position of each channel target convolution kernel according to the scaling ratio includes:
determining the number of channels of the initial convolution kernel, the size of each channel initial convolution kernel, the weight parameter of each channel initial convolution kernel and the corresponding position of each channel initial convolution kernel according to the scaling; wherein the number of channels of the initial convolution kernel is N;
determining the size of each channel target convolution kernel according to the size of each channel initial convolution kernel and the position corresponding to each channel initial convolution kernel, wherein the size of each channel target convolution kernel is larger than the size of each channel initial convolution kernel;
and for each channel of the initial convolution kernel, converting the initial convolution kernel of the channel into the target convolution kernel of the channel according to the size of the initial convolution kernel of the channel, the size of the target convolution kernel of the channel and the corresponding position of the initial convolution kernel of the channel, wherein the corresponding position of the target convolution kernel of the channel is the same as the corresponding position of the initial convolution kernel of the channel.
In one embodiment, the determining the number of channels of the initial convolution kernel according to the scaling includes:
Determining the relative positions of target pixels in the target image and original pixels in the original image according to the scaling;
classifying all the target pixels according to the relative positions of the target pixels and the original pixels, and determining the category number of the target pixels;
and determining the channel number of the initial convolution kernel according to the category number of the target pixel.
In one embodiment, the determining the weight parameter of the initial convolution kernel of each channel according to the scaling includes:
for each channel of the initial convolution kernel, determining a plurality of original pixels in the initial convolution kernel size range which participate in convolution operation each time in the original image;
for each original pixel, determining the distance between the target pixel and the original pixel in the X axis and the Y axis; wherein, a plurality of original pixels are arranged along an X axis and a Y axis in an array manner, and the X axis and the Y axis are mutually perpendicular;
and calculating to obtain the weight parameter of the initial convolution kernel according to the distances between the target pixel and the original pixel on the X axis and the Y axis and the image interpolation function.
In one embodiment, the determining the size of the target convolution kernel of each channel according to the size of the initial convolution kernel of each channel and the position corresponding to the initial convolution kernel of each channel includes:
Determining a minimum repeating unit according to the size of each channel initial convolution kernel and the position corresponding to each channel initial convolution kernel; the minimum repeating unit is a unit formed by a plurality of original pixels in a target convolution kernel size range which participates in convolution operation each time in the original image;
and determining the size of each channel target convolution kernel according to the size of the minimum repeating unit.
In one embodiment, for each channel of the initial convolution kernel, the converting the channel initial convolution kernel into the channel target convolution kernel according to the size of the channel initial convolution kernel, the size of the channel target convolution kernel, and the corresponding position of the channel initial convolution kernel includes:
determining a filling position according to the size of the initial convolution kernel of the channel, the size of the target convolution kernel of the channel and the corresponding position of the initial convolution kernel of the channel for each channel of the initial convolution kernel;
and filling the filling position with a weight parameter to obtain the channel target convolution kernel.
In one embodiment, the deep learning hardware inference accelerator further comprises a filling layer;
before the convolution layer performs convolution processing on the original image according to the scaling ratio and outputs N sub-images, the method further comprises:
The filling layer fills a ring of pixels around the periphery of the original image, and the width of the filled pixel region is 1 pixel.
In one embodiment, the original image is in an image format of RGB or YUV444;
the step in which the convolution layer performs convolution processing on the original image according to the scaling ratio and outputs N sub-images includes:
for any channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling and outputs N sub-images;
the step in which the deconvolution layer performs a merging operation on the N sub-images according to the N relative positions to obtain the target image includes:
for any channel of the original image, the deconvolution layer performs merging operation on the N sub-images according to N relative positions to obtain a first intermediate image;
and combining the first intermediate images of all the channels to obtain the target image.
In one embodiment, the original image is in an image format of RGB or YUV444;
the step in which the convolution layer performs convolution processing on the original image according to the scaling ratio and outputs N sub-images includes:
for any channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling and outputs N sub-images;
The step in which the deconvolution layer performs a merging operation on the N sub-images according to the N relative positions to obtain the target image includes:
and for all the sub-images of the channels of the original image, the deconvolution layer carries out merging operation on the 3N sub-images according to N relative positions to obtain the target image.
In one embodiment, the image format of the original image is YUV420;
the step in which the convolution layer performs convolution processing on the original image according to the scaling ratio and outputs N sub-images includes:
for the Y channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling ratio and outputs N sub-images;
for the U channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling ratio and outputs N sub-images;
for the V channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling ratio and outputs N sub-images;
the step in which the deconvolution layer performs a merging operation on the N sub-images according to the N relative positions to obtain the target image includes:
for the Y channel of the original image, the deconvolution layer carries out merging operation on the N sub-images according to N relative positions to obtain a second intermediate image;
For the U channel of the original image, the deconvolution layer carries out merging operation on the N sub-images according to N relative positions to obtain a third intermediate image;
for the V channel of the original image, the deconvolution layer carries out merging operation on the N sub-images according to N relative positions to obtain a fourth intermediate image;
and merging the second intermediate image, the third intermediate image and the fourth intermediate image to obtain the target image.
In one embodiment, the scaling ratio is expressed in fractional form.
According to a second aspect of embodiments of the present application, there is provided an image interpolation apparatus, including:
an input module configured to input an original image and a scaling ratio into a deep learning hardware inference accelerator, wherein the deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer; the deep learning hardware inference accelerator is used for scaling the original image according to the scaling ratio so as to obtain a target image;
the processing module is configured to perform convolution processing on the original image according to the scaling ratio by the convolution layer and output N sub-images, wherein N is an integer greater than 1; n sub-images respectively store pixel values of target pixels at N relative positions in the target image;
And the merging module is configured to perform merging operation on the N sub-images according to N relative positions by the deconvolution layer to obtain the target image.
According to a third aspect of embodiments of the present application, there is provided an electronic device comprising a memory and a processor, the memory being for storing a computer program executable by the processor; the processor is configured to execute the computer program in the memory to implement the method described above.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having a computer program stored thereon, wherein the above-mentioned method is implemented when the computer program in the storage medium is executed by a processor.
Compared with the prior art, the beneficial effects of the present application are as follows: the original image and the scaling ratio are input into a deep learning hardware inference accelerator, so that the deep learning hardware inference accelerator scales the original image according to the scaling ratio to obtain a target image. The deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer; the convolution layer performs convolution processing on the original image according to the scaling ratio and outputs N sub-images, where N is an integer greater than 1 and the N sub-images respectively store the pixel values of the target pixels at N relative positions in the target image; and the deconvolution layer performs a merging operation on the N sub-images according to the N relative positions to obtain the target image. Implementing image interpolation with a deep learning hardware inference accelerator offers higher calculation efficiency and lower power consumption than a software-only implementation; and compared with a GPU implementation, the interpolation algorithm can be adjusted simply by adjusting the convolution kernel of the convolution layer, without modifying the deep learning hardware inference accelerator, which makes the implementation more flexible. In summary, the technical scheme provided by the present application improves calculation efficiency and implementation flexibility while reducing power consumption.
Drawings
Fig. 1 is a schematic diagram showing the relative distances between an inserted pixel and reference pixels according to the related art.
Fig. 2 is a flow chart illustrating an image interpolation method according to an exemplary embodiment.
Fig. 3 is a schematic diagram showing a relative position between an original pixel and a target pixel according to an exemplary embodiment.
FIG. 4 is a schematic diagram of a deep learning hardware inference accelerator, according to an example embodiment.
Fig. 5 is a flowchart illustrating an image interpolation method according to another exemplary embodiment.
FIG. 6 is a flow chart illustrating one specific implementation of step 501, according to an exemplary embodiment.
FIG. 7 is a flowchart illustrating a method of determining the number of channels of an initial convolution kernel from a scale according to an exemplary embodiment.
FIG. 8 is a flowchart illustrating a method of determining weight parameters for each channel initial convolution kernel based on scaling, according to an example embodiment.
FIG. 9 is a flow chart illustrating one specific implementation of step 602 according to an exemplary embodiment.
FIG. 10 is a flow chart illustrating one specific implementation of step 603 according to an exemplary embodiment.
FIG. 11 is a diagram illustrating the fill pattern of an initial convolution kernel whose fill position is the upper-left side, according to an example embodiment.
FIG. 12 is a diagram illustrating the fill pattern of an initial convolution kernel whose fill position is the lower-left side, according to an example embodiment.
FIG. 13 is a diagram illustrating the fill pattern of an initial convolution kernel whose fill position is the upper-right side, according to an example embodiment.
FIG. 14 is a diagram illustrating the fill pattern of an initial convolution kernel whose fill position is the lower-right side, according to an example embodiment.
Fig. 15 is a flowchart illustrating an image interpolation method of which an image format of an original image is RGB or YUV444 according to an exemplary embodiment.
Fig. 16 is a flowchart illustrating an image interpolation method of which an image format of an original image is RGB or YUV444 according to another exemplary embodiment.
Fig. 17 is a flowchart illustrating an image interpolation method for an original image with an image format of YUV420 according to an exemplary embodiment.
Fig. 18 is a block diagram illustrating an image interpolation apparatus according to an exemplary embodiment.
Fig. 19 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
Unless defined otherwise, technical or scientific terms used in the specification and claims should be given the ordinary meaning understood by one of ordinary skill in the art to which the invention pertains. Specific embodiments of the present invention are described below with reference to the drawings. It should be noted that, for brevity, this specification cannot describe in detail every feature of an actual implementation. Those skilled in the art may make modifications and substitutions to the embodiments of the invention without departing from its spirit and scope, and the resulting embodiments also fall within the scope of the invention.
In the related art, there are many image interpolation algorithms, such as nearest neighbor method, bilinear interpolation, and bicubic interpolation. Since image interpolation involves a large number of computations, the delay is severe with software implementation alone. The hardware alone cannot flexibly change the framework used for calculating the image interpolation or the kernel function of the framework, so that the implementation method is single and cannot be perfectly suitable for all conditions. While GPU implementations may address the drawbacks of the former two, traditional GPU computing is very expensive, while power consumption is also very high.
Taking bicubic interpolation as an example: bicubic interpolation is also known as cubic convolution interpolation or the cubic convolution algorithm. The algorithm performs cubic interpolation using the 16 gray values surrounding the sampling point. It considers not only the influence of the gray values of the four directly adjacent points, but also the influence of the rate of change of the gray value between adjacent points.
Assume image A has size M×N and image B, obtained by scaling A by a factor of K, has size m×n, i.e., K = m/M. Every pixel value in image A is known, while the pixel values in image B are unknown. To determine the value of each pixel (X, Y) in image B, one must first find the position (x, y) in image A that corresponds to (X, Y), and then compute the pixel value B(X, Y) using the 16 pixels of A closest to (x, y) as parameters. The weights of these 16 pixels are calculated with the bicubic kernel, and B(X, Y) is the weighted sum of the 16 pixels.
The bicubic interpolation kernel is constructed as follows:

$$W(x)=\begin{cases}(a+2)|x|^{3}-(a+3)|x|^{2}+1, & |x|\le 1\\ a|x|^{3}-5a|x|^{2}+8a|x|-4a, & 1<|x|<2\\ 0, & \text{otherwise}\end{cases}$$

where W(x) is the kernel function and x is the distance between the interpolated pixel and a reference pixel; the usual implementation computes the rows and columns sequentially. The parameter a is a hyperparameter controlling the process and is usually chosen as -0.5 or -1. The kernel weights also differ depending on the relative position between the newly inserted pixel and the reference pixels, which increases the computational complexity.
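For illustration only (this sketch is not part of the patent text), the kernel above can be written as a short Python function; the function name and the default a = -0.5 are our choices:

```python
def bicubic_kernel(x: float, a: float = -0.5) -> float:
    """Bicubic kernel W(x); the hyperparameter a is usually -0.5 or -1."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2.0:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0
```

With a = -0.5, for example, W(0.25) = 0.8671875 and W(1.75) = -0.0234375, and the four weights of any interpolation phase sum to 1.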
As shown in fig. 1, the distances Dx1, Dx2, Dx3, Dx4 between the inserted pixel P1 and the 4 reference pixels P2 in the same row are 1.25, 0.25, -0.75, and -1.75, respectively, and the distances Dy1, Dy2, Dy3, Dy4 between the inserted pixel P1 and the 4 reference pixels P2 in the same column are 1.25, 0.25, -0.75, and -1.75, respectively.
As described above, implementing an image interpolation algorithm with software, hardware, or a GPU alone is not optimal in terms of efficiency, flexibility, and power consumption.
With the continuous development of artificial intelligence, and deep learning in particular, such methods are now widely applied in image and audio processing. The main implementation of deep learning is based on deep convolutional neural networks (DCNNs). Because of this wide range of applications, a wide variety of deep learning inference hardware accelerators are on the market today. Compared with a traditional GPU-based inference engine, they are more efficient and consume less power. To further improve inference efficiency, hardware accelerators typically use fixed-point computation (whereas GPUs use floating point), most often 8-bit fixed point, to balance computational efficiency and accuracy. They mainly use dedicated hardware (e.g., an NPU (neural processing unit) or TPU (tensor processing unit)) to perform common operations such as convolution and activation. To increase applicability, an NPU is also commonly paired with a general-purpose digital signal processor (DSP), where the DSP handles operations the NPU cannot, such as certain mathematical calculations (e.g., SOFTMAX) and pre- and post-processing. However, since the NPU and DSP belong to different computing units, moving data between them and converting data formats may be involved, which greatly affects the overall performance of the hardware inference accelerator.
In order to solve the technical problems, the application provides an image interpolation method, an image interpolation device, electronic equipment and a storage medium, which can improve the calculation efficiency and the realization flexibility and reduce the power consumption.
Fig. 2 is a flow chart illustrating an image interpolation method according to an exemplary embodiment. The image interpolation method is applied to a deep learning hardware inference accelerator. Referring to fig. 2, the image interpolation method may include the following steps:
Step 201, inputting an original image and a scaling ratio into a deep learning hardware inference accelerator, wherein the deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer; the deep learning hardware inference accelerator is used to scale the original image according to the scaling ratio to obtain the target image.
Step 202, the convolution layer performs convolution processing on the original image according to the scaling ratio and outputs N sub-images, wherein N is an integer greater than 1; the N sub-images respectively store the pixel values of the target pixels at N relative positions in the target image.
In step 203, the deconvolution layer performs a merging operation on the N sub-images according to the N relative positions, so as to obtain a target image.
In this embodiment, the original image and the scaling ratio are input to the deep learning hardware inference accelerator, so that the deep learning hardware inference accelerator scales the original image according to the scaling ratio to obtain the target image.
In one exemplary embodiment, the original image is 1080P and the scaling ratio is 3/2, i.e., the image is reduced by a factor of 1.5, so the target image is 720P. As shown in fig. 3, the original pixels 31 are the pixels of the original image; a plurality of original pixels 31 are arranged in an array along an X-axis and a Y-axis to form the original image, where the X-axis and the Y-axis are perpendicular to each other. The target pixels 32 are the pixels of the target image; a plurality of target pixels 32 are arranged in an array along the X-axis and the Y-axis to form the target image. The minimal repeating unit 33 is used to determine the weight parameters of the convolution layer.
In the above exemplary embodiment, as shown in fig. 3, the target pixels 32 can be classified into four types according to their positions relative to the original pixels 31 (where the positive X direction is right, the negative X direction is left, the positive Y direction is up, and the negative Y direction is down): (1) the first type of target pixel 321 is located at the lower-right corner of its original pixel 31; (2) the second type of target pixel 322 is located at the upper-right corner of its original pixel 31; (3) the third type of target pixel 323 is located at the lower-left corner of its original pixel 31; and (4) the fourth type of target pixel 324 is located at the upper-left corner of its original pixel 31.
In one embodiment, as shown in FIG. 4, the deep learning hardware inference accelerator includes an input layer 41, a convolution layer 43, a deconvolution layer 44, and an output layer 45. In the above exemplary embodiment, the input layer 41 is configured to input the original image and the scaling ratio; the convolution layer 43 is configured to convolve the original image according to the scaling ratio and output 4 sub-images (360×640) that respectively store the pixel values of the target pixels at the 4 relative positions above in the target image. In other exemplary embodiments, N is another integer greater than 1. The deconvolution layer 44 is configured to perform a merging operation on the 4 sub-images according to the 4 relative positions, merging the 4 sub-images output by the convolution layer 43 into a single channel in the correct order to obtain the target image, and the output layer 45 is configured to output the target image.
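As an illustrative sketch (PyTorch is used here only as convenient notation for the accelerator's operators, and the channel-to-offset mapping is an assumption following the four classes above), the deconvolution layer's merge can be expressed as a stride-2 transposed convolution with fixed 0/1 weights:

```python
import torch
import torch.nn.functional as F

def merge_subimages(subs: torch.Tensor) -> torch.Tensor:
    """Merge 4 sub-images [1, 4, H, W] into one target image [1, 1, 2H, 2W]."""
    # conv_transpose2d weights have shape [in_channels, out_channels, kH, kW].
    weight = torch.zeros(4, 1, 2, 2)
    # (row, col) inside each 2x2 output block, in the class order 321
    # (lower-right), 322 (upper-right), 323 (lower-left), 324 (upper-left).
    offsets = [(1, 1), (0, 1), (1, 0), (0, 0)]
    for ch, (r, c) in enumerate(offsets):
        weight[ch, 0, r, c] = 1.0
    return F.conv_transpose2d(subs, weight, stride=2)

subs = torch.rand(1, 4, 360, 640)  # the four 360x640 sub-images
target = merge_subimages(subs)     # shape [1, 1, 720, 1280], i.e. 720P
```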
In this embodiment, the original image and the scaling ratio are input into the deep learning hardware inference accelerator, so that the deep learning hardware inference accelerator scales the original image according to the scaling ratio to obtain the target image. The deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer; the convolution layer performs convolution processing on the original image according to the scaling ratio and outputs N sub-images, where N is an integer greater than 1 and the N sub-images respectively store the pixel values of the target pixels at N relative positions in the target image; and the deconvolution layer performs a merging operation on the N sub-images according to the N relative positions to obtain the target image. Implementing image interpolation with a deep learning hardware inference accelerator offers higher calculation efficiency and lower power consumption than a software-only implementation; compared with a GPU implementation, the interpolation algorithm can be adjusted by adjusting the convolution kernel of the convolution layer without modifying the deep learning hardware inference accelerator, making the implementation more flexible. In summary, the technical scheme provided by the present application improves calculation efficiency and implementation flexibility and reduces power consumption.
In one embodiment, as shown in fig. 5, before step 202, the method further includes the following steps:
step 501, determining the number of channels of the target convolution kernel, the size of each channel target convolution kernel, the weight parameter of each channel target convolution kernel and the corresponding position of each channel target convolution kernel according to the scaling ratio; wherein the number of channels of the target convolution kernel is N.
In one embodiment, as shown in fig. 6, step 501 may include the steps of:
step 601, determining the number of channels of the initial convolution kernel, the size of each channel initial convolution kernel, the weight parameter of each channel initial convolution kernel and the corresponding position of each channel initial convolution kernel according to the scaling; wherein the number of channels of the initial convolution kernel is N.
In one embodiment, the scaling ratio is expressed in fractional form. Since the number of channels of the initial convolution kernel and the size of each channel initial convolution kernel are integers, the scaling ratio needs to be converted into a fraction. Common image scaling ratios can all be converted into fractional form; for example, if the original image is 1080P and the target image is 720P, the scaling ratio is 2/3; if the original image is 4K and the target image is 1080P, the scaling ratio is 1/2; if the original image is 4K and the target image is 720P, the scaling ratio is 1/3.
In one embodiment, as shown in FIG. 7, determining the number of channels of the initial convolution kernel from the scale may include the steps of:
in step 701, the relative position of the target pixel in the target image and the original pixel in the original image is determined according to the scaling.
Step 702, classifying all the target pixels according to the relative positions of the target pixels and the original pixels, and determining the number of classes of the target pixels.
In step 703, the number of channels of the initial convolution kernel is determined according to the number of classes of the target pixel.
In the above exemplary embodiment, as shown in fig. 3, the relative positions of the target pixels 32 in the target image and the original pixels 31 in the original image may be determined according to the scaling ratio. All the target pixels 32 may then be classified according to their positions relative to the original pixels 31, and the number of categories of target pixels 32 determined. The number of channels of the initial convolution kernel is then determined from the number of categories of target pixels, the two being equal. For example, the number of categories of target pixels 32 is 4, and so is the number of channels of the initial convolution kernel.
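The classification step can be sketched in a few lines of Python; the center-aligned coordinate mapping x_src = (j + 0.5) * src/dst - 0.5 used below is an assumption, since the alignment convention is not spelled out here:

```python
from fractions import Fraction

def count_phases(src: int, dst: int) -> int:
    """Count the distinct fractional offsets of target pixels along one axis."""
    scale = Fraction(src, dst)
    offsets = {(Fraction(2 * j + 1, 2) * scale - Fraction(1, 2)) % 1
               for j in range(dst)}
    return len(offsets)

# 1080P -> 720P: 2 phases per axis, hence 2 * 2 = 4 classes of target pixels
# and 4 channels for the initial convolution kernel.
assert count_phases(1080, 720) == 2
```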
In one embodiment, the size of each channel initial convolution kernel may be determined from the scaling ratio and the image interpolation function. In the exemplary embodiment above, the scaling ratio is 3/2, the image interpolation function is the bicubic interpolation function, and the size of each channel initial convolution kernel is 4.
In one embodiment, as shown in fig. 8, determining the weight parameter of each channel initial convolution kernel according to the scaling may include the steps of:
step 801, for each channel of the initial convolution kernel, determining a plurality of original pixels in the original image within the size range of the initial convolution kernel each time participating in the convolution operation.
Step 802, determining, for each original pixel, a distance between a target pixel and the original pixel in an X-axis and a Y-axis; the plurality of original pixels are arranged in an array along an X axis and a Y axis, and the X axis and the Y axis are mutually perpendicular.
Step 803, calculating to obtain the weight parameter of the initial convolution kernel according to the distance between the target pixel and the original pixel on the X axis and the Y axis and the image interpolation function.
In the above exemplary embodiment, for each channel of the initial convolution kernel, the original pixels that fall within the initial convolution kernel size range in each convolution operation are determined; here, 16 original pixels participate in each convolution operation.
Then, for each original pixel, the distances between the target pixel and the original pixel along the X-axis and the Y-axis are determined; the original pixels are arranged in an array along the X-axis and the Y-axis, which are perpendicular to each other. For example, the distances between the target pixel 321 and the original pixels along the X-axis are 1.25, 0.25, -0.75, and -1.75, respectively, and along the Y-axis are likewise 1.25, 0.25, -0.75, and -1.75. The distances between the target pixel 322 and the original pixels along the X-axis are 1.25, 0.25, -0.75, and -1.75, respectively, and along the Y-axis are 1.75, 0.75, -0.25, and -1.25. The distances between the target pixel 323 and the original pixels along the X-axis are 1.75, 0.75, -0.25, and -1.25, respectively, and along the Y-axis are 1.25, 0.25, -0.75, and -1.75. The distances between the target pixel 324 and the original pixels along the X-axis are 1.75, 0.75, -0.25, and -1.25, respectively, and along the Y-axis are likewise 1.75, 0.75, -0.25, and -1.25.
Then, the weight parameters of the initial convolution kernels are calculated from the distances between the target pixel and the original pixels along the X-axis and the Y-axis together with the image interpolation function. For example, substituting the distances between the target pixel 321 and the original pixels along the X-axis and the Y-axis into the bicubic interpolation function yields the initial convolution kernel of the corresponding channel, and likewise for the target pixels 322, 323, and 324. In this way, the initial convolution kernels (kernel functions) of the 4 different channels, i.e., the weight parameters of the initial convolution kernels, are obtained.
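Because the bicubic kernel is applied to rows and columns separately, one channel's 4×4 initial convolution kernel is the outer product of the per-axis weights. A numpy sketch, reusing the bicubic_kernel function from the earlier sketch and the distances listed above for the target pixel 321:

```python
import numpy as np

dx = [1.25, 0.25, -0.75, -1.75]  # X-axis distances for target pixel 321
dy = [1.25, 0.25, -0.75, -1.75]  # Y-axis distances for target pixel 321

wx = np.array([bicubic_kernel(d) for d in dx])
wy = np.array([bicubic_kernel(d) for d in dy])
kernel_321 = np.outer(wy, wx)               # 4x4 initial convolution kernel
assert abs(kernel_321.sum() - 1.0) < 1e-9   # interpolation weights sum to 1
```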
In one embodiment, the positions corresponding to the initial convolution kernels of the 4 different channels are derived from the original image, so that the kernels can be merged into the different channels of one convolution layer.
Step 602, determining the size of each channel target convolution kernel according to the size of each channel initial convolution kernel and the position corresponding to each channel initial convolution kernel, wherein the size of each channel target convolution kernel is larger than the size of each channel initial convolution kernel.
In one embodiment, as shown in FIG. 9, step 602 may include the steps of:
step 901, determining a minimum repeating unit according to the size of each channel initial convolution kernel and the position corresponding to each channel initial convolution kernel; the minimum repeating unit is a unit composed of a plurality of original pixels in a target convolution kernel size range which participates in convolution operation each time in the original image.
Step 902, determining the size of each channel target convolution kernel according to the size of the minimum repeating unit.
As shown in fig. 3, since the convolution operation repeatedly applies a convolution kernel over the original image, the minimal repeating unit must be chosen so that this repetition condition is satisfied. As described above, there are 4 initial convolution kernels, and each needs to cover 16 corresponding original pixels, which are not exactly the same from kernel to kernel. Therefore, the 16 original pixels covered by each of the 4 initial convolution kernels need to be merged into the minimal repeating unit 33. The minimal repeating unit 33 has a size of 5, i.e., it includes 25 original pixels 31.
In one embodiment, the minimum repeating unit may be determined according to the size of each channel initial convolution kernel and the position corresponding to each channel initial convolution kernel, where the minimum repeating unit is a unit composed of a plurality of original pixels in the size range of the target convolution kernel participating in the convolution operation each time in the original image. Then, the size of each channel target convolution kernel is determined according to the size of the minimum repeating unit, wherein the size of each channel target convolution kernel is equal to the size of the minimum repeating unit.
Step 603, for each channel of the initial convolution kernel, converting the channel initial convolution kernel into the channel target convolution kernel according to the size of the channel initial convolution kernel, the size of the channel target convolution kernel and the position corresponding to the channel initial convolution kernel, where the position corresponding to the channel target convolution kernel is the same as the position corresponding to the channel initial convolution kernel.
In one embodiment, as shown in fig. 10, step 603 may include the steps of:
step 1001, for each channel of the initial convolution kernel, determining a filling position according to the size of the initial convolution kernel of the channel, the size of the target convolution kernel of the channel, and the position corresponding to the initial convolution kernel of the channel.
Step 1002, filling weight parameters into filling positions to obtain the channel target convolution kernel.
In one embodiment, for each channel of the initial convolution kernel, a fill position is determined based on the size of the channel initial convolution kernel, the size of the channel target convolution kernel, and the corresponding position of the channel initial convolution kernel. And then filling weight parameters into the filling positions to obtain the channel target convolution kernel. The weight parameter of the padding may be 0. In this way, it is ensured that each channel target convolution kernel is at a corresponding position.
For example, as shown in fig. 11, for a channel for generating an initial convolution kernel of the target pixel 321, the filling position is the upper left side, where the weight parameter 111 is a weight parameter of the initial convolution kernel, the weight parameter 112 is a filled weight parameter, and the weight parameter 112 is 0.
As shown in fig. 12, for the channel used to generate the initial convolution kernel of the target pixel 322, the fill position is the lower left side, where the weight parameter 111 is the weight parameter of the initial convolution kernel, the weight parameter 112 is the weight parameter of the fill, and the weight parameter 112 is 0.
As shown in fig. 13, for the channel used to generate the initial convolution kernel of the target pixel 323, the fill position is the upper right side, where the weight parameter 111 is the weight parameter of the initial convolution kernel, the weight parameter 112 is the weight parameter of the fill, and the weight parameter 112 is 0.
As shown in fig. 14, for the channel used to generate the initial convolution kernel for the target pixel 324, the fill position is the bottom right, where the weight parameter 111 is the weight parameter of the initial convolution kernel, the weight parameter 112 is the weight parameter of the fill, and the weight parameter 112 is 0.
Step 502, merging all the channel target convolution kernels according to the corresponding positions of each channel target convolution kernel to obtain the weight parameters of the convolution layers.
In the exemplary embodiment described above, each 4×4 initial convolution kernel is padded into a 5×5 target convolution kernel. The 4 channel target convolution kernels are then merged according to the position corresponding to each channel target convolution kernel, yielding the weight parameters of the convolution layer. The shape of the convolution layer's weight parameter is [1, 5, 5, 4]. Since 2×2 target pixels are generated at a time, the convolution stride is [3, 3].
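A numpy sketch of this conversion and merge, assuming the fill positions of Figs. 11-14; the [out, in, height, width] weight layout below is PyTorch's convention, while the patent's [1, 5, 5, 4] shape is a different but equivalent layout:

```python
import numpy as np

# Where the 4x4 initial kernel lands inside the 5x5 target kernel (row, col),
# per the fill positions of Figs. 11-14; zeros are padded everywhere else.
OFFSETS = {321: (1, 1), 322: (0, 1), 323: (1, 0), 324: (0, 0)}

def to_target_kernel(initial: np.ndarray, cls: int) -> np.ndarray:
    """Embed a 4x4 initial convolution kernel into a 5x5 target kernel."""
    r, c = OFFSETS[cls]
    target = np.zeros((5, 5), dtype=initial.dtype)
    target[r:r + 4, c:c + 4] = initial
    return target

# Stack the four channel target kernels into one convolution weight; applied
# with stride 3, each step covers one 5x5 minimal repeating unit and emits
# one 2x2 block of target pixels.
kernels = {cls: np.random.rand(4, 4) for cls in OFFSETS}  # placeholder weights
weight = np.stack([to_target_kernel(kernels[c], c) for c in (321, 322, 323, 324)])
weight = weight[:, None, :, :]  # shape [4, 1, 5, 5]
```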
In one embodiment, as shown in FIG. 4, the deep learning hardware inference accelerator further includes a filling layer 42. Before step 202, the method further includes the following step:
The filling layer 42 fills a ring of pixels around the periphery of the original image; the filled pixel region is 1 pixel wide. The padding may be mirrored, i.e., each filled pixel takes the pixel value of the pixel mirrored across the edge, although the padding is not limited thereto. In this way, the output target image matches the expected size (720P).
It should be noted that the filled pixel values directly influence the boundary of the result, and different filling methods may be selected depending on what the hardware supports; mirror or symmetric-mirror padding is recommended for the best boundary effect.
In one embodiment, weight parameters in the target convolution kernel whose absolute value is less than 0.1/256 are set to 0, which improves calculation efficiency.
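This pruning step is a single masked assignment (numpy sketch, continuing the weight array from the sketch above):

```python
import numpy as np

weight[np.abs(weight) < 0.1 / 256] = 0.0  # zero out negligible weights
```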
Fig. 15 is a flowchart illustrating an image interpolation method according to another exemplary embodiment. In this embodiment, the image format of the original image is RGB or YUV444. Referring to fig. 15, the image interpolation method may include the steps of:
step 1501, inputting the original image and the scaling into a deep learning hardware inference accelerator, wherein the deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer; the deep learning hardware inference accelerator is used to scale the original image by a scaling scale to obtain the target image.
In this embodiment, step 1501 is similar to step 201 described above, and will not be described again.
In step 1502, for any channel of the original image, the convolution layer convolves the channel image according to the scaling, and outputs N sub-images.
For original images in RGB or YUV444 format, every channel image has the same size, and each channel image can be scaled before merging. In this embodiment, the image format of the original image is described as RGB. In other embodiments, the image interpolation method provided in this embodiment may also be applied to any other original image whose channel images all have the same size.
In this embodiment, the original image includes an R-channel image, a G-channel image, and a B-channel image. The R channel image, G channel image and B channel image are the same size. For any one channel of the R channel image, the G channel image and the B channel image, the convolution layer carries out convolution processing on the channel image according to the scaling ratio and outputs N sub-images.
For example, for an R-channel image, the convolution layer convolves the R-channel image according to a scaling ratio and outputs N sub-images. For the G channel image, the convolution layer carries out convolution processing on the G channel image according to the scaling ratio and outputs N sub-images. For the B-channel image, the convolution layer carries out convolution processing on the B-channel image according to the scaling ratio and outputs N sub-images.
In step 1503, for any channel of the original image, the deconvolution layer performs a merging operation on the N sub-images according to N relative positions, to obtain a first intermediate image.
In this embodiment, for any one channel of the R channel image, the G channel image, and the B channel image, the deconvolution layer performs a merging operation on the N sub-images according to the N relative positions, to obtain a first intermediate image.
For example, for an R-channel image, the deconvolution layer performs a merging operation on the N sub-images according to N relative positions, resulting in a corresponding first intermediate image. And for the G-channel image, the deconvolution layer carries out merging operation on the N sub-images according to the N relative positions to obtain a corresponding first intermediate image. And for the B-channel image, the deconvolution layer carries out merging operation on the N sub-images according to the N relative positions to obtain a corresponding first intermediate image. Thus, a total of 3 first intermediate images are obtained.
Step 1504, merging the first intermediate images of all channels to obtain a target image.
In this embodiment, the deconvolution layer combines the first intermediate images of all channels to obtain the target image. For example, the deconvolution layer combines the first intermediate images corresponding to the R-channel image, the G-channel image, and the B-channel image, respectively, to obtain the target image.
Fig. 16 is a flowchart illustrating an image interpolation method according to another exemplary embodiment. In this embodiment, the image format of the original image is RGB or YUV444. Referring to fig. 16, the image interpolation method may include the steps of:
step 1601, inputting the original image and the scaling into a deep learning hardware inference accelerator, wherein the deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer; the deep learning hardware inference accelerator is used to scale the original image by a scaling scale to obtain the target image.
In this embodiment, step 1601 is similar to step 201 described above, and is not described herein.
In step 1602, for any channel of the original image, the convolution layer convolves the channel image according to the scaling, and outputs N sub-images.
In this embodiment, step 1602 is similar to step 1502 described above, and will not be described again.
In step 1603, for all the sub-images of the channels of the original image, the deconvolution layer performs a merging operation on the 3N sub-images according to N relative positions, to obtain a target image.
In this embodiment, the image format of the original image is described as RGB. For the R-channel image, the convolution layer performs convolution processing on it according to the scaling ratio and outputs N sub-images, and likewise for the G-channel and B-channel images, giving 3N sub-images in total. The deconvolution layer then performs a merging operation on the 3N sub-images according to the N relative positions to obtain the target image. In this embodiment, the merging operation needs to be performed only once, which saves processing steps.
Fig. 17 is a flowchart illustrating an image interpolation method according to another exemplary embodiment. In this embodiment, the image format of the original image is YUV420. Referring to fig. 17, the image interpolation method may include the steps of:
Step 1701, inputting the original image and the scaling ratio into a deep learning hardware inference accelerator, wherein the deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer; the deep learning hardware inference accelerator is used to scale the original image according to the scaling ratio to obtain the target image.
In this embodiment, step 1701 is similar to step 201 described above, and will not be described again.
In step 1702, for the Y channel of the original image, the convolution layer convolves the channel image according to the scaling ratio and outputs N sub-images; the same is done for the U channel and for the V channel of the original image, each yielding N sub-images.
In this embodiment, since the image format of the original image is YUV420, the original image includes a Y-channel image, a U-channel image, and a V-channel image. Because the U-channel and V-channel images differ in size from the Y-channel image and may also be stored differently, the Y-channel, U-channel, and V-channel images are processed separately.
In the present embodiment, for the Y channel of the original image, the convolution layer 43 convolves the channel image according to the scaling, and outputs N sub-images. Wherein, before the convolution layer 43 performs convolution processing on the Y-channel image of the original image, the filling layer 42 fills the Y-channel image, and the pixel value of the filled pixel is 0.
For the U channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling ratio and outputs N sub-images. Wherein, before the convolution layer 43 convolves the U-channel image of the original image, the filling layer 42 fills the U-channel image, and the pixel value of the filled pixel is 128.
For the V channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling ratio and outputs N sub-images. Wherein, before the convolution layer 43 convolves the V-channel image of the original image, the filling layer 42 fills the V-channel image, and the pixel value of the filled pixel is 128.
When processing an image in YUV format, if a constant is used as the padding value, the chosen value must be consistent with the corresponding color gamut; the deep-learning default of padding with 0 is therefore generally not recommended for the chroma planes.
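The channel-specific padding above can be sketched as follows (the helper name and tensor shapes are illustrative; the fill values 0 for Y and 128 for U and V are those given in the text):

```python
# Sketch: pad YUV420 planes with channel-appropriate constants so that the
# padded border stays consistent with the YUV colour gamut (128 is the
# neutral chroma value).
import torch
import torch.nn.functional as F

def pad_plane(plane: torch.Tensor, fill: float) -> torch.Tensor:
    """Pad an (H, W) plane with a 1-pixel border of a constant fill value."""
    return F.pad(plane.unsqueeze(0).unsqueeze(0), (1, 1, 1, 1),
                 mode="constant", value=fill).squeeze()

y = torch.rand(8, 8) * 255           # full-resolution luma plane
u = torch.rand(4, 4) * 255           # half-resolution chroma planes (YUV420)
v = torch.rand(4, 4) * 255

y_pad = pad_plane(y, 0.0)            # luma: pad with 0
u_pad = pad_plane(u, 128.0)          # chroma: pad with the neutral value 128
v_pad = pad_plane(v, 128.0)
```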
In step 1703, for the Y channel of the original image, the deconvolution layer performs a merging operation on the N sub-images according to the N relative positions to obtain a second intermediate image; for the U channel, the deconvolution layer merges its N sub-images in the same way to obtain a third intermediate image; and for the V channel, it merges its N sub-images to obtain a fourth intermediate image.
In this embodiment, step 1703 is similar to step 1503 described above, and will not be described herein.
Step 1704, merging the second intermediate image, the third intermediate image and the fourth intermediate image to obtain the target image.
In this embodiment, step 1704 is similar to step 1504 described above, and will not be described again.
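Because a YUV420 image is stored as separate planes, the merging in step 1704 can amount to assembling the planar buffer. The sketch below assumes a planar I420 layout, which is one common storage mode and not mandated by the text:

```python
# Sketch: assemble the second (Y), third (U) and fourth (V) intermediate
# images into one planar YUV420 buffer; no resampling between channels.
import torch

def merge_yuv420(y: torch.Tensor, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Planar I420 layout: full-resolution Y followed by quarter-size U and V.
    assert u.shape == v.shape == (y.shape[0] // 2, y.shape[1] // 2)
    return torch.cat([y.flatten(), u.flatten(), v.flatten()])

target = merge_yuv420(torch.zeros(16, 16), torch.zeros(8, 8), torch.zeros(8, 8))
```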
This embodiment provides an image interpolation method for the case in which the channel sizes of the original image differ; the input image format is therefore not restricted, which improves implementation flexibility.
In the technical scheme of the present disclosure, the interpolation algorithm is implemented on a deep learning hardware accelerator by means of convolution. The matrix operations are realized with a convolution layer, and a deconvolution layer is provided to fuse the sub-images generated by the convolution layer into the correct result.
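The scheme can be made concrete with the following minimal sketch (PyTorch is used purely for demonstration; the tensor names are assumptions, and pixel_shuffle stands in for the deconvolution-based merging): a 2x bilinear upscale of one channel with N = 4 convolution kernels, one per relative position, whose weights are derived from the bilinear interpolation formula.

```python
# Sketch: 2x bilinear upscaling as N = 4 convolutions (one per relative
# position) plus one merge. Kernel values follow the bilinear formula with
# half-pixel centres; everything else is an illustrative assumption.
import torch
import torch.nn.functional as F

# 1D weights of the two phases: even outputs take 0.25/0.75 of (left, centre),
# odd outputs take 0.75/0.25 of (centre, right).
even = torch.tensor([0.25, 0.75, 0.0])
odd = torch.tensor([0.0, 0.75, 0.25])

weight = torch.zeros(4, 1, 3, 3)          # one 3x3 kernel per relative position
for n, (wy, wx) in enumerate([(even, even), (even, odd), (odd, even), (odd, odd)]):
    weight[n, 0] = torch.outer(wy, wx)    # separable 2D kernel = outer product

img = torch.rand(1, 1, 8, 8)
sub = F.conv2d(F.pad(img, (1, 1, 1, 1)), weight)   # N = 4 sub-images
out = F.pixel_shuffle(sub, 2)                      # merge by relative position

ref = F.interpolate(img, scale_factor=2, mode="bilinear", align_corners=False)
assert torch.allclose(out[..., 2:-2, 2:-2], ref[..., 2:-2, 2:-2], atol=1e-6)
```

In the interior of the image the result matches the reference bilinear resize; only the border ring differs, because this sketch pads with zeros instead of the custom padding described earlier.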
The technical scheme provided by the application has the following advantages:
(1) The interpolation algorithm is realized in a single step by a convolution layer with specially designed weight parameters.
(2) The application is compatible with all AI (Artificial Intelligence) hardware inference accelerators, with no need for preprocessing or post-processing operations. AI hardware inference accelerators include deep learning hardware inference accelerators as well as other types of AI inference accelerators.
(3) The weight parameters of the convolution layer can be flexibly adapted to the type of image scaling algorithm (image interpolation function). All existing mainstream interpolation methods are supported, including polyphase image interpolation algorithms, and user-defined weight parameters are supported as well. For example, the weight parameters can be obtained by substituting the relative distances into the formula of the image scaling algorithm, so the algorithm is changed simply by substituting a new formula. Different neighbourhood sizes are also supported: the BICUBIC algorithm uses the 16 adjacent pixels, and if the LANCZOS3 algorithm, which uses the 36 adjacent pixels, is to be realized instead, only the convolution kernel needs to be modified; the AI hardware inference accelerator itself requires no change (see the sketch after this list).
(4) A custom Padding mode is used, so no special treatment of boundary conditions is needed.
(5) The input image format is not limited: RGB, YUV, or other image formats can be input without format conversion and processed directly in the original image format.
(6) The whole process requires no participation of other hardware such as a CPU, GPU, or DSP (Digital Signal Processor); only an AI hardware inference accelerator supporting convolution is needed.
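As an illustrative sketch of point (3) (the BICUBIC and LANCZOS3 kernel formulas are standard; the helper names and normalization are assumptions, not the patent's exact procedure), the per-phase weights are obtained by substituting the relative distance into the chosen interpolation formula, and switching algorithms changes only the kernel size and values:

```python
# Sketch: derive per-phase 1D kernel weights from an interpolation function.
# BICUBIC gives 4 taps per axis (16 pixels in 2D); LANCZOS3 gives 6 taps per
# axis (36 pixels in 2D). Only the kernel changes, not the accelerator.
import math

def bicubic(x: float, a: float = -0.5) -> float:
    x = abs(x)
    if x < 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def lanczos3(x: float) -> float:
    if x == 0:
        return 1.0
    if abs(x) >= 3:
        return 0.0
    px = math.pi * x
    return 3 * math.sin(px) * math.sin(px / 3) / (px * px)

def kernel_1d(phase, func, taps):
    """Weights of the `taps` source pixels around a target at offset `phase`."""
    offsets = range(-(taps // 2 - 1), taps // 2 + 1)
    w = [func(phase - k) for k in offsets]
    s = sum(w)
    return [v / s for v in w]        # normalise so the weights sum to 1

print(kernel_1d(0.25, bicubic, 4))   # 4 taps per axis -> 16 adjacent pixels
print(kernel_1d(0.25, lanczos3, 6))  # 6 taps per axis -> 36 adjacent pixels
```

The 2D kernel of a relative position is the outer product of its two 1D weight vectors, so BICUBIC yields a 4x4 kernel per position and LANCZOS3 a 6x6 kernel, in line with the 16- and 36-pixel neighbourhoods mentioned above.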
Fig. 18 is a block diagram of an image interpolation apparatus according to an exemplary embodiment. As shown in fig. 18, in the present embodiment, the image interpolation apparatus includes:
an input module 181 configured to input the original image and the scale into a deep learning hardware inference accelerator, wherein the deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer; the deep learning hardware reasoning accelerator is used for scaling the original image according to the scaling ratio so as to obtain a target image;
a processing module 182 configured to perform convolution processing on the original image by the convolution layer according to the scaling, and output N sub-images, where N is an integer greater than 1; n sub-images respectively store pixel values of target pixels at N relative positions in the target image;
and a merging module 183 configured to perform a merging operation on the N sub-images according to N relative positions by the deconvolution layer, so as to obtain the target image.
The embodiment of the application also provides electronic equipment, which comprises a processor and a memory; the memory is used for storing a computer program executable by the processor; the processor is configured to execute a computer program in the memory to implement the image interpolation method of any of the above embodiments.
Embodiments of the present application also provide a computer-readable storage medium; when the executable computer program in the storage medium is executed by a processor, the image interpolation method of any of the above embodiments can be implemented.
The specific manner in which the processor performs the operations in the apparatus of the above embodiments has been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
Fig. 19 is a block diagram of an electronic device, according to an example embodiment. For example, electronic device 1900 may be provided as a server. Referring to fig. 19, device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the above-described methods for image interpolation.
Electronic device 1900 may also include a power component 1926 configured to perform power management of device 1900, a wired or wireless network interface 1950 configured to connect device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as the memory 1932 comprising instructions executable by the processing component 1922 of the device 1900 to perform the methods described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
In the present invention, the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" refers to two or more, unless explicitly defined otherwise.
The embodiments are described above in order to facilitate the understanding and application of the present application by those of ordinary skill in the art. It will be apparent to those skilled in the art that various modifications can be made to these embodiments and that the general principles described herein may be applied to other embodiments without the use of inventive faculty. Accordingly, the present application is not limited to the embodiments herein, and those skilled in the art, based on the present disclosure, may make improvements and modifications without departing from the scope and spirit of the present application.

Claims (12)

1. An image interpolation method, comprising:
inputting an original image and a scaling into a deep learning hardware inference accelerator, wherein the deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer; the deep learning hardware reasoning accelerator is used for scaling the original image according to the scaling ratio so as to obtain a target image;
the convolution layer carries out convolution processing on the original image according to the scaling ratio and outputs N sub-images, wherein N is an integer greater than 1; n sub-images respectively store pixel values of target pixels at N relative positions in the target image;
the deconvolution layer carries out merging operation on the N sub-images according to N relative positions to obtain the target image;
before the convolution layer carries out convolution processing on the original image according to the scaling ratio and outputs N sub-images, the method further comprises:
determining the channel number of the target convolution kernels, the size of each channel target convolution kernel, the weight parameter of each channel target convolution kernel and the corresponding position of each channel target convolution kernel according to the scaling; wherein the number of channels of the target convolution kernel is N;
combining all the channel target convolution kernels according to the position corresponding to each channel target convolution kernel to obtain the weight parameters of the convolution layer;
the determining, according to the scaling, the number of channels of the target convolution kernel, the size of each channel target convolution kernel, the weight parameter of each channel target convolution kernel, and the position corresponding to each channel target convolution kernel includes:
determining the number of channels of the initial convolution kernel, the size of each channel initial convolution kernel, the weight parameter of each channel initial convolution kernel and the corresponding position of each channel initial convolution kernel according to the scaling; wherein the number of channels of the initial convolution kernel is N;
determining the size of each channel target convolution kernel according to the size of each channel initial convolution kernel and the position corresponding to each channel initial convolution kernel, wherein the size of each channel target convolution kernel is larger than the size of each channel initial convolution kernel;
for each channel of the initial convolution kernel, converting the initial convolution kernel of the channel into the target convolution kernel of the channel according to the size of the initial convolution kernel of the channel, the size of the target convolution kernel of the channel and the corresponding position of the initial convolution kernel of the channel, wherein the corresponding position of the target convolution kernel of the channel is the same as the corresponding position of the initial convolution kernel of the channel;
the determining the channel number of the initial convolution kernel according to the scaling comprises the following steps:
determining the relative positions of target pixels in the target image and original pixels in the original image according to the scaling;
classifying all the target pixels according to the relative positions of the target pixels and the original pixels, and determining the category number of the target pixels;
and determining the channel number of the initial convolution kernel according to the category number of the target pixel.
2. The image interpolation method of claim 1, wherein the determining the weight parameter of the initial convolution kernel for each channel according to the scaling comprises:
for each channel of the initial convolution kernel, determining the plurality of original pixels in the original image, within the size range of the initial convolution kernel, that participate in each convolution operation;

for each original pixel, determining the distances between the target pixel and the original pixel along the X axis and the Y axis; wherein the plurality of original pixels are arranged in an array along the X axis and the Y axis, and the X axis and the Y axis are mutually perpendicular;

and calculating the weight parameter of the initial convolution kernel according to the distances between the target pixel and the original pixel on the X axis and the Y axis and the image interpolation function.
3. The image interpolation method of claim 1, wherein the determining the size of the target convolution kernel for each channel according to the size of the initial convolution kernel for each channel and the position corresponding to the initial convolution kernel for each channel comprises:
determining a minimum repeating unit according to the size of each channel initial convolution kernel and the position corresponding to each channel initial convolution kernel; wherein the minimum repeating unit is formed by the plurality of original pixels in the original image, within the size range of the target convolution kernel, that participate in each convolution operation;
and determining the size of each channel target convolution kernel according to the size of the minimum repeating unit.
4. The image interpolation method as set forth in claim 3, wherein said converting the channel initial convolution kernel into the channel target convolution kernel according to a size of the channel initial convolution kernel, a size of the channel target convolution kernel, and a position corresponding to the channel initial convolution kernel for each channel of the initial convolution kernels, comprises:
determining a filling position according to the size of the initial convolution kernel of the channel, the size of the target convolution kernel of the channel and the corresponding position of the initial convolution kernel of the channel for each channel of the initial convolution kernel;
and filling the filling position with a weight parameter to obtain the channel target convolution kernel.
5. The image interpolation method of claim 1, wherein the deep learning hardware inference accelerator further comprises a filler layer;
before the convolution layer carries out convolution processing on the original image according to the scaling ratio and outputs N sub-images, the method further comprises:
the filling layer pads a ring of pixels around the periphery of the original image, the width of the padded pixel region being 1 pixel.
6. The image interpolation method according to claim 1, wherein the image format of the original image is RGB or YUV444;
the convolution layer carries out convolution processing on the original image according to the scaling ratio and outputs N sub-images, and the convolution layer comprises the following steps:
for any channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling and outputs N sub-images;
the deconvolution layer performs a merging operation on the N sub-images according to N relative positions to obtain the target image, and the deconvolution layer comprises the following steps:
for any channel of the original image, the deconvolution layer performs merging operation on the N sub-images according to N relative positions to obtain a first intermediate image;
and combining the first intermediate images of all the channels to obtain the target image.
7. The image interpolation method according to claim 1, wherein the image format of the original image is RGB or YUV444;
the convolution layer carries out convolution processing on the original image according to the scaling ratio and outputs N sub-images, and the convolution layer comprises the following steps:
for any channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling and outputs N sub-images;
the deconvolution layer performs a merging operation on the N sub-images according to N relative positions to obtain the target image, and the deconvolution layer comprises the following steps:
and for all the sub-images of the channels of the original image, the deconvolution layer carries out merging operation on the 3N sub-images according to N relative positions to obtain the target image.
8. The image interpolation method according to claim 1, wherein the image format of the original image is YUV420;
the convolution layer carries out convolution processing on the original image according to the scaling ratio and outputs N sub-images, and the convolution layer comprises the following steps:
for the Y channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling ratio and outputs N sub-images;
for the U channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling ratio and outputs N sub-images;
for the V channel of the original image, the convolution layer carries out convolution processing on the channel image according to the scaling ratio and outputs N sub-images;
the deconvolution layer performs a merging operation on the N sub-images according to N relative positions to obtain the target image, and the deconvolution layer comprises the following steps:
for the Y channel of the original image, the deconvolution layer carries out merging operation on the N sub-images according to N relative positions to obtain a second intermediate image;
for the U channel of the original image, the deconvolution layer carries out merging operation on the N sub-images according to N relative positions to obtain a third intermediate image;
for the V channel of the original image, the deconvolution layer carries out merging operation on the N sub-images according to N relative positions to obtain a fourth intermediate image;
and merging the second intermediate image, the third intermediate image and the fourth intermediate image to obtain the target image.
9. The image interpolation method of claim 1, wherein the scaling is in fractional form.
10. An image interpolation apparatus, comprising:
an input module configured to input an original image and a scale into a deep learning hardware inference accelerator, wherein the deep learning hardware inference accelerator comprises a convolution layer and a deconvolution layer; the deep learning hardware reasoning accelerator is used for scaling the original image according to the scaling ratio so as to obtain a target image;
a processing module configured to perform convolution processing on the original image by the convolution layer according to the scaling ratio and output N sub-images, wherein N is an integer greater than 1; the N sub-images respectively store pixel values of target pixels at N relative positions in the target image;

a merging module configured to perform a merging operation on the N sub-images according to the N relative positions by the deconvolution layer to obtain the target image;
wherein the processing module is configured to, before the convolution layer performs convolution processing on the original image according to the scaling ratio and outputs N sub-images, determine the number of channels of the target convolution kernel, the size of each channel target convolution kernel, the weight parameter of each channel target convolution kernel and the position corresponding to each channel target convolution kernel according to the scaling, and combine all the channel target convolution kernels according to the position corresponding to each channel target convolution kernel to obtain the weight parameters of the convolution layer; wherein the number of channels of the target convolution kernel is N;
the determining, according to the scaling, the number of channels of the target convolution kernel, the size of each channel target convolution kernel, the weight parameter of each channel target convolution kernel, and the position corresponding to each channel target convolution kernel includes: determining the number of channels of the initial convolution kernel, the size of each channel initial convolution kernel, the weight parameter of each channel initial convolution kernel and the position corresponding to each channel initial convolution kernel according to the scaling, wherein the number of channels of the initial convolution kernel is N; determining the size of each channel target convolution kernel according to the size of each channel initial convolution kernel and the position corresponding to each channel initial convolution kernel, wherein the size of each channel target convolution kernel is larger than the size of each channel initial convolution kernel; and for each channel of the initial convolution kernel, converting the initial convolution kernel of the channel into the target convolution kernel of the channel according to the size of the initial convolution kernel of the channel, the size of the target convolution kernel of the channel and the position corresponding to the initial convolution kernel of the channel, wherein the position corresponding to the target convolution kernel of the channel is the same as the position corresponding to the initial convolution kernel of the channel;
the determining the channel number of the initial convolution kernel according to the scaling comprises the following steps: determining the relative positions of target pixels in the target image and original pixels in the original image according to the scaling; classifying all the target pixels according to the relative positions of the target pixels and the original pixels, and determining the category number of the target pixels; and determining the channel number of the initial convolution kernel according to the category number of the target pixel.
11. An electronic device comprising a memory and a processor, the memory for storing a computer program executable by the processor; the processor is configured to execute a computer program in the memory to implement the method of any one of claims 1-9.
12. A computer readable storage medium having stored thereon a computer program, characterized in that the method according to any of claims 1-9 is enabled when the executable computer program in the storage medium is executed by a processor.
CN202311127116.7A 2023-09-04 2023-09-04 Image interpolation method, device, electronic equipment and storage medium Active CN116843555B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311127116.7A CN116843555B (en) 2023-09-04 2023-09-04 Image interpolation method, device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN116843555A CN116843555A (en) 2023-10-03
CN116843555B true CN116843555B (en) 2023-12-19

Family

ID=88171106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311127116.7A Active CN116843555B (en) 2023-09-04 2023-09-04 Image interpolation method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116843555B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932437B (en) * 2020-10-10 2021-03-05 深圳云天励飞技术股份有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
US20220405881A1 (en) * 2021-06-15 2022-12-22 Singulos Research Inc. Method and apparatus for efficient non-integer scaling in neural network accelerators

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11308361B1 (en) * 2017-07-07 2022-04-19 Twitter, Inc. Checkerboard artifact free sub-pixel convolution
CN111429347A (en) * 2020-03-20 2020-07-17 长沙理工大学 Image super-resolution reconstruction method and device and computer-readable storage medium
CN111915603A (en) * 2020-08-14 2020-11-10 东北大学 Artificial intelligence prediction method for noise-free phase diagram in noise-containing EBSD data
CN112215745A (en) * 2020-09-30 2021-01-12 深圳云天励飞技术股份有限公司 Image processing method and device and electronic equipment
CN113870293A (en) * 2021-09-27 2021-12-31 东莞拓斯达技术有限公司 Image processing method, image processing device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Hybrid rational image interpolation algorithm based on the Sobel operator; Yan Zhong; Journal of Chongqing University of Science and Technology (Natural Science Edition), No. 02; full text *
Super-resolution reconstruction of field rapeseed flower images based on a fast convolutional neural network; Yang Yuhui; Liu Changhua; Journal of Jingchu University of Technology, No. 03; full text *
CT image super-resolution reconstruction network combined with deconvolution; Xu Jun; Liu Hui; Guo Qiang; Zhang Caiming; Journal of Computer-Aided Design & Computer Graphics, No. 11; full text *

Also Published As

Publication number Publication date
CN116843555A (en) 2023-10-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant