CN116402684A

CN116402684A - Super resolution method, apparatus and computer storage medium for image

Info

Publication number: CN116402684A
Application number: CN202310333426.8A
Authority: CN
Inventors: 朱丹; 高艳
Original assignee: BOE Technology Group Co Ltd
Current assignee: BOE Technology Group Co Ltd
Priority date: 2023-03-30
Filing date: 2023-03-30
Publication date: 2023-07-07

Abstract

The application discloses a super-resolution method, super-resolution equipment and a computer storage medium for images, and belongs to the field of image processing. The method comprises the following steps: acquiring an image to be processed; inputting the image to be processed into a superdivision model; and acquiring a processed image output by the super-division model, wherein the resolution of the processed image is larger than that of the image to be processed. According to the method, the image to be processed is input into the superdivision model, the superdivision model comprises the feature extraction module, the channel splitting module, the display lookup tables and the up-sampling module which are sequentially connected, the channel splitting module can split the channel feature images output by the feature extraction module and then input the display lookup tables respectively, the display lookup tables process the channel feature images respectively and output the processed channel feature images, and therefore a feature reconstruction module comprising a convolution layer is not required to be arranged, the problem that the time consumption of a superresolution method of the image in the related technology is long is solved, and the time consumption shortening effect is achieved.

Description

Super resolution method, apparatus and computer storage medium for image

Technical Field

The present invention relates to the field of image processing, and in particular, to a super-resolution method, apparatus, and computer storage medium for an image.

Background

The super-resolution technique of an image is a technique of processing an image of low resolution into an image of high resolution.

In the super-resolution method of the image, the image to be processed is input into a super-division model, the super-division model comprises a feature extraction module, a feature reconstruction module and an up-sampling module, wherein the feature reconstruction module can comprise a plurality of convolution layers which are sequentially arranged, a plurality of channel feature images output by the feature extraction module can be processed, and the processed channel feature images are input into the up-sampling module.

However, in the above super-resolution model, the calculation amount of a plurality of convolution layers in the feature reconstruction module is large, which results in a large calculation amount and a long time consumption of the super-resolution method of the image.

Disclosure of Invention

The embodiment of the application provides a super-resolution method, device and computer storage medium of an image. The technical scheme is as follows:

according to a first aspect of the present application, there is provided a super resolution method of an image, the method comprising:

acquiring an image to be processed;

inputting the image to be processed into a superdivision model, wherein the superdivision model comprises a feature extraction module, a channel splitting module, a plurality of display lookup tables and an up-sampling module which are sequentially connected, the feature extraction module is used for carrying out feature extraction processing on the image to be processed to obtain a plurality of channel feature images, the channel splitting module is used for splitting the plurality of channel feature images and then respectively inputting the plurality of display lookup tables into the plurality of display lookup tables, the plurality of display lookup tables are used for respectively processing the plurality of channel feature images and outputting a plurality of processed channel feature images, the up-sampling module is used for up-sampling the plurality of processed channel feature images and outputting the superdivision model, and the plurality of display lookup tables are obtained by supervised training of the superdivision model;

And acquiring a processing image output by the superdivision model, wherein the resolution of the processing image is larger than that of the image to be processed.

Optionally, the processing the plurality of channel feature maps respectively and outputting the processed plurality of channel feature maps includes:

respectively carrying out parallel processing on the plurality of channel feature graphs;

and outputting the processed multiple channel feature maps in parallel.

Optionally, before the image to be processed is input into the super-division model, the method further includes:

acquiring a to-be-trained superdivision model, wherein the to-be-trained superdivision model comprises a to-be-trained feature extraction module, the channel splitting module, a plurality of to-be-trained display lookup tables and the up-sampling module which are connected in sequence;

performing multiple times of cyclic training on the superdivision model to be trained;

stopping the cyclic training in response to reaching a training cut-off condition after the nth cyclic training in the cyclic training;

determining the superdivision model based on n times of the cyclic training;

wherein, the one-time cyclic training includes:

inputting a first image in a training sample in a training set into the super-score model to be trained, wherein the training set comprises a plurality of training samples, the training samples comprise a first image and a second image, and the second image is a true value image with resolution ratio larger than that of the first image;

Acquiring a training processing image output by the superdivision model to be trained;

acquiring a loss value between the training processing image and the second image;

and adjusting the feature extraction module to be trained and the display lookup tables to be trained based on the loss value.

Optionally, the acquiring a loss value between the training processing image and the second image includes:

acquiring a loss value between the training processing image and the second image, at least one of a first loss value and a second loss value of the loss values, the first loss value and the second loss value comprising:

wherein Loss1 is the first Loss value, loss2 is the second Loss value, C is the number of channels of the first image, H is the height of the second image, W is the width of the second image, i and j are pixel coordinates, y _i，j，n For the nth channel in the second image, the value of the pixel with coordinates (i, j), f (x) _i，j，n ) For the nth channel in the training process image, the value of the pixel with coordinates (i, j) is calculated.

Optionally, the determining the superscore model based on the n times of the cyclic training includes:

acquiring image similarity corresponding to a superdivision model to be trained in each cycle training in n times of cycle training, wherein the image similarity is the similarity between the training processing image and the second image;

And determining the superscore model to be trained in the x-th cycle training as the superscore model, wherein the superscore model to be trained in the x-th cycle training is the superscore model to be trained with the maximum corresponding image similarity in the n-time cycle training.

Optionally, the training cutoff condition includes at least one of a number of times of cyclic training reaching a specified value, and a similarity between the training processed image and the second image reaching a specified value.

Optionally, the plurality of channel feature maps comprises a plurality of feature map sets, each of the feature map sets comprising at least one of the channel feature maps;

the plurality of display lookup tables respectively correspond to the plurality of feature map groups, and the channel splitting module is used for inputting the feature map groups into the corresponding display lookup tables.

Optionally, the upsampling module includes a recombination-based upsampling operator.

Optionally, the upsampling module further includes a cutoff layer, where the cutoff layer is located at an input end of the resampling-based upsampling operator, and is configured to limit values of the processed plurality of channel feature maps to a preset range and output the values to the reorganizing-based upsampling operator.

Optionally, the method is used for a terminal, the terminal comprises a first processor and a neural network processor, and the superdivision model is located in the neural network processor;

the inputting the image to be processed into the super-division model comprises the following steps:

acquiring the image to be processed through the first processor;

and importing the image to be processed into the neural network processor from the first processor, and inputting the image to be processed into the super-division model by the neural network processor.

Optionally, the terminal further includes a display, and after the processing image output by the superdivision model is acquired, the method further includes:

and importing the processed image into the first processor from the neural network processor, wherein the first processor is used for controlling the display to display the processed image.

Optionally, the method is used for a terminal, the terminal comprises a first processor and a graphics processor, and the superdivision model is located in the graphics processor;

acquiring the image to be processed through the first processor;

and importing the image to be processed into the graphic processor from the first processor, and inputting the image to be processed into the super-division model by the graphic processor.

According to another aspect of the embodiments of the present application, there is provided a super-resolution apparatus of an image, including a processor and a memory, in which at least one instruction, at least one program, a code set, or an instruction set is stored, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by the processor to implement a super-resolution method of an image as described above.

According to another aspect of the embodiments of the present application, there is provided a computer storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by a processor to implement a super resolution method of an image as described above.

According to another aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternative implementations described above.

According to another aspect of the embodiments of the present application, there is provided a super-resolution apparatus of an image, the super-resolution apparatus of an image including:

the image acquisition module is used for acquiring an image to be processed;

the image input module is used for inputting the image to be processed into a super-division model, the super-division model comprises a feature extraction module, a channel splitting module, a plurality of display lookup tables and an up-sampling module which are connected in sequence, the feature extraction module is used for carrying out feature extraction processing on the image to be processed so as to obtain a plurality of channel feature images, the channel splitting module is used for splitting the plurality of channel feature images and then respectively inputting the plurality of display lookup tables, the plurality of display lookup tables are used for respectively processing the plurality of channel feature images and outputting a plurality of processed channel feature images, the up-sampling module is used for up-sampling the plurality of processed channel feature images and outputting the super-division model, and the plurality of display lookup tables are obtained by supervised training of the super-division model;

and the output module is used for acquiring a processing image output by the super-division model, and the resolution of the processing image is larger than that of the image to be processed.

Optionally, the image input module includes:

the parallel processing unit is used for respectively carrying out parallel processing on the plurality of channel feature graphs;

and the parallel output unit is used for outputting the processed multiple channel feature graphs in parallel.

Optionally, the super-resolution device of the image further includes:

the model acquisition module is used for acquiring a to-be-trained superdivision model, and the to-be-trained superdivision model comprises a to-be-trained feature extraction module, the channel splitting module, a plurality of to-be-trained display lookup tables and the up-sampling module which are connected in sequence;

the training module is used for carrying out multiple-time cyclic training on the superscore model to be trained;

the stopping module is used for stopping the cyclic training in response to the condition that the training cut-off condition is reached after the nth cyclic training in the cyclic training;

the determining module is used for determining the superdivision model based on n times of cyclic training;

wherein, training module includes:

the input unit is used for inputting a first image in a training sample in a training set into the super-score model to be trained, the training set comprises a plurality of training samples, the training samples comprise a first image and a second image, and the second image is a true value image with resolution ratio larger than that of the first image;

The image acquisition unit is used for acquiring a training processing image output by the superdivision model to be trained;

a loss acquisition unit configured to acquire a loss value between the training processing image and the second image;

and the adjusting unit is used for adjusting the feature extraction module to be trained and the display lookup tables to be trained based on the loss value.

Optionally, the loss acquisition unit is configured to:

Optionally, the determining module is configured to:

the image input module is used for:

acquiring the image to be processed through the first processor;

Optionally, the terminal further includes a display, and after the processed image output by the super-resolution model is acquired, the super-resolution device of the image further includes:

the importing module is used for importing the processed image into the first processor from the neural network processor, and the first processor is used for controlling the display to display the processed image.

the image input module is used for:

acquiring the image to be processed through the first processor;

The beneficial effects that technical scheme that this application embodiment provided include at least:

the image to be processed is input into the super-division model, the super-division model comprises a feature extraction module, a channel splitting module, a plurality of display lookup tables and an up-sampling module which are sequentially connected, the channel splitting module can split a plurality of channel feature images output by the feature extraction module and then input the display lookup tables respectively, the display lookup tables respectively process the channel feature images and output the processed channel feature images so as to be processed by the up-sampling module and output the super-division model, and therefore, a feature reconstruction module comprising a convolution layer is not required to be set.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a superdivision model;

fig. 2 is a block diagram of a terminal in an embodiment of the present application;

FIG. 3 is a block diagram of another terminal in an embodiment of the present application;

FIG. 4 is a method flow chart of a super resolution method for an image according to an embodiment of the present application;

FIG. 5 is a schematic diagram of the structure of a superdivision model in the embodiment shown in FIG. 4;

FIG. 6 is a method flow diagram of another method for super resolution of an image provided by an embodiment of the present application;

FIG. 7 is a schematic flow chart of one of the methods of FIG. 6 incorporated into a terminal;

FIG. 8 is a schematic structural diagram of a superdivision model according to an embodiment of the present application;

FIG. 9 is a flowchart of a method for obtaining a superscore model in an embodiment of the present application;

FIG. 10 is a training flow diagram of the method of FIG. 9;

FIG. 11 is a method flow chart of another super resolution method for an image provided by an embodiment of the present application;

FIG. 12 is a schematic flow chart of one of the methods shown in FIG. 11 incorporated into a terminal;

FIG. 13 is a schematic view of the processing effect of a compounder according to the application embodiment;

fig. 14 is a schematic view of an application scenario of the super resolution method of an image provided in the embodiment of the present application;

Fig. 15 is a block diagram of a super-resolution device for an image according to an embodiment of the present application.

Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Super-Resolution (Super-Resolution) technology of an image is a technology for improving Resolution of an original image by means of hardware or software.

In the super resolution method of an image, an image to be processed is input into a super division model, as shown in fig. 1, fig. 1 is a schematic structural diagram of the super division model, the super division model includes a feature extraction module 11, a feature reconstruction module 12 and an up-sampling module 13, wherein the feature extraction module 11 and the feature reconstruction module 12 may each include a plurality of sequentially connected convolution layers (conv) sequentially arranged, the up-sampling module 13 may include sequentially connected up-sampling operators 131 and a truncation layer 132, the feature extraction module 11 may be used to process the image to be processed to obtain a plurality of channel feature maps, the feature reconstruction module 12 may process the plurality of channel feature maps output by the feature extraction module 11, and input the processed channel feature maps into the up-sampling module 13, and the up-sampling module 13 is used to obtain a processed image with high resolution (resolution is higher than the image to be processed) based on the channel feature maps processed by the feature reconstruction module 12.

However, in the above super-resolution model, the calculation amount of the plurality of convolution layers in the feature reconstruction module is large, which results in the calculation amount of the super-resolution method of the image being large, and may result in the time consumption of the super-resolution method being long.

In addition, the upsampling module in the above super-division model may also cause the processing speed of the super-division model to be slower.

The super resolution method provided in the embodiment of the present application may be applied to a terminal, and fig. 2 is a block diagram of a structure of a terminal in the embodiment of the present application, where the terminal 20 may include a first processor 21, a graphics processor 22, and a display 23.

The first processor 21 may comprise a central processor (Central Processing Unit, CPU), the first processor 21 may control the terminal as a whole, the graphics processor (Graphics Processing Unit, GPU) 22 may be used to process some images and the display 23 may be used to display images. Under such a configuration, the superdivision model according to the embodiments of the present application may be deployed in the graphics processor 22.

In addition, the terminal may further include a decoding module 24, and the decoding module 24 may be used to decode image data acquired from the outside of the terminal.

Fig. 3 is a block diagram of another terminal in an embodiment of the present application, the terminal 20 may include a first processor 21, a neural network processor (Neuralnetworks Process Units, NPU) 25, and a display 23.

The first processor 21 may include a central processing unit (Central Processing Unit, CPU), the first processor 21 may perform overall control on the terminal, the neural network processor 25 is constructed to simulate a biological neural network, the CPU and the GPU need to perform neuronal processing with thousands of instructions, the NPU can perform only one or a few, and the neural network processor 25 has significant advantages in terms of deep learning processing efficiency. The display 23 may be used to display images. With this structure, the superdivision model according to the embodiment of the present application may be deployed in the neural network processor 25.

Fig. 4 is a flowchart of a method for providing super resolution of an image according to an embodiment of the present application, fig. 5 is a schematic structural diagram of a super resolution model in the embodiment shown in fig. 4, please refer to fig. 4 and fig. 5, the method may be applied to the terminal shown in fig. 2 or fig. 3, and the method may include the following steps:

step 401, acquiring an image to be processed.

Step 402, inputting the image to be processed into the superdivision model.

The super-division model comprises a feature extraction module 41, a channel splitting module 42, a plurality of display lookup tables 43 and an up-sampling module 44 which are sequentially connected, wherein the feature extraction module 41 is used for carrying out feature extraction processing on an image to be processed to obtain a plurality of channel feature images, the channel splitting module 42 is used for splitting the plurality of channel feature images and then respectively inputting the plurality of channel feature images into the plurality of display lookup tables 43, the plurality of display lookup tables 43 are used for respectively processing the plurality of channel feature images and outputting the processed plurality of channel feature images, the up-sampling module 44 is used for up-sampling the processed plurality of channel feature images and outputting the super-division model, and the plurality of display lookup tables 43 are obtained by the super-division model through supervised training.

Step 403, obtaining a processed image output by the super-division model, wherein the resolution of the processed image is larger than that of the image to be processed.

In summary, in the super-resolution method for an image provided in the embodiment of the present application, an image to be processed is input into a super-resolution model, where the super-resolution model includes a feature extraction module, a channel splitting module, a plurality of display lookup tables and an up-sampling module that are sequentially connected, the channel splitting module may split a plurality of channel feature maps output by the feature extraction module and then input the plurality of display lookup tables respectively, the plurality of display lookup tables process the plurality of channel feature maps respectively, and output the processed plurality of channel feature maps, so that the up-sampling module processes the processed plurality of channel feature maps and then outputs the super-resolution model, thereby eliminating the need to set a feature reconstruction module including a convolution layer.

Fig. 6 is a flowchart of a method of super resolution of another image according to an embodiment of the present application, and fig. 7 is a flowchart of a method of combining the method of fig. 6 with a terminal. The method can be applied to the terminal shown in fig. 2, please refer to fig. 6 and fig. 7, and the method can include the following steps:

Step 601, acquiring an image to be processed through a first processor.

When the method provided by the embodiment of the application is applied, the image to be processed can be acquired through the first processor in the terminal.

For example, the image data (may be image data in a video stream) may be acquired by a decoding module in the terminal, and the image data is decoded by the decoding module to obtain an image to be processed, and the image to be processed is transmitted to the first processor.

Step 602, importing an image to be processed into a graphics processor from a first processor, and inputting the image to be processed into a super-division model by the graphics processor.

In the embodiment of the application, the hyper-division model is deployed in the graphic processor, so that the image to be processed can be imported into the graphic processor from the first processor, and the image to be processed is imported into the hyper-division model by the graphic processor.

Fig. 8 is a schematic structural diagram of a superdivision model in the embodiment of the present application, where the superdivision model may include a feature extraction module 81, a channel splitting module 82, a plurality of display look-up tables (L1, L2, L3, and L4) and an up sampling module 83 that are sequentially connected, the feature extraction module 81 is configured to perform feature extraction processing on an image to be processed to obtain a plurality of channel feature maps, the feature extraction module 81 may include at least one convolution layer (exemplary, a convolution kernel may be 3×3), the channel splitting module 82 is configured to split the plurality of channel feature maps (the channel feature maps may be related to the number of channels of the image to be processed, the exemplary image to be processed is an RGB three-channel image, and then the channel feature maps are also three-channel feature maps) and input into the plurality of display look-up tables (L1, L2, L3, and L4) respectively, and output the processed plurality of channel feature maps t1, and the up sampling module 83 is configured to sample the plurality of channel feature maps and output the model. The display lookup tables (L1, L2, L3 and L4) are obtained by supervised training of the superdivision model, and the specific training process can refer to the subsequent embodiment. Wherein the splitting module can split the plurality of channel feature graphs into s ² The feature map group s can be the multiple of the height and width of the processed image processed by the super-division model and the height and width of the resolution of the image to be processed, and the number of the display lookup tables is s ² And each.

Illustratively, if the height and width (e.g., 4×4) of the processed image is 2 times the height and width (e.g., 2×2) of the image to be processed, the feature map set may be 4, and the number of the display lookup tables is also 4.

Optionally, the plurality of channel feature maps includes a plurality of feature map groups, each feature map group includes at least one channel feature map, the plurality of display lookup tables (L1, L2, L3, and L4) respectively correspond to the plurality of feature map groups, and the channel splitting module is configured to input the feature map groups into the corresponding display lookup tables. The super-division model shown in fig. 8 is one in which the plurality of channel feature maps include 4 feature map groups, the 4 feature map groups correspond to 4 display look-up tables (L1, L2, L3, and L4), respectively, a first feature map group of the 4 feature map groups includes channel feature maps f1, f2, f3, a second feature map group includes channel feature maps f4, f5, f6, a third feature map group includes channel feature maps f7, f8, f9, and a fourth feature map group includes channel feature maps f10, f11, f12.

In an exemplary embodiment, the plurality of display look-up tables may process the plurality of channel feature maps in parallel, respectively, and output the processed plurality of channel feature maps in parallel. Thus, the image processing speed of the super-division model can be improved.

In an exemplary embodiment, the upsampling module 83 includes a recombination-based upsampling operator 831. The resampling-based upsampling operator 831 is an upsampling operator that does not change parameters in the channel feature map, but only reorganizes the channel feature map, and illustratively, the resampling-based upsampling operator 831 may include PixelShuffle, depth _to_space and a combiner (Muxer Layer, ML) for combining every n×n feature images in the channel feature image input to the combiner into a feature image having a resolution n times that of the feature image of the input signal and outputting the feature image.

In an exemplary embodiment, the upsampling module 83 further includes a cutoff layer 832, where the cutoff layer 832 is located at an input of the recombination-based upsampling operator 831 for limiting the values of the processed plurality of channel feature maps to within a preset range and outputting to the recombination-based upsampling operator 831. The cut-off layer may be used for quantization (e.g., int8 quantization), and illustratively, the cut-off layer may constrain the output of the super-division model, e.g., such that the output maximum does not exceed 255 and the minimum does not exceed 0.

Compared with the structure that the truncated layer is arranged behind the upsampling operator in the related art, in the super-resolution model provided by the embodiment of the invention, the truncated layer 832 is positioned in front of the upsampling operator 831 based on recombination, and then the truncated layer 832 can carry out the truncated processing on a plurality of channel feature graphs in parallel, so that the processing speed of the super-resolution model can be improved, and the processing efficiency of the super-resolution method of the image can be improved. In addition, because the up-sampling operator based on recombination is adopted, the up-sampling operator based on recombination can not change the numerical value of pixels, but only remodel the pixels, and further, the precision loss during model quantization can not be caused when the truncated layer is arranged before the up-sampling operator based on recombination.

Illustratively, the test chip is: gao Tongxiao Dragon 760G for superprocessing:

in one test, the resolution of the image to be processed is 640x360, and the resolution of the processed image is: 1920x1080, the time spent processing with the superscore model shown in fig. 1 is: the time spent processing with the superdivision model shown in fig. 8 in the embodiment of the present application is 37.1 ms: 31.1 milliseconds, the time taken is reduced by 6 milliseconds.

In another test, the time spent processing with the superdivision model shown in fig. 1 is: 59 milliseconds, the time spent processing with the superdivision model shown in fig. 8 in the embodiment of the present application is: 20.1 milliseconds, which would take up to 38.9 milliseconds.

In an exemplary embodiment, the display lookup table may be a three-dimensional display lookup table (3D-LUT), and the three-dimensional display lookup table may correspondingly find the corresponding output value according to RGB (red, green, blue) values of the input image. Compared with a one-dimensional (1D) display lookup table, which can only control single-channel color output, the output of each channel is independent, the output of the three-dimensional display lookup table has correlation with RGB three channels, and the three-dimensional display lookup table has huge capacity, for example, the 64-order lookup table has more than 26 ten thousand color output values (the one-dimensional lookup table has only 765 color output values), so that the output of color brightness is more accurate and real, and the high-capacity lookup table data can also store subjective information such as brightness, color, detail preference and the like, so that the brightness, color correction and reduction processing task of an image can be realized by using the three-dimensional display lookup table.

Because the three-dimensional display lookup table is a large-capacity numerical matrix, and the process of calculating output can be conducted slightly (a tri-linear interpolation formula can be directly deduced), the numerical values in the three-dimensional display lookup table can be obtained in a parameter training mode.

And 603, rendering the processed image output by the super-division model through a graphic processor to obtain a rendered image.

In the super-resolution method of the image, the super-division model is deployed in the graphic processor, so that after the graphic processor obtains the processing image with improved resolution through the super-division model, the graphic processor can continue to render the processing image to obtain a rendered image.

Step 604, the rendered image is input to a display for display.

After the rendered image is acquired, the rendered image may be input to a display for display by a graphics processor. Therefore, the super-resolution processing and displaying of the image to be processed are realized.

In addition, referring to fig. 7, it can be seen that the decoding module decodes, the step of importing the image to be processed into the Graphics Processor (GPU) from the first processor (CPU), the step of processing the channel feature map by the feature extraction module and the channel splitting module, the step of processing the channel feature map by the display lookup table, and the step of rendering the graphics processor may be multithreaded steps of the pipeline, and may be implemented in parallel in the multithreaded queue 1 respectively. Illustratively, when the graphics processor renders the 10 th frame of image data, the display lookup table is processing the 11 th frame of image data, the feature extraction module and the channel splitting module are processing the 12 th frame of image data, the first processor is importing the 13 th frame of image to be processed into the graphics processor, and the decoding module is decoding the 14 th frame of image data, so that the super resolution method of the image processes one frame of image for the time length consumed by the longest part of the 5 parts.

Therefore, the processing efficiency of the super-resolution method provided by the embodiment of the application in super-division processing can be improved in a multi-thread mode. The upsampling module is not shown in fig. 7, but is not limited thereto by the embodiments of the present application, and may be located between the display look-up table and the GPU rendering.

In addition, the plurality of display look-up tables (L1, L2, L3, L4) may process the plurality of channel feature maps in parallel, respectively, in the multithreaded queue 2, and output the processed plurality of channel feature maps in parallel. Thus, the image processing speed of the super-division model can be improved.

The superscore model according to the embodiments of the present application may be obtained through training, and exemplary, fig. 9 is a flowchart of a method for obtaining the superscore model according to the embodiments of the present application, fig. 10 is a training flowchart of the method shown in fig. 9, and referring to fig. 9 and fig. 10, the method may include the following steps:

step 801, obtaining a superscore model to be trained.

The structure of the superdivision model to be trained may be similar to that of the superdivision model in the above embodiment, and for example, the structure of the superdivision model to be trained may refer to the superdivision model shown in fig. 5.

The method provided by the embodiment of the application can be applied to the training device, and the training device can be the terminal related in the embodiment, or can be a device for training, such as a server, and the embodiment of the application is not limited to this.

Step 802, performing multiple times of cyclic training on the superdivision model to be trained.

In the embodiment of the application, the training device can perform multiple-cycle training on the superdivision model to be trained.

Wherein, the one-time cyclic training may include:

1) And inputting a first image in a training sample in the training set into the superscore model to be trained.

The training set includes a plurality of training samples, each training sample includes a first image and a second image, the second image is a true image with a resolution greater than that of the first image, and for example, the second image may be subjected to a degradation process to reduce the resolution of the image, so as to obtain the first image, so that the second image becomes the true image of the first image. Of course, the training samples may be obtained in other manners, which are not limited in this embodiment of the present application.

2) And obtaining a training processing image output by the superdivision model to be trained.

The to-be-trained superdivision model can process the first image and output a processed training processing image, wherein the training processing image is an image with increased resolution after the superdivision processing of the to-be-trained superdivision model.

3) A loss value between the training process image and the second image is acquired.

The training device may obtain a loss value between the training process image and the second image.

In an exemplary embodiment, the loss value is at least one of a first loss value and a second loss value, the first loss value and the second loss value comprising:

wherein Loss1 is a first Loss value, loss2 is a second Loss value, C is the number of channels of the first image, H is the height of the second image, W is the width of the second image, i and j are pixel coordinates, y _i，j，n For the nth channel in the second image, coordinates (i, j)Values of pixels, x _i，j，n For the nth channel in the first image, the value of the pixel with coordinates (i, j), f (x) _i，j，n ) For the n-th channel in the training process image, the value of the pixel with coordinates (i, j) is calculated.

Of course, the loss value may also include other determination manners, which are not limited by the embodiment of the present application.

4) And adjusting the feature extraction module to be trained and a plurality of display lookup tables to be trained based on the loss value.

After the training device obtains the loss value, the feature extraction module to be trained and the display lookup tables (L1 ', L2', L3', L4') to be trained can be adjusted (parameters of the convolution layer in the display lookup tables to be trained and the feature extraction module to be trained are adjusted) based on the loss value.

And 803, stopping the cyclic training in response to the condition that the training cut-off condition is reached after the nth cyclic training in the cyclic training.

The training device may stop cycling training after reaching the training cutoff condition. For example, the training apparatus may determine whether the training cutoff condition is reached after each cycle training, and stop the cycle training after the training cutoff condition is reached.

Optionally, the training cutoff condition includes at least one of a number of times of cyclic training reaching a specified value, and a similarity between the training processing image and the second image reaching the specified value. The similarity between the training process image and the second image may be determined by a variety of parameters, for example, the similarity may include a peak signal-to-noise ratio (Peak Signal to Noise Ratio, PSNR).

Illustratively, the training cutoff condition includes a number of times of cyclic training reaching 400 times, and the training apparatus may stop the cyclic training after 400 times of cyclic training.

And 804, determining a superdivision model based on n times of cyclic training.

After the training device completes n times of cyclic training, the superscore model may be determined based on the parameters acquired by the n times of cyclic training.

In an exemplary embodiment, step 804 may include:

1) And acquiring the image similarity corresponding to the superdivision model to be trained in each cycle training in n times of cycle training, wherein the image similarity is the similarity between the training processing image and the second image.

The training device can acquire the image similarity corresponding to the superscore model to be trained in each cycle training in n times of cycle training, wherein the image similarity is the similarity between the training processing image and the second image. The similarity may be a peak signal to noise ratio.

2) And determining the superdivision model to be trained in the x-th cycle training as the superdivision model.

The training device can determine the superscore model to be trained in the x-th cycle training as the superscore model, and the superscore model to be trained in the x-th cycle training is the superscore model to be trained with the maximum corresponding image similarity in the n-time cycle training. x is less than or equal to n.

For example, n=400, where n=400, the training device may obtain the similarity between the training processed image and the second image in each cycle of the 400 cycles of training, and total 400 similarities, where the maximum similarity is the similarity corresponding to the 350 th cycle of training, and x=350, where the training device may determine the superscore model to be trained for generating the training processed image in the 350 th cycle of training as the superscore model after training is completed.

In an exemplary embodiment, the method for obtaining the superscore model provided in the embodiment of the present application may be performed before step 401 or step 402 in the embodiment shown in fig. 4, or may be performed before step 601 or step 602 in the embodiment shown in fig. 6.

Fig. 11 is a flowchart of a method of super resolution of another image according to an embodiment of the present application, and fig. 12 is a flowchart of a method of combining the method shown in fig. 11 with a terminal, where the method can be applied to the terminal shown in fig. 3, please refer to fig. 11 and fig. 12, and the method can include the following steps:

step 1101, obtaining an image to be processed by a first processor.

Step 1102, the image to be processed is imported into the neural network processor from the first processor, and the neural network processor inputs the image to be processed into the superdivision model.

In the embodiment of the application, the super-division model is deployed in the neural network processor, so that the image to be processed can be imported into the neural network processor from the first processor, the neural network processor inputs the image to be processed into the super-division model, and the super-division model outputs the processed image with increased resolution. The neural network processor has remarkable advantages in the aspect of the processing efficiency of deep learning, so that the calculated amount can be reduced by deploying the super-resolution model in the neural network processor, the running speed of the super-resolution model can be accelerated, and the processing efficiency of the super-resolution method of the image provided by the embodiment of the application can be improved.

The superdivision model may refer to fig. 8, where the superdivision model may include a feature extraction module 81, a channel splitting module 82, a plurality of display look-up tables (L1, L2, L3, and L4), and an up-sampling module 83, which are sequentially connected, the feature extraction module 81 is configured to perform feature extraction processing on an image to be processed to obtain a plurality of channel feature maps, the channel splitting module 82 is configured to split the plurality of channel feature maps, input the plurality of display look-up tables (L1, L2, L3, and L4) respectively, and the plurality of display look-up tables (L1, L2, L3, and L4) are configured to process the plurality of channel feature maps respectively, output the processed plurality of channel feature maps, and the up-sampling module 83 is configured to up-sample the processed plurality of channel feature maps, and output the superdivision model. The display lookup tables (L1, L2, L3 and L4) are obtained by supervised training of the superdivision model, and the specific training process can refer to the subsequent embodiment.

In an exemplary embodiment, the upsampling module 83 includes a recombination-based upsampling operator 831. The resampling-based upsampling operator 831 is an upsampling operator that does not change parameters in the channel feature map, but only reorganizes the channel feature map, and illustratively, the resampling-based upsampling operator 831 may include PixelShuffle, depth _to_space and a combiner (Muxer Layer, ML) for combining every n×n feature images in the channel feature image input to the combiner into a feature image having a resolution n times that of the feature image of the input signal and outputting the feature image. Fig. 13 is a schematic diagram of the processing effect of the compounder according to the application embodiment, where the compounder Mux only recombines the pixels in the multiple channel feature map t1, but does not change the values of the pixels.

In an exemplary embodiment, the upsampling module 83 further includes a cutoff layer 832, where the cutoff layer 832 is located at an input of the recombination-based upsampling operator 831 for limiting the values of the processed plurality of channel feature maps to within a preset range and outputting to the recombination-based upsampling operator 831. Compared with the structure that the truncated layer is arranged behind the upsampling operator in the related art, in the super-resolution model provided by the embodiment of the invention, the truncated layer 832 is positioned in front of the upsampling operator 831 based on recombination, and then the truncated layer 832 can carry out the truncated processing on a plurality of channel feature graphs in parallel, so that the processing speed of the super-resolution model can be improved, and the processing efficiency of the super-resolution method of the image can be improved.

In addition, in the method provided by the embodiment of the application, the calculation amount in the super processing is reduced by arranging the super division model in the neural network processor, arranging the truncated layer in front of the up-sampling operator and other modes, so that the method provided by the embodiment of the application can be conveniently applied to various terminal devices with the neural network processor, such as an intelligent terminal, a tablet computer, a television terminal, a conference all-in-one terminal and the like.

Step 1103, importing the processed image output by the super-division model into a first processor.

After the hyper-model in the neural network processor outputs the processed image, the terminal may import the processed image output by the hyper-model in the neural network processor into the first processor.

Step 1104, controlling a display to display the processed image by the first processor.

The terminal may control the display to display the processed image through the first processor. Therefore, the super-resolution processing and displaying of the image to be processed are realized.

The super-resolution method of the image provided by the embodiment of the application can be applied to various scenes, and the super-resolution model involved in the super-resolution method of the image can be combined and arranged in the terminal in a software or hardware mode.

Fig. 14 is a schematic diagram of an application scenario of the super-resolution method of an image provided in the embodiment of the present application, where a super-resolution model is respectively set in a conference terminal 1 of a conference person 1 and in a conference terminal 2 of a conference person 2, where each of the conference terminal 1 and the conference terminal 2 includes a degradation processing module and the super-resolution model provided in the embodiment of the present application, and each conference terminal processes data (video data or image data) through the degradation processing module when sending the data, so as to reduce resolution and reduce bandwidth occupied by data transmission; when each conference terminal receives data (video data or image data), the data is processed through the super-division model, so that the resolution is increased, the problems of blurring, jaggy and the like of a video picture are avoided, and the user experience is improved.

Therefore, the conference terminal 1 and the conference terminal 2 can realize the effect of the video conference with higher image quality with smaller bandwidth, reduce the possibility of video conference blocking and greatly improve the user experience.

In addition, the super-resolution method of the image provided by the embodiment of the application can be further applied to various terminals (such as a mobile phone, a tablet personal computer, a smart watch, virtual reality equipment or augmented reality equipment) for displaying images and videos, and the super-resolution model can be embedded into a video board card of the terminal, and the terminal can realize a high-resolution image display effect or video display effect based on the super-resolution model after acquiring image data or video data. For example, the resolution of the image acquired by the terminal may be 960×540, and the super-resolution model provided in the embodiment of the present application may improve the resolution of the image to 4K, so as to greatly improve the look and feel of the user.

The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.

Fig. 15 is a block diagram of a super-resolution apparatus for an image according to an embodiment of the present application, and a super-resolution apparatus 1500 for an image includes:

an image acquisition module 1510 for acquiring an image to be processed;

the image input module 1520 is configured to input an image to be processed into a super-division model, where the super-division model includes a feature extraction module, a channel splitting module, a plurality of display lookup tables and an up-sampling module that are sequentially connected, the feature extraction module is configured to perform feature extraction processing on the image to be processed to obtain a plurality of channel feature graphs, the channel splitting module is configured to split the plurality of channel feature graphs and input the plurality of display lookup tables respectively, the plurality of display lookup tables are configured to process the plurality of channel feature graphs respectively and output the processed plurality of channel feature graphs, the up-sampling module is configured to up-sample the processed plurality of channel feature graphs and output the super-division model, and the plurality of display lookup tables are obtained by supervised training of the super-division model;

The output module 1530 is configured to obtain a processed image output by the super-resolution model, where a resolution of the processed image is greater than a resolution of the image to be processed.

Optionally, the image input module includes:

Optionally, the super-resolution device of the image further includes:

the model acquisition module is used for acquiring a to-be-trained superdivision model, and the to-be-trained superdivision model comprises a to-be-trained feature extraction module, a channel splitting module, a plurality of to-be-trained display lookup tables and an up-sampling module which are connected in sequence;

the stopping module is used for stopping the cyclic training in response to the condition that the nth cyclic training in the cyclic training reaches the training stop condition;

the determining module is used for determining a superdivision model based on n times of cyclic training;

wherein, training module includes:

the input unit is used for inputting a first image in training samples in a training set into a super-score model to be trained, the training set comprises a plurality of training samples, the training samples comprise a first image and a second image, and the second image is a true value image with resolution ratio larger than that of the first image;

Optionally, the loss acquisition unit is configured to:

acquiring a loss value between the training process image and the second image, at least one of a first loss value and a second loss value of the loss value, the first loss value and the second loss value comprising:

wherein Loss1 is a first Loss value, loss2 is a second Loss value, C is the number of channels of the first image, H is the height of the second image, W is the width of the second image, i and j are pixel coordinates, y _i，j，n For the nth channel in the second image, the value of the pixel with coordinates (i, j), f (x) _i，j，n ) For the n-th channel in the training process image, the value of the pixel with coordinates (i, j) is calculated.

Optionally, the determining module is configured to:

acquiring image similarity corresponding to a to-be-trained superdivision model in each cycle training in n times of cycle training, wherein the image similarity is the similarity between a training processing image and a second image;

and determining the superdivision model to be trained in the x-th cycle training as a superdivision model, wherein the superdivision model to be trained in the x-th cycle training is the superdivision model to be trained with the maximum image similarity in the n-time cycle training.

Optionally, the training cutoff condition includes at least one of a number of times of cyclic training reaching a specified value, and a similarity between the training processing image and the second image reaching the specified value.

Optionally, the plurality of channel feature maps comprises a plurality of feature map sets, each feature map set comprising at least one channel feature map;

Optionally, the upsampling module comprises a recombination-based upsampling operator.

Optionally, the upsampling module further includes a truncated layer located at an input of the recombination-based upsampling operator, for limiting values of the processed plurality of channel feature maps to a preset range, and outputting to the recombination-based upsampling operator.

an image input module for:

acquiring an image to be processed through a first processor;

and importing the image to be processed into a neural network processor from the first processor, and inputting the image to be processed into the superdivision model by the neural network processor.

the importing module is used for importing the processed image into a first processor from the neural network processor, and the first processor is used for controlling a display to display the processed image.

Optionally, the method is used for a terminal, the terminal comprises a first processor and a graphic processor, and the superdivision model is located in the graphic processor;

an image input module for:

acquiring an image to be processed through a first processor;

and importing the image to be processed into a graphic processor from the first processor, and inputting the image to be processed into the super-division model by the graphic processor.

In addition, the embodiment of the application also provides a super-resolution device of an image, which includes a processor and a memory, where at least one instruction, at least one section of program, a code set or an instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by the processor to implement the super-resolution method of the image as described above.

The embodiment of the application further provides a computer storage medium, in which at least one instruction, at least one section of program, a code set or an instruction set is stored, and the at least one instruction, the at least one section of program, the code set or the instruction set is loaded and executed by a processor to implement the super-resolution method of the image as described above.

Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternative implementations described above.

The term "at least one of a and B" in this application is merely an association relationship describing an association object, and means that three relationships may exist, for example, at least one of a and B may mean: a exists alone, A and B exist together, and B exists alone. Similarly, "at least one of A, B and C" means that there may be seven relationships, which may be represented: there are seven cases where a alone, B alone, C alone, a and B together, a and C together, C and B together, A, B and C together. Similarly, "at least one of A, B, C and D" means that there may be fifteen relationships, which may be represented: there are fifteen cases where a alone, B alone, C alone, D alone, a and B together, a and C together, a and D together, C and B together, D and B together, C and D together, A, B and C together, A, B and D together, A, C and D together, B, C and D together, A, B, C and D together.

In this application, the terms "first," "second," "third," and "fourth" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" refers to two or more, unless explicitly defined otherwise.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The foregoing description of the preferred embodiments is merely exemplary in nature and is in no way intended to limit the invention, since it is intended that all modifications, equivalents, improvements, etc. that fall within the spirit and scope of the invention.

Claims

1. A method of super resolution of an image, the method comprising:

acquiring an image to be processed;

2. The method of claim 1, wherein processing the plurality of channel feature maps, respectively, and outputting the processed plurality of channel feature maps comprises:

and outputting the processed multiple channel feature maps in parallel.

3. The method of claim 1, wherein prior to said inputting the image to be processed into the superdivision model, the method further comprises:

determining the superdivision model based on n times of the cyclic training;

wherein, the one-time cyclic training includes:

4. A method according to claim 3, wherein said obtaining a loss value between the training process image and the second image comprises:

5. A method according to claim 3, wherein said determining said hyper-score model based on n of said cyclic training comprises:

6. The method of claim 3, wherein the training cutoff condition comprises at least one of a number of cycles of training reaching a specified value, and a similarity between the training process image and the second image reaching a specified value.

7. A method according to claim 3, wherein the plurality of channel feature maps comprises a plurality of feature map sets, each feature map set comprising at least one of the channel feature maps;

8. The method of any of claims 1 to 7, wherein the upsampling module comprises a recombination-based upsampling operator.

9. The method of claim 8, wherein the upsampling module further comprises a cutoff layer at an input of the recombination-based upsampling operator for limiting values of the processed plurality of channel feature maps to within a preset range and outputting to the recombination-based upsampling operator.

10. The method of claim 9, wherein the method is for a terminal comprising a first processor and a neural network processor, the hyper-model being located in the neural network processor;

acquiring the image to be processed through the first processor;

11. The method of claim 10, wherein the terminal further comprises a display, wherein after the obtaining the processed image output by the hyper-model, the method further comprises:

12. The method of claim 9, wherein the method is for a terminal comprising a first processor and a graphics processor, the hyper-model being located in the graphics processor;

acquiring the image to be processed through the first processor;

13. A super-resolution device of an image, characterized in that the super-resolution device of an image comprises a processor and a memory, in which at least one instruction, at least one program, a set of codes or a set of instructions is stored, which is loaded and executed by the processor to implement the song recommendation method according to any one of claims 1 to 11.

14. A non-transitory computer storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions loaded and executed by a processor to implement the super resolution method of an image according to any one of claims 1 to 12.