CN112488916B - Training method and computer equipment for image super-resolution reconstruction model - Google Patents

Training method and computer equipment for image super-resolution reconstruction model

Info

Publication number
CN112488916B
CN112488916B (application CN201910866170.0A)
Authority
CN
China
Prior art keywords
image
super
scale
inputting
pixel point
Prior art date
Legal status
Active
Application number
CN201910866170.0A
Other languages
Chinese (zh)
Other versions
CN112488916A (en)
Inventor
王树朋
Current Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN201910866170.0A
Publication of CN112488916A
Application granted
Publication of CN112488916B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The application relates to a training method and computer equipment for an image super-resolution reconstruction model. The method comprises the following steps: inputting a first image and a first super-resolution scale from training data into an image super-resolution reconstruction model to generate a generated image corresponding to the first image at the first super-resolution scale, wherein the training data comprises a plurality of training image groups, each training image group comprises a first image, a second image and the first super-resolution scale, and the second image is the image corresponding to the first image at the first super-resolution scale; and adjusting model parameters of the image super-resolution reconstruction model according to the second image and the generated image until a preset training condition is met, so as to obtain the trained image super-resolution reconstruction model. During training the value of the first super-resolution scale is not restricted and can be any value, so the trained image super-resolution reconstruction model achieves super-resolution at an arbitrary scale and can meet more requirements in practical applications.

Description

Training method and computer equipment for image super-resolution reconstruction model
Technical Field
The present application relates to the technical field of image processing, and in particular to a training method and computer equipment for an image super-resolution reconstruction model.
Background
Image super-resolution reconstruction is a popular research direction in the fields of computer vision and image processing, and the development of deep learning has driven progress in image super-resolution reconstruction methods, such as EDSR (Enhanced Deep Super-Resolution network) and ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks). However, these methods share a defect: in deep-learning-based image super-resolution reconstruction, the up-sampling method that determines the super-resolution scale, such as bicubic interpolation or PixelShuffle, typically works at a fixed integer scale, for example 2x, 3x or 4x, and cannot achieve non-integer scales such as 1.5x or 2.5x, even though non-integer scales often have more applications in practice. This defect greatly limits the application of deep-learning-based image super-resolution reconstruction techniques.
Accordingly, the prior art is in need of improvement.
Disclosure of Invention
The technical problem addressed by the invention is to provide a training method and computer equipment for an image super-resolution reconstruction model that achieves super-resolution at an arbitrary scale.
In one aspect, an embodiment of the present invention provides a training method for an image super-resolution reconstruction model, including:
Inputting a first image and a first super-resolution scale from training data into an image super-resolution reconstruction model, and generating, through the image super-resolution reconstruction model, a generated image corresponding to the first image at the first super-resolution scale, wherein the training data comprises a plurality of training image groups, each training image group comprises a first image, a second image and the first super-resolution scale, and the second image is the image corresponding to the first image at the first super-resolution scale;
and adjusting model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first super-resolution scale from the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain the trained image super-resolution reconstruction model.
In a second aspect, an embodiment of the present invention provides an image super-resolution reconstruction method, where the method includes:
acquiring an image to be processed and a second super-resolution scale for the image to be processed;
inputting the image to be processed and the second super-resolution scale into a trained image super-resolution reconstruction model to obtain a super-resolution reconstructed image corresponding to the image to be processed, wherein the trained image super-resolution reconstruction model is obtained through the training method of the image super-resolution reconstruction model described above.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
inputting a first image and a first super-resolution scale from training data into an image super-resolution reconstruction model, and generating, through the image super-resolution reconstruction model, a generated image corresponding to the first image at the first super-resolution scale, wherein the training data comprises a plurality of training image groups, each training image group comprises a first image, a second image and the first super-resolution scale, and the second image is the image corresponding to the first image at the first super-resolution scale;
and adjusting model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first super-resolution scale from the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain the trained image super-resolution reconstruction model.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of:
Inputting a first image and a first super-resolution scale from training data into an image super-resolution reconstruction model, and generating, through the image super-resolution reconstruction model, a generated image corresponding to the first image at the first super-resolution scale, wherein the training data comprises a plurality of training image groups, each training image group comprises a first image, a second image and the first super-resolution scale, and the second image is the image corresponding to the first image at the first super-resolution scale;
and adjusting model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first super-resolution scale from the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain the trained image super-resolution reconstruction model.
Compared with the prior art, the embodiment of the invention has the following advantages:
according to the training method provided by the embodiment of the invention, a first image and a first super-resolution scale from the training data are input into an image super-resolution reconstruction model, and a generated image corresponding to the first image at the first super-resolution scale is generated through the model, wherein the training data comprise a plurality of training image groups, each training image group comprises a first image, a second image and the first super-resolution scale, and the second image is the image corresponding to the first image at the first super-resolution scale; model parameters of the image super-resolution reconstruction model are then adjusted according to the second image and the generated image corresponding to the first image, and the step of inputting the first image and the first super-resolution scale into the model is repeated until a preset training condition is met, yielding the trained image super-resolution reconstruction model. In this method, the coordinates of the first pixel points are obtained by rounding during training, so a coordinate mapping between the first image and the generated image can be found at a super-resolution scale of any value, and the trained model therefore places no restriction on the value of the super-resolution scale. Moreover, during training the first weight is generated for the given first super-resolution scale, so for different first super-resolution scales the weight corresponding to each scale can be generated dynamically; the trained model can thus output super-resolution results at any scale with a single model, meeting more requirements in practical applications.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a training method of an image super-resolution reconstruction model in an embodiment of the invention;
FIG. 2 is a schematic diagram of a Bayer array in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of the situation that arises, in an embodiment of the present invention, when the first super-resolution scale is not included in the first position offset and a plurality of different first super-resolution scales are trained simultaneously;
FIG. 4 is a schematic diagram illustrating a calculation of a first weight of a first pixel according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of an image super-resolution reconstruction method in an embodiment of the invention;
FIG. 6 is a schematic diagram of super-resolution effect of different scales according to an embodiment of the present invention;
fig. 7 is an internal structural diagram of a computer device in an embodiment of the present invention.
Detailed Description
In order to make the present invention better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Various non-limiting embodiments of the present invention are described in detail below with reference to the attached drawing figures.
Referring to fig. 1, a training method of an image super-resolution reconstruction model in an embodiment of the present invention is shown. In this embodiment, the method may include, for example, the steps of:
s1, inputting a first image and a first super-scale in training data into an image super-scale reconstruction model, and generating a generated image corresponding to the first image under the first super-scale through the image super-scale reconstruction model, wherein the training data comprises a plurality of training image groups, each training image group comprises a first image, a second image and the first super-scale, and the second image is an image corresponding to the first image under the first super-scale.
In the embodiment of the present invention, the first image may be a three-primary-color image (RGB image) or an original image (RAW image) acquired by an image sensor. Image super-resolution reconstruction magnifies a low-resolution image and supplements it with more image detail to obtain a clear high-resolution image; the magnification factor is the value of the super-resolution scale. The value of the first super-resolution scale may be any value, for example a non-integer multiple such as 1.5, 2.5 or 3.3, or an integer multiple such as 2, 4 or 6. The second image is obtained from the first image at the first super-resolution scale; the ratio of the resolution of the second image to that of the first image equals the value of the first super-resolution scale, and compared with the first image, the second image is clearer and contains more image detail, so it can be regarded as the reference image for image super-resolution reconstruction.
In the embodiment of the invention, if the first image is a RAW image, the RAW image contains all captured information, including all of the image's noise, so it needs to be preprocessed to remove that noise; the preprocessed image contains less noise. Preprocessing includes a channel conversion and a normalization of the image.
Specifically, if the first image is a RAW image, before step S1 the method further includes:
m1, preprocessing the first image to obtain a preprocessed image, and taking the preprocessed image as the first image.
In the embodiment of the present invention, when the first image is a RAW image containing all information, including all noise, the conventional approach is to process it with image signal processing (Image Signal Processing, ISP) to obtain RGB data. The ISP pipeline includes operations such as demosaicing, denoising, color correction and white balance, and this processing can lose part of the image information; the lost information may affect the final super-resolution quality.
Specifically, the preprocessing the first image to obtain a preprocessed image, and taking the preprocessed image as the first image includes:
m11, converting the single-dimensional data of the first image into four-dimensional data;
and M12, carrying out normalization processing on the four-dimensional data to obtain a preprocessed image, and taking the preprocessed image as a first image.
In the embodiment of the invention, the image acquired by the image sensor is in RAW format; the image sensor may be a charge coupled device (Charge Coupled Device, CCD) sensor or a complementary metal oxide semiconductor (CMOS) sensor. The data of a color image captured by the image sensor is arranged in a Bayer array; the Bayer array shown in FIG. 2 consists of 8 green, 4 blue and 4 red pixels.
For example, after a single image X in RAW format is acquired, the original image X is preprocessed as follows: the shape of the RAW image data is changed from H×W×1 to (H/2)×(W/2)×4. For example, in FIG. 3, H=8 and W=8, so the RAW image data of shape 8×8×1 (one channel) becomes 4×4×4 (four channels) after the conversion. Normalization is then performed, mapping the (H/2)×(W/2)×4 RAW image into the [0,1] range.
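As a non-authoritative sketch of steps M11-M12 in Python (assuming an RGGB Bayer layout and a 10-bit white level, neither of which is fixed by the text):

    import numpy as np

    def preprocess_raw(raw, white_level=1023.0):
        # Pack an HxW Bayer frame into (H/2)x(W/2)x4 and map it to [0, 1].
        # RGGB ordering and the white level are illustrative assumptions.
        packed = np.stack([raw[0::2, 0::2],   # R
                           raw[0::2, 1::2],   # G1
                           raw[1::2, 0::2],   # G2
                           raw[1::2, 1::2]],  # B
                          axis=-1).astype(np.float32)
        return np.clip(packed / white_level, 0.0, 1.0)

    x = np.random.randint(0, 1024, size=(8, 8))
    print(preprocess_raw(x).shape)  # (4, 4, 4), matching the 8x8x1 example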
In the embodiment of the invention, a first image and a first super-resolution scale from the training data are input into the image super-resolution reconstruction model, and the generated image corresponding to the first image is produced by the model. The image super-resolution reconstruction model comprises a feature extraction module and an up-sampling module; the feature extraction module extracts features of the first image, and the up-sampling module outputs the generated image according to the output of the feature extraction module.
Specifically, step S1 includes:
s11, inputting the first image into the feature extraction module to obtain a feature image corresponding to the first image.
The feature extraction module is a module necessary for super-resolution reconstruction. In the embodiment of the invention, it may be any existing network, such as a residual module of the EDSR super-resolution network, the RDB module of the RDN super-resolution network, the dense connection module of SRDenseNet, the SE module of SENet, the Inception module of an Inception network, or ConvLSTM. The feature extraction module extracts more abstract image features; the feature image mainly contains the color, texture, shape and spatial-relationship features of the image, independent of the module's concrete form, so the feature extraction scheme of any existing fully convolutional neural network can serve as the feature extraction module of the present application.
For example, the residual module of the EDSR super-resolution network may be selected as the feature extraction module of the super-resolution reconstruction model in the embodiment of the invention, and the first image is input into this residual module to obtain the feature image corresponding to the first image; alternatively, the RDB module of the RDN super-resolution network may be selected as the feature extraction module, and the first image is input into it to obtain the feature image.
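A minimal sketch of such a feature extraction module, assuming a PyTorch implementation and an EDSR-style residual design (the channel and block counts are illustrative, not taken from the patent):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # EDSR-style block: conv-ReLU-conv plus a skip connection, no batch norm.
        def __init__(self, channels=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )

        def forward(self, x):
            return x + self.body(x)

    class FeatureExtractor(nn.Module):
        # Head conv followed by stacked residual blocks; outputs the feature image.
        def __init__(self, in_channels=4, channels=64, n_blocks=8):
            super().__init__()
            self.head = nn.Conv2d(in_channels, channels, 3, padding=1)
            self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])

        def forward(self, x):
            return self.blocks(self.head(x))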
S12, inputting the characteristic image and the first super-scale into the up-sampling module to obtain a generated image corresponding to the characteristic image.
In the embodiment of the invention, the up-sampling module magnifies the image by an up-sampling method. Up-sampling is mainly realized in three ways: interpolation, deconvolution and un-pooling. The embodiment of the invention may adopt deconvolution, whose parameters are learned by training the up-sampling module; after training is finished, the up-sampling module can produce a generated image closer to the second image.
Specifically, the upsampling module includes a first upsampling layer, a second upsampling layer, a third upsampling layer, and an output layer, and step S12 includes:
s121, inputting the characteristic image and the first super-scale to the first upsampling layer to obtain a first position offset and a first local block corresponding to each first pixel point, wherein the first pixel point is a pixel point in the characteristic image.
In the embodiment of the present invention, the size of the generated image is obtained through the first upsampling layer from the feature image and the first super-resolution scale. Because the generated image does not actually exist yet when this step is performed, it may be called an assumed generated image; the coordinates of each pixel point in the assumed generated image can be obtained from its size, and from those coordinates each corresponding pixel point in the feature image, that is, each first pixel point, can be deduced, where any pixel point in the feature image may be called a first pixel point. Through the first up-sampling layer, a first position offset and a first local block corresponding to each first pixel point are obtained: the position offset produced when deducing each first pixel point from the coordinates of the assumed generated image's pixel points is the first position offset, and after each first pixel point is obtained, part of the pixel points in the feature image, centered on that first pixel point, are taken to form the first local block, which consists of the first pixel point and its neighborhood pixel points in the feature image.
Specifically, step S121 includes:
s121a, inputting the feature image and the first super-scale to the first upsampling layer to determine each first assumed pixel point of the generated image.
In the embodiment of the present invention, the resolution of the generated image is obtained by the first upsampling layer from the resolution of the feature image and the value of the first super-resolution scale, and the coordinates of each pixel point in the generated image follow from that resolution.
For example, if the resolution of the input feature image X is 200×300 and the first super-resolution scale is r1=1.5, the resolution of the generated image Y of the feature image at the first super-resolution scale is 300×450. The pixel points of this assumed generated image Y are the first assumed pixel points; the abscissa of each first assumed pixel point may take values from 1 to 300 and the ordinate from 1 to 450.
S121b, calculating first calculated values respectively corresponding to the first assumed pixel points according to the first assumed pixel points and the first super-division scale.
In the embodiment of the invention, the generated image is the image obtained from the feature image at the first super-resolution scale, so in theory the ratio of the coordinates of each pixel point of the generated image to the coordinates of the corresponding pixel point of the feature image is close to the value of the first super-resolution scale. Dividing the coordinates of each first assumed pixel point by the value of the first super-resolution scale gives the first calculated value corresponding to that first assumed pixel point; the first calculated value represents the theoretical real coordinates, in the feature image, derived from the coordinates of the generated image's pixel points and the first super-resolution scale.
For example, for one pixel f1 = (i1, j1) = (80, 400) among the first assumed pixel points of the generated image Y, with the first super-resolution scale r1 = 1.5, dividing the abscissa and ordinate of f1 by r1 = 1.5 gives the first calculated value p1 corresponding to f1, that is, p1 = (80/1.5, 400/1.5) ≈ (53.33, 266.67). For another pixel point f2 = (i2, j2) = (10, 25) among the first assumed pixel points of the generated image Y, the first calculated value is similarly p2 = (10/1.5, 25/1.5) ≈ (6.67, 16.67).
S121c, performing rounding calculation on the first calculation values respectively corresponding to the first assumed pixel points to obtain first pixel points respectively corresponding to the first assumed pixel points;
in the embodiment of the present invention, since the value of the first super-resolution scale r1 may be any value, the coordinates obtained by directly dividing the coordinates of each first assumed pixel point by r1 cannot be guaranteed to be integers, so that the first calculated value obtained in step S121b cannot be used as the first pixel point corresponding to each first assumed pixel point, and rounding calculation needs to be performed on the first calculated value, alternatively, rounding calculation is performed on the first calculated value downward, and the rounded value is used as the first pixel point corresponding to each first assumed pixel point. In the embodiment of the present invention, by performing rounding calculation on the first calculated value in step S221c, any scale super-division (not limited to integer super-division scale) may be implemented.
For example, for the pixel point f1 = (i1, j1) = (80, 400) among the first assumed pixel points of the generated image Y, the corresponding first calculated value is p1 ≈ (53.33, 266.67); rounding p1 down gives the first pixel point q1 = (53, 266). For the pixel point f2 = (i2, j2) = (10, 25), the corresponding first pixel point is likewise q2 = (6, 16). Calculating the first calculated value for each first assumed pixel point of the generated image Y and rounding it yields the first pixel point corresponding to each first assumed pixel point, that is, the positional mapping between the pixel points of the feature image and those of the generated image. When the first super-resolution scale is greater than 1, the generated image has more pixel points than the feature image, so several first assumed pixel points of the generated image correspond to one first pixel point of the feature image.
S121d, for each first pixel point, calculating the difference between the first calculated value corresponding to that first pixel point and the first pixel point itself, to obtain the first position offset corresponding to each first pixel point.
In the embodiment of the present invention, a rounding calculation is performed on the first calculated value in step S121c. The first assumed pixel point corresponds directly to the first calculated value through the first super-resolution scale, and a position offset, called the first position offset, arises between the first calculated value and the first pixel point of the feature image obtained by rounding it.
For example, for the first pixel point q1 = (53, 266) of the feature image, the corresponding first calculated value is p1 ≈ (53.33, 266.67). From p1 and q1, the first position offset of q1 can be obtained as V1 = p1 - q1 ≈ (0.33, 0.67).
Alternatively, considering the effect of the first super-resolution scale, the step may be: for each first pixel point, obtaining the first position offset corresponding to that first pixel point according to the first pixel point, the first calculated value corresponding to it, and the first super-resolution scale.
In the embodiment of the present invention, in addition to the difference between the first calculated value and the first pixel point, the first position offset may further include the first super-resolution scale. The first position offset can then distinguish the offsets of different super-resolution scales when several different first super-resolution scales are trained simultaneously, and thereby distinguish the first weights at different scales and the pixel values of the generated images obtained at different scales.
If the first position offset does not include the first super-resolution scale, the following situation can occur when several different first super-resolution scales are trained simultaneously:
Referring to FIG. 3, I is a first image, II is the generated image corresponding to the first image when the first super-resolution scale is 2, and III is the generated image corresponding to the first image when the first super-resolution scale is 4. The first pixel point (i′, j′) in I yields the pixel point (i, j) in the generated image II at scale 2, and the same first pixel point (i′, j′) yields the pixel point (2i, 2j) in the generated image III at scale 4. If the first position offset does not include the first super-resolution scale when several different first super-resolution scales are trained simultaneously, the first weight of (i′, j′) at scale 2 is the same as its first weight at scale 4, so the pixel value of the pixel point (i, j) in the generated image II equals the pixel value of the pixel point (2i, 2j) in the generated image III. Similarly, for the first pixel point (i′+1, j′+1) in I, its first weight at scale 2 equals its first weight at scale 4, with the same consequence for the corresponding pixel values. In this way, the pixel value of every pixel point in the generated image II equals the pixel value of some pixel point in the generated image III, which can be understood as the generated image II being a sub-image of the generated image III, so the super-resolution reconstruction effect is limited.
Therefore, the first position offset includes both the difference between the first calculated value and the first pixel point and the first super-resolution scale, so that when several different first super-resolution scales are trained simultaneously, the first position offsets of different scales can be distinguished, and with them the first weights at different scales and the pixel values of the generated images obtained at different scales.
For example, for the first assumed pixel point f1 = (i1, j1) = (80, 400) with r1 = 3, the corresponding first calculated value is p1 = (80/3, 400/3) ≈ (26.67, 133.33), and the first pixel point is q1 = (26, 133); from p1, q1 and r1, the first position offset of q1 can be obtained as V1 ≈ (0.67, 0.33, 3). For the first assumed pixel point f1′ = (i2, j2) = (40, 200) with r1′ = 1.5, the corresponding first calculated value is p1′ = (40/1.5, 200/1.5) ≈ (26.67, 133.33), and the first pixel point is q1′ = (26, 133); from p1′, q1′ and r1′, the first position offset of q1′ is V1′ ≈ (0.67, 0.33, 1.5). It can be seen that although the fractional parts coincide, V1 and V1′ are different; consequently, the first weight obtained for f1 when r1 = 3 differs from the first weight obtained for f1′ when r1′ = 1.5, and the pixel value of the generated image's pixel point corresponding to f1 at r1 = 3 differs from that corresponding to f1′ at r1′ = 1.5.
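A hedged sketch of steps S121a-S121d (NumPy assumed; appending the scale itself to the fractional offset is one plausible reading of the text, not a detail the patent fixes):

    import numpy as np

    def coord_mapping(out_h, out_w, scale):
        # For every pixel of the assumed generated image: divide coordinates
        # by the scale (first calculated value), floor them (first pixel
        # point), and keep the remainder plus the scale (first position offset).
        ys, xs = np.meshgrid(np.arange(out_h), np.arange(out_w), indexing="ij")
        py, px = ys / scale, xs / scale
        qy, qx = np.floor(py).astype(int), np.floor(px).astype(int)
        offset = np.stack([py - qy, px - qx, np.full(py.shape, scale)], axis=-1)
        return qy, qx, offset

    qy, qx, V = coord_mapping(120, 450, 1.5)
    print(qy[80, 400], qx[80, 400], V[80, 400])  # 53 266 [0.333 0.667 1.5]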
S121e, extracting the first local block corresponding to each first pixel point from the feature image according to the first pixel points and the size of the convolution kernel of the first upsampling layer.
In the embodiment of the present invention, a k×k matrix is defined by the convolution kernel of the first upsampling layer. Each first pixel point is taken in turn as the center point of the k×k matrix, and the region covered by the k×k matrix centered on that first pixel point serves as its first local block; for example, the region covered by the k×k matrix centered on the first pixel point q2 is the first local block corresponding to q2, and the region centered on the first pixel point q3 is the first local block corresponding to q3. The first local block comprises the first pixel point and its neighborhood pixel points, where k is the size of the convolution kernel of the first up-sampling layer, and the size of the first local block is k×k.
For example, in the feature image X, a first local block h′1 of size k×k is taken centered on q1 = (53, 266), where k is the size of the convolution kernel of the first upsampling layer; assuming k=3, the first local block h′1 consists of 9 pixel points of the feature image X.
In the embodiment of the present invention, when the first pixel point is an edge pixel point of the feature image, the complete region of the k×k matrix centered on that first pixel point cannot be obtained within the feature image, and a zero-padding operation is required: the missing neighborhood pixel points of the first local block are filled with the pixel value 0. After zero padding, part of the pixel points in the first local block corresponding to an edge first pixel point are pixel points of the feature image, and the other part are filling pixel points whose pixel value is 0.
To illustrate: if a first pixel point at an edge of the feature image X has coordinates q4 = (1, 1) and k = 3, the first local block h′2 of size 3×3 centered on q4 is taken in the feature image X; h′2 then contains the first pixel points q4 = (1, 1), q5 = (1, 2), q6 = (2, 1) and q7 = (2, 2), and the remaining five positions are filling pixel points with pixel value 0.
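A minimal sketch of the local-block extraction of S121e with zero padding (PyTorch assumed; the tensor layout and helper name are illustrative):

    import torch
    import torch.nn.functional as F

    def extract_local_blocks(features, qy, qx, k=3):
        # features: (C, H, W) feature image; qy, qx: coordinates of the
        # first pixel points. Returns the kxk first local block around each
        # of them, zero-padded at the image edges as described above.
        pad = k // 2
        padded = F.pad(features, (pad, pad, pad, pad))   # zero padding
        blocks = padded.unfold(1, k, 1).unfold(2, k, 1)  # (C, H, W, k, k)
        return blocks[:, qy, qx]                         # (C, N, k, k)

    feats = torch.randn(64, 134, 300)
    qy = torch.tensor([0, 53]); qx = torch.tensor([0, 266])
    print(extract_local_blocks(feats, qy, qx).shape)  # torch.Size([64, 2, 3, 3])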
S122, inputting the first position offset and the first local block corresponding to each first pixel point to the second upsampling layer to obtain the first weight corresponding to each first pixel point.
In the embodiment of the present invention, the first weight corresponding to each first pixel point is obtained from its first position offset and its first local block; because the first weight depends not only on the first position offset but also on the first local block, the process of obtaining it through the second upsampling layer may also be called bilateral upsampling.
In the embodiment of the invention, the first position offset and the first local block of each first pixel point are input into the second upsampling layer as a linear combination, and the first weight is obtained through a nonlinear activation function. The first position offset and the first local block, which are the outputs of the first upsampling layer, are input into the second upsampling layer to obtain the first weight. The first weight represents the connection strength between the neurons of the second upsampling layer and those of the first upsampling layer: the larger the first weight, the stronger the connection, and the smaller the first weight, the weaker the connection.
Specifically, the second upsampling layer includes a first convolution layer and a second convolution layer, and step S122 includes:
S122a, inputting the first position offset into the first convolution layer to obtain first position components corresponding to the first pixel points respectively.
In the embodiment of the invention, the first position offset is input into the first convolution layer, and the first position component is obtained after a nonlinear transformation of the first position offset. The first position component reflects the first position offset of the first pixel point: the larger the first position component, the larger the first position offset, and conversely, the smaller the first position component, the smaller the first position offset.
For example, referring to FIG. 4, for one pixel point q1 among the first pixel points, the first position offset V1 is input into the first convolution layer to obtain the first position component point′1 corresponding to q1; similarly, for the first pixel points q1, …, qn, the first position components point′1, …, point′n corresponding to them can be obtained, where n is the number of first pixel points in the generated image.
S122b, inputting the first local block into the second convolution layer to obtain first data components corresponding to the first pixel points respectively.
In the embodiment of the invention, the first local block is input into the second convolution layer, and the first data component is obtained after a nonlinear transformation of the first local block. The first data component reflects the feature information of the first local block that each pixel point of the generated image maps to in the feature image; this feature information includes the color features, texture features, shape features and spatial-relationship features of the pixel points in the first local block.
For example, referring to FIG. 4, for one pixel point q1 among the first pixel points, the first local block h′1 is input into the second convolution layer to obtain the first data component data′1 corresponding to q1; for the first pixel points q1, …, qn, the first data components data′1, …, data′n corresponding to them can be obtained, where n is the number of first pixel points in the generated image.
S122c, obtaining first weights corresponding to the first pixel points respectively according to the first position components and the first data components.
In the embodiment of the invention, the first weight corresponding to each first pixel point can be obtained from the first position component output by the first convolution layer and the first data component output by the second convolution layer; for example, the first position component and the first data component are summed with respective weights to obtain the first weight corresponding to each first pixel point.
For example, for one pixel point q1 among the first pixel points, whose first position component is point′1 and whose first data component is data′1, the first weight W′1 of q1 can be obtained by multiplying point′1 by data′1; similarly, for the first pixel points q1, …, qn, the first weights W′1, …, W′n corresponding to them can be obtained, where n is the number of first pixel points in the generated image. Alternatively, the first weight W′1 of q1 is obtained by a weighted sum of the first position component and the first data component, written as W′1 = α·point′1 + β·data′1, where α is a parameter of the first convolution layer, β is a parameter of the second convolution layer, point′1 is the first position component of q1, and data′1 is the first data component of q1.
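A hedged sketch of this second upsampling layer (S122a-S122c). Per-pixel 1×1 convolutions are written as linear layers here, and the branch widths are illustrative assumptions:

    import torch
    import torch.nn as nn

    class BilateralWeights(nn.Module):
        # One branch maps the 3-vector offset (dy, dx, scale) to a position
        # component, the other maps the flattened kxk local block to a data
        # component; the first weight is their learned weighted sum.
        def __init__(self, k=3, channels=64, hidden=32):
            super().__init__()
            self.pos_branch = nn.Sequential(
                nn.Linear(3, hidden), nn.ReLU(inplace=True),
                nn.Linear(hidden, k * k))
            self.data_branch = nn.Sequential(
                nn.Linear(channels * k * k, hidden), nn.ReLU(inplace=True),
                nn.Linear(hidden, k * k))
            self.alpha = nn.Parameter(torch.ones(1))  # weight of position component
            self.beta = nn.Parameter(torch.ones(1))   # weight of data component

        def forward(self, offsets, blocks):
            # offsets: (N, 3); blocks: (N, channels, k, k) -> weights: (N, k*k)
            point = self.pos_branch(offsets)
            data = self.data_branch(blocks.flatten(1))
            return self.alpha * point + self.beta * data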
In the embodiment of the present invention, for the same first image, if the values of the first super-resolution scale differ, different first weights are obtained in step S122 for the different scales; that is, for each super-resolution scale, the model provided in the embodiment of the invention dynamically generates the corresponding first weight, so a single model can output results at different super-resolution scales.
S123, inputting the first local block and the first weight to the third upsampling layer to obtain pixel values respectively corresponding to second pixel points, wherein the second pixel points are pixel points in the generated image.
In the embodiment of the invention, the second pixel point t1 = (i1, j1) has the same coordinate values as the first assumed pixel point f1 = (i1, j1) but a different meaning: the first assumed pixel point is a pixel point of the generated image assumed from the feature image and the first super-resolution scale, at a time when the generated image has not yet been obtained; the second pixel points are the pixel points of the generated image output by the third upsampling layer from the weights and local blocks of the first pixel points of the feature image. The generated image is obtained from the pixel values corresponding to the second pixel points, and each pixel point in the generated image is called a second pixel point.
For example, the first pixel point q1 = (i1′, j1′) of the feature image X corresponds to the second pixel point t1 = (i1, j1) of the generated image Y. The pixel value of the second pixel point t1 = (i1, j1) is determined jointly, in the third upsampling layer, by the feature values of the local block h′1 corresponding to the feature-image pixel point q1 = (i1′, j1′) and by the first weight W′1 corresponding to q1; it can be regarded as a mapping from the first local block of the feature image X to the pixel value of the generated image Y, as shown in formula (1):
I_Y(i, j) = Φ(S_X(h′), W′(i′, j′))   (1)
where Φ represents the mapping from the feature values of the first local block h′ and the first weight to the pixel values of the second pixel points of the generated image, S_X(h′) represents the feature values of the first local block h′ of the pixel point (i′, j′) in the feature image X, W′(i′, j′) is the first weight of the pixel point (i′, j′) in the feature image X, and I_Y(i, j) is the pixel value of the second pixel point with coordinates (i, j) in the generated image.
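The patent leaves Φ abstract; one minimal reading, assumed here, is a weighted sum of the local block's feature values:

    import torch

    def render_pixel(block, weight):
        # block:  (C, k, k) feature values S_X(h') of the first local block
        # weight: (k*k,)    first weight W' for this output pixel
        # Returns a C-vector that the output layer later turns into RGB.
        k = block.shape[-1]
        return (block * weight.view(1, k, k)).sum(dim=(1, 2))

    block = torch.randn(64, 3, 3)
    w = torch.randn(9)
    print(render_pixel(block, w).shape)  # torch.Size([64])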
S124, inputting the pixel values corresponding to the second pixel points to the output layer to obtain the generated image.
In the embodiment of the invention, the output layer converts the output of the third upsampling layer into common RGB-format data, that is, it converts the pixel values corresponding to the second pixel points into common RGB data. Specifically, scaling is performed first: the pixel value of each second pixel point is multiplied by 255 to ensure the final output has the same range as RGB data. After scaling, some data may fall outside the range [0, 255], so the scaled result is clipped: data smaller than 0 is set to 0, and data larger than 255 is set to 255.
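A minimal sketch of this output layer (NumPy assumed):

    import numpy as np

    def to_rgb(pixels):
        # Scale the [0, 1] outputs by 255, then clip out-of-range values:
        # below 0 -> 0, above 255 -> 255, as described above.
        scaled = np.asarray(pixels, dtype=np.float32) * 255.0
        return np.clip(scaled, 0.0, 255.0).astype(np.uint8)

    print(to_rgb([[-0.1, 0.5, 1.2]]))  # [[  0 127 255]]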
S2, adjusting model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first super-resolution scale from the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain the trained image super-resolution reconstruction model.
In the embodiment of the invention, the second image is the image of the first image at the super-resolution scale and is equivalent to the reference image of the image super-resolution reconstruction process, while the generated image is the image obtained by inputting the first image into the image super-resolution reconstruction model. The generated image is compared with the second image to obtain a difference value, and the model parameters of the image super-resolution reconstruction model are adjusted using this difference, so that the generated image produced by the model becomes more similar to the second image.
Optionally, in step S2, adjusting model parameters of the image super-resolution reconstruction model according to the second image and the generated image, including:
s21a, calculating a first loss value according to the second image and the generated image;
in the embodiment of the invention, the second image is an image of the first image subjected to the first super-division scale transformation and can be regarded as a standard answer, the first image is obtained by a super-division reconstruction model to generate an image, and the pixel value of each pixel point of the generated image is compared with the pixel value of each pixel point of the second image to obtain a first Loss value Loss1.
S22a, adjusting parameters of the image super-resolution reconstruction model according to the first loss value.
In the embodiment of the invention, the parameter β1 of the image super-resolution reconstruction model is modified by back-propagating the first loss value Loss1, yielding the modified parameter β2.
In the embodiment of the invention, after the parameters are modified, the step of inputting the first image and the first super-resolution scale from the training data into the super-resolution reconstruction model is executed again, until a preset training condition is met, where the preset training condition is that the first loss value meets a preset requirement or that the number of training iterations reaches a preset count. The preset requirement may be determined according to the image super-resolution reconstruction model being trained and is not detailed here; the preset count may be the maximum number of training iterations of the model, for example 50000. The procedure is: the image super-resolution reconstruction model outputs a generated image, and the first loss value is calculated from the generated image and the second image; after the first loss value is calculated, it is checked whether it meets the preset requirement. If it does, training ends; if it does not, it is checked whether the number of training iterations has reached the preset count. If the preset count has not been reached, the parameters of the model are adjusted according to the first loss value; if it has been reached, training ends. Judging whether training is finished by both the loss value and the iteration count prevents the model from entering an endless loop because the loss value cannot reach the preset requirement.
Further, since the parameters of the image super-resolution reconstruction model are modified when the training condition does not meet the preset condition (for example, the first loss value does not meet the preset requirement and the iteration count has not reached the preset count), the model must continue to be trained after its parameters are modified according to the first loss value, that is, the step of inputting the first image and the first super-resolution scale from the training data into the model is executed again. In this continued execution, the first image and the first super-resolution scale may differ from those of the previous iteration. For example, all first images in the training data have unique image identifiers (e.g., image numbers) and different values of the first super-resolution scale (r1, r2, …); the image identifier of the first image input in the first training iteration differs from that input in the second, e.g., the first image of the first iteration has image number 1 and scale r1, that of the second iteration has image number 2 and scale r2, and that of the Nth iteration has image number N and scale rN. Of course, in practice the number of first images in the training data is limited, so to improve the training effect the first images may be input to the model sequentially; after all first images and their corresponding first super-resolution scales have been input, the sequential input may be repeated, so that the training image groups in the training data are fed to the model cyclically. When inputting the first images for training, they may or may not be input in the order of their image numbers, and of course the same first image may or may not be reused to train the preset network model. A sketch of this outer loop follows below.
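A hedged sketch of the training loop of S2 (PyTorch assumed; the L1 pixel loss, the optimizer and the thresholds are illustrative choices the patent does not prescribe):

    import torch
    import torch.nn.functional as F

    def train(model, loader, max_steps=50000, loss_req=1e-3, lr=1e-4):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        step = 0
        while True:  # cycle through the training image groups
            for first_img, second_img, scale in loader:
                generated = model(first_img, scale)
                loss1 = F.l1_loss(generated, second_img)  # first loss value
                if loss1.item() < loss_req:  # preset requirement met
                    return model
                if step >= max_steps:        # preset count reached
                    return model
                opt.zero_grad()
                loss1.backward()             # back-propagate Loss1
                opt.step()                   # adjust model parameters
                step += 1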
Optionally, in step S2, adjusting the model parameters of the image super-resolution reconstruction model according to the second image and the generated image may alternatively include:
s21b, calculating a first loss value according to the second image and the generated image.
In the embodiment of the present invention, step S21b is the same as step S21a: the pixel value of each pixel point of the generated image is compared with the pixel value of the corresponding pixel point of the second image to obtain the first loss value Loss1.
S22b, inputting the generated image and the second image into a discriminator network, obtaining the probability of the generated image, and calculating a second loss value according to the probability.
In this embodiment of the present invention, the image super-resolution reconstruction model may optionally be trained in a generative adversarial manner: the model serves as the generator in the adversarial training process and produces the generated image; the generated image and the second image are input into a discriminator network, which must distinguish which of the two is the real image of the first image at the first super-resolution scale; and the second loss value is calculated from the probability, output by the discriminator, that the generated image is the real image.
S23b, adjusting parameters of the image super-resolution reconstruction model according to the first loss value and the second loss value.
In the embodiment of the invention, when the image super-resolution reconstruction model is trained in the generative adversarial manner, the first loss value corresponds to the loss produced for the generator in training and the second loss value corresponds to the loss produced for the discriminator, and the parameters of the model are adjusted according to both the first loss value and the second loss value. Through repeated iterative training, the generated image output by the model becomes more similar to the second image, until the discriminator cannot tell the real image from the fake one and the probability it outputs approaches 0.5; a probability close to 0.5 means the discriminator cannot distinguish true from false and can only guess randomly.
In the embodiment of the invention, after the parameters are modified, the step of inputting the first image and the first super-resolution scale from the training data into the super-resolution reconstruction model is executed again until a preset training condition is met, where the preset training condition is that the first loss value meets a preset requirement or the number of training iterations reaches a preset count. The preset requirement may be determined according to the model being trained and is not detailed here; the preset count may be the model's maximum number of training iterations, for example 50000. The preset training condition may also be that the probability output by the discriminator approaches 0.5, which means the generated image obtained by the super-resolution reconstruction model is very close to the second image.
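A sketch of one adversarial training step (S21b-S23b), assuming a binary discriminator D that outputs probabilities, BCE losses and a small weighting factor for the second loss; the patent fixes none of these details:

    import torch
    import torch.nn.functional as F

    def gan_step(G, D, opt_g, opt_d, first_img, second_img, scale, lam=1e-3):
        generated = G(first_img, scale)

        # Discriminator: separate the real second image from the generated one.
        d_real, d_fake = D(second_img), D(generated.detach())
        loss_d = (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
                  F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator: first loss (pixel comparison) plus second loss (adversarial).
        loss1 = F.l1_loss(generated, second_img)
        prob = D(generated)  # probability the generated image is "real"
        loss2 = F.binary_cross_entropy(prob, torch.ones_like(prob))
        opt_g.zero_grad(); (loss1 + lam * loss2).backward(); opt_g.step()
        return prob.mean().item()  # approaches 0.5 as training converges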
The embodiment of the invention also provides an image super-resolution reconstruction method, referring to fig. 5, which shows an image super-resolution reconstruction method, the image super-resolution reconstruction method may include the following steps:
And K1, acquiring an image to be processed and a second super-scale of the image to be processed.
In the embodiment of the present invention, the image to be processed may be an RGB image, or may be an original image (RAW image) acquired by an image sensor, and the value of the second super-scale may be any value, for example, a non-integer multiple such as 1.5, 2.5 or 3.3, or an integer multiple such as 2, 4 or 6.
If the image to be processed is a RAW image, it carries all the information of the image, including all its noise, and therefore needs to be preprocessed; the preprocessing removes the noise in the RAW image, so the image obtained after preprocessing contains less noise. Preprocessing includes dimension-reduction processing and normalization of the image.
Specifically, when the image to be processed is a RAW image, the method further includes, after step K1: preprocessing the image to be processed to obtain a preprocessed image, and taking the preprocessed image as the image to be processed.
Specifically, the preprocessing the image to be processed to obtain a preprocessed image, and taking the preprocessed image as the image to be processed includes:
K11, converting the single-dimensional data of the image to be processed into four-dimensional data;
And K12, carrying out normalization processing on the four-dimensional data to obtain a preprocessed image, and taking the preprocessed image as the image to be processed.
In the embodiment of the invention, the image acquired by the image sensor is in a RAW format, the image sensor can be a CCD sensor or a CMOS sensor, and the data of the color image shot by the image sensor is arranged in a Bayer array.
For example, after a single image X in the RAW format is acquired, the method includes: preprocessing the original image X, wherein the preprocessing comprises: changing the dimension of the RAW image data from h×w×1 to (h/2)×(w/2)×4; for example, in fig. 3, h=8 and w=8, so the RAW image data is 8×8×1 with dimension 1 before the change, and 4×4 with dimension 4 after the change; then carrying out normalization, mapping the (h/2)×(w/2)×4 RAW image to the [0,1] space.
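The dimension change and normalization just described can be sketched as follows (a minimal NumPy illustration assuming an 8-bit Bayer RAW input with even height and width; the packing order of the four channels is an assumption):

```python
import numpy as np

def preprocess_raw(raw, white_level=255.0):
    """Reshape an H x W x 1 Bayer RAW image into (H/2) x (W/2) x 4
    and normalize it to the [0, 1] space."""
    h, w = raw.shape[:2]
    flat = raw.reshape(h, w).astype(np.float32)
    # Pack each 2x2 Bayer cell into four channels (e.g. R, G1, G2, B).
    packed = np.stack([flat[0::2, 0::2],   # top-left of each 2x2 cell
                       flat[0::2, 1::2],   # top-right
                       flat[1::2, 0::2],   # bottom-left
                       flat[1::2, 1::2]],  # bottom-right
                      axis=-1)             # shape: (H/2, W/2, 4)
    return packed / white_level            # map to the [0, 1] space
```

For the 8×8×1 example above, the packed array has shape 4×4×4, matching the text.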
And K2, inputting the image to be processed and the second super-division scale into a trained image super-division reconstruction model to obtain a super-division reconstruction image corresponding to the image to be processed, wherein the trained image super-division reconstruction model is an image super-division reconstruction model obtained by the training method of the image super-division reconstruction model.
In the embodiment of the invention, the super-division reconstructed image of the image to be processed under the second super-division scale is obtained through the trained image super-division reconstruction model. The image super-resolution reconstruction model comprises: the device comprises a feature extraction module and an up-sampling module, wherein the feature extraction module is used for extracting features of the image to be processed, and the up-sampling module is used for outputting the super-resolution reconstructed image according to an output result of the feature extraction module.
Specifically, step K2 includes:
And K21, inputting the image to be processed into a feature extraction module to obtain the feature image to be processed.
In the embodiment of the invention, the feature extraction module can be any one of the existing networks, such as the residual module of the EDSR super-division network, the RDB module in the RDN super-division network, the dense connection module in SRDenseNet, the SE module in SENet, the Inception module in an Inception network, or ConvLSTM, and the like.
For example, a residual module of the EDSR super-division network may be selected as the feature extraction module in the super-division reconstruction model in the embodiment of the present invention, and the image to be processed is input into the residual module of the EDSR super-division network, so as to obtain a feature image to be processed corresponding to the image to be processed.
And K22, inputting the feature image to be processed and the second super-division scale into an up-sampling module to obtain a super-division reconstructed image.
In the embodiment of the invention, the up-sampling module amplifies the image by an up-sampling method; up-sampling methods mainly comprise three modes: interpolation, deconvolution and reverse pooling (unpooling). Deconvolution may be adopted in the embodiment of the invention. The deconvolution process is realized by training, through which the parameters of the up-sampling module are learned; after training is finished, the up-sampling module can obtain a generated image that is closer to the second image.
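For reference, a learnable deconvolution of the kind mentioned here is commonly expressed as a transposed convolution (a generic PyTorch illustration; the channel counts are assumptions, and this is not the patent's specific up-sampling module, which is described in steps K221-K224 below):

```python
import torch.nn as nn

# A generic learnable 2x deconvolution (transposed convolution) upsampler.
deconv_up = nn.ConvTranspose2d(in_channels=64, out_channels=64,
                               kernel_size=4, stride=2, padding=1)
```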
Specifically, the upsampling module includes a first upsampling layer, a second upsampling layer, a third upsampling layer, and an output layer, and step K22 includes:
And K221, inputting the feature image to be processed and the second super-scale to the first upsampling layer to obtain the second position offset and the second local block respectively corresponding to each third pixel point, wherein the third pixel points are pixel points in the feature image to be processed.
In the embodiment of the invention, according to the feature image to be processed and the second super-division scale, the size of the super-division reconstructed image is obtained through the first up-sampling layer. Because the super-division reconstructed image is not really obtained in this step, it may be called an assumed super-division reconstructed image. The coordinates of each pixel point in the assumed super-division reconstructed image can be obtained from the size of the assumed super-division reconstructed image, and from these coordinates each pixel point in the feature image to be processed, namely each third pixel point, can be deduced. The second position offset and the second local block corresponding to each third pixel point can also be obtained through the first upsampling layer: the position offset generated when deducing each third pixel point in the feature image to be processed from the coordinates of each pixel point in the assumed super-division reconstructed image is the second position offset; the second local block is formed by taking each third pixel point as a center together with part of the pixel points in the feature image to be processed, wherein the part of pixel points comprises the neighborhood pixel points centered on the third pixel point.
Specifically, step K221 includes:
And K221a, inputting the feature image to be processed and the second super-division scale to the first upsampling layer to determine the second assumed pixel points of the super-division reconstructed image.
In the embodiment of the present invention, the first upsampling layer may obtain the resolution of the super-division reconstructed image from the resolution of the feature image to be processed and the value of the second super-division scale, and the coordinates of each pixel point in the super-division reconstructed image can be known from the resolution of the super-division reconstructed image.
For example, assuming that the resolution of the input feature image a is 500×500, the second super-division scale r2=2.2, the resolution of the super-division reconstructed image B may be 1100×1100, and the pixel point of the super-division reconstructed image B may be assumed to be a second assumed pixel point, where an abscissa of the second assumed pixel point may take 1 to 1100 and an ordinate of the second assumed pixel point may take 1 to 1100.
And K221b, calculating a second calculated value corresponding to the second assumed pixel point according to the second assumed pixel point and the second super-division scale.
In the embodiment of the invention, the super-division reconstructed image is the image obtained from the feature image to be processed under the second super-division scale, and the ratio of the coordinates of each pixel point of the super-division reconstructed image to the coordinates of the corresponding pixel point of the feature image to be processed is theoretically close to the value of the second super-division scale. The coordinates of each second assumed pixel point are divided by the value of the second super-division scale to obtain the second calculated value corresponding to each second assumed pixel point; the second calculated value represents the theoretical real coordinates of each third pixel point of the feature image to be processed, obtained from the coordinates of the pixel points of the super-division reconstructed image and the second super-division scale.
For example, for one pixel point a1 = (i1, j1) = (700, 200) among the second assumed pixel points of the super-division reconstructed image B, with the second super-division scale r2 = 2.2, the abscissa and the ordinate of a1 are divided by r2 to calculate the second calculated value b1 corresponding to the second assumed pixel point a1, that is, b1 = (700/2.2, 200/2.2) = (318.18, 90.91).
And K221c, performing rounding calculation on the second calculated value to obtain a third pixel point corresponding to the second assumed pixel point.
In the embodiment of the present invention, since the value of the second super-division scale r2 may be any value, the result of dividing the coordinates of each second assumed pixel point by r2 cannot be guaranteed to be an integer, so a rounding calculation needs to be performed on the second calculated value; optionally, a downward rounding calculation is performed on the second calculated value, and the rounded value is used as the third pixel point corresponding to each second assumed pixel point.
For example, for one pixel point a1 = (i1, j1) = (700, 200) among the second assumed pixel points of the super-division reconstructed image B, the corresponding second calculated value is b1 = (318.18, 90.91); performing downward rounding on b1 gives the third pixel point corresponding to a1: c1 = (318, 90).
And K221d, calculating the difference value between the third pixel points and the second calculated value to obtain second position offset corresponding to each third pixel point.
In the embodiment of the present invention, because a rounding calculation is performed on the second calculated value in step K221c, a position offset appears between the second calculated value obtained directly from the second assumed pixel point through the second super-division scale and the third pixel point of the feature image obtained by rounding the second calculated value; this position offset is called the second position offset.
For example, for one pixel point c1 = (318, 90) among the third pixel points of the feature image to be processed, the corresponding second calculated value is b1 = (318.18, 90.91); the second position offset of c1 can be obtained from b1 and c1 as v2 = b1 − c1 = (0.18, 0.91).
Alternatively, considering the influence of the second super-division scale, it may be: for each third pixel point, the second position offset corresponding to that third pixel point is obtained according to the third pixel point, the second calculated value corresponding to the third pixel point, and the second super-division scale.
In the embodiment of the present invention, besides the difference between the second calculated value and the third pixel point, the second position offset may further include the second super-division scale, so that when multiple different second super-division scales are trained simultaneously, the second position offsets of different super-division scales can be distinguished, and further the second weights of different super-division scales and the pixel values of the pixel points of the super-division reconstructed images obtained under different super-division scales can be distinguished.
For example, when a1 = (i1, j1) = (700, 200) and the second super-division scale r2 = 2.2, then c1 = (318, 90) and the corresponding second calculated value is b1 = (318.18, 90.91); from b1, c1 and r2, the second position offset of c1 can be obtained as v2 = (0.18, 0.91, 2.2).
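Steps K221a-K221d reduce to the following coordinate arithmetic (a small sketch; the tensor layout inside the first upsampling layer is not specified by the patent, and the function name is hypothetical):

```python
import math

def map_to_third_pixel(i, j, r):
    """Map a second assumed pixel (i, j) of the super-division reconstructed
    image back to a third pixel of the feature image under scale r."""
    bi, bj = i / r, j / r                    # second calculated value
    ci, cj = math.floor(bi), math.floor(bj)  # third pixel (rounded down)
    offset = (bi - ci, bj - cj, r)           # second position offset, with scale
    return (ci, cj), offset

# For a1 = (700, 200) and r2 = 2.2 this returns the third pixel (318, 90)
# and the second position offset (0.18..., 0.90..., 2.2), as in the example.
```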
And K221e, extracting second local blocks corresponding to the third pixel points respectively from the feature image to be processed according to the third pixel points and the convolution kernel in the first upsampling layer.
In the embodiment of the present invention, a k×k matrix is obtained through the convolution kernel in the first upsampling layer, and each third pixel point is used in turn as the center point of the k×k matrix; the region corresponding to the k×k matrix whose center point is a third pixel point may be used as the second local block corresponding to that third pixel point. For example, the region corresponding to the k×k matrix whose center point is the third pixel point c2 may be used as the second local block corresponding to c2, and the region corresponding to the k×k matrix whose center point is the third pixel point c3 may be used as the second local block corresponding to c3. The second local block comprises the third pixel point and the neighborhood pixel points of the third pixel point, k is the size of the convolution kernel of the first up-sampling layer, and the size of the second local block is k×k.
For example, in the feature image A to be processed, a second local block h″1 of size k×k is taken centered on c1, where k is the size of the convolution kernel of the first upsampling layer; assuming k = 3, the second local block h″1 is composed of 9 pixel points in the feature image A to be processed.
In the embodiment of the present invention, when the third pixel point is an edge pixel point of the feature image to be processed, the complete region corresponding to a k×k matrix centered on that third pixel point cannot be obtained within the feature image, so a zero-padding operation is required: the positions of the missing neighborhood pixel points of the third pixel point in the second local block are filled with the pixel value 0. After zero padding, part of the pixel points in the second local block corresponding to an edge third pixel point are pixel points of the feature image to be processed, and the other part are padded pixel points with the pixel value 0.
For example: if a third pixel point of the feature image A to be processed has coordinates c4 = (1, 1) and k = 3, a second local block h″2 of size 3×3 is taken in the feature image A centered on c4 = (1, 1); the second local block h″2 includes the third pixel points c4 = (1, 1), c5 = (1, 2), c6 = (2, 1) and c7 = (2, 2), while the remaining five positions are padded pixel points with pixel value 0.
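Step K221e, including the zero padding at the image edges, can be sketched as follows (a NumPy sketch; it assumes an H×W×C feature image and 0-based coordinates, unlike the 1-based coordinates of the example above):

```python
import numpy as np

def extract_second_local_block(features, ci, cj, k=3):
    """Extract the k x k second local block centered on third pixel (ci, cj);
    neighborhood positions outside the feature image are zero-padded."""
    pad = k // 2
    padded = np.pad(features, ((pad, pad), (pad, pad), (0, 0)))  # zero padding
    # After padding, the center (ci, cj) moves to (ci + pad, cj + pad),
    # so the k x k window starts at index (ci, cj) in the padded array.
    return padded[ci:ci + k, cj:cj + k, :]
```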
And K222, inputting the second position offset and the second local block corresponding to each third pixel point to the second upsampling layer to obtain the second weight corresponding to each third pixel point.
In the embodiment of the present invention, the second weights corresponding to the third pixel points are obtained according to the second position offset and the second local block of the third pixel points, so that the process of obtaining the second weights through the second upsampling layer may also be referred to as a bilateral upsampling process, where the second weights are not only related to the second position offset but also related to the second local block.
In the embodiment of the present invention, the second position offset and the second local block of each third pixel point, which are output results of the first upsampling layer, are input into the second upsampling layer in a combined mode, and the second weight is obtained through the transformation of a nonlinear activation function. The second weight represents the strength of the links between the neurons in the second upsampling layer and the neurons in the first upsampling layer: the larger the second weight, the stronger the links between the neurons; the smaller the second weight, the weaker the links.
Specifically, the second upsampling layer includes a first convolution layer and a second convolution layer, and step K222 includes:
And K222a, inputting the second position offset into the first convolution layer to obtain second position components corresponding to the third pixel points respectively.
In the embodiment of the invention, the second position offset is input into the first convolution layer, and the second position component is obtained after nonlinear transformation. The second position component reflects the second position offset of the third pixel point: the larger the second position component, the larger the second position offset; conversely, the smaller the second position component, the smaller the second position offset.
For example, for one pixel point c1 among the third pixel points, the second position offset v2 is input to the first convolution layer to obtain the second position component point″1 corresponding to c1; similarly, for each third pixel point c1, ……, cn, the second position components corresponding to the third pixel points can be obtained: point″1, ……, point″n, where n is the number of pixel points of the super-division reconstructed image.
And K222b, inputting the second local block into the second convolution layer to obtain second data components corresponding to the third pixel points respectively.
In the embodiment of the invention, the second local block is input into the second convolution layer, and after nonlinear transformation the second data component is obtained. The second data component reflects the characteristic information of the second local block through which each pixel point in the super-division reconstructed image is mapped to a pixel point in the feature image to be processed; the characteristic information of the second local block includes the color characteristic, texture characteristic, shape characteristic and spatial relationship characteristic of each pixel point in the second local block.
For example, for one pixel point c1 among the third pixel points, the second local block h″1 is input to the second convolution layer to obtain the second data component data″1 corresponding to c1; similarly, for each third pixel point c1, ……, cn, the second data components corresponding to the third pixel points can be obtained: data″1, ……, data″n.
And K222c, multiplying the second position component by the second data component to obtain second weight values corresponding to the third pixel points respectively.
In the embodiment of the invention, according to the second position component output by the first convolution layer and the second data component output by the second convolution layer, the second weight corresponding to each third pixel point can be obtained, either by multiplying the second position component by the second data component, or by a weighted summation of the second position component and the second data component.
For example, for one pixel point c1 among the third pixel points, whose second position component is point″1 and second data component is data″1, multiplying point″1 by data″1 gives the second weight W″1 of c1; similarly, for each third pixel point c1, ……, cn, the second weights W″1, ……, W″n corresponding to the third pixel points can be obtained. Alternatively, the second weight W″1 may be obtained by a weighted summation of the second position component and the second data component, recorded as: W″1 = α′·point″1 + β′·data″1, where α′ is a parameter of the first convolution layer in the trained super-division reconstruction model, β′ is a parameter of the second convolution layer of the trained super-division reconstruction model, point″1 is the second position component of c1, and data″1 is the second data component of c1.
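Steps K222a-K222c, i.e. the bilateral weighting, might be organized as below (a PyTorch sketch under stated assumptions: the 1×1 convolutions, the channel sizes, and the element-wise product variant are illustrative choices, not fixed by the patent):

```python
import torch.nn as nn

class SecondUpsamplingLayer(nn.Module):
    """Fuses second position offsets and second local blocks into second weights."""
    def __init__(self, offset_dim=3, block_dim=9, hidden=64):
        super().__init__()
        # First convolution layer: offsets -> second position components.
        self.pos_conv = nn.Sequential(nn.Conv2d(offset_dim, hidden, 1), nn.ReLU())
        # Second convolution layer: local blocks -> second data components.
        self.data_conv = nn.Sequential(nn.Conv2d(block_dim, hidden, 1), nn.ReLU())

    def forward(self, offsets, blocks):
        pos = self.pos_conv(offsets)   # second position components
        data = self.data_conv(blocks)  # second data components
        return pos * data              # element-wise product -> second weights
```

The weighted-sum variant W″ = α′·point″ + β′·data″ would replace the product in forward with a learned affine combination.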
K223, inputting the second local block and the second weight to the third upsampling layer to obtain pixel values corresponding to fourth pixel points respectively, wherein the fourth pixel points are pixel points in the super-resolution reconstructed image;
In the embodiment of the present invention, for the third pixel point c1 = (i1′, j1′) of the feature image A to be processed, the fourth pixel point d1 = (i1, j1) of the super-division reconstructed image B is correspondingly generated; as shown in formula (2), the pixel value of each fourth pixel point in the super-division reconstructed image B can be obtained through the mapping function Φ from the second local block h″1 of the feature image A to be processed to the pixel value I_B of the super-division reconstructed image B.
I_B(i, j) = Φ(S_A(h″), W″(i′, j′))    (2)

wherein Φ represents the mapping from the feature value of the second local block h″ to the pixel value of the corresponding fourth pixel point of the super-division reconstructed image, S_A(h″) represents the feature value of the second local block h″ of the pixel point (i′, j′) in the feature image A to be processed, W″(i′, j′) is the second weight of the pixel point (i′, j′) in the feature image A to be processed, and I_B(i, j) is the pixel value of the fourth pixel point with coordinates (i, j) in the super-division reconstructed image B.
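One plausible reading of formula (2) is a weighted combination of the second local block by the second weights (a sketch only; the exact mapping Φ is learned by the third upsampling layer, and the flattened layout assumed here, one row per fourth pixel point with one weight per local-block element, is an assumption):

```python
import torch

def third_upsampling_phi(blocks, weights):
    """Formula (2): combine each second local block S_A(h'') with its second
    weight W'' to obtain the fourth pixel values I_B(i, j)."""
    # blocks:  (N, k*k) flattened second local blocks, one row per fourth pixel
    # weights: (N, k*k) second weights of the same shape
    return (blocks * weights).sum(dim=1)  # one pixel value per fourth pixel
```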
And K224, inputting the pixel values corresponding to the fourth pixel points respectively to the output layer to obtain the super-resolution reconstructed image.
In the embodiment of the invention, the output layer is used for converting the output result of the third upsampling layer into ordinary RGB-format data, that is, the pixel values respectively corresponding to the fourth pixel points are converted into ordinary RGB-format data. Specifically, scaling is performed first: the pixel value of each fourth pixel point is multiplied by 255 to ensure that the final output has the same range as RGB data. Since data outside the range [0, 255] may exist after scaling, the scaled result is clipped: values smaller than 0 are set to 0 and values larger than 255 are set to 255, thereby obtaining the super-division reconstructed image.
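The scaling and clipping performed by the output layer reduces to (a NumPy sketch):

```python
import numpy as np

def output_layer(pixels):
    """Scale fourth pixel values by 255 and clip them to the RGB range [0, 255]."""
    scaled = pixels * 255.0
    return np.clip(scaled, 0, 255).astype(np.uint8)
```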
The existing super-division methods need to train a different model for each scale, whereas the image super-division reconstruction model of the present invention can realize super-division at any of a variety of scales. Referring to fig. 6, fig. 6 shows the super-division effect at different scales: a is 1.5 times, b is 2 times, c is 2.5 times, d is 3 times, e is 3.5 times, and f is 4 times. When the same image is amplified and super-divided at multiple scales, only the super-division scale input to the up-sampling module in the super-division reconstruction model needs to be changed, so that multi-scale image super-division with one set of model is convenient and quick, which undoubtedly improves the user experience.
In one embodiment, the present invention provides a computer device, which may be a terminal, with an internal structure as shown in fig. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a training method for an image super-resolution reconstruction model. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the block diagram of fig. 7 is merely a partial structure related to the present application and does not constitute a limitation of the computer device to which the present application is applied, and that a specific computer device may include more or less components than those shown in the drawings, or may combine some components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory storing a computer program and a processor implementing the following steps when executing the computer program:
inputting a first image and a first super-scale in training data into an image super-scale reconstruction model, and generating a generated image corresponding to the first image under the first super-scale through the image super-scale reconstruction model, wherein the training data comprises a plurality of groups of training image groups, each group of training image groups comprises a first image, a second image and the first super-scale, and the second image is an image corresponding to the first image under the first super-scale;
and adjusting model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image, and continuously executing the step of inputting the first image and the first super-resolution scale in the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain the trained image super-resolution reconstruction model.
In one embodiment, a computer readable storage medium is provided having stored thereon a computer program which when executed by a processor performs the steps of:
inputting a first image and a first super-scale in training data into an image super-scale reconstruction model, and generating a generated image corresponding to the first image under the first super-scale through the image super-scale reconstruction model, wherein the training data comprises a plurality of groups of training image groups, each group of training image groups comprises a first image, a second image and the first super-scale, and the second image is an image corresponding to the first image under the first super-scale;
and adjusting model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image, and continuously executing the step of inputting the first image and the first super-resolution scale in the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain the trained image super-resolution reconstruction model.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The training method and the computer equipment of the image super-resolution reconstruction model comprise the following steps: inputting a first image and a first super-scale in training data into an image super-scale reconstruction model, and generating a generated image corresponding to the first image under the first super-scale through the image super-scale reconstruction model, wherein the training data comprises a plurality of groups of training image groups, each group of training image groups comprises a first image, a second image and the first super-scale, and the second image is an image corresponding to the first image under the first super-scale; and adjusting model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image, and continuously executing the step of inputting the first image and the first super-resolution scale in the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain the trained image super-resolution reconstruction model. In the method, the coordinates of the first pixel point are calculated through rounding during training, so that coordinate mapping between the first image and the generated image can be found under the super-division scale of any numerical value, and the trained image super-division reconstruction model obtained through training does not limit the numerical value of the super-division scale; according to the method, when training is carried out, the first weight under the first super-division scale is generated, for different first super-division scales, the weight corresponding to each super-division scale can be dynamically generated, the trained image super-division reconstruction model obtained through training can realize that one model outputs super-division results under any scale, and more requirements can be met in practical application.

Claims (19)

1. A training method for an image super-resolution reconstruction model, the method comprising:
inputting a first image and a first super-scale in training data into an image super-scale reconstruction model, and generating a generated image corresponding to the first image under the first super-scale through the image super-scale reconstruction model, wherein the training data comprises a plurality of groups of training image groups, each group of training image groups comprises a first image, a second image and the first super-scale, and the second image is an image corresponding to the first image under the first super-scale;
according to the second image corresponding to the first image and the generated image corresponding to the first image, adjusting model parameters of the image super-resolution reconstruction model, and continuously executing the step of inputting the first image and the first super-resolution scale in the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain a trained image super-resolution reconstruction model;
the image super-resolution reconstruction model comprises: the device comprises a feature extraction module and an up-sampling module;
the step of inputting the first image and the first super-scale in the training data into the image super-scale reconstruction model and generating, through the image super-scale reconstruction model, the generated image corresponding to the first image under the first super-scale comprises the following steps:
Inputting the first image into the feature extraction module to obtain a feature image corresponding to the first image;
and inputting the characteristic image and the first super-scale into the up-sampling module to obtain a generated image corresponding to the characteristic image.
2. The method of claim 1, wherein the upsampling module comprises a first upsampling layer, a second upsampling layer, a third upsampling layer, and an output layer;
the step of inputting the feature image and the first super-scale into the up-sampling module to obtain a generated image corresponding to the feature image includes:
inputting the characteristic image and the first super-scale to the first upsampling layer to obtain a first position offset and a first local block respectively corresponding to each first pixel point, wherein the first pixel point is a pixel point in the characteristic image;
inputting the first position offset and the first local block corresponding to each first pixel point to the second upsampling layer to obtain a first weight corresponding to each first pixel point;
inputting the first local block and the first weight to the third upsampling layer to obtain pixel values respectively corresponding to second pixel points, wherein the second pixel points are pixel points in the generated image;
And inputting pixel values corresponding to the second pixel points respectively to the output layer to obtain the generated image.
3. The method of claim 2, wherein the inputting the feature image and the first super-scale to the first upsampling layer to obtain the first position offset and the first local block respectively corresponding to each first pixel point comprises:
inputting the feature image and the first super-scale to the first upsampling layer to determine the first assumed pixel points of the generated image;
according to the first assumed pixel points and the first super-division scale, calculating first calculated values respectively corresponding to the first assumed pixel points;
performing rounding calculation on the first calculated values respectively corresponding to the first assumed pixel points to obtain first pixel points respectively corresponding to the first assumed pixel points;
calculating a difference value between each first pixel point and a first calculated value corresponding to the first pixel point to obtain first position offset corresponding to each first pixel point;
and extracting, in the feature image, the first local blocks respectively corresponding to the first pixel points according to the first pixel points and the size of the convolution kernel of the first upsampling layer.
4. The method of claim 2, wherein the second upsampling layer comprises a first convolution layer and a second convolution layer;
the step of inputting the first position offset and the first local block corresponding to each first pixel point to the second upsampling layer to obtain the first weight corresponding to each first pixel point includes:
inputting the first position offset into the first convolution layer to obtain first position components corresponding to the first pixel points respectively;
inputting the first local block into the second convolution layer to obtain first data components corresponding to the first pixel points respectively;
and obtaining first weights corresponding to the first pixel points respectively according to the first position components and the first data components.
5. The method according to claim 2, wherein inputting the first position offset and the first local block corresponding to each of the first pixel points to the second upsampling layer to obtain the first weight corresponding to each of the first pixel points includes:
and splicing and inputting the first position offset and the first local block corresponding to each first pixel point to the second upsampling layer to obtain the first weight corresponding to each first pixel point.
6. The method of claim 1, wherein adjusting model parameters of the image super-resolution reconstruction model based on the second image and the generated image comprises:
calculating a first loss value according to the second image and the generated image;
and adjusting parameters of the image super-resolution reconstruction model according to the first loss value.
7. The method according to claim 1, wherein adjusting the model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image comprises:
calculating a first loss value according to a second image corresponding to the first image and a generated image corresponding to the first image;
inputting the generated image and the second image into a discriminator network to obtain the probability of the generated image, and calculating a second loss value according to the probability;
and adjusting parameters of the image super-resolution reconstruction model according to the first loss value and the second loss value.
8. The method of claim 1, wherein when the first image is an original image acquired by an image sensor, before said inputting the first image and the first super-scale into the image super-scale reconstruction model, the method further comprises:
And preprocessing the first image to obtain a preprocessed image, and taking the preprocessed image as the first image.
9. The method of claim 8, wherein preprocessing the first image to obtain a preprocessed image, and taking the preprocessed image as the first image comprises:
converting the single-dimensional data of the first image into four-dimensional data;
and carrying out normalization processing on the four-dimensional data to obtain a preprocessed image, and taking the preprocessed image as a first image.
10. An image super-resolution reconstruction method, characterized in that the method comprises:
acquiring an image to be processed and a second super-scale of the image to be processed;
inputting the image to be processed and the second super-scale into a trained image super-division reconstruction model to obtain a super-division reconstruction image corresponding to the image to be processed, wherein the trained image super-division reconstruction model is an image super-division reconstruction model obtained by training according to the method of any one of claims 1 to 9.
11. The method of claim 10, wherein the image super-resolution reconstruction model comprises: the device comprises a feature extraction module and an up-sampling module;
Inputting the image to be processed and the second super-scale into a trained image super-division reconstruction model to obtain a super-division reconstruction image corresponding to the image to be processed, wherein the method comprises the following steps of:
inputting the image to be processed into a feature extraction module to obtain a feature image to be processed;
and inputting the feature image to be processed and the second super-division scale into an up-sampling module to obtain a super-division reconstructed image.
12. The method of claim 11, wherein the upsampling module comprises a first upsampling layer, a second upsampling layer, a third upsampling layer, and an output layer;
the step of inputting the feature image to be processed and the second super-division scale into an up-sampling module to obtain a super-division reconstructed image comprises the following steps:
inputting the feature image to be processed and the second super-scale to the first upsampling layer to obtain second position offset and second local blocks corresponding to third pixel points respectively, wherein the third pixel points are pixel points in the feature image to be processed;
inputting the second position offset and the second local block corresponding to each third pixel point to the second upsampling layer to obtain second weights corresponding to each third pixel point;
Inputting the second local block and the second weight to the third upsampling layer to obtain pixel values respectively corresponding to fourth pixel points, wherein the fourth pixel points are pixel points in the super-resolution reconstructed image;
and inputting pixel values corresponding to the fourth pixel points respectively to the output layer to obtain the super-resolution reconstructed image.
13. The method according to claim 12, wherein the inputting the feature image to be processed and the second super-scale to the first upsampling layer to obtain the second position offset and the second local block respectively corresponding to each third pixel point includes:
inputting the feature image to be processed and the second super-division scale to the first upsampling layer to determine a second assumed pixel point of the super-division reconstructed image;
calculating a second calculated value corresponding to the second assumed pixel point according to the second assumed pixel point and the second super-division scale;
performing rounding calculation on the second calculated value to obtain a third pixel point corresponding to the second assumed pixel point;
calculating the difference value between the third pixel points and the second calculated value to obtain second position offset corresponding to each third pixel point;
And extracting second local blocks corresponding to the third pixel points respectively from the feature image to be processed according to the third pixel points and the convolution kernel in the first upsampling layer.
14. The method of claim 12, wherein the second upsampling layer comprises a first convolution layer and a second convolution layer;
the step of inputting the second position offset and the second local block corresponding to each third pixel point to the second upsampling layer to obtain the second weight corresponding to each third pixel point includes:
inputting the second position offset into the first convolution layer to obtain second position components corresponding to the third pixel points respectively;
inputting the second local block into the second convolution layer to obtain second data components corresponding to the third pixel points respectively;
and obtaining second weights corresponding to the third pixel points respectively according to the second position components and the second data components.
15. The method of claim 12, wherein inputting the second position offset and the second local block corresponding to each of the third pixels to the second upsampling layer to obtain the second weight corresponding to each of the third pixels comprises:
And splicing and inputting the second position offset and the second local block to the second upsampling layer to obtain second weights respectively corresponding to the third pixel points.
16. The method of claim 10, wherein when the image to be processed is an original image acquired by an image sensor, before said inputting the image to be processed and the second super-scale into a trained image super-division reconstruction model, the method further comprises:
preprocessing the image to be processed to obtain a preprocessed image, and taking the preprocessed image as the image to be processed.
17. The method according to claim 16, wherein preprocessing the image to be processed to obtain a preprocessed image, and taking the preprocessed image as the image to be processed, comprises:
converting the single-dimensional data of the image to be processed into four-dimensional data;
and carrying out normalization processing on the four-dimensional data to obtain a preprocessed image, and taking the preprocessed image as an image to be processed.
18. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 9 when the computer program is executed.
19. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 9.
CN201910866170.0A 2019-09-12 2019-09-12 Training method and computer equipment for image super-resolution reconstruction model Active CN112488916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910866170.0A CN112488916B (en) 2019-09-12 2019-09-12 Training method and computer equipment for image super-resolution reconstruction model

Publications (2)

Publication Number Publication Date
CN112488916A CN112488916A (en) 2021-03-12
CN112488916B true CN112488916B (en) 2023-12-26

Family

ID=74920718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910866170.0A Active CN112488916B (en) 2019-09-12 2019-09-12 Training method and computer equipment for image super-resolution reconstruction model

Country Status (1)

Country Link
CN (1) CN112488916B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934765A (en) * 2017-03-14 2017-07-07 长沙全度影像科技有限公司 Panoramic picture fusion method based on depth convolutional neural networks Yu depth information
KR20180045645A (en) * 2016-10-26 2018-05-04 삼성전자주식회사 Image processing apparatus, method for processing image and computer-readable recording medium
CN108288251A (en) * 2018-02-11 2018-07-17 深圳创维-Rgb电子有限公司 Image super-resolution method, device and computer readable storage medium
CN109389556A (en) * 2018-09-21 2019-02-26 五邑大学 The multiple dimensioned empty convolutional neural networks ultra-resolution ratio reconstructing method of one kind and device
CN110211057A (en) * 2019-05-15 2019-09-06 武汉Tcl集团工业研究院有限公司 A kind of image processing method based on full convolutional network, device and computer equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012106677A2 (en) * 2011-02-03 2012-08-09 Voxeleron Llc Method and system for image analysis and interpretation
US9508126B2 (en) * 2015-02-17 2016-11-29 Adobe Systems Incorporated Image haze removal using fast constrained transmission estimation
US9911178B2 (en) * 2015-05-22 2018-03-06 Samsung Electronics Co., Ltd. System and method for content-adaptive super-resolution via cross-scale self-learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A single-frame super-resolution reconstruction algorithm with a deep cascaded network structure; Wang Fei; Wang Wei; Qiu Zhiliang; Opto-Electronic Engineering (07); 1-9 *
Image super-resolution algorithm using multi-scale convolutional neural networks; Chen Shuzhen; Xie Xiaohui; Yang Yuchi; Lian Qiusheng; Signal Processing (Issue 09); 21-32 *

Also Published As

Publication number Publication date
CN112488916A (en) 2021-03-12

Similar Documents

Publication Publication Date Title
US11151690B2 (en) Image super-resolution reconstruction method, mobile terminal, and computer-readable storage medium
CN109064396B (en) Single image super-resolution reconstruction method based on deep component learning network
CN110136066B (en) Video-oriented super-resolution method, device, equipment and storage medium
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
US20180300855A1 (en) Method and a system for image processing
CN109889800B (en) Image enhancement method and device, electronic equipment and storage medium
CN108109109B (en) Super-resolution image reconstruction method, device, medium and computing equipment
CN112801904B (en) Hybrid degraded image enhancement method based on convolutional neural network
CN109996023A (en) Image processing method and device
CN110992265A (en) Image processing method and model, model training method and electronic equipment
Guan et al. Srdgan: learning the noise prior for super resolution with dual generative adversarial networks
CN113781308A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN110634103A (en) Image demosaicing method based on generation of countermeasure network
CN111429371A (en) Image processing method and device and terminal equipment
CN109064394B (en) Image super-resolution reconstruction method based on convolutional neural network
CN111724312A (en) Method and terminal for processing image
CN112488916B (en) Training method and computer equipment for image super-resolution reconstruction model
JP7398938B2 (en) Information processing device and its learning method
US20230060988A1 (en) Image processing device and method
CN116630152A (en) Image resolution reconstruction method and device, storage medium and electronic equipment
CN113674154B (en) Single image super-resolution reconstruction method and system based on generation countermeasure network
CN112419146B (en) Image processing method and device and terminal equipment
CN112907446B (en) Image super-resolution reconstruction method based on packet connection network
CN110148194B (en) Image reconstruction method and device
Wang et al. Multi-scale detail enhancement network for image super-resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant