CN112488916A - Training method of image hyper-resolution reconstruction model and computer equipment - Google Patents

Training method of image hyper-resolution reconstruction model and computer equipment

Info

Publication number: CN112488916A
Authority: CN (China)
Prior art keywords: image, resolution, super, hyper, pixel point
Legal status: Granted
Application number: CN201910866170.0A
Other languages: Chinese (zh)
Other versions: CN112488916B (en)
Inventor: 王树朋
Current Assignee: Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee: Wuhan TCL Group Industrial Research Institute Co Ltd
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN201910866170.0A
Publication of CN112488916A
Application granted
Publication of CN112488916B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application relates to a training method for an image super-resolution reconstruction model and computer equipment, wherein the method comprises the following steps: inputting a first image and a first super-resolution scale in training data into an image super-resolution reconstruction model to generate the generated image corresponding to the first image at the first super-resolution scale, wherein the training data comprises a plurality of training image groups, each training image group comprises a first image, a second image and a first super-resolution scale, and the second image is the image corresponding to the first image at the first super-resolution scale; and adjusting the model parameters of the image super-resolution reconstruction model according to the second image and the generated image until a preset training condition is met, so as to obtain a trained image super-resolution reconstruction model. The method places no restriction on the value of the first super-resolution scale during training, which can be any value; the trained image super-resolution reconstruction model obtained by this training therefore realizes super-resolution at an arbitrary scale and can meet more requirements in practical application.

Description

Training method of image hyper-resolution reconstruction model and computer equipment
Technical Field
The application relates to the technical field of image processing, and in particular to a training method and computer equipment for an image super-resolution reconstruction model.
Background
Image super-resolution reconstruction is a popular research direction in the fields of computer vision and image processing, and the development of deep learning has driven progress in image super-resolution reconstruction methods, such as EDSR (Enhanced Deep Super-Resolution network) and ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks). However, these super-resolution methods share a defect: deep-learning-based image super-resolution reconstruction usually uses an up-sampling method with a fixed super-resolution scale, such as bicubic interpolation or PixelShuffle, so super-resolution can only be performed at fixed integer scales, for example 2x, 3x and 4x; non-integer scales such as 1.5x and 2.5x cannot be realized, although in practice non-integer-scale super-resolution often has more applications. This defect of the prior art greatly limits the application of deep-learning-based image super-resolution reconstruction.
Therefore, the prior art is in need of improvement.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a training method and computer equipment for an image super-resolution reconstruction model, so as to realize super-resolution at an arbitrary scale.
In one aspect, an embodiment of the present invention provides a training method for an image hyper-resolution reconstruction model, including:
inputting a first image and a first super-resolution scale in training data into an image super-resolution reconstruction model, and generating, through the image super-resolution reconstruction model, a generated image corresponding to the first image at the first super-resolution scale, wherein the training data comprises a plurality of training image groups, each training image group comprises a first image, a second image and a first super-resolution scale, and the second image is the image corresponding to the first image at the first super-resolution scale; and
adjusting model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first super-resolution scale in the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain a trained image super-resolution reconstruction model.
In a second aspect, an embodiment of the present invention provides an image super-resolution reconstruction method, where the method includes:
acquiring an image to be processed and a second super-resolution scale of the image to be processed; and
inputting the image to be processed and the second super-resolution scale into a trained image super-resolution reconstruction model to obtain a super-resolution reconstructed image corresponding to the image to be processed, wherein the trained image super-resolution reconstruction model is obtained by training with the training method described above.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the following steps when executing the computer program:
inputting a first image and a first super-resolution scale in training data into an image super-resolution reconstruction model, and generating, through the image super-resolution reconstruction model, a generated image corresponding to the first image at the first super-resolution scale, wherein the training data comprises a plurality of training image groups, each training image group comprises a first image, a second image and a first super-resolution scale, and the second image is the image corresponding to the first image at the first super-resolution scale; and
adjusting model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first super-resolution scale in the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain a trained image super-resolution reconstruction model.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the following steps:
inputting a first image and a first super-resolution scale in training data into an image super-resolution reconstruction model, and generating, through the image super-resolution reconstruction model, a generated image corresponding to the first image at the first super-resolution scale, wherein the training data comprises a plurality of training image groups, each training image group comprises a first image, a second image and a first super-resolution scale, and the second image is the image corresponding to the first image at the first super-resolution scale; and
adjusting model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first super-resolution scale in the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain a trained image super-resolution reconstruction model.
Compared with the prior art, the embodiment of the invention has the following advantages:
according to the training method provided by the embodiment of the invention, a first image and a first hyper-scale input image hyper-scale reconstruction model in training data are input, and a generated image corresponding to the first image under the first hyper-scale is generated through the image hyper-scale reconstruction model, wherein the training data comprises a plurality of groups of training image groups, each group of training image group comprises a first image, a second image and a first hyper-scale, and the second image is an image corresponding to the first image under the first hyper-scale; and adjusting model parameters of the image hyper-resolution reconstruction model according to a second image corresponding to the first image and a generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first hyper-resolution scale in the training data into the image hyper-resolution reconstruction model until a preset training condition is met, so as to obtain a trained image hyper-resolution reconstruction model. In the method, during training, the coordinates of the first pixel points are calculated through rounding, so that the coordinate mapping between the first image and the generated image can be found under the super-resolution scale of any numerical value, and the trained image super-resolution reconstruction model obtained through training does not limit the numerical value of the super-resolution scale; the method generates the first weight under the first hyper-resolution scale during training, and can dynamically generate the weight corresponding to each hyper-resolution scale for different first hyper-resolution scales.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flow chart of a training method of an image super-resolution reconstruction model in an embodiment of the present invention;
FIG. 2 is a schematic illustration of a Bayer array in an embodiment of the invention;
FIG. 3 is a schematic diagram of the situation that occurs when a plurality of different first super-resolution scales are trained simultaneously without including the first super-resolution scale in the first position offset, in an embodiment of the present invention;
FIG. 4 is a diagram illustrating the calculation of the first weight of a first pixel point in an embodiment of the present invention;
FIG. 5 is a schematic flow chart of an image super-resolution reconstruction method in an embodiment of the present invention;
FIG. 6 is a schematic diagram of super-resolution effects at different scales in an embodiment of the present invention;
FIG. 7 is an internal structure diagram of a computer device in an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort fall within the protection scope of the present invention.
Various non-limiting embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a training method of an image super-resolution reconstruction model in an embodiment of the present invention is shown. In this embodiment, the method may include the following steps:
S1, inputting a first image and a first super-resolution scale in training data into an image super-resolution reconstruction model, and generating, through the image super-resolution reconstruction model, a generated image corresponding to the first image at the first super-resolution scale, wherein the training data comprises a plurality of training image groups, each training image group comprises a first image, a second image and a first super-resolution scale, and the second image is the image corresponding to the first image at the first super-resolution scale.
In the embodiment of the present invention, the first image may be a three-primary-color image (RGB image) or an original image (RAW image) acquired by an image sensor. Image super-resolution reconstruction enlarges a low-resolution image and supplements more image details to obtain a clear high-resolution image, where the magnification factor is the value of the super-resolution scale. The value of the first super-resolution scale may be any value: a non-integer multiple such as 1.5, 2.5 or 3.3, or an integer multiple such as 2, 4 or 6. The second image is obtained from the first image at the first super-resolution scale, and the ratio of the resolution of the second image to the resolution of the first image equals the value of the first super-resolution scale. Compared with the first image, the second image contains more image detail and is clearer, so the second image can be regarded as the standard image for image super-resolution reconstruction.
In the embodiment of the present invention, if the first image is a RAW image, it contains all information, including all the noise, of the image; the RAW image therefore needs to be preprocessed to remove this noise, so that the preprocessed image contains less noise. The preprocessing comprises dimension-change processing and normalization of the image.
Specifically, if the first image is a RAW image, the method further includes, before step S1:
m1, preprocessing the first image to obtain a preprocessed image, and taking the preprocessed image as the first image.
In the embodiment of the present invention, if the first image is a RAW image, it contains all information, including all the noise, of the image. In the conventional method, Image Signal Processing (ISP) is used to convert the RAW data into RGB data; the ISP pipeline includes operations such as demosaicing, denoising, color correction and white balance. This processing may cause part of the information of the image to be lost, and the lost information may affect the final quality of the super-resolution result; the embodiment therefore operates on the RAW image directly.
Specifically, the preprocessing the first image to obtain a preprocessed image, and taking the preprocessed image as the first image includes:
m11, converting the single-dimensional data of the first image into four-dimensional data;
m12, carrying out normalization processing on the four-dimensional data to obtain a preprocessed image, and taking the preprocessed image as a first image.
In the embodiment of the present invention, the image acquired by the image sensor is in RAW format; the image sensor may be a Charge-Coupled Device (CCD) sensor or a Complementary Metal-Oxide-Semiconductor (CMOS) sensor. The data of a color image captured by the image sensor are arranged in a Bayer array; as shown in fig. 2, the Bayer array is composed of 8 green pixels, 4 blue pixels and 4 red pixels.
For example, after a single image X in RAW format is acquired, the original image X is preprocessed as follows. The dimension of the RAW image data is first changed from H × W × 1 to H/2 × W/2 × 4; for example, with H = 8 and W = 8, the RAW image data is 8 × 8 × 1 with a dimension of 1, and after the change in dimensions the RAW image data is 4 × 4 × 4 with a dimension of 4. Normalization is then performed, so that the H/2 × W/2 × 4 data are mapped into the [0, 1] interval.
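For illustration, the following is a minimal sketch of the preprocessing in steps M11 and M12, assuming a 2 × 2 Bayer pattern; the function name, the white level of 255 and the channel order are illustrative assumptions rather than details from the patent.

```python
import numpy as np

def preprocess_raw(raw: np.ndarray, white_level: float = 255.0) -> np.ndarray:
    """raw: H x W single-channel Bayer data, H and W even (assumed layout)."""
    # Space-to-depth: gather the four Bayer positions into four channels,
    # changing the dimensions from H x W x 1 to H/2 x W/2 x 4.
    planes = np.stack([raw[0::2, 0::2],   # e.g. R
                       raw[0::2, 1::2],   # e.g. G
                       raw[1::2, 0::2],   # e.g. G
                       raw[1::2, 1::2]],  # e.g. B
                      axis=-1).astype(np.float32)
    # Normalization: map the pixel values into the [0, 1] interval.
    return planes / white_level

raw = np.random.randint(0, 256, size=(8, 8)).astype(np.float32)  # 8 x 8 example
print(preprocess_raw(raw).shape)  # (4, 4, 4)
```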
In the embodiment of the invention, the first image and the first super-resolution scale in the training data are input into the image super-resolution reconstruction model, and the generated image corresponding to the first image is generated through the model. The image super-resolution reconstruction model comprises a feature extraction module and an up-sampling module, wherein the feature extraction module is used for extracting features of the first image, and the up-sampling module is used for outputting the generated image according to the output result of the feature extraction module.
Specifically, step S1 includes:
and S11, inputting the first image into the feature extraction module to obtain a feature image corresponding to the first image.
The feature extraction module is a module necessary for super-resolution reconstruction, and in the embodiment of the present invention it may be any one of the existing networks, such as a residual module of the EDSR super-resolution network, an RDB module of the RDN super-resolution network, a dense connection module of SRDenseNet, an SE module of SENet, an Inception module of an Inception network, or ConvLSTM. The feature extraction module is used to extract more abstract image features; the feature image mainly comprises the color features, texture features, shape features and spatial relationship features of the image and is independent of the concrete form of the extractor, so any existing fully convolutional feature extraction scheme can serve as the feature extraction module of this application.
For example, a residual module of the EDSR super-resolution network may be selected as the feature extraction module of the super-resolution reconstruction model in the embodiment of the present invention, and the first image is input into this residual module to obtain the feature image corresponding to the first image; alternatively, an RDB module of the RDN super-resolution network may be selected as the feature extraction module, and the first image is input into the RDB module to obtain the feature image.
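As an illustration of one possible feature extraction module, the following PyTorch sketch implements an EDSR-style residual block; the channel count and layer sizes are assumptions, not values specified by the patent.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """EDSR-style residual block (assumed 64 channels, two 3x3 convolutions)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: add the input back to the convolved features.
        return x + self.body(x)

feature_image = ResidualBlock()(torch.randn(1, 64, 48, 48))
```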
And S12, inputting the feature image and the first super-scale into the up-sampling module to obtain a generated image corresponding to the feature image.
In the embodiment of the invention, the up-sampling module enlarges the image by an up-sampling method. Up-sampling methods mainly come in three forms: interpolation, deconvolution and sub-pixel convolution. The embodiment of the invention adopts a learnable up-sampling module, whose greatest characteristic is that its parameters need to be learned through training; after training is finished, the up-sampling module can produce a generated image closer to the second image.
Specifically, the upsampling module includes a first upsampling layer, a second upsampling layer, a third upsampling layer and an output layer, and step S12 includes:
s121, inputting the feature image and the first super-resolution scale into the first up-sampling layer to obtain a first position offset and a first local block corresponding to each first pixel point, wherein the first pixel points are pixel points in the feature image.
In the embodiment of the present invention, the size of the generated image is obtained through the first up-sampling layer according to the feature image and the first super-resolution scale. Since the generated image is not actually produced at this step, it may be called the assumed generated image; the coordinates of each pixel point in the assumed generated image can be obtained from its size, and from these coordinates each pixel point of the feature image, that is, each first pixel point, can be derived. Any pixel point in the feature image may be called a first pixel point. Through the first up-sampling layer, the first position offset and the first local block corresponding to each first pixel point are obtained: the position offset produced when deriving each first pixel point of the feature image from the coordinates of each pixel point of the assumed generated image is the first position offset; and after each first pixel point is obtained, the first local block is formed by taking part of the pixel points in the feature image with the first pixel point as its center, where the part of the pixel points comprises the neighborhood pixel points centered on the first pixel point in the feature image.
Specifically, step S121 includes:
s121a, inputting the feature image and the first super-scale into the first upsampling layer to determine first hypothetical pixel points of the generated image.
In the embodiment of the present invention, the first upsampling layer obtains the resolution of the generated image from the resolution of the feature image and the value of the first super-resolution scale, and the coordinates of each pixel point in the generated image are known from that resolution.
For example, assuming that the resolution of the input feature image X is 200 × 300 and the first super-resolution scale is r1 = 1.5, the resolution of the generated image Y of the feature image at the first super-resolution scale is 300 × 450. The pixel points of the generated image Y may be assumed to be the first assumed pixel points; the abscissa of each first assumed pixel point of the generated image ranges from 1 to 300, and the ordinate from 1 to 450.
S121b, calculating first calculation values respectively corresponding to the first assumed pixel points according to the first assumed pixel points and the first super-resolution scale.
In the embodiment of the present invention, the generated image is obtained from the feature image at the first super-resolution scale, and the ratio of the coordinates of each pixel point of the generated image to the coordinates of the corresponding pixel point of the feature image is theoretically close to the value of the first super-resolution scale. Dividing the coordinates of each first assumed pixel point by the value of the first super-resolution scale gives the first calculated value corresponding to that point; the first calculated value represents the real coordinates of the first pixel point of the feature image that should theoretically be obtained from the coordinates of the pixel point of the generated image and the first super-resolution scale.
For example, for the first assumed pixel point f1 = (i1, j1) = (80, 400) of the generated image Y and the first super-resolution scale r1 = 1.5, the abscissa and ordinate of f1 are divided by r1 = 1.5 to calculate the first calculated value p1 corresponding to f1, that is, p1 = (i1/r1, j1/r1) = (80/1.5, 400/1.5) ≈ (53.33, 266.67). Similarly, for the first assumed pixel point f2 = (i2, j2) = (10, 25) of the generated image Y, the corresponding first calculated value is p2 = (10/1.5, 25/1.5) ≈ (6.67, 16.67).
S121c, performing rounding calculation on the first calculated values respectively corresponding to the first assumed pixel points to obtain first pixel points respectively corresponding to the first assumed pixel points;
in the embodiment of the present invention, since the value of the first superscale r1 may be any value, and the coordinate obtained by directly dividing the coordinate of each first hypothetical pixel by r1 cannot be guaranteed to be an integer, the first calculated value obtained in step S121b cannot be used as the first pixel corresponding to each first hypothetical pixel, and the first calculated value needs to be rounded. In the embodiment of the present invention, by performing rounding calculation on the first calculated value in step S221c, overclass at any scale (not limited to the integer overclass scale) can be implemented.
For example, for the first assumed pixel point f1 = (i1, j1) = (80, 400) of the generated image Y, the corresponding first calculated value is p1 ≈ (53.33, 266.67); performing the rounding-down calculation on p1 gives the first pixel point corresponding to f1 as q1 = (floor(53.33), floor(266.67)) = (53, 266). For the first assumed pixel point f2 = (i2, j2) = (10, 25) of the generated image Y, the corresponding first pixel point is similarly q2 = (floor(6.67), floor(16.67)) = (6, 16).
A first calculated value is computed for each first assumed pixel point of the generated image Y and rounded to obtain the first pixel point corresponding to each first assumed pixel point; that is, the positional mapping between the pixel points of the feature image and the pixel points of the generated image is found. It can further be deduced that, when the first super-resolution scale is larger than 1, the number of pixel points of the generated image is larger than that of the feature image, i.e., several first assumed pixel points of the generated image correspond to one first pixel point of the feature image.
S121d, calculating a difference between the first pixel point and the first calculated value corresponding to the first pixel point for each first pixel point, and obtaining a first position offset corresponding to each first pixel point.
In this embodiment of the present invention, since rounding is performed on the first calculated value in step S121c, a position offset, called the first position offset, arises between the first calculated value obtained directly from the first assumed pixel point through the first super-resolution scale and the first pixel point of the feature image obtained by rounding that value.
For example, for the pixel point q1 = (53, 266) among the first pixel points of the feature image, the corresponding first calculated value is p1 ≈ (53.33, 266.67); from p1 and q1, the first position offset of q1 is obtained as V1 = p1 - q1 ≈ (0.33, 0.67).
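The coordinate mapping of steps S121a to S121d can be summarized in a short sketch; the function below reproduces the worked example above, with illustrative names.

```python
import numpy as np

def map_coordinates(i: int, j: int, r: float):
    p = (i / r, j / r)                               # first calculated value
    q = (int(np.floor(p[0])), int(np.floor(p[1])))   # first pixel point (rounded down)
    v = (p[0] - q[0], p[1] - q[1])                   # first position offset
    return p, q, v

p1, q1, v1 = map_coordinates(80, 400, 1.5)
print(q1)                          # (53, 266)
print([round(x, 2) for x in v1])   # [0.33, 0.67]
```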
Alternatively, considering the effect of the first super-resolution scale, the step may be: for each first pixel point, obtaining the corresponding first position offset according to the first pixel point, the first calculated value corresponding to the first pixel point, and the first super-resolution scale.
In the embodiment of the present invention, the first position offset may further include a first super-resolution scale in addition to the difference between the first calculation value and the first pixel point, so that when the first position offset is used to train a plurality of different first super-resolution scales at the same time, the first position offset of different super-resolution scales may be distinguished, and further, the first weight values under different super-resolution scales and the pixel values of the pixel points of the generated image obtained under different super-resolution scales may be distinguished.
If the first super-resolution scale is not included in the first position offset, the following may occur when simultaneously training a plurality of different first super-resolution scales:
referring to fig. 3, i is a first image, ii is a generated image corresponding to the first image when the first super-resolution is 2, and iii is a generated image corresponding to the first image when the first super-resolution is 4; for a first pixel point (i ', j') in the image I, the pixel point in the generated image II obtained when the super-resolution scale is 2 is (i, j), and the first pixel point (i ', j') in the image I, the pixel point in the generated image III obtained when the super-resolution scale is 4 is (2i,2j), if a plurality of different first super-resolution scales are trained simultaneously, the first position offset does not include the first super-resolution scale, so that the first weight of the pixel point (i ', j') when the first super-resolution scale is 2 is the same as the first weight of the pixel point (i ', j') when the first super-resolution scale is 4, and the pixel value of the pixel point (i, j) in the generated image II is the same as the pixel value of the pixel point (2i,2j) in the generated image III; similarly, for a first pixel point (i '+1, j' +1) in the image i, a first weight value of (i '+1, j' +1) when the first super-scale is 2 is the same as a first weight value of (i '+1, j' +1) when the first super-scale is 4, so that a pixel value of the pixel point (i, j) in the generated image ii is the same as a pixel value of the pixel point (2i,2j) in the generated image iii; therefore, the pixel value of each pixel point in the generated image II is the same as the pixel value of part of pixel points in the generated image III, and the pixel points in the generated image II can be understood to be taken from the generated image III, namely the generated image II is a sub-image of the generated image III, so that the super-resolution reconstruction effect can be limited.
Therefore, the first position offset includes a difference between the first calculation value and the first pixel point and a first super-resolution scale, so that when the first position offset is used for simultaneously training a plurality of different first super-resolution scales, the first position offset of different super-resolution scales can be distinguished, and further, the first weight values under different super-resolution scales and the pixel values of the pixel points of the generated image obtained under different super-resolution scales are distinguished.
For example, when the first assumed pixel point f1 = (i1, j1) = (80, 400) of the generated image and r1 = 3, the corresponding first pixel point is q1 = (floor(80/3), floor(400/3)) = (26, 133) and the corresponding first calculated value is p1 = (80/3, 400/3) ≈ (26.67, 133.33); from p1, q1 and r1, the first position offset of q1 is obtained as V1 = (0.67, 0.33, 3). When the first assumed pixel point f1' = (i2, j2) = (40, 200) and r1' = 1.5, the corresponding first pixel point is q1' = (floor(40/1.5), floor(200/1.5)) = (26, 133) and the corresponding first calculated value is p1' = (40/1.5, 200/1.5) ≈ (26.67, 133.33); from p1', q1' and r1', the first position offset of q1' is obtained as V1' = (0.67, 0.33, 1.5). It can be seen that, although the fractional offsets coincide, V1 and V1' are different; further, the first weight obtained when f1 = (80, 400) and r1 = 3 is different from the first weight obtained when f1' = (40, 200) and r1' = 1.5, and consequently the pixel values of the corresponding pixel points of the two generated images are also different.
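A small sketch of this scale-aware offset follows; whether the scale itself or a related quantity is appended is an implementation choice, and the form below simply follows the statement that the first position offset includes the first super-resolution scale.

```python
def offset_with_scale(i: int, j: int, r: float):
    # Fractional remainders of the coordinate mapping, plus the scale itself.
    fi, fj = i / r - int(i / r), j / r - int(j / r)
    return (round(fi, 2), round(fj, 2), r)

print(offset_with_scale(80, 400, 3.0))  # (0.67, 0.33, 3.0)
print(offset_with_scale(40, 200, 1.5))  # (0.67, 0.33, 1.5) -- differs only in r
```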
And S121e, extracting, in the feature image, the first local block corresponding to each first pixel point according to the first pixel point and the size of the convolution kernel of the first upsampling layer.
In the embodiment of the present invention, k × k matrices are obtained through the convolution kernel of the first upsampling layer. Each first pixel point is used in turn as the center point of a k × k matrix, and the region corresponding to the k × k matrix centered on a first pixel point serves as the first local block corresponding to that first pixel point; for example, the region corresponding to the k × k matrix centered on the first pixel point q2 serves as the first local block of q2, and the region centered on the first pixel point q3 serves as the first local block of q3. The first local block comprises the first pixel point and its neighborhood pixel points, where k is the size of the convolution kernel of the first upsampling layer, so the size of the first local block is k × k.
For example, for the first pixel point q1 = (53, 266) of the feature image X, a first local block h'1 of size k × k is taken in the feature image X with q1 = (53, 266) as its center, where k is the size of the convolution kernel of the first upsampling layer; if k = 3, the first local block h'1 consists of 9 pixel points of the feature image X.
In the embodiment of the present invention, when the first pixel point is an edge pixel point of the feature image, the complete region of the k × k matrix centered on it cannot be obtained within the feature image. In this case a zero-padding operation is needed, i.e., the missing neighborhood pixel points of the first local block are filled with the pixel value 0; after zero padding, part of the pixel points in the first local block of an edge first pixel point are pixel points of the feature image, and the other part are filled pixel points with pixel value 0.
For example, if an edge first pixel point of the feature image X is q4 = (1, 1) and k = 3, a first local block h'2 of size 3 × 3 is taken in the feature image X with q4 = (1, 1) as its center; the first local block h'2 then comprises the first pixel points q4 = (1, 1), q5 = (1, 2), q6 = (2, 1) and q7 = (2, 2), and the remaining five are filled pixel points with pixel value 0.
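The local-block extraction of step S121e, including the zero padding at image edges, can be sketched as follows; a single-channel feature map and 0-based coordinates are assumed for brevity.

```python
import numpy as np

def extract_local_block(feat: np.ndarray, q: tuple, k: int = 3) -> np.ndarray:
    """Return the k x k first local block centered on first pixel point q."""
    pad = k // 2
    padded = np.pad(feat, pad, mode="constant", constant_values=0)  # zero padding
    i, j = q[0] + pad, q[1] + pad  # shift coordinates into the padded map
    return padded[i - pad:i + pad + 1, j - pad:j + pad + 1]

feat = np.arange(16, dtype=np.float32).reshape(4, 4)
print(extract_local_block(feat, (0, 0)))  # corner pixel: five of nine entries are 0
```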
And S122, inputting the first position offset and the first local block corresponding to each first pixel point to the second upsampling layer to obtain a first weight corresponding to each first pixel point.
In the embodiment of the present invention, the first weights respectively corresponding to the first pixel points are obtained according to the first position offset and the first local block of each first pixel point, so that the first weights are not only related to the first position offset but also related to the first local block, and a process of obtaining the first weights through the second upsampling layer may also be referred to as a bilateral upsampling process.
In the embodiment of the present invention, the first position offset and the first local block of each first pixel point are input into the second upsampling layer as a linear combination, and the first weight is obtained through the transformation of a nonlinear activation function. The first position offset and the first local block are the output results of the first upsampling layer; the first weight characterizes the connection strength between neurons of the second upsampling layer and neurons of the first upsampling layer: the larger the first weight, the stronger the connection between neurons, and conversely, the smaller the first weight, the weaker the connection.
Specifically, the second upsampling layer includes a first convolutional layer and a second convolutional layer, and step S122 includes:
s122a, inputting the first position offset into the first convolution layer, so as to obtain first position components corresponding to the first pixels, respectively.
In the embodiment of the invention, the first position offset is input into the first convolution layer, and the first position offset is subjected to nonlinear transformation to obtain the first position component. The first position component reflects the size of the first position offset of the first pixel point, the larger the first position component is, the larger the first position offset is, and otherwise, the smaller the first position component is, the smaller the first position offset is.
For example, referring to fig. 4, for the pixel point q1 among the first pixel points, the first position offset V1 is input into the first convolutional layer to obtain the first position component point'1 corresponding to q1; similarly, for the first pixel points q1, …, qn, the first position components point'1, …, point'n corresponding to each first pixel point can be obtained, where n is the number of pixel points in the generated image.
S122b, inputting the first local block into the second convolution layer, and obtaining first data components corresponding to the first pixels, respectively.
In the embodiment of the present invention, the first local block is input into the second convolutional layer and subjected to a nonlinear transformation to obtain the first data component. The first data component represents the feature information of the first local block onto which each pixel point of the generated image is mapped in the feature image; this feature information includes the color features, texture features, shape features and spatial relationship features of the pixel points in the first local block.
For example, referring to fig. 4, for one pixel q1 of each first pixel, the first local block h '1 is input to the second convolutional layer to obtain a first data component data'1 corresponding to q1, and similarly, for each first pixel q1, … …, qn, the first data component corresponding to each first pixel may be obtained: data '1, … …, data' n, where n is the number of first pixel points in the generated image.
S122c, obtaining first weights respectively corresponding to the first pixel points according to the first position component and the first data component.
In the embodiment of the present invention, a first weight corresponding to each first pixel point may be obtained according to a first position component output by the first convolution layer and a first data component output by the second convolution layer; or performing summation operation according to the weights of the first position component and the first data component to obtain first weights corresponding to the first pixel points respectively.
For example, for the pixel point q1 among the first pixel points, the first position component is point'1 and the first data component is data'1; multiplying point'1 by data'1 gives the first weight W'1 of q1, and similarly, for the first pixel points q1, …, qn, the first weights W'1, …, W'n corresponding to each first pixel point can be obtained, where n is the number of pixel points in the generated image. Alternatively, the first weight W'1 may be obtained by a weighted summation of the first position component and the first data component, recorded as W'1 = α·point'1 + β·data'1, where α is a parameter of the first convolutional layer, β is a parameter of the second convolutional layer, point'1 is the first position component of q1, and data'1 is the first data component of q1.
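A possible realization of steps S122a to S122c is sketched below in PyTorch, with per-pixel linear layers standing in for the first and second convolutional layers; all layer sizes, and the learned scalar combination weights α and β, are assumptions.

```python
import torch
import torch.nn as nn

class BilateralUpsampleWeights(nn.Module):
    def __init__(self, channels: int = 64, k: int = 3, offset_dim: int = 3):
        super().__init__()
        # First "convolutional layer": offsets -> first position component.
        self.pos_branch = nn.Sequential(nn.Linear(offset_dim, channels), nn.ReLU())
        # Second "convolutional layer": local blocks -> first data component.
        self.data_branch = nn.Sequential(nn.Linear(channels * k * k, channels), nn.ReLU())
        self.alpha = nn.Parameter(torch.tensor(1.0))  # parameter of the first layer
        self.beta = nn.Parameter(torch.tensor(1.0))   # parameter of the second layer

    def forward(self, offsets: torch.Tensor, blocks: torch.Tensor) -> torch.Tensor:
        point = self.pos_branch(offsets)              # first position component
        data = self.data_branch(blocks.flatten(1))    # first data component
        return self.alpha * point + self.beta * data  # W' = alpha*point' + beta*data'

n = 1024  # one weight vector per generated-image pixel (small batch for the demo)
weights = BilateralUpsampleWeights()(torch.randn(n, 3), torch.randn(n, 64, 3, 3))
```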
In the embodiment of the present invention, for the same first image, different values of the first super-resolution scale lead to different first weights in step S122; that is, the model provided in the embodiment of the present invention can dynamically generate the corresponding first weight for each super-resolution scale, so that one set of model parameters can output results at different super-resolution scales.
And S123, inputting the first local blocks and the first weights into the third upsampling layer to obtain the pixel values corresponding to the second pixel points, where the second pixel points are the pixel points of the generated image.
In the embodiment of the present invention, the second pixel point t1 = (i1, j1) has the same coordinate values as the first assumed pixel point f1 = (i1, j1) but a different meaning: the first assumed pixel point is a pixel point of the generated image assumed from the feature image and the first super-resolution scale, and no generated image exists yet when it is obtained; the second pixel point is a pixel point of the generated image output by the third upsampling layer according to the weights and local blocks of the first pixel points of the feature image. The generated image is obtained from the pixel values corresponding to the second pixel points, and each pixel point in the generated image is called a second pixel point.
For example, for a first pixel point q1 = (i1', j1') of the feature image X, the corresponding second pixel point of the generated image Y is t1 = (i1, j1). The pixel value of the second pixel point t1 of the generated image Y is determined by the feature values of the first local block h'1 corresponding to the feature-image pixel point q1 = (i1', j1') and the first weight W'1 corresponding to q1; that is, the third upsampling layer can be regarded as a mapping from the first local blocks of the feature image X to the pixel values of the generated image Y, as shown in formula (1):
I_Y(i, j) = Φ(S_X(h'), W'(i', j'))    (1)
where Φ denotes the mapping from the feature values of the first local block h' to the pixel value of each second pixel point of the generated image, S_X(h') denotes the feature values of the first local block h' of the pixel point (i', j') in the feature image X, W'(i', j') is the first weight of the pixel point (i', j') in the feature image X, and I_Y(i, j) is the pixel value of the second pixel point with coordinates (i, j) in the generated image.
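One possible reading of formula (1) is sketched below, realizing Φ as a weighted sum of the local-block feature values; this is an illustrative assumption, since the patent does not fix the concrete form of the mapping.

```python
import torch

def phi(blocks: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    # blocks:  (n, c, k, k) feature values of the first local blocks
    # weights: (n, c, k, k) first weights, expanded to match
    # A weighted sum over each local block gives the pixel value of the
    # corresponding second pixel point.
    return (blocks * weights).sum(dim=(1, 2, 3))

pixel_values = phi(torch.randn(1024, 64, 3, 3), torch.randn(1024, 64, 3, 3))
```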
And S124, inputting the pixel values corresponding to the second pixel points into the output layer to obtain the generated image.
In the embodiment of the present invention, the output layer is configured to convert the output result of the third upsampling layer, i.e., the pixel values corresponding to the second pixel points, into common RGB-format data. Specifically, scaling is performed first: the pixel values corresponding to the second pixel points are multiplied by 255 to ensure that the final output has the same range as RGB data. Since data beyond the range [0, 255] may exist after scaling, data truncation is then performed on the scaled result: data smaller than 0 is set to 0, and data larger than 255 is set to 255.
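The scaling and truncation of step S124 amount to the following one-line conversion; this is a direct sketch of the described output layer.

```python
import numpy as np

def to_rgb(pixels: np.ndarray) -> np.ndarray:
    # Scale by 255, then truncate: values below 0 become 0, above 255 become 255.
    return np.clip(pixels * 255.0, 0, 255).astype(np.uint8)

print(to_rgb(np.array([-0.1, 0.5, 1.2])))  # [  0 127 255]
```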
S2, adjusting model parameters of the image super-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first super-resolution scale in the training data into the image super-resolution reconstruction model until a preset training condition is met, so as to obtain the trained image super-resolution reconstruction model.
In the embodiment of the present invention, the second image is the image of the first image at the first super-resolution scale and is equivalent to the standard image of the image super-resolution reconstruction processing, while the generated image is the image obtained by inputting the first image into the image super-resolution reconstruction model. A difference value is obtained by comparing the generated image produced by the model with the second image, and the model parameters of the image super-resolution reconstruction model are adjusted and modified using this difference value, so that the generated image produced by the model becomes closer to the second image.
Optionally, in step S2, adjusting a model parameter of the image hyper-resolution reconstruction model according to the second image and the generated image, including:
s21a, calculating a first loss value according to the second image and the generated image;
in the embodiment of the present invention, the second image is an image of the first image after the first super-resolution scale transformation, and may be regarded as a standard answer, the first image obtains a generated image through a super-resolution reconstruction model, and the pixel values of the pixels of the generated image are compared with the pixel values of the pixels of the second image, so as to obtain a first Loss value Loss 1.
And S22a, adjusting the parameters of the image hyper-resolution reconstruction model according to the first loss value.
In the embodiment of the invention, assuming the parameter of the image super-resolution reconstruction model is β1, the first loss value Loss1 is back-propagated to modify the parameter β1 of the model, so as to obtain the modified parameter β2.
In the embodiment of the invention, after the parameters are modified, the step of inputting the first image and the first super-resolution scale in the training data into the image super-resolution reconstruction model is continued until a preset training condition is met, where the preset training condition includes the first loss value meeting a preset requirement or the training count reaching a preset number. The preset requirement may be determined according to the model being trained and will not be detailed here; the preset number may be the maximum training count of the model, for example 50000. Thus, a generated image is output from the model and the first loss value is calculated from the generated image and the second image. After the first loss value is calculated, whether it meets the preset requirement is judged: if it does, training ends. If not, whether the training count has reached the preset number is judged; if it has not, the parameters of the model are adjusted according to the first loss value, and if it has, training ends. Judging whether training is finished by both the loss value and the training count prevents the training from entering an endless loop because the loss value cannot reach the preset requirement.
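The parameter-adjustment loop of step S2 can be sketched as follows, assuming an L1 pixel loss between the generated image and the second image (the patent does not fix the loss form); `model` and `dataset` are stand-ins, the loss threshold is illustrative, and the 50000-step cap follows the preset number mentioned above.

```python
import itertools
import torch
import torch.nn as nn

def train(model, dataset, max_steps: int = 50000, loss_target: float = 1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    l1 = nn.L1Loss()
    # The training image groups are fed to the model cyclically.
    for step, (first_img, second_img, scale) in enumerate(itertools.cycle(dataset), 1):
        generated = model(first_img, scale)   # generated image
        loss1 = l1(generated, second_img)     # first loss value
        opt.zero_grad()
        loss1.backward()                      # back-propagate Loss1 to adjust parameters
        opt.step()
        # Preset training condition: the loss meets the preset requirement,
        # or the training count reaches the preset number.
        if loss1.item() < loss_target or step >= max_steps:
            return model
```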
Further, since the parameters of the image super-resolution reconstruction model are modified when the training condition does not meet the preset condition (for example, the first loss value does not meet the preset requirement and the training count has not reached the preset number), after the parameters are modified according to the first loss value, the model needs to be trained further, i.e., the step of inputting the first image and the first super-resolution scale in the training data into the model is continued. The first image and the first super-resolution scale input in the continued step may be ones that have not yet been input into the model. For example, all first images in the training data have unique image identifiers (e.g., image numbers), and the first super-resolution scales differ in value (r1 = 1, r2 = 2, …); the image identifier of the first image input in the first round of training then differs from that input in the second round. For instance, the first image input in the first round has image number 1 and first super-resolution scale r1, the first image input in the second round has image number 2 and first super-resolution scale r2, and the first image input in the N-th round has image number N and first super-resolution scale rN. Of course, in practical application, because the number of first images in the training data is limited, in order to improve the training effect, the first images in the training data may be input into the model sequentially, and after all first images and their corresponding first super-resolution scales have been input, the sequential input operation may be continued, so that the training image groups in the training data are input into the model cyclically.
It should be noted that, in the training process, the first images may or may not be input in the order of their image numbers, and the same first image may or may not be reused for training the preset network model; this embodiment does not limit the specific implementation of "continuing to execute the step of inputting the first image in the training image set into the preset network model".
Optionally, in step S2, adjusting a model parameter of the image hyper-resolution reconstruction model according to the second image and the generated image, and the method may further include:
s21b, calculating a first loss value from the second image and the generated image.
In the embodiment of the present invention, step S21b is the same as step S21a: the pixel values of the pixel points of the generated image are compared with those of the second image to obtain the first loss value Loss1.
S22b, inputting the generated image and the second image into a discriminator network to obtain the probability of the generated image, and calculating a second loss value according to the probability.
In the embodiment of the present invention, optionally, the image super-resolution reconstruction model is trained in a generative-adversarial manner: the model serves as the generator to obtain the generated image, the generated image and the second image are input into a discriminator network, the discriminator network distinguishes which of the generated image and the second image is the true image of the first image at the first super-resolution scale, and the second loss value is calculated from the probability, output by the discriminator, that the generated image is the true image.
S23b, adjusting parameters of the image hyper-resolution reconstruction model according to the first loss value and the second loss value.
In the embodiment of the invention, when the model is trained in the generative-adversarial manner, the first loss value corresponds to the loss of the generator and the second loss value to the loss of the discriminator, and the parameters of the image super-resolution reconstruction model are adjusted according to both. After many iterations of training, the generated image output by the model becomes so close to the second image that the discriminator cannot tell true from false; the probability output by the discriminator then approaches 0.5, meaning the discriminator can no longer distinguish true from false images and can only guess randomly.
In the embodiment of the invention, after the parameters are modified, the step of inputting the first image and the first hyper-resolution scale in the training data into the image hyper-resolution reconstruction model is continued until a preset training condition is met, where the preset training condition includes the first loss value meeting a preset requirement or the number of training iterations reaching a preset number. The preset requirement may be set according to the accuracy expected of the image hyper-resolution reconstruction model and is not detailed here; the preset number may be the maximum number of training iterations of the model, for example 50000. The preset training condition may also be that the probability output by the discriminator is close to 0.5, which indicates that the generated image obtained by the image hyper-resolution reconstruction model is very close to the second image.
An embodiment of the present invention further provides an image hyper-resolution reconstruction method. Referring to fig. 5, the image hyper-resolution reconstruction method may include the following steps:
K1, acquiring the image to be processed and the second super-resolution scale of the image to be processed.
In this embodiment of the present invention, the image to be processed may be an RGB image, or an original image (RAW image) acquired by an image sensor; the value of the second super-resolution scale may be any value, for example a non-integer multiple such as 1.5, 2.5 or 3.3, or an integer multiple such as 2, 4 or 6.
If the image to be processed is a RAW image, it contains all of the captured information, including all of the noise of the image; it therefore needs to be preprocessed so that the resulting image contains less noise. The preprocessing comprises dimension transformation and normalization of the image.
Specifically, when the image to be processed is a RAW image, after step K1, the method includes: and preprocessing the image to be processed to obtain a preprocessed image, and taking the preprocessed image as the image to be processed.
Specifically, the preprocessing the image to be processed to obtain a preprocessed image, and taking the preprocessed image as the image to be processed includes:
K11, converting the single-dimensional data of the image to be processed into four-dimensional data;
K12, carrying out normalization processing on the four-dimensional data to obtain a preprocessed image, and taking the preprocessed image as the image to be processed.
In the embodiment of the present invention, the image acquired by the image sensor is in RAW format; the image sensor may be a CCD sensor or a CMOS sensor, and the data of the color image captured by the image sensor are arranged in a Bayer array.
For example, after a single image X in RAW format is acquired, the original image X is preprocessed as follows. First, the dimension of the RAW image data is changed from H × W × 1 to (H/2) × (W/2) × 4; in fig. 3, for instance, H is 8 and W is 8, so the 8 × 8 × 1 RAW image data become, after the dimension change, 4 × 4 × 4 data. Normalization is then carried out: the (H/2) × (W/2) × 4 data are mapped into the [0,1] interval.
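A minimal sketch of this preprocessing (the Bayer channel order and the 255 white level are assumptions made here; real sensors often use 10- to 14-bit value ranges):

    import numpy as np

    def preprocess_raw(raw):
        # raw: H x W x 1 Bayer data; regroup each 2 x 2 Bayer cell into 4
        # channels, turning H x W x 1 into (H/2) x (W/2) x 4, e.g. the
        # 8 x 8 x 1 data of fig. 3 into 4 x 4 x 4 data.
        four_channel = np.stack([raw[0::2, 0::2, 0],
                                 raw[0::2, 1::2, 0],
                                 raw[1::2, 0::2, 0],
                                 raw[1::2, 1::2, 0]], axis=-1)
        # Normalization: map the values into the [0, 1] interval.
        return four_channel.astype(np.float32) / 255.0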
K2, inputting the image to be processed and the second super-resolution scale into a trained image hyper-resolution reconstruction model to obtain a hyper-resolution reconstructed image corresponding to the image to be processed, where the trained image hyper-resolution reconstruction model is an image hyper-resolution reconstruction model obtained by the training method of the image hyper-resolution reconstruction model described above.
In the embodiment of the invention, the hyper-resolution reconstructed image of the image to be processed at the second super-resolution scale is obtained through the trained image hyper-resolution reconstruction model. The image hyper-resolution reconstruction model comprises a feature extraction module and an up-sampling module, where the feature extraction module is used for extracting the features of the image to be processed, and the up-sampling module is used for outputting the hyper-resolution reconstructed image according to the output result of the feature extraction module.
Specifically, step K2 includes:
K21, inputting the image to be processed into the feature extraction module to obtain a feature image to be processed.
In the embodiment of the present invention, the feature extraction module may be any existing network module, such as a residual module of the EDSR super-resolution network, the RDB module of the RDN super-resolution network, a dense connection module of SRDenseNet, the SE module of SENet, the Inception module of an Inception network, or ConvLSTM, and the like.
For example, a residual module of the EDSR super-resolution network may be selected as the feature extraction module of the hyper-resolution reconstruction model in the embodiment of the present invention; the image to be processed is input into this residual module, obtaining the feature image to be processed corresponding to the image to be processed.
K22, inputting the feature image to be processed and the second super-resolution scale into the up-sampling module to obtain the hyper-resolution reconstructed image.
In the embodiment of the invention, the up-sampling module magnifies the image mainly by an up-sampling method; the common up-sampling methods include interpolation and deconvolution, among others, and the embodiment of the invention adopts deconvolution. The greatest characteristic of deconvolution is that the parameters of the up-sampling module need to be learned through training; after training is finished, the up-sampling module can obtain a generated image closer to the second image.
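For a concrete learned-upsampling layer of the kind referred to here, a generic sketch follows; this is not the bilateral up-sampling module of the embodiment, and the channel count 64 is an assumption:

    import torch.nn as nn

    # A transposed convolution (deconvolution) that doubles the spatial
    # resolution; its weights are learned during training rather than
    # fixed as in interpolation. Output size: (in - 1)*2 - 2*1 + 4 = 2*in.
    deconv_x2 = nn.ConvTranspose2d(64, 64, kernel_size=4, stride=2, padding=1)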
Specifically, the upsampling module includes a first upsampling layer, a second upsampling layer, a third upsampling layer, and an output layer, and step K22 includes:
K221, inputting the feature image to be processed and the second super-resolution scale into the first up-sampling layer to obtain a second position offset and a second local block corresponding to each third pixel point, where the third pixel points are pixel points in the feature image to be processed.
In the embodiment of the invention, the size of the hyper-resolution reconstructed image is obtained through the first up-sampling layer from the feature image to be processed and the second super-resolution scale. Since the hyper-resolution reconstructed image has not actually been produced at this step, it may be called an assumed hyper-resolution reconstructed image; its size gives the coordinates of each of its pixel points, and from those coordinates each pixel point in the feature image to be processed, i.e. each third pixel point, can be deduced. Through the first up-sampling layer, the second position offset and the second local block corresponding to each third pixel point are obtained. The second position offset is the positional offset produced when each third pixel point of the feature image to be processed is deduced from the coordinates of a pixel point of the assumed hyper-resolution reconstructed image. After each third pixel point is obtained, the second local block is formed by taking that third pixel point as the center and selecting part of the pixel points of the feature image to be processed, namely the neighborhood pixel points centered on the third pixel point.
Specifically, step K221 includes:
K221a, inputting the feature image to be processed and the second super-resolution scale into the first up-sampling layer to determine the second assumed pixel points of the super-resolution reconstructed image.
In the embodiment of the invention, the first up-sampling layer obtains the resolution of the super-resolution reconstructed image from the resolution of the feature image to be processed and the value of the second super-resolution scale; the resolution of the super-resolution reconstructed image in turn gives the coordinates of each of its pixel points. Because the super-resolution reconstructed image has not yet been generated, the pixel points whose coordinates are obtained by calculating this resolution are called second assumed pixel points.
For example, assuming that the resolution of the input feature image a is 500 × 500 and the second super-resolution r2 is 2.2, the resolution of the super-resolution reconstructed image B may be 1100 × 1100, a pixel point of the super-resolution reconstructed image B may be assumed, that is, a second assumed pixel point, an abscissa of the second assumed pixel point may be 1 to 1100, and an ordinate of the second assumed pixel point may be 1 to 1100.
K221b, calculating the second calculated value corresponding to each second assumed pixel point according to the second assumed pixel point and the second super-resolution scale.
In the embodiment of the invention, the super-resolution reconstructed image is the image obtained from the feature image to be processed at the second super-resolution scale, so the ratio of the coordinates of each of its pixel points to the coordinates of the corresponding pixel point of the feature image to be processed is theoretically close to the value of the second super-resolution scale. The coordinates of each second assumed pixel point are therefore divided by the value of the second super-resolution scale to obtain the second calculated value corresponding to that point; the second calculated value represents the theoretical real coordinates of the corresponding third pixel point of the feature image to be processed, derived from the coordinates of the pixel points of the super-resolution reconstructed image and the second super-resolution scale.
For example, for the second assumed pixel point a1 = (i1, j1) = (700, 200) of the super-resolution reconstructed image B and the second super-resolution scale r2 = 2.2, dividing the abscissa and ordinate of a1 by r2 gives the second calculated value b1 corresponding to a1: b1 = (700/2.2, 200/2.2) = (318.18, 90.91).
K221c, performing rounding calculation on the second calculated value to obtain the third pixel point corresponding to the second assumed pixel point.
In the embodiment of the present invention, since the value of the second super-resolution scale r2 may be any value, dividing each second assumed pixel point by r2 is not guaranteed to yield integers, so the second calculated value needs to be rounded. Optionally, a rounding-down (floor) calculation is performed on the second calculated value, and the rounded value is used as the third pixel point corresponding to each second assumed pixel point.
For example, for the second assumed pixel point a1 = (i1, j1) = (700, 200) of the super-resolution reconstructed image B, the corresponding second calculated value is b1 = (318.18, 90.91); rounding b1 down gives the third pixel point corresponding to a1: c1 = (⌊318.18⌋, ⌊90.91⌋) = (318, 90).
K221d, calculating the difference between the third pixel point and the second calculated value to obtain the second position offset corresponding to each third pixel point.
In the embodiment of the present invention, because the second calculated value is rounded in step K221c, a positional offset arises between the second calculated value obtained directly from the second assumed pixel point through the second super-resolution scale and the third pixel point of the feature image obtained by rounding that value; this offset is referred to as the second position offset.
For example, for the third pixel point c1 = (318, 90) of the feature image to be processed, the corresponding second calculated value is b1 = (318.18, 90.91); from b1 and c1, the second position offset of c1 is obtained as V2 = b1 - c1 = (0.18, 0.91).
Alternatively, considering the effect of the second super-resolution scale, the step may be: for each third pixel point, obtaining the second position offset corresponding to that third pixel point according to the third pixel point, the second calculated value corresponding to it, and the second super-resolution scale.
In the embodiment of the present invention, the second position offset may carry the second super-resolution scale in addition to the difference between the second calculated value and the third pixel point. In this way, when several different second super-resolution scales are handled at the same time, the second position offsets of the different scales can be distinguished, and with them the second weights at the different scales and the pixel values of the hyper-resolution reconstructed images obtained at the different scales.
For example, when a1 = (i1, j1) = (700, 200) and the second super-resolution scale r2 = 2.2, then c1 = (318, 90) and the corresponding second calculated value is b1 = (318.18, 90.91); from b1, c1 and r2, the second position offset of c1 is obtained as V2 = (0.18, 0.91, 2.2).
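Steps K221a to K221d can be sketched as follows; coordinates are 0-indexed here for convenience, and the optional third component carrying the scale corresponds to the variant just described:

    import numpy as np

    def second_offsets(out_h, out_w, r, append_scale=False):
        # Second assumed pixel points: every coordinate of the assumed
        # hyper-resolution reconstructed image of size out_h x out_w.
        ii, jj = np.meshgrid(np.arange(out_h), np.arange(out_w), indexing="ij")
        calc_i, calc_j = ii / r, jj / r                    # second calculated values
        src_i, src_j = np.floor(calc_i), np.floor(calc_j)  # third pixel points (rounded down)
        parts = [calc_i - src_i, calc_j - src_j]           # second position offsets
        if append_scale:
            parts.append(np.full_like(calc_i, r))          # optionally carry the scale r
        return src_i.astype(int), src_j.astype(int), np.stack(parts, axis=-1)

    # e.g. r2 = 2.2: assumed pixel (700, 200) -> calculated (318.18, 90.91),
    # third pixel (318, 90), offset (0.18, 0.91), plus 2.2 if appended.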
K221e, extracting the second local block corresponding to each third pixel point from the feature image to be processed according to the third pixel points and the size of the convolution kernel in the first up-sampling layer.
In the embodiment of the present invention, a k × k window is determined by the convolution kernel in the first up-sampling layer, where k is the size of that convolution kernel. Each third pixel point is taken in turn as the center of the k × k window, and the region covered by the window centered on a third pixel point serves as the second local block corresponding to that point; for example, the region of the k × k window centered on the third pixel point c2 is the second local block corresponding to c2, and the region centered on the third pixel point c3 is the second local block corresponding to c3. The second local block thus contains the third pixel point and its neighborhood pixel points, and its size is k × k.
For example, for the third pixel point c1 = (318, 90) of the feature image A to be processed, a second local block h″1 of size k × k is taken in the feature image A centered on c1, where k is the size of the convolution kernel of the first up-sampling layer; assuming k = 3, the second local block h″1 consists of 9 pixel points of the feature image A to be processed.
In the embodiment of the present invention, when the third pixel point is an edge pixel point of the feature image to be processed, a complete k × k region centered on it cannot be taken from the feature image, and a zero-padding operation is needed: the positions of the missing neighborhood pixel points in the second local block are filled with the pixel value 0. After zero padding, part of the pixel points in the second local block corresponding to an edge third pixel point come from the feature image to be processed, and the remaining part are filled pixel points with value 0.
For example, if the third pixel point c4 = (1,1) of the feature image A to be processed is taken and k = 3, a second local block h″2 of size 3 × 3 is taken in the feature image A centered on c4 = (1,1); h″2 contains the third pixel points c4 = (1,1), c5 = (1,2), c6 = (2,1) and c7 = (2,2), and the remaining five positions are filled pixel points with value 0.
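A sketch of step K221e, including the zero-padding behaviour at the edges (the feature image is assumed single-channel for brevity):

    import numpy as np

    def second_local_block(feat, ci, cj, k=3):
        # k x k region of the feature image centered on third pixel (ci, cj);
        # positions falling outside the image are filled with pixel value 0.
        h, w = feat.shape
        half = k // 2
        block = np.zeros((k, k), dtype=feat.dtype)
        for di in range(-half, half + 1):
            for dj in range(-half, half + 1):
                i, j = ci + di, cj + dj
                if 0 <= i < h and 0 <= j < w:
                    block[di + half, dj + half] = feat[i, j]
        return block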
K222, inputting the second position offset and the second local block corresponding to each third pixel point into the second up-sampling layer to obtain the second weight corresponding to each third pixel point.
In the embodiment of the present invention, the second weight corresponding to each third pixel point is obtained from both the second position offset and the second local block of that point, so the second weight is related not only to the second position offset but also to the second local block; the process of obtaining the second weight through the second up-sampling layer may therefore also be referred to as bilateral up-sampling.
In the embodiment of the present invention, the second position offset and the second local block of each third pixel point, which are the output results of the first up-sampling layer, are input into the second up-sampling layer in linear combination and transformed by a nonlinear activation function to obtain the second weight. The second weight represents the connection strength between the neurons of the second up-sampling layer and those of the first up-sampling layer: the larger the second weight, the stronger the connection between the neurons; the smaller the second weight, the weaker the connection.
Specifically, the second upsampling layer includes a first convolutional layer and a second convolutional layer, and step K222 includes:
K222a, inputting the second position offset into the first convolution layer to obtain the second position components respectively corresponding to the third pixel points.
In the embodiment of the present invention, the second position offset is input into the first convolution layer, and the second position offset is subjected to nonlinear transformation to obtain the second position component, where the second position component represents the size of the second position offset of the third pixel, and a larger second position component indicates a larger second position offset, whereas a smaller second position component indicates a smaller second position offset.
For example, for one pixel point c1 of the third pixel points, the second position offset V2 is input into the first convolution layer to obtain the second position component point″1 corresponding to c1; similarly, for the third pixel points c1, ..., cn, the second position components point″1, ..., point″n corresponding to each third pixel point can be obtained, where n is the number of pixel points of the hyper-resolution reconstructed image.
K222b, inputting the second local block into the second convolution layer to obtain the second data components respectively corresponding to the third pixel points.
In the embodiment of the present invention, the second local block is input into the second convolution layer and subjected to a nonlinear transformation to obtain the second data component. The second data component represents the feature information of the second local block to which each pixel point of the hyper-resolution reconstructed image is mapped in the feature image to be processed; this feature information includes the color features, texture features, shape features and spatial relationship features of the pixel points in the second local block.
For example, for one pixel point c1 of the third pixel points, the second local block h″1 is input into the second convolution layer to obtain the second data component data″1 corresponding to c1; similarly, for the third pixel points c1, ..., cn, the second data components data″1, ..., data″n corresponding to each third pixel point can be obtained.
K222c, obtaining the second weights respectively corresponding to the third pixel points by multiplying the second position component and the second data component.
In the embodiment of the present invention, given the second position component output by the first convolution layer and the second data component output by the second convolution layer, the second weight corresponding to each third pixel point can be obtained by multiplying the two; alternatively, a weighted summation of the second position component and the second data component can be performed to obtain the second weights respectively corresponding to the third pixel points.
For example, for one pixel point c1 of the third pixel points, whose second position component is point″1 and second data component is data″1, multiplying point″1 by data″1 gives the second weight W″1 of c1; similarly, for the third pixel points c1, ..., cn, the second weights W″1, ..., W″n corresponding to each third pixel point can be obtained. Alternatively, a weighted summation of the position component and the data component can be performed to obtain the second weight W″1, written as: W″1 = α′·point″1 + β′·data″1, where α′ is a parameter of the first convolution layer of the trained image hyper-resolution reconstruction model, β′ is a parameter of the second convolution layer of the trained model, point″1 is the second position component of c1, and data″1 is the second data component of c1.
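A sketch of this bilateral weighting follows; the channel width 64, the 1 × 1 convolutions and the ReLU activation are assumptions, since the embodiment only fixes that one convolution consumes the offsets, the other the local blocks, and that the components are multiplied (or weighted-summed):

    import torch
    import torch.nn as nn

    class SecondUpsamplingLayer(nn.Module):
        def __init__(self, k=3, off_dim=2, c=64):
            super().__init__()
            self.first_conv = nn.Conv2d(off_dim, c, 1)  # offsets -> position components
            self.second_conv = nn.Conv2d(k * k, c, 1)   # local blocks -> data components
            self.act = nn.ReLU()

        def forward(self, offsets, blocks):
            # offsets: (N, off_dim, H', W'); blocks: (N, k*k, H', W')
            point = self.act(self.first_conv(offsets))  # point''1 ... point''n
            data = self.act(self.second_conv(blocks))   # data''1 ... data''n
            return point * data                         # second weights W''1 ... W''n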
K223, inputting the second local block and the second weight value into the third upsampling layer to obtain a pixel value corresponding to each fourth pixel point, wherein the fourth pixel points are pixel points in the hyper-resolution reconstructed image;
In the embodiment of the present invention, for a third pixel point c1 = (i1′, j1′) of the feature image A to be processed and the corresponding fourth pixel point d1 = (i1, j1) of the generated hyper-resolution reconstructed image B, the pixel value of each fourth pixel point in the hyper-resolution reconstructed image B may be obtained, as in formula (2), through a mapping function from the second local block h″1 of the feature image A to be processed to the pixel values I of the hyper-resolution reconstructed image B:
I_B(i, j) = Φ(S_A(h″), W″(i′, j′))    (2)
where Φ represents the mapping from the feature value S of the second local block h″ to the pixel value of each fourth pixel point of the hyper-resolution reconstructed image, S_A(h″) represents the feature value of the second local block h″ of the pixel point (i′, j′) in the feature image A to be processed, W″(i′, j′) is the second weight of the pixel point (i′, j′) in the feature image A to be processed, and I_B(i, j) is the pixel value of the fourth pixel point with coordinates (i, j) in the hyper-resolution reconstructed image B.
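One simple assumed instance of the mapping Φ in formula (2) is a weighted sum of the second local block; a learned Φ need not take exactly this form:

    import numpy as np

    def fourth_pixel_value(local_block, second_weight):
        # I_B(i, j) = Phi(S_A(h''), W''(i', j')): here Phi is realized as
        # the sum of the k x k local block weighted element-wise by W''.
        return float(np.sum(local_block * second_weight))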
K224, inputting the pixel values corresponding to the fourth pixel points into the output layer to obtain the hyper-resolution reconstructed image.
In the embodiment of the present invention, the output layer converts the output result of the third up-sampling layer, i.e. the pixel value corresponding to each fourth pixel point, into common RGB-format data. Specifically, scale scaling is performed first: the pixel value of each fourth pixel point is multiplied by 255, so that the final output has the same range as RGB data. Since values outside the range [0, 255] may exist after scaling, the scaled result must then be truncated: values smaller than 0 are set to 0 and values larger than 255 are set to 255, yielding the hyper-resolution reconstructed image.
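A sketch of this conversion (the output of the preceding layers is assumed to lie roughly in [0, 1]):

    import numpy as np

    def to_rgb(fourth_pixel_values):
        scaled = fourth_pixel_values * 255.0   # scale scaling to the RGB range
        clipped = np.clip(scaled, 0, 255)      # truncate: below 0 -> 0, above 255 -> 255
        return clipped.astype(np.uint8)        # hyper-resolution reconstructed image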
Existing hyper-resolution methods need different models, recomputed for each scale, whereas the image hyper-resolution reconstruction model can realize hyper-resolution at a variety of arbitrary scales. Referring to fig. 6, which shows the hyper-resolution effect at different scales (a: 1.5 times, b: 2 times, c: 2.5 times, d: 3 times, e: 3.5 times, f: 4 times), when the same image is magnified and hyper-resolved at multiple scales, only the hyper-resolution scale input to the up-sampling module of the hyper-resolution reconstruction model needs to be changed; one set of models thus handles multi-scale image hyper-resolution conveniently and quickly, which undoubtedly improves the user experience.
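A usage sketch (trained_model and image_to_process are placeholder names): the same trained model serves every magnification, and only the scale value passed in changes.

    # Hyper-resolve one image at the six scales of fig. 6 with a single model.
    for r in (1.5, 2.0, 2.5, 3.0, 3.5, 4.0):
        sr_image = trained_model(image_to_process, r)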
In one embodiment, the present invention provides a computer device, which may be a terminal, having an internal structure as shown in fig. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a training method for an image hyper-resolution reconstruction model. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the illustration in fig. 7 is merely a block diagram of a portion of the structure associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, there is provided a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
inputting a first image and a first super-resolution scale in training data into an image super-resolution reconstruction model, and generating a generated image corresponding to the first image under the first super-resolution scale through the image super-resolution reconstruction model, wherein the training data comprises a plurality of groups of training image groups, each group of training image group comprises a first image, a second image and a first super-resolution scale, and the second image is an image corresponding to the first image under the first super-resolution scale;
and adjusting model parameters of the image hyper-resolution reconstruction model according to a second image corresponding to the first image and a generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first hyper-resolution scale in the training data into the image hyper-resolution reconstruction model until a preset training condition is met, so as to obtain a trained image hyper-resolution reconstruction model.
In one embodiment, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of:
inputting a first image and a first super-resolution scale in training data into an image super-resolution reconstruction model, and generating a generated image corresponding to the first image under the first super-resolution scale through the image super-resolution reconstruction model, wherein the training data comprises a plurality of groups of training image groups, each group of training image group comprises a first image, a second image and a first super-resolution scale, and the second image is an image corresponding to the first image under the first super-resolution scale;
and adjusting model parameters of the image hyper-resolution reconstruction model according to a second image corresponding to the first image and a generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first hyper-resolution scale in the training data into the image hyper-resolution reconstruction model until a preset training condition is met, so as to obtain a trained image hyper-resolution reconstruction model.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The training method and the computer device for the image hyper-resolution reconstruction model comprise the following steps: inputting a first image and a first super-resolution scale in training data into an image super-resolution reconstruction model, and generating a generated image corresponding to the first image under the first super-resolution scale through the image super-resolution reconstruction model, wherein the training data comprises a plurality of groups of training image groups, each group of training image group comprises a first image, a second image and a first super-resolution scale, and the second image is an image corresponding to the first image under the first super-resolution scale; and adjusting model parameters of the image hyper-resolution reconstruction model according to a second image corresponding to the first image and a generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first hyper-resolution scale in the training data into the image hyper-resolution reconstruction model until a preset training condition is met, so as to obtain a trained image hyper-resolution reconstruction model. In the method, during training, the coordinates of the first pixel points are calculated through rounding, so that the coordinate mapping between the first image and the generated image can be found under the super-resolution scale of any numerical value, and the trained image super-resolution reconstruction model obtained through training does not limit the numerical value of the super-resolution scale; the method generates the first weight under the first hyper-resolution scale during training, and can dynamically generate the weight corresponding to each hyper-resolution scale for different first hyper-resolution scales.

Claims (20)

1. A training method of an image hyper-resolution reconstruction model is characterized by comprising the following steps:
inputting a first image and a first super-resolution scale in training data into an image super-resolution reconstruction model, and generating a generated image corresponding to the first image under the first super-resolution scale through the image super-resolution reconstruction model, wherein the training data comprises a plurality of groups of training image groups, each group of training image group comprises a first image, a second image and a first super-resolution scale, and the second image is an image corresponding to the first image under the first super-resolution scale;
and adjusting model parameters of the image hyper-resolution reconstruction model according to a second image corresponding to the first image and a generated image corresponding to the first image, and continuing to execute the step of inputting the first image and the first hyper-resolution scale in the training data into the image hyper-resolution reconstruction model until a preset training condition is met, so as to obtain a trained image hyper-resolution reconstruction model.
2. The method of claim 1, wherein the image hyper-resolution reconstruction model comprises: a feature extraction module and an up-sampling module;
the method for inputting a first image and a first super-resolution scale in training data into an image super-resolution reconstruction model, and generating a generated image corresponding to the first image under the first super-resolution scale through the image super-resolution reconstruction model comprises the following steps:
inputting the first image into the feature extraction module to obtain a feature image corresponding to the first image;
and inputting the characteristic image and the first super-scale into the up-sampling module to obtain a generated image corresponding to the characteristic image.
3. The method of claim 2, wherein the upsampling module comprises a first upsampling layer, a second upsampling layer, a third upsampling layer, and an output layer;
the inputting the feature image and the first super-scale into the up-sampling module to obtain a generated image corresponding to the feature image includes:
inputting the feature image and the first super-scale into the first up-sampling layer to obtain a first position offset and a first local block corresponding to each first pixel point, wherein the first pixel points are pixel points in the feature image;
inputting the first position offset and the first local block corresponding to each first pixel point to the second upsampling layer to obtain a first weight corresponding to each first pixel point;
inputting the first local block and the first weight to the third upsampling layer to obtain pixel values corresponding to second pixels respectively, wherein the second pixels are pixels in the generated image;
and inputting the pixel values corresponding to the second pixel points into the output layer to obtain the generated image.
4. The method of claim 3, wherein the inputting the feature image and the first super-scale into the first upsampling layer to obtain a first position offset and a first local block corresponding to each first pixel point comprises:
inputting the feature image and the first superscale into the first upsampling layer to determine first hypothetical pixel points of the generated image;
calculating a first calculation value respectively corresponding to each first assumed pixel point according to each first assumed pixel point and the first super-resolution scale;
rounding the first calculation values respectively corresponding to the first assumed pixel points to obtain first pixel points respectively corresponding to the first assumed pixel points;
calculating a difference value between the first pixel point and a first calculated value corresponding to the first pixel point aiming at each first pixel point to obtain a first position offset corresponding to each first pixel point;
and extracting first local blocks corresponding to the first pixel points in the characteristic image according to the sizes of the first pixel points and the convolution kernels of the first upper sampling layer.
5. The method of claim 3, wherein the second upsampling layer comprises a first convolutional layer and a second convolutional layer;
the inputting the first position offset and the first local block corresponding to each first pixel point to the second upsampling layer to obtain the first weight corresponding to each first pixel point includes:
inputting the first position offset amount into the first convolution layer to obtain first position components corresponding to the first pixel points respectively;
inputting the first local block into the second convolution layer to obtain first data components corresponding to the first pixel points respectively;
and obtaining a first weight value corresponding to each first pixel point according to the first position component and the first data component.
6. The method according to claim 3, wherein the inputting the first position offset and the first local block corresponding to each of the first pixel points to the second upsampling layer to obtain the first weight corresponding to each of the first pixel points comprises:
and splicing and inputting the first position offset and the first local block which respectively correspond to each first pixel point to the second upper sampling layer to obtain a first weight which respectively corresponds to each first pixel point.
7. The method of claim 1, wherein the adjusting model parameters of the image hyper-resolution reconstruction model from the second image and the generated image comprises:
calculating a first loss value from the second image and the generated image;
and adjusting parameters of the image hyper-resolution reconstruction model according to the first loss value.
8. The method according to claim 1, wherein the adjusting the model parameters of the image hyper-resolution reconstruction model according to the second image corresponding to the first image and the generated image corresponding to the first image comprises:
calculating a first loss value according to a second image corresponding to the first image and a generated image corresponding to the first image;
inputting the generated image and the second image into a discriminator network to obtain the probability of the generated image, and calculating a second loss value according to the probability;
and adjusting parameters of the image hyper-resolution reconstruction model according to the first loss value and the second loss value.
9. The method of claim 1, wherein when the first image is a raw image acquired by an image sensor, before the inputting of the first image and the first super-resolution scale into the image hyper-resolution reconstruction model, the method further comprises:
and preprocessing the first image to obtain a preprocessed image, and taking the preprocessed image as the first image.
10. The method according to claim 9, wherein the preprocessing the first image to obtain a preprocessed image, and using the preprocessed image as the first image comprises:
converting single-dimensional data of the first image into four-dimensional data;
and carrying out normalization processing on the four-dimensional data to obtain a preprocessed image, and taking the preprocessed image as a first image.
11. A method of image hyper-resolution reconstruction, the method comprising:
acquiring an image to be processed and a second super-scale of the image to be processed;
inputting the image to be processed and the second hyper-scale into a trained image hyper-resolution reconstruction model to obtain a hyper-resolution reconstruction image corresponding to the image to be processed, wherein the trained image hyper-resolution reconstruction model is an image hyper-resolution reconstruction model obtained by the training method according to any one of claims 1 to 10.
12. The method of claim 11, wherein the image hyper-resolution reconstruction model comprises: a feature extraction module and an up-sampling module;
the inputting the image to be processed and the second super-resolution scale into a trained image super-resolution reconstruction model to obtain a super-resolution reconstruction image corresponding to the image to be processed includes:
inputting the image to be processed into a feature extraction module to obtain a feature image to be processed;
and inputting the characteristic image to be processed and the second super-resolution scale into an up-sampling module to obtain a super-resolution reconstructed image.
13. The method of claim 12, wherein the upsampling module comprises a first upsampling layer, a second upsampling layer, a third upsampling layer, and an output layer;
inputting the feature image to be processed and the second super-resolution scale into an up-sampling module to obtain a super-resolution reconstructed image, wherein the method comprises the following steps:
inputting the feature image to be processed and the second super-resolution scale into the first upper sampling layer to obtain a second position offset and a second local block corresponding to each third pixel point, wherein the third pixel points are pixel points in the feature image to be processed;
inputting the second position offset and the second local block corresponding to each third pixel point to the second upsampling layer to obtain a second weight corresponding to each third pixel point;
inputting the second local block and the second weight value into the third upsampling layer to obtain pixel values corresponding to fourth pixel points respectively, wherein the fourth pixel points are pixel points in the hyper-resolution reconstructed image;
and inputting the pixel values corresponding to the fourth pixel points into the output layer to obtain the hyper-resolution reconstructed image.
14. The method according to claim 13, wherein the inputting the feature image to be processed and the second super-scale into the first upsampling layer to obtain a second position offset and a second local block corresponding to each third pixel point respectively comprises:
inputting the feature image to be processed and the second super-resolution scale into the first upsampling layer to determine a second assumed pixel point of the super-resolution reconstructed image;
calculating a second calculation value corresponding to the second assumed pixel point according to the second assumed pixel point and the second super-resolution scale;
performing rounding calculation on the second calculated value to obtain a third pixel point corresponding to the second assumed pixel point;
calculating the difference value between the third pixel point and the second calculated value to obtain a second position offset corresponding to each third pixel point;
and extracting a second local block corresponding to each third pixel point in the feature image to be processed according to the third pixel points and the size of the convolution kernel in the first up-sampling layer.
15. The method of claim 13, wherein the second upsampling layer comprises a first convolutional layer and a second convolutional layer;
the inputting the second position offset and the second local block corresponding to each third pixel point to the second upsampling layer to obtain a second weight corresponding to each third pixel point includes:
inputting the second position offset amount to the first convolution layer to obtain second position components corresponding to the third pixel points respectively;
inputting the second local block into the second convolution layer to obtain second data components corresponding to the third pixel points respectively;
and obtaining second weights respectively corresponding to the third pixel points according to the second position component and the second data component.
16. The method according to claim 13, wherein the inputting the second position offset and the second local block respectively corresponding to the third pixels into the second upsampling layer to obtain second weights respectively corresponding to the third pixels comprises:
and splicing and inputting the second position offset and the second local block to the second upsampling layer to obtain second weights respectively corresponding to the third pixel points.
17. The method of claim 11, wherein when the image to be processed is a raw image obtained by an image sensor, before the inputting of the image to be processed and the second hyper-scale into a trained image hyper-resolution reconstruction model, the method further comprises:
and preprocessing the image to be processed to obtain a preprocessed image, and taking the preprocessed image as the image to be processed.
18. The method according to claim 17, wherein the preprocessing the image to be processed to obtain a preprocessed image, and taking the preprocessed image as the image to be processed comprises:
converting the single-dimensional data of the image to be processed into four-dimensional data;
and carrying out normalization processing on the four-dimensional data to obtain a preprocessed image, and taking the preprocessed image as an image to be processed.
19. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 10 when executing the computer program.
20. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 10.
CN201910866170.0A 2019-09-12 2019-09-12 Training method and computer equipment for image super-resolution reconstruction model Active CN112488916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910866170.0A CN112488916B (en) 2019-09-12 2019-09-12 Training method and computer equipment for image super-resolution reconstruction model

Publications (2)

Publication Number Publication Date
CN112488916A (en) 2021-03-12
CN112488916B CN112488916B (en) 2023-12-26

Family

ID=74920718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910866170.0A Active CN112488916B (en) 2019-09-12 2019-09-12 Training method and computer equipment for image super-resolution reconstruction model

Country Status (1)

Country Link
CN (1) CN112488916B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120201436A1 (en) * 2011-02-03 2012-08-09 Jonathan Oakley Method and system for image analysis and interpretation
US20160239945A1 (en) * 2015-02-17 2016-08-18 Adobe Systems Incorporated Image haze removal using fast constrained transmission estimation
US20160343109A1 (en) * 2015-05-22 2016-11-24 Samsung Electronics Co., Ltd. System and method for content-adaptive super-resolution via cross-scale self-learning
CN106934765A (en) * 2017-03-14 2017-07-07 长沙全度影像科技有限公司 Panoramic picture fusion method based on depth convolutional neural networks Yu depth information
KR20180045645A (en) * 2016-10-26 2018-05-04 삼성전자주식회사 Image processing apparatus, method for processing image and computer-readable recording medium
CN108288251A (en) * 2018-02-11 2018-07-17 深圳创维-Rgb电子有限公司 Image super-resolution method, device and computer readable storage medium
CN109389556A (en) * 2018-09-21 2019-02-26 五邑大学 The multiple dimensioned empty convolutional neural networks ultra-resolution ratio reconstructing method of one kind and device
CN110211057A (en) * 2019-05-15 2019-09-06 武汉Tcl集团工业研究院有限公司 A kind of image processing method based on full convolutional network, device and computer equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG Fei; WANG Wei; QIU Zhiliang: "A single-frame super-resolution reconstruction algorithm with a deep cascaded network structure", Opto-Electronic Engineering *
CHEN Shuzhen; XIE Xiaohui; YANG Yuchi; LIAN Qiusheng: "Image super-resolution algorithm using multi-scale convolutional neural networks", Journal of Signal Processing *

Also Published As

Publication number Publication date
CN112488916B (en) 2023-12-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant