CN116580269B - Method for training model, method for processing image, electronic device and storage medium


Info

Publication number
CN116580269B
Authority
CN
China
Prior art keywords
image
module
model
output
result
Prior art date
Legal status
Active
Application number
CN202310854693.XA
Other languages
Chinese (zh)
Other versions
CN116580269A (en)
Inventor
毕涵
武臻尧
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd
Priority to CN202310854693.XA
Publication of CN116580269A
Application granted
Publication of CN116580269B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a method for training a model, a method for processing an image, an electronic device and a storage medium. The method includes: obtaining a feature map of a first image through a first encoder, and obtaining a first feature map of a second image through a second encoder; fusing the feature map of the first image with a first noise map to obtain a fused first result; fusing the fused first result with the first feature map of the second image to obtain a fused second result; inputting the fused first result and the fused second result into a target module to obtain a second noise map output by the first model to be trained; and determining target parameters of the first model to be trained according to the second noise map and the first noise map. Because the feature map of the first image and the first feature map of the second image correspond to the same feature space, an excessive color change in the image obtained from the trained model can be avoided, while the second image controls the details of the image obtained based on the model.

Description

Method for training model, method for processing image, electronic device and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method for training a model, a method for processing an image, an electronic device, and a storage medium.
Background
With the continuous development of deep learning technology, its range of applications keeps widening; for example, a deep learning model can be trained and then used to improve the resolution of an input image.
In the related art, there are models capable of improving the resolution of an image. However, although the resolution of the output image obtained by such a model is improved relative to the original image, the output image differs greatly from the original image in image details and image colors.
Disclosure of Invention
The application provides a method for training a model, a method for processing an image, electronic equipment and a storage medium.
In a first aspect, a method for training a model is provided, the first model to be trained comprising: a first encoder, a second encoder, and a target module, the method comprising:
Encoding a first image from an image space to a feature space by a first encoder to obtain a feature map of the first image, and encoding a second image from the image space to the feature space by a second encoder to obtain the first feature map of the second image, wherein the difference between the second image and the first image is as follows: the resolution of the second image is lower than that of the first image, and the feature map of the first image and the first feature map of the second image correspond to the same feature space;
fusing the feature map of the first image and the first noise map to obtain a fused first result;
fusing the fused first result and the first feature map of the second image to obtain a fused second result;
inputting the fused first result and the fused second result into a target module to obtain a second noise diagram output by the first model to be trained, wherein the target module is used for predicting noise in the fused first result;
and determining target parameters of the first model to be trained according to the second noise diagram and the first noise diagram.
In this embodiment, in the process of training the first model to be trained, a first result obtained after adding noise to a feature image of the first image, a first feature image of the second image (as a control condition of the first model to be trained), and a second result obtained after fusing the first result are input to the target module to predict noise in the first result.
Therefore, in the application stage of the first target model obtained after the first model to be trained converges, when the low-resolution image input to the first target model is used as its control condition, the first target image obtained through the first target model is consistent with the input low-resolution image in image details. The first target image is obtained by decoding, with a decoder matched with the first encoder, the result of repeatedly and iteratively denoising the noise input to the first target model using the noise output by the first target model. The noise output by the first target model is obtained by the model from the noise input at each iteration and the low-resolution image, and every noise input to the first target model other than the initially input noise is the result of the previous iteration of denoising.
In addition, the feature map of the first image obtained by the first encoder and the first feature map of the second image obtained by the second encoder correspond to the same feature space, so the first result obtained after adding noise to the feature map of the first image and the first feature map of the second image still correspond to the same feature space. In this case, when the first feature map of the second image is fused with the first result, the feature information represented by the plurality of matrices characterizing the first feature map corresponds one-to-one with the feature information represented by the plurality of matrices characterizing the first result. Adding these matrices in one-to-one correspondence therefore guarantees that each pair of added matrices characterizes the same feature information, so that in the application stage of the first target model the resolution of the first target image can be improved relative to the low-resolution image while a large difference in image color between the first target image and the low-resolution image is avoided.
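As an illustration of the training pass described above, the following sketch wires up minimal stand-ins in PyTorch. The convolutional encoders, the single-convolution target module, the image sizes and the MSE loss are assumptions made purely for illustration and are not the structures defined by this application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-ins only: any encoders that map images into the SAME
# feature space, and any noise-prediction module, could play these roles.
first_encoder = nn.Conv2d(3, 4, kernel_size=3, stride=8, padding=1)
second_encoder = nn.Conv2d(3, 4, kernel_size=3, stride=8, padding=1)
target_module = nn.Conv2d(8, 4, kernel_size=3, padding=1)   # predicts the noise in the fused first result

first_image = torch.rand(1, 3, 256, 256)    # high-resolution first image
second_image = torch.rand(1, 3, 256, 256)   # low-resolution counterpart, upsampled to the same size

feat_first = first_encoder(first_image)        # feature map of the first image
feat_second = second_encoder(second_image)     # first feature map of the second image (same feature space)

first_noise_map = torch.randn_like(feat_first)
first_result = feat_first + first_noise_map    # fused first result (noise added in feature space)
second_result = first_result + feat_second     # fused second result (element-wise addition)

# Both fused results are fed to the target module, which predicts the added noise.
second_noise_map = target_module(torch.cat([first_result, second_result], dim=1))

first_loss = F.mse_loss(second_noise_map, first_noise_map)
first_loss.backward()                          # used to determine the target parameters
```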
In one possible implementation manner, the target module includes at least one first downsampling module connected in series, at least one first upsampling module connected in series, and a plurality of second downsampling modules connected in series, wherein the at least one first downsampling module is connected in series with the at least one first upsampling module, the maximum order and the minimum order of output matrixes corresponding to the at least one first downsampling module and the at least one first upsampling module are the same, and the order of the output matrix corresponding to the plurality of second downsampling modules is a subset of the order of the output matrix corresponding to the at least one first upsampling module;
inputting the fused first result and the fused second result into the target module to obtain the second noise diagram output by the first model to be trained includes:
the method comprises the steps that output of a previous stage module is subjected to downsampling in sequence through at least one first downsampling module to obtain a first downsampling result, wherein the first downsampling result is output of a first downsampling module with the smallest order of an output matrix in the at least one first downsampling module, and input of a first downsampling module with the largest order of the output matrix in the at least one first downsampling module is a fused first result;
Sequentially downsampling the output of the upper stage module through a plurality of second downsampling modules to obtain a plurality of second downsampling results, wherein the input of the second downsampling module with the largest order of the output matrix in the second downsampling modules is a fused second result;
sequentially upsampling respective inputs through at least one first upsampling module to obtain at least one first upsampling result, wherein the output of a first upsampling module with the largest order of an output matrix in the at least one first upsampling module is a second noise figure, or the output of a second downsampling module with the same maximum order of the output matrix in the at least one first upsampling module in a plurality of second downsampling modules is added with the output of a first upsampling module with the largest order of the output matrix in the at least one first upsampling module to obtain a second noise figure;
the input of the first up-sampling module with the minimum order of the output matrix in at least one first up-sampling module is a first down-sampling result; the input of the target up-sampling module in the at least one first up-sampling module is the result of adding the output of the upper stage module of the target up-sampling module and the output of the second down-sampling module with the smallest order of the output matrix in the plurality of second down-sampling modules, and the target up-sampling module is the next stage module of the first up-sampling module with the same order of the output matrix in the at least one first up-sampling module and the smallest order of the output matrix in the plurality of second down-sampling modules; the input of the other up-sampling modules in the at least one first up-sampling module is the output of the upper-stage module, or is the result of adding the output of the upper-stage module and the output of the second down-sampling module with the same order as the output matrix of the upper-stage module in the plurality of second down-sampling modules.
In this embodiment, the target module serves as the noise-prediction model. The at least one first downsampling module connected in series in the target module downsamples the first result obtained by fusing the feature map of the first image with the first noise map, that is, it performs feature extraction on the output matrix corresponding to the first result and continuously reduces the order of that matrix until the output matrix with the minimum order (the first downsampling result) is obtained. The plurality of second downsampling modules connected in series in the target module downsample the second result obtained by fusing the first result with the first feature map of the second image, that is, they perform feature extraction on the output matrix corresponding to the second result and continuously reduce its order, obtaining a second downsampling result for each second downsampling module. For the at least one first upsampling module in the target module, the first downsampling result is the input of the first upsampling module whose output matrix has the minimum order; the result of adding the output of the first upsampling module whose output-matrix order equals the minimum order among the plurality of second downsampling modules and the output of the second downsampling module with the minimum output-matrix order is the input of the target upsampling module; and the input of each other upsampling module is either the output of its previous-stage module or the result of adding that output to the output of the second downsampling module whose output matrix has the same order. The output of the first upsampling module with the largest output-matrix order is the second noise map output by the first model to be trained, or that output added to the output of the second downsampling module with the same (maximum) order is the second noise map output by the first model to be trained. In this process, downsampling the fused second result through the plurality of second downsampling modules connected in series is favorable for extracting more features of the second image and for adding each second downsampling result to the output of the corresponding module in the at least one first upsampling module; this connects at least one of the plurality of second downsampling modules with the at least one first upsampling module, transmits the information of the second image to the at least one first upsampling module, and trains the first model to be trained better.
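One possible wiring of such a target module is sketched below, assuming plain strided convolutions for the downsampling modules, transposed convolutions for the upsampling modules, and only two stages per path; the real module depth, channel counts and layer types are not specified here, so these choices are illustrative only.

```python
import torch
import torch.nn as nn

class TargetModuleSketch(nn.Module):
    """Illustrative only: a main down/up-sampling path for the fused first result
    plus a control down-sampling path for the fused second result, whose outputs
    are added into the up-sampling path at matching matrix orders (simplified)."""

    def __init__(self, ch=4):
        super().__init__()
        # first down-sampling modules in series (order of the output matrix shrinks)
        self.down1 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)         # e.g. 32x32 -> 16x16
        self.down2 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)         # 16x16 -> 8x8 (smallest order)
        # first up-sampling modules in series (order grows back to the maximum)
        self.up1 = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)  # 8x8 -> 16x16
        self.up2 = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)  # 16x16 -> 32x32 (largest order)
        # second down-sampling modules in series (control path for the second result)
        self.ctrl1 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)         # 32x32 -> 16x16
        self.ctrl2 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)         # 16x16 -> 8x8

    def forward(self, first_result, second_result):
        d1 = self.down1(first_result)
        d2 = self.down2(d1)              # first down-sampling result (smallest order)
        c1 = self.ctrl1(second_result)   # second down-sampling results
        c2 = self.ctrl2(c1)
        u1 = self.up1(d2 + c2)           # smallest-order control output added before up-sampling
        u2 = self.up2(u1 + c1)           # matching-order control output added at the next stage
        return u2                        # predicted second noise map (largest order)

module = TargetModuleSketch()
noise_pred = module(torch.randn(1, 4, 32, 32), torch.randn(1, 4, 32, 32))
```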
In one possible implementation manner, determining the target parameter of the first model to be trained according to the second noise map and the first noise map includes:
determining a first loss value of the second noise diagram and the first noise diagram by adopting a first preset loss function;
and determining target parameters of the first model to be trained according to the magnitude relation between the first loss value and the first preset threshold value.
In this embodiment, the first preset loss function is a standard for determining whether the first model to be trained is qualified, so that the model obtained by training can be effectively ensured to have a higher-precision output result.
In one possible implementation manner, determining the target parameter of the first model to be trained according to the magnitude relation between the first loss value and the first preset threshold value includes:
if the first loss value is greater than or equal to the first preset threshold, returning to the step of encoding the first image from the image space to the feature space through the first encoder to obtain the feature map of the first image and encoding the second image from the image space to the feature space through the second encoder to obtain the first feature map of the second image, until the first loss value is smaller than the first preset threshold; and taking the parameters used in the training process of the first model to be trained when the first loss value is smaller than the first preset threshold as the target parameters, where those parameters are obtained by adjusting the parameters used in the previous training process of the first model to be trained.
In this embodiment, when the first loss value is greater than or equal to the first preset threshold, the parameter used in the last training process of the first model to be trained is adjusted, the adjusted parameter is used as the parameter used in the current training process of the first model to be trained, the first model to be trained is trained, and training is stopped until the first loss value is less than the first preset threshold, so as to obtain the target parameter, and through the process, more accurate model parameters can be obtained, which is beneficial to improving the performance of the model.
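A schematic training loop consistent with this stopping criterion might look as follows; the MSE loss, the threshold value, the optimizer and the toy one-layer model are assumptions used only to show the control flow.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the first model to be trained and one training sample (assumed shapes).
model = nn.Conv2d(4, 4, 3, padding=1)            # plays the role of the noise-prediction part
fused_first_result = torch.rand(1, 4, 32, 32)    # feature map of the first image + first noise map
first_noise_map = torch.randn(1, 4, 32, 32)

loss_fn = nn.MSELoss()                  # one possible first preset loss function
first_preset_threshold = 1e-3           # assumed value of the first preset threshold
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

target_parameters = None
for _ in range(10_000):                 # cap the toy loop in case it does not converge
    second_noise_map = model(fused_first_result)
    first_loss = loss_fn(second_noise_map, first_noise_map)
    if first_loss.item() < first_preset_threshold:
        # Parameters in use when the loss drops below the threshold are the target parameters.
        target_parameters = {k: v.detach().clone() for k, v in model.state_dict().items()}
        break
    optimizer.zero_grad()
    first_loss.backward()               # adjust the parameters used in the previous pass and retrain
    optimizer.step()
```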
In one possible implementation, before the second image is encoded from the image space to the feature space by the second encoder, the method further includes:
performing degradation treatment on the first image to obtain a degraded first image, and upsampling the degraded first image to obtain a second image;
or,
and converting the first image into an unprocessed RAW file, and performing channel separation and interpolation processing on the RAW file to obtain a second image.
In this embodiment, the second image is obtained in the above manner, which is simple and efficient and ensures that the second image differs from the first image only in having a lower resolution, which benefits the subsequent training process of the first model to be trained.
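A possible way to construct such a second image from the first image is sketched below; bicubic shrinking plus noise and a crude blur merely stand in for whatever degradation model is actually used (for example a Real-ESRGAN-style pipeline), and the scale factor is assumed.

```python
import torch
import torch.nn.functional as F

first_image = torch.rand(1, 3, 512, 512)          # high-resolution first image in [0, 1]

# Degradation: shrink, add noise and lightly blur to reduce sharpness (illustrative choices).
degraded = F.interpolate(first_image, scale_factor=0.25, mode="bicubic", align_corners=False)
degraded = (degraded + 0.02 * torch.randn_like(degraded)).clamp(0.0, 1.0)
degraded = F.avg_pool2d(degraded, kernel_size=3, stride=1, padding=1)   # crude blur

# Up-sample so that the second image has the same size as the first image,
# differing from it only in (lower) effective resolution.
second_image = F.interpolate(degraded, size=first_image.shape[-2:], mode="bicubic", align_corners=False)
```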
In a second aspect, a method for training a model is provided, wherein the second model to be trained includes: a first encoder, a third encoder, and a target module, the method comprising:
encoding a first image from an image space to a feature space by a first encoder to obtain a feature map of the first image, and encoding a second image from the image space to the feature space by a third encoder to obtain a second feature map of the second image, wherein the second image and the first image are distinguished by: the resolution of the second image is lower than that of the first image, and the feature images of the first image and the second feature images of the second image correspond to different feature spaces;
fusing the feature map of the first image and the first noise map to obtain a fused first result;
fusing the fused first result and the second feature map of the second image to obtain a fused third result;
inputting the fused first result and the fused third result into a target module to obtain a third noise diagram output by the second model to be trained, wherein the target module is used for predicting noise in the fused first result;
and determining target parameters of the second model to be trained according to the third noise diagram and the first noise diagram.
In this embodiment, in the process of training the second model to be trained, a first result obtained after adding noise to the feature map of the first image, a second feature map of the second image (as a control condition of the second model to be trained), and a third result obtained after fusing the first result are input to the target module to predict the noise in the first result.
Therefore, in the application stage of the second target model obtained after the second model to be trained converges, when the low-resolution image input to the second target model is used as its control condition, the second target image obtained through the second target model is consistent with the input low-resolution image in image details. The second target image is obtained by decoding, with a decoder, the result of repeatedly and iteratively denoising, using the noise output by the second target model, the result obtained by fusing noise with the feature map corresponding to the image obtained by denoising the low-resolution image. The noise output by the second target model is obtained by the second target model from its input at each iteration and the low-resolution image, and every other input to the second target model, apart from the initially input fused result, is obtained after each iteration of denoising. In this way, in the application stage of the second target model, the resolution of the second target image is improved relative to the low-resolution image while a large difference between the second target image and the low-resolution image is avoided.
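Because the second feature map of the second image and the fused first result correspond to different feature spaces in this aspect, the sketch below uses channel concatenation as one plausible fusion across feature spaces; the concatenation, the encoder structures and all shapes are assumptions for illustration, not the fusion defined by this application.

```python
import torch
import torch.nn as nn

# Stand-ins with assumed shapes; the third encoder maps the second image into a
# feature space different from that of the first encoder (different channel count here).
first_encoder = nn.Conv2d(3, 4, 3, stride=8, padding=1)
third_encoder = nn.Conv2d(3, 8, 3, stride=8, padding=1)

first_image = torch.rand(1, 3, 256, 256)
second_image = torch.rand(1, 3, 256, 256)

feat_first = first_encoder(first_image)            # feature map of the first image
second_feat = third_encoder(second_image)          # second feature map of the second image

first_noise_map = torch.randn_like(feat_first)
first_result = feat_first + first_noise_map        # fused first result

# Assumed fusion across different feature spaces: channel concatenation, so no
# channel-wise matching of feature semantics is required.
third_result = torch.cat([first_result, second_feat], dim=1)
```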
In one possible implementation manner, the target module includes at least one first downsampling module connected in series, at least one first upsampling module connected in series, and a plurality of second downsampling modules connected in series, wherein the at least one first downsampling module is connected in series with the at least one first upsampling module, the maximum order and the minimum order of output matrixes corresponding to the at least one first downsampling module and the at least one first upsampling module are the same, and the order of the output matrix corresponding to the plurality of second downsampling modules is a subset of the order of the output matrix corresponding to the at least one first upsampling module;
inputting the fused first result and the fused third result into the target module to obtain the third noise diagram output by the second model to be trained includes:
the method comprises the steps that output of a previous stage module is subjected to downsampling in sequence through at least one first downsampling module to obtain a first downsampling result, wherein the first downsampling result is output of a first downsampling module with the smallest order of an output matrix in the at least one first downsampling module, and input of a first downsampling module with the largest order of the output matrix in the at least one first downsampling module is a fused first result;
Sequentially downsampling the output of the upper stage module through a plurality of second downsampling modules to obtain a plurality of third downsampling results, wherein the input of the second downsampling module with the largest order of the output matrix in the second downsampling modules is the fused third result;
sequentially upsampling respective inputs through at least one first upsampling module to obtain at least one second upsampling result, wherein the output of a first upsampling module with the largest order of an output matrix in the at least one first upsampling module is a third noise figure, or the result of adding the output of a second downsampling module with the same maximum order of the output matrix in the at least one first upsampling module in a plurality of second downsampling modules to the output of a first upsampling module with the largest order of the output matrix in the at least one first upsampling module is a third noise figure;
the input of the first up-sampling module with the minimum order of the output matrix in at least one first up-sampling module is a first down-sampling result; the input of the target up-sampling module in the at least one first up-sampling module is the result of adding the output of the upper stage module of the target up-sampling module and the output of the second down-sampling module with the smallest order of the output matrix in the plurality of second down-sampling modules, and the target up-sampling module is the next stage module of the first up-sampling module with the same order of the output matrix in the at least one first up-sampling module and the smallest order of the output matrix in the plurality of second down-sampling modules; the input of the other up-sampling modules in the at least one first up-sampling module is the output of the upper-stage module, or is the result of adding the output of the upper-stage module and the output of the second down-sampling module with the same order as the output matrix of the upper-stage module in the plurality of second down-sampling modules.
In this embodiment, the target module serves as the noise-prediction model. The at least one first downsampling module connected in series in the target module downsamples the first result obtained by fusing the feature map of the first image with the first noise map, that is, it performs feature extraction on the output matrix corresponding to the first result and continuously reduces the order of that matrix until the output matrix with the minimum order (the first downsampling result) is obtained. The plurality of second downsampling modules connected in series in the target module downsample the third result obtained by fusing the first result with the second feature map of the second image, that is, they perform feature extraction on the output matrix corresponding to the third result and continuously reduce its order, obtaining a third downsampling result for each second downsampling module. For the at least one first upsampling module in the target module, the first downsampling result is the input of the first upsampling module whose output matrix has the minimum order; the result of adding the output of the first upsampling module whose output-matrix order equals the minimum order among the plurality of second downsampling modules and the output of the second downsampling module with the minimum output-matrix order is the input of the target upsampling module; and the input of each other upsampling module is either the output of its previous-stage module or the result of adding that output to the output of the second downsampling module whose output matrix has the same order. The respective inputs are upsampled in sequence through the at least one first upsampling module to obtain at least one second upsampling result. The output of the first upsampling module with the largest output-matrix order is the third noise map output by the second model to be trained, or that output added to the output of the second downsampling module with the same (maximum) order is the third noise map output by the second model to be trained. In this process, downsampling the fused third result through the plurality of second downsampling modules connected in series is favorable for adding each third downsampling result to the output of the corresponding module in the at least one first upsampling module; this connects at least one of the plurality of second downsampling modules with the at least one first upsampling module, transmits the information of the second image to the at least one first upsampling module, trains the second model to be trained better, gives the target module better generating capability, and ensures the accuracy of the third noise map.
In one possible implementation manner, determining the target parameter of the second model to be trained according to the third noise diagram and the first noise diagram includes:
determining a second loss value of the third noise diagram and the first noise diagram by adopting a second preset loss function;
and determining target parameters of the second model to be trained according to the magnitude relation between the second loss value and the second preset threshold value.
In this embodiment, the second preset loss function is a measurement standard of whether the second model to be trained is qualified or not, so that the model obtained by training can be effectively ensured to have an output result with higher precision.
In one possible implementation manner, determining the target parameter of the second model to be trained according to the magnitude relation between the second loss value and the second preset threshold value includes:
if the second loss value is greater than or equal to the second preset threshold, returning to the step of encoding the first image from the image space to the feature space through the first encoder to obtain the feature map of the first image and encoding the second image from the image space to the feature space through the third encoder to obtain the second feature map of the second image, until the second loss value is smaller than the second preset threshold; and taking the parameters used in the training process of the second model to be trained when the second loss value is smaller than the second preset threshold as the target parameters, where those parameters are obtained by adjusting the parameters used in the previous training process of the second model to be trained.
In this embodiment, when the second loss value is greater than or equal to the second preset threshold, the parameters used in the last training process of the second model to be trained are adjusted, the adjusted parameters are used as parameters used in the current training process of the second model to be trained, the second model to be trained is trained, and training is stopped until the second loss value is less than the second preset threshold, so as to obtain the target parameters, and through the process, more accurate model parameters can be obtained, thereby being beneficial to improving the performance of the model.
In one possible implementation, before the second image is encoded from the image space to the feature space by the third encoder, the method further includes:
performing degradation treatment on the first image to obtain a degraded first image, and upsampling the degraded first image to obtain a second image;
or,
and converting the first image into an unprocessed RAW file, and performing channel separation and interpolation processing on the RAW file to obtain a second image.
In this embodiment, the second image is obtained in the above manner, which is simple and efficient and ensures that the second image differs from the first image only in having a lower resolution, which benefits the subsequent training process of the second model to be trained.
In a third aspect, a method of processing an image is provided, the method comprising:
acquiring a third image;
inputting the third image and the fourth noise diagram into a first target model to obtain an output result of the first target model, wherein the first target model is obtained by the method for training the model according to any one of the first aspects;
and inputting the output result of the first target model to a decoder to obtain a first target image, wherein the decoder is matched with the first encoder in the first aspect, and the resolution of the first target image is higher than that of the third image.
In this embodiment, a third image (as a control condition of the first target model) and a fourth noise image are input into the first target model, the noise output by the first target model is used to make multiple iterations of noise reduction on the fourth noise image input into the first target model to obtain an output result of the first target model without noise, and the decoder is used to decode the output result of the first target model from the feature space to the image space to obtain a first target image with a resolution higher than that of the third image.
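The inference loop described above can be pictured with the following sketch; the stand-in model, the conditioning via a second-encoder-style convolution, the simplified noise-reduction update and the number of iterations are all assumptions, not an actual diffusion sampler.

```python
import torch
import torch.nn as nn

# Stand-ins with assumed shapes: trained first target model (here one convolution over the
# latent concatenated with an encoded condition) and a decoder matched with the first encoder.
second_encoder = nn.Conv2d(3, 4, 3, stride=8, padding=1)    # encodes the third image as the condition
first_target_model = nn.Conv2d(8, 4, 3, padding=1)          # predicts noise from latent + condition
decoder = nn.ConvTranspose2d(4, 3, 8, stride=8)              # feature space -> image space

third_image = torch.rand(1, 3, 256, 256)        # acquired low-resolution image
latent = torch.randn(1, 4, 32, 32)               # fourth noise map, iteratively denoised below

with torch.no_grad():
    cond = second_encoder(third_image)
    num_steps = 50                               # assumed number of denoising iterations
    for _ in range(num_steps):
        predicted_noise = first_target_model(torch.cat([latent, cond], dim=1))
        latent = latent - predicted_noise / num_steps   # simplified noise reduction, not a real sampler
    first_target_image = decoder(latent)         # output result decoded into the first target image
```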
In a fourth aspect, there is provided a method of processing an image, the method comprising:
acquiring a fourth image;
denoising the fourth image to obtain a fifth image, and encoding the fifth image from an image space to a feature space by a fourth encoder to obtain a feature map of the fifth image;
fusing a feature map of a fifth image and a fifth noise map, and inputting a fourth result obtained after fusing and the fourth image into a second target model to obtain an output result of the second target model, wherein the second target model is obtained by a method for training the model according to any one of the second aspects;
and inputting the output result of the second target model to a decoder to obtain a second target image, wherein the decoder is matched with the first encoder in the second aspect, and the decoder is matched with a fourth encoder, and the resolution of the second target image is higher than that of the fourth image.
In this embodiment, the fourth image is denoised so that the noise it carries does not interfere with subsequent image generation, and the resulting fifth image is encoded from the image space to the feature space by the fourth encoder. The feature map of the fifth image is fused with the fifth noise map to obtain the fourth result, and the fourth result and the fourth image (as the control condition of the second target model) are taken as the input of the second target model. The noise output by the second target model is used to iteratively denoise, over multiple iterations, the fourth result obtained by fusing the fifth noise map with the feature map of the fifth image, yielding a noise-free output result of the second target model, which is decoded by the decoder from the feature space to the image space to obtain a second target image whose resolution is higher than that of the fourth image. In this process, because the second target model is obtained by training the second model to be trained, in the application stage of the second target model the resolution of the second target image can be improved relative to the low-resolution image (namely, the fourth image), and on that basis a large difference between the second target image and the fourth image can be avoided.
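A corresponding sketch for this pipeline is shown below; the average-pooling denoiser, the stand-in model and decoder, the downscaled-image conditioning and the simplified update rule are assumptions made only to mirror the order of operations described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins with assumed shapes: a crude denoiser, the fourth encoder, the trained
# second target model and a matching decoder.
fourth_encoder = nn.Conv2d(3, 4, 3, stride=8, padding=1)
second_target_model = nn.Conv2d(7, 4, 3, padding=1)   # consumes latent + (downscaled) fourth image
decoder = nn.ConvTranspose2d(4, 3, 8, stride=8)

fourth_image = torch.rand(1, 3, 256, 256)
fifth_image = F.avg_pool2d(fourth_image, 3, stride=1, padding=1)   # crude stand-in for denoising

with torch.no_grad():
    feat_fifth = fourth_encoder(fifth_image)                       # feature map of the fifth image
    latent = feat_fifth + torch.randn_like(feat_fifth)             # fourth result: fused with the fifth noise map
    cond = F.interpolate(fourth_image, size=latent.shape[-2:], mode="bilinear", align_corners=False)
    for _ in range(50):                                            # assumed number of iterations
        predicted_noise = second_target_model(torch.cat([latent, cond], dim=1))
        latent = latent - predicted_noise / 50                     # simplified iterative noise reduction
    second_target_image = decoder(latent)
```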
In a fifth aspect, an apparatus for training a model is provided, where the apparatus is included in an electronic device, and the apparatus has a function of implementing the behavior of the electronic device in the method for training a model described above. The functions may be realized by hardware, or may be realized by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the functions described above. For example, a fusion module or unit, a determination module or unit, etc.
In a sixth aspect, an apparatus for processing an image is provided, which is included in an electronic device, and has a function of implementing the behavior of the electronic device in the above-described method for processing an image. The functions may be realized by hardware, or may be realized by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the functions described above. For example, an acquisition module or unit, an input module or unit, etc.
In a seventh aspect, an electronic device is provided comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor executing the method of training a model as described above or the method of processing an image when executing the computer program.
In an eighth aspect, a computer readable storage medium is provided, the computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of training a model as described above, or the method of processing an image.
In a ninth aspect, there is provided a computer program product comprising: computer program code which, when run by an electronic device, causes the electronic device to perform the above-described method of training a model, or the method of processing an image.
The advantages of the fifth aspect may refer to the first aspect and the second aspect, and are not described herein. Advantageous effects of the sixth aspect refer to the third aspect and the fourth aspect, and are not described herein. Advantageous effects of the seventh aspect to the ninth aspect refer to the first aspect to the fourth aspect, and are not described here.
Drawings
FIG. 1 is a flow chart of a method for training a model according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a first model to be trained according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a target module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another target module according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of another target module according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another first model to be trained according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of yet another first model to be trained according to an embodiment of the present application;
FIG. 8 is a flow chart of another method for training a model according to an embodiment of the present application;
FIG. 9 is a flow chart of yet another method for training a model provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a second model to be trained according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of another second model to be trained according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a structure of a second model to be trained according to an embodiment of the present application;
FIG. 13 is a flow chart of yet another method for training a model provided by an embodiment of the present application;
FIG. 14 is a flow chart of a method of processing an image according to an embodiment of the present application;
FIG. 15 is a schematic view of a structure for processing an image based on a first target model according to an embodiment of the present application;
FIG. 16 is a flow chart of another method for processing an image according to an embodiment of the present application;
FIG. 17 is a schematic diagram of a structure for processing an image based on a second target model according to an embodiment of the present application;
FIG. 18 is a schematic structural diagram of an autoencoder according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in further detail below with reference to the accompanying drawings. The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present embodiment, unless otherwise specified, the meaning of "plurality" is two or more.
In the related art, there is a model capable of improving the resolution of an image. The model includes a Condition branch, which can control the generated content; the condition may be an edge map, a depth map or a segmentation map, and the model generates an image conforming to the condition. For example, edge detection is performed on an original image (source image) using the Canny edge detection algorithm to generate an edge image (Canny edge), and the generated edge image is used as the input of the Condition branch of the model. By applying different random noise, different output images can be generated by the model, and the outlines of objects in those images are consistent with the outlines of the objects in the edge image. However, although the resolution of the output image obtained by the model is improved relative to the original image, the output image differs greatly from the original image in image details and image colors, and it cannot be directly used for an image enhancement task or an image super-resolution task.
Based on the above, the embodiment of the application provides a method for training a model and a method for processing an image, wherein the method for training the model is used for solving the problems of the model in the related technology, and the method for processing the image is used for describing the application process of the trained model.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be implemented independently or combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
The method for training the model provided by the embodiment of the application is described in detail below.
Fig. 1 shows a flow diagram of a method of training a model for training a first model to be trained, the first model to be trained comprising: the first encoder, the second encoder and the target module.
The training process of the first model to be trained will be described below taking the structure of the first model to be trained shown in fig. 2 as an example. Referring to fig. 1, a method of training a model includes:
s101, encoding a first image from an image space to a feature space through a first encoder to obtain a feature map of the first image, and encoding a second image from the image space to the feature space through a second encoder to obtain the first feature map of the second image.
The feature map of the first image obtained by the first encoder and the first feature map of the second image obtained by the second encoder correspond to the same feature space, namely the feature information represented by the feature map and the first feature map are the same. The first encoder and the second encoder may be implemented by a neural network structure, or may be implemented by a first base module stack, where the first base module may be determined from a convolutional layer, an active layer, a normalization layer, an attention mechanism (attention) layer, a downsampling layer, a residual network, or other structures, which are not limited herein. The first image may be obtained from an open source high definition database, or may be obtained by a high definition camera, and the first image may be one image or may be a plurality of different images, where the obtaining mode and the number of the first image are not limited. The format of the first image may be: portable network graphics (portable network graphics, PNG), bitmaps (BMP), etc., without limitation herein. The second image can be obtained by the first image, and the difference between the second image and the first image is that: the resolution of the second image is lower than the resolution of the first image. The number of the second images is the same as that of the first images.
In some embodiments, the second encoder and the first encoder may be the same encoder (the model structure and the model parameters are the same) or may be different encoders, but when the second encoder and the first encoder are different encoders, it is necessary to ensure that the feature map of the first image obtained by the first encoder and the first feature map of the second image obtained by the second encoder correspond to the same feature space. When the second encoder and the first encoder are different encoders, the second encoder can be obtained by: based on image sample data (the image sample data may be obtained from an open source database, or may be obtained by other means, which is not limited herein), training a first encoder and a decoder together in advance, fixing model parameters of the decoder after the training is finished, keeping the model parameters unchanged, and training a second encoder and a decoder with the model parameters unchanged based on the same image sample data together, after the training is finished, obtaining a second encoder capable of ensuring that a feature map of a first image obtained by the first encoder and a first feature map of a second image obtained by the second encoder correspond to the same feature space.
Since the whole training process of the first model to be trained is performed in the hidden space (also called feature space) domain, the input image needs to be converted from the image domain (i.e. the image space) to the feature space domain, namely: the first image is input to a first encoder, the first image can be converted from an image space to a feature space through the first encoder, a feature image of the first image is obtained, the second image is input to a second encoder, the second image can be converted from the image space to the feature space through the second encoder, a first feature image of the second image is obtained, and the feature image of the first image obtained through the first encoder corresponds to the same feature space as the first feature image of the second image obtained through the second encoder.
It will be appreciated that the feature map of the first image obtained by the first encoder and the first feature map of the second image obtained by the second encoder may each be characterized by a plurality of n×n matrices (i.e., a plurality of n-order matrices), where the number of matrices is the number of channels: there is one n×n matrix per channel. The number of matrices used to characterize the feature map of the first image and their order n are determined by the model parameters of the first encoder, n is an integer greater than 1, and each matrix contains n×n feature points; the feature points in the n-order matrices can represent the color, shape, detail, texture and other feature information of the first image. Likewise, the number of matrices used to characterize the first feature map of the second image and their order n are determined by the model parameters of the second encoder, n is an integer greater than 1, and each matrix contains n×n feature points; the feature points in the n-order matrices can represent the color, shape, detail, texture and other features of the second image. The number and order of the matrices obtained by the second encoder are the same as those obtained by the first encoder. Accordingly, the statement that the feature map of the first image obtained by the first encoder and the first feature map of the second image obtained by the second encoder correspond to the same feature space can be understood as: among the matrices obtained by the second encoder and the matrices obtained by the first encoder, the matrices corresponding to the same channel represent the same feature information.
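In tensor terms, this channel-and-matrix description is simply a feature map of shape (channels, n, n); a small illustration with an assumed channel count and order follows.

```python
import torch

channels, n = 4, 32                        # assumed number of channels and matrix order
feature_map = torch.rand(channels, n, n)   # one n-by-n matrix per channel, n*n feature points each

per_channel_matrices = [feature_map[c] for c in range(channels)]   # the plurality of n-order matrices
assert per_channel_matrices[0].shape == (n, n)
```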
In some embodiments, optionally, before the second image is encoded from the image space to the feature space by the second encoder in S101, the method further includes:
Performing degradation treatment on the first image to obtain a degraded first image, and upsampling the degraded first image to obtain a second image;
or,
and converting the first image into an unprocessed RAW file, and performing channel separation and interpolation processing on the RAW file to obtain a second image.
The degradation process may be implemented by using a blind image super resolution model (Real-ESRGAN), or may be implemented by a downsampling method, which is not limited herein. The degradation process may include operations of adding noise, blurring, compressing, and the like to deteriorate sharpness. The interpolation process may be the nearest neighbor interpolation (nearest interpolation) process, or may be other interpolation processing methods, which are not limited herein.
Specifically, the degraded first image can be obtained by performing operations that reduce sharpness, such as adding noise, blurring and compression, on the first image. Since the size information (i.e., the length and width) of the degraded first image is reduced, the degraded first image needs to be upsampled so that its size information is restored to be identical to that of the first image, thereby obtaining the second image.
In order to enable the model obtained after training based on the method for training the first model to be trained in the present embodiment to be applied to the RAW file, the RAW file needs to be used in the process of training the model, at this time, the first image may be converted into an unprocessed RAW file by a method such as format conversion, and channel separation (channel split) is performed on the RAW file, so as to obtain images corresponding to three channels of red (red, R), green (green, G), blue (blue, B) of the RAW file, and then interpolation processing is performed on the images of the three channels, for example, nearest interpolation is performed, so that a second image may be obtained.
In this embodiment, the second image is obtained in the above manner, which is simple and efficient, so that the second image is only lower in resolution than the first image, which is beneficial to the execution of the subsequent steps.
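A rough sketch of this RAW-based route is given below; sampling an RGGB Bayer mosaic from the RGB image is only a crude stand-in for converting the image to an unprocessed RAW file, and the mosaic layout and interpolation choices are assumptions.

```python
import torch
import torch.nn.functional as F

first_image = torch.rand(1, 3, 512, 512)        # RGB first image in [0, 1]

# Crude stand-in for "converting to an unprocessed RAW file": sample an RGGB Bayer
# mosaic from the RGB image (a real pipeline would invert the ISP instead).
raw = torch.zeros(1, 1, 512, 512)
raw[..., 0::2, 0::2] = first_image[:, 0:1, 0::2, 0::2]   # R
raw[..., 0::2, 1::2] = first_image[:, 1:2, 0::2, 1::2]   # G
raw[..., 1::2, 0::2] = first_image[:, 1:2, 1::2, 0::2]   # G
raw[..., 1::2, 1::2] = first_image[:, 2:3, 1::2, 1::2]   # B

# Channel separation: one half-resolution plane per colour channel.
r = raw[..., 0::2, 0::2]
g = (raw[..., 0::2, 1::2] + raw[..., 1::2, 0::2]) / 2
b = raw[..., 1::2, 1::2]

# Nearest-neighbour interpolation back to full size gives the second image.
second_image = torch.cat(
    [F.interpolate(c, size=first_image.shape[-2:], mode="nearest") for c in (r, g, b)], dim=1
)
```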
S102, fusing the feature map of the first image and the first noise map to obtain a fused first result.
The target module is used as a noise prediction model; in essence it uses a deep learning model to predict noise. Since the first image itself contains no added noise, noise needs to be added to the feature map of the first image before noise in the image can be predicted. The specific noise-adding process may be as follows: set a mean and a variance, generate a two-dimensional Gaussian noise map obeying the distribution with the set mean and variance based on a noise generating function such as torch.randn, and set a random step representing the level of the added noise, thereby obtaining the first noise map. The first noise map can also be characterized by n-order matrices, and the order of the matrices used to characterize the first noise map is the same as the order of the matrices used to characterize the feature map of the first image obtained by the first encoder. Therefore, fusing the feature map of the first image with the first noise map means adding each matrix used to characterize the feature map of the first image to the corresponding elements (i.e., values) of the corresponding matrix used to characterize the first noise map; after the addition, the fused first result is obtained. The first result can likewise be characterized by n-order matrices, and the number and order of the matrices used to characterize the first result are the same as those of the matrices used to characterize the feature map of the first image.
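The noise-adding step can be sketched as follows; the mean, the variance, the 1000-step range and the mapping from the random step to a noise level are assumed values, not the schedule used by the application.

```python
import torch

feat_first = torch.rand(1, 4, 32, 32)             # feature map of the first image (assumed shape)

mean, std = 0.0, 1.0                               # set mean and variance for the Gaussian noise
step = torch.randint(0, 1000, (1,)).item()         # random step representing the level of added noise
scale = step / 1000.0                              # toy mapping from step to noise level (assumed)

first_noise_map = mean + std * scale * torch.randn_like(feat_first)  # Gaussian noise map, same order as the feature map

# Fusion: corresponding elements of each channel's matrices are added together.
first_result = feat_first + first_noise_map
```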
And S103, fusing the fused first result and the first feature map of the second image to obtain a fused second result.
Specifically, after the first result is obtained, the number, order and represented feature information of the matrices characterizing the first result are the same as those of the matrices characterizing the feature map of the first image, which in turn are the same as those of the matrices characterizing the first feature map of the second image. Therefore, the matrices characterizing the first result and the matrices characterizing the first feature map of the second image also agree in number, order and represented feature information, and the fused second result can be obtained by adding the two sets of matrices in one-to-one correspondence.
For example, assume that 4 matrices are used to characterize the first result, namely matrix A for channel 1, matrix B for channel 2, matrix C for channel 3 and matrix D for channel 4, and that 4 matrices are used to characterize the first feature map of the second image, namely matrix a for channel 1, matrix b for channel 2, matrix c for channel 3 and matrix d for channel 4. Fusing the first result with the first feature map of the second image then means adding matrix A to matrix a, matrix B to matrix b, matrix C to matrix c and matrix D to matrix d, each pair characterizing the same feature information. The obtained second result is still 4 matrices, and the feature information characterized by the matrix corresponding to each channel is the same as that characterized by the matrix of the corresponding channel among the 4 matrices used to characterize the first result.
S104, inputting the fused first result and the fused second result to the target module to obtain a second noise diagram output by the first model to be trained.
The target module is used for predicting noise in the image, and may adopt structures such as U-Net, VGG (visual geometry group) or a residual network (ResNet), which are not specifically limited herein.
After the fused first result and the fused second result are obtained, with the target module serving as the noise prediction model, the fused first result and the fused second result are input into the target module. Through the action of the target module on the first result and the second result, the noise contained in the first result can be predicted, and a second noise map output by the first model to be trained is obtained; the number and the order of the matrices used for representing the second noise map are the same as those of the matrices used for representing the first result.
In some embodiments, the target module may include at least one first downsampling module connected in series, at least one first upsampling module connected in series, and a plurality of second downsampling modules connected in series, where the at least one first downsampling module is connected in series with the at least one first upsampling module, the maximum order and the minimum order of output matrices corresponding to the at least one first downsampling module and the at least one first upsampling module are the same, and the order of output matrices corresponding to the plurality of second downsampling modules is a subset of the order of output matrices corresponding to the at least one first upsampling module, and S104 may be implemented by:
The method comprises the steps that output of a previous stage module is subjected to downsampling in sequence through at least one first downsampling module to obtain a first downsampling result, wherein the first downsampling result is output of a first downsampling module with the smallest order of an output matrix in the at least one first downsampling module, and input of a first downsampling module with the largest order of the output matrix in the at least one first downsampling module is a fused first result;
sequentially downsampling the output of the upper stage module through a plurality of second downsampling modules to obtain a plurality of second downsampling results, wherein the input of the second downsampling module with the largest order of the output matrix in the second downsampling modules is a fused second result;
sequentially upsampling respective inputs through at least one first upsampling module to obtain at least one first upsampling result, wherein the output of a first upsampling module with the largest order of an output matrix in the at least one first upsampling module is a second noise figure, or the output of a second downsampling module with the same maximum order of the output matrix in the at least one first upsampling module in a plurality of second downsampling modules is added with the output of a first upsampling module with the largest order of the output matrix in the at least one first upsampling module to obtain a second noise figure;
The input of the first upsampling module with the minimum order of the output matrix in the at least one first upsampling module is the first downsampling result; the input of the target upsampling module in the at least one first upsampling module is the result of adding the output of the previous stage module of the target upsampling module and the output of the second downsampling module with the smallest order of the output matrix in the plurality of second downsampling modules, where the target upsampling module is the next stage module of the first upsampling module whose output matrix has the same order as the smallest order of the output matrices in the plurality of second downsampling modules; the input of the other upsampling modules in the at least one first upsampling module is the output of the previous stage module, or is the result of adding the output of the previous stage module and the output of the second downsampling module whose output matrix has the same order as that of the previous stage module. The maximum order of the output matrix is the same as the order of the matrices corresponding to the first result (i.e., the matrices used to characterize the first result), and the minimum order is smaller than the maximum order; the minimum order may be predetermined, e.g., a certain fraction of the maximum order such as 1/2, 1/4 or 1/8 of the maximum order, as the case may be, without specific limitation herein.
Specifically, when predicting noise, the target module is used as the noise prediction model. The at least one first downsampling module connected in series in the target module downsamples the first result obtained by fusing the feature map of the first image and the first noise map, namely: feature extraction is carried out on the output matrix corresponding to the first result, and the order of the output matrix is continuously reduced until the output matrix with the minimum order (the first downsampling result) is obtained. The plurality of second downsampling modules connected in series in the target module downsample the second result obtained by fusing the first result and the first feature map of the second image, namely: feature extraction is carried out on the output matrix corresponding to the second result, and the order of the output matrix is continuously reduced, so that the second downsampling result corresponding to each second downsampling module is obtained. For the at least one first upsampling module in the target module, the first downsampling result is taken as the input of the first upsampling module with the smallest order of the output matrix; the result of adding the output of the previous stage module of the target upsampling module to the output of the second downsampling module with the smallest order of the output matrix among the plurality of second downsampling modules is taken as the input of the target upsampling module; and the input of the other upsampling modules is the output of the respective previous stage module, or the result of adding the output of the previous stage module to the output of the second downsampling module whose output matrix has the same order as that of the previous stage module. The at least one first upsampling module sequentially upsamples its respective input to obtain at least one first upsampling result, where the output of the first upsampling module with the largest order of the output matrix is the second noise map output by the first model to be trained, or the result of adding the output of the second downsampling module whose output matrix has the same order as that largest order to the output of the first upsampling module with the largest order of the output matrix is the second noise map output by the first model to be trained.
In the above process, the second result obtained by fusing the first result and the first feature map of the second image is downsampled by the plurality of second downsampling modules connected in series, so that more features of the second image can be extracted. The second downsampling results are added to the outputs of the corresponding modules in the at least one first upsampling module, so that at least one of the plurality of second downsampling modules is connected with the at least one first upsampling module and the information of the second image is transferred to the at least one first upsampling module. This allows the first model to be trained to be trained better, so that the target module has better generating capability and the accuracy of the second noise map can be guaranteed.
The specific implementation process of S104 will be specifically described below by taking the example that the target module shown in fig. 3 includes four first downsampling modules (encoding blocks 1-4) connected in series, four first upsampling modules (decoding blocks 1-4) connected in series, and four second downsampling modules (encoding blocks 5-8) connected in series.
The coding block may be implemented by stacking a first base module, where the first base module may be predetermined, or the coding block may be implemented by other structures as the case may be; the coding block may also be any other network or structure capable of downsampling, which is not limited herein. The decoding block may be implemented by stacking a second base module, where the second base module may be determined from a convolution layer, an activation layer, a normalization layer, an attention mechanism (attention) layer, an upsampling layer, a residual network, and the like, or the decoding block may be implemented by other structures; the decoding block may also be any other network or structure capable of upsampling, which is not limited herein.
As shown in fig. 3, the first result and the second result are both represented by (M, 4, 64, 64), where M represents the number of input images and its value remains unchanged during the training process of the first model to be trained, 4 represents the number of channels, and the two 64s represent the number of rows and the number of columns of the matrices corresponding to the first result and the second result; in this embodiment, the number of rows and the number of columns of the matrices are the same.
The first result is downsampled by the first downsampling module with the largest order of the output matrix among the four first downsampling modules (e.g., the encoding block 1 in fig. 3, which keeps the order of the matrix unchanged, the order of its output matrix being 64), so as to obtain a 64×64 output matrix; the output of the previous stage module (i.e., the output of the encoding block 1) is downsampled by the encoding block 2, which changes it from a 64×64 matrix to a 32×32 matrix; the output of the encoding block 2 is downsampled by the encoding block 3, which changes it from a 32×32 matrix to a 16×16 matrix; the output of the encoding block 3 is downsampled by the encoding block 4, which changes it from a 16×16 matrix to an 8×8 matrix; and the output of the first downsampling module with the smallest order of the output matrix (e.g., the encoding block 4 in fig. 3, which reduces the order of the matrix, the order of its output matrix being 8) is the first downsampling result. The internal network structure of the encoding block 1 in fig. 3 may adopt a structure of 2 convolution layers + a group normalization (group normalization, GN) layer + an attention layer, where the convolution kernels of the 2 convolution layers are 3×3 with a step size of 1; the internal network structure of the encoding blocks 2-4 may adopt a structure of two convolution layers + a GN layer + an attention layer + a downsampling layer, where the convolution kernels of the convolution layers are 3×3 with a step size of 1, and the convolution kernel of the downsampling layer is 3×3 with a step size of 2.
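A minimal sketch of such an encoding block, assuming a single attention head and a ReLU activation between the two convolutions, with the kernel sizes and strides described above; the module and parameter names are illustrative, not the actual implementation of this embodiment:

import torch
import torch.nn as nn

class EncodingBlock(nn.Module):
    # Two 3x3 conv layers (stride 1) + group normalization + attention,
    # optionally followed by a 3x3 stride-2 conv that halves the matrix order.
    def __init__(self, in_ch, out_ch, downsample=True, groups=8):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)
        self.norm = nn.GroupNorm(groups, out_ch)
        self.attn = nn.MultiheadAttention(out_ch, num_heads=1, batch_first=True)
        self.down = nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1) if downsample else nn.Identity()

    def forward(self, x):
        h = self.norm(self.conv2(torch.relu(self.conv1(x))))
        b, c, hh, ww = h.shape
        seq = h.flatten(2).transpose(1, 2)              # (B, H*W, C) sequence for attention
        attn_out, _ = self.attn(seq, seq, seq)
        h = h + attn_out.transpose(1, 2).reshape(b, c, hh, ww)
        return self.down(h)

# Encoding block 1 keeps the matrix order; encoding blocks 2-4 halve it each time.
x = torch.randn(2, 4, 64, 64)
block1 = EncodingBlock(4, 64, downsample=False)
block2 = EncodingBlock(64, 64, downsample=True)
h1 = block1(x)
print(h1.shape, block2(h1).shape)                       # (2, 64, 64, 64), (2, 64, 32, 32)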
The second result is downsampled by the second downsampling module with the largest order of the output matrix among the four second downsampling modules (e.g., the encoding block 5 in fig. 3, the order of whose output matrix is 64), so as to obtain a second downsampling result (represented by a 64×64 matrix); the output of the respective previous stage module is then downsampled by the encoding blocks 6-8 in sequence, so as to obtain three further second downsampling results, namely the output of the encoding block 6 (represented by a 32×32 matrix), the output of the encoding block 7 (represented by a 16×16 matrix) and the output of the encoding block 8 (represented by an 8×8 matrix). The internal network structure of the encoding blocks 5-8 may be the same as or different from that of the encoding blocks 1-4; in fig. 3, the internal network structure of the encoding blocks 5-8 is the same as that of the encoding blocks 1-4.
The first downsampling result (i.e., the output of the encoding block 4) is upsampled by the first upsampling module with the smallest order of the output matrix among the four first upsampling modules (e.g., the decoding block 4 in fig. 3, which keeps the order of the matrix unchanged, the order of its output matrix being 8), so as to obtain a first upsampling result (represented by an 8×8 matrix). The result of adding the output of the decoding block 4 to the output of the encoding block 8 is upsampled by the target upsampling module (e.g., the decoding block 3 in fig. 3; the decoding block 3 is the next stage module of the first upsampling module (the decoding block 4) whose output matrix has the same order as the minimum order of the output matrices of the four second downsampling modules), so as to obtain the output of the decoding block 3 (represented by a 16×16 matrix). The decoding block 2 and the decoding block 1 are the remaining upsampling modules. The input of the decoding block 2 may be the output of the previous stage module (i.e., the decoding block 3), or the result of adding the output of the decoding block 3 to the output of the second downsampling module (i.e., the encoding block 7) whose output matrix has the same order as that of the decoding block 3; the input of the decoding block 2 shown in fig. 3 is the result of adding the output of the decoding block 3 to the output of the encoding block 7, the other possible input of the decoding block 2 not being shown in fig. 3. The input of the decoding block 2 is upsampled by the decoding block 2 to obtain the output of the decoding block 2 (represented by a 32×32 matrix). Similarly, the input of the decoding block 1 may be the output of the previous stage module (i.e., the decoding block 2), or the result of adding the output of the decoding block 2 to the output of the second downsampling module (i.e., the encoding block 6) whose output matrix has the same order as that of the decoding block 2; the input of the decoding block 1 shown in fig. 3 is the result of adding the output of the decoding block 2 to the output of the encoding block 6, the other possible input of the decoding block 1 not being shown in fig. 3. The input of the decoding block 1 is upsampled by the decoding block 1 to obtain the output of the decoding block 1 (represented by a 64×64 matrix).
After the four first upsampling results (i.e., the outputs of the decoding blocks 1-4) are obtained, the output of the first upsampling module with the largest order of the output matrix among the four first upsampling modules (i.e., the decoding block 1) is the second noise map; or the result of adding the output of the second downsampling module whose output matrix has that same largest order (i.e., the encoding block 5) to the output of the decoding block 1 is the second noise map. The second noise map shown in fig. 3 is the result of adding the output of the encoding block 5 to the output of the decoding block 1, and is still represented by (M, 4, 64, 64); the other implementation of the second noise map is not shown in fig. 3.
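A minimal sketch of the skip-connected wiring of fig. 3, using simple stand-in blocks (plain strided convolutions in place of the full encoding/decoding blocks) so that only the addition structure is illustrated; the channel counts and block internals are assumptions:

import torch
import torch.nn as nn

def down(c, reduce=True):
    # Stand-in for an encoding block: a 3x3 conv, stride 2 halves the matrix order.
    return nn.Conv2d(c, c, 3, stride=2 if reduce else 1, padding=1)

def up(c, enlarge=True):
    # Stand-in for a decoding block: doubles the matrix order via a transposed conv.
    return nn.ConvTranspose2d(c, c, 4, stride=2, padding=1) if enlarge else nn.Conv2d(c, c, 3, 1, 1)

C = 4
enc = nn.ModuleList([down(C, False), down(C), down(C), down(C)])    # encoding blocks 1-4
enc2 = nn.ModuleList([down(C, False), down(C), down(C), down(C)])   # encoding blocks 5-8
dec = nn.ModuleList([up(C, False), up(C), up(C), up(C)])            # decoding blocks 4, 3, 2, 1

first_result = torch.randn(1, C, 64, 64)
second_result = torch.randn(1, C, 64, 64)

e = [first_result]
for blk in enc:                                   # branch over the first result: orders 64, 32, 16, 8
    e.append(blk(e[-1]))
s = [second_result]
for blk in enc2:                                  # branch over the second result: orders 64, 32, 16, 8
    s.append(blk(s[-1]))

x = dec[0](e[-1])                                 # decoding block 4: input is the first downsampling result (order 8)
x = dec[1](x + s[-1])                             # decoding block 3: add the output of encoding block 8 (order 8)
x = dec[2](x + s[-2])                             # decoding block 2: add the output of encoding block 7 (order 16)
x = dec[3](x + s[-3])                             # decoding block 1: add the output of encoding block 6 (order 32)
second_noise_map = x + s[-4]                      # finally add the output of encoding block 5 (order 64)
print(second_noise_map.shape)                     # torch.Size([1, 4, 64, 64])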
It should be noted that: the number and structure of the first downsampling modules (encoding blocks 1-4), the number and structure of the second downsampling modules (encoding blocks 5-8), and the number and structure of the first upsampling modules (decoding blocks 1-4) included in the target modules shown in fig. 3 are for illustration only and are not intended to be limiting.
In some embodiments, there may be a single second downsampling module, which includes one coding block, and the order of the output matrix of this coding block is the maximum order. Another implementation of S104 will be described below by taking the example that the target module shown in fig. 4 includes four first downsampling modules (encoding blocks 1-4) connected in series, four first upsampling modules (decoding blocks 1-4) connected in series, and one second downsampling module (encoding block 5).
The functions of the encoding blocks 1-4 and 5 shown in fig. 4 are the same as those of the encoding blocks 1-4 and 5 shown in fig. 3, and will not be repeated here.
In the case where the target module includes one second downsampling module and the order of the output matrix of this second downsampling module is the maximum order, the module with the smallest order of the output matrix among the second downsampling modules (e.g., the encoding block 5 in fig. 4) is also the encoding block 5. Therefore, the output of the encoding block 5 is added to the output of the module, among the four first upsampling modules, whose output matrix has the same order as that of the encoding block 5, i.e., the output of the encoding block 5 needs to be added to the output of the decoding block 1, and the addition result is the second noise map, i.e., the output in fig. 4.
In some embodiments, an intermediate block may be further included between the at least one first downsampling module connected in series and the at least one first upsampling module connected in series, where the intermediate block is used to perform feature extraction on the first downsampling result, and an order of an output matrix of the intermediate block is the same as an order of an output matrix of a first downsampling module with a minimum order of the output matrices of the four first downsampling modules; the last stage module of the plurality of second downsampling modules connected in series may also be connected in series with an intermediate block, the output matrix of which has the same order as the output matrix of the last stage module of the plurality of second downsampling modules. The specific implementation process of S104 will be specifically described below with reference to fig. 3, where the target module shown in fig. 5 includes four first downsampling modules (encoding blocks 1-4) connected in series, an intermediate block 1, four first upsampling modules (decoding blocks 1-4) connected in series, four second downsampling modules (encoding blocks 5-8) connected in series, and an intermediate block 2.
The intermediate block 1 is located between the encoding block 4 and the decoding block 4, and the intermediate block 2 is the next stage module of the encoding block 8. The internal network structure of the intermediate block 1 and the intermediate block 2 may adopt a structure of a convolution layer + an attention layer + a convolution layer, where the convolution kernels of the 2 convolution layers are 3×3 with a step size of 1; other structures may also be adopted, which are not limited herein. The functions of the encoding blocks 1-4 and the encoding blocks 5-8 shown in fig. 5 are the same as those of the encoding blocks 1-4 and the encoding blocks 5-8 shown in fig. 3, and will not be described again here.
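A minimal sketch of such an intermediate block (convolution + attention + convolution, kernel 3×3, stride 1); the attention implementation and names are assumptions for illustration:

import torch
import torch.nn as nn

class IntermediateBlock(nn.Module):
    # Feature extraction that keeps the order of the matrix unchanged.
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)
        self.attn = nn.MultiheadAttention(ch, num_heads=1, batch_first=True)
        self.conv2 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)

    def forward(self, x):
        h = self.conv1(x)
        b, c, hh, ww = h.shape
        seq = h.flatten(2).transpose(1, 2)
        attn_out, _ = self.attn(seq, seq, seq)
        h = h + attn_out.transpose(1, 2).reshape(b, c, hh, ww)
        return self.conv2(h)

x = torch.randn(1, 64, 8, 8)
print(IntermediateBlock(64)(x).shape)   # torch.Size([1, 64, 8, 8]), order unchanged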
As shown in fig. 5, in the case where the intermediate block 1 is located between the encoding block 4 and the decoding block 4 and the intermediate block 2 is the next stage module of the encoding block 8, the order of the output matrix of the intermediate block 2 is the same as the order of the output matrix of the intermediate block 1. The intermediate block 2 is the last stage module of the branch formed by the encoding block 5 through the intermediate block 2 on the right, so the output of the intermediate block 2 needs to be added to the output of the intermediate block 1, and the addition result is the input of the decoding block 4. The input of each of the decoding blocks 3-1 may be the output of its previous stage module, or the result of adding the output of the previous stage module to the output of the second downsampling module whose output matrix has the same order as that of the previous stage module; in fig. 5, the input of each of the decoding blocks 3-1 is such an addition result, and the result of adding the output of the encoding block 5 to the output of the decoding block 1 is the second noise map.
It should be noted that: the number of intermediate blocks 1, the internal network configuration, and the number of intermediate blocks 2 included in the target module shown in fig. 5 are only for illustrative purposes, and are not limited thereto.
Fig. 6 shows a schematic diagram of another first model to be trained based on the structure of the target module shown in fig. 5.
As shown in fig. 6, the first image with M images, three channels and size information (length and width) of 512×512, i.e., (M, 3, 512, 512), is taken as the input of the first encoder, and the second image with M images, three channels and size information of 512×512, i.e., (M, 3, 512, 512), is taken as the input of the second encoder. After the first encoder encodes the first image, the number of images is unchanged, the number of channels becomes 4, and the obtained feature map of the first image is represented by 64×64 matrices, i.e., (M, 4, 64, 64); after the second encoder encodes the second image, the number of images is unchanged, the number of channels becomes 4, and the output matrices become 64×64, i.e., (M, 4, 64, 64). The output of the first encoder is fused with the first noise map to obtain the first result, the first result is fused with the output of the second encoder to obtain the second result, the first result and the second result are input to the target module, and the second noise map (M, 4, 64, 64) is obtained through the action of the target module; the action of the target module has been described above and is not repeated here.
Taking the structure of the target module shown in fig. 5 as an example, with a convolution layer and a first zero convolution, a second zero convolution, a third zero convolution, a fourth zero convolution, a fifth zero convolution and a sixth zero convolution newly added, the structure of still another first model to be trained shown in fig. 7 is described below.
Compared with the structure of the target module shown in fig. 5, the structure shown in fig. 7 is newly added with the second zero convolution to the sixth zero convolution, and a convolution layer and the first zero convolution are added in the first model to be trained; the first zero convolution to the sixth zero convolution can each be realized by a convolution layer with a 1×1 convolution kernel. The convolution layer is the previous stage module of the encoding block 1 and is used for carrying out a convolution operation on the first result and extracting features, and the output of the convolution layer can be represented by (M, 320, 64, 64). The first zero convolution is the next stage module of the second encoder and is used for carrying out a convolution operation on the output of the second encoder and extracting features; the output of the first zero convolution can be represented by (M, 320, 64, 64), and the number and the order of the matrices used for representing the output of the first zero convolution are the same as those of the matrices used for representing the output of the convolution layer. The second zero convolution is connected with the intermediate block 2 and is used for fusing the output of the intermediate block 2, after the convolution operation of the second zero convolution, with the output of the intermediate block 1. The third zero convolution is connected with the encoding block 8 and is used for fusing the output of the encoding block 8, after the convolution operation of the third zero convolution, with the output of the decoding block 4. The fourth zero convolution is connected with the encoding block 7 and is used for fusing the output of the encoding block 7, after the convolution operation of the fourth zero convolution, with the output of the decoding block 3. The fifth zero convolution is connected with the encoding block 6 and is used for fusing the output of the encoding block 6, after the convolution operation of the fifth zero convolution, with the output of the decoding block 2. The sixth zero convolution is connected with the encoding block 5 and is used for fusing the output of the encoding block 5, after the convolution operation of the sixth zero convolution, with the output of the decoding block 1.
The initial weight values in the first zero convolution to the sixth zero convolution are 0, that is, the initial output of the second encoder after the first zero convolution is 0, and the initial outputs of the encoding block 5 through the intermediate block 2 after the second zero convolution to the sixth zero convolution are also 0. In that case, the result of adding the output of the intermediate block 2 after the second zero convolution to the output of the intermediate block 1 is simply the output of the intermediate block 1, the result of adding the output of the encoding block 8 after the third zero convolution to the output of the decoding block 4 is simply the output of the decoding block 4, and likewise the result of adding the output of the encoding block 5 after the sixth zero convolution to the output of the decoding block 1 is simply the output of the decoding block 1. The performance of the structure from the encoding block 1 through the decoding blocks 4, 3, 2 and 1 can therefore be preserved at the start of training. The weight values in the first zero convolution to the sixth zero convolution then start to change during the training process of the target module; through the action of the first zero convolution to the sixth zero convolution on the model, the model can be more stable and the training process can be faster.
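A minimal sketch of such a zero convolution (a 1×1 convolution whose weights and bias are initialized to zero, so that it contributes nothing to a skip connection before training); the function name is illustrative:

import torch
import torch.nn as nn

def zero_convolution(channels_in, channels_out):
    # 1x1 convolution initialized to zero: its output is zero at the start of training,
    # so adding it to a skip connection leaves the original branch unchanged.
    conv = nn.Conv2d(channels_in, channels_out, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

zc = zero_convolution(320, 320)
x = torch.randn(1, 320, 8, 8)
print(torch.all(zc(x) == 0).item())   # True before any weight update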
Illustratively, the second zero convolution, the third zero convolution, the fourth zero convolution, the fifth zero convolution and the sixth zero convolution shown in fig. 7 are set based on the structure of the encoding block 5 through the intermediate block 2 in fig. 7 (which includes five blocks); in the case that the number of included blocks changes, the number of zero convolutions also changes, which is not particularly limited herein.
S105, determining target parameters of the first model to be trained according to the second noise diagram and the first noise diagram.
The target parameters of the first model to be trained are the parameters used when the first model to be trained converges, and may be specifically determined by the structure of the first model to be trained, for example, the weights and biases of the convolution layers and the means and variances of the normalization layers, which are not specifically limited herein.
After the second noise map is obtained, since the second noise map is a prediction of the noise contained in the first result, whether the first model to be trained has converged can be determined according to the second noise map and the first noise map by calculating the similarity between them, i.e., how close the second noise map is to the first noise map, so that the target parameters used by the first model to be trained at convergence can be determined.
In this embodiment, in the process of training the first model to be trained, the first result obtained after adding noise to the feature map of the first image, and the second result obtained after fusing the first result with the first feature map of the second image (which serves as the control condition of the first model to be trained), are input to the target module to predict the noise in the first result.
Therefore, in the application stage of the first target model obtained after the convergence of the first model to be trained, when a low-resolution image input to the first target model is used as the control condition of the first target model, the first target image obtained through the first target model is consistent with the image details of the input low-resolution image. The first target image is obtained by decoding, with a decoder matched with the first encoder, the result of performing repeated iterative noise reduction, using the noise output by the first target model, on the noise input to the first target model. The noise output by the first target model is obtained by the first target model based on the noise input each time and the low-resolution image, and the noise input to the first target model other than the initially input noise is obtained after each iterative noise reduction.
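A highly simplified sketch of this application-stage iterative noise reduction, assuming a plain step-wise denoising update; the actual sampler, schedule, model signature and decoder are not specified by this embodiment, and all names here are illustrative:

import torch

def super_resolve(target_model, decoder, second_encoder, low_res_image, steps=50):
    # Control condition: feature map of the low-resolution image (via the second encoder).
    cond = second_encoder(low_res_image)
    latent = torch.randn_like(cond)                   # noise initially input to the model
    for _ in range(steps):
        predicted_noise = target_model(latent, cond)  # noise output based on current input and the low-res image
        latent = latent - predicted_noise / steps     # assumed, very rough noise-reduction update
    return decoder(latent)                            # decoder matched with the first encoder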
In addition, the feature map of the first image obtained by the first encoder and the first feature map of the second image obtained by the second encoder correspond to the same feature space, so the first result obtained after adding noise to the feature map of the first image and the first feature map of the second image still correspond to the same feature space. In this case, in the fusion of the first feature map of the second image and the first result, since the feature information represented by the plurality of matrices for characterizing the first feature map and the feature information represented by the plurality of matrices for characterizing the first result correspond one to one, adding these matrices in one-to-one correspondence ensures that each pair of added matrices characterizes the same feature information. Therefore, in the application stage of the first target model, a large difference in image color between the first target image and the low-resolution image can be avoided while the resolution is improved relative to the low-resolution image.
In some embodiments, optionally, the first encoder and the second encoder in the first model to be trained may be pre-trained structures, and parameters in the first encoder and the second encoder are kept unchanged, and accordingly, only target parameters corresponding to the target module need to be determined in the training process of the first model to be trained, and by the method, the training efficiency of the model can be improved.
In some embodiments, optionally, the first model to be trained further includes an upsampling module connected in series with the second encoder, and the upsampling module is located before the second encoder, the upsampling module being configured to upsample the second image to increase size information of the second image.
The up-sampling rate may be preset, and the up-sampling rate is a positive number greater than 1, for example, 1.5 or 2, etc., and may be optionally determined, which is not limited herein.
Specifically, in the image super-resolution task the size of the image (i.e., the length and width of the image) becomes larger. Therefore, in order for the model obtained after training based on the method for training the first model to be trained in this embodiment to be applicable to the image super-resolution task, it is necessary to ensure that the size information of the image input to the second encoder is larger than the size information of the first size. Consequently, after the second image is obtained, the second image needs to be upsampled by the upsampling module, the upsampled image is input to the second encoder, and the upsampled image is encoded from the image space to the feature space by the second encoder, so as to obtain the feature map of the upsampled image.
In this embodiment, through the above process, the model obtained after training based on the method for training the first model to be trained in this embodiment can be applied to the image super-resolution task.
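A minimal sketch of the upsampling module, assuming bilinear interpolation and an upsampling rate of 2; both choices are illustrative:

import torch
import torch.nn.functional as F

second_image = torch.randn(1, 3, 256, 256)     # assumed low-resolution second image
upsampled = F.interpolate(second_image, scale_factor=2, mode="bilinear", align_corners=False)
print(upsampled.shape)                          # torch.Size([1, 3, 512, 512]), larger size information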
In some embodiments, optionally, S105 determines a target parameter of the first model to be trained according to the second noise figure and the first noise figure, including:
determining a first loss value of the second noise diagram and the first noise diagram by adopting a first preset loss function;
and determining target parameters of the first model to be trained according to the magnitude relation between the first loss value and the first preset threshold value.
The first preset loss function may be a cosine loss function, a connectionist temporal classification (CTC) loss function, a multi-class cross entropy loss function, a mean square error loss function, etc.; the loss function may be determined according to actual use requirements and is not limited herein. The first preset threshold may be a preset value, or may be any value as appropriate, which is not limited herein.
Specifically, after the output of the first model to be trained, that is, the second noise diagram is obtained, according to the second noise diagram and the first noise diagram, a first loss value corresponding to the second noise diagram and the first noise diagram can be calculated through a first preset loss function. Based on the magnitude relation between the first loss value and the first preset threshold value, whether the parameters of the first model to be trained need to be adjusted or not can be determined, and therefore target parameters of the first model to be trained are obtained.
In this embodiment, the first preset loss function is a standard for determining whether the first model to be trained is qualified, and the similarity between the second noise map and the first noise map can be calculated through the first preset loss function to verify the recognition accuracy of the first model to be trained, so that the model obtained by training can be effectively ensured to have a higher-accuracy output result.
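A minimal sketch of this convergence check, assuming the mean square error loss mentioned above; the threshold value is an illustrative assumption:

import torch
import torch.nn.functional as F

second_noise_map = torch.randn(2, 4, 64, 64)   # predicted noise output by the first model to be trained
first_noise_map = torch.randn(2, 4, 64, 64)    # the noise actually added in S102

first_loss_value = F.mse_loss(second_noise_map, first_noise_map)
first_preset_threshold = 0.01                   # assumed value
converged = first_loss_value.item() < first_preset_threshold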
Fig. 8 shows a flow diagram of another method of training a model.
Referring to fig. 8, the method of training a model includes:
s801, encoding the first image from the image space to the feature space by the first encoder, obtaining a feature map of the first image, and encoding the second image from the image space to the feature space by the second encoder, obtaining a first feature map of the second image.
The specific implementation manner of this step is referred to the description in S101 above, and in order to avoid repetition, the description is omitted here.
S802, fusing the feature map of the first image and the first noise map to obtain a fused first result.
The specific implementation manner of this step is referred to the description in S102 above, and in order to avoid repetition, the description is omitted here.
S803, fusing the fused first result and the first feature map of the second image to obtain a fused second result.
The specific implementation manner of this step is referred to the description in S103 above, and in order to avoid repetition, the description is omitted here.
S804, inputting the fused first result and the fused second result to the target module to obtain a second noise diagram output by the first model to be trained.
The specific implementation manner of this step is referred to the description in S104 above, and in order to avoid repetition, the description is omitted here.
S805, determining a first loss value of the second noise map and the first noise map by adopting a first preset loss function.
S806, determining whether the first loss value is smaller than a first preset threshold.
If yes, execute S807; if not, S808 is performed.
The magnitude relation between the first loss value and the first preset threshold is compared to determine whether the first loss value is less than the first preset threshold.
S807, taking the parameters used in the training process of the first model to be trained when the first loss value is smaller than the first preset threshold as the target parameters.

If the first loss value is smaller than the first preset threshold, the first model to be trained has converged, that is, training of the first model to be trained is completed; at this time, the parameters used in the training process of the first model to be trained when the first loss value is smaller than the first preset threshold are taken as the target parameters, so that the target parameters of the first model to be trained are obtained.
S808, adjusting parameters used in the training process of the first model to be trained.
If the first loss value is greater than or equal to the first preset threshold, it indicates that the first model to be trained has not converged and needs to be trained further. At this time, the parameters used in the training process of the first model to be trained are adjusted, S801 is executed again based on the adjusted parameters, and the subsequent steps are executed in sequence until the first loss value is smaller than the first preset threshold; the parameters used in the training process of the first model to be trained when the first loss value is smaller than the first preset threshold are then taken as the target parameters, and these parameters are obtained after adjusting the parameters used in the previous round of training of the first model to be trained.
In this embodiment, when the first loss value is greater than or equal to the first preset threshold, the parameters used in the previous training round of the first model to be trained are adjusted, the adjusted parameters are used as the parameters for the current training round, and the first model to be trained continues to be trained until the first loss value is smaller than the first preset threshold, so as to obtain the target parameters; when the first loss value is smaller than the first preset threshold, the parameters used in the corresponding training round are directly taken as the target parameters. Accurate model parameters can be obtained through this process, which is beneficial to improving the performance of the model.
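A minimal sketch of this training loop; the optimizer, learning rate, stopping logic and the call signature of first_model (which stands for the first model to be trained and returns the second noise map) are all illustrative assumptions:

import torch
import torch.nn.functional as F

def train(first_model, data_loader, first_preset_threshold=0.01, lr=1e-4, max_steps=10000):
    optimizer = torch.optim.Adam(first_model.parameters(), lr=lr)
    for step, (first_image, second_image) in enumerate(data_loader):
        first_noise_map = torch.randn(first_image.shape[0], 4, 64, 64)       # assumed latent shape
        second_noise_map = first_model(first_image, second_image, first_noise_map)
        first_loss_value = F.mse_loss(second_noise_map, first_noise_map)      # S805
        if first_loss_value.item() < first_preset_threshold or step >= max_steps:
            # S807: keep the parameters used when the loss falls below the threshold.
            return {k: v.detach().clone() for k, v in first_model.state_dict().items()}
        optimizer.zero_grad()
        first_loss_value.backward()        # S808: adjust the parameters and continue training
        optimizer.step()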
FIG. 9 is a flow chart of a method for training a second model to be trained according to an embodiment of the present application. The second model to be trained includes: a first encoder, a third encoder and a target module, where the first encoder in the second model to be trained is the same as the first encoder in the first model to be trained, and the target module in the second model to be trained is the same as the target module in the first model to be trained.
The training process of the second model to be trained will be described below taking the structure of the second model to be trained shown in fig. 10 as an example. Referring to fig. 9, a method of training a model includes:
s901, encoding a first image from an image space to a feature space by a first encoder to obtain a feature map of the first image, and encoding a second image from the image space to the feature space by a third encoder to obtain a second feature map of the second image.
The third encoder adopts a different structure from the first encoder, and the third encoder also has the function of encoding the second image from the image space to the feature space; the feature map of the first image obtained by the first encoder and the second feature map of the second image obtained by the third encoder correspond to different feature spaces. The third encoder may be implemented by stacking the first base module, may be implemented using a convolutional neural network structure, or may be implemented using other structures, which are not limited herein.
Since the whole training process of the second model to be trained is performed in the feature space domain, the input images need to be converted from the image domain (i.e., the image space) to the feature space domain, namely: the first image is input to the first encoder, and the first encoder converts the first image from the image space to the feature space to obtain the feature map of the first image; the second image is input to the third encoder, and the third encoder converts the second image from the image space to the feature space to obtain the second feature map of the second image. Since the second encoder of the first model to be trained is replaced with the third encoder in the second model to be trained, the feature map obtained by the third encoder in S901 is different from the feature map obtained by the second encoder in S101; for distinction, the feature map obtained by encoding the second image from the image space to the feature space by the third encoder in S901 is called the second feature map of the second image. Moreover, since the third encoder and the first encoder adopt different structures, the feature map of the first image and the second feature map of the second image correspond to different feature spaces.
It will be appreciated that the second feature map of the second image obtained by the third encoder may also be characterized by a plurality of n-order matrices, where the number of matrices refers to the number of channels: there are as many n-order matrices as there are channels, one n-order matrix for each channel. The number of matrices used for representing the second feature map of the second image obtained by the third encoder and the order n of the matrices are determined by the model parameters of the third encoder, n being an integer greater than 1, and the feature points contained in each n-order matrix can represent features of the second image such as color, shape, detail and texture. The number and the order of the matrices obtained by the third encoder are the same as those of the matrices obtained by the first encoder. Accordingly, the statement that the feature map of the first image obtained by the first encoder and the second feature map of the second image obtained by the third encoder correspond to different feature spaces can be understood as: in the matrices obtained by the third encoder and the matrices obtained by the first encoder, the information represented by the matrices corresponding to the same channel is different.
In some embodiments, optionally, before the second image is encoded from the image space to the feature space by the third encoder in S901, the method further includes:
Performing degradation treatment on the first image to obtain a degraded first image, and upsampling the degraded first image to obtain a second image;
or,
and converting the first image into an unprocessed RAW file, and performing channel separation and interpolation processing on the RAW file to obtain a second image.
In this embodiment, the second image is obtained through the above manner, which is simple and efficient, so that the second image is only lower in resolution than the first image, and is beneficial to the subsequent training process of the second model to be trained.
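A minimal sketch of the first option above (degrade the first image and then upsample it back), assuming the degradation is plain bicubic downsampling; the scale factor and interpolation mode are illustrative:

import torch
import torch.nn.functional as F

first_image = torch.rand(1, 3, 512, 512)      # high-resolution first image
# Degradation: downsample to a lower resolution (illustrative degradation).
degraded = F.interpolate(first_image, scale_factor=0.25, mode="bicubic", align_corners=False)
# Upsample the degraded image, giving a second image whose resolution (detail) is lower than the first image.
second_image = F.interpolate(degraded, size=first_image.shape[-2:], mode="bicubic", align_corners=False)
print(second_image.shape)                      # torch.Size([1, 3, 512, 512])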
S902, fusing the feature map of the first image and the first noise map to obtain a fused first result.
The specific implementation manner of this step is referred to the description in S102 above, and in order to avoid repetition, the description is omitted here.
S903, fusing the fused first result and the second feature map of the second image to obtain a fused third result.
Specifically, since the second feature map of the second image is obtained in S901, the number and the order of the matrices used for representing the first result are the same as the number and the order of the matrices used for representing the second feature map of the second image, and the fused first result and the second feature map of the second image are fused, specifically, the matrix used for representing the first result and the matrix used for representing the second feature map of the second image are correspondingly added, so that a fused third result can be obtained, and the number and the order of the matrices used for representing the third result are the same as the number and the order of the matrices used for representing the first result.
The second encoder of the first model to be trained is replaced with the third encoder in the second model to be trained, so that the second feature map of the second image obtained by the third encoder in S901 is different from the first feature map of the second image obtained by the second encoder in S101, and therefore, the fused first result and the first feature map of the second image are fused in S103, and the obtained second result is different from the result obtained by fusing the fused first result and the second feature map of the second image in S903, and here, for distinguishing, the fused result in S903 is referred to as a third result.
S904, inputting the fused first result and the fused third result to a target module to obtain a third noise diagram output by the second model to be trained.
Specifically, on the basis of the third result obtained in S903, the fused first result and the fused third result are input into the target module, and the first result and the third result are acted by the target module, so that noise contained in the first result can be predicted, a third noise diagram output by the second model to be trained is obtained, and the number and the order of the matrixes for representing the third noise diagram are the same as those for representing the first result.
Since the target module in the second model to be trained is the same as the target module in the first model to be trained, in one possible implementation manner, the target module includes at least one first downsampling module connected in series, at least one first upsampling module connected in series, and a plurality of second downsampling modules connected in series, where the at least one first downsampling module is connected in series with the at least one first upsampling module, the maximum order and the minimum order of the output matrix corresponding to each of the at least one first downsampling module and the at least one first upsampling module are the same, and the order of the output matrix corresponding to the plurality of second downsampling modules is a subset of the order of the output matrix corresponding to the at least one first upsampling module, and the implementation manner of S904 is described below:
the method comprises the steps that output of a previous stage module is subjected to downsampling in sequence through at least one first downsampling module to obtain a first downsampling result, wherein the first downsampling result is output of a first downsampling module with the smallest order of an output matrix in the at least one first downsampling module, and input of a first downsampling module with the largest order of the output matrix in the at least one first downsampling module is a fused first result;
Sequentially downsampling the output of the upper stage module through a plurality of second downsampling modules to obtain a plurality of third downsampling results, wherein the input of the second downsampling module with the largest order of the output matrix in the second downsampling modules is the fused third result;
sequentially upsampling respective inputs through at least one first upsampling module to obtain at least one second upsampling result, wherein the output of a first upsampling module with the largest order of an output matrix in the at least one first upsampling module is a third noise figure, or the result of adding the output of a second downsampling module with the same maximum order of the output matrix in the at least one first upsampling module in a plurality of second downsampling modules to the output of a first upsampling module with the largest order of the output matrix in the at least one first upsampling module is a third noise figure;
the input of the first up-sampling module with the minimum order of the output matrix in at least one first up-sampling module is a first down-sampling result; the input of the target up-sampling module in the at least one first up-sampling module is the result of adding the output of the upper stage module of the target up-sampling module and the output of the second down-sampling module with the smallest order of the output matrix in the plurality of second down-sampling modules, and the target up-sampling module is the next stage module of the first up-sampling module with the same order of the output matrix in the at least one first up-sampling module and the smallest order of the output matrix in the plurality of second down-sampling modules; the input of the other up-sampling modules in the at least one first up-sampling module is the output of the upper-stage module, or is the result of adding the output of the upper-stage module and the output of the second down-sampling module with the same order as the output matrix of the upper-stage module in the plurality of second down-sampling modules.
Specifically, when predicting noise, the target module is used as the noise prediction model. The at least one first downsampling module connected in series in the target module downsamples the first result obtained by fusing the feature map of the first image and the first noise map, namely: feature extraction is carried out on the output matrix corresponding to the first result, and the order of the output matrix is continuously reduced until the output matrix with the minimum order (the first downsampling result) is obtained. The plurality of second downsampling modules connected in series in the target module downsample the third result obtained by fusing the first result and the second feature map of the second image, namely: feature extraction is carried out on the output matrix corresponding to the third result, and the order of the output matrix is continuously reduced, so that the third downsampling result corresponding to each second downsampling module is obtained. For the at least one first upsampling module in the target module, the first downsampling result is taken as the input of the first upsampling module with the smallest order of the output matrix; the result of adding the output of the previous stage module of the target upsampling module to the output of the second downsampling module with the smallest order of the output matrix among the plurality of second downsampling modules is taken as the input of the target upsampling module; and the input of the other upsampling modules is the output of the respective previous stage module, or the result of adding the output of the previous stage module to the output of the second downsampling module whose output matrix has the same order as that of the previous stage module. The at least one first upsampling module sequentially upsamples its respective input to obtain at least one second upsampling result, where the output of the first upsampling module with the largest order of the output matrix is the third noise map output by the second model to be trained, or the result of adding the output of the second downsampling module whose output matrix has the same order as that largest order to the output of the first upsampling module with the largest order of the output matrix is the third noise map output by the second model to be trained.
In this embodiment, the second downsampling modules connected in series are used to downsample the third result obtained by fusing the first result and the second feature map of the second image, which is favorable for adding the third downsampling result to the output of the corresponding module in the at least one first upsampling module, so that the at least one module in the plurality of second downsampling modules and the at least one first upsampling module can be connected, the information of the second image is transferred to the at least one first upsampling module, the second model to be trained is better trained, and the target module has better generating capability and can ensure the accuracy of the third noise map.
Since in S904, only the second result in S104 is replaced by the third result, the implementation manner of obtaining the noise map by the target module is not changed, and other implementation manners of the target module can refer to the description in S104, so that repetition is avoided and no further description is provided here.
Taking the structure of the target module shown in fig. 5 as an example, a new convolution layer and a first zero convolution are taken as an example, and the structure of another second model to be trained shown in fig. 11 is described below.
Since the model structure and model parameters of the third encoder are different from those of the first encoder, the number of output matrices of the third encoder may be the same as or different from that of the first result. In the case where the number of output matrices of the third encoder (for example, 256) is different from the number of output matrices of the first result (for example, 4), the first result is denoted by (M, 4, 64, 64) and the output of the third encoder is denoted by (M, 256, 64, 64), where M denotes the number of input images and its value remains unchanged during the training of the second model to be trained, 4 denotes the number of channels, and the two 64s denote the number of rows and the number of columns of the matrices corresponding to the first result and the output of the third encoder; in this embodiment the number of rows and the number of columns of the matrices are the same. A convolution layer is added before the encoding block 1, and a first zero convolution is added before the encoding block 5. The convolution layer carries out a convolution operation on the first result and extracts features, changing the number of matrices of the first result from 4 to 320; the first zero convolution carries out a convolution operation on the output of the third encoder and extracts features, changing the number of matrices of the output of the third encoder from 256 to 320, so that the number and the order of the matrices in the output of the convolution layer and in the output of the first zero convolution are the same, namely the number of channels (the number of matrices) is 320 and the order of the matrices is 64. The output of the convolution layer and the output of the first zero convolution are fused to obtain a fusion result, the output of the convolution layer and the fusion result are input to the target module, and the third noise map is obtained through the action of the target module. Compared with fig. 6, the structure of fig. 11 other than the third encoder (which replaces the second encoder in fig. 6), the convolution layer and the first zero convolution is the same as that of fig. 6, and in order to avoid repetition, a description thereof is omitted.
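A minimal sketch of this channel adaptation (the 4-to-320 and 256-to-320 mappings); the kernel size of the newly added convolution layer and all names are assumptions, and the zero initialization follows the earlier zero-convolution sketch:

import torch
import torch.nn as nn

first_result = torch.randn(1, 4, 64, 64)
third_encoder_output = torch.randn(1, 256, 64, 64)

conv_layer = nn.Conv2d(4, 320, kernel_size=1)          # stands in for the newly added convolution layer
first_zero_conv = nn.Conv2d(256, 320, kernel_size=1)
nn.init.zeros_(first_zero_conv.weight)
nn.init.zeros_(first_zero_conv.bias)

# Both branches now have 320 channels of 64x64 matrices and can be fused by addition.
fusion_result = conv_layer(first_result) + first_zero_conv(third_encoder_output)
print(fusion_result.shape)                              # torch.Size([1, 320, 64, 64])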
Fig. 12 shows a schematic structural diagram of a further second model to be trained. The structure shown in fig. 12 is added with a second zero convolution, a third zero convolution, a fourth zero convolution, a fifth zero convolution, and a sixth zero convolution on the basis of the structure shown in fig. 11, wherein the effect of the second zero convolution-sixth zero convolution is described in the corresponding implementation manner in fig. 7, and is not repeated herein.
S905, determining target parameters of the second model to be trained according to the third noise diagram and the first noise diagram.
The target parameters of the second model to be trained are parameters used when the second model to be trained converges, and may be specifically determined by the structure of the second model to be trained, for example, a weight of a convolution layer, a bias, a mean of a normalized layer, a variance, and the like, which are not specifically limited herein.
After the third noise diagram is obtained, since the third noise diagram is a prediction result of the noise included in the first result, according to the third noise diagram and the first noise diagram, by calculating the similarity of the third noise diagram and the first noise diagram, that is, the proximity degree of the third noise diagram and the first noise diagram, whether the second model to be trained converges can be determined, so that the target parameter used by the second model to be trained in convergence can be determined.
In this embodiment, in the process of training the second model to be trained, the first result obtained after adding noise to the feature map of the first image and the third result obtained by fusing the first result with the second feature map of the second image (which serves as the control condition of the second model to be trained) are input to the target module to predict the noise in the first result.
Therefore, in the application stage of the second target model obtained after the second model to be trained converges, the low-resolution image input to the second target model is used as the control condition of the second target model, so the second target image obtained through the second target model stays consistent with the image details of the input low-resolution image. The second target image is obtained as follows: the low-resolution image is denoised and encoded into a feature map, the feature map is fused with noise, the fused result is denoised over multiple iterations using the noise output by the second target model, and the denoised result is decoded by a decoder from the feature space back to the image space. Because the noise removed by the second target model is the noise that was fused with the feature map, while the feature map corresponding to the denoised low-resolution image itself is retained, the resolution of the second target image is improved compared with the low-resolution image and, at the same time, large differences in image details between the second target image and the low-resolution image are avoided.
In some embodiments, optionally, the first encoder and the third encoder in the second model to be trained may be pre-trained structures, and parameters in the first encoder and the third encoder are kept unchanged, and accordingly, only the target parameters of the target module need to be determined in the training process of the second model to be trained, so that the training efficiency of the model can be improved.
In some embodiments, optionally, the second model to be trained further includes an upsampling module. The upsampling module is connected in series with the third encoder and is located before the third encoder, and the upsampling module is configured to upsample the second image so as to increase the size of the second image.
Specifically, in order that the model obtained by training the second model to be trained with the method of this embodiment can be applied to the image super-resolution task, it must be ensured that the size of the image input to the third encoder is larger than the first size. Therefore, after the second image is obtained, the second image is upsampled by the upsampling module and the upsampled image is input to the third encoder, so that the upsampled image is encoded from the image space to the feature space by the third encoder and a feature map of the upsampled image is obtained.
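A minimal sketch of such an upsampling step is given below, assuming bicubic interpolation and a 4x scale factor; both choices are illustrative, as the text does not fix the interpolation mode or scale.

```python
import torch
import torch.nn as nn

# Assumed 4x bicubic upsampling of the low-resolution second image before it
# is passed to the third encoder.
upsampling_module = nn.Upsample(scale_factor=4, mode="bicubic", align_corners=False)

second_image = torch.rand(1, 3, 64, 64)                     # low-resolution control image
upsampled_second_image = upsampling_module(second_image)    # (1, 3, 256, 256)
# second_feature_map = third_encoder(upsampled_second_image)  # image space -> feature space
```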
In this embodiment, through the above process, the model obtained after training can be applied to the image super-resolution task.
In one possible implementation, S905 determines, according to the third noise map and the first noise map, a target parameter of the second model to be trained, including:
determining a second loss value of the third noise diagram and the first noise diagram by adopting a second preset loss function;
and determining target parameters of the second model to be trained according to the magnitude relation between the second loss value and the second preset threshold value.
The second preset loss function may be a cosine loss function, a connectionist temporal classification (CTC) loss function, a multi-class cross-entropy loss function, a mean-square loss function, or the like; the loss function may be selected according to actual use requirements and is not limited herein. The second preset loss function may be the same as or different from the first preset loss function, which is not limited herein. The second preset threshold may be a preset value or any value chosen as appropriate, and is not limited herein. The second preset threshold and the first preset threshold may be the same or different, which is not limited herein.
Specifically, after the output of the second model to be trained, that is, the third noise diagram, is obtained, the corresponding second loss value can be calculated from the third noise diagram and the first noise diagram using the second preset loss function. Based on the magnitude relation between the second loss value and the second preset threshold, it can be determined whether the parameters of the second model to be trained need to be adjusted, and the target parameters of the second model to be trained are thus obtained.
In this embodiment, the second preset loss function serves as the criterion for judging whether the second model to be trained is qualified. The similarity between the third noise diagram and the first noise diagram can be calculated through the second preset loss function to verify the prediction accuracy of the second model to be trained, which effectively ensures that the trained model produces output results with higher accuracy.
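For illustration, a short sketch of how the second loss value and the convergence check could look when the mean-square loss is chosen as the second preset loss function; the threshold value and the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

# third_noise_map: noise predicted by the second model to be trained
# first_noise_map: noise that was actually added to the feature map of the first image
third_noise_map = torch.randn(2, 4, 64, 64)
first_noise_map = torch.randn(2, 4, 64, 64)

# Mean-square loss chosen from the options listed above as the second preset loss function.
second_loss_value = F.mse_loss(third_noise_map, first_noise_map)

second_preset_threshold = 0.01  # placeholder value; the text does not fix it
converged = second_loss_value.item() < second_preset_threshold
```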
Fig. 13 shows a flow diagram of yet another method of training a model.
Referring to fig. 13, the method of training a model includes:
S1301, encoding the first image from the image space to the feature space by the first encoder, obtaining a feature map of the first image, and encoding the second image from the image space to the feature space by the third encoder, obtaining a second feature map of the second image.
The specific implementation manner of this step is referred to the description in S901, and in order to avoid repetition, the description is omitted here.
S1302, fusing the feature map of the first image and the first noise map to obtain a fused first result.
The specific implementation of this step is referred to the description in S902 above, and in order to avoid repetition, the description is omitted here.
S1303, fusing the fused first result and the second feature map of the second image to obtain a fused third result.
The specific implementation of this step is described in S903, and in order to avoid repetition, the description is omitted here.
S1304, inputting the fused first result and the fused third result to a target module to obtain a third noise diagram output by the second model to be trained.
The specific implementation manner of this step is referred to the description in S904, and in order to avoid repetition, the description is omitted here.
S1305, determining a second loss value of the third noise diagram and the first noise diagram by adopting a second preset loss function.
S1306, determining whether the second loss value is smaller than a second preset threshold.
If yes, S1307 is executed; if not, S1308 is executed.
S1307, taking the parameters used in the training process of the second model to be trained corresponding to the condition that the second loss value is smaller than the second preset threshold value as target parameters.
If the second loss value is smaller than the second preset threshold, the second model to be trained has converged, that is, training of the second model to be trained is complete. At this point, the parameters used in the training process of the second model to be trained when the second loss value is smaller than the second preset threshold are taken as the target parameters, so that the target parameters of the second model to be trained are obtained.
S1308, parameters used in the training process of the second model to be trained are adjusted.
If the second loss value is greater than or equal to the second preset threshold, the second model to be trained has not converged and needs to be trained further. In this case, the parameters used in the training process of the second model to be trained are adjusted, execution returns to S1301 based on the adjusted parameters, and the subsequent steps are executed in sequence until the second loss value is smaller than the second preset threshold. The parameters used in the training process of the second model to be trained when the second loss value is smaller than the second preset threshold are then taken as the target parameters; these parameters are obtained by adjusting the parameters used in the previous round of training of the second model to be trained.
In this embodiment, when the second loss value is greater than or equal to the second preset threshold, the parameters used in the previous round of training of the second model to be trained are adjusted, the adjusted parameters are used as the parameters for the current round of training, and the second model to be trained continues to be trained until the second loss value is smaller than the second preset threshold, thereby obtaining the target parameters. When the second loss value is smaller than the second preset threshold, the parameters used in the corresponding round of training are taken directly as the target parameters. Through this process, more accurate model parameters are obtained, which helps to improve the performance of the model.
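The loop S1301-S1308 could be sketched as follows. Optimizer-based gradient descent is assumed as the way the parameters are "adjusted" (the text only says they are adjusted), and the data loader is assumed to yield the already-fused first and third results together with the first noise map; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def train_second_model(target_module, loader, threshold=0.01, lr=1e-4, max_steps=100_000):
    """Loop corresponding to S1301-S1308: predict the third noise map, compute the
    second loss value, and keep adjusting parameters until the loss falls below
    the second preset threshold."""
    optimizer = torch.optim.Adam(target_module.parameters(), lr=lr)
    for step, (first_result, third_result, first_noise_map) in enumerate(loader):
        third_noise_map = target_module(first_result, third_result)        # S1304
        second_loss_value = F.mse_loss(third_noise_map, first_noise_map)   # S1305
        if second_loss_value.item() < threshold:                           # S1306 -> S1307
            break  # current parameters are taken as the target parameters
        optimizer.zero_grad()
        second_loss_value.backward()   # S1308: adjust the parameters
        optimizer.step()
        if step >= max_steps:
            break
    return target_module.state_dict()
```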
The method for processing an image provided by the embodiments of the application is described in detail below. Fig. 14 is a flowchart of a method of processing an image that is implemented based on the first target model; the process of processing an image will be described below taking the schematic structural diagram of processing an image based on the first target model shown in fig. 15 as an example.
Referring to fig. 14, a method of processing an image includes:
S1401, a third image is acquired.
The third image is an image whose resolution is to be improved. It may be obtained from a gallery, obtained from a directory where received images are located, or obtained by performing channel separation and interpolation processing on a RAW file captured by a camera; it may also be obtained in other manners, which is not limited herein.
S1402, inputting the third image and the fourth noise diagram into the first target model to obtain an output result of the first target model.
The first target model is obtained by training the first model to be trained with the method of training a model described above. The fourth noise map may be generated by setting a mean and a variance for a noise generating function (for example, one provided by torch). The step size of the fourth noise map may be preset or determined according to circumstances, which is not limited herein. The number of iterative noise-reduction steps is the same as the value of the step size of the fourth noise map.
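As an illustration, the fourth noise map could be generated as below; the shape, mean, standard deviation, and step size are assumed placeholder values.

```python
import torch

# Assumed way of generating the fourth noise map: Gaussian noise with a chosen
# mean and standard deviation, shaped like the latent representation.
mean, std = 0.0, 1.0
fourth_noise_map = mean + std * torch.randn(1, 4, 64, 64)
noise_step_T = 50  # placeholder step size; equals the number of iterative denoising steps
```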
The third image and the fourth noise map are input into the first target model, and the fourth noise map input into the first target model is iteratively denoised multiple times using the noise output by the first target model, so as to obtain a noise-free output result of the first target model.
For example, assume that the noise step size is T (T is a positive integer with a large value) and that the fourth noise map, denoted x_T, contains a large amount of noise. The third image and x_T are input into the first target model, and the output obtained is the predicted one-step noise; one iterative noise-reduction step is then performed on the fourth noise map, that is, the predicted one-step noise is subtracted from x_T to obtain the subtracted result x_{T-1}. Next, x_{T-1} and the third image are input into the first target model, the output obtained is again the predicted one-step noise, and one more iterative noise-reduction step is performed on x_{T-1}, that is, the predicted one-step noise is subtracted from x_{T-1} to obtain x_{T-2}. Then x_{T-2} and the third image are input into the first target model, and the above process is repeated. After T iterations, the noise output by the first target model in the last iteration is subtracted from the input of the first target model in the last iteration, thereby obtaining the output result of the first target model.
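A sketch of this iterative noise-reduction loop in the simplified subtract-the-predicted-noise form used above (scheduler coefficients used by real diffusion samplers are omitted); the function and argument names are illustrative.

```python
import torch

@torch.no_grad()
def denoise_with_first_target_model(first_target_model, third_image, fourth_noise_map, T):
    """Iterative noise reduction as described above: at every step the model predicts
    one-step noise from the current input and the control image (the third image),
    and the predicted noise is subtracted."""
    x = fourth_noise_map
    for _ in range(T):
        predicted_noise = first_target_model(x, third_image)  # predicted one-step noise
        x = x - predicted_noise                                # one iterative noise-reduction step
    return x  # noise-free output result of the first target model (in the feature space)
```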
It should be noted that: the noise step length T is only used to describe the iterative noise reduction process, and is not used to limit the iterative noise reduction times.
S1403, the output result of the first target model is input to the decoder to obtain the first target image.
The decoder is matched with the first encoder in the first model to be trained, and the resolution of the first target image is higher than the resolution of the third image. The decoder may be implemented by stacking second base modules or by other structures, and may also be any other network or structure capable of upsampling, which is not limited herein.
Since the output result of the first target model is a representation in the feature space, it needs to be input to the decoder for decoding, so that the output result of the first target model is restored from the feature space to the image space and the first target image is obtained.
In this embodiment, the third image (serving as the control condition of the first target model) and the fourth noise map are input into the first target model, the fourth noise map input into the first target model is iteratively denoised multiple times using the noise output by the first target model to obtain a noise-free output result of the first target model, and the decoder decodes this output result from the feature space to the image space to obtain a first target image whose resolution is higher than that of the third image.
In some embodiments, in order that the first target model can be used for the image super-resolution task, after the third image is acquired, the third image needs to be upsampled to increase its size. The result obtained by upsampling the third image and the fourth noise map are then input to the first target model to obtain an output result, and the output result is input to the decoder to obtain the corresponding target image.
Fig. 16 is a flowchart of another method for processing an image that is implemented based on the second target model; the process of processing an image will be described below taking the schematic diagram of processing an image based on the second target model shown in fig. 17 as an example.
Referring to fig. 16, a method of processing an image includes:
S1601, a fourth image is acquired.
The fourth image is an image whose resolution is to be improved. It may be obtained from a gallery, obtained from a directory where received images are located, or obtained by performing channel separation and interpolation processing on a RAW file captured by a camera; it may also be obtained in other manners, which is not limited herein. The fourth image may be the same as or different from the third image, which is not limited herein.
S1602, denoising the fourth image to obtain a fifth image, and encoding the fifth image from the image space to the feature space by a fourth encoder to obtain a feature map of the fifth image.
The fourth encoder and the first encoder may be the same encoder (with the same model structure and model parameters) or different encoders. When they are different encoders, it must be ensured that the feature map of the first image obtained by the first encoder and the feature map of the fifth image obtained by the fourth encoder correspond to the same feature space, that is, the two feature maps characterize the same kind of information.
Because the fourth image has low resolution, in order to avoid interference caused by noise existing in the fourth image, denoising processing needs to be performed on the fourth image to obtain a fifth image, where denoising processing may be implemented by a Super Resolution (SR) model or a Noise Reduction (NR) model, or may be implemented by other models, and is not limited herein. The fifth image is input to a fourth encoder, and the fourth encoder can encode the fifth image from the image space to the feature space to obtain a feature map of the fifth image.
S1603, fusing the feature map of the fifth image and the fifth noise map, and inputting the fourth result and the fourth image obtained after fusion into the second target model to obtain an output result of the second target model.
The fifth noise map may be generated in the same manner as the first noise map or in another manner, which is not limited herein. The noise step size of the fifth noise map may be adjusted according to the resolution of the fourth image; the lower the resolution of the fourth image, the larger the noise step size may be. The second target model is obtained by training the second model to be trained with the method of training a model described above. The number of iterative noise-reduction steps is the same as the value of the step size of the fifth noise map.
The feature map of the fifth image and the fifth noise map are fused to obtain a fourth result, the fourth result and the fourth image are input into the second target model, and the fourth result obtained by fusing the feature map of the fifth image with the fifth noise map is iteratively denoised multiple times using the noise output by the second target model, so as to obtain a noise-free output result of the second target model.
For example, assume that the feature map of the fifth image is denoted by x and the fifth noise map by n_M, where M denotes the step size of the fifth noise map and equals the number of iterative noise-reduction steps; the fourth result is then expressed as x + n_M. The fourth result x + n_M and the fourth image are input into the second target model, and the output obtained is the predicted one-step noise; one iterative noise-reduction step is then performed on the fourth result, that is, the predicted one-step noise is subtracted from the fourth result. Since the fourth result is obtained by fusing the feature map of the fifth image with the fifth noise map, this subtraction means that the predicted one-step noise is subtracted from the fifth noise map while the feature map of the fifth image remains unchanged, giving the subtracted result x + n_{M-1}. Next, x + n_{M-1} and the fourth image are input into the second target model, the output obtained is again the predicted one-step noise, and one more iterative noise-reduction step is performed, that is, the predicted one-step noise is subtracted to obtain x + n_{M-2}. Then x + n_{M-2} and the fourth image are input into the second target model, and the above process is repeated. After M iterations, the noise output by the second target model in the last iteration is subtracted from the fusion result input into the second target model in the last iteration, thereby obtaining the output result of the second target model.
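The corresponding sketch for the second target model is given below: as described above, the predicted one-step noise is removed from the noise component only, while the feature map of the fifth image is kept unchanged. The names and the additive fusion are assumptions.

```python
import torch

@torch.no_grad()
def denoise_with_second_target_model(second_target_model, fourth_image,
                                     fifth_feature_map, fifth_noise_map, M):
    """Per the description above: the fused input is x + n, the predicted one-step
    noise is subtracted from the noise component n only, and the feature map x of
    the fifth image stays unchanged throughout the M iterations."""
    x = fifth_feature_map
    n = fifth_noise_map
    for _ in range(M):
        fused = x + n                                         # fusion by addition (assumed)
        predicted_noise = second_target_model(fused, fourth_image)
        n = n - predicted_noise                               # only the noise part is reduced
    return x + n  # output result of the second target model (in the feature space)
```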
It should be noted that: the noise step length M is only used for explaining the iterative noise reduction process, and is not used for limiting the iterative noise reduction times.
S1604, the output result of the second target model is input to the decoder to obtain a second target image.
The decoder is matched with a first encoder in a second model to be trained, the decoder is matched with a fourth encoder, and the resolution of the second target image is higher than that of the fourth image.
Since the output result of the second target model is a representation in the feature space, it needs to be input to the decoder for decoding, so that the output result of the second target model is restored from the feature space to the image space and the second target image is obtained.
In this embodiment, the fourth image is first denoised so that the noise carried by the fourth image does not interfere with subsequent image generation. The fifth image obtained by denoising is encoded from the image space to the feature space by the fourth encoder, the resulting feature map of the fifth image is fused with the fifth noise map to obtain a fourth result, and the fourth result and the fourth image (serving as the control condition of the second target model) are used as the input of the second target model. The fourth result obtained by fusing the fifth noise map with the feature map of the fifth image is iteratively denoised multiple times using the noise output by the second target model to obtain a noise-free output result of the second target model, and the decoder decodes this output result from the feature space to the image space to obtain a second target image whose resolution is higher than that of the fourth image. In this process, because the second target model is obtained by training the second model to be trained, in the application stage of the second target model the resolution of the second target image is improved relative to the low-resolution image (that is, the fourth image), and at the same time large differences in image details between the second target image and the fourth image are avoided.
In some embodiments, in order that the second target model can be used for the image super-resolution task, after the fourth image is acquired, the fourth image needs to be upsampled to increase its size. The upsampled result of the fourth image is then denoised to obtain a sixth image, the sixth image is encoded from the image space to the feature space by the fourth encoder to obtain a feature map of the sixth image, the feature map of the sixth image and the fifth noise map are fused, the fused result and the upsampled result of the fourth image are input to the second target model to obtain an output result, and the output result is input to the decoder to obtain the corresponding target image.
In some embodiments, the first encoder involved in the method for training a model and the decoder involved in the method for processing an image may be trained in advance, and a training process of the self-encoder will be described below by taking a schematic structure of the self-encoder as shown in fig. 18 as an example, where the structure of the self-encoder includes the encoder and the decoder. The self-encoder may be trained specifically by:
as shown in fig. 18, an input image (the input image may be acquired by the same method as the first image acquisition method, or may be acquired by other methods, not limited herein) is encoded from an image space to a feature space by an encoder, and an output of the encoder is decoded from the feature space to the image space by a decoder to obtain an output image; determining a third loss value of the input image and the output image by adopting a third preset loss function; and determining target parameters of the encoder and the decoder according to the magnitude relation between the third loss value and the reference threshold value.
The third preset loss function may be a cosine loss function, a connectionist temporal classification (CTC) loss function, a multi-class cross-entropy loss function, a mean-square loss function, or the like; the loss function may be selected according to actual use requirements and is not limited herein. The third preset loss function may be the same as or different from the first preset loss function, which is not limited herein. The third preset loss function and the second preset loss function may be the same or different, which is not limited herein. The reference threshold may be preset or determined according to circumstances, and is not limited herein.
After the self-encoder has been trained, its model parameters are kept unchanged. The first encoder and the second encoder in the method for training a model and the fourth encoder in the method for processing an image can directly adopt the structure and model parameters of the encoder in the self-encoder, and the decoder in the method for processing an image can directly adopt the structure and model parameters of the decoder in the self-encoder, thereby accelerating both the model training process and the image processing process.
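A sketch of the self-encoder pretraining described above, assuming mean-square error as the third preset loss function and an optimizer-based update rule; the threshold, learning rate, and step limit are placeholder values.

```python
import torch
import torch.nn.functional as F

def train_self_encoder(encoder, decoder, loader, reference_threshold=0.01,
                       lr=1e-4, max_steps=100_000):
    """Self-encoder pretraining as in fig. 18: encode the input image into the
    feature space, decode it back to the image space, and compare input and
    output with the third preset loss function (mean-square error assumed)."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for step, input_image in enumerate(loader):
        output_image = decoder(encoder(input_image))
        third_loss_value = F.mse_loss(output_image, input_image)
        if third_loss_value.item() < reference_threshold:
            break  # parameters are then frozen and reused by the encoders and decoder above
        optimizer.zero_grad()
        third_loss_value.backward()
        optimizer.step()
        if step >= max_steps:
            break
    return encoder.state_dict(), decoder.state_dict()
```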
It should be understood that the above description is intended to aid those skilled in the art in understanding the embodiments of the present application, and is not intended to limit the embodiments of the present application to the specific values or particular scenarios illustrated. It will be apparent to those skilled in the art from the foregoing description that various equivalent modifications or variations can be made, and such modifications or variations are intended to be within the scope of the embodiments of the present application.
The embodiment of the application also provides a device for training the model, which can be a chip, a component or a module, and can comprise a processor and a memory which are connected; the memory is configured to store computer-executable instructions, and when the apparatus is running, the processor may execute the computer-executable instructions stored in the memory, so that the chip performs the method for training the model in the above method embodiments.
The embodiment of the application also provides a device for processing the image, which can be a chip, a component or a module, and can comprise a processor and a memory which are connected; the memory is configured to store computer-executable instructions, and when the apparatus is running, the processor may execute the computer-executable instructions stored in the memory, so that the chip performs the method for processing an image in the above method embodiments.
The embodiment of the application also provides a computer readable storage medium storing computer instructions. When the computer instructions are run on an apparatus for training a model, the apparatus for training a model is caused to perform the method of training a model described above; when they are run on an apparatus for processing an image, the apparatus for processing an image is caused to perform the method of processing an image described above. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium, or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
The embodiments of the present application also provide a computer program product comprising computer instructions which, when run on a device for training a model, enable the device for training a model to perform the technical solution for training a model as shown above, or which, when run on a device for processing an image, enable the device for processing an image to perform the technical solution for processing an image as shown above.
The device for training a model, the device for processing an image, the electronic device, the computer readable storage medium, and the computer program product provided in the embodiments of the present application are all configured to execute the corresponding methods provided above; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods provided above, which are not repeated here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and the electronic device described above may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that reference to "a plurality" in the present specification and appended claims means two or more. In the description of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" herein merely describes an association between associated objects and covers any and all possible combinations of one or more of the associated listed items; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A method of training a model, characterized in that a first model to be trained comprises: a first encoder, a second encoder, and a target module, the method comprising:
encoding a first image from an image space to a feature space by the first encoder to obtain a feature map of the first image, and encoding a second image from the image space to the feature space by the second encoder to obtain the first feature map of the second image, wherein the difference between the second image and the first image is: the resolution of the second image is lower than that of the first image, and the feature map of the first image and the first feature map of the second image correspond to the same feature space;
fusing the feature map of the first image and the first noise map to obtain a fused first result;
fusing the fused first result and the first feature map of the second image to obtain a fused second result;
inputting the fused first result and the fused second result to the target module to obtain a second noise diagram output by the first model to be trained, wherein the target module is used for predicting noise in the fused first result;
and determining target parameters of the first model to be trained according to the second noise diagram and the first noise diagram.
2. The method of claim 1, wherein the target module comprises at least one first downsampling module in series, at least one first upsampling module in series, and a plurality of second downsampling modules in series, the at least one first downsampling module being in series with the at least one first upsampling module, the at least one first downsampling module, the at least one first upsampling module each corresponding to the same maximum order, the minimum order of the output matrix, the order of the output matrix corresponding to the plurality of second downsampling modules being a subset of the order of the output matrix corresponding to the at least one first upsampling module;
wherein the inputting of the fused first result and the fused second result to the target module to obtain the second noise diagram output by the first model to be trained comprises:
the output of the previous stage module is subjected to downsampling in sequence through the at least one first downsampling module to obtain a first downsampling result, wherein the first downsampling result is the output of a first downsampling module with the smallest order of an output matrix in the at least one first downsampling module, and the input of a first downsampling module with the largest order of the output matrix in the at least one first downsampling module is the fused first result;
sequentially downsampling the output of the previous stage module through the plurality of second downsampling modules to obtain a plurality of second downsampling results, wherein the input of a second downsampling module with the largest order of an output matrix in the plurality of second downsampling modules is the fused second result;
sequentially upsampling respective inputs through the at least one first upsampling module to obtain at least one first upsampling result, wherein the output of a first upsampling module with the largest order of an output matrix in the at least one first upsampling module is the second noise figure, or the result of adding the output of a second downsampling module with the same largest order of the output matrix in the at least one first upsampling module in the plurality of second downsampling modules to the output of a first upsampling module with the largest order of the output matrix in the at least one first upsampling module is the second noise figure;
The input of a first up-sampling module with the minimum order of an output matrix in the at least one first up-sampling module is the first down-sampling result; the input of the target up-sampling module in the at least one first up-sampling module is the result of adding the output of the upper stage module of the target up-sampling module and the output of the second down-sampling module with the smallest order of the output matrix in the plurality of second down-sampling modules, and the target up-sampling module is the lower stage module of the first up-sampling module with the same order of the output matrix in the at least one first up-sampling module and the smallest order of the output matrix in the plurality of second down-sampling modules; the input of other up-sampling modules in the at least one first up-sampling module is the output of the upper-stage module, or is the result of adding the output of the upper-stage module and the output of a second down-sampling module with the same order as the output matrix of the upper-stage module in the plurality of second down-sampling modules.
3. The method of claim 1, wherein determining the target parameters of the first model to be trained from the second noise figure and the first noise figure comprises:
determining a first loss value of the second noise map and the first noise map by adopting a first preset loss function;
and determining the target parameters of the first model to be trained according to the magnitude relation between the first loss value and the first preset threshold value.
4. A method according to claim 3, wherein determining the target parameters of the first model to be trained according to the magnitude relation between the first loss value and the first preset threshold value comprises:
returning to execute if the first loss value is greater than or equal to the first preset threshold value: and encoding a first image from an image space to a feature space through the first encoder to obtain a feature image of the first image, encoding a second image from the image space to the feature space through the second encoder to obtain a first feature image of the second image until the first loss value is smaller than the first preset threshold value, and taking a parameter used in the corresponding first model to be trained in the case that the first loss value is smaller than the first preset threshold value as the target parameter, wherein the parameter used in the corresponding first model to be trained in the case that the first loss value is smaller than the first preset threshold value is obtained after the parameter used in the last first model to be trained is adjusted.
5. The method according to any of claims 1-4, wherein prior to said encoding the second image from image space to feature space by said second encoder, obtaining a first feature map of said second image, said method further comprises:
performing degradation treatment on the first image to obtain a degraded first image, and performing up-sampling on the degraded first image to obtain the second image;
or,
and converting the first image into an unprocessed RAW file, and performing channel separation and interpolation processing on the RAW file to obtain the second image.
6. A method of processing an image, the method comprising:
acquiring a third image;
inputting the third image and the fourth noise figure into a first target model to obtain an output result of the first target model, wherein the first target model is obtained by the method for training a model according to any one of claims 1 to 5;
inputting the output result of the first object model to a decoder to obtain a first object image, wherein the decoder is matched with the first encoder according to any one of the claims 1 to 5, and the resolution of the first object image is higher than the resolution of the third image.
7. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 5 or the method of claim 6 when the computer program is executed.
8. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 5, or the method of claim 6.
CN202310854693.XA 2023-07-13 2023-07-13 Method for training model, method for processing image, electronic device and storage medium Active CN116580269B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310854693.XA CN116580269B (en) 2023-07-13 2023-07-13 Method for training model, method for processing image, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310854693.XA CN116580269B (en) 2023-07-13 2023-07-13 Method for training model, method for processing image, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN116580269A CN116580269A (en) 2023-08-11
CN116580269B true CN116580269B (en) 2023-09-19

Family

ID=87540020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310854693.XA Active CN116580269B (en) 2023-07-13 2023-07-13 Method for training model, method for processing image, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN116580269B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091166A (en) * 2020-03-25 2020-05-01 腾讯科技(深圳)有限公司 Image processing model training method, image processing device, and storage medium
CN113112536A (en) * 2021-03-19 2021-07-13 北京达佳互联信息技术有限公司 Image processing model training method, image processing method and device
CN113393410A (en) * 2021-07-26 2021-09-14 浙江大华技术股份有限公司 Image fusion method and device, electronic equipment and storage medium
CN113592965A (en) * 2021-07-28 2021-11-02 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113781346A (en) * 2021-09-13 2021-12-10 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114897756A (en) * 2022-05-31 2022-08-12 中加健康工程研究院(合肥)有限公司 Model training method, medical image fusion method, device, equipment and medium
WO2023010754A1 (en) * 2021-08-02 2023-02-09 中国科学院深圳先进技术研究院 Image processing method and apparatus, terminal device, and storage medium

Also Published As

Publication number Publication date
CN116580269A (en) 2023-08-11

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant