CN116543246A - Training method of image denoising model, image denoising method, device and equipment - Google Patents


Info

Publication number: CN116543246A
Application number: CN202210089889.XA
Authority: CN (China)
Prior art keywords: image, noisy, denoising, noise, prediction
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 刘帅伟 (Liu Shuaiwei), 黄飞 (Huang Fei)
Current and original assignee: Tencent Technology (Shenzhen) Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)

Events:
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN202210089889.XA
Publication of CN116543246A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/70: Denoising; Smoothing
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems


Abstract

The application discloses a training method of an image denoising model, an image denoising method, a device and equipment, and relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring a training sample of the image denoising model; extracting characteristic information of a noisy sample image through an encoder; processing the characteristic information of the noisy sample image through a first decoder to obtain a prediction noise image corresponding to the noisy sample image; generating a first prediction denoising image corresponding to the noisy sample image according to the prediction noise image and the noisy sample image; processing the characteristic information of the noisy sample image through a second decoder to obtain a second prediction denoising image corresponding to the noisy sample image; and determining a training loss of the image denoising model according to the first prediction denoising image, the second prediction denoising image and the noise-free sample image, and adjusting parameters of the image denoising model based on the training loss. The method and the device can be applied to scenarios such as shadow removal from images of documents, PPT slides and the like.

Description

Training method of image denoising model, image denoising method, device and equipment
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a training method of an image denoising model, an image denoising method, an image denoising device and image denoising equipment.
Background
When people shoot images, some shooting problems are unavoidable, so that noise such as shadows and blur exists in the captured images.
Taking image shadow removal as an example, the related art provides an image shadow removal method based on deep learning: a shadow image and a shadow mask image corresponding to the shadow image are input into a trained image shadow removal model, and the model outputs a shadow-removed image.
However, in practical applications, often only the shadow image can be obtained, while the shadow mask image corresponding to the shadow image cannot, so the above manner cannot achieve a good shadow removal effect.
Disclosure of Invention
The embodiment of the application provides a training method of an image denoising model, an image denoising method, an image denoising device and equipment. The technical scheme is as follows:
according to an aspect of an embodiment of the present application, there is provided a training method of an image denoising model, the image denoising model including an encoder, a first decoder, and a second decoder; the method comprises the following steps:
Acquiring a training sample of the image denoising model, wherein the training sample comprises a noise-free sample image, a pure noise image and a noisy sample image generated based on the noise-free sample image and the pure noise image;
extracting characteristic information of the noisy sample image by the encoder;
processing the characteristic information of the noisy sample image through the first decoder to obtain a prediction noise image corresponding to the noisy sample image; generating a first prediction denoising image corresponding to the noisy sample image according to the prediction noise image and the noisy sample image;
processing the characteristic information of the noisy sample image through the second decoder to obtain a second prediction denoising image corresponding to the noisy sample image;
and determining training loss of the image denoising model according to the first prediction denoising image, the second prediction denoising image and the noise-free sample image, and adjusting parameters of the image denoising model based on the training loss.
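The aspect above only states that the training loss is determined from the first prediction denoising image, the second prediction denoising image and the noise-free sample image. As a minimal sketch under stated assumptions (the L1 form and the equal weighting of the two terms are not specified by the text):

```python
import torch
import torch.nn.functional as F

def training_loss(first_denoised: torch.Tensor,
                  second_denoised: torch.Tensor,
                  clean: torch.Tensor) -> torch.Tensor:
    """Combine the reconstruction errors of both prediction denoising
    images against the noise-free sample image. The L1 form and the
    equal weighting are assumptions; the text only states that the
    loss is determined from these three images."""
    return F.l1_loss(first_denoised, clean) + F.l1_loss(second_denoised, clean)

# Perfect predictions give zero loss.
clean = torch.ones(1, 3, 4, 4)
loss = training_loss(clean.clone(), clean.clone(), clean)
```

Any reconstruction loss (L2, Charbonnier, perceptual) could be substituted; the key point is that both decoder heads are supervised against the same noise-free target.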
According to an aspect of an embodiment of the present application, there is provided an image denoising method, including:
acquiring a noisy image to be processed by an image denoising model, wherein the image denoising model comprises an encoder and a first decoder;
Extracting characteristic information of the noisy image through the encoder;
processing the characteristic information of the noisy image through the first decoder to obtain a prediction noise image corresponding to the noisy image;
and generating a denoising image corresponding to the noisy image according to the prediction noise image and the noisy image.
According to an aspect of an embodiment of the present application, there is provided a training apparatus of an image denoising model including an encoder, a first decoder, and a second decoder; the device comprises:
the sample acquisition module is used for acquiring training samples of the image denoising model, wherein the training samples comprise a noise-free sample image, a pure noise image and a noisy sample image generated based on the noise-free sample image and the pure noise image;
the information extraction module is used for extracting the characteristic information of the noisy sample image through the encoder;
the first processing module is used for processing the characteristic information of the noisy sample image through the first decoder to obtain a prediction noise image corresponding to the noisy sample image; generating a first prediction denoising image corresponding to the noisy sample image according to the prediction noise image and the noisy sample image;
The second processing module is used for processing the characteristic information of the noisy sample image through the second decoder to obtain a second prediction denoising image corresponding to the noisy sample image;
and the parameter adjustment module is used for determining the training loss of the image denoising model according to the first prediction denoising image, the second prediction denoising image and the noiseless sample image, and adjusting the parameters of the image denoising model based on the training loss.
According to an aspect of an embodiment of the present application, there is provided an image denoising apparatus, including:
the image acquisition module is used for acquiring a noisy image to be processed by an image denoising model, and the image denoising model comprises an encoder and a first decoder;
the information extraction module is used for extracting the characteristic information of the noisy image through the encoder;
the information processing module is used for processing the characteristic information of the noisy image through the first decoder to obtain a prediction noise image corresponding to the noisy image;
and the image generation module is used for generating a denoising image corresponding to the noisy image according to the prediction noise image and the noisy image.
According to an aspect of the embodiments of the present application, there is provided a computer device, where the computer device includes a processor and a memory, where at least one instruction, at least one program, a code set, or an instruction set is stored in the memory, where the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement a training method of the image denoising model or implement the image denoising method.
According to an aspect of the embodiments of the present application, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, a code set, or an instruction set, where the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the training method of the image denoising model or implement the image denoising method.
According to an aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the training method of the image denoising model or performs the image denoising method.
The technical scheme provided by the embodiment of the application at least comprises the following beneficial effects:
the embodiment of the application provides a training method of an image denoising model based on multi-task learning, which on one hand enables the image denoising model to learn the prediction noise image corresponding to the noisy sample image, and on the other hand enables the image denoising model to learn the prediction denoising image corresponding to the noisy sample image; the two subtasks promote each other in performance, which is beneficial to improving the denoising effect of the image denoising model obtained by the final training. Moreover, the input data of the image denoising model only needs to comprise the noisy sample image and does not need to comprise a pure noise image, so the limitation on the model input data in the related art can be overcome, and the universality and practicability of the scheme are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an implementation environment for an embodiment provided herein;
FIG. 2 is a flowchart of a training method for an image denoising model according to one embodiment of the present application;
FIG. 3 is a schematic diagram of an encoder provided by one embodiment of the present application;
FIG. 4 is a schematic diagram of a noise decoder provided in one embodiment of the present application;
FIG. 5 is a flowchart of a training method for an image denoising model according to another embodiment of the present application;
FIG. 6 is a flow chart of a classifier provided in one embodiment of the present application;
FIG. 7 is a flow chart of a discriminator provided in one embodiment of the present application;
FIG. 8 is a schematic diagram of the components of a discriminator provided in one embodiment of the present application;
FIG. 9 is a schematic diagram of a discriminator according to one embodiment of the present application;
FIG. 10 is a flow chart of an image denoising method provided by one embodiment of the present application;
FIG. 11 is a schematic diagram of an image denoising method according to one embodiment of the present application;
FIG. 12 is a block diagram of a training apparatus for an image denoising model provided in one embodiment of the present application;
FIG. 13 is a block diagram of a training apparatus for an image denoising model according to another embodiment of the present application;
FIG. 14 is a block diagram of an image denoising apparatus provided according to one embodiment of the present application;
FIG. 15 is a block diagram of an image denoising apparatus according to another embodiment of the present application;
fig. 16 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Artificial intelligence (Artificial Intelligence, AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations.
The embodiment of the application relates to the machine learning technology in artificial intelligence: an image denoising model is trained in a machine learning manner, and images are denoised by the image denoising model. The technical solution of the application is introduced below through several embodiments.
Referring to fig. 1, a schematic diagram of an implementation environment of an embodiment of the present application is shown. The scenario implementation environment may include a model training apparatus 10 and a model using apparatus 20.
Model training device 10 may be an electronic device such as a computer, server, intelligent robot, or some other electronic device with relatively high computing power. Model training apparatus 10 is used to train an image denoising model 30. In the embodiment of the present application, the image denoising model 30 is a neural network model for image denoising, and the model training device 10 may train the image denoising model 30 so that it has a better denoising performance.
The trained image denoising model 30 can be deployed in the model using device 20 to provide an image denoising result. The model using device 20 may be a terminal device such as a mobile phone, a computer, a smart tv, a multimedia playing device, a wearable device, a medical device, or a server, which is not limited in this application.
In some embodiments, as shown in fig. 1, the image denoising model 30 may include: an encoder 31, a first decoder 32 and a second decoder 33. The encoder 31 is used to extract characteristic information of the noisy sample image. The first decoder 32 is configured to process the feature information of the noisy sample image to obtain a prediction noise image corresponding to the noisy sample image, and then may generate a first prediction denoising image corresponding to the noisy sample image according to the prediction noise image and the noisy sample image. The second decoder 33 is configured to process the feature information of the noisy sample image to obtain a second predicted denoising image corresponding to the noisy sample image. Then, a training loss of the image denoising model 30 is determined according to the first prediction denoising image, the second prediction denoising image and the noise-free sample image, and parameters of the image denoising model 30 are adjusted based on the training loss.
The training method of the image denoising model provided by the embodiment of the application adopts the idea of multi-task learning. Multi-task learning refers to unifying a plurality of subtasks into one machine learning model for joint learning. During training, the model learns the plurality of subtasks simultaneously, and the subtasks share part of the model's information and promote one another. In the embodiment of the application, on one hand, the image denoising model learns the prediction noise image corresponding to the noisy sample image, and on the other hand, it learns the prediction denoising image corresponding to the noisy sample image (i.e., the second prediction denoising image described below); the two subtasks promote each other in performance, thereby improving the denoising effect of the finally trained image denoising model.
Optionally, as shown in fig. 1, the image denoising model 30 further includes a classifier 34. The classifier 34 is configured to process the characteristic information of the noisy sample image or the noise-free sample image to obtain a noise prediction result, which indicates whether the input image contains noise. On the basis of the two subtasks above, this adds a third subtask, so that three subtasks are carried out simultaneously during model training and promote one another in performance, further improving the denoising effect of the image denoising model.
Optionally, the embodiment of the application can also introduce the idea of adversarial learning into the training process of the image denoising model. Adversarial learning introduces a discriminator and a generator: the discriminator is used to distinguish whether an input sample is a sample generated by the generator or a label sample, while the generator tries to fool the discriminator into taking its generated samples for label samples. In the course of adversarial learning between the generator and the discriminator, the generator gradually evolves until the distribution of the generated data approaches the distribution of the label data. Optionally, as shown in fig. 1, when adversarial learning is introduced, the image denoising model 30 further includes a discriminator 35, where the discriminator 35 is configured to perform pixel-level discrimination on the first prediction denoising image and the noise-free sample image respectively, obtaining the pixel-level discrimination results corresponding to each. The corresponding generator can be seen as the combined structure of the encoder 31 and the first decoder 32. Using adversarial learning to perform pixel-level discrimination optimizes the details of the denoising result, which can further improve the denoising effect of the image denoising model and the detail quality of its output.
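The pixel-level discrimination described above can be sketched as a fully convolutional discriminator that outputs one real/fake score per pixel. The 1x1-convolution stack below (a PatchGAN-style simplification) and the channel widths are assumptions; the embodiment does not detail the discriminator internals at this point.

```python
import torch
import torch.nn as nn

class PixelDiscriminator(nn.Module):
    """A fully convolutional discriminator emitting one real/fake
    logit per pixel, matching the pixel-level discrimination described
    above. The 1x1-convolution design and channel widths are
    illustrative assumptions, not taken from the patent."""
    def __init__(self, in_ch: int = 3, base_ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base_ch, kernel_size=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base_ch, base_ch * 2, kernel_size=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base_ch * 2, 1, kernel_size=1),  # per-pixel logit
        )

    def forward(self, x):
        return self.net(x)

# One discrimination map per input image, same spatial size as the input.
scores = PixelDiscriminator()(torch.randn(2, 3, 8, 8))
```

Because every pixel gets its own score, the adversarial loss can penalize local artifacts rather than a single image-level decision, which is consistent with the stated goal of optimizing denoising detail.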
The image denoising model provided by the embodiment of the application can be suitable for removing noise information in images such as pictures or videos. Optionally, the noise information includes, but is not limited to, at least one of: shadows, rain, snow, fog, flaws, and the like.
In the following, the technical solutions of the present application will be described by several method embodiments.
Referring to fig. 2, a flowchart of a training method of an image denoising model according to an embodiment of the present application is shown. The subject of execution of the steps of the method may be the model training apparatus described above. The method may include the following steps (210-250):
at step 210, training samples of an image denoising model are obtained, the training samples including a noise-free sample image, a pure noise image, and a noisy sample image generated based on the noise-free sample image and the pure noise image.
The image denoising model is a neural network model for removing noise information in an image, including but not limited to at least one of: shadows, rain, snow, fog, flaws, and the like. Through the image denoising model, effects such as shadow removal, rain removal, snow removal, fog removal and flaw removal can be achieved. Considering the performance of the image denoising model, a single model generally has the function of removing only one type of noise information; for example, a certain image denoising model only has a shadow removal function. However, the application does not exclude a single image denoising model with functions of removing multiple types of noise information, for example a model with the functions of removing rain, snow, fog and the like.
Alternatively, the image denoising model includes an encoder, a first decoder (or referred to as a "noise decoder"), and a second decoder (or referred to as a "restoration decoder"). Wherein the encoder and the first decoder are a combination for processing the noisy sample image to obtain a prediction noise image. The encoder and the second decoder are another combination for processing the noisy sample image to obtain a predicted denoised image (i.e. hereinafter "second predicted denoised image").
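The composition described above (a shared encoder feeding a noise decoder and a restoration decoder) can be sketched as follows. The class and argument names are illustrative assumptions, and subtracting the predicted noise to form the first prediction denoising image is one plausible fusion inverse, not stated verbatim in the text.

```python
import torch
import torch.nn as nn

class ImageDenoisingModel(nn.Module):
    """The shared-encoder, two-decoder layout described above. The
    three submodules may be any networks with compatible shapes; the
    wiring, not the internals, is what this sketch shows. Subtracting
    the predicted noise is an assumed fusion inverse."""
    def __init__(self, encoder: nn.Module, noise_decoder: nn.Module,
                 restoration_decoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.noise_decoder = noise_decoder              # first decoder
        self.restoration_decoder = restoration_decoder  # second decoder

    def forward(self, noisy):
        feat = self.encoder(noisy)
        pred_noise = self.noise_decoder(feat)           # subtask 1
        first_denoised = noisy - pred_noise
        second_denoised = self.restoration_decoder(feat)  # subtask 2
        return pred_noise, first_denoised, second_denoised

# Wiring check with identity stand-ins for the three submodules.
model = ImageDenoisingModel(nn.Identity(), nn.Identity(), nn.Identity())
x = torch.ones(1, 3, 4, 4)
pred_noise, first_denoised, second_denoised = model(x)
```

The design choice worth noting is that both decoder heads consume the same encoder features, which is what lets the two subtasks share information during training.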
The training samples are used for training the image denoising model, and comprise a noise-free sample image, a pure noise image and a noisy sample image generated based on the noise-free sample image and the pure noise image. The noise-free sample image refers to an image without noise information, the pure noise image refers to an image with noise information only, and the noise-containing sample image may be an image generated by fusing the noise-free sample image and the pure noise image.
In the case where the noise information is a shadow, the noise-free sample image is an image that does not contain a shadow. The noise-free sample image is an image obtained by photographing the target object under a first specific condition, for example, photographing the target object under conditions of specific angles, illumination, distance, and the like, so as to obtain the noise-free sample image. The target object may be an object such as a book, paper, PPT, etc., which is not limited in this application. Alternatively, different conditions such as angle, illumination, distance and the like can be adjusted, or different target objects can be adjusted, so that a plurality of different noise-free sample images can be shot.
In the case where the noise information is shadow, the pure noise image is an image containing only shadow. The pure noise image is an image obtained by photographing the blank object under a second specific condition, such as photographing the blank object under a specific angle, illumination, distance, and the like, and generating shadows by a natural or structural shielding manner, thereby obtaining the pure noise image. The blank object may be blank paper, blank wall, blank curtain, or the like, which is not limited in this application. Optionally, conditions such as different angles, illumination, distance and the like can be adjusted, or different blank objects can be adjusted, so that a plurality of different pure noise images can be shot.
Optionally, any noise-free sample image and any pure noise image are synthesized, so that a noisy sample image can be generated. For example, the noisy sample image is obtained by multiplying the pixel values of the corresponding positions in the noiseless sample image and the pure noise image. For another example, the pixel values at corresponding positions in the noise-free sample image and the pure noise image are added to obtain the noisy sample image. In the embodiment of the present application, the generation manner of the noisy sample image is not limited. By selecting different noiseless sample images and different pure noise images for synthesis, a large number of noisy sample images can be generated and used as training samples for training the model.
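The two fusion options mentioned above (element-wise multiplication or addition of the noise-free sample image and the pure noise image) can be sketched minimally as follows; the function name and the clipping to [0, 1] are assumptions for illustration.

```python
import numpy as np

def synthesize_noisy_sample(clean: np.ndarray, noise: np.ndarray,
                            mode: str = "multiply") -> np.ndarray:
    """Fuse a noise-free sample image with a pure noise image.

    `mode` follows the two fusion options mentioned in the text:
    element-wise multiplication (natural for shadow maps in [0, 1])
    or element-wise addition (natural for additive noise)."""
    if clean.shape != noise.shape:
        raise ValueError("clean and noise images must share a shape")
    if mode == "multiply":
        noisy = clean * noise
    elif mode == "add":
        noisy = clean + noise
    else:
        raise ValueError(f"unknown mode: {mode}")
    return np.clip(noisy, 0.0, 1.0)

# Example: a uniform gray image darkened by a synthetic shadow map.
clean = np.full((4, 4), 0.8)
shadow = np.ones((4, 4))
shadow[:, 2:] = 0.5          # right half in shadow
noisy = synthesize_noisy_sample(clean, shadow, mode="multiply")
```

Varying which clean image is paired with which noise image, as the text notes, multiplies the number of training samples without extra photography.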
At step 220, the characteristic information of the noisy sample image is extracted by the encoder.
The encoder is used for extracting the characteristic information of the noisy sample image. The encoder may be a neural network capable of extracting high-level semantic features from the noisy sample image as the characteristic information. Optionally, the encoder includes a downsampling module, a normalization module, and an activation function module. After the noisy sample image is input to the encoder, the image is first downsampled by the downsampling module to obtain a downsampling processing result; then, the downsampling processing result is normalized by the normalization module to obtain a normalization processing result; and then, the normalization processing result is processed by the activation function module, which finally outputs the characteristic information of the noisy sample image. The characteristic information of the noisy sample image may be output in the form of a feature map.
In some embodiments, as shown in fig. 3, fig. 3 illustrates a schematic diagram of an encoder including a downsampling module 31, a normalization module 32, and an activation function module 33. Illustratively, the encoder contains 8 downsampling modules to progressively extract high-level semantic features, from low to high, from the input noisy sample image. Each downsampling module halves the spatial resolution of the input features, so after 8 downsampling modules the spatial resolution of the input noisy sample image is reduced to 1/256 of the original, yielding high-level semantic features. Therefore, the spatial resolution of the input noisy sample image is required to be an integer multiple of 256. Of course, the number of downsampling modules can be flexibly adjusted according to the actual task, which is not limited in this application. As shown in fig. 3, the downsampling processing of the downsampling module 31 is implemented using a convolutional neural network, for example using a convolution with stride 2 instead of the conventional pooling with kernel size 2. After the convolution, the normalization module 32 performs batch normalization (Batch Normalization, BN) on the downsampling processing result output by the downsampling module 31 to obtain a normalization processing result. Then, the normalization processing result is processed by the activation function module 33, for example using LeakyReLU with a slope parameter of 0.2 as the activation function, so as to obtain the characteristic information of the noisy sample image. In the encoder, a convolution operation replaces the pooling operation for downsampling, which reduces information loss during downsampling, takes the image information of each position into account as comprehensively as possible, and improves the accuracy of the finally obtained characteristic information.
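A minimal PyTorch sketch of the encoder stage described above. The text specifies only stride-2 convolution in place of pooling, batch normalization, LeakyReLU(0.2), and 8 modules; the kernel size of 4 and the channel widths here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DownsampleModule(nn.Module):
    """One encoder stage: stride-2 convolution in place of pooling,
    batch normalization, then LeakyReLU with slope 0.2."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.block(x)

class Encoder(nn.Module):
    """Eight downsampling modules; each halves the spatial resolution,
    so a 256x256 input is reduced to 1x1 (1/256 per side). Channel
    widths are illustrative assumptions."""
    def __init__(self, in_ch: int = 3, base_ch: int = 64):
        super().__init__()
        chs = [in_ch] + [min(base_ch * 2 ** i, 512) for i in range(8)]
        self.stages = nn.Sequential(
            *[DownsampleModule(chs[i], chs[i + 1]) for i in range(8)]
        )

    def forward(self, x):
        return self.stages(x)

# Shape check: 256x256 in, 1x1 feature map out.
enc = Encoder().eval()  # eval() sidesteps BatchNorm's >1-sample need at 1x1
with torch.no_grad():
    feat = enc(torch.randn(1, 3, 256, 256))
```

This also makes the stated input constraint concrete: a k=4, s=2, p=1 convolution halves even spatial sizes exactly, so the input side length must be a multiple of 256 to survive eight halvings cleanly.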
Step 230, processing the characteristic information of the noisy sample image through a first decoder to obtain a predicted noise image corresponding to the noisy sample image; and generating a first prediction denoising image corresponding to the noisy sample image according to the prediction noise image and the noisy sample image.
The first decoder may also be referred to as a noise decoder, and is configured to predict the noise information contained in the input image, that is, to predict the prediction noise image corresponding to the noisy sample image. Then, based on the noisy sample image and the prediction noise image, the first prediction denoising image can be obtained, wherein the resolution of the first prediction denoising image is the same as the resolution of the noisy sample image.
The first decoder may be a neural network. Optionally, the first decoder includes an upsampling module, a normalizing module, and an activation function module. After the characteristic information of the sample image with noise is input to a first decoder, the characteristic information of the sample image with noise is subjected to up-sampling processing through an up-sampling module, and an up-sampling processing result is obtained; then, carrying out normalization processing on the upsampling processing result through a normalization module to obtain a normalization processing result; and then, processing the normalization processing result through an activation function module, and finally outputting a prediction noise image corresponding to the noisy sample image.
In some embodiments, as shown in fig. 4, fig. 4 illustrates a schematic diagram of a noise decoder including an upsampling module 41, a normalization module 42, and an activation function module 43. Illustratively, the noise decoder comprises 8 upsampling modules to gradually recover the spatial resolution lost in the encoder, so that the output has the same spatial resolution as the input noisy sample image. As shown in fig. 4, the upsampling process of the upsampling module 41 is implemented with a deconvolutional neural network, for example, using a deconvolution with learnable parameters instead of the conventional bilinear interpolation method. After the deconvolution, the upsampling result output by the upsampling module 41 is subjected to BN processing by the normalization module 42 to obtain a normalization result. The normalization result is then processed by the activation function module 43, for example using a ReLU as the activation function, to obtain the predicted noise image corresponding to the noisy sample image. In the decoder, a deconvolution with learnable parameters replaces the conventional bilinear interpolation method: on one hand, since a learnable deconvolution has stronger representation capability, the representation performance of the decoder can be improved; on the other hand, the deconvolution has lower complexity than bilinear interpolation, so the computational complexity can be reduced.
And after obtaining a prediction noise image corresponding to the noisy sample image, obtaining a first prediction denoising image based on the prediction noise image and the noisy sample image. Optionally, the first predicted denoised image is generated in a manner corresponding to the manner in which the noisy sample image is generated. For example, in the case of multiplying the pixel values at the corresponding positions in the noise-free sample image and the pure noise image to obtain a noisy sample image, it is necessary to divide the noisy sample image by the prediction noise image at this time to obtain a first prediction noise-removed image. For another example, when a noisy sample image is obtained by adding pixel values at corresponding positions in a noise-free sample image and a pure noise image, it is necessary to obtain a first predicted noise-removed image by subtracting a predicted noise image from the noisy sample image.
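This inversion step can be sketched as follows; the function name, the `eps` default, and the NumPy formulation are illustrative, not part of the original text:

```python
import numpy as np

def predicted_denoised(noisy, pred_noise, mode, eps=1e-8):
    """Invert the noise-composition step that produced the noisy sample image.

    mode="additive":        noisy = clean + noise  ->  clean ~ noisy - noise
    mode="multiplicative":  noisy = clean * noise  ->  clean ~ noisy / (noise + eps)
    eps guards against a zero denominator, as described in the text.
    """
    if mode == "additive":
        return noisy - pred_noise
    elif mode == "multiplicative":
        return noisy / (pred_noise + eps)
    raise ValueError(mode)

clean = np.array([[0.2, 0.8], [0.5, 0.1]])
noise = np.array([[0.1, -0.05], [0.0, 0.02]])
noisy = clean + noise
restored = predicted_denoised(noisy, noise, mode="additive")
```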
And step 240, processing the characteristic information of the noisy sample image through a second decoder to obtain a second prediction denoising image corresponding to the noisy sample image.
The second decoder, which may also be referred to as a restoration decoder, is used to remove noise information from the noisy-sample image to obtain a second predicted denoised image. Alternatively, the structure of the second decoder is the same as that of the first decoder. Wherein the resolution of the second predictive denoised image is the same as the resolution of the noisy sample image.
The second decoder may be a neural network. Optionally, the second decoder includes an upsampling module, a normalizing module, and an activation function module. After the characteristic information of the sample image with noise is input to the second decoder, the characteristic information of the sample image with noise is subjected to up-sampling processing through an up-sampling module, and an up-sampling processing result is obtained; then, carrying out normalization processing on the upsampling processing result through a normalization module to obtain a normalization processing result; and then, processing the normalization processing result through an activation function module, and finally outputting a second prediction denoising image corresponding to the noisy sample image.
In some embodiments, the structure of the restoration decoder is the same as that of the noise decoder; taking the structure of fig. 4 as an example, fig. 4 exemplarily shows a schematic diagram of the restoration decoder, which comprises an upsampling module 41, a normalization module 42, and an activation function module 43. Illustratively, the restoration decoder comprises 8 upsampling modules to gradually recover the spatial resolution lost in the encoder, so that the output has the same spatial resolution as the input noisy sample image. As shown in fig. 4, the upsampling process of the upsampling module 41 is implemented with a deconvolutional neural network, for example, using a deconvolution with learnable parameters instead of the conventional bilinear interpolation method. After the deconvolution, the upsampling result output by the upsampling module 41 is subjected to BN processing by the normalization module 42 to obtain a normalization result. The normalization result is then processed by the activation function module 43, for example using a ReLU as the activation function, to obtain the second predicted denoised image corresponding to the noisy sample image. The restoration decoder and the noise decoder have the same structure but do not share parameters and are learned independently. The restoration decoder is an end-to-end neural network that can directly output the denoised image.
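A minimal PyTorch sketch of one such upsampling block, assuming a kernel size of 4 and illustrative channel counts; it also shows that the two decoders instantiate the block separately rather than sharing weights:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One decoder upsampling block: learnable transposed conv -> BatchNorm -> ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # A stride-2 transposed convolution doubles H and W and, unlike
        # bilinear interpolation, has learnable weights.
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.deconv(x)))

# The noise decoder and the restoration decoder can use the same block type
# while keeping separate, independently learned parameter sets:
noise_dec_block = UpBlock(128, 64)
restore_dec_block = UpBlock(128, 64)
```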
Step 250, determining a training loss of the image denoising model according to the first prediction denoising image, the second prediction denoising image and the noise-free sample image, and adjusting parameters of the image denoising model based on the training loss.
Optionally, a training loss of the encoder and the first decoder is determined from the first predicted denoised image and the noise-free sample image, and parameters of the encoder and the first decoder are adjusted based on the training loss. Optionally, a training loss of the encoder and the second decoder is determined from the second predicted denoised image and the noise-free sample image, and parameters of the encoder and the second decoder are adjusted based on the training loss. The training loss is used for measuring the difference between a predicted denoising image and a noise-free sample image obtained by the image denoising model.
The embodiment of the application provides a training method of an image denoising model based on multi-task learning: on one hand, the image denoising model learns to predict the noise image corresponding to a noisy sample image, and on the other hand, it learns to predict the denoised image corresponding to the noisy sample image. The two subtasks promote each other in performance, which helps improve the denoising effect of the finally trained image denoising model. Moreover, the input data of the image denoising model only needs to include a noisy sample image and does not need to include a pure noise image, so the limitation on model input data in the related art can be overcome, improving the universality and practicability of the scheme.
Referring to fig. 5, a flowchart of a training method of an image denoising model according to another embodiment of the present application is shown. The subject of execution of the steps of the method may be the model training apparatus described above. The method may include the following steps (510-580). Some of the steps in this embodiment have been described above and are not repeated here.
Step 510, obtaining training samples of an image denoising model, wherein the training samples comprise a noise-free sample image, a pure noise image, and a noisy sample image generated based on the noise-free sample image and the pure noise image.
At step 520, feature information of the noisy sample image is extracted by the encoder.
Step 530, processing the characteristic information of the noisy sample image through a first decoder to obtain a prediction noise image corresponding to the noisy sample image; and generating a first prediction denoising image corresponding to the noisy sample image according to the prediction noise image and the noisy sample image.
Optionally, generating the first predicted denoising image corresponding to the noisy sample image comprises: dividing the value of the pixel at the corresponding position in the sample image with noise and the predicted noise image to obtain a first predicted denoising image; or subtracting the values of the pixels at the corresponding positions in the sample image with noise and the prediction noise image to obtain a first prediction denoising image.
In actual operation, the generation mode of the first prediction denoising image is determined according to the generation mode of the noisy sample image. For example, in the case where the noisy sample image is obtained by multiplying the values of the pixels at the corresponding positions in the noise-free sample image and the pure noise image, the values of the pixels at the corresponding positions in the noisy sample image and the predicted noise image are divided to obtain the first predicted denoised image. And when the noisy sample image is obtained by adding the values of the pixels at the corresponding positions in the noiseless sample image and the pure noise image, subtracting the values of the pixels at the corresponding positions in the noisy sample image and the predicted noise image to obtain a first predicted denoising image.
In some embodiments, when the noisy sample image is obtained by multiplying the values of the pixels at corresponding positions in the noise-free sample image and the pure noise image, the first predicted denoised image is obtained by dividing the values of the pixels in the noisy sample image by the values of the pixels at corresponding positions in the predicted noise image, which may be formulated as follows:

Input / (Noise + ε)

where Input is the input of the overall network model, i.e., the noisy sample image, and Noise is the output of the noise decoder, i.e., the predicted noise image. ε is a very small constant used to avoid a zero denominator in the above formula. Adding the very small constant ε ensures the feasibility of the scheme without affecting the result.
In some embodiments, when the noisy sample image is obtained by adding the values of the pixels at corresponding positions in the noise-free sample image and the pure noise image, the first predicted denoised image is obtained by subtracting the values of the pixels at corresponding positions in the predicted noise image from the values of the pixels in the noisy sample image, which may be formulated as follows:

Input - Noise

where Input is the input of the overall network model, i.e., the noisy sample image, and Noise is the output of the noise decoder, i.e., the predicted noise image.
Step 540, determining a first training loss based on the first predicted denoised image and the noise-free sample image.
The first training loss is used to measure the difference between the first predicted denoised image and the noise-free sample image.
Optionally, step 540 includes the following substeps (1-2):
1. a first pixel level loss, a first semantic feature loss, and a first image gradient loss are determined from the first predicted denoised image and the noise-free sample image.
The first pixel level loss is used to measure the difference between pixels of the first predicted denoised image and pixels of the noise-free sample image. For example, for corresponding position pixels in the first predicted denoised image and the noise-free sample image, a distance value between the values of the two pixels is calculated, and then the distance values of all corresponding position pixels are summed to obtain the first pixel level loss. The first pixel level loss may be calculated using an L1 distance, cosine distance, euclidean distance, etc., which is not limited in this application.
The first semantic feature loss is used to measure the difference between the semantic features of the first predicted denoised image and those of the noise-free sample image, where semantic features refer to the high-level semantic features of an image. For example, a pre-trained feature extractor extracts the semantic features of the first predicted denoised image and of the noise-free sample image respectively, and the distance between the two sets of semantic features gives the first semantic feature loss. Semantic feature loss may also be referred to as perceptual loss.
The first image gradient loss is used to measure the difference between the image gradient of the first predicted denoised image and that of the noise-free sample image. It is obtained by calculating the image gradient of the first predicted denoised image and the image gradient of the noise-free sample image respectively, and then computing the difference between these two image gradients. The image gradient loss may also simply be referred to as gradient loss.
2. A first training penalty is determined based on the first pixel level penalty, the first semantic feature penalty, and the first image gradient penalty.
Optionally, the first pixel level loss, the first semantic feature loss, and the first image gradient loss are weighted and summed to obtain a first training loss. In the embodiment of the application, the first training loss is obtained through calculation according to the first pixel level loss, the first semantic feature loss and the first image gradient loss, and the multi-aspect difference information such as pixel differences, semantic feature differences and gradient differences among images is considered, so that the result of the first training loss is more accurate, and the obtained image denoising model is further more accurate.
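A hedged NumPy sketch of this weighted combination; the loss weights and the stand-in feature extractor are illustrative assumptions, not values given in the text:

```python
import numpy as np

def gradient_loss(pred, target):
    """Image gradient loss: L1 difference between finite-difference gradients."""
    gy = lambda im: im[1:, :] - im[:-1, :]   # vertical gradient
    gx = lambda im: im[:, 1:] - im[:, :-1]   # horizontal gradient
    return np.abs(gx(pred) - gx(target)).mean() + np.abs(gy(pred) - gy(target)).mean()

def first_training_loss(pred, target, perceptual, w_pix=1.0, w_feat=0.1, w_grad=0.1):
    """Weighted sum of pixel-level (L1), semantic feature, and gradient losses.
    `perceptual` stands in for a pre-trained feature extractor; the weights
    w_* are illustrative hyperparameters."""
    pixel = np.abs(pred - target).mean()                          # first pixel-level loss
    feat = np.abs(perceptual(pred) - perceptual(target)).mean()   # first semantic feature loss
    grad = gradient_loss(pred, target)                            # first image gradient loss
    return w_pix * pixel + w_feat * feat + w_grad * grad

img = np.linspace(0.0, 1.0, 16).reshape(4, 4)
loss_same = first_training_loss(img, img, perceptual=lambda x: x.mean(axis=0))
```

Note that the gradient term is insensitive to a constant brightness offset, which is exactly what the pixel-level term penalizes; this is why the text combines several complementary difference measures.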
At step 550, parameters of the encoder and the first decoder are adjusted based on the first training loss.
Alternatively, the parameters of the encoder and the first decoder are adjusted based on the first training loss using a counter-propagating gradient descent algorithm.
And step 560, processing the characteristic information of the noisy sample image through a second decoder to obtain a second prediction denoising image corresponding to the noisy sample image.
Step 570, determining a second training loss based on the second predicted denoised image and the noise-free sample image.
The second training loss is used to measure the difference between the second predicted denoised image and the noise-free sample image.
Optionally, step 570 includes the following substeps (1-2):
1. and determining a second pixel level loss, a second semantic feature loss and a second image gradient loss according to the second predicted denoising image and the noise-free sample image.
The second pixel level loss is used for measuring the difference between the pixel point of the second prediction denoising image and the pixel point of the noiseless sample image, the second semantic feature loss is used for measuring the difference between the semantic feature of the second prediction denoising image and the semantic feature of the noiseless sample image, and the second image gradient loss is used for measuring the difference between the image gradient of the second prediction denoising image and the image gradient of the noiseless sample image.
In addition, the calculation manners of the second pixel level loss, the second semantic feature loss and the second image gradient loss may refer to the calculation manners of the first pixel level loss, the first semantic feature loss and the first image gradient loss described above, and the detailed description thereof will not be repeated here.
2. And determining a second training loss according to the second pixel level loss, the second semantic feature loss and the second image gradient loss.
Optionally, the second pixel level loss, the second semantic feature loss, and the second image gradient loss are weighted and summed to obtain a second training loss. In the embodiment of the application, the second training loss is obtained through calculation according to the second pixel level loss, the second semantic feature loss and the second image gradient loss, and the multi-aspect difference information such as pixel differences, semantic feature differences and gradient differences among images is considered, so that the result of the second training loss is more accurate, and the obtained image denoising model is further more accurate.
In step 580, parameters of the encoder and the second decoder are adjusted based on the second training loss.
In addition, in some exemplary embodiments, the same data enhancement processing may also be performed on noisy and noiseless sample images, including, but not limited to, at least one of: random scaling, random clipping, random horizontal flipping, random vertical flipping, random rotation, affine transformation, translational scaling rotation, perspective transformation. And inputting the noisy sample image subjected to the data enhancement processing into an encoder for feature extraction. The data enhancement processing is carried out on the images, so that the capability of the model for learning the characteristics from different images can be improved, and the finally trained model has higher robustness on different images.
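The key point above, that the same random transform must be applied to both images of a pair so they stay pixel-aligned, can be sketched as follows (flips and 90-degree rotations only; the function name is illustrative):

```python
import numpy as np

def paired_augment(noisy, clean, rng):
    """Apply the SAME random flips/rotation to both images so they stay aligned."""
    if rng.random() < 0.5:               # random horizontal flip
        noisy, clean = noisy[:, ::-1], clean[:, ::-1]
    if rng.random() < 0.5:               # random vertical flip
        noisy, clean = noisy[::-1, :], clean[::-1, :]
    k = rng.integers(0, 4)               # random rotation by k * 90 degrees
    return np.rot90(noisy, k), np.rot90(clean, k)

rng = np.random.default_rng(0)
noisy = np.arange(16, dtype=float).reshape(4, 4)
clean = noisy - 0.1                      # toy pair with a fixed per-pixel offset
aug_noisy, aug_clean = paired_augment(noisy, clean, rng)
```

Because both images go through identical transforms, every pixel of the augmented noisy image still corresponds to the same pixel of the augmented clean image.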
According to the method, the process of parameter adjustment of the image denoising model by the first training loss and the second training loss is separated, so that the training process of the noise decoder and the training process of the restoration decoder are independent, simultaneous operation of multiple tasks is realized, and the time for training the model is saved.
Meanwhile, training loss is obtained through a plurality of indexes (comprising pixel-level loss, semantic feature loss and image gradient loss), and parameter adjustment of the image denoising model is carried out through the training loss, and the training loss is generated through the plurality of indexes, so that the parameter adjustment of the image denoising model is more accurate.
On the basis of the training method of the image denoising model provided in the embodiment of fig. 5, as shown in fig. 6, the training method provided by another exemplary embodiment of the present application may further include the following steps (610-630):
In step 610, the feature information of the noisy sample image or the noise-free sample image is processed by the classifier to obtain a noise prediction result of the noisy sample image or the noise-free sample image, where the noise prediction result is used to indicate whether the noisy sample image or the noise-free sample image contains noise.
The classifier judges whether the input image contains noise. The input image can be a noisy sample image or a noise-free sample image, and the output of the classifier is the probability that the input image contains noise, i.e., the noise prediction result. For example, when a noisy sample image is input, the classifier may output that the image contains noise with a probability of 90%; when a noise-free sample image is input, it may output that the image contains noise with a probability of 10%. The output of the classifier can take any value in the closed interval [0, 1].
Step 620, determining a third training loss according to the noise prediction result and the noise real result, wherein the third training loss is used for measuring the difference between the noise prediction result and the noise real result.
And obtaining a third training loss according to the difference between the noise prediction result output by the classifier and the noise real result of the input image.
Step 630, based on the third training loss, adjusts parameters of the encoder and classifier.
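The text does not name the form of the third training loss; assuming binary cross-entropy between the classifier's noise probability and the ground-truth label, a sketch might look like:

```python
import numpy as np

def bce_loss(p_noise, has_noise, eps=1e-7):
    """Binary cross-entropy between the classifier's noise probability
    (a value in [0, 1]) and the ground-truth label (1 = noisy, 0 = clean)."""
    p = np.clip(p_noise, eps, 1 - eps)   # avoid log(0)
    return -(has_noise * np.log(p) + (1 - has_noise) * np.log(1 - p))

# Confident, correct predictions give a small loss; wrong ones a large loss.
loss_good = bce_loss(0.9, 1)   # noisy image, classifier says 90% noisy
loss_bad = bce_loss(0.9, 0)    # clean image, classifier says 90% noisy
```

Minimizing this loss pushes the classifier's output toward 1 for noisy inputs and toward 0 for clean ones, which matches the [0, 1] output range described above.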
In this embodiment, on the basis of the first decoder and the second decoder, a classifier is further introduced, so that three subtasks are performed simultaneously in the model training process, and the three subtasks are mutually promoted in performance, which is helpful to further improve the denoising effect of the image denoising model.
Meanwhile, by introducing the classifier, whether the input image is noisy or not is judged according to the classifier so as to determine whether the denoising operation is needed through the first decoder branch, the unnecessary operation of denoising the noiseless image by the image denoising model is omitted, and the time and cost of denoising the image are saved.
Based on the training method of the image denoising model provided by any embodiment, the idea of countermeasure learning can be further introduced on the basis of multi-task learning to optimize the model performance. As shown in fig. 7, the training method provided in another exemplary embodiment of the present application may further include the following steps (710 to 720):
In step 710, pixel level discrimination is performed on the first prediction denoising image and the noiseless sample image by the discriminator, so as to obtain pixel level discrimination results corresponding to the first prediction denoising image and the noiseless sample image respectively.
The discriminator performs pixel-level judgment of whether the input image has undergone the model's denoising processing; in pixel-level judgment, each pixel of the input image is judged individually. The input image is either the first predicted denoised image or the noise-free sample image, and the output is, for each pixel, the probability that the input image has undergone the model's denoising processing. For example, when a first predicted denoised image containing 10000 pixels is input, the output of the discriminator is the prediction result corresponding to each of the 10000 pixels in the image, the prediction result of each pixel indicating whether that pixel was obtained after denoising processing by the model. For example, each prediction result may be any probability value in [0, 1].
Optionally, the discriminator comprises a feature mapping layer, n cascaded downsampling layers, n cascaded upsampling layers, and a feature extraction layer, where n is an integer greater than 1. Step 710 includes the following steps (1-4):
1. and processing the first prediction denoising image or the noiseless sample image through the feature mapping layer to obtain an initial feature map of the first prediction denoising image or the noiseless sample image.
The feature mapping layer firstly amplifies the spatial resolution of the input image, and then obtains an initial feature map of the input image.
In some embodiments, as shown in fig. 8, fig. 8 illustrates a schematic diagram of the various layers in the discriminator. The feature mapping layer 81 includes: a convolution module 81a and an activation function module 81b. The input image first passes through the convolution module 81a to obtain a convolution result map, and the convolution result map is then input into the activation function module 81b to obtain the initial feature map of the input image.
2. And carrying out n-step downsampling processing on the initial feature map through n cascade downsampling layers to obtain a downsampled feature map.
And carrying out multiple downsampling treatments on the obtained initial feature map through a plurality of downsampling layers to obtain a downsampled feature map with reduced spatial resolution.
In some embodiments, as shown in fig. 8, the downsampling layer 82 comprises: a downsampling module 82a, a normalization module 82b, and an activation function module 82c. The initial feature map of the input image is input into the downsampling module 82a to obtain a downsampling result map; the downsampling result map is then input into the normalization module 82b to obtain a normalization result map; finally, the normalization result map is input into the activation function module 82c to obtain the downsampled feature map of the input image. Optionally, the downsampling process in the downsampling module uses convolution with a stride of 2, spectral normalization (Spectral Normalization, SN for short) as the normalization method, and Leaky ReLU as the activation function.
3. Performing n steps of upsampling processing on the downsampled feature map through the n cascaded upsampling layers to obtain an upsampled feature map; wherein skip connections are employed between the downsampling layers and the upsampling layers.
And carrying out multiple upsampling treatment on the obtained downsampled feature map through a plurality of upsampling layers to obtain an upsampled feature map with increased spatial resolution.
In some embodiments, as shown in fig. 8, the upsampling layer 83 includes: a type adjustment module 83a, an upsampling module 83b, a normalization module 83c, and an activation function module 83d. The downsampled feature map of the input image first passes through the type adjustment module 83a to obtain a type adjustment result map. The type adjustment result map is then input into the upsampling module 83b to obtain an upsampling result map; the upsampling result map is input into the normalization module 83c to obtain a normalization result map; finally, the normalization result map is input into the activation function module 83d to obtain the upsampled feature map of the input image. Optionally, the upsampling process in the upsampling module uses convolution with a stride of 1, spectral normalization as the normalization method, and Leaky ReLU as the activation function.
Optionally, the discriminator may include a plurality of downsampling layers and upsampling layers. In some embodiments, as shown in fig. 9, fig. 9 illustrates an overall structural schematic of the discriminator. Optionally, in fig. 9, the discriminator consists of one feature mapping layer, three downsampling layers, three upsampling layers, and one feature extraction layer. It can be seen that the downsampling layers and the upsampling layers are connected by skip connections: as shown in fig. 9, upsampling layer two 91 receives input not only from upsampling layer three 92, but also from downsampling layer two 93. Likewise, upsampling layer one 94 receives input not only from upsampling layer two 91, but also from downsampling layer one 95. Similarly, the upsampling layers and the downsampling layers are also skip connected when denoising images with the noise decoder or the restoration decoder in steps 220 through 240.
4. And processing the up-sampling feature map through a feature extraction layer to obtain a pixel level discrimination result of the first prediction denoising image or the noise-free sample image.
The feature extraction layer evaluates the pixel values of the upsampled feature map to obtain the pixel-level discrimination result: for each pixel, whether it belongs to an image obtained after denoising processing or to an image that has not undergone denoising processing.
In some embodiments, as shown in fig. 8, the feature extraction layer 84 includes: two downsampling layers with a convolution stride of 1 and a final downsampling module 84d with a convolution stride of 1. The upsampled feature map of the input image first passes through the downsampling module 84a to obtain a downsampling result map; the downsampling result map is then input into the normalization module 84b to obtain a normalization result map; finally, the normalization result map is input into the activation function module 84c to obtain a secondary downsampled feature map of the input image. After one more downsampling layer with a convolution stride of 1, the resulting map is input into the downsampling module 84d with a convolution stride of 1 to obtain the pixel-level discrimination result of the image. Optionally, the downsampling process in each downsampling module uses convolution with a stride of 1, spectral normalization as the normalization method, and Leaky ReLU as the activation function.
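The skip connections between downsampling and upsampling layers described in the steps above can be sketched as follows; this is a minimal PyTorch illustration with assumed channel counts, not the exact layer composition of the figures:

```python
import torch
import torch.nn as nn

class SkipUpLayer(nn.Module):
    """An upsampling stage that also receives a same-resolution feature map
    from the matching downsampling stage via a skip connection (U-Net style)."""
    def __init__(self, up_ch, skip_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(up_ch + skip_ch, out_ch, 4, stride=2, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, from_below, skip):
        # "from_below": output of the previous upsampling layer;
        # "skip": feature map from the downsampling path at the same resolution.
        x = torch.cat([from_below, skip], dim=1)
        return self.act(self.deconv(x))

layer = SkipUpLayer(up_ch=64, skip_ch=64, out_ch=32)
out = layer(torch.randn(1, 64, 16, 16), torch.randn(1, 64, 16, 16))
```

Concatenating the encoder feature map lets the upsampling path reuse fine spatial detail that would otherwise be lost in the downsampling path.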
Step 720, determining a discrimination loss corresponding to the discriminator based on the pixel level discrimination result, wherein the discrimination loss is used for measuring the accuracy of the pixel level discrimination result output by the discriminator.
The discrimination loss is used for adjusting parameters of the discriminator and also used for adjusting parameters of the encoder and the first decoder. Optionally, parameter adjustment of the encoder and the first decoder is performed after the combination of the discrimination loss and the first training loss.
The discrimination result of the discriminator is obtained from the pixel-level discrimination result and the ground-truth label of the image: a correct probability and/or an error probability. The correct probability is the accuracy of the discriminator's judgment and the error probability is its error rate; for the same discriminator, the correct probability and the error probability sum to 1.
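Assuming a binary cross-entropy formulation over the pixel-level discrimination map (the text does not specify the exact loss), the discrimination loss and the corresponding adversarial term for the denoiser might be sketched as:

```python
import numpy as np

def pixel_bce(pred_map, target, eps=1e-7):
    """Average binary cross-entropy over a pixel-level prediction map.

    pred_map: per-pixel probabilities that the pixel comes from a real
              (never-denoised) image; target: 1 for the noise-free sample
              image, 0 for the model's denoised output."""
    p = np.clip(pred_map, eps, 1 - eps)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

def discriminator_loss(d_on_clean, d_on_denoised):
    # The discriminator should output 1 on clean images and 0 on denoised ones.
    return pixel_bce(d_on_clean, 1) + pixel_bce(d_on_denoised, 0)

def generator_adv_loss(d_on_denoised):
    # The denoiser is rewarded when the discriminator mistakes its output for real.
    return pixel_bce(d_on_denoised, 1)
```

In training, `discriminator_loss` would update the discriminator, while `generator_adv_loss` would be combined with the first training loss to update the encoder and first decoder, matching the text's description of joint parameter adjustment.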
This embodiment improves the denoising effect of the image denoising model by introducing the idea of generative adversarial learning on the basis of multi-task learning.
The above discrimination method is a pixel-level discrimination method. Optionally, discrimination can also be performed globally; compared with global discrimination, the pixel-level discrimination method yields more accurate results, so a more accurate discriminator can be trained. Meanwhile, the discriminator and the image denoising model optimize model performance through the idea of adversarial learning, so with a more accurate discriminator, the trained image denoising model is also more accurate.
The training process of the image denoising model is described above, and the using process of the image denoising model is described below. It should be noted that the training process and the use process of the model correspond, and for details not described in detail on one side, reference is made to the description on the other side.
Referring to fig. 9, a flowchart of an image denoising method according to an embodiment of the present application is shown. The subject of execution of the steps of the method may be the model-using device described above. The method may include the following steps (910-940):
step 910, obtaining a noisy image to be processed by an image denoising model, the image denoising model comprising an encoder and a first decoder.
In step 920, feature information of the noisy image is extracted by the encoder.
Optionally, the feature information of the noisy image is processed by a classifier to obtain a noise prediction result of the noisy image, where the noise prediction result indicates whether the noisy image contains noise. In the case where the noise prediction result indicates that the noisy image contains noise, the step of processing the feature information of the noisy image through the first decoder to obtain a noise image corresponding to the noisy image, i.e., step 930 below, is performed. In the case where the noise prediction result indicates that the noisy image does not contain noise, the image denoising model does not denoise the input image, and directly outputs it or passes it to subsequent processing.
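The classifier gating described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation; `classifier` and `denoise` are hypothetical stand-ins for the classifier branch and the first-decoder branch of the model.

```python
def denoise_if_noisy(image, classifier, denoise):
    """Run the denoising branch only when the classifier predicts noise.

    classifier(image) -> bool  (True: image contains noise)
    denoise(image)    -> denoised image
    Both callables are hypothetical stand-ins for the model branches.
    """
    if classifier(image):
        # Noise detected: run the first-decoder branch.
        return denoise(image)
    # No noise detected: skip the decoder and return the input unchanged.
    return image

# Toy usage: a "classifier" that flags any nonzero value as noise,
# and a "denoiser" that zeroes the image.
result = denoise_if_noisy([0, 0, 0],
                          classifier=lambda im: any(im),
                          denoise=lambda im: [0] * len(im))
```

Skipping the decoder for noise-free inputs is exactly the cost saving the embodiment describes: the expensive branch runs only when needed.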
Step 930: processing the feature information of the noisy image through the first decoder to obtain a noise image corresponding to the noisy image.
The trained image denoising model uses the first decoder, i.e., the denoising decoder, to denoise the image. Alternatively, the trained image denoising model may also use the second decoder, i.e., the restoration decoder, to denoise the image.
Optionally, the encoder comprises k cascaded downsampling layers, the first decoder comprises k cascaded upsampling layers, and k is an integer greater than 1. Step 930 then includes: performing k-step up-sampling processing on the feature information of the noisy image through the k cascaded upsampling layers to obtain the noise image; wherein skip connections are used between the downsampling layers and the upsampling layers.
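The k-step downsampling/upsampling structure with skip connections can be illustrated at the shape level as below. This is a sketch under simplifying assumptions: real encoder and decoder layers apply learned convolutions, whereas here downsampling is plain average pooling, upsampling is nearest-neighbour repetition, and a skip connection is modelled as adding the matching encoder feature map.

```python
import numpy as np

def downsample(x):
    # 2x average pooling over an (H, W) map; H and W are assumed even.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # 2x nearest-neighbour upsampling.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_like_pass(image, k=2):
    """Shape-level sketch: k cascaded downsampling steps, then k cascaded
    upsampling steps, with a skip connection (modelled as addition of the
    encoder feature map at the same scale) at each decoder step."""
    skips = []
    x = image
    for _ in range(k):                   # encoder: k downsampling layers
        skips.append(x)
        x = downsample(x)
    for _ in range(k):                   # decoder: k upsampling layers
        x = upsample(x) + skips.pop()    # skip connection at matching scale
    return x                             # same spatial size as the input

out = unet_like_pass(np.arange(64, dtype=float).reshape(8, 8), k=2)
```

The skip connections reinject fine spatial detail lost during downsampling, which is why the output keeps the input's full resolution.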
Step 940: generating a denoising image corresponding to the noisy image according to the noise image and the noisy image.
Optionally, step 940 includes: dividing the values of pixels at corresponding positions in the noisy image by those in the noise image to obtain a first prediction denoising image; or subtracting the values of pixels at corresponding positions in the noise image from those in the noisy image to obtain the first prediction denoising image.
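The two ways of combining the noisy image with the noise image can be written out directly. The small clip in the division variant is our addition to guard against division by zero; the patent does not specify such a safeguard.

```python
import numpy as np

noisy = np.array([[4.0, 9.0], [16.0, 25.0]])   # hypothetical noisy image
noise = np.array([[2.0, 3.0], [4.0, 5.0]])     # predicted noise image

# Variant 1: element-wise division (suits a multiplicative noise model).
denoised_div = noisy / np.clip(noise, 1e-6, None)  # clip guards against /0

# Variant 2: element-wise subtraction (suits an additive noise model).
denoised_sub = noisy - noise
```

Which variant applies depends on how the noisy sample images were synthesized during training (multiplication or addition of the pure noise image).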
Optionally, a first resizing process is performed on the noisy image to obtain an adjusted noisy image, and the adjusted noisy image is input to the encoder for feature extraction; a second resizing process is performed on the noise image to obtain an adjusted noise image, where the second resizing process is the inverse of the first resizing process. Generating the denoising image corresponding to the noisy image according to the noise image and the noisy image then includes: generating the denoising image corresponding to the noisy image according to the adjusted noise image and the noisy image.
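A minimal sketch of the paired resizing, assuming nearest-neighbour scaling by a factor of 2 (the patent does not fix a particular resizing method). Note that only the noise image is inverse-resized; the original noisy image enters the final combination unchanged.

```python
import numpy as np

def first_resize(image, factor=2):
    # Hypothetical first resizing: nearest-neighbour downscale by `factor`.
    return image[::factor, ::factor]

def second_resize(noise, factor=2):
    # Inverse resizing: nearest-neighbour upscale by the same factor.
    return noise.repeat(factor, axis=0).repeat(factor, axis=1)

noisy = np.arange(64, dtype=float).reshape(8, 8)
small = first_resize(noisy)                       # fed to the encoder
predicted_noise = np.ones_like(small)             # stand-in decoder output
adjusted_noise = second_resize(predicted_noise)   # back to (8, 8)
denoised = noisy - adjusted_noise                 # combine with the ORIGINAL noisy image
```

Because the subtraction uses the unresized original, the image content in the final result never passes through an interpolation step.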
The remaining parts of the image denoising model in this image denoising method are the same as the corresponding parts in the training method described above, and are not repeated here.
In using the image denoising model, this embodiment performs image denoising with the trained first decoder rather than with the trained second decoder. The reason is that the first decoder outputs a prediction noise image, while the second decoder outputs a second prediction denoising image. In the steps above, when resizing is used, the image output by a decoder must be resized to produce the final prediction denoising image, and resizing affects the accuracy of the output image. With the first decoder, only the prediction noise image is resized; the original noisy image is not, so the final result image is computed from the resized prediction noise image and the unresized original noisy image, and the content of the denoising image in the final result is generated without resizing. In an image denoising model built on the second decoder, the output is a second prediction denoising image, and the final result is obtained only after resizing this image (using an interpolation algorithm). The denoising image generated via the first decoder therefore preserves detail better than that of the second decoder, and the first decoder is more suitable for image denoising in the deployed model.
Meanwhile, by introducing the classifier to judge whether the input image contains noise, the model determines whether the denoising operation through the first-decoder branch is needed, omitting the unnecessary operation of denoising a noise-free image and saving the time and cost of image denoising.
The following are device embodiments of the present application, and for details not described in detail in the device embodiments of the present application, reference may be made to method embodiments of the present application.
Referring to fig. 12, a block diagram of a training apparatus for an image denoising model according to an embodiment of the present application is shown. The device has the function of realizing the training method of the image denoising model, and the function can be realized by hardware or corresponding software executed by hardware. The device can be the model training equipment introduced above or can be arranged in the model training equipment. The apparatus 1200 may include: a sample acquisition module 1210, an information extraction module 1220, a first processing module 1230, a second processing module 1240, and a parameter adjustment module 1250.
A sample acquisition module 1210 is configured to acquire a training sample of the image denoising model, where the training sample includes a noise-free sample image, a pure noise image, and a noisy sample image generated based on the noise-free sample image and the pure noise image.
An information extraction module 1220 is configured to extract, by the encoder, feature information of the noisy sample image.
A first processing module 1230, configured to process, by using the first decoder, the feature information of the noisy sample image to obtain a prediction noise image corresponding to the noisy sample image; and generating a first prediction denoising image corresponding to the noisy sample image according to the prediction noise image and the noisy sample image.
And a second processing module 1240, configured to process, by using the second decoder, the feature information of the noisy sample image to obtain a second predicted denoising image corresponding to the noisy sample image.
A parameter adjustment module 1250 is configured to determine a training loss of the image denoising model according to the first predicted denoising image, the second predicted denoising image, and the noise-free sample image, and adjust parameters of the image denoising model based on the training loss.
In some embodiments, the parameter adjustment module 1250 is configured to:
determining a first training loss according to the first predicted denoising image and the noiseless sample image, wherein the first training loss is used for measuring the difference between the first predicted denoising image and the noiseless sample image;
Parameters of the encoder and the first decoder are adjusted based on the first training loss.
In some embodiments, the parameter adjustment module 1250 is configured to:
determining a first pixel level loss, a first semantic feature loss, and a first image gradient loss from the first predicted denoised image and the noise-free sample image; wherein the first pixel level loss is used to measure a difference between pixels of the first predictive denoising image and pixels of the noiseless sample image, the first semantic feature loss is used to measure a difference between semantic features of the first predictive denoising image and semantic features of the noiseless sample image, and the first image gradient loss is used to measure a difference between image gradients of the first predictive denoising image and image gradients of the noiseless sample image;
determining the first training loss based on the first pixel level loss, the first semantic feature loss, and the first image gradient loss.
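One hedged way to assemble the first training loss from its three components is sketched below. The pixel term uses L1, the gradient term uses finite differences, and the semantic term is a pooled-statistics stand-in for a real feature network (e.g., a pretrained CNN); the exact norms, weights, and feature extractor are assumptions, not specified in this document.

```python
import numpy as np

def pixel_loss(pred, target):
    # Mean absolute difference between pixels (an L1 choice;
    # the exact norm is an assumption).
    return np.abs(pred - target).mean()

def gradient_loss(pred, target):
    # Compare horizontal and vertical image gradients (finite differences).
    gx = np.abs(np.diff(pred, axis=1) - np.diff(target, axis=1)).mean()
    gy = np.abs(np.diff(pred, axis=0) - np.diff(target, axis=0)).mean()
    return gx + gy

def semantic_loss(pred, target, pool=4):
    # Stand-in for a feature-network comparison: pooled patch statistics.
    def feats(x):
        h, w = x.shape
        return x.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))
    return np.abs(feats(pred) - feats(target)).mean()

def first_training_loss(pred, target, w=(1.0, 1.0, 1.0)):
    # Weighted sum of the three terms; the weights are hypothetical.
    return (w[0] * pixel_loss(pred, target)
            + w[1] * semantic_loss(pred, target)
            + w[2] * gradient_loss(pred, target))

loss = first_training_loss(np.zeros((8, 8)), np.ones((8, 8)))
```

The second training loss combines its three components in the same way, only with the second prediction denoising image as `pred`.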
In some embodiments, the parameter adjustment module 1250 is configured to:
determining a second training loss based on the second predicted de-noised image and the noiseless sample image, the second training loss being used to measure a difference between the second predicted de-noised image and the noiseless sample image;
Parameters of the encoder and the second decoder are adjusted based on the second training loss.
In some embodiments, the parameter adjustment module 1250 is configured to:
determining a second pixel level loss, a second semantic feature loss, and a second image gradient loss from the second predicted denoised image and the noise-free sample image; wherein the second pixel level loss is used to measure a difference between pixels of the second predicted denoising image and pixels of the noiseless sample image, the second semantic feature loss is used to measure a difference between semantic features of the second predicted denoising image and semantic features of the noiseless sample image, and the second image gradient loss is used to measure a difference between image gradients of the second predicted denoising image and image gradients of the noiseless sample image;
determining the second training loss based on the second pixel level loss, the second semantic feature loss, and the second image gradient loss.
In some embodiments, the image denoising model further comprises a discriminator. As shown in fig. 13, the apparatus 1200 further includes: the result acquisition module 1260.
The result obtaining module 1260 is configured to perform pixel level discrimination on the first predicted denoising image and the noiseless sample image by using the discriminator, so as to obtain pixel level discrimination results corresponding to the first predicted denoising image and the noiseless sample image respectively.
The parameter adjustment module 1250 is further configured to determine a discrimination loss corresponding to the discriminator based on the pixel level discrimination result, where the discrimination loss is used to measure accuracy of the pixel level discrimination result output by the discriminator; wherein the discrimination loss is used to adjust parameters of the discriminator and also to adjust parameters of the encoder and the first decoder.
In some embodiments, the discriminator comprises a feature mapping layer, n cascaded downsampling layers, n cascaded upsampling layers, and a feature extraction layer, n being an integer greater than 1. The result obtaining module 1260 is configured to:
processing the first prediction denoising image or the noiseless sample image through the feature mapping layer to obtain an initial feature image of the first prediction denoising image or the noiseless sample image;
performing n-step downsampling processing on the initial feature map through the n cascade downsampling layers to obtain a downsampled feature map;
performing n-step up-sampling processing on the down-sampling feature map through the n cascaded up-sampling layers to obtain an up-sampling feature map; wherein skip connections are used between the downsampling layers and the upsampling layers;
and processing the up-sampling feature map through the feature extraction layer to obtain a pixel level discrimination result of the first prediction denoising image or the noise-free sample image.
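The difference between pixel-level and global discrimination can be shown on the output side. `score_map` stands in for the output of the feature extraction layer; in the real discriminator these scores come from the learned layers described above, so this is only an illustration of the output shapes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pixel_level_discrimination(score_map):
    """Pixel-level discrimination: one real/fake probability PER PIXEL,
    rather than a single scalar for the whole image."""
    return sigmoid(score_map)            # (H, W) probability map in (0, 1)

def global_discrimination(score_map):
    # Global variant, shown for contrast: one probability per image.
    return sigmoid(score_map.mean())

probs = pixel_level_discrimination(np.zeros((8, 8)))
```

The per-pixel map gives the generator a spatially localized training signal, which is the accuracy advantage the embodiment attributes to pixel-level discrimination.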
In some embodiments, the image denoising model further comprises a classifier. As shown in fig. 13, the apparatus 1200 further includes a noise prediction module 1270.
A noise prediction module 1270, configured to process, by using the classifier, the feature information of the noisy sample image or the noise-free sample image, to obtain a noise prediction result of the noisy sample image or the noise-free sample image, where the noise prediction result is used to indicate whether noise is included in the noisy sample image or the noise-free sample image.
The parameter adjustment module 1250 is further configured to determine a third training loss according to the noise prediction result and the noise real result, where the third training loss is used to measure a difference between the noise prediction result and the noise real result; based on the third training loss, parameters of the encoder and the classifier are adjusted.
In some embodiments, the first processing module 1230 is configured to: divide the values of pixels at corresponding positions in the noisy sample image by those in the prediction noise image to obtain the first prediction denoising image; or subtract the values of pixels at corresponding positions in the prediction noise image from those in the noisy sample image to obtain the first prediction denoising image.
In some embodiments, as shown in fig. 13, the apparatus 1200 further includes a data enhancement module (not labeled in the figure).
A data enhancement module for performing the same data enhancement process on the noisy sample image and the noise-free sample image, the data enhancement process comprising at least one of: random scaling, random cropping, random horizontal flipping, random vertical flipping, random rotation, affine transformation, translation-scale-rotation, and perspective transformation; the noisy sample image or the noise-free sample image after data enhancement is input to the encoder for feature extraction.
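The phrase "same data enhancement process" means the noisy and noise-free images must receive identical random transforms so they remain pixel-aligned. A sketch covering a subset of the listed augmentations (flips and 90-degree rotation), with the shared random state made explicit:

```python
import random
import numpy as np

def paired_augment(noisy, clean, rng):
    """Apply the SAME random flips/rotation to both images so the pair
    stays pixel-aligned. Only a subset of the listed augmentations is
    sketched; scaling/cropping/affine would follow the same pattern."""
    if rng.random() < 0.5:                       # random horizontal flip
        noisy, clean = noisy[:, ::-1], clean[:, ::-1]
    if rng.random() < 0.5:                       # random vertical flip
        noisy, clean = noisy[::-1, :], clean[::-1, :]
    k = rng.randrange(4)                         # random 90-degree rotation
    return np.rot90(noisy, k), np.rot90(clean, k)

rng = random.Random(0)
noisy = np.arange(16, dtype=float).reshape(4, 4)
clean = noisy.copy()
aug_noisy, aug_clean = paired_augment(noisy, clean, rng)
```

Drawing each random decision once from a shared `rng` (rather than augmenting the two images independently) is what keeps the training targets valid.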
The embodiments of the present application provide a training method of an image denoising model based on multi-task learning, which enables the image denoising model on the one hand to learn the prediction noise image corresponding to a noisy sample image, and on the other hand to learn the prediction denoising image corresponding to the noisy sample image; the two subtasks promote each other in performance, which helps improve the denoising effect of the finally trained image denoising model. Moreover, the input data of the image denoising model only needs to include a noisy sample image and does not need to include a pure noise image, so the limitation on model input data in the related art can be overcome, improving the universality and practicality of the scheme.
Referring to fig. 14, a block diagram of an image denoising apparatus according to an embodiment of the present application is shown. The device has the function of realizing the image denoising method, and the function can be realized by hardware or corresponding software executed by hardware. The device can be the model using equipment introduced above or can be arranged in the model using equipment. The apparatus 1400 may include: an image acquisition module 1410, an information extraction module 1420, an information processing module 1430, and an image generation module 1440.
An image acquisition module 1410 for acquiring a noisy image to be processed by an image denoising model, the image denoising model comprising an encoder and a first decoder.
An information extraction module 1420 is configured to extract, by the encoder, feature information of the noisy image.
An information processing module 1430, configured to process, by the first decoder, the feature information of the noisy image to obtain a noise image corresponding to the noisy image.
The image generating module 1440 is configured to generate a denoising image corresponding to the noisy image according to the noise image and the noisy image.
In some embodiments, the image generation module 1440 is configured to: divide the values of pixels at corresponding positions in the noisy image by those in the noise image to obtain the first prediction denoising image; or subtract the values of pixels at corresponding positions in the noise image from those in the noisy image to obtain the first prediction denoising image.
In some embodiments, the encoder comprises k cascaded downsampling layers, and the first decoder comprises k cascaded upsampling layers, k being an integer greater than 1. The information processing module 1430 is configured to: perform k-step up-sampling processing on the feature information of the noisy image through the k cascaded upsampling layers to obtain the noise image; wherein skip connections are used between the downsampling layers and the upsampling layers.
In some embodiments, the image denoising model further comprises a classifier. As shown in fig. 15, the apparatus 1400 further includes a noise prediction module 1450, configured to process, by the classifier, the feature information of the noisy image to obtain a noise prediction result of the noisy image, where the noise prediction result is used to indicate whether the noisy image contains noise; and to perform, in the case where the noise prediction result indicates that the noisy image contains noise, the step of processing the feature information of the noisy image through the first decoder to obtain a noise image corresponding to the noisy image.
In some embodiments, as shown in fig. 15, the apparatus 1400 further includes a resizing module 1460, configured to perform a first resizing process on the noisy image to obtain an adjusted noisy image, where the adjusted noisy image is input to the encoder for feature extraction; and to perform a second resizing process on the noise image to obtain an adjusted noise image, where the second resizing process is the inverse of the first resizing process.
The image generating module 1440 is further configured to generate a denoising image corresponding to the noisy image according to the adjusted noise image and the noisy image.
In using the image denoising model, this embodiment performs image denoising with the trained first decoder rather than with the trained second decoder. The reason is that the first decoder outputs a prediction noise image, while the second decoder outputs a second prediction denoising image. In the steps above, when resizing is used, the image output by a decoder must be resized to produce the final prediction denoising image, and resizing affects the accuracy of the output image. With the first decoder, only the prediction noise image is resized; the original noisy image is not, so the final result image is computed from the resized prediction noise image and the unresized original noisy image, and the content of the denoising image in the final result is generated without resizing. In an image denoising model built on the second decoder, the output is a second prediction denoising image, and the final result is obtained only after resizing this image (using an interpolation algorithm). The denoising image generated via the first decoder therefore preserves detail better than that of the second decoder, and the first decoder is more suitable for image denoising in the deployed model.
It should be noted that, in the apparatus provided in the foregoing embodiment, when implementing the functions thereof, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be implemented by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
Referring to fig. 16, a schematic structural diagram of a computer device according to an embodiment of the present application is shown. The computer device may be any electronic device with data computing, processing and storage capabilities, such as a mobile phone, a tablet computer, a PC (Personal Computer) or a server. The computer device may be implemented as the model-using device for implementing the image denoising method provided in the above embodiments; alternatively, the computer device may be implemented as the model training device for implementing the training method of the image denoising model provided in the above embodiments. Specifically:
The computer device 1600 includes a central processing unit (such as a CPU (Central Processing Unit, central processing unit), a GPU (Graphics Processing Unit, graphics processor), an FPGA (Field Programmable Gate Array ), etc.) 1601, a system Memory 1604 including a RAM (Random-Access Memory) 1602 and a ROM (Read-Only Memory) 1603, and a system bus 1605 connecting the system Memory 1604 and the central processing unit 1601. The computer device 1600 also includes a basic input/output system (Input Output System, I/O system) 1606 to facilitate transfer of information between the various devices within the server, and a mass storage device 1607 for storing an operating system 1613, application programs 1614, and other program modules 1615.
The basic input/output system 1606 includes a display 1608 for displaying information and an input device 1609, such as a mouse, keyboard, etc., for user input of information. Wherein the display 1608 and the input device 1609 are both coupled to the central processing unit 1601 by way of an input output controller 1610 coupled to the system bus 1605. The basic input/output system 1606 may also include an input/output controller 1610 for receiving and processing input from a keyboard, mouse, or electronic stylus among a plurality of other devices. Similarly, the input-output controller 1610 also provides output to a display screen, printer, or other type of output device.
The mass storage device 1607 is connected to the central processing unit 1601 by a mass storage controller (not shown) connected to the system bus 1605. The mass storage device 1607 and its associated computer-readable media provide non-volatile storage for the computer device 1600. That is, the mass storage device 1607 may include a computer readable medium (not shown) such as a hard disk or CD-ROM (Compact Disc Read-Only Memory) drive.
Without loss of generality, the computer readable medium may include computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid state memory technology, CD-ROM, DVD (Digital Video Disc) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that the computer storage medium is not limited to the ones described above. The system memory 1604 and mass storage device 1607 described above may be collectively referred to as memory.
According to an embodiment of the application, the computer device 1600 may also run by connecting through a network, such as the Internet, to a remote computer on that network. That is, the computer device 1600 may be connected to the network 1612 through a network interface unit 1611 coupled to the system bus 1605; alternatively, the network interface unit 1611 may be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes at least one instruction, at least one program, code set, or instruction set stored in the memory and configured to be executed by the one or more processors to implement the image denoising method or training method of an image denoising model described above.
In an exemplary embodiment, there is also provided a computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes or a set of instructions, which when executed by a processor of a computer device, implements the image denoising method or the training method of an image denoising model provided in the above embodiment.
Alternatively, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random-Access Memory), SSD (Solid State Drives, solid State disk), optical disk, or the like. The random access memory may include ReRAM (Resistance Random Access Memory, resistive random access memory) and DRAM (Dynamic Random Access Memory ), among others.
In an exemplary embodiment, a computer program product or computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image denoising method or the training method of the image denoising model.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist: for example, "A and/or B" may mean that A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. In addition, the step numbers described herein merely illustrate one possible execution order among the steps; in some other embodiments, the steps may be executed out of numerical order, for example two differently numbered steps may be executed simultaneously, or in an order opposite to that shown, which is not limited by the embodiments of the present application.
The foregoing is illustrative of the present application and is not to be construed as limiting thereof, but rather as providing for the use of various modifications, equivalents, improvements or alternatives falling within the spirit and principles of the present application.

Claims (20)

1. A training method of an image denoising model, characterized in that the image denoising model comprises an encoder, a first decoder and a second decoder, and the method comprises the following steps:
acquiring a training sample of the image denoising model, wherein the training sample comprises a noise-free sample image, a pure noise image and a noisy sample image generated based on the noise-free sample image and the pure noise image;
extracting characteristic information of the noisy sample image by the encoder;
processing the characteristic information of the noisy sample image through the first decoder to obtain a prediction noise image corresponding to the noisy sample image; generating a first prediction denoising image corresponding to the noisy sample image according to the prediction noise image and the noisy sample image;
processing the characteristic information of the noisy sample image through the second decoder to obtain a second prediction denoising image corresponding to the noisy sample image;
And determining training loss of the image denoising model according to the first prediction denoising image, the second prediction denoising image and the noise-free sample image, and adjusting parameters of the image denoising model based on the training loss.
2. The method of claim 1, wherein the determining a training loss of the image denoising model from the first predicted denoising image, the second predicted denoising image, and the noise-free sample image, and adjusting parameters of the image denoising model based on the training loss, comprises:
determining a first training loss according to the first predicted denoising image and the noiseless sample image, wherein the first training loss is used for measuring the difference between the first predicted denoising image and the noiseless sample image;
parameters of the encoder and the first decoder are adjusted based on the first training loss.
3. The method of claim 2, wherein the determining a first training loss from the first predicted denoised image and the noise-free sample image comprises:
determining a first pixel level loss, a first semantic feature loss, and a first image gradient loss from the first predicted denoised image and the noise-free sample image; wherein the first pixel level loss is used to measure a difference between pixels of the first predictive denoising image and pixels of the noiseless sample image, the first semantic feature loss is used to measure a difference between semantic features of the first predictive denoising image and semantic features of the noiseless sample image, and the first image gradient loss is used to measure a difference between image gradients of the first predictive denoising image and image gradients of the noiseless sample image;
Determining the first training loss based on the first pixel level loss, the first semantic feature loss, and the first image gradient loss.
4. The method of claim 1, wherein the determining a training loss of the image denoising model from the first predicted denoising image, the second predicted denoising image, and the noise-free sample image, and adjusting parameters of the image denoising model based on the training loss, comprises:
determining a second training loss based on the second predicted de-noised image and the noiseless sample image, the second training loss being used to measure a difference between the second predicted de-noised image and the noiseless sample image;
parameters of the encoder and the second decoder are adjusted based on the second training loss.
5. The method of claim 4, wherein the determining a second training loss from the second predicted denoised image and the noise-free sample image comprises:
determining a second pixel-level loss, a second semantic feature loss, and a second image gradient loss from the second predicted denoised image and the noise-free sample image; wherein the second pixel-level loss is used to measure the difference between the pixels of the second predicted denoised image and the pixels of the noise-free sample image, the second semantic feature loss is used to measure the difference between the semantic features of the second predicted denoised image and the semantic features of the noise-free sample image, and the second image gradient loss is used to measure the difference between the image gradients of the second predicted denoised image and the image gradients of the noise-free sample image;
determining the second training loss based on the second pixel-level loss, the second semantic feature loss, and the second image gradient loss.
6. The method of claim 1, wherein the image denoising model further comprises a discriminator, and the method further comprises:
performing pixel-level discrimination on the first predicted denoised image and the noise-free sample image through the discriminator to obtain pixel-level discrimination results corresponding to the first predicted denoised image and the noise-free sample image respectively;
determining a discrimination loss corresponding to the discriminator based on the pixel-level discrimination results, wherein the discrimination loss is used to measure the accuracy of the pixel-level discrimination results output by the discriminator;
wherein the discrimination loss is used to adjust parameters of the discriminator, and also to adjust parameters of the encoder and the first decoder.
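A minimal sketch of the pixel-level adversarial training implied by claim 6, assuming a per-pixel binary cross-entropy over the discriminator's realness map (the patent does not fix the loss form, so BCE is an assumption):

```python
import numpy as np

def pixel_bce(pred_map, target):
    """Binary cross-entropy averaged over a per-pixel realness map."""
    eps = 1e-7
    p = np.clip(pred_map, eps, 1 - eps)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

def discriminator_loss(d_on_real, d_on_fake):
    """The discriminator should call the clean image real (1) and the
    first predicted denoised image fake (0) at every pixel."""
    return pixel_bce(d_on_real, 1.0) + pixel_bce(d_on_fake, 0.0)

def generator_adv_loss(d_on_fake):
    """The encoder and first decoder are pushed to make every pixel of the
    predicted denoised image look real to the discriminator."""
    return pixel_bce(d_on_fake, 1.0)
```

The two losses pull in opposite directions, which is the standard adversarial setup: `discriminator_loss` updates the discriminator, while `generator_adv_loss` contributes to the gradient of the encoder and first decoder.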
7. The method of claim 6, wherein the discriminator comprises a feature mapping layer, n cascaded downsampling layers, n cascaded upsampling layers, and a feature extraction layer, n being an integer greater than 1;
the performing pixel-level discrimination on the first predicted denoised image and the noise-free sample image through the discriminator to obtain pixel-level discrimination results corresponding to the first predicted denoised image and the noise-free sample image respectively comprises:
processing the first predicted denoised image or the noise-free sample image through the feature mapping layer to obtain an initial feature map of the first predicted denoised image or the noise-free sample image;
performing n downsampling steps on the initial feature map through the n cascaded downsampling layers to obtain a downsampled feature map;
performing n upsampling steps on the downsampled feature map through the n cascaded upsampling layers to obtain an upsampled feature map, wherein the downsampling layers and the upsampling layers are connected by skip connections;
and processing the upsampled feature map through the feature extraction layer to obtain a pixel-level discrimination result of the first predicted denoised image or the noise-free sample image.
8. The method of claim 1, wherein the image denoising model further comprises a classifier, and the method further comprises:
processing the feature information of the noisy sample image or the noise-free sample image through the classifier to obtain a noise prediction result of the noisy sample image or the noise-free sample image, the noise prediction result indicating whether the noisy sample image or the noise-free sample image contains noise;
determining a third training loss according to the noise prediction result and the ground-truth noise result, the third training loss being used to measure the difference between the noise prediction result and the ground-truth noise result;
adjusting parameters of the encoder and the classifier based on the third training loss.
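Claim 8's classifier and third training loss can be illustrated with a toy linear head on pooled encoder features and a binary cross-entropy; the pooling, the linear head, and the loss form are assumptions for illustration only:

```python
import numpy as np

def noise_classifier(features, w, b):
    """Toy linear head on globally pooled encoder features, returning the
    probability that the input image contains noise. Stands in for the
    patent's (unspecified) classifier architecture."""
    logit = float(features.mean() * w + b)
    return 1.0 / (1.0 + np.exp(-logit))

def third_training_loss(p_noisy, is_noisy):
    """Binary cross-entropy between the noise prediction result and the
    ground-truth label (1 = noisy sample image, 0 = noise-free sample image)."""
    eps = 1e-7
    p = min(max(p_noisy, eps), 1 - eps)
    return -(is_noisy * np.log(p) + (1 - is_noisy) * np.log(1 - p))
```

Because the loss backpropagates through the shared encoder as well as the classifier, the auxiliary task nudges the encoder toward features that separate noisy from clean inputs.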
9. The method of claim 1, wherein the generating a first predicted denoised image corresponding to the noisy sample image from the predicted noise image and the noisy sample image comprises:
dividing the values of the pixels at corresponding positions in the noisy sample image by the values of the pixels at corresponding positions in the predicted noise image to obtain the first predicted denoised image;
or,
subtracting the values of the pixels at corresponding positions in the predicted noise image from the values of the pixels at corresponding positions in the noisy sample image to obtain the first predicted denoised image.
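The two alternatives of claim 9 amount to an element-wise subtraction (suited to an additive noise model) or an element-wise division (suited to a multiplicative noise model); a sketch, with the epsilon added only as a numerical safeguard:

```python
import numpy as np

def denoise_subtract(noisy, noise):
    """Additive model: denoised = noisy - predicted noise."""
    return noisy - noise

def denoise_divide(noisy, noise, eps=1e-7):
    """Multiplicative model: denoised = noisy / predicted noise.
    eps guards against division by zero and is not part of the claim."""
    return noisy / (noise + eps)
```

If the noisy sample image was synthesized as clean + noise, subtraction of a perfect noise prediction recovers the clean image exactly; the divisive form plays the same role when it was synthesized as clean × noise.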
10. The method of any one of claims 1 to 9, further comprising, after the obtaining the training samples of the image denoising model:
performing the same data enhancement processing on the noisy sample image and the noise-free sample image, the data enhancement processing including at least one of: random scaling, random cropping, random horizontal flipping, random vertical flipping, random rotation, affine transformation, translation-scale-rotation, and perspective transformation;
wherein the noisy sample image or the noise-free sample image after the data enhancement processing is input to the encoder for feature extraction.
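The key point of claim 10 is that the random augmentation parameters are sampled once and applied identically to both images so the pair stays pixel-aligned. A sketch covering flips and 90° rotations; the other listed transforms (scaling, cropping, affine, perspective) would follow the same sample-once, apply-twice pattern:

```python
import numpy as np

def paired_augment(noisy, clean, rng):
    """Apply the SAME random flips/rotation to the noisy and clean images.
    Sampling each parameter once keeps the training pair aligned."""
    if rng.random() < 0.5:               # random horizontal flip
        noisy, clean = noisy[:, ::-1], clean[:, ::-1]
    if rng.random() < 0.5:               # random vertical flip
        noisy, clean = noisy[::-1, :], clean[::-1, :]
    k = rng.integers(0, 4)               # random rotation by k * 90 degrees
    return np.rot90(noisy, k), np.rot90(clean, k)
```

Because both images undergo the identical geometric transform, the per-pixel noise residual (noisy minus clean) is preserved at every position.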
11. A method of denoising an image, the method comprising:
acquiring a noisy image to be processed by an image denoising model, the image denoising model comprising an encoder and a first decoder;
extracting feature information of the noisy image through the encoder;
processing the feature information of the noisy image through the first decoder to obtain a noise image corresponding to the noisy image;
and generating a denoised image corresponding to the noisy image according to the noise image and the noisy image.
12. The method of claim 11, wherein the generating a denoised image corresponding to the noisy image from the noise image and the noisy image comprises:
dividing the values of the pixels at corresponding positions in the noisy image by the values of the pixels at corresponding positions in the noise image to obtain the denoised image;
or,
subtracting the values of the pixels at corresponding positions in the noise image from the values of the pixels at corresponding positions in the noisy image to obtain the denoised image.
13. The method of claim 11, wherein the encoder comprises k cascaded downsampling layers, the first decoder comprises k cascaded upsampling layers, and k is an integer greater than 1;
the processing the feature information of the noisy image through the first decoder to obtain a noise image corresponding to the noisy image comprises:
performing k upsampling steps on the feature information of the noisy image through the k cascaded upsampling layers to obtain the noise image, wherein the downsampling layers and the upsampling layers are connected by skip connections.
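The skip-connected downsampling/upsampling structure of claim 13 can be shown with non-learned placeholder layers; `down`, `up`, and the addition-based skip below are structural stand-ins for real convolutional layers, chosen only to make the wiring concrete:

```python
import numpy as np

def down(x):
    """Placeholder downsampling layer: 2x2 average pooling."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(x):
    """Placeholder upsampling layer: 2x nearest-neighbour upsampling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def unet_noise_head(x, k=2):
    """Structure-only sketch: k cascaded downsampling steps, k cascaded
    upsampling steps, and a skip connection joining each pair of layers at
    the same spatial scale."""
    skips = []
    for _ in range(k):           # encoder: k cascaded downsampling layers
        skips.append(x)
        x = down(x)
    for _ in range(k):           # decoder: k cascaded upsampling layers
        x = up(x) + skips.pop()  # skip connection from the matching scale
    return x
```

The skip connections let high-resolution detail bypass the bottleneck, which is why the output keeps the input's spatial size and fine structure.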
14. The method of claim 11, wherein the image denoising model further comprises a classifier, and the method further comprises:
processing the feature information of the noisy image through the classifier to obtain a noise prediction result of the noisy image, the noise prediction result indicating whether the noisy image contains noise;
and performing, in the case that the noise prediction result indicates that the noisy image contains noise, the step of processing the feature information of the noisy image through the first decoder to obtain a noise image corresponding to the noisy image.
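Claim 14's gating can be sketched as: run the first decoder only when the classifier predicts the input is noisy, otherwise skip the decoding cost entirely. The 0.5 threshold and the subtraction-based reconstruction are assumptions; `classify` and `decode_noise` stand in for the classifier and first decoder:

```python
import numpy as np

def denoise_if_noisy(noisy, classify, decode_noise, threshold=0.5):
    """Run the decoder only when the classifier flags the image as noisy.
    classify: image -> probability of containing noise (assumed interface).
    decode_noise: image -> predicted noise image (assumed interface)."""
    if classify(noisy) < threshold:
        return noisy                     # predicted noise-free: return as-is
    return noisy - decode_noise(noisy)   # subtract the predicted noise image
```

Skipping the decoder on already-clean inputs both saves computation and avoids introducing artifacts into images that need no denoising.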
15. The method of any one of claims 11 to 14, further comprising:
performing first size adjustment processing on the noisy image to obtain an adjusted noisy image, wherein the adjusted noisy image is input to the encoder for feature extraction;
performing second size adjustment processing on the noise image to obtain an adjusted noise image, wherein the second size adjustment processing is the inverse of the first size adjustment processing;
wherein the generating a denoised image corresponding to the noisy image according to the noise image and the noisy image comprises:
generating a denoised image corresponding to the noisy image according to the adjusted noise image and the noisy image.
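One plausible reading of claim 15's paired size adjustments, assuming the first adjustment pads each side up to a multiple of 2**k (so that k downsampling steps divide evenly) and the second crops back to the original size; this padding choice is an assumption, and interpolation-based scaling with the inverse scale factor would follow the same forward/inverse pattern:

```python
import numpy as np

def pad_to_multiple(img, m):
    """First size adjustment: reflect-pad so each side is a multiple of m
    (e.g. m = 2**k). Returns the padded image and the original size."""
    h, w = img.shape
    ph, pw = (-h) % m, (-w) % m
    return np.pad(img, ((0, ph), (0, pw)), mode="reflect"), (h, w)

def crop_back(img, size):
    """Second size adjustment: the exact inverse of the padding above."""
    h, w = size
    return img[:h, :w]
```

Because the second adjustment is the exact inverse of the first, the adjusted noise image lines up pixel-for-pixel with the original noisy image before the final subtraction or division.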
16. A training apparatus for an image denoising model, wherein the image denoising model comprises an encoder, a first decoder, and a second decoder; the apparatus comprises:
a sample acquisition module, configured to acquire training samples of the image denoising model, the training samples comprising a noise-free sample image, a pure noise image, and a noisy sample image generated based on the noise-free sample image and the pure noise image;
an information extraction module, configured to extract feature information of the noisy sample image through the encoder;
a first processing module, configured to process the feature information of the noisy sample image through the first decoder to obtain a predicted noise image corresponding to the noisy sample image, and to generate a first predicted denoised image corresponding to the noisy sample image according to the predicted noise image and the noisy sample image;
a second processing module, configured to process the feature information of the noisy sample image through the second decoder to obtain a second predicted denoised image corresponding to the noisy sample image;
and a parameter adjustment module, configured to determine a training loss of the image denoising model according to the first predicted denoised image, the second predicted denoised image, and the noise-free sample image, and to adjust parameters of the image denoising model based on the training loss.
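The modules of claim 16 compose into one forward training step: a shared encoder, a first decoder that predicts the noise image (so the first denoised image is the input minus that noise), and a second decoder that predicts the denoised image directly, with both predictions scored against the clean image. The placeholder networks and L1 losses below are illustrative stand-ins for the learned components:

```python
import numpy as np

def training_step(noisy, clean, encoder, decoder1, decoder2):
    """One forward pass of the dual-decoder scheme. encoder, decoder1, and
    decoder2 are placeholders for the learned networks; L1 is an assumed
    loss form, not specified here."""
    feats = encoder(noisy)
    pred_noise = decoder1(feats)
    denoised1 = noisy - pred_noise      # first predicted denoised image
    denoised2 = decoder2(feats)         # second predicted denoised image
    loss1 = np.mean(np.abs(denoised1 - clean))
    loss2 = np.mean(np.abs(denoised2 - clean))
    return loss1 + loss2                # total loss driving parameter updates
```

Because both decoders share one encoder, both loss terms shape the same feature extractor, while each decoder specializes: one in modelling the noise, the other in reconstructing the signal.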
17. An image denoising apparatus, comprising:
an image acquisition module, configured to acquire a noisy image to be processed by an image denoising model, the image denoising model comprising an encoder and a first decoder;
an information extraction module, configured to extract feature information of the noisy image through the encoder;
an information processing module, configured to process the feature information of the noisy image through the first decoder to obtain a noise image corresponding to the noisy image;
and an image generation module, configured to generate a denoised image corresponding to the noisy image according to the noise image and the noisy image.
18. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the training method of the image denoising model of any one of claims 1 to 10, or to implement the image denoising method of any one of claims 11 to 15.
19. A computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the training method of the image denoising model of any one of claims 1 to 10, or to implement the image denoising method of any one of claims 11 to 15.
20. A computer program product or computer program, comprising computer instructions stored in a computer-readable storage medium, wherein a processor reads and executes the computer instructions from the computer-readable storage medium to implement the training method of the image denoising model of any one of claims 1 to 10, or to implement the image denoising method of any one of claims 11 to 15.
CN202210089889.XA 2022-01-25 2022-01-25 Training method of image denoising model, image denoising method, device and equipment Pending CN116543246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210089889.XA CN116543246A (en) 2022-01-25 2022-01-25 Training method of image denoising model, image denoising method, device and equipment

Publications (1)

Publication Number Publication Date
CN116543246A true CN116543246A (en) 2023-08-04

Family

ID=87447652


Country Status (1)

Country Link
CN (1) CN116543246A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649358A (en) * 2024-01-30 2024-03-05 Tencent Technology (Shenzhen) Co., Ltd. Image processing method, device, equipment and storage medium
CN117649358B (en) * 2024-01-30 2024-04-16 Tencent Technology (Shenzhen) Co., Ltd. Image processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109558832B (en) Human body posture detection method, device, equipment and storage medium
CN106934397B (en) Image processing method and device and electronic equipment
CN112132959B (en) Digital rock core image processing method and device, computer equipment and storage medium
US11741581B2 (en) Training method for image processing model, image processing method, network device, and storage medium
CN108205803B (en) Image processing method, and training method and device of neural network model
WO2022134971A1 (en) Noise reduction model training method and related apparatus
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN111444807B (en) Target detection method, device, electronic equipment and computer readable medium
CN111738243A (en) Method, device and equipment for selecting face image and storage medium
CN112488923A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN111626932A (en) Super-resolution reconstruction method and device for image
KR20200144398A (en) Apparatus for performing class incremental learning and operation method thereof
CN111626134B (en) Dense crowd counting method, system and terminal based on hidden density distribution
CN116958131B (en) Image processing method, device, equipment and storage medium
CN111860077A (en) Face detection method, face detection device, computer-readable storage medium and equipment
CN111951192A (en) Shot image processing method and shooting equipment
CN117934824A (en) Target region segmentation method and system for ultrasonic image and electronic equipment
CN116543246A (en) Training method of image denoising model, image denoising method, device and equipment
CN113807354B (en) Image semantic segmentation method, device, equipment and storage medium
CN117745956A (en) Pose guidance-based image generation method, device, medium and equipment
CN117392293A (en) Image processing method, device, electronic equipment and storage medium
CN111815748A (en) Animation processing method and device, storage medium and electronic equipment
Chen et al. Capture the devil in the details via partition-then-ensemble on higher resolution images
CN116309274B (en) Method and device for detecting small target in image, computer equipment and storage medium
CN112990215B (en) Image denoising method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40091063

Country of ref document: HK