CN115689902A - Image restoration model training method, image restoration device and storage medium - Google Patents


Info

Publication number
CN115689902A
CN115689902A (application CN202110830977.6A)
Authority
CN
China
Prior art keywords: image, ith, loss value, parameter, determining
Prior art date
Legal status: Pending (status is an assumption, not a legal conclusion)
Application number
CN202110830977.6A
Other languages
Chinese (zh)
Inventor
左力文
夏叶锋
彭湃
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Hangzhou Information Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202110830977.6A
Publication of CN115689902A

Landscapes

  • Image Processing (AREA)

Abstract

The application discloses an image restoration model training method, an image restoration method and apparatus, and a storage medium. The training method comprises the following steps: inputting a first image to be restored, together with the ith of N random noises, into the generation network of an image restoration model to generate an ith second image; determining a first loss value according to the relative entropy of a first parameter and a second parameter, where the first parameter is determined from the normalization result of the ith random noise against the N random noises, the second parameter is determined from the normalization result of the ith second image against the N second images correspondingly generated from the N random noises, and N is an integer greater than 1; and updating the weight parameters of the image restoration model according to the determined first loss value.

Description

Image restoration model training method, image restoration device and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image inpainting model training method, an image inpainting apparatus, an electronic device, and a storage medium.
Background
Image inpainting refers to the process of reconstructing missing or damaged parts of an image. In some application scenarios, one image to be restored may correspond to multiple reasonable restoration results: for an image in which the nose region of a face is damaged, for example, a reasonable restored image may show either a high nose bridge or a flat nose bridge.
In the related art, methods that restore an image based on random noise can produce only a single restored image for a given image to be restored; they cannot provide diverse restored images for one input.
Disclosure of Invention
In view of this, embodiments of the present application provide an image inpainting model training method, an image inpainting method and apparatus, an electronic device, and a storage medium, so as to at least solve the problem in the related art that diverse restored images cannot be provided for one image to be restored.
The technical solutions of the embodiments of the present application are realized as follows:
the embodiment of the application provides an image inpainting model training method, which comprises the following steps:
inputting the first image to be restored and the ith random noise in the N random noises into a generation network of an image restoration model to generate an ith second image;
determining a first loss value according to the relative entropy of the first parameter and the second parameter; the first parameter is determined according to the normalization results of the ith random noise and the N random noises; the second parameter is determined according to the normalization result of the N second images correspondingly generated by the ith second image and the N random noises; n is an integer greater than 1;
and updating the weight parameter of the image restoration model according to the determined first loss value.
In the foregoing solution, before the updating the weight parameter of the image restoration model according to the determined first loss value, the method further includes:
classifying the ith second image through a discrimination network of the image restoration model, and determining a second loss value based on a classification result of the ith second image and a classification result of a third image; the third image represents a calibration result corresponding to the first image;
determining a third loss value based on the ith second image and the third image;
the updating the weight parameter of the image restoration model according to the determined first loss value comprises the following steps:
and updating the weight parameter of the image restoration model according to the determined first loss value, second loss value and third loss value.
In the foregoing solution, the determining a third loss value based on the ith second image and the third image includes:
determining a fourth loss value according to the norm of the ith second image and the third image;
determining a first feature vector based on the ith second image, determining a second feature vector based on the third image, and determining a fifth loss value according to the norm of the vector difference between the first feature vector and the second feature vector;
determining the third loss value according to the fourth loss value and the fifth loss value.
In the above solution, before the inputting the first image to be restored and the ith random noise in the N random noises into the network for generating the image restoration model, the method further includes:
determining a fourth image from the set sample gallery;
and performing mask processing on a set area of the determined fourth image to obtain the first image.
In the foregoing solution, the determining the fourth image from the set sample gallery includes:
and carrying out target detection on a fifth image in the set sample gallery, cutting the fifth image based on a target rectangular frame positioned in the target detection process, and determining the fourth image.
The embodiment of the application further provides an image restoration method, which comprises the following steps:
inputting a sixth image to be restored and N random noises into a generation network of an image restoration model, and outputting N seventh images; wherein,
the image restoration model is obtained by adopting any one of the image restoration model training methods.
The embodiment of the present application further provides an image inpainting model training device, including:
the generating unit is used for inputting the first image to be repaired and the ith random noise in the N random noises into a generating network of the image repairing model to generate an ith second image;
the first processing unit is used for determining a first loss value according to the relative entropy of the first parameter and the second parameter; the first parameter is determined according to the normalization results of the ith random noise and the N random noises; the second parameter is determined according to the normalization result of the N second images correspondingly generated by the ith second image and the N random noises; n is an integer greater than 1;
and the training unit is used for updating the weight parameter of the image restoration model according to the determined first loss value.
An embodiment of the present application further provides an image restoration device, including:
the restoration unit is used for inputting a sixth image to be restored and N random noises into a generation network of the image restoration model and outputting N seventh images; wherein,
the image restoration model is obtained by adopting any one of the image restoration model training methods.
An embodiment of the present application further provides a first electronic device, including: a first processor and a first communication interface; wherein,
the first processor is used for inputting the first image to be restored and the ith random noise in the N random noises into a generation network of the image restoration model to generate an ith second image;
determining a first loss value according to the relative entropy of the first parameter and the second parameter; the first parameter is determined according to the normalization results of the ith random noise and the N random noises; the second parameter is determined according to the normalization result of the N second images correspondingly generated by the ith second image and the N random noises; n is an integer greater than 1;
and updating the weight parameter of the image restoration model according to the determined first loss value.
An embodiment of the present application further provides a second electronic device, including: a second processor and a second communication interface; wherein,
the second processor is used for inputting a sixth image to be restored and N random noises into a generation network of the image restoration model and outputting N seventh images; wherein,
the image restoration model is obtained by adopting any one of the image restoration model training methods.
An embodiment of the present application further provides a first electronic device, including: a first processor and a first memory for storing a computer program capable of running on the processor,
wherein the first processor is configured to execute the steps of any one of the image inpainting model training methods when the computer program is executed.
An embodiment of the present application further provides a second electronic device, including: a second processor and a second memory for storing a computer program capable of running on the processor,
wherein the second processor is configured to execute the steps of the image inpainting method when the computer program is run.
An embodiment of the present application further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the image inpainting model training method described in any one of the above, or implements the steps of the image inpainting method described above.
According to the image restoration model training method, the image restoration method and apparatus, the electronic device, and the storage medium provided by the embodiments of the application, the first image to be restored and the ith of N random noises are input into the generation network of the image restoration model to generate the ith second image. The normalization result of the ith random noise against the N random noises is used as a first parameter, and the normalization result of the ith second image against the N second images correspondingly generated from the N random noises is used as a second parameter. A first loss value is determined based on the relative entropy of the first parameter and the second parameter, and the weight parameters of the generation network of the image restoration model are updated according to the first loss value. Because the weight parameters of the generation network are updated through the first loss value during training, the generated N second images overcome the mode-collapse problem of generative adversarial networks. In this way, diversified restoration results corresponding to the same image to be restored can be generated from a plurality of random noises.
Drawings
Fig. 1 is a schematic flowchart of an image inpainting model training method according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of an image restoration method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of an image restoration method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an image restoration model according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an image inpainting model training device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image restoration apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a first electronic device according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
Detailed Description
Image inpainting refers to the process of reconstructing missing or damaged parts of an image. In some application scenarios, one image to be restored may correspond to multiple reasonable restoration results: for an image in which the nose region of a face is damaged, for example, a reasonable restored image may show either a high nose bridge or a flat nose bridge.
In the related art, methods that restore an image based on random noise can produce only a single restored image for a given image to be restored; they cannot provide diverse restored images for one input.
Based on this, in the image inpainting model training method, image inpainting method, apparatus, electronic device, and storage medium provided in the embodiments of the present application, the first image to be restored and the ith of N random noises are input into the generation network of the image restoration model to generate the ith second image. The normalization result of the ith random noise against the N random noises is used as the first parameter, and the normalization result of the ith second image against the N second images correspondingly generated from the N random noises is used as the second parameter. A first loss value is determined based on the relative entropy of the two parameters, and the weight parameters of the generation network of the image restoration model are updated accordingly. Because the weight parameters of the generation network are updated through the first loss value during training, second images generated from noises that are close in the random-noise space remain close in image space, and second images generated from noises that are far apart remain far apart, which solves the mode-collapse problem of generative adversarial networks. In this way, diversified restoration results corresponding to the same image to be restored can be generated from a plurality of random noises.
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As shown in fig. 1, the flow diagram of the image inpainting model training method provided in the embodiment of the present application includes:
step 101: and inputting the first image to be restored and the ith random noise in the N random noises into a generation network of the image restoration model to generate an ith second image.
Wherein N is an integer greater than 1.
The first image to be restored and the ith random noise among the N random noises are input into the generation network of the image restoration model to generate the ith second image, where N is an integer greater than 1 and i is any integer from 1 to N. Here, the first image represents the image to be restored that is input to the image restoration model, and the second image represents the restored image corresponding to the first image that the model outputs. The image restoration model may be a generative adversarial network (GAN), such as a conditional generative adversarial network (CGAN), which feeds additional information as a condition into both the generation network and the discrimination network.
The random noise may be input into the generation network of the image restoration model in the form of a feature vector, and may be obtained by drawing random samples from a prior distribution, such as a uniform distribution or a multidimensional Gaussian distribution with covariance matrix I. The value of N may be a power of 2, such as 8, 16, or 32, chosen according to the video memory of the training device, e.g. the GPU.
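As a concrete illustration of this sampling step, the sketch below draws N noise vectors from a standard multidimensional Gaussian with covariance matrix I; NumPy stands in for whatever training framework is actually used, and all names are illustrative:

```python
import numpy as np

def sample_noise_batch(n, dim, rng=None):
    """Draw n random noise vectors from a standard multidimensional
    Gaussian (zero mean, covariance matrix I), one per second image."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return rng.standard_normal((n, dim))

# N is typically a power of 2 chosen to fit the GPU's video memory.
noise = sample_noise_batch(16, 128)
print(noise.shape)  # (16, 128)
```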
Step 102: a first loss value is determined based on the relative entropy of the first parameter and the second parameter.
Wherein the first parameter is determined according to the normalization results of the ith random noise and the N random noises; and the second parameter is determined according to the normalization result of the N second images correspondingly generated by the ith second image and the N random noises.
Determining a first parameter according to the normalization result of the ith random noise and the N random noises, determining a second parameter according to the normalization result of the ith second image and the N second images correspondingly generated by the N random noises, and determining a first loss value based on the relative entropy result of the first parameter and the second parameter.
Step 103: and updating the weight parameter of the image restoration model according to the determined first loss value.
The weight parameters of the image restoration model are updated so that the restored images output by its generation network score better in the discrimination network.
During training, the generation network and the discrimination network of the image restoration model are trained alternately and iteratively, and the weight parameters are updated until the two reach a dynamic balance, i.e. a Nash equilibrium. In the optimal state the discrimination network cannot distinguish generated restored images from real images; with the original GAN loss, the discriminator assigns both a probability of 0.5.
While the loss value is back-propagated through each layer of the image restoration model, the gradient of the loss function is computed from the loss value, and the weight parameters of the current layer are updated along the direction of gradient descent.
And taking the updated weight parameters as the weight parameters used by the trained image restoration model.
Here, an update stop condition may be set; when it is satisfied, the weight parameters obtained in the last update are taken as those of the trained image restoration model. The stop condition may be a set number of training iterations, where one iteration is one pass of training the image restoration model on the first image. Of course, the stop condition is not limited to this; it may also be, for example, a threshold on the value of the loss function.
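The iteration-count and loss-threshold stop conditions described above can be sketched as a minimal training loop; `update_step` is a hypothetical callable representing one alternating update of the generation and discrimination networks:

```python
def train(update_step, max_iterations=100, loss_threshold=1e-3):
    """Run training iterations until either the set iteration count is
    reached or the loss value drops below the set threshold, then return
    the number of iterations performed and the final loss."""
    loss = float("inf")
    it = 0
    for it in range(max_iterations):
        loss = update_step(it)      # one pass of training on the first image
        if loss < loss_threshold:   # alternative stop condition: loss threshold
            break
    return it + 1, loss

# Toy update step whose loss halves every iteration.
iterations, final_loss = train(lambda it: 0.5 ** it)
print(iterations, final_loss)  # stops once 0.5**it < 1e-3
```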
It should be noted that a loss function measures the degree of inconsistency between the predicted value and the true (calibration) value of the model. In practical applications, model training is achieved by minimizing the loss function.
Back propagation is defined relative to forward propagation: forward propagation is the feed-forward processing of the model, and back propagation runs in the opposite direction, updating the weight parameters of each layer of the model according to the model's output result.
Therefore, when the trained image restoration model generates restored images, the mode-collapse problem of generative adversarial networks is overcome, and diversified restoration results corresponding to the same image to be restored can be generated from a plurality of random noises for the user to choose from.
In practice, the first loss value may be determined in the following manner.
The ith first parameter P_i can be determined by formula (1). The original formula is an image missing from this copy; the reconstruction below follows the description of P_i as the normalization result of the ith random noise against the N random noises:

P_i(j) = ||z_i − z_j|| / Σ_{k≠i} ||z_i − z_k||,  j ≠ i   (1)

where z_i is the ith random noise among the N random noises, and z_j is the jth random noise among the N random noises.
The ith second parameter Q_i can be determined by formula (2), reconstructed in the same way (the formula image is missing):

Q_i(j) = ||G(x̂, z_i) − G(x̂, z_j)|| / Σ_{k≠i} ||G(x̂, z_i) − G(x̂, z_k)||,  j ≠ i   (2)

where G(x̂, z_i) denotes the ith second image generated by the generation network of the image restoration model from the first image to be restored x̂ and the ith random noise, and G(x̂, z_j) denotes the jth second image generated from x̂ and the jth random noise.
The first loss value L_multimodal can be determined by formula (3):

L_multimodal = λ_1 · KL(P_i || Q_i)   (3)

where λ_1 is the weight of the first loss value, and the KL divergence computes the expected value of the logarithmic difference between the probabilities P_i and Q_i.
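One plausible NumPy sketch of formulas (1)–(3), assuming the "normalization result" means pairwise L1 distances normalized into a probability distribution over the other N−1 items (an assumption, since the original formula images are unavailable):

```python
import numpy as np

def normalized_distances(vectors, i):
    """P_i or Q_i: L1 distances from item i to every other item,
    normalized so they sum to 1."""
    d = np.abs(vectors - vectors[i]).sum(axis=-1)  # pairwise L1 distances
    d = np.delete(d, i)                            # drop the zero self-distance
    return d / d.sum()

def first_loss(z, second_images, i, lam1=1.0, eps=1e-8):
    """Formula (3): lambda_1 * KL(P_i || Q_i). z holds the N noise vectors,
    second_images the N flattened images generated from them."""
    p = normalized_distances(z, i)
    q = normalized_distances(second_images, i)
    return lam1 * np.sum(p * np.log((p + eps) / (q + eps)))

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 4))
# For a distance-preserving generator (here g = 2*z), P_i == Q_i and KL -> 0.
print(first_loss(z, 2 * z, i=0))
```

A generator that collapses all noises to similar outputs would make Q_i nearly uniform regardless of P_i, driving this loss up, which is how the term penalizes mode collapse.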
In an embodiment, before the updating the weight parameter of the image inpainting model according to the determined first loss value, the method further includes:
classifying the ith second image through a discrimination network of the image restoration model, and determining a second loss value based on a classification result of the ith second image and a classification result of a third image; the third image represents a calibration result corresponding to the first image;
determining a third loss value based on the ith second image and the third image;
the updating the weight parameter of the image restoration model according to the determined first loss value comprises:
and updating the weight parameter of the image restoration model according to the determined first loss value, second loss value and third loss value.
Before the weight parameters of the image restoration model are updated according to the determined first loss value, the ith second image and the third image are each classified by the discrimination network of the image restoration model; a second loss value is determined based on the two classification results, a third loss value is determined based on the ith second image and the third image, and the weight parameters are then updated according to the determined first, second, and third loss values. Here, the third image represents the calibration result corresponding to the first image: it may be the fourth image corresponding to the first image, i.e. the original image from which the first image was produced, or an image determined from an image database according to the feature information of the first image.
In this way, the weight parameters are updated according to the determined first, second, and third loss values. Introducing the second loss value makes the generated restored image more visually realistic. Measuring the loss between the generated image and the calibration image at the pixel level keeps the restored image as close as possible to the corresponding real image in pixel space, and comparing high-level perceptual and semantic differences between the images keeps the restoration result semantically consistent with the corresponding real image.
Here, the second loss value L_adv can be determined by formula (4); the formula image is missing, and the reconstruction below uses the standard GAN adversarial form:

L_adv = E_x[log D(x)] + E_{z_i}[log(1 − D(G(x̂, z_i)))]   (4)

where D and G are the outputs of the discrimination network and the generation network respectively, x̂ is the first image to be restored that is input to the generation network, x is the calibration result corresponding to the first image, i.e. the third image, and z_i is the ith random noise among the N random noises.
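A minimal numeric sketch of formula (4), assuming D outputs a probability in (0, 1) that its input is real; d_real and d_fake stand for D(x) and D(G(x̂, z_i)):

```python
import numpy as np

def second_loss(d_real, d_fake, eps=1e-8):
    """Standard GAN adversarial value of formula (4): reward the
    discriminator for scoring the calibration image x as real and the
    generated second image as fake."""
    return np.log(d_real + eps) + np.log(1.0 - d_fake + eps)

# At the Nash equilibrium described above, both probabilities approach 0.5.
print(second_loss(0.5, 0.5))   # ~ 2 * log(0.5)
print(second_loss(0.9, 0.1))   # larger: the discriminator separates them well
```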
In an embodiment, the determining a third loss value based on the ith second image and the third image comprises:
determining a fourth loss value according to the norm of the ith second image and the third image;
determining a first feature vector based on the ith second image, determining a second feature vector based on the third image, and determining a fifth loss value according to the norm of the vector difference between the first feature vector and the second feature vector;
determining the third loss value according to the fourth loss value and the fifth loss value.
A fourth loss value is determined according to the L1 norm of the difference between the ith second image and the third image. A first feature vector is determined based on the ith second image, a second feature vector is determined based on the third image, and a fifth loss value is determined according to the L1 norm of the difference between the two feature vectors. The third loss value is then computed from the determined fourth and fifth loss values. Here, a feature vector may be obtained by feeding the corresponding image into a neural network pre-trained on ImageNet, such as VGGNet.
The fourth loss value L_L1 can be determined by formula (5), reconstructed from the description (the formula image is missing):

L_L1 = λ_2 · ||G(x̂, z_i) − x||_1   (5)

where λ_2 is the weight of the fourth loss value.
Here, the fifth loss value L_perceptual can be determined by formula (6), reconstructed likewise:

L_perceptual = λ_3 · ||F(G(x̂, z_i)) − F(x)||_1   (6)

where λ_3 is the weight of the fifth loss value, and F(·) denotes the feature vector corresponding to an image.
The third loss value L_3 can be determined by formula (7):

L_3 = L_L1 + L_perceptual   (7)
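The fourth, fifth, and third loss values of formulas (5)–(7) can be sketched as follows; `features` is a stand-in for a network pre-trained on ImageNet such as VGGNet, and the mean-based toy version below is only illustrative:

```python
import numpy as np

def fourth_loss(generated, target, lam2=1.0):
    """Formula (5): weighted L1 distance between the ith second image
    and the third (calibration) image, at pixel level."""
    return lam2 * np.abs(generated - target).mean()

def fifth_loss(generated, target, features, lam3=1.0):
    """Formula (6): weighted L1 distance between feature vectors."""
    return lam3 * np.abs(features(generated) - features(target)).mean()

def third_loss(generated, target, features):
    """Formula (7): L_3 = L_L1 + L_perceptual."""
    return fourth_loss(generated, target) + fifth_loss(generated, target, features)

rng = np.random.default_rng(0)
img_a, img_b = rng.random((16, 16)), rng.random((16, 16))
toy_features = lambda x: x.mean(axis=0)  # toy stand-in for a VGG feature map
print(third_loss(img_a, img_a, toy_features))  # identical images -> 0.0
```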
therefore, the weight parameter is updated according to the determined first loss value, the second loss value and the third loss value, and the generated restoration image is more vivid visually through the introduction of the second loss value. And meanwhile, determining the loss of the generated image and the calibrated image on the pixel level based on the fourth loss value, so that the repaired image is as same as the corresponding real image on the pixel space as possible. And by introducing a fifth loss value, high-level perception and semantic difference between the images are compared, so that the image repairing result is semantically consistent with the corresponding real image.
In an embodiment, before the inputting the first image to be repaired and the ith random noise in the N random noises into the network for generating the image repair model, the method further includes:
determining a fourth image from the set sample gallery;
and performing mask processing on the determined set area of the fourth image to obtain the first image.
A fourth image is determined from the set sample gallery, and mask processing is performed on a set area of the determined fourth image to obtain the first image to be restored for training. Here, the set sample gallery may be a commonly used database such as ImageNet or SUN, or a sample gallery created as needed. The mask processing may be implemented with OpenCV or similar tools.
When training the image restoration model, the shape of the set area of the fourth image may be a fixed shape such as a rectangle or an ellipse, or may follow the recognition result of a specific object in the image, depending on whether the model is used to restore face images, landscape images, and so on.
Masks are randomly added at corresponding positions according to different purposes of models. For example, the method is added to a face area in face image restoration, and is added to an area corresponding to a specific object (such as a human body, a tree, a stone, a building, etc.) in landscape image restoration.
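The mask-processing step can be sketched with plain NumPy indexing (the text mentions OpenCV; NumPy is used here only to keep the sketch self-contained, and the region coordinates are illustrative):

```python
import numpy as np

def add_rectangular_mask(image, top, left, height, width, fill=0.0):
    """Mask a set rectangular area of the fourth image to produce the
    first image (the image to be restored) used for training."""
    masked = image.copy()
    masked[top:top + height, left:left + width] = fill
    return masked

rng = np.random.default_rng(0)
fourth_image = rng.random((64, 64))
first_image = add_rectangular_mask(fourth_image, top=16, left=16, height=24, width=24)
print(first_image[16:40, 16:40].sum())  # masked area is zeroed -> 0.0
```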
In one embodiment, the determining the fourth image from the set sample gallery includes:
and performing target detection on a fifth image in the set sample gallery, cutting the fifth image based on a target rectangular frame positioned in the target detection process, and determining the fourth image.
Target detection is performed on the fifth image of the set sample gallery; the target in the image is selected by the target rectangular frame located during detection, giving the coordinate frame of the target position; and the detection frame containing the target is cropped out according to that coordinate frame to obtain the fourth image. Here, the fifth image in the sample gallery is an image from the dataset used for training. During cropping, the image may also be aligned and corrected so that it meets a set standard.
Therefore, by performing target detection and cropping on the original dataset pictures, the recognition target is preserved while the neural network can concentrate on learning the target information of the image during training, which reduces training complexity; an image restoration model trained in this way achieves good application performance.
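The crop from the located target rectangular frame amounts to simple array slicing; the (top, left, height, width) box format is an assumption, and the detector producing it is out of scope here:

```python
import numpy as np

def crop_to_target(fifth_image, box):
    """Cut the fifth image down to the target rectangular frame located
    by target detection, yielding the fourth image."""
    top, left, height, width = box
    return fifth_image[top:top + height, left:left + width]

rng = np.random.default_rng(0)
fifth_image = rng.random((128, 128))
fourth_image = crop_to_target(fifth_image, (32, 40, 64, 48))
print(fourth_image.shape)  # (64, 48)
```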
As shown in fig. 2, the image restoration method according to the embodiment of the present application includes:
step 201: inputting a sixth image to be restored and N random noises into a generation network of an image restoration model, and outputting N seventh images; wherein, the first and the second end of the pipe are connected with each other,
the image restoration model is obtained by adopting any one of the image restoration model training methods.
The sixth image to be repaired and the N random noises are input into the generation network of the image restoration model, which outputs N repaired seventh images. Here, the generation network used by the image restoration model is obtained by training with any one of the image restoration model training methods described above. The generation network takes the sixth image together with each of the N random noises as one input group; each group produces a corresponding seventh image, so the N random noises and the sixth image yield N seventh images.
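The grouping described above can be sketched as follows (the generator below is a trivial stand-in for the trained generation network, used only to show the shape of the loop; it is not the application's network):

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(image: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Stand-in for the trained generation network G(image, z)."""
    return image + 0.01 * z.mean()  # placeholder; a real G is a neural network

sixth_image = np.zeros((64, 64, 3))                    # image to be repaired
N = 4
noises = [rng.standard_normal(128) for _ in range(N)]  # N samples from the prior

# Each (sixth_image, z_i) group yields one seventh image,
# so N random noises give N diversified repairs.
seventh_images = [generator(sixth_image, z) for z in noises]
print(len(seventh_images))  # 4
```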
Thus, when the generation network of the trained image restoration model is used to repair an image, the mode-collapse problem of generative adversarial networks is alleviated, and diversified restoration results for the same image to be restored can be generated from multiple random noises.
Application examples of the embodiments of the present application are given below:
in the field of unsupervised learning, the GAN includes a generation network and a discrimination network, and can generate a visually realistic image conforming to the image distribution after training by fitting the image distribution through learning.
Fig. 3 shows a schematic flowchart of an image restoration method, which includes the following steps:
step 301: and acquiring a training data set of the image to be restored.
Here, a training data set that does not meet the training requirements is preprocessed; the preprocessing includes, but is not limited to, normalizing the images in the training data set and adding a mask to the images according to specific requirements. The preprocessed images are used as the images to be restored in the training data set and participate in training.
According to different purposes of the model, the mask is randomly added at corresponding positions, for example, the mask is added in a face area when a face image is repaired, and the mask is added in an area corresponding to a specific object (such as a human body, a tree, a stone, a building and the like) when a landscape image is repaired.
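The mask-placement step can be sketched as follows (a minimal NumPy illustration; the region tuple and the square mask shape are assumptions of the sketch, since the application only requires the mask to fall inside the task-relevant area):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_random_mask(image: np.ndarray, region: tuple, mask_size: int) -> np.ndarray:
    """Zero out a randomly placed mask_size x mask_size patch inside `region`.

    `region` is (y1, x1, y2, x2), e.g. the face area for face inpainting."""
    y1, x1, y2, x2 = region
    top = rng.integers(y1, y2 - mask_size + 1)
    left = rng.integers(x1, x2 - mask_size + 1)
    masked = image.copy()
    masked[top:top + mask_size, left:left + mask_size] = 0.0
    return masked

img = np.ones((128, 128, 3))
masked = add_random_mask(img, region=(32, 32, 96, 96), mask_size=16)
print(int(img.sum() - masked.sum()))  # 16 * 16 * 3 = 768 values zeroed
```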
Step 302: and constructing an unsupervised image restoration model.
As shown in fig. 4, the image restoration model includes a random noise driving module, an image input module to be restored, a network restoration generating module, a network discriminating module, a loss countering module, a pixel level loss module, a sensing loss module, and a multi-modal constraint module.
The image-to-be-repaired input module processes the image to be repaired into an input format that meets the requirements of the generation network repair module; this processing includes but is not limited to alignment, rectification, and cropping. The image processed by the image-to-be-repaired input module and the random noise sample are input into the generation network repair module.
The random noise driving module samples random noise samples from the prior distribution and inputs the random noise samples and the image to be repaired determined by the image to be repaired input module into the generation network repairing module.
The generation network repair module is a generation network structure, generally built on an encoder-decoder architecture, and may include several skip-connection layers, or nearest-neighbor upsampling and downsampling layers and several residual blocks. It mainly receives the sample inputs of the random noise driving module and the image-to-be-repaired input module and generates diversified repaired images.
The trained image restoration model can generate diversified image restoration results according to different noise samples.
The judging network module is used for judging the fidelity of the repaired image and exciting the generating network repairing module to generate a visually vivid and semantically consistent image repairing result. The discriminant network module is a structure of a discriminant network, usually a deep convolutional neural network.
Step 303: and training the constructed image restoration model.
The constructed image restoration model is trained using the adversarial loss module, the pixel-level loss module, the perceptual loss module, and the multi-modal constraint module, wherein:
An adversarial loss module calculates the loss value L_adv by formula (4), wherein D and G are the outputs of the discrimination network module and the generation network repair module respectively; x and x̂ are the corresponding real complete image in the training set and the image to be repaired after mask processing; and z_i is the i-th noise sample among the N random noise samples drawn from the prior distribution, which is typically a multi-dimensional standard Gaussian distribution or a uniform distribution.
The adversarial loss module improves the fidelity of the restored image.
A pixel-level loss module calculates the loss value L_L1 by formula (5), wherein ||·||_1 denotes the L1 norm and λ_2 is the weight of the pixel-level loss, used to adjust the share of this term in the overall loss function.
The pixel-level loss module computes the loss between the generated image and the calibration image at the pixel level, so that the repaired image is as close as possible to the corresponding real image in pixel space.
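A minimal sketch of such a pixel-level term follows (whether formula (5) sums or averages the norm is not visible in this excerpt, so a plain sum is assumed):

```python
import numpy as np

def l1_loss(generated: np.ndarray, real: np.ndarray, lambda2: float = 1.0) -> float:
    """Pixel-level L1 loss between the repaired image and its calibration image."""
    return lambda2 * float(np.abs(generated - real).sum())

gen = np.full((4, 4), 0.5)   # generated (repaired) image
real = np.ones((4, 4))       # calibration (real) image
print(l1_loss(gen, real))    # 16 pixels * 0.5 = 8.0
```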
A perceptual loss module calculates the loss value L_perceptual by formula (6), wherein F(·) denotes the feature vector obtained by feeding the corresponding image into a neural network model pre-trained on ImageNet (for example, VGGNet), and λ_3 is the weight of the perceptual loss, used to adjust the share of this term in the overall loss function.
The perceptual loss module compares high-level perceptual and semantic differences between images, ensuring that the image restoration result is semantically consistent with the corresponding real image.
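A sketch of the perceptual term follows (the flattening stub stands in for the ImageNet-pre-trained feature extractor F(·), and the L1 distance between feature vectors is an assumption about formula (6), which is not reproduced here):

```python
import numpy as np

def perceptual_loss(gen_img, real_img, feature_extractor, lambda3: float = 1.0) -> float:
    """Distance between feature vectors F(.) of the generated and real images."""
    diff = feature_extractor(gen_img) - feature_extractor(real_img)
    return lambda3 * float(np.linalg.norm(diff, ord=1))

# Stub for a network pre-trained on ImageNet (e.g. VGGNet); a real F returns
# activations from an intermediate layer rather than raw pixels.
F = lambda img: img.ravel()

gen = np.zeros((2, 2))
real = np.ones((2, 2))
print(perceptual_loss(gen, real, F))  # 4.0
```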
A multi-modal constraint module calculates the loss value L_multimodal by formula (3).
For each image to be restored x̂, the Euclidean distances among the set number N of random noise samples z are probability-normalized with a softmax function, and the resulting probability distribution is denoted P_i. The P_i corresponding to the i-th random noise sample among the N samples can be determined by formula (1).
The images generated during training are likewise probability-normalized with a softmax function, and the resulting probability distribution is denoted Q_i. The Q_i corresponding to the i-th generated image among the N generated images can be determined by formula (2).
L_multimodal is then determined from P_i and Q_i by formula (3), wherein λ_1 is the weight of the multi-modal loss function and the KL divergence computes the expected value of the logarithmic difference between the probabilities P_i and Q_i.
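Since formulas (1)-(3) are not reproduced in this excerpt, the sketch below shows one plausible reading of the constraint: softmax-normalize the Euclidean distances in noise space (P_i) and in image space (Q_i), then sum the KL divergences KL(P_i || Q_i). The sign and scaling inside the softmax are assumptions of the sketch:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def multimodal_loss(noises: np.ndarray, images: np.ndarray,
                    lam1: float = 1.0, eps: float = 1e-8) -> float:
    """KL-divergence constraint pairing noise-space and image-space distances.

    noises: N x d prior samples z_i; images: N x (H*W*C) flattened repairs.
    P_i / Q_i are softmax-normalized Euclidean distances (formulas (1)-(2))."""
    loss = 0.0
    for i in range(len(noises)):
        dz = np.linalg.norm(noises - noises[i], axis=1)  # distances in noise space
        dx = np.linalg.norm(images - images[i], axis=1)  # distances in image space
        P, Q = softmax(dz), softmax(dx)
        # KL(P || Q): expected value of the log-ratio of the two distributions.
        loss += float(np.sum(P * np.log((P + eps) / (Q + eps))))
    return lam1 * loss

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8))    # N = 4 noise samples
x = rng.standard_normal((4, 16))   # the 4 corresponding generated images
print(np.isfinite(multimodal_loss(z, x)))  # True
```

Because P_i and Q_i are proper probability distributions, each KL term is non-negative; the loss is minimized when image-space distances mirror noise-space distances, which is what drives diversified outputs.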
The multi-modal constraint module drives the generative adversarial network to produce diversified image restoration results according to the input random noise, thereby alleviating the mode-collapse problem in image restoration.
Thus, the loss function of the image restoration model can be calculated by the following equation (8).
L_total = L_adv + L_L1 + L_perceptual + L_multimodal (8)
Step 304: and inputting the image to be restored into the trained image restoration model.
After training of the image restoration model is finished, the image to be restored is input into the generation network repair module, and multiple random noises z are sampled from the prior distribution to generate diversified image restoration results.
Because the image restoration dataset used for training is constructed by applying random masks to the complete images in the training set, each image to be restored corresponds to only one calibration result (real image) for a CGAN-based restoration method. The CGAN therefore cannot learn the conditional distribution given the image to be restored; the trained model yields only one restoration result per image and cannot produce the multiple semantically plausible restorations that may exist.
Moreover, GANs suffer from mode collapse: the generated repair results converge and diversified results cannot be produced. Yet in some application scenarios one image to be restored corresponds to several reasonable restorations; for example, an image whose nose region is damaged could plausibly be repaired with either a high or a collapsed nose bridge. In such cases the inability to generate diversified results fails to meet the image restoration requirement.
In the embodiment of the application, an image restoration model driven by prior noise is designed: by perturbing the input noise, visually realistic, semantically consistent, and diversified restoration results are obtained for the same image to be restored. The network framework and loss function of the model address the mode-collapse problem in the image restoration field, achieving diversity of restoration results.
In order to implement the method according to the embodiment of the present application, an embodiment of the present application further provides an image inpainting model training apparatus, which is disposed on a first electronic device, and as shown in fig. 5, the apparatus includes:
a generating unit 501, configured to input an ith random noise in the first image to be restored and the N random noises into a generation network of an image restoration model, and generate an ith second image;
a first processing unit 502, configured to determine a first loss value according to a relative entropy of a first parameter and a second parameter; the first parameter is determined according to the normalization results of the ith random noise and the N random noises; the second parameter is determined according to the normalization result of the N second images correspondingly generated by the ith second image and the N random noises; n is an integer greater than 1;
a training unit 503, configured to update the weight parameter of the image inpainting model according to the determined first loss value.
Wherein, in an embodiment, the apparatus further comprises:
the second processing unit is used for classifying the ith second image through a discrimination network of the image restoration model and determining a second loss value based on the classification result of the ith second image and the classification result of the third image; the third image represents a calibration result corresponding to the first image;
a third processing unit for determining a third loss value based on the ith second image and the third image;
the training unit 503 is configured to:
and updating the weight parameter of the image restoration model according to the determined first loss value, second loss value and third loss value.
In an embodiment, the third processing unit is configured to:
determining a fourth loss value according to the norm of the ith second image and the third image;
determining a first feature vector based on the ith second image, determining a second feature vector based on the third image, and determining a fifth loss value according to the norm of the vector difference between the first feature vector and the second feature vector;
determining the third loss value according to the fourth loss value and the fifth loss value.
In one embodiment, the apparatus further comprises:
the fourth processing unit is used for determining a fourth image from the set sample gallery;
and the fifth processing unit is used for performing mask processing on the determined set area of the fourth image to obtain the first image.
In an embodiment, the fourth processing unit is configured to:
and carrying out target detection on a fifth image in the set sample gallery, cutting the fifth image based on a target rectangular frame positioned in the target detection process, and determining the fourth image.
In practical applications, the generating unit 501, the first processing unit 502, the training unit 503, the second processing unit, the third processing unit, the fourth processing unit, and the fifth processing unit may be implemented by a processor in the image restoration model training device, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Micro Control Unit (MCU), or a Field-Programmable Gate Array (FPGA).
It should be noted that: in the image restoration model training apparatus provided in the above embodiment, when performing the image restoration model training, only the division of the program modules is taken as an example, and in practical applications, the processing distribution may be completed by different program modules according to needs, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the image restoration model training device and the image restoration model training method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
In order to implement the method of the embodiment of the present application, an embodiment of the present application further provides an image restoration apparatus, disposed on a second electronic device, as shown in fig. 6, where the apparatus includes:
a repairing unit 601, configured to input a sixth image to be repaired and N random noises into a generation network of an image restoration model, and output N seventh images; wherein,
the image restoration model is obtained by adopting any one of the image restoration model training methods.
In practical applications, the repairing unit 601 may be implemented by a processor in an image repairing apparatus, such as a CPU, a DSP, an MCU, or an FPGA.
It should be noted that: in the image restoration device provided in the foregoing embodiment, when performing image restoration, only the division of each program module is illustrated, and in practical applications, the processing allocation may be completed by different program modules according to needs, that is, the internal structure of the device may be divided into different program modules to complete all or part of the processing described above. In addition, the image restoration device and the image restoration method provided by the above embodiments belong to the same concept, and the specific implementation process thereof is described in the method embodiments in detail, and is not described herein again.
Based on the hardware implementation of the program module, and in order to implement the image inpainting model training method according to the embodiment of the present application, an embodiment of the present application further provides a first electronic device, as shown in fig. 7, where the first electronic device 700 includes:
a first communication interface 701, which is capable of performing information interaction with other network nodes;
the first processor 702 is connected to the first communication interface 701, so as to implement information interaction with other network nodes, and is configured to execute the method provided by one or more technical solutions of the first electronic device side when running a computer program. And the computer program is stored on the first memory 703.
Specifically, the first processor 702 is configured to input a first image to be restored and an ith random noise in N random noises into a generation network of an image restoration model, and generate an ith second image;
determining a first loss value according to the relative entropy of the first parameter and the second parameter; the first parameter is determined according to the normalization results of the ith random noise and the N random noises; the second parameter is determined according to the normalization result of the N second images correspondingly generated by the ith second image and the N random noises; n is an integer greater than 1;
and updating the weight parameter of the image restoration model according to the determined first loss value.
In an embodiment, the first processor 702 is configured to classify the ith second image through a discriminant network of the image inpainting model, and determine a second loss value based on a classification result of the ith second image and a classification result of a third image; the third image represents a calibration result corresponding to the first image;
determining a third loss value based on the ith second image and the third image;
the updating the weight parameter of the image restoration model according to the determined first loss value comprises:
and updating the weight parameter of the image restoration model according to the determined first loss value, second loss value and third loss value.
In an embodiment, the first processor 702 is configured to determine a fourth loss value according to a norm of the ith second image and the third image;
determining a first feature vector based on the ith second image, determining a second feature vector based on the third image, and determining a fifth loss value according to a norm of a vector difference between the first feature vector and the second feature vector;
determining the third loss value according to the fourth loss value and the fifth loss value.
In one embodiment, the first processor 702 is configured to determine a fourth image from a set sample gallery;
and performing mask processing on the determined set area of the fourth image to obtain the first image.
In an embodiment, the first processor 702 is configured to perform object detection on a fifth image in the set sample gallery, crop the fifth image based on an object rectangular frame located in the object detection process, and determine the fourth image.
It should be noted that: the specific processes of the first processor 702 and the first communication interface 701 may be understood with reference to the above-described methods.
Of course, in practice, the various components in the first electronic device 700 are coupled together by the bus system 704. It is understood that the bus system 704 is used to enable communications among the components. The bus system 704 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled in fig. 7 as the bus system 704.
The first memory 703 in the embodiment of the present application is used for storing various types of data to support the operation of the first electronic device 700. Examples of such data include: any computer program for operating on the first electronic device 700.
The method disclosed in the embodiments of the present application can be applied to the first processor 702, or implemented by the first processor 702. The first processor 702 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be implemented by integrated logic circuits of hardware or instructions in the form of software in the first processor 702. The first processor 702 described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The first processor 702 may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium located in the first memory 703, and the first processor 702 reads the information in the first memory 703 and completes the steps of the foregoing method in combination with its hardware.
Optionally, when the first processor 702 executes the program, the corresponding process implemented by the electronic device in each method according to the embodiment of the present application is implemented, and for brevity, no further description is provided here.
In an exemplary embodiment, the first electronic device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), FPGAs, general-purpose processors, controllers, MCUs, microprocessors, or other electronic components for performing the aforementioned methods.
Based on the hardware implementation of the program module, and in order to implement the image inpainting method according to the embodiment of the present application, an embodiment of the present application further provides a second electronic device, as shown in fig. 8, where the second electronic device 800 includes:
a second communication interface 801 capable of performing information interaction with other network nodes;
the second processor 802 is connected to the second communication interface 801 to perform information interaction with other network nodes, and is configured to execute the image inpainting method according to the foregoing technical solution when running a computer program. And the computer program is stored on the second memory 803.
Specifically, the second processor 802 is configured to input a sixth image to be repaired and N random noises into a generation network of an image restoration model, and output N seventh images; wherein,
the image restoration model is obtained by adopting any one of the image restoration model training methods.
It should be noted that: the specific processing of the second processor 802 and the second communication interface 801 may be understood with reference to the methods described above.
Of course, in practice, the various components in the second electronic device 800 are coupled together by the bus system 804. It is understood that the bus system 804 is used to enable communications among the components. The bus system 804 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 804 in FIG. 8.
The second memory 803 in the embodiment of the present application is used to store various types of data to support the operation of the second electronic device 800. Examples of such data include: any computer program for operating on the second electronic device 800.
The method disclosed in the embodiment of the present application can be applied to the second processor 802, or implemented by the second processor 802. The second processor 802 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be implemented by integrated logic circuits of hardware or instructions in the form of software in the second processor 802. The second processor 802 described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The second processor 802 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium located in the second memory 803, and the second processor 802 reads the information in the second memory 803, and completes the steps of the foregoing method in conjunction with its hardware.
Optionally, when the second processor 802 executes the program, the corresponding process implemented by the electronic device in each method according to the embodiment of the present application is implemented, and for brevity, no further description is given here.
In an exemplary embodiment, the second electronic device 800 may be implemented by one or more ASICs, DSPs, PLDs, CPLDs, FPGAs, general-purpose processors, controllers, MCUs, microprocessors, or other electronic components for performing the aforementioned methods.
It is understood that the memories (the first memory 703 and the second memory 803) of the embodiments of the present application may be volatile or nonvolatile, and may include both volatile and nonvolatile memories. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memories described in the embodiments of the present application are intended to comprise, without being limited to, these and any other suitable types of memory.
In an exemplary embodiment, the present application further provides a storage medium, specifically a computer storage medium, which is a computer readable storage medium, for example, the storage medium includes a first memory 703 storing a computer program, and the computer program is executable by a first processor 702 of a first electronic device 700 to perform the steps described in the foregoing first electronic device side method. For example, the second memory 803 may store a computer program, which may be executed by the second processor 802 of the second electronic device 800 to perform the steps of the second electronic device side method. The computer readable storage medium may be Memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash Memory, magnetic surface Memory, optical disk, or CD-ROM.
It should be noted that: "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The technical means described in the embodiments of the present application may be arbitrarily combined without conflict.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (13)

1. An image inpainting model training method is characterized by comprising the following steps:
inputting a first image to be restored and the ith random noise in the N random noises into a generation network of an image restoration model to generate an ith second image;
determining a first loss value according to the relative entropy of the first parameter and the second parameter; the first parameter is determined according to the normalization results of the ith random noise and the N random noises; the second parameter is determined according to the normalization result of the ith second image and N second images correspondingly generated by the N random noises; n is an integer greater than 1;
and updating the weight parameter of the image restoration model according to the determined first loss value.
2. The image inpainting model training method of claim 1, prior to said updating the weight parameters of the image inpainting model according to the determined first loss value, the method further comprising:
classifying the ith second image through a discrimination network of the image restoration model, and determining a second loss value based on a classification result of the ith second image and a classification result of a third image; the third image represents a calibration result corresponding to the first image;
determining a third loss value based on the ith second image and the third image;
the updating the weight parameter of the image restoration model according to the determined first loss value comprises:
and updating the weight parameter of the image restoration model according to the determined first loss value, second loss value and third loss value.
3. The image inpainting model training method of claim 2, the determining a third loss value based on the ith second image and the third image, comprising:
determining a fourth loss value according to the norm of the ith second image and the third image;
determining a first feature vector based on the ith second image, determining a second feature vector based on the third image, and determining a fifth loss value according to a norm of a vector difference between the first feature vector and the second feature vector;
determining the third loss value according to the fourth loss value and the fifth loss value.
4. The image inpainting model training method according to claim 1, before inputting the ith random noise in the first image to be inpainted and the N random noises into the generation network of the image inpainting model, the method further comprising:
determining a fourth image from the set sample gallery;
and performing mask processing on the determined set area of the fourth image to obtain the first image.
5. The image inpainting model training method of claim 4, wherein the determining a fourth image from a set sample gallery comprises:
and performing target detection on a fifth image in the set sample gallery, cropping the fifth image based on a target rectangular frame located in the target detection process, and determining the fourth image.
6. An image restoration method, comprising:
inputting a sixth image to be restored and N random noises into a generation network of an image restoration model, and outputting N seventh images; wherein,
the image restoration model is obtained by training according to the image restoration model training method of any one of claims 1 to 5.
7. An image inpainting model training device, comprising:
the generating unit is used for inputting the first image to be repaired and the ith random noise in the N random noises into a generating network of the image repairing model to generate an ith second image;
the first processing unit is used for determining a first loss value according to the relative entropy of the first parameter and the second parameter; the first parameter is determined according to the normalization results of the ith random noise and the N random noises; the second parameter is determined according to the normalization result of the N second images correspondingly generated by the ith second image and the N random noises; n is an integer greater than 1;
and the training unit is used for updating the weight parameters of the image restoration model according to the determined first loss value.
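The first loss in claim 7 (and the corresponding processor steps in claim 9) is a relative entropy, i.e. a KL divergence, between two normalized distributions. A sketch under assumptions the claims leave open: the normalization is taken here as a softmax over pairwise distances from the ith sample to all N samples, computed in noise space and in generated-image space respectively:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax normalization
    e = np.exp(x - x.max())
    return e / e.sum()

def first_loss(noises, images, i, eps=1e-12):
    # First parameter: normalized distances from the ith noise to all N noises
    p = softmax(np.array([np.linalg.norm(noises[i] - z) for z in noises]))
    # Second parameter: normalized distances from the ith second image to all N second images
    q = softmax(np.array([np.linalg.norm(images[i] - x) for x in images]))
    # Relative entropy KL(p || q) between the two distributions
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Penalizing this divergence pushes the spread of generated images to mirror the spread of the input noises, which discourages the generator from collapsing N different noises onto near-identical restorations.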
8. An image restoration device, comprising:
the restoration unit is used for inputting a sixth image to be restored and N random noises into a generation network of the image restoration model and outputting N seventh images; wherein
the image restoration model is obtained by training according to the image restoration model training method of any one of claims 1 to 5.
9. A first electronic device, comprising: a first processor and a first communication interface; wherein
the first processor is used for inputting the first image to be restored and the ith random noise in the N random noises into a generation network of the image restoration model to generate an ith second image;
determining a first loss value according to the relative entropy of the first parameter and the second parameter; the first parameter is determined according to the normalization results of the ith random noise and the N random noises; the second parameter is determined according to the normalization result of the N second images correspondingly generated by the ith second image and the N random noises; n is an integer greater than 1;
and updating the weight parameter of the image restoration model according to the determined first loss value.
10. A second electronic device, comprising: a second processor and a second communication interface; wherein
the second processor is used for inputting a sixth image to be restored and N random noises into a generation network of the image restoration model and outputting N seventh images; wherein
the image restoration model is obtained by training according to the image restoration model training method of any one of claims 1 to 5.
11. A first electronic device, comprising: a first processor and a first memory for storing a computer program capable of running on the first processor,
wherein the first processor is adapted to perform the steps of the method of any one of claims 1 to 5 when running the computer program.
12. A second electronic device, comprising: a second processor and a second memory for storing a computer program capable of running on the second processor,
wherein the second processor is adapted to perform the steps of the method of claim 6 when running the computer program.
13. A storage medium having stored thereon a computer program for performing the steps of the method of any one of claims 1 to 5 or for performing the steps of the method of claim 6 when executed by a processor.
CN202110830977.6A 2021-07-22 2021-07-22 Image restoration model training method, image restoration device and storage medium Pending CN115689902A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110830977.6A CN115689902A (en) 2021-07-22 2021-07-22 Image restoration model training method, image restoration device and storage medium

Publications (1)

Publication Number Publication Date
CN115689902A (en) 2023-02-03

Family

ID=85043949



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination