CN112085670A - Image restoration method and system based on semantic interpretable information - Google Patents
- Publication number
- CN112085670A CN112085670A CN202010838923.XA CN202010838923A CN112085670A CN 112085670 A CN112085670 A CN 112085670A CN 202010838923 A CN202010838923 A CN 202010838923A CN 112085670 A CN112085670 A CN 112085670A
- Authority
- CN
- China
- Prior art keywords
- image
- network
- segmentation
- repaired
- attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T5/00 Image enhancement or restoration; G06T5/90 Dynamic range modification of images or parts thereof
- G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology; G06N3/045 Combinations of networks
- G06T7/00 Image analysis; G06T7/10 Segmentation, edge detection; G06T7/12 Edge-based segmentation
Abstract
The application provides an image restoration method and system that combine semantically interpretable information. In the method, an image to be restored is first obtained; data labeled with image attribute information and data labeled with image segmentation information are selected from a preset data set to construct an auxiliary data set; the image to be restored is then input into a pre-trained image restoration model, which restores it using the auxiliary data set. Addressing the shortcoming that existing methods do not consider the semantic correlation within images, the application provides an image restoration model combining semantically interpretable information, and combines that information with a GAN model to improve the accuracy of image restoration.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular to an image restoration method and system combining semantically interpretable information.
Background
Image inpainting is a popular research direction in the field of computer vision, aiming to restore the missing regions of damaged images. Conventional image inpainting methods are mainly based on patch matching or texture synthesis, but when dealing with large or arbitrarily shaped missing regions, the quality of the generated images suffers. Therefore, most current methods are based on deep learning and GANs (Generative Adversarial Networks), which produce better image restoration results.
However, most existing models attend to low-level feature information of the image and do not consider the semantic consistency between the repaired image and the original image. Because a GAN-based repair model can generate many plausible completions for a missing region, the repair result may contain defects unrelated to the original image semantics, and the accuracy of the restoration is low.
Disclosure of Invention
It is an object of the present application to overcome, or at least partially solve or mitigate, the above problems.
According to one aspect of the application, there is provided an image inpainting method in conjunction with semantically interpretable information, comprising:
acquiring an image to be repaired;
selecting data marked with image attribute information and data marked with image segmentation information from a preset data set, and constructing an auxiliary data set;
and inputting the image to be repaired into a pre-trained image repairing model, and performing image repairing on the image to be repaired by using the auxiliary data set through the image repairing model.
Optionally, the image restoration model includes a restoration network and a discrimination network;
the inputting of the image to be restored into the pre-trained image restoration model, the image restoration model restoring the image to be restored by using the auxiliary data set, includes:
inputting the image to be restored into the restoration network, and generating a restored image of the image to be restored by combining the auxiliary data set;
inputting the restored image into the discrimination network, and discriminating the restored image based on the discrimination network.
Optionally, the restoration network includes three sub-networks: a generator network, an image attribute embedding network, and an image segmentation embedding network; the generator network comprises an encoder network and a decoder network;
the image attribute embedding network and the image segmentation embedding network are obtained by pre-training on the image attribute information and the image segmentation information labeled in the auxiliary data set, respectively;
the discrimination network includes a global discriminator, an image attribute discriminator, and an image segmentation discriminator.
Optionally, inputting the image to be restored into the restoration network and generating a restored image of the image to be restored by combining the auxiliary data set includes:
inputting the image to be restored into the image attribute embedding network and the image segmentation embedding network, and extracting the attribute vector and segmentation map vector of the image to be restored;
and transmitting the image to be restored, its attribute vector, and its segmentation map vector together to the generator network, and restoring the image to obtain a restored image.
Optionally, before inputting the restored image into the discrimination network and discriminating it, the method further includes:
acquiring a real image corresponding to the image to be restored;
predicting, through the image attribute embedding network, a first attribute vector of the real image and a second attribute vector of the restored image;
generating, through the image segmentation embedding network, a first segmentation map of the real image and a second segmentation map of the restored image.
Optionally, inputting the restored image into the discrimination network and discriminating it includes:
inputting the real image and the restored image into the global discriminator, and testing the authenticity of the restored image against the real image;
inputting the first attribute vector and the second attribute vector into the image attribute discriminator as the true and false discrimination objects, respectively, for comparative discrimination;
and inputting the first segmentation map and the second segmentation map into the image segmentation discriminator for comparative discrimination.
According to another aspect of the application, there is provided an image inpainting system incorporating semantically interpretable information, comprising:
an image to be repaired acquisition module configured to acquire an image to be repaired;
the auxiliary data set construction module is configured to select data marked with image attribute information and data marked with image segmentation information from a preset data set to construct an auxiliary data set;
and the image restoration module is configured to input the image to be restored into a pre-trained image restoration model, and the image restoration model carries out image restoration on the image to be restored by utilizing the auxiliary data set.
Optionally, the image restoration model includes a restoration network and a discriminant network;
the image inpainting module is further configured to:
inputting the image to be restored into the restoration network, and generating a restored image of the image to be restored by combining the auxiliary data set;
inputting the repaired image into the judging network, and judging the repaired image based on the judging network.
Optionally, the repair network includes three sub-networks, which are a generator network, an image attribute embedded network, and an image segmentation embedded network, respectively; the generator network comprises an encoder network and a decoder network;
the image attribute embedded network and the image segmentation embedded network are obtained by pre-training image attribute information and image segmentation information marked in the auxiliary data set respectively;
the discrimination network includes a global discriminator, an image attribute discriminator, and an image segmentation discriminator.
Optionally, the image inpainting module is further configured to:
inputting the image to be repaired into the image attribute embedded network and the image segmentation embedded network, and extracting an attribute vector and a segmentation map vector of the image to be repaired;
and transmitting the image to be repaired, the attribute vector of the image to be repaired and the segmentation map vector to the generator network together, and repairing the image to be repaired to obtain a repaired image.
Optionally, the image inpainting module is further configured to:
inputting the real image and the repair image into the global discriminator, testing the authenticity of the repair image based on the real image;
inputting the first attribute vector and the second attribute vector into the image attribute discriminator as a true discrimination object and a false discrimination object respectively for comparison discrimination;
and inputting the first segmentation chart and the second segmentation chart into the image segmentation discriminator for comparison and judgment.
The application provides an image restoration method and system that combine semantically interpretable information. In the method, an image to be restored is first obtained; data labeled with image attribute information and data labeled with image segmentation information are selected from a preset data set to construct an auxiliary data set; the image to be restored is then input into a pre-trained image restoration model, which restores it using the auxiliary data set.
Addressing the shortcoming that existing methods do not consider the semantic correlation within images, the application provides an image restoration model combining semantically interpretable information, and combines that information with a GAN model to improve the accuracy of image restoration.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flow chart of an image inpainting method incorporating semantically interpretable information according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a repair network structure according to an embodiment of the present application;
FIG. 3 is a block diagram of an image inpainting system architecture that combines semantically interpretable information, according to one embodiment of the present application;
FIG. 4 is a block diagram of an image inpainting system architecture that combines semantically interpretable information according to another embodiment of the present application;
FIG. 5 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a computer-readable storage medium according to an embodiment of the application.
Detailed Description
The CE (Context Encoder) model, the first to use an encoder-decoder CNN structure for this task, is trained with a joint loss combining reconstruction loss and adversarial loss, and generates better image restoration results than traditional methods. However, due to the limitations of its basic encoder-decoder generator, the inpainting results generated by CE are still blurry and lack fine-grained detail. Later methods added post-processing on top of the encoder-decoder model to improve the quality of the generated image: for example, using CE as an initial stage and refining the repair result by propagating surrounding texture information, or using global and local discriminators to further improve the result. A context-based refinement network was also proposed to add finer-grained detail to the generated image.
The above methods focus primarily on enhancing the resolution of the repaired image, but ignore the semantic consistency between the repaired content and the existing image context. This is because current repair methods rely mainly on the assumption that patches similar to those in the missing region appear in the non-missing region: they fill the missing region with background patches taken from the non-missing region, and therefore attend only to low-level features of the image and cannot handle cases where the missing region contains non-repetitive content. Moreover, models built on the classical encoder-decoder structure are very susceptible to one-to-many ambiguity when generating the initial result, and the post-processing stage cannot help if it is given a semantically incorrect intermediate result.
Fig. 1 is a flowchart illustrating an image restoration method combining semantically interpretable information according to an embodiment of the present application. Referring to fig. 1, an image restoration method combining semantically interpretable information provided in an embodiment of the present application may include:
step S101: acquiring an image to be repaired;
step S102: selecting data marked with image attribute information and data marked with image segmentation information from a preset data set, and constructing an auxiliary data set;
step S103: inputting the image to be restored into a pre-trained image restoration model, and performing image restoration on the image to be restored by using an auxiliary data set through the image restoration model.
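The three steps above can be expressed as a minimal sketch. This is purely illustrative and not the patent's implementation: the helper names (`build_auxiliary_set`, `restore`) and the stand-in model are hypothetical, and the real method uses the pre-trained GAN-based repair model described below.

```python
# Illustrative sketch of steps S101-S103 with hypothetical helpers.

def build_auxiliary_set(dataset):
    """S102: keep only samples labeled with attribute or segmentation info."""
    attr_data = [s for s in dataset if "attributes" in s]
    seg_data = [s for s in dataset if "segmentation" in s]
    return {"attribute": attr_data, "segmentation": seg_data}

def restore(image, dataset, model):
    """S101-S103: acquire image, build auxiliary set, run the repair model."""
    aux = build_auxiliary_set(dataset)   # S102
    return model(image, aux)             # S103

# Toy usage: the "model" is a stand-in that echoes its input unchanged.
dataset = [
    {"img": "a.png", "attributes": ["male", "black_hair"]},
    {"img": "b.png", "segmentation": "b_mask.png"},
    {"img": "c.png"},  # unlabeled: excluded from the auxiliary set
]
aux = build_auxiliary_set(dataset)
repaired = restore("masked.png", dataset, lambda img, a: img)
print(len(aux["attribute"]), len(aux["segmentation"]))  # 1 1
```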
The embodiment of the application provides an image restoration method combining semantically interpretable information. In the method, an image to be restored is first obtained; data labeled with image attribute information and data labeled with image segmentation information are selected from a preset data set to construct an auxiliary data set; the image to be restored is then input into a pre-trained image restoration model, which restores it using the auxiliary data set. Addressing the shortcoming that existing methods do not consider the semantic correlation within images, the method combines semantically interpretable information with a GAN model to improve the accuracy of image restoration.
The semantically interpretable information in this embodiment comprises two kinds of information. The first is image attribute information: a vector of descriptive text labels related to the image, such as the gender or hair color of a person in the image. The second is image segmentation information, which associates each pixel of the image with a region label, such as the eye, mouth, and nose regions of a facial image, and can resolve the spatial relationships and boundaries between different segmented regions. The aim of the method is to exploit and integrate these two kinds of basic information in the restoration process: a state-of-the-art multi-label image classification model and image semantic segmentation model are pre-trained and incorporated into the image restoration process.
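The two kinds of semantically interpretable information can be illustrated with toy data. All labels and sizes below are made up for illustration; the actual attribute and segmentation labels come from the annotated data sets described below.

```python
import numpy as np

# Illustration only: an attribute vector is a multi-hot vector over
# descriptive tags, while segmentation information assigns a region
# label to every pixel of the image.

ATTRS = ["male", "black_hair", "smiling", "glasses"]  # hypothetical tags
attr_vec = np.array([1, 0, 1, 0])                     # "male", "smiling"

REGIONS = {0: "background", 1: "eyes", 2: "nose", 3: "mouth"}
seg_map = np.zeros((4, 4), dtype=int)                 # tiny 4x4 "image"
seg_map[1, 1] = 1   # an "eye" pixel
seg_map[2, 1] = 2   # a "nose" pixel
seg_map[3, 1] = 3   # a "mouth" pixel

active = [ATTRS[i] for i in np.flatnonzero(attr_vec)]
present = sorted({REGIONS[v] for v in np.unique(seg_map)})
print(active, present)
```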
When the image restoration is performed by combining the semantic interpretable information, step S101 is first executed to obtain an image to be restored.
The image to be repaired is selected from a preset repair training set, denoted d0. In this embodiment, the CelebA-HQ data set is used as the repair training set d0.
After the image to be restored is acquired, step S102 is executed to select data marked with image attribute information and data marked with image segmentation information from a preset data set, and an auxiliary data set is constructed.
In the image restoration process, an auxiliary data set is constructed to identify and analyze the semantic information of the image to be restored, using data labeled with image attribute information and data labeled with image segmentation information. In this embodiment, the data set labeled with image attribute information is denoted d1 and uses the CelebA data set; the data set labeled with image segmentation information is denoted d2 and uses the Helen Face data set. The data in d1 and d2 are selected to have content similar to the images to be restored in the repair training set d0.
Then, step S103 is executed, the image to be repaired is input into the pre-trained image repairing model, and the image repairing model performs image repairing on the image to be repaired by using the auxiliary data set.
In an optional embodiment of the present application, the image restoration model includes a restoration network and a discriminant network; step 103 may be further understood as: inputting an image to be restored into a restoration network, and generating a restoration image of the image to be restored by combining an auxiliary data set; and inputting the repaired image into a judging network, and judging the repaired image based on the judging network.
Optionally, the repair network consists of three sub-networks: a generator network G, an image attribute embedding network Wa, and an image segmentation embedding network Ws.
The generator network G consists of an encoder network and a decoder network. The encoder comprises 6 convolutional layers in total, of which 3 are downsampling layers and 3 are upsampling layers. The output features of the image are obtained from the encoder and sent to the decoder, which learns how to reconstruct the original feature vector; the decoder adopts the 3 upsampling convolutional layers.
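A shape-level sketch of one plausible reading of this layout: downsampling layers halve and upsampling layers double the spatial resolution. Average pooling and nearest-neighbour repetition stand in for the actual convolutional layers; this shows only how the spatial size evolves, not the patent's layer configuration.

```python
import numpy as np

def down2(x):
    """Halve spatial size by 2x2 average pooling (stand-in for a stride-2 conv)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def up2(x):
    """Double spatial size by nearest-neighbour repeat (stand-in for a deconv)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.zeros((128, 128, 3))   # toy input resolution, chosen for illustration
f = x
for _ in range(3):            # 3 downsampling layers: 128 -> 64 -> 32 -> 16
    f = down2(f)
bottleneck = f.shape          # (16, 16, 3)
for _ in range(3):            # 3 upsampling layers: 16 -> 32 -> 64 -> 128
    f = up2(f)
print(bottleneck, f.shape)
```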
The image attribute embedding network Wa and the image segmentation embedding network Ws are pre-trained on the auxiliary data sets. First, the auxiliary data set d1 is used to pre-train the image attribute embedding network Wa, for which a state-of-the-art multi-label image classification model is adopted; then the auxiliary data set d2 is used to pre-train the image segmentation embedding network Ws, for which a state-of-the-art image segmentation network is adopted.
When image restoration is carried out with the repair network, the image to be restored is input into the image attribute embedding network and the image segmentation embedding network, and its attribute vector and segmentation map vector are extracted; the image to be restored, its attribute vector, and its segmentation map vector are then transmitted together to the generator network, which repairs the image to obtain a restored image.
As shown in fig. 2, taking the image to be restored as image x: in order to extract the attribute and segmentation information of the image, the input image x is first fed to the image attribute embedding network Wa and the image segmentation embedding network Ws, obtaining an N1-dimensional attribute vector Wa(x) and a segmentation map Ws(x). Image x and its segmentation map Ws(x) are connected by a bitwise multiplication operation and fed to the encoder part of the generator network G, which generates an intermediate feature map of spatial size M1×M1. The N1-dimensional attribute vector Wa(x) is then concatenated with the M1×M1 feature map to obtain an M1×M1×N1 feature map M. Finally, M is fed into the decoder part of the generator, which generates the restored image z through a series of upsampling operations.
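The data flow just described can be sketched at the level of array shapes. The sizes below (N1 = 8, M1 = 16, a 32-channel intermediate feature map) are made up, and the embedding networks and encoder are replaced by stand-ins; only the two connection operations (bitwise multiplication with the segmentation map, channel-wise concatenation of the attribute vector) follow the text.

```python
import numpy as np

N1, M1 = 8, 16                             # illustrative sizes
x = np.random.rand(128, 128, 3)            # image to be restored
attr = np.random.rand(N1)                  # Wa(x): N1-dim attribute vector (stand-in)
seg = np.random.rand(128, 128, 1)          # Ws(x): segmentation map (stand-in)

fused = x * seg                            # connect x with Ws(x) by bitwise multiply
feat = np.zeros((M1, M1, 32))              # encoder output, M1 x M1 spatial (stand-in)

# Tile Wa(x) over every spatial position and concatenate channel-wise.
attr_map = np.broadcast_to(attr, (M1, M1, N1))
M = np.concatenate([feat, attr_map], axis=-1)   # M1 x M1 x (32 + N1)
print(fused.shape, M.shape)
```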
Further, after the image to be restored is repaired and output from the repair network, the restored image is input into the discrimination network for discrimination.
The discrimination network is a multi-stage discrimination network consisting of three discriminators: a global discriminator Dg, an image attribute discriminator Da, and an image segmentation discriminator Ds.
In an optional embodiment of the present application, before the restored image is input into the discrimination network, a real image corresponding to the image to be restored is acquired; a first attribute vector of the real image and a second attribute vector of the restored image are predicted through the image attribute embedding network; and a first segmentation map of the real image and a second segmentation map of the restored image are generated through the image segmentation embedding network. The real image, the first attribute vector, the second attribute vector, the first segmentation map, and the second segmentation map are then input into the discrimination network together.
In this embodiment, the real image is the complete image corresponding to the image to be restored, obtained in advance from the data set; denote the real image corresponding to the image x by y. After the image x is restored by the repair network and before the result is input into the discrimination network, the restored image z is passed through the image attribute embedding network and the image segmentation embedding network (just as the real image y is), yielding its attribute vector and segmentation map; these are then input into the discrimination network together with the attribute vector and segmentation map of the real image y for comparison.
The global discriminator Dg checks the overall consistency between the restored image z and the real image y. It serves the same function as the discriminator in a general GAN: a binary classifier composed of convolutional layers and fully connected layers that tests whether an input image is fake or real.
The image attribute discriminator Da is composed of convolutional layers and fully connected layers. The image attribute embedding network Wa is used to predict the attribute vector Wa(y) of the real image y and the attribute vector Wa(z) of the restored image z; Wa(y) and Wa(z) are then given to Da as a pair of true/false comparison objects, with Wa(y) as the "true" object and Wa(z) as the "false" object.
For the image segmentation discriminator Ds: before being fed to the discriminator, the restored image z and the real image y are passed through the image segmentation embedding network Ws to generate the segmentation maps Ws(z) and Ws(y). Image z is connected with its segmentation map Ws(z), and image y with its segmentation map Ws(y); the connected images are then fed to Ds for discrimination. In this embodiment, an image and its segmentation map are connected by a bitwise multiplication operation.
In summary, the real image and the restored image are input into the global discriminator, which tests the authenticity of the restored image against the real image; the first attribute vector and the second attribute vector are input into the image attribute discriminator as the true and false discrimination objects, respectively, for comparative discrimination; and the first segmentation map and the second segmentation map are input into the image segmentation discriminator for comparative discrimination. Each discriminator classifies real samples (output "true", or 1) and generated samples (output "false", or 0) as correctly as possible.
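The pairing of discriminator inputs with "true"/"false" labels can be sketched as follows. The embedding networks Wa and Ws are replaced by trivial stand-ins (a channel-mean "attribute" vector and a threshold "segmentation" mask); only the label convention and the bitwise-multiply connection follow the text.

```python
import numpy as np

def Wa(img):
    """Stand-in attribute embedding: per-channel mean statistics."""
    return img.mean(axis=(0, 1))

def Ws(img):
    """Stand-in segmentation embedding: a binary mask of bright pixels."""
    return (img > 0.5).astype(img.dtype)

def discriminator_inputs(y, z):
    """Assemble input/label pairs for the three discriminators: the real
    image y (and its derived vector/map) is the 'true' object (label 1),
    the restored image z the 'false' object (label 0). An image and its
    segmentation map are connected by bitwise multiplication."""
    return {
        "global": [(y, 1), (z, 0)],
        "attribute": [(Wa(y), 1), (Wa(z), 0)],
        "segmentation": [(y * Ws(y), 1), (z * Ws(z), 0)],
    }

y = np.random.rand(64, 64, 3)   # real image (toy size)
z = np.random.rand(64, 64, 3)   # restored image (toy stand-in)
batches = discriminator_inputs(y, z)
print(sorted(batches), [lbl for _, lbl in batches["global"]])
```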
Through these three discriminators, a game process is formed with the repair network, continuously driving improvement of the image restoration results.
Based on the same inventive concept, as shown in fig. 3, an embodiment of the present application further provides an image inpainting system combining semantically interpretable information, including:
an image to be repaired acquisition module 310 configured to acquire an image to be repaired;
an auxiliary data set constructing module 320 configured to select data labeled with image attribute information and data labeled with image segmentation information from a preset data set, and construct an auxiliary data set;
and the image restoration module 330 is configured to input the image to be restored into a pre-trained image restoration model, and the image restoration model performs image restoration on the image to be restored by using the auxiliary data set.
In an optional embodiment of the present application, the image restoration model includes a restoration network and a discriminant network; the repair network comprises three sub-networks, namely a generator network, an image attribute embedded network and an image segmentation embedded network; the generator network comprises an encoder network and a decoder network;
the image attribute embedded network and the image segmentation embedded network are obtained by pre-training image attribute information and image segmentation information marked in the auxiliary data set respectively;
the discrimination network includes a global discriminator, an image attribute discriminator, and an image segmentation discriminator.
An image inpainting module 330, further configured to: inputting an image to be restored into the restoration network, and generating a restored image of the image to be restored by combining an auxiliary data set;
inputting the repaired image into the judging network, and judging the repaired image based on the judging network.
In an alternative embodiment of the present application, the repair network includes three sub-networks, namely, a generator network, an image attribute embedded network, and an image segmentation embedded network; the generator network comprises an encoder network and a decoder network;
the image attribute embedded network and the image segmentation embedded network are obtained by pre-training image attribute information and image segmentation information in the auxiliary data set respectively;
the discrimination network includes a global discriminator, an image attribute discriminator, and an image segmentation discriminator.
In an optional embodiment of the present application, the image inpainting module 330 is further configured to: inputting the image to be restored into the image attribute embedding network and the image segmentation embedding network, and extracting the attribute vector and segmentation map vector of the image to be restored;
and transmitting the image to be repaired, the attribute vector of the image to be repaired and the segmentation map vector to a generator network together, and repairing the image to be repaired to obtain a repaired image.
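The step above — embedding the damaged image, then feeding image plus embeddings through the generator — can be sketched in a few lines. This is a minimal numpy sketch; the dimensions, weight matrices, and the `restore` helper are all hypothetical stand-ins for the patent's trained networks.

```python
import numpy as np

rng = np.random.default_rng(1)
H = W = 8
ATTR_DIM, SEG_DIM, LATENT = 4, 6, 16

# Hypothetical stand-ins for the pre-trained embedding networks and generator.
attr_W = rng.standard_normal((H * W, ATTR_DIM)) * 0.01
seg_W = rng.standard_normal((H * W, SEG_DIM)) * 0.01
enc_W = rng.standard_normal((H * W + ATTR_DIM + SEG_DIM, LATENT)) * 0.01
dec_W = rng.standard_normal((LATENT, H * W)) * 0.01

def embed_attributes(img):
    """Attribute embedding network: image -> attribute vector."""
    return np.tanh(img.reshape(-1) @ attr_W)

def embed_segmentation(img):
    """Segmentation embedding network: image -> segmentation-map vector."""
    return np.tanh(img.reshape(-1) @ seg_W)

def restore(img_to_restore):
    """Concatenate the image with its attribute and segmentation-map vectors,
    encode the result to a latent code, and decode a restored image."""
    a = embed_attributes(img_to_restore)
    s = embed_segmentation(img_to_restore)
    x = np.concatenate([img_to_restore.reshape(-1), a, s])
    z = np.tanh(x @ enc_W)
    return (z @ dec_W).reshape(H, W)

damaged = rng.standard_normal((H, W))
damaged[2:5, 2:5] = 0.0  # simulated missing region
restored = restore(damaged)
```

The point of the sketch is the data flow: the generator never sees the damaged pixels alone, but always alongside the semantic vectors extracted from the same image.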
In an alternative embodiment of the present application, as shown in fig. 4, the system further includes:
a real image analysis module 340 configured to: acquire a real image corresponding to the image to be restored; predict, through the image attribute embedding network, a first attribute vector of the real image and a second attribute vector of the restored image; and generate, through the image segmentation embedding network, a first segmentation map of the real image and a second segmentation map of the restored image.
In an optional embodiment of the present application, the image inpainting module 330 may be further configured to:
input the real image and the restored image into the global discriminator, and test the authenticity of the restored image against the real image;
input the first attribute vector and the second attribute vector into the image attribute discriminator, as the true and false discrimination objects respectively, for comparative discrimination;
and input the first segmentation map and the second segmentation map into the image segmentation discriminator for comparative discrimination.
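The three comparative discriminations combine naturally into a single discriminator objective. Below is a hedged numpy sketch using the standard logistic GAN discriminator loss; the sigmoid scores are illustrative placeholder values, not outputs of trained discriminators, and the patent does not specify this particular loss.

```python
import numpy as np

def d_logistic_loss(p_real, p_fake, eps=1e-7):
    """Standard logistic discriminator loss: push the real score toward 1
    and the fake (restored) score toward 0."""
    return -(np.log(p_real + eps) + np.log(1.0 - p_fake + eps))

# Illustrative sigmoid scores (real input, restored input) per discriminator;
# placeholder values, not outputs of trained networks.
scores = {
    "global": (0.90, 0.30),        # real image vs. restored image
    "attribute": (0.80, 0.40),     # first vs. second attribute vector
    "segmentation": (0.85, 0.35),  # first vs. second segmentation map
}

# The discrimination network's total loss sums the three comparisons.
total_d_loss = sum(d_logistic_loss(r, f) for r, f in scores.values())
```

A discriminator that separates real from restored inputs well (scores near 1 and 0) incurs a small loss; an undecided one (both scores near 0.5) incurs a larger loss.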
The application provides an image restoration method and system that incorporate semantically interpretable information. In the method, an image to be restored is first acquired; data marked with image attribute information and data marked with image segmentation information are selected from a preset data set to construct an auxiliary data set; the image to be restored is then input into a pre-trained image restoration model, which restores it using the auxiliary data set.
Addressing the defect that existing methods do not consider the semantic correlation within images, the method and system combine semantically interpretable information with a GAN model to improve the accuracy of image restoration.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
An embodiment of the present application further provides a computing device. Referring to fig. 5, the computing device comprises a memory 520, a processor 510, and a computer program stored in the memory 520 and executable by the processor 510; the computer program is stored in a space 530 for program code in the memory 520 and, when executed by the processor 510, implements the method steps 531 for performing any of the methods according to the present application.
An embodiment of the present application further provides a computer-readable storage medium. Referring to fig. 6, the computer-readable storage medium comprises a storage unit for program code, provided with a program 531' for performing the steps of the method according to the present application; the program is executed by a processor.
An embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to carry out the steps of the method according to the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed by a computer, the procedures or functions described in accordance with the embodiments of the application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape (magnetic tape), a floppy disk (floppy disk), an optical disk (optical disk), and any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. An image inpainting method that incorporates semantically interpretable information, comprising:
acquiring an image to be repaired;
selecting data marked with image attribute information and data marked with image segmentation information from a preset data set, and constructing an auxiliary data set;
and inputting the image to be repaired into a pre-trained image inpainting model, the image inpainting model performing image inpainting on the image to be repaired by using the auxiliary data set.
2. The method of claim 1, wherein the image inpainting model comprises an inpainting network and a discrimination network;
the inputting the image to be repaired into a pre-trained image inpainting model, the image inpainting model performing image inpainting on the image to be repaired by using the auxiliary data set, comprises:
inputting the image to be repaired into the inpainting network, and generating a repaired image of the image to be repaired in combination with the auxiliary data set;
and inputting the repaired image into the discrimination network, and discriminating the repaired image based on the discrimination network.
3. The method of claim 2, wherein
the inpainting network comprises three sub-networks, namely a generator network, an image attribute embedding network, and an image segmentation embedding network; the generator network comprises an encoder network and a decoder network;
the image attribute embedding network and the image segmentation embedding network are obtained by pre-training on the image attribute information and the image segmentation information marked in the auxiliary data set, respectively;
and the discrimination network comprises a global discriminator, an image attribute discriminator, and an image segmentation discriminator.
4. The method according to claim 3, wherein the inputting the image to be repaired into the inpainting network, and the generating a repaired image of the image to be repaired in combination with the auxiliary data set, comprises:
inputting the image to be repaired into the image attribute embedding network and the image segmentation embedding network, and extracting an attribute vector and a segmentation-map vector of the image to be repaired;
and transmitting the image to be repaired, the attribute vector, and the segmentation-map vector together to the generator network, and repairing the image to be repaired to obtain a repaired image.
5. The method according to claim 3, wherein before the inputting the repaired image into the discrimination network and discriminating the repaired image based on the discrimination network, the method further comprises:
acquiring a real image corresponding to the image to be repaired;
predicting and generating, through the image attribute embedding network, a first attribute vector of the real image and a second attribute vector of the repaired image;
and generating, through the image segmentation embedding network, a first segmentation map of the real image and a second segmentation map of the repaired image.
6. The method of claim 5, wherein the inputting the repaired image into the discrimination network, and the discriminating the repaired image based on the discrimination network, comprises:
inputting the real image and the repaired image into the global discriminator, and testing the authenticity of the repaired image based on the real image;
inputting the first attribute vector and the second attribute vector into the image attribute discriminator, as a true discrimination object and a false discrimination object respectively, for comparative discrimination;
and inputting the first segmentation map and the second segmentation map into the image segmentation discriminator for comparative discrimination.
7. An image inpainting system incorporating semantically interpretable information, comprising:
an image to be repaired acquisition module configured to acquire an image to be repaired;
an auxiliary data set construction module configured to select data marked with image attribute information and data marked with image segmentation information from a preset data set to construct an auxiliary data set;
and an image inpainting module configured to input the image to be repaired into a pre-trained image inpainting model, the image inpainting model performing image inpainting on the image to be repaired by using the auxiliary data set.
8. The system of claim 7, wherein
the image inpainting model comprises an inpainting network and a discrimination network;
the image inpainting module is further configured to:
input the image to be repaired into the inpainting network, and generate a repaired image of the image to be repaired in combination with the auxiliary data set;
and input the repaired image into the discrimination network, and discriminate the repaired image based on the discrimination network.
9. The system of claim 8, wherein the inpainting network comprises three sub-networks, namely a generator network, an image attribute embedding network, and an image segmentation embedding network; the generator network comprises an encoder network and a decoder network;
the image attribute embedding network and the image segmentation embedding network are obtained by pre-training on the image attribute information and the image segmentation information marked in the auxiliary data set, respectively;
the discrimination network comprises a global discriminator, an image attribute discriminator, and an image segmentation discriminator;
and the image inpainting module is further configured to: input the image to be repaired into the image attribute embedding network and the image segmentation embedding network, and extract an attribute vector and a segmentation-map vector of the image to be repaired; and transmit the image to be repaired, the attribute vector, and the segmentation-map vector together to the generator network, and repair the image to be repaired to obtain a repaired image.
10. The system of claim 9, wherein the image inpainting module is further configured to:
input the real image and the repaired image into the global discriminator, and test the authenticity of the repaired image based on the real image;
input the first attribute vector and the second attribute vector into the image attribute discriminator, as a true discrimination object and a false discrimination object respectively, for comparative discrimination;
and input the first segmentation map and the second segmentation map into the image segmentation discriminator for comparative discrimination.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010838923.XA CN112085670A (en) | 2020-08-19 | 2020-08-19 | Image restoration method and system based on semantic interpretable information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112085670A true CN112085670A (en) | 2020-12-15 |
Family
ID=73729427
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010838923.XA Pending CN112085670A (en) | 2020-08-19 | 2020-08-19 | Image restoration method and system based on semantic interpretable information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112085670A (en) |
2020-08-19 (CN): application CN202010838923.XA filed, published as CN112085670A, status Pending
Non-Patent Citations (1)
Title |
---|
ANG LI et al., "Boosted GAN with Semantically Interpretable Information for Image Inpainting", arXiv, pages 3-5 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11373390B2 (en) | Generating scene graphs from digital images using external knowledge and image reconstruction | |
JP7474587B2 | Method and system for information extraction from document images using interactive interface and database queries | |
US9424494B1 (en) | Pure convolutional neural network localization | |
US20210295114A1 (en) | Method and apparatus for extracting structured data from image, and device | |
US20230021661A1 (en) | Forgery detection of face image | |
CN111914654B (en) | Text layout analysis method, device, equipment and medium | |
Li et al. | Towards photo-realistic visible watermark removal with conditional generative adversarial networks | |
CN116049397B (en) | Sensitive information discovery and automatic classification method based on multi-mode fusion | |
CN114120349B (en) | Test paper identification method and system based on deep learning | |
CN112088378A (en) | Image hidden information detector | |
JP2022161564A (en) | System for training machine learning model recognizing character of text image | |
CN110135428B (en) | Image segmentation processing method and device | |
CN113033305B (en) | Living body detection method, living body detection device, terminal equipment and storage medium | |
KR102026280B1 (en) | Method and system for scene text detection using deep learning | |
CN112801960B (en) | Image processing method and device, storage medium and electronic equipment | |
CN111860212B (en) | Super-division method, device, equipment and storage medium for face image | |
CN115331004A (en) | Zero sample semantic segmentation method and device based on meaningful learning | |
Castillo et al. | Object detection in digital documents based on machine learning algorithms | |
CN112085670A (en) | Image restoration method and system based on semantic interpretable information | |
CN114707017A (en) | Visual question answering method and device, electronic equipment and storage medium | |
CN117597702A (en) | Scaling-independent watermark extraction | |
EP4172925A1 (en) | Zoom agnostic watermark extraction | |
Kartik et al. | Decoding of graphically encoded numerical digits using deep learning and edge detection techniques | |
CN113011989B (en) | Object verification method, device, equipment and storage medium | |
WO2023088176A1 (en) | Data augmentation for machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||