WO2023124697A1 - Image enhancement method, apparatus, storage medium, and electronic device - Google Patents


Info

Publication number
WO2023124697A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
information
target
sample
appearance
Prior art date
Application number
PCT/CN2022/134845
Other languages
French (fr)
Chinese (zh)
Inventor
唐斯伟
郑程耀
吴文岩
钱晨
Original Assignee
上海商汤智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023124697A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods
    • G06T5/60
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/13 Edge detection
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • The present disclosure relates to artificial intelligence technology, and in particular to an image enhancement method and apparatus, a storage medium, and an electronic device.
  • Image enhancement is widely used in various scenarios. For example, when training a neural network, more and richer sample images can be obtained by performing image enhancement on existing sample images. As another example, image enhancement can be used to implement face image applications such as makeup transfer and face driving.
  • Image enhancement methods in the related art either use traditional image processing operations such as stretching and interpolation, which yield enhanced images of low quality and can only perform enhancement under limited conditions, so the variety of enhancements obtainable is small.
  • Or a neural network is used for image enhancement, but training the neural network requires obtaining enough sample images. For example, a video of a certain duration of a single-ID user is often required in order to obtain multiple face images of that user from the video. The cost of obtaining training samples in this way is relatively high, and it is also very inconvenient for users.
  • Embodiments of the present disclosure at least provide an image enhancement method and device, a storage medium, and an electronic device.
  • In a first aspect, an image enhancement method is provided, comprising: performing feature extraction on a target image to obtain appearance information of the target image, wherein the target image includes a first object, and the appearance information represents surface visual features in the target image; acquiring structure information of a second object, wherein the first object and the second object are target objects of the same type, and the structure information represents contour features of the second object; and generating an enhanced image based on the appearance information and the structure information, wherein the enhanced image includes a target object having the appearance information and the structure information.
  • In some embodiments, the method is performed by an image enhancement device in which an image enhancement network is deployed, the image enhancement network including an appearance extractor and a generator. Performing feature extraction on the target image to obtain the appearance information of the target image includes: performing feature extraction on the target image through the appearance extractor in the image enhancement network. Generating the enhanced image based on the appearance information and the structure information includes: generating the enhanced image through the generator in the image enhancement network.
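As an illustration of this two-stage decomposition, here is a minimal NumPy sketch. The toy `appearance_extractor` and `generator` below are hypothetical placeholders for the trained networks, not the disclosed implementation:

```python
import numpy as np

def appearance_extractor(target_image):
    # Placeholder for a trained CNN: summarize surface visual
    # features (here, just the per-channel mean color) as a vector.
    return target_image.mean(axis=(0, 1))

def generator(appearance, structure_map):
    # Placeholder for a trained generator: paint the appearance
    # vector wherever the structure map is active.
    h, w = structure_map.shape
    out = np.zeros((h, w, appearance.shape[0]))
    out[structure_map > 0.5] = appearance
    return out

# Toy 4x4 RGB target image and a 4x4 binary structure map.
target = np.full((4, 4, 3), 0.8)
structure = np.zeros((4, 4))
structure[1:3, 1:3] = 1.0

enhanced = generator(appearance_extractor(target), structure)
print(enhanced.shape)  # (4, 4, 3)
```

The point of the split is that appearance (what the object looks like) and structure (its contour) travel through separate paths and are only combined in the generator.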
  • In some embodiments, acquiring the structure information of the second object includes: acquiring an initial image including the second object; performing key point detection on the initial image to obtain key points of the second object; and obtaining the structure information of the second object according to the key points of the second object.
  • In some embodiments, the second object is included in an auxiliary image. The method further includes: acquiring an initial image including the target object; performing key point detection on the initial image to obtain key points of the target object in the initial image; and cropping the initial image according to the key points of the target object to obtain the target image or the auxiliary image including the target object.
  • In some embodiments, after the enhanced image is generated based on the appearance information and the structure information, the method further includes replacing the corresponding image portion in the initial image with the enhanced image.
  • the first object and the second object are the same target object, or different target objects of the same type, and the target object is one of the facial features in a human face.
  • In a second aspect, a training method of an image enhancement network is provided, comprising: acquiring a sample image including a first object and structure information of a second object, wherein the first object and the second object are the same target object with different structure information, and the structure information represents contour features of the second object; performing feature extraction on the sample image through the image enhancement network to obtain appearance information of the sample image; performing image generation processing on the appearance information and the structure information through the image enhancement network, and outputting a sample enhanced image, wherein the sample enhanced image includes the target object having the appearance information and the structure information; and adjusting network parameters of the image enhancement network according to the sample enhanced image.
  • In some embodiments, the second object is included in an auxiliary image, and the image enhancement network includes an appearance extractor and a generator. Adjusting the network parameters of the image enhancement network according to the sample enhanced image includes: adjusting network parameters of the appearance extractor and the generator according to the difference between the sample enhanced image and the auxiliary image.
  • an image enhancement device comprising:
  • an appearance extraction module, configured to perform feature extraction on a target image to obtain appearance information of the target image, wherein the target image includes a first object, and the appearance information represents surface visual features in the target image;
  • a structure acquisition module configured to acquire structure information of a second object, the first object and the second object are target objects of the same type; the structure information represents the outline feature of the second object;
  • An image generating module configured to generate an enhanced image based on the appearance information and the structure information, wherein the enhanced image includes a target object with the appearance information and the structure information.
  • a training device for an image enhancement network comprising:
  • an information acquisition module, configured to acquire a sample image including a first object and structure information of a second object, wherein the first object and the second object are the same target object with different structure information, and the structure information represents contour features of the second object;
  • a feature extraction module configured to perform feature extraction on the sample image through an image enhancement network to obtain appearance information of the sample image, wherein the appearance information represents surface visual features in the sample image;
  • an image output module, configured to perform image generation processing on the appearance information and the structure information through the image enhancement network, and output a sample enhanced image, wherein the sample enhanced image includes the target object having the appearance information and the structure information;
  • a parameter adjustment module configured to adjust the network parameters of the image enhancement network according to the sample enhanced image.
  • an electronic device, including a memory and a processor, wherein the memory is configured to store computer-readable instructions, and the processor is configured to invoke the computer-readable instructions to implement the method in any embodiment of the present disclosure.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method in any embodiment of the present disclosure is implemented.
  • The image enhancement method and apparatus, storage medium, and electronic device provided by the embodiments of the present disclosure can enhance a sample image according to various types of structure information. Since the structure information can be varied without limitation, richer sample enhanced images can be obtained, making the sample types more abundant. When the generated sample enhanced images are applied to tasks such as model training, rich and diverse samples can improve the robustness and generalization of the trained model. Compared with previous sample acquisition methods, this approach reduces the cost of sample acquisition and makes samples easier to obtain.
  • In addition, the method uses an image enhancement network to generate the sample enhanced image, which yields higher image quality than conventional image processing methods such as interpolation and stretching.
  • Fig. 1 shows a schematic flowchart of a method for training an image enhancement network provided by at least one embodiment of the present disclosure
  • Fig. 2 shows a schematic framework diagram of image enhancement provided by at least one embodiment of the present disclosure
  • Fig. 3A shows a schematic diagram of structure information of a first object provided by at least one embodiment of the present disclosure
  • Fig. 3B shows a schematic diagram of the structure information of the second object provided by at least one embodiment of the present disclosure
  • Fig. 4 shows a schematic diagram of structural information of another eye provided by at least one embodiment of the present disclosure
  • Fig. 5 shows a schematic diagram of network training provided by at least one embodiment of the present disclosure
  • Fig. 6 shows a schematic flowchart of an image enhancement method provided by at least one embodiment of the present disclosure
  • Fig. 7 shows a schematic structural diagram of an image enhancement device provided by at least one embodiment of the present disclosure
  • Fig. 8 shows a schematic structural diagram of an image enhancement network training device provided by at least one embodiment of the present disclosure.
  • Embodiments of the present disclosure aim to provide an image enhancement method, which can generate an enhanced image through a trained neural network.
  • the neural network may be called an image enhancement network
  • the enhanced image may be an image obtained after enhancement processing is performed on the basis of an initial image.
  • the enhancement process can be, for example, deforming the image.
  • Taking the enhancement of a human face image as an example, the enhancement can include, but is not limited to, changes in the angle of the face, changes in facial expression, changes in the orientation of the face, changes in the size of the facial features, and so on.
  • For example, if the initial image is a human face image in which the mouth is closed, the mouth in the face image can be transformed into a smiling mouth to obtain an enhanced image.
  • the training process of the image enhancement network will be described first, and then how to generate an enhanced image through the trained image enhancement network will be described.
  • Fig. 1 shows a schematic flowchart of a method for training an image enhancement network provided by at least one embodiment of the present disclosure. As shown in Fig. 1, the method may include the following processing:
  • In step 100, a sample image and structure information of a second object are acquired.
  • The training method of this embodiment can be performed by a training device of the image enhancement network. The training device can be deployed on an electronic device (such as a server) and can include the image enhancement network to be trained.
  • the training device of the image enhancement network can obtain the sample image to be enhanced, and in this embodiment, the image to be enhanced in the training stage can be called a sample image.
  • the sample image includes the first object.
  • the sample image may be an image including eyes, and the first object may be the eyes in the sample image.
  • the sample image may be an image including trees, and the first object may be the trees in the sample image.
  • the training device may also obtain structural information of a second object, which is the same target object with different structural information from the first object.
  • the structure information can be understood as representing the contour features of the second object, for example, the size and structure of the object.
  • For example, taking the five sense organs of a human face as an example, the acquired structure information can be contour features of the mouth, contour features of the nose, and so on; it can also be feature information such as the height of the nose.
  • The record form of the contour features includes but is not limited to: a contour line, or a plurality of key points distributed on the contour line, for which the position coordinates or key point identifiers of these key points are recorded.
  • the target object is an eye
  • the structural information may be an outline feature of the eye.
  • The structure information of the first object illustrated in Fig. 3A and the structure information of the second object illustrated in Fig. 3B can be seen as eyes with two different sets of structure information. The two objects can be the eyes of the same person, with one squinting and the other wide open, so the structure information of the two eyes is different.
  • The first object and the second object with different structure information can also be as in the following example: if the target object is a mouth, the first object can be a closed mouth and the second object an open mouth. Even if the two mouths belong to the same person, their structure information differs because of their different states; for example, due to the different mouth states, the positions of the contour key points recorded in the contour features of the closed mouth differ from those recorded in the contour features of the open mouth.
  • In step 102, feature extraction is performed on the sample image through the image enhancement network to obtain appearance information of the sample image.
  • The appearance information may be acquired through feature extraction by an appearance extractor in the image enhancement network. That is, the image enhancement network may include an appearance extractor 21, which performs feature extraction on the sample image to obtain the appearance information of the sample image.
  • the appearance extractor may include various modules such as a convolution layer, a residual module, an activation layer, and a pooling layer.
  • the appearance information represents surface visual features in the target image.
  • the surface visual features include but not limited to texture, color, lighting information, etc. in the target image.
  • the appearance information obtained can include: the brightness of the face area, the texture of the face, the color of the face, etc.
  • the appearance information of the sample image output by the appearance extractor 21 may be expressed as a one-dimensional tensor, which may be a 64*1 tensor.
  • The appearance extractor 21 may extract all or part of the appearance information, which may be determined according to actual business requirements. For example, take an eye picture: in addition to the eyes, it also includes part of the face area around the eyes and the eyebrow area. The appearance extractor 21 can extract the brightness, color, and texture of all of these areas, or only the appearance information of the eyebrow area, or only that of the face area around the eyes. Extraction of the appearance information of at least part of the region in the sample image can be realized by designing and training the appearance extractor 21.
  • the function of the appearance extractor 21 can also be designed to realize the extraction of at least part of the appearance information, for example, only the texture and texture in the sample image are extracted. color without extracting brightness.
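To make the shape of the appearance representation concrete, here is a hedged toy sketch that reduces a 64x64 grayscale patch to a 64-element vector by block averaging. A real appearance extractor would instead use trained convolution, residual, activation, and pooling layers; only the input/output shapes are taken from the text above:

```python
import numpy as np

def toy_appearance_vector(image):
    """Reduce a 64x64 grayscale image to a 64-element appearance
    vector by 8x8 block averaging (a stand-in for the conv /
    residual / pooling stack of a real appearance extractor)."""
    h, w = image.shape
    # Split into an 8x8 grid of 8x8 blocks and average each block.
    blocks = image.reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))
    return blocks.flatten()  # shape (64,)

img = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)
vec = toy_appearance_vector(img)
print(vec.shape)  # (64,)
```

Because the blocks tile the image evenly, the mean of the vector equals the mean of the whole patch, so no brightness information is lost at this granularity.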
  • Similarly, the structure information of the second object obtained in step 100 may be all or part of the structure information, determined according to actual business requirements. For example, taking the second object as an eye, full structure information can include the key points of the outer contour of the eye, the key points of the contour of the eyeball, and the center point of the eyeball, whereas partial structure information may include only the outer contour points of the eye, excluding the eyeball contour key points and the eyeball center point.
  • the structural information obtained in step 100 and the appearance information obtained in step 102 in this embodiment are at least part of the extracted information, and these information will participate in the image generation process.
  • In step 104, image generation processing is performed on the appearance information and the structure information through the image enhancement network, and a sample enhanced image is output, wherein the sample enhanced image includes the target object having the appearance information and the structure information.
  • the image enhancement network can generate a sample enhanced image according to the appearance information and structure information obtained above.
  • image generation processing may be performed by the generator 22 to output a sample enhanced image.
  • the sample enhanced image can have both appearance information and structure information.
  • the structure information may be possessed by a target object in the sample enhanced image, and the target object may be the aforementioned first object or second object.
  • the sample image is an image containing eyes
  • the structure information is a structure map of the eyes in another state.
  • In the output sample enhanced image, the structure information of the first object in the sample image is replaced with the structure information of the second object, while other information may remain unchanged; for example, the face texture around the eyes, the face color, the eyebrows, the eyeball position inside the eyes, and the eye color may be the same as in the sample image.
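One way to picture this replacement is an explicit composite: keep the sample image wherever the new structure is inactive and use regenerated pixels where it is active. This is only an illustrative sketch; the disclosed network learns the combination end-to-end rather than compositing explicitly:

```python
import numpy as np

def composite(sample, generated_region, mask):
    # Keep everything in the sample untouched except where the
    # (new) structure mask is active; there, use generated pixels.
    mask3 = mask[..., None]  # broadcast over the channel axis
    return sample * (1 - mask3) + generated_region * mask3

sample = np.full((4, 4, 3), 0.6)     # original eye patch
generated = np.full((4, 4, 3), 0.2)  # re-generated eye region
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                 # where the new structure lives

out = composite(sample, generated, mask)
print(out[0, 0, 0], out[1, 1, 0])  # 0.6 0.2
```

Outside the mask the surrounding texture, eyebrows, and color survive unchanged, which is exactly the behavior described above.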
  • In step 106, network parameters of the image enhancement network are adjusted according to the sample enhanced image.
  • the image serving as the label of the sample enhanced image may be an auxiliary image where the second object is located.
  • the auxiliary image may have the same image size as the sample image, and the auxiliary image and the sample image may include the same area.
  • For example, if the sample image in Fig. 2 includes an eye and an eyebrow, the auxiliary image corresponding to the sample image can also include an eye and an eyebrow, that is, it includes the same area as the sample image, and the two images can have the same size. The difference is that the structure information of the eyes differs between the sample image and the auxiliary image; for example, the eyes in the sample image are wide open while the eyes in the auxiliary image are squinted.
  • In this step, the network parameters of the image enhancement network can be adjusted according to the sample enhanced image. For example, based on the difference between the sample enhanced image and the auxiliary image, an L1 norm loss (L1 loss) between the two images can be computed, and the network parameters of the appearance extractor and the generator can be adjusted according to this L1 loss.
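The L1 loss between the sample enhanced image and the auxiliary image can be sketched as follows (a minimal NumPy example with made-up pixel values):

```python
import numpy as np

def l1_loss(pred, target):
    # Mean absolute difference between the sample enhanced image
    # and the auxiliary image used as its label.
    return np.mean(np.abs(pred - target))

pred = np.array([[0.2, 0.4], [0.6, 0.8]])   # toy enhanced image
label = np.array([[0.0, 0.5], [0.5, 1.0]])  # toy auxiliary image
print(l1_loss(pred, label))  # 0.15
```

In training, the gradient of this loss with respect to the network parameters is what drives the adjustment of the appearance extractor and the generator.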
  • With the above training method, the sample image can be enhanced according to various types of structure information. Since the structure information can be varied without limitation, richer sample enhanced images can be obtained, making the sample types more abundant.
  • When the generated sample enhanced images are applied to tasks such as model training, rich and diverse samples can improve the robustness and generalization of the trained model. Compared with previous sample acquisition methods, the cost of sample acquisition is reduced and sample acquisition is more convenient.
  • In addition, the method uses an image enhancement network to generate the sample enhanced image, so the quality of the generated image is higher than that of conventional image processing methods such as interpolation and stretching.
  • the first object included in the sample image may be determined according to actual application requirements.
  • For example, if the actual application requires sample enhanced images that include eyes, the first object in the sample enhanced image is an eye; if it requires images that include mouths, the first object is a mouth.
  • other organs in the facial features can also be enhanced, such as eyebrows, nose, etc.
  • the sample image containing the organ to be enhanced and the corresponding structural information of the organ can be used to generate the sample enhanced image.
  • The sample image shown in Fig. 2 is an image including eyes, but in actual implementation the initially obtained image may cover a relatively large range, such as the entire face. In that case, the initial image can be preprocessed before performing the image enhancement process shown in Fig. 2.
  • FIG. 4 schematically shows the face image obtained after cropping the initial image, and some key points of the face in the face image, for example, key point 41 , key point 42 and so on.
  • the image in Figure 4 is further cropped to obtain an image including the mouth.
  • the mouth in the mouth image is the mouth in the face image.
  • the mouth image can be used as a sample image in the training phase of the image enhancement network, or can also be used as an auxiliary image.
  • the structure information of the corresponding mouth can also be obtained.
  • the structure information may be a structure map (heatmap) corresponding to the mouth. Keypoints for the mouth may be included in the structure map.
  • the structure graph can be input into the image enhancement network to assist the sample image to generate a corresponding sample enhanced image.
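A common way to render such a structure map from keypoints is one Gaussian heatmap per keypoint. The patent does not fix the exact encoding, so the following is an assumed sketch with toy mouth keypoints:

```python
import numpy as np

def keypoints_to_heatmaps(keypoints, size, sigma=1.5):
    """Render one Gaussian heatmap per (x, y) keypoint; a common
    way to encode a structure map, though the exact encoding here
    is an assumption rather than the disclosed one."""
    h, w = size
    ys, xs = np.mgrid[0:h, 0:w]
    maps = []
    for x, y in keypoints:
        d2 = (xs - x) ** 2 + (ys - y) ** 2
        maps.append(np.exp(-d2 / (2 * sigma ** 2)))
    return np.stack(maps)  # (num_keypoints, h, w)

# Toy mouth keypoints: corners plus upper/lower lip centers.
mouth_kps = [(10, 16), (22, 16), (16, 12), (16, 20)]
hm = keypoints_to_heatmaps(mouth_kps, (32, 32))
print(hm.shape)  # (4, 32, 32)
```

Each channel peaks exactly at its keypoint, so the stacked heatmaps carry the contour geometry in a form a convolutional network can consume alongside the sample image.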
  • the following data can be prepared:
  • A small number of face images with the same ID, for example, 15 face images of the same person Xiao Zhang, as well as face images of 15,000 other IDs.
  • the same ID refers to the same person, for example, multiple face images of Xiao Wang belong to the same ID, and the ID is Xiao Wang's identification.
  • each ID has a certain number of face images with different expressions and different angles.
  • the 15,000 other IDs may be face images of Xiao Wang, Xiao Dong and other people.
  • the prepared training data may include face images of multiple IDs, and each ID may include face images of multiple expressions and different angles, and different expressions and angles may correspond to different structural information.
  • the sample image and the auxiliary image can be two face images belonging to the same ID randomly selected from the above training data.
  • For example, two face images of Xiao Zhang can be selected. Both images are of Xiao Zhang's face, but in one image Xiao Zhang's eyes are squinted while in the other his eyes are wide open. The structure information of the eyes in the two images is therefore different, while the appearance information other than the structure information is the same.
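The pairing step described above, randomly drawing two face images of the same ID as the sample image and the auxiliary image, can be sketched as follows; the index structure and filenames are hypothetical:

```python
import random

# Toy training index: ID -> list of face image filenames (made up).
train_index = {
    "xiao_zhang": ["zhang_00.png", "zhang_01.png", "zhang_02.png"],
    "xiao_wang":  ["wang_00.png", "wang_01.png"],
}

def sample_pair(index, rng):
    # Randomly pick one ID, then two distinct images of that ID:
    # one serves as the sample image, the other as the auxiliary
    # image (the label for this enhancement).
    person = rng.choice(sorted(index))
    sample_img, auxiliary_img = rng.sample(index[person], 2)
    return person, sample_img, auxiliary_img

rng = random.Random(0)
person, s, a = sample_pair(train_index, rng)
print(person, s != a)  # prints the chosen ID and True
```

Drawing both images from the same ID is what guarantees that only the structure information differs while the appearance information is shared.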
  • the auxiliary image is used as the label of this enhancement, and the network parameters of the image enhancement network are subsequently adjusted according to the difference between the auxiliary image and the sample enhanced image output by the image enhancement network.
  • each face image in the above training data may be preprocessed as shown in FIG. 4 .
  • Specifically, the key points of the face in each face image can be identified, and then, according to these key points, the face image can be cropped to obtain an image including one of the five sense organs of the face.
  • each image in the above training data may be cropped to obtain an eye image including eyes.
  • The two eye images belonging to the same person are used as the auxiliary image and the sample image respectively, and the image enhancement network shown in Fig. 2 outputs an enhanced eye image; that is, in the enhanced eye image, the structure information of the eye in the sample image is replaced with the structure information of the eye in the auxiliary image.
  • Fig. 5 shows another schematic diagram of network training provided by at least one embodiment of the present disclosure.
  • In addition to adjusting the network parameters according to the difference between the sample enhanced image and the auxiliary image as described above, the image enhancement network can also be trained using the method shown in Fig. 5.
  • the sample enhanced image and the corresponding label may be input into the discriminator 23 to obtain a discriminant value output by the discriminator 23 .
  • the discriminant value may be a numerical value between 0 and 1, which is used to represent the probability of authenticity of the sample enhanced image.
  • A first loss is obtained according to the difference between the discriminant value and the discriminant ground truth, and a second loss is obtained according to the difference between the sample enhanced image and the auxiliary image. Network parameters of at least one of the appearance extractor, the generator, and the discriminator are then adjusted according to the first loss and the second loss.
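A hedged sketch of how the two losses might be combined. The binary cross-entropy form of the first loss and the weighting factor `lam` are assumptions; the text only specifies that both losses drive the parameter updates:

```python
import numpy as np

def first_loss(d_value, real_label=1.0):
    # Binary cross-entropy between the discriminator's output in
    # (0, 1) and the ground-truth label for a real image.
    return (-real_label * np.log(d_value)
            - (1 - real_label) * np.log(1 - d_value))

def second_loss(pred, target):
    # L1 between the sample enhanced image and the auxiliary image.
    return np.mean(np.abs(pred - target))

d_value = 0.8                 # discriminator says "80% real"
pred = np.array([0.2, 0.4])
target = np.array([0.0, 0.5])
lam = 10.0                    # hypothetical weighting factor
total = first_loss(d_value) + lam * second_loss(pred, target)
print(round(total, 4))  # 1.7231
```

Weighting the reconstruction term more heavily (as `lam` does here) is a common GAN practice to keep the generated image anchored to the label while the adversarial term sharpens realism.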
  • The aforementioned generator and discriminator may adopt a conventional generative adversarial network (GAN) structure, which is not limited in this embodiment.
  • the network structure may include convolutional layers, residual modules, pooling layers, linear layers, activation layers, etc.
  • Training the image enhancement network as a generative adversarial network in this way makes the discriminant value output by the discriminator as close to the true value as possible, thereby improving the fidelity of enhanced image generation and helping to produce higher-quality enhanced images.
  • FIG. 6 shows a schematic flowchart of an image enhancement method provided by at least one embodiment of the present disclosure. As shown in Fig. 6, the method may be executed by an image enhancement device, and the method may include the following processing:
  • In step 600, feature extraction is performed on the target image to obtain appearance information of the target image; the target image includes a first object.
  • the target image may be an image including eyes, for example, the sample image shown in FIG. 2 includes an image of human eyes.
  • The eyes in the target image may be referred to as the first object, and the purpose of this embodiment may be to enhance the target image by enhancing and deforming the eyes in it.
  • the appearance information of the target image can be obtained by extracting the features of the target image through the appearance extractor in the trained image enhancement network.
  • the initial image can be preprocessed to obtain a target image including eyes.
  • the key point detection network can be used to detect the key points of the face in the initial image to obtain the key points of the face in the initial image.
  • the initial image can be cropped according to these face key points to obtain the above-mentioned target image including eyes.
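The preprocessing described above (detecting face key points in the initial image, then cropping an eye region around them) can be sketched roughly as follows. The bounding-box logic and the margin factor are illustrative assumptions; the patent does not specify how the crop is computed.

```python
import numpy as np

def crop_around_keypoints(image, keypoints, margin=0.2):
    """Crop the region around the given key points (e.g. eye landmarks),
    expanded by a relative margin and clipped to the image bounds."""
    h, w = image.shape[:2]
    xs, ys = keypoints[:, 0], keypoints[:, 1]
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    mx, my = (x1 - x0) * margin, (y1 - y0) * margin
    left = max(int(x0 - mx), 0)
    right = min(int(np.ceil(x1 + mx)), w)
    top = max(int(y0 - my), 0)
    bottom = min(int(np.ceil(y1 + my)), h)
    return image[top:bottom, left:right]

# Toy example: crop a 100x100 "face" around six hypothetical eye key points.
face = np.zeros((100, 100), dtype=np.uint8)
eye_pts = np.array([[30, 40], [35, 38], [40, 37], [45, 38], [50, 40], [40, 43]])
eye_crop = crop_around_keypoints(face, eye_pts)
```

The key points themselves would come from a face key point detection network, which is outside the scope of this sketch.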
  • step 602 the structure information of the second object is obtained according to the key points of the second object in the auxiliary image, and both the first object and the second object are target objects of the same type.
  • the second object in the auxiliary image is the same type of object as the first object, for example, both objects are eyes, or both objects are mouths.
  • the object of the same type may be referred to as a target object.
  • the eyes in the auxiliary image and the target image are different.
  • the eyes in the target image are referred to as the first object, and the eyes in the auxiliary image are referred to as the second object.
  • the first object and the second object in this embodiment may be the same target object, for example, both are Xiao Wang's eyes, with the eyes in different states in the two images (e.g., one wide open and the other squinting).
  • the first object and the second object may also belong to different target objects, for example, the first object is Xiao Wang's eyes, and the second object is Xiao Zhang's eyes.
  • the structure information of the second object can be obtained according to the key points of the second object in the auxiliary image.
  • the image enhancement network may include a network module for extracting key points; in that case, after the auxiliary image is input into the image enhancement network, the key points in the auxiliary image are extracted by this module, and the structure information of the second object is then obtained according to these key points.
  • alternatively, the image enhancement network may not include a network module for extracting key points; the structure information of the second object may instead be obtained by a processing module outside the image enhancement network and then input into the image enhancement network.
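One plausible form of the "structure information" derived from key points is a map that marks the key point locations on an image-sized grid, which can then be fed to the generator alongside the appearance features. The patent does not fix this representation; the sketch below simply rasterizes key points into a binary structure map as an assumed encoding.

```python
import numpy as np

def keypoints_to_structure_map(keypoints, height, width):
    """Rasterize (x, y) key points of the second object into a binary
    structure map of the given size (an assumed encoding of 'structure
    information'); points outside the grid are ignored."""
    smap = np.zeros((height, width), dtype=np.float32)
    for x, y in keypoints:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:
            smap[yi, xi] = 1.0
    return smap

# Hypothetical eye landmarks rasterized onto a 32x32 grid.
eye_keypoints = [(10.2, 8.0), (14.9, 6.1), (20.0, 8.0)]
structure = keypoints_to_structure_map(eye_keypoints, height=32, width=32)
```

In practice, smoother encodings (e.g. Gaussian heatmaps or contour curves drawn through the points) are also common choices for such structure maps.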
  • step 604 an enhanced image is generated based on the appearance information and the structure information; in the enhanced image, the structure information of the first object in the target image is replaced with the structure information of the second object.
  • the generator in the image enhancement network can perform image generation processing according to the acquired appearance information and structure information, and finally generate an enhanced image.
  • the enhanced image includes the appearance information of the target image and the structure information of the second object in the auxiliary image; compared with the target image, the enhanced image thus replaces the structure information of the first object with the structure information of the second object.
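At the generator's input side, one common way (assumed here; the patent does not specify the fusion mechanism) to combine the appearance and structure information is to broadcast the appearance feature vector over the spatial grid of the structure map and concatenate the two along the channel axis before the generator's convolutions:

```python
import numpy as np

def combine_appearance_and_structure(appearance_vec, structure_map):
    """Broadcast a (C,) appearance feature vector over an (H, W) structure
    map and stack them into a (C + 1, H, W) generator input tensor
    (one assumed fusion scheme among several possible)."""
    h, w = structure_map.shape
    planes = np.repeat(appearance_vec[:, None, None], h, axis=1)
    planes = np.repeat(planes, w, axis=2)
    return np.concatenate([planes, structure_map[None]], axis=0)

appearance = np.array([0.1, 0.5, 0.9], dtype=np.float32)  # C = 3 appearance features
structure = np.zeros((32, 32), dtype=np.float32)          # H = W = 32 structure map
gen_input = combine_appearance_and_structure(appearance, structure)
```

Other schemes, such as injecting appearance features via normalization layers, would serve the same purpose of conditioning generation on both kinds of information.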
  • the enhanced image including eyes that is output by the image enhancement device of this embodiment through the image enhancement network can be used directly for subsequent network training; in that case, the enhanced image may not undergo further processing.
  • in other cases, the final output is an image of the entire face.
  • the initial image may be a face image of Xiao Wang, and it is desired to obtain an enhanced image that changes the structural information of Xiao Wang's eyes.
  • the structure information of Xiao Zhang's eyes can be obtained and combined with the image of Xiao Wang's eyes cropped from Xiao Wang's face image; image generation is then performed through the image enhancement network, and in the resulting enhanced image the structure information of Xiao Wang's eyes is replaced with the structure information of Xiao Zhang's eyes.
  • the enhanced image output by the image enhancement network is an image including Xiao Wang's eyes; this enhanced image can also be pasted back into the original face image, that is, the enhanced image replaces the corresponding part of Xiao Wang's face image, yielding an updated face image of Xiao Wang, which may also be called the enhanced face image of Xiao Wang.
  • an eye image (an image including the eyes) and a mouth image (an image including the mouth) are cropped from the initial face image according to the face key points; the eye image and the mouth image are then enhanced separately through the image enhancement network to obtain the corresponding enhanced images, for example, an eye-enhanced image and a mouth-enhanced image; finally, the eye-enhanced image and the mouth-enhanced image are pasted back into the initial face image, replacing the corresponding parts.
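The paste-back step above (replacing the corresponding part of the initial face image with an enhanced crop) can be sketched as a simple array assignment; in practice, blending at the seam would likely be needed, which is omitted here.

```python
import numpy as np

def paste_back(face_image, enhanced_crop, top, left):
    """Replace the region of the face image at (top, left) with the
    enhanced crop, producing the updated (enhanced) face image."""
    out = face_image.copy()
    h, w = enhanced_crop.shape[:2]
    out[top:top + h, left:left + w] = enhanced_crop
    return out

# Toy example: paste hypothetical eye and mouth enhanced crops back into a face.
face = np.zeros((100, 100), dtype=np.uint8)
eye_enhanced = np.full((10, 28), 255, dtype=np.uint8)
mouth_enhanced = np.full((12, 30), 128, dtype=np.uint8)
updated = paste_back(face, eye_enhanced, top=35, left=26)
updated = paste_back(updated, mouth_enhanced, top=70, left=35)
```

The `top`/`left` offsets here would be the crop coordinates recorded during the key-point-based cropping step, so each enhanced part returns to its original location.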
  • the process of generating an enhanced image in FIG. 6 above can be applied to network training scenarios. For example, if a neural network is to be trained but the training samples are insufficient, enhanced images can be generated as in FIG. 6 of the embodiments of the present disclosure to obtain richer sample images.
  • the image enhancement network provided by the above embodiments of the present disclosure can combine arbitrary structure information to generate an enhanced image. Taking face enhancement as an example, this method can generate richer enhanced face images, including enhanced face images with various angles and expressions.
  • such rich and diverse enhanced images, when applied to training a neural network model, help to improve the generalization and robustness of the trained model. Moreover, because the method generates enhanced images through a trained image enhancement network, and a generative adversarial approach is used in the training process, the generated enhanced images are of higher quality, more realistic and clearer.
  • small amounts of data can be enriched through the image enhancement network of the embodiments of the present disclosure, thereby reducing the difficulty of data acquisition.
  • the process of generating an enhanced image in FIG. 6 can also be applied to other scenarios; for example, it can be applied to face image enhancement applications such as makeup transfer and face driving.
  • the eye makeup of one person, Xiao Zhang, can be transferred to the eyes of another person, Xiao Wang: the appearance information related to Xiao Zhang's eye makeup is extracted through the appearance extractor in the image enhancement network, and then combined with the structure information of Xiao Wang's eyes to generate an enhanced image.
  • in the generated enhanced image, the structure of Xiao Wang's eyes is unchanged, but the eyes now have Xiao Zhang's eye makeup.
  • suppose Xiao Zhang's facial expressions are used to drive Xiao Wang's face to make the same expressions, and the specific movements are mouth movements; the appearance information of Xiao Wang's face image can then be combined with the structure information of Xiao Zhang's mouth to generate an enhanced image, so that the enhanced image is still Xiao Wang's face, but the movements and expressions of the mouth are replaced with Xiao Zhang's.
  • an embodiment of the present disclosure further provides an image enhancement device.
  • the image enhancement device may include: an appearance extraction module 71 , a structure acquisition module 72 and an image generation module 73 .
  • the appearance extraction module 71 is configured to perform feature extraction on the target image to obtain appearance information of the target image, wherein the target image includes a first object; and the appearance information represents surface visual features in the target image.
  • the structure acquisition module 72 is configured to acquire the structure information of the second object, the first object and the second object are target objects of the same type; the structure information represents the outline feature of the second object.
  • An image generating module 73 configured to generate an enhanced image based on the appearance information and the structure information, where the enhanced image includes a target object with the appearance information and the structure information.
  • the appearance extraction module 71, when used to perform feature extraction on the target image to obtain the appearance information of the target image, is configured to: perform feature extraction on the target image through an appearance extractor in the image enhancement network to obtain the appearance information of the target image.
  • the image generation module 73, when used to generate an enhanced image based on the appearance information and the structure information, is configured to: generate the enhanced image based on the appearance information and the structure information through the generator in the image enhancement network.
  • the structure acquisition module 72, when used to acquire the structure information of the second object, is configured to: acquire an initial image, which includes the second object; perform key point detection on the initial image to obtain key points of the second object in the initial image; and obtain the structure information of the second object according to the key points of the second object.
  • the device further includes: a preprocessing module.
  • the preprocessing module is configured to: acquire an initial image, which includes the target object; perform key point detection on the initial image to obtain key points of the target object in the initial image; and crop the initial image according to the key points of the target object to obtain the target image, or an auxiliary image including the target object; wherein the second object is included in the auxiliary image.
  • an embodiment of the present disclosure further provides an image enhancement network training device.
  • the training device of the image enhancement network may include: an information acquisition module 81 , a feature extraction module 82 , an image output module 83 and a parameter adjustment module 84 .
  • An information acquisition module 81 configured to acquire a sample image including a first object and structural information of a second object, wherein the first object and the second object are the same target object with different structural information; the structural information represents the outline feature of the second object.
  • the feature extraction module 82 is configured to perform feature extraction on the sample image through an image enhancement network to obtain appearance information of the sample image, and the appearance information represents surface visual features in the sample image.
  • An image output module 83 configured to perform image generation processing on the appearance information and the structure information through the image enhancement network, and output a sample enhanced image, wherein the sample enhanced image includes the appearance information and the structure information.
  • the parameter adjustment module 84 is configured to adjust network parameters of the image enhancement network according to the sample enhanced image.
  • the parameter adjustment module 84, when used to adjust the network parameters of the image enhancement network according to the sample enhanced image, is configured to: adjust the network parameters of the appearance extractor and the generator according to the difference between the sample enhanced image and the auxiliary image; wherein the second object is included in the auxiliary image, and the appearance extractor and the generator are included in the image enhancement network.
  • the parameter adjustment module 84, when used to adjust the network parameters of the image enhancement network according to the sample enhanced image, is also configured to: input the sample enhanced image into the discriminator to obtain the discriminant value output by the discriminator; obtain the first loss according to the difference between the discriminant value and the discriminant ground truth, and obtain the second loss according to the difference between the sample enhanced image and the auxiliary image; and adjust the network parameters of at least one of the appearance extractor, the generator and the discriminator according to the first loss and the second loss; wherein the second object is included in the auxiliary image.
  • one or more embodiments of the present disclosure may be provided as a method, system or computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the image enhancement method or the training method of the image enhancement network described in any embodiment of the present disclosure is implemented.
  • An embodiment of the present disclosure also provides an electronic device. The electronic device includes a memory and a processor, where the memory is used to store computer-readable instructions, and the processor is used to call the computer instructions to implement the image enhancement method or the training method of the image enhancement network described in any embodiment of the present disclosure.
  • Embodiments of the subject matter and functional operations described in this disclosure can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this disclosure and their structural equivalents, or in a combination of one or more of them.
  • Embodiments of the subject matter described in this disclosure can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus.
  • the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus.
  • a computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing unit.
  • a central processing unit will receive instructions and data from a read only memory and/or a random access memory.
  • the essential components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic, magneto-optical or optical disks, to receive data from them, transfer data to them, or both.
  • a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory can be supplemented by, or incorporated in, special purpose logic circuitry.

Abstract

Provided in embodiments of the present disclosure are an image enhancement method, an apparatus, a storage medium, and an electronic device. The method may comprise: performing feature extraction on a target image to obtain appearance information of the target image, wherein the target image comprises a first object, and the appearance information represents a surface visual feature in the target image; obtaining structural information of a second object, wherein the first object and the second object are target objects of a same type, and the structural information represents a contour feature of the second object; and generating an enhanced image on the basis of the appearance information and the structural information, wherein the enhanced image comprises a target object having the appearance information and the structural information.

Description

Image enhancement method and device, storage medium and electronic equipment
Cross-Reference to Related Applications
This disclosure claims priority to Chinese Patent Application No. 202111669721.8, filed on December 31, 2021, which is incorporated herein by reference.
Technical Field
The present disclosure relates to artificial intelligence technology, and in particular to an image enhancement method and device, a storage medium and electronic equipment.
Background
Image enhancement has a wide range of applications in various scenarios. For example, in the scenario of training a neural network, more and richer sample images can be obtained by performing image enhancement on the sample images. For another example, image enhancement can also be used to implement face image enhancement applications such as makeup transfer and face driving.
The image enhancement methods in the related art either use traditional image processing methods such as stretching and interpolation, in which case the quality of the enhanced images is not high and enhancement can usually only be performed under limited conditions, with few enhancement types; or they use a neural network for image enhancement, in which case training the neural network requires enough sample images. For example, a video of a certain length from a user with a single ID is often required in order to obtain multiple face images of that user from the video. The cost of obtaining training samples in this way is relatively high, and it is also inconvenient for users.
Summary of the Invention
Embodiments of the present disclosure at least provide an image enhancement method and device, a storage medium and an electronic device.
In a first aspect, an image enhancement method is provided, the method comprising:
performing feature extraction on a target image to obtain appearance information of the target image, wherein the target image includes a first object, and the appearance information represents surface visual features in the target image;
acquiring structure information of a second object, wherein the first object and the second object are target objects of the same type, and the structure information represents contour features of the second object;
generating an enhanced image based on the appearance information and the structure information, wherein the enhanced image includes a target object having the appearance information and the structure information.
In some examples, the method is performed by an image enhancement device in which an image enhancement network is deployed, the image enhancement network including an appearance extractor and a generator; the performing feature extraction on the target image to obtain the appearance information of the target image includes: performing feature extraction on the target image through the appearance extractor in the image enhancement network to obtain the appearance information of the target image; and the generating an enhanced image based on the appearance information and the structure information includes: generating the enhanced image based on the appearance information and the structure information through the generator in the image enhancement network.
In some examples, the acquiring structure information of a second object includes: acquiring an initial image, the initial image including the second object; performing key point detection on the initial image to obtain key points of the second object in the initial image; and obtaining the structure information of the second object according to the key points of the second object.
In some examples, the second object is included in an auxiliary image; the method further includes: acquiring an initial image, the initial image including the target object; performing key point detection on the initial image to obtain key points of the target object in the initial image; and cropping the initial image according to the key points of the target object to obtain the target image, or an auxiliary image, including the target object.
In some examples, the method further includes: after the generating an enhanced image based on the appearance information and the structure information, replacing the corresponding image part in the initial image with the enhanced image.
In some examples, the first object and the second object are the same target object, or different target objects of the same type, and the target object is one of the facial features of a human face.
In a second aspect, a training method for an image enhancement network is provided, the method comprising:
acquiring a sample image including a first object and structure information of a second object, wherein the first object and the second object are the same target object with different structure information, and the structure information represents contour features of the second object;
performing feature extraction on the sample image through an image enhancement network to obtain appearance information of the sample image, wherein the appearance information represents surface visual features in the sample image;
performing image generation processing on the appearance information and the structure information through the image enhancement network, and outputting a sample enhanced image, wherein the sample enhanced image includes the target object having the appearance information and the structure information;
adjusting network parameters of the image enhancement network according to the sample enhanced image.
In some examples, the second object is included in an auxiliary image, and the image enhancement network includes an appearance extractor and a generator; the adjusting network parameters of the image enhancement network according to the sample enhanced image includes: adjusting the network parameters of the appearance extractor and the generator according to the difference between the sample enhanced image and the auxiliary image.
In some examples, the second object is included in an auxiliary image; the adjusting network parameters of the image enhancement network according to the sample enhanced image includes: inputting the sample enhanced image into a discriminator to obtain a discriminant value output by the discriminator; obtaining a first loss according to the difference between the discriminant value and a discriminant ground truth, and obtaining a second loss according to the difference between the sample enhanced image and the auxiliary image; and adjusting network parameters of at least one of the appearance extractor, the generator and the discriminator according to the first loss and the second loss.
In a third aspect, an image enhancement device is provided, the device comprising:
an appearance extraction module, configured to perform feature extraction on a target image to obtain appearance information of the target image, wherein the target image includes a first object, and the appearance information represents surface visual features in the target image;
a structure acquisition module, configured to acquire structure information of a second object, wherein the first object and the second object are target objects of the same type, and the structure information represents contour features of the second object;
an image generation module, configured to generate an enhanced image based on the appearance information and the structure information, wherein the enhanced image includes a target object having the appearance information and the structure information.
In a fourth aspect, a training device for an image enhancement network is provided, the device comprising:
an information acquisition module, configured to acquire a sample image including a first object and structure information of a second object, wherein the first object and the second object are the same target object with different structure information, and the structure information represents contour features of the second object;
a feature extraction module, configured to perform feature extraction on the sample image through an image enhancement network to obtain appearance information of the sample image, wherein the appearance information represents surface visual features in the sample image;
an image output module, configured to perform image generation processing on the appearance information and the structure information through the image enhancement network, and output a sample enhanced image, wherein the sample enhanced image includes the target object having the appearance information and the structure information;
a parameter adjustment module, configured to adjust network parameters of the image enhancement network according to the sample enhanced image.
In a fifth aspect, an electronic device is provided, including a memory and a processor, the memory being used to store computer-readable instructions, and the processor being used to call the computer instructions to implement the method of any embodiment of the present disclosure.
In a sixth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the method of any embodiment of the present disclosure is implemented.
The image enhancement method and device, storage medium and electronic equipment provided by the embodiments of the present disclosure can enhance a sample image according to various types of structure information. Since the structure information can be varied without restriction, richer sample enhanced images can be obtained, making the types of samples more abundant. When the generated sample enhanced images are applied to tasks such as model training, rich and diverse samples can improve the robustness and generalization of model training. Moreover, obtaining richer sample types in this way reduces the cost of sample acquisition compared with previous acquisition methods, making sample acquisition easier. In addition, the method generates sample enhanced images through an image enhancement network, which yields higher-quality images than conventional image processing methods such as interpolation and stretching.
Description of the Drawings
To describe the technical solutions in one or more embodiments of the present disclosure or in the related art more clearly, the accompanying drawings required for describing the embodiments or the related art are briefly introduced below. Apparently, the accompanying drawings described below are merely some embodiments recorded in one or more embodiments of the present disclosure, and a person of ordinary skill in the art may further derive other drawings from these drawings without creative efforts.
FIG. 1 is a schematic flowchart of a method for training an image enhancement network according to at least one embodiment of the present disclosure;
FIG. 2 is a schematic framework diagram of image enhancement according to at least one embodiment of the present disclosure;
FIG. 3A is a schematic diagram of structure information of a first object according to at least one embodiment of the present disclosure;
FIG. 3B is a schematic diagram of structure information of a second object according to at least one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of structure information of another eye according to at least one embodiment of the present disclosure;
FIG. 5 is a schematic diagram of network training according to at least one embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of an image enhancement method according to at least one embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an image enhancement apparatus according to at least one embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an apparatus for training an image enhancement network according to at least one embodiment of the present disclosure.
Detailed Description
To enable a person skilled in the art to better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings in one or more embodiments of the present disclosure. Apparently, the described embodiments are merely some rather than all of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on one or more embodiments of the present disclosure without creative efforts shall fall within the protection scope of the present disclosure.
Embodiments of the present disclosure aim to provide an image enhancement method that can generate an enhanced image through a trained neural network. The neural network may be referred to as an image enhancement network, and the enhanced image may be an image obtained by performing enhancement processing on an initial image. The enhancement processing may be, for example, deforming the image. Taking enhancement of a face image as an example, it may include, but is not limited to, changes in face angle, changes in facial expression, changes in face orientation, and changes in the size of facial features. As an example, assume that the initial image is a face image in which the mouth is closed; transforming the mouth in the face image into an open, smiling mouth yields an enhanced image.
In the following embodiments, the training process of the image enhancement network is described first, and then how an enhanced image is generated through the trained image enhancement network is described.
FIG. 1 is a schematic flowchart of a method for training an image enhancement network according to at least one embodiment of the present disclosure. As shown in FIG. 1, the method may include the following steps.
In step 100, a sample image and structure information of a second object are acquired.
The training method of this embodiment may be performed by an apparatus for training an image enhancement network. For example, the training apparatus may be deployed on an electronic device (e.g., a server) and may include the image enhancement network to be trained.
In this step, the apparatus for training the image enhancement network may acquire a sample image to be enhanced; in this embodiment, the image to be enhanced in the training stage is referred to as the sample image. The sample image includes a first object. For example, the sample image may be an image including an eye, and the first object may be the eye in the sample image. As another example, the sample image may be an image including a tree, and the first object may be the tree in the sample image.
The training apparatus may further acquire structure information of a second object, where the second object and the first object are the same target object having different structure information.
The structure information may be understood as representing contour features of the second object, for example, the size and structure of the object. Taking the facial features of a human face as an example, the acquired structure information of the facial features may be contour features of the mouth, contour features of the nose, and the like, or feature information such as the height of the nose. The recording forms of the contour features include, but are not limited to, a contour line, or a plurality of key points distributed along the contour line, for which the position coordinates or key point identifiers of these key points may be recorded.
For example, taking an eye as the target object, the structure information may be the contour features of the eye. The structure information of the first object illustrated in FIG. 3A and the structure information of the second object illustrated in FIG. 3B show eyes with two different kinds of structure information; however, the two objects may be the eyes of the same person, with one squinting and the other wide open, so the structure information of the two eyes is different.
Similarly, the first object and the second object having different structure information may also be exemplified as follows: if the target object is a mouth, the first object may be a closed mouth and the second object may be an open mouth. Even if the two mouths belong to the same person, their structure information differs because their states differ. For example, because of the different states of the mouth, the positions of the contour key points recorded in the contour features of the closed mouth differ from the positions of the contour key points recorded in the contour features of the open mouth.
In step 102, feature extraction is performed on the sample image through the image enhancement network to obtain appearance information of the sample image.
In an example, the appearance information may be obtained through feature extraction by an appearance extractor in the image enhancement network. As shown in FIG. 2, after acquiring the sample image, the training apparatus may input the sample image into the image enhancement network. The image enhancement network may include an appearance extractor, and feature extraction may be performed on the sample image through the appearance extractor 21 to obtain the appearance information of the sample image. This embodiment does not limit the network structure of the appearance extractor; for example, the appearance extractor may be composed of various modules such as convolutional layers, residual modules, activation layers, and pooling layers.
The appearance information represents surface visual features in the target image, including but not limited to texture, color, and illumination information in the target image. Taking a face image as the sample image, the appearance information obtained after feature extraction by the appearance extractor may include the illumination brightness of the face region, the texture of the face, the color of the face, and the like.
For example, the appearance information of the sample image output by the appearance extractor 21 may be represented as a one-dimensional tensor, e.g., a 64*1 tensor.
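The disclosure does not specify the exact layers of the appearance extractor 21, only that it may combine convolutional, residual, activation, and pooling modules and output a 64-dimensional appearance vector. The following is a minimal, illustrative sketch of such an extractor assuming a PyTorch implementation; the channel widths, kernel sizes, and the 128x128 input resolution are assumptions, not values from the disclosure:

```python
import torch
import torch.nn as nn

class AppearanceExtractor(nn.Module):
    """Encodes an image into a compact appearance vector (here 64-dimensional)."""
    def __init__(self, appearance_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # downsample
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # downsample
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),        # global pooling -> (B, 64, 1, 1)
        )
        self.fc = nn.Linear(64, appearance_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)     # (B, 64)
        return self.fc(h)                   # (B, appearance_dim)

extractor = AppearanceExtractor()
sample = torch.randn(1, 3, 128, 128)        # a dummy RGB sample image
appearance = extractor(sample)              # appearance vector of shape (1, 64)
```

The global average pooling step is what collapses spatial layout, so the output tends to summarize surface attributes (color, texture, lighting) rather than geometry, matching the appearance/structure split described above.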
In addition, for the appearance information included in the sample image, the appearance extractor 21 may extract all of the appearance information or only part of it, which may be determined according to actual service requirements. For example, take an eye picture that, in addition to the eye, also includes part of the face region around the eye and the eyebrow region. The appearance extractor 21 may extract the brightness, color, and texture appearance information of all these regions, or may extract only the appearance information of the eyebrow region, or only that of the face region around the eye. Extraction of the appearance information of at least part of the regions in the sample image can be achieved by designing and training the appearance extractor 21.
Optionally, for the above appearance information such as brightness, color, and texture, the appearance extractor 21 may also be designed to extract only some of it, for example, extracting only the texture and color in the sample image without extracting the brightness.
In addition, by the same reasoning, for the structure information of the second object acquired in step 100, at least part of the structure information of the second object may be acquired according to actual service requirements. For example, taking the second object as an eye, acquiring all the structure information of the eye may include the outer-contour key points of the eye, the contour key points of the eyeball, the center point of the eyeball, and so on; acquiring only part of the structure information may include only the outer-contour points of the eye, excluding the contour key points of the eyeball and the center point of the eyeball.
As described above, the structure information acquired in step 100 and the appearance information acquired in step 102 of this embodiment are at least part of the extracted information, and this information will participate in the image generation processing.
In step 104, image generation processing is performed on the appearance information and the structure information through the image enhancement network, and a sample enhanced image is output, where the sample enhanced image includes the appearance information and a target object having the structure information.
In this embodiment, the image enhancement network may generate the sample enhanced image according to the appearance information and the structure information obtained above. For example, as shown in FIG. 2, image generation processing may be performed by a generator 22 to output the sample enhanced image. The sample enhanced image may have both the appearance information and the structure information, where the structure information belongs to the target object in the sample enhanced image, and the target object may be the aforementioned first object or second object.
In an example, referring to FIG. 2, assume that the sample image is an image containing an eye, and the structure information is a structure map of the eye in another state. After processing by the image enhancement network, compared with the sample image, the output sample enhanced image may replace the structure information of the first object in the sample image with the structure information of the second object, while other information in the sample image may remain unchanged; for example, the face texture around the eye, the face color, the eyebrow, the position of the eyeball inside the eye, and the color of the eyeball may remain the same as in the sample image.
In step 106, network parameters of the image enhancement network are adjusted according to the sample enhanced image.
In this embodiment, the image serving as the label of the sample enhanced image may be the auxiliary image in which the second object is located. The auxiliary image may have the same image size as the sample image, and the two may contain the same regions.
For example, the sample image in FIG. 2 includes an eye and an eyebrow, and the auxiliary image corresponding to the sample image may likewise include an eye and an eyebrow, i.e., it includes the same regions as the sample image, and the two images may have the same size. The difference is that the structure information of the eye differs between the sample image and the auxiliary image; for example, the eye in the sample image is wide open, while the eye in the auxiliary image is squinting.
After the sample enhanced image is obtained, the network parameters of the image enhancement network can be adjusted according to the sample enhanced image. For example, an L1-norm loss (L1 loss) between the sample enhanced image and the auxiliary image may be computed according to the difference between the two, and the network parameters of the appearance extractor and the generator may be adjusted according to the L1 loss.
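The parameter update described above can be sketched as follows, assuming a PyTorch implementation. The `extractor` and `generator` here are hypothetical stand-ins for the appearance extractor 21 and the generator 22 (tiny linear modules on flattened 8x8 images, chosen only to show the update mechanics); the optimizer choice is likewise an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the appearance extractor 21 and generator 22.
extractor = nn.Linear(3 * 8 * 8, 64)            # image -> appearance vector
generator = nn.Linear(64 + 8 * 8, 3 * 8 * 8)    # appearance + structure -> image

# One optimizer over both modules, so the L1 loss adjusts both sets of parameters.
optimizer = torch.optim.Adam(
    list(extractor.parameters()) + list(generator.parameters()), lr=1e-3
)

sample_image = torch.rand(1, 3 * 8 * 8)     # flattened sample image
structure = torch.rand(1, 8 * 8)            # flattened structure map of the second object
auxiliary_image = torch.rand(1, 3 * 8 * 8)  # label: auxiliary image containing the second object

appearance = extractor(sample_image)
enhanced = generator(torch.cat([appearance, structure], dim=1))

l1_loss = F.l1_loss(enhanced, auxiliary_image)  # difference to the auxiliary image
optimizer.zero_grad()
l1_loss.backward()                              # gradients flow into both modules
optimizer.step()
```

Because the auxiliary image shares the sample image's appearance but carries the second object's structure, minimizing this L1 loss pushes the generator to reproduce exactly the "same appearance, new structure" combination.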
With the training method of the image enhancement network in this embodiment, a sample image can be enhanced according to various types of structure information. Since the structure information can be varied without restriction, richer sample enhanced images can be obtained, making the types of samples more diverse. When the generated sample enhanced images are applied to tasks such as model training, the rich and diverse samples can improve the robustness and generalization of model training. Moreover, compared with previous sample acquisition approaches, obtaining a richer variety of samples in this manner reduces the cost of sample acquisition and makes sample acquisition easier. In addition, the method generates the sample enhanced image through an image enhancement network, which yields images of higher quality than conventional image processing such as interpolation and stretching.
In addition, the first object included in the sample image may be determined according to the requirements of the actual application. For example, for a sample enhanced image obtained according to an embodiment of the present disclosure, if the actual application requires an image including an eye, the first object in the sample enhanced image is an eye. As another example, if the actual application requires an image including a mouth, the first object in the sample enhanced image is a mouth. Besides, other facial organs, such as the eyebrows or the nose, may also be enhanced. Correspondingly, a sample image containing the organ to be enhanced and the corresponding structure information of that organ may be used to generate the sample enhanced image.
The sample enhanced image illustrated in FIG. 2 is an image including an eye. In actual implementation, however, the initially obtained image may sometimes cover a larger range that includes the entire face, in which case the initial image may be preprocessed before the image enhancement procedure illustrated in FIG. 2 is performed.
Referring to FIG. 4, assume that an image, which may be called an "initial image", is obtained first, and that the initial image includes not only the face but also multiple regions such as the person's hands, neck, and clothes. In this case, face key points in the initial image (for example, 106 key points) may first be detected through a pre-trained key point detection network. The initial image is then cropped according to the detected key points to obtain an image including the face, with regions other than the face, such as the hands and the neck, removed. The size of the cropped face image may be 1024*1024. FIG. 4 illustrates the face image obtained after cropping the initial image, as well as some of the face key points in the face image, for example, key point 41 and key point 42.
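The key-point-based cropping described above can be sketched as a bounding-box crop in pure NumPy. The margin ratio and the example coordinates below are illustrative assumptions; the disclosure only states that the crop is derived from the detected key points:

```python
import numpy as np

def crop_by_keypoints(image: np.ndarray, keypoints: np.ndarray,
                      margin: float = 0.2) -> np.ndarray:
    """Crop the region bounded by the key points, expanded by a relative margin.

    image:     H x W x 3 array.
    keypoints: N x 2 array of (x, y) pixel coordinates.
    """
    x_min, y_min = keypoints.min(axis=0)
    x_max, y_max = keypoints.max(axis=0)
    dx = (x_max - x_min) * margin           # extra context around the region
    dy = (y_max - y_min) * margin
    h, w = image.shape[:2]
    x0 = max(int(x_min - dx), 0)
    y0 = max(int(y_min - dy), 0)
    x1 = min(int(x_max + dx) + 1, w)
    y1 = min(int(y_max + dy) + 1, h)
    return image[y0:y1, x0:x1]

face = np.zeros((1024, 1024, 3), dtype=np.uint8)        # cropped 1024*1024 face image
mouth_keypoints = np.array([[400, 700], [624, 700],     # hypothetical mouth key points
                            [512, 650], [512, 760]])
mouth_crop = crop_by_keypoints(face, mouth_keypoints)
```

The same helper applies equally to cutting the face out of the initial image (using all face key points) or cutting a single organ such as the mouth out of the face image (using only that organ's key points).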
Further, if one of the organ regions in the face image is to be enhanced and deformed through the image enhancement network shown in FIG. 2, for example, the mouth, the face image shown in FIG. 4 may be further cropped according to the above face key points to obtain an image including the mouth. As shown in FIG. 4, the mouth in the mouth image is the mouth in the face image. Moreover, the mouth image may serve as the sample image in the training stage of the image enhancement network, or may serve as the auxiliary image.
Still further, the structure information of the corresponding mouth can also be obtained according to the above key points of the mouth. As shown in FIG. 4, the structure information may be a structure map (heatmap) corresponding to the mouth, which may include the key points of the mouth. The structure map may be input into the image enhancement network to assist in generating the sample enhanced image corresponding to the sample image.
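A structure map of this kind is commonly built by rendering a small Gaussian peak at each key point. The sketch below assumes a single shared channel and an illustrative sigma (one channel per key point is an equally common convention; the disclosure does not specify either):

```python
import numpy as np

def keypoints_to_heatmap(keypoints, size, sigma: float = 2.0) -> np.ndarray:
    """Render key points as Gaussian peaks in a single-channel heatmap.

    keypoints: iterable of (x, y) pixel coordinates.
    size:      (height, width) of the output map.
    """
    h, w = size
    ys, xs = np.mgrid[0:h, 0:w]
    heatmap = np.zeros((h, w), dtype=np.float32)
    for x, y in keypoints:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)    # keep the strongest response per pixel
    return heatmap

# Two hypothetical mouth key points rendered into a 64x64 structure map.
hm = keypoints_to_heatmap([(10, 20), (40, 30)], size=(64, 64))
```

Encoding the key points as a spatial map rather than a coordinate list lets the generator consume the structure information with ordinary convolutions, alongside the image features.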
In addition, taking the enhancement of face images as an example, the following data may be prepared as training data for the image enhancement network:
1) A small number of face images of the same ID: for example, 15 face images of the same person, Xiao Zhang. The same ID refers to the same person; for example, multiple face images of Xiao Wang belong to the same ID, which is Xiao Wang's identifier.
2) A larger number of face images of other IDs, where each ID has a certain number of face images with different expressions and at different angles. For example, the 15,000 other IDs may correspond to face images of other people such as Xiao Wang and Xiao Dong.
As above, the prepared training data may include face images of multiple IDs, and each ID may include face images with multiple expressions and at different angles, where different expressions and angles may correspond to different structure information.
When the image enhancement network is trained using the above training data, the sample image and the auxiliary image may be two face images belonging to the same ID randomly drawn from the training data. For example, two face images of Xiao Zhang may be drawn, both showing Xiao Zhang's face: in one image Xiao Zhang is squinting, and in the other his eyes are wide open. The structure information of the eyes differs between the two images, while the appearance information other than the structure information is the same. The auxiliary image thus serves as the label for this enhancement, and the network parameters of the image enhancement network are subsequently adjusted according to the difference between the auxiliary image and the sample enhanced image output by the image enhancement network.
In an example, each face image in the above training data may be preprocessed as illustrated in FIG. 4. For example, the face key points in each face image are identified, and then a face image, as well as an image including one of the facial organs, is obtained by cropping according to the face key points. As an example, if an organ image including an eye is required, each image in the above training data may be cropped to obtain an eye image. Two eye images belonging to the same person are then used as the auxiliary image and the sample image respectively, and an enhanced eye image is output through the image enhancement network shown in FIG. 2; that is, in the enhanced eye image, the structure information of the eye in the sample image is replaced with the structure information of the eye in the auxiliary image.
FIG. 5 illustrates another network training principle provided by at least one embodiment of the present disclosure. When training the image enhancement network, in addition to adjusting the network parameters according to the difference between the sample enhanced image and the auxiliary image as mentioned above, the training manner shown in FIG. 5 may also be adopted.
As shown in FIG. 5, the sample enhanced image and the corresponding label (for example, the label may be the auxiliary image) may be input into a discriminator 23 to obtain a discriminant value output by the discriminator 23. For example, the discriminant value may be a value between 0 and 1 representing the probability that the sample enhanced image is real. A first loss is obtained according to the difference between the discriminant value and the discriminant ground truth, and a second loss is obtained according to the difference between the sample enhanced image and the auxiliary image. Network parameters of at least one of the appearance extractor, the generator, and the discriminator are further adjusted according to the first loss and the second loss.
In addition, the above generator and discriminator may adopt the network structure of a conventional generative adversarial network (Generative Adversarial Nets, GAN), which is not limited in this embodiment. For example, the network structure may include convolutional layers, residual modules, pooling layers, linear layers, activation layers, and the like.
In this manner of generating the sample enhanced image by training the image enhancement network as a generative adversarial network, training drives the discriminant value output by the discriminator as close to the true value as possible, which improves the realism of the generated enhanced images and helps produce enhanced images of higher quality.
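The combination of the first (adversarial) loss and the second (reconstruction) loss can be sketched as follows, assuming a PyTorch implementation. The binary-cross-entropy form of the adversarial term and the weighting factor `lambda_l1` are standard GAN conventions assumed here, not values stated in the disclosure:

```python
import torch
import torch.nn.functional as F

def generator_objective(disc_value, enhanced, auxiliary, lambda_l1: float = 10.0):
    """First loss: discriminant value vs. the 'real' ground truth (1.0).
    Second loss: pixel difference between the enhanced and auxiliary images."""
    real_label = torch.ones_like(disc_value)
    first_loss = F.binary_cross_entropy(disc_value, real_label)
    second_loss = F.l1_loss(enhanced, auxiliary)
    return first_loss + lambda_l1 * second_loss

disc_value = torch.tensor([[0.3]])       # discriminator 23 output in (0, 1)
enhanced = torch.rand(1, 3, 64, 64)      # sample enhanced image
auxiliary = torch.rand(1, 3, 64, 64)     # auxiliary (label) image
total = generator_objective(disc_value, enhanced, auxiliary)
```

In a full training loop the discriminator would also be updated with its own loss (real label for the auxiliary image, fake label for the enhanced image), alternating with the generator/extractor update shown here.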
After the above image enhancement network is trained, it can be used to generate enhanced images. FIG. 6 is a schematic flowchart of an image enhancement method according to at least one embodiment of the present disclosure. As shown in FIG. 6, the method may be performed by an image enhancement apparatus and may include the following steps.
In step 600, feature extraction is performed on a target image to obtain appearance information of the target image, where the target image includes a first object.
In an example, the target image may be an image including an eye, such as the sample image shown in FIG. 2, which includes an image of a human eye. In this embodiment, the eye in the target image may be called the first object, and the purpose of this embodiment may be to enhance the target image by enhancing and deforming the eye in the target image.
Feature extraction may be performed on the target image through the appearance extractor in the trained image enhancement network to obtain the appearance information of the target image.
In addition, if the initial image is an image including a complete face, the initial image may be preprocessed to obtain the target image including the eye. For example, face key points in the initial image may be detected through a key point detection network, and the initial image may be cropped according to these face key points to obtain the above target image including the eye.
In step 602, structure information of a second object is obtained according to key points of the second object in an auxiliary image, where the first object and the second object are target objects of the same type.
In this step, the second object in the auxiliary image is an object of the same type as the first object; for example, both objects are eyes, or both are mouths. Objects of this same type may be called target objects. The eyes in the auxiliary image and in the target image differ: the eye in the target image is called the first object, and the eye in the auxiliary image is called the second object.
The first object and the second object in this embodiment may belong to the same target object, for example, both are Xiao Wang's eyes, with the two objects in different eye states (e.g., one wide open and one squinting). Alternatively, the first object and the second object may belong to different target objects; for example, the first object is Xiao Wang's eye, and the second object is Xiao Zhang's eye.
In this embodiment, the structure information of the second object may be obtained according to the key points of the second object in the auxiliary image. The image enhancement network may include a network module for extracting key points; after the auxiliary image is input into the image enhancement network, the key points in the auxiliary image may be extracted through this module, and the structure information of the second object may then be obtained according to the key points. Alternatively, the image enhancement network may not include a key point extraction module; instead, the structure information of the second object may be obtained through a processing module outside the image enhancement network and input into the image enhancement network.
在步骤604中,基于所述外观信息和结构信息生成增强图像,所述增强图像将所述目标图像中的第一对象的结构信息替换为所述第二对象的结构信息。In step 604, an enhanced image is generated based on the appearance information and structural information, and the enhanced image replaces the structural information of the first object in the target image with the structural information of the second object.
例如,图像增强网络中的生成器可以根据获取到的外观信息和结构信息进行图像生成处理,最终生成增强图像。该增强图像包括了目标图像的外观信息、以及辅助图像中的第二对象的结构信息,那么该增强图像与目标图像相比,将目标图像中的第一对象的结构信息替换为第二对象的结构信息。For example, the generator in the image enhancement network can perform image generation processing according to the acquired appearance information and structure information, and finally generate an enhanced image. The enhanced image includes the appearance information of the target image and the structure information of the second object in the auxiliary image; compared with the target image, the enhanced image thus replaces the structure information of the first object in the target image with the structure information of the second object.
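The disclosure does not fix a generator architecture. As a minimal toy sketch of the combination step only (the tiling, the 1x1 projection, and the function name are all assumptions made for illustration, not the patented network), the appearance code can be broadcast over the spatial grid, concatenated with the structure map, and projected to an image:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_generate(appearance_code, structure_map, weight=None):
    """Toy stand-in for a generator forward pass: tile the appearance code
    over the spatial grid, concatenate it channel-wise with the structure
    map, and project to 3 channels with a 1x1 linear layer."""
    c = appearance_code.shape[0]
    k, h, w = structure_map.shape
    tiled = np.broadcast_to(appearance_code[:, None, None], (c, h, w))
    feats = np.concatenate([tiled, structure_map], axis=0)  # (c + k, h, w)
    if weight is None:
        weight = rng.standard_normal((3, c + k)) * 0.1      # 1x1 conv weights
    out = np.tensordot(weight, feats, axes=([1], [0]))      # (3, h, w)
    return np.tanh(out)                                     # values in [-1, 1]
```

A real generator would replace the single 1x1 projection with a learned decoder, but the input pattern — appearance features combined with a spatial structure representation — is the same.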
在一个例子中,根据实际的应用需求,如果本实施例的图像增强装置通过图像增强网络输出的包括眼睛的增强图像,能够用于后续的网络训练,则该增强图像可以不再进行后续处理。在另一个例子中,尽管可以通过图2所示的图像增强网络对单个的器官部分进行增强处理,但是希望最终输出的是整张人脸的图像。例如,初始图像可以是小王的一张人脸图像,想要得到一张改变小王的眼睛的结构信息的增强图像。那么,可以获取小张的眼睛的结构信息,并结合该小张眼睛的结构信息、以及由小王的人脸图像裁剪得到的小王眼睛图像,通过图像增强网络进行图像生成处理,得到的增强图像中将小王的眼睛的结构信息替换为了小张眼睛的结构信息。但是此时图像增强网络输出的增强图像是包括小王眼睛的图像,还可以将该增强图像贴回到最初的小王人脸图像中,即将增强图像替换到小王人脸图像中的对应部分,可得到更新后的小王人脸图像,也可以称为小王的增强人脸图像。In one example, according to actual application requirements, if the enhanced image including the eyes output by the image enhancement device of this embodiment through the image enhancement network can be used directly for subsequent network training, no further processing of the enhanced image is needed. In another example, although a single facial part can be enhanced through the image enhancement network shown in FIG. 2, the desired final output is an image of the entire face. For example, the initial image may be a face image of Xiao Wang, and an enhanced image that changes the structure information of Xiao Wang's eyes is desired. In that case, the structure information of Xiao Zhang's eyes can be obtained and combined with the image of Xiao Wang's eyes cropped from Xiao Wang's face image; image generation processing is performed through the image enhancement network, and in the resulting enhanced image the structure information of Xiao Wang's eyes is replaced with that of Xiao Zhang's eyes. At this point, however, the enhanced image output by the image enhancement network is an image that only includes Xiao Wang's eyes. The enhanced image can then be pasted back into the original face image of Xiao Wang, i.e., the enhanced image replaces the corresponding part of Xiao Wang's face image, yielding an updated face image of Xiao Wang, which may also be called Xiao Wang's enhanced face image.
在又一个例子中,如果想要改变了眼睛和嘴巴等多个器官的增强人脸图像,可以如下处理:由初始人脸图像中根据人脸关键点分别裁剪得到眼睛图像(包括眼睛的图像)和嘴巴图像(包括嘴巴的图像),然后,将眼睛图像和嘴巴图像分别通过图像增强网络进行增强处理,得到各自对应的增强图像,例如,眼睛增强图像和嘴巴增强图像。最后,再分别将所述的眼睛增强图像和嘴巴增强图像贴回到初始图像中,替换掉上述的初始人脸图像中的对应部分。In yet another example, if an enhanced face image in which multiple parts such as the eyes and the mouth are changed is desired, the processing can be as follows: an eye image (an image including the eyes) and a mouth image (an image including the mouth) are respectively cropped from the initial face image according to the face key points; then, the eye image and the mouth image are each enhanced through the image enhancement network to obtain their corresponding enhanced images, for example, an enhanced eye image and an enhanced mouth image. Finally, the enhanced eye image and the enhanced mouth image are pasted back into the initial image, replacing the corresponding parts of the initial face image.
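The crop-by-keypoints and paste-back steps above can be sketched as follows. This is a simplified illustration with hard box replacement (a real pipeline would typically add alignment and soft blending at the seams), and the function names are assumptions:

```python
import numpy as np

def crop_by_keypoints(image, keypoints, margin=4):
    """Crop the tight bounding box around a set of facial key points,
    expanded by a small margin. Returns the patch and its box."""
    xs, ys = keypoints[:, 0], keypoints[:, 1]
    h, w = image.shape[:2]
    x0 = max(int(xs.min()) - margin, 0)
    y0 = max(int(ys.min()) - margin, 0)
    x1 = min(int(xs.max()) + margin + 1, w)
    y1 = min(int(ys.max()) + margin + 1, h)
    return image[y0:y1, x0:x1].copy(), (x0, y0, x1, y1)

def paste_back(image, patch, box):
    """Paste an enhanced patch back into its original location."""
    x0, y0, x1, y1 = box
    out = image.copy()
    out[y0:y1, x0:x1] = patch
    return out
```

For several parts (eyes, mouth), each set of key points yields its own box; each patch is enhanced independently and pasted back with `paste_back`.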
上述图6的生成增强图像的流程,可以应用在网络的训练场景,比如要训练一个神经网络,而训练样本不够,则通过本公开实施例的上述图6的方式生成增强图像,以得到更为丰富的样本图像。如上本公开实施例提供的图像增强网络,可以结合任意的结构信息生成增强图像,以人脸的增强为例,通过该方法生成增强图像时,可以生成较为丰富的人脸增强图像,可以包括多种角度、表情的人脸增强图像。这种丰富多样的增强图像,在应用于训练神经网络模型时,有助于提高所训练的神经网络模型的泛化性和鲁棒性,而且该方法通过训练好的图像增强网络来生成增强图像,并且还在训练过程中使用了生成对抗的方式进行训练,使得生成的增强图像的质量较高,更为逼真和清晰。The process of generating an enhanced image in FIG. 6 above can be applied to network training scenarios. For example, if a neural network is to be trained but the training samples are insufficient, enhanced images can be generated in the manner of FIG. 6 according to the embodiments of the present disclosure to obtain richer sample images. The image enhancement network provided by the embodiments of the present disclosure can generate enhanced images in combination with arbitrary structure information. Taking face enhancement as an example, this method can generate a rich variety of enhanced face images, covering multiple angles and expressions. Such rich and diverse enhanced images, when used to train a neural network model, help improve the generalization and robustness of the trained model. Moreover, since this method generates enhanced images through a trained image enhancement network, and generative adversarial training is used during the training process, the generated enhanced images are of higher quality, more realistic, and clearer.
在具有一定的数据获取难度的场景下,例如,只能获取到少量的同一ID的数据,那么可以通过本公开实施例的图像增强网络对这些少量的数据进行丰富,这样在获取数据时就降低了数据获取的难度。In scenarios where data acquisition is difficult, for example, where only a small amount of data of the same ID can be obtained, this small amount of data can be enriched through the image enhancement network of the embodiments of the present disclosure, thereby reducing the difficulty of data acquisition.
此外,图6的生成增强图像的流程,还可以应用于其他场景,例如,可以应用于妆容迁移、人脸驱动等人脸图像增强类应用。In addition, the process of generating an enhanced image in FIG. 6 can also be applied to other scenarios, for example, face image enhancement applications such as makeup transfer and face driving.
例如,想要将初始图像中的人脸的眼睛进行变换,则通过本方法将包含眼睛的图像进行增强,并将增强后的眼睛图像替换掉初始图像中的眼睛。For example, if you want to transform the eyes of the human face in the original image, you can use this method to enhance the image containing the eyes, and replace the eyes in the original image with the enhanced eye image.
又例如,在妆容迁移的场景中,可以将一个人小张的眼睛妆容迁移到另一个人小王的眼睛,那么可以通过图像增强网络中的外观提取器提取小张眼睛的妆容相关的外观信息,然后再结合小王的眼睛的结构信息,生成增强图像,在该增强图像中,小王的眼睛结构没变,但是其已经具备了小张的眼睛妆容。For another example, in a makeup transfer scenario, the eye makeup of one person, Xiao Zhang, can be transferred to the eyes of another person, Xiao Wang. The appearance information related to Xiao Zhang's eye makeup can be extracted through the appearance extractor in the image enhancement network, and then combined with the structure information of Xiao Wang's eyes to generate an enhanced image. In the enhanced image, the structure of Xiao Wang's eyes is unchanged, but the eyes now have Xiao Zhang's eye makeup.
再例如,在人脸驱动的场景中,假设要用小张的人脸表情去驱动小王的人脸做同样的表情动作,并且假设具体是嘴部的动作。那可以结合小王的人脸图片的外观信息、以及小张的嘴部的结构信息,生成增强图像,使得增强图像中还是小王的人脸,只是嘴部的动作表情换成了小张的表情。For yet another example, in a face-driving scenario, suppose Xiao Zhang's facial expression is used to drive Xiao Wang's face to make the same expression, specifically a mouth movement. The appearance information of Xiao Wang's face image can be combined with the structure information of Xiao Zhang's mouth to generate an enhanced image, so that the enhanced image still shows Xiao Wang's face, but with the mouth expression replaced by Xiao Zhang's.
为了实现上述本公开任一实施例的图像增强方法,本公开实施例还提供了图像增强装置。如图7所示,该图像增强装置可以包括:外观提取模块71、结构获取模块72和图像生成模块73。In order to implement the image enhancement method in any embodiment of the present disclosure above, an embodiment of the present disclosure further provides an image enhancement device. As shown in FIG. 7 , the image enhancement device may include: an appearance extraction module 71 , a structure acquisition module 72 and an image generation module 73 .
外观提取模块71,用于对目标图像进行特征提取,得到所述目标图像的外观信息,其中,所述目标图像中包括第一对象;所述外观信息表示所述目标图像中的表面视觉特征。The appearance extraction module 71 is configured to perform feature extraction on the target image to obtain appearance information of the target image, wherein the target image includes a first object; and the appearance information represents surface visual features in the target image.
结构获取模块72,用于获取第二对象的结构信息,所述第一对象和所述第二对象为同种类的目标对象;所述结构信息表示所述第二对象的轮廓特征。The structure acquisition module 72 is configured to acquire the structure information of the second object, the first object and the second object are target objects of the same type; the structure information represents the outline feature of the second object.
图像生成模块73,用于基于所述外观信息和所述结构信息生成增强图像,所述增强图像包括具有所述外观信息以及所述结构信息的目标对象。An image generating module 73, configured to generate an enhanced image based on the appearance information and the structure information, where the enhanced image includes a target object with the appearance information and the structure information.
在一个例子中,所述外观提取模块71,在用于对目标图像进行特征提取,得到所述目标图像的外观信息时,包括:通过图像增强网络中的外观提取器对所述目标图像进行特征提取,得到所述目标图像的外观信息。In one example, when the appearance extraction module 71 is configured to perform feature extraction on the target image to obtain the appearance information of the target image, this includes: performing feature extraction on the target image through an appearance extractor in the image enhancement network to obtain the appearance information of the target image.
所述图像生成模块73,在用于基于所述外观信息和所述结构信息生成增强图像时,包括:通过所述图像增强网络中的所述生成器基于所述外观信息和所述结构信息生成增强图像。When the image generation module 73 is configured to generate an enhanced image based on the appearance information and the structure information, this includes: generating the enhanced image based on the appearance information and the structure information through the generator in the image enhancement network.
在一个例子中,所述结构获取模块72,在用于获取第二对象的结构信息时,包括:获取初始图像,所述初始图像中包括所述第二对象;对所述初始图像进行关键点检测,得到所述初始图像中所述第二对象的关键点;根据所述第二对象的所述关键点,得到所述第二对象的所述结构信息。In one example, when the structure acquisition module 72 is configured to acquire the structure information of the second object, this includes: acquiring an initial image that includes the second object; performing key point detection on the initial image to obtain key points of the second object in the initial image; and obtaining the structure information of the second object according to the key points of the second object.
在一个例子中,所述装置还包括:预处理模块。所述预处理模块,用于获取初始图像,所述初始图像中包括所述目标对象;对所述初始图像进行关键点检测,得到所述初始图像中所述目标对象的关键点;根据所述目标对象的所述关键点对所述初始图像进行裁剪,得到包括所述目标对象的所述目标图像或者辅助图像;其中,所述第二对象包括在所述辅助图像中。In one example, the apparatus further includes a preprocessing module. The preprocessing module is configured to: acquire an initial image that includes the target object; perform key point detection on the initial image to obtain key points of the target object in the initial image; and crop the initial image according to the key points of the target object to obtain the target image or an auxiliary image including the target object, where the second object is included in the auxiliary image.
为了实现上述本公开任一实施例的图像增强网络的训练方法,本公开实施例还提供了图像增强网络的训练装置。如图8所示,该图像增强网络的训练装置可以包括:信息获取模块81、特征提取模块82、图像输出模块83和参数调整模块84。In order to implement the image enhancement network training method of any embodiment of the present disclosure, an embodiment of the present disclosure further provides an image enhancement network training device. As shown in FIG. 8 , the training device of the image enhancement network may include: an information acquisition module 81 , a feature extraction module 82 , an image output module 83 and a parameter adjustment module 84 .
信息获取模块81,用于获取包含第一对象的样本图像以及第二对象的结构信息,其中,所述第一对象和所述第二对象是具有不同结构信息的同一目标对象;所述结构信息表示所述第二对象的轮廓特征。An information acquisition module 81, configured to acquire a sample image including a first object and structural information of a second object, wherein the first object and the second object are the same target object with different structural information; the structural information represents the outline feature of the second object.
特征提取模块82,用于通过图像增强网络对所述样本图像进行特征提取,得到样本图像的外观信息,所述外观信息表示所述样本图像中的表面视觉特征。The feature extraction module 82 is configured to perform feature extraction on the sample image through an image enhancement network to obtain appearance information of the sample image, and the appearance information represents surface visual features in the sample image.
图像输出模块83,用于通过所述图像增强网络对所述外观信息和所述结构信息进行图像生成处理,输出样本增强图像,其中,所述样本增强图像包括具有所述外观信息以及所述结构信息的所述目标对象。The image output module 83 is configured to perform image generation processing on the appearance information and the structure information through the image enhancement network and output a sample enhanced image, where the sample enhanced image includes the target object having the appearance information and the structure information.
参数调整模块84,用于根据所述样本增强图像,调整所述图像增强网络的网络参数。The parameter adjustment module 84 is configured to adjust network parameters of the image enhancement network according to the sample enhanced image.
在一个例子中,所述参数调整模块84,在用于根据所述样本增强图像,调整所述图像增强网络的网络参数时,包括:根据所述样本增强图像和辅助图像之间的差异,调整所述外观提取器和生成器的网络参数;其中,所述第二对象包括在所述辅助图像中,所述外观提取器和所述生成器包括在所述图像增强网络中。In one example, when the parameter adjustment module 84 is configured to adjust the network parameters of the image enhancement network according to the sample enhanced image, this includes: adjusting the network parameters of the appearance extractor and the generator according to the difference between the sample enhanced image and an auxiliary image, where the second object is included in the auxiliary image, and the appearance extractor and the generator are included in the image enhancement network.
在一个例子中,所述参数调整模块84,在用于根据所述样本增强图像,调整所述图像增强网络的网络参数时,包括:将所述样本增强图像输入所述判别器,得到所述判别器输出的判别值;根据所述判别值与判别真值之间的差异得到第一损失,并根据所述样本增强图像和辅助图像之间的差异得到第二损失;根据所述第一损失和所述第二损失,调整所述外观提取器、所述生成器和所述判别器中至少一个的网络参数;其中,所述第二对象包括在所述辅助图像中。In one example, when the parameter adjustment module 84 is configured to adjust the network parameters of the image enhancement network according to the sample enhanced image, this includes: inputting the sample enhanced image into the discriminator to obtain a discriminant value output by the discriminator; obtaining a first loss according to the difference between the discriminant value and a discriminant true value, and obtaining a second loss according to the difference between the sample enhanced image and an auxiliary image; and adjusting network parameters of at least one of the appearance extractor, the generator, and the discriminator according to the first loss and the second loss, where the second object is included in the auxiliary image.
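The two loss terms above can be made concrete as follows. The disclosure only names them as differences; binary cross-entropy for the discriminant-value term and an L1 distance for the image term are common choices in adversarial image generation and are shown here as assumptions, not mandated by the text:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between discriminator outputs and labels."""
    p = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

def training_losses(disc_value, enhanced, auxiliary):
    """First loss: discriminant value vs. the 'real' label (the adversarial
    term for the generator side). Second loss: L1 reconstruction distance
    between the sample enhanced image and the auxiliary image."""
    first_loss = bce(disc_value, np.ones_like(disc_value))
    second_loss = float(np.mean(np.abs(enhanced - auxiliary)))
    return first_loss, second_loss
```

A weighted sum of the two terms would then drive the parameter updates of the appearance extractor and generator, while the discriminator is trained with the opposite labels in the usual adversarial fashion.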
本领域技术人员应明白,本公开一个或多个实施例可提供为方法、系统或计算机程序产品。因此,本公开一个或多个实施例可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本公开一个或多个实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that one or more embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
本公开实施例还提供一种计算机可读存储介质,该存储介质上可以存储有计算机程序,所述程序被处理器执行时实现本公开任一实施例描述的图像增强方法或者图像增强网络的训练方法。An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program may be stored; when the program is executed by a processor, the image enhancement method or the image enhancement network training method described in any embodiment of the present disclosure is implemented.
本公开实施例还提供一种电子设备,该电子设备包括:存储器、处理器,所述存储器用于存储计算机可读指令,所述处理器用于调用所述计算机指令,实现本公开任一实施例所述的图像增强方法或者图像增强网络的训练方法。An embodiment of the present disclosure further provides an electronic device, including a memory and a processor, where the memory is configured to store computer-readable instructions, and the processor is configured to invoke the computer instructions to implement the image enhancement method or the image enhancement network training method described in any embodiment of the present disclosure.
其中,本公开实施例所述的“和/或”表示至少具有两者中的其中一个,例如,“A和/或B”包括三种方案:A、B、以及“A和B”。Wherein, "and/or" mentioned in the embodiments of the present disclosure means at least one of the two, for example, "A and/or B" includes three options: A, B, and "A and B".
本公开中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于数据处理设备实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。Each embodiment in the present disclosure is described in a progressive manner; for the same or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the data processing device embodiment is described relatively simply since it is basically similar to the method embodiment; for relevant parts, reference may be made to the description of the method embodiment.
上述对本公开特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的行为或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。The foregoing describes specific embodiments of the present disclosure. Other implementations are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Multitasking and parallel processing are also possible or may be advantageous in certain embodiments.
本公开中描述的主题及功能操作的实施例可以在以下中实现:数字电子电路、有形体现的计算机软件或固件、包括本公开中公开的结构及其结构性等同物的计算机硬件、或者它们中的一个或多个的组合。本公开中描述的主题的实施例可以实现为一个或多个计算机程序,即编码在有形非暂时性程序载体上以被数据处理装置执行或控制数据处理装置的操作的计算机程序指令中的一个或多个模块。可替代地或附加地,程序指令可以被编码在人工生成的传播信号上,例如机器生成的电、光或电磁信号,该信号被生成以将信息编码并传输到合适的接收机装置以由数据处理装置执行。计算机存储介质可以是机器可读存储设备、机器可读存储基板、随机或串行存取存储器设备、或它们中的一个或多个的组合。Embodiments of the subject matter and functional operations described in this disclosure can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this disclosure and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in this disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information and transmit it to a suitable receiver apparatus for execution by the data processing apparatus. A computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
本公开中描述的处理及逻辑流程可以由执行一个或多个计算机程序的一个或多个可编程计算机执行,以通过根据输入数据进行操作并生成输出来执行相应的功能。所述处理及逻辑流程还可以由专用逻辑电路—例如FPGA(现场可编程门阵列)或ASIC(专用集成电路)来执行,并且装置也可以实现为专用逻辑电路。The processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and an apparatus can also be implemented as, special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit).
适合用于执行计算机程序的计算机包括,例如通用和/或专用微处理器,或任何其他类型的中央处理单元。通常,中央处理单元将从只读存储器和/或随机存取存储器接收指令和数据。计算机的基本组件包括用于实施或执行指令的中央处理单元以及用于存储指令和数据的一个或多个存储器设备。通常,计算机还将包括用于存储数据的一个或多个大容量存储设备,例如磁盘、磁光盘或光盘等,或者计算机将可操作地与此大容量存储设备耦接以从其接收数据或向其传送数据,抑或两种情况兼而有之。然而,计算机不是必须具有这样的设备。此外,计算机可以嵌入在另一设备中,例如移动电话、个人数字助理(PDA)、移动音频或视频播放器、游戏操纵台、全球定位系统(GPS)接收机、或例如通用串行总线(USB)闪存驱动器的便携式存储设备,仅举几例。Computers suitable for the execution of a computer program include, by way of example, general and/or special-purpose microprocessors, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The essential elements of a computer include a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name just a few.
适合于存储计算机程序指令和数据的计算机可读介质包括所有形式的非易失性存储器、媒介和存储器设备,例如包括半导体存储器设备(例如EPROM、EEPROM和闪存设备)、磁盘(例如内部硬盘或可移动盘)、磁光盘以及CD ROM和DVD-ROM盘。处理器和存储器可由专用逻辑电路补充或并入专用逻辑电路中。Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and memory can be supplemented by, or incorporated in, special purpose logic circuitry.
虽然本公开包含许多具体实施细节,但是这些不应被解释为限制任何公开的范围或所要求保护的范围,而是主要用于描述特定公开的具体实施例的特征。本公开内在多个实施例中描述的某些特征也可以在单个实施例中被组合实施。另一方面,在单个实施例中描述的各种特征也可以在多个实施例中分开实施或以任何合适的子组合来实施。此外,虽然特征可以如上所述在某些组合中起作用并且甚至最初如此要求保护,但是来自所要求保护的组合中的一个或多个特征在一些情况下可以从该组合中去除,并且所要求保护的组合可以指向子组合或子组合的变型。While this disclosure contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as primarily describing features of particular embodiments of particular disclosures. Certain features that are described in this disclosure in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be removed from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
类似地,虽然在附图中以特定顺序描绘了操作,但是这不应被理解为要求这些操作以所示的特定顺序执行或顺次执行、或者要求所有例示的操作被执行,以实现期望的结果。在某些情况下,多任务和并行处理可能是有利的。此外,上述实施例中的各种系统模块和组件的分离不应被理解为在所有实施例中均需要这样的分离,并且应当理解,所描述的程序组件和系统通常可以一起集成在单个软件产品中,或者封装成多个软件产品。Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
由此,主题的特定实施例已被描述。其他实施例在所附权利要求书的范围以内。在某些情况下,权利要求书中记载的动作可以以不同的顺序执行并且仍实现期望的结果。此外,附图中描绘的处理并非必需所示的特定顺序或顺次顺序,以实现期望的结果。在某些实现中,多任务和并行处理可能是有利的。Thus, certain embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
以上所述仅为本公开一个或多个实施例的较佳实施例而已,并不用以限制本公开一个或多个实施例,凡在本公开一个或多个实施例的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本公开一个或多个实施例保护的范围之内。The above descriptions are merely preferred embodiments of one or more embodiments of the present disclosure and are not intended to limit one or more embodiments of the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of the present disclosure shall fall within the protection scope of one or more embodiments of the present disclosure.

Claims (13)

  1. 一种图像增强方法,包括:A method of image enhancement, comprising:
    对目标图像进行特征提取,得到所述目标图像的外观信息,其中,所述目标图像中包括第一对象;所述外观信息表示所述目标图像中的表面视觉特征;performing feature extraction on the target image to obtain appearance information of the target image, wherein the target image includes a first object; the appearance information represents surface visual features in the target image;
    获取第二对象的结构信息,其中,所述第一对象和所述第二对象为同种类的目标对象;所述结构信息表示所述第二对象的轮廓特征;Acquiring structural information of a second object, wherein the first object and the second object are target objects of the same type; the structural information represents an outline feature of the second object;
    基于所述外观信息和所述结构信息生成增强图像,其中,所述增强图像包括具有所述外观信息以及所述结构信息的目标对象。An enhanced image is generated based on the appearance information and the structure information, wherein the enhanced image includes a target object having the appearance information and the structure information.
  2. 根据权利要求1所述的方法,其特征在于,所述方法由图像增强装置执行,所述图像增强装置中部署有图像增强网络,所述图像增强网络包括:外观提取器和生成器;The method according to claim 1, wherein the method is executed by an image enhancement device, and an image enhancement network is deployed in the image enhancement device, and the image enhancement network includes: an appearance extractor and a generator;
    所述对目标图像进行特征提取,得到所述目标图像的外观信息,包括:通过所述图像增强网络中的外观提取器对所述目标图像进行特征提取,得到所述目标图像的外观信息;The performing feature extraction on the target image to obtain the appearance information of the target image includes: performing feature extraction on the target image by an appearance extractor in the image enhancement network to obtain the appearance information of the target image;
    所述基于所述外观信息和所述结构信息生成增强图像,包括:通过所述图像增强网络中的所述生成器基于所述外观信息和所述结构信息生成增强图像。The generating an enhanced image based on the appearance information and the structure information includes: generating an enhanced image based on the appearance information and the structure information by the generator in the image enhancement network.
  3. 根据权利要求1所述的方法,其特征在于,所述获取第二对象的结构信息,包括:The method according to claim 1, wherein said obtaining the structural information of the second object comprises:
    获取初始图像,所述初始图像中包括所述第二对象;acquiring an initial image, the initial image including the second object;
    对所述初始图像进行关键点检测,得到所述初始图像中所述第二对象的关键点;performing key point detection on the initial image to obtain key points of the second object in the initial image;
    根据所述第二对象的所述关键点,得到所述第二对象的所述结构信息。Obtain the structural information of the second object according to the key points of the second object.
  4. 根据权利要求1所述的方法,其特征在于,所述第二对象包括在辅助图像中;所述方法还包括:The method according to claim 1, wherein the second object is included in the auxiliary image; the method further comprises:
    获取初始图像,所述初始图像中包括所述目标对象;acquiring an initial image, the initial image including the target object;
    对所述初始图像进行关键点检测,得到所述初始图像中所述目标对象的关键点;performing key point detection on the initial image to obtain key points of the target object in the initial image;
    根据所述目标对象的所述关键点对所述初始图像进行裁剪,得到包括所述目标对象的所述目标图像或者所述辅助图像。The initial image is cropped according to the key point of the target object to obtain the target image or the auxiliary image including the target object.
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:在所述基于所述外观信息和所述结构信息生成增强图像之后,所述增强图像替换所述初始图像中的对应图像部分。The method according to claim 4, further comprising: after the generating an enhanced image based on the appearance information and the structure information, replacing, by the enhanced image, the corresponding image portion in the initial image.
  6. 根据权利要求1所述的方法,其特征在于,所述第一对象和所述第二对象是同一个目标对象,或者是同种类的不同目标对象,所述目标对象是人脸中的五官之一。The method according to claim 1, wherein the first object and the second object are the same target object or different target objects of the same type, and the target object is one of the five facial features of a human face.
  7. 一种图像增强网络的训练方法,包括:A training method for an image enhancement network, comprising:
    获取包含第一对象的样本图像以及第二对象的结构信息,其中,所述第一对象和所述第二对象是具有不同结构信息的同一目标对象;所述结构信息表示所述第二对象的轮廓特征;acquiring a sample image including a first object and structure information of a second object, wherein the first object and the second object are the same target object with different structure information; the structure information represents contour features of the second object;
    通过图像增强网络对所述样本图像进行特征提取,得到所述样本图像的外观信息,其中,所述外观信息表示所述样本图像中的表面视觉特征;performing feature extraction on the sample image through an image enhancement network to obtain appearance information of the sample image, wherein the appearance information represents surface visual features in the sample image;
    通过所述图像增强网络对所述外观信息和所述结构信息进行图像生成处理,输出样本增强图像,其中,所述样本增强图像包括具有所述外观信息以及所述结构信息的所述目标对象;performing image generation processing on the appearance information and the structure information through the image enhancement network, and outputting a sample enhanced image, wherein the sample enhanced image includes the target object having the appearance information and the structure information;
    根据所述样本增强图像,调整所述图像增强网络的网络参数。and adjusting network parameters of the image enhancement network according to the sample enhanced image.
  8. 根据权利要求7所述的训练方法,其特征在于,所述第二对象包括在辅助图像中,所述图像增强网络包括:外观提取器和生成器;所述根据所述样本增强图像,调整所述图像增强网络的网络参数,包括:The training method according to claim 7, wherein the second object is included in an auxiliary image, and the image enhancement network comprises an appearance extractor and a generator; the adjusting network parameters of the image enhancement network according to the sample enhanced image comprises:
    根据所述样本增强图像和所述辅助图像之间的差异,调整所述外观提取器和所述生成器的网络参数。Adjusting network parameters of the appearance extractor and the generator based on the difference between the sample enhanced image and the auxiliary image.
  9. 根据权利要求7所述的训练方法,其特征在于,所述第二对象包括在辅助图像中;所述图像增强网络包括:外观提取器和生成器;The training method according to claim 7, wherein the second object is included in the auxiliary image; the image enhancement network comprises: an appearance extractor and a generator;
    所述根据所述样本增强图像,调整所述图像增强网络的网络参数,包括:the adjusting network parameters of the image enhancement network according to the sample enhanced image comprises:
    将所述样本增强图像输入判别器,得到所述判别器输出的判别值;inputting the sample enhanced image into a discriminator to obtain a discriminant value output by the discriminator;
    根据所述判别值与判别真值之间的差异得到第一损失,并根据所述样本增强图像和所述辅助图像之间的差异得到第二损失;Obtaining a first loss based on the difference between the discriminant value and the discriminant true value, and obtaining a second loss based on the difference between the sample enhanced image and the auxiliary image;
    根据所述第一损失和所述第二损失,调整所述外观提取器、所述生成器和所述判别器中至少一个的网络参数。A network parameter of at least one of the appearance extractor, the generator, and the discriminator is adjusted based on the first loss and the second loss.
  10. An image enhancement apparatus, comprising:
    an appearance extraction module, configured to perform feature extraction on a target image to obtain appearance information of the target image, wherein the target image includes a first object, and the appearance information represents surface visual features in the target image;
    a structure acquisition module, configured to acquire structure information of a second object, wherein the first object and the second object are target objects of the same type, and the structure information represents contour features of the second object; and
    an image generation module, configured to generate an enhanced image based on the appearance information and the structure information, wherein the enhanced image includes a target object having the appearance information and the structure information.
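The three modules of claim 10 form a pipeline: extract appearance from the target image, acquire structure from the second object, then generate an image combining both. The sketch below is a deliberately simplified toy, not the claimed neural network: appearance is reduced to a mean pixel intensity, structure to a binary mask, and generation to painting the appearance where the mask is set. All three reductions are hypothetical simplifications for illustration only.

```python
def extract_appearance(target_image):
    # Toy stand-in for the appearance extraction module: summarize the
    # surface visual features of the target image as its mean intensity.
    flat = [p for row in target_image for p in row]
    return sum(flat) / len(flat)

def acquire_structure(second_object_mask):
    # Toy stand-in for the structure acquisition module: the contour
    # features of the second object are taken directly as a binary mask.
    return second_object_mask

def generate_enhanced(appearance, structure):
    # Toy stand-in for the image generation module: produce an image that
    # has the extracted appearance wherever the structure mask is set.
    return [[appearance if cell else 0 for cell in row] for row in structure]

# Pipeline: target image supplies appearance, second object supplies structure.
target_image = [[2, 4], [6, 8]]
appearance = extract_appearance(target_image)
structure = acquire_structure([[1, 0], [0, 1]])
enhanced = generate_enhanced(appearance, structure)
```

Here `enhanced` is `[[5.0, 0], [0, 5.0]]`: the first object's appearance (mean intensity 5.0) rendered in the second object's shape, which mirrors the claimed decoupling of appearance and structure.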
  11. A training apparatus for an image enhancement network, comprising:
    an information acquisition module, configured to acquire a sample image including a first object and structure information of a second object, wherein the first object and the second object are the same target object with different structure information, and the structure information represents contour features of the second object;
    a feature extraction module, configured to perform feature extraction on the sample image through an image enhancement network to obtain appearance information of the sample image, wherein the appearance information represents surface visual features in the sample image;
    an image output module, configured to perform image generation processing on the appearance information and the structure information through the image enhancement network, and output a sample enhanced image, wherein the sample enhanced image includes the target object having the appearance information and the structure information; and
    a parameter adjustment module, configured to adjust the network parameters of the image enhancement network according to the sample enhanced image.
  12. An electronic device, comprising a memory and a processor, wherein the memory is configured to store computer-readable instructions, and the processor is configured to call the computer instructions to implement the method according to any one of claims 1 to 6, or the method according to any one of claims 7 to 9.
  13. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the method according to any one of claims 1 to 6 or the method according to any one of claims 7 to 9 is implemented.
PCT/CN2022/134845 2021-12-31 2022-11-29 Image enhancement method, apparatus, storage medium, and electronic device WO2023124697A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111669721.8 2021-12-31
CN202111669721.8A CN114331906A (en) 2021-12-31 2021-12-31 Image enhancement method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023124697A1 true WO2023124697A1 (en) 2023-07-06

Family

ID=81019990

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/134845 WO2023124697A1 (en) 2021-12-31 2022-11-29 Image enhancement method, apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN114331906A (en)
WO (1) WO2023124697A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114331906A (en) * 2021-12-31 2022-04-12 北京大甜绵白糖科技有限公司 Image enhancement method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881926A (en) * 2020-08-24 2020-11-03 Oppo广东移动通信有限公司 Image generation method, image generation model training method, image generation device, image generation equipment and image generation medium
CN113327212A (en) * 2021-08-03 2021-08-31 北京奇艺世纪科技有限公司 Face driving method, face driving model training device, electronic equipment and storage medium
CN113838076A (en) * 2020-06-24 2021-12-24 深圳市中兴微电子技术有限公司 Method and device for labeling object contour in target image and storage medium
CN114331906A (en) * 2021-12-31 2022-04-12 北京大甜绵白糖科技有限公司 Image enhancement method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN114331906A (en) 2022-04-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913955

Country of ref document: EP

Kind code of ref document: A1