CN115719503A - Image processing method and device, electronic device and storage medium - Google Patents


Info

Publication number
CN115719503A
Authority
CN
China
Prior art keywords
face
face image
image
target
key point
Prior art date
Legal status
Pending
Application number
CN202110976011.3A
Other languages
Chinese (zh)
Inventor
王欣睿
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110976011.3A
Publication of CN115719503A

Landscapes

  • Image Processing (AREA)

Abstract

The embodiment of the application discloses an image processing method and device, an electronic device and a storage medium. The method comprises the following steps: aligning the face area of a first face image with the face area of a second face image to obtain a target face image of the first face image after alignment; extracting features of the target face image and the second face image at multiple scales respectively to obtain a corresponding first multi-scale feature map and a corresponding second multi-scale feature map; acquiring a first mask of the target face image and a second mask of the second face image; and performing face fusion processing on the target face image and the second face image according to a first residual link established between the first mask and the first multi-scale feature map and a second residual link established between the second mask and the second multi-scale feature map, to obtain a fused face image. The technology of the embodiments of the application can adaptively fuse any two face pictures, for example any two face pictures in a multimedia video.

Description

Image processing method and device, electronic device and storage medium
Technical Field
The present application relates to the field of image processing and computer vision technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Face fusion/face swapping is one of the important subjects in the fields of image processing and computer vision, and existing face fusion techniques can mainly be divided into two types. The first type implements face fusion through face alignment followed by Poisson fusion; the other type implements face fusion through a neural network, represented by deepfake (an AI face-swapping tool).
In the first type of method, the face areas of an original image and a target image are located through face detection or face key points, and after steps such as stretching and alignment, the face is attached using Poisson fusion or a similar blending method. This method is convenient to use: face swapping and fusion are possible for any original image and target image as long as the face area can be detected. However, it also has obvious drawbacks: Poisson fusion is slow, its effect cannot be guaranteed, and obvious boundaries and artifacts can appear on images with large color differences.
In recent years, convolutional neural networks (CNN) have gradually become the mainstream method in the fields of image processing and computer vision. Therefore, in the other type of method, an end-to-end neural network is trained on a large number of photos of two persons, through deepfake and similar schemes, to complete the face replacement. This is more natural than the first method at the fusion boundary and can automatically correct angle and orientation, but it requires collecting a large number of pictures of the two faces and training a separate model for each pair of faces.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present application provide an image processing method and apparatus, an electronic device, and a computer-readable storage medium, which can quickly implement face fusion and replacement of a face image.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided an image processing method including: aligning a face area of a first face image with a face area of a second face image to obtain a target face image of the first face image after alignment; respectively extracting the features of the target face image and the second face image under multiple scales to obtain a corresponding first multi-scale feature map and a corresponding second multi-scale feature map; acquiring a first mask of the target face image and a second mask of the second face image; and performing face fusion processing on the target face image and the second face image according to a first residual error link established between the first mask and the first multi-scale feature map and a second residual error link established between the second mask and the second multi-scale feature map to obtain a fused face image.
According to an aspect of an embodiment of the present application, there is provided an image processing apparatus including: the face alignment module is configured to align a face region of a first face image with a face region of a second face image to obtain a target face image of the first face image after alignment; the multi-scale feature map acquisition module is configured to respectively extract features of the target face image and the second face image under multiple scales to obtain a corresponding first multi-scale feature map and a corresponding second multi-scale feature map; the mask acquisition module is configured to acquire a first mask of the target face image and a second mask of the second face image; and the face fusion module is configured to perform face fusion processing on the target face image and the second face image according to a first residual link established between the first mask and the first multi-scale feature map and a second residual link established between the second mask and the second multi-scale feature map so as to obtain a fused face image.

According to an aspect of the embodiments of the present application, there is provided an electronic device including a processor and a memory, where the memory stores computer-readable instructions, and the computer-readable instructions, when executed by the processor, implement the image processing method as described above.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to execute the image processing method as described above.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method provided in the various alternative embodiments described above.
In the technical scheme provided by the embodiments of the application, two aligned face images are directly taken as input, residual links are established between the multi-scale feature maps and the masks of the aligned face images, face fusion processing is performed on the multi-scale feature maps and masks after the residual links are established, and the fused face image is obtained. This face fusion method is fast, does not need massive training data for each pair of faces, and can adaptively fuse any two face pictures.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a schematic illustration of an implementation environment to which the present application relates;
FIG. 2 is a flow chart illustrating a method of image processing according to an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram illustrating face fusion using a multi-scale encoder-decoder with residual linking according to an exemplary embodiment of the present application;
FIG. 4 is a flow chart of a method of image processing shown in another exemplary embodiment of the present application;
FIG. 5 is a flow chart of step S210 in the embodiment shown in FIG. 2 of the present application in an exemplary embodiment;
FIG. 6 is a flow chart of step S510 in the embodiment shown in FIG. 5 of the present application in an exemplary embodiment;
FIG. 7 is a schematic diagram illustrating the structure of an object detection algorithm in an exemplary embodiment of the present application;
fig. 8 is a schematic structural diagram of a face keypoint detection network according to an exemplary embodiment of the present application;
FIG. 9 is a flowchart of step S630 in the embodiment of FIG. 6 of the present application in an exemplary embodiment;
FIG. 10 is a flowchart of step S530 in the embodiment shown in FIG. 5 of the present application in an exemplary embodiment;
fig. 11 is a schematic structural diagram of an image processing apparatus according to an exemplary embodiment of the present application;
FIG. 12 is a block diagram illustrating a computer system suitable for use to implement an electronic device in accordance with an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should also be noted that reference to "a plurality" in this application means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Artificial Intelligence (AI) is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing and machine learning/deep learning.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
The image processing method and apparatus, the electronic device, and the storage medium according to the embodiments of the present application relate to artificial intelligence technology and machine learning technology, and the embodiments will be described in detail below.
Referring first to fig. 1, fig. 1 is a schematic diagram of an implementation environment related to the present application. The implementation environment includes a user terminal 100 and a server 200, and the user terminal 100 and the server 200 communicate with each other through a wired or wireless network.
The user terminal 100 is used for collecting a face image to be subjected to face fusion processing, inputting the collected face image to the server 200, processing the face image by the server 200, performing face fusion on the received face image, sending an effect image obtained after face fusion to the user terminal 100, and visually displaying the effect image obtained after face fusion through a display module of the user terminal 100.
Illustratively, after receiving a set of first face image and second face image to be subjected to face fusion processing, the user terminal 100 sends the first face image and the second face image to the server 200, the server 200 aligns a face region of the first face image with a face region of the second face image to obtain a target face image of the first face image after alignment processing, then extracts features of the target face image and the second face image under multiple scales respectively to obtain a corresponding first multi-scale feature map and a second multi-scale feature map, and then obtains a first mask of the target face image and a second mask of the second face image, and performs face fusion processing on the target face image and the second face image according to a first residual link established between the first mask and the first multi-scale feature map and a second residual link established between the second mask and the second multi-scale feature map to obtain a fused face image.
The user terminal 100 includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart household appliance, a vehicle-mounted terminal and the like; any electronic device capable of displaying images may be used, for example a smartphone, a tablet, a notebook or a desktop computer, which is not limited here. The server 200 may be an independent physical server, or a server cluster or distributed system formed by a plurality of physical servers, where the plurality of servers may form a blockchain and the server is a node on the blockchain; the server 200 may also be a cloud server that provides basic cloud computing services such as cloud service, cloud database, cloud computing, cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN (Content Delivery Network), big data and artificial intelligence platforms, which is not limited here.
FIG. 2 is a flow chart illustrating an image processing method according to an exemplary embodiment. As shown in fig. 2, in an exemplary embodiment, the method may include steps S210 to S270, which are described in detail as follows:
step S210: and aligning the face area of the first face image with the face area of the second face image to obtain a target face image of the first face image after alignment.
In this embodiment, a group of face pictures to be face-fused is first obtained, namely the first face image and the second face image, and the face orientations in the two face images are then aligned. For example, if the face orientation in the first face image needs to be aligned with the face orientation in the second face image, the face in the first face image is aligned with the face in the second face image by a face alignment method, and the target face image of the first face image after alignment processing is obtained.
Of course, in other embodiments, the face in the second face image may be aligned with the face in the first face image by a face alignment method, so as to obtain a target face image of the second face image after alignment processing.
Step S230: and respectively extracting the features of the target face image and the second face image under multiple scales to obtain a corresponding first multi-scale feature map and a corresponding second multi-scale feature map.
In this embodiment, the face image is processed by the encoder, and feature maps of different scales of the face image are extracted to obtain a multi-scale feature map.
In a specific embodiment, a target face image and a second face image are processed through two encoders with the same structure, and a first multi-scale feature map of the target face image and a second multi-scale feature map of the second face image are respectively extracted.
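As an illustration only, and not the specific network disclosed by the application, a minimal PyTorch-style sketch of such a shared-structure encoder returning feature maps at several scales could look as follows; the channel widths, number of stages and input resolution are assumptions:

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Toy encoder: each stage halves the spatial size, and the list of stage
    outputs serves as the multi-scale feature map. Channel widths and the
    number of stages are illustrative assumptions."""
    def __init__(self, in_ch=3, widths=(32, 64, 128, 256)):
        super().__init__()
        stages, prev = [], in_ch
        for w in widths:
            stages.append(nn.Sequential(
                nn.Conv2d(prev, w, 3, stride=2, padding=1),
                nn.BatchNorm2d(w),
                nn.ReLU(inplace=True)))
            prev = w
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)          # one feature map per scale
        return feats

# Two encoders with the same structure, one per input image
encoder_a = MultiScaleEncoder()      # for the target face image
encoder_b = MultiScaleEncoder()      # for the second face image
target_feats = encoder_a(torch.randn(1, 3, 256, 256))   # first multi-scale feature map
second_feats = encoder_b(torch.randn(1, 3, 256, 256))   # second multi-scale feature map
```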
Illustratively, before the step, a mask of the target face image can be obtained, and then the face region of the target face image is subjected to color correction according to the mask of the target face image, so as to provide a face image convenient to process for a subsequent face fusion process.
Step S250: and acquiring a first mask of the target face image and a second mask of the second face image.
In this embodiment, masks of different sizes of the target face image are obtained to form the first mask of the target face image, and masks of different sizes of the second face image are obtained to form the second mask. The first mask and the second mask of different sizes can be obtained by scaling the masks.
Step S270: and performing face fusion processing on the target face image and the second face image according to a first residual error link established by the first mask and the first multi-scale feature map and a second residual error link established by the second mask and the second multi-scale feature map to obtain a fused face image.
In this embodiment, the first mask and the first multi-scale feature map, and the second mask and the second multi-scale feature map, are respectively input into the decoder to perform face fusion of the target face image and the second face image. Because residual links are established between the decoder and the encoder, the link corresponding to the first mask and the first multi-scale feature map is referred to as the first residual link, and the link corresponding to the second mask and the second multi-scale feature map is referred to as the second residual link. It should be noted that in this embodiment the multi-scale feature maps are obtained by the encoder, and the multi-scale feature maps and the masks are jointly input to the decoder that is residually linked to the encoder for face fusion; the residual links between the encoder and the decoder help the face fusion process generate results with richer details and improve the face fusion effect.
In a specific embodiment, after the first multi-scale feature map and the second multi-scale feature map are obtained, each first multi-scale feature map of a given size is multiplied by the first mask scaled to the corresponding size and used as input of the decoder; at the same time, each second multi-scale feature map of a given size is multiplied by the second mask scaled to the corresponding size and used as input of the decoder. Here the decoder input is obtained by multiplying the multi-scale feature map and the mask of the same size, but in other embodiments other ways, such as adding the multi-scale feature map and the mask of the same size, or establishing a linear relationship between them, may also be used to form the decoder input, which is not specifically limited here.
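A minimal sketch of this step, assuming PyTorch tensors and a single-channel mask: each feature map is multiplied elementwise by the mask resized to its spatial size, and the per-scale results form the decoder inputs (the combination of the two linked branches, marked "+" in fig. 3, is indicated in the commented lines):

```python
import torch.nn.functional as F

def residual_link_inputs(feats, mask):
    """For each scale, resize the mask to the feature map's spatial size and
    multiply elementwise; the resulting tensors are fed to the decoder.
    (Addition or a learned linear combination are possible alternatives.)"""
    linked = []
    for f in feats:                                    # f: (N, C, H, W)
        m = F.interpolate(mask, size=f.shape[-2:], mode='nearest')  # (N, 1, H, W)
        linked.append(f * m)                           # broadcast over channels
    return linked

# first_inputs  = residual_link_inputs(target_feats, first_mask)
# second_inputs = residual_link_inputs(second_feats, second_mask)
# decoder_inputs = [a + b for a, b in zip(first_inputs, second_inputs)]  # the "+" in fig. 3
```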
In this embodiment, the face fusion of the target face image and the second face image is completed by a multi-scale encoder-decoder with residual links. Before the multi-scale encoder-decoder with residual links is applied to the face fusion of two aligned face images, it may be trained. For example, a large number of aligned face images and corresponding masks are obtained using the method above, face data are synthesized using Poisson fusion or another image fusion method, and after manual screening, data with a poor synthesis effect are removed and data with a good synthesis effect are kept. Finally, the multi-scale encoder-decoder with residual links is trained using a training framework based on a generative adversarial network (GAN), finally obtaining a multi-scale encoder-decoder with residual links that can be applied to face fusion of two aligned face images.
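Poisson fusion of the kind used for this data synthesis is available in OpenCV as seamlessClone; a minimal sketch, in which the file names and the choice of the mask centroid as the blending center are assumptions:

```python
import cv2
import numpy as np

# Assumed inputs: an aligned face, a second face image, and a binary face mask
src = cv2.imread('aligned_face.png')                       # aligned target face image
dst = cv2.imread('second_face.png')                        # second face image
mask = cv2.imread('face_mask.png', cv2.IMREAD_GRAYSCALE)   # 0/255 face-region mask

# Blend the masked face region of src into dst around the mask's centroid
ys, xs = np.nonzero(mask)
center = (int(xs.mean()), int(ys.mean()))
blended = cv2.seamlessClone(src, dst, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite('synthesized_training_sample.png', blended)
```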
When the multi-scale encoder-decoder with residual links is trained using the training framework based on a generative adversarial network, its loss function is a weighted sum of the L1 loss, the adversarial loss and the multi-scale perceptual loss; the specific loss terms are as follows:
L1_Loss = Σ |y - G(x)|

L_adv = E_(x,y)[log D(x, y)] + E_y[log(1 - D(x, G(x)))]

L_perceptual = Σ |VGG(y) - VGG(G(x))|

where L1_Loss is the pixel-by-pixel L1 loss between the network output and the ground-truth label, y is the label, G is the generator and x is the network input; L_adv denotes the adversarial loss, D is the discriminator, E is the mathematical expectation, and x and y obey a certain distribution; VGG is a pre-trained VGG (convolutional neural network) feature extraction network, and L_perceptual denotes the perceptual loss between the network output and the label, computed as the sum of the L1 losses between the features extracted at each layer.
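A hedged PyTorch sketch of this weighted loss; the loss weights, the choice of VGG16 layers and the discriminator interface are assumptions not specified by the application:

```python
import torch
import torchvision

vgg = torchvision.models.vgg16(pretrained=True).features.eval()
vgg_layers = [3, 8, 15, 22]   # assumed layers used for the multi-scale perceptual loss

def vgg_features(x):
    feats, h = [], x
    for i, layer in enumerate(vgg):
        h = layer(h)
        if i in vgg_layers:
            feats.append(h)
    return feats

def generator_loss(G_x, y, d_fake, lambda_adv=0.1, lambda_perc=1.0):
    """Weighted sum of L1, adversarial and perceptual losses (weights are assumptions).
    d_fake is the discriminator's output for (x, G(x)); VGG input normalization is
    omitted for brevity."""
    l1 = torch.mean(torch.abs(y - G_x))                  # pixel-wise L1 loss
    adv = torch.mean(-torch.log(d_fake + 1e-8))          # generator side of the adversarial loss
    perc = sum(torch.mean(torch.abs(fy - fg))
               for fy, fg in zip(vgg_features(y), vgg_features(G_x)))
    return l1 + lambda_adv * adv + lambda_perc * perc
```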
Referring to fig. 3, which is a schematic diagram of face fusion performed by the multi-scale encoder-decoder with residual links in a specific embodiment: the encoder is used to obtain the multi-scale feature maps of the two face images, including the first multi-scale feature map of the target face image (in fig. 3 the solid rectangular boxes represent feature maps at different scales); a first residual link is then established between the first multi-scale feature map and the first mask of the same size (in fig. 3 the "X"-shaped symbol marks where a residual link is established); the second face image is processed in the same way; then the first residual link established between the first mask and the first multi-scale feature map and the second residual link established between the second mask and the second multi-scale feature map are input together into the decoder for face fusion (in fig. 3 the "+" symbols indicate where the first and second residual links are combined as decoder input, and the hollow rectangular boxes represent the decoder outputs of different sizes); finally the fused face image is obtained.
In this embodiment, two input face pictures can be processed directly by the multi-scale encoder-decoder with residual links and the fused result is output. Compared with existing fusion methods, it is faster and can effectively avoid the problems of artifacts and contour distortion. When swapping any two faces, there is no need to collect a large number of pictures of the two faces for separate model training; the process is simple and convenient, a pipeline (research and development pipeline / product line) based on face fusion can be built, any two faces can be processed without retraining the model, and the effect is better than that of traditional methods.
Fig. 4 is a flowchart illustrating an image processing method according to another exemplary embodiment, which may precede step S230 set forth in the embodiment illustrated in fig. 2, and as illustrated in fig. 4, the method may include steps S410 to S430, described in detail as follows:
step S410: and adjusting the angle of the face in the target face image.
In this embodiment, to address the problem that the orientation of the face in the target face image still differs too much from that of the second face image after the alignment processing of step S210 in the embodiment shown in fig. 2, the face angle in the target face image is adjusted. Specifically, the features of the target face image can be extracted by a generator and then convolved to obtain a target face image with an adjusted face angle.
Step S430: and if the difference between the adjusted angle and the angle of the face in the second face image is not within the preset error threshold range, continuously adjusting the angle of the face in the target face image until the difference between the adjusted angle and the angle of the face in the second face image is within the preset error threshold range.
In this embodiment, it is determined whether the face orientation of the target face image after adjusting the face angle is within an error range from the face orientation of the second face image, and if not, the angle of the target face image is continuously adjusted by the generator until the difference between the adjusted angle and the face angle in the second face image is within a preset error threshold.
Specifically, the discriminator can be used to judge the authenticity of the angle-adjusted target face image against the second face image, and the generator stops working only when the authenticity score output by the discriminator is within the allowable error range.
In this embodiment, the face angle of the target face image is adjusted by a generative adversarial network latent-space interpolation method based on the generator and the discriminator, so as to solve the problem of a poor fusion effect caused by too large a difference in face orientation of the target face image; face fusion can subsequently be performed through steps S230 to S270 provided in the embodiment of fig. 2, improving the face fusion effect.
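A minimal sketch of this adjust-and-check loop; estimate_face_angle and adjust_angle are hypothetical placeholders standing in for the angle estimate and the generator-based adjustment, and the threshold and iteration cap are assumptions:

```python
def align_face_angle(target_img, second_img, estimate_face_angle, adjust_angle,
                     max_error_deg=5.0, max_iters=10):
    """Keep adjusting the face angle of the target image until it is within a
    preset error threshold of the second image's face angle.
    estimate_face_angle / adjust_angle are hypothetical callables."""
    reference = estimate_face_angle(second_img)
    for _ in range(max_iters):
        if abs(estimate_face_angle(target_img) - reference) <= max_error_deg:
            break
        target_img = adjust_angle(target_img)        # generator-based adjustment
    return target_img
```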
Fig. 5 is a flow chart of step S210 in the embodiment shown in fig. 2 in an exemplary embodiment. As shown in fig. 5, in an exemplary embodiment, the process of performing alignment processing on the face region of the first face image and the face region of the second face image to obtain the target face image of the first face image after alignment processing may include steps S510 to S530, which are described in detail as follows:
step S510: and acquiring the coordinates of the key points of the face image.
In this embodiment, two face images to be subjected to face fusion image processing are first obtained, which are a first face image and a second face image, respectively, and then face key point coordinates of the two face images are obtained, which include a first face key point coordinate of the first face image and a second face key point coordinate of the second face image.
Step S530: and aligning the face area of the first face image and the face area of the second face image according to the first face key point coordinates and the second face key point coordinates to obtain a target face image of the first face image after alignment.
In this embodiment, the face alignment is performed according to the first face key point coordinates and the second face key point coordinates, and if the face direction in the first face image needs to be aligned with the face direction in the second face image, the first face key point coordinates are adjusted according to the second face key point coordinates, so that the face in the first face image is aligned with the face in the second face image, and a target face image of the first face image after alignment processing is obtained.
Certainly, in other embodiments, the coordinates of the key points of the second face may also be adjusted, so that the face in the second face image is aligned with the face in the first face image, and a target face image of the second face image after alignment processing is obtained.
In this embodiment, after the target face image subjected to the face alignment processing is obtained, the face fusion processing is performed on the target face and the face image not subjected to the face alignment processing, so as to obtain a fused face image.
If in an embodiment, the target face image is obtained by aligning the first face image, then performing face fusion processing on the target face image and the second face image to obtain a fused face image; and if the target face image is obtained after the second face image is aligned, carrying out face fusion processing on the target face image and the first face image to obtain a fused face image.
Preferably, before the face fusion processing is performed on the target face image and the unaligned face image to obtain the fused face image, a mask of the target face image can be obtained, and color correction is then performed on the face area of the target face image according to this mask, so as to provide a face image that is convenient to process for the subsequent face fusion processing.
In the embodiment, the face key point coordinates of the two images to be subjected to face fusion processing are obtained, the face is aligned based on the face key points of the two images, the face alignment accuracy is improved, and finally the images subjected to face alignment are subjected to face fusion to obtain the fused face images.
Fig. 6 is a flow chart of step S510 in the embodiment shown in fig. 5 in an exemplary embodiment. As shown in fig. 6, in an exemplary embodiment, the acquiring of the face key point coordinates of the face image may include steps S610 to S650, which are described in detail as follows:
step S610: and acquiring the position coordinates of the face frame in the face image.
In this embodiment, the face in the face image is first detected. Specifically, an RGB (three-primary-color) picture is input, the face frame is cropped out, and the position coordinates of the face frame are output.
The face frame covers a face area in a face image, the position coordinates of the output face frame are (x, y, w, h), wherein x and y are coordinates of the upper left corner of the face frame in a rectangular coordinate system, w and h are the width and the height of the face frame, the face frame can be rectangular or square, and if the face frame is square, w = h.
In a specific embodiment, face frame detection may be performed using the target detection algorithm RetinaNet, whose structure is shown in fig. 7. Specifically, the target detection algorithm includes a residual network (ResNet) for extracting the backbone features of the face image, a multi-scale target detection network (feature pyramid network) that fuses the backbone features to obtain features at multiple scales, and target-frame regression and classification subnets (class + box subnets) for outputting the detections. The classification subnet (class subnet) adopts 4 convolutions of 256 channels and 1 convolution of num_priors x num_classes channels, where num_priors is the number of prior boxes owned by the feature layer and num_classes is how many classes of targets the network detects in total; the box regression subnet (box subnet) adopts 4 convolutions of 256 channels and 1 convolution of num_priors x 4 channels, where num_priors is the number of prior boxes owned by the feature layer and 4 refers to the adjustment of the prior box. Finally, the face frames in the face image are output through the class subnet, and at the same time the box subnet outputs the coordinates of the face frames. Of course, this is only one face frame detection method in this embodiment, and other face frame detection methods may be used in other embodiments.
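An illustrative PyTorch sketch of the two subnet heads described above, i.e. four 256-channel convolutions followed by one output convolution; the anchor count and class count are assumptions:

```python
import torch.nn as nn

def subnet_head(out_channels):
    """Four 3x3 convolutions with 256 channels, then one output convolution."""
    layers = []
    for _ in range(4):
        layers += [nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True)]
    layers.append(nn.Conv2d(256, out_channels, 3, padding=1))
    return nn.Sequential(*layers)

num_priors, num_classes = 9, 1                           # assumed: 9 prior boxes, 1 class (face)
class_subnet = subnet_head(num_priors * num_classes)     # classification head
box_subnet = subnet_head(num_priors * 4)                 # box regression head
```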
Step S630: and acquiring the position of a key point of the face in the face frame.
In this embodiment, after the face frame is obtained, the face region cut out from the face frame in the face image is input to the face key point detection network, and the position of the face key point is output.
Step S650: and acquiring the coordinates of the key points of the face according to the positions of the key points of the face and the position coordinates of the face frame.
In this embodiment, with the position coordinates of the face frame used as a reference, the face key point coordinates on the face image are obtained from the face key point positions detected inside the face frame.
The number of key points can be preset in this embodiment. For example, the face key point detection network is configured for 106 key points and outputs the positions of these 106 key points, finally giving data of shape (106, 2), i.e. the coordinates of the 106 key points on the face image. Of course, another number of key points may also be set in advance according to specific needs.
In a specific embodiment, face key point detection may be performed through the face key point detection network shown in fig. 8. Specifically, the face region corresponding to the face frame cropped out of the face image is input into the face key point detection network, multiple rounds of downsampling and upsampling are performed through its funnel-shaped structure to obtain position features of the face key points at multiple scales, and finally the face key point coordinates are obtained.
In this embodiment, the face frame is cropped out and its coordinates are obtained, recognition is performed on the cropped face frame to obtain the positions of the face key points, and the face key point coordinates on the face image are output using the face frame coordinates as the reference coordinate system. By obtaining the accurate positions of the face key points on the face image, a foundation is laid for the subsequent face alignment.
Fig. 9 is a flow chart of step S630 in the embodiment shown in fig. 6 in an exemplary embodiment. As shown in fig. 9, in an exemplary embodiment, the obtaining of the position of the face key point in the face frame may include steps S910 to S930, which are described in detail as follows:
step S910: converting the size of the face frame into a preset size;
in this embodiment, before obtaining the coordinates of the face key points, the size of the face frame is converted to a preset size, for example, the size of the face frame may be scaled to a resolution of 256 × 256 or other sizes.
Step S930: and acquiring the positions of the key points of the face in the face frame converted to the preset size.
In this embodiment, the face area corresponding to the face frame converted to the preset size is input into the face key point detection network, so as to obtain the positions of the face key points after the size conversion.
In this embodiment, converting the sizes of the face frames brings the face frames of the face images to be aligned into the same size range, i.e. the reference systems of the face key point coordinates obtained in the face images are the same, which improves the accuracy of the face alignment.
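Putting the cropping, resizing and reference-frame steps together, a small numpy sketch of mapping keypoints detected inside the 256 x 256 resized face frame back to coordinates on the original face image (the keypoint detector itself is assumed):

```python
import numpy as np

def keypoints_to_image_coords(kps_256, face_box):
    """kps_256: (106, 2) keypoint coordinates inside the 256x256 resized crop.
    face_box: (x, y, w, h) of the face frame in the original image.
    Returns (106, 2) keypoint coordinates on the original face image."""
    x, y, w, h = face_box
    scale = np.array([w / 256.0, h / 256.0])
    return kps_256 * scale + np.array([x, y])

# Example: a keypoint at the centre of the crop maps back to the centre of the face frame
print(keypoints_to_image_coords(np.array([[128.0, 128.0]]), (40, 60, 200, 200)))  # [[140. 160.]]
```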
Fig. 10 is a flowchart of step S530 in the embodiment shown in fig. 5 in an exemplary embodiment. As shown in fig. 10, in an exemplary embodiment, the process of aligning the face area of the first face image with the face area of the second face image according to the first face key point coordinates and the second face key point coordinates to obtain the target face image of the first face image after alignment processing includes steps S101 to S105, described in detail as follows:
step S101: and respectively carrying out feature standardization processing on the first face key point coordinates and the second face key point coordinates to obtain first key point features and second key point features.
In this embodiment, feature standardization processing is performed on the key point coordinates of the two face images. Specifically, the mean value is subtracted from the first face key point coordinates and from the second face key point coordinates respectively, and the results are divided by the corresponding standard deviation, so as to obtain the first key point features and the second key point features.
Step S103: and performing regularization processing on the first key point features and the second key point features, and performing singular value decomposition on data obtained after regularization processing to obtain an affine transformation matrix.
In this embodiment, regularization processing is performed on the first key point features and the second key point features, and the result is then decomposed using SVD (singular value decomposition) to obtain the affine transformation matrix.
In a specific embodiment, the affine transformation matrix can be obtained by:
R = argmin_Ω ||ΩA - B||   subject to   Ω^T Ω = I
where A and B are respectively the first face key point coordinates of the first face image and the second face key point coordinates of the second face image, Ω is the least-squares transformation being solved for, R is the resulting affine transformation matrix, I is the identity matrix, and T denotes transposition. Taking the second face image as the standard shape, the affine transformation matrix from the first face image to the second face image is obtained by the least squares method.
Step S105: and carrying out affine transformation on the first face image through the affine transformation matrix to obtain a target face image.
After obtaining the affine transformation matrix, performing affine transformation on the face image to be aligned by using the affine transformation matrix to obtain a target face image, wherein the face angle and the orientation of the target face image are the same as the face angle and the orientation of another unaligned face image.
In one embodiment, if the face area of the first face image needs to be aligned with the face area of the second face image, affine transformation is performed on the first face image through an affine transformation matrix to obtain a target face image of the first face image after alignment processing, wherein the face angle and the orientation of the target face image are the same as those of the second face image; and when the face area of the second face image needs to be aligned with the face area of the first face image, performing affine transformation on the second face image through an affine transformation matrix to obtain a target face image of the second face image after alignment processing, wherein the face angle and the orientation of the target face image are the same as those of the first face image.
In this embodiment, by aligning the face region of the face image to be aligned with the face region of the other face image, a target face image with the same angle and orientation as the other face image is obtained; the alignment process is simple, the processing efficiency is high, and it provides a basis for the subsequent face fusion processing.
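A minimal numpy/OpenCV sketch of the alignment just described, under simplified assumptions: standardize both keypoint sets, solve the orthogonal Procrustes problem by singular value decomposition, assemble a 2 x 3 affine matrix, and warp the first face image with it:

```python
import numpy as np
import cv2

def alignment_matrix(kps_a, kps_b):
    """Similarity transform taking keypoint set A onto keypoint set B.
    Both inputs are (N, 2) arrays; reflection handling is omitted for brevity."""
    mu_a, mu_b = kps_a.mean(axis=0), kps_b.mean(axis=0)
    std_a, std_b = kps_a.std(), kps_b.std()
    a = (kps_a - mu_a) / std_a                 # feature standardization
    b = (kps_b - mu_b) / std_b
    U, _, Vt = np.linalg.svd(a.T @ b)          # singular value decomposition
    R = (U @ Vt).T                             # rotation with R^T R = I
    s = std_b / std_a                          # scale factor
    t = mu_b - s * R @ mu_a                    # translation
    return np.hstack([s * R, t.reshape(2, 1)]) # 2x3 affine transformation matrix

# first_kps, second_kps: (106, 2) face keypoints of the two face images
# M = alignment_matrix(first_kps, second_kps)
# target = cv2.warpAffine(first_img, M, (second_img.shape[1], second_img.shape[0]))
```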
Fig. 11 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment. As shown in fig. 11, the apparatus includes:
the face alignment module 111 is configured to align a face region of the first face image with a face region of the second face image to obtain a target face image of the first face image after alignment;
a multi-scale feature map acquisition module 113 configured to extract features of the target face image and the second face image under multiple scales respectively to obtain a corresponding first multi-scale feature map and a corresponding second multi-scale feature map;
a mask acquisition module 115 configured to acquire a first mask of the target face image and a second mask of the second face image;
and the face fusion module 117 is configured to perform face fusion processing on the target face image and the second face image according to a first residual link established between the first mask and the first multi-scale feature map and a second residual link established between the second mask and the second multi-scale feature map, so as to obtain a fused face image.
The embodiment provides an image processing device, and based on the device, face fusion between any two face images can be realized.
In another exemplary embodiment, the image processing apparatus further includes:
and the color correction module is configured to perform color correction on the face area of the target face image according to the mask of the target face image.
In another exemplary embodiment, the image processing apparatus further includes:
and the angle adjusting module is configured to adjust the angle of the face in the target face image.
And the angle checking module is configured to continue to adjust the angle of the face in the target face image until the difference between the adjusted angle and the angle of the face in the second face image is within the preset error threshold if the difference between the adjusted angle and the angle of the face in the second face image is not within the preset error threshold.
The angle adjustment module and the angle checking module provided in this embodiment can further cooperate with the face alignment module 111, the multi-scale feature map acquisition module 113, the mask acquisition module 115 and the face fusion module 117 in the above embodiments to perform face fusion. Specifically, the face alignment module 111 first performs face alignment; the angle adjustment module and the angle checking module then adjust the face angle of the target image until the difference between the face angle of the target image and the face angle in the second face image is within the preset error threshold; finally, after the processing by the multi-scale feature map acquisition module 113 and the mask acquisition module 115, the face fusion module 117 performs face fusion processing on the angle-adjusted target image and the second face image to obtain the fused face image.
In another exemplary embodiment, the face alignment module 111 includes:
a key point coordinate obtaining unit configured to obtain face key point coordinates of the face image; the face key point coordinates of the face image comprise first face key point coordinates of a first face image and second face key point coordinates of a second face image;
the face alignment unit is configured to align the face area of the first face image with the face area of the second face image according to the first face key point coordinate and the second face key point coordinate to obtain a target face image of the first face image after alignment;
in another exemplary embodiment, the key point coordinate acquiring unit includes:
a face frame position acquisition subunit configured to acquire the position coordinates of the face frame in the face image, the face frame covering the face area in the face image;

a face key point position acquisition subunit configured to acquire the positions of the face key points in the face frame;

and a face key point coordinate acquisition subunit configured to acquire the face key point coordinates according to the face key point positions and the position coordinates of the face frame.

In another exemplary embodiment, the face key point position acquisition subunit includes:

a size conversion block configured to convert the size of the face frame to a preset size;

and a face key point position acquisition block configured to acquire the positions of the face key points in the face frame converted to the preset size.
In another exemplary embodiment, the face alignment unit includes:
a key point feature acquisition subunit configured to perform feature standardization processing on the first face key point coordinates and the second face key point coordinates respectively to obtain the first key point features and the second key point features;

an affine transformation matrix acquisition subunit configured to perform regularization processing on the first key point features and the second key point features, and perform singular value decomposition on the data obtained after regularization to obtain the affine transformation matrix;

and a target face image acquisition subunit configured to perform affine transformation on the first face image through the affine transformation matrix to obtain the target face image.
It should be noted that the apparatus provided in the foregoing embodiment and the method provided in the foregoing embodiment belong to the same concept, and specific ways of performing operations by the modules and units have been described in detail in the method embodiment, and are not described again here.
Embodiments of the present application also provide an electronic device, including a processor and a memory, where the memory has stored thereon computer readable instructions, which when executed by the processor, implement the image processing method as described above.
FIG. 12 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 1600 of the electronic device shown in fig. 12 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 12, computer system 1600 includes a Central Processing Unit (CPU) 1601, which can perform various suitable actions and processes, such as executing the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1602 or a program loaded from a storage portion 1608 into a Random Access Memory (RAM) 1603. In the RAM 1603, various programs and data necessary for system operation are also stored. The CPU 1601, ROM 1602, and RAM 1603 are connected to each other via a bus 1604. An Input/Output (I/O) interface 1605 is also connected to the bus 1604.
The following components are connected to the I/O interface 1605: an input portion 1606 including a keyboard, a mouse, and the like; an output section 1607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage portion 1608 including a hard disk and the like; and a communication section 1609 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1609 performs communication processing via a network such as the internet. The driver 1610 is also connected to the I/O interface 1605 as needed. A removable medium 1611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1610 as necessary, so that a computer program read out therefrom is mounted in the storage portion 1608 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 1609, and/or installed from the removable medium 1611. When the computer program is executed by the Central Processing Unit (CPU) 1601, various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
Yet another aspect of the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described above. The computer-readable storage medium may be included in the electronic device described in the above embodiment, or may exist separately without being incorporated in the electronic device.
Another aspect of the application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method provided in the above embodiments.
The above description is only a preferred exemplary embodiment of the present application, and is not intended to limit the embodiments of the present application, and those skilled in the art can easily make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An image processing method, comprising:
aligning a face area of a first face image with a face area of a second face image to obtain a target face image of the first face image after alignment;
respectively extracting the features of the target face image and the second face image under multiple scales to obtain a corresponding first multi-scale feature map and a corresponding second multi-scale feature map;
acquiring a first mask of the target face image and a second mask of the second face image;
and performing face fusion processing on the target face image and the second face image according to a first residual error link established between the first mask and the first multi-scale feature map and a second residual error link established between the second mask and the second multi-scale feature map to obtain a fused face image.
2. The method according to claim 1, wherein before said extracting the features of the target face image and the second face image at multiple scales to obtain the corresponding first multi-scale feature map and second multi-scale feature map, the method further comprises the steps of:
and carrying out color correction on the face area of the target face image according to the mask of the target face image.
3. The method according to claim 1, wherein before said extracting the features of the target face image and the second face image at multiple scales to obtain the corresponding first multi-scale feature map and second multi-scale feature map, the method further comprises the steps of:
adjusting the angle of the face in the target face image;
and if the difference between the adjusted angle and the angle of the face in the second face image is not within a preset error threshold range, continuing to adjust the angle of the face in the target face image until the difference between the adjusted angle and the angle of the face in the second face image is within the preset error threshold range.
4. The method according to claim 1, wherein the aligning the face region of the first face image with the face region of the second face image to obtain the target face image of the first face image after alignment processing comprises:
acquiring face key point coordinates of a face image, wherein the face key point coordinates of the face image comprise first face key point coordinates of the first face image and second face key point coordinates of the second face image;
and aligning the face area of the first face image with the face area of the second face image according to the first face key point coordinates and the second face key point coordinates to obtain a target face image of the first face image after alignment.
5. The method according to claim 4, wherein the aligning the face region of the first face image with the face region of the second face image according to the first face keypoint coordinates and the second face keypoint coordinates to obtain the target face image of the first face image after alignment processing, comprises:
respectively carrying out feature standardization processing on the first face key point coordinates and the second face key point coordinates to obtain first key point features and second key point features;
performing regularization processing on the first key point features and the second key point features, and performing singular value decomposition on data obtained after the regularization processing to obtain an affine transformation matrix;
and carrying out affine transformation on the first face image through the affine transformation matrix to obtain the target face image.
6. The method of claim 4, wherein the obtaining of the face key point coordinates of the face image comprises:
acquiring the position coordinates of a face frame in the face image, wherein the face frame covers a face area in the face image;
acquiring the positions of the key points of the face in the face frame;
and acquiring the face key point coordinates according to the positions of the face key points and the position coordinates of the face frame.
7. The method of claim 6, wherein the obtaining the positions of the face key points in the face frame comprises:
converting the size of the face frame to a preset size;
and acquiring the positions of the key points of the face in the face frame converted to the preset size.
8. An image processing apparatus, comprising:
the face alignment module is configured to align a face region of a first face image with a face region of a second face image to obtain a target face image of the first face image after alignment;
the multi-scale feature map acquisition module is configured to respectively extract features of the target face image and the second face image under multiple scales to obtain a corresponding first multi-scale feature map and a corresponding second multi-scale feature map;
the mask acquisition module is configured to acquire a first mask of the target face image and a second mask of the second face image;
and the face fusion module is configured to perform face fusion processing on the target face image and the second face image according to a first residual link established between the first mask and the first multi-scale feature map and a second residual link established between the second mask and the second multi-scale feature map so as to obtain a fused face image.
9. An electronic device, comprising:
a memory storing computer readable instructions;
a processor configured to read the computer readable instructions stored in the memory, so as to perform the method of any one of claims 1-7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1-7.
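Illustrative implementation sketches (non-limiting)

The Python sketches below show, purely by way of example, how the steps recited in the above claims might be realized; every function name, model, and parameter value is an assumption introduced for illustration and is not taken from the patent. The first sketch covers claims 6 and 7: obtain a face frame, convert it to a preset size, predict key point positions inside the resized frame, and map them back to full-image coordinates.

```python
# Hypothetical sketch of claims 6-7; "landmark_model" stands in for any
# landmark regressor and PRESET_SIZE for the unspecified preset size.
import cv2
import numpy as np

PRESET_SIZE = 224  # assumed preset size of the face frame

def face_keypoint_coordinates(image, face_box, landmark_model):
    """Return face key point coordinates in full-image coordinates.

    image: H x W x 3 array; face_box: (x, y, w, h) covering the face area;
    landmark_model: callable mapping a PRESET_SIZE x PRESET_SIZE crop to an
    (N, 2) array of key points in crop coordinates.
    """
    x, y, w, h = face_box
    crop = image[y:y + h, x:x + w]

    # Convert the face frame to the preset size (claim 7).
    crop_resized = cv2.resize(crop, (PRESET_SIZE, PRESET_SIZE))

    # Key point positions inside the resized face frame.
    keypoints = landmark_model(crop_resized)

    # Map back to the original image: undo the resize, then add the
    # face-frame offset.
    keypoints = keypoints * np.array([w / PRESET_SIZE, h / PRESET_SIZE])
    return keypoints + np.array([x, y])
```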
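For the alignment in claims 4 and 5, a common way to derive a transformation matrix from two corresponding key point sets via standardization, regularization, and singular value decomposition is the orthogonal Procrustes solution. The sketch below uses that approach as an assumption; it yields a similarity transform, a special case of the affine transform recited in claim 5.

```python
# Hedged sketch of claim 5: standardize and regularize the two key point
# sets, then use SVD (orthogonal Procrustes) to build a 2x3 transform that
# maps the first face onto the second face's geometry.
import cv2
import numpy as np

def alignment_matrix(first_kpts, second_kpts):
    """first_kpts, second_kpts: (N, 2) arrays of corresponding key points."""
    src = first_kpts.astype(np.float64)
    dst = second_kpts.astype(np.float64)

    # Feature standardization: remove each set's centroid.
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean

    # Regularization: scale each set to unit spread.
    src_std, dst_std = src_c.std(), dst_c.std()
    src_c, dst_c = src_c / src_std, dst_c / dst_std

    # SVD of the cross-covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(src_c.T @ dst_c)
    rotation = (u @ vt).T

    scale = dst_std / src_std
    translation = dst_mean - scale * rotation @ src_mean
    return np.hstack([scale * rotation, translation.reshape(2, 1)])

def align_first_to_second(first_img, second_img, first_kpts, second_kpts):
    """Affine-warp the first face image into the second image's frame."""
    matrix = alignment_matrix(first_kpts, second_kpts)
    h, w = second_img.shape[:2]
    return cv2.warpAffine(first_img, matrix, (w, h))
```

In this sketch the warped output plays the role of the target face image consumed by the later steps.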
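The iterative angle adjustment of claim 3 could look like the loop below, where the roll-angle estimator, the error threshold value, and the rotate-and-recheck strategy are all assumptions.

```python
# Hypothetical sketch of claim 3; estimate_angle is any callable returning a
# face roll angle in degrees (for example from the eye key points), and the
# threshold value is assumed.
import cv2

ERROR_THRESHOLD_DEG = 1.0  # assumed preset error threshold

def match_face_angle(target_img, second_angle_deg, estimate_angle, max_iters=10):
    h, w = target_img.shape[:2]
    img = target_img
    for _ in range(max_iters):
        diff = estimate_angle(img) - second_angle_deg
        if abs(diff) <= ERROR_THRESHOLD_DEG:
            break  # difference is within the preset error threshold range
        # Rotate about the image centre to cancel the remaining difference;
        # under OpenCV's y-down convention a positive rotation angle lowers
        # the roll angle measured along the eye line.
        m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), diff, 1.0)
        img = cv2.warpAffine(img, m, (w, h))
    return img
```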
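One plausible reading of the mask-guided color correction in claim 2 is to match the color statistics of the aligned face region to those of the second face region; the per-channel mean/std matching rule below is an assumption, since the claim does not specify the correction method.

```python
# Hedged sketch of claim 2: per-channel mean/std matching restricted to the
# masked face regions. Masks are boolean (H, W) arrays; 8-bit color inputs
# are assumed.
import numpy as np

def masked_color_correction(target_face, second_face, target_mask, second_mask):
    corrected = target_face.astype(np.float32)
    for c in range(3):
        channel = corrected[..., c]        # view into `corrected`
        src_vals = channel[target_mask]
        ref_vals = second_face[..., c][second_mask].astype(np.float32)
        # Match spread and mean of the masked target region to the reference;
        # the small epsilon avoids division by zero on flat regions.
        scale = ref_vals.std() / (src_vals.std() + 1e-6)
        channel[target_mask] = (src_vals - src_vals.mean()) * scale + ref_vals.mean()
    return np.clip(corrected, 0, 255).astype(np.uint8)
```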
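Finally, claims 1 and 8 recite residual links between each mask and the corresponding multi-scale feature map. The PyTorch module below is a minimal sketch of how such a mask-gated residual link could be wired at a single scale; it is not the patent's network, and the gating formula, layer width, and interpolation mode are assumptions.

```python
# Minimal single-scale sketch of a mask-gated residual link (claims 1 and 8);
# a full model would repeat this block at each scale of the multi-scale
# feature maps and decode the result back to image resolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskGatedFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, feat_target, feat_second, mask_target, mask_second):
        # feat_*: (B, C, H, W) feature maps at one scale;
        # mask_*: (B, 1, h, w) face masks, resized to the feature resolution.
        m_t = F.interpolate(mask_target, size=feat_target.shape[-2:],
                            mode="bilinear", align_corners=False)
        m_s = F.interpolate(mask_second, size=feat_second.shape[-2:],
                            mode="bilinear", align_corners=False)

        # Residual links: each feature map is re-weighted by its mask and added
        # back to itself, emphasising the face region without losing context.
        gated_t = feat_target + m_t * feat_target
        gated_s = feat_second + m_s * feat_second

        # Merge the two gated streams into a fused feature map for the decoder.
        return self.merge(torch.cat([gated_t, gated_s], dim=1))
```

A decoder stacking several such blocks and upsampling to image resolution would then produce the fused face image.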
CN202110976011.3A (priority date 2021-08-24, filing date 2021-08-24): Image processing method and device, electronic device and storage medium. Status: Pending. Publication: CN115719503A (en).

Priority Applications (1)

Application Number: CN202110976011.3A; Publication: CN115719503A (en); Priority Date: 2021-08-24; Filing Date: 2021-08-24; Title: Image processing method and device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number: CN202110976011.3A; Publication: CN115719503A (en); Priority Date: 2021-08-24; Filing Date: 2021-08-24; Title: Image processing method and device, electronic device and storage medium

Publications (1)

Publication Number: CN115719503A (en); Publication Date: 2023-02-28

Family

ID=85253471

Family Applications (1)

Application Number: CN202110976011.3A; Status: Pending; Publication: CN115719503A (en); Priority Date: 2021-08-24; Filing Date: 2021-08-24; Title: Image processing method and device, electronic device and storage medium

Country Status (1)

Country: CN (1); Link: CN115719503A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
REG: Reference to a national code; Ref country code: HK; Ref legal event code: DE; Ref document number: 40087975; Country of ref document: HK