CN111833413A - Image processing method, image processing device, electronic equipment and computer readable storage medium - Google Patents

Image processing method, image processing device, electronic equipment and computer readable storage medium

Info

Publication number
CN111833413A
Authority
CN
China
Prior art keywords: person; face image; face; image; synthesized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010710400.7A
Other languages: Chinese (zh)
Other versions: CN111833413B (en)
Inventor
郑子奇
徐国强
邱寒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010710400.7A
Publication of CN111833413A
Priority to PCT/CN2021/096713 (published as WO2022016996A1)
Application granted
Publication of CN111833413B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161: Detection; Localisation; Normalisation
    • G06V 40/168: Feature extraction; Face representation
    • G06V 40/174: Facial expression recognition

Abstract

The embodiments of the application provide an image processing method, an image processing device, electronic equipment and a computer readable storage medium. The method includes the following steps: acquiring a face image set, where the face image set includes a face image of a first person, a face image of a second person in a specified posture and a face image of a third person with a specified expression; extracting features of each face image in the face image set to obtain a first face feature set, where the first face feature set includes the face features of the first person, the posture features of the second person and the expression features of the third person; and performing face synthesis according to the first face feature set to obtain a synthesized face image of the first person, where the synthesized face image has the face features of the first person, the posture features of the second person and the expression features of the third person. By adopting the method and the device, the quality of the generated face image can be improved. In addition, the application also relates to blockchain technology: the synthesized face image of the first person can be written into a blockchain.

Description

Image processing method, image processing device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Image generation is an information processing technology that has become popular in recent years. It covers many areas, among which face image generation is a particularly important research direction. Its main purpose is to generate high-quality face images for commercial or experimental use, or to generate face images that satisfy constraints specified by a user.
At present, face image generation technology is mainly applied at scale in fields such as media and marketing, for example in the production of virtual characters. However, fine-grained editing of faces remains difficult, and the granularity of control is insufficient. On the one hand, such requirements are hard to satisfy because the technology is immature; on the other hand, the related technologies require the support of massive data. Meanwhile, even relatively mature related technologies suffer from various defects, such as the identity of the generated face not matching the source person, insufficient expressiveness of expressions, low definition of the generated face image, and monotonous image content. Therefore, how to improve the quality of generated face images has become an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, electronic equipment and a computer readable storage medium, which can improve the quality of a generated face image.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring a face image set, wherein the face image set comprises a face image of a first person, a face image of a second person in a specified posture and a face image of a third person in a specified expression;
extracting the features of each face image in the face image set to obtain a first face feature set, wherein the first face feature set comprises the face features of the first person, the posture features of the second person and the expression features of the third person;
and performing face synthesis according to the first face feature set to obtain a synthesized face image of the first person, wherein the synthesized face image of the first person has the face feature of the first person, the posture feature of the second person and the expression feature of the third person.
Optionally, the performing face synthesis according to the first face feature set to obtain a synthesized face image of the first person includes:
utilizing the trained convolutional neural network model to perform upsampling on the first human face feature set to obtain a synthesized human face image of the first person, wherein the upsampling comprises any one of the following items: bilinear interpolation, nearest neighbor interpolation, and transposed convolution.
Optionally, the upsampling includes a transposed convolution, and the upsampling the first human face feature set by using the trained convolutional neural network model to obtain a synthesized human face image of the first person includes:
and performing a transposed convolution on the first face feature set by using a transposed convolution layer included in the trained convolutional neural network model to obtain a synthesized face image of the first person.
Optionally, the performing a transpose convolution on the first human face feature set by using a transpose convolution layer included in the trained convolutional neural network model to obtain a synthesized human face image of the first person includes:
constructing a feature map by using the first face feature set;
and performing a transposed convolution on the feature map by using a transposed convolution layer included in the trained convolutional neural network model to obtain a synthesized face image of the first person.
Optionally, after obtaining the synthetic face image of the first person, the method further includes:
carrying out image detection on the synthesized face image of the first person to obtain an image detection result;
performing face correction on the synthesized face image of the first person according to the image detection result to obtain a corrected synthesized face image of the first person;
and outputting the corrected synthesized face image of the first person.
Optionally, the method further comprises:
acquiring a facial image data set, wherein the facial image data set comprises at least one group of images, and each group of images in the at least one group of images comprises an original facial image, at least one facial image corresponding to each posture in at least one posture and at least one facial image corresponding to each expression in at least one expression;
and training an initial convolutional neural network model by using the face image data set to obtain a trained convolutional neural network model.
Optionally, the training an initial convolutional neural network model by using the face image data set to obtain a trained convolutional neural network model includes:
performing feature extraction on a target group of images in the at least one group of images to obtain a second face feature set;
utilizing an initial convolutional neural network model to perform upsampling on the second face feature set to obtain at least one first synthesized face image;
and performing restoration processing on each first synthesized face image in the at least one first synthesized face image through the initial convolutional neural network model to obtain a second synthesized face image corresponding to each first synthesized face image, where the second synthesized face image matches the original face image included in the target group of images, so as to obtain a converged convolutional neural network model as the trained convolutional neural network model.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
an acquisition module, configured to acquire a face image set, where the face image set includes a face image of a first person, a face image of a second person in a specified posture and a face image of a third person with a specified expression;
a processing module, configured to perform feature extraction on each face image in the face image set to obtain a first face feature set, where the first face feature set includes the face features of the first person, the posture features of the second person and the expression features of the third person;
the processing module is further configured to perform face synthesis according to the first face feature set to obtain a synthesized face image of the first person, where the synthesized face image of the first person has the face feature of the first person, the posture feature of the second person, and the expression feature of the third person.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, where the computer program is executed by a processor to implement the method according to the first aspect.
In summary, the electronic device may acquire a face image set, where the face image set includes a face image of a first person, a face image of a second person in a specified posture and a face image of a third person with a specified expression. The electronic device may extract features of each face image in the face image set to obtain a first face feature set, and perform face synthesis according to the first face feature set to obtain a synthesized face image of the first person, where the synthesized face image has the face features of the first person, the posture features of the second person and the expression features of the third person.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another image processing method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Because changes in face posture and expression are diverse and complex, and because face image generation technologies in the prior art have defects, such as the particular instability of face generation models and the possible loss of detail, generating multi-angle face expression images is very difficult, and the quality of the generated face images is not high.
The image processing scheme described in the embodiments of the present application specifically includes: acquiring a face image set, where the face image set includes a face image of a first person, a face image of a second person in a specified posture and a face image of a third person with a specified expression; extracting features of each face image in the face image set to obtain a first face feature set, where the first face feature set includes the face features of the first person, the posture features of the second person and the expression features of the third person; and performing face synthesis according to the first face feature set to obtain a synthesized face image of the first person, where the synthesized face image has the face features of the first person, the posture features of the second person and the expression features of the third person. This process can generate faces with different expressions at multiple angles, and can meet certain image quality requirements while editing face attributes with a certain accuracy.
in one application scenario, the user may upload the front photo of the person a, the side photo of the person B, such as the photo turned by 60 ° on the right side, and the expression photo of the person B to the electronic device, and then, by using the above-mentioned image processing scheme, the photo with the expression on the side of the person a may be synthesized, where the expression is the expression corresponding to the expression photo of the person B. The person a may be the user himself or another person.
In another application scenario, a user may upload a frontal photo of person A to the electronic device, select a side photo of person B and an expression photo of person B from a plurality of photos provided by the electronic device for face synthesis, and then submit a synthesis instruction to the electronic device by clicking a synthesis button. After detecting the synthesis instruction, the electronic device may acquire the frontal photo of person A, the side photo of person B and the expression photo of person B, and then, by means of the above image processing scheme, synthesize a side-view photo of person A with an expression, where the expression corresponds to the expression photo of person B.
In one application scenario, the electronic device may provide a face synthesis interface for the user, so that the user can set a face image set based on the face synthesis interface, for example by setting a frontal photo of person A, a side photo of person B and an expression photo of person B. The face synthesis interface may include a synthesis button, and the user may submit a synthesis instruction to the electronic device by clicking the synthesis button; after detecting the synthesis instruction, the electronic device may obtain the face image set. In one embodiment, the synthesis instruction may carry the face image set, such as the frontal photo of person A, the side photo of person B and the expression photo of person B.
In one embodiment, in order to prevent a synthesized face image from being passed off as a real face for criminal purposes, the synthesized face image of the first person may be written into a blockchain, so that the synthesized face image can be traced. Alternatively, the electronic device may write into the blockchain the synthesized face image of the first person together with the identifier of the user who requested the synthesis, or the device information of the user terminal corresponding to that user. The identifier of the user may be, for example, the user's account number or mobile phone number, which can uniquely identify the user. The device information of the user terminal may be, for example, the physical address, device number or internet protocol address of the user terminal, which can uniquely identify the user terminal. In one embodiment, the electronic device referred to in the embodiments of the present application may or may not be the user terminal corresponding to the user. In one embodiment, in a case where the synthesized face image of the first person has been corrected, the electronic device may write the corrected synthesized face image of the first person into the blockchain.
In one embodiment, in consideration of the privacy of the synthesized face image, a digest of the synthesized face image of the first person may be calculated to obtain digest information, and the digest information is then written into the blockchain. The embodiments of the present application do not limit the algorithm used to calculate the digest information of the synthesized face image of the first person. Correspondingly, digest calculation may be performed on the synthesized face image of the first person together with the above user identifier to obtain first digest information, which is then written into the blockchain; or digest calculation may be performed on the synthesized face image of the first person together with the above device information of the user terminal to obtain second digest information, which is then written into the blockchain. In one embodiment, in a case where the electronic device has corrected the synthesized face image of the first person, the electronic device may perform digest calculation on the corrected synthesized face image of the first person to obtain third digest information, and then write the third digest information into the blockchain.
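By way of illustration only, the digest calculation described above can be realized with a standard cryptographic hash. The following is a minimal Python sketch that assumes SHA-256 as the digest algorithm (the application does not fix one) and uses a hypothetical write_to_blockchain helper in place of whatever ledger client an actual deployment would use:

```python
import hashlib

def compute_digest(image_bytes: bytes, extra: bytes = b"") -> str:
    # Hash the synthesized face image, optionally combined with the user
    # identifier or the device information of the user terminal.
    h = hashlib.sha256()
    h.update(image_bytes)
    h.update(extra)  # e.g. user account number or device number, as bytes
    return h.hexdigest()

# Hypothetical usage; write_to_blockchain stands in for the actual chain API.
# digest = compute_digest(open("synth_face.png", "rb").read(), b"user-123")
# write_to_blockchain(digest)
```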
Please refer to fig. 1, which is a flowchart illustrating an image processing method according to an embodiment of the present disclosure. The method can be applied to electronic equipment, and the electronic equipment can be a terminal or a server. The terminal may be an intelligent terminal such as a notebook computer, a desktop computer, etc., and the server includes, but is not limited to, a single server or a server cluster. Specifically, the method may comprise the steps of:
s101, a face image set is obtained, and the face image set comprises a face image of a first person, a face image of a second person in a specified posture and a face image of a third person in a specified expression.
The first person, the second person and the third person may be the same person or different persons. The specified posture may also be referred to as a fixed posture; for example, the specified posture may be a specified left-right rotation angle, a specified up-down rotation angle, or a rotation angle within a specified plane. The posture described in the embodiments of the present application may refer to the posture of a face. Accordingly, the specified expression may also be referred to as a fixed expression; for example, the specified expression may be angry, happy or sad. A face image in the embodiments of the present application may refer to an image that includes a person's face.
In one embodiment, the electronic device may perform step S101 when detecting the composition instruction.
In an embodiment, when the synthesis instruction is detected, the process of acquiring the face image set may be that the electronic device acquires the face image set carried by the synthesis instruction. Alternatively, when the synthesis instruction is detected, the process of acquiring the face image set may be that the electronic device reads the face image set from a specified directory.
S102, extracting the features of each face image in the face image set to obtain a first face feature set, wherein the first face feature set comprises the face features of the first person, the posture features of the second person and the expression features of the third person.
The face features may also be referred to as facial features. The posture features here may refer to the posture features of a face.
In one embodiment, the electronic device may invoke a feature extraction algorithm to perform feature extraction on each face image in the face image set, so as to obtain a first face feature set. By adopting the process, the characteristics can be effectively extracted, and the accuracy of the extracted characteristics is guaranteed.
In an embodiment, the electronic device may specifically invoke a feature extraction algorithm to perform feature extraction on a face image of a first person to obtain facial features of the first person, perform feature extraction on a face image of a second person in a specified posture to obtain posture features of the second person, and perform feature extraction on a face image of a third person in a specified expression to obtain expression features of the third person.
In one embodiment, the feature extraction algorithm employed by the electronic device may also be different depending on the features extracted. For example, the electronic device may specifically perform feature extraction on a face image of a first person by using a facial feature extraction algorithm, perform feature extraction on a face image of a specified pose of a second person by using a pose feature extraction algorithm, and perform feature extraction on a face image of a specified expression of a third person by using an expression feature extraction algorithm.
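As an illustration of such attribute-specific extraction, the following sketch builds three small convolutional encoders, one per attribute, in PyTorch. The framework, layer sizes and feature dimensions are assumptions made for the example and are not specified by the application:

```python
import torch
import torch.nn as nn

def make_encoder(feat_dim: int) -> nn.Sequential:
    # A small convolutional encoder mapping a 3x128x128 face image to a
    # feature vector; all sizes here are illustrative assumptions.
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # -> 32x64x64
        nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # -> 64x32x32
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # -> 128x16x16
        nn.Flatten(),
        nn.Linear(128 * 16 * 16, feat_dim),
    )

identity_encoder = make_encoder(256)    # face features of the first person
posture_encoder = make_encoder(64)      # posture features of the second person
expression_encoder = make_encoder(64)   # expression features of the third person
# The first face feature set is then the triple
# (identity_encoder(img_a), posture_encoder(img_b), expression_encoder(img_c)).
```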
S103, carrying out face synthesis according to the first face feature set to obtain a synthesized face image of the first person, wherein the synthesized face image of the first person has the face feature of the first person, the posture feature of the second person and the expression feature of the third person.
In the embodiment of the application, the electronic device can perform feature fusion according to the first human face feature set, so as to obtain a synthetic human face image of the first person. After obtaining the synthetic face image of the first person, the electronic device may output the synthetic face image of the first person. The face corresponding to the synthesized face image of the first person obtained by the process is the face of the first person, the posture corresponding to the synthesized face image of the first person is the specified posture, and the expression corresponding to the synthesized face image of the first person is the specified expression.
In an embodiment, in order to perform feature fusion effectively, the electronic device may specifically perform face synthesis according to preset fusion parameters and the first face feature set, so as to obtain a synthesized face image of the first person. The fusion parameters can be set empirically.
In one embodiment, in order to perform feature fusion efficiently, the electronic device may also perform upsampling according to the first face feature set to obtain the synthesized face image of the first person. The upsampling may include any one of the following: bilinear interpolation, nearest neighbor interpolation and transposed convolution.
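To make the fusion and upsampling concrete, the following sketch (in the same illustrative PyTorch setting as above) concatenates the three feature vectors, constructs a feature map from them, and upsamples the map back to image resolution with transposed convolutions; all dimensions are assumptions:

```python
import torch
import torch.nn as nn

class FaceSynthesizer(nn.Module):
    # Fuses identity, posture and expression features and upsamples them
    # into a synthesized face image (illustrative dimensions throughout).
    def __init__(self):
        super().__init__()
        self.fuse = nn.Linear(256 + 64 + 64, 128 * 16 * 16)
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # -> 32x32
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # -> 64x64
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),    # -> 128x128
        )

    def forward(self, f_id, f_pose, f_expr):
        x = self.fuse(torch.cat([f_id, f_pose, f_expr], dim=1))
        x = x.view(-1, 128, 16, 16)   # construct a feature map from the feature set
        return self.decode(x)         # synthesized face image of the first person
```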
In the embodiment shown in fig. 1, the electronic device may acquire a face image set, where the face image set includes a face image of a first person, a face image of a second person in a specified posture and a face image of a third person with a specified expression. The electronic device may extract features of each face image in the face image set to obtain a first face feature set, and perform face synthesis according to the first face feature set to obtain a synthesized face image of the first person, where the synthesized face image has the face features of the first person, the posture features of the second person and the expression features of the third person. By adopting this process, vivid multi-angle face expression images can be generated, and the quality of the generated face images can be improved.
Please refer to fig. 2, which is a flowchart illustrating another image processing method according to an embodiment of the present disclosure. The method can be applied to electronic equipment, and the electronic equipment can be a terminal or a server. The terminal may be an intelligent terminal such as a notebook computer, a desktop computer, etc., and the server includes, but is not limited to, a single server or a server cluster. Compared with the embodiment of fig. 1, the embodiment of the present application can improve the stability of the synthesized face image through the image rectification process of steps S204 to S206, so that the quality of the output synthesized face image is higher. Specifically, the method may comprise the steps of:
s201, a face image set is obtained, wherein the face image set comprises a face image of a first person, a face image of a second person in a specified posture and a face image of a third person in a specified expression.
S202, extracting the features of each face image in the face image set to obtain a first face feature set, wherein the first face feature set comprises the face features of the first person, the posture features of the second person and the expression features of the third person.
S203, carrying out face synthesis according to the first face feature set to obtain a synthesized face image of the first person, wherein the synthesized face image of the first person has the face feature of the first person, the posture feature of the second person and the expression feature of the third person.
Steps S201 to S203 may refer to steps S101 to S103 in the embodiment of fig. 1, and details are not described herein again.
And S204, carrying out image detection on the synthesized face image of the first person to obtain an image detection result.
S205, performing face correction on the synthesized face image of the first person according to the image detection result to obtain a corrected synthesized face image of the first person.
And S206, outputting the corrected synthesized face image of the first person.
In steps S204 to S206, in order to make the output synthesized face image more realistic and stable, the electronic device may correct the synthesized face image of the first person. Specifically, the electronic device may perform image detection on the synthesized face image of the first person to obtain an image detection result, correct the synthesized face image of the first person according to the image detection result to obtain a corrected synthesized face image of the first person, and output the corrected synthesized face image of the first person.
In one embodiment, the process by which the electronic device performs image detection on the synthesized face image of the first person to obtain an image detection result, and corrects the synthesized face image according to that result, may be as follows: the electronic device performs face detection on the synthesized face image of the first person to obtain, as the image detection result, the coordinates of each of a plurality of key points in the synthesized face image; the electronic device calculates a transformation matrix for transforming the coordinates of each key point to the coordinates of the preset key point corresponding to that key point; and the electronic device obtains the corrected synthesized face image of the first person from the synthesized face image and the transformation matrix. The key points here may refer to face key points. The preset key points may refer to the key points of any image, or of a specified image, in the face image set. Since key points are one of the most important attributes of a human face, detecting the key points of a given person, head posture or expression and using them to correct the key points of the generated face makes the output synthesized face image more stable in most cases, reduces the possibility of distortion, and makes the corrected synthesized face image look more natural.
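As an illustrative sketch of this correction step, a transformation matrix can be estimated from the detected key points to the preset key points and applied to the synthesized image. The sketch below assumes OpenCV and an external key point detector, neither of which is named by the application:

```python
import cv2
import numpy as np

def correct_face(synth_img: np.ndarray,
                 detected_pts: np.ndarray,  # Nx2 key points detected in the synthesized image
                 preset_pts: np.ndarray     # Nx2 corresponding preset key points
                 ) -> np.ndarray:
    # Estimate a transformation matrix that maps the detected key points
    # onto the preset key points (a partial affine / similarity fit).
    matrix, _ = cv2.estimateAffinePartial2D(detected_pts, preset_pts)
    h, w = synth_img.shape[:2]
    # Warp the synthesized face image with the estimated matrix.
    return cv2.warpAffine(synth_img, matrix, (w, h))
```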
In an embodiment, in order to perform feature extraction effectively, the process by which the electronic device performs feature extraction on each face image in the face image set to obtain the first face feature set may be that the electronic device performs feature extraction on each face image in the face image set by using a trained convolutional neural network model, so as to obtain the first face feature set.
In an embodiment, the process by which the electronic device performs feature extraction on each face image in the face image set by using the trained convolutional neural network model to obtain the first face feature set may specifically be that the electronic device performs feature extraction on each face image in the face image set by using the convolutional layers included in the trained convolutional neural network model, so as to obtain the first face feature set.
In an embodiment, in order to perform feature fusion effectively, the process by which the electronic device performs face synthesis according to the first face feature set to obtain the synthesized face image of the first person may be that the electronic device performs upsampling according to the first face feature set to obtain the synthesized face image of the first person. The upsampling may include any one of the following: bilinear interpolation, nearest neighbor interpolation and transposed convolution.
In an embodiment, the electronic device may specifically perform upsampling on the first face feature set through the trained convolutional neural network model to obtain a synthesized face image of the first person. The up-sampling is realized through the trained convolutional neural network model, so that the up-sampling efficiency is higher.
In an embodiment, when the upsampling includes bilinear interpolation, the electronic device may specifically perform a bilinear interpolation operation according to the first face feature set through the trained convolutional neural network model, so as to obtain a synthesized face image of the first person. Images obtained by bilinear interpolation are of high quality and do not exhibit discontinuous pixel values.
In an embodiment, when the upsampling includes bilinear interpolation, the electronic device may specifically construct a feature map according to the first face feature set, and then perform bilinear interpolation operation according to the feature map by using the trained convolutional neural network model, so as to obtain a synthesized face image of the first person. The processing procedure of the bilinear interpolation method can be referred to the following formula:
y_{i,j} = x_{i^-,j^-}(1-\Delta i)(1-\Delta j) + x_{i^+,j^-}\,\Delta i\,(1-\Delta j) + x_{i^-,j^+}(1-\Delta i)\,\Delta j + x_{i^+,j^+}\,\Delta i\,\Delta j    (formula 1.1)
where y represents the synthesized face image and x represents the feature map; \Delta i = i - i^- and \Delta j = j - j^-; i denotes the abscissa and j the ordinate of the pixel at the target position; i^- denotes i rounded down and i^+ denotes i rounded up; j^- denotes j rounded down and j^+ denotes j rounded up.
In one embodiment, y may be taken as the synthetic face image of the first person.
In one embodiment, after y is obtained, y may be input into a first convolutional layer of the trained convolutional neural network model to perform a convolution operation, so as to obtain the final synthesized face image as the synthesized face image of the first person. Alternatively, after y is obtained, y is input into the first convolutional layer for a convolution operation to obtain a first synthesized face image, and a bilinear interpolation operation is then performed according to the first synthesized face image to obtain the final synthesized face image as the synthesized face image of the first person. In this process the bilinear interpolation operation and the convolution operation work together, so that the final synthesized face image has a higher resolution.
It should be noted that the bilinear interpolation in the embodiments of the present application essentially finds the four pixels surrounding the target position (i, j) in x, and uses their pixel values to calculate the pixel value at the target position in y.
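For illustration, in a PyTorch implementation (an assumed framework; the application names none) this bilinear upsampling step corresponds directly to torch.nn.functional.interpolate with mode='bilinear':

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 128, 16, 16)  # feature map built from the first face feature set
# Bilinear interpolation: each output pixel is a weighted average of the
# four surrounding input pixels, as in formula 1.1.
y = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(y.shape)  # torch.Size([1, 128, 32, 32])
```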
In an embodiment, when the upsampling includes nearest neighbor interpolation, the electronic device may specifically perform nearest neighbor interpolation operation according to the first face feature set through the trained convolutional neural network model to obtain a synthesized face image of the first person.
In an embodiment, when the upsampling includes nearest neighbor interpolation, the electronic device may specifically construct a feature map according to the first face feature set, and then perform nearest neighbor interpolation operation according to the feature map by using the trained convolutional neural network model, so as to obtain a synthesized face image of the first person. The processing procedure of the method using nearest neighbor interpolation can be referred to the following formula:
y_{i,j} = x_{u^*,v^*}, \quad (u^*, v^*) = \arg\min_{(u,v)} \left[ (u-i)^2 + (v-j)^2 \right]    (formula 1.2)
where y represents the synthesized face image and x represents the feature map; i and j denote the abscissa and the ordinate of the pixel at the target position; u and v denote the abscissa and the ordinate of a pixel in x.
In one embodiment, y may be taken as the synthetic face image of the first person.
In one embodiment, after y is obtained, y may be input into a second convolutional layer of the trained convolutional neural network model to perform a convolution operation, so as to obtain the final synthesized face image. The second convolutional layer here may be the same as or different from the first convolutional layer. Alternatively, after y is obtained, y is input into the second convolutional layer for a convolution operation to obtain a second synthesized face image, and a nearest neighbor interpolation operation is then performed according to the second synthesized face image to obtain the final synthesized face image as the synthesized face image of the first person.
It should be noted that, the nearest neighbor interpolation referred to in the embodiment of the present application essentially finds the pixel value of the pixel nearest to the target position (i, j) in the source image as the pixel value of the pixel of the target position in the synthetic face image.
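Under the same assumed PyTorch setting, nearest neighbor upsampling simply copies the value of the closest source pixel:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 128, 16, 16)  # feature map from the first face feature set
# Nearest neighbor interpolation: each output pixel takes the value of the
# source pixel closest to the target position (i, j), as in formula 1.2.
y = F.interpolate(x, scale_factor=2, mode="nearest")
print(y.shape)  # torch.Size([1, 128, 32, 32])
```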
In an embodiment, when the upsampling includes a transposed convolution, the electronic device may perform the transposed convolution according to the first face feature set by using a transposed convolution layer included in the trained convolutional neural network model, so as to obtain a synthesized face image of the first person.
In an embodiment, the electronic device may specifically construct a feature map by using the first face feature set, and perform a transposed convolution on the feature map by using a transposed convolution layer included in the trained convolutional neural network model, so as to obtain a synthesized face image of the first person. The processing procedure of the transposed convolution can be referred to in the following formula:
y_{row,col} = \sum_{r \in J} \sum_{c \in I} k_{r,c}\, x_{r',c'}, \qquad r' = row - r,\; c' = col - c    (formula 1.3)
where y represents the synthesized face image, x represents the feature map and k denotes the convolution kernel of the transposed convolution layer; r and c index the rows and columns of the convolution kernel, and r' and c' index the position in x covered by the sliding window of the kernel; row and col denote the row and column of the output pixel; J and I denote the preset row range and the preset column range of the kernel; positions of x outside the feature map are treated as zero.
In one embodiment, y may be taken as the synthetic face image of the first person.
In one embodiment, the transposed convolution operation may also be performed multiple times, so as to obtain the final synthesized face image as the synthesized face image of the first person. For example, the electronic device may arrange one transposed convolution layer after another, so that the next transposed convolution layer performs a transposed convolution operation on the output of the previous transposed convolution layer, and the final synthesized face image is obtained as the synthesized face image of the first person.
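As a concrete illustration (again under the assumed PyTorch setting), a transposed convolution layer that doubles the spatial resolution of the feature map, followed by a second such layer as in the multi-layer variant just described, might look like this; the kernel size, stride and channel counts are assumptions:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 128, 16, 16)  # feature map from the first face feature set

# One transposed convolution layer: kernel 4, stride 2, padding 1 doubles
# the spatial size (16x16 -> 32x32).
up1 = nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)

# A second transposed convolution layer operating on the output of the
# first one (32x32 -> 64x64), yielding the final synthesized image here.
up2 = nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1)

y = up2(torch.relu(up1(x)))
print(y.shape)  # torch.Size([1, 3, 64, 64])
```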
In one embodiment, the aforementioned trained convolutional neural network model can be obtained by:
1. The electronic device acquires a face image data set, where the face image data set includes at least one group of images, and each group of images in the at least one group of images includes an original face image, at least one face image corresponding to each posture in at least one posture, and at least one face image corresponding to each expression in at least one expression. For example, one group of images in the at least one group may include an original face image of person C, one face image for each of 54 expressions, and one face image for each of 4 postures.
2. The electronic equipment trains the initial convolutional neural network model by using the face image data set to obtain the trained convolutional neural network model.
Specifically, the electronic device trains an initial convolutional neural network model by using the face image data set, and a process of obtaining the trained convolutional neural network model may be as follows:
First, the electronic device performs feature extraction on a target group of images in the at least one group of images to obtain a second face feature set. The target group of images may be any group of the at least one group of images, for example a group randomly selected from the at least one group of images. In one embodiment, the electronic device may perform feature extraction on the target group of images through the initial convolutional neural network model. In one embodiment, the electronic device may specifically perform feature extraction on the target group of images through the convolutional layers included in the initial convolutional neural network model.
Second, the electronic device performs upsampling on the second face feature set by using the initial convolutional neural network model to obtain at least one first synthesized face image. In an embodiment, the electronic device may perform bilinear interpolation according to the second face feature set by using the initial convolutional neural network model to obtain the at least one first synthesized face image; or perform nearest neighbor interpolation according to the second face feature set by using the initial convolutional neural network model to obtain the at least one first synthesized face image; or perform a transposed convolution according to the second face feature set by using a transposed convolution layer included in the initial convolutional neural network model to obtain the at least one first synthesized face image.
Third, the electronic device performs restoration processing on each first synthesized face image in the at least one first synthesized face image through the initial convolutional neural network model to obtain a second synthesized face image corresponding to each first synthesized face image, where the second synthesized face image matches the original face image included in the target group of images, so that a converged convolutional neural network model is obtained as the trained convolutional neural network model. Matching here may mean similar, comparable or identical. In the embodiments of the present application, the electronic device may repeatedly execute the above first to third steps until the model converges, and the converged convolutional neural network model is taken as the trained convolutional neural network model.
In an embodiment, the process by which the electronic device performs restoration processing on each first synthesized face image in the at least one first synthesized face image through the initial convolutional neural network model to obtain the second synthesized face image corresponding to each first synthesized face image may be that the electronic device performs feature extraction on each first synthesized face image and the original face image to obtain a third face feature set, and then performs upsampling on the third face feature set through the initial convolutional neural network model to obtain the second synthesized face image corresponding to each first synthesized face image.
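To make the training procedure concrete, the following minimal training-loop sketch reuses the encoder and decoder sketches above and assumes an L1 reconstruction loss and the Adam optimizer; the application specifies none of these choices, so this is only one plausible reading of the three training steps:

```python
import torch
import torch.nn as nn

model = FaceSynthesizer()  # decoder sketch from above (illustrative)
l1 = nn.L1Loss()
params = (list(model.parameters())
          + list(identity_encoder.parameters())
          + list(posture_encoder.parameters())
          + list(expression_encoder.parameters()))
opt = torch.optim.Adam(params, lr=1e-4)

def train_step(original, posed, expressive):
    # Step 1: feature extraction on the target group of images.
    f_id = identity_encoder(original)
    f_pose = posture_encoder(posed)
    f_expr = expression_encoder(expressive)
    # Step 2: upsample the second face feature set into a first synthesized image.
    first_synth = model(f_id, f_pose, f_expr)
    # Step 3: restore the first synthesized image to a second synthesized image
    # and penalize its mismatch with the original face image, so that the
    # restored image matches the original as the model converges.
    second_synth = model(identity_encoder(first_synth), f_pose, f_expr)
    loss = l1(second_synth, original)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```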
As can be seen, in the embodiment shown in fig. 2, the electronic device may correct the synthesized face image of the first person to obtain a corrected synthesized face image of the first person, and this process makes the output synthesized face image more stable and realistic.
Fig. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure. The apparatus may be applied to the aforementioned electronic device. Specifically, the apparatus may include:
an obtaining module 301, configured to obtain a facial image set, where the facial image set includes a facial image of a first person, a facial image of a specified pose of a second person, and a facial image of a specified expression of a third person.
The processing module 302 is configured to perform feature extraction on each face image in the face image set to obtain a first face feature set, where the first face feature set includes a face feature of the first person, a posture feature of the second person, and an expression feature of the third person.
The processing module 302 is further configured to perform face synthesis according to the first face feature set to obtain a synthesized face image of the first person, where the synthesized face image of the first person has the facial features of the first person, the posture features of the second person, and the expression features of the third person.
In an optional implementation manner, the processing module 302 performs face synthesis according to the first face feature set to obtain a synthesized face image of the first person, specifically, performs upsampling on the first face feature set by using a trained convolutional neural network model to obtain the synthesized face image of the first person, where the upsampling includes any one of: bilinear interpolation, nearest neighbor interpolation, and transposed convolution.
In an optional implementation manner, the upsampling includes a transposed convolution, and the processing module 302 performs upsampling on the first human face feature set by using the trained convolutional neural network model to obtain a synthetic human face image of the first person, specifically, performs transposed convolution on the first human face feature set by using a transposed convolution layer included in the trained convolutional neural network model to obtain the synthetic human face image of the first person.
In an optional implementation manner, the processing module 302 performs a transposed convolution on the first face feature set by using a transposed convolution layer included in the trained convolutional neural network model to obtain a synthesized face image of the first person; specifically, a feature map is constructed by using the first face feature set, and a transposed convolution is performed on the feature map by using the transposed convolution layer included in the trained convolutional neural network model to obtain the synthesized face image of the first person.
In an alternative embodiment, the image processing apparatus further comprises an output module 303.
In an optional implementation manner, the processing module 302 is further configured to, after obtaining the synthetic face image of the first person, perform image detection on the synthetic face image of the first person to obtain an image detection result; and carrying out face correction on the synthesized face image of the first person according to the image detection result to obtain the corrected synthesized face image of the first person.
In an alternative embodiment, the output module 303 is configured to output the corrected synthetic face image of the first person.
In an optional implementation, the processing module 302 is further configured to obtain a facial image dataset, where the facial image dataset includes at least one group of images, and each group of images in the at least one group of images includes an original facial image, at least one facial image corresponding to each pose in at least one pose, and at least one facial image corresponding to each expression in at least one expression; and training an initial convolutional neural network model by using the face image data set to obtain a trained convolutional neural network model.
In an optional implementation manner, the processing module 302 trains an initial convolutional neural network model by using the face image data set to obtain a trained convolutional neural network model; specifically, feature extraction is performed on a target group of images in the at least one group of images to obtain a second face feature set; the initial convolutional neural network model is used to perform upsampling on the second face feature set to obtain at least one first synthesized face image; and restoration processing is performed on each first synthesized face image in the at least one first synthesized face image through the initial convolutional neural network model to obtain a second synthesized face image corresponding to each first synthesized face image, where the second synthesized face image matches the original face image included in the target group of images, so that a converged convolutional neural network model is obtained as the trained convolutional neural network model.
In the embodiment shown in fig. 3, the image processing apparatus may acquire a face image set, where the face image set includes a face image of a first person, a face image of a second person in a specified posture and a face image of a third person with a specified expression. The image processing apparatus may extract features of each face image in the face image set to obtain a first face feature set, and perform face synthesis according to the first face feature set to obtain a synthesized face image of the first person, where the synthesized face image has the face features of the first person, the posture features of the second person and the expression features of the third person.
Please refer to fig. 4, which is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device described in this embodiment may include: one or more processors 1000, one or more input devices 2000, one or more output devices 3000, and memory 4000. The processor 1000, the input device 2000, the output device 3000, and the memory 4000 may be connected by a bus. The input device 2000 and the output device 3000 are optional devices in the electronic device, that is, the electronic device may only include the processor 1000 and the memory 4000. In one embodiment, the input device 2000, the output device 3000 may be a standard wired or wireless communication interface. In an embodiment, the input device 2000 may be a touch screen or a touch display screen, and the output device 3000 may be a display screen or a touch display screen, which is not limited in the embodiments of the present application.
The processor 1000 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 4000 may be a high-speed RAM memory or a non-volatile memory (e.g., a disk memory). The memory 4000 is used to store a set of program codes, and the input device 2000, the output device 3000 and the processor 1000 may call the program codes stored in the memory 4000. Specifically:
a processor 1000, configured to obtain a facial image set, where the facial image set includes a facial image of a first person, a facial image of a second person in a specified posture, and a facial image of a third person in a specified expression; extracting the features of each face image in the face image set to obtain a first face feature set, wherein the first face feature set comprises the face features of the first person, the posture features of the second person and the expression features of the third person; and performing face synthesis according to the first face feature set to obtain a synthesized face image of the first person, wherein the synthesized face image of the first person has the face feature of the first person, the posture feature of the second person and the expression feature of the third person.
In an embodiment, the processor 1000 performs face synthesis according to the first face feature set to obtain a synthesized face image of the first person, specifically, performs upsampling on the first face feature set by using a trained convolutional neural network model to obtain the synthesized face image of the first person, where the upsampling includes any one of: bilinear interpolation, nearest neighbor interpolation, and transposed convolution.
In an embodiment, the upsampling includes a transposed convolution, and the processor 1000 performs upsampling on the first human face feature set by using the trained convolutional neural network model to obtain a synthetic human face image of the first person, specifically, performs transposed convolution on the first human face feature set by using a transposed convolution layer included in the trained convolutional neural network model to obtain a synthetic human face image of the first person.
In an embodiment, the processor 1000 performs a transposed convolution on the first face feature set by using a transposed convolution layer included in the trained convolutional neural network model to obtain a synthesized face image of the first person; specifically, a feature map is constructed by using the first face feature set, and a transposed convolution is performed on the feature map by using the transposed convolution layer included in the trained convolutional neural network model to obtain the synthesized face image of the first person.
In one embodiment, the processor 1000 is further configured to, after obtaining the synthesized face image of the first person, perform image detection on the synthesized face image of the first person to obtain an image detection result; performing face correction on the synthesized face image of the first person according to the image detection result to obtain a corrected synthesized face image of the first person; the corrected synthetic face image of the first person is output through the output device 3000.
In one embodiment, the processor 1000 is further configured to obtain a facial image data set, where the facial image data set includes at least one group of images, and each group of images in the at least one group of images includes an original facial image, at least one facial image corresponding to each pose in at least one pose, and at least one facial image corresponding to each expression in at least one expression; and training an initial convolutional neural network model by using the face image data set to obtain a trained convolutional neural network model.
In an embodiment, the processor 1000 trains an initial convolutional neural network model by using the face image data set to obtain a trained convolutional neural network model; specifically, feature extraction is performed on a target group of images in the at least one group of images to obtain a second face feature set; the initial convolutional neural network model is used to perform upsampling on the second face feature set to obtain at least one first synthesized face image; and restoration processing is performed on each first synthesized face image in the at least one first synthesized face image through the initial convolutional neural network model to obtain a second synthesized face image corresponding to each first synthesized face image, where the second synthesized face image matches the original face image included in the target group of images, so that a converged convolutional neural network model is obtained as the trained convolutional neural network model.
In a specific implementation, the processor 1000, the input device 2000, and the output device 3000 described in this embodiment of the present application may perform the implementations described in the embodiments of fig. 1 and fig. 2, as well as the implementation described in this embodiment of the present application, and details are not repeated here.
The functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of software functional modules.
Those skilled in the art will understand that all or part of the processes of the methods in the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The computer-readable storage medium may be volatile or non-volatile. For example, the computer storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. The computer-readable storage medium may include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to the use of the blockchain node, and the like.
A blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (tamper resistance) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
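As a toy illustration of the cryptographic linking described above (real blockchain platforms add consensus, networking, and storage layers on top), each block can carry the hash of its predecessor:

    import hashlib
    import json
    import time

    def make_block(data, prev_hash):
        block = {"timestamp": time.time(), "data": data, "prev_hash": prev_hash}
        block["hash"] = hashlib.sha256(
            json.dumps(block, sort_keys=True).encode()).hexdigest()
        return block

    genesis = make_block("genesis", "0" * 64)
    block1 = make_block({"record": "stored data digest"}, genesis["hash"])
    assert block1["prev_hash"] == genesis["hash"]  # tampering breaks the chain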
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An image processing method, comprising:
acquiring a face image set, wherein the face image set comprises a face image of a first person, a face image of a second person in a specified posture, and a face image of a third person with a specified expression;
extracting the features of each face image in the face image set to obtain a first face feature set, wherein the first face feature set comprises the face features of the first person, the posture features of the second person and the expression features of the third person;
and performing face synthesis according to the first face feature set to obtain a synthesized face image of the first person, wherein the synthesized face image of the first person has the face feature of the first person, the posture feature of the second person and the expression feature of the third person.
2. The method according to claim 1, wherein the performing face synthesis according to the first face feature set to obtain a synthesized face image of the first person comprises:
upsampling the first face feature set by using the trained convolutional neural network model to obtain a synthesized face image of the first person, wherein the upsampling comprises any one of the following: bilinear interpolation, nearest neighbor interpolation, and transposed convolution.
3. The method of claim 2, wherein the upsampling comprises transposed convolution, and wherein upsampling the first face feature set by using the trained convolutional neural network model to obtain the synthesized face image of the first person comprises:
performing a transposed convolution on the first face feature set by using a transposed convolution layer included in the trained convolutional neural network model to obtain the synthesized face image of the first person.
4. The method according to claim 3, wherein the performing the transposed convolution on the first face feature set by using the transposed convolution layer included in the trained convolutional neural network model to obtain the synthesized face image of the first person comprises:
constructing a feature map by using the first face feature set;
and performing a transposed convolution on the feature map by using the transposed convolution layer included in the trained convolutional neural network model to obtain the synthesized face image of the first person.
5. The method of any of claims 1-4, wherein after obtaining the synthesized face image of the first person, the method further comprises:
carrying out image detection on the synthesized face image of the first person to obtain an image detection result;
performing face correction on the synthesized face image of the first person according to the image detection result to obtain a corrected synthesized face image of the first person;
and outputting the corrected synthesized face image of the first person.
6. The method according to any one of claims 2-4, further comprising:
acquiring a face image data set, wherein the face image data set comprises at least one group of images, and each group of images in the at least one group of images comprises an original face image, at least one face image corresponding to each posture in at least one posture, and at least one face image corresponding to each expression in at least one expression;
and training an initial convolutional neural network model by using the face image data set to obtain a trained convolutional neural network model.
7. The method of claim 6, wherein the training an initial convolutional neural network model using the face image dataset to obtain a trained convolutional neural network model comprises:
performing feature extraction on a target group of images in the at least one group of images to obtain a second face feature set;
utilizing an initial convolutional neural network model to perform upsampling on the second face feature set to obtain at least one first synthesized face image;
and restoring each first synthesized face image in the at least one first synthesized face image through the initial convolutional neural network model to obtain a second synthesized face image corresponding to each first synthesized face image, and matching each second synthesized face image with the original face image included in the target group of images until the convolutional neural network model converges, wherein the converged convolutional neural network model is taken as the trained convolutional neural network model.
8. An image processing apparatus characterized by comprising:
an acquisition module, configured to acquire a face image set, wherein the face image set comprises a face image of a first person, a face image of a second person in a specified posture, and a face image of a third person with a specified expression;
a processing module, configured to extract features of each face image in the face image set to obtain a first face feature set, wherein the first face feature set comprises the face feature of the first person, the posture feature of the second person, and the expression feature of the third person;
the processing module is further configured to perform face synthesis according to the first face feature set to obtain a synthesized face image of the first person, where the synthesized face image of the first person has the face feature of the first person, the posture feature of the second person, and the expression feature of the third person.
9. An electronic device, comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202010710400.7A 2020-07-22 2020-07-22 Image processing method, image processing device, electronic equipment and computer readable storage medium Active CN111833413B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010710400.7A CN111833413B (en) 2020-07-22 2020-07-22 Image processing method, image processing device, electronic equipment and computer readable storage medium
PCT/CN2021/096713 WO2022016996A1 (en) 2020-07-22 2021-05-28 Image processing method, device, electronic apparatus, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010710400.7A CN111833413B (en) 2020-07-22 2020-07-22 Image processing method, image processing device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111833413A true CN111833413A (en) 2020-10-27
CN111833413B CN111833413B (en) 2022-08-26

Family

ID=72924678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010710400.7A Active CN111833413B (en) 2020-07-22 2020-07-22 Image processing method, image processing device, electronic equipment and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN111833413B (en)
WO (1) WO2022016996A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022016996A1 (en) * 2020-07-22 2022-01-27 平安科技(深圳)有限公司 Image processing method, device, electronic apparatus, and computer readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020067362A1 (en) * 1998-11-06 2002-06-06 Agostino Nocera Luciano Pasquale Method and system generating an avatar animation transform using a neutral face image
CN106845330A (en) * 2016-11-17 2017-06-13 北京品恩科技股份有限公司 A kind of training method of the two-dimension human face identification model based on depth convolutional neural networks
CN107680158A (en) * 2017-11-01 2018-02-09 长沙学院 A kind of three-dimensional facial reconstruction method based on convolutional neural networks model
CN109344724A (en) * 2018-09-05 2019-02-15 深圳伯奇科技有限公司 A kind of certificate photo automatic background replacement method, system and server
CN110097606A (en) * 2018-01-29 2019-08-06 微软技术许可有限责任公司 Face synthesis
CN111383307A (en) * 2018-12-29 2020-07-07 上海智臻智能网络科技股份有限公司 Video generation method and device based on portrait and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254336B (en) * 2011-07-14 2013-01-16 清华大学 Method and device for synthesizing face video
US10755145B2 (en) * 2017-07-07 2020-08-25 Carnegie Mellon University 3D spatial transformer network
CN113261013A (en) * 2019-01-18 2021-08-13 斯纳普公司 System and method for realistic head rotation and facial animation synthesis on mobile devices
CN111539903B (en) * 2020-04-16 2023-04-07 北京百度网讯科技有限公司 Method and device for training face image synthesis model
CN111583399B (en) * 2020-06-28 2023-11-07 腾讯科技(深圳)有限公司 Image processing method, device, equipment, medium and electronic equipment
CN111833413B (en) * 2020-07-22 2022-08-26 平安科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium


Also Published As

Publication number Publication date
CN111833413B (en) 2022-08-26
WO2022016996A1 (en) 2022-01-27

Similar Documents

Publication Publication Date Title
CN110555795B (en) High resolution style migration
CN110503703B (en) Method and apparatus for generating image
JP7373554B2 (en) Cross-domain image transformation
JP2023545565A (en) Image detection method, model training method, image detection device, training device, equipment and program
CN110517214B (en) Method and apparatus for generating image
WO2023035531A1 (en) Super-resolution reconstruction method for text image and related device thereof
TW201115252A (en) Document camera with image-associated data searching and displaying function and method applied thereto
US10489839B2 (en) Information presentation method and information presentation apparatus
CN113870104A (en) Super-resolution image reconstruction
US20220189189A1 (en) Method of training cycle generative networks model, and method of building character library
WO2020062191A1 (en) Image processing method, apparatus and device
CN113673519B (en) Character recognition method based on character detection model and related equipment thereof
CN111047509A (en) Image special effect processing method and device and terminal
AU2017383979A1 (en) Projection image construction method and device
CN111833413B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
JP2023541351A (en) Character erasure model training method and device, translation display method and device, electronic device, storage medium, and computer program
CN114742722A (en) Document correction method, device, electronic equipment and storage medium
JP2023545052A (en) Image processing model training method and device, image processing method and device, electronic equipment, and computer program
CN114638375A (en) Video generation model training method, video generation method and device
US20160335771A1 (en) Incremental global non-rigid alignment of three-dimensional scans
WO2016018682A1 (en) Processing image to identify object for insertion into document
CN113610864B (en) Image processing method, device, electronic equipment and computer readable storage medium
WO2022105120A1 (en) Text detection method and apparatus from image, computer device and storage medium
CN112348069A (en) Data enhancement method and device, computer readable storage medium and terminal equipment
CN113362249A (en) Text image synthesis method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant