CN113361387A - Face image fusion method and device, storage medium and electronic equipment - Google Patents
- Publication number: CN113361387A
- Application number: CN202110619880.0A
- Authority: CN (China)
- Prior art keywords: face, model, image, attribute, fusion
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/253 (Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Fusion techniques of extracted features)
- G06N3/045 (Physics; Computing; Computing arrangements based on specific computational models; Neural networks; Architecture; Combinations of networks)
- G06N3/08 (Physics; Computing; Computing arrangements based on specific computational models; Neural networks; Learning methods)
Abstract
The invention provides a face image fusion method and device, a storage medium, and an electronic device. The method comprises the following steps: when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction; performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling; extracting features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused; and inputting the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model. By applying the method provided by the embodiments of the invention, the image fusion effect can be improved even when the face pose and viewing angle of the source image to be fused differ greatly from those of the template image.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a face image fusion method and apparatus, a storage medium, and an electronic device.
Background
With the development of society, science, and technology, the demand for high-quality portrait photography grows by the day. Features such as face beautification, stickers, hairstyle changes, and face swapping in photography and image-processing applications are popular with users. Face swapping fuses a user photo with a material photo so that the fused image keeps the facial appearance of the user photo and the character styling of the material photo, meeting users' entertainment needs and making applications more engaging.
In current image processing technology, fusion of a user face image and a face template image is usually achieved by directly replacing pixels in the face region. This approach is highly sensitive to face pose and viewing angle, and the image fusion effect is poor when the face pose and viewing angle of the user's source image to be fused differ greatly from those of the template image.
Disclosure of Invention
The invention aims to provide a face image fusion method capable of improving the image fusion effect.
The invention further provides a face image fusion apparatus to ensure that the method can be implemented and applied in practice.
A face image fusion method comprises the following steps:
when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction;
performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling;
extracting features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused;
and inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into a pre-constructed feature fusion model to obtain a face fusion image output by the feature fusion model.
Optionally, in the method, the performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling includes:
down-sampling the face template image by using the M-level down-sampling modules in a preset attribute feature extraction model to obtain the original attribute features of the face template image;
and sequentially up-sampling the original attribute features by using the M-level up-sampling modules in the attribute feature extraction model to obtain the attribute features output by each level of up-sampling module, wherein the original attribute features (the output of the M-th-level down-sampling module) are used as the input of the first-level up-sampling module in the attribute feature extraction model, and the output of the i-th-level up-sampling module and the output of the (M-i)-th-level down-sampling module are used as the input of the (i+1)-th-level up-sampling module, where M and i are positive integers and i = 1, ..., M-1.
Optionally, in the above method, the feature fusion model includes M+1 feature fusion modules arranged in sequence; and the inputting the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model includes:
inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
Optionally, the method includes a process of constructing the feature fusion model, including:
acquiring a model to be trained and a training data set, wherein the training data set comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model, and an initial feature fusion model;
sequentially inputting each picture group into the model to be trained, and processing the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extracting features from the face training image to be fused by using the face recognition model to obtain the identity features of the face training image to be fused; fusing the attribute features of the face template training image and the identity features of the face training image to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substituting the image data into a loss function of the model to be trained to obtain a loss value corresponding to the image data; and adjusting network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting a preset training completion condition;
wherein the loss value includes an adversarial loss, an identity loss, an attribute loss, and a reconstruction loss; the adversarial loss represents the loss on the realism of the image data output by the model to be trained; the identity loss represents the loss on the similarity between the image data output by the model to be trained and the identity features of the face training image to be fused; the attribute loss represents the loss on the similarity between the image data output by the model to be trained and the attribute features of the face template training image; and the reconstruction loss represents the pixel-distance loss between the face object contained in the image data output by the model to be trained and the face object contained in the face template training image;
and taking the initial feature fusion model which meets the training completion condition as a feature fusion model.
A face image fusion apparatus, comprising:
a receiving unit, configured to, when a face fusion instruction is received, acquire a face template image and a face image to be fused corresponding to the face fusion instruction;
a first execution unit, configured to perform multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and perform multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling;
a second execution unit, configured to extract features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused;
and a feature fusion unit, configured to input the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model.
Optionally, in the above apparatus, the first execution unit includes:
a first execution subunit, configured to down-sample the face template image by using the M-level down-sampling modules in a preset attribute feature extraction model to obtain the original attribute features of the face template image;
and a second execution subunit, configured to sequentially up-sample the original attribute features by using the M-level up-sampling modules in the attribute feature extraction model to obtain the attribute features output by each level of up-sampling module, wherein the original attribute features (the output of the M-th-level down-sampling module) are used as the input of the first-level up-sampling module in the attribute feature extraction model, and the output of the i-th-level up-sampling module and the output of the (M-i)-th-level down-sampling module are used as the input of the (i+1)-th-level up-sampling module, where M and i are positive integers and i = 1, ..., M-1.
Optionally, in the apparatus described above, the feature fusion model includes M +1 feature fusion modules arranged in sequence; the feature fusion unit includes:
the input subunit is used for inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model so as to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
Optionally, in the above apparatus, the feature fusion unit includes:
an acquisition subunit, configured to acquire a model to be trained and a training data set, wherein the training data set comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model, and an initial feature fusion model;
a training subunit, configured to sequentially input each picture group into the model to be trained, so as to process the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extract features from the face training image to be fused by using the face recognition model to obtain the identity features of the face training image to be fused; fuse the attribute features of the face template training image and the identity features of the face training image to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substitute the image data into a loss function of the model to be trained to obtain a loss value corresponding to the image data; and adjust network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting a preset training completion condition;
wherein the loss value includes an adversarial loss, an identity loss, an attribute loss, and a reconstruction loss; the adversarial loss represents the loss on the realism of the image data output by the model to be trained; the identity loss represents the loss on the similarity between the image data output by the model to be trained and the identity features of the face training image to be fused; the attribute loss represents the loss on the similarity between the image data output by the model to be trained and the attribute features of the face template training image; and the reconstruction loss represents the pixel-distance loss between the face object contained in the image data output by the model to be trained and the face object contained in the face template training image;
and the third execution subunit is used for taking the initial feature fusion model which meets the training completion condition as a feature fusion model.
A storage medium comprising stored instructions, wherein when the instructions are run, a device on which the storage medium is located is controlled to perform the face image fusion method described above.
An electronic device comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the face image fusion method described above.
Compared with the prior art, the invention has the following advantages:
the invention provides a face image fusion method and device, a storage medium and electronic equipment, wherein the method comprises the following steps: when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction; performing multi-level down-sampling on a face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original features to obtain attribute features generated by up-sampling at each level; extracting the characteristics of the facial image to be fused by using a facial recognition model to obtain the identity characteristics of the facial image to be fused; and inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into a pre-constructed feature fusion model to obtain a face fusion image output by the feature fusion model. By applying the method provided by the embodiment of the invention, the attribute characteristics of the face template image and the identity characteristics of the face image to be fused are extracted, and the attribute characteristics and the identity characteristics are fused, so that the fusion effect of the image can be improved under the condition that the difference between the face posture and the visual angle of the source image to be fused and the template image is large. The sense of incongruity of the face part can be reduced. By carrying out multi-stage feature extraction on different feature image sizes and utilizing multi-stage face attribute features, multi-stage fusion of input source face identity features and target face attribute features is carried out under multiple dimensions, finally fused features have stronger expression capability, and the generated face fusion image is more vivid.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of a method for fusing face images according to the present invention;
FIG. 2 is a diagram illustrating an exemplary structure of an attribute feature extraction model according to the present invention;
FIG. 3 is a flow chart of a process for constructing a feature fusion model provided by the present invention;
FIG. 4 is a diagram illustrating a structure of a model to be trained according to the present invention;
FIG. 5 is a schematic structural diagram of a face image fusion apparatus according to the present invention;
FIG. 6 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiment of the invention provides a face image fusion method, which can be applied to an electronic device; the electronic device may be a server. A flowchart of the method is shown in FIG. 1, and the method specifically includes the following steps:
s101: and when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction.
In the embodiment of the invention, when a face fusion instruction is received, the face fusion instruction can be parsed to obtain instruction information, and the face template image and the face image to be fused are obtained from the instruction information.
The face template image may include a first face object, the face image to be fused may include a second face object, and a person to which the first face object belongs and a person to which the second face object belongs may be the same or different.
S102: the method comprises the steps of conducting multilevel down-sampling on a face template image to obtain original attribute features of the face template image, conducting multilevel up-sampling on the original features, and obtaining attribute features generated by sampling on each level.
The original attribute features and the attribute features generated at each level of sampling may include one or more of pose features, background features, scene illumination features, expression features, skin color features, and the like of the face template image.
S103: and extracting the characteristics of the face image to be fused by using a face recognition model to obtain the identity characteristics of the face image to be fused.
In the embodiment of the present invention, the face recognition model may be a trained ArcFace model, which can extract the identity features of a face image for face recognition; that is, the identity features are used to identify the person to whom the face object in the face image belongs.
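For illustration only, this step might be wrapped as in the following sketch, in which the `backbone` argument stands in for any trained ArcFace-style recognition network; the class name, the 112×112 input size, and the 512-dimensional embedding are assumptions rather than details fixed by the embodiment:

```python
import torch
import torch.nn.functional as F

class IdentityExtractor:
    """Wraps a frozen, pre-trained face recognition backbone so that it only
    supplies identity features and never receives gradient updates."""
    def __init__(self, backbone: torch.nn.Module):
        self.backbone = backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # parameters frozen: not trained with the fusion model

    def __call__(self, face: torch.Tensor) -> torch.Tensor:
        # face: (N, 3, 112, 112) aligned crop, as ArcFace-style models expect.
        # Gradients may still flow back to the input image, which the identity
        # loss requires during training even though the backbone is frozen.
        z_id = self.backbone(face)       # (N, 512) identity embedding (assumed size)
        return F.normalize(z_id, dim=1)  # unit-norm identity feature
```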
S104: and inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into a pre-constructed feature fusion model to obtain a face fusion image output by the feature fusion model.
In the embodiment of the present invention, the structure of the feature fusion model may be a residual structure.
The face fusion image may include the identity features of the face image to be fused and the attribute features of the face template image.
By applying the method provided by the embodiment of the invention, the attribute features of the face template image and the identity features of the face image to be fused are extracted and fused, so that the image fusion effect is improved even when the face pose and viewing angle of the source image to be fused differ greatly from those of the template image, and the sense of incongruity in the face region is reduced. By performing multi-level feature extraction at different feature-map sizes and fusing the input source face identity features with the target face attribute features at multiple levels and across multiple dimensions, the final fused features have stronger expressive power, and the generated face fusion image is more realistic.
In this embodiment of the present invention, based on the foregoing implementation process, the performing multi-level down-sampling on the face template image to obtain the original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling specifically includes:
down-sampling the face template image by using the M-level down-sampling modules in a preset attribute feature extraction model to obtain the original attribute features of the face template image;
and sequentially up-sampling the original attribute features by using the M-level up-sampling modules in the attribute feature extraction model to obtain the attribute features output by each level of up-sampling module, wherein the original attribute features (the output of the M-th-level down-sampling module) are used as the input of the first-level up-sampling module in the attribute feature extraction model, and the output of the i-th-level up-sampling module and the output of the (M-i)-th-level down-sampling module are used as the input of the (i+1)-th-level up-sampling module, where M and i are positive integers and i = 1, ..., M-1.
In the method provided by the embodiment of the present invention, the attribute feature extraction model may include M levels of down-sampling modules, a connection layer, and M levels of up-sampling modules. Referring to FIG. 2, an example structural diagram of the attribute feature extraction model provided by the embodiment of the present invention, among the M down-sampling levels, the output of the i-th down-sampling module serves as the input of the (i+1)-th down-sampling module and of the (M-i+1)-th up-sampling module.
Specifically, feature extraction is performed by convolution-layer computation on the input image, and the features are down-sampled to enlarge the receptive field of the model and strengthen the expression of global features. Up-sampling is the process of restoring the feature maps of the image; during restoration, the original feature maps and the up-sampled feature maps are combined through skip connections to integrate feature information.
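A minimal sketch of such an M-level encoder-decoder follows, assuming M = 4, stride-2 convolutions for down-sampling, transposed convolutions for up-sampling, and channel concatenation for the skip connections; all channel widths and normalization choices are illustrative assumptions, not values fixed by the embodiment:

```python
import torch
import torch.nn as nn

def down_block(cin, cout):
    # stride-2 convolution halves the spatial resolution
    return nn.Sequential(nn.Conv2d(cin, cout, 4, 2, 1),
                         nn.InstanceNorm2d(cout), nn.LeakyReLU(0.2))

def up_block(cin, cout):
    # transposed convolution doubles the spatial resolution
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, 2, 1),
                         nn.InstanceNorm2d(cout), nn.LeakyReLU(0.2))

class AttributeExtractor(nn.Module):
    """M-level encoder-decoder: the i-th down-sampling output feeds the
    (i+1)-th down-sampling module and, via concatenation, the (M-i+1)-th
    up-sampling module, as described above."""
    def __init__(self, m: int = 4, base: int = 64):
        super().__init__()
        chs = [base * 2 ** i for i in range(m)]            # e.g. 64, 128, 256, 512
        self.downs = nn.ModuleList(
            down_block(3 if i == 0 else chs[i - 1], chs[i]) for i in range(m))
        ups = [up_block(chs[m - 1], chs[m - 2])]           # first up level: bottleneck only
        for i in range(1, m):
            cin = chs[m - 1 - i] * 2                       # previous up output + skip channels
            cout = chs[m - 2 - i] if i < m - 1 else base
            ups.append(up_block(cin, cout))
        self.ups = nn.ModuleList(ups)

    def forward(self, x):
        skips = []
        for d in self.downs:                               # M levels of down-sampling
            x = d(x)
            skips.append(x)
        feats = [skips[-1]]                                # original attribute features
        h = self.ups[0](skips[-1])
        feats.append(h)                                    # attribute feature of up level 1
        for i in range(1, len(self.ups)):
            h = self.ups[i](torch.cat([h, skips[-1 - i]], dim=1))
            feats.append(h)                                # attribute feature of up level i+1
        return feats                                       # M+1 feature maps in total
```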
In the embodiment of the present invention, based on the foregoing implementation process, the feature fusion model includes M+1 feature fusion modules arranged in sequence; and the inputting the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model includes:
inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
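As an illustrative sketch, the cascade of M+1 feature fusion modules might be wired as follows, assuming a `FeatureFusionBlock` module of the kind sketched after the formulas below; feeding the original attribute feature to the first block as both its feature and attribute input, the nearest-neighbour up-scaling between blocks, and the final tanh RGB head are assumptions, not details fixed by the embodiment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionModel(nn.Module):
    """Cascade of M+1 fusion blocks: block 1 receives the original attribute
    feature and the identity feature; block k+1 receives block k's output,
    the attribute feature from the k-th up-sampling level, and the identity
    feature, as described above."""
    def __init__(self, blocks: nn.ModuleList, out_conv: nn.Module):
        super().__init__()
        self.blocks = blocks                 # M+1 FeatureFusionBlock instances
        self.out_conv = out_conv             # projects the last feature map to RGB

    def forward(self, attrs, z_id):
        # attrs[0]: original attribute feature; attrs[1..M]: per-level up-sampling outputs
        h = self.blocks[0](attrs[0], attrs[0], z_id)   # first block: no previous feature (assumption)
        for blk, a in zip(self.blocks[1:], attrs[1:]):
            h = F.interpolate(h, size=a.shape[-2:], mode="nearest")  # match this level's size
            h = blk(h, a, z_id)
        return torch.tanh(self.out_conv(h))  # fused face image
```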
In the embodiment of the invention, for any feature fusion module k, the attribute feature input to the module (the original attribute feature or a per-level attribute feature) is passed through two independent convolution layers to obtain γ_k^A and β_k^A, and the identity feature input is passed through two independent fully connected layers to obtain γ_k^I and β_k^I. The feature input from the previous level, denoted f_k^in, is instance-normalized to f_k, and f_k is passed through a convolution layer to obtain M_k (the fusion weight, with values between 0 and 1). Combining f_k with the attribute transformation (γ_k^A, β_k^A) and with the identity transformation (γ_k^I, β_k^I) yields A_k and I_k respectively, and the output of the feature fusion module layer is obtained by using M_k to weight A_k and I_k. The calculation of A_k, I_k, and the fused feature can be expressed as:

A_k = γ_k^A ⊗ f_k + β_k^A
I_k = γ_k^I ⊗ f_k + β_k^I
f_out = (1 - M_k) ⊗ A_k + M_k ⊗ I_k

where k denotes the k-th feature fusion module, ⊗ denotes element-wise multiplication of multi-dimensional feature tensors, and f_out denotes the output of the feature fusion computation. The output of a layer of the feature fusion model, like a ResNet residual connection, can be expressed as:

f_k^out = f_k^in + f_out
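A minimal sketch of one such feature fusion module, following the formulas above; the 3×3 kernel sizes and the sigmoid used to keep M_k between 0 and 1 are assumptions:

```python
import torch
import torch.nn as nn

class FeatureFusionBlock(nn.Module):
    """Implements A_k, I_k and the M_k-weighted mix from the formulas above,
    with a ResNet-style residual connection on the output."""
    def __init__(self, feat_ch: int, attr_ch: int, id_dim: int = 512):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch)               # f_in -> f_k
        self.conv_m = nn.Conv2d(feat_ch, feat_ch, 3, 1, 1)   # f_k -> M_k
        self.gamma_a = nn.Conv2d(attr_ch, feat_ch, 3, 1, 1)  # two independent conv layers
        self.beta_a = nn.Conv2d(attr_ch, feat_ch, 3, 1, 1)   #   for the attribute input
        self.gamma_i = nn.Linear(id_dim, feat_ch)            # two independent FC layers
        self.beta_i = nn.Linear(id_dim, feat_ch)             #   for the identity input

    def forward(self, f_in, attr, z_id):
        f_k = self.norm(f_in)                                # instance normalization
        m_k = torch.sigmoid(self.conv_m(f_k))                # fusion weight in (0, 1)
        a_k = self.gamma_a(attr) * f_k + self.beta_a(attr)   # A_k = gamma_A ⊗ f_k + beta_A
        g_i = self.gamma_i(z_id)[:, :, None, None]           # broadcast over H and W
        b_i = self.beta_i(z_id)[:, :, None, None]
        i_k = g_i * f_k + b_i                                # I_k = gamma_I ⊗ f_k + beta_I
        f_out = (1 - m_k) * a_k + m_k * i_k                  # M_k-weighted fusion
        return f_in + f_out                                  # residual connection
```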
in the embodiment of the present invention, based on the implementation process, specifically, the process of constructing the feature fusion model, as shown in fig. 3, specifically includes:
s301: acquiring a model to be trained and a training data set, wherein the training data comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model and an initial feature fusion model.
In the embodiment of the present invention, as shown in FIG. 4, the structure of the initial attribute feature extraction model is consistent with that of the U-net model and includes three important parts: down-sampling, up-sampling, and skip connections. First, feature extraction is performed on the input image by convolution-layer computation and the features are down-sampled; down-sampling enlarges the receptive field of the model to strengthen the expression of global features. Up-sampling is the process of restoring the feature maps of the image; during restoration, the original feature maps and the up-sampled feature maps are combined through skip connections to integrate feature information, and the attribute features of the first face, namely the different feature maps obtained during up-sampling, serve as the multi-level attribute feature representation of the face attributes.
Optionally, the ArcFace face recognition model can extract features from a face picture for face recognition, and the extracted face features represent the identity features of the face. Because a pre-trained ArcFace model is used, the extraction of identity features does not participate in model training in this method.
Specifically, the network structure of the feature fusion module is similar in design to the residual module of the ResNet network model. The feature fusion module layer has three inputs: attribute feature information, identity feature information, and the feature information output by the previous feature fusion module. In the fusion computation, the attribute feature input is passed through two independent convolution layers to obtain γ_k^A and β_k^A, and the identity feature input is passed through two independent fully connected layers to obtain γ_k^I and β_k^I; the previous-level feature input f_k^in is instance-normalized to f_k, and f_k is passed through a convolution layer to obtain M_k. Combining f_k with the attribute transformation and the identity transformation yields A_k and I_k respectively, and the output of the feature fusion module layer is obtained by using M_k to weight A_k and I_k, as in the formulas given above.
S302: sequentially inputting each picture group into the model to be trained, and processing the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extracting features from the face training image to be fused by using the face recognition model to obtain the identity features of the face training image to be fused; fusing the attribute features of the face template training image and the identity features of the face training image to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substituting the image data into the loss function of the model to be trained to obtain a loss value corresponding to the image data; and adjusting the network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting the preset training completion condition.
The loss value includes an adversarial loss, an identity loss, an attribute loss, and a reconstruction loss. The adversarial loss represents the loss on the realism of the image data output by the model to be trained; the identity loss represents the loss on the similarity between the image data output by the model to be trained and the identity features of the face training image to be fused; the attribute loss represents the loss on the similarity between the image data output by the model to be trained and the attribute features of the face template training image; and the reconstruction loss represents the pixel-distance loss between the face object contained in the image data output by the model to be trained and the face object contained in the face template training image.
In an embodiment of the invention, the loss function may be as follows:

loss = l_adv + λ1·l_id + λ2·l_attr + λ3·l_rec

where l_adv is the adversarial loss term, l_id the identity loss term, l_attr the attribute loss term, and l_rec the reconstruction loss term. The identity loss term is expressed as:

l_id = 1 - cos(Z_id(X_s), Z_id(Y'_s,t))

where Y'_s,t is the image data output by the feature fusion model, X_t is the face template training image, X_s is the face training image to be fused, and Z_id(·) denotes the identity features extracted by the face recognition model.
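The following sketch assembles the four loss terms for a generator-side update; only l_id follows the cosine expression given above, while the hinge-style form of l_adv, the L2 form of l_attr over the multi-level attribute features, the L1 form of l_rec, and the λ weights of 10 are illustrative assumptions:

```python
import torch.nn.functional as F

def fusion_loss(d_fake, z_id_src, z_id_out, attrs_tgt, attrs_out, y_out, x_tgt,
                lam1=10.0, lam2=10.0, lam3=10.0):
    # adversarial loss: realism of the generated image (generator side)
    l_adv = -d_fake.mean()
    # identity loss: l_id = 1 - cos(Z_id(X_s), Z_id(Y'_s,t))
    l_id = 1 - F.cosine_similarity(z_id_src, z_id_out, dim=1).mean()
    # attribute loss: similarity of output and template attribute features
    l_attr = sum(F.mse_loss(ao, at) for ao, at in zip(attrs_out, attrs_tgt))
    # reconstruction loss: pixel distance between output face and template face
    l_rec = F.l1_loss(y_out, x_tgt)
    return l_adv + lam1 * l_id + lam2 * l_attr + lam3 * l_rec
```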
S303: and taking the initial feature fusion model which meets the training completion condition as a feature fusion model.
In an embodiment of the present invention, the training completion condition may be convergence of the loss function.
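Putting the pieces together, one training iteration over a picture group might look like the following sketch; `fusion_loss` is the illustrative helper above, `discriminator` is an assumed adversarial network not detailed in this description, and training repeats until the loss function converges:

```python
import torch

def train_step(attr_model, id_extractor, fusion_model, discriminator, opt, batch):
    x_t, x_s = batch["template"], batch["source"]   # one picture group
    attrs = attr_model(x_t)                         # multi-level attribute features
    z_id = id_extractor(x_s)                        # identity features (frozen extractor)
    y_out = fusion_model(attrs, z_id)               # fused face image
    loss = fusion_loss(discriminator(y_out),        # four loss terms, as sketched above
                       z_id, id_extractor(y_out),
                       attrs, attr_model(y_out),
                       y_out, x_t)
    opt.zero_grad()
    loss.backward()                                 # updates attr_model and fusion_model only
    opt.step()
    return loss.item()
```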
Corresponding to the method shown in FIG. 1, an embodiment of the present invention further provides a face image fusion apparatus for implementing that method. The face image fusion apparatus may be applied to an electronic device, and its schematic structural diagram is shown in FIG. 5. The apparatus specifically includes:
the receiving unit 501 is configured to, when a face fusion instruction is received, obtain a face template image and a face image to be fused corresponding to the face fusion instruction;
the first execution unit 502 is configured to perform multi-level down-sampling on a face template image to obtain an original attribute feature of the face template image, and perform multi-level up-sampling on the original feature to obtain an attribute feature generated by each level of up-sampling;
a second executing unit 503, configured to perform characteristic extraction on the to-be-fused face image by using a face recognition model, so as to obtain an identity feature of the to-be-fused face image;
the feature fusion unit 504 is configured to input the original attribute features, the attribute features generated by sampling at each level, and the identity features into a pre-constructed feature fusion model, so as to obtain a face fusion image output by the feature fusion model.
In this embodiment of the present invention, based on the above scheme, specifically, the first execution unit 502 includes:
the first execution subunit is used for utilizing an M-level down-sampling module in a preset attribute feature extraction model to down-sample the face model image to obtain the original attribute feature of the face template image;
and the second execution subunit is used for sequentially performing upsampling on the original attribute features by using M-level upsampling modules in the attribute feature extraction module to obtain the attribute features output by each level of upsampling modules, wherein the original attribute features and the output of the M-level downsampling module are used as the input of a first-level upsampling module in the attribute feature extraction model, the output of an i-level upsampling module and the output of an M-i-level downsampling module are used as the input of an i + 1-level upsampling module, M and i are positive integers, i is 1,.
In the embodiment of the present invention, based on the above scheme, specifically, the feature fusion model includes M +1 feature fusion modules arranged in sequence; the feature fusion unit 504 includes:
the input subunit is used for inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model so as to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
In an embodiment of the present invention, based on the above scheme, specifically, the feature fusion unit 504 includes:
the system comprises an acquisition subunit, a fusion processing subunit and a fusion processing subunit, wherein the acquisition subunit is used for acquiring a model to be trained and a training data set, the training data comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model and an initial feature fusion model;
the training subunit is used for sequentially inputting each picture group into the model to be trained so as to process the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extracting the characteristics of the facial image to be fused by using a facial recognition model to obtain the identity characteristics of the facial image to be fused; fusing the attribute features of the face model training images and the attribute features of the face images to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substituting the image data into a loss function of the model to be trained to obtain a loss value corresponding to the image data; adjusting network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting preset training completion conditions;
wherein the loss values include a countermeasure loss, an identity loss, an attribute loss, and a reconstruction loss; the countermeasure loss represents the loss of trueness of the image data output by the model to be trained; the identity loss represents the loss of the similarity degree of the image data output by the model to be trained and the identity characteristics of the face training image to be fused; the attribute loss represents the loss of the similarity degree of the initial image data of the model to be trained and the attribute characteristics of the face template image; reconstructing loss to represent pixel distance loss between a face object contained in image data output by a model to be trained and a face object contained in the face template training image;
and the third execution subunit is used for taking the initial feature fusion model which meets the training completion condition as a feature fusion model.
The specific principle and the implementation process of each unit and each module in the facial image fusion device disclosed in the embodiment of the present invention are the same as those of the facial image fusion method disclosed in the embodiment of the present invention, and reference may be made to corresponding parts in the facial image fusion method provided in the embodiment of the present invention, which are not described herein again.
The embodiment of the invention further provides a storage medium comprising stored instructions, wherein when the instructions are run, a device on which the storage medium is located is controlled to perform the face image fusion method described above.
An embodiment of the present invention further provides an electronic device, whose structural diagram is shown in FIG. 6. The electronic device specifically includes a memory 601 and one or more instructions 602, where the one or more instructions 602 are stored in the memory 601 and configured to be executed by one or more processors 603 to perform the following operations:
when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction;
performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling;
extracting features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused;
and inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into a pre-constructed feature fusion model to obtain a face fusion image output by the feature fusion model.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The face image fusion method provided by the present invention is described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, for those of ordinary skill in the art, there may be changes in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A face image fusion method is characterized by comprising the following steps:
when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction;
performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling;
extracting features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused;
and inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into a pre-constructed feature fusion model to obtain a face fusion image output by the feature fusion model.
2. The method of claim 1, wherein the performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling comprises:
down-sampling the face template image by using the M-level down-sampling modules in a preset attribute feature extraction model to obtain the original attribute features of the face template image;
and sequentially up-sampling the original attribute features by using the M-level up-sampling modules in the attribute feature extraction model to obtain the attribute features output by each level of up-sampling module, wherein the original attribute features (the output of the M-th-level down-sampling module) are used as the input of the first-level up-sampling module in the attribute feature extraction model, and the output of the i-th-level up-sampling module and the output of the (M-i)-th-level down-sampling module are used as the input of the (i+1)-th-level up-sampling module, where M and i are positive integers and i = 1, ..., M-1.
3. The method of claim 2, wherein the feature fusion model comprises M+1 feature fusion modules arranged in sequence; and the inputting the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model comprises:
inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
4. The method of claim 1, wherein the process of constructing a feature fusion model comprises:
acquiring a model to be trained and a training data set, wherein the training data set comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model, and an initial feature fusion model;
sequentially inputting each picture group into the model to be trained, and processing the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extracting features from the face training image to be fused by using the face recognition model to obtain the identity features of the face training image to be fused; fusing the attribute features of the face template training image and the identity features of the face training image to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substituting the image data into a loss function of the model to be trained to obtain a loss value corresponding to the image data; and adjusting network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting a preset training completion condition;
wherein the loss value includes an adversarial loss, an identity loss, an attribute loss, and a reconstruction loss; the adversarial loss represents the loss on the realism of the image data output by the model to be trained; the identity loss represents the loss on the similarity between the image data output by the model to be trained and the identity features of the face training image to be fused; the attribute loss represents the loss on the similarity between the image data output by the model to be trained and the attribute features of the face template training image; and the reconstruction loss represents the pixel-distance loss between the face object contained in the image data output by the model to be trained and the face object contained in the face template training image;
and taking the initial feature fusion model which meets the training completion condition as a feature fusion model.
5. A face image fusion device is characterized by comprising:
a receiving unit, configured to, when a face fusion instruction is received, acquire a face template image and a face image to be fused corresponding to the face fusion instruction;
a first execution unit, configured to perform multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and perform multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling;
a second execution unit, configured to extract features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused;
and a feature fusion unit, configured to input the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model.
6. The apparatus of claim 5, wherein the first execution unit comprises:
a first execution subunit, configured to down-sample the face template image by using the M-level down-sampling modules in a preset attribute feature extraction model to obtain the original attribute features of the face template image;
and a second execution subunit, configured to sequentially up-sample the original attribute features by using the M-level up-sampling modules in the attribute feature extraction model to obtain the attribute features output by each level of up-sampling module, wherein the original attribute features (the output of the M-th-level down-sampling module) are used as the input of the first-level up-sampling module in the attribute feature extraction model, and the output of the i-th-level up-sampling module and the output of the (M-i)-th-level down-sampling module are used as the input of the (i+1)-th-level up-sampling module, where M and i are positive integers and i = 1, ..., M-1.
7. The apparatus of claim 6, wherein the feature fusion model comprises M +1 feature fusion modules arranged in sequence; the feature fusion unit includes:
the input subunit is used for inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model so as to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
8. The apparatus of claim 5, wherein the feature fusion unit comprises:
the system comprises an acquisition subunit, a fusion processing subunit and a fusion processing subunit, wherein the acquisition subunit is used for acquiring a model to be trained and a training data set, the training data comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model and an initial feature fusion model;
the training subunit is used for sequentially inputting each picture group into the model to be trained so as to process the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extracting the characteristics of the facial image to be fused by using a facial recognition model to obtain the identity characteristics of the facial image to be fused; fusing the attribute features of the face model training images and the attribute features of the face images to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substituting the image data into a loss function of the model to be trained to obtain a loss value corresponding to the image data; adjusting network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting preset training completion conditions;
wherein the loss values include a countermeasure loss, an identity loss, an attribute loss, and a reconstruction loss; the countermeasure loss represents the loss of trueness of the image data output by the model to be trained; the identity loss represents the loss of the similarity degree of the image data output by the model to be trained and the identity characteristics of the face training image to be fused; the attribute loss represents the loss of the similarity degree of the initial image data of the model to be trained and the attribute characteristics of the face template image; reconstructing loss to represent pixel distance loss between a face object contained in image data output by a model to be trained and a face object contained in the face template training image;
and the third execution subunit is used for taking the initial feature fusion model meeting the training completion condition as the feature fusion model.
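The four loss terms of claim 8 could be assembled as below. The weights, the cosine form of the identity loss, and restricting the reconstruction term to pairs where the template and the image to be fused share an identity are conventions borrowed from the face-swapping literature and are assumptions here, not specifics of this application:

```python
import torch.nn.functional as F

def fusion_loss(fused, template, id_source_feat, disc, id_net, attr_net,
                same_id,                 # (B,) float: 1.0 where template and source share identity
                w_adv=1.0, w_id=10.0, w_attr=10.0, w_rec=10.0):
    # Adversarial loss: realism of the fused image, scored by a discriminator
    # (a hinge or least-squares form would be equally plausible).
    adv = -disc(fused).mean()
    # Identity loss: cosine distance between the fused image's identity features
    # and those of the face image to be fused.
    id_fused = F.normalize(id_net(fused), dim=1)
    ident = (1.0 - (id_fused * F.normalize(id_source_feat, dim=1)).sum(dim=1)).mean()
    # Attribute loss: feature distance between the fused image and the template,
    # using the (original, per-level attrs) signature of the extractor sketched earlier.
    orig_f, attrs_f = attr_net(fused)
    orig_t, attrs_t = attr_net(template)
    attr = sum(F.mse_loss(f, t) for f, t in zip([orig_f] + attrs_f, [orig_t] + attrs_t))
    # Reconstruction loss: pixel distance between the fused and template faces.
    rec = (same_id[:, None, None, None] * (fused - template).abs()).mean()
    return w_adv * adv + w_id * ident + w_attr * attr + w_rec * rec
```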
9. A storage medium, characterized in that the storage medium comprises stored instructions, wherein when the instructions run, a device on which the storage medium is located is controlled to execute the face image fusion method according to any one of claims 1 to 4.
10. An electronic device, comprising a memory, one or more processors, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by the one or more processors to perform the face image fusion method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110619880.0A | 2021-06-03 | 2021-06-03 | Face image fusion method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113361387A (en) | 2021-09-07 |
Family
ID=77531687
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110619880.0A (pending) | Face image fusion method and device, storage medium and electronic equipment | 2021-06-03 | 2021-06-03 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113361387A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111369646A (en) * | 2020-03-09 | 2020-07-03 | 南京理工大学 | Expression synthesis method integrating attention mechanism |
CN111652827A (en) * | 2020-04-24 | 2020-09-11 | 山东大学 | Front face synthesis method and system based on generation countermeasure network |
CN111783647A (en) * | 2020-06-30 | 2020-10-16 | 北京百度网讯科技有限公司 | Training method of face fusion model, face fusion method, device and equipment |
CN112364828A (en) * | 2020-11-30 | 2021-02-12 | 姜召英 | Face recognition method and financial system |
CN112560753A (en) * | 2020-12-23 | 2021-03-26 | 平安银行股份有限公司 | Face recognition method, device and equipment based on feature fusion and storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023036239A1 (en) * | 2021-09-10 | 2023-03-16 | 北京字跳网络技术有限公司 | Human face fusion method and apparatus, device and storage medium |
WO2023050868A1 (en) * | 2021-09-30 | 2023-04-06 | 北京百度网讯科技有限公司 | Method and apparatus for training fusion model, image fusion method and apparatus, and device and medium |
CN114140320A (en) * | 2021-12-09 | 2022-03-04 | 北京百度网讯科技有限公司 | Image migration method and training method and device of image migration model |
CN114140320B (en) * | 2021-12-09 | 2023-09-01 | 北京百度网讯科技有限公司 | Image migration method and training method and device of image migration model |
Similar Documents
Publication | Title |
---|---|
CN110473141B (en) | Image processing method, device, storage medium and electronic equipment |
Zeng et al. | Aggregated contextual transformations for high-resolution image inpainting |
Zhu et al. | A deep collaborative framework for face photo–sketch synthesis |
CN113361387A (en) | Face image fusion method and device, storage medium and electronic equipment |
CN111553267B (en) | Image processing method, image processing model training method and device |
CN109308725B (en) | System for generating mobile terminal table sentiment picture |
CN111680550B (en) | Emotion information identification method and device, storage medium and computer equipment |
CN111833360B (en) | Image processing method, device, equipment and computer readable storage medium |
CN111292262B (en) | Image processing method, device, electronic equipment and storage medium |
CN112837215B (en) | Image shape transformation method based on generation countermeasure network |
CN117033609B (en) | Text visual question-answering method, device, computer equipment and storage medium |
CN111489405B (en) | Face sketch synthesis system for generating confrontation network based on condition enhancement |
CN111108508B (en) | Face emotion recognition method, intelligent device and computer readable storage medium |
CN111046738B (en) | Precision improvement method of light u-net for finger vein segmentation |
CN111127309A (en) | Portrait style transfer model training method, portrait style transfer method and device |
CN113935435A (en) | Multi-modal emotion recognition method based on space-time feature fusion |
WO2021217919A1 (en) | Facial action unit recognition method and apparatus, and electronic device, and storage medium |
CN116958324A (en) | Training method, device, equipment and storage medium of image generation model |
CN115222578A (en) | Image style migration method, program product, storage medium, and electronic device |
CN116188912A (en) | Training method, device, medium and equipment for image synthesis model of theme image |
CN116681960A (en) | Intelligent mesoscale vortex identification method and system based on K8s |
CN116701706B (en) | Data processing method, device, equipment and medium based on artificial intelligence |
CN113822114A (en) | Image processing method, related equipment and computer readable storage medium |
CN111311732B (en) | 3D human body grid acquisition method and device |
CN117094362A (en) | Task processing method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210907 |