CN113361387A - Face image fusion method and device, storage medium and electronic equipment - Google Patents
- Publication number: CN113361387A
- Application number: CN202110619880.0A
- Authority: CN (China)
- Prior art keywords: face, model, image, attribute, fusion
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/253 (Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Fusion techniques of extracted features)
- G06N3/045 (Physics; Computing; Computing arrangements based on specific computational models; Neural networks; Architecture; Combinations of networks)
- G06N3/08 (Physics; Computing; Computing arrangements based on specific computational models; Neural networks; Learning methods)
Abstract
The invention provides a face image fusion method and device, a storage medium, and an electronic device. The method comprises the following steps: when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction; performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling; extracting features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused; and inputting the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model. By applying the method provided by the embodiments of the invention, the image fusion effect can be improved even when the face pose and viewing angle of the source image to be fused differ greatly from those of the template image.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a face image fusion method and apparatus, a storage medium, and an electronic device.
Background
With the development of society, science, and technology, the demand for high-quality portrait photography grows by the day. Features such as face beautification, stickers, hairstyle changes, and face swapping in photography and image-processing applications are popular with users. Face swapping fuses a user photo with a material photo so that the fused image keeps the facial appearance of the user photo and the character styling of the material photo, meeting users' entertainment needs and making applications more engaging.
In current image processing technology, fusion of a user face image and a face template image is usually achieved by directly replacing pixels in the face region. This approach is highly sensitive to face pose and viewing angle, and the image fusion effect is poor when the face pose and viewing angle of the user's source image to be fused differ greatly from those of the template image.
Disclosure of Invention
The invention aims to provide a face image fusion method capable of improving the image fusion effect.
The invention further provides a face image fusion apparatus to ensure that the method can be implemented and applied in practice.
A face image fusion method comprises the following steps:
when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction;
performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling;
extracting features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused;
and inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into a pre-constructed feature fusion model to obtain a face fusion image output by the feature fusion model.
Optionally, in the method, the performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling includes:
down-sampling the face template image by using the M-level down-sampling modules in a preset attribute feature extraction model to obtain the original attribute features of the face template image;
and sequentially up-sampling the original attribute features by using the M-level up-sampling modules in the attribute feature extraction model to obtain the attribute features output by each level of up-sampling module, wherein the original attribute features (the output of the M-th-level down-sampling module) are used as the input of the first-level up-sampling module in the attribute feature extraction model, and the output of the i-th-level up-sampling module and the output of the (M-i)-th-level down-sampling module are used as the input of the (i+1)-th-level up-sampling module, where M and i are positive integers and i = 1, ..., M-1.
Optionally, in the above method, the feature fusion model includes M+1 feature fusion modules arranged in sequence; and the inputting the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model includes:
inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
Optionally, the method includes a process of constructing the feature fusion model, including:
acquiring a model to be trained and a training data set, wherein the training data set comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model, and an initial feature fusion model;
sequentially inputting each picture group into the model to be trained, and processing the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extracting features from the face training image to be fused by using the face recognition model to obtain the identity features of the face training image to be fused; fusing the attribute features of the face template training image and the identity features of the face training image to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substituting the image data into a loss function of the model to be trained to obtain a loss value corresponding to the image data; and adjusting network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting a preset training completion condition;
wherein the loss value includes an adversarial loss, an identity loss, an attribute loss, and a reconstruction loss; the adversarial loss represents the loss on the realism of the image data output by the model to be trained; the identity loss represents the loss on the similarity between the image data output by the model to be trained and the identity features of the face training image to be fused; the attribute loss represents the loss on the similarity between the image data output by the model to be trained and the attribute features of the face template training image; and the reconstruction loss represents the pixel-distance loss between the face object contained in the image data output by the model to be trained and the face object contained in the face template training image;
and taking the initial feature fusion model which meets the training completion condition as a feature fusion model.
A face image fusion apparatus, comprising:
a receiving unit, configured to, when a face fusion instruction is received, acquire a face template image and a face image to be fused corresponding to the face fusion instruction;
a first execution unit, configured to perform multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and perform multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling;
a second execution unit, configured to extract features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused;
and a feature fusion unit, configured to input the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model.
Optionally, in the above apparatus, the first execution unit includes:
a first execution subunit, configured to down-sample the face template image by using the M-level down-sampling modules in a preset attribute feature extraction model to obtain the original attribute features of the face template image;
and a second execution subunit, configured to sequentially up-sample the original attribute features by using the M-level up-sampling modules in the attribute feature extraction model to obtain the attribute features output by each level of up-sampling module, wherein the original attribute features (the output of the M-th-level down-sampling module) are used as the input of the first-level up-sampling module in the attribute feature extraction model, and the output of the i-th-level up-sampling module and the output of the (M-i)-th-level down-sampling module are used as the input of the (i+1)-th-level up-sampling module, where M and i are positive integers and i = 1, ..., M-1.
Optionally, in the apparatus described above, the feature fusion model includes M +1 feature fusion modules arranged in sequence; the feature fusion unit includes:
the input subunit is used for inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model so as to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
Optionally, in the above apparatus, the feature fusion unit includes:
an acquisition subunit, configured to acquire a model to be trained and a training data set, wherein the training data set comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model, and an initial feature fusion model;
a training subunit, configured to sequentially input each picture group into the model to be trained, so as to process the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extract features from the face training image to be fused by using the face recognition model to obtain the identity features of the face training image to be fused; fuse the attribute features of the face template training image and the identity features of the face training image to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substitute the image data into a loss function of the model to be trained to obtain a loss value corresponding to the image data; and adjust network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting a preset training completion condition;
wherein the loss value includes an adversarial loss, an identity loss, an attribute loss, and a reconstruction loss; the adversarial loss represents the loss on the realism of the image data output by the model to be trained; the identity loss represents the loss on the similarity between the image data output by the model to be trained and the identity features of the face training image to be fused; the attribute loss represents the loss on the similarity between the image data output by the model to be trained and the attribute features of the face template training image; and the reconstruction loss represents the pixel-distance loss between the face object contained in the image data output by the model to be trained and the face object contained in the face template training image;
and the third execution subunit is used for taking the initial feature fusion model which meets the training completion condition as a feature fusion model.
A storage medium comprising stored instructions, wherein when the instructions are run, a device on which the storage medium is located is controlled to perform the face image fusion method described above.
An electronic device comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the face image fusion method described above.
Compared with the prior art, the invention has the following advantages:
the invention provides a face image fusion method and device, a storage medium and electronic equipment, wherein the method comprises the following steps: when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction; performing multi-level down-sampling on a face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original features to obtain attribute features generated by up-sampling at each level; extracting the characteristics of the facial image to be fused by using a facial recognition model to obtain the identity characteristics of the facial image to be fused; and inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into a pre-constructed feature fusion model to obtain a face fusion image output by the feature fusion model. By applying the method provided by the embodiment of the invention, the attribute characteristics of the face template image and the identity characteristics of the face image to be fused are extracted, and the attribute characteristics and the identity characteristics are fused, so that the fusion effect of the image can be improved under the condition that the difference between the face posture and the visual angle of the source image to be fused and the template image is large. The sense of incongruity of the face part can be reduced. By carrying out multi-stage feature extraction on different feature image sizes and utilizing multi-stage face attribute features, multi-stage fusion of input source face identity features and target face attribute features is carried out under multiple dimensions, finally fused features have stronger expression capability, and the generated face fusion image is more vivid.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of a method for fusing face images according to the present invention;
FIG. 2 is a diagram illustrating an exemplary structure of an attribute feature extraction model according to the present invention;
FIG. 3 is a flow chart of a process for constructing a feature fusion model provided by the present invention;
FIG. 4 is a diagram illustrating a structure of a model to be trained according to the present invention;
FIG. 5 is a schematic structural diagram of a face image fusion apparatus according to the present invention;
FIG. 6 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiment of the invention provides a face image fusion method, which can be applied to an electronic device; the electronic device may be a server. A flowchart of the method is shown in FIG. 1, and the method specifically includes the following steps:
s101: and when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction.
In the embodiment of the invention, when a face fusion instruction is received, the face fusion instruction can be parsed to obtain instruction information, and the face template image and the face image to be fused are obtained from the instruction information.
The face template image may include a first face object, the face image to be fused may include a second face object, and a person to which the first face object belongs and a person to which the second face object belongs may be the same or different.
S102: the method comprises the steps of conducting multilevel down-sampling on a face template image to obtain original attribute features of the face template image, conducting multilevel up-sampling on the original features, and obtaining attribute features generated by sampling on each level.
The original attribute features and the attribute features generated at each level of sampling may include one or more of pose features, background features, scene illumination features, expression features, skin color features, and the like of the face template image.
S103: and extracting the characteristics of the face image to be fused by using a face recognition model to obtain the identity characteristics of the face image to be fused.
In the embodiment of the present invention, the face recognition model may be a trained ArcFace model, which can extract the identity features of a face image for face recognition; that is, the identity features are used to identify the person to whom the face object in the face image belongs.
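For illustration only, this step might be wrapped as in the following sketch, in which the `backbone` argument stands in for any trained ArcFace-style recognition network; the class name, the 112×112 input size, and the 512-dimensional embedding are assumptions rather than details fixed by the embodiment:

```python
import torch
import torch.nn.functional as F

class IdentityExtractor:
    """Wraps a frozen, pre-trained face recognition backbone so that it only
    supplies identity features and never receives gradient updates."""
    def __init__(self, backbone: torch.nn.Module):
        self.backbone = backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # parameters frozen: not trained with the fusion model

    def __call__(self, face: torch.Tensor) -> torch.Tensor:
        # face: (N, 3, 112, 112) aligned crop, as ArcFace-style models expect.
        # Gradients may still flow back to the input image, which the identity
        # loss requires during training even though the backbone is frozen.
        z_id = self.backbone(face)       # (N, 512) identity embedding (assumed size)
        return F.normalize(z_id, dim=1)  # unit-norm identity feature
```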
S104: and inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into a pre-constructed feature fusion model to obtain a face fusion image output by the feature fusion model.
In the embodiment of the present invention, the structure of the feature fusion model may be a residual structure.
The face fusion image may include the identity features of the face image to be fused and the attribute features of the face template image.
By applying the method provided by the embodiment of the invention, the attribute features of the face template image and the identity features of the face image to be fused are extracted and fused, so that the image fusion effect is improved even when the face pose and viewing angle of the source image to be fused differ greatly from those of the template image, and the sense of incongruity in the face region is reduced. By performing multi-level feature extraction at different feature-map sizes and fusing the input source face identity features with the target face attribute features at multiple levels and across multiple dimensions, the final fused features have stronger expressive power, and the generated face fusion image is more realistic.
In this embodiment of the present invention, based on the foregoing implementation process, the performing multi-level down-sampling on the face template image to obtain the original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling specifically includes:
down-sampling the face template image by using the M-level down-sampling modules in a preset attribute feature extraction model to obtain the original attribute features of the face template image;
and sequentially up-sampling the original attribute features by using the M-level up-sampling modules in the attribute feature extraction model to obtain the attribute features output by each level of up-sampling module, wherein the original attribute features (the output of the M-th-level down-sampling module) are used as the input of the first-level up-sampling module in the attribute feature extraction model, and the output of the i-th-level up-sampling module and the output of the (M-i)-th-level down-sampling module are used as the input of the (i+1)-th-level up-sampling module, where M and i are positive integers and i = 1, ..., M-1.
In the method provided by the embodiment of the present invention, the attribute feature extraction model may include M levels of down-sampling modules, a connection layer, and M levels of up-sampling modules. Referring to FIG. 2, an example structural diagram of the attribute feature extraction model provided by the embodiment of the present invention, among the M down-sampling levels, the output of the i-th down-sampling module serves as the input of the (i+1)-th down-sampling module and of the (M-i+1)-th up-sampling module.
Specifically, feature extraction is performed by convolution-layer computation on the input image, and the features are down-sampled to enlarge the receptive field of the model and strengthen the expression of global features. Up-sampling is the process of restoring the feature maps of the image; during restoration, the original feature maps and the up-sampled feature maps are combined through skip connections to integrate feature information.
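A minimal sketch of such an M-level encoder-decoder follows, assuming M = 4, stride-2 convolutions for down-sampling, transposed convolutions for up-sampling, and channel concatenation for the skip connections; all channel widths and normalization choices are illustrative assumptions, not values fixed by the embodiment:

```python
import torch
import torch.nn as nn

def down_block(cin, cout):
    # stride-2 convolution halves the spatial resolution
    return nn.Sequential(nn.Conv2d(cin, cout, 4, 2, 1),
                         nn.InstanceNorm2d(cout), nn.LeakyReLU(0.2))

def up_block(cin, cout):
    # transposed convolution doubles the spatial resolution
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, 2, 1),
                         nn.InstanceNorm2d(cout), nn.LeakyReLU(0.2))

class AttributeExtractor(nn.Module):
    """M-level encoder-decoder: the i-th down-sampling output feeds the
    (i+1)-th down-sampling module and, via concatenation, the (M-i+1)-th
    up-sampling module, as described above."""
    def __init__(self, m: int = 4, base: int = 64):
        super().__init__()
        chs = [base * 2 ** i for i in range(m)]            # e.g. 64, 128, 256, 512
        self.downs = nn.ModuleList(
            down_block(3 if i == 0 else chs[i - 1], chs[i]) for i in range(m))
        ups = [up_block(chs[m - 1], chs[m - 2])]           # first up level: bottleneck only
        for i in range(1, m):
            cin = chs[m - 1 - i] * 2                       # previous up output + skip channels
            cout = chs[m - 2 - i] if i < m - 1 else base
            ups.append(up_block(cin, cout))
        self.ups = nn.ModuleList(ups)

    def forward(self, x):
        skips = []
        for d in self.downs:                               # M levels of down-sampling
            x = d(x)
            skips.append(x)
        feats = [skips[-1]]                                # original attribute features
        h = self.ups[0](skips[-1])
        feats.append(h)                                    # attribute feature of up level 1
        for i in range(1, len(self.ups)):
            h = self.ups[i](torch.cat([h, skips[-1 - i]], dim=1))
            feats.append(h)                                # attribute feature of up level i+1
        return feats                                       # M+1 feature maps in total
```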
In the embodiment of the present invention, based on the foregoing implementation process, the feature fusion model includes M+1 feature fusion modules arranged in sequence; and the inputting the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model includes:
inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
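As an illustrative sketch, the cascade of M+1 feature fusion modules might be wired as follows, assuming a `FeatureFusionBlock` module of the kind sketched after the formulas below; feeding the original attribute feature to the first block as both its feature and attribute input, the nearest-neighbour up-scaling between blocks, and the final tanh RGB head are assumptions, not details fixed by the embodiment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionModel(nn.Module):
    """Cascade of M+1 fusion blocks: block 1 receives the original attribute
    feature and the identity feature; block k+1 receives block k's output,
    the attribute feature from the k-th up-sampling level, and the identity
    feature, as described above."""
    def __init__(self, blocks: nn.ModuleList, out_conv: nn.Module):
        super().__init__()
        self.blocks = blocks                 # M+1 FeatureFusionBlock instances
        self.out_conv = out_conv             # projects the last feature map to RGB

    def forward(self, attrs, z_id):
        # attrs[0]: original attribute feature; attrs[1..M]: per-level up-sampling outputs
        h = self.blocks[0](attrs[0], attrs[0], z_id)   # first block: no previous feature (assumption)
        for blk, a in zip(self.blocks[1:], attrs[1:]):
            h = F.interpolate(h, size=a.shape[-2:], mode="nearest")  # match this level's size
            h = blk(h, a, z_id)
        return torch.tanh(self.out_conv(h))  # fused face image
```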
In the embodiment of the invention, for any feature fusion module k, the attribute feature input to the module (the original attribute feature or a per-level attribute feature) is passed through two independent convolution layers to obtain γ_k^A and β_k^A, and the identity feature input is passed through two independent fully connected layers to obtain γ_k^I and β_k^I. The feature input from the previous level, denoted f_k^in, is instance-normalized to f_k, and f_k is passed through a convolution layer to obtain M_k (the fusion weight, with values between 0 and 1). Combining f_k with the attribute transformation (γ_k^A, β_k^A) and with the identity transformation (γ_k^I, β_k^I) yields A_k and I_k respectively, and the output of the feature fusion module layer is obtained by using M_k to weight A_k and I_k. The calculation of A_k, I_k, and the fused feature can be expressed as:

A_k = γ_k^A ⊗ f_k + β_k^A
I_k = γ_k^I ⊗ f_k + β_k^I
f_out = (1 - M_k) ⊗ A_k + M_k ⊗ I_k

where k denotes the k-th feature fusion module, ⊗ denotes element-wise multiplication of multi-dimensional feature tensors, and f_out denotes the output of the feature fusion computation. The output of a layer of the feature fusion model, like a ResNet residual connection, can be expressed as:

f_k^out = f_k^in + f_out
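A minimal sketch of one such feature fusion module, following the formulas above; the 3×3 kernel sizes and the sigmoid used to keep M_k between 0 and 1 are assumptions:

```python
import torch
import torch.nn as nn

class FeatureFusionBlock(nn.Module):
    """Implements A_k, I_k and the M_k-weighted mix from the formulas above,
    with a ResNet-style residual connection on the output."""
    def __init__(self, feat_ch: int, attr_ch: int, id_dim: int = 512):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch)               # f_in -> f_k
        self.conv_m = nn.Conv2d(feat_ch, feat_ch, 3, 1, 1)   # f_k -> M_k
        self.gamma_a = nn.Conv2d(attr_ch, feat_ch, 3, 1, 1)  # two independent conv layers
        self.beta_a = nn.Conv2d(attr_ch, feat_ch, 3, 1, 1)   #   for the attribute input
        self.gamma_i = nn.Linear(id_dim, feat_ch)            # two independent FC layers
        self.beta_i = nn.Linear(id_dim, feat_ch)             #   for the identity input

    def forward(self, f_in, attr, z_id):
        f_k = self.norm(f_in)                                # instance normalization
        m_k = torch.sigmoid(self.conv_m(f_k))                # fusion weight in (0, 1)
        a_k = self.gamma_a(attr) * f_k + self.beta_a(attr)   # A_k = gamma_A ⊗ f_k + beta_A
        g_i = self.gamma_i(z_id)[:, :, None, None]           # broadcast over H and W
        b_i = self.beta_i(z_id)[:, :, None, None]
        i_k = g_i * f_k + b_i                                # I_k = gamma_I ⊗ f_k + beta_I
        f_out = (1 - m_k) * a_k + m_k * i_k                  # M_k-weighted fusion
        return f_in + f_out                                  # residual connection
```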
in the embodiment of the present invention, based on the implementation process, specifically, the process of constructing the feature fusion model, as shown in fig. 3, specifically includes:
s301: acquiring a model to be trained and a training data set, wherein the training data comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model and an initial feature fusion model.
In the embodiment of the present invention, as shown in FIG. 4, the structure of the initial attribute feature extraction model is consistent with that of the U-net model and includes three important parts: down-sampling, up-sampling, and skip connections. First, feature extraction is performed on the input image by convolution-layer computation and the features are down-sampled; down-sampling enlarges the receptive field of the model to strengthen the expression of global features. Up-sampling is the process of restoring the feature maps of the image; during restoration, the original feature maps and the up-sampled feature maps are combined through skip connections to integrate feature information, and the attribute features of the first face, namely the different feature maps obtained during up-sampling, serve as the multi-level attribute feature representation of the face attributes.
Optionally, the ArcFace face recognition model can extract features from a face picture for face recognition, and the extracted face features represent the identity features of the face. Because a pre-trained ArcFace model is used, the extraction of identity features does not participate in model training in this method.
Specifically, the network structure of the feature fusion module is similar in design to the residual module of the ResNet network model. The feature fusion module layer has three inputs: attribute feature information, identity feature information, and the feature information output by the previous feature fusion module. In the fusion computation, the attribute feature input is passed through two independent convolution layers to obtain γ_k^A and β_k^A, and the identity feature input is passed through two independent fully connected layers to obtain γ_k^I and β_k^I; the previous-level feature input f_k^in is instance-normalized to f_k, and f_k is passed through a convolution layer to obtain M_k. Combining f_k with the attribute transformation and the identity transformation yields A_k and I_k respectively, and the output of the feature fusion module layer is obtained by using M_k to weight A_k and I_k, as in the formulas given above.
S302: sequentially inputting each picture group into the model to be trained, and processing the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extracting features from the face training image to be fused by using the face recognition model to obtain the identity features of the face training image to be fused; fusing the attribute features of the face template training image and the identity features of the face training image to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substituting the image data into the loss function of the model to be trained to obtain a loss value corresponding to the image data; and adjusting the network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting the preset training completion condition.
The loss value includes an adversarial loss, an identity loss, an attribute loss, and a reconstruction loss. The adversarial loss represents the loss on the realism of the image data output by the model to be trained; the identity loss represents the loss on the similarity between the image data output by the model to be trained and the identity features of the face training image to be fused; the attribute loss represents the loss on the similarity between the image data output by the model to be trained and the attribute features of the face template training image; and the reconstruction loss represents the pixel-distance loss between the face object contained in the image data output by the model to be trained and the face object contained in the face template training image.
In an embodiment of the invention, the loss function may be as follows:

loss = l_adv + λ1·l_id + λ2·l_attr + λ3·l_rec

where l_adv is the adversarial loss term, l_id the identity loss term, l_attr the attribute loss term, and l_rec the reconstruction loss term. The identity loss term is expressed as:

l_id = 1 - cos(Z_id(X_s), Z_id(Y'_s,t))

where Y'_s,t is the image data output by the feature fusion model, X_t is the face template training image, X_s is the face training image to be fused, and Z_id(·) denotes the identity features extracted by the face recognition model.
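The following sketch assembles the four loss terms for a generator-side update; only l_id follows the cosine expression given above, while the hinge-style form of l_adv, the L2 form of l_attr over the multi-level attribute features, the L1 form of l_rec, and the λ weights of 10 are illustrative assumptions:

```python
import torch.nn.functional as F

def fusion_loss(d_fake, z_id_src, z_id_out, attrs_tgt, attrs_out, y_out, x_tgt,
                lam1=10.0, lam2=10.0, lam3=10.0):
    # adversarial loss: realism of the generated image (generator side)
    l_adv = -d_fake.mean()
    # identity loss: l_id = 1 - cos(Z_id(X_s), Z_id(Y'_s,t))
    l_id = 1 - F.cosine_similarity(z_id_src, z_id_out, dim=1).mean()
    # attribute loss: similarity of output and template attribute features
    l_attr = sum(F.mse_loss(ao, at) for ao, at in zip(attrs_out, attrs_tgt))
    # reconstruction loss: pixel distance between output face and template face
    l_rec = F.l1_loss(y_out, x_tgt)
    return l_adv + lam1 * l_id + lam2 * l_attr + lam3 * l_rec
```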
S303: and taking the initial feature fusion model which meets the training completion condition as a feature fusion model.
In an embodiment of the present invention, the training completion condition may be convergence of the loss function.
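Putting the pieces together, one training iteration over a picture group might look like the following sketch; `fusion_loss` is the illustrative helper above, `discriminator` is an assumed adversarial network not detailed in this description, and training repeats until the loss function converges:

```python
import torch

def train_step(attr_model, id_extractor, fusion_model, discriminator, opt, batch):
    x_t, x_s = batch["template"], batch["source"]   # one picture group
    attrs = attr_model(x_t)                         # multi-level attribute features
    z_id = id_extractor(x_s)                        # identity features (frozen extractor)
    y_out = fusion_model(attrs, z_id)               # fused face image
    loss = fusion_loss(discriminator(y_out),        # four loss terms, as sketched above
                       z_id, id_extractor(y_out),
                       attrs, attr_model(y_out),
                       y_out, x_t)
    opt.zero_grad()
    loss.backward()                                 # updates attr_model and fusion_model only
    opt.step()
    return loss.item()
```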
Corresponding to the method shown in FIG. 1, an embodiment of the present invention further provides a face image fusion apparatus for implementing that method. The face image fusion apparatus may be applied to an electronic device, and its schematic structural diagram is shown in FIG. 5. The apparatus specifically includes:
the receiving unit 501 is configured to, when a face fusion instruction is received, obtain a face template image and a face image to be fused corresponding to the face fusion instruction;
the first execution unit 502 is configured to perform multi-level down-sampling on a face template image to obtain an original attribute feature of the face template image, and perform multi-level up-sampling on the original feature to obtain an attribute feature generated by each level of up-sampling;
a second executing unit 503, configured to perform characteristic extraction on the to-be-fused face image by using a face recognition model, so as to obtain an identity feature of the to-be-fused face image;
the feature fusion unit 504 is configured to input the original attribute features, the attribute features generated by sampling at each level, and the identity features into a pre-constructed feature fusion model, so as to obtain a face fusion image output by the feature fusion model.
In this embodiment of the present invention, based on the above scheme, specifically, the first execution unit 502 includes:
the first execution subunit is used for utilizing an M-level down-sampling module in a preset attribute feature extraction model to down-sample the face model image to obtain the original attribute feature of the face template image;
and the second execution subunit is used for sequentially performing upsampling on the original attribute features by using M-level upsampling modules in the attribute feature extraction module to obtain the attribute features output by each level of upsampling modules, wherein the original attribute features and the output of the M-level downsampling module are used as the input of a first-level upsampling module in the attribute feature extraction model, the output of an i-level upsampling module and the output of an M-i-level downsampling module are used as the input of an i + 1-level upsampling module, M and i are positive integers, i is 1,.
In the embodiment of the present invention, based on the above scheme, specifically, the feature fusion model includes M +1 feature fusion modules arranged in sequence; the feature fusion unit 504 includes:
the input subunit is used for inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model so as to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
In an embodiment of the present invention, based on the above scheme, specifically, the feature fusion unit 504 includes:
the system comprises an acquisition subunit, a fusion processing subunit and a fusion processing subunit, wherein the acquisition subunit is used for acquiring a model to be trained and a training data set, the training data comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model and an initial feature fusion model;
the training subunit is used for sequentially inputting each picture group into the model to be trained so as to process the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extracting the characteristics of the facial image to be fused by using a facial recognition model to obtain the identity characteristics of the facial image to be fused; fusing the attribute features of the face model training images and the attribute features of the face images to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substituting the image data into a loss function of the model to be trained to obtain a loss value corresponding to the image data; adjusting network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting preset training completion conditions;
wherein the loss values include a countermeasure loss, an identity loss, an attribute loss, and a reconstruction loss; the countermeasure loss represents the loss of trueness of the image data output by the model to be trained; the identity loss represents the loss of the similarity degree of the image data output by the model to be trained and the identity characteristics of the face training image to be fused; the attribute loss represents the loss of the similarity degree of the initial image data of the model to be trained and the attribute characteristics of the face template image; reconstructing loss to represent pixel distance loss between a face object contained in image data output by a model to be trained and a face object contained in the face template training image;
and the third execution subunit is used for taking the initial feature fusion model which meets the training completion condition as a feature fusion model.
The specific principle and the implementation process of each unit and each module in the facial image fusion device disclosed in the embodiment of the present invention are the same as those of the facial image fusion method disclosed in the embodiment of the present invention, and reference may be made to corresponding parts in the facial image fusion method provided in the embodiment of the present invention, which are not described herein again.
The embodiment of the invention further provides a storage medium comprising stored instructions, wherein when the instructions are run, a device on which the storage medium is located is controlled to perform the face image fusion method described above.
An embodiment of the present invention further provides an electronic device, whose structural diagram is shown in FIG. 6. The electronic device specifically includes a memory 601 and one or more instructions 602, where the one or more instructions 602 are stored in the memory 601 and configured to be executed by one or more processors 603 to perform the following operations:
when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction;
performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling;
extracting features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused;
and inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into a pre-constructed feature fusion model to obtain a face fusion image output by the feature fusion model.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The face image fusion method provided by the present invention is described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, for those of ordinary skill in the art, there may be changes in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A face image fusion method is characterized by comprising the following steps:
when a face fusion instruction is received, acquiring a face template image and a face image to be fused corresponding to the face fusion instruction;
performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling;
extracting features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused;
and inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into a pre-constructed feature fusion model to obtain a face fusion image output by the feature fusion model.
2. The method of claim 1, wherein the performing multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and performing multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling comprises:
down-sampling the face template image by using the M-level down-sampling modules in a preset attribute feature extraction model to obtain the original attribute features of the face template image;
and sequentially up-sampling the original attribute features by using the M-level up-sampling modules in the attribute feature extraction model to obtain the attribute features output by each level of up-sampling module, wherein the original attribute features (the output of the M-th-level down-sampling module) are used as the input of the first-level up-sampling module in the attribute feature extraction model, and the output of the i-th-level up-sampling module and the output of the (M-i)-th-level down-sampling module are used as the input of the (i+1)-th-level up-sampling module, where M and i are positive integers and i = 1, ..., M-1.
3. The method of claim 2, wherein the feature fusion model comprises M+1 feature fusion modules arranged in sequence; and the inputting the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model comprises:
inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
4. The method of claim 1, wherein the process of constructing a feature fusion model comprises:
acquiring a model to be trained and a training data set, wherein the training data set comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model, and an initial feature fusion model;
sequentially inputting each picture group into the model to be trained, and processing the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extracting features from the face training image to be fused by using the face recognition model to obtain the identity features of the face training image to be fused; fusing the attribute features of the face template training image and the identity features of the face training image to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substituting the image data into a loss function of the model to be trained to obtain a loss value corresponding to the image data; and adjusting network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting a preset training completion condition;
wherein the loss value includes an adversarial loss, an identity loss, an attribute loss, and a reconstruction loss; the adversarial loss represents the loss on the realism of the image data output by the model to be trained; the identity loss represents the loss on the similarity between the image data output by the model to be trained and the identity features of the face training image to be fused; the attribute loss represents the loss on the similarity between the image data output by the model to be trained and the attribute features of the face template training image; and the reconstruction loss represents the pixel-distance loss between the face object contained in the image data output by the model to be trained and the face object contained in the face template training image;
and taking the initial feature fusion model which meets the training completion condition as a feature fusion model.
5. A face image fusion device is characterized by comprising:
a receiving unit, configured to, when a face fusion instruction is received, acquire a face template image and a face image to be fused corresponding to the face fusion instruction;
a first execution unit, configured to perform multi-level down-sampling on the face template image to obtain original attribute features of the face template image, and perform multi-level up-sampling on the original attribute features to obtain the attribute features generated at each level of up-sampling;
a second execution unit, configured to extract features from the face image to be fused by using a face recognition model to obtain identity features of the face image to be fused;
and a feature fusion unit, configured to input the original attribute features, the attribute features generated at each level of sampling, and the identity features into a pre-constructed feature fusion model to obtain the face fusion image output by the feature fusion model.
6. The apparatus of claim 5, wherein the first execution unit comprises:
a first execution subunit, configured to down-sample the face template image by using the M-level down-sampling modules in a preset attribute feature extraction model to obtain the original attribute features of the face template image;
and a second execution subunit, configured to sequentially up-sample the original attribute features by using the M-level up-sampling modules in the attribute feature extraction model to obtain the attribute features output by each level of up-sampling module, wherein the original attribute features (the output of the M-th-level down-sampling module) are used as the input of the first-level up-sampling module in the attribute feature extraction model, and the output of the i-th-level up-sampling module and the output of the (M-i)-th-level down-sampling module are used as the input of the (i+1)-th-level up-sampling module, where M and i are positive integers and i = 1, ..., M-1.
7. The apparatus of claim 6, wherein the feature fusion model comprises M +1 feature fusion modules arranged in sequence; the feature fusion unit includes:
the input subunit is used for inputting the original attribute features, the attribute features generated by sampling at each level and the identity features into each feature fusion module in the feature fusion model so as to obtain a face fusion image output by the feature fusion model;
wherein the original attribute features and the identity features are used as the input of the first feature fusion module in the feature fusion model; and the output of the k-th feature fusion module, the attribute feature output by the k-th-level up-sampling module, and the identity features are used as the input of the (k+1)-th feature fusion module, where k = 1, ..., M.
8. The apparatus of claim 5, wherein the feature fusion unit comprises:
the system comprises an acquisition subunit, a fusion processing subunit and a fusion processing subunit, wherein the acquisition subunit is used for acquiring a model to be trained and a training data set, the training data comprises a plurality of picture groups, and each picture group comprises a face template training image and a face training image to be fused; the model to be trained comprises an initial attribute feature extraction model, a face recognition model and an initial feature fusion model;
the training subunit is used for sequentially inputting each picture group into the model to be trained so as to process the face template training image by using the initial attribute feature extraction model to obtain the attribute features of the face template training image; extracting the characteristics of the facial image to be fused by using a facial recognition model to obtain the identity characteristics of the facial image to be fused; fusing the attribute features of the face model training images and the attribute features of the face images to be fused by using the initial feature fusion model to obtain image data corresponding to the picture group; substituting the image data into a loss function of the model to be trained to obtain a loss value corresponding to the image data; adjusting network parameters of the initial attribute feature extraction model and the initial feature fusion model based on the loss value to obtain an initial feature fusion model meeting preset training completion conditions;
wherein the loss values include a countermeasure loss, an identity loss, an attribute loss, and a reconstruction loss; the countermeasure loss represents the loss of trueness of the image data output by the model to be trained; the identity loss represents the loss of the similarity degree of the image data output by the model to be trained and the identity characteristics of the face training image to be fused; the attribute loss represents the loss of the similarity degree of the initial image data of the model to be trained and the attribute characteristics of the face template image; reconstructing loss to represent pixel distance loss between a face object contained in image data output by a model to be trained and a face object contained in the face template training image;
and the third execution subunit is used for taking the initial feature fusion model meeting the training completion condition as the feature fusion model.
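The four loss terms of claim 8 could be assembled as below. The weights, the cosine form of the identity loss, and restricting the reconstruction term to pairs where the template and the image to be fused share an identity are conventions borrowed from the face-swapping literature and are assumptions here, not specifics of this application:

```python
import torch.nn.functional as F

def fusion_loss(fused, template, id_source_feat, disc, id_net, attr_net,
                same_id,                 # (B,) float: 1.0 where template and source share identity
                w_adv=1.0, w_id=10.0, w_attr=10.0, w_rec=10.0):
    # Adversarial loss: realism of the fused image, scored by a discriminator
    # (a hinge or least-squares form would be equally plausible).
    adv = -disc(fused).mean()
    # Identity loss: cosine distance between the fused image's identity features
    # and those of the face image to be fused.
    id_fused = F.normalize(id_net(fused), dim=1)
    ident = (1.0 - (id_fused * F.normalize(id_source_feat, dim=1)).sum(dim=1)).mean()
    # Attribute loss: feature distance between the fused image and the template,
    # using the (original, per-level attrs) signature of the extractor sketched earlier.
    orig_f, attrs_f = attr_net(fused)
    orig_t, attrs_t = attr_net(template)
    attr = sum(F.mse_loss(f, t) for f, t in zip([orig_f] + attrs_f, [orig_t] + attrs_t))
    # Reconstruction loss: pixel distance between the fused and template faces.
    rec = (same_id[:, None, None, None] * (fused - template).abs()).mean()
    return w_adv * adv + w_id * ident + w_attr * attr + w_rec * rec
```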
9. A storage medium, characterized in that the storage medium comprises stored instructions, wherein when the instructions run, a device on which the storage medium is located is controlled to execute the face image fusion method according to any one of claims 1 to 4.
10. An electronic device, comprising a memory, one or more processors, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by the one or more processors to perform the face image fusion method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110619880.0A | 2021-06-03 | 2021-06-03 | Face image fusion method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113361387A (en) | 2021-09-07 |
Family
ID=77531687
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110619880.0A (pending) | Face image fusion method and device, storage medium and electronic equipment | 2021-06-03 | 2021-06-03 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113361387A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111369646A (en) * | 2020-03-09 | 2020-07-03 | 南京理工大学 | Expression synthesis method integrating attention mechanism |
CN111652827A (en) * | 2020-04-24 | 2020-09-11 | 山东大学 | Front face synthesis method and system based on generation countermeasure network |
CN111783647A (en) * | 2020-06-30 | 2020-10-16 | 北京百度网讯科技有限公司 | Training method of face fusion model, face fusion method, device and equipment |
CN112364828A (en) * | 2020-11-30 | 2021-02-12 | 姜召英 | Face recognition method and financial system |
CN112560753A (en) * | 2020-12-23 | 2021-03-26 | 平安银行股份有限公司 | Face recognition method, device and equipment based on feature fusion and storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023036239A1 (en) * | 2021-09-10 | 2023-03-16 | 北京字跳网络技术有限公司 | Human face fusion method and apparatus, device and storage medium |
WO2023050868A1 (en) * | 2021-09-30 | 2023-04-06 | 北京百度网讯科技有限公司 | Method and apparatus for training fusion model, image fusion method and apparatus, and device and medium |
CN114140320A (en) * | 2021-12-09 | 2022-03-04 | 北京百度网讯科技有限公司 | Image migration method and training method and device of image migration model |
CN114140320B (en) * | 2021-12-09 | 2023-09-01 | 北京百度网讯科技有限公司 | Image migration method and training method and device of image migration model |
Similar Documents
Publication | Title |
---|---|
CN110473141B (en) | Image processing method, device, storage medium and electronic equipment |
Zeng et al. | Aggregated contextual transformations for high-resolution image inpainting |
Zhu et al. | A deep collaborative framework for face photo–sketch synthesis |
CN113361387A (en) | Face image fusion method and device, storage medium and electronic equipment |
CN111553267B (en) | Image processing method, image processing model training method and device |
CN109308725B (en) | System for generating mobile terminal table sentiment picture |
CN111680550B (en) | Emotion information identification method and device, storage medium and computer equipment |
CN111833360B (en) | Image processing method, device, equipment and computer readable storage medium |
CN111292262B (en) | Image processing method, device, electronic equipment and storage medium |
CN112837215B (en) | Image shape transformation method based on generation countermeasure network |
CN117033609B (en) | Text visual question-answering method, device, computer equipment and storage medium |
CN111489405B (en) | Face sketch synthesis system for generating confrontation network based on condition enhancement |
CN111108508B (en) | Face emotion recognition method, intelligent device and computer readable storage medium |
CN111046738B (en) | Precision improvement method of light u-net for finger vein segmentation |
CN111127309A (en) | Portrait style transfer model training method, portrait style transfer method and device |
CN113935435A (en) | Multi-modal emotion recognition method based on space-time feature fusion |
WO2021217919A1 (en) | Facial action unit recognition method and apparatus, and electronic device, and storage medium |
CN116958324A (en) | Training method, device, equipment and storage medium of image generation model |
CN115222578A (en) | Image style migration method, program product, storage medium, and electronic device |
CN116188912A (en) | Training method, device, medium and equipment for image synthesis model of theme image |
CN116681960A (en) | Intelligent mesoscale vortex identification method and system based on K8s |
CN116701706B (en) | Data processing method, device, equipment and medium based on artificial intelligence |
CN113822114A (en) | Image processing method, related equipment and computer readable storage medium |
CN111311732B (en) | 3D human body grid acquisition method and device |
CN117094362A (en) | Task processing method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210907 |