CN113486785A - Video face changing method, device, equipment and storage medium based on deep learning - Google Patents

Video face changing method, device, equipment and storage medium based on deep learning

Info

Publication number
CN113486785A
CN113486785A (Application CN202110754216.7A)
Authority
CN
China
Prior art keywords
face
changed
video
image
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110754216.7A
Other languages
Chinese (zh)
Inventor
张攀 (Zhang Pan)
刘求索 (Liu Qiusuo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Inveno Technology Co ltd
Original Assignee
Shenzhen Inveno Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Inveno Technology Co ltd filed Critical Shenzhen Inveno Technology Co ltd
Priority to CN202110754216.7A
Publication of CN113486785A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of face changing, and discloses a video face changing method, device, equipment and storage medium based on deep learning. The method comprises the following steps: when a face changing instruction for a video to be face-changed is received, acquiring a pre-stored face changing model according to the face changing instruction; extracting a video frame to be face-changed from the video to be face-changed; when face information exists in the video frame to be face-changed, determining a face changing image according to the face information in the video frame through the pre-stored face changing model; and updating the video to be face-changed according to the face changing image to obtain a target video. In this way, the pre-stored face changing model is acquired when a face changing instruction is received, the video frame that needs face changing is extracted, the face changing image is determined through the pre-stored face changing model when face information exists in that frame, and the video to be face-changed is finally updated according to the face changing image to obtain the target video. Video face changing therefore requires few training runs, little time and few resources.

Description

Video face changing method, device, equipment and storage medium based on deep learning
Technical Field
The invention relates to the technical field of face changing, in particular to a video face changing method, device, equipment and storage medium based on deep learning.
Background
Face changing technology has emerged in recent years and is being studied by many researchers in both industry and academia. It can automatically replace the face in a video or an image with the face of another person and is widely applied in various fields. However, existing face changing technology must be trained for a long time and many times every time a video is processed, and its resource consumption is excessive.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a video face changing method, device, equipment and storage medium based on deep learning, so as to solve the technical problems in the prior art that replacing faces in a video requires long training time, many training runs and large resource consumption.
In order to achieve the above object, the present invention provides a video face changing method based on deep learning, which comprises the following steps:
when a face changing instruction of a video to be face changed is received, acquiring a pre-stored face changing model according to the face changing instruction;
extracting a video frame to be face-changed in the video to be face-changed;
when the face information exists in the video frame to be changed, determining a face changing image according to the face information in the video frame to be changed through the pre-stored face changing model to obtain a face changing image;
and updating the video to be face-changed according to the face-changed image to obtain a target video.
Optionally, when a face change instruction of a video to be face changed is received, before a pre-stored face change model is obtained according to the face change instruction, the method further includes:
acquiring a historical face-changing image, a historical replacement image and a historical target video frame;
training an original model according to the historical face-changing image, the historical replacement image and the historical target video frame to obtain a trained model;
and adjusting the trained model through a preset format to obtain a pre-stored face changing model.
Optionally, before the training an original model according to the historical face-changed image, the historical replacement image, and the historical target video frame to obtain a trained model, the method further includes:
obtaining a plurality of preset loss functions;
obtaining a preset total loss function according to the preset loss function;
updating the original model according to the preset total loss function to obtain a target model;
the training of the original model according to the historical face-changing image, the historical image to be replaced and the historical target video frame to obtain a trained model comprises the following steps:
and training the target model according to the historical face-changing image, the historical image to be replaced and the historical target video frame to obtain a trained model.
Optionally, when the face information exists in the video frame to be changed, determining a face change image according to the face information in the video frame to be changed through the pre-stored face change model, to obtain a face change image, including:
when the face information exists in the video frame to be changed, obtaining an aligned face image according to the video frame to be changed;
and determining the face changing image of the aligned face image through the pre-stored face changing model to obtain the face changing image.
Optionally, when a human face is identified in the video frame to be changed, obtaining an aligned human face image according to the video frame to be changed includes:
when a face is identified in the video frame to be face-changed, detecting the video frame to be face-changed to obtain a plurality of face key points;
cutting a human face area image with a preset size in the video frame to be changed according to the human face key point;
and carrying out alignment operation on the face according to the face region image to obtain an aligned face image.
Optionally, after extracting a frame of the video to be face-changed in the video to be face-changed, the method further includes:
when no human face is identified in the video frame to be changed, taking the video frame to be changed as a first target video frame;
inquiring video frame position information of the first target video frame in the video to be changed;
and writing the first target video frame back to the video to be changed according to the video frame position information.
Optionally, the updating the video to be face-changed according to the face-changed image to obtain a target video, including:
inquiring the corresponding position of the face changing image in the video frame to be changed;
determining a replacement image according to the corresponding position;
replacing the face changing image in the video frame to be changed with the replacing image to obtain a second target video frame;
and updating the video to be face-changed according to the second target video frame to obtain a target video.
In addition, in order to achieve the above object, the present invention further provides a video face changing device based on deep learning, including:
the calling module is used for acquiring a pre-stored face changing model according to a face changing instruction when the face changing instruction of a video to be face changed is received;
the extraction module is used for extracting a video frame to be changed in the video to be changed;
the determining module is used for determining a face changing image according to the face information in the video frame to be face-changed through the pre-stored face changing model when the face information exists in the video frame to be face-changed, so as to obtain the face changing image;
and the updating module is used for updating the video to be face-changed according to the face-changed image to obtain a target video.
In addition, in order to achieve the above object, the present invention further provides a video face changing device based on deep learning, including: a memory, a processor, and a deep learning based video face changing program stored on the memory and executable on the processor, the deep learning based video face changing program being configured to implement the deep learning based video face changing method as described above.
In addition, to achieve the above object, the present invention further provides a storage medium, on which a deep learning based video face changing program is stored, which when executed by a processor implements the deep learning based video face changing method as described above.
When a face changing instruction for a video to be face-changed is received, a pre-stored face changing model is acquired according to the face changing instruction; a video frame to be face-changed is extracted from the video to be face-changed; when face information exists in the video frame to be face-changed, a face changing image is determined according to the face information in the video frame through the pre-stored face changing model; and the video to be face-changed is updated according to the face changing image to obtain a target video. In this way, the pre-stored face changing model is acquired when a face changing instruction is received, the video frame that needs face changing is extracted, the face changing image is determined through the pre-stored face changing model when face information exists in that frame, and the video to be face-changed is finally updated according to the face changing image to obtain the target video. Because only one pre-stored face changing model is called to process the video frames and obtain the face changing image, video face changing requires few training runs, the steps take little time, and the resource consumption is small.
Drawings
Fig. 1 is a schematic structural diagram of a deep learning-based video face changing device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a video face-changing method based on deep learning according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a video face-changing method based on deep learning according to a second embodiment of the present invention;
FIG. 4 is a schematic view of key points of a human face in an embodiment of a deep learning-based video face changing method of the present invention;
fig. 5 is a block diagram illustrating a video face changing device based on deep learning according to a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a video face changing device based on deep learning in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the video face changing device based on deep learning may include: a processor 1001 such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and optionally may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM) such as a disk memory, and may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the architecture shown in fig. 1 does not constitute a limitation of the video face changing device based on deep learning, which may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in fig. 1, the memory 1005, which is a storage medium, may include an operating system, a network communication module, a user interface module, and a video face changing program based on deep learning.
In the video face changing device based on deep learning shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 are disposed in the video face changing device based on deep learning, which calls the video face changing program based on deep learning stored in the memory 1005 through the processor 1001 and executes the video face changing method based on deep learning provided by the embodiments of the present invention.
An embodiment of the present invention provides a video face changing method based on deep learning, and referring to fig. 2, fig. 2 is a schematic flow diagram of a first embodiment of a video face changing method based on deep learning according to the present invention.
In this embodiment, the video face changing method based on deep learning includes the following steps:
step S10: and when a face changing instruction of a video to be face changed is received, acquiring a pre-stored face changing model according to the face changing instruction.
It should be noted that the execution subject of this embodiment is a controller. The controller is mainly used for controlling the video face changing method based on deep learning and may be any device capable of implementing this function, which is not limited in this embodiment. This embodiment and the following embodiments are described by taking the controller of the video face changing method based on deep learning as an example.
It should be understood that the face change instruction refers to an instruction for instructing to start executing face change of a video to be face changed, and the face change instruction may be an instruction in any form, which is not limited in this embodiment.
In specific implementation, the pre-stored face changing model refers to a model for face changing that has been processed and stored in advance. In this embodiment, the pre-stored face changing model is obtained by modifying part of the structure and adjusting the parameters of the FaceShifter model.
It should be noted that the video to be changed is a video that needs to perform the step of the face changing method, and may be a video with any length and definition, which is not limited in this embodiment.
It should be understood that acquiring a pre-stored face changing model according to the face changing instruction when a face changing instruction of a video to be face-changed is received means that the pre-stored face changing model is called when an instruction indicating that the video to be face-changed needs face changing is received.
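The patent text contains no source code; purely as an illustrative, non-limiting sketch, acquiring a pre-stored face changing model exported in the PB format could look like the following, where the file name, the use of TensorFlow and the helper name load_prestored_face_swap_model are assumptions:

import tensorflow as tf

def load_prestored_face_swap_model(pb_path="face_swap_model.pb"):
    # Read the frozen graph that was exported in the preset (PB) format.
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        tf.compat.v1.import_graph_def(graph_def, name="")
    # The session is created once when the face changing instruction is received
    # and reused for every video frame to be face-changed.
    return tf.compat.v1.Session(graph=graph)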
Further, before the step S10, in order to obtain the pre-stored face-changing model after training in advance, the method further includes:
acquiring a historical face-changing image, a historical replacement image and a historical target video frame;
training an original model according to the historical face-changing image, the historical replacement image and the historical target video frame to obtain a trained model;
and adjusting the trained model through a preset format to obtain a pre-stored face changing model.
It should be noted that the historical face changing image is a face changing image stored during a previous, successfully completed run of the face changing method. The face changing image refers to the image area in a video frame to be face-changed that is determined by the pre-stored face changing model and needs to be replaced.
It should be understood that the historical replacement image refers to a replacement image stored during a previous, successfully completed run of the face changing method. The replacement image refers to the image that replaces the face changing image in the video frame to be face-changed.
In a specific implementation, the historical target video frame refers to a video frame, stored during a previous successful run of the face changing method, in which the face has already been completely replaced.
It should be noted that the historical face-changing image, the historical replacement image, and the historical target video frame are stored in the form of a source data set CelebA-HQ or FFHQ.
It should be understood that training the original model according to the historical face changing image, the historical replacement image and the historical target video frame to obtain the trained model means extracting the historical face changing images, historical replacement images and historical target video frames from the training data set and then training the original model with them to obtain the trained model.
In a specific implementation, the original model refers to the FaceShifter model that is invoked prior to training.
It should be noted that the preset format refers to the PB format. Adjusting the trained model through the preset format to obtain the pre-stored face changing model means converting the trained model into the PB format; the converted model is the pre-stored face changing model.
By the method, the original model can be trained accurately and quickly, and the pre-stored face-changing model is obtained.
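As an illustrative sketch only (the patent does not specify the framework; a TensorFlow/Keras generator and the helper name export_trained_model_to_pb are assumptions), adjusting the trained model into the PB format could be done as follows:

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2,
)

def export_trained_model_to_pb(trained_model, out_dir=".", pb_name="face_swap_model.pb"):
    # Wrap the forward pass in a concrete function with a fixed input signature.
    spec = tf.TensorSpec(trained_model.inputs[0].shape, trained_model.inputs[0].dtype)
    concrete_fn = tf.function(lambda x: trained_model(x)).get_concrete_function(spec)
    # Fold the trained variables into constants and write out the frozen PB graph.
    frozen_fn = convert_variables_to_constants_v2(concrete_fn)
    tf.io.write_graph(frozen_fn.graph, out_dir, pb_name, as_text=False)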
Further, in order to perform structure adjustment on the original model, the step of training the original model according to the historical face-changed image, the historical replacement image, and the historical target video frame further includes, before obtaining the trained model:
obtaining a plurality of preset loss functions;
obtaining a preset total loss function according to the preset loss function;
updating the original model according to the preset total loss function to obtain a target model;
the training of the original model according to the historical face-changing image, the historical image to be replaced and the historical target video frame to obtain a trained model comprises the following steps:
and training the target model according to the historical face-changing image, the historical image to be replaced and the historical target video frame to obtain a trained model.
It should be understood that the preset loss function refers to a loss function that is stored in advance to replace the function in the original model.
In a specific implementation, the preset loss function may include the following four loss functions, which are, in order, a GAN loss function, an id loss function, a multi-layer attribute loss function, and a pixel-level reconstruction loss function:

L_{adv} = \mathbb{E}_{X_t}[\log D(X_t)] + \mathbb{E}_{X_s, X_t}[\log(1 - D(G(X_s, X_t)))]

L_{id} = 1 - \cos\big(z_{id}(\hat{Y}_{s,t}),\ z_{id}(X_s)\big)

L_{att} = \frac{1}{2}\sum_{k=1}^{n}\big\| z_{att}^{k}(\hat{Y}_{s,t}) - z_{att}^{k}(X_t) \big\|_2^2

L_{rec} = \begin{cases}\frac{1}{2}\big\| \hat{Y}_{s,t} - X_t \big\|_2^2 & \text{if } X_t = X_s \\ 0 & \text{otherwise}\end{cases}

wherein D is the discriminator, G is the generator, \mathbb{E} is the mathematical expectation, z_{id} is the face identity feature, X_s is the source face, i.e. the face changing image that needs to be changed, X_t is the target face, i.e. the replacement image that replaces the face changing image, \hat{Y}_{s,t} is the generated face, i.e. the face after replacement, and z_{att}^{k} is the embedding of a certain attribute at the k-th layer.
In a specific implementation, obtaining the preset total loss function according to the preset loss function means obtaining the preset total loss function according to a GAN loss function, an id loss function, a multi-layer attribute loss function, and a pixel-level reconstruction loss function.
It should be noted that the calculation method of the preset total loss function is as follows:

L_{total} = L_{adv} + \lambda_{att} L_{att} + \lambda_{id} L_{id} + \lambda_{rec} L_{rec}

wherein \lambda_{att}, \lambda_{id} and \lambda_{rec} are preset coefficients.
It should be understood that, updating the original model according to the preset total loss function to obtain the target model means that, after the preset total loss function is obtained, the preset total loss function is used to replace the loss function in the original model, and the obtained model is the target model.
By the method, the accuracy of the original model can be improved by replacing the loss function in the original model, and the accuracy and the final effect of video face changing are improved.
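The following is only an illustrative sketch of how the preset total loss could be assembled from the preset losses; the coefficient values and helper names are assumed for the example and are not fixed by this description:

import tensorflow as tf

def identity_loss(z_id_generated, z_id_source):
    # id loss: 1 - cosine similarity between the identity features of the
    # generated face and the source face.
    cos = tf.reduce_sum(tf.nn.l2_normalize(z_id_generated, axis=-1) *
                        tf.nn.l2_normalize(z_id_source, axis=-1), axis=-1)
    return tf.reduce_mean(1.0 - cos)

def attribute_loss(att_generated, att_target):
    # Multi-layer attribute loss: L2 distance between attribute embeddings, summed over layers.
    return 0.5 * tf.add_n([tf.reduce_mean(tf.square(g - t))
                           for g, t in zip(att_generated, att_target)])

def total_loss(adv_loss, id_loss, att_loss, rec_loss,
               lambda_att=10.0, lambda_id=5.0, lambda_rec=10.0):
    # Preset total loss: adversarial term plus weighted attribute, identity and
    # reconstruction terms (the lambda values here are example coefficients).
    return adv_loss + lambda_att * att_loss + lambda_id * id_loss + lambda_rec * rec_loss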
Step S20: and extracting a video frame to be changed in the video to be changed.
In a specific implementation, the video frame to be changed is a designated video frame that needs to be changed in the video to be changed, and the video frame to be changed may be any frame of a picture in the video to be changed, which is not limited in this embodiment.
It should be noted that, extracting a video frame to be changed in the video to be changed means extracting a video frame to be changed from the video to be changed as the video frame to be changed.
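A minimal sketch of this extraction step with OpenCV (the library choice is an assumption; the patent does not name one):

import cv2

def extract_frames(video_path):
    # Yield (frame position, frame) pairs from the video to be face-changed.
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield index, frame
        index += 1
    cap.release()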
Step S30: and when the face information exists in the video frame to be changed, determining a face changing image according to the face information in the video frame to be changed through the pre-stored face changing model to obtain the face changing image.
It should be understood that image recognition, or any other suitable method, may be used to determine whether face information exists in the video frame to be face-changed, which is not limited in this embodiment.
In a specific implementation, determining the face changing image according to the face information in the video frame to be face-changed through the pre-stored face changing model when face information exists in that frame means that, once face information is found in the video frame to be face-changed, the video frame is input into the pre-stored face changing model, and the image finally output by the pre-stored face changing model is the determined face changing image.
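One possible way, assumed here for illustration only, to decide whether face information exists in a frame is a standard OpenCV face detector:

import cv2

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def has_face(frame):
    # Return True when at least one face is detected in the video frame to be face-changed.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0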
Step S40: and updating the video to be face-changed according to the face-changed image to obtain a target video.
It should be noted that the target video refers to a face-changed video that needs to be finally obtained after the method is performed.
It should be understood that updating the video to be face-changed according to the face changing image to obtain the target video means that, after the face changing image is determined, the prepared replacement image replaces and covers the face changing image in the video frame to be face-changed to obtain a second target video frame, and the video to be face-changed is then updated according to the second target video frame to obtain the target video.
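For illustration (the codec, container and helper name write_target_video are assumptions), writing the processed frames back in their original order to obtain the target video could be sketched as:

import cv2

def write_target_video(frames_in_order, out_path, fps, frame_size):
    # frame_size is (width, height); each frame is written back at its original position.
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(out_path, fourcc, fps, frame_size)
    for frame in frames_in_order:
        writer.write(frame)
    writer.release()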
In this embodiment, when a face changing instruction of a video to be face-changed is received, a pre-stored face changing model is acquired according to the face changing instruction; a video frame to be face-changed is extracted from the video to be face-changed; when face information exists in the video frame to be face-changed, a face changing image is determined according to the face information in the video frame through the pre-stored face changing model; and the video to be face-changed is updated according to the face changing image to obtain a target video. In this way, the pre-stored face changing model is acquired when a face changing instruction is received, the video frame that needs face changing is extracted, the face changing image is determined through the pre-stored face changing model when face information exists in that frame, and the video to be face-changed is finally updated according to the face changing image to obtain the target video. Because only one pre-stored face changing model is called to process the video frames and obtain the face changing image, video face changing requires few training runs, the steps take little time, and the resource consumption is small.
Referring to fig. 3, fig. 3 is a flowchart illustrating a video face changing method based on deep learning according to a second embodiment of the present invention.
Based on the first embodiment, the video face changing method based on deep learning of the present embodiment includes, in the step S30:
step S301: and when the face information exists in the video frame to be changed, obtaining an aligned face image according to the video frame to be changed.
It should be noted that the aligned face image refers to the face image obtained by processing the video frame to be face-changed when face information exists in that frame.
It should be understood that obtaining an aligned face image according to the video frame to be face-changed when face information exists in that frame means that the video frame is processed as follows: the image position of the face is determined first, and the face image is then aligned to obtain the aligned face image.
Further, in order to accurately obtain the aligned face image, step S301 includes:
when a face is identified in the video frame to be face-changed, detecting the video frame to be face-changed to obtain a plurality of face key points;
cutting a human face area image with a preset size in the video frame to be changed according to the human face key point;
and carrying out alignment operation on the face according to the face region image to obtain an aligned face image.
It should be noted that the face key points refer to key image points on the face that are selected automatically once the video frame to be face-changed has been detected, and are used to identify the positions of facial features. The number of face key points may be any number, which is not limited in this embodiment.
In a specific implementation, as shown in fig. 4, a schematic diagram of selecting face key points is shown, the face key points are selected based on feature parts such as facial organs of an identified face, and the number of the face key points is not specifically limited in this embodiment, and may be any number. Fig. 4 is only an illustration, and does not limit the description of the present embodiment.
It should be understood that when a human face is identified in the video frame to be changed, detecting the video frame to be changed to obtain a plurality of human face key points means that when human face information exists in the video frame to be changed, image detection is automatically performed on the video frame to be changed, and then the plurality of human face key points are determined.
In a specific implementation, the preset size is a size that can be set by an administrator or a user, and is used for cutting the face image, and the size of the preset size is not limited in this embodiment.
The face region image refers to an image of a preset size cut out from a video frame to be changed according to the position of a face key point.
It should be understood that, the cutting of the face region image with the preset size in the video frame to be changed according to the face key points means that, after a plurality of face key points are determined, an image region with the preset size is cut out in the video frame to be changed according to the positions of the face key points to serve as the face region image.
In a specific implementation, performing an alignment operation on the face according to the face region image to obtain an aligned face image means that, after the face region image is obtained, the face image is aligned through an affine transformation, that is, the position and angle of the face are adjusted so that it faces the front; the resulting image is the aligned face image.
By the method, the face image can be accurately and quickly changed into the aligned face image, so that the subsequent face changing step is more convenient, and the final face changing effect is improved.
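A simplified sketch of the alignment operation, assuming only two eye key points and a rotation plus crop (the real number of key points and the preset size are left open by the embodiment):

import cv2
import numpy as np

def align_face(frame, left_eye, right_eye, preset_size=256):
    # Rotate the frame so the eyes lie on a horizontal line, then crop the face region image.
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    center = (float((left_eye[0] + right_eye[0]) / 2),
              float((left_eye[1] + right_eye[1]) / 2))
    angle = np.degrees(np.arctan2(right_eye[1] - left_eye[1],
                                  right_eye[0] - left_eye[0]))
    rotation = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(frame, rotation, (frame.shape[1], frame.shape[0]))
    # Crop a face region image of the preset size around the eye center.
    x = max(int(center[0] - preset_size / 2), 0)
    y = max(int(center[1] - preset_size / 2), 0)
    return rotated[y:y + preset_size, x:x + preset_size]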
Step S302: and determining the face changing image of the aligned face image through the pre-stored face changing model to obtain the face changing image.
It should be noted that, determining the face-changing image of the aligned face image through the pre-stored face-changing model to obtain the face-changing image means that the aligned face image is input into the pre-stored face-changing model obtained after training is completed, then an image output by the pre-stored face-changing model is obtained, and the image output by the pre-stored face-changing model is used as the face-changing image.
Further, in order to put the video frame to be face-changed back into the video to be face-changed when no human face is recognized in it, after the step of extracting the video frame to be face-changed from the video to be face-changed, the method further includes:
when no human face is identified in the video frame to be changed, taking the video frame to be changed as a first target video frame;
inquiring video frame position information of the first target video frame in the video to be changed;
and writing the first target video frame back to the video to be changed according to the video frame position information.
It should be understood that when no human face is identified in the video frame to be changed, taking the video frame to be changed as the first target video frame means that when no human face is identified after image recognition is performed on the video frame to be changed, taking the video frame to be changed without the human face identified as the first target video frame.
In a specific implementation, the video frame position information refers to a position of a frame number of the first target video frame in the video to be face-changed, that is, a frame number of the first target video frame in the video to be face-changed.
It should be noted that, the querying of the video frame position information of the first target video frame in the video to be face-changed refers to querying of the video frame position information of the first target video frame in the video to be face-changed after the first target video frame is determined.
It should be understood that, the step of placing the first target video frame back to the video to be face-changed according to the video frame position information refers to placing the first target video frame back to the video to be face-changed after determining the video frame position information of the first target video frame.
By the method, the video frame to be changed can be quickly placed back to the video to be changed when the face information is not identified in the video frame to be changed, and the video face changing efficiency is improved.
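The handling of frames with and without a recognized face can be sketched as a simple per-frame dispatch; the helper names and the dictionary keyed by frame position are assumptions for illustration:

def process_frame(frame_index, frame, output_frames, has_face, swap_face):
    # output_frames maps video frame position information to the frame written back there.
    if not has_face(frame):
        # First target video frame: no face recognized, so the frame is put back
        # unchanged at its original position in the video to be face-changed.
        output_frames[frame_index] = frame
    else:
        output_frames[frame_index] = swap_face(frame)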
Further, in order to change the face of the video to be changed after the face change image is obtained, the step of updating the video to be changed according to the face change image to obtain a target video includes:
inquiring the corresponding position of the face changing image in the video frame to be changed;
determining a replacement image according to the corresponding position;
replacing the face changing image in the video frame to be changed with the replacing image to obtain a second target video frame;
and updating the video to be face-changed according to the second target video frame to obtain a target video.
It should be noted that, the querying of the corresponding position of the face-changing image in the video frame to be face-changed refers to querying the position of the face-changing image in the video frame to be face-changed, that is, the corresponding position.
It should be understood that the replacement image refers to a face image which is stored in advance and is used for replacing the face image, namely the face image in the video frame to be changed after the replacement is completed.
In a specific implementation, the determining of the replacement image according to the corresponding position refers to determining a replacement image to be replaced according to the corresponding position after determining the corresponding position of the face-changed image.
It should be noted that, replacing the face-changed image in the video frame to be face-changed with the replacement image to obtain the second target video frame means that the face-changed image in the video frame to be face-changed is replaced with the replacement image, and the finally obtained video frame is the second target video frame. That is, the second target video frame is an image in which the face-changed image in the video frame to be face-changed is changed to the replacement image.
It should be understood that updating the video to be face-changed according to the second target video frame to obtain the target video means that, after the second target video frame is obtained, the position of the second target video frame in the video to be face-changed is queried, that is, which frame number it occupies, and the second target video frame is then written back into the video to be face-changed in place of the original video frame to be face-changed, so as to obtain the target video.
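A hedged sketch of producing the second target video frame; representing the corresponding position as an (x, y, w, h) bounding box is an assumption made only for this example:

import cv2

def replace_face_region(frame, replacement_image, box):
    # box = (x, y, w, h): the corresponding position of the face changing image in the frame.
    x, y, w, h = box
    resized = cv2.resize(replacement_image, (w, h))
    second_target_frame = frame.copy()
    second_target_frame[y:y + h, x:x + w] = resized
    return second_target_frame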
In this embodiment, when face information exists in the video frame to be face-changed, an aligned face image is obtained according to the video frame to be face-changed, and the face changing image of the aligned face image is determined through the pre-stored face changing model to obtain the face changing image. In this way, the video frame to be face-changed is preprocessed before being fed into the pre-stored face changing model, and the resulting aligned face image is input into the pre-stored face changing model, so the model receives a more accurate input image and the face changing effect is improved.
In addition, an embodiment of the present invention further provides a storage medium, where the storage medium stores a deep learning-based video face changing program, and the deep learning-based video face changing program, when executed by a processor, implements the steps of the deep learning-based video face changing method as described above.
Since the storage medium adopts all technical solutions of all the embodiments described above, at least all the beneficial effects brought by the technical solutions of the embodiments described above are achieved, and are not described in detail herein.
Referring to fig. 5, fig. 5 is a block diagram illustrating a first embodiment of a video face changing apparatus based on deep learning according to the present invention.
As shown in fig. 5, the video face changing apparatus based on deep learning according to the embodiment of the present invention includes:
the calling module 10 is configured to, when a face changing instruction of a video to be face changed is received, obtain a pre-stored face changing model according to the face changing instruction.
And the extracting module 20 is configured to extract a video frame to be changed in the video to be changed.
And the processing module 30 is configured to determine a face change image according to the pre-stored face change model when the face information exists in the video frame to be changed, so as to obtain the face change image.
And the updating module 40 is configured to update the video to be face-changed according to the face-changed image to obtain a target video.
It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and in a specific application, a person skilled in the art may set the technical solution as needed, and the present invention is not limited thereto.
In this embodiment, when a face changing instruction of a video to be face-changed is received, a pre-stored face changing model is acquired according to the face changing instruction; a video frame to be face-changed is extracted from the video to be face-changed; when face information exists in the video frame to be face-changed, a face changing image is determined according to the face information in the video frame through the pre-stored face changing model; and the video to be face-changed is updated according to the face changing image to obtain a target video. In this way, the pre-stored face changing model is acquired when a face changing instruction is received, the video frame that needs face changing is extracted, the face changing image is determined through the pre-stored face changing model when face information exists in that frame, and the video to be face-changed is finally updated according to the face changing image to obtain the target video. Because only one pre-stored face changing model is called to process the video frames and obtain the face changing image, video face changing requires few training runs, the steps take little time, and the resource consumption is small.
In this embodiment, the calling module 10 is further configured to obtain a historical face-changing image, a historical replacement image, and a historical target video frame; training an original model according to the historical face-changing image, the historical replacement image and the historical target video frame to obtain a trained model; and adjusting the trained model through a preset format to obtain a pre-stored face changing model.
In this embodiment, the calling module 10 is further configured to obtain a plurality of preset loss functions; obtaining a preset total loss function according to the preset loss function; updating the original model according to the preset total loss function to obtain a target model; the training of the original model according to the historical face-changing image, the historical image to be replaced and the historical target video frame to obtain a trained model comprises the following steps: and training the target model according to the historical face-changing image, the historical image to be replaced and the historical target video frame to obtain a trained model.
In this embodiment, the processing module 30 is further configured to, when face information exists in the video frame to be face-changed, obtain an aligned face image according to the video frame to be face-changed; and determining the face changing image of the aligned face image through the pre-stored face changing model to obtain the face changing image.
In this embodiment, the processing module 30 is further configured to, when a human face is identified in the video frame to be face-changed, detect the video frame to be face-changed to obtain a plurality of human face key points; cutting a human face area image with a preset size in the video frame to be changed according to the human face key point; and carrying out alignment operation on the face according to the face region image to obtain an aligned face image.
In this embodiment, the extracting module 20 is further configured to, when no human face is identified in the video frame to be face-changed, take the video frame to be face-changed as a first target video frame; query video frame position information of the first target video frame in the video to be face-changed; and put the first target video frame back into the video to be face-changed according to the video frame position information.
In this embodiment, the updating module 40 is further configured to query a corresponding position of the face-changed image in the video frame to be face-changed; determining a replacement image according to the corresponding position; replacing the face changing image in the video frame to be changed with the replacing image to obtain a second target video frame; and updating the video to be face-changed according to the second target video frame to obtain a target video.
Since the present apparatus employs all technical solutions of all the above embodiments, at least all the beneficial effects brought by the technical solutions of the above embodiments are achieved, and are not described in detail herein.
It should be noted that the above-described work flows are only exemplary, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of them to achieve the purpose of the solution of the embodiment according to actual needs, and the present invention is not limited herein.
In addition, the technical details that are not described in detail in this embodiment may be referred to a video face changing method based on deep learning provided in any embodiment of the present invention, and are not described herein again.
Further, it is to be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g. Read Only Memory (ROM)/RAM, magnetic disk, optical disk), and includes several instructions for enabling a terminal device (e.g. a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A video face changing method based on deep learning is characterized in that the video face changing method based on deep learning comprises the following steps:
when a face changing instruction of a video to be face changed is received, acquiring a pre-stored face changing model according to the face changing instruction;
extracting a video frame to be face-changed in the video to be face-changed;
when the face information exists in the video frame to be changed, determining a face changing image according to the face information in the video frame to be changed through the pre-stored face changing model to obtain a face changing image;
and updating the video to be face-changed according to the face-changed image to obtain a target video.
2. The method of claim 1, wherein before the obtaining of the pre-stored face changing model according to the face changing instruction when the face changing instruction of the video to be face changed is received, the method further comprises:
acquiring a historical face-changing image, a historical replacement image and a historical target video frame;
training an original model according to the historical face-changing image, the historical replacement image and the historical target video frame to obtain a trained model;
and adjusting the trained model through a preset format to obtain a pre-stored face changing model.
3. The method of claim 2, wherein before the training an original model according to the historical re-face image, the historical replacement image and the historical target video frame to obtain a trained model, further comprising:
obtaining a plurality of preset loss functions;
obtaining a preset total loss function according to the preset loss function;
updating the original model according to the preset total loss function to obtain a target model;
the training of the original model according to the historical face-changing image, the historical image to be replaced and the historical target video frame to obtain a trained model comprises the following steps:
and training the target model according to the historical face-changing image, the historical image to be replaced and the historical target video frame to obtain a trained model.
4. The method of claim 1, wherein when the face information exists in the video frame to be changed, determining a face change image according to the pre-stored face change model by using the face information in the video frame to be changed to obtain the face change image, comprising:
when the face information exists in the video frame to be changed, obtaining an aligned face image according to the video frame to be changed;
and determining the face changing image of the aligned face image through the pre-stored face changing model to obtain the face changing image.
5. The method of claim 4, wherein when a human face is recognized in the video frame to be face-changed, obtaining an aligned human face image according to the video frame to be face-changed comprises:
when a face is identified in the video frame to be face-changed, detecting the video frame to be face-changed to obtain a plurality of face key points;
cutting a human face area image with a preset size in the video frame to be changed according to the human face key point;
and carrying out alignment operation on the face according to the face region image to obtain an aligned face image.
6. The method according to any one of claims 1 to 5, wherein after extracting the frame of the video to be face-changed from the video to be face-changed, the method further comprises:
when no human face is identified in the video frame to be changed, taking the video frame to be changed as a first target video frame;
inquiring video frame position information of the first target video frame in the video to be changed;
and writing the first target video frame back to the video to be changed according to the video frame position information.
7. The method according to any one of claims 1 to 5, wherein the updating the video to be face-changed according to the face-changed image to obtain a target video comprises:
inquiring the corresponding position of the face changing image in the video frame to be changed;
determining a replacement image according to the corresponding position;
replacing the face changing image in the video frame to be changed with the replacing image to obtain a second target video frame;
and updating the video to be face-changed according to the second target video frame to obtain a target video.
8. A video face changing device based on deep learning, comprising:
the calling module is used for acquiring a pre-stored face changing model according to a face changing instruction when the face changing instruction of a video to be face changed is received;
the extraction module is used for extracting a video frame to be changed in the video to be changed;
the processing module is used for determining face changing images of the face information in the video frame to be changed through the pre-stored face changing model when the face information exists in the video frame to be changed, so as to obtain face changing images;
and the updating module is used for updating the video to be face-changed according to the face-changed image to obtain a target video.
9. A video face changing device based on deep learning, the device comprising: a memory, a processor, and a deep learning based video face changing program stored on the memory and executable on the processor, the deep learning based video face changing program being configured to implement the deep learning based video face changing method as claimed in any one of claims 1 to 7.
10. A storage medium having stored thereon a deep learning based video face changing program which, when executed by a processor, implements the deep learning based video face changing method as claimed in any one of claims 1 to 7.
CN202110754216.7A 2021-07-01 2021-07-01 Video face changing method, device, equipment and storage medium based on deep learning Pending CN113486785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110754216.7A CN113486785A (en) 2021-07-01 2021-07-01 Video face changing method, device, equipment and storage medium based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110754216.7A CN113486785A (en) 2021-07-01 2021-07-01 Video face changing method, device, equipment and storage medium based on deep learning

Publications (1)

Publication Number Publication Date
CN113486785A true CN113486785A (en) 2021-10-08

Family

ID=77940555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110754216.7A Pending CN113486785A (en) 2021-07-01 2021-07-01 Video face changing method, device, equipment and storage medium based on deep learning

Country Status (1)

Country Link
CN (1) CN113486785A (en)


Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190259384A1 (en) * 2018-02-19 2019-08-22 Invii.Ai Systems and methods for universal always-on multimodal identification of people and things
US20200302184A1 (en) * 2019-03-21 2020-09-24 Samsung Electronics Co., Ltd. Electronic device and controlling method thereof
CN110222607A (en) * 2019-05-24 2019-09-10 北京航空航天大学 The method, apparatus and system of face critical point detection
WO2021083069A1 (en) * 2019-10-30 2021-05-06 上海掌门科技有限公司 Method and device for training face swapping model
US20210152751A1 (en) * 2019-11-19 2021-05-20 Tencent Technology (Shenzhen) Company Limited Model training method, media information synthesis method, and related apparatuses
CN111243626A (en) * 2019-12-30 2020-06-05 清华大学 Speaking video generation method and system
CN111476710A (en) * 2020-04-13 2020-07-31 上海艾麒信息科技有限公司 Video face changing method and system based on mobile platform
CN111783608A (en) * 2020-06-24 2020-10-16 南京烽火星空通信发展有限公司 Face changing video detection method
CN111783603A (en) * 2020-06-24 2020-10-16 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, image face changing method and video face changing method and device
CN111950497A (en) * 2020-08-20 2020-11-17 重庆邮电大学 AI face-changing video detection method based on multitask learning model
CN111914812A (en) * 2020-08-20 2020-11-10 腾讯科技(深圳)有限公司 Image processing model training method, device, equipment and storage medium
CN112102157A (en) * 2020-09-09 2020-12-18 咪咕文化科技有限公司 Video face changing method, electronic device and computer readable storage medium
CN112163511A (en) * 2020-09-25 2021-01-01 天津大学 Method for identifying authenticity of image
CN112734631A (en) * 2020-12-31 2021-04-30 北京深尚科技有限公司 Video image face changing method, device, equipment and medium based on fine adjustment model
CN112446364A (en) * 2021-01-29 2021-03-05 中国科学院自动化研究所 High-definition face replacement video generation method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LINGZHI LI et al.: "FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping", Computer Vision and Pattern Recognition, 15 September 2020 (2020-09-15), pages 1-11 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114222077A (en) * 2021-12-14 2022-03-22 惠州视维新技术有限公司 Video processing method and device, storage medium and electronic equipment
CN115187446A (en) * 2022-05-26 2022-10-14 北京健康之家科技有限公司 Face changing video generation method and device, computer equipment and readable storage medium
CN115358916A (en) * 2022-07-06 2022-11-18 北京健康之家科技有限公司 Face-changed image generation method and device, computer equipment and readable storage medium
CN117196937A (en) * 2023-09-08 2023-12-08 天翼爱音乐文化科技有限公司 Video face changing method, device and storage medium based on face recognition model
CN117196937B (en) * 2023-09-08 2024-05-14 天翼爱音乐文化科技有限公司 Video face changing method, device and storage medium based on face recognition model

Similar Documents

Publication Publication Date Title
CN113486785A (en) Video face changing method, device, equipment and storage medium based on deep learning
CN109961009B (en) Pedestrian detection method, system, device and storage medium based on deep learning
CN108197618B (en) Method and device for generating human face detection model
CN111814905A (en) Target detection method, target detection device, computer equipment and storage medium
WO2021147221A1 (en) Text recognition method and apparatus, and electronic device and storage medium
CN112633313B (en) Bad information identification method of network terminal and local area network terminal equipment
DE102014117895A1 (en) Note-based spot-healing techniques
CN107292817B (en) Image processing method, device, storage medium and terminal
CN109963072B (en) Focusing method, focusing device, storage medium and electronic equipment
US11232561B2 (en) Capture and storage of magnified images
CN113420769A (en) Image mask recognition, matting and model training method and device and electronic equipment
CN113079273A (en) Watermark processing method, device, electronic equipment and medium
CN112381092A (en) Tracking method, device and computer readable storage medium
CN109871205B (en) Interface code adjustment method, device, computer device and storage medium
CN112532884B (en) Identification method and device and electronic equipment
US11442982B2 (en) Method and system for acquiring data files of blocks of land and of building plans and for making matches thereof
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
CN111127458A (en) Target detection method and device based on image pyramid and storage medium
CN115860026A (en) Bar code detection method and device, bar code detection equipment and readable storage medium
CN110660000A (en) Data prediction method, device, equipment and computer readable storage medium
CN112749769B (en) Graphic code detection method, graphic code detection device, computer equipment and storage medium
CN112286430B (en) Image processing method, apparatus, device and medium
CN112950167A (en) Design service matching method, device, equipment and storage medium
CN113127058A (en) Data annotation method, related device and computer program product
CN112819885A (en) Animal identification method, device and equipment based on deep learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination