CN113691833B - Virtual anchor face changing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113691833B
CN113691833B
Authority
CN
China
Prior art keywords
face
virtual anchor
target virtual
changing
anchor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010420711.XA
Other languages
Chinese (zh)
Other versions
CN113691833A (en)
Inventor
樊博 (Fan Bo)
徐祯 (Xu Zhen)
陈曦 (Chen Xi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN202010420711.XA
Priority to PCT/CN2021/078248
Publication of CN113691833A
Priority to US17/989,323
Application granted
Publication of CN113691833B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the invention disclose a virtual anchor face-changing method and device, an electronic device, and a storage medium. Historical video material of a target virtual anchor is processed through an end-to-end sequence learning model to obtain facial feature parameters of the target virtual anchor; the face of the target virtual anchor is changed using a candidate virtual face to obtain image material of the corresponding face-changed virtual anchor; the facial feature parameters of the target virtual anchor and the image material are processed through the end-to-end sequence learning model to obtain facial feature parameters of the face-changed virtual anchor; and the candidate virtual face is fused into the historical video material according to the facial feature parameters of the face-changed virtual anchor to obtain the face-changed video material. This improves the utilization of historical video material, and the face-changed virtual anchor looks real and natural, with natural and coordinated facial feature motion.

Description

Virtual anchor face changing method and device, electronic equipment and storage medium
Technical Field
The embodiments of the invention relate to video processing technology, and in particular to a virtual anchor face-changing method and device, an electronic device, and a storage medium.
Background
The artificial intelligence industry is gradually maturing, and AI-driven virtual reality technology is moving ever closer to everyday life. Virtual anchors built with artificial intelligence, whose appearance is nearly indistinguishable from a real person, now serve numerous business scenarios such as news broadcasting, virtual teachers, virtual doctors, and virtual customer service, greatly improving the efficiency of information expression and transmission. In these scenarios, how to quickly change the face of a virtual anchor in historical video material has become a new requirement.
Disclosure of Invention
The embodiments of the invention provide a virtual anchor face-changing method and device, an electronic device, and a storage medium, so as to generate high-quality face-changed video material.
In a first aspect, an embodiment of the present invention provides a virtual anchor face changing method, including:
processing historical video material of a target virtual anchor through an end-to-end sequence learning model to obtain facial feature parameters of the target virtual anchor;
changing the face of the target virtual anchor using a candidate virtual face to obtain image material of the corresponding face-changed virtual anchor;
processing the facial feature parameters of the target virtual anchor and the image material through the end-to-end sequence learning model to obtain facial feature parameters of the face-changed virtual anchor;
and fusing the candidate virtual face into the historical video material according to the facial feature parameters of the face-changed virtual anchor to obtain the face-changed video material.
Optionally, the facial feature parameters of the target virtual anchor include:
facial feature motion parameters of the target virtual anchor under different expressions; or
the facial feature motion parameters of the target virtual anchor under different expressions together with at least one of the following additional parameters: a face contour parameter of the target virtual anchor, a facial feature proportion parameter of the target virtual anchor, a gender feature parameter of the target virtual anchor, and a proportion parameter between the head of the target virtual anchor and each body part of the target virtual anchor other than the head.
Optionally, the processing of historical video material of the target virtual anchor through an end-to-end sequence learning model to obtain the facial feature parameters of the target virtual anchor includes:
mapping the historical video material through an embedding layer of the end-to-end sequence learning model to obtain original facial features of the target virtual anchor and source text features corresponding to the original facial features;
processing the original facial features and the source text features through a feed-forward Transformer of the end-to-end sequence learning model to obtain original facial feature vectors corresponding to the original facial features and first text encoding features corresponding to the source text features;
and aligning the original facial feature vectors with the first text encoding features, then performing frame splicing and decoding to obtain the facial feature motion parameters of the target virtual anchor.
Optionally, the changing of the face of the target virtual anchor using the candidate virtual face to obtain the image material of the face-changed virtual anchor includes:
determining a candidate virtual face;
and for the target virtual anchor in the historical video material, performing face fusion of the candidate virtual face with the face of the target virtual anchor to serve as the image material of the corresponding face-changed virtual anchor.
Optionally, the facial feature parameters of the face-changed virtual anchor include:
facial feature motion parameters of the face-changed virtual anchor under different expressions; or
the facial feature motion parameters of the face-changed virtual anchor under different expressions together with at least one of the following additional parameters: a face contour parameter of the face-changed virtual anchor, a facial feature proportion parameter of the face-changed virtual anchor, a gender feature parameter of the face-changed virtual anchor, and a proportion parameter between the head of the face-changed virtual anchor and each body part of the face-changed virtual anchor other than the head.
Optionally, the processing of the facial feature parameters of the target virtual anchor and the image material through the end-to-end sequence learning model to obtain the facial feature parameters of the face-changed virtual anchor includes:
mapping the image material through an embedding layer of the end-to-end sequence learning model to obtain post-face-change facial features of the face-changed virtual anchor;
processing the post-face-change facial features and the facial feature parameters of the target virtual anchor through a feed-forward Transformer of the end-to-end sequence learning model to obtain face-change facial feature vectors corresponding to the post-face-change facial features and second text encoding features corresponding to the facial feature parameters of the target virtual anchor;
and aligning the face-change facial feature vectors with the second text encoding features, then performing frame splicing and decoding to obtain the facial feature motion parameters of the face-changed virtual anchor.
Optionally, the fusing of the candidate virtual face into the historical video material according to the facial feature parameters of the face-changed virtual anchor to obtain the face-changed video material includes:
inputting the facial feature motion parameters of the face-changed virtual anchor into a muscle model bound to the face-changed virtual anchor, and driving the facial features of the face-changed virtual anchor to move, so as to obtain the face-changed video material.
In a second aspect, an embodiment of the present invention provides a virtual anchor face-changing device, including:
a first processing unit, configured to process historical video material of a target virtual anchor through an end-to-end sequence learning model to obtain facial feature parameters of the target virtual anchor;
a material generating unit, configured to change the face of the target virtual anchor using a candidate virtual face to obtain image material of the corresponding face-changed virtual anchor;
a second processing unit, configured to process the facial feature parameters of the target virtual anchor and the image material through the end-to-end sequence learning model to obtain facial feature parameters of the face-changed virtual anchor;
and a face-change processing unit, configured to fuse the candidate virtual face into the historical video material according to the facial feature parameters of the face-changed virtual anchor to obtain the face-changed video material.
Optionally, the facial feature parameters of the target virtual anchor include:
facial feature motion parameters of the target virtual anchor under different expressions; or
the facial feature motion parameters of the target virtual anchor under different expressions together with at least one of the following additional parameters: a face contour parameter of the target virtual anchor, a facial feature proportion parameter of the target virtual anchor, a gender feature parameter of the target virtual anchor, and a proportion parameter between the head of the target virtual anchor and each body part of the target virtual anchor other than the head.
Optionally, the first processing unit includes:
a first mapping unit, configured to map the historical video material through an embedding layer of the end-to-end sequence learning model to obtain original facial features of the target virtual anchor and source text features corresponding to the original facial features;
a first encoding unit, configured to process the original facial features and the source text features through a feed-forward Transformer of the end-to-end sequence learning model to obtain original facial feature vectors corresponding to the original facial features and first text encoding features corresponding to the source text features;
and a first alignment unit, configured to align the original facial feature vectors with the first text encoding features, then perform frame splicing and decoding to obtain the facial feature motion parameters of the target virtual anchor.
Optionally, the material generating unit includes:
a determining subunit, configured to determine a candidate virtual face;
and a face fusion subunit, configured to, for the target virtual anchor in the historical video material, perform face fusion of the candidate virtual face with the face of the target virtual anchor to serve as the image material of the corresponding face-changed virtual anchor.
Optionally, the facial feature parameters of the face-changed virtual anchor include:
facial feature motion parameters of the face-changed virtual anchor under different expressions; or
the facial feature motion parameters of the face-changed virtual anchor under different expressions together with at least one of the following additional parameters: a face contour parameter of the face-changed virtual anchor, a facial feature proportion parameter of the face-changed virtual anchor, a gender feature parameter of the face-changed virtual anchor, and a proportion parameter between the head of the face-changed virtual anchor and each body part of the face-changed virtual anchor other than the head.
Optionally, the second processing unit includes:
a second mapping unit, configured to map the image material through an embedding layer of the end-to-end sequence learning model to obtain the post-face-change facial features of the face-changed virtual anchor;
a second encoding unit, configured to process the post-face-change facial features and the facial feature parameters of the target virtual anchor through a feed-forward Transformer of the end-to-end sequence learning model to obtain face-change facial feature vectors corresponding to the post-face-change facial features and second text encoding features corresponding to the facial feature parameters of the target virtual anchor;
and a second alignment unit, configured to align the face-change facial feature vectors with the second text encoding features, then perform frame splicing and decoding to obtain the facial feature motion parameters of the face-changed virtual anchor.
Optionally, the face-change processing unit is specifically configured to:
input the facial feature motion parameters of the face-changed virtual anchor into a muscle model bound to the face-changed virtual anchor, and drive the facial features of the face-changed virtual anchor to move, so as to obtain the face-changed video material.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method of any implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method of any implementation of the first aspect.
The one or more technical solutions provided by the embodiments of the present invention achieve at least the following technical effects or advantages:
the face of the target virtual anchor is changed using a candidate virtual face to obtain image material of the corresponding face-changed virtual anchor; the facial feature parameters of the target virtual anchor and the image material are processed through an end-to-end sequence learning model to obtain facial feature parameters of the face-changed virtual anchor; and the candidate virtual face is fused into the historical video material according to the facial feature parameters of the face-changed virtual anchor to obtain the face-changed video material. Rather than simply pasting one face onto the historical video material, the facial feature parameters before and after the face change are combined, so that the facial feature motion of the virtual anchor in the face-changed video material is natural and coordinated. High-quality face-changed video material is thereby generated, and the utilization of video material based on the virtual anchor is improved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below illustrate only embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a virtual anchor face changing method according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of a virtual anchor face-changing device according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the invention.
Detailed Description
In the virtual anchor face-changing method and device and the electronic device provided by the embodiments of the invention, historical video material of a target virtual anchor is processed through an end-to-end sequence learning model to obtain facial feature parameters of the target virtual anchor; the face of the target virtual anchor is changed using a candidate virtual face to obtain image material of the corresponding face-changed virtual anchor; the facial feature parameters of the target virtual anchor and the image material are processed through the end-to-end sequence learning model to obtain facial feature parameters of the face-changed virtual anchor; and the candidate virtual face is fused into the historical video material according to the facial feature parameters of the face-changed virtual anchor to obtain the face-changed video material.
The technical solutions provided by the embodiments of the invention can improve the utilization of video material based on a virtual anchor while ensuring the quality of the face-changed video material.
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions provided by the embodiments are described clearly and completely below with reference to the accompanying drawings:
In a first aspect, an embodiment of the present invention provides a virtual anchor face-changing method. Referring to fig. 1, which is a flowchart of the virtual anchor face-changing method in an embodiment of the present invention, the method includes the following steps:
s101: and processing historical video materials of the target virtual anchor through an end-to-end sequence learning model to obtain the human face characteristic parameters of the target virtual anchor.
In practical applications, the historical video material may be a video segment broadcast by the target virtual anchor, for example, a 30-minute news broadcast segment or a 1-hour educational video by the target virtual anchor. The target virtual anchor is a digital human obtained by modeling a real person, and the text-driven target virtual anchor can be used for news broadcasting, virtual teachers, virtual doctors, or virtual customer service. The target virtual anchor may appear in the historical video material as a bust, a full body, or a head, whereas the virtual anchor face change of the embodiments of the present invention targets only the face or head of the target virtual anchor.
The end-to-end sequence learning model used in the embodiments of the invention may be an end-to-end text-to-speech style model, such as the FastSpeech model or a deep-neural-network-based end-to-end text-to-speech model, where FastSpeech is a sequence learning model built from a feed-forward network based on self-attention in the Transformer and one-dimensional convolution.
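For orientation only, the following is a minimal sketch of a FastSpeech-style feed-forward Transformer block combining self-attention with one-dimensional convolution, assuming a PyTorch implementation; the layer sizes and kernel width are illustrative assumptions, not values taken from the patent.

```python
import torch.nn as nn

class FFTBlock(nn.Module):
    """FastSpeech-style feed-forward Transformer (FFT) block:
    self-attention followed by a 1-D convolutional feed-forward network.
    All dimensions here are illustrative assumptions."""
    def __init__(self, d_model=256, n_heads=2, d_hidden=1024, kernel=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.conv = nn.Sequential(
            nn.Conv1d(d_model, d_hidden, kernel, padding=kernel // 2),
            nn.ReLU(),
            nn.Conv1d(d_hidden, d_model, kernel, padding=kernel // 2),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)           # residual + layer norm
        conv_out = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm2(x + conv_out)        # residual + layer norm
```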
Specifically, for the FastSpeech model, the historical video material serves as the training samples; through the training process of the FastSpeech model, the facial feature parameters of the target virtual anchor are extracted from the historical video material.
In a specific implementation, the types of facial feature parameters extracted for the target virtual anchor differ according to the face-changing mode, described respectively as follows:
Mode one: if the face of the target virtual anchor in the historical video material is changed based on a candidate virtual face whose face contour is the same as that of the target virtual anchor, or meets a preset contour similarity, only the facial feature motion parameters of the target virtual anchor under different expressions are extracted.
It should be noted that the candidate virtual face may be a digital face obtained by modeling the face of another real person, or a virtual face screened from a digital face library. In particular, the candidate virtual face has an appearance different from that of the target virtual anchor. The process by which the FastSpeech model processes the historical video material to extract the facial feature motion parameters of the target virtual anchor under different expressions is described in more detail as follows:
First, step A1 is performed: the historical video material is mapped through an embedding layer of the FastSpeech model to obtain the original facial features of the target virtual anchor and the source text features corresponding to the original facial features, where the source text features are at the phoneme level.
Then, step A2 is performed: the original facial features and the source text features from step A1 are processed through the feed-forward Transformer of the FastSpeech model to obtain original facial feature vectors corresponding to the original facial features and first text encoding features corresponding to the source text features, where the original facial feature vectors are feature representations of facial expressions and lip movements, and the first text encoding features are at the phoneme level.
Then, step A3 is performed: the original facial feature vectors are aligned with the first text encoding features using a duration predictor; after alignment, frame splicing and decoding are performed to obtain the facial feature motion parameter sequence of the target virtual anchor, that is, the facial feature motion parameters of the target virtual anchor under different expressions.
More specifically, the process of step A3 includes: the phoneme-level first text encoding vectors and the original facial feature vectors are spliced through the FastSpeech model, and the frame-level encoding features obtained by frame splicing are decoded by the decoder of the FastSpeech model to obtain the facial feature motion parameters. The frame-level encoding features also pass through a Gradient Reversal Layer (GRL) of the FastSpeech model, in which the gradient direction is automatically reversed during back propagation while an identity transformation is applied during forward propagation; the frame-level encoding features output by the gradient reversal layer are input to a motion parameter classifier to obtain the corresponding classification probabilities.
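As a rough sketch of the alignment and gradient-reversal steps just described, again assuming a PyTorch implementation with illustrative names: the duration predictor's output lets phoneme-level encodings be expanded to frame level, and the gradient reversal layer is an identity in the forward pass with a negated gradient in the backward pass.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient Reversal Layer (GRL): identity transformation in the
    forward pass, automatically reversed gradient in back propagation."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def length_regulate(phoneme_enc: torch.Tensor, durations: torch.Tensor):
    """Align phoneme-level encodings to frame level: repeat each phoneme
    encoding for the number of frames the duration predictor assigns it.
    phoneme_enc: (n_phonemes, d_model); durations: (n_phonemes,) int64."""
    return torch.repeat_interleave(phoneme_enc, durations, dim=0)

# Sketch of the flow: frame-level features go to the decoder for the motion
# parameters, and through the GRL into the motion parameter classifier.
# frame_enc = length_regulate(text_enc, predicted_durations)
# motion_params = decoder(frame_enc)
# class_probs = classifier(GradReverse.apply(frame_enc))
```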
Since the FastSpeech model is a non-autoregressive model using a feed-forward Transformer and does not explicitly depend on previous elements, the embodiments of the present invention can generate the facial feature motion parameter sequence in parallel, in contrast to sequence learning based on an encoder-attention-decoder architecture, thereby processing the historical video material efficiently.
Specifically, the facial feature motion parameter sequence comprises multiple groups of facial feature motion parameters, where each group comprises motion parameters of facial features such as the facial muscles, eyes, nose, eyebrows, and mouth.
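Purely for illustration, one group of facial feature motion parameters might be represented as follows; the field names and the idea of one group per frame are assumptions made for exposition, not the patent's actual data layout.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FacialMotionGroup:
    """One group of facial feature motion parameters (illustrative)."""
    muscles: List[float]    # facial muscle activation parameters
    eyes: List[float]       # eye motion parameters
    nose: List[float]       # nose motion parameters
    eyebrows: List[float]   # eyebrow motion parameters
    mouth: List[float]      # mouth / lip motion parameters

# A facial feature motion parameter sequence is then a list of such groups,
# e.g. one FacialMotionGroup per decoded video frame.
```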
It should be noted that the source text features in the embodiments of the present invention may include phoneme features and/or semantic features, among others. A phoneme is the smallest unit of speech divided according to the natural attributes of speech; analyzed by the pronunciation actions within a syllable, one pronunciation action constitutes one phoneme. The phoneme features correspond in sequence to the original facial features. The facial features include expression features and lip features: an expression conveys emotion, referring to the thoughts and feelings shown on the face, and expression features typically cover the entire face, while lip features are specific to the lips and are related to the text content, speech, and manner of pronunciation. Through the facial feature motion parameters, facial expressions and lip movements can therefore be made more vivid and refined.
Mode two: the face of the target virtual anchor in the historical video material is changed based on an arbitrary candidate virtual face, for example one whose gender, face shape, and so on differ from those of the target virtual anchor, which improves the face-changing effect. In this mode, in addition to the facial feature motion parameter sequence of the target virtual anchor in the historical video material, one or more of the face contour parameters, facial feature proportion parameters, gender feature parameters, and the like of the target virtual anchor are extracted. The face contour parameters, facial feature proportion parameters, and gender feature parameters may be extracted using existing methods, or by the same approach as steps A1 to A3 above; for brevity, the details are not repeated here.
Mode three: the entire head of the target virtual anchor in the historical video material, rather than only the face, is replaced based on the candidate virtual face. In this mode, in addition to the facial feature motion parameters of the target virtual anchor under different expressions, the head features of the target virtual anchor and the proportion parameters between the head of the target virtual anchor and each body part other than the head, such as the ratio between the head and neck of the target virtual anchor, also need to be extracted. These may likewise be extracted using existing methods or by the same approach as steps A1 to A3 above; for brevity, the details are not repeated here.
Step S102: the face of the target virtual anchor is changed using the candidate virtual face to obtain the image material of the corresponding face-changed virtual anchor.
Specifically, to make the face of the face-changed virtual anchor look more natural, the embodiments of the present invention may use offline processing to fuse the determined candidate virtual face with the face of the target virtual anchor to obtain the image material of the corresponding face-changed virtual anchor. Because the face-changed virtual anchor in the image material fuses the facial features of the target virtual anchor with the candidate virtual face, the post-change effect is more natural. The process of fusing a candidate virtual face with the face of the target virtual anchor is described in detail below:
First, a fusion degree α (0 ≤ α ≤ 1) is set. Face keypoint detection is performed on the candidate virtual face to obtain its face keypoints, and on the target virtual anchor in the historical video material to obtain the face keypoints of the target virtual anchor. Affine transformations are then applied to the face keypoints of the candidate virtual face and of the target virtual anchor respectively to obtain the corresponding affine face images, and the two affine face images are weighted-averaged according to the set fusion degree α to obtain the fused image, which is the image material containing the face-changed virtual anchor. The value of α determines how closely the fused face resembles the candidate virtual face versus the target virtual anchor; when α equals 0.5, the candidate virtual face and the face of the target virtual anchor are averaged.
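A minimal sketch of this fusion step, assuming an OpenCV implementation with keypoints already detected by some external detector; aligning both faces to an α-interpolated common keypoint layout is one plausible reading of the affine transformation described above, not the patent's definitive implementation.

```python
import cv2
import numpy as np

def fuse_faces(candidate_img, anchor_img, cand_pts, anchor_pts, alpha=0.5):
    """Fuse a candidate virtual face with the target virtual anchor's face.
    cand_pts / anchor_pts: (N, 2) face keypoint arrays (keypoint detection
    itself is assumed to happen elsewhere).
    alpha in [0, 1] is the fusion degree: 0.5 averages the two faces."""
    cand_pts = np.asarray(cand_pts, dtype=np.float32)
    anchor_pts = np.asarray(anchor_pts, dtype=np.float32)

    # common keypoint layout, interpolated between the two faces by alpha
    mean_pts = (1 - alpha) * anchor_pts + alpha * cand_pts

    # affine-transform each face so its keypoints map onto the common layout
    m_cand, _ = cv2.estimateAffinePartial2D(cand_pts, mean_pts)
    m_anchor, _ = cv2.estimateAffinePartial2D(anchor_pts, mean_pts)
    h, w = anchor_img.shape[:2]
    warped_cand = cv2.warpAffine(candidate_img, m_cand, (w, h))
    warped_anchor = cv2.warpAffine(anchor_img, m_anchor, (w, h))

    # weighted average of the two affine face images by the fusion degree
    return cv2.addWeighted(warped_cand, alpha, warped_anchor, 1 - alpha, 0)
```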
S103: the facial feature parameters of the target virtual anchor and the image material are processed through the end-to-end sequence learning model to obtain the facial feature parameters of the face-changed virtual anchor.
Specifically, the obtained facial feature parameters of the face-changed virtual anchor correspond to the facial feature parameters of the target virtual anchor, and therefore include: the facial feature motion parameters of the face-changed virtual anchor under different expressions; or those motion parameters together with at least one of the following additional parameters: a face contour parameter of the face-changed virtual anchor, a facial feature proportion parameter of the face-changed virtual anchor, a gender feature parameter of the face-changed virtual anchor, and a proportion parameter between the head of the face-changed virtual anchor and each body part other than the head.
The facial feature motion parameters of the face-changed virtual anchor under different expressions can be extracted through the end-to-end sequence learning model as follows:
B1, the image material is mapped through an embedding layer of the end-to-end sequence learning model to obtain the post-face-change facial features of the face-changed virtual anchor;
B2, the post-face-change facial features and the facial feature parameters of the target virtual anchor are processed through the feed-forward Transformer of the end-to-end sequence learning model to obtain face-change facial feature vectors corresponding to the post-face-change facial features and second text encoding features corresponding to the facial feature parameters of the target virtual anchor;
B3, the face-change facial feature vectors are aligned with the second text encoding features, and frame splicing and decoding are then performed to obtain the facial feature motion parameters of the face-changed virtual anchor.
For further implementation details of steps B1 to B3, reference may be made to the detailed description of steps A1 to A3; for brevity, they are not repeated here.
The face contour parameter, facial feature proportion parameter, and gender feature parameter of the face-changed virtual anchor, as well as the proportion parameter between the head of the face-changed virtual anchor and each body part other than the head, can be extracted using the prior art or an implementation similar to steps B1 to B3.
S104: the candidate virtual face is fused into the historical video material according to the facial feature parameters of the face-changed virtual anchor to obtain the face-changed video material.
Specifically, the facial feature motion parameters of the face-changed virtual anchor are input into a muscle model bound to the face-changed virtual anchor, and the facial features of the face-changed virtual anchor are driven to move, so as to obtain the face-changed video material.
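The muscle-model driving step might look roughly like the following sketch; the MuscleRig and renderer interfaces are hypothetical placeholders, since the patent does not specify an API for the bound muscle model.

```python
class MuscleRig:
    """Hypothetical muscle model bound to the face-changed virtual anchor."""
    def apply(self, motion_group):
        # map one group of facial feature motion parameters onto muscle
        # activations (eyes, eyebrows, mouth, ...); details are rig-specific
        ...

def render_face_changed_video(rig, motion_sequence, renderer):
    """Drive the bound muscle model frame by frame to produce the
    face-changed video material."""
    frames = []
    for motion_group in motion_sequence:   # one parameter group per frame
        rig.apply(motion_group)            # drive the facial features to move
        frames.append(renderer.render())   # render the current facial state
    return frames
```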
In a second aspect, based on the same inventive concept as the foregoing virtual anchor face-changing method, an embodiment of the present invention provides a virtual anchor face-changing device, as shown in fig. 2, including:
a first processing unit 201, configured to process historical video material of a target virtual anchor through an end-to-end sequence learning model to obtain facial feature parameters of the target virtual anchor;
a material generating unit 202, configured to change the face of the target virtual anchor using a candidate virtual face to obtain image material of the corresponding face-changed virtual anchor;
a second processing unit 203, configured to process the facial feature parameters of the target virtual anchor and the image material through the end-to-end sequence learning model to obtain facial feature parameters of the face-changed virtual anchor;
and a face-change processing unit 204, configured to fuse the candidate virtual face into the historical video material according to the facial feature parameters of the face-changed virtual anchor to obtain the face-changed video material.
Optionally, the facial feature parameters of the target virtual anchor include:
facial feature motion parameters of the target virtual anchor under different expressions; or
the facial feature motion parameters of the target virtual anchor under different expressions together with at least one of the following additional parameters: a face contour parameter of the target virtual anchor, a facial feature proportion parameter of the target virtual anchor, a gender feature parameter of the target virtual anchor, and a proportion parameter between the head of the target virtual anchor and each body part of the target virtual anchor other than the head.
Optionally, the first processing unit 201 includes:
a first mapping unit, configured to map the historical video material through an embedding layer of the end-to-end sequence learning model to obtain original facial features of the target virtual anchor and source text features corresponding to the original facial features;
a first encoding unit, configured to process the original facial features and the source text features through a feed-forward Transformer of the end-to-end sequence learning model to obtain original facial feature vectors corresponding to the original facial features and first text encoding features corresponding to the source text features;
and a first alignment unit, configured to align the original facial feature vectors with the first text encoding features, then perform frame splicing and decoding to obtain the facial feature motion parameters of the target virtual anchor.
Optionally, the material generating unit 202 includes:
a determining subunit, configured to determine a candidate virtual face;
and a face fusion subunit, configured to, for the target virtual anchor in the historical video material, perform face fusion of the candidate virtual face with the face of the target virtual anchor to serve as the image material of the corresponding face-changed virtual anchor.
Optionally, the facial feature parameters of the face-changed virtual anchor include:
facial feature motion parameters of the face-changed virtual anchor under different expressions; or
the facial feature motion parameters of the face-changed virtual anchor under different expressions together with at least one of the following additional parameters: a face contour parameter of the face-changed virtual anchor, a facial feature proportion parameter of the face-changed virtual anchor, a gender feature parameter of the face-changed virtual anchor, and a proportion parameter between the head of the face-changed virtual anchor and each body part of the face-changed virtual anchor other than the head.
Optionally, the second processing unit 203 includes:
a second mapping unit, configured to map the image material through an embedding layer of the end-to-end sequence learning model to obtain the post-face-change facial features of the face-changed virtual anchor;
a second encoding unit, configured to process the post-face-change facial features and the facial feature parameters of the target virtual anchor through a feed-forward Transformer of the end-to-end sequence learning model to obtain face-change facial feature vectors corresponding to the post-face-change facial features and second text encoding features corresponding to the facial feature parameters of the target virtual anchor;
and a second alignment unit, configured to align the face-change facial feature vectors with the second text encoding features, then perform frame splicing and decoding to obtain the facial feature motion parameters of the face-changed virtual anchor.
Optionally, the face-change processing unit 204 is specifically configured to:
input the facial feature motion parameters of the face-changed virtual anchor into a muscle model bound to the face-changed virtual anchor, and drive the facial features of the face-changed virtual anchor to move, so as to obtain the face-changed video material.
For specific implementation details of the virtual anchor face-changing device provided by the embodiment of the invention, reference may be made to the description of the virtual anchor face-changing method embodiment. By combining the facial feature parameters before and after the face change, the device makes the facial feature motion of the anchor in the face-changed video material natural and coordinated, generates high-quality face-changed video material, and can improve the utilization of video material based on the virtual anchor.
Fig. 3 is a block diagram illustrating an electronic device 800 for implementing virtual anchor face changing, according to an exemplary embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, or the like.
Referring to fig. 3, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing element 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various classes of data to support operations at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power component 806 provides power for the various components of device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, audio component 810 includes a Microphone (MIC) configured to receive external audio signals when apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing state assessments of various aspects of the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components such as the display and keypad of the device 800, and may also detect a change in position of the device 800 or one of its components, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in its temperature. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions executable by the processor 820 of the device 800, is also provided to perform the virtual anchor face-changing method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
The present invention also provides a non-transitory computer readable storage medium having instructions which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform all or part of the steps of the above-described method embodiments of the present invention.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes can be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A virtual anchor face-changing method, comprising:
processing historical video material of a target virtual anchor through an end-to-end sequence learning model to obtain facial feature parameters of the target virtual anchor;
setting a fusion degree parameter;
performing affine transformations on face keypoints of a candidate virtual face and face keypoints of the target virtual anchor respectively to obtain corresponding affine face images;
performing a weighted average of the two affine face images according to the fusion degree parameter to obtain a fused image, and taking the fused image as image material of a face-changed virtual anchor, the candidate virtual face and the target virtual anchor being digital faces with different appearances obtained by separate face modeling;
processing the facial feature parameters of the target virtual anchor and the image material through the end-to-end sequence learning model to obtain facial feature parameters of the face-changed virtual anchor;
and inputting facial feature motion parameters among the facial feature parameters of the face-changed virtual anchor into a muscle model bound to the face-changed virtual anchor, and driving the facial features of the face-changed virtual anchor to move, so as to obtain the face-changed video material.
2. The method of claim 1, wherein the facial feature parameters of the target virtual anchor comprise:
facial feature motion parameters of the target virtual anchor under different expressions; or
the facial feature motion parameters of the target virtual anchor under different expressions together with at least one of the following additional parameters: a face contour parameter of the target virtual anchor, a facial feature proportion parameter of the target virtual anchor, a gender feature parameter of the target virtual anchor, and a proportion parameter between the head of the target virtual anchor and each body part of the target virtual anchor other than the head.
3. The method of claim 2, wherein the processing of historical video material of a target virtual anchor through an end-to-end sequence learning model to obtain the facial feature parameters of the target virtual anchor comprises:
mapping the historical video material through an embedding layer of the end-to-end sequence learning model to obtain original facial features of the target virtual anchor and source text features corresponding to the original facial features;
processing the original facial features and the source text features through a feed-forward Transformer of the end-to-end sequence learning model to obtain original facial feature vectors corresponding to the original facial features and first text encoding features corresponding to the source text features;
and aligning the original facial feature vectors with the first text encoding features, then performing frame splicing and decoding to obtain the facial feature motion parameters of the target virtual anchor.
4. The method of claim 1, wherein the facial feature parameters of the face-changed virtual anchor comprise:
facial feature motion parameters of the face-changed virtual anchor under different expressions; or
the facial feature motion parameters of the face-changed virtual anchor under different expressions together with at least one of the following additional parameters: a face contour parameter of the face-changed virtual anchor, a facial feature proportion parameter of the face-changed virtual anchor, a gender feature parameter of the face-changed virtual anchor, and a proportion parameter between the head of the face-changed virtual anchor and each body part of the face-changed virtual anchor other than the head.
5. The method of claim 4, wherein the processing of the facial feature parameters of the target virtual anchor and the image material through the end-to-end sequence learning model to obtain the facial feature parameters of the face-changed virtual anchor comprises:
mapping the image material through an embedding layer of the end-to-end sequence learning model to obtain post-face-change facial features of the face-changed virtual anchor;
processing the post-face-change facial features and the facial feature parameters of the target virtual anchor through a feed-forward Transformer of the end-to-end sequence learning model to obtain face-change facial feature vectors corresponding to the post-face-change facial features and second text encoding features corresponding to the facial feature parameters of the target virtual anchor;
and aligning the face-change facial feature vectors with the second text encoding features, then performing frame splicing and decoding to obtain the facial feature motion parameters of the face-changed virtual anchor.
6. A virtual anchor face-changing device, comprising:
a first processing unit, configured to process historical video material of a target virtual anchor through an end-to-end sequence learning model to obtain facial feature parameters of the target virtual anchor;
a material generating unit, configured to set a fusion degree parameter; perform affine transformations on face keypoints of a candidate virtual face and face keypoints of the target virtual anchor respectively to obtain corresponding affine face images; and perform a weighted average of the two affine face images according to the fusion degree parameter to obtain a fused image, taking the fused image as image material of a face-changed virtual anchor, the candidate virtual face and the target virtual anchor being digital faces with different appearances obtained by separate face modeling;
a second processing unit, configured to process the facial feature parameters of the target virtual anchor and the image material through the end-to-end sequence learning model to obtain facial feature parameters of the face-changed virtual anchor;
and a face-change processing unit, configured to input facial feature motion parameters among the facial feature parameters of the face-changed virtual anchor into a muscle model bound to the face-changed virtual anchor and drive the facial features of the face-changed virtual anchor to move, so as to obtain the face-changed video material.
7. An electronic device comprising a memory, one or more processors, and a computer program stored on the memory and executable on the processors, wherein the processor, when executing the program, implements the method of any one of claims 1-5.
8. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1-5.
CN202010420711.XA 2020-05-18 2020-05-18 Virtual anchor face changing method and device, electronic equipment and storage medium Active CN113691833B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010420711.XA CN113691833B (en) 2020-05-18 2020-05-18 Virtual anchor face changing method and device, electronic equipment and storage medium
PCT/CN2021/078248 WO2021232878A1 (en) 2020-05-18 2021-02-26 Virtual anchor face swapping method and apparatus, electronic device, and storage medium
US17/989,323 US20230082830A1 (en) 2020-05-18 2022-11-17 Method and apparatus for driving digital human, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010420711.XA CN113691833B (en) 2020-05-18 2020-05-18 Virtual anchor face changing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113691833A CN113691833A (en) 2021-11-23
CN113691833B true CN113691833B (en) 2023-02-03

Family

ID=78575581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010420711.XA Active CN113691833B (en) 2020-05-18 2020-05-18 Virtual anchor face changing method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113691833B (en)
WO (1) WO2021232878A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114245155A (en) * 2021-11-30 2022-03-25 北京百度网讯科技有限公司 Live broadcast method and device and electronic equipment
CN115984427B (en) * 2022-12-08 2024-05-17 上海积图科技有限公司 Animation synthesis method, device, equipment and storage medium based on audio
CN115661005B (en) * 2022-12-26 2023-05-12 成都索贝数码科技股份有限公司 Custom digital person generation method and equipment

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101373014B1 (en) * 2007-09-14 2014-03-13 삼성전자주식회사 Method of controlling digital image processing apparatus for managing face, and image processing apparatus adopting the method
KR101997702B1 (en) * 2017-11-09 2019-10-01 (주)코아시아 3D simulation system for hair-styling
CN107911644B (en) * 2017-12-04 2020-05-08 吕庆祥 Method and device for carrying out video call based on virtual face expression
CN109670427B (en) * 2018-12-07 2021-02-02 腾讯科技(深圳)有限公司 Image information processing method and device and storage medium
CN110136229B (en) * 2019-05-27 2023-07-14 广州亮风台信息科技有限公司 Method and equipment for real-time virtual face changing
CN110390704B (en) * 2019-07-11 2021-02-12 深圳追一科技有限公司 Image processing method, image processing device, terminal equipment and storage medium
CN110866968A (en) * 2019-10-18 2020-03-06 平安科技(深圳)有限公司 Method for generating virtual character video based on neural network and related equipment
CN110889454A (en) * 2019-11-29 2020-03-17 上海能塔智能科技有限公司 Model training method and device, emotion recognition method and device, equipment and medium
CN111010589B (en) * 2019-12-19 2022-02-25 腾讯科技(深圳)有限公司 Live broadcast method, device, equipment and storage medium based on artificial intelligence
CN111010586B (en) * 2019-12-19 2021-03-19 腾讯科技(深圳)有限公司 Live broadcast method, device, equipment and storage medium based on artificial intelligence

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BRPI0904540A2 (en) * 2009-11-27 2011-07-12 Samsung Eletronica Da Amazonia Ltda lip movement synthesis method for virtual head animation through voice processing on handheld devices
WO2016177290A1 (en) * 2015-05-06 2016-11-10 北京蓝犀时空科技有限公司 Method and system for generating and using expression for virtual image created through free combination
CN110929553A (en) * 2018-09-19 2020-03-27 未来市股份有限公司 Method and device for generating facial expressions through data fusion and head-mounted display

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
3D Reconstruction and Facial Animation Based on Face Features; Ren Mengyuan; China Masters' Theses Full-text Database, Information Science & Technology; 2020-01-15; full text *

Also Published As

Publication number Publication date
CN113691833A (en) 2021-11-23
WO2021232878A1 (en) 2021-11-25

Similar Documents

Publication Publication Date Title
CN109637518B (en) Virtual anchor implementation method and device
CN113691833B (en) Virtual anchor face changing method and device, electronic equipment and storage medium
CN109257645B (en) Video cover generation method and device
CN110210310B (en) Video processing method and device for video processing
CN111553864B (en) Image restoration method and device, electronic equipment and storage medium
CN107944447B (en) Image classification method and device
CN110458218B (en) Image classification method and device and classification network training method and device
CN109711546B (en) Neural network training method and device, electronic equipment and storage medium
CN113689879B (en) Method, device, electronic equipment and medium for driving virtual person in real time
CN109840917B (en) Image processing method and device and network training method and device
CN110490164B (en) Method, device, equipment and medium for generating virtual expression
CN114266840A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110633470A (en) Named entity recognition method, device and storage medium
JP2024513640A (en) Virtual object action processing method, device, and computer program
CN110415702A (en) Training method and device, conversion method and device
CN112597944A (en) Key point detection method and device, electronic equipment and storage medium
CN115273831A (en) Voice conversion model training method, voice conversion method and device
CN112036174A (en) Punctuation marking method and device
CN111145080B (en) Training method of image generation model, image generation method and device
CN112613447A (en) Key point detection method and device, electronic equipment and storage medium
CN110321829A (en) A kind of face identification method and device, electronic equipment and storage medium
CN113709548B (en) Image-based multimedia data synthesis method, device, equipment and storage medium
CN111860552A (en) Model training method and device based on nuclear self-encoder and storage medium
CN113115104B (en) Video processing method and device, electronic equipment and storage medium
CN108024005B (en) Information processing method and device, intelligent terminal, server and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant