WO2023109753A1 - Method and apparatus for generating virtual character animation, and storage medium and terminal - Google Patents

Method and apparatus for generating virtual character animation, and storage medium and terminal

Info

Publication number
WO2023109753A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
user
current frame
virtual character
animation
Prior art date
Application number
PCT/CN2022/138386
Other languages
English (en)
Chinese (zh)
Inventor
张建杰
金师豪
林炳坤
柴金祥
Original Assignee
魔珐(上海)信息科技有限公司
上海墨舞科技有限公司
Application filed by 魔珐(上海)信息科技有限公司 and 上海墨舞科技有限公司
Publication of WO2023109753A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N 21/43072 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • H04N 21/439 Processing of audio elementary streams
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs

Definitions

  • the invention relates to the technical field of video animation, in particular to a method and device for generating animation of a virtual character, a storage medium, and a terminal.
  • Virtual live broadcast technology refers to the technology in which virtual characters replace live anchors for video production.
  • In existing virtual live broadcast, video production usually needs to be carried out in a specific environment (for example, a motion capture laboratory) with specific equipment (for example, expression capture equipment, motion capture equipment, etc.).
  • the technical problem solved by the present invention is to provide a method for generating animation of virtual characters with better versatility, lower cost and better user experience.
  • an embodiment of the present invention provides a method for generating animation of a virtual character, the method comprising: acquiring a current frame image, the current frame image including the user's image; determining, according to the current frame image, the state information corresponding to the user in the current frame, wherein the state information includes: face information, human body posture information and gaze direction information, and the face information includes facial posture information and facial expression information; and performing redirection processing according to the state information to obtain animation data of the virtual character, wherein the time code of the animation data is the same as that of the current frame image, and the animation data includes: facial animation data, body animation data and eyeball animation data.
  • the method further includes: determining, at least according to the animation data, video stream data corresponding to the virtual character; and sending the video stream data to a live broadcast server, so that the live broadcast server forwards the video stream data to other user terminals.
  • determining the video stream data corresponding to the virtual character includes: acquiring voice information input by the user; and synchronizing the voice information and picture information to obtain the video stream data corresponding to the virtual character, wherein the picture information is obtained by rendering the virtual character according to the animation data.
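A minimal Python sketch of such time-code-based synchronization is given below. It assumes that rendered picture frames and audio chunks each carry an integer time code; the `RenderedFrame` and `AudioChunk` structures and the pairing rule are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class RenderedFrame:          # picture information rendered from the animation data
    time_code: int            # same time code as the source camera frame
    pixels: bytes

@dataclass
class AudioChunk:             # voice information input by the user
    time_code: int
    samples: bytes

def synchronize(frames: List[RenderedFrame],
                audio: List[AudioChunk]) -> List[Tuple[RenderedFrame, AudioChunk]]:
    """Pair each rendered frame with the audio chunk carrying the same time code."""
    audio_by_tc: Dict[int, AudioChunk] = {chunk.time_code: chunk for chunk in audio}
    stream = []
    for frame in frames:
        chunk = audio_by_tc.get(frame.time_code)
        if chunk is not None:            # drop frames with no matching audio
            stream.append((frame, chunk))
    return stream
```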
  • the human body posture information includes: trunk and neck movement information, the trunk and neck movement information is used to describe the movement of the user's torso and neck, and the trunk and neck movement information is determined based on the facial posture information.
  • the body animation data includes torso and neck animation data and limb animation data,
  • and performing redirection processing according to the state information to obtain the animation data of the virtual character includes: performing redirection processing according to the torso and neck movement information to obtain the torso and neck animation data; acquiring the limb animation data selected by the user; determining whether the action corresponding to the torso and neck animation data matches the action corresponding to the limb animation data, and if not, adjusting the torso and neck animation data so that the action corresponding to the adjusted torso and neck animation data matches the action corresponding to the limb animation data; and performing fusion processing on the limb animation data and the matched torso and neck animation data to obtain the body animation data.
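The following Python sketch illustrates, under simplified assumptions, how such a match check, adjustment, and fusion could look when animation data is a mapping from joint names to joint angles. The joint names, the compatibility rule in `actions_match`, and the clamp in `adjust_torso_neck` are hypothetical placeholders rather than the claimed method.

```python
import numpy as np

def actions_match(torso_neck: dict, limbs: dict, max_lean: float = 0.6) -> bool:
    """Hypothetical compatibility rule: if the limb clip raises the arms,
    the redirected torso must not lean too far forward."""
    arms_raised = limbs.get("l_shoulder", np.zeros(3))[0] > 1.0
    spine_pitch = torso_neck.get("spine", np.zeros(3))[0]
    return not (arms_raised and spine_pitch > max_lean)

def adjust_torso_neck(torso_neck: dict, max_lean: float = 0.6) -> dict:
    """Clamp the spine pitch so the torso and neck animation matches the limb animation."""
    adjusted = {name: angles.copy() for name, angles in torso_neck.items()}
    adjusted["spine"][0] = min(adjusted["spine"][0], max_lean)
    return adjusted

def fuse_body_animation(torso_neck: dict, limbs: dict) -> dict:
    """Check the match, adjust if needed, then fuse limb and torso/neck animation data."""
    if not actions_match(torso_neck, limbs):
        torso_neck = adjust_torso_neck(torso_neck)
    body = dict(limbs)           # limb joints come from the animation selected by the user
    body.update(torso_neck)      # torso and neck joints come from the redirected capture
    return body

# Example frame: joint angles in radians (illustrative joint names and values).
torso_neck = {"spine": np.array([0.9, 0.0, 0.0]), "neck": np.array([0.1, 0.0, 0.0])}
limbs = {"l_shoulder": np.array([1.4, 0.2, 0.0]), "r_shoulder": np.array([1.3, -0.2, 0.0])}
print(fuse_body_animation(torso_neck, limbs)["spine"])   # spine pitch clamped to 0.6
```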
  • determining the state information corresponding to the user in the current frame includes: acquiring limb movement information input by the user, the limb movement information being used to describe the movement of the user's limbs; and performing fusion processing on the trunk and neck movement information and the limb movement information to obtain the human body posture information of the current frame.
  • before the fusion processing is performed on the trunk and neck movement information and the limb movement information, the method further includes: judging whether the trunk and neck movements described by the trunk and neck movement information meet the action condition, and if not, adjusting the trunk and neck movement information so that the trunk and neck movements described by the adjusted trunk and neck movement information meet the action condition; wherein the action condition is determined based on the limb movement information.
  • determining the state information corresponding to the user in the current frame according to the current frame image includes: determining the facial posture information corresponding to the user in the current frame according to the current frame image; and inputting the facial posture information corresponding to the user in the current frame into a human body posture matching model to obtain the torso and neck movement information corresponding to the user in the current frame; wherein the human body posture matching model is obtained by training a first preset model according to first training data, the first training data includes multiple pairs of first sample information, and each pair of first sample information includes: facial posture information corresponding to a sample user and torso and neck movement information corresponding to the sample user.
  • inputting the facial posture information into the human body posture matching model includes: obtaining associated posture information, the associated posture information including: facial posture information and/or torso and neck movement information corresponding to the user in associated images, wherein the associated images are consecutive multiple frames of images before the current frame image and/or consecutive multiple frames of images after the current frame image; and inputting the facial posture information corresponding to the user in the current frame and the associated posture information into the human body posture matching model to obtain the torso and neck movement information corresponding to the user in the current frame.
  • determining the state information corresponding to the user according to the current frame image includes: step A: generating a three-dimensional face model according to the initial face information corresponding to the user in the current frame; step B: determining estimated face feature information according to the three-dimensional face model, and calculating a first difference between the estimated face feature information and the target face feature information of the current frame, wherein the target face feature information is detected from the current frame image; step C: judging whether a first preset condition is met, and if yes, executing step D, otherwise executing step E; step D: using the initial face information as the face information corresponding to the user in the current frame; step E: updating the initial face information, using the updated initial face information as the initial face information corresponding to the user in the current frame, and returning to step A until the first preset condition is met;
  • wherein, when step A is executed for the first time, the initial face information corresponding to the user in the current frame is the face information corresponding to the user in the previous frame, or preset face information.
  • the gaze direction information includes a three-dimensional pupil center position, and determining the state information corresponding to the user in the current frame according to the current frame image includes: Step 1: determining a three-dimensional eyeball model according to the eye information corresponding to the user in the current frame and an estimated pupil center position, wherein the eye information includes: eyeball center position, eyeball radius and iris size; Step 2: calculating estimated eye feature information according to the three-dimensional eyeball model, and calculating a second difference between the estimated eye feature information and the target eye feature information, wherein the target eye feature information is detected from the current frame image; Step 3: judging whether a second preset condition is met, and if yes, executing Step 4, otherwise executing Step 5; Step 4: using the estimated pupil center position as the three-dimensional pupil center position corresponding to the user in the current frame; Step 5: updating the estimated pupil center position, using the updated estimated pupil center position as the estimated pupil center position corresponding to the user in the current frame, and returning to Step 1 until the second preset condition is met.
  • the human body posture information includes joint angle data of a first skeletal model, and performing redirection processing according to the state information to obtain the animation data of the virtual character includes: generating a transitional skeletal model, wherein the positions of multiple preset key joints in the transitional skeletal model are the same as the positions of the multiple preset key joints in the first skeletal model, and the skeletal shape of the transitional skeletal model is the same as that of a second skeletal model; determining the positions of the multiple preset key joints according to the joint angle data of the first skeletal model and the first skeletal model; and determining the joint angle data of the transitional skeletal model according to the positions of the multiple preset key joints and the transitional skeletal model, so as to obtain the body animation data of the virtual character; wherein the first skeletal model is the skeletal model corresponding to the user, the second skeletal model is the skeletal model of the virtual character, and the skeletal shape includes the number of bones and the default orientation of each joint's rotation.
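As an illustration of this retargeting idea, the sketch below poses a transitional skeleton so that its key joint positions reproduce those computed from the user's skeleton, solving for its joint angles by least squares. The planar joint chain, the `scipy` optimizer, and all numeric values are simplifying assumptions rather than the method actually claimed.

```python
import numpy as np
from scipy.optimize import least_squares

def fk_positions(bone_lengths: np.ndarray, joint_angles: np.ndarray) -> np.ndarray:
    """Forward kinematics of a planar joint chain: returns each joint's 2D position."""
    positions, angle, point = [], 0.0, np.zeros(2)
    for length, theta in zip(bone_lengths, joint_angles):
        angle += theta
        point = point + length * np.array([np.cos(angle), np.sin(angle)])
        positions.append(point)
    return np.array(positions)

def retarget(user_bones, user_angles, transitional_bones):
    """Solve the transitional skeleton's joint angles so that its key joint positions
    reproduce the key joint positions computed from the user's (first) skeletal model."""
    target = fk_positions(user_bones, user_angles)          # key joint positions

    def residual(angles):
        return (fk_positions(transitional_bones, angles) - target).ravel()

    return least_squares(residual, x0=user_angles).x        # joint angle data of the
                                                            # transitional skeleton

# The transitional skeleton uses the character's bone proportions but is posed to
# reach the user's key joint positions (values are illustrative).
user_bones = np.array([1.0, 0.8, 0.5])
char_bones = np.array([1.2, 0.9, 0.6])
angles = retarget(user_bones, np.array([0.4, -0.2, 0.1]), char_bones)
print(np.round(angles, 3))
```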
  • performing redirection processing according to the state information to obtain the animation data of the virtual character includes: inputting the facial expression information into an expression mapping model, wherein the expression mapping model is obtained by training a second preset model according to second training data, the second training data includes multiple sets of second sample information, and each set of second sample information includes: facial expression information of multiple sample users under preset expressions and the facial animation data of the virtual character under the preset expressions.
  • the facial animation data includes mouth animation data
  • performing redirection processing according to the state information to obtain the animation data of the virtual character includes: extracting, from the facial expression information, the expression information related to the mouth, recorded as mouth expression information; inputting the mouth expression information into a first mouth shape mapping model, wherein the first mouth shape mapping model is obtained by training a third preset model according to third training data,
  • the third training data includes multiple sets of third sample information,
  • and each set of third sample information includes: mouth expression information of multiple sample users under preset expressions and the mouth animation data of the virtual character under the preset expressions, wherein the multiple sets of third sample information correspond to different preset expressions; and acquiring the mouth animation data output by the first mouth shape mapping model.
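To make the shape of such training data concrete, the sketch below fits a plain least-squares linear map from sample users' mouth blend-shape weights to the character's mouth animation channels and uses it as a stand-in for the mouth shape mapping model. The random data, the dimensions, and the choice of a linear model are illustrative assumptions; the patent's "third preset model" is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Third training data (illustrative shapes only): for each preset expression,
# mouth-related blend-shape weights captured from sample users (inputs) and the
# corresponding mouth animation data of the virtual character (targets).
n_samples, n_user_mouth_bs, n_char_mouth_bs = 200, 20, 15
X = rng.random((n_samples, n_user_mouth_bs))        # mouth expression information
Y = rng.random((n_samples, n_char_mouth_bs))        # character mouth animation data

# Stand-in for the first mouth shape mapping model: a linear map fitted by least squares.
W, *_ = np.linalg.lstsq(np.hstack([X, np.ones((n_samples, 1))]), Y, rcond=None)

def map_mouth(mouth_expression: np.ndarray) -> np.ndarray:
    """Redirect the user's mouth expression weights to the character's mouth animation data."""
    return np.append(mouth_expression, 1.0) @ W

print(map_mouth(rng.random(n_user_mouth_bs)).shape)   # -> (15,)
```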
  • the facial animation data includes mouth animation data
  • performing redirection processing according to the state information to obtain the animation data of the virtual character includes: extracting, according to the three-dimensional face model corresponding to the user in the current frame, the three-dimensional feature points related to the mouth, recorded as the three-dimensional feature information of the mouth; and inputting the three-dimensional feature information of the mouth into a second mouth shape mapping model, wherein the second mouth shape mapping model is obtained by training a fourth preset model according to fourth training data,
  • the fourth training data includes multiple sets of fourth sample information,
  • and each set of fourth sample information includes: three-dimensional feature information of the mouth of multiple sample users under preset expressions and the mouth animation data of the virtual character under the preset expressions, wherein the multiple sets of fourth sample information correspond to different preset expressions; and acquiring the mouth animation data output by the second mouth shape mapping model.
  • the animation data further includes tooth animation data
  • performing redirection processing according to the state information to obtain the animation data of the virtual character further includes: determining the tooth animation data according to the mouth animation data .
  • the gaze direction information is the zenith angle and azimuth angle of the three-dimensional pupil center position in a spherical coordinate system with the eyeball center position as the coordinate origin, and performing redirection processing according to the state information to obtain the animation data of the virtual character includes: determining the virtual pupil position according to the eyeball radius of the virtual character and the gaze direction information, so as to obtain the eyeball animation data, wherein the virtual pupil position is the three-dimensional pupil center position of the virtual character.
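A compact sketch of this step follows: the captured gaze direction is kept, and the pupil is simply re-placed on the virtual character's own (differently sized) eyeball. The zenith-from-the-z-axis spherical convention and the numeric values are assumptions made only for the example.

```python
import numpy as np

def virtual_pupil_position(eye_center: np.ndarray, eye_radius: float,
                           zenith: float, azimuth: float) -> np.ndarray:
    """Place the virtual character's pupil on its own eyeball sphere along the
    captured gaze direction (zenith/azimuth about the eyeball center)."""
    direction = np.array([
        np.sin(zenith) * np.cos(azimuth),
        np.sin(zenith) * np.sin(azimuth),
        np.cos(zenith),
    ])
    return eye_center + eye_radius * direction

# Example: the user looks slightly to the side; the character's eyeball is larger.
print(virtual_pupil_position(np.array([0.0, 0.0, 0.0]), 1.4,
                             zenith=np.pi / 2, azimuth=0.2))
```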
  • performing redirection processing according to the state information to obtain the animation data of the virtual character includes: inputting the three-dimensional pupil center position corresponding to the user in the current frame into an eye mapping model, wherein the eye mapping model is obtained by training a fifth preset model according to fifth training data, the fifth training data includes multiple pairs of fifth sample information, and each pair of fifth sample information includes the sample user's three-dimensional pupil center position and the three-dimensional pupil center position of the virtual character under a preset gaze direction; and acquiring the virtual pupil center position output by the eye mapping model to obtain the eyeball animation data, wherein the virtual pupil position is the three-dimensional pupil center position of the virtual character.
  • the current frame image is collected by a single camera.
  • An embodiment of the present invention also provides an animation generation device for a virtual character
  • the device includes: an image acquisition module, configured to acquire a current frame image, the current frame image including the user's image; a calculation module, configured to determine, according to the current frame image, the state information corresponding to the user in the current frame, the state information including: face information, human body posture information and gaze direction information, and the face information including facial posture information and facial expression information; and a redirection module, configured to perform redirection processing according to the state information to obtain the animation data of the virtual character, wherein the time code of the animation data is the same as that of the current frame image, and the animation data includes: facial animation data, body animation data and eyeball animation data.
  • An embodiment of the present invention also provides a storage medium on which a computer program is stored, and when the computer program is run by a processor, the steps of the above-mentioned method for generating animation of a virtual character are executed.
  • An embodiment of the present invention also provides a terminal, including a memory and a processor, the memory stores a computer program that can run on the processor, and the processor executes the steps of the above virtual character animation generation method when running the computer program.
  • In the solution of the embodiment of the present invention, the current frame image is obtained, and the state information corresponding to the user in the current frame is determined according to the current frame image. Since the state information includes face information, human body posture information, and gaze direction information, the animation data of the virtual character obtained according to the state information can have the same semantics as the state information corresponding to the user.
  • In addition, the user does not need to wear specific motion capture clothing or a specific helmet; information such as the user's expression, facial posture, action posture and gaze can be obtained from a single frame of image alone, and the animation of the virtual character is then obtained through redirection processing according to the state information, so the solution provided by the embodiment of the present invention has better versatility, lower cost and better user experience.
  • Further, the trunk and neck movement information is obtained based on the facial posture information, so the amount of calculation is smaller, and the efficiency of animation generation can be improved while ensuring the animation effect.
  • Further, the human body posture matching model is a time-series model, and the facial posture information and associated posture information corresponding to the user in the current frame can be input into the human body posture matching model to obtain the torso and neck movement information corresponding to the user in the current frame.
  • Adopting such a scheme is beneficial to avoid inaccurate torso and neck movement information caused by shaking of the user's facial posture in a single frame image, and can make the torso and neck posture described by the torso and neck movement information more coherent and smooth, thereby making the animation of the virtual character more coherent without additional smoothing.
  • FIG. 1 is a schematic diagram of an application scene of a virtual character animation generation method in a first perspective in an embodiment of the present invention
  • Fig. 2 is a schematic diagram of an application scene of a virtual character animation generation method in a second perspective in an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a method for generating animation of a virtual character in an embodiment of the present invention
  • FIG. 4 is a partial flow diagram of a specific implementation manner of step S302 in FIG. 3;
  • FIG. 5 is a partial flow diagram of another specific implementation manner of step S302 in FIG. 3;
  • FIG. 6 is a partial flow diagram of a specific implementation manner of step S303 in FIG. 3;
  • Fig. 7 is another schematic diagram of an application scene of a virtual character animation generation method in a first perspective in an embodiment of the present invention.
  • Fig. 8 is a schematic structural diagram of an animation generating device for a virtual character in an embodiment of the present invention.
  • an embodiment of the present invention provides a method for generating animation of a virtual character.
  • In the solution of the embodiment of the present invention, the current frame image is obtained, and the state information corresponding to the user in the current frame is determined according to the current frame image. Since the state information includes face information, human body posture information and gaze direction information, the animation data of the virtual character obtained according to the state information can have the same semantics as the state information corresponding to the user.
  • In addition, the user does not need to wear specific motion capture clothing or a specific helmet; information such as the user's expression, facial posture, action posture and gaze can be obtained from a single frame of image alone, and the animation of the virtual character is then obtained through redirection processing according to the state information, so the solution provided by the embodiment of the present invention has better versatility, lower cost and better user experience.
  • FIG. 1 is a schematic diagram of an application scene of a virtual character animation generation method in an embodiment of the present invention in a first perspective.
  • FIG. 2 is a schematic diagram of an application scene of a virtual character animation generation method in a second perspective in an embodiment of the present invention.
  • FIG. 7 is another schematic diagram of the application scenario of a virtual character animation generation method in the embodiment of the present invention in the first perspective.
  • the first viewing angle is different from the second viewing angle.
  • a camera 11 may be used to take pictures of a user 10 .
  • the user 10 is the subject of the camera 11, and the user 10 is a real actor. It should be noted that, compared with the prior art, in the solution of the embodiment of the present invention, the user 10 does not need to wear motion capture clothing, and does not need to wear expression capture devices and eye capture devices.
  • the camera 11 may be various existing appropriate photographing devices, and this embodiment does not limit the type and quantity of the camera 11 .
  • The camera 11 can be an RGB camera (R for red, G for green, B for blue), or an RGBD camera (D for depth map). That is, the image captured by the camera 11 may be an RGB image, an RGBD image, etc., but is not limited thereto.
  • the camera 11 shoots the user 10, and the video stream data corresponding to the user 10 can be obtained.
  • the video stream data corresponding to the user 10 can include multiple frames of images, each frame of image has a time code, and each frame of image can include an image of the user 10.
  • the distance between the user 10 and the camera 11 is less than a first preset distance threshold, and the image may include an image of the face of the user 10 , and may also include images of the neck and shoulders of the user 10 .
  • the distance between the user 10 and the camera 11 is generally small, therefore, the image may not include the image of the whole body of the user 10 .
  • the camera 11 in the embodiment of the present invention is not set on a wearable device of the user 10, and the distance between the user 10 and the camera 11 is greater than a second preset distance threshold, where the second preset distance threshold is usually much smaller than the first preset distance threshold.
  • the camera 11 can be connected to the terminal 12, and the terminal 12 can be various existing devices with data receiving and data processing functions, and the camera 11 can send the collected video stream data corresponding to the user 10 to the terminal 12.
  • the terminal 12 may be a mobile phone, a tablet computer, a computer, etc., but is not limited thereto. It should be noted that this embodiment does not limit the connection mode between the camera 11 and the terminal 12, which may be a wired connection or a wireless connection (for example, a Bluetooth connection, a LAN connection, etc.). More specifically, the camera 11 may be a camera set on the terminal 12, for example, may be a camera on a mobile phone, a camera on a computer, and the like.
  • the terminal 12 may sequentially process and analyze each frame of the video stream data corresponding to the user 10 collected by the camera 11 according to the sequence of the time code, so as to obtain the status information corresponding to the user 10 . Furthermore, redirection processing can be performed according to the state information corresponding to the user 10 in each frame of image to obtain the animation data of the virtual character 13 corresponding to the frame of image, and the obtained animation data has the same time code as the image.
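A minimal Python sketch of this per-frame processing loop is shown below, using OpenCV to read from a single camera. The `estimate_state` and `redirect` functions are hypothetical placeholders for the state estimation and redirection steps described in this document.

```python
import cv2  # OpenCV is assumed to be installed

def estimate_state(frame):
    """Placeholder for determining face / body posture / gaze information from the frame."""
    return {"frame_shape": frame.shape}

def redirect(state):
    """Placeholder for redirection processing that produces the character's animation data."""
    return {"state": state}

def run_pipeline(camera_index: int = 0):
    """Process each frame of the single-camera stream in time-code order."""
    capture = cv2.VideoCapture(camera_index)
    time_code = 0
    try:
        while capture.isOpened():
            ok, frame = capture.read()           # current frame image containing the user
            if not ok:
                break
            animation = redirect(estimate_state(frame))
            animation["time_code"] = time_code   # animation shares the frame's time code
            time_code += 1
    finally:
        capture.release()

if __name__ == "__main__":
    run_pipeline()
```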
  • the virtual character 13 may include a virtual person, and may also include virtual animals, virtual plants and other objects with faces and bodies.
  • the virtual character 13 may be three-dimensional or two-dimensional, which is not limited in this embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a method for generating animation of a virtual character in an embodiment of the present invention.
  • the method can be executed by a terminal, and the terminal can be various terminal devices capable of receiving and processing data, for example, a mobile phone, a computer, a tablet computer, etc., which is not limited in the embodiments of the present invention.
  • the terminal may be the terminal 12 shown in FIG. 1 , but is not limited thereto.
  • the animation generation method of the virtual character shown in Fig. 3 may comprise the following steps:
  • Step S301 Acquiring the current frame image, the current frame image includes the user's image
  • Step S302 According to the current frame image, determine the state information corresponding to the user in the current frame, the state information includes: face information, human body posture information and gaze direction information, and the face information includes facial posture information and facial expression information;
  • Step S303 Perform redirection processing according to the state information to obtain the animation data of the virtual character, wherein the time code of the animation data is the same as that of the current frame image, and the animation data includes: facial animation data, body animation data and eyeball animation data.
  • the method can be implemented in the form of a software program, and the software program runs in a processor integrated inside the chip or chip module; or, the method can be implemented by using hardware or a combination of hardware and software way to achieve.
  • the current frame image may be acquired, and the current frame image may be obtained by taking pictures of the user by the camera. More specifically, the current frame image may be an image currently to be processed in the video stream data corresponding to the user, and the time code of the current frame image may be recorded as the current moment.
  • the video stream data corresponding to the user may be obtained by using a camera to shoot the user.
  • the video stream data corresponding to the user is collected by a single camera, and the camera may be an RGB camera or an RGBD camera, but is not limited thereto.
  • the current frame image includes the image of the user.
  • the current frame image may include an image of the user's face, may also include images of the user's neck and shoulders, may also include images of at least a part of the arm, etc., but is not limited thereto.
  • the state information corresponding to the current frame user may be determined according to the current frame image, and the state information may include: face information, human body posture information and gaze direction information.
  • the state information corresponding to the user may be obtained by restoring and reconstructing the user according to the current frame image.
  • the face information includes facial posture information and facial expression information, wherein the facial posture information is used to describe the position and orientation of the user's face, and more specifically, the position and orientation of the user's face refer to The position and orientation of the user's face in three-dimensional space.
  • the position of the user's face may be the position of the user's face relative to the camera
  • the orientation of the user's face may be the orientation of the user's face relative to the camera.
  • the face information may also include: ID information, which is used to describe the shape of the user's face and the distribution of facial features.
  • the facial expression information can be used to describe the user's expression.
  • the facial expression information can be the weights of multiple blend shapes (Blend Shapes), where the multiple blend shapes can be preset.
  • the human body posture information can be used to describe the motion posture of the user's body.
  • the human body posture information may be joint angle data, and more specifically, the joint angle data is the angle of a joint.
  • the gaze direction information may be used to describe the user's gaze direction.
  • the direction from the center of the eyeball to the center of the three-dimensional pupil is the gaze direction.
  • the center position of the eyeball is the position of the center point of the eyeball
  • the three-dimensional pupil center position is the position of the center point of the pupil. Since the center position of the iris coincides with the center position of the three-dimensional pupil, the specific position of the iris on the eyeball is determined according to the center position of the three-dimensional pupil, so the iris will move with the change of the center position of the three-dimensional pupil, but the iris size of the same user Can be fixed.
  • the iris size is the size of the iris, and the iris size can be used to determine the coverage area of the iris in the eyeball.
  • the gaze direction information may be the three-dimensional pupil center position. More specifically, the gaze direction information may be the zenith angle and azimuth angle of the three-dimensional pupil center position in a spherical coordinate system with the eyeball center position as the coordinate origin. Specifically, the three-dimensional pupil center position can be expressed in spherical coordinates (r, θ, φ), where r is the radius of the three-dimensional eyeball, θ is the zenith angle, and φ is the azimuth angle.
  • The zenith angle θ and azimuth angle φ characterize the direction of the ray from the eyeball center position to the three-dimensional pupil center position, so the zenith angle θ and azimuth angle φ in the spherical coordinates of the three-dimensional pupil center position can be used to indicate the gaze direction.
  • FIG. 4 is a partial flowchart of a specific implementation manner of step S302 in FIG. 3 .
  • the face information corresponding to the user in the current frame can be obtained, more specifically, the facial pose information and facial expression information corresponding to the user in the current frame can be obtained.
  • Step S302 shown in FIG. 4 may include the following steps:
  • Step S401 Generate a 3D face model according to the initial face information corresponding to the user in the current frame;
  • Step S402 According to the three-dimensional face model, determine estimated face feature information, and calculate a first difference between the estimated face feature information and the target face feature information of the current frame;
  • Step S403 Determine whether the first preset condition is met; if yes, execute step S404, otherwise execute step S405;
  • Step S404 use the initial face information as the face information corresponding to the current frame user;
  • Step S405 update the initial face information, and use the updated initial face information as the initial face information corresponding to the user in the current frame; and return to step S401 until the first preset condition is met.
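The loop of steps S401-S405 is an analysis-by-synthesis optimization. The sketch below reproduces its control flow with the whole "synthesize and project" stage collapsed into one fixed linear map; the dimensions, thresholds, step size, and the gradient-style update are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PARAMS, N_FEATURES = 10, 68 * 2               # toy sizes: face parameters, 2D feature coords
PROJECTION = rng.standard_normal((N_FEATURES, N_PARAMS))

def estimated_face_features(face_params: np.ndarray) -> np.ndarray:
    """Stand-in for steps S401-S402: synthesize the face and project it to 2D features
    (the whole model is collapsed into one fixed linear map for illustration)."""
    return PROJECTION @ face_params

def fit_face(target_features, init_params, max_updates=200, tol=1e-6, step=2e-3):
    """Steps S401-S405: iteratively update the initial face information until the first
    difference is small enough or the number of updates reaches its limit."""
    params = np.array(init_params, dtype=float)
    for _ in range(max_updates):
        residual = estimated_face_features(params) - target_features
        if np.linalg.norm(residual) <= tol:         # first preset condition met -> step S404
            break
        params -= step * PROJECTION.T @ residual    # step S405: update the face information
    return params

true_params = rng.standard_normal(N_PARAMS)
target = estimated_face_features(true_params)       # stands in for detected target features
recovered = fit_face(target, init_params=np.zeros(N_PARAMS))
print(np.linalg.norm(estimated_face_features(recovered) - target))   # small residual
```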
  • the initial face information corresponding to the user in the current frame may be a preset default value, or may be the face information corresponding to the user in the previous frame.
  • Since the average value calculated based on the identity ID information of multiple sample users is universal, it can be used as the default value of the identity ID information in the initial face information of the user in the current frame image; the default value of the facial posture information can be a preset position and orientation of the user's face; and the default value of the facial expression information may be the user's facial expression information under a neutral expression, which may be pre-collected.
  • the "user” in the embodiment of the present invention refers to the user in the current frame image
  • the “multiple sample users” refers to the users or performers involved in preparatory work such as training data collection, before the camera is used to collect the video stream data.
  • When step S401 is executed for the first time, the face information corresponding to the user in the previous frame can also be used as the initial face information corresponding to the user in the current frame, which is beneficial to reduce the amount of calculation and makes the facial animation data of the obtained virtual character smoother without additional smoothing.
  • step S401 When step S401 is executed again, that is, when returning from step S405 to step S401, the initial face information corresponding to the user in the current frame may be the updated initial face information.
  • a three-dimensional face model can be synthesized according to the initial face information corresponding to the user in the current frame.
  • the 3D face model in step S401 is obtained based on the initial face information corresponding to the user in the current frame, not based on the image in the current frame.
  • the embodiment of the present invention does not limit the specific method for synthesizing a three-dimensional face model according to the initial face information (identity ID information, facial posture information, and facial expression information) corresponding to the user in the current frame; various existing methods capable of synthesizing a 3D face model may be used.
  • estimated face feature information may be calculated according to the three-dimensional face model obtained in step S401.
  • the estimated face feature information is the face feature information obtained according to the three-dimensional face model, and the estimated face feature information may include: two-dimensional projection point coordinate information and texture feature point coordinate information.
  • multiple 3D feature points may be extracted from the 3D face model, and then the multiple 3D feature points are projected onto a 2D plane to obtain multiple 2D projection points.
  • the two-dimensional plane refers to the plane of the image coordinate system of the camera.
  • multiple vertices are extracted from the 3D human face model according to multiple predefined vertex indices to obtain multiple 3D feature points. That is, the 3D feature points are vertices determined on the 3D face model based on the predefined vertex indices.
  • each vertex index is used to refer to a specific facial part, and different vertex indexes refer to different facial parts.
  • vertex index 3780 is used to refer to the tip of the nose, etc.
  • the 3D face model may include multiple vertices, and the vertices corresponding to the multiple vertex indices may be extracted to obtain multiple 3D feature points.
  • multiple 3D feature points may be projected onto a 2D plane, so as to convert the 3D coordinates of each 3D feature point into the 2D coordinates of the 2D projected point corresponding to the 3D feature point.
  • estimated face feature information can be obtained, that is, the estimated face feature information can include two-dimensional coordinates of multiple two-dimensional projection points.
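The projection from 3D feature points to 2D projection points can be pictured with a simple pinhole model, as in the sketch below; the intrinsic parameters (fx, fy, cx, cy) and the example vertices are made-up values, and a real implementation would use the camera's calibrated intrinsics.

```python
import numpy as np

def project_to_image(points_3d: np.ndarray, fx: float, fy: float,
                     cx: float, cy: float) -> np.ndarray:
    """Project 3D feature points (camera coordinates, Z > 0) onto the image plane
    with a simple pinhole model, giving 2D projection points."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.stack([u, v], axis=1)

# Example: three vertices selected from the 3D face model by predefined vertex indices.
vertices = np.array([[0.01, -0.02, 0.50],    # e.g. tip of the nose
                     [0.03,  0.01, 0.52],
                     [-0.02, 0.02, 0.51]])
print(project_to_image(vertices, fx=800.0, fy=800.0, cx=320.0, cy=240.0))
```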
  • a first difference between the estimated face feature information and the target face feature information may be calculated, where the target face feature information is detected according to the current frame image.
  • the feature information of the target face may include: coordinate information of two-dimensional feature points, where the two-dimensional feature points are points with specific semantic information in the current frame image.
  • a machine learning method may be used to detect the current frame image to detect multiple two-dimensional feature points.
  • the semantic information is predefined, and the semantic information can be used to describe the facial parts corresponding to the two-dimensional feature points.
  • the semantic information of the 2D feature point No. 64 is: nose point.
  • the facial parts described by the semantic information of the plurality of two-dimensional feature points are the same as the facial parts referred to by the plurality of vertex indices.
  • the estimated face feature information may also include texture feature point coordinate information
  • the target face feature information may also include pixel coordinates corresponding to texture feature points.
  • Specifically, the two-dimensional texture coordinates (u, v) corresponding to a pixel point are determined according to that pixel point in the current frame image,
  • and the three-dimensional texture point corresponding to the pixel point on the three-dimensional face model can then be determined according to a predefined texture mapping relationship. That is, different from the above three-dimensional feature points,
  • three-dimensional texture points are vertices determined on the three-dimensional face model according to the predefined texture mapping relationship.
  • multiple three-dimensional texture points may be projected onto a two-dimensional plane to obtain two-dimensional coordinates of corresponding texture feature points. Furthermore, the coordinate difference between the pixel point and the corresponding texture feature point can be calculated.
  • the first difference can be calculated according to the coordinate difference between the pixel point and the corresponding texture feature point, and the coordinate difference between the two-dimensional feature point and the two-dimensional projection point.
  • the embodiment of the present invention does not limit the order of detecting the feature information of the target face and determining the feature information of the estimated face.
  • a first difference between the estimated face feature information and the target face feature information may be calculated. More specifically, coordinate differences between a plurality of two-dimensional projected points and a plurality of two-dimensional feature points may be calculated.
  • In step S403, it may be judged whether the first preset condition is met, wherein the first preset condition may include: the first difference is not greater than a first preset threshold, and/or the number of times the initial face information has been updated reaches a second preset threshold.
  • If the first preset condition is met, step S404 may be executed, that is, the initial face information corresponding to the user in the current frame may be used as the face information corresponding to the user in the current frame.
  • When the first preset condition is met, it can be determined that the three-dimensional face model in step S401 conforms to the user's real face; in other words, the initial face information in step S401 can accurately and truly describe the user's facial posture, facial expression, etc. in the current frame image.
  • If the first preset condition is not met, step S405 can be executed, that is, the initial face information is updated, and steps S401-S403 continue to be executed according to the updated initial face information until the first preset condition is met.
  • only facial pose information and facial expression information may be updated each time initial face information is updated, that is, user ID information is not updated.
  • the ID information of the user may be predetermined. Since the user in the application scenario of this embodiment is usually fixed, that is, in the process of recording video, the object captured by the camera is usually the same person, the user's ID information can be fixed, that is, predetermined ID information can be used. Adopting this solution can simplify the calculation process of face information and help improve the efficiency of animation generation.
  • the identity ID information of the user may be determined before obtaining the video stream data corresponding to the user.
  • Specifically, a plurality of identity images can be obtained, each identity image including an image of the user, wherein the user's expression in each identity image is the default expression and the user's facial posture (that is, the position and/or orientation of the face) can be different.
  • A preset initial 3D face model can be iteratively optimized based on the multiple identity images to obtain the user's identity ID parameters, and the identity ID parameters obtained based on the multiple identity images can be used in the subsequent process of generating the animation of the virtual character.
  • the preset initial 3D face model refers to a 3D face model constructed with preset default values according to identity ID parameters, facial posture information and facial expression information, that is, the preset initial 3D face model The model is the initial model without any optimization and adjustment, and the default expression can be a neutral expression.
  • the human body posture information corresponding to the user in the current frame may also be determined.
  • the human body posture information may be obtained by directly constructing a three-dimensional human body model from an image, or may be obtained through calculation based on facial posture information, which is not limited in this embodiment.
  • the human body posture information may include: trunk and neck movement information.
  • the torso and neck action information is used to describe the action posture of the user's torso and neck.
  • the movement information of the trunk and neck may include joint angle data of a plurality of first preset joints, and the first preset joints are joints located on the trunk and the neck.
  • the torso and neck movement information corresponding to the user in the current frame is calculated based on the facial posture information corresponding to the user in the current frame.
  • the facial posture information corresponding to the user in the current frame can be input into the human body posture matching model to obtain the torso and neck movement information corresponding to the user in the current frame.
  • the human body posture matching model may be obtained by training a first preset model according to the first training data, wherein the first preset model may be various existing models with learning capabilities.
  • the first training data may include multiple pairs of first sample information, and each pair of first sample information includes facial posture information corresponding to the sample user and torso and neck movement information corresponding to the sample user. More specifically, each pair of first sample information is obtained by performing motion capture on the sample user, and there is a corresponding relationship between facial posture information and torso and neck motion information belonging to the same pair of first sample information.
  • the multiple pairs of first sample information may be obtained by performing motion capture on the same sample user, or may be obtained by performing motion capture on multiple first sample users. It should be noted that the sample users in this embodiment of the present invention are real people.
  • the human body posture matching model trained by using the first training data can learn the relationship between the position and orientation of the real person's face and the posture of the real person's torso and neck. Therefore, the torso and neck movement information output by the human body pose matching model is real and natural, that is, the user's overall posture presented by the output torso and neck movement information and the input facial posture information is real and natural.
  • the calculation amount is smaller, and the efficiency of animation generation can be improved under the premise of ensuring the animation effect.
  • the torso and neck movement information corresponding to the user in the current frame is calculated based on the facial posture information and associated posture information corresponding to the user in the current frame. More specifically, the input of the human pose matching model is the facial pose information corresponding to the user in the current frame and the associated pose information corresponding to the user in the current frame, and the corresponding output is the torso and neck movement information corresponding to the user in the current frame.
  • the associated posture information includes: the facial posture information and/or the torso and neck movement information corresponding to the user in the associated image, and the associated image is a continuous multi-frame image before the current frame image and/or the current frame image. Consecutive multiple frames of images after a frame image.
  • the time code of the current frame image is recorded as t1
  • the associated posture information may include the facial posture information corresponding to the user in multiple consecutive images with time codes from t1-T to t1-1, where T is a positive integer .
  • the associated gesture information may also include the user's corresponding torso and neck movement information in multiple consecutive images with time codes from t1-T to t1-1.
  • the associated posture information may include facial posture information and torso and neck motion information corresponding to the user in the 30 frame images adjacent to the current frame image and before the current frame image.
  • the associated posture information may also include facial posture information corresponding to the user in images with time codes from t1+1 to t1+T, and may also include torso information corresponding to users in images with time codes from t1+1 to t1+T Neck movement information.
  • the associated posture information may also include facial posture information and torso and neck motion information corresponding to the user in the 30 frame images adjacent to the current frame image and after the current frame image.
  • More specifically, the human body posture matching model can be a time-series model, and the facial posture information and associated posture information corresponding to the user in the current frame can be input into the human body posture matching model to obtain the torso and neck movement information corresponding to the user in the current frame.
  • Adopting such a scheme is beneficial to avoid inaccurate torso and neck movement information caused by shaking of the user's facial posture in a single frame image, and can make the torso and neck posture described by the torso and neck movement information more coherent and smooth, thereby making the animation of the virtual character flow more smoothly without additional smoothing.
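As an illustration of what a time-series posture matching model might look like, the sketch below concatenates the current facial pose with the T previous frames' facial poses and fits a least-squares linear map to torso and neck joint angles. The window length, dimensions, synthetic data, and the linear model itself are assumptions for illustration; the patent does not specify the "first preset model".

```python
import numpy as np

rng = np.random.default_rng(1)
T = 30                      # number of associated (previous) frames in the window
FACE_DIM, TORSO_DIM = 6, 9  # facial pose (position + orientation), torso/neck joint angles

def window_features(face_poses: np.ndarray, t: int) -> np.ndarray:
    """Concatenate the current frame's facial pose with the T preceding frames' poses."""
    return face_poses[t - T:t + 1].ravel()

# First training data (synthetic here): facial poses and torso/neck angles of a sample user.
n_frames = 500
face_poses = rng.standard_normal((n_frames, FACE_DIM))
torso_neck = rng.standard_normal((n_frames, TORSO_DIM))

X = np.array([window_features(face_poses, t) for t in range(T, n_frames)])
Y = torso_neck[T:]
W, *_ = np.linalg.lstsq(X, Y, rcond=None)      # stand-in for the posture matching model

def predict_torso_neck(face_poses: np.ndarray, t: int) -> np.ndarray:
    """Torso and neck movement information for frame t from current + associated facial poses."""
    return window_features(face_poses, t) @ W

print(predict_torso_neck(face_poses, t=100).shape)   # -> (9,)
```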
  • the human body posture information may further include: limb movement information, and the limb movement information may be used to describe the movement posture of the user's limbs.
  • limb motion information can be used to describe the motion of a user's arm.
  • the limb movement information may include joint angle data of a plurality of second preset joints, and the plurality of second preset joints are joints located in the limbs. More specifically, the plurality of second preset joints may include arm joints.
  • the limb movement information may be a preset default value, for example, the arm movement represented by the default value of the limb movement information may be natural drooping, etc., but it is not limited thereto.
  • the limb movement information may also be input by the user.
  • fusion processing may be performed on the torso and neck movement information and the limb movement information to obtain human body posture information of the current frame.
  • Specifically, it may be judged whether the trunk and neck movements described by the trunk and neck movement information meet the action condition, and if not, the trunk and neck movement information is adjusted so that the trunk and neck movements described by the adjusted trunk and neck movement information meet the action condition.
  • the action condition is determined according to the action information of the limbs.
  • If the trunk and neck movement information satisfies the action condition, it can be determined that the overall body posture presented by the limb actions described by the limb movement information and the trunk and neck actions described by the trunk and neck movement information is reasonable and real. If the trunk and neck movement information does not satisfy the action condition, it can be determined that the overall body posture presented by the limb actions described by the limb movement information and the trunk and neck actions described by the trunk and neck movement information is unreasonable.
  • In other words, the action condition is that the action posture of the trunk and neck matches the action posture of the limbs described by the limb movement information; otherwise, the trunk and neck posture is inconsistent with the limb posture.
  • FIG. 5 is a partial flow diagram of another specific implementation manner in step S302 .
  • the gaze direction information corresponding to the user in the current frame can be obtained.
  • Step S302 shown in FIG. 5 may include the following steps:
  • Step S501 Determine the three-dimensional eyeball model according to the eye information corresponding to the user in the current frame and the estimated pupil center position;
  • Step S502 Determine estimated eye feature information according to the three-dimensional eyeball model, and calculate a second difference between the estimated eye feature information and the target eye feature information;
  • Step S503 judge whether the second preset condition is met; if yes, execute step S504; otherwise, execute step S505;
  • Step S504 Use the estimated pupil center position as the three-dimensional pupil center position corresponding to the user in the current frame;
  • Step S505 update the estimated pupil center position, and use the updated estimated pupil center position as the estimated pupil center position corresponding to the user in the current frame; and return to step S501 until the second preset condition is met.
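The loop of steps S501-S505 can be sketched in the same analysis-by-synthesis style: a candidate pupil center on the eyeball sphere is projected to 2D and compared with the detected pupil position, and the estimate is updated until the second difference is small enough. The pinhole projection, the grid search used as the update rule, and all numbers below are illustrative assumptions.

```python
import numpy as np

def pupil_3d(eye_center, eye_radius, zenith, azimuth):
    """Pupil center on the eyeball sphere for a candidate gaze direction."""
    d = np.array([np.sin(zenith) * np.cos(azimuth),
                  np.sin(zenith) * np.sin(azimuth),
                  np.cos(zenith)])
    return eye_center + eye_radius * d

def project(point_3d, fx=800.0, cx=320.0, cy=240.0):
    """Toy pinhole projection of the 3D pupil center to a 2D pupil position."""
    return np.array([fx * point_3d[0] / point_3d[2] + cx,
                     fx * point_3d[1] / point_3d[2] + cy])

def fit_gaze(target_pupil_2d, eye_center, eye_radius, n_steps=200, tol=0.5):
    """Steps S501-S505: update the estimated pupil center (parameterized by zenith/azimuth)
    until the second difference to the detected pupil position is small enough."""
    best, best_diff = (0.0, 0.0), np.inf
    for zenith in np.linspace(0.1, np.pi - 0.1, n_steps):       # coarse grid search (illustrative)
        for azimuth in np.linspace(-np.pi, np.pi, n_steps):
            estimated = project(pupil_3d(eye_center, eye_radius, zenith, azimuth))
            diff = np.linalg.norm(estimated - target_pupil_2d)  # second difference
            if diff < best_diff:
                best, best_diff = (zenith, azimuth), diff
            if diff <= tol:                                     # second preset condition met
                return pupil_3d(eye_center, eye_radius, zenith, azimuth)
    return pupil_3d(eye_center, eye_radius, *best)

eye_center = np.array([0.0, 0.0, 0.45])                         # eyeball ~0.45 m from the camera
print(fit_gaze(np.array([335.0, 230.0]), eye_center, eye_radius=0.012))
```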
  • the eye information includes eyeball center position, eyeball radius and iris size. It can be understood that the eye information is personalized data of each person's eyeball: the specific values of the eye information of different users are different, the specific values of the eye information of the same user can be fixed, but the gaze direction information of the same user may be different.
  • Before executing step S501 for the first time, it can be judged according to the current frame image whether the eye is in the closed-eye state, and if so, the gaze direction information of the eye at the previous moment can be used as the gaze direction information corresponding to the user in the current frame, that is, there is no need to execute the steps shown in FIG. 5 for that eye.
  • the eye information corresponding to the user in the current frame may be a preset default value, or may be the eye information corresponding to the user in the previous frame.
  • the estimated pupil center position corresponding to the user in the current frame may be a preset default value, or may be the three-dimensional pupil center position corresponding to the user in the previous frame.
  • the default value of the eyeball center position can be the average value of the eyeball center positions of multiple sample users; similarly, the default value of the eyeball radius can be the average value of the eyeball radii of multiple sample users,
  • and the default value of the iris size may be the average of the iris sizes of multiple sample users.
  • the default value of the estimated pupil center position may be the position of the pupil when the user looks ahead.
  • step S501 When step S501 is executed again, that is, when returning from step S505 to step S501, the estimated pupil center position corresponding to the user in the current frame may be the updated estimated pupil center position.
  • a three-dimensional eyeball model can be synthesized according to the eye information corresponding to the user in the current frame and the estimated pupil center position.
  • the embodiment of the present invention does not limit the specific method for synthesizing the 3D eyeball model based on the eye information and the estimated pupil center position, and various existing methods capable of synthesizing the 3D eyeball model may be used.
  • When step S501 is executed for the first time, the eye information corresponding to the user in the previous frame may also be used as the eye information corresponding to the user in the current frame, and the three-dimensional pupil center position corresponding to the user in the previous frame may be used as the estimated pupil center position corresponding to the user in the current frame. This not only reduces the amount of calculation, but also makes the eye animation data of the avatar smoother without additional smoothing.
  • the estimated eye feature information can be determined according to the three-dimensional eyeball model obtained in step S501, and the second difference between the estimated eye feature information and the target eye feature information can be calculated, where the target eye feature information is obtained by detection on the current frame image.
  • the eye feature information may include two-dimensional pupil center positions, iris mask positions, and the like.
  • the two-dimensional pupil center position refers to the position of the pupil center in the two-dimensional plane, and the iris mask position refers to the position of the iris mask in the two-dimensional plane.
  • the estimated eye feature information can be obtained by projecting the pupil center position and the iris mask of the three-dimensional eyeball model onto a two-dimensional plane, and the target eye feature information can be obtained by detecting the current frame image using a machine learning method.
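  • As an illustration of how estimated eye feature information might be produced, the following Python sketch projects a hypothetical 3D pupil center to the image plane with a pinhole camera model. The intrinsic matrix, eyeball parameters, and gaze direction are made-up values, and the patent does not prescribe this particular projection model.

```python
import numpy as np

def project_point(point_3d, K):
    """Project a 3D point given in camera coordinates onto the 2D image
    plane using a pinhole camera model; K is the 3x3 intrinsic matrix."""
    p = K @ point_3d
    return p[:2] / p[2]

# Hypothetical camera intrinsics and eyeball parameters (illustrative only).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
eyeball_center = np.array([0.03, 0.02, 0.55])   # meters, camera frame
eyeball_radius = 0.012
gaze_dir = np.array([0.1, -0.05, -1.0])
gaze_dir /= np.linalg.norm(gaze_dir)

# The estimated 3D pupil center lies on the eyeball sphere along the gaze
# direction; its projection is the estimated 2D pupil center to be compared
# with the pupil center detected in the current frame image.
pupil_center_3d = eyeball_center + eyeball_radius * gaze_dir
estimated_pupil_2d = project_point(pupil_center_3d, K)
print(estimated_pupil_2d)
```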
  • In step S503, it may be judged whether the second preset condition is satisfied, where the second preset condition may include: the second difference is not greater than the third preset threshold, and/or the number of times the estimated pupil center position has been updated reaches the fourth preset threshold.
  • the third preset threshold may be the same as or different from the first preset threshold
  • the fourth preset threshold may be the same as or different from the second preset threshold.
  • step S504 can be executed, that is, the estimated pupil center position corresponding to the user in the current frame can be used as the three-dimensional pupil center position corresponding to the user in the current frame.
  • Otherwise, step S505 can be executed, that is, the estimated pupil center position is updated, and steps S501-S503 continue to be executed according to the updated estimated pupil center position until the second preset condition is met.
  • the eye information can be fixed, that is, the eye information can be determined in advance, and only the estimated pupil center position is updated in each iteration, which simplifies the calculation process and improves the efficiency of animation generation.
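  • A minimal Python sketch of the loop formed by steps S501-S505 is given below, with the eye information held fixed and only the estimated pupil center updated. The projection, step size, and thresholds are stand-ins for details the text leaves open, so this illustrates the control flow rather than the exact fitting procedure.

```python
import numpy as np

def toy_project(pupil_center_3d):
    """Stand-in for steps S501-S502: synthesize the eyeball model from the
    fixed eye information plus the current estimate, then project its pupil
    to 2D. Here a bare pinhole projection replaces the full synthesis."""
    return pupil_center_3d[:2] / pupil_center_3d[2] * 800.0 + 300.0

def fit_pupil_center(target_pupil_2d, initial_guess,
                     diff_threshold=1.0, max_updates=20, step=0.5):
    """Iteratively refine the 3D pupil center until the second preset condition holds."""
    estimate = np.asarray(initial_guess, dtype=float)
    for n_updates in range(max_updates + 1):
        estimated_2d = toy_project(estimate)                         # S501 + S502
        second_diff = np.linalg.norm(estimated_2d - target_pupil_2d)
        # S503: stop once the difference is small enough or the update budget is spent.
        if second_diff <= diff_threshold or n_updates == max_updates:
            return estimate                                          # S504
        # S505: nudge the estimate so its projection moves toward the detection.
        delta_2d = (target_pupil_2d - estimated_2d) / 800.0 * estimate[2]
        estimate = estimate + step * np.append(delta_2d, 0.0)
    return estimate

# The previous frame's result seeds the current frame, as suggested in the text.
prev_pupil_center = np.array([0.01, -0.005, 0.5])
detected_pupil_2d = np.array([310.0, 296.0])
print(fit_pupil_center(detected_pupil_2d, prev_pupil_center))
```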
  • the user's eye information is determined before acquiring the video stream data corresponding to the user.
  • the eye image includes an image of the user's eyes; in the eye image, the user's expression is neutral and the eyes look straight ahead.
  • a plurality of three-dimensional eyelid feature points can be determined according to the eye image; the average of the three-dimensional positions of these eyelid feature points is then calculated, and a preset three-dimensional offset is added to the average to obtain the eyeball center position, where the offset direction points toward the inside of the eyeball.
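  • A small numpy sketch of this estimate is shown below; the eyelid landmark coordinates and the preset offset are placeholder values, with the offset chosen so that it points toward the inside of the eye in the assumed coordinate frame.

```python
import numpy as np

def estimate_eyeball_center(eyelid_points_3d, offset=(0.0, 0.0, 0.012)):
    """Average the 3D eyelid feature points and add a preset offset that
    pushes the result toward the inside of the eyeball."""
    pts = np.asarray(eyelid_points_3d, dtype=float)
    return pts.mean(axis=0) + np.asarray(offset)

# Hypothetical eyelid landmarks in a head-local frame where +z points into the head.
eyelid_points = [[0.030, 0.021, 0.002],
                 [0.034, 0.024, 0.001],
                 [0.038, 0.021, 0.002],
                 [0.034, 0.018, 0.001]]
print(estimate_eyeball_center(eyelid_points))
```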
  • the preset initial three-dimensional eyeball model can also be fitted to the eye image to obtain the iris size.
  • the eyeball center position and iris size obtained from the eye image can be used, and the eyeball radius can be the average value of eyeball radii of multiple sample users.
  • the preset initial 3D eyeball model refers to a 3D eyeball model in which both the eyeball information and the 3D pupil center position are constructed using preset default values.
  • eye information may also be updated each time step S505 is performed, which is not limited in this embodiment of the present invention.
  • the gaze direction information of the two eyes can be determined respectively, and whether the gaze direction information of the two eyes satisfies the interaction relationship can be judged. If the gaze direction of the two eyes does not satisfy the interaction relationship, it can be determined that the calculated gaze direction information is incorrect, and the gaze direction information corresponding to the user in the previous frame is used as the gaze direction information corresponding to the user in the current frame.
  • the interaction relationship means that the gaze direction of the left eye and the gaze direction of the right eye are directions that the same person could produce at the same time.
  • the face information, gaze direction information, and human body posture information corresponding to the user in the current frame can thus be obtained.
  • redirection processing may be performed according to the status information corresponding to the user in the current frame, so as to obtain the animation data of the virtual character in the current frame.
  • the animation data of the virtual character may include controller data for generating the animation of the virtual character, and the specific form is a sequence of digitized vectors.
  • the animation data can be converted into a data form that a rendering engine such as UE or Unity3d can receive (namely, weights of multiple blend shapes and joint angle data) and input to the rendering engine to drive the corresponding parts of the virtual character to perform corresponding actions.
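  • The text does not fix a concrete transport format, so the sketch below simply bundles one frame of blend shape weights and joint angles, together with its time code, into JSON and pushes it over UDP to a hypothetical listener inside the rendering engine; the field names, port, and protocol are illustrative and are not UE's or Unity3d's actual ingestion API.

```python
import json
import socket

def pack_frame(timecode, blendshape_weights, joint_angles):
    """Bundle one frame of animation data in a neutral JSON layout.
    The field names are illustrative, not an engine-defined schema."""
    return json.dumps({
        "timecode": timecode,               # e.g. "00:00:01:15"
        "blendshapes": blendshape_weights,  # {"jawOpen": 0.42, ...}
        "joints": joint_angles,             # {"neck": [pitch, yaw, roll], ...}
    }).encode("utf-8")

def send_frame(payload, host="127.0.0.1", port=11111):
    """Push the frame to a listener assumed to run inside the render engine."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

frame = pack_frame("00:00:01:15",
                   {"jawOpen": 0.42, "eyeBlinkLeft": 0.9},
                   {"neck": [3.0, -10.0, 0.0]})
send_frame(frame)
```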
  • FIG. 6 is a partial flowchart of a specific implementation manner of step S303 in FIG. 3 .
  • the body animation data of the virtual character in the current frame can be determined according to the human body posture information corresponding to the user in the current frame.
  • Step S303 shown in FIG. 6 may include the following steps:
  • Step S601: Generate a transition skeleton model;
  • Step S602: Determine the positions of a plurality of preset key joints according to the first skeleton model and the joint angle data of the first skeleton model;
  • Step S603: Determine the joint angle data of the transition skeleton model according to the positions of the plurality of preset key joints and the transition skeleton model, so as to obtain the body animation data of the virtual character.
  • Since the user's skeleton and the virtual character's skeleton are inconsistent in definition (the number of bones, the default orientation of the bones, and the joint positions), the user's body posture information cannot be transmitted to the virtual character directly.
  • Because the definition of the joint angles also differs, the joint angle data cannot be passed directly either.
  • Simply transferring with inverse kinematics will also cause problems in the posture of the avatar.
  • a transitional skeleton model may be generated according to the first skeleton model and the second skeleton model.
  • the first skeleton model is a skeleton model corresponding to the user, that is, a skeleton model that describes the user's skeleton.
  • the first skeleton model may be obtained by reconstruction from an image, or may be a preset average skeleton; if the average skeleton is adopted, the step of obtaining the user's skeleton model can be omitted.
  • the second skeleton model is a skeleton model of the virtual character.
  • the skeleton form of the transition skeleton model is the same as that of the second skeleton model, where the skeleton form includes the number of bones and the default orientation of the rotation axis of each joint; that is, the transition skeleton model has the same number of bones as the second skeleton model, the bones in the transition skeleton model correspond to the bones in the second skeleton model, and the default orientation of the rotation axis of each joint in the transition skeleton model is the same as that of the corresponding joint in the second skeleton model.
  • a plurality of preset key joints are defined in advance; more specifically, the preset key joints may be selected from the above first preset joints and second preset joints. The positions of the preset key joints in the first skeleton model may be obtained, and the positions of the preset key joints in the transition skeleton model are set to the positions of the corresponding preset key joints in the first skeleton model, so that the transition skeleton model can be obtained. In other words, the position of each preset key joint in the transition skeleton model is the same as the position of that preset key joint in the first skeleton model.
  • the positions of multiple preset key joints in the first skeletal model can be calculated according to the joint angle data of the first skeletal model and the first skeletal model. Since the position of each preset key joint in the transition skeleton model is the same as the position of the preset key joint in the first skeleton model, the positions of multiple preset key joints in the transition skeleton model can be obtained.
  • the joint angle data of the transitional skeleton model can be calculated and determined according to the positions of multiple preset key joints. Since the skeleton shape of the transitional skeleton model is the same as that of the second skeleton model, the joint angle data of the virtual character can be obtained through direct transmission. In other words, the joint angle data of the transition skeleton model can be directly used as the joint angle data of the virtual character. Further, the joint angle data can be used as body animation data of the virtual character. Thus, the obtained body animation data and human body pose information can have similar semantics.
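  • The planar-chain sketch below illustrates the flow of steps S601-S603 under heavy simplification: forward kinematics on the user's skeleton yields key-joint positions, the transition skeleton (which here shares the avatar's chain structure and differs only in bone lengths) is solved so its key joints reach those positions, and its joint angles are then handed to the avatar directly. Bone lengths and angles are invented, and a real implementation would work with full 3D skeleton hierarchies.

```python
import numpy as np
from scipy.optimize import least_squares

def fk_positions(bone_lengths, joint_angles):
    """Forward kinematics of a planar chain: joint angles accumulate along the
    chain and the root sits at the origin; returns each joint's 2D position."""
    positions, point, total_angle = [], np.zeros(2), 0.0
    for length, angle in zip(bone_lengths, joint_angles):
        total_angle += angle
        point = point + length * np.array([np.cos(total_angle), np.sin(total_angle)])
        positions.append(point.copy())
    return np.array(positions)

# Step S602: key-joint positions from the user's skeleton and its joint angles.
user_bones = np.array([0.30, 0.28, 0.25])          # user bone lengths (made up)
user_angles = np.radians([40.0, -25.0, 10.0])
key_positions = fk_positions(user_bones, user_angles)

# Steps S601/S603: the transition skeleton shares the avatar's bone structure
# (here just different bone lengths); solve its joint angles so that its key
# joints land as close as possible to the user's key-joint positions.
avatar_bones = np.array([0.35, 0.33, 0.30])        # avatar bone lengths (made up)

def residual(angles):
    return (fk_positions(avatar_bones, angles) - key_positions).ravel()

transition_angles = least_squares(residual, x0=user_angles).x

# Because the transition skeleton matches the avatar's skeleton form, its joint
# angles can be used directly as the avatar's body animation data.
avatar_joint_angles = transition_angles
print(np.degrees(avatar_joint_angles))
```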
  • the body animation data of the virtual character can be further optimized. Specifically, it may be judged whether the body animation data satisfies a preset posture constraint condition, and if not, the body animation data may be adjusted to obtain the body animation data of the current frame.
  • In a specific embodiment, the human body posture information includes torso and neck movement information; redirection processing can be performed according to the torso and neck movement information to obtain the torso and neck animation data of the virtual character, where the torso and neck animation data is used to generate the torso and neck movements of the virtual character.
  • the joint angle data of the torso and neck of the virtual character may be obtained through redirection according to the joint angle data of the torso and neck corresponding to the user.
  • the animation data of the limbs of the virtual character may be acquired, and the animation data of the limbs is used to generate movements of the limbs of the virtual character.
  • the limb animation data may be preset.
  • the limb animation data of the virtual character corresponding to the current frame may be determined from a plurality of preset limb animation data according to the user's selection.
  • the animation data of the limbs and the animation data of the torso and neck can be fused to obtain the body animation data of the virtual character.
  • it can be judged whether the action corresponding to the torso and neck animation data matches the action corresponding to the limb animation data; if not, the torso and neck animation data can be adjusted so that the action corresponding to the adjusted torso and neck animation data matches the action corresponding to the limb animation data.
  • In this way, the overall body posture of the generated virtual character is reasonable and realistic.
  • For example, if the action corresponding to the limb animation data is waving goodbye while the action corresponding to the torso and neck animation data has the torso in a half-lying posture, then the action corresponding to the torso and neck animation data does not match the action corresponding to the limb animation data.
  • If the action corresponding to the adjusted torso and neck animation data has the torso in an upright posture, then the adjusted torso and neck animation data matches the limb animation data.
  • If the human body posture information is the human body posture information of the current frame obtained by fusing the above-mentioned torso and neck movement information and limb movement information, the redirection process can be performed according to the human body posture information of the current frame to obtain the corresponding body animation data of the avatar.
  • redirection processing may also be performed according to the facial expression information corresponding to the user in the current frame, so as to obtain the facial animation data of the avatar in the current frame.
  • if the facial expression information is the weights of multiple blend shapes, or the weights of multiple principal component vectors obtained by performing principal component analysis on multiple blend shapes, and the virtual character is also pre-defined with blend shapes of the same number and semantics, the weights can be transferred to the avatar directly, that is, the facial expression information can be directly used as the facial animation data of the avatar.
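  • In that case the redirection step reduces to copying weights across identically named blend shapes, as in the toy sketch below; the blend shape names are invented for illustration.

```python
def transfer_blendshape_weights(user_weights, avatar_blendshape_names):
    """Copy weights for blend shapes the avatar also defines; anything the
    avatar lacks is simply skipped (names here are placeholders)."""
    return {name: weight for name, weight in user_weights.items()
            if name in avatar_blendshape_names}

user_weights = {"browInnerUp": 0.3, "mouthSmileLeft": 0.7, "cheekPuff": 0.1}
avatar_blendshapes = {"browInnerUp", "mouthSmileLeft", "jawOpen"}
print(transfer_blendshape_weights(user_weights, avatar_blendshapes))
```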
  • Alternatively, the facial expression information can be input into an expression mapping model to obtain the facial animation data.
  • Here the facial expression information may be the weights of multiple blend shapes (Blendshapes), the weights of multiple principal component vectors obtained by performing principal component analysis on multiple blend shapes, or three-dimensional feature points; that is, these weights or three-dimensional feature points are input into the expression mapping model to obtain the facial animation data.
  • the expression mapping model is obtained by using the second training data to train the second preset model in advance.
  • the embodiment of the present invention does not limit the type and structure of the second preset model, which may be any of various existing models with learning capability.
  • the second training data used for training may include multiple sets of second sample information, and each set of second sample information includes: facial expression information of a plurality of sample users under a preset expression and facial animation data of the virtual character under the preset expression. Different sets of second sample information correspond to different preset expressions. The facial expression information of the plurality of sample users under the preset expression may be collected in advance, and the facial animation data of the virtual character under the preset expression is pre-set by an animator.
  • the expression mapping model trained by using the second training data can learn the mapping relationship between the facial expression information and the facial animation data of the avatar. Therefore, the facial animation data output by the expression mapping model and the facial expression information corresponding to the user can have similar semantics.
  • the expression mapping model is versatile and can be used to determine the facial animation data of the virtual character according to the facial expression information corresponding to any user.
  • the facial animation data obtained above may include mouth animation data.
  • the mouth animation data can also be determined by the method described below, and the mouth animation data determined in that way overwrites the mouth animation data in the facial animation data obtained above, to obtain updated facial animation data.
  • expression information related to the mouth can be extracted from facial expression information, and recorded as mouth expression information.
  • the blend shapes related to the mouth can be determined according to the semantics of each blend shape, and the weight of the blend shapes related to the mouth is the mouth expression information.
  • mouth expression information can be input into the first mouth shape mapping model.
  • the first mouth shape mapping model is obtained by using the third training data to train the third preset model in advance.
  • the third preset model can be various existing models with learning ability, more specifically, the third preset model can be a radial basis (Radial Basis Function) model, but it is not limited thereto.
  • the third training data may include multiple sets of third sample information, and each set of third sample information includes mouth expression information of multiple sample users under a preset expression and mouth animation data of the avatar under the preset expression. Different sets of third sample information correspond to different preset expressions.
  • the mouth expression information of a plurality of sample users under the preset expressions may be collected in advance, and the mouth animation data of the virtual character under the preset expressions is pre-set by an animator.
  • the first mouth shape mapping model trained by using the third training data can learn the mapping relationship between the mouth expression information and the virtual character's mouth animation data. Therefore, the mouth animation data output by the first mouth shape mapping model and the user's mouth expression information may have similar semantics.
  • the first mouth shape mapping model is versatile and can be used to determine the mouth animation data of the virtual character according to the mouth expression information of any user.
  • alternatively, the 3D feature points related to the mouth can be extracted; more specifically, according to predefined vertex indices related to the mouth, a plurality of 3D feature points related to the mouth are extracted from the 3D face model and recorded as the three-dimensional feature information of the mouth.
  • the three-dimensional characteristic information of the mouth can be input into the second mouth shape mapping model to obtain the output animation data of the mouth of the current frame.
  • the second mouth shape mapping model is obtained by using the fourth training data to train the fourth preset model.
  • the fourth preset model can be various existing models with learning ability, more specifically, the fourth preset model can be a radial basis (Radial Basis Function) model, but it is not limited thereto.
  • the fourth training data may include multiple sets of fourth sample information, and each set of fourth sample information includes the mouth three-dimensional feature information of multiple sample users under a preset expression and the mouth animation data of the avatar under the preset expression.
  • multiple sets of fourth sample information correspond to different preset expressions.
  • the three-dimensional feature information of the mouths of a plurality of sample users under the preset expressions can be collected in advance; more specifically, it can be extracted from the 3D face models of the sample users under the preset expressions. The mouth animation data of the virtual character under the preset expressions is pre-set by the animator.
  • the second mouth shape mapping model trained by using the fourth training data can learn the mapping relationship between the three-dimensional feature information of the mouth and the animation data of the mouth of the avatar. Therefore, the mouth animation data output by the second mouth shape mapping model and the user's mouth three-dimensional feature information may have similar semantics.
  • the second mouth shape mapping model is also versatile and can be used to determine the mouth animation data of the avatar according to the mouth three-dimensional feature information of any user.
  • the teeth animation data may also be determined according to the mouth animation data.
  • tooth animation data can be obtained by adding a preset offset to the mouth animation data.
  • the jaw animation data can be extracted from the mouth animation data, and a preset offset can be added to the jaw animation data to obtain the teeth animation data, so that the teeth of the virtual character follow the movement of the jaw and the overall action and posture of the virtual character are more real and natural.
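  • A toy sketch of this follow-the-jaw behaviour is shown below; the channel names and the preset offset are placeholders rather than values taken from the patent.

```python
def teeth_from_jaw(mouth_animation, jaw_keys=("jawOpen", "jawLeft", "jawRight"),
                   offset=0.05):
    """Copy the jaw-related channels of the mouth animation data and add a
    preset offset to obtain teeth animation data that follows the jaw."""
    jaw = {key: value for key, value in mouth_animation.items() if key in jaw_keys}
    return {f"teeth_{key}": value + offset for key, value in jaw.items()}

mouth_animation = {"jawOpen": 0.42, "jawLeft": 0.05, "lipPucker": 0.30}
print(teeth_from_jaw(mouth_animation))
```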
  • redirection processing can also be performed according to the gaze direction information corresponding to the user in the current frame, so as to obtain the eye animation data of the virtual character in the current frame, so that the gaze direction of the virtual character is as consistent as possible with the gaze direction of the user.
  • In a specific example, the zenith angle θ and the azimuth angle φ obtained when the three-dimensional pupil center position corresponding to the user in the current frame is expressed in spherical coordinates can be passed directly to the virtual character.
  • Specifically, the eyeball center position, eyeball radius, and iris size of the avatar can be preset; the zenith angle θ and azimuth angle φ then give the direction from the eyeball center position of the virtual character toward its three-dimensional pupil center position, so the three-dimensional pupil center position of the avatar can be determined and the eye animation data of the avatar obtained.
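  • The sketch below shows one way such a spherical-coordinate transfer could look, assuming standard spherical coordinates about the eyeball center; all positions, radii, and the coordinate convention are assumptions made for illustration.

```python
import numpy as np

def to_spherical(pupil_center, eyeball_center):
    """Zenith angle theta and azimuth angle phi of the pupil center
    relative to the eyeball center (standard spherical convention)."""
    v = np.asarray(pupil_center, dtype=float) - np.asarray(eyeball_center, dtype=float)
    r = np.linalg.norm(v)
    theta = np.arccos(v[2] / r)      # zenith
    phi = np.arctan2(v[1], v[0])     # azimuth
    return theta, phi

def from_spherical(eyeball_center, radius, theta, phi):
    """Place the avatar's pupil on its own eyeball sphere using the same angles."""
    direction = np.array([np.sin(theta) * np.cos(phi),
                          np.sin(theta) * np.sin(phi),
                          np.cos(theta)])
    return np.asarray(eyeball_center, dtype=float) + radius * direction

# User-side measurement and avatar-side reconstruction (all values illustrative).
theta, phi = to_spherical([0.031, 0.018, 0.538], [0.030, 0.020, 0.550])
avatar_pupil = from_spherical([0.0, 1.62, 0.08], radius=0.015, theta=theta, phi=phi)
print(avatar_pupil)
```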
  • gaze animation data may be determined using a gaze mapping model.
  • the eye mapping model may be obtained by using the fifth training data to train the fifth preset model in advance, and the fifth preset model may be various existing models with learning ability.
  • the fifth preset model can be a radial basis model
  • the fifth training data can include multiple pairs of fifth sample information
  • each pair of fifth sample information includes: the user's three-dimensional pupil center position in a preset gaze direction (which can be recorded as a sample pupil position) and the virtual character's three-dimensional pupil center position in the same preset gaze direction (which can be recorded as a sample virtual pupil position).
  • multiple pairs of fifth sample information correspond to different preset gaze directions. More specifically, the plurality of preset gaze directions may include looking left, looking right, looking up, looking down, and so on, but is not limited thereto.
  • the three-dimensional pupil center positions of the user under multiple preset gaze directions may be obtained based on an image detection algorithm, which is not limited in this embodiment.
  • the three-dimensional pupil center position of the virtual character in each preset gaze direction may be predetermined.
  • the RBF weight parameters of the radial basis model can be calculated according to the user's three-dimensional pupil center positions and the virtual character's three-dimensional pupil center positions under the multiple preset gaze directions; the RBF weight parameters characterize the mapping relationship between the three-dimensional pupil center position corresponding to the user and the three-dimensional pupil center position of the avatar, and thus the eye mapping model can be obtained.
  • the three-dimensional pupil center position corresponding to the user in the current frame can be input into the eye mapping model to obtain the virtual pupil center position corresponding to the current frame output by the eye mapping model, thereby obtaining the eye animation data, where the virtual pupil center position is the 3D pupil center position of the avatar.
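  • A hand-rolled Gaussian-RBF version of such an eye mapping model is sketched below: the weights are solved from a handful of (sample pupil position, sample virtual pupil position) calibration pairs and then applied to a new pupil position. The kernel, its width, and all sample coordinates are assumptions; the text only specifies that a radial basis model may be used.

```python
import numpy as np

def fit_rbf(user_samples, avatar_samples, sigma=0.01):
    """Solve RBF weights mapping user pupil positions to avatar pupil positions."""
    centers = np.asarray(user_samples, dtype=float)
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    Phi = np.exp(-(dists / sigma) ** 2)                        # Gaussian kernel matrix
    weights = np.linalg.solve(Phi, np.asarray(avatar_samples, dtype=float))
    return centers, weights

def apply_rbf(model, query, sigma=0.01):
    """Map a new user pupil position to the avatar's pupil position."""
    centers, weights = model
    dists = np.linalg.norm(np.asarray(query, dtype=float) - centers, axis=-1)
    return np.exp(-(dists / sigma) ** 2) @ weights

# Calibration pairs for a few preset gaze directions (all values illustrative).
user_pupils = [[0.000, 0.000, 0.012], [-0.008, 0.000, 0.009], [0.008, 0.000, 0.009],
               [0.000, 0.007, 0.010], [0.000, -0.007, 0.010]]
avatar_pupils = [[0.000, 0.000, 0.015], [-0.010, 0.000, 0.011], [0.010, 0.000, 0.011],
                 [0.000, 0.009, 0.012], [0.000, -0.009, 0.012]]

eye_mapping_model = fit_rbf(user_pupils, avatar_pupils)
print(apply_rbf(eye_mapping_model, [0.004, 0.003, 0.011]))
```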
  • the animation data of the virtual character corresponding to the current frame can be obtained, and the animation data includes but not limited to: facial animation data, body animation data, eyeball animation data, and the like.
  • video stream data corresponding to the virtual character may be determined according to the animation data.
  • the animation data of the virtual character can be calculated and rendered to obtain the video picture information of the virtual character.
  • the animation data can be fed into a real-time engine (e.g., UE4, Unity, etc.) for evaluation and rendering.
  • the video frame information has the same time code as the animation data.
  • the video stream data may be sent to a live server, so that the live server forwards the video stream data to other user terminals.
  • the voice information input by the user can also be acquired.
  • the voice information and the video picture come from different devices and can be synchronized according to their respective time codes, so as to obtain the video stream data corresponding to the virtual character, where the video picture information is obtained by rendering the virtual character according to the animation data.
  • In this way, the voice is synchronized with the expressions, eyes, and gestures of the virtual character, so as to obtain the live video data of the virtual character.
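  • A minimal sketch of this time-code alignment is given below, with time codes simplified to seconds and the tolerance chosen arbitrarily; a production system would work with the actual time codes carried by the audio and video streams.

```python
def synchronize(video_frames, audio_chunks, tolerance=0.02):
    """Pair each rendered video frame with the audio chunk whose time code is
    closest, dropping pairs that differ by more than `tolerance` seconds.
    Both inputs are lists of (timecode_seconds, payload) sorted by time code."""
    paired, j = [], 0
    for t_video, frame in video_frames:
        # Advance while the next audio chunk is at least as close to this frame.
        while (j + 1 < len(audio_chunks)
               and abs(audio_chunks[j + 1][0] - t_video) <= abs(audio_chunks[j][0] - t_video)):
            j += 1
        t_audio, audio = audio_chunks[j]
        if abs(t_audio - t_video) <= tolerance:
            paired.append((t_video, frame, audio))
    return paired

video_frames = [(0.000, "frame0"), (0.033, "frame1"), (0.066, "frame2")]
audio_chunks = [(0.001, "pcm0"), (0.034, "pcm1"), (0.070, "pcm2")]
print(synchronize(video_frames, audio_chunks))
```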
  • FIG. 8 is a schematic structural diagram of an animation generating device for a virtual character in an embodiment of the present invention.
  • the device shown in FIG. 8 may include:
  • An image acquisition module 81 configured to acquire a current frame image, the current frame image including the image of the user;
  • a calculation module 82 configured to determine the state information corresponding to the user in the current frame according to the current frame image, the state information including: face information, human body posture information, and gaze direction information, where the face information includes facial posture information and facial expression information;
  • a redirection module 83 configured to perform redirection processing according to the state information to obtain animation data of the virtual character, wherein the animation data has the same time code as the current frame image, and the animation data includes: facial animation data, body animation data, and eye animation data.
  • the animation generation device for the above-mentioned virtual character may correspond to a chip with animation generation function in the terminal; or correspond to a chip module with animation generation function in the terminal, or correspond to the terminal.
  • An embodiment of the present invention also provides a storage medium on which a computer program is stored, and when the computer program is run by a processor, the steps of the above-mentioned method for generating animation of a virtual character are executed.
  • the storage medium may include ROM, RAM, magnetic or optical disks, and the like.
  • the storage medium may also include a non-volatile memory (non-volatile) or a non-transitory (non-transitory) memory, and the like.
  • An embodiment of the present invention also provides a terminal, including a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor executes the steps of the above method for generating animation of a virtual character when running the computer program.
  • The terminal includes, but is not limited to, terminal devices such as mobile phones, computers, and tablet computers.
  • the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • the non-volatile memory can be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory.
  • the volatile memory can be random access memory (RAM), which acts as external cache memory.
  • By way of example and not limitation, many forms of random access memory (RAM) may be used, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations.
  • the above-described embodiments may be implemented in whole or in part in the form of computer program products.
  • the computer program product comprises one or more computer instructions or computer programs.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer program can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program can be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner.
  • the disclosed methods, devices and systems can be implemented in other ways.
  • the device embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may be physically included separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • For the chip, each module/unit contained therein may be implemented by hardware such as a circuit, or at least some of the modules/units may be implemented by a software program running on the processor integrated inside the chip, with the remaining (if any) modules/units implemented by hardware such as a circuit. For the chip module, all modules/units contained therein may be implemented by hardware such as a circuit, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the chip module; or at least some of the modules/units may be implemented by a software program running on the processor integrated inside the chip module, with the remaining (if any) modules/units implemented by hardware such as a circuit. For the terminal, all modules/units contained therein may be implemented by hardware such as a circuit, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the terminal; or at least some of the modules/units may be implemented by a software program running on the processor integrated inside the terminal, with the remaining (if any) modules/units implemented by hardware such as a circuit.
  • The term "multiple" appearing in the embodiments of the present application means two or more.

Abstract

Method and apparatus for generating an animation of a virtual character, and storage medium and terminal. The method comprises: acquiring a current frame image, the current frame image comprising an image of a user; determining, according to the current frame image, state information corresponding to the user in the current frame, the state information comprising facial information, human body posture information and gaze direction information, the facial information comprising facial posture information and facial expression information; and performing redirection processing according to the state information so as to obtain animation data of the virtual character, the animation data having the same time code as the current frame image and comprising facial animation data, body animation data and eyeball animation data. The method for generating an animation of a virtual character according to the present invention achieves better universality, reduced cost and a better user experience.
PCT/CN2022/138386 2021-12-14 2022-12-12 Procédé et appareil de génération d'animation de personnage virtuel, et support de stockage et terminal WO2023109753A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111527313.9 2021-12-14
CN202111527313.9A CN114219878B (zh) 2021-12-14 2021-12-14 虚拟角色的动画生成方法及装置、存储介质、终端

Publications (1)

Publication Number Publication Date
WO2023109753A1 true WO2023109753A1 (fr) 2023-06-22

Family

ID=80701814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138386 WO2023109753A1 (fr) 2021-12-14 2022-12-12 Procédé et appareil de génération d'animation de personnage virtuel, et support de stockage et terminal

Country Status (2)

Country Link
CN (1) CN114219878B (fr)
WO (1) WO2023109753A1 (fr)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219878B (zh) * 2021-12-14 2023-05-23 魔珐(上海)信息科技有限公司 虚拟角色的动画生成方法及装置、存储介质、终端
CN115334325A (zh) * 2022-06-23 2022-11-11 联通沃音乐文化有限公司 基于可编辑三维虚拟形象生成直播视频流的方法和系统
WO2024000480A1 (fr) * 2022-06-30 2024-01-04 中国科学院深圳先进技术研究院 Procédé et appareil de génération d'animation d'objet virtuel 3d, dispositif terminal et support
CN115393486B (zh) * 2022-10-27 2023-03-24 科大讯飞股份有限公司 虚拟形象的生成方法、装置、设备及存储介质
CN115665507B (zh) * 2022-12-26 2023-03-21 海马云(天津)信息技术有限公司 含虚拟形象的视频流数据的生成方法、装置、介质及设备
CN116152900B (zh) * 2023-04-17 2023-07-18 腾讯科技(深圳)有限公司 表情信息的获取方法、装置、计算机设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111970535A (zh) * 2020-09-25 2020-11-20 魔珐(上海)信息科技有限公司 虚拟直播方法、装置、系统及存储介质
CN112700523A (zh) * 2020-12-31 2021-04-23 魔珐(上海)信息科技有限公司 虚拟对象面部动画生成方法及装置、存储介质、终端
CN113192132A (zh) * 2021-03-18 2021-07-30 魔珐(上海)信息科技有限公司 眼神捕捉方法及装置、存储介质、终端
US20210279934A1 (en) * 2020-03-09 2021-09-09 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating virtual avatar
CN114219878A (zh) * 2021-12-14 2022-03-22 魔珐(上海)信息科技有限公司 虚拟角色的动画生成方法及装置、存储介质、终端

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107154069B (zh) * 2017-05-11 2021-02-02 上海微漫网络科技有限公司 一种基于虚拟角色的数据处理方法及系统

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210279934A1 (en) * 2020-03-09 2021-09-09 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating virtual avatar
CN111970535A (zh) * 2020-09-25 2020-11-20 魔珐(上海)信息科技有限公司 虚拟直播方法、装置、系统及存储介质
CN112700523A (zh) * 2020-12-31 2021-04-23 魔珐(上海)信息科技有限公司 虚拟对象面部动画生成方法及装置、存储介质、终端
CN113192132A (zh) * 2021-03-18 2021-07-30 魔珐(上海)信息科技有限公司 眼神捕捉方法及装置、存储介质、终端
CN114219878A (zh) * 2021-12-14 2022-03-22 魔珐(上海)信息科技有限公司 虚拟角色的动画生成方法及装置、存储介质、终端

Also Published As

Publication number Publication date
CN114219878B (zh) 2023-05-23
CN114219878A (zh) 2022-03-22

Similar Documents

Publication Publication Date Title
WO2023109753A1 (fr) Procédé et appareil de génération d'animation de personnage virtuel, et support de stockage et terminal
JP7200439B1 (ja) アバター表示装置、アバター生成装置及びプログラム
Kuster et al. Gaze correction for home video conferencing
TWI659335B (zh) 圖形處理方法和裝置、虛擬實境系統和計算機儲存介質
JP6234383B2 (ja) ビデオ会議における眼差し補正のための画像処理のための方法およびシステム
US11888909B2 (en) Avatar information protection
CN109671141B (zh) 图像的渲染方法和装置、存储介质、电子装置
WO2021004257A1 (fr) Procédé et appareil de détection de ligne de visée, procédé et appareil de traitement vidéo, dispositif et support d'informations
US11842437B2 (en) Marker-less augmented reality system for mammoplasty pre-visualization
WO2021244172A1 (fr) Procédé de traitement d'image et procédé de synthèse d'image, appareil de traitement d'image et appareil de synthèse d'image, et support de stockage
CN113192132B (zh) 眼神捕捉方法及装置、存储介质、终端
WO2022237249A1 (fr) Procédé, appareil et système de reconstruction tridimensionnelle, support et dispositif informatique
WO2024022065A1 (fr) Appareil et procédé de génération d'expression virtuelle, ainsi que dispositif électronique et support de stockage
CN114821675B (zh) 对象的处理方法、系统和处理器
KR20180098507A (ko) 애니메이션 생성 방법 및 애니메이션 생성 장치
WO2023035725A1 (fr) Procédé et appareil d'affichage d'accessoire virtuel
US20220270337A1 (en) Three-dimensional (3d) human modeling under specific body-fitting of clothes
KR20200134623A (ko) 3차원 가상 캐릭터의 표정모사방법 및 표정모사장치
WO2023185241A1 (fr) Procédé et appareil de traitement de données, dispositif et support
WO2022205167A1 (fr) Procédé et appareil de traitement d'image, plateforme mobile, dispositif terminal et support de stockage
CN115145395B (zh) 虚拟现实交互控制方法、系统及虚拟现实设备
JP2019139608A (ja) 画像生成装置及び画像生成プログラム
US20240020901A1 (en) Method and application for animating computer generated images
WO2024051289A1 (fr) Procédé de remplacement d'arrière-plan d'image et dispositif associé
WO2023151551A1 (fr) Procédé et appareil de traitement d'image vidéo, dispositif électronique et support de stockage

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22906498

Country of ref document: EP

Kind code of ref document: A1