WO2023109753A1 - Animation generation method and apparatus for a virtual character, storage medium, and terminal - Google Patents

Animation generation method and apparatus for a virtual character, storage medium, and terminal

Info

Publication number
WO2023109753A1
WO2023109753A1 (PCT application PCT/CN2022/138386)
Authority
WO
WIPO (PCT)
Prior art keywords
information
user
current frame
virtual character
animation
Prior art date
Application number
PCT/CN2022/138386
Other languages
English (en)
French (fr)
Inventor
张建杰
金师豪
林炳坤
柴金祥
Original Assignee
魔珐(上海)信息科技有限公司
上海墨舞科技有限公司
Application filed by 魔珐(上海)信息科技有限公司 and 上海墨舞科技有限公司
Publication of WO2023109753A1 publication Critical patent/WO2023109753A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N 21/43072 Synchronising the rendering of multiple content streams or additional data on devices, of multiple content streams on the same device
    • H04N 21/439 Processing of audio elementary streams
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Definitions

  • the invention relates to the technical field of video animation, in particular to a method and device for generating animation of a virtual character, a storage medium, and a terminal.
  • Virtual live broadcast technology refers to the technology in which virtual characters replace live anchors for video production.
  • existing virtual live broadcast solutions usually require a specific environment (for example, a motion capture laboratory) and specific equipment (for example, expression capture equipment, motion capture equipment, etc.), which limits their versatility and increases their cost.
  • the technical problem solved by the present invention is to provide a method for generating animation of virtual characters with better versatility, lower cost and better user experience.
  • an embodiment of the present invention provides a method for generating animation of a virtual character, the method comprising: acquiring a current frame image, the current frame image including an image of the user; determining, according to the current frame image, the state information corresponding to the user in the current frame, wherein the state information includes: face information, human body posture information and gaze direction information, and the face information includes facial posture information and facial expression information; and performing redirection processing according to the state information to obtain the animation data of the virtual character, wherein the animation data includes: facial animation data, body animation data and eyeball animation data.
  • the method further includes: determining, at least according to the animation data, video stream data corresponding to the virtual character; and sending the video stream data to a live broadcast server, so that the live broadcast server forwards the video stream data to other user terminals.
  • determining the video stream data corresponding to the virtual character includes: acquiring voice information input by the user; and synchronizing the voice information and picture information to obtain the video stream data corresponding to the virtual character, wherein the picture information is obtained by rendering the virtual character according to the animation data.
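  • As an illustration only, the following minimal sketch (Python; the TimedPacket and synchronize names are illustrative, not taken from this disclosure) shows one way picture packets and voice packets could be paired by a shared time code:

```python
# Minimal sketch: pair rendered picture packets with voice packets that carry
# the same time code. TimedPacket/synchronize are illustrative names only.
from dataclasses import dataclass

@dataclass
class TimedPacket:
    time_code: int   # shared clock, e.g. frame index or milliseconds
    payload: bytes

def synchronize(picture_packets, voice_packets):
    """Yield (picture, voice) pairs whose time codes match."""
    voice_by_tc = {p.time_code: p for p in voice_packets}
    for pic in sorted(picture_packets, key=lambda p: p.time_code):
        voice = voice_by_tc.get(pic.time_code)
        if voice is not None:
            yield pic, voice   # both parts of the video stream share a time code

pics = [TimedPacket(0, b"frame0"), TimedPacket(1, b"frame1")]
voices = [TimedPacket(1, b"audio1"), TimedPacket(0, b"audio0")]
print([(p.time_code, v.time_code) for p, v in synchronize(pics, voices)])
```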
  • the human body posture information includes: trunk and neck movement information, the trunk and neck movement information is used to describe the movement of the user's torso and neck, and the trunk and neck movement information is determined based on the facial posture information.
  • the body animation data includes torso and neck animation data and limbs animation data
  • performing redirection processing according to the state information to obtain the animation data of the virtual character includes: performing redirection processing according to the torso and neck movement information to obtain torso and neck animation data; acquiring limb animation data selected by the user; judging whether the action corresponding to the torso and neck animation data matches the action corresponding to the limb animation data, and if not, adjusting the torso and neck animation data so that the action corresponding to the adjusted torso and neck animation data matches the action corresponding to the limb animation data; and performing fusion processing on the limb animation data and the matched torso and neck animation data to obtain the body animation data.
  • determining the state information corresponding to the user in the current frame includes: acquiring limb movement information input by the user, the limb movement information being used to describe the movement of the user's limbs; and performing fusion processing on the torso and neck movement information and the limb movement information to obtain the human body posture information of the current frame.
  • before performing fusion processing on the torso and neck movement information and the limb movement information, the method further includes: judging whether the torso and neck movements described by the torso and neck movement information meet an action condition, and if not, adjusting the torso and neck movement information so that the torso and neck movements described by the adjusted torso and neck movement information meet the action condition; wherein, the action condition is determined according to the limb movement information.
  • determining the state information corresponding to the user in the current frame according to the current frame image includes: determining the facial posture information corresponding to the user in the current frame according to the current frame image; and inputting the facial posture information corresponding to the user in the current frame into a human body posture matching model to obtain the torso and neck movement information corresponding to the user in the current frame; wherein, the human body posture matching model is obtained by training a first preset model according to first training data, the first training data includes multiple pairs of first sample information, and each pair of first sample information includes: facial posture information corresponding to a sample user and torso and neck movement information corresponding to the sample user.
  • inputting the facial posture information into the human body posture matching model includes: obtaining associated posture information, the associated posture information including: facial posture information and/or trunk and neck movement information corresponding to the user in the associated image, wherein, the associated image is a continuous multi-frame image before the current frame image and/or a continuous multi-frame image after the current frame image; the facial posture information corresponding to the current frame user and the associated posture information are input To the human body pose matching model to obtain the torso and neck movement information corresponding to the user in the current frame.
  • determining the state information corresponding to the user according to the current frame image includes: step A: generating a three-dimensional face model according to the initial face information corresponding to the user in the current frame; step B: determining estimated face feature information according to the three-dimensional face model, and calculating a first difference between the estimated face feature information and the target face feature information of the current frame, wherein the target face feature information is detected based on the current frame image; step C: judging whether a first preset condition is met, and if so, executing step D, otherwise executing step E; step D: using the initial face information as the face information corresponding to the user in the current frame; step E: updating the initial face information, using the updated initial face information as the initial face information corresponding to the user in the current frame, and returning to step A until the first preset condition is met; wherein, when step A is executed for the first time, the initial face information corresponding to the user in the current frame is the face information corresponding to the user in the previous frame, or preset face information.
  • the gaze direction information includes the three-dimensional pupil center position, and determining, according to the current frame image, the state information corresponding to the user in the current frame includes: step 1: determining a three-dimensional eyeball model according to the eye information corresponding to the user in the current frame and an estimated pupil center position, wherein the eye information includes: eyeball center position, eyeball radius and iris size; step 2: calculating estimated eye feature information according to the three-dimensional eyeball model, and calculating a second difference between the estimated eye feature information and the target eye feature information, wherein the target eye feature information is detected based on the current frame image; step 3: judging whether a second preset condition is met, and if so, executing step 4, otherwise executing step 5; step 4: using the estimated pupil center position as the three-dimensional pupil center position corresponding to the user in the current frame; step 5: updating the estimated pupil center position, using the updated estimated pupil center position as the estimated pupil center position corresponding to the user in the current frame, and returning to step 1 until the second preset condition is met.
  • the human body posture information includes joint angle data of a first skeletal model, and performing redirection processing according to the state information to obtain the animation data of the virtual character includes: generating a transitional skeletal model, wherein the positions of multiple preset key joints in the transitional skeletal model are the same as the positions of the multiple preset key joints in the first skeletal model, and the skeletal shape of the transitional skeletal model is the same as that of a second skeletal model; determining the positions of the multiple preset key joints according to the joint angle data of the first skeletal model and the first skeletal model; and determining the joint angle data of the transitional skeletal model according to the positions of the multiple preset key joints and the transitional skeletal model, so as to obtain the body animation data of the virtual character; wherein the first skeletal model is the skeletal model corresponding to the user, the second skeletal model is the skeletal model of the virtual character, and the skeletal shape includes the number of bones and the default orientation of each joint's rotation.
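  • As a rough illustration of this retargeting idea (not the actual solver of this disclosure), the following Python toy computes the position of a preset key joint from the user's skeleton by forward kinematics, then solves joint angles for a transitional skeleton that has the virtual character's bone shape so that its key joint reaches the same position; a planar chain and a CCD-style solver are used purely for simplicity:

```python
# Toy sketch of body retargeting: match the end key-joint position of a planar
# chain. fk/retarget and the 2D chain are simplifications for illustration.
import numpy as np

def fk(bone_lengths, joint_angles):
    """Forward kinematics of a planar chain; returns all joint positions."""
    pos, angle = np.zeros(2), 0.0
    positions = [pos.copy()]
    for length, theta in zip(bone_lengths, joint_angles):
        angle += theta
        pos = pos + length * np.array([np.cos(angle), np.sin(angle)])
        positions.append(pos.copy())
    return np.array(positions)

def retarget(user_bones, user_angles, transitional_bones, iters=200):
    """Solve transitional-skeleton angles whose end key joint matches the user's."""
    target = fk(user_bones, user_angles)[-1]      # preset key joint: chain end
    angles = np.zeros(len(transitional_bones))    # start from the rest pose
    for _ in range(iters):                        # simple CCD iterations
        for j in reversed(range(len(angles))):
            joints = fk(transitional_bones, angles)
            to_end = joints[-1] - joints[j]
            to_tgt = target - joints[j]
            delta = np.arctan2(to_tgt[1], to_tgt[0]) - np.arctan2(to_end[1], to_end[0])
            angles[j] += delta                    # rotate this joint toward the target
    return angles                                 # joint angle data -> body animation

# Example: user chain vs. a transitional chain with different bone lengths
user_angles = np.radians([30.0, -20.0, 10.0])
print(retarget([1.0, 1.0, 1.0], user_angles, [1.2, 1.4, 1.1]))
```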
  • performing redirection processing according to the state information to obtain the animation data of the virtual character includes: inputting the facial expression information into an expression mapping model, wherein the expression mapping model is obtained by training a second preset model according to second training data, the second training data includes multiple sets of second sample information, and each set of second sample information includes: facial expression information of multiple sample users under a preset expression and the corresponding data of the virtual character under the preset expression.
  • the facial animation data includes mouth animation data, and performing redirection processing according to the state information to obtain the animation data of the virtual character includes: extracting, from the facial expression information, the expression information related to the mouth, recorded as mouth expression information; inputting the mouth expression information into a first mouth shape mapping model, wherein the first mouth shape mapping model is obtained by training a third preset model according to third training data, the third training data includes multiple sets of third sample information, and each set of third sample information includes: mouth expression information of multiple sample users under a preset expression and mouth animation data of the virtual character under the preset expression, wherein the multiple sets of third sample information correspond to different preset expressions; and acquiring the mouth animation data output by the first mouth shape mapping model.
  • the facial animation data includes mouth animation data, and performing redirection processing according to the state information to obtain the animation data of the virtual character includes: extracting, according to the three-dimensional face model corresponding to the user in the current frame, the three-dimensional feature points related to the mouth, recorded as mouth three-dimensional feature information; inputting the mouth three-dimensional feature information into a second mouth shape mapping model, wherein the second mouth shape mapping model is obtained by training a fourth preset model according to fourth training data, the fourth training data includes multiple sets of fourth sample information, and each set of fourth sample information includes: mouth three-dimensional feature information of multiple sample users under a preset expression and mouth animation data of the virtual character under the preset expression, wherein the multiple sets of fourth sample information correspond to different preset expressions; and acquiring the mouth animation data output by the second mouth shape mapping model.
  • the animation data further includes tooth animation data, and performing redirection processing according to the state information to obtain the animation data of the virtual character further includes: determining the tooth animation data according to the mouth animation data.
  • the gaze direction information is the zenith angle and the azimuth angle of the three-dimensional pupil center position in a spherical coordinate system with the eyeball center position as the coordinate origin, and performing redirection processing according to the state information to obtain the animation data of the virtual character includes: determining the position of the virtual pupil according to the eyeball radius of the virtual character and the gaze direction information, so as to obtain the eyeball animation data, wherein the position of the virtual pupil is the three-dimensional pupil center position of the virtual character.
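  • For illustration, a small sketch (the coordinate-axis convention is assumed, not specified here) of how the virtual pupil position could be reconstructed from the gaze direction angles and the virtual character's eyeball radius:

```python
# Sketch: keep the user's gaze direction (zenith angle theta, azimuth angle phi)
# and scale by the virtual character's eyeball radius to place the virtual pupil.
import math

def virtual_pupil_position(eye_center, virtual_radius, theta, phi):
    """Spherical (r, theta, phi) -> Cartesian offset from the eyeball center."""
    x = virtual_radius * math.sin(theta) * math.cos(phi)
    y = virtual_radius * math.sin(theta) * math.sin(phi)
    z = virtual_radius * math.cos(theta)
    cx, cy, cz = eye_center
    return (cx + x, cy + y, cz + z)

# Example: virtual eyeball centered at the origin with radius 1.5
print(virtual_pupil_position((0.0, 0.0, 0.0), 1.5, math.radians(80), math.radians(10)))
```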
  • performing redirection processing according to the state information to obtain the animation data of the virtual character includes: inputting the three-dimensional pupil center position corresponding to the user in the current frame into an eye mapping model, wherein the eye mapping model is obtained by training a fifth preset model according to fifth training data, the fifth training data includes multiple pairs of fifth sample information, and each pair of fifth sample information includes the user's three-dimensional pupil center position under a preset gaze direction and the virtual character's three-dimensional pupil center position under the preset gaze direction; and acquiring the virtual pupil center position output by the eye mapping model to obtain the eyeball animation data, wherein the virtual pupil center position is the three-dimensional pupil center position of the virtual character.
  • the current frame image is collected by a single camera.
  • An embodiment of the present invention also provides an animation generation device for a virtual character, the device including: an image acquisition module, configured to acquire a current frame image, the current frame image including an image of the user; a calculation module, configured to determine, according to the current frame image, the state information corresponding to the user in the current frame, the state information including: face information, human body posture information and gaze direction information, and the face information including facial posture information and facial expression information; and a redirection module, configured to perform redirection processing according to the state information to obtain the animation data of the virtual character, wherein the time code of the animation data is the same as that of the current frame image, and the animation data includes: facial animation data, body animation data and eyeball animation data.
  • An embodiment of the present invention also provides a storage medium on which a computer program is stored, and when the computer program is run by a processor, the steps of the above-mentioned method for generating animation of a virtual character are executed.
  • An embodiment of the present invention also provides a terminal, including a memory and a processor, the memory storing a computer program that can run on the processor, and the processor executing the steps of the above method for generating animation of a virtual character when running the computer program.
  • Compared with the prior art, in the solution of the embodiment of the present invention, the current frame image is acquired, and the state information corresponding to the user in the current frame is determined according to the current frame image. Since the state information includes face information, human body posture information and gaze direction information, the animation data of the virtual character obtained according to the state information can have the same semantics as the state information corresponding to the user. In addition, the user does not need to wear specific motion capture clothing or a specific helmet: the user's expression, facial posture, action posture, gaze and other information can be obtained based only on a single frame of image, and redirection processing is then performed according to the state information to obtain the animation of the virtual character. Therefore, the solution provided by the embodiment of the present invention has better versatility, lower cost and better user experience.
  • Further, the torso and neck movement information is obtained based on the facial posture information, which requires less computation, so the efficiency of animation generation can be improved while ensuring the animation effect.
  • Further, the human body posture matching model is a time-series model, and the facial posture information and associated posture information corresponding to the user in the current frame can be input into the human body posture matching model to obtain the torso and neck movement information corresponding to the user in the current frame. Such a scheme helps avoid inaccurate torso and neck movement information caused by jitter of the user's facial posture in a single frame image, and makes the torso and neck posture described by the torso and neck movement information more coherent and smooth, so that the animation of the virtual character is more coherent without additional smoothing.
  • FIG. 1 is a schematic diagram of an application scene of a virtual character animation generation method in a first perspective in an embodiment of the present invention
  • Fig. 2 is a schematic diagram of an application scene of a virtual character animation generation method in a second perspective in an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a method for generating animation of a virtual character in an embodiment of the present invention
  • FIG. 4 is a partial flow diagram of a specific implementation manner of step S302 in FIG. 3;
  • FIG. 5 is a partial flow diagram of another specific implementation manner of step S302 in FIG. 3;
  • FIG. 6 is a partial flow diagram of a specific implementation manner of step S303 in FIG. 3;
  • FIG. 7 is another schematic diagram of an application scene of a virtual character animation generation method in a first perspective in an embodiment of the present invention;
  • FIG. 8 is a schematic structural diagram of an animation generating device for a virtual character in an embodiment of the present invention.
  • an embodiment of the present invention provides a method for generating animation of a virtual character.
  • In the solution of the embodiment of the present invention, the current frame image is acquired, and the state information corresponding to the user in the current frame is determined according to the current frame image. Since the state information includes face information, human body posture information and gaze direction information, the animation data of the virtual character obtained according to the state information can have the same semantics as the state information corresponding to the user. In addition, the user does not need to wear specific motion capture clothing or a specific helmet: the user's expression, facial posture, action posture, gaze and other information can be obtained based only on a single frame of image, and redirection processing is then performed according to the state information to obtain the animation of the virtual character. Therefore, the solution provided by the embodiment of the present invention has better versatility, lower cost and better user experience.
  • FIG. 1 is a schematic diagram of an application scene of a virtual character animation generation method in an embodiment of the present invention in a first perspective.
  • FIG. 2 is a schematic diagram of an application scene of the virtual character animation generation method in a second perspective in an embodiment of the present invention.
  • FIG. 7 is another schematic diagram of the application scenario of a virtual character animation generation method in the embodiment of the present invention in the first perspective.
  • the first viewing angle is different from the second viewing angle.
  • a camera 11 may be used to take pictures of a user 10 .
  • the user 10 is the subject of the camera 11, and the user 10 is a real actor. It should be noted that, compared with the prior art, in the solution of the embodiment of the present invention, the user 10 does not need to wear motion capture clothing, and does not need to wear expression capture devices and eye capture devices.
  • the camera 11 may be various existing appropriate photographing devices, and this embodiment does not limit the type and quantity of the camera 11 .
  • the camera 11 may be an RGB (red, green, blue) camera, or an RGBD (RGB plus depth) camera. That is, the image captured by the camera 11 may be an RGB image, an RGBD image, or the like, but is not limited thereto.
  • the camera 11 shoots the user 10, and the video stream data corresponding to the user 10 can be obtained.
  • the video stream data corresponding to the user 10 can include multiple frames of images, each frame of image has a time code, and each frame of image can include an image of the user 10.
  • the distance between the user 10 and the camera 11 is less than a first preset distance threshold, and the image may include an image of the face of the user 10 , and may also include images of the neck and shoulders of the user 10 .
  • the distance between the user 10 and the camera 11 is generally small, therefore, the image may not include the image of the whole body of the user 10 .
  • the camera 11 in the embodiment of the present invention is not set on a wearable device of the user 10; the distance between the user 10 and the camera 11 is greater than a second preset distance threshold, and the second preset distance threshold is usually far less than the first preset distance threshold.
  • the camera 11 can be connected to the terminal 12, and the terminal 12 can be various existing devices with data receiving and data processing functions, and the camera 11 can send the collected video stream data corresponding to the user 10 to the terminal 12.
  • the terminal 12 may be a mobile phone, a tablet computer, a computer, etc., but is not limited thereto. It should be noted that this embodiment does not limit the connection mode between the camera 11 and the terminal 12, which may be a wired connection or a wireless connection (for example, a Bluetooth connection, a LAN connection, etc.). More specifically, the camera 11 may be a camera set on the terminal 12, for example, may be a camera on a mobile phone, a camera on a computer, and the like.
  • the terminal 12 may sequentially process and analyze each frame of the video stream data corresponding to the user 10 collected by the camera 11 according to the sequence of the time code, so as to obtain the status information corresponding to the user 10 . Furthermore, redirection processing can be performed according to the state information corresponding to the user 10 in each frame of image to obtain the animation data of the virtual character 13 corresponding to the frame of image, and the obtained animation data has the same time code as the image.
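  • The per-frame flow on the terminal 12 can be sketched as follows; the estimation and retargeting callables are placeholders standing in for the modules described in this disclosure:

```python
# Skeleton of the per-frame processing loop: estimate state information for each
# frame, retarget it to the virtual character, and keep the frame's time code.
def process_video_stream(frames, estimate_state, retarget_to_character):
    """frames: iterable of (time_code, image) in time-code order."""
    animation_stream = []
    for time_code, image in frames:
        state = estimate_state(image)               # face, body posture, gaze info
        anim = retarget_to_character(state)         # facial, body, eyeball data
        animation_stream.append((time_code, anim))  # same time code as the image
    return animation_stream

# Toy usage with stand-in callables
demo = process_video_stream(
    frames=[(0, "img0"), (1, "img1")],
    estimate_state=lambda img: {"face": None, "body": None, "gaze": None},
    retarget_to_character=lambda state: {"facial": [], "body": [], "eyeball": []},
)
print(demo)
```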
  • the virtual character 13 may include a virtual person, and may also include virtual animals, virtual plants and other objects with faces and bodies.
  • the virtual character 13 may be three-dimensional or two-dimensional, which is not limited in this embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a method for generating animation of a virtual character in an embodiment of the present invention.
  • the method can be executed by a terminal, and the terminal can be various terminal devices capable of receiving and processing data, for example, a mobile phone, a computer, a tablet computer, etc., which is not limited in the embodiments of the present invention.
  • the terminal may be the terminal 12 shown in FIG. 1 , but is not limited thereto.
  • the animation generation method of the virtual character shown in Fig. 3 may comprise the following steps:
  • Step S301: acquire the current frame image, the current frame image including an image of the user;
  • Step S302: determine, according to the current frame image, the state information corresponding to the user in the current frame, the state information including: face information, human body posture information and gaze direction information, and the face information including facial posture information and facial expression information;
  • Step S303: perform redirection processing according to the state information to obtain the animation data of the virtual character, wherein the time code of the animation data is the same as that of the current frame image, and the animation data includes: facial animation data, body animation data and eyeball animation data.
  • the method can be implemented in the form of a software program running on a processor integrated inside a chip or chip module; alternatively, the method can be implemented by hardware or by a combination of hardware and software.
  • the current frame image may be acquired, and the current frame image may be obtained by taking pictures of the user by the camera. More specifically, the current frame image may be an image currently to be processed in the video stream data corresponding to the user, and the time code of the current frame image may be recorded as the current moment.
  • the video stream data corresponding to the user may be obtained by using a camera to shoot the user.
  • the video stream data corresponding to the user is collected by a single camera, and the camera may be an RGB camera or an RGBD camera, but is not limited thereto.
  • the current frame image includes the image of the user.
  • the current frame image may include an image of the user's face, may also include images of the user's neck and shoulders, may also include images of at least a part of the arm, etc., but is not limited thereto.
  • the state information corresponding to the current frame user may be determined according to the current frame image, and the state information may include: face information, human body posture information and gaze direction information.
  • the state information corresponding to the user may be obtained by restoring and reconstructing the user according to the current frame image.
  • the face information includes facial posture information and facial expression information, wherein the facial posture information is used to describe the position and orientation of the user's face, and more specifically, the position and orientation of the user's face refer to The position and orientation of the user's face in three-dimensional space.
  • the position of the user's face may be the position of the user's face relative to the camera
  • the orientation of the user's face may be the orientation of the user's face relative to the camera.
  • the face information may also include: ID information, which is used to describe the shape of the user's face and the distribution of facial features.
  • the facial expression information can be used to describe the user's expression.
  • the facial expression information may be the weights of multiple blend shapes (blendshapes), where the multiple blend shapes may be preset, but the facial expression information is not limited to this form.
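  • As a common interpretation of expression information expressed as blend shape weights (not necessarily the exact formulation used here), the deformed face can be computed as the neutral mesh plus a weighted sum of per-shape offsets:

```python
# Standard blendshape combination: V = V_neutral + sum_i w_i * (B_i - V_neutral)
import numpy as np

def apply_blendshapes(neutral_vertices, blendshapes, weights):
    """neutral_vertices: (N, 3); blendshapes: (K, N, 3); weights: (K,)."""
    deltas = blendshapes - neutral_vertices[None, :, :]       # per-shape offsets
    return neutral_vertices + np.tensordot(weights, deltas, axes=1)

# Tiny example: 2 blend shapes on a 3-vertex mesh
neutral = np.zeros((3, 3))
shapes = np.random.randn(2, 3, 3)
print(apply_blendshapes(neutral, shapes, np.array([0.3, 0.7])).shape)  # (3, 3)
```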
  • the human body posture information can be used to describe the motion posture of the user's body.
  • the human body posture information may be joint angle data, and more specifically, the joint angle data is the angle of a joint.
  • the gaze direction information may be used to describe the user's gaze direction.
  • the direction from the center of the eyeball to the center of the three-dimensional pupil is the gaze direction.
  • the center position of the eyeball is the position of the center point of the eyeball
  • the three-dimensional pupil center position is the position of the center point of the pupil. Since the center position of the iris coincides with the three-dimensional pupil center position, the specific position of the iris on the eyeball is determined according to the three-dimensional pupil center position, so the iris moves as the three-dimensional pupil center position changes, while the iris size of the same user can be fixed.
  • the iris size is the size of the iris, and the iris size can be used to determine the coverage area of the iris in the eyeball.
  • the gaze direction information may be the three-dimensional pupil center position. More specifically, the gaze direction information may be the zenith angle and the azimuth angle of the three-dimensional pupil center position in a spherical coordinate system with the eyeball center position as the coordinate origin. Specifically, the three-dimensional pupil center position can be expressed in spherical coordinates (r, θ, φ), where r is the radius of the three-dimensional eyeball, θ is the zenith angle, and φ is the azimuth angle.
  • The zenith angle θ and the azimuth angle φ characterize the direction of the ray from the eyeball center position to the three-dimensional pupil center position, so the zenith angle θ and the azimuth angle φ in the spherical coordinates of the three-dimensional pupil center position can be used to indicate the gaze direction.
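  • A small sketch of this decomposition, assuming the z axis of the eyeball-centered frame as the polar axis (the axis convention is an assumption, not fixed by this disclosure):

```python
# Sketch: express the 3D pupil center in spherical coordinates (r, theta, phi)
# about the eyeball center, so (theta, phi) encode the gaze direction.
import math

def gaze_direction(eye_center, pupil_center):
    dx, dy, dz = (p - c for p, c in zip(pupil_center, eye_center))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)   # eyeball radius
    theta = math.acos(dz / r)                    # zenith angle
    phi = math.atan2(dy, dx)                     # azimuth angle
    return r, theta, phi

print(gaze_direction((0, 0, 0), (0.3, 0.1, 1.2)))
```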
  • FIG. 4 is a partial flowchart of a specific implementation manner of step S302 in FIG. 3 .
  • the face information corresponding to the user in the current frame can be obtained, more specifically, the facial pose information and facial expression information corresponding to the user in the current frame can be obtained.
  • Step S302 shown in FIG. 4 may include the following steps:
  • Step S401 Generate a 3D face model according to the initial face information corresponding to the user in the current frame;
  • Step S402 According to the three-dimensional face model, determine estimated face feature information, and calculate a first difference between the estimated face feature information and the target face feature information of the current frame;
  • Step S403 Determine whether the first preset condition is met; if yes, execute step S404, otherwise execute step S405;
  • Step S404 use the initial face information as the face information corresponding to the current frame user;
  • Step S405 update the initial face information, and use the updated initial face information as the initial face information corresponding to the user in the current frame; and return to step S401 until the first preset condition is met.
  • the initial face information corresponding to the user in the current frame may be a preset default value, or may be the face information corresponding to the user in the previous frame.
  • the default value of the identity ID information in the initial face information may be the average value of the identity ID information of multiple sample users. Since the average value calculated from the identity ID information of multiple sample users is universal, it can be used as the default value of the identity ID information in the user's initial face information for the current frame image. The default value of the facial posture information may be a preset position and orientation of the user, and the default value of the facial expression information may be the user's facial expression information under a neutral expression, which may be collected in advance.
  • it should be noted that the "user" in the embodiment of the present invention refers to the user in the current frame image, while the "multiple sample users" refers to the users or performers involved in preparatory work such as training data collection before the camera is used to collect the video stream data.
  • when step S401 is executed for the first time, the face information corresponding to the user in the previous frame can also be used as the initial face information corresponding to the user in the current frame, which helps reduce the amount of calculation and makes the facial animation data of the obtained virtual character smoother without additional smoothing.
  • when step S401 is executed again, that is, when returning from step S405 to step S401, the initial face information corresponding to the user in the current frame may be the updated initial face information.
  • a three-dimensional face model can be synthesized according to the initial face information corresponding to the user in the current frame.
  • the 3D face model in step S401 is obtained based on the initial face information corresponding to the user in the current frame, not based on the image in the current frame.
  • the embodiment of the present invention does not limit the specific method for synthesizing a three-dimensional face model according to the initial face information (identity ID information, facial posture information and facial expression information) corresponding to the user in the current frame; any of various existing methods capable of synthesizing three-dimensional face models may be used.
  • estimated face feature information may be calculated according to the three-dimensional face model obtained in step S401.
  • the estimated face feature information is the face feature information obtained according to the three-dimensional face model, and the estimated face feature information may include: two-dimensional projection point coordinate information and texture feature point coordinate information.
  • multiple 3D feature points may be extracted from the 3D face model, and then the multiple 3D feature points are projected onto a 2D plane to obtain multiple 2D projection points.
  • the two-dimensional plane refers to the plane of the image coordinate system of the camera.
  • multiple vertices are extracted from the 3D human face model according to multiple predefined vertex indices to obtain multiple 3D feature points. That is, the 3D feature points are vertices determined on the 3D face model based on the predefined vertex indices.
  • each vertex index is used to refer to a specific facial part, and different vertex indexes refer to different facial parts.
  • vertex index 3780 is used to refer to the tip of the nose, etc.
  • the 3D face model may include multiple vertices, and the vertices corresponding to the multiple vertex indices may be extracted to obtain multiple 3D feature points.
  • multiple 3D feature points may be projected onto a 2D plane, so as to convert the 3D coordinates of each 3D feature point into the 2D coordinates of the 2D projected point corresponding to the 3D feature point.
  • estimated face feature information can be obtained, that is, the estimated face feature information can include two-dimensional coordinates of multiple two-dimensional projection points.
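  • A minimal sketch of this projection step with a pinhole camera model; the intrinsic parameters fx, fy, cx, cy are assumptions for illustration:

```python
# Sketch: project 3D feature points (vertices picked by predefined vertex
# indices) onto the image plane with a pinhole camera model.
import numpy as np

def project_feature_points(vertices, vertex_indices, fx, fy, cx, cy):
    """vertices: (N, 3) in camera coordinates; returns (M, 2) pixel coordinates."""
    pts = vertices[vertex_indices]        # select the 3D feature points
    u = fx * pts[:, 0] / pts[:, 2] + cx   # perspective divide by depth
    v = fy * pts[:, 1] / pts[:, 2] + cy
    return np.stack([u, v], axis=1)

verts = np.array([[0.0, 0.0, 1.0], [0.1, -0.05, 1.1], [0.02, 0.03, 0.9]])
print(project_feature_points(verts, [0, 2], fx=800, fy=800, cx=320, cy=240))
```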
  • a first difference between the estimated face feature information and the target face feature information may be calculated, where the target face feature information is detected according to the current frame image.
  • the feature information of the target face may include: coordinate information of two-dimensional feature points, where the two-dimensional feature points are points with specific semantic information in the current frame image.
  • a machine learning method may be used to detect the current frame image to detect multiple two-dimensional feature points.
  • the semantic information is predefined, and the semantic information can be used to describe the facial parts corresponding to the two-dimensional feature points.
  • the semantic information of the 2D feature point No. 64 is: nose point.
  • the facial parts described by the semantic information of the plurality of two-dimensional feature points are the same as the facial parts referred to by the plurality of vertex indices.
  • the estimated face feature information may also include texture feature point coordinate information
  • the target face feature information may also include pixel coordinates corresponding to texture feature points.
  • the two-dimensional texture coordinates (u, v) corresponding to a pixel point are determined according to the pixel point in the current frame image, and the three-dimensional texture point corresponding to the pixel point on the three-dimensional face model can then be determined according to a predefined texture mapping relationship. That is, different from the above three-dimensional feature points, the three-dimensional texture points are vertices determined on the three-dimensional face model according to the predefined texture mapping relationship.
  • multiple three-dimensional texture points may be projected onto a two-dimensional plane to obtain two-dimensional coordinates of corresponding texture feature points. Furthermore, the coordinate difference between the pixel point and the corresponding texture feature point can be calculated.
  • the first difference can be calculated according to the coordinate difference between the pixel point and the corresponding texture feature point, and the coordinate difference between the two-dimensional feature point and the two-dimensional projection point.
  • the embodiment of the present invention does not limit the order of detecting the feature information of the target face and determining the feature information of the estimated face.
  • a first difference between the estimated face feature information and the target face feature information may be calculated. More specifically, coordinate differences between a plurality of two-dimensional projected points and a plurality of two-dimensional feature points may be calculated.
  • in step S403, it may be judged whether the first preset condition is met, wherein the first preset condition may include: the first difference is not greater than a first preset threshold, and/or the number of times the initial face information has been updated reaches a second preset threshold.
  • if the first preset condition is met, step S404 may be executed, that is, the initial face information corresponding to the user in the current frame may be used as the face information corresponding to the user in the current frame. In this case, it can be determined that the three-dimensional face model in step S401 conforms to the user's real face; in other words, the face information in step S401 can accurately and truly describe the user's facial posture, facial expression and the like in the current frame image. Otherwise, step S405 can be executed, that is, the initial face information is updated, and steps S401-S403 are executed again according to the updated initial face information until the first preset condition is met. A simplified sketch of this fitting loop is shown below.
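  • A simplified, illustrative version of the loop of steps S401-S405 is sketched below; the synthesis, projection and update functions are stand-ins, since this disclosure does not prescribe a particular optimizer:

```python
# Sketch of the S401-S405 loop: synthesize a face model from the current face
# parameters, compare its projected feature points with the detected target
# features, and update the parameters until the first difference is small
# enough or the update count reaches its limit (the first preset condition).
import numpy as np

def numeric_gradient(f, x, eps=1e-4):
    g = np.zeros_like(x)
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        g[i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return g

def fit_face(initial_params, target_features, synthesize, project,
             diff_threshold=1e-3, max_updates=60, step=0.1):
    params = np.asarray(initial_params, dtype=float)
    objective = lambda p: np.sum((project(synthesize(p)) - target_features) ** 2)
    for _ in range(max_updates):                   # update-count part of the condition
        estimated = project(synthesize(params))    # steps S401-S402
        first_difference = np.linalg.norm(estimated - target_features)
        if first_difference <= diff_threshold:     # first preset condition (S403)
            break                                  # S404: accept current parameters
        params = params - step * numeric_gradient(objective, params)  # S405: update
    return params

# Toy usage: the "parameters" are directly treated as 2D feature coordinates
identity = lambda p: p
print(fit_face([0.0, 0.0], np.array([1.0, 2.0]), identity, identity))
```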
  • only facial pose information and facial expression information may be updated each time initial face information is updated, that is, user ID information is not updated.
  • the identity ID information of the user may be predetermined. Since the user in the application scenario of this embodiment is usually fixed, that is, the object captured by the camera during video recording is usually the same person, the user's identity ID information can be fixed, and the predetermined identity ID information can be used. Adopting this solution can simplify the calculation of the face information and help improve the efficiency of animation generation.
  • the identity ID information of the user may be determined before obtaining the video stream data corresponding to the user.
  • a plurality of identity images can be acquired, each identity image including an image of the user, wherein the user's expression in each identity image is a default expression and the user's facial posture (that is, the position and/or orientation of the face) can be different. A preset initial three-dimensional face model can then be iteratively optimized based on the multiple identity images to obtain the user's identity ID parameters, and the identity ID parameters obtained based on the multiple identity images can be used in the subsequent process of generating the animation of the virtual character. The preset initial three-dimensional face model refers to a three-dimensional face model constructed with preset default values of the identity ID parameters, the facial posture information and the facial expression information, that is, an initial model without any optimization or adjustment, and the default expression can be a neutral expression.
  • the human body posture information corresponding to the user in the current frame may also be determined.
  • the human body posture information may be obtained by directly constructing a three-dimensional human body model from an image, or may be obtained through calculation based on facial posture information, which is not limited in this embodiment.
  • the human body posture information may include: trunk and neck movement information.
  • the torso and neck action information is used to describe the action posture of the user's torso and neck.
  • the movement information of the trunk and neck may include joint angle data of a plurality of first preset joints, and the first preset joints are joints located on the trunk and the neck.
  • the torso and neck movement information corresponding to the user in the current frame is calculated based on the facial posture information corresponding to the user in the current frame.
  • the facial posture information corresponding to the user in the current frame can be input into the human body posture matching model to obtain the torso and neck movement information corresponding to the user in the current frame.
  • the human body posture matching model may be obtained by training a first preset model according to the first training data, wherein the first preset model may be various existing models with learning capabilities.
  • the first training data may include multiple pairs of first sample information, and each pair of first sample information includes facial posture information corresponding to the sample user and torso and neck movement information corresponding to the sample user. More specifically, each pair of first sample information is obtained by performing motion capture on the sample user, and there is a corresponding relationship between facial posture information and torso and neck motion information belonging to the same pair of first sample information.
  • the multiple pairs of first sample information may be obtained by performing motion capture on the same sample user, or may be obtained by performing motion capture on multiple first sample users. It should be noted that the sample users in this embodiment of the present invention are real people.
  • the human body posture matching model trained by using the first training data can learn the relationship between the position and orientation of the real person's face and the posture of the real person's torso and neck. Therefore, the torso and neck movement information output by the human body pose matching model is real and natural, that is, the user's overall posture presented by the output torso and neck movement information and the input facial posture information is real and natural.
  • compared with constructing a three-dimensional human body model directly from the image, calculating the torso and neck movement information from the facial posture information requires less computation, so the efficiency of animation generation can be improved while ensuring the animation effect.
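  • Since the first preset model only needs to be a model with learning capability, a minimal stand-in can be sketched as a regularized linear regression from facial posture features to torso and neck joint angles; the feature layout below is an assumption for illustration:

```python
# Minimal stand-in for the human body posture matching model: ridge regression
# from facial posture features (position + orientation) to torso/neck joint
# angles, trained on (facial posture, torso/neck angles) sample pairs.
import numpy as np

def train_posture_matcher(facial_pose, torso_neck_angles, ridge=1e-3):
    """facial_pose: (S, F) sample features; torso_neck_angles: (S, J) targets."""
    X = np.hstack([facial_pose, np.ones((facial_pose.shape[0], 1))])  # bias term
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ torso_neck_angles)

def predict_torso_neck(W, facial_pose_frame):
    x = np.append(facial_pose_frame, 1.0)
    return x @ W                                   # torso/neck joint angle data

# Toy data: 100 samples, 6-D facial pose (position + rotation), 5 joint angles
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
Y = X @ rng.normal(size=(6, 5)) + 0.01 * rng.normal(size=(100, 5))
W = train_posture_matcher(X, Y)
print(predict_torso_neck(W, X[0]).shape)  # (5,)
```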
  • the torso and neck movement information corresponding to the user in the current frame is calculated based on the facial posture information and associated posture information corresponding to the user in the current frame. More specifically, the input of the human pose matching model is the facial pose information corresponding to the user in the current frame and the associated pose information corresponding to the user in the current frame, and the corresponding output is the torso and neck movement information corresponding to the user in the current frame.
  • the associated posture information includes: the facial posture information and/or the torso and neck movement information corresponding to the user in the associated image, and the associated image is a continuous multi-frame image before the current frame image and/or the current frame image. Consecutive multiple frames of images after a frame image.
  • the time code of the current frame image is recorded as t1
  • the associated posture information may include the facial posture information corresponding to the user in multiple consecutive images with time codes from t1-T to t1-1, where T is a positive integer .
  • the associated gesture information may also include the user's corresponding torso and neck movement information in multiple consecutive images with time codes from t1-T to t1-1.
  • the associated posture information may include facial posture information and torso and neck motion information corresponding to the user in the 30 frame images adjacent to the current frame image and before the current frame image.
  • the associated posture information may also include facial posture information corresponding to the user in images with time codes from t1+1 to t1+T, and may also include torso information corresponding to users in images with time codes from t1+1 to t1+T Neck movement information.
  • the associated posture information may also include facial posture information and torso and neck motion information corresponding to the user in the 30 frame images adjacent to the current frame image and after the current frame image.
  • the human body posture matching model can be a time-series model, and the facial posture information and associated posture information corresponding to the user in the current frame can be input into the human body posture matching model to obtain the torso and neck movement information corresponding to the user in the current frame. Adopting such a scheme helps avoid inaccurate torso and neck movement information caused by jitter of the user's facial posture in a single frame image, and makes the torso and neck posture described by the torso and neck movement information more coherent and smooth, so that the animation of the virtual character flows more smoothly without additional smoothing.
  • the human body posture information may further include: limb movement information, and the limb movement information may be used to describe the movement posture of the user's limbs.
  • limb motion information can be used to describe the motion of a user's arm.
  • the limb movement information may include joint angle data of a plurality of second preset joints, and the plurality of second preset joints are joints located in the limbs. More specifically, the plurality of second preset joints may include arm joints.
  • the limb movement information may be a preset default value, for example, the arm movement represented by the default value of the limb movement information may be natural drooping, etc., but it is not limited thereto.
  • the limb movement information may also be input by the user.
  • fusion processing may be performed on the torso and neck movement information and the limb movement information to obtain human body posture information of the current frame.
  • before the fusion processing, it may be judged whether the trunk and neck movements described by the trunk and neck movement information meet the action condition, and if not, the trunk and neck movement information is adjusted so that the trunk and neck movements described by the adjusted trunk and neck movement information meet the action condition.
  • the action condition is determined according to the limb movement information.
  • if the trunk and neck movement information satisfies the action condition, it can be determined that the overall body posture presented by the limb actions described by the limb movement information and the trunk and neck actions described by the trunk and neck movement information is reasonable and real. If the trunk and neck movement information does not satisfy the action condition, it can be determined that this overall body posture is unreasonable; in other words, the action condition requires a trunk and neck action posture that matches the limb action posture described by the limb movement information, and the condition is not met when the trunk and neck posture is inconsistent with the limb posture.
  • FIG. 5 is a partial flow diagram of another specific implementation manner in step S302 .
  • the gaze direction information corresponding to the user in the current frame can be obtained.
  • Step S302 shown in FIG. 5 may include the following steps:
  • Step S501 Determine the three-dimensional eyeball model according to the eye information corresponding to the user in the current frame and the estimated pupil center position;
  • Step S502 Determine estimated eye feature information according to the three-dimensional eyeball model, and calculate a second difference between the estimated eye feature information and the target eye feature information;
  • Step S503 judge whether the second preset condition is met; if yes, execute step S504; otherwise, execute step S505;
  • Step S504 Use the estimated pupil center position as the three-dimensional pupil center position corresponding to the user in the current frame;
  • Step S505 update the estimated pupil center position, and use the updated estimated pupil center position as the estimated pupil center position corresponding to the user in the current frame; and return to step S501 until the second preset condition is met.
  • the eye information includes the eyeball center position, the eyeball radius and the iris size. It can be understood that the eye information is personalized data of each person's eyeballs: the specific values of the eye information differ between users, the specific values of the eye information of the same user can be fixed, but the gaze direction information of the same user may vary.
  • before executing step S501 for the first time, it can be judged according to the current frame image whether the eye is in a closed state; if so, the gaze direction information of that eye at the previous moment can be used as the gaze direction information corresponding to the user in the current frame, that is, there is no need to execute the steps shown in FIG. 5 for that eye.
  • the eye information corresponding to the user in the current frame may be a preset default value, or may be the eye information corresponding to the user in the previous frame.
  • the estimated pupil center position corresponding to the user in the current frame may be a preset default value, or may be the three-dimensional pupil center position corresponding to the user in the previous frame.
  • the default value of the eyeball center position can be the average value of the eyeball center positions of multiple sample users; similarly, the default value of the eyeball radius can be the average value of the eyeball radii of multiple sample users, and the default value of the iris size may be the average of the iris sizes of multiple sample users.
  • the default value of the estimated pupil center position may be the position of the pupil when the user looks ahead.
  • when step S501 is executed again, that is, when returning from step S505 to step S501, the estimated pupil center position corresponding to the user in the current frame may be the updated estimated pupil center position.
  • a three-dimensional eyeball model can be synthesized according to the eye information corresponding to the user in the current frame and the estimated pupil center position.
  • the embodiment of the present invention does not limit the specific method for synthesizing the 3D eyeball model based on the eye information and the estimated pupil center position, and various existing methods capable of synthesizing the 3D eyeball model may be used.
  • in another embodiment, when step S501 is executed for the first time, the eye information corresponding to the user in the previous frame may also be used as the eye information corresponding to the user in the current frame, and the three-dimensional pupil center position corresponding to the user in the previous frame may be used as the estimated pupil center position corresponding to the user in the current frame; this not only reduces the amount of calculation, but also makes the eyeball animation data of the virtual character smoother without additional smoothing.
  • in the specific implementation of step S502, the estimated eye feature information can be determined according to the three-dimensional eyeball model obtained in step S501, and the second difference between the estimated eye feature information and the target eye feature information can be calculated, where the target eye feature information is obtained by detecting the current frame image.
  • the eye feature information may include two-dimensional pupil center positions, iris mask positions, and the like.
  • the two-dimensional pupil center position refers to the position of the pupil in the two-dimensional plane, and the iris mask position may refer to the position of the iris mask in the two-dimensional plane.
  • the estimated eye feature information can be obtained by projecting the pupil position and iris mask position of the three-dimensional eyeball model onto the two-dimensional plane, and the target eye feature information can be obtained by detecting the current frame image using a machine learning method.
  • in the specific implementation of step S503, it may be judged whether the second preset condition is satisfied, where the second preset condition may include: the second difference is not greater than a third preset threshold, and/or the number of times the estimated pupil center position has been updated reaches a fourth preset threshold.
  • the third preset threshold may be the same as or different from the first preset threshold, and the fourth preset threshold may be the same as or different from the second preset threshold.
  • if the second preset condition is satisfied, step S504 can be executed, that is, the estimated pupil center position corresponding to the user in the current frame can be used as the three-dimensional pupil center position corresponding to the user in the current frame.
  • if the second preset condition is not satisfied, step S505 can be executed, that is, the estimated pupil center position is updated, and steps S501 to S503 continue to be executed according to the updated estimated pupil center position until the second preset condition is met.
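  • the loop of steps S501 to S505 is essentially an analysis-by-synthesis optimization: a candidate pupil center is used to synthesize a three-dimensional eyeball model, the model is projected to the image plane, and the projection error against the detected eye features drives the next update. The following is a minimal sketch of that loop, assuming helper functions synthesize_eyeball and project_eye_features are supplied elsewhere and using a simple random local search as the update rule; none of these specifics are prescribed by the patent.

```python
import numpy as np

def estimate_pupil_center(eye_info, init_pupil, target_features,
                          synthesize_eyeball, project_eye_features,
                          max_updates=20, tol=1.5):
    """Analysis-by-synthesis loop sketched from steps S501-S505.

    eye_info        : dict with 'center', 'radius', 'iris_size' (fixed per user)
    init_pupil      : initial 3D pupil center (previous frame or default value)
    target_features : array of 2D eye features detected in the current frame
    synthesize_eyeball / project_eye_features : assumed helper callables
    """
    pupil = np.asarray(init_pupil, dtype=float)
    for n_updates in range(max_updates + 1):
        # Step S501: build a 3D eyeball model from fixed eye info + candidate pupil.
        eyeball = synthesize_eyeball(eye_info, pupil)
        # Step S502: project to 2D and measure the second difference.
        est_features = project_eye_features(eyeball)
        diff = np.linalg.norm(np.asarray(est_features) - np.asarray(target_features))
        # Step S503: second preset condition (error small enough or update budget spent).
        if diff <= tol or n_updates == max_updates:
            return pupil  # Step S504: accept as the 3D pupil center of the current frame.
        # Step S505: update the estimate, here by a simple random local search.
        best = pupil
        for _ in range(8):
            cand = pupil + np.random.normal(scale=0.3, size=3)
            cand_diff = np.linalg.norm(
                np.asarray(project_eye_features(synthesize_eyeball(eye_info, cand)))
                - np.asarray(target_features))
            if cand_diff < diff:
                best, diff = cand, cand_diff
        pupil = best
```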
  • similar to the identity ID information, considering that the user is usually fixed in this application scenario, the eye information can be fixed, that is, the eye information can be predetermined and only the estimated pupil center position is updated in each iteration, which can simplify the calculation process and improve the efficiency of animation generation.
  • the user's eye information is determined before acquiring the video stream data corresponding to the user.
  • specifically, the eye information can be determined according to an eye image, where the eye image includes an image of the user's eyes, the user's expression in the eye image is a neutral expression, and the gaze direction is looking straight ahead.
  • a plurality of three-dimensional eyelid feature points can be determined according to the eye image, the average of the three-dimensional positions of the multiple eyelid feature points can then be calculated, and a preset three-dimensional offset can be added to the average to obtain the eyeball center position, where the offset direction points toward the inside of the eyeball.
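  • a minimal sketch of this step is shown below, assuming the three-dimensional eyelid feature points are already available as an N×3 array and that the inward direction of the eye socket is known; the 5 mm offset is an illustrative value, not one specified by the patent.

```python
import numpy as np

def eyeball_center_from_eyelids(eyelid_points_3d, inward_dir, offset_mm=5.0):
    """Average the 3D eyelid feature points and push the mean toward the eye interior."""
    mean_pos = np.asarray(eyelid_points_3d, dtype=float).mean(axis=0)
    inward = np.asarray(inward_dir, dtype=float)
    inward /= np.linalg.norm(inward)          # unit vector pointing into the eyeball
    return mean_pos + offset_mm * inward      # preset 3D offset toward the eyeball interior
```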
  • iterative optimization can also be performed on the preset initial three-dimensional eyeball model according to the eye image to obtain the iris size.
  • in the subsequent process of generating the animation of the virtual character, the eyeball center position and iris size obtained from the eye image can be used, and the eyeball radius can be the average of the eyeball radii of multiple sample users.
  • the preset initial 3D eyeball model refers to a 3D eyeball model in which both the eyeball information and the 3D pupil center position are constructed using preset default values.
  • in other embodiments, the eye information may also be updated each time step S505 is performed, which is not limited in this embodiment of the present invention.
  • after the three-dimensional pupil center positions of the two eyes are obtained respectively, the gaze direction information of the two eyes can be determined respectively, and it can be judged whether the gaze direction information of the two eyes satisfies the interaction relationship. If the gaze directions of the two eyes do not satisfy the interaction relationship, it can be determined that the calculated gaze direction information is incorrect, and the gaze direction information corresponding to the user in the previous frame is used as the gaze direction information corresponding to the user in the current frame.
  • the interaction relationship means that the gaze direction of the left eye and the gaze direction of the right eye can be produced by the same person at the same time.
  • from the above, the face information, gaze direction information and human body posture information corresponding to the user in the current frame can be obtained.
  • redirection processing may be performed according to the status information corresponding to the user in the current frame, so as to obtain the animation data of the virtual character in the current frame.
  • the animation data of the virtual character may include controller data for generating the animation of the virtual character, and the specific form is a sequence of digitized vectors.
  • the animation data can be converted into a data form that can be received by UE or Unity3d (the weights of multiple blend shapes and joint angle data) and input into a rendering engine, such as UE or Unity3d, to drive the corresponding parts of the virtual character to perform the corresponding actions.
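  • as an illustration of the data form mentioned here, one frame of controller data can be flattened into blend shape weights plus joint angles before being handed to the engine; the field names in the sketch below are assumptions for illustration, and the actual engine-specific ingestion interface is not detailed in the patent.

```python
def pack_frame_for_engine(timecode, blendshape_weights, joint_angles):
    """Convert one frame of controller data into an engine-friendly payload.

    blendshape_weights : dict of blend shape name -> weight in [0, 1]
    joint_angles       : dict of joint name -> (rx, ry, rz) Euler angles in degrees
    """
    return {
        "timecode": timecode,                    # kept identical to the source image frame
        "blendshapes": dict(blendshape_weights), # facial animation controllers
        "joints": {k: tuple(v) for k, v in joint_angles.items()},  # body controllers
    }

# Usage: payload = pack_frame_for_engine("01:02:03:04", {"jawOpen": 0.4}, {"neck": (5, 0, 0)})
```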
  • FIG. 6 is a partial flowchart of a specific implementation manner of step S303 in FIG. 3 .
  • the body animation data of the virtual character in the current frame can be determined according to the human body posture information corresponding to the user in the current frame.
  • Step S303 shown in FIG. 6 may include the following steps:
  • Step S601: Generate a transitional skeleton model;
  • Step S602: Determine the positions of the plurality of preset key joints according to the joint angle data of the first skeletal model and the first skeletal model;
  • Step S603: Determine the joint angle data of the transitional skeleton model according to the positions of the plurality of preset key joints and the transitional skeleton model, so as to obtain the body animation data of the virtual character.
  • considering that the user's skeleton and the virtual character's skeleton are inconsistent in definition (the number of bones, the default orientation of the bones, and the joint positions), it is impossible to directly transmit the user's body posture information to the virtual character.
  • specifically, on the one hand, since the default orientation of the rotation axes of the joints in the skeletal model describing the user's skeleton differs from that in the virtual character's skeletal model, the definition of the joint angles is also different, and the joint angle data cannot be passed directly.
  • on the other hand, since the joint positions in the two skeletal models are different, transferring the pose using inverse kinematics will also cause problems in the posture of the virtual character.
  • a transitional skeleton model may be generated according to the first skeleton model and the second skeleton model.
  • the first skeleton model is a skeleton model corresponding to the user, more specifically, the skeleton model corresponding to the user is a skeleton model that can be used to describe the skeleton of the user.
  • the first skeletal model may be obtained by reconstruction from the image, or may be a preset average skeleton; if the average skeleton is adopted, the step of obtaining the user's skeletal model can be omitted.
  • the second skeleton model is a skeleton model of the virtual character.
  • the skeleton form of the transitional skeleton model is the same as that of the second skeletal model, where the skeleton form includes the number of bones and the default orientation of the rotation axis of each joint. That is, the number of bones in the transitional skeleton model is the same as that in the second skeletal model, the bones in the transitional skeleton model correspond one-to-one to the bones in the second skeletal model, and the default orientation of the rotation axis of each joint in the transitional skeleton model is the same as that of the corresponding joint in the second skeletal model.
  • a plurality of preset key joints are defined in advance. More specifically, the preset key joints may be selected from the first preset joints and the second preset joints mentioned above. The positions of the multiple preset key joints in the first skeletal model may be obtained, and the positions of the multiple preset key joints in the transitional skeleton model are respectively set to the positions of those joints in the first skeletal model, so that the transitional skeleton model can be obtained. More specifically, the position of each preset key joint in the transitional skeleton model is the same as the position of that preset key joint in the first skeletal model.
  • the positions of multiple preset key joints in the first skeletal model can be calculated according to the joint angle data of the first skeletal model and the first skeletal model. Since the position of each preset key joint in the transition skeleton model is the same as the position of the preset key joint in the first skeleton model, the positions of multiple preset key joints in the transition skeleton model can be obtained.
  • the joint angle data of the transitional skeleton model can be calculated and determined according to the positions of multiple preset key joints. Since the skeleton shape of the transitional skeleton model is the same as that of the second skeleton model, the joint angle data of the virtual character can be obtained through direct transmission. In other words, the joint angle data of the transition skeleton model can be directly used as the joint angle data of the virtual character. Further, the joint angle data can be used as body animation data of the virtual character. Thus, the obtained body animation data and human body pose information can have similar semantics.
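  • a compact sketch of the retargeting idea in steps S601 to S603 is given below: forward kinematics on the user's skeletal model yields the key joint positions, the transitional skeleton copies those positions, and its joint angles are solved by least squares so that its own forward kinematics reproduces them. The skeleton interfaces (forward_kinematics, num_dofs) are assumptions made for this sketch; a production solver would also handle joint limits and bone-length differences.

```python
import numpy as np
from scipy.optimize import least_squares

def retarget_body(first_model, first_joint_angles, transition_model, key_joints):
    """Steps S601-S603: obtain joint angles for the transitional skeleton.

    first_model / transition_model are assumed to expose
    forward_kinematics(joint_angles) -> {joint_name: xyz position}.
    """
    # Step S602: key joint positions computed from the user's skeletal model.
    user_positions = first_model.forward_kinematics(first_joint_angles)
    targets = np.concatenate([user_positions[j] for j in key_joints])

    def residual(angles):
        # Positions the transitional skeleton reaches with candidate joint angles.
        pos = transition_model.forward_kinematics(angles)
        return np.concatenate([pos[j] for j in key_joints]) - targets

    # Step S603: solve joint angles so the transitional skeleton hits the key positions.
    init = np.zeros(transition_model.num_dofs)
    sol = least_squares(residual, init)
    # Because the transitional skeleton shares the virtual character's skeleton form,
    # these joint angles can be passed to the character directly as body animation data.
    return sol.x
```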
  • the body animation data of the virtual character can be further optimized. Specifically, it may be judged whether the body animation data satisfies a preset posture constraint condition, and if not, the body animation data may be adjusted to obtain the body animation data of the current frame.
  • in a specific implementation, the human body posture information includes torso and neck movement information, and redirection processing can be performed according to the torso and neck movement information to obtain the torso and neck animation data of the virtual character, where the torso and neck animation data is used to generate the movements of the virtual character's torso and neck.
  • the joint angle data of the torso and neck of the virtual character may be obtained through redirection according to the joint angle data of the torso and neck corresponding to the user.
  • the animation data of the limbs of the virtual character may be acquired, and the animation data of the limbs is used to generate movements of the limbs of the virtual character.
  • the limb animation data may be preset.
  • the limb animation data of the virtual character corresponding to the current frame may be determined from a plurality of preset limb animation data according to the user's selection.
  • the animation data of the limbs and the animation data of the torso and neck can be fused to obtain the body animation data of the virtual character.
  • before the fusion, it can be judged whether the action corresponding to the torso and neck animation data matches the action corresponding to the limb animation data; if not, the torso and neck animation data can be adjusted so that the action corresponding to the adjusted torso and neck animation data matches the action corresponding to the limb animation data.
  • when the actions match, the overall body posture of the generated virtual character is reasonable and realistic.
  • for example, if the action corresponding to the limb animation data is waving goodbye and the action corresponding to the torso and neck animation data is the torso in a half-lying posture, then the action corresponding to the torso and neck animation data does not match the action corresponding to the limb animation data.
  • if the action corresponding to the adjusted torso and neck animation data is the torso in an upright posture, then the action corresponding to the adjusted torso and neck animation data matches the action corresponding to the limb animation data; a rule-based sketch of such a matching check is given below.
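  • the sketch below assumes each animation clip is tagged with a coarse posture or action label; the label vocabulary and the compatibility table are illustrative assumptions rather than part of the patent.

```python
# Coarse compatibility table: which torso postures are plausible for a given limb action.
COMPATIBLE_TORSO = {
    "wave_goodbye": {"upright", "leaning_forward"},
    "rest":         {"upright", "half_lying", "leaning_back"},
}

def match_torso_to_limbs(torso_neck_anim, limb_anim, fallback_torso="upright"):
    """Return torso/neck animation data whose action matches the limb action."""
    allowed = COMPATIBLE_TORSO.get(limb_anim["action"], {"upright"})
    if torso_neck_anim["posture"] in allowed:
        return torso_neck_anim                      # already matching, keep as-is
    adjusted = dict(torso_neck_anim)
    adjusted["posture"] = fallback_torso            # e.g. half-lying -> upright while waving
    return adjusted

# The matched torso/neck data and the limb data are then fused into body animation data.
```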
  • in another embodiment, if the human body posture information is the human body posture information of the current frame obtained by fusing the above-mentioned torso and neck movement information and limb movement information, then the redirection processing can be performed according to the human body posture information of the current frame to obtain the body animation data of the virtual character corresponding to the current frame.
  • redirection processing may also be performed according to the facial expression information corresponding to the user in the current frame, so as to obtain the facial animation data of the avatar in the current frame.
  • in a specific example, when the facial expression information is the weights of multiple blend shapes, or the weights of multiple principal component vectors obtained by performing principal component analysis on multiple blend shapes, and the virtual character is also pre-defined with blend shapes of the same number and the same semantics, the weights can be directly transferred to the virtual character, that is, the facial expression information can be directly used as the facial animation data of the virtual character.
  • facial expression information can be input into the expression mapping model to obtain facial animation data.
  • specifically, the facial expression information is the weights of multiple blend shapes (Blendshapes), the weights of multiple principal component vectors obtained by performing principal component analysis on multiple blend shapes, or three-dimensional feature points; whichever of these forms is used is input into the expression mapping model to obtain the facial animation data.
  • the expression mapping model is obtained by using the second training data to train the second preset model in advance.
  • the embodiment of the present invention does not limit the type and structure of the second preset model, which may be any of various existing models with learning capability.
  • the second training data used for training may include multiple sets of second sample information, and each set of second sample information includes: the facial expression information of a plurality of sample users under a preset expression and the facial animation data of the virtual character under that preset expression. Different sets of second sample information correspond to different preset expressions. The facial expression information of the plurality of sample users under the preset expression may be collected in advance, and the facial animation data of the virtual character under the preset expression is pre-set by an animator.
  • the expression mapping model trained by using the second training data can learn the mapping relationship between the facial expression information and the facial animation data of the avatar. Therefore, the facial animation data output by the expression mapping model and the facial expression information corresponding to the user can have similar semantics.
  • since the second sample information includes the facial expression information of multiple sample users, the expression mapping model is versatile and can be used to determine the facial animation data of the virtual character according to the facial expression information corresponding to any user.
  • the facial animation data obtained above may include mouth animation data.
  • in some non-limiting examples of the present invention, the mouth animation data can also be determined by the methods described below, and the mouth animation data so determined can overwrite the mouth animation data in the facial animation data obtained above, to obtain updated facial animation data.
  • in the first non-limiting example, expression information related to the mouth can be extracted from the facial expression information and recorded as mouth expression information.
  • the blend shapes related to the mouth can be determined according to the semantics of each blend shape, and the weight of the blend shapes related to the mouth is the mouth expression information.
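  • a short sketch of this extraction is shown below, assuming the blend shape semantics are encoded in their names; the keyword filter is an assumption for illustration and not a convention fixed by the patent.

```python
MOUTH_KEYWORDS = ("mouth", "jaw", "lip")   # assumed naming convention for mouth semantics

def extract_mouth_expression(blendshape_weights):
    """Keep only the blend shape weights whose semantics relate to the mouth."""
    return {name: w for name, w in blendshape_weights.items()
            if any(k in name.lower() for k in MOUTH_KEYWORDS)}

# Example: extract_mouth_expression({"jawOpen": 0.4, "browInnerUp": 0.2}) -> {"jawOpen": 0.4}
```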
  • mouth expression information can be input into the first mouth shape mapping model.
  • the first mouth shape mapping model is obtained by using the third training data to train the third preset model in advance.
  • the third preset model can be various existing models with learning ability, more specifically, the third preset model can be a radial basis (Radial Basis Function) model, but it is not limited thereto.
  • the third training data may include multiple sets of third sample information, and each set of third sample information includes the mouth expression information of multiple sample users under a preset expression and the mouth animation data of the virtual character under that preset expression. Different sets of third sample information correspond to different preset expressions.
  • the mouth expression information of a plurality of sample users under the preset expressions may be collected in advance, and the mouth animation data of the virtual character under the preset expressions is pre-set by an animator.
  • the first mouth shape mapping model trained by using the third training data can learn the mapping relationship between the mouth expression information and the virtual character's mouth animation data. Therefore, the mouth animation data output by the first mouth shape mapping model and the user's mouth expression information may have similar semantics.
  • the first mouth shape mapping model is versatile and can be used to determine the mouth animation data of the virtual character according to any user's mouth expression information.
  • in the second non-limiting example, according to the three-dimensional face model corresponding to the user in the current frame, the three-dimensional feature points related to the mouth can be extracted; more specifically, according to the predefined vertex indices related to the mouth, a plurality of three-dimensional feature points related to the mouth are extracted from the three-dimensional face model and recorded as the three-dimensional mouth feature information.
  • the three-dimensional characteristic information of the mouth can be input into the second mouth shape mapping model to obtain the output animation data of the mouth of the current frame.
  • the second mouth shape mapping model is obtained by using the fourth training data to train the fourth preset model.
  • the fourth preset model can be various existing models with learning ability, more specifically, the fourth preset model can be a radial basis (Radial Basis Function) model, but it is not limited thereto.
  • the fourth training data may include multiple sets of fourth sample information, and each set of fourth sample information includes the three-dimensional mouth feature information of multiple sample users under a preset expression and the mouth animation data of the virtual character under that preset expression.
  • multiple sets of fourth sample information correspond to different preset expressions.
  • the three-dimensional mouth feature information of a plurality of sample users under the preset expressions can be collected in advance; more specifically, the three-dimensional mouth feature information of a sample user under a preset expression can be extracted from the three-dimensional face model of that sample user under the preset expression, and the mouth animation data of the virtual character under the preset expression is pre-set by the animator.
  • the second mouth shape mapping model trained by using the fourth training data can learn the mapping relationship between the three-dimensional feature information of the mouth and the animation data of the mouth of the avatar. Therefore, the mouth animation data output by the second mouth shape mapping model and the user's mouth three-dimensional feature information may have similar semantics.
  • the second mouth shape mapping model is also versatile and can be used to determine the mouth animation data of the virtual character according to any user's three-dimensional mouth feature information.
  • the teeth animation data may also be determined according to the mouth animation data.
  • tooth animation data can be obtained by adding a preset offset to the mouth animation data.
  • the jaw animation data can be extracted from the mouth animation data, and a preset offset can be added to the jaw animation data to obtain the teeth animation data, so that the teeth of the virtual character follow the movement of the jaw, making the overall action posture of the virtual character more realistic and natural.
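  • a minimal sketch of this teeth-follow-jaw rule, assuming the jaw controller values can be isolated from the mouth animation data by name; the controller names and offset value are illustrative assumptions.

```python
def teeth_from_jaw(mouth_anim, jaw_keys=("jawOpen", "jawLeft", "jawRight"), offset=0.02):
    """Derive teeth animation data by offsetting the jaw part of the mouth animation."""
    jaw = {k: v for k, v in mouth_anim.items() if k in jaw_keys}   # extract jaw animation
    return {k: v + offset for k, v in jaw.items()}                 # teeth follow the jaw
```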
  • redirection processing can also be performed according to the gaze direction information corresponding to the user in the current frame, so as to obtain the gaze animation data of the virtual character in the current frame, so that the gaze direction of the virtual character is as consistent as possible with the gaze direction of the user.
  • in a specific example, the zenith angle θ and the azimuth angle φ of the spherical-coordinate representation of the three-dimensional pupil center position corresponding to the user in the current frame can be passed directly to the virtual character.
  • specifically, the eyeball center position, eyeball radius and iris size of the virtual character can be preset, and the zenith angle θ and the azimuth angle φ can be directly used as the direction from the virtual character's eyeball center position to its three-dimensional pupil center position.
  • combined with the eyeball radius of the virtual character, the three-dimensional pupil center position of the virtual character can be determined, and thus the eye animation data of the virtual character can be obtained.
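  • the direct transfer described here amounts to reusing the user's zenith and azimuth angles with the virtual character's own eyeball center and radius. A sketch is shown below, using the physics convention for spherical coordinates (zenith measured from the +z axis); the character's eye parameters are assumed inputs.

```python
import numpy as np

def transfer_gaze(theta, phi, char_eye_center, char_eye_radius):
    """Place the virtual character's 3D pupil center from the user's gaze angles.

    theta : zenith angle of the user's pupil in spherical coordinates (radians)
    phi   : azimuth angle of the user's pupil in spherical coordinates (radians)
    """
    direction = np.array([np.sin(theta) * np.cos(phi),     # unit gaze direction
                          np.sin(theta) * np.sin(phi),
                          np.cos(theta)])
    # The character's pupil sits on its own eyeball sphere along the same direction.
    return np.asarray(char_eye_center) + char_eye_radius * direction
```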
  • gaze animation data may be determined using a gaze mapping model.
  • the gaze mapping model may be obtained by using the fifth training data to train the fifth preset model in advance, and the fifth preset model may be any of various existing models with learning ability.
  • in a non-limiting example, the fifth preset model can be a radial basis model, and the fifth training data can include multiple pairs of fifth sample information, where each pair of fifth sample information includes: the user's three-dimensional pupil center position in a preset gaze direction (which can be recorded as the sample pupil position) and the virtual character's three-dimensional pupil center position in that preset gaze direction (which can be recorded as the sample virtual pupil position).
  • multiple pairs of fifth sample information correspond to different preset gaze directions. More specifically, the plurality of preset gaze directions may include looking straight ahead, looking left, looking right, looking up, looking down, etc., but are not limited thereto.
  • the three-dimensional pupil center positions of the user under multiple preset gaze directions may be obtained based on an image detection algorithm, which is not limited in this embodiment.
  • the three-dimensional pupil center position of the virtual character in each preset gaze direction may be predetermined.
  • the RBF weight parameters of the radial basis model can be calculated according to the user's three-dimensional pupil center positions and the virtual character's three-dimensional pupil center positions under the multiple preset gaze directions; the RBF weight parameters can be used to characterize the mapping relationship between the three-dimensional pupil center position corresponding to the user and the three-dimensional pupil center position of the virtual character, and thus the gaze mapping model can be obtained.
  • the three-dimensional pupil center position corresponding to the user in the current frame can be input into the gaze mapping model to obtain the virtual pupil center position corresponding to the current frame output by the model, thereby obtaining the eye animation data, where the virtual pupil position is the three-dimensional pupil center position of the virtual character.
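  • a compact sketch of the RBF-based gaze mapping follows: the weights are fitted from pairs of sample pupil positions and sample virtual pupil positions collected under the preset gaze directions, and then applied to the current frame's three-dimensional pupil center. A Gaussian kernel is used here as an assumption; the patent does not prescribe a particular basis function.

```python
import numpy as np

def fit_rbf(sample_pupils, sample_virtual_pupils, sigma=1.0):
    """Solve RBF weights mapping user pupil positions to virtual pupil positions."""
    X = np.asarray(sample_pupils, dtype=float)            # (N, 3) user samples
    Y = np.asarray(sample_virtual_pupils, dtype=float)    # (N, 3) character samples
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    K = np.exp(-d2 / (2 * sigma ** 2))                    # Gaussian kernel matrix
    W = np.linalg.solve(K + 1e-8 * np.eye(len(X)), Y)     # RBF weight parameters
    return X, W, sigma

def map_gaze(pupil, rbf_model):
    """Map the user's current 3D pupil center to the virtual pupil center."""
    X, W, sigma = rbf_model
    k = np.exp(-((X - np.asarray(pupil)) ** 2).sum(-1) / (2 * sigma ** 2))
    return k @ W

# Usage: model = fit_rbf(user_samples, character_samples); virtual_pupil = map_gaze(p, model)
```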
  • the animation data of the virtual character corresponding to the current frame can be obtained, and the animation data includes but not limited to: facial animation data, body animation data, eyeball animation data, and the like.
  • video stream data corresponding to the virtual character may be determined according to the animation data.
  • the animation data of the virtual character can be calculated and rendered to obtain the video picture information of the virtual character.
  • animation data can be fed into a real-time engine (eg, UE4, Unity, etc.) for evaluation and rendering.
  • the video frame information has the same time code as the animation data.
  • the video stream data may be sent to a live server, so that the live server forwards the video stream data to other user terminals.
  • the voice information input by the user can also be acquired.
  • the voice information and the video picture come from different devices and can be synchronized according to their respective time codes, so as to obtain the video stream data corresponding to the virtual character, where the picture information is obtained by rendering the virtual character according to the animation data.
  • the voice is synchronized with the expression, eyes and gestures of the virtual character, so as to obtain the live video data of the virtual character.
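  • a minimal sketch of the timecode-based synchronization is shown below, assuming both the captured voice chunks and the rendered picture frames carry comparable timecodes (e.g. frame indices); buffering and drift handling in a real live-streaming pipeline are omitted.

```python
def synchronize_streams(voice_chunks, picture_frames, max_skew=0):
    """Pair voice and picture items whose timecodes match, yielding muxable packets.

    voice_chunks   : list of dicts {"timecode": int, "audio": bytes}
    picture_frames : list of dicts {"timecode": int, "image": object}
    """
    pictures = {f["timecode"]: f for f in picture_frames}
    packets = []
    for chunk in voice_chunks:
        tc = chunk["timecode"]
        frame = pictures.get(tc)
        if frame is None and max_skew:                     # tolerate small skew if allowed
            for dt in range(1, max_skew + 1):
                frame = pictures.get(tc - dt) or pictures.get(tc + dt)
                if frame:
                    break
        if frame:
            packets.append({"timecode": tc, "audio": chunk["audio"], "image": frame["image"]})
    return packets        # ready to be encoded into the video stream data for the live server
```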
  • FIG. 8 is a schematic structural diagram of an animation generating device for a virtual character in an embodiment of the present invention.
  • the device shown in FIG. 8 may include:
  • An image acquisition module 81 configured to acquire a current frame image, the current frame image including the image of the user;
  • Calculation module 82 configured to determine the state information corresponding to the user in the current frame according to the current frame image, the state information includes: face information, human body posture information and gaze direction information, and the face information includes face Posture information and facial expression information;
  • a redirection module 83 configured to perform redirection processing according to the state information to obtain animation data of the virtual character, wherein the animation data is the same as the time code of the current frame image, and the animation data includes: Facial animation data, body animation data, and eye animation data.
  • the animation generation device for the above-mentioned virtual character may correspond to a chip with animation generation function in the terminal; or correspond to a chip module with animation generation function in the terminal, or correspond to the terminal.
  • An embodiment of the present invention also provides a storage medium on which a computer program is stored, and when the computer program is run by a processor, the steps of the above-mentioned method for generating animation of a virtual character are executed.
  • the storage medium may include ROM, RAM, magnetic or optical disks, and the like.
  • the storage medium may also include a non-volatile memory (non-volatile) or a non-transitory (non-transitory) memory, and the like.
  • An embodiment of the present invention also provides a terminal, including a memory and a processor, the memory stores a computer program that can run on the processor, and the processor executes the above virtual character when running the computer program.
  • the steps of the animation generation method include but are not limited to terminal devices such as mobile phones, computers, and tablet computers.
  • the processor may be a central processing unit (CPU for short), and the processor may also be other general-purpose processors, digital signal processors (digital signal processor, DSP for short) , application specific integrated circuit (ASIC for short), off-the-shelf programmable gate array (field programmable gate array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • the non-volatile memory can be read-only memory (read-only memory, referred to as ROM), programmable read-only memory (programmable ROM, referred to as PROM), erasable programmable read-only memory (erasable PROM, referred to as EPROM) , Electrically Erasable Programmable Read-Only Memory (electrically EPROM, referred to as EEPROM) or flash memory.
  • the volatile memory can be random access memory (RAM), which acts as external cache memory.
  • by way of example and not limitation, many forms of random access memory (RAM) are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM) and direct rambus random access memory (DR RAM).
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations.
  • the above-described embodiments may be implemented in whole or in part in the form of computer program products.
  • the computer program product comprises one or more computer instructions or computer programs.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer program can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer program can be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired or wireless means.
  • the disclosed methods, devices and systems can be implemented in other ways.
  • the device embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may be physically included separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • for each device or product applied to or integrated in a chip, each module/unit contained therein may be implemented by hardware such as circuits, or at least some of the modules/units may be implemented by a software program running on a processor integrated inside the chip, with the remaining (if any) modules/units implemented by hardware such as circuits.
  • for each device or product applied to or integrated in a chip module, each module/unit contained therein may be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) of the chip module or in different components; alternatively, at least some of the modules/units may be implemented by a software program running on a processor integrated inside the chip module, with the remaining (if any) modules/units implemented by hardware such as circuits. For each device or product applied to or integrated in a terminal, each module/unit contained therein may be implemented by hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) in the terminal or in different components; alternatively, at least some of the modules/units may be implemented by a software program running on a processor integrated inside the terminal, with the remaining (if any) modules/units implemented by hardware such as circuits.
  • Multiple appearing in the embodiments of the present application means two or more.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A virtual character animation generation method and apparatus, a storage medium, and a terminal. The method comprises: acquiring a current frame image, the current frame image comprising an image of a user; determining, according to the current frame image, state information corresponding to the user in the current frame, the state information comprising face information, human body posture information and gaze direction information, and the face information comprising facial posture information and facial expression information; and performing redirection processing according to the state information to obtain animation data of the virtual character, wherein the animation data has the same time code as the current frame image, and the animation data comprises facial animation data, body animation data and eyeball animation data. The present invention provides a virtual character animation generation method with better versatility, lower cost and a better user experience.

Description

虚拟角色的动画生成方法及装置、存储介质、终端
本申请要求于2021年12月14日提交中国专利局、申请号为202111527313.9、发明名称为“虚拟角色的动画生成方法及装置、存储介质、终端”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及视频动画技术领域,尤其涉及一种虚拟角色的动画生成方法及装置、存储介质、终端。
背景技术
伴随着虚拟现实和增强现实技术的发展,涌现了一批具有代表性的虚拟角色,虚拟直播技术应运而生。虚拟直播技术是指由虚拟角色替代真人主播进行视频制作的技术。现有技术中,通常需要借助特定的环境(例如,动作捕捉实验室)以及特定的装备(例如,表情捕捉设备、动作捕捉设备等)捕捉真人主播的动作和表情等数据,然后再驱动虚拟角色,以得到包含虚拟角色的视频。这样方案对于场地、成本和设备等具有较高的要求,通常需要花费大量的成本,通用性较差。
因此,亟需一种通用性更好、成本更低、用户体验更优的虚拟角色的动画生成方法。
发明内容
本发明解决的技术问题是提供一种通用性更好、成本更低、用户体验更优的虚拟角色的动画生成方法。
为解决上述技术问题,本发明实施例提供一种虚拟角色的动画生成方法,所述方法包括:获取当前帧图像,所述当前帧图像包括用户的影像;根据所述当前帧图像,确定当前帧用户对应的状态信息,所述状态信息包括:人脸信息、人体姿态信息和眼神方向信息,所述人脸信息包括脸部姿态信息和脸部表情信息;根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据,其中,所述动画数据和所述当前帧图像的时间码相同,所述动画数据包括:面部动画数据、身体动画数据和眼球动画数据。
可选的,所述方法还包括:至少根据所述动画数据,确定所述虚拟角色对应的视频流数据;将所述视频流数据发送至直播服务器,以使所述直播服务器将所述视频流数据转发至其他用户终端。
可选的,至少根据所述动画数据,确定所述虚拟角色对应的视频流数据包括:获取用户输入的语音信息;对所述语音信息和画面信息进行同步处理,以得到虚拟角色对应的视频流数据,其中,画面信息是根据所述动画数据对所述虚拟角色进行渲染得到的。
可选的,所述人体姿态信息包括:躯干颈部动作信息,所述躯干颈部动作信息用于描述用户躯干和颈部的动作,所述躯干颈部动作信息是根据所述脸部姿态信息确定的。
可选的,身体动画数据包括躯干颈部动画数据和四肢动画数据,根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据包括:根据所述躯干颈部动作信息进行重定向处理,以得到所述躯干颈部动画数据;获取用户选择的四肢动画数据;判断所述躯干颈部动画数据对应的动作和所述四肢动画数据对应的动作是否匹配,如果否,则调整所述躯干颈部动画数据,以使调整后的躯干颈部动画数据与所述四肢动画数据对应的动作匹配;将所述四肢动画数据与匹配的躯干颈部动画数据进行融合处理,以得到身体动画数据。
可选的,根据所述当前帧图像,确定当前帧用户对应的状态信息包括:获取用户输入的四肢动作信息,所述四肢动作信息用于描述用 户四肢的动作;对所述躯干颈部动作信息和所述四肢动作信息进行融合处理,以得到当前帧的人体姿态信息。
可选的,对所述躯干颈部动作信息和所述四肢动作信息进行融合处理之前,所述方法还包括:判断所述躯干颈部动作信息描述的躯干和颈部的动作是否满足动作条件,如果否,则调整所述躯干颈部动作信息,以使调整后的躯干颈部动作信息描述的躯干和颈部的动作满足所述动作条件;其中,所述动作条件是根据所述四肢动作信息确定的。
可选的,根据所述当前帧图像,确定当前帧用户对应的状态信息包括:根据所述当前帧图像,确定当前帧用户对应的脸部姿态信息;将当前帧用户对应的脸部姿态信息输入至人体姿态匹配模型,以得到当前帧用户对应的躯干颈部动作信息;其中,所述人体姿态匹配模型是根据第一训练数据对第一预设模型进行训练得到的,所述第一训练数据包括多对第一样本信息,每对第一样本信息包括:样本用户对应的脸部姿态信息和所述样本用户对应的躯干颈部动作信息。
可选的,将所述脸部姿态信息输入至人体姿态匹配模型包括:获取关联姿态信息,所述关联姿态信息包括:关联图像中用户对应的脸部姿态信息和/或躯干颈部动作信息,其中,所述关联图像为所述当前帧图像之前的连续多帧图像和/或所述当前帧图像之后的连续多帧图像;将当前帧用户对应的脸部姿态信息和所述关联姿态信息输入至所述人体姿态匹配模型,以得到当前帧用户对应的躯干颈部动作信息。
可选的,根据所述当前帧图像,确定用户对应的状态信息包括:步骤A:根据当前帧用户对应的初始人脸信息,生成三维人脸模型;步骤B:根据所述三维人脸模型,确定预估人脸特征信息,并计算所述预估人脸特征信息和当前帧的目标人脸特征信息之间的第一差异,其中,所述目标人脸特征信息是根据所述当前帧图像检测得到的;步骤C:判断是否满足第一预设条件,如果是,则执行步骤D,否则执行步骤E;步骤D:将所述初始人脸信息作为当前帧用户对应的人脸 信息;步骤E:更新所述初始人脸信息,将更新后的初始人脸信息作为当前帧用户对应的初始人脸信息,并返回至步骤A,直至满足所述第一预设条件;其中,首次执行步骤A时,当前帧用户对应的初始人脸信息为上一帧用户对应的人脸信息,或者为预设的人脸信息,所述第一预设条件包括:所述第一差异不大于第一预设阈值和/或更新所述初始人脸信息的次数达到第二预设阈值。
可选的,所述眼神方向信息包括三维瞳孔中心位置,根据所述当前帧图像,确定当前帧用户对应的状态信息包括:步骤一:根据当前帧用户对应的眼部信息和预估瞳孔中心位置,确定三维眼球模型,其中,所述眼部信息包括:眼球中心位置、眼球半径和虹膜尺寸;步骤二:根据所述三维眼球模型,计算预估眼部特征信息,并计算所述预估眼部特征信息和目标眼部特征信息之间的第二差异,其中,所述目标眼部特征信息是根据所述当前帧图像检测得到的;步骤三:判断是否满足第二预设条件,如果是,则执行步骤四,否则,执行步骤五;步骤四:将所述预估瞳孔中心位置作为当前帧用户对应的三维瞳孔中心位置;步骤五:更新所述预估瞳孔中心位置,将更新后的预估瞳孔中心位置作为当前帧用户对应的预估瞳孔中心位置,并返回步骤一,直至满足所述第二预设条件;其中,首次执行步骤一时,当前帧用户对应的预估瞳孔中心位置是上一帧用户对应的三维瞳孔中心位置或者是预设位置,所述第二预设条件包括:所述第二差异不大于第三预设阈值和/或所述更新所述预估瞳孔中心位置的次数达到第四预设阈值。
可选的,所述人体姿态信息包括第一骨骼模型的关节角数据,根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据包括:生成过渡骨骼模型,所述过渡骨骼模型中多个预设关键关节的位置和所述第一骨骼模型中所述多个预设关键关节的位置相同,所述过渡骨骼模型的骨骼形态和第二骨骼模型的骨骼形态相同;根据所述第一骨骼模型的关节角数据和所述第一骨骼模型,确定所述多个预设关键关节的位置;根据所述多个预设关键关节的位置和所述过渡骨骼模 型,确定所述过渡骨骼模型的关节角数据,以得到所述虚拟角色的身体动画数据;其中,所述第一骨骼模型为与用户对应的骨骼模型,所述第二骨骼模型为所述虚拟角色的骨骼模型,所述骨骼形态包括骨骼的数量和每个关节的旋转轴的默认朝向。
可选的,根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据包括:将所述脸部表情信息输入至表情映射模型,其中,所述表情映射模型是根据第二训练数据对第二预设模型训练得到的,所述第二训练数据包括多组第二样本信息,每组样本信息包括:多个样本用户在预设表情下的脸部表情信息和所述虚拟角色在该预设表情下的面部动画数据,其中,所述多组第二样本信息对应不同的预设表情;获取所述表情映射模型输出的面部动画数据。
可选的,所述面部动画数据包括嘴部动画数据,根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据包括:从所述脸部表情信息中提取与嘴部相关的表情信息,记为嘴部表情信息;将所述嘴部表情信息输入至第一嘴型映射模型,其中,所述第一嘴型映射模型是根据第三训练数据对第三预设模型训练得到的,所述第三训练数据包括多组第三样本信息,每组第三样本信息包括:多个样本用户在预设表情下的嘴部表情信息以及所述虚拟角色在所述预设表情下的嘴部动画数据,其中,所述多组第三样本信息对应不同的预设表情;获取所述第一嘴型映射模型输出的嘴部动画数据。
可选的,所述面部动画数据包括嘴部动画数据,根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据包括:根据当前帧用户对应的三维人脸模型,提取与嘴部相关的三维特征点,记为嘴部三维特征信息;将所述嘴部三维特征信息输入至第二嘴型映射模型,其中,所述第二嘴型映射模型是根据第四训练数据对第四预设模型训练得到的,所述第四训练数据包括多组第四样本信息,每组第四样本信息包括:多个样本用户在预设表情下的嘴部三维特征信息和所述虚拟角色在所述预设表情下的嘴部动画数据,其中,所述多组第四 样本信息对应不同的预设表情;获取所述第二嘴型映射模型输出的嘴部动画数据。
可选的,所述动画数据还包括牙齿动画数据,根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据还包括:根据所述嘴部动画数据,确定所述牙齿动画数据。
可选的,所述眼神方向信息为所述三维瞳孔中心位置在以所述眼球中心位置为坐标原点的球坐标系中的天顶角和方位角,根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据包括:根据所述虚拟角色的眼球半径和所述眼神方向信息,确定虚拟瞳孔位置,以得到所述眼球动画数据,其中,所述虚拟瞳孔位置为所述虚拟角色的三维瞳孔中心位置。
可选的,根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据包括:将当前帧用户对应的三维瞳孔中心位置输入至眼神映射模型,其中,所述眼神映射模型是根据第五训练数据对第五预设模型进行训练得到的,所述第五训练数据包括多对第五样本信息,每对第五样本信息包括用户在预设眼神方向下的三维瞳孔中心位置和所述虚拟角色在该预设眼神方向下的三维瞳孔中心位置;从所述眼神映射模型获取虚拟瞳孔中心位置,以得到所述眼球动画数据,其中,所述虚拟瞳孔位置为所述虚拟角色的三维瞳孔中心位置。
可选的,所述当前帧图像是由单个摄像头采集的。
本发明实施例还提供一种虚拟角色的动画生成装置,所述装置包括:图像获取模块,用于获取当前帧图像,所述当前帧图像包括用户的影像;计算模块,用于根据所述当前帧图像,确定当前帧用户对应的状态信息,所述状态信息包括:人脸信息、人体姿态信息和眼神方向信息,所述人脸信息包括脸部姿态信息和脸部表情信息;重定向模块,用于根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据,其中,所述动画数据和所述当前帧图像的时间码相同,所述动画数据包括:面部动画数据、身体动画数据和眼球动画数据。
本发明实施例还提供一种存储介质,其上存储有计算机程序,所述计算机程序被处理器运行时,执行上述的虚拟角色的动画生成方法的步骤。
本发明实施例还提供一种终端,包括存储器和处理器,所述存储器上存储有可在所述处理器上运行的计算机程序,所述处理器运行所述计算机程序时执行上述的虚拟角色的动画生成方法的步骤。
与现有技术相比,本发明实施例的技术方案具有以下有益效果:
在本发明实施例的方案中,获取当前帧图像,并根据当前帧图像确定当前帧用户对应的状态信息,由于状态信息包括人脸信息、人体姿态信息和眼神方向信息,因此,根据状态信息得到的虚拟角色的动画数据可以具有与用户对应的状态信息相同的语义。采用这样的方案,用户无需穿着特定的动作捕捉服装,也无需佩戴特定的头盔,仅根据单帧图像即可得到用户的表情、脸部姿态、动作姿态和眼神等信息,然后再根据状态信息进行重定向处理,得到虚拟角色的动画,因此本发明实施例提供的方案通用性更好、成本更低且用户体验更好。
进一步地,本发明实施例的方案中,躯干颈部动作信息是根据脸部姿态信息得到的,采用这样的方案,计算量更小,能够在保证动画效果的前提下,提高动画生成的效率。
进一步地,本发明实施例的方案中,人体姿态模型为时序模型,可以将当前帧用户对应的脸部姿态信息和关联姿态信息输入至人脸姿态模型,以得到当前帧用户对应的躯干颈部动作信息。采用这样的方案,有利于避免单帧图像中用户的脸部姿态发生抖动导致躯干颈部动作信息的不准确,可以使用户的躯干颈部动作信息描述的躯干颈部姿态更加连贯、流畅,从而使得虚拟角色的动画更加连贯流程,无需另做平滑处理。
附图说明
图1是本发明实施例中一种虚拟角色的动画生成方法的应用场景在第一视角下的示意图;
图2是本发明实施例中一种虚拟角色的动画生成方法的应用场景在第二视角下的示意图;
图3是本发明实施例中一种虚拟角色的动画生成方法的流程示意图;
图4是图3中步骤S302的一种具体实施方式的部分流程示意图;
图5是图3中步骤S302的另一种具体实施方式的部分流程示意图;
图6是图3中步骤S303的一种具体实施方式的部分流程示意图;
图7是本发明实施例中一种虚拟角色的动画生成方法的应用场景在第一视角下的另一示意图;
图8是本发明实施例中一种虚拟角色的动画生成装置的结构示意图。
具体实施方式
如背景技术所述,亟需一种通用性更好、成本更低、用户体验更优的虚拟角色的动画生成方法。
为了解决上述技术问题,本发明实施例提供一种虚拟角色的动画生成方法,在本发明实施例的方案中,在本发明实施例的方案中,获取当前帧图像,并根据当前帧图像确定当前帧用户对应的状态信息,由于状态信息包括人脸信息、人体姿态信息和眼神方向信息,因此,根据状态信息得到的虚拟角色的动画数据可以具有与用户对应的状态信息相同的语义。采用这样的方案,用户无需穿着特定的动作捕捉服装,也无需佩戴特定的头盔,仅根据单帧图像即可得到用户的表情、脸部姿态、动作姿态和眼神等信息,然后再根据状态信息进行重定向 处理,得到虚拟角色的动画,因此本发明实施例提供的方案通用性更好、成本更低且用户体验更好。
为使本发明的上述目的、特征和有益效果能够更为明显易懂,下面结合附图对本发明的具体实施例做详细的说明。
参照图1,图1是本发明实施例中一种虚拟角色的动画生成方法的应用场景在第一视角下的示意图,参照图2,图2是本发明实施例中一种虚拟角色的动画生成方法的应用场景在第二视角下的示意图,参照图7,图7是本发明实施例中一种虚拟角色的动画生成方法的应用场景在第一视角下的另一示意图。其中,第一视角不同于第二视角。下面结合图1、图2和图7,对本发明实施例中的虚拟角色的动画生成方法的应用场景进行非限制性的说明。
如图1和图2所示,本实施例的方案中,可以采用摄像头11对用户10进行拍摄。
具体而言,用户10为摄像头11的拍摄对象,用户10是真人演员。需要说明的是,与现有技术相比,本发明实施例的方案中,用户10无需穿着动作捕捉服装,无需佩戴表情捕捉设备和眼神捕捉设备等。
其中,摄像头11可以是各种现有的适当的拍照设备,本实施例对于摄像头11的类型和数量并不进行限制。
在一个具体的例子中,摄像头11的数量为单个,摄像头11可以为RGB(R为红色RED的缩写,G为绿色Green的缩写,B为蓝色Blue的缩写)摄像头;也可以为RGBD(D为深度图Depth的缩写)摄像头。也即,摄像头11拍摄得到的图像可以为RGB图像,也可以为RGBD图像等,但并不限于此。
进一步地,摄像头11对用户10进行拍摄,可以得到用户10对应的视频流数据,用户10对应的视频流数据可以包括多帧图像,每帧图像具有时间码,每帧图像中可以包括用户10的影像。
在一个具体的例子中,用户10和摄像头11之间的距离小于第一预设距离阈值,图像可以包括用户10脸部的影像,还可以包括用户10颈部和肩膀的影像。换言之,用户10与摄像头11之间的距离通常较小,因此,图像可以不包含用户10全身的影像。需要说明的是,本发明实施例中的摄像头11并非是设置在用户10的佩戴设备上,用户10与摄像头11之间的距离大于第二预设距离阈值,第二预设距离阈值通常远远小于第一预设距离阈值。
进一步地,摄像头11可以和终端12连接,终端12可以是现有的各种具有数据接收和数据处理功能的设备,摄像头11可以将采集到的用户10对应的视频流数据发送至终端12。其中,所述终端12可以是手机、平板电脑和计算机等,但并不限于此。需要说明的是,本实施例对于摄像头11和终端12之间的连接方式并不进行限制,可以是采用有线连接,也可以是无线连接(例如,蓝牙连接、局域网连接等)。更具体地,摄像头11可以是设置在终端12上的摄像头,例如,可以是手机上的摄像头、电脑上的摄像头等。
进一步地,终端12可以按照时间码的先后顺序,依次对摄像头11采集到的用户10对应的视频流数据中的每帧图像进行处理和分析,以得到用户10对应的状态信息。更进一步地,可以根据每帧图像中用户10对应的状态信息进行重定向处理,以得到该帧图像对应的虚拟角色13的动画数据,得到的动画数据与图像具有相同的时间码。
其中,虚拟角色13可以包括虚拟人、也可以包括虚拟动物、虚拟植物等具有脸部和身体的对象。虚拟角色13可以是三维的,也可以是二维的,本发明实施例对此并不进行限制。
关于根据每帧图像生成虚拟角色13的动画数据的更多内容将在下文中具体描述。
需要说明的是,对于用户10对应的视频流数据中的每帧图像,根据该帧图像得到时间码相同的动画数据的具体流程相同,下文仅以 其中一帧(也即,当前帧图像)为例对生成对应的虚拟角色13的动画数据的具体流程进行详细说明。
参照图3,图3是本发明实施例中一种虚拟角色的动画生成方法的流程示意图。所述方法可以由终端执行,所述终端可以是各种具有数据接收和处理能力的终端设备,例如,可以是手机、计算机和平板电脑等等,本发明实施例对此并不进行限制。在一个具体的例子中,终端可以是图1中示出的终端12,但并不限于此。图3示出的虚拟角色的动画生成方法可以包括以下步骤:
步骤S301:获取当前帧图像,所述当前帧图像包括用户的影像;
步骤S302:根据所述当前帧图像,确定当前帧用户对应的状态信息,所述状态信息包括:人脸信息、人体姿态信息和眼神方向信息,所述人脸信息包括脸部姿态信息和脸部表情信息;
步骤S303:根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据,其中,所述动画数据和所述当前帧图像的时间码相同,所述动画数据包括:面部动画数据、身体动画数据和眼球动画数据。
可以理解的是,在具体实施中,所述方法可以采用软件程序的方式实现,该软件程序运行于芯片或芯片模组内部集成的处理器中;或者,该方法可以采用硬件或者软硬结合的方式来实现。
在步骤S301的具体实施中,可以获取当前帧图像,当前帧图像可以是由摄像头对用户拍摄得到的。更具体地,当前帧图像可以是用户对应的视频流数据中当前待处理的图像,当前帧图像的时间码可以记为当前时刻。其中,用户对应的视频流数据可以是采用摄像头对用户进行拍摄得到的。在一个具体的例子中,用户对应的视频流数据是由单个摄像头采集得到的,所述摄像头可以是RGB摄像头或者可以是RGBD摄像头,但并不限于此。
进一步地,当前帧图像包括用户的影像。具体而言,当前帧图像 可以包括用户脸部的影像,还可以包括用户颈部和肩膀的影像,还可以包括至少一部分手臂的图像等,但并不限于此。
关于当前帧图像的更多内容可以参照上文关于图1和图2的相关描述,在此不再赘述。
在步骤S302的具体实施中,可以根据当前帧图像确定当前帧用户对应的状态信息,状态信息可以包括:人脸信息、人体姿态信息和眼神方向信息。其中,用户对应的状态信息可以是根据当前帧图像对用户进行还原重建得到的。
具体而言,所述人脸信息包括脸部姿态信息和脸部表情信息,其中,脸部姿态信息用于描述用户脸部的位置和朝向,更具体地,用户脸部的位置和朝向是指用户脸部在三维空间中的位置和朝向。例如,用户脸部的位置可以是用户的脸部相对于摄像头的位置,用户脸部的朝向可以是相对于摄像头的朝向。
进一步地,人脸信息还可以包括:身份ID信息,所述身份ID信息用于描述用户的脸部形状和五官分布。
进一步地,脸部表情信息可以用于描述用户的表情。在一个具体的例子中,脸部表情信息可以是多个混合形状(Blend shapes)的权重,其中,多个混合形状可以是预先设置的;脸部表情信息也可以是对多个混合形状进行主成分分析得到的多个主成分向量的权重;还可以是三维特征点等,但并不限于此。
进一步地,人体姿态信息可以用于描述用户身体的动作姿态。在一个具体的例子中,人体姿态信息可以是关节角数据,更具体地,关节角数据为关节的角度。
进一步地,眼神方向信息可以用于描述用户的眼神方向。具体而言,眼球中心位置指向三维瞳孔中心位置的方向为眼神方向。更具体地,眼球中心位置为眼球的中心点的位置,三维瞳孔中心位置为瞳孔的中心点的位置。由于虹膜的中心位置与三维瞳孔中心位置是重合 的,虹膜在眼球上的具体位置是根据三维瞳孔中心位置确定的,因此虹膜会跟着三维瞳孔中心位置的变化而移动,但同一个用户的虹膜尺寸可以是固定的。其中,虹膜尺寸为虹膜的大小,虹膜尺寸可以用于确定眼球中虹膜的覆盖面积。
在一个具体的例子中,眼神方向信息可以为三维瞳孔中心位置。更具体地,眼神方向信息可以为三维瞳孔中心位置在以眼球中心位置为坐标原点的球坐标系中的天顶角和方位角。具体而言,三维瞳孔中心位置可以采用球坐标(r,θ,
Figure PCTCN2022138386-appb-000001
)的方式进行表示,其中,r为三维眼球的半径,θ为天顶角,
Figure PCTCN2022138386-appb-000002
为方位角。天顶角θ和方位角
Figure PCTCN2022138386-appb-000003
来可以表征眼球中心位置与三维瞳孔中心位置连接产生的射线方向,因此可以采用三维瞳孔中心位置的球坐标中的天顶角θ和方位角
Figure PCTCN2022138386-appb-000004
来表示眼神方向。
参照图4,图4是图3中步骤S302的一种具体实施方式的部分流程示意图。通过图4示出的步骤,可以得到当前帧用户对应的人脸信息,更具体地,可以得到当前帧用户对应的脸部姿态信息和脸部表情信息。图4示出的步骤S302可以包括以下步骤:
步骤S401:根据当前帧用户对应的初始人脸信息,生成三维人脸模型;
步骤S402:根据所述三维人脸模型,确定预估人脸特征信息,并计算所述预估人脸特征信息和当前帧的目标人脸特征信息之间的第一差异;
步骤S403:判断是否满足第一预设条件;如果是,则执行步骤S404,否则执行步骤S405;
步骤S404:将所述初始人脸信息作为当前帧用户对应的人脸信息;
步骤S405:更新所述初始人脸信息,将更新后的初始人脸信息作为当前帧用户对应的初始人脸信息;并返回至步骤S401,直至满 足第一预设条件。
首次执行步骤S401时,当前帧用户对应的初始人脸信息可以是预先设置的默认值,也可以是上一帧用户对应的人脸信息。
具体而言,在首次执行步骤S401时,初始人脸信息中的身份ID信息的默认值可以是多个样本用户的身份ID信息的平均值。由于根据多个样本用户的身份ID信息计算得到的平均值具有通用性,因此可以作为当前帧图像中用户的初始人脸信息中身份ID信息的默认值,脸部姿态信息的默认值可以是用户预先设置的位置和朝向,脸部表情信息的默认值可以是用户在中性表情下的脸部表情信息,该默认值可以是预先采集的。需要说明的是,本发明实施例中的“用户”是指当前帧图像中的用户,而“多个样本用户”是指使用摄像头采集视频流数据之前进行训练数据采集等准备工作时涉及的用户或表演者。
在另一个实施例中,首次执行步骤S401时,也可以将上一帧用户对应的人脸信息作为当前帧用户对应的初始人脸信息,有利于减少计算量,并且使得到的虚拟角色的面部动画数据更加平滑,无需再另外做平滑处理。
再次执行步骤S401时,也即,从步骤S405返回至步骤S401时,当前帧用户对应的初始人脸信息可以为更新后的初始人脸信息。
进一步地,可以根据当前帧用户对应的初始人脸信息,合成得到三维人脸模型。换言之,步骤S401中的三维人脸模型是根据当前帧用户对应的初始人脸信息得到的,并非是根据当前帧图像得到的。需要说明的是,对于根据当前帧用户对应的初始人脸信息(身份ID信息、脸部姿态信息和脸部表情信息)合成三维人脸模型的具体方法,本发明实施例并不进行限制,可以是现有的各种能够合成三维人脸模型的方法。
在步骤S402的具体实施中,可以根据步骤S401中得到的三维人脸模型,计算预估人脸特征信息。所述预估人脸特征信息为根据三 维人脸模型得到的人脸特征信息,预估人脸特征信息可以包括:二维投影点坐标信息和纹理特征点坐标信息。
具体而言,可以从三维人脸模型中提取多个三维特征点,然后将多个三维特征点投影至二维平面,以得到多个二维投影点。其中,所述二维平面是指摄像头的图像坐标系的平面。
更具体地,根据多个预先定义的顶点索引,从三维人脸模型中提取多个顶点,以得到多个三维特征点。也即,三维特征点是基于预先定义的顶点索引在三维人脸模型上确定的顶点。其中,每个顶点索引用于指代特定的脸部部位,不同的顶点索引指代的脸部部位是不同的。例如,顶点索引3780用于指代鼻尖点等。三维人脸模型可以包括多个顶点,可以将多个顶点索引对应的顶点提取出来,即可得到多个三维特征点。
进一步地,可以将多个三维特征点投影至二维平面,以将每个三维特征点的三维坐标转化为该三维特征点对应的二维投影点的二维坐标。由此,可以得到预估人脸特征信息,也即,预估人脸特征信息可以包括多个二维投影点的二维坐标。
进一步地,可以计算预估人脸特征信息和目标人脸特征信息之间的第一差异,其中,目标人脸特征信息是根据当前帧图像检测得到的。目标人脸特征信息可以包括:二维特征点的坐标信息,其中,二维特征点为当前帧图像中具有特定语义信息的点。具体而言,可以采用机器学习方法对当前帧图像进行检测,以检测出多个二维特征点。其中,语义信息是预先定义的,所述语义信息可以用于描述二维特征点所对应的脸部部位。例如,64号二维特征点的语义信息为:鼻尖点。更具体地,多个二维特征点的语义信息描述的脸部部位与多个顶点索引指代的脸部部位是相同的。由此,二维特征点和二维映射点可以是一一对应的。
进一步地,预估人脸特征信息还可以包括纹理特征点坐标信息,对应地,目标人脸特征信息还可以包括与纹理特征点对应的像素点的 坐标。具体而言,根据当前帧图像中的像素点确定该像素点对应的二维纹理坐标(u,v),可以根据预先定义的纹理映射关系确定该像素点在三维人脸模型上对应的三维纹理点,也即,与上文中三维特征点不同,三维纹理点是根据预先定义的纹理映射关系在三维人脸模型上确定的顶点。
进一步地,可以将多个三维纹理点投影至二维平面,以得到对应的纹理特征点的二维坐标。更进一步地,可以计算像素点和对应的纹理特征点之间的坐标差异。
由上,可以根据像素点和对应的纹理特征点之间的坐标差异,以及二维特征点和二维投影点之间的坐标差异,计算得到第一差异。
需要说明的是,本发明实施例对于检测目标人脸特征信息和确定预估人脸特征信息的先后顺序并不进行限制。
进一步地,可以计算预估人脸特征信息和目标人脸特征信息之间的第一差异。更具体地,可以计算多个二维投影点和多个二维特征点之间的坐标差异。
在步骤S403的具体实施中,可以判断是否满足第一预设条件,其中,第一预设条件可以包括:第一差异不大于第一预设阈值,和/或,更新初始人脸信息的次数达到第二预设阈值。
进一步地,如果满足第一预设条件,则可以执行步骤S404,也即,可以将当前帧用户对应的初始人脸信息作为当前帧用户对应的人脸信息。换言之,如果满足第一预设条件,则可以确定步骤S401中的三维人脸模型符合用户真实的人脸,换言之,步骤S401中的人脸信息能够准确、真实地描述用户在当前帧图像中的脸部姿态和脸部表情等。
进一步地,如果不满足第一预设条件,可以执行步骤S405,也即更新初始人脸信息,根据更新后的初始人脸信息,继续执行步骤S401-S403,直到满足第一预设条件。
在一个非限制性的例子中,每次更新初始人脸信息时可以仅更新脸部姿态信息和脸部表情信息,也即,不更新用户的身份ID信息。换言之,用户的身份ID信息可以是预先确定的。由于本实施例的应用场景中用户通常是固定的,也即,在录制视频的过程中,摄像头拍摄的对象通常是同一人,因此,用户的身份ID信息可以是固定的,也即,可以采用预先确定的身份ID信息。采用此方案,可以简化人脸信息的计算过程,有利于提高动画生成的效率。
在具体实施中,可以在获取用户对应的视频流数据之前,确定用户的身份ID信息。具体而言,可以获取多张身份图像,每张身份图像包括用户的影像,其中,每张身份图像中用户的表情为默认表情且多张身份图像中的用户的脸部姿态(也即,脸部的位置和/或朝向)可以是不同的。
进一步地,可以根据多张身份图像对预设的初始三维人脸模型进行迭代优化,以得到用户的身份ID参数,在后续生成虚拟角色的动画过程中可以采用根据多张身份图像得到的身份ID参数。其中,预设的初始三维人脸模型是指根据身份ID参数、脸部姿态信息和脸部表情信息均采用预先设置的默认值构建的三维人脸模型,也即,预设的初始三维人脸模型是未经任何优化和调整的初始模型,默认表情可以为中性表情。
继续参考图3,在步骤S302的具体实施中,还可以确定当前帧用户对应的人体姿态信息。具体而言,人体姿态信息可以是通过图像直接构建三维人体模型得到的,也可以是根据脸部姿态信息计算得到的,本实施例对此并不进行限制。
具体而言,人体姿态信息可以包括:躯干颈部动作信息。其中,躯干颈部动作信息用于描述所述用户的躯干和颈部的动作姿态。
更具体地,躯干颈部动作信息可以包括多个第一预设关节的关节角数据,所述第一预设关节为位于躯干和颈部的关节。
在一个具体的例子中,当前帧用户对应的躯干颈部动作信息是根据当前帧用户对应的脸部姿态信息计算得到的。具体而言,可以将当前帧用户对应的脸部姿态信息输入至人体姿态匹配模型,以得到当前帧用户对应的躯干颈部动作信息。
更具体地,所述人体姿态匹配模型可以是根据第一训练数据对第一预设模型进行训练得到的,其中,第一预设模型可以是现有的各种具有学习能力的模型。所述第一训练数据可以包括多对第一样本信息,每对第一样本信息包括样本用户对应的脸部姿态信息和所述样本用户对应的躯干颈部动作信息。更具体地,每对第一样本信息是对样本用户进行动作捕捉得到的,属于同一对第一样本信息的脸部姿态信息和躯干颈部动作信息之间具有对应关系。其中,多对第一样本信息可以对同一个样本用户进行动作捕捉得到的,也可以是对多个第一样本用户进行动作捕捉得到的。需要说明的是,本发明实施例中的样本用户为真人。
由于第一训练数据来源于真人动作捕捉,采用第一训练数据训练得到的人体姿态匹配模型能够学习到真人脸部的位置、朝向和真人躯干、颈部的姿态之间的关联关系。因此,人体姿态匹配模型输出的躯干颈部动作信息是真实、自然的,也即,输出的躯干颈部动作信息和输入的脸部姿态信息所呈现的用户整体的姿态是真实、自然的。采用上述方案得到躯干颈部动作信息,计算量更小,能够在保证动画效果的前提下,提高动画生成的效率。
在另一个具体的例子中,当前帧用户对应的躯干颈部动作信息是根据当前帧用户对应的脸部姿态信息和关联姿态信息计算得到的。更具体地,人体姿态匹配模型的输入为当前帧用户对应的脸部姿态信息和当前帧用户对应的关联姿态信息,对应的输出为当前帧用户对应的躯干颈部动作信息。
其中,所述关联姿态信息包括:关联图像中用户对应的脸部姿态信息和/或躯干颈部动作信息,所述关联图像为所述当前帧图像之前 的连续多帧图像和/或所述当前帧图像之后的连续多帧图像。
更具体地,将当前帧图像的时间码记为t1,关联姿态信息可以包括时间码为t1-T至t1-1的连续多张图像中用户对应的脸部姿态信息,其中,T为正整数。
进一步地,关联姿态信息还可以包括时间码为t1-T至t1-1的连续多张图像中用户对应的躯干颈部动作信息。例如,T=30时,关联姿态信息可以包括与当前帧图像相邻且在当前帧图像之前的30帧图像中用户对应的脸部姿态信息和躯干颈部动作信息。
进一步地,关联姿态信息还可以包括时间码为t1+1至t1+T的图像中用户对应的脸部姿态信息,还可以包括时间码为t1+1至t1+T的图像中用户对应的躯干颈部动作信息。例如,T=30时,关联姿态信息还可以包括与当前帧图像相邻且在当前帧图像之后的30帧图像中用户对应的脸部姿态信息和躯干颈部动作信息。
由上,本实施例的方案中,人体姿态匹配模型可以为时序模型,可以将当前帧用户对应的脸部姿态信息和关联姿态信息输入至人体姿态匹配模型,以得到当前帧用户对应的躯干颈部动作信息。采用这样的方案,有利于避免单帧图像中用户的脸部姿态发生抖动导致躯干颈部动作信息的不准确,可以使躯干颈部动作信息描述的躯干颈部姿态更加连贯、流畅,从而使得虚拟角色的动画更加连贯流程,无需另做平滑处理。
在另一个实施例中,人体姿态信息还可以包括:四肢动作信息,四肢动作信息可以用于描述所述用户四肢的动作姿态。例如,四肢动作信息可以用于描述用户手臂的动作。更具体地,四肢动作信息可以包括多个第二预设关节的关节角数据,所述多个第二预设关节为位于四肢的关节,更具体地,多个第二预设关节可以包括手臂的关节。进一步地,四肢动作信息可以是预先设置的默认值,例如,四肢动作信息的默认值表征的手臂动作可以是自然下垂等,但并不限于此。四肢动作信息还可以是由用户输入的。
进一步地,可以将对所述躯干颈部动作信息和所述四肢动作信息进行融合处理,以得到当前帧的人体姿态信息。在对所述躯干颈部动作信息和所述四肢动作信息进行融合处理之前,可以先判断躯干颈部动作信息描述的躯干和颈部的动作是否满足动作条件,如果否,则调整所述躯干颈部动作信息,以使调整后的躯干颈部动作信息描述的躯干和颈部的动作满足所述动作条件。其中,所述动作条件是根据四肢动作信息确定的。
具体而言,动作条件是根据四肢动作信息确定的,当躯干颈部动作信息满足动作条件时,可以确定四肢动作信息描述的四肢的动作姿态和躯干颈部动作信息描述的躯干、颈部的动作姿态所呈现的整体的身体姿态是合理、真实的。如果躯干颈部动作信息不满足动作条件,则可以确定四肢动作信息描述的四肢的动作和躯干颈部动作信息描述的躯干、颈部的动作所呈现的整体的身体姿态是不合理的。换言之,动作条件为与四肢动作信息描述的四肢的动作姿态匹配的躯干和颈部的动作姿态,躯干颈部动作信息不满足动作条件,也即,躯干颈部动作信息与动作条件描述的躯干颈部的动作姿态不一致。
进一步地,还可以确定当前帧用户对应的眼神方向信息。参照图5,图5是步骤S302中另一种具体实施方式的部分流程示意图。通过图5示出的步骤,可以得到用户当前帧用户对应的眼神方向信息。图5示出的步骤S302可以包括以下步骤:
步骤S501:根据当前帧用户对应的眼部信息和预估瞳孔中心位置,确定三维眼球模型;
步骤S502:根据所述三维眼球模型,确定预估眼部特征信息,并计算所述预估眼部特征信息和目标眼部特征信息之间的第二差异;
步骤S503:判断是否满足第二预设条件;如果是,则执行步骤S504,否则,执行步骤S505;
步骤S504:将所述预估瞳孔中心位置作为当前帧用户对应的三 维瞳孔中心位置;
步骤S505:更新所述预估瞳孔中心位置,将更新后的预估瞳孔中心位置作为当前帧用户对应的预估瞳孔中心位置;并返回至步骤S501,直至满足第二预设条件。
其中,眼部信息包括眼球中心位置、眼球半径和虹膜尺寸。可以理解的是,眼部信息为每个人眼球的个性化数据,不同用户的眼部信息的具体取值是不同的,同一个用户的眼部信息的具体取值可以是固定的,但同一个用户的眼神方向信息可以是不同的。
需要说明的是,对于两个眼睛,需要分别执行图5示出的步骤,以分别求解当前帧两个眼睛的眼神方向。对于每个眼睛的处理流程是相同的,下文仅就确定其中一个眼睛的眼神方向信息的具体过程进行详细说明。
在首次执行步骤S501之前,可以先根据当前帧图像判断眼睛是否处于闭眼状态,如果是,则可以将上一时刻该眼睛的眼神方向信息作为当前帧用户对应的眼神方向信息,也即,无需再针对该眼睛执行图5示出的步骤。
进一步地,首次执行步骤S501时,当前帧用户对应的眼部信息可以是预先设置的默认值,也可以是上一帧用户对应的眼部信息。当前帧用户对应的预估瞳孔中心位置可以是预先设置的默认值,也可以是上一帧用户对应的三维瞳孔中心位置。
其中,首次执行步骤S501时,眼球中心位置的默认值可以是多个样本用户的眼球中心位置的平均值,类似地,眼球半径的默认值可以是多个样本用户的眼球半径的平均值,虹膜尺寸的默认值可以是多个样本用户的虹膜尺寸的平均值。预估瞳孔中心位置的默认值可以是用户目视前方时瞳孔的位置。
再次执行步骤S501时,也即,从步骤S505返回至步骤S501时,当前帧用户对应的预估瞳孔中心位置可以是更新后的预估瞳孔中心 位置。
进一步地,可以根据当前帧用户对应的眼部信息和预估瞳孔中心位置,合成得到三维眼球模型。其中,对于根据眼部信息和预估瞳孔中心位置合成三维眼球模型的具体方法,本发明实施例对此并不进行限制,可以是现有的各种能够合成三维眼球模型的方法。
在另一个实施例中,首次执行步骤S501时,也可以将上一帧用户对应的眼部信息作为当前帧用户对应的眼部信息,将上一帧用户对应的三维瞳孔中心位置作为当前帧的预估瞳孔中心位置,不仅可以减少计算量,还可以使得虚拟角色的眼球动画数据更加平滑,无需另做平滑处理。
在步骤S502的具体实施中,可以根据步骤S501中得到的三维眼球模型,确定预估眼部特征信息,并可以计算预估眼部特征信息和目标眼部特征信息之间的第二差异,其中,目标眼部特征信息是根据当前帧图像检测得到的。
具体而言,眼部特征信息可以包括二维瞳孔中心位置和虹膜掩膜位置等。其中,二维瞳孔中心位置是指瞳孔在二维平面中的位置,虹膜掩膜位置可以是指虹膜的掩膜在二维平面中的位置。
进一步地,预估眼部特征信息可以是将根据三维眼球模型的瞳孔位置以及虹膜掩膜位置投影到二维平面得到的,目标眼部特征信息可以是采用机器学习方法对当前帧图像进行检测得到的。
在步骤S503的具体实施中,可以判断是否满足第二预设条件,其中,第二预设条件可以包括:第二差异不大于第三预设阈值,和/或,更新预估瞳孔中心位置的次数达到第四预设阈值。其中,第三预设阈值和第一预设阈值可以是相同的,也可以是不同的,第四预设阈值和第二预设阈值可以是相同的,也可以是不同的。
进一步地,如果满足第二预设条件,则可以执行步骤S504,也即,可以将当前帧用户对应的预估瞳孔中心位置作为当前帧用户对应 的三维瞳孔中心位置。
进一步地,如果不满足第二预设条件,可以执行步骤S505,也即,更新预估瞳孔中心位置,根据更新后的预估瞳孔中心位置息,继续执行步骤S501-S503,直至满足第二预设条件。
需要说明的是,与身份ID信息类似,本实施例中考虑到用户通常是固定的,因此,眼部信息可以是固定的,也即,眼部信息可以是预先确定的,每次仅更新预估瞳孔中心位置,可以简化计算过程,提高动画生成的效率。
在具体实施中,在获取用户对应的视频流数据之前,确定用户的眼部信息。具体而言,可以根据眼部图像,眼部图像包括用户眼部的影像且眼部图像中用户的表情为中性表情且眼神方向平视前方。
进一步地,可以根据眼部图像确定多个三维眼皮特征点,然后计算多个三维眼皮特征点的三维位置的平均值,并在平均值的基础上加上预设的三维偏移量,以得到眼球中心位置。其中,偏移量的偏移方向朝向眼球内部。
进一步地,还可以根据眼部图像对预设的初始三维眼球模型进行迭代优化,以得到虹膜尺寸。在后续生成虚拟角色的动画过程中,可以采用根据眼部图像得到的眼球中心位置和虹膜尺寸,眼球半径可以采用多个样本用户的眼球半径的平均值。其中,预设的初始三维眼球模型是指眼球信息和三维瞳孔中心位置均采用预设的默认值构建的三维眼球模型。
在其他实施例中,每次执行步骤S505时,还可以更新眼部信息,本发明实施例对此并不进行限制。
进一步地,分别得到两个眼睛的三维瞳孔中心位置后,可以分别确定两个眼睛的眼神方向信息,并判断两个眼睛的眼神方向信息是否满足互动关系。若两只眼睛的眼神方向不满足互动关系时,则可以确定计算得到的眼神方向信息有误,则将上一帧用户对应的眼神方向信 息作为当前帧用户对应的眼神方向信息。其中,互动关系是指左眼的眼神方向和右眼的眼神方向能够被同一个人同时做出来。
由上,可以得到当前帧用户对应的人脸信息、眼神方向信息和人体姿态信息。
继续参考图1,在步骤S303的具体实施中,可以根据当前帧用户对应的状态信息进行重定向处理,以得到当前帧虚拟角色的动画数据。其中,虚拟角色的动画数据可以包括用于生成虚拟角色动画的控制器数据,具体表现形式为数字化向量的序列。可将所述动画数据转化为UE或Unity3d可以接收的数据形式(多个混合形状(Blend shapes)的权重和关节角数据),输入渲染引擎,如UE或Unity3d,即可驱动虚拟角色的相应部位做出相应的动作。
参照图6,图6是图3中步骤S303的一种具体实施方式的部分流程示意图。通过图6示出的步骤,可以根据当前帧用户对应的人体姿态信息确定当前帧虚拟角色的身体动画数据。图6示出的步骤S303可以包括以下步骤:
步骤S601:生成过渡骨骼模型;
步骤S602:根据第一骨骼模型的关节角数据和所述第一骨骼模型,确定所述多个预设关键关节的位置;
步骤S603:根据所述多个预设关键关节的位置和所述过渡骨骼模型,确定所述过渡骨骼模型的关节角数据,以得到所述虚拟角色的身体动画数据。
考虑到用户的骨骼和虚拟角色的骨骼在定义上(骨骼数量、骨骼默认朝向和关节位置)是不一致的,所以无法将用户的身体姿态信息采用直传的方式给到虚拟角色。具体而言,一方面,由于用于描述用户骨骼的骨骼模型中关节的旋转轴的默认朝向和虚拟角色的骨骼模型中关节的旋转轴的默认朝向是不同的,因此,关节角的定义也不同,无法直接传递关节角数据。另一方面,由于用于描述用户骨骼的 骨骼模型中关节位置和虚拟角色的骨骼模型中关节的位置不同,采用逆运动学的方式进行传递也会导致虚拟角色的姿态出现问题。
本发明实施例的方案中,可以根据第一骨骼模型和第二骨骼模型生成过渡骨骼模型。其中,第一骨骼模型为与用户对应的骨骼模型,更具体地,与用户对应的骨骼模型为可以用于描述用户骨骼的骨骼模型。第一骨骼模型可以是根据图像进行还原重建得到的,也可以是预先设置的平均骨骼。其中,如果采用平均骨骼,则可以省略获取用户骨骼模型的步骤。进一步地,第二骨骼模型为虚拟角色的骨骼模型。
具体而言,过渡骨骼模型的骨骼形态和第二骨骼模型的骨骼形态是相同的,所述骨骼形态包括骨骼的数量和每个关节的旋转轴的默认朝向,也即,过渡骨骼模型的骨骼数量和第二骨骼模型的骨骼数量是相同的,过渡骨骼模型中的骨骼和第二骨骼模型中的骨骼一一对应,过渡骨骼模型中每个关节的旋转轴的默认朝向和第二骨骼模型中对应的关节的旋转轴的默认朝向也是相同的。
进一步地,预先定义有多个预设关键关节,预设关键关节为预设定义的关节。更具体地,预设关键关节可以选自上文中的第一预设关节和第二预设关节。可以获取第一骨骼模型中所述多个预设关键关节的位置,并将过渡骨骼模型中所述多个预设关键关节的位置分别设置为所述多个预设关键关节在第一骨骼模型中的位置,由此可以得到过渡骨骼模型。更具体地,过渡骨骼模型中每个预设关键关节的位置和第一骨骼模型中该预设关键关节的位置相同。
在步骤S602的具体实施中,可以根据第一骨骼模型的关节角数据和第一骨骼模型,计算得到第一骨骼模型中多个预设关键关节的位置。由于过渡骨骼模型中每个预设关键关节的位置和该预设关键关节在第一骨骼模型中的位置相同,因此,可以得到过渡骨骼模型中多个预设关键关节的位置。
进一步地,可以根据多个预设关键关节的位置,计算确定过渡骨骼模型的关节角数据。由于过渡骨骼模型的骨骼形态和第二骨骼模型 的骨骼形态是相同的,因此可以采用直传的方式得到虚拟角色的关节角数据。换言之,可以直接将过渡骨骼模型的关节角数据作为虚拟角色的关节角数据。进一步地,所述关节角数据可以作为虚拟角色的身体动画数据。由此,得到的身体动画数据与人体姿态信息可以具有相似的语义。
进一步地,还可以进一步优化虚拟角色的身体动画数据。具体而言,可以判断身体动画数据是否满足预设的姿态约束条件,如果不满足,则可以对身体动画数据进行调整,以得到当前帧的身体动画数据。
在具体实施中,人体姿态信息包括躯干颈部动作信息,可以根据躯干颈部动作信息进行重定向处理,以得到虚拟角色的躯干颈部动画数据,躯干颈部动画数据用于生成虚拟角色的躯干和颈部的动作。具体而言,可以根据用户对应的躯干颈部的关节角数据,重定向得到虚拟角色的躯干颈部的关节角数据。
进一步地,可以获取虚拟角色的四肢动画数据,四肢动画数据用于生成虚拟角色的四肢的动作。其中,四肢动画数据可以是预先设置的。在具体实施中,可以根据用户的选择从多个预设的四肢动画数据中确定当前帧对应的虚拟角色的四肢动画数据。
进一步地,可以对四肢动画数据和躯干颈部动画数据进行融合处理,以得到虚拟角色的身体动画数据。在具体实施中,在进行融合处理之前,可以先判断躯干颈部动画数据对应的动作是否与四肢动画数据对应的动作匹配,如果否,可以调整躯干颈部动画数据,以使调整后的躯干颈部动画数据对应的动作与四肢动画数据对应的动作匹配。其中,躯干颈部动画数据对应的动作与四肢动画数据对应的动作匹配时,生成的虚拟角色的整体的身体姿态是合理、真实的。例如,四肢动画数据对应的动作为挥手再见,躯干颈部动画数据对应的动作为躯干呈半躺姿态,则躯干颈部动画数据对应的动作与四肢动画数据对应的动作不匹配。调整后的躯干颈部动画数据对应的动作为躯干呈直立姿态,则调整后的躯干颈部动画数据对应的动作和四肢动画数据对应 的动作是匹配的。采用上述的方案,可以对躯干颈部动画数据进行微调,以使躯干颈部动画数据和四肢动画数据生成的虚拟角色的整体身体姿态更加合理、真实自然。
更多关于对四肢动画数据和躯干颈部动画数据进行融合处理的内容可以参照上文关于对躯干颈部动作信息和四肢动作信息进行融合处理的相关描述,在此不再赘述。
在另一个实施例中,人体姿态信息为上述躯干颈部动作信息和四肢动作信息融合得到的当前帧的人体姿态信息,则可以根据当前帧的人体姿态信息进行重定向处理,以得到当前帧对应的虚拟角色的身体动画数据。
继续参考图1,在步骤S303的具体实施中,还可以根据当前帧用户对应的脸部表情信息进行重定向处理,以得到当前帧虚拟角色的面部动画数据。
在一个具体的例子中,脸部表情信息为多个混合形状权重,或者是对多个混合形状进行主成分分析得到的多个主成分向量的权重时,如果虚拟角色也预先定义有相同数量和相同语义的混合形状,则可以将权重直接传递给虚拟角色,也即,可以将脸部表情信息直接作为虚拟角色的面部动画数据。
在另一个具体的例子中,可以将脸部表情信息输入至表情映射模型,以得到面部动画数据。
具体地,脸部表情信息为多个混合形状(Blendshapes)的权重,或者是对多个混合形状进行主成分分析得到的多个主成分向量的权重,或者为三维特征点。即多个混合形状(Blendshapes)的权重,或者是对多个混合形状进行主成分分析得到的多个主成分向量的权重,或者为三维特征点输入至表情映射模型,以得到面部动画数据。
具体而言,所述表情映射模型是预先采用第二训练数据对第二预设模型训练得到的,本发明实施例对于第二预设模型的类型和结构并 不进行限制,可以是现有的各种具有学习能力的模型。
进一步地,用于训练的第二训练数据可以包括多组第二样本信息,每组第二样本信息包括:多个样本用户在预设表情下的脸部表情信息和虚拟角色在该预设表情下的面部动画数据。其中,不同组的第二样本信息对应不同的预设表情。其中,多个样本在预设表情下的脸部表情信息可以是预先采集的,虚拟角色在预设表情下的面部动画数据是预先由动画师设置的。由此,采用第二训练数据训练得到的表情映射模型能够学习到脸部表情信息和虚拟角色的面部动画数据之间的映射关系。因此,表情映射模型输出的面部动画数据与用户对应的脸部表情信息可以具有相似的语义。
进一步地,由于第二样本信息包括多个样本用户在预设表情下的脸部表情信息,因此,表情映射模型具有通用性,可以用于根据任一用户对应的脸部表情信息确定虚拟角色的面部动画数据。
进一步地,上文得到的面部动画数据中可以包括嘴部动画数据。在本发明的一些非限制性的例子中,还可以采用下文描述的方法确定嘴部动画数据,并将通过下文确定的嘴部动画数据覆盖上文得到的面部动画数据中的嘴部动画数据,以得到更新后的面部动画数据。
在第一个非限制性的例子中,可以从脸部表情信息中提取出与嘴部相关的表情信息,记为嘴部表情信息。具体而言,可以根据各个混合形状的语义,确定与嘴部相关的混合形状,与嘴部相关的混合形状的权重即为嘴部表情信息。
进一步地,可以将嘴部表情信息输入至第一嘴型映射模型。其中,第一嘴型映射模型是预先采用第三训练数据对第三预设模型进行训练得到的。第三预设模型可以是各种现有的具有学习能力的模型,更具体地,第三预设模型可以是径向基(Radial Basis Function)模型,但并不限于此。
进一步地,第三训练数据可以包括多组第三样本信息,每组第三 样本信息包括多个样本用户在预设表情下的嘴部表情信息和虚拟角色在该预设表情下的嘴部动画数据。其中,不同组的第三样本信息对应不同的预设表情。多个样本用户在预设表情下的嘴部表情信息可以是预先采集的,虚拟角色在预设表情下的嘴部动画数据是预先由动画师设置的。由此,采用第三训练数据训练得到的第一嘴型映射模型能够学习到嘴部表情信息和虚拟角色嘴部动画数据之间的映射关系。因此,第一嘴型映射模型输出的嘴部动画数据与用户的嘴部表情信息可以具有相似的语义。
进一步地,由于第三样本信息包括多个样本用户在预设表情下的嘴部表情信息,因此,第一嘴型映射模型具有通用性,可以用于根据任一用户的嘴部表情信息确定虚拟角色的嘴部动画数据。
在第二个非限制性的例子中,根据当前帧用户对应的三维人脸模型,可以提取与嘴部相关的三维特征点,更具体地,可以根据预先定义的与嘴部相关的顶点索引,从三维人脸模型中提取多个与嘴部相关的三维特征点,记为嘴部三维特征信息。
进一步地,可以将嘴部三维特征信息输入至第二嘴型映射模型,以得到输出的当前帧的嘴部动画数据。其中,第二嘴型映射模型是采用第四训练数据对第四预设模型训练得到的。第四预设模型可以是各种现有的具有学习能力的模型,更具体地,第四预设模型可以是径向基(Radial Basis Function)模型,但并不限于此。
进一步地,第四训练数据可以包括多组第四样本信息,每组第四样本信息包括多个样本用户在预设表情下的嘴部三维特征信息和虚拟角色在该预设表情下的嘴部动画数据。其中,多组第四样本信息对应不同的预设表情。其中,多个样本用户在预设表情下的嘴部三维特征信息可以是预先采集的,更具体地,样本用户在预设表情下的嘴部三维特征信息可以是基于样本用户在预设表情下的三维人脸模型提取得到的;虚拟角色在预设表情下的嘴部动画数据是预先由动画师设置的。由此,采用第四训练数据训练得到的第二嘴型映射模型能够学 习到嘴部三维特征信息和虚拟角色嘴部动画数据之间的映射关系。因此,第二嘴型映射模型输出的嘴部动画数据与用户的嘴部三维特征信息可以具有相似的语义。
进一步地,由于第四样本信息包括多个样本用户在预设表情下的嘴部三维特征信息,因此,第二嘴型映射模型也具有通用性,可以用于根据任一用户的嘴部三维特征信息确定虚拟角色的嘴部动画数据。
进一步地,在步骤S303的具体实施中,还可以根据嘴部动画数据,确定牙齿动画数据。具体而言,可以在嘴部动画数据的基础上加上预设的偏移量,即可得到牙齿动画数据。更具体地,可以从嘴部动画数据中提取出下巴动画数据,并在下巴动画数据的基础上加上预设的偏移量,以得到牙齿动画数据,从而可以使得虚拟角色的牙齿跟随下巴移动,从而使虚拟角色的整体动作姿态更加真实、自然。
进一步地,步骤S303的具体实施中,还可以根据当前帧用户对应的眼神方向信息进行重定向处理,以得到当前帧虚拟角色的眼神动画数据,以使得虚拟角色的眼神方向和用户的眼神方向尽可能一致。
在一个具体的例子中,可以将当前帧用户对应的三维瞳孔中心位置用球坐标表示时的天顶角θ和方位角
Figure PCTCN2022138386-appb-000005
直接传递给虚拟角色。具体而言,虚拟角色的眼球中心位置、眼球半径和虹膜尺寸可以是预先设置的,可以直接将天顶角θ和方位角
Figure PCTCN2022138386-appb-000006
作为虚拟角色的眼球中心位置指向三维瞳孔中心位置的方向。结合虚拟角色的眼球半径,可以确定虚拟角色的三维瞳孔中心位置,由此可以得到虚拟角色的眼神动画数据。
在另一个具体的例子中,可以采用眼神映射模型确定眼神动画数据。具体而言,眼神映射模型可以是预先采用第五训练数据对第五预设模型进行训练得到的,第五预设模型可以是现有的各种具有学习能力的模型。
在一个非限制性的例子中,第五预设模型可以是径向基模型,第 五训练数据可以包括多对第五样本信息,每对第五样本信息包括:用户在预设眼神方向下的三维瞳孔中心位置(可以记为样本瞳孔位置)以及虚拟角色在该预设眼神方向下的三维瞳孔中心位置(可以记为样本虚拟瞳孔位置)。其中,多对第五样本信息对应不同的预设眼神方向。更具体地,多个预设眼神方向可以包括平视、向左看、向右看、向上看和向下看等,但并不限于此。用户在多个预设眼神方向下的三维瞳孔中心位置可以是基于图像检测算法得到的,本实施例对此并不进行限制。虚拟角色在每个预设眼神方向下的三维瞳孔中心位置可以是预先确定的。
进一步地,可以根据多个预设眼神方向下用户的三维瞳孔中心位置和虚拟角色的三维瞳孔中心位置,计算确定径向基模型的RBF权重参数,所述RBF权重参数可以用于表征用户对应的三维瞳孔中心位置和虚拟角色的三维瞳孔中心位置之间的映射关系,由此可以得到眼神映射模型。
进一步地,可以将当前帧用户对应的三维瞳孔中心位置输入至眼神映射模型中,即可得到眼神映射模型输出的当前帧对应的虚拟瞳孔中心位置,从而得到眼神动画数据,其中,虚拟瞳孔位置为虚拟角色的三维瞳孔中心位置。
由此,可以得到当前帧对应的虚拟角色的动画数据,所述动画数据包括但不限于:面部动画数据、身体动画数据和眼球动画数据等。
进一步地,可以根据所述动画数据,确定所述虚拟角色对应的视频流数据。
具体地,可以将虚拟角色的动画数据进行解算和渲染,以得到虚拟角色的视频画面信息。例如,可以将动画数据输入实时引擎(例如,UE4、Unity等)进行解算和渲染。其中,视频画面信息具有与动画数据相同的时间码。
进一步地,可以将所述视频流数据发送至直播服务器,以使所述 直播服务器将所述视频流数据转发至其他用户终端。
进一步地,还可以获取用户输入的语音信息,语音信息和视频画面是来源于不同的设备,可以根据语音信息和视频画面各自的时间码进行同步处理,以得到虚拟角色对应的视频流数据,其中,画面信息是根据所述动画数据对所述虚拟角色进行渲染得到的。从而使得语音和虚拟角色的表情、眼神和姿态同步,从而得到虚拟角色的直播视频数据。
参照图8,图8是本发明实施例中一种虚拟角色的动画生成装置的结构示意图,图8示出的装置可以包括:
图像获取模块81,用于获取当前帧图像,所述当前帧图像包括用户的影像;
计算模块82,用于根据所述当前帧图像,确定当前帧所述用户对应的状态信息,所述状态信息包括:人脸信息、人体姿态信息和眼神方向信息,所述人脸信息包括脸部姿态信息和脸部表情信息;
重定向模块83,用于根据所述状态信息进行重定向处理,以得到所述虚拟角色的动画数据,其中,所述动画数据和所述当前帧图像的时间码相同,所述动画数据包括:面部动画数据、身体动画数据和眼球动画数据。
在具体实施中,上述虚拟角色的动画生成装置可以对应于终端内具有动画生成功能的芯片;或者对应于终端中具有动画生成功能的芯片模组,或者对应于终端。
关于图8示出的虚拟角色的动画生成装置的工作原理、工作方式和有益效果等更多内容,可以参照上文关于图1至图7的相关描述,在此不再赘述。
An embodiment of the present invention further provides a storage medium on which a computer program is stored, where the computer program, when run by a processor, performs the steps of the above method for generating animation of a virtual character. The storage medium may include a ROM, a RAM, a magnetic disk or an optical disc, and may also include a non-volatile memory or a non-transitory memory.
An embodiment of the present invention further provides a terminal including a memory and a processor, where the memory stores a computer program capable of running on the processor, and the processor, when running the computer program, performs the steps of the above method for generating animation of a virtual character. The terminal includes but is not limited to terminal devices such as a mobile phone, a computer and a tablet computer.
It should be understood that, in the embodiments of the present application, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct rambus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented by software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer program may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired or wireless manner.
In the several embodiments provided in this application, it should be understood that the disclosed method, apparatus and system may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a division of logical functions, and other division manners are possible in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units. For example, for each apparatus or product applied to or integrated in a chip, the modules/units it contains may all be implemented by hardware such as circuits; alternatively, at least some of the modules/units may be implemented by a software program running on a processor integrated inside the chip, with the remaining (if any) modules/units implemented by hardware such as circuits. For each apparatus or product applied to or integrated in a chip module, the modules/units it contains may all be implemented by hardware such as circuits, and different modules/units may be located in the same component (for example, a chip or a circuit module) of the chip module or in different components; alternatively, at least some of the modules/units may be implemented by a software program running on a processor integrated inside the chip module, with the remaining (if any) modules/units implemented by hardware such as circuits. For each apparatus or product applied to or integrated in a terminal, the modules/units it contains may all be implemented by hardware such as circuits, and different modules/units may be located in the same component (for example, a chip or a circuit module) of the terminal or in different components; alternatively, at least some of the modules/units may be implemented by a software program running on a processor integrated inside the terminal, with the remaining (if any) modules/units implemented by hardware such as circuits.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein indicates an "or" relationship between the associated objects.
"Multiple" or "a plurality of" in the embodiments of the present application means two or more.
The descriptions such as "first" and "second" in the embodiments of the present application are used only for illustration and for distinguishing the described objects; they imply no order, do not represent any special limitation on the number of devices in the embodiments of the present application, and do not constitute any limitation on the embodiments of the present application.
Although the present invention is disclosed as above, the present invention is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and therefore the protection scope of the present invention shall be subject to the scope defined by the claims.

Claims (22)

  1. A method for generating animation of a virtual character, characterized in that the method comprises:
    acquiring a current frame image, wherein the current frame image comprises an image of a user;
    determining, according to the current frame image, state information corresponding to the user in the current frame, wherein the state information comprises: face information, body posture information and gaze direction information, and the face information comprises facial pose information and facial expression information;
    performing retargeting according to the state information to obtain animation data of the virtual character, wherein the animation data has the same timecode as the current frame image, and the animation data comprises: facial animation data, body animation data and eye animation data.
  2. The method for generating animation of a virtual character according to claim 1, characterized in that the method further comprises:
    determining, at least according to the animation data, video stream data corresponding to the virtual character;
    sending the video stream data to a live-streaming server, so that the live-streaming server forwards the video stream data to other user terminals.
  3. The method for generating animation of a virtual character according to claim 2, characterized in that determining, at least according to the animation data, the video stream data corresponding to the virtual character comprises:
    acquiring voice information input by the user;
    synchronizing the voice information and picture information to obtain the video stream data corresponding to the virtual character, wherein the picture information is obtained by rendering the virtual character according to the animation data.
  4. The method for generating animation of a virtual character according to claim 1, characterized in that the body posture information comprises: torso-and-neck motion information, the torso-and-neck motion information is used to describe motions of the user's torso and neck, and the torso-and-neck motion information is determined according to the facial pose information.
  5. The method for generating animation of a virtual character according to claim 4, characterized in that the body animation data comprises torso-and-neck animation data and limb animation data, and performing retargeting according to the state information to obtain the animation data of the virtual character comprises:
    performing retargeting according to the torso-and-neck motion information to obtain the torso-and-neck animation data;
    acquiring limb animation data selected by the user;
    determining whether the motion corresponding to the torso-and-neck animation data matches the motion corresponding to the limb animation data, and if not, adjusting the torso-and-neck animation data so that the adjusted torso-and-neck animation data matches the motion corresponding to the limb animation data;
    fusing the limb animation data with the matched torso-and-neck animation data to obtain the body animation data.
  6. The method for generating animation of a virtual character according to claim 4, characterized in that determining, according to the current frame image, the state information corresponding to the user in the current frame comprises:
    acquiring limb motion information input by the user, wherein the limb motion information is used to describe motions of the user's limbs;
    fusing the torso-and-neck motion information and the limb motion information to obtain the body posture information of the current frame.
  7. The method for generating animation of a virtual character according to claim 6, characterized in that, before fusing the torso-and-neck motion information and the limb motion information, the method further comprises:
    determining whether the torso and neck motions described by the torso-and-neck motion information satisfy a motion condition, and if not, adjusting the torso-and-neck motion information so that the torso and neck motions described by the adjusted torso-and-neck motion information satisfy the motion condition;
    wherein the motion condition is determined according to the limb motion information.
  8. The method for generating animation of a virtual character according to claim 4, characterized in that
    determining, according to the current frame image, the state information corresponding to the user in the current frame comprises:
    determining, according to the current frame image, the facial pose information corresponding to the user in the current frame;
    inputting the facial pose information corresponding to the user in the current frame into a body posture matching model to obtain the torso-and-neck motion information corresponding to the user in the current frame;
    wherein the body posture matching model is obtained by training a first preset model with first training data, the first training data comprises multiple pairs of first sample information, and each pair of first sample information comprises: facial pose information corresponding to a sample user and torso-and-neck motion information corresponding to the sample user.
  9. The method for generating animation of a virtual character according to claim 8, characterized in that inputting the facial pose information into the body posture matching model comprises:
    acquiring associated pose information, wherein the associated pose information comprises: facial pose information and/or torso-and-neck motion information corresponding to the user in associated images, and the associated images are multiple consecutive frames before the current frame image and/or multiple consecutive frames after the current frame image;
    inputting the facial pose information corresponding to the user in the current frame and the associated pose information into the body posture matching model to obtain the torso-and-neck motion information corresponding to the user in the current frame.
  10. The method for generating animation of a virtual character according to claim 1, characterized in that determining, according to the current frame image, the state information corresponding to the user comprises:
    step A: generating a three-dimensional face model according to initial face information corresponding to the user in the current frame;
    step B: determining estimated face feature information according to the three-dimensional face model, and calculating a first difference between the estimated face feature information and target face feature information of the current frame, wherein the target face feature information is detected from the current frame image;
    step C: determining whether a first preset condition is satisfied; if so, performing step D, otherwise performing step E;
    step D: using the initial face information as the face information corresponding to the user in the current frame;
    step E: updating the initial face information, using the updated initial face information as the initial face information corresponding to the user in the current frame, and returning to step A until the first preset condition is satisfied;
    wherein, when step A is performed for the first time, the initial face information corresponding to the user in the current frame is the face information corresponding to the user in the previous frame, or preset face information, and the first preset condition comprises: the first difference being not greater than a first preset threshold and/or the number of updates of the initial face information reaching a second preset threshold.
  11. The method for generating animation of a virtual character according to claim 1, characterized in that the gaze direction information comprises a three-dimensional pupil center position, and determining, according to the current frame image, the state information corresponding to the user in the current frame comprises:
    step 1: determining a three-dimensional eyeball model according to eye information corresponding to the user in the current frame and an estimated pupil center position, wherein the eye information comprises: an eyeball center position, an eyeball radius and an iris size;
    step 2: determining estimated eye feature information according to the three-dimensional eyeball model, and calculating a second difference between the estimated eye feature information and target eye feature information, wherein the target eye feature information is detected from the current frame image;
    step 3: determining whether a second preset condition is satisfied; if so, performing step 4, otherwise performing step 5;
    step 4: using the estimated pupil center position as the three-dimensional pupil center position corresponding to the user in the current frame;
    step 5: updating the estimated pupil center position, using the updated estimated pupil center position as the estimated pupil center position corresponding to the user in the current frame, and returning to step 1 until the second preset condition is satisfied;
    wherein, when step 1 is performed for the first time, the estimated pupil center position corresponding to the user in the current frame is the three-dimensional pupil center position corresponding to the user in the previous frame, or a preset position, and the second preset condition comprises: the second difference being not greater than a third preset threshold and/or the number of updates of the estimated pupil center position reaching a fourth preset threshold.
  12. The method for generating animation of a virtual character according to claim 1, characterized in that the body posture information comprises joint angle data of a first skeleton model, and performing retargeting according to the state information to obtain the animation data of the virtual character comprises:
    generating a transition skeleton model, wherein the positions of multiple preset key joints in the transition skeleton model are the same as the positions of the multiple preset key joints in the first skeleton model, and the skeleton configuration of the transition skeleton model is the same as the skeleton configuration of a second skeleton model;
    determining the positions of the multiple preset key joints according to the joint angle data of the first skeleton model and the first skeleton model;
    determining joint angle data of the transition skeleton model according to the positions of the multiple preset key joints and the transition skeleton model, so as to obtain the body animation data of the virtual character;
    wherein the first skeleton model is a skeleton model corresponding to the user, the second skeleton model is a skeleton model of the virtual character, and the skeleton configuration comprises the number of bones and the default orientation of the rotation axis of each joint.
  13. The method for generating animation of a virtual character according to claim 1, characterized in that performing retargeting according to the state information to obtain the animation data of the virtual character comprises:
    inputting the facial expression information into an expression mapping model, wherein the expression mapping model is obtained by training a second preset model with second training data, the second training data comprises multiple groups of second sample information, each group of second sample information comprises: facial expression information of multiple sample users under a preset expression and facial animation data of the virtual character under that preset expression, and the multiple groups of second sample information correspond to different preset expressions;
    acquiring the facial animation data output by the expression mapping model.
  14. The method for generating animation of a virtual character according to claim 1, characterized in that the facial animation data comprises mouth animation data, and performing retargeting according to the state information to obtain the animation data of the virtual character comprises:
    extracting mouth-related expression information from the facial expression information and denoting it as mouth expression information;
    inputting the mouth expression information into a first mouth-shape mapping model, wherein the first mouth-shape mapping model is obtained by training a third preset model with third training data, the third training data comprises multiple groups of third sample information, each group of third sample information comprises: mouth expression information of multiple sample users under a preset expression and mouth animation data of the virtual character under the preset expression, and the multiple groups of third sample information correspond to different preset expressions;
    acquiring the mouth animation data output by the first mouth-shape mapping model.
  15. The method for generating animation of a virtual character according to claim 1, characterized in that the facial animation data comprises mouth animation data, and performing retargeting according to the state information to obtain the animation data of the virtual character comprises:
    extracting mouth-related three-dimensional feature points from the three-dimensional face model corresponding to the user in the current frame and denoting them as mouth three-dimensional feature information;
    inputting the mouth three-dimensional feature information into a second mouth-shape mapping model, wherein the second mouth-shape mapping model is obtained by training a fourth preset model with fourth training data, the fourth training data comprises multiple groups of fourth sample information, each group of fourth sample information comprises: mouth three-dimensional feature information of multiple sample users under a preset expression and mouth animation data of the virtual character under the preset expression, and the multiple groups of fourth sample information correspond to different preset expressions;
    acquiring the mouth animation data output by the second mouth-shape mapping model.
  16. The method for generating animation of a virtual character according to claim 14 or 15, characterized in that the animation data further comprises teeth animation data, and performing retargeting according to the state information to obtain the animation data of the virtual character further comprises:
    determining the teeth animation data according to the mouth animation data.
  17. The method for generating animation of a virtual character according to claim 11, characterized in that the gaze direction information is the zenith angle and the azimuth angle of the three-dimensional pupil center position in a spherical coordinate system whose origin is the eyeball center position, and performing retargeting according to the state information to obtain the animation data of the virtual character comprises:
    determining a virtual pupil position according to the eyeball radius of the virtual character and the gaze direction information to obtain the eye animation data, wherein the virtual pupil position is the three-dimensional pupil center position of the virtual character.
  18. The method for generating animation of a virtual character according to claim 11, characterized in that performing retargeting according to the state information to obtain the animation data of the virtual character comprises:
    inputting the three-dimensional pupil center position corresponding to the user in the current frame into a gaze mapping model, wherein the gaze mapping model is obtained by training a fifth preset model with fifth training data, the fifth training data comprises multiple pairs of fifth sample information, and each pair of fifth sample information comprises the user's three-dimensional pupil center position under a preset gaze direction and the virtual character's three-dimensional pupil center position under that preset gaze direction;
    acquiring a virtual pupil center position from the gaze mapping model to obtain the eye animation data, wherein the virtual pupil center position is the three-dimensional pupil center position of the virtual character.
  19. The method for generating animation of a virtual character according to claim 1, characterized in that the current frame image is captured by a single camera.
  20. An apparatus for generating animation of a virtual character, characterized in that the apparatus comprises:
    an image acquisition module, configured to acquire a current frame image, wherein the current frame image comprises an image of a user;
    a computation module, configured to determine, according to the current frame image, state information corresponding to the user in the current frame, wherein the state information comprises: face information, body posture information and gaze direction information, and the face information comprises facial pose information and facial expression information;
    a retargeting module, configured to perform retargeting according to the state information to obtain animation data of the virtual character, wherein the animation data has the same timecode as the current frame image, and the animation data comprises: facial animation data, body animation data and eye animation data.
  21. A storage medium on which a computer program is stored, characterized in that the computer program, when run by a processor, performs the steps of the method for generating animation of a virtual character according to any one of claims 1 to 19.
  22. A terminal comprising a memory and a processor, the memory storing a computer program capable of running on the processor, characterized in that the processor, when running the computer program, performs the steps of the method for generating animation of a virtual character according to any one of claims 1 to 19.
PCT/CN2022/138386 2021-12-14 2022-12-12 虚拟角色的动画生成方法及装置、存储介质、终端 WO2023109753A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111527313.9 2021-12-14
CN202111527313.9A CN114219878B (zh) 2021-12-14 2021-12-14 虚拟角色的动画生成方法及装置、存储介质、终端

Publications (1)

Publication Number Publication Date
WO2023109753A1 true WO2023109753A1 (zh) 2023-06-22

Family

ID=80701814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138386 WO2023109753A1 (zh) 2021-12-14 2022-12-12 虚拟角色的动画生成方法及装置、存储介质、终端

Country Status (2)

Country Link
CN (1) CN114219878B (zh)
WO (1) WO2023109753A1 (zh)

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
CN114219878B (zh) * 2021-12-14 2023-05-23 魔珐(上海)信息科技有限公司 虚拟角色的动画生成方法及装置、存储介质、终端
CN115334325A (zh) * 2022-06-23 2022-11-11 联通沃音乐文化有限公司 基于可编辑三维虚拟形象生成直播视频流的方法和系统
WO2024000480A1 (zh) * 2022-06-30 2024-01-04 中国科学院深圳先进技术研究院 3d虚拟对象的动画生成方法、装置、终端设备及介质
CN115393486B (zh) * 2022-10-27 2023-03-24 科大讯飞股份有限公司 虚拟形象的生成方法、装置、设备及存储介质
CN115665507B (zh) * 2022-12-26 2023-03-21 海马云(天津)信息技术有限公司 含虚拟形象的视频流数据的生成方法、装置、介质及设备
CN116152900B (zh) * 2023-04-17 2023-07-18 腾讯科技(深圳)有限公司 表情信息的获取方法、装置、计算机设备及存储介质

Citations (5)

Publication number Priority date Publication date Assignee Title
CN111970535A (zh) * 2020-09-25 2020-11-20 魔珐(上海)信息科技有限公司 虚拟直播方法、装置、系统及存储介质
CN112700523A (zh) * 2020-12-31 2021-04-23 魔珐(上海)信息科技有限公司 虚拟对象面部动画生成方法及装置、存储介质、终端
CN113192132A (zh) * 2021-03-18 2021-07-30 魔珐(上海)信息科技有限公司 眼神捕捉方法及装置、存储介质、终端
US20210279934A1 (en) * 2020-03-09 2021-09-09 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating virtual avatar
CN114219878A (zh) * 2021-12-14 2022-03-22 魔珐(上海)信息科技有限公司 虚拟角色的动画生成方法及装置、存储介质、终端

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN107154069B (zh) * 2017-05-11 2021-02-02 上海微漫网络科技有限公司 一种基于虚拟角色的数据处理方法及系统

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN117541758A (zh) * 2023-11-28 2024-02-09 吉林动画学院 虚拟人脸配置参数生成方法、装置、设备和存储介质
CN117893696A (zh) * 2024-03-15 2024-04-16 之江实验室 一种三维人体数据生成方法、装置、存储介质及电子设备
CN117893696B (zh) * 2024-03-15 2024-05-28 之江实验室 一种三维人体数据生成方法、装置、存储介质及电子设备

Also Published As

Publication number Publication date
CN114219878B (zh) 2023-05-23
CN114219878A (zh) 2022-03-22

Similar Documents

Publication Publication Date Title
WO2023109753A1 (zh) 虚拟角色的动画生成方法及装置、存储介质、终端
Kuster et al. Gaze correction for home video conferencing
TWI659335B (zh) 圖形處理方法和裝置、虛擬實境系統和計算機儲存介質
JP6234383B2 (ja) ビデオ会議における眼差し補正のための画像処理のための方法およびシステム
US11888909B2 (en) Avatar information protection
US11842437B2 (en) Marker-less augmented reality system for mammoplasty pre-visualization
WO2021004257A1 (zh) 视线检测和视频处理的方法、装置、设备和存储介质
CN109671141B (zh) 图像的渲染方法和装置、存储介质、电子装置
WO2021244172A1 (zh) 图像处理和图像合成方法、装置和存储介质
WO2024022065A1 (zh) 虚拟表情生成方法、装置、电子设备和存储介质
CN113192132B (zh) 眼神捕捉方法及装置、存储介质、终端
WO2022237249A1 (zh) 三维重建方法、装置和系统、介质及计算机设备
CN114821675B (zh) 对象的处理方法、系统和处理器
KR20180098507A (ko) 애니메이션 생성 방법 및 애니메이션 생성 장치
WO2023035725A1 (zh) 虚拟道具展示方法及装置
US20220270337A1 (en) Three-dimensional (3d) human modeling under specific body-fitting of clothes
KR20200134623A (ko) 3차원 가상 캐릭터의 표정모사방법 및 표정모사장치
WO2024113779A1 (zh) 图像处理方法、装置及相关设备
WO2023185241A1 (zh) 数据处理方法、装置、设备以及介质
JP7504968B2 (ja) アバター表示装置、アバター生成装置及びプログラム
JP7200439B1 (ja) アバター表示装置、アバター生成装置及びプログラム
WO2022205167A1 (zh) 图像处理方法、装置、可移动平台、终端设备和存储介质
CN115145395B (zh) 虚拟现实交互控制方法、系统及虚拟现实设备
US20240020901A1 (en) Method and application for animating computer generated images
WO2024051289A1 (zh) 图像背景替换方法及相关设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22906498

Country of ref document: EP

Kind code of ref document: A1