WO2024027661A1 - Digital human driving method, apparatus, device and storage medium - Google Patents

Digital human driving method, apparatus, device and storage medium

Info

Publication number
WO2024027661A1
Authority
WO
WIPO (PCT)
Prior art keywords
module
motion
information
digital human
control instruction
Prior art date
Application number
PCT/CN2023/110343
Other languages
English (en)
French (fr)
Inventor
崔雨豪
蒲黎明
史运洲
丁浩生
赵中州
周伟
肖志勇
陈海青
Original Assignee
Alibaba (China) Co., Ltd. (阿里巴巴(中国)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba (China) Co., Ltd. (阿里巴巴(中国)有限公司)
Publication of WO2024027661A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The present disclosure relates to the field of information technology, and in particular to a digital human driving method, apparatus, device and storage medium.
  • A digital human can be understood as a character in the virtual world.
  • Digital humans need to move freely in the virtual world and interact with the surrounding environment.
  • In the prior art, a state transition graph is used to drive the digital human: each node in the state transition graph is an animation segment, and each edge in the state transition graph is a state transition condition.
  • If the state transition condition on an edge is satisfied, the animation segment used to drive the digital human switches from one animation segment connected to that edge to the other animation segment connected to that edge, so that the animation segment driving the digital human transfers between different animation segments.
  • However, this method requires manually pre-constructing different state transition conditions and corresponding animation segments, resulting in high labor costs.
  • the present disclosure provides a digital human driving method, device, equipment and storage medium.
  • This embodiment can switch freely between two ways of determining skeletal motion information, so that control instructions in different scenarios can generate the skeletal motion information that drives the digital human in different ways, without manually pre-constructing state transitions, thus saving labor costs.
  • embodiments of the present disclosure provide a digital human driving method, including:
  • obtaining a control instruction for driving the digital human;
  • determining, according to the control instruction, the target module that executes the control instruction from a motion matching module and a motion control module;
  • if the target module is the motion matching module, determining, according to the control instruction, a target animation segment matching the control instruction from a plurality of preset animation segments, and using the skeletal motion information in the target animation segment as the skeletal motion information that drives the digital human;
  • if the target module is the motion control module, inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generating the skeletal motion information used to drive the digital human through the machine learning model;
  • driving the digital human to move according to the skeletal motion information.
  • an embodiment of the present disclosure provides a digital human driving device, including:
  • an acquisition module, used to obtain a control instruction for driving the digital human;
  • a first determination module, configured to determine, according to the control instruction, the target module for executing the control instruction from the motion matching module and the motion control module;
  • a second determination module, configured to, if the target module is the motion matching module, determine a target animation segment that matches the control instruction from a plurality of preset animation segments according to the control instruction, and use the skeletal motion information in the target animation segment as the skeletal motion information that drives the digital human;
  • a generation module, configured to, if the target module is the motion control module, input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, and generate the skeletal motion information for driving the digital human through the machine learning model;
  • a driving module configured to drive the digital human to move according to the skeletal motion information.
  • an electronic device, including: a processor; and a memory, in which a computer program is stored, the computer program being configured to be executed by the processor to implement the method as described in the first aspect.
  • embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the method described in the first aspect.
  • The digital human driving method, apparatus, device and storage medium provided by the embodiments of the present disclosure determine the target module for executing the control instruction from the motion matching module and the motion control module according to the control instruction for driving the digital human.
  • Since control instructions differ, the selected target module may differ, so flexible switching between the motion matching module and the motion control module can be achieved.
  • The motion matching module can determine a target animation segment that matches the control instruction from a plurality of preset animation segments according to the control instruction, and use the skeletal motion information in the target animation segment as the skeletal motion information that drives the digital human.
  • The motion control module can input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, and generate the skeletal motion information for driving the digital human through the machine learning model.
  • Therefore, the motion matching module and the motion control module determine skeletal motion information in different ways.
  • This embodiment can switch freely between the two ways of determining skeletal motion information, so that control instructions in different scenarios can generate the skeletal motion information that drives the digital human in different ways. There is no need to construct a state transition graph, nor to pre-construct different state transition conditions and corresponding animation segments, thus saving labor costs.
  • Figure 1 is a flow chart of a digital human driving method provided by an embodiment of the present disclosure.
  • Figure 2 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure.
  • Figure 3 is a flow chart of a digital human driving method provided by another embodiment of the present disclosure.
  • Figure 4 is a flow chart of a digital human driving method provided by another embodiment of the present disclosure.
  • Figure 5 is a schematic structural diagram of a digital human driving device provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • embodiments of the present disclosure provide a digital human driving method, which is introduced below with reference to specific embodiments.
  • Figure 1 is a flow chart of a digital human driving method provided by an embodiment of the present disclosure.
  • the method can be executed by a digital human driving device, which can be implemented in the form of software and/or hardware.
  • the device can be configured in an electronic device, such as a server or a terminal, where the terminal specifically includes a mobile phone, a computer or a tablet computer.
  • the digital human driving method described in this embodiment can be applied to the application scenario shown in Figure 2.
  • The application scenario includes a terminal 21 and a server 22. The server 22 can drive the digital human using the method described in the embodiments of the present disclosure, and send a video file or video stream of the digital human performing actions to the terminal 21, so that the terminal 21 can play the pictures of the digital human performing actions.
  • Alternatively, the terminal 21 itself can use the method described in the embodiments of the present disclosure to drive the digital human and play the pictures of the digital human performing actions.
  • the method is introduced in detail below in conjunction with Figure 2, as shown in Figure 1. The specific steps of the method are as follows:
  • The following takes the case where the server 22 drives the digital human as an example.
  • the server 22 can obtain the control instructions for driving the digital human.
  • the control instruction may be a control instruction from the terminal 21 , for example, a control instruction issued by a user of the terminal 21 .
  • the control instructions may be generated by the server 22.
  • According to the control instruction, determine the target module that executes the control instruction from the motion matching module and the motion control module.
  • the server 22 may include a motion matching module and a motion control module, and these two modules may be implemented in software and/or hardware respectively.
  • the motion matching module and the motion control module can respectively determine the skeletal motion information used to drive the digital human.
  • the principles and specific processes of the motion matching module and the motion control module for determining the skeletal motion information are different.
  • The motion matching module can select the animation segment that best matches the control instruction from multiple existing animation segments, and use its skeletal motion information as the skeletal motion information that drives the digital human.
  • The motion control module can directly generate the skeletal motion information that drives the digital human through a pre-trained machine learning model. Therefore, when the server 22 obtains a control instruction, it needs to determine the target module that executes the control instruction from the motion matching module and the motion control module.
  • That is, the server 22 needs to determine one of the motion matching module and the motion control module as the target module, and the target module determines the skeletal motion information that drives the digital human.
  • For example, the server 22 may pre-store a plurality of preset control instructions and the identifier of the module that executes each preset control instruction; that is, the server 22 may pre-store a correspondence between preset control instructions and module identifiers.
  • When the server 22 obtains a certain control instruction, it can query the correspondence for the preset control instruction that best matches the control instruction, and use the module identified by the module identifier corresponding to that preset control instruction as the module that executes the control instruction.
  • That is, this embodiment can determine in advance which control instructions are executed by which of the motion matching module and the motion control module.
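  • As a minimal sketch of the correspondence lookup described above, assuming a simple string-similarity match (the instruction strings, module identifiers and similarity measure below are illustrative, not taken from the patent):

```python
# Hypothetical correspondence between preset control instructions and the
# identifier of the module that executes them; similarity via difflib.
from difflib import SequenceMatcher

MODULE_FOR_INSTRUCTION = {
    "walk forward": "motion_matching",
    "sit on the chair": "motion_control",
    "climb stairs": "motion_control",
    "dance in place": "motion_matching",
}

def select_target_module(instruction: str) -> str:
    """Return the identifier of the module whose preset instruction
    best matches the received control instruction."""
    best_preset = max(
        MODULE_FOR_INSTRUCTION,
        key=lambda preset: SequenceMatcher(None, instruction, preset).ratio(),
    )
    return MODULE_FOR_INSTRUCTION[best_preset]

print(select_target_module("walk forwards"))  # -> motion_matching
```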
  • the motion matching module uses a motion matching algorithm to process the control instructions
  • the motion control module uses a motion control algorithm to process the control instructions.
  • the motion matching algorithm may specifically be an algorithm for determining a target animation segment that matches the control instruction from a plurality of preset animation segments.
  • the motion control algorithm can be an algorithm adopted by a machine learning model.
  • Optionally, the machine learning model can generate the skeletal motion information of the digital human at the next moment or the next frame based on the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human.
  • If the target module is the motion matching module, determine the target animation segment that matches the control instruction from the plurality of preset animation segments according to the control instruction, and use the skeletal motion information in the target animation segment as the skeletal motion information that drives the digital human.
  • the server 22 can use the motion matching module as the target module to execute the control instruction. That is to say, the server 22 can give the control instruction to the motion matching module for execution.
  • When the motion matching module executes the control instruction, it can determine the target animation segment matching the control instruction from the multiple preset animation segments stored in the database according to the control instruction, and use the skeletal motion information in the target animation segment as the skeletal motion information that drives the digital human.
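  • The following is a minimal motion-matching sketch, assuming each preset animation segment carries a feature vector and the control instruction is translated into a query feature (segment names and feature dimensions are invented for illustration):

```python
# Pick the preset segment whose feature vector is closest to the query.
import numpy as np

preset_segments = {
    "walk_loop": np.array([1.2, 0.0, 0.0]),  # e.g. forward speed, turn rate, stance
    "run_loop":  np.array([3.5, 0.0, 0.0]),
    "turn_left": np.array([0.8, 1.0, 0.0]),
}

def match_segment(query: np.ndarray) -> str:
    """Return the name of the preset segment with the smallest feature distance."""
    return min(preset_segments,
               key=lambda name: np.linalg.norm(preset_segments[name] - query))

# A "walk forward" instruction might translate to this query feature:
print(match_segment(np.array([1.0, 0.0, 0.0])))  # -> walk_loop
```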
  • If the target module is the motion control module, input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, and generate the skeletal motion information used to drive the digital human through the model.
  • Optionally, the historical motion skeleton information of the digital human includes at least one of the following: the position information, displacement information and rotation information of each skeletal point of the digital human at each trajectory point in the historical movement trajectory; and the status information of the digital human at each trajectory point in the historical movement trajectory.
  • the server 22 can use the motion control module as the target module to execute the control instruction. That is to say, the server 22 can give the control instruction to the motion control module for execution.
  • When the motion control module executes the control instruction, it can input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, so that the machine learning model can generate, based on the input information, the skeletal motion information of the digital human at the next moment or the next frame.
  • the digital person includes multiple skeletal points
  • the historical movement trajectory includes multiple historical trajectory points.
  • the historical movement skeletal information of the digital person can be the skeletal movement information of the digital person during its movement on the historical movement trajectory.
  • Specifically, the historical motion skeleton information of the digital human includes the skeletal posture information of the digital human at each historical trajectory point and the status information of the digital human at each historical trajectory point.
  • The skeletal posture information at each historical trajectory point includes the position information of each skeletal point of the digital human at each historical trajectory point or at each historical moment, and the displacement information and rotation information of each skeletal point between two adjacent historical trajectory points or two adjacent historical moments.
  • Historical trajectory points and historical moments may or may not correspond one-to-one.
  • The status information of the digital human at each historical trajectory point includes walking, running, squatting, standing, etc.
  • The skeletal motion information of the digital human at the next moment or the next frame includes the skeletal posture information of the digital human at the next moment or the next frame, which includes the position information of each skeletal point of the digital human at the next moment or the next frame, and the displacement information and rotation information of each skeletal point at the next moment relative to the current moment, or at the next frame relative to the current frame. It can be understood that one of the multiple skeletal points included in the digital human is a root node, or a root node can be determined from the multiple skeletal points, and the projection point of the root node on the ground is recorded as a trajectory point.
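  • A hypothetical data layout for the model's inputs and output, mirroring the information listed above (all field names are assumptions for illustration):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SkeletalPose:
    positions: List[List[float]]      # per skeletal point: [x, y, z]
    displacements: List[List[float]]  # per point: displacement vs. previous moment/frame
    rotations: List[List[float]]      # per point: rotation vs. previous moment/frame

@dataclass
class HistoryEntry:
    trajectory_point: List[float]     # ground projection of the root node
    pose: SkeletalPose                # skeletal posture at this trajectory point
    status: str                       # e.g. "walking", "running", "squatting", "standing"

@dataclass
class ModelInput:
    control_instruction: str
    history: List[HistoryEntry]       # historical motion skeleton info + trajectory

# The model consumes a ModelInput and yields the SkeletalPose of the
# next moment or next frame (optionally plus a short predicted trajectory).
```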
  • When the server 22 has determined the skeletal motion information used to drive the digital human, it can drive the digital human to move in a retargeting manner, for example, by binding the rotation information and displacement information of each skeletal point included in the skeletal motion information to the corresponding bones of the digital human, so that the bones of the digital human perform movements similar to those described by the skeletal motion information.
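  • A toy sketch of this binding step, assuming both skeletons share bone names (the dictionary-based skeleton is a simplification for illustration):

```python
# Apply per-bone rotation and displacement from the generated motion onto the
# digital human's own skeleton so it performs a similar movement.
def apply_motion(character_bones: dict, motion_frame: dict) -> None:
    """motion_frame maps bone name -> {'rotation': ..., 'displacement': ...}."""
    for bone_name, transform in motion_frame.items():
        bone = character_bones.get(bone_name)
        if bone is None:
            continue  # skip bones the character's skeleton does not have
        bone["rotation"] = transform["rotation"]
        bone["position"] = [p + d for p, d in
                            zip(bone["position"], transform["displacement"])]

skeleton = {"knee_l": {"position": [0.0, 0.5, 0.0], "rotation": [0, 0, 0]}}
apply_motion(skeleton, {"knee_l": {"rotation": [15, 0, 0],
                                   "displacement": [0.0, 0.0, 0.1]}})
print(skeleton)
```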
  • The embodiment of the present disclosure determines the target module that executes the control instruction from the motion matching module and the motion control module according to the control instruction for driving the digital human.
  • Since control instructions differ, the selected target module may differ, so flexible switching between the motion matching module and the motion control module can be achieved.
  • The motion matching module can determine a target animation segment that matches the control instruction from a plurality of preset animation segments according to the control instruction, and use the skeletal motion information in the target animation segment as the skeletal motion information that drives the digital human.
  • The motion control module can input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, and generate the skeletal motion information for driving the digital human through the machine learning model.
  • Therefore, the motion matching module and the motion control module determine skeletal motion information in different ways.
  • This embodiment can switch freely between the two ways of determining skeletal motion information, so that control instructions in different scenarios can generate the skeletal motion information that drives the digital human in different ways. There is no need to construct a state transition graph, nor to pre-construct different state transition conditions and corresponding animation segments, thus saving labor costs.
  • FIG. 3 is a flow chart of a digital human driving method provided by another embodiment of the present disclosure.
  • the specific steps of this method are as follows:
  • the server 22 includes an instruction parsing module, a dynamic state machine, a pre-processing module, a motion matching module, a motion control module, and a post-processing module.
  • the server 22 can implement a driving scheme for digital humans through these modules. Through this scheme, it can generate all-terrain positioning animations, scene interaction animations, long-sequence action animations, etc. corresponding to various instructions.
  • The instruction parsing module can receive control signals for driving the digital human, such as brain waves, audio signals, visual signals, voice signals, text signals and path planning signals, as shown in Figure 4. These control signals can be generated by the terminal 21 and sent to the server 22, or may be generated by the server 22.
  • the brain wave can be a signal induced by an EEG sensor.
  • the EEG sensor can be set in a wearable device.
  • The wearable device can be the terminal 21 and worn on the head of a real person.
  • In different thinking states of the real person's brain, the EEG sensor can sense different signals. For example, when the real person's brain is thinking "walk forward", the signal sensed by the EEG sensor is 0; when the real person's brain is thinking "go backward", the signal sensed by the EEG sensor is 1. Therefore, when the brain wave signal received by the instruction parsing module is 0, the instruction parsing module parses the brain wave into a "go forward" control instruction.
  • When the brain wave signal received by the instruction parsing module is 1, the instruction parsing module parses the brain wave into a "go backward" control instruction.
  • It can be understood that in other embodiments, the 0 and 1 signals sensed by the EEG sensor may represent different meanings: for example, signal 0 can be parsed by the instruction parsing module as "turn right", and signal 1 can be parsed as "turn left".
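  • The EEG parsing above amounts to a configurable signal-to-instruction table; a sketch (the mapping is the example from the text, everything else is assumed):

```python
EEG_MEANINGS = {0: "go forward", 1: "go backward"}
# Alternative embodiment: EEG_MEANINGS = {0: "turn right", 1: "turn left"}

def parse_brain_wave(signal: int) -> str:
    """Translate a sensed EEG signal into a control instruction."""
    return EEG_MEANINGS[signal]

print(parse_brain_wave(0))  # -> go forward
```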
  • the audio signal shown in Figure 4 can be a piece of music or audio with rhythm.
  • The instruction parsing module can parse control instructions for controlling the behavior of the digital human from the audio signal; for example, it can parse control instructions for controlling the movement amplitude of the digital human according to the volume of the music, and control instructions for controlling the rhythm of the digital human's footsteps according to the rhythm of the audio signal.
  • the visual signal can be a video shot in the real world.
  • the instruction analysis module can parse the character movements in the visual signal and convert the character movements into corresponding skeletal motion information.
  • the visual signal may be a virtual visual signal, that is, a visual signal simulated in a virtual environment, such as a 16-line, 64-line vision system, etc.
  • the voice signal may be a voice sent by the user of the terminal 21 for controlling the digital human.
  • The instruction parsing module may convert the voice into text information through Automatic Speech Recognition (ASR) technology, and further parse the text information into at least one control instruction.
  • the text signal shown in Figure 4 can be text information, and the instruction parsing module can parse the text information and decompose the text information into continuous independent control instructions. For example, the text information is "Go to the chair in front and sit down.” This text information can be decomposed into two control instructions by the instruction parsing module.
  • the path planning signal as shown in Figure 4 can include a destination.
  • The instruction parsing module can perform automatic path planning based on the destination and select the optimal path as the obstacle-avoidance path for the digital human. Further, the instruction parsing module can parse the optimal path into multiple control instructions, where each control instruction may include the position information of one trajectory point on the optimal path, so that the digital human is controlled to move along the optimal path through the multiple control instructions. It can be understood that in other embodiments, the control instructions issued by the instruction parsing module can be replaced by control instructions issued by other control modules, or the instruction parsing module can be replaced by any module capable of issuing control instructions.
  • In addition, the control instructions received by the dynamic state machine can also be any control instructions input manually.
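  • An illustrative decomposition of a planned path into per-waypoint control instructions (the planner itself is out of scope here; the optimal path is assumed to be given):

```python
def path_to_instructions(optimal_path):
    """Each control instruction carries the position of one trajectory point."""
    return [{"type": "move_to", "position": point} for point in optimal_path]

path = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]  # trajectory points toward the destination
for instruction in path_to_instructions(path):
    print(instruction)
```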
  • the parsing method used by the instruction parsing module is not limited to the parsing method mentioned above.
  • In other embodiments, the instruction parsing module can also parse the control signals it receives through a machine learning model, thereby directly parsing the received control signals into control instructions; for example, the text signal or voice signal received by the instruction parsing module can be directly parsed into the corresponding control instruction.
  • Specifically, the instruction parsing module can receive at least one control signal among brain waves, audio signals, visual signals, voice signals, text signals and path planning signals, and can parse each control signal into at least one control instruction. Therefore, within a period of time, the instruction parsing module may parse out multiple control instructions, which it can then send to the dynamic state machine. At this time, the dynamic state machine can sort the multiple control instructions, for example, according to their order of execution, thereby obtaining a sorting result. For example, within a period of time, the instruction parsing module parses out three control instructions, recorded as control instruction A, control instruction B and control instruction C. The sorting result obtained after the dynamic state machine sorts the three control instructions is control instruction B, control instruction A, control instruction C.
  • control instruction B is the currently first unexecuted control instruction.
  • According to the control instruction, determine the target module that executes the control instruction from the motion matching module and the motion control module.
  • the dynamic state machine determines the target module that executes the control instruction B from the motion matching module and the motion control module.
  • the motion matching module may include multiple sub-modules.
  • the motion matching module includes three sub-modules, which are respectively recorded as displacement sub-module, interaction sub-module and action sub-module.
  • the displacement sub-module is used to process the control instructions about the displacement of the digital human
  • the interaction sub-module is used to process control instructions about digital human interaction
  • the action submodule is used to process control instructions for digital human actions.
  • the digital human displacement includes the displacement changes caused by the digital human walking, going up and down stairs, climbing mountains, etc.
  • Digital human interaction includes static interactions between digital humans and static objects in the virtual environment (such as sofas, chairs, etc.), and dynamic interactions between digital humans and dynamic objects in the virtual environment (such as other digital humans).
  • Digital human actions include the digital human dancing, performing martial arts in place, and other in-place posture changes. It can be understood that in other embodiments, processing control instructions about the displacement, interaction or action of the digital human is not limited to a single sub-module. For example, taking the displacement of the digital human as an example, control instructions about the displacement of the digital human can be processed jointly by multiple displacement sub-modules, or each of the multiple displacement sub-modules can independently process control instructions about the displacement of the digital human.
  • the motion control module can also include three sub-modules, for example, a displacement sub-module, an interaction sub-module and an action sub-module.
  • the functions of each sub-module are as described above and will not be described again here.
  • In addition, since the scenes and/or control instructions applicable to the motion matching module and the motion control module are different, for the same type of sub-module, such as the displacement sub-module, the displacement sub-module in the motion matching module and the displacement sub-module in the motion control module are applicable to different scenarios and/or control instructions.
  • Therefore, when the dynamic state machine determines the target module for executing control instruction B from the motion matching module and the motion control module according to control instruction B, it can specifically determine one sub-module as the target module from the three sub-modules included in the motion matching module and the three sub-modules included in the motion control module.
  • For example, if control instruction B is a control instruction about the displacement of the digital human, the dynamic state machine can select the displacement sub-module in the motion matching module as the target module; if control instruction B is a control instruction about the displacement of the digital human and the displacement is generated in scenarios such as going up and down stairs or climbing mountains, the dynamic state machine can select the displacement sub-module in the motion control module as the target module.
  • After the target module finishes executing control instruction B, it can send a completion signal to the dynamic state machine.
  • At this time, the first unexecuted control instruction in the sorting result described above becomes control instruction A.
  • Further, the dynamic state machine can determine the target module for control instruction A; the determination process is similar to that for control instruction B and will not be repeated here.
  • After control instruction A is executed, control instruction C becomes the first unexecuted control instruction in the sorting result.
  • Similarly, the dynamic state machine can determine the target module for control instruction C, and that target module processes control instruction C.
  • In summary, the dynamic state machine can orchestrate (for example, sort) the multiple control instructions issued by the instruction parsing module, and distribute the multiple control instructions in sequence, according to the sorting result, to different sub-modules in the motion matching module or different sub-modules in the motion control module for processing.
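  • A minimal dynamic-state-machine sketch of this sort-and-distribute behavior (the routing rules and sub-module names are invented; real routing would depend on instruction type and scenario as described above):

```python
from collections import deque

def route(instruction: str) -> str:
    """Toy routing: choose a sub-module for an instruction."""
    if "walk" in instruction or "go" in instruction:
        return "motion_matching.displacement"
    if "sit" in instruction or "chair" in instruction:
        return "motion_control.interaction"
    return "motion_matching.action"

def run_state_machine(instructions, execution_order):
    queue = deque(sorted(instructions, key=execution_order.index))
    while queue:
        instr = queue.popleft()  # currently first unexecuted instruction
        print(f"dispatch {instr!r} -> {route(instr)}")
        # ...the sub-module drives the digital human, then returns a
        # completion signal; only then is the next instruction dispatched.

run_state_machine(["sit on the chair", "walk to the chair"],
                  execution_order=["walk to the chair", "sit on the chair"])
```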
  • the current state of the digital human is an idle state (such as a standing resting state).
  • The instruction parsing module issues two control instructions to the dynamic state machine at this time, namely "walk to the chair" and "sit on the chair".
  • the dynamic state machine sorts these two control instructions. The sorting result is "walk to the chair” first and “sit on the chair” last.
  • the dynamic state machine determines the submodule suitable for completing "walking to the chair" from the three submodules included in the motion matching module and the three submodules included in the motion control module, for example, the displacement submodule in the motion matching module. Then, the dynamic state machine distributes "walk to the chair” to the displacement sub-module in the motion matching module.
  • After the displacement sub-module completes the instruction, it returns a completion signal to the dynamic state machine, and the digital human can return to the idle state and wait to be called. Then, the dynamic state machine determines the sub-module suitable for completing "sit on the chair" from the three sub-modules included in the motion matching module and the three sub-modules included in the motion control module, for example, the interaction sub-module in the motion control module. Further, the dynamic state machine distributes "sit on the chair" to the interaction sub-module in the motion control module. When "sit on the chair" has been processed by the interaction sub-module, the digital human can return to the idle state again.
  • It should be noted that after different control instructions are executed, the idle state that the digital human returns to may be different.
  • For example, the idle state that the digital human returns to may be the state at the end of the last action in the series of consecutive actions corresponding to the control instruction.
  • If the target module is the motion matching module, determine the target animation segment matching the control instruction from the plurality of preset animation segments according to the control instruction, and use the skeletal motion information in the target animation segment as the skeletal motion information that drives the digital human.
  • the motion matching module and the motion control module respectively have corresponding pre-processing modules.
  • the pre-processing modules corresponding to the motion matching module and the motion control module can be the same module, or they can be different modules. If it is the same module, then the pre-processing processes performed by the pre-processing module for the motion matching module and the motion control module are different.
  • the main function of the pre-processing module corresponding to the motion matching module is to refine multiple preset animation clips stored in the database.
  • In this step, the motion matching module or a sub-module in the motion matching module can determine, according to the control instruction, a target animation segment that matches the control instruction from the plurality of preset animation segments stored in the database, and use the skeletal motion information in the target animation segment as the skeletal motion information that drives the digital human.
  • For example, after receiving the control instruction, the motion matching module can determine the target animation segment that matches the control instruction from the multiple preset animation segments and output the target animation segment.
  • Optionally, determining a target animation segment matching the control instruction from a plurality of preset animation segments includes: determining, according to at least one historical animation segment that drives the movement of the digital human and the control instruction, a target animation segment that matches the control instruction and connects with the at least one historical animation segment from the plurality of preset animation segments.
  • the input of the motion matching module not only includes the control instruction, but may also include, for example, the first n animation clips.
  • the first n animation clips are historical animation clips that drive the movement of the digital human.
  • The number of historical animation segments is n, where n is greater than or equal to 1. That is to say, the input of the motion matching module may include the first n animation segments and the control instruction. In this case, the motion matching module not only needs to determine a target animation segment that matches the control instruction from the plurality of preset animation segments, but must also ensure that the connection degree between the determined target animation segment and the first n animation segments is greater than or equal to a preset connection degree; that is, the motion matching module needs to output a target animation segment that both matches the control instruction and can connect with the first n animation segments.
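  • A sketch of matching with a connection-degree requirement; here (as an assumption) the connection degree is approximated by the pose distance between the previous clip's last frame and a candidate's first frame:

```python
import numpy as np

def connection_degree(prev_last_frame, cand_first_frame) -> float:
    """Higher when the candidate's first pose is close to the previous last pose."""
    return 1.0 / (1.0 + np.linalg.norm(prev_last_frame - cand_first_frame))

def pick_segment(candidates, query, prev_last_frame, min_connection=0.5):
    # Keep only clips whose connection degree meets the preset threshold.
    viable = [
        (name, feature)
        for name, (feature, first_frame) in candidates.items()
        if connection_degree(prev_last_frame, first_frame) >= min_connection
    ]
    # Among sufficiently connectable clips, choose the best instruction match.
    name, _ = min(viable, key=lambda c: np.linalg.norm(c[1] - query))
    return name

candidates = {
    "walk_loop": (np.array([1.0, 0.0]), np.array([0.0, 0.9])),  # (feature, first frame)
    "run_loop":  (np.array([3.0, 0.0]), np.array([2.0, 0.4])),
}
print(pick_segment(candidates, query=np.array([1.2, 0.0]),
                   prev_last_frame=np.array([0.1, 1.0])))  # -> walk_loop
```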
  • Optionally, determining a target animation segment matching the control instruction from a plurality of preset animation segments includes: determining, according to the historical movement trajectory of the digital human, at least one historical animation segment that drives the movement of the digital human, and the control instruction, a target animation segment that matches the control instruction and connects with the at least one historical animation segment from the plurality of preset animation segments.
  • the input of the motion matching module not only includes the first n animation clips and the control instruction, but may also include, for example, the historical motion trajectory of the digital human.
  • the historical movement trajectory can be the movement trajectory of the digital person within a certain historical time period, that is, the trajectory line that the person walked through.
  • the motion matching module may output a target animation segment that can match the control instruction and be connected to the first n animation segments.
  • Specifically, the target animation segment determined by the motion matching module may include the initial posture or reference posture of the skeleton, and skeletal motion information based on that initial posture or reference posture. This embodiment can use the skeletal motion information in the target animation segment as the skeletal motion information that drives the digital human.
  • The skeleton included in the target animation segment may or may not be the skeleton of the digital human. If it is not the skeleton of the digital human, the skeletal motion information in the target animation segment is retargeted to the skeleton of the digital human.
  • If the target module is the motion control module, input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, and generate the skeletal motion information used to drive the digital human through the model.
  • the main function of the pre-processing module corresponding to the motion control module is to standardize the input of the machine learning model during the training process, that is, the samples.
  • For example, the input of the machine learning model includes skeletal motion information, and the standardization process may retarget the skeletal motion information to a unified standard skeletal pose, for example, a T-pose, thereby improving the accuracy of the trained machine learning model.
  • In this step, the motion control module or a sub-module in the motion control module can input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, so that the machine learning model can generate the skeletal motion information of the digital human at the next moment or the next frame based on the input information.
  • That is, the information input by the motion control module to the machine learning model includes the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human, where the historical motion trajectory can be the digital human's movement trajectory within a certain historical period.
  • the meaning of the historical motion skeleton information here refers to the content described in the above embodiment, and will not be described again here.
  • Optionally, inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model and generating the skeletal motion information used to drive the digital human through the machine learning model includes: inputting the control instruction, the environmental information around the digital human, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, and generating, through the machine learning model, the skeletal motion information for driving the digital human at the next moment.
  • the information input by the motion control module to the machine learning model is not limited to including the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital person.
  • the output of the machine learning model is not limited to including the skeletal motion information of the digital person at the next moment or the next frame.
  • For example, the output may also include the movement trajectory of the digital human predicted by the machine learning model for a subsequent short period of time. It can be understood that the output of the machine learning model at the current moment can be used as the input of the machine learning model at the next moment, thereby iterating continuously.
  • For example, the motion trajectory output by the machine learning model at the current moment can be used as part of the model's input at the next moment.
  • Each time the machine learning model produces an output, the digital human can be driven once, so that the machine learning model drives the digital human in real time while outputting in real time.
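  • A sketch of this closed loop, with `model` as a stand-in callable that returns the next pose and a short predicted trajectory (all names are assumptions):

```python
def drive_loop(model, initial_input, apply_to_character, n_frames=120):
    """Feed each output back as part of the next input; drive once per output."""
    model_input = initial_input
    for _ in range(n_frames):
        next_pose, predicted_trajectory = model(model_input)
        apply_to_character(next_pose)  # drive the digital human once
        model_input = {
            "control_instruction": model_input["control_instruction"],
            "history_pose": next_pose,                   # current output becomes
            "history_trajectory": predicted_trajectory,  # part of the next input
        }
```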
  • Optionally, the environmental information around the digital human includes at least one of the following: the height information of each trajectory point on a preset length of the historical movement trajectory that the digital human has passed; the voxelization information of virtual objects within a preset range around the digital human; the trajectory information of dynamic objects around the digital human; and the contact information between dynamic objects around the digital human and the digital human.
  • the environment information around the digital person may specifically be information about the virtual environment in which the digital person is located.
  • the environment information may include the height information of each track point on the historical movement trajectory of the first 2 meters passed by the digital person, and the height information may be height information relative to the reference horizon in the virtual environment.
  • the historical movement trajectory of the first 2 meters passed by the digital person may be the historical movement trajectory of the previous 2 meters relative to the current position of the digital person.
  • the environmental information may also include voxelized information of all objects within 2 meters around the digital person. It can be understood that 2 meters is used as an example for schematic description here, and in other embodiments, the specific numerical value is not limited.
  • the environmental information may also include trajectory information of dynamic objects around the digital person, such as other digital people, and contact information between the digital person and other digital people.
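  • One way to assemble the environmental information listed above into a single model input; the 2-meter ranges follow the example in the text, while array shapes and names are assumed for illustration:

```python
import numpy as np

def build_environment_features(trajectory_heights, voxel_grid,
                               dynamic_trajectories, contact_flags):
    """Flatten and concatenate the four kinds of environmental information."""
    return np.concatenate([
        np.asarray(trajectory_heights, dtype=np.float32).ravel(),  # heights over last 2 m
        np.asarray(voxel_grid, dtype=np.float32).ravel(),          # objects within 2 m
        np.asarray(dynamic_trajectories, dtype=np.float32).ravel(),
        np.asarray(contact_flags, dtype=np.float32).ravel(),
    ])

features = build_environment_features(
    trajectory_heights=np.zeros(10),           # 10 sampled trajectory points
    voxel_grid=np.zeros((8, 8, 8)),            # 8x8x8 occupancy grid
    dynamic_trajectories=np.zeros((2, 5, 2)),  # 2 nearby objects, 5 points each
    contact_flags=np.zeros(4),                 # per-body-part contact indicators
)
print(features.shape)  # (546,)
```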
  • the motion matching module and the motion control module also correspond to post-processing modules respectively.
  • For example, the post-processing module corresponding to the motion matching module can retarget the skeletal motion information in the target animation segment determined by the motion matching module, or by a certain sub-module in the motion matching module, to the skeleton of the digital human, so that the skeleton of the digital human can complete the action corresponding to the target animation segment.
  • For example, the target animation segment can be a standard skeletal animation segment.
  • Similarly, the post-processing module corresponding to the motion control module can retarget the standard skeletal motion information generated by the machine learning model to the skeleton of the digital human, so that the skeleton of the digital human can complete the actions corresponding to the standard skeletal motion information generated by the machine learning model.
  • In addition, the post-processing module can also use a foot inverse kinematics (Foot IK) algorithm to process the target animation segments determined by the motion matching module or the standard skeletal motion information generated by the machine learning model, so that during movement the digital human's feet can be planted on the ground, preventing the feet from sliding relative to the ground when the digital human walks.
  • the post-processing module can also optimize the skeletal motion information generated by the machine learning model.
  • For example, the skeletal motion information generated by the machine learning model is located in an absolute world coordinate system, and the post-processing module can convert the skeletal motion information in the absolute world coordinate system into skeletal motion information in a coordinate system relative to the digital human, so that the retargeting described above can be better completed based on the skeletal motion information in the digital-human-relative coordinate system.
  • In addition, the post-processing module can determine whether the skeletal motion information generated by the machine learning model involves a situation where the joint rotation direction may be wrong. If so, the post-processing module can constrain the joint rotation direction in advance, thereby adding a priori constraints to different joints. For example, the knee joint mainly rotates in the front-back direction when walking and rarely rotates in other directions, so a constraint that its rotation direction is the front-back direction can be added to the knee joint.
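  • Two of these post-processing steps sketched under stated assumptions: a yaw-only world-to-character coordinate conversion, and a hinge-like a-priori constraint for the knee (axis conventions and the clamp range are invented):

```python
import numpy as np

def world_to_character_local(world_pos, char_origin, char_yaw):
    """Express a world-space position in the digital human's root frame."""
    c, s = np.cos(-char_yaw), np.sin(-char_yaw)
    rot = np.array([[c, -s], [s, c]])
    xy = rot @ (np.asarray(world_pos[:2]) - np.asarray(char_origin[:2]))
    return np.array([xy[0], xy[1], world_pos[2]])  # height unchanged

def constrain_knee(euler_xyz):
    """Keep only front-back (pitch) rotation so the knee acts as a hinge."""
    pitch = np.clip(euler_xyz[0], 0.0, 2.5)  # radians
    return np.array([pitch, 0.0, 0.0])

print(world_to_character_local([3.0, 4.0, 1.0], [1.0, 1.0, 0.0], np.pi / 2))
print(constrain_knee(np.array([1.0, 0.3, -0.1])))  # -> [1. 0. 0.]
```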
  • the pre-processing module or the post-processing module can be canceled or replaced with other modules.
  • any other rules or machine learning models can be used to replace the functions of the pre-processing module or the post-processing module.
  • the function of the dynamic state machine can be replaced by defining gait information, or the dynamic state machine can be used to replace the gait information input by the model.
  • the machine learning model is trained based on preset skeletal motion information and multiple differentiated environment information adapted to the preset skeletal motion information.
  • the environmental information can be environmental information in the real world.
  • For example, the skeletal motion information of the digital human sitting on a chair is fixed, but in different scenes the chairs the digital human sits on may differ. Therefore, differentiated environmental information can be constructed by adding different types of chairs, which diversifies the training data of the machine learning model.
  • the generalization of machine learning models in different environments can be greatly improved. This solves the problem of poor generalization caused by traditional state machines and action library-based matching solutions that need to reconstruct corresponding action segments in different scenarios.
  • Although constructing differentiated environmental information incurs a certain cost, the labor cost can be greatly reduced through a good environment-adaptive algorithm, and the cost of constructing environmental information is much lower than the cost of collecting adapted skeletal motion information in different environments.
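  • A sketch of pairing one fixed "sit down" motion with many differentiated chair variants to diversify training data (the chair parameters and value ranges are made up for illustration):

```python
import random

def make_training_samples(sit_motion, n_variants=100, seed=0):
    """Pair the same skeletal motion with many environment variants."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_variants):
        chair = {
            "seat_height": rng.uniform(0.35, 0.55),  # meters
            "seat_depth": rng.uniform(0.35, 0.50),
            "has_armrests": rng.random() < 0.5,
        }
        samples.append({"motion": sit_motion, "environment": chair})
    return samples

samples = make_training_samples(sit_motion="sit_down_clip_001")
print(len(samples), samples[0]["environment"])
```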
  • the algorithm used by the motion control module is a generative algorithm.
  • the machine learning model corresponding to the motion control module can generate skeletal motion information in real time.
  • That is, the machine learning model can directly output skeletal motion information instead of extracting an animation segment from an existing animation library.
  • In addition, the training data input to the machine learning model includes skeletal motion information; some training data may be free of frame skipping or model clipping (interpenetration), while other training data may contain frame skipping or clipping. However, during the training process, the machine learning model can fit a large amount of training data and learn automatically.
  • The parameters of the machine learning model will be affected both by good training data (for example, training data without frame skipping or clipping) and by bad training data (for example, training data with frame skipping or clipping). Therefore, when the parameters of the machine learning model become stable, the machine learning model can automatically smooth unrefined skeletal motion information, that is, it can automatically refine rough skeletal motion information, which saves the huge cost of refining every animation segment in the action library.
  • In addition, the embodiment of the present disclosure can automatically match the next adapted animation segment from an action library (such as a database). This process is learned automatically and does not introduce additional labor costs. Since the motion matching algorithm extracts skeletal motion information from existing animation segments, when an animation segment is found not to meet the standard, that animation segment, or part of it, can be corrected in the action library.
  • For example, suppose a certain task includes 10 actions. In the testing stage, the motion control module can drive the digital human to complete the 10 actions while monitoring whether the effect of each action performed by the digital human reaches the standard. If it is found that the digital human does not meet the standard when performing the 3rd and 4th actions, then in the release stage, the motion matching module can be used to drive the digital human to complete the 3rd and 4th actions, while the motion control module drives the digital human to complete the other actions.
  • FIG. 5 is a schematic structural diagram of a digital human driving device provided by an embodiment of the present disclosure.
  • the digital human driving device provided by the embodiment of the present disclosure can execute the processing flow provided by the digital human driving method embodiment.
  • the digital human driving device 50 includes:
  • an acquisition module 51, used to obtain control instructions for driving the digital human;
  • the first determination module 52 is configured to determine the target module that executes the control instruction from the motion matching module and the motion control module according to the control instruction;
  • the second determination module 53 is configured to, if the target module is the motion matching module, determine a target animation segment that matches the control instruction from a plurality of preset animation segments according to the control instruction, and use the skeletal motion information in the target animation segment as the skeletal motion information that drives the digital human;
  • a generation module 54, configured to, if the target module is the motion control module, input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, and generate the skeletal motion information used to drive the digital human through the machine learning model;
  • the driving module 55 is used to drive the movement of the digital human according to the skeletal movement information.
  • the acquisition module 51 is also configured to acquire at least one control signal for driving the digital human before acquiring the control instruction for driving the digital human;
  • the digital human driving device 50 also includes: an analysis module 56 and a sorting module 57 , wherein the parsing module 56 is used to parse each control signal into at least one control instruction, and the sorting module 57 is used to sort at least one control instruction corresponding to the at least one control signal to obtain the sorting result;
  • When obtaining the control instruction for driving the digital human, the acquisition module 51 is specifically configured to: obtain the currently first unexecuted control instruction from the sorting result.
  • Optionally, when the second determination module 53 determines a target animation segment that matches the control instruction from a plurality of preset animation segments according to the control instruction, it is specifically configured to:
  • determine, according to at least one historical animation segment that drives the movement of the digital human and the control instruction, a target animation segment that matches the control instruction and connects with the at least one historical animation segment from the plurality of preset animation segments;
  • or determine, according to the historical movement trajectory of the digital human, at least one historical animation segment that drives the movement of the digital human, and the control instruction, a target animation segment that matches the control instruction and connects with the at least one historical animation segment from the plurality of preset animation segments.
  • Optionally, when the generation module 54 inputs the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model and generates the skeletal motion information for driving the digital human through the machine learning model, it is specifically configured to:
  • input the control instruction, the environmental information around the digital human, and the historical motion skeleton information and historical motion trajectory of the digital human into the pre-trained machine learning model, and generate, through the machine learning model, the skeletal motion information used to drive the digital human at the next moment.
  • the machine learning model is trained based on preset skeletal motion information and multiple differentiated environment information adapted to the preset skeletal motion information.
  • Optionally, the environmental information around the digital human includes at least one of the following:
  • the height information of each trajectory point on a preset length of the historical movement trajectory that the digital human has passed; the voxelization information of virtual objects within a preset range around the digital human; the trajectory information of dynamic objects around the digital human; and the contact information between dynamic objects around the digital human and the digital human.
  • the historical motion skeleton information of the digital person includes at least one of the following:
  • the position information, displacement information and rotation information of each skeletal point of the digital human at each trajectory point in the historical movement trajectory; and the status information of the digital human at each trajectory point in the historical movement trajectory.
  • the digital human driving device of the embodiment shown in Figure 5 can be used to execute the technical solution of the above method embodiment. Its implementation principles and technical effects are similar and will not be described again here.
  • FIG. 6 is a schematic structural diagram of an electronic device embodiment provided by an embodiment of the present disclosure. As shown in FIG. 6 , the electronic device includes a memory 61 and a processor 62 .
  • the memory 61 is used to store programs. In addition to the above-mentioned programs, the memory 61 may also be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, etc.
  • Memory 61 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • the processor 62 is coupled to the memory 61 and executes the program stored in the memory 61 for:
  • obtain control instructions for driving the digital human;
  • according to the control instruction, determine the target module that executes the control instruction from the motion matching module and the motion control module;
  • if the target module is the motion matching module, determine a target animation segment matching the control instruction from a plurality of preset animation segments, and use the skeletal motion information in the target animation segment as the skeletal motion information that drives the digital human;
  • if the target module is the motion control module, input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, and generate the skeletal motion information used to drive the digital human through the machine learning model;
  • drive the digital human to move according to the skeletal motion information.
  • the electronic device may also include: a communication component 63 , a power supply component 64 , an audio component 65 , a display 66 and other components. Only some components are schematically shown in FIG. 6 , which does not mean that the electronic device only includes the components shown in FIG. 6 .
  • Communication component 63 is configured to facilitate wired or wireless communication between the electronic device and other devices.
  • Electronic devices can access wireless networks based on communication standards, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 63 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 63 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the power supply component 64 provides power to various components of the electronic device.
  • Power supply components 64 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic devices.
  • Audio component 65 is configured to output and/or input audio signals.
  • audio component 65 includes a microphone (MIC) configured to receive external audio signals when the electronic device is in operating modes, such as call mode, recording mode, and voice recognition mode. The received audio signal may be further stored in memory 61 or sent via communication component 63 .
  • audio component 65 also includes a speaker for outputting audio signals.
  • Display 66 includes a screen, which may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action.
  • embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the digital human driving method described in the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure relates to a digital human driving method, apparatus, device and storage medium. In the present disclosure, a target module for executing a control instruction for driving a digital human is determined from a motion matching module and a motion control module according to the control instruction. The motion matching module can determine, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips. The motion control module can input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generate, through the machine learning model, the skeletal motion information for driving the digital human. Therefore, this embodiment can switch freely between the two ways of determining skeletal motion information, so that control instructions in different scenarios can generate the skeletal motion information for driving the digital human in different ways, without constructing a state transition graph, thereby saving labor costs.

Description

Digital human driving method, apparatus, device and storage medium
This application claims priority to the Chinese patent application No. 202210917824.X, filed with the Chinese Patent Office on August 1, 2022 and entitled "Digital Human Driving Method, Apparatus, Device and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of information technology, and in particular to a digital human driving method, apparatus, device and storage medium.
Background
With the continuous development of technology, in the process of constructing a virtual world, for example, the metaverse or virtual anchors, how to drive a digital human to move has become a key technology. A digital human can be understood as a character in the virtual world. For example, a digital human needs to move freely in the virtual world and interact with the surrounding environment.
However, the inventors of the present application found that the prior art drives the digital human to move through a state transition graph. For example, each node in the state transition graph is an animation clip, and each edge in the state transition graph is a state transition condition. If the state transition condition on an edge holds, the animation clip used to drive the digital human changes from one animation clip connected to the edge to the other animation clip connected to the edge, so that the animation clip used to drive the digital human transfers among different animation clips. However, this approach requires manually pre-constructing different state transition conditions and corresponding animation clips, resulting in high labor costs.
Summary
In order to solve the above technical problems or at least partially solve the above technical problems, the present disclosure provides a digital human driving method, apparatus, device and storage medium. This embodiment can switch freely between two ways of determining skeletal motion information, so that control instructions in different scenarios can generate the skeletal motion information for driving the digital human in different ways. The digital human can be driven without constructing a state transition graph or pre-constructing different state transition conditions and corresponding animation clips, thereby saving labor costs.
In a first aspect, an embodiment of the present disclosure provides a digital human driving method, including:
obtaining a control instruction for driving a digital human;
determining, according to the control instruction, a target module for executing the control instruction from a motion matching module and a motion control module;
if the target module is the motion matching module, determining, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips, and using the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human;
if the target module is the motion control module, inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generating, through the machine learning model, the skeletal motion information for driving the digital human;
driving the digital human to move according to the skeletal motion information.
In a second aspect, an embodiment of the present disclosure provides a digital human driving apparatus, including:
an obtaining module, configured to obtain a control instruction for driving a digital human;
a first determining module, configured to determine, according to the control instruction, a target module for executing the control instruction from a motion matching module and a motion control module;
a second determining module, configured to, if the target module is the motion matching module, determine, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips, and use the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human;
a generating module, configured to, if the target module is the motion control module, input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generate, through the machine learning model, the skeletal motion information for driving the digital human;
a driving module, configured to drive the digital human to move according to the skeletal motion information.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method according to the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the method according to the first aspect.
According to the digital human driving method, apparatus, device and storage medium provided by the embodiments of the present disclosure, the target module that executes a control instruction for driving a digital human is determined from the motion matching module and the motion control module according to the control instruction. When the control instructions are different, the selected target modules may be different, so flexible switching between the motion matching module and the motion control module can be achieved. The motion matching module can determine, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips, and use the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human. The motion control module can input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generate, through the machine learning model, the skeletal motion information for driving the digital human. Since the motion matching module and the motion control module determine the skeletal motion information in different ways, this embodiment can switch freely between the two ways of determining skeletal motion information, so that control instructions in different scenarios can generate the skeletal motion information for driving the digital human in different ways. The digital human can be driven without constructing a state transition graph or pre-constructing different state transition conditions and corresponding animation clips, thereby saving labor costs.
Brief Description of the Drawings
The accompanying drawings herein are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
In order to describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can also obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a digital human driving method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of a digital human driving method provided by another embodiment of the present disclosure;
FIG. 4 is a flowchart of a digital human driving method provided by another embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a digital human driving apparatus provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device embodiment provided by an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure can be understood more clearly, the solutions of the present disclosure are further described below. It should be noted that, in the case of no conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.
Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in ways other than those described herein; obviously, the embodiments in the specification are only some, rather than all, of the embodiments of the present disclosure.
Generally, a digital human can be driven to move through a state transition graph. For example, each node in the state transition graph is an animation clip, and each edge in the state transition graph is a state transition condition. If the state transition condition on an edge holds, the animation clip used to drive the digital human changes from one animation clip connected to the edge to the other animation clip connected to the edge, so that the animation clip used to drive the digital human transfers among different animation clips. However, this approach requires manually pre-constructing different state transition conditions and corresponding animation clips, resulting in high labor costs. To address this problem, embodiments of the present disclosure provide a digital human driving method, which is introduced below with reference to specific embodiments.
FIG. 1 is a flowchart of a digital human driving method provided by an embodiment of the present disclosure. The method may be executed by a digital human driving apparatus, which may be implemented in software and/or hardware and may be configured in an electronic device, for example a server or a terminal, where the terminal specifically includes a mobile phone, a computer, a tablet computer, or the like. In addition, the digital human driving method described in this embodiment is applicable to the application scenario shown in FIG. 2. As shown in FIG. 2, the application scenario includes a terminal 21 and a server 22, where the server 22 may drive the digital human using the method described in the embodiments of the present disclosure and send a video file or video stream of the digital human performing actions to the terminal 21, so that the terminal 21 can play the picture of the digital human performing actions. Alternatively, the terminal 21 may drive the digital human using the method described in the embodiments of the present disclosure and play the picture of the digital human performing actions. The method is described in detail below with reference to FIG. 2. As shown in FIG. 1, the specific steps of the method are as follows:
S101. Obtain a control instruction for driving a digital human.
Taking the server 22 driving the digital human as an example, the server 22 may obtain a control instruction for driving the digital human. The control instruction may come from the terminal 21, for example, a control instruction issued by a user of the terminal 21. Alternatively, the control instruction may be generated by the server 22.
S102. Determine, according to the control instruction, a target module for executing the control instruction from a motion matching module and a motion control module.
For example, the server 22 may include a motion matching module and a motion control module, each of which may be implemented in software and/or hardware. The motion matching module and the motion control module can each determine the skeletal motion information for driving the digital human, but they differ in principle and in the specific process of determining the skeletal motion information. For example, the motion matching module may select, from a plurality of existing animation clips, the animation clip that best matches the control instruction as the skeletal motion information for driving the digital human, while the motion control module may directly generate the skeletal motion information for driving the digital human through a pre-trained machine learning model. Therefore, when the server 22 obtains a control instruction, it needs to determine, from the motion matching module and the motion control module, the target module that executes the control instruction. That is, the server 22 needs to determine one of the motion matching module and the motion control module as the target module, and the target module determines the skeletal motion information for driving the digital human. In this embodiment, the server 22 may pre-store a plurality of preset control instructions and the identifier of the module that executes each preset control instruction; that is, the server 22 may pre-store a correspondence between preset control instructions and module identifiers. When the server 22 obtains a control instruction, it may look up, from the correspondence according to the control instruction, the preset control instruction that best matches the control instruction, and use the module corresponding to the module identifier of that preset control instruction as the module that executes the control instruction. That is, this embodiment can determine in advance which control instructions are executed by which of the motion matching module and the motion control module. In this embodiment, the motion matching module processes control instructions using a motion matching algorithm, and the motion control module processes control instructions using a motion control algorithm. The motion matching algorithm may specifically be an algorithm that determines, from a plurality of preset animation clips, a target animation clip matching the control instruction. The motion control algorithm may be the algorithm adopted by the machine learning model, which can generate the skeletal motion information of the digital human at the next moment or the next frame according to the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human.
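To make this preset-correspondence lookup concrete, the following Python sketch routes a control instruction to a module identifier by fuzzy-matching it against a pre-stored correspondence. The instruction strings, module names and the use of a string-similarity score are illustrative assumptions, not the disclosure's prescribed implementation.

```python
# Minimal sketch of dispatching a control instruction to a module based on
# a preset instruction-to-module correspondence. All names are illustrative.
from difflib import SequenceMatcher

# Preset correspondence between instruction templates and module identifiers.
PRESET_INSTRUCTIONS = {
    "walk to the chair": "motion_matching",
    "climb the stairs": "motion_control",
    "sit on the chair": "motion_control",
    "dance in place": "motion_matching",
}

def select_target_module(instruction: str) -> str:
    """Return the module id of the preset instruction that best matches."""
    best_match = max(
        PRESET_INSTRUCTIONS,
        key=lambda preset: SequenceMatcher(None, instruction, preset).ratio(),
    )
    return PRESET_INSTRUCTIONS[best_match]

print(select_target_module("walk to the front chair"))  # -> motion_matching
```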
S103. If the target module is the motion matching module, determine, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips, and use the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human.
For example, if the control instruction is "walk to the chair ahead", the server 22 may use the motion matching module as the target module for executing the control instruction; that is, the server 22 may hand the control instruction to the motion matching module for execution. When executing the control instruction, the motion matching module may determine, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips stored in a database, and use the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human.
S104. If the target module is the motion control module, input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generate, through the machine learning model, the skeletal motion information for driving the digital human.
Optionally, the historical motion skeleton information of the digital human includes at least one of the following: the position information, displacement information and rotation information of each skeletal point of the digital human at each trajectory point in the historical motion trajectory; and the status information of the digital human at each trajectory point in the historical motion trajectory.
For example, if the control instruction is "sit on the chair", the server 22 may use the motion control module as the target module for executing the control instruction; that is, the server 22 may hand the control instruction to the motion control module for execution. When executing the control instruction, the motion control module may input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, so that the machine learning model can generate the skeletal motion information of the digital human at the next moment or the next frame according to the input information. The digital human includes a plurality of skeletal points, and the historical motion trajectory includes a plurality of historical trajectory points. The historical motion skeleton information of the digital human may be the skeletal motion information of the digital human while moving along the historical motion trajectory. For example, the historical motion skeleton information of the digital human includes the skeletal pose information of the digital human at each historical trajectory point and the status information of the digital human at each historical trajectory point, where the skeletal pose information at each historical trajectory point includes the position information of each skeletal point of the digital human at each historical trajectory point or each historical moment, and the displacement information and rotation information of each skeletal point between two adjacent historical trajectory points or two adjacent historical moments. The historical trajectory points and the historical moments may or may not correspond one to one. The status information of the digital human at each historical trajectory point includes states such as walking, running, squatting and standing.
In addition, the skeletal motion information of the digital human at the next moment or the next frame includes the skeletal pose information of the digital human at the next moment or the next frame. For example, the skeletal pose information at the next moment or the next frame includes the position information of each skeletal point of the digital human at the next moment or the next frame, and the displacement information and rotation information of each skeletal point at the next moment relative to the current moment, or at the next frame relative to the current frame. It can be understood that one of the plurality of skeletal points of the digital human is a root node, or a root node can be determined from the plurality of skeletal points, and the projection point of the root node on the ground is recorded as a trajectory point.
S105. Drive the digital human to move according to the skeletal motion information.
For example, when the server 22 determines the skeletal motion information for driving the digital human, it may drive the digital human to move by way of retargeting, for example, binding the rotation information and displacement information of each skeletal point included in the skeletal motion information onto the skeleton of the digital human, so that the skeleton of the digital human can perform motions similar to those of the skeleton in the skeletal motion information.
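The retargeting step can be pictured with a minimal sketch like the one below, which binds per-joint rotation and displacement from a motion frame onto a skeleton. The joint naming and data layout are assumptions made purely for illustration.

```python
# Minimal retargeting sketch: bind per-joint rotation and displacement from
# generated skeletal motion information onto the digital human's skeleton.
from dataclasses import dataclass, field

@dataclass
class Joint:
    name: str
    rotation: tuple = (0.0, 0.0, 0.0)   # Euler angles, radians (assumed)
    position: tuple = (0.0, 0.0, 0.0)

@dataclass
class Skeleton:
    joints: dict = field(default_factory=dict)

def retarget(skeleton: Skeleton, motion_frame: dict) -> None:
    """Copy rotation/displacement of each skeletal point onto matching joints."""
    for name, (rotation, displacement) in motion_frame.items():
        joint = skeleton.joints.get(name)
        if joint is None:
            continue  # source bone has no counterpart on this skeleton
        joint.rotation = rotation
        joint.position = tuple(p + d for p, d in zip(joint.position, displacement))

avatar = Skeleton({"knee_l": Joint("knee_l")})
retarget(avatar, {"knee_l": ((0.4, 0.0, 0.0), (0.0, 0.0, 0.1))})
print(avatar.joints["knee_l"])
```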
In this embodiment of the present disclosure, the target module that executes a control instruction for driving a digital human is determined from the motion matching module and the motion control module according to the control instruction. When the control instructions are different, the selected target modules may be different, so flexible switching between the motion matching module and the motion control module can be achieved. The motion matching module can determine, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips, and use the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human. The motion control module can input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generate, through the machine learning model, the skeletal motion information for driving the digital human. Since the motion matching module and the motion control module determine the skeletal motion information in different ways, this embodiment can switch freely between the two ways of determining skeletal motion information, so that control instructions in different scenarios can generate the skeletal motion information for driving the digital human in different ways. The digital human can be driven without constructing a state transition graph or pre-constructing different state transition conditions and corresponding animation clips, thereby saving labor costs.
FIG. 3 is a flowchart of a digital human driving method provided by another embodiment of the present disclosure. In this embodiment, the specific steps of the method are as follows:
S301. Obtain at least one control signal for driving the digital human.
As shown in FIG. 4, the server 22 includes an instruction parsing module, a dynamic state machine, a pre-processing module, a motion matching module, a motion control module and a post-processing module. Through these modules, the server 22 can implement the driving scheme for the digital human, by which all-terrain locomotion animations, scene interaction animations, long-sequence action animations and the like corresponding to various instructions can be generated. For example, the instruction parsing module may receive control signals for driving the digital human such as brain waves, audio signals, visual signals, speech signals, text signals and path planning signals as shown in FIG. 4. These control signals may be generated by the terminal 21 and then sent to the server 22, or may be generated by the server 22.
S302. Parse each control signal into at least one control instruction.
For example, the brain waves may be signals sensed by an electroencephalogram (EEG) sensor. The EEG sensor may be disposed in a wearable device, which may be the terminal 21 and worn on the head of a real person. When the brain of the real person thinks of different control instructions, the EEG sensor senses different signals. For example, when the brain of the real person thinks "walk forward", the signal sensed by the EEG sensor is 0; when the brain of the real person thinks "walk backward", the signal sensed by the EEG sensor is 1. Therefore, when the brain wave received by the instruction parsing module represents signal 0, the control instruction into which the instruction parsing module parses the brain wave is "walk forward"; when the brain wave received by the instruction parsing module represents signal 1, the control instruction into which the instruction parsing module parses the brain wave is "walk backward". It can be understood that this is only a schematic illustration; in different scenarios, the 0 and 1 signals sensed by the EEG sensor represent different meanings. For example, in a turning scenario of the digital human, signal 0 sensed by the EEG sensor may be parsed by the instruction parsing module as "turn right", and signal 1 may be parsed as "turn left".
The audio signal shown in FIG. 4 may be a piece of music or rhythmic audio. The instruction parsing module may parse out, from the audio signal, control instructions for controlling the behavior of the digital human, for example, parsing out control instructions that control the amplitude of the digital human's actions according to the volume of the music, and parsing out control instructions that control the rhythm of the digital human's footsteps according to the rhythm of the audio signal. The visual signal may be a video shot in the real world; the instruction parsing module may parse out the character actions in the visual signal and convert the character actions into corresponding skeletal motion information. Alternatively, the visual signal may be a virtual visual signal, that is, a visual signal simulated in a virtual environment, such as a 16-line or 64-line vision system. The speech signal may be a voice issued by a user of the terminal 21 to control the digital human; the instruction parsing module may convert the voice into text information through automatic speech recognition (ASR) technology and further parse the text information into at least one control instruction. In addition, the text signal shown in FIG. 4 may be text information; the instruction parsing module may parse the text information and decompose it into consecutive, independent control instructions. For example, if the text information is "go to the chair ahead and sit down", it may be decomposed by the instruction parsing module into two control instructions, one being "walk to the chair ahead" and the other being "sit on the chair". In addition, the path planning signal shown in FIG. 4 may include a destination; the instruction parsing module may automatically plan a path according to the destination and select the optimal path as the obstacle-avoidance path of the digital human. Further, the instruction parsing module may parse the optimal path into multiple control instructions, each of which may include the position information of one trajectory point on the optimal path, so as to control the digital human to move along the optimal path through the multiple control instructions. It can be understood that, in other embodiments, the control instructions issued by the instruction parsing module may be replaced by control instructions issued by other control modules, or the instruction parsing module may be replaced by any module capable of issuing control instructions, or the control instructions received by the dynamic state machine may be any manually input control instructions. Moreover, the parsing manner adopted by the instruction parsing module is not limited to those described above; for example, the instruction parsing module may also parse the control signals it receives through a machine learning model, so as to directly parse the received control signals into control instructions, for example, directly parsing a received text signal or speech signal into corresponding control instructions.
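As an illustration of decomposing a text signal into consecutive, independent control instructions, the following sketch splits on connective words. The rule-based split is an assumption made for clarity; as noted above, the disclosure equally allows a learned parser.

```python
# Illustrative sketch of splitting a text signal into consecutive,
# independent control instructions on connective words (assumed separators).
import re

SEPARATORS = r"\s*(?:and then|then|;)\s*"

def parse_text_signal(text: str) -> list:
    parts = re.split(SEPARATORS, text)
    return [p.strip() for p in parts if p.strip()]

print(parse_text_signal("walk to the chair and then sit on the chair"))
# -> ['walk to the chair', 'sit on the chair']
```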
S303. Sort the at least one control instruction respectively corresponding to the at least one control signal to obtain a sorting result.
As shown in FIG. 4, within a period of time, the instruction parsing module may receive at least one control signal among brain waves, audio signals, visual signals, speech signals, text signals and path planning signals, and may parse each control signal into at least one control instruction. Therefore, within a period of time, the instruction parsing module may parse out multiple control instructions and further deliver them to the dynamic state machine. At this time, the dynamic state machine may sort the multiple control instructions, for example, by order of execution, to obtain a sorting result. For example, within a period of time, the instruction parsing module parses out three control instructions, denoted as control instruction A, control instruction B and control instruction C. The sorting result obtained after the dynamic state machine sorts the three control instructions is control instruction B, control instruction A, control instruction C.
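A minimal sketch of this ordering step, assuming each parsed instruction carries an execution-order key (the keys below are illustrative):

```python
# Sort parsed instructions by intended execution order before dispatch.
instructions = [
    ("A", 2),  # (instruction id, execution order) - illustrative values
    ("B", 1),
    ("C", 3),
]
ordered = [name for name, order in sorted(instructions, key=lambda x: x[1])]
print(ordered)  # -> ['B', 'A', 'C'], matching the example above
```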
S304. Obtain the current first unexecuted control instruction from the sorting result.
For example, after obtaining the sorting result, the dynamic state machine obtains the current first unexecuted control instruction from the sorting result; for example, control instruction B is the current first unexecuted control instruction.
S305. Determine, according to the control instruction, the target module that executes the control instruction from the motion matching module and the motion control module.
For example, the dynamic state machine determines, according to control instruction B, the target module that executes control instruction B from the motion matching module and the motion control module. In this embodiment, the motion matching module may include multiple submodules. For example, the motion matching module includes three submodules, denoted as a locomotion submodule, an interaction submodule and an action submodule, where the locomotion submodule is used to process control instructions concerning the locomotion of the digital human, the interaction submodule is used to process control instructions concerning the interaction of the digital human, and the action submodule is used to process control instructions concerning the actions of the digital human. The locomotion of the digital human includes displacement changes produced by the digital human walking, going up and down stairs, climbing mountains and the like. The interaction of the digital human includes static interaction between the digital human and static objects in the virtual environment (such as sofas and chairs), and dynamic interaction between the digital human and dynamic objects in the virtual environment (such as other digital humans). The actions of the digital human include in-place pose changes such as dancing and in-place martial arts. It can be understood that, in other embodiments, processing the control instructions concerning the locomotion, interaction or actions of the digital human is not limited to a single submodule; for example, taking the locomotion of the digital human as an example, multiple locomotion submodules may jointly process control instructions concerning the locomotion of the digital human, or each of the multiple locomotion submodules may independently process such control instructions.
Similarly, the motion control module may also include three submodules, for example, a locomotion submodule, an interaction submodule and an action submodule, whose functions are as described above and will not be repeated here. However, in this embodiment, since the motion matching module and the motion control module are applicable to different scenarios and/or control instructions, for a given submodule, for example the locomotion submodule, the locomotion submodule in the motion matching module and the locomotion submodule in the motion control module are applicable to different scenarios and/or control instructions. Therefore, when the dynamic state machine determines, according to control instruction B, the target module that executes control instruction B from the motion matching module and the motion control module, it may specifically determine one submodule as the target module from the three submodules included in the motion matching module and the three submodules included in the motion control module.
For example, if control instruction B is a control instruction concerning the locomotion of the digital human and the locomotion is on flat ground, the dynamic state machine may select the locomotion submodule in the motion matching module as the target module. If control instruction B is a control instruction concerning the locomotion of the digital human and the locomotion is produced in scenarios such as going up and down stairs or climbing mountains, the dynamic state machine may select the locomotion submodule in the motion control module as the target module.
For example, after the target module finishes processing control instruction B, the target module may send a completion signal to the dynamic state machine. At this time, the current first unexecuted control instruction in the above sorting result becomes control instruction A. Further, the dynamic state machine may determine a target module for control instruction A; the determination process is similar to that for control instruction B and will not be repeated here. After control instruction A is executed, control instruction C becomes the current first unexecuted control instruction in the sorting result. Further, the dynamic state machine may determine a target module for control instruction C, and the target module processes control instruction C. That is to say, the dynamic state machine may concatenate (for example, sort) the multiple control instructions delivered by the instruction parsing module and, according to the sorting result, distribute them in turn to different submodules in the motion matching module or different submodules in the motion control module for processing.
For example, the current state of the digital human is an idle state (for example, a standing rest state). Suppose the instruction parsing module delivers two control instructions to the dynamic state machine at this time, namely "walk to the chair" and "sit on the chair". The dynamic state machine sorts the two control instructions, with "walk to the chair" first and "sit on the chair" second. Further, the dynamic state machine determines, from the three submodules included in the motion matching module and the three submodules included in the motion control module, the submodule suitable for completing "walk to the chair", for example, the locomotion submodule in the motion matching module. Then, the dynamic state machine distributes "walk to the chair" to the locomotion submodule in the motion matching module; after completion, the locomotion submodule returns a completed signal to the dynamic state machine, at which point the digital human can return to the idle state and wait to be called. Next, the dynamic state machine determines, from the three submodules included in the motion matching module and the three submodules included in the motion control module, the submodule suitable for completing "sit on the chair", for example, the interaction submodule in the motion control module. Further, the dynamic state machine distributes "sit on the chair" to the interaction submodule in the motion control module; after "sit on the chair" is processed by the interaction submodule, the digital human can return to the idle state again. It can be understood that, after the digital human performs the actions of different control instructions, the idle states the digital human returns to may be different; for example, the idle state the digital human returns to after completing the action of a certain control instruction may be the state at the end of the last action in the series of consecutive actions corresponding to that control instruction.
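The dispatch loop just described might look like the following sketch, in which the routing rule and the mocked submodules are assumptions for illustration:

```python
# Sketch of the dispatch loop: the dynamic state machine hands the current
# first unexecuted instruction to a suitable submodule, waits for its
# completion signal, then moves on. Submodule behavior is mocked.
from collections import deque

def choose_submodule(instruction: str) -> str:
    # Assumed routing rule, for illustration only.
    return "matching.locomotion" if "walk" in instruction else "control.interaction"

def execute(submodule: str, instruction: str) -> bool:
    print(f"{submodule} executes: {instruction}")
    return True  # completion signal

queue = deque(["walk to the chair", "sit on the chair"])
while queue:
    instruction = queue.popleft()
    done = execute(choose_submodule(instruction), instruction)
    if done:
        print("digital human returns to idle state")
```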
S306. If the target module is the motion matching module, determine, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips, and use the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human.
For example, as shown in FIG. 4, the motion matching module and the motion control module each correspond to a pre-processing module. Specifically, the pre-processing modules corresponding to the motion matching module and the motion control module may be the same module or different modules; if they are the same module, the pre-processing procedures the module performs for the motion matching module and the motion control module are different. In this embodiment, the main function of the pre-processing module corresponding to the motion matching module is to refine the plurality of preset animation clips stored in the database.
For example, when the target module selected by the dynamic state machine for a control instruction is the motion matching module or a submodule in the motion matching module, the motion matching module or the submodule may determine, according to the control instruction, a target animation clip matching the control instruction from the plurality of preset animation clips stored in the database, and use the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human. Taking the motion matching module executing the control instruction as an example, when the input of the motion matching module is the control instruction, the motion matching module can determine, from the plurality of preset animation clips, the target animation clip matching the control instruction and output the target animation clip.
In a feasible implementation, determining, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips includes: determining, according to at least one historical animation clip that drives the digital human to move and the control instruction, a target animation clip that matches the control instruction and connects with the at least one historical animation clip from the plurality of preset animation clips.
Taking the motion matching module executing the control instruction as an example, the input of the motion matching module includes not only the control instruction but also, for example, the previous n animation clips, which are historical animation clips driving the digital human to move; the number of historical animation clips is n, and n is greater than or equal to 1. That is, the input of the motion matching module may include the previous n animation clips and the control instruction. In this case, the motion matching module not only needs to determine, from the plurality of preset animation clips, the target animation clip matching the control instruction, but also needs the determined target animation clip to have a degree of connection with the previous n animation clips that is greater than or equal to a preset degree of connection; that is, the motion matching module needs to output a target animation clip that can match the control instruction and connect with the previous n animation clips.
In another feasible implementation, determining, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips includes: determining, according to the historical motion trajectory of the digital human, at least one historical animation clip that drives the digital human to move, and the control instruction, a target animation clip that matches the control instruction and connects with the at least one historical animation clip from the plurality of preset animation clips.
Taking the motion matching module executing the control instruction as an example, the input of the motion matching module includes not only the previous n animation clips and the control instruction but also, for example, the historical motion trajectory of the digital human. The historical motion trajectory may be the motion trajectory of the digital human within a certain historical time period, that is, the trajectory line it has walked through. In this case, the motion matching module can output a target animation clip that can match the control instruction and connect with the previous n animation clips.
In this embodiment, the target animation clip determined by the motion matching module may include an initial pose or reference pose of the skeleton and the skeletal motion information based on that initial pose or reference pose. Specifically, this embodiment may use the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human. The skeleton included in the target animation clip may or may not be the skeleton of the digital human; if it is not, it suffices to ensure that the skeletal motion information in the target animation clip can be retargeted onto the skeleton of the digital human.
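A minimal motion-matching sketch consistent with the two implementations above: the selected clip must both match the instruction and connect sufficiently with the motion history. The feature vectors, poses and continuity threshold below are illustrative assumptions, not the disclosure's actual matching criterion.

```python
# Pick, from preset animation clips, the clip that best matches the control
# instruction while keeping sufficient continuity with the history.
import math

def match_clip(clips, instruction_feature, last_clip_end_pose, min_continuity=0.8):
    best, best_cost = None, float("inf")
    for clip in clips:
        # Continuity with the previous clip(s): closer start pose -> higher score.
        continuity = 1.0 / (1.0 + math.dist(clip["start_pose"], last_clip_end_pose))
        if continuity < min_continuity:
            continue  # clip would not blend smoothly with the history
        cost = math.dist(clip["feature"], instruction_feature)
        if cost < best_cost:
            best, best_cost = clip, cost
    return best

clips = [
    {"name": "walk", "feature": [1.0, 0.0], "start_pose": [0.0, 0.0]},
    {"name": "run", "feature": [0.0, 1.0], "start_pose": [0.5, 0.5]},
]
print(match_clip(clips, [0.9, 0.1], last_clip_end_pose=[0.05, 0.0])["name"])  # walk
```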
S307. If the target module is the motion control module, input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generate, through the machine learning model, the skeletal motion information for driving the digital human.
For example, in this embodiment, the main function of the pre-processing module corresponding to the motion control module is to standardize the inputs, that is, the samples, of the machine learning model during training. For example, in the training stage, the input of the machine learning model includes skeletal motion information, and the standardization process may be retargeting the skeletal motion information to a unified standard skeletal pose, for example, the T-pose, thereby improving the accuracy of the trained machine learning model.
For example, when the target module selected by the dynamic state machine for a control instruction is the motion control module or a submodule in the motion control module, the motion control module or the submodule may input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, so that the machine learning model can generate the skeletal motion information of the digital human at the next moment or the next frame according to the input information. Taking the motion control module executing the control instruction as an example, the information the motion control module inputs into the machine learning model includes the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human, where the historical motion trajectory may be the motion trajectory of the digital human within a certain historical time period. For the meaning of the historical motion skeleton information here, refer to the contents described in the above embodiments, which will not be repeated.
Optionally, inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model and generating, through the machine learning model, the skeletal motion information for driving the digital human includes: inputting the control instruction, the environmental information around the digital human, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, and generating, through the machine learning model, the skeletal motion information for driving the digital human at the next moment.
For example, taking the motion control module executing the control instruction as an example, the information the motion control module inputs into the machine learning model is not limited to the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human; it may also include, for example, the environmental information around the digital human. In addition, the output of the machine learning model is not limited to the skeletal motion information of the digital human at the next moment or the next frame; it may also include, for example, the motion trajectory of the digital human within a subsequent short period as predicted by the machine learning model. It can be understood that the output of the machine learning model at the current moment can serve as the input of the machine learning model at the next moment, so that the computation iterates continuously; for example, the motion trajectory output by the machine learning model at the current moment can serve as the historical motion trajectory required as input at the next moment. That is, the output of the machine learning model is real-time. In addition, according to the skeletal motion information of the digital human at the next moment or the next frame output by the machine learning model each time, the digital human can be driven once, so that the digital human can be driven to move in real time while the machine learning model outputs in real time.
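This real-time iteration can be sketched as an autoregressive loop in which each output is fed back as the next input. The placeholder `model` below merely stands in for the pre-trained machine learning model and implements no real motion synthesis.

```python
# Sketch of the real-time loop: each model output (next-frame skeletal motion
# and a short predicted trajectory) is fed back as part of the next input.
def model(instruction, history_pose, history_trajectory):
    # Placeholder: shift the latest pose slightly each frame (assumed behavior).
    next_pose = [p + 0.1 for p in history_pose[-1]]
    predicted_trajectory = history_trajectory[-1:] + [len(history_trajectory)]
    return next_pose, predicted_trajectory

history_pose = [[0.0, 0.0]]
history_trajectory = [0]
for frame in range(3):
    pose, trajectory = model("sit on the chair", history_pose, history_trajectory)
    history_pose.append(pose)               # output becomes next input
    history_trajectory.extend(trajectory[-1:])
    print(frame, pose)
```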
Optionally, the environmental information around the digital human includes at least one of the following: height information of each trajectory point on a preset length of the historical motion trajectory traversed by the digital human; voxelization information of virtual objects within a preset range around the digital human; trajectory information of dynamic objects around the digital human; and contact information between the dynamic objects around the digital human and the digital human.
In this embodiment, the environmental information around the digital human may specifically be information about the virtual environment in which the digital human is located. For example, the environmental information may include the height information of each trajectory point on the previous 2 meters of the historical motion trajectory traversed by the digital human, where the height information may be relative to a reference ground level in the virtual environment. The previous 2 meters of the historical motion trajectory may be the previous 2 meters relative to the current position of the digital human. In addition, the environmental information may also include the voxelization information of all objects within a 2-meter range around the digital human. It can be understood that 2 meters is used here as a schematic example; other embodiments are not limited to specific values. Furthermore, the environmental information may also include the trajectory information of dynamic objects around the digital human, such as other digital humans, and the contact information between the digital human and other digital humans.
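As an illustration of the voxelization item above, the following sketch marks the cells of a local grid around the digital human as occupied or free. The grid resolution, range and point-based object representation are assumptions chosen for brevity.

```python
# Illustrative voxelization of virtual objects within a preset range around
# the digital human (2D grid for simplicity; a real system would use 3D).
def voxelize(points, center, radius=2.0, cells=8):
    size = 2 * radius / cells
    grid = [[0] * cells for _ in range(cells)]
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        if abs(dx) < radius and abs(dy) < radius:
            i, j = int((dx + radius) / size), int((dy + radius) / size)
            grid[i][j] = 1  # occupied voxel
    return grid

chair_points = [(1.2, 0.4), (1.3, 0.5)]  # assumed sampled surface points
grid = voxelize(chair_points, center=(0.0, 0.0))
print(sum(map(sum, grid)), "occupied voxels")
```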
In addition, as shown in FIG. 4, the motion matching module and the motion control module each also correspond to a post-processing module. The post-processing module corresponding to the motion matching module can retarget the skeletal motion information in the target animation clip determined by the motion matching module or a submodule thereof onto the skeleton of the digital human, so that the skeleton of the digital human can complete the action corresponding to the target animation clip. The target animation clip may be a standard skeletal animation clip. The post-processing module corresponding to the motion control module can retarget the standard skeletal motion information generated by the machine learning model onto the skeleton of the digital human, so that the skeleton of the digital human can complete the action corresponding to that standard skeletal motion information.
Moreover, the post-processing module can also process the target animation clip determined by the motion matching module or the standard skeletal motion information generated by the machine learning model using a foot inverse kinematics algorithm (Foot IK), so that when the digital human moves, its feet can be fixed on the ground, preventing the digital human from sliding relative to the ground while walking.
In addition, the post-processing module can also optimize the skeletal motion information generated by the machine learning model. For example, the skeletal motion information generated by the machine learning model is located in an absolute world coordinate system; the post-processing module can convert it into skeletal motion information in a relative digital-human coordinate system, so as to better complete the retargeting described above. As another example, the post-processing module can determine whether the skeletal motion information generated by the machine learning model involves a possibly erroneous joint rotation direction; if so, the post-processing module can pre-constrain the joint rotation directions, thereby adding prior constraint conditions to different joints. For example, when walking, the knee joint can only rotate in the forward-backward direction and rarely rotates in other directions; therefore, a constraint condition restricting the rotation direction to the forward-backward direction can be added to the knee joint.
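These two post-processing optimizations can be sketched as a world-to-character-local coordinate conversion plus a prior constraint that limits the knee to forward-backward rotation. The axis conventions and the Euler-angle layout are assumptions for illustration.

```python
# Sketch of two post-processing steps: express a world-space point in the
# digital human's root-relative frame, and clamp a knee joint's rotation.
import math

def world_to_local(point, root_position, root_yaw):
    """Rotate/translate a world-space point into the character's local frame."""
    dx, dy = point[0] - root_position[0], point[1] - root_position[1]
    cos_y, sin_y = math.cos(-root_yaw), math.sin(-root_yaw)
    return (dx * cos_y - dy * sin_y, dx * sin_y + dy * cos_y)

def constrain_knee(rotation):
    """Keep only the forward-backward (pitch) component; zero out the rest."""
    pitch, _, _ = rotation
    return (pitch, 0.0, 0.0)

print(world_to_local((2.0, 1.0), root_position=(1.0, 1.0), root_yaw=math.pi / 2))
print(constrain_knee((0.6, 0.2, -0.1)))  # -> (0.6, 0.0, 0.0)
```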
It can be understood that, in this embodiment, the pre-processing module or the post-processing module may be removed or replaced by other modules. For example, any other rules or a machine learning model may be used to replace the functions of the pre-processing module or the post-processing module. In addition, in some other embodiments, defining gait information may replace the function of the dynamic state machine, or the dynamic state machine may replace the gait information input to the model.
S308. Drive the digital human to move according to the skeletal motion information.
Specifically, the implementation and principles of S308 are consistent with those of S105 and will not be repeated here.
Optionally, the machine learning model is trained based on preset skeletal motion information and multiple pieces of differentiated environment information adapted to the preset skeletal motion information.
For example, when training the machine learning model corresponding to the motion control module, a large amount of differentiated environment information adapted to the skeletal motion information may be input; the environment information may be environment information from the real world. For example, in the process of training the action of the digital human sitting on a chair, the skeletal motion information of sitting on a chair is fixed, but in different scenes the chairs the digital human sits on may be different. Therefore, differentiated environment information can be constructed by adding different types of chairs, which diversifies the training data of the machine learning model. This improvement in environment adaptability can greatly improve the generalization of the machine learning model in different environments, thereby solving the problem of poor generalization caused by traditional state machines and action-library matching solutions needing to reconstruct corresponding action clips for different scenes. In addition, although constructing differentiated environment information incurs some cost, an excellent environment-adaptive algorithm can greatly reduce labor costs, and the cost of constructing environment information is much lower than the cost of collecting adapted skeletal motion information in different environments.
In addition, this embodiment introduces the motion control module, whose algorithm is a generative algorithm; for example, the machine learning model corresponding to the motion control module can generate skeletal motion information in real time. Compared with traditional state machines and action-library matching algorithms, the machine learning model can directly output skeletal motion information rather than extracting a certain animation clip from existing animation clips. During training, the training data input into the machine learning model includes skeletal motion information; some training data may be free of frame skipping or mesh clipping, while other training data may contain frame skipping or mesh clipping. However, during training, the machine learning model fits a large amount of training data and learns automatically; therefore, in the fitting and learning process, the parameters of the machine learning model are influenced both by good training data (for example, training data without frame skipping or mesh clipping) and by bad training data (for example, training data with frame skipping or mesh clipping). Consequently, after the parameters of the machine learning model become stable, the model can automatically perform rough refinement on unrefined skeletal motion information, or automatically perform fine refinement on skeletal motion information that has already been roughly refined, thereby saving the enormous cost of refining every action clip in the action library.
Furthermore, a traditional state machine needs to manually construct a large number of state transition conditions according to different states, which is costly. By contrast, the embodiments of the present disclosure introduce a motion matching algorithm, which can automatically match the next suitable animation clip from the action library (for example, a database); this process is learned automatically without introducing additional labor costs. Since the motion matching algorithm extracts skeletal motion information from existing animation clips, when an animation clip is found not to meet the standard, it suffices to correct that animation clip, or part of it, in the action library.
In addition, suppose a control instruction requires the digital human to complete multiple actions, for example 10 actions, and the control instruction is executed by the motion control module. In the test stage, the motion control module can drive the digital human to complete the 10 actions while whether the effect of each action meets the standard is monitored. If it is found that the digital human does not meet the standard when performing the 3rd and 4th actions, then in the release stage the motion matching module can drive the digital human to complete the 3rd and 4th actions, while the motion control module drives the digital human to complete the other actions. In this way, the motion matching module and the motion control module can be perfectly combined and flexibly switched, improving the presentation effect of driving the digital human, and the controllability and fine action generation of the motion matching module compensate for the problem that the data generated by the motion control module does not meet the standard in some scenarios.
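A minimal sketch of this test-stage fallback, assuming a per-action quality score obtained from the monitoring step (the scores and threshold below are illustrative):

```python
# Actions whose generated quality fails a check are reassigned from the
# motion control module to the motion matching module for the release stage.
quality = {1: 0.9, 2: 0.8, 3: 0.4, 4: 0.5, 5: 0.95}  # per-action QA scores
THRESHOLD = 0.6

assignment = {
    action: ("motion_matching" if score < THRESHOLD else "motion_control")
    for action, score in quality.items()
}
print(assignment)  # actions 3 and 4 fall back to motion matching
```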
It can be understood that the method described in this embodiment can be applied to numerous scenarios, for example, the metaverse, virtual anchors and the like.
FIG. 5 is a schematic structural diagram of a digital human driving apparatus provided by an embodiment of the present disclosure. The digital human driving apparatus provided by this embodiment can execute the processing flow provided by the digital human driving method embodiments. As shown in FIG. 5, the digital human driving apparatus 50 includes:
an obtaining module 51, configured to obtain a control instruction for driving a digital human;
a first determining module 52, configured to determine, according to the control instruction, a target module for executing the control instruction from a motion matching module and a motion control module;
a second determining module 53, configured to, if the target module is the motion matching module, determine, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips, and use the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human;
a generating module 54, configured to, if the target module is the motion control module, input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generate, through the machine learning model, the skeletal motion information for driving the digital human;
a driving module 55, configured to drive the digital human to move according to the skeletal motion information.
Optionally, the obtaining module 51 is further configured to obtain at least one control signal for driving the digital human before obtaining the control instruction for driving the digital human. The digital human driving apparatus 50 further includes a parsing module 56 and a sorting module 57, where the parsing module 56 is configured to parse each control signal into at least one control instruction, and the sorting module 57 is configured to sort the at least one control instruction respectively corresponding to the at least one control signal to obtain a sorting result. When obtaining the control instruction for driving the digital human, the obtaining module 51 is specifically configured to: obtain the current first unexecuted control instruction from the sorting result.
Optionally, when determining, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips, the first determining module 52 is specifically configured to:
determine, according to at least one historical animation clip that drives the digital human to move and the control instruction, a target animation clip that matches the control instruction and connects with the at least one historical animation clip from the plurality of preset animation clips; or
determine, according to the historical motion trajectory of the digital human, at least one historical animation clip that drives the digital human to move, and the control instruction, a target animation clip that matches the control instruction and connects with the at least one historical animation clip from the plurality of preset animation clips.
Optionally, when inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model and generating, through the machine learning model, the skeletal motion information for driving the digital human, the generating module 54 is specifically configured to:
input the control instruction, the environmental information around the digital human, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, and generate, through the machine learning model, the skeletal motion information for driving the digital human at the next moment.
Optionally, the machine learning model is trained based on preset skeletal motion information and multiple pieces of differentiated environment information adapted to the preset skeletal motion information.
Optionally, the environmental information around the digital human includes at least one of the following:
height information of each trajectory point on a preset length of the historical motion trajectory traversed by the digital human;
voxelization information of virtual objects within a preset range around the digital human;
trajectory information of dynamic objects around the digital human;
contact information between the dynamic objects around the digital human and the digital human.
Optionally, the historical motion skeleton information of the digital human includes at least one of the following:
position information, displacement information and rotation information of each skeletal point of the digital human at each trajectory point in the historical motion trajectory;
status information of the digital human at each trajectory point in the historical motion trajectory.
The digital human driving apparatus of the embodiment shown in FIG. 5 can be used to execute the technical solutions of the above method embodiments; its implementation principles and technical effects are similar and will not be repeated here.
The internal functions and structure of the digital human driving apparatus are described above; the apparatus can be implemented as an electronic device. FIG. 6 is a schematic structural diagram of an electronic device embodiment provided by an embodiment of the present disclosure. As shown in FIG. 6, the electronic device includes a memory 61 and a processor 62.
The memory 61 is used to store programs. In addition to the above programs, the memory 61 may also be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and the like.
The memory 61 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The processor 62 is coupled to the memory 61 and executes the program stored in the memory 61 for:
obtaining a control instruction for driving a digital human;
determining, according to the control instruction, a target module for executing the control instruction from a motion matching module and a motion control module;
if the target module is the motion matching module, determining, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips, and using the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human;
if the target module is the motion control module, inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generating, through the machine learning model, the skeletal motion information for driving the digital human;
driving the digital human to move according to the skeletal motion information.
Further, as shown in FIG. 6, the electronic device may also include other components such as a communication component 63, a power supply component 64, an audio component 65 and a display 66. Only some components are schematically shown in FIG. 6, which does not mean that the electronic device only includes the components shown in FIG. 6.
The communication component 63 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 63 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 63 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
The power supply component 64 provides power to the various components of the electronic device. The power supply component 64 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device.
The audio component 65 is configured to output and/or input audio signals. For example, the audio component 65 includes a microphone (MIC) configured to receive external audio signals when the electronic device is in an operating mode, such as a call mode, a recording mode or a voice recognition mode. The received audio signal may be further stored in the memory 61 or sent via the communication component 63. In some embodiments, the audio component 65 also includes a speaker for outputting audio signals.
The display 66 includes a screen, which may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation.
In addition, embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; the computer program is executed by a processor to implement the digital human driving method described in the above embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article or device including that element.
The above are only specific implementations of the present disclosure, enabling those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure will not be limited to the embodiments described herein, but shall conform to the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

  1. A digital human driving method, wherein the method comprises:
    obtaining a control instruction for driving a digital human;
    determining, according to the control instruction, a target module for executing the control instruction from a motion matching module and a motion control module;
    if the target module is the motion matching module, determining, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips, and using the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human;
    if the target module is the motion control module, inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generating, through the machine learning model, the skeletal motion information for driving the digital human;
    driving the digital human to move according to the skeletal motion information.
  2. The method according to claim 1, wherein before obtaining the control instruction for driving the digital human, the method further comprises:
    obtaining at least one control signal for driving the digital human;
    parsing each control signal into at least one control instruction;
    sorting the at least one control instruction respectively corresponding to the at least one control signal to obtain a sorting result;
    correspondingly, obtaining the control instruction for driving the digital human comprises:
    obtaining the current first unexecuted control instruction from the sorting result.
  3. The method according to claim 1, wherein determining, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips comprises:
    determining, according to at least one historical animation clip that drives the digital human to move and the control instruction, a target animation clip that matches the control instruction and connects with the at least one historical animation clip from the plurality of preset animation clips; or
    determining, according to the historical motion trajectory of the digital human, at least one historical animation clip that drives the digital human to move, and the control instruction, a target animation clip that matches the control instruction and connects with the at least one historical animation clip from the plurality of preset animation clips.
  4. The method according to claim 1, wherein inputting the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model and generating, through the machine learning model, the skeletal motion information for driving the digital human comprises:
    inputting the control instruction, the environmental information around the digital human, the historical motion skeleton information and the historical motion trajectory of the digital human into the pre-trained machine learning model, and generating, through the machine learning model, the skeletal motion information for driving the digital human at the next moment.
  5. The method according to claim 4, wherein the machine learning model is trained based on preset skeletal motion information and multiple pieces of differentiated environment information adapted to the preset skeletal motion information.
  6. The method according to claim 4, wherein the environmental information around the digital human comprises at least one of the following:
    height information of each trajectory point on a preset length of the historical motion trajectory traversed by the digital human;
    voxelization information of virtual objects within a preset range around the digital human;
    trajectory information of dynamic objects around the digital human;
    contact information between the dynamic objects around the digital human and the digital human.
  7. The method according to claim 1, wherein the historical motion skeleton information of the digital human comprises at least one of the following:
    position information, displacement information and rotation information of each skeletal point of the digital human at each trajectory point in the historical motion trajectory;
    status information of the digital human at each trajectory point in the historical motion trajectory.
  8. A digital human driving apparatus, comprising:
    an obtaining module, configured to obtain a control instruction for driving a digital human;
    a first determining module, configured to determine, according to the control instruction, a target module for executing the control instruction from a motion matching module and a motion control module;
    a second determining module, configured to, if the target module is the motion matching module, determine, according to the control instruction, a target animation clip matching the control instruction from a plurality of preset animation clips, and use the skeletal motion information in the target animation clip as the skeletal motion information for driving the digital human;
    a generating module, configured to, if the target module is the motion control module, input the control instruction, the historical motion skeleton information and the historical motion trajectory of the digital human into a pre-trained machine learning model, and generate, through the machine learning model, the skeletal motion information for driving the digital human;
    a driving module, configured to drive the digital human to move according to the skeletal motion information.
  9. An electronic device, comprising:
    a memory;
    a processor; and
    a computer program;
    wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method according to any one of claims 1-7.
  10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-7.
PCT/CN2023/110343 2022-08-01 2023-07-31 Digital human driving method, apparatus, device and storage medium WO2024027661A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210917824.XA CN114998491B (zh) 2022-08-01 Digital human driving method, apparatus, device and storage medium
CN202210917824.X 2022-08-01

Publications (1)

Publication Number Publication Date
WO2024027661A1 true WO2024027661A1 (zh) 2024-02-08

Family

ID=83022540

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110343 WO2024027661A1 (zh) 2022-08-01 2023-07-31 Digital human driving method, apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN114998491B (zh)
WO (1) WO2024027661A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998491B (zh) * 2022-08-01 2022-11-18 阿里巴巴(中国)有限公司 Digital human driving method, apparatus, device and storage medium
CN115331265A (zh) * 2022-10-17 2022-11-11 广州趣丸网络科技有限公司 Training method for a pose detection model and driving method and apparatus for a digital human
CN115779436B (zh) * 2023-02-09 2023-05-05 腾讯科技(深圳)有限公司 Animation switching method, apparatus, device and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423809A (zh) * 2017-07-07 2017-12-01 北京光年无限科技有限公司 Multi-modal interaction method and system for a virtual robot applied to a live video platform
CN110618995A (zh) * 2018-12-25 2019-12-27 北京时光荏苒科技有限公司 Behavior trajectory generation method, apparatus, server and readable medium
CN113570690A (zh) * 2021-08-02 2021-10-29 北京慧夜科技有限公司 Interactive animation generation model training, and interactive animation generation method and system
US20220198732A1 (en) * 2020-01-19 2022-06-23 Tencent Technology (Shenzhen) Company Limited Animation implementation method and apparatus, electronic device, and storage medium
CN114998491A (zh) * 2022-08-01 2022-09-02 阿里巴巴(中国)有限公司 Digital human driving method, apparatus, device and storage medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI448111B (zh) * 2008-03-18 2014-08-01 Icm Inc Automobile detection and control integration device and method thereof
CN102810210A (zh) * 2012-05-30 2012-12-05 天津大学 Three-dimensional skeletal animation control system and method implemented with Flash script
CN106447748B (zh) * 2016-09-14 2019-09-24 厦门黑镜科技有限公司 Method and apparatus for generating animation data
CN110263720B (zh) * 2019-06-21 2022-12-27 中国民航大学 Action recognition method based on depth images and skeleton information
JP6793235B1 (ja) * 2019-10-02 2020-12-02 株式会社Cygames Information processing system, information processing method, and information processing program
CN110781820B (zh) * 2019-10-25 2022-08-05 网易(杭州)网络有限公司 Game character action generation method, apparatus, computer device and storage medium
CN111292401B (zh) * 2020-01-15 2022-05-03 腾讯科技(深圳)有限公司 Animation processing method, apparatus, computer storage medium and electronic device
CN111968204B (zh) * 2020-07-28 2024-03-22 完美世界(北京)软件科技发展有限公司 Motion display method and apparatus for a skeletal model
CN113538645A (zh) * 2021-07-19 2021-10-22 北京顺天立安科技有限公司 Method and apparatus for matching body movements with language factors for a virtual avatar
CN113706666A (zh) * 2021-08-11 2021-11-26 网易(杭州)网络有限公司 Animation data processing method, non-volatile storage medium and electronic apparatus
CN114063624A (zh) * 2021-10-22 2022-02-18 中国船舶重工集团公司第七一九研究所 Multi-mode planning motion controller for a crawling-swimming unmanned underwater vehicle and control method thereof
CN113822972B (zh) * 2021-11-19 2022-05-27 阿里巴巴达摩院(杭州)科技有限公司 Video-based processing method, device and readable medium
CN114155322A (zh) * 2021-12-01 2022-03-08 北京字跳网络技术有限公司 Display control method and apparatus for scene pictures, and computer storage medium
CN114596391A (zh) * 2022-01-19 2022-06-07 阿里巴巴(中国)有限公司 Virtual character control method, apparatus, device and storage medium
CN114820888A (zh) * 2022-04-24 2022-07-29 广州虎牙科技有限公司 Animation generation method, system and computer device

Also Published As

Publication number Publication date
CN114998491B (zh) 2022-11-18
CN114998491A (zh) 2022-09-02

Similar Documents

Publication Publication Date Title
WO2024027661A1 (zh) Digital human driving method, apparatus, device and storage medium
JP6902683B2 (ja) Virtual robot interaction method, apparatus, storage medium and electronic device
JP6803348B2 (ja) Body information analysis apparatus combined with augmented reality and eyebrow shape preview method thereof
CN106201173B (zh) Projection-based interaction control method and system for user interaction icons
CN111556278A (zh) Video processing method, video display method, apparatus and storage medium
CN103092432A (zh) Trigger control method and system for human-computer interaction operation instructions, and laser emission apparatus
CN110947181A (zh) Game picture display method, apparatus, storage medium and electronic device
CN114372356B (zh) Digital-twin-based artificial augmentation method, apparatus and medium
US20230186583A1 (en) Method and device for processing virtual digital human, and model training method and device
CN102222342A (zh) Human body motion tracking and recognition method
CN103778549A (zh) Mobile application promotion system and method
CN104281265A (zh) Application control method, apparatus and electronic device
CN109213304A (zh) Gesture interaction method and system for live-streaming teaching
JP2003295754A (ja) Sign language education system and program for implementing the system
CN109739353A (zh) Virtual reality interaction system based on gesture, speech and gaze tracking recognition
CN110568931A (zh) Interaction method, device, system, electronic device and storage medium
CN114513694A (zh) Score determination method, apparatus, electronic device and storage medium
US11042215B2 (en) Image processing method and apparatus, storage medium, and electronic device
CN113191184A (zh) Real-time video processing method, apparatus, electronic device and storage medium
KR102199078B1 (ko) Motion-recognition-based smart beauty technique learning apparatus and method
CN106716501A (zh) Visual decoration design method, apparatus therefor, and robot
CN105975081A (zh) Somatosensory control method and apparatus
CN111515959B (zh) Programmable puppet performance robot control method, system and robot
CN114245193A (zh) Display control method, apparatus and electronic device
CN113780051A (zh) Method and apparatus for evaluating student concentration

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23849368

Country of ref document: EP

Kind code of ref document: A1