WO2024007648A1

WO2024007648A1 - Digital human driving method and system, and device

Info

Publication number: WO2024007648A1
Application number: PCT/CN2023/087222
Authority: WO
Inventors: 郑洛
Original assignee: 华为云计算技术有限公司
Priority date: 2022-07-08
Filing date: 2023-04-10
Publication date: 2024-01-11
Also published as: CN117409118A

Abstract

A digital human driving method. The method comprises: obtaining first action data, wherein the first action data is action data generated according to the action of a real human; matching the first action data and action data in an action library, and when there is second action data in the action library that matches the first action data, optimizing the first action data according to the second action data to obtain third action data; and then, driving a digital human according to the third action data. Before the digital human is driven, according to data in a prior action database, the obtained action data of the real human is optimized and predicted, and then, the digital human is driven according to the optimized action data, so that the action distortion of the digital human is avoided, and the problem of clipping of the digital human is reduced.

Description

A digital human driving method, system and device

This application claims priority to Chinese patent application No. 202210797830.6, filed with the State Intellectual Property Office of China on July 8, 2022 and entitled "A digital human driving method, system and device", the entire content of which is incorporated in this application by reference.

Technical field

The invention relates to the field of computer technology, and in particular to a digital human driving method and system.

Background

Due to the popularity of the Metaverse concept, digital humans are booming as a technology for representing real people in the Metaverse. Current digital human driving technology is divided into real-person driving and artificial intelligence (AI) driving. In real-person driving, equipment is used to capture the action and voice data of a real person (the performer), the data is passed to the digital human, and the digital human performs the corresponding actions based on the data. AI driving generally uses actions already produced in an action library to drive the digital human. When a digital human is driven by a real person, the skeletons of the person and the digital human differ, so the bones of the person must be remapped to those of the digital human. However, remapping can deform the digital human's movements, which may lead to clipping (parts of the model passing through each other) or incorrect movements.

Summary of the invention

The present application provides a digital human driving method and system. Before the digital human is driven, the acquired action data of the real person is optimized and predicted based on the data in the action database. The digital human is then driven based on the optimized action data. This avoids deformation of the digital human's movements and reduces clipping of the digital human.

In the first aspect, this application provides a digital human driving method. The method includes: obtaining first action data generated by a target object; matching the first action data with the action data in an action library; and when second action data matching the first action data exists in the action library, driving the digital human according to the second action data.

In this solution, before the digital human is driven based on the action data of the performer (that is, the target object), the performer's action data is first optimized. That is, after the first action data is acquired, it is first determined whether the action library contains action data matching the first action data. When second action data matching the first action data exists in the action library, the digital human is driven according to the second action data, thereby avoiding deformation of the digital human's movements and reducing clipping of the digital human.

In a possible implementation, driving the digital human according to the second action data includes: driving the digital human according to the first action data and the second action data.

That is to say, when second action data matching the first action data exists in the action library, the first action data can be optimized based on the second action data, and the digital human can then be driven based on the optimized first action data. This avoids deformation of the digital human's movements and reduces clipping of the digital human.

In a possible implementation, driving the digital human based on the first action data and the second action data includes:

According to the second action data, at least one data frame in the first action data is optimized to obtain optimized first action data.

That is to say, different optimization methods can be selected for different application scenarios when optimizing the first action data. In scenarios where latency does not need to be considered, once it is determined that second action data matching the first action data exists in the action library, optimization can start from the first data frame of the first action data, which improves the accuracy of the digital human's actions. In scenarios where latency needs to be considered, after the action corresponding to the first action data is determined from m frames of action data, and it is determined that second action data matching the first action data exists in the action library, optimization can start from the m-th data frame of the first action data. This reduces latency while improving the accuracy of the digital human's actions as much as possible.
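The two starting points described above can be illustrated with a minimal sketch. The text does not say how an individual frame is optimized, so the linear blend toward the reference frame, the `weight` parameter, the function name `optimize_from`, and the flat list-of-floats frame layout are all assumptions made purely for illustration.

```python
from typing import List, Sequence

# Hypothetical frame layout: one pose as a flat list of joint coordinates.
Frame = Sequence[float]


def optimize_from(first: Sequence[Frame], second: Sequence[Frame],
                  start: int, weight: float = 0.5) -> List[List[float]]:
    """Blend frames of `first` toward the matching frames of `second`.

    Frames before index `start` are passed through unchanged. The blend
    itself is an assumed stand-in for the unspecified optimization step.
    """
    optimized: List[List[float]] = []
    for i, frame in enumerate(first):
        if i < start or i >= len(second):
            # Not optimized: either already emitted, or no reference frame.
            optimized.append(list(frame))
        else:
            ref = second[i]
            optimized.append([(1.0 - weight) * a + weight * b
                              for a, b in zip(frame, ref)])
    return optimized
```

With `start=0` every frame is adjusted (the latency-insensitive case); with `start=m-1` only the m-th frame onward is adjusted, and the earlier frames stay as captured.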

In a possible implementation, matching the first action data with the action data in the action library includes: performing feature extraction on m action data frames in the first action data to determine the feature value of the first action data; comparing the feature value of the first action data with the feature values of the action data in the action library; and when the similarity between the feature value of the first action data and the feature value of action data in the action library is greater than a similarity threshold, determining that action data matching the first action data exists in the action library.

That is to say, since the first action data is sent frame by frame, once the number of received data frames of the first action data reaches a threshold, the action corresponding to the first action data can be determined based on the received action data frames. At this point, the received action data frames of the first action data can be matched with the action data in the action library. When doing so, the feature value of the multiple action data frames is used as the feature value of the first action data and compared with the feature values of the action data in the action library. Only when the similarity between the feature value of the first action data and the feature value of some action data in the action library reaches a certain threshold can it be determined that the action library contains action data matching the first action data.
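The threshold comparison described above might look like the following sketch. The text does not name a similarity measure, so cosine similarity is an assumption here, as are the names `cosine_similarity` and `find_match` and the representation of the action library as a dictionary from action name to feature vector.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_match(feature: Sequence[float],
               library: Dict[str, Sequence[float]],
               threshold: float) -> Optional[str]:
    """Return the best library entry whose similarity exceeds `threshold`.

    Returns None when no entry is similar enough, meaning that no
    matching (second) action data exists in the library.
    """
    best_name: Optional[str] = None
    best_similarity = threshold
    for name, reference in library.items():
        similarity = cosine_similarity(feature, reference)
        if similarity > best_similarity:  # strictly greater, as in the text
            best_name, best_similarity = name, similarity
    return best_name
```

Returning the single best entry above the threshold is a design choice of this sketch; the text only requires that some entry exceed the threshold.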

In a possible implementation, performing feature extraction on the m action data frames and determining the feature value of the first action data includes: obtaining the spatial information and temporal motion information of each data frame in the m action data frames; and generating the feature value of the first action data according to the spatial information and temporal motion information.

That is to say, when determining the characteristic value of the first action data, it can be determined based on the spatial information and temporal motion information of each data frame in the first action data.
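As a rough illustration of combining spatial and temporal motion information into one feature value, the sketch below concatenates the mean pose over the m frames (spatial part) with the mean frame-to-frame displacement (temporal part). This particular feature, the name `extract_feature`, and the flat coordinate layout are assumptions; the text does not specify how the feature value is computed.

```python
from typing import List, Sequence


def extract_feature(frames: Sequence[Sequence[float]]) -> List[float]:
    """Build a feature vector from m >= 2 frames of flat joint coordinates."""
    m = len(frames)
    if m < 2:
        raise ValueError("at least two frames are needed for motion information")
    dim = len(frames[0])
    # Spatial part: average value of each coordinate over the m frames.
    spatial = [sum(frame[d] for frame in frames) / m for d in range(dim)]
    # Temporal part: average displacement of each coordinate per frame step.
    temporal = [
        sum(frames[i + 1][d] - frames[i][d] for i in range(m - 1)) / (m - 1)
        for d in range(dim)
    ]
    return spatial + temporal
```

A learned encoder could replace this hand-written feature without changing how the resulting vector is compared against the action library.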

In a possible implementation, before matching the first action data with the action data in the action library, the method further includes: determining whether the digital human action optimization system has turned on the action data optimization function, and, when the system has turned on the action data optimization function, matching the first action data with the action data in the action library.

In other words, the user can decide whether to turn on the action data optimization function, which improves the user experience.

In the second aspect, this application provides a digital human driving system, including:

The collection module is used to obtain the first action data generated by the target object;

An optimization module used to match the first action data with the action data in the action library;

A processing module, configured to drive the digital human according to the second action data when the second action data in the action library matches the first action data.

In one possible implementation, the processing module is used to:

Drive the digital human based on the first action data and the second action data.

In a possible implementation, the optimization module is configured to: optimize at least one data frame in the first action data according to the second action data to obtain optimized first action data;

The processing module is used to: drive the digital human based on the optimized first action data.

In one possible implementation, the optimization module is used to:

Perform feature extraction on m action data frames in the first action data to determine the feature value of the first action data; where m is a natural number greater than or equal to 2;

Compare the feature value of the first action data with the feature value of the action data in the action library;

Based on the similarity between the feature value of the first action data and the feature value of the action data in the action library being greater than the similarity threshold, it is determined that the action data in the action library matches the first action data.

In one possible implementation, the optimization module is used to:

Obtain the spatial information and temporal motion information of each data frame in the m action data frames;

Based on the spatial information and temporal motion information of each data frame, a feature value of the first action data is generated.

In a possible implementation, the optimization module is also used to:

Determine whether the digital human action optimization system has turned on the action data optimization function, and based on the system turning on the action data optimization function, match the first action data with the action data in the action library.

In a third aspect, this application provides an electronic device, including:

At least one memory for storing programs;

At least one processor for executing programs stored in the memory;

Wherein, when the program stored in the memory is executed, the processor is configured to execute the method described in the first aspect or any possible implementation of the first aspect.

In a fourth aspect, embodiments of the present application provide a computing device cluster, including at least one computing device, each computing device including a processor and a memory;

The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method described in the first aspect or any possible implementation of the first aspect.

In a fifth aspect, the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program. When the computer program is run on a processor, it causes the processor to execute the method described in the first aspect or any possible implementation of the first aspect.

In a sixth aspect, this application provides a computer program product. When the computer program product is run on a processor, it causes the processor to execute the method described in the first aspect or any possible implementation of the first aspect.

It can be understood that the beneficial effects of the above-mentioned second to sixth aspects can be referred to the relevant descriptions in the above-mentioned first aspect, and will not be described again here.

Description of the drawings

Figure 1 is a schematic flow chart of a method of using real people to drive digital people;

Figure 2(a) is a schematic diagram of an application scenario provided by the embodiment of the present application;

Figure 2(b) is a schematic diagram of another application scenario provided by the embodiment of the present application;

Figure 3 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application;

Figure 4 is a schematic diagram of the hardware structure of a server provided by an embodiment of the present application;

Figure 5 is a flow chart of a digital human driving method provided in an embodiment of the present application;

Figure 6 is a schematic diagram of the movement changes of a person provided in the embodiment of the present application;

Figure 7 is a schematic diagram of a process for optimizing the first action data provided in the embodiment of the present application;

Figure 8 is a schematic diagram of another process of optimizing the first action data provided in the embodiment of the present application;

Figure 9 is a schematic diagram of another process of optimizing the first action data provided in the embodiment of the present application;

Figure 10 is a schematic flow chart of a digital human optimization method provided by an embodiment of the present application;

Figure 11 is a schematic structural diagram of a computing device provided in an embodiment of the present application;

Figure 12 is a schematic structural diagram of a computing device cluster provided in an embodiment of the present application;

Figure 13 is a schematic diagram of a connection method between computing device clusters provided in an embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.

In the description of the embodiments of this application, any embodiment or design solution described as "exemplary", "such as" or "for example" should not be construed as being more preferred or advantageous than other embodiments or design solutions. Rather, use of the words "exemplary," "such as," or "for example" is intended to present the concepts in a concrete manner.

In addition, the terms "first" and "second" are only used for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, features defined as "first" and "second" may explicitly or implicitly include one or more of these features. The terms "including," "includes," "having," and variations thereof all mean "including but not limited to," unless otherwise specifically emphasized.

First, the terms involved in the embodiments of this application are introduced.

1. Digital human. In a narrow sense, a digital human is the product of the integration of information science and life science: information science methods are used to build virtual simulations of the human body's status and functions at different levels. After a digital human is created, it needs to be rendered and driven. Digital humans are usually driven either by a real person or by AI. AI driving refers to using actions already produced in an action library to drive the digital human. Real-person driving refers to using equipment to capture a person's action and voice data and passing the captured data to the digital human, which then performs the corresponding actions based on the data.

2. Performer (中之人, "the person inside"). Professional "virtual anchors" are based on motion capture and facial capture technology, and the actor behind the scenes can be called the "person inside". In the embodiments of the present application, the performer may refer to a person (target object) who performs actions for the digital human, so that the digital human can perform actions based on the performer's actions.

As an example, Figure 1 shows a schematic flowchart of a method for using real people to drive digital people. Referring to Figure 1, it can be seen that the method includes: S101-S105.

S101: Obtain spatial data of key points of the human body.

Obtaining the spatial data of the key points of the person's body includes: identifying the key points of the person's body, and then mapping the identified key points to the bone points of the person, and generating spatial data of each bone point of the person.

In a possible example, spatial data of key points of the human body can be captured through an optical motion capture system, an inertial sensor, or a camera.

S102: Convert the acquired spatial data of key points of the human body into action data of the digital human.

After obtaining the spatial data of the key points of the person's body, it is necessary to extract the action data frame from the obtained spatial data.

S103: Apply the generated action data to the digital human to drive the digital human to perform actions.

After the action data of the performer is obtained, the digital human can make the corresponding actions based on the action data.

S104: Render the digital human and generate a video.

The process by which a computer converts shapes stored in memory into corresponding shapes that are actually drawn on the screen is called rendering. The most commonly used technique in the rendering process is rasterization. Rasterization is the process of converting data into visible pixels.

When rendering a digital human, the digital human can be rendered by a graphics processing unit (GPU). The GPU graphics rendering pipeline can be divided into six stages.

The first stage is the vertex shader. The input of this stage is vertex data, that is, a collection of vertices. The purpose of the vertex shader is to convert the 3D coordinates of the input vertices into different 3D coordinates; the vertex shader can also do some basic processing of vertex attributes.

The second stage is shape (primitive) assembly. This stage takes all the vertices output by the vertex shader as input and assembles all the points into the shape of the specified primitive.

The third stage, the geometry shader, takes a set of vertices in the form of primitives as input. It can construct new (or other) primitives by generating new vertices to generate other shapes.

The fourth stage, rasterization, maps primitives to corresponding pixels on the final screen to generate fragments. A fragment is all the data needed to render a pixel.

The fifth stage is the fragment shader. Before the fragment shader runs, the input fragments are clipped; clipping discards all fragments outside the view to improve execution efficiency. The fragment shader then computes the final color of each pixel.

The sixth stage is testing and blending. This stage checks the depth value (z coordinate) of the fragment to determine whether the pixel is in front of or behind other objects and whether it should be discarded. This stage also checks the alpha value (which defines the transparency of an object) and blends objects accordingly.
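The testing and blending stage can be made concrete with a small CPU-side sketch for a single pixel. It assumes a "less than" depth test (smaller z is closer to the viewer) and the common source-over blend, where the result is alpha times the fragment color plus (1 - alpha) times the stored color. Real GPUs make both configurable, and the function name `depth_test_and_blend` is hypothetical.

```python
from typing import Tuple

Color = Tuple[float, float, float]


def depth_test_and_blend(dst_color: Color, dst_depth: float,
                         src_color: Color, src_depth: float,
                         src_alpha: float) -> Tuple[Color, float]:
    """Apply a depth test and alpha blend for one pixel.

    `dst_*` is what the framebuffer currently holds; `src_*` is the
    incoming fragment. Returns the new (color, depth) for the pixel.
    """
    if src_depth >= dst_depth:
        # The fragment is behind what is already drawn: discard it.
        return dst_color, dst_depth
    r, g, b = (src_alpha * s + (1.0 - src_alpha) * d
               for s, d in zip(src_color, dst_color))
    # Simplification: depth is written even for translucent fragments.
    return (r, g, b), src_depth
```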

S105: Output the video.

In the above solution, obtaining the spatial data of the key points of the performer's body requires higher-precision equipment, and driving the digital human requires a performer whose body shape is similar to that of the digital human. At the same time, to ensure that the digital human's movements are plausible, the performer must avoid movements that easily cause clipping, the collision detection accuracy of the model must be increased, and the plausibility of the action data must be repaired.

To address the problems in the above solution, embodiments of the present application provide a digital human driving system. Before the digital human is driven, the acquired action data of the performer is optimized and predicted based on the data in the action database. The digital human is then driven based on the optimized or predicted action data. This avoids deformation of the digital human's movements and reduces clipping of the digital human.

Next, the technical solutions provided by the embodiments of this application are introduced.

Exemplarily, FIG. 2(a) shows an application scenario. As shown in FIG. 2(a), the electronic device 100 may be included in this scenario. The electronic device 100 is configured with a digital human optimization system. The electronic device 100 can optimize the motion data of the digital human through the digital human optimization system, and drive the digital human based on the optimized motion data.

Exemplarily, FIG. 2(b) shows another application scenario. As shown in FIG. 2(b), this scenario may include an electronic device 100 and a server 200. In this scenario, the digital human optimization system may be configured on the server 200, or partially configured on the electronic device 100 and partially configured on the server 200. When the digital human optimization system is configured on the server 200, the server 200 can optimize the action data of the digital human through the digital human optimization system. Then, the server 200 drives the digital human based on the optimized action data, renders the digital human, and generates a video. Finally, the server 200 sends the generated video file to the electronic device 100 for display. When the digital human optimization system is partially configured on the server 200 and partially configured on the electronic device 100, the electronic device 100 can access the data provided by the server 200. For example, the electronic device 100 can obtain the optimized action data of the digital human from the server 200 and drive the digital human based on the optimized action data.

In some embodiments, the electronic device 100 and the server 200 may be connected through a network such as a wired network or a wireless network. For example, the network can be a local area network (LAN) or a wide area network (WAN) (such as the Internet). The network between the electronic device 100 and the server 200 can be implemented using any known network communication protocol. The network communication protocol can be any of various wired or wireless communication protocols, such as Ethernet, universal serial bus (USB), FireWire, global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time division-synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), new radio (NR), Bluetooth, wireless fidelity (Wi-Fi) and other communication protocols.

By way of example, FIG. 3 shows the hardware structure of an electronic device 100. The electronic device 100 may be, but is not limited to, a mobile phone, a tablet, a laptop, a wearable device, a smart TV, or another electronic device. Exemplary embodiments of electronic devices include, but are not limited to, electronic devices running iOS, Android, Windows, HarmonyOS, or other operating systems. The embodiments of this application do not specifically limit the type of electronic device.

As shown in FIG. 3, the electronic device 100 may include: a processor 110, a memory 120, a display screen 130, a communication module 140 and an input device 150. The processor 110, the memory 120, the display screen 130, the communication module 140 and the input device 150 can be connected through a bus or other means.

The processor 110 is the computing core and control core of the electronic device 100. The processor 110 may include one or more processing units. For example, the processor 110 may include one or more of an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units can be independent devices or can be integrated in one or more processors.

The memory 120 may store a program, and the program may be run by the processor 110, so that the processor 110 executes some or all of the methods required to be executed by the electronic device 100 provided in the embodiment of the present application. The memory 120 may also store data. The processor 110 can read data stored in the memory 120. The memory 120 and the processor 110 may be provided separately. Optionally, the memory 120 may also be integrated in the processor 110.

The display screen 130 is used to display images, videos, etc. The display screen 130 includes a display panel. The display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro OLED, quantum dot light-emitting diodes (QLED), etc.

The communication module 140 may include at least one of a mobile communication module and a wireless communication module. When the communication module 140 includes a mobile communication module, the communication module 140 can provide wireless communication solutions applied on the electronic device 100, including 2G/3G/4G/5G. Examples include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time division-synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), new radio (NR), etc. When the communication module 140 includes a wireless communication module, the communication module 140 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. For example, the communication module 140 can be used by the electronic device 100 to communicate with the server 200 to complete data interaction.

In some embodiments, the electronic device 100 may also include an input device 150. Information can be input to the electronic device 100 and/or control instructions can be issued through the input device 150. For example, the input device 150 may be, but is not limited to, a mouse, a keyboard, etc.

It can be understood that the structure illustrated in Figure 3 of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown in the figures, or some components may be combined, some components may be separated, or some components may be arranged differently. The components illustrated may be implemented in hardware, software, or a combination of software and hardware.

By way of example, FIG. 4 shows the hardware structure of a server 200. The server 200 may be, but is not limited to, a server used to provide cloud services. It may be a server or a super electronic device that can establish a communication connection with the electronic device 100 and provide the electronic device 100 with data processing, computing and/or storage functions. The server 200 may be a hardware server, or may be embedded in a virtualized environment. For example, the server 200 may be a virtual machine executed on a hardware server that hosts one or more other virtual machines.

As shown in FIG. 4, the server 200 may include: a processor 210, a network interface 220, and a memory 230. The processor 210, the network interface 220, and the memory 230 can be connected through a bus or other means.

In the embodiment of the present application, the processor 210 (or central processing unit (CPU)) is the computing core and control core of the server 200.

The network interface 220 may include a standard wired interface and a wireless interface (such as Wi-Fi or a mobile communication interface), and is controlled by the processor 210 to send and receive data, for example, to receive from the network target task sample data sent by the electronic device 100.

The memory 230 is a storage device of the server 200 and is used to store programs and data, such as pre-trained models. It can be understood that the memory 230 here can be a high-speed RAM or a non-volatile memory, such as at least one disk memory; optionally, it can also be at least one storage device located remotely from the aforementioned processor 210. The memory 230 provides storage space, which stores the server's operating system and executable program code. The operating system can include but is not limited to Windows, Linux, HarmonyOS (Hongmeng), etc., which is not limited here.

It can be understood that the structure illustrated in Figure 4 of the embodiment of the present application does not constitute a specific limitation on the server 200. In other embodiments of this application, the server 200 may be a cloud server. Server 200 may include more or fewer components than illustrated, some components may be combined, some components may be separated, or components may be arranged differently. The components illustrated may be implemented in hardware, software, or a combination of software and hardware.

The above is the relevant introduction to the application scenarios, the hardware structure of the electronic device 100 and the hardware structure of the server 200 involved in the embodiments of the application. Next, based on the above description, the digital human optimization system provided in the embodiment of the present application is introduced.

For example, FIG. 5 shows a schematic structural diagram of a digital human driving system provided by an embodiment of the present application. Referring to Figure 5, the system includes: a data collection module 501, a digital human driving module 502, an action optimization module 503, and a digital human rendering module 504.

The data collection module 501 is used to collect the first action data of the performer. Specifically, the first action data collected by the data collection module 501 may be video file data of the performer obtained through a camera, or spatial data of the key points of the performer's body obtained through motion capture equipment. The motion capture equipment may include an optical motion capture system, inertial sensors, etc. In a possible example, 17 joint points of the human body can be used as the key points of the human body. The 17 key points are: nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle.
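One possible in-memory layout for a single frame of key-point spatial data is sketched below. The 17 names come from the list above; the index order, the snake_case identifiers, the `(x, y, z)` tuple, and the helper `make_frame` are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# The 17 body key points named in the text, in the order they are listed.
BODY_KEYPOINTS: List[str] = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

Point3D = Tuple[float, float, float]  # assumed (x, y, z) position


def make_frame(points: List[Point3D]) -> Dict[str, Point3D]:
    """Label one frame of captured positions with key-point names."""
    if len(points) != len(BODY_KEYPOINTS):
        raise ValueError(f"expected {len(BODY_KEYPOINTS)} points, got {len(points)}")
    return dict(zip(BODY_KEYPOINTS, points))
```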

After the digital human driving module 502 receives the first action data sent by the data collection module 501, the digital human driving module 502 determines the type of the first action data. When the first action data is video file data, the digital human driving module 502 also needs to identify the body key points of the performer in the video file and determine the spatial data of the identified body key points. The digital human driving module 502 then uses the spatial data of the body key points identified from the video file data as the first action data. When the first action data received by the digital human driving module 502 is already spatial data of body key points, the digital human driving module 502 does not need to process the first action data.

After the digital human driving module 502 receives the first action data sent by the data collection module 501, the digital human driving module 502 also needs to determine whether the digital human optimization system has enabled the digital human action optimization service. If the service is not enabled, the digital human driving module 502 generates motion driving information for the digital human directly from the first action data, so as to drive the digital human to produce motions corresponding to the first action data. If the service is enabled, the digital human driving module 502 sends the first action data to the action optimization module 503.
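The branching described above can be sketched as a small dispatch function. This is a hedged illustration, not the implementation of the digital human driving module 502: `route_frame`, `optimize`, and `drive` are hypothetical names, and a frame is modelled as a plain dictionary.

```python
from typing import Any, Callable, Dict

Frame = Dict[str, Any]  # hypothetical: one frame of key-point spatial data


def route_frame(
    frame: Frame,
    optimization_enabled: bool,
    optimize: Callable[[Frame], Frame],
    drive: Callable[[Frame], None],
) -> Frame:
    """Drive the digital human with the raw or the optimized frame."""
    if optimization_enabled:
        # Service enabled: hand the frame to the action optimization step first.
        frame = optimize(frame)
    # In both cases the (possibly optimized) frame drives the digital human.
    drive(frame)
    return frame


# Example wiring with stand-in callables.
driven = []
mark = lambda f: {**f, "optimized": True}
route_frame({"id": 1}, False, mark, driven.append)
route_frame({"id": 2}, True, mark, driven.append)
```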

In a possible example, the user can decide whether the action optimization service of the digital human optimization system needs to be turned on. This improves the user experience and the usability of the system. In another possible embodiment, the digital human optimization system can also enable the action optimization service by default.

The action optimization module 503 is used to receive the first action data sent by the digital human driving module 502, cache the data frames of the received first action data, and use the relative spatial position information and the temporal motion information carried by the cached data frames to determine the feature value of the first action data. The feature value of the first action data may be a feature vector, and may be used to characterize the type of the first action data. The feature value of the first action data is then matched with the feature values of the action data in the reference action library. When there is second action data matching the first action data in the reference action library, the action optimization module 503 can optimize the first action data according to the second action data.

When the digital human driving module 502 transmits the first action data to the action optimization module 503, the first action data is transmitted frame by frame, and each frame of data can only represent the state of the performer at a certain moment. Therefore, the action optimization module 503 can identify the action corresponding to the first action data only after caching a certain number of first action data frames, and only then can it match the action data in the reference action library with the first action data. Take the waving motion as an example. Referring to FIG. 6, FIG. 6 shows the relative position of the performer's left shoulder and left wrist carried by three frames of the performer's action data. (a), (b), and (c) in Figure 6 respectively represent the relative position of the performer's left shoulder and left wrist carried in the first, second, and third action data frames. In the first action data frame, the performer's left wrist is lower than the left shoulder, and the performer's action cannot yet be recognized. In the second action data frame, the performer's left wrist is level with the left shoulder, and the action is still unrecognizable. In the third action data frame, the performer's left wrist is higher than the left shoulder, and it can now be determined that the performer is making a waving action.
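The waving example can be sketched in code as follows. This is only an illustrative heuristic under stated assumptions: each frame is reduced to a pair (left wrist height, left shoulder height), the tolerance `tol` for "level" is an arbitrary choice, and the function name `recognize_action` is hypothetical. The application does not prescribe this recognition rule.

```python
from typing import List, Optional, Tuple

# Each frame is reduced to (left_wrist_height, left_shoulder_height).
WristShoulder = Tuple[float, float]


def recognize_action(frames: List[WristShoulder], tol: float = 0.02) -> Optional[str]:
    """Return "wave" once the wrist has moved below -> level -> above the shoulder."""
    states: List[str] = []
    for wrist_y, shoulder_y in frames:
        diff = wrist_y - shoulder_y
        if abs(diff) <= tol:
            state = "level"
        elif diff > 0:
            state = "above"
        else:
            state = "below"
        # Record only changes of state, so repeated frames do not matter.
        if not states or states[-1] != state:
            states.append(state)
    return "wave" if states[-3:] == ["below", "level", "above"] else None
```

With one or two cached frames the function returns `None`, which mirrors the point made above: a single frame only describes a momentary state, and the action becomes recognizable only once enough frames have been cached.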

Therefore, in a possible example, a first threshold can be set in advance. When the number of data frames of the first action data cached in the action optimization module 503 reaches the first threshold, the action optimization module 503 can match the cached data frames of the first action data with the action data in the reference action library, to determine whether action data matching the first action data exists in the reference action library. The first threshold may be set manually. When setting the first threshold, it is necessary to ensure that a specific action can be determined from a number of data frames that meets the first threshold.
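A minimal sketch of threshold-gated caching is shown below. The class name `FrameBuffer` is hypothetical, and the sketch treats the threshold as reached when the cached count is greater than or equal to it, following the wording used for the Figure 9 flow (where the threshold for the waving action is 3).

```python
from typing import Any, List, Optional


class FrameBuffer:
    """Caches frames and signals when enough are available for matching."""

    def __init__(self, first_threshold: int) -> None:
        self.first_threshold = first_threshold
        self.frames: List[Any] = []

    def push(self, frame: Any) -> Optional[List[Any]]:
        """Cache a frame; return the cached frames once matching may begin."""
        self.frames.append(frame)
        if len(self.frames) >= self.first_threshold:
            return list(self.frames)
        return None


# Example: with a threshold of 3, matching can start at the third frame.
buf = FrameBuffer(first_threshold=3)
results = [buf.push(f) for f in ("f1", "f2", "f3")]
```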

In another possible example, each time the action optimization module 503 receives a frame of data, it can make a judgment based on that data frame and the data frames cached before it. When the action optimization module 503 cannot determine the action corresponding to the first action data from the received data frames, it can continue to receive the first action data sent by the digital human driving module 502. When the action optimization module 503 can determine the action corresponding to the first action data from the received data frames, it can continue to receive the first action data sent by the digital human driving module 502 while matching the received data frames of the first action data with the action data in the reference action library.

When there is action data in the reference action library that matches the first action data, there are two situations in which the action optimization module 503 optimizes the first action data according to the action data in the reference action library. In the first situation, the delay needs to be considered; in the second situation, it does not. The first situation can correspond to a live broadcast scenario, in which the action optimization module 503 needs to take the delay into account when optimizing the first action data. The second situation can correspond to an on-demand scenario, in which the action optimization module 503 does not need to take the delay into account when optimizing the first action data.

In the live broadcast scenario, refer to Figure 7, which is a schematic diagram of the process by which the action optimization module 503 optimizes the first action data. The action corresponding to the first action data in FIG. 7 may be the waving action illustrated in FIG. 6. Referring to FIG. 7, the optimization of the first action data by the action optimization module 503 includes S701-S707.

S701. The action optimization module 503 receives and caches the first data frame of the first action data sent by the digital human driving module 502, and attempts to determine the action corresponding to the first action data based on the received first data frame.

S702. When the action optimization module 503 cannot determine the action corresponding to the first action data, it sends the first data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received first data frame.

S703. The action optimization module 503 receives and caches the second data frame sent by the digital human driving module 502, and attempts to determine the action corresponding to the first action data based on the first data frame and the second data frame.

S704. When the action optimization module 503 cannot determine the action corresponding to the first action data, it sends the second data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received second data frame.

S705. The action optimization module 503 receives and caches the third data frame sent by the digital human driving module 502, and determines the action corresponding to the first action data based on the first data frame, the second data frame, and the third data frame.

S706. After the action optimization module 503 determines the action corresponding to the first action data, it determines whether there is action data matching the first action data in the reference action library. When there is such action data in the reference action library, the action optimization module 503 optimizes the third data frame according to the action data in the reference action library.

S707. The action optimization module 503 sends the optimized third data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the optimized third data frame.
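The live broadcast flow S701-S707 can be sketched as a per-frame processor that returns unrecognized frames unchanged and starts optimizing only from the frame at which the action is recognized. This is a sketch under stated assumptions: `LiveOptimizer` and the callables `recognize`, `lookup`, and `optimize` are hypothetical stand-ins, and a frame is modelled as a dictionary of key-point heights.

```python
from typing import Callable, Dict, List, Optional

Frame = Dict[str, float]  # hypothetical: key-point name -> height


class LiveOptimizer:
    """Frame-by-frame optimizer that never holds back an unrecognized frame."""

    def __init__(
        self,
        recognize: Callable[[List[Frame]], Optional[str]],
        lookup: Callable[[str], Optional[Frame]],
        optimize: Callable[[Frame, Frame], Frame],
    ) -> None:
        self.recognize = recognize
        self.lookup = lookup
        self.optimize = optimize
        self.cache: List[Frame] = []
        self.reference: Optional[Frame] = None

    def process(self, frame: Frame) -> Frame:
        self.cache.append(frame)  # S701/S703/S705: receive and cache
        if self.reference is None:
            action = self.recognize(self.cache)
            if action is not None:
                self.reference = self.lookup(action)  # S706: search the library
        if self.reference is None:
            return frame  # S702/S704: return the frame unchanged
        return self.optimize(frame, self.reference)  # S706/S707: optimized frame


# Example with stand-in callables: the action is recognized at the third frame.
live = LiveOptimizer(
    recognize=lambda cache: "wave" if len(cache) >= 3 else None,
    lookup=lambda action: {"wave": {"left_wrist": 2.0}}.get(action),
    optimize=lambda frame, ref: {**frame, **ref},
)
outputs = [live.process({"left_wrist": y}) for y in (1.0, 1.4, 1.9)]
```

In this sketch the first two frames pass through untouched and only the third is corrected, which keeps the added latency per frame bounded by the cost of one recognition attempt.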

In the live broadcast scenario, the action optimization module 503 receives the data frames of the first action data sent by the digital human driving module 502 while making a judgment on the data frames already received. When the action optimization module 503 cannot determine the action corresponding to the first action data based on the currently received data frame and the cached data frames, it directly returns the currently received data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to that data frame, thereby avoiding long delays during the live broadcast. When the action optimization module 503 can determine the action corresponding to the first action data based on the currently received data frame (the third data frame) and the cached data frames, and action data matching the first action data exists in the reference action library, the action optimization module 503 repairs the current data frame and the data frames received thereafter according to the action data in the reference action library, without repairing the data frames before the current data frame (the first data frame and the second data frame).

In the on-demand scenario, refer to Figure 8, which is a schematic diagram of the process by which the action optimization module 503 optimizes the first action data. The action corresponding to the first action data in Figure 8 may be the waving action illustrated in Figure 6. Referring to FIG. 8, the optimization of the first action data by the action optimization module 503 includes S801-S805.

S801. The action optimization module 503 receives and caches the first data frame of the first action data sent by the digital human driving module 502, and attempts to determine the action corresponding to the first action data based on the received first data frame.

S802. When the action optimization module 503 cannot determine the action corresponding to the first action data, it continues to receive and cache the second data frame of the first action data, and attempts to determine the action corresponding to the first action data based on the received first data frame and second data frame.

S803. When the action optimization module 503 still cannot determine the action corresponding to the first action data, it continues to receive and cache the third data frame of the first action data, and determines the action corresponding to the first action data based on the received first data frame, second data frame, and third data frame.

S804. After the action optimization module 503 determines the action corresponding to the first action data, it determines whether there is action data matching the first action data in the reference action library. When there is such action data in the reference action library, the action optimization module 503 optimizes the first data frame according to the action data in the reference action library.

S805. The action optimization module 503 sends the optimized first data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received first data frame.
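By contrast with the live case, the on-demand flow can wait until the action is recognized and then optimize starting from the first cached frame. The following is a minimal sketch under stated assumptions: `optimize_on_demand` and its callables are hypothetical, and a single number stands in for a full data frame.

```python
from typing import Callable, List, Optional, Sequence

Frame = float  # hypothetical: a single coordinate stands in for a full frame


def optimize_on_demand(
    frames: Sequence[Frame],
    recognize: Callable[[List[Frame]], Optional[str]],
    lookup: Callable[[str], Optional[Frame]],
    optimize: Callable[[Frame, Frame], Frame],
) -> List[Frame]:
    """Cache frames until the action is recognized, then optimize from frame one."""
    cache: List[Frame] = []
    for frame in frames:
        cache.append(frame)  # S801-S803: receive and cache
        action = recognize(cache)
        if action is None:
            continue  # not recognizable yet, keep caching
        reference = lookup(action)  # S804: look for matching reference data
        if reference is None:
            break  # no match in the library, nothing to optimize against
        # Delay is not a concern here, so optimization starts at the first frame.
        return [optimize(f, reference) for f in frames]
    return list(frames)


# Example with stand-in callables.
recognize = lambda cache: "wave" if len(cache) >= 3 else None
shift = lambda frame, ref: frame + ref
optimized = optimize_on_demand([1.0, 2.0, 3.0], recognize, {"wave": 1.0}.get, shift)
unmatched = optimize_on_demand([1.0, 2.0, 3.0], recognize, {}.get, shift)
```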

In the on-demand scenario, refer to Figure 9, which is a schematic diagram of the process by which the action optimization module 503 optimizes the first action data. The action corresponding to the first action data in FIG. 9 may be the waving action illustrated in FIG. 6. Referring to FIG. 9, the optimization of the first action data by the action optimization module 503 includes S901-S906.

S901. The action optimization module 503 receives and caches the first data frame of the first action data sent by the digital human driving module 502.

S902. The action optimization module 503 receives and caches the second data frame of the first action data sent by the digital human driving module 502, and determines whether the number of cached data frames is greater than or equal to the first threshold (for the waving motion in Figure 6, the first threshold can be set to 3).

S903. The action optimization module 503 receives and caches the third data frame of the first action data sent by the digital human driving module 502, and determines whether the number of cached data frames is greater than or equal to the first threshold.

S904. When the number of data frames of the first action data cached in the action optimization module 503 is greater than or equal to the first threshold, the action optimization module 503 determines whether there is action data matching the first action data in the reference action library.

S905. When there is action data matching the first action data in the reference action library, the action optimization module 503 optimizes the first data frame according to the action data in the reference action library.

S906. The action optimization module 503 sends the optimized first data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received first data frame.

In the on-demand scenario, when the first action data matches action data in the reference action library, the action optimization module 503 receives the data frames of the first action data sent by the digital human driving module 502 while optimizing the cached data frames of the first action data. Since there is no delay requirement in the on-demand scenario, the action optimization module 503 can start optimizing from the first data frame of the first action data.

In the above embodiment, when the action optimization module 503 matches the first action data with the action data in the reference action library, it matches the multiple cached data frames with the action data in the reference action library. In a possible example, the action optimization module 503 may match the feature value of the multiple cached data frames with the feature values of the action data in the reference action library. When the similarity between the feature value of second action data in the reference action library and the feature value of the multiple data frames cached in the action optimization module 503 reaches the similarity threshold, the second action data can be determined to match the first action data. The action optimization module 503 can then optimize the data frames of the first action data according to the second action data. When optimizing a data frame of the first action data, the relative spatial position information and the temporal motion information of the data frame may be optimized.
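Threshold-based matching of feature vectors can be sketched as follows. The application does not specify the similarity measure; cosine similarity is used here purely as an assumption, and the names `cosine_similarity`, `match_action`, and the default threshold of 0.9 are illustrative.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_action(
    feature: Sequence[float],
    library: Dict[str, Sequence[float]],
    similarity_threshold: float = 0.9,
) -> Optional[str]:
    """Return the best library entry whose similarity reaches the threshold."""
    best_name, best_score = None, similarity_threshold
    for name, reference_feature in library.items():
        score = cosine_similarity(feature, reference_feature)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Example reference library with precomputed (made-up) feature vectors.
library = {"wave": [0.0, 1.0, 1.0], "bow": [1.0, 0.0, 0.0]}
```

Here `match_action([0.1, 0.9, 1.0], library)` selects `"wave"`, while a vector such as `[1.0, 1.0, 1.0]` is not close enough to either entry and yields `None`.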

In a possible example, the feature values of each action data in the reference action library and the feature values of the first action data may be feature vectors. The feature vectors of each action data in the reference action library are pre-calculated. When calculating the feature value of the first action data, the feature value of the first action data can be determined based on the relative spatial position information and temporal motion information of each frame in the first action data. Specifically, when calculating the feature value of the first action data, a feature vector may be generated by calculating the quaternion change trajectory of the key point based on the time change of the data frame in the first action data.
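One way to read "quaternion change trajectory" is as the sequence of relative rotations of a key point between consecutive data frames. The sketch below follows that reading for a single key point; it is an assumption rather than the method defined by this application, and the helper names `q_conj`, `q_mul`, and `feature_vector` are hypothetical. Quaternions are taken as unit quaternions in (w, x, y, z) order.

```python
from typing import List, Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


def q_conj(q: Quat) -> Quat:
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def q_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def feature_vector(trajectory: Sequence[Quat]) -> List[float]:
    """Concatenate the frame-to-frame relative rotations of one key point."""
    feature: List[float] = []
    for prev, curr in zip(trajectory, trajectory[1:]):
        # Relative rotation that takes the previous orientation to the current one.
        feature.extend(q_mul(q_conj(prev), curr))
    return feature
```

For several key points, the per-key-point vectors could simply be concatenated in a fixed order so that the result is comparable with the precomputed feature vectors in the reference action library.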

In a possible example, the similarity threshold may be preset by the user.

In some possible embodiments, after the action optimization module 503 determines that action data in the reference action library matches the first action data, the action optimization module 503 can determine, based on the action data in the reference action library, the next data frame following the data frame of the first action data that it has currently received. The action optimization module 503 then sends the obtained next data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received data frame. In this embodiment, the action optimization module 503 predicts data frames of the first action data based on the action data in the reference action library, which avoids abrupt changes in the action caused by acquisition jitter and makes the action more natural.
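Next-frame prediction from a matched reference action could, for instance, be done by locating the reference frame closest to the current frame and returning its successor. This nearest-frame alignment is an assumption made for illustration; the application does not state how the next frame is derived. The names `squared_distance` and `predict_next_frame` are hypothetical, and a frame is modelled as a flat sequence of coordinates.

```python
from typing import Optional, Sequence

Frame = Sequence[float]  # hypothetical: flattened key-point coordinates


def squared_distance(a: Frame, b: Frame) -> float:
    """Sum of squared coordinate differences between two frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_next_frame(current: Frame, reference: Sequence[Frame]) -> Optional[Frame]:
    """Return the reference frame that follows the one closest to `current`."""
    if not reference:
        return None
    # Align the current frame with its nearest neighbour in the reference action.
    idx = min(range(len(reference)), key=lambda i: squared_distance(current, reference[i]))
    if idx + 1 >= len(reference):
        return None  # the reference action has already reached its last frame
    return reference[idx + 1]


# Example: a one-dimensional reference action with three frames.
reference = [(0.0,), (0.5,), (1.0,)]
```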

In some possible embodiments, after the action optimization module 503 determines, based on the cached data frames of the first action data, that there is second action data matching the first action data in the reference action library, the action optimization module 503 can stop receiving data frames of the first action data. The action optimization module 503 obtains the data frames of the second action data from the reference action library and sends them to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received data frames. In this embodiment, the performer only needs to perform the starting portion of the first action. The action optimization module 503 then searches the reference action library for a second action that matches the first action based on that starting portion, and sends the action data frames of the second action to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received action data frames of the second action. This reduces the difficulty of the performer's actions.
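Switching from captured frames to library frames once the starting action has been matched can be sketched with a generator. The assumptions are labelled in the comments: `drive_with_reference` and `find_match` are hypothetical names, and the reference frames are assumed to be index-aligned with the captured frames so that playback resumes at the position where the match occurred.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional, Sequence


def drive_with_reference(
    captured: Iterable[Any],
    find_match: Callable[[List[Any]], Optional[Sequence[Any]]],
) -> Iterator[Any]:
    """Yield captured frames until a match is found, then yield library frames."""
    cache: List[Any] = []
    for frame in captured:
        cache.append(frame)
        reference = find_match(cache)
        if reference is None:
            yield frame  # no match yet: drive with the captured frame
            continue
        # Match found: stop reading captured frames and continue from the
        # library (assumes reference frames are index-aligned with the cache).
        yield from reference[len(cache) - 1:]
        return


# Example: the starting action is matched after two captured frames.
captured = iter(["c0", "c1", "c2", "c3", "c4"])
library_frames = ["r0", "r1", "r2", "r3"]
played = list(drive_with_reference(
    captured, lambda cache: library_frames if len(cache) >= 2 else None
))
```

After the match, the remaining captured frames are never consumed, which corresponds to the module no longer receiving data frames of the first action data.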

In a possible example, after the digital human's first action is completed, the digital human driving module 502 can transition from the first action to the next action by inserting frames.
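Frame insertion between two actions can be illustrated with simple linear interpolation of key-point coordinates. This is a sketch under the assumption that frames are flat coordinate lists; the function name `transition_frames` is hypothetical, and for joint rotations a spherical interpolation would typically be preferred over the linear blend shown here.

```python
from typing import List, Sequence


def transition_frames(
    last: Sequence[float], first: Sequence[float], count: int
) -> List[List[float]]:
    """Generate `count` in-between frames from `last` towards `first`."""
    frames: List[List[float]] = []
    for step in range(1, count + 1):
        t = step / (count + 1)  # strictly between 0 and 1, endpoints excluded
        frames.append([a + (b - a) * t for a, b in zip(last, first)])
    return frames


# Example: three inserted frames between the end of one action and the
# start of the next.
inserted = transition_frames([0.0, 0.0], [4.0, 8.0], 3)
```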

In a possible example, continuing to refer to FIG. 5, the action optimization module 503 includes: a feature value calculation module 5031, a reference action library 5032, an action matching module 5033, and an action optimization/prediction module 5034. The feature value calculation module 5031 is used to calculate the feature value of the first action data. The reference action library 5032 is used to store verified a priori action data (standard action data). The action matching module 5033 is configured to compare the feature value of the first action data with the feature values of the action data stored in the reference action library 5032. When there is a feature value of second action data in the reference action library 5032 that is the same as the feature value of the first action data, the action matching module 5033 determines that the first action data matches the second action data. The action optimization/prediction module 5034 is used to optimize the first action data based on the second action data, or to predict the next frame of the first action data based on the second action data. The action data in the reference action library 5032 may be data provided by the service provider in advance. For example, in a live broadcast scenario in which a digital human performs a live broadcast through live broadcast software, the action data in the reference action library 5032 can be provided by the live broadcast platform.

The digital human rendering module 504 is used to render the digital human driven by the digital human driving module 502 and generate a corresponding video. Rendering is the final stage of computer graphics (CG) production, in which software turns the user-designed content into the final image or animation. In practice, whenever a model or scene needs to be output as an image file, video signal, or film, it must go through a rendering program.

In the above embodiment, the data collection module 501, the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504 can all be implemented by software, or can be implemented by hardware. Illustratively, the following takes the action optimization module 503 as an example to introduce the implementation of the action optimization module 503. Similarly, the implementation of the data collection module 501, the digital human driving module 502, and the digital human rendering module 504 can refer to the implementation of the action optimization module 503.

As an example of a module implemented as a software functional unit, the action optimization module 503 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Furthermore, there may be one or more such computing instances. For example, the action optimization module 503 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code can be distributed in the same region or in different regions. Further, the multiple hosts/virtual machines/containers used to run the code can be distributed in the same availability zone (AZ) or in different AZs. Each AZ includes one data center or multiple geographically close data centers. A region can usually include multiple AZs.

Likewise, the multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or across multiple VPCs. A VPC is usually set up within one region. Communication between two VPCs in the same region, as well as cross-region communication between VPCs in different regions, requires a communication gateway in each VPC, and the interconnection between VPCs is realized through these communication gateways.

As an example of a module implemented as a hardware functional unit, the action optimization module 503 may include at least one computing device, such as a server. Alternatively, the action optimization module 503 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

Multiple computing devices included in the action optimization module 503 may be distributed in the same region or in different regions. Multiple computing devices included in the action optimization module 503 may be distributed in the same AZ or in different AZs. Similarly, multiple computing devices included in the action optimization module 503 may be distributed in the same VPC or in multiple VPCs. The plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

Next, based on the content described above, a digital human driving method provided by the embodiment of the present application is introduced. It can be understood that this method is proposed based on the content described above, and part or all of the content of the method can be referred to the above description.

Please refer to Figure 10, which is a schematic flowchart of a digital human optimization method provided by an embodiment of the present application. This method is suitable for scenarios in which digital humans are driven by human beings (real-person driving scenarios). As shown in Figure 10, the method includes: S1001-S1005.

In S1001, the digital human optimization system is triggered to start the digital human action optimization service. For example, the digital human optimization system can enable the action optimization service by default. It is also possible for the user to determine whether the action optimization service of the digital human optimization system needs to be enabled.

In S1002, the first action data generated by the performer is collected. For example, the first action data generated by the performer can be collected through the data collection module 501 in the digital human optimization system 500 described in FIG. 5 above. The collected first action data may be video file data containing the performer's actions, or may be spatial data of the key points of the performer's body.

In S1003, when the digital human action optimization service is enabled in the digital human optimization system, the first action data is matched with the action data in the action library. For example, whether the digital human action optimization service is enabled can be determined through the digital human driving module 502 in Figure 5 above. Further, the digital human driving module 502 also needs to determine the type of the first action data. When the first action data is video file data, the digital human driving module 502 also needs to identify the body key points of the performer in the video file and determine the spatial data of the identified body key points. The digital human driving module 502 then uses the spatial data of the recognized body key points as the first action data. When the first action data received by the digital human driving module 502 is already spatial data of body key points, the digital human driving module 502 does not need to process the first action data. For example, the first action data can be matched with the action data in the reference action library 5032 through the action optimization module 503 in FIG. 5 above. For the specific implementation process, refer to the description of the action optimization module 503 in Figure 5 above.

In S1004, when second action data matching the first action data exists in the reference action library, the digital human is driven according to the second action data. For example, the first action data can be optimized through the action optimization module 503 in Figure 5 above to obtain the third action data. The optimization of the first action data can be divided into scenarios where delay needs to be considered and scenarios where it does not. For the specific implementation process, refer to the methods described in Figure 7, Figure 8, and Figure 9 above.

For example, when second action data matching the first action data exists in the reference action library, the digital human can be driven directly based on the second action data. In this case, the performer only needs to perform the starting portion of the first action. The action optimization module 503 then searches the reference action library for a second action that matches the first action based on that starting portion, and sends the action data frames of the second action to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received action data frames of the second action. This reduces the difficulty of the performer's actions.

In the embodiment of the present application, before driving the digital human, the digital human driving module first judges the collected first action data to determine whether the first action data needs to be optimized. When the digital human driving module determines that the first action data needs to be optimized, it sends the first action data to the action optimization module for optimization. The digital human driving module then drives the digital human based on the optimized action data, which avoids distortion of the digital human's movements and reduces clipping of the digital human model.

The present application also provides a computing device 1100. As shown in Figure 11, computing device 1100 includes: bus 1102, processor 1104, memory 1106, and communication interface 1108. The processor 1104, the memory 1106 and the communication interface 1108 communicate through a bus 1102. Computing device 1100 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 1100.

The bus 1102 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one line is used in Figure 11, but this does not mean that there is only one bus or one type of bus. The bus 1102 may include a path that carries information between the components of the computing device 1100 (e.g., the memory 1106, the processor 1104, and the communication interface 1108).

The processor 1104 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).

The memory 1106 may include volatile memory, such as random access memory (RAM). The memory 1106 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).

The memory 1106 stores executable program code, and the processor 1104 executes the executable program code to respectively realize the functions of the aforementioned data collection module 501, digital human driving module 502, action optimization module 503, and digital human rendering module 504, thereby implementing the digital human optimization method. That is, the memory 1106 stores instructions for executing the digital human optimization method.

The communication interface 1108 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 1100 and other devices or communication networks.

An embodiment of the present application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.

As shown in Figure 12, the computing device cluster includes at least one computing device 1100. The same instructions for performing the digital human optimization method may be stored in the memory 1106 of one or more computing devices 1100 in the cluster of computing devices.

In some possible implementations, the memory 1106 of one or more computing devices 1100 in the computing device cluster may also respectively store part of the instructions for executing the digital human optimization method. In other words, a combination of one or more computing devices 1100 may collectively execute instructions for performing a digital human optimization method.

It should be noted that the memory 1106 in different computing devices 1100 in the computing device cluster can store different instructions, which are respectively used to execute part of the functions of the digital human driving system. That is, the instructions stored in the memory 1106 in different computing devices 1100 can implement the functions of one or more modules in the data collection module 501, the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504.

In some possible implementations, one or more computing devices in a cluster of computing devices may be connected through a network. The network may be a wide area network, a local area network, or the like. Figure 13 shows a possible implementation. As shown in Figure 13, two computing devices 1100A and 1100B are connected through a network. Specifically, each computing device connects to the network through its communication interface. In this type of possible implementation, the memory 1106 in the computing device 1100A stores instructions for performing the functions of the data collection module 501. Meanwhile, the memory 1106 in the computing device 1100B stores instructions for executing the functions of the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504.

The deployment across the computing device cluster shown in Figure 13 may be chosen for the following reason: considering that the digital human action optimization method provided by this application needs to store a large amount of data and perform a large amount of computation, the functions implemented by the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504 are executed by the computing device 1100B.

It should be understood that the functions of the computing device 1100A shown in Figure 13 may also be performed by multiple computing devices 1100. Likewise, the functions of the computing device 1100B may also be performed by multiple computing devices 1100.
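
As a rough illustration of the deployment described above, the following sketch records which modules each computing device hosts and checks that every module of the digital human driving system is hosted by at least one device. The device labels and module numbers follow Figure 13; the dictionary layout, the module name strings, and the helper names `hosts_of` and `uncovered_modules` are hypothetical and are not defined by this application.

```python
# Hypothetical sketch: which computing device stores which module's instructions.
# Module name strings are illustrative labels for modules 501-504.
MODULES = (
    "data_collection_501",
    "digital_human_driving_502",
    "action_optimization_503",
    "digital_human_rendering_504",
)

# Assumed layout mirroring Figure 13: device 1100A collects data,
# device 1100B hosts the storage- and compute-heavy modules.
DEPLOYMENT = {
    "1100A": {"data_collection_501"},
    "1100B": {
        "digital_human_driving_502",
        "action_optimization_503",
        "digital_human_rendering_504",
    },
}


def hosts_of(module, deployment):
    """Return the sorted device labels whose memory stores the module's instructions."""
    return sorted(dev for dev, mods in deployment.items() if module in mods)


def uncovered_modules(deployment):
    """Return the modules that no device in the deployment hosts."""
    hosted = set().union(*deployment.values())
    return [m for m in MODULES if m not in hosted]
```

A module may appear under several device labels, which corresponds to the case in which the functions of one device are performed by multiple computing devices.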

An embodiment of the present application also provides a computer program product containing instructions. The computer program product may be software or a program product that contains instructions and that can run on a computing device or be stored in any available medium. When the computer program product is run on at least one computing device, the at least one computing device is caused to execute the digital human action optimization method.

An embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium may be any available medium on which a computing device can store data, or a data storage device, such as a data center, that contains one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state drive), etc. The computer-readable storage medium includes instructions that instruct a computing device to perform the digital human action optimization method.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the protection scope of the technical solutions of the embodiments of the present invention.

Claims

1. A digital human driving method, characterized in that the method comprises:

obtaining first action data generated by a target object;

matching the first action data with action data in an action library; and

when second action data that matches the first action data exists in the action library, driving a digital human according to the second action data.
2. The method according to claim 1, wherein driving the digital human according to the second action data comprises:

driving the digital human according to the first action data and the second action data.
3. The method according to claim 2, wherein driving the digital human according to the first action data and the second action data comprises:

optimizing, according to the second action data, at least one data frame in the first action data to obtain optimized first action data; and

driving the digital human according to the optimized first action data.
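
The claims do not fix how the second action data is used to optimize a data frame. One possibility, shown purely as an assumption, is to blend a captured frame toward the corresponding frame of the matching library action by linear interpolation. In this sketch a frame is modelled as a flat list of joint values; the function names `blend_frame` and `optimize_frames` and the weight `alpha` are hypothetical.

```python
def blend_frame(captured, reference, alpha=0.5):
    """Linearly interpolate one captured frame toward a reference frame.

    alpha = 0 keeps the captured frame, alpha = 1 replaces it with the reference.
    The blending rule itself is an assumption, not taken from the application.
    """
    if len(captured) != len(reference):
        raise ValueError("frames must have the same number of joint values")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1 - alpha) * c + alpha * r for c, r in zip(captured, reference)]


def optimize_frames(first, second, frame_indices, alpha=0.5):
    """Return a copy of `first` in which only the selected frames are blended."""
    optimized = [list(frame) for frame in first]
    for i in frame_indices:
        optimized[i] = blend_frame(first[i], second[i], alpha)
    return optimized
```

Only the frames listed in `frame_indices` change, which corresponds to optimizing at least one data frame while leaving the remaining frames as captured.
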
4. The method according to claim 1, wherein matching the first action data with action data in the action library comprises:

performing feature extraction on m action data frames in the first action data to determine a feature value of the first action data, where m is a natural number greater than or equal to 2;

comparing the feature value of the first action data with the feature value of the action data in the action library; and

when the similarity between the feature value of the first action data and the feature value of the action data in the action library is greater than a similarity threshold, determining that action data matching the first action data exists in the action library.
5. The method according to claim 4, characterized in that performing feature extraction on the m action data frames to determine the feature value of the first action data comprises:

obtaining spatial information and temporal motion information of each data frame in the m action data frames; and

generating the feature value of the first action data based on the spatial information and temporal motion information of each data frame.
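
The matching steps above can be sketched as follows. The claims do not specify the feature or the similarity measure, so this sketch makes several assumptions: the spatial information is taken as the mean of each joint value over the m frames, the temporal motion information as the mean frame-to-frame change of each joint value, and the similarity as cosine similarity. The names `extract_features`, `cosine_similarity`, and `find_match`, and the default threshold, are hypothetical.

```python
import math


def extract_features(frames):
    """Build one feature vector from m >= 2 frames (lists of joint values).

    Assumed feature: per-joint mean (spatial part) followed by
    per-joint mean frame-to-frame difference (temporal part).
    """
    m = len(frames)
    if m < 2:
        raise ValueError("at least 2 frames are required")
    n = len(frames[0])
    spatial = [sum(f[j] for f in frames) / m for j in range(n)]
    temporal = [
        sum(frames[t + 1][j] - frames[t][j] for t in range(m - 1)) / (m - 1)
        for j in range(n)
    ]
    return spatial + temporal


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_match(frames, library, threshold=0.95):
    """Return the name of the most similar library action whose similarity
    exceeds the threshold, or None if no library action matches."""
    query = extract_features(frames)
    best_name, best_score = None, threshold
    for name, lib_frames in library.items():
        score = cosine_similarity(query, extract_features(lib_frames))
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

A return value of `None` corresponds to the case in which no second action data exists in the action library.
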
6. The method according to claim 1, characterized in that, before matching the first action data with the action data in the action library, the method further comprises:

determining whether a digital human action optimization system has enabled an action data optimization function, and, in response to the action data optimization function being enabled, matching the first action data with the action data in the action library.
7. A digital human driving system, characterized by comprising:

a collection module, configured to obtain first action data generated by a target object;

an optimization module, configured to match the first action data with action data in an action library; and

a processing module, configured to drive a digital human according to second action data when the second action data that matches the first action data exists in the action library.
8. The system according to claim 7, characterized in that the processing module is configured to:

drive the digital human according to the first action data and the second action data.
9. The system according to claim 8, characterized in that the optimization module is configured to optimize at least one data frame in the first action data according to the second action data to obtain optimized first action data; and

the processing module is configured to drive the digital human according to the optimized first action data.
10. The system according to claim 7, characterized in that the optimization module is configured to:

perform feature extraction on m action data frames in the first action data to determine a feature value of the first action data, where m is a natural number greater than or equal to 2;

compare the feature value of the first action data with the feature value of the action data in the action library; and

when the similarity between the feature value of the first action data and the feature value of the action data in the action library is greater than a similarity threshold, determine that action data matching the first action data exists in the action library.
11. The system according to claim 10, characterized in that the optimization module is configured to:

obtain spatial information and temporal motion information of each data frame in the m action data frames; and

generate the feature value of the first action data based on the spatial information and temporal motion information of each data frame.
12. The system according to claim 7, characterized in that the optimization module is further configured to:

determine whether a digital human action optimization system has enabled an action data optimization function, and, in response to the action data optimization function being enabled, match the first action data with the action data in the action library.
13. An electronic device, characterized by comprising:

at least one memory, configured to store a program; and

at least one processor, configured to execute the program stored in the memory;

wherein, when the program stored in the memory is executed, the processor is configured to perform the method according to any one of claims 1-6.
14. A computing device cluster, characterized by comprising at least one computing device, each computing device comprising a processor and a memory;

the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to any one of claims 1-6.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is run on a processor, the processor is caused to execute the method according to any one of claims 1-6.
16. A computer program product, characterized in that, when the computer program product is run on a processor, the processor is caused to execute the method according to any one of claims 1-6.