WO2024007648A1 - Digital human driving method and system, and device - Google Patents

Publication number
WO2024007648A1
WO2024007648A1 PCT/CN2023/087222 CN2023087222W WO2024007648A1 WO 2024007648 A1 WO2024007648 A1 WO 2024007648A1 CN 2023087222 W CN2023087222 W CN 2023087222W WO 2024007648 A1 WO2024007648 A1 WO 2024007648A1
Authority
WO
WIPO (PCT)
Prior art keywords
action
action data
data
digital human
library
Prior art date
Application number
PCT/CN2023/087222
Other languages
French (fr)
Chinese (zh)
Inventor
郑洛
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2024007648A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/48 Matching video sequences

Definitions

  • the invention relates to the field of computer technology, and in particular to a digital human driving method and system.
  • Real-person driving means using capture equipment to obtain the action and voice data of a real person (the performer) and transferring that data to the digital human, which then performs the corresponding actions based on the data.
  • AI driving generally uses actions already produced in the action library to drive digital humans.
  • The present application provides a digital human driving method and system. Before the digital human is driven, the acquired action data of the performer is optimized and predicted based on the data in the action library. The digital human is then driven based on the optimized action data. This avoids deformation of the digital human's movements and reduces model clipping (interpenetration) of the digital human.
  • This application provides a digital human driving method, which includes: obtaining first action data generated by a target object; matching the first action data with the action data in an action library; and, when second action data matching the first action data exists in the action library, driving the digital human according to the second action data.
  • In this solution, the performer's action data is optimized before the digital human is driven based on the action data of the performer (that is, the target object). After the first action data is acquired, it is first determined whether the action library contains action data matching the first action data. When the action library contains second action data matching the first action data, the digital human is driven according to the second action data, which avoids deformation of the digital human's movements and reduces model clipping (interpenetration) of the digital human.
  • Driving the digital human according to the second action data includes: driving the digital human according to the first action data and the second action data.
  • The first action data can be optimized based on the second action data, and the digital human can then be driven based on the optimized first action data. This avoids deformation of the digital human's movements and reduces model clipping (interpenetration) of the digital human.
  • Driving the digital human based on the first action data and the second action data includes: optimizing at least one data frame in the first action data according to the second action data to obtain optimized first action data; and driving the digital human based on the optimized first action data.
  • optimization methods can be selected when optimizing the first action data.
  • When time delay does not need to be considered, once it is determined that second action data matching the first action data exists in the action library, optimization can start from the first data frame of the first action data, which improves the accuracy of the digital human's actions.
  • When time delay needs to be considered, after the action corresponding to the first action data has been determined from m frames of action data and it has been determined that second action data matching the first action data exists in the action library, optimization can start from the m-th data frame of the first action data. This reduces the delay while improving the accuracy of the digital human's actions as much as possible.
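  • The text does not specify how a data frame is optimized against the matched data. As a minimal sketch only, assuming that frames are flat lists of coordinates, that the two sequences are aligned by frame index, and that "optimizing" means linear blending toward the reference frame, the choice of where optimization starts can be expressed as a `start` parameter. The function name `optimize_frames` and the `weight` parameter are hypothetical.

```python
def optimize_frames(first, second, start, weight=0.5):
    """Blend frames of `first` toward `second` from index `start` onward.

    Assumptions (not from the source): frames are flat lists of floats,
    both sequences are aligned by index, and optimization is a linear blend.
    Frames before `start`, or beyond the end of `second`, are left unchanged.
    """
    optimized = []
    for i, frame in enumerate(first):
        if i < start or i >= len(second):
            optimized.append(list(frame))
            continue
        reference = second[i]
        optimized.append(
            [(1 - weight) * a + weight * b for a, b in zip(frame, reference)]
        )
    return optimized


captured = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
reference = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

# start=0: optimize from the first data frame (delay is not a concern).
from_first = optimize_frames(captured, reference, start=0)
# start=1: earlier frames were already used for driving, so leave them as captured.
from_later = optimize_frames(captured, reference, start=1)
```

  • In this sketch a larger `start` trades accuracy on the early frames for lower delay, because those frames can be passed on before matching has finished.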
  • Matching the first action data with the action data in the action library includes: performing feature extraction on m action data frames in the first action data to determine the feature value of the first action data; comparing the feature value of the first action data with the feature values of the action data in the action library; and, when the similarity between the feature value of the first action data and the feature value of action data in the action library is greater than a similarity threshold, determining that action data matching the first action data exists in the action library.
  • The action corresponding to the first action data can be determined based on the received action data frames.
  • the multiple action data frames of the received first action data can be matched with the action data in the action library.
  • The feature values of the multiple action data frames can be used as the feature value of the first action data and compared with the feature values of the action data in the action library. Only when the similarity between the feature value of the first action data and the feature value of some action data in the action library reaches a certain threshold can it be determined that the action library contains action data matching the first action data.
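  • The threshold comparison described above can be sketched as follows. The source does not name a similarity measure, so cosine similarity is an assumption here, as are the function names, the threshold value of 0.9, and the toy library contents.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_match(query_feature, library, threshold=0.9):
    """Return the name of the most similar library entry whose similarity
    is greater than `threshold`, or None when no entry qualifies."""
    best_name, best_score = None, threshold
    for name, feature in library.items():
        score = cosine_similarity(query_feature, feature)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical action library: action name -> feature vector.
library = {"wave": [0.0, 1.0, 0.5], "bow": [1.0, 0.0, -0.5]}

matched = find_match([0.1, 0.9, 0.5], library)    # close to "wave"
unmatched = find_match([1.0, 1.0, 0.0], library)  # not close to anything
```

  • Returning None when nothing exceeds the threshold corresponds to the case in which no second action data exists and the digital human would be driven from the first action data alone.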
  • Performing feature extraction on m action data frames and determining the feature value of the first action data includes: obtaining the spatial information and temporal motion information of each data frame in the m action data frames; and generating the feature value of the first action data according to the spatial information and temporal motion information.
  • The feature value of the first action data can thus be determined based on the spatial information and temporal motion information of each data frame in the first action data.
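  • As an illustration only, one simple way to combine spatial and temporal motion information into a single feature vector is to record, for every key point coordinate, its mean position over the m frames (spatial) and its mean per-frame displacement (temporal). The source does not say which features are used; the function name `extract_feature` and this particular choice of statistics are assumptions.

```python
def extract_feature(frames):
    """Build a feature vector from m frames of key points.

    `frames` is a list of frames; each frame is a list of (x, y, z) tuples.
    For every coordinate of every key point the vector holds:
      - the mean position over all frames (spatial information), and
      - the mean displacement per frame step (temporal motion information).
    """
    if len(frames) < 2:
        raise ValueError("at least two frames are needed for motion information")
    steps = len(frames) - 1
    feature = []
    for point in range(len(frames[0])):
        for axis in range(3):
            mean_position = sum(f[point][axis] for f in frames) / len(frames)
            mean_velocity = (frames[-1][point][axis] - frames[0][point][axis]) / steps
            feature.extend([mean_position, mean_velocity])
    return feature


# One key point moving upward along y by 1.0 per frame.
frames = [[(0.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)], [(0.0, 2.0, 0.0)]]
feature = extract_feature(frames)
```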
  • Before matching the first action data with the action data in the action library, the method further includes: determining whether the digital human action optimization system has turned on the action data optimization function, and, when the system has turned on the action data optimization function, matching the first action data with the action data in the action library.
  • The user can decide whether to turn on the action data optimization function, which improves the user experience.
  • this application provides a digital human driving system, including:
  • a collection module, used to obtain the first action data generated by the target object;
  • an optimization module, used to match the first action data with the action data in the action library;
  • a processing module, configured to drive the digital human according to the second action data when second action data in the action library matches the first action data.
  • The processing module is used to drive the digital human based on the first action data and the second action data.
  • the optimization module is configured to: optimize at least one data frame in the first action data according to the second action data to obtain optimized first action data;
  • the processing module is used to: drive the digital human based on the optimized first action data.
  • the optimization module is used to:
  • the optimization module is used to:
  • a feature value of the first action data is generated.
  • the optimization module is also used to:
  • this application provides an electronic device, including:
  • At least one memory for storing programs
  • At least one processor for executing programs stored in the memory
  • When the program stored in the memory is executed, the processor is configured to execute the method described in the first aspect or any possible implementation of the first aspect.
  • Embodiments of the present application provide a computing device cluster, including at least one computing device, each computing device including a processor and a memory;
  • the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method described in the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • When the computer program is run on a processor, it causes the processor to execute the method described in the first aspect or any possible implementation of the first aspect.
  • this application provides a computer program product.
  • When the computer program product is run on a processor, it causes the processor to execute the method described in the first aspect or any possible implementation of the first aspect.
  • Figure 1 is a schematic flow chart of a method of using real people to drive digital people
  • Figure 2(a) is a schematic diagram of an application scenario provided by the embodiment of the present application.
  • Figure 2(b) is a schematic diagram of another application scenario provided by the embodiment of the present application.
  • Figure 3 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of the hardware structure of a server provided by an embodiment of the present application.
  • Figure 5 is a flow chart of a digital human driving method provided in an embodiment of the present application.
  • Figure 6 is a schematic diagram of the movement changes of a person provided in the embodiment of the present application.
  • Figure 7 is a schematic diagram of a process for optimizing the first action data provided in the embodiment of the present application.
  • Figure 8 is a schematic diagram of another process of optimizing the first action data provided in the embodiment of the present application.
  • Figure 9 is a schematic diagram of another process of optimizing the first action data provided in the embodiment of the present application.
  • Figure 10 is a schematic flow chart of a digital human optimization method provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a computing device provided in an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of a computing device cluster provided in an embodiment of the present application.
  • Figure 13 is a schematic diagram of a connection method between computing device clusters provided in an embodiment of the present application.
  • Any embodiment or design solution described as "exemplary", "such as" or "for example" should not be construed as being more preferred or advantageous than other embodiments or design solutions. Rather, use of the words "exemplary", "such as" or "for example" is intended to present the concepts in a concrete manner.
  • The terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, features defined as "first" and "second" may explicitly or implicitly include one or more of these features.
  • the terms “including,” “includes,” “having,” and variations thereof all mean “including but not limited to,” unless otherwise specifically emphasized.
  • Digital human: in a narrow sense, a digital human is the product of the integration of information science and life science. It uses information science methods to conduct virtual simulations of the human body's status and functions at different levels. After the digital human is completed, it needs to be rendered and driven.
  • Driving a digital human usually takes one of two forms: real-person driving and AI driving.
  • AI driving refers to using the actions that have been produced in the action library to drive the digital human.
  • Real-person driving refers to using capture equipment to obtain human movement and voice data and passing the obtained data to the digital human, which then performs the corresponding actions based on the data.
  • Zhongzhiren (中之人): in the field of professional "virtual anchors" based on motion capture and facial capture technology, the actors behind the scenes can be called "Zhongzhiren".
  • In this application, the Zhongzhiren (the performer) may refer to a person (the target object) who performs actions for the digital human, so that the digital human can perform actions based on the performer's actions.
  • Figure 1 shows a schematic flowchart of a method for using real people to drive digital people. Referring to Figure 1, it can be seen that the method includes: S101-S105.
  • Obtaining the spatial data of the key points of the person's body includes: identifying the key points of the person's body, and then mapping the identified key points to the bone points of the person, and generating spatial data of each bone point of the person.
  • spatial data of key points of the human body can be captured through an optical motion capture system, an inertial sensor, or a camera.
  • S102 Convert the acquired spatial data of key points of the human body into motion data of the digital human
  • After the action data is obtained, the digital human can perform the corresponding actions based on that action data.
  • Rasterization is the process of converting data into visible pixels.
  • The digital human can be rendered by a graphics processing unit (GPU).
  • the specific implementation of the GPU graphics rendering pipeline can be divided into six stages.
  • the first stage is the vertex shader.
  • The input of this stage is vertex data, that is, a collection of vertices. The purpose of the vertex shader is to convert the 3D coordinates of the input vertices into another set of 3D coordinates; the vertex shader can also do some basic processing of vertex attributes.
  • the second stage is shape (primitive) assembly. This stage takes all the vertices output by the vertex shader as input and assembles all the points into the shape of the specified primitive.
  • the third stage takes a set of vertices in the form of primitives as input. It can construct new (or other) primitives by generating new vertices to generate other shapes.
  • the fourth stage maps primitives to corresponding pixels on the final screen to generate fragments.
  • a fragment is all the data needed to render a pixel.
  • the fifth stage is the fragment shader. This stage first clips the input fragment; clipping discards all pixels beyond the view to improve execution efficiency.
  • The sixth stage tests and blends fragments: it checks the depth value (z coordinate) of each fragment to determine whether the pixel is in front of or behind other objects and whether it should be discarded; it also checks the alpha value (which defines the transparency of an object) and blends objects accordingly.
  • Embodiments of the present application provide a digital human driving system. Before the digital human is driven, the acquired action data of the performer is optimized and predicted based on the data in the action library. The digital human is then driven based on the optimized or predicted action data. This avoids deformation of the digital human's movements and reduces model clipping (interpenetration) of the digital human.
  • FIG. 2(a) shows an application scenario.
  • the electronic device 100 may be included in this scenario.
  • the electronic device 100 is configured with a digital human optimization system.
  • the electronic device 100 can optimize the motion data of the digital human through the digital human optimization system, and drive the digital human based on the optimized motion data.
  • FIG. 2(b) shows another application scenario.
  • this scenario may include an electronic device 100 and a server 200.
  • The digital human optimization system may be configured on the server 200, or partially configured on the electronic device 100 and partially configured on the server 200.
  • the server 200 can optimize the action data of the digital human through the digital human optimization system. Then, the server 200 drives the digital human based on the optimized action data, renders the digital human, and generates a video. Finally, the server 200 sends the generated video file to the electronic device 100 for display.
  • When the digital human optimization system is partially configured on the electronic device 100 and partially configured on the server 200, the electronic device 100 can access the data provided by the server 200.
  • the electronic device 100 can obtain the optimized motion data of the digital human from the server 200 and drive the digital human based on the optimized motion data.
  • the electronic device 100 and the server 200 may be connected through a network such as a wired network or a wireless network.
  • The network can be a local area network (LAN) or a wide area network (WAN) (such as the Internet).
  • The network between the electronic device 100 and the server 200 can be implemented using any known network communication protocol.
  • The network communication protocol can be any of various wired or wireless communication protocols, such as Ethernet, universal serial bus (USB), FireWire, global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), new radio (NR), Bluetooth, wireless fidelity (Wi-Fi) and other communication protocols.
  • FIG. 3 shows the hardware structure of an electronic device 100.
  • the electronic device 100 may be, but is not limited to, a mobile phone, a tablet, a laptop, a wearable device, a smart TV, and other electronic devices.
  • Exemplary embodiments of electronic devices include, but are not limited to, electronic devices running iOS, Android, Windows, HarmonyOS, or other operating systems. The embodiments of this application do not specifically limit the type of electronic device.
  • The electronic device 100 may include: a processor 110, a memory 120, a display screen 130, a communication module 140 and an input device 150.
  • The processor 110, the memory 120, the display screen 130, the communication module 140 and the input device 150 can be connected through a bus or other means.
  • The processor 110 is the computing core and control core of the electronic device 100.
  • Processor 110 may include one or more processing units.
  • The processor 110 may include one or more of an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • different processing units can be independent devices or integrated in one or more processors.
  • the memory 120 may store a program, and the program may be run by the processor 110, so that the processor 110 executes some or all of the methods required to be executed by the electronic device 100 provided in the embodiment of the present application.
  • Memory 120 may also store data.
  • Processor 110 can read data stored in memory 120.
  • The memory 120 and the processor 110 may be provided separately.
  • The memory 120 may also be integrated in the processor 110.
  • the display screen 130 is used to display images, videos, etc.
  • Display screen 130 includes a display panel.
  • The display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, quantum dot light-emitting diodes (QLED), etc.
  • the communication module 140 may include at least one of a mobile communication module and a wireless communication module.
  • The communication module 140 can provide wireless communication solutions applied on the electronic device 100, including 2G/3G/4G/5G, for example, global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), and new radio (NR).
  • The communication module 140 can also provide wireless communication solutions applied on the electronic device 100 such as wireless local area network (WLAN) (for example, a wireless fidelity (Wi-Fi) network) and Bluetooth.
  • the communication module 140 can be used for the electronic device 100 to communicate with the server 200 to complete data interaction.
  • The electronic device 100 may also include an input device 150.
  • Information can be input to the electronic device 100 and/or control instructions can be issued through the input device 150.
  • The input device 150 may be, but is not limited to, a mouse, a keyboard, etc.
  • the structure illustrated in Figure 3 of the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown in the figures, or some components may be combined, some components may be separated, or some components may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • FIG. 4 shows the hardware structure of a server 200.
  • The server 200 may be, but is not limited to, a server used to provide cloud services. It may be a server or a super electronic device that can establish a communication connection with the electronic device 100 and provide the electronic device 100 with data processing, computing and/or storage functions.
  • the server 200 may be a hardware server, or may be embedded in a virtualized environment.
  • the server 200 may be a virtual machine executed on a hardware server including one or more other virtual machines.
  • The server 200 may include: a processor 210, a network interface 220, and a memory 230.
  • The processor 210, the network interface 220, and the memory 230 can be connected through a bus or other means.
  • The processor 210 (or central processing unit (CPU)) is the computing core and control core of the server 200.
  • The network interface 220 may include a standard wired interface and a wireless interface (such as a Wi-Fi or mobile communication interface), and is controlled by the processor 210 to send and receive data, for example, to receive from the network data such as target task samples sent by the electronic device 100.
  • The memory 230 is a memory device of the server 200 and is used to store programs and data, such as pre-trained models. It can be understood that the memory 230 here can be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, it can also be at least one storage device located remotely from the aforementioned processor 210.
  • The memory 230 provides storage space, which stores the server's operating system and executable program code. The operating system can include, but is not limited to, a Windows system, a Linux system, a Hongmeng system (HarmonyOS), etc., which is not limited here.
  • the structure illustrated in Figure 4 of the embodiment of the present application does not constitute a specific limitation on the server 200.
  • the server 200 may be a cloud server.
  • Server 200 may include more or fewer components than illustrated, some components may be combined, some components may be separated, or components may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • FIG. 5 shows a schematic structural diagram of a digital human driving system provided by an embodiment of the present application.
  • the system includes: a data collection module 501, a digital human driving module 502, an action optimization module 503, and a digital human rendering module 504.
  • The data collection module 501 is used to collect the first action data of the performer.
  • The first action data collected by the data collection module 501 may be video file data of the performer obtained through a camera, or spatial data of the performer's body key points obtained through motion capture equipment.
  • the motion capture equipment may include: optical motion capture system, inertial sensor, etc.
  • 17 joint points of the human body can be used as key points of the human body.
  • The 17 key points of the human body include: nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, right ankle.
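  • For illustration, the 17 key points can be held in a fixed order so that each frame of spatial data is simply a list of 17 coordinates. The identifier names, the tuple layout and the helper `get_key_point` are assumptions made for this sketch; the order follows the list above.

```python
# The 17 body key points, in the order listed in the text.
BODY_KEY_POINTS = (
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
)

# Name -> position of that key point inside a frame.
KEY_POINT_INDEX = {name: i for i, name in enumerate(BODY_KEY_POINTS)}


def get_key_point(frame, name):
    """Return the (x, y, z) coordinate of a named key point in one frame."""
    return frame[KEY_POINT_INDEX[name]]


# A dummy frame: key point i sits at x = i.
frame = [(float(i), 0.0, 0.0) for i in range(len(BODY_KEY_POINTS))]
left_wrist = get_key_point(frame, "left_wrist")
```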
  • After receiving the first action data sent by the data collection module 501, the digital human driving module 502 examines the first action data. When the first action data is video file data, the digital human driving module 502 also needs to identify the body key points of the person in the video file and determine the spatial data of the identified body key points. The digital human driving module 502 then uses the spatial data of the body key points identified from the video file data as the first action data. When the first action data received by the digital human driving module 502 is already spatial data of body key points, the digital human driving module 502 does not need to process it.
  • After receiving the first action data sent by the data collection module 501, the digital human driving module 502 also needs to determine whether the digital human optimization system has enabled the digital human action optimization service. If the service is not enabled, the digital human driving module 502 generates action driving information for the digital human according to the first action data, to drive the digital human to produce the actions corresponding to the first action data. If the service is enabled, the digital human driving module 502 sends the first action data to the action optimization module 503.
  • The user can decide whether the action optimization service of the digital human optimization system needs to be turned on. This improves the user experience and the usability of the system.
  • The digital human optimization system can also enable the action optimization service by default.
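  • The routing decision described above can be sketched as a small dispatch function. The callables `optimize` and `drive` stand in for the action optimization module and the driving step; their names, and the use of a default-on flag, are assumptions for illustration.

```python
def handle_first_action_data(first_action_data, optimize, drive,
                             optimization_enabled=True):
    """Drive the digital human, optionally passing the data through
    the optimization step first.

    `optimize` and `drive` are placeholder callables supplied by the caller.
    The flag defaults to True, mirroring a system that enables the
    optimization service by default.
    """
    if optimization_enabled:
        return drive(optimize(first_action_data))
    return drive(first_action_data)


# Stand-in implementations that only record what happened.
def fake_optimize(data):
    return data + ["optimized"]


def fake_drive(data):
    return data + ["driven"]


with_service = handle_first_action_data(["raw"], fake_optimize, fake_drive)
without_service = handle_first_action_data(
    ["raw"], fake_optimize, fake_drive, optimization_enabled=False
)
```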
  • The action optimization module 503 is used to receive the first action data sent by the digital human driving module 502, cache the data frames of the received first action data, and use the spatial information and temporal motion information carried by the cached action data frames to determine the feature value of the first action data.
  • the feature value of the first action data may be a feature vector, and the feature value of the first action data may be used to characterize the type of the first action data. Then, the feature value of the first action data is matched with the feature value of the action data in the reference action library. When there is second action data matching the first action data in the reference action library, the action optimization module 503 can optimize the first action data according to the second action data.
  • The action optimization module 503 can identify the action corresponding to the first action data only after caching a certain number of first action data frames; it can then match the first action data with the action data in the reference action library. Take the waving action as an example.
  • Figure 6 shows the relative position of the performer's left shoulder and left wrist carried by three frames of action data. Parts (a), (b), and (c) of Figure 6 respectively show the relative position of the left shoulder and left wrist carried in the first, second, and third action data frames.
  • In the first action data frame, the performer's left wrist is lower than the left shoulder. At this point, the performer's action cannot be recognized.
  • In the second action data frame, the left wrist is level with the left shoulder. The action still cannot be recognized.
  • In the third action data frame, the left wrist is higher than the left shoulder. At this point, the action can be determined to be a waving action.
  • A first threshold can be set in advance.
  • When the number of cached data frames of the first action data is greater than or equal to the first threshold, the action optimization module 503 can match the cached data frames with the action data in the reference action library to determine whether action data matching the first action data exists in the reference action library.
  • The first threshold may be set manually. It must be chosen so that a specific action can be determined from a number of data frames that meets the first threshold.
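  • The waving example and the first threshold can be illustrated with a small sketch. It is only an assumption about how such a check might look: the frame layout (a dict of key point names to `(x, y, z)` positions with `y` pointing up), the function names, and the rule "the wrist rises from below to above the shoulder" are all hypothetical, and the threshold value 3 is taken from the three-frame example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical frame layout: key point name -> (x, y, z), with y pointing up.
Frame = Dict[str, Tuple[float, float, float]]

FIRST_THRESHOLD = 3  # assumed value, matching the three-frame waving example


def wrist_offset(frame: Frame) -> float:
    """Height of the left wrist relative to the left shoulder."""
    return frame["left_wrist"][1] - frame["left_shoulder"][1]


def recognize(cache: List[Frame]) -> Optional[str]:
    """Return "wave" once enough cached frames show the wrist rising past the shoulder."""
    if len(cache) < FIRST_THRESHOLD:
        return None  # too few frames to determine any action
    offsets = [wrist_offset(f) for f in cache[-FIRST_THRESHOLD:]]
    rising = all(a < b for a, b in zip(offsets, offsets[1:]))
    if rising and offsets[0] < 0 and offsets[-1] > 0:
        return "wave"
    return None
```

  • With fewer cached frames than the threshold the function returns `None`, which mirrors the first two frames of Figure 6, where the action cannot yet be recognized.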
  • Each time the action optimization module 503 receives a data frame, it can make a judgment on that data frame together with the data frames cached before it.
  • The action optimization module 503 can continue to receive the first action data sent by the digital human driving module 502 during this process. That is, the action optimization module 503 can receive the first action data sent by the digital human driving module 502 while matching the data frames of the first action data already received with the action data in the reference action library.
  • When action data matching the first action data exists in the reference action library, the action optimization module 503 optimizes the first action data according to that action data in one of two situations.
  • In the first situation, delay needs to be considered. In the second situation, delay does not need to be considered.
  • The first situation can correspond to a live broadcast scenario. In this scenario, the action optimization module needs to take delay into account when optimizing the first action data.
  • The second situation can correspond to an on-demand scenario. In this scenario, the action optimization module does not need to take delay into account when optimizing the first action data.
  • Figure 7 is a schematic diagram of the process of optimizing the first action data by the action optimization module 503.
  • the action corresponding to the first action data in FIG. 7 may be the waving action illustrated in FIG. 6 .
  • The optimization of the first action data by the action optimization module 503 includes S701-S707.
  • S701. The action optimization module 503 receives and caches the first data frame of the first action data sent by the digital human driving module 502, and attempts to determine the action corresponding to the first action data based on the received first data frame.
  • S702. When the action optimization module 503 cannot determine the action corresponding to the first action data, it sends the first data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received first data frame.
  • S703. The action optimization module 503 receives and caches the second data frame sent by the digital human driving module 502, and attempts to determine the action corresponding to the first action data based on the first data frame and the second data frame.
  • S704. When the action optimization module 503 cannot determine the action corresponding to the first action data, it sends the second data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received second data frame.
  • S705. The action optimization module 503 receives and caches the third data frame sent by the digital human driving module 502, and determines the action corresponding to the first action data based on the first data frame, the second data frame, and the third data frame.
  • S706. After determining the action corresponding to the first action data, the action optimization module 503 determines whether action data matching the first action data exists in the reference action library. When matching action data exists, the action optimization module 503 optimizes the third data frame according to that action data.
  • S707. The action optimization module 503 sends the optimized third data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the optimized third data frame.
  • In the live broadcast scenario, the action optimization module 503 receives data frames of the first action data sent by the digital human driving module 502 while making a judgment on the data frames already received.
  • When the action optimization module 503 cannot determine the action corresponding to the first action data based on the currently received data frame and the cached data frames, it directly returns the currently received data frame to the digital human driving module 502. The digital human driving module 502 then drives the digital human according to that data frame, which avoids long delays during the live broadcast.
  • When the action optimization module 503 can determine the action corresponding to the first action data based on the currently received data frame (the third data frame) and the cached data frames, and action data matching the first action data exists in the reference action library, it repairs the current data frame and the data frames received thereafter according to the action data in the reference action library. The data frames before the current data frame (the first data frame and the second data frame) do not need to be repaired.
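  • The live broadcast flow can be sketched as a per-frame handler. The class below is a hypothetical illustration, not the implementation of the application: `recognize`, `lookup`, and `repair` are assumed callables standing in for action recognition, the reference action library query, and the repair of a frame.

```python
from typing import Callable, List, Optional


class LiveOptimizer:
    """Pass frames through until the action is recognized, then repair each new frame."""

    def __init__(
        self,
        recognize: Callable[[List[object]], Optional[str]],
        lookup: Callable[[str], Optional[list]],
        repair: Callable[[object, list, int], object],
    ) -> None:
        self.cache: List[object] = []
        self.recognize = recognize  # cached frames -> action name, or None
        self.lookup = lookup        # action name -> reference frames, or None
        self.repair = repair        # (frame, reference, frame index) -> repaired frame

    def on_frame(self, frame: object) -> object:
        self.cache.append(frame)
        action = self.recognize(self.cache)
        reference = self.lookup(action) if action is not None else None
        if reference is None:
            # Unknown action or no match: return the frame unchanged to keep delay low.
            return frame
        # Only the current frame is repaired; earlier frames were already sent.
        return self.repair(frame, reference, len(self.cache) - 1)
```

  • Because each frame is returned as soon as it is handled, frames sent before recognition are never revisited, which matches the behavior described for the first and second data frames.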
  • Figure 8 is a schematic diagram of the process of optimizing the first action data by the action optimization module 503.
  • the action corresponding to the first action data in Figure 8 may be the waving action illustrated in Figure 6 .
  • The optimization of the first action data by the action optimization module 503 includes S801-S805.
  • S801. The action optimization module 503 receives and caches the first data frame of the first action data sent by the digital human driving module 502, and determines the action corresponding to the first action data based on the received first data frame.
  • The action optimization module 503 optimizes the first data frame according to the action data in the reference action library.
  • S805. The action optimization module 503 sends the optimized first data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the optimized first data frame.
  • Figure 9 shows the action optimization module 503 optimizing the first action data.
  • the action corresponding to the first action data in FIG. 9 may be the waving action illustrated in FIG. 6 .
  • The optimization of the first action data by the action optimization module 503 includes S901-S906.
  • S901. The action optimization module 503 receives and caches the first data frame of the first action data sent by the digital human driving module 502.
  • S902. The action optimization module 503 receives and caches the second data frame of the first action data sent by the digital human driving module 502, and determines whether the number of cached data frames is greater than or equal to the first threshold (for the waving action in Figure 6, the first threshold can be set to 3).
  • S903. The action optimization module 503 receives and caches the third data frame of the first action data sent by the digital human driving module 502, and determines whether the number of cached data frames is greater than or equal to the first threshold.
  • S904. When the number of data frames of the first action data cached in the action optimization module 503 is greater than or equal to the first threshold, the action optimization module 503 determines whether action data matching the first action data exists in the reference action library.
  • S905. When action data matching the first action data exists in the reference action library, the action optimization module 503 optimizes the first data frame according to the action data in the reference action library.
  • S906. The action optimization module 503 sends the optimized first data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the optimized first data frame.
  • In the on-demand scenario, the action optimization module 503 receives data frames of the first action data sent by the digital human driving module 502 while optimizing the cached data frames of the first action data.
  • Because the on-demand scenario places no requirement on delay, the action optimization module 503 can start optimizing from the first data frame of the first action data.
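  • For contrast with the live broadcast case, the on-demand flow can be sketched as a batch function that buffers frames, looks for a match once the first threshold is reached, and then optimizes from the first frame onward. The function name and the `find_match` and `repair` callables are hypothetical stand-ins, not part of the application.

```python
from typing import Callable, List, Optional


def optimize_on_demand(
    frames: List[object],
    first_threshold: int,
    find_match: Callable[[List[object]], Optional[list]],
    repair: Callable[[object, list, int], object],
) -> List[object]:
    """Buffer all frames, match once enough are cached, then repair from frame 0."""
    cache: List[object] = []
    reference: Optional[list] = None
    for frame in frames:
        cache.append(frame)
        # Query the reference library only once the cache meets the first threshold.
        if reference is None and len(cache) >= first_threshold:
            reference = find_match(cache)
    if reference is None:
        return list(cache)  # no match: frames are returned unchanged
    # Delay is not a concern here, so every frame, including the first, is repaired.
    return [repair(f, reference, i) for i, f in enumerate(cache)]
```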
  • The action optimization module 503 matches the cached data frames with the action data in the reference action library.
  • Specifically, the action optimization module 503 may match the feature value of the cached data frames with the feature values of the action data in the reference action library.
  • When the similarity between the feature value of the first action data and the feature value of second action data in the reference action library meets a similarity threshold, the second action data can be determined to match the first action data.
  • The action optimization module 503 can then optimize the data frames of the first action data according to the second action data.
  • The feature values of the action data in the reference action library and the feature value of the first action data may be feature vectors.
  • The feature vectors of the action data in the reference action library are calculated in advance.
  • The feature value of the first action data can be determined based on the relative spatial position information and the temporal motion information of each frame in the first action data.
  • For example, a feature vector may be generated by calculating the trajectory of each key point's quaternion as it changes over time across the data frames of the first action data.
  • The similarity threshold may be preset by the user.
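  • A minimal sketch of feature-vector matching against a pre-computed library follows. It makes several assumptions that are not in the source: the feature vector is built from per-frame values plus frame-to-frame differences (a simple stand-in for the quaternion trajectory mentioned above), cosine similarity is used as the similarity measure, and the names `feature_vector`, `cosine_similarity`, and `best_match` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def feature_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-frame values (spatial part) with frame-to-frame deltas (temporal part)."""
    spatial = [v for frame in frames for v in frame]
    temporal = [
        b - a
        for prev, cur in zip(frames, frames[1:])
        for a, b in zip(prev, cur)
    ]
    return spatial + temporal


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(
    query: Sequence[float],
    library: Dict[str, List[float]],
    threshold: float,
) -> Optional[str]:
    """Return the library action most similar to the query, if it meets the threshold."""
    best_name, best_score = None, threshold
    for name, vec in library.items():
        if len(vec) != len(query):
            continue  # only vectors of the same length are comparable
        score = cosine_similarity(query, vec)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

  • The library vectors would be computed once, ahead of time, with the same `feature_vector` function, so that only the query vector is computed while frames arrive.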
  • After the action optimization module 503 determines that action data in the reference action library matches the first action data, it can determine, based on that action data, the next data frame following the frame of the first action data that it has currently received. The action optimization module 503 then sends this next data frame to the digital human driving module, so that the digital human driving module drives the digital human according to the received data frame. In this embodiment, the action optimization module predicts the data frames of the first action data based on the action data in the reference action library, which avoids abrupt changes in the action caused by acquisition jitter and makes the action more natural.
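  • Next-frame prediction from a matched reference action can be sketched as below. This is only an illustration: frames are assumed to be flat lists of floats, and both function names and the blending weight are hypothetical. Blending the captured frame with the predicted one is an added assumption about how jitter might be damped, not something the source specifies.

```python
from typing import List, Sequence


def predict_next_frame(reference: Sequence[Sequence[float]], current_index: int) -> List[float]:
    """Return the reference frame after current_index, clamped to the last frame."""
    next_index = min(current_index + 1, len(reference) - 1)
    return list(reference[next_index])


def blend(captured: Sequence[float], predicted: Sequence[float], weight: float = 0.5) -> List[float]:
    """Linear blend of two frames; weight=1.0 uses the predicted frame only (assumed scheme)."""
    return [(1 - weight) * c + weight * p for c, p in zip(captured, predicted)]
```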
  • The action optimization module 503 can also stop receiving data frames of the first action data. In that case, the action optimization module 503 obtains the data frames of the second action data from the reference action library and sends them to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received data frames.
  • In this embodiment, the performer only needs to perform the starting action of the first action. The action optimization module 503 searches the reference action library for a second action that matches the first action based on the starting action of the first action, and sends the action data frames of the second action to the digital human driving module 502. The digital human driving module 502 then drives the digital human according to the received action data frames of the second action, which reduces the difficulty of the performer's actions.
  • The digital human driving module 502 can transition from the first action to the next action by inserting frames.
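  • Frame insertion between two actions can be illustrated with linear interpolation between the last pose of one action and the first pose of the next. This is a sketch under assumptions: poses are flat lists of positions, the function name is hypothetical, and linear interpolation is a swapped-in technique (rotations stored as quaternions would normally call for spherical interpolation instead).

```python
from typing import List, Sequence


def transition_frames(start: Sequence[float], end: Sequence[float], count: int) -> List[List[float]]:
    """Return `count` evenly spaced poses strictly between `start` and `end`."""
    frames = []
    for k in range(1, count + 1):
        t = k / (count + 1)  # interpolation parameter in (0, 1)
        frames.append([a + (b - a) * t for a, b in zip(start, end)])
    return frames
```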
  • the action optimization module 503 includes: a feature value calculation module 5031 , a reference action library 5032 , an action matching module 5033 , and an action optimization/prediction module 5034 .
  • the feature value calculation module 5031 is used to calculate the feature value of the first action data.
  • the reference action library 5032 is used to store verified a priori action data (standard action data).
  • The action matching module 5033 is configured to compare the feature value of the first action data with the feature values of the action data stored in the reference action library 5032.
  • When the feature value of the first action data matches the feature value of second action data stored in the reference action library 5032, the action matching module 5033 determines that the first action data matches the second action data.
  • the action optimization/prediction module 5034 is used to optimize the first action data based on the second action data, or the action optimization/prediction module 5034 is used to predict the next frame of the first action data based on the second action data.
  • the action data in the reference action library 5032 may be data provided by the service provider in advance. For example, in a live broadcast scenario, when digital people perform live broadcasts through live broadcast software, the action data in the reference action library 5032 can be provided by the live broadcast platform.
  • The digital human rendering module 504 is used to render the digital human driven by the digital human driving module 502 and generate a corresponding video.
  • Rendering is the last step of computer graphics (CG) production, in which software turns the user-designed content into the final image or animation.
  • the data collection module 501, the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504 can all be implemented by software, or can be implemented by hardware.
  • the following takes the action optimization module 503 as an example to introduce the implementation of the action optimization module 503.
  • the implementation of the data collection module 501, the digital human driving module 502, and the digital human rendering module 504 can refer to the implementation of the action optimization module 503.
  • the action optimization module 503 may include code running on a computing instance.
  • the computing instance may include at least one of a physical host (computing device), a virtual machine, and a container.
  • the above computing instance may be one or more.
  • The action optimization module 503 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code can be distributed in the same region or in different regions. Further, they can be distributed in the same availability zone (AZ) or in different AZs. Each AZ includes one data center or multiple geographically close data centers. A region usually includes multiple AZs.
  • Similarly, the multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or across multiple VPCs.
  • Communication between two VPCs in the same region, as well as cross-region communication between VPCs in different regions, requires a communication gateway in each VPC, and the VPCs are interconnected through these communication gateways.
  • the action optimization module 503 may include at least one computing device, such as a server.
  • The action optimization module 503 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • The PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • Multiple computing devices included in the action optimization module 503 may be distributed in the same region or in different regions. Multiple computing devices included in the action optimization module 503 may be distributed in the same AZ or in different AZs. Similarly, multiple computing devices included in the action optimization module 503 may be distributed in the same VPC or in multiple VPCs.
  • the plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
  • Figure 10 is a schematic flowchart of a digital human optimization method provided by an embodiment of the present application. This method is suitable for scenarios in which digital humans are driven by human beings (real-person driving scenarios). As shown in Figure 10, the method includes: S1001-S1005.
  • the digital human optimization system is triggered to start the digital human action optimization service.
  • the digital human optimization system can enable the action optimization service by default. It is also possible for the user to determine whether the action optimization service of the digital human optimization system needs to be enabled.
  • The first action data generated by the performer is collected.
  • The first action data generated by the performer can be collected through the data collection module 501 in the digital human optimization system 500 described in Figure 5 above.
  • The collected first action data may be video file data containing the performer's actions, or may be spatial data of the performer's body key points.
  • When the digital human optimization system has enabled the digital human action optimization service, the first action data is matched with the action data in the action library. For example, whether the digital human action optimization service is enabled can be determined through the digital human driving module 502 in Figure 5 above. Further, the digital human driving module 502 also needs to determine the type of the first action data. When the first action data is video file data, the digital human driving module 502 also needs to identify the body key points of the performer in the video file and determine the spatial data of the identified key points. The digital human driving module 502 then uses the spatial data of the identified body key points as the first action data.
  • When the first action data received by the digital human driving module 502 is already spatial data of body key points, the digital human driving module 502 does not need to process it.
  • The first action data can be matched with the action data in the reference action library 5032 through the action optimization module 503 in Figure 5 above. For the specific implementation process, refer to the description of the action optimization module 503 in Figure 5 above.
  • When second action data matching the first action data exists in the action library, the digital human is driven according to the second action data.
  • The first action data can be optimized through the action optimization module 503 in Figure 5 above to obtain the third action data.
  • Optimization of the first action data can be divided into scenarios where delay needs to be considered and scenarios where it does not.
  • For the specific implementation process, refer to the methods described in Figure 7, Figure 8, and Figure 9 above.
  • When second action data in the reference action library matches the first action data, the digital human can be driven directly based on the second action data. In this case, the performer only needs to perform the starting action of the first action. The action optimization module 503 then searches the reference action library for a second action that matches the first action based on the starting action of the first action, and sends the action data frames of the second action to the digital human driving module 502. The digital human driving module 502 drives the digital human according to the received action data frames of the second action, which reduces the difficulty of the performer's actions. In the embodiment of the present application, before driving the digital human, the digital human driving module first evaluates the collected first action data to determine whether it needs to be optimized.
  • When the digital human driving module determines that the first action data needs to be optimized, it sends the first action data to the action optimization module for optimization. The digital human driving module then drives the digital human based on the optimized action data, which avoids deformation of the digital human's actions and reduces clipping of the digital human.
  • computing device 1100 includes: bus 1102, processor 1104, memory 1106, and communication interface 1108.
  • the processor 1104, the memory 1106 and the communication interface 1108 communicate through a bus 1102.
  • Computing device 1100 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 1100.
  • The bus 1102 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one line is used in Figure 11, but this does not mean that there is only one bus or one type of bus.
  • The bus 1102 may include a path that carries information between the components of the computing device 1100 (for example, the memory 1106, the processor 1104, and the communication interface 1108).
  • The processor 1104 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
  • The memory 1106 may include volatile memory, such as random access memory (RAM).
  • The memory 1106 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • The memory 1106 stores executable program code, and the processor 1104 executes the executable program code to respectively implement the functions of the aforementioned data collection module 501, digital human driving module 502, action optimization module 503, and digital human rendering module 504, thereby implementing the digital human optimization method. That is, the memory 1106 stores instructions for executing the digital human optimization method.
  • The communication interface 1108 uses a transceiver module such as, but not limited to, a network interface card or a transceiver to implement communication between the computing device 1100 and other devices or communication networks.
  • An embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device may be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
  • the computing device cluster includes at least one computing device 1100.
  • the same instructions for performing the digital human optimization method may be stored in the memory 1106 of one or more computing devices 1100 in the cluster of computing devices.
  • the memory 1106 of one or more computing devices 1100 in the computing device cluster may also respectively store part of the instructions for executing the digital human optimization method.
  • a combination of one or more computing devices 1100 may collectively execute instructions for performing a digital human optimization method.
  • the memory 1106 in different computing devices 1100 in the computing device cluster can store different instructions, which are respectively used to execute part of the functions of the digital human driving system. That is, the instructions stored in the memory 1106 in different computing devices 1100 can implement the functions of one or more modules in the data collection module 501, the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504.
  • one or more computing devices in a cluster of computing devices may be connected through a network.
  • the network may be a wide area network or a local area network, etc.
  • Figure 13 shows a possible implementation. As shown in Figure 13, two computing devices 1100A and 1100B are connected through a network. Specifically, the connection to the network is made through a communication interface in each computing device.
  • The memory 1106 in the computing device 1100A stores instructions for performing the functions of the data collection module 501.
  • The memory 1106 in the computing device 1100B stores instructions for executing the functions of the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504.
  • The connection method between the computing devices shown in Figure 13 may reflect the following consideration: the digital human action optimization method provided by this application needs to store a large amount of data and perform a large amount of computation, so the functions implemented by the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504 are executed by the computing device 1100B.
  • The functions of the computing device 1100A shown in Figure 13 may also be performed by multiple computing devices 1100.
  • The functions of the computing device 1100B may also be performed by multiple computing devices 1100.
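  • The split of modules across the two devices in Figure 13 can be written down as a small placement table. The table format, the module identifiers, and the `device_for` helper are hypothetical and serve only to make the mapping concrete.

```python
from typing import Dict, List

# Hypothetical placement table: device name -> modules whose instructions it stores.
PLACEMENT: Dict[str, List[str]] = {
    "1100A": ["data_collection_501"],
    "1100B": [
        "digital_human_driving_502",
        "action_optimization_503",
        "digital_human_rendering_504",
    ],
}


def device_for(module: str, placement: Dict[str, List[str]] = PLACEMENT) -> str:
    """Return the device that hosts the given module, or raise KeyError if none does."""
    for device, modules in placement.items():
        if module in modules:
            return device
    raise KeyError(module)
```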
  • An embodiment of the present application also provides a computer program product containing instructions.
  • the computer program product may be a software or program product containing instructions capable of running on a computing device or stored in any available medium.
  • When the computer program product is run on at least one computing device, the at least one computing device is caused to execute the digital human action optimization method.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that a computing device can store or a data storage device such as a data center that contains one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, DVD), or semiconductor media (eg, solid state drive), etc.
  • the computer-readable storage medium includes instructions that instruct a computing device to perform a digital human motion optimization method.

Abstract

A digital human driving method. The method comprises: obtaining first action data, wherein the first action data is action data generated according to the action of a real human; matching the first action data and action data in an action library, and when there is second action data in the action library that matches the first action data, optimizing the first action data according to the second action data to obtain third action data; and then, driving a digital human according to the third action data. Before the digital human is driven, according to data in a prior action database, the obtained action data of the real human is optimized and predicted, and then, the digital human is driven according to the optimized action data, so that the action distortion of the digital human is avoided, and the problem of clipping of the digital human is reduced.

Description

A digital human driving method, system and device
This application claims priority to the Chinese patent application filed with the State Intellectual Property Office of China on July 8, 2022, with application number 202210797830.6 and the invention title "A digital human driving method, system and device", the entire content of which is incorporated into this application by reference.
Technical field
The present invention relates to the field of computer technology, and in particular to a digital human driving method and system.
Background
As the metaverse concept has gained popularity, digital humans, as a technology for representing real people in the metaverse, have also developed rapidly. Current digital human driving technology is divided into real-person driving and artificial intelligence (AI) driving. In real-person driving, devices are used to capture the action and voice data of the "person in the middle" (the performer), the data is passed to the digital human, and the digital human performs the corresponding actions based on the data. AI driving generally uses actions already produced in an action library to drive the digital human. When a digital human is driven by a real person, deviations between the performer and the digital human make it necessary to remap the skeleton of the performer to that of the digital human. However, remapping can deform the digital human's actions, which may lead to clipping or to actions that are not performed fully.
Summary
This application provides a digital human driving method and system. Before the digital human is driven, the captured action data of the performer is optimized and predicted according to data in an action database, and the digital human is then driven according to the optimized action data. This avoids distortion of the digital human's actions and reduces clipping of the digital human.
In a first aspect, this application provides a digital human driving method, including: obtaining first action data generated by a target object; matching the first action data against action data in an action library; and, when second action data matching the first action data exists in the action library, driving the digital human according to the second action data.
In this solution, before the digital human is driven according to the action data of the performer (that is, the target object), the performer's action data is first optimized. That is, after the first action data is obtained, it is first determined whether the action library contains action data matching the first action data. When second action data matching the first action data exists in the action library, the digital human is driven according to the second action data, which avoids distortion of the digital human's actions and reduces clipping of the digital human.
In a possible implementation, driving the digital human according to the second action data includes: driving the digital human according to the first action data and the second action data.
That is, when second action data matching the first action data exists in the action library, the first action data can be optimized according to the second action data, and the digital human is then driven according to the optimized first action data, which avoids distortion of the digital human's actions and reduces clipping of the digital human.
In a possible implementation, driving the digital human according to the first action data and the second action data includes:
optimizing at least one data frame in the first action data according to the second action data, to obtain optimized first action data.
That is, different optimization approaches can be chosen for different scenarios when optimizing the first action data. In scenarios where latency does not need to be considered, once it is determined that second action data matching the first action data exists in the action library, optimization can start from the first data frame of the first action data, which improves the accuracy of the digital human's actions. In scenarios where latency needs to be considered, once the action corresponding to the first action data has been determined from m frames of action data and it is determined that second action data matching the first action data exists in the action library, optimization can start from the m-th data frame of the first action data. This improves the accuracy of the digital human's actions as far as possible while keeping latency low.
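By way of illustration only, the two starting points for optimization can be sketched as follows. The text does not specify how a frame is optimized, so the linear blend toward the library frame, the function name `optimize_frames`, and the `weight` parameter are all assumptions made for this sketch.

```python
from typing import List, Sequence

Frame = Sequence[float]  # hypothetical layout: one flat list of joint values per frame


def optimize_frames(
    first: Sequence[Frame],
    second: Sequence[Frame],
    start: int,
    weight: float = 0.5,
) -> List[List[float]]:
    """Optimize frames of `first` from index `start` onward using `second`.

    Assumption: "optimizing" is modelled as a linear blend toward the
    corresponding library frame. Frames before `start` are passed through.
    """
    optimized = []
    for i, frame in enumerate(first):
        if start <= i < len(second):
            reference = second[i]
            optimized.append(
                [(1 - weight) * a + weight * b for a, b in zip(frame, reference)]
            )
        else:
            # Frame was already sent, or the library clip has no counterpart.
            optimized.append(list(frame))
    return optimized


captured = [[0.0], [0.0], [0.0], [0.0]]
library = [[2.0], [2.0], [2.0], [2.0]]

# Latency not a concern: optimize from the first frame (index 0).
offline = optimize_frames(captured, library, start=0)

# Latency matters and m = 3 frames were needed to recognize the action:
# optimize from the m-th frame (index m - 1).
m = 3
realtime = optimize_frames(captured, library, start=m - 1)
```

In the second call the first m - 1 frames are left untouched, which reflects the trade-off described above: they have already been used to drive the digital human before the match was known.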
In a possible implementation, matching the first action data against the action data in the action library includes: performing feature extraction on m action data frames in the first action data to determine a feature value of the first action data; comparing the feature value of the first action data with the feature values of the action data in the action library; and, if the similarity between the feature value of the first action data and the feature value of action data in the action library is greater than a similarity threshold, determining that the action library contains action data matching the first action data.
That is, because the first action data is sent frame by frame, once the number of received data frames of the first action data reaches a threshold, the action corresponding to the first action data can be determined from the received action data frames. At this point, the received action data frames of the first action data can be matched against the action data in the action library. During matching, the feature value of the multiple action data frames can be taken as the feature value of the first action data and compared with the feature values of the action data in the action library. Only when the similarity between the feature value of the first action data and the feature value of some action data in the action library reaches a certain threshold can it be determined that the action library contains action data matching the first action data.
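By way of illustration only, the threshold comparison can be sketched as follows. The text does not name a similarity measure, so cosine similarity, the dictionary-based library, and the names `cosine_similarity` and `match_action` are assumptions made for this sketch.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_action(
    feature: Sequence[float],
    library: Dict[str, Sequence[float]],
    threshold: float,
) -> Optional[str]:
    """Return the name of the most similar library action whose similarity
    is strictly greater than `threshold`, or None if there is no match."""
    best_name, best_similarity = None, threshold
    for name, library_feature in library.items():
        similarity = cosine_similarity(feature, library_feature)
        if similarity > best_similarity:
            best_name, best_similarity = name, similarity
    return best_name


# Hypothetical two-entry action library keyed by action name.
action_library = {"wave": [1.0, 0.0], "bow": [0.0, 1.0]}
matched = match_action([0.9, 0.1], action_library, threshold=0.8)
unmatched = match_action([1.0, 1.0], action_library, threshold=0.8)
```

Returning the best candidate above the threshold, rather than the first one found, is a design choice of the sketch; the text only requires that the similarity exceed the threshold.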
In a possible implementation, performing feature extraction on the m action data frames to determine the feature value of the first action data includes: obtaining spatial information and temporal motion information of each of the m action data frames; and generating the feature value of the first action data according to the spatial information and the temporal motion information.
That is, the feature value of the first action data can be determined according to the spatial information and the temporal motion information of the data frames in the first action data.
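By way of illustration only, one simple way to combine spatial and temporal motion information into a feature value is sketched below. The text does not define the feature, so the choice of the mean joint value (spatial part) and the mean frame-to-frame difference (motion part), as well as the name `extract_feature`, are assumptions made for this sketch.

```python
from typing import List, Sequence


def extract_feature(frames: Sequence[Sequence[float]]) -> List[float]:
    """Build a feature vector from m frames of action data.

    Assumed feature layout: [mean value per coordinate] followed by
    [mean frame-to-frame change per coordinate].
    """
    m = len(frames)
    if m < 2:
        raise ValueError("at least two frames are needed for motion information")
    dim = len(frames[0])

    # Spatial information: average position of each coordinate over the m frames.
    spatial = [sum(frame[j] for frame in frames) / m for j in range(dim)]

    # Temporal motion information: average per-frame displacement.
    motion = [
        sum(frames[i + 1][j] - frames[i][j] for i in range(m - 1)) / (m - 1)
        for j in range(dim)
    ]
    return spatial + motion


feature = extract_feature([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])
```

The resulting vector can then be compared against precomputed feature values of the actions in the action library.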
In a possible implementation, before the first action data is matched against the action data in the action library, the method further includes: determining whether the digital human action optimization system has enabled an action data optimization function, and, if the system has enabled the action data optimization function, matching the first action data against the action data in the action library.
That is, the user can decide whether to enable the action data optimization function, which improves the user experience.
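By way of illustration only, the first-aspect flow together with the optional switch can be sketched as follows. The function name `choose_driving_data` and the `find_match` callback are hypothetical, and falling back to the captured data when the switch is off or no match exists is an assumption of this sketch, since this part of the text only describes the matching case.

```python
from typing import Callable, Optional, Sequence

Frame = Sequence[float]
Action = Sequence[Frame]


def choose_driving_data(
    first_action: Action,
    find_match: Callable[[Action], Optional[Action]],
    optimization_enabled: bool,
) -> Action:
    """Select the action data used to drive the digital human."""
    if not optimization_enabled:
        # Assumption: with the function disabled, captured data is used as-is.
        return first_action
    second_action = find_match(first_action)
    if second_action is None:
        # Assumption: no matching library action, so use the captured data.
        return first_action
    return second_action


captured = [[0.9], [1.9]]
library_clip = [[1.0], [2.0]]

driven_on = choose_driving_data(captured, lambda a: library_clip, True)
driven_off = choose_driving_data(captured, lambda a: library_clip, False)
driven_no_match = choose_driving_data(captured, lambda a: None, True)
```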
In a second aspect, this application provides a digital human driving system, including:
a collection module, configured to obtain first action data generated by a target object;
an optimization module, configured to match the first action data against action data in an action library; and
a processing module, configured to drive the digital human according to second action data when the second action data matching the first action data exists in the action library.
In a possible implementation, the processing module is configured to:
drive the digital human according to the first action data and the second action data.
In a possible implementation, the optimization module is configured to: optimize at least one data frame in the first action data according to the second action data, to obtain optimized first action data; and
the processing module is configured to: drive the digital human according to the optimized first action data.
In a possible implementation, the optimization module is configured to:
perform feature extraction on m action data frames in the first action data to determine a feature value of the first action data, where m is a natural number greater than or equal to 2;
compare the feature value of the first action data with the feature values of the action data in the action library; and
if the similarity between the feature value of the first action data and the feature value of action data in the action library is greater than a similarity threshold, determine that the action library contains action data matching the first action data.
In a possible implementation, the optimization module is configured to:
obtain spatial information and temporal motion information of each of the m action data frames; and
generate the feature value of the first action data according to the spatial information and the temporal motion information of each data frame.
In a possible implementation, the optimization module is further configured to:
determine whether the digital human action optimization system has enabled an action data optimization function, and, if the system has enabled the action data optimization function, match the first action data against the action data in the action library.
In a third aspect, this application provides an electronic device, including:
at least one memory, configured to store a program; and
at least one processor, configured to execute the program stored in the memory;
wherein, when the program stored in the memory is executed, the processor is configured to perform the method described in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of this application provides a computing device cluster, including at least one computing device, each computing device including a processor and a memory;
the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method described in the first aspect or any possible implementation of the first aspect.
In a fifth aspect, this application provides a computer-readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method described in the first aspect or any possible implementation of the first aspect.
In a sixth aspect, this application provides a computer program product which, when run on a processor, causes the processor to perform the method described in the first aspect or any possible implementation of the first aspect.
It can be understood that, for the beneficial effects of the second to sixth aspects, reference may be made to the related description of the first aspect; details are not repeated here.
Description of the drawings
Figure 1 is a schematic flowchart of a method for driving a digital human with a real person;
Figure 2(a) is a schematic diagram of an application scenario provided by an embodiment of this application;
Figure 2(b) is a schematic diagram of another application scenario provided by an embodiment of this application;
Figure 3 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of this application;
Figure 4 is a schematic diagram of the hardware structure of a server provided by an embodiment of this application;
Figure 5 is a flowchart of a digital human driving method provided by an embodiment of this application;
Figure 6 is a schematic diagram of changes in a performer's actions provided by an embodiment of this application;
Figure 7 is a schematic diagram of a process for optimizing the first action data provided by an embodiment of this application;
Figure 8 is a schematic diagram of another process for optimizing the first action data provided by an embodiment of this application;
Figure 9 is a schematic diagram of yet another process for optimizing the first action data provided by an embodiment of this application;
Figure 10 is a schematic flowchart of a digital human optimization method provided by an embodiment of this application;
Figure 11 is a schematic structural diagram of a computing device provided by an embodiment of this application;
Figure 12 is a schematic structural diagram of a computing device cluster provided by an embodiment of this application;
Figure 13 is a schematic diagram of a connection between computing device clusters provided by an embodiment of this application.
Detailed description
To make the objectives, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments of this application are described below with reference to the accompanying drawings.
In the description of the embodiments of this application, any embodiment or design described as "exemplary", "such as" or "for example" should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, words such as "exemplary", "such as" or "for example" are intended to present the related concepts in a concrete manner.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly specifying the indicated technical features. Accordingly, a feature qualified by "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "include", "comprise", "have" and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
First, the terms used in the embodiments of this application are introduced.
1. Digital human. In the narrow sense, a digital human is a product of the integration of information science and life science: information science methods are used to virtually simulate the states and functions of the human body at different levels. Once a digital human has been created, it needs to be rendered and driven. Driving a digital human usually takes the form of real-person driving or AI driving. AI driving means driving the digital human with actions already produced in an action library. Real-person driving means using devices to capture a person's action and voice data and passing the captured data to the digital human, which performs the corresponding actions according to the data.
2. Performer (the "person inside", 中之人). In the profession of "virtual streamer", which is based on motion capture and facial capture technology, the actor behind the scenes may be called the "person inside". In the embodiments of this application, the performer may refer to a person (the target object) who performs actions for the digital human, so that the digital human can act according to the performer's actions.
By way of example, Figure 1 is a schematic flowchart of a method for driving a digital human with a real person. As shown in Figure 1, the method includes S101 to S105.
S101: Obtain spatial data of the performer's body key points.
Obtaining the spatial data of the performer's body key points includes: identifying the key points of the performer's body, mapping the identified key points to the performer's skeleton points, and generating spatial data for each skeleton point of the performer.
In a possible example, the spatial data of the performer's body key points can be captured by an optical motion capture system, inertial sensors, or a camera.
S102: Convert the obtained spatial data of the performer's body key points into action data for the digital human.
After the spatial data of the performer's body key points is obtained, action data frames also need to be extracted from the obtained spatial data.
S103: Apply the generated action data to the digital human to drive the digital human to act.
After the performer's action data is obtained, the digital human can be made to perform the corresponding actions according to the action data.
S104: Render the digital human and generate a video.
The process by which a computer converts shapes stored in memory into what is actually drawn on the screen is called rendering. The most commonly used technique in rendering is rasterization, which is the process of converting data into visible pixels.
The digital human can be rendered by a graphics processing unit (GPU). A concrete implementation of the GPU graphics rendering pipeline can be divided into six stages.
Stage 1, vertex shader. The input to this stage is vertex data, that is, a collection of vertices. The purpose of the vertex shader is to transform the 3D coordinates of the input vertices into another kind of 3D coordinates; the vertex shader can also perform some basic processing on vertex attributes.
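By way of illustration only, the coordinate conversion performed on each vertex can be sketched on the CPU as a multiplication by a 4x4 matrix in homogeneous coordinates. The function name `transform_vertex` is hypothetical, the matrix is row-major by assumption, and the division by w is folded into the same function purely for brevity.

```python
from typing import Sequence, Tuple

Matrix4 = Sequence[Sequence[float]]  # row-major 4x4 matrix (assumed layout)


def transform_vertex(
    matrix: Matrix4, vertex: Tuple[float, float, float]
) -> Tuple[float, float, float]:
    """Transform a 3D vertex by a 4x4 matrix and return the resulting 3D point."""
    x, y, z = vertex
    homogeneous = (x, y, z, 1.0)
    out = [sum(matrix[r][c] * homogeneous[c] for c in range(4)) for r in range(4)]
    w = out[3]
    return (out[0] / w, out[1] / w, out[2] / w)


# A translation by (1, 2, 3) as an example transform.
translate = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
moved = transform_vertex(translate, (1.0, 1.0, 1.0))
```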
Stage 2, shape (primitive) assembly. This stage takes all vertices output by the vertex shader as input and assembles them into the shape of the specified primitive.
Stage 3, geometry shader. This stage takes as input a collection of vertices in the form of a primitive; it can generate other shapes by emitting new vertices to construct new (or different) primitives.
Stage 4, rasterization. This stage maps primitives to the corresponding pixels on the final screen and generates fragments. A fragment is all the data needed to render one pixel.
Stage 5, fragment shader. This stage first clips the input fragments; clipping discards all pixels outside the view, which improves execution efficiency.
Stage 6, testing and blending. This stage checks the depth value (z coordinate) of each fragment to determine whether the pixel lies in front of or behind other objects and whether it should be discarded. It also checks the alpha value (which defines the transparency of an object) and blends objects accordingly.
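By way of illustration only, the depth test and alpha blending of a single fragment can be sketched as follows. The names `blend` and `process_fragment` are hypothetical, a smaller depth value is assumed to be nearer the viewer, and the common "source over" blend formula is assumed; a real GPU performs these steps in hardware with configurable depth and blend functions.

```python
from typing import Tuple

Color = Tuple[float, float, float]


def blend(src: Color, dst: Color, alpha: float) -> Color:
    """Blend a source color over a destination color using the source alpha."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))


def process_fragment(
    src_color: Color,
    src_alpha: float,
    src_depth: float,
    dst_color: Color,
    dst_depth: float,
) -> Tuple[Color, float]:
    """Return the (color, depth) stored for the pixel after this fragment."""
    if src_depth >= dst_depth:
        # Fragment lies behind what is already drawn: discard it.
        return dst_color, dst_depth
    # Fragment is in front: blend it over the existing color.
    return blend(src_color, dst_color, src_alpha), src_depth


# A half-transparent red fragment in front of an opaque blue pixel.
visible = process_fragment((1.0, 0.0, 0.0), 0.5, 0.2, (0.0, 0.0, 1.0), 0.8)

# The same fragment placed behind the blue pixel is discarded.
hidden = process_fragment((1.0, 0.0, 0.0), 0.5, 0.9, (0.0, 0.0, 1.0), 0.8)
```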
S105: Output the video.
In the above solution, obtaining the spatial data of the performer's body key points requires relatively high-precision equipment, and driving the digital human requires a performer whose body shape is similar to the digital human's. In addition, to keep the digital human's actions plausible, the performer must be asked to avoid actions that are prone to clipping, the collision detection precision of the model must be increased, and the action data must be repaired to keep it plausible.
To address the problems in the above solution, an embodiment of this application provides a digital human driving system. Before the digital human is driven, the captured action data of the performer is optimized and predicted according to data in an action database. The digital human is then driven according to the optimized or predicted action data. This avoids distortion of the digital human's actions and reduces clipping of the digital human.
Next, the technical solutions provided by the embodiments of this application are introduced.
By way of example, Figure 2(a) shows an application scenario. As shown in Figure 2(a), the scenario may include an electronic device 100 on which a digital human optimization system is configured. The electronic device 100 can use the digital human optimization system to optimize the action data of the digital human and drive the digital human according to the optimized action data.
By way of example, Figure 2(b) shows another application scenario. As shown in Figure 2(b), the scenario may include an electronic device 100 and a server 200. In this scenario, the digital human optimization system may be configured on the server 200, or partly on the electronic device 100 and partly on the server 200. When the digital human optimization system is configured on the server 200, the server 200 can use the system to optimize the action data of the digital human. The server 200 then drives the digital human according to the optimized action data, renders the digital human, and generates a video. Finally, the server 200 sends the generated video file to the electronic device 100 for display. When part of the digital human optimization system is configured on the server 200 and the other part is configured on the electronic device 100, the electronic device 100 can access data provided by the server 200. For example, the electronic device 100 can obtain the optimized action data of the digital human from the server 200 and drive the digital human according to the optimized action data.
In some embodiments, the electronic device 100 and the server 200 may be connected through a network such as a wired network or a wireless network. For example, the network may be a local area network (LAN) or a wide area network (WAN) such as the Internet. The network between the electronic device 100 and the server 200 may be implemented using any known network communication protocol, which may be any of various wired or wireless communication protocols, such as Ethernet, universal serial bus (USB), FireWire, global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), new radio (NR), Bluetooth, and wireless fidelity (Wi-Fi).
By way of example, Figure 3 shows the hardware structure of an electronic device 100. The electronic device 100 may be, but is not limited to, a mobile phone, a tablet computer, a laptop computer, a wearable device, a smart TV, or a similar electronic device. Exemplary embodiments of the electronic device include, but are not limited to, electronic devices running iOS, Android, Windows, HarmonyOS, or another operating system. The embodiments of this application do not specifically limit the type of the electronic device.
As shown in Figure 3, the electronic device 100 may include a processor 110, a memory 120, a display screen 130, a communication module 140, and an input device 150. The processor 110, the memory 120, the display screen 130, the communication module 140, and the input device 150 may be connected through a bus or in another manner.
The processor 110 is the computing core and control core of the electronic device 100. The processor 110 may include one or more processing units. For example, the processor 110 may include one or more of an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated into one or more processors.
The memory 120 may store a program that can be run by the processor 110, so that the processor 110 performs some or all of the methods to be performed by the electronic device 100 in the embodiments of this application. The memory 120 may also store data, and the processor 110 can read the data stored in the memory 120. The memory 120 and the processor 110 may be provided separately. Optionally, the memory 120 may be integrated into the processor 110.
The display screen 130 is used to display images, videos, and the like. The display screen 130 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, an active-matrix organic light-emitting diode (AMOLED) display, a flexible light-emitting diode (FLED) display, a Mini-LED, Micro-LED or Micro-OLED display, a quantum dot light-emitting diode (QLED) display, or the like.
The communication module 140 may include at least one of a mobile communication module and a wireless communication module. When the communication module 140 includes a mobile communication module, it can provide wireless communication solutions applied on the electronic device 100, including 2G/3G/4G/5G, for example global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), and new radio (NR). When the communication module 140 includes a wireless communication module, it can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR). By way of example, the communication module 140 can be used by the electronic device 100 to communicate with the server 200 to exchange data.
In some embodiments, the electronic device 100 may further include an input device 150. Information can be entered into the electronic device 100 and/or control instructions can be issued to it through the input device 150. For example, the input device 150 may be, but is not limited to, a mouse or a keyboard.
It can be understood that the structure illustrated in FIG. 3 of the embodiments of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, may combine or split certain components, or may arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
By way of example, FIG. 4 shows the hardware structure of a server 200. The server 200 may be used, without limitation, to provide cloud services. It may be a server or a super electronic device that can establish a communication connection with the electronic device 100 and provide the electronic device 100 with data processing, computing, and/or storage functions. The server 200 may be a hardware server or may be deployed in a virtualized environment. For example, the server 200 may be a virtual machine running on a hardware server that hosts one or more other virtual machines.
As shown in FIG. 4, the server 200 may include a processor 210, a network interface 220, and a memory 230. The processor 210, the network interface 220, and the memory 230 may be connected through a bus or by other means.
In the embodiments of the present application, the processor 210 (also called a central processing unit (CPU)) is the computing core and control core of the server 200.
The network interface 220 may include a standard wired interface and a wireless interface (such as a Wi-Fi or mobile communication interface). Under the control of the processor 210, it sends and receives data, for example receiving from the network the sample data of a target task sent by the electronic device 100.
The memory 230 is the storage device of the server 200 and is used to store programs and data, for example a pre-trained model. It can be understood that the memory 230 here may be a high-speed RAM or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one storage device located remotely from the processor 210. The memory 230 provides storage space that holds the server's operating system and executable program code. The operating system may include, but is not limited to, Windows, Linux, HarmonyOS, and so on, which is not limited here.
It can be understood that the structure illustrated in FIG. 4 of the embodiments of the present application does not constitute a specific limitation on the server 200. In other embodiments of the present application, the server 200 may be a cloud server. The server 200 may include more or fewer components than shown, may combine or split certain components, or may arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The above introduces the application scenarios, the hardware structure of the electronic device 100, and the hardware structure of the server 200 involved in the embodiments of the present application. Based on that description, the digital human optimization system provided in the embodiments of the present application is introduced next.
By way of example, FIG. 5 shows a schematic structural diagram of a digital human driving system provided by an embodiment of the present application. Referring to FIG. 5, the system includes a data collection module 501, a digital human driving module 502, an action optimization module 503, and a digital human rendering module 504.
The data collection module 501 is used to collect the first action data of the person behind the digital human (the performer). Specifically, the first action data collected by the data collection module 501 may be video file data of the performer captured by a camera, or spatial data of the performer's body key points captured by a motion capture device. The motion capture device may include an optical motion capture system, an inertial sensor, and the like. In one possible example, 17 joint points of the human body can be used as the performer's body key points. The 17 key points are: nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle.
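The key point data described above can be pictured as a small data structure. The following Python sketch is illustrative only: the names `KEYPOINT_NAMES` and `ActionFrame`, the timestamp field, and the use of 3D coordinates are assumptions and are not specified by the application.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# The 17 body key points listed in the text, in the order given there.
KEYPOINT_NAMES = (
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
)

Point3D = Tuple[float, float, float]


@dataclass
class ActionFrame:
    """One frame of action data (hypothetical layout, not from the source)."""
    timestamp: float                # capture time in seconds (assumed field)
    keypoints: Dict[str, Point3D]   # key point name -> (x, y, z) position

    def is_complete(self) -> bool:
        # A frame is complete when every one of the 17 key points is present.
        return all(name in self.keypoints for name in KEYPOINT_NAMES)
```

A frame obtained from a motion capture device would fill `keypoints` directly, while a frame derived from video would fill it after key point recognition.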
After the digital human driving module 502 receives the first action data sent by the data collection module 501, it checks the type of the first action data. When the first action data is video file data, the digital human driving module 502 must also identify the performer's body key points in the video file and determine the spatial data of the identified key points. The digital human driving module 502 then uses the spatial data of the performer's body key points identified from the video file data as the first action data. When the first action data received by the digital human driving module 502 is already spatial data of the performer's body key points, no further processing of the first action data is needed.
After receiving the first action data sent by the data collection module 501, the digital human driving module 502 also needs to determine whether the digital human optimization system has enabled the digital human action optimization service. If the service is not enabled, the digital human driving module 502 generates action driving information for the digital human from the first action data, so as to drive the digital human to perform the action corresponding to the first action data. If the service is enabled, the digital human driving module 502 sends the first action data to the action optimization module 503.
In one possible example, the user can decide whether to enable the action optimization service of the digital human optimization system, which improves the user experience and the usability of the system. In another possible embodiment, the digital human optimization system may enable the action optimization service by default.
The action optimization module 503 is used to receive the first action data sent by the digital human driving module 502, cache the received data frames of the first action data, and determine the feature value of the first action data from the spatial information and the temporal motion information carried by the cached data frames. The feature value of the first action data may be a feature vector and can be used to characterize the type of the first action data. The feature value of the first action data is then matched against the feature values of the action data in the reference action library. When the reference action library contains second action data that matches the first action data, the action optimization module 503 can optimize the first action data according to the second action data.
When the digital human driving module 502 transmits the first action data to the action optimization module 503, the data is transmitted frame by frame, and each frame represents only the performer's state at a single moment. The action optimization module 503 can therefore recognize the action corresponding to the first action data, and then match the action data in the reference action library against it, only after a certain number of first action data frames have been cached. Take a waving action as an example. FIG. 6 shows the relative positions of the performer's left shoulder and left wrist carried in three frames of the performer's action data. Parts (a), (b), and (c) of FIG. 6 show the relative positions of the left shoulder and left wrist carried in the first, second, and third action data frames, respectively. In the first action data frame, the performer's left wrist is lower than the left shoulder, and the action cannot yet be recognized. In the second action data frame, the left wrist is level with the left shoulder, and the action still cannot be recognized. In the third action data frame, the left wrist is higher than the left shoulder. At this point, the performer's action can be determined to be a waving action.
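The FIG. 6 example can be sketched in Python as follows. This is a minimal illustration, not the recognition method of the application: the function names, the `tolerance` parameter, and the convention that the second coordinate is the vertical axis are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Point3D = Tuple[float, float, float]
Frame = Dict[str, Point3D]  # key point name -> (x, y, z); y is "up" (assumption)


def wrist_state(frame: Frame, tolerance: float = 0.05) -> str:
    """Classify the left wrist as below, level with, or above the left shoulder."""
    dy = frame["left_wrist"][1] - frame["left_shoulder"][1]
    if dy > tolerance:
        return "above"
    if dy < -tolerance:
        return "below"
    return "level"


def recognize_action(cached_frames: List[Frame]) -> Optional[str]:
    """Return "wave" once the cached frames show below -> level -> above."""
    expected = ["below", "level", "above"]
    matched = 0
    for frame in cached_frames:
        if wrist_state(frame) == expected[matched]:
            matched += 1
            if matched == len(expected):
                return "wave"
    return None  # not enough evidence yet; keep caching frames
```

With one or two cached frames the function returns `None`, which mirrors the point that a single frame only captures the state at one moment.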
Therefore, in one possible example, a first threshold can be set in advance. When the number of data frames of the first action data cached in the action optimization module 503 exceeds the first threshold, the action optimization module 503 can match the cached data frames against the action data in the reference action library to determine whether the library contains action data that matches the first action data. The first threshold may be set manually. It must be chosen so that the number of data frames meeting the threshold is sufficient to determine a specific action.
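The threshold-based caching can be sketched as a small Python class. The class and method names are hypothetical. This sketch treats the cache as ready when the frame count is greater than or equal to the threshold, as in the FIG. 9 walkthrough of this description; change the comparison if a strict "exceeds" is intended.

```python
from typing import Any, List


class FrameCache:
    """Caches data frames until enough are available for matching (sketch)."""

    def __init__(self, first_threshold: int) -> None:
        if first_threshold < 1:
            raise ValueError("first_threshold must be at least 1")
        self.first_threshold = first_threshold
        self.frames: List[Any] = []

    def add(self, frame: Any) -> bool:
        """Cache one frame and report whether matching can start."""
        self.frames.append(frame)
        return self.ready()

    def ready(self) -> bool:
        # Assumption: "greater than or equal to" the first threshold.
        return len(self.frames) >= self.first_threshold
```

For the waving action of FIG. 6, a threshold of 3 would make the cache ready after the third frame.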
In another possible example, each time the action optimization module 503 receives a data frame, it evaluates that frame together with the previously cached frames. When the action corresponding to the first action data cannot be determined from the frames received so far, the action optimization module 503 continues to receive the first action data sent by the digital human driving module 502. When the action can be determined from the frames received so far, the action optimization module 503 matches the received data frames of the first action data against the action data in the reference action library while continuing to receive the first action data sent by the digital human driving module 502.
When the reference action library contains action data that matches the first action data, there are two cases for how the action optimization module 503 optimizes the first action data according to the action data in the reference action library: the first case must take latency into account, and the second need not. The first case corresponds to a live broadcast scenario, in which the action optimization module needs to consider latency when optimizing the first action data. The second case corresponds to an on-demand scenario, in which the action optimization module does not need to consider latency when optimizing the first action data.
In the live broadcast scenario, refer to FIG. 7, which is a schematic diagram of the process by which the action optimization module 503 optimizes the first action data. The action corresponding to the first action data in FIG. 7 may be the waving action illustrated in FIG. 6. Referring to FIG. 7, the optimization of the first action data by the action optimization module 503 includes S701 to S707.
S701: The action optimization module 503 receives and caches the first data frame of the first action data sent by the digital human driving module 502, and attempts to determine the action corresponding to the first action data from the received first data frame.
S702: When the action optimization module 503 cannot determine the action corresponding to the first action data, it sends the first data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received first data frame.
S703: The action optimization module 503 receives and caches the second data frame sent by the digital human driving module 502, and attempts to determine the action corresponding to the first action data from the first and second data frames.
S704: When the action optimization module 503 cannot determine the action corresponding to the first action data, it sends the second data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received second data frame.
S705: The action optimization module 503 receives and caches the third data frame sent by the digital human driving module 502, and determines the action corresponding to the first action data from the first, second, and third data frames.
S706: After determining the action corresponding to the first action data, the action optimization module 503 determines whether the reference action library contains action data that matches the first action data. When it does, the action optimization module 503 optimizes the third data frame according to the action data in the reference action library.
S707: The action optimization module 503 sends the optimized third data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the optimized third data frame.
In the live broadcast scenario, the action optimization module 503 evaluates the received action data frames while it continues to receive first action data frames from the digital human driving module 502. When the action corresponding to the first action data cannot be determined from the currently received data frame and the cached data frames, the action optimization module 503 returns the currently received data frame directly to the digital human driving module 502, so that the digital human driving module 502 can drive the digital human with that frame and long delays during the live broadcast are avoided. When the action can be determined from the currently received data frame (the third data frame) and the cached data frames, and the reference action library contains action data that matches the first action data, the action optimization module 503 repairs the current data frame and the data frames received afterwards according to the action data in the reference action library. The data frames before the current one (the first and second data frames) do not need to be repaired.
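The live broadcast flow of S701 to S707 can be sketched as a Python generator. This is a simplified illustration under stated assumptions: `recognize`, `find_reference`, and `optimize` are hypothetical callables standing in for action recognition, the reference action library lookup, and frame optimization, none of which are specified at this level of detail by the application.

```python
from typing import Callable, Iterable, Iterator, List, Optional, TypeVar

F = TypeVar("F")  # type of one data frame


def optimize_live(
    frames: Iterable[F],
    recognize: Callable[[List[F]], Optional[str]],
    find_reference: Callable[[str], Optional[object]],
    optimize: Callable[[F, object], F],
) -> Iterator[F]:
    """Yield one output frame per input frame, without waiting (sketch)."""
    cache: List[F] = []
    reference = None
    for frame in frames:
        cache.append(frame)
        if reference is None:
            action = recognize(cache)               # try with all cached frames
            if action is not None:
                reference = find_reference(action)  # may still be None
        if reference is None:
            yield frame                       # pass through to avoid added delay
        else:
            yield optimize(frame, reference)  # repair current and later frames
```

Frames that were already passed through are never revisited, which matches the statement that the first and second data frames are not repaired.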
In the on-demand scenario, refer to FIG. 8, which is a schematic diagram of the process by which the action optimization module 503 optimizes the first action data. The action corresponding to the first action data in FIG. 8 may be the waving action illustrated in FIG. 6. Referring to FIG. 8, the optimization of the first action data by the action optimization module 503 includes S801 to S805.
S801: The action optimization module 503 receives and caches the first data frame of the first action data sent by the digital human driving module 502, and attempts to determine the action corresponding to the first action data from the received first data frame.
S802: When the action optimization module 503 cannot determine the action corresponding to the first action data, it continues to receive and cache the second data frame of the first action data, and attempts to determine the action from the received first and second data frames.
S803: When the action optimization module 503 still cannot determine the action corresponding to the first action data, it continues to receive and cache the third data frame of the first action data, and determines the action from the received first, second, and third data frames.
S804: After determining the action corresponding to the first action data, the action optimization module 503 determines whether the reference action library contains action data that matches the first action data. When it does, the action optimization module 503 optimizes the first data frame according to the action data in the reference action library.
S805: The action optimization module 503 sends the optimized first data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received first data frame.
Also in the on-demand scenario, refer to FIG. 9, which is a schematic diagram of the process by which the action optimization module 503 optimizes the first action data. The action corresponding to the first action data in FIG. 9 may be the waving action illustrated in FIG. 6. Referring to FIG. 9, the optimization of the first action data by the action optimization module 503 includes S901 to S906.
S901: The action optimization module 503 receives and caches the first data frame of the first action data sent by the digital human driving module 502.
S902: The action optimization module 503 receives and caches the second data frame of the first action data sent by the digital human driving module 502, and determines whether the number of cached data frames is greater than or equal to the first threshold (for the waving action shown in FIG. 6, the first threshold can be set to 3).
S903: The action optimization module 503 receives and caches the third data frame of the first action data sent by the digital human driving module 502, and determines whether the number of cached data frames is greater than or equal to the first threshold.
S904: When the number of data frames of the first action data cached in the action optimization module 503 is greater than or equal to the first threshold, the action optimization module 503 determines whether the reference action library contains action data that matches the first action data.
S905: When the reference action library contains action data that matches the first action data, the action optimization module 503 optimizes the first data frame according to the action data in the reference action library.
S906: The action optimization module 503 sends the optimized first data frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received first data frame.
In the on-demand scenario, once the first action data has been matched with action data in the reference action library, the action optimization module 503 optimizes the cached data frames of the first action data while continuing to receive data frames of the first action data from the digital human driving module 502. Because the on-demand scenario places no requirement on latency, the action optimization module 503 can start optimizing from the first data frame of the first action data.
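For contrast with the live broadcast case, the on-demand flow can be sketched in Python as below. This is a simplification under assumptions: the sketch collects all frames before optimizing, whereas the text describes optimizing cached frames while reception continues, and `recognize`, `find_reference`, and `optimize` are hypothetical callables.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

F = TypeVar("F")  # type of one data frame


def optimize_on_demand(
    frames: Iterable[F],
    recognize: Callable[[List[F]], Optional[str]],
    find_reference: Callable[[str], Optional[object]],
    optimize: Callable[[F, object], F],
) -> List[F]:
    """Cache frames, then optimize from the first frame onward (sketch)."""
    cache: List[F] = []
    reference = None
    for frame in frames:
        cache.append(frame)
        if reference is None:
            action = recognize(cache)
            if action is not None:
                reference = find_reference(action)
    if reference is None:
        return cache  # nothing matched: frames are returned unchanged
    # Latency is not a concern here, so every cached frame is optimized,
    # including those received before the action was recognized.
    return [optimize(frame, reference) for frame in cache]
```

The only behavioural difference from the live sketch is that early frames are held back and optimized too, at the cost of delay.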
In the above embodiments, when the action optimization module 503 matches the first action data against the action data in the reference action library, it matches the multiple cached data frames against the action data in the library. In one possible example, the action optimization module 503 can match the feature value of the cached data frames against the feature values of the action data in the reference action library. When the similarity between the feature value of some second action data in the reference action library and the feature value of the data frames cached in the action optimization module 503 reaches a similarity threshold, the second action data is determined to match the first action data. The action optimization module 503 can then optimize the data frames of the first action data according to the second action data. This optimization may be applied to the relative spatial position information and the temporal motion information of the data frames of the first action data.
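The threshold-based matching can be sketched in Python as follows. The application does not name a similarity measure; cosine similarity is used here purely as an assumption, and the function names and library layout are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_match(
    feature: Sequence[float],
    reference_library: Dict[str, Sequence[float]],
    similarity_threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return (name, similarity) of the best entry reaching the threshold."""
    best: Optional[Tuple[str, float]] = None
    for name, reference_feature in reference_library.items():
        score = cosine_similarity(feature, reference_feature)
        if score >= similarity_threshold and (best is None or score > best[1]):
            best = (name, score)
    return best
```

Returning the best-scoring entry rather than the first one above the threshold is a design choice of this sketch; the text only requires that the similarity reach the threshold.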
In one possible example, the feature value of each piece of action data in the reference action library and the feature value of the first action data may be feature vectors. The feature vectors of the action data in the reference action library are computed in advance. The feature value of the first action data can be determined from the relative spatial position information and the temporal motion information of the frames in the first action data. Specifically, a feature vector can be generated by computing the quaternion trajectories of the key points over time across the data frames of the first action data.
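One way to turn quaternion trajectories into a feature vector is sketched below in Python. The application does not specify the construction; here the feature is assumed to be the concatenation of frame-to-frame relative rotations for each key point, with unit quaternions in (w, x, y, z) order. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Quaternion = Tuple[float, float, float, float]  # (w, x, y, z), unit length assumed


def conjugate(q: Quaternion) -> Quaternion:
    w, x, y, z = q
    return (w, -x, -y, -z)


def multiply(a: Quaternion, b: Quaternion) -> Quaternion:
    """Hamilton product a * b."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def trajectory_feature(frames: Sequence[Sequence[Quaternion]]) -> List[float]:
    """frames[t][k] is the rotation of key point k in frame t."""
    feature: List[float] = []
    for previous, current in zip(frames, frames[1:]):
        if len(previous) != len(current):
            raise ValueError("every frame must contain the same key points")
        for q_prev, q_curr in zip(previous, current):
            # Relative rotation taking the previous pose to the current pose.
            feature.extend(multiply(q_curr, conjugate(q_prev)))
    return feature
```

The resulting vector has 4 × (number of key points) × (number of frames − 1) entries, so two sequences must have the same length and key point set to be compared directly.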
In one possible example, the similarity threshold may be set in advance by the user.
In some possible embodiments, after determining that the reference action library contains action data that matches the first action data, the action optimization module 503 can use the action data in the reference action library to determine the frame that follows the data frame of the first action data it has most recently received. The action optimization module 503 then sends this next frame to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received data frame. In this embodiment, the action optimization module 503 predicts data frames of the first action data from the action data in the reference action library, which avoids abrupt changes in the action caused by capture jitter and makes the action more natural.
In some possible embodiments, after the action optimization module 503 determines from the cached data frames of the first action data that the reference action library contains second action data matching the first action data, it can stop receiving data frames of the first action data. The action optimization module 503 obtains the data frames of the second action data from the reference action library and sends them to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received data frames. In this embodiment, the performer only needs to perform the beginning of the first action. The action optimization module 503 then searches the reference action library for a second action that matches the first action based on that beginning, and sends the action data frames of the second action to the digital human driving module 502, which drives the digital human accordingly. This reduces the difficulty of the actions the performer must carry out.
In one possible example, after the digital human completes the first action, the digital human driving module 502 can transition from the first action to the next action by inserting frames.
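Frame insertion for the transition can be illustrated with the Python sketch below. The application does not say how the inserted frames are computed; linear interpolation of key point positions is assumed here, and the function name and frame layout are hypothetical.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]
Frame = Dict[str, Point3D]  # key point name -> (x, y, z)


def interpolate_frames(start: Frame, end: Frame, count: int) -> List[Frame]:
    """Return `count` evenly spaced frames strictly between start and end."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if start.keys() != end.keys():
        raise ValueError("both frames must contain the same key points")
    inserted: List[Frame] = []
    for i in range(1, count + 1):
        t = i / (count + 1)  # interpolation weight in (0, 1)
        inserted.append({
            name: (
                start[name][0] + (end[name][0] - start[name][0]) * t,
                start[name][1] + (end[name][1] - start[name][1]) * t,
                start[name][2] + (end[name][2] - start[name][2]) * t,
            )
            for name in start
        })
    return inserted
```

Here `start` would be the last frame of the first action and `end` the first frame of the next action. If rotations rather than positions are interpolated, spherical interpolation of quaternions would be the more usual choice.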
In one possible example, continuing to refer to FIG. 5, the action optimization module 503 includes a feature value calculation module 5031, a reference action library 5032, an action matching module 5033, and an action optimization/prediction module 5034. The feature value calculation module 5031 is used to calculate the feature value of the first action data. The reference action library 5032 is used to store verified prior action data (standard action data). The action matching module 5033 is used to compare the feature value of the first action data with the feature values of the action data stored in the reference action library 5032. When the reference action library 5032 contains second action data whose feature value is the same as the feature value of the first action data, the action matching module 5033 determines that the first action data matches the second action data. The action optimization/prediction module 5034 is used to optimize the first action data according to the second action data, or to predict the next frame of the first action data according to the second action data. The action data in the reference action library 5032 may be provided in advance by the service provider. For example, in a live broadcast scenario where the digital human broadcasts through live broadcast software, the action data in the reference action library 5032 can be provided by the live broadcast platform.
The digital human rendering module 504 renders the digital human driven by the digital human driving module 502 and generates the corresponding video. Rendering is the final stage of computer graphics (CG) production, in which software turns the content designed by the user into a final image or animation. In practice, whenever a model or scene must be output as an image file, a video signal, or motion picture film, it has to pass through a rendering program.
In the above embodiment, the data collection module 501, the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504 can each be implemented in software or in hardware. The following takes the action optimization module 503 as an example to describe how it can be implemented. The data collection module 501, the digital human driving module 502, and the digital human rendering module 504 can be implemented in a similar way.
As an example of a module implemented as a software functional unit, the action optimization module 503 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. There may be one or more such computing instances. For example, the action optimization module 503 may include code running on multiple hosts, virtual machines, or containers. The multiple hosts, virtual machines, or containers used to run the code can be distributed in the same region or in different regions. Further, they can be distributed in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers. A region generally includes multiple AZs.
Likewise, the multiple hosts, virtual machines, or containers used to run the code can be distributed in the same virtual private cloud (VPC) or across multiple VPCs. Typically, a VPC is set up within one region. For communication between two VPCs in the same region, or cross-region communication between VPCs in different regions, a communication gateway must be set up in each VPC, and the VPCs are interconnected through these gateways.
As an example of a module implemented as a hardware functional unit, the action optimization module 503 may include at least one computing device, such as a server. Alternatively, the action optimization module 503 may be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The multiple computing devices included in the action optimization module 503 may be distributed in the same region or in different regions, and in the same AZ or in different AZs. Similarly, they may be distributed in the same VPC or across multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
Next, based on the content described above, a digital human driving method provided by an embodiment of the present application is introduced. The method builds on the content described above, and some or all of its details can be found in the preceding description.
Please refer to FIG. 10, which is a schematic flowchart of a digital human optimization method provided by an embodiment of the present application. The method applies to scenarios in which the digital human is driven by a performer, that is, the real person behind the digital human (real-person driving scenarios). As shown in FIG. 10, the method includes S1001 to S1005.
In S1001, the digital human optimization system is triggered to enable the digital human action optimization service. For example, the digital human optimization system may enable the action optimization service by default, or the user may decide whether the action optimization service needs to be enabled.
In S1002, first action data generated by the performer is collected. For example, the first action data can be collected by the data collection module 501 of the digital human optimization system 500 described with reference to FIG. 5 above. The collected first action data may be video file data containing the performer's movements, or spatial data of the performer's body key points.
In S1003, provided that the digital human optimization system has enabled the digital human action optimization service, the first action data is matched against the action data in the action library. For example, the digital human driving module 502 in FIG. 5 above can determine whether the digital human action optimization service is enabled. The digital human driving module 502 also needs to determine the type of the first action data. When the first action data is video file data, the digital human driving module 502 identifies the performer's body key points in the video file, determines the spatial data of the identified body key points, and then uses that spatial data as the first action data. When the first action data received by the digital human driving module 502 is already spatial data of the performer's body key points, no further processing of the first action data is needed. The action optimization module 503 in FIG. 5 above can then match the first action data against the action data in the reference action library 5032. For the specific implementation, refer to the description of the action optimization module 503 with reference to FIG. 5 above.
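The type check in S1003 can be sketched as follows. The classes `VideoData` and `KeypointData`, the function `to_keypoint_data`, and the `detect_keypoints` callable are hypothetical names introduced for illustration; the application does not name a key point detector, so it is passed in as a parameter.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

Frame = List[Tuple[float, float, float]]  # body key points (x, y, z) of one frame


@dataclass
class VideoData:
    images: List[bytes]   # encoded video frames containing the performer's movements


@dataclass
class KeypointData:
    frames: List[Frame]   # spatial data of the performer's body key points


def to_keypoint_data(data: Union[VideoData, KeypointData],
                     detect_keypoints: Callable[[bytes], Frame]) -> KeypointData:
    """Return first action data as key point spatial data, converting video if needed."""
    if isinstance(data, KeypointData):
        return data  # already spatial data: no further processing
    if isinstance(data, VideoData):
        # Video input: identify body key points in every frame.
        return KeypointData([detect_keypoints(img) for img in data.images])
    raise TypeError(f"unsupported first action data type: {type(data).__name__}")


# Stand-in detector used only to make the sketch runnable.
def fake_detector(image: bytes) -> Frame:
    return [(float(len(image)), 0.0, 0.0)]


converted = to_keypoint_data(VideoData([b"ab", b"abc"]), fake_detector)
```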
In S1004, when the reference action library contains second action data that matches the first action data, the digital human is driven according to the second action data. For example, the action optimization module 503 in FIG. 5 above may optimize the first action data to obtain third action data. Optimization of the first action data falls into two cases: scenarios in which latency must be considered and scenarios in which it need not be. For the specific implementation, refer to the methods described with reference to FIG. 7, FIG. 8, and FIG. 9 above.
For example, when the reference action library contains second action data that matches the first action data, the digital human can be driven directly according to the second action data. In this case, the performer only needs to perform the beginning of the first action. The action optimization module 503 then uses the beginning of the first action to look up a matching second action in the reference action library and sends the action data frames of the second action to the digital human driving module 502, so that the digital human driving module 502 drives the digital human according to the received frames. This reduces the difficulty of the movements the performer has to make. In this embodiment of the present application, before driving the digital human, the digital human driving module first evaluates the collected first action data to determine whether it needs to be optimized. When the digital human driving module determines that the first action needs to be optimized, it sends the first action data to the action optimization module for optimization. The digital human driving module then drives the digital human according to the optimized action data, which avoids distorted movements of the digital human and reduces model clipping (mesh interpenetration).
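Driving from the beginning of an action can be sketched as below. The sketch assumes that the captured frames are compared with the opening frames of each library action by mean key point distance, and that the captured data is used unchanged when optimization is disabled or nothing matches; both the distance measure and this fallback are assumptions, as are the names `prefix_distance`, `select_driving_frames`, and the tolerance 0.2.

```python
import math
from typing import Dict, List, Tuple

Frame = List[Tuple[float, float, float]]  # body key points (x, y, z) of one frame


def prefix_distance(captured: List[Frame], reference: List[Frame]) -> float:
    """Mean key point distance between the captured frames and the start of reference."""
    if len(reference) < len(captured):
        return math.inf
    total, count = 0.0, 0
    for cap, ref in zip(captured, reference):
        for p, q in zip(cap, ref):
            total += math.dist(p, q)
            count += 1
    return total / count if count else math.inf


def select_driving_frames(captured: List[Frame], library: Dict[str, List[Frame]],
                          optimization_enabled: bool,
                          tolerance: float = 0.2) -> List[Frame]:
    """Return the frames used to drive the digital human."""
    if optimization_enabled:
        best = min(library.values(),
                   key=lambda ref: prefix_distance(captured, ref),
                   default=None)
        if best is not None and prefix_distance(captured, best) <= tolerance:
            return best   # drive with the complete matched library action
    return captured       # assumed fallback: drive with the captured data


# The performer only starts a bow; the library holds the full movement (made-up data).
library = {"bow": [[(0.0, 0.0, 0.0)], [(0.0, -1.0, 0.0)], [(0.0, -2.0, 0.0)],
                   [(0.0, -1.0, 0.0)], [(0.0, 0.0, 0.0)]]}
captured = [[(0.0, 0.1, 0.0)], [(0.0, -0.9, 0.0)]]
driven = select_driving_frames(captured, library, optimization_enabled=True)
```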
The present application also provides a computing device 1100. As shown in FIG. 11, the computing device 1100 includes a bus 1102, a processor 1104, a memory 1106, and a communication interface 1108. The processor 1104, the memory 1106, and the communication interface 1108 communicate through the bus 1102. The computing device 1100 may be a server or a terminal device. It should be understood that this application does not limit the number of processors or memories in the computing device 1100.
The bus 1102 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses can be classified into address buses, data buses, control buses, and so on. For ease of illustration, only one line is shown in FIG. 11, but this does not mean that there is only one bus or one type of bus. The bus 1102 may include a path that carries information between the components of the computing device 1100 (for example, the memory 1106, the processor 1104, and the communication interface 1108).
The processor 1104 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 1106 may include volatile memory, such as random access memory (RAM). The memory 1106 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 1106 stores executable program code, and the processor 1104 executes that code to implement the functions of the aforementioned data collection module 501, digital human driving module 502, action optimization module 503, and digital human rendering module 504 respectively, thereby implementing the digital human optimization method. In other words, the memory 1106 stores instructions for executing the digital human optimization method.
The communication interface 1108 uses a transceiver module, such as but not limited to a network interface card or a transceiver, to implement communication between the computing device 1100 and other devices or communication networks.
An embodiment of the present application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
As shown in FIG. 12, the computing device cluster includes at least one computing device 1100. The memory 1106 of one or more computing devices 1100 in the cluster may store the same instructions for executing the digital human optimization method.
In some possible implementations, the memory 1106 of one or more computing devices 1100 in the cluster may each store only part of the instructions for executing the digital human optimization method. In other words, a combination of one or more computing devices 1100 may jointly execute the instructions for the digital human optimization method.
It should be noted that the memories 1106 of different computing devices 1100 in the cluster can store different instructions, each used to perform part of the functions of the digital human driving system. That is, the instructions stored in the memories 1106 of different computing devices 1100 can implement the functions of one or more of the data collection module 501, the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504.
In some possible implementations, one or more computing devices in the cluster may be connected through a network. The network may be a wide area network, a local area network, or the like. FIG. 13 shows one possible implementation. As shown in FIG. 13, two computing devices 1100A and 1100B are connected through a network; specifically, each computing device connects to the network through its communication interface. In this type of implementation, the memory 1106 of the computing device 1100A stores instructions for performing the functions of the data collection module 501, while the memory 1106 of the computing device 1100B stores instructions for performing the functions of the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504.
The arrangement shown in FIG. 13 reflects the fact that the digital human action optimization method provided by this application needs to store a large amount of data and perform a large amount of computation. The functions implemented by the digital human driving module 502, the action optimization module 503, and the digital human rendering module 504 are therefore assigned to the computing device 1100B.
It should be understood that the functions of the computing device 1100A shown in FIG. 13 may also be performed by multiple computing devices 1100. Likewise, the functions of the computing device 1100B may also be performed by multiple computing devices 1100.
An embodiment of the present application also provides a computer program product containing instructions. The computer program product may be software or a program product that contains instructions and can run on a computing device or be stored in any available medium. When the computer program product runs on at least one computing device, it causes the at least one computing device to execute the digital human action optimization method.
An embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium may be any available medium that a computing device can store data on, or a data storage device, such as a data center, that contains one or more available media. The available media may be magnetic media (for example, floppy disks, hard disks, or magnetic tape), optical media (for example, DVDs), or semiconductor media (for example, solid state drives). The computer-readable storage medium includes instructions that instruct a computing device to execute the digital human action optimization method.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the protection scope of the technical solutions of the embodiments of the present invention.

Claims (16)

  1. A digital human driving method, characterized in that the method includes:
    obtaining first action data generated by a target object;
    matching the first action data with action data in an action library;
    when second action data matching the first action data exists in the action library, driving a digital human according to the second action data.
  2. The method according to claim 1, characterized in that driving the digital human according to the second action data includes:
    driving the digital human according to the first action data and the second action data.
  3. The method according to claim 2, characterized in that driving the digital human according to the first action data and the second action data includes:
    optimizing at least one data frame in the first action data according to the second action data to obtain optimized first action data;
    driving the digital human according to the optimized first action data.
  4. The method according to claim 1, characterized in that matching the first action data with the action data in the action library includes:
    performing feature extraction on m action data frames in the first action data to determine a feature value of the first action data, where m is a natural number greater than or equal to 2;
    comparing the feature value of the first action data with feature values of the action data in the action library;
    based on the similarity between the feature value of the first action data and a feature value of action data in the action library being greater than a similarity threshold, determining that action data matching the first action data exists in the action library.
  5. The method according to claim 4, characterized in that performing feature extraction on the m action data frames to determine the feature value of the first action data includes:
    obtaining spatial information and temporal motion information of each data frame in the m action data frames;
    generating the feature value of the first action data according to the spatial information and temporal motion information of each data frame.
  6. The method according to claim 1, characterized in that, before matching the first action data with the action data in the action library, the method further includes:
    determining whether a digital human action optimization system has enabled an action data optimization function, and, based on the system having enabled the action data optimization function, matching the first action data with the action data in the action library.
  7. A digital human driving system, characterized by including:
    a collection module, configured to obtain first action data generated by a target object;
    an optimization module, configured to match the first action data with action data in an action library;
    a processing module, configured to drive a digital human according to second action data when the second action data matching the first action data exists in the action library.
  8. The system according to claim 7, characterized in that the processing module is configured to:
    drive the digital human according to the first action data and the second action data.
  9. The system according to claim 8, characterized in that the optimization module is configured to: optimize at least one data frame in the first action data according to the second action data to obtain optimized first action data;
    the processing module is configured to: drive the digital human according to the optimized first action data.
  10. The system according to claim 7, characterized in that the optimization module is configured to:
    perform feature extraction on m action data frames in the first action data to determine a feature value of the first action data, where m is a natural number greater than or equal to 2;
    compare the feature value of the first action data with feature values of the action data in the action library;
    based on the similarity between the feature value of the first action data and a feature value of action data in the action library being greater than a similarity threshold, determine that action data matching the first action data exists in the action library.
  11. The system according to claim 10, characterized in that the optimization module is configured to:
    obtain spatial information and temporal motion information of each data frame in the m action data frames;
    generate the feature value of the first action data according to the spatial information and temporal motion information of each data frame.
  12. The system according to claim 7, characterized in that the optimization module is further configured to:
    determine whether a digital human action optimization system has enabled an action data optimization function, and, based on the system having enabled the action data optimization function, match the first action data with the action data in the action library.
  13. An electronic device, characterized by including:
    at least one memory, configured to store a program;
    at least one processor, configured to execute the program stored in the memory;
    wherein, when the program stored in the memory is executed, the processor is configured to perform the method according to any one of claims 1-6.
  14. A computing device cluster, characterized by including at least one computing device, each computing device including a processor and a memory;
    the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to any one of claims 1-6.
  15. A computer-readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method according to any one of claims 1-6.
  16. A computer program product, characterized in that, when the computer program product runs on a processor, it causes the processor to perform the method according to any one of claims 1-6.
PCT/CN2023/087222 2022-07-08 2023-04-10 Digital human driving method and system, and device WO2024007648A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210797830.6 2022-07-08
CN202210797830.6A CN117409118A (en) 2022-07-08 2022-07-08 Digital man driving method, system and equipment

Publications (1)

Publication Number Publication Date
WO2024007648A1 true WO2024007648A1 (en) 2024-01-11

Family

ID=89454260

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/087222 WO2024007648A1 (en) 2022-07-08 2023-04-10 Digital human driving method and system, and device

Country Status (2)

Country Link
CN (1) CN117409118A (en)
WO (1) WO2024007648A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886194A (en) * 2019-02-21 2019-06-14 湖北汽车工业学院 A kind of processing method and equipment of action recognition
US20210375021A1 (en) * 2020-05-29 2021-12-02 Adobe Inc. Neural State Machine Digital Character Animation
CN114022645A (en) * 2021-11-10 2022-02-08 华中师范大学 Action driving method, device, equipment and storage medium of virtual teacher system
CN114401438A (en) * 2021-12-31 2022-04-26 魔珐(上海)信息科技有限公司 Video generation method and device for virtual digital person, storage medium and terminal
CN114630173A (en) * 2022-03-03 2022-06-14 北京字跳网络技术有限公司 Virtual object driving method and device, electronic equipment and readable storage medium


Also Published As

Publication number Publication date
CN117409118A (en) 2024-01-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23834431

Country of ref document: EP

Kind code of ref document: A1