CN115601482A

CN115601482A - Digital human action control method and device, equipment, medium and product thereof

Info

Publication number: CN115601482A
Application number: CN202211185265.4A
Authority: CN
Inventors: 李凌; 林绪虹; 王颖琦
Original assignee: Guangzhou Huaduo Network Technology Co Ltd
Current assignee: Guangzhou Huaduo Network Technology Co Ltd
Priority date: 2022-09-27
Filing date: 2022-09-27
Publication date: 2023-01-13

Abstract

The application relates to a digital human action control method and a corresponding device, equipment, medium and product. The method comprises the following steps: acquiring an action file corresponding to the motion of a digital person, wherein the action file comprises information frames corresponding to the image frames in a moving image of the digital person, and each information frame stores the position information of the skeletal key points of the digital person; performing abnormal action detection according to the position information of the skeletal key points in each information frame, and determining an abnormal action segment that describes an abnormal action phenomenon, wherein the abnormal action segment comprises one or more temporally consecutive information frames; correcting the abnormal action segment to overcome the abnormal action phenomenon; and applying the action file to drive the digital person to move so as to generate the moving image. By detecting and optimizing abnormal actions in the action file that drives the digital person, the application makes the corresponding moving image smoother and more natural.

Description

Digital human action control method and device, equipment, medium and product thereof

Technical Field

The present application relates to digital human virtual technologies, and in particular, to a method, an apparatus, a device, a medium, and a product for controlling a digital human action.

Background

With the rise of the metaverse, digital human features have been introduced into live-streaming scenes and have become a standard feature of live-streaming applications, and vendors and live-streaming platforms continue to release virtual digital human functions. Live streaming as a "digital person" has become the choice of many streamers. In a live-streaming scene, the "digital person" supports many new forms of interaction; driving the "digital person" to move through external materials such as sound, actions, and music is a novel and popular function.

At present, schemes that drive a digital person through external materials generally use machine learning to generate, from the external materials, an action file for controlling the motion of the digital person. The action file comprises the data of each frame of the digital person's motion. Each frame of data comprises the coordinate information of each skeletal node of the human body, which indicates the pose in which the digital person is to be drawn, so that the digital person can be driven to present a motion effect by continuously drawing multiple frames of images.

Since action files generated by machine learning often contain deviations, the actions shown in the video images of a digital person driven by such files often exhibit uncoordinated abnormal action phenomena such as model clipping, unsmooth action or rhythm, and stillness, so a remedial scheme needs to be provided.

Disclosure of Invention

An object of the present application is to solve the above problems and provide a digital human motion control method, and a corresponding apparatus, device, non-volatile readable storage medium, and computer program product.

According to an aspect of the present application, there is provided a digital human motion control method, including the steps of:

acquiring an action file corresponding to the motion of a digital person, wherein the action file comprises information frames corresponding to all image frames in a moving image of the digital person, and the information frames store the position information of skeletal key points of the digital person;

performing motion anomaly detection according to the position information of the bone key points in each information frame, and determining that motion anomaly segments describing motion anomaly phenomena exist in the information frames, wherein the motion anomaly segments comprise one or more time-sequence continuous information frames;

correcting the action abnormal segment to overcome the action abnormal phenomenon;

and driving the digital human to move by applying the action file so as to generate the moving image.

Optionally, performing abnormal motion detection according to the position information of the bone key points in each information frame, and determining that there is an abnormal motion segment describing an abnormal motion phenomenon, including:

reading at least one information frame in the action file as a target information frame, and determining a trunk model space of the digital person according to skeleton key points corresponding to a trunk area of the digital person in the target information frame;

detecting, according to the position information of the skeletal key points corresponding to a limb area in the target information frame, whether the limb area falls into the trunk model space, and determining that the target information frame exhibits the abnormal action phenomenon of model clipping when the limb area falls into the trunk model space;

and constructing single or multiple time-sequence continuous target information frames with abnormal action phenomena into abnormal action segments.
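
The last step above, merging temporally consecutive abnormal frames into segments, can be illustrated with a minimal sketch. It assumes that per-frame detection has already produced one boolean flag per information frame; the function name `group_abnormal_frames` and the inclusive `(start, end)` index pairs are hypothetical choices, not part of the application.

```python
from typing import List, Tuple


def group_abnormal_frames(flags: List[bool]) -> List[Tuple[int, int]]:
    """Merge runs of consecutive abnormal frames into inclusive (start, end) segments."""
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current run of abnormal frames began
    for index, abnormal in enumerate(flags):
        if abnormal and start is None:
            start = index                        # a new run begins
        elif not abnormal and start is not None:
            segments.append((start, index - 1))  # the run ended on the previous frame
            start = None
    if start is not None:                        # a run that reaches the last frame
        segments.append((start, len(flags) - 1))
    return segments


# A single abnormal frame yields a one-frame segment such as (4, 4).
segments = group_abnormal_frames([False, True, True, False, True])
```

A one-frame segment and a multi-frame segment are represented the same way, which keeps the later choice of correction strategy a matter of segment length.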

Optionally, determining a trunk model space of the digital person according to the skeleton key points corresponding to the trunk region of the digital person in the target information frame includes:

determining a triangular plane area according to three skeletal key points corresponding to the trunk area in the target information frame, namely the two shoulder joint points and the trunk base point;

calculating and determining a geometric center point of the triangular plane area, and determining coordinate information of the geometric center point in a three-dimensional coordinate system;

and expanding outward in the three-dimensional directions from the coordinate information of the geometric center point to determine the trunk model space of the digital person.
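
A minimal sketch of this construction follows. It assumes that the trunk model space is approximated by an axis-aligned box grown from the centroid of the shoulder/base triangle; the box shape, the `half_extent` parameter, and all function names are illustrative assumptions, since the text does not fix the shape or size of the expanded region.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (minimum corner, maximum corner)


def triangle_centroid(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> Point:
    """Geometric center of the triangle spanned by three 3D key points."""
    return tuple((a[k] + b[k] + c[k]) / 3.0 for k in range(3))


def trunk_box(left_shoulder, right_shoulder, trunk_base, half_extent: Point) -> Box:
    """Expand from the centroid along x, y and z (assumed axis-aligned box)."""
    center = triangle_centroid(left_shoulder, right_shoulder, trunk_base)
    lo = tuple(center[k] - half_extent[k] for k in range(3))
    hi = tuple(center[k] + half_extent[k] for k in range(3))
    return lo, hi


def point_in_box(point: Sequence[float], box: Box) -> bool:
    """True when a limb key point lies inside the trunk box (model clipping)."""
    lo, hi = box
    return all(lo[k] <= point[k] <= hi[k] for k in range(3))


box = trunk_box((-1.0, 3.0, 0.0), (1.0, 3.0, 0.0), (0.0, 0.0, 0.0), (1.0, 2.0, 0.5))
wrist_clips = point_in_box((0.5, 1.0, 0.0), box)
```

In practice the half extents would be derived from the proportions of the digital person's three-dimensional model rather than hard-coded.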

Optionally, performing motion anomaly detection according to the position information of the bone key points in each information frame, and determining that a motion anomaly segment describing a motion anomaly phenomenon exists in the information frame, including:

continuously calculating the similarity of motion characteristic vectors of every two time-sequence continuous information frames in the motion file, wherein the motion characteristic vectors comprise position information of bone key points of digital human limb parts in corresponding information frames;

and when the similarity exceeds a preset threshold, expanding and determining an action abnormal segment covering the two information frames from the action file based on the two corresponding information frames.

Optionally, the step of correcting the abnormal action segment to overcome the abnormal action phenomenon includes any one or more of the following steps:

when the action abnormal fragment comprises a single information frame, adjusting the position information of the skeleton key points falling into the body model space based on the body model space corresponding to the information frame, so that the corresponding limb area is not overlapped with the body model space;

when the action abnormal fragment comprises a plurality of information frames of a first number, matching an information frame set similar to the action of each information frame in the action abnormal fragment from a material library, and replacing the action abnormal fragment in the action file with the information frame set;

and when the action abnormal segment comprises a plurality of information frames of a second quantity, taking the information frames which are positioned before and after the action abnormal segment in the action file as reference frames, carrying out frame insertion operation based on the two reference frames, generating an information frame set, and replacing the action abnormal segment in the action file with the information frame set.

Optionally, matching an information frame set similar to the actions of the information frames in the action abnormal fragment from the material library, including:

acquiring action characteristic vectors corresponding to all information frames in the action abnormal fragment, wherein the action characteristic vectors comprise position information of bone key points of digital human limb parts in the corresponding information frames;

for the abnormal action segment and the information frame set of each material in the material library, calculating, according to the temporal correspondence of their information frames, the data distance between the action feature vectors of each pair of temporally corresponding information frames, and accumulating the data distances over the information frames in each material's information frame set into the similarity score of that material;

and determining the information frame set of the material with the highest similarity score as the information frame set similar to the action abnormal fragment.
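
The matching steps above can be sketched as follows. The sketch assumes Euclidean distance as the data distance and treats the smallest accumulated distance as the highest similarity; both choices, the rule of skipping materials shorter than the segment, and the names `frame_distance`, `accumulated_distance` and `best_material` are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]   # action feature vector of one information frame
Clip = Sequence[Vector]    # time-ordered information frames


def frame_distance(u: Vector, v: Vector) -> float:
    """Euclidean distance between two action feature vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def accumulated_distance(segment: Clip, material: Clip) -> float:
    """Sum the per-frame distances of temporally corresponding frames."""
    return sum(frame_distance(u, v) for u, v in zip(segment, material))


def best_material(segment: Clip, library: Dict[str, Clip]) -> str:
    """Name of the material whose frames are closest to the abnormal segment."""
    candidates = {name: clip for name, clip in library.items()
                  if len(clip) >= len(segment)}  # skip materials that are too short
    if not candidates:
        raise ValueError("no material is long enough to replace the segment")
    return min(candidates, key=lambda name: accumulated_distance(segment, candidates[name]))


segment = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
library = {
    "wave": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.1]],
    "jump": [[0.0, 2.0, 0.0], [1.0, 3.0, 0.0]],
}
chosen = best_material(segment, library)
```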

Optionally, performing frame interpolation based on the two reference frames includes:

detecting the similarity of motion characteristic vectors of the two reference frames, wherein the motion characteristic vectors comprise position information of bone key points of digital human limb parts in corresponding information frames;

when the similarity reaches a preset threshold value, performing frame interpolation operation on the two reference frames in the three-dimensional coordinate system;

and when the similarity does not reach a preset threshold value, converting the position information of the skeletal key points in the two reference frames into Euler angles from the three-dimensional coordinate information, performing frame interpolation operation on the two reference frames based on the Euler angles, and converting the two reference frames into the three-dimensional coordinate information again after the frame interpolation operation.
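
The two interpolation branches can be illustrated with a short sketch. It assumes plain linear interpolation for both: directly on the three-dimensional coordinates when the reference frames are similar, and component-wise along the shortest arc on Euler angles (in degrees) otherwise. The conversion between key point coordinates and Euler angles depends on the skeleton hierarchy and is not shown; the function names are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # one [x, y, z] entry per skeletal key point


def interpolate_frames(before: Frame, after: Frame, count: int) -> List[Frame]:
    """Insert `count` frames between two reference frames by linear interpolation."""
    frames: List[Frame] = []
    for step in range(1, count + 1):
        t = step / (count + 1)  # fraction of the way from `before` to `after`
        frames.append([
            [p + (q - p) * t for p, q in zip(point_a, point_b)]
            for point_a, point_b in zip(before, after)
        ])
    return frames


def interpolate_angle(a_deg: float, b_deg: float, t: float) -> float:
    """Interpolate one Euler angle along the shortest arc, result in [0, 360)."""
    delta = (b_deg - a_deg + 180.0) % 360.0 - 180.0  # signed shortest difference
    return (a_deg + delta * t) % 360.0


inserted = interpolate_frames([[0.0, 0.0, 0.0]], [[4.0, 0.0, 0.0]], 3)
halfway = interpolate_angle(350.0, 10.0, 0.5)
```

Interpolating angles instead of coordinates keeps bone lengths constant when the two reference poses differ strongly, which is one plausible reason for the second branch.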

According to another aspect of the present application, there is provided a digital human motion control apparatus including:

the file acquisition module is used for acquiring a motion file corresponding to the motion of a digital person, wherein the motion file comprises information frames corresponding to all image frames in a motion image of the digital person, and the information frames store the position information of skeletal key points of the digital person;

the abnormality detection module is used for detecting action abnormality according to the position information of the bone key points in each information frame and determining action abnormality fragments which describe the action abnormality phenomenon and comprise one or more information frames with continuous time sequence;

the action correcting module is used for correcting the action abnormal segment so as to overcome the action abnormal phenomenon;

and the file application module is used for applying the motion file to drive the digital human to move so as to generate the motion image.

According to another aspect of the present application, there is provided a digital human action control device, comprising a central processing unit and a memory, wherein the central processing unit is used for calling and running a computer program stored in the memory to execute the steps of the digital human action control method.

According to another aspect of the present application, there is provided a non-transitory readable storage medium storing a computer program implemented according to the digital human action control method in the form of computer readable instructions, the computer program, when called by a computer, executing the steps included in the method.

According to another aspect of the present application, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method described in any one of the embodiments of the present application.

Compared with the prior art, the present application performs abnormal action detection on a pre-generated action file used for driving the digital person to move, determines the information frames corresponding to an abnormal action segment, and corrects those information frames, thereby eliminating uncoordinated abnormal action phenomena such as model clipping, unsmooth action or rhythm, and stillness in the actions of the digital person.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a network architecture diagram of an exemplary application environment of the present application;

FIG. 2 is a schematic diagram of an exemplary digital human and its skeletal keypoint distribution of the present application;

FIG. 3 is a schematic flow chart diagram illustrating an embodiment of a digital human motion control method according to the present application;

FIG. 4 is a schematic flow chart of determining an abnormal action segment corresponding to model clipping in an embodiment of the present application;

FIG. 5 is a schematic flow chart of determining a torso model space of a digital person in an embodiment of the present application;

FIG. 6 is a flowchart illustrating a process of determining abnormal motion segments corresponding to unsmooth motion in the embodiment of the present application;

FIG. 7 is a flow chart illustrating matching of information frame sets for anomalous motion segments in accordance with an embodiment of the present application;

FIG. 8 is a flowchart illustrating a frame insertion operation performed by differentiating different situations according to an embodiment of the present application;

FIG. 9 is a functional block diagram of the digital human motion control apparatus of the present application;

FIG. 10 is a schematic structural diagram of a digital human motion control device used in the present application.

Detailed Description

Referring to FIG. 1, a network architecture adopted in an exemplary application scenario of the present application includes a terminal device 80, a media server 81, and an application server 82. The application server 82 may be used to deploy a live-streaming service, and the media server 81 may be used to generate the video stream of a digital person according to an action file. When a user accesses the live-streaming service provided by the application server 82 from the terminal device 80 and starts to control the output of the digital person's video stream, the user may specify an action file for controlling the motion of the digital person. The media server 81 then generates the corresponding video stream according to the action file, by means of a process created by running a computer program that implements the digital human action control method of the present application, and pushes the video stream to a live-streaming room in coordination with the application server 82, thereby providing the live-streaming service over the network.

In another exemplary application scenario of the present application, a terminal device may implement a function corresponding to the media server, and after the action file is specified by a user, a corresponding digital human video stream is generated according to the action file through running of a process generated by a computer program implementing the digital human action control method of the present application and is pushed to a remote computer device, so that further applications related to conversation and live broadcast can be implemented.

In an exemplary industrial application, the action file may be generated by a machine learning model, trained in advance to convergence, from action-related information provided by music, voice, text, dance video, and the like. The action file comprises information frames corresponding to the image frames of the digital human video stream to be generated. Each information frame stores image information describing the digital person in the corresponding image frame. The image information is described by the position information of the preset skeletal key points of the digital person, and the specific data format may be adapted to the output format of the machine learning model, standardized, or preset arbitrarily. For example, in one embodiment, the action file may represent each information frame by the coordinate values, in the image coordinate system, of the preset skeletal key points, so as to suit a digital person generated by three-dimensional modeling. It can be seen that any action file can be used in the present application, whether generated by a machine learning model or by any other technical means from the action-related information, as long as its internal data structure is defined in advance and represents the position information used to control the skeletal key points of the digital person to produce the corresponding motion.

The skeletal key points of the digital person, as illustrated in FIG. 2, are mainly distributed at the skeletal joints of the body and limbs of the digital person. Usually, the displacement of a skeletal key point in the image coordinate system drives the corresponding body part of the digital person to produce a corresponding motion effect. The position information of the same group of skeletal key points is usually adjusted gradually across multiple information frames, which yields image frames in which the action of the digital person changes gradually. Playing the video stream formed by these image frames in sequence visually presents the motion of the body parts corresponding to those skeletal key points.

The digital person may be modeled in advance so as to adjust image frame positions where the images of the body part corresponding to the respective skeletal key points are located according to the position information of the respective skeletal key points in the information frame, so that the corresponding digital person image frame may be generated by rendering in the following. In addition, according to actual needs, when generating the image frame of the digital person, a corresponding background image or other foreground images and the like can be added to the image frame, and therefore, the method can be flexibly implemented by a person skilled in the art.

Referring to fig. 3, in an embodiment of a digital human action control method according to the present application, the method includes the following steps:

step S1100, acquiring an action file corresponding to the motion of a digital person, wherein the action file comprises information frames corresponding to all image frames in a motion image of the digital person, and the information frames store the position information of skeletal key points of the digital person;

As mentioned above, a given action file stores the information frames used to control a digital person to generate a video, and each information frame describes the position information of each skeletal key point in the three-dimensional model of the digital person. As shown in FIG. 2, the three-dimensional model of a digital person has 24 skeletal key points. The action file therefore stores the position information of these 24 skeletal key points for each image frame of the video stream, and the set of position information of all 24 skeletal key points corresponding to one image frame is regarded as one information frame. Usually, the information frames correspond one-to-one to the image frames in time order.

In one embodiment, the position information may be described as relative position information between skeletal key points. For example, a certain skeletal key point on the body of the three-dimensional model of the digital person is taken as a reference point whose position information is represented as $[0, 0, 0]$, and the position information of every other skeletal key point is expressed as $[\Delta x_{i,j}, \Delta y_{i,j}, \Delta z_{i,j}]$, where $\Delta$ denotes the relative offset along the same axis, $i$ is the index of the information frame, and $j$ is the index of the skeletal key point. In another embodiment, the position information may instead be described as absolute position information in a three-dimensional coordinate system; for example, the position information of a skeletal key point is represented as $[x_{i,j}, y_{i,j}, z_{i,j}]$.
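
The two representations can be converted into each other, as the following minimal sketch shows. It assumes one information frame is held as a list of `(x, y, z)` tuples, one per skeletal key point; the function names and the `ref_index` parameter are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def to_relative(frame: List[Point], ref_index: int) -> List[Point]:
    """Express every key point as a per-axis offset from the reference key point."""
    rx, ry, rz = frame[ref_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in frame]


def to_absolute(offsets: List[Point], ref_position: Point) -> List[Point]:
    """Recover absolute coordinates from offsets and the reference position."""
    rx, ry, rz = ref_position
    return [(dx + rx, dy + ry, dz + rz) for dx, dy, dz in offsets]


frame = [(1.0, 2.0, 3.0), (2.0, 4.0, 6.0)]   # absolute positions of two key points
relative = to_relative(frame, ref_index=0)   # the reference point becomes the origin
restored = to_absolute(relative, frame[0])
```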

In fact, the data structure inside the action file can take any form: the position information of all skeletal key points in all information frames can be represented by any pre-specified data structure, as long as the present application can parse and use it accordingly.

Because the digital human is a three-dimensional model obtained by three-dimensional modeling, the position information can be generally expressed according to a three-dimensional coordinate system mode, so that the position information expression can be quickly corresponding to the three-dimensional model of the digital human, and the actual image transformation adjustment of the corresponding bone key points in the three-dimensional model of the digital human is facilitated.

When the action file is obtained by means of a machine learning model, corresponding input information is prepared according to the requirement of a specified input information format when the machine learning model is trained, the input information is fed into the machine learning model, the position information of the skeletal key points of the digital human in each image frame of the corresponding video stream is obtained through inference of the machine learning model, and the position information is stored as the corresponding action file according to the data format specification of the action file and can be called. For example:

in one embodiment, the machine learning model can extract audio features from audio data provided by music files or voice data, then generate position information of digital human skeleton key points corresponding to corresponding dance actions according to the audio features, and store and output corresponding action files for controlling digital people to implement dance actions, thereby implementing intelligent dance services.

In another embodiment, the machine learning model may detect and determine position information of bone key points corresponding to a dance action performed by a character object in each image frame according to image features corresponding to a plurality of image frames in a dance video, and store and generate corresponding action files for controlling the digital person to perform the dance action, thereby implementing a service of performing dance by a real person and transferring the dance action to the digital person.

In another embodiment, the machine learning model may extract text semantic features from an action description text expressed in a natural language, generate position information of a digital human skeleton key point corresponding to a corresponding dance action according to the text semantic features, and store the position information as a corresponding action file to control the digital human to implement the dance action, thereby implementing an associated dance service.

In other embodiments corresponding to other possible scenarios, the machine learning model may also be trained in advance to serve martial arts movements, gymnastics movements, game movements, etc., and perform deep semantic information extraction according to movement-related information provided by materials such as music, voice, text, dance video, etc., and then generate corresponding position information of skeletal key points of the digital person according to the deep semantic information, and store and generate a corresponding movement file for controlling the digital person to implement a corresponding movement.

Step S1200, performing action abnormity detection according to the position information of the bone key points in each information frame, and determining that action abnormity segments which describe action abnormity phenomena exist in the information frames, wherein the action abnormity segments comprise one or more information frames with continuous time sequence;

Since the position information of the skeletal key points of the digital person may be described inaccurately in an action file generated by a machine learning model or by other means, a video stream generated by controlling the digital person according to that action file may, when played, show abnormal actions such as model clipping, unsmooth action or rhythm, and stillness. Abnormal action detection can therefore be performed on the action file before it is applied: the detection determines one or more information frames corresponding to an abnormal action, and these information frames are determined as an abnormal action segment so that the segment can be optimized as a unit.

In one embodiment, whether the action file contains an abnormal action phenomenon corresponding to model clipping may be detected according to the position information of the skeletal key points in each information frame, specifically by detecting whether one or more information frames, once applied, would cause a clipping effect when the digital person performs the action. Model clipping here refers to different parts of the digital person's body intersecting in three-dimensional space; for example, model clipping occurs when an arm of the digital person passes through the trunk model space. The trunk model space may be defined as a volumetric region corresponding to the thorax of the digital person. Because limbs such as the hands and feet can move over a large range when the digital person performs an action, model clipping can be detected by checking whether the position information of the skeletal key points of such limbs falls within the three-dimensional space occupied by the trunk of the digital person. Each information frame found to exhibit model clipping is an abnormal information frame; temporally consecutive abnormal information frames are marked as the same abnormal action segment for subsequent targeted processing.

In another embodiment, whether the action file contains an abnormal action phenomenon corresponding to unsmooth action, such as an unsmooth action or rhythm, may be detected according to the position information of the skeletal key points in the information frames, specifically by detecting whether one or more information frames, once applied, would make the action of the digital person unsmooth. Since temporally adjacent information frames should change gradually when describing the position information of the same skeletal key point, whether a later information frame is unsmooth relative to the preceding one can be determined by computing the similarity between the vectors of every two adjacent information frames. One or more temporally consecutive information frames with low similarity are determined as an abnormal action segment for targeted optimization.
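
A minimal sketch of this adjacent-frame check follows. It assumes cosine similarity as the inter-vector similarity and flags a pair when the similarity falls below a threshold; the metric, the threshold value of 0.9 in the usage lines, and the function names are illustrative assumptions, as the text names neither a metric nor a value.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two action feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Two zero vectors are treated as identical, otherwise as dissimilar.
        return 1.0 if norm_u == norm_v else 0.0
    return dot / (norm_u * norm_v)


def unsmooth_pairs(vectors: List[Sequence[float]], threshold: float) -> List[int]:
    """Indices i where frame i and frame i + 1 are less similar than the threshold."""
    return [
        i for i in range(len(vectors) - 1)
        if cosine_similarity(vectors[i], vectors[i + 1]) < threshold
    ]


vectors = [[1.0, 0.0, 0.0], [1.0, 0.01, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.01]]
jumps = unsmooth_pairs(vectors, threshold=0.9)  # only the jump between frames 1 and 2
```

Each flagged index can then be widened into a segment covering both frames of the pair and, if desired, a few neighbouring frames.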

In still another embodiment, whether the action file contains an abnormal action phenomenon corresponding to stillness may be detected according to the position information of the skeletal key points in the information frames. Specifically, this may be determined from the range of variation of the inter-vector similarity of one or more temporally later information frames relative to a temporally earlier reference information frame: when the accumulated similarity of the later information frames is lower than a preset threshold, all information frames from the reference information frame to the last detected information frame are determined to constitute an abnormal action segment, which can then be optimized accordingly.
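
One possible reading of this stillness check is sketched below. It is an assumption, not the application's stated computation: the accumulated quantity is taken to be the Euclidean change of each later frame relative to the reference frame, and a run is treated as still when that accumulated change stays below the threshold. The function names and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence


def accumulated_change(vectors: List[Sequence[float]], start: int, count: int) -> float:
    """Sum of distances from the reference frame to each of the next `count` frames."""
    reference = vectors[start]
    later = vectors[start + 1:start + 1 + count]
    return sum(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(reference, v)))
        for v in later
    )


def is_static(vectors: List[Sequence[float]], start: int, count: int,
              threshold: float) -> bool:
    """True when the frames after `start` barely move away from the reference frame."""
    return accumulated_change(vectors, start, count) < threshold


vectors = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.001), (0.0, 0.0, 0.002), (1.0, 0.0, 0.0)]
frozen = is_static(vectors, start=0, count=2, threshold=0.01)
```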

The above embodiments, which detect different kinds of abnormal action, may be applied individually or in combination; their purpose is to locate abnormal action segments in the action file for targeted optimization. As these embodiments show, an abnormal action segment includes one or more information frames, all of which are regarded as abnormal information frames that describe an abnormal action and require targeted optimization.

Step S1300, correcting the abnormal action segment to overcome the abnormal action phenomenon;

For an abnormal action segment determined by detection, either the whole segment can be corrected by technical means, or the position information of the skeletal key points in the abnormal information frames within the segment can be corrected, thereby optimizing the segment.

In one embodiment, the following modification step S1310 may be employed: when the action abnormal fragment comprises a single information frame, adjusting the position information of the skeleton key points falling into the body model space based on the body model space corresponding to the information frame, so that the corresponding limb area is not overlapped with the body model space:

For the case where the abnormal action segment includes only a single information frame, the position information of specific skeletal key points can be corrected individually to minimize the amount of computation. For example, for a skeletal key point of a limb of the digital person that causes model clipping, the correction can be achieved by moving that key point's position outside the body model space. Similarly, if an unsmooth or still action is caused by a single information frame, it can be overcome by correcting the position information of specific skeletal key points in that information frame.
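
The single-frame correction for model clipping can be sketched as follows. The sketch assumes the body model space is an axis-aligned box given by its minimum and maximum corners, and it moves an offending key point through the nearest face of the box plus a small margin; the box shape, the nearest-face rule, the `margin` parameter, and the function name are all illustrative assumptions.

```python
from typing import Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (minimum corner, maximum corner)


def push_out_of_box(point: Point, box: Box, margin: float = 0.01) -> Point:
    """Move a key point that lies inside the box just beyond its nearest face."""
    lo, hi = box
    inside = all(lo[k] <= point[k] <= hi[k] for k in range(3))
    if not inside:
        return tuple(point)              # nothing to correct
    best_gap = float("inf")
    best_axis, best_value = 0, point[0]
    for k in range(3):
        # Each axis has two faces; `outward` pushes the point past the face.
        for face, outward in ((lo[k], -margin), (hi[k], margin)):
            gap = abs(point[k] - face)
            if gap < best_gap:
                best_gap, best_axis, best_value = gap, k, face + outward
    moved = list(point)
    moved[best_axis] = best_value        # change only the coordinate on the nearest axis
    return tuple(moved)


box = ((-1.0, 0.0, -0.5), (1.0, 4.0, 0.5))
fixed = push_out_of_box((0.9, 2.0, 0.0), box, margin=0.05)  # exits through the x = 1 face
```

Moving a single key point changes the length of the bones attached to it, so a fuller implementation would likely adjust the neighbouring key points of the same limb as well.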

In another embodiment, the following modification step S1320 may be adopted: when the action abnormal fragment comprises a plurality of information frames of a first number, matching an information frame set similar to the action of each information frame in the action abnormal fragment from a material library, and replacing the action abnormal fragment in the action file with the information frame set:

for the case where the abnormal action segment contains many information frames, for example more than a preset first number, the segment can be corrected by replacing it as a whole. The replacement material can be retrieved from a preset material library, which collects and stores in advance the information frame sets corresponding to different actions, each information frame set forming one material. When material needs to be retrieved, the vector similarity between the abnormal action segment and each information frame set in the library is determined, the materials are ranked by similarity, and the information frame set of the most similar material is selected to replace the abnormal action segment. It should be noted that the vector similarity may be calculated at the information frame level: the similarity between the information frame set and the abnormal action segment is calculated frame by frame and accumulated into a total similarity, and the information frame set with the highest total similarity is then used to correct the segment. This embodiment can be used for abnormal action segments caused by clipping, and applies equally to segments corresponding to unsmooth motion, static motion and the like.

In still another embodiment, the following modification step S1330 processing may be adopted: when the abnormal action segment comprises a plurality of information frames of a second quantity, taking the information frames which are positioned before and after the abnormal action segment in the action file as reference frames, performing frame interpolation operation based on the two reference frames to generate an information frame set, and replacing the abnormal action segment in the action file with the information frame set:

for the case where the abnormal action segment contains relatively few information frames, for example fewer than the first number but more than a preset second number, the information frames immediately before and after the segment may be used as reference frames. A frame interpolation operation based on the two reference frames yields information frames that transition smoothly between them, and these replace the abnormal information frames in the segment. The frame interpolation may use linear interpolation or spherical interpolation, as chosen by a person skilled in the art. Similarly, this embodiment can be used for abnormal action segments caused by clipping, and applies equally to segments corresponding to unsmooth motion, static motion and the like.
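The linear variant of this frame interpolation can be sketched as follows. The frame representation (a list of `(x, y, z)` tuples, one per skeletal key point) and the function name `lerp_frames` are assumptions made for illustration only.

```python
def lerp_frames(ref_before, ref_after, count):
    """Generate `count` frames evenly spaced between two reference frames.

    Each frame is a list of (x, y, z) key point positions; both reference
    frames are assumed to list the same key points in the same order.
    """
    frames = []
    for k in range(1, count + 1):
        t = k / (count + 1)  # interpolation weight, strictly between 0 and 1
        frame = [
            tuple(a + t * (b - a) for a, b in zip(p_before, p_after))
            for p_before, p_after in zip(ref_before, ref_after)
        ]
        frames.append(frame)
    return frames
```

Calling `lerp_frames(before, after, n)` with `n` equal to the length of the abnormal segment produces a drop-in replacement of the same duration.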

It can be seen that, according to the number of abnormal information frames that need to be corrected in the abnormal movement segment, the different correction steps S1310, S1320, and S1330 can be flexibly selected to correct the abnormal movement segment, and any one or more correction steps can be adopted to perform correction processing, so that after the movement file is applied, the digital person can be controlled to perform smooth and natural movement, and a corresponding smooth image effect is generated.
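The choice among the three correction steps can be pictured as a small dispatch on the segment length. The sketch below is only one reading of the text: the parameter names are hypothetical, and the fallback for multi-frame segments that do not exceed the second number is an assumption, since the description leaves that case open.

```python
def choose_correction(num_frames, first_number, second_number):
    """Pick a correction step label from the number of abnormal frames.

    Assumes first_number > second_number. The final fallback (per-frame
    key point adjustment) is an assumption, not stated in the description.
    """
    if num_frames == 1:
        return "S1310"  # adjust key points of the single frame
    if num_frames > first_number:
        return "S1320"  # replace with matched material from the library
    if num_frames > second_number:
        return "S1330"  # regenerate by frame interpolation
    return "S1310"      # assumed fallback for very short segments
```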

As a specific example, where the action file contains several abnormal action segments, if similarity detection determines that two of them have similar actions, and a correction mode has already been determined for one of them and has generated the corresponding information frames, those information frames may also be used to replace the other segment, improving correction efficiency. Repeated abnormal action segments mostly occur in action files generated from music or dance videos. Such files are essentially choreographed to the rhythm of the music, music usually contains repeated passages, and the corresponding dance actions therefore recur frequently. Consequently, when one abnormal action segment exists in the action file, one or more similar segments often follow, and the approach of this embodiment can be used for that situation.

As can be seen from the above description, when the manner disclosed in one or more of the above embodiments is used to correct all abnormal motion segments in the motion file, the motion file is optimized, and when the motion file is used to control the motion of a digital person, the generated motion image can overcome the corresponding abnormal motion phenomenon, so that the image playing effect is more natural and smooth.

And step S1400, the motion file is applied to drive the digital human to move so as to generate the motion image.

After the action file has been optimized, the three-dimensional model of the digital person can be driven frame by frame to perform the action transformation described by each information frame, and the image frame corresponding to each information frame is obtained by rendering. The image frames are encoded into a video stream, from which a playable video file can be generated or which can be pushed to a corresponding network address. When the video stream is played, it shows the digital person smoothly performing the actions described by the action file.

In one embodiment, when generating the image frames corresponding to the information frames, a background image and/or a decoration image can be added around the digital person to make the picture more lifelike.

In another embodiment, an anchor user in a live broadcast room of a network live broadcast service can specify a piece of music. The server obtains the action file corresponding to the music and optimizes it by the present method, then generates image frames frame by frame from the information frames of the optimized action file to obtain a video stream, combines the video stream with the audio stream of the music to form a live stream, and pushes the live stream to the anchor user's live broadcast room. After audience users in the live broadcast room receive the live stream, the player in the live broadcast room parses and plays it, producing audio and video in which the music and the digital person's actions are synchronized and thereby realizing a virtual live broadcast service.

According to this embodiment, action anomaly detection is performed on the pre-generated action file used to drive the digital person, the information frames of each abnormal action segment are determined and corrected, and action anomalies of the digital person such as clipping, unsmooth motion, static poses and inconsistent rhythm are eliminated.

On the basis of any embodiment of the present application, please refer to fig. 4, implementing abnormal motion detection according to the position information of the skeletal key points in each of the information frames, and determining that there is an abnormal motion segment describing an abnormal motion phenomenon, including:

step S2100, reading at least one information frame in the action file as a target information frame, and determining a trunk model space of the digital person according to skeleton key points corresponding to a trunk area of the digital person in the target information frame;

when clipping detection is performed on the action file, the information frames can be taken one by one as the target information frame. If a target information frame is found to exhibit the clipping phenomenon, that target information frame is an abnormal information frame.

To implement clipping detection, the torso region of the digital person needs to be determined. The torso region is generally set as the region corresponding to the chest of the digital person. In the example of skeletal key points shown in fig. 2, the skeletal key points 16, 17 and 9 may be used to define the chest region. A torso model space may then be expanded, in the three-dimensional space of the digital person, around the chest region as its center. The torso model space may be sized to cover the entire chest of the digital person, and it may be enlarged appropriately to reserve a margin for the clothing worn by the digital person.

Step S2200, detecting whether the limb area falls into the body model space according to the position information of the skeleton key point corresponding to the limb area in the target information frame, and determining that the target information frame has an abnormal action phenomenon corresponding to the crossing of the model when the limb area falls into the body model space;

the limb areas of a digital person, for example as shown in fig. 2, can be set as the areas bounded by skeletal key points 17, 19, 21 on the left arm and skeletal key points 16, 18, 20 on the right arm in the figure.

In particular exemplary embodiments, each arm may use the position information of the shoulder joint point 17 or 16, the elbow joint point 19 or 18, and the wrist joint point 21 or 20 to derive additional points that further refine the position information. Specifically, for the left arm in the figure, four points, namely the elbow joint point 19, the wrist joint point 21, the center point between the shoulder joint point 17 and the elbow joint point 19, and the center point between the elbow joint point 19 and the wrist joint point 21, may be used to define the limb area corresponding to the left arm. Similarly, for the right arm in the figure, four points, namely the elbow joint point 18, the wrist joint point 20, the center point between the shoulder joint point 16 and the elbow joint point 18, and the center point between the elbow joint point 18 and the wrist joint point 20, can be used to define the limb area corresponding to the right arm. The position information of these four points of the left arm and of the right arm can be constructed as motion feature vectors in the present application for detecting the similarity of actions between two information frames. Their specific application is disclosed in other embodiments of the present application and is not elaborated here.
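The four points per arm and the resulting motion feature vector can be computed as in the following sketch. The function names and the tuple-based point representation are assumptions for illustration; the point selection follows the description above.

```python
def midpoint(a, b):
    """Center point between two 3D points."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def arm_points(shoulder, elbow, wrist):
    """Four points delimiting one arm: elbow, wrist, and the two bone centers."""
    return [
        elbow,
        wrist,
        midpoint(shoulder, elbow),
        midpoint(elbow, wrist),
    ]


def motion_feature_vector(left_arm, right_arm):
    """Flatten the 4 + 4 arm points into one 24-dimensional vector.

    `left_arm` and `right_arm` are (shoulder, elbow, wrist) triples.
    """
    points = arm_points(*left_arm) + arm_points(*right_arm)
    return [coordinate for point in points for coordinate in point]
```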

The limb area corresponding to each arm can therefore be represented as a feature vector formed from the position information of four points, giving a numerical representation of the limb area. The torso model space of the digital person is likewise represented numerically in advance. On this basis, whether the limb area falls into the torso model space can be determined by calculation. When any part of the limb area determined from a target information frame falls into the torso model space, the target information frame is determined to describe an action anomaly and is therefore marked as an abnormal information frame.
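With both sides represented numerically, the check reduces to a point-in-box test. The sketch below assumes the model space is an axis-aligned cuboid with hypothetical `box_min` and `box_max` corners and that the limb area is given as a list of points.

```python
def point_in_box(point, box_min, box_max):
    """True if the point lies inside or on the boundary of the cuboid."""
    return all(lo <= c <= hi for c, lo, hi in zip(point, box_min, box_max))


def frame_has_clipping(limb_points, box_min, box_max):
    """A frame is abnormal if any limb point falls into the model space."""
    return any(point_in_box(p, box_min, box_max) for p in limb_points)
```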

Each information frame in the action file is checked in the above manner to determine whether it is an abnormal information frame, which completes the detection of all the information frames.

It should be noted that the above is only an example in which the arms are the limb areas and are checked against the torso model space corresponding to the chest. The same process can be applied to limb areas corresponding to the legs and feet, and those skilled in the art can adapt the limb areas to different needs according to the principles disclosed herein.

Step S2300, constructing a single or multiple time-series continuous target information frames with abnormal action phenomena into abnormal action segments.

When abnormal information frames are detected, those that are consecutive in time sequence are grouped into the same abnormal action segment, so that each segment can subsequently be corrected as a unit. It is understood that an abnormal action segment can contain one or more information frames, depending on the actual situation.
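Grouping consecutive abnormal frames into segments can be sketched as a single pass over per-frame flags. The input format (one boolean per information frame) and the inclusive `(start, end)` index pairs are assumptions for illustration.

```python
def group_segments(abnormal_flags):
    """Group runs of consecutive abnormal frames into (start, end) segments.

    `abnormal_flags[i]` is True when information frame i is abnormal.
    Both indices of each returned pair are inclusive.
    """
    segments = []
    start = None
    for index, abnormal in enumerate(abnormal_flags):
        if abnormal and start is None:
            start = index  # a new run begins
        elif not abnormal and start is not None:
            segments.append((start, index - 1))  # the run just ended
            start = None
    if start is not None:
        segments.append((start, len(abnormal_flags) - 1))
    return segments
```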

According to this embodiment, clipping detection determines from each information frame in the action file a limb area, which moves frequently and with large amplitude, and a torso model space, which moves less often and with small amplitude. The position of the limb area relative to the torso model space is then detected, and this relative position is used to judge whether the information frame describes the clipping phenomenon. The method depends on little data and runs efficiently, requires no heavy computation or complex technical means, completes quickly with accurate results, and more easily meets the timeliness requirement of scenarios with high real-time demands.

Based on any embodiment of the present application, please refer to fig. 5, determining a torso model space of the digital person according to the skeletal key points corresponding to the torso region of the digital person in the target information frame includes:

step S2110, determining a triangular plane area according to three skeletal key points in the target information frame, namely the two shoulder joint points corresponding to the trunk area and the trunk base point;

as illustrated in fig. 2, the two shoulder joint points 16 and 17 and the trunk base point 9 at the bottom of the chest (any one of the points 6, 3 and 0 may be used instead) are connected to obtain a triangular plane area.

Step S2120, calculating and determining a geometric center point of the triangular plane area, and determining coordinate information of the geometric center point in a three-dimensional coordinate system;

according to the position information of the two shoulder joint points and the trunk base point in the target information frame, the position of the geometric center point (centroid) of the triangular plane area can be calculated quickly as the average of the three vertex coordinates, and it can be expressed as coordinate information in the three-dimensional coordinate system corresponding to the three-dimensional model of the digital person.

And step S2130, expanding in three dimensions from the coordinate information of the geometric center point to determine the trunk model space of the digital person.

The digital person is a three-dimensional model, so its trunk model space is also defined in three dimensions. The geometric center point of the triangular plane area can be used as the base point of the trunk model space, and a cuboid space is expanded from it along the three axes of the three-dimensional coordinate system according to preset thicknesses. The cuboid space completely covers the body of the digital person at the corresponding position and can also reserve a margin for the clothing worn by the digital person. This cuboid space is set as the trunk model space of the digital person and is represented numerically, so that it can be used to detect whether a limb area falls into the trunk model space and thus to identify whether the clipping phenomenon exists in an information frame.
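Steps S2110 to S2130 can be sketched in a few lines. The function name `torso_box` and the `half_extent` parameter (half of the preset thickness along each axis, including any clothing margin) are hypothetical, and the cuboid is assumed to be axis-aligned.

```python
def torso_box(left_shoulder, right_shoulder, base_point, half_extent):
    """Build an axis-aligned cuboid around the centroid of the chest triangle.

    Returns (box_min, box_max). `half_extent` holds the assumed half
    thickness of the cuboid along x, y and z.
    """
    # Centroid of a triangle: the average of its three vertices.
    centroid = tuple(
        (a + b + c) / 3
        for a, b, c in zip(left_shoulder, right_shoulder, base_point)
    )
    box_min = tuple(c - h for c, h in zip(centroid, half_extent))
    box_max = tuple(c + h for c, h in zip(centroid, half_extent))
    return box_min, box_max
```

Because the box is recomputed from the shoulder and base points of each frame, it follows the torso as the digital person moves.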

According to the embodiment, when the trunk model space of the digital human is determined, the trunk model space can be quickly obtained by applying the geometric principle, the efficiency is very high, the occupation of system operation resources is low, and the implementation is more efficient.

On the basis of any embodiment of the present application, please refer to fig. 6, implementing abnormal motion detection according to the position information of the skeletal key points in each of the information frames, and determining that there is an abnormal motion segment describing an abnormal motion phenomenon, including:

step S3100, continuously calculating similarity of motion characteristic vectors of every two time-sequence continuous information frames in the motion file, wherein the motion characteristic vectors comprise position information of bone key points of digital human limb parts in corresponding information frames;

when detecting unsmooth motion or rhythm in the action file, the anomaly can be found by similarity detection between information frames. Specifically, two adjacent information frames are taken from the action file as target information frames; the data distance between the vector representation of the skeletal key point positions in the later frame and that in the earlier frame is calculated and converted into a similarity; and whether the action transition between the two target information frames is unsmooth is judged by comparing the similarity with a preset threshold.

In practice, if the position information of all the skeletal joint points in two target information frames is used to calculate the data distance, the calculation amount is high, and therefore, in this embodiment, for each target information frame, the position information of four points corresponding to the left arm and the position information of four points corresponding to the right arm may be used, and the position information corresponding to eight points in total is used to construct a corresponding motion feature vector, and generally the motion feature vector indicates the image position presented by the arm motion in the corresponding information frame.

The motion feature vector indicates the position information of the limb parts with which the digital person performs the action. When selecting the points that make up the vector, the arms are therefore the main consideration: for each arm, the elbow joint point, the wrist joint point, the center point between them, and the center point between the shoulder joint point and the elbow joint point are used, four points in total. When the selected limbs are the legs and feet, points are taken correspondingly on the legs and feet of the digital person. Of course, suitable modifications can be made to the construction of the motion feature vector illustrated above, for example by selecting only the position information of the shoulder joint point, the elbow joint point and the wrist joint point. Note that the construction method of the motion feature vector in this embodiment is also applicable to any other embodiment of the present application.

Since the motion feature vector already represents the position information of each selected point in vector form, and both target information frames have corresponding motion feature vectors, any measure such as cosine similarity, the dot product between vectors, Euclidean distance, the Pearson correlation coefficient or the Jaccard coefficient may be used to calculate the data distance between the two motion feature vectors, which is then converted into a similarity.
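As one concrete instance, the sketch below uses Euclidean distance and converts it into a similarity with the mapping 1 / (1 + d). That mapping, the function names and the convention that a lower similarity marks an unsmooth transition are assumptions; the description leaves the conversion open.

```python
import math


def similarity(vector_a, vector_b):
    """Map Euclidean distance d to a similarity in (0, 1] via 1 / (1 + d).

    The mapping is an illustrative assumption; identical vectors give 1.0.
    """
    return 1.0 / (1.0 + math.dist(vector_a, vector_b))


def unsmooth_pairs(feature_vectors, threshold):
    """Indices i where frames i and i + 1 are less similar than `threshold`."""
    return [
        i
        for i in range(len(feature_vectors) - 1)
        if similarity(feature_vectors[i], feature_vectors[i + 1]) < threshold
    ]
```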

Step S3200, when the similarity is lower than a preset threshold, extending and determining an action abnormal segment covering the two information frames from the action file based on the two corresponding information frames.

It is understood that the inter-vector similarity can be calculated in the above manner for every two adjacent information frames in the action file. The similarity represents how alike the limb positions are in the two information frames: the higher the similarity, the smoother and more natural the action change between the two frames, and the lower the similarity, the less smooth the action change between them.

Therefore, a threshold can be preset for checking whether an unsmooth anomaly exists between two adjacent information frames. The similarity of every two adjacent information frames is compared with the threshold; when the similarity is lower than the threshold (equivalently, when the data distance exceeds the corresponding limit), the two information frames exhibit an unsmooth anomaly, and the pair they form can be marked as an abnormal action segment. If several such pairs of information frames are consecutive, they can be merged into a single abnormal action segment.

Considering that action optimization generally needs to combine several information frames for smoothing, in one embodiment a smoothing factor may be preset to specify the number of information frames by which to extend. When an unsmooth pair of information frames is determined, that number of information frames is obtained by extending toward earlier frames in the time sequence, and the extended information frames together with the pair form the abnormal action segment. For example, if the transition from F(m) to F(m+1) is unsmooth and the smoothing factor α has been preset, the information frames from F(m-α) to F(m+1) may be constructed as the abnormal action segment. Generally, the smoothing factor α is set to an integer greater than 1 and less than 10.
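The extension by the smoothing factor can be sketched as simple index arithmetic. Clamping to the start and end of the action file is an assumption added so that the example handles pairs near the file boundaries.

```python
def extend_segment(m, alpha, num_frames):
    """Return the inclusive frame range F(m - alpha) .. F(m + 1).

    `m` is the index of the earlier frame of the unsmooth pair and `alpha`
    the smoothing factor. Indices are clamped to the file (an assumption).
    """
    start = max(0, m - alpha)
    end = min(num_frames - 1, m + 1)
    return start, end
```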

According to this embodiment, when detecting whether the motion in the action file is smooth, every two adjacent information frames are taken as a unit, the motion feature vector of each information frame is constructed from the position information of a small number of representative skeletal key points, and the similarity between the two information frames is determined from the data distance between their motion feature vectors. Whether the two information frames exhibit unsmooth motion is then judged from the similarity. The computation is light and fast, can meet the timeliness requirement of real-time processing, and gives accurate detection results.

On the basis of any embodiment of the present application, please refer to fig. 7, an information frame set that is similar to the actions of the information frames in the action abnormal segment is matched from a material library, which includes:

step S1321, obtaining action feature vectors corresponding to all information frames in the action abnormal fragment, wherein the action feature vectors comprise position information of bone key points of digital human limb parts in the corresponding information frames;

the motion feature vector may be constructed in the same manner as described in the previous embodiments of the present application. Accordingly, for any abnormal motion segment determined according to any of the foregoing embodiments, a motion feature vector corresponding to each information frame can be obtained.

Step S1322, for the abnormal action segment and the information frame set of each material in the material library, calculating, according to the time-sequence correspondence between their information frames, the data distance between the motion feature vectors of each pair of corresponding information frames, and accumulating the data distances over the information frames of each material's information frame set as the similarity score of that material;

as mentioned above, the present application prepares a material library for storing a plurality of sets of frames of information, each set of frames of information constituting a corresponding material, each set of frames of information being operable to control the digital person to perform a corresponding action. It can be understood that, due to the preparation, when the materials in the material library are used for controlling a digital person to perform a motion and generating a corresponding image, a better motion image expression can be obtained relative to the motion abnormal segment. The material library may be stored as a local list or may be stored in a remote database or local cache. In order to improve the operation efficiency in use, each information frame in each material in the material library is pre-constructed and stored with a corresponding motion feature vector according to the method for constructing the motion feature vector.

When it is necessary to determine whether the abnormal action segment and a target material in the material library represent similar actions, a time-sequence correspondence is first set between the information frames of the abnormal action segment and the information frames of the target material. The data distance is then calculated between the motion feature vector of each information frame in the abnormal action segment and the motion feature vector of the corresponding information frame in the target material, using any known algorithm as described above. Calculating the data distances between corresponding information frames along the time sequence yields a series of values, which are converted into similarities as described above and summed to obtain the similarity score of the abnormal action segment and the target material under the established time-sequence correspondence.

In one embodiment, the time sequence alignment relationship between the information frames of the same target material and the information frames of the abnormal action segment is adjusted, and a plurality of similarity scores can be obtained according to different alignment relationships, so that a plurality of similarity scores can be obtained based on different time sequence alignment relationships between the same target material and the abnormal action segment. Therefore, it is easy to understand that the number of the information frames in the target material may be greater than the number of the information frames in the abnormal action segments, that is, the abnormal action segments may find the most similar local segments in one target material according to different time sequence alignment relationships, and determine the information frame set corresponding to the local segment, so as to serve as a candidate for replacing the abnormal action segment.

For each material in the material library, an information frame set can be determined and a corresponding similarity score obtained by the above process. Thus, using the material library and the similarity detection implemented in this application on the basis of data distances between motion feature vectors, an information frame set and a similarity score can be determined per material for the same abnormal action segment. It is understood that the higher the similarity score, the closer the motion image obtained when the digital person is driven by the corresponding information frame set is to the motion image corresponding to the abnormal action segment.

And step S1323, determining the information frame set of the material with the highest similarity score as the information frame set similar to the action abnormal fragment.

Since each material in the material library yields an information frame set for the abnormal action segment together with a corresponding similarity score, the materials can be ranked by similarity score. Specifically, the information frame set of the material with the highest similarity score is determined as the target information frame set, which is then used to replace the abnormal action segment in the action file.
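Steps S1321 to S1323, including the sliding time-sequence alignment, can be sketched as below. This is an illustrative reading only: the library is assumed to be a dictionary from a material name to a list of precomputed motion feature vectors, and the per-frame similarity 1 / (1 + Euclidean distance) is an assumed conversion from data distance.

```python
import math


def frame_similarity(u, v):
    """Assumed conversion of Euclidean distance into a similarity."""
    return 1.0 / (1.0 + math.dist(u, v))


def best_window(segment, material):
    """Slide the segment over the material; return (best_score, best_offset)."""
    best_score, best_offset = -math.inf, -1
    for offset in range(len(material) - len(segment) + 1):
        score = sum(
            frame_similarity(vec, material[offset + k])
            for k, vec in enumerate(segment)
        )
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


def match_material(segment, library):
    """Return (name, frames) of the window most similar to the segment."""
    best_name, best_frames, best_score = None, None, -math.inf
    for name, material in library.items():
        if len(material) < len(segment):
            continue  # too short to cover the abnormal segment
        score, offset = best_window(segment, material)
        if score > best_score:
            best_name = name
            best_frames = material[offset:offset + len(segment)]
            best_score = score
    return best_name, best_frames
```

The returned window has the same number of frames as the abnormal segment, so it can replace the segment without changing the length of the action file.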

According to the embodiments, it is possible to match the information frame set most similar to the action of the action abnormal segment from the prepared material library based on the similarity between the actions for replacing the action abnormal segment, so as to implement batch replacement of the information frames in the action abnormal segment, and is suitable for a scene with a large number of frames in the action abnormal segment, which not only can improve the efficiency of correcting the action abnormal segment, but also can ensure to obtain a better action file through high-quality materials.

Based on any embodiment of the present application, please refer to fig. 8, the performing the frame interpolation operation based on the two reference frames includes:

step S1331, detecting the similarity of motion characteristic vectors of the two reference frames, wherein the motion characteristic vectors comprise position information of skeleton key points of digital human limb parts in corresponding information frames;

for the two reference frames F(m) and F(n) that require frame interpolation, the motion feature vectors can be obtained in the manner of the previous embodiments, which is not repeated here. The similarity detection algorithm described above is applied to the two motion feature vectors of the two reference frames to determine the similarity between them. The similarity represents the degree of deviation between the actions in the two reference frames, so different smoothing algorithms can be applied for different degrees of deviation to correct the abnormal action segment between the two reference frames. To detect the degree of deviation, a preset threshold may be set. When the similarity reaches the preset threshold, the action deviation is small and a fast operation is suitable, so step S1332 is performed; otherwise the action deviation is large, and step S1333 is applied to perform a more accurate frame interpolation.

Step S1332, when the similarity reaches a preset threshold value, performing frame interpolation operation on the two reference frames in a three-dimensional coordinate system;

When the similarity reaches the preset threshold, the frame interpolation operation on the two reference frames can be performed in the three-dimensional coordinate system of the digital human's three-dimensional model; that is, interpolation is performed directly on the position information of the skeletal key points in the two reference frames. The specific interpolation mode may be linear interpolation or spherical interpolation and can be chosen flexibly. When the similarity is high, the information frames generated by interpolating in the three-dimensional coordinate system are relatively effective.
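A minimal sketch of the linear variant of step S1332 is given below. It assumes that an information frame can be held as a list of (x, y, z) positions, one per skeletal key point, in the same order in both reference frames; the name `lerp_frames` and this in-memory layout are illustrative assumptions, not part of the action file format.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Frame = List[Point]  # assumed layout: one 3D position per skeletal key point


def lerp_frames(frame_m: Frame, frame_n: Frame, count: int) -> List[Frame]:
    """Generate `count` in-between frames by linear interpolation.

    The reference frames themselves are not included in the result.
    """
    if len(frame_m) != len(frame_n):
        raise ValueError("reference frames must have the same key points")
    generated: List[Frame] = []
    for k in range(1, count + 1):
        t = k / (count + 1)  # evenly spaced in the open interval (0, 1)
        generated.append([
            tuple(a + (b - a) * t for a, b in zip(p, q))
            for p, q in zip(frame_m, frame_n)
        ])
    return generated
```

For example, interpolating three frames between a key point at (0, 0, 0) and the same key point at (4, 0, 2) yields positions at one quarter, one half, and three quarters of the way along the straight line between them.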

Step S1333, when the similarity does not reach the preset threshold, converting the position information of the skeletal key points in the two reference frames from three-dimensional coordinate information into Euler angles, performing the frame interpolation operation on the two reference frames based on the Euler angles, and converting the result back into three-dimensional coordinate information after the frame interpolation operation.

When the similarity does not reach the preset threshold, continuing to interpolate in the three-dimensional coordinate system usually causes problems once the generated information frames are applied, such as abrupt changes of direction or abnormal motion after linear interpolation. Therefore, when the similarity does not reach the preset threshold and the action deviation is large, the position information of the skeletal key points of the two reference frames can be converted into an Euler angle representation, the interpolation operation performed on that representation, and the result converted back into position information in the three-dimensional coordinate system once interpolation is complete. As before, the specific interpolation mode may be linear interpolation or spherical interpolation and can be chosen flexibly. Interpolating the two reference frames on the basis of Euler angles effectively preserves the rotation characteristics of every skeletal key point, so that the generated information frames accurately reflect the change in action.

According to this embodiment, when an action abnormal segment that represents unsmooth action is smoothed, frame interpolation is performed in a different coordinate representation depending on the magnitude of the action deviation described by the two reference frames of the segment, striking an effective balance between fast interpolation and accurate interpolation.
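The idea of step S1333 can be illustrated for a single bone. The source does not specify how positions are converted into Euler angles, so the sketch below makes its own assumptions: the direction of a bone from a parent key point to a child key point is described by a yaw angle and a pitch angle plus a bone length (roll cannot be recovered from a position alone), and the parent is taken to be at the same position in both reference frames. All function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def to_angles(parent: Vec3, child: Vec3) -> Tuple[float, float, float]:
    """Describe the bone parent->child as (yaw, pitch, length)."""
    dx, dy, dz = (c - p for c, p in zip(child, parent))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        return 0.0, 0.0, 0.0
    yaw = math.atan2(dy, dx)
    pitch = math.asin(max(-1.0, min(1.0, dz / length)))
    return yaw, pitch, length


def from_angles(parent: Vec3, yaw: float, pitch: float, length: float) -> Vec3:
    """Convert (yaw, pitch, length) back into a 3D child position."""
    return (
        parent[0] + length * math.cos(pitch) * math.cos(yaw),
        parent[1] + length * math.cos(pitch) * math.sin(yaw),
        parent[2] + length * math.sin(pitch),
    )


def lerp_angle(a: float, b: float, t: float) -> float:
    """Interpolate between two angles along the shorter arc."""
    delta = (b - a + math.pi) % (2.0 * math.pi) - math.pi
    return a + delta * t


def interpolate_bone(parent: Vec3, child_m: Vec3, child_n: Vec3, t: float) -> Vec3:
    """Interpolate the child key point in angle space, then convert back."""
    yaw_m, pitch_m, len_m = to_angles(parent, child_m)
    yaw_n, pitch_n, len_n = to_angles(parent, child_n)
    yaw = lerp_angle(yaw_m, yaw_n, t)
    pitch = pitch_m + (pitch_n - pitch_m) * t
    length = len_m + (len_n - len_m) * t
    return from_angles(parent, yaw, pitch, length)
```

For a unit-length bone rotating from (1, 0, 0) to (0, 1, 0) about a parent at the origin, the midpoint computed this way keeps the bone length at 1, whereas a straight-line midpoint in the three-dimensional coordinate system, (0.5, 0.5, 0), shortens the bone to about 0.707. This is one way to see why the angle-based route suits large action deviations.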

Referring to fig. 9, a digital human motion control apparatus according to an aspect of the present application includes a file obtaining module 1100, an anomaly detection module 1200, a motion correction module 1300, and a file application module 1400, where the file obtaining module 1100 is configured to obtain a motion file corresponding to motion control of a digital human, where the motion file includes information frames corresponding to image frames in a moving image of the digital human, and the information frames store position information of skeletal key points of the digital human; the anomaly detection module 1200 is configured to perform motion anomaly detection according to the position information of the bone key points in each information frame, and determine that there is a motion anomaly segment describing a motion anomaly phenomenon, where the motion anomaly segment includes one or more information frames with continuous time sequence; the motion correction module 1300 is configured to correct the motion abnormality fragment to overcome the motion abnormality; the file application module 1400 is configured to apply the motion file to drive the digital human to move so as to generate the motion image.
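The cooperation of the four modules can be pictured as a small pipeline. The following is only a sketch: the function name `control_digital_human` and the three callables `detect`, `correct`, and `drive` are hypothetical stand-ins for modules 1200, 1300, and 1400, and the action file is assumed to be available as an in-memory list of information frames.

```python
from typing import Any, Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) frame indices


def control_digital_human(
    action_file: Sequence[Any],
    detect: Callable[[List[Any]], List[Segment]],
    correct: Callable[[List[Any], int, int], List[Any]],
    drive: Callable[[List[Any]], Any],
) -> Any:
    """Acquire frames, detect abnormal segments, correct them, drive motion."""
    frames = list(action_file)  # file obtaining step (module 1100)
    # Process segments from last to first so that a replacement of a
    # different length does not shift the indices of earlier segments.
    for start, end in sorted(detect(frames), reverse=True):
        frames[start:end + 1] = correct(frames, start, end)
    return drive(frames)  # file application step (module 1400)
```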

On the basis of any embodiment of the present application, the anomaly detection module 1200 includes: a region determining unit, configured to read at least one information frame in the action file as a target information frame and determine a trunk model space of the digital person according to the skeletal key points corresponding to the trunk region of the digital person in the target information frame; a die-crossing detection unit, configured to detect, according to the position information of the skeletal key points corresponding to a limb area in the target information frame, whether the limb area falls into the trunk model space, and, when it does, determine that the target information frame exhibits the action abnormal phenomenon of die crossing, that is, the limb clipping through the trunk model; and a segment construction unit, configured to construct a single target information frame, or a plurality of time-sequence-continuous target information frames, exhibiting the action abnormal phenomenon into an action abnormal segment.
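The work of the segment construction unit can be sketched as grouping consecutive flagged frames. The sketch assumes the die-crossing detection unit has already produced one boolean flag per information frame; the name `build_abnormal_segments` and the (start, end) index representation are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def build_abnormal_segments(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Group time-sequence-continuous abnormal frames into segments.

    `flags[i]` is True when information frame i shows the abnormal
    phenomenon. Each segment is an inclusive (start, end) index pair;
    a single abnormal frame yields a segment with start == end.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, abnormal in enumerate(flags):
        if abnormal and start is None:
            start = i  # a new segment begins
        elif not abnormal and start is not None:
            segments.append((start, i - 1))  # the segment just ended
            start = None
    if start is not None:  # segment runs to the last frame
        segments.append((start, len(flags) - 1))
    return segments
```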

On the basis of any embodiment of the present application, the region determining unit includes: a triangle determining subunit, configured to determine a triangular plane area from three skeletal key points corresponding to the trunk area in the target information frame, namely the two shoulder joint points and the trunk base point; a center point determining subunit, configured to calculate the geometric center point of the triangular plane area and determine its coordinate information in the three-dimensional coordinate system; and a three-dimensional region determining subunit, configured to expand in the three-dimensional directions from the coordinate information of the geometric center point to determine the trunk model space of the digital person.
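The three subunits can be sketched as follows. The shape produced by the three-dimensional expansion is not stated in this passage, so the sketch assumes an axis-aligned box around the geometric center point; the class `TorsoBox`, the function `torso_space`, and the `half_extents` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class TorsoBox:
    """Assumed trunk model space: an axis-aligned box around a center."""
    center: Vec3
    half_extents: Vec3

    def contains(self, point: Vec3) -> bool:
        """True when a key point lies inside the box on every axis."""
        return all(
            abs(p - c) <= h
            for p, c, h in zip(point, self.center, self.half_extents)
        )


def torso_space(left_shoulder: Vec3, right_shoulder: Vec3,
                torso_base: Vec3, half_extents: Vec3) -> TorsoBox:
    """Build the trunk model space from the shoulder/base triangle.

    The geometric center of a triangle is the mean of its vertices.
    `half_extents` (how far to expand on each axis) is a hypothetical
    tuning parameter chosen per digital human model.
    """
    center = tuple(
        (a + b + c) / 3.0
        for a, b, c in zip(left_shoulder, right_shoulder, torso_base)
    )
    return TorsoBox(center=center, half_extents=half_extents)
```

A limb key point for which `contains` returns True would then be treated as falling into the trunk model space.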

On the basis of any embodiment of the present application, the anomaly detection module 1200 includes: a similarity detection unit, configured to successively calculate the similarity of the motion feature vectors of every two time-sequence-adjacent information frames in the action file, wherein the motion feature vectors include the position information of the skeletal key points of the digital human's limb parts in the corresponding information frames; and a segment acquisition unit, configured to, when the similarity exceeds a preset threshold, expand outward from the two corresponding information frames to determine, from the action file, the action abnormal segment covering those two information frames.
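A sketch of these two units is given below. It assumes the comparison between adjacent frames is expressed as a Euclidean data distance, so that a value above the threshold signals an abrupt jump; this passage does not fix the measure. The expansion rule (a fixed `margin` of frames on each side) and the function names are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Data distance between two motion feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_jump_segments(features: Sequence[Sequence[float]],
                       threshold: float,
                       margin: int = 1) -> List[Tuple[int, int]]:
    """Find inclusive (start, end) segments around abrupt frame-to-frame jumps.

    Every adjacent pair (i, i + 1) whose distance exceeds `threshold` is
    expanded by `margin` frames on each side; touching or overlapping
    segments are merged into one.
    """
    segments: List[Tuple[int, int]] = []
    last = len(features) - 1
    for i in range(last):
        if euclidean_distance(features[i], features[i + 1]) > threshold:
            start = max(0, i - margin)
            end = min(last, i + 1 + margin)
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)  # merge with previous
            else:
                segments.append((start, end))
    return segments
```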

On the basis of any embodiment of the present application, the motion correction module 1300 includes any one or more of the following units: a single-frame correction unit, configured to, when the action abnormal segment comprises a single information frame, adjust the position information of the skeletal key points that fall into the trunk model space, based on the trunk model space corresponding to that information frame, so that the corresponding limb area no longer overlaps the trunk model space; a die-through correction unit, configured to, when the action abnormal segment comprises a first number of information frames, the first number being plural, match from a material library an information frame set whose actions are similar to those of the information frames in the action abnormal segment, and replace the action abnormal segment in the action file with that information frame set; and a smoothing correction unit, configured to, when the action abnormal segment comprises a second number of information frames, the second number being plural, take the information frames located before and after the action abnormal segment in the action file as reference frames, perform a frame interpolation operation based on the two reference frames to generate an information frame set, and replace the action abnormal segment in the action file with that information frame set.
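The single-frame correction can be sketched for one key point. The adjustment rule is not specified in this passage, so the sketch assumes the trunk model space is an axis-aligned box and moves an offending key point out through the nearest face; the function `push_out_of_box` and the `clearance` parameter are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def push_out_of_box(point: Vec3, center: Vec3, half_extents: Vec3,
                    clearance: float = 0.01) -> Vec3:
    """Move a key point that lies inside the box to just outside it.

    The point is displaced along the axis of least penetration, so the
    correction is as small as possible. Points already outside the box
    are returned unchanged.
    """
    offsets = [p - c for p, c in zip(point, center)]
    if any(abs(o) > h for o, h in zip(offsets, half_extents)):
        return point  # no overlap with the trunk model space
    # Distance from the point to the box face on each axis.
    depths = [h - abs(o) for o, h in zip(offsets, half_extents)]
    axis = depths.index(min(depths))
    sign = 1.0 if offsets[axis] >= 0.0 else -1.0
    corrected = list(point)
    corrected[axis] = center[axis] + sign * (half_extents[axis] + clearance)
    return (corrected[0], corrected[1], corrected[2])
```

In practice the displaced key point would usually need to be reconciled with the rest of the limb chain, which this sketch does not attempt.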

On the basis of any embodiment of the present application, the die-through correction unit includes: a vectorization subunit, configured to obtain the motion feature vectors corresponding to the information frames in the action abnormal segment, wherein the motion feature vectors include position information of the skeletal key points of the digital human's limb parts in the corresponding information frames; a matching operation subunit, configured to, for the action abnormal segment and the information frame set of each material in the material library, calculate the data distance between the motion feature vectors of each pair of time-corresponding information frames according to the correspondence of the information frames in time sequence, and accumulate the data distances of the information frames in each material's information frame set into the similarity score of that material; and a set optimization subunit, configured to determine the information frame set of the material with the highest similarity score as the information frame set similar to the action abnormal segment.
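The matching operation can be sketched as below. Several points are assumptions: the data distance is taken to be Euclidean, a smaller accumulated distance is read as a higher similarity, only materials with the same number of frames as the segment are compared, and the material library is modelled as a dictionary from a material name to its list of motion feature vectors. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Feature = Sequence[float]


def frame_distance(a: Feature, b: Feature) -> float:
    """Euclidean data distance between two motion feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def accumulated_distance(segment: Sequence[Feature],
                         material: Sequence[Feature]) -> float:
    """Sum the distances of frames paired by their position in time."""
    return sum(frame_distance(f, g) for f, g in zip(segment, material))


def best_material(segment: Sequence[Feature],
                  library: Dict[str, List[Feature]]) -> str:
    """Return the name of the material most similar to the segment.

    Assumption: the smallest accumulated distance means the highest
    similarity, and only materials of equal frame count are candidates.
    """
    candidates = {
        name: frames for name, frames in library.items()
        if len(frames) == len(segment)
    }
    if not candidates:
        raise ValueError("no material matches the segment length")
    return min(
        candidates,
        key=lambda name: accumulated_distance(segment, candidates[name]),
    )
```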

On the basis of any embodiment of the present application, the smoothing correction unit includes: a similarity operation subunit, configured to detect the similarity of the motion feature vectors of the two reference frames, wherein the motion feature vectors include position information of the skeletal key points of the digital human's limb parts in the corresponding information frames; an efficient frame interpolation subunit, configured to perform the frame interpolation operation on the two reference frames in the three-dimensional coordinate system when the similarity reaches a preset threshold; and an accurate frame interpolation subunit, configured to, when the similarity does not reach the preset threshold, convert the position information of the skeletal key points in the two reference frames from three-dimensional coordinate information into Euler angles, perform the frame interpolation operation on the two reference frames based on the Euler angles, and convert the result back into three-dimensional coordinate information after the frame interpolation operation.

Another embodiment of the present application also provides a digital human motion control device, whose internal structure is schematically illustrated in fig. 10. The digital human motion control device includes a processor, a computer-readable storage medium, a memory, and a network interface connected by a system bus. The non-volatile computer-readable storage medium of the digital human motion control device stores an operating system, a database, and computer-readable instructions; the database can store information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement a digital human action control method.

The processor of the digital human motion control device provides computing and control capability and supports the operation of the whole device. The memory of the digital human motion control device may store computer-readable instructions which, when executed by the processor, cause the processor to perform the digital human action control method of the present application. The network interface of the digital human motion control device is used for connecting to and communicating with a terminal.

It will be understood by those skilled in the art that the structure shown in fig. 10 is a block diagram of only a portion of the structure associated with the present application, and does not constitute a limitation on the digital human motion control device to which the present application applies, and that a particular digital human motion control device may include more or fewer components than shown in the drawings, or may combine certain components, or have a different arrangement of components.

In this embodiment, the processor is configured to execute the specific functions of the modules in fig. 9, and the memory stores the program code and data required to execute those modules or sub-modules. The network interface is used for data transmission to and from user terminals or servers. The non-volatile readable storage medium in this embodiment stores the program code and data necessary for executing all the modules of the digital human motion control apparatus of the present application, and the server can call this program code and data to execute the functions of all the modules.

The present application further provides a non-transitory readable storage medium storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the digital human action control method of any of the embodiments of the present application.

The present application also provides a computer program product comprising computer programs/instructions which, when executed by one or more processors, implement the steps of the method as described in any of the embodiments of the present application.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments of the present application may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).

In summary, the present application performs action abnormality detection on a pre-generated action file used to drive the motion of a digital human and optimizes the action file efficiently and quickly, so that when the resulting action file drives the digital human, the motion image of the digital human is smoother and more natural. Because the approach does not rely entirely on inference capability learned by a machine learning model, the implementation cost of the technology is effectively controlled while an efficient, high-quality processing effect is achieved.

Claims

1. A digital human action control method is characterized by comprising the following steps:

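acquiring an action file corresponding to the motion of a digital person, wherein the action file comprises information frames corresponding to all image frames in a motion image of the digital person, and the information frames store the position information of skeletal key points of the digital person;

performing action abnormality detection according to the position information of the skeletal key points in each information frame, and determining that an action abnormal segment describing an action abnormal phenomenon exists, wherein the action abnormal segment comprises one or more information frames with continuous time sequence;

correcting the action abnormal segment to overcome the action abnormal phenomenon;

and driving the digital human to move by applying the action file so as to generate the motion image.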

2. The digital human motion control method of claim 1, wherein the step of performing motion anomaly detection according to the position information of the skeletal key points in each information frame to determine the existence of a motion anomaly segment describing an abnormal motion phenomenon comprises the steps of:

3. The method of claim 2, wherein determining a torso model space of the digital person based on skeletal key points corresponding to a torso region of the digital person in the target information frame comprises:

4. The digital human motion control method of claim 1, wherein the step of performing motion anomaly detection according to the position information of the skeletal key points in each information frame to determine the existence of a motion anomaly segment describing a motion anomaly phenomenon comprises:

5. The digital human motion control method according to claim 2 or 4, wherein the step of correcting the motion abnormality segment to overcome the motion abnormality comprises any one or more of the following steps:

when the abnormal action fragment comprises a single information frame, adjusting the position information of the skeleton key points falling into the trunk model space based on the trunk model space corresponding to the information frame, so that the corresponding limb area is not overlapped with the trunk model space;

and when the abnormal action segment comprises a plurality of information frames of a second quantity, taking the information frames which are positioned in the action file before and after the abnormal action segment as reference frames, performing frame insertion operation based on the two reference frames to generate an information frame set, and replacing the abnormal action segment in the action file with the information frame set.

6. The digital human action control method of claim 5, wherein matching a set of information frames similar to the actions of each information frame in the action abnormal segment from a material library comprises:

collecting the abnormal action fragments and the information frames of all the materials in the material library, calculating the data distance between the action characteristic vectors of every two information frames corresponding to the time sequence according to the corresponding relation of the information frames on the time sequence, and accumulating the data distance of each information frame in the information frame collection of each material into the similarity score of the material;

7. The digital human motion control method of claim 5, wherein performing frame interpolation based on the two reference frames comprises:

detecting similarity of motion feature vectors of the two reference frames, wherein the motion feature vectors comprise position information of skeleton key points of digital human limb parts in corresponding information frames;

8. A digital human motion control device, comprising:

a file acquisition module, configured to acquire an action file corresponding to the motion of a digital person, wherein the action file comprises information frames corresponding to all image frames in a motion image of the digital person, and the information frames store the position information of skeletal key points of the digital person;

an abnormality detection module, configured to perform action abnormality detection according to the position information of the skeletal key points in each information frame, and determine that an action abnormal segment describing an action abnormal phenomenon exists, wherein the action abnormal segment comprises one or more information frames with continuous time sequence;

a motion correction module, configured to correct the action abnormal segment to overcome the action abnormal phenomenon;

and a file application module, configured to drive the digital human to move by applying the action file so as to generate the motion image.

9. A digital human motion control device comprising a central processor and a memory, wherein the central processor is configured to invoke execution of a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 7.

10. A non-transitory readable storage medium storing a computer program implemented according to the method of any one of claims 1 to 7 in the form of computer readable instructions, the computer program, when invoked by a computer, performing the steps included in the corresponding method.