CN111443854B - Action processing method, device and equipment based on digital person and storage medium

Info

Publication number: CN111443854B (application CN202010220094.9A)
Authority: CN (China)
Prior art keywords: action, limb, person, digital person, digital
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN111443854A
Inventors: 张晓东, 李士岩
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010220094.9A
Publication of CN111443854A
Application granted
Publication of CN111443854B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484: Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00: Commerce
    • G06Q 30/02: Marketing; price estimation or determination; fundraising
    • G06Q 30/0281: Customer communication at a business location, e.g. providing product or service information, consulting

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a digital person-based action processing method, apparatus, device, and storage medium, relating to artificial intelligence. The specific implementation scheme is as follows: interaction information is acquired, and the action to be performed by the digital person is determined according to the interaction information; corresponding action driving information is acquired according to the action to be performed; and the digital person is triggered to perform the corresponding action according to the action driving information. In the scheme provided by the embodiments of the application, the action to be performed by the digital person is determined based on the interaction information, so the resulting action driving information is likewise determined by the interaction information. The digital person can therefore be driven to perform actions that match its actual interaction with the outside world and can present different actions for different interaction content, avoiding the problem in current schemes that the action driving of a digital person is monotonous.

Description

Action processing method, device and equipment based on digital person and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing actions based on digital people in the field of artificial intelligence technologies.
Background
With the development of information technology, digital persons are being applied more and more widely and can provide a variety of convenient services to people.
When a digital person interacts with a human, it typically needs to perform some action. The current scheme for driving a digital person to perform an action is generally to record a video of a real person performing the action in advance and then extract the relevant data from that recording, so that the digital person can be driven to perform the same action as the real person in the video. However, as digital person applications develop, the actions a digital person needs to perform are becoming more complex. When an action depends on the interaction information, it cannot be driven by a recording of a real person made in advance. For example, when a digital person needs to point at an interactive component on a display screen, the action to be performed depends on the specific position of the component on the screen, and that position cannot be known in advance, so the current scheme for driving a digital person cannot realize this kind of action.
Therefore, a scheme is needed that can drive a digital person to perform actions according to the information exchanged while interacting with humans.
Disclosure of Invention
Provided are a method, device, equipment and storage medium for action processing based on a digital person.
According to a first aspect, there is provided a digital person-based action processing method, comprising:
acquiring interactive information, and determining an action to be executed by the digital person according to the interactive information;
acquiring corresponding action driving information according to the action to be executed by the digital person;
and triggering the digital person to execute corresponding action according to the action driving information.
According to a second aspect, there is provided a digital person-based action processing apparatus comprising:
the acquisition module is used for acquiring the interactive information and determining the action to be executed by the digital person according to the interactive information;
the processing module is used for acquiring corresponding action driving information according to the action to be executed by the digital person;
and the triggering module is used for triggering the digital person to execute corresponding action according to the action driving information.
According to a third aspect, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of digital human-based action processing of any one of the first aspect.
According to a fourth aspect, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the digital person-based action processing method of any one of the first aspects.
According to the digital person-based action processing method, apparatus, device, and storage medium, the interaction information is first acquired and the action to be performed by the digital person is determined according to it; the corresponding action driving information is then acquired according to the action to be performed; and finally the digital person is triggered to perform the corresponding action according to the action driving information. In the scheme provided by the embodiments of the application, the action to be performed is determined based on the interaction information, so the resulting action driving information is likewise determined by the interaction information. The digital person can therefore be driven to perform actions that match its actual interaction with the outside world and can present different actions for different interaction content, avoiding the problem in current schemes that the action driving of a digital person is monotonous.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram of a digital human provided by an embodiment of the present application;
fig. 2 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for processing actions based on a digital person according to an embodiment of the present application;
fig. 4 is a first flowchart illustrating a method for acquiring first motion driving information according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram illustrating acquisition of first limb posture information provided in an embodiment of the present application;
fig. 6 is a first schematic diagram illustrating obtaining of first motion driving information according to an embodiment of the present disclosure;
fig. 7 is a flowchart illustrating a second method for acquiring first motion driving information according to an embodiment of the present application;
fig. 8 is a schematic diagram illustrating obtaining of first action driving information according to an embodiment of the present application;
fig. 9 is a flowchart illustrating a method for acquiring second motion driving information according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a digital human-based action processing apparatus according to an embodiment of the present application;
fig. 11 is a block diagram of an electronic device for implementing the digital human-based action processing method according to the embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. For clarity and conciseness, descriptions of well-known functions and constructions are omitted from the following description.
First, the digital person to which the present application relates will be described.
Fig. 1 is a schematic diagram of a digital person according to an embodiment of the present disclosure. As shown in fig. 1, a digital person is a vivid and natural virtual person, close to a real human in appearance, obtained through technologies such as portrait modeling and motion capture; through artificial intelligence technologies such as speech recognition and natural language understanding, the digital person can be given the abilities of cognition, understanding, and expression. With technologies such as knowledge graphs and dialogue engines, digital persons can go deep into different fields and industries and be quickly replicated at scale as needed, so that one digital person can serve tens of millions of people.
Fig. 2 is a schematic view of an application scenario provided in an embodiment of the present application. As shown in fig. 2, the intelligent interaction device for a digital person includes a display screen 21, an input/output device 22, and a digital person 23 displayed on the display screen 21. The input/output device 22 may include a microphone array and a monocular camera: the monocular camera can take pictures to search for potential target objects, and the microphone array can collect external voice information and transmit it to the background for analysis, while the speech produced by the digital person is likewise output to the outside through the input/output device 22. The display screen 21 may be an air screen or another type of display screen. In addition, the display screen 21 and the input/output device 22 may be integrated into one device or be two independent devices.
After the intelligent interaction device of the digital person is started, the digital person 23 may be presented statically or dynamically on the display screen 21, and the digital person is configured to implement the corresponding functions according to the scenario in which it is deployed. For example, when applied to a retail scenario, a digital person may act as a shopping guide: by storing the relevant information of various commodities in advance, it can recommend suitable commodities to different users. When applied to an exhibition hall scenario, a digital person may act as a presenter: by storing the relevant information of various exhibits in advance, it can present the corresponding exhibits to different visitors, and so on.
The scheme of the present application will be explained below with reference to the accompanying drawings.
Fig. 3 is a schematic flowchart of a digital human-based action processing method according to an embodiment of the present application, and as shown in fig. 3, the method may include:
and S31, acquiring the interactive information, and determining the action to be executed by the digital person according to the interactive information.
In the embodiment of the application, after the digital person interaction device is started, a digital person can be displayed on the display screen of the device, and through the cooperation of the components of the device the digital person can interact with a real person and acquire interaction information. When interacting with a real person, the digital person often needs to perform some action, such as speaking or displaying an item. The action to be performed by the digital person can be determined according to the interaction information obtained while it interacts with the real person.
The digital person interaction device may include a voice acquisition device and an image acquisition device to acquire external voice information and image information. For example, a microphone array may be disposed on the device as the voice acquisition device to collect the words spoken by a person outside and transmit them to the background for analysis, after which the digital person is driven to make a corresponding response according to the analysis result. A monocular camera may also be disposed on the device as the image acquisition device to collect external image information, determine the interaction object, and so on.
When interacting with a real person, the action to be performed by the digital person is determined according to the specific situation of the interaction. For example, when a real person first approaches the digital person interaction device, the digital person needs to smile or bow to them. When the real person asks for detailed information about an item, the display screen shows a picture of or an introduction to the item, and the digital person needs to introduce it in combination with the content displayed on the screen; the actions to be performed may include speaking, presenting the item with both hands, and so on. When the real person needs to input some information, for example to sign on the display screen, the digital person needs to point with its hand at the place where the signature should be made, and so on. A minimal rule-based sketch of this step is given below.
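As one way to picture step S31, the following sketch maps interaction events to actions with a simple rule table. The event names, the Action enumeration, and the determine_action helper are illustrative assumptions and are not taken from the patent; in practice this mapping could equally come from intent recognition in the background analysis.

```python
# Minimal sketch of step S31: mapping interaction information to an action the
# digital person should perform. Event names and the rule table are assumptions.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class Action(Enum):
    SMILE_AND_BOW = auto()       # greet a user who has just approached
    PRESENT_ITEM = auto()        # introduce an item shown on the screen
    POINT_AT_COMPONENT = auto()  # point at an interactive component (e.g. a signature box)


@dataclass
class InteractionInfo:
    event: str                                                 # e.g. "user_approached"
    component_position: Optional[Tuple[float, float]] = None   # screen position, if any


def determine_action(info: InteractionInfo) -> Action:
    """Decide which action the digital person should perform (step S31)."""
    if info.event == "user_approached":
        return Action.SMILE_AND_BOW
    if info.event == "item_query":
        return Action.PRESENT_ITEM
    if info.event == "signature_needed":
        return Action.POINT_AT_COMPONENT
    raise ValueError(f"unhandled interaction event: {info.event}")


print(determine_action(InteractionInfo("signature_needed", (0.2, 0.6))))
```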
And S32, acquiring corresponding action driving information according to the action to be executed by the digital person.
After determining the action to be performed by the digital person, corresponding action driving information needs to be acquired. As shown in S31, the action to be performed by the digital person is determined according to the interaction information, and the action to be performed by the digital person is different for different interaction information, and thus the corresponding action driving information is also different.
The motion driving information is information for driving the digital person to perform a corresponding motion, and the digital person can perform the corresponding motion according to the motion driving information. The corresponding motion driver information is different for different limbs. For example, when the limb needing to move in the action to be performed by the digital person is the head, the corresponding action driving information is used for driving the head to move; when the limb needing to move in the action to be executed by the digital person is the neck, the corresponding action driving information is used for driving the neck; when the limbs needing to move in the actions to be executed by the digital person are hands, the corresponding action driving information is used for driving the hands; when a limb that needs to be moved in an action to be performed by the digital person is a face, the corresponding action driving information is for driving the face, and so on.
The action driving information may be obtained by recording a video of a real person. For example, if the digital person is to be driven to bow, a video of a real person bowing may be recorded and the corresponding data, such as skeletal data, extracted from it; the digital person is then driven to bow by following the changes in that skeletal data.
In this application, for bowing and other relatively fixed actions, the recording of a real person performing the action may be analyzed directly and used to drive the digital person to perform the same action.
And S33, triggering the digital person to execute corresponding action according to the action driving information.
After the action driving information corresponding to the action to be executed is obtained, the digital person can be triggered to execute the action to be executed according to the action driving information, and action interaction with a real person is realized.
In the digital person-based action processing method provided by this embodiment of the application, the interaction information is first acquired and the action to be performed by the digital person is determined according to it; the corresponding action driving information is then acquired according to the action to be performed; and finally the digital person is triggered to perform the corresponding action according to the action driving information. Because the action to be performed is determined based on the interaction information, the resulting action driving information is likewise determined by the interaction information, so the digital person can be driven to perform actions that match its actual interaction with the outside world and can present different actions for different interaction content, avoiding the problem in current schemes that the action driving of a digital person is monotonous.
The embodiments of the present application will be described in detail with reference to specific examples.
When the digital person is in different situations, the actions it needs to perform may differ, and those actions may include relatively fixed actions as well as relatively flexible ones. Relatively fixed actions, such as saluting, bowing, or making a fist, are unrelated to the interactive components displayed on the screen and are performed in much the same way regardless of where those components are. Relatively flexible actions are usually related to the position of an interactive component on the screen, and since that position is generally not known in advance, the action cannot be fixed beforehand. For example, when an item is displayed on the screen, the digital person's hand needs to move so as to point at it, and where the hand moves must be determined according to the item's position on the screen.
The corresponding action driving information differs for these two types of action. When the action to be performed by the digital person is related to the position of the interactive component, first action driving information needs to be acquired according to that position. When the action to be performed is unrelated to the position of the interactive component, second action driving information needs to be acquired instead, where the first and second action driving information are different. Acquiring different action driving information for different action types and processing them separately makes the digital person's actions more diverse and flexible, as the dispatch sketch below illustrates.
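A minimal sketch of this dispatch, assuming the action has already been classified as position-dependent or not; the action names, sets, and helper functions are illustrative and not part of the patent disclosure.

```python
# Sketch of step S32: choosing between the two kinds of action driving information.
from typing import Optional, Tuple

POSITION_DEPENDENT_ACTIONS = {"point_at_component", "look_at_component"}
POSITION_INDEPENDENT_ACTIONS = {"bow", "salute", "nod"}


def get_action_driving_info(action: str, component_position: Optional[Tuple[float, float]]):
    if action in POSITION_DEPENDENT_ACTIONS:
        # first action driving information: computed from the component position
        return build_first_driving_info(action, component_position)
    # second action driving information: taken from pre-recorded motion data
    return select_second_driving_info(action)


def build_first_driving_info(action, component_position):
    return {"type": "first", "action": action, "target": component_position}


def select_second_driving_info(action):
    return {"type": "second", "action": action, "clip": f"{action}_recording"}


print(get_action_driving_info("point_at_component", (0.2, 0.6)))
print(get_action_driving_info("bow", None))
```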
First, acquisition of the first operation drive information will be described.
Fig. 4 is a first flowchart illustrating a method for acquiring first action driving information according to an embodiment of the present application, as shown in fig. 4, including:
s41, determining the first limb of the digital person, which needs to move to execute the action to be executed.
After the action to be executed by the digital person is determined according to the interactive information, the first limb needing to be moved can be determined according to the action to be executed. For example, if the action to be performed is to point to the interactive component on the display screen, the first limb to be moved at this time is the hand of the digital person; if the action to be performed is looking at the interactive component on the display screen, the first limb that needs to be moved at this time is the digital person's head, in particular the digital person's eyes, etc.
S42, acquiring corresponding first limb posture information according to the first limb, where the first limb posture information is posture information acquired when the person in the motion video moves the corresponding first limb to different positions multiple times.
After the action to be executed and the first limb needing to move are determined, the corresponding first limb posture information can be obtained according to the first limb. In the embodiment of the application, the first limb posture information is posture information obtained when the first limb corresponding to the movement of the person in the pre-recorded motion video is moved to different positions.
For example, when the first limb to be moved is the head, the corresponding first limb posture information is the pre-recorded posture information obtained when a real person turns the head to different positions; when the first limb is a hand, it is the pre-recorded posture information obtained when a real person moves the arm to different positions; when the first limb is the eyes of the face, it is the pre-recorded posture information obtained when a real person turns the eyeballs to different positions, including the posture with the eyeballs turned fully to the left and the posture with the eyeballs turned fully to the right, and so on.
The pre-recorded postures of the real person moving the first limb are acquired, and data is extracted from them to obtain the corresponding first limb posture information.
S43, obtaining the first action driving information according to the position of the interactive component, the initial position of the first limb of the digital person and the first limb posture information.
After the first limb posture information is obtained, the range or target of the first limb's movement can be determined from the position of the interactive component and the initial position of the digital person's first limb. For example, when the interactive component is on the digital person's left: if the first limb is the eyes, the eyeballs need to turn until they face the interactive component; if the first limb is the left hand, the left hand needs to move so as to point at the interactive component, and so on.
After determining the range of motion or the target of the first limb for the action to be performed, the digital person can be driven to perform the action according to the corresponding first limb posture information. Performing an action is a process: for example, when the action is turning the eyeballs toward the interactive component, the eyeballs initially face forward and then turn gradually. In this case, the first limb posture information includes the posture with the eyeballs turned fully to the left and the posture with the eyeballs turned fully to the right. Every other eyeball orientation lies between these two extremes, so postures for other directions can be obtained by fusing the two extreme postures in different proportions. For example, fusing the leftmost and rightmost postures at 50% each yields a posture in which the eyeballs face straight ahead, neither left nor right; when the proportion of the leftmost posture exceeds that of the rightmost posture, the resulting posture drives the digital person's eyeballs to look to the left. Therefore, when the action to be performed is turning the eyeballs toward the interactive component, the initial ratio of the two postures can be set to 1:1 and the proportion of the leftmost posture gradually increased, which yields the first action driving information. Driving the digital person with this information makes its eyeballs turn gradually from facing forward toward the left; a sketch of this fusion follows.
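A minimal sketch of the fusion described above, assuming postures are represented as joint-angle dictionaries with placeholder values; the names, the frame count, and the final 0.8 ratio are assumptions for illustration.

```python
# Fuse the leftmost and rightmost eyeball postures with a ratio that starts at
# 1:1 (eyes forward) and gradually shifts toward the leftmost posture.
def blend(pose_left: dict, pose_right: dict, left_weight: float) -> dict:
    """Linearly fuse two extreme postures; left_weight in [0, 1]."""
    return {joint: left_weight * pose_left[joint] + (1.0 - left_weight) * pose_right[joint]
            for joint in pose_left}


# Extreme postures recorded from a real person (values are placeholders).
EYE_LEFTMOST = {"eye_yaw": +30.0}
EYE_RIGHTMOST = {"eye_yaw": -30.0}


def eye_turn_left_driving_info(num_frames: int = 5, final_left_weight: float = 0.8):
    """Frames that turn the eyeballs gradually from forward (50/50) toward the left."""
    frames = []
    for i in range(num_frames + 1):
        w = 0.5 + (final_left_weight - 0.5) * i / num_frames
        frames.append(blend(EYE_LEFTMOST, EYE_RIGHTMOST, w))
    return frames


for frame in eye_turn_left_driving_info():
    print(frame)
```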
The above embodiment describes how to acquire the first action driving information using eyeball rotation as the example; in practice, the first action driving information corresponding to the movement of any other limb is acquired in a similar way. For example, when the action to be performed is smiling, postures of a real person's mouth moved to different positions may be recorded in advance, and the fusion proportions of those postures, and how the proportions change over time, can then be determined according to the amplitude of the digital person's smile, yielding the corresponding first action driving information. In this way, the digital person can be driven to perform the corresponding action according to the actual interactive component, which is more flexible.
A manner of acquiring the first motion driving information when the first limb is the arm of the digital person will be described below with reference to fig. 5 and 6.
When the first limb is an arm, the first limb posture information is the position of the arm relative to the person in the motion video when the person in the motion video moves the arm to the left, right, upper and lower sides.
Fig. 5 is a schematic diagram illustrating the acquisition of the first limb posture information according to the embodiment of the present application, as shown in fig. 5, which includes a real person, and two bone points, namely point a and point B, are located on the left arm of the real person. When the first limb is an arm, the real person first moves his or her arm to a different position. In the example of fig. 5, the real person moves his own arm to four positions, leftmost, rightmost, uppermost and lowermost. When the real person moves his or her arm to the leftmost position, point a on the left arm moves to the position of point a1 in fig. 5, and point B moves to the position of point B1 in fig. 5; when the real person moves his or her arm to the rightmost side, point a on the left arm moves to the position of point a2 in fig. 5, and point B moves to the position of point B2 in fig. 5; when the real person moves his or her arm to the lowermost position, point a on the left arm moves to the position of point a3 in fig. 5, and point B moves to the position of point B3 in fig. 5; when the human being moves his or her arm to the uppermost position, point a on the left arm moves to the position of point a4 in fig. 5 and point B moves to the position of point B4 in fig. 5.
Points a1, a2, a3, and a4 mark the range of motion of point a on the real person's left arm: point a never goes further left than a1, further right than a2, lower than a3, or higher than a4. Point B is similar, with points B1, B2, B3, and B4 marking its range of motion. By fusing the posture information of the leftmost, rightmost, uppermost, and lowermost positions in different proportions, the posture of the arm with point a at any position within this range can be obtained, as the sketch below illustrates.
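The following sketch shows one plausible way to do this fusion, assuming each recorded extreme is stored as a dictionary of skeleton-point coordinates and using bilinear weights over normalized coordinates; the values and the weighting scheme are assumptions, not taken from the patent.

```python
# Fuse the four recorded extreme arm postures (leftmost, rightmost, uppermost,
# lowermost) to obtain a posture for an arbitrary position inside the range of fig. 5.
from typing import Dict

Pose = Dict[str, float]  # skeleton-point name -> coordinate, placeholder representation

# Extreme postures of points a and B, as captured in fig. 5 (values illustrative).
LEFTMOST:  Pose = {"a_x": -1.0, "a_y": 0.0, "B_x": -0.5, "B_y": 0.0}
RIGHTMOST: Pose = {"a_x": +1.0, "a_y": 0.0, "B_x": +0.5, "B_y": 0.0}
UPPERMOST: Pose = {"a_x": 0.0, "a_y": +1.0, "B_x": 0.0, "B_y": +0.5}
LOWERMOST: Pose = {"a_x": 0.0, "a_y": -1.0, "B_x": 0.0, "B_y": -0.5}


def fuse_arm_pose(x: float, y: float) -> Pose:
    """Fuse the four extreme postures for a target at normalized (x, y), where
    x in [-1, 1] runs leftmost -> rightmost and y in [-1, 1] lowermost -> uppermost."""
    w_left, w_right = (1.0 - x) / 2.0, (1.0 + x) / 2.0
    w_down, w_up = (1.0 - y) / 2.0, (1.0 + y) / 2.0
    # The horizontal pair and the vertical pair each contribute half of the fused pose.
    return {joint: 0.5 * (w_left * LEFTMOST[joint] + w_right * RIGHTMOST[joint])
                   + 0.5 * (w_down * LOWERMOST[joint] + w_up * UPPERMOST[joint])
            for joint in LEFTMOST}


print(fuse_arm_pose(-0.6, 0.3))  # a position to the upper left of the neutral pose
```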
After the first limb posture information is acquired, further first motion-driving information may be determined.
Specifically, the first position of the digital person's arm after it performs the action to be performed can be determined according to the position of the interactive component. Then, from the arm's initial position and that first position, the change in the arm's position relative to the digital person as it moves from the initial position to the first position is determined. Finally, the first action driving information is determined from this change information and from the positions of the arm relative to the person in the motion video when that person moved the arm to the left, right, upper, and lower sides.
Fig. 6 is a first schematic diagram for acquiring first action driving information provided in the embodiment of the present application, and as shown in fig. 6, when the action to be performed by the digital person is an action related to the position of the interactive component, a digital person 61 and an interactive component 62 are displayed on the display screen 60, where the interactive component 62 is located on the left side of the digital person 61. The interactive component 62 may be an item displayed by the digital person 61 or a place for providing user input information. For example, when the digital human interactive device is applied to a bank, and when a customer transacts business with the digital human interactive device, the customer needs to sign the business in the process of transacting business, the interactive component 62 may be displayed on the display screen 60, and the digital human 61 needs to prompt the customer to sign at the interactive component 62, for example, in fig. 6, the digital human 61 needs to point at the position of the interactive component 62 with the left hand, so that the customer knows that the position needing to sign is at the interactive component 62.
Fig. 5 illustrates the positions of the arm relative to the person in the motion video when that person moves the arm to the left, right, top, and bottom, from which the first limb posture information is obtained. Then, according to the position of the interactive component 62, the first position of the digital person's arm after it performs the action is determined. Fig. 6 uses points a and B on the left arm of the digital person 61 as examples; these are the points of the left arm in its initial position. Since the action to be performed is for the left hand of the digital person 61 to point at the interactive component 62, it can be determined that after performing the action, point a of the left hand lies just below the interactive component 62, at a' in fig. 6, while point B on the left arm must move to point B' in fig. 6.
After the initial position and the final first position of the left arm of the digital person 61 are determined, the posture information for moving the arm from point a to point a' and from point B to point B' can be obtained by fusing the posture information of the four extreme arm positions (leftmost, rightmost, uppermost, and lowermost) from the example of fig. 5. Fig. 6 illustrates only two arbitrary points on the left arm of the digital person 61; once the posture information for each point on the arm has been obtained, the first action driving information is obtained and the left arm of the digital person 61 is driven to move from the initial position to the first position. In this way, by fusing the posture information of the four extreme arm positions in different proportions and combining it with the action to be performed, the movement of the digital person's arm to any position and in any direction can be controlled, making the actions performed by the digital person flexible. A short sketch of turning a target position into fusion weights follows.
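As a small companion to the fusion sketch above, the following assumes the component position and the arm's reachable range are known in normalized screen coordinates and maps the target into the [-1, 1] coordinates that such a fusion expects; the geometry values are illustrative assumptions.

```python
# Map the interactive component's screen position to normalized fusion coordinates.
def normalized_target(component_pos, reach_center, reach_half_width, reach_half_height):
    """Return (x, y) in [-1, 1]^2 giving the target relative to the arm's reachable range."""
    x = (component_pos[0] - reach_center[0]) / reach_half_width
    y = (component_pos[1] - reach_center[1]) / reach_half_height
    clamp = lambda v: max(-1.0, min(1.0, v))
    return clamp(x), clamp(y)


# The component 62 sits to the left of and slightly above the digital person's hand.
print(normalized_target(component_pos=(0.25, 0.55),
                        reach_center=(0.60, 0.40),
                        reach_half_width=0.30,
                        reach_half_height=0.25))
```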
Fig. 4-6 illustrate one implementation of determining first action actuation information when an action to be performed is related to a position of an interactive component, and another implementation will be described below in conjunction with fig. 7 and 8.
Fig. 7 is a flowchart illustrating a second method for acquiring first action driving information according to an embodiment of the present application, as shown in fig. 7, including:
s71, determining the initial position of the first limb and the first limb which needs to move for the digital human to perform the action to be performed.
After the action to be executed by the digital person is determined according to the interactive information, the first limb needing to be moved can be determined according to the action to be executed. For example, if the action to be performed is to point to the interactive component on the display screen, the first limb to be moved at this time is the hand of the digital person; if the action to be performed is looking at the interactive component on the display screen, the first limb that needs to be moved at this time is the digital person's eye, and so on.
In addition to determining the first limb, an initial position of the first limb is acquired, the initial position of the first limb being a position the digital person was in before performing the action to be performed.
S72, according to the position of the interactive assembly, determining a second position of the first limb after the digital person executes the action to be executed.
The second position is the position at which the first limb is located once the action to be performed has been completed.
S73, determining the first motion driving information according to the inverse dynamic skeleton model, the initial position of the first limb and the second position.
In the embodiment of the application, after the initial position and the second position of the first limb are determined, the movements of the other skeleton points of the first limb are derived in reverse using the inverse dynamic skeleton model, which yields the first action driving information; the digital person is then driven to perform the action to be performed according to that information.
Fig. 8 is a schematic diagram illustrating acquisition of first action driving information provided in the embodiment of the present application, as shown in fig. 8, when the action to be performed by the digital person is an action related to a position of an interaction component, a digital person 81 and an interaction component 82 are displayed on a display screen 80, where the interaction component 82 is located on the left side of the digital person 81. The interactive component 82 may be an item displayed by the digital person 81 or a place for providing user input information. For example, when the digital human interactive device is applied to a bank, and when a customer transacts business with the digital human interactive device, the customer needs to sign the business in the process of transacting business, the interactive component 82 can be displayed on the display screen 80, and the digital human 81 needs to prompt the customer to sign at the interactive component 82, for example, in fig. 8, the digital human 81 needs to point at the position of the interactive component 82 with the left hand, so that the customer knows that the position needing to sign is at the interactive component 82.
Fig. 8 illustrates the motion of two points on the left arm of the digital person 81, point a and point B. After the position of the interactive component 82 is determined, and since the action to be performed is for the left finger to point at the interactive component 82, the position of the finger after the action is performed is determined from that component position to be point a' in fig. 8. That is, point a on the digital person's left arm needs to move to point a', passing through the positions a1 and a2 shown in fig. 8.
After the motion of point a is determined, the motion of point B on the left arm is determined according to the inverse dynamic skeleton model; that is, the movement of every point on the left arm is driven by the change in the position of the left hand. Fig. 8 takes only the motion of point B as an example.
In fig. 8, the motion of point B is determined from the motion of point a: when point a moves to a1, point B moves to B1; when point a moves to a2, point B moves to B2; and when point a moves to a', point B moves to B'. The dashed line in fig. 8 shows the resulting motion of the left arm of the digital person 81. After the second position a' is obtained from the position of the interactive component 82, the corresponding first action driving information can be obtained according to the inverse dynamic skeleton model, and the left arm of the digital person 81 is driven to move from points a and B to points a' and B'. In this way, the motion of each corresponding limb can be determined based on the inverse dynamic skeleton model and the second position of the digital person's first limb, so that the digital person's actions are displayed flexibly and vividly. An illustrative sketch follows.
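The patent does not spell out the internals of the inverse dynamic skeleton model, so the sketch below stands in for it with a standard two-bone analytic inverse-kinematics solve in 2D (shoulder, elbow, wrist): given where the hand must end up (point a'), it recovers where the elbow (a point such as B) has to go at each frame. All lengths and positions are placeholder assumptions.

```python
# Illustrative stand-in for the "inverse dynamic skeleton model": a 2D two-bone
# analytic IK solve that derives the elbow position from the hand target.
import math


def two_bone_ik(shoulder, target, upper_len, fore_len):
    """Return (elbow, wrist) positions for a 2D arm reaching toward `target`."""
    dx, dy = target[0] - shoulder[0], target[1] - shoulder[1]
    dist = min(math.hypot(dx, dy), upper_len + fore_len - 1e-6)  # clamp to reachable range
    dist = max(dist, 1e-6)
    base = math.atan2(dy, dx)
    # Law of cosines gives the shoulder and elbow bend for this reach distance.
    cos_shoulder = (upper_len**2 + dist**2 - fore_len**2) / (2 * upper_len * dist)
    cos_elbow = (upper_len**2 + fore_len**2 - dist**2) / (2 * upper_len * fore_len)
    shoulder_ang = base + math.acos(max(-1.0, min(1.0, cos_shoulder)))
    elbow_bend = math.pi - math.acos(max(-1.0, min(1.0, cos_elbow)))
    elbow = (shoulder[0] + upper_len * math.cos(shoulder_ang),
             shoulder[1] + upper_len * math.sin(shoulder_ang))
    wrist_ang = shoulder_ang - elbow_bend
    wrist = (elbow[0] + fore_len * math.cos(wrist_ang),
             elbow[1] + fore_len * math.sin(wrist_ang))
    return elbow, wrist


# Interpolate the hand from its initial position toward a' and solve each frame.
start, target = (0.9, 0.4), (0.3, 0.6)
for i in range(4):
    t = i / 3
    goal = (start[0] + t * (target[0] - start[0]), start[1] + t * (target[1] - start[1]))
    elbow, wrist = two_bone_ik(shoulder=(1.0, 0.8), target=goal, upper_len=0.35, fore_len=0.35)
    print(f"frame {i}: elbow={elbow}, wrist={wrist}")
```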
Fig. 4-8 illustrate a method of acquiring first motion-driven information when a motion to be performed by a digital person is related to a position of an interactive component. In some cases, the actions to be performed by the digital person are independent of the position of the interactive assembly, such as saluting, bowing, nodding, and the like. These actions are relatively fixed, and the following is an acquisition method for acquiring corresponding second action driving information when the digital person executes an action unrelated to the position of the interactive component.
Fig. 9 is a flowchart illustrating a method for acquiring second action driving information according to an embodiment of the present application, as shown in fig. 9, including:
s91, determining a plurality of action driving information according to the actions to be executed by the digital person, wherein each action driving information is extracted according to the actions to be executed by the person in the action video;
s92, determining the second motion driving information among the plurality of motion driving information.
In the embodiment of the application, when the action to be performed by the digital person is unrelated to the position of the interactive component, a method can be used in which videos of real people performing various actions are recorded in advance and the relevant data is then extracted from them to drive the digital person to perform the action.
For example, if the digital person is to perform a bow, a video of a real person bowing may be recorded in advance, and data relating to that recording, such as the changes in the real person's skeletal motion during the bow, may then be extracted and used to drive the digital person to bow.
If the digital person is driven to bow using data extracted from a single recording of a real person bowing, it will perform exactly the same bow every time, which makes its behavior look mechanical and unnatural. To solve this problem, the embodiment of the application adopts the following scheme: for each action, videos of different real people performing that action are collected, and the relevant data is extracted from each of them, yielding multiple pieces of action driving information. When the digital person is to perform the action, one of those pieces is selected at random as the second action driving information and used to drive the digital person. Because different real people perform the same action with slight differences, the second action driving information chosen can vary from one execution to the next, so the digital person never performs the action in exactly the same way twice, which effectively reduces the mechanical feel of its actions.
For example, for the bow action, videos of several different real people bowing are recorded in advance and the corresponding data extracted, yielding multiple pieces of action driving information. When the digital person is to bow, one of them is chosen as the second action driving information and used to drive the bow; the digital person's motion then resembles the motion of the real person from whose recording that information was extracted. By randomly choosing the second action driving information from the multiple candidates each time, the digital person shows variation and nuance even when repeating the same action, which enhances flexibility. A sketch of this random selection follows.
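A minimal sketch of that selection, assuming the extracted drive data is stored as named clips in a small registry; the clip names and registry layout are assumptions for illustration.

```python
# Several drive clips extracted from different real people performing the same
# fixed action are stored, and one is picked at random each time, so repeated
# bows do not look identical.
import random

# action name -> list of driving-information clips extracted from different recordings
ACTION_CLIP_LIBRARY = {
    "bow": ["bow_recording_person_1", "bow_recording_person_2", "bow_recording_person_3"],
    "salute": ["salute_recording_person_1", "salute_recording_person_2"],
}


def select_second_driving_info(action: str) -> str:
    """Randomly choose one of the pre-extracted drive clips for a fixed action."""
    return random.choice(ACTION_CLIP_LIBRARY[action])


# Successive bows may be driven by different recordings.
for _ in range(3):
    print(select_second_driving_info("bow"))
```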
In the digital person-based action processing method provided by this embodiment of the application, the interaction information is first acquired and the action to be performed by the digital person is determined according to it; the corresponding action driving information is then acquired according to the action to be performed, with a separate way of obtaining the driving information for actions that are related to the position of the interactive component and for actions that are not; and finally the digital person is triggered to perform the corresponding action according to the action driving information. Because the action to be performed is determined based on the interaction information, the resulting action driving information is likewise determined by the interaction information, so the digital person can be driven to perform actions that match its actual interaction with the outside world and can present different actions for different interaction content, avoiding the problem in current schemes that the action driving of a digital person is monotonous.
Fig. 10 is a schematic structural diagram of a digital human-based action processing apparatus according to an embodiment of the present application, and as shown in fig. 10, the apparatus includes an obtaining module 101, a processing module 102, and a triggering module 103, where:
the acquisition module 101 is configured to acquire interaction information and determine an action to be performed by a digital person according to the interaction information;
the processing module 102 is configured to obtain corresponding action driving information according to the action to be executed by the digital person;
the triggering module 103 is configured to trigger the digital person to execute a corresponding action according to the action driving information.
In one possible implementation manner, the interaction information includes an interaction component; the processing module 102 is specifically configured to:
if the action to be executed by the digital person is judged and known to be the action related to the position of the interactive assembly, first action driving information is obtained according to the action to be executed by the digital person and the position of the interactive assembly;
otherwise, second action driving information is obtained according to the action to be executed by the digital person.
In one possible implementation, the action to be performed by the digital person is an action related to the position of the interactive component; the processing module 102 is specifically configured to:
determining a first limb of the digital person that requires movement to perform the action to be performed;
acquiring corresponding first limb posture information according to the first limb, where the first limb posture information is posture information acquired when the person in the motion video moves the corresponding first limb to different positions multiple times;
and obtaining the first action driving information according to the position of the interactive component, the initial position of the first limb of the digital person and the first limb posture information.
In one possible implementation, the action to be performed by the digital person is an action related to the position of the interactive component; the processing module 102 is specifically configured to:
determining a first limb that the digital person needs to move to perform the action to be performed and an initial position of the first limb;
determining a second position of the first limb after the digital person executes the action to be executed according to the position of the interactive component;
determining the first motion-driving information based on a reverse dynamic skeletal model, an initial position of the first limb, and the second position.
In one possible implementation, the action to be performed by the digital person is an action that is unrelated to the location of the interactive component; the processing module 102 is specifically configured to:
determining a plurality of action driving information according to the actions to be executed by the digital person, wherein each action driving information is extracted according to the actions to be executed by the person in the action video;
determining the second motion driving information among the plurality of motion driving information.
In one possible implementation, the first limb is one or more of the following limbs:
head, arm, and neck.
In one possible implementation manner, the first limb is an arm limb, and the first limb posture information is the position of the arm limb relative to the person in the motion video when the person in the motion video moves the arm limb to the left, right, upper side and lower side; the processing module 102 is specifically configured to:
determining a first position of an arm limb of the digital person after the digital person executes the action to be executed according to the position of the interactive component;
determining change information of the position of the arm limb of the digital person relative to the position of the digital person when the arm limb of the digital person moves from the initial position of the arm limb of the digital person to the first position according to the initial position and the first position of the arm limb of the digital person;
and determining the first action driving information according to the change information and the positions of the arm limb relative to the person in the motion video when that person moves the arm limb to the left, right, upper, and lower sides.
The apparatus provided in the embodiment of the present application may be configured to implement the technical solution of the method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
According to embodiments of the present application, there is also provided a digital human interaction device and a readable storage medium.
Fig. 11 is a block diagram of an electronic device for the digital person-based action processing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 11, the electronic device includes: one or more processors 1101, a memory 1102, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor can process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information for a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as needed. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 11 takes one processor 1101 as an example.
The memory 1102 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the digital human-based action processing method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the digital human-based action processing method provided by the present application.
The memory 1102, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the acquisition module 101, the processing module 102, and the triggering module 103 shown in fig. 10) corresponding to the digital human-based action processing method in the embodiment of the present application. The processor 1101 executes various functional applications of the server and data processing, i.e., implements the digital human-based action processing method in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 1102.
The memory 1102 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the electronic device based on motion processing of the digital person, and the like. Further, the memory 1102 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1102 may optionally include memory located remotely from the processor 1101, which may be connected over a network to an electronic device based on digital human motion processing. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the digital person-based action processing method may further include an input device 1103 and an output device 1104. The processor 1101, the memory 1102, the input device 1103, and the output device 1104 may be connected by a bus or in other manners; in fig. 11, connection by a bus is taken as an example.
The input device 1103 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for digital person-based action processing; examples of such input devices include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 1104 may include a display device, an auxiliary lighting device (e.g., an LED), a tactile feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
In the scheme provided by the embodiments of the present application, the action to be executed by the digital person is determined based on the interaction information, so the obtained action driving information is also determined by the interaction information. The digital person can therefore be driven to execute a corresponding action according to its actual interaction with the outside world, and different actions can be displayed for different interaction contents, which avoids the problem in current schemes that the action driving of the digital person is monotonous.
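As a concrete illustration of this flow, the following Python sketch dispatches between actions that are related to the position of the interactive component and actions that are not. Every identifier, data shape, and the selection policy shown here is an assumption made for illustration only, not the disclosed implementation.

# Minimal sketch of the dispatch described above; all names and data shapes are assumptions.
from dataclasses import dataclass
from typing import Optional, Tuple

POSITION_RELATED_ACTIONS = {"point_at_component", "touch_component"}          # assumed catalogue
RECORDED_DRIVE_INFO = {"wave": ["wave_keyframes"], "nod": ["nod_keyframes"]}  # assumed library


@dataclass
class InteractionInfo:
    action: str                                   # action the digital person should execute
    component_pos: Optional[Tuple[float, float]]  # position of the interactive component, if any


def get_drive_info(info: InteractionInfo, limb_start: Tuple[float, float]) -> dict:
    """Return first action driving information for position-related actions,
    second action driving information otherwise."""
    if info.action in POSITION_RELATED_ACTIONS and info.component_pos is not None:
        limb = "right_arm"              # the first limb that needs to move (assumed choice)
        target = info.component_pos     # first position derived from the component position
        return {"limb": limb, "from": limb_start, "to": target}
    # The action is unrelated to the component position: reuse pre-extracted driving data.
    return {"keyframes": RECORDED_DRIVE_INFO.get(info.action, [])}


if __name__ == "__main__":
    drive = get_drive_info(InteractionInfo("point_at_component", (0.8, 0.3)), (0.5, 0.9))
    print(drive)  # the digital person would then be triggered with this driving information

In this sketch the two branches correspond to the first and second action driving information of the claims; how the driving information is encoded downstream is left open.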
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, which is not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A digital person-based action processing method is characterized by comprising the following steps:
acquiring interactive information, and determining an action to be executed by the digital person according to the interactive information; the interactive information comprises an interactive component;
if it is determined that the action to be executed by the digital person is an action related to the position of the interactive component, determining, for the digital person, a first limb of the digital person that needs to move when the action to be executed is executed; determining, according to the position of the interactive component, a first position of the first limb after the digital person executes the action to be executed; and acquiring first action driving information according to the first limb, an initial position of the first limb, and the first position;
and triggering the digital person to execute the corresponding action according to the action driving information.
2. The method according to claim 1, wherein if it is determined that the action to be executed by the digital person is not an action related to the position of the interactive component, second action driving information is acquired according to the action to be executed by the digital person.
3. The method of claim 2, wherein the action to be executed by the digital person is an action related to the position of the interactive component, and the acquiring first action driving information according to the first limb, the initial position of the first limb, and the first position comprises:
acquiring corresponding first limb posture information according to the first limb, wherein the first limb posture information is collected while a person in an action video moves the corresponding first limb to different positions multiple times;
and obtaining the first action driving information according to the initial position of the first limb, the first position and the first limb posture information.
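To make claim 3 concrete, the sketch below assumes the captured posture information is stored as a table of joint angles keyed by the position the person's limb reached in the action video, and that the driving information simply reuses the capture closest to the first position. The table values and the nearest-neighbour rule are assumptions, not disclosed data.

import math

# Assumed table: (position the arm reached in the action video) -> captured joint angles
CAPTURED_ARM_POSTURES = {
    (0.0, 0.5): {"shoulder": 10.0, "elbow": 20.0},   # limb moved to the left
    (1.0, 0.5): {"shoulder": 80.0, "elbow": 15.0},   # limb moved to the right
    (0.5, 1.0): {"shoulder": 45.0, "elbow": 5.0},    # limb moved upward
    (0.5, 0.0): {"shoulder": 45.0, "elbow": 60.0},   # limb moved downward
}


def first_drive_info_from_postures(initial_pos, first_pos):
    """Build first action driving information from the capture nearest to the first position."""
    nearest = min(CAPTURED_ARM_POSTURES, key=lambda p: math.dist(p, first_pos))
    return {"from": initial_pos, "to": first_pos, "posture": CAPTURED_ARM_POSTURES[nearest]}


print(first_drive_info_from_postures((0.5, 0.5), (0.9, 0.4)))   # closest capture: moved to the right

A production system could just as well interpolate between several captures instead of picking the nearest one; the claim leaves that choice open.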
4. The method of claim 2, wherein the action to be executed by the digital person is an action related to the position of the interactive component, and the acquiring first action driving information according to the first limb, the initial position of the first limb, and the first position comprises:
determining the first action driving information according to the first limb, an inverse dynamics skeletal model, the initial position of the first limb, and the first position.
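One common way to realize such a skeletal model is analytic inverse kinematics. The sketch below reads the inverse dynamics skeletal model of claim 4 as a two-joint inverse-kinematics solve for an arm; this reading, the link lengths, and the coordinate convention are all assumptions rather than the disclosed model.

import math


def two_link_ik(target_x, target_y, upper_len=0.30, fore_len=0.25):
    """Return (shoulder, elbow) angles in radians that place the wrist at the target point."""
    d = math.hypot(target_x, target_y)
    d = min(d, upper_len + fore_len - 1e-6)                 # clamp unreachable targets
    cos_elbow = (d * d - upper_len ** 2 - fore_len ** 2) / (2 * upper_len * fore_len)
    elbow = math.acos(max(-1.0, min(1.0, cos_elbow)))       # elbow bend angle (0 = fully extended)
    shoulder = math.atan2(target_y, target_x) - math.atan2(
        fore_len * math.sin(elbow), upper_len + fore_len * math.cos(elbow)
    )
    return shoulder, elbow


# Joint angles that drive the first limb toward the first position (the component position).
print(two_link_ik(0.40, 0.20))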
5. The method of claim 2, wherein the action to be executed by the digital person is an action unrelated to the position of the interactive component, and the acquiring second action driving information according to the action to be executed by the digital person comprises:
determining a plurality of pieces of action driving information according to the action to be executed by the digital person, wherein each piece of action driving information is extracted from an action executed by a person in an action video;
and determining the second action driving information from among the plurality of pieces of action driving information.
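As an illustration of claim 5, the sketch below keeps several pre-extracted driving records per action and picks one at random. The storage layout and the random selection policy are assumptions, since the claim does not fix how the second action driving information is chosen.

import random

# Assumed library of driving records extracted from action videos, several per action.
EXTRACTED_DRIVES = {
    "wave": ["wave_variant_a", "wave_variant_b", "wave_variant_c"],
    "nod": ["nod_variant_a", "nod_variant_b"],
}


def pick_second_drive_info(action: str) -> str:
    """Determine the second action driving information among the extracted candidates."""
    return random.choice(EXTRACTED_DRIVES[action])   # selection policy is an assumption


print(pick_second_drive_info("wave"))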
6. The method of claim 3 or 4, wherein the first limb is one or more of the following limbs:
head, arm, and neck.
7. The method of claim 3, wherein the first limb is an arm limb, and the first limb posture information is the position of the arm limb relative to the person in the action video when the person in the action video moves the arm limb to the left, to the right, upward, and downward; and the obtaining the first action driving information according to the initial position of the first limb, the first position, and the first limb posture information comprises:
determining, according to the initial position and the first position of the arm limb of the digital person, change information of the position of the arm limb relative to the digital person as the arm limb moves from the initial position to the first position;
and determining the first action driving information according to the change information and the positions of the arm limb relative to the person in the action video when the person in the action video moves the arm limb to the left, to the right, upward, and downward.
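The following sketch illustrates one way claim 7 could be realized: the displacement of the digital person's arm (the change information) weights the four captured reference postures for left, right, up, and down. The reference angles, the linear blend, and the coordinate convention (y increasing upward) are all assumptions.

# Assumed reference captures: arm posture (here a single shoulder angle, in degrees)
# recorded when the person in the action video moved the arm in each direction.
REFERENCE_ANGLES = {"left": 150.0, "right": 30.0, "up": 90.0, "down": 170.0}


def arm_drive_from_change(initial_pos, first_pos):
    """Blend the directional reference postures according to the arm's displacement."""
    dx = first_pos[0] - initial_pos[0]            # change information, horizontal component
    dy = first_pos[1] - initial_pos[1]            # change information, vertical component
    total = abs(dx) + abs(dy) or 1.0              # avoid division by zero when there is no movement
    horizontal_ref = REFERENCE_ANGLES["right"] if dx >= 0 else REFERENCE_ANGLES["left"]
    vertical_ref = REFERENCE_ANGLES["up"] if dy >= 0 else REFERENCE_ANGLES["down"]
    blended = (abs(dx) * horizontal_ref + abs(dy) * vertical_ref) / total
    return {"shoulder_angle": blended, "change": (dx, dy)}


print(arm_drive_from_change((0.5, 0.2), (0.8, 0.6)))   # arm moves to the right and upward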
8. An action processing apparatus based on a digital person, comprising:
the acquisition module is used for acquiring interactive information and determining, according to the interactive information, the action to be executed by the digital person, wherein the interactive information comprises an interactive component;
the processing module is used for acquiring corresponding action driving information according to the action to be executed by the digital person, and is specifically configured to: if it is determined that the action to be executed by the digital person is an action related to the position of the interactive component, determine, for the digital person, a first limb of the digital person that needs to move when the action to be executed is executed; determine, according to the position of the interactive component, a first position of the first limb after the digital person executes the action to be executed; and acquire first action driving information according to the first limb, an initial position of the first limb, and the first position;
and the triggering module is used for triggering the digital person to execute the corresponding action according to the action driving information.
9. The apparatus of claim 8, wherein the processing module is specifically configured to:
if it is determined that the action to be executed by the digital person is an action related to the position of the interactive component, acquire first action driving information according to the action to be executed by the digital person and the position of the interactive component;
and if it is determined that the action to be executed by the digital person is not an action related to the position of the interactive component, acquire second action driving information according to the action to be executed by the digital person.
10. The apparatus of claim 9, wherein the action to be executed by the digital person is an action related to the position of the interactive component, and the processing module is specifically configured to:
acquire corresponding first limb posture information according to the first limb, wherein the first limb posture information is collected while a person in an action video moves the corresponding first limb to different positions multiple times;
and obtain the first action driving information according to the initial position of the first limb, the first position, and the first limb posture information.
11. The apparatus of claim 9, wherein the action to be executed by the digital person is an action related to the position of the interactive component, and the processing module is specifically configured to:
determine the first action driving information according to the first limb, an inverse dynamics skeletal model, the initial position of the first limb, and the first position.
12. The apparatus of claim 9, wherein the action to be executed by the digital person is an action unrelated to the position of the interactive component, and the processing module is specifically configured to:
determine a plurality of pieces of action driving information according to the action to be executed by the digital person, wherein each piece of action driving information is extracted from an action executed by a person in an action video;
and determine the second action driving information from among the plurality of pieces of action driving information.
13. The apparatus of claim 10 or 11, wherein the first limb is one or more of the following limbs:
head, arm, and neck.
14. The apparatus of claim 10, wherein the first limb is an arm limb, and the first limb posture information is the position of the arm limb relative to the person in the action video when the person in the action video moves the arm limb to the left, to the right, upward, and downward; and the processing module is specifically configured to:
determine, according to the initial position and the first position of the arm limb of the digital person, change information of the position of the arm limb relative to the digital person as the arm limb moves from the initial position to the first position;
and determine the first action driving information according to the change information and the positions of the arm limb relative to the person in the action video when the person in the action video moves the arm limb to the left, to the right, upward, and downward.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the digital person-based action processing method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the digital person-based action processing method according to any one of claims 1-7.
CN202010220094.9A 2020-03-25 2020-03-25 Action processing method, device and equipment based on digital person and storage medium Active CN111443854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010220094.9A CN111443854B (en) 2020-03-25 2020-03-25 Action processing method, device and equipment based on digital person and storage medium

Publications (2)

Publication Number Publication Date
CN111443854A CN111443854A (en) 2020-07-24
CN111443854B true CN111443854B (en) 2022-01-18

Family

ID=71648752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010220094.9A Active CN111443854B (en) 2020-03-25 2020-03-25 Action processing method, device and equipment based on digital person and storage medium

Country Status (1)

Country Link
CN (1) CN111443854B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112379812B (en) * 2021-01-07 2021-04-23 深圳追一科技有限公司 Simulation 3D digital human interaction method and device, electronic equipment and storage medium
CN114238594A (en) * 2021-11-30 2022-03-25 北京百度网讯科技有限公司 Service processing method and device, electronic equipment and storage medium
CN114330545B (en) * 2021-12-28 2022-09-13 花脸数字技术(杭州)有限公司 Digital human dynamic capture analysis system based on ik algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107765856A (en) * 2017-10-26 2018-03-06 北京光年无限科技有限公司 Visual human's visual processing method and system based on multi-modal interaction
CN109917917A (en) * 2019-03-06 2019-06-21 南京七奇智能科技有限公司 A kind of visual human's interactive software bus system and its implementation
CN110125932A (en) * 2019-05-06 2019-08-16 达闼科技(北京)有限公司 A kind of dialogue exchange method, robot and the readable storage medium storing program for executing of robot
CN110139115A (en) * 2019-04-30 2019-08-16 广州虎牙信息科技有限公司 Virtual image attitude control method, device and electronic equipment based on key point
CN110688008A (en) * 2019-09-27 2020-01-14 贵州小爱机器人科技有限公司 Virtual image interaction method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10096316B2 (en) * 2013-11-27 2018-10-09 Sri International Sharing intents to provide virtual assistance in a multi-person dialog


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant