CN110321008B - Interaction method, device, equipment and storage medium based on AR model

Info

Publication number: CN110321008B
Application number: CN201910576731.3A
Authority: CN (China)
Prior art keywords: key point, motion information, model, action, key
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN110321008A
Inventor: 庞文杰
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to: CN201910576731.3A
Publication of CN110321008A (application) and CN110321008B (grant)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The application provides an interaction method, device, equipment and storage medium based on an AR model. The method comprises the following steps: acquiring a video stream, and identifying key points and key point motion information of an object in the video stream; determining the action points corresponding to the key points according to the correspondence between the key points and the action points of the AR model; and controlling the action points of the AR model to act according to the key point motion information, so as to control the AR model to act. Interaction between the user and the AR model on the terminal device can thus be realized without an additional motion capture device, so that the action of the AR model is completed and the cost is reduced. Moreover, because a correspondence is established between the key points of the object in the video stream and the action points of the AR model, the action points of the AR model can be controlled to make various corresponding actions according to that correspondence, which improves the action diversity of the AR model.

Description

Interaction method, device, equipment and storage medium based on AR model
Technical Field
The embodiment of the application relates to the technical field of terminals, in particular to an interaction method, device, equipment and storage medium based on an AR model.
Background
With the development of intelligent technology, augmented reality (AR) models have begun to appear and develop. An AR model can exhibit various actions according to the needs of the user. Currently, a motion capture device can be attached to the user's body, so that the motion capture device transmits the user's motion to a terminal device, and the terminal device controls the AR model to move according to the received information.
In the prior art, when a user interacts with a terminal device to realize an action of an AR model, an existing motion capture device may be used to capture the user's action and impart that action to the AR model.
However, capturing the user's motion and imparting it to the AR model with an existing motion capture device requires expensive equipment. When a user interacts with a terminal device to complete the action of an AR model, the prior-art approach also needs this additional motion capture device, which makes the interaction inconvenient, prevents the action of the AR model from being completed in time, and has a high cost.
Disclosure of Invention
The embodiment of the application provides an interaction method, device, equipment and storage medium based on an AR model, which are used for solving the problems that in the prior art, additional motion capturing equipment is needed, interaction between a user and terminal equipment is inconvenient, the motion of the AR model cannot be completed in time, and the cost is high.
The first aspect of the present application provides an interaction method based on an AR model, where the method is applied to a terminal device, and the method includes:
acquiring a video stream, and identifying key points and key point motion information of objects in the video stream;
determining action points corresponding to the key points according to the corresponding relation between the key points and the action points of the AR model;
and controlling the action points of the AR model to act according to the key point movement information so as to control the AR model to act.
Further, according to the key point motion information, controlling the action point of the AR model to act, including:
determining linkage relations among different key points according to the key points;
and controlling each action point corresponding to each key point to act according to the linkage relation and the key point movement information of each key point.
Further, determining the linkage relation between different key points according to each key point comprises the following steps:
querying, according to a preset database, the linkage relation corresponding to each key point, wherein the database comprises a plurality of linkage relations, and each linkage relation is a relation between at least two key points.
Further, each key point is a preset point on the finger;
each linkage relation is linkage relation among preset points on the same finger, or each linkage relation is linkage relation among different fingers.
Further, each key point is a preset point on the facial organ;
each linkage relation is a linkage relation between preset points on the same facial organ, or each linkage relation is a linkage relation between different facial organs.
Further, according to the key point motion information, controlling the action point of the AR model to act, including:
determining linkage relations among the key points according to the key point motion information;
and controlling each action point corresponding to each key point to act according to the linkage relation.
Further, the key point motion information is any one or more of the following: the coordinate position of the key point in the three-dimensional space, the orientation of the key point in the three-dimensional space and the movement speed of the key point.
Further, according to the key point motion information, controlling the action point of the AR model to act, including:
determining an object action of the object according to the key points and the key point motion information;
determining an AR model action corresponding to the object action according to a corresponding relation between the preset object action and the AR model action;
and controlling an action point of the AR model to act according to the action of the AR model corresponding to the action of the object.
Further, the acquiring the video stream includes:
collecting the video stream through an image camera on the terminal equipment;
or acquiring the thermodynamic diagram of the object through an infrared camera on the terminal equipment, and generating the video stream according to the thermodynamic diagram.
Further, after controlling the action point of the AR model to act according to the key point motion information to control the AR model to act, the method further includes:
And determining and displaying voice prompt information corresponding to the action points of the AR model according to the corresponding relation between the preset action points and the voice prompt information, wherein the voice prompt information is used for representing that the action points of the AR model are doing actions.
A second aspect of the present application provides a terminal device, including:
an acquisition unit configured to acquire a video stream;
an identification unit, configured to identify a keypoint and keypoint motion information of an object in the video stream;
the determining unit is used for determining the action point corresponding to the key point according to the corresponding relation between the key point and the action point of the AR model;
and the control unit is used for controlling the action points of the AR model to act according to the key point motion information so as to control the AR model to act.
Further, the control unit is specifically configured to:
determining linkage relations among different key points according to the key points;
and controlling each action point corresponding to each key point to act according to the linkage relation and the key point movement information of each key point.
Further, the control unit is specifically configured to:
querying, according to a preset database, the linkage relation corresponding to each key point, wherein the database comprises a plurality of linkage relations, and each linkage relation is a relation between at least two key points.
Further, each key point is a preset point on the finger;
each linkage relation is linkage relation among preset points on the same finger, or each linkage relation is linkage relation among different fingers.
Further, each key point is a preset point on the facial organ;
each linkage relation is a linkage relation between preset points on the same facial organ, or each linkage relation is a linkage relation between different facial organs.
Further, the control unit is specifically configured to:
determining linkage relations among the key points according to the key point motion information;
and controlling each action point corresponding to each key point to act according to the linkage relation.
Further, the key point motion information is any one or more of the following: the coordinate position of the key point in the three-dimensional space, the orientation of the key point in the three-dimensional space and the movement speed of the key point.
Further, the control unit is specifically configured to:
determining an object action of the object according to the key points and the key point motion information;
determining an AR model action corresponding to the object action according to a corresponding relation between the preset object action and the AR model action;
and controlling an action point of the AR model to act according to the action of the AR model corresponding to the action of the object.
Further, the acquiring unit is specifically configured to:
collecting the video stream through an image camera on the terminal equipment;
or acquiring the thermodynamic diagram of the object through an infrared camera on the terminal equipment, and generating the video stream according to the thermodynamic diagram.
Further, the terminal device further includes:
the prompting unit is used for, after the action points of the AR model are controlled to act according to the key point motion information so as to control the AR model to act, determining and displaying the voice prompt information corresponding to the action points of the AR model according to the correspondence between preset action points and voice prompt information, wherein the voice prompt information is used for indicating that an action point of the AR model is performing an action.
A third aspect of the present application provides an electronic apparatus, comprising: a transmitter, a receiver, a memory, and a processor;
the memory is used for storing computer instructions; the processor is configured to execute the computer instructions stored in the memory to implement the interaction method based on the AR model provided in any implementation manner of the first aspect.
A fourth aspect of the present application provides a storage medium comprising: a readable storage medium and computer instructions stored in the readable storage medium; the computer instructions are configured to implement the interaction method based on the AR model provided in any implementation manner of the first aspect.
In the interaction method, device, equipment and storage medium based on the AR model provided by the embodiments of the application, a video stream is obtained, and the key points and key point motion information of the object in the video stream are identified; the action points corresponding to the key points are determined according to the correspondence between the key points and the action points of the AR model; and the action points of the AR model are controlled to act according to the key point motion information, so as to control the AR model to act. When an AR model needs to be displayed on the terminal device and the user interacts with it, the terminal device can determine, from the acquired video stream, the key points on the part of the user that is moving and how those key points move; the terminal device then controls the action points corresponding to the key points to perform corresponding actions according to the key point motion information, thereby controlling the AR model on the terminal device to act. Interaction between the user and the AR model on the terminal device can be realized without an additional motion capture device, so that the action of the AR model is completed and the cost is reduced. Moreover, because a correspondence is established between the key points of the object in the video stream and the action points of the AR model, the action points of the AR model can be controlled to make various corresponding actions according to that correspondence, which improves the action diversity of the AR model.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are some embodiments of the application and that other drawings can be obtained according to these drawings without inventive faculty for a person skilled in the art.
FIG. 1 is a flowchart of an interaction method based on an AR model according to an embodiment of the present application;
fig. 2 is a schematic diagram of key points of a hand according to an embodiment of the present application;
FIG. 3 is a flowchart of another interaction method based on an AR model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of another terminal device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
With the development of intelligent technology, augmented reality (AR) models have begun to appear and develop. An AR model can exhibit various actions according to the needs of the user. Currently, a motion capture device can be attached to the user's body, so that the motion capture device transmits the user's motion to a terminal device, and the terminal device controls the AR model to move according to the received information.
Moreover, with the development of AR technology, AR can be applied to a terminal device, such as a mobile terminal device, and to the applications on that device. The AR model may be applied to various scenes; for example, when the AR model is an AR face, various display modes of the AR face are used in video software, live-broadcast software and camera software, such as stickers, AR facial beauty and makeup, expression driving, and the like. Gesture-driven and face-driven AR models are also gradually being applied to terminal devices; for example, an AR puppet is generated and displayed according to the user's gesture, and an AR face or other AR model is generated and displayed according to the user's facial expression.
In the prior art, when a user interacts with a terminal device to realize an action of an AR model, an existing motion capture device may be used to capture the user's action and impart that action to the AR model.
However, capturing the user's motion and imparting it to the AR model with an existing motion capture device requires expensive equipment. When a user interacts with a terminal device to complete the action of an AR model, the prior-art approach also needs this additional motion capture device, which makes the interaction inconvenient, prevents the action of the AR model from being completed in time, and has a high cost.
For example, an AR puppet is an AR model: an augmented-reality version of a hand-puppet toy. With an existing motion capture device, the user's motion is captured and imparted to the AR puppet.
The application provides an interaction method, an interaction device, interaction equipment and an interaction storage medium based on an AR model, which can realize interaction between a user and the AR model on terminal equipment without additional motion capture equipment, complete the action of the AR model and reduce the cost; moreover, as the corresponding relation is established between the key points of the objects in the video stream and the action points of the AR model, the action points of the AR model can be controlled to make various corresponding actions according to the corresponding relation; the action diversity of the AR model is improved, and the user experience is improved.
Fig. 1 is a flowchart of an interaction method based on an AR model according to an embodiment of the present application, as shown in fig. 1, where the method includes:
s101, obtaining a video stream.
Optionally, step S101 includes the following several implementations.
In the first implementation manner of step S101, the video stream is collected by an image camera on the terminal device.
In a second implementation manner of step S101, a thermodynamic diagram of the object is acquired by an infrared camera on the terminal device, and a video stream is generated according to the thermodynamic diagram.
In this step, the execution subject of this embodiment may be an electronic device, a terminal device, or another processing apparatus or device that can execute the method of this embodiment. This embodiment is described with the terminal device as the execution subject, and the method provided by this embodiment can be applied to a terminal device.
The terminal device may collect the video stream while the user is acting.
For example, an image camera is installed on the terminal device, and in a general environment, the image camera can collect image and video information; when a user acts, the terminal equipment can acquire a video stream through the image camera.
For another example, an infrared camera is installed on the terminal device and can collect infrared information in a weakly lit environment; when the user acts, the terminal device acquires infrared information through the infrared camera, generates a thermodynamic diagram from the infrared information, and then generates a video stream from the thermodynamic diagrams at the successive time points. For the way the terminal device generates a video stream from the infrared information collected by the infrared camera, reference may be made to existing methods.
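By way of illustration only (this is not part of the claimed method), a minimal Python sketch of the acquisition step might look as follows, assuming OpenCV is available on the terminal device; the conversion of raw infrared frames into a thermodynamic diagram is only indicated by a placeholder colour mapping, since the embodiment defers that step to existing methods.

    import cv2  # OpenCV, assumed to be available on the terminal device


    def acquire_video_stream(use_infrared: bool = False, device_index: int = 0):
        """Yield frames of the video stream described in step S101 / S201."""
        capture = cv2.VideoCapture(device_index)
        try:
            while True:
                ok, frame = capture.read()
                if not ok:
                    break
                if use_infrared:
                    # Placeholder for the existing method that turns infrared
                    # readings into a thermodynamic diagram (heat map) frame.
                    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                    frame = cv2.applyColorMap(gray, cv2.COLORMAP_JET)
                yield frame
        finally:
            capture.release()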
S102, identifying key points and key point motion information of objects in the video stream.
Optionally, the key point motion information is any one or more of the following: the coordinate position of the key point in the three-dimensional space, the orientation of the key point in the three-dimensional space and the movement speed of the key point.
Optionally, each key point is a preset point on the finger; or, each key point is a preset point on the facial organ; alternatively, each key point is an articulation point on a limb.
In this step, the video stream contains a moving object. The terminal device needs to identify the object in the video stream, and uses an existing object recognition model and an existing tracking technique to identify each key point on the object and the key point motion information of each key point. Examples of the object recognition model include a face recognition model, an expression recognition model, and a finger recognition model. Examples of the tracking technique include a three-dimensional hand skeleton tracking technique and a face tracking and mapping technique (e.g. Animoji).
For example, fig. 2 is a schematic diagram of key points of a hand according to an embodiment of the present application. As shown in fig. 2, the object in the video stream is the user's hand, and a plurality of preset points are set in advance on each finger of the hand; for example, three preset points are set on each finger according to its joints, each preset point being a joint of the finger. As shown in fig. 2, 3 key points are set on the thumb, namely key point 1, key point 2 and key point 3 on the thumb; 3 key points are likewise set on each of the remaining fingers, for example key points 1, 2 and 3 on the index finger A, key points 1, 2 and 3 on the little finger D, 3 key points on the middle finger B, and 3 key points on the ring finger. When the fingers act, the terminal device collects the video stream and then uses an existing hand recognition technique to recognize each finger in the video stream; the terminal device then uses an existing three-dimensional hand skeleton tracking technique to determine each key point on the fingers. Since the terminal device can identify the fingers and their key points on every frame of the video stream, when the fingers act, the terminal device can determine the motion of each key point from the video stream, and thus obtain the coordinate position of each key point in three-dimensional space, the orientation of each key point in three-dimensional space, the movement speed of each key point, and the like.
The three-dimensional hand skeleton tracking technology is a technology for analyzing a hand structure in real time by utilizing a video stream and modeling the three-dimensional skeleton of the hand. In the three-dimensional hand skeleton tracking technique, each hand is regarded as a three-dimensional structure in which a set of line segments in a three-dimensional space are connected by key points, according to an anatomical structure. As shown in fig. 2, the motion information of the hand in the three-dimensional space can be represented by using each key point; and, each keypoint has keypoint motion information.
For another example, the object in the video stream is the user's face, and a plurality of preset points are set in advance on each facial organ of the face; for example, several preset points are set around the eyes, several on each eyebrow, and several around the mouth. When the user makes a facial expression, the terminal device collects the video stream and then uses an existing face recognition technique to recognize the face and the facial organs in the video stream; the terminal device then uses an existing three-dimensional skeleton tracking technique and the positional relationship of the preset points on the face to determine each preset point on each facial organ, these preset points being the key points. Since the terminal device can identify the facial organs and their key points on every frame of the video stream, when the user makes a facial expression, the terminal device can determine the motion of each key point of each facial organ from the video stream, and thus obtain the coordinate position of each key point in three-dimensional space, the orientation of each key point in three-dimensional space, the movement speed of each key point, and the like.
As yet another example, the object in the video stream is a limb of the user, for example an arm, and a plurality of preset points are set in advance on the arm; for example, the preset points are set according to the joints of the arm, each preset point being a joint. When the user's arm acts, the terminal device collects the video stream and then uses an existing limb recognition technique to recognize the arm in the video stream; the terminal device then uses an existing three-dimensional skeleton tracking technique and the positional relationship of the preset points on the arm to determine each preset point on the arm, these preset points being the key points. Since the terminal device can identify the key points of the arm on every frame of the video stream, when the user's arm acts, the terminal device can determine the motion of each key point of the arm from the video stream, and thus obtain the coordinate position of each key point in three-dimensional space, the orientation of each key point in three-dimensional space, the movement speed of each key point, and the like.
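The key point motion information named above (coordinate position in three-dimensional space, orientation in three-dimensional space, movement speed) could be carried by a small data structure such as the following Python sketch; the field names and the finite-difference speed estimate are illustrative assumptions rather than part of the embodiment.

    from dataclasses import dataclass
    from typing import Tuple


    @dataclass
    class KeyPoint:
        """One key point of the tracked object (e.g. a finger joint in fig. 2)."""
        keypoint_id: str                          # e.g. "thumb_1" (hypothetical name)
        position: Tuple[float, float, float]      # coordinate position in 3-D space
        orientation: Tuple[float, float, float]   # orientation in 3-D space


    def movement_speed(prev: KeyPoint, curr: KeyPoint, dt: float) -> float:
        """Estimate a key point's movement speed between two frames taken dt apart."""
        dx, dy, dz = (c - p for c, p in zip(curr.position, prev.position))
        return (dx * dx + dy * dy + dz * dz) ** 0.5 / dt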
S103, determining action points corresponding to the key points according to the corresponding relation between the key points and the action points of the AR model.
In this step, a plurality of operation points are set in advance on the AR model. The AR model may be any AR model.
In addition, the correspondence between the key points and the action points of the model is established in advance. The correspondence may be one-to-one between key points and action points, a correspondence between a plurality of key points and one action point, or a correspondence between one key point and a plurality of action points.
Thus, the terminal device determines the action point corresponding to each key point according to the corresponding relation.
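As a sketch only, this correspondence could be stored as a simple lookup table; the key point and action point names below are hypothetical and merely echo the hand-puppet example used later in this description.

    from typing import Dict, List

    # Hypothetical correspondence table between key points of the hand (fig. 2)
    # and action points of the AR model; one key point may drive several action
    # points, and several key points may share one action point.
    KEYPOINT_TO_ACTION_POINTS: Dict[str, List[str]] = {
        "thumb_1": ["upper_jaw"],
        "thumb_2": ["upper_jaw"],
        "index_1": ["lower_jaw"],
        "palm_center": ["torso"],
    }


    def action_points_for(keypoint_id: str) -> List[str]:
        """Look up the AR-model action points that correspond to a key point."""
        return KEYPOINT_TO_ACTION_POINTS.get(keypoint_id, [])

Because the table maps one key point to a list of action points, the one-to-one, many-to-one and one-to-many correspondences described above can all be represented.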
And S104, controlling the action points of the AR model to act according to the key point movement information so as to control the AR model to act.
In this step, since each key point has key point motion information, the key point motion information of a key point can be used as the motion information of the action point corresponding to that key point; the terminal device controls the action points corresponding to the key points to perform the corresponding actions according to the key point motion information, thereby controlling the AR model on the terminal device to act.
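A minimal sketch of this control step is given below, assuming the key point motion information has already been extracted per frame and that the AR model exposes a move interface for its action points; that interface is an assumption for illustration, not something specified by the embodiment.

    from typing import Dict, List, Tuple

    Vec3 = Tuple[float, float, float]


    def drive_ar_model(ar_model,
                       keypoint_motion: Dict[str, Tuple[Vec3, Vec3, float]],
                       correspondence: Dict[str, List[str]]) -> None:
        """Step S104: apply each key point's motion information (position,
        orientation, movement speed) to the action points corresponding to it."""
        for keypoint_id, (position, orientation, speed) in keypoint_motion.items():
            for action_point in correspondence.get(keypoint_id, []):
                # ar_model.move(...) is an assumed interface; the embodiment only
                # states that the action point acts according to the key point's
                # motion information.
                ar_model.move(action_point, position=position,
                              orientation=orientation, speed=speed)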
In this embodiment, the video stream is acquired, and the key points and key point motion information of the object in the video stream are identified; the action points corresponding to the key points are determined according to the correspondence between the key points and the action points of the AR model; and the action points of the AR model are controlled to act according to the key point motion information, so as to control the AR model to act. When an AR model needs to be displayed on the terminal device and the user interacts with it, the terminal device can determine, from the acquired video stream, the key points on the part of the user that is moving and how those key points move; the terminal device then controls the action points corresponding to the key points to perform corresponding actions according to the key point motion information, thereby controlling the AR model on the terminal device to act. Interaction between the user and the AR model on the terminal device can be realized without an additional motion capture device, so that the action of the AR model is completed and the cost is reduced. Moreover, because a correspondence is established between the key points of the object in the video stream and the action points of the AR model, the action points of the AR model can be controlled to make various corresponding actions according to that correspondence, which improves the action diversity of the AR model.
Fig. 3 is a flowchart of another interaction method based on an AR model according to an embodiment of the present application, as shown in fig. 3, where the method includes:
s201, obtaining a video stream.
In this step, the execution subject of this embodiment may be an electronic device, a terminal device, or another processing apparatus or device that can execute the method of this embodiment. This embodiment is described with the terminal device as the execution subject, and the method provided by this embodiment can be applied to a terminal device.
This step may refer to step 101 shown in fig. 1, and will not be described in detail.
S202, identifying key points and key point motion information of objects in the video stream.
Optionally, the key point motion information is any one or more of the following: the coordinate position of the key point in the three-dimensional space, the orientation of the key point in the three-dimensional space and the movement speed of the key point.
Optionally, each key point is a preset point on the finger; or, each key point is a preset point on the facial organ; alternatively, each key point is an articulation point on a limb.
In this step, the step may be referred to as step 102 shown in fig. 1, and will not be described in detail.
S203, determining the action points corresponding to the key points according to the corresponding relation between the key points and the action points of the AR model.
In this step, the step may be referred to as step 103 shown in fig. 1, and will not be described in detail.
S204, according to the key point motion information, controlling the action points of the AR model to act so as to control the AR model to act.
Optionally, step S204 includes the following several implementations.
In the first implementation manner of step S204, according to each key point, determining the linkage relationship between different key points; and controlling each action point corresponding to each key point to act according to the linkage relation and the key point movement information of each key point.
Optionally, determining the linkage relationship between different key points according to each key point includes: querying the linkage relationship corresponding to each key point according to a preset database, wherein the database comprises a plurality of linkage relationships, and each linkage relationship is a relationship between at least two key points.
In the second implementation manner of step S204, determining the linkage relationship between the key points according to the key point motion information; and controlling each action point corresponding to each key point to act according to the linkage relation.
In a third implementation manner of step S204, determining an object action of the object according to the key points and the key point motion information; determining an AR model action corresponding to the object action according to a corresponding relation between the preset object action and the AR model action; and controlling an action point of the AR model to act according to the action of the AR model corresponding to the action of the object.
Optionally, each key point is a preset point on the finger; each linkage relationship is the linkage relationship among preset points on the same finger, or each linkage relationship is the linkage relationship among different fingers.
Optionally, each key point is a preset point on the facial organ; each linkage relationship is the linkage relationship between preset points on the same facial organ, or each linkage relationship is the linkage relationship between different facial organs.
In this step, when the action point of the AR model is controlled to perform an action, the following several implementations are provided.
The first implementation: a database is established in advance, the database comprising a plurality of linkage relations, each of which is a relation between at least two key points; the terminal device can therefore determine the linkage relation corresponding to each key point. In this way, the terminal device can determine that certain key points on the user's object have a linkage relation, where the object is the part of the user that is performing the action, for example a hand or a face. For example, when the object is a hand, each finger is provided with a plurality of preset points, which are the key points; the key points on each finger therefore have a certain linkage relation. When the object is a face, each facial organ of the face has a plurality of preset points, which are the key points; the key points of each facial organ therefore have a certain linkage relation, for example the key points on the mouth have a linkage relation, and the key points on the eyes have a linkage relation. Thus, when the object acts, the terminal device can determine and obtain the key points that have a linkage relation.
Then, because the key points and the action points have corresponding relations, the terminal equipment can determine the action points with the linkage relation according to the key points with the linkage relation; and then, the terminal equipment controls the action points of the linkage relation according to the key point movement information to perform corresponding actions. Thus, the terminal device controls the AR model to act.
For example, a correspondence is established between each key point of the thumb of the hand and the action point of the jaw of the AR model, and a correspondence is established between each key point of the remaining four fingers of the hand and the action point of the jaw of the AR model; the terminal equipment determines that the opening and closing actions are completed between the thumb and the other four fingers of the hand; the terminal equipment can determine that the key points of the thumb have linkage relations, and the key points of the other four fingers have linkage relations; and then, the terminal equipment controls the upper jaw and the lower jaw of the AR model to be closed and opened according to the linkage relation and the movement condition of the key points on each finger, so as to control the mouth of the AR model to be closed and opened.
For another example, a correspondence is established between each key point of the thumb and the action point of the left arm of the AR model, between each key point of the little finger and the action point of the right arm of the AR model, and between each key point of the index finger, the middle finger and the ring finger and the action point of the head of the AR model. The terminal device determines that the thumb and the little finger are moving closer together and farther apart, and that the index finger, the middle finger and the ring finger are bending and straightening; the terminal device can determine that the key points on the thumb have a linkage relation, that the key points on the little finger have a linkage relation, and that the key points on the other three fingers have a linkage relation; moreover, the terminal device can determine that the thumb and the little finger have a linkage relation. The terminal device then controls the two arms of the AR model to fold and unfold, and the head of the AR model to pitch forwards and lean backwards, according to these linkage relations and the motion of the key points on each finger.
For another example, a correspondence is established between each key point on the palm of the hand and the trunk action point of the AR model; the terminal equipment determines that the palm rotates; the terminal equipment can determine that each key point on the palm has a linkage relation; and then, the terminal equipment controls the trunk of the AR model to turn according to the linkage relation and the movement condition of the key points on the palm, so as to control the direction of the AR model to change.
In the above process, different fingers have a linkage relation, for example between the thumb and the little finger; and the key points on the same finger have a linkage relation, for example the key points on the thumb.
For another example, a correspondence is established between each facial organ of the user's face and a facial organ of the AR model, and between each key point on a facial organ of the user's face and each key point on the same facial organ of the AR model. The terminal device determines that the user's face is making an expression; it can determine that the key points on the same facial organ have a linkage relation, as well as the linkage relations between different facial organs; the terminal device then controls the AR model to perform the corresponding expression action according to these linkage relations and the motion of the key points on the facial organs of the user's face.
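The preset database of linkage relations used in the first implementation could be represented, as a sketch only, by a list of key point groups; the group contents below are hypothetical.

    from typing import List, Set

    # Hypothetical preset database of linkage relations; each entry relates at
    # least two key points, e.g. the joints of one finger or two different fingers.
    LINKAGE_DATABASE: List[Set[str]] = [
        {"thumb_1", "thumb_2", "thumb_3"},   # joints of the same finger
        {"thumb_1", "little_1"},             # two different fingers
    ]


    def linkages_for(keypoint_id: str) -> List[Set[str]]:
        """Query the linkage relations in which a given key point participates."""
        return [group for group in LINKAGE_DATABASE if keypoint_id in group]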
The second implementation: the terminal device establishes in advance correspondences between different combinations of key point motion information and linkage relations. For example, if the key point motion information of key point 1 is A and the key point motion information of key point 2 is B, then key point 1 and key point 2 have a linkage relation; if the key point motion information of key point 1 is A, that of key point 2 is C, and that of key point 3 is D, then key points 1, 2 and 3 have a linkage relation; if the key point motion information of key point 1 is A, that of key point 2 is C, that of key point 3 is E, and that of key point 4 is F, then key point 1 and key point 2 have a linkage relation, and key point 3 and key point 4 have a linkage relation.
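This second implementation could be sketched as a lookup from a combination of key point motion information to linkage groups, as below; the motion-information labels (A, B, C, ...) and key point names are placeholders taken from the example above.

    from typing import Dict, List, Set, Tuple

    # Hypothetical preset correspondence between combinations of key point motion
    # information (abstracted here as labels) and linkage relations.
    MOTION_PATTERN_TO_LINKAGE: Dict[Tuple[str, ...], List[Set[str]]] = {
        ("A", "B"): [{"keypoint_1", "keypoint_2"}],
        ("A", "C", "D"): [{"keypoint_1", "keypoint_2", "keypoint_3"}],
        ("A", "C", "E", "F"): [{"keypoint_1", "keypoint_2"},
                               {"keypoint_3", "keypoint_4"}],
    }


    def linkage_from_motion(motion_labels: Tuple[str, ...]) -> List[Set[str]]:
        """Return the linkage groups implied by an observed motion pattern."""
        return MOTION_PATTERN_TO_LINKAGE.get(motion_labels, [])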
In the third implementation manner, after obtaining the key points and the key point movement information of the key points, the terminal equipment can directly determine the object actions of the object; then, as the corresponding relation between the object action and the AR model action is established, the terminal equipment can determine the AR model action corresponding to the object action; then, the terminal device directly controls the action points of the AR model to act according to the AR model action corresponding to the object action.
For example, after obtaining the key points on the finger and the key point movement information of the key points on the finger, the terminal device may directly determine that the finger is performing the finger dance movement; the corresponding relation between the finger of the user and the finger of the AR model is established; a corresponding relation is established between the key points of the fingers of the user and the action points of the fingers of the AR model; the finger dance motion corresponds to the AR model dance motion, and the AR model dance motion is used as the AR model motion; therefore, the terminal equipment can directly determine the AR model dance motion corresponding to the finger dance motion according to the corresponding relation between the object motion and the AR model motion; and the terminal equipment can control the action points corresponding to the key points on the AR model to perform dance actions according to the AR model dance actions.
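As an illustrative sketch of the third implementation, the preset correspondence between object actions and AR model actions could be a simple mapping; the action names below are hypothetical.

    from typing import Optional

    # Hypothetical correspondence between a recognized object action and the
    # AR model action that it triggers.
    OBJECT_ACTION_TO_MODEL_ACTION = {
        "finger_dance": "puppet_dance",
        "palm_rotate": "torso_turn",
    }


    def model_action_for(object_action: str) -> Optional[str]:
        """Map a recognized object action to its preset AR model action."""
        return OBJECT_ACTION_TO_MODEL_ACTION.get(object_action)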
By adopting the approach provided by this embodiment, various actions of the AR model can be completed according to the object that is performing the action. For example, the user's fingers can control the arms of the AR model, or the user's fingers can control the antennae of the AR model.
Also, in the above illustration, the motion of the finger and the motion of the facial expression may be applied to the AR model at the same time, so that the AR model completes the torso motion and the facial expression at the same time.
S205, according to the corresponding relation between the preset action points and the voice prompt information, determining and displaying the voice prompt information corresponding to the action points of the AR model, wherein the voice prompt information is used for representing that the action points of the AR model are doing actions.
In this step, after the terminal device controls the AR model displayed on the terminal device to complete the corresponding action, the terminal device may also send a voice prompt to prompt the user that the action point of the AR model is completing the corresponding action.
Specifically, the correspondence between the action point and the voice prompt information is already stored in the terminal device in advance, and when the action point of the AR model displayed on the terminal device is performing an action, the terminal device can determine the voice prompt information corresponding to the action point, and the terminal device sends the voice prompt information.
For example, when the two arms of the AR model displayed on the terminal device are clasping, the terminal device sends out a voice prompt of "the two arms are clasping".
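A sketch of this prompting step is shown below; the action point names, prompt texts and the play_audio callback are assumptions for illustration only.

    from typing import Callable

    # Hypothetical correspondence between AR-model action points and the voice
    # prompt issued while that action point is performing an action.
    ACTION_POINT_TO_PROMPT = {
        "arms": "the two arms are clasping",
        "mouth": "the mouth is opening and closing",
    }


    def announce(action_point: str, play_audio: Callable[[str], None]) -> None:
        """Issue the voice prompt associated with a moving action point, if any."""
        prompt = ACTION_POINT_TO_PROMPT.get(action_point)
        if prompt is not None:
            play_audio(prompt)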
In this embodiment, when an AR model needs to be displayed on the terminal device and the user interacts with it, the terminal device can determine, from the acquired video stream, the key points on the part of the user that is moving and how those key points move; the terminal device then controls the action points corresponding to the key points to perform corresponding actions according to the key point motion information, thereby controlling the AR model on the terminal device to act. Interaction between the user and the AR model on the terminal device can be realized without an additional motion capture device, so that the action of the AR model is completed and the cost is reduced. Moreover, because a correspondence is established between the key points of the object in the video stream and the action points of the AR model, the action points of the AR model can be controlled to make various corresponding actions according to that correspondence, which improves the action diversity of the AR model. Furthermore, the method provided by this embodiment can be applied to an AR puppet, completing the interaction between the user and the AR puppet so that the AR puppet performs various actions. In addition, this embodiment provides several ways of controlling the action points of the AR model, so that the AR model can quickly be controlled to complete the corresponding actions.
Fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application, as shown in fig. 4, where the terminal device includes:
an acquisition unit 31 for acquiring a video stream.
And an identification unit 32 for identifying keypoints and keypoint motion information of the object in the video stream.
And a determining unit 33, configured to determine an action point corresponding to the key point according to the correspondence between the key point and the action point of the AR model.
The control unit 34 is configured to control the action point of the AR model to perform an action according to the key point motion information, so as to control the AR model to perform an action.
The terminal device provided in this embodiment can implement the technical solution of the AR-model-based interaction method provided in any of the foregoing embodiments; the implementation principles and technical effects are similar and are not repeated here.
Fig. 5 is a schematic structural diagram of another terminal device according to an embodiment of the present application, where, on the basis of the embodiment shown in fig. 4, as shown in fig. 5, the control unit 34 is specifically configured to: determining linkage relations among different key points according to the key points; and controlling each action point corresponding to each key point to act according to the linkage relation and the key point movement information of each key point.
The control unit 34 is specifically configured to: query the linkage relation corresponding to each key point according to a preset database, wherein the database comprises a plurality of linkage relations, and each linkage relation is a relation between at least two key points.
Optionally, each key point is a preset point on the finger; each linkage relationship is the linkage relationship among preset points on the same finger, or each linkage relationship is the linkage relationship among different fingers.
Optionally, each key point is a preset point on the facial organ; each linkage relationship is the linkage relationship between preset points on the same facial organ, or each linkage relationship is the linkage relationship between different facial organs.
Alternatively, the control unit 34 is specifically configured to: determining linkage relations among the key points according to the key point movement information; and controlling each action point corresponding to each key point to act according to the linkage relation.
Optionally, the key point motion information is any one or more of the following: the coordinate position of the key point in the three-dimensional space, the orientation of the key point in the three-dimensional space and the movement speed of the key point.
Alternatively, the control unit 34 is specifically configured to: determining object actions of the object according to the key points and the key point movement information; determining an AR model action corresponding to the object action according to a corresponding relation between the preset object action and the AR model action; and controlling an action point of the AR model to act according to the action of the AR model corresponding to the action of the object.
The acquiring unit 31 is specifically configured to: collecting video streams through an image camera on the terminal equipment; or, acquiring the thermodynamic diagram of the object through an infrared camera on the terminal equipment, and generating a video stream according to the thermodynamic diagram.
The terminal device provided in this embodiment further includes:
the prompting unit 41 is configured to determine and display, according to a correspondence between a preset action point and voice prompt information, voice prompt information corresponding to the action point of the AR model after the control unit 34 controls the action point of the AR model to perform an action according to the key point movement information, where the voice prompt information is used to characterize that the action point of the AR model is performing an action.
The terminal device provided in this embodiment can implement the technical solution of the AR-model-based interaction method provided in any of the foregoing embodiments; the implementation principles and technical effects are similar and are not repeated here.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application, as shown in fig. 6, where the electronic device includes: a transmitter 71, a receiver 72, a memory 73, and a processor 74;
memory 73 is used to store computer instructions; the processor 74 is configured to execute the computer instructions stored in the memory 73 to implement the technical solution of the interaction method based on the AR model according to any of the foregoing embodiments.
The present application also provides a storage medium comprising: a readable storage medium and computer instructions stored in the readable storage medium; the computer instructions are used for implementing the technical scheme of the interaction method based on the AR model of any implementation manner provided in the foregoing examples.
In the specific implementation of the electronic device described above, it should be understood that the processor 74 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: read-only memory (ROM), RAM, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disk, and any combination thereof.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (6)

1. An interaction method based on an AR puppet, which is characterized in that the method is applied to terminal equipment and comprises the following steps:
Acquiring a video stream, and identifying key points and key point motion information of objects in the video stream, wherein the objects in the video stream are hands of a user, and the key point motion information comprises the motion speed of the key points;
determining an action point corresponding to the key point according to the corresponding relation between the key point and the action point of the AR puppet;
determining the linkage relation corresponding to each key point according to the corresponding relation between the preset different key point motion information and the linkage relation; if the key point motion information of the first key point is the first motion information and the key point motion information of the second key point is the second motion information, the first key point and the second key point have a linkage relation; if the key point motion information of the first key point is first motion information, the key point motion information of the second key point is third motion information, and the key point motion information of the third key point is fourth motion information, a linkage relation is formed among the first key point, the second key point and the third key point; if the key point motion information of the first key point is first motion information, the key point motion information of the second key point is third motion information, the key point motion information of the third key point is fifth motion information, and the key point motion information of the fourth key point is sixth motion information, a linkage relation exists between the first key point and the second key point, and a linkage relation exists between the third key point and the fourth key point;
According to the linkage relation and the key point motion information of each key point, controlling the action point of the AR puppet to act so as to control the AR puppet to act;
and determining and displaying voice prompt information corresponding to the action points of the AR puppet according to the corresponding relation between the preset action points and the voice prompt information, wherein the voice prompt information is used for representing that the action points of the AR puppet are doing actions.
2. The method of claim 1, wherein the key point motion information further comprises any one or more of: the coordinate position of the key point in three-dimensional space, and the orientation of the key point in three-dimensional space.
3. The method according to claim 1 or 2, wherein acquiring the video stream comprises:
collecting the video stream through an image camera on the terminal device;
or acquiring a heat map of the object through an infrared camera on the terminal device, and generating the video stream from the heat map.
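Claim 3 names two capture paths. The sketch below shows how they could be wired up with OpenCV; the camera indices and the color-map conversion of the heat map are illustrative assumptions, not details given in the patent.

```python
# Sketch of the two capture paths in claim 3 (assumed OpenCV-based; camera
# indices 0 and 1 and the pseudo-color conversion are illustrative only).
import cv2

def frames_from_image_camera(index=0):
    """Yield frames collected from an ordinary image camera on the device."""
    cap = cv2.VideoCapture(index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()

def frames_from_infrared_camera(index=1):
    """Yield displayable frames generated from heat maps of an infrared camera."""
    cap = cv2.VideoCapture(index)
    try:
        while True:
            ok, thermal = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(thermal, cv2.COLOR_BGR2GRAY)
            # Turn the single-channel heat map into a pseudo-color video frame.
            yield cv2.applyColorMap(gray, cv2.COLORMAP_JET)
    finally:
        cap.release()
```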
4. A terminal device, characterized in that the terminal device comprises:
an acquisition unit configured to acquire a video stream;
an identification unit, configured to identify key points and key point motion information of an object in the video stream, wherein the object in the video stream is a hand of the user, and the key point motion information comprises the motion speed of the key points;
a determining unit, configured to determine the action point corresponding to the key point according to the corresponding relation between the key point and the action point of the AR hand puppet;
a control unit, configured to determine the linkage relation corresponding to each key point according to preset corresponding relations between different key point motion information and linkage relations; if the key point motion information of a first key point is first motion information and the key point motion information of a second key point is second motion information, there is a linkage relation between the first key point and the second key point; if the key point motion information of the first key point is first motion information, the key point motion information of the second key point is third motion information, and the key point motion information of a third key point is fourth motion information, there is a linkage relation among the first key point, the second key point and the third key point; if the key point motion information of the first key point is first motion information, the key point motion information of the second key point is third motion information, the key point motion information of the third key point is fifth motion information, and the key point motion information of a fourth key point is sixth motion information, there is a linkage relation between the first key point and the second key point, and a linkage relation between the third key point and the fourth key point;
and to control, according to the linkage relation and the key point motion information of each key point, the action points of the AR hand puppet to act so as to control the AR hand puppet to act;
and a prompting unit, configured to determine and display voice prompt information corresponding to the action points of the AR hand puppet according to preset corresponding relations between action points and voice prompt information, wherein the voice prompt information is used for indicating that the action points of the AR hand puppet are performing an action.
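Claim 4 decomposes the terminal device into cooperating units. A hedged structural sketch of that decomposition is shown below; the class names and stubbed method bodies are hypothetical and only mirror the roles named in the claim.

```python
# Hypothetical decomposition of the terminal device of claim 4 into units.
class AcquisitionUnit:
    def acquire(self):
        """Return the next video frame (stubbed here)."""
        return None

class IdentificationUnit:
    def identify(self, frame):
        """Return (key points, key point motion information) for the user's hand."""
        return [], []

class DeterminingUnit:
    def __init__(self, keypoint_to_action_point):
        self.mapping = keypoint_to_action_point

    def action_points_for(self, keypoints):
        """Look up the puppet action point for each detected key point."""
        return [self.mapping[k] for k in keypoints if k in self.mapping]

class ControlUnit:
    def drive(self, action_points, motions):
        """Move the puppet's action points according to linkage and motion information."""
        for point, motion in zip(action_points, motions):
            print(f"move {point} with {motion}")

class PromptUnit:
    def prompt(self, action_point):
        """Determine and present the voice prompt bound to an action point."""
        print(f"prompt for {action_point}")
```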
5. An electronic device, comprising: a transmitter, a receiver, a memory, and a processor;
the memory is configured to store computer instructions; the processor is configured to execute the computer instructions stored in the memory to implement the AR hand puppet-based interaction method of any one of claims 1-3.
6. A storage medium, comprising: a readable storage medium and computer instructions stored in the readable storage medium; the computer instructions are used for implementing the AR hand puppet-based interaction method of any one of claims 1-3.
CN201910576731.3A 2019-06-28 2019-06-28 Interaction method, device, equipment and storage medium based on AR model Active CN110321008B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910576731.3A CN110321008B (en) 2019-06-28 2019-06-28 Interaction method, device, equipment and storage medium based on AR model

Publications (2)

Publication Number Publication Date
CN110321008A (en) 2019-10-11
CN110321008B (en) 2023-10-24

Family

ID=68120589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910576731.3A Active CN110321008B (en) 2019-06-28 2019-06-28 Interaction method, device, equipment and storage medium based on AR model

Country Status (1)

Country Link
CN (1) CN110321008B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287868B (en) * 2020-11-10 2021-07-13 上海依图网络科技有限公司 Human body action recognition method and device

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140060604A (en) * 2012-11-12 2014-05-21 주식회사 브이터치 Method for controlling electronic devices by using virtural surface adjacent to display in virtual touch apparatus without pointer
CN104658022A (en) * 2013-11-20 2015-05-27 中国电信股份有限公司 Method and device for generating three-dimensional cartoons
CN106445454A (en) * 2016-09-27 2017-02-22 合肥海诺恒信息科技有限公司 Control system for information publish and interactive display
WO2017126292A1 (en) * 2016-01-22 2017-07-27 Mitsubishi Electric Corporation Method for processing keypoint trajectories in video
CN107154069A (en) * 2017-05-11 2017-09-12 上海微漫网络科技有限公司 A kind of data processing method and system based on virtual role
CN107831890A (en) * 2017-10-11 2018-03-23 北京华捷艾米科技有限公司 Man-machine interaction method, device and equipment based on AR
CN107958479A (en) * 2017-12-26 2018-04-24 南京开为网络科技有限公司 A kind of mobile terminal 3D faces augmented reality implementation method
CN107967061A (en) * 2017-12-21 2018-04-27 北京华捷艾米科技有限公司 Man-machine interaction method and device
CN108335345A (en) * 2018-02-12 2018-07-27 北京奇虎科技有限公司 The control method and device of FA Facial Animation model, computing device
CN108646920A (en) * 2018-05-16 2018-10-12 Oppo广东移动通信有限公司 Identify exchange method, device, storage medium and terminal device
WO2018188088A1 (en) * 2017-04-14 2018-10-18 广州千藤玩具有限公司 Clay toy system based on augmented reality and digital image processing and method therefor
CN109032358A (en) * 2018-08-27 2018-12-18 百度在线网络技术(北京)有限公司 The control method and device of AR interaction dummy model based on gesture identification
CN109147017A (en) * 2018-08-28 2019-01-04 百度在线网络技术(北京)有限公司 Dynamic image generation method, device, equipment and storage medium
CN109191548A (en) * 2018-08-28 2019-01-11 百度在线网络技术(北京)有限公司 Animation method, device, equipment and storage medium
CN109784299A (en) * 2019-01-28 2019-05-21 Oppo广东移动通信有限公司 Model treatment method, apparatus, terminal device and storage medium

Also Published As

Publication number Publication date
CN110321008A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
CN111694429A Virtual object driving method and device, electronic equipment and readable storage medium
CN106527709B (en) Virtual scene adjusting method and head-mounted intelligent device
CN108509026B (en) Remote maintenance support system and method based on enhanced interaction mode
RU2708027C1 (en) Method of transmitting motion of a subject from a video to an animated character
CN109815776B (en) Action prompting method and device, storage medium and electronic device
JP4951498B2 (en) Face image recognition device, face image recognition method, face image recognition program, and recording medium recording the program
CN109117753B (en) Part recognition method, device, terminal and storage medium
CN110544301A (en) Three-dimensional human body action reconstruction system, method and action training system
KR20110139694A (en) Method and system for gesture recognition
CN207752446U Gesture recognition interaction system based on a Leap Motion device
CN108549490A Gesture recognition interaction method based on a Leap Motion device
CN110544302A (en) Human body action reconstruction system and method based on multi-view vision and action training system
CN114529639A (en) Method, device, equipment and storage medium for generating virtual image animation
JP2015195020A (en) Gesture recognition device, system, and program for the same
CN203630822U (en) Virtual image and real scene combined stage interaction integrating system
KR101654311B1 (en) User motion perception method and apparatus
CN108174141B (en) Video communication method and mobile device
WO2023273372A1 (en) Gesture recognition object determination method and apparatus
CN110321008B (en) Interaction method, device, equipment and storage medium based on AR model
CN113989928B (en) Motion capturing and redirecting method
Sreejith et al. Real-time hands-free immersive image navigation system using Microsoft Kinect 2.0 and Leap Motion Controller
CN108459707A System for recognizing actions and controlling a robot by using an intelligent terminal
CN115131879B (en) Action evaluation method and device
KR101519589B1 (en) Electronic learning apparatus and method for controlling contents by hand avatar
JP5092093B2 (en) Image processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant