CN110321008B - Interaction method, device, equipment and storage medium based on AR model

Info

Publication number: CN110321008B
Application number: CN201910576731.3A
Authority: CN (China)
Prior art keywords: key point, motion information, model, action, key
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN110321008A
Inventor: 庞文杰
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to: CN201910576731.3A
Publication of CN110321008A (application) and CN110321008B (grant)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The application provides an interaction method, device, equipment and storage medium based on an AR model. The method comprises the following steps: acquiring a video stream, and identifying key points and key point motion information of an object in the video stream; determining the action points corresponding to the key points according to the correspondence between the key points and the action points of the AR model; and controlling the action points of the AR model to act according to the key point motion information, so as to control the AR model to act. Interaction between the user and the AR model on the terminal device can thus be realized without an additional motion capture device, so that the action of the AR model is completed and the cost is reduced. Moreover, because a correspondence is established between the key points of the object in the video stream and the action points of the AR model, the action points of the AR model can be controlled to make various corresponding actions according to that correspondence, which improves the action diversity of the AR model.

Description

Interaction method, device, equipment and storage medium based on AR model
Technical Field
The embodiment of the application relates to the technical field of terminals, in particular to an interaction method, device, equipment and storage medium based on an AR model.
Background
With the development of intelligent technology, augmented reality (AR) models have begun to appear and develop. An AR model can exhibit various actions according to the needs of the user. Currently, a motion capture device can be attached to the user's body, so that the motion capture device transmits the user's motion to a terminal device, and the terminal device controls the AR model to move according to the received information.
In the prior art, when a user interacts with a terminal device to realize an action of an AR model, an existing motion capture device may be used to capture the user's action and impart that action to the AR model.
However, capturing the user's motion and imparting it to the AR model with an existing motion capture device requires expensive equipment. When a user interacts with a terminal device to complete the action of an AR model, the prior-art approach also needs this additional motion capture device, which makes the interaction inconvenient, prevents the action of the AR model from being completed in time, and has a high cost.
Disclosure of Invention
The embodiment of the application provides an interaction method, device, equipment and storage medium based on an AR model, which are used for solving the problems that in the prior art, additional motion capturing equipment is needed, interaction between a user and terminal equipment is inconvenient, the motion of the AR model cannot be completed in time, and the cost is high.
The first aspect of the present application provides an interaction method based on an AR model, where the method is applied to a terminal device, and the method includes:
acquiring a video stream, and identifying key points and key point motion information of objects in the video stream;
determining action points corresponding to the key points according to the corresponding relation between the key points and the action points of the AR model;
and controlling the action points of the AR model to act according to the key point movement information so as to control the AR model to act.
Further, according to the key point motion information, controlling the action point of the AR model to act, including:
determining linkage relations among different key points according to the key points;
and controlling each action point corresponding to each key point to act according to the linkage relation and the key point movement information of each key point.
Further, determining the linkage relation between different key points according to each key point comprises the following steps:
querying, according to a preset database, the linkage relation corresponding to each key point, wherein the database comprises a plurality of linkage relations, and each linkage relation is a relation between at least two key points.
Further, each key point is a preset point on the finger;
each linkage relation is linkage relation among preset points on the same finger, or each linkage relation is linkage relation among different fingers.
Further, each key point is a preset point on the facial organ;
each linkage relation is a linkage relation between preset points on the same facial organ, or each linkage relation is a linkage relation between different facial organs.
Further, according to the key point motion information, controlling the action point of the AR model to act, including:
determining linkage relations among the key points according to the key point motion information;
and controlling each action point corresponding to each key point to act according to the linkage relation.
Further, the key point motion information is any one or more of the following: the coordinate position of the key point in the three-dimensional space, the orientation of the key point in the three-dimensional space and the movement speed of the key point.
Further, according to the key point motion information, controlling the action point of the AR model to act, including:
determining an object action of the object according to the key points and the key point motion information;
determining an AR model action corresponding to the object action according to a corresponding relation between the preset object action and the AR model action;
and controlling an action point of the AR model to act according to the action of the AR model corresponding to the action of the object.
Further, the acquiring the video stream includes:
collecting the video stream through an image camera on the terminal equipment;
or acquiring the thermodynamic diagram of the object through an infrared camera on the terminal equipment, and generating the video stream according to the thermodynamic diagram.
Further, after controlling the action point of the AR model to act according to the key point motion information to control the AR model to act, the method further includes:
And determining and displaying voice prompt information corresponding to the action points of the AR model according to the corresponding relation between the preset action points and the voice prompt information, wherein the voice prompt information is used for representing that the action points of the AR model are doing actions.
A second aspect of the present application provides a terminal device, including:
an acquisition unit configured to acquire a video stream;
an identification unit, configured to identify a keypoint and keypoint motion information of an object in the video stream;
the determining unit is used for determining the action point corresponding to the key point according to the corresponding relation between the key point and the action point of the AR model;
and the control unit is used for controlling the action points of the AR model to act according to the key point motion information so as to control the AR model to act.
Further, the control unit is specifically configured to:
determining linkage relations among different key points according to the key points;
and controlling each action point corresponding to each key point to act according to the linkage relation and the key point movement information of each key point.
Further, the control unit is specifically configured to:
querying, according to a preset database, the linkage relation corresponding to each key point, wherein the database comprises a plurality of linkage relations, and each linkage relation is a relation between at least two key points.
Further, each key point is a preset point on the finger;
each linkage relation is linkage relation among preset points on the same finger, or each linkage relation is linkage relation among different fingers.
Further, each key point is a preset point on the facial organ;
each linkage relation is a linkage relation between preset points on the same facial organ, or each linkage relation is a linkage relation between different facial organs.
Further, the control unit is specifically configured to:
determining linkage relations among the key points according to the key point motion information;
and controlling each action point corresponding to each key point to act according to the linkage relation.
Further, the key point motion information is any one or more of the following: the coordinate position of the key point in the three-dimensional space, the orientation of the key point in the three-dimensional space and the movement speed of the key point.
Further, the control unit is specifically configured to:
determining an object action of the object according to the key points and the key point motion information;
determining an AR model action corresponding to the object action according to a corresponding relation between the preset object action and the AR model action;
and controlling an action point of the AR model to act according to the action of the AR model corresponding to the action of the object.
Further, the acquiring unit is specifically configured to:
collecting the video stream through an image camera on the terminal equipment;
or acquiring the thermodynamic diagram of the object through an infrared camera on the terminal equipment, and generating the video stream according to the thermodynamic diagram.
Further, the terminal device further includes:
the prompting unit is used for, after the action points of the AR model are controlled to act according to the key point motion information so as to control the AR model to act, determining and displaying the voice prompt information corresponding to the action points of the AR model according to the correspondence between preset action points and voice prompt information, wherein the voice prompt information is used for indicating that an action point of the AR model is performing an action.
A third aspect of the present application provides an electronic apparatus, comprising: a transmitter, a receiver, a memory, and a processor;
the memory is used for storing computer instructions; the processor is configured to execute the computer instructions stored in the memory to implement the interaction method based on the AR model provided in any implementation manner of the first aspect.
A fourth aspect of the present application provides a storage medium comprising: a readable storage medium and computer instructions stored in the readable storage medium; the computer instructions are configured to implement the interaction method based on the AR model provided in any implementation manner of the first aspect.
In the interaction method, device, equipment and storage medium based on the AR model provided by the embodiments of the application, a video stream is obtained, and the key points and key point motion information of the object in the video stream are identified; the action points corresponding to the key points are determined according to the correspondence between the key points and the action points of the AR model; and the action points of the AR model are controlled to act according to the key point motion information, so as to control the AR model to act. When an AR model needs to be displayed on the terminal device and the user interacts with it, the terminal device can determine, from the acquired video stream, the key points on the part of the user that is moving and how those key points move; the terminal device then controls the action points corresponding to the key points to perform corresponding actions according to the key point motion information, thereby controlling the AR model on the terminal device to act. Interaction between the user and the AR model on the terminal device can be realized without an additional motion capture device, so that the action of the AR model is completed and the cost is reduced. Moreover, because a correspondence is established between the key points of the object in the video stream and the action points of the AR model, the action points of the AR model can be controlled to make various corresponding actions according to that correspondence, which improves the action diversity of the AR model.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are some embodiments of the application and that other drawings can be obtained according to these drawings without inventive faculty for a person skilled in the art.
FIG. 1 is a flowchart of an interaction method based on an AR model according to an embodiment of the present application;
fig. 2 is a schematic diagram of key points of a hand according to an embodiment of the present application;
FIG. 3 is a flowchart of another interaction method based on an AR model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of another terminal device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
With the development of intelligent technology, augmented reality (AR) models have begun to appear and develop. An AR model can exhibit various actions according to the needs of the user. Currently, a motion capture device can be attached to the user's body, so that the motion capture device transmits the user's motion to a terminal device, and the terminal device controls the AR model to move according to the received information.
Moreover, with the development of AR technology, AR can be applied to a terminal device, such as a mobile terminal device, and to the applications on that device. The AR model may be applied to various scenes; for example, when the AR model is an AR face, various display modes of the AR face are used in video software, live-broadcast software and camera software, such as stickers, AR facial beauty and makeup, expression driving, and the like. Gesture-driven and face-driven AR models are also gradually being applied to terminal devices; for example, an AR puppet is generated and displayed according to the user's gesture, and an AR face or other AR model is generated and displayed according to the user's facial expression.
In the prior art, when a user interacts with a terminal device to realize an action of an AR model, an existing motion capture device may be used to capture the user's action and impart that action to the AR model.
However, capturing the user's motion and imparting it to the AR model with an existing motion capture device requires expensive equipment. When a user interacts with a terminal device to complete the action of an AR model, the prior-art approach also needs this additional motion capture device, which makes the interaction inconvenient, prevents the action of the AR model from being completed in time, and has a high cost.
For example, an AR puppet is an AR model: an augmented-reality version of a hand-puppet toy. With an existing motion capture device, the user's motion is captured and imparted to the AR puppet.
The application provides an interaction method, an interaction device, interaction equipment and an interaction storage medium based on an AR model, which can realize interaction between a user and the AR model on terminal equipment without additional motion capture equipment, complete the action of the AR model and reduce the cost; moreover, as the corresponding relation is established between the key points of the objects in the video stream and the action points of the AR model, the action points of the AR model can be controlled to make various corresponding actions according to the corresponding relation; the action diversity of the AR model is improved, and the user experience is improved.
Fig. 1 is a flowchart of an interaction method based on an AR model according to an embodiment of the present application, as shown in fig. 1, where the method includes:
s101, obtaining a video stream.
Optionally, step S101 includes the following several implementations.
In the first implementation manner of step S101, the video stream is collected by an image camera on the terminal device.
In a second implementation manner of step S101, a thermodynamic diagram of the object is acquired by an infrared camera on the terminal device, and a video stream is generated according to the thermodynamic diagram.
In this step, the execution subject of this embodiment may be an electronic device, a terminal device, or another processing apparatus or device that can execute the method of this embodiment. This embodiment is described with the terminal device as the execution subject, and the method provided by this embodiment can be applied to a terminal device.
The terminal device may collect the video stream while the user is acting.
For example, an image camera is installed on the terminal device, and in a general environment, the image camera can collect image and video information; when a user acts, the terminal equipment can acquire a video stream through the image camera.
For another example, an infrared camera is installed on the terminal device and can collect infrared information in a weakly lit environment; when the user acts, the terminal device acquires infrared information through the infrared camera, generates a thermodynamic diagram from the infrared information, and then generates a video stream from the thermodynamic diagrams at the successive time points. For the way the terminal device generates a video stream from the infrared information collected by the infrared camera, reference may be made to existing methods.
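By way of illustration only (this is not part of the claimed method), a minimal Python sketch of the acquisition step might look as follows, assuming OpenCV is available on the terminal device; the conversion of raw infrared frames into a thermodynamic diagram is only indicated by a placeholder colour mapping, since the embodiment defers that step to existing methods.

    import cv2  # OpenCV, assumed to be available on the terminal device


    def acquire_video_stream(use_infrared: bool = False, device_index: int = 0):
        """Yield frames of the video stream described in step S101 / S201."""
        capture = cv2.VideoCapture(device_index)
        try:
            while True:
                ok, frame = capture.read()
                if not ok:
                    break
                if use_infrared:
                    # Placeholder for the existing method that turns infrared
                    # readings into a thermodynamic diagram (heat map) frame.
                    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                    frame = cv2.applyColorMap(gray, cv2.COLORMAP_JET)
                yield frame
        finally:
            capture.release()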
S102, identifying key points and key point motion information of objects in the video stream.
Optionally, the key point motion information is any one or more of the following: the coordinate position of the key point in the three-dimensional space, the orientation of the key point in the three-dimensional space and the movement speed of the key point.
Optionally, each key point is a preset point on the finger; or, each key point is a preset point on the facial organ; alternatively, each key point is an articulation point on a limb.
In this step, the video stream contains a moving object. The terminal device needs to identify the object in the video stream, and uses an existing object recognition model and an existing tracking technique to identify each key point on the object and the key point motion information of each key point. Examples of the object recognition model include a face recognition model, an expression recognition model, and a finger recognition model. Examples of the tracking technique include a three-dimensional hand skeleton tracking technique and a face tracking and mapping technique (e.g. Animoji).
For example, fig. 2 is a schematic diagram of key points of a hand according to an embodiment of the present application. As shown in fig. 2, the object in the video stream is the user's hand, and a plurality of preset points are set in advance on each finger of the hand; for example, three preset points are set on each finger according to its joints, each preset point being a joint of the finger. As shown in fig. 2, 3 key points are set on the thumb, namely key point 1, key point 2 and key point 3 on the thumb; 3 key points are likewise set on each of the remaining fingers, for example key points 1, 2 and 3 on the index finger A, key points 1, 2 and 3 on the little finger D, 3 key points on the middle finger B, and 3 key points on the ring finger. When the fingers act, the terminal device collects the video stream and then uses an existing hand recognition technique to recognize each finger in the video stream; the terminal device then uses an existing three-dimensional hand skeleton tracking technique to determine each key point on the fingers. Since the terminal device can identify the fingers and their key points on every frame of the video stream, when the fingers act, the terminal device can determine the motion of each key point from the video stream, and thus obtain the coordinate position of each key point in three-dimensional space, the orientation of each key point in three-dimensional space, the movement speed of each key point, and the like.
The three-dimensional hand skeleton tracking technology is a technology for analyzing a hand structure in real time by utilizing a video stream and modeling the three-dimensional skeleton of the hand. In the three-dimensional hand skeleton tracking technique, each hand is regarded as a three-dimensional structure in which a set of line segments in a three-dimensional space are connected by key points, according to an anatomical structure. As shown in fig. 2, the motion information of the hand in the three-dimensional space can be represented by using each key point; and, each keypoint has keypoint motion information.
For another example, the object in the video stream is the user's face, and a plurality of preset points are set in advance on each facial organ of the face; for example, several preset points are set around the eyes, several on each eyebrow, and several around the mouth. When the user makes a facial expression, the terminal device collects the video stream and then uses an existing face recognition technique to recognize the face and the facial organs in the video stream; the terminal device then uses an existing three-dimensional skeleton tracking technique and the positional relationship of the preset points on the face to determine each preset point on each facial organ, these preset points being the key points. Since the terminal device can identify the facial organs and their key points on every frame of the video stream, when the user makes a facial expression, the terminal device can determine the motion of each key point of each facial organ from the video stream, and thus obtain the coordinate position of each key point in three-dimensional space, the orientation of each key point in three-dimensional space, the movement speed of each key point, and the like.
As yet another example, the object in the video stream is a limb of the user, for example an arm, and a plurality of preset points are set in advance on the arm; for example, the preset points are set according to the joints of the arm, each preset point being a joint. When the user's arm acts, the terminal device collects the video stream and then uses an existing limb recognition technique to recognize the arm in the video stream; the terminal device then uses an existing three-dimensional skeleton tracking technique and the positional relationship of the preset points on the arm to determine each preset point on the arm, these preset points being the key points. Since the terminal device can identify the key points of the arm on every frame of the video stream, when the user's arm acts, the terminal device can determine the motion of each key point of the arm from the video stream, and thus obtain the coordinate position of each key point in three-dimensional space, the orientation of each key point in three-dimensional space, the movement speed of each key point, and the like.
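The key point motion information named above (coordinate position in three-dimensional space, orientation in three-dimensional space, movement speed) could be carried by a small data structure such as the following Python sketch; the field names and the finite-difference speed estimate are illustrative assumptions rather than part of the embodiment.

    from dataclasses import dataclass
    from typing import Tuple


    @dataclass
    class KeyPoint:
        """One key point of the tracked object (e.g. a finger joint in fig. 2)."""
        keypoint_id: str                          # e.g. "thumb_1" (hypothetical name)
        position: Tuple[float, float, float]      # coordinate position in 3-D space
        orientation: Tuple[float, float, float]   # orientation in 3-D space


    def movement_speed(prev: KeyPoint, curr: KeyPoint, dt: float) -> float:
        """Estimate a key point's movement speed between two frames taken dt apart."""
        dx, dy, dz = (c - p for c, p in zip(curr.position, prev.position))
        return (dx * dx + dy * dy + dz * dz) ** 0.5 / dt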
S103, determining action points corresponding to the key points according to the corresponding relation between the key points and the action points of the AR model.
In this step, a plurality of operation points are set in advance on the AR model. The AR model may be any AR model.
In addition, the correspondence between the key points and the action points of the model is established in advance. The correspondence may be one-to-one between key points and action points, a correspondence between a plurality of key points and one action point, or a correspondence between one key point and a plurality of action points.
Thus, the terminal device determines the action point corresponding to each key point according to the corresponding relation.
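As a sketch only, this correspondence could be stored as a simple lookup table; the key point and action point names below are hypothetical and merely echo the hand-puppet example used later in this description.

    from typing import Dict, List

    # Hypothetical correspondence table between key points of the hand (fig. 2)
    # and action points of the AR model; one key point may drive several action
    # points, and several key points may share one action point.
    KEYPOINT_TO_ACTION_POINTS: Dict[str, List[str]] = {
        "thumb_1": ["upper_jaw"],
        "thumb_2": ["upper_jaw"],
        "index_1": ["lower_jaw"],
        "palm_center": ["torso"],
    }


    def action_points_for(keypoint_id: str) -> List[str]:
        """Look up the AR-model action points that correspond to a key point."""
        return KEYPOINT_TO_ACTION_POINTS.get(keypoint_id, [])

Because the table maps one key point to a list of action points, the one-to-one, many-to-one and one-to-many correspondences described above can all be represented.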
And S104, controlling the action points of the AR model to act according to the key point movement information so as to control the AR model to act.
In this step, since each key point has key point motion information, the key point motion information of a key point can be used as the motion information of the action point corresponding to that key point; the terminal device controls the action points corresponding to the key points to perform the corresponding actions according to the key point motion information, thereby controlling the AR model on the terminal device to act.
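A minimal sketch of this control step is given below, assuming the key point motion information has already been extracted per frame and that the AR model exposes a move interface for its action points; that interface is an assumption for illustration, not something specified by the embodiment.

    from typing import Dict, List, Tuple

    Vec3 = Tuple[float, float, float]


    def drive_ar_model(ar_model,
                       keypoint_motion: Dict[str, Tuple[Vec3, Vec3, float]],
                       correspondence: Dict[str, List[str]]) -> None:
        """Step S104: apply each key point's motion information (position,
        orientation, movement speed) to the action points corresponding to it."""
        for keypoint_id, (position, orientation, speed) in keypoint_motion.items():
            for action_point in correspondence.get(keypoint_id, []):
                # ar_model.move(...) is an assumed interface; the embodiment only
                # states that the action point acts according to the key point's
                # motion information.
                ar_model.move(action_point, position=position,
                              orientation=orientation, speed=speed)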
In this embodiment, the video stream is acquired, and the key points and key point motion information of the object in the video stream are identified; the action points corresponding to the key points are determined according to the correspondence between the key points and the action points of the AR model; and the action points of the AR model are controlled to act according to the key point motion information, so as to control the AR model to act. When an AR model needs to be displayed on the terminal device and the user interacts with it, the terminal device can determine, from the acquired video stream, the key points on the part of the user that is moving and how those key points move; the terminal device then controls the action points corresponding to the key points to perform corresponding actions according to the key point motion information, thereby controlling the AR model on the terminal device to act. Interaction between the user and the AR model on the terminal device can be realized without an additional motion capture device, so that the action of the AR model is completed and the cost is reduced. Moreover, because a correspondence is established between the key points of the object in the video stream and the action points of the AR model, the action points of the AR model can be controlled to make various corresponding actions according to that correspondence, which improves the action diversity of the AR model.
Fig. 3 is a flowchart of another interaction method based on an AR model according to an embodiment of the present application, as shown in fig. 3, where the method includes:
s201, obtaining a video stream.
In this step, the execution subject of this embodiment may be an electronic device, a terminal device, or another processing apparatus or device that can execute the method of this embodiment. This embodiment is described with the terminal device as the execution subject, and the method provided by this embodiment can be applied to a terminal device.
This step may refer to step 101 shown in fig. 1, and will not be described in detail.
S202, identifying key points and key point motion information of objects in the video stream.
Optionally, the key point motion information is any one or more of the following: the coordinate position of the key point in the three-dimensional space, the orientation of the key point in the three-dimensional space and the movement speed of the key point.
Optionally, each key point is a preset point on the finger; or, each key point is a preset point on the facial organ; alternatively, each key point is an articulation point on a limb.
In this step, the step may be referred to as step 102 shown in fig. 1, and will not be described in detail.
S203, determining the action points corresponding to the key points according to the corresponding relation between the key points and the action points of the AR model.
In this step, the step may be referred to as step 103 shown in fig. 1, and will not be described in detail.
S204, according to the key point motion information, controlling the action points of the AR model to act so as to control the AR model to act.
Optionally, step S204 includes the following several implementations.
In the first implementation manner of step S204, according to each key point, determining the linkage relationship between different key points; and controlling each action point corresponding to each key point to act according to the linkage relation and the key point movement information of each key point.
Optionally, determining the linkage relationship between different key points according to each key point includes: querying the linkage relationship corresponding to each key point according to a preset database, wherein the database comprises a plurality of linkage relationships, and each linkage relationship is a relationship between at least two key points.
In the second implementation manner of step S204, determining the linkage relationship between the key points according to the key point motion information; and controlling each action point corresponding to each key point to act according to the linkage relation.
In a third implementation manner of step S204, determining an object action of the object according to the key points and the key point motion information; determining an AR model action corresponding to the object action according to a corresponding relation between the preset object action and the AR model action; and controlling an action point of the AR model to act according to the action of the AR model corresponding to the action of the object.
Optionally, each key point is a preset point on the finger; each linkage relationship is the linkage relationship among preset points on the same finger, or each linkage relationship is the linkage relationship among different fingers.
Optionally, each key point is a preset point on the facial organ; each linkage relationship is the linkage relationship between preset points on the same facial organ, or each linkage relationship is the linkage relationship between different facial organs.
In this step, when the action point of the AR model is controlled to perform an action, the following several implementations are provided.
The first implementation: a database is established in advance, the database comprising a plurality of linkage relations, each of which is a relation between at least two key points; the terminal device can therefore determine the linkage relation corresponding to each key point. In this way, the terminal device can determine that certain key points on the user's object have a linkage relation, where the object is the part of the user that is performing the action, for example a hand or a face. For example, when the object is a hand, each finger is provided with a plurality of preset points, which are the key points; the key points on each finger therefore have a certain linkage relation. When the object is a face, each facial organ of the face has a plurality of preset points, which are the key points; the key points of each facial organ therefore have a certain linkage relation, for example the key points on the mouth have a linkage relation, and the key points on the eyes have a linkage relation. Thus, when the object acts, the terminal device can determine and obtain the key points that have a linkage relation.
Then, because the key points and the action points have corresponding relations, the terminal equipment can determine the action points with the linkage relation according to the key points with the linkage relation; and then, the terminal equipment controls the action points of the linkage relation according to the key point movement information to perform corresponding actions. Thus, the terminal device controls the AR model to act.
For example, a correspondence is established between each key point of the thumb of the hand and the action point of the jaw of the AR model, and a correspondence is established between each key point of the remaining four fingers of the hand and the action point of the jaw of the AR model; the terminal equipment determines that the opening and closing actions are completed between the thumb and the other four fingers of the hand; the terminal equipment can determine that the key points of the thumb have linkage relations, and the key points of the other four fingers have linkage relations; and then, the terminal equipment controls the upper jaw and the lower jaw of the AR model to be closed and opened according to the linkage relation and the movement condition of the key points on each finger, so as to control the mouth of the AR model to be closed and opened.
For another example, a correspondence is established between each key point of the thumb and the action point of the left arm of the AR model, between each key point of the little finger and the action point of the right arm of the AR model, and between each key point of the index finger, the middle finger and the ring finger and the action point of the head of the AR model. The terminal device determines that the thumb and the little finger are moving closer together and farther apart, and that the index finger, the middle finger and the ring finger are bending and straightening; the terminal device can determine that the key points on the thumb have a linkage relation, that the key points on the little finger have a linkage relation, and that the key points on the other three fingers have a linkage relation; moreover, the terminal device can determine that the thumb and the little finger have a linkage relation. The terminal device then controls the two arms of the AR model to fold and unfold, and the head of the AR model to pitch forwards and lean backwards, according to these linkage relations and the motion of the key points on each finger.
For another example, a correspondence is established between each key point on the palm of the hand and the trunk action point of the AR model; the terminal equipment determines that the palm rotates; the terminal equipment can determine that each key point on the palm has a linkage relation; and then, the terminal equipment controls the trunk of the AR model to turn according to the linkage relation and the movement condition of the key points on the palm, so as to control the direction of the AR model to change.
In the above process, different fingers have a linkage relation, for example between the thumb and the little finger; and the key points on the same finger have a linkage relation, for example the key points on the thumb.
For another example, a correspondence is established between each facial organ of the user's face and a facial organ of the AR model, and between each key point on a facial organ of the user's face and each key point on the same facial organ of the AR model. The terminal device determines that the user's face is making an expression; it can determine that the key points on the same facial organ have a linkage relation, as well as the linkage relations between different facial organs; the terminal device then controls the AR model to perform the corresponding expression action according to these linkage relations and the motion of the key points on the facial organs of the user's face.
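The preset database of linkage relations used in the first implementation could be represented, as a sketch only, by a list of key point groups; the group contents below are hypothetical.

    from typing import List, Set

    # Hypothetical preset database of linkage relations; each entry relates at
    # least two key points, e.g. the joints of one finger or two different fingers.
    LINKAGE_DATABASE: List[Set[str]] = [
        {"thumb_1", "thumb_2", "thumb_3"},   # joints of the same finger
        {"thumb_1", "little_1"},             # two different fingers
    ]


    def linkages_for(keypoint_id: str) -> List[Set[str]]:
        """Query the linkage relations in which a given key point participates."""
        return [group for group in LINKAGE_DATABASE if keypoint_id in group]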
The second implementation: the terminal device establishes in advance correspondences between different combinations of key point motion information and linkage relations. For example, if the key point motion information of key point 1 is A and the key point motion information of key point 2 is B, then key point 1 and key point 2 have a linkage relation; if the key point motion information of key point 1 is A, that of key point 2 is C, and that of key point 3 is D, then key points 1, 2 and 3 have a linkage relation; if the key point motion information of key point 1 is A, that of key point 2 is C, that of key point 3 is E, and that of key point 4 is F, then key point 1 and key point 2 have a linkage relation, and key point 3 and key point 4 have a linkage relation.
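This second implementation could be sketched as a lookup from a combination of key point motion information to linkage groups, as below; the motion-information labels (A, B, C, ...) and key point names are placeholders taken from the example above.

    from typing import Dict, List, Set, Tuple

    # Hypothetical preset correspondence between combinations of key point motion
    # information (abstracted here as labels) and linkage relations.
    MOTION_PATTERN_TO_LINKAGE: Dict[Tuple[str, ...], List[Set[str]]] = {
        ("A", "B"): [{"keypoint_1", "keypoint_2"}],
        ("A", "C", "D"): [{"keypoint_1", "keypoint_2", "keypoint_3"}],
        ("A", "C", "E", "F"): [{"keypoint_1", "keypoint_2"},
                               {"keypoint_3", "keypoint_4"}],
    }


    def linkage_from_motion(motion_labels: Tuple[str, ...]) -> List[Set[str]]:
        """Return the linkage groups implied by an observed motion pattern."""
        return MOTION_PATTERN_TO_LINKAGE.get(motion_labels, [])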
In the third implementation manner, after obtaining the key points and the key point movement information of the key points, the terminal equipment can directly determine the object actions of the object; then, as the corresponding relation between the object action and the AR model action is established, the terminal equipment can determine the AR model action corresponding to the object action; then, the terminal device directly controls the action points of the AR model to act according to the AR model action corresponding to the object action.
For example, after obtaining the key points on the finger and the key point movement information of the key points on the finger, the terminal device may directly determine that the finger is performing the finger dance movement; the corresponding relation between the finger of the user and the finger of the AR model is established; a corresponding relation is established between the key points of the fingers of the user and the action points of the fingers of the AR model; the finger dance motion corresponds to the AR model dance motion, and the AR model dance motion is used as the AR model motion; therefore, the terminal equipment can directly determine the AR model dance motion corresponding to the finger dance motion according to the corresponding relation between the object motion and the AR model motion; and the terminal equipment can control the action points corresponding to the key points on the AR model to perform dance actions according to the AR model dance actions.
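As an illustrative sketch of the third implementation, the preset correspondence between object actions and AR model actions could be a simple mapping; the action names below are hypothetical.

    from typing import Optional

    # Hypothetical correspondence between a recognized object action and the
    # AR model action that it triggers.
    OBJECT_ACTION_TO_MODEL_ACTION = {
        "finger_dance": "puppet_dance",
        "palm_rotate": "torso_turn",
    }


    def model_action_for(object_action: str) -> Optional[str]:
        """Map a recognized object action to its preset AR model action."""
        return OBJECT_ACTION_TO_MODEL_ACTION.get(object_action)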
By adopting the approach provided by this embodiment, various actions of the AR model can be completed according to the object that is performing the action. For example, the user's fingers can control the arms of the AR model, or the user's fingers can control the antennae of the AR model.
Also, in the above illustration, the motion of the finger and the motion of the facial expression may be applied to the AR model at the same time, so that the AR model completes the torso motion and the facial expression at the same time.
S205, according to the corresponding relation between the preset action points and the voice prompt information, determining and displaying the voice prompt information corresponding to the action points of the AR model, wherein the voice prompt information is used for representing that the action points of the AR model are doing actions.
In this step, after the terminal device controls the AR model displayed on the terminal device to complete the corresponding action, the terminal device may also send a voice prompt to prompt the user that the action point of the AR model is completing the corresponding action.
Specifically, the correspondence between the action point and the voice prompt information is already stored in the terminal device in advance, and when the action point of the AR model displayed on the terminal device is performing an action, the terminal device can determine the voice prompt information corresponding to the action point, and the terminal device sends the voice prompt information.
For example, when the two arms of the AR model displayed on the terminal device are clasping, the terminal device sends out a voice prompt of "the two arms are clasping".
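A sketch of this prompting step is shown below; the action point names, prompt texts and the play_audio callback are assumptions for illustration only.

    from typing import Callable

    # Hypothetical correspondence between AR-model action points and the voice
    # prompt issued while that action point is performing an action.
    ACTION_POINT_TO_PROMPT = {
        "arms": "the two arms are clasping",
        "mouth": "the mouth is opening and closing",
    }


    def announce(action_point: str, play_audio: Callable[[str], None]) -> None:
        """Issue the voice prompt associated with a moving action point, if any."""
        prompt = ACTION_POINT_TO_PROMPT.get(action_point)
        if prompt is not None:
            play_audio(prompt)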
In this embodiment, when an AR model needs to be displayed on the terminal device and the user interacts with it, the terminal device can determine, from the acquired video stream, the key points on the part of the user that is moving and how those key points move; the terminal device then controls the action points corresponding to the key points to perform corresponding actions according to the key point motion information, thereby controlling the AR model on the terminal device to act. Interaction between the user and the AR model on the terminal device can be realized without an additional motion capture device, so that the action of the AR model is completed and the cost is reduced. Moreover, because a correspondence is established between the key points of the object in the video stream and the action points of the AR model, the action points of the AR model can be controlled to make various corresponding actions according to that correspondence, which improves the action diversity of the AR model. Furthermore, the method provided by this embodiment can be applied to an AR puppet, completing the interaction between the user and the AR puppet so that the AR puppet performs various actions. In addition, this embodiment provides several ways of controlling the action points of the AR model, so that the AR model can quickly be controlled to complete the corresponding actions.
Fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application, as shown in fig. 4, where the terminal device includes:
an acquisition unit 31 for acquiring a video stream.
And an identification unit 32 for identifying keypoints and keypoint motion information of the object in the video stream.
And a determining unit 33, configured to determine an action point corresponding to the key point according to the correspondence between the key point and the action point of the AR model.
The control unit 34 is configured to control the action point of the AR model to perform an action according to the key point motion information, so as to control the AR model to perform an action.
The terminal device provided in this embodiment can implement the technical solution of the AR-model-based interaction method provided in any of the foregoing embodiments; the implementation principles and technical effects are similar and are not repeated here.
Fig. 5 is a schematic structural diagram of another terminal device according to an embodiment of the present application, where, on the basis of the embodiment shown in fig. 4, as shown in fig. 5, the control unit 34 is specifically configured to: determining linkage relations among different key points according to the key points; and controlling each action point corresponding to each key point to act according to the linkage relation and the key point movement information of each key point.
The control unit 34 is specifically configured to: query the linkage relation corresponding to each key point according to a preset database, wherein the database comprises a plurality of linkage relations, and each linkage relation is a relation between at least two key points.
Optionally, each key point is a preset point on the finger; each linkage relationship is the linkage relationship among preset points on the same finger, or each linkage relationship is the linkage relationship among different fingers.
Optionally, each key point is a preset point on the facial organ; each linkage relationship is the linkage relationship between preset points on the same facial organ, or each linkage relationship is the linkage relationship between different facial organs.
Alternatively, the control unit 34 is specifically configured to: determining linkage relations among the key points according to the key point movement information; and controlling each action point corresponding to each key point to act according to the linkage relation.
Optionally, the key point motion information is any one or more of the following: the coordinate position of the key point in the three-dimensional space, the orientation of the key point in the three-dimensional space and the movement speed of the key point.
Alternatively, the control unit 34 is specifically configured to: determining object actions of the object according to the key points and the key point movement information; determining an AR model action corresponding to the object action according to a corresponding relation between the preset object action and the AR model action; and controlling an action point of the AR model to act according to the action of the AR model corresponding to the action of the object.
The acquiring unit 31 is specifically configured to: collecting video streams through an image camera on the terminal equipment; or, acquiring the thermodynamic diagram of the object through an infrared camera on the terminal equipment, and generating a video stream according to the thermodynamic diagram.
The terminal device provided in this embodiment further includes:
the prompting unit 41 is configured to determine and display, according to a correspondence between a preset action point and voice prompt information, voice prompt information corresponding to the action point of the AR model after the control unit 34 controls the action point of the AR model to perform an action according to the key point movement information, where the voice prompt information is used to characterize that the action point of the AR model is performing an action.
The terminal device provided in this embodiment can implement the technical solution of the AR-model-based interaction method provided in any of the foregoing embodiments; the implementation principles and technical effects are similar and are not repeated here.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application, as shown in fig. 6, where the electronic device includes: a transmitter 71, a receiver 72, a memory 73, and a processor 74;
memory 73 is used to store computer instructions; the processor 74 is configured to execute the computer instructions stored in the memory 73 to implement the technical solution of the interaction method based on the AR model according to any of the foregoing embodiments.
The present application also provides a storage medium comprising: a readable storage medium and computer instructions stored in the readable storage medium; the computer instructions are used for implementing the technical scheme of the interaction method based on the AR model of any implementation manner provided in the foregoing examples.
In the specific implementation of the electronic device described above, it should be understood that the processor 74 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: read-only memory (ROM), RAM, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disk, and any combination thereof.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (6)

1. An interaction method based on an AR puppet, which is characterized in that the method is applied to terminal equipment and comprises the following steps:
Acquiring a video stream, and identifying key points and key point motion information of objects in the video stream, wherein the objects in the video stream are hands of a user, and the key point motion information comprises the motion speed of the key points;
determining an action point corresponding to the key point according to the corresponding relation between the key point and the action point of the AR puppet;
determining the linkage relation corresponding to each key point according to the corresponding relation between the preset different key point motion information and the linkage relation; if the key point motion information of the first key point is the first motion information and the key point motion information of the second key point is the second motion information, the first key point and the second key point have a linkage relation; if the key point motion information of the first key point is first motion information, the key point motion information of the second key point is third motion information, and the key point motion information of the third key point is fourth motion information, a linkage relation is formed among the first key point, the second key point and the third key point; if the key point motion information of the first key point is first motion information, the key point motion information of the second key point is third motion information, the key point motion information of the third key point is fifth motion information, and the key point motion information of the fourth key point is sixth motion information, a linkage relation exists between the first key point and the second key point, and a linkage relation exists between the third key point and the fourth key point;
According to the linkage relation and the key point motion information of each key point, controlling the action point of the AR puppet to act so as to control the AR puppet to act;
and determining and displaying voice prompt information corresponding to the action points of the AR puppet according to the corresponding relation between the preset action points and the voice prompt information, wherein the voice prompt information is used for representing that the action points of the AR puppet are doing actions.
2. The method of claim 1, wherein the key point motion information further comprises any one or more of: the coordinate position of the key point in three-dimensional space, and the orientation of the key point in three-dimensional space.
3. The method according to claim 1 or 2, wherein acquiring the video stream comprises:
collecting the video stream through an image camera on the terminal device;
or acquiring a heat map of the object through an infrared camera on the terminal device, and generating the video stream from the heat map.
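Claim 3 names two capture paths. The sketch below shows how they could be wired up with OpenCV; the camera indices and the color-map conversion of the heat map are illustrative assumptions, not details given in the patent.

```python
# Sketch of the two capture paths in claim 3 (assumed OpenCV-based; camera
# indices 0 and 1 and the pseudo-color conversion are illustrative only).
import cv2

def frames_from_image_camera(index=0):
    """Yield frames collected from an ordinary image camera on the device."""
    cap = cv2.VideoCapture(index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()

def frames_from_infrared_camera(index=1):
    """Yield displayable frames generated from heat maps of an infrared camera."""
    cap = cv2.VideoCapture(index)
    try:
        while True:
            ok, thermal = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(thermal, cv2.COLOR_BGR2GRAY)
            # Turn the single-channel heat map into a pseudo-color video frame.
            yield cv2.applyColorMap(gray, cv2.COLORMAP_JET)
    finally:
        cap.release()
```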
4. A terminal device, characterized in that the terminal device comprises:
an acquisition unit configured to acquire a video stream;
an identification unit, configured to identify key points and key point motion information of an object in the video stream, wherein the object in the video stream is a hand of the user, and the key point motion information comprises the motion speed of the key points;
a determining unit, configured to determine the action point corresponding to the key point according to the corresponding relation between the key point and the action point of the AR hand puppet;
a control unit, configured to determine the linkage relation corresponding to each key point according to preset corresponding relations between different key point motion information and linkage relations; if the key point motion information of a first key point is first motion information and the key point motion information of a second key point is second motion information, there is a linkage relation between the first key point and the second key point; if the key point motion information of the first key point is first motion information, the key point motion information of the second key point is third motion information, and the key point motion information of a third key point is fourth motion information, there is a linkage relation among the first key point, the second key point and the third key point; if the key point motion information of the first key point is first motion information, the key point motion information of the second key point is third motion information, the key point motion information of the third key point is fifth motion information, and the key point motion information of a fourth key point is sixth motion information, there is a linkage relation between the first key point and the second key point, and a linkage relation between the third key point and the fourth key point;
and to control, according to the linkage relation and the key point motion information of each key point, the action points of the AR hand puppet to act so as to control the AR hand puppet to act;
and a prompting unit, configured to determine and display voice prompt information corresponding to the action points of the AR hand puppet according to preset corresponding relations between action points and voice prompt information, wherein the voice prompt information is used for indicating that the action points of the AR hand puppet are performing an action.
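Claim 4 decomposes the terminal device into cooperating units. A hedged structural sketch of that decomposition is shown below; the class names and stubbed method bodies are hypothetical and only mirror the roles named in the claim.

```python
# Hypothetical decomposition of the terminal device of claim 4 into units.
class AcquisitionUnit:
    def acquire(self):
        """Return the next video frame (stubbed here)."""
        return None

class IdentificationUnit:
    def identify(self, frame):
        """Return (key points, key point motion information) for the user's hand."""
        return [], []

class DeterminingUnit:
    def __init__(self, keypoint_to_action_point):
        self.mapping = keypoint_to_action_point

    def action_points_for(self, keypoints):
        """Look up the puppet action point for each detected key point."""
        return [self.mapping[k] for k in keypoints if k in self.mapping]

class ControlUnit:
    def drive(self, action_points, motions):
        """Move the puppet's action points according to linkage and motion information."""
        for point, motion in zip(action_points, motions):
            print(f"move {point} with {motion}")

class PromptUnit:
    def prompt(self, action_point):
        """Determine and present the voice prompt bound to an action point."""
        print(f"prompt for {action_point}")
```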
5. An electronic device, comprising: a transmitter, a receiver, a memory, and a processor;
the memory is configured to store computer instructions; the processor is configured to execute the computer instructions stored in the memory to implement the AR hand puppet-based interaction method of any one of claims 1-3.
6. A storage medium, comprising: a readable storage medium and computer instructions stored in the readable storage medium; the computer instructions are used for implementing the AR hand puppet-based interaction method of any one of claims 1-3.
CN201910576731.3A 2019-06-28 2019-06-28 Interaction method, device, equipment and storage medium based on AR model Active CN110321008B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910576731.3A CN110321008B (en) 2019-06-28 2019-06-28 Interaction method, device, equipment and storage medium based on AR model

Publications (2)

Publication Number Publication Date
CN110321008A (en) 2019-10-11
CN110321008B (en) 2023-10-24

Family

ID=68120589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910576731.3A Active CN110321008B (en) 2019-06-28 2019-06-28 Interaction method, device, equipment and storage medium based on AR model

Country Status (1)

Country Link
CN (1) CN110321008B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287868B (en) * 2020-11-10 2021-07-13 上海依图网络科技有限公司 Human body action recognition method and device

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140060604A (en) * 2012-11-12 2014-05-21 주식회사 브이터치 Method for controlling electronic devices by using virtural surface adjacent to display in virtual touch apparatus without pointer
CN104658022A (en) * 2013-11-20 2015-05-27 中国电信股份有限公司 Method and device for generating three-dimensional cartoons
CN106445454A (en) * 2016-09-27 2017-02-22 合肥海诺恒信息科技有限公司 Control system for information publish and interactive display
WO2017126292A1 (en) * 2016-01-22 2017-07-27 Mitsubishi Electric Corporation Method for processing keypoint trajectories in video
CN107154069A (en) * 2017-05-11 2017-09-12 上海微漫网络科技有限公司 A kind of data processing method and system based on virtual role
CN107831890A (en) * 2017-10-11 2018-03-23 北京华捷艾米科技有限公司 Man-machine interaction method, device and equipment based on AR
CN107958479A (en) * 2017-12-26 2018-04-24 南京开为网络科技有限公司 A kind of mobile terminal 3D faces augmented reality implementation method
CN107967061A (en) * 2017-12-21 2018-04-27 北京华捷艾米科技有限公司 Man-machine interaction method and device
CN108335345A (en) * 2018-02-12 2018-07-27 北京奇虎科技有限公司 The control method and device of FA Facial Animation model, computing device
CN108646920A (en) * 2018-05-16 2018-10-12 Oppo广东移动通信有限公司 Identify exchange method, device, storage medium and terminal device
WO2018188088A1 (en) * 2017-04-14 2018-10-18 广州千藤玩具有限公司 Clay toy system based on augmented reality and digital image processing and method therefor
CN109032358A (en) * 2018-08-27 2018-12-18 百度在线网络技术(北京)有限公司 The control method and device of AR interaction dummy model based on gesture identification
CN109147017A (en) * 2018-08-28 2019-01-04 百度在线网络技术(北京)有限公司 Dynamic image generation method, device, equipment and storage medium
CN109191548A (en) * 2018-08-28 2019-01-11 百度在线网络技术(北京)有限公司 Animation method, device, equipment and storage medium
CN109784299A (en) * 2019-01-28 2019-05-21 Oppo广东移动通信有限公司 Model treatment method, apparatus, terminal device and storage medium

Also Published As

Publication number Publication date
CN110321008A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
CN111694429A Virtual object driving method and device, electronic equipment and readable storage medium
CN106527709B (en) Virtual scene adjusting method and head-mounted intelligent device
CN108509026B (en) Remote maintenance support system and method based on enhanced interaction mode
RU2708027C1 (en) Method of transmitting motion of a subject from a video to an animated character
CN109815776B (en) Action prompting method and device, storage medium and electronic device
JP4951498B2 (en) Face image recognition device, face image recognition method, face image recognition program, and recording medium recording the program
CN109117753B (en) Part recognition method, device, terminal and storage medium
CN110544301A (en) Three-dimensional human body action reconstruction system, method and action training system
KR20110139694A (en) Method and system for gesture recognition
CN207752446U Gesture recognition interaction system based on a Leap Motion device
CN108549490A Gesture recognition interaction method based on a Leap Motion device
CN110544302A (en) Human body action reconstruction system and method based on multi-view vision and action training system
CN114529639A (en) Method, device, equipment and storage medium for generating virtual image animation
JP2015195020A (en) Gesture recognition device, system, and program for the same
CN203630822U (en) Virtual image and real scene combined stage interaction integrating system
KR101654311B1 (en) User motion perception method and apparatus
CN108174141B (en) Video communication method and mobile device
WO2023273372A1 (en) Gesture recognition object determination method and apparatus
CN110321008B (en) Interaction method, device, equipment and storage medium based on AR model
CN113989928B (en) Motion capturing and redirecting method
Sreejith et al. Real-time hands-free immersive image navigation system using Microsoft Kinect 2.0 and Leap Motion Controller
CN108459707A System for recognizing actions and controlling a robot by using an intelligent terminal
CN115131879B (en) Action evaluation method and device
KR101519589B1 (en) Electronic learning apparatus and method for controlling contents by hand avatar
JP5092093B2 (en) Image processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant