CN110888532A - Man-machine interaction method and device, mobile terminal and computer readable storage medium

Man-machine interaction method and device, mobile terminal and computer readable storage medium

Info

Publication number
CN110888532A
CN110888532A (application CN201911179659.7A)
Authority
CN
China
Prior art keywords
action
human body
real scene
augmented reality
instruction
Prior art date
Legal status
Pending
Application number
CN201911179659.7A
Other languages
Chinese (zh)
Inventor
殷秀玉
Current Assignee
Shenzhen Microphone Holdings Co Ltd
Shenzhen Transsion Holdings Co Ltd
Original Assignee
Shenzhen Microphone Holdings Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Microphone Holdings Co Ltd filed Critical Shenzhen Microphone Holdings Co Ltd
Priority to CN201911179659.7A
Publication of CN110888532A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174: Facial expression recognition
    • G06V 40/176: Dynamic expression
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a man-machine interaction method, a man-machine interaction device, a mobile terminal and a computer readable storage medium, wherein the method comprises the following steps: if a display instruction of an augmented reality application program virtual image is detected, acquiring a real scene acquired by a camera, and running the augmented reality application program based on the real scene; acquiring a video image acquired by a camera, and identifying a human body image in the video image acquired by the camera; acquiring characteristic information from the human body image; and determining an action instruction matched with the characteristic information, and executing an operation corresponding to the action instruction based on the augmented reality application program. An action instruction is determined from the acquired characteristic information of the human body image, and the virtual image is controlled to perform an interactive action matched with that information, so that man-machine interaction is achieved and the intelligence of the interaction is improved.

Description

Man-machine interaction method and device, mobile terminal and computer readable storage medium
Technical Field
The invention relates to the technical field of intelligent interaction, in particular to a man-machine interaction method, a man-machine interaction device, a mobile terminal and a computer readable storage medium.
Background
In terms of interaction with augmented reality virtual images, current mobile terminals interact with the virtual image mainly by clicking and dragging it on the screen. The form of interaction with the virtual image is therefore limited, the interactivity is poor, and an intelligent interaction effect cannot be achieved.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a human-computer interaction method, a human-computer interaction device, a mobile terminal and a computer readable storage medium, so as to solve the technical problem in the prior art that interaction with augmented reality virtual images is not intelligent enough, which affects user experience.
In order to achieve the above object, the present invention provides a human-computer interaction method, an apparatus, a mobile terminal and a computer-readable storage medium, wherein the human-computer interaction method comprises the following steps:
if a display instruction of an augmented reality application program virtual image is detected, acquiring a real scene acquired by a camera, and operating the augmented reality application program based on the real scene so as to display the virtual image of the augmented reality application program in the real scene;
acquiring a video image acquired by a camera, and identifying a human body image in the video image acquired by the camera;
acquiring characteristic information in the human body image based on the human body image;
and determining action instructions matched with the characteristic information, and executing operation corresponding to the action instructions based on the augmented reality application program so that the virtual image makes interactive actions matched with the action instructions in the real scene.
Preferably, based on the human body image, determining human body action;
detecting whether the duration time of the human body action reaches a preset time or not;
if the duration time reaches the preset time, determining the action type of the human body action;
and determining characteristic information corresponding to the action type based on the action type.
Preferably, if the action type is a facial expression action, acquiring a key point track of the facial feature positions based on the human body image;
and determining emotion state information corresponding to the facial expression actions based on the key point tracks.
Preferably, if the human body motion type is limb motion, acquiring joint point track information of the limb based on the human body image;
determining category information of the limb action based on the joint point trajectory information.
Preferably, whether an interactive action matched with the action instruction exists in an action library is determined;
and if such an interactive action exists, executing the operation corresponding to the action instruction based on the augmented reality application program, so that the virtual image makes an interactive action matched with the action instruction in the real scene.
Preferably, if no matching interactive action exists, the action instruction is recorded, and the virtual image is controlled to output preset prompt information;
when an updating instruction is detected, determining whether an updating action matched with the action instruction exists in an updating packet corresponding to the updating instruction;
and if so, storing the action instruction in association with the updating action in the action library.
Preferably, if a display instruction of the virtual image of the augmented reality application program is received, acquiring a real scene acquired by a camera and determining spatial information of the real scene;
acquiring the avatar of the augmented reality application program, and identifying the spatial information of the real scene;
and determining the display position of the virtual image in the real scene based on the spatial information of the real scene, and running the augmented reality application program based on the display position so as to display the virtual image at the display position of the real scene.
In addition, to achieve the above object, the present invention further provides a human-computer interaction device, including:
the detection module is used for acquiring a real scene acquired by a camera if a display instruction of an augmented reality application program virtual image is detected, and operating the augmented reality application program based on the real scene so as to display the virtual image of the augmented reality application program in the real scene;
the first acquisition module is used for acquiring a video image acquired by a camera and identifying a human body image in the video image acquired by the camera;
the second acquisition module is used for acquiring characteristic information in the human body image based on the human body image;
and the execution module is used for determining the action instruction matched with the characteristic information and executing the operation corresponding to the action instruction based on the augmented reality application program so that the virtual image makes the interactive action matched with the action instruction in the real scene.
Preferably, the detection module is further configured to:
if a display instruction of an augmented reality application program virtual image is received, acquiring a real scene acquired by a camera and determining spatial information of the real scene;
acquiring the avatar of the augmented reality application program, and identifying the spatial information of the real scene;
and determining the display position of the virtual image in the real scene based on the spatial information of the real scene, and running the augmented reality application program based on the display position so as to display the virtual image at the display position of the real scene.
Preferably, the second obtaining module is further configured to:
determining human body action based on the human body image;
detecting whether the duration time of the human body action reaches a preset time or not;
if the duration time reaches the preset time, determining the action type of the human body action;
and determining characteristic information corresponding to the action type based on the action type.
Preferably, the second obtaining module is further configured to:
if the action type is a facial expression action, acquiring a key point track of the facial feature positions based on the human body image;
and determining emotion state information corresponding to the facial expression actions based on the key point tracks.
Preferably, the second obtaining module is further configured to:
if the human body action type is limb action, acquiring joint point track information of the limb based on the human body image;
determining category information of the limb action based on the joint point trajectory information.
Preferably, the execution module is further configured to:
determining whether an interactive action matched with the action instruction exists in an action library;
and if such an interactive action exists, executing the operation corresponding to the action instruction based on the augmented reality application program, so that the virtual image makes an interactive action matched with the action instruction in the real scene.
Preferably, the execution module is further configured to:
if no matching interactive action exists, recording the action instruction, and controlling the virtual image to output preset prompt information;
when an updating instruction is detected, determining whether an updating action matched with the action instruction exists in an updating packet corresponding to the updating instruction;
and if so, storing the action instruction in association with the updating action in the action library.
In addition, to achieve the above object, the present invention also provides a mobile terminal, including: the system comprises a memory, a processor and a human-computer interaction program which is stored on the memory and can run on the processor, wherein the human-computer interaction program realizes the steps of the human-computer interaction method when being executed by the processor.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium, on which a human-computer interaction program is stored, and the human-computer interaction program, when executed by a processor, implements the steps of the human-computer interaction method described above.
According to the method and the device, if the display instruction of the virtual image of the augmented reality application program is detected, the real scene collected by the camera is obtained, and the augmented reality application program is run based on the real scene, so that the virtual image of the augmented reality application program is displayed in the real scene. The video image collected by the camera is then obtained, and the human body image in that video image is identified. The characteristic information in the human body image is then obtained based on the human body image. Finally, the action instruction matched with the characteristic information is determined, and the operation corresponding to the action instruction is executed based on the augmented reality application program, so that the virtual image makes the interactive action matched with the action instruction in the real scene. In this way, intelligent interaction of the augmented reality application program is realized and user experience is improved.
Drawings
Fig. 1 is a schematic structural diagram of a mobile terminal in a hardware operating environment according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a mobile terminal in a hardware operating environment according to an embodiment of the present invention.
The mobile terminal of the embodiment of the invention can be a PC, and can also be mobile terminal equipment supporting augmented reality, such as a smart phone, a tablet personal computer and the like.
As shown in fig. 1, the mobile terminal may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the mobile terminal may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. Such as light sensors, motion sensors, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display screen according to the brightness of ambient light, and a proximity sensor that may turn off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when the motion sensor is stationary, and can be used for applications of recognizing human-computer interaction gestures (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; of course, the mobile terminal may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.
Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 1 is not intended to be limiting of the terminal, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a human-computer interaction program.
In the mobile terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and communicating with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to invoke a human-computer interaction program stored in the memory 1005.
In this embodiment, the human-computer interaction device includes: the system comprises a memory 1005, a processor 1001 and a human-computer interaction program which is stored on the memory 1005 and can run on the processor 1001, wherein when the processor 1001 calls the human-computer interaction program stored in the memory 1005, the following operations are executed:
if a display instruction of an augmented reality application program virtual image is detected, acquiring a real scene acquired by a camera, and operating the augmented reality application program based on the real scene so as to display the virtual image of the augmented reality application program in the real scene;
acquiring a video image acquired by a camera, and identifying a human body image in the video image acquired by the camera;
acquiring characteristic information in the human body image based on the human body image;
and determining action instructions matched with the characteristic information, and executing operation corresponding to the action instructions based on the augmented reality application program so that the virtual image makes interactive actions matched with the action instructions in the real scene.
Further, the processor 1001 may call the human-computer interaction program stored in the memory 1005, and further perform the following operations:
determining human body action based on the human body image;
detecting whether the duration time of the human body action reaches a preset time or not;
if the duration time reaches the preset time, determining the action type of the human body action;
and determining characteristic information corresponding to the action type based on the action type.
Further, the processor 1001 may call the human-computer interaction program stored in the memory 1005, and further perform the following operations:
if the action type is a facial expression action, acquiring a key point track of the facial feature positions based on the human body image;
and determining emotion state information corresponding to the facial expression actions based on the key point tracks.
Further, the processor 1001 may call the human-computer interaction program stored in the memory 1005, and further perform the following operations:
if the human body action type is limb action, acquiring joint point track information of the limb based on the human body image;
determining category information of the limb action based on the joint point trajectory information.
Further, the processor 1001 may call the human-computer interaction program stored in the memory 1005, and further perform the following operations:
determining whether an interactive action matched with the action instruction exists in an action library;
and if such an interactive action exists, executing the operation corresponding to the action instruction based on the augmented reality application program, so that the virtual image makes an interactive action matched with the action instruction in the real scene.
Further, the processor 1001 may call the human-computer interaction program stored in the memory 1005, and further perform the following operations:
if no matching interactive action exists, recording the action instruction, and controlling the virtual image to output preset prompt information;
when an updating instruction is detected, determining whether an updating action matched with the action instruction exists in an updating packet corresponding to the updating instruction;
and if so, storing the action instruction in association with the updating action in the action library.
Further, the processor 1001 may call the human-computer interaction program stored in the memory 1005, and further perform the following operations:
if a display instruction of an augmented reality application program virtual image is received, acquiring a real scene acquired by a camera and determining spatial information of the real scene;
acquiring the avatar of the augmented reality application program, and identifying the spatial information of the real scene;
and determining the display position of the virtual image in the real scene based on the spatial information of the real scene, and running the augmented reality application program based on the display position so as to display the virtual image at the display position of the real scene.
The invention also provides a human-computer interaction method. Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the human-computer interaction method of the present invention.
Step S100, if a display instruction of an augmented reality application program virtual image is detected, acquiring a real scene collected by a camera, and operating the augmented reality application program based on the real scene so as to display the virtual image of the augmented reality application program in the real scene;
In this embodiment, detecting the display instruction of the avatar of the augmented reality application may mean detecting an operation of opening the augmented reality application, in which case the display operation of the avatar is performed as soon as the application is opened; the operation of opening the augmented reality application may be a click operation on the icon of the augmented reality application or a voice operation of opening the augmented reality application. Alternatively, detecting the display instruction may mean that, after the augmented reality application has been opened, an avatar start instruction is detected within the application; the avatar start instruction may be a touch instruction in the avatar selection box of the augmented reality application or a voice instruction for starting the avatar.
Then, the rear camera of the mobile terminal is started to enter a camera shooting mode, the real scene within the range of the rear camera is collected, the spatial information of the real scene is determined, the avatar of the augmented reality application is acquired, and the avatar is fused and displayed in the real scene collected by the rear camera. The avatar may be a 3D virtual doll preset by the developer, or a user-specific 3D avatar generated from attribute information entered by the user in an avatar setting step of the augmented reality application.
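For illustration only, the handling of the display instruction described above can be pictured as a small dispatch routine. The event names and the callables capture_scene, start_ar and show_avatar below are hypothetical placeholders assumed for this sketch; they are not part of the patent.

```python
# Illustrative sketch of step S100 with assumed names.
# A display instruction may arrive as an icon tap, a voice command to open
# the application, or an avatar start instruction inside the running app.
DISPLAY_INSTRUCTIONS = {"icon_tap", "voice_open_app",
                        "avatar_start_touch", "avatar_start_voice"}

def on_user_input(event, capture_scene, start_ar, show_avatar):
    """Handle one input event; return True if it was a display instruction."""
    if event not in DISPLAY_INSTRUCTIONS:
        return False
    scene = capture_scene()    # rear camera enters camera-shooting mode
    session = start_ar(scene)  # run the AR application on the real scene
    show_avatar(session)       # fuse and display the avatar in the scene
    return True

# Minimal usage with stub callables:
on_user_input("icon_tap",
              capture_scene=lambda: "real_scene",
              start_ar=lambda scene: {"scene": scene},
              show_avatar=lambda session: print("avatar shown in", session))
```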
Step S200, acquiring a video image acquired by a camera, and identifying a human body image in the video image acquired by the camera;
In this embodiment, after the mobile terminal displays the avatar and the real space in a fused manner on its display interface, a video stream captured by the front camera of the mobile terminal is obtained. The total time of the video stream is greater than a preset time, which can be flexibly set, for example, to 1 second. Video images of consecutive frames are extracted from the video stream according to a preset sampling frequency, and the human body images in those consecutive frames are then identified. Understandably, the rear camera of the mobile terminal is used for capturing the real scene while the front camera is used for capturing the video stream; alternatively, when the front camera captures the real scene, the rear camera captures the video to obtain the video stream.
Optionally, the human body image recognition may adopt a trained deep learning model based on color information and depth information to detect whether a human body image exists in the main image of the scene. After the consecutive-frame human body images are acquired from the consecutive-frame video images, the coordinates of the joint points of the skeleton structure of the human body to be recognized are acquired from the human body image through a 3D vision sensor or Kinect skeleton tracking technology, and the coordinates of the joint points at the two ends of each part of the human body to be recognized are determined from the acquired joint point coordinates. The size of each part of the human body to be recognized is calculated from the coordinates of the joint points at its two ends, the ratios between the parts are calculated from these sizes, and the calculated ratios are used as proportional features between the parts of the human body to be recognized. Consecutive-frame human body images whose proportional features between the parts are equal are then determined. If a plurality of different consecutive-frame human body images are detected, the initial human body image time of each of them is confirmed, the consecutive-frame human body image with the earliest initial human body image time is determined, and that image sequence is taken as the human body image to be processed subsequently.
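As a rough, non-authoritative illustration of the proportional features mentioned above, the sketch below computes part sizes from the joint points at the two ends of each part and takes the ratios between parts. The joint names, coordinates and the chosen parts are invented for the example.

```python
import math

# Hypothetical 3D joint coordinates for one frame (e.g. from a 3D vision
# sensor or skeleton tracking); values are made up for illustration.
joints = {
    "shoulder": (0.00, 1.40, 0.0), "elbow": (0.00, 1.10, 0.0),
    "wrist":    (0.00, 0.85, 0.0), "hip":   (0.00, 0.90, 0.0),
    "knee":     (0.00, 0.45, 0.0),
}

# Each body part is defined by the joint points at its two ends.
parts = {
    "upper_arm": ("shoulder", "elbow"),
    "forearm":   ("elbow", "wrist"),
    "thigh":     ("hip", "knee"),
}

# Size of each part = distance between its two end joints.
sizes = {name: math.dist(joints[a], joints[b]) for name, (a, b) in parts.items()}

# Ratios between parts serve as the proportional features; frames whose
# ratios agree are treated as the same human body across consecutive frames.
ratios = {(p, q): sizes[p] / sizes[q] for p in parts for q in parts if p < q}
print(ratios)
```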
Step S300, acquiring characteristic information in the human body image based on the human body image;
In this embodiment, when the human body image to be processed is detected, the key points of the human body image are marked. Based on the human body image and its key points, the human body action is determined and it is detected whether the action is completed. When the action is detected, it is detected whether the duration of the human body action reaches a preset time; the preset time can be flexibly set, for example, to 1 second, and understandably the total time of the consecutive-frame video images is greater than the preset time. If the duration of the human body action reaches the preset time, the action type of the human body action is determined and the characteristic information of the human body action in the human body image is obtained. Optionally, a preprocessing module of the mobile terminal performs a grayscale transform and image normalization on the human body image, performs detection through an APVD (Accumulation of small Pixel Difference) algorithm, and keeps the standard human body image. Finally, a feature extraction and training module establishes a training sample library and an index of the training samples, reads the standard human body image, and extracts features to obtain the characteristic information.
In this embodiment, it can be understood that, a human body image to be processed is detected, then, a human body action is determined based on the human body image to be processed, when the human body action is detected to be completed, whether the duration time of the human body action reaches a preset time is detected, and if the duration time of the human body action reaches the preset time, the action type of the human body action is determined and the characteristic information of the human body action in the human body image is acquired.
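A minimal sketch of the grayscale transform and normalization step mentioned above, assuming the frame arrives as an 8-bit RGB NumPy array. The Rec. 601 luma weights are a common convention assumed here; the patent does not specify the coefficients, and the APVD detection step is omitted.

```python
import numpy as np

def preprocess(frame_rgb: np.ndarray) -> np.ndarray:
    """Grayscale transform plus normalization of an HxWx3 uint8 frame."""
    gray = (0.299 * frame_rgb[..., 0]       # weighted sum of colour channels
            + 0.587 * frame_rgb[..., 1]     # (Rec. 601 luma coefficients,
            + 0.114 * frame_rgb[..., 2])    #  assumed for this example)
    return gray.astype(np.float32) / 255.0  # scale pixel values to [0, 1]

# Example with a dummy 4x4 RGB frame:
frame = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
standard_image = preprocess(frame)
print(standard_image.shape, standard_image.min(), standard_image.max())
```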
Step S400, determining an action instruction matched with the characteristic information, and executing an operation corresponding to the action instruction based on the augmented reality application program so that the virtual image makes an interactive action matched with the action instruction in the real scene.
In this embodiment, the characteristic information of the human body action is determined, the action instruction matched with that characteristic information is then determined based on the database of the mobile terminal, and the operation corresponding to the action instruction is executed based on the augmented reality application, that is, the instruction operation for controlling the virtual image to make the interactive action matched with the action instruction in the real space is executed. The matching relationship between the action instruction corresponding to the human body action and the interactive action of the virtual image may be preset by the developer, or set when the user uses the virtual image for the first time.
Specifically, after the characteristic information corresponding to the detected human body action is determined, the action instruction corresponding to that characteristic information must first be determined, and it is then determined whether an interactive action matching the current action instruction exists in the action library. If such an interactive action exists in the action library, the avatar is controlled to make the interactive action matched with the action instruction in the real space; if not, the avatar is controlled to make an interactive action indicating that the characteristic information cannot be identified, for example spreading its hands, shaking its head and/or playing an "I do not understand" sound, so as to prompt the user that the current action cannot be recognized.
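Conceptually, the matching described above is two look-ups: characteristic information to action instruction, then action instruction to an interactive action in the action library, with a fallback prompt when nothing matches. The table entries below are invented examples, not mappings taken from the patent.

```python
# Hypothetical mapping tables (contents invented for illustration).
FEATURE_TO_INSTRUCTION = {"smile": "greet", "wave_hand": "wave_back"}
ACTION_LIBRARY = {"greet": "avatar_smiles_and_waves",
                  "wave_back": "avatar_waves_back"}

def interact(feature_info: str) -> str:
    """Return the avatar's interactive action for one piece of feature info."""
    instruction = FEATURE_TO_INSTRUCTION.get(feature_info)
    if instruction is None or instruction not in ACTION_LIBRARY:
        # No match: spread hands / shake head / play an "I do not understand"
        # prompt so the user knows the action was not recognized.
        return "prompt_unrecognized"
    return ACTION_LIBRARY[instruction]

print(interact("smile"))   # avatar_smiles_and_waves
print(interact("jump"))    # prompt_unrecognized
```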
In this embodiment, if a display instruction of an avatar of an augmented reality application is detected, a real scene collected by a camera is obtained, the augmented reality application is operated based on the real scene to display the avatar of the augmented reality application in the real scene, a video image collected by the camera is obtained, a human body image in the video image collected by the camera is identified, feature information in the human body image is obtained based on the human body image, an action instruction matched with the feature information is determined, and an operation corresponding to the action instruction is executed based on the augmented reality application to enable the avatar to perform an interactive action matched with the action instruction in the real scene, so that intelligent interaction of augmented reality is realized, and user experience is improved.
Based on the first embodiment, a second embodiment of the method of the present invention is provided, in this embodiment, step S300 includes:
step S310, determining human body action based on the human body image;
step S320, detecting whether the duration time of the human body action reaches a preset time or not;
step S330, if the duration time reaches a preset time, determining the action type of the human body action;
step S340, determining feature information corresponding to the action type based on the action type.
In this embodiment, when the consecutive-frame human body images to be processed are acquired, maximum entropy binarization is performed on them to obtain the corresponding binary images. Human body feature points are extracted from each image of the consecutive binary images according to the same extraction method, a coordinate system is established on each image with the picture center as the coordinate origin, and the feature point coordinates corresponding to the feature points on each image are acquired; understandably, the coordinate systems established on all the images are the same.
Then, the movement track of the feature points is determined based on the feature point coordinates on each image, the movement track of the feature points corresponding to the human body action. It is detected whether the duration of the human body action reaches a preset time, which can be flexibly set, for example, to 1 second. When the duration reaches 1 second, the category of the human body action is determined, the category of the human body action comprising a facial expression action or a limb action. After the category of the human body action is determined, the characteristic information in the effective human body action region is extracted based on image processing technology. If it is detected that the duration of the human body action does not reach the preset time, the avatar is controlled to make a hand-spreading action, shake its head and/or output a prompt indicating the action was too fast to recognize, so as to prompt the user that the current action cannot be recognized.
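The duration check described above can be pictured as counting consecutive sampled frames in which the same action is detected and comparing against the preset time. The 1-second threshold comes from the example in the text; the sampling rate and frame labels are assumptions.

```python
SAMPLING_FPS = 10       # assumed sampling frequency of the video stream
PRESET_SECONDS = 1.0    # preset duration threshold (flexibly settable)

def duration_reached(per_frame_actions: list, action: str) -> bool:
    """True if `action` persists over at least PRESET_SECONDS of frames."""
    needed = int(PRESET_SECONDS * SAMPLING_FPS)
    run = 0
    for detected in per_frame_actions:
        run = run + 1 if detected == action else 0
        if run >= needed:
            return True
    return False

# Example: a wave detected in 12 consecutive sampled frames (about 1.2 s).
frames = ["none"] * 3 + ["wave"] * 12 + ["none"] * 2
print(duration_reached(frames, "wave"))   # True -> determine the action type
print(duration_reached(frames, "nod"))    # False -> prompt "too fast"
```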
In the embodiment, the human body action is determined based on the human body image, whether the duration time of the human body action reaches the preset time is detected, if the duration time reaches the preset time, the action type of the human body action is determined, and finally, based on the human body action type, the characteristic information corresponding to the action type of the human body action is determined, so that the mobile terminal can more accurately determine the human body action type, the accuracy of intelligent identification of the mobile terminal is improved, and further the user experience is improved.
Based on the second embodiment, a third embodiment of the method of the present invention is provided. In this embodiment, step S340 includes:
step S341, if the action type is a facial expression action, acquiring a key point track of the facial feature positions based on the human body image;
step S342, determining emotional state information corresponding to the facial expression action based on the key point track.
In this embodiment, if the detected action type is a facial expression action, the face region is tracked in real time, the key points of the facial features are obtained, the key point tracks are recorded, and key point track change data are obtained, where the key point track change data include the moving distance and moving direction of the key points. Emotion type information corresponding to the user's micro-expression, such as happiness or sadness, is determined according to the key point track change data; a change amplitude is calculated from the key point track change data, and emotion grade information corresponding to each emotion type, such as slightly happy or very happy, is determined according to the change amplitude. The emotion type information and the emotion grade information are collectively referred to as emotional state information.
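As a toy illustration of mapping key point track change data to an emotion type and an emotion grade, the rule below uses only the vertical displacement of a mouth-corner key point; the key point choice, thresholds and labels are assumptions invented for the example.

```python
def classify_emotion(mouth_corner_dy: float) -> tuple:
    """Map the vertical displacement of a mouth-corner key point (pixels,
    positive = upward) to (emotion type, emotion grade)."""
    emotion = "happy" if mouth_corner_dy > 0 else "sad"          # type from direction
    grade = "slight" if abs(mouth_corner_dy) < 5 else "strong"   # grade from amplitude
    return emotion, grade

print(classify_emotion(8.0))    # ('happy', 'strong')
print(classify_emotion(-2.0))   # ('sad', 'slight')
```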
In this embodiment, if the action type is a facial expression action, a key point track of the facial feature positions is obtained based on the human body image, and the emotional state information corresponding to the facial expression action is determined based on the key point track. The mobile terminal can therefore more accurately determine the emotional state information of the recognized facial expression, more accurate interactive actions are executed, and user experience is improved.
Based on the second embodiment, a fourth embodiment of the method of the present invention is provided, in this embodiment, step S340 includes:
step S343, if the human body motion type is limb motion, acquiring joint point track information of the limb based on the human body image;
step S344, determining the category information of the limb motion based on the joint point trajectory information.
In this embodiment, if the detected motion type is a limb motion, the limb area is tracked in real time, key points of the limb position are obtained, key point tracks are recorded, and key point track change data is obtained, where the key point track change data includes a moving distance and a moving direction of the key point, and limb motion category information corresponding to the limb motion of the user, such as fist making, arm swinging, head shaking, and the like, is analyzed and determined according to the key point track change data.
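The same trajectory idea applies to limbs: track joint key points across frames and classify the movement pattern. The sketch below labels a wrist track whose horizontal travel dominates as arm swinging; the rule and thresholds are invented for illustration and are not taken from the patent.

```python
def classify_limb_action(wrist_track: list) -> str:
    """Very rough limb-action classification from a wrist key point track
    given as (x, y) pixel coordinates over consecutive frames."""
    xs = [p[0] for p in wrist_track]
    ys = [p[1] for p in wrist_track]
    dx, dy = max(xs) - min(xs), max(ys) - min(ys)   # total travel per axis
    if dx < 5 and dy < 5:
        return "still"
    return "arm_swinging" if dx > dy else "arm_raising"

track = [(0, 0), (12, 1), (25, 2), (14, 1), (2, 0)]
print(classify_limb_action(track))   # arm_swinging
```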
In the embodiment, if the action type is a limb action, joint point track information of the limb is acquired based on the human body image, and the category information of the limb action is determined based on the joint point track information, so that the mobile terminal can more accurately determine the identified limb action type, correct interaction action is executed, and intelligent interaction is realized.
Based on the first embodiment, a fifth embodiment of the method of the present invention is provided, in which step S400 includes:
step S410, determining whether an interactive action matched with the action instruction exists in an action library;
step S420, if yes, executing an operation corresponding to the action instruction based on the augmented reality application program, so that the virtual image makes an interactive action matching with the action instruction in the real scene.
In this embodiment, after determining the feature information corresponding to the detected human body action, it is necessary to first determine an action instruction corresponding to the feature information, and then determine whether there is an interactive action matching with the current action instruction in the action library, that is, the mobile terminal only controls the virtual image to perform some interactive actions in the action library, so as to ensure correct interaction and implement intelligent interaction.
Specifically, the mobile terminal has one or more action libraries. Various interactive actions are stored in the action libraries, and the interactive actions are matched one by one with the action instructions of human body actions. For example, divided according to facial expression actions and limb actions, the mobile terminal has two action libraries, and each action library stores only the interactive actions matched with the corresponding type of action. Therefore, after the action instruction corresponding to the characteristic information of the human body action is determined, it is only necessary to search the corresponding action library to determine whether an interactive action matched with the action instruction exists, and only if such an interactive action exists in the action library of the mobile terminal will the mobile terminal control the virtual image to perform the corresponding interactive action.
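Splitting the action library by action type, as described above, can be sketched as a dictionary keyed first by action type so that only the relevant library is searched. All entries below are invented for illustration.

```python
# Hypothetical per-type action libraries (contents invented for illustration).
ACTION_LIBRARIES = {
    "facial_expression": {"greet": "avatar_smiles_back"},
    "limb_action":       {"wave_back": "avatar_waves_back"},
}

def lookup(action_type: str, instruction: str):
    """Search only the action library matching the detected action type."""
    return ACTION_LIBRARIES.get(action_type, {}).get(instruction)

print(lookup("limb_action", "wave_back"))    # avatar_waves_back
print(lookup("facial_expression", "jump"))   # None -> prompt the user
```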
Further, step S410 further includes:
step S430, if not, recording the action instruction, and controlling the virtual image to output preset prompt information;
step S440, when an update instruction is detected, determining whether an update action matched with the action instruction exists in an update package corresponding to the update instruction;
and step S450, if the updated action exists, storing the action command and the updated action in the action library in a related manner.
In this step, if there is no interactive action matching the current action instruction in the action library, that is, the mobile terminal cannot control the avatar to perform an interactive action, the mobile terminal records the current feature information and controls the avatar to output the preset prompt information, where the preset prompt information may be action information and/or voice information; for example, the mobile terminal controls the avatar to spread its hands, shake its head and/or play an "I do not understand" sound, so as to prompt the user that the current action cannot be recognized.
Then, an update instruction of the application program is detected. The update instruction may be triggered automatically, for example every three months as preset by the developer, or by detecting whether the number of pieces of feature information recorded in the terminal feature library that have no matching action exceeds a preset value; for example, when the number of recorded pieces of feature information reaches 5, the update instruction of the application program is triggered automatically. Before the update instruction of the application program is triggered, the action instructions recorded by the mobile terminal that have no matching interactive action are fed back to the background upgrade personnel, and the background upgrade personnel set interactive actions matched with the fed-back action instructions. The action instructions and their matched interactive actions are stored in the update package of the application program. When the update instruction of the application program is detected, it is checked whether the update package contains an update action, that is, an interactive action corresponding to a previously recorded action instruction. If so, the action instruction and its matched interactive action are stored in association in the action library of the mobile terminal; if not, the recorded action instruction continues to be fed back to the background upgrade personnel to wait for the next update.
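The recording and update flow described above is sketched below. The threshold of 5 recorded instructions comes from the example in the text, while the class structure and method names are assumptions made for the illustration.

```python
class ActionLibraryUpdater:
    UNMATCHED_THRESHOLD = 5   # example trigger value given in the description

    def __init__(self, action_library: dict):
        self.action_library = action_library
        self.unmatched = []   # action instructions with no interactive action

    def record_unmatched(self, instruction: str) -> bool:
        """Record an unmatched instruction; return True when enough have been
        recorded to trigger an application update instruction."""
        self.unmatched.append(instruction)
        return len(self.unmatched) >= self.UNMATCHED_THRESHOLD

    def apply_update(self, update_package: dict) -> None:
        """Store instruction/action pairs found in the update package in the
        action library; keep the rest recorded for the next update."""
        remaining = []
        for instruction in self.unmatched:
            if instruction in update_package:
                self.action_library[instruction] = update_package[instruction]
            else:
                remaining.append(instruction)
        self.unmatched = remaining

# Usage: record "jump" until an update arrives that defines a matching action.
updater = ActionLibraryUpdater({"greet": "avatar_smiles_back"})
updater.record_unmatched("jump")
updater.apply_update({"jump": "avatar_jumps_too"})
print(updater.action_library)   # now contains the newly associated pair
```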
In this embodiment, by determining whether there is an interactive action matching the action instruction in the action library, if so, executing an operation corresponding to the action instruction based on the augmented reality application program, so that the avatar performs the interactive action matching the action instruction in the real scene, if not, recording the action instruction, and controlling the avatar to output preset prompt information, and then when detecting the update instruction, determining whether there is an update action matching the action instruction in an update package corresponding to the update instruction, and if so, storing the action instruction and the update action association in the action library, thereby improving the correctness of the interaction and further improving the user experience.
Based on the above embodiments, a sixth embodiment of the method of the present invention is provided, in which step S100 includes:
step S110, if a display instruction of an augmented reality application program virtual image is received, acquiring a real scene collected by a camera and determining spatial information of the real scene;
step S120, acquiring the virtual image of the augmented reality application program, and identifying the spatial information of the real scene;
step S130, determining the display position of the virtual image in the real scene based on the spatial information of the real scene, and operating the augmented reality application program based on the display position so as to display the virtual image at the display position of the real scene.
In this embodiment, when detecting the start instruction of the augmented reality application, the camera mode of the rear camera is started, the real scene within the range of the rear camera is collected, and the real scene information is acquired, where the real scene information includes real scene spatial coordinate information and attributes of people or objects appearing in the real scene.
Then, the mobile terminal acquires the virtual image of the application program and determines the spatial position information of the virtual image in the real scene based on the acquired real scene information. Optionally, the spatial position of the virtual image in the real scene can be determined based on simultaneous localization and mapping (SLAM) technology and image processing technology, and the virtual image is finally displayed at the determined placement position.
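A highly simplified sketch of choosing the display position from the spatial information of the real scene: pick the centre of a suitable horizontal plane in front of the camera. Plane detection itself (for example via a SLAM framework) is assumed to be done elsewhere; the data layout and the selection rule below are assumptions for illustration.

```python
def choose_display_position(planes: list) -> tuple:
    """Pick an avatar placement from detected planes, each given as a dict
    with 'orientation', 'center' (x, y, z in metres, z = forward) and 'area'."""
    horizontal = [p for p in planes if p.get("orientation") == "horizontal"]
    if not horizontal:
        return (0.0, 0.0, 1.5)  # fallback: a point 1.5 m in front of the camera
    # Prefer a large plane that is not too far away; use its centre.
    best = max(horizontal, key=lambda p: p["area"] / (1.0 + p["center"][2]))
    return best["center"]

planes = [
    {"orientation": "horizontal", "center": (0.1, -0.4, 1.2), "area": 0.8},
    {"orientation": "vertical",   "center": (0.0,  0.0, 2.0), "area": 2.0},
]
print(choose_display_position(planes))   # (0.1, -0.4, 1.2)
```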
In this embodiment, if a display instruction of an avatar of an augmented reality application is received, a real scene collected by a camera is obtained and spatial information of the real scene is determined, then the avatar of the augmented reality application is obtained, the spatial information of the real scene is identified, finally, a display position corresponding to the spatial information of the avatar in the real scene is determined based on the spatial information of the real scene, the augmented reality application is operated based on the display position, the avatar is displayed at the display position of the real scene, and the placement position of the avatar in the real scene is determined according to the spatial information of different real scenes, so that the accuracy of the avatar displayed in a current display screen of a mobile terminal is improved, and further, the user experience is improved.
The invention also provides a man-machine interaction device. The man-machine interaction device of the invention comprises:
the detection module is used for acquiring a real scene acquired by a camera if a display instruction of an augmented reality application program virtual image is detected, and operating the augmented reality application program based on the real scene so as to display the virtual image of the augmented reality application program in the real scene;
the first acquisition module is used for acquiring a video image acquired by a camera and identifying a human body image in the video image acquired by the camera;
the second acquisition module is used for acquiring characteristic information in the human body image based on the human body image;
and the execution module is used for determining the action instruction matched with the characteristic information and executing the operation corresponding to the action instruction based on the augmented reality application program so that the virtual image makes the interactive action matched with the action instruction in the real scene.
Further, the detection module is further configured to:
if a display instruction of an augmented reality application program virtual image is received, acquiring a real scene acquired by a camera and determining spatial information of the real scene;
acquiring the avatar of the augmented reality application program, and identifying the spatial information of the real scene;
and determining the display position of the virtual image in the real scene based on the spatial information of the real scene, and running the augmented reality application program based on the display position so as to display the virtual image at the display position of the real scene.
Further, the second obtaining module is further configured to:
determining human body action based on the human body image;
detecting whether the duration time of the human body action reaches a preset time or not;
if the duration time reaches the preset time, determining the action type of the human body action;
and determining characteristic information corresponding to the action type based on the action type.
Further, the second obtaining module is further configured to:
if the action type is a facial expression action, acquiring a key point track of the facial feature positions based on the human body image;
and determining emotion state information corresponding to the facial expression actions based on the key point tracks.
Further, the second obtaining module is further configured to:
if the human body action type is limb action, acquiring joint point track information of the limb based on the human body image;
determining category information of the limb action based on the joint point trajectory information.
Further, the execution module is further configured to:
determining whether an interactive action matched with the action instruction exists in an action library;
and if such an interactive action exists, executing the operation corresponding to the action instruction based on the augmented reality application program, so that the virtual image makes an interactive action matched with the action instruction in the real scene.
Further, the execution module is further configured to:
if no matching interactive action exists, recording the action instruction, and controlling the virtual image to output preset prompt information;
when an updating instruction is detected, determining whether an updating action matched with the action instruction exists in an updating packet corresponding to the updating instruction;
and if so, storing the action instruction in association with the updating action in the action library.
The invention also provides a computer readable storage medium.
The computer readable storage medium of the present invention stores therein a human-computer interaction program, which when executed by a processor implements the steps of the human-computer interaction method as described above.
For the method implemented when the human-computer interaction program running on the processor is executed, reference may be made to the embodiments of the human-computer interaction method of the present invention, and details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A human-computer interaction method is characterized by comprising the following steps:
if a display instruction of an augmented reality application program virtual image is detected, acquiring a real scene acquired by a camera, and operating the augmented reality application program based on the real scene so as to display the virtual image of the augmented reality application program in the real scene;
acquiring a video image acquired by a camera, and identifying a human body image in the video image acquired by the camera;
acquiring characteristic information in the human body image based on the human body image;
and determining action instructions matched with the characteristic information, and executing operation corresponding to the action instructions based on the augmented reality application program so that the virtual image makes interactive actions matched with the action instructions in the real scene.
2. The human-computer interaction method according to claim 1, wherein the step of acquiring feature information in the human body image based on the human body image comprises:
determining human body action based on the human body image;
detecting whether the duration time of the human body action reaches a preset time or not;
if the duration time reaches the preset time, determining the action type of the human body action;
and determining characteristic information corresponding to the action type based on the action type.
3. The human-computer interaction method of claim 2, wherein the feature information is emotional state information corresponding to facial expression actions, and the step of determining the feature information corresponding to the action type based on the action type comprises:
if the action type is a facial expression action, acquiring a key point track of the facial feature positions based on the human body image;
and determining emotion state information corresponding to the facial expression actions based on the key point tracks.
4. The human-computer interaction method according to claim 2, wherein the feature information is category information of a limb action, and the step of determining the feature information corresponding to the action type based on the action type further comprises:
if the human body action type is limb action, acquiring joint point track information of the limb based on the human body image;
determining category information of the limb action based on the joint point trajectory information.
5. The human-computer interaction method according to claim 1, wherein the step of determining the action instruction matching the feature information and executing an operation corresponding to the action instruction based on the augmented reality application program so that the avatar makes an interactive action matching the action instruction in the real scene comprises:
determining whether an interactive action matched with the action instruction exists in an action library;
and if the virtual image exists, executing the operation corresponding to the action instruction based on the augmented reality application program so that the virtual image can make an interactive action matched with the action instruction in the real scene.
6. The human-computer interaction method according to claim 5, wherein the step of determining whether an interactive action matching the action instruction exists in the action library further comprises:
if no matching interactive action exists, recording the action instruction, and controlling the avatar to output preset prompt information;
when an update instruction is detected, determining whether an update action matching the action instruction exists in an update package corresponding to the update instruction;
and if so, storing the action instruction and the update action in the action library in association with each other.
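Claims 5 and 6 together describe a look-up in an action library with a fallback: record the unmatched instruction, prompt the user, and adopt a matching action from a later update package. A compact, hypothetical sketch of that bookkeeping, using an ordinary dictionary and invented names, follows.

```python
ACTION_LIBRARY = {"smile": "smile_back"}
PENDING_INSTRUCTIONS = []   # unmatched action instructions recorded for later


def handle_instruction(instruction: str) -> str:
    if instruction in ACTION_LIBRARY:
        return ACTION_LIBRARY[instruction]               # claim 5: matching action found
    PENDING_INSTRUCTIONS.append(instruction)             # claim 6: record the unmatched instruction
    return "prompt: I haven't learned that one yet"      # preset prompt output via the avatar


def apply_update_package(update_package: dict) -> None:
    """On an update instruction, adopt update actions that match recorded instructions."""
    for instruction in list(PENDING_INSTRUCTIONS):
        if instruction in update_package:
            ACTION_LIBRARY[instruction] = update_package[instruction]  # store the association
            PENDING_INSTRUCTIONS.remove(instruction)


print(handle_instruction("bow"))            # not in the library -> prompt
apply_update_package({"bow": "bow_back"})   # update package carries a matching action
print(handle_instruction("bow"))            # now resolved from the action library
```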
7. The human-computer interaction method according to any one of claims 1 to 6, wherein the step of acquiring a real scene captured by a camera if a display instruction for an avatar of an augmented reality application is detected, and running the augmented reality application based on the real scene so as to display the avatar of the augmented reality application in the real scene, comprises:
if a display instruction for the avatar of the augmented reality application is received, acquiring the real scene captured by the camera and determining spatial information of the real scene;
acquiring the avatar of the augmented reality application, and identifying the spatial information of the real scene;
and determining a display position of the avatar in the real scene based on the spatial information of the real scene, and running the augmented reality application based on the display position so as to display the avatar at the display position in the real scene.
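Claim 7 places the avatar using spatial information of the real scene. If that information is taken to be a set of detected planes (an assumption; the patent does not say how the spatial information is represented), the placement step might look like the sketch below, which simply anchors the avatar on the largest horizontal surface.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Plane:
    center: Tuple[float, float, float]   # world coordinates (metres)
    normal: Tuple[float, float, float]   # unit normal of the detected plane
    area: float                          # square metres


def choose_avatar_position(planes: List[Plane],
                           min_area: float = 0.25) -> Optional[Tuple[float, float, float]]:
    """Pick the largest roughly horizontal plane big enough to stand the avatar on."""
    horizontal = [p for p in planes if abs(p.normal[1]) > 0.9 and p.area >= min_area]
    if not horizontal:
        return None                       # no suitable surface: caller may fall back
    best = max(horizontal, key=lambda p: p.area)
    return best.center                    # anchor the avatar at the plane centre


planes = [Plane((0.0, 0.0, -1.2), (0.0, 1.0, 0.0), 0.8),
          Plane((0.5, 1.4, -2.0), (1.0, 0.0, 0.0), 2.0)]   # a floor patch and a wall
print(choose_avatar_position(planes))     # -> the floor patch centre
```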
8. A human-computer interaction device, characterized in that the human-computer interaction device comprises:
a detection module, configured to acquire a real scene captured by a camera if a display instruction for an avatar of an augmented reality application is detected, and to run the augmented reality application based on the real scene so as to display the avatar of the augmented reality application in the real scene;
a first acquisition module, configured to acquire a video image captured by the camera and to identify a human body image in the video image;
a second acquisition module, configured to acquire feature information from the human body image;
and an execution module, configured to determine an action instruction matching the feature information and to execute an operation corresponding to the action instruction based on the augmented reality application, so that the avatar performs an interactive action matching the action instruction in the real scene.
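Claim 8 restates the method as four cooperating modules. Purely as a structural illustration, the stub classes below wire a detection module, two acquisition modules, and an execution module in the claimed order; all behaviour is faked with placeholder returns and none of the class names come from the patent.

```python
class DetectionModule:
    def run(self) -> str:                  # display instruction -> real scene -> show avatar
        return "avatar shown in real scene"


class FirstAcquisitionModule:
    def run(self) -> str:                  # video frame -> human body image
        return "body image"


class SecondAcquisitionModule:
    def run(self, body_image: str) -> str:  # body image -> feature information
        return "smile"


class ExecutionModule:
    def run(self, feature: str) -> str:     # feature -> action instruction -> avatar interaction
        return f"avatar reacts to: {feature}"


class HumanComputerInteractionDevice:
    """Wires the four claimed modules together in the order of claim 8."""

    def __init__(self) -> None:
        self.detection = DetectionModule()
        self.first_acquisition = FirstAcquisitionModule()
        self.second_acquisition = SecondAcquisitionModule()
        self.execution = ExecutionModule()

    def interact_once(self) -> str:
        self.detection.run()
        body = self.first_acquisition.run()
        feature = self.second_acquisition.run(body)
        return self.execution.run(feature)


print(HumanComputerInteractionDevice().interact_once())  # -> "avatar reacts to: smile"
```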
9. A mobile terminal, characterized in that the mobile terminal comprises: a memory, a processor, and a human-computer interaction program stored on the memory and executable on the processor, wherein the human-computer interaction program, when executed by the processor, implements the steps of the human-computer interaction method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a human-computer interaction program is stored thereon, and the human-computer interaction program, when executed by a processor, implements the steps of the human-computer interaction method according to any one of claims 1 to 7.
CN201911179659.7A 2019-11-25 2019-11-25 Man-machine interaction method and device, mobile terminal and computer readable storage medium Pending CN110888532A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911179659.7A CN110888532A (en) 2019-11-25 2019-11-25 Man-machine interaction method and device, mobile terminal and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911179659.7A CN110888532A (en) 2019-11-25 2019-11-25 Man-machine interaction method and device, mobile terminal and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN110888532A true CN110888532A (en) 2020-03-17

Family

ID=69748943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911179659.7A Pending CN110888532A (en) 2019-11-25 2019-11-25 Man-machine interaction method and device, mobile terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110888532A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113709537B (en) * 2020-05-21 2023-06-13 云米互联科技(广东)有限公司 User interaction method based on 5G television, 5G television and readable storage medium
CN113709537A (en) * 2020-05-21 2021-11-26 云米互联科技(广东)有限公司 User interaction method based on 5G television, 5G television and readable storage medium
CN111880664A (en) * 2020-08-03 2020-11-03 深圳传音控股股份有限公司 AR interaction method, electronic device and readable storage medium
CN111915744A (en) * 2020-08-31 2020-11-10 深圳传音控股股份有限公司 Interaction method, terminal and storage medium for augmented reality image
CN112463004A (en) * 2020-11-25 2021-03-09 努比亚技术有限公司 Interactive interface operation control method and device and computer readable storage medium
CN112463004B (en) * 2020-11-25 2022-07-22 努比亚技术有限公司 Interactive interface operation control method and device and computer readable storage medium
WO2022116751A1 (en) * 2020-12-02 2022-06-09 北京字节跳动网络技术有限公司 Interaction method and apparatus, and terminal, server and storage medium
CN114793286A (en) * 2021-01-25 2022-07-26 上海哔哩哔哩科技有限公司 Video editing method and system based on virtual image
CN112947102A (en) * 2021-02-17 2021-06-11 珠海格力电器股份有限公司 Intelligent household appliance control interaction system based on scene change and control method thereof
CN113103230A (en) * 2021-03-30 2021-07-13 山东大学 Human-computer interaction system and method based on remote operation of treatment robot
WO2023207989A1 (en) * 2022-04-28 2023-11-02 北京字跳网络技术有限公司 Method and apparatus for controlling virtual object, and device and storage medium
CN115390663A (en) * 2022-07-27 2022-11-25 合壹(上海)展览有限公司 Virtual human-computer interaction method, system, equipment and storage medium
CN115390663B (en) * 2022-07-27 2023-05-26 上海合壹未来文化科技有限公司 Virtual man-machine interaction method, system, equipment and storage medium
CN115909413A (en) * 2022-12-22 2023-04-04 北京百度网讯科技有限公司 Method, apparatus, device and medium for controlling avatar
CN115909413B (en) * 2022-12-22 2023-10-27 北京百度网讯科技有限公司 Method, apparatus, device, and medium for controlling avatar

Similar Documents

Publication Publication Date Title
CN110888532A (en) Man-machine interaction method and device, mobile terminal and computer readable storage medium
CN110716645A (en) Augmented reality data presentation method and device, electronic equipment and storage medium
CN111556278B (en) Video processing method, video display device and storage medium
US10318011B2 (en) Gesture-controlled augmented reality experience using a mobile communications device
CN108525305B (en) Image processing method, image processing device, storage medium and electronic equipment
CN103890696B (en) Certified gesture identification
US9342230B2 (en) Natural user interface scrolling and targeting
CN108038726B (en) Article display method and device
CN108712603B (en) Image processing method and mobile terminal
CN105229582A (en) Based on the gestures detection of Proximity Sensor and imageing sensor
EP3968131A1 (en) Object interaction method, apparatus and system, computer-readable medium, and electronic device
CN109495616B (en) Photographing method and terminal equipment
CN112540696A (en) Screen touch control management method, intelligent terminal, device and readable storage medium
CN110822641A (en) Air conditioner, control method and device thereof and readable storage medium
CN109947988B (en) Information processing method and device, terminal equipment and server
WO2021047069A1 (en) Face recognition method and electronic terminal device
CN114358822A (en) Advertisement display method, device, medium and equipment
CN112437231B (en) Image shooting method and device, electronic equipment and storage medium
CN113553946A (en) Information prompting method and device, electronic equipment and storage medium
CN110955332A (en) Man-machine interaction method and device, mobile terminal and computer readable storage medium
CN111383346B (en) Interactive method and system based on intelligent voice, intelligent terminal and storage medium
CN112363852A (en) Popup message processing method, device, equipment and computer readable storage medium
CN114210045A (en) Intelligent eye protection method and device and computer readable storage medium
CN108763514B (en) Information display method and mobile terminal
CN111625101A (en) Display control method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination