CN110955332A - Man-machine interaction method and device, mobile terminal and computer readable storage medium - Google Patents

Man-machine interaction method and device, mobile terminal and computer readable storage medium

Info

Publication number
CN110955332A
Authority
CN
China
Prior art keywords
voice
instruction
real space
target application
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911179806.0A
Other languages
Chinese (zh)
Inventor
殷秀玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Microphone Holdings Co Ltd
Shenzhen Transsion Holdings Co Ltd
Original Assignee
Shenzhen Microphone Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Microphone Holdings Co Ltd filed Critical Shenzhen Microphone Holdings Co Ltd
Priority to CN201911179806.0A
Publication of CN110955332A
Legal status: Pending (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a man-machine interaction method, which comprises the following steps: when a starting instruction is detected, starting a target application corresponding to the starting instruction, and placing an avatar of the target application in a real space acquired by the target application; if voice information is detected, determining a voice instruction corresponding to the voice information; and controlling the avatar to make an interactive action corresponding to the voice instruction in the real space. The invention also discloses a man-machine interaction device, a mobile terminal and a computer readable storage medium. The invention places the avatar of the target application in the collected real space and controls the avatar to make the corresponding action by recognizing the voice instruction of the voice information, thereby realizing man-machine interaction and improving the intelligence of the interaction.

Description

Man-machine interaction method and device, mobile terminal and computer readable storage medium
Technical Field
The invention relates to the technical field of intelligent interaction, in particular to a man-machine interaction method, a man-machine interaction device, a mobile terminal and a computer readable storage medium.
Background
Augmented Reality (AR) technology enhances a user's perception of the real world with information provided by a computer system: virtual objects, scenes or system prompt information generated by the computer are superimposed onto real scenes, thereby realizing "augmented reality".
With the continuous development of AR technology, AR devices are increasingly entering people's daily lives. However, existing AR devices are not intelligent enough: when interacting with an AR device, the user mainly relies on actions such as clicking, dragging and flipping to interact with the avatar in the AR device, so interactivity is relatively poor.
Disclosure of Invention
The invention mainly aims to provide a man-machine interaction method, a man-machine interaction device, a mobile terminal and a computer readable storage medium, and aims to improve the terminal intelligence and realize intelligent interaction.
In order to achieve the above object, the present invention provides a human-computer interaction method, which comprises the following steps:
when a starting instruction is detected, starting a target application corresponding to the starting instruction, and placing an avatar of the target application in a real space acquired by the target application;
if the voice information is detected, determining a voice instruction corresponding to the voice information;
and controlling the virtual image to make an interactive action corresponding to the voice instruction in the real space.
Preferably, when a start instruction is detected, the step of starting a target application corresponding to the start instruction and placing an avatar of the target application in a real space acquired by the target application includes:
when a starting instruction is detected, starting a target application corresponding to the starting instruction, and acquiring real space information based on the target application;
and acquiring the virtual image of the target application, identifying the real space information, determining the placement position of the virtual image in the real space corresponding to the real space information, and placing the virtual image in the placement position.
Preferably, if voice information is detected, the step of determining the voice instruction corresponding to the voice information includes:
if the voice information is detected, recognizing the voice type of the voice information;
and determining a voice instruction corresponding to the voice information based on the voice type.
Preferably, the step of determining the voice instruction corresponding to the voice information based on the voice type includes:
if the voice type is music, determining the music type of the music;
and determining a voice instruction corresponding to the voice information based on the music category.
Preferably, the step of determining the voice instruction corresponding to the voice information based on the voice type includes:
if the voice type is a language, detecting whether a wake-up instruction exists in the voice information;
if yes, identifying semantics in the voice information, and determining a voice instruction corresponding to the voice information based on the semantics.
Preferably, the step of controlling the avatar to make the interactive action corresponding to the voice instruction in the real space includes:
determining whether an interactive action corresponding to the voice instruction exists in an action library;
and if so, controlling the virtual image to make the interactive action in the real space.
Preferably, after the step of determining whether the interaction action corresponding to the voice instruction exists in the preset action library, the method further includes:
if no corresponding interactive action exists, recording the voice instruction, and controlling the avatar to output preset prompt information;
when an updating instruction is detected, determining whether an update action corresponding to the voice instruction exists in an update package corresponding to the updating instruction;
and if so, storing the voice instruction and the update action in association in the action library.
In addition, to achieve the above object, the present invention further provides a human-computer interaction device, including:
the detection module is used for starting a target application corresponding to a starting instruction when the starting instruction is detected, and placing a virtual image of the target application in a real space acquired by the target application;
the determining module is used for determining a voice instruction corresponding to the voice information if the voice information is detected;
and the control module is used for controlling the virtual image to make the interactive action corresponding to the voice instruction in the real space.
Preferably, the detection module is further configured to:
when a starting instruction is detected, starting a target application corresponding to the starting instruction, and acquiring real space information based on the target application;
and acquiring the virtual image of the target application, identifying the real space information, determining the placement position of the virtual image in the real space corresponding to the real space information, and placing the virtual image in the placement position.
Preferably, the determining module is further configured to:
if the voice information is detected, recognizing the voice type of the voice information;
and determining a voice instruction corresponding to the voice information based on the voice type.
Preferably, the determining module is further configured to:
if the voice type is music, determining the music type of the music;
and determining a voice instruction corresponding to the voice information based on the music category.
Preferably, the determining module is further configured to:
if the voice type is a language, detecting whether a wake-up instruction exists in the voice information;
if yes, identifying semantics in the voice information, and determining a voice instruction corresponding to the voice information based on the semantics.
Preferably, the control module is further configured to:
determining whether an interactive action corresponding to the voice instruction exists in an action library;
and if so, controlling the virtual image to make the interactive action in the real space.
Preferably, the control module is further configured to:
if no corresponding interactive action exists, recording the voice instruction, and controlling the avatar to output preset prompt information;
when an updating instruction is detected, determining whether an update action corresponding to the voice instruction exists in an update package corresponding to the updating instruction;
and if so, storing the voice instruction and the update action in association in the action library.
In addition, to achieve the above object, the present invention also provides a mobile terminal, including: the system comprises a memory, a processor and a human-computer interaction program stored on the memory and capable of running on the processor, wherein the human-computer interaction program realizes the steps of the human-computer interaction method when being executed by the processor.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium, on which a human-computer interaction program is stored, and the human-computer interaction program, when executed by a processor, implements the steps of the human-computer interaction method as described above.
According to the man-machine interaction method, when a starting instruction is detected, a target application corresponding to the starting instruction is started, and the virtual image of the target application is placed in a real space acquired by the target application; if the voice information is detected, determining a voice instruction corresponding to the voice information; and controlling the virtual image to make an interactive action corresponding to the voice instruction in the real space. The invention places the virtual image of the target application in the collected real space, controls the virtual image to make corresponding action by recognizing the voice command of the voice information, realizes man-machine interaction and improves the intelligence of the interaction.
Drawings
Fig. 1 is a schematic structural diagram of a mobile terminal in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a human-computer interaction method according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of a human-computer interaction method according to a first embodiment of the present invention, in which an avatar is placed in a real space;
FIG. 4 is a flowchart illustrating a human-computer interaction method according to a second embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a mobile terminal in a hardware operating environment according to an embodiment of the present invention.
The mobile terminal in the embodiments of the present invention may be a device such as a mobile phone, a tablet or a digital camera.
As shown in fig. 1, the mobile terminal may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 1 is not intended to be limiting of mobile terminals and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a human-computer interaction program.
The operating system is a program for managing and controlling the mobile terminal and its software resources, and supports the operation of the network communication module, the user interface module, the human-computer interaction program and other programs or software; the network communication module is used to manage and control the network interface 1004; the user interface module is used to manage and control the user interface 1003.
In the mobile terminal shown in fig. 1, the mobile terminal calls a human-computer interaction program stored in a memory 1005 through a processor 1001 and performs operations in various embodiments of a human-computer interaction method described below.
Based on the hardware structure, the embodiment of the man-machine interaction method is provided.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of a human-computer interaction method of the present invention, where the method includes:
step S10, when a starting instruction is detected, starting a target application corresponding to the starting instruction, and placing the virtual image of the target application in the real space collected by the target application;
step S20, if voice information is detected, determining a voice instruction corresponding to the voice information;
and step S30, controlling the virtual image to make an interactive action corresponding to the voice instruction in the real space.
The human-computer interaction method is applied to a mobile terminal. The mobile terminal is an intelligent terminal supporting AR technology and has an AR application installed. When the user clicks the AR application icon on the display interface of the mobile terminal, or triggers an opening instruction in another way such as voice wake-up, the AR application of the mobile terminal is started. The display interface of the mobile terminal can then acquire a real space through the AR application and fuse the 3D avatar in the AR application with the real space, so that reality is augmented.
When the opening instruction is detected, the mobile terminal of this embodiment places the avatar in the real space to realize augmented reality; when a voice instruction is detected, it controls the avatar to make the corresponding interactive action in the real space, which improves the operability of the mobile terminal and realizes intelligent interaction.
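By way of illustration only, the overall flow of steps S10 to S30 could be sketched as follows in Kotlin; every type and function name here is an assumption of this sketch rather than part of the disclosed method, and real scene capture and speech recognition would be delegated to the AR and speech engines actually in use.

```kotlin
// Hypothetical types; the patent does not specify any concrete API.
data class VoiceInfo(val audio: ByteArray)
data class VoiceInstruction(val name: String)

interface ArApplication {
    fun captureRealSpace(): RealSpace          // acquire the real scene (S10)
    fun placeAvatar(space: RealSpace): Avatar  // fuse the avatar with the scene (S10)
}
class RealSpace
class Avatar {
    fun perform(action: String) { println("avatar performs: $action") }
}

class InteractionController(private val app: ArApplication) {
    private var avatar: Avatar? = null

    // S10: on a start instruction, launch the AR application and place the avatar.
    fun onStartInstruction() {
        val space = app.captureRealSpace()
        avatar = app.placeAvatar(space)
    }

    // S20 + S30: on voice information, determine the instruction and control the avatar.
    fun onVoiceInfo(voice: VoiceInfo, recognize: (VoiceInfo) -> VoiceInstruction?) {
        val instruction = recognize(voice) ?: return   // no instruction recognized
        avatar?.perform(instruction.name)              // interactive action in real space
    }
}
```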
The respective steps will be described in detail below:
step S10, when a start instruction is detected, starting a target application corresponding to the start instruction, and placing the virtual image of the target application in the real space collected by the target application.
In this embodiment, when the mobile terminal detects the start instruction, it starts the target application corresponding to the start instruction, where the target application is specifically an AR application supporting AR technology. The start instruction includes a clicking operation instruction, a voice instruction and the like: the user can trigger the start instruction by clicking the AR application icon on the display interface of the mobile terminal, or the AR application may be woken up by voice, for example when the user utters a voice command to start the AR application, thereby triggering the start instruction.
After the target application is started, the avatar of the target application is obtained, real space information is acquired, and the avatar is placed in the real space. The real space may be a preset picture or a preset video, or the real scene in front of the current lens acquired by a camera of the mobile terminal, and the avatar is a 3D avatar constructed in advance; that is, the avatar constructed by the mobile terminal and the real space are fused and displayed on the display interface of the mobile terminal. As shown in fig. 3, the acquired desktop (that is, the real space) is displayed on the display interface of the mobile terminal, and the 3D avatar is placed on the desktop.
Further, step S10 includes:
when a starting instruction is detected, starting a target application corresponding to the starting instruction, and acquiring real space information based on the target application;
in this step, when the mobile terminal detects the start instruction, the target application corresponding to the start instruction is started, and real space information is collected based on the target application, where the real space information refers to information included in a real space collected by the mobile terminal, and as shown in fig. 3, the real space information includes a human object appearing in the real space.
And acquiring the virtual image of the target application, identifying the real space information, determining the placement position of the virtual image in the real space corresponding to the real space information, and placing the virtual image in the placement position.
Then, the mobile terminal obtains the avatar of the target application, identifies the real space information, and determines the placement position of the avatar in the real space corresponding to the real space information. That is, the placement position of the avatar in the real space is not arbitrary, but is determined by identifying the real space information. Specifically, the mobile terminal identifies the physical objects contained in the real space information and determines the placement points corresponding to those objects, where a placement point can be represented by a horizontal plane in a specific embodiment, so as to determine the placement position of the avatar. As shown in fig. 3, if the desktop is a horizontal plane in the real space, the placement position of the avatar is determined to be the desktop.
In the case where there are a plurality of placement points in the real space, a placement point near the center point of the display interface of the mobile terminal is preferable as the placement position.
Finally, the avatar is placed at the placement location.
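The placement rule described above (identify horizontal planes and prefer the one nearest the center of the display interface) could be illustrated with the following Kotlin sketch; the plane data is hypothetical and would in practice come from the plane-detection component of the AR application.

```kotlin
// Hypothetical representation of a detected horizontal plane in screen coordinates.
data class PlacementPoint(val x: Float, val y: Float)   // center of a detected horizontal plane
data class ScreenCenter(val x: Float, val y: Float)

// Choose the placement point closest to the center of the display interface.
fun choosePlacement(points: List<PlacementPoint>, center: ScreenCenter): PlacementPoint? =
    points.minByOrNull { p ->
        val dx = p.x - center.x
        val dy = p.y - center.y
        dx * dx + dy * dy   // squared distance is enough for comparison
    }

fun main() {
    val planes = listOf(PlacementPoint(100f, 800f), PlacementPoint(540f, 960f))
    println(choosePlacement(planes, ScreenCenter(540f, 960f)))  // picks the plane nearest the center
}
```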
Furthermore, in the process of obtaining the avatar of the target application, the corresponding avatar can be obtained according to the current real space information, that is, the avatar can be constructed based on the current real space information: if the real space is indoors, an avatar in the shape of a doll is obtained; if the real space is outdoors, an avatar in the shape of an animal or a plant is obtained. Obtaining different avatars for different real space information improves the operability of the mobile terminal, makes the interaction more interesting and realizes intelligent interaction.
Step S20, if the voice information is detected, determining a voice command corresponding to the voice information.
In this embodiment, after the mobile terminal displays the avatar and the real space in a fused manner on its display interface, if voice information is detected, the voice instruction corresponding to the voice information is determined. That is, besides interacting with the avatar through traditional actions such as clicking, dragging and flipping, the user can also interact by voice. In a specific implementation, the mobile terminal therefore needs a certain AI (Artificial Intelligence) capability and must be able to recognize the voice information sent by the user or by other devices. Since speech recognition technology is prior art, it is not described in detail here. Specifically, the mobile terminal recognizes the semantics contained in the voice information and determines the voice instruction from the semantics, for example by recognizing command keywords contained in the voice information, such as "hug", "kiss", "dancing" and "flying", which indicate that the avatar is to make the corresponding interactive action of hugging, kissing, dancing, flying, and so on.
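A minimal keyword-matching sketch of this determination step is shown below; the recognized text is assumed to come from an external speech-recognition engine, and the keyword set is only the illustrative one mentioned above.

```kotlin
// Map command keywords found in the recognized text to voice instructions.
// The recognized text itself would come from an external speech-recognition engine.
private val keywordToInstruction = mapOf(
    "hug" to "HUG",
    "kiss" to "KISS",
    "dancing" to "DANCE",
    "flying" to "FLY"
)

fun determineInstruction(recognizedText: String): String? =
    keywordToInstruction.entries
        .firstOrNull { (keyword, _) -> recognizedText.contains(keyword, ignoreCase = true) }
        ?.value

fun main() {
    println(determineInstruction("please start dancing for me"))  // DANCE
    println(determineInstruction("good morning"))                 // null - no command keyword
}
```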
And step S30, controlling the virtual image to make an interactive action corresponding to the voice instruction in the real space.
In this embodiment, after determining the current voice instruction, the mobile terminal controls the avatar to perform the interactive action corresponding to the voice instruction in the real space. For example, if the recognized voice instruction is "dancing", the mobile terminal controls the avatar to perform a dancing action, where the dancing action can be determined according to the dance type contained in the voice instruction: if the voice instruction specifies a folk dance, the mobile terminal controls the avatar to perform a folk dance, and so on.
Further, step S30 includes:
and determining a target part of the virtual image based on the voice instruction, and controlling the target part to perform an interactive action corresponding to the voice instruction.
That is, after the voice instruction is determined, it is further determined which target part of the avatar the voice instruction is addressed to; then only the target part needs to be controlled to perform the corresponding interaction while the other parts remain unchanged, which reduces the processing load of the mobile terminal.
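The idea of animating only the part addressed by the instruction could be sketched as follows; the part names and the mapping are illustrative assumptions.

```kotlin
// Only the target part of the avatar is animated; other parts remain unchanged,
// which is what reduces the processing load mentioned above.
enum class AvatarPart { ARMS, LEGS, HEAD, WHOLE_BODY }

fun targetPartFor(instruction: String): AvatarPart = when (instruction) {
    "HUG" -> AvatarPart.ARMS
    "DANCE" -> AvatarPart.WHOLE_BODY
    "NOD" -> AvatarPart.HEAD
    else -> AvatarPart.WHOLE_BODY
}

fun animate(part: AvatarPart, instruction: String) {
    println("animating $part for instruction $instruction")  // placeholder for real animation
}
```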
In this embodiment, when a start instruction is detected, a target application corresponding to the start instruction is started, and an avatar of the target application is placed in a real space acquired by the target application; if the voice information is detected, determining a voice instruction corresponding to the voice information; and controlling the virtual image to make an interactive action corresponding to the voice instruction in the real space. The invention places the virtual image of the target application in the collected real space, controls the virtual image to make corresponding action by recognizing the voice command of the voice information, realizes man-machine interaction and improves the intelligence of the interaction.
Further, based on the first embodiment of the human-computer interaction method, the second embodiment of the human-computer interaction method is provided.
The second embodiment of the human-computer interaction method is different from the first embodiment of the human-computer interaction method in that, referring to fig. 4, step S20 includes:
step S21, if voice information is detected, recognizing the voice type of the voice information;
step S22, determining a voice instruction corresponding to the voice information based on the voice type.
When the voice information is detected, the corresponding voice instruction is determined by recognizing the voice type of the voice information, so that the mobile terminal can recognize the voice instruction of the voice information more accurately, and intelligent recognition is realized.
The respective steps will be described in detail below:
step S21, if the voice information is detected, recognizing the voice type of the voice information.
In this embodiment, if the mobile terminal detects voice information, it further recognizes the voice type of the voice information, where the voice type includes language, music, and the like. A specific recognition method is to check whether background music is present: if the current voice information does not contain background music, the corresponding voice type is language; if the current voice information contains background music, the corresponding voice type is music, and so on.
It can be understood that the voice information can also be identified through voiceprints; for the details, reference can be made to existing song-recognition (shake-to-search) technology and the like to identify the voice type of the current voice information.
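A classification stub consistent with this description is sketched below; the actual detection of background music or a voiceprint match is assumed to be provided by an external audio-analysis component and is therefore only modeled as an interface.

```kotlin
enum class VoiceType { LANGUAGE, MUSIC }

// The real analysis (background-music detection, voiceprint / song lookup) is assumed
// to be provided elsewhere; this interface only models the decision described in the text.
interface AudioAnalyzer {
    fun hasBackgroundMusic(audio: ByteArray): Boolean
}

fun classifyVoice(audio: ByteArray, analyzer: AudioAnalyzer): VoiceType =
    if (analyzer.hasBackgroundMusic(audio)) VoiceType.MUSIC else VoiceType.LANGUAGE
```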
Step S22, determining a voice instruction corresponding to the voice information based on the voice type.
In this embodiment, the mobile terminal determines the voice command corresponding to the voice message according to the voice type, that is, different voice types correspond to different voice commands, and the corresponding voice command can be further determined only by determining the voice type of the current voice message first.
Specifically, step S22 includes:
if the voice type is music, determining the music type of the music;
in this step, if the mobile terminal determines that the current voice information is music, the music type of the music is further determined, wherein the music type includes a slow type music, a rap music, a rock music, and the like. It should be noted that the voice commands triggered by different music types are different.
And determining a voice instruction corresponding to the voice information based on the music category.
In this step, the mobile terminal determines the voice instruction corresponding to the voice information according to the determined music category: if the music category of the current voice information is soothing music, the corresponding voice instruction is to dance a gentle dance, such as a folk dance; if the music category of the current voice information is rock music, the corresponding voice instruction is to dance an energetic dance, such as a street dance.
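Expressed as code, the category-to-dance rule described here could look like the following sketch, using the soothing-music and rock-music examples given above (the rap mapping is an added assumption).

```kotlin
enum class MusicCategory { SOOTHING, RAP, ROCK }

// Different music categories trigger different dance instructions,
// e.g. soothing music -> folk dance, rock music -> street dance.
fun danceInstructionFor(category: MusicCategory): String = when (category) {
    MusicCategory.SOOTHING -> "DANCE_FOLK"
    MusicCategory.ROCK -> "DANCE_STREET"
    MusicCategory.RAP -> "DANCE_HIP_HOP"   // assumed; the patent gives no rap example
}
```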
Further, step S22 includes:
if the voice type is a language, detecting whether a wake-up instruction exists in the voice information;
in this step, if the mobile terminal determines that the current voice message is a language, it further determines whether a wake-up command exists in the voice message, where the wake-up command refers to a specific command, and the mobile terminal recognizes the current voice message and performs a corresponding action after receiving the wake-up command.
If yes, identifying semantics in the voice information, and determining a voice instruction corresponding to the voice information based on the semantics.
In this step, if the mobile terminal determines that the current voice information contains the wake-up instruction, it continues to recognize the semantics of the voice and determines the voice instruction corresponding to the current voice information from the semantics. For example, if the user wants to interact with the avatar and utters voice information containing "X-boy, hug", the mobile terminal first determines that the voice information contains the wake-up instruction "X-boy" and that the semantics is "hug"; it therefore determines that the current voice instruction is a hug action and controls the avatar to make a hugging action.
It can be understood that if the voice message does not have the wake-up instruction, the semantics of the current voice message are not recognized, that is, the current voice message is invalid, and the mobile terminal does not need to control the virtual image to make an action.
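The wake-word gate described in this step could be sketched as follows, using the illustrative wake word "X-boy" and the semantic keywords mentioned above.

```kotlin
const val WAKE_WORD = "X-boy"   // illustrative wake-up instruction

// Returns the voice instruction only when the wake word is present;
// otherwise the voice information is treated as invalid and ignored.
fun instructionFromLanguage(recognizedText: String): String? {
    if (!recognizedText.contains(WAKE_WORD, ignoreCase = true)) return null
    return when {
        recognizedText.contains("hug", ignoreCase = true) -> "HUG"
        recognizedText.contains("dance", ignoreCase = true) -> "DANCE"
        else -> null
    }
}

fun main() {
    println(instructionFromLanguage("X-boy, give me a hug"))  // HUG
    println(instructionFromLanguage("give me a hug"))         // null - no wake word
}
```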
During recognition, if the voice information contains both language and music, the corresponding voice instructions are determined in the order in which they are detected, and the corresponding interactive actions are executed in turn.
The embodiment identifies the voice information, and determines the corresponding voice command by identifying the voice type of the voice information, so that the voice identification function of the mobile terminal adapts to various application scenes, and the voice command is more accurately identified in the identification process, so that correct interaction is performed, and intelligent interaction is realized.
Further, a third embodiment of the human-computer interaction method is provided based on the first and second embodiments of the human-computer interaction method.
The third embodiment of the human-computer interaction method differs from the first and second embodiments of the human-computer interaction method in that step S30 includes:
step a, determining whether an interactive action corresponding to the voice instruction exists in an action library;
and b, if the virtual image exists, controlling the virtual image to perform the interactive action in the real space.
When the mobile terminal controls the virtual image to perform the interaction action, whether the interaction action corresponding to the current voice instruction exists in the action library needs to be determined, that is, the mobile terminal only controls the virtual image to perform the interaction action in the action library, so that the interaction is ensured to be correct, and the intelligent interaction is realized.
The respective steps will be described in detail below:
step a, determining whether an interactive action corresponding to the voice instruction exists in an action library.
In this embodiment, the mobile terminal has one or more action libraries, and various types of interactive actions are stored in the action libraries, and the interactive actions correspond to the voice commands one by one.
After the voice instruction is determined, the mobile terminal further determines whether an interactive action corresponding to the determined voice instruction exists in the action library. Specifically, the mobile terminal compares the current voice instruction with the pre-stored voice instructions one by one and judges whether any pre-stored voice instruction matches the current voice instruction; if so, it obtains the interactive action corresponding to the current voice instruction from a correspondence table of voice instructions and interactive actions. It can be understood that an interactive action and its corresponding voice instruction may be stored in association in the action library, or stored separately.
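A minimal action-library lookup could be sketched as follows; the storage layout is an assumption, since the method only requires a correspondence between voice instructions and interactive actions. The store method anticipates the update flow of the third embodiment.

```kotlin
// Correspondence table between voice instructions and interactive actions.
class ActionLibrary(private val table: MutableMap<String, String> = mutableMapOf(
    "HUG" to "raise both arms and hug",
    "DANCE_FOLK" to "perform folk dance animation"
)) {
    // Compare the current instruction with the stored ones and return the action if matched.
    fun lookup(instruction: String): String? = table[instruction]

    // Used later when an update package supplies a new action (third embodiment).
    fun store(instruction: String, action: String) { table[instruction] = action }
}
```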
Further, step a includes:
and determining an action library corresponding to the voice type, and determining whether the interaction action corresponding to the voice instruction exists in the action library.
In another embodiment, the mobile terminal has a plurality of action libraries, which are divided according to the voice type of the voice information: if there are two voice types, namely language and music, the mobile terminal has two action libraries, and each action library stores only the interactive actions for its voice type. Therefore, after the current voice instruction and voice type are determined, only the corresponding action library needs to be checked for an interactive action corresponding to the voice instruction, instead of checking all action libraries, which speeds up determining whether the interactive action corresponding to the voice instruction exists.
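Splitting the library by voice type, as described above, narrows the search to a single library; below is a sketch reusing the hypothetical ActionLibrary and VoiceType types from the earlier sketches.

```kotlin
// One action library per voice type; lookup only consults the matching library.
class TypedActionLibraries(private val libraries: Map<VoiceType, ActionLibrary>) {
    fun lookup(type: VoiceType, instruction: String): String? =
        libraries[type]?.lookup(instruction)
}
```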
And b, if the virtual image exists, controlling the virtual image to perform the interactive action in the real space.
The mobile terminal controls the avatar to perform the corresponding interactive action only if the interactive action corresponding to the current voice instruction exists in its action library.
Further, after the step a, the man-machine interaction method further includes:
If no corresponding interactive action exists, recording the voice instruction, and controlling the avatar to output preset prompt information;
In this step, if the mobile terminal determines that no interactive action corresponding to the current voice instruction exists in the action library, it records the current voice instruction and controls the avatar to output preset prompt information, where the preset prompt information may be action information and/or voice information; for example, the mobile terminal controls the avatar to spread its hands, or shake its head, and/or say "I cannot understand", so as to prompt the user that the current voice instruction is invalid.
When an update instruction is detected, determining whether an update action corresponding to the voice instruction exists in an update package corresponding to the update instruction;
in this step, when the mobile terminal detects an update instruction, it is determined whether an update package corresponding to the update instruction has an update action corresponding to a previously recorded voice instruction, that is, a voice instruction recorded by the mobile terminal and corresponding to no interactive action is fed back to the background upgrade staff, the background upgrade staff updates the action library according to the fed-back voice instruction, and specifically adds an update action corresponding to a recorded voice instruction, so that the interactive actions in the action library are more and more abundant, and each interactive action is made according to a voice instruction corresponding to voice information sent by a user.
And if so, storing the voice instruction and the update action in association in the action library.
In this step, if it is confirmed that the update package contains the update action, the update action and the recorded voice instruction are stored in association in the action library, so as to add interactive actions to the action library and realize upgrading and updating of the action library.
It can be understood that if it is determined that there is no update action in the update package, the recorded voice instruction is continuously fed back to the background upgrade personnel to wait for the next update.
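The handling of unmatched instructions and of a later update package could be sketched as follows; the update-package format and the ActionUpdater name are assumptions, since the method only states that the package may contain an update action for a recorded instruction.

```kotlin
// Hypothetical update package: new actions keyed by previously recorded instructions.
data class UpdatePackage(val updatedActions: Map<String, String>)

class ActionUpdater(private val library: ActionLibrary) {
    private val pendingInstructions = mutableListOf<String>()   // recorded, unmatched instructions

    // Called when no interactive action exists for the instruction.
    fun recordUnmatched(instruction: String) {
        pendingInstructions += instruction
        println("avatar: spreads hands / shakes head - \"I cannot understand\"")  // preset prompt
    }

    // Called when an update instruction is detected.
    fun applyUpdate(pkg: UpdatePackage) {
        val iterator = pendingInstructions.iterator()
        while (iterator.hasNext()) {
            val instruction = iterator.next()
            val action = pkg.updatedActions[instruction]
            if (action != null) {
                library.store(instruction, action)  // store instruction and action in association
                iterator.remove()
            }
            // otherwise the instruction stays recorded and waits for the next update
        }
    }
}
```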
In this embodiment, when the avatar is controlled to perform an interactive action, it is first determined whether an interactive action corresponding to the current voice instruction exists in the action library, that is, the avatar is only controlled to perform interactive actions that are in the action library; if no interactive action corresponding to the voice instruction exists in the action library, the instruction is fed back to the background for upgrading. In this way the interaction is ensured to be correct and intelligent interaction is realized.
The invention also provides a man-machine interaction device. The man-machine interaction device of the invention comprises:
the detection module is used for starting a target application corresponding to a starting instruction when the starting instruction is detected, and placing a virtual image of the target application in a real space acquired by the target application;
the determining module is used for determining a voice instruction corresponding to the voice information if the voice information is detected;
and the control module is used for controlling the virtual image to make the interactive action corresponding to the voice instruction in the real space.
Further, the detection module is further configured to:
when a starting instruction is detected, starting a target application corresponding to the starting instruction, and acquiring real space information based on the target application;
and acquiring the virtual image of the target application, identifying the real space information, determining the placement position of the virtual image in the real space corresponding to the real space information, and placing the virtual image in the placement position.
Further, the determining module is further configured to:
if the voice information is detected, recognizing the voice type of the voice information;
and determining a voice instruction corresponding to the voice information based on the voice type.
Further, the determining module is further configured to:
if the voice type is music, determining the music type of the music;
and determining a voice instruction corresponding to the voice information based on the music category.
Further, the determining module is further configured to:
if the voice type is a language, detecting whether a wake-up instruction exists in the voice information;
if yes, identifying semantics in the voice information, and determining a voice instruction corresponding to the voice information based on the semantics.
Further, the control module is further configured to:
determining whether an interactive action corresponding to the voice instruction exists in an action library;
and if so, controlling the virtual image to make the interactive action in the real space.
Further, the control module is further configured to:
if no corresponding interactive action exists, recording the voice instruction, and controlling the avatar to output preset prompt information;
when an updating instruction is detected, determining whether an update action corresponding to the voice instruction exists in an update package corresponding to the updating instruction;
and if so, storing the voice instruction and the update action in association in the action library.
The invention also provides a computer readable storage medium.
The computer readable storage medium of the present invention stores therein a human-computer interaction program, which when executed by a processor implements the steps of the human-computer interaction method as described above.
For the method implemented when the human-computer interaction program running on the processor is executed, reference may be made to the embodiments of the human-computer interaction method of the present invention, and details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A human-computer interaction method is characterized by comprising the following steps:
when a starting instruction is detected, starting a target application corresponding to the starting instruction, and placing an avatar of the target application in a real space acquired by the target application;
if the voice information is detected, determining a voice instruction corresponding to the voice information;
and controlling the virtual image to make an interactive action corresponding to the voice instruction in the real space.
2. The human-computer interaction method according to claim 1, wherein the step of starting a target application corresponding to the opening instruction and placing an avatar of the target application in the real space acquired by the target application when the opening instruction is detected comprises:
when a starting instruction is detected, starting a target application corresponding to the starting instruction, and acquiring real space information based on the target application;
and acquiring the virtual image of the target application, identifying the real space information, determining the placement position of the virtual image in the real space corresponding to the real space information, and placing the virtual image in the placement position.
3. The human-computer interaction method of claim 1, wherein the step of determining the voice command corresponding to the voice message if the voice message is detected comprises:
if the voice information is detected, recognizing the voice type of the voice information;
and determining a voice instruction corresponding to the voice information based on the voice type.
4. The human-computer interaction method of claim 3, wherein the step of determining the voice instruction corresponding to the voice message based on the voice type comprises:
if the voice type is music, determining the music type of the music;
and determining a voice instruction corresponding to the voice information based on the music category.
5. The human-computer interaction method of claim 3, wherein the step of determining the voice instruction corresponding to the voice message based on the voice type comprises:
if the voice type is a language, detecting whether a wake-up instruction exists in the voice information;
if yes, identifying semantics in the voice information, and determining a voice instruction corresponding to the voice information based on the semantics.
6. The human-computer interaction method according to any one of claims 1 to 5, wherein the step of controlling the avatar to perform the interaction corresponding to the voice command in the real space comprises:
determining whether an interactive action corresponding to the voice instruction exists in an action library;
and if so, controlling the virtual image to make the interactive action in the real space.
7. The human-computer interaction method of claim 6, wherein after the step of determining whether the interaction action corresponding to the voice command exists in a preset action library, the method further comprises:
if no corresponding interaction action exists, recording the voice instruction, and controlling the avatar to output preset prompt information;
when an updating instruction is detected, determining whether an update action corresponding to the voice instruction exists in an update package corresponding to the updating instruction;
and if so, storing the voice instruction and the update action in association in the action library.
8. A human-computer interaction device, characterized in that the human-computer interaction device comprises:
the detection module is used for starting a target application corresponding to a starting instruction when the starting instruction is detected, and placing a virtual image of the target application in a real space acquired by the target application;
the determining module is used for determining a voice instruction corresponding to the voice information if the voice information is detected;
and the control module is used for controlling the virtual image to make the interactive action corresponding to the voice instruction in the real space.
9. A mobile terminal, characterized in that the mobile terminal comprises: memory, a processor and a human-computer interaction program stored on the memory and executable on the processor, the human-computer interaction program when executed by the processor implementing the steps of the human-computer interaction method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, having a human-computer interaction program stored thereon, which, when executed by a processor, implements the steps of the human-computer interaction method of any one of claims 1 to 7.
CN201911179806.0A 2019-11-22 2019-11-22 Man-machine interaction method and device, mobile terminal and computer readable storage medium Pending CN110955332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911179806.0A CN110955332A (en) 2019-11-22 2019-11-22 Man-machine interaction method and device, mobile terminal and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911179806.0A CN110955332A (en) 2019-11-22 2019-11-22 Man-machine interaction method and device, mobile terminal and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN110955332A true CN110955332A (en) 2020-04-03

Family

ID=69977039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911179806.0A Pending CN110955332A (en) 2019-11-22 2019-11-22 Man-machine interaction method and device, mobile terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110955332A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113050795A (en) * 2021-03-24 2021-06-29 北京百度网讯科技有限公司 Virtual image generation method and device
WO2023138548A1 (en) * 2022-01-24 2023-07-27 北京字跳网络技术有限公司 Image processing method and apparatus, and device and storage medium


Similar Documents

Publication Publication Date Title
US11715497B2 (en) Video editing method, apparatus, and device, and storage medium
WO2021164631A1 (en) Screencasting method, and terminal apparatus
CN110888532A (en) Man-machine interaction method and device, mobile terminal and computer readable storage medium
CN108287919B (en) Webpage application access method and device, storage medium and electronic equipment
KR20140019630A (en) Method and system for tagging and searching additional information about image, apparatus and computer readable recording medium thereof
CN109725724B (en) Gesture control method and device for screen equipment
CN111193960B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN112068752B (en) Space display method and device, electronic equipment and storage medium
EP3419020A1 (en) Information processing device, information processing method and program
CN112511882A (en) Display device and voice call-up method
CN110992937B (en) Language off-line identification method, terminal and readable storage medium
CN111966275A (en) Program trial method, system, device, equipment and medium
CN110196646A (en) A kind of data inputting method and mobile terminal
CN111866568B (en) Display device, server and video collection acquisition method based on voice
CN110822641A (en) Air conditioner, control method and device thereof and readable storage medium
KR20200106703A (en) Apparatus and method for providing information based on user selection
CN110955332A (en) Man-machine interaction method and device, mobile terminal and computer readable storage medium
CN111752669A (en) Interface generation method and device, electronic equipment and storage medium
CN112905074A (en) Interactive interface display method, interactive interface generation method and device and electronic equipment
CN110851108A (en) Electronic equipment operation method and device, electronic equipment and storage medium
WO2023083089A1 (en) Photographing control display method and apparatus, and electronic device and medium
CN111343509A (en) Action control method of virtual image and display equipment
CN110968362B (en) Application running method, device and storage medium
KR20190134975A (en) Augmented realtity device for rendering a list of apps or skills of artificial intelligence system and method of operating the same
CN113613028A (en) Live broadcast data processing method, device, terminal, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination