CN113257251A - Robot user identification method, apparatus and storage medium - Google Patents

Robot user identification method, apparatus and storage medium

Info

Publication number: CN113257251A
Application number: CN202110514922.4A
Authority: CN (China)
Prior art keywords: voice signal, sound source, robot, target sound, target
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 罗沛, 梁朋
Current Assignee: Uditech Co Ltd
Original Assignee: Uditech Co Ltd
Application filed by Uditech Co Ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The invention discloses a method, a device, and a storage medium for identifying a robot user. The method comprises the following steps: acquiring an environment voice signal in an environment; determining, among the environment voice signals, an environment voice signal matched with preset semantic information, and taking that signal as a target voice signal; determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction; and determining, according to the image information, whether a robot user corresponding to the target voice signal exists in the target sound source direction. By combining voice-signal recognition with image recognition, the method improves the accuracy with which the robot identifies its user, so that the robot can better provide services for the robot user.

Description

Robot user identification method, apparatus and storage medium
Technical Field
The present invention relates to the field of robotics, and in particular, to a method and apparatus for identifying a robot user, and a storage medium.
Background
With the rapid development of computer technology, sensor technology, artificial intelligence, and related fields, robotics is maturing rapidly. Mobile robots are the most widely used type and play an increasingly important role in numerous industries; these various robots can complete their work well in specific environments.
However, existing robots still have many shortcomings. In most cases, the accuracy with which a robot identifies the user who is calling it for service is low. For example, one usage scenario is as follows: when a user wants to use the robot, the user calls it by voice, such as 'xx robot, please come over'. The robot then identifies the user according to the voice information the user uttered, but because there is much interfering sound in the environment (such as other people talking), the accuracy of identifying the robot user is low.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method, equipment and a storage medium for identifying a robot user, and aims to solve the technical problem of low accuracy of identifying the robot user.
In order to achieve the above object, the present invention provides a method for identifying a robot user, the method being applied to a robot, the method comprising the steps of:
acquiring an environment voice signal in an environment;
determining an environment voice signal matched with preset semantic information in the environment voice signals, and taking the environment voice signal matched with the preset semantic information as a target voice signal;
determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction;
and determining whether a robot user corresponding to the target voice signal exists in the direction of the target sound source or not according to the image information.
Optionally, the determining whether a robot user corresponding to the target voice signal exists in the target sound source direction according to the image information includes:
acquiring a voice signal in the direction of the target sound source;
if the voice signal in the direction of the target sound source contains a call instruction, detecting whether the image information contains human face features;
and if the image information contains human face features, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Optionally, the determining whether a robot user corresponding to the target voice signal exists in the target sound source direction according to the image information includes:
acquiring the face features in the direction of the target sound source, and extracting mouth shape actions contained in the face features;
and if the mouth shape action contained in the face features is matched with a preset mouth shape action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Optionally, the determining whether a robot user corresponding to the target voice signal exists in the target sound source direction according to the image information includes:
extracting gesture actions in the image information;
and if the gesture action is matched with a preset gesture action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Optionally, the preset gesture motion comprises a first gesture motion of moving toward the target sound source direction, or a second gesture motion of reaching out and grabbing toward the target sound source direction.
Optionally, the step of determining, in the ambient voice signals, an ambient voice signal matching preset semantic information, and using the ambient voice signal matching the preset semantic information as a target voice signal includes:
extracting semantic information corresponding to the environment voice signal;
and determining semantic information matched with preset semantic information in the semantic information, and taking an environment voice signal corresponding to the semantic information matched with the preset semantic information as a target voice signal.
Optionally, the step of acquiring the image information in the direction of the target sound source includes:
and controlling a camera of the robot to turn to the direction of the target sound source so as to acquire image information in the direction of the target sound source.
Optionally, the step of acquiring the image information in the direction of the target sound source includes:
controlling the robot to adjust an orientation of the robot to face the target sound source direction;
and acquiring image information in the direction of the target sound source.
In order to achieve the above object, the present invention also provides an identification apparatus for a robot user, comprising: a memory, a processor, and an identification program for a robot user stored in the memory and executable on the processor, wherein the identification program, when executed by the processor, implements the steps of the above method for identifying a robot user.
In order to achieve the above object, the present invention further provides a storage medium having stored thereon an identification program for a robot user, the identification program for a robot user realizing the above steps of the method for identifying a robot user when executed by a processor.
The method comprises the steps of acquiring an environment voice signal in an environment; determining an environment voice signal matched with preset semantic information in the environment voice signals, and taking the environment voice signal matched with the preset semantic information as a target voice signal; determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction; and determining whether a robot user corresponding to the target voice signal exists in the direction of the target sound source or not according to the image information. According to the method, the target voice signal containing the preset semantic information in the environment voice signal is determined by recognizing the environment voice signal in the environment, the image information in the target voice source direction where the target voice signal is located is recognized, whether the robot user exists in the target voice source direction is judged, the robot user is recognized by combining the means of recognizing the voice signal and the image, the accuracy of the robot for recognizing the robot user is improved, and therefore the robot can better provide services for the robot user.
Drawings
FIG. 1 is a schematic diagram of an identification device for a robot user in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a method for identifying a user of a robot according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of a method for identifying a user of a robot according to the present invention;
fig. 4 is a schematic system structure diagram of an embodiment of an identification device for a robot user according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of an identification device of a robot user in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the identification apparatus of a robot user according to an embodiment of the present invention may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the identification device of the robot user may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. Sensors may include light sensors, motion sensors, and others. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust display brightness according to the ambient light level. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes) and, when the robot is stationary, the magnitude and direction of gravity; it can be used for applications that recognize the device's attitude and for vibration-related functions (such as a pedometer or tap detection). Of course, the identification device may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.
It will be appreciated by those skilled in the art that the configuration of the identification device of the robot user shown in fig. 1 does not constitute a limitation of the identification device of the robot user, and may comprise more or less components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an identification program of a robot user.
In the identification device of the robot user shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to call up the robot user's identification program stored in the memory 1005.
In this embodiment, the robot user's recognition apparatus includes: a memory 1005, a processor 1001, and an identification program of a robot user stored in the memory 1005 and capable of running on the processor 1001, wherein when the processor 1001 calls the identification program of the robot user stored in the memory 1005, the following operations are performed:
acquiring an environment voice signal in an environment;
determining an environment voice signal matched with preset semantic information in the environment voice signals, and taking the environment voice signal matched with the preset semantic information as a target voice signal;
determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction;
and determining whether a robot user corresponding to the target voice signal exists in the direction of the target sound source or not according to the image information.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
acquiring a voice signal in the direction of the target sound source;
if the voice signal in the direction of the target sound source contains a call instruction, detecting whether the image information contains human face features;
and if the image information contains human face features, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
acquiring the face features in the direction of the target sound source, and extracting mouth shape actions contained in the face features;
and if the mouth shape action contained in the face features is matched with a preset mouth shape action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
extracting gesture actions in the image information;
and if the gesture action is matched with a preset gesture action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
extracting semantic information corresponding to the environment voice signal;
and determining semantic information matched with preset semantic information in the semantic information, and taking an environment voice signal corresponding to the semantic information matched with the preset semantic information as a target voice signal.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
and controlling a camera of the robot to turn to the direction of the target sound source so as to acquire image information in the direction of the target sound source.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
controlling the robot to adjust an orientation of the robot to face the target sound source direction;
and acquiring image information in the direction of the target sound source.
The present invention further provides a method for identifying a robot user, referring to fig. 2, and fig. 2 is a flowchart illustrating a first embodiment of the method for identifying a robot user according to the present invention.
In this embodiment, the method for identifying a robot user according to the present invention is applied to a robot, which is an intelligent machine capable of semi-autonomous or fully-autonomous operation, and includes the following steps:
step S10, acquiring an environment voice signal in the environment;
in this embodiment, when the robot enters a standby state or an idle state at a parking position of a hotel lobby, an environmental voice signal in an environment is acquired in real time through a sound sensor to perform real-time monitoring on the environmental voice signal, wherein the sound sensor may be a microphone. Further, the robot is provided with a plurality of sound sensors corresponding to different directions.
Step S20, determining an environment voice signal matched with preset semantic information in the environment voice signals, and using the environment voice signal matched with the preset semantic information as a target voice signal;
in this embodiment, after the environmental voice signal is acquired, according to the environmental voice signal, the environmental voice signal matched with the preset semantic information is determined in the environmental voice signal, so as to identify the voice signal including the preset keyword in the environmental voice signal, it should be noted that the preset semantic information is a word or a sentence including the preset keyword, and since the voice signal includes the semantic information, by identifying the environmental voice signal, the environmental voice signal matched with the preset semantic information in the environmental voice signal can be determined, and the environmental voice signal matched with the preset semantic information is taken as the target voice signal.
Further, the means for determining the ambient speech signal matching the preset semantic information comprises: (1) comparing each collected environment voice signal with a preset voice signal containing preset semantic information, and taking the environment voice signal matched with the preset voice signal as a target voice signal; (2) analyzing each environment voice signal to extract semantic information of the environment voice signals, comparing the semantic information corresponding to the environment voice signals with preset semantic information, and taking the environment voice signals corresponding to the semantic information matched with the preset semantic information as target voice signals.
It is understood that the voice signals formed by different keywords are different, and semantics can be extracted and recognized from a voice signal. If the acquired voice signal matches a preset keyword (such as Youimei, Youixiabroi and the like), it indicates that a person is currently calling the robot. At this time, the voice signal containing the preset keyword is locked as the target voice signal.
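Means (2) above, matching extracted semantics against preset keywords, can be sketched as follows. This is an illustrative Python sketch rather than the patent's implementation: `PRESET_KEYWORDS`, `find_target_signals`, and the `transcribe` callable (the assumed speech-to-text step) are hypothetical names.

```python
# Illustrative sketch of means (2): extract semantics from each ambient
# voice signal and keep the signals matching a preset keyword as targets.
# PRESET_KEYWORDS and `transcribe` are hypothetical stand-ins.
PRESET_KEYWORDS = ("robot", "please come over")

def find_target_signals(ambient_signals, transcribe):
    """Return the ambient voice signals whose extracted semantics
    contain any preset keyword."""
    targets = []
    for signal in ambient_signals:
        text = transcribe(signal)  # speech-to-text step, assumed available
        if any(keyword in text for keyword in PRESET_KEYWORDS):
            targets.append(signal)
    return targets
```

Passing a trivial `transcribe` that returns the text itself exercises the matching logic in isolation, independent of any ASR engine.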
Step S30, determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction;
in this embodiment, a sound source direction corresponding to a target speech signal is taken as a target sound source direction; after the target sound source direction corresponding to the target voice signal is determined, image information in the target sound source direction is obtained, and whether a machine user exists in the target sound source direction or not is identified according to the image information obtained in the target sound source direction subsequently. Further, after the target speech signal is specified, the sound sensor corresponding to the target speech signal is specified from among the sound sensors based on the target speech signal, and the sound sensor corresponding to the target speech signal is used as the target sound sensor, thereby specifying the target sound source direction corresponding to the target speech signal.
Further, the step of acquiring the image information in the direction of the target sound source includes:
step S301, controlling a camera of the robot to turn to the direction of the target sound source so as to acquire image information in the direction of the target sound source; alternatively, the first and second electrodes may be,
step S302, controlling the robot to adjust the orientation of the robot so that the robot faces the target sound source direction, and acquiring image information in the target sound source direction.
The orientation of the camera can be adjusted to be consistent with the direction of the target sound source by adjusting the camera of the robot, so that image information in the direction of the target sound source is collected through the camera. The robot can be adjusted to face the target sound source direction by adjusting the posture of the robot so as to acquire image information in the target sound source direction.
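Either adjustment, turning only the camera or turning the whole robot, reduces to computing the smallest rotation that brings the current heading onto the target sound source direction. A minimal sketch (the function name and angle convention are illustrative):

```python
def rotation_to_face(current_heading_deg, target_direction_deg):
    """Smallest signed rotation, in degrees, that turns the camera or the
    robot from its current heading toward the target sound source
    direction. Positive means counter-clockwise, negative clockwise."""
    return (target_direction_deg - current_heading_deg + 180.0) % 360.0 - 180.0
```

Wrapping into the (-180, 180] range avoids needless full turns, e.g. rotating +20 degrees from heading 350 to reach direction 10 instead of -340.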
Step S40, determining whether or not a robot user corresponding to the target voice signal is present in the target sound source direction based on the image information.
In this embodiment, after the image information in the target sound source direction is collected, the image information is identified to determine whether a robot user corresponding to the target voice signal exists in the target sound source direction. Specifically, if the image information is identified to contain the robot user, the robot user corresponding to the target voice signal exists in the target sound source direction; if the robot user is identified not to be included in the image information, the robot user corresponding to the target voice signal does not exist in the target voice source direction. Further, if there is a robot user in the direction of the target sound source, the robot travels in front of the robot user to provide services to the robot user, and the services that the robot can provide include navigation services, chat services, and transport services, etc.
In the method for identifying the robot user provided by the embodiment, the environment voice signal in the environment is acquired; determining an environment voice signal matched with preset semantic information in the environment voice signals, and taking the environment voice signal matched with the preset semantic information as a target voice signal; determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction; and determining whether a robot user corresponding to the target voice signal exists in the direction of the target sound source or not according to the image information. According to the method, the target voice signal containing the preset semantic information in the environment voice signal is determined by recognizing the environment voice signal in the environment, the image information in the target voice source direction where the target voice signal is located is recognized, whether the robot user exists in the target voice source direction is judged, the robot user is recognized by combining the means of recognizing the voice signal and the image, the accuracy of the robot for recognizing the robot user is improved, and therefore the robot can better provide services for the robot user.
Referring to fig. 3, in a second embodiment of the method for identifying a robot user according to the present invention, the method for determining whether a robot user corresponding to the target voice signal exists in the target voice source direction according to the image information in step S40 includes:
step S401, acquiring a voice signal in the direction of the target sound source;
step S402, if the voice signal in the direction of the target sound source contains a call instruction, detecting whether the image information contains human face features;
step S403, if the image information includes a human face feature, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
In this embodiment, it should be noted that the first method for determining whether a robot user exists in the target sound source direction is as follows: after identifying that a user is calling it, the robot turns toward the target sound source direction and detects whether a voice signal calling the robot again exists, while also judging whether the acquired image information contains human face features. Only if a voice signal calling the robot again is detected in the target sound source direction and the image information contains human face features can the robot determine that a robot user exists in the target sound source direction.
Specifically, the voice signal in the target sound source direction is acquired again, and it is detected whether that voice signal contains a call instruction and whether the image information contains human face features; if both conditions hold, it is determined that a robot user exists in the target sound source direction, and otherwise that no robot user exists. Alternatively, whether the voice signal in the target sound source direction contains a call instruction may be detected first, and only if it does is the image information checked for human face features, which improves recognition efficiency.
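This first method, with the cheap voice check ordered before the more expensive face detection, can be sketched as follows. All names are hypothetical: `detect_face` stands in for any face-feature detector, and `call_words` for the call-instruction vocabulary.

```python
def user_present_by_call(voice_text, image, detect_face,
                         call_words=("come over", "come here")):
    """Method 1: a robot user is present only if the re-acquired voice
    signal contains a call instruction AND the image contains human face
    features. The voice check runs first, so face detection is skipped
    entirely when no call instruction is heard."""
    if not any(word in voice_text for word in call_words):
        return False          # no call instruction: skip face detection
    return detect_face(image)
```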
Further, the determining whether a robot user corresponding to the target voice signal exists in the target sound source direction according to the image information includes:
step S411, acquiring the human face features in the direction of the target sound source, and extracting mouth shape actions contained in the human face features;
step S412, if the mouth shape action included in the face feature matches a preset mouth shape action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
In this embodiment, it should be noted that, the second method for determining whether the robot user exists in the target sound source direction is: after the robot turns to obtain image information in the direction of a target sound source, whether a robot user exists in the direction of the target sound source is judged by detecting whether the image information contains human face features and detecting whether mouth shape actions in the human face features meet certain conditions. Only if the detected image information contains the human face features and the mouth shape actions in the detected human face features meet certain conditions, the robot can determine that the robot user exists in the target sound source direction.
Specifically, after the robot turns to obtain image information in the direction of a target sound source, whether the image information has human face features or not is detected, if the image information has the human face features, mouth shape actions contained in the recognized human face features are extracted, then the mouth shape actions contained in the human face features are compared with preset mouth shape actions, if the mouth shape actions contained in the human face features are matched with the preset mouth shape actions, it is determined that a robot user exists in the direction of the target sound source, and if not, it is determined that the robot user does not exist.
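A minimal sketch of the mouth-shape comparison, assuming an upstream module has already reduced each video frame to a mouth-opening ratio. Both that representation and the tolerance value are illustrative assumptions, not taken from the patent.

```python
def mouth_matches(observed_openings, preset_openings, tolerance=0.15):
    """Method 2 sketch: compare a sequence of per-frame mouth-opening
    ratios against a preset talking pattern; the mouth-shape action
    matches iff every frame is within tolerance of the preset."""
    if len(observed_openings) != len(preset_openings):
        return False
    return all(abs(o - p) <= tolerance
               for o, p in zip(observed_openings, preset_openings))
```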
Further, the determining whether a robot user corresponding to the target voice signal exists in the target sound source direction according to the image information includes:
step S421, extracting gesture actions in the image information;
step S422, if the gesture action is matched with a preset gesture action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
In this embodiment, it should be noted that, the third method for determining whether the robot user exists in the target sound source direction is: after the robot turns to obtain the image information in the target sound source direction, whether a robot user exists in the target sound source direction is judged by detecting whether the image information contains a specific gesture action. Only when the detected image information contains a specific gesture motion, the robot can determine that the robot user exists in the target sound source direction. Specifically, after the robot turns to obtain the image information in the target sound source direction, the gesture action in the image information is extracted, the recognized gesture action is compared with the preset gesture action, if the gesture action is matched with the preset gesture action, it is determined that a robot user exists in the target sound source direction, and otherwise, the robot user is determined to not exist.
Further, the preset gesture motion comprises a first gesture motion of moving toward the target sound source direction, or a second gesture motion of reaching out and grabbing toward the target sound source direction.
Optionally, two or more of the above three methods for determining whether a robot user exists in the target sound source direction may be combined, thereby improving the accuracy with which the robot determines whether a robot user exists in the target sound source direction.
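One way to combine the three checks is sketched below. The fusion policy (any check suffices vs. all checks required) is an assumption, since the text only states that the methods may be combined.

```python
def user_in_direction(call_and_face, mouth_matches, gesture_matches,
                      require_all=False):
    """Fuse the three determination methods described above.
    call_and_face: first method (call instruction heard AND face detected);
    mouth_matches: second method; gesture_matches: third method."""
    checks = (call_and_face, mouth_matches, gesture_matches)
    # Strict fusion demands every check pass; lenient fusion accepts any one.
    return all(checks) if require_all else any(checks)
```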
Further, the step of determining an ambient voice signal matching preset semantic information in the ambient voice signals, and using the ambient voice signal matching the preset semantic information as a target voice signal includes:
step S21, extracting semantic information corresponding to the environment voice signal;
step S22, determining semantic information matched with preset semantic information from the semantic information, and using an environmental voice signal corresponding to the semantic information matched with the preset semantic information as a target voice signal.
In this embodiment, the environmental voice signal matching the preset semantic information is determined as follows: each environmental voice signal is analyzed to extract its corresponding semantic information; the extracted semantic information is then compared with the preset semantic information, and the environmental voice signal whose semantic information matches the preset semantic information is taken as the target voice signal. In this way, voice signals containing preset keywords can be identified among the environmental voice signals.
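A minimal sketch of this selection step follows. Keyword matching on a transcript stands in for real semantic analysis, and the keyword set is an assumption for illustration.

```python
# Hypothetical preset keywords standing in for "preset semantic information".
PRESET_KEYWORDS = {"robot", "come here"}

def pick_target_signals(env_signals):
    """env_signals: list of (signal_id, transcript) pairs. Signals whose
    extracted semantics contain a preset keyword become target signals."""
    return [sid for sid, text in env_signals
            if any(kw in text.lower() for kw in PRESET_KEYWORDS)]
```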
In the robot user identification method provided by this embodiment, a voice signal in the target sound source direction is acquired; if the voice signal in the target sound source direction contains a call instruction, the robot detects whether the image information contains human face features; and if it does, the robot determines that a robot user corresponding to the target voice signal exists in the target sound source direction. In other words, the robot checks both for a voice signal that addresses it again and for human face features in the acquired image information, and only when both are present does it determine that a robot user exists in the target sound source direction. This greatly improves the accuracy with which the robot identifies its user, so that the robot can better serve the robot user.
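The embodiment's overall flow could be sketched end to end as below. Every callable is an assumed stand-in for a real robot subsystem (sound source localization, image capture, call-instruction detection, face detection), not the patent's API.

```python
def identify_robot_user(target_signals, localize, capture,
                        hears_call_instruction, image_has_face):
    """For each target voice signal: localize its sound source direction,
    capture image information there, and confirm the user only when a
    further call instruction is heard AND a face feature is detected."""
    for signal in target_signals:
        direction = localize(signal)            # target sound source direction
        image = capture(direction)              # turn and acquire image info
        if hears_call_instruction(direction) and image_has_face(image):
            return direction                    # robot user found here
    return None                                 # no robot user identified
```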
Furthermore, an embodiment of the present invention further provides a storage medium, where an identification program of a robot user is stored, and the identification program of the robot user is executed by a processor to implement the steps of the method for identifying a robot user as described in any one of the above.
The specific embodiment of the storage medium of the present invention is substantially the same as the embodiments of the method for identifying a robot user, and will not be described in detail herein.
In addition, an embodiment of the present invention further provides an identification apparatus for a robot user, and referring to fig. 4, the identification apparatus for a robot user includes:
an obtaining module 100, configured to obtain an environmental voice signal in an environment;
a first determining module 200, configured to determine, from the environmental voice signals, an environmental voice signal matched with preset semantic information, and use the environmental voice signal matched with the preset semantic information as a target voice signal;
a second determining module 300, configured to determine a target sound source direction corresponding to the target speech signal, and acquire image information in the target sound source direction;
a third determining module 400, configured to determine whether a robot user corresponding to the target voice signal exists in the target voice source direction according to the image information.
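The four modules of the apparatus referring to fig. 4 could be wired together as sketched below. The module boundaries follow the description; the interfaces are invented for illustration.

```python
class RobotUserIdentifier:
    """Sketch of the identification apparatus: four modules as callables."""
    def __init__(self, acquire, match, localize_and_capture, decide):
        self.acquire = acquire                            # obtaining module 100
        self.match = match                                # first determining module 200
        self.localize_and_capture = localize_and_capture  # second determining module 300
        self.decide = decide                              # third determining module 400

    def run(self):
        env_signals = self.acquire()                      # environmental voice signals
        target = self.match(env_signals)                  # target voice signal or None
        if target is None:
            return False
        direction, image = self.localize_and_capture(target)
        return self.decide(direction, image)              # robot user present?
```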
Further, the third determining module is further configured to:
acquiring a voice signal in the direction of the target sound source;
if the voice signal in the direction of the target sound source contains a call instruction, detecting whether the image information contains human face features;
and if the image information contains human face features, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the third determining module is further configured to:
acquiring the face features in the direction of the target sound source, and extracting mouth shape actions contained in the face features;
and if the mouth shape action contained in the face features is matched with a preset mouth shape action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the third determining module is further configured to:
extracting gesture actions in the image information;
and if the gesture action is matched with a preset gesture action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the preset gesture motion comprises a first gesture motion of moving in the direction of the target sound source or a second gesture motion of telescopic grabbing in the direction of the target sound source.
Further, the first determining module is further configured to:
extracting semantic information corresponding to the environment voice signal;
and determining semantic information matched with preset semantic information in the semantic information, and taking an environment voice signal corresponding to the semantic information matched with the preset semantic information as a target voice signal.
Further, the obtaining module is further configured to:
and controlling a camera of the robot to turn to the direction of the target sound source so as to acquire image information in the direction of the target sound source.
Further, the obtaining module is further configured to:
controlling the robot to adjust an orientation of the robot to face the target sound source direction;
and acquiring image information in the direction of the target sound source.
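The two acquisition strategies above (turn only the camera, or reorient the whole robot) could be sketched as follows. The robot interface here, including the pan-range check, is invented for illustration.

```python
class DemoRobot:
    """Minimal stand-in robot with a pannable camera of limited range."""
    def __init__(self, pan_range_deg):
        self.pan_range_deg = pan_range_deg
        self.actions = []                      # log of motion commands

    def camera_can_pan_to(self, direction):
        return abs(direction) <= self.pan_range_deg

    def pan_camera(self, direction):
        self.actions.append(("pan_camera", direction))

    def turn_body(self, direction):
        self.actions.append(("turn_body", direction))

    def capture_image(self):
        return "image_info"

def acquire_image(direction, robot):
    """Prefer turning only the camera toward the target sound source
    direction; fall back to adjusting the robot's whole orientation."""
    if robot.camera_can_pan_to(direction):
        robot.pan_camera(direction)
    else:
        robot.turn_body(direction)
    return robot.capture_image()
```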
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above and includes several instructions for enabling a terminal device (which may be a mobile service robot) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for identifying a robot user, the method being applied to a robot, the method comprising:
acquiring an environment voice signal in an environment;
determining an environment voice signal matched with preset semantic information in the environment voice signals, and taking the environment voice signal matched with the preset semantic information as a target voice signal;
determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction;
and determining whether a robot user corresponding to the target voice signal exists in the direction of the target sound source or not according to the image information.
2. The method for identifying a robot user according to claim 1, wherein the determining whether the robot user corresponding to the target voice signal exists in the target sound source direction based on the image information comprises:
acquiring a voice signal in the direction of the target sound source;
if the voice signal in the direction of the target sound source contains a call instruction, detecting whether the image information contains human face features;
and if the image information contains human face features, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
3. The method for identifying a robot user according to claim 1, wherein the determining whether the robot user corresponding to the target voice signal exists in the target sound source direction based on the image information comprises:
acquiring the face features in the direction of the target sound source, and extracting mouth shape actions contained in the face features;
and if the mouth shape action contained in the face features is matched with a preset mouth shape action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
4. The method for identifying a robot user according to claim 1, wherein the determining whether the robot user corresponding to the target voice signal exists in the target sound source direction based on the image information comprises:
extracting gesture actions in the image information;
and if the gesture action is matched with a preset gesture action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
5. The method as claimed in claim 4, wherein the preset gesture action comprises a first gesture action of moving toward the target sound source direction or a second gesture action of reaching and grasping toward the target sound source direction.
6. The robot user recognition method of claim 1, wherein the step of determining an ambient voice signal matching preset semantic information among the ambient voice signals and using the ambient voice signal matching the preset semantic information as a target voice signal comprises:
extracting semantic information corresponding to the environment voice signal;
and determining semantic information matched with preset semantic information in the semantic information, and taking an environment voice signal corresponding to the semantic information matched with the preset semantic information as a target voice signal.
7. The robot user recognition method according to claim 1, wherein the step of acquiring image information in the direction of the target sound source comprises:
and controlling a camera of the robot to turn to the direction of the target sound source so as to acquire image information in the direction of the target sound source.
8. The robot user recognition method according to any one of claims 1 to 7, wherein the step of acquiring image information in the direction of the target sound source includes:
controlling the robot to adjust an orientation of the robot to face the target sound source direction;
and acquiring image information in the direction of the target sound source.
9. An identification apparatus of a robot user, characterized by comprising: memory, a processor and a robot user identification program stored on the memory and executable on the processor, the robot user identification program, when executed by the processor, implementing the steps of the method of identifying a robot user according to any one of claims 1 to 8.
10. A storage medium having stored thereon a robot user identification program, which when executed by a processor, implements the steps of the robot user identification method according to any one of claims 1 to 8.
CN202110514922.4A 2021-05-11 2021-05-11 Robot user identification method, apparatus and storage medium Pending CN113257251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110514922.4A CN113257251A (en) 2021-05-11 2021-05-11 Robot user identification method, apparatus and storage medium

Publications (1)

Publication Number Publication Date
CN113257251A true CN113257251A (en) 2021-08-13

Family

ID=77223058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110514922.4A Pending CN113257251A (en) 2021-05-11 2021-05-11 Robot user identification method, apparatus and storage medium

Country Status (1)

Country Link
CN (1) CN113257251A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881117A (en) * 2015-05-22 2015-09-02 广东好帮手电子科技股份有限公司 Device and method for activating voice control module through gesture recognition
CN106531179A (en) * 2015-09-10 2017-03-22 中国科学院声学研究所 Multi-channel speech enhancement method based on semantic prior selective attention
CN111222117A (en) * 2019-12-30 2020-06-02 云知声智能科技股份有限公司 Identification method and device of identity information
CN111599361A (en) * 2020-05-14 2020-08-28 宁波奥克斯电气股份有限公司 Awakening method and device, computer storage medium and air conditioner
CN111638783A (en) * 2020-05-18 2020-09-08 广东小天才科技有限公司 Man-machine interaction method and electronic equipment
CN112433770A (en) * 2020-11-19 2021-03-02 北京华捷艾米科技有限公司 Wake-up method and device for equipment, electronic equipment and computer storage medium
CN112597910A (en) * 2020-12-25 2021-04-02 北京小狗吸尘器集团股份有限公司 Method and device for monitoring human activities by using sweeping robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination