CN113257251A - Robot user identification method, apparatus and storage medium - Google Patents

Robot user identification method, apparatus and storage medium

Info

Publication number: CN113257251A
Application number: CN202110514922.4A
Authority: CN (China)
Prior art keywords: voice signal, sound source, robot, target sound, target
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 罗沛, 梁朋
Current Assignee: Uditech Co Ltd
Original Assignee: Uditech Co Ltd
Application filed by Uditech Co Ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The invention discloses a method, a device, and a storage medium for identifying a robot user. The method comprises the following steps: acquiring an environment voice signal in an environment; determining, among the environment voice signals, an environment voice signal matched with preset semantic information, and taking that signal as a target voice signal; determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction; and determining, according to the image information, whether a robot user corresponding to the target voice signal exists in the target sound source direction. By combining voice-signal recognition with image recognition, the method improves the accuracy with which the robot identifies its user, so that the robot can better provide services for the robot user.

Description

Robot user identification method, apparatus and storage medium
Technical Field
The present invention relates to the field of robotics, and in particular, to a method and apparatus for identifying a robot user, and a storage medium.
Background
With the rapid development of computer technology, sensor technology, artificial intelligence, and related fields, robotics is maturing rapidly. Mobile robots are the most widely used type and play an increasingly important role in numerous industries; these various robots can complete their work well in specific environments.
However, existing robots still have many shortcomings. In most cases, the accuracy with which a robot identifies the user who is calling it for service is low. For example, one usage scenario is as follows: when a user wants to use the robot, the user calls it by voice, such as 'xx robot, please come over'. The robot then identifies the user according to the voice information the user uttered, but because there is much interfering sound in the environment (such as other people talking), the accuracy of identifying the robot user is low.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method, equipment and a storage medium for identifying a robot user, and aims to solve the technical problem of low accuracy of identifying the robot user.
In order to achieve the above object, the present invention provides a method for identifying a robot user, the method being applied to a robot, the method comprising the steps of:
acquiring an environment voice signal in an environment;
determining an environment voice signal matched with preset semantic information in the environment voice signals, and taking the environment voice signal matched with the preset semantic information as a target voice signal;
determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction;
and determining whether a robot user corresponding to the target voice signal exists in the direction of the target sound source or not according to the image information.
Optionally, the determining whether a robot user corresponding to the target voice signal exists in the target sound source direction according to the image information includes:
acquiring a voice signal in the direction of the target sound source;
if the voice signal in the direction of the target sound source contains a call instruction, detecting whether the image information contains human face features;
and if the image information contains human face features, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Optionally, the determining whether a robot user corresponding to the target voice signal exists in the target sound source direction according to the image information includes:
acquiring the face features in the direction of the target sound source, and extracting mouth shape actions contained in the face features;
and if the mouth shape action contained in the face features is matched with a preset mouth shape action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Optionally, the determining whether a robot user corresponding to the target voice signal exists in the target sound source direction according to the image information includes:
extracting gesture actions in the image information;
and if the gesture action is matched with a preset gesture action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Optionally, the preset gesture motion comprises a first gesture motion of moving toward the target sound source direction, or a second gesture motion of reaching out and grabbing toward the target sound source direction.
Optionally, the step of determining, in the ambient voice signals, an ambient voice signal matching preset semantic information, and using the ambient voice signal matching the preset semantic information as a target voice signal includes:
extracting semantic information corresponding to the environment voice signal;
and determining semantic information matched with preset semantic information in the semantic information, and taking an environment voice signal corresponding to the semantic information matched with the preset semantic information as a target voice signal.
Optionally, the step of acquiring the image information in the direction of the target sound source includes:
and controlling a camera of the robot to turn to the direction of the target sound source so as to acquire image information in the direction of the target sound source.
Optionally, the step of acquiring the image information in the direction of the target sound source includes:
controlling the robot to adjust an orientation of the robot to face the target sound source direction;
and acquiring image information in the direction of the target sound source.
In order to achieve the above object, the present invention also provides an identification apparatus for a robot user, comprising: a memory, a processor, and an identification program for a robot user stored in the memory and executable on the processor, wherein the identification program, when executed by the processor, implements the steps of the above method for identifying a robot user.
In order to achieve the above object, the present invention further provides a storage medium having stored thereon an identification program for a robot user, the identification program for a robot user realizing the above steps of the method for identifying a robot user when executed by a processor.
The method comprises the steps of acquiring an environment voice signal in an environment; determining an environment voice signal matched with preset semantic information in the environment voice signals, and taking the environment voice signal matched with the preset semantic information as a target voice signal; determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction; and determining whether a robot user corresponding to the target voice signal exists in the direction of the target sound source or not according to the image information. According to the method, the target voice signal containing the preset semantic information in the environment voice signal is determined by recognizing the environment voice signal in the environment, the image information in the target voice source direction where the target voice signal is located is recognized, whether the robot user exists in the target voice source direction is judged, the robot user is recognized by combining the means of recognizing the voice signal and the image, the accuracy of the robot for recognizing the robot user is improved, and therefore the robot can better provide services for the robot user.
Drawings
FIG. 1 is a schematic diagram of an identification device for a robot user in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a method for identifying a user of a robot according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of a method for identifying a user of a robot according to the present invention;
fig. 4 is a schematic system structure diagram of an embodiment of an identification device for a robot user according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of an identification device of a robot user in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the identification apparatus of a robot user according to an embodiment of the present invention may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the identification device of the robot user may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. Sensors may include light sensors, motion sensors, and others. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust display brightness according to the ambient light level. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes) and, when the robot is stationary, the magnitude and direction of gravity; it can be used for applications that recognize the device's attitude and for vibration-related functions (such as a pedometer or tap detection). Of course, the identification device may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.
It will be appreciated by those skilled in the art that the configuration of the identification device of the robot user shown in fig. 1 does not constitute a limitation of the identification device of the robot user, and may comprise more or less components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an identification program of a robot user.
In the identification device of the robot user shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to call up the robot user's identification program stored in the memory 1005.
In this embodiment, the robot user's recognition apparatus includes: a memory 1005, a processor 1001, and an identification program of a robot user stored in the memory 1005 and capable of running on the processor 1001, wherein when the processor 1001 calls the identification program of the robot user stored in the memory 1005, the following operations are performed:
acquiring an environment voice signal in an environment;
determining an environment voice signal matched with preset semantic information in the environment voice signals, and taking the environment voice signal matched with the preset semantic information as a target voice signal;
determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction;
and determining whether a robot user corresponding to the target voice signal exists in the direction of the target sound source or not according to the image information.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
acquiring a voice signal in the direction of the target sound source;
if the voice signal in the direction of the target sound source contains a call instruction, detecting whether the image information contains human face features;
and if the image information contains human face features, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
acquiring the face features in the direction of the target sound source, and extracting mouth shape actions contained in the face features;
and if the mouth shape action contained in the face features is matched with a preset mouth shape action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
extracting gesture actions in the image information;
and if the gesture action is matched with a preset gesture action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
extracting semantic information corresponding to the environment voice signal;
and determining semantic information matched with preset semantic information in the semantic information, and taking an environment voice signal corresponding to the semantic information matched with the preset semantic information as a target voice signal.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
and controlling a camera of the robot to turn to the direction of the target sound source so as to acquire image information in the direction of the target sound source.
Further, the processor 1001 may call the robot user's identification program stored in the memory 1005, and also perform the following operations:
controlling the robot to adjust an orientation of the robot to face the target sound source direction;
and acquiring image information in the direction of the target sound source.
The present invention further provides a method for identifying a robot user, referring to fig. 2, and fig. 2 is a flowchart illustrating a first embodiment of the method for identifying a robot user according to the present invention.
In this embodiment, the method for identifying a robot user according to the present invention is applied to a robot, which is an intelligent machine capable of semi-autonomous or fully-autonomous operation, and includes the following steps:
step S10, acquiring an environment voice signal in the environment;
in this embodiment, when the robot enters a standby state or an idle state at a parking position of a hotel lobby, an environmental voice signal in an environment is acquired in real time through a sound sensor to perform real-time monitoring on the environmental voice signal, wherein the sound sensor may be a microphone. Further, the robot is provided with a plurality of sound sensors corresponding to different directions.
Step S20, determining an environment voice signal matched with preset semantic information in the environment voice signals, and using the environment voice signal matched with the preset semantic information as a target voice signal;
in this embodiment, after the environmental voice signal is acquired, according to the environmental voice signal, the environmental voice signal matched with the preset semantic information is determined in the environmental voice signal, so as to identify the voice signal including the preset keyword in the environmental voice signal, it should be noted that the preset semantic information is a word or a sentence including the preset keyword, and since the voice signal includes the semantic information, by identifying the environmental voice signal, the environmental voice signal matched with the preset semantic information in the environmental voice signal can be determined, and the environmental voice signal matched with the preset semantic information is taken as the target voice signal.
Further, the means for determining the ambient speech signal matching the preset semantic information comprises: (1) comparing each collected environment voice signal with a preset voice signal containing preset semantic information, and taking the environment voice signal matched with the preset voice signal as a target voice signal; (2) analyzing each environment voice signal to extract semantic information of the environment voice signals, comparing the semantic information corresponding to the environment voice signals with preset semantic information, and taking the environment voice signals corresponding to the semantic information matched with the preset semantic information as target voice signals.
It is understood that the voice signals formed by different keywords are different, and semantics can be extracted and recognized from a voice signal. If the acquired voice signal matches a preset keyword (such as Youimei, Youixiabroi and the like), it indicates that a person is currently calling the robot. At this time, the voice signal containing the preset keyword is locked as the target voice signal.
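Means (2) above, matching extracted semantics against preset keywords, can be sketched as follows. This is an illustrative Python sketch rather than the patent's implementation: `PRESET_KEYWORDS`, `find_target_signals`, and the `transcribe` callable (the assumed speech-to-text step) are hypothetical names.

```python
# Illustrative sketch of means (2): extract semantics from each ambient
# voice signal and keep the signals matching a preset keyword as targets.
# PRESET_KEYWORDS and `transcribe` are hypothetical stand-ins.
PRESET_KEYWORDS = ("robot", "please come over")

def find_target_signals(ambient_signals, transcribe):
    """Return the ambient voice signals whose extracted semantics
    contain any preset keyword."""
    targets = []
    for signal in ambient_signals:
        text = transcribe(signal)  # speech-to-text step, assumed available
        if any(keyword in text for keyword in PRESET_KEYWORDS):
            targets.append(signal)
    return targets
```

Passing a trivial `transcribe` that returns the text itself exercises the matching logic in isolation, independent of any ASR engine.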
Step S30, determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction;
in this embodiment, a sound source direction corresponding to a target speech signal is taken as a target sound source direction; after the target sound source direction corresponding to the target voice signal is determined, image information in the target sound source direction is obtained, and whether a machine user exists in the target sound source direction or not is identified according to the image information obtained in the target sound source direction subsequently. Further, after the target speech signal is specified, the sound sensor corresponding to the target speech signal is specified from among the sound sensors based on the target speech signal, and the sound sensor corresponding to the target speech signal is used as the target sound sensor, thereby specifying the target sound source direction corresponding to the target speech signal.
Further, the step of acquiring the image information in the direction of the target sound source includes:
step S301, controlling a camera of the robot to turn to the direction of the target sound source so as to acquire image information in the direction of the target sound source; alternatively, the first and second electrodes may be,
step S302, controlling the robot to adjust the orientation of the robot so that the robot faces the target sound source direction, and acquiring image information in the target sound source direction.
The orientation of the camera can be adjusted to be consistent with the direction of the target sound source by adjusting the camera of the robot, so that image information in the direction of the target sound source is collected through the camera. The robot can be adjusted to face the target sound source direction by adjusting the posture of the robot so as to acquire image information in the target sound source direction.
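Either adjustment, turning only the camera or turning the whole robot, reduces to computing the smallest rotation that brings the current heading onto the target sound source direction. A minimal sketch (the function name and angle convention are illustrative):

```python
def rotation_to_face(current_heading_deg, target_direction_deg):
    """Smallest signed rotation, in degrees, that turns the camera or the
    robot from its current heading toward the target sound source
    direction. Positive means counter-clockwise, negative clockwise."""
    return (target_direction_deg - current_heading_deg + 180.0) % 360.0 - 180.0
```

Wrapping into the (-180, 180] range avoids needless full turns, e.g. rotating +20 degrees from heading 350 to reach direction 10 instead of -340.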
Step S40, determining whether or not a robot user corresponding to the target voice signal is present in the target sound source direction based on the image information.
In this embodiment, after the image information in the target sound source direction is collected, the image information is identified to determine whether a robot user corresponding to the target voice signal exists in the target sound source direction. Specifically, if the image information is identified to contain the robot user, the robot user corresponding to the target voice signal exists in the target sound source direction; if the robot user is identified not to be included in the image information, the robot user corresponding to the target voice signal does not exist in the target voice source direction. Further, if there is a robot user in the direction of the target sound source, the robot travels in front of the robot user to provide services to the robot user, and the services that the robot can provide include navigation services, chat services, and transport services, etc.
In the method for identifying the robot user provided by the embodiment, the environment voice signal in the environment is acquired; determining an environment voice signal matched with preset semantic information in the environment voice signals, and taking the environment voice signal matched with the preset semantic information as a target voice signal; determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction; and determining whether a robot user corresponding to the target voice signal exists in the direction of the target sound source or not according to the image information. According to the method, the target voice signal containing the preset semantic information in the environment voice signal is determined by recognizing the environment voice signal in the environment, the image information in the target voice source direction where the target voice signal is located is recognized, whether the robot user exists in the target voice source direction is judged, the robot user is recognized by combining the means of recognizing the voice signal and the image, the accuracy of the robot for recognizing the robot user is improved, and therefore the robot can better provide services for the robot user.
Referring to fig. 3, in a second embodiment of the method for identifying a robot user according to the present invention, the method for determining whether a robot user corresponding to the target voice signal exists in the target voice source direction according to the image information in step S40 includes:
step S401, acquiring a voice signal in the direction of the target sound source;
step S402, if the voice signal in the direction of the target sound source contains a call instruction, detecting whether the image information contains human face features;
step S403, if the image information includes a human face feature, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
In this embodiment, it should be noted that the first method for determining whether a robot user exists in the target sound source direction is as follows: after identifying that a user is calling it, the robot turns toward the target sound source direction and detects whether a voice signal calling the robot again exists, while also judging whether the acquired image information contains human face features. Only if a voice signal calling the robot again is detected in the target sound source direction and the image information contains human face features can the robot determine that a robot user exists in the target sound source direction.
Specifically, the voice signal in the target sound source direction is acquired again, and it is detected whether that voice signal contains a call instruction and whether the image information contains human face features; if both conditions hold, it is determined that a robot user exists in the target sound source direction, and otherwise that no robot user exists. Alternatively, whether the voice signal in the target sound source direction contains a call instruction may be detected first, and only if it does is the image information checked for human face features, which improves recognition efficiency.
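This first method, with the cheap voice check ordered before the more expensive face detection, can be sketched as follows. All names are hypothetical: `detect_face` stands in for any face-feature detector, and `call_words` for the call-instruction vocabulary.

```python
def user_present_by_call(voice_text, image, detect_face,
                         call_words=("come over", "come here")):
    """Method 1: a robot user is present only if the re-acquired voice
    signal contains a call instruction AND the image contains human face
    features. The voice check runs first, so face detection is skipped
    entirely when no call instruction is heard."""
    if not any(word in voice_text for word in call_words):
        return False          # no call instruction: skip face detection
    return detect_face(image)
```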
Further, the determining whether a robot user corresponding to the target voice signal exists in the target sound source direction according to the image information includes:
step S411, acquiring the human face features in the direction of the target sound source, and extracting mouth shape actions contained in the human face features;
step S412, if the mouth shape action included in the face feature matches a preset mouth shape action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
In this embodiment, it should be noted that, the second method for determining whether the robot user exists in the target sound source direction is: after the robot turns to obtain image information in the direction of a target sound source, whether a robot user exists in the direction of the target sound source is judged by detecting whether the image information contains human face features and detecting whether mouth shape actions in the human face features meet certain conditions. Only if the detected image information contains the human face features and the mouth shape actions in the detected human face features meet certain conditions, the robot can determine that the robot user exists in the target sound source direction.
Specifically, after the robot turns to obtain image information in the direction of a target sound source, whether the image information has human face features or not is detected, if the image information has the human face features, mouth shape actions contained in the recognized human face features are extracted, then the mouth shape actions contained in the human face features are compared with preset mouth shape actions, if the mouth shape actions contained in the human face features are matched with the preset mouth shape actions, it is determined that a robot user exists in the direction of the target sound source, and if not, it is determined that the robot user does not exist.
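A minimal sketch of the mouth-shape comparison, assuming an upstream module has already reduced each video frame to a mouth-opening ratio. Both that representation and the tolerance value are illustrative assumptions, not taken from the patent.

```python
def mouth_matches(observed_openings, preset_openings, tolerance=0.15):
    """Method 2 sketch: compare a sequence of per-frame mouth-opening
    ratios against a preset talking pattern; the mouth-shape action
    matches iff every frame is within tolerance of the preset."""
    if len(observed_openings) != len(preset_openings):
        return False
    return all(abs(o - p) <= tolerance
               for o, p in zip(observed_openings, preset_openings))
```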
Further, the determining whether a robot user corresponding to the target voice signal exists in the target sound source direction according to the image information includes:
step S421, extracting gesture actions in the image information;
step S422, if the gesture action is matched with a preset gesture action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
In this embodiment, it should be noted that, the third method for determining whether the robot user exists in the target sound source direction is: after the robot turns to obtain the image information in the target sound source direction, whether a robot user exists in the target sound source direction is judged by detecting whether the image information contains a specific gesture action. Only when the detected image information contains a specific gesture motion, the robot can determine that the robot user exists in the target sound source direction. Specifically, after the robot turns to obtain the image information in the target sound source direction, the gesture action in the image information is extracted, the recognized gesture action is compared with the preset gesture action, if the gesture action is matched with the preset gesture action, it is determined that a robot user exists in the target sound source direction, and otherwise, the robot user is determined to not exist.
Further, the preset gesture motion comprises a first gesture motion of moving toward the target sound source direction, or a second gesture motion of reaching out and grabbing toward the target sound source direction.
Optionally, two or more of the above three methods for determining whether a robot user exists in the target sound source direction may be combined, thereby improving the accuracy with which the robot determines whether a robot user exists in the target sound source direction.
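One way to combine the three checks is sketched below. The fusion policy (any check suffices vs. all checks required) is an assumption, since the text only states that the methods may be combined.

```python
def user_in_direction(call_and_face, mouth_matches, gesture_matches,
                      require_all=False):
    """Fuse the three determination methods described above.
    call_and_face: first method (call instruction heard AND face detected);
    mouth_matches: second method; gesture_matches: third method."""
    checks = (call_and_face, mouth_matches, gesture_matches)
    # Strict fusion demands every check pass; lenient fusion accepts any one.
    return all(checks) if require_all else any(checks)
```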
Further, the step of determining an ambient voice signal matching preset semantic information in the ambient voice signals, and using the ambient voice signal matching the preset semantic information as a target voice signal includes:
step S21, extracting semantic information corresponding to the environment voice signal;
step S22, determining semantic information matched with preset semantic information from the semantic information, and using an environmental voice signal corresponding to the semantic information matched with the preset semantic information as a target voice signal.
In this embodiment, the environmental voice signal matching the preset semantic information is determined as follows: each environmental voice signal is analyzed to extract its corresponding semantic information; the extracted semantic information is then compared with the preset semantic information, and the environmental voice signal whose semantic information matches the preset semantic information is taken as the target voice signal. In this way, voice signals containing preset keywords can be identified among the environmental voice signals.
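A minimal sketch of this selection step follows. Keyword matching on a transcript stands in for real semantic analysis, and the keyword set is an assumption for illustration.

```python
# Hypothetical preset keywords standing in for "preset semantic information".
PRESET_KEYWORDS = {"robot", "come here"}

def pick_target_signals(env_signals):
    """env_signals: list of (signal_id, transcript) pairs. Signals whose
    extracted semantics contain a preset keyword become target signals."""
    return [sid for sid, text in env_signals
            if any(kw in text.lower() for kw in PRESET_KEYWORDS)]
```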
In the robot user identification method provided by this embodiment, a voice signal in the target sound source direction is acquired; if the voice signal in the target sound source direction contains a call instruction, the robot detects whether the image information contains human face features; and if it does, the robot determines that a robot user corresponding to the target voice signal exists in the target sound source direction. In other words, the robot checks both for a voice signal that addresses it again and for human face features in the acquired image information, and only when both are present does it determine that a robot user exists in the target sound source direction. This greatly improves the accuracy with which the robot identifies its user, so that the robot can better serve the robot user.
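The embodiment's overall flow could be sketched end to end as below. Every callable is an assumed stand-in for a real robot subsystem (sound source localization, image capture, call-instruction detection, face detection), not the patent's API.

```python
def identify_robot_user(target_signals, localize, capture,
                        hears_call_instruction, image_has_face):
    """For each target voice signal: localize its sound source direction,
    capture image information there, and confirm the user only when a
    further call instruction is heard AND a face feature is detected."""
    for signal in target_signals:
        direction = localize(signal)            # target sound source direction
        image = capture(direction)              # turn and acquire image info
        if hears_call_instruction(direction) and image_has_face(image):
            return direction                    # robot user found here
    return None                                 # no robot user identified
```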
Furthermore, an embodiment of the present invention further provides a storage medium, where an identification program of a robot user is stored, and the identification program of the robot user is executed by a processor to implement the steps of the method for identifying a robot user as described in any one of the above.
The specific embodiment of the storage medium of the present invention is substantially the same as the embodiments of the method for identifying a robot user, and will not be described in detail herein.
In addition, an embodiment of the present invention further provides an identification apparatus for a robot user, and referring to fig. 4, the identification apparatus for a robot user includes:
an obtaining module 100, configured to obtain an environmental voice signal in an environment;
a first determining module 200, configured to determine, from the environmental voice signals, an environmental voice signal matched with preset semantic information, and use the environmental voice signal matched with the preset semantic information as a target voice signal;
a second determining module 300, configured to determine a target sound source direction corresponding to the target speech signal, and acquire image information in the target sound source direction;
a third determining module 400, configured to determine whether a robot user corresponding to the target voice signal exists in the target voice source direction according to the image information.
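The four modules of the apparatus referring to fig. 4 could be wired together as sketched below. The module boundaries follow the description; the interfaces are invented for illustration.

```python
class RobotUserIdentifier:
    """Sketch of the identification apparatus: four modules as callables."""
    def __init__(self, acquire, match, localize_and_capture, decide):
        self.acquire = acquire                            # obtaining module 100
        self.match = match                                # first determining module 200
        self.localize_and_capture = localize_and_capture  # second determining module 300
        self.decide = decide                              # third determining module 400

    def run(self):
        env_signals = self.acquire()                      # environmental voice signals
        target = self.match(env_signals)                  # target voice signal or None
        if target is None:
            return False
        direction, image = self.localize_and_capture(target)
        return self.decide(direction, image)              # robot user present?
```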
Further, the third determining module is further configured to:
acquiring a voice signal in the direction of the target sound source;
if the voice signal in the direction of the target sound source contains a call instruction, detecting whether the image information contains human face features;
and if the image information contains human face features, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the third determining module is further configured to:
acquiring the face features in the direction of the target sound source, and extracting mouth shape actions contained in the face features;
and if the mouth shape action contained in the face features is matched with a preset mouth shape action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the third determining module is further configured to:
extracting gesture actions in the image information;
and if the gesture action is matched with a preset gesture action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
Further, the preset gesture motion comprises a first gesture motion of moving in the direction of the target sound source or a second gesture motion of telescopic grabbing in the direction of the target sound source.
Further, the first determining module is further configured to:
extracting semantic information corresponding to the environment voice signal;
and determining semantic information matched with preset semantic information in the semantic information, and taking an environment voice signal corresponding to the semantic information matched with the preset semantic information as a target voice signal.
Further, the obtaining module is further configured to:
and controlling a camera of the robot to turn to the direction of the target sound source so as to acquire image information in the direction of the target sound source.
Further, the obtaining module is further configured to:
controlling the robot to adjust an orientation of the robot to face the target sound source direction;
and acquiring image information in the direction of the target sound source.
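The two acquisition strategies above (turn only the camera, or reorient the whole robot) could be sketched as follows. The robot interface here, including the pan-range check, is invented for illustration.

```python
class DemoRobot:
    """Minimal stand-in robot with a pannable camera of limited range."""
    def __init__(self, pan_range_deg):
        self.pan_range_deg = pan_range_deg
        self.actions = []                      # log of motion commands

    def camera_can_pan_to(self, direction):
        return abs(direction) <= self.pan_range_deg

    def pan_camera(self, direction):
        self.actions.append(("pan_camera", direction))

    def turn_body(self, direction):
        self.actions.append(("turn_body", direction))

    def capture_image(self):
        return "image_info"

def acquire_image(direction, robot):
    """Prefer turning only the camera toward the target sound source
    direction; fall back to adjusting the robot's whole orientation."""
    if robot.camera_can_pan_to(direction):
        robot.pan_camera(direction)
    else:
        robot.turn_body(direction)
    return robot.capture_image()
```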
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above and includes several instructions for enabling a terminal device (which may be a mobile service robot) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for identifying a robot user, the method being applied to a robot, the method comprising:
acquiring an environment voice signal in an environment;
determining an environment voice signal matched with preset semantic information in the environment voice signals, and taking the environment voice signal matched with the preset semantic information as a target voice signal;
determining a target sound source direction corresponding to the target voice signal, and acquiring image information in the target sound source direction;
and determining whether a robot user corresponding to the target voice signal exists in the direction of the target sound source or not according to the image information.
2. The method for identifying a robot user according to claim 1, wherein the determining whether the robot user corresponding to the target voice signal exists in the target sound source direction based on the image information comprises:
acquiring a voice signal in the direction of the target sound source;
if the voice signal in the direction of the target sound source contains a call instruction, detecting whether the image information contains human face features;
and if the image information contains human face features, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
3. The method for identifying a robot user according to claim 1, wherein the determining whether the robot user corresponding to the target voice signal exists in the target sound source direction based on the image information comprises:
acquiring the face features in the direction of the target sound source, and extracting mouth shape actions contained in the face features;
and if the mouth shape action contained in the face features is matched with a preset mouth shape action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
4. The method for identifying a robot user according to claim 1, wherein the determining whether the robot user corresponding to the target voice signal exists in the target sound source direction based on the image information comprises:
extracting gesture actions in the image information;
and if the gesture action is matched with a preset gesture action, determining that a robot user corresponding to the target voice signal exists in the target sound source direction.
5. The method as claimed in claim 4, wherein the preset gesture action comprises a first gesture action of moving toward the target sound source direction or a second gesture action of reaching and grasping toward the target sound source direction.
6. The robot user recognition method of claim 1, wherein the step of determining an ambient voice signal matching preset semantic information among the ambient voice signals and using the ambient voice signal matching the preset semantic information as a target voice signal comprises:
extracting semantic information corresponding to the environment voice signal;
and determining semantic information matched with preset semantic information in the semantic information, and taking an environment voice signal corresponding to the semantic information matched with the preset semantic information as a target voice signal.
7. The robot user recognition method according to claim 1, wherein the step of acquiring image information in the direction of the target sound source comprises:
and controlling a camera of the robot to turn to the direction of the target sound source so as to acquire image information in the direction of the target sound source.
8. The robot user recognition method according to any one of claims 1 to 7, wherein the step of acquiring image information in the direction of the target sound source includes:
controlling the robot to adjust an orientation of the robot to face the target sound source direction;
and acquiring image information in the direction of the target sound source.
9. An identification apparatus of a robot user, characterized by comprising: memory, a processor and a robot user identification program stored on the memory and executable on the processor, the robot user identification program, when executed by the processor, implementing the steps of the method of identifying a robot user according to any one of claims 1 to 8.
10. A storage medium having stored thereon a robot user identification program, which when executed by a processor, implements the steps of the robot user identification method according to any one of claims 1 to 8.
CN202110514922.4A 2021-05-11 2021-05-11 Robot user identification method, apparatus and storage medium Pending CN113257251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110514922.4A CN113257251A (en) 2021-05-11 2021-05-11 Robot user identification method, apparatus and storage medium

Publications (1)

Publication Number Publication Date
CN113257251A true CN113257251A (en) 2021-08-13

Family

ID=77223058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110514922.4A Pending CN113257251A (en) 2021-05-11 2021-05-11 Robot user identification method, apparatus and storage medium

Country Status (1)

Country Link
CN (1) CN113257251A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881117A (en) * 2015-05-22 2015-09-02 广东好帮手电子科技股份有限公司 Device and method for activating voice control module through gesture recognition
CN106531179A (en) * 2015-09-10 2017-03-22 中国科学院声学研究所 Multi-channel speech enhancement method based on semantic prior selective attention
CN111222117A (en) * 2019-12-30 2020-06-02 云知声智能科技股份有限公司 Identification method and device of identity information
CN111599361A (en) * 2020-05-14 2020-08-28 宁波奥克斯电气股份有限公司 Awakening method and device, computer storage medium and air conditioner
CN111638783A (en) * 2020-05-18 2020-09-08 广东小天才科技有限公司 Man-machine interaction method and electronic equipment
CN112433770A (en) * 2020-11-19 2021-03-02 北京华捷艾米科技有限公司 Wake-up method and device for equipment, electronic equipment and computer storage medium
CN112597910A (en) * 2020-12-25 2021-04-02 北京小狗吸尘器集团股份有限公司 Method and device for monitoring human activities by using sweeping robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination