CN110434853B - Robot control method, device and storage medium - Google Patents

Robot control method, device and storage medium

Info

Publication number
CN110434853B
CN110434853B
Authority
CN
China
Prior art keywords
target object
gesture
robot
gesture recognition
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910719457.0A
Other languages
Chinese (zh)
Other versions
CN110434853A (en)
Inventor
支涛
徐诗昀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunji Technology Co Ltd
Original Assignee
Beijing Yunji Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunji Technology Co Ltd filed Critical Beijing Yunji Technology Co Ltd
Priority to CN201910719457.0A priority Critical patent/CN110434853B/en
Publication of CN110434853A publication Critical patent/CN110434853A/en
Application granted granted Critical
Publication of CN110434853B publication Critical patent/CN110434853B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Manipulator (AREA)

Abstract

The application provides a robot control method, a device and a storage medium. The method includes: receiving voice information and extracting character information from the voice information; judging whether the character information contains a preset character; if so, determining the position of the target object according to the voice information; performing gesture recognition on the target object to obtain a gesture recognition result; and querying an instruction library according to the gesture recognition result to determine a corresponding control instruction, and controlling the robot to execute the corresponding operation according to the control instruction; the instruction library includes preset different gesture recognition results and corresponding control instructions.

Description

Robot control method, device and storage medium
Technical Field
The present disclosure relates to the field of automatic control technologies, and in particular, to a robot control method, apparatus, and storage medium.
Background
At present, most robots are controlled through keys on the robot body or through remote controllers. These operation modes are inconvenient for users, which limits the user experience of such products.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for controlling a robot, and a storage medium, so as to solve the problems of inconvenience and low user experience when the robot is operated by a key or a remote controller.
In order to achieve the above object, the present application provides the following technical solutions:
in a first aspect: the application provides a robot control method, which includes: receiving voice information and extracting character information from the voice information; judging whether the character information contains a preset character; if so, determining the position of the target object according to the voice information; performing gesture recognition on the target object to obtain a gesture recognition result; and querying an instruction library according to the gesture recognition result to determine a corresponding control instruction, and controlling the robot to move according to the control instruction; the instruction library includes preset different gesture recognition results and corresponding control instructions.
In the scheme designed above, the position of the target object is determined from the voice information, gesture recognition of the target object is triggered by key characters in the voice information, and the control instruction is looked up according to the gesture recognition result, so that the robot is controlled to execute the corresponding operation. This solves the prior-art problems that operating a robot through keys or a remote controller is inconvenient and gives a poor user experience: by combining voice with gesture recognition, control of the robot becomes more convenient and rapid, and the user's experience of the product is improved.
In an optional implementation manner of the first aspect, the performing gesture recognition on the target object to obtain a gesture recognition result includes: continuously acquiring a plurality of scene images containing the target object; recognizing a gesture image of the target object in each scene image; and judging whether the hand swing amplitude of the target object exceeds a threshold value, and if so, determining the hand swing direction of the target object according to the plurality of scene images.
In the embodiment designed above, the gesture image of the target object is recognized from the scene images, so that the hand swing amplitude can be determined from the gesture images. Whether the swing is a valid gesture is judged from its amplitude; once the swing is judged valid, its direction is determined, and the corresponding control instruction is then looked up according to that direction to control the robot's movement. This avoids judgment errors caused by invalid gestures in the scene images and improves the accuracy of gesture recognition for the target object.
In an optional implementation of the first aspect, the determining a hand swing direction of the target object from the plurality of scene images comprises: analyzing a hand swing trend of the target object according to the gesture images of the target object in the plurality of scene images; and determining the hand swing direction of the target object according to the hand swing trend.
In the embodiment designed above, the hand swing trend is identified from multiple time-ordered gesture images, and the swing direction is determined from that trend, which makes the hand swing direction easier to recognize.
In an optional implementation manner of the first aspect, the recognizing the gesture image of the target object in each scene image includes: extracting a gesture image in each scene image; judging whether the number of the gesture images belonging to the same object in the plurality of scene images exceeds a preset number threshold value or not; if yes, determining the gesture images which exceed a preset number threshold and belong to the same object as the gesture images of the target object.
In the embodiment designed above, whether the gesture is an effective gesture is determined according to the number of gesture images of the same object, and some invalid gestures are deleted before the amplitude determination is performed, so that the accuracy of gesture recognition of the target object is improved.
In an optional implementation manner of the first aspect, the querying the instruction library according to the gesture recognition result to determine a corresponding control instruction includes: inquiring an instruction library according to the hand waving direction of the target object to determine a corresponding control instruction; the command library comprises preset different hand waving directions and corresponding control commands.
In an optional implementation manner of the first aspect, the controlling the robot to perform corresponding operations according to the control instruction includes: judging whether the target object moves or not at preset time intervals; if so, tracking the target object, and controlling the robot to complete the control instruction.
In the embodiment designed above, after the target object moves, the robot follows the target object to complete the control instruction, improving the user's experience and the responsiveness of the interaction.
In an optional implementation manner of the first aspect, the robot is provided with a plurality of sensors, and the determining the position of the target object according to the voice information includes: acquiring the receiving time of the voice information by each sensor; calculating a time difference between the receiving time with the shortest time and each of the remaining receiving times; and determining the position of the target object according to the positions of the sensors, the sound propagation speed and the calculated time differences.
In the embodiment designed above, the position of the sound source is calculated through the time difference of the voice information reaching different sensors, so that the position of the target object is determined more accurately while the human-computer interaction experience is improved.
In a second aspect: the application provides a robot control device, the device includes: the receiving and extracting module is used for receiving voice information and extracting character information in the voice information; the judging module is used for judging whether the character information has preset characters or not; the determining module is used for determining the position of a target object according to the voice information after judging that the character information has preset characters; the gesture recognition module is used for carrying out gesture recognition on the target object to obtain a gesture recognition result; the query control module is used for querying an instruction library according to the gesture recognition result to determine a corresponding control instruction and controlling the robot to move according to the control instruction; the command library comprises preset different gesture recognition results and corresponding control commands.
In the embodiment designed above, the position of the target object is determined from the voice information, gesture recognition of the target object is triggered by key characters in the voice information, and the control instruction is looked up according to the gesture recognition result, so that the robot is controlled to execute the corresponding operation. This solves the prior-art problems that operating a robot through keys or a remote controller is inconvenient and gives a poor user experience: by combining voice with gesture recognition, control of the robot becomes more convenient and rapid, and the user's experience of the product is improved.
In an optional implementation manner of the second aspect, the gesture recognition module is specifically configured to continuously acquire a plurality of scene images including the target object; recognizing a gesture image of the target object in each scene image; and judging whether the hand swing amplitude of the target object exceeds a threshold value, and if so, determining the hand swing direction of the target object according to the plurality of scene images.
In an optional implementation manner of the second aspect, the query control module is specifically configured to query an instruction library to determine a corresponding control instruction according to the hand swing direction of the target object; the command library comprises preset different hand waving directions and corresponding control commands.
In an optional implementation manner of the second aspect, the determining module is further configured to determine whether the target object moves every predetermined time; and the tracking module is used for tracking the target object after the target object moves and controlling the robot to complete the control instruction.
In an optional embodiment of the second aspect, the robot is provided with a plurality of sensors, and the determining module is specifically configured to obtain a time of receipt of the voice information by each sensor; calculating a time difference between the receiving time with the shortest time and each of the remaining receiving times; and determining the position of the target object according to the positions of the sensors, the sound propagation speed and the calculated time differences.
In a third aspect: the present application further provides an electronic device, including a processor and a memory connected to the processor, the memory storing a computer program which, when the device runs, is executed by the processor to perform the method of the first aspect or any of its optional implementations.
In a fourth aspect: the present application provides a non-transitory readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of the first aspect or any of its optional implementations.
In a fifth aspect: the present application provides a computer program product which, when run on a computer, causes the computer to perform the method of the first aspect or any of its optional implementations.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a first flowchart of a robot control method according to a first embodiment of the present application;
fig. 2 is a second flowchart of a robot control method according to the first embodiment of the present application;
fig. 3 is a third flowchart of a robot control method according to the first embodiment of the present application;
fig. 4 is a fourth flowchart of a robot control method according to the first embodiment of the present application;
FIG. 5 is a schematic diagram of a sensor for receiving speech according to a first embodiment of the present application;
fig. 6 is a schematic structural diagram of a robot control device according to a second embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
First embodiment
As shown in fig. 1, the present application provides a robot control method, which specifically includes the following steps:
step S100: and receiving voice information and extracting character information in the voice information.
Step S102: and judging whether the character information has a preset character or not, if so, turning to the step S104.
Step S104: and determining the position of the target object according to the voice information.
Step S106: and performing gesture recognition on the target object to obtain a gesture recognition result.
Step S108: inquiring an instruction library according to the gesture recognition result to determine a corresponding control instruction; the command library comprises preset different gesture recognition results and corresponding control commands.
Step S110: and controlling the robot to execute corresponding operation according to the control instruction.
In step S100, the voice information may be received through a sound sensor or a software recognition module provided on the robot. The sound sensor or software recognition module stays on to receive voice information in real time whenever the robot is in a working or powered-on state. The robot may be a hotel service robot, and the voice information may be sounds, words, and the like uttered by users such as hotel customers or staff. Extracting the character information from the voice information may be implemented by converting the voice information into text.
After the voice is converted into text in step S100, step S102 of judging whether the character information contains a preset character can be understood as follows: it is judged whether the converted text contains preset characters or keywords, for example "Xiaorun" (the robot's name), "Xiaorun, come here", or "Xiaorun, let's go"; the content of the characters can be set according to the specific application scenario.
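As an illustration of step S102, the following minimal sketch assumes the received voice has already been converted to text by some speech recognition engine; the trigger phrases and function names are illustrative assumptions, not prescribed by this application:

```python
# Minimal sketch of step S102: checking recognized text for preset trigger words.
# The trigger phrases below are illustrative assumptions; in practice they are
# configured per application scenario.
PRESET_WORDS = ("xiaorun", "come here", "let's go")

def has_preset_characters(text: str) -> bool:
    """Return True if any preset word or keyword appears in the recognized text."""
    lowered = text.lower()
    return any(word in lowered for word in PRESET_WORDS)

# Example: text produced from the received voice information in step S100.
if has_preset_characters("Xiaorun, come here please"):
    print("trigger found -> proceed to locate the speaker (step S104)")
```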
After step S102 determines that the character information contains a preset character, step S104 is executed to determine the position of the target object according to the voice information. The target object in step S104 is the utterer of the voice information in steps S100 to S102, and its position can be determined by sound localization.
In step S106, after the position of the target object is determined in step S104, gesture recognition is performed on the target object. This can be understood as follows: after the position of the target object is determined, a camera on the robot is controlled to capture images in the direction of the target object's position, and the gesture recognition result is obtained by performing gesture recognition analysis on the captured images. When the robot has a single camera, the robot is controlled to rotate after the position of the target object is determined, so that the camera is aimed at the target object's position before capturing images; when the robot has multiple cameras, they can be mounted at different positions, so that the robot can capture images toward the target object's position without rotating.
In step S108, after gesture recognition is performed on the target object in step S106 to obtain a gesture recognition result, the corresponding control instruction is queried in the instruction library according to that result. Different gesture recognition results correspond to different control instructions: for example, different directions of hand waving by the target object represent different control instructions, and different palm patterns of the target object represent different control instructions. Specifically, a hand swinging from outside toward the user may represent a control instruction to move toward the target object; a hand swinging from the user's left to the user's right may represent that the user wants the robot to move to the user's right, the corresponding control instruction being to move to the left of the target (the user's right corresponds to the robot's left when the robot faces the user); and a V-shaped palm pattern may represent that the user wants to take a picture, the corresponding control instruction being a photographing instruction. The control instructions are associated one-to-one with gesture recognition results and stored in the instruction library; after a gesture recognition result is obtained, the corresponding control instruction is looked up in the instruction library, and step S110 is then executed to control the robot to perform the corresponding operation through the control instruction, for example moving toward or away from the user.
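A minimal sketch of the instruction library and the lookup in step S108 could look as follows; the gesture labels and instruction names are illustrative assumptions, since the application does not prescribe a concrete encoding:

```python
from typing import Optional

# Hypothetical instruction library: gesture recognition results associated
# one-to-one with control instructions (step S108).
INSTRUCTION_LIBRARY = {
    "wave_toward_self": "MOVE_TO_TARGET",          # hand swings from outside toward the user
    "wave_left_to_right": "MOVE_LEFT_OF_TARGET",   # robot moves to the user's right
    "wave_right_to_left": "MOVE_RIGHT_OF_TARGET",  # robot moves to the user's left
    "palm_v_sign": "TAKE_PHOTO",                   # V-shaped palm pattern
}

def query_instruction(gesture_result: str) -> Optional[str]:
    """Look up the control instruction for a gesture recognition result."""
    return INSTRUCTION_LIBRARY.get(gesture_result)

print(query_instruction("palm_v_sign"))  # -> TAKE_PHOTO
```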
According to the scheme, the position of the target object is determined from the voice information, gesture recognition of the target object is triggered by key characters in the voice information, and the control instruction is looked up according to the gesture recognition result, so that the robot is controlled to execute the corresponding operation. This solves the prior-art problems that operating a robot through keys or a remote controller is inconvenient and gives a poor user experience: by combining voice with gesture recognition, control of the robot becomes more convenient and rapid, and the user's experience of the product is improved.
In an optional implementation manner of this embodiment, the gesture recognition on the target object in step S106 to obtain a gesture recognition result, as shown in fig. 2, specifically may be:
step S1060: a plurality of scene images containing a target object are continuously acquired.
Step S1062: and recognizing the gesture image of the target object in each scene image.
Step S1064: and judging whether the hand swing amplitude of the target object exceeds a threshold value, if so, turning to the step S1066.
Step S1066: and determining the hand swing direction of the target object according to the gesture image of the target object.
It has been explained in the foregoing description of step S106 that, after the position of the target object is determined, image capturing is performed in the direction of the target object's position by the robot's camera. The scene image in step S1060 is the image obtained by this capturing; the term reflects that, in practice, for example when a hotel service robot captures images in the direction of the target object, obstacles or passers-by inevitably appear between the target object and the service robot, so the image obtained through the camera is a scene image rather than a clean image of the target. In addition, continuous acquisition refers to continuously photographing the direction of the target object's position with the robot's camera; the shooting frequency can be set freely, for example one image every 0.1 milliseconds, and the shooting duration can also be set freely, for example 30 seconds.
After the plurality of scene images is obtained, step S1062 processes each scene image to obtain the gesture image of the target object. The processing can take various forms: for example, when the target object in a scene image is occluded, the corresponding image can be deleted; then a number of consecutive images in which the target object's gesture is clear are found among the scene images, and the gestures in each of these images are extracted in chronological order so that the continuous gesture can be recognized and analyzed.
After the gesture image of the target object in each scene image is recognized in step S1062, step S1064 is performed. In step S1064, the hand swing amplitude of the target object can be analyzed from the time-ordered gesture images; it is then judged whether this amplitude exceeds the threshold, and if so, the gesture of the target object is judged to be valid, i.e. a genuine request to use the robot. Step S1066 is then executed to determine the hand swing direction of the target object from its gesture images. Specifically, the hand swing trend of the target object may be analyzed through the time-ordered gesture images, and the swing direction determined from that trend. For example, in a series of consecutive gesture images, the first shows the user's hand on the right side of the body, the second shows it in the middle, and the third shows it on the left side; it is then determined that the user's hand swings from the user's right to the user's left. After the swing direction is recognized, the control instruction corresponding to that direction can be looked up in the instruction library; for example, for a swing from the user's right to the user's left, the robot moves to the user's left. Further, the robot's moving distance may be set according to the swing amplitude: for example, if the swing amplitude is 30 cm and the swing direction is from the user's right to the user's left, the robot may move 3 meters to the user's left. These examples are given only to facilitate understanding of the scheme and do not limit the protection scope of the present application.
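As a concrete illustration of steps S1064 and S1066, the following minimal sketch assumes each time-ordered gesture image has already been reduced to the horizontal position of the hand relative to the user's body; the threshold value and the sign convention (positive toward the user's right) are illustrative assumptions:

```python
from typing import List, Optional

AMPLITUDE_THRESHOLD_CM = 15.0  # hypothetical validity threshold (step S1064)

def analyze_swing(hand_x_cm: List[float]) -> Optional[str]:
    """Return the swing direction if the amplitude exceeds the threshold, else None.

    hand_x_cm: hand positions from the time-ordered gesture images,
    positive toward the user's right (an assumed convention)."""
    amplitude = max(hand_x_cm) - min(hand_x_cm)
    if amplitude <= AMPLITUDE_THRESHOLD_CM:
        return None  # swing too small -> invalid gesture
    # The trend across the time-ordered frames gives the direction (step S1066).
    return "wave_left_to_right" if hand_x_cm[-1] > hand_x_cm[0] else "wave_right_to_left"

# Right of body -> middle -> left of body, as in the example above:
print(analyze_swing([30.0, 0.0, -30.0]))  # -> wave_right_to_left
```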
In the embodiment designed above, the gesture image of the target object is recognized from the scene images, so that the hand swing amplitude can be determined from the gesture images. Whether the swing is a valid gesture is judged from its amplitude; once the swing is judged valid, its direction is determined, and the corresponding control instruction is then looked up according to that direction to control the robot's movement. This avoids judgment errors caused by invalid gestures in the scene images and improves the accuracy of gesture recognition for the target object.
In an alternative implementation of this embodiment, the foregoing description of step S1062 mentioned that a series of processing steps are performed on the scene images. Many gestures may however appear in the obtained scene images, and some invalid gestures can be deleted before step S1064 judges, from the swing amplitude, whether a wave is valid. For example, recognizing the gesture image of the target object in each scene image in step S1062 may be arranged as: extracting a gesture image from each scene image; judging whether the number of gesture images belonging to the same object across the plurality of scene images exceeds a preset number threshold; and if so, determining the gesture images belonging to that object as the gesture images of the target object. In this solution, gesture images recognized as belonging to the same object must appear in a sufficient number of scene images before being treated as a preliminarily valid gesture. Because a user who genuinely waves usually does so for a relatively long time, the user's gesture images persist across many of the captured scene images, whereas invalid gestures tend to be brief and therefore appear in only a few images; the above scheme can thus preliminarily delete some invalid gestures. Whether gestures belong to the same object may be judged by the size of the gesture in the gesture image, the distance of the gesture from the robot, and the like.
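A minimal sketch of this pre-filtering, under the assumption that an earlier association step has already assigned an object id to each extracted gesture image (for example by gesture size or distance from the robot):

```python
from collections import Counter
from typing import List, Tuple

COUNT_THRESHOLD = 10  # hypothetical preset number threshold

def filter_target_gestures(gestures: List[Tuple[int, str]]) -> List[str]:
    """Keep only gesture images whose object id appears in enough scene images.

    Each element of `gestures` is (object_id, gesture_image); brief, invalid
    gestures appear in few images and are dropped here."""
    counts = Counter(obj_id for obj_id, _ in gestures)
    return [img for obj_id, img in gestures if counts[obj_id] > COUNT_THRESHOLD]
```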
In the embodiment designed above, whether the gesture is an effective gesture is determined according to the number of gesture images of the same object, and some invalid gestures are deleted before the amplitude determination is performed, so that the accuracy of gesture recognition of the target object is improved.
In an alternative implementation manner of this embodiment, step S110 controls the robot to perform corresponding operations according to the control instruction, and as shown in fig. 3, may also be performed according to the following steps:
step S1102: and judging whether the target object moves at preset time intervals, and if so, turning to the step S1104.
Step S1104: and tracking the target object, and controlling the robot to complete the control instruction.
The scenario of the above design is that, after the robot performs step S108 to look up the corresponding control instruction in the instruction library according to the gesture recognition result, the user's position may change before or while the robot executes the corresponding operation according to the control instruction. For example, the user may walk away after making a gesture (waving), or the user's position may change while the robot has only partially executed the control instruction. Therefore, step S1102 is executed to judge, at every preset time interval, whether the target object has moved; the judgment can be made by an automatic recognition technology such as gesture recognition. After the target is determined to have moved, step S1104 is executed to track the target object, for example based on the robot's vision, until the control instruction corresponding to the gesture recognition result of the target object is completed.
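A minimal control-loop sketch of steps S1102 and S1104 follows; every robot method named here is an assumed placeholder for the robot's perception and motion interfaces, not an interface defined by this application:

```python
import time

CHECK_INTERVAL_S = 0.5  # hypothetical preset time interval

def execute_with_tracking(robot, instruction, target):
    """Execute `instruction`, re-tracking the target whenever it moves."""
    while not robot.instruction_done(instruction):
        if robot.target_moved(target):           # step S1102: periodic movement check
            target = robot.track_target(target)  # step S1104: e.g., vision-based tracking
        robot.step_instruction(instruction, target)
        time.sleep(CHECK_INTERVAL_S)
```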
In the embodiment designed above, after the target object moves, the robot follows the target object to complete the control instruction, improving the user's experience and the responsiveness of the interaction.
In an optional implementation manner of this embodiment, the aforementioned description in step S104 has mentioned that the position of the target object can be determined by sound localization, which may specifically be: the robot is provided with a plurality of sensors, the sensors can receive external voice information, and on this basis, the position of the target object is determined according to the voice information in step S104, as shown in fig. 4, specifically, the position may be:
step S1040: the reception time of the voice information by each sensor is acquired.
Step S1042: a time difference between the reception time having the shortest time and each of the remaining reception times is calculated.
Step S1044: and determining the position of the target object according to the positions of the sensors, the sound propagation speed and the calculated time differences.
As shown in fig. 5, taking the number of sensors as 4 as an example, the above solution can be understood as follows: the 4 sensors are arranged in a square array, one at each corner of the square, and the side length of the square is 2K, so that the coordinates of sensor 1 are (-K, -K), of sensor 2 (K, -K), of sensor 3 (K, K), and of sensor 4 (-K, K). After the user utters the voice message, the time at which each sensor receives it is recorded; for example, sensor 1 receives the voice message first at time T1, and sensors 4, 2 and 3 receive it afterwards at times T4, T2 and T3 respectively. The time differences in step S1042 between the shortest receiving time and each of the remaining receiving times are then: ΔT(1,2) = T2 - T1, ΔT(1,3) = T3 - T1, and ΔT(1,4) = T4 - T1. Writing d_i = sqrt((x - x_i)^2 + (y - y_i)^2) for the distance from the target at position (x, y) to sensor i at (x_i, y_i), each time difference satisfies

    c · ΔT(1,i) = d_i - d_1, for i = 2, 3, 4,

where c is the propagation speed of sound. Solving this system of equations for the two unknowns yields the position (x, y) of the target.
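The system above can also be solved numerically. The following minimal sketch, assuming illustrative values for K and the sound speed c and a generic least-squares solver, recovers (x, y) from the three time differences:

```python
import math
from scipy.optimize import least_squares  # assumed dependency for the solver

K = 0.2    # half side length of the square sensor array in meters (illustrative)
C = 343.0  # propagation speed of sound in air, m/s (illustrative)
SENSORS = [(-K, -K), (K, -K), (K, K), (-K, K)]  # sensors 1..4 as in fig. 5

def locate(dts):
    """Recover (x, y) from dts = [T2-T1, T3-T1, T4-T1], sensor 1 hearing it first."""
    def residuals(p):
        x, y = p
        d = [math.hypot(x - sx, y - sy) for sx, sy in SENSORS]
        # Each time difference must satisfy c * dT(1,i) = d_i - d_1.
        return [d[i + 1] - d[0] - C * dt for i, dt in enumerate(dts)]
    return least_squares(residuals, x0=[0.0, 0.0]).x

# Example: time differences generated by a source at (-1.0, -2.0),
# which is closest to sensor 1.
src = (-1.0, -2.0)
d = [math.hypot(src[0] - sx, src[1] - sy) for sx, sy in SENSORS]
print(locate([(d[i] - d[0]) / C for i in (1, 2, 3)]))  # approximately [-1.0, -2.0]
```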
After the position of the target object is determined by the above scheme, if the robot has only one camera, the camera is rotated to align the camera with the direction of the determined target position for image capturing.
In the embodiment designed above, the position of the sound source is calculated through the time difference of the voice information reaching different sensors, so that the position of the target object is determined more accurately while the human-computer interaction experience is improved.
Second embodiment
Fig. 6 shows a schematic block diagram of the robot control device 2 provided in the present application. It should be understood that the device corresponds to the method embodiments of figs. 1 to 5 and can execute the steps involved in the method of the first embodiment; for its specific functions, refer to the description above, and the detailed description is appropriately omitted here to avoid redundancy. The device includes at least one software functional module that can be stored in memory in the form of software or firmware, or solidified in the operating system (OS) of the device. Specifically, the device includes: the receiving and extracting module 200, configured to receive voice information and extract character information from the voice information; the judging module 202, configured to judge whether the character information contains a preset character; the determining module 204, configured to determine the position of the target object according to the voice information after the character information is judged to contain a preset character; the gesture recognition module 206, configured to perform gesture recognition on the target object to obtain a gesture recognition result; and the query control module 208, configured to query the instruction library according to the gesture recognition result to determine a corresponding control instruction, and to control the robot to execute the corresponding operation according to the control instruction; the instruction library includes preset different gesture recognition results and corresponding control instructions.
The device designed in this embodiment determines the position of the target object from the voice information, triggers gesture recognition of the target object through key characters in the voice information, and looks up the control instruction according to the gesture recognition result, so that the robot is controlled to execute the corresponding operation. This solves the prior-art problems that operating a robot through keys or a remote controller is inconvenient and gives a poor user experience: by combining voice with gesture recognition, control of the robot becomes more convenient and rapid, and the user's experience of the product is improved.
In an optional implementation manner of this embodiment, the gesture recognition module 206 is specifically configured to continuously obtain multiple scene images including a target object; recognizing a gesture image of a target object in each scene image; and judging whether the hand swing amplitude of the target object exceeds a threshold value, and if so, determining the hand swing direction of the target object according to the multiple scene images.
In an optional implementation manner of this embodiment, the query control module 208 is specifically configured to query the instruction library according to the hand swing direction of the target object to determine a corresponding control instruction; the instruction library includes preset different hand swing directions and corresponding control instructions.
In an optional implementation manner of this embodiment, the judging module 202 is further configured to judge, at every predetermined time interval, whether the target object moves; and the tracking module 210 is configured to track the target object after the target object moves, and to control the robot to complete the control instruction.
In an optional implementation manner of this embodiment, the robot is provided with a plurality of sensors, and the determining module 204 is specifically configured to acquire a receiving time of the voice information by each sensor; calculating a time difference between the receiving time with the shortest time and each of the remaining receiving times; and determining the position of the target object according to the positions of the sensors, the sound propagation speed and the calculated time differences.
Third embodiment
As shown in fig. 7, the present application provides an electronic device 3 including a processor 301 and a memory 302, the processor 301 and the memory 302 being interconnected and communicating with each other via a communication bus 303 and/or another form of connection mechanism (not shown). The memory 302 stores a computer program executable by the processor 301; when the computing device runs, the processor 301 executes the computer program to perform the method of the first embodiment or any of its alternative implementations.
The present application provides a non-transitory storage medium having stored thereon a computer program which, when executed by a processor, performs the method of the first embodiment or any of its alternative implementations.
The storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
The present application provides a computer program product which, when run on a computer, causes the computer to perform the method of the first embodiment or any of its alternative implementations.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the modules is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (8)

1. A robot control method, comprising:
receiving voice information, and extracting character information in the voice information;
judging whether the character information has preset characters or not;
if so, determining the position of the target object according to the voice information;
performing gesture recognition on the target object to obtain a gesture recognition result;
inquiring an instruction base according to the gesture recognition result to determine a corresponding control instruction, and controlling the robot to execute corresponding operation according to the control instruction; the command library comprises preset different gesture recognition results and corresponding control commands;
the gesture recognition of the target object to obtain a gesture recognition result includes:
continuously acquiring a plurality of scene images containing the target object;
recognizing a gesture image of the target object in each scene image;
judging whether the hand swing amplitude of the target object exceeds a threshold value, if so, determining the hand swing direction of the target object according to the gesture image of the target object;
the recognizing the gesture image of the target object in each scene image comprises the following steps:
extracting a gesture image in each scene image;
judging whether the number of the gesture images belonging to the same object in the plurality of scene images exceeds a preset number threshold value or not;
if yes, determining the gesture images which exceed a preset number threshold and belong to the same object as the gesture images of the target object.
2. The method of claim 1, wherein determining a hand swing direction of the target object from the gesture image of the target object comprises:
analyzing a hand swing trend of the target object according to the gesture image of the target object;
determining a hand swing direction of the target object according to the hand swing trend.
3. The method according to claim 1, wherein the querying the instruction library according to the gesture recognition result to determine the corresponding control instruction comprises:
inquiring an instruction library according to the hand waving direction of the target object to determine a corresponding control instruction; the command library comprises preset different hand waving directions and corresponding control commands.
4. The method of claim 1, wherein the controlling the robot to perform corresponding operations according to the control instructions comprises:
judging whether the target object moves or not at preset time intervals;
if so, tracking the target object, and controlling the robot to complete the control instruction.
5. The method of claim 1, wherein the robot is provided with a plurality of sensors, and wherein determining the position of the target object based on the voice information comprises:
acquiring the receiving time of the voice information by each sensor;
calculating a time difference between the receiving time with the shortest time and each of the remaining receiving times;
and determining the position of the target object according to the positions of the sensors, the sound propagation speed and the calculated time differences.
6. A robot control apparatus, characterized in that the apparatus comprises:
the receiving and extracting module is used for receiving voice information and extracting character information in the voice information;
the judging module is used for judging whether the character information has preset characters or not;
the determining module is used for determining the position of a target object according to the voice information after judging that the character information has preset characters;
the gesture recognition module is used for carrying out gesture recognition on the target object to obtain a gesture recognition result;
the query control module is used for querying an instruction library according to the gesture recognition result to determine a corresponding control instruction and controlling the robot to execute corresponding operation according to the control instruction; the command library comprises preset different gesture recognition results and corresponding control commands;
the gesture recognition module is specifically configured to continuously acquire a plurality of scene images including the target object;
recognizing a gesture image of the target object in each scene image;
judging whether the hand swing amplitude of the target object exceeds a threshold value, if so, determining the hand swing direction of the target object according to the gesture image of the target object;
the recognizing the gesture image of the target object in each scene image comprises the following steps:
extracting a gesture image in each scene image;
judging whether the number of the gesture images belonging to the same object in the plurality of scene images exceeds a preset number threshold value or not;
if yes, determining the gesture images which exceed a preset number threshold and belong to the same object as the gesture images of the target object.
7. An electronic device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.
8. A non-transitory readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5.
CN201910719457.0A 2019-08-05 2019-08-05 Robot control method, device and storage medium Active CN110434853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910719457.0A CN110434853B (en) 2019-08-05 2019-08-05 Robot control method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910719457.0A CN110434853B (en) 2019-08-05 2019-08-05 Robot control method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110434853A CN110434853A (en) 2019-11-12
CN110434853B (en) 2021-05-14

Family

ID=68433338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910719457.0A Active CN110434853B (en) 2019-08-05 2019-08-05 Robot control method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110434853B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113303708A (en) * 2020-02-27 2021-08-27 佛山市云米电器科技有限公司 Control method for maintenance device, and storage medium
CN113779184A (en) * 2020-06-09 2021-12-10 大众问问(北京)信息科技有限公司 Information interaction method and device and electronic equipment
CN111994299A (en) * 2020-08-25 2020-11-27 新石器慧义知行智驰(北京)科技有限公司 Unmanned vehicle baggage consignment method, device and medium
CN112684887A (en) * 2020-12-28 2021-04-20 展讯通信(上海)有限公司 Application device and air gesture recognition method thereof
CN113510707A (en) * 2021-07-23 2021-10-19 上海擎朗智能科技有限公司 Robot control method and device, electronic equipment and storage medium
CN113552949A (en) * 2021-07-30 2021-10-26 北京凯华美亚科技有限公司 Multifunctional immersive audio-visual interaction method, device and system
CN113854904B (en) * 2021-09-29 2023-07-04 北京石头世纪科技股份有限公司 Control method and device of cleaning equipment, cleaning equipment and storage medium
CN113909743A (en) * 2021-09-30 2022-01-11 北京博清科技有限公司 Welding control method, control device and welding system
CN116098536A (en) * 2021-11-08 2023-05-12 青岛海尔科技有限公司 Robot control method and device
CN114237068B (en) * 2021-12-20 2024-05-03 珠海格力电器股份有限公司 Intelligent device control method, module, intelligent device and storage medium
CN114327056A (en) * 2021-12-23 2022-04-12 新疆爱华盈通信息技术有限公司 Target object control method, device and storage medium
CN114428506B (en) * 2022-04-06 2023-03-24 北京云迹科技股份有限公司 Service robot control method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2367140A1 (en) * 2010-03-15 2011-09-21 OMRON Corporation, a corporation of Japan Gesture recognition apparatus, method for controlling gesture recognition apparatus, and control program
CN105867630A (en) * 2016-04-21 2016-08-17 深圳前海勇艺达机器人有限公司 Robot gesture recognition method and device and robot system
CN106203259A (en) * 2016-06-27 2016-12-07 旗瀚科技股份有限公司 The mutual direction regulating method of robot and device
CN107765855A (en) * 2017-10-25 2018-03-06 电子科技大学 A kind of method and system based on gesture identification control machine people motion
CN108596092A (en) * 2018-04-24 2018-09-28 亮风台(上海)信息科技有限公司 Gesture identification method, device, equipment and storage medium
CN109313485A (en) * 2017-02-18 2019-02-05 广州艾若博机器人科技有限公司 Robot control method, device and robot based on gesture identification
CN110083243A (en) * 2019-04-29 2019-08-02 深圳前海微众银行股份有限公司 Exchange method, device, robot and readable storage medium storing program for executing based on camera

Also Published As

Publication number Publication date
CN110434853A (en) 2019-11-12

Similar Documents

Publication Publication Date Title
CN110434853B (en) Robot control method, device and storage medium
CN109325967B (en) Target tracking method, device, medium, and apparatus
CN107336243B (en) Robot control system and control method based on intelligent mobile terminal
US7028269B1 (en) Multi-modal video target acquisition and re-direction system and method
US8373654B2 (en) Image based motion gesture recognition method and system thereof
US10971152B2 (en) Imaging control method and apparatus, control device, and imaging device
WO2022110614A1 (en) Gesture recognition method and apparatus, electronic device, and storage medium
CN105163034B (en) A kind of photographic method and mobile terminal
CN108241434A (en) Man-machine interaction method, device, medium and mobile terminal based on depth of view information
CN102103409A (en) Man-machine interaction method and device based on motion trail identification
KR102321562B1 (en) Dynamic motion detection method, dynamic motion control method and apparatus
US11023717B2 (en) Method, apparatus, device and system for processing commodity identification and storage medium
CN115131821A (en) Improved YOLOv5+ Deepsort-based campus personnel crossing warning line detection method
JP2011170711A (en) Moving object tracking system and moving object tracking method
CN112119627A (en) Target following method and device based on holder, holder and computer storage medium
CN112527113A (en) Method and apparatus for training gesture recognition and gesture recognition network, medium, and device
CN107194968A (en) Recognition and tracking method, device, intelligent terminal and the readable storage medium storing program for executing of image
CN114463368A (en) Target tracking method and device, electronic equipment and computer readable storage medium
CN112188105A (en) Tracking shooting method and device, intelligent device and computer readable storage medium
CN113934307B (en) Method for starting electronic equipment according to gestures and scenes
WO2020042807A1 (en) Target function calling method and apparatus, mobile terminal and storage medium
WO2024012367A1 (en) Visual-target tracking method and apparatus, and device and storage medium
CN110858291A (en) Character segmentation method and device
CN115421591B (en) Gesture control device and image pickup apparatus
AU2020294217A1 (en) Gesture recognition method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 201, building 4, courtyard 8, Dongbeiwang West Road, Haidian District, Beijing

Patentee after: Beijing Yunji Technology Co.,Ltd.

Address before: Room 201, building 4, courtyard 8, Dongbeiwang West Road, Haidian District, Beijing

Patentee before: BEIJING YUNJI TECHNOLOGY Co.,Ltd.