CN112036213A - Gesture positioning method of robot, robot and device - Google Patents

Gesture positioning method of robot, robot and device

Info

Publication number
CN112036213A
Authority
CN
China
Prior art keywords
image
camera
target
robot
acquisition
Prior art date
Legal status
Pending
Application number
CN201910478634.0A
Other languages
Chinese (zh)
Inventor
叶树策
刘林
张俊旗
陈洋铂
Current Assignee
Anker Innovations Co Ltd
Original Assignee
Anker Innovations Co Ltd
Priority date
Filing date
Publication date
Application filed by Anker Innovations Co Ltd filed Critical Anker Innovations Co Ltd
Priority to CN201910478634.0A
Publication of CN112036213A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application relates to the technical field of intelligent equipment, and particularly discloses a gesture positioning method of a robot, a robot and a device. The robot comprises a camera, and the method comprises the following steps: controlling the camera to perform rotational acquisition of a human body target so as to obtain a first image; judging whether the first image contains an acquisition target; if so, controlling the camera to track the acquisition target so as to obtain a second image; and recognizing gesture information contained in the second image. In this way, human body gestures can be accurately positioned.

Description

Gesture positioning method of robot, robot and device
Technical Field
The application relates to the technical field of intelligent equipment, in particular to a gesture positioning method of a robot, the robot and a device.
Background
With the rapid development of the information age, the cleaning robot is widely applied and is a popular research direction in the field of service robots at present.
With a certain degree of artificial intelligence, a cleaning robot can automatically perform cleaning work on floors, windows and the like; for example, a floor sweeping robot is mainly used for cleaning the floor, and a window cleaning robot is mainly used for cleaning glass.
In long-term research, the inventor of the present application has found that existing cleaning robots are not intelligent enough: during cleaning, they cannot accurately position human gestures or correctly execute the corresponding gesture instructions.
Disclosure of Invention
Accordingly, there is a need for a gesture positioning method of a robot, a robot and a device that can accurately position human body gestures.
In order to solve the technical problem, the application adopts a technical scheme that: a gesture positioning method of a robot is provided, wherein the robot comprises a camera, and the method comprises the following steps: controlling the camera to perform rotational acquisition of a human body target so as to obtain a first image; judging whether the first image contains an acquisition target; if so, controlling the camera to track the acquisition target so as to obtain a second image; and recognizing gesture information contained in the second image.
In order to solve the above technical problem, another technical solution adopted by the present application is: a robot is provided, comprising: a camera, used for performing rotational acquisition of a human body target to obtain a first image; a judging module, used for judging whether the first image contains an acquisition target; the camera is also used for tracking the acquisition target when the first image contains the acquisition target, so as to obtain a second image; and a recognition module, used for recognizing gesture information contained in the second image.
In order to solve the above technical problem, yet another technical solution adopted by the present application is: a device having a storage function is provided, the device storing a computer program; the computer program, when executed by a processor, is capable of performing the gesture positioning method described above.
The beneficial effect of this application is: different from the prior art, the camera is controlled to perform rotational acquisition of a human body target so as to obtain a first image; whether the first image contains an acquisition target is judged; if so, the camera is controlled to track the acquisition target so as to obtain a second image; and gesture information contained in the second image is recognized. The robot is thus controlled to rotate and acquire images to find the acquisition target and then to track it, so that the acquisition target is positioned automatically, the gesture information corresponding to the acquisition target is recognized, and the robot can be controlled through the gesture information to execute the corresponding instruction.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Wherein:
FIG. 1 is a schematic flowchart of an embodiment of a gesture positioning method of a robot according to the present application;
FIG. 2 is a schematic flowchart of another embodiment of a gesture positioning method of a robot according to the present application;
FIG. 3 is a schematic flowchart of S20 in FIG. 1;
FIG. 4 is a schematic flowchart of S30 in FIG. 1;
FIG. 5 is a schematic flowchart of a gesture positioning method of a robot according to yet another embodiment of the present application;
FIG. 6 is a schematic flowchart of a gesture positioning method of a robot according to still another embodiment of the present application;
FIG. 7 is a schematic flowchart of a gesture positioning method of a robot according to a further embodiment of the present application;
FIG. 8 is a schematic structural diagram of an embodiment of the robot of the present application;
FIG. 9 is a schematic structural diagram of another embodiment of the robot of the present application;
FIG. 10 is a schematic structural diagram of the device with a storage function of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In a specific implementation, the robot is a machine device that automatically executes work: it can receive human commands, run pre-programmed programs, and act according to principles formulated with artificial intelligence technology, and its task is to assist or replace human work such as production, construction, or cleaning. The following describes the gesture positioning method of a robot and the robot provided by the present application in detail with reference to fig. 1 to 7, taking a cleaning robot as an example.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an embodiment of a gesture positioning method of a robot according to the present application. In a specific implementation, the robot described in the embodiments of the present application has a camera, for example a rotating camera. Further, the rotary camera of the robot can perform rotary acquisition or video recording according to a preset rotary direction, a preset rotary speed, a preset rotary acceleration and a preset rotary angle. The method comprises the following steps:
s10: and controlling the camera to rotationally collect the human body target so as to obtain a first image.
Specifically, while the robot moves, the camera shoots to obtain the first image. The camera can capture both pictures and videos; when capturing pictures, it can operate at certain time intervals.
After the robot starts the rotation acquisition mode or receives the rotation acquisition request, the robot can be triggered to start the camera, and the rotation acquisition mode of the camera is started. For example, the robot may trigger a start key of the camera to start a rotational capture mode of the camera through the request. After the robot starts the rotation collection mode of the camera, the camera can be controlled to perform rotation collection according to rotation data such as preset rotation direction, rotation speed or rotation acceleration.
In some possible embodiments, the robot may preset a rotation mode in which the camera is in the rotation capture mode to capture images, and the preset rotation direction or rotation angle of the robot may be preset when the robot leaves a factory, or may be preset according to parameters set by a user of the robot. Specifically, a user of the robot may input a customized rotation parameter, including a rotation direction or a rotation angle, an acceleration, and the like, on a user operation interface corresponding to a setting function module through the built-in setting function module of the robot, so as to modify a rotation mode of the robot in factory settings. The robot can obtain rotation parameters such as a rotation direction, a rotation angle or a rotation angular velocity input by a robot user, and the rotation parameters input by the robot user are used for replacing parameters set by the robot in a factory, so that the rotation parameters defined by the user are obtained.
After the robot starts the rotation acquisition mode of the camera, it can trigger the camera to rotate in the preset rotation direction and, according to the preset rotation angle, control the camera to rotate to a specified angle and shoot, so as to obtain a picture at the specified angle. In a specific implementation, the robot may acquire one or more preset rotation angles, rotate the camera in the preset rotation direction to the position corresponding to each rotation angle, and capture the picture corresponding to that rotation angle. The image corresponding to a rotation angle is the first image within the shooting range faced by the front of the camera lens when the camera has rotated to the position corresponding to that rotation angle. The preset rotation angle of the camera can be any angle from 0 degrees to 360 degrees.
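As a minimal sketch of this rotate-and-capture step, the following assumes a hypothetical Gimbal interface for the rotating mount and a standard OpenCV VideoCapture source; the robot's actual camera-control API is not specified in the disclosure, and the preset angles are illustrative.

```python
import cv2

class Gimbal:
    """Hypothetical stand-in for the robot's rotating camera mount."""
    def rotate_to(self, angle_deg: float) -> None:
        # In a real robot this would drive the pan motor; here it is a stub.
        print(f"rotating camera to {angle_deg} degrees")

def capture_first_images(preset_angles, device_index=0):
    """Rotate the camera to each preset angle and grab one frame per angle."""
    gimbal = Gimbal()
    cap = cv2.VideoCapture(device_index)
    frames = []
    try:
        for angle in preset_angles:
            gimbal.rotate_to(angle)      # rotate in the preset direction to the preset angle
            ok, frame = cap.read()       # "first image" at this angle
            if ok:
                frames.append((angle, frame))
    finally:
        cap.release()
    return frames

# Example: a 0-180 degree sweep in 45-degree steps (values are illustrative).
first_images = capture_first_images([0, 45, 90, 135, 180])
```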
S20: and judging whether the first image contains the acquisition target.
Specifically, feature information may be extracted from the first image, and whether or not the acquisition target exists in the first image may be determined using the established acquisition target recognition model.
If yes, go to S30.
S30: and controlling the camera to track and collect the collected target so as to obtain a second image.
Specifically, if the first image contains the acquisition target, the specific position of the acquisition target in the robot's surroundings is determined, and the shooting parameters of the camera can be adjusted according to the position change information of the acquisition target, so that the display parameters of the acquisition target in the shot picture fall within a predefined parameter value range, thereby obtaining the second image.
For example, the display parameter may include a display position of the acquisition target in the shot picture, and the parameter value range may include a coordinate value corresponding to a middle region or other preset region of the shot picture (for example, a coordinate system is established with an upper left corner of the shot picture as an origin and an edge as an axis, so as to determine the coordinate value). Preferably, the acquisition target is in the middle area of the shot picture.
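As a sketch of this display-parameter check, the following assumes the "middle area" is the central third of the frame in both axes (the disclosure only requires some predefined coordinate range) and tests whether the bounding-box center of the acquisition target falls inside it.

```python
def in_middle_region(bbox, frame_w, frame_h):
    """bbox = (x, y, w, h) in pixels, origin at the top-left corner of the frame."""
    cx = bbox[0] + bbox[2] / 2.0
    cy = bbox[1] + bbox[3] / 2.0
    # Central third of the frame; the actual parameter value range is implementation-defined.
    return (frame_w / 3 <= cx <= 2 * frame_w / 3) and (frame_h / 3 <= cy <= 2 * frame_h / 3)

# Example: a 100x120 box at (270, 180) in a 640x480 frame lies in the middle region.
print(in_middle_region((270, 180, 100, 120), 640, 480))  # True
```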
S40: gesture information contained in the second image is identified.
Specifically, mean filtering may be applied to the acquired second image to remove image noise. The second image is then converted from the RGB color space into the YCrCb space, an ellipse skin-color model is established for skin color detection, and the gesture area is segmented and binarized. A convolutional neural network model and its parameter optimizer are constructed, and a classifier with optimal performance is obtained using training data. Static gesture recognition is performed according to the gesture information in the recognition queue, and if it succeeds, dynamic gesture recognition is then performed according to the gesture information in the recognition queue.
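A minimal OpenCV sketch of the preprocessing stage just described (mean filtering, conversion to YCrCb, skin-color thresholding and binarization). The Cr/Cb bounds are common textbook values rather than values from the disclosure, and the elliptical skin model and CNN classifier are omitted.

```python
import cv2
import numpy as np

def segment_gesture_region(bgr_frame):
    """Denoise, convert to YCrCb, threshold skin color and return a binary mask."""
    blurred = cv2.blur(bgr_frame, (5, 5))               # mean filtering to suppress noise
    ycrcb = cv2.cvtColor(blurred, cv2.COLOR_BGR2YCrCb)  # color-space conversion
    # Typical skin-color bounds in the CrCb plane (illustrative values).
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)             # binarized skin region
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    return mask
```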
In other embodiments, gestures may also be distinguished by the number of extended fingers: a second image is acquired through an RGBD camera such as a Kinect, the second image is preprocessed through color space conversion, skin color extraction, binarization and morphological processing, a rectangular gesture region is extracted by a recognition algorithm, and gesture recognition is performed according to the change of pixel values in that region.
An interactive gesture library may be provided for recognizing specific gestures given by the user, such as pointing to a specific location to be cleaned or directing the robot to move to a specific location.
Different from the prior art, in this embodiment the camera is controlled to perform rotational acquisition so as to obtain a first image; whether the first image contains an acquisition target is judged; if so, the camera is controlled to track the acquisition target so as to obtain a second image; and gesture information contained in the second image is recognized. The robot is controlled to rotate and acquire images to find the acquisition target and then to track it, so that the acquisition target is positioned automatically, the gesture information corresponding to the acquisition target is recognized, and the robot can be controlled through the gesture information to execute the corresponding instruction.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating another embodiment of a gesture locating method of a robot according to the present application, before S10, the method further includes the following steps:
s50: and controlling the lens of the camera to face the same direction as the advancing direction of the robot, and obtaining a third image.
Wherein, the lens orientation of camera is the same with the direction of advance of robot.
Specifically, the shooting range of the camera lens may be a picture in the advancing direction of the robot, and the camera may be mounted at a suitable position on the robot to shoot a picture within a certain field of view in the advancing direction of the robot, so as to obtain the third image. For example, when the robot and the camera face the same direction, the third image within the shooting range faced by the front of the camera is a picture within the shooting range faced by the front of the robot; when their orientations are not consistent, for example when the front of the robot and the front of the camera form an angle of 180 degrees, the third image within the shooting range faced by the front of the camera is a picture within the shooting range faced by the back of the robot.
S60: and carrying out human body recognition on the third image, and judging whether the third image contains a human body target.
Specifically, image pixel features in the third image are extracted and input into a human body deep learning model for recognition and classification. Whether the classification of the image pixel features matches the human body features in the human body deep learning model is then judged; if it matches, it is determined that the third image contains a human body target. The human body deep learning model may be a random forest model, a regression self-organizing neural network model, a Deep Belief Network (DBN) model, or the like; in this embodiment, a DBN model is preferably used. The DBN model has multiple hidden neural network layers, can better handle complex functions, and shows better generalization performance when dealing with complex classification problems.
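The DBN itself is not disclosed; purely as an illustrative stand-in for the human body recognition step, the sketch below uses OpenCV's built-in HOG pedestrian detector to decide whether the third image contains a human target. The detector parameters are common defaults, not values from the disclosure.

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def contains_human(bgr_frame):
    """Return True if at least one person-like region is detected in the frame."""
    rects, weights = hog.detectMultiScale(bgr_frame, winStride=(8, 8), scale=1.05)
    return len(rects) > 0
```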
If yes, go to S10. When the third image contains the human body target, the camera is controlled to rotate and collect according to preset rotation parameters.
Different from the prior art, the embodiment has the advantages that the human body recognition is carried out on the third image, so that the recognition process is simplified, the recognition efficiency is improved, and the real-time performance is better.
In an embodiment, the rotation parameters of the camera include at least a turning direction, a rotation speed and a rotation angle. The rotation angle of the camera is any angle from 0 degree to 360 degrees, and the rotation direction of the camera is an X-axis axial direction, a Y-axis axial direction or a Z-axis axial direction.
Specifically, the rotation parameters of the camera may include: the rotation direction (or referred to as rotation direction), the rotation angle, the rotation speed, or the rotation acceleration, etc., are not limited herein. For example, the camera performs 0-360 degree rotation acquisition in the X-axis direction, or performs 0-360 degree rotation acquisition in the Z-axis direction.
In an embodiment, S10 may specifically include: and controlling the camera to perform 0-180-degree rotation acquisition in the Z-axis axial direction so as to obtain a first image.
Referring to fig. 3, fig. 3 is a schematic flow chart of S20 in fig. 1, and S20 includes the following steps:
s201: and carrying out extraction processing on the first image to extract the characteristic information of the first image.
Specifically, the camera can identify the confirmed target area by adopting a remote HSV color and obtain an RGB color image, and the depth camera is adopted to obtain a depth image. RGB feature information of the first image may be obtained by SURF feature point detection.
S202: and matching the characteristic information of the first image with the pre-stored characteristic information of the acquired target through a target identification algorithm.
Specifically, the RGB feature information is feature-matched with pre-stored acquisition target RGB feature information.
The target recognition algorithm is a human hand recognition algorithm, whose main steps are as follows: 1) hand acquisition: the required human hand image is obtained from the first image. 2) Hand region cropping: the hand region is cropped out according to the skin color of the hand. 3) Image preprocessing: denoising, light compensation, smoothing, contrast enhancement and other processing are applied to the image. 4) Hand detection: specific parts of the hand, such as the fingers, palm and wrist, are marked. 5) Hand feature extraction: features are extracted, including the distance between two eyes, the inclination angle of the wrist and the centers of gravity of the wrist and fingers; the features are labeled and the feature values are stored in a hand library. 6) Hand comparison: the features of the currently acquired image are compared with the features in the hand library and a similarity result is given.
If the matching is successful, the process proceeds to S203.
S203: and determining that the first image contains the acquisition target.
Specifically, if the existing object model is met, it is determined that the first image includes the acquisition target.
The above-described embodiments enable a more accurate recognition result.
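A minimal sketch of the feature matching in S201-S203. The disclosure mentions SURF feature points, but SURF sits in OpenCV's non-free contrib module, so this sketch substitutes ORB descriptors as a stand-in; the target_template, distance cut-off and min_matches threshold are illustrative assumptions rather than values from the disclosure.

```python
import cv2

orb = cv2.ORB_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def _gray(img):
    """Convert a BGR image to grayscale if needed."""
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img

def contains_acquisition_target(first_image, target_template, min_matches=25):
    """Match descriptors of the first image against a stored hand template."""
    _, desc_img = orb.detectAndCompute(_gray(first_image), None)
    _, desc_tpl = orb.detectAndCompute(_gray(target_template), None)
    if desc_img is None or desc_tpl is None:
        return False
    matches = matcher.match(desc_tpl, desc_img)
    # Keep only reasonably close matches; thresholds are illustrative.
    good = [m for m in matches if m.distance < 60]
    return len(good) >= min_matches
```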
Referring to fig. 4, fig. 4 is a schematic flowchart of S30 in fig. 1, and S30 includes the following steps:
s301: when the first image comprises the acquisition target, the camera is controlled to position the acquisition target so that the acquisition target is always positioned in the center of the lens of the camera.
Specifically, the preset rotation parameters of the camera can be adjusted according to the position change information of the acquisition target, so that the acquisition target is positioned in the center of the lens of the camera.
When the position change information comprises moving information in the horizontal direction, the lens of the camera can be controlled to synchronously rotate around the vertical axis; for example, when the collection target moves horizontally to the left, the lens of the camera may be controlled to rotate to the left about the vertical axis, and when the collection target moves horizontally to the right, the lens of the camera may be controlled to rotate to the right about the vertical axis.
When the position change information comprises moving information in the vertical direction, the lens of the camera can be controlled to synchronously rotate around the horizontal axis; for example, when the capture object moves vertically downward (more than a lower step), the lens of the camera may be controlled to rotate downward about the horizontal axis, and when the capture object moves vertically upward (such as an upper step), the lens of the camera may be controlled to rotate upward about the horizontal axis.
When the position change information comprises a change in the separation distance between the acquisition target and the lens of the camera, the lens of the camera is controlled to perform synchronous picture zooming; for example, when the acquisition target moves towards the lens, the separation distance becomes smaller and the lens can be controlled to zoom the picture out, and when the acquisition target moves away from the lens, the separation distance becomes larger and the lens can be controlled to zoom the picture in. Of course, a change in the position of the acquisition target may involve the horizontal direction (left-right), the vertical direction (up-down) and the lens direction (front-back) at the same time, and the shooting parameters can be adjusted according to the above scheme to adapt to the relative position change of the acquisition target.
Furthermore, the position of the acquisition target in the picture and its deviation from the central area of the lens can be judged from the predicted position, size and speed information of the acquisition target, and the camera can execute a 3D algorithm at a corresponding speed to move the acquisition target to the central area of the lens.
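A sketch of the centering logic of S301, assuming a hypothetical pan/tilt interface driven in degrees; the gains, dead band and sign conventions are illustrative assumptions, not parameters given in the disclosure.

```python
def center_target(bbox, frame_w, frame_h, pan_gain=0.05, tilt_gain=0.05, dead_band=20):
    """Return (pan_step, tilt_step) in degrees that move the target towards the lens center.

    bbox = (x, y, w, h); positive pan turns right, positive tilt turns up (a convention
    chosen for this sketch, not specified in the disclosure).
    """
    cx = bbox[0] + bbox[2] / 2.0
    cy = bbox[1] + bbox[3] / 2.0
    dx = cx - frame_w / 2.0    # horizontal offset from the lens center
    dy = cy - frame_h / 2.0    # vertical offset from the lens center (image y grows downward)
    pan = pan_gain * dx if abs(dx) > dead_band else 0.0
    tilt = -tilt_gain * dy if abs(dy) > dead_band else 0.0
    return round(pan, 2), round(tilt, 2)

# Example: a target left of and below the center yields a leftward pan and downward tilt.
print(center_target((100, 300, 80, 80), 640, 480))  # (-9.0, -5.0)
```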
Referring to fig. 5, fig. 5 is a schematic flowchart illustrating a gesture positioning method of a robot according to another embodiment of the present application, after S30, the method further includes the following steps:
s70: and judging whether the acquisition target is lost within the preset time.
Specifically, after the acquisition target has been accurately extracted, object tracking can be performed in subsequent image frames using the previously extracted acquisition target features. The specific tracking method is to perform gray-value (brightness) matching of the acquisition target across successive image frames: if the acquisition target can be well matched within the determined search range, it is considered not to be lost; if it cannot be matched, it is considered lost.
In other embodiments, a target detection model may be selected and used to analyze the second image. According to the image data of the second image and the target detection model, a truncation threshold corresponding to the image data is determined, and whether the acquisition target still exists is determined according to the truncation threshold. Determining whether the target is lost in the second image based on the truncation threshold comprises: for at least one continuous sequence of second-image frames, obtaining each frame in real time through the target detection model, and determining that the acquisition target in the second image is lost if the normalized primary-to-secondary peak ratio of the filter response map of a frame is less than the truncation threshold.
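A sketch of the gray-value matching check described above, using normalized cross-correlation template matching; the 0.5 cut-off merely stands in for the truncation threshold, whose value the disclosure does not fix.

```python
import cv2

def target_lost(frame_gray, target_patch_gray, threshold=0.5):
    """Return True if the previously extracted target patch no longer matches the frame well."""
    result = cv2.matchTemplate(frame_gray, target_patch_gray, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, _ = cv2.minMaxLoc(result)
    return max_val < threshold  # poor best match within the search range -> target lost
```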
If the acquisition target is lost, S50-S60 is re-executed.
Specifically, when the acquisition target is lost, the third image in the shooting range with the front face of the camera facing is acquired again. And when the third image contains the human body target, executing S10, and controlling the camera to rotate and collect according to preset rotation parameters again to obtain the first image.
The preset time may be 10-60 seconds, for example, 10 seconds, 30 seconds, or 60 seconds.
Different from the prior art, the embodiment can enable the robot to autonomously judge that the acquisition target is lost, so that a corresponding coping method is made.
Referring to fig. 6, fig. 6 is a schematic flowchart illustrating a gesture positioning method of the robot according to still another embodiment of the present application, after S40, the method further includes the following steps:
s80: and converting the gesture information into a corresponding control instruction.
Specifically, the robot may recognize the collected dynamic image information by using a depth map-based 3D gesture recognition technology, recognize complete gesture skeleton information of the hand of the user, including spatial coordinates, spatial degrees of freedom, and the like of each joint point of the hand, and then judge to obtain effective gesture information based on the gesture skeleton information. And searching gesture information in a preset gesture control list, if so, executing S90, otherwise, not performing any operation and response.
The mapping relation between the gesture information and the control instruction in the gesture control list can be set according to the type of the remote-controlled equipment.
The mapping relationship between the gesture information and the control instruction may include: the gesture information "fist making" corresponds to the control instruction "hovering", the gesture information "hand upward displacement" corresponds to the control instruction "rising", the gesture information "hand downward displacement" corresponds to the control instruction "falling", the gesture information "palm counterclockwise rotation" corresponds to the control instruction "turning left", the gesture information "palm clockwise rotation" corresponds to the control instruction "turning right", the gesture information "palm left inclined" corresponds to the control instruction "left inclined", the gesture information "palm right inclined" corresponds to the control instruction "right inclined", the gesture information "palm forward inclined" corresponds to the control instruction "forward inclined", and the gesture information "palm backward inclined" corresponds to the control instruction "backward inclined".
In other embodiments, when the robot is a sweeping robot, the mapping relationship between the gesture information and the control instruction may include: the gesture information "OK" corresponds to a control instruction "start", the gesture information "fist making" corresponds to a control instruction "pause", the gesture information "palm open upward" corresponds to a control instruction "forward", the gesture information "palm open downward" corresponds to a control instruction "backward", the gesture information "palm open leftward" corresponds to a control instruction "left turn", and the gesture information "palm open rightward" corresponds to a control instruction "right turn".
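The gesture control list can be held as a simple lookup table. The sketch below encodes the sweeping-robot mapping given above; the English gesture labels are illustrative names, not identifiers from the disclosure.

```python
# Gesture control list for the sweeping-robot embodiment described above.
GESTURE_COMMANDS = {
    "ok": "start",
    "fist": "pause",
    "palm_open_up": "forward",
    "palm_open_down": "backward",
    "palm_open_left": "turn_left",
    "palm_open_right": "turn_right",
}

def to_control_instruction(gesture_label):
    """Return the mapped command, or None so the robot performs no operation or response."""
    return GESTURE_COMMANDS.get(gesture_label)

print(to_control_instruction("fist"))  # pause
print(to_control_instruction("wave"))  # None -> ignore
```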
S90: and executing the control instruction.
The following explains the gesture positioning method of the robot in an implementation scenario. Referring to fig. 7, fig. 7 is a schematic flowchart illustrating a gesture positioning method of a robot according to still another embodiment of the present disclosure. The gesture positioning method of the robot comprises the following steps:
s101: the lens position is initialized so that the lens orientation of the camera is the same as the advancing direction of the robot.
S102: and obtaining a third image in the shooting range with the front face of the camera facing.
S103: and carrying out human body recognition on the third image.
S104: and judging whether the third image contains the human body target.
If yes, entering S105; if not, the process returns to S102.
S105: and controlling the camera to perform 0-180-degree rotation acquisition in the Z-axis axial direction so as to obtain a first image.
S106: and judging whether the first image contains the acquisition target (namely the hand characteristic).
If yes, entering S107; if not, the process returns to S105.
S107: and starting a timer, and controlling the camera to track and collect the collected target so as to obtain a second image.
S108: and under the tracking acquisition mode, judging whether the acquisition target in the second image is lost within the preset time.
If not, the process proceeds to S109. If yes, the camera returns to the initial state, and the operation returns to S101.
S109: and recognizing gesture information contained in the second image, converting the gesture information into a corresponding control instruction, and executing the control instruction.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of the robot of the present application. It should be noted that the robot 10 of this embodiment may perform the steps of the above method; for detailed descriptions of the relevant content, refer to the method section above, which is not repeated here.
The robot 10 is a cleaning robot 10, and the cleaning robot 10 in the present embodiment may be a floor sweeping robot 10, a window wiping robot 10, or another robot 10 engaged in cleaning work, which is not limited herein. The robot 10 includes: the camera 11, the determining module 12 and the identifying module 13, wherein the determining module 12 is coupled to the camera 11 and the identifying module 13.
The camera 11 is used for rotationally collecting a human body target so as to obtain a first image.
The judging module 12 is used for judging whether the first image includes the acquisition target.
The camera 11 is further configured to perform tracking acquisition on the acquisition target when the first image includes the acquisition target, so as to obtain a second image.
The recognition module 13 is configured to recognize gesture information included in the second image.
Different from the prior art, in this embodiment the camera is controlled to perform rotational acquisition so as to obtain a first image; whether the first image contains an acquisition target is judged; if so, the camera is controlled to track the acquisition target so as to obtain a second image; and gesture information contained in the second image is recognized. The robot is controlled to rotate and acquire images to find the acquisition target and then to track it, so that the acquisition target is positioned automatically, the gesture information corresponding to the acquisition target is recognized, and the robot can be controlled through the gesture information to execute the corresponding instruction.
Further, the camera 11 is configured to obtain a third image within a shooting range in which the camera faces in the same direction as the advancing direction of the robot 10;
the identification module 13 is configured to perform human body identification on the third image, and determine whether the third image includes a human body target; when the third image includes the human body target, the camera 11 is configured to perform rotation acquisition according to preset rotation parameters.
The camera 11 is used for performing 0-180-degree rotational acquisition about the Z axis to obtain the first image.
Referring to fig. 9, fig. 9 is a schematic structural diagram of another embodiment of the robot of the present application, and the robot 10 includes: an extraction module 14 and a matching module 15; the extraction module 14 is coupled to the camera 11, and the matching module 15 is coupled to the extraction module 14.
The extraction module 14 is configured to perform extraction processing on the first image to extract feature information of the first image;
the matching module 15 is configured to match the feature information of the first image with pre-stored feature information of the acquisition target by using a target recognition algorithm, and if the matching is successful, it is determined that the first image includes the acquisition target.
When the first image contains the acquisition target, the camera 11 is used for positioning the acquisition target so that the acquisition target is always positioned in the center of the lens of the camera.
The judging module 12 is used for judging whether the acquisition target is lost within a preset time;
if the acquisition target is lost, the camera 11 is used for obtaining a third image in the shooting range of the front face of the camera again;
when the third image includes the human body target, the camera 11 is configured to perform rotation acquisition again according to preset rotation parameters to obtain the first image.
Referring to fig. 10, fig. 10 is a schematic structural diagram of the device with a storage function of the present application, and the device 20 stores a computer program 21. The computer program 21, when executed by a processor, implements the gesture positioning method of the present application.
The device 20 with a storage function may be a portable storage medium such as a USB flash drive or an optical disc, or may be a terminal, a server, or an integrated independent component such as an image processing chip.
Different from the prior art, in this embodiment the camera is controlled to perform rotational acquisition so as to obtain a first image; whether the first image contains an acquisition target is judged; if so, the camera is controlled to track the acquisition target so as to obtain a second image; and gesture information contained in the second image is recognized. The robot is controlled to rotate and acquire images to find the acquisition target and then to track it, so that the acquisition target is positioned automatically, the gesture information corresponding to the acquisition target is recognized, and the robot can be controlled through the gesture information to execute the corresponding instruction.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A gesture positioning method of a robot, the robot comprising a camera, the method comprising:
controlling the camera to rotationally collect a human body target so as to obtain a first image;
judging whether the first image contains an acquisition target or not;
if so, controlling the camera to track and collect the acquisition target so as to obtain a second image;
gesture information included in the second image is identified.
2. The method of claim 1, wherein prior to the step of controlling the camera to rotationally capture the human target to obtain the first image, the method further comprises:
controlling the direction of a lens of the camera to be the same as the advancing direction of the robot, and obtaining a third image;
carrying out human body recognition on the third image, and judging whether the third image contains the human body target;
and when the third image contains the human body target, controlling the camera to rotationally collect the human body target.
3. The method of claim 1,
the rotation angle of the camera is any angle from 0 degree to 360 degrees;
the rotation direction of the camera is the X-axis axial direction, the Y-axis axial direction or the Z-axis axial direction.
4. The method of claim 3, wherein the step of controlling the camera to perform a rotational acquisition of the human target to obtain the first image comprises:
and controlling the camera to perform rotation acquisition of 0-180 degrees in the Z-axis direction so as to obtain a first image.
5. The method of claim 1, wherein the step of determining whether the first image includes an acquisition target comprises:
extracting the first image to extract characteristic information of the first image;
matching the characteristic information of the first image with pre-stored characteristic information of a collected target through a target identification algorithm;
and if the matching is successful, determining that the first image comprises the acquisition target.
6. The method of claim 1, wherein the step of controlling the camera to perform tracking acquisition on the acquisition target to obtain a second image comprises:
and when the first image contains the acquisition target, controlling the camera to position the acquisition target so as to enable the acquisition target to be positioned in the center of the lens of the camera all the time.
7. The method of claim 2, wherein after the step of controlling the camera to perform tracking acquisition on the acquisition target to obtain a second image, the method further comprises:
judging whether the acquisition target is lost within a preset time;
if the acquisition target is lost, the third image in the shooting range with the front face of the camera facing is obtained again;
and when the third image contains the human body target, the camera is controlled again to perform rotation collection according to preset rotation parameters so as to obtain the first image.
8. The method of claim 1, wherein after the step of identifying gesture information contained in the second image, the method further comprises:
converting the gesture information into a corresponding control instruction;
and executing the control instruction.
9. A robot, comprising:
the camera is used for rotationally collecting a human body target to obtain a first image;
the judging module is used for judging whether the first image contains the acquisition target or not;
the camera is further used for tracking and collecting the acquisition target when the first image comprises the acquisition target, so as to obtain a second image;
and the identification module is used for identifying the gesture information contained in the second image.
10. An apparatus having a storage function, characterized in that the apparatus stores a computer program; the computer program, when executed by a processor, is capable of performing the gesture positioning method of any one of claims 1 to 8.
CN201910478634.0A 2019-06-03 2019-06-03 Gesture positioning method of robot, robot and device Pending CN112036213A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910478634.0A CN112036213A (en) 2019-06-03 2019-06-03 Gesture positioning method of robot, robot and device

Publications (1)

Publication Number Publication Date
CN112036213A true CN112036213A (en) 2020-12-04

Family

ID=73576404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910478634.0A Pending CN112036213A (en) 2019-06-03 2019-06-03 Gesture positioning method of robot, robot and device

Country Status (1)

Country Link
CN (1) CN112036213A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170199718A1 (en) * 2016-01-07 2017-07-13 Fiio Electronics Technology Co., Ltd. Method for calling out music playlist by hand gesture
CN109325408A (en) * 2018-08-14 2019-02-12 莆田学院 A kind of gesture judging method and storage medium
CN109613930A (en) * 2018-12-21 2019-04-12 中国科学院自动化研究所南京人工智能芯片创新研究院 Control method, device, unmanned vehicle and the storage medium of unmanned vehicle

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112986912A (en) * 2021-03-19 2021-06-18 北京小狗吸尘器集团股份有限公司 Floor sweeper repositioning method and device based on structured light sensor and electronic equipment
CN112986912B (en) * 2021-03-19 2023-06-16 北京小狗吸尘器集团股份有限公司 Floor sweeper repositioning method and device based on structured light sensor and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination