CN114495273A - Robot gesture teleoperation method and related device - Google Patents

Robot gesture teleoperation method and related device

Info

Publication number
CN114495273A
CN114495273A
Authority
CN
China
Prior art keywords
gesture
information
hand
key point
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210081710.6A
Other languages
Chinese (zh)
Inventor
高庆
陈勇全
池楚亮
王启文
沈文心
房俊雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese University of Hong Kong Shenzhen
Original Assignee
Chinese University of Hong Kong Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese University of Hong Kong Shenzhen filed Critical Chinese University of Hong Kong Shenzhen
Priority to CN202210081710.6A priority Critical patent/CN114495273A/en
Publication of CN114495273A publication Critical patent/CN114495273A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures

Abstract

The embodiment of the application discloses a robot gesture teleoperation method, which comprises the following steps: acquiring gesture image information of a user; obtaining gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information according to the gesture image information; estimating key point center information and gesture direction information through the gesture image information; obtaining gesture category information according to the gesture box information, the gesture confidence and the key point center information, wherein the gesture category information represents the left and right categories of the hand of the user; and controlling the action of the manipulator of the robot through first target gesture information, which comprises the gesture category information, the gesture three-dimensional position information, the gesture direction information and the gesture posture information.

Description

Robot gesture teleoperation method and related device
Technical Field
The embodiment of the application relates to the field of vision, in particular to a robot gesture teleoperation method and a related device.
Background
Teleoperation refers to controlling remote equipment, with human supervision and participation, to accomplish complex operations in an environment far away from the operator. Teleoperation shows remarkable social value in post-disaster rescue sites and other special scenarios. Teleoperation that imitates the gestures of a person's two hands retains the dexterity of human hands and allows mechanical equipment to be remotely controlled to complete complex operations.
Existing schemes based on data gloves and electromyography (EMG) wristbands can remotely control a robot to complete corresponding actions. However, vision-based detection and differentiation of the two hands remains a difficulty in visual teleoperation. When the two hands show different gestures, the difference between the gestures is far greater than the difference caused by the asymmetry of the hands; existing methods only extract features of the hands themselves and therefore cannot reliably distinguish the left hand from the right hand, which brings inconvenience to users.
Disclosure of Invention
The embodiment of the application provides a robot gesture teleoperation method and a related device.
A method of gesture teleoperation of a robot, comprising:
acquiring gesture image information of a user, wherein the gesture image information is image information representing hand motions of the user;
obtaining gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information according to the gesture image information, wherein the gesture box information is information representing the range of the hand of the user, the gesture confidence represents the probability that the gesture posture information conforms to the finger posture of the user, and the gesture posture information represents the posture with the maximum probability that the gesture posture conforms to the finger posture of the user in a gesture database;
estimating key point center information and gesture direction information through the gesture image information, wherein the key point center information is center position information representing the estimated hand, and the gesture direction information is direction information representing the hand direction of the user;
obtaining gesture category information according to the gesture box information, the gesture confidence and the key point center information, wherein the gesture category information represents left and right categories of the hand of the user;
and controlling the action of a manipulator of the robot through first target gesture information, wherein the first target gesture information comprises the gesture category information, the gesture three-dimensional position information, the gesture direction information and the gesture posture information.
Optionally, obtaining gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information according to the gesture image information includes:
performing feature extraction on a color image to obtain a feature map, wherein the color image is part of the information of the gesture image;
detecting the feature map to obtain the gesture three-dimensional position information and the gesture box information based on a depth image, wherein the depth image is partial information of the gesture image information;
and classifying the gesture box information to obtain the gesture posture information and the gesture confidence.
Optionally, the estimating of the center information of the key point and the gesture direction information through the gesture image information includes:
determining two-dimensional information of key points according to the color image;
determining the three-dimensional information of the key points through the two-dimensional information of the key points based on the depth image;
and calculating the center information of the key points and the gesture direction information according to the three-dimensional information of the key points.
Optionally, obtaining gesture category information according to the gesture box information, the gesture confidence and the key point center information includes:
judging whether the gesture confidence is greater than a confidence threshold, wherein the confidence threshold is a preset constant value;
if yes, judging whether the key point center information is matched with the gesture box information;
if so, determining the gesture category information according to the key point center information matched with the gesture box information;
if not, judging whether the distance between the estimated center position of the hand in the key point center information and the center position of the hand judged in the gesture three-dimensional position information is smaller than a distance threshold value or not;
and if so, determining the gesture category information according to the category of the estimated hand center position whose distance to the determined hand center position is smaller than the distance threshold.
Optionally, the controlling the robot arm action of the robot through the first target gesture information includes:
calculating according to the gesture three-dimensional position information to obtain manipulator position information;
calculating according to the gesture direction information to obtain manipulator direction information;
and controlling the action of the manipulator through second target gesture information, wherein the second target gesture information comprises the gesture category information, the manipulator position information, the manipulator direction information and the gesture posture information.
Optionally, determining the three-dimensional information of the keypoint through the two-dimensional information of the keypoint based on the depth image includes:
calculating to obtain the three-dimensional information of the key points by the following formula:
z_i = d_i / s
x_i = (u_i - cx) · z_i / fx
y_i = (v_i - cy) · z_i / fy
wherein z_i, x_i and y_i are the three-dimensional coordinate information of the i-th key point, namely the key point three-dimensional information;
d_i is the depth value of the i-th key point obtained from the depth image;
s is the depth range of the depth image;
u_i and v_i are the two-dimensional coordinate information of the i-th key point, namely the key point two-dimensional information;
cx and cy are the coordinate values of the central point of the depth image;
fx is the focal length of the camera on the x-axis, and fy is the focal length of the camera on the y-axis.
Optionally, calculating the key point center information through the key point three-dimensional information includes:
calculating the key point center information by the following formula:
X = (x_1 + x_2 + … + x_k) / k
Y = (y_1 + y_2 + … + y_k) / k
Z = (z_1 + z_2 + … + z_k) / k
wherein X, Y and Z are the estimated center coordinate information of the hand of the corresponding category;
z_i, x_i and y_i are the three-dimensional coordinate information of the i-th key point of the hand of the corresponding category;
k is the total number of key points of the hand of the corresponding category.
A robotic gesture teleoperational device, comprising:
the device comprises an acquisition unit, a display unit and a control unit, wherein the acquisition unit is used for acquiring gesture image information of a user, and the gesture image information is image information representing hand motions of the user;
the processing unit is used for obtaining gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information according to the gesture image information, wherein the gesture box information is information representing the range of the hand of the user, the gesture confidence represents the probability that the gesture posture information conforms to the finger posture of the user, and the gesture posture information represents the posture with the maximum probability that the gesture posture conforms to the finger posture of the user in a gesture database;
the estimation unit is used for estimating key point center information and gesture direction information through the gesture image information, wherein the key point center information is center position information representing estimated hands, and the gesture direction information is direction information representing the hand direction of the user;
the processing unit is further configured to obtain gesture category information according to the gesture box information, the gesture confidence and the key point center information, where the gesture category information represents left and right categories of the hand of the user;
and the control unit is used for controlling the action of a manipulator of the robot through first target gesture information, and the first target gesture information comprises the gesture category information, the gesture three-dimensional position information, the gesture direction information and the gesture posture information.
A robotic gesture teleoperational device, comprising:
the system comprises a central processing unit, a memory and an input/output interface;
the memory is a transient storage memory or a persistent storage memory;
the central processor is configured to communicate with the memory and execute the instruction operations in the memory to perform the aforementioned methods.
A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the aforementioned method.
According to the technical scheme, the embodiment of the application has the following advantages:
after gesture image information of a user is collected, gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information are obtained according to the gesture image information, and then key point center information and gesture direction information are estimated through the gesture image information. And then obtaining gesture category information according to the gesture box information, the gesture confidence and the key point center information. And finally, controlling the action of the manipulator of the robot through the first target gesture information. The left hand and the right hand can be distinguished through the gesture box information, the gesture confidence and the key point center information, and left and right categories of the hand of the user, namely gesture category information, are obtained. And corresponding remote operation is carried out on the left manipulator and the right manipulator of the robot by combining other information, so that the efficiency is improved, and better experience is provided for a user.
Drawings
FIG. 1 is a schematic diagram illustrating a gesture remote operation according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an embodiment of a method for robot gesture teleoperation according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of another embodiment of a method for robot gesture teleoperation according to an embodiment of the present disclosure;
FIG. 4 is a topological diagram of key points of a human body according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an embodiment of a robot gesture teleoperation device according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of another embodiment of a robot gesture teleoperation device according to the present application.
Detailed Description
The embodiment of the application provides a robot gesture teleoperation method and a related device.
Traditional robot teleoperation controls a conventional mechanical arm through devices such as a mouse and a keyboard. However, such devices cannot easily and effectively control a robot with two manipulators and cannot distinguish between the left and right hands of the user. Existing remote control methods for dual-arm robots that imitate a person's two hands and arms rely on wearable interactive equipment, which restricts the motion of the user's hands; moreover, these methods only extract features of the hands and therefore cannot reliably distinguish the left hand from the right hand. The gesture teleoperation method provided by the embodiment of the application frees the user from interactive equipment and distinguishes the user's left and right hands to achieve efficient control.
Referring to fig. 1, in the teleoperation according to the embodiment of the present application, gesture image information of a user is acquired by an RGB-D camera, processed, and then mapped to the manipulators of a dual-arm robot, that is, five-finger dexterous hands. The five-finger dexterous hand grips an object on the table according to the user's gesture to complete the remote operation. The dual-arm robot may be a Baxter robot or another robot, which is not limited herein. Each arm of the Baxter dual-arm robot has 6 degrees of freedom, supports basic path planning and motion control, and the two arms can cooperate and avoid colliding with each other during complex operations. The manipulator at the end of each arm also has 6 degrees of freedom, including 5 bending degrees of freedom for the 5 fingers and 1 rotational degree of freedom for the thumb. Different objects can be grasped or manipulated through different gesture postures.
To specifically describe the gesture teleoperation method in the embodiment of the present application, referring to fig. 2, an embodiment of the gesture teleoperation method for a robot in the embodiment of the present application includes:
201. acquiring gesture image information of a user;
Gesture image information of the user is acquired. The gesture image information is image information representing the hand motions of the user. Specifically, the user does not need to wear any interactive device and simply performs gesture motions within the shooting range of the RGB-D camera, which acquires the related image information. The RGB-D camera captures RGB color images and depth images of the user's hands to generate the relevant three-dimensional information, so the gesture image information includes a color image and a depth image.
It is understood that the RGB-D camera may be a RealSense D435i camera, a Kinect camera, or any other camera capable of acquiring a depth image and a two-dimensional color image, which is not limited herein.
202. Obtaining gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information according to the gesture image information;
The gesture image information is processed to obtain gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information. Specifically, feature extraction is performed on the color image in the gesture image information to obtain a feature map, and the feature map is detected in combination with the depth image to obtain the gesture box information and the gesture three-dimensional position information. The gesture box information is then input into an EfficientNet-B5 classifier, compared against the gesture postures in a gesture posture library, and the most similar gesture posture is selected to determine the gesture posture information; the corresponding gesture confidence is obtained from this comparison. The gesture posture information is the information representing the posture in the gesture database with the maximum probability of conforming to the finger posture of the user.
203. Estimating key point center information and gesture direction information through the gesture image information;
Key point center information and gesture direction information are estimated from the gesture image information. Specifically, the color image can be analyzed by a BlazePose network to determine the two-dimensional information of the key points, which is then combined with the depth image to obtain the three-dimensional information of the key points. BlazePose is a lightweight convolutional neural network framework for human pose estimation and real-time inference; during inference, the network generates 33 body key points and runs at more than 30 frames per second. The center position information of the hand, namely the key point center information, can be estimated from the three-dimensional information of the hand key points. A coordinate system can be established from the key point three-dimensional information, and the gesture direction information can be determined using the quaternion rotation principle. The gesture direction information is information indicating the orientation of the user's hand.
204. Obtaining gesture category information according to the gesture box information, the gesture confidence and the key point center information;
and analyzing and judging according to the gesture box information, the gesture confidence and the key point center information to obtain gesture category information. The gesture type information represents left and right types of the user's hand and is used for distinguishing the left hand from the right hand of the user. Specifically, the gesture category information is obtained by comparing the gesture confidence with a preset confidence threshold, and judging whether the center information of the key point is matched with the gesture frame information or judging whether the distance between the estimated center position of the hand in the center information of the key point and the center position of the hand judged in the three-dimensional position information of the gesture is smaller than a distance threshold.
205. Controlling the action of a manipulator of the robot through the first target gesture information;
and after the gesture type information is obtained, controlling the action of a manipulator of the robot through the first target gesture information. The first target gesture information comprises gesture category information, gesture three-dimensional position information, gesture direction information and gesture posture information. And processing the first target gesture information to obtain second target gesture information, and mapping the second target gesture information to a manipulator of the double-arm robot so as to control the manipulator to execute corresponding actions.
In the embodiment of the application, after gesture image information of a user is collected, gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information are obtained from the gesture image information, and key point center information and gesture direction information are then estimated from the gesture image information. Gesture category information is then obtained from the gesture box information, the gesture confidence and the key point center information. Finally, the action of the manipulator is controlled through the first target gesture information. The left hand and the right hand can be distinguished through the gesture box information, the gesture confidence and the key point center information, yielding the left and right categories of the user's hand, namely the gesture category information. Combined with the other information, the left and right manipulators of the robot can be teleoperated correspondingly, which improves efficiency. In addition, because the hand motions are captured without wearable interactive equipment, that is, gesture image information is collected and processed into first target gesture information according to which the manipulator is remotely controlled to execute the corresponding actions, the user's hands can move freely during teleoperation, providing a better experience for the user.
Referring to fig. 3, another embodiment of a method for robot gesture remote operation according to the embodiment of the present application includes:
301. acquiring gesture image information of a user;
Gesture image information of the user is acquired. The gesture image information is image information representing the hand motions of the user. Specifically, the user does not need to wear any interactive device and simply performs gesture motions within the shooting range of the RGB-D camera, which acquires the related image information. The RGB-D camera captures RGB color images and depth images of the user's hands to generate the relevant three-dimensional information, so the gesture image information includes a color image and a depth image.
It is understood that the RGB-D camera may be a RealSense D435i camera, a Kinect camera, or any other camera capable of acquiring a depth image and a two-dimensional color image, which is not limited herein.
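As a rough illustration of this acquisition step, the following Python sketch grabs one aligned color/depth frame pair from a RealSense camera through the pyrealsense2 SDK; the stream resolutions, frame rate and alignment choice are illustrative assumptions rather than values taken from the patent.

```python
import numpy as np
import pyrealsense2 as rs

# Minimal RGB-D capture sketch (assumed configuration, not the patent's exact setup).
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)
align = rs.align(rs.stream.color)            # align depth pixels to the color image

frames = align.process(pipeline.wait_for_frames())
color_image = np.asanyarray(frames.get_color_frame().get_data())   # H x W x 3, BGR
depth_image = np.asanyarray(frames.get_depth_frame().get_data())   # H x W, uint16 depth
pipeline.stop()
```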
302. Carrying out feature extraction on the color image to obtain a feature map;
and performing feature extraction on the color image to obtain a feature map. Specifically, the color image is input into an input layer of the convolutional neural network, and then feature extraction is performed on the convolutional layer, so that a feature map of the hand of the user is obtained finally. It is understood that the type of convolutional neural network may be a network such as a DLA-34 network, which can be used for detecting and performing feature extraction, and is not limited herein.
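The feature-extraction step can be sketched with any detection backbone; the snippet below uses a torchvision ResNet-18 trunk purely as a stand-in for the DLA-34-style network mentioned above, so the layer choice and output shape are illustrative assumptions, not the patent's network.

```python
import torch
from torchvision import models

# Stand-in backbone: ResNet-18 up to its last convolutional stage, used only to
# illustrate "color image in, feature map out"; the patent mentions a DLA-34-type network.
backbone = torch.nn.Sequential(*list(models.resnet18(weights=None).children())[:-2])
backbone.eval()

def extract_feature_map(color_tensor):
    """color_tensor: float tensor of shape (1, 3, H, W) built from the RGB color image."""
    with torch.no_grad():
        return backbone(color_tensor)    # feature map of shape (1, 512, H/32, W/32)
```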
303. Detecting the feature map based on the depth image to obtain the gesture three-dimensional position information and the gesture box information;
The feature map is detected based on the depth image to obtain the gesture three-dimensional position information and the gesture box information. Specifically, a coordinate system is established, and the two-dimensional center coordinates of the hand and the width and height of the box representing the range of the hand are detected from the feature map. An offset can be set according to practical experience. Coordinate mapping with the depth image then yields the gesture three-dimensional position information and the gesture box information. For example, if the two-dimensional center coordinates of the hand detected from the feature map are (x, y), the width of the box is w, the height of the box is h, the offset is (Δx, Δy) and the depth value obtained from the depth image is z, then the corrected two-dimensional center coordinates of the hand are (x + Δx, y + Δy), the coordinates of one corner of the hand box are (x + Δx + w/2, y + Δy + h/2), and the other three corners follow by analogy. The gesture three-dimensional position information is (x + Δx, y + Δy, z), and the gesture box information H_box = (x, y, z, w, h) represents the coordinate range with (x + Δx, y + Δy, z) as the center coordinate and w, h as the width and height of the hand.
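To make the coordinate bookkeeping in this example concrete, the sketch below assembles the corrected hand center, the gesture three-dimensional position and the gesture box information from one detection; the depth lookup at the box center and the variable names are illustrative assumptions.

```python
import numpy as np

def build_gesture_box(x, y, w, h, dx, dy, depth_image):
    """(x, y): detected 2-D hand center; (w, h): box size; (dx, dy): offset.
    Returns the gesture three-dimensional position and the gesture box information H_box."""
    uc, vc = x + dx, y + dy                       # corrected 2-D center of the hand
    z = float(depth_image[int(vc), int(uc)])      # depth value looked up at the hand center
    position_3d = (uc, vc, z)                     # gesture three-dimensional position
    h_box = (x, y, z, w, h)                       # gesture box information
    corner = (uc + w / 2, vc + h / 2)             # one corner point; the others follow by analogy
    return position_3d, h_box, corner
```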
304. Classifying the gesture box information to obtain the gesture posture information and the gesture confidence;
The gesture box information is classified to obtain the gesture posture information. Specifically, after the range of the hand box is determined, the feature map is cropped according to the box, and the cropped hand image is input into an EfficientNet-B5 classifier for classification. The gesture posture library is preset and can include a number of existing gesture postures, such as 24 hand-signed English letters or 10 hand-signed Arabic numerals. The classifier compares against the gesture posture library, finds the gesture posture with the maximum probability of matching the user's hand, determines that posture as the gesture posture information, and determines the probability as the gesture confidence. For example, suppose the gesture posture in the cropped image is the hand sign for the number 1, but the gesture posture library happens to contain no posture for the number 1, only postures for the numbers 2, 3, 4 and 5, with probabilities of 90%, 80%, 70% and 60% respectively. By comparison, the posture of the number 2, which has the highest probability, is determined as the gesture posture information, and the gesture confidence is determined to be 90%.
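A hedged sketch of this classification step is shown below: the torchvision EfficientNet-B5 model is used as an assumed stand-in for the patent's classifier, with its head resized to the gesture posture library; the number of classes is an illustrative value.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hypothetical classifier: an EfficientNet-B5 backbone whose final layer has been
# re-trained on the preset gesture posture library (num_classes categories).
num_classes = 34          # e.g. 24 letters + 10 digits; illustrative value
model = models.efficientnet_b5(weights=None)
model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, num_classes)
model.eval()

def classify_gesture(hand_crop):
    """hand_crop: float tensor of shape (1, 3, H, W), the image cropped by the gesture box."""
    with torch.no_grad():
        probs = F.softmax(model(hand_crop), dim=1)
    confidence, class_id = probs.max(dim=1)       # top-1 probability serves as the gesture confidence
    return class_id.item(), confidence.item()
```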
305. Determining two-dimensional information of key points according to the color image;
Two-dimensional information of the key points is determined from the color image, which can be analyzed with a BlazePose network. BlazePose is a lightweight convolutional neural network framework for human pose estimation and real-time inference. Referring to fig. 4, the network generates the two-dimensional key point information according to a topological graph of human body key points: during inference it produces 33 body key points and runs at more than 30 frames per second. In practice, the 11 face key points are not needed; only the remaining 22 key points, especially those of the left and right hands, need to be acquired.
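For reference, BlazePose is available through the MediaPipe Python package; the sketch below extracts the 33 key points in pixel coordinates, with the model options chosen as illustrative assumptions.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def extract_keypoints_2d(color_image_bgr):
    """Return the 33 BlazePose key points in pixel coordinates, or None if no person is found."""
    h, w = color_image_bgr.shape[:2]
    with mp_pose.Pose(static_image_mode=True, model_complexity=1) as pose:
        results = pose.process(cv2.cvtColor(color_image_bgr, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return None
    # Landmarks are normalized to [0, 1]; convert them to pixel coordinates (u_i, v_i).
    return [(lm.x * w, lm.y * h) for lm in results.pose_landmarks.landmark]
```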
306. Determining the three-dimensional information of the key points through the two-dimensional information of the key points based on the depth image;
and determining three-dimensional information of the key points through the two-dimensional information of the key points based on the depth image. Specifically, the three-dimensional information of the key points is calculated by the following formula:
z_i = d_i / s
x_i = (u_i - cx) · z_i / fx
y_i = (v_i - cy) · z_i / fy
wherein z_i, x_i and y_i are the three-dimensional coordinate information of the i-th key point, namely the key point three-dimensional information;
d_i is the depth value of the i-th key point obtained from the depth image;
s is the depth range of the depth image;
u_i and v_i are the two-dimensional coordinate information of the i-th key point, namely the key point two-dimensional information;
cx and cy are the coordinate values of the central point of the depth image;
fx is the focal length of the camera on the x-axis, and fy is the focal length of the camera on the y-axis.
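A minimal sketch of the back-projection written out above; dividing the depth value by s follows the reconstruction of the formula given here, and fx, fy, cx, cy are assumed to come from the depth camera intrinsics.

```python
import numpy as np

def backproject_keypoints(keypoints_2d, depth_image, fx, fy, cx, cy, s):
    """keypoints_2d: list of (u_i, v_i) pixel coordinates; returns a list of (x_i, y_i, z_i)."""
    points_3d = []
    for u, v in keypoints_2d:
        d = float(depth_image[int(v), int(u)])    # depth value d_i at the key point
        z = d / s                                 # z_i = d_i / s (reconstructed reading of the formula)
        x = (u - cx) * z / fx                     # x_i = (u_i - cx) * z_i / fx
        y = (v - cy) * z / fy                     # y_i = (v_i - cy) * z_i / fy
        points_3d.append((x, y, z))
    return points_3d
```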
307. Calculating the center information of the key points and the gesture direction information according to the three-dimensional information of the key points;
and calculating key point center information and gesture direction information through the key point three-dimensional information. Specifically, the key point center information is calculated by the following formula:
X = (x_1 + x_2 + … + x_k) / k
Y = (y_1 + y_2 + … + y_k) / k
Z = (z_1 + z_2 + … + z_k) / k
wherein X, Y and Z are the estimated center coordinate information of the hand of the corresponding category;
z_i, x_i and y_i are the three-dimensional coordinate information of the i-th key point of the hand of the corresponding category;
k is the total number of key points of the hand of the corresponding category.
Taking the left hand as an example, please refer to fig. 4: the key points of the left hand are the 15th, 17th, 19th and 21st points, so the value of k is 4, and the center coordinate information of the left hand is:
X_left = (x_15 + x_17 + x_19 + x_21) / 4
Y_left = (y_15 + y_17 + y_19 + y_21) / 4
Z_left = (z_15 + z_17 + z_19 + z_21) / 4
and so on for the right hand.
A coordinate system is then established from the key point three-dimensional information. Specifically, the z-axis is defined as the unit normal vector of the plane formed by the 15th, 17th and 19th points, the x-axis is defined as the unit direction vector from the 15th point to the left-hand center point, and the y-axis is defined as the unit vector obtained by the cross product of the x-axis and the z-axis. From this coordinate system, the gesture direction information can be obtained using the quaternion rotation principle.
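The frame construction and quaternion conversion can be sketched as follows; the re-orthogonalization of the x-axis and the SciPy quaternion convention (x, y, z, w) are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def left_hand_orientation(p15, p17, p19, hand_center):
    """p15, p17, p19: 3-D key points of the left hand; hand_center: estimated left-hand center."""
    p15, p17, p19, hand_center = map(np.asarray, (p15, p17, p19, hand_center))
    z_axis = np.cross(p17 - p15, p19 - p15)           # normal of the plane through points 15, 17, 19
    z_axis /= np.linalg.norm(z_axis)
    x_axis = hand_center - p15                        # from point 15 towards the hand center
    x_axis -= np.dot(x_axis, z_axis) * z_axis         # project into the plane so the frame is orthonormal (assumption)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)                 # completes the right-handed frame
    rotation = np.column_stack((x_axis, y_axis, z_axis))
    return R.from_matrix(rotation).as_quat()          # gesture direction as a quaternion (x, y, z, w)
```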
308. Judging whether the gesture confidence is greater than a confidence threshold; if yes, executing step 309, and if not, executing step 310. The confidence threshold is a preset constant value in the range of 0.5 to 1, and is generally set to 0.85 according to practical experience;
309. judging whether the key point center information is matched with the gesture box information, if so, executing a step 312, and if not, executing a step 311;
specifically, whether the estimated center position of the hand in the key point center information is in a square frame of the hand is judged, and if the estimated center position of the hand is in the square frame, the type of the hand can be determined.
310. Judging whether the distance between the estimated center position of the hand in the center information of the key points and the center position of the hand judged in the three-dimensional position information of the gesture is smaller than a distance threshold value, if so, executing a step 313, and if not, executing a step 311, wherein the distance threshold value is a constant value preset according to practical experience;
311. ending the control of the manipulator;
the control of the robot is ended. The type of the hand cannot be determined, and the remote control is ended to prevent the manipulator from performing a malfunction.
312. Determining the gesture category information according to the key point center information matched with the gesture box information;
and determining gesture category information according to the key point center information matched with the gesture box information. If the category of the key point center information matched with the gesture frame information is left, the gesture category information represents the information of the left hand, and if the category of the key point center information matched with the gesture frame information is right, the gesture category information represents the information of the right hand. For example, if the estimated center position of the left hand is within the frame of the hand, the gesture category information is determined as the left category, i.e., the hand of the user is the left hand. And determining the gesture category information as a right category if the estimated center position of the right hand is in the square frame of the hand, namely the hand of the user is the right hand.
313. Determining the gesture category information according to the category of the estimated central position of the hand, wherein the distance between the estimated central position of the hand and the determined central position of the hand is smaller than the distance threshold;
and determining gesture category information according to the category of the estimated central position of the hand, wherein the distance between the estimated central position of the hand and the determined central position of the hand is smaller than the distance threshold. For example, the estimated center position of the left hand in the center information of the key points is point P1, the estimated center position of the right hand is point P2, the center position of the hand determined in the three-dimensional position information of the gesture is point P3, and the distance threshold is set to 5. If the distance between P1 and P3 is 2, the distance between P2 and P3 is 7, and 2 < 5 < 7, the gesture type information is determined to be in a left type, that is, the hand of the user is the left hand.
314. Calculating according to the gesture three-dimensional position information to obtain manipulator position information;
The manipulator position information is calculated from the gesture three-dimensional position information. In order to control the manipulator efficiently, the gesture three-dimensional position information needs to be processed into manipulator position information. Specifically, the manipulator position information is calculated by a position increment formula (the formula is given as an image in the original publication), in which:
the three-dimensional center coordinate information of the manipulator in the i-th frame is the manipulator position information, where i is greater than or equal to 2;
the three-dimensional center coordinate information of the user's hand in the i-th frame is the gesture three-dimensional position information;
k_p is the position increment coefficient, which can be preset according to practical experience and is generally set to 1.
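Since the position formula itself is only available as an image in the publication, the sketch below shows one plausible reading of a position increment mapping with coefficient k_p; it is an assumption, not the patent's exact formula.

```python
import numpy as np

def update_manipulator_position(p_manip_prev, p_hand_curr, p_hand_prev, k_p=1.0):
    """Assumed increment form: move the manipulator by k_p times the change in the
    hand's three-dimensional center between consecutive frames."""
    delta = np.asarray(p_hand_curr, dtype=float) - np.asarray(p_hand_prev, dtype=float)
    return np.asarray(p_manip_prev, dtype=float) + k_p * delta
```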
315. Calculating according to the gesture direction information to obtain manipulator direction information;
The manipulator direction information is calculated from the gesture direction information. In order to control the manipulator efficiently, the gesture direction information needs to be processed into manipulator direction information. Specifically, the manipulator direction information is calculated by a direction increment formula (the formula is given as an image in the original publication), in which:
the direction information of the manipulator in the i-th frame is the manipulator direction information;
the direction information of the user's hand in the i-th frame is the gesture direction information;
k_o is the direction increment coefficient, which can be preset according to practical experience and is generally set to 1.
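Similarly, the direction formula is only given as an image; the sketch below applies one plausible increment update in which the hand's frame-to-frame rotation, scaled by k_o, is applied to the manipulator orientation, again as an assumption rather than the patent's exact formula.

```python
from scipy.spatial.transform import Rotation as R

def update_manipulator_orientation(q_manip_prev, q_hand_curr, q_hand_prev, k_o=1.0):
    """Quaternions use SciPy's (x, y, z, w) order; k_o scales the rotation vector of the
    hand's rotation between consecutive frames (assumed form)."""
    rel = R.from_quat(q_hand_curr) * R.from_quat(q_hand_prev).inv()   # hand rotation between frames
    scaled = R.from_rotvec(k_o * rel.as_rotvec())                     # scale the increment by k_o
    return (scaled * R.from_quat(q_manip_prev)).as_quat()
```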
316. Controlling the action of the manipulator through second target gesture information;
and controlling the action of the manipulator through second target gesture information. The second target gesture information comprises gesture category information, manipulator position information, manipulator direction information and gesture posture information. Specifically, the second target gesture information increment is mapped to the manipulator on the double-arm robot so as to remotely control the action of the manipulator.
In this embodiment, after gesture image information of a user is acquired, gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information are obtained from the gesture image information, and key point center information and gesture direction information are then estimated from the gesture image information. Gesture category information is then obtained from the gesture box information, the gesture confidence and the key point center information. Finally, the action of the manipulator is controlled through the second target gesture information. Because the hand motions are captured without wearable interactive equipment, that is, gesture image information is collected and processed into first target gesture information according to which the manipulator is remotely controlled to execute the corresponding actions, the user's hands can move freely during teleoperation, providing a better experience for the user. Introducing the key point center information and the gesture confidence into the judgment, on top of the detected gesture box information and gesture three-dimensional position information, allows the left and right hand categories to be determined and improves the accuracy of the executed actions.
Referring to fig. 5, an embodiment of a robot gesture remote operation device according to the embodiment of the present application includes:
the device comprises an acquisition unit 501, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring gesture image information of a user, and the gesture image information is image information representing hand motions of the user;
the processing unit 502 is configured to obtain gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information according to the gesture image information, where the gesture box information is information representing a range of a hand of the user, the gesture confidence represents a probability that the gesture posture information coincides with a finger posture of the user, and the gesture posture information is information representing a posture with a highest probability that the gesture posture coincides with the finger posture of the user in a gesture database;
an estimating unit 503, configured to estimate, through the gesture image information, key point center information and gesture direction information, where the key point center information is center position information representing an estimated hand, and the gesture direction information is orientation information representing a hand of the user;
the processing unit 502 is further configured to obtain gesture category information according to the gesture box information, the gesture confidence and the key point center information, where the gesture category information represents left and right categories of the hand of the user;
a control unit 504, configured to control a manipulator action of the robot through first target gesture information, where the first target gesture information includes the gesture category information, the gesture three-dimensional position information, the gesture direction information, and the gesture posture information.
In the embodiment of the application, after the acquisition unit 501 acquires gesture image information of a user, the processing unit 502 obtains gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information from the gesture image information, and the estimation unit 503 estimates key point center information and gesture direction information from the gesture image information. Gesture category information is then obtained from the gesture box information, the gesture confidence and the key point center information. Finally, the control unit 504 controls the manipulator action according to the first target gesture information. The left hand and the right hand can be distinguished through the gesture box information, the gesture confidence and the key point center information, yielding the left and right categories of the user's hand, namely the gesture category information. In addition, because the hand motions are captured without wearable interactive equipment, that is, gesture image information is acquired and processed into first target gesture information according to which the manipulator is remotely controlled to execute the corresponding actions, the user's hands can move freely during teleoperation, providing a better experience for the user.
The functions and processes executed by each unit in the robot gesture remote operation device of this embodiment are similar to those executed by the gesture remote operation device in fig. 1 to 5, and are not described again here.
Fig. 6 is a schematic structural diagram of a gesture remote operation device according to an embodiment of the present disclosure, where the gesture remote operation device 600 may include one or more Central Processing Units (CPUs) 601 and a memory 605, and the memory 605 stores one or more applications or data therein.
The memory 605 may be volatile storage or persistent storage, among other things. The program stored in memory 605 may include one or more modules, each of which may include a series of instruction operations in a gesture teleoperation device. Still further, the central processor 601 may be configured to communicate with the memory 605 to execute a series of command operations in the memory 605 on the gesture teleoperation device 600.
Gesture teleoperation device 600 may also include one or more power supplies 602, one or more wired or wireless network interfaces 603, one or more input/output interfaces 604, and/or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The central processor 601 can perform the operations performed by the gesture teleoperation device in the embodiments shown in fig. 1 to fig. 5, which are not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.

Claims (10)

1. A robot gesture teleoperation method is characterized by comprising the following steps:
acquiring gesture image information of a user, wherein the gesture image information is image information representing hand motions of the user;
obtaining gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information according to the gesture image information, wherein the gesture box information is information representing the range of the hand of the user, the gesture confidence represents the probability that the gesture posture information conforms to the finger posture of the user, and the gesture posture information represents the posture with the maximum probability that the gesture posture conforms to the finger posture of the user in a gesture database;
estimating key point center information and gesture direction information through the gesture image information, wherein the key point center information is center position information representing the estimated hand, and the gesture direction information is direction information representing the hand direction of the user;
obtaining gesture category information according to the gesture box information, the gesture confidence and the key point center information, wherein the gesture category information represents left and right categories of the hand of the user;
and controlling the action of a manipulator of the robot through first target gesture information, wherein the first target gesture information comprises the gesture category information, the gesture three-dimensional position information, the gesture direction information and the gesture posture information.
2. The gesture teleoperation method according to claim 1, wherein obtaining gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information according to the gesture image information comprises:
performing feature extraction on a color image to obtain a feature map, wherein the color image is part of the information of the gesture image;
detecting the feature map to obtain the gesture three-dimensional position information and the gesture box information based on a depth image, wherein the depth image is partial information of the gesture image information;
and classifying the gesture box information to obtain the gesture posture information and the gesture confidence.
3. The gesture teleoperation method according to claim 2, wherein the estimating of the center information of the key point and the gesture direction information through the gesture image information comprises:
determining two-dimensional information of key points according to the color image;
determining the three-dimensional information of the key points through the two-dimensional information of the key points based on the depth image;
and calculating the center information of the key points and the gesture direction information according to the three-dimensional information of the key points.
4. The gesture teleoperation method according to claim 3, wherein obtaining gesture category information according to the gesture box information, the gesture confidence and the key point center information comprises:
judging whether the gesture confidence is greater than a confidence threshold, wherein the confidence threshold is a preset constant value;
if yes, judging whether the key point center information is matched with the gesture box information;
if so, determining the gesture category information according to the key point center information matched with the gesture box information;
if not, judging whether the distance between the estimated center position of the hand in the key point center information and the center position of the hand judged in the gesture three-dimensional position information is smaller than a distance threshold value or not;
and if so, determining the gesture category information according to the category of the estimated hand center position whose distance to the determined hand center position is smaller than the distance threshold.
5. The gesture teleoperation method of claim 4, wherein controlling the manipulator motion of the robot through the first target gesture information comprises:
calculating according to the gesture three-dimensional position information to obtain manipulator position information;
calculating according to the gesture direction information to obtain manipulator direction information;
and controlling the action of the manipulator through second target gesture information, wherein the second target gesture information comprises the gesture category information, the manipulator position information, the manipulator direction information and the gesture posture information.
6. The gesture teleoperation method of claim 3, wherein determining the three-dimensional keypoint information from the two-dimensional keypoint information based on the depth image comprises:
calculating to obtain the three-dimensional information of the key points by the following formula:
z_i = d_i / s
x_i = (u_i - cx) · z_i / fx
y_i = (v_i - cy) · z_i / fy
wherein z_i, x_i and y_i are the three-dimensional coordinate information of the i-th key point, namely the key point three-dimensional information;
d_i is the depth value of the i-th key point obtained from the depth image;
s is the depth range of the depth image;
u_i and v_i are the two-dimensional coordinate information of the i-th key point, namely the key point two-dimensional information;
cx and cy are the coordinate values of the central point of the depth image;
fx is the focal length of the camera on the x-axis, and fy is the focal length of the camera on the y-axis.
7. The gesture teleoperation method according to claim 3, wherein the step of calculating the key point center information through the key point three-dimensional information comprises:
calculating the key point center information by the following formula:
X = (x_1 + x_2 + … + x_k) / k
Y = (y_1 + y_2 + … + y_k) / k
Z = (z_1 + z_2 + … + z_k) / k
wherein X, Y and Z are the estimated center coordinate information of the hand of the corresponding category;
z_i, x_i and y_i are the three-dimensional coordinate information of the i-th key point of the hand of the corresponding category;
k is the total number of key points of the hand of the corresponding category.
8. A robotic gesture teleoperation device, comprising:
the device comprises an acquisition unit, a display unit and a control unit, wherein the acquisition unit is used for acquiring gesture image information of a user, and the gesture image information is image information representing hand motions of the user;
the processing unit is used for obtaining gesture box information, gesture three-dimensional position information, gesture confidence and gesture posture information according to the gesture image information, wherein the gesture box information is information representing the range of the hand of the user, the gesture confidence represents the probability that the gesture posture information conforms to the finger posture of the user, and the gesture posture information represents the posture with the maximum probability of conforming to the finger posture of the user in a gesture database;
the estimation unit is used for estimating key point center information and gesture direction information through the gesture image information, wherein the key point center information is center position information representing estimated hands, and the gesture direction information is direction information representing the hand direction of the user;
the processing unit is further configured to obtain gesture category information according to the gesture box information, the gesture confidence and the key point center information, where the gesture category information represents left and right categories of the hand of the user;
and the control unit is used for controlling the action of a manipulator of the robot through first target gesture information, and the first target gesture information comprises the gesture category information, the gesture three-dimensional position information, the gesture direction information and the gesture posture information.
9. A robotic gesture teleoperation device, comprising:
the system comprises a central processing unit, a memory and an input/output interface;
the memory is a transient memory or a persistent memory;
the central processor is configured to communicate with the memory and execute the operations of the instructions in the memory to perform the method of any of claims 1 to 7.
10. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 7.
CN202210081710.6A 2022-01-24 2022-01-24 Robot gesture teleoperation method and related device Pending CN114495273A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210081710.6A CN114495273A (en) 2022-01-24 2022-01-24 Robot gesture teleoperation method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210081710.6A CN114495273A (en) 2022-01-24 2022-01-24 Robot gesture teleoperation method and related device

Publications (1)

Publication Number Publication Date
CN114495273A true CN114495273A (en) 2022-05-13

Family

ID=81475639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210081710.6A Pending CN114495273A (en) 2022-01-24 2022-01-24 Robot gesture teleoperation method and related device

Country Status (1)

Country Link
CN (1) CN114495273A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116269455A (en) * 2023-03-20 2023-06-23 瑞石心禾(河北)医疗科技有限公司 Detection method and system for automatically acquiring human body contour in SPECT (single photon emission computed tomography)
CN116269455B (en) * 2023-03-20 2023-12-12 瑞石心禾(河北)医疗科技有限公司 Detection method and system for automatically acquiring human body contour in SPECT (single photon emission computed tomography)
CN116884095A (en) * 2023-09-08 2023-10-13 烟台大学 Gesture recognition control method, system, equipment and storage medium of bionic manipulator
CN116884095B (en) * 2023-09-08 2023-11-21 烟台大学 Gesture recognition control method, system, equipment and storage medium of bionic manipulator

Similar Documents

Publication Publication Date Title
Bhuyan et al. Fingertip detection for hand pose recognition
Hasanuzzaman et al. Real-time vision-based gesture recognition for human robot interaction
CN111694428B (en) Gesture and track remote control robot system based on Kinect
CN114495273A (en) Robot gesture teleoperation method and related device
Triesch et al. Robotic gesture recognition
CN108171133A (en) A kind of dynamic gesture identification method of feature based covariance matrix
JP6487642B2 (en) A method of detecting a finger shape, a program thereof, a storage medium of the program, and a system for detecting a shape of a finger.
CN111367415B (en) Equipment control method and device, computer equipment and medium
CN110807391A (en) Human body posture instruction identification method for human-unmanned aerial vehicle interaction based on vision
Kabir et al. A novel dynamic hand gesture and movement trajectory recognition model for non-touch HRI interface
CN110910426A (en) Action process and action trend identification method, storage medium and electronic device
Li et al. Visual interpretation of natural pointing gestures in 3D space for human-robot interaction
Chai et al. Human gait recognition: approaches, datasets and challenges
Takimoto et al. Classification of hand postures based on 3d vision model for human-robot interaction
Liao et al. Design of real-time face position tracking and gesture recognition system based on image segmentation algorithm
Xu et al. A novel method for hand posture recognition based on depth information descriptor
JPH09179988A (en) Gesture recognition device
Tu et al. Face and gesture based human computer interaction
CN113822946B (en) Mechanical arm grabbing method based on computer vision
Hiyadi et al. Adaptive dynamic time warping for recognition of natural gestures
Thomas et al. A comprehensive review on vision based hand gesture recognition technology
Shah et al. Gesture recognition technique: a review
Ardizzone et al. Pose classification using support vector machines
CN112975993A (en) Robot teaching method, device, storage medium and equipment
Dornaika et al. Three-dimensional face pose detection and tracking using monocular videos: Tool and application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination