CN113370217B - Object gesture recognition and grabbing intelligent robot method based on deep learning - Google Patents

Object gesture recognition and grabbing intelligent robot method based on deep learning

Info

Publication number
CN113370217B
CN113370217B
Authority
CN
China
Prior art keywords
data set
camera
data
training
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110732696.7A
Other languages
Chinese (zh)
Other versions
CN113370217A (en)
Inventor
杜广龙 (Du Guanglong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110732696.7A priority Critical patent/CN113370217B/en
Publication of CN113370217A publication Critical patent/CN113370217A/en
Application granted granted Critical
Publication of CN113370217B publication Critical patent/CN113370217B/en
Legal status: Active

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/163 Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based intelligent robot method for object pose recognition and grabbing. The method comprises the following steps: building a virtual environment and a mechanical arm working platform model; based on the constructed virtual environment, randomizing the objects on the virtual model of the mechanical arm working platform and capturing camera images to obtain a dataset; constructing an object posture detector; building a neural network based on the constructed detector and training it with the acquired dataset; and migrating the trained object posture detector to the real platform. A large amount of training data is generated by simulating randomized objects in the virtual environment, a dedicated training procedure yields an object posture detector with stronger generalization ability, and the detector is then migrated to the real platform, realizing pose recognition and grabbing of basic objects.

Description

Object gesture recognition and grabbing intelligent robot method based on deep learning
Technical Field
The invention relates to the field of intelligent robot grabbing, and in particular to a deep-learning-based method by which an intelligent robot recognizes object poses and grabs objects.
Background
In the era of Industry 4.0, diverse robots are entering factories to assist production, replacing humans in dangerous or repetitive work tasks. Intelligent robots do not tire; they simply operate according to trained neural networks or rules, and well-performing intelligent robots are favored by industry and deployed in mass production.
However, with the popularization of industrial robots, concerns about training time and grabbing efficiency have also been raised. Although researchers try to make the training of a robot's neural network as fast and simple as possible, how to shorten the training time and improve the grabbing efficiency of the robot remains an open concern.
At present, the mainstream way to train an intelligent robot is still based on real scenes: training scenes are randomized in real life, and images of these scenes are collected to train the neural network. This approach has a major drawback: randomizing training scenes in the real world consumes a great deal of time, and the time needed to generate each unit of training data is long compared with what a computer simulation can achieve. For intelligent robot training, the time spent on the training process itself is relatively small, while the time spent generating the training dataset dominates; spending a larger share of time producing the training dataset than actually training the robot is hard to accept in practice.
There are also existing schemes that train the robot's neural network in simulation, for example "Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping", IEEE International Conference on Robotics and Automation, 2018. That work presents an attractive alternative: using an off-the-shelf simulator to render synthetic virtual datasets and automatically generate ground-truth annotations for them. It argues that models trained solely on simulation data often cannot generalize to the real world, and studies how to extend randomized simulation environments and domain adaptation methods to train a grasping system that grasps new targets from raw monocular RGB images. It shows that by using synthetic virtual data and domain adaptation, the number of real-world samples required to reach a given performance level can be significantly reduced, relying mainly on randomly generated virtual datasets. However, that technique cannot be trained entirely without real-world data; real-world datasets are still needed, and the overfitting problem of the robot's neural network is not addressed.
Disclosure of Invention
Therefore, to address the shortcomings of the prior art, the invention discloses a deep-learning-based intelligent robot method for object pose recognition and grabbing. In a virtual environment, a randomization algorithm varies the training scenes across more scene and target-object factors, generating more possible training data so as to cover, as far as possible, the different working scenes encountered in industrial production. Because the virtual environment is built by computer, it has an advantage over conventional data-collection approaches in both the speed and the amount of data generated. An intelligent robot trained in this way trains faster than with the traditional approach, and when the model is migrated to a real robot, the wider coverage of the dataset and the optimization against overfitting give it better generalization ability and a stronger practical effect in a shorter time. Training of the intelligent robot's neural network is based entirely on the virtual dataset and does not rely on real-world datasets, which improves training efficiency; the neural network is also forced to attend to the pose features of the object rather than to the relation between pose and background in the training dataset, which reduces overfitting.
The object of the invention is achieved by at least one of the following technical solutions.
The deep-learning-based intelligent robot method for object pose recognition and grabbing comprises the following steps:
s1: building a virtual environment and a mechanical arm working platform model;
s2: based on the virtual environment constructed in the step S1, carrying out randomization treatment on an object on the virtual model of the mechanical arm working platform, and obtaining a camera shooting image to obtain a data set;
s3: constructing an object posture detector;
s4: constructing a neural network based on the object posture detector constructed in the step S3, and training the neural network by using the data set obtained in the step S2;
s5: and (3) migrating the object posture detector trained in the step S4 to a real platform.
Further, step S1 includes the steps of:
s1.1, acquiring the size and the shape of a mechanical arm working platform in a real environment, and constructing the mechanical arm and the mechanical arm working platform in a virtual environment in a one-to-one manner; simultaneously constructing a plurality of object models;
s1.2, splicing the object model obtained in the step S1.1 in a virtual environment, and simulating a real mechanical arm working platform and an actual basic environment.
Further, in step S2, the randomization process includes:
randomizing the appearance and drop positions of a plurality of different object models;
randomizing the color and the material of the object model;
randomizing ambient light.
Further, in step S2, after randomization, RGB images from the camera's viewpoint in the virtual environment are captured as the dataset, and the exact position of each object model in each image is recorded for subsequent verification.
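As an illustration of this data-generation step, the following is a minimal sketch of one randomization-and-capture iteration, assuming PyBullet as the simulator (the patent does not name one); the URDF paths, camera placement, and randomization ranges are illustrative assumptions, not the patent's actual configuration.

```python
# A minimal sketch of the step-S2 randomization and data capture, assuming
# PyBullet as the simulator; paths, camera pose, and ranges are illustrative.
import random
import numpy as np
import pybullet as p
import pybullet_data

def generate_sample(object_urdfs, rng=random.Random()):
    p.resetSimulation()
    p.setGravity(0, 0, -9.8)
    p.setAdditionalSearchPath(pybullet_data.getDataPath())
    p.loadURDF("plane.urdf")                                      # stand-in for the work platform
    urdf = rng.choice(object_urdfs)                               # random object model (appearance)
    pos = [rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2), 0.15]  # random drop position
    body = p.loadURDF(urdf, basePosition=pos)
    p.changeVisualShape(body, -1,                                 # random color
                        rgbaColor=[rng.random(), rng.random(), rng.random(), 1.0])
    for _ in range(240):                                          # let the object settle
        p.stepSimulation()
    view = p.computeViewMatrix([0.5, 0.0, 0.6], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0])
    proj = p.computeProjectionMatrixFOV(fov=60, aspect=1.0, nearVal=0.01, farVal=2.0)
    light = [rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0.5, 1)]  # random light direction
    _, _, rgb, _, _ = p.getCameraImage(640, 640, view, proj, lightDirection=light)
    image = np.reshape(rgb, (640, 640, 4))[..., :3]               # RGB training image
    pose = p.getBasePositionAndOrientation(body)                  # ground-truth position/orientation
    return image, pose

# p.connect(p.DIRECT) must be called once before calling generate_sample().
```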
Further, in step S3, the object posture detector is constructed using the EPnP algorithm and the RANSAC algorithm. PnP (Perspective-n-Point) is the problem of computing the camera pose (or, equivalently, the object pose) given n known correspondences between 3D space points and their 2D image points; various solutions exist, such as Direct Linear Transformation (DLT), P3P, EPnP, UPnP, and nonlinear optimization methods. RANSAC (Random Sample Consensus) estimates the parameters of a mathematical model iteratively from a set of observations containing outliers and is widely used in computer vision; it effectively improves the accuracy of the EPnP object pose estimate. The construction comprises the following steps:
S3.1, using the EPnP algorithm, randomly select n reference points in the space between the workbench and the camera in the virtual environment. Acquire the 3D coordinates of these reference points in the world coordinate system, denoted p_i^w, i = 1, …, n, and at the same time acquire the 2D coordinates of these reference points on the projection plane captured by the camera, denoted u_i, i = 1, …, n;
S3.2, from the n selected reference points, select 4 control points in the world coordinate system and in the camera projection plane respectively, using Principal Component Analysis (PCA), denoted c_j^w, j = 1, …, 4 and c_j^c, j = 1, …, 4, satisfying

p_i^w = Σ_{j=1}^{4} a_ij · c_j^w,  with  Σ_{j=1}^{4} a_ij = 1,

where the a_ij are homogeneous barycentric coordinates. This condition states that the 4 selected control points can represent, by weighting, any 3D reference point in the world coordinate system; in the projection plane, the reference points and control points satisfy the same weighting relation;
S3.3, from the coordinates of the 4 control points in the world coordinate system and in the camera coordinate system obtained in steps S3.1 and S3.2, use a 3D-3D algorithm to obtain the rotation matrix R and the translation vector t, together called the camera external parameter matrix;
S3.4, using the RANSAC algorithm, take the camera external parameter matrix obtained in step S3.3 as the initial hypothesis model and test it against the selected reference points in all the other data of the dataset: the estimated 2D screen coordinates obtained by transforming the 3D space coordinates of a reference point through the camera external parameter matrix are compared with the actual 2D screen coordinates of that reference point obtained in step S3.1, giving an estimated-versus-actual distance error denoted d_mn, where m is the index of the reference point within a single data item and n is the index of the data item in the dataset; a threshold d_0 is set according to the actual accuracy requirement, and if d_mn <= d_0 the reference point is regarded as an inlier, otherwise as an outlier;
S3.5, in the first iteration, randomly select one data item in the dataset to start the iteration, and set the camera external parameter matrix obtained from it as the optimal external parameter matrix;
S3.6, repeat the RANSAC procedure for a number of iterations. Before iterating, a threshold k is set to decide whether the number of inliers obtained in one iteration meets the accuracy requirement; k must not be set too high, in order to prevent overfitting. In each RANSAC iteration, if the ratio of the number of inliers to the total number of reference points is greater than k and the number of inliers exceeds that of the previous optimal external parameter matrix, the camera external parameter matrix of this iteration is set as the optimal external parameter matrix. Iterations continue until finished, yielding the optimal camera external parameter matrix for the dataset. The number of iterations can be set according to the actual situation: in general, more iterations give higher accuracy but at a higher time cost, so a reasonable value must be chosen for the specific case;
S3.7, from the optimal external parameter matrix, obtain the pose of the camera, and from it the pose of the object model in the camera coordinate system.
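To make steps S3.1-S3.2 concrete, the following is a small NumPy sketch of the EPnP construction of control points and homogeneous barycentric coordinates; choosing the centroid plus the three principal directions as control points is the usual EPnP convention and is assumed here, since the patent does not fix this choice explicitly.

```python
# A NumPy sketch of the control-point construction in steps S3.1-S3.2.
import numpy as np

def choose_control_points(ref_pts_w):
    """ref_pts_w: (n, 3) world coordinates of the n reference points."""
    c0 = ref_pts_w.mean(axis=0)                               # first control point: the centroid
    centered = ref_pts_w - c0
    _, s, vt = np.linalg.svd(centered, full_matrices=False)   # PCA via SVD
    scales = s / np.sqrt(len(ref_pts_w))                      # spread along each principal axis
    ctrl = [c0] + [c0 + scales[j] * vt[j] for j in range(3)]
    return np.array(ctrl)                                     # (4, 3) control points c_j^w

def barycentric_weights(ref_pts_w, ctrl_w):
    """Solve for a_ij with p_i = sum_j a_ij * c_j and sum_j a_ij = 1."""
    M = np.vstack([ctrl_w.T, np.ones((1, 4))])                # 4x4 matrix with columns [c_j; 1]
    P = np.vstack([ref_pts_w.T, np.ones((1, len(ref_pts_w)))])
    return np.linalg.solve(M, P).T                            # (n, 4) homogeneous barycentric coords

# Once the control points' camera-frame coordinates are recovered, the same
# weights a_ij reproduce the reference points in the camera frame (step S3.3).
```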
Further, step S4 includes the steps of:
s4.1, adopting Python as a programming language, simultaneously considering flexibility and program size, and adopting an open-source PyTorch deep learning framework to construct a neural network;
S4.2, the invention is intended to be applicable to many scenes and to have strong generalization ability, whereas the dataset generated in step S2 is homogeneous; therefore, to effectively prevent overfitting and to force the neural network to focus on estimating the pose features of the object rather than the relation between pose and background, the dataset generated in step S2 is augmented to obtain an augmented dataset;
S4.3, train the neural network constructed in step S4.1 with the augmented dataset obtained in step S4.2, where 20% of the data is used for training and is referred to as the training dataset, and 80% of the data is used for evaluation and is referred to as the evaluation dataset;
S4.4, set a criterion to evaluate the final effect. Through the training and evaluation in step S4.3, the estimated poses of the object models in the evaluation dataset are obtained; the actual coordinate positions of the object models were already recorded in step S2, so the two sets of data correspond one to one. Using these two sets of data, namely the actual positions from step S2 and the estimated poses on the evaluation dataset, bounding boxes of the object model are constructed with a K-DOP algorithm, called the actual bounding box and the estimated bounding box; a bounding-box collision algorithm then gives the overlap between each estimated bounding box and the corresponding actual bounding box, which is used to judge whether the accuracy criterion is met.
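As an illustration of the evaluation in step S4.4, the sketch below uses an axis-aligned bounding box (the simplest K-DOP, with k = 6) and reports the overlap volume of the estimated box relative to the actual box; the point-cloud pose representation and the 0.9 default threshold (the 90% overlap of the embodiment) are assumptions for illustration.

```python
# A sketch of the step-S4.4 check using axis-aligned bounding boxes.
import numpy as np

def aabb(points):
    """Axis-aligned bounding box of an (n, 3) point set: (min_xyz, max_xyz)."""
    return points.min(axis=0), points.max(axis=0)

def overlap_ratio(box_a, box_b):
    """Volume of the intersection divided by the volume of box_a."""
    lo = np.maximum(box_a[0], box_b[0])
    hi = np.minimum(box_a[1], box_b[1])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(box_a[1] - box_a[0])
    return inter / vol_a if vol_a > 0 else 0.0

def pose_is_accurate(model_pts, T_actual, T_estimated, threshold=0.9):
    """Transform the model points by both 4x4 poses and compare bounding boxes."""
    homog = np.hstack([model_pts, np.ones((len(model_pts), 1))])
    actual_box = aabb((homog @ T_actual.T)[:, :3])
    estimated_box = aabb((homog @ T_estimated.T)[:, :3])
    return overlap_ratio(actual_box, estimated_box) >= threshold
```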
Further, the step S4.2 specifically includes the following steps:
S4.2.1, using the ground-truth data provided by the dataset obtained in step S2, obtain the pose of the object model and then cut the object model out of the image;
S4.2.2, composite the cut-out object model onto other pictures, thereby replacing the background;
S4.2.3, apply image processing to the composited image, including changing saturation, changing brightness, and adding noise, to obtain the augmented dataset.
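A minimal sketch of this augmentation pipeline is given below, assuming the simulator also provides a binary mask of the rendered object; the background image paths and jitter ranges are illustrative assumptions, not taken from the patent.

```python
# A sketch of the step-S4.2 augmentation: cut-out, background replacement, jitter, noise.
import random
import numpy as np
from PIL import Image, ImageEnhance

def augment(render_rgb, object_mask, background_paths, rng=random.Random()):
    # S4.2.1-S4.2.2: cut the object out and composite it onto a new background
    obj = Image.fromarray(render_rgb)
    mask = Image.fromarray((object_mask * 255).astype(np.uint8))
    bg = Image.open(rng.choice(background_paths)).convert("RGB").resize(obj.size)
    bg.paste(obj, (0, 0), mask)
    # S4.2.3: change saturation and brightness, then add Gaussian noise
    out = ImageEnhance.Color(bg).enhance(rng.uniform(0.5, 1.5))
    out = ImageEnhance.Brightness(out).enhance(rng.uniform(0.7, 1.3))
    arr = np.asarray(out, dtype=np.float32)
    arr += np.random.normal(0.0, 8.0, arr.shape)
    return np.clip(arr, 0, 255).astype(np.uint8)
```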
Further, in step S5, the trained object posture detector is applied to a mechanical arm in the laboratory; after building and training according to steps S1 to S4, the grabbing point must be computed from the pose of the object, so that the mechanical arm or intelligent robot can recognize and grab objects on the real platform.
Further, the step of calculating the grabbing point is as follows:
s5.1, calculating a bounding box by adopting a K-DOP algorithm according to the pose of the object model.
S5.2, selecting a grabbing point on the bounding box according to the actual type of the mechanical claw of the mechanical arm.
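As an illustration of steps S5.1-S5.2, the sketch below picks two opposing contact points on the bounding box for a two-finger parallel gripper; grasping across the shorter horizontal extent of the box with a top-down approach is an illustrative heuristic, not the patent's prescribed rule.

```python
# A sketch of grasp-point selection on the object's bounding box (steps S5.1-S5.2).
import numpy as np

def grasp_points_from_box(box_min, box_max):
    """box_min, box_max: (3,) corners of the object's bounding box in the robot frame."""
    center = (box_min + box_max) / 2.0
    extent = box_max - box_min
    axis = int(np.argmin(extent[:2]))        # squeeze across the narrower of x / y
    p1, p2 = center.copy(), center.copy()
    p1[axis] = box_min[axis]                 # first finger contact point
    p2[axis] = box_max[axis]                 # second finger contact point
    approach = np.array([0.0, 0.0, -1.0])    # top-down approach direction
    return p1, p2, approach
```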
Compared with the prior art, the invention has the advantages that:
the training of the intelligent robot neural network is completely based on the virtual data set, and does not need to rely on the real world data set, so that the training efficiency of the neural network is improved, and the neural network is forced to pay attention to the pose characteristics of the object instead of the relation between the pose and the background of the training data set, so that the problem of over-fitting is reduced. The method is applicable to multiple scenes and has strong generalization.
Drawings
FIG. 1 is a flow chart of object gesture recognition and grabbing based on virtual environment training according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, a detailed description of the specific implementation of the present invention will be given below with reference to the accompanying drawings and examples.
Examples:
The deep-learning-based intelligent robot method for object pose recognition and grabbing comprises the following steps, as shown in FIG. 1:
s1: building a virtual environment and a mechanical arm working platform model, wherein the method comprises the following steps of:
s1.1, acquiring the size and the shape of a mechanical arm working platform in a real environment, and constructing the mechanical arm and the mechanical arm working platform in a virtual environment in a one-to-one manner; simultaneously constructing a plurality of object models;
s1.2, splicing the object model obtained in the step S1.1 in a virtual environment, and simulating a real mechanical arm working platform and an actual basic environment.
S2: based on the virtual environment constructed in the step S1, carrying out randomization treatment on an object on the virtual model of the mechanical arm working platform, and obtaining a camera shooting image to obtain a data set;
the randomization process includes:
randomizing the appearance and drop positions of a plurality of different object models;
randomizing the color and the material of the object model;
randomizing ambient light.
After randomization processing, RGB pictures of camera lens angles in the virtual environment are obtained as a data set, and specific positions of object models in the pictures in the data set are obtained for subsequent verification.
S3: constructing an object posture detector;
the object pose detector is constructed by adopting an EPnP algorithm and a Ranac algorithm, pnP (peselect-n-point) is a known point pair of n spatial 3D points corresponding to 2D points of an image, and a problem of calculating the pose of a camera or the pose of the object is solved, and there are various solutions, for example: direct Linear Transformation (DLT), P3P, EPnP, UPnP, and nonlinear optimization methods. Ranac (Random sample consensus, random sampling coincidence algorithm) is to estimate parameters of a mathematical model from a group of observation data sets containing 'outer points' in an iterative manner, and is widely used in computer vision, and the algorithm can effectively improve the accuracy of EPnP object attitude estimation; the method comprises the following steps:
S3.1, using the EPnP algorithm, randomly select n reference points in the space between the workbench and the camera in the virtual environment. Acquire the 3D coordinates of these reference points in the world coordinate system, denoted p_i^w, i = 1, …, n, and at the same time acquire the 2D coordinates of these reference points on the projection plane captured by the camera, denoted u_i, i = 1, …, n. In this embodiment, 10 reference points are selected, a number that can be adjusted according to the specific implementation.
S3.2, from the n selected reference points, select 4 control points in the world coordinate system and in the camera projection plane respectively, using Principal Component Analysis (PCA), denoted c_j^w, j = 1, …, 4 and c_j^c, j = 1, …, 4, satisfying

p_i^w = Σ_{j=1}^{4} a_ij · c_j^w,  with  Σ_{j=1}^{4} a_ij = 1,

where the a_ij are homogeneous barycentric coordinates. This condition states that the 4 selected control points can represent, by weighting, any 3D reference point in the world coordinate system; in the projection plane, the reference points and control points satisfy the same weighting relation;
S3.3, from the coordinates of the 4 control points in the world coordinate system and in the camera coordinate system obtained in steps S3.1 and S3.2, use a 3D-3D algorithm to obtain the rotation matrix R and the translation vector t, together called the camera external parameter matrix;
S3.4, using the RANSAC algorithm, take the camera external parameter matrix obtained in step S3.3 as the initial hypothesis model and test it against the selected reference points in all the other data of the dataset: the estimated 2D screen coordinates obtained by transforming the 3D space coordinates of a reference point through the camera external parameter matrix are compared with the actual 2D screen coordinates of that reference point obtained in step S3.1, giving an estimated-versus-actual distance error denoted d_mn, where m is the index of the reference point within a single data item and n is the index of the data item in the dataset; a threshold d_0 is set according to the actual accuracy requirement, and if d_mn <= d_0 the reference point is regarded as an inlier, otherwise as an outlier. In this embodiment, d_0 = 1 mm is selected.
S3.5, in the first iteration, randomly select one data item in the dataset to start the iteration, and set the camera external parameter matrix obtained from it as the optimal external parameter matrix;
S3.6, repeat the RANSAC procedure for a number of iterations. Before iterating, a threshold k is set to decide whether the number of inliers obtained in one iteration meets the accuracy requirement; k must not be set too high, in order to prevent overfitting. In each RANSAC iteration, if the ratio of the number of inliers to the total number of reference points is greater than k and the number of inliers exceeds that of the previous optimal external parameter matrix, the camera external parameter matrix of this iteration is set as the optimal external parameter matrix. In this embodiment, according to the accuracy requirement, the threshold k is set to 80%, i.e. an inlier-to-outlier ratio of 4:1; only a camera external parameter matrix meeting this condition can qualify as the optimal external parameter matrix. Iterations continue until finished, yielding the optimal camera external parameter matrix for the dataset. The number of iterations can be set according to the actual situation: in general, more iterations give higher accuracy but at a higher time cost, so a reasonable value must be chosen; in this embodiment, 10000 iterations are used according to the accuracy requirement.
S3.7, from the optimal external parameter matrix, obtain the pose of the camera, and from it the pose of the object model in the camera coordinate system.
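To make steps S3.4-S3.6 concrete, the following is a sketch of the RANSAC selection loop with the embodiment values (inlier threshold d_0, inlier ratio k = 0.8, 10000 iterations); the pinhole projection model, the interpretation of the 1 mm threshold as an on-screen distance of roughly one pixel, and the estimate_extrinsics() helper standing in for the per-sample EPnP solution are assumptions for illustration.

```python
# A sketch of the RANSAC selection in steps S3.4-S3.6 (not the patent's exact code).
import numpy as np

def project(K, R, t, pts_3d):
    """Pinhole projection of (n, 3) world points to (n, 2) screen coordinates."""
    cam = pts_3d @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def ransac_extrinsics(dataset, K, estimate_extrinsics, d0=1.0, k=0.8, iters=10000):
    """dataset: list of (ref_pts_3d, ref_pts_2d) pairs, one per rendered image."""
    best, best_inliers = None, -1
    for _ in range(iters):
        sample = dataset[np.random.randint(len(dataset))]   # S3.5: pick one data item
        R, t = estimate_extrinsics(sample)                   # hypothesis extrinsic matrix
        inliers, total = 0, 0
        for pts_3d, pts_2d in dataset:                       # S3.4: test all reference points
            err = np.linalg.norm(project(K, R, t, pts_3d) - pts_2d, axis=1)
            inliers += int((err <= d0).sum())
            total += len(err)
        # S3.6: accept only if the inlier ratio exceeds k and improves on the best so far
        if inliers / total > k and inliers > best_inliers:
            best, best_inliers = (R, t), inliers
    return best
```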
S4: based on the object posture detector constructed in the step S3, constructing a neural network, and training the neural network by using the data set obtained in the step S2, wherein the method comprises the following steps of:
s4.1, adopting Python as a programming language, simultaneously considering flexibility and program size, and adopting an open-source PyTorch deep learning framework to construct a neural network;
S4.2, the invention is intended to be applicable to many scenes and to have strong generalization ability, whereas the dataset generated in step S2 is homogeneous. Therefore, to effectively prevent overfitting, the neural network is forced to focus on estimating the pose features of the object rather than the relation between pose and background, and the dataset generated in step S2 is augmented to obtain an augmented dataset, specifically as follows:
S4.2.1, using the ground-truth data provided by the dataset obtained in step S2, obtain the pose of the object model and then cut the object model out of the image;
S4.2.2, composite the cut-out object model onto other pictures, thereby replacing the background;
S4.2.3, apply image processing to the composited image, including changing saturation, changing brightness, and adding noise, to obtain the augmented dataset.
S4.3, train the neural network constructed in step S4.1 with the augmented dataset obtained in step S4.2, where 20% of the data is used for training and is referred to as the training dataset, and 80% of the data is used for evaluation and is referred to as the evaluation dataset;
S4.4, set a criterion to evaluate the final effect. Through the training and evaluation in step S4.3, the estimated poses of the object models in the evaluation dataset are obtained; the actual coordinate positions of the object models were already recorded in step S2, so the two sets of data correspond one to one. Using these two sets of data, namely the actual positions from step S2 and the estimated poses on the evaluation dataset, bounding boxes of the object model are constructed with a K-DOP algorithm, called the actual bounding box and the estimated bounding box; a bounding-box collision algorithm then gives the overlap between each estimated bounding box and the corresponding actual bounding box, which is used to judge whether the accuracy criterion is met. The criterion is set according to the accuracy requirement; in this embodiment it is set to 90% overlap of the bounding boxes.
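For step S4.1, a minimal PyTorch sketch of a pose-regression network is given below; the architecture (a small CNN regressing a translation plus a unit quaternion) is an illustrative assumption, since the patent does not specify the network structure.

```python
# A minimal PyTorch sketch of a pose-regression network for step S4.1.
import torch
import torch.nn as nn

class PoseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, 7)                 # 3 translation + 4 quaternion values

    def forward(self, x):
        z = self.features(x).flatten(1)
        out = self.head(z)
        t, q = out[:, :3], out[:, 3:]
        return t, q / q.norm(dim=1, keepdim=True)     # normalize the quaternion

# Training sketch (step S4.3): 20% of the augmented data for training, 80% for evaluation.
# model = PoseNet(); opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = nn.functional.mse_loss(pred_t, gt_t) + nn.functional.mse_loss(pred_q, gt_q)
```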
S5: migrating the object posture detector trained in the step S4 to a real platform;
The trained object posture detector is applied to a mechanical arm in the laboratory. After building and training according to steps S1 to S4, the grabbing point must be computed from the pose of the object so that the mechanical arm or intelligent robot on the real platform can recognize and grab it; the steps for calculating the grabbing point are as follows:
s5.1, calculating a bounding box by adopting a K-DOP algorithm according to the pose of the object model.
S5.2, selecting a grabbing point on the bounding box according to the actual type of the mechanical claw of the mechanical arm.

Claims (6)

1. The method for the intelligent robot for identifying and grabbing the object gesture based on the deep learning is characterized by comprising the following steps of:
s1: building a virtual environment and a mechanical arm working platform model; the method comprises the following steps:
s1.1, acquiring the size and the shape of a mechanical arm working platform in a real environment, and constructing the mechanical arm and the mechanical arm working platform in a virtual environment in a one-to-one manner; simultaneously constructing a plurality of object models;
s1.2, splicing the object model obtained in the step S1.1 in a virtual environment, and simulating a real mechanical arm working platform and an actual environment;
s2: based on the virtual environment constructed in the step S1, carrying out randomization treatment on an object on the virtual model of the mechanical arm working platform, and obtaining a camera shooting image to obtain a data set; the randomization process includes:
randomizing the appearance and drop positions of a plurality of different object models;
randomizing the color and the material of the object model;
randomizing ambient illumination;
after randomization processing, RGB pictures of camera lens angles in the virtual environment are obtained as a data set, and specific positions of object models in the pictures in the data set are obtained for subsequent verification;
S3: constructing an object posture detector; the object posture detector is constructed using an EPnP algorithm and a RANSAC algorithm, and comprises the following steps:
S3.1, using the EPnP algorithm, randomly selecting n reference points in the space between the workbench and the camera in the virtual environment; acquiring the 3D coordinates of these reference points in the world coordinate system, denoted p_i^w, i = 1, …, n, and at the same time acquiring the 2D coordinates of these reference points on the projection plane captured by the camera, denoted u_i, i = 1, …, n;
S3.2, from the n selected reference points, selecting 4 control points in the world coordinate system and in the camera projection plane respectively, using Principal Component Analysis (PCA), denoted c_j^w, j = 1, …, 4 and c_j^c, j = 1, …, 4, satisfying

p_i^w = Σ_{j=1}^{4} a_ij · c_j^w,  with  Σ_{j=1}^{4} a_ij = 1,

where the a_ij are homogeneous barycentric coordinates; this condition states that the 4 selected control points can represent, by weighting, any 3D reference point in the world coordinate system; in the projection plane, the reference points and control points satisfy the same weighting relation;
S3.3, from the coordinates of the 4 control points in the world coordinate system and in the camera coordinate system obtained in steps S3.1 and S3.2, using a 3D-3D algorithm to obtain the rotation matrix R and the translation vector t, together called the camera external parameter matrix;
S3.4, using the RANSAC algorithm, taking the camera external parameter matrix obtained in step S3.3 as the initial hypothesis model and testing it against the selected reference points in all the other data of the dataset: the estimated 2D screen coordinates obtained by transforming the 3D space coordinates of a reference point through the camera external parameter matrix are compared with the actual 2D screen coordinates of that reference point obtained in step S3.1, giving an estimated-versus-actual distance error denoted d_mn, where m is the index of the reference point within a single data item and n is the index of the data item in the dataset; a threshold d_0 is set according to the actual accuracy requirement, and if d_mn <= d_0 the reference point is regarded as an inlier, otherwise as an outlier;
s3.5, in the first iteration, randomly selecting one data in the data set to start iteration, and setting the obtained camera external parameter matrix as an optimal external parameter matrix;
S3.6, repeating the RANSAC procedure for a number of iterations to obtain the optimal camera external parameter matrix for the dataset;
s3.7, obtaining an optimal external parameter matrix to obtain the pose of the camera, and further obtaining the pose of the object model under a camera coordinate system;
s4: constructing a neural network based on the object posture detector constructed in the step S3, and training the neural network by using the data set obtained in the step S2;
s5: and (3) migrating the object posture detector trained in the step S4 to a real platform.
2. The method for intelligent robot recognition and grasping of object pose based on deep learning according to claim 1, wherein in step S3.5, a threshold k is set before iterating, for determining whether the number of inliers obtained in one iteration meets the accuracy requirement; in each RANSAC iteration, if the ratio of the number of inliers to the total number of reference points is greater than the threshold k and the number of inliers is greater than that of the previous optimal external parameter matrix, the camera external parameter matrix of this iteration is set as the optimal external parameter matrix; RANSAC iterations continue until finished, yielding the optimal camera external parameter matrix for the dataset; the number of iterations is set according to the actual situation.
3. The method of intelligent robot for deep learning based object pose recognition and gripping according to claim 2, wherein step S4 comprises the steps of:
s4.1, adopting Python as a programming language, simultaneously considering flexibility and program size, and adopting an open-source PyTorch deep learning framework to construct a neural network;
s4.2, carrying out augmentation treatment on the data set generated in the step S2 to obtain an augmented data set;
S4.3, training the neural network constructed in step S4.1 with the augmented dataset obtained in step S4.2, wherein 20% of the data is used for training and is referred to as the training dataset, and 80% of the data is used for evaluation and is referred to as the evaluation dataset;
s4.4, setting a standard to evaluate the final effect, and obtaining the estimated pose of the object model in the evaluation data set through training and evaluation in the step S4.3; constructing bounding boxes of the object model by using the two groups of data of the actual specific coordinate position of the object model and the estimated pose of the object model in the estimated dataset, which are obtained in the step S2, through a K-DOP algorithm, wherein the bounding boxes are called an actual bounding box and an estimated bounding box; and obtaining the corresponding overlapping relation between the estimated bounding box and the actual bounding box by adopting a bounding box collision algorithm, and judging whether the accuracy standard is reached.
4. A method of intelligent robot for object gesture recognition and gripping based on deep learning according to claim 3, characterized in that step S4.2 specifically comprises the following steps:
s4.2.1, obtaining the pose of the object model by using the real data provided by the data set obtained in the step S2, and then cutting the object model;
s4.2.2, synthesizing the cut object model with other pictures to achieve the purpose of replacing the background picture;
s4.2.3 image processing is performed on the combined image, including changing saturation, changing brightness, and adding noise, resulting in an augmented data set.
5. The method for intelligent robot recognition and capture of object gesture based on deep learning according to claim 4, wherein in step S5, the trained object gesture detector is applied to a robot arm in a laboratory, and after the construction and training according to steps S1 to S4, the capture point is calculated according to the gesture of the object, so as to realize recognition and capture of the object by the robot arm or intelligent robot in a real platform.
6. The method for intelligent robot for recognition and gripping of object pose based on deep learning as claimed in claim 5, wherein the step of calculating the gripping point is as follows:
s5.1, calculating a bounding box by adopting a K-DOP algorithm according to the pose of the object model;
s5.2, selecting a grabbing point on the bounding box according to the actual type of the mechanical claw of the mechanical arm.
CN202110732696.7A 2021-06-29 2021-06-29 Object gesture recognition and grabbing intelligent robot method based on deep learning Active CN113370217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110732696.7A CN113370217B (en) 2021-06-29 2021-06-29 Object gesture recognition and grabbing intelligent robot method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110732696.7A CN113370217B (en) 2021-06-29 2021-06-29 Object gesture recognition and grabbing intelligent robot method based on deep learning

Publications (2)

Publication Number Publication Date
CN113370217A CN113370217A (en) 2021-09-10
CN113370217B 2023-06-16

Family

ID=77579948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110732696.7A Active CN113370217B (en) 2021-06-29 2021-06-29 Object gesture recognition and grabbing intelligent robot method based on deep learning

Country Status (1)

Country Link
CN (1) CN113370217B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114310954B (en) * 2021-12-31 2024-04-16 北京理工大学 Self-adaptive lifting control method and system for nursing robot
CN114474056B (en) * 2022-01-26 2023-07-21 北京航空航天大学 Monocular vision high-precision target positioning method for grabbing operation
CN115082795A (en) * 2022-07-04 2022-09-20 梅卡曼德(北京)机器人科技有限公司 Virtual image generation method, device, equipment, medium and product
CN115070780B (en) * 2022-08-24 2022-11-18 北自所(北京)科技发展股份有限公司 Industrial robot grabbing method and device based on digital twinning and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109903332A (en) * 2019-01-08 2019-06-18 杭州电子科技大学 A kind of object's pose estimation method based on deep learning
US10427306B1 (en) * 2017-07-06 2019-10-01 X Development Llc Multimodal object identification
CN110782492A (en) * 2019-10-08 2020-02-11 三星(中国)半导体有限公司 Pose tracking method and device
CN112150551A (en) * 2020-09-25 2020-12-29 北京百度网讯科技有限公司 Object pose acquisition method and device and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10427306B1 (en) * 2017-07-06 2019-10-01 X Development Llc Multimodal object identification
CN109903332A (en) * 2019-01-08 2019-06-18 杭州电子科技大学 A kind of object's pose estimation method based on deep learning
CN110782492A (en) * 2019-10-08 2020-02-11 三星(中国)半导体有限公司 Pose tracking method and device
CN112150551A (en) * 2020-09-25 2020-12-29 北京百度网讯科技有限公司 Object pose acquisition method and device and electronic equipment

Also Published As

Publication number Publication date
CN113370217A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN113370217B (en) Object gesture recognition and grabbing intelligent robot method based on deep learning
CN109344882B (en) Convolutional neural network-based robot control target pose identification method
Johns et al. Deep learning a grasp function for grasping under gripper pose uncertainty
Sadeghi et al. Sim2real viewpoint invariant visual servoing by recurrent control
CN109483573A (en) Machine learning device, robot system and machine learning method
CN109816725A (en) A kind of monocular camera object pose estimation method and device based on deep learning
CN109986560B (en) Mechanical arm self-adaptive grabbing method for multiple target types
CN111695562B (en) Autonomous robot grabbing method based on convolutional neural network
US20210205988A1 (en) Task embedding for device control
JP2021163503A (en) Three-dimensional pose estimation by two-dimensional camera
CN108961144A (en) Image processing system
CN112149694B (en) Image processing method, system, storage medium and terminal based on convolutional neural network pooling module
Fu et al. Active learning-based grasp for accurate industrial manipulation
Thalhammer et al. Pyrapose: Feature pyramids for fast and accurate object pose estimation under domain shift
Inoue et al. Transfer learning from synthetic to real images using variational autoencoders for robotic applications
Chen et al. Towards generalization and data efficient learning of deep robotic grasping
JP2021163502A (en) Three-dimensional pose estimation by multiple two-dimensional cameras
JP2021176078A (en) Deep layer learning and feature detection through vector field estimation
Arents et al. Construction of a smart vision-guided robot system for manipulation in a dynamic environment
CN111496794B (en) Kinematics self-grabbing learning method and system based on simulation industrial robot
CN111275758B (en) Hybrid 3D visual positioning method, device, computer equipment and storage medium
Kiyokawa et al. Efficient collection and automatic annotation of real-world object images by taking advantage of post-diminished multiple visual markers
Luo et al. Robot artist performs cartoon style facial portrait painting
CN113269831B (en) Visual repositioning method, system and device based on scene coordinate regression network
CN113436293B (en) Intelligent captured image generation method based on condition generation type countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant