CN113370217A - Method for recognizing and grabbing object posture based on deep learning for intelligent robot - Google Patents

Method for recognizing and grabbing object posture based on deep learning for intelligent robot

Info

Publication number
CN113370217A
Authority
CN
China
Prior art keywords
data set
camera
pose
obtaining
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110732696.7A
Other languages
Chinese (zh)
Other versions
CN113370217B (en)
Inventor
杜广龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110732696.7A priority Critical patent/CN113370217B/en
Publication of CN113370217A publication Critical patent/CN113370217A/en
Application granted granted Critical
Publication of CN113370217B publication Critical patent/CN113370217B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1628 Programme controls characterised by the control loop
    • B25J 9/163 Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J 9/1697 Vision controlled systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, based on deep learning, for an intelligent robot to recognize object postures and grab objects. The method comprises the following steps: building a virtual environment and constructing a mechanical arm working platform model; randomizing objects on the virtual model of the working platform based on the built virtual environment, obtaining camera images, and acquiring a data set; constructing an object posture detector; constructing a neural network based on the object posture detector and training it with the acquired data set; and migrating the trained detector to a real platform. Random objects are generated by simulation in the virtual environment to obtain a large amount of data as the training set; a dedicated training procedure yields an object posture detector with strong generalization ability, which is then transferred to the real platform to recognize the postures of basic objects and grab them.

Description

Method for recognizing and grabbing object posture based on deep learning for intelligent robot
Technical Field
The invention relates to the field of intelligent robot grabbing, in particular to a method for realizing object posture recognition and grabbing by an intelligent robot based on deep learning.
Background
In the era of Industry 4.0, robots of all kinds are entering factories to take part in production, completing dangerous or repetitive tasks in place of humans. An intelligent robot does not tire and operates only according to its trained neural network or rules, so capable intelligent robots are favored by industry and are widely applied in production.
However, with the popularization of industrial robots, concerns about the training time and grabbing efficiency of intelligent robots have also been raised. Although researchers design the neural networks so that robots can be trained as quickly and easily as possible, how to shorten neural network training time and improve the robot's grabbing efficiency remains an open concern.
At present, the mainstream way of training intelligent robots is still based on real scenes: the training scene is randomized in the real world and captured for neural network training. This approach has a major drawback: randomizing training scenes in real life consumes a great deal of time, and compared with what a computer can generate, the time needed to produce each unit of training data is long. For intelligent robot training, the time spent in the training process itself is relatively small, but the time spent generating the training data set is huge; for practical use of intelligent robots, spending a far larger share of time producing the training data set than actually training the robot is not acceptable.
There are existing schemes that train an intelligent robot's neural network using simulation, for example work from the IEEE International Conference on Robotics and Automation, 2018. The attractive alternative proposed there is to use an off-the-shelf simulator to render synthetic virtual data sets and automatically generate ground-truth annotations for them. That work recognizes that models trained solely on simulated data often cannot generalize to the real world, and it studies how to extend randomized simulation environments and domain adaptation methods to train a grasping system to grasp new objects from raw monocular RGB images. It shows that by using synthetic virtual data and domain adaptation, relying mainly on randomly generated virtual data sets, the number of real-world samples required to reach a given level of performance can be reduced substantially. However, that technique cannot be trained entirely without real-environment data, a real-world data set is still required, and the overfitting problem of the robot's neural network is not addressed.
Disclosure of Invention
Therefore, aiming at the defects of the prior art, the invention discloses a method for recognizing and grabbing object postures based on deep learning. In a virtual environment, a randomization algorithm varies the training scene across more scene and target-object factors, so as to generate as many possible training samples as needed to cover, as far as possible, the different working scenes encountered in industrial production. Because the virtual environment is built with a computer, the method has advantages over traditional data-set generation in both speed and data volume. Training an intelligent robot in this way is faster than traditional training, and when the model is migrated to a real robot, the broader coverage of the data set and the treatment of the overfitting problem give the model better generalization ability, providing a stronger practical effect in a shorter time. The invention can train the intelligent robot's neural network entirely on the virtual data set without relying on a real-world data set, which improves training efficiency and forces the neural network to focus on the pose features of the object rather than on the relationship between pose and background in the training data, thereby weakening the overfitting problem.
The purpose of the invention is realized by at least one of the following technical solutions.
The method for recognizing and grabbing the object posture based on deep learning comprises the following steps:
S1: building a virtual environment and constructing a mechanical arm working platform model;
S2: randomizing an object on the virtual model of the mechanical arm working platform based on the virtual environment established in step S1, obtaining camera images, and acquiring a data set;
S3: constructing an object posture detector;
S4: constructing a neural network based on the object posture detector constructed in step S3, and training the neural network with the data set obtained in step S2;
S5: migrating the object posture detector trained in step S4 to a real platform.
Further, step S1 includes the steps of:
S1.1, acquiring the size and shape of the mechanical arm working platform in the real environment, and building one-to-one models of the mechanical arm and the working platform in the virtual environment, while also constructing a plurality of object models;
S1.2, assembling the models obtained in step S1.1 in the virtual environment to simulate the real mechanical arm working platform and its actual basic environment.
Further, in step S2, the randomization process includes:
randomizing the appearance and drop position of a plurality of different object models;
randomizing the color and material of the object model;
randomizing ambient lighting.
Further, in step S2, after the randomization process, RGB images from the camera's viewpoint in the virtual environment are obtained as the data set, and the exact position of each object model in these images is recorded for subsequent verification.
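The patent does not name a particular simulator. Purely as an illustrative sketch of the step-S2 randomization and data-capture loop, and not as the invention's implementation, the same idea can be expressed with the PyBullet simulator; the URDF models, camera placement, and sample count below are assumptions:

```python
import random
import numpy as np
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                    # headless virtual environment
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.8)
p.loadURDF("plane.urdf")                               # stand-in for the working platform
p.loadURDF("kuka_iiwa/model.urdf", basePosition=[-0.6, 0, 0])  # stand-in for the mechanical arm

view = p.computeViewMatrix([0.5, 0.5, 0.8], [0, 0, 0.1], [0, 0, 1])
proj = p.computeProjectionMatrixFOV(fov=60, aspect=4.0 / 3.0, nearVal=0.01, farVal=2.0)

dataset = []
for _ in range(1000):                                  # one randomized sample per loop
    # randomize the object's drop position (appearance would be varied by swapping URDFs)
    obj = p.loadURDF("cube_small.urdf",
                     basePosition=[random.uniform(-0.2, 0.2),
                                   random.uniform(-0.2, 0.2), 0.3])
    # randomize color; material/texture randomization would be done similarly
    p.changeVisualShape(obj, -1, rgbaColor=[random.random(), random.random(),
                                            random.random(), 1])
    for _ in range(240):                               # let the object fall and settle
        p.stepSimulation()
    # randomize the lighting direction used for rendering
    light = [random.uniform(-1, 1), random.uniform(-1, 1), random.uniform(0.5, 1)]
    _, _, rgb, _, _ = p.getCameraImage(640, 480, view, proj, lightDirection=light)
    pos, orn = p.getBasePositionAndOrientation(obj)    # ground-truth pose for verification
    dataset.append((np.reshape(rgb, (480, 640, 4))[:, :, :3], pos, orn))
    p.removeBody(obj)
```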
Further, in step S3, the object posture detector is constructed using the EPnP algorithm and the RANSAC algorithm. PnP (Perspective-n-Point) is the class of problems in which the camera pose or the object pose is computed from n known pairs of corresponding 3D space points and 2D image points; it has many solutions, for example Direct Linear Transformation (DLT), P3P, EPnP, UPnP and nonlinear optimization methods. The RANSAC (Random Sample Consensus) algorithm estimates the parameters of a mathematical model iteratively from a set of observations containing outliers; it is widely used in computer vision and can effectively improve the accuracy of EPnP object pose estimation. The construction comprises the following steps:
S3.1, adopting the EPnP algorithm, first randomly select n reference points in the space between the workbench and the camera in the virtual environment; obtain the 3D coordinates of the reference points in the world coordinate system, recorded as p_i^w, i = 1, …, n, and obtain the 2D coordinates of the reference points on the projection plane captured by the camera, recorded as u_i, i = 1, …, n;
S3.2, respectively selecting 4 control points in a world coordinate system and a camera projection plane by adopting a Principal Component Analysis (PCA) method through the selected n reference points, and respectively recording the control points as:
Figure BDA0003139629090000033
j-1, …,4 and
Figure BDA0003139629090000034
j is 1, …, 4. Satisfies the following conditions:
Figure BDA0003139629090000035
wherein, aijIs a homogeneous barycentric coordinate; the condition indicates that the selected 4 control points can represent the 3D reference point in any world coordinate system by weighting; in the projection plane, the reference point and the control point have the same weighting relation;
S3.3, from steps S3.1 and S3.2, obtain the coordinates of the 4 control points in the world coordinate system and in the camera coordinate system, and use a 3D-3D alignment algorithm to obtain a rotation matrix R and a translation vector t, together called the camera extrinsic matrix;
S3.4, using the RANSAC algorithm, take the camera extrinsic matrix obtained in step S3.3 as the initial hypothesis model and test the reference points selected in all the other data samples: the estimated 2D screen coordinates, obtained by transforming the 3D space coordinates of each reference point with the camera extrinsic matrix, are compared with the actual 2D screen coordinates of that reference point obtained in step S3.1, giving an estimated-to-actual coordinate distance recorded as D_mn, where m is the index of the reference point within a single data sample and n is the index of that sample in the data set; a threshold d_0 is set according to the actual precision requirement, and if D_mn <= d_0 the reference point is judged to be an inlier, otherwise an outlier;
S3.5, in the first iteration, randomly select a part of the data in the data set to start the iteration, and set the resulting camera extrinsic matrix as the optimal extrinsic matrix;
S3.6, repeat the RANSAC procedure for multiple iterations. Before iterating, a threshold k is set to decide whether the number of inliers obtained in one iteration meets the precision requirement; at the same time, k must not be set too high, in order to prevent overfitting. In each RANSAC iteration, if the proportion of inliers among all reference points is greater than the threshold k and the number of inliers exceeds that of the previous optimal extrinsic matrix, the camera extrinsic matrix of this iteration is set as the optimal extrinsic matrix. RANSAC iterations continue until finished, yielding the optimal camera extrinsic matrix for the data set. The number of iterations can be set according to the actual situation: in general, more iterations give higher accuracy but higher time cost, so a reasonable value needs to be chosen for the application;
S3.7, from the optimal extrinsic matrix, obtain the pose of the camera, and further obtain the pose of the object model in the camera coordinate system.
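Purely as an illustrative sketch of steps S3.1-S3.2 (not the patent's own code), the four control points and the homogeneous barycentric coordinates a_ij can be computed from the n world-frame reference points with NumPy; the variable names and the use of the centroid plus the three PCA axes as control points are assumptions in line with the usual EPnP construction:

```python
import numpy as np

def select_control_points(ref_world: np.ndarray) -> np.ndarray:
    """EPnP-style control points: the centroid plus the three principal axes (PCA)."""
    centroid = ref_world.mean(axis=0)
    centered = ref_world - centroid
    # eigen-decomposition of the covariance matrix = PCA of the reference points
    eigval, eigvec = np.linalg.eigh(centered.T @ centered / len(ref_world))
    ctrl = [centroid]
    for k in range(3):
        ctrl.append(centroid + np.sqrt(max(eigval[k], 0.0)) * eigvec[:, k])
    return np.array(ctrl)                        # shape (4, 3)

def barycentric_coords(ref_world: np.ndarray, ctrl: np.ndarray) -> np.ndarray:
    """Solve p_i = sum_j a_ij * c_j with sum_j a_ij = 1 for every reference point."""
    A = np.vstack([ctrl.T, np.ones((1, 4))])     # 4x4 system: 3 coordinate rows + sum-to-one row
    b = np.vstack([ref_world.T, np.ones((1, len(ref_world)))])
    return np.linalg.solve(A, b).T               # shape (n, 4); row i holds a_i1..a_i4

# usage sketch: 10 random reference points, as in the embodiment
pts = np.random.rand(10, 3)
ctrl = select_control_points(pts)
a = barycentric_coords(pts, ctrl)
assert np.allclose(a @ ctrl, pts)                # the weighting relation holds
```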
Further, step S4 includes the steps of:
S4.1, considering flexibility and program size, adopt Python as the programming language and the open-source PyTorch deep learning framework to construct the neural network;
S4.2, the method aims to be applicable to many scenes and to have strong generalization ability, but the data set generated in step S2 is relatively uniform; therefore, to effectively prevent overfitting and to force the neural network to focus on the features of the object pose being estimated rather than on the relationship between pose and background, the data set generated in step S2 is augmented to obtain an augmented data set;
S4.3, train the neural network constructed in step S4.1 with the augmented data set obtained in step S4.2, where 20% of the data is used for training and is called the training data set, and 80% of the data is used for evaluation and is called the evaluation data set;
S4.4, set a standard to evaluate the final effect. Through the training and evaluation in step S4.3, the estimated pose of each object model in the evaluation data set is obtained; the actual coordinate positions of the object models were already recorded in step S2, so the two sets of data correspond one to one. Using the actual coordinate position of the object model obtained in step S2 and the estimated pose of the object model from the evaluation data set, bounding boxes of the object model are constructed with the K-DOP algorithm, called the actual bounding box and the estimated bounding box; a bounding-box collision algorithm is then used to obtain the overlap between each estimated bounding box and the corresponding actual bounding box and to judge whether the accuracy standard is reached.
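A minimal sketch of this evaluation, assuming the simplest K-DOP (k = 6, i.e. an axis-aligned bounding box) and measuring overlap as intersection volume relative to the actual box; the 90% pass threshold is the value the embodiment gives later, and the point clouds here are placeholders:

```python
import numpy as np

def aabb(points: np.ndarray):
    """6-DOP (axis-aligned) bounding box of a point cloud: (min corner, max corner)."""
    return points.min(axis=0), points.max(axis=0)

def overlap_ratio(actual_box, estimated_box) -> float:
    """Intersection volume divided by the volume of the actual bounding box."""
    lo = np.maximum(actual_box[0], estimated_box[0])
    hi = np.minimum(actual_box[1], estimated_box[1])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol = np.prod(actual_box[1] - actual_box[0])
    return float(inter / vol) if vol > 0 else 0.0

# usage sketch: the same model points placed with the ground-truth and the estimated pose
model = np.random.rand(500, 3)                       # placeholder object-model point cloud
actual_box = aabb(model)                             # from the position recorded in step S2
estimated_box = aabb(model + [0.01, 0.0, 0.0])       # from the pose estimated in step S4.3
reached = overlap_ratio(actual_box, estimated_box) >= 0.90   # embodiment's 90% criterion
```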
Further, step S4.2 specifically includes the following steps:
S4.2.1, use the ground-truth data provided by the data set obtained in step S2 to obtain the pose of the object model, and then crop the object model out of the image;
S4.2.2, composite the cropped object model onto other pictures so as to replace the background;
S4.2.3, apply image processing to the composited images, including saturation changes, brightness changes and noise addition, resulting in the augmented data set.
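A minimal sketch of steps S4.2.1-S4.2.3, assuming the object has already been cropped with an alpha mask and that PIL and NumPy are used; the file names and jitter ranges are illustrative only:

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

def augment(object_crop: Image.Image, background: Image.Image) -> Image.Image:
    """Paste the cropped object onto a new background (S4.2.2), then jitter
    saturation and brightness and add noise (S4.2.3)."""
    canvas = background.convert("RGB").copy()
    # assumes the background is larger than the cropped object
    x = random.randint(0, canvas.width - object_crop.width)
    y = random.randint(0, canvas.height - object_crop.height)
    canvas.paste(object_crop, (x, y), mask=object_crop)       # alpha channel as the mask
    canvas = ImageEnhance.Color(canvas).enhance(random.uniform(0.6, 1.4))       # saturation
    canvas = ImageEnhance.Brightness(canvas).enhance(random.uniform(0.7, 1.3))  # brightness
    arr = np.asarray(canvas).astype(np.float32)
    arr += np.random.normal(0.0, 8.0, arr.shape)              # additive Gaussian noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

# usage sketch (file names are hypothetical)
obj = Image.open("cropped_object.png").convert("RGBA")
bg = Image.open("random_background.jpg")
augmented_sample = augment(obj, bg)
```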
Further, in step S5, the trained object posture detector is applied to a mechanical arm in the laboratory; after construction and training according to steps S1 to S4, a grabbing point is calculated from the pose of the object, so that the mechanical arm or intelligent robot can recognize and grab the object on the real platform.
Further, the step of calculating the grab point is as follows:
S5.1, calculate the bounding box with the K-DOP algorithm according to the pose of the object model;
S5.2, select a grabbing point on the bounding box according to the actual type of the mechanical claw of the mechanical arm.
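As a sketch of the grabbing-point selection in steps S5.1-S5.2, assuming a two-finger parallel gripper and an axis-aligned bounding box derived from the estimated pose (both assumptions, since the patent only says the point depends on the type of mechanical claw):

```python
import numpy as np

def grasp_points_from_box(box_min: np.ndarray, box_max: np.ndarray):
    """Two opposing contact points across the narrowest side of the box, which suits
    a two-finger parallel gripper, plus an approach point above the box."""
    center = (box_min + box_max) / 2.0
    extents = box_max - box_min
    axis = int(np.argmin(extents))          # close the gripper along the shortest extent
    left, right = center.copy(), center.copy()
    left[axis] = box_min[axis]
    right[axis] = box_max[axis]
    approach = center.copy()
    approach[2] = box_max[2]                # approach from directly above
    return left, right, approach

# usage sketch with an arbitrary estimated bounding box
lo, hi = np.array([0.00, 0.00, 0.00]), np.array([0.04, 0.10, 0.06])
contact_a, contact_b, approach = grasp_points_from_box(lo, hi)
```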
Compared with the prior art, the invention has the advantages that:
the invention can train the neural network of the intelligent robot completely based on the virtual data set without relying on the real world data set, thereby improving the training efficiency of the neural network, and forcing the neural network to pay attention to the pose characteristics of the object instead of the relationship between the pose and the background of the training data set, thereby weakening the over-fitting problem. The method is applicable to a plurality of scenes and has strong generalization.
Drawings
FIG. 1 is a flow chart of object pose recognition and capture based on virtual environment training according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Example:
the method for recognizing and grabbing the object posture based on deep learning, as shown in fig. 1, includes the following steps:
S1: building a virtual environment and constructing a mechanical arm working platform model, which comprises the following steps:
S1.1, acquiring the size and shape of the mechanical arm working platform in the real environment, and building one-to-one models of the mechanical arm and the working platform in the virtual environment, while also constructing a plurality of object models;
S1.2, assembling the models obtained in step S1.1 in the virtual environment to simulate the real mechanical arm working platform and its actual basic environment.
S2: randomizing an object on the virtual model of the mechanical arm working platform based on the virtual environment established in step S1, obtaining camera images, and acquiring a data set;
the randomization process includes:
randomizing the appearance and drop position of a plurality of different object models;
randomizing the color and material of the object model;
randomizing ambient lighting.
After randomization, RGB (red, green, blue) images from the camera's viewpoint in the virtual environment are acquired as the data set, and the exact position of each object model in these images is recorded for subsequent verification.
S3: constructing an object posture detector;
The object posture detector is constructed using the EPnP algorithm and the RANSAC algorithm. PnP (Perspective-n-Point) is the class of problems in which the camera pose or the object pose is computed from n known pairs of corresponding 3D space points and 2D image points; it has many solutions, for example Direct Linear Transformation (DLT), P3P, EPnP, UPnP and nonlinear optimization methods. The RANSAC (Random Sample Consensus) algorithm estimates the parameters of a mathematical model iteratively from a set of observations containing outliers; it is widely used in computer vision and can effectively improve the accuracy of EPnP object pose estimation. The construction comprises the following steps:
S3.1, adopting the EPnP algorithm, randomly select n reference points in the space between the workbench and the camera in the virtual environment; obtain the 3D coordinates of the reference points in the world coordinate system, recorded as p_i^w, i = 1, …, n, and obtain the 2D coordinates of the reference points on the projection plane captured by the camera, recorded as u_i, i = 1, …, n; in this embodiment, 10 reference points are selected, and the number may be adjusted according to the specific implementation.
S3.2, from the selected n reference points, use Principal Component Analysis (PCA) to select 4 control points in the world coordinate system and on the camera projection plane respectively, recorded as c_j^w, j = 1, …, 4 and c_j^c, j = 1, …, 4, satisfying:
p_i^w = Σ_{j=1}^{4} a_ij · c_j^w, with Σ_{j=1}^{4} a_ij = 1,
where the a_ij are homogeneous barycentric coordinates; this condition means that the 4 selected control points can represent any 3D reference point in the world coordinate system by weighting, and that on the projection plane the reference points and control points satisfy the same weighting relation;
S3.3, from steps S3.1 and S3.2, obtain the coordinates of the 4 control points in the world coordinate system and in the camera coordinate system, and use a 3D-3D alignment algorithm to obtain a rotation matrix R and a translation vector t, together called the camera extrinsic matrix;
S3.4, using the RANSAC algorithm, take the camera extrinsic matrix obtained in step S3.3 as the initial hypothesis model and test the reference points selected in all the other data samples: the estimated 2D screen coordinates, obtained by transforming the 3D space coordinates of each reference point with the camera extrinsic matrix, are compared with the actual 2D screen coordinates of that reference point obtained in step S3.1, giving an estimated-to-actual coordinate distance recorded as D_mn, where m is the index of the reference point within a single data sample and n is the index of that sample in the data set; a threshold d_0 is set according to the actual precision requirement, and if D_mn <= d_0 the reference point is judged to be an inlier, otherwise an outlier; in this example, d_0 = 1 mm is selected.
S3.5, in the first iteration, randomly select a part of the data in the data set to start the iteration, and set the resulting camera extrinsic matrix as the optimal extrinsic matrix;
S3.6, repeat the RANSAC procedure for multiple iterations. Before iterating, a threshold k is set to decide whether the number of inliers obtained in one iteration meets the precision requirement; at the same time, k must not be set too high, in order to prevent overfitting. In each RANSAC iteration, if the proportion of inliers among all reference points is greater than the threshold k and the number of inliers exceeds that of the previous optimal extrinsic matrix, the camera extrinsic matrix of this iteration is set as the optimal extrinsic matrix. In this embodiment, according to the precision requirement, the threshold k is set to 80%, i.e. a 4:1 ratio of inliers to outliers; a camera extrinsic matrix meeting this condition qualifies to be selected as the optimal extrinsic matrix. RANSAC iterations continue until finished, yielding the optimal camera extrinsic matrix for the data set. The number of iterations can be set according to the actual situation: in general, more iterations give higher accuracy but higher time cost, so a reasonable value needs to be chosen; in this embodiment, the number of iterations is 10000 according to the precision requirement.
S3.7, from the optimal extrinsic matrix, obtain the pose of the camera, and further obtain the pose of the object model in the camera coordinate system.
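The EPnP-plus-RANSAC pipeline of step S3 can also be sketched with OpenCV's built-in solver. This illustrates the same idea rather than the patent's own implementation; the camera intrinsics and the pixel reprojection threshold standing in for d_0 are assumptions, while the 10000 iterations follow the embodiment:

```python
import numpy as np
import cv2

# n reference points: 3D world coordinates and their 2D projections (step S3.1)
object_points = np.random.rand(10, 3).astype(np.float32)        # placeholder for p_i^w
image_points = np.random.rand(10, 2).astype(np.float32) * 480   # placeholder for u_i
camera_matrix = np.array([[600.0, 0.0, 320.0],
                          [0.0, 600.0, 240.0],
                          [0.0, 0.0, 1.0]], dtype=np.float32)   # assumed intrinsics
dist_coeffs = np.zeros(5, dtype=np.float32)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, camera_matrix, dist_coeffs,
    flags=cv2.SOLVEPNP_EPNP,        # EPnP as the minimal solver inside RANSAC
    reprojectionError=2.0,          # pixel analogue of the inlier threshold d_0
    iterationsCount=10000)          # the embodiment uses 10000 iterations

if ok:
    R, _ = cv2.Rodrigues(rvec)      # rotation matrix R; tvec is the translation vector t
    # the pose of the object in the camera frame follows from (R, tvec), as in step S3.7
```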
S4: constructing a neural network based on the object posture detector constructed in the step S3, and training the neural network by using the data set obtained in the step S2, wherein the method comprises the following steps:
S4.1, considering flexibility and program size, adopt Python as the programming language and the open-source PyTorch deep learning framework to construct the neural network;
S4.2, the method aims to be applicable to many scenes and to have strong generalization ability, but the data set generated in step S2 is relatively uniform; therefore, to effectively prevent overfitting and to force the neural network to focus on the features of the object pose being estimated rather than on the relationship between pose and background, the data set generated in step S2 is augmented to obtain an augmented data set, which specifically comprises the following steps:
S4.2.1, use the ground-truth data provided by the data set obtained in step S2 to obtain the pose of the object model, and then crop the object model out of the image;
S4.2.2, composite the cropped object model onto other pictures so as to replace the background;
S4.2.3, apply image processing to the composited images, including saturation changes, brightness changes and noise addition, resulting in the augmented data set.
S4.3, train the neural network constructed in step S4.1 with the augmented data set obtained in step S4.2, where 20% of the data is used for training and is called the training data set, and 80% of the data is used for evaluation and is called the evaluation data set;
S4.4, set a standard to evaluate the final effect. Through the training and evaluation in step S4.3, the estimated pose of each object model in the evaluation data set is obtained; the actual coordinate positions of the object models were already recorded in step S2, so the two sets of data correspond one to one. Using the actual coordinate position of the object model obtained in step S2 and the estimated pose of the object model from the evaluation data set, bounding boxes of the object model are constructed with the K-DOP algorithm, called the actual bounding box and the estimated bounding box; a bounding-box collision algorithm is then used to obtain the overlap between each estimated bounding box and the corresponding actual bounding box and to judge whether the accuracy standard is reached; the accuracy standard is set according to the precision requirement, and in this embodiment it is set to 90% bounding-box overlap.
S5: migrating the object posture detector trained in the step S4 to a real platform;
The trained object posture detector is applied to a mechanical arm in the laboratory; after construction and training according to steps S1-S4, a grabbing point is calculated from the pose of the object, so that the mechanical arm or intelligent robot can recognize and grab the object on the real platform. The grabbing point is calculated as follows:
S5.1, calculate the bounding box with the K-DOP algorithm according to the pose of the object model;
S5.2, select a grabbing point on the bounding box according to the actual type of the mechanical claw of the mechanical arm.

Claims (10)

1. The method for recognizing and grabbing the object posture based on deep learning is characterized by comprising the following steps of:
S1: building a virtual environment and constructing a mechanical arm working platform model;
S2: randomizing an object on the virtual model of the mechanical arm working platform based on the virtual environment established in step S1, obtaining camera images, and acquiring a data set;
S3: constructing an object posture detector;
S4: constructing a neural network based on the object posture detector constructed in step S3, and training the neural network with the data set obtained in step S2;
S5: migrating the object posture detector trained in step S4 to a real platform.
2. The method for recognizing and grabbing the object posture based on deep learning as claimed in claim 1, wherein step S1 comprises the following steps:
S1.1, acquiring the size and shape of the mechanical arm working platform in the real environment, and building one-to-one models of the mechanical arm and the working platform in the virtual environment, while also constructing a plurality of object models;
S1.2, assembling the models obtained in step S1.1 in the virtual environment to simulate the real mechanical arm working platform and its actual basic environment.
3. The method for recognizing and grabbing the object posture based on deep learning as claimed in claim 2, wherein in step S2, the randomization process comprises:
randomizing the appearance and drop position of a plurality of different object models;
randomizing the color and material of the object model;
randomizing ambient lighting.
4. The method for object pose recognition and capture based on deep learning of claim 3, wherein in step S2, after the randomization process, RGB images of camera lens angles in the virtual environment are obtained as the data set, and the specific position of the object model in the images in the data set is obtained for subsequent verification.
5. The method for recognizing and grabbing the object posture based on deep learning as claimed in claim 4, wherein in step S3, the object posture detector is constructed using the EPnP algorithm and the RANSAC algorithm, comprising the following steps:
S3.1, adopting the EPnP algorithm, randomly selecting n reference points in the space between the workbench and the camera in the virtual environment; obtaining the 3D coordinates of the reference points in the world coordinate system, recorded as p_i^w, i = 1, …, n, and obtaining the 2D coordinates of the reference points on the projection plane captured by the camera, recorded as u_i, i = 1, …, n;
S3.2, from the selected n reference points, using Principal Component Analysis (PCA) to select 4 control points in the world coordinate system and on the camera projection plane respectively, recorded as c_j^w, j = 1, …, 4 and c_j^c, j = 1, …, 4, satisfying:
p_i^w = Σ_{j=1}^{4} a_ij · c_j^w, with Σ_{j=1}^{4} a_ij = 1,
where the a_ij are homogeneous barycentric coordinates; this condition means that the 4 selected control points can represent any 3D reference point in the world coordinate system by weighting, and that on the projection plane the reference points and control points satisfy the same weighting relation;
S3.3, from steps S3.1 and S3.2, obtaining the coordinates of the 4 control points in the world coordinate system and in the camera coordinate system, and using a 3D-3D alignment algorithm to obtain a rotation matrix R and a translation vector t, together called the camera extrinsic matrix;
S3.4, using the RANSAC algorithm, taking the camera extrinsic matrix obtained in step S3.3 as the initial hypothesis model and testing the reference points selected in all the other data samples: the estimated 2D screen coordinates, obtained by transforming the 3D space coordinates of each reference point with the camera extrinsic matrix, are compared with the actual 2D screen coordinates of that reference point obtained in step S3.1, giving an estimated-to-actual coordinate distance recorded as D_mn, where m is the index of the reference point within a single data sample and n is the index of that sample in the data set; a threshold d_0 is set according to the actual precision requirement, and if D_mn <= d_0 the reference point is judged to be an inlier, otherwise an outlier;
S3.5, in the first iteration, randomly selecting a part of the data in the data set to start the iteration, and setting the resulting camera extrinsic matrix as the optimal extrinsic matrix;
S3.6, repeating the RANSAC procedure for multiple iterations to obtain the optimal camera extrinsic matrix for the data set;
S3.7, from the optimal extrinsic matrix, obtaining the pose of the camera, and further obtaining the pose of the object model in the camera coordinate system.
6. The method for recognizing and grabbing the object posture based on deep learning as claimed in claim 5, wherein in step S3.6, before performing the iterations, a threshold k is set to decide whether the number of inliers obtained in one iteration meets the precision requirement; in each RANSAC iteration, if the proportion of inliers among all reference points is greater than the threshold k and the number of inliers exceeds that of the previous optimal extrinsic matrix, the camera extrinsic matrix of this iteration is set as the optimal extrinsic matrix; RANSAC iterations continue until finished, yielding the optimal camera extrinsic matrix for the training data set; the number of iterations is set according to the actual situation.
7. The method for recognizing and grabbing the object posture based on deep learning as claimed in claim 6, wherein step S4 comprises the following steps:
S4.1, considering flexibility and program size, adopting Python as the programming language and the open-source PyTorch deep learning framework to construct the neural network;
S4.2, augmenting the data set generated in step S2 to obtain an augmented data set;
S4.3, training the neural network constructed in step S4.1 with the augmented data set obtained in step S4.2, where 20% of the data is used for training and is called the training data set, and 80% of the data is used for evaluation and is called the evaluation data set;
S4.4, setting a standard to evaluate the final effect, and obtaining the estimated pose of each object model in the evaluation data set through the training and evaluation in step S4.3; using the actual coordinate position of the object model obtained in step S2 and the estimated pose of the object model from the evaluation data set, constructing bounding boxes of the object model with the K-DOP algorithm, called the actual bounding box and the estimated bounding box; and using a bounding-box collision algorithm to obtain the overlap between each estimated bounding box and the corresponding actual bounding box and to judge whether the accuracy standard is reached.
8. The method for recognizing and grabbing the object posture based on deep learning as claimed in claim 7, wherein step S4.2 comprises the following steps:
S4.2.1, using the ground-truth data provided by the data set obtained in step S2 to obtain the pose of the object model, and then cropping the object model out of the image;
S4.2.2, compositing the cropped object model onto other pictures so as to replace the background;
S4.2.3, applying image processing to the composited images, including saturation changes, brightness changes and noise addition, resulting in the augmented data set.
9. The method for recognizing and grabbing the object posture based on deep learning as claimed in claim 8, wherein in step S5, the trained object posture detector is applied to a mechanical arm in the laboratory; after construction and training according to steps S1-S4, a grabbing point is calculated from the pose of the object, so that the mechanical arm or intelligent robot can recognize and grab the object on the real platform.
10. The method for recognizing and grabbing the object posture based on deep learning as claimed in any one of claims 1-9, wherein the grabbing point is calculated as follows:
S5.1, calculating the bounding box with the K-DOP algorithm according to the pose of the object model;
S5.2, selecting a grabbing point on the bounding box according to the actual type of the mechanical claw of the mechanical arm.
CN202110732696.7A 2021-06-29 2021-06-29 Object gesture recognition and grabbing intelligent robot method based on deep learning Active CN113370217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110732696.7A CN113370217B (en) 2021-06-29 2021-06-29 Object gesture recognition and grabbing intelligent robot method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110732696.7A CN113370217B (en) 2021-06-29 2021-06-29 Object gesture recognition and grabbing intelligent robot method based on deep learning

Publications (2)

Publication Number Publication Date
CN113370217A true CN113370217A (en) 2021-09-10
CN113370217B CN113370217B (en) 2023-06-16

Family

ID=77579948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110732696.7A Active CN113370217B (en) 2021-06-29 2021-06-29 Object gesture recognition and grabbing intelligent robot method based on deep learning

Country Status (1)

Country Link
CN (1) CN113370217B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114310954A (en) * 2021-12-31 2022-04-12 北京理工大学 Self-adaptive lifting control method and system for nursing robot
CN114474056A (en) * 2022-01-26 2022-05-13 北京航空航天大学 Grabbing operation-oriented monocular vision high-precision target positioning method
CN115082795A (en) * 2022-07-04 2022-09-20 梅卡曼德(北京)机器人科技有限公司 Virtual image generation method, device, equipment, medium and product
CN115070780A (en) * 2022-08-24 2022-09-20 北自所(北京)科技发展股份有限公司 Industrial robot grabbing method and device based on digital twinning and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109903332A (en) * 2019-01-08 2019-06-18 杭州电子科技大学 A kind of object's pose estimation method based on deep learning
US10427306B1 (en) * 2017-07-06 2019-10-01 X Development Llc Multimodal object identification
CN110782492A (en) * 2019-10-08 2020-02-11 三星(中国)半导体有限公司 Pose tracking method and device
CN112150551A (en) * 2020-09-25 2020-12-29 北京百度网讯科技有限公司 Object pose acquisition method and device and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10427306B1 (en) * 2017-07-06 2019-10-01 X Development Llc Multimodal object identification
CN109903332A (en) * 2019-01-08 2019-06-18 杭州电子科技大学 A kind of object's pose estimation method based on deep learning
CN110782492A (en) * 2019-10-08 2020-02-11 三星(中国)半导体有限公司 Pose tracking method and device
CN112150551A (en) * 2020-09-25 2020-12-29 北京百度网讯科技有限公司 Object pose acquisition method and device and electronic equipment

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114310954A (en) * 2021-12-31 2022-04-12 北京理工大学 Self-adaptive lifting control method and system for nursing robot
CN114310954B (en) * 2021-12-31 2024-04-16 北京理工大学 Self-adaptive lifting control method and system for nursing robot
CN114474056A (en) * 2022-01-26 2022-05-13 北京航空航天大学 Grabbing operation-oriented monocular vision high-precision target positioning method
CN114474056B (en) * 2022-01-26 2023-07-21 北京航空航天大学 Monocular vision high-precision target positioning method for grabbing operation
CN115082795A (en) * 2022-07-04 2022-09-20 梅卡曼德(北京)机器人科技有限公司 Virtual image generation method, device, equipment, medium and product
CN115070780A (en) * 2022-08-24 2022-09-20 北自所(北京)科技发展股份有限公司 Industrial robot grabbing method and device based on digital twinning and storage medium
CN115070780B (en) * 2022-08-24 2022-11-18 北自所(北京)科技发展股份有限公司 Industrial robot grabbing method and device based on digital twinning and storage medium

Also Published As

Publication number Publication date
CN113370217B (en) 2023-06-16

Similar Documents

Publication Publication Date Title
CN113370217B (en) Object gesture recognition and grabbing intelligent robot method based on deep learning
CN109344882B (en) Convolutional neural network-based robot control target pose identification method
CN109816725B (en) Monocular camera object pose estimation method and device based on deep learning
CN111695562B (en) Autonomous robot grabbing method based on convolutional neural network
CN109986560B (en) Mechanical arm self-adaptive grabbing method for multiple target types
CN115097937A (en) Deep learning system for cuboid detection
KR101964282B1 (en) 2d image data generation system using of 3d model, and thereof method
US20210205988A1 (en) Task embedding for device control
CN111553949B (en) Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN111127548B (en) Grabbing position detection model training method, grabbing position detection method and grabbing position detection device
CN112639846A (en) Method and device for training deep learning model
Rambach et al. Learning 6dof object poses from synthetic single channel images
CN108961144A (en) Image processing system
CN110910452B (en) Low-texture industrial part pose estimation method based on deep learning
JP2021163503A (en) Three-dimensional pose estimation by two-dimensional camera
CN111832592A (en) RGBD significance detection method and related device
CN115903541A (en) Visual algorithm simulation data set generation and verification method based on twin scene
Marchand Control camera and light source positions using image gradient information
Inoue et al. Transfer learning from synthetic to real images using variational autoencoders for robotic applications
CN115147488A (en) Workpiece pose estimation method based on intensive prediction and grasping system
CN111489394A (en) Object posture estimation model training method, system, device and medium
JP2021163502A (en) Three-dimensional pose estimation by multiple two-dimensional cameras
JP2021176078A (en) Deep layer learning and feature detection through vector field estimation
Arents et al. Construction of a smart vision-guided robot system for manipulation in a dynamic environment
CN117115917A (en) Teacher behavior recognition method, device and medium based on multi-modal feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant