CN118003340A - Visual mechanical arm material grabbing control method and system based on deep learning - Google Patents

Publication number
CN118003340A
CN118003340A (application CN202410412623.3A)
Authority
CN
China
Prior art keywords
camera
target material
mechanical arm
information
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410412623.3A
Other languages
Chinese (zh)
Inventor
蔡文智
周尚志
陈志明
李青松
高渊州
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yiming Robot Automation Co ltd
Original Assignee
Xiamen Yiming Robot Automation Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yiming Robot Automation Co ltd filed Critical Xiamen Yiming Robot Automation Co ltd
Priority to CN202410412623.3A
Publication of CN118003340A
Legal status: Pending

Landscapes

  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention discloses a visual mechanical arm material grabbing control method and system based on deep learning. The method comprises the following steps: acquiring a camera image and analyzing it to obtain camera ranging information and camera attitude information of a target material; acquiring photoelectric data and analyzing them to obtain photoelectric ranging information and photoelectric attitude information of the target material; fusing the camera ranging information and the photoelectric ranging information with a Kalman filtering algorithm to obtain a distance estimation value of the target material; fusing the camera attitude information and the photoelectric attitude information through vector analysis and a three-dimensional geometric algorithm to obtain an attitude estimation value of the target material; and, according to the distance estimation value and the attitude estimation value, planning the motion of the mechanical arm with a deep learning algorithm and controlling the mechanical arm to grasp the material. By fusing camera positioning with a ranging technology and determining the material attitude with a deep learning algorithm, the invention can greatly reduce hardware cost while still meeting accuracy requirements.

Description

Visual mechanical arm material grabbing control method and system based on deep learning
Technical Field
The invention relates to the technical field of computer vision, in particular to a deep learning-based visual mechanical arm material grabbing control method and a deep learning-based visual mechanical arm material grabbing control system applying the method.
Background
The technology of the visual mechanical arm is a comprehensive technology integrating visual sensing and mechanical arm operation. The technology acquires image information of a target object through a vision system, and guides the mechanical arm to accurately grasp and operate through a series of processing and analysis.
The existing vision mechanical arm mainly utilizes sensors such as cameras and the like arranged at the tail end of the mechanical arm to observe and sense the surrounding environment. The sensors can capture various information such as images, depth maps, thermal infrared maps and the like, and provide raw data for subsequent image processing and target recognition. The mechanical arm processes the image acquired by the sensor to extract the required information. Through image processing technology, the mechanical arm can identify the object of interest and determine the characteristics and properties thereof. After the target is detected and identified, the mechanical arm needs to perform motion planning and control: the planning stage involves the problems of determining the motion path of the mechanical arm, avoiding collision and the like so as to ensure that the grabbing task is completed safely and efficiently; the control stage includes motor driving, joint control, etc. to make the mechanical arm move precisely in the planned path. Finally, the vision mechanical arm performs action execution according to the obtained instruction, so that the tasks of grabbing, carrying, assembling and the like of the target object are realized.
As the application scenarios of mechanical arms continue to expand, ever higher requirements are placed on their grabbing accuracy. A high-precision 3D camera can provide more accurate and finer three-dimensional data and thus remarkably improve the accuracy of material positioning and grabbing. However, a mechanical arm that relies solely on a 3D camera for material positioning and gripping still has limitations:
First, the performance of the 3D camera is greatly affected by environmental factors. The accuracy and stability of the data acquired by the 3D camera may be affected by lighting conditions, reflective characteristics of the object surface, and other interference factors in the environment. For example, in environments where light is insufficient or reflection is strong, the 3D camera may not be able to acquire a clear three-dimensional image, resulting in positioning failure or inaccurate capture.
Second, for complex shapes or materials stacked together, identification and positioning of 3D cameras can be challenging. Because these materials may have problems of shielding, overlapping, irregular shapes, etc., it may be difficult for the 3D camera to accurately extract characteristic information of the materials, thereby affecting the accuracy of positioning and grabbing.
Disclosure of Invention
The invention mainly aims to provide a visual mechanical arm material grabbing control method and system based on deep learning, which can greatly reduce hardware cost while meeting accuracy requirements by fusing camera positioning with a distance measurement technology and determining the material attitude with a deep learning algorithm.
In order to achieve the above purpose, the invention provides a visual mechanical arm material grabbing control method based on deep learning, which comprises the following steps:
step 10, acquiring a camera image, and analyzing the camera image to obtain camera ranging information and camera attitude information of a target material;
step 20, acquiring photoelectric data, and analyzing the photoelectric data to obtain photoelectric ranging information and photoelectric attitude information of a target material;
step 30, carrying out fusion calculation on the camera ranging information and the photoelectric ranging information according to a Kalman filtering algorithm to obtain a distance estimation value of a target material;
step 40, carrying out fusion calculation on the camera attitude information and the photoelectric attitude information through vector analysis and a three-dimensional geometric algorithm to obtain an attitude estimation value of a target material;
And 50, performing motion planning on the mechanical arm by using a deep learning algorithm according to the distance estimated value and the gesture estimated value of the target material, and controlling the mechanical arm to perform material grabbing.
Preferably, in the step 10, the analyzing the camera image specifically includes:
step 11, obtaining a depth image and a color image of a scene where the target material is located;
Acquiring a depth value corresponding to each pixel point according to the depth image;
calculating camera ranging information of the target material according to the depth value;
step 12, extracting the characteristics of the target material according to the depth image and the color image;
Identifying a target material by a feature matching or template matching or machine learning method, and determining the projection shape of the target material in an image;
Comparing the projection shape with a preset material model to obtain the direction and angle of the target material in a three-dimensional space;
And obtaining camera attitude information of the target material according to the direction and the angle.
Preferably, in the step 20, the analyzing the photoelectric data specifically includes:
Step 21, scanning an area where a target material is located through laser radar equipment, and acquiring three-dimensional point cloud data containing the target material;
extracting a point cloud subset corresponding to the target material through point cloud registration and segmentation;
extracting characteristics of a target material from the point cloud subset;
Step 22, identifying a target material through a feature matching or model matching or clustering algorithm or a machine learning method, and determining positioning information of the target material in a three-dimensional space;
analyzing photoelectric ranging information from each point in the point cloud subset to the laser radar equipment according to the positioning information;
step 23, analyzing the characteristics of the target material according to a model matching or deep learning method to obtain the direction and angle of the target material in a three-dimensional space;
and obtaining photoelectric attitude information of the target material according to the direction and the angle.
Preferably, in the step 30, fusion calculation is performed on the camera ranging information and the photoelectric ranging information according to a kalman filtering algorithm to obtain a distance estimation value of the target material, and the method specifically includes the following steps:
Step 31, calculating a fused measurement value:
Z_measured = (d2·x + d1·y) / (d1 + d2)
wherein Z_measured represents the single measurement value obtained by fusion calculation, x represents the photoelectric (laser) ranging information and d1 represents the measured variance of the laser ranging random variable; y represents the camera ranging information and d2 represents the measured variance of the camera ranging random variable;
Step 32, calculating the Kalman gain:
K = P / (P + R)
wherein K represents the Kalman gain, P represents the random estimation error covariance and R represents the measurement error covariance;
Step 33, calculating the final estimated value:
X_pred2 = X_pred1 + K·(Z_measured - X_pred1)
wherein X_pred2 represents the final estimated value, Z_measured represents the single measurement value obtained by fusion calculation, and X_pred1 represents the initial value or the previous estimated value.
Further, the method also comprises repeating the steps 31-33 and performing i iterations, and specifically comprises the following steps:
Step 31', calculating the i-th fused measurement value Z_measured_i, with the same calculation formula as in step 31;
Step 32', calculating the Kalman gain of the i-th iteration:
K_i = P_i / (P_i + R_i), with P_i = (1 - K_(i-1))·P_(i-1)
wherein K_i represents the Kalman gain of the i-th iteration, K_(i-1) represents the Kalman gain of the (i-1)-th iteration, P_i represents the random estimation error covariance of the i-th iteration, P_(i-1) represents the random estimation error covariance of the (i-1)-th iteration, and R_i represents the measurement error covariance of the i-th iteration;
Step 33', calculating the final iteration estimate:
X_pred_i = X_pred_(i-1) + K_i·(Z_measured_i - X_pred_(i-1))
wherein X_pred_i represents the i-th iteration estimated value, Z_measured_i represents the single measurement value obtained by fusion calculation in the i-th iteration, and X_pred_(i-1) represents the (i-1)-th iteration estimated value.
Preferably, in the step 40, the camera pose information and the photoelectric pose information are fused and calculated through vector analysis and a three-dimensional geometric algorithm to obtain a pose estimation value of the target material, and the method specifically includes the following steps:
Step 41, converting the camera gesture information and the photoelectric gesture information into vector forms respectively; the vectors include a position vector and a direction vector;
Step 42, calculating cosine values of included angles between the vector and each coordinate axis by using a direction cosine formula of the vector;
step 43, calculating an included angle of the vector by using an inverse cosine function arccos, and obtaining a relative position relationship between the mechanical arm and the target material according to the included angle; the relative positional relationship includes camera positional information and photoelectric positional information;
And step 44, performing fusion calculation on the camera position information and the photoelectric position information by using Kalman filtering to obtain an estimated posture value of the fused target material.
Preferably, in the step 50, according to the distance estimation value and the attitude estimation value of the target material, the motion planning is performed on the mechanical arm by using a deep learning algorithm, and the mechanical arm is controlled to perform material grabbing, which specifically includes the following steps:
Step 51, obtaining a distance estimation value and a posture estimation value of a target material, and inputting the distance estimation value and the posture estimation value into a YOLOv deep learning model, wherein the YOLOv deep learning model calculates and outputs a motion track of the mechanical arm by using a track planning algorithm;
step 52, converting the motion track into a track instruction executable by the mechanical arm:
step 53, converting the track instruction into a joint angle instruction of the mechanical arm through the mechanical arm controller, and sending the joint angle instruction to the mechanical arm to execute grabbing operation;
And step 54, acquiring the latest distance estimated value and the latest attitude estimated value of the target material in real time while the mechanical arm executes the grabbing task, continuously running the YOLOv deep learning model to update the motion trail, and executing steps 52-53 according to the updated motion trail.
Wherein the YOLOv deep learning model further includes:
The Backbone network Backbone comprises a convolution module Conv, a C3 module and a pooling module SPPF, wherein the convolution module Conv is provided with a series of convolution layers and is stacked according to a specific sequence and parameters, the C3 module is used for extracting features, and the pooling module SPPF is used for analyzing the features;
The Head assembly Head comprises a splicing module Concat and a detection module Detect, wherein the splicing module Concat is used for carrying out up-sampling and splicing processing on the characteristics extracted by the backbone network backbone, and the detection module Detect is used for generating a final detection result by utilizing the spliced characteristic diagram;
A neck component Neck, configured to connect the Backbone network Backbone and the Head component Head, and to optimize the features output by the Backbone network Backbone;
the bottleneck component Bottleneck is used for reducing the dimension of the feature map and extracting the features, and carrying out dimension lifting to restore the original channel number after the feature extraction is completed.
Preferably, the step 10 further includes:
the camera calibration step:
the camera is arranged on the end effector of the mechanical arm;
Collecting a series of calibration plate images under different postures and positions for calibrating camera parameters;
Calculating camera internal parameter and camera external parameter by using a calibration algorithm;
the hand-eye calibration step:
According to camera external parameter corresponding to each image in the camera calibration result, combining the tail end gesture of the corresponding mechanical arm when each image is acquired, and performing hand-eye calibration;
calculating a conversion relation between hand-eye coordinate systems by using a hand-eye calibration algorithm, wherein the conversion relation comprises a rotation matrix and a translation vector;
And converting the ranging information and the gesture information of the target material from the mechanical arm coordinate system to the hand-eye coordinate system, so as to control the mechanical arm to execute grabbing operation according to the ranging information and the gesture information of the target material in the hand-eye coordinate system.
Corresponding to the deep learning-based visual mechanical arm material grabbing control method, the invention provides a deep learning-based visual mechanical arm material grabbing control system, which comprises the following components:
a 3D camera or binocular camera for capturing camera images;
the laser radar is used for collecting photoelectric data;
the mechanical arm is used for executing material grabbing operation;
The system further comprises a memory, a processor and a material grabbing control program stored in the memory and executable on the processor, wherein the material grabbing control program, when executed by the processor, implements the steps of the deep learning-based visual mechanical arm material grabbing control method described above.
The beneficial effects of the invention are as follows:
(1) Although camera positioning can provide rich visual information, its accuracy suffers under poor illumination or when the target object lacks distinct features; laser ranging, by contrast, is unaffected by illumination conditions but provides relatively little information and adapts poorly to complex environments. The invention therefore combines the two technologies: the laser ranging sensor, being insensitive to illumination, can measure distance accurately in a wide range of lighting environments and is thus robust where light is insufficient or reflection is strong; combining the laser ranging data with the three-dimensional image acquired by the 3D camera lets the two complement each other, reducing the influence of environmental factors on positioning accuracy and improving both the robustness and the precision of positioning.
(2) For materials of complex shape or stacked together, the fused positioning of laser ranging and 3D cameras can provide more comprehensive information. Laser ranging can accurately measure the distance to an object, while a 3D camera can acquire the three-dimensional shape and structure of the object. By combining the data of the two, the edge, the outline and the characteristic points of the material can be more accurately identified and positioned, so that the accurate grabbing of the complex material is realized.
(3) The application of the deep learning algorithm further improves the performance of the positioning system. By training the deep learning model, the system can learn how to extract useful features from the camera and laser ranging data, and determine the pose of the material accordingly. The method not only improves the positioning accuracy, but also can adapt to the change of different environments and materials, and enhances the flexibility and the universality of the system.
(4) On the premise of meeting the accuracy requirement, the method and the device for positioning the mobile terminal can reduce hardware cost by fusing the positioning scheme. This is because the shortage of hardware accuracy can be made up to a certain extent by optimization of the algorithm and fusion of data, thereby reducing the demand for high-accuracy hardware. This not only reduces the manufacturing cost of the system, but also improves the reliability and stability of the system.
(5) The invention utilizes Kalman filtering algorithm to perform fusion optimization on data, thereby further reducing noise and error and improving the stability and reliability of positioning; and through repeated iterative computation, the initial estimated value obtained by Kalman filtering can be further optimized and adjusted, and each iteration can update state estimation according to the previous estimated value and new observed data, so that the actual value is gradually approximated, the estimation error is reduced, and the estimation precision is improved.
(6) The invention performs real-time visual positioning through 3D vision while also using point laser ranging, and fuses the two for positioning, reducing the 3D visual positioning error from 2% to 0.5%; the 3D attitude of the object is determined in real time with a deep learning algorithm, the visual coordinates are converted into mechanical arm coordinates, and the Euler angles of the mechanical arm are calculated to adjust the attitude of the arm end so that the material is grasped in a suitable pose. By adopting binocular cameras or consumer-grade 3D cameras, the invention achieves the effect of a high-precision laser scanning camera while reducing the cost to about 10 percent; combined with a convolutional neural network for target identification, it offers strong anti-interference performance and meets industrial application requirements.
(7) The method for planning the movement of the mechanical arm based on the YOLOv deep learning model can better adapt to changes in the position and attitude of the target material, thereby significantly improving grabbing precision and success rate. In addition, while the mechanical arm executes the grabbing task, the invention acquires the latest distance and attitude information of the target material in real time and continuously updates the motion trail; this real-time adjustment allows the mechanical arm to cope with position changes or attitude adjustments of the target material during grabbing, making it more automated and intelligent.
(8) In the YOLOv deep learning model, optimizing the Conv and C3 modules in the Backbone network allows image features to be extracted more effectively, improving the recognition accuracy of the target material; the pooling module SPPF helps analyze the features more finely, further improving detection accuracy; the bottleneck component Bottleneck lets the model extract features after dimension reduction and then restore the original channel number by raising the dimension again, which retains key information while reducing the amount of computation, enhances the robustness of the model to changes in environment and scene, and to some extent lowers computational complexity and raises inference speed, making the model better suited to real-time tasks such as material grabbing.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the specific embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The mechanical arm is a movable arm formed by a series of rigid components joined by links, providing a flexible wrist and an end effector for performing the tasks required of the robot. The mechanical arm can perform three motions, rotation, lifting and telescoping, which are accomplished by the cross arm and the column. The mechanical arm moves under a computer program; for different tasks the motion track in the joint space of the mechanical arm must be planned, and the end pose is then obtained by cascading the joint motions.
The vision robot arm is an industrial robot that incorporates machine vision technology. The working principle of the system mainly depends on a vision system to acquire and process environmental information, so that the target object is identified, positioned and grabbed. According to the invention, the camera and the laser radar are arranged at the tail end of the mechanical arm to identify the target material, the ranging information and the gesture information of the target material are obtained, and the track instruction of the mechanical arm is obtained through coordinate conversion, so that the mechanical arm is accurately controlled to execute grabbing operation.
The vision mechanical arm has wide application in the fields of industrial automation, logistics, manufacturing industry and the like. For example, on a factory production line, a vision robot may be used for product assembly, quality inspection, packaging, and like tasks. With the development of technologies such as 5G, cloud computing, edge computing and the like, the application of machine vision and mechanical arm systems will be more extensive, and more intelligent manufacturing and intelligent warehousing systems based on machine vision will be developed in the future.
Therefore, the invention provides a visual mechanical arm material grabbing control method based on deep learning, which comprises the following steps:
step 10, acquiring a camera image, and analyzing the camera image to obtain camera ranging information and camera attitude information of a target material;
step 20, acquiring photoelectric data, and analyzing the photoelectric data to obtain photoelectric ranging information and photoelectric attitude information of a target material;
step 30, carrying out fusion calculation on the camera ranging information and the photoelectric ranging information according to a Kalman filtering algorithm to obtain a distance estimation value of a target material;
step 40, carrying out fusion calculation on the camera attitude information and the photoelectric attitude information through vector analysis and a three-dimensional geometric algorithm to obtain an attitude estimation value of a target material;
And 50, performing motion planning on the mechanical arm by using a deep learning algorithm according to the distance estimated value and the gesture estimated value of the target material, and controlling the mechanical arm to perform material grabbing.
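The overall flow of steps 10 to 50 can be summarized as a single control cycle. The following Python sketch only illustrates that flow under assumed interfaces: the injected callables (parse_camera_image, fuse_distance, and so on) are hypothetical placeholders for the processing described in the embodiments below, not functions defined by the invention.

```python
from typing import Any, Callable, Tuple

Vec3 = Tuple[float, float, float]

def grasp_control_cycle(
    capture_image: Callable[[], Any],
    scan_lidar: Callable[[], Any],
    parse_camera_image: Callable[[Any], Tuple[float, Vec3]],
    parse_lidar_data: Callable[[Any], Tuple[float, Vec3]],
    fuse_distance: Callable[[float, float], float],
    fuse_attitude: Callable[[Vec3, Vec3], Vec3],
    plan_and_execute: Callable[[float, Vec3], None],
) -> None:
    """One control cycle covering steps 10-50 (all processing steps are injected)."""
    cam_dist, cam_pose = parse_camera_image(capture_image())     # step 10
    opt_dist, opt_pose = parse_lidar_data(scan_lidar())          # step 20
    dist_estimate = fuse_distance(cam_dist, opt_dist)            # step 30: Kalman fusion
    pose_estimate = fuse_attitude(cam_pose, opt_pose)            # step 40: vector / 3D geometry fusion
    plan_and_execute(dist_estimate, pose_estimate)               # step 50: deep-learning motion planning
```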
In this embodiment, in the step 10, analyzing the camera image specifically includes:
step 11, obtaining a depth image and a color image of a scene where the target material is located;
Acquiring a depth value corresponding to each pixel point according to the depth image;
calculating camera ranging information of the target material according to the depth value;
step 12, extracting the characteristics of the target material according to the depth image and the color image;
Identifying a target material by a feature matching or template matching or machine learning method, and determining the projection shape of the target material in an image;
Comparing the projection shape with a preset material model to obtain the direction and angle of the target material in a three-dimensional space;
And obtaining camera attitude information of the target material according to the direction and the angle.
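As a minimal illustration of how the camera ranging information of step 11 could be derived from the depth image, the following NumPy sketch takes a robust median over the valid depth samples of the target; the boolean target mask is assumed to come from the identification in step 12, and the choice of the median statistic is an assumption, not specified by the embodiment.

```python
import numpy as np

def camera_ranging_from_depth(depth_m: np.ndarray, target_mask: np.ndarray) -> float:
    """Estimate the camera-to-target distance from the depth image (step 11).

    depth_m: HxW depth image in metres; target_mask: HxW boolean mask of the
    target material obtained from the identification in step 12.
    """
    target_depths = depth_m[target_mask & (depth_m > 0)]   # drop invalid zero-depth pixels
    if target_depths.size == 0:
        raise ValueError("no valid depth samples on the target")
    return float(np.median(target_depths))                 # robust single ranging value
```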
In this embodiment, image preprocessing is also performed before the image is parsed; the preprocessing can be completed through three matrices, namely a scaling matrix S, a translation matrix O and a translation matrix T. The method specifically comprises the following steps:
1. scaling the image in equal ratio through the scaling matrix S (the scaling factor Scale is the minimum of the width ratio and the height ratio of the target image to the source image);
2. centering the image: first translating the center of the image to the origin at the upper-left corner through the translation matrix O, then translating the image to the center of the target position through the translation matrix T;
3. padding or cropping the surplus part according to the preset image size.
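A minimal sketch of this preprocessing, assuming OpenCV is available and composing the S, O and T matrices described above into a single affine warp; the padding value of 114 is an illustrative choice, not part of the embodiment.

```python
import cv2
import numpy as np

def letterbox_preprocess(img: np.ndarray, dst_w: int, dst_h: int, pad_value: int = 114) -> np.ndarray:
    """Equal-ratio scaling plus centering via the S, O and T matrices described above."""
    src_h, src_w = img.shape[:2]
    scale = min(dst_w / src_w, dst_h / src_h)            # Scale = min(width ratio, height ratio)

    S = np.array([[scale, 0, 0], [0, scale, 0], [0, 0, 1]], dtype=np.float64)               # scaling matrix
    O = np.array([[1, 0, -src_w / 2], [0, 1, -src_h / 2], [0, 0, 1]], dtype=np.float64)     # center -> origin
    T = np.array([[1, 0, dst_w / 2], [0, 1, dst_h / 2], [0, 0, 1]], dtype=np.float64)       # origin -> target center

    M = (T @ S @ O)[:2, :]                               # composite affine transform
    return cv2.warpAffine(img, M, (dst_w, dst_h),
                          borderValue=(pad_value, pad_value, pad_value))  # pad the leftover border
```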
In this embodiment, in the step 20, analyzing the photoelectric data specifically includes:
Step 21, scanning an area where a target material is located through laser radar equipment, and acquiring three-dimensional point cloud data containing the target material;
extracting a point cloud subset corresponding to the target material through point cloud registration and segmentation;
extracting characteristics of a target material from the point cloud subset;
Step 22, identifying a target material through a feature matching or model matching or clustering algorithm or a machine learning method, and determining positioning information of the target material in a three-dimensional space;
analyzing photoelectric ranging information from each point in the point cloud subset to the laser radar equipment according to the positioning information;
step 23, analyzing the characteristics of the target material according to a model matching or deep learning method to obtain the direction and angle of the target material in a three-dimensional space;
and obtaining photoelectric attitude information of the target material according to the direction and the angle.
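For illustration only, the following sketch shows how steps 21-23 could be approximated with the Open3D library: RANSAC plane removal and DBSCAN clustering isolate a target point-cloud subset, per-point ranges are computed to the lidar origin, and an oriented bounding box supplies a coarse attitude. The thresholds and the use of a bounding box for attitude are assumptions, not requirements of the embodiment.

```python
import numpy as np
import open3d as o3d

def parse_lidar_cloud(pcd: o3d.geometry.PointCloud):
    """Segment the target point-cloud subset, then compute ranging and a coarse attitude."""
    # Remove the dominant supporting plane (e.g. conveyor or table) by RANSAC
    _, plane_idx = pcd.segment_plane(distance_threshold=0.01, ransac_n=3, num_iterations=200)
    objects = pcd.select_by_index(plane_idx, invert=True)

    # Cluster the remaining points and keep the largest cluster as the target subset
    labels = np.array(objects.cluster_dbscan(eps=0.02, min_points=20))
    if labels.max() < 0:
        raise RuntimeError("no target cluster found")
    biggest = np.bincount(labels[labels >= 0]).argmax()
    target = objects.select_by_index(np.where(labels == biggest)[0])

    # Photoelectric ranging: distance of every target point to the lidar origin
    pts = np.asarray(target.points)
    ranges = np.linalg.norm(pts, axis=1)

    # Attitude: orientation of the target's oriented bounding box (rotation matrix)
    obb = target.get_oriented_bounding_box()
    return ranges, obb.R, obb.center
```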
In this embodiment, in the step 30, fusion calculation is performed on the camera ranging information and the photoelectric ranging information according to a kalman filtering algorithm to obtain a distance estimation value of the target material, and the method specifically includes the following steps:
Step 31, calculating a fused measurement value:
Z_measured = (d2·x + d1·y) / (d1 + d2)
wherein Z_measured represents the single measurement value obtained by fusion calculation, x represents the photoelectric (laser) ranging information and d1 represents the measured variance of the laser ranging random variable; y represents the camera ranging information and d2 represents the measured variance of the camera ranging random variable;
Step 32, calculating the Kalman gain:
K = P / (P + R)
wherein K represents the Kalman gain, P represents the random estimation error covariance and R represents the measurement error covariance;
Step 33, calculating the final estimated value:
X_pred2 = X_pred1 + K·(Z_measured - X_pred1)
wherein X_pred2 represents the final estimated value, Z_measured represents the single measurement value obtained by fusion calculation, and X_pred1 represents the initial value or the previous estimated value.
Further, the method also comprises repeating the steps 31-33 and performing i iterations, and specifically comprises the following steps:
Step 31', calculating the i-th fused measurement value Z_measured_i, with the same calculation formula as in step 31;
Step 32', calculating the Kalman gain of the i-th iteration:
K_i = P_i / (P_i + R_i), with P_i = (1 - K_(i-1))·P_(i-1)
wherein K_i represents the Kalman gain of the i-th iteration, K_(i-1) represents the Kalman gain of the (i-1)-th iteration, P_i represents the random estimation error covariance of the i-th iteration, P_(i-1) represents the random estimation error covariance of the (i-1)-th iteration, and R_i represents the measurement error covariance of the i-th iteration;
Step 33', calculating the final iteration estimate:
X_pred_i = X_pred_(i-1) + K_i·(Z_measured_i - X_pred_(i-1))
wherein X_pred_i represents the i-th iteration estimated value, Z_measured_i represents the single measurement value obtained by fusion calculation in the i-th iteration, and X_pred_(i-1) represents the (i-1)-th iteration estimated value. In this embodiment, i is not less than 20.
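A compact Python sketch of the fusion and iteration described above; the initial covariance p0, measurement covariance r and initial state x0 are illustrative values, not values taken from the embodiment.

```python
def fuse_measurement(x: float, d1: float, y: float, d2: float) -> float:
    """Step 31: inverse-variance fusion of laser ranging x (variance d1)
    and camera ranging y (variance d2) into a single measurement."""
    return (d2 * x + d1 * y) / (d1 + d2)

def iterative_kalman_distance(measurements, p0: float = 1.0, r: float = 0.05,
                              x0: float = 0.0) -> float:
    """Steps 31'-33': scalar Kalman update repeated over i >= 20 fused measurements."""
    x_est, p = x0, p0
    for z in measurements:                 # z = fused measurement Z_measured_i
        k = p / (p + r)                    # step 32': Kalman gain K_i
        x_est = x_est + k * (z - x_est)    # step 33': updated estimate X_pred_i
        p = (1.0 - k) * p                  # shrink the estimation error covariance
    return x_est
```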
In this embodiment, in the step 40, the camera pose information and the photoelectric pose information are fused and calculated by using a vector analysis and a three-dimensional geometric algorithm to obtain a pose estimation value of the target material, which specifically includes the following steps:
Step 41, converting the camera gesture information and the photoelectric gesture information into vector forms respectively; the vectors include a position vector and a direction vector;
Step 42, calculating cosine values of included angles between the vector and each coordinate axis by using a direction cosine formula of the vector;
step 43, calculating an included angle of the vector by using an inverse cosine function arccos, and obtaining a relative position relationship between the mechanical arm and the target material according to the included angle; the relative positional relationship includes camera positional information and photoelectric positional information;
And step 44, performing fusion calculation on the camera position information and the photoelectric position information by using Kalman filtering to obtain an estimated posture value of the fused target material.
The fusion calculation of step 44 mainly includes the following steps:
Initializing: an initial state estimate of the kalman filter is set, which may be based on previous knowledge, experience, or some assumption.
And a prediction step: based on the camera pose information and the mechanical arm photoelectric pose information at the current moment and the motion instruction of the mechanical arm, a prediction equation of a Kalman filter is used to estimate the state at the next moment.
Data fusion: after the predicting step, when new camera pose information and robotic arm optoelectronic pose information are available, using these data as observations; the observed value and the predicted value are fused, and the Kalman gain is calculated to determine which part of the prediction and the observation is more reliable.
Updating: the state estimate is updated using the kalman gain and the observations to obtain a more accurate state estimate.
And (3) iteration loop: and taking the updated state estimation as the input of a prediction step at the next moment, and repeating the processes of prediction, data fusion and updating. By means of an iterative loop, the Kalman filter is able to continually utilize new observations to correct and optimize the estimation of the system state.
The specific calculation process of the step 44 is similar to the specific step of performing the fusion calculation on the camera ranging information and the photoelectric ranging information according to the kalman filter algorithm in the foregoing step 30, and will not be described herein.
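The direction-cosine computation of steps 42-43 can be illustrated with a few lines of NumPy; the example vector is arbitrary and only serves to show the calculation.

```python
import numpy as np

def direction_angles(v: np.ndarray) -> np.ndarray:
    """Steps 42-43: angles between a direction vector and the X/Y/Z axes.

    cos(alpha) = v_x/|v|, cos(beta) = v_y/|v|, cos(gamma) = v_z/|v|; the angles
    are recovered with arccos. The input vector is assumed non-zero.
    """
    cosines = v / np.linalg.norm(v)                      # direction cosines of the vector
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

# e.g. a camera-derived direction vector of the target material (illustrative values)
print(direction_angles(np.array([1.0, 1.0, 0.0])))       # approximately [45, 45, 90] degrees
```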
In this embodiment, in the step 50, according to the distance estimation value and the gesture estimation value of the target material, the mechanical arm is subjected to motion planning by using a deep learning algorithm, and the mechanical arm is controlled to grasp the material, which specifically includes the following steps:
Step 51, obtaining a distance estimation value and a posture estimation value of a target material, and inputting the distance estimation value and the posture estimation value into a YOLOv deep learning model, wherein the YOLOv deep learning model calculates and outputs a motion track of the mechanical arm by using a track planning algorithm;
step 52, converting the motion track into a track instruction executable by the mechanical arm:
step 53, converting the track instruction into a joint angle instruction of the mechanical arm through the mechanical arm controller, and sending the joint angle instruction to the mechanical arm to execute grabbing operation;
And step 54, acquiring the latest distance estimated value and the latest attitude estimated value of the target material in real time while the mechanical arm executes the grabbing task, continuously running the YOLOv deep learning model to update the motion trail, and executing steps 52-53 according to the updated motion trail.
The attitude estimation value of the target material, obtained by fusing the camera attitude information and the photoelectric attitude information, mainly comprises the three RPY angles, namely:
roll angle (Roll): a rotation angle about the X-axis for describing rotation of the object in a horizontal plane;
pitch angle (Pitch): the rotation angle around the Y axis represents the inclination degree of the object on the vertical plane;
Yaw angle (Yaw): the angle of rotation about the Z-axis describes the orientation of the object in the horizontal plane.
Kalman filtering is then used to obtain the optimal estimate; the coordinates of the optimally estimated points are used to calculate the three vectors of the three axes, and after normalization a transformation matrix is obtained through linear transformation; finally, the Euler angles are calculated and transmitted to the robot arm.
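As an illustration of converting the fused RPY angles into Euler angles for the arm, the following sketch uses SciPy's Rotation class; the cam_to_base rotation is an assumed hand-eye calibration result, and the "xyz" Euler convention is an assumption rather than something fixed by the embodiment.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def rpy_to_arm_pose(roll_deg: float, pitch_deg: float, yaw_deg: float,
                    cam_to_base: np.ndarray) -> np.ndarray:
    """Convert the fused RPY attitude of the material into Euler angles in the arm
    base frame. cam_to_base is a 3x3 rotation from the vision frame to the arm base
    frame, assumed to come from hand-eye calibration."""
    r_material_cam = R.from_euler("xyz", [roll_deg, pitch_deg, yaw_deg], degrees=True)
    r_material_base = R.from_matrix(cam_to_base) * r_material_cam   # change of frame
    return r_material_base.as_euler("xyz", degrees=True)            # RPY sent to the arm
```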
In this embodiment, the YOLOv deep learning model further includes:
The Backbone network Backbone comprises a convolution module Conv, a C3 module and a pooling module SPPF, wherein the convolution module Conv is provided with a series of convolution layers and is stacked according to a specific sequence and parameters, the C3 module is used for extracting features, and the pooling module SPPF is used for analyzing the features;
The Head assembly Head comprises a splicing module Concat and a detection module Detect, wherein the splicing module Concat is used for carrying out up-sampling and splicing processing on the characteristics extracted by the backbone network backbone, and the detection module Detect is used for generating a final detection result by utilizing the spliced characteristic diagram;
A neck component Neck, configured to connect the Backbone network Backbone and the Head component Head, and to optimize the features output by the Backbone network Backbone;
the bottleneck component Bottleneck is used for reducing the dimension of the feature map and extracting the features, and carrying out dimension lifting to restore the original channel number after the feature extraction is completed.
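The embodiment names the modules but does not give their internal dimensions; the following PyTorch sketch re-implements Conv, Bottleneck, C3 and SPPF in the simplified form commonly used in YOLOv5-style code bases, with illustrative channel arithmetic, purely to make the component descriptions above concrete.

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Convolution + BatchNorm + SiLU stack used throughout the Backbone."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """1x1 dimension reduction -> 3x3 conv -> restore channels, with optional residual."""
    def __init__(self, c_in, c_out, shortcut=True):
        super().__init__()
        c_hidden = c_out // 2
        self.cv1 = Conv(c_in, c_hidden, 1)
        self.cv2 = Conv(c_hidden, c_out, 3)
        self.add = shortcut and c_in == c_out

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class C3(nn.Module):
    """CSP-style feature-extraction block with n stacked Bottlenecks."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_hidden = c_out // 2
        self.cv1 = Conv(c_in, c_hidden, 1)
        self.cv2 = Conv(c_in, c_hidden, 1)
        self.cv3 = Conv(2 * c_hidden, c_out, 1)
        self.m = nn.Sequential(*(Bottleneck(c_hidden, c_hidden) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))

class SPPF(nn.Module):
    """Spatial pyramid pooling (fast): serial max-pools concatenated and fused."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = Conv(c_in, c_hidden, 1)
        self.cv2 = Conv(4 * c_hidden, c_out, 1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        return self.cv2(torch.cat((x, y1, y2, self.pool(y2)), dim=1))
```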
In this embodiment, the step 10 further includes:
the camera calibration step:
the camera is arranged on the end effector of the mechanical arm;
Collecting a series of calibration plate images under different postures and positions for calibrating camera parameters;
Calculating camera internal parameter and camera external parameter by using a calibration algorithm;
the hand-eye calibration step:
According to camera external parameter corresponding to each image in the camera calibration result, combining the tail end gesture of the corresponding mechanical arm when each image is acquired, and performing hand-eye calibration;
calculating a conversion relation between hand-eye coordinate systems by using a hand-eye calibration algorithm, wherein the conversion relation comprises a rotation matrix and a translation vector;
And converting the ranging information and the gesture information of the target material from the mechanical arm coordinate system to the hand-eye coordinate system, so as to control the mechanical arm to execute grabbing operation according to the ranging information and the gesture information of the target material in the hand-eye coordinate system.
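The calibration steps above map naturally onto OpenCV's calibration API; the sketch below is an assumed arrangement (Tsai's method is one arbitrary choice among the methods cv2.calibrateHandEye supports), not the calibration algorithm prescribed by the embodiment.

```python
import cv2
import numpy as np

def calibrate_camera_and_hand_eye(obj_points, img_points, image_size,
                                  R_gripper2base, t_gripper2base):
    """Camera calibration + hand-eye calibration sketch.

    obj_points / img_points: per-image calibration-board points;
    R_gripper2base / t_gripper2base: arm end-effector poses recorded for each image.
    """
    # Intrinsic parameters and per-image extrinsics from the calibration-board images
    _, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)

    # Board pose in the camera frame for every image (the camera extrinsics)
    R_target2cam = [cv2.Rodrigues(r)[0] for r in rvecs]
    t_target2cam = list(tvecs)

    # Hand-eye calibration: rotation matrix + translation vector from camera to end effector
    R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base, R_target2cam, t_target2cam,
        method=cv2.CALIB_HAND_EYE_TSAI)
    return K, dist, R_cam2gripper, t_cam2gripper
```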
Corresponding to the deep learning-based visual mechanical arm material grabbing control method, the invention provides a deep learning-based visual mechanical arm material grabbing control system, which comprises the following components:
a 3D camera or binocular camera for capturing camera images;
the laser radar is used for collecting photoelectric data;
And the mechanical arm is used for executing material grabbing operation.
The 3D camera achieves the acquisition of depth information in various ways, such as structured light, time of flight (ToF), binocular vision, line scanning, and speckle technologies, so that the 3D camera can capture three-dimensional shape and position information of an object.
The method for acquiring the camera ranging information through the 3D camera comprises the following steps of:
Parallax method: this is a common method for a 3D camera to acquire distance. The distance of the object can be calculated by calculating the displacement (namely parallax) of the same object in the image, which is shot by two or more cameras; this method perceives depth information by simulating binocular vision based on the principle that human eyes observe an object.
Structured light method: the 3D camera emits a specific pattern of structured light (e.g., a grating or projected texture) into the scene and captures the reflection of the light at the object surface using the camera. The shape and position of the object in space can be calculated by analyzing the deformation of the structured light on the surface of the object;
Time of flight (ToF): this method sends short pulses of light (typically infrared) into the scene and measures the time delay of the beam reflected by the object. From the measured round-trip time and the propagation speed of light, the distance of the object can be calculated.
Camera attitude information can be acquired from the 3D camera using a PnP (Perspective-n-Point) algorithm. The algorithm uses the camera intrinsic parameters, the position of the object in the image coordinate system and its position in the corresponding world coordinate system to solve for the pose of the camera in the real coordinate system. From multiple point pairs and the law of cosines, the pose of the camera, i.e. the direction and angle of the object relative to the camera, can be calculated.
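A minimal PnP sketch with OpenCV, assuming at least four known 3D model points of the material and their detected image projections; using the default iterative solver is an assumption.

```python
import cv2
import numpy as np

def pnp_camera_pose(object_points: np.ndarray, image_points: np.ndarray,
                    K: np.ndarray, dist: np.ndarray):
    """Recover the target's rotation and translation relative to the camera from
    N >= 4 known 3D model points and their detected 2D image projections."""
    ok, rvec, tvec = cv2.solvePnP(object_points.astype(np.float64),
                                  image_points.astype(np.float64), K, dist)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)       # rotation vector -> 3x3 rotation matrix
    return R, tvec                   # direction/angle and position of the target
```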
The laser radar uses laser as a signal source, and measures the distance and the gesture of the material by emitting pulse laser and receiving signals reflected by the pulse laser. The measuring mode has the characteristics of high precision and high resolution, so that the laser radar has remarkable advantages in the process of collecting photoelectric data. Meanwhile, the laser radar can acquire a three-dimensional image of the material, the laser radar can collect point cloud data of all points on the material by continuously scanning the material and measuring the reflection time of laser pulses, and further an accurate three-dimensional image is obtained through imaging processing, the three-dimensional image not only contains shape and size information of a target object, but also can reflect details and texture characteristics of the surface of the target object, and abundant information is provided for subsequent processing and analysis of photoelectric data.
The system further comprises a memory, a processor and a material grabbing control program stored in the memory and capable of running on the processor, wherein the material grabbing control program realizes the steps of the visual mechanical arm material grabbing control method based on deep learning according to any one of the above when being executed by the processor, and the implementation principle and the technical effect are similar, and detailed description can be found in the above embodiments and the detailed description is omitted here.
The memory may be used to store software programs and modules, and the processor executes the software programs and modules stored in the memory to perform various functional applications and data processing. The memory may mainly include a memory program area and a memory data area, wherein the memory program area may store an operating system, application programs required for at least one function (e.g., an optoelectronic data storage function and processing function, an image video storage function and processing function), and the like; the storage data area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide access to the memory by the processor and the input unit.
The input unit may be used to receive input digital or character or image information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. Specifically, the input unit of the present embodiment includes, in addition to a 3D camera, a binocular camera, a laser radar, and other input devices.
The display unit may be used to display information entered by a user or provided to a user as well as various graphical user interfaces of the device, which may be composed of graphics, text, icons, video and any combination thereof. The display unit may include a display panel, and optionally, the display panel may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface may overlay the display panel, and upon detection of a touch operation thereon or thereabout, the touch-sensitive surface is communicated to the processor to determine the type of touch event, and the processor then provides a corresponding visual output on the display panel based on the type of touch event. The display unit can be arranged on the mechanical arm or the upper computer.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described as different from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other. For system embodiments, the description is relatively simple as it is substantially similar to method embodiments, and reference is made to the description of method embodiments for relevant points.
Also, herein, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
While the foregoing description illustrates and describes the preferred embodiments of the present invention, it is to be understood that the invention is not limited to the forms disclosed herein and is not to be construed as excluding other embodiments; it is capable of use in various other combinations, modifications and environments, and of changes or modifications within the scope of the inventive concept described above, whether in light of the above teachings or the skill or knowledge of the relevant art. All modifications and variations that do not depart from the spirit and scope of the invention are intended to be within the scope of the appended claims.

Claims (9)

1. A visual mechanical arm material grabbing control method based on deep learning is characterized by comprising the following steps:
step 10, acquiring a camera image, and analyzing the camera image to obtain camera ranging information and camera attitude information of a target material;
step 20, acquiring photoelectric data, and analyzing the photoelectric data to obtain photoelectric ranging information and photoelectric attitude information of a target material;
step 30, carrying out fusion calculation on the camera ranging information and the photoelectric ranging information according to a Kalman filtering algorithm to obtain a distance estimation value of a target material;
step 40, carrying out fusion calculation on the camera attitude information and the photoelectric attitude information through vector analysis and a three-dimensional geometric algorithm to obtain an attitude estimation value of a target material;
Step 50, performing motion planning on the mechanical arm by using a deep learning algorithm according to the distance estimation value and the attitude estimation value of the target material, and controlling the mechanical arm to perform material grabbing;
In the step 50, according to the distance estimation value and the attitude estimation value of the target material, the mechanical arm is subjected to motion planning by using a deep learning algorithm, and the mechanical arm is controlled to grasp the material, and specifically comprises the following steps:
Step 51, obtaining a distance estimation value and a posture estimation value of a target material, and inputting the distance estimation value and the posture estimation value into a YOLOv deep learning model, wherein the YOLOv deep learning model calculates and outputs a motion track of the mechanical arm by using a track planning algorithm;
step 52, converting the motion track into a track instruction executable by the mechanical arm:
step 53, converting the track instruction into a joint angle instruction of the mechanical arm through the mechanical arm controller, and sending the joint angle instruction to the mechanical arm to execute grabbing operation;
And step 54, acquiring the latest distance estimated value and the latest attitude estimated value of the target material in real time while the mechanical arm executes the grabbing task, continuously running the YOLOv deep learning model to update the motion trail, and executing steps 52-53 according to the updated motion trail.
2. The method for controlling gripping of a visual mechanical arm material based on deep learning according to claim 1, wherein in the step 10, the analyzing the camera image specifically includes:
step 11, obtaining a depth image and a color image of a scene where the target material is located;
Acquiring a depth value corresponding to each pixel point according to the depth image;
calculating camera ranging information of the target material according to the depth value;
step 12, extracting the characteristics of the target material according to the depth image and the color image;
Identifying a target material by a feature matching or template matching or machine learning method, and determining the projection shape of the target material in an image;
Comparing the projection shape with a preset material model to obtain the direction and angle of the target material in a three-dimensional space;
And obtaining camera attitude information of the target material according to the direction and the angle.
3. The deep learning-based vision mechanical arm material grabbing control method according to claim 1, wherein in the step 20, the photoelectric data is parsed, and the method specifically includes:
Step 21, scanning an area where a target material is located through laser radar equipment, and acquiring three-dimensional point cloud data containing the target material;
extracting a point cloud subset corresponding to the target material through point cloud registration and segmentation;
extracting characteristics of a target material from the point cloud subset;
Step 22, identifying a target material through a feature matching or model matching or clustering algorithm or a machine learning method, and determining positioning information of the target material in a three-dimensional space;
analyzing photoelectric ranging information from each point in the point cloud subset to the laser radar equipment according to the positioning information;
step 23, analyzing the characteristics of the target material according to a model matching or deep learning method to obtain the direction and angle of the target material in a three-dimensional space;
and obtaining photoelectric attitude information of the target material according to the direction and the angle.
4. The method for controlling the grabbing of the visual mechanical arm material based on deep learning according to claim 1, wherein in the step 30, the camera ranging information and the photoelectric ranging information are fused and calculated according to a kalman filtering algorithm to obtain a distance estimated value of the target material, and the method specifically comprises the following steps:
Step 31, calculating a fused measurement value:
Z_measured = (d2·x + d1·y) / (d1 + d2)
wherein Z_measured represents the single measurement value obtained by fusion calculation, x represents the photoelectric (laser) ranging information and d1 represents the measured variance of the laser ranging random variable; y represents the camera ranging information and d2 represents the measured variance of the camera ranging random variable;
Step 32, calculating the Kalman gain:
K = P / (P + R)
wherein K represents the Kalman gain, P represents the random estimation error covariance and R represents the measurement error covariance;
Step 33, calculating the final estimated value:
X_pred2 = X_pred1 + K·(Z_measured - X_pred1)
wherein X_pred2 represents the final estimated value, Z_measured represents the single measurement value obtained by fusion calculation, and X_pred1 represents the initial value or the previous estimated value.
5. The deep learning-based vision mechanical arm material grabbing control method of claim 4, further comprising repeating the steps 31-33 and performing i iterations, and specifically comprising:
Step 31', calculating the i-th fused measurement value Z_measured_i, with the same calculation formula as in step 31;
Step 32', calculating the Kalman gain of the i-th iteration:
K_i = P_i / (P_i + R_i), with P_i = (1 - K_(i-1))·P_(i-1)
wherein K_i represents the Kalman gain of the i-th iteration, K_(i-1) represents the Kalman gain of the (i-1)-th iteration, P_i represents the random estimation error covariance of the i-th iteration, P_(i-1) represents the random estimation error covariance of the (i-1)-th iteration, and R_i represents the measurement error covariance of the i-th iteration;
Step 33', calculating the final iteration estimate:
X_pred_i = X_pred_(i-1) + K_i·(Z_measured_i - X_pred_(i-1))
wherein X_pred_i represents the i-th iteration estimated value, Z_measured_i represents the single measurement value obtained by fusion calculation in the i-th iteration, and X_pred_(i-1) represents the (i-1)-th iteration estimated value.
6. The deep learning-based vision mechanical arm material grabbing control method is characterized by comprising the following steps of: in the step 40, the camera pose information and the photoelectric pose information are fused and calculated through vector analysis and a three-dimensional geometric algorithm to obtain a pose estimation value of the target material, and the method specifically comprises the following steps:
Step 41, converting the camera gesture information and the photoelectric gesture information into vector forms respectively; the vectors include a position vector and a direction vector;
Step 42, calculating cosine values of included angles between the vector and each coordinate axis by using a direction cosine formula of the vector;
step 43, calculating an included angle of the vector by using an inverse cosine function arccos, and obtaining a relative position relationship between the mechanical arm and the target material according to the included angle; the relative positional relationship includes camera positional information and photoelectric positional information;
And step 44, performing fusion calculation on the camera position information and the photoelectric position information by using Kalman filtering to obtain an estimated posture value of the fused target material.
7. The deep-learning-based visual mechanical arm material grabbing control method according to claim 1, wherein the YOLOv deep learning model further comprises:
a Backbone network Backbone, comprising a convolution module Conv, a C3 module and a pooling module SPPF, wherein the convolution module Conv is formed by a series of convolution layers stacked in a specific order with specific parameters, the C3 module is used for extracting features, and the pooling module SPPF is used for analyzing the features;
a Head component Head, comprising a splicing module Concat and a detection module Detect, wherein the splicing module Concat performs up-sampling and concatenation of the features extracted by the Backbone network Backbone, and the detection module Detect generates the final detection result from the concatenated feature map;
a Neck component Neck, configured to connect the Backbone network Backbone and the Head component Head, and to optimize the features output by the Backbone network Backbone;
a Bottleneck component Bottleneck, configured to reduce the dimension of the feature map for feature extraction, and to restore the original number of channels by dimension raising after feature extraction is completed.
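The module names in this claim follow the usual YOLOv5-style building blocks; the PyTorch sketch below shows commonly used definitions of Conv, Bottleneck, C3 and SPPF for orientation only, with illustrative channel counts and parameters that are not taken from the patent:

```python
import torch
import torch.nn as nn


class Conv(nn.Module):
    """Convolution -> BatchNorm -> SiLU, the basic convolution module."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class Bottleneck(nn.Module):
    """Reduce channels, extract features, then restore the channel count."""
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = Conv(c, c // 2, 1, 1)   # dimension reduction
        self.cv2 = Conv(c // 2, c, 3, 1)   # feature extraction + dimension restore
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y


class C3(nn.Module):
    """Split channels, run Bottlenecks on one branch, concatenate and fuse."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_ = c_out // 2
        self.cv1 = Conv(c_in, c_, 1, 1)
        self.cv2 = Conv(c_in, c_, 1, 1)
        self.m = nn.Sequential(*(Bottleneck(c_) for _ in range(n)))
        self.cv3 = Conv(2 * c_, c_out, 1, 1)

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))


class SPPF(nn.Module):
    """Spatial pyramid pooling (fast): repeated max-pooling, then channel fusion."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_ = c_in // 2
        self.cv1 = Conv(c_in, c_, 1, 1)
        self.cv2 = Conv(4 * c_, c_out, 1, 1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat((x, y1, y2, y3), dim=1))
```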
8. The deep-learning-based visual mechanical arm material grabbing control method according to any one of claims 1 to 7, wherein the following steps are performed before step 10:
a camera calibration step:
mounting the camera on the end effector of the mechanical arm;
collecting a series of calibration-plate images at different attitudes and positions for calibrating the camera parameters;
calculating the camera intrinsic parameters and the camera extrinsic parameters using a calibration algorithm;
a hand-eye calibration step:
performing hand-eye calibration according to the camera extrinsic parameters corresponding to each image in the camera calibration result, combined with the end pose of the mechanical arm at the moment each image was acquired;
calculating the transformation between the hand and eye coordinate systems using a hand-eye calibration algorithm, the transformation comprising a rotation matrix and a translation vector;
converting the ranging information and the attitude information of the target material from the mechanical arm coordinate system to the hand-eye coordinate system, so that the mechanical arm is controlled to perform the grabbing operation according to the ranging information and the attitude information of the target material in the hand-eye coordinate system.
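A hedged OpenCV sketch of the two calibration steps, assuming a chessboard calibration plate and an eye-in-hand setup; the board size, square size, and the source of the end-effector poses are assumptions, not details from the patent:

```python
import cv2
import numpy as np


def calibrate_camera(images, board_size=(9, 6), square_size=0.025):
    """Intrinsic calibration from chessboard images (board and square size are assumed)."""
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square_size
    obj_pts, img_pts, image_size = [], [], None
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        image_size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
    # Returns the intrinsic matrix, distortion coefficients and per-image extrinsics
    _, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, image_size, None, None)
    return K, dist, rvecs, tvecs


def calibrate_hand_eye(R_gripper2base, t_gripper2base, rvecs, tvecs):
    """Hand-eye calibration: camera-to-gripper rotation matrix and translation vector."""
    R_target2cam = [cv2.Rodrigues(r)[0] for r in rvecs]  # per-image extrinsic rotations
    R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base, R_target2cam, list(tvecs))
    return R_cam2gripper, t_cam2gripper
```

Here cv2.calibrateHandEye returns the rotation matrix and translation vector between the camera and the gripper, which together form the hand-eye transformation referred to in the claim; the gripper-to-base poses must be read from the robot controller at the moment each calibration image is captured.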
9. Visual mechanical arm material grabbing control system based on deep learning, which is characterized by comprising:
a 3D camera or binocular camera for capturing camera images;
the laser radar is used for collecting photoelectric data;
the mechanical arm is used for executing material grabbing operation;
The system further comprises a memory, a processor, and a material grabbing control program stored in the memory and executable on the processor, wherein the material grabbing control program, when executed by the processor, implements the steps of the deep-learning-based visual mechanical arm material grabbing control method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number: CN202410412623.3A; Priority Date: 2024-04-08; Filing Date: 2024-04-08; Title: Visual mechanical arm material grabbing control method and system based on deep learning

Publications (1)

Publication Number: CN118003340A; Publication Date: 2024-05-10

Family

ID=90950976

Country Status (1)

Country Link
CN (1) CN118003340A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102165880A (en) * 2011-01-19 2011-08-31 南京农业大学 Automatic-navigation crawler-type mobile fruit picking robot and fruit picking method
JP2014050936A (en) * 2012-09-10 2014-03-20 Applied Vision Systems Corp Handling system, handling method, and program
CN111795696A (en) * 2020-06-28 2020-10-20 中铁第一勘察设计院集团有限公司 Initial structure optimization method of multi-inertial navigation redundancy system based on zero-speed correction
CN114952809A (en) * 2022-06-24 2022-08-30 中国科学院宁波材料技术与工程研究所 Workpiece identification and pose detection method and system and grabbing control method of mechanical arm
CN116117807A (en) * 2022-12-30 2023-05-16 昆明理工大学 Chilli picking robot and control method
CN117325170A (en) * 2023-10-27 2024-01-02 武汉工程大学 Method for grabbing hard disk rack based on depth vision guiding mechanical arm

Similar Documents

Publication Publication Date Title
US11117262B2 (en) Intelligent robots
CN112476434B (en) Visual 3D pick-and-place method and system based on cooperative robot
CN109255813B (en) Man-machine cooperation oriented hand-held object pose real-time detection method
Zhu et al. Online camera-lidar calibration with sensor semantic information
Jiang et al. An overview of hand-eye calibration
CN109927036A (en) A kind of method and system of 3D vision guidance manipulator crawl
CN112652016B (en) Point cloud prediction model generation method, pose estimation method and pose estimation device
JP5839971B2 (en) Information processing apparatus, information processing method, and program
CN107741234A (en) The offline map structuring and localization method of a kind of view-based access control model
Xu et al. Online intelligent calibration of cameras and lidars for autonomous driving systems
CN109872355B (en) Shortest distance acquisition method and device based on depth camera
Cipolla et al. Visually guided grasping in unstructured environments
Smith et al. Eye-in-hand robotic tasks in uncalibrated environments
CN111780715A (en) Visual ranging method
CN116249607A (en) Method and device for robotically gripping three-dimensional objects
JP2730457B2 (en) Three-dimensional position and posture recognition method based on vision and three-dimensional position and posture recognition device based on vision
CN113778096B (en) Positioning and model building method and system for indoor robot
Csaba et al. Differences between Kinect and structured lighting sensor in robot navigation
Grudziński et al. Stereovision tracking system for monitoring loader crane tip position
CN114067210A (en) Mobile robot intelligent grabbing method based on monocular vision guidance
JP2778430B2 (en) Three-dimensional position and posture recognition method based on vision and three-dimensional position and posture recognition device based on vision
CN118003340A (en) Visual mechanical arm material grabbing control method and system based on deep learning
Nashman et al. Unique sensor fusion system for coordinate-measuring machine tasks
CN115972192A (en) 3D computer vision system with variable spatial resolution
KR102452315B1 (en) Apparatus and method of robot control through vision recognition using deep learning and marker

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination