CN113119073A - Mechanical arm system based on computer vision and machine learning and oriented to 3C assembly scene

Info

Publication number
CN113119073A
Authority
CN
China
Prior art keywords
mechanical arm
target
learning
computer vision
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110411690.XA
Other languages
Chinese (zh)
Inventor
李智军
徐梁睿
李琴剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202110411690.XA priority Critical patent/CN113119073A/en
Publication of CN113119073A publication Critical patent/CN113119073A/en
Pending legal-status Critical Current

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/0081Programme-controlled manipulators with master teach-in means
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J15/00Gripping heads and other end effectors
    • B25J15/08Gripping heads and other end effectors having finger members
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J18/00Arms
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/02Sensing devices
    • B25J19/021Optical sensing devices
    • B25J19/023Optical sensing devices including video camera means
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Multimedia (AREA)
  • Manipulator (AREA)

Abstract

The invention provides a mechanical arm system for 3C assembly scenes based on computer vision and machine learning, comprising: a multi-degree-of-freedom, high-precision industrial mechanical arm that moves flexibly and can complete a variety of complex 3C assembly tasks; a monocular camera mounted at the front end of the mechanical arm, which uses an ORB tracking algorithm together with an IMU unit to accurately measure and compute the motion trajectory and pose of the front end of the mechanical arm in real time; and a structured-light depth camera at a fixed, globally observable position, which uses a MobileNet-based YoloV4-Lite convolutional neural network to identify and locate the object to be assembled and determine its pose and shape features. The invention addresses the problems of conventional 3C assembly mechanical arms: a low degree of intelligence and a single working capability; heavy dependence on empirical data, low data utilization, and slow learning; and a lack of the ability to form target-oriented strategies together with poor generalization. It therefore has high practical application value.

Description

Mechanical arm system based on computer vision and machine learning and oriented to 3C assembly scene
Technical Field
The invention relates to the field of intelligent control of mechanical arms, in particular to a mechanical arm system for 3C assembly scenes based on computer vision and machine learning, and more specifically to a multi-degree-of-freedom mechanical arm system for complex 3C assembly scenes based on depth vision and machine learning.
Background
A 3C assembly scene is characterized by a complex environment and complex tasks. A 3C automatic production line therefore requires not only mechanical arms of extremely high precision, but also mechanical arms that can complete a variety of complex actions. However, the mechanical arm systems used in current industrial scenarios generally work in a fixed mode according to a built-in program, so different mechanical arms must be purchased for different process flows; in addition, when the production process changes, the built-in program must be rewritten or a new mechanical arm purchased. This consumes a large amount of manpower and material resources and greatly increases the construction, operation, and maintenance costs of a 3C automatic production line.
Under complex environments and tasks, how a robot acquires knowledge through autonomous learning and reasoning, and performs accurate, efficient, and fast high-level learning and decision making, is one of the key problems that current robot intelligence technology urgently needs to solve. A breakthrough here would greatly improve the speed, efficiency, and accuracy of the autonomous motion and flexible operation of industrial robots, and would have a major and far-reaching influence on intelligent manufacturing and daily life. In recent years, with the development of multiple disciplines such as artificial intelligence, robotics, and neuroscience, agent learning and decision-making methods based on computer vision and machine learning have achieved major performance breakthroughs in robot knowledge and skill learning tasks, and these methods are gradually being applied to actual industrial production processes.
Patent document CN112223283A discloses a mechanical arm, a mechanical arm control method, a processing device, and a medium. The mechanical arm comprises a mechanical arm body, a mechanical arm end, a force sensor, at least one distance sensor, and a processing device. The force sensor is mounted at the end of the mechanical arm and detects force information between the mechanical arm and the contact surface; the distance sensor is mounted at the end of the mechanical arm and detects distance information between its own position and the contact surface. The processing device adjusts the position of the mechanical arm end using admittance control according to the force information, and adjusts the attitude of the mechanical arm end using inverse admittance control according to the force and distance information. The distance sensor can detect the distance between the front side of the mechanical arm end, in its direction of advance, and the target point on the contact surface, effectively sensing shape changes in that direction, making the contact state of the mechanical arm end clearer, and allowing the end attitude to be adjusted in time to meet more accurate attitude-change requirements. There is still room for improvement in its structure and technical performance.
Disclosure of Invention
In view of the defects in the prior art, the object of the invention is to provide a mechanical arm system for 3C assembly scenes based on computer vision and machine learning.
The invention provides a mechanical arm system for 3C assembly scenes based on computer vision and machine learning, comprising: a mechanical arm component. The mechanical arm component includes a mechanical arm wrist and a five-finger hand component; the five-finger hand component includes a five-finger shell and five-finger fingertip components. The mechanical arm component is made of 3D-printed ABS resin; the structural parts of the mechanical arm wrist and the five-finger hand are made of 1060 aluminum alloy; the five-finger fingertip components are coated with 3D soft rubber to simulate the pad of a human finger.
Preferably, Maxon motors and drivers are used;
the mechanical arm wrist has two active degrees of freedom: flexion/extension and external/internal rotation;
the five-finger hand component has 4 active degrees of freedom;
the hand component includes: a thumb unit, an index finger unit, and a three-finger unit;
the thumb has two active degrees of freedom (abduction/adduction and flexion/extension), and the index finger and the three-finger unit provide the remaining two flexion/extension degrees of freedom.
Preferably, the system further comprises: a monocular camera and a depth camera;
the monocular camera is mounted at the front end of the mechanical arm, moves with it, and is used to track the trajectory of the mechanical arm;
the monocular camera is configured with 640 × 480 resolution and a 30 frame/s sampling rate;
the depth camera can collect human teaching data and target RGB-D information;
the depth camera is mounted at a fixed position from which it can observe the whole operation table, is used to collect human teaching data and target RGB-D information, and is configured with 1280 × 1080 color resolution, a 30 frame/s color sampling rate, 1280 × 720 active binocular depth resolution, and a 90 frame/s depth sampling rate.
Preferably, the system further comprises: an action learning module;
the action learning module adopts a visual-teaching transfer learning algorithm based on supervised learning;
the action learning module learns videos of a human arm assembling target objects, labeled with target category and operation success/failure; for each category of object appearing in the training videos it determines the location with the highest operation success rate as the optimal operation point, acquires motion trajectory strategy information, and preliminarily generates a motion trajectory strategy for the mechanical arm to operate on the target.
Preferably, the system further comprises: a mechanical arm front-end trajectory tracking module and a trajectory planning module;
the mechanical arm front-end trajectory tracking module uses a monocular camera with 640 × 480 resolution and a 30 frame/s sampling rate to acquire image data of the environment at the front end of the mechanical arm in real time; it uses the ORB algorithm to extract and match feature points with approximately identical feature vectors in different images, computes the Euclidean transformation matrix (a three-dimensional rotation matrix and a translation vector) describing the pose change of the monocular camera between adjacent frames, combines this with data acquired by an IMU unit to accurately measure the motion trajectory and pose of the front end of the mechanical arm in real time, and feeds the data stream into the trajectory planning module.
Preferably, the system further comprises: a target identification and positioning module;
the target identification and positioning module uses a structured-light depth camera at a fixed position from which the whole scene can be observed to collect target color and depth data; in the color image it identifies the target to be assembled with a trained and optimized MobileNet-based YoloV4-Lite convolutional neural network, applies image segmentation to extract the target's contour in the color image, determines the target position in combination with the identified target category, segments the pixel region of the color image where the mechanical arm front-end operation point lies using the optimal-operation-point experience obtained by the action learning module, and couples the pixel coordinates of this region with the corresponding depth information, using the extrinsic parameters of the color and depth cameras, to obtain the three-dimensional spatial coordinates of the position to be operated on.
Preferably, for the input target category and the three-dimensional coordinates of the operation point, the trajectory planning module outputs point-to-point control and continuous trajectory control strategies to the mechanical arm of the actuating mechanism, based on the motion trajectory strategy model established by the action learning module and its own iteration, combined with a D-H kinematic model of the mechanical arm structure.
Based on the Oja reinforcement learning rule, a recurrent neural network model is established to learn the historical motion trajectory information collected by the mechanical arm front-end trajectory tracking module and to continuously optimize the motion trajectory strategy computation model.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention realizes the intelligent optimization and upgrading of a high-precision 3C assembly production line; the whole system has high precision and strong learning and generalization ability, and its functions can be expanded through learning;
2. the invention can efficiently complete various 3C assembly tasks in a complex environment;
3. the invention has reasonable structure and convenient use and can overcome the defects of the prior art.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a schematic structural diagram of a robot system based on computer vision and machine learning in a 3C assembly scene.
Fig. 2 is a schematic view of the multi-degree-of-freedom 3C assembly robot arm of the present invention.
FIG. 3 is a schematic diagram of a depthwise separable convolution in accordance with the present invention.
Fig. 4 is a schematic diagram of the network structure of YOLOV4 in the present invention.
Fig. 5 is a schematic view of a hand-eye calibration process in the present invention.
Fig. 6 is a schematic diagram of the structure of motion learning in the present invention.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the concept of the invention, all of which fall within the scope of the present invention.
As shown in figs. 1 to 6, the invention aims to solve the problems that existing mechanical arm systems have poor generalization ability, that different mechanical arms must be purchased for different process flows, and that rewriting the built-in program or purchasing a new mechanical arm whenever the production process changes consumes large amounts of manpower and material resources and greatly increases the construction, operation, and maintenance costs of a 3C automation line; to this end, it provides a multi-degree-of-freedom mechanical arm system for complex 3C assembly scenes based on depth vision and machine learning.
The hardware platform of the invention consists of a multi-degree-of-freedom high-precision mechanical arm, a monocular camera, and a depth camera.
The mechanical arm adopted by the invention is divided into a mechanical arm wrist and a five-finger hand. The shells of the mechanical arm and of the five-finger hand are made of 3D-printed ABS resin, which effectively reduces weight and preserves appearance; the structural parts of the wrist and of the five-finger hand are made of 1060 aluminum alloy, providing sufficient strength at a reasonable weight; the five fingertips are coated with 3D soft rubber to simulate the pads of human fingers and improve grip. For actuation, Maxon motors and drivers are used: the wrist has two active degrees of freedom (flexion/extension and external/internal rotation), and the five-finger hand has 4 active degrees of freedom, in which the thumb has two (abduction/adduction and flexion/extension) and the index finger and the remaining three fingers provide the other two flexion/extension degrees of freedom.
The monocular camera adopted by the invention is mounted at the front end of the mechanical arm and is used to track its trajectory; it is configured with 640 × 480 resolution and a 30 frame/s sampling rate. The depth camera is mounted at a fixed position from which it can observe the whole operation table; it collects human teaching data and target RGB-D information, and is configured with 1280 × 1080 color resolution, a 30 frame/s color sampling rate, 1280 × 720 active binocular depth resolution, and a 90 frame/s depth sampling rate.
The mechanical arm system for 3C assembly scenes based on computer vision and machine learning comprises an action learning module, a mechanical arm front-end trajectory tracking module, a target identification and positioning module, and a trajectory planning module.
The action learning module adopts a visual-teaching transfer learning algorithm based on supervised learning; it learns videos of a human arm assembling target objects, labeled with target category and operation success/failure, determines for each category of object appearing in the training videos the location with the highest operation success rate as the optimal operation point, and preliminarily generates a motion trajectory strategy for the mechanical arm to operate on the target.
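As an illustration of the optimal-operation-point selection described above, the following sketch aggregates hypothetical labelled demonstration records of the form (category, operation point, success flag) and picks, per object category, the candidate point with the highest success rate; the record format, the rounding-based grouping of nearby points, and the function name are assumptions made for illustration, not part of the patent.

```python
from collections import defaultdict

import numpy as np


def best_operation_points(demos):
    """Pick, for each object category, the candidate operation point with the highest success rate.

    `demos` is a hypothetical list of (category, point_xyz, success) records
    extracted from the labelled teaching videos.
    """
    stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # category -> point -> [successes, trials]
    for category, point, success in demos:
        key = tuple(np.round(point, 2))  # quantise nearby demonstration points into one candidate site
        stats[category][key][0] += int(success)
        stats[category][key][1] += 1
    return {
        category: max(points.items(), key=lambda kv: kv[1][0] / kv[1][1])[0]
        for category, points in stats.items()
    }
```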
The mechanical arm front-end trajectory tracking module uses a monocular camera with 640 × 480 resolution and a 30 frame/s sampling rate to acquire image data of the environment at the front end of the mechanical arm in real time; it uses the ORB algorithm to extract and match feature points with approximately identical feature vectors in different images, computes the Euclidean transformation matrix (a three-dimensional rotation matrix and a translation vector) describing the pose change of the camera between adjacent frames, combines this with data acquired by an IMU unit to accurately measure the motion trajectory and pose of the front end of the mechanical arm in real time, and feeds the data stream into the trajectory planning module.
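A minimal sketch of the frame-to-frame pose estimation step, assuming OpenCV and placeholder camera intrinsics: it extracts and matches ORB features between two adjacent frames and recovers the rotation and (up-to-scale) translation from the essential matrix. Metric scale and drift correction would come from the IMU fusion, which is not shown.

```python
import cv2
import numpy as np

# Assumed pinhole intrinsics for the 640 x 480 front-end camera (placeholder values).
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def relative_pose(prev_gray, curr_gray):
    """Estimate rotation R and unit-scale translation t between two adjacent greyscale frames."""
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # translation is up to scale; metric scale comes from the IMU fusion
```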
The target identification and positioning module uses a structured-light depth camera at a fixed position from which the whole scene can be observed to collect target color and depth data; in the color image it identifies the target to be assembled with a trained and optimized MobileNet-based YoloV4-Lite convolutional neural network, applies image segmentation to extract the target's contour in the color image, determines the target position in combination with the identified target category, segments the pixel region of the color image where the mechanical arm front-end operation point lies using the optimal-operation-point experience obtained by the action learning module, and couples the pixel coordinates of this region with the corresponding depth information, using the extrinsic parameters of the color and depth cameras, to obtain the three-dimensional spatial coordinates of the operation point.
For the input target category and the three-dimensional coordinates of the operation point, the trajectory planning module outputs point-to-point control and continuous trajectory control strategies to the mechanical arm of the actuating mechanism, based on the motion trajectory strategy model established by the action learning module and its own iteration: point-to-point control planning of the key operation points of the mechanical arm is completed with a D-H inverse kinematics model; a recurrent neural network model is established using an emotion-regulated Oja reinforcement learning rule to learn the historical assembly motion trajectory information of the mechanical arm, continuously optimize the motion trajectory strategy, and complete the trajectory planning for continuous control of the mechanical arm.
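The point-to-point planning relies on a D-H kinematic model of the arm. The sketch below shows the standard D-H link transform and the forward kinematic chain on which such a model (and its inverse) is built; the D-H parameter table values are placeholders, not the actual parameters of the arm.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform between consecutive links for standard D-H parameters."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles, dh_table):
    """Chain the per-joint transforms; dh_table rows are (d, a, alpha) placeholder values."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(joint_angles, dh_table):
        T = T @ dh_transform(theta, d, a, alpha)
    return T  # end-effector pose in the arm base frame
```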
The multi-degree-of-freedom 3C assembly mechanical arm skill learning and intelligent control system realizes the intelligent optimization and upgrading of a high-precision 3C assembly production line; the whole system has high precision and strong learning and generalization ability, can be expanded through learning, and can efficiently complete a variety of 3C assembly tasks in a complex environment.
Specifically, one embodiment provides a mechanical arm system for 3C assembly scenes based on computer vision and machine learning.
the hardware platform is a multi-degree-of-freedom high-precision mechanical arm, a monocular camera and a depth camera; the skill transfer and intelligent control system comprises an action learning module, a mechanical arm front end track tracking module, a target identification and positioning module and a track planning module.
The multi-degree-of-freedom 3C assembly mechanical arm comprises a mechanical arm wrist and a five-finger hand. The shells of the mechanical arm and of the five-finger hand are made of 3D-printed ABS resin, which effectively reduces weight and preserves appearance; the structural parts of the wrist and of the five-finger hand are made of 1060 aluminum alloy, providing sufficient strength at a reasonable weight; the five fingertips are coated with 3D soft rubber to simulate the pads of human fingers and improve grip. For actuation, Maxon motors and drivers are used: the wrist has two active degrees of freedom (flexion/extension and external/internal rotation), and the five-finger hand has 4 active degrees of freedom, in which the thumb has two (abduction/adduction and flexion/extension) and the index finger and the remaining three fingers provide the other two flexion/extension degrees of freedom. The mechanical arm is equipped with a monocular camera and an IMU for front-end trajectory tracking.
The action learning module uses a transfer learning algorithm based on supervised learning and visual teaching to classify the shape feature information of different categories of targets, learns the corresponding video data of human arm assembly motion trajectories, and completes the planning of end-effector operation actions for different categories of targets according to the constructed model.
The invention focuses on observation imitation learning, in which the demonstrator provides only visual or state information, and the imitation learning process is realized without action supervision signals. Such a method accounts, to some extent, for domain adaptation between the demonstrator and the robot.
For skill imitation and enhancement, the observation imitation learning method is used to generalize and enhance the operation skills of the dexterous robot. Observation imitation learning is the leading development direction of agent teaching learning; current teaching learning trains the control strategy of an agent (the learner) by machine learning methods such as supervised learning and reinforcement learning, given a certain number of expert (demonstrator) teaching trajectories, so that the agent's behaviour resembles the expert's as closely as possible.
The target object identification and positioning module needs to perform fine-grained image detection; a detection framework based on a deep convolutional neural network, with good real-time performance and high accuracy, is designed and built, and the category and position of the target object are determined by the convolutional neural network. A detection dataset of target objects is collected and manually labeled, split into training, validation, and test sets, and preprocessed. A deep convolutional neural network with an appropriate structure and a suitable loss function is designed and trained on the dataset. The hyper-parameters and network structure are continuously tuned according to the training results, with the goal of obtaining a deep convolutional neural network with good real-time performance and high identification accuracy. After the position and category of an object are recognized, this information is recorded. The improved MobileNet-based YoloV4-Lite is used to identify the cylinders and cuboids in this project; MobileNet is characterized by depthwise separable convolution, in which channel-wise convolution followed by pointwise linear convolution greatly reduces the number of parameters while preserving the feature extraction effect.
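A sketch of the depthwise separable convolution block that MobileNet uses (PyTorch; the BatchNorm/ReLU6 layout is an assumed standard arrangement). The closing comment illustrates the parameter reduction for one example layer size.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution (MobileNet building block)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

# Example of the parameter saving: a standard 3x3 convolution mapping 256 to 512 channels
# uses 3*3*256*512 = 1,179,648 weights, while the separable version uses
# 3*3*256 + 256*512 = 133,376, roughly a 9x reduction.
```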
The original YOLOv4 network takes an RGB image of height 416 and width 416 as input and extracts features through the CSPDarknet-53 backbone network. Three feature layers of different sizes are used to achieve multi-scale feature fusion, regressing the offset of the target position relative to a default rectangular box (default box) while outputting a classification confidence. The original YOLOv4 network structure is complex and its prediction speed is slow, so the network structure must be adjusted to meet the detection-speed requirements of hand-eye coordination. Trading off accuracy against speed, the MobileNet network is used as the backbone to extract features, while the input picture resolution is kept unchanged and the same number of feature layers is used for target detection.
The network is trained with a transfer learning approach, that is, the whole training process is divided into two stages. The first stage aims to train a model with strong image feature extraction capability on a large dataset; this system trains on the MSCOCO object detection dataset to obtain a good initial model. In the second stage, on the basis of the initial model, a self-made dataset of labeled objects of various categories is used for fine-tuning to obtain a final classification model with strong generalization capability. Because the viewing angle of the image information obtained by the depth camera varies, data augmentation with multiple CV processing methods is used to identify objects accurately: the brightness, distortion, and contrast of the original data are varied to meet detection requirements under different illumination conditions; the pictures are rotated to meet detection requirements at different viewing angles; and the mosaic data augmentation method is also used, splicing different pictures together to increase environment complexity and enhance the detection of small targets.
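A simplified sketch of the augmentations mentioned above, using OpenCV: brightness/contrast jitter, a small random rotation, and a four-image mosaic. The parameter ranges are assumptions, and bounding-box adjustment (required for real detection training) is omitted.

```python
import random

import cv2
import numpy as np


def photometric_and_rotation(img):
    """Brightness/contrast jitter plus a small random rotation (parameter ranges are assumed)."""
    alpha = random.uniform(0.7, 1.3)   # contrast gain
    beta = random.uniform(-30, 30)     # brightness offset
    img = cv2.convertScaleAbs(img, alpha=alpha, beta=beta)
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), random.uniform(-15, 15), 1.0)
    return cv2.warpAffine(img, M, (w, h))


def mosaic(imgs, size=416):
    """Tile four images into one canvas, a simplified mosaic augmentation (box handling omitted)."""
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    half = size // 2
    for img, (r, c) in zip(imgs, [(0, 0), (0, half), (half, 0), (half, half)]):
        canvas[r:r + half, c:c + half] = cv2.resize(img, (half, half))
    return canvas
```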
For the robot to accurately grasp an object on the worktable, the position of the object on the worktable, that is, its coordinates in the robot base coordinate system, must be obtained. The invention uses a RealSense D435i to obtain RGB and depth images of the working environment, and converts the pixel coordinates and depth value of an object in the image into a description in the base coordinate system through hand-eye calibration. Hand-eye calibration is an indispensable step for a robot to accurately grasp a target object; its purpose is to map pixel information from the vision system into the robot coordinate system so that the robot can then grasp the object.
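A minimal sketch of the pixel-to-base-frame conversion that hand-eye calibration enables, assuming pinhole intrinsics and a calibrated camera-to-base homogeneous transform; the numeric values are placeholders, not calibration results.

```python
import numpy as np

# Assumed depth-camera intrinsics and a hand-eye calibrated camera-to-base transform (placeholders).
fx, fy, cx, cy = 615.0, 615.0, 640.0, 360.0
T_base_cam = np.eye(4)  # would be filled in with the hand-eye calibration result

def pixel_to_base(u, v, depth_m):
    """Back-project an image pixel with its depth (in metres) into the robot base frame."""
    # pixel -> camera frame via the pinhole model
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    p_cam = np.array([x, y, depth_m, 1.0])
    # camera frame -> base frame via the calibrated extrinsics
    return (T_base_cam @ p_cam)[:3]
```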
The trajectory planning module trains a recurrent neural network to complete trajectory planning. The network contains 400 neurons; each element of the recurrent weight matrix W is initialized to 0 with probability 1-p, and the remaining elements are drawn from a Gaussian distribution (mean 0, variance g/(pN)), where g is the spectral radius scale of the recurrent synaptic weights and p is the initialization probability of the recurrent synaptic weights. The initial input weights U and output weights V are drawn from uniform distributions, and the initial membrane potential of each neuron is assumed to follow a uniform distribution on [0, 0.1]. With target point y' and actually reached point y, the objective function consists of the Euclidean distance error and an energy term on the activations.
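A sketch of the recurrent network initialization and objective described above (NumPy); the connection probability p, the gain g, the input/output dimensions, and the energy-term weight are assumed values that the text does not specify.

```python
import numpy as np

N = 400        # number of recurrent neurons (from the text)
p = 0.1        # initialization probability of recurrent weights (assumed value)
g = 1.5        # spectral-radius scale of the recurrent weights (assumed value)
rng = np.random.default_rng(0)

# Recurrent weights: zero with probability 1-p, otherwise Gaussian with mean 0 and variance g/(p*N).
mask = rng.random((N, N)) < p
W = mask * rng.normal(0.0, np.sqrt(g / (p * N)), size=(N, N))

# Input/output weights drawn uniformly; initial membrane potentials uniform on [0, 0.1].
n_in, n_out = 6, 3                      # assumed input/output dimensionalities
U = rng.uniform(-1.0, 1.0, (N, n_in))
V = rng.uniform(-1.0, 1.0, (n_out, N))
x0 = rng.uniform(0.0, 0.1, N)

def objective(y, y_target, rates, lam=1e-3):
    """Squared Euclidean tracking error plus an energy penalty on the activations (lam is assumed)."""
    return np.linalg.norm(y - y_target) ** 2 + lam * np.sum(rates ** 2)
```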
In the description of the present application, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present application and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present application.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (7)

1. A 3C assembly scene oriented mechanical arm system based on computer vision and machine learning, characterized by comprising: a robot arm member;
the robot arm part includes: a wrist of the mechanical arm and a five-finger hand part;
the five-finger hand member includes: a five-finger shell, a five-finger fingertip member;
the mechanical arm component is made of 3D printing ABS resin materials;
the wrist and the five-finger hand parts of the mechanical arm are made of 1060 aluminum alloy;
the five-finger fingertip component is coated with a 3D soft rubber film.
2. The 3C assembly scene oriented mechanical arm system based on computer vision and machine learning of claim 1, wherein Maxon motors and drivers are used;
the wrist of the mechanical arm has two active degrees of freedom of flexion and extension and outward rotation/inward inversion;
the five-finger hand component has 4 active degrees of freedom;
the hand component includes: a thumb unit, an index finger unit, and a three-finger unit;
the thumb has two active degrees of freedom (abduction/adduction and flexion/extension), and the index finger and the three-finger unit provide the remaining two flexion/extension degrees of freedom.
3. The 3C assembly scene oriented mechanical arm system based on computer vision and machine learning of claim 1, further comprising: a monocular camera and a depth camera;
the monocular camera is positioned at the front end of the mechanical arm and moves along with the mechanical arm, and is used for tracking the track of the mechanical arm;
the depth camera can collect human body teaching data and target RGB-D information.
4. The 3C assembly scene oriented mechanical arm system based on computer vision and machine learning of claim 1, further comprising: an action learning module;
the action learning module adopts a visual teaching transfer learning algorithm based on supervised learning;
the action learning module learns videos of human body arm assembly target objects with target types and operation success/failure labels, determines the highest operation success rate position as an optimal position point for various objects appearing in the training videos, and acquires motion trail strategy information.
5. The 3C assembly scene oriented mechanical arm system based on computer vision and machine learning of claim 1, further comprising: a mechanical arm front-end trajectory tracking module and a trajectory planning module;
the mechanical arm front-end trajectory tracking module uses a monocular camera with 640 × 480 resolution and a 30 frame/s sampling rate to acquire image data of the environment at the front end of the mechanical arm in real time, uses the ORB algorithm to extract and match feature points with approximately identical feature vectors in different images, computes the Euclidean transformation matrix describing the pose change of the monocular camera between adjacent frames, combines this with data acquired by an IMU unit to measure the motion trajectory and pose of the front end of the mechanical arm in real time, and feeds the data stream into the trajectory planning module.
6. The 3C assembly scene oriented mechanical arm system based on computer vision and machine learning of claim 1, further comprising: a target identification and positioning module;
the target identification and positioning module uses a structured-light depth camera at a fixed position from which the whole scene can be observed to collect target color and depth data; in the color image, a trained and optimized MobileNet-based YoloV4-Lite convolutional neural network identifies the target to be assembled, image segmentation is applied to extract the target's contour in the color image, the target pose is determined in combination with the identified target category, the pixel region of the color image where the mechanical arm front-end operation point lies is segmented using the optimal-operation-point experience obtained by the action learning module, and the pixel coordinates of this region are coupled with the corresponding depth information, using the extrinsic parameters of the color and depth cameras, to obtain the three-dimensional spatial coordinates of the target to be operated on.
7. The 3C assembly scene oriented mechanical arm system based on computer vision and machine learning of claim 5, wherein, for the input target category and the three-dimensional coordinates of the operation point, the trajectory planning module outputs point-to-point control and continuous trajectory control strategies to the mechanical arm of the actuating mechanism, based on the motion trajectory strategy model established by the action learning module and its own iteration, combined with a D-H kinematic model of the mechanical arm structure.
CN202110411690.XA 2021-04-16 2021-04-16 Mechanical arm system based on computer vision and machine learning and oriented to 3C assembly scene Pending CN113119073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110411690.XA CN113119073A (en) 2021-04-16 2021-04-16 Mechanical arm system based on computer vision and machine learning and oriented to 3C assembly scene

Publications (1)

Publication Number Publication Date
CN113119073A true CN113119073A (en) 2021-07-16

Family

ID=76776795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110411690.XA Pending CN113119073A (en) 2021-04-16 2021-04-16 Mechanical arm system based on computer vision and machine learning and oriented to 3C assembly scene

Country Status (1)

Country Link
CN (1) CN113119073A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN206690104U (en) * 2017-03-15 2017-12-01 南昌航空大学 A kind of motion mimics bionic mechanical hand
CN108972494A (en) * 2018-06-22 2018-12-11 华南理工大学 A kind of Apery manipulator crawl control system and its data processing method
CN110202583A (en) * 2019-07-09 2019-09-06 华南理工大学 A kind of Apery manipulator control system and its control method based on deep learning
JP6792230B1 (en) * 2019-12-12 2020-11-25 株式会社エクサウィザーズ Information processing equipment, methods and programs
CN112518748A (en) * 2020-11-30 2021-03-19 广东工业大学 Automatic grabbing method and system of vision mechanical arm for moving object

Similar Documents

Publication Publication Date Title
Li et al. Vision-based teleoperation of shadow dexterous hand using end-to-end deep neural network
Tang et al. A framework for manipulating deformable linear objects by coherent point drift
CN108972494B (en) Humanoid manipulator grabbing control system and data processing method thereof
JP5209751B2 (en) Robot drive system, robot drive method, and robot drive program
Lopes et al. Visual learning by imitation with motor representations
Aleotti et al. Grasp recognition in virtual reality for robot pregrasp planning by demonstration
Ribeiro et al. Real-time deep learning approach to visual servo control and grasp detection for autonomous robotic manipulation
JP2022542241A (en) Systems and methods for augmenting visual output from robotic devices
Chella et al. A cognitive framework for imitation learning
CN106023211A (en) Robot image positioning method and system base on deep learning
Hueser et al. Learning of demonstrated grasping skills by stereoscopic tracking of human head configuration
Bohg et al. Task-based grasp adaptation on a humanoid robot
CN112109074A (en) Robot target image capturing method
Vasile et al. Grasp pre-shape selection by synthetic training: Eye-in-hand shared control on the Hannes prosthesis
Mišeikis et al. Transfer learning for unseen robot detection and joint estimation on a multi-objective convolutional neural network
Kadalagere Sampath et al. Review on human‐like robot manipulation using dexterous hands
CN116276998A (en) Arm grabbing method and system based on reinforcement learning and free of hand-eye calibration
CN116852347A (en) State estimation and decision control method for non-cooperative target autonomous grabbing
CN113119073A (en) Mechanical arm system based on computer vision and machine learning and oriented to 3C assembly scene
Infantino et al. A cognitive architecture for robotic hand posture learning
CN115194774A (en) Binocular vision-based control method for double-mechanical-arm gripping system
CN114882113A (en) Five-finger mechanical dexterous hand grabbing and transferring method based on shape correspondence of similar objects
Hüser et al. Visual programming by demonstration of grasping skills in the context of a mobile service robot using 1D-topology based self-organizing-maps
Ito et al. Visualization of focal cues for visuomotor coordination by gradient-based methods: A recurrent neural network shifts the attention depending on task requirements
Zhu Robot Learning Assembly Tasks from Human Demonstrations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210716