CN115965690A - Binocular vision-based non-contact excavator operation posture learning and estimating method - Google Patents

Binocular vision-based non-contact excavator operation posture learning and estimating method

Info

Publication number
CN115965690A
Authority
CN
China
Prior art keywords
image
angle
shovel arm
bucket
angle1
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211722477.1A
Other languages
Chinese (zh)
Inventor
孙辉 (Sun Hui)
杨娇娇 (Yang Jiaojiao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Suzhou Automotive Research Institute of Tsinghua University
Original Assignee
Tsinghua University
Suzhou Automotive Research Institute of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Suzhou Automotive Research Institute of Tsinghua University
Priority to CN202211722477.1A
Publication of CN115965690A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an end-to-end excavator operation attitude estimation method using deep learning, which achieves non-contact measurement of the attitude of the shovel arm and bucket with a binocular camera and a deep learning algorithm. Compared with contact-type measurement using an angle sensor and an IMU (inertial measurement unit), the method greatly reduces system cost and avoids the problem of sensors being easily damaged during excavator use. Compared with conventional methods that estimate angles from feature points alone, the proposed one-end-to-multi-end cascade enhances the stability of the neural network, and the segmented cascade structure gives the network a degree of interpretability.

Description

Binocular vision-based non-contact excavator operation posture learning and estimating method
Technical Field
The invention relates to the field of unmanned engineering machinery, in particular to a binocular vision-based non-contact type excavator operation posture learning and estimating method.
Background
With the increasing maturity of autonomous driving technology, well-known foreign construction-machinery companies such as Caterpillar and Komatsu have long been developing unmanned construction machinery; Komatsu's autonomous mining dump truck dispenses with the cab entirely and is already in practical use in Australia and Chile.
In automating excavator operation, detecting the attitude of the shovel arm and bucket is an indispensable part of closed-loop autonomous work. Angle measurement or estimation is currently performed mainly with angle sensors and IMUs (inertial measurement units). However, the excavator's working environment is harsh: the conventional direct-measurement approach cannot readily be applied to the bucket and is easily damaged. In addition, sensor installation and calibration are complex and costly.
Non-contact measurement techniques are therefore urgently needed. A machine-vision scheme based on a monocular camera is a common choice: the camera detects key points, and the angles are then back-computed by combining the key points with a calibration relation. But a monocular camera provides no scale information, effective constraints on the feature points are hard to find, and stability and reliability are difficult to achieve.
A binocular camera observes scale directly, and combining it with the excavator's own known physical dimensions as constraints makes this a feasible scheme.
Disclosure of Invention
The invention aims to: address the problems that the direct sensors (angle sensor/IMU) used in automated measurement of excavator operation attitude are expensive and easily damaged. It provides a non-contact excavator operation attitude estimation method based on a binocular camera and deep learning, which greatly reduces system cost and overcomes the short service life in harsh environments.
Addressing the instability and lack of interpretability of conventional methods that estimate attitude angles from a camera and feature points, the invention proposes a one-end-to-multi-end cascaded deep learning network. It makes full use of image information, binocular depth information and the excavator's physical-model information, integrating all three into the deep learning network; the cascade enhances the stability of the neural network, and the segmented cascade structure gives the network a degree of interpretability.
Addressing the complex pipelines of most machine-vision methods, the invention provides an end-to-end method that simplifies the overall workflow.
The technical scheme of the invention is as follows:
the method is characterized in that a wide-angle binocular camera mounted above the cab is used, the left and right cameras each acquire an image containing the shovel arm and bucket, and three angle values are obtained through a series of steps: an included angle1 between the bucket and a first shovel arm, an included angle2 between the first shovel arm and a second shovel arm, and an included angle3 between the second shovel arm and a third shovel arm; the processing comprises the following steps:
s01, defining a starting point as a point on the bucket, denoted P0, defining another 9 points as the joint points about which the shovel arm and bucket rotate, and defining the angles to be solved as angle1, angle2 and angle3;
s02, designing a one-end-to-multi-end cascaded deep learning network, wherein one end is the image and the other end splits into 3 branches: branch 1 predicts the rotation key points, branch 2 predicts the three angles, and branch 3 predicts the 3D positions; the 3 branches are progressive and mutually constraining, and the annotated image, the angle-sensor readings and the depth image respectively provide ground truth for the 3 branches, used to construct a loss that enables back-propagation;
s03, in the training stage, first taking the image signal from the left image and constructing a feature-point detection network comprising convolutional layers and fully-connected layers to predict the positions of 10 key feature points in the image, denoted P0' to P9', and comparing these 10 feature points with the annotated feature points P0 to P9 to produce a comparison error Ep; at this point the feature points could be mapped onto a projection plane by projective transformation to compute the attitude angles of the shovel arm and bucket directly; then combining the positions P0' to P9' with the binocular-measured depths at these feature points to form P0d' to P9d';
s04, after P0d' to P9d' are obtained, taking them as input and predicting the shovel-arm and bucket attitude angles angle1' to angle3' as output, adding 2 or more hidden layers to construct a second neural network, and comparing angle1' to angle3' with the ground-truth values angle1 to angle3 from the angle sensor to produce a comparison error Ea;
s05, after angle1' to angle3' are obtained, converting the angles into position information according to the real 3D model of the shovel arm and bucket; at this point, fusing the left and right camera images to compute a depth map, and comparing the depth-map information with the computed 3D position information to produce a comparison error Ed;
s06, in the inference stage, obtaining the predicted angles angle1' to angle3' from the left and right images using the trained model M.
Preferably, the loss in S02 is constructed as Loss = α·Ep + β·Ea + γ·Ed, where α, β and γ are the weights of the respective losses, determined experimentally.
Preferably, in S05 the left and right camera images are fused to compute the depth map; suitable methods include BM and SGM.
The invention has the advantages that:
1. The end-to-end excavator operation attitude estimation method using deep learning achieves non-contact measurement of the shovel-arm and bucket attitude with a binocular camera and a deep learning algorithm. Compared with contact-type measurement using an angle sensor and an IMU (inertial measurement unit), it greatly reduces system cost and avoids sensors being easily damaged during excavator use.
2. Compared with conventional methods that estimate angles from feature points alone, the proposed one-end-to-multi-end cascade enhances the stability of the neural network, and the segmented cascade structure gives the network a degree of interpretability.
Drawings
The invention is further described below with reference to the following figures and examples:
the accompanying drawings are included to provide a further understanding of the invention, are incorporated in and constitute a part of this specification, and together with the description, serve to explain the invention and not to limit the invention, in which:
FIG. 1 is an illustration of hardware installation positions, feature points and angles of a binocular vision-based non-contact excavator work attitude learning and estimation method;
FIG. 2 is the neural-network model construction process of the binocular vision-based non-contact excavator operation posture learning and estimation method.
Detailed Description
The invention provides a binocular vision-based non-contact excavator operation attitude learning and estimation method. As shown in FIG. 1, a wide-angle binocular camera mounted above the cab is used; the left and right cameras each acquire an image containing the shovel arm and bucket, and the three angle values angle1, angle2 and angle3 are obtained through a series of steps. The processing flow comprises the following steps:
And S01, defining a starting point as a point on the bucket, denoted P0, defining another 9 points as the joint points about which the shovel arm and bucket rotate, and defining the angles to be solved as angle1, angle2 and angle3, wherein each angle can be determined jointly by several points to improve robustness.
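Each joint angle defined in S01 can in principle be recovered from three adjacent key points by elementary vector geometry, which is why several points can jointly determine one angle. A minimal illustrative sketch (not the patented network; the function and point names here are our own):

```python
import numpy as np

def joint_angle(p_prev, p_joint, p_next):
    """Angle in degrees at p_joint, formed by the links
    p_joint->p_prev and p_joint->p_next."""
    v1 = np.asarray(p_prev, dtype=float) - np.asarray(p_joint, dtype=float)
    v2 = np.asarray(p_next, dtype=float) - np.asarray(p_joint, dtype=float)
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against floating-point drift outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))

# Links along the x- and y-axes meet at a right angle
a = joint_angle([1, 0, 0], [0, 0, 0], [0, 1, 0])  # 90.0
```

Averaging such estimates over several point triples is one simple way to gain the robustness the step mentions.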
S02, as shown in FIG. 2, an end-to-end deep learning approach greatly reduces algorithmic complexity while exploiting hidden features more fully through learning. To achieve end-to-end training and inference and to increase reliability, a one-end-to-multi-end cascaded deep learning network is designed: one end is the image, and the other end splits into 3 branches, where branch 1 predicts the rotation key points, branch 2 predicts the three angles, and branch 3 predicts the 3D positions. The 3 branches are progressive and mutually constraining, and the annotated image, the angle-sensor readings and the depth image respectively provide ground truth for the 3 branches, used to construct a loss that enables back-propagation.
S03, in the training stage, the image signal is first taken from the left image, and a feature-point detection network comprising convolutional layers and fully-connected layers is constructed to predict the positions of 10 key feature points in the image, denoted P0' to P9'; these 10 feature points are compared with the annotated feature points P0 to P9 to produce a comparison error Ep. At this point the feature points could be mapped onto a projection plane by projective transformation to compute the attitude angles of the shovel arm and bucket directly. However, such direct calculation has too few constraints and is prone to angular jitter. The positions P0' to P9' are therefore combined with the binocular-measured depths at these feature points to form P0d' to P9d'.
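Forming P0d' to P9d' amounts to lifting each detected pixel (u, v) to a 3D point with its measured depth through the pinhole model X = Z·K⁻¹·(u, v, 1)ᵀ. A sketch under assumed intrinsics (the values of fx, fy, cx, cy below are illustrative, not from the patent):

```python
import numpy as np

def lift_to_3d(uv, depth, fx, fy, cx, cy):
    """Back-project pixel coordinates (N, 2) with per-point depths (N,)
    to camera-frame 3D points (N, 3) via the pinhole model."""
    uv = np.asarray(uv, dtype=float)
    z = np.asarray(depth, dtype=float)
    x = (uv[:, 0] - cx) / fx * z
    y = (uv[:, 1] - cy) / fy * z
    return np.stack([x, y, z], axis=1)

# A detection at the principal point with 5 m depth lifts to (0, 0, 5)
pts = lift_to_3d([[320.0, 240.0]], [5.0], fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```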
S04, after P0d' to P9d' are obtained, they are taken as input and the shovel-arm and bucket attitude angles angle1' to angle3' are predicted as output; 2 or more hidden layers are added to construct a second neural network. Meanwhile, each angle is determined by several feature points, and the neural network automatically synthesises their contributions. The values angle1' to angle3' are compared with the ground-truth values angle1 to angle3 from the angle sensor to produce a comparison error Ea.
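The second network in S04 is in effect a small multilayer perceptron mapping the 10 lifted points (30 scalars) to three angles. The patent does not specify layer sizes or training details; the forward-pass sketch below uses randomly initialised weights purely to show the data flow:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, layer_sizes=(30, 64, 64, 3)):
    """Forward pass of a fully-connected network with ReLU hidden layers.
    Weights are random here; in the patent they would be trained against
    the angle-sensor ground truth via the error Ea."""
    h = np.asarray(x, dtype=float)
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        w = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        h = h @ w + b
        if i < len(layer_sizes) - 2:  # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
    return h  # three predicted angles angle1' to angle3'

angles = mlp_forward(np.zeros(30))  # 10 points x (x, y, z), flattened
```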
S05, after angle1' to angle3' are obtained, the angles can be converted into position information according to the real 3D model of the shovel arm and bucket; this step naturally fuses the 3D physical model with the deep learning network, providing a strong rule-based constraint. At this point, the left and right camera images are fused to compute a depth map; typical methods include BM (block matching) and SGM (semi-global matching). The depth-map information is compared with the computed 3D position information to produce a comparison error Ed.
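BM and SGM both search, per pixel, for the disparity d that best matches a left-image patch against a shifted right-image patch; depth then follows from Z = f·b/d. A toy one-dimensional block-matching sketch on synthetic rows (real implementations aggregate costs over full images; the focal length and baseline below are assumed values):

```python
import numpy as np

def match_disparity(left_row, right_row, x, half_win=2, max_disp=8):
    """Disparity at column x of the left row, found by minimising the
    sum of absolute differences over a small window (naive block matching)."""
    left_patch = left_row[x - half_win : x + half_win + 1]
    costs = []
    for d in range(max_disp + 1):
        right_patch = right_row[x - d - half_win : x - d + half_win + 1]
        costs.append(np.abs(left_patch - right_patch).sum())
    return int(np.argmin(costs))

# Synthetic pair: the right row is the left row shifted 3 px to the left
left = np.zeros(32)
left[14:17] = 1.0
right = np.roll(left, -3)
d = match_disparity(left, right, x=15)

fx, baseline = 600.0, 0.12       # assumed focal length (px) and baseline (m)
depth = fx * baseline / d        # Z = f * b / d (d must be > 0)
```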
S06, the loss is constructed as Loss = α·Ep + β·Ea + γ·Ed, where α, β and γ are the weights of the respective losses, determined experimentally.
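The composite loss is a plain weighted sum, so in training code it reduces to one line; the branch errors Ep, Ea and Ed would come from the three network heads. A sketch with placeholder weights (to be tuned experimentally, as the patent states):

```python
def total_loss(ep, ea, ed, alpha=1.0, beta=1.0, gamma=1.0):
    """Loss = alpha*Ep + beta*Ea + gamma*Ed: weighted sum of the keypoint
    error Ep, angle error Ea and 3D-position error Ed."""
    return alpha * ep + beta * ea + gamma * ed

# Example branch errors and weights (illustrative values only)
loss = total_loss(0.5, 0.2, 0.1, alpha=1.0, beta=2.0, gamma=4.0)
```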
S07, in the inference stage, the predicted angles angle1' to angle3' are obtained from the left and right images using only the trained model M.
The preferred embodiments of the present invention will be further described with reference to the accompanying drawings.
As shown in FIG. 1 and FIG. 2, an implementation of the end-to-end excavator operation attitude estimation method using deep learning is divided into two independent phases, training and inference, and may include the following steps:
a training stage:
s1, installing a wide-angle binocular camera at the top of a cab of the excavator, and enabling an optical axis of the camera to be parallel to an operation plane of the excavator. The angle sensor is mounted in place.
S2, use the binocular camera and the angle sensors to collect data under varying weather, illumination and operation postures, amounting to more than 20,000 frames.
And S3, distribute the collected images to annotators to label the key feature points P0 to P9, and compute the binocular disparity offline with an accurate algorithm to form disparity maps for later use.
And S4, form a training set from the left images, right images, disparity maps, angle values and feature-point annotations prepared in S2 and S3, construct the one-end-to-multi-end cascaded deep learning network, and start training; after some time the trained model M is obtained.
And (3) an inference stage:
S5, using the trained model M, take the left image, the right image and the disparity map as input, and predict the feature points P0 to P9 and the three angles angle1 to angle3.
While specific embodiments of the present invention have been described above, it should be understood that these are by way of example only and that numerous changes and modifications can be made to the embodiments without departing from the principles and spirit of the invention.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose of the embodiments is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All modifications made according to the spirit of the main technical scheme of the invention are covered in the protection scope of the invention.

Claims (3)

1. A binocular vision-based non-contact excavator operation posture learning and estimating method, characterized in that a wide-angle binocular camera mounted above the cab is used, the left and right cameras each acquire an image containing the shovel arm and bucket, and three angle values are obtained through a series of steps: an included angle1 between the bucket and a first shovel arm, an included angle2 between the first shovel arm and a second shovel arm, and an included angle3 between the second shovel arm and a third shovel arm; the processing flow comprises the following steps:
s01, defining a starting point as a point on the bucket, denoted P0, defining another 9 points as the joint points about which the shovel arm and bucket rotate, and defining the angles to be solved as angle1, angle2 and angle3;
s02, designing a one-end-to-multi-end cascaded deep learning network, wherein one end is the image and the other end splits into 3 branches: branch 1 predicts the rotation key points, branch 2 predicts the three angles, and branch 3 predicts the 3D positions; the 3 branches are progressive and mutually constraining, and the annotated image, the angle-sensor readings and the depth image respectively provide ground truth for the 3 branches, used to construct a loss that enables back-propagation;
s03, in the training stage, first taking the image signal from the left image and constructing a feature-point detection network comprising convolutional layers and fully-connected layers to predict the positions of 10 key feature points in the image, denoted P0' to P9', and comparing these 10 feature points with the annotated feature points P0 to P9 to produce a comparison error Ep; at this point the feature points could be mapped onto a projection plane by projective transformation to compute the attitude angles of the shovel arm and bucket directly; then combining the positions P0' to P9' with the binocular-measured depths at these feature points to form P0d' to P9d';
s04, after P0d' to P9d' are obtained, taking them as input and predicting the shovel-arm and bucket attitude angles angle1' to angle3' as output, adding 2 or more hidden layers to construct a second neural network, and comparing angle1' to angle3' with the ground-truth values angle1 to angle3 from the angle sensor to produce a comparison error Ea;
s05, after angle1' to angle3' are obtained, converting the angles into position information according to the real 3D model of the shovel arm and bucket; at this point, fusing the left and right camera images to compute a depth map, and comparing the depth-map information with the computed 3D position information to produce a comparison error Ed;
s06, in the inference stage, obtaining the predicted angles angle1' to angle3' from the left and right images using the trained model M.
2. The binocular vision-based non-contact excavator work attitude learning and estimation method according to claim 1, wherein the loss in S02 is constructed as Loss = α·Ep + β·Ea + γ·Ed, where α, β and γ are the weights of the respective losses, determined experimentally.
3. The binocular vision-based non-contact excavator work posture learning and estimation method according to claim 1, wherein in S05 the left and right camera images are fused to compute the depth map, and suitable computation methods include BM and SGM.
CN202211722477.1A 2022-12-30 2022-12-30 Binocular vision-based non-contact excavator operation posture learning and estimating method Pending CN115965690A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211722477.1A CN115965690A (en) 2022-12-30 2022-12-30 Binocular vision-based non-contact excavator operation posture learning and estimating method


Publications (1)

Publication Number Publication Date
CN115965690A 2023-04-14

Family

ID=87361596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211722477.1A Pending CN115965690A (en) 2022-12-30 2022-12-30 Binocular vision-based non-contact excavator operation posture learning and estimating method

Country Status (1)

Country Link
CN (1) CN115965690A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117197769A (en) * 2023-11-03 2023-12-08 江苏智能无人装备产业创新中心有限公司 Loader front image generation system and method based on bucket position observation
CN117197769B (en) * 2023-11-03 2024-01-26 江苏智能无人装备产业创新中心有限公司 Loader front image generation system and method based on bucket position observation

Similar Documents

Publication Publication Date Title
JP5832341B2 (en) Movie processing apparatus, movie processing method, and movie processing program
JP4889351B2 (en) Image processing apparatus and processing method thereof
EP1855247B1 (en) Three-dimensional reconstruction from an image sequence with outlier removal
WO2021035669A1 (en) Pose prediction method, map construction method, movable platform, and storage medium
US9892552B2 (en) Method and apparatus for creating 3-dimensional model using volumetric closest point approach
CN107845114B (en) Map construction method and device and electronic equipment
CN109903326B (en) Method and device for determining a rotation angle of a construction machine
KR102113068B1 (en) Method for Automatic Construction of Numerical Digital Map and High Definition Map
JP2018189636A (en) Imaging device, image processing method and program
CN115965690A (en) Binocular vision-based non-contact excavator operation posture learning and estimating method
CN116222543B (en) Multi-sensor fusion map construction method and system for robot environment perception
US6175648B1 (en) Process for producing cartographic data by stereo vision
CN110726413B (en) Multi-sensor fusion and data management method for large-scale SLAM
CN112556719A (en) Visual inertial odometer implementation method based on CNN-EKF
CN111623773A (en) Target positioning method and device based on fisheye vision and inertial measurement
CN111508026A (en) Vision and IMU integrated indoor inspection robot positioning and map construction method
CN113587934A (en) Robot, indoor positioning method and device and readable storage medium
Ji et al. Self-calibration of a rotating camera with a translational offset
Yu et al. Tightly-coupled fusion of VINS and motion constraint for autonomous vehicle
JPH10240934A (en) Object extractor
JP2020107336A (en) Method, device, and robot apparatus of improving robust property of visual inertial navigation system
KR102387717B1 (en) Method for inspection and diagnosis of facilities damage using photo album technique and drone photograph
CN113763481B (en) Multi-camera visual three-dimensional map construction and self-calibration method in mobile scene
CN113379850B (en) Mobile robot control method, device, mobile robot and storage medium
CN112344966B (en) Positioning failure detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination