CN115205356A - Binocular stereo vision-based quick debugging method for practical training platform - Google Patents

Binocular stereo vision-based quick debugging method for practical training platform

Info

Publication number
CN115205356A
Authority
CN
China
Prior art keywords
training platform
point
module
feature
debugging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211092343.6A
Other languages
Chinese (zh)
Other versions
CN115205356B (en)
Inventor
李博
郑泽胜
朱万锦
杨丹媚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yidao Intelligent Information Technology Co ltd
Original Assignee
Guangzhou Yidao Intelligent Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yidao Intelligent Information Technology Co ltd filed Critical Guangzhou Yidao Intelligent Information Technology Co ltd
Priority to CN202211092343.6A
Publication of CN115205356A
Application granted
Publication of CN115205356B
Legal status: Active
Anticipated expiration

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85Stereo camera calibration

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a binocular-vision-based quick installation and debugging method for a practical training platform, comprising the following steps: quickly calibrating the binocular stereo vision system; locating each module on the practical training platform with an improved lightweight Yolo v5 network and acquiring the point cloud of each target area by a stereo matching method; registering the point clouds of the training platform modules; and determining debugging errors against the module positions of a reference training platform. The binocular stereo vision positioning system identifies and positions each module of the practical training platform quickly and with high precision to obtain the three-dimensional pose information of each module, and feeds back real-time position-correction information for each module during debugging to guide debugging personnel in correcting installation errors, thereby improving the debugging efficiency of the practical training platform, shortening the debugging time, reducing the debugging cost and achieving the goal of rapid shipment.

Description

Binocular stereo vision-based quick debugging method for practical training platform
Technical Field
The invention relates to the technical field of computers, in particular to a quick debugging method of a practical training platform based on binocular stereo vision.
Background
With the rapid development of the intelligent manufacturing industry, production lines need more employees who can debug and operate robots, so enterprises, higher vocational schools, social training institutions and the like need large quantities of practical training platform equipment to train such employees. A practical training platform contains many modules, and inaccurate assembly of their relative positions during production leads to a long debugging period, because the prefabricated robot program or PLC program cannot simply be copied onto the platform and run directly.
Therefore, it is important to debug the installation position of each module, and there are several existing debugging methods:
First, a debugging engineer measures the position of each module with a ruler. This method is slow and imprecise, and the debugging results vary considerably between engineers.
Second, debugging is performed with monocular machine vision. However, monocular vision can only resolve the translation and rotation of each module within a plane and cannot obtain the offset error of a module's full three-dimensional pose; moreover, because of depth-of-field limitations, if the heights of the modules differ greatly from the height of the monocular camera, the camera cannot image them sharply and the positioning becomes inaccurate.
Third, positioning and debugging are performed with binocular vision. Existing binocular vision positioning, however, is still mainly used to position a single object; when it is applied to a practical training platform with many modules, the modules occlude one another, which makes the positioning inaccurate and affects the debugging result.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a binocular-vision-based rapid installation and debugging method for a practical training platform. The method identifies and positions each module of the practical training platform quickly and precisely to obtain the three-dimensional pose information of each module, and feeds back real-time position-correction information for each module during debugging to guide debugging personnel in correcting installation errors, thereby improving the debugging efficiency of the practical training platform, shortening the debugging time and reducing the debugging cost.
The technical scheme of the invention is realized as follows:
a quick debugging method for a practical training platform based on binocular stereo vision comprises the following steps:
calibrating the binocular vision system, namely calibrating and correcting the relative positions of the monocular cameras in the binocular vision system;
adding a double attention model into a Yolo v5 network, and designing a lightweight Yolo v5 network;
manufacturing a practical training platform module position data set, wherein the module position data set comprises a first characteristic diagram and a second characteristic diagram of a practical training platform, which are shot through a binocular vision system;
inputting the training platform module position data set into the lightweight Yolo v5 network, and positioning a target area of each module on the training platform;
performing feature point extraction operation on the target area through a stereo matching method to obtain a first feature point set and a second feature point set;
performing feature point matching on the first feature point set and the second feature point set through a Euclidean distance matching operation to obtain matching point pairs, and performing epipolar constraint checking on the matching point pairs;
calculating the point cloud to be matched of the target area by using a feature matching point pair calculation formula according to the triangulation principle;
and performing a point cloud registration operation on the point cloud to be registered, comparing the registered point cloud with a preset template point cloud, acquiring the offset error of each installed module, and importing the offset errors into visualization software to output guided debugging information.
Preferably, the binocular calibration further comprises:
a left monocular camera and a right monocular camera in the binocular vision system shoot a series of checkerboard calibration plate images;
searching for corner information in the checkerboard calibration plate images by using a Harris corner detection method;
fitting the internal parameters and the external parameters of the left monocular camera and the right monocular camera according to the corner information;
and performing camera coordinate conversion on the images acquired by the left and right monocular cameras through the intrinsic parameter matrix and multiplying by a rotation matrix to obtain new coordinate systems for the left and right monocular cameras, correcting the distortion of the left and right cameras through a distortion-removal operation, and performing epipolar line verification on the images of the left and right monocular cameras.
Preferably, the dual attention model includes a channel attention module and a location attention module;
adding the dual attention model to backbone and neck of a Yolo v5 network to obtain the lightweight Yolo v5 network.
Preferably, a truncated ICP method is used to perform the registration operation on the point clouds to be matched, in which the point-pair residuals are sorted in ascending order and only the first half is used in the least-squares minimization;
selecting a matching point pair meeting the requirement by utilizing the truncation ratio;
solving rigid body transformation between the matching point sets by using a generalized least square method;
updating the truncation ratio using the rigid body transformation;
and selecting the matching point pairs meeting the requirements by using the updated truncation ratio.
Preferably, the step of truncating the ICP to solve for the rigid transformation comprises constructing a residual metric function; accelerating and traversing by utilizing a K-D tree to obtain the most adjacent point pair of the template point cloud and the point cloud to be matched; solving the nearest point pair meeting the requirement by utilizing the truncation coefficient; the rigid transformation is found from the nearest pair of neighboring points that satisfy the requirements using the SVD method.
Preferably, the stereo matching method is to search the target region through a SURF feature point operator to obtain the first feature point set and the second feature point set.
Preferably, designing the lightweight Yolo v5 network includes collecting a data set of top-view sample images of the practical training platform to construct a sample data set, and training the lightweight Yolo v5 network with the sample data set.
Preferably, the channel Attention module has a Residual + Attention structure: Reshape and Transpose are applied to the feature map A to obtain a feature map R1 and a feature map RT1 respectively; R1 and RT1 are multiplied and passed through softmax to obtain a channel attention feature map X; X is multiplied with the feature map A, multiplied by a scale coefficient and Reshaped back to the original shape; and the result is finally added to the feature map A to obtain an output feature map E.
Preferably, the position Attention module has a Residual + Attention structure: the feature map A is convolved with 3 convolution kernels respectively to obtain 3 feature maps B, C and D; the feature map B is Reshaped and Transposed and multiplied with the Reshaped feature map C, and softmax is applied to obtain a feature map S; S is multiplied with the Reshaped feature map D, multiplied by a scale coefficient and Reshaped back to the original shape; and the result is finally added to the feature map A to obtain a final output feature map E.
Preferably, the training platform module position data set comprises: pictures of the modules at random placement positions; and pictures of the robot on the practical training platform in different postures.
Compared with the prior art, the invention has the following advantages.
According to the invention, the binocular vision system identifies each module on the practical training platform quickly and with high precision; the modules are positioned by the Yolo v5 network, and operations such as point cloud generation and point cloud registration yield the three-dimensional pose information of each module on the platform. By comparing this information with the three-dimensional pose information of the reference version, guided correction information is output for each module, which makes adjustment easier for debugging personnel, improves the debugging efficiency of the practical training platform, shortens the debugging time and reduces the debugging cost.
Because there are many kinds of modules on the practical training platform and they often occlude one another, a lightweight Yolo v5 network is used when positioning the modules: a dual attention module is added to the Yolo v5 model to handle the occlusion problem, so that target features are extracted more accurately, the object position information still preserves the low-level feature information under occlusion, and inference speed and accuracy are maintained without increasing the computational load of the network.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a rapid debugging method of a practical training platform based on binocular stereo vision;
FIG. 2 is a structural diagram of a lightweight Yolo v5 network in an embodiment of the present invention;
FIG. 3 is a block diagram of a channel attention module in an embodiment of the present invention;
FIG. 4 is a block diagram of a location attention module in an embodiment of the present invention;
FIG. 5 is a schematic view of a model for pinhole imaging in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1 to 4, the embodiment of the invention discloses a method for quickly debugging a practical training platform based on binocular stereo vision, which comprises the following steps:
S01, calibrating and correcting the relative positions of the monocular cameras in a binocular vision system;
S02, training a lightweight Yolo v5 network; the native Yolo v5 network is an object detection algorithm usually used to detect the required object positions in a picture.
S03, manufacturing a practical training platform module position data set, wherein the module position data set comprises a first characteristic diagram and a second characteristic diagram of the practical training platform, shot through the binocular vision system;
S04, inputting the practical training platform module position data set into the lightweight Yolo v5 network and locating the target area of each module on the practical training platform;
S05, extracting the feature points of the target areas by a stereo matching method to obtain a first feature point set and a second feature point set;
S06, performing feature point matching on the first feature point set and the second feature point set through a Euclidean distance matching operation to obtain matching point pairs, and performing epipolar constraint checking on the matching point pairs;
S07, calculating the point cloud to be matched of the target area from the feature matching point pairs according to the triangulation principle;
and S08, performing a point cloud registration operation on the point cloud to be registered, comparing the registered point cloud with a preset template point cloud, acquiring the offset error of each installed module, and importing the offset errors into visualization software to output guided debugging information. The offset error comprises offset error data in six degrees of freedom.
In a specific embodiment, step S01 further includes the following steps:
S11, the monocular cameras in the binocular vision system first shoot a series of images containing a checkerboard calibration plate; in this embodiment, 15 pictures of the calibration plate at different angles are shot with each camera;
S12, the corner information in the checkerboard calibration plate images is searched for using the Harris corner detection method, with the specific formula:
E(u, v) = Σ_(x,y) w(x, y) · [I(x + u, y + v) − I(x, y)]²

where E(u, v) is the corner strength value, (x, y) are the image pixel coordinates, (u, v) are the sliding variables of the Harris corner sliding window, w(x, y) is the weighting function over the image pixel coordinates, I(x, y) is the pixel value of the image at point (x, y), and I(x + u, y + v) is the pixel value at point (x + u, y + v) within the sliding window.
S13, fitting the internal parameters and the external parameters of the left monocular camera and the right monocular camera according to the corner information;
As shown in fig. 5, the pinhole imaging model is the process model of linear camera imaging: light from a point P(X_C, Y_C, Z_C) on the object surface enters the camera lens and forms a projection point p(x, y), which gives the proportional relationship

x = f · X_C / Z_C,   y = f · Y_C / Z_C

This can be expressed in matrix form as

Z_C · [x, y, 1]^T = [f 0 0 0; 0 f 0 0; 0 0 1 0] · [X_C, Y_C, Z_C, 1]^T

Combining the above with the transformation from the world coordinate system to the camera coordinate system gives the relation between the world coordinates of point P and its projection coordinates on the image:

Z_C · [u, v, 1]^T = K · [R t] · X_w = M · X_w

where M is a 3 × 4 projection matrix, K contains the internal parameters related to the internal structure of the camera, [R t] contains the external parameters determined by the orientation of the camera relative to the world coordinate system, and X_w is the homogeneous coordinate of point P in the world coordinate system. The internal parameters K and external parameters [R t] are fitted by a least squares method from the multiple checkerboard calibration plate images that were shot.
And S14, calibrating and correcting the binocular vision system: camera coordinate conversion is performed on the images acquired by the left and right cameras through the intrinsic parameter matrix, and multiplication by a rotation matrix yields new coordinate systems for the left and right monocular cameras; the distortion of the left and right cameras is corrected through a distortion-removal operation, and epipolar line verification is performed on the images of the left and right monocular cameras.
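The following is a minimal sketch of this S11–S14 calibration and rectification flow using OpenCV; it substitutes OpenCV's checkerboard corner detector for a hand-written Harris search, and the image paths, board dimensions and square size are illustrative assumptions rather than values from this embodiment.

```python
import glob
import cv2
import numpy as np

BOARD = (9, 6)     # inner corners per checkerboard row/column (assumed)
SQUARE = 0.025     # square edge length in metres (assumed)

# 3-D coordinates of the checkerboard corners in the board's own frame
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_pts, left_pts, right_pts, image_size = [], [], [], None
crit = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
for lf, rf in zip(sorted(glob.glob("calib/left_*.png")),
                  sorted(glob.glob("calib/right_*.png"))):
    gl = cv2.imread(lf, cv2.IMREAD_GRAYSCALE)
    gr = cv2.imread(rf, cv2.IMREAD_GRAYSCALE)
    okl, cl = cv2.findChessboardCorners(gl, BOARD)
    okr, cr = cv2.findChessboardCorners(gr, BOARD)
    if okl and okr:
        cl = cv2.cornerSubPix(gl, cl, (11, 11), (-1, -1), crit)
        cr = cv2.cornerSubPix(gr, cr, (11, 11), (-1, -1), crit)
        obj_pts.append(objp); left_pts.append(cl); right_pts.append(cr)
        image_size = gl.shape[::-1]

# S13: fit the intrinsic and distortion parameters of each monocular camera
_, K1, D1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, image_size, None, None)
_, K2, D2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, image_size, None, None)
# Fit the extrinsic parameters (R, T) relating the two cameras
_, K1, D1, K2, D2, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, D1, K2, D2, image_size,
    flags=cv2.CALIB_FIX_INTRINSIC)
# S14: rectification gives the rotated coordinate systems in which epipolar
# lines become image rows, simplifying the later epipolar check
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)
```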
In a specific embodiment, step S03 further includes the following steps:
S31, manufacturing the practical training platform module position data set: the binocular vision system is erected directly above the practical training platform so that all modules on the platform lie within the cameras' field of view, and 2000 top views of the practical training platform are collected with each of the left and right monocular cameras. The images cover different states such as random module positions and different robot postures; the left monocular camera generates the first feature map and the right monocular camera generates the second feature map.
And S32, using a labeling tool, bounding boxes and class labels are drawn for the spraying module, welding module, storage rack module, turntable, drawing module and the like on the practical training platform.
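As a small illustration of one way the bounding boxes and class labels of S32 could be stored, the sketch below writes a single annotation in the standard YOLO txt convention (class index followed by the box centre and size normalised by the image dimensions); the class names and pixel values are assumed for illustration.

```python
CLASSES = ["spraying_module", "welding_module", "storage_rack", "turntable", "drawing_module"]

def yolo_label(cls_name, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space bounding box into one YOLO txt label line."""
    cx = (x1 + x2) / 2.0 / img_w   # box centre x, normalised
    cy = (y1 + y2) / 2.0 / img_h   # box centre y, normalised
    w = (x2 - x1) / img_w          # box width, normalised
    h = (y2 - y1) / img_h          # box height, normalised
    return f"{CLASSES.index(cls_name)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# e.g. a welding module occupying pixels (320, 180)-(560, 400) in a 1280x720 top view
print(yolo_label("welding_module", 320, 180, 560, 400, 1280, 720))
```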
As shown in fig. 2, the practical training platform contains many kinds of modules, and occlusion often occurs between them; for example, the robot's swing arm occludes different modules in different postures. To allow the binocular vision system to extract target features more accurately, this embodiment adds a dual attention module to the Yolo v5 model to handle the occlusion problem. In addition, the correction system has to be deployed on the industrial personal computer of the practical training platform, which, for reasons of integration cost, has a relatively low configuration, so the native Yolo v5 cannot meet the system's real-time requirements. This embodiment therefore improves the backbone and neck of Yolo v5 with a dual attention mechanism; the resulting lightweight Yolo v5 network improves both the recognition precision and the recognition rate for occluded objects. The dual attention module comprises a channel attention module and a position attention module, and the structure of the lightweight Yolo v5 network fused with the dual attention mechanism is shown in fig. 2.
In a preferred embodiment, the channel Attention module is shown in fig. 3. The channel Attention module itself has a Residual + Attention structure: Reshape and Transpose are applied to the feature map A to obtain the feature map R1 and the feature map RT1 respectively; R1 and RT1 are multiplied and passed through softmax to obtain the channel attention feature map X; X is multiplied with the feature map A, multiplied by a scale coefficient and Reshaped back to the original shape; and the result is finally added to the feature map A to obtain the output feature map E.
In a preferred embodiment, the position Attention module is shown in fig. 4. The position Attention module itself has a Residual + Attention structure: the feature map A is convolved with 3 convolution kernels respectively to obtain 3 feature maps B, C and D; the feature map B is Reshaped and Transposed and multiplied with the Reshaped feature map C, and softmax is applied to obtain the feature map S; S is multiplied with the Reshaped feature map D, multiplied by a scale coefficient and Reshaped back to the original shape; and the result is finally added to the feature map A to obtain the final output feature map E.
As shown in fig. 2, the dual attention module (DA) is added after the CBL layer of the backbone network so that the backbone does not lose detail in either the channel or the object-position dimension when extracting the main features. In the neck, a dual attention module (DA) is inserted after the CBL at each of the three scales, so that the object position information at every scale still preserves the low-level feature information under occlusion, and inference speed and accuracy are maintained without increasing the computational load of the network.
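A PyTorch sketch of such channel- and position-attention blocks (in the DANet style described above) is given below; the channel-reduction ratio, the learnable scale coefficients initialised to zero, and the example tensor size are assumptions rather than details taken from this embodiment.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: Reshape/Transpose A, build a C x C attention map, residual add."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))            # learnable scale coefficient

    def forward(self, a):                                    # a: (B, C, H, W)
        b, c, h, w = a.shape
        r1 = a.view(b, c, -1)                                # Reshape   -> (B, C, HW)
        rt1 = r1.permute(0, 2, 1)                            # Transpose -> (B, HW, C)
        x = torch.softmax(torch.bmm(r1, rt1), dim=-1)        # attention map X: (B, C, C)
        out = torch.bmm(x, r1).view(b, c, h, w)              # weight A by X, Reshape back
        return self.gamma * out + a                          # residual: add to A -> E

class PositionAttention(nn.Module):
    """Position attention: 1x1 convs give B, C, D; HW x HW map S; residual add."""
    def __init__(self, channels):
        super().__init__()
        self.conv_b = nn.Conv2d(channels, channels // 8, 1)
        self.conv_c = nn.Conv2d(channels, channels // 8, 1)
        self.conv_d = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))            # learnable scale coefficient

    def forward(self, a):                                    # a: (B, C, H, W)
        b, c, h, w = a.shape
        q = self.conv_b(a).view(b, -1, h * w).permute(0, 2, 1)   # B: (B, HW, C')
        k = self.conv_c(a).view(b, -1, h * w)                    # C: (B, C', HW)
        s = torch.softmax(torch.bmm(q, k), dim=-1)               # spatial map S: (B, HW, HW)
        v = self.conv_d(a).view(b, -1, h * w)                    # D: (B, C, HW)
        out = torch.bmm(v, s.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + a                          # residual: add to A -> E

# e.g. a dual-attention (DA) block placed after a CBL layer of the backbone or neck
feat = torch.randn(1, 64, 40, 40)
da = nn.Sequential(PositionAttention(64), ChannelAttention())
print(da(feat).shape)   # torch.Size([1, 64, 40, 40])
```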
Specifically, in this embodiment, in step S05 a feature point extraction operation is performed on the target regions by the stereo matching method to obtain the first feature point set and the second feature point set. According to the region of each module in the images captured by the left and right monocular cameras of the binocular vision system obtained in step S03, a SURF feature point operator is used for stereo matching, and the SURF feature points found in each identified region are:
P_c^L = {p_c1^L, p_c2^L, …, p_cn^L},   P_c^R = {p_c1^R, p_c2^R, …, p_cm^R}

where P_c^L is the feature point set of the c-th target area module in the first feature map and p_ci^L is the i-th point in the first feature point set, and P_c^R is the feature point set of the c-th target area module in the second feature map and p_cj^R is the j-th point in the second feature point set.
In a specific embodiment, in step S06, Euclidean distance matching is first performed on the feature points of P_c^L and P_c^R to find the feature point pairs with the shortest Euclidean distance. The Euclidean distance matching formula is:

d_ij = ‖p_ci^L − p_cj^R‖ = sqrt( (x_ci^L − x_cj^R)² + (y_ci^L − y_cj^R)² )

where d_ij is the Euclidean distance from p_ci^L to p_cj^R, p_ci^L is the i-th point in the first feature point set and p_cj^R is the j-th point in the second feature point set. The best-matched feature point pairs in the c-th area are those that minimize this distance:

(p_ci^L, p_cj^R) = argmin_(i,j) d_ij

The matched pairs are then checked against the epipolar constraint. In the present embodiment the preferred threshold is 60 pixels: if the row (epipolar-line) pixel coordinates of the two matched feature points in the left and right images differ by more than 60 px, the matched point pair is rejected.
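A sketch of this S05–S06 matching step under the stated assumptions (SURF keypoints from an opencv-contrib build, brute-force L2 matching, and the 60-pixel row tolerance) might look as follows; the inputs are assumed to be the module regions located by the detector.

```python
import cv2

EPIPOLAR_TOL = 60  # pixels, the row tolerance used in this embodiment

def match_region(left_roi, right_roi):
    """SURF + Euclidean (L2) matching within one module's region, with a row check."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)   # needs opencv-contrib
    kp_l, des_l = surf.detectAndCompute(left_roi, None)        # first feature point set
    kp_r, des_r = surf.detectAndCompute(right_roi, None)       # second feature point set
    if des_l is None or des_r is None:
        return []

    # Euclidean-distance matching: nearest descriptor, cross-checked both ways
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(des_l, des_r)

    pairs = []
    for m in matches:
        xl, yl = kp_l[m.queryIdx].pt
        xr, yr = kp_r[m.trainIdx].pt
        # epipolar (row) constraint on rectified images: reject pairs whose rows
        # differ by more than the tolerance
        if abs(yl - yr) <= EPIPOLAR_TOL:
            pairs.append(((xl, yl), (xr, yr)))
    return pairs
```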
In step S07, according to the triangulation principle of the left and right cameras, the point cloud in the c-th area can be calculated from the feature matching point pairs; the point cloud calculation formula is:

z_p = f · d / (x^L − x^R)
x_p = d · x^L / (x^L − x^R)
y_p = d · y^L / (x^L − x^R)

where x_p, y_p and z_p are respectively the abscissa, ordinate and depth coordinate of point p in the c-th area, f is the focal length of the camera, d is the distance between the optical centers of the two cameras, x^L and x^R are the abscissas of the matching point pair, and y^L is the ordinate of the matching point pair.
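The triangulation above can be sketched as follows; the focal length and baseline values are illustrative placeholders, and the principal-point offset is omitted for brevity.

```python
import numpy as np

def pairs_to_cloud(pairs, f=1200.0, d=0.12):
    """Triangulate matched pairs ((xL, yL), (xR, yR)) into (x, y, z) points."""
    cloud = []
    for (xl, yl), (xr, _) in pairs:
        disparity = xl - xr
        if disparity <= 0:            # skip degenerate or mismatched pairs
            continue
        z = f * d / disparity         # depth coordinate
        x = d * xl / disparity        # abscissa
        y = d * yl / disparity        # ordinate
        cloud.append((x, y, z))
    return np.asarray(cloud, dtype=np.float64)
```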
In a specific embodiment, the point cloud registration of each module in step S08 further includes the following steps:
After the point cloud P_c(x, y, z) of the c-th area is obtained, the point cloud to be registered is registered using the truncated ICP method.
This algorithm differs from the traditional ICP method in that traditional ICP minimizes the sum of squared residuals over all point pairs, whereas truncated ICP minimizes the sum of squared residuals over only the first half of the point pairs after sorting them in ascending order of residual, treating the second half as outlier pairs and thereby reducing the influence of noise on the closed-form ICP solution.
The breakdown point of the truncated ICP estimator is determined by the number of point pairs n, the number of parameters m and the number of point pairs l that meet the truncation threshold; because m is smaller than n, the first of the candidate expressions is used to calculate the truncation coefficient, and when l = n/2 the breakdown point is close to 0.5.
Since ordinary least-squares estimation performs poorly when the proportion of correct point pairs is below 50%, the point pairs obtained in the first iteration still contain many wrong pairs even when the truncation coefficient is 0.5.
Therefore, in order to make the truncated ICP robust while still converging, this embodiment uses a truncation coefficient in every ICP iteration to handle erroneous point pairs: the matching point pairs meeting the requirement are selected with the truncation ratio, the rigid body transformation between the point sets is solved by a generalized least squares method, the template matching pairs are updated with the obtained transformation, the truncation ratio φ is updated, and point pairs meeting the requirement are selected again with the new truncation ratio; this process is repeated until convergence. The truncation ratio φ is updated as a function of the iteration count k and grows with the number of iterations, so that the proportion of mismatched pairs among all matched pairs becomes smaller and smaller.
The rigid transformation steps for solving the objective function by truncated ICP are as follows:
S81, constructing the residual metric function

E(R, t) = Σ_i ‖R · p_i + t − q_i‖²

summed over the point pairs retained by the truncation, where p_i is a point of the point cloud to be matched and q_i is its nearest neighbor in the template point cloud.
S82, accelerating the traversal with a K-D tree to obtain the nearest-neighbor point pairs between the template point cloud and the point cloud to be matched.
And S83, calculating the nearest neighbor point pair meeting the requirement by using the truncation coefficient.
And S84, obtaining rigid transformation according to the nearest point pairs meeting the requirement by using an SVD method.
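A compact sketch of this truncated (trimmed) ICP loop, steps S81–S84, is shown below; the monotone schedule used here to grow the truncation ratio φ with the iteration count is an assumption, not the embodiment's exact update formula.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Closed-form least-squares rigid transform (SVD) mapping src onto dst."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    h = (src - mu_s).T @ (dst - mu_d)
    u, _, vt = np.linalg.svd(h)
    r = vt.T @ u.T
    if np.linalg.det(r) < 0:                  # avoid reflections
        vt[-1] *= -1
        r = vt.T @ u.T
    return r, mu_d - r @ mu_s

def trimmed_icp(cloud, template, iters=30, phi0=0.5, phi_max=0.9):
    tree = cKDTree(template)
    r_total, t_total = np.eye(3), np.zeros(3)
    moved = cloud.copy()
    for k in range(iters):
        # S82: nearest-neighbour pairs via the K-D tree
        dist, idx = tree.query(moved)
        # S83: keep the phi-fraction of pairs with the smallest residuals (ascending sort)
        phi = min(phi_max, phi0 + 0.02 * k)   # assumed monotone phi update
        keep = np.argsort(dist)[: max(3, int(phi * len(moved)))]
        # S84: SVD rigid transform from the retained pairs
        r, t = best_rigid_transform(moved[keep], template[idx[keep]])
        moved = moved @ r.T + t
        r_total, t_total = r @ r_total, r @ t_total + t
    return r_total, t_total
```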
In the process of assembling each module of the practical training equipment, binocular stereo vision is used to quickly and accurately determine the offset error of each installed module relative to the corresponding module of the reference-version practical training platform equipment, namely the offset error in six degrees of freedom (three translational and three rotational offsets), and the visualization software then guides the installation and debugging engineer in adjusting the relative positions on the platform.
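As one way to turn the registration result into the six-degree-of-freedom offset error fed to the visualization software, the sketch below takes the translations directly from the estimated transform and converts the rotation matrix to Euler angles; the xyz angle convention is an assumption.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def offset_error(r_total, t_total):
    """Six-degree-of-freedom offset: translations plus Euler angles of the rotation."""
    rx, ry, rz = Rotation.from_matrix(r_total).as_euler("xyz", degrees=True)
    dx, dy, dz = t_total
    return {"dx": dx, "dy": dy, "dz": dz, "rx": rx, "ry": ry, "rz": rz}

# e.g. with the (r_total, t_total) returned by the trimmed ICP sketch above
print(offset_error(np.eye(3), np.zeros(3)))
```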
According to the invention, the binocular vision system identifies and positions each module on the practical training platform quickly and with high precision to obtain the three-dimensional pose information of each module, and feeds back real-time position-correction information for each module during debugging to guide debugging personnel in correcting installation errors, thereby improving the debugging efficiency of the practical training platform, shortening the debugging time, reducing the debugging cost and achieving the goal of rapid shipment.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A method for quickly debugging a practical training platform based on binocular stereo vision is characterized by comprising the following steps:
calibrating the binocular, namely calibrating and correcting the relative position of a monocular camera in a binocular vision system;
adding a double attention model into a Yolo v5 network, and designing a lightweight Yolo v5 network;
manufacturing a practical training platform module position data set, wherein the module position data set comprises a first characteristic diagram and a second characteristic diagram of a practical training platform, which are shot through a binocular vision system;
inputting the training platform module position data set into the lightweight Yolo v5 network, and positioning a target area of each module on the training platform;
performing feature point extraction operation on the target area through a stereo matching method to obtain a first feature point set and a second feature point set;
performing feature point matching on the first feature point set and the second feature point set through a Euclidean distance matching operation to obtain matching point pairs, and performing epipolar constraint checking on the matching point pairs;
calculating a point cloud to be matched of the target area by using a feature matching point pair calculation formula according to the triangulation principle;
and performing a point cloud registration operation on the point cloud to be registered, comparing the registered point cloud with a preset template point cloud, acquiring the offset error of each installed module, and importing the offset errors into visualization software to output guided debugging information.
2. The binocular stereo vision based training platform rapid debugging method of claim 1, wherein the binocular calibration further comprises:
a left monocular camera and a right monocular camera in the binocular vision system shoot a series of checkerboard calibration plate images;
searching for corner information in the checkerboard calibration plate images by using a Harris corner detection method;
fitting the internal parameters and the external parameters of the left monocular camera and the right monocular camera according to the corner information;
and carrying out camera coordinate conversion on the images acquired by the left and right monocular cameras through an internal reference matrix, multiplying by a rotation matrix to obtain new coordinate systems of the left and right monocular cameras, carrying out distortion correction on the left and right cameras through the distortion removal operation of the left and right cameras, and carrying out epipolar line verification on the images by the left and right monocular cameras.
3. The binocular stereo vision based training platform rapid debugging method of claim 1, wherein: the dual attention model includes a channel attention module and a location attention module;
adding the dual attention model to the backbone and neck of a Yolo v5 network to obtain the lightweight Yolo v5 network.
4. The binocular stereo vision based training platform rapid debugging method of claim 1, wherein the method comprises the following steps: performing a registration operation on the point clouds to be matched by using a truncated ICP (iterative closest point) method, wherein the point-pair residuals are sorted in ascending order and only the first half is used in the least-squares minimization;
selecting a matching point pair meeting the requirement by utilizing the truncation ratio;
solving rigid body transformation between the matching point sets by using a generalized least square method;
updating the truncation ratio using the rigid body transformation;
and selecting the matching point pairs meeting the requirements by using the updated truncation ratio.
5. The binocular stereo vision based training platform rapid debugging method of claim 1, wherein: the step of truncating the ICP to solve for the rigid transformation includes
Constructing a residual error measurement function;
accelerating and traversing by utilizing a K-D tree to obtain the nearest point pair of the template point cloud and the point cloud to be matched;
solving the nearest point pair meeting the requirement by utilizing the truncation coefficient;
the rigid transformation is found from the nearest-neighbor pairs that satisfy the requirements using the SVD method.
6. The binocular stereo vision based training platform rapid debugging method of claim 1, wherein the method comprises the following steps: the stereo matching method comprises the step of searching in the target area through an SURF feature point operator to obtain the first feature point set and the second feature point set.
7. The binocular stereo vision based training platform rapid debugging method of claim 1, wherein: designing the lightweight Yolo v5 network further includes collecting a data set of top-view sample images of the practical training platform to construct a sample data set, and training the lightweight Yolo v5 network with the sample data set.
8. The binocular stereo vision based training platform rapid debugging method of claim 3, wherein: the channel Attention module has a Residual + Attention structure; Reshape and Transpose are applied to the feature map A to obtain a feature map R1 and a feature map RT1 respectively; the feature map R1 and the feature map RT1 are multiplied and passed through softmax to obtain a channel attention feature map X; the feature map X is multiplied with the feature map A, multiplied by a scale coefficient and Reshaped back to the original shape; and the result is finally added to the feature map A to obtain an output feature map E.
9. The binocular stereo vision based training platform rapid debugging method of claim 3, wherein: the position Attention module has a Residual + Attention structure; the feature map A is convolved with 3 convolution kernels respectively to obtain 3 feature maps B, C and D; the feature map B is Reshaped and Transposed and multiplied with the Reshaped feature map C, and softmax is applied to obtain a feature map S; the feature map S is multiplied with the Reshaped feature map D, multiplied by a scale coefficient and Reshaped back to the original shape; and the result is finally added to the feature map A to obtain the final output feature map E.
10. The binocular stereo vision based training platform rapid debugging method of claim 1, wherein the method comprises the following steps: the training platform module position data set comprises:
taking a picture of the random placement position of the module; and
and shooting pictures of different postures of the robot on the practical training platform.
CN202211092343.6A 2022-09-08 2022-09-08 Binocular stereo vision-based quick debugging method for practical training platform Active CN115205356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211092343.6A CN115205356B (en) 2022-09-08 2022-09-08 Binocular stereo vision-based quick debugging method for practical training platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211092343.6A CN115205356B (en) 2022-09-08 2022-09-08 Binocular stereo vision-based quick debugging method for practical training platform

Publications (2)

Publication Number Publication Date
CN115205356A 2022-10-18
CN115205356B 2022-12-30

Family

ID=83573482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211092343.6A Active CN115205356B (en) 2022-09-08 2022-09-08 Binocular stereo vision-based quick debugging method for practical training platform

Country Status (1)

Country Link
CN (1) CN115205356B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150288947A1 (en) * 2014-04-03 2015-10-08 Airbus Ds Gmbh Position and location detection of objects
US20170046840A1 (en) * 2015-08-11 2017-02-16 Nokia Technologies Oy Non-Rigid Registration for Large-Scale Space-Time 3D Point Cloud Alignment
CN111243033A (en) * 2020-01-10 2020-06-05 大连理工大学 Method for optimizing external parameters of binocular camera
AU2020101932A4 (en) * 2020-07-16 2020-10-01 Xi'an University Of Science And Technology Binocular vision–based method and system for pose measurement of cantilever tunneling equipment
CN112734776A (en) * 2021-01-21 2021-04-30 中国科学院深圳先进技术研究院 Minimally invasive surgical instrument positioning method and system
CN114282649A (en) * 2021-12-14 2022-04-05 江苏省特种设备安全监督检验研究院 Target detection method based on bidirectional attention mechanism enhanced YOLO V5

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150288947A1 (en) * 2014-04-03 2015-10-08 Airbus Ds Gmbh Position and location detection of objects
US20170046840A1 (en) * 2015-08-11 2017-02-16 Nokia Technologies Oy Non-Rigid Registration for Large-Scale Space-Time 3D Point Cloud Alignment
CN111243033A (en) * 2020-01-10 2020-06-05 大连理工大学 Method for optimizing external parameters of binocular camera
AU2020101932A4 (en) * 2020-07-16 2020-10-01 Xi'an University Of Science And Technology Binocular vision–based method and system for pose measurement of cantilever tunneling equipment
CN112734776A (en) * 2021-01-21 2021-04-30 中国科学院深圳先进技术研究院 Minimally invasive surgical instrument positioning method and system
CN114282649A (en) * 2021-12-14 2022-04-05 江苏省特种设备安全监督检验研究院 Target detection method based on bidirectional attention mechanism enhanced YOLO V5

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JUN FU ET AL: "Dual Attention Network for Scene Segmentation", https://arxiv.org/abs/1809.02983 *
ZHANG QINGZHE ET AL: "Evaluation Method of Calibration Accuracy for Binocular Stereo Vision Based on Epipolar Constraint", Laser & Optoelectronics Progress *

Also Published As

Publication number Publication date
CN115205356B (en) 2022-12-30

Similar Documents

Publication Publication Date Title
CN108648240B (en) Non-overlapping view field camera attitude calibration method based on point cloud feature map registration
JP6685199B2 (en) System and method for combining machine vision coordinate spaces in a guided assembly environment
CN105740899B (en) A kind of detection of machine vision image characteristic point and match compound optimization method
CN111721259B (en) Underwater robot recovery positioning method based on binocular vision
CN112183171B (en) Method and device for building beacon map based on visual beacon
CN109859277A (en) A kind of robotic vision system scaling method based on Halcon
CN104036542B (en) Spatial light clustering-based image surface feature point matching method
CN113920205B (en) Calibration method of non-coaxial camera
CN107330927B (en) Airborne visible light image positioning method
CN111784655A (en) Underwater robot recovery positioning method
CN108492282B (en) Three-dimensional gluing detection based on line structured light and multitask cascade convolution neural network
CN115131268A (en) Automatic welding system based on image feature extraction and three-dimensional model matching
CN112767546B (en) Binocular image-based visual map generation method for mobile robot
CN112484746A (en) Monocular vision-assisted laser radar odometer method based on ground plane
CN111738971B (en) Circuit board stereoscopic scanning detection method based on line laser binocular stereoscopic vision
CN110363801B (en) Method for matching corresponding points of workpiece real object and three-dimensional CAD (computer-aided design) model of workpiece
CN110992416A (en) High-reflection-surface metal part pose measurement method based on binocular vision and CAD model
CN115222819A (en) Camera self-calibration and target tracking method based on multi-mode information reference in airport large-range scene
CN108180829B (en) Method for measuring target space orientation with parallel line characteristics
CN117197241B (en) Robot tail end absolute pose high-precision tracking method based on multi-eye vision
CN112785647A (en) Three-eye stereo image detection method and system
CN111415384B (en) Industrial image component accurate positioning system based on deep learning
CN115205356B (en) Binocular stereo vision-based quick debugging method for practical training platform
CN114140541B (en) Parameter calibration method of multi-line structured light weld tracking sensor
JP2008224641A (en) System for estimation of camera attitude

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant