CN111062990A - Binocular vision positioning method for underwater robot target grabbing

Info

Publication number: CN111062990A (application CN201911282989.9A); granted as CN111062990B
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 黄海 (Huang Hai), 石晓婷 (Shi Xiaoting), 盛明伟 (Sheng Mingwei), 张万里 (Zhang Wanli), 徐杨 (Xu Yang), 鲍轩 (Bao Xuan)
Assignee: Harbin Engineering University
Legal status: Active (granted)

Classifications

    • G06T 7/73: Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06F 18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06T 5/80: Image enhancement or restoration; geometric correction
    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/20081: Indexing scheme for image analysis; training/learning
    • Y02A 90/30: Technologies for adaptation to climate change; assessment of water resources


Abstract

The invention relates to a binocular vision positioning method for underwater robot target grabbing, and belongs to the field of computer vision. The method is mainly used to accurately acquire the three-dimensional information of the target to be grabbed while the underwater robot is working. The method comprises the following steps. Binocular calibration: calculating the internal and external parameters of the left and right cameras. Target detection: locating the target object detection frame. Binocular image rectification: distortion correction and stereo rectification, and determination of the right-image target region. Stereo matching of the binocular images: extracting image feature points, describing the feature points, performing stereo matching, and removing mismatches. Finally, the three-dimensional information of the target in the image is calculated in the left camera coordinate system. Feature points are extracted, non-maximum suppression removes unstable feature points, a binary descriptor is constructed, feature points are matched, and mismatches are removed to obtain an accurate disparity value. This scheme improves the robustness of binocular stereo matching while accurately obtaining the three-dimensional information of the detected target, meeting the real-time positioning requirement for the target during underwater robot target grabbing.

Description

Binocular vision positioning method for underwater robot target grabbing
Technical Field
The invention relates to a binocular vision positioning method for underwater robot target grabbing, and belongs to the field of computer vision.
Background
The marine environment is unknown and harsh; some marine organisms must currently be collected by hand, and such fishing is very difficult, so developing underwater robots to replace humans under these special working conditions has become the trend of future marine-organism harvesting. Binocular vision positioning can provide the underwater robot with distance and pose information of a target object in real time, guide the robot in autonomous obstacle avoidance, and complete intelligent autonomous capture of marine organisms. Compared with other positioning technologies such as laser and sonar, binocular stereo vision feeds back underwater images intuitively and with clear detail, meeting the varied information requirements of underwater target grabbing; it is also not easily disturbed by the environment, and offers high precision and high speed. It is therefore of great significance for guiding the robot to grab targets.
Binocular stereo vision target positioning mainly comprises binocular calibration, image rectification, stereo matching, and three-dimensional coordinate calculation. Binocular calibration is performed offline and yields the internal and external parameters of the binocular cameras and the rotation and translation matrices between them. Image rectification uses the parameters obtained by binocular calibration to remove camera distortion and performs epipolar rectification so that the left and right images are row-aligned. Stereo matching finds, for a given point in the left image, the corresponding point in the right image and computes the corresponding disparity, after which the three-dimensional coordinates of the target are calculated. Stereo matching is the key to binocular stereo vision: according to the matching primitives used, methods can be classified as region-based, feature-based, or phase-based. Compared with the large computation and low speed of region-correlation matching and the phase singularities of phase matching, feature matching offers strong interference resistance and high computation speed.
There has been little research on binocular vision positioning applied to underwater target grabbing; considering the poor quality of underwater images and the accuracy and real-time requirements of robot operation, advanced land-based algorithms must be adopted and improved. The ORB (Oriented FAST and Rotated BRIEF) algorithm proposed by Ethan Rublee et al. in 2012 uses the rotation-invariant oFAST detector for feature point extraction and is computationally simple; for feature point description it adopts the rBRIEF (Rotated BRIEF) binary descriptor, which is fast to compute, making ORB a commonly used algorithm in current binocular real-time positioning. However, the number of matched feature point pairs of the ORB algorithm is much lower than that of the SIFT algorithm, and the algorithm has no scale invariance and poor robustness to illumination and noise. Addressing the problem that the feature points extracted by oFAST are unstable and lack scale invariance, Xue Mei et al. ("Image feature point matching research based on improved ORB [J]", Journal of Electronic Measurement and Instrumentation, 2016) extract scale-invariant, robust feature points with the SURF algorithm and then construct a binary descriptor with the ORB algorithm, but the resulting SURB (SURF-ORB) algorithm has a long running time and cannot meet real-time requirements. Ding Yourong et al. ("FAST feature point extraction algorithm based on adaptive threshold [J]", Command Control & Simulation, 2013) propose using an adaptive threshold instead of a fixed threshold to control the number of FAST feature points and using non-maximum suppression to remove some candidate feature points, obtaining stable feature points with good robustness. Addressing the defects that the rBRIEF binary descriptor lacks scale invariance and matches poorly on heavily blurred images, Zhang Yang et al. ("An improved ORB feature point matching algorithm [J]", Journal of Chongqing Technology and Business University, 2018) propose a feature point matching algorithm combining the scale-invariant BRISK feature descriptor with the ORB feature detector, effectively improving the robustness of the binary descriptor; however, because such binary descriptors are built by selecting pixel points in the neighborhood of the feature point, they remain susceptible to noise and local appearance changes.
Disclosure of Invention
The invention aims to provide a binocular vision positioning method for underwater robot target grabbing, solving the problem that the prior art cannot achieve accuracy and speed at the same time.

The object of the invention is achieved by a binocular vision positioning method for underwater robot target grabbing that specifically comprises the following steps:
step one, binocular calibration: performing underwater offline calibration of the binocular cameras according to the pinhole imaging principle, and calculating the intrinsic parameter matrices, extrinsic parameter matrices and distortion coefficients of the left and right cameras, and the rotation and translation matrices between them;
step two, target detection: detecting a target area of the left camera image, and determining pixel coordinates of a target detection frame;
step three, binocular image rectification: setting the gray value of the pixels outside the target detection frame of the left-eye image to 0, correcting the radial and tangential distortion generated during underwater imaging of the binocular camera, performing Bouguet epipolar rectification on the binocular images so that image points have the same height in the left and right images, and setting the gray value of the pixels of a specific region of the right-eye image to 0;
step four, stereo matching: extracting characteristic points of the corrected binocular image, describing the characteristic points, matching the characteristic points, and removing mismatching to obtain an accurate matching point pair of the target;
step five, calculating three-dimensional coordinates: and calculating the three-dimensional coordinates of the target in the left camera coordinate system according to the similar triangle principle.
The invention also includes the following structural features:
1. Step one, the binocular calibration, specifically comprises immersing the calibration board and the cameras in water at the same time. To reduce calibration error, the distance between the calibration board and the camera is adjusted according to the proportion of the board pattern in the camera's field of view, the board occupying 1/3 to 3/4 of the whole field; the number of calibration-board pictures in different poses is kept within 25-30, covering the camera's whole field of view; and the angle between the calibration board and the imaging plane is increased step by step at intervals of about 10° up to 50°. Monocular calibration and binocular stereo calibration are performed with Zhang Zhengyou's calibration method to obtain each monocular camera's extrinsic parameter matrix, intrinsic pixel focal length f and principal point coordinates (u_0, v_0), and the rotation matrix R and translation matrix T between the binocular cameras.
2. Step two, the target detection, specifically comprises training a Fast RCNN algorithm for target detection on the left camera image; to reduce the number of invalid feature points, save time, and position the target accurately, the gray value of the pixels outside the detection frame is set to zero.
3. The third step specifically comprises the following steps:
step 1, setting the gray value of a pixel outside a target detection frame of a left eye image as 0;
step 2, correcting the radial and tangential distortion generated by the binocular camera during underwater imaging: camera calibration adopts a pinhole imaging model; an ideal pinhole model is linear, but lens manufacturing tolerances, assembly deviations, and the multiple refractions across water, glass and air introduce lens distortion, so internal parameters are added to turn the linear model into a nonlinear one; lens distortion is divided into two types, radial distortion and tangential distortion, expressed as:

x_0 = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
y_0 = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
x_0 = x + [2 p_1 x y + p_2 (r^2 + 2 x^2)]
y_0 = y + [p_1 (r^2 + 2 y^2) + 2 p_2 x y]
r^2 = x^2 + y^2

where (x, y) are the ideal normalized coordinates, (x_0, y_0) the distorted coordinates, k_1, k_2, k_3 the radial distortion coefficients, and p_1, p_2 the tangential distortion coefficients;
converting the source-image pixel coordinate system into the camera coordinate system through the intrinsic matrix, correcting the camera coordinates of the image with the distortion coefficients, converting the corrected camera coordinate system back into the image pixel coordinate system through the intrinsic matrix, and assigning the new image coordinates the pixel values of the source image coordinates;
step 3, performing Bouguet epipolar rectification on the binocular images so that image points have the same height in the left and right images: the goal of Bouguet epipolar rectification is to make the optical axes of the two cameras parallel and the image points on the left and right images row-aligned, so that during stereo matching of feature points the matching point only needs to be searched on the same row of the left and right image planes, which greatly improves efficiency; the Bouguet method decomposes the rotation matrix R of the right image plane relative to the left image plane into two matrices R_l and R_r, rotating each camera by half so that the optical axes of the left and right cameras become parallel, the principle of the decomposition being that the distortion caused by reprojection of the left and right images is minimized while the common area of the left and right views is maximized; a transformation matrix R_rect is constructed so that the baseline is parallel to the imaging plane, i.e., row-aligned; multiplying the half rotations by the transformation matrix gives the overall rotation matrices R_l' and R_r' of the left and right cameras, and multiplying the left and right camera coordinate systems by their respective overall rotation matrices makes the principal optical axes of the two cameras parallel and the image plane parallel to the baseline; the matrices are expressed as:

R_l = R^{1/2}
R_r = R^{-1/2}
e_1 = T / ||T||
e_2 = [-T_y, T_x, 0]^T / sqrt(T_x^2 + T_y^2)
e_3 = e_1 × e_2
R_rect = [e_1^T; e_2^T; e_3^T]
R_l' = R_rect · R_l
R_r' = R_rect · R_r

where T = [T_x, T_y, T_z]^T is the translation vector between the two cameras;
step 4, after binocular image rectification, computing the ordinate range and abscissa range of the non-zero pixels of the left image and setting the pixel values outside the corresponding region of the right image to zero, where the minimum abscissa of the right-image non-zero region is the minimum abscissa of the left image minus the maximum disparity.
4. The fourth step specifically comprises the following steps:
step 1, extracting feature points: when the robot works underwater the camera may be knocked by external forces, changing the image scale and orientation, so a Gaussian scale pyramid is built to give the extracted feature points scale invariance; the image is first iteratively convolved with a Gaussian kernel and repeatedly down-sampled; the scale-space pyramid consists of several alternating Octave and Intra-octave layers, the Octave layers being obtained from the original image by 2^{i/2}-fold down-sampling (i even) and the Intra-octave layers by 1.5·2^{(i-1)/2}-fold down-sampling (i odd), where i is the layer index, i ∈ [1, 8];
because underwater illumination is uneven and noise is strong, when FAST feature points are extracted on each pyramid layer an adaptive threshold is used instead of a fixed threshold to improve the robustness of the algorithm, and non-maximum suppression is applied to avoid clusters of feature points, reducing the number of invalid feature points and the running time of the algorithm; the FAST score value of each candidate feature point is compared with those of the feature points in its surrounding neighborhood and at the corresponding positions in the layers above and below, and candidates that are not extrema are rejected; the adaptive threshold formula and the score formula are given as equation images in the original patent;
the proportionality coefficient a is taken as 0.15-0.30 and n as 10; unstable edge points are removed using the Hessian matrix; the main direction of each feature point is computed with the gray centroid method so that the extracted feature points have rotation invariance, yielding stable and robust feature points;
step 2, feature point description: the neighborhood of the feature point is rotated to its main direction, and the descriptor is built within a 48×48 neighborhood window W centered on the feature point; the working environment of the underwater robot is special, and uneven illumination, scattering by water particles, random noise and other factors all affect the feature point descriptor; the triplet descriptor LATCH compares pixel blocks instead of single pixel values, and pixel blocks carry more spatially supported visual information and are insensitive to noise, so the descriptor has good accuracy and distinctiveness; three 7×7 pixel blocks P_{t,a}, P_{t,1}, P_{t,2} are selected, and the value of each descriptor bit is determined by comparing the squared Frobenius norms of the differences between the two pixel blocks P_{t,1}, P_{t,2} and the anchor pixel block P_{t,a}:

b_t = 1 if ||P_{t,a} - P_{t,1}||^2 > ||P_{t,a} - P_{t,2}||^2, and 0 otherwise;
a set of 56k candidate triplet arrangements is formed by randomly selecting pixel coordinates, and the quality of each arrangement is defined by unsupervised learning; a candidate arrangement is rejected if its absolute correlation with all previously selected arrangements is greater than a threshold τ, set to 0.2; finally, a 256-bit binary descriptor is established;
step 3, stereo matching: after the Bouguet epipolar rectification, the matching points of the left and right image planes only need to be searched on the same row; in the binocular image rectification step, the gray values of the pixels outside the regions of interest extracted from the left and right images were set to zero; the maximum disparity value is computed from the working height of the binocular vision positioning system; when searching the right image for the match of a left-image feature point, an exhaustive search computes the Hamming distance to the feature points lying within the maximum disparity range to the left of the feature point's pixel abscissa, and the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance is compared with a set threshold of 0.55; if the ratio is below the threshold, the point is taken as the matching point, completing the coarse matching of feature points;
step 4, removing mismatching: mismatch is removed using the PROSAC algorithm.
5. The PROSAC algorithm proceeds as follows: the matching quality is expressed by the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance of a matching pair; sorting the ratios orders the matches from good to poor matching quality; 4 groups of matching point pairs with good matching quality are extracted to compute a homography matrix H, the remaining matching point pairs compute their corresponding projection points through H, and the error e between each projection point and its matching point is compared with an error threshold δ: if e < δ the pair is judged an inlier, otherwise an outlier; if the number of inliers t is greater than the set threshold T, the inlier count is updated to t, otherwise the iteration continues; the homography matrix and the new inliers are recomputed with the updated inliers, and if the iteration count I is smaller than I_m the homography matrix and the new inlier set are returned, otherwise no conforming model is found; the outliers finally computed are the mismatched point pairs and are removed.
6. Step five, the three-dimensional coordinate calculation specifically comprises the following steps:
step 1, after three-dimensional correction, optical axes of two cameras are parallel, left and right imaging planes are coplanar, epipolar lines are aligned, and an ideal binocular ranging model is met;
step 2, let the coordinates of the point to be measured in the left camera coordinate system be P(X_C, Y_C, Z_C), the pixel coordinates of its image points on the left and right cameras be P_l(u_l, v_l) and P_r(u_r, v_r), and the disparity value be d; then (X_C, Y_C, Z_C) is computed as:

d = u_l - u_r
Z_C = f · b / d
X_C = (u_l - u_0) · Z_C / f
Y_C = (v_l - v_0) · Z_C / f

where b is the baseline length between the two cameras;
step 3, after the three-dimensional coordinates of the target in the left camera frame are obtained, converting the target object's coordinates into the vehicle-body coordinate system, locally planning the robot's path so that it moves smoothly to the target, and closing the gripper to grab the target, realizing a fully autonomous process from the camera seeing the target to the gripper grabbing it.
7. The binocular camera mounting bracket makes a 60° angle with the horizontal plane, which enlarges the camera's field of view so that the target can be found quickly; when the gripper grabs the target, the arm extends forward so that the gripper appears in the camera's field of view to complete the autonomous grabbing action.
Compared with the prior art, the invention has the following beneficial effects. The invention provides an improved feature matching algorithm, a scale-aware FAST detector combined with the LATCH descriptor, which extracts robust, stable feature points with scale invariance while suppressing noise interference, and effectively narrows the robustness gap to histogram-based descriptors; it can meet the dual requirements on running speed and on accuracy and robustness of binocular vision positioning during underwater robot target grabbing. The invention places empirical constraints on the distance, angles and number of photos for underwater offline calibration, which greatly reduces the random error introduced by the calibration tool, improves calibration efficiency, and is highly practicable. By combining target detection with target positioning, the three-dimensional coordinates of the target region of interest can be computed accurately and effectively, guiding the underwater robot to grab the target precisely. To meet the real-time requirement of target grabbing, the operation efficiency is improved in two respects, the feature matching region and the matching algorithm. The gray values of the pixels outside the left-image target detection frame are set to zero; after stereo rectification the ordinate range and abscissa extrema of the left-image target region are computed and the gray values outside the corresponding right-image region are set to zero, reducing the feature matching area. After FAST extraction, unstable points and local non-maximum points are removed, reducing the time spent describing and matching feature points, and during exhaustive matching the matching points of the left and right image planes only need to be searched on the same row within the maximum disparity range, greatly improving matching efficiency. To meet the accuracy requirement of target grabbing, the original algorithm is improved for robustness: a scale pyramid is constructed to provide scale invariance, and the gray centroid method handles rotational deformation; using the LATCH descriptor converts the comparison of pixel point pairs into a comparison of norms of triplet image blocks, keeping the low time cost of a binary descriptor while strengthening its robustness to illumination change, viewpoint change and noise.
Drawings
Fig. 1 is a schematic flow chart of the target positioning method based on binocular stereo vision;
FIG. 2 is an original image of an underwater target captured by left and right cameras;
FIG. 3 is a left and right image after binocular image rectification;
FIG. 4 is a flow chart of an algorithm for feature point extraction description;
FIG. 5 is a flow chart of the PROSAC algorithm for removing mismatches;
fig. 6 is left and right images after binocular image feature point matching.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
This embodiment provides a binocular vision positioning method for underwater robot target grabbing in which a scale pyramid is constructed to extract FAST feature points, LATCH binary descriptors are formed for feature point matching, and mismatches are removed with the PROSAC algorithm to obtain an accurate disparity value, solving the problem that accuracy and speed cannot be achieved at the same time in the prior art. As shown in fig. 1, the overall implementation flow of the invention is as follows:
binocular calibration: and carrying out underwater offline calibration on the binocular cameras according to a pinhole imaging principle, and calculating an internal reference matrix, an external reference matrix, a distortion coefficient and a rotation and translation matrix between the left camera and the right camera.
Target detection: targets such as sea cucumbers, sea urchins and scallops are detected online in the left camera image, and the pixel coordinates of the target detection frame are determined.
Binocular image rectification: the gray value of the pixels outside the target detection frame of the left-eye image is set to 0, the radial and tangential distortion generated during underwater imaging of the binocular camera is corrected, and Bouguet epipolar rectification is performed on the binocular images so that image points have the same height in the left and right images. The gray value of the pixels in a specific region of the right-eye image is set to 0.
Stereo matching: feature points are extracted from the rectified binocular images, described, and matched, and mismatches are removed to obtain accurate matching point pairs for the target.

Three-dimensional coordinate calculation: the three-dimensional coordinates of the target in the left camera coordinate system are calculated according to the similar-triangle principle.
Further, in the target detection step: a Fast RCNN algorithm is trained for marine-organism target detection on the left camera image. To reduce the number of invalid feature points, save time and cost, and position the target accurately, the gray value of the pixels outside the detection frame is set to zero, as sketched below.
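The masking itself is a simple array operation. A minimal sketch in Python/NumPy (illustrative only; the detection frame is assumed to arrive from the detector as pixel coordinates (x1, y1, x2, y2)):

```python
import numpy as np

def mask_outside_box(gray, box):
    """Keep only the pixels inside the detection frame; zero everything else."""
    x1, y1, x2, y2 = box          # assumed detector output, pixel coordinates
    out = np.zeros_like(gray)
    out[y1:y2, x1:x2] = gray[y1:y2, x1:x2]
    return out
```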
Further, in the binocular calibration step: the calibration board and the cameras are immersed in water at the same time. To reduce calibration error, the distance between the calibration board and the camera is adjusted according to the proportion of the board pattern in the camera's field of view, the board occupying 1/3 to 3/4 of the whole field; the number of calibration-board pictures in different poses is kept within 25-30, covering the camera's whole field of view; and the angle between the calibration board and the imaging plane is increased step by step at intervals of about 10° up to 50°. Monocular calibration and binocular stereo calibration are performed with Zhang Zhengyou's calibration method to obtain each monocular camera's intrinsic pixel focal length f and principal point coordinates (u_0, v_0), and the rotation matrix R and translation matrix T between the binocular cameras.
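A minimal sketch of this calibration step, assuming OpenCV's implementation of Zhang's method; the checkerboard geometry and the list of underwater image pairs (image_pairs) are assumptions, not part of the patent:

```python
import cv2
import numpy as np

pattern = (9, 6)     # inner-corner count of the checkerboard (assumed)
square = 0.025       # checker square size in metres (assumed)

objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, pts_l, pts_r = [], [], []
for img_l, img_r in image_pairs:          # 25-30 underwater pairs, varied poses
    ok_l, c_l = cv2.findChessboardCorners(img_l, pattern)
    ok_r, c_r = cv2.findChessboardCorners(img_r, pattern)
    if ok_l and ok_r:
        obj_pts.append(objp)
        pts_l.append(c_l)
        pts_r.append(c_r)

size = image_pairs[0][0].shape[1], image_pairs[0][0].shape[0]

# Zhang's method: monocular calibration of each camera first ...
_, K_l, D_l, _, _ = cv2.calibrateCamera(obj_pts, pts_l, size, None, None)
_, K_r, D_r, _, _ = cv2.calibrateCamera(obj_pts, pts_r, size, None, None)

# ... then stereo calibration for the rotation R and translation T between them
_, K_l, D_l, K_r, D_r, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, pts_l, pts_r, K_l, D_l, K_r, D_r, size,
    flags=cv2.CALIB_FIX_INTRINSIC)
```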
Further, in the binocular image correction step: after binocular image correction, the vertical coordinate range and the horizontal coordinate range of the non-zero pixels of the left image are calculated, the pixel value outside the corresponding area of the right image is set to be zero, and the minimum value of the non-zero pixels of the horizontal coordinate of the right image is the difference value between the minimum value of the horizontal coordinate of the left image and the maximum parallax.
Camera calibration generally adopts a pinhole imaging model. An ideal pinhole model is linear, but lens manufacturing tolerances, assembly deviations, and the multiple refractions across water, glass and air introduce lens distortion, so some internal parameters are usually added to turn the linear model into a nonlinear one. Lens distortion is divided into two types, radial distortion and tangential distortion, as follows:

x_0 = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
y_0 = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
x_0 = x + [2 p_1 x y + p_2 (r^2 + 2 x^2)]
y_0 = y + [p_1 (r^2 + 2 y^2) + 2 p_2 x y]
r^2 = x^2 + y^2

where (x, y) are the ideal normalized coordinates, (x_0, y_0) the distorted coordinates, k_1, k_2, k_3 the radial distortion coefficients, and p_1, p_2 the tangential distortion coefficients.
and converting a source image pixel coordinate system into a camera coordinate system through an internal reference matrix, correcting the camera coordinate of the image through a distortion coefficient, converting the camera coordinate system into an image pixel coordinate system through the internal reference matrix after correction, and assigning a new image coordinate according to the pixel value of the source image coordinate.
The goal of Bouguet epipolar rectification is to make the optical axes of the two cameras parallel and the image points row-aligned on the left and right images. During stereo matching of feature points, the matching point then only needs to be searched on the same row of the two image planes, which greatly improves efficiency.

The Bouguet method decomposes the rotation matrix R of the right image plane relative to the left image plane into two matrices R_l and R_r, rotating each camera by half so that the optical axes of the left and right cameras become parallel. The principle of the decomposition is to minimize the distortion caused by reprojection of the left and right images while maximizing the common area of the left and right views. A transformation matrix R_rect is constructed so that the baseline is parallel to the imaging plane (row alignment). Multiplying the half rotations by the transformation matrix gives the overall rotation matrices R_l' and R_r' of the left and right cameras. The left and right camera coordinate systems are multiplied by their respective overall rotation matrices so that the principal optical axes of the two cameras are parallel and the image plane is parallel to the baseline. The matrices are as follows:

R_l = R^{1/2}
R_r = R^{-1/2}
e_1 = T / ||T||
e_2 = [-T_y, T_x, 0]^T / sqrt(T_x^2 + T_y^2)
e_3 = e_1 × e_2
R_rect = [e_1^T; e_2^T; e_3^T]
R_l' = R_rect · R_l
R_r' = R_rect · R_r

where T = [T_x, T_y, T_z]^T is the translation vector between the two cameras.
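In practice this rectification is available off the shelf. A sketch using OpenCV's stereoRectify, which implements the Bouguet construction above (variable names carried over from the calibration sketch):

```python
import cv2

# K_l, D_l, K_r, D_r, R, T, size come from the stereo calibration step
R_l, R_r, P_l, P_r, Q, roi_l, roi_r = cv2.stereoRectify(
    K_l, D_l, K_r, D_r, size, R, T, alpha=0)

map_lx, map_ly = cv2.initUndistortRectifyMap(K_l, D_l, R_l, P_l, size, cv2.CV_32FC1)
map_rx, map_ry = cv2.initUndistortRectifyMap(K_r, D_r, R_r, P_r, size, cv2.CV_32FC1)

rect_l = cv2.remap(img_l, map_lx, map_ly, cv2.INTER_LINEAR)  # row-aligned left
rect_r = cv2.remap(img_r, map_rx, map_ry, cv2.INTER_LINEAR)  # row-aligned right
```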
further, after the target detection frame position is determined by the left image, the gray value of the pixels outside the detection frame area is set to be zero. When the robot captures marine creatures underwater, a plurality of targets may exist in the visual field of the camera, so that target detection cannot be simply performed on the right image, after binocular image correction, the vertical coordinate range and the horizontal coordinate range of non-zero pixels of the left image are calculated, the pixel value is set to be zero outside the corresponding region of the right image, and it is worth noting that the minimum value of the non-zero pixels of the horizontal coordinate of the right image is the difference value between the minimum value of the horizontal coordinate of the left image and the maximum parallax. As shown in fig. 3.
Further, in the stereo matching step: a scale pyramid is built, FAST feature points are extracted on each pyramid layer, non-maximum suppression is applied, and unstable feature points are removed. The triplet descriptor LATCH compares the relationships among three pixel blocks to build a binary descriptor. Matching point pairs are established by exhaustive search based on Hamming distance. Mismatches are removed with the PROSAC algorithm.
Further, the steps of feature point extraction and feature point description are shown in fig. 4. The stereo matching comprises four steps:
extracting characteristic points: and establishing a Gaussian scale pyramid for the extracted feature points to have scale invariance, extracting fast feature points on each pyramid layer, comparing the fast score values of the candidate feature points with the feature points of the surrounding field and the feature points of the corresponding positions of the upper layer and the lower layer, and if the candidate feature points are not extreme values, rejecting the candidate feature points. And removing unstable edge points by using a hessian matrix, and calculating the main direction of the feature points by using a gray centroid method to obtain stable and robust feature points with scale invariance and rotation invariance.
Description of characteristic points: rotating the neighborhood of the feature point to the main direction and selecting 24 sigmai×24σiSample region build descriptor using three-tuple descriptor LATCH to select three 3 sigmai×3σiAnd the large pixel block compares the square sum of the two pixel blocks and the norm of the main pixel block to establish a LATCH binary descriptor.
In the feature point extraction step: when the robot works underwater the camera may be knocked by external forces, changing the image scale and orientation, so a Gaussian scale pyramid is built to give the extracted feature points scale invariance. The image is first iteratively convolved with a Gaussian kernel and repeatedly down-sampled. The scale-space pyramid consists of several alternating Octave and Intra-octave layers, the Octave layers being obtained from the original image by 2^{i/2}-fold down-sampling (i even) and the Intra-octave layers by 1.5·2^{(i-1)/2}-fold down-sampling (i odd), where i is the layer index, i ∈ [1, 8].
Because underwater illumination is uneven and noise is strong, FAST feature points are extracted on each pyramid layer; since images collected in different environments differ in illumination and noise intensity, an adaptive threshold is used instead of a fixed threshold to improve the robustness of the algorithm. Non-maximum suppression is applied to avoid clusters of feature points, reducing the number of invalid feature points and the running time of the algorithm. The FAST score value of each candidate feature point is compared with those of the feature points in its surrounding neighborhood and at the corresponding positions in the layers above and below, and candidates that are not extrema are rejected. The adaptive threshold formula and the score formula are given as equation images in the original patent.
the proportionality coefficient a is generally 0.15-0.30, and n is generally 10. And removing unstable edge points by using a hessian matrix, and calculating the main direction of the feature points by using a gray centroid method to obtain stable and robust feature points with scale invariance and rotation invariance.
Describing feature points: the feature point neighborhood is rotated to the main direction and the descriptor is built within a 48×48 neighborhood window W centered on the feature point. The working environment of the underwater robot is special, and uneven illumination, scattering by water particles, random noise and other factors all affect the feature point descriptor. Conventional binary descriptors such as BRIEF, ORB and BRISK select a number of pixel point pairs in the feature point neighborhood to build the descriptor; to address the sensitivity of single pixels to noise and other changes, researchers have proposed pre-smoothing the image, with the side effect of losing high-frequency information, especially in the feature point region. The triplet descriptor LATCH instead compares pixel blocks, which carry more spatially supported visual information and are less sensitive to noise than individual pixel values. Three 7×7 pixel blocks P_{t,a}, P_{t,1}, P_{t,2} are selected, and the value of each descriptor bit is determined by comparing the squared Frobenius norms of the differences between the two pixel blocks P_{t,1}, P_{t,2} and the anchor pixel block P_{t,a}:

b_t = 1 if ||P_{t,a} - P_{t,1}||^2 > ||P_{t,a} - P_{t,2}||^2, and 0 otherwise.
The 56k candidate triplet arrangements are formed by randomly selecting pixel coordinates, and the quality of each arrangement is defined by unsupervised learning. A candidate arrangement is rejected if its absolute correlation with all previously selected arrangements is greater than a threshold τ, set to 0.2. Finally, a 256-bit binary descriptor is created.
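A sketch of a single triplet comparison (the learned layout of the 256 selected triplets is not reproduced here; opencv-contrib also provides a LATCH implementation, cv2.xfeatures2d.LATCH_create):

```python
import numpy as np

def latch_bit(window, anchor, p1, p2, k=7):
    """One LATCH bit: compare an anchor 7x7 block against two other blocks."""
    h = k // 2

    def block(center):                 # centers are (row, col) in the window
        y, x = center
        return window[y - h:y + h + 1, x - h:x + h + 1].astype(np.float32)

    a, b1, b2 = block(anchor), block(p1), block(p2)
    # bit = 1 when the anchor block differs more from block 1 than from block 2
    return 1 if np.sum((a - b1) ** 2) > np.sum((a - b2) ** 2) else 0
```

Here window is the 48×48 patch rotated to the keypoint's main direction, and the three block centers come from the learned triplet arrangement.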
Stereo matching: after the Bouguet epipolar rectification, the matching points of the left and right image planes only need to be searched on the same row. In the binocular image rectification step, the gray value of the pixels outside the regions of interest extracted from the left and right images was set to zero. The maximum disparity value is computed from the working height of the binocular vision positioning system. When searching the right image for the match of a left-image feature point, an exhaustive search computes the Hamming distance to the feature points lying within the maximum disparity range to the left of the feature point's pixel abscissa; the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance is compared with a set threshold of 0.55, and if the ratio is below the threshold the point is taken as the matching point. This completes the coarse matching of feature points.
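A sketch of the row-constrained exhaustive search with the 0.55 ratio test; the descriptors are assumed packed as uint8 bit strings, one row per keypoint:

```python
import numpy as np

def match_row(kps_l, desc_l, kps_r, desc_r, d_max, ratio=0.55):
    """Row-constrained exhaustive matching with a nearest/second-nearest test."""
    matches = []
    for i, kl in enumerate(kps_l):
        u, v = kl.pt
        # candidates: same row after rectification, within max disparity, left of u
        cand = [j for j, kr in enumerate(kps_r)
                if abs(kr.pt[1] - v) <= 1.0 and 0.0 <= u - kr.pt[0] <= d_max]
        if len(cand) < 2:
            continue
        # Hamming distance on packed uint8 descriptors: XOR, then count set bits
        dist = [int(np.unpackbits(desc_l[i] ^ desc_r[j]).sum()) for j in cand]
        order = np.argsort(dist)
        best, second = dist[order[0]], dist[order[1]]
        if second > 0 and best / second < ratio:
            matches.append((i, cand[order[0]], best / second))
    return matches
```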
Removing mismatches: mismatches are removed with the PROSAC algorithm, whose flow is shown in fig. 5. The matching quality of each coarse matching pair is computed and expressed by the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance. Sorting by this ratio orders the matches from good to poor matching quality; 4 groups of matching point pairs with relatively good matching quality are extracted to compute a homography matrix H, the remaining matching point pairs compute their corresponding projection points through H, and the error e between each projection point and its matching point is compared with an error threshold δ: if e < δ the pair is judged an inlier, otherwise an outlier. If the number of inliers t is greater than the set threshold T, the inlier count is updated to t; otherwise the iteration continues, the homography matrix and the new inliers are recomputed with the updated inliers, and if the iteration count I is smaller than I_m the homography matrix and the new inlier set are returned, otherwise no conforming model is found. The outliers finally computed are the mismatched point pairs and are removed.
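A sketch of this rejection step. Rather than re-implementing PROSAC, it relies on OpenCV's USAC framework, which from OpenCV 4.5 onward exposes a PROSAC sampler via the cv2.USAC_PROSAC flag (an assumption about the available API, not part of the patent); PROSAC samples the best-ranked correspondences first, so the matches are pre-sorted by the quality ratio returned by the matching sketch:

```python
import cv2
import numpy as np

# sort coarse matches by quality: ascending nearest/second-nearest ratio,
# i.e. the most distinctive matches first
matches_sorted = sorted(matches, key=lambda m: m[2])
src = np.float32([kps_l[i].pt for i, _, _ in matches_sorted])
dst = np.float32([kps_r[j].pt for _, j, _ in matches_sorted])

H, inlier_mask = cv2.findHomography(src, dst, cv2.USAC_PROSAC,
                                    ransacReprojThreshold=3.0)
good = [m for m, keep in zip(matches_sorted, inlier_mask.ravel()) if keep]
```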
Further, in the three-dimensional coordinate calculation step: after stereo rectification, the optical axes of the two cameras are parallel, the left and right imaging planes are coplanar, and the epipolar lines are row-aligned, matching the ideal binocular ranging model. Let the coordinates of the point to be measured in the left camera coordinate system be P(X_C, Y_C, Z_C) and the pixel coordinates of its image points on the left and right cameras be P_l(u_l, v_l) and P_r(u_r, v_r); with disparity value d, (X_C, Y_C, Z_C) is computed as:

d = u_l - u_r
Z_C = f · b / d
X_C = (u_l - u_0) · Z_C / f
Y_C = (v_l - v_0) · Z_C / f

where b is the baseline length between the two cameras.
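A sketch of these formulas; f, u_0, v_0 are the left camera intrinsics and b, the baseline, is taken as the norm of the translation vector T:

```python
import numpy as np

def locate(u_l, v_l, u_r, f, u0, v0, b):
    """Disparity-to-3D for a row-aligned rectified pair (left camera frame)."""
    d = u_l - u_r              # disparity in pixels
    Z = f * b / d              # depth along the left camera's optical axis
    X = (u_l - u0) * Z / f
    Y = (v_l - v0) * Z / f
    return np.array([X, Y, Z])
```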
After the three-dimensional coordinates of the target in the left camera frame are obtained, the target object's coordinates are converted into the vehicle-body coordinate system, the robot's path is locally planned so that it moves smoothly to the target, and the gripper closes to grab the target, realizing a fully autonomous process from the camera seeing the target to the gripper grabbing it.
Furthermore, in the binocular vision positioning method for underwater robot target grabbing, the binocular mounting bracket makes a 60° angle with the horizontal plane, enlarging the camera's field of view so that the target can be found quickly. When the gripper grabs the target, the arm extends forward so that the gripper appears in the camera's field of view to complete the autonomous grabbing action.
In conclusion, the invention provides a binocular vision positioning method for underwater robot target grabbing, belonging to the field of computer vision. The method is mainly used to accurately acquire the three-dimensional information of the target to be grabbed while the underwater robot is working. The method comprises the following steps. Binocular calibration: calculating the internal and external parameters of the left and right cameras. Target detection: locating the target object detection frame. Binocular image rectification: distortion correction and stereo rectification, and determination of the right-image target region. Stereo matching of the binocular images: extracting image feature points, describing the feature points, performing stereo matching, and removing mismatches. Finally, the three-dimensional information of the target in the image is calculated in the left camera coordinate system. Feature points are extracted, non-maximum suppression removes unstable feature points, a binary descriptor is constructed, feature points are matched, and mismatches are removed to obtain an accurate disparity value. This scheme improves the robustness of binocular stereo matching while accurately obtaining the three-dimensional information of the detected target, meeting the real-time positioning requirement for the target during underwater robot target grabbing.

Claims (8)

1. A binocular vision positioning method for underwater robot target grabbing is characterized by comprising the following steps:
step one, binocular calibration: performing underwater offline calibration of the binocular cameras according to the pinhole imaging principle, and calculating the intrinsic parameter matrices, extrinsic parameter matrices and distortion coefficients of the left and right cameras, and the rotation and translation matrices between them;
step two, target detection: detecting a target area of the left camera image, and determining pixel coordinates of a target detection frame;
step three, binocular image rectification: setting the gray value of the pixels outside the target detection frame of the left-eye image to 0, correcting the radial and tangential distortion generated during underwater imaging of the binocular camera, performing Bouguet epipolar rectification on the binocular images so that image points have the same height in the left and right images, and setting the gray value of the pixels of a specific region of the right-eye image to 0;
step four, stereo matching: extracting characteristic points of the corrected binocular image, describing the characteristic points, matching the characteristic points, and removing mismatching to obtain an accurate matching point pair of the target;
step five, calculating three-dimensional coordinates: and calculating the three-dimensional coordinates of the target in the left camera coordinate system according to the similar triangle principle.
2. The binocular vision positioning method for underwater robot target grabbing according to claim 1, characterized in that: in step one, the binocular calibration specifically comprises immersing the calibration board and the cameras in water at the same time; to reduce calibration error, the distance between the calibration board and the camera is adjusted according to the proportion of the board pattern in the camera's field of view, the board occupying 1/3 to 3/4 of the whole field; the number of calibration-board pictures in different poses is kept within 25-30, covering the camera's whole field of view; the angle between the calibration board and the imaging plane is increased step by step at intervals of about 10° up to 50°; and monocular calibration and binocular stereo calibration are performed with Zhang Zhengyou's calibration method to obtain each monocular camera's extrinsic parameter matrix, intrinsic pixel focal length f and principal point coordinates (u_0, v_0), and the rotation matrix R and translation matrix T between the binocular cameras.
3. The binocular vision positioning method for underwater robot target grabbing according to claim 2, characterized in that: in step two, the target detection specifically comprises training a Fast RCNN algorithm for target detection on the left camera image; to reduce the number of invalid feature points, save time, and position the target accurately, the gray value of the pixels outside the detection frame is set to zero.
4. The binocular vision positioning method for underwater robot target grabbing according to claim 3, wherein the third step specifically comprises the following steps:
step 1, setting the gray value of a pixel outside a target detection frame of a left eye image as 0;
step 2, correcting the radial and tangential distortion generated by the binocular camera during underwater imaging: camera calibration adopts a pinhole imaging model; an ideal pinhole model is linear, but lens manufacturing tolerances, assembly deviations, and the multiple refractions across water, glass and air introduce lens distortion, so internal parameters are added to turn the linear model into a nonlinear one; lens distortion is divided into two types, radial distortion and tangential distortion, expressed as:

x_0 = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
y_0 = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
x_0 = x + [2 p_1 x y + p_2 (r^2 + 2 x^2)]
y_0 = y + [p_1 (r^2 + 2 y^2) + 2 p_2 x y]
r^2 = x^2 + y^2

where (x, y) are the ideal normalized coordinates, (x_0, y_0) the distorted coordinates, k_1, k_2, k_3 the radial distortion coefficients, and p_1, p_2 the tangential distortion coefficients;
converting the source-image pixel coordinate system into the camera coordinate system through the intrinsic matrix, correcting the camera coordinates of the image with the distortion coefficients, converting the corrected camera coordinate system back into the image pixel coordinate system through the intrinsic matrix, and assigning the new image coordinates the pixel values of the source image coordinates;
step 3, performing Bouguet epipolar rectification on the binocular images so that image points have the same height in the left and right images: the goal of Bouguet epipolar rectification is to make the optical axes of the two cameras parallel and the image points on the left and right images row-aligned, so that during stereo matching of feature points the matching point only needs to be searched on the same row of the left and right image planes, which greatly improves efficiency; the Bouguet method decomposes the rotation matrix R of the right image plane relative to the left image plane into two matrices R_l and R_r, rotating each camera by half so that the optical axes of the left and right cameras become parallel, the principle of the decomposition being that the distortion caused by reprojection of the left and right images is minimized while the common area of the left and right views is maximized; a transformation matrix R_rect is constructed so that the baseline is parallel to the imaging plane, i.e., row-aligned; multiplying the half rotations by the transformation matrix gives the overall rotation matrices R_l' and R_r' of the left and right cameras, and multiplying the left and right camera coordinate systems by their respective overall rotation matrices makes the principal optical axes of the two cameras parallel and the image plane parallel to the baseline; the matrices are expressed as:

R_l = R^{1/2}
R_r = R^{-1/2}
e_1 = T / ||T||
e_2 = [-T_y, T_x, 0]^T / sqrt(T_x^2 + T_y^2)
e_3 = e_1 × e_2
R_rect = [e_1^T; e_2^T; e_3^T]
R_l' = R_rect · R_l
R_r' = R_rect · R_r

where T = [T_x, T_y, T_z]^T is the translation vector between the two cameras;
step 4, after binocular image rectification, computing the ordinate range and abscissa range of the non-zero pixels of the left image and setting the pixel values outside the corresponding region of the right image to zero, where the minimum abscissa of the right-image non-zero region is the minimum abscissa of the left image minus the maximum disparity.
5. The binocular vision positioning method for underwater robot target grabbing according to claim 4, wherein the fourth step specifically comprises the following steps:
step 1, extracting feature points: when the robot works underwater the camera may be knocked by external forces, changing the image scale and orientation, so a Gaussian scale pyramid is built to give the extracted feature points scale invariance; the image is first iteratively convolved with a Gaussian kernel and repeatedly down-sampled; the scale-space pyramid consists of several alternating Octave and Intra-octave layers, the Octave layers being obtained from the original image by 2^{i/2}-fold down-sampling (i even) and the Intra-octave layers by 1.5·2^{(i-1)/2}-fold down-sampling (i odd), where i is the layer index, i ∈ [1, 8];
because underwater illumination is uneven and noise is strong, when FAST feature points are extracted on each pyramid layer an adaptive threshold is used instead of a fixed threshold to improve the robustness of the algorithm, and non-maximum suppression is applied to avoid clusters of feature points, reducing the number of invalid feature points and the running time of the algorithm; the FAST score value of each candidate feature point is compared with those of the feature points in its surrounding neighborhood and at the corresponding positions in the layers above and below, and candidates that are not extrema are rejected; the adaptive threshold formula and the score formula are given as equation images in the original patent;
the proportionality coefficient a is taken as 0.15-0.30 and n as 10; unstable edge points are removed using the Hessian matrix; the main direction of each feature point is computed with the gray centroid method so that the extracted feature points have rotation invariance, yielding stable and robust feature points;
step 2, feature point description: the neighborhood of the feature point is rotated to the main direction, and a descriptor is built in a 48 × 48 neighborhood window W centered on the feature point; the working environment of the underwater robot is special, and uneven illumination, scattering by water particles, random noise and other factors degrade feature point descriptors; the triplet descriptor LATCH compares pixel blocks rather than single pixel values, and pixel blocks carry more spatially supported visual information and are insensitive to noise, so the descriptor has good accuracy and distinguishability; three 7 × 7 pixel blocks $P_{t,a}$, $P_{t,1}$, $P_{t,2}$ are selected, and comparing the two companion blocks $P_{t,1}$, $P_{t,2}$ against the anchor block $P_{t,a}$ by the sum of squared norms of their differences determines the value of each bit of the descriptor;
$b_t = \begin{cases} 1, & \left\| P_{t,a} - P_{t,1} \right\|_F^2 > \left\| P_{t,a} - P_{t,2} \right\|_F^2 \\ 0, & \text{otherwise} \end{cases}$
a pool of 56K candidate triple arrangements is formed by randomly selecting pixel coordinates, and the quality of each arrangement is defined by unsupervised learning; a candidate arrangement is rejected if its absolute correlation with all previously selected arrangements is greater than a threshold τ, with τ set to 0.2; finally a 256-bit binary descriptor is established;
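A sketch of a single descriptor bit under this rule (offsets and 7 × 7 block size per the claim; image-border checks omitted for brevity):

```python
import numpy as np

def latch_bit(img, kp, off_a, off_1, off_2, s=7):
    """One LATCH bit: 1 if the anchor block is farther (in squared
    Frobenius norm) from companion block 1 than from companion block 2.
    off_* are (x, y) offsets from the rotated keypoint kp."""
    r = s // 2
    def block(off):
        x, y = int(kp[0] + off[0]), int(kp[1] + off[1])
        return img[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)
    Pa, P1, P2 = block(off_a), block(off_1), block(off_2)
    return int(np.sum((Pa - P1) ** 2) > np.sum((Pa - P2) ** 2))
```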
step 3, stereo matching: after Bouguet epipolar line correction, the matching point of a left-image feature point only needs to be searched on the same row of the right image plane; in the binocular image correction step, the pixel gray values outside the regions of interest extracted from the left and right images have already been set to zero; the maximum disparity value is calculated from the operating height of the binocular vision positioning; when searching the right image for the matching point of a left-image feature point, an exhaustive search is adopted: Hamming distances are computed to the right-image features lying within the maximum disparity range to the left of the feature point's pixel column, the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance is compared with a set threshold of 0.55, and if the ratio is smaller than the threshold the match point is accepted, completing the coarse matching of the feature points;
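A sketch of this coarse matching (assumptions: integer row coordinates after rectification, and binary descriptors stored as unpacked 0/1 arrays so the Hamming distance is a count of differing entries):

```python
import numpy as np

def coarse_match(left_kps, left_desc, right_kps, right_desc,
                 max_disp, ratio=0.55):
    """Exhaustive same-row search within the maximum disparity range,
    followed by the nearest / second-nearest ratio test."""
    matches = []
    for i, (ul, vl) in enumerate(left_kps):
        cands = []
        for j, (ur, vr) in enumerate(right_kps):
            if vr == vl and 0 <= ul - ur <= max_disp:
                d = int(np.count_nonzero(left_desc[i] != right_desc[j]))
                cands.append((d, j))
        if len(cands) >= 2:
            cands.sort()
            if cands[0][0] < ratio * cands[1][0]:
                matches.append((i, cands[0][1]))
    return matches
```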
step 4, removing mismatches: mismatched pairs are removed using the PROSAC algorithm.
6. The binocular vision positioning method for underwater robot target grabbing according to claim 5, wherein the PROSAC algorithm flow is as follows: matching quality is expressed by the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance of each matching pair; sorting the ratios orders the matches from good to poor quality; the 4 matching point pairs of best quality are taken to calculate a homography matrix H, and for the remaining matching pairs the corresponding projection points are calculated from H; the error e between each projection point and its matching point is compared with an error threshold δ: if e < δ the pair is an inlier, otherwise an outlier; if the inlier count t is larger than the set threshold T, T is updated to t, otherwise iteration continues; the homography matrix and the new inliers are recalculated using the updated inlier set, and if the iteration count I is smaller than the limit I_m the homography matrix and the new inlier set are returned, otherwise no conforming model is found; finally, the computed outliers are the mismatched point pairs and are removed.
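A simplified sketch of this flow (progressive sampling from quality-sorted matches; cv2.findHomography stands in for the 4-point homography estimation, and the threshold values here are illustrative, not the claim's):

```python
import numpy as np
import cv2

def prosac_filter(src_pts, dst_pts, quality, err_thresh=3.0, max_iter=500):
    """PROSAC-style mismatch removal: sample hypotheses from the
    best-quality matches first, keep the homography with most inliers.
    src_pts, dst_pts: (N, 2) float32 arrays; quality: nearest /
    second-nearest ratio per match (lower is better)."""
    order = np.argsort(quality)                # best matches first
    src, dst = src_pts[order], dst_pts[order]
    best_inliers = np.zeros(len(src), dtype=bool)
    for it in range(max_iter):
        n = min(4 + it, len(src))              # progressively larger pool
        idx = np.random.choice(n, size=4, replace=False)
        H, _ = cv2.findHomography(src[idx], dst[idx])
        if H is None:
            continue
        proj = cv2.perspectiveTransform(src.reshape(-1, 1, 2), H)
        err = np.linalg.norm(proj.reshape(-1, 2) - dst, axis=1)
        inliers = err < err_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers                        # False entries = mismatches
```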
7. The binocular vision positioning method for underwater robot target grabbing according to claim 6, wherein the step five three-dimensional coordinate calculation specifically comprises the following steps:
step 1, after stereo rectification, the optical axes of the two cameras are parallel, the left and right imaging planes are coplanar, and the epipolar lines are row-aligned, satisfying the ideal binocular ranging model;
step 2, let the coordinates of the point to be measured in the left camera coordinate system be $P(X_C, Y_C, Z_C)$, the pixel coordinates of its imaging points on the left and right cameras be $P_l(u_l, v_l)$ and $P_r(u_r, v_r)$, and the disparity value be d; then $(X_C, Y_C, Z_C)$ is calculated as:
$d = u_l - u_r$

$X_C = \dfrac{b (u_l - u_0)}{d}$

$Y_C = \dfrac{b (v_l - v_0)}{d}$

$Z_C = \dfrac{b f}{d}$

where b is the baseline length, f the focal length in pixels, and $(u_0, v_0)$ the principal point of the rectified left camera;
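These formulas map directly to code (a sketch under the ideal rectified model; b, f, and the principal point come from the stereo calibration):

```python
def triangulate(ul, vl, ur, u0, v0, f, b):
    """Left-camera 3D coordinates from a rectified stereo pair:
    f in pixels, baseline b in metres, (u0, v0) the principal point."""
    d = ul - ur                  # disparity (assumed > 0)
    Zc = f * b / d
    Xc = b * (ul - u0) / d
    Yc = b * (vl - v0) / d
    return Xc, Yc, Zc
```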
and 3, after the three-dimensional coordinates of the target in the left camera frame are obtained, the target coordinates are transformed into the hull coordinate system, the robot's path is locally planned so that the robot moves steadily to the target, and the gripper closes to grab the target, realizing a fully autonomous process from the camera sighting the target to the gripper grabbing it.
8. The binocular vision positioning method for underwater robot target grabbing according to any one of claims 1 to 7, wherein: the binocular camera fixing bracket forms a 60-degree included angle with the horizontal plane, and when the gripper grabs a target, the arm extends forward so that the gripper appears in the camera's field of view to complete the autonomous grabbing action.
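As a sketch, the camera-to-hull transform implied by such a mount might look like the following (the rotation axis, sign convention, and t_mount offset are all assumptions about the mechanical layout, not given in the claims):

```python
import numpy as np

def camera_to_hull(p_cam, pitch_deg=60.0, t_mount=np.zeros(3)):
    """Rotate a left-camera-frame point into the hull frame for a camera
    bracket pitched at pitch_deg to the horizontal (illustrative only)."""
    a = np.deg2rad(pitch_deg)
    # Rotation about the hull's lateral axis by the mount angle.
    R = np.array([[ np.cos(a), 0.0, np.sin(a)],
                  [ 0.0,       1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])
    return R @ np.asarray(p_cam) + t_mount
```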