CN114067197B - Pipeline defect identification and positioning method based on target detection and binocular vision

Info

Publication number: CN114067197B
Application number: CN202111360831.6A
Authority: CN (China)
Original language: Chinese (zh)
Other versions: CN114067197A
Prior art keywords: image, camera, pixel, parallax, binocular
Legal status: Active (application granted)
Inventors: 何洪权, 葛昱彤, 申安慧, 张琳, 朱一沁
Original and current assignee: Henan University
Application filed by Henan University; published as CN114067197A, granted as CN114067197B.

Classifications

    • G06T7/0004 - Image analysis; inspection of images, e.g. flaw detection; industrial image inspection
    • G06F18/22 - Pattern recognition; matching criteria, e.g. proximity measures
    • G06F18/24 - Pattern recognition; classification techniques
    • G06F18/253 - Pattern recognition; fusion techniques of extracted features
    • G06N3/045 - Neural networks; architecture; combinations of networks
    • G06N3/048 - Neural networks; architecture; activation functions
    • G06N3/08 - Neural networks; learning methods
    • G06T5/80
    • G06T7/55 - Image analysis; depth or shape recovery from multiple images
    • G06T7/73 - Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06T7/85 - Image analysis; stereo camera calibration
    • G06T2207/10004 - Image acquisition modality; still image; photographic image
    • G06T2207/10024 - Image acquisition modality; color image
    • G06T2207/10028 - Image acquisition modality; range image; depth image; 3D point clouds
    • G06T2207/30108 - Subject of image; industrial image inspection
    • G06T2207/30132 - Subject of image; masonry; concrete
    • G06T2207/30181 - Subject of image; earth observation
    • G06T2207/30184 - Subject of image; infrastructure

Abstract

The invention provides a pipeline defect identification and positioning method based on target detection and binocular vision, comprising the following steps: calibrating and stereo-matching a binocular camera to obtain images with relatively small distortion, and determining, through stereo matching and depth calculation, the specific functional relation between the position of a point in the world coordinate system and the position of its pixel in the image; training a target detection model with a target detection algorithm and recognizing video images; when a certain type of defect is recognized in an image, the model frames the defect target and displays the world coordinates of the center of the defect target on the screen. The method for detecting and positioning pipeline defects has the advantages of a wide monitoring range, good real-time performance, high accuracy and accurate positioning. Meanwhile, the method overcomes the limitation of traditional pipeline detection methods such as magnetic flux leakage detection, which can only inspect ferromagnetic pipelines; it produces no false signals and improves the situations of false alarms and missed alarms to a certain extent.

Description

Pipeline defect identification and positioning method based on target detection and binocular vision
Technical Field
The invention relates to the technical field of target detection and multi-view vision positioning in computer vision, in particular to a pipeline defect identification and positioning method based on target detection and binocular vision.
Background
Target detection adopts a single neural network to directly predict object boundaries and class probabilities, realizing end-to-end detection. Binocular vision is an important distance perception technology among passive computer ranging methods: on the premise of non-contact measurement, after the cameras are calibrated, stereo matching and other operations are performed, the parallax between the left and right images is calculated and a depth map is obtained, thereby reflecting the coordinate position of an object in the real world.
YOLOX is the latest result of the YOLO algorithm family; compared with the YOLOv3-v5 series, it improves recognition precision and holds a certain competitive advantage in speed. Integrating the advantages of each version, it improves target detection precision through an improved data enhancement strategy and more training samples, and adds an anchor-free detector to improve detection speed. Substituting a model trained under the YOLOX network framework into the recognition network enables multi-target, multi-class detection, with a wide range of detectable classes, high accuracy and high speed.
However, the mainstream methods for detecting pipeline defects, namely photoelectric detection methods, include endoscopy, laser projection, CCD camera image acquisition, and the like. Although these methods adopt detection technology from the field of image vision, their detection precision is low, further measurement of defect size is not involved, considerable manpower is consumed, data of the inner surface ring of the pipeline cannot be extracted, and the defect condition of the inner pipeline surface cannot be further characterized visually. The limitations of computer vision methods in the field of pipeline defect classification are mainly low detection accuracy and the lack of data sets.
Disclosure of Invention
The invention provides a pipeline defect identification and positioning method based on target detection and binocular vision, aiming at the technical problem that existing pipeline identification and positioning technology cannot detect and position internal pipeline defects with high precision when actually detecting defect targets.
The technical scheme of the invention is realized as follows:
A pipeline defect identification and positioning method based on target detection and binocular vision comprises four parts: a camera calibration module, a stereo correction module, an image capture and target detection module, and a stereo matching and depth calculation module; the method comprises the following steps:
Step one: camera calibration: establishing a camera imaging geometric model, determining the relation between the three-dimensional geometric position of a point on the surface of a space object and its corresponding point in the image, and solving the calibration parameters of the binocular camera from the image coordinates and world coordinates of the feature points on the calibration plate, the calibration parameters comprising the intrinsic parameters, extrinsic parameters and distortion parameters of the left and right cameras;
Step two: stereo correction: performing stereo correction through the epipolar constraint so that corresponding points in the two images lie on the same horizontal epipolar line, and acquiring the calibration parameters of the corrected binocular camera;
Step three: image capture and target detection: building a data set, training a target detection model, capturing images with the binocular camera, predicting the real-time picture of the left camera, and outputting object identification information;
Step four: stereo matching and depth calculation: transmitting the stereo-corrected left and right camera images and the object identification information from step three as inputs to the stereo matching and depth calculation module, obtaining the spatial three-dimensional coordinates to which the recognized object in the left camera image of the binocular camera is mapped in the actual three-dimensional space.
Preferably, the method for implementing camera calibration includes:
Step 1.1: making a chessboard calibration board composed of alternating black and white squares, shooting chessboard images of the calibration board with the binocular camera from different directions, extracting the corner information of each chessboard image, and obtaining the image coordinates and spatial three-dimensional coordinates of all inner corner points on the chessboard images;
Step 1.2: establishing a geometric model of camera imaging, and determining the relation between the three-dimensional geometric position of a point on the surface of the space object and its corresponding point in the image;
Step 1.3: taking the image coordinates and spatial three-dimensional coordinates of all inner corner points obtained in step 1.1 as input, solving and outputting the intrinsic and extrinsic parameter matrices of the left and right cameras of the binocular camera through experiment and calculation according to the geometric model of camera imaging;
Step 1.4: solving the 5 distortion parameters k_1, k_2, k_3, p_1, p_2 from the intrinsic and extrinsic parameter matrices of the left and right cameras using the coordinate relationship before and after distortion correction, so as to correct the distortion.
Preferably, in step 1.2, the extrinsic parameter matrix

$$W = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}$$

reflects the conversion between the camera coordinate system and the world coordinate system, where R is the rotation matrix of the right camera relative to the left camera of the binocular camera, and t is the translation vector of the right camera relative to the left camera; the intrinsic parameter matrix

$$M = \begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

reflects the conversion between the pixel coordinate system and the camera coordinate system, where f is the focal length of the lens, (u_0, v_0) are the coordinates of the origin of the image coordinate system in the pixel coordinate system, and d_x, d_y are the physical sizes of a pixel along the x-axis and y-axis of the image coordinate system.
Preferably, in step 1.4, the distortion is corrected by:

$$\begin{aligned} x_{corr} &= x_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x_p y_p + p_2 (r^2 + 2 x_p^2) \\ y_{corr} &= y_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y_p^2) + 2 p_2 x_p y_p \end{aligned}$$

where (x_p, y_p) are the original image coordinates, (x_corr, y_corr) are the corrected image coordinates, and r is the radial distance variable (r^2 = x_p^2 + y_p^2).
Preferably, the implementation method of the stereo correction is as follows:
Step 2.1: decomposing the rotation matrix R of the right camera relative to the left camera into a synthetic rotation matrix r_1 for the left camera and a synthetic rotation matrix r_2 for the right camera, so that each camera rotates by half, making the optical axes of the left and right cameras parallel and their imaging planes coplanar;
Step 2.2: inputting the synthetic rotation matrices r_1 and r_2, the original left and right camera intrinsic matrices, the translation vector t, and the size of the chessboard image into OpenCV, and outputting through the cvStereoRectify function the row-alignment correction rotation matrix R_1 of the left camera, the row-alignment correction rotation matrix R_2 of the right camera, the corrected left camera intrinsic matrix M_l, the corrected right camera intrinsic matrix M_r, the corrected left camera projection matrix P_l, the corrected right camera projection matrix P_r, and the reprojection matrix Q;
Step 2.3: taking the output of the cvStereoRectify function in step 2.2 as known constants, using reverse mapping through the rectification lookup tables of the left and right views to find, for each integer pixel position on the target image, the floating-point position on the corresponding source image, interpolating its value from the surrounding integer source pixels, and, after the corrected image is assigned, cropping the image and saving the correction result.
Preferably, the method for image capture and target detection is as follows:
Step 3.1: capturing and screening pipeline defect images with a binocular camera and a web crawler, annotating and classifying them, and creating a data set adapted to the environment;
Step 3.2: organizing the pipeline defect images and annotations in a directory structure, loading them in the data set through the __getitem__ method, using the COCO Evaluator as the evaluator, and putting everything involved in the deep learning model, including model settings, training settings and test settings, into a single Exp file;
Step 3.3: after setup, initializing the model with pre-training weights obtained from training on the COCO open-source data set, entering the training command on the command line to start GPU training, and obtaining a weight file and a deep learning target detection model after training is finished;
Step 3.4: capturing real-time pipeline images with the binocular camera, inputting them into the deep learning target detection model, and outputting object identification information.
Preferably, the method for stereo matching and depth calculation is as follows:
Step 4.1: calculating the matching cost of each pixel of the original image within the preset parallax range from the stereo-corrected left and right camera images of the binocular camera;
Step 4.2: calculating the multipath cost aggregation value of each pixel within the preset parallax range from the matching cost of each pixel within the preset parallax range;
Step 4.3: calculating the disparity of each pixel after cost aggregation from the multipath cost aggregation value of each pixel within the preset parallax range;
Step 4.4: performing parallax optimization according to the disparity of each pixel after cost aggregation to obtain the disparity map of the stereo-corrected left camera of the binocular camera;
Step 4.5: performing depth calculation on the disparity map of the stereo-corrected left camera to obtain a depth map of the stereo-corrected left camera image, and finally, combining the object identification information obtained in step 3.4, obtaining the spatial three-dimensional coordinates to which the recognized object in the stereo-corrected left camera image is mapped in the actual three-dimensional space.
Preferably, the matching cost is calculated as:

$$e(x_R, y, d) = \min\!\left( \min_{x_R + d - \frac{1}{2} \le x \le x_R + d + \frac{1}{2}} \left| I_R(x_R, y) - \hat{I}_T(x, y) \right|,\; \min_{x_R - \frac{1}{2} \le x \le x_R + \frac{1}{2}} \left| \hat{I}_R(x, y) - I_T(x_R + d, y) \right| \right)$$

where e(x_R, y, d) denotes the absolute value of the pixel gray-value difference; Î_T(x, y) denotes the interpolated gray value at a sub-pixel position between the matched-image pixels (x_R + d - 0.5, y) and (x_R + d + 0.5, y); Î_R(x, y) denotes the interpolated gray value at a sub-pixel position between the candidate-image pixels (x_R - 0.5, y) and (x_R + 0.5, y); x_R is the abscissa of the candidate image pixel, y the ordinate of the image pixel, d the disparity, and x the abscissa of the matching image pixel.
Preferably, the method for calculating the multipath cost aggregation value of each pixel in the preset parallax range includes:
calculating the path cost of pixel p along path r:

$$L_r(p, d) = C(p, d) + \min\!\big( L_r(p-r, d),\; L_r(p-r, d-1) + P_1,\; L_r(p-r, d+1) + P_1,\; \min_i L_r(p-r, i) + P_2 \big) - \min_k L_r(p-r, k)$$

where p denotes a pixel, r a path, d a disparity, p - r the preceding pixel of p along the path, and L_r the path aggregation cost value; L_r(p - r, d) is the cost value of the preceding pixel at disparity d, L_r(p - r, d - 1) at disparity d - 1, L_r(p - r, d + 1) at disparity d + 1, and min_i L_r(p - r, i) the minimum of all cost values of the preceding pixel; C(p, d) is the initial cost value;
the total path cost value S is obtained from the path costs of pixel p along all paths r:

$$S(p, d) = \sum_r L_r(p, d)$$
Preferably, the depth calculation method includes:

$$depth = \frac{f \cdot b}{d - (c_{xl} - c_{xr})}$$

where f is the focal length, b is the baseline length, d is the disparity, c_xr is the column coordinate of the right camera's principal point, and c_xl is the column coordinate of the left camera's principal point.
Compared with the prior art, the invention has the following beneficial effects:
1) The invention adopts a binocular RGBD camera to predict and position the categories of pipeline defects, improving detection speed and precision over existing methods, and provides a classification method for a pipeline defect data set. Deep learning is applied to binocular vision positioning to realize target detection: a deep learning convolutional neural network is built with reference to the YOLOX network and appropriately adjusted, simplifying the feature extraction part and the network structure while ensuring information transmission and properly positioning pixels to improve the prediction rate.
2) The invention reduces detection cost and can predict and position pipelines of various types and quantities, with the advantages of a wide monitoring range, good real-time performance, high precision and accurate positioning. Meanwhile, the invention overcomes the limitation of traditional pipeline detection methods such as magnetic flux leakage detection, which can only inspect ferromagnetic pipelines; it produces no false signals and improves the situations of false alarms and missed alarms to a certain extent.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic structural diagram of a deep learning convolutional neural network according to the present invention.
Fig. 3 is a substructure diagram of a network used in fig. 2.
FIG. 4 is an exemplary classification and labeling of pipe defects according to the present invention.
Fig. 5 is a flowchart of camera calibration according to the present invention.
Fig. 6 is a flow chart of stereo calibration according to the present invention.
Fig. 7 is a flow chart of stereo matching and depth calculation according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings. It is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention fall within the scope of the present invention.
As shown in fig. 1, the embodiment of the invention discloses a pipeline defect identification and positioning method based on target detection and binocular vision, which comprises four parts: a camera calibration module, a stereo correction module, an image capture and target detection module, and a stereo matching and depth calculation module.
Camera calibration module: outputs the intrinsic parameters, extrinsic parameters and distortion coefficients, which are used to correct images subsequently shot by the cameras and obtain images with relatively small distortion. The inputs are the image coordinates and world coordinates of the known feature points of the calibration plate; a camera imaging geometric model is established, the relation between camera image pixel coordinates and scene point three-dimensional coordinates is determined, and the intrinsic and extrinsic parameters and distortion coefficients of the cameras are output.
Stereo correction module: a vision system with an intersecting-optical-axis structure is adopted; the relative positions of the two cameras are changed by decomposing the rotation matrix into row-alignment correction rotation matrices, so that corresponding points in the two images lie on the same horizontal epipolar line. The corrected camera parameters are obtained mathematically from the original camera data. The two-dimensional search becomes one-dimensional, which reduces the matching search space and improves the search rate of stereo matching. Finally, image correction is completed through the rectification lookup tables, and the images are cropped and saved.
Target detection module: the input is the RGB three-channel image shot by the stereo-corrected left camera. A deep learning network based on a target detection algorithm is trained on a self-built pipeline defect data set; the network infers and predicts on the image, outputting the defect semantic label class recognized in the left camera image, the recognition box center coordinates (x, y) and its width and height (w, h). The semantic label, the recognition box center coordinates, and the width and height data serve as the object identification information.
Stereo matching and depth calculation module: taking the stereo-corrected left and right camera images and the object identification information obtained by deep learning recognition as input, a disparity map of the stereo-corrected left camera is obtained through the stereo matching algorithm and converted into a depth map through depth calculation; finally, combined with the object identification information, the module outputs the spatial three-dimensional coordinates to which the recognized object in the stereo-corrected left camera image is mapped in the actual three-dimensional space.
The method comprises the following specific steps:
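The flow of the four modules can be summarized in the sketch below. This is an illustrative Python outline only, not the patented implementation: `rectify_pair`, `stereo_match` and the `calib` container are hypothetical helpers standing in for the modules described above, and only `cv2.reprojectImageTo3D` is a real OpenCV call.

```python
import cv2

def locate_defects(frame_left, frame_right, calib, detector):
    # Step two: stereo rectification using the calibration output (hypothetical helper)
    rect_left, rect_right = rectify_pair(frame_left, frame_right, calib)
    # Step three: target detection on the left image
    detections = detector(rect_left)          # [(label, (x, y), (w, h)), ...]
    # Step four: stereo matching and depth calculation (hypothetical helper)
    disparity = stereo_match(rect_left, rect_right)
    points_3d = cv2.reprojectImageTo3D(disparity, calib.Q)
    # Map each recognition-box centre to a spatial three-dimensional coordinate
    return [(label, points_3d[int(cy), int(cx)])
            for label, (cx, cy), _ in detections]
```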
Step one: camera calibration: establishing a camera imaging geometric model, determining the relation between the three-dimensional geometric position of a point on the surface of a space object and its corresponding point in the image, and solving the calibration parameters of the binocular camera from the image coordinates and world coordinates of the feature points on the calibration plate; the calibration parameters comprise the intrinsic parameters, extrinsic parameters and distortion parameters of the left and right cameras, which serve as the camera calibration parameters. The intrinsic, extrinsic and distortion parameters are used to correct images subsequently shot by the cameras to obtain images with relatively small distortion.
The method for realizing the camera calibration comprises the following steps:
Step 1.1: a chessboard calibration plate composed of alternating black and white squares is manufactured, and chessboard images of the calibration plate are shot with the binocular camera from different directions, so that the single-plane checkerboard is imaged clearly in both the left and right cameras. The corner information of each chessboard image is extracted to obtain the image coordinates and spatial three-dimensional coordinates of all inner corner points on the chessboard image;
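A minimal OpenCV sketch of step 1.1 for a single view is given below; the board dimensions and file name are assumptions for illustration.

```python
import cv2
import numpy as np

pattern = (9, 6)                  # inner corners per row/column (assumed board size)
img = cv2.imread("left_01.png")   # assumed file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
found, corners = cv2.findChessboardCorners(gray, pattern)
if found:
    # refine the detected corners to sub-pixel accuracy
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    # spatial three-dimensional coordinates of the inner corners
    # (Z = 0 on the board plane, in units of one square)
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)
```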
Step 1.2: establishing a geometric model of camera imaging, and determining the relation between the three-dimensional geometric position of a point on the surface of the space object and its corresponding point in the image;
in step 1.2, the extrinsic parameter matrix

$$W = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}$$

reflects the conversion between the camera coordinate system and the world coordinate system, where R is the rotation matrix of the right camera relative to the left camera of the binocular camera, and t is the translation vector of the right camera relative to the left camera; the intrinsic parameter matrix

$$M = \begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

reflects the conversion between the pixel coordinate system and the camera coordinate system, where f is the focal length of the lens, (u_0, v_0) are the coordinates of the origin of the image coordinate system in the pixel coordinate system, and d_x, d_y are the physical sizes of a pixel along the x-axis and y-axis of the image coordinate system.
Step 1.3: taking the image coordinates and spatial three-dimensional coordinates of all inner corner points obtained in step 1.1 as input, solving and outputting the intrinsic and extrinsic parameter matrices of the left and right cameras of the binocular camera through experiment and calculation according to the geometric model of camera imaging;
Step 1.4: solving the 5 distortion parameters k_1, k_2, k_3, p_1, p_2 from the intrinsic and extrinsic parameter matrices of the left and right cameras using the coordinate relationship before and after distortion correction, so as to correct the distortion.
In step 1.4, the distortion is corrected by:

$$\begin{aligned} x_{corr} &= x_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x_p y_p + p_2 (r^2 + 2 x_p^2) \\ y_{corr} &= y_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y_p^2) + 2 p_2 x_p y_p \end{aligned}$$

where (x_p, y_p) are the original image coordinates and (x_corr, y_corr) are the corrected image coordinates; (x_corr, y_corr) is an approximation described by a Taylor series expansion about r = 0.
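For illustration, the 5-parameter model can be sketched in Python as the forward mapping from ideal normalized coordinates to distorted ones; this is a sketch under the assumption that the formula follows the standard radial-tangential model.

```python
# forward radial-tangential distortion model (works on scalars or NumPy arrays)
def distort(xp, yp, k1, k2, k3, p1, p2):
    r2 = xp ** 2 + yp ** 2                              # r squared
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3  # radial term
    x = xp * radial + 2 * p1 * xp * yp + p2 * (r2 + 2 * xp ** 2)
    y = yp * radial + p1 * (r2 + 2 * yp ** 2) + 2 * p2 * xp * yp
    return x, y
```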
Step two: stereo correction: performing stereo correction through the epipolar constraint so that corresponding points in the two images lie on the same horizontal epipolar line, and acquiring the calibration parameters of the corrected binocular camera. OpenCV is called to obtain the parameters of the corrected left and right cameras and finish the correction, and the corrected images are finally obtained through rectification mapping.
The implementation method of the stereo correction comprises the following steps:
Step 2.1: decomposing the rotation matrix R of the right camera relative to the left camera into a synthetic rotation matrix r_1 for the left camera and a synthetic rotation matrix r_2 for the right camera, so that each camera rotates by half, making the optical axes of the left and right cameras parallel and their imaging planes coplanar;
Step 2.2: inputting the synthetic rotation matrices r_1 and r_2, the original left and right camera intrinsic matrices, the translation vector t, and the size of the chessboard image into OpenCV, and outputting through the cvStereoRectify function the row-alignment correction rotation matrix R_1 of the left camera, the row-alignment correction rotation matrix R_2 of the right camera, the corrected left camera intrinsic matrix M_l, the corrected right camera intrinsic matrix M_r, the corrected left camera projection matrix P_l, the corrected right camera projection matrix P_r, and the reprojection matrix Q;
Step 2.3: taking the output of the cvStereoRectify function in step 2.2 as known constants, using reverse mapping through the rectification lookup tables of the left and right views to find, for each integer pixel position on the target image, the floating-point position on the corresponding source image, interpolating its value from the surrounding integer source pixels, and, after the corrected image is assigned, cropping the image and saving the correction result.
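A sketch of steps 2.1-2.3 with OpenCV's Python bindings follows (the description names the legacy cvStereoRectify; cv2.stereoRectify performs the half-rotation split internally). The calibration variables M_l, dist_l, M_r, dist_r, R, t, image_size and the captured frames raw_left, raw_right are assumed to come from step one.

```python
import cv2

# M_l, M_r: intrinsic matrices; dist_l, dist_r: distortion vectors
# R, t: rotation/translation of the right camera relative to the left
# image_size: (width, height) of the calibration images
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    M_l, dist_l, M_r, dist_r, image_size, R, t)
map1x, map1y = cv2.initUndistortRectifyMap(
    M_l, dist_l, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(
    M_r, dist_r, R2, P2, image_size, cv2.CV_32FC1)
# reverse mapping with interpolation of the surrounding source pixels
rect_left = cv2.remap(raw_left, map1x, map1y, cv2.INTER_LINEAR)
rect_right = cv2.remap(raw_right, map2x, map2y, cv2.INTER_LINEAR)
```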
Step three: image capture and target detection: building a data set, training a target detection model, capturing images with the binocular camera, predicting the real-time picture of the left camera, and outputting object identification information. The corrected left camera image obtained in the previous step is input to the target detection module, and a pipeline defect data set is self-built by annotating and classifying images captured by the binocular camera. A convolutional neural network aggregates image features at different granularities of the images, mixes and combines them, and passes them to the prediction layer, which predicts the image features, generates bounding boxes and predicts categories. The recognized defect semantic label class, the recognition box center coordinates (x, y) and the width and height (w, h) in the left camera image are output, and the target detection result is displayed.
The method for realizing image capture and target detection comprises the following steps:
Step 3.1: capturing and screening pipeline defect images with a binocular camera and a web crawler, annotating and classifying them, and creating a data set adapted to the environment. Images are annotated with a labeling tool or CVAT. The defects are classified as follows; since the ND class is the class without annotations, it is removed, leaving 9 semantic labels in total:
(1) ND class (no-annotation class): no defect (a one-meter buffer around each annotated class; a segment between annotated classes is not labeled ND if any other class is present there);
(2) OB class: surface damage;
(3) FO class: obstacles;
(4) RB class: cracks, fractures and collapse;
(5) RO class: plant root invasion;
(6) DE class: deformation;
(7) FS class: dislocation;
(8) GR class: branch pipes;
(9) BE class: attached deposits;
(10) AF class: fixed deposits.
Step 3.2: organizing the pipeline defect images and annotations in a directory structure, loading them in the data set through the __getitem__ method, using the COCO Evaluator as the evaluator, and putting everything involved in the deep learning model, including model settings, training settings and test settings, into a single Exp file; the number of training rounds is set to epochs = 300, and the number of defect classes num_classes = 9 is defined based on the classification in step 3.1.
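A minimal Exp file in the style used by the YOLOX code base, reflecting the settings named above, might look as follows; the data paths and the training command in the comment are assumptions.

```python
# pipeline_exp.py - minimal YOLOX experiment file (sketch; paths assumed).
# Training could then be launched with something like:
#   python tools/train.py -f pipeline_exp.py -d 1 -b 8 --fp16 -c yolox_s.pth
from yolox.exp import Exp as BaseExp

class Exp(BaseExp):
    def __init__(self):
        super().__init__()
        self.num_classes = 9        # the 9 semantic labels (ND class removed)
        self.max_epoch = 300        # training rounds (epochs = 300)
        self.data_dir = "datasets/pipeline_defects"    # assumed layout
        self.train_ann = "instances_train.json"        # assumed annotation files
        self.val_ann = "instances_val.json"
```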
Step 3.3: after setup, initializing the model with pre-training weights obtained from training on the COCO open-source data set, entering the training command on the command line to start GPU training, and obtaining a weight file and a deep learning target detection model after training is finished;
Step 3.4: capturing real-time pipeline images with the binocular camera, inputting them into the deep learning target detection model, and outputting object identification information. In the deep learning target detection model, a convolutional neural network aggregates image features at different image granularities, mixes and combines them, and passes them to the prediction layer, which predicts the image features, generates bounding boxes, predicts categories, and outputs the object identification information.
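The object identification information handed to step four can be packaged as below; this is a hypothetical helper illustrating the (label, center, width/height) record described above, not code from the patent.

```python
def to_identification_info(boxes, scores, class_ids, names, conf_thr=0.5):
    """Convert raw detections (x0, y0, x1, y1 boxes) into object
    identification records: semantic label, box centre, and box size."""
    info = []
    for (x0, y0, x1, y1), s, c in zip(boxes, scores, class_ids):
        if s < conf_thr:            # drop low-confidence detections
            continue
        info.append({"class": names[c],
                     "center": ((x0 + x1) / 2.0, (y0 + y1) / 2.0),
                     "size": (x1 - x0, y1 - y0)})
    return info
```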
Step four: stereo matching and depth calculation: transmitting the stereo-corrected left and right camera images and the object identification information from step three as inputs to the stereo matching and depth calculation module; using the recognition box center coordinates in the object identification information as the reference quantity, the module processes the images and outputs the spatial three-dimensional coordinates to which the recognized object in the left camera image of the binocular camera is mapped in the actual three-dimensional space.
The method for realizing stereo matching and depth calculation comprises the following steps:
Step 4.1: calculating the matching cost of each pixel of the original image within the preset parallax range from the stereo-corrected left and right camera images of the binocular camera. The purpose of the matching cost calculation is to measure the correlation between the pixel to be matched and the candidate pixel. Whether two pixels are corresponding (same-name) points can be evaluated through the matching cost function: the smaller the cost, the greater the correlation and the greater the probability that they are corresponding points.
The BT (Birchfield and Tomasi) algorithm is used as the calculation method for the matching cost, expressed as:

$$e(x_R, y, d) = \min\!\left( \min_{x_R + d - \frac{1}{2} \le x \le x_R + d + \frac{1}{2}} \left| I_R(x_R, y) - \hat{I}_T(x, y) \right|,\; \min_{x_R - \frac{1}{2} \le x \le x_R + \frac{1}{2}} \left| \hat{I}_R(x, y) - I_T(x_R + d, y) \right| \right)$$

where e(x_R, y, d) denotes the BT cost, i.e. the absolute value of the pixel gray-value difference; Î_T(x, y) denotes the interpolated gray value at a sub-pixel position between the matched-image pixels (x_R + d - 0.5, y) and (x_R + d + 0.5, y); Î_R(x, y) denotes the interpolated gray value at a sub-pixel position between the candidate-image pixels (x_R - 0.5, y) and (x_R + 0.5, y); x_R denotes the abscissa of the candidate image (left image) pixel, y the ordinate of the image pixel, d the disparity, and x the abscissa of the matching image (right image) pixel. Through this formula, the matching cost calculation of the stereo-corrected left and right camera images within the preset parallax range is realized, and the matching cost of each pixel of the original image within the preset parallax range is obtained.
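A NumPy-style sketch of the BT cost for a single pixel follows; the arrays are single scanlines, the disparity sign convention (right column = x - d) is an assumption, and the min/max neighborhood form is the common BT formulation.

```python
def bt_cost(left, right, x, d):
    """BT cost for left-scanline pixel x at disparity d.
    left/right: 1-D float sequences; assumes x and x - d are interior columns."""
    def min_max(img, i):
        lo = 0.5 * (img[i] + img[i - 1])   # gray value at position i - 0.5
        hi = 0.5 * (img[i] + img[i + 1])   # gray value at position i + 0.5
        return min(lo, hi, img[i]), max(lo, hi, img[i])
    xr = x - d                             # matching column in the right image
    lmin, lmax = min_max(left, x)
    rmin, rmax = min_max(right, xr)
    cost_l = max(0.0, left[x] - rmax, rmin - left[x])      # left vs. interpolated right
    cost_r = max(0.0, right[xr] - lmax, lmin - right[xr])  # right vs. interpolated left
    return min(cost_l, cost_r)             # final cost: minimum of the two
```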
Step 4.2: calculating the multipath cost aggregation value of each pixel within the preset parallax range from the matching cost of each pixel within the preset parallax range. The idea of the global stereo matching algorithm, i.e. a global energy optimization strategy, is adopted: find the optimal disparity of each pixel such that the global energy function of the whole image is minimized, where the global energy function is defined as:

$$E(d) = E_{data}(d) + E_{smooth}(d)$$
A path cost aggregation method is adopted: the matching costs of a pixel at all disparities are aggregated one-dimensionally along all paths around the pixel to obtain the path cost value for each path, and the path cost values are then summed to obtain the aggregated matching cost value of the pixel. The path cost of pixel p along path r is calculated as:

$$L_r(p, d) = C(p, d) + \min\!\big( L_r(p-r, d),\; L_r(p-r, d-1) + P_1,\; L_r(p-r, d+1) + P_1,\; \min_i L_r(p-r, i) + P_2 \big) - \min_k L_r(p-r, k)$$

where p denotes a pixel, r a path, d a disparity, p - r the preceding pixel of p along the path, and L_r the path aggregation cost value; L_r(p - r, d) is the cost of the preceding pixel at disparity d, L_r(p - r, d - 1) at disparity d - 1, L_r(p - r, d + 1) at disparity d + 1, and min_i L_r(p - r, i) the minimum of all cost values of the preceding pixel; C(p, d) is the initial cost value.

The first term is the matching cost value C, the data term.

The second term is the smoothness term: it takes the minimum of three cases, the value accumulated along the path with no penalty, with penalty P_1, or with penalty P_2. P_1 accommodates tilted or curved surfaces, while P_2 aims to preserve discontinuities. P_2 is usually adjusted dynamically according to the gray-level difference of adjacent pixels:

$$P_2 = \frac{P_2'}{\left| I_{bp} - I_{bq} \right|}$$

where P_2' is the initial value of P_2, generally set much larger than P_1, and I_bp and I_bq are the gray values of pixels p and q respectively.

The third term guarantees that the new path cost value L_r does not exceed a fixed upper bound.

The total path cost value S is obtained from the path costs of pixel p along all paths r:

$$S(p, d) = \sum_r L_r(p, d)$$

Multipath cost aggregation is realized through these calculations in step 4.2, yielding the multipath cost aggregation value of each pixel within the preset parallax range.
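The recurrence for a single path can be vectorized over a cost volume as in the following NumPy sketch (left-to-right horizontal path only; the P_1 and P_2 values are illustrative, and a constant P_2 is used instead of the gray-difference-adjusted one for brevity).

```python
import numpy as np

def aggregate_left_to_right(cost, P1=10.0, P2=120.0):
    """One-dimensional aggregation of an H x W x D initial cost volume C(p, d)
    along the horizontal left-to-right path."""
    H, W, D = cost.shape
    L = np.empty((H, W, D), dtype=np.float64)
    L[:, 0, :] = cost[:, 0, :]                        # no predecessor at the border
    for x in range(1, W):
        prev = L[:, x - 1, :]                         # L_r(p - r, .)
        prev_min = prev.min(axis=1, keepdims=True)    # min_k L_r(p - r, k)
        d_minus = np.roll(prev, 1, axis=1); d_minus[:, 0] = np.inf    # d - 1 term
        d_plus = np.roll(prev, -1, axis=1); d_plus[:, -1] = np.inf    # d + 1 term
        best = np.minimum(np.minimum(prev, prev_min + P2),
                          np.minimum(d_minus + P1, d_plus + P1))
        L[:, x, :] = cost[:, x, :] + best - prev_min  # subtract to bound L_r
    return L
```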
Step 4.3: calculating the disparity of each pixel after cost aggregation from the multipath cost aggregation value of each pixel within the preset parallax range. Parallax calculation determines the optimal disparity value of each pixel through the aggregated cost matrix S: the WTA (Winner-Takes-All) algorithm selects, among the cost values of all disparities of a pixel, the disparity corresponding to the minimum cost value as the optimal disparity, finally obtaining the disparity of each pixel after cost aggregation.
Step 4.4: performing parallax optimization according to the disparity of each pixel after cost aggregation to obtain the disparity map of the stereo-corrected left camera of the binocular camera. The purpose of parallax optimization is to further optimize the disparity map obtained in the previous step and improve its quality, including rejecting mismatches, improving disparity precision and suppressing noise.
Mismatches are eliminated with a left-right consistency check, which is based on the uniqueness constraint of disparity, i.e. each pixel has at most one correct disparity. Specifically, the left and right images are interchanged, so that the left image becomes the right image and vice versa, and stereo matching is performed again to obtain another disparity map; each value in a disparity map reflects the correspondence between two pixels. According to the uniqueness constraint, the same-name pixel of each pixel and its disparity value are found in the right image through the disparity map of the left image; if the absolute difference between the two disparity values is smaller than 1, the uniqueness constraint is satisfied and the disparity value is retained, otherwise it is not satisfied and the value is eliminated. Meanwhile, isolated outliers are removed with a connected-domain detection method: small patches caused by mismatching in the disparity map are removed, and small isolated speckles are filtered out. The consistency check is:

$$D_p = \begin{cases} D_L(p), & \left| D_L(p) - D_R\big(p - (D_L(p), 0)\big) \right| < 1 \\ \text{invalid}, & \text{otherwise} \end{cases}$$
and improving the parallax precision by adopting a sub-pixel optimization technology, obtaining the sub-pixel precision by using a quadratic curve interpolation method, and performing quadratic curve fitting on the cost value of the optimal parallax and the cost values of the front parallax and the rear parallax, wherein the parallax value corresponding to the extreme point of the curve is the new sub-pixel parallax value.
Kalman filtering is adopted for noise suppression, making the disparity result smoother, eliminating noise in the disparity map to a certain extent, and filling disparity holes. Parallax optimization is achieved through step 4.4, finally yielding the disparity map of the stereo-corrected left camera of the binocular camera.
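A direct (unoptimized) sketch of the left-right consistency check described above; the disparity sign convention is an assumption.

```python
import numpy as np

def lr_check(disp_left, disp_right, tol=1.0):
    """Keep only disparities satisfying the uniqueness constraint;
    rejected pixels are marked with -1."""
    H, W = disp_left.shape
    out = disp_left.copy()
    for y in range(H):
        for x in range(W):
            d = disp_left[y, x]
            xr = int(round(x - d))           # same-name point in the right view
            if 0 <= xr < W and abs(d - disp_right[y, xr]) < tol:
                continue                     # uniqueness constraint satisfied
            out[y, x] = -1                   # eliminated as a mismatch
    return out
```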
Step 4.5: performing depth calculation on the disparity map of the stereo-corrected left camera to obtain a depth map of the stereo-corrected left camera image, and finally, combining the object identification information obtained in step 3.4, obtaining the spatial three-dimensional coordinates to which the recognized object in the stereo-corrected left camera image is mapped in the actual three-dimensional space.
The depth calculation method is:

$$depth = \frac{f \cdot b}{d - (c_{xl} - c_{xr})}$$

where f is the focal length, b is the baseline length, d is the disparity, c_xr is the column coordinate of the right camera's principal point, and c_xl is the column coordinate of the left camera's principal point. After depth calculation, the depth map of the stereo-corrected left camera image is obtained; combined with the object identification information obtained through deep learning target detection, the spatial three-dimensional coordinates to which the recognized object in the stereo-corrected left camera image is mapped in the actual three-dimensional space are finally obtained.
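The depth formula can be applied to the whole disparity map in one vectorized step, as in this sketch (near-zero denominators are mapped to NaN):

```python
import numpy as np

def disparity_to_depth(disp, f, b, c_xl, c_xr, eps=1e-6):
    """Depth map from a disparity map using the formula above;
    f, b are in the rectified camera's units, c_xl/c_xr are principal-point columns."""
    denom = disp - (c_xl - c_xr)     # reduces to d when the principal points match
    return f * b / np.where(np.abs(denom) > eps, denom, np.nan)
```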
As shown in fig. 2 and fig. 3, fig. 2 shows the deep learning convolutional neural network structure proposed by the invention, composed of three parts: feature extraction, feature fusion, and class prediction with detection box calibration. Fig. 3 shows important substructures contained in the network.
The feature extraction part aggregates and forms image features at different image granularities and comprises a Focus interlaced-sampling splicing structure and four DarkNet structures. Focus serves as the input part of the whole network: starting from the default 3 × 640 × 640 input, it makes four copies, slices the four pictures into four 3 × 320 × 320 slices by a slicing operation, splices the four slices along the depth dimension using concat to generate a 12 × 320 × 320 output, and finally passes the result through batch_norm and leaky_relu to the next convolutional layer. The basic architecture of the DarkNet structure consists of CSP and SPP. CSP divides the original input into two branches and performs convolution on each to halve the number of channels; the first branch abandons the Bottleneck operation of YOLOv5 and adopts a ResUnit structure instead, after which the two branches are concatenated, so that the input and output of CSP have the same size; the purpose of CSP is to let the model learn more features. SPP takes a 512 × 20 × 20 input, outputs 256 × 20 × 20 after a 1 × 1 convolutional layer, samples it with three parallel MaxPool layers, concatenates the results with the initial features to output 1024 × 20 × 20, and finally restores 512 × 20 × 20 with 512 convolution kernels.
The feature fusion part mixes and combines image features and passes them to the prediction layer, using PAFPN. Based on the Mask R-CNN and FPN frameworks and drawing on YOLOv3, PAFPN enhances information propagation and can accurately retain spatial information, which helps to properly position pixels to form masks.
The last part of the network structure predicts the image features, generates bounding boxes and predicts categories. First, a Decoupled Head extracts three feature layers to detect targets, and an anchor-free detector is added to improve detection speed; at the output, the three outputs are 85 × 20 × 20, 85 × 40 × 40 and 85 × 80 × 80 respectively. The defect prediction information is obtained by decoding the prediction results.
Compared with the YOLOX network, the proposed deep learning convolutional neural network structure is built with reference to the YOLOX network but reduces some CBS operations and convolution counts in the feature extraction part, simplifying it. Meanwhile, the calculation of the loss function is modified, and standard binary cross-entropy loss is used for training:
$$Loss = -\sum_{c=1}^{C} w_c \left[ y_c \log \sigma(x_c) + (1 - y_c) \log\big(1 - \sigma(x_c)\big) \right]$$

where C is the number of semantic labels mentioned above (num_classes = 9); y_c indicates whether class c is present in the current picture; x_c is the raw model output for class c; σ is the Sigmoid activation function; w_c is the weight of class c, computed from N and N_c, where N denotes the number of images in training and N_c the number of images containing class c.
As shown in fig. 4, the invention provides a method for classifying pipeline defects: the defect types are divided into 10 classes; excluding the ND class, i.e. the defect-free class, there are 9 semantic labels in total, namely surface damage, obstacles, cracks, root invasion, deformation, dislocation, branch pipes, attached deposits and fixed deposits. The figure shows annotation examples for the 9 defect classes, which can serve as a reference when self-building a pipeline defect data set. This classification method resolves the fuzzy data classification standards of computer vision methods in the field of pipeline defect classification.
As shown in fig. 5, the camera calibration method disclosed in the present invention uses a camera calibration module. The input of the camera calibration module is the image coordinate and world coordinate of the known characteristic point of the calibration plate, so that a geometric model of camera imaging is established, the relation between the image pixel coordinate of the camera and the three-dimensional coordinate of the scene point is determined, and the internal and external parameter matrixes and the distortion coefficient of the camera are output.
The extrinsic parameter matrix W reflects the conversion between the camera coordinate system and the world coordinate system; the intrinsic parameter matrix M reflects the conversion between the pixel coordinate system and the camera coordinate system. The extrinsic parameter matrix W, the intrinsic parameter matrix M, and the 5 distortion coefficients k_1, k_2, k_3, p_1, p_2 are as follows:

$$W = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}, \qquad M = \begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

$$\begin{aligned} x_{corr} &= x_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x_p y_p + p_2 (r^2 + 2 x_p^2) \\ y_{corr} &= y_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y_p^2) + 2 p_2 x_p y_p \end{aligned}$$

R is the rotation matrix of the right camera relative to the left camera of the binocular camera, and t is the translation vector of the right camera relative to the left camera. f is the focal length of the lens, (u_0, v_0) are the coordinates of the origin of the image coordinate system in the pixel coordinate system, and d_x, d_y are the physical sizes of a pixel along the x-axis and y-axis of the image coordinate system. (x_p, y_p) are the original image coordinates and (x_corr, y_corr) the corrected coordinates, described approximately by a Taylor series expansion about r = 0.
The camera calibration method is implemented in Matlab, and chessboard images should be acquired in as many varied poses as possible. Lens distortion increases radially from the center of the image and sometimes appears non-uniform across the image frame; to capture the distortion information of the image, the checkerboard should appear at various different edges of the image. The rotation matrix R and the intrinsic matrix M are obtained after Matlab calibration, and the remaining parameter values are obtained by calculation. For subsequent import into OpenCV, the rotation matrix R and the intrinsic matrix M should be transposed. The order of the distortion coefficients in the distortion vector is [k_1, k_2, p_1, p_2, k_3].
As shown in fig. 6, the stereo correction method disclosed in the invention uses the stereo correction module. Stereo correction is performed through the epipolar constraint: the rotation matrix R output by the calibration module is divided into two parts, synthetic rotation matrices r_1 and r_2 for the left and right cameras, so that corresponding points in the two images lie on the same horizontal epipolar line. The inputs for obtaining the corrected camera parameters comprise the synthetic rotation matrices r_1 and r_2, the original left and right camera intrinsic matrices, the translation vector t, and the size of the chessboard image. The functions stereoRectify and initUndistortRectifyMap in OpenCV are called to output the left and right camera row-alignment correction rotation matrices R_1 and R_2, the corrected left and right camera intrinsic matrices M_l and M_r, the corrected left and right camera projection matrices P_l and P_r, and the reprojection matrix Q.
The reprojection matrix Q enables conversion between the world coordinate system and the pixel coordinate system. Specifically:

$$Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -1/T_x & (c_x - c'_x)/T_x \end{bmatrix}$$

where T_x is the x-component of the translation between the cameras (the baseline). The origin of the corrected camera coordinate system is on the optical axis. Here d denotes the disparity, the three-dimensional coordinates are (X/W, Y/W, Z/W), and c'_x denotes the principal point of the right image; after correction, c'_x = c_x. On the premise that the stereo correction is correct, the above is expanded as:

$$\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = Q \begin{bmatrix} x \\ y \\ d \\ 1 \end{bmatrix} = \begin{bmatrix} x - c_x \\ y - c_y \\ f \\ \big(-d + c_x - c'_x\big)/T_x \end{bmatrix}$$

The parameters of the corrected left and right cameras are acquired in real time, and the corrected images are obtained through rectification mapping: for each integer pixel position on the target image, the floating-point position on the corresponding source image is found, and its value is interpolated from the surrounding integer source pixels. After the corrected images are assigned, the images are cropped and the correction results are saved. Finally, the two image planes of the binocular camera are parallel, the optical axes are perpendicular to the image planes, and the poles are at infinity; at this point, any point (x_0, y_0) in the image corresponds to the epipolar line y = y_0.
As shown in fig. 7, the stereo matching and depth calculation method disclosed in the invention uses the stereo matching module. The method comprises matching cost calculation based on the BT algorithm, cost aggregation based on a global energy optimization strategy, parallax calculation based on the WTA algorithm, parallax optimization, and depth calculation.
The matching cost is calculated as follows. BT is the cost, i.e. the absolute value of the pixel gray-value difference, and BT makes use of the gray information of sub-pixels. First, the gray values at the sub-pixel positions (x_R ± 0.5, y) around the pixel (x_R, y) in the left image are obtained by linear interpolation:

$$\hat{I}_R(x_R \pm 0.5, y) = \tfrac{1}{2}\big( I_R(x_R, y) + I_R(x_R \pm 1, y) \big)$$

Similarly, for the pixel (x_R + d, y) in the right image, the gray values at the sub-pixel positions (x_R + d ± 0.5, y) are:

$$\hat{I}_T(x_R + d \pm 0.5, y) = \tfrac{1}{2}\big( I_T(x_R + d, y) + I_T(x_R + d \pm 1, y) \big)$$

Two costs are calculated respectively, and the final cost is the minimum of the two:

$$e(x_R, y, d) = \min\big( \bar{e}(x_R, y, d),\; \bar{e}'(x_R, y, d) \big)$$

The costs of the two images are as follows:

$$\bar{e}(x_R, y, d) = \min_{x_R + d - \frac{1}{2} \le x \le x_R + d + \frac{1}{2}} \left| I_R(x_R, y) - \hat{I}_T(x, y) \right|$$

$$\bar{e}'(x_R, y, d) = \min_{x_R - \frac{1}{2} \le x \le x_R + \frac{1}{2}} \left| \hat{I}_R(x, y) - I_T(x_R + d, y) \right|$$
the cost aggregation adopts the idea of a global stereo matching algorithm, namely a global energy optimization strategy, and the optimal parallax of each pixel is searched to ensure that the global energy function of the whole image is minimum. And performing one-dimensional aggregation on the matching costs under all parallaxes of the pixels on all paths around the pixels to obtain path cost values under the paths, and then adding the path cost values to obtain the matched cost value after the pixels are aggregated.
The matching cost value is divided into three items, the first item is a matching cost value C and belongs to a data item; the second item is a smooth item, the value accumulated on the path cost is taken, no punishment is made, and P is made 1 Punishment and do P 2 Punishment is carried out on the value with the minimum cost in the three cases; the third item is a reduced item, and the new path cost value L is ensured r Not exceeding a certain numerical upper limit.
$$L_r(p, d) = C(p, d) + \min\!\Big( L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_i L_r(p - r, i) + P_2 \Big) - \min_i L_r(p - r, i)$$
After L_r(p, d) is obtained, the total path cost value S is as follows:
$$S(p, d) = \sum_r L_r(p, d)$$
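The one-dimensional aggregation can be illustrated for a single path; the numpy sketch below implements the recurrence above for the left-to-right path only (array names and the P_1/P_2 values are assumptions), and the full aggregation would sum such L_r volumes over all paths:

```python
import numpy as np

def aggregate_left_to_right(C, P1=10.0, P2=150.0):
    """Single-path (left-to-right) aggregation of a float cost volume
    C of shape (H, W, D). P1/P2 are illustrative values only."""
    H, W, D = C.shape
    L = np.empty_like(C)
    L[:, 0, :] = C[:, 0, :]                 # the path starts at the left border
    for x in range(1, W):
        prev = L[:, x - 1, :]               # L_r(p - r, .) for every row
        prev_min = prev.min(axis=1, keepdims=True)
        # the four cases of the smoothness term
        dm1 = np.pad(prev, ((0, 0), (1, 0)), constant_values=np.inf)[:, :D]  # d - 1
        dp1 = np.pad(prev, ((0, 0), (0, 1)), constant_values=np.inf)[:, 1:]  # d + 1
        smooth = np.minimum(np.minimum(prev, dm1 + P1),
                            np.minimum(dp1 + P1, prev_min + P2))
        # subtracting min_i L_r(p - r, i) keeps values bounded (third term)
        L[:, x, :] = C[:, x, :] + smooth - prev_min
    return L
```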
Disparity calculation adopts a greedy WTA (winner-takes-all) strategy: among the cost values of all disparities of a pixel, the disparity corresponding to the minimum cost value is selected as the optimal disparity, finally yielding the disparity of each pixel after cost aggregation.
$$d_p = \arg\min_d S(p, d)$$
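The WTA selection itself is a single reduction over the disparity axis; a sketch, with a random stand-in for the aggregated volume S:

```python
import numpy as np

# S could be, e.g., the sum of per-path L_r arrays from the sketch above;
# here a random placeholder cost volume of shape (H, W, D) is used.
S = np.random.rand(480, 640, 64).astype(np.float32)
disparity = np.argmin(S, axis=2).astype(np.float32)  # d_p = argmin_d S(p, d)
```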
Disparity optimization comprises rejecting mismatches, improving disparity precision, and suppressing noise. Mismatches are rejected with a left-right consistency check: the left and right images are swapped and stereo matching is performed again to obtain a second disparity map; according to the uniqueness constraint of disparity, the disparity map of the left image is used to find, for each pixel, its same-name pixel in the right image and that pixel's disparity value; if the absolute value of the difference between the two disparity values is less than 1, the uniqueness constraint is satisfied and the pixel is kept, otherwise the constraint is not satisfied and the pixel is rejected. Meanwhile, isolated outliers are removed by connected-component detection, eliminating small patches caused by mismatches in the disparity map and filtering out small isolated speckles.
$$D(p) = \begin{cases} D_L(p), & \left| D_L(p) - D_R\!\left( p - \left( D_L(p), 0 \right) \right) \right| < 1 \\ \text{invalid}, & \text{otherwise} \end{cases}$$
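A minimal sketch of the consistency check follows, assuming the usual convention that a left-image pixel x with disparity d has its same-name pixel at x - d in the right image (names are illustrative); for the connected-component speckle removal, OpenCV's cv2.filterSpeckles performs the same job on 16-bit fixed-point disparity maps:

```python
import numpy as np

def lr_check(disp_left, disp_right, thresh=1.0):
    """Left-right consistency check on two float disparity maps; pixels
    violating the uniqueness constraint are marked invalid (-1)."""
    H, W = disp_left.shape
    xs = np.tile(np.arange(W), (H, 1))
    # locate each pixel's same-name pixel in the right disparity map
    match_x = np.clip(np.rint(xs - disp_left).astype(np.int64), 0, W - 1)
    disp_r = np.take_along_axis(disp_right, match_x, axis=1)
    keep = np.abs(disp_left - disp_r) < thresh
    return np.where(keep, disp_left, -1.0)

# cv2.filterSpeckles(disp16, 0, max_speckle_size, max_diff) then removes
# small isolated speckle regions from a 16-bit fixed-point disparity map.
```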
Disparity precision is improved with a sub-pixel optimization technique: sub-pixel precision is obtained by quadratic-curve interpolation, fitting a quadratic curve through the cost value of the optimal disparity and the cost values of the two adjacent disparities; the disparity value corresponding to the extreme point of the curve is the new sub-pixel disparity value.
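The quadratic-curve interpolation reduces to the standard parabola-vertex formula d + (C(d-1) - C(d+1)) / (2 (C(d-1) - 2 C(d) + C(d+1))); a sketch:

```python
def subpixel_refine(cost, d):
    """Parabola fit through the costs of the optimal disparity d and its
    two neighbours; `cost` is the 1-D cost-vs-disparity array of one
    pixel, with 0 < d < len(cost) - 1."""
    c0, c1, c2 = cost[d - 1], cost[d], cost[d + 1]
    denom = c0 - 2.0 * c1 + c2
    if denom == 0.0:
        return float(d)                  # flat curve: keep the integer value
    return d + (c0 - c2) / (2.0 * denom) # extreme point of the fitted parabola
```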
Kalman filtering is adopted for noise suppression: the disparity is optimally corrected by the Kalman gain, which makes the disparity result smoother, removes noise in the disparity map to a certain extent, and has a disparity-filling effect.
$$\hat{x}(k+1 \mid k+1) = \hat{x}(k+1 \mid k) + W(k+1)\left[ z(k+1) - \hat{z}(k+1 \mid k) \right]$$

is the state estimation equation, in which \hat{x}(k+1 \mid k) represents the estimate of x(k+1) made at time k and W(k+1) is the Kalman gain;

$$\hat{z}(k+1 \mid k) = H(k+1)\,\hat{x}(k+1 \mid k)$$

is the observation estimation equation, with the gain given by

$$W(k+1) = P(k+1 \mid k)\,H^{\mathsf{T}}(k+1)\left[ H(k+1)\,P(k+1 \mid k)\,H^{\mathsf{T}}(k+1) + R(k+1) \right]^{-1}$$
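As a toy illustration of these equations (a scalar, static-state filter with F = H = 1; the noise values are arbitrary), one could smooth a sequence of disparity observations like this:

```python
import numpy as np

def kalman_smooth(z, q=1e-3, r=1e-1):
    """Scalar Kalman filter over disparity observations z; q and r are
    illustrative process/observation noise values."""
    x, p = float(z[0]), 1.0              # initial estimate and covariance
    out = np.empty(len(z))
    out[0] = x
    for k in range(1, len(z)):
        x_pred, p_pred = x, p + q        # prediction step (static state model)
        w = p_pred / (p_pred + r)        # Kalman gain W(k+1)
        x = x_pred + w * (z[k] - x_pred) # state estimation equation
        p = (1.0 - w) * p_pred           # covariance update
        out[k] = x
    return out
```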
The pixel depth calculation formula is as follows, where f is the focal length, b is the baseline length, d is the disparity, and c_xr and c_xl are the column coordinates of the two cameras' principal points.
$$\text{depth} = \frac{f \cdot b}{d - (c_{xr} - c_{xl})}$$
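A direct sketch of this formula (names are illustrative; pixels whose corrected disparity is not positive are treated as invalid):

```python
import numpy as np

def disparity_to_depth(disp, f, b, c_xr, c_xl):
    """depth = f * b / (d - (c_xr - c_xl)); invalid pixels get depth 0."""
    denom = disp - (c_xr - c_xl)
    depth = np.zeros_like(disp, dtype=np.float64)
    valid = denom > 0
    depth[valid] = f * b / denom[valid]
    return depth
```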
In conclusion, the pipeline defect identification and positioning method based on target detection and binocular vision finally obtains the spatial three-dimensional coordinates, in the actual three-dimensional space, of the object recognized in the stereo-rectified left camera image of the binocular camera, completing target detection and positioning.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention; any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included within its scope.

Claims (8)

1. A pipeline defect identification and positioning method based on target detection and binocular vision, characterized by comprising four parts: a camera calibration module, a stereo correction module, an image capture and target detection module, and a stereo matching and depth calculation module; the method comprises the following steps:
step one: camera calibration: establishing a geometric model of camera imaging, determining the relation between the three-dimensional geometric position of a point on the surface of a space object and its corresponding point in the image, and solving the calibration parameters of the binocular camera from the image coordinates and world coordinates of the feature points on the calibration board, the calibration parameters comprising the intrinsic parameters, extrinsic parameters and distortion parameters of the left and right cameras;
step two: stereo correction: carrying out stereo correction through the epipolar constraint so that corresponding points in the two images lie on the same horizontal epipolar line, and acquiring the calibration parameters of the corrected binocular camera;
step three: image capture and target detection: building a data set, training the target detection model, capturing images with the binocular camera, predicting the real-time picture displayed by the left camera, and outputting the object identification information;
the method for realizing image capture and target detection comprises the following steps:
step 3.1: capturing and screening pipeline defect images by using a binocular camera and a web crawler, annotating and classifying the pipeline defect images, and creating a data set;
the defect classes are as follows; apart from the ND class, which marks un-annotated samples, the other 9 semantic labels are:
OB: surface damage; FO: obstacle; RB: cracks, breaks and collapses; RO: plant root intrusion; DE: deformation; FS: displaced joint; GR: branch pipe; BE: attached deposits; AF: settled (fixed) deposits;
step 3.2: sorting the pipeline defect images and annotations into an organized directory, loading the pipeline defect images and annotations of the data set through the __getitem__ method, using COCOEvaluator as the evaluator, and putting all the contents involved in the deep learning model, including the model settings, training settings and test settings, into a single Exp file;
step 3.3: after the setup is complete, initializing the model with pre-training weights obtained on the COCO open-source data set, entering the training command at the command line to start GPU training, and obtaining the weight file and the deep learning target detection model when training finishes;
the deep learning convolutional neural network structure consists of three parts: feature extraction, feature fusion, and category prediction with detection-box calibration;
the feature extraction part comprises a Focus interlaced-sampling splicing structure and four DarkNet structures; Focus serves as the input part of the whole network: starting from the default 3 × 640 × 640 input, a slicing operation cuts the picture into four 3 × 320 × 320 slices, the four slices are spliced in depth using concat to generate a 12 × 320 × 320 output, and the result is finally passed to the next convolutional layer via batch_norm and leaky_relu, as sketched below; the DarkNet structure consists of CSP and SPP: the original input is split into two branches, and a convolution in each halves the number of channels; the first branch drops the Bottleneck operation of YOLOv5 in favour of a ResUnit structure, after which branch one and branch two are concatenated, so that the input and output of CSP have the same size; the purpose of CSP is to let the model learn more features; the input of SPP is 512 × 20 × 20, a 1 × 1 convolutional layer outputs 256 × 20 × 20, three parallel MaxPool layers then sample it, the results are merged with the initial feature into 1024 × 20 × 20, and finally 512 convolution kernels restore the result to 512 × 20 × 20;
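Purely as an illustration of the Focus slicing just described (the indexing below follows the common YOLOv5/YOLOX implementation and is an assumption, not the claim text):

```python
import torch

# 3 x 640 x 640 input -> four interlaced 3 x 320 x 320 slices -> depth concat.
x = torch.randn(1, 3, 640, 640)                  # batch of one default input
slices = [x[..., ::2, ::2], x[..., 1::2, ::2],   # even/odd row-column samples
          x[..., ::2, 1::2], x[..., 1::2, 1::2]]
out = torch.cat(slices, dim=1)                   # splice the slices in depth
print(out.shape)                                 # torch.Size([1, 12, 320, 320])
```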
the feature fusion part mixes and combines the image features and passes them on to the prediction layer; PAFPN, built on the Mask R-CNN and FPN frameworks, is adopted to strengthen information transfer and to preserve spatial information accurately, which helps locate pixels properly to form masks;
the last part of the network structure predicts the image features, generating bounding boxes and predicting categories; first, a Decoupled Head extracts three feature layers for target detection, and an anchor-free detector is added to improve detection speed; at the output, the three outputs are 85 × 20 × 20, 85 × 40 × 40 and 85 × 80 × 80, respectively; the defect prediction information is obtained by decoding the prediction results;
the deep learning convolutional neural network structure is built with reference to the YOLOX network; compared with YOLOX, some CBS operations and convolutions are removed, simplifying the feature extraction part of the network; meanwhile, the calculation of the loss function is modified, training with the standard binary cross-entropy loss:
$$\mathcal{L} = -\sum_{c=1}^{C} w_c \left[ y_c \log \sigma(x_c) + (1 - y_c) \log\big( 1 - \sigma(x_c) \big) \right]$$

where C is the number of semantic labels (num_classes = 9), y_c ∈ {0, 1} indicates whether class c is present in the current picture, x_c is the raw model output for class c, σ is the Sigmoid activation function, and w_c is the weight of class c, computed from N, the number of images in training, and N_c, the number of images containing class c;
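A minimal sketch of this loss follows; since the exact definition of w_c is given only in the patent drawing, the inverse-frequency weight used here (w_c = N / N_c) is an assumption for illustration, as are all the numbers:

```python
import torch
import torch.nn.functional as F

num_classes = 9
N_total = 1000.0                                   # training images (made up)
n_c = torch.tensor([500., 80., 120., 60., 40., 90., 150., 70., 30.])  # per-class counts (made up)
w = N_total / n_c                                  # assumed form of the weight w_c

logits = torch.randn(16, num_classes)              # raw outputs x_c for a batch
targets = torch.randint(0, 2, (16, num_classes)).float()  # labels y_c in {0, 1}

# binary_cross_entropy_with_logits applies the Sigmoid internally and scales
# each class term by w_c, matching the loss written above.
loss = F.binary_cross_entropy_with_logits(logits, targets, weight=w)
```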
step 3.4: capturing a real-time image of the pipeline through a binocular camera, inputting the real-time image of the pipeline into a deep learning target detection model, and outputting object identification information;
step four: stereo matching and depth calculation: transmitting the stereo-rectified images of the left and right cameras and the object identification information from step three as inputs to the stereo matching and depth calculation module, to obtain the spatial three-dimensional coordinates, mapped into the actual three-dimensional space, of the object to be identified in the left camera image of the binocular camera;
the stereo matching and depth calculation module is implemented by the following steps:
step 4.1: calculating the matching cost of each pixel of the original image within a preset disparity range from the stereo-rectified left and right camera images of the binocular camera;
step 4.2: calculating the multi-path aggregated cost value of each pixel within the preset disparity range from the matching cost of each pixel within the preset disparity range;
step 4.3: calculating the post-aggregation disparity of each pixel from the multi-path aggregated cost value of each pixel within the preset disparity range;
step 4.4: performing disparity optimization on the post-aggregation disparity of each pixel to obtain the disparity map of the stereo-rectified binocular camera;
the disparity optimization comprises rejecting mismatches, improving disparity precision, and suppressing noise; mismatches are rejected with a left-right consistency check: the left and right images are swapped and stereo matching is performed again to obtain a second disparity map; according to the uniqueness constraint of disparity, the disparity map of the left image is used to find, for each pixel, its same-name pixel in the right image and that pixel's disparity value; if the absolute value of the difference between the two disparity values is less than 1, the uniqueness constraint is satisfied and the pixel is kept, otherwise the constraint is not satisfied and the pixel is rejected; meanwhile, isolated outliers are removed by connected-component detection, eliminating small patches caused by mismatches in the disparity map and filtering out small isolated speckles;
$$D(p) = \begin{cases} D_L(p), & \left| D_L(p) - D_R\!\left( p - \left( D_L(p), 0 \right) \right) \right| < 1 \\ \text{invalid}, & \text{otherwise} \end{cases}$$
improving the disparity precision: a sub-pixel optimization technique is adopted; sub-pixel precision is obtained by quadratic-curve interpolation, fitting a quadratic curve through the cost value of the optimal disparity and the cost values of the two adjacent disparities, the disparity value corresponding to the extreme point of the curve being the new sub-pixel disparity value;
suppressing noise: Kalman filtering is adopted, and the disparity is optimally corrected by the Kalman gain;
$$\hat{x}(k+1 \mid k+1) = \hat{x}(k+1 \mid k) + W(k+1)\left[ z(k+1) - \hat{z}(k+1 \mid k) \right]$$

is the state estimation equation, in which \hat{x}(k+1 \mid k) represents the estimate of x(k+1) made at time k and W(k+1) is the Kalman gain;

$$\hat{z}(k+1 \mid k) = H(k+1)\,\hat{x}(k+1 \mid k)$$

is the observation estimation equation, with

$$W(k+1) = P(k+1 \mid k)\,H^{\mathsf{T}}(k+1)\left[ H(k+1)\,P(k+1 \mid k)\,H^{\mathsf{T}}(k+1) + R(k+1) \right]^{-1};$$
step 4.5: performing depth calculation on the disparity map of the stereo-rectified binocular camera to obtain the depth map of the stereo-rectified binocular left image, and finally, combining the object recognition information obtained in step 3.4, obtaining the spatial three-dimensional coordinates, mapped into the actual three-dimensional space, of the recognized object in the stereo-rectified binocular left image.
2. The method for identifying and positioning the defects of the pipeline based on the target detection and the binocular vision according to claim 1, wherein the method for realizing the camera calibration comprises the following steps:
step 1.1: making a chessboard calibration board composed of alternating black and white squares, shooting chessboard images of the calibration board from different directions with the binocular camera, extracting the corner information of each chessboard image, and obtaining the image coordinates and spatial three-dimensional coordinates of all inner corner points on the chessboard images;
step 1.2: establishing a geometric model of camera imaging, and determining the mutual relation between the three-dimensional geometric position of one point on the surface of the space object and the corresponding point in the image;
step 1.3: taking the image coordinates and spatial three-dimensional coordinates of all inner corner points on the chessboard images obtained in step 1.1 as input, and, according to the geometric model of camera imaging, solving for and outputting the intrinsic and extrinsic parameter matrices of the left and right cameras of the binocular camera through experiment and calculation;
step 1.4: using the coordinate relationship before and after distortion correction, solving the 5 distortion parameters k_1, k_2, k_3, p_1, p_2 from the intrinsic and extrinsic parameter matrices of the left and right cameras of the binocular camera, so as to correct the distortion.
3. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 2, wherein in step 1.2, the extrinsic parameter matrix

$$\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}$$

reflects the conversion between the camera coordinate system and the world coordinate system, where R is the rotation matrix of the right camera relative to the left camera of the binocular camera and t is the translation vector of the right camera relative to the left camera of the binocular camera; the intrinsic parameter matrix

$$\begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

reflects the conversion between the pixel coordinate system and the camera coordinate system, where f is the focal length of the lens, (u_0, v_0) are the coordinates of the origin of the image coordinate system in the pixel coordinate system, and d_x, d_y are the sizes of each pixel in the x-axis and y-axis directions of the image coordinate system.
4. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 2, wherein in step 1.4, the distortion is corrected by:

$$\begin{cases} x_{corr} = x_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x_p y_p + p_2\,(r^2 + 2 x_p^2) \\ y_{corr} = y_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1\,(r^2 + 2 y_p^2) + 2 p_2 x_p y_p \end{cases}$$

where (x_p, y_p) are the original image coordinates, (x_corr, y_corr) are the image coordinates after correction, and r is the distance of the point from the optical centre (r^2 = x_p^2 + y_p^2).
5. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 3, wherein the stereo correction is realized by the following steps:
step 2.1: splitting the rotation matrix R of the right camera relative to the left camera of the binocular camera into the composite rotation matrix r_1 of the left camera and the composite rotation matrix r_2 of the right camera, so that each camera rotates by half; the optical axes of the left and right cameras then become parallel and their imaging planes coplanar;
step 2.2: inputting the composite rotation matrix r_1 of the left camera, the composite rotation matrix r_2 of the right camera, the original intrinsic matrices of the left and right cameras, the translation vector t, and the size of the chessboard image into OpenCV, and outputting, through the cvStereoRectify function, the row-alignment correction rotation matrix R_1 of the left camera, the row-alignment correction rotation matrix R_2 of the right camera, the corrected intrinsic matrix M_l of the left camera, the corrected intrinsic matrix M_r of the right camera, the corrected projection matrix P_l of the left camera, the corrected projection matrix P_r of the right camera, and the reprojection matrix Q;
step 2.3: taking the output of the cvStereoRectify function in step 2.2 as known constants, using reverse mapping through the correction lookup tables of the left and right views: for each integer pixel position on the target image, the corresponding floating-point position on the source image is found and the value interpolated from the surrounding source pixels; after the corrected images are assigned, the images are cropped and the correction results saved.
6. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 1, wherein the matching cost is calculated by:

$$e(x_R, y, d) = \min\Big\{ \min_{x_R - \frac{1}{2} \le x \le x_R + \frac{1}{2}} \big| \hat{I}_R(x, y) - I_T(x_R + d, y) \big|,\ \min_{x_R + d - \frac{1}{2} \le x \le x_R + d + \frac{1}{2}} \big| I_R(x_R, y) - \hat{I}_T(x, y) \big| \Big\}$$

where e(x_R, y, d) denotes the absolute value of the pixel grey-value difference; \hat{I}_R(x, y) denotes the interpolated grey value of the candidate image at sub-pixel positions between the pixels (x_R - 0.5, y) and (x_R + 0.5, y); \hat{I}_T(x, y) denotes the interpolated grey value of the matched image at sub-pixel positions between the pixels (x_R + d - 0.5, y) and (x_R + d + 0.5, y); I_R(x_R, y) and I_T(x_R + d, y) are the grey values of the candidate and matched image pixels; x_R represents the abscissa of the candidate image pixel, y the ordinate of the image pixel, d the disparity, and x the abscissa of the matched image pixel.
7. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 1, wherein the multi-path aggregated cost value of each pixel within the preset disparity range is calculated as follows:
calculate the path cost of pixel p along path r:

$$L_r(p, d) = C(p, d) + \min\!\Big( L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_i L_r(p - r, i) + P_2 \Big) - \min_i L_r(p - r, i)$$

where p denotes a pixel, r a path, d a disparity, p - r the previous pixel on the path of pixel p, and L the aggregated cost value of a path; L_r(p - r, d) is the cost value of the previous pixel on the path at disparity d, L_r(p - r, d - 1) the cost value of the previous pixel at disparity d - 1, L_r(p - r, d + 1) the cost value of the previous pixel at disparity d + 1, and min_i L_r(p - r, i) the minimum of all cost values of the previous pixel; C(p, d) is the initial cost value;
the total path cost value S is obtained from the path costs of pixel p along the paths r:

$$S(p, d) = \sum_r L_r(p, d).$$
8. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 1, wherein the depth calculation is performed by:

$$\text{depth} = \frac{f \cdot b}{d - (c_{xr} - c_{xl})}$$

where f is the focal length of the lens, b is the baseline length, d is the disparity, c_xr is the column coordinate of the right camera's principal point, and c_xl is the column coordinate of the left camera's principal point.