CN114067197B - Pipeline defect identification and positioning method based on target detection and binocular vision

Info

Publication number: CN114067197B
Application number: CN202111360831.6A
Authority: CN (China)
Original language: Chinese (zh)
Other versions: CN114067197A
Prior art keywords: image, camera, pixel, parallax, binocular
Legal status: Active (application granted)
Inventors: 何洪权, 葛昱彤, 申安慧, 张琳, 朱一沁
Original and current assignee: Henan University
Application filed by Henan University; published as CN114067197A, granted as CN114067197B.

Classifications

    • G06T7/0004 - Image analysis; inspection of images, e.g. flaw detection; industrial image inspection
    • G06F18/22 - Pattern recognition; matching criteria, e.g. proximity measures
    • G06F18/24 - Pattern recognition; classification techniques
    • G06F18/253 - Pattern recognition; fusion techniques of extracted features
    • G06N3/045 - Neural networks; architecture; combinations of networks
    • G06N3/048 - Neural networks; architecture; activation functions
    • G06N3/08 - Neural networks; learning methods
    • G06T5/80
    • G06T7/55 - Image analysis; depth or shape recovery from multiple images
    • G06T7/73 - Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06T7/85 - Image analysis; stereo camera calibration
    • G06T2207/10004 - Image acquisition modality; still image; photographic image
    • G06T2207/10024 - Image acquisition modality; color image
    • G06T2207/10028 - Image acquisition modality; range image; depth image; 3D point clouds
    • G06T2207/30108 - Subject of image; industrial image inspection
    • G06T2207/30132 - Subject of image; masonry; concrete
    • G06T2207/30181 - Subject of image; earth observation
    • G06T2207/30184 - Subject of image; infrastructure

Abstract

The invention provides a pipeline defect identification and positioning method based on target detection and binocular vision, comprising the following steps: calibrating and stereo-matching a binocular camera to obtain images with relatively small distortion, and determining, through stereo matching and depth calculation, the specific functional relation between the position of a point in the world coordinate system and the position of its pixel in the image; training a target detection model with a target detection algorithm and recognizing video images; when a certain type of defect is recognized in an image, the model frames the defect target and displays the world coordinates of the center of the defect target on the screen. The method for detecting and positioning pipeline defects has the advantages of a wide monitoring range, good real-time performance, high accuracy and accurate positioning. Meanwhile, the method overcomes the limitation of traditional pipeline detection methods such as magnetic flux leakage detection, which can only inspect ferromagnetic pipelines; it produces no false signals and improves the situations of false alarms and missed alarms to a certain extent.

Description

Pipeline defect identification and positioning method based on target detection and binocular vision
Technical Field
The invention relates to the technical field of target detection and multi-view vision positioning in computer vision, in particular to a pipeline defect identification and positioning method based on target detection and binocular vision.
Background
Target detection adopts a single neural network to directly predict object boundaries and class probabilities, realizing end-to-end detection. Binocular vision is an important distance perception technology among passive computer ranging methods: on the premise of non-contact measurement, after the cameras are calibrated, stereo matching and other operations are performed, the parallax between the left and right images is calculated and a depth map is obtained, thereby reflecting the coordinate position of an object in the real world.
YOLOX is the latest result of the YOLO algorithm family; compared with the YOLOv3-v5 series, it improves recognition precision and holds a certain competitive advantage in speed. Integrating the advantages of each version, it improves target detection precision through an improved data enhancement strategy and more training samples, and adds an anchor-free detector to improve detection speed. Substituting a model trained under the YOLOX network framework into the recognition network enables multi-target, multi-class detection, with a wide range of detectable classes, high accuracy and high speed.
However, the mainstream methods for detecting pipeline defects, namely photoelectric detection methods, include endoscopy, laser projection, CCD camera image acquisition, and the like. Although these methods adopt detection technology from the field of image vision, their detection precision is low, further measurement of defect size is not involved, considerable manpower is consumed, data of the inner surface ring of the pipeline cannot be extracted, and the defect condition of the inner pipeline surface cannot be further characterized visually. The limitations of computer vision methods in the field of pipeline defect classification are mainly low detection accuracy and the lack of data sets.
Disclosure of Invention
The invention provides a pipeline defect identification and positioning method based on target detection and binocular vision, aiming at the technical problem that existing pipeline identification and positioning technology cannot detect and position internal pipeline defects with high precision when actually detecting defect targets.
The technical scheme of the invention is realized as follows:
A pipeline defect identification and positioning method based on target detection and binocular vision comprises four parts: a camera calibration module, a stereo correction module, an image capture and target detection module, and a stereo matching and depth calculation module; the method comprises the following steps:
Step one: camera calibration: establishing a camera imaging geometric model, determining the relation between the three-dimensional geometric position of a point on the surface of a space object and its corresponding point in the image, and solving the calibration parameters of the binocular camera from the image coordinates and world coordinates of the feature points on the calibration plate, the calibration parameters comprising the intrinsic parameters, extrinsic parameters and distortion parameters of the left and right cameras;
Step two: stereo correction: performing stereo correction through the epipolar constraint so that corresponding points in the two images lie on the same horizontal epipolar line, and acquiring the calibration parameters of the corrected binocular camera;
Step three: image capture and target detection: building a data set, training a target detection model, capturing images with the binocular camera, predicting the real-time picture of the left camera, and outputting object identification information;
Step four: stereo matching and depth calculation: transmitting the stereo-corrected left and right camera images and the object identification information from step three as inputs to the stereo matching and depth calculation module, obtaining the spatial three-dimensional coordinates to which the recognized object in the left camera image of the binocular camera is mapped in the actual three-dimensional space.
Preferably, the method for implementing camera calibration includes:
Step 1.1: making a chessboard calibration board composed of alternating black and white squares, shooting chessboard images of the calibration board with the binocular camera from different directions, extracting the corner information of each chessboard image, and obtaining the image coordinates and spatial three-dimensional coordinates of all inner corner points on the chessboard images;
Step 1.2: establishing a geometric model of camera imaging, and determining the relation between the three-dimensional geometric position of a point on the surface of the space object and its corresponding point in the image;
Step 1.3: taking the image coordinates and spatial three-dimensional coordinates of all inner corner points obtained in step 1.1 as input, solving and outputting the intrinsic and extrinsic parameter matrices of the left and right cameras of the binocular camera through experiment and calculation according to the geometric model of camera imaging;
Step 1.4: solving the 5 distortion parameters k_1, k_2, k_3, p_1, p_2 from the intrinsic and extrinsic parameter matrices of the left and right cameras using the coordinate relationship before and after distortion correction, so as to correct the distortion.
Preferably, in step 1.2, the extrinsic parameter matrix

$$W = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}$$

reflects the conversion between the camera coordinate system and the world coordinate system, where R is the rotation matrix of the right camera relative to the left camera of the binocular camera, and t is the translation vector of the right camera relative to the left camera; the intrinsic parameter matrix

$$M = \begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

reflects the conversion between the pixel coordinate system and the camera coordinate system, where f is the focal length of the lens, (u_0, v_0) are the coordinates of the origin of the image coordinate system in the pixel coordinate system, and d_x, d_y are the physical sizes of a pixel along the x-axis and y-axis of the image coordinate system.
Preferably, in step 1.4, the distortion is corrected by:

$$\begin{aligned} x_{corr} &= x_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x_p y_p + p_2 (r^2 + 2 x_p^2) \\ y_{corr} &= y_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y_p^2) + 2 p_2 x_p y_p \end{aligned}$$

where (x_p, y_p) are the original image coordinates, (x_corr, y_corr) are the corrected image coordinates, and r is the radial distance variable (r^2 = x_p^2 + y_p^2).
Preferably, the implementation method of the stereo correction is as follows:
Step 2.1: decomposing the rotation matrix R of the right camera relative to the left camera into a synthetic rotation matrix r_1 for the left camera and a synthetic rotation matrix r_2 for the right camera, so that each camera rotates by half, making the optical axes of the left and right cameras parallel and their imaging planes coplanar;
Step 2.2: inputting the synthetic rotation matrices r_1 and r_2, the original left and right camera intrinsic matrices, the translation vector t, and the size of the chessboard image into OpenCV, and outputting through the cvStereoRectify function the row-alignment correction rotation matrix R_1 of the left camera, the row-alignment correction rotation matrix R_2 of the right camera, the corrected left camera intrinsic matrix M_l, the corrected right camera intrinsic matrix M_r, the corrected left camera projection matrix P_l, the corrected right camera projection matrix P_r, and the reprojection matrix Q;
Step 2.3: taking the output of the cvStereoRectify function in step 2.2 as known constants, using reverse mapping through the rectification lookup tables of the left and right views to find, for each integer pixel position on the target image, the floating-point position on the corresponding source image, interpolating its value from the surrounding integer source pixels, and, after the corrected image is assigned, cropping the image and saving the correction result.
Preferably, the method for image capture and target detection is as follows:
Step 3.1: capturing and screening pipeline defect images with a binocular camera and a web crawler, annotating and classifying them, and creating a data set adapted to the environment;
Step 3.2: organizing the pipeline defect images and annotations in a directory structure, loading them in the data set through the __getitem__ method, using the COCO Evaluator as the evaluator, and putting everything involved in the deep learning model, including model settings, training settings and test settings, into a single Exp file;
Step 3.3: after setup, initializing the model with pre-training weights obtained from training on the COCO open-source data set, entering the training command on the command line to start GPU training, and obtaining a weight file and a deep learning target detection model after training is finished;
Step 3.4: capturing real-time pipeline images with the binocular camera, inputting them into the deep learning target detection model, and outputting object identification information.
Preferably, the method for stereo matching and depth calculation is as follows:
Step 4.1: calculating the matching cost of each pixel of the original image within the preset parallax range from the stereo-corrected left and right camera images of the binocular camera;
Step 4.2: calculating the multipath cost aggregation value of each pixel within the preset parallax range from the matching cost of each pixel within the preset parallax range;
Step 4.3: calculating the disparity of each pixel after cost aggregation from the multipath cost aggregation value of each pixel within the preset parallax range;
Step 4.4: performing parallax optimization according to the disparity of each pixel after cost aggregation to obtain the disparity map of the stereo-corrected left camera of the binocular camera;
Step 4.5: performing depth calculation on the disparity map of the stereo-corrected left camera to obtain a depth map of the stereo-corrected left camera image, and finally, combining the object identification information obtained in step 3.4, obtaining the spatial three-dimensional coordinates to which the recognized object in the stereo-corrected left camera image is mapped in the actual three-dimensional space.
Preferably, the matching cost is calculated as:

$$e(x_R, y, d) = \min\!\left( \min_{x_R + d - \frac{1}{2} \le x \le x_R + d + \frac{1}{2}} \left| I_R(x_R, y) - \hat{I}_T(x, y) \right|,\; \min_{x_R - \frac{1}{2} \le x \le x_R + \frac{1}{2}} \left| \hat{I}_R(x, y) - I_T(x_R + d, y) \right| \right)$$

where e(x_R, y, d) denotes the absolute value of the pixel gray-value difference; Î_T(x, y) denotes the interpolated gray value at a sub-pixel position between the matched-image pixels (x_R + d - 0.5, y) and (x_R + d + 0.5, y); Î_R(x, y) denotes the interpolated gray value at a sub-pixel position between the candidate-image pixels (x_R - 0.5, y) and (x_R + 0.5, y); x_R is the abscissa of the candidate image pixel, y the ordinate of the image pixel, d the disparity, and x the abscissa of the matching image pixel.
Preferably, the method for calculating the multipath cost aggregation value of each pixel in the preset parallax range includes:
calculating the path cost of pixel p along path r:

$$L_r(p, d) = C(p, d) + \min\!\big( L_r(p-r, d),\; L_r(p-r, d-1) + P_1,\; L_r(p-r, d+1) + P_1,\; \min_i L_r(p-r, i) + P_2 \big) - \min_k L_r(p-r, k)$$

where p denotes a pixel, r a path, d a disparity, p - r the preceding pixel of p along the path, and L_r the path aggregation cost value; L_r(p - r, d) is the cost value of the preceding pixel at disparity d, L_r(p - r, d - 1) at disparity d - 1, L_r(p - r, d + 1) at disparity d + 1, and min_i L_r(p - r, i) the minimum of all cost values of the preceding pixel; C(p, d) is the initial cost value;
the total path cost value S is obtained from the path costs of pixel p along all paths r:

$$S(p, d) = \sum_r L_r(p, d)$$
Preferably, the depth calculation method includes:

$$depth = \frac{f \cdot b}{d - (c_{xl} - c_{xr})}$$

where f is the focal length, b is the baseline length, d is the disparity, c_xr is the column coordinate of the right camera's principal point, and c_xl is the column coordinate of the left camera's principal point.
Compared with the prior art, the invention has the following beneficial effects:
1) The invention adopts a binocular RGBD camera to predict and position the categories of pipeline defects, improving detection speed and precision over existing methods, and provides a classification method for a pipeline defect data set. Deep learning is applied to binocular vision positioning to realize target detection: a deep learning convolutional neural network is built with reference to the YOLOX network and appropriately adjusted, simplifying the feature extraction part and the network structure while ensuring information transmission and properly positioning pixels to improve the prediction rate.
2) The invention reduces detection cost and can predict and position pipelines of various types and quantities, with the advantages of a wide monitoring range, good real-time performance, high precision and accurate positioning. Meanwhile, the invention overcomes the limitation of traditional pipeline detection methods such as magnetic flux leakage detection, which can only inspect ferromagnetic pipelines; it produces no false signals and improves the situations of false alarms and missed alarms to a certain extent.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic structural diagram of a deep learning convolutional neural network according to the present invention.
Fig. 3 is a substructure diagram of a network used in fig. 2.
FIG. 4 is an exemplary classification and labeling of pipe defects according to the present invention.
Fig. 5 is a flowchart of camera calibration according to the present invention.
Fig. 6 is a flow chart of stereo calibration according to the present invention.
Fig. 7 is a flow chart of stereo matching and depth calculation according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings. It is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention fall within the scope of the present invention.
As shown in fig. 1, the embodiment of the invention discloses a pipeline defect identification and positioning method based on target detection and binocular vision, which comprises four parts: a camera calibration module, a stereo correction module, an image capture and target detection module, and a stereo matching and depth calculation module.
Camera calibration module: outputs the intrinsic parameters, extrinsic parameters and distortion coefficients, which are used to correct images subsequently shot by the cameras and obtain images with relatively small distortion. The inputs are the image coordinates and world coordinates of the known feature points of the calibration plate; a camera imaging geometric model is established, the relation between camera image pixel coordinates and scene point three-dimensional coordinates is determined, and the intrinsic and extrinsic parameters and distortion coefficients of the cameras are output.
Stereo correction module: a vision system with an intersecting-optical-axis structure is adopted; the relative positions of the two cameras are changed by decomposing the rotation matrix into row-alignment correction rotation matrices, so that corresponding points in the two images lie on the same horizontal epipolar line. The corrected camera parameters are obtained mathematically from the original camera data. The two-dimensional search becomes one-dimensional, which reduces the matching search space and improves the search rate of stereo matching. Finally, image correction is completed through the rectification lookup tables, and the images are cropped and saved.
Target detection module: the input is the RGB three-channel image shot by the stereo-corrected left camera. A deep learning network based on a target detection algorithm is trained on a self-built pipeline defect data set; the network infers and predicts on the image, outputting the defect semantic label class recognized in the left camera image, the recognition box center coordinates (x, y) and its width and height (w, h). The semantic label, the recognition box center coordinates, and the width and height data serve as the object identification information.
Stereo matching and depth calculation module: taking the stereo-corrected left and right camera images and the object identification information obtained by deep learning recognition as input, a disparity map of the stereo-corrected left camera is obtained through the stereo matching algorithm and converted into a depth map through depth calculation; finally, combined with the object identification information, the module outputs the spatial three-dimensional coordinates to which the recognized object in the stereo-corrected left camera image is mapped in the actual three-dimensional space.
The method comprises the following specific steps:
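The flow of the four modules can be summarized in the sketch below. This is an illustrative Python outline only, not the patented implementation: `rectify_pair`, `stereo_match` and the `calib` container are hypothetical helpers standing in for the modules described above, and only `cv2.reprojectImageTo3D` is a real OpenCV call.

```python
import cv2

def locate_defects(frame_left, frame_right, calib, detector):
    # Step two: stereo rectification using the calibration output (hypothetical helper)
    rect_left, rect_right = rectify_pair(frame_left, frame_right, calib)
    # Step three: target detection on the left image
    detections = detector(rect_left)          # [(label, (x, y), (w, h)), ...]
    # Step four: stereo matching and depth calculation (hypothetical helper)
    disparity = stereo_match(rect_left, rect_right)
    points_3d = cv2.reprojectImageTo3D(disparity, calib.Q)
    # Map each recognition-box centre to a spatial three-dimensional coordinate
    return [(label, points_3d[int(cy), int(cx)])
            for label, (cx, cy), _ in detections]
```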
Step one: camera calibration: establishing a camera imaging geometric model, determining the relation between the three-dimensional geometric position of a point on the surface of a space object and its corresponding point in the image, and solving the calibration parameters of the binocular camera from the image coordinates and world coordinates of the feature points on the calibration plate; the calibration parameters comprise the intrinsic parameters, extrinsic parameters and distortion parameters of the left and right cameras, which serve as the camera calibration parameters. The intrinsic, extrinsic and distortion parameters are used to correct images subsequently shot by the cameras to obtain images with relatively small distortion.
The method for realizing the camera calibration comprises the following steps:
Step 1.1: a chessboard calibration plate composed of alternating black and white squares is manufactured, and chessboard images of the calibration plate are shot with the binocular camera from different directions, so that the single-plane checkerboard is imaged clearly in both the left and right cameras. The corner information of each chessboard image is extracted to obtain the image coordinates and spatial three-dimensional coordinates of all inner corner points on the chessboard image;
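A minimal OpenCV sketch of step 1.1 for a single view is given below; the board dimensions and file name are assumptions for illustration.

```python
import cv2
import numpy as np

pattern = (9, 6)                  # inner corners per row/column (assumed board size)
img = cv2.imread("left_01.png")   # assumed file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
found, corners = cv2.findChessboardCorners(gray, pattern)
if found:
    # refine the detected corners to sub-pixel accuracy
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    # spatial three-dimensional coordinates of the inner corners
    # (Z = 0 on the board plane, in units of one square)
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)
```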
Step 1.2: establishing a geometric model of camera imaging, and determining the relation between the three-dimensional geometric position of a point on the surface of the space object and its corresponding point in the image;
in step 1.2, the extrinsic parameter matrix

$$W = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}$$

reflects the conversion between the camera coordinate system and the world coordinate system, where R is the rotation matrix of the right camera relative to the left camera of the binocular camera, and t is the translation vector of the right camera relative to the left camera; the intrinsic parameter matrix

$$M = \begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

reflects the conversion between the pixel coordinate system and the camera coordinate system, where f is the focal length of the lens, (u_0, v_0) are the coordinates of the origin of the image coordinate system in the pixel coordinate system, and d_x, d_y are the physical sizes of a pixel along the x-axis and y-axis of the image coordinate system.
Step 1.3: taking the image coordinates and spatial three-dimensional coordinates of all inner corner points obtained in step 1.1 as input, solving and outputting the intrinsic and extrinsic parameter matrices of the left and right cameras of the binocular camera through experiment and calculation according to the geometric model of camera imaging;
Step 1.4: solving the 5 distortion parameters k_1, k_2, k_3, p_1, p_2 from the intrinsic and extrinsic parameter matrices of the left and right cameras using the coordinate relationship before and after distortion correction, so as to correct the distortion.
In step 1.4, the distortion is corrected by:

$$\begin{aligned} x_{corr} &= x_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x_p y_p + p_2 (r^2 + 2 x_p^2) \\ y_{corr} &= y_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y_p^2) + 2 p_2 x_p y_p \end{aligned}$$

where (x_p, y_p) are the original image coordinates and (x_corr, y_corr) are the corrected image coordinates; (x_corr, y_corr) is an approximation described by a Taylor series expansion about r = 0.
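For illustration, the 5-parameter model can be sketched in Python as the forward mapping from ideal normalized coordinates to distorted ones; this is a sketch under the assumption that the formula follows the standard radial-tangential model.

```python
# forward radial-tangential distortion model (works on scalars or NumPy arrays)
def distort(xp, yp, k1, k2, k3, p1, p2):
    r2 = xp ** 2 + yp ** 2                              # r squared
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3  # radial term
    x = xp * radial + 2 * p1 * xp * yp + p2 * (r2 + 2 * xp ** 2)
    y = yp * radial + p1 * (r2 + 2 * yp ** 2) + 2 * p2 * xp * yp
    return x, y
```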
Step two: stereo correction: performing stereo correction through the epipolar constraint so that corresponding points in the two images lie on the same horizontal epipolar line, and acquiring the calibration parameters of the corrected binocular camera. OpenCV is called to obtain the parameters of the corrected left and right cameras and finish the correction, and the corrected images are finally obtained through rectification mapping.
The implementation method of the stereo correction comprises the following steps:
Step 2.1: decomposing the rotation matrix R of the right camera relative to the left camera into a synthetic rotation matrix r_1 for the left camera and a synthetic rotation matrix r_2 for the right camera, so that each camera rotates by half, making the optical axes of the left and right cameras parallel and their imaging planes coplanar;
Step 2.2: inputting the synthetic rotation matrices r_1 and r_2, the original left and right camera intrinsic matrices, the translation vector t, and the size of the chessboard image into OpenCV, and outputting through the cvStereoRectify function the row-alignment correction rotation matrix R_1 of the left camera, the row-alignment correction rotation matrix R_2 of the right camera, the corrected left camera intrinsic matrix M_l, the corrected right camera intrinsic matrix M_r, the corrected left camera projection matrix P_l, the corrected right camera projection matrix P_r, and the reprojection matrix Q;
Step 2.3: taking the output of the cvStereoRectify function in step 2.2 as known constants, using reverse mapping through the rectification lookup tables of the left and right views to find, for each integer pixel position on the target image, the floating-point position on the corresponding source image, interpolating its value from the surrounding integer source pixels, and, after the corrected image is assigned, cropping the image and saving the correction result.
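A sketch of steps 2.1-2.3 with OpenCV's Python bindings follows (the description names the legacy cvStereoRectify; cv2.stereoRectify performs the half-rotation split internally). The calibration variables M_l, dist_l, M_r, dist_r, R, t, image_size and the captured frames raw_left, raw_right are assumed to come from step one.

```python
import cv2

# M_l, M_r: intrinsic matrices; dist_l, dist_r: distortion vectors
# R, t: rotation/translation of the right camera relative to the left
# image_size: (width, height) of the calibration images
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    M_l, dist_l, M_r, dist_r, image_size, R, t)
map1x, map1y = cv2.initUndistortRectifyMap(
    M_l, dist_l, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(
    M_r, dist_r, R2, P2, image_size, cv2.CV_32FC1)
# reverse mapping with interpolation of the surrounding source pixels
rect_left = cv2.remap(raw_left, map1x, map1y, cv2.INTER_LINEAR)
rect_right = cv2.remap(raw_right, map2x, map2y, cv2.INTER_LINEAR)
```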
Step three: image capture and target detection: building a data set, training a target detection model, capturing images with the binocular camera, predicting the real-time picture of the left camera, and outputting object identification information. The corrected left camera image obtained in the previous step is input to the target detection module, and a pipeline defect data set is self-built by annotating and classifying images captured by the binocular camera. A convolutional neural network aggregates image features at different granularities of the images, mixes and combines them, and passes them to the prediction layer, which predicts the image features, generates bounding boxes and predicts categories. The recognized defect semantic label class, the recognition box center coordinates (x, y) and the width and height (w, h) in the left camera image are output, and the target detection result is displayed.
The method for realizing image capture and target detection comprises the following steps:
Step 3.1: capturing and screening pipeline defect images with a binocular camera and a web crawler, annotating and classifying them, and creating a data set adapted to the environment. Images are annotated with a labeling tool or CVAT. The defects are classified as follows; since the ND class is the class without annotations, it is removed, leaving 9 semantic labels in total:
(1) ND class (no-annotation class): no defect (a one-meter buffer around each annotated class; a segment between annotated classes is not labeled ND if any other class is present there);
(2) OB class: surface damage;
(3) FO class: obstacles;
(4) RB class: cracks, fractures and collapse;
(5) RO class: plant root invasion;
(6) DE class: deformation;
(7) FS class: dislocation;
(8) GR class: branch pipes;
(9) BE class: attached deposits;
(10) AF class: fixed deposits.
Step 3.2: organizing the pipeline defect images and annotations in a directory structure, loading them in the data set through the __getitem__ method, using the COCO Evaluator as the evaluator, and putting everything involved in the deep learning model, including model settings, training settings and test settings, into a single Exp file; the number of training rounds is set to epochs = 300, and the number of defect classes num_classes = 9 is defined based on the classification in step 3.1.
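A minimal Exp file in the style used by the YOLOX code base, reflecting the settings named above, might look as follows; the data paths and the training command in the comment are assumptions.

```python
# pipeline_exp.py - minimal YOLOX experiment file (sketch; paths assumed).
# Training could then be launched with something like:
#   python tools/train.py -f pipeline_exp.py -d 1 -b 8 --fp16 -c yolox_s.pth
from yolox.exp import Exp as BaseExp

class Exp(BaseExp):
    def __init__(self):
        super().__init__()
        self.num_classes = 9        # the 9 semantic labels (ND class removed)
        self.max_epoch = 300        # training rounds (epochs = 300)
        self.data_dir = "datasets/pipeline_defects"    # assumed layout
        self.train_ann = "instances_train.json"        # assumed annotation files
        self.val_ann = "instances_val.json"
```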
Step 3.3: after setup, initializing the model with pre-training weights obtained from training on the COCO open-source data set, entering the training command on the command line to start GPU training, and obtaining a weight file and a deep learning target detection model after training is finished;
Step 3.4: capturing real-time pipeline images with the binocular camera, inputting them into the deep learning target detection model, and outputting object identification information. In the deep learning target detection model, a convolutional neural network aggregates image features at different image granularities, mixes and combines them, and passes them to the prediction layer, which predicts the image features, generates bounding boxes, predicts categories, and outputs the object identification information.
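The object identification information handed to step four can be packaged as below; this is a hypothetical helper illustrating the (label, center, width/height) record described above, not code from the patent.

```python
def to_identification_info(boxes, scores, class_ids, names, conf_thr=0.5):
    """Convert raw detections (x0, y0, x1, y1 boxes) into object
    identification records: semantic label, box centre, and box size."""
    info = []
    for (x0, y0, x1, y1), s, c in zip(boxes, scores, class_ids):
        if s < conf_thr:            # drop low-confidence detections
            continue
        info.append({"class": names[c],
                     "center": ((x0 + x1) / 2.0, (y0 + y1) / 2.0),
                     "size": (x1 - x0, y1 - y0)})
    return info
```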
Step four: stereo matching and depth calculation: transmitting the stereo-corrected left and right camera images and the object identification information from step three as inputs to the stereo matching and depth calculation module; using the recognition box center coordinates in the object identification information as the reference quantity, the module processes the images and outputs the spatial three-dimensional coordinates to which the recognized object in the left camera image of the binocular camera is mapped in the actual three-dimensional space.
The method for realizing stereo matching and depth calculation comprises the following steps:
Step 4.1: calculating the matching cost of each pixel of the original image within the preset parallax range from the stereo-corrected left and right camera images of the binocular camera. The purpose of the matching cost calculation is to measure the correlation between the pixel to be matched and the candidate pixel. Whether two pixels are corresponding (same-name) points can be evaluated through the matching cost function: the smaller the cost, the greater the correlation and the greater the probability that they are corresponding points.
The BT (Birchfield and Tomasi) algorithm is used as the calculation method for the matching cost, expressed as:

$$e(x_R, y, d) = \min\!\left( \min_{x_R + d - \frac{1}{2} \le x \le x_R + d + \frac{1}{2}} \left| I_R(x_R, y) - \hat{I}_T(x, y) \right|,\; \min_{x_R - \frac{1}{2} \le x \le x_R + \frac{1}{2}} \left| \hat{I}_R(x, y) - I_T(x_R + d, y) \right| \right)$$

where e(x_R, y, d) denotes the BT cost, i.e. the absolute value of the pixel gray-value difference; Î_T(x, y) denotes the interpolated gray value at a sub-pixel position between the matched-image pixels (x_R + d - 0.5, y) and (x_R + d + 0.5, y); Î_R(x, y) denotes the interpolated gray value at a sub-pixel position between the candidate-image pixels (x_R - 0.5, y) and (x_R + 0.5, y); x_R denotes the abscissa of the candidate image (left image) pixel, y the ordinate of the image pixel, d the disparity, and x the abscissa of the matching image (right image) pixel. Through this formula, the matching cost calculation of the stereo-corrected left and right camera images within the preset parallax range is realized, and the matching cost of each pixel of the original image within the preset parallax range is obtained.
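A NumPy-style sketch of the BT cost for a single pixel follows; the arrays are single scanlines, the disparity sign convention (right column = x - d) is an assumption, and the min/max neighborhood form is the common BT formulation.

```python
def bt_cost(left, right, x, d):
    """BT cost for left-scanline pixel x at disparity d.
    left/right: 1-D float sequences; assumes x and x - d are interior columns."""
    def min_max(img, i):
        lo = 0.5 * (img[i] + img[i - 1])   # gray value at position i - 0.5
        hi = 0.5 * (img[i] + img[i + 1])   # gray value at position i + 0.5
        return min(lo, hi, img[i]), max(lo, hi, img[i])
    xr = x - d                             # matching column in the right image
    lmin, lmax = min_max(left, x)
    rmin, rmax = min_max(right, xr)
    cost_l = max(0.0, left[x] - rmax, rmin - left[x])      # left vs. interpolated right
    cost_r = max(0.0, right[xr] - lmax, lmin - right[xr])  # right vs. interpolated left
    return min(cost_l, cost_r)             # final cost: minimum of the two
```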
Step 4.2: calculating the multipath cost aggregation value of each pixel within the preset parallax range from the matching cost of each pixel within the preset parallax range. The idea of the global stereo matching algorithm, i.e. a global energy optimization strategy, is adopted: find the optimal disparity of each pixel such that the global energy function of the whole image is minimized, where the global energy function is defined as:

$$E(d) = E_{data}(d) + E_{smooth}(d)$$
A path cost aggregation method is adopted: the matching costs of a pixel at all disparities are aggregated one-dimensionally along all paths around the pixel to obtain the path cost value for each path, and the path cost values are then summed to obtain the aggregated matching cost value of the pixel. The path cost of pixel p along path r is calculated as:

$$L_r(p, d) = C(p, d) + \min\!\big( L_r(p-r, d),\; L_r(p-r, d-1) + P_1,\; L_r(p-r, d+1) + P_1,\; \min_i L_r(p-r, i) + P_2 \big) - \min_k L_r(p-r, k)$$

where p denotes a pixel, r a path, d a disparity, p - r the preceding pixel of p along the path, and L_r the path aggregation cost value; L_r(p - r, d) is the cost of the preceding pixel at disparity d, L_r(p - r, d - 1) at disparity d - 1, L_r(p - r, d + 1) at disparity d + 1, and min_i L_r(p - r, i) the minimum of all cost values of the preceding pixel; C(p, d) is the initial cost value.

The first term is the matching cost value C, the data term.

The second term is the smoothness term: it takes the minimum of three cases, the value accumulated along the path with no penalty, with penalty P_1, or with penalty P_2. P_1 accommodates tilted or curved surfaces, while P_2 aims to preserve discontinuities. P_2 is usually adjusted dynamically according to the gray-level difference of adjacent pixels:

$$P_2 = \frac{P_2'}{\left| I_{bp} - I_{bq} \right|}$$

where P_2' is the initial value of P_2, generally set much larger than P_1, and I_bp and I_bq are the gray values of pixels p and q respectively.

The third term guarantees that the new path cost value L_r does not exceed a fixed upper bound.

The total path cost value S is obtained from the path costs of pixel p along all paths r:

$$S(p, d) = \sum_r L_r(p, d)$$

Multipath cost aggregation is realized through these calculations in step 4.2, yielding the multipath cost aggregation value of each pixel within the preset parallax range.
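The recurrence for a single path can be vectorized over a cost volume as in the following NumPy sketch (left-to-right horizontal path only; the P_1 and P_2 values are illustrative, and a constant P_2 is used instead of the gray-difference-adjusted one for brevity).

```python
import numpy as np

def aggregate_left_to_right(cost, P1=10.0, P2=120.0):
    """One-dimensional aggregation of an H x W x D initial cost volume C(p, d)
    along the horizontal left-to-right path."""
    H, W, D = cost.shape
    L = np.empty((H, W, D), dtype=np.float64)
    L[:, 0, :] = cost[:, 0, :]                        # no predecessor at the border
    for x in range(1, W):
        prev = L[:, x - 1, :]                         # L_r(p - r, .)
        prev_min = prev.min(axis=1, keepdims=True)    # min_k L_r(p - r, k)
        d_minus = np.roll(prev, 1, axis=1); d_minus[:, 0] = np.inf    # d - 1 term
        d_plus = np.roll(prev, -1, axis=1); d_plus[:, -1] = np.inf    # d + 1 term
        best = np.minimum(np.minimum(prev, prev_min + P2),
                          np.minimum(d_minus + P1, d_plus + P1))
        L[:, x, :] = cost[:, x, :] + best - prev_min  # subtract to bound L_r
    return L
```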
Step 4.3: calculating the disparity of each pixel after cost aggregation from the multipath cost aggregation value of each pixel within the preset parallax range. Parallax calculation determines the optimal disparity value of each pixel through the aggregated cost matrix S: the WTA (Winner-Takes-All) algorithm selects, among the cost values of all disparities of a pixel, the disparity corresponding to the minimum cost value as the optimal disparity, finally obtaining the disparity of each pixel after cost aggregation.
Step 4.4: performing parallax optimization according to the disparity of each pixel after cost aggregation to obtain the disparity map of the stereo-corrected left camera of the binocular camera. The purpose of parallax optimization is to further optimize the disparity map obtained in the previous step and improve its quality, including rejecting mismatches, improving disparity precision and suppressing noise.
Mismatches are eliminated with a left-right consistency check, which is based on the uniqueness constraint of disparity, i.e. each pixel has at most one correct disparity. Specifically, the left and right images are interchanged, so that the left image becomes the right image and vice versa, and stereo matching is performed again to obtain another disparity map; each value in a disparity map reflects the correspondence between two pixels. According to the uniqueness constraint, the same-name pixel of each pixel and its disparity value are found in the right image through the disparity map of the left image; if the absolute difference between the two disparity values is smaller than 1, the uniqueness constraint is satisfied and the disparity value is retained, otherwise it is not satisfied and the value is eliminated. Meanwhile, isolated outliers are removed with a connected-domain detection method: small patches caused by mismatching in the disparity map are removed, and small isolated speckles are filtered out. The consistency check is:

$$D_p = \begin{cases} D_L(p), & \left| D_L(p) - D_R\big(p - (D_L(p), 0)\big) \right| < 1 \\ \text{invalid}, & \text{otherwise} \end{cases}$$
and improving the parallax precision by adopting a sub-pixel optimization technology, obtaining the sub-pixel precision by using a quadratic curve interpolation method, and performing quadratic curve fitting on the cost value of the optimal parallax and the cost values of the front parallax and the rear parallax, wherein the parallax value corresponding to the extreme point of the curve is the new sub-pixel parallax value.
Kalman filtering is adopted for noise suppression, making the disparity result smoother, eliminating noise in the disparity map to a certain extent, and filling disparity holes. Parallax optimization is achieved through step 4.4, finally yielding the disparity map of the stereo-corrected left camera of the binocular camera.
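A direct (unoptimized) sketch of the left-right consistency check described above; the disparity sign convention is an assumption.

```python
import numpy as np

def lr_check(disp_left, disp_right, tol=1.0):
    """Keep only disparities satisfying the uniqueness constraint;
    rejected pixels are marked with -1."""
    H, W = disp_left.shape
    out = disp_left.copy()
    for y in range(H):
        for x in range(W):
            d = disp_left[y, x]
            xr = int(round(x - d))           # same-name point in the right view
            if 0 <= xr < W and abs(d - disp_right[y, xr]) < tol:
                continue                     # uniqueness constraint satisfied
            out[y, x] = -1                   # eliminated as a mismatch
    return out
```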
Step 4.5: performing depth calculation on the disparity map of the stereo-corrected left camera to obtain a depth map of the stereo-corrected left camera image, and finally, combining the object identification information obtained in step 3.4, obtaining the spatial three-dimensional coordinates to which the recognized object in the stereo-corrected left camera image is mapped in the actual three-dimensional space.
The depth calculation method is:

$$depth = \frac{f \cdot b}{d - (c_{xl} - c_{xr})}$$

where f is the focal length, b is the baseline length, d is the disparity, c_xr is the column coordinate of the right camera's principal point, and c_xl is the column coordinate of the left camera's principal point. After depth calculation, the depth map of the stereo-corrected left camera image is obtained; combined with the object identification information obtained through deep learning target detection, the spatial three-dimensional coordinates to which the recognized object in the stereo-corrected left camera image is mapped in the actual three-dimensional space are finally obtained.
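The depth formula can be applied to the whole disparity map in one vectorized step, as in this sketch (near-zero denominators are mapped to NaN):

```python
import numpy as np

def disparity_to_depth(disp, f, b, c_xl, c_xr, eps=1e-6):
    """Depth map from a disparity map using the formula above;
    f, b are in the rectified camera's units, c_xl/c_xr are principal-point columns."""
    denom = disp - (c_xl - c_xr)     # reduces to d when the principal points match
    return f * b / np.where(np.abs(denom) > eps, denom, np.nan)
```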
As shown in fig. 2 and fig. 3, fig. 2 shows the deep learning convolutional neural network structure proposed by the invention, composed of three parts: feature extraction, feature fusion, and class prediction with detection box calibration. Fig. 3 shows important substructures contained in the network.
The feature extraction part aggregates and forms image features at different image granularities and comprises a Focus interlaced-sampling splicing structure and four DarkNet structures. Focus serves as the input part of the whole network: starting from the default 3 × 640 × 640 input, it makes four copies, slices the four pictures into four 3 × 320 × 320 slices by a slicing operation, splices the four slices along the depth dimension using concat to generate a 12 × 320 × 320 output, and finally passes the result through batch_norm and leaky_relu to the next convolutional layer. The basic architecture of the DarkNet structure consists of CSP and SPP. CSP divides the original input into two branches and performs convolution on each to halve the number of channels; the first branch abandons the Bottleneck operation of YOLOv5 and adopts a ResUnit structure instead, after which the two branches are concatenated, so that the input and output of CSP have the same size; the purpose of CSP is to let the model learn more features. SPP takes a 512 × 20 × 20 input, outputs 256 × 20 × 20 after a 1 × 1 convolutional layer, samples it with three parallel MaxPool layers, concatenates the results with the initial features to output 1024 × 20 × 20, and finally restores 512 × 20 × 20 with 512 convolution kernels.
The feature fusion part mixes and combines image features and passes them to the prediction layer, using PAFPN. Based on the Mask R-CNN and FPN frameworks and drawing on YOLOv3, PAFPN enhances information propagation and can accurately retain spatial information, which helps to properly position pixels to form masks.
The last part of the network structure predicts the image features, generates bounding boxes and predicts categories. First, a Decoupled Head extracts three feature layers to detect targets, and an anchor-free detector is added to improve detection speed; at the output, the three outputs are 85 × 20 × 20, 85 × 40 × 40 and 85 × 80 × 80 respectively. The defect prediction information is obtained by decoding the prediction results.
Compared with the YOLOX network, the proposed deep learning convolutional neural network structure is built with reference to the YOLOX network but reduces some CBS operations and convolution counts in the feature extraction part, simplifying it. Meanwhile, the calculation of the loss function is modified, and standard binary cross-entropy loss is used for training:
$$Loss = -\sum_{c=1}^{C} w_c \left[ y_c \log \sigma(x_c) + (1 - y_c) \log\big(1 - \sigma(x_c)\big) \right]$$

where C is the number of semantic labels mentioned above (num_classes = 9); y_c indicates whether class c is present in the current picture; x_c is the raw model output for class c; σ is the Sigmoid activation function; w_c is the weight of class c, computed from N and N_c, where N denotes the number of images in training and N_c the number of images containing class c.
As shown in fig. 4, the invention provides a method for classifying pipeline defects: the defect types are divided into 10 classes; excluding the ND class, i.e. the defect-free class, there are 9 semantic labels in total, namely surface damage, obstacles, cracks, root invasion, deformation, dislocation, branch pipes, attached deposits and fixed deposits. The figure shows annotation examples for the 9 defect classes, which can serve as a reference when self-building a pipeline defect data set. This classification method resolves the fuzzy data classification standards of computer vision methods in the field of pipeline defect classification.
As shown in fig. 5, the camera calibration method disclosed in the present invention uses a camera calibration module. The input of the camera calibration module is the image coordinate and world coordinate of the known characteristic point of the calibration plate, so that a geometric model of camera imaging is established, the relation between the image pixel coordinate of the camera and the three-dimensional coordinate of the scene point is determined, and the internal and external parameter matrixes and the distortion coefficient of the camera are output.
The extrinsic parameter matrix W reflects the conversion between the camera coordinate system and the world coordinate system; the intrinsic parameter matrix M reflects the conversion between the pixel coordinate system and the camera coordinate system. The extrinsic parameter matrix W, the intrinsic parameter matrix M, and the 5 distortion coefficients k_1, k_2, k_3, p_1, p_2 are as follows:

$$W = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}, \qquad M = \begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

$$\begin{aligned} x_{corr} &= x_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x_p y_p + p_2 (r^2 + 2 x_p^2) \\ y_{corr} &= y_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y_p^2) + 2 p_2 x_p y_p \end{aligned}$$

R is the rotation matrix of the right camera relative to the left camera of the binocular camera, and t is the translation vector of the right camera relative to the left camera. f is the focal length of the lens, (u_0, v_0) are the coordinates of the origin of the image coordinate system in the pixel coordinate system, and d_x, d_y are the physical sizes of a pixel along the x-axis and y-axis of the image coordinate system. (x_p, y_p) are the original image coordinates and (x_corr, y_corr) the corrected coordinates, described approximately by a Taylor series expansion about r = 0.
The camera calibration method is implemented in Matlab, and chessboard images should be acquired in as many varied poses as possible. Lens distortion increases radially from the center of the image and sometimes appears non-uniform across the image frame; to capture the distortion information of the image, the checkerboard should appear at various different edges of the image. The rotation matrix R and the intrinsic matrix M are obtained after Matlab calibration, and the remaining parameter values are obtained by calculation. For subsequent import into OpenCV, the rotation matrix R and the intrinsic matrix M should be transposed. The order of the distortion coefficients in the distortion vector is [k_1, k_2, p_1, p_2, k_3].
As shown in fig. 6, the stereo correction method disclosed in the invention uses the stereo correction module. Stereo correction is performed through the epipolar constraint: the rotation matrix R output by the calibration module is divided into two parts, synthetic rotation matrices r_1 and r_2 for the left and right cameras, so that corresponding points in the two images lie on the same horizontal epipolar line. The inputs for obtaining the corrected camera parameters comprise the synthetic rotation matrices r_1 and r_2, the original left and right camera intrinsic matrices, the translation vector t, and the size of the chessboard image. The functions stereoRectify and initUndistortRectifyMap in OpenCV are called to output the left and right camera row-alignment correction rotation matrices R_1 and R_2, the corrected left and right camera intrinsic matrices M_l and M_r, the corrected left and right camera projection matrices P_l and P_r, and the reprojection matrix Q.
The reprojection matrix Q enables conversion between the world coordinate system and the pixel coordinate system. Specifically:

$$Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -1/T_x & (c_x - c'_x)/T_x \end{bmatrix}$$

where T_x is the x-component of the translation between the cameras (the baseline). The origin of the corrected camera coordinate system is on the optical axis. Here d denotes the disparity, the three-dimensional coordinates are (X/W, Y/W, Z/W), and c'_x denotes the principal point of the right image; after correction, c'_x = c_x. On the premise that the stereo correction is correct, the above is expanded as:

$$\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix} = Q \begin{bmatrix} x \\ y \\ d \\ 1 \end{bmatrix} = \begin{bmatrix} x - c_x \\ y - c_y \\ f \\ \big(-d + c_x - c'_x\big)/T_x \end{bmatrix}$$

The parameters of the corrected left and right cameras are acquired in real time, and the corrected images are obtained through rectification mapping: for each integer pixel position on the target image, the floating-point position on the corresponding source image is found, and its value is interpolated from the surrounding integer source pixels. After the corrected images are assigned, the images are cropped and the correction results are saved. Finally, the two image planes of the binocular camera are parallel, the optical axes are perpendicular to the image planes, and the poles are at infinity; at this point, any point (x_0, y_0) in the image corresponds to the epipolar line y = y_0.
As shown in fig. 7, the stereo matching and depth calculation method disclosed in the invention uses the stereo matching module. The method comprises matching cost calculation based on the BT algorithm, cost aggregation based on a global energy optimization strategy, parallax calculation based on the WTA algorithm, parallax optimization, and depth calculation.
The matching cost is calculated as follows. BT is the cost, i.e. the absolute value of the pixel gray-value difference, and BT makes use of the gray information of sub-pixels. First, the gray values at the sub-pixel positions (x_R ± 0.5, y) around the pixel (x_R, y) in the left image are obtained by linear interpolation:

$$\hat{I}_R(x_R \pm 0.5, y) = \tfrac{1}{2}\big( I_R(x_R, y) + I_R(x_R \pm 1, y) \big)$$

Similarly, for the pixel (x_R + d, y) in the right image, the gray values at the sub-pixel positions (x_R + d ± 0.5, y) are:

$$\hat{I}_T(x_R + d \pm 0.5, y) = \tfrac{1}{2}\big( I_T(x_R + d, y) + I_T(x_R + d \pm 1, y) \big)$$

Two costs are calculated respectively, and the final cost is the minimum of the two:

$$e(x_R, y, d) = \min\big( \bar{e}(x_R, y, d),\; \bar{e}'(x_R, y, d) \big)$$

The costs of the two images are as follows:

$$\bar{e}(x_R, y, d) = \min_{x_R + d - \frac{1}{2} \le x \le x_R + d + \frac{1}{2}} \left| I_R(x_R, y) - \hat{I}_T(x, y) \right|$$

$$\bar{e}'(x_R, y, d) = \min_{x_R - \frac{1}{2} \le x \le x_R + \frac{1}{2}} \left| \hat{I}_R(x, y) - I_T(x_R + d, y) \right|$$
the cost aggregation adopts the idea of a global stereo matching algorithm, namely a global energy optimization strategy, and the optimal parallax of each pixel is searched to ensure that the global energy function of the whole image is minimum. And performing one-dimensional aggregation on the matching costs under all parallaxes of the pixels on all paths around the pixels to obtain path cost values under the paths, and then adding the path cost values to obtain the matched cost value after the pixels are aggregated.
The matching cost value is divided into three items, the first item is a matching cost value C and belongs to a data item; the second item is a smooth item, the value accumulated on the path cost is taken, no punishment is made, and P is made 1 Punishment and do P 2 Punishment is carried out on the value with the minimum cost in the three cases; the third item is a reduced item, and the new path cost value L is ensured r Not exceeding a certain numerical upper limit.
$$L_r(p, d) = C(p, d) + \min\!\Big( L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_i L_r(p - r, i) + P_2 \Big) - \min_i L_r(p - r, i)$$
After L_r(p, d) is obtained, the total path cost value S is as follows:
$$S(p, d) = \sum_r L_r(p, d)$$
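The one-dimensional aggregation can be illustrated for a single path; the numpy sketch below implements the recurrence above for the left-to-right path only (array names and the P_1/P_2 values are assumptions), and the full aggregation would sum such L_r volumes over all paths:

```python
import numpy as np

def aggregate_left_to_right(C, P1=10.0, P2=150.0):
    """Single-path (left-to-right) aggregation of a float cost volume
    C of shape (H, W, D). P1/P2 are illustrative values only."""
    H, W, D = C.shape
    L = np.empty_like(C)
    L[:, 0, :] = C[:, 0, :]                 # the path starts at the left border
    for x in range(1, W):
        prev = L[:, x - 1, :]               # L_r(p - r, .) for every row
        prev_min = prev.min(axis=1, keepdims=True)
        # the four cases of the smoothness term
        dm1 = np.pad(prev, ((0, 0), (1, 0)), constant_values=np.inf)[:, :D]  # d - 1
        dp1 = np.pad(prev, ((0, 0), (0, 1)), constant_values=np.inf)[:, 1:]  # d + 1
        smooth = np.minimum(np.minimum(prev, dm1 + P1),
                            np.minimum(dp1 + P1, prev_min + P2))
        # subtracting min_i L_r(p - r, i) keeps values bounded (third term)
        L[:, x, :] = C[:, x, :] + smooth - prev_min
    return L
```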
Disparity calculation adopts a greedy WTA (winner-takes-all) strategy: among the cost values of all disparities of a pixel, the disparity corresponding to the minimum cost value is selected as the optimal disparity, finally yielding the disparity of each pixel after cost aggregation.
$$d_p = \arg\min_d S(p, d)$$
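The WTA selection itself is a single reduction over the disparity axis; a sketch, with a random stand-in for the aggregated volume S:

```python
import numpy as np

# S could be, e.g., the sum of per-path L_r arrays from the sketch above;
# here a random placeholder cost volume of shape (H, W, D) is used.
S = np.random.rand(480, 640, 64).astype(np.float32)
disparity = np.argmin(S, axis=2).astype(np.float32)  # d_p = argmin_d S(p, d)
```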
Disparity optimization comprises rejecting mismatches, improving disparity precision, and suppressing noise. Mismatches are rejected with a left-right consistency check: the left and right images are swapped and stereo matching is performed again to obtain a second disparity map; according to the uniqueness constraint of disparity, the disparity map of the left image is used to find, for each pixel, its same-name pixel in the right image and that pixel's disparity value; if the absolute value of the difference between the two disparity values is less than 1, the uniqueness constraint is satisfied and the pixel is kept, otherwise the constraint is not satisfied and the pixel is rejected. Meanwhile, isolated outliers are removed by connected-component detection, eliminating small patches caused by mismatches in the disparity map and filtering out small isolated speckles.
$$D(p) = \begin{cases} D_L(p), & \left| D_L(p) - D_R\!\left( p - \left( D_L(p), 0 \right) \right) \right| < 1 \\ \text{invalid}, & \text{otherwise} \end{cases}$$
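A minimal sketch of the consistency check follows, assuming the usual convention that a left-image pixel x with disparity d has its same-name pixel at x - d in the right image (names are illustrative); for the connected-component speckle removal, OpenCV's cv2.filterSpeckles performs the same job on 16-bit fixed-point disparity maps:

```python
import numpy as np

def lr_check(disp_left, disp_right, thresh=1.0):
    """Left-right consistency check on two float disparity maps; pixels
    violating the uniqueness constraint are marked invalid (-1)."""
    H, W = disp_left.shape
    xs = np.tile(np.arange(W), (H, 1))
    # locate each pixel's same-name pixel in the right disparity map
    match_x = np.clip(np.rint(xs - disp_left).astype(np.int64), 0, W - 1)
    disp_r = np.take_along_axis(disp_right, match_x, axis=1)
    keep = np.abs(disp_left - disp_r) < thresh
    return np.where(keep, disp_left, -1.0)

# cv2.filterSpeckles(disp16, 0, max_speckle_size, max_diff) then removes
# small isolated speckle regions from a 16-bit fixed-point disparity map.
```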
Disparity precision is improved with a sub-pixel optimization technique: sub-pixel precision is obtained by quadratic-curve interpolation, fitting a quadratic curve through the cost value of the optimal disparity and the cost values of the two adjacent disparities; the disparity value corresponding to the extreme point of the curve is the new sub-pixel disparity value.
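The quadratic-curve interpolation reduces to the standard parabola-vertex formula d + (C(d-1) - C(d+1)) / (2 (C(d-1) - 2 C(d) + C(d+1))); a sketch:

```python
def subpixel_refine(cost, d):
    """Parabola fit through the costs of the optimal disparity d and its
    two neighbours; `cost` is the 1-D cost-vs-disparity array of one
    pixel, with 0 < d < len(cost) - 1."""
    c0, c1, c2 = cost[d - 1], cost[d], cost[d + 1]
    denom = c0 - 2.0 * c1 + c2
    if denom == 0.0:
        return float(d)                  # flat curve: keep the integer value
    return d + (c0 - c2) / (2.0 * denom) # extreme point of the fitted parabola
```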
Kalman filtering is adopted for noise suppression: the disparity is optimally corrected by the Kalman gain, which makes the disparity result smoother, removes noise in the disparity map to a certain extent, and has a disparity-filling effect.
$$\hat{x}(k+1 \mid k+1) = \hat{x}(k+1 \mid k) + W(k+1)\left[ z(k+1) - \hat{z}(k+1 \mid k) \right]$$

is the state estimation equation, in which \hat{x}(k+1 \mid k) represents the estimate of x(k+1) made at time k and W(k+1) is the Kalman gain;

$$\hat{z}(k+1 \mid k) = H(k+1)\,\hat{x}(k+1 \mid k)$$

is the observation estimation equation, with the gain given by

$$W(k+1) = P(k+1 \mid k)\,H^{\mathsf{T}}(k+1)\left[ H(k+1)\,P(k+1 \mid k)\,H^{\mathsf{T}}(k+1) + R(k+1) \right]^{-1}$$
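As a toy illustration of these equations (a scalar, static-state filter with F = H = 1; the noise values are arbitrary), one could smooth a sequence of disparity observations like this:

```python
import numpy as np

def kalman_smooth(z, q=1e-3, r=1e-1):
    """Scalar Kalman filter over disparity observations z; q and r are
    illustrative process/observation noise values."""
    x, p = float(z[0]), 1.0              # initial estimate and covariance
    out = np.empty(len(z))
    out[0] = x
    for k in range(1, len(z)):
        x_pred, p_pred = x, p + q        # prediction step (static state model)
        w = p_pred / (p_pred + r)        # Kalman gain W(k+1)
        x = x_pred + w * (z[k] - x_pred) # state estimation equation
        p = (1.0 - w) * p_pred           # covariance update
        out[k] = x
    return out
```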
The pixel depth calculation formula is as follows, where f is the focal length, b is the baseline length, d is the disparity, and c_xr and c_xl are the column coordinates of the two cameras' principal points.
$$\text{depth} = \frac{f \cdot b}{d - (c_{xr} - c_{xl})}$$
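A direct sketch of this formula (names are illustrative; pixels whose corrected disparity is not positive are treated as invalid):

```python
import numpy as np

def disparity_to_depth(disp, f, b, c_xr, c_xl):
    """depth = f * b / (d - (c_xr - c_xl)); invalid pixels get depth 0."""
    denom = disp - (c_xr - c_xl)
    depth = np.zeros_like(disp, dtype=np.float64)
    valid = denom > 0
    depth[valid] = f * b / denom[valid]
    return depth
```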
In conclusion, the pipeline defect identification and positioning method based on target detection and binocular vision finally obtains the spatial three-dimensional coordinates, in the actual three-dimensional space, of the object recognized in the stereo-rectified left camera image of the binocular camera, completing target detection and positioning.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention; any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included within its scope.

Claims (8)

1. A pipeline defect identification and positioning method based on target detection and binocular vision, characterized by comprising four parts: a camera calibration module, a stereo correction module, an image capture and target detection module, and a stereo matching and depth calculation module; the method comprises the following steps:
step one: camera calibration: establishing a geometric model of camera imaging, determining the relation between the three-dimensional geometric position of a point on the surface of a space object and its corresponding point in the image, and solving the calibration parameters of the binocular camera from the image coordinates and world coordinates of the feature points on the calibration board, the calibration parameters comprising the intrinsic parameters, extrinsic parameters and distortion parameters of the left and right cameras;
step two: stereo correction: carrying out stereo correction through the epipolar constraint so that corresponding points in the two images lie on the same horizontal epipolar line, and acquiring the calibration parameters of the corrected binocular camera;
step three: image capture and target detection: building a data set, training the target detection model, capturing images with the binocular camera, predicting the real-time picture displayed by the left camera, and outputting the object identification information;
the method for realizing image capture and target detection comprises the following steps:
step 3.1: capturing and screening pipeline defect images by using a binocular camera and a web crawler, annotating and classifying the pipeline defect images, and creating a data set;
the defect classes are as follows; apart from the ND class, which marks un-annotated samples, the other 9 semantic labels are:
OB: surface damage; FO: obstacle; RB: cracks, breaks and collapses; RO: plant root intrusion; DE: deformation; FS: displaced joint; GR: branch pipe; BE: attached deposits; AF: settled (fixed) deposits;
step 3.2: sorting the pipeline defect images and annotations into an organized directory, loading the pipeline defect images and annotations of the data set through the __getitem__ method, using COCOEvaluator as the evaluator, and putting all the contents involved in the deep learning model, including the model settings, training settings and test settings, into a single Exp file;
step 3.3: after the setup is complete, initializing the model with pre-training weights obtained on the COCO open-source data set, entering the training command at the command line to start GPU training, and obtaining the weight file and the deep learning target detection model when training finishes;
the deep learning convolutional neural network structure consists of three parts: feature extraction, feature fusion, and category prediction with detection-box calibration;
the feature extraction part comprises a Focus interlaced-sampling splicing structure and four DarkNet structures; Focus serves as the input part of the whole network: starting from the default 3 × 640 × 640 input, a slicing operation cuts the picture into four 3 × 320 × 320 slices, the four slices are spliced in depth using concat to generate a 12 × 320 × 320 output, and the result is finally passed to the next convolutional layer via batch_norm and leaky_relu, as sketched below; the DarkNet structure consists of CSP and SPP: the original input is split into two branches, and a convolution in each halves the number of channels; the first branch drops the Bottleneck operation of YOLOv5 in favour of a ResUnit structure, after which branch one and branch two are concatenated, so that the input and output of CSP have the same size; the purpose of CSP is to let the model learn more features; the input of SPP is 512 × 20 × 20, a 1 × 1 convolutional layer outputs 256 × 20 × 20, three parallel MaxPool layers then sample it, the results are merged with the initial feature into 1024 × 20 × 20, and finally 512 convolution kernels restore the result to 512 × 20 × 20;
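Purely as an illustration of the Focus slicing just described (the indexing below follows the common YOLOv5/YOLOX implementation and is an assumption, not the claim text):

```python
import torch

# 3 x 640 x 640 input -> four interlaced 3 x 320 x 320 slices -> depth concat.
x = torch.randn(1, 3, 640, 640)                  # batch of one default input
slices = [x[..., ::2, ::2], x[..., 1::2, ::2],   # even/odd row-column samples
          x[..., ::2, 1::2], x[..., 1::2, 1::2]]
out = torch.cat(slices, dim=1)                   # splice the slices in depth
print(out.shape)                                 # torch.Size([1, 12, 320, 320])
```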
the feature fusion part mixes and combines the image features and passes them on to the prediction layer; PAFPN, built on the Mask R-CNN and FPN frameworks, is adopted to strengthen information transfer and to preserve spatial information accurately, which helps locate pixels properly to form masks;
the last part of the network structure predicts the image features, generating bounding boxes and predicting categories; first, a Decoupled Head extracts three feature layers for target detection, and an anchor-free detector is added to improve detection speed; at the output, the three outputs are 85 × 20 × 20, 85 × 40 × 40 and 85 × 80 × 80, respectively; the defect prediction information is obtained by decoding the prediction results;
the deep learning convolutional neural network structure is built with reference to the YOLOX network; compared with YOLOX, some CBS operations and convolutions are removed, simplifying the feature extraction part of the network; meanwhile, the calculation of the loss function is modified, training with the standard binary cross-entropy loss:
$$\mathcal{L} = -\sum_{c=1}^{C} w_c \left[ y_c \log \sigma(x_c) + (1 - y_c) \log\big( 1 - \sigma(x_c) \big) \right]$$

where C is the number of semantic labels (num_classes = 9), y_c ∈ {0, 1} indicates whether class c is present in the current picture, x_c is the raw model output for class c, σ is the Sigmoid activation function, and w_c is the weight of class c, computed from N, the number of images in training, and N_c, the number of images containing class c;
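A minimal sketch of this loss follows; since the exact definition of w_c is given only in the patent drawing, the inverse-frequency weight used here (w_c = N / N_c) is an assumption for illustration, as are all the numbers:

```python
import torch
import torch.nn.functional as F

num_classes = 9
N_total = 1000.0                                   # training images (made up)
n_c = torch.tensor([500., 80., 120., 60., 40., 90., 150., 70., 30.])  # per-class counts (made up)
w = N_total / n_c                                  # assumed form of the weight w_c

logits = torch.randn(16, num_classes)              # raw outputs x_c for a batch
targets = torch.randint(0, 2, (16, num_classes)).float()  # labels y_c in {0, 1}

# binary_cross_entropy_with_logits applies the Sigmoid internally and scales
# each class term by w_c, matching the loss written above.
loss = F.binary_cross_entropy_with_logits(logits, targets, weight=w)
```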
step 3.4: capturing a real-time image of the pipeline through a binocular camera, inputting the real-time image of the pipeline into a deep learning target detection model, and outputting object identification information;
step four: stereo matching and depth calculation: transmitting the stereo-rectified images of the left and right cameras and the object identification information from step three as inputs to the stereo matching and depth calculation module, to obtain the spatial three-dimensional coordinates, mapped into the actual three-dimensional space, of the object to be identified in the left camera image of the binocular camera;
the stereo matching and depth calculation module is implemented by the following steps:
step 4.1: calculating the matching cost of each pixel of the original image within a preset disparity range from the stereo-rectified left and right camera images of the binocular camera;
step 4.2: calculating the multi-path aggregated cost value of each pixel within the preset disparity range from the matching cost of each pixel within the preset disparity range;
step 4.3: calculating the post-aggregation disparity of each pixel from the multi-path aggregated cost value of each pixel within the preset disparity range;
step 4.4: performing disparity optimization on the post-aggregation disparity of each pixel to obtain the disparity map of the stereo-rectified binocular camera;
the disparity optimization comprises rejecting mismatches, improving disparity precision, and suppressing noise; mismatches are rejected with a left-right consistency check: the left and right images are swapped and stereo matching is performed again to obtain a second disparity map; according to the uniqueness constraint of disparity, the disparity map of the left image is used to find, for each pixel, its same-name pixel in the right image and that pixel's disparity value; if the absolute value of the difference between the two disparity values is less than 1, the uniqueness constraint is satisfied and the pixel is kept, otherwise the constraint is not satisfied and the pixel is rejected; meanwhile, isolated outliers are removed by connected-component detection, eliminating small patches caused by mismatches in the disparity map and filtering out small isolated speckles;
$$D(p) = \begin{cases} D_L(p), & \left| D_L(p) - D_R\!\left( p - \left( D_L(p), 0 \right) \right) \right| < 1 \\ \text{invalid}, & \text{otherwise} \end{cases}$$
improving the disparity precision: a sub-pixel optimization technique is adopted; sub-pixel precision is obtained by quadratic-curve interpolation, fitting a quadratic curve through the cost value of the optimal disparity and the cost values of the two adjacent disparities, the disparity value corresponding to the extreme point of the curve being the new sub-pixel disparity value;
suppressing noise: Kalman filtering is adopted, and the disparity is optimally corrected by the Kalman gain;
$$\hat{x}(k+1 \mid k+1) = \hat{x}(k+1 \mid k) + W(k+1)\left[ z(k+1) - \hat{z}(k+1 \mid k) \right]$$

is the state estimation equation, in which \hat{x}(k+1 \mid k) represents the estimate of x(k+1) made at time k and W(k+1) is the Kalman gain;

$$\hat{z}(k+1 \mid k) = H(k+1)\,\hat{x}(k+1 \mid k)$$

is the observation estimation equation, with

$$W(k+1) = P(k+1 \mid k)\,H^{\mathsf{T}}(k+1)\left[ H(k+1)\,P(k+1 \mid k)\,H^{\mathsf{T}}(k+1) + R(k+1) \right]^{-1};$$
step 4.5: performing depth calculation on the disparity map of the stereo-rectified binocular camera to obtain the depth map of the stereo-rectified binocular left image, and finally, combining the object recognition information obtained in step 3.4, obtaining the spatial three-dimensional coordinates, mapped into the actual three-dimensional space, of the recognized object in the stereo-rectified binocular left image.
2. The method for identifying and positioning the defects of the pipeline based on the target detection and the binocular vision according to claim 1, wherein the method for realizing the camera calibration comprises the following steps:
step 1.1: making a chessboard calibration board composed of alternating black and white squares, shooting chessboard images of the calibration board from different directions with the binocular camera, extracting the corner information of each chessboard image, and obtaining the image coordinates and spatial three-dimensional coordinates of all inner corner points on the chessboard images;
step 1.2: establishing a geometric model of camera imaging, and determining the mutual relation between the three-dimensional geometric position of one point on the surface of the space object and the corresponding point in the image;
step 1.3: taking the image coordinates and spatial three-dimensional coordinates of all inner corner points on the chessboard images obtained in step 1.1 as input, and, according to the geometric model of camera imaging, solving for and outputting the intrinsic and extrinsic parameter matrices of the left and right cameras of the binocular camera through experiment and calculation;
step 1.4: using the coordinate relationship before and after distortion correction, solving the 5 distortion parameters k_1, k_2, k_3, p_1, p_2 from the intrinsic and extrinsic parameter matrices of the left and right cameras of the binocular camera, so as to correct the distortion.
3. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 2, wherein in step 1.2, the extrinsic parameter matrix

$$\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}$$

reflects the conversion between the camera coordinate system and the world coordinate system, where R is the rotation matrix of the right camera relative to the left camera of the binocular camera and t is the translation vector of the right camera relative to the left camera of the binocular camera; the intrinsic parameter matrix

$$\begin{bmatrix} f/d_x & 0 & u_0 \\ 0 & f/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

reflects the conversion between the pixel coordinate system and the camera coordinate system, where f is the focal length of the lens, (u_0, v_0) are the coordinates of the origin of the image coordinate system in the pixel coordinate system, and d_x, d_y are the sizes of each pixel in the x-axis and y-axis directions of the image coordinate system.
4. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 2, wherein in step 1.4, the distortion is corrected by:

$$\begin{cases} x_{corr} = x_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x_p y_p + p_2\,(r^2 + 2 x_p^2) \\ y_{corr} = y_p\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1\,(r^2 + 2 y_p^2) + 2 p_2 x_p y_p \end{cases}$$

where (x_p, y_p) are the original image coordinates, (x_corr, y_corr) are the image coordinates after correction, and r is the distance of the point from the optical centre (r^2 = x_p^2 + y_p^2).
5. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 3, wherein the stereo correction is realized by the following steps:
step 2.1: splitting the rotation matrix R of the right camera relative to the left camera of the binocular camera into the composite rotation matrix r_1 of the left camera and the composite rotation matrix r_2 of the right camera, so that each camera rotates by half; the optical axes of the left and right cameras then become parallel and their imaging planes coplanar;
step 2.2: inputting the composite rotation matrix r_1 of the left camera, the composite rotation matrix r_2 of the right camera, the original intrinsic matrices of the left and right cameras, the translation vector t, and the size of the chessboard image into OpenCV, and outputting, through the cvStereoRectify function, the row-alignment correction rotation matrix R_1 of the left camera, the row-alignment correction rotation matrix R_2 of the right camera, the corrected intrinsic matrix M_l of the left camera, the corrected intrinsic matrix M_r of the right camera, the corrected projection matrix P_l of the left camera, the corrected projection matrix P_r of the right camera, and the reprojection matrix Q;
step 2.3: taking the output of the cvStereoRectify function in step 2.2 as known constants, using reverse mapping through the correction lookup tables of the left and right views: for each integer pixel position on the target image, the corresponding floating-point position on the source image is found and the value interpolated from the surrounding source pixels; after the corrected images are assigned, the images are cropped and the correction results saved.
6. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 1, wherein the matching cost is calculated by:

$$e(x_R, y, d) = \min\Big\{ \min_{x_R - \frac{1}{2} \le x \le x_R + \frac{1}{2}} \big| \hat{I}_R(x, y) - I_T(x_R + d, y) \big|,\ \min_{x_R + d - \frac{1}{2} \le x \le x_R + d + \frac{1}{2}} \big| I_R(x_R, y) - \hat{I}_T(x, y) \big| \Big\}$$

where e(x_R, y, d) denotes the absolute value of the pixel grey-value difference; \hat{I}_R(x, y) denotes the interpolated grey value of the candidate image at sub-pixel positions between the pixels (x_R - 0.5, y) and (x_R + 0.5, y); \hat{I}_T(x, y) denotes the interpolated grey value of the matched image at sub-pixel positions between the pixels (x_R + d - 0.5, y) and (x_R + d + 0.5, y); I_R(x_R, y) and I_T(x_R + d, y) are the grey values of the candidate and matched image pixels; x_R represents the abscissa of the candidate image pixel, y the ordinate of the image pixel, d the disparity, and x the abscissa of the matched image pixel.
7. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 1, wherein the multi-path aggregated cost value of each pixel within the preset disparity range is calculated as follows:
calculate the path cost of pixel p along path r:

$$L_r(p, d) = C(p, d) + \min\!\Big( L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_i L_r(p - r, i) + P_2 \Big) - \min_i L_r(p - r, i)$$

where p denotes a pixel, r a path, d a disparity, p - r the previous pixel on the path of pixel p, and L the aggregated cost value of a path; L_r(p - r, d) is the cost value of the previous pixel on the path at disparity d, L_r(p - r, d - 1) the cost value of the previous pixel at disparity d - 1, L_r(p - r, d + 1) the cost value of the previous pixel at disparity d + 1, and min_i L_r(p - r, i) the minimum of all cost values of the previous pixel; C(p, d) is the initial cost value;
the total path cost value S is obtained from the path costs of pixel p along the paths r:

$$S(p, d) = \sum_r L_r(p, d).$$
8. The method for identifying and positioning pipeline defects based on target detection and binocular vision according to claim 1, wherein the depth calculation is performed by:

$$\text{depth} = \frac{f \cdot b}{d - (c_{xr} - c_{xl})}$$

where f is the focal length of the lens, b is the baseline length, d is the disparity, c_xr is the column coordinate of the right camera's principal point, and c_xl is the column coordinate of the left camera's principal point.