CN111784680A - Detection method based on consistency of key points of left and right eye views of binocular camera - Google Patents
Detection method based on consistency of key points of left and right eye views of binocular camera
- Publication number
- CN111784680A CN111784680A CN202010645495.9A CN202010645495A CN111784680A CN 111784680 A CN111784680 A CN 111784680A CN 202010645495 A CN202010645495 A CN 202010645495A CN 111784680 A CN111784680 A CN 111784680A
- Authority
- CN
- China
- Prior art keywords
- cnn
- loss
- right eye
- eye view
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10012—Stereo images
Abstract
The invention discloses a detection method based on the consistency of key points in the left and right eye views of a binocular camera, comprising the following steps: extracting histogram of oriented gradients features with a deterministic network; performing 2D target detection on the left and right eye views with a stereo region proposal network, combined with the extracted histogram of oriented gradients features, to obtain left and right eye view candidate regions; respectively predicting key points for the left and right eye view candidate regions with the internal key point prediction module of the stereo region convolutional neural network; matching the key points predicted in the left and right eye views for consistency, establishing corresponding loss functions, and minimizing the loss of each task through training; and estimating a 3D box from the predicted key points, performing pixel-level precision matching through dense 3D box alignment, and further refining the 3D box estimated in the previous step. By exploiting the consistency of the left and right key points, the invention improves the accuracy of three-dimensional detection.
Description
Technical Field
The invention relates to the field of binocular camera stereo detection, and in particular to a method for binocular camera 3D detection based on the consistency of key points in the left and right eye views.
Background
Object detection is an important part of the field of computer vision and has been a research focus almost since the birth of the computer. 2D target detection has developed greatly, with marked improvements in both accuracy and detection speed. With this development, researchers' attention has begun to turn to 3D target detection, which also has important practical significance. For example, the field of unmanned driving cannot do without 3D target detection, and 3D target detection still has large room for development, so developing 3D target detection algorithms is very important.
Most 3D target detection algorithms are based on lidar, binocular cameras or monocular cameras, and different detection devices call for different methods. Of the three, lidar-based algorithms are the most numerous, and they can now reach extremely high accuracy. However, lidar is expensive, is easily affected by weather changes (especially rain and snow), and can damage human eyes, which is fatal to the popularization of unmanned driving. A monocular camera is relatively inexpensive and less affected by weather, and can thus make up for the shortcomings of lidar, but its 3D target detection error is large and the detection results are unsatisfactory, so monocular 3D target detection is not suitable for popularization. In comparison, the binocular camera offers the best overall balance of accuracy, cost and efficiency among the three, and can obtain relatively accurate depth values. Therefore, 3D detection based on the binocular camera has great research significance. Stereo Region Convolutional Neural Networks (Stereo R-CNN), a 3D target detection algorithm, is characterized by high accuracy and high speed; nevertheless, its 3D detection accuracy still has room for improvement.
It is therefore of interest to propose an efficient method for binocular vision 3D detection.
Disclosure of Invention
The invention provides a detection method based on the consistency of key points in the left and right eye views of a binocular camera, which uses the consistency of the left and right key points to improve the accuracy of three-dimensional detection, as described in detail below:
a detection method based on binocular camera left and right eye view key point consistency comprises the following steps:
extracting directional gradient histogram features by using a deterministic network;
combining the extracted directional gradient histogram features, respectively carrying out 2D target detection on the left eye view and the right eye view by using a three-dimensional region suggestion network to obtain left eye view and right eye view candidate regions;
respectively predicting key points for the left and right eye view candidate regions by using the internal key point prediction module of the stereo region convolutional neural network;
carrying out consistency matching on the key points predicted by the left and right eye views, establishing corresponding loss functions, and minimizing the loss of each task through training;
and estimating a 3D frame from the predicted key points, performing pixel-level precision matching through dense 3D frame alignment, and further refining the 3D frame estimated in the previous step.
Wherein the loss function is specifically:

L = w_cls^p · L_cls^p + w_reg^p · L_reg^p + w_cls^r · L_cls^r + w_box^r · L_box^r + w_α^r · L_α^r + w_dim^r · L_dim^r + β · w_kpl^r · L_kpl^r + γ · w_kpr^r · L_kpr^r

wherein w_cls^p and L_cls^p represent the classification weight and classification loss in the RPN; w_reg^p and L_reg^p represent the weight and loss of the regression task in the RPN; w_cls^r and L_cls^r represent the weight and loss of classification in the R-CNN; w_box^r and L_box^r represent the weight and loss of the box task in the R-CNN; w_α^r and L_α^r represent the weight and loss of the viewpoint angle in the R-CNN; w_dim^r and L_dim^r represent the weight and loss of the dimensions in the R-CNN; w_kpl^r and L_kpl^r represent the weight and loss of the left key points in the R-CNN; w_kpr^r and L_kpr^r represent the weight and loss of the right key points in the R-CNN; the superscript p denotes the RPN part and the superscript r denotes the R-CNN part; β and γ are the coefficients of the left and right key point terms respectively, with β + γ = 1.
The technical scheme provided by the invention has the beneficial effects that:
1. the method corrects the key points by using the consistency of the key points in the left and right eye views, thereby improving the accuracy of three-dimensional detection;
2. the method modifies the objective function to improve the effect of the 3D target detection algorithm;
3. the invention combines multiple ideas to achieve the best effect, and is particularly suitable for 3D target detection based on a binocular camera.
Drawings
Fig. 1 is a flowchart of a detection method based on consistency of key points of left and right eye views of a binocular camera.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
The embodiment of the invention provides a detection method based on the consistency of key points in the left and right eye views of a binocular camera, the method comprising the following steps:
101: extracting Histogram of Oriented Gradients (HOG) features with a DetNet backbone network;
102: performing 2D target detection on the left and right eye views with a stereo region proposal network (Stereo RPN), combined with the extracted HOG features, to obtain left and right eye view candidate regions;
103: respectively predicting key points for the left and right eye view candidate regions by using the internal key point prediction module of Stereo R-CNN;
104: matching the key points predicted in the left and right eye views for consistency, establishing corresponding loss functions, and minimizing the loss of each task through training to improve the accuracy of the classification and detection tasks;
105: estimating a 3D box from the predicted key points, performing pixel-level precision matching through dense 3D box alignment, and further refining the 3D box estimated in the previous step.
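Steps 101-105 can be sketched as a single pipeline. The function bodies below are hypothetical stand-ins (the real method uses trained DetNet, Stereo RPN and Stereo R-CNN components); the stubs only illustrate the data flow:

```python
import numpy as np

# Hypothetical stubs: each stands in for a trained network component.
def backbone(img):                     # step 101: HOG-style feature extraction
    return img.mean(axis=2)            # placeholder "feature map"

def stereo_rpn(feat_l, feat_r):        # step 102: paired 2D candidate regions
    return [(10, 10, 50, 50)], [(8, 10, 48, 50)]

def predict_keypoints(rois):           # step 103: per-view keypoint prediction
    return [(30.0, 30.0) for _ in rois]

def match_keypoints(kps_l, kps_r):     # step 104: left/right consistency match
    return list(zip(kps_l, kps_r))

def estimate_3d_box(pairs):            # step 105: 3D box from matched keypoints
    return {"keypoint_pairs": pairs}   # later refined by dense 3D box alignment

def detect_3d(left_img, right_img):
    feat_l, feat_r = backbone(left_img), backbone(right_img)
    rois_l, rois_r = stereo_rpn(feat_l, feat_r)
    pairs = match_keypoints(predict_keypoints(rois_l), predict_keypoints(rois_r))
    return estimate_3d_box(pairs)
```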
Example 2
The scheme in Example 1 is further described below with reference to calculation formulas and examples:
201: extracting HOG features with DetNet from the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago) dataset for subsequent processing;
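A minimal numpy sketch of the HOG computation in step 201 (the patent extracts these features with DetNet; block normalization and other refinements are omitted here):

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9):
    """Minimal HOG sketch: per-cell histograms of gradient orientations,
    L2-normalized per cell. Illustration only."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180       # unsigned orientation
    H, W = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((H, W, bins))
    for i in range(H):
        for j in range(W):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            idx = (a / (180 / bins)).astype(int) % bins
            for b in range(bins):                    # magnitude-weighted bins
                hist[i, j, b] = m[idx == b].sum()
    hist /= np.linalg.norm(hist, axis=-1, keepdims=True) + 1e-6
    return hist
```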
202: the region proposal network (RPN) selects regions of interest with a sliding window and retains the best results through non-maximum suppression. Because a binocular camera is used, the RPN is extended into a two-branch stereo region proposal network (Stereo RPN). The Stereo RPN generates left and right target RoI regions, and non-maximum suppression then yields the left and right target candidate regions;
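The non-maximum suppression in step 202 can be sketched as a generic greedy NMS (the [x1, y1, x2, y2] box format is an assumption; the real Stereo RPN scores left/right proposal pairs jointly):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # drop overlapping lower scores
    return keep
```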
203: predicting key points from the left and right target candidate regions; the predicted key points are used for the subsequent stereo estimation;
204: since the left and right views of a target are theoretically consistent, differing only by the disparity information, the corresponding key points in the two views must also be consistent. Matching the two sets of key points establishes a corresponding loss function.
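The geometric fact behind step 204 can be stated directly in code: after stereo rectification, corresponding keypoints lie on the same image row, and their column difference is the disparity. The check below is an illustrative sketch, not the patent's learned loss:

```python
def keypoint_consistency(kps_left, kps_right):
    """For each matched (x, y) keypoint pair from a rectified stereo pair,
    return the disparity (horizontal offset) and the vertical residual,
    which should be near zero when left and right predictions agree."""
    out = []
    for (xl, yl), (xr, yr) in zip(kps_left, kps_right):
        out.append({"disparity": xl - xr, "y_residual": abs(yl - yr)})
    return out
```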
The loss function during training is as follows:
L = w_cls^p · L_cls^p + w_reg^p · L_reg^p + w_cls^r · L_cls^r + w_box^r · L_box^r + w_α^r · L_α^r + w_dim^r · L_dim^r + β · w_kpl^r · L_kpl^r + γ · w_kpr^r · L_kpr^r

wherein w_cls^p and L_cls^p represent the classification weight and classification loss in the RPN; w_reg^p and L_reg^p represent the weight and loss of the regression task in the RPN; w_cls^r and L_cls^r represent the weight and loss of classification in the R-CNN; w_box^r and L_box^r represent the weight and loss of the box task in the R-CNN; w_α^r and L_α^r represent the weight and loss of the viewpoint angle in the R-CNN; w_dim^r and L_dim^r represent the weight and loss of the dimensions in the R-CNN; w_kpl^r and L_kpl^r represent the weight and loss of the left key points in the R-CNN; w_kpr^r and L_kpr^r represent the weight and loss of the right key points in the R-CNN; the superscript p denotes the RPN part and the superscript r denotes the R-CNN part.
In order to keep the total weight of the key points consistent with that of the stereo box, viewpoint angle and dimension terms, the left and right key point terms carry coefficients β and γ respectively, where β + γ = 1.
Each task loss L is weighted by uncertainty. Experiments show that β = 0.8 and γ = 0.2 give the best results.
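A scalar sketch of the weighted multi-task loss above, assuming each task loss has already been computed (the dictionary key names are illustrative):

```python
def total_loss(L, w, beta=0.8, gamma=0.2):
    """Weighted sum of the RPN and R-CNN task losses; beta + gamma = 1 keeps
    the combined keypoint weight on par with the box, viewpoint angle and
    dimension terms."""
    assert abs(beta + gamma - 1.0) < 1e-9
    return (w["cls_p"] * L["cls_p"] + w["reg_p"] * L["reg_p"]        # RPN
            + w["cls_r"] * L["cls_r"] + w["box_r"] * L["box_r"]      # R-CNN cls/box
            + w["alpha_r"] * L["alpha_r"] + w["dim_r"] * L["dim_r"]  # angle/dims
            + beta * w["kpl_r"] * L["kpl_r"]                         # left keypoints
            + gamma * w["kpr_r"] * L["kpr_r"])                       # right keypoints
```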
205: estimating the 3D box from the obtained predicted key points.
The following test experiments evaluate the method for binocular camera 3D detection based on left and right eye view key point consistency according to the present invention:
the detection performance of the embodiment of the invention is measured by Average accuracy (Average Precision), and the detection indexes comprise 2D detection and 3D detection, and the detection method is divided into three modes of simple (easy), moderate (mode) and difficult (hard) according to the difficulty degree of the detected image. Average Precision represents the probability that the Average score of a relevant tag is ranked higher than other relevant tags; the 2D detection includes detection of left (left), right (right), and stereo (stereo), and the Intersection-over-Union (IoU) of the detection is 0.7 (corresponding to table 1); the 3D detection includes bird's eye view (bird's view) detection and 3D boxes (3D boxes) detection, and IoU is divided into two kinds of 0.5 (corresponding to table 2) and 0.7 (corresponding to table 3).
To evaluate the performance of the method, the embodiment of the invention uses 7481 image pairs from the KITTI dataset, randomly divided into two groups of approximately equal size, one for training and the other for testing. During evaluation, only the Car label is considered; other labels (including bus and other labels similar to Car) are not considered.
TABLE 1
TABLE 2
TABLE 3
As can be seen from Table 1, for 2D detection (IoU = 0.7) the accuracy improvement of the proposed method in the easy and moderate modes is insignificant, but the hard mode improves by about 2%. For 3D detection, at both IoU = 0.5 and IoU = 0.7, the bird's eye view detection accuracy improves by about 1% to 3%, and the 3D box detection accuracy also improves. The experimental results demonstrate the effectiveness of the method.
In the embodiment of the present invention, unless otherwise specified, the models of the devices are not limited, as long as the devices can perform the functions described above.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (2)
1. A detection method based on binocular camera left and right view key point consistency is characterized by comprising the following steps:
extracting directional gradient histogram features by using a deterministic network;
combining the extracted directional gradient histogram features, respectively carrying out 2D target detection on the left eye view and the right eye view by using a three-dimensional region suggestion network to obtain left eye view and right eye view candidate regions;
respectively predicting key points for the left and right eye view candidate regions by using the internal key point prediction module of the stereo region convolutional neural network;
carrying out consistency matching on the key points predicted by the left and right eye views, establishing corresponding loss functions, and minimizing the loss of each task through training;
and estimating a 3D frame from the predicted key points, performing pixel-level precision matching through dense 3D frame alignment, and further refining the 3D frame estimated in the previous step.
2. The binocular camera based left and right eye view key point consistency detection method according to claim 1, wherein the loss function specifically is:
L = w_cls^p · L_cls^p + w_reg^p · L_reg^p + w_cls^r · L_cls^r + w_box^r · L_box^r + w_α^r · L_α^r + w_dim^r · L_dim^r + β · w_kpl^r · L_kpl^r + γ · w_kpr^r · L_kpr^r

wherein w_cls^p and L_cls^p represent the classification weight and classification loss in the RPN; w_reg^p and L_reg^p represent the weight and loss of the regression task in the RPN; w_cls^r and L_cls^r represent the weight and loss of classification in the R-CNN; w_box^r and L_box^r represent the weight and loss of the box task in the R-CNN; w_α^r and L_α^r represent the weight and loss of the viewpoint angle in the R-CNN; w_dim^r and L_dim^r represent the weight and loss of the dimensions in the R-CNN; w_kpl^r and L_kpl^r represent the weight and loss of the left key points in the R-CNN; w_kpr^r and L_kpr^r represent the weight and loss of the right key points in the R-CNN; the superscript p denotes the RPN part and the superscript r denotes the R-CNN part; β and γ are the coefficients of the left and right key point terms respectively, with β + γ = 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010645495.9A CN111784680B (en) | 2020-07-06 | 2020-07-06 | Detection method based on consistency of key points of left and right eye views of binocular camera |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010645495.9A CN111784680B (en) | 2020-07-06 | 2020-07-06 | Detection method based on consistency of key points of left and right eye views of binocular camera |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111784680A true CN111784680A (en) | 2020-10-16 |
CN111784680B CN111784680B (en) | 2022-06-28 |
Family
ID=72758031
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010645495.9A Expired - Fee Related CN111784680B (en) | 2020-07-06 | 2020-07-06 | Detection method based on consistency of key points of left and right eye views of binocular camera |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111784680B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114743045A (en) * | 2022-03-31 | 2022-07-12 | 电子科技大学 | Small sample target detection method based on double-branch area suggestion network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108317953A (en) * | 2018-01-19 | 2018-07-24 | 东北电力大学 | A kind of binocular vision target surface 3D detection methods and system based on unmanned plane |
CN108335331A (en) * | 2018-01-31 | 2018-07-27 | 华中科技大学 | A kind of coil of strip binocular visual positioning method and apparatus |
US20190197196A1 (en) * | 2017-12-26 | 2019-06-27 | Seiko Epson Corporation | Object detection and tracking |
CN110110793A (en) * | 2019-05-10 | 2019-08-09 | 中山大学 | Binocular image fast target detection method based on double-current convolutional neural networks |
CN110322507A (en) * | 2019-06-04 | 2019-10-11 | 东南大学 | A method of based on depth re-projection and Space Consistency characteristic matching |
CN111126269A (en) * | 2019-12-24 | 2020-05-08 | 京东数字科技控股有限公司 | Three-dimensional target detection method, device and storage medium |
-
2020
- 2020-07-06 CN CN202010645495.9A patent/CN111784680B/en not_active Expired - Fee Related
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190197196A1 (en) * | 2017-12-26 | 2019-06-27 | Seiko Epson Corporation | Object detection and tracking |
CN108317953A (en) * | 2018-01-19 | 2018-07-24 | 东北电力大学 | A kind of binocular vision target surface 3D detection methods and system based on unmanned plane |
CN108335331A (en) * | 2018-01-31 | 2018-07-27 | 华中科技大学 | A kind of coil of strip binocular visual positioning method and apparatus |
CN110110793A (en) * | 2019-05-10 | 2019-08-09 | 中山大学 | Binocular image fast target detection method based on double-current convolutional neural networks |
CN110322507A (en) * | 2019-06-04 | 2019-10-11 | 东南大学 | A method of based on depth re-projection and Space Consistency characteristic matching |
CN111126269A (en) * | 2019-12-24 | 2020-05-08 | 京东数字科技控股有限公司 | Three-dimensional target detection method, device and storage medium |
Non-Patent Citations (3)
Title |
---|
CL´EMENT GODARD等: "Unsupervised Monocular Depth Estimation with Left-Right Consistency", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
PEILIANG LI等: "Stereo R-CNN based 3D Object Detection for Autonomous Driving", 《2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
王康如等: "基于迭代式自主学习的三维目标检测", 《光学学报》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114743045A (en) * | 2022-03-31 | 2022-07-12 | 电子科技大学 | Small sample target detection method based on double-branch area suggestion network |
CN114743045B (en) * | 2022-03-31 | 2023-09-26 | 电子科技大学 | Small sample target detection method based on double-branch area suggestion network |
Also Published As
Publication number | Publication date |
---|---|
CN111784680B (en) | 2022-06-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information |
Inventor after: Yu Jiexiao Inventor after: Zhang Meiqi Inventor after: Jing Peiguang Inventor after: Su Yuting Inventor before: Yu Jiexiao Inventor before: Jing Peiguang Inventor before: Zhang Meiqi Inventor before: Su Yuting |
|
CB03 | Change of inventor or designer information | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20220628 |