CN111784680B - Detection method based on consistency of key points of left and right eye views of binocular camera - Google Patents
Detection method based on consistency of key points of left and right eye views of binocular camera
- Publication number: CN111784680B (application CN202010645495.9A)
- Authority: CN (China)
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06N 3/045 — Neural networks; architecture; combinations of networks
- G06V 10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V 10/50 — Extraction of image or video features using histograms, e.g. histogram of oriented gradients [HoG]
- G06T 2207/10012 — Image acquisition modality: stereo images
Abstract
The invention discloses a detection method based on the consistency of key points of the left and right eye views of a binocular camera, comprising the following steps: extract histogram of oriented gradients features using a deterministic network; combining the extracted features, perform 2D object detection on the left and right eye views with a stereo region proposal network to obtain left- and right-view candidate regions; predict key points of the left- and right-view candidate regions with the internal key point prediction module of the stereo region convolutional neural network; match the key points predicted from the left and right views for consistency, establish the corresponding loss function, and minimize each task's loss through training; estimate a 3D box from the predicted key points, then perform pixel-level matching through dense 3D box alignment to further refine the estimated box. The invention exploits the consistency of the left and right key points to improve three-dimensional detection accuracy.
Description
Technical Field
The invention relates to the field of binocular-camera three-dimensional detection, and in particular to a method for 3D detection with a binocular camera based on the consistency of key points of the left and right eye views.
Background
Object detection is an important part of computer vision and has been a research focus almost since the field's inception. 2D object detection has developed greatly, with marked improvements in both accuracy and speed, and with that progress researchers' attention has turned to 3D object detection. 3D object detection also has great practical significance: the field of unmanned driving, for example, cannot do without it. Since 3D object detection still has large room for development, developing 3D object detection algorithms is very important.
Most 3D object detection algorithms are based on lidar, binocular cameras, or monocular cameras, and the methods used differ with the device. Lidar-based algorithms are the most numerous of the three and can now achieve extremely high accuracy. However, lidar is expensive, easily affected by weather (particularly rain and snow), and can damage human eyes, which is fatal to the popularization of unmanned driving. A monocular camera is inexpensive and unaffected by weather, overcoming those drawbacks of lidar, but its 3D detection error is large and its results unsatisfactory, so monocular 3D detection is also unsuitable for popularization. By comparison, considering accuracy, cost, and efficiency, the binocular camera is the best-rounded of the three: it can obtain relatively accurate depth values. 3D detection based on the binocular camera therefore has great research significance. The Stereo Region Convolutional Neural Network (Stereo R-CNN), a 3D object detection algorithm, is both accurate and fast; however, its 3D detection accuracy still has room for improvement.
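The depth advantage of the binocular camera noted above comes from triangulation over the stereo baseline, z = f·B/d. A minimal sketch of that relation (the focal length and baseline below are illustrative KITTI-like values, not taken from the patent):

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Metric depth of a point observed with the given pixel disparity: z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Illustrative setup: ~721 px focal length, ~0.54 m baseline.
z = depth_from_disparity(721.0, 0.54, 30.0)  # about 13 m
```

A 1-pixel disparity error at this range shifts the depth by roughly z/d, which is why sub-pixel (dense) alignment matters for far objects.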
It is therefore of interest to propose an efficient method for binocular vision 3D detection.
Disclosure of Invention
The invention provides a detection method based on the consistency of key points of the left and right eye views of a binocular camera, which uses the consistency of the left and right key points to improve three-dimensional detection accuracy, described in detail as follows:

A detection method based on the consistency of key points of the left and right eye views of a binocular camera comprises the following steps:

extracting histogram of oriented gradients features using a deterministic network;

combining the extracted features, performing 2D object detection on the left and right eye views with a stereo region proposal network to obtain left- and right-view candidate regions;

predicting key points of the left- and right-view candidate regions with the internal key point prediction module of the stereo region convolutional neural network;

matching the key points predicted from the left and right views for consistency, establishing the corresponding loss function, and minimizing each task's loss through training;

estimating a 3D box from the predicted key points, then performing pixel-level matching through dense 3D box alignment to further refine the box estimated in the previous step.
Wherein the loss function is specifically:

L = w_cls^p · L_cls^p + w_reg^p · L_reg^p + w_cls^r · L_cls^r + w_box^r · L_box^r + w_α^r · L_α^r + w_dim^r · L_dim^r + β · w_kpl^r · L_kpl^r + γ · w_kpr^r · L_kpr^r

where w_cls^p and L_cls^p are the classification weight and loss in the RPN; w_reg^p and L_reg^p are the weight and loss of the regression task in the RPN; w_cls^r and L_cls^r are the classification weight and loss in the R-CNN; w_box^r and L_box^r are the weight and loss of the stereo box task in the R-CNN; w_α^r and L_α^r are the weight and loss of the viewpoint angle in the R-CNN; w_dim^r and L_dim^r are the weight and loss of the dimensions in the R-CNN; w_kpl^r and L_kpl^r are the weight and loss of the left key point in the R-CNN; w_kpr^r and L_kpr^r are the weight and loss of the right key point in the R-CNN; superscript p denotes the RPN stage and r the R-CNN stage; and β and γ are the coefficients of the left and right key point terms, with β + γ = 1.
The technical scheme provided by the invention has the following beneficial effects:

1. the method corrects the key points using the consistency of the left and right eye views' key points, improving the accuracy of three-dimensional detection;

2. the method modifies the objective function, improving the effect of the 3D object detection algorithm;

3. the invention combines several ideas to achieve the best effect, and is particularly suitable for binocular-camera-based 3D object detection.
Drawings
Fig. 1 is a flowchart of a detection method based on consistency of key points of left and right eye views of a binocular camera.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
The embodiment of the invention provides a detection method based on the consistency of key points of the left and right eye views of a binocular camera, comprising the following steps:

101: extract Histogram of Oriented Gradients (HOG) features using a deterministic network (DetNet);

102: combining the extracted HOG features, perform 2D object detection on the left and right eye views with a stereo region proposal network (Stereo RPN) to obtain left- and right-view candidate regions;

103: predict key points of the left- and right-view candidate regions with the internal key point prediction module of Stereo R-CNN;

104: match the key points predicted from the left and right views for consistency, establish the corresponding loss function, and minimize each task's loss through training to improve the accuracy of the classification and detection tasks;

105: estimate a 3D box from the predicted key points, then perform pixel-level matching through dense 3D box alignment to further refine the box estimated in the previous step.
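The dense 3D box alignment in step 105 can be sketched as a photometric search: pixels inside the detected box in the left view should match the right view shifted by the disparity implied by a candidate depth, so searching for the disparity that minimizes the photometric error refines the box. A minimal one-row sketch (the brute-force search and all names are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

def photometric_cost(left_row: np.ndarray, right_row: np.ndarray,
                     xs: np.ndarray, disparity: int) -> float:
    """Mean squared intensity error between left pixels xs and their right matches."""
    xr = xs - disparity                       # corresponding right-image columns
    valid = (xr >= 0) & (xr < right_row.size)
    diff = left_row[xs[valid]] - right_row[xr[valid]]
    return float(np.mean(diff ** 2))

def best_disparity(left_row, right_row, xs, candidates):
    """Brute-force the candidate disparity with the lowest photometric cost."""
    return min(candidates, key=lambda d: photometric_cost(left_row, right_row, xs, d))

# Toy example: the right row equals the left row shifted by 5 pixels.
left = np.arange(50, dtype=float) ** 1.5
right = np.roll(left, -5)
xs = np.arange(10, 40)
d = best_disparity(left, right, xs, range(0, 10))  # recovers 5
```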
Example 2
The scheme in example 1 is further described below with calculation formulas and examples:

201: extract HOG features from the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago) dataset using DetNet for subsequent processing;
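As a toy illustration of what the HOG feature in step 201 encodes, the following sketch computes a single-cell orientation histogram; a real pipeline would use a full HOG implementation (e.g. block normalization over many cells), and all names here are illustrative:

```python
import numpy as np

def cell_hog(cell: np.ndarray, n_bins: int = 9) -> np.ndarray:
    """Histogram of unsigned gradient orientations (0-180 deg) weighted by magnitude."""
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0      # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, 180.0), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A vertical step edge concentrates all gradient energy in the 0-degree bin.
cell = np.tile(np.r_[np.zeros(4), np.ones(4)], (8, 1))  # 8x8 step edge
h = cell_hog(cell)
```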
202: a region proposal network (RPN) uses a sliding window to select regions of interest and keeps the best results through non-maximum suppression. Because a binocular camera is used, the RPN is extended into a two-branch stereo region proposal network (Stereo RPN). The Stereo RPN generates left and right target RoI regions, which are filtered by non-maximum suppression into left and right target candidate regions;
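The non-maximum suppression in step 202 is the standard greedy procedure; a minimal sketch (the threshold and boxes below are illustrative, not values from the patent):

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes: np.ndarray, scores: np.ndarray, thresh: float = 0.7):
    """Keep the highest-scoring box of each overlapping cluster (greedy NMS)."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # drop remaining boxes that overlap the kept one too much
        order = order[1:][[iou(boxes[i], boxes[j]) < thresh for j in order[1:]]]
    return keep

boxes = np.array([[0, 0, 10, 10], [0.5, 0.5, 10.5, 10.5], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # box 1 overlaps box 0 and is suppressed
```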
203: predict key points from the left and right target candidate regions; the predicted key points are used for the subsequent stereo estimation;

204: since the left and right target views are theoretically consistent and differ only by disparity information, corresponding key points in the left and right target views are also consistent. The two are matched and a corresponding loss function is established.
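The consistency in step 204 can be sketched as a rectified-epipolar check: a key point and its match lie on the same image row, and their column difference is a non-negative disparity. A minimal sketch (the tolerances are illustrative assumptions, not the patent's values):

```python
def consistent(kp_left: tuple, kp_right: tuple,
               row_tol: float = 2.0, max_disp: float = 192.0) -> bool:
    """True if (x, y) keypoints satisfy the rectified stereo consistency constraint."""
    (xl, yl), (xr, yr) = kp_left, kp_right
    disparity = xl - xr                        # left x >= right x for points in front
    return abs(yl - yr) <= row_tol and 0.0 <= disparity <= max_disp

ok = consistent((120.0, 55.0), (100.0, 55.4))   # same row, 20 px disparity
bad = consistent((120.0, 55.0), (100.0, 80.0))  # rows disagree
```

A predicted pair failing this check cannot correspond to the same physical point, which is what lets the matching step discard or correct bad key points.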
The loss function during training is as follows:
L = w_cls^p · L_cls^p + w_reg^p · L_reg^p + w_cls^r · L_cls^r + w_box^r · L_box^r + w_α^r · L_α^r + w_dim^r · L_dim^r + β · w_kpl^r · L_kpl^r + γ · w_kpr^r · L_kpr^r

where w_cls^p and L_cls^p are the classification weight and loss in the RPN; w_reg^p and L_reg^p are the weight and loss of the regression task in the RPN; w_cls^r and L_cls^r are the classification weight and loss in the R-CNN; w_box^r and L_box^r are the weight and loss of the stereo box task in the R-CNN; w_α^r and L_α^r are the weight and loss of the viewpoint angle in the R-CNN; w_dim^r and L_dim^r are the weight and loss of the dimensions in the R-CNN; w_kpl^r and L_kpl^r are the weight and loss of the left key point in the R-CNN; w_kpr^r and L_kpr^r are the weight and loss of the right key point in the R-CNN; superscript p denotes the RPN stage and r the R-CNN stage.
To keep the key point weight consistent with those of the stereo box, viewpoint angle, and dimensions, the left and right key point terms carry coefficients β and γ respectively, with β + γ = 1.

Each loss term L is weighted by uncertainty. Experiments show that β = 0.8 and γ = 0.2 gives the best results.
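Under these definitions, the combined training loss can be sketched as follows (the dictionary keys and example values are illustrative; the per-task weights w are assumed to be given, e.g. learned from uncertainty):

```python
def total_loss(w: dict, L: dict, beta: float = 0.8, gamma: float = 0.2) -> float:
    """Weighted multi-task loss with left/right keypoint coefficients beta + gamma = 1."""
    assert abs(beta + gamma - 1.0) < 1e-9
    shared = ["cls_p", "reg_p", "cls_r", "box_r", "alpha_r", "dim_r"]
    loss = sum(w[k] * L[k] for k in shared)
    loss += beta * w["kp_l"] * L["kp_l"] + gamma * w["kp_r"] * L["kp_r"]
    return loss

keys = ["cls_p", "reg_p", "cls_r", "box_r", "alpha_r", "dim_r", "kp_l", "kp_r"]
w = {k: 1.0 for k in keys}                     # uniform weights for illustration
L = {"cls_p": 0.5, "reg_p": 0.4, "cls_r": 0.3, "box_r": 0.2,
     "alpha_r": 0.1, "dim_r": 0.1, "kp_l": 0.5, "kp_r": 0.5}
total = total_loss(w, L)                       # 1.6 + 0.8*0.5 + 0.2*0.5 = 2.1
```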
205: estimate the 3D box from the obtained predicted key points.

The following test experiments are given for the method of the invention, which performs binocular-camera 3D detection based on the consistency of the left and right eye views' key points:
The detection performance of the embodiment is measured by Average Precision (AP); the evaluation covers 2D and 3D detection, and the test images are divided by difficulty into easy, moderate, and hard modes. Average Precision summarizes the detector's precision-recall curve as the area under it. 2D detection covers the left, right, and stereo views at an Intersection-over-Union (IoU) threshold of 0.7 (table 1); 3D detection covers bird's-eye-view detection and 3D box detection at IoU thresholds of 0.5 (table 2) and 0.7 (table 3).
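The Average Precision metric can be sketched as the area under the interpolated precision-recall curve; the recall/precision values below are illustrative, not numbers from the experiments:

```python
def average_precision(recalls, precisions) -> float:
    """AP = sum over recall steps of (delta recall) * interpolated precision."""
    # make precision monotonically non-increasing from the right (standard interpolation)
    prec = list(precisions)
    for i in range(len(prec) - 2, -1, -1):
        prec[i] = max(prec[i], prec[i + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, prec):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

ap = average_precision([0.2, 0.5, 1.0], [1.0, 0.8, 0.5])  # 0.2 + 0.24 + 0.25 = 0.69
```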
To evaluate the performance of the method, the embodiment uses the 7481 image pairs of the KITTI dataset, randomly divided into two groups of roughly equal size, one for training and the other for testing. During evaluation only the Car label is considered; other labels (including bus and other labels similar to the Car label) are not considered.
TABLE 1
TABLE 2
TABLE 3
As can be seen from table 1, for 2D detection (IoU = 0.7) the proposed method's accuracy improvement in the easy and moderate modes is insignificant, but the hard mode improves by about 2%. For 3D detection, at both IoU = 0.5 and IoU = 0.7 the bird's-eye-view detection accuracy improves by about 1% to 3%, and the 3D box detection accuracy also improves. These experimental results demonstrate the effectiveness of the method.
In the embodiments of the present invention, unless a specific device model is stated, the model of each device is not limited, as long as the device can perform the functions described above.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-mentioned serial numbers of the embodiments of the present invention are only for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (1)
1. A detection method based on the consistency of key points of the left and right eye views of a binocular camera, characterized by comprising the following steps:

extracting histogram of oriented gradients features using a deterministic network;

combining the extracted features, performing 2D object detection on the left and right eye views with a stereo region proposal network to obtain left- and right-view candidate regions;

predicting key points of the left- and right-view candidate regions with the internal key point prediction module of the stereo region convolutional neural network;

matching the key points predicted from the left and right views for consistency, establishing the corresponding loss function, and minimizing each task's loss through training;

estimating a 3D box from the predicted key points, then performing pixel-level matching through dense 3D box alignment to further refine the box estimated in the previous step;

wherein the consistency matching of the key points predicted from the left and right eye views is as follows: the left and right target views are theoretically consistent and differ only by disparity information, so corresponding key points in the left and right target views are consistent, and the key points predicted from the left and right target views are matched accordingly;
the loss function is specifically:

L = w_cls^p · L_cls^p + w_reg^p · L_reg^p + w_cls^r · L_cls^r + w_box^r · L_box^r + w_α^r · L_α^r + w_dim^r · L_dim^r + β · w_kpl^r · L_kpl^r + γ · w_kpr^r · L_kpr^r

where w_cls^p and L_cls^p are the classification weight and loss in the RPN; w_reg^p and L_reg^p are the weight and loss of the regression task in the RPN; w_cls^r and L_cls^r are the classification weight and loss in the R-CNN; w_box^r and L_box^r are the weight and loss of the stereo box task in the R-CNN; w_α^r and L_α^r are the weight and loss of the viewpoint angle in the R-CNN; w_dim^r and L_dim^r are the weight and loss of the dimensions in the R-CNN; w_kpl^r and L_kpl^r are the weight and loss of the left key point in the R-CNN; w_kpr^r and L_kpr^r are the weight and loss of the right key point in the R-CNN; superscript p denotes the RPN stage and r the R-CNN stage; and β and γ are the coefficients of the left and right key point terms, with β + γ = 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010645495.9A CN111784680B (en) | 2020-07-06 | 2020-07-06 | Detection method based on consistency of key points of left and right eye views of binocular camera |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010645495.9A CN111784680B (en) | 2020-07-06 | 2020-07-06 | Detection method based on consistency of key points of left and right eye views of binocular camera |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111784680A CN111784680A (en) | 2020-10-16 |
CN111784680B true CN111784680B (en) | 2022-06-28 |
Family
ID=72758031
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010645495.9A Expired - Fee Related CN111784680B (en) | 2020-07-06 | 2020-07-06 | Detection method based on consistency of key points of left and right eye views of binocular camera |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111784680B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114743045B (en) * | 2022-03-31 | 2023-09-26 | 电子科技大学 | Small sample target detection method based on double-branch area suggestion network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108317953A (en) * | 2018-01-19 | 2018-07-24 | 东北电力大学 | A kind of binocular vision target surface 3D detection methods and system based on unmanned plane |
CN108335331A (en) * | 2018-01-31 | 2018-07-27 | 华中科技大学 | A kind of coil of strip binocular visual positioning method and apparatus |
CN110110793A (en) * | 2019-05-10 | 2019-08-09 | 中山大学 | Binocular image fast target detection method based on double-current convolutional neural networks |
CN110322507A (en) * | 2019-06-04 | 2019-10-11 | 东南大学 | A method of based on depth re-projection and Space Consistency characteristic matching |
CN111126269A (en) * | 2019-12-24 | 2020-05-08 | 京东数字科技控股有限公司 | Three-dimensional target detection method, device and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10970425B2 (en) * | 2017-12-26 | 2021-04-06 | Seiko Epson Corporation | Object detection and tracking |
-
2020
- 2020-07-06 CN CN202010645495.9A patent/CN111784680B/en not_active Expired - Fee Related
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108317953A (en) * | 2018-01-19 | 2018-07-24 | 东北电力大学 | A kind of binocular vision target surface 3D detection methods and system based on unmanned plane |
CN108335331A (en) * | 2018-01-31 | 2018-07-27 | 华中科技大学 | A kind of coil of strip binocular visual positioning method and apparatus |
CN110110793A (en) * | 2019-05-10 | 2019-08-09 | 中山大学 | Binocular image fast target detection method based on double-current convolutional neural networks |
CN110322507A (en) * | 2019-06-04 | 2019-10-11 | 东南大学 | A method of based on depth re-projection and Space Consistency characteristic matching |
CN111126269A (en) * | 2019-12-24 | 2020-05-08 | 京东数字科技控股有限公司 | Three-dimensional target detection method, device and storage medium |
Non-Patent Citations (3)
Title |
---|
Unsupervised Monocular Depth Estimation with Left-Right Consistency; Clément Godard et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017 *
Stereo R-CNN based 3D Object Detection for Autonomous Driving; Peiliang Li et al.; 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 20191231; entire document *
3D Object Detection Based on Iterative Autonomous Learning; Wang Kangru et al.; Acta Optica Sinica; 20200510 (No. 09); entire document *
Also Published As
Publication number | Publication date |
---|---|
CN111784680A (en) | 2020-10-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7106665B2 (en) | MONOCULAR DEPTH ESTIMATION METHOD AND DEVICE, DEVICE AND STORAGE MEDIUM THEREOF | |
US11145078B2 (en) | Depth information determining method and related apparatus | |
CN108648161B (en) | Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network | |
CN107491071B (en) | Intelligent multi-robot cooperative mapping system and method thereof | |
CN109887021B (en) | Cross-scale-based random walk stereo matching method | |
JP6574611B2 (en) | Sensor system for obtaining distance information based on stereoscopic images | |
CN110246151B (en) | Underwater robot target tracking method based on deep learning and monocular vision | |
US11822621B2 (en) | Systems and methods for training a machine-learning-based monocular depth estimator | |
CN102156995A (en) | Video movement foreground dividing method in moving camera | |
CN110992424B (en) | Positioning method and system based on binocular vision | |
US20220083789A1 (en) | Real-Time Target Detection And 3d Localization Method Based On Single Frame Image | |
dos Santos Rosa et al. | Sparse-to-continuous: Enhancing monocular depth estimation using occupancy maps | |
CN112686952A (en) | Image optical flow computing system, method and application | |
CN116310673A (en) | Three-dimensional target detection method based on fusion of point cloud and image features | |
CN111784680B (en) | Detection method based on consistency of key points of left and right eye views of binocular camera | |
US11868438B2 (en) | Method and system for self-supervised learning of pillar motion for autonomous driving | |
Tian et al. | Monocular depth estimation based on a single image: a literature review | |
CN111695480B (en) | Real-time target detection and 3D positioning method based on single frame image | |
Ji et al. | Stereo 3D object detection via instance depth prior guidance and adaptive spatial feature aggregation | |
US9323999B2 (en) | Image recoginition device, image recognition method, and image recognition program | |
CN110864670A (en) | Method and system for acquiring position of target obstacle | |
CN113284221B (en) | Target detection method and device and electronic equipment | |
JP2016004382A (en) | Motion information estimation device | |
CN115272450A (en) | Target positioning method based on panoramic segmentation | |
Akın et al. | Challenges in determining the depth in 2-d images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| CB03 | Change of inventor or designer information | Inventors after change: Yu Jiexiao, Zhang Meiqi, Jing Peiguang, Su Yuting. Inventors before change: Yu Jiexiao, Jing Peiguang, Zhang Meiqi, Su Yuting |
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220628 |