CN111784680B - Detection method based on consistency of key points of left and right eye views of binocular camera - Google Patents

Detection method based on consistency of key points of left and right eye views of binocular camera

Info

Publication number
CN111784680B
CN111784680B (application CN202010645495.9A)
Authority
CN
China
Prior art keywords
cnn
loss
right eye
key points
eye view
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202010645495.9A
Other languages
Chinese (zh)
Other versions
CN111784680A (en)
Inventor
Yu Jiexiao
Zhang Meiqi
Jing Peiguang
Su Yuting
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202010645495.9A priority Critical patent/CN111784680B/en
Publication of CN111784680A publication Critical patent/CN111784680A/en
Application granted granted Critical
Publication of CN111784680B publication Critical patent/CN111784680B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T 7/00 Image analysis; G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06N 3/00 Computing arrangements based on biological models; G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology; G06N 3/045 Combinations of networks
    • G06V 10/00 Arrangements for image or video recognition or understanding; G06V 10/20 Image preprocessing; G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/40 Extraction of image or video features; G06V 10/50 Extraction by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; projection analysis
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement; G06T 2207/10 Image acquisition modality; G06T 2207/10004 Still image, photographic image; G06T 2207/10012 Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention discloses a binocular-camera-based detection method that exploits the consistency of key points between the left and right eye views, comprising the following steps: extracting histogram of oriented gradient features with a deterministic network; combining the extracted histogram of oriented gradient features and performing 2D target detection on the left and right eye views respectively with a stereo region proposal network to obtain left and right eye view candidate regions; predicting key points for the left and right eye view candidate regions with the internal key point prediction module of the stereo region convolutional neural network; performing consistency matching on the key points predicted from the left and right eye views, establishing corresponding loss functions, and minimizing the loss of each task through training; and estimating a 3D box from the predicted key points, performing pixel-level matching through dense 3D box alignment, and thereby further correcting the 3D box estimated in the previous step. The invention uses the consistency of the left and right key points to improve accuracy in three-dimensional detection.

Description

Detection method based on consistency of key points of left and right eye views of binocular camera
Technical Field
The invention relates to the field of binocular-camera three-dimensional detection, and in particular to a method for performing 3D detection with a binocular camera based on the consistency of key points between the left and right eye views.
Background
Object detection is an important part of the field of computer vision and has been a research focus almost since the field's inception. 2D object detection has developed greatly, with clear improvements in both accuracy and detection speed. With this progress, researchers' attention has begun to shift toward 3D object detection, which also has great practical significance. For example, the field of autonomous driving cannot do without 3D object detection, and 3D object detection still has large room for development, so developing 3D object detection algorithms is very important.
Most 3D object detection algorithms are based on lidar, binocular cameras, or monocular cameras, and the methods used differ with the detection device. Of the three, lidar-based algorithms are the most numerous, and radar-based methods can now achieve extremely high accuracy. However, lidar is expensive, is easily affected by weather changes (especially rain and snow), and can harm human eyes, which is a serious obstacle to the popularization of autonomous driving. A monocular camera, by contrast, is comparatively cheap and unaffected by weather, so it avoids the drawbacks of lidar; however, its 3D detection error is large and its results are unsatisfactory, so monocular 3D detection is not well suited for wide deployment. Considering accuracy, cost, and efficiency together, a binocular camera offers a better overall balance than the other two devices and can obtain relatively accurate depth values. 3D detection based on a binocular camera is therefore of great research significance. The Stereo Region Convolutional Neural Network (Stereo R-CNN) is a 3D object detection algorithm characterized by high accuracy and high speed; however, its 3D detection accuracy still has room for improvement.
It is therefore of interest to propose an efficient method for binocular vision 3D detection.
Disclosure of Invention
The invention provides a detection method based on the consistency of key points between the left and right eye views of a binocular camera, which uses the consistency of the left and right key points to improve accuracy in three-dimensional detection, as described in detail below:
a detection method based on the consistency of key points of the left and right eye views of a binocular camera comprises the following steps:
extracting histogram of oriented gradient features with a deterministic network;
combining the extracted histogram of oriented gradient features, performing 2D target detection on the left and right eye views respectively with a stereo region proposal network to obtain left and right eye view candidate regions;
predicting key points for the left and right eye view candidate regions respectively with the internal key point prediction module of the stereo region convolutional neural network;
performing consistency matching on the key points predicted from the left and right eye views, establishing corresponding loss functions, and minimizing the loss of each task through training;
and estimating a 3D box from the predicted key points, performing pixel-level matching through dense 3D box alignment, and further correcting the 3D box estimated in the previous step.
Wherein the loss function is specifically:

$$L = w_{cls}^{p} L_{cls}^{p} + w_{reg}^{p} L_{reg}^{p} + w_{cls}^{r} L_{cls}^{r} + w_{box}^{r} L_{box}^{r} + w_{\alpha}^{r} L_{\alpha}^{r} + w_{dim}^{r} L_{dim}^{r} + \beta\, w_{kp_l}^{r} L_{kp_l}^{r} + \gamma\, w_{kp_r}^{r} L_{kp_r}^{r}$$

where $w_{cls}^{p}$ and $L_{cls}^{p}$ are the weight and loss of classification in the RPN; $w_{reg}^{p}$ and $L_{reg}^{p}$ are the weight and loss of the regression task in the RPN; $w_{cls}^{r}$ and $L_{cls}^{r}$ are the weight and loss of classification in the R-CNN; $w_{box}^{r}$ and $L_{box}^{r}$ are the weight and loss of the box regression task in the R-CNN; $w_{\alpha}^{r}$ and $L_{\alpha}^{r}$ are the weight and loss of the viewpoint angle in the R-CNN; $w_{dim}^{r}$ and $L_{dim}^{r}$ are the weight and loss of the dimensions in the R-CNN; $w_{kp_l}^{r}$ and $L_{kp_l}^{r}$ are the weight and loss of the left key points in the R-CNN; $w_{kp_r}^{r}$ and $L_{kp_r}^{r}$ are the weight and loss of the right key points in the R-CNN; the superscript $p$ denotes the RPN part and $r$ the R-CNN part; and $\beta$ and $\gamma$ are the coefficients of the left and right key point terms respectively, with $\beta + \gamma = 1$.
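For illustration, a minimal sketch of how such a weighted multi-task objective can be combined is given below. The term names, dictionary layout, and plain-float weights are assumptions made for the sketch, not the patent's implementation; in practice each loss value comes from the corresponding network head.

```python
# Minimal sketch of the weighted multi-task objective above. Term names,
# the dict layout, and plain-float weights are illustrative assumptions.

def total_loss(losses, weights, beta=0.8, gamma=0.2):
    """losses / weights: dicts keyed by
    'cls_p', 'reg_p'    (RPN classification / regression),
    'cls_r', 'box_r'    (R-CNN classification / box regression),
    'alpha_r', 'dim_r'  (viewpoint angle / dimensions),
    'kp_l_r', 'kp_r_r'  (left / right key point terms)."""
    assert abs(beta + gamma - 1.0) < 1e-6   # the formula requires beta + gamma = 1
    shared = ("cls_p", "reg_p", "cls_r", "box_r", "alpha_r", "dim_r")
    loss = sum(weights[k] * losses[k] for k in shared)
    loss += beta * weights["kp_l_r"] * losses["kp_l_r"]
    loss += gamma * weights["kp_r_r"] * losses["kp_r_r"]
    return loss
```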
The technical scheme provided by the invention has the following beneficial effects:
1. the method corrects the key points using the consistency of the key points between the left and right eye views, thereby improving the accuracy of three-dimensional detection;
2. the method modifies the objective function so as to improve the performance of the 3D object detection algorithm;
3. the invention combines multiple ideas to achieve the best effect, and is particularly suitable for 3D object detection based on a binocular camera.
Drawings
Fig. 1 is a flowchart of a detection method based on consistency of key points of left and right eye views of a binocular camera.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
The embodiment of the invention provides a detection method based on the consistency of key points of the left and right eye views of a binocular camera. Referring to fig. 1, the method comprises the following steps (a structural sketch of the whole pipeline follows the list):
101: extracting Histogram of Oriented Gradient (HOG) features with a deterministic network (DetNet);
102: combining the extracted HOG features, performing 2D target detection on the left and right eye views respectively with a stereo region proposal network (Stereo RPN) to obtain left and right eye view candidate regions;
103: predicting key points for the left and right eye view candidate regions respectively with the internal key point prediction module of Stereo R-CNN;
104: performing consistency matching on the key points predicted from the left and right eye views, establishing corresponding loss functions, and minimizing the loss of each task through training so as to improve the accuracy of the classification and detection tasks;
105: estimating a 3D box from the predicted key points, performing pixel-level matching through dense 3D box alignment, and further correcting the 3D box estimated in the previous step.
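As referenced above, the following structural sketch shows how steps 101-105 fit together. Every function body is a trivial stand-in stub (returning dummy arrays) so the skeleton runs end to end; none of these names corresponds to an actual Stereo R-CNN or DetNet API.

```python
# Structural sketch of steps 101-105; all function bodies are dummy stubs.
import numpy as np

def extract_hog_features(img):                  # step 101 (DetNet in the patent)
    return np.zeros((img.shape[0] // 8, img.shape[1] // 8, 9))

def stereo_rpn(feats_l, feats_r):               # step 102: paired left/right proposals
    return np.array([[0., 0., 10., 10.]]), np.array([[2., 0., 12., 10.]])

def predict_keypoints(feats, rois):             # step 103: per-RoI key point prediction
    return np.zeros((len(rois), 4, 2))          # 4 (u, v) key points per region

def estimate_and_refine_3d(rois_l, rois_r, kps_l, kps_r):   # steps 104-105
    return np.zeros((len(rois_l), 7))           # (x, y, z, w, h, l, yaw) per object

left = np.zeros((375, 1242))                    # a KITTI-sized stereo pair
right = np.zeros((375, 1242))
f_l, f_r = extract_hog_features(left), extract_hog_features(right)
rois_l, rois_r = stereo_rpn(f_l, f_r)
kps_l, kps_r = predict_keypoints(f_l, rois_l), predict_keypoints(f_r, rois_r)
print(estimate_and_refine_3d(rois_l, rois_r, kps_l, kps_r).shape)   # (1, 7)
```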
Example 2
The scheme of Example 1 is described in further detail below with reference to calculation formulas and specific examples:
201: HOG features are extracted with DetNet from the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago) dataset for subsequent processing;
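As a generic illustration of HOG features themselves, the snippet below computes a standard HOG descriptor with scikit-image. This is a substitute sketch: the patent obtains its HOG features through DetNet, and the parameter values here are common defaults rather than the patent's settings.

```python
# Plain HOG feature extraction with scikit-image (a generic stand-in,
# not the DetNet-based extraction used in the patent).
from skimage import data
from skimage.color import rgb2gray
from skimage.feature import hog

image = rgb2gray(data.astronaut())   # any grayscale image, e.g. a KITTI frame
features = hog(
    image,
    orientations=9,                  # 9 gradient-orientation bins
    pixels_per_cell=(8, 8),          # histogram computed per 8x8 cell
    cells_per_block=(2, 2),          # blocks of 2x2 cells, locally normalized
)
print(features.shape)                # 1-D descriptor vector
```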
202: the region suggestion network (RPN) uses a sliding window to select the region of interest and selects the best result through non-maximum suppression. Because of the binocular camera, the RPN is transformed into a two-way Stereo region suggestion network (Stereo RPN). Generating left and right target RoI areas by adopting a Stereo RPN and performing non-maximum suppression processing to obtain left and right target candidate areas;
203: key points are predicted from the left and right target candidate regions; the predicted key points are used for the subsequent stereo estimation;
204: since the left and right target views are theoretically consistent and differ only by the disparity information, the corresponding key points in the left and right target views are consistent. The two sets of key points are matched against each other, and a corresponding loss function is established.
The loss function during training is as follows:
$$L = w_{cls}^{p} L_{cls}^{p} + w_{reg}^{p} L_{reg}^{p} + w_{cls}^{r} L_{cls}^{r} + w_{box}^{r} L_{box}^{r} + w_{\alpha}^{r} L_{\alpha}^{r} + w_{dim}^{r} L_{dim}^{r} + \beta\, w_{kp_l}^{r} L_{kp_l}^{r} + \gamma\, w_{kp_r}^{r} L_{kp_r}^{r}$$

where $w_{cls}^{p}$ and $L_{cls}^{p}$ are the weight and loss of classification in the RPN; $w_{reg}^{p}$ and $L_{reg}^{p}$ are the weight and loss of the regression task in the RPN; $w_{cls}^{r}$ and $L_{cls}^{r}$ are the weight and loss of classification in the R-CNN; $w_{box}^{r}$ and $L_{box}^{r}$ are the weight and loss of the box regression task in the R-CNN; $w_{\alpha}^{r}$ and $L_{\alpha}^{r}$ are the weight and loss of the viewpoint angle in the R-CNN; $w_{dim}^{r}$ and $L_{dim}^{r}$ are the weight and loss of the dimensions in the R-CNN; $w_{kp_l}^{r}$ and $L_{kp_l}^{r}$ are the weight and loss of the left key points in the R-CNN; $w_{kp_r}^{r}$ and $L_{kp_r}^{r}$ are the weight and loss of the right key points in the R-CNN; the superscript $p$ denotes the RPN part and $r$ the R-CNN part.
In order to keep the overall weight of the key point terms consistent with those of the stereo box, viewpoint angle and dimension terms, the coefficients of the left and right key point terms are $\beta$ and $\gamma$ respectively, with $\beta + \gamma = 1$.
Each loss term $L$ is weighted by its uncertainty. Experiments show that $\beta = 0.8$ and $\gamma = 0.2$ give the best results.
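A minimal sketch of the β/γ-weighted left and right key point terms follows. The smooth-L1 form and the (u, v) regression parameterization are assumptions made for illustration, not the patent's exact loss; the ground truth for each view is taken to come from the consistency matching step described above.

```python
# Sketch of the beta/gamma-weighted left/right key point terms.
# Smooth-L1 and (u, v) regression are illustrative assumptions.
import torch
import torch.nn.functional as F

def keypoint_terms(kp_l, kp_r, gt_l, gt_r, beta=0.8, gamma=0.2):
    """kp_l, kp_r: (N, K, 2) predicted (u, v) key points in the two views.
    gt_l, gt_r: matched ground truth; in a rectified pair the two views
    share the v coordinate and differ in u by the object disparity."""
    loss_l = F.smooth_l1_loss(kp_l, gt_l)   # L_kp_l in the formula above
    loss_r = F.smooth_l1_loss(kp_r, gt_r)   # L_kp_r in the formula above
    return beta * loss_l + gamma * loss_r   # beta + gamma = 1

kp = torch.rand(8, 4, 2)
print(keypoint_terms(kp, kp, kp + 0.05, kp + 0.05).item())  # illustrative call
```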
205: 3D box estimation is performed using the obtained predicted key points.
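The dense alignment that refines the estimated 3D box rests on standard rectified-stereo geometry: a point's depth follows from its horizontal disparity between the two views via z = f·b/d. The calibration numbers below are illustrative, close to typical KITTI values, and are not taken from the patent.

```python
# Standard rectified-stereo depth relation: z = f * b / d.
f = 721.5   # focal length in pixels (illustrative, near KITTI calibration)
b = 0.54    # stereo baseline in meters
d = 20.0    # measured horizontal disparity in pixels
z = f * b / d
print(f"depth = {z:.2f} m")  # ~19.48 m
```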
The following test experiments are given for the method of performing binocular-camera 3D detection based on the consistency of key points between the left and right eye views:
The detection performance of the embodiment of the invention is measured by Average Precision (AP). The detection indexes cover both 2D and 3D detection, and the detected images are divided into three difficulty modes: easy, moderate, and hard. AP reflects how consistently relevant detections are ranked above the others. 2D detection includes left, right, and stereo detection, with an Intersection-over-Union (IoU) threshold of 0.7 (Table 1); 3D detection includes bird's eye view detection and 3D box detection, with IoU thresholds of 0.5 (Table 2) and 0.7 (Table 3).
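For reference, IoU is the standard box-overlap ratio, the same measure used inside the NMS step above; a minimal 2D version:

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes [x1, y1, x2, y2]."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as correct when IoU with the ground truth reaches the
# threshold: 0.7 for 2D (Table 1), 0.5 or 0.7 for 3D (Tables 2-3).
print(iou([0, 0, 10, 10], [2, 0, 12, 10]))  # 80 / 120 = 0.666...
```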
To evaluate the performance of the method, the embodiment of the invention uses 7481 image pairs from the KITTI dataset, randomly divided into two groups of roughly equal size, one for training and the other for testing. During evaluation, only the Car label is considered; other labels (including bus and other labels similar to the Car label) are ignored.
TABLE 1 — 2D detection results (AP, IoU = 0.7); table image not reproduced here

TABLE 2 — 3D detection results (AP, IoU = 0.5); table image not reproduced here

TABLE 3 — 3D detection results (AP, IoU = 0.7); table image not reproduced here
As can be seen from Table 1, for 2D detection (IoU = 0.7) the accuracy improvement of the proposed method in the easy and moderate modes is insignificant, but the hard mode improves by about 2%. For 3D detection, at both IoU = 0.5 and IoU = 0.7, the bird's eye view detection accuracy improves by about 1% to 3%, and the 3D box detection accuracy also improves. The experimental results demonstrate the effectiveness of the method.
In the embodiment of the present invention, unless a device model is specifically described, the models of the devices are not limited, as long as the devices can perform the functions described above.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-mentioned serial numbers of the embodiments of the present invention are only for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (1)

1. A detection method based on the consistency of key points of the left and right eye views of a binocular camera, characterized by comprising the following steps:
extracting histogram of oriented gradient features with a deterministic network;
combining the extracted histogram of oriented gradient features, performing 2D target detection on the left and right eye views respectively with a stereo region proposal network to obtain left and right eye view candidate regions;
predicting key points for the left and right eye view candidate regions respectively with the internal key point prediction module of the stereo region convolutional neural network;
performing consistency matching on the key points predicted from the left and right eye views, establishing corresponding loss functions, and minimizing the loss of each task through training;
estimating a 3D box from the predicted key points, performing pixel-level matching through dense 3D box alignment, and further correcting the 3D box estimated in the previous step;
wherein the consistency matching of the key points predicted from the left and right eye views is as follows: the left and right target views are theoretically consistent, the difference between them is the disparity information, so the corresponding key points in the left and right target views are consistent, and the predicted key points of the left and right target views are matched against each other;
the loss function is specifically:
$$L = w_{cls}^{p} L_{cls}^{p} + w_{reg}^{p} L_{reg}^{p} + w_{cls}^{r} L_{cls}^{r} + w_{box}^{r} L_{box}^{r} + w_{\alpha}^{r} L_{\alpha}^{r} + w_{dim}^{r} L_{dim}^{r} + \beta\, w_{kp_l}^{r} L_{kp_l}^{r} + \gamma\, w_{kp_r}^{r} L_{kp_r}^{r}$$

wherein $w_{cls}^{p}$ and $L_{cls}^{p}$ are the weight and loss of classification in the RPN; $w_{reg}^{p}$ and $L_{reg}^{p}$ are the weight and loss of the regression task in the RPN; $w_{cls}^{r}$ and $L_{cls}^{r}$ are the weight and loss of classification in the R-CNN; $w_{box}^{r}$ and $L_{box}^{r}$ are the weight and loss of the box regression task in the R-CNN; $w_{\alpha}^{r}$ and $L_{\alpha}^{r}$ are the weight and loss of the viewpoint angle in the R-CNN; $w_{dim}^{r}$ and $L_{dim}^{r}$ are the weight and loss of the dimensions in the R-CNN; $w_{kp_l}^{r}$ and $L_{kp_l}^{r}$ are the weight and loss of the left key points in the R-CNN; $w_{kp_r}^{r}$ and $L_{kp_r}^{r}$ are the weight and loss of the right key points in the R-CNN; the superscript $p$ denotes the RPN part and $r$ the R-CNN part; and $\beta$ and $\gamma$ are the coefficients of the left and right key point terms respectively, with $\beta + \gamma = 1$.
CN202010645495.9A 2020-07-06 2020-07-06 Detection method based on consistency of key points of left and right eye views of binocular camera Expired - Fee Related CN111784680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010645495.9A CN111784680B (en) 2020-07-06 2020-07-06 Detection method based on consistency of key points of left and right eye views of binocular camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010645495.9A CN111784680B (en) 2020-07-06 2020-07-06 Detection method based on consistency of key points of left and right eye views of binocular camera

Publications (2)

Publication Number Publication Date
CN111784680A CN111784680A (en) 2020-10-16
CN111784680B true CN111784680B (en) 2022-06-28

Family

ID=72758031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010645495.9A Expired - Fee Related CN111784680B (en) 2020-07-06 2020-07-06 Detection method based on consistency of key points of left and right eye views of binocular camera

Country Status (1)

Country Link
CN (1) CN111784680B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743045B (en) * 2022-03-31 2023-09-26 电子科技大学 Small sample target detection method based on double-branch area suggestion network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108317953A (en) * 2018-01-19 2018-07-24 东北电力大学 A kind of binocular vision target surface 3D detection methods and system based on unmanned plane
CN108335331A (en) * 2018-01-31 2018-07-27 华中科技大学 A kind of coil of strip binocular visual positioning method and apparatus
CN110110793A (en) * 2019-05-10 2019-08-09 中山大学 Binocular image fast target detection method based on double-current convolutional neural networks
CN110322507A (en) * 2019-06-04 2019-10-11 东南大学 A method of based on depth re-projection and Space Consistency characteristic matching
CN111126269A (en) * 2019-12-24 2020-05-08 京东数字科技控股有限公司 Three-dimensional target detection method, device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10970425B2 (en) * 2017-12-26 2021-04-06 Seiko Epson Corporation Object detection and tracking

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108317953A (en) * 2018-01-19 2018-07-24 东北电力大学 A kind of binocular vision target surface 3D detection methods and system based on unmanned plane
CN108335331A (en) * 2018-01-31 2018-07-27 华中科技大学 A kind of coil of strip binocular visual positioning method and apparatus
CN110110793A (en) * 2019-05-10 2019-08-09 中山大学 Binocular image fast target detection method based on double-current convolutional neural networks
CN110322507A (en) * 2019-06-04 2019-10-11 东南大学 A method of based on depth re-projection and Space Consistency characteristic matching
CN111126269A (en) * 2019-12-24 2020-05-08 京东数字科技控股有限公司 Three-dimensional target detection method, device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Clément Godard et al. Unsupervised Monocular Depth Estimation with Left-Right Consistency. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. *
Peiliang Li et al. Stereo R-CNN based 3D Object Detection for Autonomous Driving. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019. Full text. *
Wang Kangru et al. Three-dimensional object detection based on iterative self-learning. Acta Optica Sinica. 2020-05-10, No. 09. Full text. *

Also Published As

Publication number Publication date
CN111784680A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
JP7106665B2 (en) MONOCULAR DEPTH ESTIMATION METHOD AND DEVICE, DEVICE AND STORAGE MEDIUM THEREOF
US11145078B2 (en) Depth information determining method and related apparatus
CN108648161B (en) Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network
CN107491071B (en) Intelligent multi-robot cooperative mapping system and method thereof
CN109887021B (en) Cross-scale-based random walk stereo matching method
JP6574611B2 (en) Sensor system for obtaining distance information based on stereoscopic images
CN110246151B (en) Underwater robot target tracking method based on deep learning and monocular vision
US11822621B2 (en) Systems and methods for training a machine-learning-based monocular depth estimator
CN102156995A (en) Video movement foreground dividing method in moving camera
CN110992424B (en) Positioning method and system based on binocular vision
US20220083789A1 (en) Real-Time Target Detection And 3d Localization Method Based On Single Frame Image
dos Santos Rosa et al. Sparse-to-continuous: Enhancing monocular depth estimation using occupancy maps
CN112686952A (en) Image optical flow computing system, method and application
CN116310673A (en) Three-dimensional target detection method based on fusion of point cloud and image features
CN111784680B (en) Detection method based on consistency of key points of left and right eye views of binocular camera
US11868438B2 (en) Method and system for self-supervised learning of pillar motion for autonomous driving
Tian et al. Monocular depth estimation based on a single image: a literature review
CN111695480B (en) Real-time target detection and 3D positioning method based on single frame image
Ji et al. Stereo 3D object detection via instance depth prior guidance and adaptive spatial feature aggregation
US9323999B2 (en) Image recoginition device, image recognition method, and image recognition program
CN110864670A (en) Method and system for acquiring position of target obstacle
CN113284221B (en) Target detection method and device and electronic equipment
JP2016004382A (en) Motion information estimation device
CN115272450A (en) Target positioning method based on panoramic segmentation
Akın et al. Challenges in determining the depth in 2-d images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Yu Jiexiao

Inventor after: Zhang Meiqi

Inventor after: Jing Peiguang

Inventor after: Su Yuting

Inventor before: Yu Jiexiao

Inventor before: Jing Peiguang

Inventor before: Zhang Meiqi

Inventor before: Su Yuting

GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220628