CN111784680A - Detection method based on consistency of key points of left and right eye views of binocular camera - Google Patents

Detection method based on consistency of key points of left and right eye views of binocular camera

Info

Publication number
CN111784680A
Authority
CN
China
Prior art keywords
cnn
loss
right eye
eye view
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010645495.9A
Other languages
Chinese (zh)
Other versions
CN111784680B (en)
Inventor
Yu Jiexiao
Jing Peiguang
Zhang Meiqi
Su Yuting
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202010645495.9A
Publication of CN111784680A
Application granted
Publication of CN111784680B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; photographic image
    • G06T 2207/10012 Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention discloses a detection method based on the consistency of key points of the left and right eye views of a binocular camera, which comprises the following steps: extracting histogram of oriented gradients (HOG) features with a DetNet network; combining the extracted HOG features, performing 2D object detection on the left and right eye views with a stereo region proposal network to obtain left and right eye view candidate regions; predicting key points for the left and right eye view candidate regions with the internal key point prediction module of the stereo region-based convolutional neural network; performing consistency matching on the key points predicted from the left and right eye views, establishing a corresponding loss function, and minimizing the loss of each task through training; and estimating a 3D box from the predicted key points, performing pixel-level matching through dense 3D box alignment, and further correcting the 3D box estimated in the previous step. The invention exploits the consistency of the left and right key points to improve the accuracy of three-dimensional detection.

Description

Detection method based on consistency of key points of left and right eye views of binocular camera
Technical Field
The invention relates to the field of binocular camera stereo detection, and in particular to a method for binocular camera 3D detection based on the consistency of key points in the left and right eye views.
Background
Object detection is an important part of computer vision and has been a research focus almost since the birth of the computer. 2D object detection has developed considerably, with marked improvements in both accuracy and detection speed. Building on this progress, researchers have turned their attention to 3D object detection, which also has great significance in practical applications. In the field of unmanned driving, for example, 3D object detection is indispensable, and it still has ample room for development, so developing 3D object detection algorithms is very important.
Most 3D object detection algorithms are based on lidar, binocular cameras or monocular cameras, and the methods used differ with the detection device. Of the three, lidar-based algorithms are the most numerous and can now achieve extremely high accuracy. However, lidar is expensive, easily affected by weather changes (particularly rain and snow), and can harm human eyes, which is fatal to the popularization of unmanned driving. A monocular camera, although relatively cheap and unaffected by weather, thus overcoming those drawbacks of lidar, produces large 3D detection errors and unsatisfactory results, so monocular 3D detection is also unsuitable for popularization. By comparison, the binocular camera offers the best overall balance among the three in accuracy, cost, efficiency and the like, and can obtain relatively accurate depth values. 3D detection based on a binocular camera therefore has great research significance. The Stereo Region-based Convolutional Neural Network (Stereo R-CNN), a 3D object detection algorithm, is characterized by high accuracy and high speed; nevertheless, its 3D detection accuracy still has room for improvement.
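The depth advantage of the binocular camera comes from triangulation. As a brief illustration (this is standard rectified-stereo geometry, not a relation stated explicitly in this text), with focal length $f$, baseline $B$ and disparity $d$ between matched pixels, the recovered depth is

$$Z = \frac{fB}{d},$$

so accurate sub-pixel disparity translates directly into accurate depth, especially for nearby objects.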
It is therefore of interest to propose an efficient method for binocular vision 3D detection.
Disclosure of Invention
The invention provides a detection method based on the consistency of key points of the left and right eye views of a binocular camera, which exploits the consistency of the left and right key points to improve the accuracy of three-dimensional detection, described in detail as follows:
a detection method based on the consistency of key points of the left and right eye views of a binocular camera comprises the following steps:
extracting histogram of oriented gradients (HOG) features with a DetNet network;
combining the extracted HOG features, performing 2D object detection on the left and right eye views with a stereo region proposal network to obtain left and right eye view candidate regions;
predicting key points for the left and right eye view candidate regions with the internal key point prediction module of the stereo region-based convolutional neural network;
performing consistency matching on the key points predicted from the left and right eye views, establishing a corresponding loss function, and minimizing the loss of each task through training;
and estimating a 3D box from the predicted key points, performing pixel-level matching through dense 3D box alignment, and further correcting the 3D box estimated in the previous step.
Wherein the loss function is specifically:
$$L = w_{cls}^{p}L_{cls}^{p} + w_{reg}^{p}L_{reg}^{p} + w_{cls}^{r}L_{cls}^{r} + w_{box}^{r}L_{box}^{r} + w_{\alpha}^{r}L_{\alpha}^{r} + w_{dim}^{r}L_{dim}^{r} + \beta\,w_{kp_{l}}^{r}L_{kp_{l}}^{r} + \gamma\,w_{kp_{r}}^{r}L_{kp_{r}}^{r}$$

wherein the superscript $p$ denotes the RPN stage and $r$ the R-CNN stage; $w_{cls}^{p}$ and $L_{cls}^{p}$ are the weight and loss of classification in the RPN; $w_{reg}^{p}$ and $L_{reg}^{p}$ are the weight and loss of the regression task in the RPN; $w_{cls}^{r}$ and $L_{cls}^{r}$ are the weight and loss of classification in the R-CNN; $w_{box}^{r}$ and $L_{box}^{r}$ are the weight and loss of the box-regression task in the R-CNN; $w_{\alpha}^{r}$ and $L_{\alpha}^{r}$ are the weight and loss of the viewpoint angle in the R-CNN; $w_{dim}^{r}$ and $L_{dim}^{r}$ are the weight and loss of the dimensions in the R-CNN; $w_{kp_{l}}^{r}$ and $L_{kp_{l}}^{r}$ are the weight and loss of the left key points in the R-CNN; $w_{kp_{r}}^{r}$ and $L_{kp_{r}}^{r}$ are the weight and loss of the right key points in the R-CNN; and $\beta$ and $\gamma$ are the coefficients of the left and right key-point terms respectively, with $\beta + \gamma = 1$.
The technical scheme provided by the invention has the beneficial effects that:
1. the method corrects the key points by exploiting the consistency of the key points in the left and right eye views, thereby improving the accuracy of three-dimensional detection;
2. the method modifies the objective function, thereby improving the performance of the 3D object detection algorithm;
3. the invention combines several ideas to achieve the best effect, and is particularly suitable for 3D object detection based on a binocular camera.
Drawings
Fig. 1 is a flowchart of a detection method based on consistency of key points of left and right eye views of a binocular camera.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
The embodiment of the invention provides a detection method based on the consistency of key points of the left and right eye views of a binocular camera, and as shown in fig. 1 the method comprises the following steps:
101: extracting Histogram of Oriented Gradient (HOG) features by using a Deterministic network (DetNet);
102: 2D target detection is respectively carried out on the left eye view and the right eye view by utilizing a Stereo region suggestion network (Stereo RPN) in combination with the extracted HOG characteristics to obtain left eye view and right eye view candidate regions;
103: respectively predicting key points of the left and right eye view candidate regions by using an internal key point prediction module of the Stereo R-CNN;
104: carrying out consistency matching on the key points of the left and right eye view predictions, establishing corresponding loss functions, and minimizing the loss of each task through training to improve the accuracy of classification and detection tasks;
105: and estimating a 3D frame according to the predicted key points, performing pixel-level precision matching through dense 3D frame alignment, and further correcting the result of the 3D frame estimated in the last step.
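A minimal sketch of what the dense 3D box alignment in step 105 could look like (assumptions: rectified grayscale float images, an initial depth estimate to refine, and a simple sum-of-squared-differences photometric error; the exact formulation is not given in this text):

```python
import numpy as np

def dense_align_depth(left_img, right_img, us, vs, z_init, f, B,
                      search=0.5, steps=64):
    """Refine a 3D box depth by dense photometric alignment (a sketch).

    (us, vs) are the pixel coordinates inside the object's 2D box in the
    rectified left view. Each candidate depth z implies a disparity
    d = f*B/z; the left pixels are compared against the right image shifted
    by d, and the depth with the smallest photometric error is kept.
    """
    best_z, best_err = z_init, np.inf
    for z in np.linspace(z_init - search, z_init + search, steps):
        d = f * B / z                                    # implied disparity
        ur = np.clip(np.round(us - d).astype(int), 0, right_img.shape[1] - 1)
        err = np.mean((left_img[vs, us] - right_img[vs, ur]) ** 2)
        if err < best_err:
            best_z, best_err = z, err
    return best_z
```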
Example 2
The scheme of Example 1 is described in further detail below with reference to calculation formulas and examples:
201: HOG features are extracted with DetNet from the KITTI dataset (a project of the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago) for subsequent processing;
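The extraction pipeline is not spelled out in this step; as a minimal illustration of the HOG descriptor itself, here is a scikit-image sketch (an assumption: in the method the features come from the DetNet backbone rather than a hand-crafted extractor, and the image path below is hypothetical):

```python
from skimage import io
from skimage.color import rgb2gray
from skimage.feature import hog

# Hypothetical path to one KITTI left-camera image.
image = rgb2gray(io.imread("kitti/image_2/000000.png"))

# Classic HOG: 9 orientation bins over 8x8-pixel cells, 2x2-cell blocks.
features = hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)
print(features.shape)  # one flat descriptor per image
```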
202: the region suggestion network (RPN) uses a sliding window to select the region of interest and selects the best result through non-maximum suppression. Because of the binocular camera, the RPN is transformed into a two-way Stereo region suggestion network (Stereo RPN). Generating left and right target RoI areas by adopting a Stereo RPN and performing non-maximum suppression processing to obtain left and right target candidate areas;
203: key points are predicted from the left and right target candidate regions and used for the subsequent stereo estimation;
204: since the left and right target views are theoretically consistent, and the difference between the left and right target views is parallax information, corresponding key points in the left and right target views have consistency. And matching the two to establish a corresponding loss function.
The loss function during training is as follows:
$$L = w_{cls}^{p}L_{cls}^{p} + w_{reg}^{p}L_{reg}^{p} + w_{cls}^{r}L_{cls}^{r} + w_{box}^{r}L_{box}^{r} + w_{\alpha}^{r}L_{\alpha}^{r} + w_{dim}^{r}L_{dim}^{r} + \beta\,w_{kp_{l}}^{r}L_{kp_{l}}^{r} + \gamma\,w_{kp_{r}}^{r}L_{kp_{r}}^{r}$$

wherein the superscript $p$ denotes the RPN stage and $r$ the R-CNN stage; $w_{cls}^{p}$ and $L_{cls}^{p}$ are the weight and loss of classification in the RPN; $w_{reg}^{p}$ and $L_{reg}^{p}$ are the weight and loss of the regression task in the RPN; $w_{cls}^{r}$ and $L_{cls}^{r}$ are the weight and loss of classification in the R-CNN; $w_{box}^{r}$ and $L_{box}^{r}$ are the weight and loss of the box-regression task in the R-CNN; $w_{\alpha}^{r}$ and $L_{\alpha}^{r}$ are the weight and loss of the viewpoint angle in the R-CNN; $w_{dim}^{r}$ and $L_{dim}^{r}$ are the weight and loss of the dimensions in the R-CNN; $w_{kp_{l}}^{r}$ and $L_{kp_{l}}^{r}$ are the weight and loss of the left key points in the R-CNN; and $w_{kp_{r}}^{r}$ and $L_{kp_{r}}^{r}$ are the weight and loss of the right key points in the R-CNN.
To keep the weight of the key points consistent with the stereo box, viewpoint angle and dimension terms, the coefficients of the left and right key-point terms are $\beta$ and $\gamma$ respectively, with $\beta + \gamma = 1$.
Each task loss $L$ is weighted by uncertainty. Experiments show that $\beta = 0.8$ and $\gamma = 0.2$ give the best results.
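A minimal sketch of how this weighted total could be assembled in PyTorch (an assumption: the text does not specify the uncertainty-weighting scheme, so the common learned log-variance form is used here, with fixed coefficients $\beta$ and $\gamma$ on the key-point terms):

```python
import torch
import torch.nn as nn

class MultiTaskLoss(nn.Module):
    """Total loss: uncertainty-weighted task losses, with fixed coefficients
    beta/gamma on the left/right key-point terms (beta + gamma = 1)."""

    TASKS = ["cls_p", "reg_p", "cls_r", "box_r", "alpha_r", "dim_r",
             "kp_l", "kp_r"]

    def __init__(self, beta=0.8, gamma=0.2):
        super().__init__()
        assert abs(beta + gamma - 1.0) < 1e-6
        self.beta, self.gamma = beta, gamma
        # One learnable log-variance s per task; the weight is exp(-s).
        self.log_vars = nn.ParameterDict(
            {t: nn.Parameter(torch.zeros(())) for t in self.TASKS})

    def forward(self, losses):
        total = 0.0
        for task, loss in losses.items():
            s = self.log_vars[task]
            if task == "kp_l":
                loss = self.beta * loss
            elif task == "kp_r":
                loss = self.gamma * loss
            total = total + torch.exp(-s) * loss + s  # + s regularizes weight
        return total

# Usage: criterion = MultiTaskLoss()
# total = criterion({"cls_p": ..., "reg_p": ..., ..., "kp_l": ..., "kp_r": ...})
```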
205: a 3D box is estimated from the obtained predicted key points.
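The box solve itself is not detailed here; a hedged sketch of the core geometric step, recovering the depth of each matched key point from its left-right disparity and averaging into a 3D box center (assuming a rectified pair with focal length f, baseline B and principal point (cu, cv)):

```python
import numpy as np

def keypoints_to_3d_center(kp_left, kp_right, f, B, cu, cv):
    """Triangulate matched key points and average them into a 3D box center.

    kp_left, kp_right: (N, 2) arrays of matched (u, v) pixel coordinates
    in the rectified left and right views.
    """
    d = kp_left[:, 0] - kp_right[:, 0]        # per-key-point disparity
    d = np.clip(d, 1e-6, None)                # guard against zero disparity
    z = f * B / d                             # depth from disparity
    x = (kp_left[:, 0] - cu) * z / f          # back-project to camera frame
    y = (kp_left[:, 1] - cv) * z / f
    return np.stack([x, y, z], axis=1).mean(axis=0)
```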
The following test experiments are given for the method of the invention for binocular camera 3D detection based on the consistency of key points in the left and right eye views:
the detection performance of the embodiment of the invention is measured by Average accuracy (Average Precision), and the detection indexes comprise 2D detection and 3D detection, and the detection method is divided into three modes of simple (easy), moderate (mode) and difficult (hard) according to the difficulty degree of the detected image. Average Precision represents the probability that the Average score of a relevant tag is ranked higher than other relevant tags; the 2D detection includes detection of left (left), right (right), and stereo (stereo), and the Intersection-over-Union (IoU) of the detection is 0.7 (corresponding to table 1); the 3D detection includes bird's eye view (bird's view) detection and 3D boxes (3D boxes) detection, and IoU is divided into two kinds of 0.5 (corresponding to table 2) and 0.7 (corresponding to table 3).
To evaluate the performance of the method, the embodiment of the invention uses 7481 image sets from the KITTI dataset, randomly divided into two groups of roughly equal size, one for training and the other for testing. During evaluation, only the Car label is considered; other labels (including bus and other car-like labels) are not.
TABLE 1
[Table 1 is an image in the original publication: average precision of 2D detection (left, right and stereo) at IoU = 0.7 for the easy, moderate and hard modes.]
TABLE 2
[Table 2 is an image in the original publication: average precision of 3D detection (bird's-eye view and 3D box) at IoU = 0.5.]
TABLE 3
[Table 3 is an image in the original publication: average precision of 3D detection (bird's-eye view and 3D box) at IoU = 0.7.]
As can be seen from Table 1, for 2D detection (IoU = 0.7) the accuracy improvement of the proposed method in the easy and moderate modes is insignificant, but the hard mode improves by about 2%. For 3D detection, at both IoU = 0.5 and IoU = 0.7 the bird's-eye-view detection accuracy improves by about 1% to 3%, and the 3D box detection accuracy also improves. The experimental results demonstrate the effectiveness of the method.
In the embodiment of the invention, unless a specific device model is noted, the models of the devices are not limited, as long as the devices can perform the functions described above.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above embodiments of the invention are provided for description only and do not indicate relative merit.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (2)

1. A detection method based on the consistency of key points of the left and right eye views of a binocular camera, characterized by comprising the following steps:
extracting histogram of oriented gradients (HOG) features with a DetNet network;
combining the extracted HOG features, performing 2D object detection on the left and right eye views with a stereo region proposal network to obtain left and right eye view candidate regions;
predicting key points for the left and right eye view candidate regions with the internal key point prediction module of the stereo region-based convolutional neural network;
performing consistency matching on the key points predicted from the left and right eye views, establishing a corresponding loss function, and minimizing the loss of each task through training;
and estimating a 3D box from the predicted key points, performing pixel-level matching through dense 3D box alignment, and further correcting the 3D box estimated in the previous step.
2. The detection method based on the consistency of key points of the left and right eye views of a binocular camera according to claim 1, wherein the loss function is specifically:
$$L = w_{cls}^{p}L_{cls}^{p} + w_{reg}^{p}L_{reg}^{p} + w_{cls}^{r}L_{cls}^{r} + w_{box}^{r}L_{box}^{r} + w_{\alpha}^{r}L_{\alpha}^{r} + w_{dim}^{r}L_{dim}^{r} + \beta\,w_{kp_{l}}^{r}L_{kp_{l}}^{r} + \gamma\,w_{kp_{r}}^{r}L_{kp_{r}}^{r}$$

wherein the superscript $p$ denotes the RPN stage and $r$ the R-CNN stage; $w_{cls}^{p}$ and $L_{cls}^{p}$ are the weight and loss of classification in the RPN; $w_{reg}^{p}$ and $L_{reg}^{p}$ are the weight and loss of the regression task in the RPN; $w_{cls}^{r}$ and $L_{cls}^{r}$ are the weight and loss of classification in the R-CNN; $w_{box}^{r}$ and $L_{box}^{r}$ are the weight and loss of the box-regression task in the R-CNN; $w_{\alpha}^{r}$ and $L_{\alpha}^{r}$ are the weight and loss of the viewpoint angle in the R-CNN; $w_{dim}^{r}$ and $L_{dim}^{r}$ are the weight and loss of the dimensions in the R-CNN; $w_{kp_{l}}^{r}$ and $L_{kp_{l}}^{r}$ are the weight and loss of the left key points in the R-CNN; $w_{kp_{r}}^{r}$ and $L_{kp_{r}}^{r}$ are the weight and loss of the right key points in the R-CNN; and $\beta$ and $\gamma$ are the coefficients of the left and right key-point terms respectively, with $\beta + \gamma = 1$.
CN202010645495.9A 2020-07-06 2020-07-06 Detection method based on consistency of key points of left and right eye views of binocular camera Expired - Fee Related CN111784680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010645495.9A CN111784680B (en) 2020-07-06 2020-07-06 Detection method based on consistency of key points of left and right eye views of binocular camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010645495.9A CN111784680B (en) 2020-07-06 2020-07-06 Detection method based on consistency of key points of left and right eye views of binocular camera

Publications (2)

Publication Number Publication Date
CN111784680A true CN111784680A (en) 2020-10-16
CN111784680B CN111784680B (en) 2022-06-28

Family

ID=72758031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010645495.9A Expired - Fee Related CN111784680B (en) 2020-07-06 2020-07-06 Detection method based on consistency of key points of left and right eye views of binocular camera

Country Status (1)

Country Link
CN (1) CN111784680B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743045A (en) * 2022-03-31 2022-07-12 电子科技大学 Small sample target detection method based on double-branch area suggestion network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108317953A (en) * 2018-01-19 2018-07-24 东北电力大学 A kind of binocular vision target surface 3D detection methods and system based on unmanned plane
CN108335331A (en) * 2018-01-31 2018-07-27 华中科技大学 A kind of coil of strip binocular visual positioning method and apparatus
US20190197196A1 (en) * 2017-12-26 2019-06-27 Seiko Epson Corporation Object detection and tracking
CN110110793A (en) * 2019-05-10 2019-08-09 中山大学 Binocular image fast target detection method based on double-current convolutional neural networks
CN110322507A (en) * 2019-06-04 2019-10-11 东南大学 A method of based on depth re-projection and Space Consistency characteristic matching
CN111126269A (en) * 2019-12-24 2020-05-08 京东数字科技控股有限公司 Three-dimensional target detection method, device and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190197196A1 (en) * 2017-12-26 2019-06-27 Seiko Epson Corporation Object detection and tracking
CN108317953A (en) * 2018-01-19 2018-07-24 东北电力大学 A kind of binocular vision target surface 3D detection methods and system based on unmanned plane
CN108335331A (en) * 2018-01-31 2018-07-27 华中科技大学 A kind of coil of strip binocular visual positioning method and apparatus
CN110110793A (en) * 2019-05-10 2019-08-09 中山大学 Binocular image fast target detection method based on double-current convolutional neural networks
CN110322507A (en) * 2019-06-04 2019-10-11 东南大学 A method of based on depth re-projection and Space Consistency characteristic matching
CN111126269A (en) * 2019-12-24 2020-05-08 京东数字科技控股有限公司 Three-dimensional target detection method, device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CLÉMENT GODARD et al.: "Unsupervised Monocular Depth Estimation with Left-Right Consistency", 2017 IEEE Conference on Computer Vision and Pattern Recognition *
PEILIANG LI et al.: "Stereo R-CNN based 3D Object Detection for Autonomous Driving", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) *
WANG Kangru et al.: "Three-Dimensional Object Detection Based on Iterative Autonomous Learning", Acta Optica Sinica *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743045A (en) * 2022-03-31 2022-07-12 电子科技大学 Small sample target detection method based on double-branch area suggestion network
CN114743045B (en) * 2022-03-31 2023-09-26 电子科技大学 Small sample target detection method based on double-branch area suggestion network

Also Published As

Publication number Publication date
CN111784680B (en) 2022-06-28

Similar Documents

Publication Publication Date Title
JP7106665B2 (en) MONOCULAR DEPTH ESTIMATION METHOD AND DEVICE, DEVICE AND STORAGE MEDIUM THEREOF
CN108648161B (en) Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network
CN102156995A (en) Video movement foreground dividing method in moving camera
JP2016009487A (en) Sensor system for determining distance information on the basis of stereoscopic image
Xie et al. A binocular vision application in IoT: Realtime trustworthy road condition detection system in passable area
US20220083789A1 (en) Real-Time Target Detection And 3d Localization Method Based On Single Frame Image
CN110992378B (en) Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot
dos Santos Rosa et al. Sparse-to-continuous: Enhancing monocular depth estimation using occupancy maps
US11868438B2 (en) Method and system for self-supervised learning of pillar motion for autonomous driving
CN111784680B (en) Detection method based on consistency of key points of left and right eye views of binocular camera
Kao et al. Moving object segmentation using depth and optical flow in car driving sequences
CN111695480B (en) Real-time target detection and 3D positioning method based on single frame image
Ji et al. Stereo 3D object detection via instance depth prior guidance and adaptive spatial feature aggregation
CN106650814B (en) Outdoor road self-adaptive classifier generation method based on vehicle-mounted monocular vision
Tao et al. An efficient 3D object detection method based on fast guided anchor stereo RCNN
Kerkaou et al. Support vector machines based stereo matching method for advanced driver assistance systems
Li et al. CDMY: A lightweight object detection model based on coordinate attention
CN113284221B (en) Target detection method and device and electronic equipment
Kim et al. Stereo-based region of interest generation for real-time pedestrian detection
CN115272450A (en) Target positioning method based on panoramic segmentation
Lu et al. A geometric convolutional neural network for 3d object detection
Zhang et al. An Improved Detection Algorithm For Pre-processing Problem Based On PointPillars
Akın et al. Challenges in determining the depth in 2-d images
Zhang et al. Learning deformable network for 3D object detection on point clouds
Zeng High efficiency pedestrian crossing prediction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Yu Jiexiao

Inventor after: Zhang Meiqi

Inventor after: Jing Peiguang

Inventor after: Su Yuting

Inventor before: Yu Jiexiao

Inventor before: Jing Peiguang

Inventor before: Zhang Meiqi

Inventor before: Su Yuting

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220628