US20150324659A1 - Method for detecting objects in stereo images - Google Patents

Method for detecting objects in stereo images

Info

Publication number
US20150324659A1
US20150324659A1
Authority
US
United States
Prior art keywords
images
sub
stereo images
pair
stereo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US14/272,570
Other versions
US9195904B1 (en)
Inventor
Ming-Yu Liu
Oncel Tuzel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Research Laboratories Inc
Original Assignee
Mitsubishi Electric Research Laboratories Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Research Laboratories Inc filed Critical Mitsubishi Electric Research Laboratories Inc
Priority to US14/272,570 priority Critical patent/US9195904B1/en
Priority to JP2015075866A priority patent/JP6345147B2/en
Priority to CN201510232293.0A priority patent/CN105096307B/en
Publication of US20150324659A1 publication Critical patent/US20150324659A1/en
Application granted granted Critical
Publication of US9195904B1 publication Critical patent/US9195904B1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06K9/468
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G06T7/55 - Depth or shape recovery from multiple images
    • G06T7/593 - Depth or shape recovery from multiple images from stereo images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06K9/52
    • G06K9/6256
    • G06K9/6267
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/194 - Segmentation; Edge detection involving foreground-background segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/255 - Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10004 - Still image; Photographic image
    • G06T2207/10012 - Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A method detects an object in a pair of stereo images acquired of a scene by first generating a cost volume from the pair of stereo images, wherein the cost volume includes matching costs, for each pixel of the stereo images, over a range of disparity values between the stereo images in the pair. Feature vectors are determined from sub-images in the cost volume using a feature function of the disparity values with a minimal accumulated cost within regions inside the sub-images. Then, a classifier is applied to the feature vectors to detect whether each sub-image includes the object.

Description

    FIELD OF THE INVENTION
  • This invention relates to computer vision, and more particularly to detecting objects in stereo images.
  • BACKGROUND OF THE INVENTION
  • Many computer vision applications use stereo images acquired by a stereo camera to detect objects. A stereo camera typically has multiple lenses and sensors. Usually, the intra-axial distance between the lenses is about the same as the distance between human eyes, so that the views overlap.
  • FIG. 1 shows a conventional system for stereo-based object detection. A stereo camera 101 acquires stereo images 102. The detection method can include the following steps: stereo imaging 100, cost volume determination 110, depth/disparity map estimation 120, and object detection 130.
  • Most conventional methods for stereo-based object detection rely on per-pixel depth information in the overlapping area 120. This step is generally referred to as depth/range map estimation. It can be achieved by determining disparity values, i.e., the translations of corresponding pixels between the two images, from which the depth map is determined. The depth map can then be used for object detection 130, e.g., a histogram of oriented gradients (HoG) of the depth map is used for object description. One method estimates the dominant disparity in a sub-image region, and uses a co-occurrence histogram of the relative disparity values for object detection.
  • Depth/range/disparity map estimation is a challenging problem. Local methods suffer from inaccurate depth determination, while global methods require significant computational resources and are unsuited for real-time applications.
  • Several methods avoid the depth map determination step by using stereo cues for region of interest generation. For example, one method determines a stixel map which marks the potential object locations. Each stixel is defined by a 3D position relative to the camera and stands vertically on a ground plane. A detector based on the color image content is then applied to the locations to detect objects.
  • U.S. Publication 20130177237 uses a range map to determine an area of interest, and uses a classifier based on an intensity histogram to detect objects.
  • Region of interest methods cannot be applied directly to object detection; they have to be used in conjunction with other object detectors. In addition, a missed detection is certain when the area of interest does not cover the object.
  • SUMMARY OF THE INVENTION
  • The embodiments of the invention provide a method for detecting objects in stereo images. A cost volume is computed from the images. Then, object detection is applied directly to features obtained from the cost volume. The detection uses T decision tree classifiers (AdaBoost) that are learned from training features.
  • The invention avoids the error-prone and computationally complex depth map estimation step of the prior art, and leads to an accurate and efficient object detector. The method is better suited for embedded systems because it does not require the complex optimization modules necessary to obtain a good depth map. In addition, the method searches all sub-images in the input images to detect the object. This avoids the missed-detection problem that exists in region of interest generation techniques.
  • The detection is accurate because the method can leverage a large amount of training data and make use of machine learning procedures. It outperforms region of interest generation techniques in detection accuracy.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of a conventional stereo-based object detection system;
  • FIG. 2 is a block diagram of a stereo-based object detection system according to embodiments of the invention;
  • FIG. 3 is a block diagram of an object detection module for the stereo-based object detection system of FIG. 2;
  • FIG. 4 is a block diagram of a method for learning the stereo-based object detector according to embodiments of the invention;
  • FIG. 5 is a schematic of cost volume determination according to embodiments of the invention;
  • FIG. 6 is a schematic of a learned feature according to embodiments of the invention; and
  • FIG. 7 is a schematic of objects occupying large and small portions of sub-images.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • FIG. 2 shows a method and system for detecting an object 201 in a pair of stereo images 200 according to embodiments of our invention. A cost volume 211 is generated 210 from the pair of stereo images. This is followed by selecting and extracting 215 feature vectors 216. Then, an object detector 220 is applied to the feature vectors to detect the object. The object detector uses classifiers 230 learned from training image features 231. After the object is detected, it can be localized, that is, the location of the object in the image can be determined. The method can be performed in a processor 250 connected to memory and input/output interfaces by buses as known in the art.
  • Our invention is based on the realization that depth information available in a depth map is also available in the cost volume, because the depth map is derived from the cost volume.
  • Our detector 220, which uses the cost volume directly, is theoretically capable of matching the performance of any detector based on the depth map. Moreover, the cost volume is a richer representation than the conventional depth map. The depth map provides only a single depth for each pixel, while the cost volume provides matching costs for the entire range of potential depths that each pixel in the stereo images can have, including the true depth. Hence, a detector that uses features obtained directly from the cost volume can access more depth information and achieve better performance.
  • As shown in FIG. 3, one embodiment of our invention includes cost volume generation 210, feature extraction 310, object detection and localization 320, learned discriminative features 330, and a learned object classification model 340. The localization determines where the object is detected.
  • FIG. 4 shows a machine learning procedure for learning the discriminative features and the learned object classification model. Features are selected and learned 410 from training data 400 comprising pairs of training stereo images.
  • Cost Volume Generation
  • FIG. 5 shows the generation of the cost volume C 211. The cost volume C: X×Y×D is a three-dimensional data structure stored in the memory, where X and Y denote the image x and y axes, and D denotes a set of disparity values, i.e., translations between corresponding pixels in the two stereo images IL 501 and IR 502. We assume that IL and IR are rectified, which means that the images have been transformed such that lens distortion effects are compensated, and a pixel in a row of one image is mapped to a pixel in the same row of the other image. The cost volume can then be determined by matching pixel appearance in the pair of stereo images IL and IR.
  • One way to determine the cost volume is to apply the mapping given by

  • $C(x, y, d) = \| I_L(x, y) - I_R(x - d, y) \|_2 + \lambda \, \| \mathrm{grad}(I_L(x, y)) - \mathrm{grad}(I_R(x - d, y)) \|_2$

  • for any $(x, y, d) \in X \times Y \times D$,
  • where $\| \cdot \|_2$ denotes the Euclidean norm, $I_L(x, y)$ refers to the pixel color values at the (x, y) location of the IL image, $I_R(x - d, y)$ refers to the pixel color values at the (x - d, y) location of the IR image, $\mathrm{grad}(I_L(x, y))$ refers to the gradient at the (x, y) location of the IL image, $\mathrm{grad}(I_R(x - d, y))$ refers to the gradient at the (x - d, y) location of the IR image, and $\lambda$ is a weight controlling the importance of the gradient information. Note that an image smoothing technique, such as bilateral filtering or guided filtering, can be applied to enhance the cost volume.
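  • As a concrete illustration, the following is a minimal NumPy sketch of this mapping for rectified grayscale images, where the Euclidean norm reduces to an absolute difference; the function name compute_cost_volume, the horizontal-gradient stand-in for grad(.), and the default weight lam are illustrative assumptions, not details from the patent.

```python
import numpy as np

def compute_cost_volume(I_L, I_R, num_disparities, lam=0.1):
    """Sketch of C(x, y, d) for rectified grayscale images (H x W floats).
    For scalar pixels the Euclidean norm reduces to an absolute difference."""
    H, W = I_L.shape
    grad_L = np.gradient(I_L, axis=1)  # horizontal gradient as a simple grad(.)
    grad_R = np.gradient(I_R, axis=1)
    # Large finite cost where x - d falls outside the right image.
    C = np.full((H, W, num_disparities), 1e6)
    for d in range(num_disparities):
        # Compare I_L(x, y) against I_R(x - d, y) for all valid x at once.
        color_diff = np.abs(I_L[:, d:] - I_R[:, :W - d])
        grad_diff = np.abs(grad_L[:, d:] - grad_R[:, :W - d])
        C[:, d:, d] = color_diff + lam * grad_diff
    return C
```

  • For example, C = compute_cost_volume(left, right, 64) yields a volume whose disparity slices could then be smoothed per the note above.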
  • Feature Extraction
  • FIG. 6 shows the feature selection and extraction 215 of FIG. 2. We extract a K-dimensional feature vector from each sub-image 600 for determining whether or not the object is present in the sub-image. The sub-images can be considered a moving window passed over the image, e.g., in raster scan order, for each pixel at multiple scales.
  • Note that the embodiments use only the cost volume directly to determine the features. Depth map estimation as in the prior art is not performed.
  • Each dimension of the feature vector corresponds to a numerical comparison between the minimal cost disparity values of two, e.g., rectangular, regions $R_k^1$ 601 and $R_k^2$ 602 in the sub-image 600. Let the sub-image be denoted as J and the kth dimension of the feature vector be represented as $f_k(J)$. The value of $f_k(J)$ is
  • $f_k(J) = \begin{cases} 1 & \text{if } d_{\min}(R_k^1) > d_{\min}(R_k^2) \\ 0 & \text{if } d_{\min}(R_k^1) = d_{\min}(R_k^2) \\ -1 & \text{otherwise,} \end{cases} \quad (1)$
  • where dmin(Rk i) represents to the disparity value that has a minimal (min) accumulated cost in the region of Rk i of the sub-image. That is
  • $d_{\min}(R_k^i) = \arg\min_{d} \sum_{(x, y) \in R_k^i} C(x, y, d). \quad (2)$
  • Note that determining the minimal cost disparity value in the region is relatively simple because the accumulated cost can be obtained efficiently using an integral image technique as known in the art. The locations and size of the regions are learned using a machine learning procedure, which is described below.
  • Object Detection and Localization
  • The K-dimensional feature vector associated with the sub-image is passed to an ensemble classifier for determining a detection score. The ensemble classifier includes T decision tree classifiers. Each decision tree classifier takes a small number of dimensions of the K-dimensional feature vector as input, and classifies the sub-image as positive (containing an object) or negative (not containing an object). The detection score s obtained from the classifier for the sub-image J is given by
  • $s(J) = \sum_{t=1}^{T} \theta_t \, \delta_t(J), \quad (3)$
  • where the $\delta_t$ are the decision tree classifiers and the $\theta_t$ are the corresponding weights. If the score is greater than a preset threshold, then the system declares a detection in the sub-image.
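  • Under the convention that each $\delta_t$ returns +1 (object) or -1 (no object), equation (3) and the thresholding rule can be sketched as follows; detection_score, detect, feature_fn, and threshold are illustrative names, and the trees are treated abstractly as callables.

```python
from typing import Callable, Iterable, List, Sequence

# A decision tree classifier maps a feature vector to +1 (object) or -1.
Tree = Callable[[Sequence[int]], int]

def detection_score(f_J: Sequence[int], trees: Sequence[Tree],
                    thetas: Sequence[float]) -> float:
    """Equation (3): weighted vote s(J) = sum_t theta_t * delta_t(J)."""
    return sum(theta * tree(f_J) for tree, theta in zip(trees, thetas))

def detect(sub_images: Iterable, feature_fn, trees, thetas,
           threshold: float) -> List:
    """Declare a detection for each sub-image whose score exceeds the preset
    threshold; sub-images come from the sliding window described earlier."""
    return [J for J in sub_images
            if detection_score(feature_fn(J), trees, thetas) > threshold]
```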
  • As shown in FIG. 7, the classifier can be trained to give a higher score when the object occupies a larger portion of the sub-image 701 and a lower score when the object occupies only a small portion of the sub-image 702, because the larger object provides a better estimate of where the object is located within the image than the smaller object.
  • Feature Selection and Classifier Learning Procedure
  • We use a discrete AdaBoost procedure for selecting the regions

  • $\{(R_k^1, R_k^2) \mid k = 1, 2, \ldots, K\}, \quad (4)$

  • and for learning the decision tree classifier weights

  • $\{\theta_t \mid t = 1, 2, \ldots, T\}. \quad (5)$
  • We collect a set of data for the learning task, which includes a set of stereo training images. The sub-images that contain an object are labeled as positive instances, while the others are labeled as negative instances. We align the positive and negative sub-images so that their centers coincide. The sub-images are also scaled to have the same height. The aligned and scaled sub-images are denoted as

  • $D = \{(J_i, l_i), \; i = 1, 2, \ldots, V\}, \quad (6)$

  • where $J_i$ denotes the $i$th sub-image, $l_i$ is the label, and $V$ is the total number of sub-images.
  • We sample a set of N regions as the feature pool $\{R_i, \; i = 1, 2, \ldots, N\}$, which have different locations and sizes and are covered by the aligned sub-images. We randomly pair two regions and compare the disparity values of their minimal costs. This is performed K times to construct a K-dimensional feature vector, as sketched below.
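  • One possible reading of this sampling step, with the window size, pool size N, pair count K, and minimum region size as placeholder parameters:

```python
import random

def sample_region(win_w, win_h, min_size=4):
    """One rectangle (x0, y0, x1, y1) inside a win_w x win_h sub-image."""
    x0 = random.randrange(0, win_w - min_size)
    y0 = random.randrange(0, win_h - min_size)
    x1 = random.randrange(x0 + min_size, win_w + 1)
    y1 = random.randrange(y0 + min_size, win_h + 1)
    return (x0, y0, x1, y1)

def sample_region_pairs(win_w, win_h, N=500, K=100):
    """Draw a pool of N regions, then K random pairs as in equation (4)."""
    pool = [sample_region(win_w, win_h) for _ in range(N)]
    return [tuple(random.sample(pool, 2)) for _ in range(K)]
```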
  • We use the discrete AdaBoost procedure to learn the T decision tree classifiers and their weights. The procedure starts by assigning uniform weights to the training samples. A decision tree is then learned based on the current training sample weights. The weights of incorrectly classified samples are increased so that they have more impact during the next round of decision tree learning. We assign each decision tree classifier its weight based on its weighted error rate. This process is repeated T times to construct an ensemble classifier of T decision tree classifiers. Pseudo code of the procedure follows.
    • Input: Feature vectors and class labels $D = \{(f(J_i), l_i), \; i = 1, 2, \ldots, V\}$
    • Output: Ensemble classifier $\sum_{t=1}^{T} \theta_t \, \delta_t(J)$
    • Start with uniform weights $w_i = 1/V, \; i = 1, 2, \ldots, V$.
    • For $t = 1, 2, \ldots, T$:
    • 1. Learn a decision tree classifier $\delta_t(J) \in \{-1, 1\}$ using the weights $w_i$;
    • 2. Determine the weighted error rate $\varepsilon = \sum_i w_i \, \mathbb{1}(\delta_t(J_i) \neq l_i)$;
    • 3. Determine the decision tree classifier weight $\theta_t = \log \frac{1 - \varepsilon}{\varepsilon}$;
    • 4. Set $w_i \leftarrow w_i \exp(\theta_t \, \mathbb{1}(\delta_t(J_i) \neq l_i))$ for $i = 1, 2, \ldots, V$; and
    • 5. Normalize the sample weights $w_i \leftarrow w_i / \sum_j w_j$.
  • The function $\mathbb{1}(\cdot)$, used in steps 2 and 4, is the indicator function, which returns one if the statement in the parentheses is true and zero otherwise.
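  • The pseudo code can be made runnable, for example with scikit-learn decision trees standing in for the weak learners; the tree depth, the numerical guard on the error rate, and all names here are assumptions rather than details from the patent.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_adaboost(F, labels, T=100, max_depth=2):
    """Discrete AdaBoost following the pseudo code above.
    F: (V, K) feature matrix; labels: (V,) array with entries in {-1, +1}."""
    V = len(labels)
    w = np.full(V, 1.0 / V)                   # start with uniform weights
    trees, thetas = [], []
    for _ in range(T):
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(F, labels, sample_weight=w)  # step 1: weighted tree learning
        miss = tree.predict(F) != labels      # indicator of misclassified samples
        eps = float(np.clip(w[miss].sum(), 1e-10, 1 - 1e-10))  # step 2
        theta = np.log((1 - eps) / eps)       # step 3: classifier weight
        w = w * np.exp(theta * miss)          # step 4: upweight mistakes
        w = w / w.sum()                       # step 5: normalize
        trees.append(tree)
        thetas.append(theta)
    return trees, thetas

def score(trees, thetas, f):
    """Detection score s(J) for a single feature vector f, as in equation (3)."""
    f = np.asarray(f).reshape(1, -1)
    return sum(th * tr.predict(f)[0] for tr, th in zip(trees, thetas))
```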
  • Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

Claims (10)

1. A method for detecting an object in a pair of stereo images acquired of a scene, where each stereo image includes pixels, comprising the steps of:
generating a cost volume from the pair of stereo images, wherein the cost volume includes matching costs for a range of disparity values, for each pixel, between the stereo images in the pair;
determining feature vectors from sub-images in the cost volume using a feature function of the disparity values with a minimal accumulated cost within regions inside the sub-images; and
applying a classifier to the feature vectors to detect whether the sub-images include the object, wherein the steps are performed in a processor.
2. The method of claim 1, further comprising:
localizing the object within the stereo images.
3. The method of claim 1, wherein the classifier is learned from pairs of training stereo images.
4. The method of claim 1, further comprising:
rectifying the pair of stereo images.
5. The method of claim 1, further comprising:
smoothing the pair of stereo images.
6. The method of claim 1, wherein the generating further comprises:
matching colors and gradients of the pixels in the pair of stereo images using a Euclidean norm.
7. The method of claim 1, wherein the feature function is
$f_k(J) = \begin{cases} 1 & \text{if } d_{\min}(R_k^1) > d_{\min}(R_k^2) \\ 0 & \text{if } d_{\min}(R_k^1) = d_{\min}(R_k^2) \\ -1 & \text{otherwise,} \end{cases} \quad (1)$
where J represents the sub-image, k represents a dimension of the feature vectors, min represents a function that returns a minimum, and $d_{\min}(R_k^i)$ represents the disparity value that has a minimal accumulated cost in the rectangular region $R_k^i$ of the sub-image, wherein i indexes the rectangular regions.
8. The method of claim 7, wherein
$d_{\min}(R_k^i) = \arg\min_{d} \sum_{(x, y) \in R_k^i} C(x, y, d),$
where C(x, y, d) represents the cost volume.
9. The method of claim 1, wherein the classifier is an ensemble classifier including T decision tree classifiers.
10. The method of claim 9, wherein the classifier provides a detection score s for the sub-image J given by
$s(J) = \sum_{t=1}^{T} \theta_t \, \delta_t(J),$
where the $\delta_t$ are the decision tree classifiers and the $\theta_t$ are the corresponding weights.
US14/272,570 2014-05-08 2014-05-08 Method for detecting objects in stereo images Active US9195904B1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US14/272,570 US9195904B1 (en) 2014-05-08 2014-05-08 Method for detecting objects in stereo images
JP2015075866A JP6345147B2 (en) 2014-05-08 2015-04-02 Method for detecting an object in a pair of stereo images
CN201510232293.0A CN105096307B (en) 2014-05-08 2015-05-08 The method of detection object in paired stereo-picture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/272,570 US9195904B1 (en) 2014-05-08 2014-05-08 Method for detecting objects in stereo images

Publications (2)

Publication Number Publication Date
US20150324659A1 (en) 2015-11-12
US9195904B1 US9195904B1 (en) 2015-11-24

Family

ID=54368107

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/272,570 Active US9195904B1 (en) 2014-05-08 2014-05-08 Method for detecting objects in stereo images

Country Status (3)

Country Link
US (1) US9195904B1 (en)
JP (1) JP6345147B2 (en)
CN (1) CN105096307B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718910A (en) * 2016-01-22 2016-06-29 孟玲 Battery room with combination of local and global characteristics
US20160335775A1 (en) * 2014-02-24 2016-11-17 China Academy Of Telecommunications Technology Visual navigation method, visual navigation device and robot
US20170083787A1 (en) * 2015-09-18 2017-03-23 Qualcomm Incorporated Fast Cost Aggregation for Dense Stereo Matching
US20170147888A1 (en) * 2015-11-20 2017-05-25 GM Global Technology Operations LLC Stixel estimation methods and systems
CN106845520A (en) * 2016-12-23 2017-06-13 深圳云天励飞技术有限公司 A kind of image processing method and terminal
US10395144B2 (en) * 2017-07-24 2019-08-27 GM Global Technology Operations LLC Deeply integrated fusion architecture for automated driving systems
US10990801B1 (en) * 2018-05-31 2021-04-27 The Charles Stark Draper Laboratory, Inc. System and method for multidimensional gradient-based cross-spectral stereo matching
US11272163B2 (en) * 2017-02-07 2022-03-08 Sony Corporation Image processing apparatus and image processing method

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101825459B1 (en) * 2016-08-05 2018-03-22 재단법인대구경북과학기술원 Multi-class objects detection apparatus and method thereof
US9989964B2 (en) 2016-11-03 2018-06-05 Mitsubishi Electric Research Laboratories, Inc. System and method for controlling vehicle using neural network
US10554957B2 (en) * 2017-06-04 2020-02-04 Google Llc Learning-based matching for active stereo systems
TWI709725B (en) * 2019-12-03 2020-11-11 阿丹電子企業股份有限公司 Volume measuring apparatus and volume measuring method for boxes
KR102192322B1 (en) * 2020-04-13 2020-12-17 재단법인 다차원 스마트 아이티 융합시스템 연구단 Camera system with complementary pixlet structure in pixel block unit and operation method thereof
US11577723B2 (en) 2020-06-29 2023-02-14 Uatc, Llc Object trajectory association and tracking

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6301370B1 (en) * 1998-04-13 2001-10-09 Eyematic Interfaces, Inc. Face recognition from video images
US9111342B2 (en) * 2010-07-07 2015-08-18 Electronics And Telecommunications Research Institute Method of time-efficient stereo matching
CN101976455B (en) * 2010-10-08 2012-02-01 东南大学 Color image three-dimensional reconstruction method based on three-dimensional matching
KR20120049997A (en) * 2010-11-10 2012-05-18 삼성전자주식회사 Image process device, display apparatus and methods thereof
CN102026013B (en) * 2010-12-18 2012-05-23 浙江大学 Three-dimensional video matching method based on affine transformation
US8406470B2 (en) * 2011-04-19 2013-03-26 Mitsubishi Electric Research Laboratories, Inc. Object detection in depth images
US20130177237A1 (en) 2012-01-09 2013-07-11 Gregory Gerhard SCHAMP Stereo-vision object detection system and method
CN103366354B (en) * 2012-03-27 2016-09-07 富士通株式会社 Method and system for stereo matching
JP2014096062A (en) * 2012-11-09 2014-05-22 Yamaguchi Univ Image processing method and image processing apparatus
CN103226821B (en) * 2013-04-27 2015-07-01 山西大学 Stereo matching method based on disparity map pixel classification correction optimization

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160335775A1 (en) * 2014-02-24 2016-11-17 China Academy Of Telecommunications Technology Visual navigation method, visual navigation device and robot
US9886763B2 (en) * 2014-02-24 2018-02-06 China Academy Of Telecommunications Technology Visual navigation method, visual navigation device and robot
US20170083787A1 (en) * 2015-09-18 2017-03-23 Qualcomm Incorporated Fast Cost Aggregation for Dense Stereo Matching
US9626590B2 (en) * 2015-09-18 2017-04-18 Qualcomm Incorporated Fast cost aggregation for dense stereo matching
US20170147888A1 (en) * 2015-11-20 2017-05-25 GM Global Technology Operations LLC Stixel estimation methods and systems
US10482331B2 (en) * 2015-11-20 2019-11-19 GM Global Technology Operations LLC Stixel estimation methods and systems
CN105718910A (en) * 2016-01-22 2016-06-29 孟玲 Battery room with combination of local and global characteristics
CN106845520A (en) * 2016-12-23 2017-06-13 深圳云天励飞技术有限公司 A kind of image processing method and terminal
US11272163B2 (en) * 2017-02-07 2022-03-08 Sony Corporation Image processing apparatus and image processing method
US10395144B2 (en) * 2017-07-24 2019-08-27 GM Global Technology Operations LLC Deeply integrated fusion architecture for automated driving systems
US10990801B1 (en) * 2018-05-31 2021-04-27 The Charles Stark Draper Laboratory, Inc. System and method for multidimensional gradient-based cross-spectral stereo matching

Also Published As

Publication number Publication date
JP2015215877A (en) 2015-12-03
JP6345147B2 (en) 2018-06-20
US9195904B1 (en) 2015-11-24
CN105096307B (en) 2018-01-02
CN105096307A (en) 2015-11-25

Similar Documents

Publication Publication Date Title
US9195904B1 (en) Method for detecting objects in stereo images
US9426449B2 (en) Depth map generation from a monoscopic image based on combined depth cues
Keller et al. A new benchmark for stereo-based pedestrian detection
US11443454B2 (en) Method for estimating the pose of a camera in the frame of reference of a three-dimensional scene, device, augmented reality system and computer program therefor
US7831087B2 (en) Method for visual-based recognition of an object
US9704017B2 (en) Image processing device, program, image processing method, computer-readable medium, and image processing system
US8345930B2 (en) Method for computing food volume in a method for analyzing food
US9600898B2 (en) Method and apparatus for separating foreground image, and computer-readable recording medium
US9639748B2 (en) Method for detecting persons using 1D depths and 2D texture
US9305206B2 (en) Method for enhancing depth maps
US20100173269A1 (en) Food recognition using visual analysis and speech recognition
US20110025834A1 (en) Method and apparatus of identifying human body posture
KR20170006355A (en) Method of motion vector and feature vector based fake face detection and apparatus for the same
CN104517102A (en) Method and system for detecting classroom attention of student
CN105279772B (en) A kind of trackability method of discrimination of infrared sequence image
CN102609724B (en) Method for prompting ambient environment information by using two cameras
US8718362B2 (en) Appearance and context based object classification in images
CN111368682B (en) Method and system for detecting and identifying station caption based on master RCNN
US11244475B2 (en) Determining a pose of an object in the surroundings of the object by means of multi-task learning
CN111274964A (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
JPWO2019215780A1 (en) Identification system, model re-learning method and program
Ponsa et al. Cascade of classifiers for vehicle detection
CN112560969A (en) Image processing method for human weight recognition, model training method and device
KR101688910B1 (en) Method and apparatus for masking face by using multi-level face features
Elmogy Landmark manipulation system for mobile robot navigation

Legal Events

Date Code Title Description
STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8