US11272163B2 - Image processing apparatus and image processing method - Google Patents
- Publication number
- US11272163B2
- Authority
- US
- United States
- Prior art keywords
- disparity
- image
- cost
- pixel
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/271—Image signal generators wherein the generated image signals comprise depth maps or disparity maps
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/251—Fusion techniques of input or preprocessed data
-
- G06K9/4609—
-
- G06K9/6267—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration by the use of local operators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/143—Sensing or illuminating at different wavelengths
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/803—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/25—Image signal generators using stereoscopic image cameras using two or more image sensors with different characteristics other than in their location or field of view, e.g. having different resolutions or colour pickup characteristics; using image signals from one sensor to control the characteristics of another sensor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/254—Image signal generators using stereoscopic image cameras in combination with electromagnetic radiation sources for illuminating objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/275—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
- H04N13/279—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals the virtual viewpoint locations being selected by the viewers or determined by tracking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10012—Stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
Definitions
- the present disclosure relates to an image processing apparatus, an image processing method, and a program.
- the present disclosure relates to an image processing apparatus, an image processing method, and a program which are capable of improving the accuracy of stereo matching using images captured from two different viewpoints to detect an object with high accuracy.
- a visible camera and a far-infrared camera are provided side by side as in-vehicle cameras and stereo matching based on images captured by the two cameras is performed, which makes it possible to detect a person and to measure the distance to the person.
- Non-Patent Literature 1 Multispectral Pedestrian Detection: Benchmark Dataset and Baseline (CVPR2015_MutispectalPedestrian.pdf) discloses a pedestrian detection process using a visible image and a far-infrared image.
- Non-Patent Literature 1 discloses a configuration that learns feature amounts, such as the brightness, color, and gradient (strength and a direction) of a visible image (RGB image), and feature amounts related to the temperature and gradient (strength and a direction) of a far-infrared image, using machine learning, to detect a pedestrian.
- the application of the disclosed method makes it possible to detect a pedestrian even in a scene in which it is difficult to capture visible images, such as at night.
- in Non-Patent Literature 1, a beam splitter is used to align the optical axes of the visible camera and the far-infrared camera. In this case, the size of the apparatus increases, which results in an increase in cost.
- stereo matching also has the problem of how to set the block size and the search range to be applied to block matching.
- in a case in which the search range is too narrow, the range in which the disparity can be detected is narrowed and the scenes that can be handled are limited.
- in a case in which the search range is too wide, the number of candidates increases, the number of errors in estimation increases, and the processing time increases.
- Another problem of the object detection technique is a large amount of calculation.
- the present disclosure has been made in view of, for example, the above-mentioned problems and an object of the present disclosure is to provide an image processing apparatus, an image processing method, and a program which are capable of executing generation of a disparity map using images captured from two different viewpoints and an object detection process with high accuracy and efficiency.
- An object of an embodiment of the present disclosure is to provide an image processing apparatus, an image processing method, and a program which are capable of executing, for example, stereo matching, generation of a disparity map, and an object detection process with high accuracy and efficiency in image processing using a visible image and a far-infrared image.
- an image processing apparatus including an object detection unit that receives two images captured from different viewpoints and performs an object detection process.
- the object detection unit includes a disparity calculation unit that calculates a disparity of each pixel of the two images and generates a disparity map including calculated disparity information, and a classification unit that performs the object detection process using the disparity map generated by the disparity calculation unit.
- the disparity calculation unit generates disparity maps corresponding to a plurality of different resolutions and outputs the disparity maps to the classification unit.
- an image processing method to be executed in an image processing apparatus.
- the image processing method includes an object detection processing step of allowing an object detection unit to receive two images captured from different viewpoints and to perform an object detection process.
- the object detection processing step includes a disparity calculation step of allowing a disparity calculation unit to calculate a disparity of each pixel of the two images and to generate a disparity map including calculated disparity information, and a classification processing step of allowing a classification unit to perform the object detection process using the disparity map generated in the disparity calculation step.
- in the disparity calculation step, disparity maps corresponding to a plurality of different resolutions are generated and output to the classification unit.
- a program that causes image processing to be executed in an image processing apparatus.
- the program causes an object detection unit to execute an object detection processing step of receiving two images captured from different viewpoints and executing an object detection process.
- the program causes a disparity calculation unit to execute a disparity calculation step of calculating a disparity of each pixel of the two images and generating a disparity map including calculated disparity information and causes a classification unit to execute a classification processing step of performing the object detection process using the disparity map generated in the disparity calculation step.
- in the disparity calculation step, disparity maps corresponding to a plurality of different resolutions are generated and output to the classification unit.
- the program according to the present disclosure can be provided by a storage medium or a communication medium which is provided in a computer-readable form to an information processing apparatus or a computer system capable of executing various program codes. Since the program is provided in a computer readable form, processes corresponding to the program are implemented in the information processing apparatus or the computer system.
- a system is a logical set configuration of a plurality of apparatuses and is not limited to the configuration in which the apparatuses are provided in the same housing.
- an apparatus and a method that perform generation of a disparity map and an object detection process with high accuracy and efficiency are achieved.
- the apparatus includes a disparity calculation unit that receives two images captured from different viewpoints, calculates a disparity, and generates a disparity map and a classification unit that performs an object detection process using the disparity map.
- the disparity calculation unit performs a stereo matching process using an original-resolution image, generates cost volumes corresponding to a plurality of resolutions from the processing result, generates disparity maps and object candidate region maps corresponding to the plurality of different resolutions, using the cost volumes corresponding to each resolution, and outputs the disparity maps and the object candidate region maps to the classification unit.
- the apparatus and the method that perform the generation of the disparity map and the object detection process with high accuracy and efficiency are achieved by these processes.
- FIG. 1 is a diagram illustrating a correspondence relationship between the type of captured image and the wavelength of light.
- FIGS. 2A and 2B are diagrams illustrating an example of the arrangement of pixels in a visible image and a far-infrared image.
- FIG. 3 is a diagram illustrating an example of the configuration of an image processing apparatus according to the present disclosure.
- FIG. 4 is a diagram illustrating the configuration and process of an image processing unit.
- FIG. 5 is a diagram illustrating the configuration and process of an object detection unit.
- FIG. 6 is a diagram illustrating the configuration and process of a disparity calculation unit.
- FIG. 7 is a flowchart illustrating a process performed by a pixel matching unit of the disparity calculation unit.
- FIGS. 8A and 8B are diagrams illustrating an example of an imaging configuration, captured images, and parameters including a disparity.
- the image processing apparatus performs image processing for images captured from two different viewpoints.
- the process according to the present disclosure is not limited to the combination of the visible image and the far-infrared image and may be applied to combinations of other images, such as a combination of a visible image and an infrared image or a combination of two visible images. That is, the process may be applied to any combination of images captured from two different viewpoints.
- a visible image 10 is an image in a wavelength range of about 0.4 μm to 0.7 μm and is a color image such as an RGB image captured by a general camera.
- the infrared image is an image formed by long-wavelength light with a wavelength of 0.7 μm or more.
- An infrared imaging camera that captures infrared images can capture an image of an object, such as a person, that generates heat in the dark and is used as, for example, a surveillance camera.
- infrared rays are divided into near-infrared rays with a wavelength of about 0.7 μm to 1 μm, mid-infrared rays with a wavelength of about 3 μm to 5 μm, and far-infrared rays with a wavelength of about 8 μm to 14 μm, as illustrated in FIG. 1 .
- the process according to the present disclosure is not limited to the far-infrared image and may be applied to processes using other infrared images.
- FIGS. 2A and 2B are diagrams illustrating an example of the arrangement of pixels on an imaging element that captures the visible image 10 and the far-infrared image 20 .
- the visible image illustrated in FIG. 2A shows an example of a Bayer array of R, G, and B pixels.
- the Bayer array is used for imaging elements of many visible imaging cameras.
- Each pixel of the imaging element outputs an electric signal corresponding to the amount of light with R, G, or B wavelengths.
- the far-infrared image illustrated in FIG. 2B is obtained by capturing light with a far-infrared (FIR) wavelength at all pixel positions.
- an infrared imaging element has a lower resolution than a visible imaging element.
- the reason is that infrared light, particularly far-infrared light, has a long wavelength, which makes it difficult to use with an imaging element having a high-density pixel array.
- FIG. 3 is a block diagram illustrating the configuration of an imaging apparatus which is an example of an image processing apparatus 100 according to the present disclosure.
- the image processing apparatus is not limited to the imaging apparatus and also includes an information processing apparatus such as a PC that receives an image captured by the imaging apparatus and performs image processing.
- Image processing other than the imaging process described in the following embodiment can be performed not only in the imaging apparatus, but also in the information processing apparatus such as a PC.
- the image processing apparatus 100 as the imaging apparatus illustrated in FIG. 3 includes a control unit 101 , a storage unit 102 , a codec 103 , an input unit 104 , an output unit 105 , an imaging unit 106 , and an image processing unit 120 .
- the imaging unit 106 includes a visible imaging unit 107 that captures a general visible image and an infrared imaging unit 108 that captures a far-infrared image.
- the process according to the present disclosure can be applied not only to a combination of a visible image and a far-infrared image, but also to combinations of other images, such as a combination of a visible image and an infrared image and a combination of a visible image and a visible image.
- the visible imaging unit 107 includes a first imaging element 111 that captures a visible image.
- the first imaging element 111 includes, for example, R, G and B pixels that are arranged in the Bayer array described with reference to FIG. 2A and each pixel outputs a signal corresponding to input light of each of R, G, and B.
- the far-infrared imaging unit 108 includes a second imaging element 112 that captures a far-infrared image.
- the second imaging element 112 includes, for example, pixels on which far-infrared light is incident as described with reference to FIG. 2B . Each pixel outputs an electric signal corresponding to the amount of incident far-infrared light.
- the visible imaging unit 107 and the infrared imaging unit 108 are two imaging units set at positions that are a predetermined distance away from each other and capture images from different viewpoints.
- the same object is not captured by the corresponding pixels, that is, the pixels at the same position, in two images captured from different viewpoints, and an object deviation corresponding to the disparity occurs.
- each of the visible imaging unit 107 and the infrared imaging unit 108 captures one still image. That is, a total of two still images are captured. In a case in which a moving image is captured, each of the imaging units captures continuous image frames.
- the control unit 101 controls the imaging timing of the imaging units.
- the control unit 101 controls various processes of the imaging apparatus 100 , such as an imaging process, signal processing for a captured image, an image recording process, and a display process.
- the control unit 101 includes, for example, a CPU that performs processes according to various processing programs stored in the storage unit 102 and functions as a data processing unit that executes programs.
- the storage unit 102 is, for example, a RAM or a ROM that functions as a captured image storage unit, a storage unit storing processing programs executed by the control unit 101 or various parameters, and a work area at the time of data processing.
- the codec 103 performs a coding and decoding process such as a process of compressing and decompressing a captured image.
- the input unit 104 is, for example, a user operation unit and is used to input control information such as information related to the start and end of imaging and the setting of various modes.
- the output unit 105 includes a display unit and a speaker and is used to display captured images and through images and to output voice.
- the image processing unit 120 receives two images captured by the imaging unit 106 and performs image processing using the two images.
- FIG. 4 is a block diagram illustrating a specific configuration of the image processing unit 120 in the image processing apparatus 100 described with reference to FIG. 3 .
- the image processing unit 120 includes a calibration execution unit 140 and an object detection unit 200 .
- the calibration execution unit 140 receives a first image 131 which is a visible image captured by the first imaging element 111 of the visible imaging unit 107 in the imaging unit 106 and a second image 132 which is a far-infrared image captured by the second imaging element 112 of the far-infrared imaging unit 108 and performs a calibration process to generate a corrected first image 151 and a corrected second image 152 .
- the visible imaging unit 107 and the far-infrared imaging unit 108 in the imaging unit 106 are provided at the positions that are separated from each other and there is a difference between parameters of lenses forming the imaging units.
- the first image 131 captured by the visible imaging unit 107 and the second image 132 captured by the far-infrared imaging unit 108 are different in characteristics such as distortion, magnification, and resolution.
- the calibration execution unit 140 performs a distortion correction process and a magnification correction process for each of the first image 131 and the second image 132 .
- a parallelization process is performed for the images in order to facilitate a stereo matching process to be performed in the subsequent stage.
- the point is aligned on the same line in each image by the parallelization process.
- after the parallelization, a disparity in the horizontal direction occurs in accordance with the distance between the camera and one point in the three-dimensional space.
- the corrected first image 151 and the corrected second image 152 which are images calibrated by the calibration execution unit 140 are input to the object detection unit 200 .
- the object detection unit 200 performs an object detection process, for example, a person detection process using the corrected first image 151 and the corrected second image 152 after the calibration.
- the object detection unit 200 outputs an object detection result 170 as the processing result.
- the processing result is the detection result of a person.
- the object detection unit 200 includes a feature amount extraction unit 210 , a disparity calculation unit 220 , and a classification unit 230 .
- Each of the feature amount extraction unit 210 , the disparity calculation unit 220 , and the classification unit 230 of the object detection unit 200 receives the corrected first image 151 and the corrected second image 152 after the calibration and performs processes.
- the feature amount extraction unit 210 receives the corrected first image 151 and the corrected second image 152 after the calibration and extracts feature amounts from each of the images.
- the extracted feature amounts are feature amounts corresponding to the images.
- brightness, color information, and gradient information are extracted from the corrected first image 151 which is a visible image.
- temperature and gradient information are extracted from the corrected second image 152 which is a far-infrared image.
- the far-infrared image is, for example, a monochrome image formed by grayscale pixel values corresponding to the temperature such as the human body temperature.
- the temperature indicated by the pixel value of the monochrome image and the gradient information of the pixel value are extracted as the feature amounts.
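The per-image feature amounts described above can be sketched as follows. This is an illustrative sketch only; the function name, dictionary keys, and the use of simple finite-difference gradients are assumptions, not details taken from the patent.

```python
import numpy as np

def extract_features(visible_rgb, far_ir):
    """Sketch of per-pixel feature extraction (illustrative names).

    visible_rgb: H x W x 3 float array (visible image, values in [0, 1])
    far_ir:      H x W float array (grayscale values tracking temperature)
    """
    # Visible image: brightness, color, and gradient (strength and direction).
    brightness = visible_rgb.mean(axis=2)
    gy, gx = np.gradient(brightness)          # finite-difference gradient
    vis_strength = np.hypot(gx, gy)
    vis_direction = np.arctan2(gy, gx)

    # Far-infrared image: temperature (the pixel value itself) and gradient.
    ty, tx = np.gradient(far_ir)
    ir_strength = np.hypot(tx, ty)
    ir_direction = np.arctan2(ty, tx)

    first_image_features = {
        "brightness": brightness,
        "color": visible_rgb,
        "grad_strength": vis_strength,
        "grad_direction": vis_direction,
    }
    second_image_features = {
        "temperature": far_ir,
        "grad_strength": ir_strength,
        "grad_direction": ir_direction,
    }
    return first_image_features, second_image_features
```

Each output map has the same spatial size as its source image, so the downstream units can index features by pixel position.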
- the feature amount extraction unit 210 outputs each of the following feature amount data items: a first image feature amount 211 extracted from the corrected first image 151 , and a second image feature amount 212 extracted from the corrected second image 152 .
- the feature amount information items are input to the disparity calculation unit 220 and the classification unit 230 .
- the disparity calculation unit 220 generates a disparity map 225 , in which the disparity information of the corrected first image 151 and the corrected second image 152 has been reflected, using the feature amounts of two images.
- the disparity map is, for example, a map indicating the positional deviation d (pixels) of a corresponding pixel of the corrected second image 152 from each pixel (x, y) forming the corrected first image 151 .
- the position of the corresponding pixel of the corrected second image 152 including an image corresponding to the image of the pixel (x, y) forming the corrected first image 151 is (x+d, y).
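As a concrete illustration of the (x, y) → (x + d, y) correspondence, the sketch below performs brute-force horizontal block matching with a sum-of-absolute-differences cost. This is a stand-in: the patent's matching operates on the extracted feature amounts rather than raw pixel values, and the function name and parameters are assumptions.

```python
import numpy as np

def disparity_map(first, second, max_d=32, block=5):
    """For each pixel (x, y) of the first image, find the shift d such that
    the block around (x + d, y) in the second image minimizes the sum of
    absolute differences (SAD). Returns an integer disparity per pixel."""
    h, w = first.shape
    r = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = first[y - r:y + r + 1, x - r:x + r + 1]
            best_d, best_cost = 0, np.inf
            for d in range(0, max_d + 1):
                if x + d + r >= w:      # candidate block leaves the image
                    break
                cand = second[y - r:y + r + 1, x + d - r:x + d + r + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Because the images are parallelized beforehand, the search is restricted to a single row, which is what makes this one-dimensional scan valid.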
- the disparity calculation unit 220 generates the disparity map, in which the disparity information of the corrected first image 151 and the corrected second image 152 has been reflected; however, the number of disparity maps generated is not one.
- the disparity calculation unit 220 generates a plurality of disparity maps 225 corresponding to a plurality of different resolutions.
- the disparity calculation unit 220 calculates an object candidate region map 226 in which the existence probability of the object to be detected, for example, a person is represented in each pixel, using an evaluation value used for disparity calculation.
- for the object candidate region map 226 , similarly to the disparity map 225 , the disparity calculation unit 220 generates a plurality of object candidate region maps 226 corresponding to a plurality of different resolutions.
- the disparity map 225 and the object candidate region map 226 generated by the disparity calculation unit 220 are input to the classification unit 230 .
- the classification unit 230 receives the disparity map 225 and the object candidate region map 226 generated by the disparity calculation unit 220 and receives the first image feature amount 211 and the second image feature amount 212 from the feature amount extraction unit 210 .
- the classification unit 230 performs a process of detecting the object to be detected, on the basis of the input information. For example, in a case in which the detection target is a person, the classification unit 230 determines whether a person is present in each image region of the corrected first image 151 or the corrected second image and performs a process of classifying the image regions into a region in which the existence possibility of a person is high and a region in which the existence possibility of a person is low.
- the classification unit 230 selects a region determined to have a high possibility of including the object to be detected, for example, a person by the object candidate region map 226 , using the disparity map 225 or the object candidate region map 226 generated by the disparity calculation unit 220 and determines whether the object to be detected, for example, a person is present in the selected region.
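A minimal sketch of this candidate-region selection, assuming the object candidate region map stores a per-pixel existence probability in [0, 1]; the function name and the threshold value are assumptions for illustration:

```python
import numpy as np

def select_candidate_regions(candidate_map, threshold=0.5):
    """Return a boolean mask of pixels whose object-existence probability
    meets the threshold. The classifier is then run only inside this mask,
    which avoids evaluating regions unlikely to contain the object."""
    return candidate_map >= threshold
```

Restricting classification to the masked pixels is what reduces the amount of calculation relative to scanning every image region.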
- the determination process is performed using the feature amount information 211 and 212 in the same region of the corrected first image 151 which is a visible image and the corrected second image 152 which is a far-infrared image.
- the classification unit 230 passes the feature amounts through a machine-learned classifier to generate the determination result of whether the object to be detected, for example, a person is present in each image region.
- the classification unit 230 generates the object detection result 170 illustrated in FIG. 5 and outputs the object detection result 170 .
- the object detection process using machine learning in the classification unit 230 can be performed using, for example, aggregated channel features (ACF) which are a detection algorithm disclosed in the above-mentioned Non-Patent Literature 1 [Multispectral Pedestrian Detection: Benchmark Dataset and Baseline (CVPR2015_MutispectalPedestrian.pdf)].
- the disparity calculation unit 220 includes a pixel matching unit 221 , a cost volume filtering unit 222 , and a disparity decision unit 223 .
- FIG. 7 is a flowchart illustrating the process performed by the pixel matching unit 221 .
- in Step S 101 , the pixel matching unit 221 receives the following:
- in Step S 102 , the pixel matching unit 221 acquires parameters applied to a search range section decision process to be performed in the next Step S 103 .
- the pixel matching unit 221 acquires parameters such as the size (actual size L) of the object to be detected and a baseline length B.
- the size L of the object to be detected is set to the height of a person.
- for example, L is set to 170 cm.
- the baseline length B is the distance between the optical axes of the cameras capturing two images.
- the distance between the optical axes of the visible imaging unit 107 and the far-infrared imaging unit 108 described with reference to FIG. 3 is acquired as the baseline length B.
- the pixel matching unit 221 decides a search range section which is a corresponding point search region, using at least one of the actual size of the object to be detected, the size of the object to be detected on the image, or the baseline length corresponding to the distance between two cameras capturing two images.
- the pixel matching unit 221 performs a search range section decision process in Step S 103 and sets a candidate disparity in Step S 104 .
- the search range section is a second image search section that is set in a case in which a corresponding point of the first image is searched from the second image.
- the candidate disparity is a disparity corresponding to a pixel position in the search section that is examined to determine whether it is actually a corresponding point.
- the process in Steps S 103 and S 104 will be described in detail below.
- as a result, the processing time of the corresponding point search process is shortened and it is possible to perform an efficient process.
- the search range section decision process performed in Step S 103 will be described with reference to FIGS. 8A and 8B .
- FIG. 8A is a diagram illustrating an example of an imaging configuration.
- FIG. 8B is a diagram illustrating an example of captured images.
- the object to be detected is a “person” as illustrated in FIG. 8A .
- Images including the object to be detected are captured by two cameras, that is, camera 1 and camera 2 illustrated in FIG. 8A .
- the camera 1 and the camera 2 correspond to the visible imaging unit 107 and the far-infrared imaging unit 108 described with reference to FIG. 3 , respectively.
- FIG. 8B illustrates an example of the images captured by the camera 1 and the camera 2 .
- the first image is an image captured by the camera 1 and the second image is an image captured by the camera 2 .
- the camera 1 and the camera 2 capture images at positions that are a distance corresponding to the baseline length B [m] away from each other and the pixel positions (corresponding points) of the same object deviate from each other in the horizontal direction.
- the amount of deviation is a disparity d [pixels].
- the size (height), that is, the actual size of the object to be detected (person) is L [m].
- the size (height) of the object to be detected (person) on the first image which is an image captured by the camera 1 is h [pixels].
- the baseline length B is a value obtained by camera calibration.
- the height L [m] of the object to be detected may be the average height of persons.
- the size (height) h of a person on the image is not uniquely determined, since it is difficult to know in advance how large the object to be detected appears on the image.
- in a case in which the person is far from the camera, the size (height) h of the person on the captured image is small. In a case in which the person is close to the camera, the size (height) h of the person on the captured image is large.
- the size of the object to be detected on the captured image varies in accordance with the distance between the object and the camera.
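The dependence of on-image size on distance follows from the pinhole camera model. The sketch below is illustrative only; the focal length f_px (in pixels) and the sample values are assumptions, not parameters stated in the text.

```python
# Pinhole relation: an object of actual height L [m] at distance Z [m]
# appears with height h = f * L / Z [pixels] on the image.
# f_px and the sample values below are assumed for illustration.
def image_height_px(actual_height_m, distance_m, f_px):
    return f_px * actual_height_m / distance_m

far = image_height_px(1.7, 20.0, 1000.0)  # person 20 m away -> 85 px
near = image_height_px(1.7, 5.0, 1000.0)  # person 5 m away -> 340 px
```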
- the following process is performed as a general process for reliably detecting the object to be detected.
- images with a plurality of different resolutions are generated from the captured image and the object detection process is repeatedly and sequentially performed for the generated images with the plurality of resolutions.
- This process is performed as a general object detection process.
- images with a plurality of resolutions specifically, images enlarged or reduced at a plurality of different enlargement or reduction ratios are generated on the basis of the image (original-resolution image) captured by the camera and object detection is performed for the plurality of images while shifting a fixed-size detection window.
- the object detection process using a plurality of different images will be described with reference to FIG. 9 .
- FIG. 9 illustrates an example of the object detection processes using the following three types of images:
- Step 1 An object detection process using an original-resolution image
- Step 2 An object detection process using an S1-fold resolution image (S1-fold reduced image).
- Step 3 An object detection process using an S2-fold resolution image (S2-fold reduced image).
- S1 and S2 are equal to or less than 1.
- S1 is 1/2 and S2 is 1/4.
- the S1-fold resolution image is a reduced image with a resolution that is half the resolution of the original-resolution image.
- the S2-fold resolution image is a reduced image with a resolution that is a quarter of the resolution of the original-resolution image.
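The reduced-resolution images can be sketched as follows. Plain decimation is used only for illustration; the patent does not prescribe a particular resampling method.

```python
import numpy as np

def reduce_image(img, scale):
    """Reduce an image by `scale` (e.g. 1/2, 1/4) via simple decimation."""
    step = int(round(1.0 / scale))
    return img[::step, ::step]

original = np.zeros((480, 640))          # original-resolution image
s1 = reduce_image(original, 0.5)         # S1-fold image: 240 x 320
s2 = reduce_image(original, 0.25)        # S2-fold image: 120 x 160
```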
- Step 1 the object detection process using the original-resolution image is performed.
- the original-resolution image in (Step 1) is an image captured by the camera, is not subjected to a resolution conversion process, such as enlargement or reduction, and is an image with the same resolution as the image captured by the camera, that is, an original-resolution image.
- width and height indicate the horizontal size [pixels] and vertical size [pixels] of the original-resolution image, respectively.
- Step 1 the object detection process using a detection window with a predetermined size is performed for this image.
- box_w and box_h indicate the horizontal size [pixels] and vertical size [pixels] of the detection window, respectively.
- the object detection process using the detection window detects the feature amount of an image in the detection window, determines whether the detected feature amount is matched with or similar to the predetermined feature amount of the object to be detected, and determines whether the object to be detected is present in the window.
- the object to be detected is a “person” and an image having characteristics (for example, an edge or brightness) corresponding to the head or face of the person located in the upper part of the detection window, the body or hands of the person located at the center of the detection window, and the feet of the person located in the lower part of the detection window has been detected, it is determined that the person is present in the window.
- characteristics for example, an edge or brightness
- the detection window is sequentially moved pixel by pixel from the upper left end of the original-resolution image in the right direction and the downward direction to determine whether the feature amount corresponding to the object to be detected is present in each pixel region.
- an image region of a “person” that is the object to be detected is present on the lower right side. Since the image size of the person is larger than the size (w × h) of the detection window, it is difficult to determine that the image of the person is in the detection window and the detection fails.
- Step 2 the original-resolution image which is an image captured by the camera is reduced to generate the S1-fold resolution image and the same object detection process is performed for the S1-fold resolution image.
- the detection window having the same size (w × h) is applied and is moved from the upper left end of the S1-fold resolution image to the lower right end to determine whether the feature amount corresponding to the object to be detected is present in each pixel region.
- Step 2 similarly, an image region of a “person” that is the object to be detected is present on the lower right side.
- Step 2 since the image size of the person is larger than the size (w × h) of the detection window, it is difficult to determine that the image of the person is in the detection window and the detection fails.
- Step 3 the image is further reduced to generate the S2-fold resolution image and the same object detection process is performed for the S2-fold resolution image.
- the detection window having the same size (w × h) is applied and is moved from the upper left end of the S2-fold resolution image to the lower right end to determine whether the feature amount corresponding to the object to be detected is present in each pixel region.
- the size of the image of a “person” is equal to the size (w × h) of the detection window and it can be determined that the image of the person is present in the detection window. As a result, the detection succeeds.
- the object to be detected is detected.
- the disparity d [pixels] calculated in accordance with the above-mentioned (Expression 3) is the disparity d in a case in which the original-resolution images are used as the images captured from two different viewpoints on the premise of the above-mentioned (Expression 1) and (Expression 2).
- Step 1 the original-resolution image is applied and the detection window with a size of (w × h) is applied.
- the disparity d [pixels] calculated in accordance with the above-mentioned (Expression 3) is the number of pixels corresponding to the positional deviation between the original-resolution images which are the images captured from two different viewpoints.
- the S1-fold resolution image or the S2-fold resolution image is applied and the detection window with the same size (w × h) is applied.
- the disparity d [pixels] calculated in accordance with (Expression 3) by substituting box_h into h of the above-mentioned (Expression 3) assuming that the size (h) of the object to be detected on the image is equal to the size (box_h) of the detection window does not correspond to the number of pixels corresponding to the positional deviation between the original-resolution images.
- the disparity calculation unit 220 of the object detection unit 200 in the image processing apparatus 100 according to the present disclosure illustrated in FIG. 6 generates a plurality of disparity maps 225 , that is, disparity maps corresponding to a plurality of resolution images and outputs the generated disparity maps 225 .
- the disparity calculation unit 220 generates the following maps corresponding to three types of resolution images and outputs the maps:
- the disparity calculation unit 220 converts the size of the detection window into a size at the original resolution as follows:
- h = box_h in a case in which the resolution of the output disparity map is the original resolution
- h = box_h/S1 in a case in which the resolution of the output disparity map is the S1-fold resolution image
- h = box_h/S2 in a case in which the resolution of the output disparity map is the S2-fold resolution image.
- the disparity calculation unit 220 calculates the disparity d in accordance with the above-mentioned (Expression 3), generates the disparity maps 225 corresponding to each resolution, and outputs the disparity maps 225 .
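The conversion of the detection-window size and the disparity estimate can be sketched as below. The closed form d = B * h / L is an assumption obtained from similar triangles (h = f*L/Z and d = f*B/Z, so the focal length cancels); the text only states that (Expression 3) computes d from B, L, and h.

```python
def window_height_at_original(box_h, scale):
    """Convert the fixed detection-window height to the original resolution."""
    return box_h / scale

def candidate_disparity(baseline_m, actual_height_m, height_px):
    """d = B * h / L (assumed form of (Expression 3))."""
    return baseline_m * height_px / actual_height_m

h1 = window_height_at_original(64, 0.5)   # S1-fold map: h = box_h/S1 = 128
d1 = candidate_disparity(0.2, 1.7, h1)    # predicted disparity in pixels
```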
- the output of the disparity calculation unit 220 includes the disparity maps 225 corresponding to a plurality of different resolutions.
- a search range and a block size most suitable for the resolution of the disparity map which is finally output are set.
- images with a plurality of resolutions are prepared. Then, stereo matching is not performed for each of the images, but is performed, using only the original-resolution image, to reduce intermediate data (cost volume). Finally, the intermediate data (cost volume) is used to obtain disparity maps corresponding to a plurality of resolutions.
- the disparity d can be accurately calculated in accordance with the above-mentioned (Expression 3), on the basis of the size (height) h of the object to be detected on the images with each resolution. Therefore, searching in the pixel matching is not required.
- the object to be detected is a person
- there are individual differences in height (for example, between adults and children)
- the height also changes in accordance with a change in posture.
- This setting is the setting of a search range (search range section) in the process of searching for a corresponding point between the original-resolution images which are the images captured from different viewpoints, specifically, the corrected first image 151 and the corrected second image input to the disparity calculation unit 220 illustrated in FIG. 6 in this embodiment.
- a region in the range of −2 pixels to +2 pixels from a position that deviates from the same pixel position in the corresponding point search image as that in the reference image by the disparity d0 in the horizontal direction is set as the search range section.
- the pixel margin to be set and the selection of candidate disparities may be changed in accordance with the image resolution of the disparity map that is finally output.
- the disparity d is calculated from the actual height L of the object to be detected, the size h of the object on the image, and the baseline length B between the cameras and only a search pixel center position decided by the disparity d and the periphery of the search pixel center position are set as the search region (search range section).
- This process can reduce the amount of calculation and matching errors caused by extra search.
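A minimal sketch of the search range section: only a small margin around the predicted disparity d0 is searched. The ±2 pixel margin follows the example in the text; as noted, the margin may be changed per output resolution.

```python
def search_range_section(d0, margin=2):
    """Candidate disparities: center d0 plus/minus `margin` pixels."""
    center = int(round(d0))
    return list(range(center - margin, center + margin + 1))

candidates = search_range_section(15.0)  # -> [13, 14, 15, 16, 17]
```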
- Step S 103 The search range section decision process of Step S 103 and the candidate disparity setting process of Step S 104 in the flow illustrated in FIG. 7 have been described above.
- Step S 105 of the flow illustrated in FIG. 7 the pixel matching unit 221 performs a stereo matching process using the original-resolution image.
- the pixel matching unit 221 calculates the similarity between the pixels corresponding to the candidate disparity in the search range section decided in Steps S 103 and S 104 and searches for the corresponding points of the visible image and the far-infrared image which are images captured from different viewpoints, that is, the corrected first image 151 and the corrected second image 152 illustrated in FIG. 5 .
- the similarity calculation process is a process of determining the similarity between the pixels corresponding to the candidate disparity in the search region (search range section) decided in Steps S 103 and S 104 , that is, pixels which are actually corresponding point determination targets in two images in the search section.
- the similarity calculation process is a pixel matching determination process.
- the brightness and color information and gradient information (strength and a direction) of the visible image and the temperature information and gradient information (strength and a direction) of the far-infrared image are used as the feature amounts in the detection of a pedestrian using the visible image and the far-infrared image.
- the feature amount extraction unit 210 illustrated in FIG. 5 acquires the following feature amounts from the visible image and the far-infrared image, that is, the corrected first image 151 and the corrected second image 152 illustrated in FIG. 5 :
- the evaluation value related to the gradient direction is the cosine of a value obtained by multiplying the difference between the gradient directions of a pixel (x, y) of the visible image and a pixel (x+d, y) of the far-infrared image by 2, as defined by (Expression 5) and (Expression 6).
- the double-angle cosine is used in order to allow the reversal of the gradient directions of the visible image and the far-infrared image.
- the evaluation value related to the gradient direction is weighted with the gradient strength "min(Mag 1 (x, y), Mag 2 (x+d, y))" to calculate the similarity.
- the evaluation value is weighted with the smaller of the gradient strength values of the two images in order to increase the similarity only in a case in which an edge (large gradient) common to the two images is present.
- This configuration makes it possible to obtain reliable similarity in the pixels in which an edge common to the two images, such as the contour of a person, is present.
- the similarity evaluation method is not limited to this method and various methods can be used in accordance with input sensor information (image).
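The gradient-based similarity described above can be sketched as follows. This is a simplified reading of (Expression 4) to (Expression 6): the double-angle cosine of the direction difference, weighted by the smaller gradient strength; any normalization in the actual expressions is omitted.

```python
import math

def similarity(theta1, theta2, mag1, mag2):
    """Double-angle cosine weighted by the smaller gradient strength."""
    delta = theta1 - theta2
    # cos(2*delta) is invariant to a full reversal of gradient direction,
    # so visible/far-infrared edges with opposite polarity still match.
    return min(mag1, mag2) * math.cos(2.0 * delta)

same_dir = similarity(0.3, 0.3, 5.0, 8.0)                # strong common edge
reversed_dir = similarity(0.3, 0.3 + math.pi, 5.0, 8.0)  # same score
```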
- Step S 106 the pixel matching unit 221 determines whether the stereo matching process of Step S 105 for all of the pixels has ended. In a case in which the stereo matching process for all of the pixels has not ended, the pixel matching unit 221 continuously performs the process of Step S 105 for unprocessed pixels.
- the pixel matching unit 221 ends the process.
- a cost volume, which is a stack of cost planes in which the similarity calculated for each pixel is set to each of the pixels forming the image, is generated for all of the candidate disparities, as illustrated in FIG. 10 .
- FIG. 10 illustrates an example of a cost volume 300 .
- the cost volume 300 includes a plurality of cost planes 301 - 1 to 301 - n.
- the cost plane is a monochrome image in which a pixel value that becomes closer to black as the similarity becomes higher and becomes closer to white as the similarity becomes lower is set to each pixel.
- the cost plane may be a monochrome image in which a pixel value that becomes closer to white as the similarity becomes higher and becomes closer to black as the similarity becomes lower is set to each pixel.
- the cost plane may be an image in which a color corresponding to the similarity is set or may be a map in which a numerical value corresponding to the similarity is associated with a pixel position.
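Whatever the rendering, the cost volume is naturally represented as a 3-D array indexed by (candidate disparity, y, x), one cost plane per candidate disparity. The shapes and values below are purely illustrative.

```python
import numpy as np

height, width = 4, 6
candidate_disparities = [13, 14, 15, 16, 17]

# One cost plane (height x width) per candidate disparity.
cost_volume = np.zeros((len(candidate_disparities), height, width))
cost_volume[2, 1, 3] = 0.9  # similarity of pixel (x=3, y=1) at disparity 15
```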
- FIG. 11 illustrates an example of the following three cost planes:
- a region of the “person” that is the object to be detected is set black in the disparity map of this plane. The size of the person is large since the distance from the camera is short.
- a region of the “person” that is the object to be detected is set black in the disparity map of this plane.
- the size of the person is medium since the distance from the camera is medium.
- a region of the “person” that is the object to be detected is set black in the disparity map of this plane. The size of the person is small since the distance from the camera is far.
- the cost planes 301 - 1 to 301 - n forming the cost volume 300 are planes generated on the basis of the result of the stereo matching process performed by the pixel matching unit 221 using the original-resolution image and all of the cost planes 301 - 1 to 301 - n have a resolution corresponding to the original-resolution image.
- the cost volume filtering unit 222 performs a process of filtering the cost volume described with reference to FIGS. 10 and 11 to generate cost volumes corresponding to a plurality of different resolutions.
- the cost volume filtering unit 222 generates three types of cost volumes using the following three types of cost planes for generating cost volumes:
- a plurality of resolution cost volumes generated by (a) to (c) corresponds to three types of cost volumes to which the following three types of cost planes described with reference to FIG. 11 belong:
- S1 and S2 are equal to or less than 1, S1 is, for example, 1/2, and S2 is, for example, 1/4.
- the S1-fold resolution image is a reduced image with a resolution that is half the resolution of the original-resolution image.
- the S2-fold resolution image is a reduced image with a resolution that is a quarter of the resolution of the original-resolution image.
- FIG. 13 is a flowchart illustrating the process of the cost volume filtering unit 222 that performs the cost volume filtering process.
- Step S 201 the cost volume filtering unit 222 selects one cost plane to be processed from the cost volume described with reference to FIG. 10 .
- the cost volume filtering unit 222 sequentially selects one cost plane from the n cost planes.
- Step S 202 the cost volume filtering unit 222 performs a step setting process.
- the step is the spacing between the pixels to be subjected to filtering, that is, the sampling interval of a so-called thinning (decimation) process.
- the cost volume filtering unit 222 changes the spacing between the pixels to be filtered to generate a cost volume of low-resolution images from a cost volume of high-resolution images (original-resolution images).
- the step setting in Step S 202 is a process of setting the spacing between the pixels to be filtered.
- the reciprocal of the magnification of the image is set as the spacing between the pixels to be filtered.
- the setting of the step varies in accordance with the resolution of the disparity map output from the disparity calculation unit 220 to the classification unit 230 .
- the step is set as follows.
- the step (filtering pixel spacing) is set to one pixel.
- the step (filtering pixel spacing) is set to (1/S1) pixels.
- the step (filtering pixel spacing) is set to (1/S2) pixels.
- the output disparity map is a 1/2-fold resolution image.
- the output disparity map is a 1/4-fold resolution image.
- in a case in which the step (filtering pixel spacing) is set to one pixel, no pixels are thinned out.
- the image is output without being reduced.
- Step S 203 the cost volume filtering unit 222 sets the block size (kernel size) of a filter.
- the block size (kernel size) of the filter corresponds to the size of a filter applied to generate each resolution image and is the size of a block that defines a pixel region of surrounding pixels to be referred to in a case in which pixel values forming each resolution (original-resolution/S1-fold resolution/S2-fold resolution) image are calculated.
- the block size (kernel size) of the filter is set in accordance with the resolution of the cost volume to be generated. Specifically, the block size of the filter is set as follows.
- the block size (kernel size) of the filter is set to (box_w, box_h).
- the block size (kernel size) of the filter is set to (box_w/S1, box_h/S1).
- the block size (kernel size) of the filter is set to (box_w/S2, box_h/S2).
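The settings of Steps S 202 and S 203 can be summarized in one helper. This is a sketch under the stated rules: the step is the reciprocal of the image scale, and the kernel is the detection-window size rescaled to the original resolution.

```python
def filter_settings(box_w, box_h, scale):
    """Return (step, kernel size) for an output map at the given scale."""
    step = int(round(1.0 / scale))            # filtering pixel spacing
    kernel = (box_w / scale, box_h / scale)   # block size at original resolution
    return step, kernel

step0, kernel0 = filter_settings(32, 64, 1.0)   # original: step 1, (32, 64)
step1, kernel1 = filter_settings(32, 64, 0.5)   # S1 = 1/2: step 2, (64, 128)
step2, kernel2 = filter_settings(32, 64, 0.25)  # S2 = 1/4: step 4, (128, 256)
```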
- the size of the detection window set in order to detect the object to be detected (for example, a person) which has been described with reference to FIG. 9 can be used as the block size (kernel size) of the filter.
- the detection window corresponds to the size of the detection window that is used by the classification unit 230 in the subsequent stage to determine whether an object in each detection window is the object to be detected, for example, a person.
- Step S 204 the cost volume filtering unit 222 performs the filtering process on the basis of the step (filtering pixel spacing) set in Step S 202 and the block size (kernel size) set in Step S 203 .
- the cost volume filtering unit 222 performs the filtering process for the cost plane selected in accordance with the resolution of the disparity map to be output.
- FIG. 14 illustrates an example of a process in a case in which the resolution of the output disparity map is the original resolution.
- the filtering process is performed as a process of applying an averaging filter to the cost planes of each candidate disparity in the search range section 0.
- a value after filtering at a pixel (x, y) (Similarity(x, y, d)) is represented by the following (Expression 7).
- the size of the filtered cost plane formed by Similarity(x, y, d) calculated by the above-mentioned (Expression 7) is an original resolution (width, height).
- the filtering process is performed as a process of applying an averaging filter to the cost planes of each candidate disparity in the search range section 1.
- a value after filtering at the pixel (x, y) (Similarity(x, y, d)) is represented by the following (Expression 8).
- the size of the filtered cost plane formed by Similarity(x, y, d) calculated by the above-mentioned (Expression 8) is an S1-fold resolution (S1 ⁇ width, S1 ⁇ height).
- the filtering process is performed as a process of applying an averaging filter to the cost planes of each candidate disparity in the search range section 2.
- a value after filtering at the pixel (x, y) (Similarity(x, y, d)) is represented by the following (Expression 9).
- the size of the filtered cost plane formed by Similarity(x, y, d) calculated by the above-mentioned (Expression 9) is an S2-fold resolution (S2 ⁇ width, S2 ⁇ height).
- the cost volume filtering unit 222 performs the filtering process for the cost plane selected in accordance with the resolution of the disparity map to be output.
- cost volumes corresponding to three types of different resolutions are generated:
- the cost volume filtering unit 222 changes the spacing between the pixels to be filtered in accordance with the magnitude of the disparity d to generate a low-resolution cost volume from a high-resolution cost volume.
- the kernel size of the filter can be set in accordance with the resolution of the cost volume to be generated to obtain a matching result at a block size suitable for the size of the object to be detected.
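The strided averaging can be sketched as follows: averaging each cost plane over the kernel while sampling only every `step` pixels directly yields a reduced-size filtered plane, so the low-resolution cost volume is obtained from the original-resolution one without re-running stereo matching. The implementation below is a naive illustration.

```python
import numpy as np

def filter_cost_plane(plane, kernel_h, kernel_w, step):
    """Averaging filter evaluated only every `step` pixels."""
    out_h, out_w = plane.shape[0] // step, plane.shape[1] // step
    out = np.zeros((out_h, out_w))
    for oy in range(out_h):
        for ox in range(out_w):
            y, x = oy * step, ox * step
            out[oy, ox] = plane[y:y + kernel_h, x:x + kernel_w].mean()
    return out

plane = np.ones((8, 8))                       # one original-resolution cost plane
reduced = filter_cost_plane(plane, 4, 4, 2)   # S1 = 1/2 -> 4 x 4 filtered plane
```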
- the disparity decision unit 223 decides a disparity value with the highest similarity for each pixel of the cost volumes corresponding to each resolution input from the cost volume filtering unit 222 and generates a disparity map.
- Step S 301 the disparity decision unit 223 selects the resolution of the cost volume to be processed.
- the disparity decision unit 223 receives the cost volumes corresponding to each resolution from the cost volume filtering unit 222 . Specifically, for example, the disparity decision unit 223 receives the following three types of cost volumes described with reference to FIGS. 14 to 17 :
- Step S 301 the disparity decision unit 223 selects, for example, the resolution of the cost volume to be processed from the cost volumes corresponding to each resolution described in (1) to (3).
- Step S 302 the disparity decision unit 223 performs a disparity decision process on the basis of the cost volume corresponding to the resolution selected as the processing target to generate disparity maps corresponding to each resolution.
- the cost volume corresponding to one resolution described with reference to FIGS. 14 to 17 includes a plurality of cost planes.
- the pixel value corresponding to the similarity described with reference to (Expression 7) to (Expression 9) is set to each of the cost planes.
- the pixel value is set such that it is closer to black (low brightness) as the similarity becomes higher and is closer to white (high brightness) as the similarity becomes lower.
- the disparity decision unit 223 compares the pixel values (similarities) at the same pixel position (corresponding pixel position) in the plurality of cost planes included in the cost volume corresponding to the selected resolution, selects a cost plane with the highest similarity, and decides the disparity d of the selected cost plane as a disparity D of the pixel position.
- the disparity D(x, y) of a pixel position (x, y) is calculated in accordance with the above-mentioned (Expression 10).
- a disparity map corresponding to one resolution is generated by this process.
- the disparity map is a map in which the value of the disparity D(x, y) calculated by the above-mentioned (Expression 10) is set to each pixel position (x, y).
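The decision rule can be sketched as an argmax over the candidate-disparity axis ((Expression 10) is assumed to have this form, consistent with selecting the cost plane with the highest similarity).

```python
import numpy as np

def decide_disparity(cost_volume, candidate_disparities):
    """Per pixel, take the disparity of the cost plane with highest similarity."""
    best = np.argmax(cost_volume, axis=0)            # index of best plane
    return np.asarray(candidate_disparities)[best]   # disparity map D(x, y)

volume = np.zeros((3, 2, 2))
volume[1, 0, 0] = 0.9  # pixel (0, 0) matches best at disparity 14
volume[2, 1, 1] = 0.8  # pixel (1, 1) matches best at disparity 15
dmap = decide_disparity(volume, [13, 14, 15])
```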
- Step S 303 the disparity decision unit 223 performs an object candidate pixel determination process to generate an object region candidate map.
- the disparity decision unit 223 generates an object candidate region map indicating a region (pixel region) in which the existence probability of the object, for example, the object to be detected, such as a person, is high on the basis of the disparity D(x, y) corresponding to each pixel calculated in the disparity decision process of Step S 302 or the similarity at the disparity D(x, y).
- the value of the similarity is large at the candidate disparity of the correct answer in which a person is present and is small at the other disparity values.
- the similarity is the same in any candidate disparity.
- the similarity is low since there is no edge (high gradient strength).
- Step S 303 the disparity decision unit 223 determines a region (pixel region) in which the existence probability of the object to be detected, such as a person, is high, in consideration of this situation.
- in a case in which the similarity at the decided disparity is equal to or greater than a predetermined value, the pixel is selected as an object candidate pixel and is marked. For example, an object candidate region map in which the object candidate pixel is 1 and the other pixels are 0 is generated.
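The marking step can be sketched as a threshold on the best similarity per pixel. The threshold value is an assumption; the text only requires a "predetermined value".

```python
import numpy as np

def object_candidate_map(best_similarity, threshold=0.5):
    """1 for object candidate pixels, 0 otherwise."""
    return (best_similarity >= threshold).astype(np.uint8)

best = np.array([[0.9, 0.1],
                 [0.3, 0.7]])
mask = object_candidate_map(best)  # -> [[1, 0], [0, 1]]
```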
- Step S 304 the disparity decision unit 223 determines whether the generation of the object candidate region map based on the disparity decision process in Step S 302 and the object candidate pixel determination in Step S 303 has been completed.
- the disparity decision unit 223 repeatedly performs the process in Steps S 302 and S 303 for an unprocessed pixel.
- in a case in which the process for all of the pixels has been completed, the disparity decision unit 223 proceeds to Step S 305 .
- Step S 305 the disparity decision unit 223 determines whether the process for the cost volumes corresponding to all resolutions has ended.
- the disparity decision unit 223 In a case in which there is a cost volume corresponding to an unprocessed resolution, the disparity decision unit 223 repeatedly performs the process from Steps S 301 to S 304 for the cost volume corresponding to an unprocessed resolution.
- the disparity decision unit 223 ends the process.
- the disparity decision unit 223 generates disparity maps and object candidate region maps which correspond to the cost volumes corresponding to a plurality of different resolutions input from the cost volume filtering unit 222 , using the process according to this flow, and outputs these maps to the classification unit 230 illustrated in FIG. 5 .
- the disparity decision unit 223 generates the following data and outputs the generated data to the classification unit 230 :
- the classification unit 230 detects the object to be detected, for example, a person, using the disparity maps and the object candidate region maps corresponding to the plurality of resolutions.
- the classification unit 230 performs a classification process for only the object candidate pixels, using the object candidate region maps.
- the execution of the process for the limited region makes it possible to reduce the amount of calculation.
- the general object detector disclosed in, for example, Non-Patent Literature [ 1 ] performs a detection process while sliding the detection window on the images with a plurality of resolutions. As a result, the amount of calculation is very large. In contrast, in the process according to the present disclosure, the classification process is performed for only the object candidate pixels, using the generated object candidate region map, in the subsequent stage. Therefore, it is possible to reduce the amount of calculation.
- the pixel matching unit 221 of the disparity calculation unit 220 performs the similarity calculation process based on two images captured from different viewpoints.
- the pixel matching unit 221 of the disparity calculation unit 220 receives the corrected first image 151 which is a calibration image based on the visible image and the corrected second image 152 which is a calibration image based on the far-infrared image and performs the similarity calculation process based on two images captured from two different viewpoints.
- the feature amounts applied to the similarity calculation process are not limited thereto and other feature amounts may be used.
- feature amounts other than the feature amounts calculated in advance may be used.
- the cost volume filtering unit 222 of the disparity calculation unit 220 performs a process, using the averaging filter as the filter applied in the filtering process for the cost plane.
- the filtering process performed for the cost plane by the cost volume filtering unit 222 is not limited to the process using the averaging filter and may be any process using other filters.
- a Gaussian filter or a bilateral filter may be used.
- a speed-up method using an integral image may be used.
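For reference, the integral-image idea is sketched below: with a summed-area table, the sum over any kernel-sized block costs four lookups regardless of the kernel size, which makes the averaging filter's cost independent of the block size.

```python
import numpy as np

def box_sums(plane, kernel_h, kernel_w):
    """Sum of every kernel-sized block, via a zero-padded integral image."""
    ii = np.zeros((plane.shape[0] + 1, plane.shape[1] + 1))
    ii[1:, 1:] = plane.cumsum(axis=0).cumsum(axis=1)
    h = plane.shape[0] - kernel_h + 1
    w = plane.shape[1] - kernel_w + 1
    return (ii[kernel_h:kernel_h + h, kernel_w:kernel_w + w]
            - ii[:h, kernel_w:kernel_w + w]
            - ii[kernel_h:kernel_h + h, :w]
            + ii[:h, :w])

sums = box_sums(np.ones((5, 5)), 3, 3)  # every 3x3 block sums to 9
```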
- the disparity decision unit 223 of the disparity calculation unit 220 performs the disparity decision process, which compares the pixel values (similarities) at the same pixel position (corresponding pixel position) in a plurality of cost planes included in the cost volume corresponding to a specific selected resolution, selects a cost plane with the highest similarity, and decides the disparity d of that cost plane as the disparity D of the pixel position, using the above-mentioned (Expression 10).
- the disparity D(x, y) of the pixel position (x, y) is calculated in accordance with the above-mentioned (Expression 10).
- a disparity map corresponding to one resolution is generated by this process.
- the disparity map is a map in which the value of the disparity D(x, y) calculated by the above-mentioned (Expression 10) is set to each pixel position (x, y).
- Methods other than the above-mentioned (Expression 10) may be used to calculate the disparity.
- the following method may be applied: after a global optimization process, such as a belief propagation method or a graph-cut method, is performed for the cost volume, the calculation expression of (Expression 10) is applied to decide the disparity.
- the disparity decision unit 223 of the disparity calculation unit 220 performs a process that generates the object candidate region map indicating the region (pixel region) in which the existence probability of the object, for example, the object to be detected, such as a person, is high on the basis of the disparity D(x, y) corresponding to each pixel calculated in the disparity decision process or the similarity at the disparity D(x, y).
- the process is a process of determining the region (pixel region) in which the existence probability of the object to be detected, such as a person, is high.
- the pixel is selected as the object candidate pixel and is marked.
- the object candidate region map in which the object candidate pixel is 1 and the other pixels are 0 is generated.
- image processing, such as morphology processing (closing processing or opening processing), may be performed for the object candidate region map, in which a pixel having a similarity value equal to or greater than the predetermined value is 1 and the other pixels are 0, in order to remove noise, and the result of the image processing may be output as the object candidate region map to the classification unit 230 .
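The thresholding and morphology steps above can be sketched as follows. This is an illustration, not the patent's implementation: the 3x3 structuring element and the choice of an opening (erosion then dilation) are my assumptions, and a library such as SciPy's ndimage morphology could equally be used.

```python
import numpy as np

def candidate_region_map(best_similarity, threshold):
    """Mark pixels whose winning similarity meets the threshold as object
    candidates (1), others as 0, then clean the binary map with a simple
    3x3 opening (erosion followed by dilation) to remove isolated noise."""
    mask = (best_similarity >= threshold).astype(np.uint8)

    def dilate(m):
        p = np.pad(m, 1)  # pad with 0 so dilation does not grow past the border
        stack = [p[1 + dy:1 + dy + m.shape[0], 1 + dx:1 + dx + m.shape[1]]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        return np.maximum.reduce(stack)  # 1 if any 3x3 neighbour is 1

    def erode(m):
        p = np.pad(m, 1, constant_values=1)  # pad with 1 so the border is neutral
        stack = [p[1 + dy:1 + dy + m.shape[0], 1 + dx:1 + dx + m.shape[1]]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        return np.minimum.reduce(stack)  # 1 only if all 3x3 neighbours are 1

    return dilate(erode(mask))  # opening removes isolated candidate pixels
```

An opening discards single-pixel responses while preserving compact candidate regions; a closing would instead fill small holes inside a candidate region.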
- the image processing apparatus decides the search range of stereo matching from the actual size of the object to be detected, the size of the object on the image, and geometrical information between sensors. Therefore, it is possible to avoid unnecessary searching, to improve the stereo matching performance, and to reduce the amount of calculation.
- the block size of stereo matching is decided from the parameters (the size of the detection window and the resolution of an image in the detection process) of the object detector. Therefore, it is possible to optimize the block size to the object to be detected and to improve the stereo matching performance.
- a multi-resolution cost volume is generated from a high-resolution cost volume. Therefore, it is possible to reduce a feature amount extraction process before the stereo matching and to efficiently generate a multi-resolution disparity map.
- a candidate region in which the existence probability of an object is high is decided on the basis of a score related to the similarity obtained from the result of stereo matching and the classification process in the subsequent stage is performed only for the candidate region. Therefore, it is possible to reduce the amount of calculation of the object detector.
- FIG. 20 is a diagram illustrating an example of the hardware configuration of the image processing apparatus that performs the process according to the present disclosure.
- a central processing unit (CPU) 501 functions as a control unit or a data processing unit that performs various processes in accordance with a program stored in a read only memory (ROM) 502 or a storage unit 508 .
- the CPU 501 performs the process according to the sequence described in the above-mentioned embodiment.
- a random access memory (RAM) 503 stores, for example, programs or data executed by the CPU 501 .
- the CPU 501 , the ROM 502 , and the RAM 503 are connected to each other by a bus 504 .
- the CPU 501 is connected to an input/output interface 505 through the bus 504 .
- An input unit 506 and an output unit 507 are connected to the input/output interface 505 . The input unit 506 receives an image captured by an imaging unit 521 and includes various switches, a keyboard, a mouse, and a microphone which can be used by the user to input information; the output unit 507 outputs data to, for example, a display unit 522 or a speaker.
- the CPU 501 performs various processes in response to commands input from the input unit 506 and outputs the processing results to, for example, the output unit 507 .
- the storage unit 508 connected to the input/output interface 505 is, for example, a hard disk drive and stores the programs executed by the CPU 501 and various types of data.
- a communication unit 509 functions as a transmitting and receiving unit for Wi-Fi communication, Bluetooth (registered trademark) (BT) communication, and other types of data communication through a network, such as the Internet or a local area network, and communicates with external apparatuses.
- a drive 510 connected to the input/output interface 505 drives a removable medium 511 , such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory such as a memory card, to record or read data.
- An image processing apparatus including
- an object detection unit that receives two images captured from different viewpoints and performs an object detection process, in which
- the object detection unit includes
- the disparity calculation unit generates disparity maps corresponding to a plurality of different resolutions and outputs the disparity maps to the classification unit.
- the disparity calculation unit includes a pixel matching unit that performs a stereo matching process which is a corresponding point search process using original-resolution images of the two images.
- the pixel matching unit decides a search range section which is a corresponding point search region, using at least one of a value of a height L of an object to be detected in the object detection process, a value of a height h of the object on the image, or a value of a baseline length B corresponding to a distance between two cameras which capture the two images.
- the pixel matching unit generates a cost volume which is a stack of cost planes in which a similarity for each pixel is set to each pixel forming the image as an execution result of the stereo matching process.
- the disparity calculation unit includes a cost volume filtering unit that performs a filtering process for the cost volume generated by the pixel matching unit to generate cost volumes corresponding to a plurality of different resolutions.
- the cost volume filtering unit performs the filtering process by changing the setting of a step, which is the spacing between pixels to be filtered, and a kernel size, which defines the range of reference pixels referred to in the filtering process, in accordance with the resolution of the cost volume to be output.
- the cost volume filtering unit performs the filtering process using an averaging filter.
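The step/kernel-controlled averaging filter described above can be sketched in NumPy. The specific pairings of step and kernel size (e.g. step=2, kernel=2 for half resolution) are illustrative assumptions; the patent only states that both parameters change with the output resolution.

```python
import numpy as np

def downsample_cost_plane(cost_plane, step, kernel):
    """Filter one cost plane with an averaging filter sampled at the given
    step (stride), producing the plane for a lower-resolution cost volume."""
    h, w = cost_plane.shape
    out_h, out_w = h // step, w // step
    out = np.empty((out_h, out_w), dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            y0, x0 = y * step, x * step
            # Average the kernel x kernel block of reference pixels
            # (slicing clips the block at the image border).
            out[y, x] = cost_plane[y0:y0 + kernel, x0:x0 + kernel].mean()
    return out

def multiresolution_cost_volume(cost_volume, step, kernel):
    """Apply the same filtering to every cost plane in a (D, H, W) volume."""
    return np.stack([downsample_cost_plane(p, step, kernel) for p in cost_volume])
```

Because the high-resolution cost volume is computed once and only averaged and resampled here, feature extraction and matching do not need to be repeated per resolution, which is the efficiency gain the description points to.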
- the disparity calculation unit includes a disparity decision unit that generates disparity maps and object candidate region maps which correspond to the cost volumes corresponding to the plurality of different resolutions generated by the cost volume filtering unit.
- the disparity decision unit performs a disparity decision process that compares pixel values (similarities) at a same pixel position (corresponding pixel position) in a plurality of cost planes included in a cost volume corresponding to a resolution selected as a processing target, selects a cost plane with the highest similarity, decides a disparity d of the selected cost plane as a disparity D of the pixel position, and generates a disparity map in which the disparity D decided in the disparity decision process is associated with each pixel.
- the disparity decision unit generates an object candidate region map indicating a region (pixel region) in which an existence probability of the object to be detected is high, on the basis of the disparity D(x, y) corresponding to each pixel decided in the disparity decision process or a similarity at the disparity D(x, y).
- the classification unit receives the disparity maps and the object candidate region maps corresponding to the plurality of different resolutions generated by the disparity calculation unit and performs the object detection process, using a machine learning process using input data.
- the classification unit performs the object detection process using aggregated channel features (ACF) which are an object detection algorithm.
- the two images captured from the different viewpoints are a visible image and a far-infrared image.
- an object detection processing step of allowing an object detection unit to receive two images captured from different viewpoints and to perform an object detection process in which
- the object detection processing step includes
- disparity maps corresponding to a plurality of different resolutions are generated and output to the classification unit.
- the program causes a disparity calculation unit to execute a disparity calculation step of calculating a disparity of each pixel of the two images and generating a disparity map including calculated disparity information and causes a classification unit to execute a classification processing step of performing the object detection process using the disparity map generated in the disparity calculation step, and
- disparity maps corresponding to a plurality of different resolutions are generated and output to the classification unit.
- a series of processes described in the specification may be implemented by hardware, software, or a combination thereof.
- a program having a processing sequence recorded thereon may be installed in a memory of a computer incorporated into dedicated hardware and then executed, or the program may be installed in a general-purpose computer capable of performing various processes and then executed.
- the program may be recorded on a recording medium in advance.
- the program may be installed from the recording medium to the computer.
- the program may be received by the computer through a network, such as a local area network (LAN) or the Internet, and then installed in a recording medium, such as a hard disk drive, provided in the computer.
- the various processes described in the specification are not only performed in time series according to the description, but also may be performed in parallel or individually in accordance with the processing capability of the apparatus performing the processes or as needed.
- the system is a logical set configuration of a plurality of apparatuses and is not limited to the configuration in which the apparatuses are provided in the same housing.
- an apparatus and a method that perform generation of a disparity map and an object detection process with high accuracy and efficiency are achieved.
- the apparatus includes a disparity calculation unit that receives two images captured from different viewpoints, calculates a disparity, and generates a disparity map and a classification unit that performs an object detection process using the disparity map.
- the disparity calculation unit performs a stereo matching process using an original-resolution image, generates cost volumes corresponding to a plurality of resolutions from the processing result, generates disparity maps and object candidate region maps corresponding to the plurality of different resolutions, using the cost volumes corresponding to each resolution, and outputs the disparity maps and the object candidate region maps to the classification unit.
- the apparatus and the method that perform the generation of the disparity map and the object detection process with high accuracy and efficiency are achieved by these processes.
Abstract
Description
- Non-Patent Literature 1: Multispectral Pedestrian Detection: Benchmark Dataset and Baseline (CVPR2015_MutispectalPedestrian.pdf)
Z=(f/h)L (Expression 1).
d=(fB/Z) (Expression 2).
d=(B/L)h (Expression 3).
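Expressions 1 to 3 above can be evaluated directly: the object's real height L, its height h on the image, the focal length f, and the baseline B give both the distance Z and the disparity d, and (Expression 3) yields the upper bound of the stereo-matching search range without knowing Z. A minimal sketch (function and parameter names are my own):

```python
def distance_from_image_height(focal_f, image_height_h, object_height_l):
    """Distance Z to the object from (Expression 1): Z = (f / h) * L."""
    return (focal_f / image_height_h) * object_height_l

def disparity_from_distance(focal_f, baseline_b, z):
    """Disparity at distance Z from (Expression 2): d = f * B / Z."""
    return focal_f * baseline_b / z

def max_search_disparity(baseline_b, object_height_l, image_height_h):
    """Search-range bound from (Expression 3): d = (B / L) * h.
    Substituting (Expression 1) into (Expression 2) cancels f, so the bound
    needs no focal length; disparities beyond it correspond to objects
    nearer (hence taller on the image) than the target of height L."""
    return (baseline_b / object_height_l) * image_height_h
```

Limiting the corresponding-point search to this bound is what lets the apparatus avoid unnecessary searching and reduce the amount of calculation.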
Similarity(x,y,d)=min(Mag1(x,y),Mag2(x+d,y))Φ(x,y,d) (Expression 4).
Φ(x,y,d)=(cos(2θ(x,y,d))+1)/2 (Expression 5),
and
θ(x,y,d)=Ori1(x,y)−Ori2(x+d,y) (Expression 6).
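Expressions 4 to 6 combine gradient magnitude (Mag) and gradient orientation (Ori) features of the two images into a per-pixel, per-disparity similarity. A minimal NumPy sketch, with array layout (row y, column x) as an assumption:

```python
import numpy as np

def similarity(mag1, ori1, mag2, ori2, x, y, d):
    """Similarity of pixel (x, y) at candidate disparity d from
    (Expression 4)-(Expression 6). mag/ori are gradient magnitude and
    orientation (radians) maps of the first and second images."""
    theta = ori1[y, x] - ori2[y, x + d]          # (Expression 6)
    phi = (np.cos(2.0 * theta) + 1.0) / 2.0      # (Expression 5): 1 when aligned
    return min(mag1[y, x], mag2[y, x + d]) * phi  # (Expression 4)
```

The cos(2θ) term makes Φ equal 1 for parallel or anti-parallel gradients and 0 for perpendicular ones, so the min of the two magnitudes is only credited when the edge directions agree; values of this function over all candidate d form the cost planes of the cost volume.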
- a disparity calculation unit that calculates a disparity of each pixel of the two images and generates a disparity map including calculated disparity information, and
- a classification unit that performs the object detection process using the disparity map generated by the disparity calculation unit, and
- a disparity calculation step of allowing a disparity calculation unit to calculate a disparity of each pixel of the two images and to generate a disparity map including calculated disparity information, and
- a classification processing step of allowing a classification unit to perform the object detection process using the disparity map generated in the disparity calculation step, and
- 10 visible image
- 20 far-infrared image
- 100 image processing apparatus
- 101 control unit
- 102 storage unit
- 103 codec
- 104 input unit
- 105 output unit
- 106 imaging unit
- 107 visible imaging unit
- 108 far-infrared imaging unit
- 111 first imaging element
- 112 second imaging element
- 131 first image
- 132 second image
- 140 calibration execution unit
- 151 corrected first image
- 152 corrected second image
- 170 object detection result
- 200 object detection unit
- 210 feature amount extraction unit
- 211 first image feature amount
- 212 second image feature amount
- 220 disparity calculation unit
- 221 pixel matching unit
- 222 cost volume filtering unit
- 223 disparity decision unit
- 225 disparity map
- 226 object region candidate map
- 230 classification unit
- 300 cost volume
- 301 cost plane
- 501 CPU
- 502 ROM
- 503 RAM
- 504 bus
- 505 input/output interface
- 506 input unit
- 507 output unit
- 508 storage unit
- 509 communication unit
- 510 drive
- 511 removable medium
- 521 imaging unit
- 522 display unit
Claims (12)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JPJP2017-020055 | 2017-02-07 | ||
JP2017020055 | 2017-02-07 | ||
JP2017-020055 | 2017-02-07 | ||
PCT/JP2018/001782 WO2018147059A1 (en) | 2017-02-07 | 2018-01-22 | Image processing device, image processing method, and program |
Publications (2)
Publication Number | Publication Date |
---|---|
US20190349572A1 US20190349572A1 (en) | 2019-11-14 |
US11272163B2 true US11272163B2 (en) | 2022-03-08 |
Family
ID=63107360
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/474,946 Active US11272163B2 (en) | 2017-02-07 | 2018-01-22 | Image processing apparatus and image processing method |
Country Status (3)
Country | Link |
---|---|
US (1) | US11272163B2 (en) |
JP (1) | JP7024736B2 (en) |
WO (1) | WO2018147059A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI637350B (en) * | 2018-01-09 | 2018-10-01 | 緯創資通股份有限公司 | Method, image processing device, and system for generating disparity map |
US11060864B1 (en) * | 2019-01-22 | 2021-07-13 | Tp Lab, Inc. | Controller for measuring distance from reference location and real size of object using a plurality of cameras |
WO2020217283A1 (en) * | 2019-04-22 | 2020-10-29 | 日本電気株式会社 | Object detection device, object detection system, object detection method, and non-transitory computer-readable medium in which program is stored |
JP6808111B1 (en) * | 2020-05-07 | 2021-01-06 | 三菱電機株式会社 | Self-position estimation device, navigation control device, image generator, artificial satellite system and self-position estimation method |
JP7450668B2 (en) | 2022-06-30 | 2024-03-15 | 維沃移動通信有限公司 | Facial recognition methods, devices, systems, electronic devices and readable storage media |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0600709A2 (en) | 1992-12-01 | 1994-06-08 | Canon Kabushiki Kaisha | Range-image processing apparatus and method |
JPH06176107A (en) | 1992-12-04 | 1994-06-24 | Canon Inc | Method and device for processing distance picture |
US20080036576A1 (en) * | 2006-05-31 | 2008-02-14 | Mobileye Technologies Ltd. | Fusion of far infrared and visible images in enhanced obstacle detection in automotive applications |
US20090141938A1 (en) * | 2007-11-08 | 2009-06-04 | Elctronics And Telecommunications Research Institute | Robot vision system and detection method |
US20100128974A1 (en) * | 2008-11-25 | 2010-05-27 | Nec System Technologies, Ltd. | Stereo matching processing apparatus, stereo matching processing method and computer-readable recording medium |
JP2011048416A (en) | 2009-08-25 | 2011-03-10 | Konica Minolta Holdings Inc | Image processing apparatus and image processing method |
CN103139469A (en) | 2011-12-01 | 2013-06-05 | 索尼公司 | System and method for generating robust depth maps utilizing a multi-resolution procedure |
WO2014073670A1 (en) | 2012-11-09 | 2014-05-15 | 国立大学法人山口大学 | Image processing method and image processing device |
WO2014083721A1 (en) | 2012-11-27 | 2014-06-05 | 株式会社ソニー・コンピュータエンタテインメント | Information processing device and information processing method |
US20150036917A1 (en) * | 2011-06-17 | 2015-02-05 | Panasonic Corporation | Stereo image processing device and stereo image processing method |
US20150078669A1 (en) * | 2013-08-19 | 2015-03-19 | Nokia Corporation | Method, apparatus and computer program product for object detection and segmentation |
US20150249814A1 (en) * | 2012-09-27 | 2015-09-03 | Panasonic Intellectual Property Management Co., Ltd. | Stereo image processing device and stereo image processing method |
US20150324659A1 (en) * | 2014-05-08 | 2015-11-12 | Mitsubishi Electric Research Laboratories, Inc. | Method for detecting objects in stereo images |
US20160044297A1 (en) | 2014-08-11 | 2016-02-11 | Sony Corporation | Information processor, information processing method, and computer program |
US20160284090A1 (en) * | 2015-03-27 | 2016-09-29 | Yu Huang | Stereo image matching by shape preserving filtering of a cost volume in a phase domain |
US9485495B2 (en) * | 2010-08-09 | 2016-11-01 | Qualcomm Incorporated | Autofocus for stereo images |
US9626590B2 (en) * | 2015-09-18 | 2017-04-18 | Qualcomm Incorporated | Fast cost aggregation for dense stereo matching |
US10321112B2 (en) * | 2016-07-18 | 2019-06-11 | Samsung Electronics Co., Ltd. | Stereo matching system and method of operating thereof |
2018
- 2018-01-22 WO PCT/JP2018/001782 patent/WO2018147059A1/en active Application Filing
- 2018-01-22 US US16/474,946 patent/US11272163B2/en active Active
- 2018-01-22 JP JP2018567346A patent/JP7024736B2/en active Active
Patent Citations (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0600709A2 (en) | 1992-12-01 | 1994-06-08 | Canon Kabushiki Kaisha | Range-image processing apparatus and method |
US5504847A (en) | 1992-12-01 | 1996-04-02 | Canon Kabushiki Kaisha | Range-image processing apparatus and method |
DE69328230T2 (en) | 1992-12-01 | 2000-08-10 | Canon Kk | Distance image processing device and method |
JPH06176107A (en) | 1992-12-04 | 1994-06-24 | Canon Inc | Method and device for processing distance picture |
US20080036576A1 (en) * | 2006-05-31 | 2008-02-14 | Mobileye Technologies Ltd. | Fusion of far infrared and visible images in enhanced obstacle detection in automotive applications |
US20090141938A1 (en) * | 2007-11-08 | 2009-06-04 | Elctronics And Telecommunications Research Institute | Robot vision system and detection method |
US20100128974A1 (en) * | 2008-11-25 | 2010-05-27 | Nec System Technologies, Ltd. | Stereo matching processing apparatus, stereo matching processing method and computer-readable recording medium |
JP2011048416A (en) | 2009-08-25 | 2011-03-10 | Konica Minolta Holdings Inc | Image processing apparatus and image processing method |
US9485495B2 (en) * | 2010-08-09 | 2016-11-01 | Qualcomm Incorporated | Autofocus for stereo images |
US20150036917A1 (en) * | 2011-06-17 | 2015-02-05 | Panasonic Corporation | Stereo image processing device and stereo image processing method |
CN103139469A (en) | 2011-12-01 | 2013-06-05 | 索尼公司 | System and method for generating robust depth maps utilizing a multi-resolution procedure |
KR20130061636A (en) | 2011-12-01 | 2013-06-11 | 소니 주식회사 | System and method for generating robust depth maps utilizing a multi-resolution procedure |
JP2013117969A (en) | 2011-12-01 | 2013-06-13 | Sony Corp | System and method for generating robust depth map utilizing multiple resolution procedure |
TW201338500A (en) | 2011-12-01 | 2013-09-16 | Sony Corp | System and method for generating robust depth maps utilizing a multi-resolution procedure |
US20130142415A1 (en) | 2011-12-01 | 2013-06-06 | Gazi Ali | System And Method For Generating Robust Depth Maps Utilizing A Multi-Resolution Procedure |
EP2600618A2 (en) | 2011-12-01 | 2013-06-05 | Sony Corporation | System and method for generating robust depth maps utilizing a multi-resolution procedure |
US20150249814A1 (en) * | 2012-09-27 | 2015-09-03 | Panasonic Intellectual Property Management Co., Ltd. | Stereo image processing device and stereo image processing method |
WO2014073670A1 (en) | 2012-11-09 | 2014-05-15 | 国立大学法人山口大学 | Image processing method and image processing device |
US20150302596A1 (en) | 2012-11-09 | 2015-10-22 | Yoshiki Mizukami | Image processing method and an image processing apparatus |
JP2014096062A (en) | 2012-11-09 | 2014-05-22 | Yamaguchi Univ | Image processing method and image processing apparatus |
JP2014106732A (en) | 2012-11-27 | 2014-06-09 | Sony Computer Entertainment Inc | Information processor and information processing method |
WO2014083721A1 (en) | 2012-11-27 | 2014-06-05 | 株式会社ソニー・コンピュータエンタテインメント | Information processing device and information processing method |
US20150302239A1 (en) | 2012-11-27 | 2015-10-22 | Sony Computer Entrtainment Inc. | Information processor and information processing method |
US20150078669A1 (en) * | 2013-08-19 | 2015-03-19 | Nokia Corporation | Method, apparatus and computer program product for object detection and segmentation |
US20150324659A1 (en) * | 2014-05-08 | 2015-11-12 | Mitsubishi Electric Research Laboratories, Inc. | Method for detecting objects in stereo images |
US20160044297A1 (en) | 2014-08-11 | 2016-02-11 | Sony Corporation | Information processor, information processing method, and computer program |
JP2016038886A (en) | 2014-08-11 | 2016-03-22 | ソニー株式会社 | Information processing apparatus and information processing method |
US20160284090A1 (en) * | 2015-03-27 | 2016-09-29 | Yu Huang | Stereo image matching by shape preserving filtering of a cost volume in a phase domain |
US9626590B2 (en) * | 2015-09-18 | 2017-04-18 | Qualcomm Incorporated | Fast cost aggregation for dense stereo matching |
US10321112B2 (en) * | 2016-07-18 | 2019-06-11 | Samsung Electronics Co., Ltd. | Stereo matching system and method of operating thereof |
Non-Patent Citations (4)
Title |
---|
International Search Report and Written Opinion of PCT Application No. PCT/JP2018/001782, dated Apr. 10, 2018, 09 pages of ISRWO. |
Office Action for JP Patent Application No. 2018-567346, dated Nov. 16, 2021, 04 pages of English Translation and 04 pages of Office Action. |
Uto, et al., "Fast Registration Algorithms of Range Images Based on Multi resolution Analysis", IEICE Technical Report, Japan, Institute of Electronics, Information and Communication Engineers, vol. 105, No. 177, Jul. 8, 2005, pp. 33-38. |
Also Published As
Publication number | Publication date |
---|---|
JPWO2018147059A1 (en) | 2019-11-21 |
JP7024736B2 (en) | 2022-02-24 |
US20190349572A1 (en) | 2019-11-14 |
WO2018147059A1 (en) | 2018-08-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11272163B2 (en) | Image processing apparatus and image processing method | |
US6961466B2 (en) | Method and apparatus for object recognition | |
Sun et al. | Deep convolutional network cascade for facial point detection | |
US10303983B2 (en) | Image recognition apparatus, image recognition method, and recording medium | |
US10216979B2 (en) | Image processing apparatus, image processing method, and storage medium to detect parts of an object | |
WO2019220622A1 (en) | Image processing device, system, method, and non-transitory computer readable medium having program stored thereon | |
JP6032921B2 (en) | Object detection apparatus and method, and program | |
US7929728B2 (en) | Method and apparatus for tracking a movable object | |
US9690988B2 (en) | Image processing apparatus and image processing method for blink detection in an image | |
US8594435B2 (en) | Image processing device and method, and program therefor | |
JP2008257713A (en) | Correcting device and method for perspective transformed document image | |
US9426375B2 (en) | Line-of-sight detection apparatus and image capturing apparatus | |
JP7334432B2 (en) | Object tracking device, monitoring system and object tracking method | |
CN107766864B (en) | Method and device for extracting features and method and device for object recognition | |
US20230127009A1 (en) | Joint objects image signal processing in temporal domain | |
JPWO2012046426A1 (en) | Object detection apparatus, object detection method, and object detection program | |
JP5429564B2 (en) | Image processing apparatus and method, and program | |
JP2014010633A (en) | Image recognition device, image recognition method, and image recognition program | |
US10332259B2 (en) | Image processing apparatus, image processing method, and program | |
JP5791361B2 (en) | PATTERN IDENTIFICATION DEVICE, PATTERN IDENTIFICATION METHOD, AND PROGRAM | |
JP4789526B2 (en) | Image processing apparatus and image processing method | |
JP6278757B2 (en) | Feature value generation device, feature value generation method, and program | |
JP5625196B2 (en) | Feature point detection device, feature point detection method, feature point detection program, and recording medium | |
JP6276504B2 (en) | Image detection apparatus, control program, and image detection method | |
JP5702960B2 (en) | Image processing apparatus, image processing method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SONY CORPORATION, JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:UEMORI, TAKESHI;REEL/FRAME:049626/0344 Effective date: 20190625 |
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general |
Free format text: ADVISORY ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |