EP3432291A1 - Image processing apparatus, object recognition apparatus, device control system, image processing method, and program - Google Patents

Image processing apparatus, object recognition apparatus, device control system, image processing method, and program

Info

Publication number
EP3432291A1
Authority
EP
European Patent Office
Prior art keywords
unit
area
distance
detection
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP16894575.6A
Other languages
German (de)
French (fr)
Other versions
EP3432291A4 (en)
Inventor
Seiya Amano
Hiroyoshi Sekiguchi
Soichiro Yokota
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Publication of EP3432291A1 publication Critical patent/EP3432291A1/en
Publication of EP3432291A4 publication Critical patent/EP3432291A4/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/08Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C3/00Measuring distances in line of sight; Optical rangefinders
    • G01C3/02Details
    • G01C3/06Use of electric means to obtain final indication
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C3/00Measuring distances in line of sight; Optical rangefinders
    • G01C3/02Details
    • G01C3/06Use of electric means to obtain final indication
    • G01C3/08Use of electric radiation detectors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/16Anti-collision systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/16Anti-collision systems
    • G08G1/165Anti-collision systems for passive traffic, e.g. including static obstacles, trees
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/16Anti-collision systems
    • G08G1/166Anti-collision systems for active traffic, e.g. moving vehicles, pedestrians, bikes

Definitions

  • the present invention relates to an image processing apparatus, an object recognition apparatus, a device control system, an image processing method, and a program.
  • the disparity of each object appearing in two luminance images captured on the right and left is derived to generate a disparity image, and pixels having similar disparity values are grouped together to recognize the object.
  • by extracting a disparity cluster from a disparity image, the height, horizontal width, and depth of an object, as well as the position of the object in three dimensions, may be detected.
  • as the technology for recognizing objects described above, there is a disclosed technology in which a pedestrian recognition area, where the presence of a pedestrian is recognized in image data, is identified and a pedestrian score indicating the degree of certainty of a pedestrian is calculated (see Patent Literature 1).
  • Patent Literature 1 Japanese Laid-open Patent Publication No. 2014-146267
  • the technology of Patent Literature 1, however, has a problem in that, for example, when a pedestrian suddenly runs out from behind a different vehicle, it is difficult to ensure that the pedestrian is detected without being discarded and is included as a control target.
  • the present invention has been made in consideration of the foregoing, and it has an object to provide an image processing apparatus, an object recognition apparatus, a device control system, an image processing method, and a program that perform a discard process properly.
  • the present invention includes a first calculating unit that calculates a distance between two objects, detected based on distance information on the objects, in a depth direction in detection areas of the objects; a second calculating unit that calculates an overlap size that is a size of an overlapped area of the two detection areas by using a method that corresponds to the distance calculated by the first calculating unit; and a discarding unit that determines whether each of the two objects in the detection areas is to be discarded in accordance with the overlap size.
  • according to the present invention, a discard process may be properly performed.
  • with reference to FIGS. 1 to 24, a detailed explanation is given below of an embodiment of an image processing apparatus, an object recognition apparatus, a device control system, an image processing method, and a program according to the present invention.
  • the present invention is not limited to the embodiment below, and components in the embodiment below include the ones that may be easily developed by a person skilled in the art, substantially the same ones, and the ones in what is called a range of equivalents. Furthermore, the components may be variously omitted, replaced, modified, or combined without departing from the scope of the embodiment below.
  • FIG. 1 is a diagram that illustrates an example where a device control system according to the embodiment is installed in a vehicle.
  • a device control system 60 according to the present embodiment is installed in a vehicle 70.
  • FIG. 1(a) is a side view of the vehicle 70 with the device control system 60 installed therein
  • FIG. 1(b) is a front view of the vehicle 70.
  • the vehicle 70, which is an automobile, has the device control system 60 installed therein.
  • the device control system 60 includes an object recognition apparatus 1, a vehicle control device 6 (control device), a steering wheel 7, and a brake pedal 8, provided in the vehicle interior that is an accommodation space in the vehicle 70.
  • the object recognition apparatus 1 has an imaging function to capture images in a traveling direction of the vehicle 70, and for example it is installed near the rearview mirror inside the front window of the vehicle 70.
  • the object recognition apparatus 1 includes: a main body unit 2; and an imaging unit 10a and an imaging unit 10b that are fixed to the main body unit 2, and details of its configuration and operation are described later.
  • the imaging units 10a, 10b are fixed to the main body unit 2 so as to capture an object in the traveling direction of the vehicle 70.
  • the vehicle control device 6 is an ECU (electronic control unit) that performs various types of vehicle control on the basis of recognition information received from the object recognition apparatus 1. On the basis of recognition information received from the object recognition apparatus 1, the vehicle control device 6 performs, as an example of the vehicle control, steering control to avoid obstacles by controlling a steering system (control target) including the steering wheel 7, brake control to stop or reduce the speed of the vehicle 70 by controlling the brake pedal 8 (control target), or the like.
  • the device control system 60 including the object recognition apparatus 1 and the vehicle control device 6 described above performs vehicle control such as steering control or brake control to improve driving safety of the vehicle 70.
  • the object recognition apparatus 1 captures images in front of the vehicle 70; however, this is not a limitation. That is, the object recognition apparatus 1 may be installed to capture images behind or to the side of the vehicle 70. In this case, the object recognition apparatus 1 is capable of detecting the position of a following vehicle or a person behind the vehicle 70, or of a different vehicle or a person to the side of it. Furthermore, the vehicle control device 6 is capable of detecting dangers when the vehicle 70 changes lanes, merges into a lane, or the like, to perform the above-described vehicle control.
  • when the vehicle control device 6 determines, on the basis of recognition information on an obstacle behind the vehicle 70 output from the object recognition apparatus 1, that there is a danger of collision while the vehicle 70 is backing up to park, or the like, it is capable of performing the above-described vehicle control.
  • FIG. 2 is a diagram that illustrates an example of the external appearance of the object recognition apparatus according to the embodiment.
  • the object recognition apparatus 1 includes the main body unit 2; and the imaging unit 10a and the imaging unit 10b that are fixed to the main body unit 2, as described above.
  • the imaging units 10a and 10b are made up of a pair of cylindrical cameras that are arranged in parallel at equivalent positions on the main body unit 2.
  • the imaging unit 10a illustrated in FIG. 2 is sometimes referred to as the right camera and the imaging unit 10b as the left camera.
  • FIG. 3 is a diagram that illustrates an example of the hardware configuration of the object recognition apparatus according to the embodiment. With reference to FIG. 3 , the hardware configuration of the object recognition apparatus 1 is explained.
  • the object recognition apparatus 1 includes a disparity-value deriving unit 3 and a recognition processing unit 5 inside the main body unit 2.
  • the disparity-value deriving unit 3 is a device that derives a disparity value dp indicating disparity with respect to an object from images obtained after the object is captured and outputs a disparity image (an example of distance information) indicating the disparity value dp of each pixel.
  • the recognition processing unit 5 is a device that performs an object recognition process, or the like, on an object such as person or vehicle appearing in a captured image on the basis of a disparity image output from the disparity-value deriving unit 3 and outputs recognition information that is information indicating a result of the object recognition process to the vehicle control device 6.
  • the disparity-value deriving unit 3 includes the imaging unit 10a, the imaging unit 10b, a signal converting unit 20a, a signal converting unit 20b, and an image processing unit 30.
  • the imaging unit 10a is a processing unit that captures an object in the front and generates analog image signals.
  • the imaging unit 10a includes an imaging lens 11a, an aperture 12a, and an image sensor 13a.
  • the imaging lens 11a is an optical element that refracts incident light to form an image of the object on the image sensor 13a.
  • the aperture 12a is a member that blocks part of light that has passed through the imaging lens 11a to adjust the amount of light input to the image sensor 13a.
  • the image sensor 13a is a semiconductor device that converts light that has entered the imaging lens 11a and passed through the aperture 12a into electric analog image signals.
  • the image sensor 13a is implemented by using a solid-state image sensor such as a CCD (charge coupled device) or a CMOS (complementary metal oxide semiconductor) sensor.
  • the imaging unit 10b is a processing unit that captures the object in the front and generates analog image signals.
  • the imaging unit 10b includes an imaging lens 11b, an aperture 12b, and an image sensor 13b.
  • the functions of the imaging lens 11b, the aperture 12b, and the image sensor 13b are the same as those of the imaging lens 11a, the aperture 12a, and the image sensor 13a described above.
  • the imaging lens 11a and the imaging lens 11b are installed such that their principal surfaces are on the same plane so that the right and the left cameras capture images under the same condition.
  • the signal converting unit 20a is a processing unit that converts analog image signals generated by the imaging unit 10a into digital-format image data.
  • the signal converting unit 20a includes a CDS (correlated double sampling) 21a, an AGC (auto gain control) 22a, an ADC (analog digital converter) 23a, and a frame memory 24a.
  • the CDS 21a removes noise from the analog image signals generated by the image sensor 13a by using correlated double sampling, a differential filter in a transverse direction, a smoothing filter in a longitudinal direction, or the like.
  • the AGC 22a performs gain control to control the intensity of analog image signals from which noise has been removed by the CDS 21a.
  • the ADC 23a converts analog image signals whose gain has been controlled by the AGC 22a into digital-format image data.
  • the frame memory 24a stores image data converted by the ADC 23a.
  • the signal converting unit 20b is a processing unit that converts analog image signals generated by the imaging unit 10b into digital-format image data.
  • the signal converting unit 20b includes a CDS 21b, an AGC 22b, an ADC 23b, and a frame memory 24b.
  • the functions of the CDS 21b, the AGC 22b, the ADC 23b, and the frame memory 24b are the same as those of the CDS 21a, the AGC 22a, the ADC 23a, and the frame memory 24a described above.
  • the image processing unit 30 is a device that performs image processing on image data converted by the signal converting unit 20a and the signal converting unit 20b.
  • the image processing unit 30 includes an FPGA (field programmable gate array) 31, a CPU (central processing unit) 32, a ROM (read only memory) 33, a RAM (random access memory) 34, an I/F (interface) 35, and a bus line 39.
  • the FPGA 31 is an integrated circuit, and here it performs a process to derive the disparity value dp in an image based on image data.
  • the CPU 32 controls each function of the disparity-value deriving unit 3.
  • the ROM 33 stores programs for image processing executed by the CPU 32 to control each function of the disparity-value deriving unit 3.
  • the RAM 34 is used as a work area of the CPU 32.
  • the I/F 35 is an interface for communicating with an I/F 55 in the recognition processing unit 5 via a communication line 4.
  • the bus line 39 is an address bus, a data bus, or the like, for connecting the FPGA 31, the CPU 32, the ROM 33, the RAM 34, and the I/F 35 such that they can communicate with one another.
  • the image processing unit 30 includes the FPGA 31 as the integrated circuit for deriving the disparity value dp; however, this is not a limitation, and it may be a different integrated circuit such as an ASIC (application specific integrated circuit).
  • the recognition processing unit 5 includes an FPGA 51, a CPU 52, a ROM 53, a RAM 54, the I/F 55, a CAN (controller area network) I/F 58, and a bus line 59.
  • the FPGA 51 is an integrated circuit, and here it performs an object recognition process on an object on the basis of disparity images, or the like, received from the image processing unit 30.
  • the CPU 52 controls each function of the recognition processing unit 5.
  • the ROM 53 stores the programs with which the CPU 52 performs the object recognition process of the recognition processing unit 5.
  • the RAM 54 is used as a work area of the CPU 52.
  • the I/F 55 is an interface for data communication with the I/F 35 of the image processing unit 30 via the communication line 4.
  • the CAN I/F 58 is an interface for communicating with an external controller (e.g., the vehicle control device 6 illustrated in FIG. 3) via the CAN of the vehicle, or the like.
  • the bus line 59 is an address bus, a data bus, or the like, connecting the FPGA 51, the CPU 52, the ROM 53, the RAM 54, the I/F 55, and the CAN I/F 58 such that they can communicate with one another, as illustrated in FIG. 3.
  • the FPGA 51 performs an object recognition process, or the like, on an object such as person or vehicle appearing in a captured image on the basis of the disparity image in accordance with a command from the CPU 52 of the recognition processing unit 5.
  • each of the above-described programs may be distributed by being recorded, as a file in an installable or executable format, in a computer-readable recording medium.
  • the recording medium may be a CD-ROM (compact disc read only memory), SD (secure digital) memory card, or the like.
  • the image processing unit 30 of the disparity-value deriving unit 3 and the recognition processing unit 5 are separate devices; however, this is not a limitation, and, for example, the image processing unit 30 and the recognition processing unit 5 may be the same device to generate disparity images and perform an object recognition process.
  • FIG. 4 is a diagram that illustrates an example of the configuration of functional blocks of the object recognition apparatus according to the embodiment. First, with reference to FIG. 4 , an explanation is given of the configuration and operation of the functional blocks in the relevant part of the object recognition apparatus 1.
  • the object recognition apparatus 1 includes the disparity-value deriving unit 3 and the recognition processing unit 5 as illustrated in FIG. 4 .
  • the disparity-value deriving unit 3 includes an image acquiring unit 100a (first imaging unit), an image acquiring unit 100b (second imaging unit), converting units 200a, 200b, and a disparity-value calculation processing unit 300 (generating unit).
  • the image acquiring unit 100a is a functional unit that captures the image of an object in the front by using the right camera, generates analog image signals, and obtains a luminance image that is an image based on the image signals.
  • the image acquiring unit 100a is implemented by using the imaging unit 10a illustrated in FIG. 3 .
  • the image acquiring unit 100b is a functional unit that captures the image of an object in the front by using the left camera, generates analog image signals, and obtains a luminance image that is an image based on the image signals.
  • the image acquiring unit 100b is implemented by using the imaging unit 10b illustrated in FIG. 3 .
  • the converting unit 200a is a functional unit that removes noise from image data on the luminance image obtained by the image acquiring unit 100a, converts it into digital-format image data, and outputs it.
  • the converting unit 200a is implemented by using the signal converting unit 20a illustrated in FIG. 3 .
  • the converting unit 200b is a functional unit that removes noise from image data on the luminance image obtained by the image acquiring unit 100b, converts it into digital-format image data, and outputs it.
  • the converting unit 200b is implemented by using the signal converting unit 20b illustrated in FIG. 3 .
  • hereafter, the luminance image captured by the image acquiring unit 100a, which is the right camera (the imaging unit 10a), is simply referred to as the reference image Ia, and the luminance image captured by the image acquiring unit 100b, which is the left camera (the imaging unit 10b), is simply referred to as the comparison image Ib.
  • the converting units 200a, 200b output the reference image Ia and the comparison image Ib, respectively, on the basis of two luminance images output from the image acquiring units 100a, 100b.
  • the disparity-value calculation processing unit 300 is a functional unit that derives the disparity value dp with respect to each pixel of the reference image Ia on the basis of the reference image Ia and the comparison image Ib received from the converting units 200a, 200b, respectively, and generates a disparity image in which the disparity value dp is applied to each pixel of the reference image Ia.
  • the disparity-value calculation processing unit 300 outputs the generated disparity image to the recognition processing unit 5.
  • the recognition processing unit 5 is a functional unit that recognizes (detects) an object on the basis of the reference image Ia and the disparity image received from the disparity-value deriving unit 3 and performs a tracking process on the recognized object.
  • FIG. 5 is a diagram that illustrates an example of the configuration of functional blocks in the disparity-value calculation processing unit of the object recognition apparatus according to the embodiment.
  • FIG. 6 is a diagram that explains the principle for deriving the distance from the imaging unit to an object.
  • FIG. 7 is a diagram that explains the case of obtaining a corresponding pixel that is in a comparison image and that corresponds to the reference pixel in the reference image.
  • FIG. 8 is a diagram that illustrates an example of the graph of results of block matching processing.
  • the imaging system illustrated in FIG. 6 includes the imaging unit 10a and the imaging unit 10b that are located parallel at equivalent positions.
  • the imaging units 10a, 10b include the imaging lenses 11a and 11b, respectively, which refract incident light to form an image of the object on an image sensor that is a solid state image sensor.
  • Images captured by the imaging unit 10a and the imaging unit 10b are the reference image Ia and the comparison image Ib, respectively.
  • a point S on an object E in the three-dimensional space is mapped onto a position on a straight line parallel to the straight line connecting the imaging lens 11a and the imaging lens 11b.
  • the point S mapped onto each image is a point Sa(x,y) on the reference image Ia and is a point Sb(X,y) on the comparison image Ib.
  • the disparity value dp is represented as in Equation (1) below by using the point Sa(x,y) on coordinates of the reference image Ia and the point Sb(X,y) on coordinates of the comparison image Ib.
  • dp = X - x    (1)
  • the distance Z is the distance from the straight line connecting the focus position of the imaging lens 11a and the focus position of the imaging lens 11b to the point S on the object E.
  • the distance Z may be calculated with Equation (2) below by using a focal length f of the imaging lens 11a and the imaging lens 11b, a base length B that is the distance between the imaging lens 11a and the imaging lens 11b, and the disparity value dp.
  • Z = (B × f) / dp    (2)
  • from Equation (2), it is understood that the distance Z is shorter as the disparity value dp is larger, and the distance Z is longer as the disparity value dp is smaller.
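As a concrete illustration of Equations (1) and (2), the short Python sketch below converts a disparity value into a distance in the depth direction. The focal length, base length, and pixel coordinates are hypothetical values chosen only for this example; they are not taken from the embodiment.

```python
# Hypothetical stereo parameters, for illustration only (not from the embodiment).
FOCAL_LENGTH_PX = 1400.0  # focal length f of the imaging lenses 11a, 11b, in pixels
BASE_LENGTH_M = 0.12      # base length B between the imaging lenses 11a and 11b, in meters


def disparity(x_ref: float, x_cmp: float) -> float:
    """Equation (1): dp = X - x, with x on the reference image Ia and X on the comparison image Ib."""
    return x_cmp - x_ref


def distance_from_disparity(dp: float) -> float:
    """Equation (2): Z = B * f / dp; a larger disparity means a shorter distance."""
    if dp <= 0.0:
        raise ValueError("disparity must be positive")
    return BASE_LENGTH_M * FOCAL_LENGTH_PX / dp


# A point imaged at x = 640 on Ia and X = 652 on Ib has dp = 12 pixels,
# which corresponds to 0.12 * 1400 / 12 = 14 m with the parameters above.
print(distance_from_disparity(disparity(640.0, 652.0)))
```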
  • C(p,d) represents C(x,y,d).
  • FIG. 7(a) is a conceptual diagram that illustrates a reference pixel p and a reference area pb in the reference image Ia
  • FIG. 7(b) is a conceptual diagram of calculating the cost value C while sequentially shifting (displacing) candidates for the corresponding pixel that is in the comparison image Ib and that corresponds to the reference pixel p illustrated in FIG. 7(a)
  • here, the corresponding pixel indicates the pixel that is in the comparison image Ib and that is the most similar to the reference pixel p in the reference image Ia.
  • the cost value C is an evaluation value (degree of matching) representing the degree of similarity or the degree of dissimilarity of each pixel in the comparison image Ib with respect to the reference pixel p in the reference image Ia.
  • the cost value C described below is an evaluation value representing the degree of dissimilarity: the smaller the value, the more similar the pixel in the comparison image Ib is to the reference pixel p.
  • the cost value C(p,d) of the candidate pixel q(x+d,y) that is a candidate for the corresponding pixel with respect to the reference pixel p(x,y) is calculated.
  • the shift amount (displacement amount) between the reference pixel p and the candidate pixel q is d, and the shift amount d is a shift on a pixel to pixel basis.
  • the cost value C(p,d) is calculated, which is the degree of dissimilarity between the luminance values of the candidate pixel q(x+d,y) and the reference pixel p(x,y). Furthermore, as the stereo matching processing to obtain the corresponding pixel of the reference pixel p, block matching processing is performed according to the present embodiment.
  • the degree of dissimilarity is obtained between the reference area pb that is a predetermined area with the reference pixel p in the reference image Ia as a center and a candidate area qb (the same size as the reference area pb) with the candidate pixel q in the comparison image Ib as a center.
  • evaluation values such as SAD (Sum of Absolute Differences), SSD (Sum of Squared Differences), or ZSSD (Zero-mean Sum of Squared Differences) are used as the cost value C.
  • the imaging units 10a, 10b are located parallel at equivalent positions and therefore the reference image Ia and the comparison image Ib also have a relation such that they are located parallel at equivalent positions. Therefore, the corresponding pixel that is in the comparison image Ib and that corresponds to the reference pixel p in the reference image Ia is present on the epipolar line EL that is illustrated as a line in a horizontal direction as viewed from the sheet surface in FIG. 7 and, to obtain the corresponding pixel in the comparison image Ib, a pixel is retrieved on the epipolar line EL of the comparison image Ib.
  • the cost value C(p,d) calculated during the above-described block matching processing is represented by, for example, the graph illustrated in FIG. 8 in relation to the shift amount d.
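The block matching processing described above can be sketched as follows. The 7x7 block size, the 64-pixel search range, and the use of SAD as the cost value C are illustrative assumptions; the embodiment may equally use SSD or ZSSD, and boundary handling is omitted for brevity.

```python
import numpy as np


def sad_cost(ref: np.ndarray, cmp: np.ndarray, x: int, y: int, d: int, half: int = 3) -> float:
    """Cost value C(p, d): SAD between the reference area pb centered on p(x, y) in the
    reference image Ia and the candidate area qb centered on q(x + d, y) in the
    comparison image Ib (same size as pb)."""
    pb = ref[y - half:y + half + 1, x - half:x + half + 1].astype(np.int32)
    qb = cmp[y - half:y + half + 1, x + d - half:x + d + half + 1].astype(np.int32)
    return float(np.abs(pb - qb).sum())


def disparity_for_pixel(ref: np.ndarray, cmp: np.ndarray, x: int, y: int, max_d: int = 64) -> int:
    """Search along the epipolar line EL (the same row y) while shifting the candidate
    pixel by d, and return the shift amount that minimizes the cost, i.e. the disparity
    value dp for the reference pixel p(x, y)."""
    costs = [sad_cost(ref, cmp, x, y, d) for d in range(max_d)]
    return int(np.argmin(costs))
```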
  • the disparity-value calculation processing unit 300 includes a cost calculating unit 301, a determining unit 302, and a first generating unit 303.
  • the cost calculating unit 301 is a functional unit that calculates the cost value C(p,d) of each of the candidate pixels q(x+d,y) on the basis of the luminance value of the reference pixel p(x,y) in the reference image Ia and the luminance value of each of the candidate pixels q(x+d,y) that are candidates for the corresponding pixel, identified by shifting the pixel at the corresponding position of the reference pixel p(x,y) by the shift amount d on the epipolar line EL on the comparison image Ib based on the reference pixel p(x,y).
  • the cost calculating unit 301 calculates, as the cost value C, the degree of dissimilarity between the reference area pb that is a predetermined area with the reference pixel p in the reference image Ia as a center and the candidate area qb (the same size as the reference area pb) with the candidate pixel q in the comparison image Ib as a center.
  • the determining unit 302 is a functional unit that determines that the shift amount d that corresponds to the minimum value of the cost value C calculated by the cost calculating unit 301 is the disparity value dp with respect to a pixel in the reference image Ia that is targeted for calculation of the cost value C.
  • the first generating unit 303 is a functional unit that generates a disparity image that is an image where, on the basis of the disparity value dp determined by the determining unit 302, the pixel value of each pixel of the reference image Ia is replaced with the disparity value dp that corresponds to the pixel.
  • Each of the cost calculating unit 301, the determining unit 302, and the first generating unit 303 illustrated in FIG. 5 is implemented by using the FPGA 31 illustrated in FIG. 3 . Furthermore, all or part of the cost calculating unit 301, the determining unit 302, and the first generating unit 303 may be implemented when the CPU 32 executes programs stored in the ROM 33 instead of the FPGA 31 that is a hardware circuit.
  • the functions of the cost calculating unit 301, the determining unit 302, and the first generating unit 303 in the disparity-value calculation processing unit 300 illustrated in FIG. 5 are illustrated as a concept, and this configuration is not a limitation.
  • multiple functional units that are illustrated as separate functional units in the disparity-value calculation processing unit 300 illustrated in FIG. 5 may be configured as a single functional unit.
  • a function provided in a single functional unit in the disparity-value calculation processing unit 300 illustrated in FIG. 5 may be divided and configured as multiple functional units.
  • FIG. 9 is a diagram that illustrates an example of the configuration of functional blocks of the recognition processing unit in the object recognition apparatus according to the embodiment.
  • FIG. 10 is a diagram that illustrates an example of the V map generated from a disparity image.
  • FIG. 11 is a diagram that illustrates an example of the U map generated from a disparity image.
  • FIG. 12 is a diagram that illustrates an example of the real U map generated from a U map.
  • FIG. 13 is a diagram that illustrates a process to extract an isolated area from a real U map.
  • FIG. 14 is a diagram that illustrates a process to generate a detection frame.
  • FIG. 15 is a diagram that illustrates a case where the distance between frames is short.
  • FIG. 16 is a diagram that illustrates a case where the distance between frames is long.
  • the recognition processing unit 5 includes a second generating unit 501, a clustering processing unit 502, and a tracking unit 503.
  • the second generating unit 501 is a functional unit that receives a disparity image from the disparity-value calculation processing unit 300, receives the reference image Ia from the disparity-value deriving unit 3, and generates a V-Disparity map, U-Disparity map, and Real U-Disparity map, or the like. Specifically, to detect a road surface from the disparity image input from the disparity-value calculation processing unit 300, the second generating unit 501 generates a V map VM that is the V-Disparity map illustrated in FIG. 10(b) .
  • the V-Disparity map is a two-dimensional histogram indicating the frequency distribution of the disparity value dp, where the vertical axis is the y axis of the reference image Ia and the horizontal axis is the disparity value dp (or distance) of the disparity image.
  • a road surface 600, a power pole 601, and a vehicle 602 appear in the reference image Ia illustrated in FIG. 10(a) .
  • the road surface 600 in the reference image Ia corresponds to a road surface portion 600a
  • the power pole 601 corresponds to a power pole portion 601a
  • the vehicle 602 corresponds to a vehicle portion 602a.
  • the second generating unit 501 conducts linear approximation on the position that is estimated to be a road surface based on the generated V map VM.
  • when a road surface is flat, approximation is possible with a single straight line; however, when the gradient of the road surface changes, the V map VM needs to be divided into sections so that linear approximation can be conducted with high accuracy.
  • Known technologies such as the Hough transform or the least-squares method may be used for the linear approximation.
  • the power pole portion 601a and the vehicle portion 602a which are clusters located above the detected road surface portion 600a, are equivalent to the power pole 601 and the vehicle 602, respectively, that are objects on the road surface 600.
  • when the U-Disparity map described later is generated by the second generating unit 501, only the information above the road surface is used, so that noise is removed.
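A minimal sketch of how a V-Disparity map can be accumulated from a disparity image, and how a road-surface line can then be estimated from it, is given below; the vote threshold and the use of a least-squares fit (rather than the Hough transform also mentioned above) are assumptions made only for illustration.

```python
import numpy as np


def build_v_map(disparity: np.ndarray, max_dp: int = 128) -> np.ndarray:
    """V map VM: for each image row y, a histogram of the disparity values dp occurring
    in that row (vertical axis: y of the reference image Ia, horizontal axis: dp)."""
    height, _ = disparity.shape
    v_map = np.zeros((height, max_dp), dtype=np.int32)
    for y in range(height):
        row = disparity[y]
        valid = (row > 0) & (row < max_dp)
        np.add.at(v_map[y], row[valid].astype(np.int64), 1)
    return v_map


def fit_road_surface(v_map: np.ndarray, min_votes: int = 20):
    """Estimate the road surface as a straight line y = a * dp + b through the frequently
    voted (y, dp) cells; clusters above this line belong to objects on the road surface."""
    ys, dps = np.nonzero(v_map >= min_votes)
    a, b = np.polyfit(dps, ys, 1)  # least-squares linear approximation
    return a, b
```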
  • the second generating unit 501 generates a U map UM that is a U-Disparity map illustrated in FIG. 11(b) to recognize objects by using only information located above the road surface detected from the V map VM, i.e., by using information that is in a disparity image and that is equivalent to a left guardrail 611, a right guardrail 612, a vehicle 613, and a vehicle 614 in the reference image Ia illustrated in FIG. 11(a) .
  • the U map UM is a two-dimensional histogram indicating the frequency distribution of the disparity value dp, where the horizontal axis is the x axis of the reference image Ia and the vertical axis is the disparity value dp (or distance) of the disparity image.
  • the left guardrail 611 in the reference image Ia illustrated in FIG. 11(a) is equivalent to a left guardrail portion 611a on the U map UM
  • the right guardrail 612 is equivalent to a right guardrail portion 612a
  • the vehicle 613 is equivalent to a vehicle portion 613a
  • the vehicle 614 is equivalent to a vehicle portion 614a.
  • the second generating unit 501 generates a U map UM_H that is an example of the U-Disparity map illustrated in FIG. 11(c) by using only information located above the road surface detected from the V map VM, i.e., by using information that is in a disparity image and that is equivalent to the left guardrail 611, the right guardrail 612, the vehicle 613, and the vehicle 614 in the reference image Ia illustrated in FIG. 11(a) .
  • in the U map UM_H, which is an example of the U-Disparity map, the horizontal axis is the x axis of the reference image Ia, the vertical axis is the disparity value dp of the disparity image, and the pixel value is the height of an object.
  • the left guardrail 611 in the reference image Ia illustrated in FIG. 11(a) is equivalent to a left guardrail portion 611b on the U map UM_H
  • the right guardrail 612 is equivalent to a right guardrail portion 612b
  • the vehicle 613 is equivalent to a vehicle portion 613b
  • the vehicle 614 is equivalent to a vehicle portion 614b.
  • the second generating unit 501 generates a real U map RM that is a Real U-Disparity map illustrated in FIG. 12(b) in which the horizontal axis has been converted into the actual distance.
  • the real U map RM is a two-dimensional histogram in which the horizontal axis is the actual distance in a direction from the imaging unit 10b (the left camera) to the imaging unit 10a (the right camera) and the vertical axis is the disparity value dp of the disparity image (or the distance in a depth direction that is converted from the disparity value dp).
  • the vehicle portion 613a is equivalent to a vehicle portion 613c
  • the vehicle portion 614a is equivalent to a vehicle portion 614c.
  • in the case of a long distance (a small disparity value dp), an object appears small, there is a small amount of disparity information, and the distance resolution is low, so the second generating unit 501 does not decimate pixels; in the case of a short distance, an object appears large, there is a large amount of disparity information, and the distance resolution is high, so it decimates a large number of pixels. The real U map RM generated in this way is equivalent to an overhead (plan) view.
  • the cluster of pixel values (object) ("isolated area" described later) is extracted from the real U map RM so that the object can be detected.
  • the width of the rectangle enclosing a cluster corresponds to the width of an extracted object, and its height corresponds to the depth of the extracted object.
  • the second generating unit 501 is capable of not only generating the real U map RM from the U map UM but also generating the real U map RM directly from the disparity image.
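The sketch below shows, under the same hypothetical camera parameters as before, how a U-Disparity map can be accumulated from the pixels above the road surface, and how an (x, dp) cell can be converted into the actual lateral distance and depth used by the real U map RM; the distance-dependent decimation described above is noted but not implemented.

```python
import numpy as np

# Hypothetical camera parameters, for illustration only (not from the embodiment).
F_PX, B_M, CX_PX = 1400.0, 0.12, 640.0  # focal length, base length, image center x


def build_u_map(disparity: np.ndarray, above_road: np.ndarray, max_dp: int = 128) -> np.ndarray:
    """U map UM: for each image column x, a histogram of the disparity values dp of the
    pixels located above the detected road surface (horizontal axis: x, vertical axis: dp)."""
    height, width = disparity.shape
    u_map = np.zeros((max_dp, width), dtype=np.int32)
    for x in range(width):
        col = disparity[:, x]
        valid = above_road[:, x] & (col > 0) & (col < max_dp)
        np.add.at(u_map[:, x], col[valid].astype(np.int64), 1)
    return u_map


def to_real_coordinates(x_px: float, dp: float):
    """Convert a U-map cell (x, dp) into (lateral distance, depth) in meters for the
    real U map RM.  In the embodiment, short-range cells are additionally decimated
    more heavily than long-range cells; that thinning is omitted here."""
    z = B_M * F_PX / dp                    # depth from Equation (2)
    lateral = (x_px - CX_PX) * z / F_PX    # pinhole back-projection of the column
    return lateral, z
```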
  • images input from the disparity-value deriving unit 3 to the second generating unit 501 are not limited to the reference image Ia, but the comparison image Ib may be the target.
  • the second generating unit 501 is implemented by using the FPGA 51 illustrated in FIG. 3 . Furthermore, the second generating unit 501 may be implemented when the CPU 52 executes programs stored in the ROM 53 instead of the FPGA 51 that is a hardware circuit.
  • the clustering processing unit 502 is a functional unit that performs clustering processing to detect an object appearing in a disparity image on the basis of each map output from the second generating unit 501. As illustrated in FIG. 9 , the clustering processing unit 502 includes an area extracting unit 511 (extracting unit), a frame generating unit 512 (determining unit), a first discarding unit 513, and an overlap processing unit 514.
  • the area extracting unit 511 is a functional unit that extracts an isolated area that is a cluster of pixel values from the real U map RM included in the maps (images) output from the second generating unit 501. Specifically, the area extracting unit 511 conducts binarization processing, labeling processing, or the like, on the real U map RM and extracts an isolated area for each piece of identification information from the labeling processing. For example, FIG. 13 illustrates a state where isolated areas are extracted from the real U map RM; in this example, the area extracting unit 511 extracts isolated areas 621 to 624. The isolated areas extracted by the area extracting unit 511 correspond to objects appearing in the reference image Ia, and they represent the recognized areas of those objects in the reference image Ia.
  • the area extracting unit 511 is capable of identifying the position and the width (xmin, xmax) of the object at an isolated area in the x-axis direction on the disparity image and the reference image Ia. Furthermore, the area extracting unit 511 is capable of identifying the actual depth of an object based on information (dmin, dmax) on the height of the object on the U map UM or the real U map RM.
  • the area extracting unit 511 is capable of identifying the actual size of an object in the x-axis direction and the y-axis direction based on the width (xmin, xmax) of the object in the x-axis direction, the height (ymin, ymax) in the y-axis direction, and the disparity value dp that corresponds to each of them, identified on the disparity image.
  • the area extracting unit 511 is capable of identifying the position and the actual width, height, and depth of the object at an isolated area in the reference image Ia. Furthermore, as the area extracting unit 511 identifies the position of an object in the reference image Ia, the position in a disparity image is determined, and the distance to the object is also determined.
  • with regard to each extracted isolated area, the area extracting unit 511 generates recognized-area information, which is information about the isolated area, and includes in it, for example, the identification information from the labeling process and information on the position and the size of the isolated area on the reference image Ia, the V map VM, the U map UM, and the real U map RM.
  • the area extracting unit 511 sends the generated recognized-area information to the frame generating unit 512.
  • the area extracting unit 511 may perform processing such as smoothing to reduce noise, disparity dispersion, and the like, which are present on the real U map RM, plane detection of the object at an isolated area, or deletion of unnecessary areas.
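One simple way to realize the binarization and labeling processing described above is sketched below using SciPy's connected-component labeling; the binarization threshold is an illustrative assumption, not a value from the embodiment.

```python
import numpy as np
from scipy import ndimage


def extract_isolated_areas(real_u_map: np.ndarray, threshold: int = 3):
    """Binarize the real U map RM and label connected clusters of pixel values; each
    labeled cluster is treated as an isolated area corresponding to one object."""
    binary = real_u_map >= threshold           # binarization processing
    labels, count = ndimage.label(binary)      # labeling processing
    areas = []
    for label_id in range(1, count + 1):
        dps, xs = np.nonzero(labels == label_id)
        areas.append({
            "id": label_id,                                   # identification information of the labeling
            "x_range": (int(xs.min()), int(xs.max())),        # width of the cluster -> object width
            "dp_range": (int(dps.min()), int(dps.max())),     # height of the cluster -> object depth
        })
    return areas
```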
  • the frame generating unit 512 is a functional unit that, with respect to the isolated area of an object on the real U map RM extracted by the area extracting unit 511, generates a frame at the object's area (hereafter, sometimes referred to as detection area) that is in a disparity image Ip (or the reference image Ia) and that corresponds to the isolated area. Specifically, the frame generating unit 512 generates detection frames 631a to 634a in the disparity image Ip or the reference image Ia as illustrated in FIG. 14(b) such that they correspond to detection areas 631 to 634 that correspond to the isolated areas 621 to 624, respectively, which are extracted by the area extracting unit 511 from the real U map RM, as illustrated in FIG. 14(a) .
  • the frame generating unit 512 includes the information on the frame generated on the disparity image Ip or the reference image Ia in the recognized-area information and sends it to the first discarding unit 513.
  • the first discarding unit 513 is a functional unit that determines what the object is on the basis of the actual size (width, height, and depth) of the object (hereafter, sometimes referred to as the detection object) in a detection area framed by the frame generating unit 512, and that discards the object in accordance with its determined type.
  • the first discarding unit 513 uses for example the following (Table 1) to determine what a detection object is. For example, when the width of the object is 1300 [mm], the height is 1800 [mm], and the depth is 2000 [mm], it is determined that the object is a "standard-sized automobile".
  • the information that relates width, height, and depth with type of object may be stored as a table like (Table 1) in the RAM 54, or the like.
  • the first discarding unit 513 discards an object that is determined not to be targeted for subsequent processing (overlap processing, tracking processing, or the like, described later) in accordance with the determined type of detection object. For example, when pedestrians (persons) and vehicles are targeted for subsequent processing, the first discarding unit 513 discards detection objects indicated by detection frames 631a, 632a illustrated in FIG. 14(b) as they are side wall objects (guardrails). To discard a detection object, for example, the first discarding unit 513 includes a flag (discard flag) indicating discard in the recognized-area information on the detection object.
  • the first discarding unit 513 determines whether a detection object is to be discarded in accordance with the determined type of detection object; however, this is not a limitation, and it may be determined whether an object in a detection area is to be discarded in accordance with the size of the detection area.
  • the first discarding unit 513 includes a discard flag indicating whether the detection object is to be discarded in the recognized-area information and sends it to the overlap processing unit 514. Furthermore, with regard to a detection object in the following explanation of an overlap process and a tracking process, it is assumed that the discard flag included in the recognized-area information is off, that is, it is not discarded.
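The first discard step can be sketched as below. The size ranges stand in for (Table 1), whose actual values are not reproduced in this text; they are hypothetical placeholders chosen only so that the quoted example (width 1300 mm, height 1800 mm, depth 2000 mm, classified as a "standard-sized automobile") comes out as stated, and the set of target types is likewise an assumption.

```python
# Hypothetical size ranges standing in for (Table 1); the real values are not given here.
# Each entry: (type, (min/max width), (min/max height), (min/max depth)), all in mm.
SIZE_TABLE = [
    ("pedestrian",                (400, 1200),  (1000, 2000), (300, 1200)),
    ("standard-sized automobile", (1200, 2000), (1000, 2000), (1000, 5500)),
    ("side wall object",          (500, 30000), (500, 3000),  (5500, 100000)),
]

# Only these types are kept for the subsequent overlap and tracking processing (assumption).
TARGET_TYPES = {"pedestrian", "standard-sized automobile"}


def classify(width_mm: float, height_mm: float, depth_mm: float):
    """Return the first type whose width, height, and depth ranges all contain the
    actual size of the detection object, or None when nothing matches."""
    for name, (w0, w1), (h0, h1), (d0, d1) in SIZE_TABLE:
        if w0 <= width_mm <= w1 and h0 <= height_mm <= h1 and d0 <= depth_mm <= d1:
            return name
    return None


def first_discard(recognized_area: dict) -> dict:
    """Set the discard flag when the detection object is not a type targeted for
    the subsequent overlap and tracking processing."""
    kind = classify(recognized_area["width_mm"], recognized_area["height_mm"],
                    recognized_area["depth_mm"])
    recognized_area["type"] = kind
    recognized_area["discard"] = kind not in TARGET_TYPES
    return recognized_area


# Quoted example: width 1300 mm, height 1800 mm, depth 2000 mm -> "standard-sized automobile".
print(classify(1300, 1800, 2000))
```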
  • the overlap processing unit 514 is a functional unit that, when detection areas are overlapped, performs an overlap process to determine whether objects in the detection areas are to be discarded on the basis of the size of the overlapped detection areas.
  • the overlap processing unit 514 includes a first determining unit 521, a distance calculating unit 522 (first calculating unit), a second determining unit 523 (determining unit), an overlapped-size calculating unit 524 (second calculating unit), a third determining unit 525, and a second discarding unit 526 (discarding unit).
  • the first determining unit 521 is a functional unit that determines whether two detection areas are overlapped.
  • the distance calculating unit 522 is a functional unit that, when the first determining unit 521 determines that detection areas are overlapped, calculates the distance (hereafter, sometimes referred to as the distance between frames) between objects in the overlapped detection areas in a depth direction.
  • the second determining unit 523 is a functional unit that determines whether the distance between frames calculated by the distance calculating unit 522 is less than a predetermined threshold.
  • hereafter, a distance between frames equal to or longer than the predetermined threshold is referred to as a "long distance" (second distance range), and a distance less than the predetermined threshold is referred to as a "short distance" (first distance range).
  • the second determining unit 523 switches the predetermined threshold to be compared with the distance between frames in accordance with the distance to a closer object between two detection objects, for example, as illustrated in the following (Table 2).
  • for example, the second determining unit 523 sets 4.5 [m] as the predetermined threshold to be compared with the distance between frames.
  • the relation between the distance to a detection object and the threshold to be compared with the distance between frames illustrated in (Table 2) is an example, and they may be defined with a different relation. The details of a determination process by the second determining unit 523 are described later with reference to FIG. 19 .
  • FIG. 15 illustrates an example of the case where the distance between frames is a short distance.
  • a disparity image Ip1 illustrated in FIG. 15 indicates that a detection area 641, in which the detection object is a pedestrian, and a detection area 642, in which the detection object is a vehicle, are at a short distance and that parts of the detection areas 641, 642 are overlapped.
  • FIG. 16 illustrates an example of the case where the distance between frames is a long distance.
  • a disparity image Ip2 illustrated in FIG. 16 indicates that a detection area 651, in which the detection object is a pedestrian, and a detection area 652, in which the detection object is a vehicle, are at a long distance and that parts of the detection areas 651, 652 are overlapped.
  • the overlapped-size calculating unit 524 is a functional unit that calculates the size (hereafter, sometimes referred to as overlap size) of the area where two detection areas are overlapped. The process to calculate the overlap size by the overlapped-size calculating unit 524 is explained later in detail with reference to FIGS. 19 , 20 , 22, and 23 .
  • the third determining unit 525 is a functional unit that determines whether the overlap size calculated by the overlapped-size calculating unit 524 is more than a predetermined percentage of the size of any one of the two detection areas (a threshold with regard to the overlap percentage of a detection area).
  • the third determining unit 525 switches the predetermined percentage (threshold) depending on whether the distance between frames in two detection areas is a short distance or a long distance, as illustrated in for example the following (Table 3). For example, as illustrated in (Table 3), when the distance between frames in two detection areas is a long distance, the third determining unit 525 uses 15[%] of the size of any one of the two detection areas as the threshold with regard to the overlap percentage of the detection areas.
  • the relation between the distance between frames and the threshold with regard to the overlap percentage of detection areas illustrated in (Table 3) is an example, and they may be defined with a different relation.
  • a determination process by the third determining unit 525 is described later in detail with reference to FIG. 19 .
  • the second discarding unit 526 is a functional unit that determines whether objects in two detection areas are to be discarded in accordance with a determination result regarding the overlap size by the third determining unit 525.
  • the second discarding unit 526 includes the discard flag indicating whether the detection object is discarded in the recognized-area information and sends it to the tracking unit 503. The discard process by the second discarding unit 526 is described later in detail with reference to FIG. 19 .
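The overall overlap process can be outlined as below. Only the 4.5 m threshold from (Table 2) and the 15% long-distance percentage from (Table 3) are quoted in the text; the 50% short-distance percentage, the plain rectangle-intersection overlap size, the choice of comparing the overlap against the smaller detection area, and the rule of discarding the farther object are assumptions made for this sketch, since the embodiment switches the calculation method and decision details by distance as described with FIGS. 19 to 24.

```python
def rect_area(frame) -> int:
    """Area of a detection frame given as (x_min, y_min, x_max, y_max) on the disparity image."""
    x0, y0, x1, y1 = frame
    return max(0, x1 - x0) * max(0, y1 - y0)


def overlap_size(frame_a, frame_b) -> int:
    """Size of the overlapped area of two detection frames (0 when they do not overlap).
    A plain rectangle intersection; the embodiment switches the calculation method
    depending on the distance between frames."""
    ax0, ay0, ax1, ay1 = frame_a
    bx0, by0, bx1, by1 = frame_b
    w = min(ax1, bx1) - max(ax0, bx0)
    h = min(ay1, by1) - max(ay0, by0)
    return max(0, w) * max(0, h)


def overlap_process(det_a: dict, det_b: dict):
    """det_* hold a detection frame, a depth z_m in meters, and a discard flag."""
    # First calculating unit 522: distance between frames in the depth direction.
    gap_m = abs(det_a["z_m"] - det_b["z_m"])

    # Second determining unit 523 ((Table 2)): 4.5 m is the threshold example quoted above;
    # in the embodiment it is switched according to the distance to the closer object.
    short_distance = gap_m < 4.5

    # Second calculating unit 524 and third determining unit 525 ((Table 3)):
    # 15 % is the quoted long-distance percentage; 50 % for short distance is a placeholder.
    ratio_threshold = 0.50 if short_distance else 0.15
    smaller = min(rect_area(det_a["frame"]), rect_area(det_b["frame"]))
    overlapped = overlap_size(det_a["frame"], det_b["frame"])

    # Second discarding unit 526: here the farther object is discarded when the overlap
    # exceeds the threshold -- one plausible rule; the embodiment's exact rule follows FIG. 19.
    if smaller > 0 and overlapped / smaller > ratio_threshold:
        farther = det_a if det_a["z_m"] > det_b["z_m"] else det_b
        farther["discard"] = True
    return det_a, det_b
```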
  • the area extracting unit 511, the frame generating unit 512, and the first discarding unit 513 of the clustering processing unit 502 and the first determining unit 521, the distance calculating unit 522, the second determining unit 523, the overlapped-size calculating unit 524, the third determining unit 525, and the second discarding unit 526 of the overlap processing unit 514, illustrated in FIG. 9 , are implemented by using the FPGA 51 illustrated in FIG. 3 .
  • all or part of the area extracting unit 511, the frame generating unit 512, and the first discarding unit 513 of the clustering processing unit 502 and the first determining unit 521, the distance calculating unit 522, the second determining unit 523, the overlapped-size calculating unit 524, the third determining unit 525, and the second discarding unit 526 of the overlap processing unit 514 may be implemented when the CPU 52 executes programs stored in the ROM 53 instead of the FPGA 51 that is a hardware circuit.
  • the tracking unit 503 is a functional unit that performs a tracking process on a detection object whose discard flag is off on the basis of the recognized-area information that is information related to the object detected by the clustering processing unit 502.
  • the tracking unit 503 outputs the recognized-area information including a result of a tracking process as recognition information to the vehicle control device 6 (see FIG. 3 ).
  • the tracking unit 503 is implemented by using the FPGA 51 illustrated in FIG. 3 .
  • the tracking unit 503 may be implemented when the CPU 52 executes programs stored in the ROM 53 instead of the FPGA 51 that is a hardware circuit.
  • the image processing apparatus may be the clustering processing unit 502 or the recognition processing unit 5 including the clustering processing unit 502.
  • each functional unit of the recognition processing unit 5 illustrated in FIG. 9 is illustrated as a concept, and this configuration is not a limitation.
  • multiple functional units that are illustrated as separate functional units in the recognition processing unit 5 illustrated in FIG. 9 may be configured as a single functional unit.
  • a function provided in a single functional unit in the recognition processing unit 5 illustrated in FIG. 9 may be divided and configured as multiple functional units.
  • FIG. 17 is a flowchart that illustrates an example of operation during block matching processing by the disparity-value deriving unit according to the embodiment. With reference to FIG. 17 , an explanation is given of the flow of operation during the block matching processing by the disparity-value deriving unit 3 in the object recognition apparatus 1.
  • the image acquiring unit 100b in the disparity-value deriving unit 3 captures an image of the object in the front by using the left camera (the imaging unit 10b), generates analog image signals, and obtains a luminance image that is an image based on the image signals. Thus, image signals targeted for the subsequent image processing are obtained. Then, a transition is made to Step S2-1.
  • the image acquiring unit 100a in the disparity-value deriving unit 3 captures an image of the object in the front by using the right camera (the imaging unit 10a), generates analog image signals, and obtains a luminance image that is an image based on the image signals. Thus, image signals targeted for the subsequent image processing are obtained. Then, a transition is made to Step S2-2.
  • the converting unit 200b in the disparity-value deriving unit 3 removes noise from the analog image signals obtained during capturing by the imaging unit 10b and converts it into digital-format image data. Due to this conversion into digital-format image data, image processing is possible on the image based on the image data on a pixel by pixel basis. Then, a transition is made to Step S3-1.
  • the converting unit 200a in the disparity-value deriving unit 3 removes noise from the analog image signals obtained during capturing by the imaging unit 10a and converts it into digital-format image data. Due to this conversion into digital-format image data, image processing is possible on the image based on the image data on a pixel by pixel basis. Then, a transition is made to Step S3-2.
  • the converting unit 200b outputs the image based on the digital-format image data, converted at Step S2-1, as the comparison image Ib for block matching processing.
  • the target image to be compared so as to obtain a disparity value during block matching processing is obtained. Then, a transition is made to Step S4.
  • the converting unit 200a outputs the image based on the digital-format image data, converted at Step S2-2, as the reference image Ia for block matching processing.
  • the reference image to obtain a disparity value during block matching processing is obtained. Then, a transition is made to Step S4.
  • the cost calculating unit 301 of the disparity-value calculation processing unit 300 in the disparity-value deriving unit 3 calculates and acquires the cost value C(p,d) of each of the candidate pixels q(x+d,y) for the corresponding pixel on the basis of the luminance value of the reference pixel p(x,y) in the reference image Ia and the luminance value of each of the candidate pixels q(x+d,y) that are identified by shifting them from the pixel at the corresponding position of the reference pixel p(x,y) by the shift amount d on the epipolar line EL in the comparison image Ib based on the reference pixel p(x,y).
  • the cost calculating unit 301 calculates, as the cost value C, the degree of dissimilarity between the reference area pb that is a predetermined area with the reference pixel p in the reference image Ia as a center and the candidate area qb (the same size as the reference area pb) with the candidate pixel q in the comparison image Ib as a center. Then, a transition is made to Step S5.
  • At Step S5, the determining unit 302 of the disparity-value calculation processing unit 300 in the disparity-value deriving unit 3 determines that the shift amount d that corresponds to the minimum value of the cost value C calculated by the cost calculating unit 301 is the disparity value dp for the pixel in the reference image Ia targeted for calculation of the cost value C. Then, on the basis of the disparity values dp determined by the determining unit 302, the first generating unit 303 of the disparity-value calculation processing unit 300 in the disparity-value deriving unit 3 generates a disparity image, that is, an image representing the luminance value of each pixel of the reference image Ia with the disparity value dp that corresponds to the pixel. The first generating unit 303 outputs the generated disparity image to the recognition processing unit 5. A sketch of these two steps follows.
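  • The following is a minimal sketch, in Python with NumPy, of the block matching at Steps S4 and S5 for a single reference pixel: an SAD cost is computed for every shift amount d within a search range, and the shift that minimizes the cost is taken as the disparity value dp. The window size, the search range, and the assumption that the pixel lies far enough from the image border are illustrative choices, not values fixed by the embodiment.

```python
import numpy as np

def disparity_for_pixel(ref, cmp_img, x, y, half_win=3, max_d=25):
    """Sketch of Steps S4-S5: SAD block matching for one reference pixel.

    ref, cmp_img : 2-D arrays (reference image Ia, comparison image Ib)
    x, y         : coordinates of the reference pixel p(x, y)
    half_win     : half size of the reference area pb (assumed 7x7 window)
    max_d        : assumed search range for the shift amount d (0 <= d < max_d)
    The pixel is assumed to lie far enough from the image border for the slices.
    """
    pb = ref[y - half_win:y + half_win + 1,
             x - half_win:x + half_win + 1].astype(np.float32)
    costs = []
    for d in range(max_d):  # candidate pixels q(x+d, y) on the epipolar line EL
        qb = cmp_img[y - half_win:y + half_win + 1,
                     x + d - half_win:x + d + half_win + 1].astype(np.float32)
        costs.append(np.abs(pb - qb).sum())   # cost value C(p, d) as SAD
    return int(np.argmin(costs))              # disparity value dp (Step S5)
```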
  • FIG. 18 is a flowchart that illustrates an example of operation during the object recognition process by the recognition processing unit according to the embodiment.
  • FIG. 19 is a flowchart that illustrates an example of operation during the overlap process by the recognition processing unit according to the embodiment.
  • FIG. 20 is a diagram that illustrates an overlap size when the distance between frames is a short distance.
  • FIG. 21 is a diagram that illustrates operation to discard a detection object when the distance between frames is a short distance.
  • FIG. 22 is a diagram that illustrates an overlap size when the distance between frames is a long distance.
  • FIG. 23 is a diagram that illustrates a case where there is no overlap size when the distance between frames is a long distance.
  • FIG. 24 is a diagram that illustrates a case where a detection object is not discarded when the distance between frames is a long distance.
  • With reference to FIGS. 18 to 24, an explanation is given of the flow of operation during the object recognition process by the recognition processing unit 5 in the object recognition apparatus 1.
  • At Step S11, the second generating unit 501 receives the disparity image Ip from the disparity-value calculation processing unit 300, receives the reference image Ia from the disparity-value deriving unit 3, and generates various images, such as the V map VM, the U map UM, the U map UM_H, and the real U map RM. Then, a transition is made to Step S12.
  • At Step S12, the area extracting unit 511 of the clustering processing unit 502 extracts an isolated area, that is, a cluster of pixel values, from the real U map RM included in the maps (images) output from the second generating unit 501. Furthermore, by using the V map VM, the U map UM, and the real U map RM, the area extracting unit 511 identifies the position of the object in each isolated area and its actual width, height, and depth in the reference image Ia or the disparity image Ip. For each extracted isolated area, the area extracting unit 511 generates recognized-area information, that is, information about the isolated area, which here includes, for example, the identification information from the labeling processing and information such as the position and the size of the isolated area in the reference image Ia, the V map VM, the U map UM, and the real U map RM. The area extracting unit 511 sends the generated recognized-area information to the frame generating unit 512. Then, a transition is made to Step S13. A labeling sketch for this step is shown below.
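  • As a rough illustration of the isolated-area extraction at Step S12, the sketch below labels connected clusters of occupied bins on a real U map and returns a bounding box for each cluster; the width of a box corresponds to the width of the object and its height to the depth. The binarization threshold and the use of scipy.ndimage are assumptions made for illustration and are not part of the embodiment.

```python
import numpy as np
from scipy import ndimage

def extract_isolated_areas(real_u_map, min_count=1):
    """Label clusters of pixel values on the real U map RM (Step S12 sketch).

    real_u_map : 2-D array; horizontal axis = actual lateral distance,
                 vertical axis = disparity (depth); values are frequencies.
    min_count  : assumed threshold for treating a bin as occupied.
    """
    occupied = real_u_map >= min_count
    labels, num = ndimage.label(occupied)           # connected-component labeling
    boxes = []
    for obj_slice in ndimage.find_objects(labels):  # bounding box per isolated area
        if obj_slice is not None:
            boxes.append(obj_slice)                 # (depth slice, width slice)
    return boxes
```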
  • At Step S13, with regard to each isolated area of an object on the real U map RM extracted by the area extracting unit 511, the frame generating unit 512 of the clustering processing unit 502 generates a frame for the detection area of the object that corresponds to the isolated area in the disparity image Ip (or the reference image Ia). The frame generating unit 512 includes the information on the frame generated on the disparity image Ip or the reference image Ia in the recognized-area information and sends it to the first discarding unit 513. Then, a transition is made to Step S14.
  • At Step S14, the first discarding unit 513 of the clustering processing unit 502 determines what the object is on the basis of the actual size (width, height, depth) of the detection object, obtained from the size of the detection area indicated by the frame generated by the frame generating unit 512, and discards it in accordance with the type of object. When an object is discarded, the first discarding unit 513 includes a flag (discard flag) indicating discard in the recognized-area information on the detection object. The first discarding unit 513 thus includes, in the recognized-area information, the discard flag indicating whether the detection object is to be discarded and sends it to the overlap processing unit 514. Then, a transition is made to Step S15. A size-based discard rule of this kind is sketched below.
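  • The following is a minimal sketch of a size-based discard decision like the one at Step S14; the size thresholds and object categories are purely illustrative assumptions, since the embodiment does not fix concrete values here.

```python
def first_discard(width_m, height_m, depth_m):
    """Step S14 sketch: decide what the object is from its actual size and
    whether to discard it. All thresholds are assumed example values only.

    Returns (object_type, discard_flag).
    """
    if width_m > 4.0 or depth_m > 12.0:
        return "structure", True        # e.g. wall or guardrail-like: discard
    if 1.0 <= width_m <= 2.5 and 1.0 <= height_m <= 2.5:
        return "vehicle", False         # plausible vehicle size: keep
    if width_m <= 1.0 and 0.8 <= height_m <= 2.2:
        return "pedestrian", False      # plausible pedestrian size: keep
    return "unknown", True              # size matches no known type: discard
```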
  • At Step S15, when detection areas overlap, the overlap processing unit 514 performs an overlap process to determine whether the objects in those detection areas are to be discarded on the basis of the size of the overlapped detection areas. The overlap process by the overlap processing unit 514 is explained with reference to FIG. 19.
  • At Step S151, the first determining unit 521 of the overlap processing unit 514 identifies any two detection objects among the detection objects that correspond to the pieces of recognized-area information received from the first discarding unit 513. Then, a transition is made to Step S152.
  • At Step S152, the first determining unit 521 determines whether the detection areas of the two identified detection objects overlap. When the two detection areas overlap (Step S152: Yes), a transition is made to Step S153; when they do not overlap (Step S152: No), the process returns to Step S151 so that the first determining unit 521 identifies two different detection objects.
  • At Step S153, the distance calculating unit 522 of the overlap processing unit 514 calculates the distance, in a depth direction, between the frames of the objects in the overlapped detection areas, as sketched below. Then, a transition is made to Step S154.
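  • The distance between frames at Step S153 can be read as the difference between the depths of the two detection objects. A minimal sketch under that reading, assuming each object carries a representative disparity value and using Z = B × f / dp from Equation (2), is:

```python
def frame_distance(dp_a, dp_b, base_length_m, focal_length_px):
    """Step S153 sketch: depth-direction distance between two detection frames.

    dp_a, dp_b      : representative disparity values of the two detection objects
    base_length_m   : base length B between the two cameras [m]
    focal_length_px : focal length f expressed in pixels
    """
    z_a = base_length_m * focal_length_px / dp_a   # Equation (2)
    z_b = base_length_m * focal_length_px / dp_b
    return abs(z_a - z_b)                          # distance between frames [m]
```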
  • At Step S154, the second determining unit 523 of the overlap processing unit 514 determines whether the distance between frames calculated by the distance calculating unit 522 is less than a predetermined threshold. When it is less than the predetermined threshold, that is, when the distance between frames is a short distance (Step S154: Yes), a transition is made to Step S155; when it is equal to or more than the predetermined threshold (Step S154: No), a transition is made to Step S159.
  • At Step S155, the overlapped-size calculating unit 524 of the overlap processing unit 514 calculates the overlap size of the area where the two detection areas overlap. For example, as illustrated in FIG. 20, when a detection area 661 and a detection area 662 overlap, the overlapped-size calculating unit 524 calculates the size of an overlapped area 663, which is the overlapping area, as (height OL_H) × (width OL_W). Then, a transition is made to Step S156.
  • At Step S156, the third determining unit 525 of the overlap processing unit 514 determines whether the overlap size calculated by the overlapped-size calculating unit 524 is equal to or more than a predetermined percentage of the size of one of the two detection areas (a threshold with regard to the overlap percentage of the detection areas). When it is equal to or more than that percentage (Step S156: Yes), a transition is made to Step S157; when it is less (Step S156: No), a transition is made to Step S158. A sketch of this check follows.
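  • Steps S155 and S156 amount to computing the intersection rectangle of the two detection areas and comparing its size (height OL_H × width OL_W) with a percentage of one of the detection areas. The sketch below follows that reading; which of the two areas the percentage refers to is not fixed in the text, so the smaller area is used here as an assumption, and the threshold percentage is an assumed example value.

```python
def overlap_check_short(a, b, ratio=0.5):
    """Steps S155-S156 sketch for the short-distance case (FIG. 20).

    a, b  : detection areas as (x, y, width, height) in image coordinates
    ratio : assumed threshold for the overlap percentage of the detection areas
    """
    ol_w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])   # width OL_W
    ol_h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])   # height OL_H
    overlap = max(ol_w, 0) * max(ol_h, 0)                    # overlapped area 663
    return overlap >= ratio * min(a[2] * a[3], b[2] * b[3])
```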
  • At Step S157, the second discarding unit 526 of the overlap processing unit 514 does not discard the detection object at a short distance, which has a high degree of importance as a target for the tracking process, but discards the detection object at a long distance. In this case, the second discarding unit 526 includes the discard flag indicating non-discard in the recognized-area information on the detection object at a short distance, includes the discard flag indicating discard in the recognized-area information on the detection object at a long distance, and sends them to the tracking unit 503. Furthermore, when one of the two detection objects is a vehicle and the other is not a vehicle and is smaller than a vehicle, the second discarding unit 526 does not discard the detection object that is a vehicle but discards the detection object that is not a vehicle and is smaller than a vehicle. Such a detection object is, for example, a part of the vehicle that has been improperly detected as a pedestrian, and it is therefore discarded. For example, as illustrated in FIG. 21, the second discarding unit 526 does not discard the vehicle indicated by the detection frame 671 but discards the detection object indicated by the detection frame 672. In this case, the second discarding unit 526 includes the discard flag indicating non-discard in the recognized-area information on the detection object that is a vehicle, includes the discard flag indicating discard in the recognized-area information on the detection object that is not a vehicle, and sends them to the tracking unit 503. A sketch of this decision is shown below.
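  • The following is a minimal sketch of the discard decision at Step S157. The recognized-area information is modeled as a plain dictionary with assumed keys (distance, is_vehicle, discard), which are illustrative names rather than the patent's data structure, and the "smaller than a vehicle" condition is folded into the is_vehicle flag for brevity.

```python
def discard_short_distance_pair(info_a, info_b):
    """Step S157 sketch: keep the nearer object, discard the farther one,
    and never discard a vehicle in favor of an overlapping non-vehicle object.

    info_a, info_b : dicts with assumed keys 'distance', 'is_vehicle', 'discard'
    """
    near, far = sorted([info_a, info_b], key=lambda i: i["distance"])
    if near["is_vehicle"] != far["is_vehicle"]:
        # One object is a vehicle and the other is not (e.g. a part of the
        # vehicle falsely detected as a pedestrian): discard the non-vehicle.
        vehicle = near if near["is_vehicle"] else far
        other = far if vehicle is near else near
        vehicle["discard"], other["discard"] = False, True
    else:
        near["discard"], far["discard"] = False, True   # discard the far object
    return info_a, info_b
```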
  • At Step S158, the second discarding unit 526 determines that the objects in both detection areas have a high degree of importance as targets for the tracking process and does not discard either of the detection objects. The second discarding unit 526 includes the discard flag indicating non-discard in the recognized-area information on each of the two detection objects and sends it to the tracking unit 503.
  • At Step S159, the overlapped-size calculating unit 524 calculates a central area (an example of a partial area) of the detection area whose detection object is at a short distance, out of the two detection areas. Specifically, as illustrated in FIG. 22, the overlapped-size calculating unit 524 calculates, for example, a central area 681a that is the central part in a horizontal direction (e.g., an area with 80[%] of the width in a horizontal direction) of the detection area 681 whose detection object is closer, out of the two detection areas 681, 682. Although the overlapped-size calculating unit 524 calculates the central area of the detection area whose detection object is at a short distance here, this is not a limitation; for example, an area corresponding to a predetermined percentage (e.g., 85[%]) of the width measured from the extreme right of the detection area may be calculated instead. Then, a transition is made to Step S160.
  • At Step S160, the overlapped-size calculating unit 524 calculates the overlap size of the area where the central area of the detection area whose detection object is at a short distance overlaps with the detection area whose detection object is at a long distance. For example, as illustrated in FIG. 22, when the central area 681a of the detection area 681 overlaps with the detection area 682, the overlapped-size calculating unit 524 calculates the size of an overlapped area 683, which is the overlapping area, as (height OL_H1) × (width OL_W1). Then, a transition is made to Step S161.
  • At Step S161, the third determining unit 525 determines whether the overlap size calculated by the overlapped-size calculating unit 524 is equal to or more than a predetermined percentage (a threshold with regard to the overlap percentage) of the size of either the central area of the detection area whose detection object is at a short distance or the detection area whose detection object is at a long distance. When it is equal to or more than that percentage (Step S161: Yes), a transition is made to Step S162; when it is less (Step S161: No), a transition is made to Step S163. A sketch of this long-distance check follows.
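  • For the long-distance case, Steps S159 to S161 compute a central area of the nearer detection area (80[%] of its width in the example in the text) and compare its intersection with the farther detection area against a percentage threshold. The sketch below follows that reading; the threshold percentage and the use of the smaller of the two reference sizes are assumptions.

```python
def overlap_check_long(near, far, center_ratio=0.8, ratio=0.5):
    """Steps S159-S161 sketch for the long-distance case (FIG. 22).

    near, far    : detection areas as (x, y, width, height); 'near' is the one
                   whose detection object is at a short distance.
    center_ratio : width fraction of the central area 681a (80[%] in the text)
    ratio        : assumed threshold with regard to the overlap percentage
    """
    cw = near[2] * center_ratio                   # central area width (Step S159)
    cx = near[0] + (near[2] - cw) / 2.0
    center = (cx, near[1], cw, near[3])
    ol_w = min(center[0] + center[2], far[0] + far[2]) - max(center[0], far[0])
    ol_h = min(center[1] + center[3], far[1] + far[3]) - max(center[1], far[1])
    overlap = max(ol_w, 0.0) * max(ol_h, 0.0)     # OL_H1 x OL_W1 (Step S160)
    return overlap >= ratio * min(center[2] * center[3], far[2] * far[3])
```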
  • At Step S162, the second discarding unit 526 does not discard the detection object at a short distance, which has a high degree of importance as a target for the tracking process, but discards the detection object at a long distance. In the example of FIG. 22, the second discarding unit 526 does not discard the detection object in the detection area 681, which is at a short distance, but discards the detection object in the detection area 682, which is at a long distance. The second discarding unit 526 includes the discard flag indicating non-discard in the recognized-area information on the detection object at a short distance, includes the discard flag indicating discard in the recognized-area information on the detection object at a long distance, and sends them to the tracking unit 503.
  • At Step S163, the second discarding unit 526 determines that the objects in both detection areas have a high degree of importance as targets for the tracking process and does not discard either of the detection objects. The second discarding unit 526 includes the discard flag indicating non-discard in the recognized-area information on each of the two detection objects and sends it to the tracking unit 503. For example, as illustrated in FIG. 23, when the central area 681a of the detection area 681 does not overlap with the detection area 682a, the third determining unit 525 determines that the overlap size is less than the predetermined percentage of the size of either the central area of the detection area of the detection object at a short distance or the detection area of the detection object at a long distance. In this case, the second discarding unit 526 determines that the detection objects in both the detection areas 681, 682a have a high degree of importance as targets for the tracking process and does not discard either of them. Accordingly, as illustrated in FIG. 24, the second discarding unit 526 does not discard the detection objects indicated by the detection frames 691, 692.
  • After the process at Step S157, S158, S162, or S163 is finished, a transition is made to Step S16.
  • At Step S16, the tracking unit 503 performs a tracking process on each detection object whose discard flag is off, on the basis of the recognized-area information, that is, the information about the objects detected by the clustering processing unit 502. The tracking unit 503 outputs the recognized-area information, including the result of the tracking process, as recognition information to the vehicle control device 6 (see FIG. 3).
  • The object recognition process is conducted through Steps S11 to S16 illustrated in FIG. 18, and the overlap process is conducted through Steps S151 to S163 illustrated in FIG. 19.
  • As described above, the distance between the frames of the detection areas of two detected objects is calculated in a depth direction, the method of calculating the size of the overlapped area of the two detection areas is switched in accordance with that distance, and it is determined whether each detection object is to be discarded in accordance with the calculated size. In this way, a discard process may be properly conducted; that is, according to the present embodiment, it is possible to discard objects that need to be discarded and to refrain from discarding objects, including objects other than vehicles, that do not need to be discarded.
  • Specifically, when the distance between frames is a long distance, the central area of the detection area whose detection object is at a short distance, out of the two detection areas, is calculated; the overlap size of the area where that central area overlaps with the detection area whose detection object is at a long distance is calculated; it is determined whether the overlap size is equal to or more than the predetermined percentage of the size of either the central area or the detection area whose detection object is at a long distance; and, when it is less, neither of the two detection objects is discarded.
  • Furthermore, when the distance between frames is a short distance, the size of the area where the two detection areas overlap is calculated, and it is determined whether it is equal to or more than the predetermined percentage of the size of one of the two detection areas. When it is equal to or more than that percentage and one of the two detection objects is a vehicle while the other is not a vehicle and is smaller than a vehicle, the detection object that is a vehicle is not discarded and the detection object that is not a vehicle and is smaller than a vehicle is discarded.
  • In this way, objects that are not vehicles may be accurately discarded, as there is a high possibility that they are false detections.
  • In the embodiment described above, the cost value C is an evaluation value representing a degree of dissimilarity; however, it may instead be an evaluation value representing a degree of similarity. In that case, the shift amount d at which the cost value C, now a degree of similarity, becomes maximum (an extreme value) is the disparity value dp.
  • Furthermore, although the embodiment is explained with the object recognition apparatus 1 installed in an automobile, namely the vehicle 70, this is not a limitation. For example, it may be installed in other kinds of vehicles, such as motorbikes, bicycles, wheelchairs, or cultivators for agricultural use. Moreover, it may be installed not only in a vehicle, which is an example of a movable body, but also in another movable body such as a robot.
  • a configuration may be such that a program executed by the object recognition apparatus 1 according to the above-described embodiment is provided by being stored, in the form of a file that is installable and executable, in a recording medium readable by a computer, such as a CD-ROM, a flexible disk (FD), a CD-R (compact disk recordable), or a DVD (digital versatile disk).
  • a configuration may be such that the program executed by the object recognition apparatus 1 according to the above-described embodiment is stored in a computer connected via a network such as the Internet and provided by being downloaded via the network.
  • a configuration may be such that the program executed by the object recognition apparatus 1 according to the above-described embodiment is provided or distributed via a network such as the Internet.
  • The program executed by the object recognition apparatus 1 according to the above-described embodiment has a modular configuration that includes at least one of the above-described functional units; in terms of actual hardware, the CPU 52 (or the CPU 32) reads the program from the above-described ROM 53 (or the ROM 33) and executes it so as to load and generate the above-described functional units in a main storage device (the RAM 54, the RAM 34, or the like).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Electromagnetism (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Signal Processing (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)
  • Measurement Of Optical Distance (AREA)

Abstract

The present invention relates to an image processing apparatus, an object recognition apparatus, a device control system, an image processing method, and a program, and it includes a first calculating unit that calculates a distance between two objects, detected based on distance information on the objects, in a depth direction in detection areas of the objects; a second calculating unit that calculates an overlap size that is a size of an overlapped area of the two detection areas by using a method that corresponds to the distance calculated by the first calculating unit; and a discarding unit that determines whether each of the two objects in the detection areas is to be discarded in accordance with the overlap size.

Description

    Field
  • The present invention relates to an image processing apparatus, an object recognition apparatus, a device control system, an image processing method, and a program.
  • Background
  • Conventionally, body structures of automobiles, and the like, have been developed in terms of automobile safety, that is, how to protect pedestrians and occupants when an automobile crashes into a pedestrian. Furthermore, in recent years, technologies for detecting persons and automobiles at high speed have been developed owing to improvements in information processing technologies and image processing technologies. By using these technologies, some automobiles have been developed that prevent crashes before they happen by automatically applying a brake before the automobile hits an object. For automatic control of automobiles, the distance to an object such as a person or another automobile needs to be measured accurately; for this purpose, distance measurement using millimeter-wave radar and laser radar, distance measurement using a stereo camera, and the like, have been put into practical use.
  • When a stereo camera is used as a technology for recognizing objects, the disparity of each object appearing in the two luminance images captured on the right and left is derived to generate a disparity image, and pixels having similar disparity values are grouped together to recognize the object. Here, by extracting a disparity cluster from the disparity image, the height, horizontal width, and depth of an object and its position in three dimensions may be detected.
  • As the technology for recognizing objects described above, there is a disclosed technology in which a pedestrian recognition area where the presence of a pedestrian is recognized in image data is identified and a pedestrian score indicating the degree of certainty of a pedestrian is calculated (see Patent Literature 1).
  • Citation List Patent Literature
  • Patent Literature 1: Japanese Laid-open Patent Publication No. 2014-146267
  • Summary Technical Problem
  • Typically, when objects are overlapped in a captured image, a process is conducted to exclude (discard) an object in the back from the control target (tracking target); however, it is preferable that, for example, pedestrians who run out from the back side of a different vehicle in the front are not discarded but included as the control target. Unfortunately, the technology disclosed in Patent Literature 1 has a problem in that for example when a pedestrian suddenly runs out from the back of a different vehicle, or the like, it is difficult to ensure that the pedestrian is detected without being discarded and is included as the control target.
  • The present invention has been made in consideration of the foregoing, and it has an object to provide an image processing apparatus, an object recognition apparatus, a device control system, an image processing method, and a program that are capable of performing a discard process properly.
  • Solution to Problem
  • In order to solve the problem mentioned above and achieve the object, the present invention includes a first calculating unit that calculates a distance between two objects, detected based on distance information on the objects, in a depth direction in detection areas of the objects; a second calculating unit that calculates an overlap size that is a size of an overlapped area of the two detection areas by using a method that corresponds to the distance calculated by the first calculating unit; and a discarding unit that determines whether each of the two objects in the detection areas is to be discarded in accordance with the overlap size.
  • Advantageous Effects of Invention
  • According to the present invention, a discard process may be properly performed.
  • Brief Description of Drawings
    • FIG. 1 is a diagram that illustrates an example where a device control system according to an embodiment is installed in a vehicle;
    • FIG. 2 is a diagram that illustrates an example of the external appearance of an object recognition apparatus according to the embodiment;
    • FIG. 3 is a diagram that illustrates an example of the hardware configuration of the object recognition apparatus according to the embodiment;
    • FIG. 4 is a diagram that illustrates an example of the configuration of functional blocks of the object recognition apparatus according to the embodiment;
    • FIG. 5 is a diagram that illustrates an example of the configuration of functional blocks in a disparity-value calculation processing unit of the object recognition apparatus according to the embodiment;
    • FIG. 6 is a diagram that explains the principle for deriving the distance from an imaging unit to an object;
    • FIG. 7 is a diagram that explains the case of obtaining a corresponding pixel that is in a comparison image and that corresponds to the reference pixel in the reference image;
    • FIG. 8 is a diagram that illustrates an example of the graph of results of block matching processing;
    • FIG. 9 is a diagram that illustrates an example of the configuration of functional blocks of the recognition processing unit in the object recognition apparatus according to the embodiment;
    • FIG. 10 is a diagram that illustrates an example of the V map generated from a disparity image;
    • FIG. 11 is a diagram that illustrates an example of the U map generated from a disparity image;
    • FIG. 12 is a diagram that illustrates an example of the real U map generated from a U map;
    • FIG. 13 is a diagram that illustrates a process to extract an isolated area from a real U map;
    • FIG. 14 is a diagram that illustrates a process to generate a detection frame;
    • FIG. 15 is a diagram that illustrates a case where the distance between frames is short;
    • FIG. 16 is a diagram that illustrates a case where the distance between frames is long;
    • FIG. 17 is a flowchart that illustrates an example of operation during block matching processing by a disparity-value deriving unit according to the embodiment;
    • FIG. 18 is a flowchart that illustrates an example of operation during the object recognition process by a recognition processing unit according to the embodiment;
    • FIG. 19 is a flowchart that illustrates an example of operation during the overlap process by the recognition processing unit according to the embodiment;
    • FIG. 20 is a diagram that illustrates an overlap size when the distance between frames is a short distance;
    • FIG. 21 is a diagram that illustrates operation to discard a detection object when the distance between frames is a short distance;
    • FIG. 22 is a diagram that illustrates an overlap size when the distance between frames is a long distance;
    • FIG. 23 is a diagram that illustrates a case where there is no overlap size when the distance between frames is a long distance; and
    • FIG. 24 is a diagram that illustrates a case where a detection object is not discarded when the distance between frames is a long distance.
    Description of Embodiments
  • With reference to FIGS. 1 to 24, a detailed explanation is given below of an embodiment of an image processing apparatus, an object recognition apparatus, a device control system, an image processing method, and a program according to the present invention. The present invention is not limited to the embodiment below, and components in the embodiment below include the ones that may be easily developed by a person skilled in the art, substantially the same ones, and the ones in what is called a range of equivalents. Furthermore, the components may be variously omitted, replaced, modified, or combined without departing from the scope of the embodiment below.
  • [Schematic configuration of vehicle including object recognition apparatus]
  • FIG. 1 is a diagram that illustrates an example where a device control system according to the embodiment is installed in a vehicle. With reference to FIG. 1, an explanation is given of a case where for example a device control system 60 according to the present embodiment is installed in a vehicle 70.
  • With regard to FIG. 1, FIG. 1(a) is a side view of the vehicle 70 with the device control system 60 installed therein, and FIG. 1(b) is a front view of the vehicle 70.
  • As illustrated in FIG. 1, the vehicle 70, which is an automobile, has the device control system 60 installed therein. The device control system 60 includes an object recognition apparatus 1, a vehicle control device 6 (control device), a steering wheel 7, and a brake pedal 8, provided in the vehicle interior that is an accommodation space in the vehicle 70.
  • The object recognition apparatus 1 has an imaging function to capture images in a traveling direction of the vehicle 70, and for example it is installed near the rearview mirror inside the front window of the vehicle 70. The object recognition apparatus 1 includes: a main body unit 2; and an imaging unit 10a and an imaging unit 10b that are fixed to the main body unit 2, and details of its configuration and operation are described later. The imaging units 10a, 10b are fixed to the main body unit 2 so as to capture an object in the traveling direction of the vehicle 70.
  • The vehicle control device 6 is an ECU (electronic control unit) that performs various types of vehicle control on the basis of recognition information received from the object recognition apparatus 1. On the basis of recognition information received from the object recognition apparatus 1, the vehicle control device 6 performs, as an example of the vehicle control, steering control to avoid obstacles by controlling a steering system (control target) including the steering wheel 7, brake control to stop or reduce the speed of the vehicle 70 by controlling the brake pedal 8 (control target), or the like.
  • The device control system 60 including the object recognition apparatus 1 and the vehicle control device 6 described above performs vehicle control such as steering control or brake control to improve driving safety of the vehicle 70.
  • Furthermore, as described above, the object recognition apparatus 1 captures images in front of the vehicle 70; however, this is not a limitation. That is, the object recognition apparatus 1 may be installed to capture images behind or to the side of the vehicle 70. In this case, the object recognition apparatus 1 is capable of detecting the positions of a following vehicle or a person behind the vehicle 70, or of another vehicle or a person to the side of it. The vehicle control device 6 is then capable of detecting dangers when the vehicle 70 changes lanes, merges into a lane, or the like, and performing the above-described vehicle control. Furthermore, when the vehicle control device 6 determines, on the basis of recognition information on an obstacle behind the vehicle 70 output from the object recognition apparatus 1, that there is a danger of collision while the vehicle 70 is backing into a parking space, or the like, it is capable of performing the above-described vehicle control.
  • [Configuration of the object recognition apparatus]
  • FIG. 2 is a diagram that illustrates an example of the external appearance of the object recognition apparatus according to the embodiment. As illustrated in FIG. 2, the object recognition apparatus 1 includes the main body unit 2 and the imaging unit 10a and the imaging unit 10b that are fixed to the main body unit 2, as described above. The imaging units 10a and 10b are a pair of cylindrical cameras that are located parallel to each other at equivalent positions on the main body unit 2. Furthermore, for convenience of explanation, the imaging unit 10a illustrated in FIG. 2 is sometimes referred to as the right camera and the imaging unit 10b as the left camera.
  • (Hardware configuration of the object recognition apparatus)
  • FIG. 3 is a diagram that illustrates an example of the hardware configuration of the object recognition apparatus according to the embodiment. With reference to FIG. 3, the hardware configuration of the object recognition apparatus 1 is explained.
  • As illustrated in FIG. 3, the object recognition apparatus 1 includes a disparity-value deriving unit 3 and a recognition processing unit 5 inside the main body unit 2.
  • The disparity-value deriving unit 3 is a device that derives a disparity value dp indicating disparity with respect to an object from images obtained after the object is captured and outputs a disparity image (an example of distance information) indicating the disparity value dp of each pixel. The recognition processing unit 5 is a device that performs an object recognition process, or the like, on an object such as person or vehicle appearing in a captured image on the basis of a disparity image output from the disparity-value deriving unit 3 and outputs recognition information that is information indicating a result of the object recognition process to the vehicle control device 6.
  • As illustrated in FIG. 3, the disparity-value deriving unit 3 includes the imaging unit 10a, the imaging unit 10b, a signal converting unit 20a, a signal converting unit 20b, and an image processing unit 30.
  • The imaging unit 10a is a processing unit that captures an object in the front and generates analog image signals. The imaging unit 10a includes an imaging lens 11a, an aperture 12a, and an image sensor 13a.
  • The imaging lens 11a is an optical element that refracts incident light to form an image of the object on the image sensor 13a. The aperture 12a is a member that blocks part of light that has passed through the imaging lens 11a to adjust the amount of light input to the image sensor 13a. The image sensor 13a is a semiconductor device that converts light that has entered the imaging lens 11a and passed through the aperture 12a into electric analog image signals. The image sensor 13a is implemented by using solid state image sensors such as CCD (charge coupled devices) or CMOS (complementary metal oxide semiconductor).
  • The imaging unit 10b is a processing unit that captures the object in the front and generates analog image signals. The imaging unit 10b includes an imaging lens 11b, an aperture 12b, and an image sensor 13b. Here, the functions of the imaging lens 11b, the aperture 12b, and the image sensor 13b are the same as those of the imaging lens 11a, the aperture 12a, and the image sensor 13a described above. Furthermore, the imaging lens 11a and the imaging lens 11b are installed such that their principal surfaces are on the same plane so that the right and the left cameras capture images under the same condition.
  • The signal converting unit 20a is a processing unit that converts analog image signals generated by the imaging unit 10a into digital-format image data. The signal converting unit 20a includes a CDS (correlated double sampling) 21a, an AGC (auto gain control) 22a, an ADC (analog digital converter) 23a, and a frame memory 24a.
  • The CDS 21a removes noise from analog image signals generated by the image sensor 13a by using correlated double sampling, a differential filter in a traverse direction, a smoothing filter in a longitudinal direction, or the like. The AGC 22a performs gain control to control the intensity of analog image signals from which noise has been removed by the CDS 21a. The ADC 23a converts analog image signals whose gain has been controlled by the AGC 22a into digital-format image data. The frame memory 24a stores image data converted by the ADC 23a.
  • The signal converting unit 20b is a processing unit that converts analog image signals generated by the imaging unit 10b into digital-format image data. The signal converting unit 20b includes a CDS 21b, an AGC 22b, an ADC 23b, and a frame memory 24b. Here, the functions of the CDS 21b, the AGC 22b, the ADC 23b, and the frame memory 24b are the same as those of the CDS 21a, the AGC 22a, the ADC 23a, and the frame memory 24a described above.
  • The image processing unit 30 is a device that performs image processing on image data converted by the signal converting unit 20a and the signal converting unit 20b. The image processing unit 30 includes an FPGA (field programmable gate array) 31, a CPU (central processing unit) 32, a ROM (read only memory) 33, a RAM (random access memory) 34, an I/F (interface) 35, and a bus line 39.
  • The FPGA 31 is an integrated circuit, and here it performs a process to derive the disparity value dp in an image based on image data. The CPU 32 controls each function of the disparity-value deriving unit 3. The ROM 33 stores programs for image processing executed by the CPU 32 to control each function of the disparity-value deriving unit 3. The RAM 34 is used as a work area of the CPU 32. The I/F 35 is an interface for communicating with the I/F 55 in the recognition processing unit 5 via the communication line 4. As illustrated in FIG. 3, the bus line 39 is an address bus, a data bus, or the like, for connecting the FPGA 31, the CPU 32, the ROM 33, the RAM 34, and the I/F 35 such that they can communicate with one another.
  • Here, the image processing unit 30 includes the FPGA 31 as an integrated circuit for deriving the disparity value dp; however, this is not a limitation, and it may be an integrated circuit such as ASIC (application specific integrated circuit).
  • As illustrated in FIG. 3, the recognition processing unit 5 includes an FPGA 51, a CPU 52, a ROM 53, a RAM 54, the I/F 55, a CAN (controller area network) I/F 58, and a bus line 59.
  • The FPGA 51 is an integrated circuit, and here it performs an object recognition process on an object on the basis of disparity images, or the like, received from the image processing unit 30. The CPU 52 controls each function of the recognition processing unit 5. The ROM 53 stores the programs with which the CPU 52 performs the object recognition process of the recognition processing unit 5. The RAM 54 is used as a work area of the CPU 52. The I/F 55 is an interface for data communication with the I/F 35 of the image processing unit 30 via the communication line 4. The CAN I/F 58 is an interface for communicating with an external controller (e.g., the vehicle control device 6 illustrated in FIG. 3) and is connected, for example, to the CAN of the vehicle. The bus line 59 is an address bus, a data bus, or the like, that connects the FPGA 51, the CPU 52, the ROM 53, the RAM 54, the I/F 55, and the CAN I/F 58 such that they can communicate with one another, as illustrated in FIG. 3.
  • With this configuration, after a disparity image is sent to the recognition processing unit 5 from the I/F 35 of the image processing unit 30 via the communication line 4, the FPGA 51 performs an object recognition process, or the like, on an object such as person or vehicle appearing in a captured image on the basis of the disparity image in accordance with a command from the CPU 52 of the recognition processing unit 5.
  • Furthermore, each of the above-described programs may be distributed by being recorded in a computer-readable recording medium in the form of a file that is installable and executable. The recording medium may be a CD-ROM (compact disc read only memory), an SD (secure digital) memory card, or the like.
  • Furthermore, as illustrated in FIG. 3, the image processing unit 30 of the disparity-value deriving unit 3 and the recognition processing unit 5 are separate devices; however, this is not a limitation, and, for example, the image processing unit 30 and the recognition processing unit 5 may be the same device to generate disparity images and perform an object recognition process.
  • (Configuration and operation of functional blocks of the object recognition apparatus)
  • FIG. 4 is a diagram that illustrates an example of the configuration of functional blocks of the object recognition apparatus according to the embodiment. First, with reference to FIG. 4, an explanation is given of the configuration and operation of the functional blocks in the relevant part of the object recognition apparatus 1.
  • Although described above with reference to FIG. 3, the object recognition apparatus 1 includes the disparity-value deriving unit 3 and the recognition processing unit 5 as illustrated in FIG. 4. Specifically, the disparity-value deriving unit 3 includes an image acquiring unit 100a (first imaging unit), an image acquiring unit 100b (second imaging unit), converting units 200a, 200b, and a disparity-value calculation processing unit 300 (generating unit).
  • The image acquiring unit 100a is a functional unit that captures the image of an object in the front by using the right camera, generates analog image signals, and obtains a luminance image that is an image based on the image signals. The image acquiring unit 100a is implemented by using the imaging unit 10a illustrated in FIG. 3.
  • The image acquiring unit 100b is a functional unit that captures the image of an object in the front by using the left camera, generates analog image signals, and obtains a luminance image that is an image based on the image signals. The image acquiring unit 100b is implemented by using the imaging unit 10b illustrated in FIG. 3.
  • The converting unit 200a is a functional unit that removes noise from image data on the luminance image obtained by the image acquiring unit 100a, converts it into digital-format image data, and outputs it. The converting unit 200a is implemented by using the signal converting unit 20a illustrated in FIG. 3.
  • The converting unit 200b is a functional unit that removes noise from image data on the luminance image obtained by the image acquiring unit 100b, converts it into digital-format image data, and outputs it. The converting unit 200b is implemented by using the signal converting unit 20b illustrated in FIG. 3.
  • Here, with regard to pieces of image data (hereafter, simply referred to as luminance images) on two luminance images output from the converting units 200a, 200b, the luminance image captured by the image acquiring unit 100a, which is the right camera (the imaging unit 10a), is the image data on a reference image Ia (hereafter, simply referred to as the reference image Ia) (first captured image), and the luminance image captured by the image acquiring unit 100b, which is the left camera (the imaging unit 10b), is the image data on a comparison image Ib (hereafter, simply referred to as the comparison image Ib) (second captured image). That is, the converting units 200a, 200b output the reference image Ia and the comparison image Ib, respectively, on the basis of two luminance images output from the image acquiring units 100a, 100b.
  • The disparity-value calculation processing unit 300 is a functional unit that derives the disparity value dp with respect to each pixel of the reference image Ia on the basis of the reference image Ia and the comparison image Ib received from the converting units 200a, 200b, respectively, and generates a disparity image in which the disparity value dp is applied to each pixel of the reference image Ia. The disparity-value calculation processing unit 300 outputs the generated disparity image to the recognition processing unit 5.
  • The recognition processing unit 5 is a functional unit that recognizes (detects) an object on the basis of the reference image Ia and the disparity image received from the disparity-value deriving unit 3 and performs a tracking process on the recognized object.
  • <Configuration and operation of functional blocks of the disparity-value calculation processing unit>
  • FIG. 5 is a diagram that illustrates an example of the configuration of functional blocks in the disparity-value calculation processing unit of the object recognition apparatus according to the embodiment. FIG. 6 is a diagram that explains the principle for deriving the distance from the imaging unit to an object. FIG. 7 is a diagram that explains the case of obtaining a corresponding pixel that is in a comparison image and that corresponds to the reference pixel in the reference image. FIG. 8 is a diagram that illustrates an example of the graph of results of block matching processing.
  • First, with reference to FIGS. 6 to 8, a distance measuring method using block matching processing is schematically explained.
  • <<Principle of distance measurement>>
  • With reference to FIG. 6, an explanation is given of the principle of deriving the disparity with respect to an object from the stereo camera due to stereo matching processing and measuring the distance from the stereo camera to the object by using the disparity value representing the disparity.
  • The imaging system illustrated in FIG. 6 includes the imaging unit 10a and the imaging unit 10b that are located parallel at equivalent positions. The imaging units 10a, 10b include the imaging lenses 11a, 11b, respectively, which refract incident light to form an image of the object on an image sensor that is a solid state image sensor. Images captured by the imaging unit 10a and the imaging unit 10b are the reference image Ia and the comparison image Ib, respectively. In FIG. 6, on each of the reference image Ia and the comparison image Ib, a point S on an object E in the three-dimensional space is mapped onto a position on a straight line parallel to the straight line connecting the imaging lens 11a and the imaging lens 11b. Here, the point S mapped onto each image is a point Sa(x,y) on the reference image Ia and a point Sb(X,y) on the comparison image Ib. The disparity value dp is then represented as in Equation (1) below by using the point Sa(x,y) on the coordinates of the reference image Ia and the point Sb(X,y) on the coordinates of the comparison image Ib:

    dp = X - x    (1)
  • Furthermore, in FIG. 6, the disparity value dp may be represented as dp=Δa+Δb, where Δa is the distance between the point Sa(x,y) on the reference image Ia and the intersection point of the perpendicular extending from the imaging lens 11a with the imaging surface and Δb is the distance between the point Sb(X,y) on the comparison image Ib and the intersection point of the perpendicular extending from the imaging lens 11b with the imaging surface.
  • Then, by using the disparity value dp, a distance Z between the imaging units 10a, 10b and the object E is derived. Here, the distance Z is the distance from the straight line connecting the focus position of the imaging lens 11a and the focus position of the imaging lens 11b to the point S on the object E. As illustrated in FIG. 6, the distance Z may be calculated with Equation (2) below by using the focal length f of the imaging lens 11a and the imaging lens 11b, the base length B that is the distance between the imaging lens 11a and the imaging lens 11b, and the disparity value dp:

    Z = B × f / dp    (2)
  • According to Equation (2), it is understood that the distance Z is shorter as the disparity value dp is larger and the distance Z is longer as the disparity value dp is smaller.
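  • As a simple worked example of Equation (2), assume a base length B of 0.1 m, a focal length f of 1000 pixels, and a disparity value dp of 20 pixels; these numbers are illustrative assumptions only, not values from the embodiment.

```python
B = 0.1      # assumed base length between the imaging lenses 11a and 11b [m]
f = 1000.0   # assumed focal length expressed in pixels
dp = 20.0    # assumed disparity value [pixels]

Z = B * f / dp   # Equation (2): 0.1 * 1000 / 20 = 5.0 [m]
print(Z)         # a larger dp gives a shorter Z, as stated above
```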
  • <<Block matching processing>>
  • Next, with reference to FIGS. 7 and 8, an explanation is given of a distance measuring method due to block matching processing.
  • With reference to FIGS. 7 and 8, a method of calculating a cost value C(p,d) is explained. In the following explanation, C(p,d) represents C(x,y,d).
  • With regard to FIG. 7, FIG. 7(a) is a conceptual diagram that illustrates a reference pixel p and a reference area pb in the reference image Ia, and FIG. 7(b) is a conceptual diagram of calculating the cost value C while sequentially shifting (displacing) candidates for the corresponding pixel that is in the comparison image Ib and that corresponds to the reference pixel p illustrated in FIG. 7(a). Here, the corresponding pixel is the pixel in the comparison image Ib that is most similar to the reference pixel p in the reference image Ia. Furthermore, the cost value C is an evaluation value (degree of matching) representing the degree of similarity or the degree of dissimilarity of each pixel in the comparison image Ib with respect to the reference pixel p in the reference image Ia. In the following explanation, the cost value C is an evaluation value representing the degree of dissimilarity, meaning that the smaller the value is, the more similar the pixel in the comparison image Ib is to the reference pixel p.
  • As illustrated in FIG. 7(a), on the basis of the luminance value (pixel value) of the reference pixel p(x,y) in the reference image Ia and each of the candidate pixels q(x+d,y) that are candidates for the corresponding pixel on an epipolar line EL in the comparison image Ib with respect to the reference pixel p(x,y), the cost value C(p,d) of the candidate pixel q(x+d,y) that is a candidate for the corresponding pixel with respect to the reference pixel p(x,y) is calculated. The shift amount (displacement amount) between the reference pixel p and the candidate pixel q is d, and the shift amount d is a shift on a pixel to pixel basis. Specifically, while the candidate pixel q(x+d,y) is sequentially shifted by one pixel within a predetermined range (e.g., 0<d<25), the cost value C(p,d) is calculated, which is the degree of dissimilarity between the luminance values of the candidate pixel q(x+d,y) and the reference pixel p(x,y). Furthermore, as the stereo matching processing to obtain the corresponding pixel of the reference pixel p, block matching processing is performed according to the present embodiment. During the block matching processing, the degree of dissimilarity is obtained between the reference area pb that is a predetermined area with the reference pixel p in the reference image Ia as a center and a candidate area qb (the same size as the reference area pb) with the candidate pixel q in the comparison image Ib as a center. SAD (Sum of Absolute Difference), SSD (Sum of Squared Difference), ZSSD (Zero-mean-Sum of Squared Difference), which is obtained by subtracting the average value of blocks from the value of SSD, or the like, is used as the cost value C indicating the degree of dissimilarity between the reference area pb and the candidate area qb. These evaluation values represent the degree of dissimilarity because the value is smaller as the correlation is higher (the degree of similarity is higher).
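  • The block-based cost values named above can be written compactly. The following is a minimal sketch assuming the reference area pb and the candidate area qb are NumPy arrays of the same size; ZSSD is implemented here in its usual zero-mean form, in which the mean of each block is removed before taking the SSD.

```python
import numpy as np

def sad(pb, qb):
    """Sum of Absolute Difference between reference area pb and candidate area qb."""
    return np.abs(pb.astype(np.float32) - qb.astype(np.float32)).sum()

def ssd(pb, qb):
    """Sum of Squared Difference."""
    d = pb.astype(np.float32) - qb.astype(np.float32)
    return (d * d).sum()

def zssd(pb, qb):
    """Zero-mean Sum of Squared Difference."""
    p = pb.astype(np.float32) - pb.mean()
    q = qb.astype(np.float32) - qb.mean()
    return ((p - q) ** 2).sum()
```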
  • Furthermore, as described above, the imaging units 10a, 10b are located parallel at equivalent positions and therefore the reference image Ia and the comparison image Ib also have a relation such that they are located parallel at equivalent positions. Therefore, the corresponding pixel that is in the comparison image Ib and that corresponds to the reference pixel p in the reference image Ia is present on the epipolar line EL that is illustrated as a line in a horizontal direction as viewed from the sheet surface in FIG. 7 and, to obtain the corresponding pixel in the comparison image Ib, a pixel is retrieved on the epipolar line EL of the comparison image Ib.
  • The cost value C(p,d) calculated during the above-described block matching processing is represented by, for example, the graph illustrated in FIG. 8 in relation to the shift amount d. In the example of FIG. 8, as the cost value C is the minimum value when the shift amount d=7, the disparity value dp=7 is derived.
  • <<Specific configuration and operation of functional blocks of the disparity-value calculation processing unit>>
  • With reference to FIG. 5, the specific configuration and operation of functional blocks of the disparity-value calculation processing unit 300 are explained.
  • As illustrated in FIG. 5, the disparity-value calculation processing unit 300 includes a cost calculating unit 301, a determining unit 302, and a first generating unit 303.
  • The cost calculating unit 301 is a functional unit that calculates the cost value C(p,d) of each of the candidate pixels q(x+d,y) on the basis of the luminance value of the reference pixel p(x,y) in the reference image Ia and the luminance value of each of the candidate pixels q(x+d,y) that are candidates for the corresponding pixel, identified by shifting the pixel at the corresponding position of the reference pixel p(x,y) by the shift amount d on the epipolar line EL on the comparison image Ib based on the reference pixel p(x,y). Specifically, during block matching processing, the cost calculating unit 301 calculates, as the cost value C, the degree of dissimilarity between the reference area pb that is a predetermined area with the reference pixel p in the reference image Ia as a center and the candidate area qb (the same size as the reference area pb) with the candidate pixel q in the comparison image Ib as a center.
  • The determining unit 302 is a functional unit that determines that the shift amount d that corresponds to the minimum value of the cost value C calculated by the cost calculating unit 301 is the disparity value dp with respect to a pixel in the reference image Ia that is targeted for calculation of the cost value C.
  • The first generating unit 303 is a functional unit that generates a disparity image that is an image where, on the basis of the disparity value dp determined by the determining unit 302, the pixel value of each pixel of the reference image Ia is replaced with the disparity value dp that corresponds to the pixel.
  • Each of the cost calculating unit 301, the determining unit 302, and the first generating unit 303 illustrated in FIG. 5 is implemented by using the FPGA 31 illustrated in FIG. 3. Furthermore, all or part of the cost calculating unit 301, the determining unit 302, and the first generating unit 303 may be implemented when the CPU 32 executes programs stored in the ROM 33 instead of the FPGA 31 that is a hardware circuit.
  • Here, the functions of the cost calculating unit 301, the determining unit 302, and the first generating unit 303 in the disparity-value calculation processing unit 300 illustrated in FIG. 5 are illustrated as a concept, and this configuration is not a limitation. For example, multiple functional units that are illustrated as separate functional units in the disparity-value calculation processing unit 300 illustrated in FIG. 5 may be configured as a single functional unit. Conversely, a function provided in a single functional unit in the disparity-value calculation processing unit 300 illustrated in FIG. 5 may be divided and configured as multiple functional units.
  • <Configuration and operation of functional blocks in the recognition processing unit>
  • FIG. 9 is a diagram that illustrates an example of the configuration of functional blocks of the recognition processing unit in the object recognition apparatus according to the embodiment. FIG. 10 is a diagram that illustrates an example of the V map generated from a disparity image. FIG. 11 is a diagram that illustrates an example of the U map generated from a disparity image. FIG. 12 is a diagram that illustrates an example of the real U map generated from a U map. FIG. 13 is a diagram that illustrates a process to extract an isolated area from a real U map. FIG. 14 is a diagram that illustrates a process to generate a detection frame. FIG. 15 is a diagram that illustrates a case where the distance between frames is short. FIG. 16 is a diagram that illustrates a case where the distance between frames is long. With reference to FIGS. 9 to 16, the configuration and operation of functional blocks of the recognition processing unit 5 are explained.
  • As illustrated in FIG. 9, the recognition processing unit 5 includes a second generating unit 501, a clustering processing unit 502, and a tracking unit 503.
  • The second generating unit 501 is a functional unit that receives a disparity image from the disparity-value calculation processing unit 300, receives the reference image Ia from the disparity-value deriving unit 3, and generates a V-Disparity map, U-Disparity map, and Real U-Disparity map, or the like. Specifically, to detect a road surface from the disparity image input from the disparity-value calculation processing unit 300, the second generating unit 501 generates a V map VM that is the V-Disparity map illustrated in FIG. 10(b). Here, the V-Disparity map is a two-dimensional histogram indicating the frequency distribution of the disparity value dp, where the vertical axis is the y axis of the reference image Ia and the horizontal axis is the disparity value dp (or distance) of the disparity image. For example, a road surface 600, a power pole 601, and a vehicle 602 appear in the reference image Ia illustrated in FIG. 10(a). On the V map VM, the road surface 600 in the reference image Ia corresponds to a road surface portion 600a, the power pole 601 corresponds to a power pole portion 601a, and the vehicle 602 corresponds to a vehicle portion 602a.
  • Furthermore, the second generating unit 501 conducts linear approximation on the position that is estimated to be a road surface based on the generated V map VM. When a road surface is flat, approximation is possible by using a single straight line; however, when the gradient of the road surface changes, there is a need to divide the V map VM into sections and conduct linear approximation with high accuracy. Known technologies such as Hough transform or the least-square method may be used as linear approximation. On the V map VM, the power pole portion 601a and the vehicle portion 602a, which are clusters located above the detected road surface portion 600a, are equivalent to the power pole 601 and the vehicle 602, respectively, that are objects on the road surface 600. When a U-Disparity map is generated by the second generating unit 501 described later, only information above the road surface is used to remove noise.
  • Furthermore, the second generating unit 501 generates a U map UM that is a U-Disparity map illustrated in FIG. 11(b) to recognize objects by using only information located above the road surface detected from the V map VM, i.e., by using information that is in a disparity image and that is equivalent to a left guardrail 611, a right guardrail 612, a vehicle 613, and a vehicle 614 in the reference image Ia illustrated in FIG. 11(a). Here, the U map UM is a two-dimensional histogram indicating the frequency distribution of the disparity value dp, where the horizontal axis is the x axis of the reference image Ia and the vertical axis is the disparity value dp (or distance) of the disparity image. The left guardrail 611 in the reference image Ia illustrated in FIG. 11(a) is equivalent to a left guardrail portion 611a on the U map UM, the right guardrail 612 is equivalent to a right guardrail portion 612a, the vehicle 613 is equivalent to a vehicle portion 613a, and the vehicle 614 is equivalent to a vehicle portion 614a.
  • Furthermore, the second generating unit 501 generates a U map UM_H that is an example of the U-Disparity map illustrated in FIG. 11(c) by using only information located above the road surface detected from the V map VM, i.e., by using information that is in a disparity image and that is equivalent to the left guardrail 611, the right guardrail 612, the vehicle 613, and the vehicle 614 in the reference image Ia illustrated in FIG. 11(a). Here, the U map UM_H, which is an example of the U-Disparity map, is an image where the horizontal axis is the x axis of the reference image Ia, the vertical axis is the disparity value dp of the disparity image, and the pixel value is the height of an object. The left guardrail 611 in the reference image Ia illustrated in FIG. 11(a) is equivalent to a left guardrail portion 611b on the U map UM_H, the right guardrail 612 is equivalent to a right guardrail portion 612b, the vehicle 613 is equivalent to a vehicle portion 613b, and the vehicle 614 is equivalent to a vehicle portion 614b.
  • Furthermore, from the generated U map UM illustrated in FIG. 12(a), the second generating unit 501 generates a real U map RM that is a Real U-Disparity map illustrated in FIG. 12(b) in which the horizontal axis has been converted into the actual distance. Here, the real U map RM is a two-dimensional histogram in which the horizontal axis is the actual distance in a direction from the imaging unit 10b (the left camera) to the imaging unit 10a (the right camera) and the vertical axis is the disparity value dp of the disparity image (or the distance in a depth direction that is converted from the disparity value dp). The left guardrail portion 611a on the U map UM illustrated in FIG. 12(a) is equivalent to a left guardrail portion 611c on the real U map RM, the right guardrail portion 612a is equivalent to a right guardrail portion 612c, the vehicle portion 613a is equivalent to a vehicle portion 613c, and the vehicle portion 614a is equivalent to a vehicle portion 614c. Specifically, on the U map UM, the second generating unit 501 does not decimate pixels at long distances (small disparity values dp), because a distant object appears small, yields only a small amount of disparity information, and has a low distance resolution; at short distances, because a nearby object appears large, yields a large amount of disparity information, and has a high distance resolution, it decimates a large number of pixels. The real U map RM generated in this way is equivalent to an overhead (plan) view. As described later, a cluster of pixel values (an object; the "isolated area" described later) is extracted from the real U map RM so that the object can be detected. In this case, the width of the rectangle enclosing a cluster corresponds to the width of the extracted object, and its height corresponds to the depth of the extracted object. Furthermore, the second generating unit 501 is capable of not only generating the real U map RM from the U map UM but also generating the real U map RM directly from the disparity image.
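  • A minimal sketch of the conversion from the U map UM to the real U map RM is given below; it approximates the decimation described above by binning points into fixed-width lateral bins, and the focal length in pixels, the baseline in metres, the principal-point x-coordinate, the 0.1 m bin width, and the ±20 m lateral range are all illustrative assumptions.

```python
import numpy as np

def u_map_to_real_u_map(u_map, focal_px, base_m, cx, bin_m=0.1, half_width_m=20.0):
    """Convert a U-Disparity map (rows: disparity dp, columns: image x) into a
    Real U-Disparity map whose horizontal axis is lateral distance in metres."""
    n_dp = u_map.shape[0]
    n_bins = int(2 * half_width_m / bin_m)
    real_u = np.zeros((n_dp, n_bins), dtype=np.int32)
    for dp in range(1, n_dp):                  # dp = 0 carries no depth information
        depth = base_m * focal_px / dp         # Z = B * f / dp
        for x in range(u_map.shape[1]):
            count = u_map[dp, x]
            if count == 0:
                continue
            lateral = (x - cx) * depth / focal_px          # X = (x - cx) * Z / f
            bin_idx = int((lateral + half_width_m) / bin_m)
            if 0 <= bin_idx < n_bins:
                real_u[dp, bin_idx] += count   # nearby pixels collapse into the same bin
    return real_u
```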
  • Furthermore, images input from the disparity-value deriving unit 3 to the second generating unit 501 are not limited to the reference image Ia, but the comparison image Ib may be the target.
  • The second generating unit 501 is implemented by using the FPGA 51 illustrated in FIG. 3. Furthermore, the second generating unit 501 may be implemented when the CPU 52 executes programs stored in the ROM 53 instead of the FPGA 51 that is a hardware circuit.
  • The clustering processing unit 502 is a functional unit that performs clustering processing to detect an object appearing in a disparity image on the basis of each map output from the second generating unit 501. As illustrated in FIG. 9, the clustering processing unit 502 includes an area extracting unit 511 (extracting unit), a frame generating unit 512 (determining unit), a first discarding unit 513, and an overlap processing unit 514.
  • The area extracting unit 511 is a functional unit that extracts an isolated area, i.e., a cluster of pixel values, from the real U map RM among the maps (images) output from the second generating unit 501. Specifically, the area extracting unit 511 conducts binarization processing, labeling processing, or the like, on the real U map RM and extracts one isolated area for each piece of identification information assigned by the labeling processing. For example, FIG. 13 illustrates a state where isolated areas are extracted from the real U map RM; in this example, the area extracting unit 511 extracts the isolated areas 621 to 624. The extracted isolated areas correspond to objects appearing in the reference image Ia and represent the recognized areas of those objects in the reference image Ia.
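  • A minimal sketch of the binarization and labeling step is shown below; the vote threshold, the 8-connectivity, and the use of SciPy's connected-component labeling are illustrative assumptions, since the embodiment only specifies that binarization processing, labeling processing, or the like is performed.

```python
import numpy as np
from scipy import ndimage

def extract_isolated_areas(real_u_map, threshold=3):
    """Binarize the real U map and label 8-connected clusters of cells;
    each label corresponds to one isolated area (object candidate)."""
    binary = real_u_map >= threshold                 # keep cells with enough disparity votes
    structure = np.ones((3, 3), dtype=bool)          # 8-connectivity
    labels, num = ndimage.label(binary, structure=structure)
    areas = []
    for label_id in range(1, num + 1):
        dp_idx, x_idx = np.nonzero(labels == label_id)
        areas.append({
            "id": label_id,
            "x_range": (int(x_idx.min()), int(x_idx.max())),    # lateral extent -> object width
            "dp_range": (int(dp_idx.min()), int(dp_idx.max())),  # disparity extent -> object depth
        })
    return areas
```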
  • Furthermore, based on the U map UM or the real U map RM generated by the second generating unit 501, the area extracting unit 511 is capable of identifying the position and the width (xmin, xmax) of the object at an isolated area in the x-axis direction on the disparity image and the reference image Ia. The area extracting unit 511 is also capable of identifying the actual depth of an object from the extent (dmin, dmax) of the object in the disparity direction on the U map UM or the real U map RM. Based on the V map VM generated by the second generating unit 501, the area extracting unit 511 is capable of identifying the position and the height (ymin="the y-coordinate equivalent to the maximum height from the road surface at the maximum disparity value", ymax="the y-coordinate indicating the height of the road surface obtained from the maximum disparity value") of an object in the y-axis direction on the disparity image and the reference image Ia. In addition, the area extracting unit 511 is capable of identifying the actual size of an object in the x-axis direction and the y-axis direction from the width (xmin, xmax) in the x-axis direction, the height (ymin, ymax) in the y-axis direction, and the disparity values dp that correspond to them, all identified on the disparity image. As described above, by using the V map VM, the U map UM, and the real U map RM, the area extracting unit 511 is capable of identifying the position and the actual width, height, and depth of the object at an isolated area in the reference image Ia. Moreover, once the area extracting unit 511 identifies the position of an object in the reference image Ia, its position in the disparity image is determined, and the distance to the object is also determined.
  • With regard to each extracted isolated area, the area extracting unit 511 generates recognized-area information about that isolated area; this information includes, for example, the identification information assigned by the labeling processing and the position and size of the isolated area on the reference image Ia, the V map VM, the U map UM, and the real U map RM. The area extracting unit 511 sends the generated recognized-area information to the frame generating unit 512.
  • Furthermore, on an extracted isolated area, the area extracting unit 511 may perform additional processing such as smoothing to reduce the noise and disparity dispersion present on the real U map RM, surface detection of the object in the isolated area, or deletion of unnecessary areas.
  • The frame generating unit 512 is a functional unit that, with respect to the isolated area of an object on the real U map RM extracted by the area extracting unit 511, generates a frame at the object's area (hereafter, sometimes referred to as detection area) that is in a disparity image Ip (or the reference image Ia) and that corresponds to the isolated area. Specifically, the frame generating unit 512 generates detection frames 631a to 634a in the disparity image Ip or the reference image Ia as illustrated in FIG. 14(b) such that they correspond to detection areas 631 to 634 that correspond to the isolated areas 621 to 624, respectively, which are extracted by the area extracting unit 511 from the real U map RM, as illustrated in FIG. 14(a). The frame generating unit 512 includes the information on the frame generated on the disparity image Ip or the reference image Ia in the recognized-area information and sends it to the first discarding unit 513.
  • The first discarding unit 513 is a functional unit that determines the type of the object (hereafter, sometimes referred to as a detection object) in a detection area framed by the frame generating unit 512, on the basis of the actual size (width, height, depth) of the object obtained from the size of the detection area, and that discards the object in accordance with the determined type. The first discarding unit 513 uses, for example, the following (Table 1) to determine what a detection object is; a sketch of this size-based lookup follows the table. For example, when the width of the object is 1300 [mm], the height is 1800 [mm], and the depth is 2000 [mm], the object is determined to be a "standard-sized automobile". The information that relates width, height, and depth to the type of object (object type) may be stored as a table like (Table 1) in the RAM 54, or the like. The relation between sizes and object types illustrated in (Table 1) is an example, and a different relation may be defined. Table 1
    Object type | Width [mm] | Height [mm] | Depth [mm]
    Motorbike, bicycle | <1100 | <2500 | >1000
    Pedestrian | <1100 | <2500 | <=1000
    Small-sized automobile | <1700 | <1700 | <10000
    Standard-sized automobile | <1700 | <2500 | <10000
    Truck | <3500 | <3500 | <15000
    Others | Not applicable to the above sizes
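  • A minimal sketch of the size-based lookup referenced above (values in mm, first matching row wins) follows; the row order, the data layout, and the function name are illustrative assumptions.

```python
# Size-to-type lookup mirroring Table 1 (all values in mm).
SIZE_TABLE = [
    ("motorbike, bicycle",        lambda w: w < 1100, lambda h: h < 2500, lambda d: d > 1000),
    ("pedestrian",                lambda w: w < 1100, lambda h: h < 2500, lambda d: d <= 1000),
    ("small-sized automobile",    lambda w: w < 1700, lambda h: h < 1700, lambda d: d < 10000),
    ("standard-sized automobile", lambda w: w < 1700, lambda h: h < 2500, lambda d: d < 10000),
    ("truck",                     lambda w: w < 3500, lambda h: h < 3500, lambda d: d < 15000),
]

def classify_by_size(width_mm, height_mm, depth_mm):
    """Return the first object type in Table 1 whose width, height, and depth
    conditions all hold; anything matching no row is classified as 'others'."""
    for obj_type, w_ok, h_ok, d_ok in SIZE_TABLE:
        if w_ok(width_mm) and h_ok(height_mm) and d_ok(depth_mm):
            return obj_type
    return "others"

# e.g. classify_by_size(1300, 1800, 2000) returns "standard-sized automobile"
```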
  • The first discarding unit 513 discards an object that is determined not to be targeted for subsequent processing (overlap processing, tracking processing, or the like, described later) in accordance with the determined type of detection object. For example, when pedestrians (persons) and vehicles are targeted for subsequent processing, the first discarding unit 513 discards detection objects indicated by detection frames 631a, 632a illustrated in FIG. 14(b) as they are side wall objects (guardrails). To discard a detection object, for example, the first discarding unit 513 includes a flag (discard flag) indicating discard in the recognized-area information on the detection object. Here, the first discarding unit 513 determines whether a detection object is to be discarded in accordance with the determined type of detection object; however, this is not a limitation, and it may be determined whether an object in a detection area is to be discarded in accordance with the size of the detection area. The first discarding unit 513 includes a discard flag indicating whether the detection object is to be discarded in the recognized-area information and sends it to the overlap processing unit 514. Furthermore, with regard to a detection object in the following explanation of an overlap process and a tracking process, it is assumed that the discard flag included in the recognized-area information is off, that is, it is not discarded.
  • The overlap processing unit 514 is a functional unit that, when detection areas are overlapped, performs an overlap process to determine whether objects in the detection areas are to be discarded on the basis of the size of the overlapped detection areas. The overlap processing unit 514 includes a first determining unit 521, a distance calculating unit 522 (first calculating unit), a second determining unit 523 (determining unit), an overlapped-size calculating unit 524 (second calculating unit), a third determining unit 525, and a second discarding unit 526 (discarding unit).
  • The first determining unit 521 is a functional unit that determines whether two detection areas are overlapped.
  • The distance calculating unit 522 is a functional unit that, when the first determining unit 521 determines that detection areas are overlapped, calculates the distance (hereafter, sometimes referred to as the distance between frames) between objects in the overlapped detection areas in a depth direction.
  • The second determining unit 523 is a functional unit that determines whether the distance between frames calculated by the distance calculating unit 522 is less than a predetermined threshold. In the following explanation, a distance equal to or longer than the predetermined threshold is referred to as a "long distance" (second distance range), and a distance less than the predetermined threshold as a "short distance" (first distance range). Here, the second determining unit 523 switches the predetermined threshold to be compared with the distance between frames in accordance with the distance to the closer of the two detection objects, for example, as illustrated in the following (Table 2); a sketch of this threshold selection follows the table. For example, as illustrated in (Table 2), when the distance to the closer of the two detection objects is equal to or more than 15 [m] and less than 35 [m], the second determining unit 523 sets 4.5 [m] as the predetermined threshold to be compared with the distance between frames. The relation between the distance to a detection object and the threshold to be compared with the distance between frames illustrated in (Table 2) is an example, and a different relation may be defined. The details of the determination process by the second determining unit 523 are described later with reference to FIG. 19. Table 2
    Threshold item | Threshold
    Distance between frames (distance to detection object is less than 15 [m]) | 2.5 [m]
    Distance between frames (distance to detection object is equal to or more than 15 [m] and less than 35 [m]) | 4.5 [m]
    Distance between frames (distance to detection object is equal to or more than 35 [m]) | 9 [m]
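  • The switching of the threshold in Table 2 can be pictured with the small helper below; the function names and the handling of the 15 m and 35 m boundaries are illustrative assumptions.

```python
def frame_distance_threshold(closer_object_distance_m):
    """Pick the threshold for the distance between frames (Table 2) according
    to the distance to the closer of the two detection objects."""
    if closer_object_distance_m < 15.0:
        return 2.5
    if closer_object_distance_m < 35.0:
        return 4.5
    return 9.0

def is_short_distance(distance_between_frames_m, closer_object_distance_m):
    """'Short distance' means the distance between frames is below the threshold."""
    return distance_between_frames_m < frame_distance_threshold(closer_object_distance_m)
```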
  • Here, FIG. 15 illustrates an example of the case where the distance between frames is a short distance. The disparity image Ip1 illustrated in FIG. 15 indicates that a detection area 641 in which the detection object is a pedestrian and a detection area 642 in which the detection object is a vehicle are at a short distance from each other and that parts of the detection areas 641, 642 are overlapped. Conversely, FIG. 16 illustrates an example of the case where the distance between frames is a long distance. The disparity image Ip2 illustrated in FIG. 16 indicates that a detection area 651 in which the detection object is a pedestrian and a detection area 652 in which the detection object is a vehicle are at a long distance from each other and that parts of the detection areas 651, 652 are overlapped.
  • The overlapped-size calculating unit 524 is a functional unit that calculates the size (hereafter, sometimes referred to as overlap size) of the area where two detection areas are overlapped. The process to calculate the overlap size by the overlapped-size calculating unit 524 is explained later in detail with reference to FIGS. 19, 20, 22, and 23.
  • The third determining unit 525 is a functional unit that determines whether the overlap size calculated by the overlapped-size calculating unit 524 is equal to or more than a predetermined percentage of the size of any one of the two detection areas (a threshold with regard to the overlap percentage of a detection area). Here, the third determining unit 525 switches the predetermined percentage (threshold) depending on whether the distance between frames of the two detection areas is a short distance or a long distance, as illustrated in, for example, the following (Table 3); a sketch of this threshold switching follows the table. For example, as illustrated in (Table 3), when the distance between frames of the two detection areas is a long distance, the third determining unit 525 uses 15[%] of the size of any one of the two detection areas as the threshold with regard to the overlap percentage of the detection areas. The relation between the distance between frames and the threshold with regard to the overlap percentage of the detection areas illustrated in (Table 3) is an example, and a different relation may be defined. The determination process by the third determining unit 525 is described later in detail with reference to FIG. 19.
  • Table 3
    Threshold item | Threshold
    Overlap percentage of detection areas (when distance between frames is short) | 35[%] of any one
    Overlap percentage of detection areas (when distance between frames is long) | 15[%] of any one
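  • A minimal sketch of the threshold switching in Table 3 follows; interpreting "any one of the two detection areas" as "either area" and the helper names are assumptions.

```python
def overlap_ratio_threshold(is_short):
    """Threshold on the overlap percentage (Table 3): 35% of a detection area
    when the distance between frames is short, 15% when it is long."""
    return 0.35 if is_short else 0.15

def exceeds_overlap_threshold(overlap_size, size_a, size_b, is_short):
    """True when the overlap size is at least the threshold percentage of
    either of the two areas being compared."""
    threshold = overlap_ratio_threshold(is_short)
    return overlap_size >= threshold * size_a or overlap_size >= threshold * size_b
```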
  • The second discarding unit 526 is a functional unit that determines whether objects in two detection areas are to be discarded in accordance with a determination result regarding the overlap size by the third determining unit 525. The second discarding unit 526 includes the discard flag indicating whether the detection object is discarded in the recognized-area information and sends it to the tracking unit 503. The discard process by the second discarding unit 526 is described later in detail with reference to FIG. 19.
  • The area extracting unit 511, the frame generating unit 512, and the first discarding unit 513 of the clustering processing unit 502 and the first determining unit 521, the distance calculating unit 522, the second determining unit 523, the overlapped-size calculating unit 524, the third determining unit 525, and the second discarding unit 526 of the overlap processing unit 514, illustrated in FIG. 9, are implemented by using the FPGA 51 illustrated in FIG. 3. Furthermore, all or part of the area extracting unit 511, the frame generating unit 512, and the first discarding unit 513 of the clustering processing unit 502 and the first determining unit 521, the distance calculating unit 522, the second determining unit 523, the overlapped-size calculating unit 524, the third determining unit 525, and the second discarding unit 526 of the overlap processing unit 514 may be implemented when the CPU 52 executes programs stored in the ROM 53 instead of the FPGA 51 that is a hardware circuit.
  • The tracking unit 503 is a functional unit that performs a tracking process on a detection object whose discard flag is off on the basis of the recognized-area information that is information related to the object detected by the clustering processing unit 502. The tracking unit 503 outputs the recognized-area information including a result of a tracking process as recognition information to the vehicle control device 6 (see FIG. 3). The tracking unit 503 is implemented by using the FPGA 51 illustrated in FIG. 3. Furthermore, the tracking unit 503 may be implemented when the CPU 52 executes programs stored in the ROM 53 instead of the FPGA 51 that is a hardware circuit.
  • Furthermore, "the image processing apparatus" according to the present invention may be the clustering processing unit 502 or the recognition processing unit 5 including the clustering processing unit 502.
  • Furthermore, the function of each functional unit of the recognition processing unit 5 illustrated in FIG. 9 is illustrated as a concept, and this configuration is not a limitation. For example, multiple functional units that are illustrated as separate functional units in the recognition processing unit 5 illustrated in FIG. 9 may be configured as a single functional unit. Conversely, a function provided in a single functional unit in the recognition processing unit 5 illustrated in FIG. 9 may be divided and configured as multiple functional units.
  • [Operation of the object recognition apparatus]
  • Next, with reference to FIGS. 17 to 24, an explanation is given of a specific operation of the object recognition apparatus 1.
  • (Block matching processing of the disparity-value deriving unit)
  • FIG. 17 is a flowchart that illustrates an example of operation during block matching processing by the disparity-value deriving unit according to the embodiment. With reference to FIG. 17, an explanation is given of the flow of operation during the block matching processing by the disparity-value deriving unit 3 in the object recognition apparatus 1.
  • <Step S1-1>
  • The image acquiring unit 100b in the disparity-value deriving unit 3 captures an image of the object in the front by using the left camera (the imaging unit 10b), generates analog image signals, and obtains a luminance image that is an image based on the image signals. Thus, image signals targeted for the subsequent image processing are obtained. Then, a transition is made to Step S2-1.
  • <Step S1-2>
  • The image acquiring unit 100a in the disparity-value deriving unit 3 captures an image of the object in the front by using the right camera (the imaging unit 10a), generates analog image signals, and obtains a luminance image that is an image based on the image signals. Thus, image signals targeted for the subsequent image processing are obtained. Then, a transition is made to Step S2-2.
  • <Step S2-1>
  • The converting unit 200b in the disparity-value deriving unit 3 removes noise from the analog image signals obtained through image capturing by the imaging unit 10b and converts them into digital-format image data. This conversion into digital-format image data makes image processing possible on the image based on the image data on a pixel-by-pixel basis. Then, a transition is made to Step S3-1.
  • <Step S2-2>
  • The converting unit 200a in the disparity-value deriving unit 3 removes noise from the analog image signals obtained through image capturing by the imaging unit 10a and converts them into digital-format image data. This conversion into digital-format image data makes image processing possible on the image based on the image data on a pixel-by-pixel basis. Then, a transition is made to Step S3-2.
  • <Step S3-1>
  • The converting unit 200b outputs the image based on the digital-format image data, converted at Step S2-1, as the comparison image Ib for block matching processing. Thus, the target image to be compared so as to obtain a disparity value during block matching processing is obtained. Then, a transition is made to Step S4.
  • <Step S3-2>
  • The converting unit 200a outputs the image based on the digital-format image data, converted at Step S2-2, as the reference image Ia for block matching processing. Thus, the reference image to obtain a disparity value during block matching processing is obtained. Then, a transition is made to Step S4.
  • <Step S4>
  • The cost calculating unit 301 of the disparity-value calculation processing unit 300 in the disparity-value deriving unit 3 calculates and acquires the cost value C(p,d) of each candidate pixel q(x+d,y) for the corresponding pixel, on the basis of the luminance value of the reference pixel p(x,y) in the reference image Ia and the luminance value of each candidate pixel q(x+d,y) in the comparison image Ib; the candidate pixels are identified by shifting, by the shift amount d, from the pixel at the position corresponding to the reference pixel p(x,y) along the epipolar line EL. Specifically, during block matching processing, the cost calculating unit 301 calculates, as the cost value C, the degree of dissimilarity between the reference area pb, which is a predetermined area centered on the reference pixel p in the reference image Ia, and the candidate area qb (the same size as the reference area pb) centered on the candidate pixel q in the comparison image Ib. Then, a transition is made to Step S5.
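  • A minimal sketch of one block matching cost is given below; the sum of absolute differences (SAD) is used here only as a representative dissimilarity measure, and the 3x3 block size and the assumption that both blocks lie fully inside the images are illustrative.

```python
import numpy as np

def block_matching_cost(ref_img, cmp_img, x, y, d, block=3):
    """Dissimilarity cost C(p, d): sum of absolute differences between the block
    around reference pixel p = (x, y) in the reference image and the block around
    candidate pixel q = (x + d, y) on the same row (epipolar line) of the
    comparison image. Boundary handling is omitted for brevity."""
    half = block // 2
    ref_block = ref_img[y - half:y + half + 1, x - half:x + half + 1].astype(np.int32)
    cand_block = cmp_img[y - half:y + half + 1, x + d - half:x + d + half + 1].astype(np.int32)
    return int(np.abs(ref_block - cand_block).sum())
```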
  • <Step S5>
  • The determining unit 302 of the disparity-value calculation processing unit 300 in the disparity-value deriving unit 3 determines the shift amount d that corresponds to the minimum cost value C calculated by the cost calculating unit 301 to be the disparity value dp for the pixel in the reference image Ia targeted for the calculation of the cost value C. Then, on the basis of the disparity values dp determined by the determining unit 302, the first generating unit 303 of the disparity-value calculation processing unit 300 in the disparity-value deriving unit 3 generates a disparity image, i.e., an image in which the luminance value of each pixel of the reference image Ia is represented by the disparity value dp corresponding to that pixel. The first generating unit 303 outputs the generated disparity image to the recognition processing unit 5.
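  • The winner-takes-all selection of the disparity value can be pictured as follows, reusing the cost sketch above; the search range max_d and the omission of boundary handling are assumptions.

```python
import numpy as np

def winner_takes_all(ref_img, cmp_img, x, y, max_d, cost_fn):
    """Adopt, as the disparity dp of reference pixel (x, y), the shift amount d
    whose cost C(p, d) is minimum over the search range 0 <= d < max_d."""
    costs = [cost_fn(ref_img, cmp_img, x, y, d) for d in range(max_d)]
    return int(np.argmin(costs))

# usage (illustrative): dp = winner_takes_all(Ia, Ib, x, y, 64, block_matching_cost)
```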
  • Although block matching processing is explained above as an example of stereo matching processing, this is not a limitation; for example, the SGM (Semi-Global Matching) technique may be used instead.
  • (Object recognition process of the recognition processing unit)
  • FIG. 18 is a flowchart that illustrates an example of operation during the object recognition process by the recognition processing unit according to the embodiment. FIG. 19 is a flowchart that illustrates an example of operation during the overlap process by the recognition processing unit according to the embodiment. FIG. 20 is a diagram that illustrates an overlap size when the distance between frames is a short distance. FIG. 21 is a diagram that illustrates operation to discard a detection object when the distance between frames is a short distance. FIG. 22 is a diagram that illustrates an overlap size when the distance between frames is a long distance. FIG. 23 is a diagram that illustrates a case where there is no overlap size when the distance between frames is a long distance. FIG. 24 is a diagram that illustrates a case where a detection object is not discarded when the distance between frames is a long distance. With reference to FIGS. 18 to 24, an explanation is given of the flow of operation during the object recognition process by the recognition processing unit 5 in the object recognition apparatus 1.
  • <Step S11>
  • The second generating unit 501 receives the disparity image Ip from the disparity-value calculation processing unit 300, receives the reference image Ia from the disparity-value deriving unit 3, and generates various images, such as the V map VM, the U map UM, the U map UM_H, and the real U map RM. Then, a transition is made to Step S12.
  • <Step S12>
  • The area extracting unit 511 of the clustering processing unit 502 extracts an isolated area, i.e., a cluster of pixel values, from the real U map RM among the maps (images) output from the second generating unit 501. By using the V map VM, the U map UM, and the real U map RM, the area extracting unit 511 identifies the position of the object at each isolated area and its actual width, height, and depth in the reference image Ia or the disparity image Ip. Then, for each extracted isolated area, the area extracting unit 511 generates recognized-area information about that isolated area, which includes, for example, the identification information assigned by the labeling processing and the position and size of the isolated area in the reference image Ia, the V map VM, the U map UM, and the real U map RM. The area extracting unit 511 sends the generated recognized-area information to the frame generating unit 512. Then, a transition is made to Step S13.
  • <Step S13>
  • The frame generating unit 512 of the clustering processing unit 502 generates, with regard to the isolated area of an object extracted from the real U map RM by the area extracting unit 511, a frame for the detection area of the object that corresponds to the isolated area in the disparity image Ip (or the reference image Ia). The frame generating unit 512 includes the information on the frame generated on the disparity image Ip or the reference image Ia in the recognized-area information and sends it to the first discarding unit 513. Then, a transition is made to Step S14.
  • <Step S14>
  • The first discarding unit 513 of the clustering processing unit 502 determines the type of the detection object in a detection area on the basis of its actual size (width, height, depth), obtained from the size of the detection area framed by the frame generating unit 512, and discards it in accordance with the determined type. To discard a detection object, for example, the first discarding unit 513 includes a flag (discard flag) indicating discard in the recognized-area information on the detection object. The first discarding unit 513 includes the discard flag indicating whether the detection object is to be discarded in the recognized-area information and sends it to the overlap processing unit 514. Then, a transition is made to Step S15.
  • <Step S15>
  • When detection areas are overlapped, the overlap processing unit 514 performs an overlap process to determine whether objects in the detection areas are to be discarded on the basis of the size of the overlapped detection areas. The overlap process by the overlap processing unit 514 is explained with reference to FIG. 19.
  • «Step S151»
  • The first determining unit 521 of the overlap processing unit 514 identifies any two detection objects among the detection objects that correspond to pieces of recognized-area information received from the first discarding unit 513. Then, a transition is made to Step S152.
  • «Step S152»
  • The first determining unit 521 determines whether the detection areas of the two identified detection objects are overlapped. When the two detection areas are overlapped (Step S152: Yes), a transition is made to Step S153; when they are not overlapped (Step S152: No), the process returns to Step S151, where the first determining unit 521 identifies a different pair of detection objects.
  • «Step S153»
  • When the first determining unit 521 determines that the detection areas are overlapped, the distance calculating unit 522 of the overlap processing unit 514 calculates the distance between frames of the objects in the overlapped detection areas in a depth direction. Then, a transition is made to Step S154.
  • «Step S154»
  • The second determining unit 523 of the overlap processing unit 514 determines whether the distance between frames calculated by the distance calculating unit 522 is less than a predetermined threshold. When the distance between frames is less than the predetermined threshold, that is, when the distance between frames is a short distance (Step S154: Yes), a transition is made to Step S155, and when it is equal to or more than the predetermined threshold, that is, when the distance between frames is a long distance (Step S154: No), a transition is made to Step S159.
  • <<Step S155>>
  • When the second determining unit 523 determines that the distance between frames is a short distance, the overlapped-size calculating unit 524 of the overlap processing unit 514 calculates the overlap size of the area where the two detection areas are overlapped. For example, as illustrated in FIG. 20, when a detection area 661 and a detection area 662 are overlapped, the overlapped-size calculating unit 524 calculates the size of the overlapped area 663 as (height OL_H)×(width OL_W). Then, a transition is made to Step S156.
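  • A minimal axis-aligned rectangle-intersection sketch of this calculation follows; representing a detection area as an (x, y, width, height) tuple in image coordinates is an assumption.

```python
def overlap_size(area_a, area_b):
    """Size of the region where two detection areas overlap, i.e. OL_H x OL_W;
    each area is an (x, y, width, height) rectangle, and 0 is returned when the
    rectangles do not intersect."""
    ax, ay, aw, ah = area_a
    bx, by, bw, bh = area_b
    ol_w = min(ax + aw, bx + bw) - max(ax, bx)
    ol_h = min(ay + ah, by + bh) - max(ay, by)
    if ol_w <= 0 or ol_h <= 0:
        return 0
    return ol_w * ol_h
```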
  • <<Step S156>>
  • The third determining unit 525 of the overlap processing unit 514 determines whether the overlap size calculated by the overlapped-size calculating unit 524 is equal to or more than a predetermined percentage of the size of any one of the two detection areas (a threshold with regard to the overlap percentage of the detection areas). When the overlap size is equal to or more than the predetermined percentage of the size of any one of the two detection areas (Step S156: Yes), a transition is made to Step S157, and when it is less than the predetermined percentage (Step S156: No), a transition is made to Step S158.
  • <<Step S157>>
  • When both the detection objects are vehicles, the second discarding unit 526 of the overlap processing unit 514 does not discard the detection object in a short distance, which has a high degree of importance as the target for a tracking process, but discards the detection object in a long distance. The second discarding unit 526 includes the discard flag indicating non-discard in the recognized-area information on the detection object in a short distance, includes the discard flag indicating discard in the recognized-area information on the detection object in a long distance, and sends them to the tracking unit 503.
  • Conversely, when one of the two detection objects is a vehicle and the other one is not a vehicle and is an object whose size is smaller than a vehicle, the second discarding unit 526 does not discard the detection object that is a vehicle but discards the detection object that is not a vehicle and has a size smaller than vehicles. There is a high possibility that a detection object that is not a vehicle and has a size smaller than a vehicle is, for example, part of the vehicle that is improperly detected as a pedestrian and therefore it is discarded. For example, as illustrated in FIG. 21, when the distance between frames in the detection area indicated by a detection frame 671 and the detection area indicated by a detection frame 672 is a short distance, and when the detection object indicated by the detection frame 671 is a vehicle and the detection object indicated by the detection frame 672 is an object other than vehicles (a person in the vehicle in FIG. 21), the second discarding unit 526 does not discard the vehicle indicated by the detection frame 671 but discards the detection object indicated by the detection frame 672. The second discarding unit 526 includes the discard flag indicating non-discard in the recognized-area information on the detection object that is a vehicle, includes the discard flag indicating discard in the recognized-area information on the detection object that is not a vehicle, and sends it to the tracking unit 503.
  • <<Step S158>>
  • When the third determining unit 525 determines that the overlap size is smaller than the predetermined percentage of the size of any one of the two detection areas, the second discarding unit 526 determines that the objects in the detection areas have a high degree of importance as the target for a tracking process and does not discard any of the detection objects. The second discarding unit 526 includes the discard flag indicating non-discard in the recognized-area information on each of the two detection objects and sends it to the tracking unit 503.
  • <<Step S159>>
  • When the second determining unit 523 determines that the distance between frames is a long distance, the overlapped-size calculating unit 524 calculates a central area (an example of a partial area) of the detection area with the detection object in a short distance, included in the two detection areas. Specifically, as illustrated in FIG. 22, of the two detection areas 681, 682, the overlapped-size calculating unit 524 calculates, for the detection area 681 whose detection object is closer, a central area 681a that is centered in the horizontal direction (e.g., an area having 80[%] of the width in the horizontal direction). Although the overlapped-size calculating unit 524 calculates the central area of the detection area with the detection object in a short distance, this is not a limitation; for example, an area covering a predetermined percentage (e.g., 85[%]) of the width measured from the right edge of the detection area may be calculated instead. Then, a transition is made to Step S160.
  • <<Step S160>>
  • The overlapped-size calculating unit 524 calculates the overlap size of the area where the central area of the detection area with the detection object in a short distance is overlapped with the detection area with the detection object in a long distance. For example, as illustrated in FIG. 22, when the central area 681a of the detection area 681 is overlapped with the detection area 682, the overlapped-size calculating unit 524 calculates the size of the overlapped area 683 as (height OL_H1)×(width OL_W1). Then, a transition is made to Step S161.
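  • The long-distance variant of the calculation (Steps S159 and S160) can be sketched as below, reusing overlap_size from the earlier sketch; the (x, y, width, height) representation and the 0.8 default ratio (taken from the 80[%] example above) are assumptions.

```python
def central_area(detection_area, ratio=0.8):
    """Horizontally centred partial area of the closer detection area
    (e.g. 80% of its width, full height), as in Step S159."""
    x, y, w, h = detection_area
    new_w = w * ratio
    return (x + (w - new_w) / 2.0, y, new_w, h)

def long_distance_overlap_size(close_area, far_area, ratio=0.8):
    """Overlap size used when the distance between frames is long (Step S160):
    the far detection area is intersected with the central area of the close one."""
    return overlap_size(central_area(close_area, ratio), far_area)
```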
  • <<Step S161>>
  • The third determining unit 525 determines whether the overlap size calculated by the overlapped-size calculating unit 524 is equal to or more than a predetermined percentage (a threshold with regard to an overlap percentage) of the size of any one of the central area of the detection area with the detection object in a short distance and the detection area with the detection object in a long distance. When the overlap size is equal to or more than the predetermined percentage of the size of any one of them (Step S161: Yes), a transition is made to Step S162, and when it is less than the predetermined percentage (Step S161: No), a transition is made to Step S163.
  • <<Step S162>>
  • With respect to the two detection objects, the second discarding unit 526 does not discard the detection object in a short distance, which has a high degree of importance as the target for a tracking process, but discards the detection object in a long distance. In the example illustrated in FIG. 22, when the size (overlap size) of the overlapped area 683 is equal to or more than the predetermined percentage of the size of the central area 681a or of the detection area 682, the second discarding unit 526 does not discard the detection object in the detection area 681, which is in a short distance, but discards the detection object in the detection area 682, which is in a long distance. The second discarding unit 526 includes the discard flag indicating non-discard in the recognized-area information on the detection object in a short distance, includes the discard flag indicating discard in the recognized-area information on the detection object in a long distance, and sends them to the tracking unit 503.
  • <<Step S163>>
  • When the third determining unit 525 determines that the overlap size is less than the predetermined percentage of the size of any one of the central area of the detection area with the detection object in a short distance and the detection area with the detection object in a long distance, the second discarding unit 526 determines that the objects in both detection areas have a high degree of importance as targets for a tracking process and discards neither of the detection objects. That is, if it were simply determined whether the overlap size of the two detection areas is equal to or more than the predetermined percentage of the size of any one of the two detection areas, the detection object in a long distance might be discarded; however, because the overlap size is obtained with respect to the central area of the detection area in a short distance, a detection object (e.g., a pedestrian) in a long distance that should not be discarded is prevented from being discarded even though the detection areas are overlapped near the edge. The second discarding unit 526 includes the discard flag indicating non-discard in the recognized-area information on each of the two detection objects and sends them to the tracking unit 503.
  • For example, in the example illustrated in FIG. 23, although the two detection areas 681, 682a are overlapped, the central area 681a of the detection area 681 is not overlapped with the detection area 682a, and therefore the third determining unit 525 determines that the overlap size is less than the predetermined percentage of the size of any one of the central area of the detection area of the detection object in a short distance and the detection area of the detection object in a long distance. In this case, the second discarding unit 526 determines that the detection objects in both the detection areas 681, 682a have a high degree of importance as the target for a tracking process and does not discard any of the detection objects.
  • Furthermore, as illustrated in FIG. 24, for example, when the distance between frames in the detection area indicated by a detection frame 691 and the detection area indicated by the detection frame 692 is a long distance and the third determining unit 525 determines that the overlap size is less than the predetermined percentage of the size of any one of the central area of the detection area in the detection frame 691 with the detection object in a short distance and the detection area in the detection frame 692 with the detection object in a long distance, the second discarding unit 526 does not discard the detection objects indicated by the detection frames 691, 692.
  • After the process at Step S157, S158, S162, or S163 is finished, a transition is made to Step S16.
  • <Step S16>
  • The tracking unit 503 performs a tracking process on a detection object whose discard flag is off on the basis of the recognized-area information that is information about an object detected by the clustering processing unit 502. The tracking unit 503 outputs the recognized-area information including a result of the tracking process as recognition information to the vehicle control device 6 (see FIG. 3).
  • As described above, the object recognition process is conducted during the process at Steps S11 to S16 illustrated in FIG. 18, and, at Step S15, the overlap process is conducted during the process at Steps S151 to S163 illustrated in FIG. 19.
  • As described above, the distance between frames of the detection areas of two detected objects is calculated, the method of calculating the size of the overlapped area of the two detection areas is switched in accordance with the distance between frames, and whether each detection object is to be discarded is determined in accordance with that size. Thus, the discard process can be conducted properly. That is, according to the present embodiment, it is possible to discard objects that need to be discarded while refraining from discarding objects, such as objects other than vehicles, that do not need to be discarded.
  • Furthermore, when the distance between frames is a long distance, the central area of the detection area with the detection object in a short distance, included in the two detection areas, is calculated; the overlap size of the area where this central area is overlapped with the detection area with the detection object in a long distance is calculated; it is determined whether that overlap size is equal to or more than the predetermined percentage of the size of any one of the central area and the detection area with the detection object in a long distance; and when it is less than that, neither of the two detection objects is discarded. If it were simply determined whether the overlap size of the two detection areas is equal to or more than the predetermined percentage of the size of any one of the two detection areas, the detection object in a long distance might be discarded; however, because the overlap size is obtained with respect to the central area of the detection area in a short distance, a detection object (e.g., a pedestrian) in a long distance that should not be discarded is prevented from being discarded even though the detection areas are overlapped near the edge.
  • Furthermore, when the distance between frames is a short distance, the size of the area where the two detection areas are overlapped is calculated, and it is determined whether that size is equal to or more than the predetermined percentage of the size of any one of the two detection areas. When it is equal to or more than that percentage, and when one of the two detection objects is a vehicle while the other is not a vehicle and is smaller than a vehicle, the detection object that is a vehicle is not discarded and the detection object that is not a vehicle and is smaller than a vehicle is discarded. Thus, objects that are not vehicles and have a high possibility of being false detections can be discarded accurately.
  • Furthermore, according to the above-described embodiment, the cost value C is an evaluation value representing a degree of dissimilarity; however, it may be an evaluation value representing a degree of similarity. In this case, the shift amount d with which the cost value C, the degree of similarity, becomes maximum (extreme value) is the disparity value dp.
  • Furthermore, according to the above-described embodiment, although the object recognition apparatus 1 installed in an automobile that is the vehicle 70 is explained, this is not a limitation. For example, it may be installed in other examples of vehicles, such as bikes, bicycles, wheelchairs, or cultivators for agricultural use. Furthermore, it may be not only a vehicle that is an example of a movable body, but also a movable body such as a robot.
  • Furthermore, according to the above-described embodiment, when at least any of functional units of the disparity-value deriving unit 3 and the recognition processing unit 5 in the object recognition apparatus 1 is implemented by executing a program, the program is provided by being previously installed in a ROM, or the like. Furthermore, a configuration may be such that a program executed by the object recognition apparatus 1 according to the above-described embodiment is provided by being stored, in the form of a file that is installable and executable, in a recording medium readable by a computer, such as a CD-ROM, a flexible disk (FD), a CD-R (compact disk recordable), or a DVD (digital versatile disk). Furthermore, a configuration may be such that the program executed by the object recognition apparatus 1 according to the above-described embodiment is stored in a computer connected via a network such as the Internet and provided by being downloaded via the network. Moreover, a configuration may be such that the program executed by the object recognition apparatus 1 according to the above-described embodiment is provided or distributed via a network such as the Internet. Furthermore, the program executed by the object recognition apparatus 1 according to the above-described embodiment has a modular configuration that includes at least any of the above-described functional units, and in terms of actual hardware, the CPU 52 (the CPU 32) reads the program from the above-described ROM 53 (the ROM 33) and executes it so as to load and generate the above-described functional units in a main storage device (the RAM 54 (the RAM 34), or the like).
  • Reference Signs List
    • 1 OBJECT RECOGNITION APPARATUS
    • 2 MAIN BODY UNIT
    • 3 DISPARITY-VALUE DERIVING UNIT
    • 4 COMMUNICATION LINE
    • 5 RECOGNITION PROCESSING UNIT
    • 6 VEHICLE CONTROL DEVICE
    • 7 STEERING WHEEL
    • 8 BRAKE PEDAL
    • 10a, 10b IMAGING UNIT
    • 11a, 11b IMAGING LENS
    • 12a, 12b APERTURE
    • 13a, 13b IMAGE SENSOR
    • 20a, 20b SIGNAL CONVERTING UNIT
    • 21a, 21b CDS
    • 22a, 22b AGC
    • 23a, 23b ADC
    • 24a, 24b FRAME MEMORY
    • 30 IMAGE PROCESSING UNIT
    • 31 FPGA
    • 32 CPU
    • 33 ROM
    • 34 RAM
    • 35 I/F
    • 39 BUS LINE
    • 51 FPGA
    • 52 CPU
    • 53 ROM
    • 54 RAM
    • 55 I/F
    • 58 CAN I/F
    • 59 BUS LINE
    • 60 DEVICE CONTROL SYSTEM
    • 70 VEHICLE
    • 100a, 100b IMAGE ACQUIRING UNIT
    • 200a, 200b CONVERTING UNIT
    • 300 DISPARITY-VALUE CALCULATION PROCESSING UNIT
    • 301 COST CALCULATING UNIT
    • 302 DETERMINING UNIT
    • 303 FIRST GENERATING UNIT
    • 501 SECOND GENERATING UNIT
    • 502 CLUSTERING PROCESSING UNIT
    • 503 TRACKING UNIT
    • 511 AREA EXTRACTING UNIT
    • 512 FRAME GENERATING UNIT
    • 513 FIRST DISCARDING UNIT
    • 514 OVERLAP PROCESSING UNIT
    • 521 FIRST DETERMINING UNIT
    • 522 DISTANCE CALCULATING UNIT
    • 523 SECOND DETERMINING UNIT
    • 524 OVERLAP-SIZE CALCULATING UNIT
    • 525 THIRD DETERMINING UNIT
    • 526 SECOND DISCARDING UNIT
    • 600 ROAD SURFACE
    • 600a ROAD SURFACE PORTION
    • 601 POWER POLE
    • 601a POWER POLE PORTION
    • 602 VEHICLE
    • 602a VEHICLE PORTION
    • 611 LEFT GUARDRAIL
    • 611a to 611c LEFT GUARDRAIL PORTION
    • 612 RIGHT GUARDRAIL
    • 612a to 612c RIGHT GUARDRAIL PORTION
    • 613 VEHICLE
    • 613a to 613c VEHICLE PORTION
    • 614 VEHICLE
    • 614a to 614c VEHICLE PORTION
    • 621 to 624 ISOLATED AREA
    • 631 to 634 DETECTION AREA
    • 631a to 634a DETECTION FRAME
    • 641, 642, 651, 652 DETECTION AREA
    • 661, 662 DETECTION AREA
    • 663 OVERLAPPED AREA
    • 671, 672 DETECTION FRAME
    • 681 DETECTION AREA
    • 681a CENTRAL AREA
    • 682, 682a DETECTION AREA
    • 683 OVERLAPPED AREA
    • 691, 692 DETECTION FRAME
    • B BASE LENGTH
    • C COST VALUE
    • D SHIFT AMOUNT
    • DP DISPARITY VALUE
    • E OBJECT
    • EL EPIPOLAR LINE
    • F FOCAL LENGTH
    • Ia REFERENCE IMAGE
    • Ib COMPARISON IMAGE
    • Ip, Ip1, Ip2 DISPARITY IMAGE
    • OL_H, OL_H1 HEIGHT
    • OL_W, OL_W1 WIDTH
    • p REFERENCE PIXEL
    • pb REFERENCE AREA
    • q CANDIDATE PIXEL
    • qb CANDIDATE AREA
    • RM REAL U MAP
    • S, Sa, Sb POINT
    • UM U MAP
    • UM_H U MAP
    • VM V MAP
    • Z DISTANCE

Claims (12)

  1. An image processing apparatus comprising:
    a first calculating unit that calculates a distance between two objects, detected based on distance information on the objects, in a depth direction in detection areas of the objects;
    a second calculating unit that calculates an overlap size that is a size of an overlapped area with regard to the two detection areas by using a method that corresponds to the distance calculated by the first calculating unit; and
    a discarding unit that determines whether each of the two objects in the detection areas is to be discarded in accordance with the overlap size.
  2. The image processing apparatus according to claim 1, further comprising a determining unit that determines whether the distance calculated by the first calculating unit is included in a first distance range or included in a second distance range that is farther than the first distance range, wherein
    when the determining unit determines that the distance is included in the second distance range, the second calculating unit calculates, as the overlap size, a size of an area where a partial area of the detection area of a close object is overlapped with the detection area of a far object, included in the two detection areas, and
    when the overlap size is less than a predetermined percentage of a size of any one of the partial area and the detection area of the far object, the discarding unit discards neither the close object nor the far object.
  3. The image processing apparatus according to claim 2, wherein when the determining unit determines that the distance is included in the second distance range, the second calculating unit obtains, as the partial area, a predetermined central area in a horizontal direction of the detection area of the close object and calculates, as the overlap size, a size of an area where the central area is overlapped with the detection area of the far object.
  4. The image processing apparatus according to claim 2 or 3, wherein when the overlap size is equal to or more than the predetermined percentage of the size of any one of the partial area and the detection area of the far object, the discarding unit does not discard the close object but discards the far object.
  5. The image processing apparatus according to any one of claims 2 to 4, wherein
    when the determining unit determines that the distance is included in the first distance range, the second calculating unit calculates, as the overlap size, a size of an area where the two detection areas are overlapped, and
    when the overlap size is equal to or more than a predetermined percentage of a size of any one of the two detection areas and when one of the two detection areas represents a vehicle and another one represents an object other than a vehicle, the discarding unit does not discard an object that is a vehicle but discards an object that is other than a vehicle.
  6. The image processing apparatus according to claim 5, wherein when the overlap size is equal to or more than a predetermined percentage of a size of any one of the two detection areas and when both the detection areas represent a vehicle, the discarding unit does not discard the close object, included in the objects indicated by the two detection areas, but discards the far object.
  7. The image processing apparatus according to claim 5 or 6, wherein when the overlap size is less than a predetermined percentage of a size of any one of the two detection areas, the discarding unit discards neither the close object nor the far object, included in the objects indicated by the two detection areas.
  8. The image processing apparatus according to any one of claims 1 to 7, further comprising:
    an extracting unit that extracts an isolated area indicating an object based on the distance information; and
    a determining unit that determines the detection area by generating a frame for the isolated area.
  9. An object recognition apparatus comprising:
    a first imaging unit that obtains a first captured image by capturing an image of an object;
    a second imaging unit that is located at a position different from a position of the first imaging unit and that obtains a second captured image by capturing an image of the object;
    a generating unit that generates the distance information in accordance with a disparity value obtained from the first captured image and the second captured image with respect to the object; and
    the image processing apparatus according to any one of claims 1 to 8.
  10. A device control system comprising:
    the object recognition apparatus according to claim 9; and
    a control device that controls a control target based on information about an object detected by the object recognition apparatus.
  11. An image processing method comprising:
    a first calculation step of calculating a distance between two objects, detected based on distance information on the objects, in a depth direction in detection areas of the objects;
    a second calculation step of calculating an overlap size that is a size of an overlapped area with regard to the two detection areas by using a method that corresponds to the distance calculated; and
    a discarding step of determining whether each of the two objects in the detection areas is to be discarded in accordance with the overlap size.
  12. A program causing a computer to function as:
    a first calculating unit that calculates a distance between two objects, detected based on distance information on the objects, in a depth direction in detection areas of the objects;
    a second calculating unit that calculates an overlap size that is a size of an overlapped area with regard to the two detection areas by using a method that corresponds to the distance calculated by the first calculating unit; and
    a discarding unit that determines whether each of the two objects in the detection areas is to be discarded in accordance with the overlap size.
EP16894575.6A 2016-03-15 2016-12-08 Image processing apparatus, object recognition apparatus, device control system, image processing method, and program Pending EP3432291A4 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016051447 2016-03-15
PCT/JP2016/086640 WO2017158958A1 (en) 2016-03-15 2016-12-08 Image processing apparatus, object recognition apparatus, device control system, image processing method, and program

Publications (2)

Publication Number Publication Date
EP3432291A1 true EP3432291A1 (en) 2019-01-23
EP3432291A4 EP3432291A4 (en) 2019-03-27

Family

ID=59852209

Family Applications (1)

Application Number Title Priority Date Filing Date
EP16894575.6A Pending EP3432291A4 (en) 2016-03-15 2016-12-08 Image processing apparatus, object recognition apparatus, device control system, image processing method, and program

Country Status (4)

Country Link
US (1) US10937181B2 (en)
EP (1) EP3432291A4 (en)
JP (1) JP6795027B2 (en)
WO (1) WO2017158958A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017047282A1 (en) * 2015-09-15 2017-03-23 株式会社リコー Image processing device, object recognition device, device control system, image processing method, and program
JP6601506B2 (en) * 2015-12-28 2019-11-06 株式会社リコー Image processing apparatus, object recognition apparatus, device control system, image processing method, image processing program, and vehicle
US10351133B1 (en) 2016-04-27 2019-07-16 State Farm Mutual Automobile Insurance Company Systems and methods for reconstruction of a vehicular crash
JP6950170B2 (en) * 2016-11-30 2021-10-13 株式会社リコー Information processing device, imaging device, device control system, information processing method, and program
CN107980138B (en) * 2016-12-28 2021-08-17 达闼机器人有限公司 False alarm obstacle detection method and device
CN110728710B (en) * 2018-07-16 2023-10-27 株式会社理光 Visual mileage calculation method, device and computer readable storage medium
US11568554B2 (en) * 2019-10-25 2023-01-31 7-Eleven, Inc. Contour-based detection of closely spaced objects
CN109800684B (en) * 2018-12-29 2022-06-21 上海依图网络科技有限公司 Method and device for determining object in video
CN109740518B (en) * 2018-12-29 2022-09-27 上海依图网络科技有限公司 Method and device for determining object in video
CN113631944A (en) * 2019-03-27 2021-11-09 松下知识产权经营株式会社 Distance measuring device and image generating method
JP2020190438A (en) 2019-05-20 2020-11-26 株式会社リコー Measuring device and measuring system
US11430134B2 (en) * 2019-09-03 2022-08-30 Nvidia Corporation Hardware-based optical flow acceleration
JP7408337B2 (en) * 2019-10-10 2024-01-05 キヤノン株式会社 Image processing method and image processing device
WO2021100115A1 (en) * 2019-11-19 2021-05-27 日本電気株式会社 Object detection device, object detection method, and program
CN111857501A (en) * 2020-07-03 2020-10-30 Oppo广东移动通信有限公司 Information display method and device and storage medium
US11343485B1 (en) * 2020-08-24 2022-05-24 Ambarella International Lp Virtual horizontal stereo camera

Family Cites Families (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3367170B2 (en) * 1993-11-05 2003-01-14 株式会社豊田中央研究所 Obstacle detection device
US7227526B2 (en) * 2000-07-24 2007-06-05 Gesturetek, Inc. Video-based image control system
JP3739693B2 (en) * 2001-11-09 2006-01-25 本田技研工業株式会社 Image recognition device
US8744122B2 (en) * 2008-10-22 2014-06-03 Sri International System and method for object detection from a moving platform
JP5316805B2 (en) 2009-03-16 2013-10-16 株式会社リコー In-vehicle camera device image adjustment device and in-vehicle camera device
JP5376313B2 (en) 2009-09-03 2013-12-25 株式会社リコー Image processing apparatus and image pickup apparatus
JP5664152B2 (en) 2009-12-25 2015-02-04 株式会社リコー Imaging device, in-vehicle imaging system, and object identification device
US8861842B2 (en) * 2010-02-05 2014-10-14 Sri International Method and apparatus for real-time pedestrian detection for urban driving
JP5371845B2 (en) * 2010-03-18 2013-12-18 富士フイルム株式会社 Imaging apparatus, display control method thereof, and three-dimensional information acquisition apparatus
US8824779B1 (en) * 2011-12-20 2014-09-02 Christopher Charles Smyth Apparatus and method for determining eye gaze from stereo-optic views
RU2582853C2 (en) * 2012-06-29 2016-04-27 Общество с ограниченной ответственностью "Системы Компьютерного зрения" Device for determining distance and speed of objects based on stereo approach
JP5870871B2 (en) * 2012-08-03 2016-03-01 株式会社デンソー Image processing apparatus and vehicle control system using the image processing apparatus
US20140139635A1 (en) * 2012-09-17 2014-05-22 Nec Laboratories America, Inc. Real-time monocular structure from motion
JP2014115978A (en) * 2012-11-19 2014-06-26 Ricoh Co Ltd Mobile object recognition device, notification apparatus using the device, mobile object recognition program for use in the mobile object recognition device, and mobile object with the mobile object recognition device
JP2014146267A (en) 2013-01-30 2014-08-14 Toyota Motor Corp Pedestrian detection device and driving support device
JP6467798B2 (en) 2013-07-25 2019-02-13 株式会社リコー Image processing apparatus, three-dimensional object detection method, three-dimensional object detection program, and moving object control system
JP6398347B2 (en) 2013-08-15 2018-10-03 株式会社リコー Image processing apparatus, recognition object detection method, recognition object detection program, and moving object control system
JP6174975B2 (en) * 2013-11-14 2017-08-02 クラリオン株式会社 Ambient environment recognition device
JP6417886B2 (en) 2013-12-12 2018-11-07 株式会社リコー Parallax value deriving device, moving body, robot, parallax value production method, and program
JP6340850B2 (en) 2014-03-18 2018-06-13 株式会社リコー Three-dimensional object detection device, three-dimensional object detection method, three-dimensional object detection program, and mobile device control system
JP6519262B2 (en) 2014-04-10 2019-05-29 株式会社リコー Three-dimensional object detection device, three-dimensional object detection method, three-dimensional object detection program, and mobile device control system
JP2016001170A (en) 2014-05-19 2016-01-07 株式会社リコー Processing unit, processing program and processing method
JP2016001464A (en) 2014-05-19 2016-01-07 株式会社リコー Processor, processing system, processing program, and processing method
JP6190758B2 (en) * 2014-05-21 2017-08-30 本田技研工業株式会社 Object recognition device and vehicle
JP6417729B2 (en) * 2014-06-09 2018-11-07 株式会社リコー Image processing apparatus, image processing method, program, parallax data production method, device control system
JP6550881B2 (en) * 2014-07-14 2019-07-31 株式会社リコー Three-dimensional object detection device, three-dimensional object detection method, three-dimensional object detection program, and mobile device control system
US20160019429A1 (en) 2014-07-17 2016-01-21 Tomoko Ishigaki Image processing apparatus, solid object detection method, solid object detection program, and moving object control system
US9726604B2 (en) * 2014-11-12 2017-08-08 Ricoh Company, Ltd. Adhering detection apparatus, adhering substance detection method, storage medium, and device control system for controlling vehicle-mounted devices
US9794543B2 (en) * 2015-03-02 2017-10-17 Ricoh Company, Ltd. Information processing apparatus, image capturing apparatus, control system applicable to moveable apparatus, information processing method, and storage medium of program of method
WO2016168378A1 (en) * 2015-04-13 2016-10-20 Gerard Dirk Smits Machine vision for ego-motion, segmenting, and classifying objects
JP2016206774A (en) * 2015-04-17 2016-12-08 トヨタ自動車株式会社 Three-dimensional object detection apparatus and three-dimensional object detection method
WO2017047282A1 (en) * 2015-09-15 2017-03-23 株式会社リコー Image processing device, object recognition device, device control system, image processing method, and program
JP6601506B2 (en) 2015-12-28 2019-11-06 株式会社リコー Image processing apparatus, object recognition apparatus, device control system, image processing method, image processing program, and vehicle
EP3422289A4 (en) * 2016-02-23 2019-02-27 Ricoh Company, Ltd. Image processing device, imaging device, mobile entity apparatus control system, image processing method, and program
US11087553B2 (en) * 2019-01-04 2021-08-10 University Of Maryland, College Park Interactive mixed reality platform utilizing geotagged social media

Also Published As

Publication number Publication date
EP3432291A4 (en) 2019-03-27
JPWO2017158958A1 (en) 2018-10-04
JP6795027B2 (en) 2020-12-02
US20190012798A1 (en) 2019-01-10
US10937181B2 (en) 2021-03-02
WO2017158958A1 (en) 2017-09-21

Similar Documents

Publication Publication Date Title
EP3432291A1 (en) Image processing apparatus, object recognition apparatus, device control system, image processing method, and program
EP3416132B1 (en) Image processing device, object recognition device, device control system, and image processing method and program
WO2018058356A1 (en) Method and system for vehicle anti-collision pre-warning based on binocular stereo vision
US11691585B2 (en) Image processing apparatus, imaging device, moving body device control system, image processing method, and program product
CN109997148B (en) Information processing apparatus, imaging apparatus, device control system, moving object, information processing method, and computer-readable recording medium
EP3392830B1 (en) Image processing device, object recognition device, apparatus control system, image processing method and program
EP3385904A1 (en) Image processing device, object recognition device, device conrol system, image processing method, and program
US10546383B2 (en) Image processing device, object recognizing device, device control system, image processing method, and computer-readable medium
EP3352134B1 (en) Image processing device, object recognition device, device control system, image processing method, and program
EP3432264A1 (en) Image processing device, image pickup device, mobile-body apparatus control system, image processing method, and program
EP2913998B1 (en) Disparity value deriving device, equipment control system, movable apparatus, robot, disparity value deriving method, and computer-readable storage medium
JP6992356B2 (en) Information processing equipment, image pickup equipment, equipment control system, mobile body, information processing method and program
JP6972798B2 (en) Information processing device, image pickup device, device control system, mobile body, information processing method, and program
EP3336754A2 (en) Information processing apparatus, photographing apparatus, moving object control system, moving object, information processing method, and program
EP3327696B1 (en) Information processing apparatus, imaging device, device control system, mobile body, information processing method, and program
EP3540643A1 (en) Image processing apparatus and image processing method
EP3287948B1 (en) Image processing apparatus, moving body apparatus control system, image processing method, and program
WO2018097269A1 (en) Information processing device, imaging device, equipment control system, mobile object, information processing method, and computer-readable recording medium
JP6828332B2 (en) Image processing equipment, object recognition equipment, equipment control systems, image processing methods and programs

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20180912

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

A4 Supplementary search report drawn up and despatched

Effective date: 20190221

RIC1 Information provided on ipc code assigned before grant

Ipc: B60W 30/08 20120101ALI20190215BHEP

Ipc: G06T 7/00 20170101ALI20190215BHEP

Ipc: G01C 3/06 20060101ALI20190215BHEP

Ipc: G08G 1/16 20060101AFI20190215BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20210212

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS