WO2018097269A1 - Information processing device, imaging device, equipment control system, mobile object, information processing method, and computer-readable recording medium - Google Patents

Information processing device, imaging device, equipment control system, mobile object, information processing method, and computer-readable recording medium

Info

Publication number
WO2018097269A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
processing
unit
detection
region
Prior art date
Application number
PCT/JP2017/042302
Other languages
English (en)
Inventor
Seiya Amano
Soichiro Yokota
Sukehiro KIMURA
Jun Yoshida
Yohichiroh Ohbayashi
Shintaroh Kida
Hiroki Kubozono
Daisuke Okada
Tabito Suzuki
Sadao Takahashi
Original Assignee
Ricoh Company, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2017177897A (granted as JP7206583B2)
Application filed by Ricoh Company, Ltd.
Priority to US16/347,127 (published as US20200074212A1)
Priority to EP17812277.6A (published as EP3545464A1)
Priority to CN201780072352.XA (published as CN109997148B)
Publication of WO2018097269A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle

Definitions

  • the present invention relates to an information processing device, an imaging device, an equipment control system, a mobile object, an information processing method, and a computer-readable recording medium.
  • from the viewpoint of automobile safety, body structures of automobiles and the like have been developed in view of how to save a pedestrian and how to protect an occupant in a case in which the pedestrian collides with the automobile.
  • information processing techniques and image processing techniques have also advanced, so that a technique of rapidly detecting a person and an automobile has been developed.
  • an automobile has been developed that prevents collision by automatically braking before the automobile collides with an object.
  • a distance to an object such as a person or another car needs to be precisely measured. For this reason, distance measurement using a millimeter-wave radar or a laser radar, distance measurement using a stereo camera, and the like have been put to practical use.
  • a parallax image is generated based on a parallax of each object projected in a taken luminance image, and the object is recognized by integrating pixel groups having similar parallax values.
  • Patent Literature 1 discloses, for a technique of detecting an object using a distance image generated through stereo image processing, a technique of suppressing erroneous detection in which, when a group belonging to the same object is present among a plurality of detected objects, the group is erroneously regarded as a plurality of divided small objects (for example, two pedestrians) although it should be regarded as one object and detected as a single object (for example, one preceding vehicle).
  • on the other hand, an object such as a vehicle and another object adjacent to it may be detected as one object.
  • an information processing device comprising: a first generation unit configured to generate first information in which a horizontal direction position and a depth direction position of an object are associated with each other from information in which a vertical direction position, the horizontal direction position, and the depth direction position of the object are associated with each other; a first detection unit configured to detect one region indicating the object based on the first information; a second generation unit configured to generate, from the information in which the vertical direction position, the horizontal direction position, and the depth direction position of the object are associated with each other, second information having separation performance higher than separation performance of the first information in which the horizontal direction position and the depth direction position of the object are associated with each other; a second detection unit configured to detect a plurality of regions indicating objects based on the second information; and an output unit configured to associate the one region detected based on the first information with the regions detected based on the second information, and to output the one region and the regions that are associated with each other.
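  • For orientation only, the following minimal Python sketch (hypothetical names, bin sizes, and thresholds; not the implementation of the present invention) illustrates the flow summarized above: first information associating horizontal and depth positions is generated from the measurement information, second information with higher separation performance is generated by keeping only points at or above a certain vertical position, one region and a plurality of regions are detected from the respective maps, and the results are output in association with each other.

```python
import numpy as np
from scipy.ndimage import label, find_objects

def generate_map(points, min_vertical=None):
    """Collapse (vertical, horizontal, depth) measurements into a 2D frequency map
    over (depth, horizontal). Keeping only points whose vertical position is at
    least min_vertical mimics the second information (higher separation)."""
    if min_vertical is not None:
        points = points[points[:, 0] >= min_vertical]
    hist, _, _ = np.histogram2d(points[:, 2], points[:, 1],
                                bins=(64, 64), range=[[0, 80], [-10, 10]])
    return hist

def detect_regions(freq_map, thresh=1):
    """Bounding boxes of connected groups of cells whose frequency reaches thresh."""
    labels, _ = label(freq_map >= thresh)
    return [(s[0].start, s[0].stop, s[1].start, s[1].stop) for s in find_objects(labels)]

# Dummy measurement: columns are (vertical, horizontal, depth) positions.
points = np.column_stack([np.random.rand(500) * 2.0,
                          np.random.randn(500) * 3.0,
                          np.random.rand(500) * 60.0])
first_info = generate_map(points)                       # first generation unit
second_info = generate_map(points, min_vertical=1.0)    # second generation unit
inclusive_regions = detect_regions(first_info)          # first detection unit
partial_regions = detect_regions(second_info)           # second detection unit
output = {"inclusive": inclusive_regions, "partial": partial_regions}  # output unit
```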
  • Fig. 1A is a side view of a vehicle on which an equipment control system according to a first embodiment is mounted.
  • Fig. 1B is a front view of the vehicle illustrated in Fig. 1A.
  • Fig. 2 is a diagram illustrating an example of a hardware configuration of an object recognition device according to the first embodiment.
  • Fig. 3 is a diagram illustrating an example of a functional block configuration of the object recognition device according to the first embodiment.
  • Fig. 4 is a diagram illustrating an example of a functional block configuration of a recognition processing unit of the object recognition device according to the first embodiment.
  • Fig. 5A is a diagram illustrating an example of the reference image.
  • Fig. 5B is a diagram illustrating an example of a Vmap generated from the parallax image and the reference image.
  • Fig. 6A is a diagram illustrating an example of the reference image.
  • Fig. 6B is a diagram illustrating an example of a Umap generated from the reference image and the parallax image.
  • Fig. 6C is a diagram illustrating another example of a Umap generated from the reference image and the parallax image.
  • Fig. 7A is a diagram illustrating an example of a real Umap generated from the Umap.
  • Fig. 7B is a diagram illustrating an example of a real Umap generated from the Umap.
  • Fig. 8 is a diagram for explaining a method of sorting a classification of the object.
  • Fig. 9 is a flowchart illustrating an example of processing performed by a clustering processing unit.
  • Fig. 10A is a diagram for explaining processing of creating a detection frame.
  • Fig. 10B is a diagram for explaining processing of creating a detection frame.
  • Fig. 11 is a flowchart illustrating an example of basic detection processing.
  • Fig. 12 is a flowchart illustrating an example of integration detection processing.
  • Fig. 13 is a flowchart illustrating an example of processing of selecting an object region to be output.
  • Fig. 14 is a flowchart illustrating an example of processing of detecting a background in a detection frame.
  • Fig. 15A is a diagram for explaining background detection processing in a case of a detection frame for an object region such as a vehicle.
  • Fig. 15B is a diagram for explaining background detection processing in a case of a detection frame for an object region such as a vehicle.
  • Fig. 15C is a diagram for explaining background detection processing in a case of a detection frame for an object region such as a vehicle.
  • Fig. 16A is a diagram for explaining background detection processing in a case of a detection frame for an object region in which two groups such as pedestrians are coupled.
  • Fig. 16B is a diagram for explaining background detection processing in a case of a detection frame for an object region in which two groups such as pedestrians are coupled.
  • Fig. 16C is a diagram for explaining background detection processing in a case of a detection frame for an object region in which two groups such as pedestrians are coupled.
  • Fig. 17 is a flowchart illustrating an example of rejection processing.
  • Fig. 18A is a diagram for explaining rejection processing based on background information.
  • Fig. 18B is a diagram for explaining rejection processing based on background information.
  • Fig. 19 is a schematic diagram illustrating a schematic configuration of an equipment control system according to a second embodiment.
  • Fig. 20 is a schematic block diagram of an imaging unit and an analyzing unit.
  • Fig. 21 is a diagram illustrating a positional relation between a subject and an imaging lens of each camera unit.
  • Fig. 22 is a diagram for schematically explaining a function of the analyzing unit.
  • Fig. 23 is a diagram illustrating an example of a function of an object detection processing unit.
  • Fig. 24 is a diagram illustrating an example of a function of a road surface detection processing unit.
  • Fig. 25 is a diagram illustrating an example of a taken image.
  • Fig. 26 is a diagram illustrating an example of a High Umap.
  • Fig. 27 is a diagram illustrating an example of a Standard Umap.
  • Fig. 28 is a diagram illustrating an example of a specific function of the clustering processing unit.
  • Fig. 29 is a diagram illustrating an example of a taken image.
  • Fig. 30 is a diagram illustrating an example of an isolated region.
  • Fig. 31 is a diagram illustrating a region on a parallax image corresponding to the isolated region illustrated in Fig. 30.
  • Fig. 32 is a diagram for explaining rejection processing.
  • Fig. 33 is a flowchart illustrating an example of processing performed by the clustering processing unit.
  • Fig. 34 is a flowchart illustrating an example of isolated region detection processing.
  • Fig. 35 is a flowchart illustrating an example of basic detection processing.
  • Fig. 36 is a diagram illustrating an example after binarization processing is performed.
  • Fig. 37 is a flowchart illustrating an example of separation detection processing.
  • Fig. 38 is a flowchart illustrating an example of detection processing for integration.
  • Fig. 39A is a table illustrating an example of conditions for sorting detection results.
  • Fig. 39B is a table illustrating an example of conditions for sorting detection results.
  • Fig. 39C is a table illustrating an example of conditions for sorting detection results.
  • Fig. 40 is a flowchart illustrating an example of final determination processing.
  • Fig. 41A is a diagram illustrating an example of a condition for rejection.
  • Fig. 41B is a table illustrating an example of a condition for rejection.
  • Fig. 42 is a table illustrating an example of conditions for merge processing.
  • Fig. 43 is a diagram illustrating an example of correction processing.
  • Fig. 44 is a flowchart illustrating an example of integration correction processing.
  • Fig. 45 is a diagram illustrating a circumscribing rectangle of pixels having a parallax within an inclusive frame.
  • Fig. 46 is a flowchart illustrating a procedure of correction processing of a partial frame.
  • Fig. 47 is a table illustrating an example of a condition for determining whether an object is to be a target of coupling processing.
  • Fig. 48 is a flowchart illustrating a procedure of correction processing for short distance.
  • Fig. 49 is a flowchart illustrating a procedure of correction processing for long distance.
  • Fig. 50 is a diagram illustrating an example of a height map.
  • Fig. 51 is a diagram illustrating an example of a region of interest.
  • Fig. 52 is a diagram illustrating an example of a height profile.
  • Fig. 53 is a diagram illustrating an example of a height profile.
  • Fig. 54 is a flowchart illustrating a procedure of coupling determination processing.
  • the accompanying drawings are intended to depict exemplary embodiments of the present invention and should not be interpreted to limit the scope thereof. Identical or similar reference numerals designate identical or similar components throughout the various drawings.
  • Figs. 1A and 1B are diagrams illustrating an example in which an equipment control system according to the present embodiment is mounted on a vehicle.
  • With reference to Figs. 1A and 1B, the following describes a vehicle 70 on which an equipment control system 60 according to the present embodiment is mounted.
  • Fig. 1A is a side view of the vehicle 70 on which the equipment control system 60 is mounted
  • Fig. 1B is a front view of the vehicle 70.
  • the equipment control system 60 is mounted on the vehicle 70 as an automobile.
  • the equipment control system 60 includes the object recognition device 1 installed in a compartment as a sitting space of the vehicle 70, a vehicle control device 6 (control device), a steering wheel 7, and a brake pedal 8.
  • the object recognition device 1 has an imaging function for imaging a traveling direction of the vehicle 70, and is installed on an inner side of a front window in the vicinity of a rearview mirror of the vehicle 70, for example. Details about a configuration and an operation of the object recognition device 1 will be described later.
  • the object recognition device 1 includes a main body unit 2, and an imaging unit 10a and an imaging unit 10b fixed to the main body unit 2.
  • the imaging units 10a and 10b are fixed to the main body unit 2 so as to take an image of a subject in the traveling direction of the vehicle 70.
  • the vehicle control device 6 is an electronic control unit (ECU) that executes various vehicle control based on recognition information received from the object recognition device 1.
  • ECU electronic control unit
  • the vehicle control device 6 executes steering control for controlling a steering system (control object) including the steering wheel 7 to avoid an obstacle, brake control for controlling the brake pedal 8 (control object) to decelerate and stop the vehicle 70, or the like based on the recognition information received from the object recognition device 1.
  • safety in driving of the vehicle 70 can be improved by executing vehicle control such as steering control or brake control.
  • the object recognition device 1 is assumed to take an image of the front of the vehicle 70, but the embodiment is not limited thereto. That is, the object recognition device 1 may be installed to take an image of the back or a side of the vehicle 70. In this case, the object recognition device 1 can detect positions of a following vehicle and person in the rear of the vehicle 70, another vehicle and person on a side of the vehicle 70, or the like.
  • the vehicle control device 6 can detect danger at the time when the vehicle 70 changes lanes or merges into a lane, and execute vehicle control as described above.
  • Fig. 2 is a diagram illustrating an example of a hardware configuration of the object recognition device according to the present embodiment. With reference to Fig. 2, the following describes the hardware configuration of the object recognition device 1.
  • the object recognition device 1 includes a parallax value deriving unit 3 and a recognition processing unit 5 in the main body unit 2.
  • the parallax value deriving unit 3 derives a parallax value dp indicating a parallax for an object E from a plurality of images obtained by imaging the object E, and outputs a parallax image indicating the parallax value dp for each pixel (an example of "measurement information in which a position in a vertical direction of a detecting target, a position in a horizontal direction thereof, and a position in a depth direction thereof are associated with each other").
  • the recognition processing unit 5 performs object recognition processing and the like on an object such as a person and a vehicle projected in a taken image based on the parallax image output from the parallax value deriving unit 3, and outputs, to the vehicle control device 6, recognition information as information indicating a result of object recognition processing.
  • the parallax value deriving unit 3 includes the imaging unit 10a, the imaging unit 10b, a signal conversion unit 20a, a signal conversion unit 20b, and an image processing unit 30.
  • the imaging unit 10a is a processing unit that images a forward subject and generates an analog image signal.
  • the imaging unit 10a includes an imaging lens 11a, a diaphragm 12a, and an image sensor 13a.
  • the imaging lens 11a is an optical element for refracting incident light to form an image of the object on the image sensor 13a.
  • the diaphragm 12a is a member that adjusts a quantity of light input to the image sensor 13a by blocking part of light passed through the imaging lens 11a.
  • the image sensor 13a is a semiconductor element that converts light entering the imaging lens 11a and passing through the diaphragm 12a into an electrical analog image signal.
  • the image sensor 13a is implemented by a solid imaging element such as a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS).
  • CCD charge coupled device
  • CMOS complementary metal oxide semiconductor
  • the imaging unit 10b is a processing unit that images a forward subject and generates an analog image signal.
  • the imaging unit 10b includes an imaging lens 11b, a diaphragm 12b, and an image sensor 13b. Functions of the imaging lens 11b, the diaphragm 12b, and the image sensor 13b are the same as the functions of the imaging lens 11a, the diaphragm 12a, and the image sensor 13a described above, respectively.
  • the imaging lens 11a and the imaging lens 11b are installed such that lens surfaces thereof are positioned on the same plane so that the left and right cameras can take an image under the same condition.
  • the signal conversion unit 20a is a processing unit that converts the analog image signal generated by the imaging unit 10a into digital image data.
  • the signal conversion unit 20a includes a correlated double sampling (CDS) 21a, an auto gain control (AGC) 22a, an analog digital converter (ADC) 23a, and a frame memory 24a.
  • CDS correlated double sampling
  • AGC auto gain control
  • ADC analog digital converter
  • the CDS 21a removes noise from the analog image signal generated by the image sensor 13a through correlated double sampling, a differential filter in the horizontal direction, a smoothing filter in the vertical direction, or the like.
  • the AGC 22a performs gain control for controlling strength of the analog image signal from which the noise is removed by the CDS 21a.
  • the ADC 23a converts the analog image signal on which gain control is performed by the AGC 22a into digital image data.
  • the frame memory 24a stores the image data converted by the ADC 23a.
  • the signal conversion unit 20b is a processing unit that converts the analog image signal generated by the imaging unit 10b into digital image data.
  • the signal conversion unit 20b includes a CDS 21b, an AGC 22b, an ADC 23b, and a frame memory 24b. Functions of the CDS 21b, the AGC 22b, the ADC 23b, and the frame memory 24b are the same as the functions of the CDS 21a, the AGC 22a, the ADC 23a, and the frame memory 24a described above, respectively.
  • the image processing unit 30 is a device that performs image processing on the image data converted by the signal conversion unit 20a and the signal conversion unit 20b.
  • the image processing unit 30 includes a field programmable gate array (FPGA) 31, a central processing unit (CPU) 32, a read only memory (ROM) 33, a random access memory (RAM) 34, an interface (I/F) 35, and a bus line 39.
  • FPGA field programmable gate array
  • CPU central processing unit
  • ROM read only memory
  • RAM random access memory
  • I/F interface
  • the FPGA 31 is an integrated circuit, and herein performs processing of deriving the parallax value dp for an image based on the image data.
  • the CPU 32 controls each function of the parallax value deriving unit 3.
  • the ROM 33 stores a computer program for image processing executed by the CPU 32 for controlling each function of the parallax value deriving unit 3.
  • the RAM 34 is used as a work area of the CPU 32.
  • the I/F 35 is an interface for communicating with an I/F 55 of the recognition processing unit 5 via a communication line 4.
  • the bus line 39 is an address bus, a data bus, and the like that connect the FPGA 31, the CPU 32, the ROM 33, the RAM 34, and the I/F 35 to each other in a communicable manner.
  • the image processing unit 30 is assumed to include the FPGA 31 as an integrated circuit for deriving the parallax value dp, but the embodiment is not limited thereto.
  • the integrated circuit may be an application specific integrated circuit (ASIC) and the like.
  • the recognition processing unit 5 includes an FPGA 51, a CPU 52, a ROM 53, a RAM 54, the I/F 55, a controller area network (CAN) I/F 58, and a bus line 59.
  • the FPGA 51 is an integrated circuit, and herein performs object recognition processing on the object based on the parallax image received from the image processing unit 30.
  • the CPU 52 controls each function of the recognition processing unit 5.
  • the ROM 53 stores a computer program for object recognition processing executed by the CPU 52 for performing object recognition processing of the recognition processing unit 5.
  • the RAM 54 is used as a work area of the CPU 52.
  • the I/F 55 is an interface for performing data communication with the I/F 35 of the image processing unit 30 via the communication line 4.
  • the CAN I/F 58 is an interface for communicating with an external controller (for example, the vehicle control device 6 illustrated in Fig. 2).
  • the bus line 59 connected to a CAN and the like of an automobile is an address bus, a data bus, and the like that connect the FPGA 51, the CPU 52, the ROM 53, the RAM 54, the I/F 55, and the CAN I/F 58 in a communicable manner as illustrated in Fig. 2.
  • the FPGA 51 performs object recognition processing and the like for the object such as a person and a vehicle projected in the taken image based on the parallax image in accordance with a command from the CPU 52 of the recognition processing unit 5.
  • Each computer program described above may be recorded and distributed in a computer-readable recording medium as an installable or executable file.
  • the recording medium include a compact disc read only memory (CD-ROM) or a secure digital (SD) memory card.
  • Fig. 3 is a diagram illustrating an example of a functional block configuration of the object recognition device according to the present embodiment. First, the following describes a configuration and operation of the functional block of the object recognition device 1 with reference to Fig. 3.
  • the object recognition device 1 includes the parallax value deriving unit 3 and the recognition processing unit 5 as illustrated in Fig. 3.
  • the parallax value deriving unit 3 includes an image acquisition unit 100a (first imaging module), an image acquisition unit 100b (second imaging module), conversion units 200a and 200b, and a parallax value arithmetic processing unit (generation unit) 300.
  • At least some of the functional units of the object recognition device 1 may be implemented by the FPGA 31 or the FPGA 51, or may be implemented when a computer program is executed by the CPU 32 or the CPU 52.
  • the image acquisition unit 100a and the image acquisition unit 100b are functional units that obtain a luminance image from images taken by the right camera (imaging unit 10a) and the left camera (imaging unit 10b), respectively.
  • the conversion unit 200a is a functional unit that removes noise from image data of the luminance image obtained by the image acquisition unit 100a and converts the image data into digital image data to be output.
  • the conversion unit 200a may be implemented by the signal conversion unit 20a illustrated in Fig. 2.
  • the conversion unit 200b is a functional unit that removes noise from image data of the luminance image obtained by the image acquisition unit 100b and converts the image data into digital image data to be output.
  • the conversion unit 200b may be implemented by the signal conversion unit 20b illustrated in Fig. 2.
  • the luminance image taken by the image acquisition unit 100a serving as the right camera (imaging unit 10a) is assumed to be image data of a reference image Ia (hereinafter, simply referred to as a reference image Ia), and the luminance image taken by the image acquisition unit 100b serving as the left camera (imaging unit 10b) is assumed to be image data of a comparative image Ib (hereinafter, simply referred to as a comparative image Ib). That is, the conversion units 200a and 200b output the reference image Ia and the comparative image Ib, respectively, based on the two luminance images output from the image acquisition units 100a and 100b.
  • the parallax value arithmetic processing unit 300 derives the parallax value for each pixel of the reference image Ia based on the reference image Ia and the comparative image Ib received from the conversion units 200a and 200b, and generates a parallax image in which each pixel of the reference image Ia is associated with the parallax value.
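  • The text above does not spell out how the parallax value is derived from the reference image Ia and the comparative image Ib; block matching is a common approach, sketched below purely as an assumption (the SAD cost, window size, and search range are illustrative and not taken from this publication).

```python
import numpy as np

def compute_parallax(reference, comparative, max_dp=16, block=5):
    """Block-matching sketch: for each reference pixel, choose the parallax dp
    that minimizes the sum of absolute differences (SAD) between a small window
    in the reference image and the window shifted by dp in the comparative image."""
    h, w = reference.shape
    half = block // 2
    parallax = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half + max_dp, w - half):
            ref_win = reference[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(ref_win - comparative[y - half:y + half + 1,
                                                  x - dp - half:x - dp + half + 1]).sum()
                     for dp in range(max_dp)]
            parallax[y, x] = int(np.argmin(costs))
    return parallax

ref = np.random.rand(40, 120).astype(np.float32)
comp = np.roll(ref, -3, axis=1)                  # comparative image; true parallax is 3
dp_map = compute_parallax(ref, comp, max_dp=8)
```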
  • Fig. 4 is a diagram illustrating an example of a functional block configuration of the recognition processing unit of the object recognition device according to the present embodiment. With reference to Fig. 4, the following describes a configuration and operation of the functional block of the recognition processing unit 5.
  • the recognition processing unit 5 includes a second generation unit 500, a clustering processing unit 510, and a tracking unit 530.
  • the second generation unit 500 is a functional unit that receives the parallax image input from the parallax value arithmetic processing unit 300, receives the reference image Ia input from the parallax value deriving unit 3, and generates a V-Disparity map, a U-Disparity map, a Real U-Disparity map, and the like.
  • the V-Disparity map is an example of "information in which a position in the vertical direction is associated with a position in the depth direction”.
  • the U-Disparity map and the Real U-Disparity map are examples of "information in which a position in the horizontal direction is associated with a position in the depth direction".
  • the second generation unit 500 includes a third generation unit (movement surface estimation unit) 501, a fourth generation unit 502, and a fifth generation unit 503.
  • the following describes a configuration and operation of the second generation unit 500 of the recognition processing unit 5.
  • Fig. 5A is a diagram illustrating an example of the reference image
  • Fig. 5B is a diagram illustrating an example of a Vmap generated from the reference image and the parallax image
  • Fig. 6A is a diagram illustrating an example of the reference image
  • Figs. 6B and 6C are diagrams illustrating examples of a Umap generated from the reference image and the parallax image
  • Figs. 7A and 7B are diagrams illustrating examples of a real Umap generated from the Umap.
  • the third generation unit 501 is a functional unit that generates a Vmap VM as the V-Disparity map illustrated in Fig. 5B for detecting a road surface (movement surface) from the parallax image input from the parallax value arithmetic processing unit 300.
  • the V-Disparity map is a two-dimensional histogram indicating frequency distribution of the parallax value dp assuming that the vertical axis indicates the y-axis (vertical direction) of the reference image Ia (Fig. 5A), and the horizontal axis indicates the parallax value dp of the parallax image or a distance in the depth direction.
  • a road surface 600, a utility pole 601, and a car 602 are projected.
  • the road surface 600 in the reference image Ia corresponds to a road surface part 600a in the Vmap VM
  • the utility pole 601 corresponds to a utility pole part 601a
  • the car 602 corresponds to a car part 602a.
  • the third generation unit 501 makes linear approximation of a position estimated to be the road surface from the generated Vmap VM. Approximation can be made with one straight line when the road surface is flat, but when an inclination of the road surface is variable, linear approximation needs to be accurately made by dividing a section in the Vmap VM. As linear approximation, Hough transform, a method of least squares, or the like as a well-known technique can be utilized.
  • the utility pole part 601a and the car part 602a as clusters positioned above the detected road surface part 600a correspond to the utility pole 601 and the car 602 as objects on the road surface 600, respectively.
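  • As a non-authoritative sketch of the Vmap-based road surface estimation described above, the code below accumulates a V-disparity histogram from a parallax image and fits a single least-squares line; as noted above, the map may instead need to be divided into sections when the road inclination varies. All parameter values are illustrative.

```python
import numpy as np

def build_vmap(parallax, dp_bins=128):
    """Vmap VM: row index = image y, column index = parallax value dp,
    cell value = frequency of that parallax within the image row."""
    h, _ = parallax.shape
    vmap = np.zeros((h, dp_bins), dtype=np.int32)
    for y in range(h):
        dps = parallax[y]
        dps = dps[(dps > 0) & (dps < dp_bins)].astype(int)
        np.add.at(vmap[y], dps, 1)
    return vmap

def fit_road_surface(vmap, min_freq=5):
    """Least-squares line dp = a*y + b through frequently occupied Vmap cells,
    a simple stand-in for the linear approximation of the road surface part."""
    ys, dps = np.where(vmap >= min_freq)
    a, b = np.polyfit(ys, dps, deg=1)
    return a, b

parallax = (np.random.rand(480, 640) * 100).astype(np.float32)  # dummy parallax image
a, b = fit_road_surface(build_vmap(parallax))
```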
  • "f" is a value obtained by converting a focal length of the imaging units 10a and 10b into the same unit as a unit of (y'-y0).
  • BF is a value obtained by multiplying a base length B by a focal length f of the imaging units 10a and 10b
  • offset is a parallax in a case of photographing an infinite object.
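  • Using the quantities just introduced, the distance in the depth direction is commonly recovered from a parallax value through Z = BF / (dp - offset); the numerical values in this sketch are placeholders rather than values from this publication.

```python
def parallax_to_depth(dp, BF=30000.0, offset=0.5):
    """Depth distance Z from parallax dp, with BF = base length B x focal length f
    and offset the parallax observed for an object at infinity (both placeholders)."""
    if dp <= offset:
        return float("inf")              # at or beyond the measurable range
    return BF / (dp - offset)

print(parallax_to_depth(30.5))           # 30000 / 30.0 = 1000.0 (arbitrary units)
```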
  • the fourth generation unit 502 is a functional unit that generates a Umap UM (second frequency image) as the U-Disparity map illustrated in Fig. 6B for recognizing the object by utilizing only information positioned above (an example of "equal to or higher than the first height") the road surface detected in the Vmap VM, that is, utilizing information on the parallax image corresponding to a left guardrail 611, a right guardrail 612, a car 613, and a car 614 in the reference image Ia illustrated in Fig. 6A.
  • a Umap UM second frequency image
  • the Umap UM is a two-dimensional histogram indicating frequency distribution of the parallax value dp assuming that the horizontal axis indicates the x-axis (horizontal direction) of the reference image Ia, and the vertical axis indicates the parallax value dp of the parallax image or a distance in the depth direction.
  • the left guardrail 611 in the reference image Ia illustrated in Fig. 6A corresponds to a left guardrail part 611a in the Umap UM
  • the right guardrail 612 corresponds to a right guardrail part 612a
  • the car 613 corresponds to a car part 613a
  • the car 614 corresponds to a car part 614a.
  • the fourth generation unit 502 generates a height Umap UM_H as an example of the U-Disparity map illustrated in Fig. 6C by utilizing only information positioned above the road surface detected in the Vmap VM, that is, utilizing information on the parallax image corresponding to the left guardrail 611, the right guardrail 612, the car 613, and the car 614 in the reference image Ia illustrated in Fig. 6A.
  • the height Umap UM_H as an example of the U-Disparity map is an image in which the horizontal axis is assumed to be the x-axis of the reference image Ia, the vertical axis is assumed to indicate the parallax value dp of the parallax image, and a pixel value is assumed to be the height of the object. In this case, a value of the height of the object is the largest value of the height from the road surface.
  • the left guardrail 611 in the reference image Ia illustrated in Fig. 6A corresponds to a left guardrail part 611b in the height Umap UM_H
  • the right guardrail 612 corresponds to a right guardrail part 612b
  • the car 613 corresponds to a car part 613b
  • the car 614 corresponds to a car part 614b.
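  • A minimal sketch (hypothetical array layouts and bin counts) of how the Umap UM and the height Umap UM_H described above can be accumulated, using only parallax points located above the estimated road surface and keeping, per cell, the largest height from the road surface.

```python
import numpy as np

def build_umaps(parallax, height_above_road, dp_bins=128):
    """Umap UM: frequency per (dp, x). Height Umap UM_H: largest height above the
    road per (dp, x). Only pixels above the road surface contribute."""
    h, w = parallax.shape
    umap = np.zeros((dp_bins, w), dtype=np.int32)
    umap_h = np.zeros((dp_bins, w), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            dp = int(parallax[y, x])
            ht = height_above_road[y, x]
            if 0 < dp < dp_bins and ht > 0.0:        # keep only points above the road
                umap[dp, x] += 1
                umap_h[dp, x] = max(umap_h[dp, x], ht)
    return umap, umap_h

parallax = (np.random.rand(120, 160) * 100).astype(np.int32)   # dummy parallax image
height = np.random.rand(120, 160) * 2.0                        # dummy height above road
um, um_h = build_umaps(parallax, height)
```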
  • the fifth generation unit 503 generates, from the height Umap UM_H generated by the fourth generation unit 502, a real height Umap RM_H as an example of the Real U-Disparity map illustrated in Fig. 7A obtained by converting the horizontal axis into an actual distance.
  • the fifth generation unit 503 also generates, from the Umap UM generated by the fourth generation unit 502, a real Umap RM as an example of the Real U-Disparity map illustrated in Fig. 7B obtained by converting the horizontal axis into an actual distance through the same processing as the processing described above.
  • each of the real height Umap RM_H and the real Umap RM is a two-dimensional histogram assuming that the horizontal axis indicates an actual distance in a direction (horizontal direction) from the imaging unit 10b (left camera) to the imaging unit 10a (right camera), and the vertical axis indicates the parallax value dp of the parallax image (or a distance in the depth direction converted from the parallax value dp).
  • the car part 613b corresponds to a car part 613c
  • the car part 614b corresponds to a car part 614c.
  • the fifth generation unit 503 generates the real height Umap RM_H and the real Umap RM corresponding to an overhead view by not performing thinning out when the object is at a distant place (the parallax value dp is small) because the object is small and an amount of parallax information and resolution of distance are small, and by largely thinning out pixels when the object is at a short-distance place because the object is projected to be large and the amount of parallax information and the resolution of distance are large.
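  • The conversion of the horizontal axis into an actual distance is not detailed above; one common sketch uses the pinhole relation X = Z * (x - cx) / f, with Z recovered from the parallax, and re-bins each column into lateral bins of fixed width. The camera parameters and bin width below are placeholders, and the distance-dependent thinning is approximated simply by the fixed bin width.

```python
import numpy as np

def to_real_umap(umap, BF_cm=30000.0, offset=0.5, f_px=800.0, cx=320,
                 bin_cm=10, half_width_cm=1000):
    """Re-bins a U-disparity map so the horizontal axis is an actual lateral
    distance (bin_cm per column), approximating the overhead real Umap RM."""
    dp_bins, w = umap.shape
    n_cols = 2 * half_width_cm // bin_cm
    real = np.zeros((dp_bins, n_cols), dtype=np.int32)
    for dp in range(1, dp_bins):
        if dp <= offset:
            continue
        z_cm = BF_cm / (dp - offset)                 # depth recovered from parallax
        for x in range(w):
            if umap[dp, x] == 0:
                continue
            lateral_cm = z_cm * (x - cx) / f_px      # pinhole model: X = Z*(x-cx)/f
            col = int((lateral_cm + half_width_cm) / bin_cm)
            if 0 <= col < n_cols:
                real[dp, col] += umap[dp, x]
    return real

real_umap = to_real_umap(np.random.randint(0, 3, size=(128, 640)))
```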
  • a cluster (object region) of pixel values can be extracted from the real height Umap RM_H or the real Umap RM.
  • the width of a rectangle surrounding the cluster corresponds to the width of the extracted object, and the height thereof corresponds to the depth of the extracted object.
  • the fifth generation unit 503 does not necessarily generate the real height Umap RM_H from the height Umap UM_H. Alternatively, the fifth generation unit 503 can generate the real height Umap RM_H directly from the parallax image.
  • the second generation unit 500 can specify the position in the X-axis direction and the width (xmin, xmax) in the parallax image and the reference image Ia of the object from the generated height Umap UM_H or real height Umap RM_H.
  • the second generation unit 500 can specify an actual depth of the object from information of the height of the object (dmin, dmax) in the generated height Umap UM_H or real height Umap RM_H.
  • the second generation unit 500 can also specify an actual size in the x-axis direction and the y-axis direction of the object from the width in the x-axis direction (xmin, xmax) and the height in the y-axis direction (ymin, ymax) of the object specified in the parallax image, and the parallax value dp corresponding thereto.
  • the second generation unit 500 can specify the position of the object in the reference image Ia and the actual width, height, and depth thereof by utilizing the Vmap VM, the height Umap UM_H, and the real height Umap RM_H.
  • the position of the object in the reference image Ia is specified, so that the position thereof in the parallax image is also determined, and the second generation unit 500 can specify the distance to the object.
  • Fig. 8 is a diagram for explaining a method of sorting a classification of the object.
  • the second generation unit 500 can specify the classification of the object (object type) using a table illustrated in Fig. 8 based on an actual size (the width, the height, and the depth) specified for the object. For example, in a case in which the width of the object is 1300 [mm], the height thereof is 1800 [mm], and the depth thereof is 2000 [mm], the object can be specified as an "ordinary car”.
  • Information associating the width, the height, and the depth with the classification of the object (object type) as illustrated in Fig. 8 may be stored as a table in the RAM 54 and the like.
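  • In the spirit of the table of Fig. 8, the classification can be sketched as a simple size lookup. The thresholds below are illustrative placeholders only, chosen so that the example above (width 1300 mm, height 1800 mm, depth 2000 mm) falls into the "ordinary car" class; they are not the values of the actual table.

```python
def classify_object(width_mm, height_mm, depth_mm):
    """Sorts an object into a coarse class from its actual size; threshold values
    are placeholders, not those of the table illustrated in Fig. 8."""
    if width_mm < 1200 and height_mm < 2200 and depth_mm < 1200:
        return "pedestrian"
    if width_mm < 2500 and height_mm < 2500 and depth_mm < 5500:
        return "ordinary car"
    return "large vehicle / other"

print(classify_object(1300, 1800, 2000))   # -> "ordinary car", matching the example above
```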
  • the clustering processing unit 510 illustrated in Fig. 4 is a functional unit that detects the object such as a vehicle based on each map input from the second generation unit 500. As illustrated in Fig. 4, the clustering processing unit 510 includes a basic detection unit 511, a separation detection unit 512, an integration detection unit 513, a selection unit 514, a frame creation unit 515, a background detection unit 516, and a rejection unit 517.
  • the basic detection unit 511 performs basic detection processing for detecting the depth, the width, and the like of the object such as a vehicle based on the Real U-Disparity map as a high-resolution map.
  • the following describes an example in which the basic detection unit 511 performs detection using the Real U-Disparity map.
  • the basic detection unit 511 may perform detection using the U-Disparity map.
  • the basic detection unit 511 may perform processing of converting the x-coordinate in the U-Disparity map into an actual distance and the like in the lateral direction (horizontal direction).
  • in the basic detection processing, if the road surface estimated based on the Vmap VM is lower than the actual road surface, for example, detection accuracy for the object region is deteriorated.
  • the separation detection unit 512 performs separation detection processing for detecting the depth, the width, and the like of the object such as a vehicle using, as an example of a high position map, a map using a parallax point of which the height from the road surface is equal to or larger than a predetermined value ("second height") among parallax points included in the Real U-Disparity map.
  • the separation detection unit 512 may separate the same object into a plurality of object regions to be detected in some cases.
  • the integration detection unit 513 uses, as an example of a low-resolution map, a small real Umap obtained by reducing the Real U-Disparity map by thinning out the pixels, for example, to perform integration detection processing for detecting the depth, the width, and the like of the object such as a vehicle.
  • the number of pixels in the small real Umap is smaller than that of the real Umap, so that resolution of the small real Umap is assumed to be low.
  • the integration detection unit 513 may perform detection using a map obtained by reducing the U-Disparity map.
  • the integration detection unit 513 uses the small real Umap of which the resolution is relatively low, so that the integration detection unit 513 may detect a plurality of objects as the same object in some cases.
  • detection performance for the object can be improved by basically using the high-resolution map for object detection, and also using the high position map having higher separation performance and the low-resolution map that can integrally detect the same object.
  • the selection unit 514 selects an object not to be rejected from among the objects detected by the basic detection unit 511, the separation detection unit 512, and the integration detection unit 513.
  • rejection means processing of excluding the object from processing at a later stage (tracking processing and the like).
  • the frame creation unit 515 creates a frame (detection frame) in a region (recognition region) in a parallax image Ip (or the reference image Ia) corresponding to a region of the object selected by the selection unit 514.
  • the frame means information of a rectangle surrounding the object as information indicating the position and the size of the object, for example, information of coordinates of corners of the rectangle and the height and the width of the rectangle.
  • the background detection unit 516 detects, in the detection frame created by the frame creation unit 515, a background of the object corresponding to the detection frame.
  • the rejection unit 517 rejects the object corresponding to the detection frame in which a background satisfying a predetermined condition is detected by the background detection unit 516. Background detection and rejection based thereon are preferably performed, but are not necessarily performed.
  • the tracking unit 530 is a functional unit that executes tracking processing as processing of tracking the object based on recognition region information as information about the object recognized by the clustering processing unit 510.
  • the recognition region information means information about the object recognized by the clustering processing unit 510, and includes information such as the position and the size of the recognized object in the V-Disparity map, the U-Disparity map, and the Real U-Disparity map, an identification number of labeling processing described later, and a rejection flag, for example.
  • Fig. 9 is a flowchart illustrating an example of processing performed by the clustering processing unit 510.
  • the basic detection unit 511 of the clustering processing unit 510 performs basic detection processing for detecting a region of the object from the real Umap RM.
  • in the basic detection processing, a cluster of parallax points on the real Umap RM is detected.
  • in the real Umap RM, the number of pixels is relatively large, so that the resolution of distance is relatively high, and parallax information of objects positioned above the road surface is utilized.
  • the object region is detected with relatively stable accuracy.
  • when the road surface estimated based on the Vmap VM is lower than the actual road surface, or when the number of parallax points of the object as a detection target is small, for example, detection accuracy for the object region is deteriorated. Details about the basic detection processing will be described later.
  • the separation detection unit 512 of the clustering processing unit 510 performs separation detection processing for detecting a region of the object using a parallax point of which the height from the road surface is equal to or larger than a predetermined value among parallax points included in the real Umap RM (Step S12).
  • in the separation detection processing, a cluster of parallax points whose height from the road surface is equal to or larger than the predetermined value is detected from among the parallax points included in the real Umap RM.
  • accordingly, an object region in which adjacent objects are correctly separated from each other can be detected, because the detection is not influenced by an object whose height from the road surface is relatively low.
  • on the other hand, the same object may be detected as being separated into a plurality of object regions in some cases. Details about the separation detection processing will be described later.
  • the integration detection unit 513 of the clustering processing unit 510 performs integration detection processing for detecting the region of the object using the small real Umap as an image obtained by thinning out the pixels from the real Umap RM (Step S13).
  • the small real Umap may be created by thinning out the pixels from the real Umap RM so that the width of one pixel corresponds to about 10 cm, for example.
  • the pixel may be simply extracted from the real Umap RM, or a value of the pixel in the small real Umap may be determined based on a value of a pixel within a predetermined range from the pixel extracted from the real Umap RM.
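  • One simple way to realize the thinning described above is block-wise reduction of the real Umap so that one small-real-Umap pixel covers a wider lateral range (for example, about 10 cm); the reduction factor in this sketch is an illustrative choice.

```python
import numpy as np

def make_small_real_umap(real_umap, factor=2):
    """Shrinks the real Umap RM by summing factor x factor blocks, so that each
    pixel of the small real Umap spans a wider range than in the original map."""
    h, w = real_umap.shape
    h2, w2 = h - h % factor, w - w % factor
    blocks = real_umap[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor)
    return blocks.sum(axis=(1, 3))

small = make_small_real_umap(np.random.randint(0, 3, size=(64, 200)))
```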
  • in a case of an object for which the number of parallax points is small, the same object is relatively unlikely to be detected as being separated into a plurality of object regions.
  • the resolution of distance is relatively low, so that a plurality of objects adjacent to each other may be detected as the same object, for example. Details about the integration detection processing will be described later.
  • the basic detection processing, the separation detection processing, and the integration detection processing described above may be performed in any order, or may be performed in parallel.
  • the selection unit 514 of the clustering processing unit 510 selects the object region to be output to the frame creation unit 515 from among object regions detected through the "basic detection processing", the “separation detection processing", and the “integration detection processing” described above (Step S14). Details about processing of selecting the object region to be output to the frame creation unit 515 will be described later.
  • Figs. 10A and 10B are diagrams for explaining the processing of creating the detection frame
  • Fig. 10A is a diagram illustrating an example of the real Umap RM
  • Fig. 10B is a diagram illustrating an example of the parallax image Ip (the reference image Ia) based on the real Umap RM.
  • the background detection unit 516 of the clustering processing unit 510 detects a background in a detection frame corresponding to the object region detected through the "integration detection processing" among created detection frames (Step S16). Details about the processing of detecting the background in the detection frame will be described later.
  • the rejection unit 517 of the clustering processing unit 510 performs rejection processing (Step S17). Details about the rejection processing will be described later.
  • Fig. 11 is a flowchart illustrating an example of the basic detection processing.
  • the basic detection unit 511 performs 8-neighbor labeling processing for giving the same ID to pixels that are continuous in a vertical, horizontal, or oblique direction for a parallax point as a pixel having a pixel value (frequency of the parallax) equal to or larger than a predetermined value in the real Umap RM.
  • Well-known labeling processing can be utilized.
  • the basic detection unit 511 sets a rectangle circumscribing each pixel group (each isolated region) to which the same ID is given (Step S202).
  • the basic detection unit 511 rejects the rectangle having a size equal to or smaller than a predetermined value (Step S203). This is because the rectangle having a size equal to or smaller than the predetermined value can be determined to be noise.
  • the basic detection unit 511 may also reject a rectangle when, for example, the average of the pixel values (parallax frequency) of the real Umap RM within the area of the rectangle is smaller than a predetermined value.
  • the rectangle circumscribing each isolated region is detected as the object region.
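  • A rough sketch of the labelling, rectangle-setting, and noise-rejection steps just described, assuming 8-neighbour connected-component labelling and illustrative thresholds; it is not the exact implementation of the basic detection unit 511.

```python
import numpy as np
from scipy.ndimage import label, find_objects

def basic_detection(real_umap, freq_thresh=1, min_size=3):
    """8-neighbour labelling of parallax points, a circumscribing rectangle per
    labelled group, and rejection of rectangles small enough to be noise."""
    structure = np.ones((3, 3), dtype=int)               # 8-neighbour connectivity
    labels, _ = label(real_umap >= freq_thresh, structure=structure)
    regions = []
    for sl in find_objects(labels):
        height = sl[0].stop - sl[0].start
        width = sl[1].stop - sl[1].start
        if height >= min_size and width >= min_size:     # reject tiny rectangles (noise)
            regions.append((sl[0].start, sl[1].start, height, width))
    return regions

object_regions = basic_detection(np.random.randint(0, 2, size=(64, 80)))
```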
  • in the basic detection processing, it is sufficient that the region indicating the object is detected based on the parallax image.
  • the basic detection processing may be performed using a well-known technique.
  • the separation detection processing is significantly different from the "basic detection processing" described above in that it uses only the parallax points whose height from the road surface is equal to or larger than the predetermined value among the parallax points included in the real Umap RM, instead of using all parallax points included in the real Umap RM.
  • Other points may be the same as those of the "basic detection processing" described above.
  • a break of parallax points equal to or smaller than a predetermined value (for example, corresponding to one pixel) in the horizontal direction in the real Umap RM is possibly caused by noise, so that such parallax points may be regarded as continuous.
  • Fig. 12 is a flowchart illustrating an example of the integration detection processing.
  • the integration detection unit 513 performs 4-neighbor labeling processing for giving the same ID to pixels (parallax points) that are continuous in the vertical direction (depth direction) or the lateral direction (horizontal direction) on the small real Umap.
  • the 8-neighbor labeling processing may be used.
  • the integration detection unit 513 sets a rectangle circumscribing each pixel group (each isolated region) to which the same ID is given (Step S302).
  • the integration detection unit 513 extracts the object such as a vehicle (Step S303).
  • the integration detection unit 513 extracts the region of the object such as a vehicle based on the width, the depth, frequency of the parallax, and the like of each isolated region. Accordingly, the rectangle circumscribing each isolated region is detected as the object region.
  • Fig. 13 is a flowchart illustrating an example of processing of selecting the object region to be output.
  • the selection unit 514 rejects an object region not present on a lane on which a host vehicle is traveling among the object regions detected through the integration detection processing. For example, when the position of the object region is outside a predetermined range from a forward direction of the host vehicle, the selection unit 514 rejects the object region. Accordingly, for an object that may hamper the host vehicle from traveling, the object region detected through the integration detection processing is output.
  • the predetermined range may be set to be relatively wide corresponding to the distance from the host vehicle.
  • the selection unit 514 determines whether the object region detected through the integration detection processing is overlapped with one object region detected through the basic detection processing in a certain degree in the real Umap RM (Step S402). For example, if a value obtained by dividing an area of a region in which the object region detected through the integration detection processing is overlapped with the object region detected through the basic detection processing in the real Umap RM by an area of the object region detected through the basic detection processing is equal to or larger than a predetermined threshold, it is determined that they are overlapped with each other in a certain degree.
  • the selection unit 514 determines whether the size of the object region as a result of the integration detection processing is smaller than the object region as a result of the basic detection processing (Step S403). If the size is determined to be smaller (YES at Step S403), the object region detected through the basic detection processing and the object region detected through the separation detection processing are output to the frame creation unit 515 (Step S404), and the process is ended. That is, a result of the basic detection processing as an inclusive detection result and a result of the separation detection processing as a partial detection result are output while being associated with each other as information indicating the same object.
  • the selection unit 514 determines whether a plurality of object regions detected through the separation detection processing are present in the one object region detected through the basic detection processing (Step S405).
  • the selection unit 514 outputs the object region detected through the integration detection processing and the object regions detected through the separation detection processing to the frame creation unit 515 (Step S406), and the process is ended. That is, the result of the integration detection processing as an inclusive detection result and the result of the separation detection processing as a partial detection result are output while being associated with each other as information indicating the same object. This is because the result of the integration detection processing is considered to be most reliable as information indicating one object, and the result of the separation detection processing is considered to be most reliable as information indicating a plurality of objects when there are a plurality of object regions detected through the separation detection processing in one object region detected through the basic detection processing.
  • the selection unit 514 outputs the object region detected through the integration detection processing and the one object region detected through the basic detection processing to the frame creation unit 515 (Step S407), and the process is ended. That is, the result of the integration detection processing as an inclusive detection result and the result of the basic detection processing as a partial detection result are output while being associated with each other as information indicating the same object.
  • the result of the basic detection processing and the result of the separation detection processing can be equally treated when a plurality of object regions detected through the separation detection processing are not present in one object region detected through the basic detection processing, so that the result of the integration detection processing is considered to be most reliable as information indicating one object, and the result of the basic detection processing is considered to be most reliable as information indicating a plurality of objects.
  • the selection unit 514 outputs only the object region detected through the integration detection processing to the frame creation unit 515 (Step S408), and the process is ended. That is, the result of the integration detection processing as an inclusive detection result and a result indicating that the object region is not detected as a partial detection result are output while being associated with each other as information indicating the same object. This is because the result of the integration detection processing that is hardly influenced by noise is considered to be most reliable as information indicating a rough position of the object when the object region detected through the integration detection processing is not overlapped with one object region detected through the basic detection processing in a certain degree.
  • The processing subsequent to Step S402 is executed for each object region detected through the integration detection processing.
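  • The branching of Steps S402 to S408 can be summarized in sketch form as follows; rectangles are (x, y, width, height) tuples on the real Umap RM, the overlap threshold is illustrative, and the function is a simplification rather than the exact logic of the selection unit 514.

```python
def overlap_ratio(a, b):
    """Intersection area of rectangles a and b, divided by the area of b."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    return (ix * iy) / float(b[2] * b[3])

def select_output(integ, basic, separation, thresh=0.5):
    """Returns (inclusive detection result, partial detection result) for one
    object region detected through the integration detection processing."""
    hit = next((b for b in basic if overlap_ratio(integ, b) >= thresh), None)
    if hit is None:                                   # S408: integration result only
        return integ, None
    if integ[2] * integ[3] < hit[2] * hit[3]:         # S403/S404: basic + separation
        return hit, separation
    inside = [s for s in separation if overlap_ratio(hit, s) > 0]
    if len(inside) >= 2:                              # S405/S406: integration + separation
        return integ, inside
    return integ, [hit]                               # S407: integration + basic
```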
  • Fig. 14 is a flowchart illustrating an example of processing of detecting a background in the detection frame.
  • the background detection unit 516 calculates a range on the real Umap RM corresponding to the detection frame created in the parallax image Ip.
  • the range may be a range between a left end of the coordinate in the horizontal direction in the real Umap RM of the object region corresponding to the detection frame and a right end of the coordinate in the horizontal direction of the object region.
  • the range may be a range between two different straight lines connecting the center of the imaging unit 10a and the imaging unit 10b and the parallax point of the object region on the real Umap RM corresponding to the detection frame, that is, a first straight line having the largest angle with respect to the horizontal direction and a second straight line having the smallest angle with respect to the horizontal direction.
  • the background detection unit 516 creates a histogram (hereinafter, referred to as an "object parallax histogram") indicating a total value of parallax frequency of the parallax points of the object region on the real Umap RM corresponding to the detection frame in the range (Step S502).
  • the background detection unit 516 creates a histogram (hereinafter, referred to as a "background parallax histogram") indicating a total value of parallax frequency of the parallax points distant from the object region on the real Umap RM corresponding to the detection frame by a predetermined distance or more in the range (Step S503).
  • the background detection unit 516 determines whether there is a portion having a value of the object parallax histogram equal to or smaller than a first predetermined value and a value of the background parallax histogram equal to or larger than a second predetermined value in the range (Step S504).
  • the background detection unit 516 determines that the background is present in the detection frame (Step S505), and the process is ended.
  • the background detection unit 516 determines that the background is not present in the detection frame (Step S506), and the process is ended.
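  • The decision at Steps S502 to S506 can be sketched minimally as follows in Python, assuming the object parallax histogram and the background parallax histogram are given as per-bin arrays over the range computed at Step S501; the function name and the threshold values are illustrative assumptions, not values of the embodiment.

        # Illustrative sketch of Steps S504 to S506: the background is judged to be
        # present when some bin of the range has almost no object parallax but a
        # significant amount of background parallax.
        def background_present(object_hist, background_hist,
                               first_predetermined_value=2.0,
                               second_predetermined_value=5.0):
            for obj, bg in zip(object_hist, background_hist):
                if obj <= first_predetermined_value and bg >= second_predetermined_value:
                    return True   # corresponds to Step S505
            return False          # corresponds to Step S506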
  • Figs. 15A, 15B, and 15C are diagrams for explaining background detection processing in a case of the detection frame for the object region such as a vehicle. Only the parallax points on the real Umap RM present in a range 702 of a predetermined height of a detection frame 701 for the object region such as a vehicle in Fig. 15A may be used. In this case, in an object parallax histogram 705, a total value of parallax frequency is increased at portions corresponding to the vicinities of both ends 703 and 704 of a vehicle and the like as illustrated in Fig. 15B.
  • Figs. 16A, 16B, and 16C are diagrams for explaining background detection processing in a case of a detection frame for an object region in which two groups such as pedestrians are coupled.
  • As in Figs. 15A-15C, only the parallax points on the real Umap RM present in a range 712 of a predetermined height of a detection frame 711 for the object region in Fig. 16A may be used.
  • In an object parallax histogram 717, a total value of parallax frequency is increased in the vicinity of pedestrians 713, 714, 715, 716, and the like as illustrated in Fig. 16B.
  • In a background parallax histogram 718, as illustrated in Fig. 16C, there is a portion 720 where a value of the background parallax histogram 718 is equal to or larger than a predetermined value within a portion 719 where a value of the object parallax histogram 717 is not substantially present.
  • In this case, at Step S505, it is determined that the background is present in the detection frame.
  • Fig. 17 is a flowchart illustrating an example of the rejection processing.
  • In the rejection processing, a detection frame satisfying a predetermined condition is rejected from among the detection frames corresponding to the object regions selected at Step S14.
  • each detection frame determined to include a background may be caused to be a processing target from among the detection frames corresponding to the object regions detected through the "integration detection processing".
  • the rejection unit 517 determines whether there are a plurality of detection frames corresponding to a plurality of object regions detected through the basic detection processing or the separation detection processing in the detection frame as a processing target.
  • the rejection unit 517 determines whether the background is present in a portion between the detection frames (Step S602). At this point, when a value of the background parallax histogram is equal to or larger than the predetermined value in the portion similarly to the processing of detecting the background in the detection frame described above, it is determined that the background is present.
  • the rejection unit 517 rejects the detection frame as a processing target (Step S603).
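  • As a minimal sketch under stated assumptions (the names and the threshold are illustrative, not the embodiment itself), the rejection rule of these steps can be expressed as follows.

        # A processing-target detection frame is rejected when it contains a plurality
        # of inner detection frames and the background parallax histogram indicates
        # background in the portion between those inner frames.
        def should_reject(inner_frames, background_hist_between, predetermined_value=5.0):
            if len(inner_frames) < 2:
                return False
            background_present = max(background_hist_between, default=0) >= predetermined_value
            return background_present   # corresponds to Step S603 when True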
  • Figs. 18A and 18B are diagrams for explaining rejection processing based on background information.
  • a detection frame 752 corresponding to one object region detected through the basic detection processing is present in a detection frame 751 as a processing target.
  • the detection frame 751 as a processing target is not rejected.
  • a plurality of detection frames 762 and 763 corresponding to a plurality of object regions detected through the basic detection processing or the separation detection processing are present in the detection frame 761 as a processing target.
  • Because the value of the background parallax histogram is equal to or larger than the predetermined value in a portion 764 between the detection frames 762 and 763, it is determined that the background is present.
  • Even when a side object 765 such as a pole and a vehicle 767 are detected as the same object through the basic detection processing, they may be detected as different objects with the detection frames 762 and 763 through the background detection and the rejection processing based thereon.
  • the rejection unit 517 may reject the detection frame using another method without performing background detection.
  • the rejection unit 517 may reject a detection frame corresponding to the region of the object sorted into "others" using a method of sorting a classification of the object illustrated in Fig. 8.
  • the rejection unit 517 may reject a detection frame overlapped with another detection frame, the overlapped area thereof being equal to or larger than a predetermined ratio.
  • a first detection result having relatively low separation performance and a second detection result having relatively high separation performance are generated to be associated with each other.
  • This configuration can improve performance of easily recognizing the object by performing simple processing at a later stage.
  • One of the first detection result and the second detection result associated with each other is rejected based on a predetermined condition.
  • This configuration can improve performance of recognizing each of a plurality of objects.
  • the value of distance (distance value) and the parallax value can be treated equivalently, so that the parallax image is used as an example of a distance image in the present embodiment.
  • the embodiment is not limited thereto.
  • the distance image may be generated by integrating a parallax image generated by using a stereo camera with distance information generated by using a detection device such as a millimetric wave radar and a laser radar.
  • a stereo camera and a detection device such as a millimetric wave radar and a laser radar may be used at the same time, and a result may be combined with a detection result of the object obtained by the stereo camera described above to further improve accuracy in detection.
  • a functional unit that performs at least part of processing of the functional units such as the parallax value arithmetic processing unit 300, the second generation unit 500, the clustering processing unit 510, and the tracking unit 530 of the object recognition device 1 may be implemented by cloud computing constituted of one or more computers.
  • the object recognition device 1 is mounted on the automobile as the vehicle 70.
  • the embodiment is not limited thereto.
  • the object recognition device 1 may be mounted on a vehicle such as a motorcycle, a bicycle, a wheelchair, and a cultivator for farming as an example of other vehicles.
  • the object recognition device 1 may be mounted on a mobile object such as a robot in addition to the vehicle as an example of a mobile object.
  • In a case in which at least one of the functional units of the parallax value deriving unit 3 and the recognition processing unit 5 in the object recognition device 1 is implemented by executing a computer program, the computer program is embedded and provided in a ROM and the like.
  • the computer program executed by the object recognition device 1 according to the embodiment described above may be recorded and provided in a computer-readable recording medium such as a compact disc read only memory (CD-ROM), a flexible disk (FD), a compact disc recordable (CD-R), and a digital versatile disc (DVD), as an installable or executable file.
  • the computer program executed by the object recognition device 1 according to the embodiment described above may be stored in a computer connected to a network such as the Internet and provided by being downloaded via the network.
  • the computer program executed by the object recognition device 1 according to the embodiment described above may be provided or distributed via a network such as the Internet.
  • the computer program executed by the object recognition device 1 according to the embodiment described above has a module configuration including at least one of the functional units described above.
  • When the CPU 52 (CPU 32) reads the computer program from the ROM 53 (ROM 33) and executes it, the functional units described above are loaded into a main storage device (RAM 54 (RAM 34) and the like) to be generated.
  • Second Embodiment Fig. 19 is a schematic diagram illustrating a schematic configuration of an equipment control system 1100 according to a second embodiment. As illustrated in Fig. 19, the equipment control system 1100 is arranged in a vehicle 1101 such as an automobile as an example of equipment (a mobile object).
  • the equipment control system 1100 includes an imaging unit 1102, an analyzing unit 1103, a control unit 1104, and a display unit 1105.
  • the imaging unit 1102 is arranged in the vicinity of a room mirror on a windshield 1106 of the vehicle 1101 as an example of a mobile object, and takes an image in a traveling direction of the vehicle 1101, for example.
  • Various pieces of data including image data obtained through an imaging operation performed by the imaging unit 1102 are supplied to the analyzing unit 1103.
  • the analyzing unit 1103 analyzes an object to be recognized such as a road surface on which the vehicle 1101 is traveling, a forward vehicle of the vehicle 1101, a pedestrian, and an obstacle based on the various pieces of data supplied from the imaging unit 1102.
  • the control unit 1104 gives a warning and the like to a driver of the vehicle 1101 via the display unit 1105 based on an analysis result of the analyzing unit 1103.
  • the control unit 1104 performs traveling support such as control of various onboard devices, and steering wheel control or brake control of the vehicle 1101 based on the analysis result.
  • traveling support such as control of various onboard devices, and steering wheel control or brake control of the vehicle 1101 based on the analysis result.
  • the equipment control system according to the present embodiment can also be applied to a ship, an aircraft, a robot, and the like.
  • Fig. 20 is a schematic block diagram of the imaging unit 1102 and the analyzing unit 1103.
  • the analyzing unit 1103 functions as an "information processing device", and a pair of the imaging unit 1102 and the analyzing unit 1103 functions as an "imaging device”.
  • the control unit 1104 described above functions as a "control unit”, and controls the equipment (in this example, the vehicle 1101) based on an output result of the imaging device.
  • the imaging unit 1102 is configured such that two camera units are assembled to each other in parallel, the camera units including a first camera unit 1A for a left eye and a second camera unit 1B for a right eye. That is, the imaging unit 1102 is configured as a stereo camera for taking a stereo image.
  • the stereo image means an image including a plurality of taken images (a plurality of taken images corresponding to a plurality of viewpoints on a one-to-one basis) obtained through imaging for each of the viewpoints, and the imaging unit 1102 is a device for taking the stereo image (functions as an "imaging unit").
  • the camera units 1A and 1B include imaging lenses 5A and 5B, image sensors 6A and 6B, and sensor controllers 7A and 7B, respectively.
  • the image sensors 6A and 6B are, for example, a CCD image sensor or a CMOS image sensor.
  • the analyzing unit 1103 includes a data bus line 10, a serial bus line 11, a CPU 15, an FPGA 16, a ROM 17, a RAM 18, a serial IF 19, and a data IF 20.
  • the imaging unit 1102 described above is connected to the analyzing unit 1103 via the data bus line 10 and the serial bus line 11.
  • the CPU 15 executes and controls the entire operation, image processing, and image recognition processing of the analyzing unit 1103.
  • Luminance image data of an image taken by the image sensors 6A and 6B of the first camera unit 1A and the second camera unit 1B is written into the RAM 18 of the analyzing unit 1103 via the data bus line 10.
  • Change control data of sensor exposure value, change control data of an image reading parameter, various pieces of setting data, and the like from the CPU 15 or the FPGA 16 are transmitted or received via the serial bus line 11.
  • the FPGA 16 performs processing required to have real-time performance on the image data stored in the RAM 18.
  • the FPGA 16 causes one of respective pieces of luminance image data (taken images) taken by the first camera unit 1A and the second camera unit 1B to be a reference image, and causes the other one thereof to be a comparative image.
  • the FPGA 16 then calculates, as a parallax value (parallax image data) of a corresponding image portion, a position shift amount between a corresponding image portion on the reference image and a corresponding image portion on the comparative image, both of which correspond to the same point in the imaging area.
  • Fig. 21 illustrates a positional relation among a subject 40, the imaging lens 5A of the first camera unit 1A, and the imaging lens 5B of the second camera unit 1B on an XZ-plane.
  • a distance b between the imaging lenses 5A and 5B and the focal length f of the imaging lenses 5A and 5B are fixed values, respectively.
  • a shift amount of the X-coordinate of the imaging lens 5A with respect to a gazing point P of the subject 40 is assumed to be Δ1.
  • a shift amount of the X-coordinate of the imaging lens 5B with respect to the gazing point P of the subject 40 is assumed to be Δ2.
  • the FPGA 16 calculates the parallax value d as a difference between X-coordinates of the imaging lenses 5A and 5B with respect to the gazing point P of the subject 40 through the following expression 1.
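  • Expression 1 itself is not reproduced in this excerpt. Under the geometry described above (shift amounts Δ1 and Δ2 on the two image sensors, base length b, and focal length f), it is presumably the standard stereo relation, shown here only as an assumed reconstruction in LaTeX form:

        d = \Delta_1 + \Delta_2, \qquad Z = \frac{b \cdot f}{d}

  • where Z denotes the distance from the imaging lenses to the gazing point P in the depth direction; a larger parallax value d corresponds to a smaller distance Z.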
  • the FPGA 16 of the analyzing unit 1103 performs processing required to have real-time performance such as gamma correction processing and distortion correction processing (paralleling of left and right taken images) on the luminance image data supplied from the imaging unit 1102.
  • By performing the arithmetic operation of the expression 1 described above using the luminance image data on which the processing required to have real-time performance is performed, the FPGA 16 generates parallax image data to be written into the RAM 18.
  • the CPU 15 performs control of the sensor controllers 7A and 7B of the imaging unit 1102, and overall control of the analyzing unit 1103.
  • the ROM 17 stores a three-dimensional object recognition program for executing situation recognition, prediction, three-dimensional object recognition, and the like described later.
  • the three-dimensional object recognition program is an example of an image processing program.
  • the CPU 15 acquires, for example, CAN information of the host vehicle (vehicle speed, acceleration, a rudder angle, a yaw rate, and the like) as parameters via the data IF 20.
  • By executing and controlling various pieces of processing such as situation recognition using a luminance image and a parallax image stored in the RAM 18 in accordance with the three-dimensional object recognition program stored in the ROM 17, the CPU 15 recognizes a recognition target such as a preceding vehicle, for example.
  • Recognition data of the recognition target is supplied to the control unit 1104 via the serial IF 19.
  • the control unit 1104 performs traveling support such as brake control of the host vehicle and speed control of the host vehicle using the recognition data of the recognition target.
  • Fig. 22 is a diagram for schematically explaining a function of the analyzing unit 1103.
  • a stereo image taken by the imaging unit 1102 included in the stereo camera is supplied to the analyzing unit 1103.
  • When the first camera unit 1A and the second camera unit 1B have a color specification, each of the first camera unit 1A and the second camera unit 1B performs an arithmetic operation of the following expression 2 to perform color luminance conversion processing for generating a luminance (y) signal from each signal of RGB (red, green, and blue).
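  • Expression 2 is likewise not reproduced in this excerpt. A typical color luminance conversion of this kind is the weighted sum below, given only as an assumed example of the conversion referred to above:

        y = 0.3R + 0.59G + 0.11B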
  • Each of the first camera unit 1A and the second camera unit 1B supplies luminance image data (taken image) generated through the color luminance conversion processing to a preprocessing unit 1111 included in the analyzing unit 1103.
  • the stereo image is a set of the luminance image data (taken image) taken by the first camera unit 1A and the luminance image data (taken image) taken by the second camera unit 1B.
  • the preprocessing unit 1111 is implemented by the FPGA 16.
  • the preprocessing unit 1111 preprocesses the luminance image data received from the first camera unit 1A and the second camera unit 1B. In this example, gamma correction processing is performed as preprocessing.
  • the preprocessing unit 1111 supplies the preprocessed luminance image data to a paralleled image generation unit 1112.
  • the paralleled image generation unit 1112 performs paralleling processing (distortion correction processing) on the luminance image data supplied from the preprocessing unit 1111.
  • The paralleling processing is based on a polynomial expression, for example, a quintic polynomial expression related to x (a horizontal direction position of the image) and y (a vertical direction position of the image). Accordingly, a paralleled luminance image can be obtained in which distortion of the optical systems of the first camera unit 1A and the second camera unit 1B is corrected.
  • the paralleled image generation unit 1112 is implemented by the FPGA 16.
  • the parallax image generation unit 1113 is an example of a "distance image generation unit", and generates a parallax image including a parallax value for each pixel as an example of a distance image including distance information for each pixel from the stereo image taken by the imaging unit 1102.
  • the parallax image generation unit 1113 performs the arithmetic operation expressed by the expression 1 described above assuming that the luminance image data of the first camera unit 1A is standard image data and the luminance image data of the second camera unit 1B is comparative image data, and generates parallax image data indicating a parallax between the standard image data and the comparative image data.
  • the parallax image generation unit 1113 defines a block including a plurality of pixels (for example, 16 pixels ⁇ 1 pixel) centered on one focused pixel for a predetermined "row" of the standard image data.
  • In the comparative image data, a block having the same size as that of the defined block of the standard image data is shifted pixel by pixel in the horizontal line direction (X-direction).
  • the parallax image generation unit 1113 calculates each correlation value indicating correlation between a feature amount indicating a feature of a pixel value of the defined block in the standard image data and a feature amount indicating a feature of a pixel value of each block in the comparative image data.
  • the parallax image means information associating the vertical direction position, the horizontal direction position, and a depth direction position (parallax) with each other.
  • the parallax image generation unit 1113 performs matching processing for selecting the block of the comparative image data that is most closely correlated with the block of the standard image data among blocks in the comparative image data based on the calculated correlation value. Thereafter, a position shift amount is calculated as the parallax value d, the position shift amount between the focused pixel in the block of the standard image data and a corresponding pixel in the block of the comparative image data selected through the matching processing.
  • the parallax image data is obtained.
  • various well-known techniques can be utilized as a method of generating the parallax image. In short, it can be considered that the parallax image generation unit 1113 calculates (generates) the distance image (in this example, the parallax image) including the distance information for each pixel from the stereo image taken by the stereo camera.
  • As the feature amount, a value (luminance value) of each pixel in the block can be used.
  • As the correlation value, the sum total of absolute values of differences between a value (luminance value) of each pixel in the block of the standard image data and a value (luminance value) of each pixel in the block of the comparative image data corresponding to the former pixel can be used. In this case, the block including the smallest sum total is detected as the most correlated block.
  • For the matching processing of the parallax image generation unit 1113, for example, a method such as Sum of Squared Difference (SSD), Zero-mean Sum of Squared Difference (ZSSD), Sum of Absolute Difference (SAD), or Zero-mean Sum of Absolute Difference (ZSAD) is used.
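  • The following is a minimal Python sketch of such block matching using SAD, given only as an assumed illustration; the function name, the 16 pixel × 1 pixel block size, and the search range are assumptions, and the sketch is not the implementation of the parallax image generation unit 1113 itself.

        import numpy as np

        # A 16 x 1 pixel block around the focused pixel of the standard image is
        # compared, by sum of absolute differences (SAD), with blocks shifted one
        # pixel at a time along the same row of the comparative image; the shift
        # with the smallest SAD is taken as the parallax value d.
        def parallax_for_pixel(standard, comparative, x, y, half=8, max_d=64):
            # assumes x >= half so that the block fits inside the standard image
            block = standard[y, x - half:x + half].astype(np.int32)
            best_d, best_sad = 0, None
            for d in range(max_d):
                if x - half - d < 0:
                    break
                candidate = comparative[y, x - half - d:x + half - d].astype(np.int32)
                sad = int(np.abs(block - candidate).sum())   # correlation value (smaller = better)
                if best_sad is None or sad < best_sad:
                    best_sad, best_d = sad, d
            return best_d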
  • the parallax image generation unit 1113 is implemented by the FPGA 16.
  • the parallax image generated by the parallax image generation unit 1113 is supplied to the object detection processing unit 1114.
  • the function of the object detection processing unit 1114 is implemented when the CPU 15 executes a three-dimensional object recognition program.
  • Fig. 23 is a diagram illustrating an example of a function of the object detection processing unit 1114.
  • the object detection processing unit 1114 includes an acquisition unit 1121, a road surface detection processing unit 1122, a clustering processing unit 1123, and a tracking processing unit 1124.
  • the acquisition unit 1121 acquires the parallax image generated by the parallax image generation unit 1113. It can be considered that the acquisition unit 1121 has a function of acquiring a distance image (in this example, the parallax image) including distance information for each pixel calculated from the stereo image taken by the stereo camera.
  • the parallax image acquired by the acquisition unit 1121 is input to the road surface detection processing unit 1122 and the clustering processing unit 1123.
  • the road surface detection processing unit 1122 includes a road surface estimation unit 1131, a first generation unit 1132, a second generation unit 1133, and a third generation unit 1134.
  • the road surface estimation unit 1131 generates correspondence information in which a position in the vertical direction of the image (the vertical direction orthogonal to the optical axis of the stereo camera) is associated with a position in the depth direction indicating the direction of the optical axis of the stereo camera.
  • the road surface estimation unit 1131 votes each pixel (parallax value) of the parallax image into a map (hereinafter, referred to as a "Vmap (V-Disparity map)") in which a vertical axis indicates a coordinate (y) in the vertical direction of the image and a horizontal axis indicates the parallax value d, selects a sample point from voted parallax points using a predetermined method, and performs linear approximation (or curve approximation) on a selected point group to estimate a road surface shape.
  • various well-known techniques can be utilized.
  • the Vmap is a two-dimensional histogram in which the X-axis indicates the parallax value d, the Y-axis indicates a y-coordinate value, and the Z-axis indicates frequency in a group of (the X-coordinate value, the y-coordinate value, the parallax value d) of the parallax image.
  • the correspondence information (in this example, the Vmap) is information in which a frequency value of the parallax is recorded for each combination of the position in the vertical direction and the parallax value d (corresponding to the position in the depth direction).
  • An estimation result (road surface estimation information) obtained by the road surface estimation unit 1131 is input to the first generation unit 1132, the second generation unit 1133, the third generation unit 1134, and the clustering processing unit 1123.
  • the road surface detection processing unit 1122 is assumed to include three generation units including the first generation unit 1132, the second generation unit 1133, and the third generation unit 1134. Alternatively, any two generation units may be selected therefrom to be mounted.
  • Based on a plurality of pixels corresponding to a second range indicating a range of height equal to or larger than a predetermined value within a first range higher than the road surface (an example of a reference object as a reference of height of an object) in the parallax image, the first generation unit 1132 generates first information in which the position in the horizontal direction indicating a direction orthogonal to the optical axis of the stereo camera is associated with the position in the depth direction indicating the direction of the optical axis of the stereo camera.
  • the first information is a two-dimensional histogram in which the horizontal axis (X-axis) indicates a distance (actual distance) in the horizontal direction, the vertical axis (Y-axis) indicates the parallax value d of the parallax image, and the axis in the depth direction indicates frequency. It can be considered that the first information is information in which the frequency value of the parallax is recorded for each combination of the actual distance and the parallax value d. In the following description, the first information is referred to as a "High Umap".
  • the first generation unit 1132 generates a two-dimensional histogram in which the horizontal axis indicates x of the parallax image, the vertical axis indicates the parallax value d, and the axis in the depth direction indicates the frequency by voting a point (x, y, d) in the parallax image corresponding to the second range based on a value of (x, d).
  • the horizontal axis of the two-dimensional histogram is converted into the actual distance to generate the High Umap. It can be considered that the vertical axis of the High Umap indicates the position in the depth direction (a smaller parallax value d represents a larger distance in the depth direction).
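  • A minimal sketch of this voting is shown below, assuming the height range check is supplied as a predicate and leaving out the conversion of the horizontal axis to the actual distance and the thinning of the parallax value; the names are illustrative assumptions.

        import numpy as np

        # Every pixel (x, y) of the parallax image whose height from the road surface
        # falls in the target range votes its (x, d) pair into a two-dimensional
        # frequency histogram (a Umap): rows are the parallax value d, columns are x.
        def build_umap(parallax_image, in_height_range, max_d=128):
            h, w = parallax_image.shape
            umap = np.zeros((max_d, w), dtype=np.int32)
            for y in range(h):
                for x in range(w):
                    d = int(parallax_image[y, x])
                    if 0 < d < max_d and in_height_range(x, y, d):
                        umap[d, x] += 1        # frequency value of the parallax
            return umap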
  • a linear expression representing the road surface is obtained through road surface estimation by the road surface estimation unit 1131 described above, so that when the parallax value d is determined, a corresponding y-coordinate y0 is determined, and the coordinate y0 represents the height of the road surface.
  • y'-y0 represents the height from the road surface in a case in which the parallax value is d.
  • BF is a value obtained by multiplying a base length B by a focal length f of the imaging unit 1102
  • offset is a parallax value in a case of photographing an object at an infinite distance.
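  • Although the full conversion expression is not reproduced in this excerpt, from the quantities defined above (BF = B × f, offset, the road surface coordinate y0 for a parallax value d, and a parallax point at the y-coordinate y'), the actual height H from the road surface is presumably obtained in the standard form below, shown only as an assumed reconstruction:

        z = \frac{BF}{d - \mathrm{offset}}, \qquad H = \frac{z \times (y' - y_0)}{f}

  • where z is the distance in the depth direction corresponding to the parallax value d and f is the focal length of the imaging unit 1102.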
  • a person group G1 including an adult and a child, a person group G2 including adults, a pole, and a vehicle are projected.
  • a range in which an actual height from the road surface is 150 cm to 200 cm is set as the second range
  • Fig. 26 illustrates the High Umap to which the parallax value d within the second range is voted.
  • the parallax value d of the child having a height smaller than 150 cm is not voted, so that the child does not appear on the map.
  • the vertical axis indicates a thinned-out parallax obtained by performing thinning processing on the parallax value d using a thinning rate corresponding to the distance.
  • the High Umap generated by the first generation unit 1132 is input to the clustering processing unit 1123.
  • the second generation unit 1133 generates second information in which the position in the horizontal direction is associated with the position in the depth direction of the stereo camera, based on a plurality of pixels corresponding to the first range in the parallax image.
  • the second information is referred to as a "Standard Umap".
  • Assuming that the position in the horizontal direction of the parallax image is x, the position in the vertical direction is y, and the parallax value set for each pixel is d, the second generation unit 1133 generates a two-dimensional histogram in which the horizontal axis indicates x of the parallax image, the vertical axis indicates the parallax value d, and the axis in the depth direction indicates the frequency by voting a point (x, y, d) in the parallax image corresponding to the first range based on a value of (x, d). The horizontal axis of the two-dimensional histogram is converted into the actual distance to generate the Standard Umap.
  • the vertical axis of the Standard Umap indicates the position in the depth direction.
  • a range from 0 cm to 200 cm is set as the first range
  • Fig. 27 illustrates the Standard Umap to which the parallax value d within the first range is voted.
  • In addition to the Standard Umap, the second generation unit 1133 generates height information in which the height of the parallax point having the largest height (H) from the road surface is recorded from among the parallax points (groups of the actual distance and the parallax value d) voted to the Standard Umap; the horizontal axis indicates the actual distance (a distance in a right and left direction of the camera), the vertical axis indicates the parallax value d, and the height is recorded for each corresponding point. It can be considered that the height information is information in which the height is recorded for each combination of the actual distance and the parallax value d.
  • the Standard Umap generated by the second generation unit 1133 is input to the clustering processing unit 1123.
  • the third generation unit 1134 generates third information in which the position in the horizontal direction is associated with the position in the depth direction of the stereo camera by using a plurality of pixels present in a range higher than the road surface in the parallax image, the number of pixels being smaller than that in a case of generating the first information or the second information.
  • the third information is a two-dimensional histogram in which the horizontal axis indicates the distance (actual distance) in the horizontal direction, the vertical axis indicates the parallax value d of the parallax image, and the axis in the depth direction indicates the frequency.
  • the third information is information in which the frequency value of the parallax is recorded for each combination of the actual distance and the parallax value d.
  • the third information is referred to as a "Small Umap".
  • Assuming that the position in the horizontal direction of the parallax image is x, the position in the vertical direction is y, and the parallax value set for each pixel is d, the third generation unit 1134 generates a two-dimensional histogram in which the horizontal axis indicates x of the parallax image, the vertical axis indicates the parallax value d, and the axis in the depth direction indicates the frequency by voting a point (x, y, d) (the number of points to be voted is smaller than that in a case of generating the Standard Umap) in the parallax image corresponding to the first range based on a value of (x, d).
  • the horizontal axis of the two-dimensional histogram is converted into the actual distance to generate the Small Umap. It may be considered that the vertical axis of the Small Umap indicates the position in the depth direction. The Small Umap has distance resolution for one pixel lower than that of the Standard Umap.
  • the third generation unit 1134 generates height information in which the height of the parallax point having the largest height (H) from the road surface is recorded from among parallax points (groups of the actual distance and the parallax value d) voted to the Small Umap, the horizontal axis indicates the actual distance (a distance in a right and left direction of the camera), the vertical axis indicates the parallax value d, and the height is recorded for each corresponding point. It may be considered that the height information is information in which the height is recorded for each combination of the actual distance and the parallax value d.
  • the Small Umap generated by the third generation unit 1134 is input to the clustering processing unit 1123.
  • the Standard Umap, the High Umap, and the Small Umap are each referred to as a "real Umap" when they are not required to be distinguished from each other.
  • the real Umap may be regarded as an overhead map (an overhead image, a bird's-eye view image) in which the horizontal axis is the vertical direction (the right and left direction of the camera) with respect to the optical axis of the stereo camera, and the vertical axis is an optical axis direction of the stereo camera.
  • the clustering processing unit 1123 detects an object position on the parallax image acquired by the acquisition unit 1121 using various pieces of information received from the road surface detection processing unit 1122.
  • Fig. 28 is a diagram illustrating an example of a specific function of the clustering processing unit 1123. As illustrated in Fig. 28, the clustering processing unit 1123 includes an isolated region detection processing unit 1140, a parallax image processing unit 1150, and a rejection processing unit 1160.
  • the isolated region detection processing unit 1140 performs isolated region detection processing for detecting an isolated region (assembly region) as a region of a cluster of parallax values d from each real Umap (the High Umap, the Standard Umap, and the Small Umap) received from the road surface detection processing unit 1122. Specific content of the isolated region detection processing unit 1140 will be described later.
  • Fig. 30 is a real Umap obtained based on the taken image illustrated in Fig. 29, and a framed region corresponds to the isolated region.
  • the parallax image processing unit 1150 performs parallax image processing for detecting object information in a real space or a region on the parallax image corresponding to the isolated region on the real Umap detected by the isolated region detection processing unit 1140.
  • Fig. 31 is a diagram illustrating a region on the parallax image (a result of processing performed by the parallax image processing unit 1150) corresponding to the isolated region illustrated in Fig. 30.
  • a region 91 is a region corresponding to the guardrail 81
  • a region 92 is a region corresponding to the vehicle 77
  • a region 93 is a region corresponding to the vehicle 79
  • a region 94 is a region corresponding to the pole 80A
  • a region 95 is a region corresponding to the pole 80B
  • a region 96 is a region corresponding to the guardrail 82.
  • the rejection processing unit 1160 performs rejection processing for selecting an object to be output based on the object information in the real space or the region on the parallax image detected by the parallax image processing unit 1150.
  • the rejection processing unit 1160 performs size rejection focusing on a size of the object, and overlap rejection focusing on a positional relation between objects.
  • In the size rejection, a detection result of a size not falling within the size range determined for each object type illustrated in Fig. 8 described above is rejected.
  • the region 91 and the region 96 are rejected.
  • In the overlap rejection, an overlapping result is selected for the regions on the parallax image corresponding to the isolated regions (the detection results on the real Umap) detected through the parallax image processing.
  • Fig. 33 is a flowchart illustrating an example of processing performed by the clustering processing unit 1123.
  • the Standard Umap, the High Umap, the Small Umap, the parallax image, the road surface estimation information, and the height information are input as input information, and the detection result on the parallax image is output as output information.
  • the isolated region detection processing unit 1140 performs isolated region detection processing (Step S1001). Specific content of the isolated region detection processing will be described later.
  • the parallax image processing unit 1150 performs parallax image processing (Step S1002).
  • the rejection processing unit 1160 then performs rejection processing using a result of the parallax image processing at Step S1002 (Step S1003), and outputs a detection result on a final parallax image as output information.
  • the output information (detection result) from the clustering processing unit 1123 is input to the tracking processing unit 1124 illustrated in Fig. 23. If the detection result (detected object) obtained by the clustering processing unit 1123 continuously appears over a plurality of frames, the tracking processing unit 1124 determines the detection result to be a tracking target. When the detection result is the tracking target, the tracking processing unit 1124 outputs the detection result to the control unit 1104 as an object detection result.
  • the isolated region detection processing unit 1140 includes a first detection unit 1141, a second detection unit 1142, a third detection unit 1143, and a final determination processing unit 1144.
  • the first detection unit 1141 detects an assembly region of the parallax value d (an example of distance information) from the High Umap (first information).
  • detection processing performed by the first detection unit 1141 is referred to as “separation detection processing”, and a processing result thereof is referred to as a “separation detection result (including the detected assembly region)”.
  • the High Umap is hardly influenced by an object present in a region at a low height as compared with the Standard Umap, so that separation performance of the High Umap is excellent. However, erroneous separation detection tends to be caused for an object having no parallax in a region having a high height from the road surface. Specific processing content will be described later.
  • the second detection unit 1142 detects an assembly region from the Standard Umap (second information).
  • detection processing performed by the second detection unit 1142 is referred to as “basic detection processing”, and a processing result thereof is referred to as a “basic detection result (including the detected assembly region)".
  • the separation detection result described above is assumed to accompany the basic detection result (to be included in the basic detection result).
  • With the Standard Umap, stable detection can be expected for the entire detection range because the distance resolution for one pixel is high and the detection range covers from a low position to a high position above the road surface.
  • When an estimated road surface is detected to be lower than the actual road surface through road surface estimation, or when the parallax of the detection target is low, erroneous detection is easily caused due to a characteristic of the Standard Umap.
  • the third detection unit 1143 detects an assembly region from the Small Umap (third information).
  • detection processing performed by the third detection unit 1143 is referred to as “detection processing for integration”
  • a processing result thereof is referred to as an "integration detection result (including the detected assembly region)”.
  • the Small Umap has a characteristic such that erroneous separation is hardly caused for an object that hardly has a parallax because resolution for one pixel is lower than that of the Standard Umap.
  • Since the separation performance (resolution) is low, objects tend to be detected being coupled to each other in the detection processing (detection processing for integration) using the Small Umap.
  • the final determination processing unit 1144 performs final determination processing of causing the "basic detection result", the “separation detection result", and the "integration detection result” to be inputs, selecting and correcting the detection result to be output, and clarifying a relation between the detection results.
  • the final determination processing unit 1144 includes a rejection determination processing unit 1145, a merge processing unit 1146, and a correction unit 1147.
  • the rejection determination processing unit 1145 performs rejection determination processing for determining whether to reject the integration detection result. Specific content thereof will be described later.
  • the merge processing unit 1146 merges the "integration detection result" with the "basic detection result” and the “separation detection result” accompanying therewith. Specific content will be described later.
  • the correction unit 1147 corrects and outputs the merged detection result. Specific content of this correction processing will be described later.
  • Fig. 34 is a flowchart illustrating an example of isolated region detection processing.
  • the Standard Umap, the High Umap, the Small Umap, and the height information are input as input information, and the detection result on the Standard Umap is output as output information.
  • the second detection unit 1142 performs basic detection processing (Step S1011)
  • the first detection unit 1141 performs separation detection processing (Step S1012)
  • the third detection unit 1143 performs detection processing for integration (Step S1013).
  • the order of Step S1011 to Step S1013 is optional, and the steps may be executed in parallel.
  • the final determination processing unit 1144 performs final determination processing (Step S1014).
  • Fig. 35 is a flowchart illustrating an example of the basic detection processing.
  • the Standard Umap is input as input information. Output information will be clarified in the later description.
  • the second detection unit 1142 performs labeling processing for grouping each cluster of parallaxes in the Standard Umap and giving an ID thereto (Step S1021).
  • For example, the second detection unit 1142 focuses on each of a plurality of pixels included in the Standard Umap, and binarizes the focused pixel and the pixels present in the vicinity of the focused pixel (for example, eight pixels corresponding to eight directions on a one-to-one basis, the eight directions being a right direction, a right obliquely upward direction, an upward direction, a left obliquely upward direction, a left direction, a left obliquely downward direction, a downward direction, and a right obliquely downward direction) by setting a pixel value of a pixel including a frequency value to "1" and setting a pixel value of a pixel not including the frequency value to "0".
  • a method of binarization is not limited thereto and is optional.
  • the method of binarization may have a form such that a pixel value of a pixel including a frequency value of the parallax equal to or larger than a threshold is set to be "1" among the eight pixels described above present in the vicinity, and pixel values of the other pixels are set to be "0".
  • a closed region formed by a set of pixel values "1" is caused to be a cluster (one group) of parallaxes, and an ID is given to each pixel included in the closed region. The ID is set to be a value that can identify each group.
  • Fig. 36 is a diagram illustrating an example after binarization processing, and the same ID is given to each of five pixels included in a region 2000.
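  • A minimal sketch of such labeling processing (binarization followed by grouping of 8-connected pixels) is shown below in Python; the breadth-first traversal and the function name are assumptions made for illustration, not the procedure of the embodiment itself.

        import numpy as np
        from collections import deque

        # Pixels of the Umap holding a parallax frequency are binarized to "1", and
        # each closed region of 8-connected "1" pixels is treated as one cluster of
        # parallaxes; the same ID is given to every pixel in the region.
        def label_umap(umap):
            binary = (umap > 0)
            ids = np.zeros(umap.shape, dtype=np.int32)   # 0 means "not grouped"
            next_id = 1
            h, w = umap.shape
            for sy in range(h):
                for sx in range(w):
                    if binary[sy, sx] and ids[sy, sx] == 0:
                        ids[sy, sx] = next_id
                        queue = deque([(sy, sx)])
                        while queue:
                            y, x = queue.popleft()
                            for dy in (-1, 0, 1):
                                for dx in (-1, 0, 1):
                                    ny, nx = y + dy, x + dx
                                    if 0 <= ny < h and 0 <= nx < w and \
                                            binary[ny, nx] and ids[ny, nx] == 0:
                                        ids[ny, nx] = next_id
                                        queue.append((ny, nx))
                        next_id += 1
            return ids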
  • the second detection unit 1142 performs detection rectangle creating processing (Step S1022). Specifically, the second detection unit 1142 calculates a rectangle circumscribing the assembly region of pixels to which the same ID is given, and causes the calculated circumscribing rectangle to be a detection rectangle. Next, the second detection unit 1142 performs size check for checking a size of the detection rectangle created at Step S1022 (Step S1023). For example, when the size of the detection rectangle created at Step S1022 is equal to or smaller than a predetermined threshold as a size corresponding to noise, the second detection unit 1142 performs processing of discarding the detection rectangle.
  • the second detection unit 1142 performs frequency check for checking the frequency value (frequency value of the parallax) of each pixel included in the detection rectangle created at Step S1022 (Step S1024). For example, when a cumulative value of the frequency value (frequency value of the parallax) included in the detection rectangle created at Step S1022 is equal to or smaller than a predetermined threshold as a number required for representing the object, the second detection unit 1142 performs processing of discarding the detection rectangle.
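  • The detection rectangle creation and the two checks at Steps S1022 to S1024 can be sketched as follows; the thresholds and the function name are placeholder assumptions, not values of the embodiment.

        import numpy as np

        # The rectangle circumscribing the pixels that share one ID becomes the
        # detection rectangle; it is discarded when its size corresponds to noise or
        # when its accumulated parallax frequency is too small to represent an object.
        def detection_rectangle(ids, umap, group_id, min_size=2, min_frequency=20):
            ys, xs = np.where(ids == group_id)
            x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
            width, height = x1 - x0 + 1, y1 - y0 + 1
            if width <= min_size or height <= min_size:       # size check (Step S1023)
                return None
            if umap[ys, xs].sum() <= min_frequency:           # frequency check (Step S1024)
                return None
            return (x0, y0, width, height)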
  • information indicating the detection rectangle on the Standard Umap is output as output information.
  • An ID for identifying a group is assigned to grouped pixels (pixels included in the detected assembly region) in the detection rectangle on the Standard Umap. That is, information indicating a map of the ID grouped on the Standard Umap (an "ID Umap on the Standard Umap", or simply referred to as an "ID Umap" when it is not required to be distinguished from others in some cases) is output as output information.
  • Fig. 37 is a flowchart illustrating an example of separation detection processing.
  • the Standard Umap and the High Umap are input as input information. Output information will be clarified in the later description.
  • the separation detection result accompanies the basic detection result, and the following processing will be repeated corresponding to the number of basic detection results.
  • the first detection unit 1141 sets, for one or more separation detection results accompanying a focused basic detection result, a region of interest including the separation detection result, and performs labeling processing on the set region of interest (Step S1031). Specific content of the labeling processing is described above.
  • the first detection unit 1141 performs, for each assembly region of pixels to which the same ID is given in the labeling processing at Step S1031, detection rectangle creating processing for calculating a rectangle circumscribing the assembly region (Step S1032).
  • the first detection unit 1141 performs size check processing for each detection rectangle created at Step S1032 (Step S1033). Specific content of size check processing is described above.
  • the first detection unit 1141 performs frequency check processing (Step S1034). Specific content of the frequency check processing is described above.
  • information indicating the detection rectangle on the High Umap (a detection result on the High Umap associated with the basic detection result) is output as output information.
  • An ID for identifying a group is assigned to each grouped pixel in the detection rectangle on the High Umap. That is, information indicating a map of the ID grouped on the High Umap (an "ID Umap on the High Umap", or simply referred to as an "ID Umap" when it is not required to be distinguished from others in some cases) is output as output information.
  • Fig. 38 is a flowchart illustrating an example of detection processing for integration.
  • the Small Umap and the height information are input as input information. Output information will be clarified in the later description.
  • the third detection unit 1143 repeats the following processing until detection is completed.
  • the third detection unit 1143 performs labeling processing for grouping each cluster of parallaxes in the Small Umap and giving an ID thereto (Step S1041). Specific content of the labeling processing is described above.
  • the third detection unit 1143 focuses on each group of a plurality of pixels included in the Small Umap, and sets a pixel value of a pixel including a frequency value to be "1" and sets a pixel value of a pixel not including the frequency value to be "0" to be binarized among a focused pixel and pixels present in the vicinity of the focused pixel (for example, the eight pixels described above).
  • a method of binarization is not limited thereto and is optional.
  • the method of binarization may have a form such that a pixel value of a pixel including a frequency value of the parallax equal to or larger than a threshold is set to be "1" among the eight pixels present in the vicinity, and pixel values of the other pixels are set to be "0".
  • a closed region formed by a set of pixel values "1" is caused to be a cluster (one group) of parallaxes, and an ID is given to each pixel included in the closed region.
  • Next, the third detection unit 1143 performs detection rectangle creating processing (Step S1042). Specific content thereof is described above.
  • the third detection unit 1143 performs output determination processing (Step S1043).
  • the output determination processing is processing for selecting a detection result to be output by determining whether the size, the frequency value of the parallax, a depth length, and the like of the detection rectangle (detection result) created at Step S1042 meet a condition thereof.
  • In the detection processing for integration, objects tend to be detected being coupled to each other, so that it is assumed herein that only a detection result having a characteristic that seems to be a vehicle is output.
  • Figs. 39A, 39B, and 39C are tables illustrating examples of the conditions described above.
  • Fig. 39A is a table indicating an example of a condition related to the size (width) of the detection result.
  • Fig. 39B is a table indicating an example of a condition related to the depth length of the detection result.
  • a "nearest point distance" in Fig. 39B will be described later.
  • the "nearest point distance” indicates a distance from the center of a predetermined valid range (a range in which detection is valid) to a point of the detection result (object) nearest to the center in the depth direction.
  • Fig. 39C is a table illustrating an example of conditions related to the frequency value of the parallax.
  • the information indicating the detection rectangle on the Small Umap is output as output information.
  • An ID for identifying a group is assigned to each grouped pixel in the detection rectangle on the Small Umap. That is, information indicating a map of the ID grouped on the Small Umap (an "ID Umap on the Small Umap", or simply referred to as an “ID Umap" when it is not required to be distinguished from others in some cases) is output as output information.
  • the final determination processing unit 1144 receives three results including the basic detection result, the separation detection result, and the integration detection result, calculates a correspondence relation among the detection results, and sets an inclusive frame and a partial frame accompanying the inclusive frame.
  • the final determination processing unit 1144 corrects the inclusive frame and the partial frame, and selects an output target therefrom.
  • the inclusive frame stores a result detected through processing having low separation performance. That is, the inclusive frame indicates a frame having a larger size for the same object.
  • the integration detection result or the basic detection result is set as the inclusive frame.
  • the partial frame stores a result detected through processing having separation performance higher than that of the inclusive frame.
  • the partial frame is a detection frame (an outer frame of the detection result) associated with the inclusive frame, and is a result obtained by separating the inside of the inclusive frame.
  • the basic detection result or the separation detection result corresponds to the partial frame.
  • the frame indicates a position and a size of the object, and is information associating coordinates of a corner of the rectangle surrounding the object with the height and the width, for example.
  • Fig. 40 is a flowchart illustrating an example of final determination processing.
  • the Standard Umap, the High Umap, the height information, the basic detection result (the detection rectangle on the Standard Umap), the separation detection result associated with the basic detection result (the detection rectangle on the High Umap), and the integration detection result (the detection rectangle on the Small Umap) are input as input information.
  • As output information, an ID table is output in which the detection result on the Standard Umap, the ID Umap corresponding thereto, and a relation between the detection results are recorded.
  • the rejection determination processing unit 1145 performs rejection determination processing on a focused integration detection result (Step S1051).
  • the rejection determination processing unit 1145 performs rejection determination processing for selecting only an integration detection result satisfying a condition of a vehicle size present on an own lane, and rejecting other results.
  • Specifically, processing of converting the detection rectangle (integration detection result) on the Small Umap into a detection rectangle on the Standard Umap is performed, and an integration detection result outside the valid range set in advance on the Standard Umap is rejected.
  • the embodiment is not limited thereto.
  • the valid range may be set in advance on the Small Umap, and the integration detection result outside the valid range set in advance on the Small Umap may be rejected.
  • Fig. 41A is a diagram illustrating an example of a condition for rejection
  • Fig. 41B is a table illustrating an example of a condition for rejection.
  • the result is determined to be valid (to be an output candidate) only when a "distance to the center” indicating a distance between the center of the valid range and the center of the integration detection result in a camera horizontal direction (a right and left direction of the camera) is larger than -Z2 (a threshold on a negative side) and equal to or smaller than Z2 (a threshold on a positive side), and other results are rejected.
  • the result is determined to be valid only when the distance to the center is larger than -Z3 (a threshold on the negative side) and equal to or smaller than Z3 (a threshold on the positive side), and other results are rejected.
  • When the integration detection result is determined to be valid through the rejection determination processing described above, the result of Step S1052 in Fig. 40 is "Yes", and the processing from Step S1053 to Step S1056 is repeated corresponding to the number of basic detection results. On the other hand, when the result of Step S1052 is "No", the processing on the focused integration detection result is ended, and loop processing corresponding to the number of other integration detection results is repeated.
  • the merge processing unit 1146 performs matching between the integration detection result and the basic detection result. Specific content thereof is described below.
  • the merge processing unit 1146 detects overlapping between the detection frame of the integration detection result and the detection frame of the basic detection result on the Large Umap, clarifies a correspondence relation based on the detection result, and selects the integration detection result to be a processing target.
  • the merge processing unit 1146 calculates an overlapping rate of the integration detection result and the basic detection result.
  • the overlapping rate is calculated by dividing an area of an overlapping region of the basic detection result and the integration detection result by an area of the basic detection result.
  • the overlapping rate is calculated by dividing an area of an overlapping region of the basic detection result and the integration detection result by an area of the integration detection result.
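  • A minimal sketch of the overlapping rate computation is shown below; the rectangle representation (x, y, width, height) and the denominator flag are assumptions for illustration, and which area the overlap is divided by depends on the case described above.

        # The overlapping rate divides the area of the overlapping region of the basic
        # detection result and the integration detection result by the area of either
        # the basic detection result or the integration detection result.
        def overlap_rate(basic, integration, divide_by_basic=True):
            bx, by, bw, bh = basic
            ix, iy, iw, ih = integration
            w = min(bx + bw, ix + iw) - max(bx, ix)
            h = min(by + bh, iy + ih) - max(by, iy)
            overlap = max(w, 0) * max(h, 0)
            denominator = bw * bh if divide_by_basic else iw * ih
            return overlap / float(denominator)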
  • When the overlapping rate is equal to or larger than a threshold (for example, 0.5), the merge processing unit 1146 determines that a basic detection result overlapping with the integration detection result is present. The merge processing unit 1146 then sets the inclusive frame and the partial frame based on the condition illustrated in Fig. 42.
  • the merge processing unit 1146 rejects the integration detection result, sets the basic detection result as the inclusive frame, and sets the separation detection result associated with the basic detection result as the partial frame.
  • the merge processing unit 1146 rejects the basic detection result, sets the integration detection result as the inclusive frame, and sets the separation detection result associated with the basic detection result as the partial frame.
  • the merge processing unit 1146 sets the integration detection result as the inclusive frame, and sets the basic detection result as the partial frame.
  • the merge processing unit 1146 sets the integration detection result as the inclusive frame, and sets no partial frame. Content of matching performed by the merge processing unit 1146 has been described above.
  • when the integration detection result overlaps with the basic detection result as a result of matching at Step S1053, the result of Step S1054 is "Yes". In this case, the merge processing unit 1146 merges the inclusive frame (integration detection result) with the partial frame (the basic detection result or the separation detection result) (Step S1055), and generates one "detection result". As described above, merge processing in this example is performed based on the condition illustrated in Fig. 42.
  • if the result of Step S1054 is "No", the merge processing unit 1146 sets only the integration detection result as the inclusive frame (Step S1056). No partial frame is set because a corresponding basic detection result is not present. That is, one "detection result" is generated in which the integration detection result is set as the inclusive frame and no partial frame is set.
  • the correction processing at Step S1057 is performed corresponding to the number of "detection results" generated as described above.
  • the following describes the correction processing performed by the correction unit 1147.
  • the correction unit 1147 performs integration correction processing when the detection result includes the integration detection result. Specific content of the integration correction processing will be described later.
  • the correction unit 1147 corrects a first assembly region, which is included in the separation detection result set as the partial frame and indicates an assembly region (a set of pixels to which an ID is given) detected by the first detection unit 1141, using a correction method corresponding to the distance of the first assembly region.
  • the distance of the first assembly region indicates a distance (distance in the depth direction) from the stereo camera, and can be obtained using the parallax value d of each pixel included in the first assembly region.
  • when the distance of the first assembly region is smaller than a threshold, the correction unit 1147 performs first correction processing on the first assembly region.
  • when the distance of the first assembly region is equal to or larger than the threshold, the correction unit 1147 performs second correction processing on the first assembly region.
  • as the threshold, it is preferable to set a distance value that can secure the road surface estimation accuracy; in this example, the threshold is set to 30 m, but the embodiment is not limited thereto.
  • the first correction processing expands the first assembly region by using a relative standard of the height of the first assembly region from the reference object (road surface). More specifically, a region of interest directed outward from the first assembly region is examined within a second assembly region (an assembly region that includes the first assembly region, is included in the basic detection result associated with the separation detection result, and indicates the assembly region detected by the second detection unit 1142), and the first assembly region is expanded, in the direction in which the region of interest continues, up to the boundary at which the height of the region of interest from the reference object becomes lower than a relative height threshold, which is a relative value in accordance with the average value of the height of the first assembly region from the reference object. Specific content thereof will be described later. In the following description, the first correction processing is referred to as "correction processing for short distance".
  • the second correction processing couples two first assembly regions by using a relative standard of the height of the first assembly region from the reference object (road surface). More specifically, in a second assembly region including two or more first assembly regions, a region of interest between one first assembly region and the other first assembly region is examined, and the two first assembly regions are coupled when the height of the region of interest from the reference object is equal to or larger than the relative height threshold, which is a relative value in accordance with the average value of the height of the first assembly region from the reference object, over the direction continuing from one first assembly region to the other. Specific content will be described later. In the following description, the second correction processing is referred to as "correction processing for long distance".
  • Fig. 43 is a flowchart illustrating an example of correction processing at Step S1057 in Fig. 40.
  • the input information is a list of inclusive frames, the partial frames accompanying the inclusive frames, the ID Umap, the Standard Umap, and the height information.
  • the output information is a corrected list of inclusive frames, the partial frames accompanying the inclusive frames, a corrected ID Umap, and an ID table in which a relation between detection results is recorded.
  • the correction unit 1147 repeats the processing from Step S1061 to Step S1067 corresponding to the number of "detection results".
  • the correction unit 1147 creates an ID table (Step S1061).
  • the ID table is information having a table format in which the inclusive frame and the partial frame are associated with each other using an ID.
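  • a minimal sketch of what such an ID table could look like; the dictionary layout and the numeric IDs are hypothetical, used only for illustration:

      # Maps the ID of an inclusive frame to the IDs of its associated partial frames.
      id_table = {
          10: [11, 12],   # inclusive frame 10 has partial frames 11 and 12
          20: [21],       # inclusive frame 20 has a single partial frame
      }

      def partial_ids_of(inclusive_id):
          # Look up the partial frames recorded for an inclusive frame.
          return id_table.get(inclusive_id, [])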
  • the correction unit 1147 counts the number of partial frames having a size corresponding to a vehicle size among partial frames included in a focused detection result (a group of the inclusive frame and the partial frame) (Step S1062).
  • the correction unit 1147 determines whether the detection result includes the integration detection result (Step S1063). That is, the correction unit 1147 determines whether the inclusive frame included in the detection result is the integration detection result.
  • if the result of Step S1063 is "Yes" (Yes at Step S1063), the correction unit 1147 performs integration correction processing (Step S1064). If the result of Step S1063 is "No" (No at Step S1063), the correction unit 1147 determines whether the distance of the detection result is smaller than a predetermined distance (for example, 30 m) (Step S1065). If the result of Step S1065 is "Yes" (Yes at Step S1065), the correction unit 1147 performs correction processing for short distance (Step S1066). If the result of Step S1065 is "No" (No at Step S1065), the correction unit 1147 performs correction processing for long distance (Step S1067).
  • when the detection result includes the integration detection result (a result of detection using the Small Umap as a map having low resolution), integration correction processing is performed considering a distance difference and a horizontal position of the basic detection result and the separation detection result. Accordingly, the detection result can be corrected to have high separation performance while reducing erroneous separation.
  • an appropriate one of the correction processing for short distance and the correction processing for long distance is used depending on the distance of the detection result (a sketch of this dispatch follows below). Accordingly, correction can be performed using an appropriate method for short distance, for which road surface estimation accuracy is high, and for long distance, for which road surface estimation accuracy is low.
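  • a minimal sketch of the branching at Steps S1063 to S1067, assuming the 30 m threshold used in this example; the function and label names are hypothetical:

      SHORT_DISTANCE_THRESHOLD_M = 30.0   # example value from the text

      def choose_correction(has_integration_result, distance_m,
                            threshold=SHORT_DISTANCE_THRESHOLD_M):
          # Decide which correction processing applies to one detection result.
          if has_integration_result:                  # Step S1063 is "Yes"
              return "integration correction"         # Step S1064
          if distance_m < threshold:                  # Step S1065 is "Yes"
              return "correction for short distance"  # Step S1066
          return "correction for long distance"       # Step S1067

      # Example: a detection result at 45 m without an integration detection result.
      assert choose_correction(False, 45.0) == "correction for long distance"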
  • the integration detection result is obtained by using a map having coarse resolution (Small Umap). Due to this, erroneous separation of the object can be reduced, but separation performance is deteriorated.
  • the basic detection result and the separation detection result are obtained by using a map having high resolution, so that separation performance is high but erroneous separation of the object is problematic.
  • in the correction processing for the integration detection result, all partial frames (the basic detection result or the separation detection result) associated with the integration detection result are not unconditionally coupled (integrated) with each other as the same object; instead, coupling determination based on a distance difference or a horizontal position difference is made, so that the detection result is corrected to have high separation performance while reducing erroneous separation.
  • Fig. 44 is a flowchart illustrating an example of integration correction processing.
  • the correction unit 1147 performs correction processing on the inclusive frame (integration detection result) (Step S1071).
  • the correction unit 1147 performs correction processing on the partial frame (the basic detection result or the separation detection result) included in the inclusive frame (Step S1072).
  • the correction unit 1147 performs coupling processing on partial frames after the correction processing at Step S1072 (Step S1073). Specific content of each step will be described later.
  • when the inclusive frame includes only one partial frame, the coupling processing at Step S1073 is not performed.
  • the correction unit 1147 calculates a circumscribing rectangle of pixels having a parallax in the inclusive frame.
  • the ID of the inclusive frame is given to the pixel having no ID and having a frequency value of the parallax among the pixels included in the circumscribing rectangle.
  • a pixel having the frequency value but not having the ID may be present, and the correction processing of the inclusive frame is processing of setting the ID to such a pixel as part of the inclusive frame. Accordingly, an appropriate inclusive frame can be set to an object the parallax value of which is hardly obtained. This processing may be omitted as needed.
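  • a minimal sketch of this inclusive-frame correction, using NumPy arrays already cropped to the inclusive frame as stand-ins for the frequency Umap and the ID Umap; the array representation and the use of 0 as "no ID" are assumptions of this sketch:

      import numpy as np

      def correct_inclusive_frame(freq_umap, id_umap, frame_id):
          # freq_umap: parallax frequency values inside the inclusive frame.
          # id_umap: IDs of the same region (0 meaning "no ID" in this sketch).
          ys, xs = np.nonzero(freq_umap)              # pixels having a parallax
          if xs.size == 0:
              return None
          x0, x1 = xs.min(), xs.max()                 # circumscribing rectangle
          y0, y1 = ys.min(), ys.max()
          region_f = freq_umap[y0:y1 + 1, x0:x1 + 1]
          region_i = id_umap[y0:y1 + 1, x0:x1 + 1]
          # Give the ID of the inclusive frame to pixels that have a frequency
          # value but no ID yet.
          region_i[(region_f > 0) & (region_i == 0)] = frame_id
          return (x0, y0, x1 - x0 + 1, y1 - y0 + 1)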
  • Fig. 46 is a flowchart illustrating a procedure of the correction processing of the partial frame.
  • the correction unit 1147 repeatedly performs processing at Step S1710 and Step S1720 for each partial frame associated with a focused inclusive frame.
  • the correction unit 1147 performs expansion processing of the partial frame.
  • the correction unit 1147 performs processing of updating the ID of the pixel in an expanded region to an ID of the partial frame.
  • the expansion processing is performed based on height information, for example. That is, a pixel having connectivity with the height information but having no frequency value is caused to be valid. An ID is then set to the valid pixel. This is the correction processing of the partial frame. Due to this, an appropriate partial frame can be set to an object the parallax value of which is hardly obtained. This processing may be omitted as needed.
  • the correction unit 1147 couples, among all combinations of partial frames, those partial frames for which both the distance difference between the partial frames in the optical axis direction (depth direction) of the camera (center distance difference) and the distance difference between the partial frames in the right and left direction of the camera (center horizontal position difference) are small.
  • in the coupling processing, a circumscribing rectangle of the two partial frames to be coupled is calculated, and the region of the circumscribing rectangle is set as a coupled partial frame.
  • Fig. 47 is a table illustrating an example of a condition whether to be a target of coupling processing. Respective thresholds of the center distance difference and the center horizontal position difference are not limited to the example of Fig. 47 (2 m, 6 m), and can be optionally changed in a range in which erroneous separation may be caused for the same object.
  • the center horizontal position difference corresponds to a distance between the frames described later.
  • the center distance difference is obtained as a difference between an average value of a distance (a distance derived from the parallax value d) for each pixel included in one of the partial frames and an average value of a distance for each pixel included in the other one of the partial frames.
  • alternatively, partial frames satisfying at least one of a small center distance difference (the distance difference between the partial frames in the optical axis direction (depth direction) of the camera) and a small center horizontal position difference (the distance difference between the partial frames in the right and left direction of the camera) may be coupled to each other.
  • an overlapping rate of partial frames may also be used to decide the targets of the coupling processing. The overlapping rate herein is obtained by dividing the area of the overlapping region of two partial frames by the area of one of the two partial frames (typically, the partial frame having the smaller size). When the overlapping rate is equal to or larger than a predetermined ratio (which can be optionally set within a range in which it can be determined that erroneous separation may be caused, for example, 20%), the two partial frames as targets are caused to be the targets of the coupling processing; otherwise, the partial frame may be excluded from the target of the coupling processing (a sketch of the coupling-target decision follows below).
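  • a minimal sketch of the coupling-target decision just described, using the example thresholds of Fig. 47 (2 m, 6 m) and the example overlap ratio of 20%; whether both distance conditions must hold and how the overlap rate is combined with them follow the reading given above and are assumptions:

      CENTER_DISTANCE_DIFF_MAX_M = 2.0     # depth-direction threshold (Fig. 47 example)
      CENTER_HORIZONTAL_DIFF_MAX_M = 6.0   # right-left direction threshold (Fig. 47 example)
      OVERLAP_RATIO_MIN = 0.2              # example ratio from the text

      def is_coupling_target(center_dist_diff_m, center_horiz_diff_m, overlap_rate):
          # Two partial frames become targets of the coupling processing when both
          # center differences are small, or when they overlap sufficiently.
          close_enough = (center_dist_diff_m < CENTER_DISTANCE_DIFF_MAX_M
                          and center_horiz_diff_m < CENTER_HORIZONTAL_DIFF_MAX_M)
          return close_enough or overlap_rate >= OVERLAP_RATIO_MIN

      # Example: frames 1.2 m apart in depth and 3 m apart horizontally are coupled.
      assert is_coupling_target(1.2, 3.0, overlap_rate=0.0)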
  • a corrected list of inclusive frames, corrected (expanded, coupled) partial frames accompanying the inclusive frames, and a corrected ID Umap are output as output information.
  • the separation detection processing is detection processing using the High Umap to which only the parallax value of a region having a high height from the road surface is voted, so that separation performance is high.
  • an object spreading in a region having a low height from the road surface may be detected to have a smaller frame than an actual frame.
  • in the correction processing for short distance, the detection frame of the separation detection result is corrected using a relative standard of the height of the detection frame from the road surface.
  • Fig. 48 is a flowchart illustrating a procedure of correction processing for short distance.
  • the correction unit 1147 checks whether one or more partial frames (in this case, separation detection results) having a vehicle size are present in the inclusive frame of the focused detection result (in this case, the basic detection result) (Step S1091). If the result of Step S1091 is "No" (No at Step S1091), the processing is ended. If the result of Step S1091 is "Yes" (Yes at Step S1091), the correction unit 1147 performs expansion processing of the partial frame (Step S1092). This specific content is similar to the content described in the integration correction processing. Next, the correction unit 1147 performs update processing on a pixel ID (Step S1093). This content is also similar to the content described in the integration correction processing, so that detailed description thereof will not be repeated. Next, the correction unit 1147 deletes the inclusive frame (Step S1094).
  • estimation accuracy for the road surface in a case of long distance is lower than that in the case of short distance.
  • the parallax of the road surface is voted, which causes coupling of the detection frames or expansion of the detection frame.
  • This problem can be solved by employing the separation detection result as a detection result of a region having a high height from the road surface.
  • on the other hand, the separation detection result has high separation performance, so that, when the road surface is estimated to be higher than the actual road surface or the vehicle has a low height, erroneous separation may be caused and the object is likely to be detected to be smaller than the actual object.
  • the coupling processing and the correction processing of the detection frame are performed considering the above points.
  • Fig. 49 is a flowchart illustrating a procedure of correction processing for long distance.
  • the correction unit 1147 checks whether one or more partial frames (in this case, separation detection results) having a vehicle size are present in the inclusive frame of the focused detection result (in this case, the basic detection result) (Step S1101). If the result of Step S1101 is "Yes” (Yes at Step S1101), the correction unit 1147 performs coupling determination processing described later for each combination of the partial frame having a vehicle size and the other partial frame (Step S1102). On the other hand, if the result of Step S1101 is "No" (No at Step S1101), the correction unit 1147 performs the coupling determination processing described later for each combination of all partial frames (for each combination of two partial frames) (Step S1103).
  • the correction unit 1147 specifies height information corresponding to the region of the inclusive frame, and the region of the partial frame associated with the inclusive frame in advance.
  • the correction unit 1147 calculates a distance (in the following description, referred to as a "distance between the frames") between portions facing each other in the X-direction (the right and left direction of the camera) of focused two partial frames.
  • the distance between the boundary on the right side of the left partial frame and the boundary on the left side of the right partial frame is calculated as the distance between the frames.
  • when the distance between the frames is smaller than a predetermined threshold, the correction unit 1147 causes the two partial frames to be coupling targets.
  • when the distance between the frames is equal to or larger than the predetermined threshold, the objects may be different objects with high possibility, so that coupling processing is not performed on the two partial frames.
  • in this example, 1.5 m is employed as the predetermined threshold, but the embodiment is not limited thereto (a sketch of this check follows below). The following describes processing in a case in which the two partial frames become coupling targets.
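  • a minimal sketch of the distance-between-frames check just described, with each partial frame represented by its (left, right) extent in meters in the camera right-left direction; the representation and function names are assumptions:

      FRAME_GAP_THRESHOLD_M = 1.5   # example threshold from the text

      def distance_between_frames(left_frame, right_frame):
          # Gap between the right boundary of the left frame and the left
          # boundary of the right frame (negative when the frames overlap).
          return right_frame[0] - left_frame[1]

      def are_coupling_targets(left_frame, right_frame, threshold=FRAME_GAP_THRESHOLD_M):
          # The two partial frames become coupling targets when the gap between
          # them is smaller than the threshold.
          return distance_between_frames(left_frame, right_frame) < threshold

      # Example: a 0.8 m gap is smaller than 1.5 m, so the frames become coupling targets.
      assert are_coupling_targets((0.0, 2.0), (2.8, 4.5))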
  • the correction unit 1147 sets a region between the focused partial frames (in the example of Fig. 51, a region continuous from the partial frame on the left side (right side) to the partial frame on the right side (left side)) as a region of interest.
  • the correction unit 1147 then obtains an average value of the height (average value of the height of each pixel included in the partial frame in a height map) from the road surface of the partial frame as a coupling destination (for example, any of the partial frame on the right side and the partial frame on the left side), and uses a value relative to the average value as a threshold (hereinafter, referred to as a "relative height threshold").
  • 1/4 of the average value is assumed to be the relative height threshold.
  • the value relative to the average value of the height of one of the two partial frames as coupling targets is assumed to be the relative height threshold, but the embodiment is not limited thereto.
  • a value relative to the average value of the height of the two partial frames as coupling targets may be assumed to be the relative height threshold.
  • the correction unit 1147 then creates a height profile indicating distribution of the most frequent height in a direction in which the region of interest continues. For example, as illustrated in Fig. 52 and Fig. 53, the correction unit 1147 obtains the most frequent height in each column of the region of interest to create the height profile. The correction unit 1147 then checks continuity of height based on the relative height threshold and the height profile.
  • the correction unit 1147 checks continuity of height by checking whether any most frequent height smaller than the relative height threshold is present among the most frequent heights indicated by the height profile, and determines to perform coupling processing on the partial frames only when there is continuity of height. For example, as illustrated in Fig. 52, when all the most frequent heights indicated by the height profile are equal to or larger than the relative height threshold (that is, when the most frequent height is equal to or larger than the relative height threshold in all columns of the region of interest), the correction unit 1147 determines that there is continuity of height, and determines to perform coupling processing on the two partial frames. On the other hand, as illustrated in Fig. 53 for example, when a most frequent height smaller than the relative height threshold is present among the most frequent heights indicated by the height profile, the correction unit 1147 determines that there is no continuity of height, and determines not to perform coupling processing on the two partial frames.
  • the region of interest may be divided into an upper part and a lower part, and whether to perform coupling processing may be determined by checking continuity of height for each divided region of interest.
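  • a minimal sketch of the height-profile continuity check, with the region of interest given as a NumPy array of per-pixel heights from the road surface (0 meaning no data) and the 1/4 factor used as the example relative height threshold; treating a column without data as lacking continuity is an additional assumption of this sketch:

      import numpy as np

      def has_height_continuity(roi_heights, coupling_dest_avg_height):
          # roi_heights: rows x columns array of heights from the road surface
          # inside the region of interest; the profile is built per column.
          relative_threshold = coupling_dest_avg_height / 4.0
          for column in roi_heights.T:
              values = column[column > 0]
              if values.size == 0:
                  return False            # no height data in this column (assumption)
              heights, counts = np.unique(values, return_counts=True)
              most_frequent = heights[np.argmax(counts)]
              if most_frequent < relative_threshold:
                  return False            # a column falls below the relative threshold
          return True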
  • Fig. 54 is a flowchart illustrating a procedure of coupling determination processing described above. Specific content of each step is described above, so that the description thereof will be appropriately omitted.
  • the correction unit 1147 checks whether the distance between the frames is smaller than the threshold (Step S1111). If the result of Step S1111 is "No" (No at Step S1111), the correction unit 1147 determines not to perform coupling processing on the focused two partial frames (Step S1117), and ends the processing. If the result of Step S1111 is "Yes" (Yes at Step S1111), the correction unit 1147 sets the region of interest (Step S1112). Next, the correction unit 1147 calculates the relative height threshold (Step S1113), and checks the continuity of height (Step S1114).
  • specific content of Step S1112 to Step S1114 is described above. If it is determined that there is continuity of height as a result of Step S1114 (Yes at Step S1115), the correction unit 1147 determines to perform coupling processing (Step S1116), and ends the processing. On the other hand, if it is determined that there is no continuity of height as a result of Step S1114 (No at Step S1115), the correction unit 1147 determines not to perform coupling processing (Step S1117), and ends the processing.
  • the correction unit 1147 performs coupling processing on two partial frames determined to be coupled in the coupling determination processing among combinations of partial frames, and does not perform coupling processing on two partial frames determined not to be coupled. That is, the processing at Step S1104 illustrated in Fig. 49 is performed for each combination of partial frames. If the coupling processing is determined to be performed (Yes at Step S1104), the correction unit 1147 performs coupling processing on the focused two partial frames (Step S1105), and the process proceeds to Step S1106. As the coupling processing, a circumscribing rectangle of the partial frames as coupling targets is calculated, a region of the circumscribing rectangle is set as a partial frame after coupling, and update processing of the ID is performed. On the other hand, if the result of Step S1104 is "No" (No at Step S1104), the processing directly proceeds to Step S1106.
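  • a minimal sketch of the circumscribing rectangle used when two partial frames are coupled, again with (x, y, width, height) rectangles as an assumed representation:

      def circumscribing_rectangle(a, b):
          # Smallest rectangle containing both partial frames; its region becomes
          # the partial frame after coupling.
          x0 = min(a[0], b[0])
          y0 = min(a[1], b[1])
          x1 = max(a[0] + a[2], b[0] + b[2])
          y1 = max(a[1] + a[3], b[1] + b[3])
          return (x0, y0, x1 - x0, y1 - y0)

      # Example: two neighbouring partial frames merged into one frame.
      merged = circumscribing_rectangle((0, 0, 4, 10), (6, 1, 4, 9))   # (0, 0, 10, 10)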
  • the correction unit 1147 performs correction processing on the partial frame.
  • Content of the correction processing is the same as that of the correction processing at Step S1072 in Fig. 44, so that detailed description thereof will not be repeated.
  • the processing at Step S1106 will be repeatedly performed corresponding to the number of partial frames.
  • the correction unit 1147 corrects the first assembly region while switching the correction method in accordance with the distance of the first assembly region obtained through the separation detection processing. More specifically, the correction unit 1147 performs correction processing for short distance on the first assembly region when the distance of the first assembly region is smaller than a threshold, and performs correction processing for long distance on the first assembly region when the distance of the first assembly region is equal to or larger than the threshold.
  • estimation accuracy for the road surface is high in a case of short distance, so that erroneous separation of the separation detection result is hardly caused, but an object spreading in a region having a low height from the road surface may be detected to have a smaller frame than an actual frame in the separation detection processing.
  • the correction processing for short distance is processing of expanding the first assembly region by using a relative standard of the height of the first assembly region from the road surface obtained through the separation detection processing.
  • estimation accuracy for the road surface is high, so that processing such as coupling is not required.
  • estimation accuracy for the road surface in a case of long distance is lower than that in the case of short distance, so that erroneous separation of the separation detection result is easily caused.
  • the correction processing for long distance is processing of coupling two first assembly regions by using a relative standard of the height of the first assembly region from the road surface obtained through the separation detection processing.
  • detection accuracy for the object can be sufficiently secured by switching between the correction processing for short distance and the correction processing for long distance in accordance with the distance of the first assembly region obtained through the separation detection processing to correct the first assembly region.
  • the computer program executed by the equipment control system 1100 may be recorded and provided in a computer-readable recording medium such as a compact disc read only memory (CD-ROM), a flexible disk (FD), a compact disc recordable (CD-R), a digital versatile disc (DVD), and a Universal Serial Bus (USB) as an installable or executable file, or may be provided or distributed via a network such as the Internet.
  • Various computer programs may be embedded and provided in a ROM, for example.
  • any of the above-described apparatus, devices or units can be implemented as a hardware apparatus, such as a special-purpose circuit or device, or as a hardware/software combination, such as a processor executing a software program.
  • any one of the above-described and other methods of the present invention may be embodied in the form of a computer program stored in any kind of storage medium.
  • examples of storage media include, but are not limited to, flexible disks, hard disks, optical discs, magneto-optical discs, magnetic tapes, nonvolatile memory, semiconductor memory, read-only memory (ROM), etc.
  • ASIC (application specific integrated circuit), DSP (digital signal processor), FPGA (field programmable gate array)
  • Object recognition device (example of "information processing device"); 2 Main body unit (example of "imaging device"); 3 Parallax value deriving unit; 4 Communication line; 5 Recognition processing unit; 6 Vehicle control device (example of "control device"); 60 Equipment control system; 70 Vehicle; 100a, 100b Image acquisition unit; 200a, 200b Conversion unit; 300 Parallax value arithmetic processing unit (example of "generation unit"); 500 Second generation unit; 501 Third generation unit (example of "movement surface estimation unit"); 502 Fourth generation unit; 503 Fifth generation unit; 510 Clustering processing unit; 511 Basic detection unit (example of "first detection unit"); 512 Separation detection unit (example of "second detection unit"); 513 Integration detection unit (example of "first detection unit"); 514 Selection unit; 515 Frame creation unit; 516 Background detection unit; 517 Rejection unit; 530 Tracking unit; 1100 Equipment control system; 1101 Vehicle; 1102 Imaging unit; 1103 Analyzing unit; 1104 Control unit; 1105 Display unit; 1106 Windshield; 1111 Preprocessing unit; 1112 Paralleled image

Abstract

An information processing device includes: a first generation unit configured to generate first information in which a horizontal direction position and a depth direction position of an object are associated with each other, from information in which a vertical direction position, the horizontal direction position, and the depth direction position of the object are associated with one another; a first detection unit configured to detect a region indicating the object based on the first information; a second generation unit configured to generate, from the information in which the vertical direction position, the horizontal direction position, and the depth direction position of the object are associated with one another, second information in which the horizontal direction position and the depth direction position of the object are associated with each other and which has higher separation performance than the separation performance of the first information; a second detection unit configured to detect a plurality of regions indicating objects based on the second information; and an output unit configured to associate the region detected based on the first information with the regions detected based on the second information, and to output the region and the regions that are associated with each other.
PCT/JP2017/042302 2016-11-25 2017-11-24 Dispositif de traitement d'informations, dispositif d'imagerie, système de commande d'équipement, objet mobile, procédé de traitement d'informations et support d'enregistrement lisible par ordinateur WO2018097269A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US16/347,127 US20200074212A1 (en) 2016-11-25 2017-11-24 Information processing device, imaging device, equipment control system, mobile object, information processing method, and computer-readable recording medium
EP17812277.6A EP3545464A1 (fr) 2016-11-25 2017-11-24 Dispositif de traitement d'informations, dispositif d'imagerie, système de commande d'équipement, objet mobile, procédé de traitement d'informations et support d'enregistrement lisible par ordinateur
CN201780072352.XA CN109997148B (zh) 2016-11-25 2017-11-24 信息处理装置、成像装置、设备控制系统、移动对象、信息处理方法和计算机可读记录介质

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
JP2016-229566 2016-11-25
JP2016229468 2016-11-25
JP2016-229572 2016-11-25
JP2016229566 2016-11-25
JP2016-229468 2016-11-25
JP2016229572 2016-11-25
JP2017-177897 2017-09-15
JP2017177897A JP7206583B2 (ja) 2016-11-25 2017-09-15 情報処理装置、撮像装置、機器制御システム、移動体、情報処理方法およびプログラム

Publications (1)

Publication Number Publication Date
WO2018097269A1 true WO2018097269A1 (fr) 2018-05-31

Family

ID=60661913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/042302 WO2018097269A1 (fr) 2016-11-25 2017-11-24 Dispositif de traitement d'informations, dispositif d'imagerie, système de commande d'équipement, objet mobile, procédé de traitement d'informations et support d'enregistrement lisible par ordinateur

Country Status (1)

Country Link
WO (1) WO2018097269A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3955207A4 (fr) * 2019-04-10 2022-12-28 Hitachi Astemo, Ltd. Dispositif de détection d'objet

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008065634A (ja) 2006-09-07 2008-03-21 Fuji Heavy Ind Ltd 物体検出装置および物体検出方法
US20130128001A1 (en) * 2011-11-18 2013-05-23 Ganmei YOU Method and system for detecting object on a road
US20160014406A1 (en) * 2014-07-14 2016-01-14 Sadao Takahashi Object detection apparatus, object detection method, object detection program, and device control system mountable to moveable apparatus

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUNG-HEE LEE: "Stereo vision-based vehicle detection using a road feature and disparity histogram", OPTICAL ENGINEERING, vol. 50, no. 2, 1 February 2011 (2011-02-01), pages 027004, XP055049147, ISSN: 0091-3286, DOI: 10.1117/1.3535590 *
WU MEIQING ET AL: "Stereo based ROIs generation for detecting pedestrians in close proximity", 17TH INTERNATIONAL IEEE CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), IEEE, 8 October 2014 (2014-10-08), pages 1929 - 1934, XP032685662, DOI: 10.1109/ITSC.2014.6957988 *

Similar Documents

Publication Publication Date Title
CN109997148B (zh) 信息处理装置、成像装置、设备控制系统、移动对象、信息处理方法和计算机可读记录介质
JP6795027B2 (ja) 情報処理装置、物体認識装置、機器制御システム、移動体、画像処理方法およびプログラム
US10776946B2 (en) Image processing device, object recognizing device, device control system, moving object, image processing method, and computer-readable medium
JP6743882B2 (ja) 画像処理装置、機器制御システム、撮像装置、画像処理方法及びプログラム
JP6597795B2 (ja) 画像処理装置、物体認識装置、機器制御システム、画像処理方法およびプログラム
JP6597792B2 (ja) 画像処理装置、物体認識装置、機器制御システム、画像処理方法およびプログラム
JP6769477B2 (ja) 画像処理装置、撮像装置、移動体機器制御システム、画像処理方法、及びプログラム
EP3115933B1 (fr) Dispositif de traitement d'images, dispositif de capture d'images, système de commande de corps mobile, procédé de traitement d'images et support d'enregistrement lisible sur ordinateur
US10748014B2 (en) Processing device, object recognition apparatus, device control system, processing method, and computer-readable recording medium
JP6907513B2 (ja) 情報処理装置、撮像装置、機器制御システム、情報処理方法およびプログラム
JP6547841B2 (ja) 画像処理装置、物体認識装置、機器制御システム、画像処理方法およびプログラム
JP6516012B2 (ja) 画像処理装置、物体認識装置、機器制御システム、画像処理方法およびプログラム
JP5073700B2 (ja) 物体検出装置
JP6992356B2 (ja) 情報処理装置、撮像装置、機器制御システム、移動体、情報処理方法およびプログラム
US10789727B2 (en) Information processing apparatus and non-transitory recording medium storing thereon a computer program
JP2012252501A (ja) 走行路認識装置及び走行路認識用プログラム
JP6701905B2 (ja) 検出装置、視差値導出装置、物体認識装置、機器制御システム、およびプログラム
WO2017154305A1 (fr) Dispositif de traitement d'image, système de commande d'appareil, dispositif d'imagerie, procédé de traitement d'image et programme
WO2018097269A1 (fr) Dispositif de traitement d'informations, dispositif d'imagerie, système de commande d'équipement, objet mobile, procédé de traitement d'informations et support d'enregistrement lisible par ordinateur
JP6969245B2 (ja) 情報処理装置、撮像装置、機器制御システム、移動体、情報処理方法、及び、情報処理プログラム
JP7062904B2 (ja) 情報処理装置、撮像装置、機器制御システム、移動体、情報処理方法およびプログラム
JP2019160251A (ja) 画像処理装置、物体認識装置、機器制御システム、移動体、画像処理方法およびプログラム
JP6943092B2 (ja) 情報処理装置、撮像装置、機器制御システム、移動体、情報処理方法、及び、情報処理プログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17812277

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017812277

Country of ref document: EP

Effective date: 20190625