WO2021084915A1 - Image recognition device - Google Patents

Image recognition device

Info

Publication number
WO2021084915A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
information
dimensional object
processing unit
parallax
Application number
PCT/JP2020/033886
Other languages
French (fr)
Japanese (ja)
Inventor
郭介 牛場
亮輔 鴇
Original Assignee
日立Astemo株式会社
Application filed by 日立Astemo株式会社 filed Critical 日立Astemo株式会社
Priority to JP2021554138A priority Critical patent/JP7379523B2/en
Priority to DE112020004377.0T priority patent/DE112020004377T5/en
Publication of WO2021084915A1 publication Critical patent/WO2021084915A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/1801Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/16Anti-collision systems
    • G08G1/166Anti-collision systems for active traffic, e.g. moving vehicles, pedestrians, bikes

Definitions

  • The present invention relates to an image recognition device.
  • In Patent Document 1, a recognition device is proposed that, in a situation where a moving three-dimensional object apparently overlaps another three-dimensional object, detects the moving three-dimensional object, such as a pedestrian, present inside a predetermined region containing the three-dimensional object by tracking feature points inside that region.
  • Patent Document 2 proposes a method using machine learning, and also proposes performing recognition by combining an image taken by an optical camera with distance information obtained from stereo matching, radar, or the like.
  • However, conventional devices use texture information captured by an optical camera to recognize objects, so erroneous recognition occurs for photographs drawn on walls or signboards and for similar silhouettes produced by combinations of natural objects. This is because, when recognition processing is performed using an optical camera image and the corresponding distance image, the combined information on the pixels, the distances, and the regions that group them becomes too large to realize at a realistic cost.
  • The present invention has been made in view of the above circumstances, and its object is to provide an image recognition device capable of accurately detecting a three-dimensional object and improving recognition performance while suppressing an increase in cost.
  • The image recognition device of the present invention that solves the above problems is an image recognition device that recognizes a three-dimensional object in an image captured by an imaging unit, wherein, for a detection area of the three-dimensional object set on the image, the distance information or parallax information of the three-dimensional object is numerically converted, and the numerically converted distance or parallax information is combined with the image information of the image to perform recognition processing that identifies the type of the three-dimensional object.
  • According to the present invention, it is possible to provide an image recognition device capable of accurately detecting a three-dimensional object and improving recognition performance while suppressing an increase in cost.
  • FIG. 8 is a block diagram showing the functional block configuration (Example 3) of the image recognition device involved in the three-dimensional object recognition process. FIG. 9 is a flowchart showing the details (Example 3) of the three-dimensional object recognition process. FIG. 10 is a schematic diagram showing the procedure for creating a background-removed edge image in which background edges have been removed from a luminance image using weight information. FIG. 11 is a flowchart showing the operation of an image recognition device of another example.
  • FIG. 1 is a block diagram showing an overall configuration of an image recognition device 100 according to the present embodiment.
  • The image recognition device 100 is mounted on a vehicle (hereinafter sometimes referred to as the own vehicle) and includes a left camera (imaging unit) 101 and a right camera (imaging unit) 102 (hereinafter sometimes referred to simply as cameras 101 and 102) arranged side by side at the front of the vehicle.
  • The cameras 101 and 102 constitute a stereo camera and image three-dimensional objects in front of the vehicle, such as pedestrians, vehicles, traffic lights, signs, white lines, tail lamps, and headlights.
  • The image recognition device 100 includes a processing device 110 that recognizes the environment outside the vehicle based on the information (image information) of the images in front of the vehicle captured by the cameras 101 and 102. The vehicle (own vehicle) then controls braking, steering, and the like based on the recognition results of the image recognition device 100.
  • The processing device 110 of the image recognition device 100 takes in the images captured by the cameras 101 and 102 through the image input interface 103.
  • The image information taken in through the image input interface 103 is sent to the image processing unit 104 via the internal bus 109 and processed by the arithmetic processing unit 105, and intermediate results, final image information, and the like are stored in the storage unit 106.
  • The image processing unit 104 compares the first image obtained from the image sensor of the left camera 101 (hereinafter sometimes referred to as the left image) with the second image obtained from the image sensor of the right camera 102 (hereinafter sometimes referred to as the right image), performs image corrections on each image, such as correcting device-specific deviations caused by the image sensor and interpolating noise, and stores the result in the storage unit 106 as image information. Further, the image processing unit 104 calculates mutually corresponding points between the first image and the second image to obtain parallax information, and stores this in the storage unit 106 as distance information corresponding to each pixel on the image.
  • The image processing unit 104 is connected to the arithmetic processing unit 105, the CAN interface 107, and the control processing unit 108 via the internal bus 109.
  • The arithmetic processing unit 105 uses the image information and distance information (parallax information) stored in the storage unit 106 to recognize three-dimensional objects in order to grasp the environment around the vehicle. Part of the recognition results and intermediate processing results are stored in the storage unit 106. After recognizing three-dimensional objects in the captured image, the arithmetic processing unit 105 calculates vehicle control using the recognition results. The vehicle control policy obtained from this calculation and part of the recognition results are transmitted to the in-vehicle network CAN 111 via the CAN interface 107, whereby the vehicle is controlled.
  • The control processing unit 108 monitors whether each processing unit is operating abnormally or whether errors have occurred during data transfer, and prevents abnormal operation.
  • The image processing unit 104, the arithmetic processing unit 105, and the control processing unit 108 may be composed of a single computer unit or a plurality of computer units.
  • FIG. 2 is a flowchart showing the operation of the image recognition device 100.
  • In S201 and S202, images are captured by the left camera 101 and the right camera 102 of the image recognition device 100, and image processing S203, such as corrections that absorb the unique characteristics of each image sensor, is performed on each of the captured image information 121 and 122. The processing result of image processing S203 is stored in the image buffer 161.
  • The image buffer 161 is provided in the storage unit 106 of FIG. 1.
  • Next, parallax processing S204 is performed. Specifically, the two images corrected in image processing S203 are matched against each other to obtain parallax information between the images from the left camera 101 and the right camera 102. From the parallax of the left and right images, the distance to a point of interest on a three-dimensional object is obtained by the principle of triangulation.
  • The processing result of parallax processing S204 is stored in the parallax buffer 162.
  • The parallax buffer 162 is provided in the storage unit 106 of FIG. 1. The information recorded in the parallax buffer 162 may also be converted into distance information before being used in subsequent processing.
  • Image processing S203 and parallax processing S204 are performed by the image processing unit 104 of FIG. 1, and the finally obtained image information and parallax information are stored in the storage unit 106.
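  • As an aside on the parallax-to-distance conversion mentioned above, the sketch below illustrates the standard triangulation relation Z = f·B/d; the focal length and baseline values are placeholder assumptions, not values from this publication.

```python
import numpy as np

def parallax_to_distance(parallax, focal_px=1400.0, baseline_m=0.35):
    """Convert a parallax (disparity) map in pixels to a distance map in meters.

    Triangulation: Z = f * B / d. Pixels without a valid disparity (d <= 0)
    are marked as infinitely far away.
    """
    parallax = np.asarray(parallax, dtype=np.float64)
    distance = np.full(parallax.shape, np.inf)
    valid = parallax > 0
    distance[valid] = focal_px * baseline_m / parallax[valid]
    return distance

# Example: a 2x2 disparity patch; a larger disparity means a closer point.
print(parallax_to_distance(np.array([[49.0, 0.0], [7.0, 14.0]])))
```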
  • In the next three-dimensional object detection process S205, a three-dimensional object in three-dimensional space is detected using the parallax information obtained by parallax processing S204. FIG. 3 is a diagram showing the detection areas of three-dimensional objects (also called three-dimensional object areas) set on the image by the three-dimensional object detection process S205.
  • FIG. 3 shows a pedestrian detection area 301 and a vehicle detection area 302 detected by the cameras 101 and 102 on the image as a result of the three-dimensional object detection process S205.
  • These detection areas 301 and 302 indicate areas where a pedestrian or a vehicle exists on the image; they may be rectangular as shown in FIG. 3, or they may be irregular areas obtained from parallax or distance, and they are generally treated as rectangles to facilitate handling by a computer in subsequent processing.
  • In the following, the detection area is treated as a rectangle, and a pedestrian is mainly used as the example of a three-dimensional object.
  • Next, in the three-dimensional object recognition process S206, recognition processing that identifies the type of the three-dimensional object is performed on the detection area set on the image by the three-dimensional object detection process S205.
  • The three-dimensional objects to be recognized by the three-dimensional object recognition process S206 are, for example, pedestrians, vehicles, traffic lights, signs, white lines, and the tail lamps and headlights of cars, and it is identified which of these an object is.
  • This three-dimensional object recognition process S206 is performed using the image information recorded in the image buffer 161 and the parallax information recorded in the parallax buffer 162.
  • However, the information in the parallax buffer 162 can cause erroneous recognition because there are infinitely many possible relationships between the object and the background. This is the same even when a radar, such as a millimeter-wave radar, is combined with an image sensor such as a camera.
  • The details of the three-dimensional object recognition process S206 that solves this problem are described later.
  • Next, in the vehicle control process S207, taking into account the recognition result of the three-dimensional object recognition process S206 and the state of the own vehicle (speed, steering angle, and so on), control is determined, for example, issuing a warning to the occupants and braking or adjusting the steering angle of the own vehicle, or avoidance control for the recognized three-dimensional object is determined, and the result is output as automatic control information via the CAN interface 107 (S208).
  • The three-dimensional object detection process S205, the three-dimensional object recognition process S206, and the vehicle control process S207 are performed by the arithmetic processing unit 105 of FIG. 1.
  • The programs shown in the flowchart of FIG. 2 and in the flowcharts such as FIG. 5 described later can be executed by a computer equipped with a CPU, memory, and the like. All or part of the processing may be realized by a hard logic circuit. Further, the programs can be provided by being stored in advance in the storage medium of the image recognition device 100, stored and provided in an independent storage medium, or recorded and stored in the storage medium of the image recognition device 100 via a network line. They may also be supplied as computer-readable computer program products in various forms, such as data signals (carrier waves).
  • FIG. 4 is a block diagram showing a functional block configuration (Example 1) of the image recognition device 100 related to the three-dimensional object recognition process S206.
  • FIG. 5 is a flowchart showing the details (Example 1) of the three-dimensional object recognition process S206.
  • In this example, the three-dimensional object recognition process S206 of FIG. 2, that is, the flowchart shown in FIG. 5, is carried out by a normalization processing unit 401 that normalizes the information in the parallax buffer 162 and a recognition processing unit 402 that combines the normalized information with the information of the image buffer 161, both provided in the arithmetic processing unit 105, as shown in FIG. 4.
  • The normalization processing unit 401 normalizes the parallax corresponding to the detection area acquired by the three-dimensional object detection process S205 among the information held in the parallax buffer 162 (FIG. 5: S501).
  • In the normalization process S501, for example, the value s_i of each parallax is numerically converted into the normalized value S_i based on the following equation (1):

  S_i = (s_i - s_min) / (s_max - s_min) * (S_max - S_min) + S_min   (1)

  • Here, s_max and s_min are, for example, the maximum and minimum parallax values before normalization, and S_max and S_min are the maximum and minimum values after normalization. S_max and S_min are determined arbitrarily according to the format of the information used in the three-dimensional object recognition process S206; for example, S_max = 1 and S_min = 0.
  • The values s_max and s_min may likewise be determined arbitrarily according to the format of the information used in the three-dimensional object recognition process S206. For example, in a stereo camera, the accuracy of parallax and distance can degrade when, due to sensor characteristics, the signal-to-noise ratio is poor in regions with small brightness values, or when the resolution is unstable in regions where the brightness value saturates. In such cases, s_max and s_min may be set to arbitrary values based on the original pixel information, the sensor characteristics, and so on, or converted according to a fixed conversion rule such as rounding up or down by 10%. Alternatively, regardless of the accuracy of the original image, in the case of a radar sensor or the like, s_max and s_min may be chosen so as to exclude outliers, based on the rate of erroneous measurements within the region.
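  • As a concrete illustration of the normalization S501, the following is a minimal sketch assuming that equation (1) is an affine min-max mapping of [s_min, s_max] onto [S_min, S_max]; the defaults and the final clipping are assumptions made here for robustness, not details from the publication.

```python
import numpy as np

def normalize_parallax(region, s_min=None, s_max=None, out_min=0.0, out_max=1.0):
    """Numerically convert the parallax values of a detection area into
    normalized values S_i in [out_min, out_max].

    s_min / s_max default to the extremes of the region but can be overridden,
    e.g. with outlier-trimmed values derived from sensor characteristics.
    """
    region = np.asarray(region, dtype=np.float64)
    s_min = float(region.min()) if s_min is None else s_min
    s_max = float(region.max()) if s_max is None else s_max
    if s_max == s_min:                      # degenerate area: constant parallax
        return np.full(region.shape, out_min)
    scaled = (region - s_min) / (s_max - s_min) * (out_max - out_min) + out_min
    return np.clip(scaled, out_min, out_max)
```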
  • The equation used for the normalization process S501 may instead be defined as equation (2), in which s_avr, the average of the parallax values in the detection area, is used.
  • As described above, the method used for normalization may be determined arbitrarily according to the format of the information used in the three-dimensional object recognition process S206.
  • Although the parallax information corresponding to the detection area is numerically converted and normalized here based on an arbitrary rule, the distance information corresponding to the detection area may of course be numerically converted and normalized instead.
  • The recognition processing unit 402 performs recognition processing by combining the information of the image buffer 161 with the normalized information of the parallax buffer 162 (the parallax or distance information after normalization) (FIG. 5: S502).
  • The recognition process S502 uses, for example, pattern matching, which compares the luminance image in the image buffer 161 with a predetermined pattern using normalized correlation or the like, or determination by a classifier created using machine learning.
  • When combining the normalized information, methods such as taking the average of the pattern-matching result for the luminance image and the pattern-matching result for the normalized parallax information as the final judgment value, or classification by a machine-learning classifier that uses the difference between the luminance image and the normalized parallax information as a feature quantity, are used; an example of the former is sketched below.
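  • In the sketch below, normalized cross-correlation stands in for whatever matcher the device actually uses, and the template arrays are hypothetical; the equal 0.5/0.5 weighting is also an assumption.

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between two equally sized arrays."""
    a = np.asarray(patch, dtype=np.float64)
    b = np.asarray(template, dtype=np.float64)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def combined_score(luma, norm_parallax, luma_template, parallax_template):
    """Final judgment value: average of the two pattern-matching results."""
    return 0.5 * (ncc(luma, luma_template) + ncc(norm_parallax, parallax_template))
```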
  • FIG. 6 is a block diagram showing a functional block configuration (Example 2) of the image recognition device 100 related to the three-dimensional object recognition process S206.
  • FIG. 7 is a flowchart showing the details (Example 2) of the three-dimensional object recognition process S206.
  • In this example, the three-dimensional object recognition process S206 of FIG. 2, that is, the flowchart shown in FIG. 7, is carried out by a weight generation processing unit 601, which creates a weight corresponding to each pixel of the image in the image buffer 161 from the information (parallax information) in the parallax buffer 162, and a recognition processing unit 602, which performs recognition by combining the weight information created by the weight generation processing unit 601 with the information in the image buffer 161, both provided in the arithmetic processing unit 105, as shown in FIG. 6.
  • The weight generation processing unit 601 generates, from the information in the parallax buffer 162, a weight corresponding to each pixel of the image in the image buffer 161 (the image corresponding to the detection area acquired by the three-dimensional object detection process S205) (FIG. 7: S701).
  • The detection area obtained by the three-dimensional object detection process S205 contains a background portion in addition to the recognition target, which forms the foreground portion. If the recognition target in the foreground and the background are treated identically, erroneous recognition results. Therefore, in the weight generation process S701, a weight is created using the parallax information.
  • For example, when an arbitrary threshold s_th is defined with respect to the average value s_avr of the parallax values s_i, a weight of 1 is given to pixels whose parallax value s_i satisfies the following equation (3), and a weight of 0 is given to the others:

  |s_i - s_avr| <= s_th   (3)
  • This weight is used, for example, to mask the luminance information obtained from the image buffer 161.
  • The weight generation processing unit 601 may use the median instead of the average value s_avr, and instead of setting the threshold s_th, it can also identify outliers from the variance or standard deviation of the parallax in the detection area. For example, a weight of 0 is given to pixels that fall outside three standard deviations (3σ), and a weight of 1 to the others. The designer may arbitrarily determine the maximum and minimum (in other words, the range) of this weight, and may assign weights linearly or according to an arbitrary function.
  • The weight can also be created, for example, by building a histogram of the parallax values s_i in the detection area and selecting either the foreground peak or the background peak of the histogram. For example, a weight of 1 is given to pixels whose parallax value s_i corresponds to the foreground to be recognized, and a weight of 0 to the others. A sketch of the threshold-based variants follows.
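  • The sketch below implements the threshold-based weight of equation (3) together with the median and standard-deviation variants just described; the reading of equation (3) as |s_i − s_avr| ≤ s_th is an assumption based on the surrounding text.

```python
import numpy as np

def make_weight(parallax_region, s_th=None, use_median=False, n_sigma=3.0):
    """Binary weight map: 1 for pixels near the central parallax, 0 otherwise.

    If s_th is None, the threshold is derived from the standard deviation
    (keep pixels within n_sigma * sigma), as in the 3-sigma variant.
    """
    s = np.asarray(parallax_region, dtype=np.float64)
    center = np.median(s) if use_median else s.mean()
    if s_th is None:
        s_th = n_sigma * s.std()
    return (np.abs(s - center) <= s_th).astype(np.float64)
```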
  • Here, the weight corresponding to each pixel is generated (by numerical conversion) from the parallax information of the three-dimensional object, but it may instead be generated from the distance information of the three-dimensional object, and instead of a weight for each pixel, a weight corresponding to each distance or each parallax (corresponding to each pixel) may be generated.
  • The recognition processing unit 602 performs recognition processing using the image information in the image buffer 161 and the weight information created by the weight generation processing unit 601 (FIG. 7: S702).
  • The recognition process S702 uses, for example, pattern matching, in which the weighted luminance image from the image buffer 161 is compared with a predetermined pattern using normalized correlation or the like, or determination that combines the luminance image with the weight information.
  • The recognition processing unit 602 can also use the parallax information and distance information obtained from the parallax buffer 162 for recognition, in combination with the image information and the weight information. For example, after masking each of the luminance image and the parallax image with the weight, the two masked images are classified by a classifier that uses their difference as a feature.
  • FIG. 8 is a block diagram showing a functional block configuration (Example 3) of the image recognition device 100 related to the three-dimensional object recognition process S206.
  • FIG. 9 is a flowchart showing the details (Example 3) of the three-dimensional object recognition process S206.
  • In this example, the three-dimensional object recognition process S206 of FIG. 2, that is, the flowchart shown in FIG. 9, is carried out by a weight generation processing unit 801, a normalization processing unit 802, and a recognition processing unit 803 provided in the arithmetic processing unit 105, as shown in FIG. 8.
  • First, the weight generation processing unit 801 uses the information in the parallax buffer 162 to generate a weight corresponding to each pixel of the image in the image buffer 161 (the detection area acquired by the three-dimensional object detection process S205) (FIG. 9: S901).
  • For example, a weight is created in which parallax values within an arbitrary threshold th of the median parallax are set to 1 and the other values are set to 0.
  • Next, the normalization processing unit 802 normalizes the parallax information corresponding to the detection area acquired by the three-dimensional object detection process S205, based on the weight created by the weight generation processing unit 801 (FIG. 9: S902).
  • In the normalization process S902, for example, when a binary weight of 0 or 1 has been obtained, the maximum and minimum parallax values among the pixels with weight 1 are taken as s_max and s_min, and each parallax is normalized based on equation (4).
  • When an S_i exceeding S_max or an S_i below S_min is obtained, a value that can be judged as invalid may be assigned in the normalization result. For example, in a system that assumes only finite positive values, exception handling that treats negative values as invalid is conceivable. A sketch of this combined pipeline follows.
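  • A minimal sketch of the Example 3 pipeline: a binary weight around the median parallax, min-max normalization over the weight-1 pixels only, and a negative marker for pixels outside the weight. The form of equation (4) (assumed to follow equation (1) with the weighted extremes) and the invalid-value convention are assumptions based on the surrounding text.

```python
import numpy as np

INVALID = -1.0  # negative values treated as invalid, per the exception handling above

def weight_and_normalize(parallax_region, th=1.5):
    """Generate a binary weight around the median parallax (S901), then
    normalize using the weighted extremes s_max / s_min (S902)."""
    s = np.asarray(parallax_region, dtype=np.float64)
    weight = np.abs(s - np.median(s)) <= th          # weight 1 inside, 0 outside
    out = np.full(s.shape, INVALID)
    if not weight.any():
        return out, weight
    s_min, s_max = s[weight].min(), s[weight].max()
    if s_max > s_min:
        out[weight] = (s[weight] - s_min) / (s_max - s_min)
    else:
        out[weight] = 0.0                            # constant foreground parallax
    return out, weight
```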
  • In this example as well, the weight corresponding to each pixel is generated (by numerical conversion) from the parallax information of the three-dimensional object, but it may instead be generated from the distance information, and instead of a weight for each pixel, a weight corresponding to each distance or each parallax (corresponding to each pixel) may be generated.
  • Likewise, although the parallax information corresponding to the detection area is numerically converted and normalized here, the distance information corresponding to the detection area may of course be numerically converted and normalized instead.
  • Finally, the recognition processing unit 803 performs recognition using the image information in the image buffer 161 and the parallax information created by the normalization processing unit 802 (the parallax information after normalization) (FIG. 9: S903). The recognition processing unit 803 can also use the weight information created by the weight generation processing unit 801 for recognition, in combination with the image information and the normalized information. For example, the edge image 1001, created by edge extraction from the luminance image as shown in FIG. 10, is multiplied by the weight information 1002 to create an edge image (background-removed edge image) 1003 from which background edges have been removed; recognition is then performed using the background-removed edge image 1003 and the normalized parallax image.
  • The recognition process S903 may use a pattern matching technique such as normalized correlation, or a classifier whose input is the product or difference of the two types of information.
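  • The background-removed edge image of FIG. 10 can be sketched as follows; a simple gradient-magnitude operator stands in for whatever edge extraction the device actually uses.

```python
import numpy as np

def background_removed_edges(luma, weight):
    """Multiply an edge image (cf. 1001) by the weight map (cf. 1002) to
    obtain a background-removed edge image (cf. 1003)."""
    luma = np.asarray(luma, dtype=np.float64)
    gy, gx = np.gradient(luma)        # vertical and horizontal gradients
    edges = np.hypot(gx, gy)          # gradient-magnitude edge image
    return edges * np.asarray(weight, dtype=np.float64)
```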
  • Normalization alone is affected by the characteristics of the background portion.
  • Weight generation alone produces differences in recognition performance depending on, for example, the distance of the foreground portion. Therefore, by performing the weight generation process and the normalization process together, recognition becomes possible without being affected by the combination of foreground and background or by the distance of the foreground, which leads to improved recognition performance.
  • In all of the above, the parallax information can be replaced with distance information.
  • So far, the image recognition device 100 using a stereo camera composed of the pair of cameras 101 and 102 has been described; however, the invention may also be realized as an image recognition device 100A that does not use a stereo camera.
  • FIG. 11 is a flowchart showing the operation of the image recognition device 100A.
  • In FIG. 11, the same parts as in the operation of the image recognition device 100 shown in FIG. 2 are designated by the same reference numerals, and their description is omitted.
  • The image recognition device 100A includes an optical camera (hereinafter simply referred to as the camera) 1101 and a radar sensor 1102 as the imaging unit, and detects three-dimensional objects as follows.
  • An image is captured by the camera 1101, and the captured image information is subjected to image processing S203, such as corrections that absorb the unique characteristics of the image sensor.
  • The processing result of image processing S203 is stored in the image buffer 161.
  • The radar sensor 1102 obtains the distance to a three-dimensional object as sensor information.
  • The three-dimensional object detection process S213 detects a three-dimensional object in three-dimensional space based on the distance to the three-dimensional object.
  • The distance information used for detection is stored in the distance buffer 163.
  • The distance buffer 163 is provided, for example, in the storage unit 106 of FIG. 1. In the three-dimensional object detection process S213, the image and the distance are also associated with each other as needed for subsequent processing.
  • The three-dimensional object recognition process S214 performs recognition processing that identifies the type of the three-dimensional object for the detection area set on the image by the three-dimensional object detection process S213, in substantially the same manner as in the image recognition device 100 described above (here, using the distance information of the three-dimensional object).
  • The subsequent processing can be performed in the same manner as in the stereo camera configuration described for the image recognition device 100. Further, the image recognition device 100A does not require a plurality of images in the image processing S203.
  • As described above, the image recognition devices 100 and 100A of the present embodiment numerically convert the distance information or parallax information of a three-dimensional object for the detection area of the three-dimensional object set on the image captured by the cameras 101, 102, and 1101 serving as the imaging unit, and combine the numerically converted distance or parallax information with the image information of the image to perform recognition processing that identifies the type of the three-dimensional object.
  • Specifically, with respect to the information of each pixel obtained from the cameras 101, 102, and 1101 and the corresponding distance or parallax information, the distance or parallax information of the three-dimensional object to be recognized is normalized (FIGS. 4 and 5), the distance or parallax information other than that of the recognition target is masked, the weighting of the pixel information and the distance or parallax information is changed (FIGS. 6 and 7), or these are combined (FIGS. 8 and 9), thereby realizing recognition that combines pixel information with distance or parallax information.
  • As a result, the image recognition devices 100 and 100A of the present embodiment can improve the correct recognition rate for the detection areas 301 and 302 of three-dimensional objects set on the images captured by the cameras 101, 102, and 1101.
  • They also have the effect of suppressing erroneous recognition of shapes (appearances on the image) that resemble the recognition target and are generated by combinations of foreground and background. Therefore, according to the present embodiment, a three-dimensional object can be detected accurately and the recognition performance can be improved while suppressing an increase in cost.
  • In the above, a stereo camera composed of two cameras or a monocular camera is used, but three or more cameras may also be used.
  • Further, although a front camera that images the area in front of the vehicle (in other words, acquires an image of the area in front of the vehicle) is illustrated here, a rear camera or a side camera that images the area behind or beside the vehicle may naturally be used as well.
  • The present invention is not limited to the above-described embodiments, and other embodiments conceivable within the scope of the technical idea of the present invention are also included within the scope of the present invention as long as the features of the present invention are not impaired.
  • The above embodiments have been described in detail in order to explain the present invention in an easy-to-understand manner, and the invention is not necessarily limited to embodiments including all of the described configurations. The configuration may also combine the above embodiments with modified examples.
  • Each of the above configurations, functions, processing units, processing means, and the like may be realized in hardware by designing some or all of them as, for example, an integrated circuit. Each of the above configurations, functions, and the like may also be realized in software by a processor interpreting and executing a program that implements each function. Information such as the programs, tables, and files that implement each function can be stored in a memory, a hard disk, a storage device such as an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.
  • The control lines and information lines shown are those considered necessary for explanation; not all control lines and information lines in the product are necessarily shown. In practice, almost all configurations may be considered to be interconnected.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Provided is an image recognition device that can accurately detect a three-dimensional object and has improved recognition performance while minimizing increases in cost. With regard to the information of each pixel obtained from cameras 101, 102, and 1101 and the corresponding distance or parallax information, the distance or parallax information of a three-dimensional object to be recognized is normalized, or the distance or parallax information other than that of the object to be recognized is masked, or the weighting of the pixel information and the distance or parallax information is changed, or these techniques are combined, thereby implementing recognition in which the pixel information and the distance or parallax information are combined.

Description

Image recognition device

 The present invention relates to an image recognition device.

 In recent years, there has been growing demand for improved performance of the image recognition devices required for driving assistance and automated driving. For example, for the pedestrian collision safety function, performance improvements are being required, such as the addition of a collision safety test for pedestrians at night to automobile assessments. Realizing this performance improvement requires high recognition performance for three-dimensional objects.

 In Patent Document 1, a recognition device is proposed that, in a situation where a moving three-dimensional object apparently overlaps another three-dimensional object, detects the moving three-dimensional object, such as a pedestrian, present inside a predetermined region containing the three-dimensional object by tracking feature points inside that region.

 Patent Document 2 proposes a method using machine learning, and also proposes performing recognition by combining an image taken by an optical camera with distance information obtained from stereo matching, radar, or the like.

 Patent Document 1: Japanese Unexamined Patent Application Publication No. 2017-142760
 Patent Document 2: Japanese Unexamined Patent Application Publication No. 2019-028528
 However, conventional devices use texture information captured by an optical camera to recognize objects, so erroneous recognition occurs for photographs drawn on walls or signboards and for similar silhouettes produced by combinations of natural objects. This is because, when recognition processing is performed using an optical camera image and the corresponding distance image, the combined information on the pixels, the distances, and the regions that group them becomes too large to realize at a realistic cost.

 The present invention has been made in view of the above circumstances, and its object is to provide an image recognition device capable of accurately detecting a three-dimensional object and improving recognition performance while suppressing an increase in cost.

 The image recognition device of the present invention that solves the above problems is an image recognition device that recognizes a three-dimensional object in an image captured by an imaging unit, wherein, for a detection area of the three-dimensional object set on the image, the distance information or parallax information of the three-dimensional object is numerically converted, and the numerically converted distance or parallax information is combined with the image information of the image to perform recognition processing that identifies the type of the three-dimensional object.

 According to the present invention, it is possible to provide an image recognition device capable of accurately detecting a three-dimensional object and improving recognition performance while suppressing an increase in cost.

 Problems, configurations, and effects other than those described above will be clarified by the description of the following embodiments.

 FIG. 1 is a block diagram showing the overall configuration of the image recognition device. FIG. 2 is a flowchart showing the operation of the image recognition device. FIG. 3 is a diagram showing detection areas of three-dimensional objects set on an image by the three-dimensional object detection process. FIG. 4 is a block diagram showing the functional block configuration (Example 1) of the image recognition device involved in the three-dimensional object recognition process. FIG. 5 is a flowchart showing the details (Example 1) of the three-dimensional object recognition process. FIG. 6 is a block diagram showing the functional block configuration (Example 2) of the image recognition device involved in the three-dimensional object recognition process. FIG. 7 is a flowchart showing the details (Example 2) of the three-dimensional object recognition process. FIG. 8 is a block diagram showing the functional block configuration (Example 3) of the image recognition device involved in the three-dimensional object recognition process. FIG. 9 is a flowchart showing the details (Example 3) of the three-dimensional object recognition process. FIG. 10 is a schematic diagram showing the procedure for creating a background-removed edge image in which background edges have been removed from a luminance image using weight information. FIG. 11 is a flowchart showing the operation of an image recognition device of another example.
 Hereinafter, embodiments of the present invention will be described with reference to the drawings. In each figure, parts having the same function may be designated by the same reference numerals, and repeated description may be omitted.

 (Configuration of the image recognition device) FIG. 1 is a block diagram showing the overall configuration of the image recognition device 100 according to the present embodiment. The image recognition device 100 is mounted on a vehicle (hereinafter sometimes referred to as the own vehicle) and includes a left camera (imaging unit) 101 and a right camera (imaging unit) 102 (hereinafter sometimes referred to simply as cameras 101 and 102) arranged side by side at the front of the vehicle. The cameras 101 and 102 constitute a stereo camera and image three-dimensional objects in front of the vehicle, such as pedestrians, vehicles, traffic lights, signs, white lines, tail lamps, and headlights. The image recognition device 100 includes a processing device 110 that recognizes the environment outside the vehicle based on the information (image information) of the images in front of the vehicle captured by the cameras 101 and 102. The vehicle (own vehicle) then controls braking, steering, and the like based on the recognition results of the image recognition device 100.

 The processing device 110 of the image recognition device 100 takes in the images captured by the cameras 101 and 102 through the image input interface 103. The image information taken in through the image input interface 103 is sent to the image processing unit 104 via the internal bus 109 and processed by the arithmetic processing unit 105, and intermediate results, final image information, and the like are stored in the storage unit 106.

 The image processing unit 104 compares the first image obtained from the image sensor of the left camera 101 (hereinafter sometimes referred to as the left image) with the second image obtained from the image sensor of the right camera 102 (hereinafter sometimes referred to as the right image), performs image corrections on each image, such as correcting device-specific deviations caused by the image sensor and interpolating noise, and stores the result in the storage unit 106 as image information. It further calculates mutually corresponding points between the first image and the second image to obtain parallax information, and stores this in the storage unit 106 as distance information corresponding to each pixel on the image. The image processing unit 104 is connected to the arithmetic processing unit 105, the CAN interface 107, and the control processing unit 108 via the internal bus 109.

 The arithmetic processing unit 105 uses the image information and distance information (parallax information) stored in the storage unit 106 to recognize three-dimensional objects in order to grasp the environment around the vehicle. Part of the recognition results and intermediate processing results are stored in the storage unit 106. After recognizing three-dimensional objects in the captured image, the arithmetic processing unit 105 calculates vehicle control using the recognition results. The vehicle control policy obtained from this calculation and part of the recognition results are transmitted to the in-vehicle network CAN 111 via the CAN interface 107, whereby the vehicle is controlled.

 The control processing unit 108 monitors whether each processing unit is operating abnormally or whether errors have occurred during data transfer, and prevents abnormal operation. The image processing unit 104, the arithmetic processing unit 105, and the control processing unit 108 may be composed of a single computer unit or a plurality of computer units.
 (Operation of the image recognition device) FIG. 2 is a flowchart showing the operation of the image recognition device 100.

 In S201 and S202, images are captured by the left camera 101 and the right camera 102 of the image recognition device 100, and image processing S203, such as corrections that absorb the unique characteristics of each image sensor, is performed on each of the captured image information 121 and 122. The processing result of image processing S203 is stored in the image buffer 161. The image buffer 161 is provided in the storage unit 106 of FIG. 1.

 Next, parallax processing S204 is performed. Specifically, the two images corrected in image processing S203 are matched against each other to obtain parallax information between the images from the left camera 101 and the right camera 102. From the parallax of the left and right images, the distance to a point of interest on a three-dimensional object is obtained by the principle of triangulation. The processing result of parallax processing S204 is stored in the parallax buffer 162, which is provided in the storage unit 106 of FIG. 1. The information recorded in the parallax buffer 162 may also be converted into distance information before being used in subsequent processing.

 Image processing S203 and parallax processing S204 are performed by the image processing unit 104 of FIG. 1, and the finally obtained image information and parallax information are stored in the storage unit 106.

 Then, in the next three-dimensional object detection process S205, a three-dimensional object in three-dimensional space is detected using the parallax (or distance) of each pixel of the left and right images obtained by parallax processing S204. FIG. 3 is a diagram showing the detection areas of three-dimensional objects (also called three-dimensional object areas) set on the image by the three-dimensional object detection process S205. FIG. 3 shows a pedestrian detection area 301 and a vehicle detection area 302 detected by the cameras 101 and 102 on the image as a result of the three-dimensional object detection process S205. These detection areas 301 and 302 indicate areas where a pedestrian or a vehicle exists on the image; they may be rectangular as shown in FIG. 3, or they may be irregular areas obtained from parallax or distance. They are generally treated as rectangles to facilitate handling by a computer in subsequent processing. In the present embodiment, the detection area is hereinafter treated as a rectangle, and a pedestrian is mainly used as the example of a three-dimensional object.

 Next, in the three-dimensional object recognition process S206, recognition processing that identifies the type of the three-dimensional object is performed on the detection area set on the image by the three-dimensional object detection process S205. The three-dimensional objects to be recognized by the three-dimensional object recognition process S206 are, for example, pedestrians, vehicles, traffic lights, signs, white lines, and the tail lamps and headlights of cars, and it is identified which of these an object is. The three-dimensional object recognition process S206 is performed using the image information recorded in the image buffer 161 and the parallax information recorded in the parallax buffer 162. However, the information in the parallax buffer 162 can cause erroneous recognition because there are infinitely many possible relationships between the object and the background. This is the same even when a radar, such as a millimeter-wave radar, is combined with an image sensor such as a camera. The details of the three-dimensional object recognition process S206 that solves this problem will be described later.

 Next, in the vehicle control process S207, taking into account the recognition result of the three-dimensional object recognition process S206 and the state of the own vehicle (speed, steering angle, and so on), control is determined, for example, issuing a warning to the occupants and braking or adjusting the steering angle of the own vehicle, or avoidance control for the recognized three-dimensional object is determined, and the result is output as automatic control information via the CAN interface 107 (S208).

 The three-dimensional object detection process S205, the three-dimensional object recognition process S206, and the vehicle control process S207 are performed by the arithmetic processing unit 105 of FIG. 1.

 The programs shown in the flowchart of FIG. 2 and in the flowcharts such as FIG. 5 described later can be executed by a computer equipped with a CPU, memory, and the like. All or part of the processing may be realized by a hard logic circuit. Further, the programs can be provided by being stored in advance in the storage medium of the image recognition device 100, stored and provided in an independent storage medium, or recorded and stored in the storage medium of the image recognition device 100 via a network line. They may also be supplied as computer-readable computer program products in various forms, such as data signals (carrier waves).
 <Three-dimensional object recognition process (Example 1)> FIG. 4 is a block diagram showing the functional block configuration (Example 1) of the image recognition device 100 involved in the three-dimensional object recognition process S206. FIG. 5 is a flowchart showing the details (Example 1) of the three-dimensional object recognition process S206. In this example, the three-dimensional object recognition process S206 of FIG. 2, that is, the flowchart shown in FIG. 5, is carried out by a normalization processing unit 401 that normalizes the information in the parallax buffer 162 and a recognition processing unit 402 that performs recognition by combining the information of the parallax buffer 162 that has passed through the normalization processing unit 401 with the information of the image buffer 161, both provided in the arithmetic processing unit 105, as shown in FIG. 4. The processing of each unit is described below in order, assuming a stereo camera.
 [Normalization processing unit] The normalization processing unit 401 normalizes the parallax corresponding to the detection area acquired by the three-dimensional object detection process S205 among the information held in the parallax buffer 162 (FIG. 5: S501). In the normalization process S501, for example, the value s_i of each parallax is numerically converted into the normalized value S_i based on the following equation (1):

 S_i = (s_i - s_min) / (s_max - s_min) * (S_max - S_min) + S_min   (1)

 Here, s_max and s_min are, for example, the maximum and minimum parallax values before normalization, and S_max and S_min are the maximum and minimum values after normalization. S_max and S_min are determined arbitrarily according to the format of the information used in the three-dimensional object recognition process S206; for example, S_max = 1 and S_min = 0. The values s_max and s_min may also be determined arbitrarily according to the format of the information used in the three-dimensional object recognition process S206. For example, in a stereo camera, the accuracy of parallax and distance can degrade when, due to sensor characteristics, the signal-to-noise ratio is poor in regions with small brightness values, or when the resolution is unstable in regions where the brightness value saturates. In such cases, s_max and s_min may be set to arbitrary values based on the original pixel information, the sensor characteristics, and so on, or converted according to a fixed conversion rule such as rounding up or down by 10%. Alternatively, regardless of the accuracy of the original image, in the case of a radar sensor or the like, s_max and s_min may be chosen so as to exclude outliers, based on the rate of erroneous measurements within the region.
 The equation used for the normalization process S501 may instead be defined as equation (2), in which s_avr, the average of the parallax values in the detection area, is used:

 (Equation 2)

 As described above, the method used for normalization may be determined arbitrarily according to the format of the information used in the three-dimensional object recognition process S206.
 なお、ここでは、検知領域に対応する視差情報を任意の規則に基づいて数値変換して正規化しているが、検知領域に対応する距離情報を数値変換して正規化してもよいことは勿論である。 Here, the parallax information corresponding to the detection area is numerically converted and normalized based on an arbitrary rule, but it goes without saying that the distance information corresponding to the detection area may be numerically converted and normalized. is there.
[認識処理部] 認識処理部402では、画像バッファ161の情報と視差バッファ162の正規化情報(正規化処理後の視差情報または距離情報)を組み合わせて認識処理を行う(図5:S502)。認識処理S502は、例えば画像バッファ161にある輝度画像と、あらかじめ定められたパターンとを正規化相関などを用いて比較するパターンマッチングや、機械学習を用いて作成した識別器による判定などが用いられる。視差バッファ162の正規化情報を組み合わせる場合、例えば、輝度画像のパターンマッチング結果と正規化視差情報のパターンマッチング結果の平均値を最終的な判定値とするなどの手法や、輝度画像と正規化視差情報の差分を特徴量として機械学習によって作成された識別器によって識別する手法などを用いる。 [Recognition processing unit] The recognition processing unit 402 performs recognition processing by combining the information of the image buffer 161 and the normalization information of the parallax buffer 162 (parallax information or distance information after the normalization processing) (FIG. 5: S502). The recognition process S502 uses, for example, pattern matching that compares a luminance image in the image buffer 161 with a predetermined pattern using a normalized correlation or the like, or determination by a classifier created by using machine learning. .. When combining the normalized information of the parallax buffer 162, for example, a method such as using the average value of the pattern matching result of the luminance image and the pattern matching result of the normalized parallax information as the final judgment value, or the luminance image and the normalized parallax A method of identifying by a classifier created by machine learning using the difference in information as a feature quantity is used.
Taking recognition by pattern matching as an example, if the normalization described above is not performed, the number of foreground/background combinations that must be recognized becomes enormous. For example, a pedestrian (the foreground recognition target) at 10 m with a wall 20 m behind it as background, and the same pedestrian at 10 m with the background 40 m behind, yield different parallax or distance information. Matching such patterns would require a template for each case, but since foreground and background positions, though finite, are practically innumerable, holding every combination as a template is unrealistic. Likewise, even when performing statistical processing with machine learning, collecting every foreground/background combination is impractical. The normalization described above is therefore effective, because it reduces the information to a realistic amount (compresses it into a predetermined range).
<Three-dimensional object recognition process (Example 2)> FIG. 6 is a block diagram showing the functional block configuration (Example 2) of the image recognition device 100 for the three-dimensional object recognition process S206. FIG. 7 is a flowchart showing the details (Example 2) of the three-dimensional object recognition process S206. In this example, the three-dimensional object recognition process S206 of FIG. 2, that is, the flowchart of FIG. 7, is carried out, as shown in FIG. 6, by a weight generation processing unit 601 provided in the arithmetic processing unit 105, which creates a weight corresponding to each pixel of the image in the image buffer 161 from the information (parallax information) in the parallax buffer 162, and by a recognition processing unit 602, which performs recognition using the weight information created by the weight generation processing unit 601 together with the information in the image buffer 161.
[Weight generation processing unit] The weight generation processing unit 601 generates, from the information in the parallax buffer 162, a weight corresponding to each pixel of the image in the image buffer 161 (the image corresponding to the detection area acquired by the three-dimensional object detection process S205) (FIG. 7: S701). The detection area obtained by the three-dimensional object detection process S205 contains a background portion in addition to the recognition target forming the foreground. If the foreground recognition target and the background are treated identically, erroneous recognition results. In the weight generation process S701, weights are therefore created using the parallax information. For example, given an arbitrary threshold s_th defined around the average value s_avr of the parallax values s_i, a weight of 1 is given to pixels whose parallax value s_i satisfies equation (3) below, and 0 to all other pixels:

(3)    |s_i - s_avr| < s_th

(the published equation appears in the original only as an image; the symmetric band shown here is one plausible reading of the threshold condition).
This weight is used, for example, to mask the luminance information obtained from the image buffer 161. The weight generation processing unit 601 may use the median instead of the average value s_avr, and instead of setting a threshold s_th it may identify values that deviate from the variance or standard deviation of the parallax within the detection area; for example, pixels outside the 3σ band of the standard deviation are weighted 0 and all others 1. The designer may set the maximum and minimum of this weight (in other words, its range) arbitrarily, and assign intermediate values linearly or according to an arbitrary function. The weight can also be created by, for example, building a histogram from the parallax values s_i within the detection area and selecting one of the foreground and background peaks that appear in it: pixels whose parallax value s_i corresponds to the foreground recognition target are weighted 1, and all others 0.
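A minimal sketch of the two binary-weight variants just described follows, assuming NumPy; since the exact condition of equation (3) is published only as an image, the symmetric band used here is an assumption:

```python
import numpy as np

def weight_by_threshold(s, s_th):
    """1 for pixels whose parallax lies within s_th of the mean, else 0
    (one plausible reading of equation (3))."""
    s = np.asarray(s, dtype=np.float64)
    return (np.abs(s - s.mean()) < s_th).astype(np.uint8)

def weight_by_sigma(s, k=3.0):
    """1 for pixels inside the k-sigma band around the mean, else 0."""
    s = np.asarray(s, dtype=np.float64)
    return (np.abs(s - s.mean()) <= k * s.std()).astype(np.uint8)
```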
Note that, although here a weight corresponding to each pixel is generated (by numerical conversion) from the parallax information of the three-dimensional object for the detection area acquired by the three-dimensional object detection process S205, the weight for each pixel may of course be generated (by numerical conversion) from the distance information of the three-dimensional object instead, and a weight corresponding to each distance or each parallax (corresponding to each pixel) may be generated in place of a per-pixel weight.
[Recognition processing unit] The recognition processing unit 602 performs recognition using the image information in the image buffer 161 and the weight information created by the weight generation processing unit 601 (FIG. 7: S702). The recognition process S702 uses, for example, pattern matching that compares the weighted luminance image in the image buffer 161 against a predetermined pattern using normalized correlation or the like, or a classifier that takes the product of the luminance image and the weight as a feature. The recognition processing unit 602 can also combine the parallax or distance information obtained from the parallax buffer 162 with the image information and the weight information for recognition; for example, after masking both the luminance image and the parallax image with the weight, a classifier that takes the two masked images and their difference as features may be used.
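Masking the luminance patch with the weight before template comparison might look as follows; this sketch reuses the hypothetical `zncc` helper from the earlier sketch and masks the template identically so that only foreground pixels are compared:

```python
def masked_match_score(luma, weight, template):
    """Pattern matching on the weighted luminance image: background pixels
    (weight 0) no longer contribute to the comparison."""
    return zncc(luma * weight, template * weight)
```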
Taking recognition by pattern matching as an example, the combination of foreground and background produces an enormous number of pattern shapes, so an object may be misrecognized depending on that combination. Using the weight information from the weight generation process described above makes it possible to process only the information of the recognized foreground, which suppresses erroneous recognition. This is equally effective in improving correct recognition and reducing false recognition when machine learning is used.
<Three-dimensional object recognition process (Example 3)> FIG. 8 is a block diagram showing the functional block configuration (Example 3) of the image recognition device 100 for the three-dimensional object recognition process S206. FIG. 9 is a flowchart showing the details (Example 3) of the three-dimensional object recognition process S206. In this example, the three-dimensional object recognition process S206 of FIG. 2, that is, the flowchart of FIG. 9, is carried out by a weight generation processing unit 801, a normalization processing unit 802, and a recognition processing unit 803 provided in the arithmetic processing unit 105, as shown in FIG. 8.
[Weight generation processing unit] Like the weight generation processing unit 601 described with reference to FIGS. 6 and 7, the weight generation processing unit 801 generates, from the information in the parallax buffer 162, a weight corresponding to each pixel of the image in the image buffer 161 (the image corresponding to the detection area acquired by the three-dimensional object detection process S205) (FIG. 9: S901). In the weight generation process S901, for example, a weight of 1 is given to values within an arbitrary threshold s_th of the median parallax value, and 0 to all other values.
[Normalization processing unit] The normalization processing unit 802 normalizes the parallax information corresponding to the detection area acquired by the three-dimensional object detection process S205, based on the weight created by the weight generation processing unit 801 (FIG. 9: S902). In the normalization process S902, when for example a binary weight of 0 or 1 has been obtained, the maximum and minimum parallax values among the pixels with weight 1 are taken as s_max and s_min, and each parallax is normalized based on the following equation (4):

(4)    S_i = (s_i - s_min) / (s_max - s_min) × (S_max - S_min) + S_min

(the published equation appears in the original only as an image; the min-max form shown follows from the surrounding description). Here, if a value S_i above S_max or below S_min is obtained, a value recognizable as invalid may be assigned to that normalization result. For example, in a system premised on handling finite positive values, exception handling that treats an incoming negative value as invalid is conceivable.
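A sketch of this weight-based normalization, including the invalid-value exception handling just described, might look as follows; the marker value -1 and the function name are arbitrary assumptions:

```python
import numpy as np

def normalize_by_weight(s, w, S_min=0.0, S_max=1.0, invalid=-1.0):
    """Normalize parallax using s_min / s_max taken from weight-1 pixels only
    (in the style of equation (4)); out-of-range results are marked invalid."""
    s = np.asarray(s, dtype=np.float64)
    fg = s[np.asarray(w) == 1]          # foreground (weight-1) parallax values
    s_min, s_max = float(fg.min()), float(fg.max())
    if s_max == s_min:
        return np.full(s.shape, S_min)
    S = (s - s_min) / (s_max - s_min) * (S_max - S_min) + S_min
    S[(S < S_min) | (S > S_max)] = invalid  # exception handling for out-of-range values
    return S
```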
Note that, although here a weight corresponding to each pixel is generated (by numerical conversion) from the parallax information of the three-dimensional object for the detection area acquired by the three-dimensional object detection process S205, the weight for each pixel may of course be generated (by numerical conversion) from the distance information instead, and a weight corresponding to each distance or each parallax (corresponding to each pixel) may be generated in place of a per-pixel weight. Similarly, although the parallax information corresponding to the detection area is numerically converted and normalized, the distance information corresponding to the detection area may of course be numerically converted and normalized instead.
[Recognition processing unit] The recognition processing unit 803 performs recognition using the image information in the image buffer 161 and the parallax information created by the normalization processing unit 802 (the parallax information after normalization) (FIG. 9: S903). The recognition processing unit 803 can also combine the weight information created by the weight generation processing unit 801 with the image information and the normalized information for recognition. For example, an edge image 1001 created by edge extraction from the luminance image shown in FIG. 10 is multiplied by the weight information 1002 to create an edge image 1003 with the background edges removed (a background-removed edge image). Recognition is then performed using this background-removed edge image 1003 and the normalized parallax image. The recognition process S903 may use a pattern matching technique such as normalized correlation, or a classifier that takes the product or difference of the two kinds of information as input.
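A sketch of the background-removed edge image of FIG. 10 follows; the disclosure does not specify the edge operator, so SciPy's Sobel filter is assumed here for illustration:

```python
import numpy as np
from scipy import ndimage

def background_removed_edges(luma, weight):
    """Edge image (1001) multiplied by the per-pixel weight (1002) to give
    the background-removed edge image (1003)."""
    g = np.asarray(luma, dtype=np.float64)
    # Gradient magnitude as the edge image.
    edges = np.hypot(ndimage.sobel(g, axis=1), ndimage.sobel(g, axis=0))
    return edges * weight  # background edges are zeroed out
```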
For example, when a classifier is created by machine learning and used for the recognition of a target, normalization alone leaves the result affected by the features of the background portion, while weight generation alone causes differences in recognition performance depending on, for instance, the distance of the foreground portion. Performing the weight generation process and the normalization process together therefore enables recognition that is affected neither by the foreground/background combination nor by the distance of the foreground, leading to improved recognition performance.
As noted above, all of the parallax information can be replaced with distance information.
(Modification) The present embodiment has been described with the image recognition device 100 using a stereo camera composed of the pair of cameras 101 and 102. However, it may also be realized with an image recognition device 100A that does not use a stereo camera.
FIG. 11 is a flowchart showing the operation of the image recognition device 100A. In FIG. 11, parts identical to the operation of the image recognition device 100 shown in FIG. 2 are given the same reference numerals, and their description is omitted.
As shown in FIG. 11, the image recognition device 100A includes an optical camera (hereinafter simply called a camera) 1101 as an imaging unit and a radar sensor 1102, and detects three-dimensional objects with them. In S211, an image is captured by the camera 1101, and the captured image information undergoes image processing S203 such as correction to absorb the characteristics peculiar to the image sensor. The result of the image processing S203 is stored in the image buffer 161. In S212, the radar sensor 1102 obtains the distance to a three-dimensional object as sensor information.

The three-dimensional object detection process S213 detects three-dimensional objects in three-dimensional space based on the distance to the object. The distance information used for detection is stored in a distance buffer 163, provided for example in the storage unit 106 of FIG. 1. The three-dimensional object detection process S213 also associates the image with the distances as required by the subsequent stages. In the three-dimensional object recognition process S214, recognition that specifies the type of the three-dimensional object is performed on the detection area set on the image by the three-dimensional object detection process S213, in substantially the same manner as in the image recognition device 100 described above (here, using the distance information of the three-dimensional object).
The three-dimensional object detection process S213, which takes as input the distance to the three-dimensional object output by the radar sensor 1102, must perform detection that accounts for the sensor characteristics of the radar sensor 1102 used for distance measurement, but once the detection area has been determined, the subsequent processing can proceed in the same way as in the stereo camera configuration described for the image recognition device 100. The image recognition device 100A also does not require multiple images in the image processing S203.
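Once the radar distances have been associated with the pixels of the detection area in S213, the same numerical conversion applies unchanged. A hypothetical usage sketch follows, reusing the `normalize_parallax` function sketched earlier with made-up distance values (all values and names here are assumptions):

```python
import numpy as np

# Hypothetical per-pixel distance map (in metres) for one detection area,
# e.g. produced by associating radar returns with image pixels in S213.
distance_map = np.array([[ 9.8, 10.1, 30.0],
                         [10.0,  9.9, 30.2]])

# The min-max normalization sketched earlier works unchanged on distances;
# only the sensor-specific choice of s_min / s_max differs.
normalized = normalize_parallax(distance_map, S_min=0.0, S_max=1.0)
```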
(Effects) The image recognition devices 100 and 100A of the present embodiment described above numerically convert the distance information or parallax information of a three-dimensional object with respect to the detection area of the three-dimensional object set on an image captured by the cameras 101, 102, 1101 serving as imaging units, and combine the numerically converted distance or parallax information with the image information of the image to perform recognition processing that specifies the type of the three-dimensional object.
Specifically, in performing the recognition processing, with respect to the information of each pixel obtained from the cameras 101, 102, 1101 and the corresponding distance or parallax information, the devices normalize the distance or parallax information of the three-dimensional object to be recognized (FIGS. 4 and 5), mask the distance or parallax information other than that of the recognition target or change the weighting of the pixel information and the distance or parallax information (FIGS. 6 and 7), or combine these approaches (FIGS. 8 and 9), thereby realizing recognition that combines pixel information with distance or parallax information.
The embodiments described above provide the following effects.
Namely, the image recognition devices 100 and 100A of the present embodiment can improve the correct recognition rate for the three-dimensional object detection areas 301 and 302 set on the images captured by the cameras 101, 102, 1101. They can also suppress the erroneous recognition of other background three-dimensional objects as recognition targets such as pedestrians or vehicles. In particular, they are effective in suppressing the misrecognition caused by shapes (appearances on the image) similar to the recognition target that arise from combinations of foreground and background. According to the present embodiment, therefore, three-dimensional objects can be detected accurately and recognition performance improved while suppressing cost increases.
In the embodiments described above, a stereo camera composed of two cameras or a monocular camera was used, but three or more cameras may be used. Further, although a front camera that images the area ahead of the vehicle (in other words, acquires an image of the area ahead of the vehicle) was illustrated, a rear camera or side camera that images the area behind or beside the vehicle may naturally be used instead.
The present invention is not limited to the embodiments described above, and other forms conceivable within the scope of the technical idea of the present invention are also included within the scope of the present invention as long as the features of the present invention are not impaired. For example, the embodiments above have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations having all of the described elements. The embodiments above may also be combined with the modification.
Each of the above configurations, functions, processing units, processing means, and the like may be realized in hardware, in whole or in part, for example by designing them as integrated circuits. Each of the above configurations, functions, and the like may also be realized in software by a processor interpreting and executing programs that implement the respective functions. Information such as the programs, tables, and files realizing each function can be stored in memory, in a storage device such as a hard disk or SSD (Solid State Drive), or on a recording medium such as an IC card, SD card, or DVD.
The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines of a product are necessarily shown. In practice, almost all configurations may be considered interconnected.
100, 100A  Image recognition device
101, 102  Camera (imaging unit)
103  Image input interface
104  Image processing unit
105  Arithmetic processing unit
106  Storage unit
107  CAN interface
108  Control processing unit
109  Internal bus
110  Processing device
111  In-vehicle network CAN
161  Image buffer
162  Parallax buffer
163  Distance buffer
401  Normalization processing unit (Example 1)
402  Recognition processing unit (Example 1)
601  Weight generation processing unit (Example 2)
602  Recognition processing unit (Example 2)
801  Weight generation processing unit (Example 3)
802  Normalization processing unit (Example 3)
803  Recognition processing unit (Example 3)
1101  Optical camera (imaging unit)
1102  Radar sensor

Claims (6)

1.  An image recognition device that recognizes a three-dimensional object on an image captured by an imaging unit, wherein
    distance information or parallax information of the three-dimensional object is numerically converted with respect to a detection area of the three-dimensional object set on the image, and the numerically converted distance information or parallax information is combined with image information of the image to perform recognition processing that specifies a type of the three-dimensional object.
2.  The image recognition device according to claim 1, comprising, for the detection area of the three-dimensional object set on the image:
    a normalization processing unit that numerically converts and normalizes the distance information or parallax information of the three-dimensional object based on an arbitrary rule; and
    a recognition processing unit that performs recognition processing specifying the type of the three-dimensional object using the distance information or parallax information numerically converted by the normalization processing unit and the image information of the image.
3.  The image recognition device according to claim 1, comprising, for the detection area of the three-dimensional object set on the image:
    a weight generation processing unit that generates, from the distance information or parallax information of the three-dimensional object, a weight corresponding to each pixel or to each distance or each parallax; and
    a recognition processing unit that performs recognition processing specifying the type of the three-dimensional object using the weight information generated by the weight generation processing unit and the image information of the image.
4.  The image recognition device according to claim 3, wherein
    the recognition processing unit performs recognition processing specifying the type of the three-dimensional object using the weight information generated by the weight generation processing unit, the image information of the image, and the distance information or parallax information of the three-dimensional object.
5.  The image recognition device according to claim 1, comprising:
    a weight generation processing unit that generates, from the distance information or parallax information of the three-dimensional object, a weight corresponding to each pixel or to each distance or each parallax for the detection area of the three-dimensional object set on the image;
    a normalization processing unit that, based on the weight information obtained by the weight generation processing unit, numerically converts and normalizes the distance information or parallax information of the three-dimensional object for the detection area of the three-dimensional object set on the image; and
    a recognition processing unit that performs recognition processing specifying the type of the three-dimensional object using the distance information or parallax information numerically converted by the normalization processing unit and the image information of the image.
6.  The image recognition device according to claim 5, wherein
    the recognition processing unit performs recognition processing specifying the type of the three-dimensional object using the distance information or parallax information numerically converted by the normalization processing unit, the weight information generated by the weight generation processing unit, and the image information of the image.
PCT/JP2020/033886 2019-10-29 2020-09-08 Image recognition device WO2021084915A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021554138A JP7379523B2 (en) 2019-10-29 2020-09-08 image recognition device
DE112020004377.0T DE112020004377T5 (en) 2019-10-29 2020-09-08 IMAGE RECOGNITION DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-196340 2019-10-29
JP2019196340 2019-10-29

Publications (1)

Publication Number Publication Date
WO2021084915A1 (en)

Family

ID=75715095

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/033886 WO2021084915A1 (en) 2019-10-29 2020-09-08 Image recognition device

Country Status (3)

Country Link
JP (1) JP7379523B2 (en)
DE (1) DE112020004377T5 (en)
WO (1) WO2021084915A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019124537A (en) * 2018-01-15 2019-07-25 キヤノン株式会社 Information processor, method for controlling the same, program, and vehicle operation supporting system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6752024B2 (en) 2016-02-12 2020-09-09 日立オートモティブシステムズ株式会社 Image processing device
JP6764378B2 (en) 2017-07-26 2020-09-30 株式会社Subaru External environment recognition device


Also Published As

Publication number Publication date
DE112020004377T5 (en) 2022-07-07
JPWO2021084915A1 (en) 2021-05-06
JP7379523B2 (en) 2023-11-14


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20883110

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021554138

Country of ref document: JP

Kind code of ref document: A

122 Ep: pct application non-entry in european phase

Ref document number: 20883110

Country of ref document: EP

Kind code of ref document: A1