WO2021084915A1

WO2021084915A1 - Image recognition device

Info

Publication number: WO2021084915A1
Application number: PCT/JP2020/033886
Authority: WO
Inventors: 郭介牛場; 亮輔鴇
Original assignee: 日立Astemo株式会社
Priority date: 2019-10-29
Filing date: 2020-09-08
Publication date: 2021-05-06
Also published as: DE112020004377T5; JPWO2021084915A1; JP7379523B2

Abstract

Provided is an image recognition device which can accurately detect a three-dimensional object and have improved recognition performance while minimizing increases in cost. With regard to information about each pixel obtained from cameras 101, 102, and 1101 and information about a distance or parallax corresponding thereto, the distance information or parallax information for a three-dimensional object to be recognized is normalized, or distance information or parallax information other than that of the object to be recognized is masked, or the weighting of the pixel information and the distance information or parallax information is changed, or the above techniques are combined, thereby implementing recognition in which the pixel information and the distance information or parallax information is combined.

Description

Image recognition device

The present invention relates to an image recognition device.

In recent years, there has been an increasing demand for improved performance of image recognition devices required for driving support and automatic driving. For example, in the collision safety function for pedestrians, performance improvement is required, such as the addition of a collision safety test for night pedestrians in automobile assessment. In order to realize this performance improvement, high recognition performance for three-dimensional objects is required.

In Patent Document 1, in a situation where an apparently moving three-dimensional object and another three-dimensional object overlap, a pedestrian existing inside the region is traced by tracking feature points inside a predetermined region containing the three-dimensional object. A recognition device that detects a moving three-dimensional object such as is proposed.

Further, Patent Document 2 proposes a method using machine learning, and also proposes to perform recognition by combining an image taken by an optical camera with distance information obtained from stereo matching or radar. ing.

Japanese Unexamined Patent Publication No. 2017-142760 Japanese Unexamined Patent Publication No. 2019-028528

However, in the conventional device, texture information taken by an optical camera is used for recognition of an object, and erroneous recognition occurs in a photograph drawn on a wall or a signboard or a similar silhouette generated by a combination of natural objects. doing. This is because when the recognition process is performed using the image of the optical camera and the distance image corresponding to the image, the information on the pixels, the distance, and the area in which they are put together becomes too large and cannot be realized at a realistic cost.

The present invention has been made in view of the above circumstances, and an object of the present invention is to provide an image recognition device capable of accurately detecting a three-dimensional object and improving recognition performance while suppressing an increase in cost. There is.

The image recognition device of the present invention that solves the above-mentioned problems is an image recognition device that recognizes a three-dimensional object on an image captured by an imaging unit, and with respect to a detection area of the three-dimensional object set on the image. , The distance information or the parallax information of the three-dimensional object is numerically converted, and the numerically converted distance information or the parallax information is combined with the image information of the image to perform a recognition process for specifying the type of the three-dimensional object. And.

According to the present invention, it is possible to provide an image recognition device capable of accurately detecting a three-dimensional object and improving recognition performance while suppressing an increase in cost.

Issues, configurations and effects other than those described above will be clarified by the explanation of the following embodiments.

It is a block diagram which shows the whole structure of an image recognition apparatus. It is a flowchart which shows the operation of an image recognition apparatus. It is a figure which shows the detection area of the 3D object set on the image by the 3D object detection processing. It is a block diagram which shows the functional block composition (Example 1) of the image recognition apparatus which is involved in a three-dimensional object recognition process. It is a flowchart which shows the detail (Example 1) of the three-dimensional object recognition processing. It is a block diagram which shows the functional block composition (Example 2) of the image recognition apparatus involved in the three-dimensional object recognition processing. It is a flowchart which shows the detail (Example 2) of the three-dimensional object recognition processing. It is a block diagram which shows the functional block composition (Example 3) of the image recognition apparatus involved in the three-dimensional object recognition processing. It is a flowchart which shows the detail (Example 3) of the three-dimensional object recognition processing. It is a schematic diagram which shows the procedure of creating the background removal edge image which removed the background edge from the luminance image using weight information. It is a flowchart which shows the operation in the image recognition apparatus of another example.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In each figure, parts having the same function may be designated by the same reference numerals and repeated description may be omitted.

(Structure of Image Recognition Device) FIG. 1 is a block diagram showing an overall configuration of an image recognition device 100 according to the present embodiment. The image recognition device 100 is mounted on a vehicle (hereinafter, may be referred to as a own vehicle), and the left camera (imaging unit) 101 and the right camera (imaging unit) 102 (hereinafter, simply referred to as simply) arranged side by side in front of the vehicle. Cameras 101 and 102) are provided. The

cameras

101 and 102 constitute a stereo camera, and image a three-dimensional object in front of the vehicle such as a pedestrian, a vehicle, a signal, a sign, a white line, a tail lamp of the vehicle, and a headlight. The image recognition device 100 includes a processing device 110 that recognizes the outside environment of the vehicle based on the information (image information) of the image in front of the vehicle captured by the

cameras

101 and 102. Then, the vehicle (own vehicle) controls the brake, steering, and the like based on the recognition result by the image recognition device 100.

The processing device 110 of the image recognition device 100 captures the images captured by the

cameras

101 and 102 from the image input interface 103. The image information taken in from the image input interface 103 is sent to the image processing unit 104 via the internal bus 109. Then, it is processed by the arithmetic processing unit 105, and the result in the process of processing, the image information of the final result, and the like are stored in the storage unit 106.

The image processing unit 104 includes a first image obtained from the image sensor of the left camera 101 (hereinafter, may be referred to as a left image) and a second image obtained from the image sensor of the right camera 102 (hereinafter, referred to as a right image). For each image, correction of device-specific deviation caused by the image sensor and image correction such as noise interpolation are performed, and this is stored in the storage unit 106 as image information. .. Further, the image processing unit 104 calculates the points corresponding to each other between the first image and the second image, obtains the parallax information, and obtains the parallax information as the distance information corresponding to each pixel on the image. Is stored in the storage unit 106. The image processing unit 104 is connected to the arithmetic processing unit 105, the CAN interface 107, and the control processing unit 108 via the internal bus 109.

The arithmetic processing unit 105 recognizes a three-dimensional object in order to grasp the environment around the vehicle by using the image information and the distance information (parallax information) stored in the storage unit 106. A part of the recognition result of the three-dimensional object and the intermediate processing result is stored in the storage unit 106. After recognizing a three-dimensional object with respect to the captured image, the arithmetic processing unit 105 calculates the vehicle control using the recognition result. The vehicle control policy obtained as a result of the vehicle control calculation and a part of the recognition result are transmitted to the in-vehicle network CAN111 via the CAN interface 107, whereby the vehicle is controlled.

The control processing unit 108 monitors whether each processing unit has caused an abnormal operation, whether an error has occurred during data transfer, and the like, and prevents the abnormal operation. The image processing unit 104, the arithmetic processing unit 105, and the control processing unit 108 may be composed of a single computer unit or a plurality of computer units.

(Operation of the image recognition device) FIG. 2 is a flowchart showing the operation of the image recognition device 100.

In S201 and S202, an image is captured by the left camera 101 and the right camera 102 provided in the image recognition device 100, and each of the captured

image information

121 and 122 absorbs the unique characteristics of the image sensor. Image processing S203 such as correction of The processing result of the image processing S203 is stored in the image buffer 161. The image buffer 161 is provided in the storage unit 106 of FIG.

Next, the parallax processing S204 is performed. Specifically, the two images corrected by the image processing S203 are used to collate the images with each other, thereby obtaining parallax information of the images obtained by the left camera 101 and the right camera 102. By the parallax of the left and right images, a certain point of interest on the image of the three-dimensional object is obtained as the distance to the three-dimensional object by the principle of triangulation. The processing result of the parallax processing S204 is stored in the parallax buffer 162. The parallax buffer 162 is provided in the storage unit 106 of FIG. Further, the information recorded in the parallax buffer 162 may be converted into distance information and then used for the subsequent processing.

The image processing S203 and the parallax processing S204 are performed by the image processing unit 104 of FIG. 1, and the finally obtained image information and the parallax information are stored in the storage unit 106.

Then, in the next three-dimensional object detection process S205, a three-dimensional object in the three-dimensional space is detected using the parallax information obtained by the parallax processing S204 for the parallax or distance of each pixel of the left and right images. FIG. 3 is a diagram showing a three-dimensional object detection region (also referred to as a three-dimensional object region) set on the image by the three-dimensional object detection process S205. FIG. 3 shows a pedestrian detection area 301 and a vehicle detection area 302 detected by the

cameras

101 and 102 on the image as a result of the three-dimensional object detection process S205. These

detection areas

301 and 302 indicate areas where pedestrians or vehicles exist on the image, and even if they are rectangular as shown in FIG. 3, they are irregular areas obtained from parallax and distance. May be good. It is generally treated as a rectangle in order to facilitate the handling by a computer in the subsequent processing. In the present embodiment, the detection area will be treated as a rectangle, and a pedestrian will be mainly used as an example of a three-dimensional object.

Next, in the three-dimensional object recognition process S206, a recognition process for specifying the type of the three-dimensional object is performed for the detection area set on the image by the three-dimensional object detection process S205. The three-dimensional object to be recognized by the three-dimensional object recognition process S206 is, for example, a pedestrian, a vehicle, a signal, a sign, a white line, a tail lamp of a car, a headlight, or the like, and the type of any of these is specified. This three-dimensional object recognition process S206 is performed using the image information recorded in the image buffer 161 and the parallax information recorded in the parallax buffer 162. However, the information in the parallax buffer 162 may cause erroneous recognition because the relationship between the object and the background exists infinitely. This is the same even when a radar such as a millimeter wave is combined with an image sensor such as a camera. The details of the three-dimensional object recognition process S206 that solves this problem will be described later.

Next, in the vehicle control process S207, for example, a warning is issued to the occupant in consideration of the recognition result of the three-dimensional object in the three-dimensional object recognition process S206 and the state of the own vehicle (speed, steering angle, etc.), and the own vehicle. The control for braking and adjusting the steering angle of the vehicle is determined, or the avoidance control for the recognized three-dimensional object is determined, and the result is output as automatic control information via the CAN interface 107 (S208).

The three-dimensional object detection process S205, the three-dimensional object recognition process S206, and the vehicle control process S207 are performed by the arithmetic processing unit 105 of FIG.

Note that the program shown in the flowchart of FIG. 2 and the flowchart of FIG. 5 described later can be executed by a computer equipped with a CPU, memory, and the like. All processing or some processing may be realized by a hard logic circuit. Further, this program can be provided by storing it in the storage medium of the image recognition device 100 in advance. Alternatively, the program can be stored and provided in an independent storage medium, or the program can be recorded and stored in the storage medium of the image recognition device 100 via a network line. It may be supplied as a computer-readable computer program product in various forms such as a data signal (carrier wave).

<Three-dimensional object recognition process (Example 1)> FIG. 4 is a block diagram showing a functional block configuration (Example 1) of the image recognition device 100 related to the three-dimensional object recognition process S206. FIG. 5 is a flowchart showing the details (Example 1) of the three-dimensional object recognition process S206. In this example, the three-dimensional object recognition process S206 of FIG. 2, that is, the flowchart shown in FIG. 5, is normalized to the information of the parallax buffer 162 provided in the arithmetic processing unit 105 as shown in FIG. This is performed by the normalization processing unit 401, the recognition processing unit 402 that recognizes the information of the parallax buffer 162 that has passed through the normalization processing unit 401, and the information of the image buffer 161. Hereinafter, the processing of each processing unit will be described in order. Note that these processes will be described on the premise of a stereo camera.

[Normalization processing unit] The normalization processing unit 401 normalizes the parallax corresponding to the detection area acquired by the three-dimensional object detection process S205 among the information contained in the parallax buffer 162 (FIG. 5: S501). In the normalization process S501, for example, the value s _{i of} each parallax is numerically converted into the value S _{i after normalization based on the following equation (1).}
(Number 1)

Here, s _max and s _min are, for example, the maximum and minimum values of the parallax value before normalization, and S _max and S _min are the maximum and minimum values after normalization. S _max and S _min shall be arbitrarily determined according to the format of the information used in the three-dimensional object recognition process S206. For example, S _max = 1 and S _min = 0. Further, s _max and s _min may be arbitrarily determined according to the format of the information used in the three-dimensional object recognition process S206. For example, in a stereo camera, the accuracy of parallax and distance is poor due to being dragged when the signal / noise ratio near the region where the brightness value is small is poor due to the sensor characteristics, or when the resolution in the region where the brightness value is saturated is unstable. It is conceivable that _{In such a case, s max} and s _min may be set to arbitrary values based on the original pixel information, sensor characteristics, etc., or may be converted and used based on a certain conversion formula such as 10% carry-up or round-down. In addition, regardless of the accuracy of the original image, in the case of a radar sensor or the like, it is conceivable to use _{s max} and s _{min excluding outliers based on the erroneous measurement occurrence rate in the region.}

Further, the equation used for the normalization process S501 may be defined as the following equation (2).
(Number 2)

Here, s _avr is the average value of the parallax values in the detection area. As described above, the method used for normalization shall be arbitrarily determined according to the format of the information used in the three-dimensional object recognition process S206.

Here, the parallax information corresponding to the detection area is numerically converted and normalized based on an arbitrary rule, but it goes without saying that the distance information corresponding to the detection area may be numerically converted and normalized. is there.

[Recognition processing unit] The recognition processing unit 402 performs recognition processing by combining the information of the image buffer 161 and the normalization information of the parallax buffer 162 (parallax information or distance information after the normalization processing) (FIG. 5: S502). The recognition process S502 uses, for example, pattern matching that compares a luminance image in the image buffer 161 with a predetermined pattern using a normalized correlation or the like, or determination by a classifier created by using machine learning. .. When combining the normalized information of the parallax buffer 162, for example, a method such as using the average value of the pattern matching result of the luminance image and the pattern matching result of the normalized parallax information as the final judgment value, or the luminance image and the normalized parallax A method of identifying by a classifier created by machine learning using the difference in information as a feature quantity is used.

Taking the case of recognizing an object by pattern matching as an example, if the above-mentioned normalization process is not performed, the number of combinations of the foreground and the background to be recognized becomes enormous. For example, when the pedestrian to be recognized in the foreground is at a position of 10 m, when the background exists as a wall 20 m behind it, and when the pedestrian is also at a position of 10 m and the background is 40 m behind it. , Parallax or distance information is different. When performing these pattern matching, it is necessary to have a template corresponding to each, but since there are innumerable foreground positions and background positions, it is not realistic to have all the combinations as templates. .. Moreover, even when statistical processing using machine learning is performed, it is not realistic to collect all combinations of foreground and background. Therefore, the above-mentioned normalization process that can reduce the amount of information to a realistic amount (can be compressed to a predetermined range) is effective.

<Three-dimensional object recognition process (Example 2)> FIG. 6 is a block diagram showing a functional block configuration (Example 2) of the image recognition device 100 related to the three-dimensional object recognition process S206. FIG. 7 is a flowchart showing the details (Example 2) of the three-dimensional object recognition process S206. In this example, the three-dimensional object recognition process S206 of FIG. 2, that is, the flowchart shown in FIG. 7, is based on the information (misparity information) of the disparity buffer 162 provided in the arithmetic processing unit 105, as shown in FIG. The weight generation processing unit 601 that creates weights corresponding to each pixel of the image of the image buffer 161 and the recognition processing unit 602 that recognizes the weight information created by the weight generation processing unit 601 together with the information of the image buffer information 161. Will be implemented.

[Weight generation processing unit] The weight generation processing unit 601 calculates the weight corresponding to each pixel of the image of the image buffer 161 (the image corresponding to the detection area acquired by the three-dimensional object detection processing S205) from the information of the parallax buffer 162. Generate (FIG. 7: S701). The detection area obtained by the three-dimensional object detection process S205 includes a background portion in addition to the recognition target that is the foreground portion. At this time, if the recognition target, which is the foreground part, and the background part are treated in the same way, it causes erroneous recognition. Therefore, in the weight generation process S701, the weight is created using the parallax information. For example, when an arbitrary threshold value s _th is set for the average value s _avr _{of the parallax value s i} , the weight is _{1 for pixels having a parallax value s i} satisfying the following equation (3), and other than that. Give a weight such that is 0.
(Number 3)

This weight is used, for example, to mask the luminance information obtained from the image buffer 161. The weight generation processing unit 601 may use the median value instead of the _{average value s avr} _{, and instead of determining the threshold value s th} , obtains a value deviating from the variance or standard deviation of the parallax in the detection region. You can also. For example, a weight is given so that pixels not included in the standard deviation of 3σ are 0 and others are 1. The designer may arbitrarily determine the maximum and minimum (in other words, the range) of this weight, and assign it linearly or according to an arbitrary function. Further, the weight can be _{created, for example, by creating a histogram from the parallax value s i} in the detection area and selecting either the foreground or the background mountain generated in the histogram. For example, a weight is given so that the pixel having the _{parallax value s i} corresponding to the foreground to be recognized is 1 and the other pixels are 0.

Here, for the detection area acquired by the three-dimensional object detection process S205, the weight corresponding to each pixel is generated (by numerical conversion) from the parallax information of the three-dimensional object, but from the distance information of the three-dimensional object. Of course, the weight corresponding to each pixel may be generated (by numerical conversion), or instead of each pixel, the weight corresponding to each distance (corresponding to each pixel) or each parallax may be generated. Is.

[Recognition processing unit] The recognition processing unit 602 performs recognition processing using the image information of the image buffer 161 and the weight information created by the weight generation processing unit 601 (FIG. 7: S702). The recognition process S702 includes, for example, a method such as pattern matching in which a weighted value of a luminance image in the image buffer 161 and a predetermined pattern are compared using a normalized correlation or the like, or a luminance image and a weight. A method of discriminating by a classifier whose feature quantity is the product of Further, the recognition processing unit 602 can use the parallax information and the distance information obtained from the parallax buffer 162 in combination with the image information and the weight information for recognition. For example, after masking each of the luminance image and the parallax image with a weight, a method of identifying the two types after masking and a discriminator characterized by the difference thereof is used.

Taking the case of recognizing an object by pattern matching as an example, the number of pattern shapes generated by the combination of the foreground and the background is enormous. Therefore, it is conceivable that the object is erroneously recognized depending on the combination of the foreground and the background. By using the weight information obtained by the weight generation process described above, it is possible to perform the process using only the recognized foreground information, which is effective in suppressing erroneous recognition. This is similarly effective in improving positive recognition and reducing false recognition when machine learning is used.

<Three-dimensional object recognition process (Example 3)> FIG. 8 is a block diagram showing a functional block configuration (Example 3) of the image recognition device 100 related to the three-dimensional object recognition process S206. FIG. 9 is a flowchart showing the details (Example 3) of the three-dimensional object recognition process S206. In this example, the three-dimensional object recognition process S206 of FIG. 2, that is, the flowchart shown in FIG. 9, shows the weight generation process unit 801 provided in the arithmetic processing unit 105 and the normalization process as shown in FIG. It is carried out by unit 802 and recognition processing unit 803.

[Weight generation processing unit] Similar to the weight generation processing unit 601 described with reference to FIGS. 6 and 7, the weight generation processing unit 801 uses the information of the parallax buffer 162 to obtain an image of the image buffer 161 (in the three-dimensional object detection process S205). A weight corresponding to each pixel of the acquired detection area) is generated (FIG. 9: S901). In the weight generation process S901, for example, a weight is created in which the value within the range of an _{arbitrary threshold value th is set to 1 from the median value of parallax, and the other values are set to 0.}

[Normalization processing unit] The normalization processing unit 802 normalizes the parallax information corresponding to the detection area acquired by the three-dimensional object detection processing S205 based on the weight created by the weight generation processing unit 801 (FIG. 9). : S902). In the normalization process S902, for example, when a binary weight of 0 or 1 is obtained, the maximum and minimum values of the parallax having the weight of 1 are set to s _max and s _min, and are based on the following equation (4). Normalize each parallax.
(Number 4)

Here, if S _i that exceeds _{S max} and S _i that is less than S _min are obtained, a value that can be judged to be an invalid value may be added to the normalization result. For example, in a system that is premised on handling a finite positive value, exception handling can be considered in which a negative value is treated as an invalid value.

Here, for the detection area acquired by the three-dimensional object detection process S205, the weight corresponding to each pixel is generated (by numerical conversion) from the parallax information of the three-dimensional object, but from the distance information of the three-dimensional object. Of course, the weight corresponding to each pixel may be generated (by numerical conversion), or instead of each pixel, the weight corresponding to each distance (corresponding to each pixel) or each parallax may be generated. Is. Further, although the parallax information corresponding to the detection area is numerically converted and normalized, it goes without saying that the distance information corresponding to the detection area may be numerically converted and normalized.

[Recognition processing unit] The recognition processing unit 803 performs recognition using the image information of the image buffer 161 and the parallax information (parallax information after normalization processing) created by the normalization processing unit 802 (FIG. 9: S903). .. Further, the recognition processing unit 803 can use the weight information created by the weight generation processing unit 801 in combination with the image information and the normalization information for recognition. For example, the edge image 1001 created by using edge extraction from the luminance image shown in FIG. 10 is multiplied by the weight information 1002 to create an edge image (background-removed edge image) 1003 from which the background edge is removed.
Recognition is performed using the background removal edge image 1003 and the normalized parallax image. The recognition process S903 may use a pattern matching technique such as normalization correlation. Further, a classifier may be used in which the product or difference of the two types of information is input.

For example, when a discriminator is created by machine learning and the target recognition process is performed using this machine, the normalization process alone is affected by the characteristics of the background part. In addition, only the weight generation process causes a difference in recognition performance depending on the distance of the foreground portion and the like. Therefore, by performing the weight generation process and the normalization process together, it is possible to recognize the image without being affected by the combination of the foreground and the background and the distance of the foreground, which improves the recognition performance. Connect.

As described above, all the parallax information can be replaced with distance information.

(Modification Example) In the present embodiment, the image recognition device 100 using a stereo camera composed of a pair of

cameras

101 and 102 has been described. However, it may be realized by using an image recognition device 100A that does not use a stereo camera.

FIG. 11 is a flowchart showing the operation of the image recognition device 100A. In FIG. 11, the same parts as those of the operation in the image recognition device 100 shown in FIG. 2 are designated by the same reference numerals, and the description thereof will be omitted.

As shown in FIG. 11, the image recognition device 100A includes an optical camera (hereinafter, simply referred to as a camera) 1101 and a radar sensor 1102 as an imaging unit. As a result, a three-dimensional object is detected. In S211 the image is captured by the camera 1101, and the captured image information is subjected to image processing S203 such as correction for absorbing the unique characteristics of the image sensor. The processing result of the image processing S203 is stored in the image buffer 161. Further, in S212, the radar sensor 1102 obtains the distance to the three-dimensional object as sensor information.
The three-dimensional object detection process S213 detects a three-dimensional object in the three-dimensional space based on the distance to the three-dimensional object. The distance information used for detection is stored in the distance buffer 163. The distance buffer 163 is provided, for example, in the storage unit 106 of FIG. Further, in the three-dimensional object detection process S213, the image and the distance are associated with each other as necessary for the subsequent process. In the three-dimensional object recognition process S214, in substantially the same manner as the above-mentioned image recognition device 100 (here, using the distance information of the three-dimensional object), with respect to the detection area set on the image by the three-dimensional object detection process S213. Performs recognition processing to identify the type of three-dimensional object.

The three-dimensional object detection process S213, which inputs the distance from the radar sensor 1102 to the three-dimensional object, needs to perform the detection process in consideration of the sensor characteristics of the radar sensor 1102 used for distance measurement, but determines the detection area. The subsequent processing can be performed in the same manner as the configuration by the stereo camera described in the image recognition device 100. Further, the image recognition device 100A does not require a plurality of images in the image processing S203.

(Effect of action) The

image recognition devices

100 and 100A of the present embodiment described above are three-dimensional with respect to the detection region of a three-dimensional object set on the image captured by the

cameras

101, 102 and 1101 as the imaging unit. The distance information or the parallax information of the object is numerically converted, and the numerically converted distance information or the parallax information is combined with the image information of the image to perform a recognition process for specifying the type of the three-dimensional object.

Specifically, in performing the recognition process, the distance information or parallax information of the three-dimensional object to be recognized is normalized with respect to the information of each pixel obtained from the

cameras

101, 102, 1101 and the corresponding distance or parallax information. (Figs. 4 and 5), or mask distance information or parallax information other than the recognition target, or change the weight of pixel information and distance information or parallax information (Figs. 6 and 7), or combine them (Fig. 8, 9) By doing so, recognition that combines pixel information and distance information or parallax information is realized.

According to the embodiment described above, the following effects can be obtained.

That is, the

image recognition devices

100 and 100A of the present embodiment can improve the positive recognition rate with respect to the

detection areas

301 and 302 of the three-dimensional object set on the images captured by the

cameras

101, 102 and 1101. it can. In addition, it is possible to suppress erroneous recognition of other background three-dimensional objects as objects such as pedestrians and vehicles to be recognized. In particular, the shape (appearance on the image) similar to the recognition target generated by the combination of the foreground and the background has the effect of suppressing erroneous recognition of the target. Therefore, according to the present embodiment, it is possible to accurately detect a three-dimensional object and improve the recognition performance while suppressing an increase in cost.

In the above-described embodiment, a stereo camera or a monocular camera composed of two cameras is used, but three or more cameras may be used. Further, although the front camera that images the front of the vehicle (in other words, acquires the image of the front of the vehicle) is illustrated, it is natural that a rear camera or a side camera that images the rear of the vehicle or the side of the vehicle may be used. Is.

The present invention is not limited to the above-described embodiments, and other embodiments that can be considered within the scope of the technical idea of the present invention are also included within the scope of the present invention as long as the features of the present invention are not impaired. .. For example, the above-described embodiment has been described in detail in order to explain the present invention in an easy-to-understand manner, and is not necessarily limited to the one including all the described configurations. Further, the configuration may be a combination of the above-described embodiment and a modified example.

Further, each of the above configurations, functions, processing units, processing means, etc. may be realized by hardware by designing a part or all of them by, for example, an integrated circuit. Further, each of the above configurations, functions, and the like may be realized by software by the processor interpreting and executing a program that realizes each function. Information such as programs, tables, and files that realize each function can be stored in a memory, a hard disk, a storage device such as an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.

In addition, the control lines and information lines indicate those that are considered necessary for explanation, and not all control lines and information lines are necessarily indicated on the product. In practice, it can be considered that almost all configurations are interconnected.

100, 100A

Image recognition device

101, 102 Camera (imaging unit) 103 Image input interface 104 Image processing unit 105 Arithmetic processing unit 106 Storage unit 107 CAN interface 108 Control processing unit 109 Internal bus 110 Processing device 111 In-vehicle network CAN161 Image buffer 162 Misalignment Buffer 163 Distance buffer 401 Normalization processing unit (Example 1) 402 Recognition processing unit (Example 1) 601 Weight generation processing unit (Example 2) 602 Recognition processing unit (Example 2) 801 Weight generation processing unit (Example 1) 3) 802 Normalization processing unit (Example 3) 803 Recognition processing unit (Example 3) 1101 Optical camera (imaging unit) 1102 Radar sensor

Claims

An image recognition device that recognizes a three-dimensional object on an image captured by an image pickup unit.
The distance information or disparity information of the three-dimensional object is numerically converted with respect to the detection area of the three-dimensional object set on the image, and the numerically converted distance information or disparity information is combined with the image information of the image. An image recognition device characterized by performing a recognition process for specifying the type of a three-dimensional object.
In the image recognition device according to claim 1,
For the detection area of a three-dimensional object set on the image,
A normalization processing unit that numerically converts and normalizes the distance information or parallax information of the three-dimensional object based on an arbitrary rule, and
It is characterized by including a recognition processing unit that performs recognition processing for specifying the type of the three-dimensional object by using the distance information or parallax information numerically converted by the normalization processing unit and the image information of the image. Image recognition device.
In the image recognition device according to claim 1,
For the detection area of a three-dimensional object set on the image,
A weight generation processing unit that generates weights corresponding to each pixel or each distance or each parallax from the distance information or parallax information of the three-dimensional object.
An image recognition device including a recognition processing unit that performs recognition processing for specifying the type of the three-dimensional object by using the weight information generated by the weight generation processing unit and the image information of the image. ..
In the image recognition device according to claim 3,
The recognition processing unit uses the weight information generated by the weight generation processing unit, the image information of the image, and the distance information or the parallax information of the three-dimensional object to identify the type of the three-dimensional object. An image recognition device characterized by performing.
In the image recognition device according to claim 1,
A weight generation processing unit that generates weights corresponding to each pixel or each distance or each parallax from the distance information or parallax information of the three-dimensional object with respect to the detection area of the three-dimensional object set on the image.
Based on the weight information obtained by the weight generation processing unit, the distance information or parallax information of the three-dimensional object is numerically converted and normalized to the detection area of the three-dimensional object set on the image. Processing unit and
It is characterized by including a recognition processing unit that performs recognition processing for specifying the type of the three-dimensional object by using the distance information or parallax information numerically converted by the normalization processing unit and the image information of the image. Image recognition device.
In the image recognition device according to claim 5,
The recognition processing unit uses the distance information or parallax information numerically converted by the normalization processing unit, the weight information generated by the weight generation processing unit, and the image information of the image to form the three-dimensional object. An image recognition device characterized by performing recognition processing for specifying a type.