WO2017188017A1 - Detection device, detection method, and program - Google Patents

Detection device, detection method, and program Download PDF

Info

Publication number
WO2017188017A1
WO2017188017A1 (PCT/JP2017/015212)
Authority
WO
WIPO (PCT)
Prior art keywords
unit
subject
detection
distance
distance information
Prior art date
Application number
PCT/JP2017/015212
Other languages
French (fr)
Japanese (ja)
Inventor
小野 博明
英史 山田
光永 知生
Original Assignee
ソニーセミコンダクタソリューションズ株式会社
Priority date
Filing date
Publication date
Application filed by ソニーセミコンダクタソリューションズ株式会社
Publication of WO2017188017A1 publication Critical patent/WO2017188017A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume

Definitions

  • the present technology relates to a detection device, a detection method, and a program, and more particularly, to a detection device, a detection method, and a program that detect a predetermined subject from an image, for example.
  • the present technology has been made in view of such a situation, and makes it possible to reduce processing when a predetermined object is recognized (detected).
  • the first detection device of one aspect of the present technology includes an acquisition unit that acquires distance information regarding a distance to a subject, a setting unit that sets, from the distance information and a feature amount of an object to be detected, a region where the object may be imaged, and a determination unit that determines whether an image in the region is the object.
  • the second detection device of one aspect of the present technology includes an acquisition unit that acquires distance information regarding a distance to a subject, a setting unit that sets, using the distance information, a region where a predetermined object may be captured, and an estimation unit that estimates a category to which the object belongs from the size of the region and the distance information.
  • the first detection method of one aspect of the present technology includes acquiring distance information regarding a distance to a subject, setting, from the distance information and a feature amount of an object to be detected, a region where the object may be captured, and determining whether or not an image in the region is the object.
  • in the second detection method of one aspect of the present technology, distance information regarding a distance to a subject is acquired, a region where a predetermined object may be captured is set using the distance information, and the category to which the object belongs is estimated from the size of the region and the distance information.
  • the first program of one aspect of the present technology causes a computer to execute processing including acquiring distance information regarding a distance to a subject, setting, from the distance information and a feature amount of an object to be detected, a region where the object may be captured, and determining whether or not an image in the region is the object.
  • the second program of one aspect of the present technology causes a computer to execute processing including acquiring distance information regarding a distance to a subject, setting, using the distance information, a region where a predetermined object may be imaged, and estimating the category to which the object belongs from the size of the region and the distance information.
  • in the first detection device, detection method, and program of one aspect of the present technology, distance information regarding the distance to the subject is acquired, a region where the object may be present is set from the distance information and the feature amount of the object to be detected, and it is determined whether or not the image in the region is the object.
  • in the second detection device, detection method, and program of one aspect of the present technology, distance information regarding the distance to the subject is acquired, a region where a predetermined object may be captured is set using the distance information, and the category to which the object belongs is estimated from the size of the region and the distance information.
  • the detection device may be an independent device or an internal block constituting one device.
  • the program can be provided by being transmitted through a transmission medium or by being recorded on a recording medium.
  • processing when recognizing (detecting) a predetermined object can be reduced.
  • the present technology is applicable to recognizing (detecting) a predetermined object, for example, an object such as a person (face, upper body, whole body), automobile, bicycle, foodstuff, or the like.
  • a predetermined object for example, an object such as a person (face, upper body, whole body), automobile, bicycle, foodstuff, or the like.
  • the present technology detects such a predetermined object using distance information.
  • in the following description, a case where a human face is detected using distance information will be described as an example.
  • FIG. 1 is a diagram illustrating a configuration of an embodiment of a detection device to which the present technology is applied.
  • the detection apparatus 100 illustrated in FIG. 1 includes a distance information acquisition unit 111, a subject feature extraction unit 112, a subject candidate area detection unit 113, and an actual size database 114.
  • the distance information acquisition unit 111 measures the distance to the subject, generates a measurement result (distance information), and outputs the result to the subject feature extraction unit 112.
  • the distance information acquisition unit 111 acquires distance information by a distance measuring sensor using active light (infrared rays or the like), for example.
  • as the distance measuring sensor using active light, the TOF (Time of Flight) method, the Structured Light method, or the like can be applied.
  • the distance information acquisition unit 111 may be configured to acquire distance information by a distance measuring sensor (ranging method) using reflected light of active light, for example, a TOF light source or a camera flash light.
  • the distance information acquisition unit 111 may be configured to acquire distance information with a stereo camera.
  • the distance information acquisition unit 111 may be configured to acquire distance information using an ultrasonic sensor.
  • the distance information acquisition unit 111 may be configured to acquire distance information by a method using a millimeter wave radar.
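The ranging approaches above all reduce to measuring how long the active signal takes to return (or how a projected pattern or disparity shifts). As a minimal illustration of the direct time-of-flight relation only, and not of any particular unit described here, the sketch below converts a round-trip time into a distance; the function name and the sample value are assumptions.

```python
# Illustrative sketch: direct time-of-flight distance from a round-trip time.
# The d = c * t / 2 relation and the speed of light are standard physics;
# the names and the sample value are hypothetical.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def tof_distance_m(round_trip_time_s: float) -> float:
    """Distance to the reflecting surface for a measured round-trip time."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0

# Example: a 6.67 ns round trip corresponds to roughly 1 m.
print(tof_distance_m(6.67e-9))  # ~1.0
```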
  • the subject feature extraction unit 112 sets, from the distance information, a frame in which the detection target, for example a human face, may be present.
  • the frame is set by referring to a table stored in the actual size database 114.
  • the actual size database 114 manages a table that associates the distance to a subject and the actual size of the subject to be detected with the corresponding size on the image. For example, the table describes how large a person's face appears on the image when the face is at a position a predetermined distance away.
  • the subject candidate area detection unit 113 determines whether or not there is a detection target within the set frame. If there is, the subject candidate area detection unit 113 cuts out the frame and outputs it to a subsequent processing unit (not shown).
  • the subject feature extraction unit 112 sets a pixel (target pixel) to be processed.
  • the pixel of interest is sequentially set from the upper left pixel of the image.
  • the pixel of interest is sequentially set from the upper left pixel to the lower right pixel of the distance image 131.
  • the order of the target pixels that are sequentially set is represented by an arrow, but the target pixels may be set in an order other than such an order.
  • the distance image 131 is an image generated from distance information.
  • the distance image 131 is colored according to distance, with pixels at the same distance represented by the same color.
  • however, the distance image 131 in the present technology does not need to be an image colored according to distance; any image from which it can be determined how far a given pixel (subject) in the image 131 is from the detection device 100 can be used.
  • here, the description will be continued assuming that such a distance image 131 has been generated. Further, as illustrated in the upper diagram of FIG. 3, the description will be continued with an example in which a pixel at a predetermined position in the distance image 131 is set as the pixel of interest 132.
  • in step S102, the distance information of the target pixel is acquired.
  • in step S103, the size of the detection frame is determined from the distance and the actual size of the subject to be detected. As shown in the middle diagram of FIG. 3, the subject feature extraction unit 112 sets a detection frame 133 from the distance of the target pixel 132 and the actual size of the subject to be detected.
  • the subject feature extraction unit 112 sets a detection frame 133 with reference to a table managed by the actual size database 114.
  • a table 151 as shown in FIG. 4 is stored.
  • the table 151 shown in FIG. 4 associates the distance with the size of the face on the image based on its actual size. For example, it describes that when the distance is 0 cm the size of the face on the image is 30 pixels × 30 pixels, when the distance is 50 cm it is 25 pixels × 25 pixels, and when the distance is 100 cm it is 20 pixels × 20 pixels.
  • the distance is a distance between the detection device 100 and a subject (in this case, a human face).
  • the actual face size here is the size of an average human face. Since human faces vary with gender and age, and there are individual differences, the actual face size is treated here as the average human face size.
  • alternatively, a table 151 in which one distance is associated with the sizes on the image based on the actual sizes of a plurality of faces may be created, and processing may be performed using such a table 151. For example, one distance may be associated with the size on the image based on the actual size of a male face, the size based on the actual size of a female face, and the size based on the actual size of a child's face.
  • a detection frame 133 corresponding to the size on the image based on each actual size may be set, and the process of step S104 described later may be executed for each detection frame 133.
  • FIG. 4 shows an example in which distances in increments of 50 centimeters (0, 50, 100, and so on) are associated with the size on the image based on the actual size.
  • the increment is not limited to 50 centimeters; it can be changed depending on the accuracy of the distance information and the accuracy required for detection. A sketch of looking up such a table is shown below.
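For illustration only, the following sketch shows one way a table like table 151 could be consulted, interpolating linearly between the 50 cm entries. The values mirror the example described for FIG. 4; the array and function names are assumptions, not the actual implementation of the actual size database 114.

```python
import numpy as np

# Distance (cm) -> face size on the image (pixels per side), following the
# example of table 151 described above (0 cm: 30 px, 50 cm: 25 px, 100 cm: 20 px).
TABLE_151_DISTANCE_CM = np.array([0.0, 50.0, 100.0])
TABLE_151_SIZE_PX = np.array([30.0, 25.0, 20.0])

def frame_size_px(distance_cm: float) -> int:
    """Interpolate the on-image size of an average face at the given distance."""
    size = np.interp(distance_cm, TABLE_151_DISTANCE_CM, TABLE_151_SIZE_PX)
    return int(round(size))

print(frame_size_px(75.0))  # between the 50 cm and 100 cm entries -> ~22 px
```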
  • in step S103, the subject feature extraction unit 112 sets the detection frame 133 from the size on the image based on the distance of the target pixel 132 and the actual size of the subject to be detected, as shown in the middle diagram of FIG. 3.
  • for example, when the distance of the target pixel 132 is 50 cm, a detection frame 133 of 25 pixels × 25 pixels centered on the target pixel 132 is set.
  • the detection frame 133 is illustrated as a quadrangle in FIG. 3, but it is not limited to a rectangular shape such as a square and may be another shape such as a circle.
  • here, the case where the actual size of the subject is used as the feature (feature amount) of the subject and the detection frame 133 is set using the size on the image based on that actual size is described, but it is also possible to set the detection frame 133 using a feature (feature amount) other than the actual size.
  • in this case, the subject feature extraction unit 112 functions as a setting unit that uses the size of the subject as the feature amount and sets the detection frame 133 according to that feature amount.
  • the detection frame 133 is a frame set by calculating how large the subject to be detected, for example a human face, would appear on the distance image 131 if it were located at the distance of the position of the target pixel 132. The calculation itself may be omitted, and other forms, such as looking up values described in the table 151, can also be applied. A sketch of this frame-setting step is shown below.
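As a sketch of the frame-setting step under the same assumptions (illustrative names, a square frame, clipping at the image border rather than skipping border pixels), the detection frame 133 can be expressed as a window centered on the target pixel:

```python
def detection_frame(cx: int, cy: int, size_px: int, height: int, width: int):
    """Square detection frame of side size_px centered on (cx, cy),
    clipped to the image bounds; returns (top, bottom, left, right)."""
    half = size_px // 2
    top = max(cy - half, 0)
    left = max(cx - half, 0)
    bottom = min(cy + half + 1, height)
    right = min(cx + half + 1, width)
    return top, bottom, left, right
```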
  • the subject candidate area detection unit 113 determines whether the image in the detection frame 133 is a candidate for the subject to be detected. For example, a filter having the same size as the detection frame 133 is applied to the distance image 131, and the response value is used as the probability value of the subject candidate.
  • as the filter, a DoG (Difference of Gaussian) filter, a Laplacian filter, or the like can be applied.
  • whether the image in the detection frame 133 is a candidate for the subject to be detected can also be determined using the detection frame 133 and the distance information within it.
  • for example, when an actual human face is captured within the detection frame 133, the distance information within the frame varies with position. In contrast, when the detection frame 133 contains a human face shown in a photograph (poster), the distance information within the frame is essentially constant and shows no such variation. Sketches of these determinations are shown below.
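The following sketch illustrates the determination of step S104 under stated assumptions: a Difference-of-Gaussian response with its scale tied to the frame size serves as the candidate probability, and a simple spread check on the distances inside the frame rejects flat regions such as a face in a poster. The sigma choice, the flatness threshold, and the use of SciPy are assumptions, not details taken from the description.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def candidate_probability(distance_img, cx, cy, size_px, flat_threshold_cm=2.0):
    """Difference-of-Gaussian response at the target pixel, with the filter
    scale tied to the detection-frame size, used as the subject-candidate
    probability.  Returns 0 when the distances inside the frame are
    essentially constant (e.g. a face printed on a poster)."""
    img = distance_img.astype(np.float64)
    sigma = size_px / 4.0                          # inner scale follows the frame size
    dog = gaussian_filter(img, sigma) - gaussian_filter(img, 1.6 * sigma)

    half = size_px // 2
    patch = img[cy - half:cy + half + 1, cx - half:cx + half + 1]
    if patch.std() < flat_threshold_cm:            # no depth variation: flat surface
        return 0.0
    return float(abs(dog[cy, cx]))
```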
  • the determination result is output to a processing unit (not shown) at the subsequent stage of the detection apparatus 100. Only when it is determined that the subject to be detected is present within the detection frame 133 may the image within the detection frame 133 be cut out from the image 131 and the cut-out image output.
  • for example, if it is determined that there is a face in the detection frame 133-1, the image within the detection frame 133-1 is cut out from the image 131 and output. If it is determined that there is no face in the detection frame 133-2, that is, no subject to be detected, the image within the detection frame 133-2 is not cut out.
  • the cutout may be performed after the processing in step S105 is completed. Alternatively, the determination result of step S104, in this case the filter response value, may be held as the probability value of the subject candidate, and the cutout may be performed based on that probability value after the processing in step S105. It is also possible to output only the probability value to the subsequent stage.
  • the subject candidate area detection unit 113 thus functions as a determination unit that determines whether the image in the detection frame 133 is the subject to be detected.
  • an image that the subject candidate area detection unit 113 determines to be the subject to be detected can be cut out and output to a subsequent processing unit or the like.
  • step S105 it is determined whether or not such processing has been completed for all pixels in the image 131. If it is determined that the processing has not been completed for all pixels, the processing returns to step S101. Then, a new target pixel is set, and the processing after step S102 is performed on the set target pixel.
  • if it is determined in step S105 that such processing has been completed for all the pixels in the image 131, the recognition processing is terminated.
  • in this way, the probability value of the subject candidate is obtained for all the pixels in the distance image 131. The position of the maximum probability value is then taken as the center position of the detected subject, the pixel at that position is set as the target pixel 132, and the image within the detection frame 133 is cut out.
  • since the detection frame 133 is set around the target pixel 132 as described above, the target pixel 132 does not have to be set for every pixel in the image 131.
  • near a corner of the image 131, the detection frame 133 cannot be set properly: even if it is set, a part of the detection frame 133 (in this case, three quarters of it) falls outside the image. A pixel in such a region therefore does not have to be set as the target pixel 132.
  • the region near a side of the image 131 is likewise a region where the detection frame 133 cannot be fully set, so a pixel in such a region also does not have to be set as the target pixel 132.
  • the target pixel 132 may be sequentially set pixel by pixel, but may be set at a predetermined interval, for example, every five pixels.
  • a pixel in an area where the distance is determined to be far, in other words an area that can be determined to be background, also does not have to be set as the pixel of interest 132.
  • in this way, the number of target pixels 132 to be processed can be reduced, and the overall processing can be reduced. A sketch combining these steps is shown below.
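Putting steps S101 to S105 together, a minimal sketch of the scan might look as follows, assuming the helper functions from the earlier sketches, a distance image in centimeters, and illustrative stride and background thresholds.

```python
import numpy as np

def detect_subject(distance_img: np.ndarray, stride: int = 5,
                   background_cm: float = 400.0):
    """Scan target pixels (steps S101-S105): skip background and border pixels,
    score each remaining pixel, and crop the frame at the best-scoring center.
    Uses frame_size_px(), detection_frame(), and candidate_probability()
    from the sketches above."""
    h, w = distance_img.shape
    prob = np.zeros((h, w), dtype=np.float32)

    for cy in range(0, h, stride):
        for cx in range(0, w, stride):
            d = float(distance_img[cy, cx])
            if d >= background_cm:                     # skip far (background) pixels
                continue
            size = frame_size_px(d)
            half = size // 2
            if cx < half or cy < half or cx + half >= w or cy + half >= h:
                continue                               # frame would leave the image
            prob[cy, cx] = candidate_probability(distance_img, cx, cy, size)

    if prob.max() == 0.0:
        return None                                    # no candidate found
    cy, cx = np.unravel_index(np.argmax(prob), prob.shape)
    size = frame_size_px(float(distance_img[cy, cx]))
    top, bottom, left, right = detection_frame(cx, cy, size, h, w)
    return distance_img[top:bottom, left:right]        # cropped candidate region
```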
  • FIG. 5 shows an example of the image 131, and shows a case where the detection frames 133-1 to 133-4 are set as regions where there is a possibility that there is a detection target (human face).
  • the subject candidate area detection unit 113 determines that there is a face in the detection frame 133-1 set according to the distance of the subject by the subject feature extraction unit 112 (FIG. 1). The image within the detection frame 133-1 is cut out and output.
  • the subject candidate area detection unit 113 determines that there is no face in the detection frame 133-2 set according to the distance of the subject by the subject feature extraction unit 112 (FIG. 1). The image within the detection frame 133-2 is not cut out.
  • the subject candidate area detection unit 113 determines that there is a face within the detection frame 133-3 set according to the distance of the subject by the subject feature extraction unit 112 (FIG. 1). The image within the detection frame 133-3 is cut out and output.
  • in the detection frame 133-4, the face is one shown in a photograph or the like.
  • the subject candidate area detection unit 113 therefore determines that there is no face, and the image in the detection frame 133-4 is not cut out.
  • as described above, in the first embodiment, an object is detected using the distance and the size that the object to be detected would have at that distance.
  • for example, when the detection target is a human face, the processing can be reduced by performing detection using the present technology rather than by pattern matching or the like.
  • FIG. 6 is a diagram illustrating a configuration example of the detection device 200 according to the second embodiment.
  • the same portions are denoted by the same reference numerals, and description thereof is omitted.
  • the detection device 200 in the second embodiment is configured by adding an imaging unit 211 and a subject detail recognition unit 212 to the detection device 100 in the first embodiment.
  • the imaging unit 211 includes an imaging element such as a CCD or a CMOS image sensor, captures an image of ambient light (described as a normal image), and supplies the image to the subject detail recognition unit 212.
  • the detection result from the subject candidate area detection unit 113 is also supplied to the subject detail recognition unit 212.
  • the subject candidate region detection unit 113 cuts out and outputs a region determined to have a detection target, for example, a human face, using the distance image 131.
  • the subject detail recognition unit 212 performs more detailed recognition on the subject in the region supplied from the subject candidate region detection unit 113 using the normal image. For example, a recognition process for specifying an individual such as gender and age is performed.
  • Steps S201 to S205 are processes performed by the distance information acquisition unit 111 to the subject candidate area detection unit 113, and are performed in the same manner as steps S101 to S105 of the flowchart shown in FIG. 2.
  • the subject detail recognition unit 212 performs detail recognition using the subject candidate detection frame. For example, the subject detail recognition unit 212 sets the detection frame 133 supplied from the subject candidate region detection unit 113 as a corresponding region of the normal image from the imaging unit 211, and cuts out the image within the set detection frame 133. Then, using the extracted normal image, a preset recognition process such as a recognition process for specifying an individual such as the sex or age of the subject is executed.
  • the information supplied from the subject candidate area detection unit 113 to the subject detail recognition unit 212 can be information such as the size on the image based on the actual size of the subject (the detection frame 133), a representative point (for example, the target pixel 132), and a distribution map (for example, a heat map of filter response values).
  • the subject detail recognition unit 212 performs detail recognition using information supplied from the subject candidate region detection unit 113.
  • by performing such detection (recognition) processing, a detection result as shown in FIG. 8, for example, is obtained.
  • the upper and middle views of FIG. 8 are the same as those of FIG. 5. That is, by the detection process using the distance image 131, the detection frame 133-1 and the detection frame 133-3 are supplied to the subject detail recognition unit 212 as information on the regions where the subject was detected.
  • the subject detail recognition unit 212 performs recognition processing, using a method such as a DNN (deep neural network, that is, deep learning), on the images cut out from the normal image at the positions where the detection frame 133-1 and the detection frame 133-3 are set. A sketch of this step is shown below.
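A hedged sketch of this detail recognition: the frames found on the distance image are applied to the corresponding regions of the normal image, and each crop is passed to a classifier. The classifier is injected as a placeholder callable standing in for whatever DNN is actually used; it is not an API taken from this description.

```python
import numpy as np

def recognize_details(normal_img: np.ndarray, frames, estimate_gender_age):
    """For each detection frame (top, bottom, left, right) found on the
    distance image, crop the corresponding region of the normal image and
    run a detail recognizer (a DNN in the text; here an injected callable)."""
    results = []
    for top, bottom, left, right in frames:
        crop = normal_img[top:bottom, left:right]
        results.append(estimate_gender_age(crop))   # e.g. (gender, age) per subject
    return results
```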
  • in the second embodiment as well, the distance and the size of the object to be detected at that distance are used for detection, so the detection accuracy can be improved and the processing load related to detection can be reduced. Furthermore, in the second embodiment, since detailed recognition processing is executed using a normal image (an image other than a distance image), the subject can be detected in more detail and recognized.
  • FIG. 9 is a diagram illustrating a configuration example of the detection apparatus 300 according to the third embodiment.
  • the same portions are denoted by the same reference numerals, and description thereof is omitted.
  • the detection apparatus 300 according to the third embodiment differs from the detection apparatus 100 according to the first embodiment in that a subject direction detection unit 311 is added.
  • the subject direction detection unit 311 detects the direction in which the detected subject is facing.
  • the detection device 300 according to the third embodiment detects the position, size, and direction of the subject.
  • Steps S301 to S306 are processes performed by the distance information acquisition unit 111 to the subject candidate area detection unit 113, and are performed in the same manner as steps S101 to S105 in the flowchart shown in FIG. 2, so description thereof is omitted.
  • in step S305, the subject candidate region detection unit 113 supplies the region determined to contain the subject to be detected (the region set by the detection frame 133) and the image cut out from that region to the subject direction detection unit 311.
  • the subject direction detection unit 311 detects the direction of the detected subject.
  • here, direction detection will be described taking as an example a case where a screen as shown in FIG. 11 is acquired.
  • the detection target is described as being a hand.
  • the subject feature extraction unit 112 and the subject candidate region detection unit 113 execute the processes of steps S302 to S304, so that a detection frame 133 is set in the distance image 131 and a hand, the detection target, is detected within the detection frame 133.
  • the subject direction detection unit 311 divides the inside of the detection frame 133 into regions of a predetermined size, treats each divided region as a surface of the subject, and obtains the normal direction of each surface.
  • in the example of FIG. 11, the palm faces rightward in the figure. When the palm is directed to the right, the distance information of the palm gradually increases from the front toward the back.
  • as a result, normals pointing in the right direction in the figure are set, and from these normals it is determined that the palm is facing the right direction in the figure. A sketch of this normal estimation is shown below.
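The normal estimation can be sketched as follows, assuming the distance values form a regular pixel grid with x increasing to the right and depth increasing away from the camera. The function name is an assumption, and it can be applied either to one divided region or, as a simplification, to the whole frame.

```python
import numpy as np

def dominant_normal(depth_patch: np.ndarray) -> np.ndarray:
    """Average surface normal over a depth patch (one divided region, or the
    whole region inside the detection frame).  The depth gradient (dz/dx, dz/dy)
    gives a per-pixel normal (-dz/dx, -dz/dy, 1); averaging and normalizing
    yields the dominant facing direction (a palm turned to the right tilts the
    average normal toward +x under this convention)."""
    dz_dy, dz_dx = np.gradient(depth_patch.astype(np.float64))
    normals = np.stack(
        [-dz_dx, -dz_dy, np.ones_like(depth_patch, dtype=np.float64)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    mean = normals.reshape(-1, 3).mean(axis=0)
    return mean / np.linalg.norm(mean)
```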
  • in the third embodiment as well, since the object is detected using the distance and the size of the object at that distance, the detection accuracy can be improved. It is also possible to determine the direction of the detected subject.
  • FIG. 12 is a diagram illustrating a configuration example of the detection apparatus 400 according to the fourth embodiment.
  • the same portions are denoted by the same reference numerals, and description thereof is omitted.
  • the detection device 400 according to the fourth embodiment is configured by adding an imaging unit 411 and a subject detail recognition unit 412 to the detection device 300 according to the third embodiment.
  • the added imaging unit 411 and subject detail recognition unit 412 perform basically the same processing as the imaging unit 211 and subject detail recognition unit 212 (both in FIG. 6) of the detection apparatus 200 in the second embodiment.
  • the imaging unit 411 captures a normal image and supplies it to the subject detail recognition unit 412.
  • the subject detail recognition unit 412 is also supplied with the detection result from the subject direction detection unit 311.
  • the subject direction detection unit 311 outputs a detection target, for example, the position of a human face (position where the detection frame 133 is set), its size (size of the detection frame 133), and its direction.
  • the subject detail recognition unit 412 performs more detailed recognition on the subject in the area supplied from the subject direction detection unit 311 using a normal image. For example, a recognition process for specifying an individual such as gender and age is performed.
  • Steps S401 to S406 are processes performed by the distance information acquisition unit 111, the subject feature extraction unit 112, the subject candidate region detection unit 113, and the subject direction detection unit 311.
  • the subject detail recognition unit 412 performs detail recognition using the subject candidate detection frame and the direction of the subject.
  • the subject detail recognition unit 412 sets the detection frame 133 supplied from the subject direction detection unit 311 as a corresponding area of the normal image from the imaging unit 411, and cuts out the image in the set detection frame 133.
  • a preset recognition process such as a recognition process for specifying an individual such as the sex or age of the subject is executed. This recognition processing is performed in consideration of the direction of the subject.
  • FIG. 14 shows a diagram comparing the recognition method performed by the detection apparatus 400 according to the fourth embodiment with other recognition methods.
  • the left diagram in FIG. 14 illustrates an example of another recognition method. For example, when a face is to be detected from a normal image, first, assuming that the detected object is a face, a determination is made with reference to the front/rear/left/right determination dictionary 431 in order to decide whether the face is facing front or back, or left or right.
  • if the face is determined to be facing front or back, the front/rear determination dictionary 432 is referred to and it is determined whether the face is forward-facing or backward-facing. If it is determined to be forward-facing, the forward-facing dictionary 434 is referred to, and it is determined whether it is a human face and, if so, whether it is a forward-facing face. In this process, when data for identifying individuals is described in the forward-facing dictionary 434, a person is identified by matching against that data.
  • if the front/rear determination dictionary 432 is referred to and the face is determined to be backward-facing, the backward-facing dictionary 435 is referred to in order to determine whether it is a human face and, if so, whether it is a backward-facing face.
  • similarly, if the face is determined to be facing left or right, the left/right determination dictionary 433 is referred to and it is determined whether the face is facing left or right.
  • if it is determined to be facing left, the left-facing dictionary 436 is referred to, and it is determined whether it is a human face and, if so, whether it is a left-facing face.
  • if it is determined to be facing right, the right-facing dictionary 437 is referred to, and it is determined whether it is a human face and, if so, whether it is a right-facing face.
  • in this other recognition method, the recognition process is thus performed by referring to a plurality of dictionaries and making determinations in stages.
  • in contrast, in the fourth embodiment, recognition processing can be performed by preparing an X-direction dictionary 451 and referring to the X-direction dictionary 451.
  • the X-direction dictionary 451 is a dictionary including the forward-facing dictionary 434, the backward-facing dictionary 435, the left-facing dictionary 436, and the right-facing dictionary 437. Since the subject detail recognition unit 412 (FIG. 12) is also supplied with the direction of the subject, only the dictionary for the supplied direction needs to be referred to in the recognition process.
  • thus, the number of dictionaries referred to (the amount of data) can be reduced, and the multiple determination processes that would otherwise be performed with reference to the dictionaries can be omitted. Therefore, according to the detection apparatus 400 in the fourth embodiment, it is possible to reduce processing related to recognition processing. A sketch of this direction-conditioned lookup is shown below.
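A minimal sketch of this idea, with the dictionaries represented as placeholder callables keyed by direction (names are assumptions): because the direction is already known from the subject direction detection, only one dictionary is consulted, and no staged front/rear/left/right determinations are needed.

```python
from typing import Callable, Dict

def recognize_with_known_direction(face_crop,
                                   direction: str,
                                   dictionaries: Dict[str, Callable]):
    """Select the recognizer ("dictionary") matching the supplied direction,
    e.g. "forward", "backward", "left", or "right", and apply it to the crop."""
    recognizer = dictionaries[direction]
    return recognizer(face_crop)
```

In the staged method sketched on the left of FIG. 14, several determination dictionaries would be consulted before the matching one is reached; here the direction is an input, so the lookup is a single dictionary access.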
  • in addition, since only an image with a high possibility of containing the detection target, for example a face (a clipped image), is subjected to detailed recognition, the region to be processed in the image is narrowed, which also reduces processing related to the recognition processing.
  • in the fourth embodiment as well, since the object is detected using the distance and the size of the object at that distance, the detection accuracy can be improved.
  • furthermore, since detailed recognition processing is executed using a normal image (an image other than a distance image), the subject can be detected in more detail and recognized.
  • in addition, since the direction of the subject is acquired in advance, the recognition processing itself can be reduced.
  • the detection target is detected by estimating the size of the subject and estimating the category to which the subject belongs.
  • FIG. 15 is a diagram illustrating a configuration example of the detection apparatus 500 according to the fifth embodiment.
  • the detection apparatus 500 shown in FIG. 15 includes a distance information acquisition unit 111, a subject size estimation unit 511, and a subject category estimation unit 512.
  • the distance information acquisition unit 111 has a configuration similar to that of the distance information acquisition unit 111 included in the detection device 100, for example, and has a function of acquiring distance information for generating the distance image 131.
  • the subject size estimation unit 511 estimates the size of the subject and supplies the estimated size information to the subject category estimation unit 512.
  • the subject category estimation unit 512 estimates the category to which the subject belongs from the estimated size of the subject and the distance at which the subject is located.
  • that is, the detection apparatus 500 estimates the size of the subject and determines, from that size and the distance, the category to which the subject belongs, for example a human-face category or a car category.
  • in step S501, the subject size estimation unit 511 sets a target pixel. This process can be performed, for example, in the same manner as step S101 in the flowchart shown in FIG. 2.
  • next, the subject size estimation unit 511 acquires the distances around the set target pixel position.
  • the subject size estimation unit 511 then estimates the subject size based on the surrounding distance distribution. For example, since the distance of the object and that of the background differ greatly, a region where the distance changes greatly (that is, an edge) is detected with reference to the surrounding distance distribution, and the region where the object exists (the extent up to the edge portion) can thereby be estimated.
  • in step S504, the subject category estimation unit 512 estimates the subject category based on the distance and the subject size. As described above, since the category of an object can be estimated from its distance and its size, such estimation is performed in step S504.
  • a distance image 131 as shown in FIG. 17 is acquired.
  • a distance image 131 illustrated in FIG. 17 is an image in which a hand is captured.
  • the distance distribution around the target pixel 132 is referred to.
  • the distance is greatly different between the hand part and the background. That is, in this case, the part with the hand is a short distance, but the background is a long distance.
  • when the distance distribution around the pixel of interest 132 is examined in directions gradually moving away from the pixel of interest 132, there is a portion where the distance changes greatly.
  • in the example of FIG. 17, the target pixel 132 is set at a position approximately in the center of the palm, so when searching from the palm toward a fingertip, the distance information changes abruptly at the boundary from the tip of the finger to the background. In FIG. 17, the extent from the pixel of interest 132 to the position where the distance information changes abruptly is indicated by arrows. Note that the position where the distance information changes abruptly may be taken as the position where the difference between the distance of the target pixel 132 and the distance of the pixel being searched becomes equal to or greater than a predetermined threshold value.
  • a range where an object may exist is estimated from the target pixel 132.
  • a circle or a rectangle (not shown) having a radius from the target pixel 132 to the tip of the longest arrow is set, and the size of the circle or the rectangle is the subject size.
  • This subject size corresponds to the detection frame 133 in the first to fourth embodiments. In other words, the detection frame 133 is set by such processing.
  • the category of the detected subject is estimated from the distance of the target pixel 132 and the subject size (detection frame 133). In the example illustrated in FIG. 17, it is estimated that a subject of the detected size at the distance of the target pixel 132 belongs to the category "hand". A sketch of this size and category estimation is shown below.
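The following sketch illustrates such a search and category estimation under stated assumptions: the jump threshold, the pixel-to-centimeter conversion (a rough pinhole-style scaling chosen so a 25 cm face is about 20 px at 100 cm, matching the earlier example), and the category table are all illustrative placeholders, not values from the description.

```python
import numpy as np

# Illustrative category table: (maximum real-world size in cm, category name).
CATEGORIES = [(30.0, "hand"), (60.0, "face"), (200.0, "person"), (600.0, "car")]

def estimate_size_and_category(distance_img, cy, cx,
                               jump_cm=30.0, px_per_cm_at_1m=0.8):
    """Search outward from the target pixel until the distance jumps by more
    than jump_cm (the edge toward the background); the longest reach gives the
    subject radius in pixels, which is converted to centimeters using the
    distance and mapped to a coarse category."""
    h, w = distance_img.shape
    d0 = float(distance_img[cy, cx])
    radius_px = 0
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1),
                   (-1, -1), (-1, 1), (1, -1), (1, 1)]:
        y, x, steps = cy, cx, 0
        while 0 <= y + dy < h and 0 <= x + dx < w:
            y, x = y + dy, x + dx
            if abs(float(distance_img[y, x]) - d0) >= jump_cm:
                break                                  # reached the edge
            steps += 1
        radius_px = max(radius_px, steps)

    # Apparent size shrinks roughly in proportion to distance (pinhole model).
    size_cm = 2 * radius_px * d0 / (px_per_cm_at_1m * 100.0)
    for max_cm, name in CATEGORIES:
        if size_cm <= max_cm:
            return size_cm, name
    return size_cm, "unknown"
```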
  • in step S505, it is determined whether or not such processing has been executed for all the pixels in the distance image 131. As in step S105 of the flowchart shown in FIG. 2, the pixels set as the target pixel 132 need not be all the pixels in the distance image 131; some may be excluded.
  • in this way, according to the detection apparatus 500 in the fifth embodiment, the size of the subject can be estimated from the distance image, and its category can be estimated.
  • a plurality of subjects (categories) can be estimated, and for example, different objects such as a person and a car can be detected.
  • FIG. 18 is a diagram illustrating a configuration example of a detection device 600 according to the sixth embodiment.
  • the same portions are denoted by the same reference numerals, and description thereof is omitted.
  • the detection device 600 in the sixth embodiment is configured by adding an imaging unit 611 and a subject detail recognition unit 612 to the detection device 500 in the fifth embodiment.
  • the added imaging unit 611 performs basically the same processing as the imaging unit 211 (FIG. 6) of the detection apparatus 200 in the second embodiment.
  • the imaging unit 611 captures a normal image and supplies it to the subject detail recognition unit 612.
  • the subject detail recognition unit 612 is also supplied with the distance, the subject size, and the subject category from the subject category estimation unit 512.
  • the subject detail recognition unit 612 performs more detailed recognition using the normal image, using the distance, subject size, and subject category supplied from the subject category estimation unit 512.
  • Steps S601 to S605 are processes performed by the distance information acquisition unit 111, the subject size estimation unit 511, and the subject category estimation unit 512, and are performed in the same manner as steps S501 to S505 of the flowchart described above, so description thereof is omitted.
  • the subject detail recognition unit 412 performs detail recognition using the subject candidate region and the subject category. For example, the subject detail recognition unit 412 sets a frame corresponding to the subject size supplied from the subject category estimation unit 512 as a corresponding region of the normal image from the imaging unit 411, and cuts out an image within the set frame.
  • recognition processing such as recognition processing for specifying an object belonging to the category is set in advance. Execute the process. For example, when the category is determined to be a person, matching with an image belonging to a person is performed. When an individual is specified, or when the category is determined to be a car, matching with an image belonging to a car is performed. Detailed recognition to identify the vehicle type is performed.
  • the size of the subject can be estimated from the distance image, and the category to which the subject belongs can be estimated. Further, according to the detection apparatus 600, a plurality of subjects (categories) can be estimated, and for example, different objects such as a person and a car can be detected. Furthermore, the detected object can be recognized in detail.
  • FIG. 20 is a diagram illustrating a configuration example of the detection apparatus 700 according to the seventh embodiment.
  • the same portions are denoted by the same reference numerals, and description thereof is omitted.
  • the detection device 700 in the seventh embodiment differs from the detection device 500 in the fifth embodiment in that a subject shape estimation unit 711 is added and the subject category estimation unit 712 receives the output of the subject shape estimation unit 711.
  • the subject shape estimation unit 711 estimates the shape of the subject. Referring again to FIG. 17, when the distance image 131 in which the hand is captured is acquired, the shape of the hand is obtained by searching from the target pixel 132 up to the positions where the distance information changes greatly, that is, up to the edge portions.
  • the subject category estimation unit 712 performs basically the same processing as the subject category estimation unit 512 of the detection apparatus 500 illustrated in FIG. 15, but the subject category estimation unit 712 illustrated in FIG. 20 also uses the estimated shape of the subject when estimating the category. The category can therefore be estimated with higher accuracy.
  • Steps S701 to S703 are processes performed by the distance information acquisition unit 111 and the subject size estimation unit 511, and are performed in the same manner as steps S501 to S503 of the flowchart of the fifth embodiment, so description thereof is omitted.
  • the subject shape estimation unit 711 then estimates the shape of the subject based on the distance distribution around the target pixel 132. As described with reference to FIG. 17, the shape is estimated by using the distance information to search for the portions (edges) where the distance changes greatly. In other words, regions where the distance changes only gradually are assumed to be part of the detected object, and the shape of the object is obtained by checking whether the distance is changing gently.
  • in step S705, the subject category estimation unit 712 estimates the category to which the subject belongs based on the distance, the subject size, and the shape. In this case, since the category is estimated using not only the distance and the subject size but also the shape information, the category can be estimated with higher accuracy. A sketch of the shape estimation step is shown below.
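A minimal sketch of such a shape estimation by region growing, assuming distances in centimeters and an illustrative continuity threshold; the resulting boolean mask stands in for the estimated subject shape, and its boundary corresponds to the edges where the distance jumps.

```python
import numpy as np
from collections import deque

def estimate_shape_mask(distance_img: np.ndarray, cy: int, cx: int,
                        step_cm: float = 3.0) -> np.ndarray:
    """Region growing from the target pixel: a neighbor joins the subject
    region while its distance differs from the current pixel's by less than
    step_cm (distance changing gently); the region boundary is the edge
    where the distance changes abruptly, i.e. the subject shape."""
    h, w = distance_img.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[cy, cx] = True
    queue = deque([(cy, cx)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if abs(float(distance_img[ny, nx]) - float(distance_img[y, x])) < step_cm:
                    mask[ny, nx] = True
                    queue.append((ny, nx))
    return mask
```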
  • according to the detection apparatus 700 in the seventh embodiment, it is possible to estimate the size of the subject, estimate its category, and estimate its shape from the distance image. Further, according to the detection device 700, a plurality of subjects (categories) can be estimated, and, for example, different objects such as a person and a car can be detected.
  • category estimation by the subject category estimation unit 712 may be omitted, and the subject shape estimation result by the subject shape estimation unit 711 may be output to a subsequent processing unit (not shown).
  • FIG. 22 is a diagram illustrating a configuration example of the detection apparatus 800 according to the eighth embodiment.
  • the same portions are denoted by the same reference numerals, and description thereof is omitted.
  • the detection apparatus 800 according to the eighth embodiment is configured by adding an imaging unit 811 and a subject detail recognition unit 812 to the detection apparatus 700 according to the seventh embodiment.
  • the added imaging unit 811 performs basically the same processing as the imaging unit 211 (FIG. 6) of the detection apparatus 200 in the second embodiment.
  • the imaging unit 811 captures a normal image and supplies it to the subject detail recognition unit 812.
  • the subject detail recognition unit 812 is also supplied with the distance, subject size, subject category, and subject shape from the subject category estimation unit 712.
  • the subject detail recognition unit 812 performs more detailed recognition using the normal image using the distance, subject size, subject category, and subject shape supplied from the subject category estimation unit 712.
  • Steps S801 to S806 are processes performed by the distance information acquisition unit 111, the subject size estimation unit 511, the subject shape estimation unit 711, and the subject category estimation unit 712, and are performed in the same manner as steps S701 to S706 of the flowchart of the seventh embodiment, so description thereof is omitted.
  • the subject detail recognition unit 812 performs detail recognition using the subject candidate area, the subject category, and the subject shape. For example, the subject detail recognition unit 812 sets a frame corresponding to the subject size supplied from the subject category estimation unit 712 as the corresponding region of the normal image from the imaging unit 811, and cuts out the image within the set frame.
  • then, using the cut-out image, a preset recognition process is executed, for example a recognition process for identifying, from among the objects belonging to the category, an object that matches the subject shape. For example, when the category is determined to be a person, matching against images of people is performed; in this matching, the subject shape is referred to, and when the shape is close to, for example, a human face, the recognition is narrowed down to human faces and an individual is then identified after the narrowing down.
  • according to the detection apparatus 800 in the eighth embodiment, it is possible to estimate the size of the subject, estimate its category, and estimate its shape from the distance image. Further, the detection apparatus 800 can estimate a plurality of subjects (categories) and can detect different objects such as a person and a car, for example.
  • the detected object can be recognized in detail. Since the detailed recognition can be performed using information such as the estimated size, category, and shape of the subject, the processing related to the detailed recognition can be reduced.
  • an object can be detected from a distance image.
  • the present technology can be applied to a surveillance camera or the like.
  • the present technology can be applied to a game machine, and can be applied to a device that detects a person who plays a game and detects a gesture of the person (detects a hand, a direction of the hand, and the like).
  • the present technology can also be applied as part of a device in which the detection device is mounted on a car, detects people, bicycles, and cars other than the own car, notifies the user of information on the detected objects, and performs control for avoiding collisions and ensuring safety.
  • a stacked image sensor in which a plurality of substrates (dies) are stacked can be employed.
  • here, the case where the detection device is configured by a stacked image sensor will be described, taking the detection device 200 (FIG. 6) of the second embodiment as an example.
  • FIG. 24 is a diagram illustrating a first configuration example of a stacked image sensor in which the entire detection device 200 of FIG. 6 is incorporated.
  • the stacked image sensor of FIG. 24 has a two-layer structure in which a pixel substrate 901 and a signal processing substrate 902 are stacked.
  • on the pixel substrate 901, (part of) the distance information acquisition unit 111 and (part of) the imaging unit 211 are formed.
  • when the distance information acquisition unit 111 obtains distance information by the TOF method, it includes an irradiation unit that irradiates the subject with predetermined light and an imaging element that receives the reflected light.
  • of the distance information acquisition unit 111, the imaging element part, or parts such as the irradiation unit, can be formed on the pixel substrate 901.
  • the imaging unit 211 also includes an image sensor for capturing a normal image, and the image sensor part of the imaging unit 211 can be formed on the pixel substrate 901.
  • on the signal processing substrate 902, the subject feature extraction unit 112, the subject candidate region detection unit 113, the actual size database 114, and the subject detail recognition unit 212 are formed.
  • in the stacked image sensor configured in this way, the distance information acquisition unit 111 on the pixel substrate 901 performs imaging by receiving incident light, and an object to be detected is detected from the image (distance image) obtained by that imaging.
  • likewise, the imaging unit 211 on the pixel substrate 901 performs imaging by receiving incident light, and the image of the subject set as the detection target, or the like, is cut out from the image (normal image) obtained by that imaging and output.
  • FIG. 25 is a diagram illustrating a second configuration example of the stacked image sensor in which the entire detection device 200 of FIG. 6 is incorporated.
  • the stacked image sensor in FIG. 25 has a three-layer structure in which a pixel substrate 901, a signal processing substrate 902, and a memory substrate 903 are stacked.
  • a distance information acquisition unit 111 and an imaging unit 211 are formed on the pixel substrate 901, and a subject feature extraction unit 112, a subject candidate region detection unit 113, and a subject detail recognition unit 212 are formed on the signal processing substrate 902. .
  • a real size database 114 and an image storage unit 911 are formed on the memory substrate 903.
  • on the memory substrate 903, the image storage unit 911 is formed as a storage region for storing the detection result of the subject candidate region detection unit 113, for example an image cut out from the distance image in which the subject to be detected is captured.
  • An actual size database 114 storing the table 151 (FIG. 4) is also formed on the memory substrate 903.
  • in FIG. 25, the pixel substrate 901, the signal processing substrate 902, and the memory substrate 903 are stacked in that order from the top. However, the order of the signal processing substrate 902 and the memory substrate 903 may be changed so that, for example, the pixel substrate 901, the memory substrate 903, and the signal processing substrate 902 are stacked in this order.
  • the stacked image sensor can also be configured by stacking four or more layers of substrates, in addition to two or three layers.
  • a series of processes performed by each of the detection devices 100 to 800 can be performed by hardware or software.
  • a program constituting the software is installed in a general-purpose computer or the like.
  • FIG. 26 is a block diagram illustrating a configuration example of an embodiment of a computer in which a program for executing the above-described series of processes is installed.
  • the program can be recorded in advance in a hard disk 1005 or ROM 1003 as a recording medium built in the computer.
  • the program can be stored (recorded) in a removable recording medium 1011.
  • a removable recording medium 1011 can be provided as so-called package software.
  • examples of the removable recording medium 1011 include a flexible disk, a CD-ROM (Compact Disc Read Only Memory), a MO (Magneto Optical) disc, a DVD (Digital Versatile Disc), a magnetic disc, and a semiconductor memory.
  • the program can be installed on the computer from the removable recording medium 1011 as described above, or can be downloaded to the computer via a communication network or a broadcast network and installed on the built-in hard disk 1005. That is, the program can be transferred wirelessly from a download site to the computer via an artificial satellite for digital satellite broadcasting, or transferred to the computer by wire via a network such as a LAN (Local Area Network) or the Internet.
  • the computer includes a CPU (Central Processing Unit) 1002, and an input / output interface 1010 is connected to the CPU 1002 via a bus 1001.
  • the CPU 1002 executes a program stored in the ROM (Read Only Memory) 1003 in accordance with a command input via the input/output interface 1010.
  • alternatively, the CPU 1002 loads a program stored in the hard disk 1005 into the RAM (Random Access Memory) 1004 and executes it.
  • as a result, the CPU 1002 performs the processes according to the flowcharts described above or the processes performed by the configurations of the block diagrams described above. Then, as necessary, the CPU 1002 outputs the processing result from the output unit 1006 via the input/output interface 1010, transmits it from the communication unit 1008, or records it on the hard disk 1005, for example.
  • the input unit 1007 includes a keyboard, a mouse, a microphone, and the like.
  • the output unit 1006 includes an LCD (Liquid Crystal Display), a speaker, and the like.
  • the processing performed by the computer according to the program does not necessarily have to be performed in chronological order in the order described as the flowchart. That is, the processing performed by the computer according to the program includes processing executed in parallel or individually (for example, parallel processing or object processing).
  • the program may be processed by one computer (processor), or may be distributedly processed by a plurality of computers. Furthermore, the program may be transferred to a remote computer and executed.
  • in this specification, the system means a set of a plurality of components (devices, modules (parts), and the like), and it does not matter whether all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.
  • the configuration examples of the detection devices 100 to 800 described above can be combined within a possible range.
  • the present technology can take a configuration of cloud computing in which one function is shared by a plurality of devices via a network and is jointly processed.
  • each step described in the above flowchart can be executed by one device or can be shared by a plurality of devices.
  • the plurality of processes included in the one step can be executed by being shared by a plurality of apparatuses in addition to being executed by one apparatus.
  • the technology according to the present disclosure can be applied to various products.
  • the technology according to the present disclosure may be realized as a device mounted on any type of moving body such as an automobile, an electric vehicle, a hybrid electric vehicle, a motorcycle, a bicycle, a personal mobility device, an airplane, a drone, a ship, a robot, a construction machine, or an agricultural machine (tractor).
  • FIG. 27 is a block diagram illustrating a schematic configuration example of a vehicle control system 7000 that is an example of a mobile control system to which the technology according to the present disclosure can be applied.
  • the vehicle control system 7000 includes a plurality of electronic control units connected via a communication network 7010.
  • the vehicle control system 7000 includes a drive system control unit 7100, a body system control unit 7200, a battery control unit 7300, a vehicle exterior information detection unit 7400, a vehicle interior information detection unit 7500, and an integrated control unit 7600.
  • the communication network 7010 connecting the plurality of control units may be an in-vehicle communication network conforming to an arbitrary standard such as CAN (Controller Area Network), LIN (Local Interconnect Network), LAN (Local Area Network), or FlexRay (registered trademark).
  • each control unit includes a microcomputer that performs arithmetic processing according to various programs, a storage unit that stores the programs executed by the microcomputer and the parameters used for various calculations, and a drive circuit that drives the various devices to be controlled.
  • each control unit also includes a network I/F for communicating with other control units via the communication network 7010, and a communication I/F for communicating with devices or sensors inside and outside the vehicle by wired or wireless communication. In FIG. 27, as the functional configuration of the integrated control unit 7600, a microcomputer 7610, a general-purpose communication I/F 7620, a dedicated communication I/F 7630, a positioning unit 7640, a beacon receiving unit 7650, an in-vehicle device I/F 7660, an audio image output unit 7670, an in-vehicle network I/F 7680, and a storage unit 7690 are illustrated.
  • other control units include a microcomputer, a communication I / F, a storage unit, and the like.
  • the drive system control unit 7100 controls the operation of the device related to the drive system of the vehicle according to various programs.
  • the drive system control unit 7100 functions as a control device for a driving force generation device for generating the driving force of the vehicle, such as an internal combustion engine or a driving motor, a driving force transmission mechanism for transmitting the driving force to the wheels, a steering mechanism for adjusting the steering angle of the vehicle, a braking device for generating the braking force of the vehicle, and the like.
  • the drive system control unit 7100 may have a function as a control device such as ABS (Antilock Brake System) or ESC (Electronic Stability Control).
  • a vehicle state detection unit 7110 is connected to the drive system control unit 7100.
  • the vehicle state detection unit 7110 includes, for example, at least one of a gyro sensor that detects the angular velocity of the rotational movement of the vehicle body, an acceleration sensor that detects the acceleration of the vehicle, and sensors for detecting the operation amount of the accelerator pedal, the operation amount of the brake pedal, the steering angle of the steering wheel, the engine speed, the rotational speed of the wheels, and the like.
  • the drive system control unit 7100 performs arithmetic processing using a signal input from the vehicle state detection unit 7110, and controls an internal combustion engine, a drive motor, an electric power steering device, a brake device, or the like.
  • the body system control unit 7200 controls the operation of various devices mounted on the vehicle body according to various programs.
  • the body system control unit 7200 functions as a keyless entry system, a smart key system, a power window device, or a control device for various lamps such as a headlamp, a back lamp, a brake lamp, a blinker, or a fog lamp.
  • radio waves transmitted from a portable device that substitutes for a key, or signals of various switches, can be input to the body system control unit 7200.
  • the body system control unit 7200 receives input of these radio waves or signals, and controls a door lock device, a power window device, a lamp, and the like of the vehicle.
  • the battery control unit 7300 controls the secondary battery 7310 that is a power supply source of the drive motor according to various programs. For example, information such as battery temperature, battery output voltage, or remaining battery capacity is input to the battery control unit 7300 from a battery device including the secondary battery 7310. The battery control unit 7300 performs arithmetic processing using these signals, and controls the temperature adjustment of the secondary battery 7310 or the cooling device provided in the battery device.
  • the outside information detection unit 7400 detects information outside the vehicle on which the vehicle control system 7000 is mounted.
  • the outside information detection unit 7400 is connected to at least one of the imaging unit 7410 and the outside information detection unit 7420.
  • the imaging unit 7410 includes at least one of a ToF (Time Of Flight) camera, a stereo camera, a monocular camera, an infrared camera, and other cameras.
  • the vehicle exterior information detection unit 7420 includes, for example, at least one of an environmental sensor for detecting the current weather or meteorological conditions and a surrounding information detection sensor for detecting other vehicles, obstacles, pedestrians, and the like around the vehicle equipped with the vehicle control system 7000.
  • the environmental sensor may be, for example, at least one of a raindrop sensor that detects rainy weather, a fog sensor that detects fog, a sunshine sensor that detects sunlight intensity, and a snow sensor that detects snowfall.
  • the ambient information detection sensor may be at least one of an ultrasonic sensor, a radar device, and a LIDAR (Light Detection and Ranging, Laser Imaging Detection and Ranging) device.
  • the imaging unit 7410 and the outside information detection unit 7420 may be provided as independent sensors or devices, or may be provided as a device in which a plurality of sensors or devices are integrated.
  • FIG. 28 shows an example of installation positions of the imaging unit 7410 and the vehicle outside information detection unit 7420.
  • the imaging units 7910, 7912, 7914, 7916, and 7918 are provided at, for example, at least one of the front nose, the side mirror, the rear bumper, the back door, and the upper part of the windshield in the vehicle interior of the vehicle 7900.
  • An imaging unit 7910 provided in the front nose and an imaging unit 7918 provided in the upper part of the windshield in the vehicle interior mainly acquire an image in front of the vehicle 7900.
  • Imaging units 7912 and 7914 provided in the side mirror mainly acquire an image of the side of the vehicle 7900.
  • An imaging unit 7916 provided in the rear bumper or the back door mainly acquires an image behind the vehicle 7900.
  • the imaging unit 7918 provided on the upper part of the windshield in the passenger compartment is mainly used for detecting a preceding vehicle, a pedestrian, an obstacle, a traffic light, a traffic sign, a lane, or the like.
  • FIG. 28 shows an example of the shooting range of each of the imaging units 7910, 7912, 7914, and 7916.
  • In FIG. 28, the imaging range a indicates the imaging range of the imaging unit 7910 provided on the front nose, the imaging ranges b and c indicate the imaging ranges of the imaging units 7912 and 7914 provided on the side mirrors, respectively, and the imaging range d indicates the imaging range of the imaging unit 7916 provided on the rear bumper or the back door. For example, by superimposing the image data captured by the imaging units 7910, 7912, 7914, and 7916, an overhead image of the vehicle 7900 viewed from above is obtained.
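  • Purely as an illustrative sketch (not part of this disclosure), the following Python code shows one possible way such an overhead image could be composed by warping each camera image onto a common ground plane; the homography matrices, the canvas size, and the simple overwrite compositing are assumptions made only for illustration.

```python
import cv2
import numpy as np

def compose_overhead_view(images, homographies, canvas_size=(800, 800)):
    """Warp each camera image onto a common ground plane and superimpose them.

    images: frames from the imaging units (e.g. front, left, right, rear cameras).
    homographies: 3x3 matrices mapping each image plane to the top-down canvas;
    obtaining them (e.g. from extrinsic calibration) is assumed and not shown.
    """
    canvas = np.zeros((canvas_size[1], canvas_size[0], 3), dtype=np.uint8)
    for img, H in zip(images, homographies):
        warped = cv2.warpPerspective(img, H, canvas_size)
        mask = warped.any(axis=2)       # pixels actually covered by this camera
        canvas[mask] = warped[mask]     # simple overwrite; blending is also possible
    return canvas
```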
  • the vehicle outside information detection units 7920, 7922, 7924, 7926, 7928, and 7930 provided on the front, rear, sides, corners of the vehicle 7900 and the upper part of the windshield in the vehicle interior may be, for example, an ultrasonic sensor or a radar device.
  • the vehicle outside information detection units 7920, 7926, and 7930 provided on the front nose, the rear bumper, the back door, and the windshield in the vehicle interior of the vehicle 7900 may be, for example, LIDAR devices.
  • These outside information detection units 7920 to 7930 are mainly used for detecting a preceding vehicle, a pedestrian, an obstacle, and the like.
  • the vehicle exterior information detection unit 7400 causes the imaging unit 7410 to capture an image outside the vehicle and receives the captured image data. The vehicle exterior information detection unit 7400 also receives detection information from the connected vehicle exterior information detection unit 7420. When the vehicle exterior information detection unit 7420 is an ultrasonic sensor, a radar device, or a LIDAR device, the vehicle exterior information detection unit 7400 transmits ultrasonic waves, electromagnetic waves, or the like, and receives information on the received reflected waves.
  • the vehicle exterior information detection unit 7400 may perform object detection processing or distance detection processing for persons, cars, obstacles, signs, characters on the road surface, and the like based on the received information.
  • the vehicle exterior information detection unit 7400 may perform environment recognition processing for recognizing rainfall, fog, road surface conditions, or the like based on the received information.
  • the vehicle outside information detection unit 7400 may calculate a distance to an object outside the vehicle based on the received information.
  • the outside information detection unit 7400 may perform image recognition processing or distance detection processing for recognizing a person, a car, an obstacle, a sign, a character on a road surface, or the like based on the received image data.
  • the vehicle exterior information detection unit 7400 may perform processing such as distortion correction or alignment on the received image data, and may combine image data captured by different imaging units 7410 to generate an overhead image or a panoramic image.
  • the vehicle exterior information detection unit 7400 may perform viewpoint conversion processing using image data captured by different imaging units 7410.
  • the vehicle interior information detection unit 7500 detects vehicle interior information.
  • a driver state detection unit 7510 that detects the driver's state is connected to the in-vehicle information detection unit 7500.
  • Driver state detection unit 7510 may include a camera that captures an image of the driver, a biosensor that detects biometric information of the driver, a microphone that collects sound in the passenger compartment, and the like.
  • the biometric sensor is provided, for example, on a seat surface or a steering wheel, and detects biometric information of an occupant sitting on the seat or a driver holding the steering wheel.
  • the vehicle interior information detection unit 7500 may calculate the degree of fatigue or the degree of concentration of the driver based on the detection information input from the driver state detection unit 7510, and may determine whether the driver is dozing off.
  • the vehicle interior information detection unit 7500 may perform a process such as a noise canceling process on the collected audio signal.
  • the integrated control unit 7600 controls the overall operation in the vehicle control system 7000 according to various programs.
  • An input unit 7800 is connected to the integrated control unit 7600.
  • the input unit 7800 is realized by a device that can be input by a passenger, such as a touch panel, a button, a microphone, a switch, or a lever.
  • data obtained by recognizing voice input through the microphone may be input to the integrated control unit 7600.
  • the input unit 7800 may be, for example, a remote control device using infrared rays or other radio waves, or an externally connected device such as a mobile phone or a PDA (Personal Digital Assistant) that supports the operation of the vehicle control system 7000.
  • the input unit 7800 may be, for example, a camera.
  • In that case, the passenger can input information by gesture.
  • Alternatively, data obtained by detecting the movement of a wearable device worn by the passenger may be input.
  • the input unit 7800 may include, for example, an input control circuit that generates an input signal based on information input by a passenger or the like using the input unit 7800 and outputs the input signal to the integrated control unit 7600.
  • a passenger or the like operates the input unit 7800 to input various data or instruct a processing operation to the vehicle control system 7000.
  • the storage unit 7690 may include a ROM (Read Only Memory) that stores various programs executed by the microcomputer, and a RAM (Random Access Memory) that stores various parameters, calculation results, sensor values, and the like.
  • the storage unit 7690 may be realized by a magnetic storage device such as an HDD (Hard Disc Drive), a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.
  • General-purpose communication I / F 7620 is a general-purpose communication I / F that mediates communication with various devices existing in the external environment 7750.
  • the general-purpose communication I / F 7620 may implement a cellular communication protocol such as GSM (Global System for Mobile communications), WiMAX, LTE (Long Term Evolution), or LTE-A (LTE-Advanced), or another wireless communication protocol such as wireless LAN (also referred to as Wi-Fi (registered trademark)) or Bluetooth (registered trademark).
  • the general-purpose communication I / F 7620 is connected to a device (for example, an application server or a control server) existing on an external network (for example, the Internet, a cloud network, or an operator-specific network) via, for example, a base station or an access point.
  • the general-purpose communication I / F 7620 may connect to a terminal existing in the vicinity of the vehicle (for example, a terminal of a driver, a pedestrian, or a store, or an MTC (Machine Type Communication) terminal) using, for example, P2P (Peer To Peer) technology.
  • a terminal for example, a driver, a pedestrian or a store terminal, or an MTC (Machine Type Communication) terminal
  • P2P Peer To Peer
  • the dedicated communication I / F 7630 is a communication I / F that supports a communication protocol formulated for use in vehicles.
  • the dedicated communication I / F 7630 may implement a standard protocol such as WAVE (Wireless Access in Vehicle Environment), which is a combination of IEEE 802.11p for the lower layer and IEEE 1609 for the upper layer, DSRC (Dedicated Short Range Communications), or a cellular communication protocol.
  • the dedicated communication I / F 7630 typically performs V2X communication, which is a concept including one or more of vehicle-to-vehicle communication, vehicle-to-infrastructure communication, vehicle-to-home communication, and vehicle-to-pedestrian communication.
  • the positioning unit 7640 receives, for example, a GNSS signal from a GNSS (Global Navigation Satellite System) satellite (for example, a GPS signal from a GPS (Global Positioning System) satellite), performs positioning, and generates position information including the latitude, longitude, and altitude of the vehicle.
  • the positioning unit 7640 may specify the current position by exchanging signals with the wireless access point, or may acquire position information from a terminal such as a mobile phone, PHS, or smartphone having a positioning function.
  • the beacon receiving unit 7650 receives, for example, radio waves or electromagnetic waves transmitted from a radio station installed on the road, and acquires information such as the current position, traffic jam, closed road, or required time. Note that the function of the beacon receiving unit 7650 may be included in the dedicated communication I / F 7630 described above.
  • the in-vehicle device I / F 7660 is a communication interface that mediates the connection between the microcomputer 7610 and various in-vehicle devices 7760 present in the vehicle.
  • the in-vehicle device I / F 7660 may establish a wireless connection using a wireless communication protocol such as a wireless LAN, Bluetooth (registered trademark), NFC (Near Field Communication), or WUSB (Wireless USB).
  • the in-vehicle device I / F 7660 may establish a wired connection such as USB (Universal Serial Bus), HDMI (High-Definition Multimedia Interface), or MHL (Mobile High-definition Link) via a connection terminal (and a cable if necessary).
  • the in-vehicle device 7760 may include, for example, at least one of a mobile device or a wearable device that a passenger has, or an information device that is carried into or attached to the vehicle.
  • In-vehicle device 7760 may include a navigation device that searches for a route to an arbitrary destination.
  • In-vehicle device I / F 7660 exchanges control signals or data signals with these in-vehicle devices 7760.
  • the in-vehicle network I / F 7680 is an interface that mediates communication between the microcomputer 7610 and the communication network 7010.
  • the in-vehicle network I / F 7680 transmits and receives signals and the like in accordance with a predetermined protocol supported by the communication network 7010.
  • the microcomputer 7610 of the integrated control unit 7600 controls the vehicle control system 7000 according to various programs based on information acquired via at least one of the general-purpose communication I / F 7620, the dedicated communication I / F 7630, the positioning unit 7640, the beacon receiving unit 7650, the in-vehicle device I / F 7660, and the in-vehicle network I / F 7680. For example, the microcomputer 7610 may calculate a control target value of the driving force generating device, the steering mechanism, or the braking device based on the acquired information on the inside and outside of the vehicle, and may output a control command to the drive system control unit 7100.
  • the microcomputer 7610 may perform cooperative control for the purpose of realizing ADAS (Advanced Driver Assistance System) functions including collision avoidance or impact mitigation of the vehicle, following traveling based on the inter-vehicle distance, vehicle-speed-maintaining traveling, collision warning of the vehicle, lane departure warning of the vehicle, and the like. The microcomputer 7610 may also perform cooperative control for the purpose of automatic driving, in which the vehicle travels autonomously without depending on the driver's operation, by controlling the driving force generating device, the steering mechanism, the braking device, or the like based on the acquired information on the surroundings of the vehicle.
  • the microcomputer 7610 may generate three-dimensional distance information between the vehicle and objects such as surrounding structures and persons based on information acquired via at least one of the general-purpose communication I / F 7620, the dedicated communication I / F 7630, the positioning unit 7640, the beacon receiving unit 7650, the in-vehicle device I / F 7660, and the in-vehicle network I / F 7680, and may create local map information including peripheral information on the current position of the vehicle.
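  • Purely as an illustrative sketch (not part of this disclosure), the following shows one way per-pixel distance information could be back-projected into three-dimensional points and accumulated into a simple local grid map; the pinhole intrinsics, the cell size, and the range thresholds are assumed values for illustration.

```python
import numpy as np

def depth_to_local_map(depth, fx, fy, cx, cy, cell=0.1, extent=20.0):
    """Back-project a depth image into 3-D points and mark them in a 2-D grid map.

    depth: HxW array of distances in metres (0 = no measurement).
    fx, fy, cx, cy: pinhole intrinsics of the range sensor (assumed known).
    cell, extent: grid resolution and half-width in metres (assumed values).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx                                   # lateral offset from the sensor
    n = int(2 * extent / cell)
    grid = np.zeros((n, n), dtype=np.uint8)
    valid = (z > 0) & (np.abs(x) < extent) & (z < 2 * extent)
    gx = np.clip(((x[valid] + extent) / cell).astype(int), 0, n - 1)
    gz = np.clip((z[valid] / cell).astype(int), 0, n - 1)
    grid[gz, gx] = 1                                        # mark occupied cells
    return grid
```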
  • the microcomputer 7610 may generate a warning signal by predicting a danger such as a collision of a vehicle, approach of a pedestrian or the like or an approach to a closed road based on the acquired information.
  • the warning signal may be, for example, a signal for generating a warning sound or lighting a warning lamp.
  • the audio image output unit 7670 transmits an output signal of at least one of audio and image to an output device capable of visually or audibly notifying information to a vehicle occupant or the outside of the vehicle.
  • an audio speaker 7710, a display unit 7720, and an instrument panel 7730 are illustrated as output devices.
  • Display unit 7720 may include at least one of an on-board display and a head-up display, for example.
  • the display portion 7720 may have an AR (Augmented Reality) display function.
  • the output device may be other devices such as headphones, wearable devices such as glasses-type displays worn by passengers, projectors, and lamps.
  • When the output device is a display device, the display device visually displays the results obtained by the various processes performed by the microcomputer 7610 or the information received from other control units in various formats such as text, images, tables, and graphs. When the output device is an audio output device, it converts an audio signal composed of reproduced audio data, acoustic data, or the like into an analog signal and outputs it audibly.
  • At least two control units connected via the communication network 7010 may be integrated as one control unit.
  • each control unit may be configured by a plurality of control units.
  • the vehicle control system 7000 may include another control unit not shown.
  • some or all of the functions of any of the control units may be given to other control units. That is, as long as information is transmitted and received via the communication network 7010, the predetermined arithmetic processing may be performed by any one of the control units.
  • a sensor or device connected to one of the control units may be connected to another control unit, and a plurality of control units may transmit and receive detection information to and from each other via the communication network 7010.
  • a computer program for realizing each function of the detection devices 100 to 800 according to the present embodiment described with reference to FIGS. 1, 6, 9, 12, 15, 18, 20, and 22 can be implemented in any of the control units or the like. It is also possible to provide a computer-readable recording medium in which such a computer program is stored.
  • the recording medium is, for example, a magnetic disk, an optical disk, a magneto-optical disk, a flash memory, or the like. Further, the above computer program may be distributed via a network, for example, without using a recording medium.
  • the detection apparatuses 100 to 800 according to the present embodiment described with reference to FIGS. 1, 6, 9, 12, 15, 18, 20, and 22 can be applied to the integrated control unit 7600 of the application example illustrated in FIG. 27.
  • the components of the detection devices 100 to 800 according to the present embodiment described with reference to FIGS. 1, 6, 9, 12, 15, 18, 20, and 22 may be realized in a module for the integrated control unit 7600 illustrated in FIG. 27 (for example, an integrated circuit module composed of one die).
  • alternatively, the detection devices 100 to 800 according to the present embodiment described with reference to FIGS. 1, 6, 9, 12, 15, 18, 20, and 22 may be realized by a plurality of control units of the vehicle control system 7000 illustrated in FIG. 27.
  • In addition, the present technology can also take the following configurations.
  • (1) A detection device including: an acquisition unit that acquires distance information regarding a distance to a subject; a setting unit that sets, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and a determination unit that determines whether an image in the region is the object.
  • (2) The detection device according to (1), in which the feature amount of the object is the size of the object at a predetermined distance, the setting unit sets a frame corresponding to the size of the object according to the distance at the pixel set as a processing target, and the determination unit determines whether an image in the frame is the object.
  • (4) The detection device according to (3), further including: an imaging unit that captures an image of ambient light; and a recognition unit that performs detailed recognition on the object using at least one of the image captured by the imaging unit, the size of the object set by the setting unit, an image in a region determined to be the object by the determination unit, and the direction of the object detected by the direction detection unit.
  • (5) A detection device including: an acquisition unit that acquires distance information regarding a distance to a subject; a setting unit that sets, using the distance information, a region in which a predetermined object may be imaged; and an estimation unit that estimates a category to which the object belongs from the size of the region and the distance information.
  • (6) The detection device according to (5), in which the setting unit sets, as a region in which the object may be imaged, a region extending up to a portion where the distance information changes, and the estimation unit takes, as the category to which the object belongs, a category to which an object having the size of the region belongs at the distance represented by the distance information in the region.
  • (7) The detection device according to (5) or (6), further including a shape estimation unit that estimates the shape of the object in the region set by the setting unit using the distance information.
  • (8) The detection device according to (7), in which the estimation unit estimates the category using at least one of the distance information, the size of the region, and the shape.
  • (9) The detection device according to (7), further including: an imaging unit that captures an image of ambient light; and a recognition unit that performs detailed recognition on the object using at least one of the image captured by the imaging unit, the size of the region set by the setting unit, the category estimated by the estimation unit, and the shape estimated by the shape estimation unit.
  • (10) The detection device according to any one of (1) to (9), in which the acquisition unit acquires the distance information using a TOF type sensor, a stereo camera, an ultrasonic sensor, or a millimeter wave radar.
  • (11) A detection method including steps of: acquiring distance information regarding a distance to a subject; setting, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and determining whether an image in the region is the object.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The present technology pertains to a detection device, a detection method, and a program, with which a prescribed object can be detected. The detection device is equipped with: an acquisition unit that acquires distance information pertaining to the distance to a photographic subject; a setting unit that, on the basis of the distance information and a feature amount of an object to be detected, sets a region in which there is a possibility that the object is being photographed; and a determination unit that determines whether an image in the region is the object. Alternatively, the detection device is equipped with: an acquisition unit that acquires distance information pertaining to the distance to a photographic subject; a setting unit that uses the distance information to set a region in which there is a possibility that a prescribed object is being photographed; and an estimation unit that, on the basis of the size of the region and the distance information, estimates a category to which the object belongs. The present technology can be applied to detection devices that detect prescribed objects.

Description

Detection device, detection method, and program
 The present technology relates to a detection device, a detection method, and a program, and more particularly, to a detection device, a detection method, and a program that detect a predetermined subject from an image, for example.
 It has been proposed to recognize a product by applying a pattern matching technique or an edge detection technique to a photographed image to detect the boundary between the product and the background and cutting out the product region (see, for example, Patent Document 1).
 It has also been proposed, in order to recognize subjects of all sizes, to recognize a search target by changing the resolution of the search image and executing a plurality of scans (see, for example, Patent Document 2).
 Patent Document 1: JP 2016-31599 A
 Patent Document 2: JP 2011-14148 A
 It is desired to reduce the processing required when recognizing (detecting) a predetermined object.
 The present technology has been made in view of such a situation, and makes it possible to reduce the processing required when a predetermined object is recognized (detected).
 A first detection device according to one aspect of the present technology includes: an acquisition unit that acquires distance information regarding a distance to a subject; a setting unit that sets, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and a determination unit that determines whether an image in the region is the object.
 A second detection device according to one aspect of the present technology includes: an acquisition unit that acquires distance information regarding a distance to a subject; a setting unit that sets, using the distance information, a region in which a predetermined object may be imaged; and an estimation unit that estimates a category to which the object belongs from the size of the region and the distance information.
 A first detection method according to one aspect of the present technology includes steps of: acquiring distance information regarding a distance to a subject; setting, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and determining whether an image in the region is the object.
 A second detection method according to one aspect of the present technology includes steps of: acquiring distance information regarding a distance to a subject; setting, using the distance information, a region in which a predetermined object may be imaged; and estimating a category to which the object belongs from the size of the region and the distance information.
 A first program according to one aspect of the present technology causes a computer to execute processing including steps of: acquiring distance information regarding a distance to a subject; setting, from the distance information and a feature amount of an object to be detected, a region in which the object may be imaged; and determining whether an image in the region is the object.
 A second program according to one aspect of the present technology causes a computer to execute processing including steps of: acquiring distance information regarding a distance to a subject; setting, using the distance information, a region in which a predetermined object may be imaged; and estimating a category to which the object belongs from the size of the region and the distance information.
 In the first detection device, detection method, and program according to one aspect of the present technology, distance information regarding a distance to a subject is acquired, a region in which an object may be imaged is set from the distance information and a feature amount of the object to be detected, and it is determined whether an image in the region is the object.
 In the second detection device, detection method, and program according to one aspect of the present technology, distance information regarding a distance to a subject is acquired, a region in which a predetermined object may be imaged is set using the distance information, and a category to which the object belongs is estimated from the size of the region and the distance information.
 Note that the detection device may be an independent device or may be an internal block constituting one device.
 The program can be provided by being transmitted via a transmission medium or by being recorded on a recording medium.
 According to one aspect of the present technology, the processing required when a predetermined object is recognized (detected) can be reduced.
 Note that the effects described here are not necessarily limited, and any of the effects described in the present disclosure may be obtained.
 FIG. 1 is a diagram showing the configuration of an embodiment of a detection device to which the present technology is applied.
 FIG. 2 is a flowchart for explaining first recognition processing.
 FIG. 3 is a diagram for explaining processing related to detection of a predetermined object.
 FIG. 4 is a diagram showing an example of a table.
 FIG. 5 is a diagram for explaining processing related to detection of a predetermined object.
 FIG. 6 is a diagram showing the configuration of a detection device according to a second embodiment.
 FIG. 7 is a flowchart for explaining second recognition processing.
 FIG. 8 is a diagram for explaining processing related to detection of a predetermined object.
 FIG. 9 is a diagram showing the configuration of a detection device according to a third embodiment.
 FIG. 10 is a flowchart for explaining third recognition processing.
 FIG. 11 is a diagram for explaining detection of the direction of a subject.
 FIG. 12 is a diagram showing the configuration of a detection device according to a fourth embodiment.
 FIG. 13 is a flowchart for explaining fourth recognition processing.
 FIG. 14 is a diagram for explaining detection of the direction of a subject.
 FIG. 15 is a diagram showing the configuration of a detection device according to a fifth embodiment.
 FIG. 16 is a flowchart for explaining fifth recognition processing.
 FIG. 17 is a diagram for explaining detection of an object.
 FIG. 18 is a diagram showing the configuration of a detection device according to a sixth embodiment.
 FIG. 19 is a flowchart for explaining sixth recognition processing.
 FIG. 20 is a diagram showing the configuration of a detection device according to a seventh embodiment.
 FIG. 21 is a flowchart for explaining seventh recognition processing.
 FIG. 22 is a diagram showing the configuration of a detection device according to an eighth embodiment.
 FIG. 23 is a flowchart for explaining eighth recognition processing.
 FIG. 24 is a diagram for explaining a laminated structure.
 FIG. 25 is a diagram for explaining a laminated structure.
 FIG. 26 is a diagram for explaining a recording medium.
 FIG. 27 is a block diagram showing an example of a schematic configuration of a vehicle control system.
 FIG. 28 is an explanatory diagram showing an example of installation positions of a vehicle exterior information detection unit and an imaging unit.
 Hereinafter, modes for carrying out the present technology (hereinafter referred to as embodiments) will be described. The present technology can be applied to recognizing (detecting) a predetermined object, for example, a person (face, upper body, whole body), an automobile, a bicycle, a foodstuff, or the like. The present technology performs such detection of a predetermined object using distance information. In the following description, a case where a human face is detected using distance information will be described as an example.
 <First Embodiment>
 FIG. 1 is a diagram showing the configuration of an embodiment of a detection device to which the present technology is applied. The detection device 100 shown in FIG. 1 includes a distance information acquisition unit 111, a subject feature extraction unit 112, a subject candidate area detection unit 113, and an actual size database 114.
 The distance information acquisition unit 111 measures the distance to the subject, generates the measurement result (distance information), and outputs it to the subject feature extraction unit 112. The distance information acquisition unit 111 acquires the distance information with, for example, a distance measuring sensor using active light (such as infrared light). As a distance measuring sensor using active light, a TOF (Time-of-Flight) method, a Structured Light method, or the like can be applied.
 The distance information acquisition unit 111 may also be configured to acquire the distance information with a distance measuring sensor (distance measuring method) that uses reflected active light, for example, a TOF light source or the flash light of a camera. The distance information acquisition unit 111 may also be configured to acquire the distance information with a stereo camera, with an ultrasonic sensor, or by a method using a millimeter wave radar.
 The subject feature extraction unit 112 sets, from the distance information, a frame in which the detection target, for example, a human face, may exist. The frame is set by referring to a table stored in the actual size database 114. The actual size database 114 manages a table in which a distance is associated with the size on the image that takes into account the actual size of the subject to be detected. For example, the table describes how large a human face appears on the image when the face is located at a position a predetermined distance away.
 The subject candidate area detection unit 113 determines whether the detection target exists within the set frame, and if it does, cuts out the inside of the frame and outputs it to a subsequent processing unit (not shown).
 The operation of the detection device 100 will be described with reference to the flowchart shown in FIG. 2.
 In step S101, the subject feature extraction unit 112 sets a pixel to be processed (pixel of interest). For example, pixels are sequentially set as the pixel of interest starting from the upper left pixel of the image. For example, as shown in the upper part of FIG. 3, when an image 131 (distance image 131) is acquired, pixels from the upper left pixel to the lower right pixel of the distance image 131 are sequentially set as the pixel of interest. In the upper part of FIG. 3, the order in which the pixels of interest are sequentially set is indicated by arrows, but the pixels of interest may be set in an order other than this.
 The distance image 131 is an image generated from the distance information. For example, it is an image in which the same distance is represented by the same color and coloring is applied according to the distance. Note that the distance image 131 in the present technology does not have to be an image colored according to distance; it is sufficient that the image indicates how far a predetermined pixel (subject) in the image 131 is from the detection device 100.
 Here, the description continues assuming that the distance image 131 shown in FIG. 3 is generated and, as shown in the upper part of FIG. 3, that a pixel at a predetermined position in the distance image 131 is set as the pixel of interest 132.
 In step S102, the distance information at the pixel of interest is acquired. In step S103, the size of the detection frame is determined from the distance and the actual size of the subject to be detected. As shown in the middle part of FIG. 3, the subject feature extraction unit 112 sets a detection frame 133 from the distance at the pixel of interest 132 and the actual size of the subject to be detected.
 The subject feature extraction unit 112 sets the detection frame 133 by referring to the table managed in the actual size database 114. The actual size database 114 stores, for example, a table 151 as shown in FIG. 4.
 The table 151 shown in FIG. 4 associates a distance with the size on the image based on the actual size of a face. For example, it describes relationships such as: when the distance is 0 (cm), the size of the face on the image is 30 pixels x 30 pixels; when the distance is 50 (cm), the size of the face on the image is 25 pixels x 25 pixels; and when the distance is 100 (cm), the size of the face on the image is 20 pixels x 20 pixels.
 The distance is the distance between the detection device 100 and the subject (in this case, a human face). The actual size of the face is the size of an average human face at a predetermined distance, for example, 50 centimeters away. Since human faces differ by sex and age and there are individual differences, the description continues here on the assumption that the actual size of the face is the size of an average human face.
 Note that a table 151 in which one distance is associated with image sizes based on the actual sizes of a plurality of faces may be created and used for the processing. For example, one distance may be associated with an image size based on the actual size of a male face, an image size based on the actual size of a female face, and an image size based on the actual size of a child's face. In this case, a detection frame 133 corresponding to the image size based on each actual size may be set, and the processing of step S104 described later may be executed for each detection frame 133.
 Although FIG. 4 shows an example in which distances of 0, 50, and 100, that is, distances in units of 50 centimeters, are associated with image sizes based on the actual size, the granularity is not limited to 50 centimeters and can be changed according to the accuracy of the distance information, the accuracy required for detection, and so on.
 In step S103, as shown in the middle part of FIG. 3, the subject feature extraction unit 112 sets the detection frame 133 from the distance at the pixel of interest 132 and the size on the image based on the actual size of the subject to be detected. For example, when the processing is performed with reference to the table 151 shown in FIG. 4 and the distance at the pixel of interest 132 is determined to be 50 centimeters, a detection frame 133 of 25 pixels x 25 pixels centered on the pixel of interest 132 is set.
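 The following Python sketch illustrates a lookup of this kind using the example values of the table 151 and the setting of a detection frame 133 centered on the pixel of interest; the linear interpolation between table entries and the clipping at the image border are illustrative assumptions, not requirements of the present technology.

```python
import numpy as np

# Example values of the table 151: (distance in cm, frame side length in pixels).
SIZE_TABLE = [(0, 30), (50, 25), (100, 20)]

def frame_size_for_distance(distance_cm):
    """Return the expected on-image size (pixels) of a face at the given distance.

    Intermediate distances are interpolated linearly, which is one possible
    choice and not something mandated by the table itself.
    """
    dists, sizes = zip(*SIZE_TABLE)
    return int(round(float(np.interp(distance_cm, dists, sizes))))

def detection_frame(px, py, distance_cm, image_shape):
    """Detection frame 133 centred on the pixel of interest (px, py), clipped to the image."""
    half = frame_size_for_distance(distance_cm) // 2
    h, w = image_shape[:2]
    x0, y0 = max(px - half, 0), max(py - half, 0)
    x1, y1 = min(px + half, w - 1), min(py + half, h - 1)
    return x0, y0, x1, y1
```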
 Note that although the detection frame 133 is described here as a rectangle as shown in FIG. 3, it is not limited to a rectangle and may have another shape such as a circle. Also, although an example is described here in which the actual size of the subject is used as the feature (feature amount) of the subject and the detection frame 133 is set using the size on the image based on that actual size, the detection frame 133 can also be set using another feature (feature amount) of the subject.
 In this way, the subject feature extraction unit 112 functions, in this case, as a setting unit that uses the size of the subject as the feature amount and sets the detection frame 133 corresponding to that feature amount.
 Thus, a detection frame 133 corresponding to the face size according to the distance is set in the captured image 131. The detection frame 133 is a frame set by calculating how large the subject to be detected, for example, a human face, would appear on the distance image 131 if it existed at the distance of the position of the pixel of interest 132. Note that such calculation itself may be omitted, for example, by describing the values in the table 151 in advance, and other forms can also be applied.
 In step S104, the subject candidate area detection unit 113 determines whether the image within the detection frame 133 is a candidate for the subject to be detected. For example, a filter of a size equivalent to the detection frame 133 is applied to the distance image 131, and its response value is used as the probability value of the subject candidate. As the filter, a DOG (Difference-of-Gaussian) filter, a Laplacian filter, or the like can be applied.
 Whether the image within the detection frame 133 is a candidate for the subject to be detected can be determined using the detection frame 133 and the distance information within the detection frame 133.
 For example, when a human face is imaged within the detection frame 133, the face has unevenness, so the distance information within the detection frame 133 also varies in depth. On the other hand, even when a human face is imaged within the detection frame 133, if it is a face appearing in a photograph (poster), the distance information within the detection frame 133 takes a constant value and does not vary in depth.
 By detecting such variation in depth with a filter, it is determined whether the subject to be detected exists in the detection frame 133. The determination result is output to a processing unit (not shown) downstream of the detection device 100. Note that only when it is determined that the subject to be detected exists in the detection frame 133, the image within the detection frame 133 can be cut out from the image 131 and the cut-out image output.
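 A minimal sketch of the check in step S104 is shown below; the use of a Difference-of-Gaussian response whose scale is tied to the detection frame and the flatness threshold for rejecting flat (photograph-like) patches are illustrative assumptions.

```python
import cv2
import numpy as np

def candidate_probability(depth_image, frame, flatness_thresh=1.0):
    """Rough candidate check for step S104 on the distance image 131.

    depth_image: distance image as a float array (e.g. centimetres per pixel).
    frame: (x0, y0, x1, y1) detection frame 133.
    A patch with almost no depth variation (e.g. a face printed on a poster)
    is rejected; otherwise a Difference-of-Gaussian response whose scale is
    tied to the frame size is returned as the candidate probability value.
    """
    x0, y0, x1, y1 = frame
    patch = depth_image[y0:y1 + 1, x0:x1 + 1].astype(np.float32)
    if patch.std() < flatness_thresh:      # flat in depth -> likely a photograph
        return 0.0
    sigma = max((x1 - x0) / 4.0, 1.0)      # filter scale follows the frame size
    dog = cv2.GaussianBlur(patch, (0, 0), sigma) - cv2.GaussianBlur(patch, (0, 0), 2 * sigma)
    return float(np.abs(dog).mean())       # larger response -> stronger candidate
```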
 For example, as shown in the lower part of FIG. 3, when it is determined that there is a face in the detection frame 133-1 and that the subject to be detected exists, the image within the detection frame 133-1 is cut out from the image 131 and output. When it is determined that there is no face in the detection frame 133-2 and that the subject to be detected does not exist, the image within the detection frame 133-2 is not cut out.
 The cutting out can be performed after the processing in step S105 is completed. Alternatively, the determination result of the determination processing in step S104, in this case the value obtained when the filter is applied, may be used as the probability value of the subject candidate, and the cutting out may be performed based on that probability value after the processing in step S105. It is also possible to output only the probability value to the subsequent stage.
 In this way, the subject candidate area detection unit 113 functions as a determination unit that determines whether the image within the detection frame 133 is the subject to be detected. An image determined by the subject candidate area detection unit 113 to be the subject to be detected can be cut out and output to a subsequent processing unit or the like.
 In step S105, it is determined whether such processing has been completed for all the pixels in the image 131. When it is determined that the processing has not been completed for all the pixels, the processing returns to step S101, a new pixel of interest is set, and the processing from step S102 onward is performed on the newly set pixel of interest.
 On the other hand, when it is determined in step S105 that such processing has been completed for all the pixels in the image 131, the recognition processing ends.
 By repeating the processing of steps S101 to S105, the probability value of the subject candidate is obtained for all the pixels in the distance image 131. Then, a local maximum of the probability values is taken as the center position of the detected subject, the pixel at that center position is set as the pixel of interest 132, and the image within the detection frame 133 is cut out.
 Note that since the pixel of interest 132 is set and the detection frame 133 is set in this way, the pixel of interest 132 does not have to be set for every pixel in the image 131.
 For example, if the pixel located at the upper left corner of the image 131 were taken as the pixel of interest 132, the detection frame 133 could not be set; even if it were set, part of the detection frame 133 (in this case, 3/4 of it) would be missing, so pixels in such a region may be excluded from being set as the pixel of interest 132. Similarly, the regions near the edges of the image 131 are regions where the detection frame 133 cannot be set, so pixels in such regions may also be excluded from being set as the pixel of interest 132.
 The pixel of interest 132 may be set sequentially pixel by pixel, or may be set at a predetermined interval, for example, every five pixels.
 In addition, pixels in a region determined to be far away, in other words, a region that can be determined to be background, may be excluded from being set as the pixel of interest 132. In this way, the number of pixels of interest 132 to be processed can be reduced, and the processing can be lightened.
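 The loop of steps S101 to S105, together with the shortcuts described above (stride scanning, skipping border pixels, and skipping background pixels), could be sketched as follows; the stride and the background threshold are illustrative assumptions, and the helper functions from the earlier sketches are reused.

```python
import numpy as np

def scan_distance_image(depth_image, step=5, background_cm=300.0):
    """Scan loop of steps S101 to S105 with the shortcuts described above.

    Pixels are visited with a stride of `step`, pixels whose full detection
    frame would not fit in the image are skipped, and pixels judged to be
    background (farther than background_cm) are not used as pixels of interest.
    Local maxima of the returned probability map give the centres of detected
    subjects. Reuses frame_size_for_distance() and candidate_probability()
    from the sketches above.
    """
    h, w = depth_image.shape
    prob_map = np.zeros((h, w), dtype=np.float32)
    for py in range(0, h, step):
        for px in range(0, w, step):
            d = depth_image[py, px]
            if d <= 0 or d > background_cm:     # invalid measurement or background
                continue
            half = frame_size_for_distance(d) // 2
            if px - half < 0 or py - half < 0 or px + half >= w or py + half >= h:
                continue                        # full detection frame 133 would not fit
            frame = (px - half, py - half, px + half, py + half)
            prob_map[py, px] = candidate_probability(depth_image, frame)
    return prob_map
```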
 By performing such detection, for example, a detection result as shown in FIG. 5 is obtained. The upper part of FIG. 5 shows an example of the image 131, in which detection frames 133-1 to 133-4 are set as regions where the detection target (a human face) may exist.
 For the detection frame 133-1, which the subject feature extraction unit 112 (FIG. 1) set according to the distance of the subject, the subject candidate area detection unit 113 determines that there is a face within it, so the image within the detection frame 133-1 is cut out and output.
 For the detection frame 133-2, which the subject feature extraction unit 112 (FIG. 1) set according to the distance of the subject, the subject candidate area detection unit 113 determines that there is no face within it, so the image within the detection frame 133-2 is not cut out.
 For the detection frame 133-3, which the subject feature extraction unit 112 (FIG. 1) set according to the distance of the subject, the subject candidate area detection unit 113 determines that there is a face within it, so the image within the detection frame 133-3 is cut out and output.
 For the detection frame 133-4, even if there is a face within the detection frame 133-4 set by the subject feature extraction unit 112 (FIG. 1) according to the distance of the subject, if that face appears in a photograph or the like, the subject candidate area detection unit 113 determines that there is no face, so the image within the detection frame 133-4 is not cut out.
 As described above, in the present technology, an object is detected using a distance and the size of the detection object at that distance. By performing detection in this way, objects other than those of the size of the detection object at a given distance are excluded from the detection target, so the possibility of false detection can be reduced.
 Also, for example, when the detection target is a human face, an object with no depth variation, such as a human face in a photograph, is not erroneously detected, which also reduces the possibility of false detection.
 Furthermore, detection according to the present technology requires less processing than, for example, detection by pattern matching or the like.
 <Second Embodiment>
 Next, a second embodiment will be described. FIG. 6 is a diagram showing a configuration example of a detection device 200 according to the second embodiment. In the detection device 200 shown in FIG. 6, the same parts as those of the detection device 100 shown in FIG. 1 are denoted by the same reference numerals, and their description is omitted.
 The detection device 200 of the second embodiment has a configuration in which an imaging unit 211 and a subject detail recognition unit 212 are added to the detection device 100 of the first embodiment.
 撮像部211は、CCDやCMOSイメージセンサなどの撮像素子を含む構成とされ、環境光による画像(通常画像と記述する)を撮像し、被写体詳細認識部212に供給する。被写体詳細認識部212には、被写体候補領域検出部113からの検出結果も供給される。 The imaging unit 211 includes an imaging element such as a CCD or a CMOS image sensor, captures an image of ambient light (described as a normal image), and supplies the image to the subject detail recognition unit 212. The detection result from the subject candidate area detection unit 113 is also supplied to the subject detail recognition unit 212.
 被写体候補領域検出部113は、第1の実施の形態として説明したように、距離画像131を用いて検出対象、例えば人の顔が存在すると判定した領域を切り出し、出力する。被写体詳細認識部212は、被写体候補領域検出部113から供給された領域内の被写体に対して、さらに詳細な認識を、通常画像を用いて行う。例えば、性別や年齢といった個人を特定するような認識処理が行われる。 As described in the first embodiment, the subject candidate region detection unit 113 cuts out and outputs a region determined to have a detection target, for example, a human face, using the distance image 131. The subject detail recognition unit 212 performs more detailed recognition on the subject in the region supplied from the subject candidate region detection unit 113 using the normal image. For example, a recognition process for specifying an individual such as gender and age is performed.
The processing of the detection device 200 shown in FIG. 6 will be described with reference to the flowchart shown in FIG. 7.

Steps S201 to S205 are processing performed by the distance information acquisition unit 111 through the subject candidate area detection unit 113, and are performed in the same way as steps S101 to S105 of the flowchart shown in FIG. 2, so their description is omitted.

In step S206, the subject detail recognition unit 212 performs detailed recognition using the detection frame of the subject candidate. For example, the subject detail recognition unit 212 maps the detection frame 133 supplied from the subject candidate area detection unit 113 onto the corresponding region of the normal image from the imaging unit 211 and cuts out the image within the detection frame 133 so set. Then, using the cut-out normal image, it executes preset recognition processing, such as recognition that identifies an individual by, for example, the gender or age of the subject.
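As a rough illustration of step S206, the following sketch crops the candidate regions out of the normal image and hands each crop to a separate recogniser. The recogniser is a placeholder callable and the frame format is an assumption; the patent does not prescribe a particular recognition algorithm.

```python
from typing import Callable, List, Tuple
import numpy as np

Frame = Tuple[int, int, int, int]  # (x, y, width, height) of a detection frame in image coordinates

def recognize_details(normal_image: np.ndarray,
                      frames: List[Frame],
                      recognizer: Callable[[np.ndarray], dict]) -> List[dict]:
    """Run the heavy recogniser only on the candidate regions supplied by the distance-based stage."""
    results = []
    for x, y, w, h in frames:
        crop = normal_image[y:y + h, x:x + w]   # map the detection frame onto the normal image and cut it out
        results.append(recognizer(crop))        # e.g. estimate gender / age of the cropped face
    return results
```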
By performing the processing in this way, the subject can be detected in more detail.

The information supplied from the subject candidate area detection unit 113 to the subject detail recognition unit 212 can include the size on the image based on the real size of the subject (the detection frame 133), a representative point (for example, the pixel of interest 132), and a distribution map (for example, a heat map or filter response values). The subject detail recognition unit 212 performs detailed recognition using the information supplied from the subject candidate area detection unit 113.

By performing such detection (recognition) processing, a detection result such as that shown in FIG. 8 is obtained. The upper and middle parts of FIG. 8 are the same as FIG. 5. That is, through the detection processing using the distance image 131, detection frame 133-1 and detection frame 133-3 are supplied to the subject detail recognition unit 212 as information on the regions in which the subject was detected.

The subject detail recognition unit 212 executes recognition processing, using a method such as a DNN (deep learning), on the images cut out from the normal image when detection frame 133-1 and detection frame 133-3 are applied to the normal image.

Thus, also in the second embodiment, the detection object is detected using the distance and the size of the detection object at that distance, so the detection accuracy can be improved and the processing load related to detection can be reduced. Furthermore, in the second embodiment, detailed recognition processing is executed using the normal image (an image other than the distance image), so the subject can be detected in more detail and recognized.
<Third Embodiment>

Next, a third embodiment will be described. FIG. 9 is a diagram illustrating a configuration example of a detection device 300 according to the third embodiment. In the detection device 300 illustrated in FIG. 9 and the detection device 100 illustrated in FIG. 1, the same parts are denoted by the same reference numerals, and their description is omitted.

The detection device 300 of the third embodiment differs from the detection device 100 of the first embodiment in that a subject direction detection unit 311 is added to the detection device 100 of the first embodiment.

The subject direction detection unit 311 detects the direction in which the detected subject is facing. The detection device 300 of the third embodiment detects the position, size, and direction of the subject.
The processing of the detection device 300 shown in FIG. 9 will be described with reference to the flowchart shown in FIG. 10.

Steps S301 to S306 (excluding step S305) are processing performed by the distance information acquisition unit 111 through the subject candidate area detection unit 113, and are performed in the same way as steps S101 to S105 of the flowchart shown in FIG. 2, so their description is omitted.

In step S305, the region determined by the subject candidate area detection unit 113 to contain the subject to be detected (the region set by the detection frame 133) and the image cut out from that region are supplied to the subject direction detection unit 311. The subject direction detection unit 311 detects the direction of the detected subject.

For example, direction detection will be described for the case where an image such as that shown in FIG. 11 is acquired. In the example shown in FIG. 11, the detection target is assumed to be a hand. When the subject feature extraction unit 112 and the subject candidate area detection unit 113 execute the processing of steps S302 to S304, a detection frame 133 is set in the distance image 131, and the hand that is the detection target is detected within that detection frame 133.

The subject direction detection unit 311 divides the inside of the detection frame 133 into regions of a predetermined size, treats each divided region as a surface of the subject, and obtains the normal direction of that surface. In the image shown in FIG. 11, the palm faces to the right in the figure. When the palm faces to the right, the distance information over the palm becomes gradually larger from the near side toward the far side.

When normals are set for the palm surfaces from which such distance information is obtained, normals pointing to the right in the figure are obtained, as shown in FIG. 11. From these normals, it is determined that the palm is facing to the right in the figure.
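The normal-based direction estimation can be sketched as follows. This is only an illustrative outline: the pixel pitch is treated as the depth unit, so the metric scaling of the gradients is ignored, and the coordinate conventions are assumptions rather than definitions taken from this description.

```python
import numpy as np

def dominant_normal(depth_patch: np.ndarray) -> np.ndarray:
    """Mean unit normal of a depth patch.

    Coordinates: x to the right, y downward, z (depth) increasing away from the camera.
    The returned normal points out of the surface toward the camera; a positive x
    component corresponds to a surface facing to the right in the image, as with the
    palm in FIG. 11.
    """
    dz_dy, dz_dx = np.gradient(depth_patch.astype(np.float64))
    # For the surface z = f(x, y), a normal pointing toward the camera is (dz/dx, dz/dy, -1).
    normals = np.stack([dz_dx, dz_dy, -np.ones_like(dz_dx)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    mean = normals.reshape(-1, 3).mean(axis=0)
    return mean / np.linalg.norm(mean)
```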
Thus, by using the distance information, the direction in which the subject is facing can also be determined. Therefore, according to the third embodiment, as in the first and second embodiments, the detection object is detected using the distance and the size of the detection object at that distance, so the detection accuracy can be improved. In addition, the direction of the detected subject can be determined.
<Fourth Embodiment>

Next, a fourth embodiment will be described. FIG. 12 is a diagram illustrating a configuration example of a detection device 400 according to the fourth embodiment. In the detection device 400 illustrated in FIG. 12 and the detection device 300 illustrated in FIG. 9, the same parts are denoted by the same reference numerals, and their description is omitted.

The detection device 400 of the fourth embodiment is configured by adding an imaging unit 411 and a subject detail recognition unit 412 to the detection device 300 of the third embodiment. The added imaging unit 411 and subject detail recognition unit 412 perform basically the same processing as the imaging unit 211 and the subject detail recognition unit 212 of the detection device 200 of the second embodiment (both in FIG. 6).

The imaging unit 411 captures a normal image and supplies it to the subject detail recognition unit 412. The detection result from the subject direction detection unit 311 is also supplied to the subject detail recognition unit 412. The subject direction detection unit 311 outputs, for the detection target, for example a human face, its position (the position at which the detection frame 133 is set), its size (the size of the detection frame 133), and its direction.

The subject detail recognition unit 412 performs more detailed recognition, using the normal image, on the subject in the region supplied from the subject direction detection unit 311. For example, recognition processing that identifies an individual, such as estimating gender or age, is performed.
The processing of the detection device 400 shown in FIG. 12 will be described with reference to the flowchart shown in FIG. 13.

Steps S401 to S406 are processing performed by the distance information acquisition unit 111, the subject feature extraction unit 112, the subject candidate area detection unit 113, and the subject direction detection unit 311, and are performed in the same way as steps S301 to S306 of the flowchart shown in FIG. 10, so their description is omitted.

In step S407, the subject detail recognition unit 412 performs detailed recognition using the detection frame of the subject candidate and the direction of the subject. For example, the subject detail recognition unit 412 maps the detection frame 133 supplied from the subject direction detection unit 311 onto the corresponding region of the normal image from the imaging unit 411 and cuts out the image within the detection frame 133 so set. Then, using the cut-out normal image, it executes preset recognition processing, such as recognition that identifies an individual by, for example, the gender or age of the subject. This recognition processing is performed taking the direction of the subject into account.
FIG. 14 compares the recognition method performed by the detection device 400 of the fourth embodiment with another recognition method. The left part of FIG. 14 shows an example of the other recognition method. For example, when a face is to be detected from a normal image, the detected object is first assumed to be a face, and a front-back/left-right determination dictionary 431 is consulted to determine whether the face is facing in the front-back direction or in the left-right direction.

When it is determined that the face is facing in the front-back direction (a direction other than left-right), a front/back determination dictionary 432 is consulted to determine whether the face is facing forward or backward. When it is determined to be facing forward, a forward-facing dictionary 434 is consulted to determine whether the object is a human face and, if so, whether it is a forward-facing face. In this processing, when data identifying individuals is registered in the forward-facing dictionary 434, the person is identified by matching against that data.

On the other hand, when the front/back determination dictionary 432 is consulted and the face is determined to be facing backward, a backward-facing dictionary 435 is consulted to determine whether the object is a human face and, if so, whether it is a backward-facing face. When data identifying individuals is registered in the backward-facing dictionary 435, the person is identified by matching against that data.

On the other hand, when the front-back/left-right determination dictionary 431 is consulted and the face is determined to be facing in the left-right direction (a direction other than front-back), a left/right determination dictionary 433 is consulted to determine whether the face is facing left or right. When it is determined to be facing left, a left-facing dictionary 436 is consulted to determine whether the object is a human face and, if so, whether it is a left-facing face. When data identifying individuals is registered in the left-facing dictionary 436, the person is identified by matching against that data.

On the other hand, when the left/right determination dictionary 433 is consulted and the face is determined to be facing right, a right-facing dictionary 437 is consulted to determine whether the object is a human face and, if so, whether it is a right-facing face. When data identifying individuals is registered in the right-facing dictionary 437, the person is identified by matching against that data.

In this way, the conventional recognition processing is performed by consulting a plurality of dictionaries and making a series of determinations.
In the detection device 400 of the fourth embodiment, the region containing the subject, its size, and its direction are detected from the distance image 131, and the subject detail recognition unit 412 (FIG. 12) performs the recognition processing using that information. Therefore, as shown in the right part of FIG. 14, the recognition processing can be performed by preparing an X-direction dictionary 451 and consulting that X-direction dictionary 451.

The X-direction dictionary 451 is a dictionary containing the forward-facing dictionary 434, the backward-facing dictionary 435, the left-facing dictionary 436, and the right-facing dictionary 437. Since the direction of the subject is also supplied to the subject detail recognition unit 412 (FIG. 12), the configuration can be such that only the dictionary for the supplied direction is consulted when the recognition processing is performed.
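The effect of supplying the direction to the recogniser can be sketched as follows; the dictionary contents and the matching routine are placeholders, and only the selection logic is shown, as an illustrative assumption rather than the actual structure of the X-direction dictionary 451.

```python
from typing import Callable, Dict, List, Optional

# One entry per direction, standing in for the forward/backward/left/right dictionaries
# 434 to 437 that together make up the X-direction dictionary 451.
DIRECTION_DICTIONARIES: Dict[str, List] = {
    "front": [],
    "back": [],
    "left": [],
    "right": [],
}

def recognize_with_known_direction(face_crop,
                                   direction: str,
                                   match: Callable[[object, List], Optional[str]]) -> Optional[str]:
    """Skip the front/back and left/right judgement stages and consult only one dictionary."""
    dictionary = DIRECTION_DICTIONARIES[direction]   # the direction is already known from the distance image
    return match(face_crop, dictionary)              # e.g. nearest-neighbour matching against that dictionary
```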
According to the detection device 400 of the fourth embodiment, the number of dictionaries (the amount of data) can be reduced, and the determination processing that would otherwise be performed multiple times while consulting the dictionaries can be omitted. Therefore, according to the detection device 400 of the fourth embodiment, the processing related to recognition can be reduced. In addition, since only images that are likely to contain the detection target, for example a face (the cut-out images), are subjected to detailed recognition, the region of the image to be processed is narrowed, which also reduces the processing related to recognition.

Thus, also in the fourth embodiment, the detection object is detected using the distance and the size of the detection object at that distance, so the detection accuracy can be improved. Moreover, in the fourth embodiment, detailed recognition processing is executed using the normal image (an image other than the distance image), so the subject can be detected in more detail and recognized. Furthermore, the recognition processing can be performed with the direction of the subject obtained in advance, which reduces the processing.
<Fifth Embodiment>

Next, a fifth embodiment will be described. In the fifth embodiment and the sixth to eighth embodiments described below, the detection target is detected by estimating the size of the subject and estimating the category to which the subject belongs.

FIG. 15 is a diagram illustrating a configuration example of a detection device 500 according to the fifth embodiment. The detection device 500 shown in FIG. 15 includes a distance information acquisition unit 111, a subject size estimation unit 511, and a subject category estimation unit 512.

The distance information acquisition unit 111 has, for example, the same configuration as the distance information acquisition unit 111 included in the detection device 100, and has the function of acquiring distance information for generating the distance image 131.

The subject size estimation unit 511 estimates the size of the subject and supplies the estimated size information to the subject category estimation unit 512. The subject category estimation unit 512 estimates the category to which the subject belongs from the estimated size of the subject and the distance at which the subject is located.
For example, as described above, for a human face it is known how large a face located a given distance away should appear. Conversely, when an object of a size consistent with a human face is present at a position a given distance away, it can be estimated that a human face is present there.

Making use of this, the detection device 500 estimates the size of the subject and, from that size and the distance, determines the category to which the subject belongs, for example the human-face category or the car category.
The processing of the detection device 500 shown in FIG. 15 will be described with reference to the flowchart shown in FIG. 16.

In step S501, the subject size estimation unit 511 sets a pixel of interest. This processing can be performed, for example, in the same way as step S101 of the flowchart shown in FIG. 2.

In step S502, the subject size estimation unit 511 acquires the distances around the set pixel of interest. Then, in step S503, the subject size estimation unit 511 estimates the subject size based on the surrounding distance distribution. For example, since the distance differs greatly between an object and the background, the region in which the object exists (its extent up to the edge) can be estimated by detecting the portions where the distance changes greatly (that is, the edges) with reference to the surrounding distance distribution.

In step S504, the subject category estimation unit 512 estimates the subject category based on the distance and the subject size. As described above, the category of the object at that position can be estimated from the distance and the size, and such estimation is executed in step S504.
For example, assume that a distance image 131 such as that shown in FIG. 17 is acquired. The distance image 131 shown in FIG. 17 is an image in which a hand is captured. For example, when a predetermined position on the hand is set as the pixel of interest 132, the distance distribution around this pixel of interest 132 is referenced.

The distance differs greatly between the hand and the background; that is, in this case the hand is at a short distance while the background is at a long distance. When the distance distribution is followed in directions moving gradually away from the pixel of interest 132, there are points at which the distance changes greatly.

In this case, the pixel of interest 132 is set at roughly the center of the palm, so when the search proceeds from the palm toward a fingertip, the distance information changes sharply at the point where the fingertip gives way to the background. In FIG. 17, arrows indicate the paths from the pixel of interest 132 to the positions where the distance information changes sharply. The position where the distance information changes sharply may be defined as the position of the pixel being searched at which the difference between the distance of the pixel of interest 132 and the distance of that pixel becomes equal to or greater than a predetermined threshold.
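The search described above can be sketched as follows. The number of search directions and the jump threshold are illustrative assumptions; only the idea of walking outwards from the pixel of interest until the depth jumps is shown.

```python
import numpy as np

def estimate_radius_px(depth: np.ndarray, cx: int, cy: int,
                       jump_threshold_m: float = 0.15, n_directions: int = 8) -> int:
    """Longest run from (cx, cy) before the depth differs from the centre depth by more than the threshold."""
    h, w = depth.shape
    center = depth[cy, cx]
    radius = 0
    for k in range(n_directions):
        angle = 2 * np.pi * k / n_directions
        dx, dy = np.cos(angle), np.sin(angle)
        r = 1
        while True:
            x, y = int(round(cx + r * dx)), int(round(cy + r * dy))
            if not (0 <= x < w and 0 <= y < h):
                break
            if abs(depth[y, x] - center) > jump_threshold_m:   # background reached: large depth jump (edge)
                break
            r += 1
        radius = max(radius, r)
    return radius
```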
In this way, the range in which the object may exist is estimated from the pixel of interest 132. In the example shown in FIG. 17, for example, a circle or rectangle (not shown) whose radius extends from the pixel of interest 132 to the tip of the longest arrow is set, and the size of that circle or rectangle is taken as the subject size. This subject size corresponds to the detection frame 133 in the first to fourth embodiments. In other words, the detection frame 133 is set by this processing.

Then, the category of the detected subject is estimated from the distance of the pixel of interest 132 and the subject size (the detection frame 133). In the example shown in FIG. 17, it is estimated that a subject of the detected size at the distance of the pixel of interest 132 belongs to the category "hand".
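The step from size and distance to a category can be sketched as follows. The table of typical real sizes, the tolerance, and the pinhole conversion are illustrative stand-ins for whatever size data the category estimation actually relies on.

```python
from typing import Optional

# Illustrative table of typical real-world sizes per category.
TYPICAL_REAL_SIZE_M = {"hand": 0.20, "face": 0.25, "person": 1.7, "car": 4.5}

def estimate_category(size_px: float, distance_m: float, focal_length_px: float,
                      tolerance: float = 0.3) -> Optional[str]:
    """Map an on-image size and its distance back to a real size and pick the closest category."""
    real_size_m = size_px * distance_m / focal_length_px     # inverse of the pinhole projection
    best, best_err = None, tolerance
    for category, typical in TYPICAL_REAL_SIZE_M.items():
        err = abs(real_size_m - typical) / typical           # relative deviation from the typical size
        if err < best_err:
            best, best_err = category, err
    return best
```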
In step S505, it is determined whether such processing has been executed for all pixels in the distance image 131. As in step S105 of the flowchart shown in FIG. 2, the pixels set as the pixel of interest 132 need not be all the pixels in the distance image 131; some pixels may be excluded.

Thus, according to the detection device 500 of the fifth embodiment, the size of the subject can be estimated from the distance image and its category can be estimated. In addition, according to the detection device 500, a plurality of subjects (categories) can be estimated; for example, different objects such as a person and a car can be detected.
<Sixth Embodiment>

Next, a sixth embodiment will be described. FIG. 18 is a diagram illustrating a configuration example of a detection device 600 according to the sixth embodiment. In the detection device 600 illustrated in FIG. 18 and the detection device 500 illustrated in FIG. 15, the same parts are denoted by the same reference numerals, and their description is omitted.

The detection device 600 of the sixth embodiment is configured by adding an imaging unit 611 and a subject detail recognition unit 612 to the detection device 500 of the fifth embodiment. The added imaging unit 611 performs basically the same processing as the imaging unit 211 (FIG. 6) of the detection device 200 of the second embodiment.

The imaging unit 611 captures a normal image and supplies it to the subject detail recognition unit 612. The distance, the subject size, and the subject category are also supplied to the subject detail recognition unit 612 from the subject category estimation unit 512. The subject detail recognition unit 612 performs more detailed recognition, using the normal image, based on the distance, subject size, and subject category supplied from the subject category estimation unit 512.
The processing of the detection device 600 shown in FIG. 18 will be described with reference to the flowchart shown in FIG. 19.

Steps S601 to S605 are processing performed by the distance information acquisition unit 111, the subject size estimation unit 511, and the subject category estimation unit 512, and are performed in the same way as steps S501 to S505 of the flowchart shown in FIG. 16, so their description is omitted.

In step S606, the subject detail recognition unit 612 performs detailed recognition using the subject candidate region and the subject category. For example, the subject detail recognition unit 612 sets a frame corresponding to the subject size supplied from the subject category estimation unit 512 in the corresponding region of the normal image from the imaging unit 611 and cuts out the image within the set frame.

Then, using the cut-out normal image and the subject category supplied from the subject category estimation unit 512, the category is narrowed down, after which preset recognition processing is executed, such as recognition that identifies the object belonging to that category. For example, when the category is determined to be a person, matching against images belonging to persons is performed to identify the individual; when the category is determined to be a car, matching against images belonging to cars is performed to identify the car model, as detailed recognition.

Thus, according to the detection device 600 of the sixth embodiment, the size of the subject can be estimated from the distance image and the category to which the subject belongs can be estimated. In addition, according to the detection device 600, a plurality of subjects (categories) can be estimated; for example, different objects such as a person and a car can be detected. Furthermore, the detected object can be recognized in detail.
<Seventh Embodiment>

Next, a seventh embodiment will be described. FIG. 20 is a diagram illustrating a configuration example of a detection device 700 according to the seventh embodiment. In the detection device 700 illustrated in FIG. 20 and the detection device 500 illustrated in FIG. 15, the same parts are denoted by the same reference numerals, and their description is omitted.

The detection device 700 of the seventh embodiment differs from the detection device 500 of the fifth embodiment in that a subject shape estimation unit 711 is added to the detection device 500 of the fifth embodiment and in that the subject category estimation unit 712 receives the output of the subject shape estimation unit 711 as its input.

The subject shape estimation unit 711 estimates the shape of the subject. Referring again to FIG. 17: when a distance image 131 in which a hand is captured is acquired, the shape of the hand is obtained by searching from the pixel of interest 132 to the points where the distance information changes greatly, that is, to the edges.

The subject category estimation unit 712 performs basically the same processing as the subject category estimation unit 512 of the detection device 500 shown in FIG. 15, but the subject category estimation unit 712 shown in FIG. 20 also uses the subject shape estimated by the subject shape estimation unit 711 to estimate the category. The category can therefore be estimated with higher accuracy.
The processing of the detection device 700 shown in FIG. 20 will be described with reference to the flowchart shown in FIG. 21.

Steps S701 to S703 are processing performed by the distance information acquisition unit 111 and the subject size estimation unit 511, and are performed in the same way as steps S501 to S503 of the flowchart shown in FIG. 16, so their description is omitted.

In step S704, the subject shape estimation unit 711 estimates the shape of the subject based on the distance distribution around the pixel of interest 132. As described with reference to FIG. 17, the shape is estimated by searching, using the distance information, for the portions where the distance changes greatly (the edges). In other words, regions in which the distance changes only gently are regarded as part of the object being detected, and the shape of the object is obtained while determining whether the distance is varying gently in this way.

In step S705, the subject category estimation unit 712 estimates the category to which the subject belongs based on the distance, the subject size, and the shape. In this case, the category is estimated using not only the distance and subject size but also the shape information, so the category can be estimated with higher accuracy.
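The shape estimation of step S704 can be sketched as a region growing over gently varying depth, as follows. The four-neighbour growth and the smoothness threshold are illustrative assumptions, not the actual processing of the subject shape estimation unit 711.

```python
from collections import deque
import numpy as np

def grow_shape_mask(depth: np.ndarray, cx: int, cy: int, max_step_m: float = 0.03) -> np.ndarray:
    """Boolean silhouette grown from (cx, cy) over pixels whose depth changes only gradually."""
    h, w = depth.shape
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([(cx, cy)])
    mask[cy, cx] = True
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and not mask[ny, nx]:
                if abs(depth[ny, nx] - depth[y, x]) < max_step_m:   # gently varying depth: same object
                    mask[ny, nx] = True
                    queue.append((nx, ny))
    return mask
```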
According to the detection device 700 of the seventh embodiment, the size of the subject, its category, and its shape can be estimated from the distance image. In addition, according to the detection device 700, a plurality of subjects (categories) can be estimated; for example, different objects such as a person and a car can be detected.

The category estimation by the subject category estimation unit 712 may also be omitted, and the subject shape estimation result of the subject shape estimation unit 711 may be output to a subsequent processing unit (not shown).
<Eighth Embodiment>

Next, an eighth embodiment will be described. FIG. 22 is a diagram illustrating a configuration example of a detection device 800 according to the eighth embodiment. In the detection device 800 illustrated in FIG. 22 and the detection device 700 illustrated in FIG. 20, the same parts are denoted by the same reference numerals, and their description is omitted.

The detection device 800 of the eighth embodiment is configured by adding an imaging unit 811 and a subject detail recognition unit 812 to the detection device 700 of the seventh embodiment. The added imaging unit 811 performs basically the same processing as the imaging unit 211 (FIG. 6) of the detection device 200 of the second embodiment.

The imaging unit 811 captures a normal image and supplies it to the subject detail recognition unit 812. The distance, the subject size, the subject category, and the subject shape are also supplied to the subject detail recognition unit 812 from the subject category estimation unit 712. The subject detail recognition unit 812 performs more detailed recognition, using the normal image, based on the distance, subject size, subject category, and subject shape supplied from the subject category estimation unit 712.
The processing of the detection device 800 shown in FIG. 22 will be described with reference to the flowchart shown in FIG. 23.

Steps S801 to S806 are processing performed by the distance information acquisition unit 111, the subject size estimation unit 511, the subject shape estimation unit 711, and the subject category estimation unit 712, and are performed in the same way as steps S701 to S706 of the flowchart shown in FIG. 21, so their description is omitted.

In step S807, the subject detail recognition unit 812 performs detailed recognition using the subject candidate region, the subject category, and the subject shape. For example, the subject detail recognition unit 812 sets a frame corresponding to the subject size supplied from the subject category estimation unit 712 in the corresponding region of the normal image from the imaging unit 811 and cuts out the image within the set frame.

Then, using the cut-out normal image and the subject category supplied from the subject category estimation unit 712, the category is narrowed down, after which preset recognition processing is executed, such as recognition that identifies, among the objects belonging to that category, an object matching the subject shape. For example, when the category is determined to be a person, matching against images belonging to persons is performed; during that matching, the subject shape is referenced and the recognition is narrowed down to images close to that shape, for example to human faces when the shape is that of a human face, after which the individual is identified.
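The narrowing by category and shape in step S807 can be sketched as follows. The silhouette comparison by intersection-over-union and the per-category recognisers are illustrative stand-ins; this description leaves the concrete matching method open, and the templates are assumed to be boolean silhouettes of the same size as the mask.

```python
from typing import Callable, Dict, List, Optional
import numpy as np

def shape_similarity(mask: np.ndarray, template: np.ndarray) -> float:
    """Intersection-over-union of two boolean silhouettes of the same size."""
    inter = np.logical_and(mask, template).sum()
    union = np.logical_or(mask, template).sum()
    return float(inter) / union if union else 0.0

def detailed_recognition(crop, mask: np.ndarray, category: str,
                         recognizers: Dict[str, Callable[[object], Optional[str]]],
                         shape_templates: Dict[str, List[np.ndarray]],
                         min_similarity: float = 0.5) -> Optional[str]:
    """Run only the recogniser for the estimated category, and only if the silhouette fits."""
    if not any(shape_similarity(mask, t) >= min_similarity
               for t in shape_templates.get(category, [])):
        return None                        # silhouette does not match the category: skip heavy recognition
    return recognizers[category](crop)     # e.g. identify the person, or the car model
```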
Thus, according to the detection device 800 of the eighth embodiment, the size of the subject, its category, and its shape can be estimated from the distance image. In addition, according to the detection device 800, a plurality of subjects (categories) can be estimated; for example, different objects such as a person and a car can be detected.

Furthermore, the detected object can be recognized in detail. Since that detailed recognition can use information such as the estimated size, category, and shape of the subject, the processing required for detailed recognition can be reduced.

According to the detection devices of the first to eighth embodiments, an object can be detected from a distance image. For example, when a person is detected as the object, the present technology can be applied to a surveillance camera or the like. The present technology can also be applied, for example, to a game machine, as a device that detects the person playing a game and detects that person's gestures (for example, detecting a hand and the direction it is facing).

The detection devices of the first to eighth embodiments can also be mounted on an automobile and applied as part of a device that detects persons, bicycles, and automobiles other than the host vehicle, notifies the user of information on the detected objects, and performs control to avoid collisions.
<Application Example of the Detection Device>
The detection devices of the first to eighth embodiments can be implemented as a stacked image sensor in which a plurality of substrates (dies) are stacked. Here, taking the detection device 200 of the second embodiment (FIG. 6) as an example, the case where the detection device 200 is configured as a stacked image sensor will be described.

FIG. 24 is a diagram illustrating a first configuration example of a stacked image sensor incorporating the entire detection device 200 of FIG. 6. The stacked image sensor of FIG. 24 has a two-layer structure in which a pixel substrate 901 and a signal processing substrate 902 are stacked.

On the pixel substrate 901, (part of) the distance information acquisition unit 111 and (part of) the imaging unit 211 are formed. When the distance information acquisition unit 111 obtains distance information by the TOF method, it includes an irradiation unit that irradiates the subject with predetermined light and an image sensor that receives the irradiated light. The image sensor portion of the distance information acquisition unit 111, and also portions such as the irradiation unit, can be formed on the pixel substrate 901.

The imaging unit 211 also includes an image sensor for capturing the normal image. The image sensor portion of the imaging unit 211 can be formed on the pixel substrate 901.

On the signal processing substrate 902, the subject feature extraction unit 112, the subject candidate area detection unit 113, the real size database 114, and the subject detail recognition unit 212 are formed.

In the stacked image sensor of FIG. 24 configured as described above, the distance information acquisition unit 111 of the pixel substrate 901 performs imaging by receiving the light incident on it, and the object set as the detection target is detected from the image (distance image) obtained by that imaging.

Also, in the stacked image sensor of FIG. 24, the imaging unit 211 of the pixel substrate 901 performs imaging by receiving the light incident on it, and an image of the subject set as the detection target, and the like, is cut out from the image (normal image) obtained by that imaging and output.
FIG. 25 is a diagram illustrating a second configuration example of a stacked image sensor incorporating the entire detection device 200 of FIG. 6.

In the figure, parts corresponding to those in FIG. 24 are denoted by the same reference numerals, and their description is omitted below as appropriate.

The stacked image sensor of FIG. 25 has a three-layer structure in which a pixel substrate 901, a signal processing substrate 902, and a memory substrate 903 are stacked.

On the pixel substrate 901, the distance information acquisition unit 111 and the imaging unit 211 are formed, and on the signal processing substrate 902, the subject feature extraction unit 112, the subject candidate area detection unit 113, and the subject detail recognition unit 212 are formed.

On the memory substrate 903, the real size database 114 and an image storage unit 911 are formed.

In FIG. 25, the image storage unit 911 is formed on the memory substrate 903 as a storage region that stores the detection results of the subject candidate area detection unit 113, for example images cut out from the distance image in which the subject to be detected is captured. The real size database 114 that stores the table 151 (FIG. 4) is also formed on the memory substrate 903.

In FIG. 25, the pixel substrate 901, the signal processing substrate 902, and the memory substrate 903 are stacked in that order from the top; alternatively, for example, the order of the signal processing substrate 902 and the memory substrate 903 may be swapped so that the pixel substrate 901, the memory substrate 903, and the signal processing substrate 902 are stacked in that order.

The stacked image sensor can also be configured by stacking four or more substrates instead of two or three.
<Description of a Computer to Which the Present Technology Is Applied>

Next, the series of processes performed by each of the detection devices 100 to 800 can be performed by hardware or by software. When the series of processes is performed by software, the programs constituting the software are installed on a general-purpose computer or the like.

FIG. 26 is a block diagram illustrating a configuration example of an embodiment of a computer on which the programs that execute the series of processes described above are installed.
The programs can be recorded in advance on a hard disk 1005 or a ROM 1003 serving as a recording medium built into the computer.

Alternatively, the programs can be stored (recorded) on a removable recording medium 1011. Such a removable recording medium 1011 can be provided as so-called package software. Examples of the removable recording medium 1011 include a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disc, a DVD (Digital Versatile Disc), a magnetic disk, and a semiconductor memory.

The programs can be installed on the computer from the removable recording medium 1011 as described above, or they can be downloaded to the computer via a communication network or a broadcast network and installed on the built-in hard disk 1005. That is, the programs can be transferred to the computer wirelessly from a download site via an artificial satellite for digital satellite broadcasting, or transferred to the computer by wire via a network such as a LAN (Local Area Network) or the Internet.

The computer incorporates a CPU (Central Processing Unit) 1002, and an input/output interface 1010 is connected to the CPU 1002 via a bus 1001.

When a command is input by the user operating an input unit 1007 via the input/output interface 1010, the CPU 1002 executes a program stored in a ROM (Read Only Memory) 1003 accordingly. Alternatively, the CPU 1002 loads a program stored on the hard disk 1005 into a RAM (Random Access Memory) 1004 and executes it.

The CPU 1002 thereby performs the processing according to the flowcharts described above or the processing performed by the configurations of the block diagrams described above. Then, as necessary, the CPU 1002, for example, outputs the processing result from an output unit 1006 via the input/output interface 1010, transmits it from a communication unit 1008, or records it on the hard disk 1005.

The input unit 1007 includes a keyboard, a mouse, a microphone, and the like. The output unit 1006 includes an LCD (Liquid Crystal Display), a speaker, and the like.
In this specification, the processing that the computer performs according to the programs does not necessarily have to be performed chronologically in the order described in the flowcharts. That is, the processing that the computer performs according to the programs includes processing executed in parallel or individually (for example, parallel processing or processing by objects).

The programs may be processed by a single computer (processor) or may be processed in a distributed manner by a plurality of computers. Furthermore, the programs may be transferred to and executed by a remote computer.

Furthermore, in this specification, a system means a collection of a plurality of constituent elements (devices, modules (parts), and the like), and it does not matter whether all the constituent elements are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.

The embodiments of the present technology are not limited to the embodiments described above, and various modifications are possible without departing from the gist of the present technology.

For example, the configuration examples of the detection devices 100 to 800 described above can be combined to the extent possible.

The present technology can take a cloud computing configuration in which one function is shared and jointly processed by a plurality of devices via a network.

Each step described in the flowcharts above can be executed by a single device or shared and executed by a plurality of devices.

Furthermore, when a single step includes a plurality of processes, the plurality of processes included in that step can be executed by a single device or shared and executed by a plurality of devices.
<Application Examples>

The technology according to the present disclosure can be applied to various products. For example, the technology according to the present disclosure may be realized as a device mounted on any type of moving body such as an automobile, an electric vehicle, a hybrid electric vehicle, a motorcycle, a bicycle, a personal mobility device, an airplane, a drone, a ship, a robot, a construction machine, or an agricultural machine (tractor).

FIG. 27 is a block diagram illustrating a schematic configuration example of a vehicle control system 7000, which is an example of a moving body control system to which the technology according to the present disclosure can be applied. The vehicle control system 7000 includes a plurality of electronic control units connected via a communication network 7010. In the example shown in FIG. 27, the vehicle control system 7000 includes a drive system control unit 7100, a body system control unit 7200, a battery control unit 7300, a vehicle exterior information detection unit 7400, a vehicle interior information detection unit 7500, and an integrated control unit 7600. The communication network 7010 connecting these control units may be an in-vehicle communication network conforming to an arbitrary standard such as CAN (Controller Area Network), LIN (Local Interconnect Network), LAN (Local Area Network), or FlexRay (registered trademark).

Each control unit includes a microcomputer that performs arithmetic processing according to various programs, a storage unit that stores the programs executed by the microcomputer or the parameters used for the various calculations, and a drive circuit that drives the devices to be controlled. Each control unit includes a network I/F for communicating with the other control units via the communication network 7010, and a communication I/F for communicating, by wired or wireless communication, with devices or sensors inside and outside the vehicle. FIG. 27 illustrates, as the functional configuration of the integrated control unit 7600, a microcomputer 7610, a general-purpose communication I/F 7620, a dedicated communication I/F 7630, a positioning unit 7640, a beacon receiving unit 7650, an in-vehicle device I/F 7660, an audio/image output unit 7670, an in-vehicle network I/F 7680, and a storage unit 7690. The other control units similarly include a microcomputer, a communication I/F, a storage unit, and the like.
 The drive system control unit 7100 controls the operation of devices related to the drive system of the vehicle in accordance with various programs. For example, the drive system control unit 7100 functions as a control device for a driving force generation device, such as an internal combustion engine or a driving motor, for generating the driving force of the vehicle, a driving force transmission mechanism for transmitting the driving force to the wheels, a steering mechanism that adjusts the steering angle of the vehicle, and a braking device that generates the braking force of the vehicle. The drive system control unit 7100 may also function as a control device such as an ABS (Antilock Brake System) or ESC (Electronic Stability Control).
 A vehicle state detection unit 7110 is connected to the drive system control unit 7100. The vehicle state detection unit 7110 includes, for example, at least one of a gyro sensor that detects the angular velocity of the axial rotational motion of the vehicle body, an acceleration sensor that detects the acceleration of the vehicle, and sensors for detecting the operation amount of the accelerator pedal, the operation amount of the brake pedal, the steering angle of the steering wheel, the engine speed, the rotational speed of the wheels, and the like. The drive system control unit 7100 performs arithmetic processing using signals input from the vehicle state detection unit 7110, and controls the internal combustion engine, the driving motor, an electric power steering device, a brake device, and the like.
 The body system control unit 7200 controls the operation of various devices mounted on the vehicle body in accordance with various programs. For example, the body system control unit 7200 functions as a control device for a keyless entry system, a smart key system, a power window device, or various lamps such as headlamps, back lamps, brake lamps, blinkers, and fog lamps. In this case, radio waves transmitted from a portable device that substitutes for a key, or signals from various switches, can be input to the body system control unit 7200. The body system control unit 7200 receives the input of these radio waves or signals, and controls the door lock device, the power window device, the lamps, and the like of the vehicle.
 The battery control unit 7300 controls a secondary battery 7310, which is a power supply source of the driving motor, in accordance with various programs. For example, information such as the battery temperature, the battery output voltage, or the remaining capacity of the battery is input to the battery control unit 7300 from a battery device including the secondary battery 7310. The battery control unit 7300 performs arithmetic processing using these signals, and performs temperature adjustment control of the secondary battery 7310 or control of a cooling device or the like provided in the battery device.
 The vehicle exterior information detection unit 7400 detects information outside the vehicle on which the vehicle control system 7000 is mounted. For example, at least one of an imaging unit 7410 and a vehicle exterior information detection section 7420 is connected to the vehicle exterior information detection unit 7400. The imaging unit 7410 includes at least one of a ToF (Time Of Flight) camera, a stereo camera, a monocular camera, an infrared camera, and other cameras. The vehicle exterior information detection section 7420 includes, for example, at least one of an environment sensor for detecting the current weather or meteorological conditions, and a surrounding information detection sensor for detecting other vehicles, obstacles, pedestrians, and the like around the vehicle on which the vehicle control system 7000 is mounted.
 The environment sensor may be, for example, at least one of a raindrop sensor that detects rainy weather, a fog sensor that detects fog, a sunshine sensor that detects the degree of sunshine, and a snow sensor that detects snowfall. The surrounding information detection sensor may be at least one of an ultrasonic sensor, a radar device, and a LIDAR (Light Detection and Ranging, Laser Imaging Detection and Ranging) device. The imaging unit 7410 and the vehicle exterior information detection section 7420 may each be provided as an independent sensor or device, or may be provided as a device in which a plurality of sensors or devices are integrated.
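 For reference, the distance information provided by the sensors mentioned above can be illustrated with the standard time-of-flight and stereo-triangulation relations. The short Python sketch below is only an illustration and is not part of the embodiment; the constant, the function names, the focal length, and the baseline are assumptions introduced here.

C = 299_792_458.0  # speed of light in a vacuum [m/s]

def tof_distance(round_trip_time_s):
    # Direct time-of-flight: the emitted light travels to the subject and
    # back, so the one-way distance is c * t / 2.
    return C * round_trip_time_s / 2.0

def stereo_distance(focal_px, baseline_m, disparity_px):
    # Stereo triangulation: Z = f * B / d, with the focal length f in
    # pixels, the baseline B in metres, and the disparity d in pixels.
    if disparity_px <= 0.0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

print(tof_distance(2.0e-7))                # 200 ns round trip -> about 30 m
print(stereo_distance(1000.0, 1.0, 20.0))  # f = 1000 px, B = 1 m, d = 20 px -> 50 m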
 Here, FIG. 28 shows an example of the installation positions of the imaging unit 7410 and the vehicle exterior information detection section 7420. Imaging units 7910, 7912, 7914, 7916, and 7918 are provided at, for example, at least one of the positions of the front nose, the side mirrors, the rear bumper, the back door, and the upper part of the windshield in the vehicle interior of a vehicle 7900. The imaging unit 7910 provided on the front nose and the imaging unit 7918 provided on the upper part of the windshield in the vehicle interior mainly acquire images in front of the vehicle 7900. The imaging units 7912 and 7914 provided on the side mirrors mainly acquire images of the sides of the vehicle 7900. The imaging unit 7916 provided on the rear bumper or the back door mainly acquires images behind the vehicle 7900. The imaging unit 7918 provided on the upper part of the windshield in the vehicle interior is mainly used to detect a preceding vehicle, a pedestrian, an obstacle, a traffic light, a traffic sign, a lane, and the like.
 Note that FIG. 28 shows an example of the imaging ranges of the imaging units 7910, 7912, 7914, and 7916. The imaging range a indicates the imaging range of the imaging unit 7910 provided on the front nose, the imaging ranges b and c indicate the imaging ranges of the imaging units 7912 and 7914 provided on the side mirrors, respectively, and the imaging range d indicates the imaging range of the imaging unit 7916 provided on the rear bumper or the back door. For example, by superimposing the image data captured by the imaging units 7910, 7912, 7914, and 7916, an overhead image of the vehicle 7900 as viewed from above is obtained.
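 One common way to obtain such an overhead image is to warp each camera image onto the ground plane with a pre-calibrated homography and blend the overlapping regions. The following Python sketch shows only that general idea; the homographies and the output size are assumed to come from an offline extrinsic calibration and are not specified by the present disclosure.

import cv2
import numpy as np

def birds_eye_view(images, homographies, out_size=(800, 800)):
    # Warp each camera image onto a common ground-plane coordinate system
    # and average the overlapping regions into a single overhead image.
    acc = np.zeros((out_size[1], out_size[0], 3), np.float32)
    weight = np.zeros((out_size[1], out_size[0], 1), np.float32)
    for img, H in zip(images, homographies):
        warped = cv2.warpPerspective(img, H, out_size).astype(np.float32)
        mask = (warped.sum(axis=2, keepdims=True) > 0).astype(np.float32)
        acc += warped
        weight += mask
    return (acc / np.maximum(weight, 1.0)).astype(np.uint8)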
 The vehicle exterior information detection sections 7920, 7922, 7924, 7926, 7928, and 7930 provided on the front, rear, sides, and corners of the vehicle 7900 and on the upper part of the windshield in the vehicle interior may be, for example, ultrasonic sensors or radar devices. The vehicle exterior information detection sections 7920, 7926, and 7930 provided on the front nose, the rear bumper, the back door, and the upper part of the windshield in the vehicle interior of the vehicle 7900 may be, for example, LIDAR devices. These vehicle exterior information detection sections 7920 to 7930 are mainly used to detect a preceding vehicle, a pedestrian, an obstacle, and the like.
 Returning to FIG. 27, the description will be continued. The vehicle exterior information detection unit 7400 causes the imaging unit 7410 to capture an image outside the vehicle and receives the captured image data. The vehicle exterior information detection unit 7400 also receives detection information from the connected vehicle exterior information detection section 7420. When the vehicle exterior information detection section 7420 is an ultrasonic sensor, a radar device, or a LIDAR device, the vehicle exterior information detection unit 7400 transmits ultrasonic waves, electromagnetic waves, or the like, and receives information on the received reflected waves. Based on the received information, the vehicle exterior information detection unit 7400 may perform object detection processing or distance detection processing for a person, a car, an obstacle, a sign, characters on a road surface, or the like. Based on the received information, the vehicle exterior information detection unit 7400 may perform environment recognition processing for recognizing rainfall, fog, road surface conditions, or the like. The vehicle exterior information detection unit 7400 may also calculate the distance to an object outside the vehicle based on the received information.
 Further, based on the received image data, the vehicle exterior information detection unit 7400 may perform image recognition processing or distance detection processing for recognizing a person, a car, an obstacle, a sign, characters on a road surface, or the like. The vehicle exterior information detection unit 7400 may perform processing such as distortion correction or alignment on the received image data, and may combine image data captured by different imaging units 7410 to generate an overhead image or a panoramic image. The vehicle exterior information detection unit 7400 may also perform viewpoint conversion processing using image data captured by different imaging units 7410.
 The vehicle interior information detection unit 7500 detects information inside the vehicle. For example, a driver state detection unit 7510 that detects the state of the driver is connected to the vehicle interior information detection unit 7500. The driver state detection unit 7510 may include a camera that images the driver, a biometric sensor that detects biometric information of the driver, a microphone that collects sound in the vehicle interior, and the like. The biometric sensor is provided, for example, on a seat surface, the steering wheel, or the like, and detects biometric information of an occupant sitting on a seat or of the driver holding the steering wheel. Based on the detection information input from the driver state detection unit 7510, the vehicle interior information detection unit 7500 may calculate the degree of fatigue or the degree of concentration of the driver, or may determine whether the driver is dozing off. The vehicle interior information detection unit 7500 may also perform processing such as noise canceling on the collected audio signal.
 The integrated control unit 7600 controls the overall operation within the vehicle control system 7000 in accordance with various programs. An input unit 7800 is connected to the integrated control unit 7600. The input unit 7800 is realized by a device that can be operated by an occupant for input, such as a touch panel, buttons, a microphone, switches, or levers. Data obtained by performing voice recognition on speech input through the microphone may be input to the integrated control unit 7600. The input unit 7800 may be, for example, a remote control device using infrared rays or other radio waves, or an externally connected device such as a mobile phone or a PDA (Personal Digital Assistant) compatible with the operation of the vehicle control system 7000. The input unit 7800 may also be, for example, a camera, in which case the occupant can input information by gesture. Alternatively, data obtained by detecting the movement of a wearable device worn by the occupant may be input. Furthermore, the input unit 7800 may include, for example, an input control circuit that generates an input signal based on the information input by the occupant or the like using the above-described input unit 7800 and outputs the input signal to the integrated control unit 7600. By operating the input unit 7800, the occupant or the like inputs various data to the vehicle control system 7000 and instructs it to perform processing operations.
 The storage unit 7690 may include a ROM (Read Only Memory) that stores various programs executed by the microcomputer, and a RAM (Random Access Memory) that stores various parameters, calculation results, sensor values, and the like. The storage unit 7690 may also be realized by a magnetic storage device such as an HDD (Hard Disc Drive), a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.
 The general-purpose communication I/F 7620 is a general-purpose communication I/F that mediates communication with various devices existing in an external environment 7750. The general-purpose communication I/F 7620 may implement a cellular communication protocol such as GSM (Global System of Mobile communications), WiMAX, LTE (Long Term Evolution), or LTE-A (LTE-Advanced), or another wireless communication protocol such as a wireless LAN (also referred to as Wi-Fi (registered trademark)) or Bluetooth (registered trademark). The general-purpose communication I/F 7620 may connect to a device (for example, an application server or a control server) existing on an external network (for example, the Internet, a cloud network, or an operator-specific network) via, for example, a base station or an access point. The general-purpose communication I/F 7620 may also connect to a terminal existing in the vicinity of the vehicle (for example, a terminal of the driver, a pedestrian, or a store, or an MTC (Machine Type Communication) terminal) using, for example, P2P (Peer To Peer) technology.
 The dedicated communication I/F 7630 is a communication I/F that supports a communication protocol formulated for use in vehicles. The dedicated communication I/F 7630 may implement a standard protocol such as WAVE (Wireless Access in Vehicle Environment), which is a combination of the lower-layer IEEE 802.11p and the upper-layer IEEE 1609, DSRC (Dedicated Short Range Communications), or a cellular communication protocol. The dedicated communication I/F 7630 typically performs V2X communication, a concept that includes one or more of vehicle-to-vehicle communication, vehicle-to-infrastructure communication, vehicle-to-home communication, and vehicle-to-pedestrian communication.
 The positioning unit 7640 receives, for example, a GNSS signal from a GNSS (Global Navigation Satellite System) satellite (for example, a GPS signal from a GPS (Global Positioning System) satellite), performs positioning, and generates position information including the latitude, longitude, and altitude of the vehicle. Note that the positioning unit 7640 may specify the current position by exchanging signals with a wireless access point, or may acquire the position information from a terminal having a positioning function, such as a mobile phone, a PHS, or a smartphone.
 The beacon receiving unit 7650 receives, for example, radio waves or electromagnetic waves transmitted from a radio station or the like installed on the road, and acquires information such as the current position, traffic congestion, road closures, or required travel time. Note that the function of the beacon receiving unit 7650 may be included in the dedicated communication I/F 7630 described above.
 The in-vehicle device I/F 7660 is a communication interface that mediates the connection between the microcomputer 7610 and various in-vehicle devices 7760 existing in the vehicle. The in-vehicle device I/F 7660 may establish a wireless connection using a wireless communication protocol such as a wireless LAN, Bluetooth (registered trademark), NFC (Near Field Communication), or WUSB (Wireless USB). The in-vehicle device I/F 7660 may also establish a wired connection such as USB (Universal Serial Bus), HDMI (High-Definition Multimedia Interface), or MHL (Mobile High-definition Link) via a connection terminal (and, if necessary, a cable) not shown. The in-vehicle devices 7760 may include, for example, at least one of a mobile device or wearable device possessed by an occupant, and an information device carried into or attached to the vehicle. The in-vehicle devices 7760 may also include a navigation device that searches for a route to an arbitrary destination. The in-vehicle device I/F 7660 exchanges control signals or data signals with these in-vehicle devices 7760.
 The in-vehicle network I/F 7680 is an interface that mediates communication between the microcomputer 7610 and the communication network 7010. The in-vehicle network I/F 7680 transmits and receives signals and the like in accordance with a predetermined protocol supported by the communication network 7010.
 The microcomputer 7610 of the integrated control unit 7600 controls the vehicle control system 7000 in accordance with various programs based on information acquired via at least one of the general-purpose communication I/F 7620, the dedicated communication I/F 7630, the positioning unit 7640, the beacon receiving unit 7650, the in-vehicle device I/F 7660, and the in-vehicle network I/F 7680. For example, the microcomputer 7610 may calculate a control target value of the driving force generation device, the steering mechanism, or the braking device based on the acquired information inside and outside the vehicle, and output a control command to the drive system control unit 7100. For example, the microcomputer 7610 may perform cooperative control for the purpose of realizing ADAS (Advanced Driver Assistance System) functions, including collision avoidance or impact mitigation of the vehicle, following traveling based on the inter-vehicle distance, vehicle-speed-maintaining traveling, vehicle collision warning, and vehicle lane departure warning. The microcomputer 7610 may also perform cooperative control for the purpose of automated driving or the like, in which the vehicle travels autonomously without depending on the driver's operation, by controlling the driving force generation device, the steering mechanism, the braking device, or the like based on the acquired information on the surroundings of the vehicle.
 The microcomputer 7610 may generate three-dimensional distance information between the vehicle and objects such as surrounding structures and persons based on information acquired via at least one of the general-purpose communication I/F 7620, the dedicated communication I/F 7630, the positioning unit 7640, the beacon receiving unit 7650, the in-vehicle device I/F 7660, and the in-vehicle network I/F 7680, and may create local map information including peripheral information on the current position of the vehicle. The microcomputer 7610 may also predict dangers such as a collision of the vehicle, the approach of a pedestrian or the like, or entry onto a closed road based on the acquired information, and generate a warning signal. The warning signal may be, for example, a signal for generating a warning sound or lighting a warning lamp.
 The audio image output unit 7670 transmits an output signal of at least one of audio and image to an output device capable of visually or audibly notifying an occupant of the vehicle or the outside of the vehicle of information. In the example of FIG. 27, an audio speaker 7710, a display unit 7720, and an instrument panel 7730 are illustrated as the output devices. The display unit 7720 may include, for example, at least one of an on-board display and a head-up display. The display unit 7720 may have an AR (Augmented Reality) display function. The output device may also be a device other than these, such as headphones, a wearable device such as a glasses-type display worn by an occupant, a projector, or a lamp. When the output device is a display device, the display device visually displays the results obtained by the various processes performed by the microcomputer 7610, or the information received from other control units, in various formats such as text, images, tables, and graphs. When the output device is an audio output device, the audio output device converts an audio signal composed of reproduced audio data, acoustic data, or the like into an analog signal and outputs it audibly.
 Note that, in the example shown in FIG. 27, at least two control units connected via the communication network 7010 may be integrated into one control unit. Alternatively, an individual control unit may be composed of a plurality of control units. Furthermore, the vehicle control system 7000 may include another control unit not shown. In the above description, some or all of the functions performed by any of the control units may be given to another control unit. That is, as long as information is transmitted and received via the communication network 7010, predetermined arithmetic processing may be performed by any of the control units. Similarly, a sensor or device connected to one of the control units may be connected to another control unit, and a plurality of control units may mutually transmit and receive detection information via the communication network 7010.
 A computer program for realizing the functions of the detection devices 100 to 800 according to the present embodiment described with reference to FIGS. 1, 6, 9, 12, 15, 18, 20, and 22 can be implemented in any of the control units or the like. A computer-readable recording medium storing such a computer program can also be provided. The recording medium is, for example, a magnetic disk, an optical disk, a magneto-optical disk, a flash memory, or the like. The above computer program may also be distributed via, for example, a network without using a recording medium.
 In the vehicle control system 7000 described above, the detection devices 100 to 800 according to the present embodiment described with reference to FIGS. 1, 6, 9, 12, 15, 18, 20, and 22 can be applied to the integrated control unit 7600 of the application example shown in FIG. 27.
 At least some of the components of the detection devices 100 to 800 according to the present embodiment described with reference to FIGS. 1, 6, 9, 12, 15, 18, 20, and 22 may be realized in a module for the integrated control unit 7600 shown in FIG. 27 (for example, an integrated circuit module composed of one die). Alternatively, the detection devices 100 to 800 according to the present embodiment described with reference to FIGS. 1, 6, 9, 12, 15, 18, 20, and 22 may be realized by a plurality of control units of the vehicle control system 7000 shown in FIG. 27.
 The effects described in the present specification are merely examples and are not limiting, and other effects may be obtained.
 In addition, the present technology can also take the following configurations.
(1)
 A detection device comprising:
 an acquisition unit that acquires distance information regarding a distance to a subject;
 a setting unit that sets, from the distance information and a feature amount of an object to be detected, a region where the object may be captured; and
 a determination unit that determines whether or not an image in the region is the object.
(2)
 The detection device according to (1), wherein
 the setting unit uses, as the feature amount of the object, the size of the object at a predetermined distance, and sets a frame corresponding to the size of the object according to the distance at a pixel set as a processing target, and
 the determination unit determines whether or not an image in the frame is the object.
(3)
 The detection device according to (1) or (2), further comprising a direction detection unit that detects, from the distance information, a direction in which the object is facing.
(4)
 The detection device according to (3), further comprising:
 an imaging unit that captures an image using ambient light; and
 a recognition unit that performs detailed recognition of the object using the image captured by the imaging unit and at least one of the size of the object set by the setting unit, the image in the region determined to be the object by the determination unit, and the direction of the object detected by the direction detection unit.
(5)
 A detection device comprising:
 an acquisition unit that acquires distance information regarding a distance to a subject;
 a setting unit that sets, using the distance information, a region where a predetermined object may be captured; and
 an estimation unit that estimates a category to which the object belongs from the size of the region and the distance information.
(6)
 The detection device according to (5), wherein
 the setting unit sets, as the region where the object may be captured, the area extending up to a portion where the distance information changes, and
 the estimation unit estimates that the category to which an object that would have the size of the region at the distance represented by the distance information in the region belongs is the category to which the object belongs.
(7)
 The detection device according to (5) or (6), further comprising a shape estimation unit that estimates, using the distance information, the shape of the object in the region set by the setting unit.
(8)
 The detection device according to (7), wherein the estimation unit estimates the category using at least one of the distance information, the size of the region, and the shape.
(9)
 The detection device according to (7), further comprising:
 an imaging unit that captures an image using ambient light; and
 a recognition unit that performs detailed recognition of the object using the image captured by the imaging unit and at least one of the size of the region set by the setting unit, the category estimated by the estimation unit, and the shape estimated by the shape estimation unit.
(10)
 The detection device according to any one of (1) to (9), wherein the acquisition unit acquires the distance information using a TOF sensor, a stereo camera, an ultrasonic sensor, or a millimeter wave radar.
(11)
 A detection method comprising the steps of:
 acquiring distance information regarding a distance to a subject;
 setting, from the distance information and a feature amount of an object to be detected, a region where the object may be captured; and
 determining whether or not an image in the region is the object.
(12)
 A detection method comprising the steps of:
 acquiring distance information regarding a distance to a subject;
 setting, using the distance information, a region where a predetermined object may be captured; and
 estimating a category to which the object belongs from the size of the region and the distance information.
(13)
 A program for causing a computer to execute processing comprising the steps of:
 acquiring distance information regarding a distance to a subject;
 setting, from the distance information and a feature amount of an object to be detected, a region where the object may be captured; and
 determining whether or not an image in the region is the object.
(14)
 A program for causing a computer to execute processing comprising the steps of:
 acquiring distance information regarding a distance to a subject;
 setting, using the distance information, a region where a predetermined object may be captured; and
 estimating a category to which the object belongs from the size of the region and the distance information.
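 As an illustration of configurations (1), (2), (5), and (6) above, the following Python sketch sets a candidate frame from per-pixel distance and a real-size database, judges the frame contents, and estimates a category from the region size and its distance. The real-size table, the pinhole focal length, and the classifier are assumptions introduced only for this sketch and are not taken from the embodiments.

import numpy as np

REAL_SIZE_DB = {"person": (0.5, 1.7)}                         # assumed physical size (w, h) in metres
CATEGORY_SIZE_DB = {"person": (0.5, 1.7), "car": (1.8, 1.5)}  # assumed category sizes in metres
FOCAL_LENGTH_PX = 1000.0                                      # assumed pinhole focal length in pixels

def frame_for_pixel(depth_map, x, y, target="person"):
    # Configuration (2): set a frame whose pixel size corresponds to the
    # size the target object would have at the distance of pixel (x, y).
    depth_map = np.asarray(depth_map)
    z = float(depth_map[y, x])                                # distance in metres
    w_m, h_m = REAL_SIZE_DB[target]
    w_px = int(FOCAL_LENGTH_PX * w_m / z)                     # pinhole projection: pixels = f * size / Z
    h_px = int(FOCAL_LENGTH_PX * h_m / z)
    return (x - w_px // 2, y - h_px // 2, w_px, h_px)

def is_target(image, frame, classifier):
    # Configuration (1): judge whether the image inside the frame is the
    # object; `classifier` stands in for any feature-based judgment.
    x, y, w, h = frame
    patch = np.asarray(image)[max(y, 0):y + h, max(x, 0):x + w]
    return classifier(patch)

def estimate_category(region_w_px, region_h_px, z):
    # Configurations (5) and (6): convert the region size back to a
    # physical size using its distance and pick the closest known category.
    w_m = region_w_px * z / FOCAL_LENGTH_PX
    h_m = region_h_px * z / FOCAL_LENGTH_PX
    return min(CATEGORY_SIZE_DB,
               key=lambda c: abs(CATEGORY_SIZE_DB[c][0] - w_m)
                             + abs(CATEGORY_SIZE_DB[c][1] - h_m))

 Under these assumptions, a region of about 40 x 130 pixels observed at 13 m maps back to roughly 0.5 m x 1.7 m and is therefore assigned the "person" category.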
 100 detection device, 111 distance information acquisition unit, 112 subject feature extraction unit, 113 subject candidate area detection unit, 114 actual size database, 200 detection device, 211 imaging unit, 212 subject detail recognition unit, 300 detection device, 311 subject direction detection unit, 400 detection device, 411 imaging unit, 412 subject detail recognition unit, 500 detection device, 511 subject size estimation unit, 512 subject category estimation unit, 600 detection device, 611 imaging unit, 612 subject detail recognition unit, 700 detection device, 711 subject shape estimation unit, 712 subject category estimation unit, 800 detection device, 811 imaging unit, 812 subject detail recognition unit

Claims (14)

  1.  A detection device comprising:
      an acquisition unit that acquires distance information regarding a distance to a subject;
      a setting unit that sets, from the distance information and a feature amount of an object to be detected, a region where the object may be captured; and
      a determination unit that determines whether or not an image in the region is the object.
  2.  The detection device according to claim 1, wherein
      the setting unit uses, as the feature amount of the object, the size of the object at a predetermined distance, and sets a frame corresponding to the size of the object according to the distance at a pixel set as a processing target, and
      the determination unit determines whether or not an image in the frame is the object.
  3.  The detection device according to claim 2, further comprising a direction detection unit that detects, from the distance information, a direction in which the object is facing.
  4.  The detection device according to claim 3, further comprising:
      an imaging unit that captures an image using ambient light; and
      a recognition unit that performs detailed recognition of the object using the image captured by the imaging unit and at least one of the size of the object set by the setting unit, the image in the region determined to be the object by the determination unit, and the direction of the object detected by the direction detection unit.
  5.  A detection device comprising:
      an acquisition unit that acquires distance information regarding a distance to a subject;
      a setting unit that sets, using the distance information, a region where a predetermined object may be captured; and
      an estimation unit that estimates a category to which the object belongs from the size of the region and the distance information.
  6.  The detection device according to claim 5, wherein
      the setting unit sets, as the region where the object may be captured, the area extending up to a portion where the distance information changes, and
      the estimation unit estimates that the category to which an object that would have the size of the region at the distance represented by the distance information in the region belongs is the category to which the object belongs.
  7.  The detection device according to claim 5, further comprising a shape estimation unit that estimates, using the distance information, the shape of the object in the region set by the setting unit.
  8.  The detection device according to claim 7, wherein the estimation unit estimates the category using at least one of the distance information, the size of the region, and the shape.
  9.  The detection device according to claim 7, further comprising:
      an imaging unit that captures an image using ambient light; and
      a recognition unit that performs detailed recognition of the object using the image captured by the imaging unit and at least one of the size of the region set by the setting unit, the category estimated by the estimation unit, and the shape estimated by the shape estimation unit.
  10.  The detection device according to claim 1, wherein the acquisition unit acquires the distance information using a TOF sensor, a stereo camera, an ultrasonic sensor, or a millimeter wave radar.
  11.  A detection method comprising the steps of:
      acquiring distance information regarding a distance to a subject;
      setting, from the distance information and a feature amount of an object to be detected, a region where the object may be captured; and
      determining whether or not an image in the region is the object.
  12.  A detection method comprising the steps of:
      acquiring distance information regarding a distance to a subject;
      setting, using the distance information, a region where a predetermined object may be captured; and
      estimating a category to which the object belongs from the size of the region and the distance information.
  13.  A program for causing a computer to execute processing comprising the steps of:
      acquiring distance information regarding a distance to a subject;
      setting, from the distance information and a feature amount of an object to be detected, a region where the object may be captured; and
      determining whether or not an image in the region is the object.
  14.  A program for causing a computer to execute processing comprising the steps of:
      acquiring distance information regarding a distance to a subject;
      setting, using the distance information, a region where a predetermined object may be captured; and
      estimating a category to which the object belongs from the size of the region and the distance information.
PCT/JP2017/015212 2016-04-28 2017-04-14 Detection device, detection method, and program WO2017188017A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016-091357 2016-04-28
JP2016091357A JP2017199278A (en) 2016-04-28 2016-04-28 Detection device, detection method, and program

Publications (1)

Publication Number Publication Date
WO2017188017A1

Family ID: 60161592

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/015212 WO2017188017A1 (en) 2016-04-28 2017-04-14 Detection device, detection method, and program

Country Status (2)

Country Link
JP (1) JP2017199278A (en)
WO (1) WO2017188017A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112470231A (en) * 2018-07-26 2021-03-09 Sony Corporation Information processing apparatus, information processing method, and program

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019202670A1 (en) * 2018-04-17 2019-10-24 Socionext Inc. Gesture recognition method and gesture recognition device
JP7046786B2 (en) * 2018-12-11 2022-04-04 Hitachi, Ltd. Machine learning systems, domain converters, and machine learning methods
JP7489225B2 (en) 2020-04-20 2024-05-23 Metawater Co., Ltd. Image processing system, information processing apparatus, program, and image processing method
CN112633218B (en) * 2020-12-30 2023-10-13 Shenzhen Ubtech Technology Co., Ltd. Face detection method, face detection device, terminal equipment and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006145352A (en) * 2004-11-18 2006-06-08 Matsushita Electric Works Ltd Image processor
JP2010020404A (en) * 2008-07-08 2010-01-28 Toshiba Corp Image processor and method thereof
JP2010165183A (en) * 2009-01-15 2010-07-29 Panasonic Electric Works Co Ltd Human body detection device
JP2012243050A (en) * 2011-05-19 2012-12-10 Fuji Heavy Ind Ltd Environment recognition device and environment recognition method
JP2014106732A (en) * 2012-11-27 2014-06-09 Sony Computer Entertainment Inc Information processor and information processing method


Also Published As

Publication number Publication date
JP2017199278A (en) 2017-11-02


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17789303

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17789303

Country of ref document: EP

Kind code of ref document: A1