WO2019016879A1 - Object detection device and object detection method - Google Patents

Object detection device and object detection method

Info

Publication number
WO2019016879A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
object detection
information
candidate area
identification
Prior art date
Application number
PCT/JP2017/026036
Other languages
French (fr)
Japanese (ja)
Inventor
亮祐 三木
聡 笹谷
誠也 伊藤
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所 filed Critical 株式会社日立製作所
Priority to PCT/JP2017/026036 priority Critical patent/WO2019016879A1/en
Priority to JP2019530278A priority patent/JP6802923B2/en
Publication of WO2019016879A1 publication Critical patent/WO2019016879A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • The present invention relates to an object detection device and an object detection method that realize robust object detection even when the installation state of the camera changes, or when the appearance of the detection target changes due to movement of the camera or of the detection target.
  • There is a strong need for object detection techniques that detect a target object (for example, a person, cargo, or a vehicle) from image information acquired by an imaging device such as a surveillance camera.
  • Typical techniques include background subtraction, in which a background image containing no detection target is prepared in advance and objects are detected by comparing the input captured image with that background image, and optical flow, which detects moving bodies from differences between feature points across video frames; however, such methods detect every moving object in the image and cannot, for example, extract only a specific target.
  • Patent Document 1 describes, in paragraph 0034, an identification unit that "identifies as a person an image that is determined to be a person from contour information obtained from appearance-based feature amounts by HOG, and that is also determined to be foreground (moving or static state) from feature amounts based on spatiotemporal features obtained by pixel-state analysis".
  • In addition to this description, Patent Document 1 discloses a technique that realizes person detection using a means for extracting the contour information of a person from learning samples consisting of images that include a person and images that do not, and generating a classifier that distinguishes persons from non-persons, together with a means for determining, using that classifier, whether or not a person exists in a predetermined region of an image.
  • Patent Document 2 describes, in paragraph 0016, that "the deformation detection area 2100 on the monitoring image 2000 reflects information on each parameter of the camera device and, as shown in FIG. 3, is created in consideration of the distortion of the monitoring image 2000. The object recognition apparatus 1 (1a) then extracts a feature amount from the image information 100 of the deformation detection area 2100, which is created as an area containing a recognition target deformed by distortion or the like, and determines whether or not it is the target object."
  • In addition to this description, Patent Document 2 discloses, as an application of the technique of Patent Document 1, a technique that assumes the detection target in the image is deformed by the lens distortion peculiar to the camera, and improves the detection rate by deforming the predetermined region before it is input to the classifier that determines whether or not a person exists.
  • In Patent Document 1, a classifier is trained with person images of a specific posture (for example, images of an upright posture photographed from the front) as learning samples, and persons in that specific posture are detected using this classifier, which raises the detection rate for persons in that posture.
  • In Patent Document 2, as shown in FIG. 13 and elsewhere in that document, the contour information of the person input to the classifier is deformed (normalized), based on the parameter information of the camera device and the positional relationship between the detection target and the camera device, so that it matches a preset specific posture; this makes it possible to maintain the classifier's detection rate even when the person's posture changes within a certain range.
  • However, when the appearance of the detection target differs greatly from what is assumed, or when part of the detection target is hidden behind an occluding object, the object recognition method of Patent Document 2 has the problem that the detection rate drops significantly.
  • For example, even when a person can easily be detected from an image containing all of the person's head, arms, torso, and legs, the classifier of Patent Document 2 suffers a large drop in the person detection rate for an image of the person photographed from directly above, or an image in which the lower half of the body is hidden by an occluding object, because the legs cannot be detected in such images.
  • In order to solve such problems, an object of the present invention is to provide an object detection device capable of realizing highly accurate person detection even from captured images containing a person in a posture not covered by the classifier, or from images captured with part of the human body hidden in the shadow of an obstacle.
  • An object detection device according to the present invention is an object detection device that determines whether or not a detection target exists in a measurement range, and comprises: a three-dimensional information acquisition unit that acquires three-dimensional information in the measurement range based on input from an imaging device; an identification candidate area extraction unit that extracts identification candidate areas in which the detection target may exist; classifiers used to detect the detection target; a classifier information acquisition unit that acquires information on the classifiers; an image conversion method determination unit that determines parameters for virtually applying viewpoint conversion processing to the three-dimensional information in an identification candidate area; an image conversion execution unit that generates a converted image from the virtually viewpoint-converted three-dimensional information in the identification candidate area; and an identification unit that detects the detection target using a classifier based on the converted image.
  • According to the object detection device of the present invention, the target object can be detected with high accuracy even when using an image in which the relative position of the camera device and the object differs greatly from what is assumed, or an image in which part of the object is occluded.
  • FIG. 1 is a diagram showing a configuration example of the object detection device of the first embodiment.
  • FIG. 2 is a diagram showing details of the identification candidate area extraction unit of the first embodiment.
  • FIG. 3A is a diagram showing details of the identification candidate area information management unit of the first embodiment; FIG. 3B is a diagram showing an identification candidate area in a two-dimensional image; FIG. 3C is a diagram showing an identification candidate area in the three-dimensional imaging space.
  • FIG. 4 is a diagram showing details of the classifier of the first embodiment.
  • FIG. 5 is a diagram showing details of the image conversion method determination unit of the first embodiment.
  • FIG. 6 is a diagram explaining the processing content of viewpoint conversion in the first embodiment; FIGS. 7A and 7B are diagrams explaining the effect of the image conversion method determination unit.
  • FIG. 8 is a diagram showing details of the identification unit in the configuration example of the first embodiment.
  • FIG. 9 is a diagram showing an example of the processing flow in the first embodiment.
  • FIG. 10 is a diagram showing a configuration example of the object detection device of the second embodiment.
  • FIG. 11 is a diagram explaining the processing of the image conversion method determination unit of the second embodiment.
  • FIG. 12 is a diagram explaining the processing flow of the image conversion method determination unit of the second embodiment; FIG. 13 is a diagram explaining the details of the processing flow of FIG. 12.
  • In the following, an example in which the detection target is a person is described, but the detection target is not limited to a person and may be cargo, a vehicle, or the like.
  • Likewise, the information containing the detection target is not limited to image information captured by an imaging device such as a camera; it may be, for example, a heat map acquired by a thermal sensor.
  • The object detection device 2a of the first embodiment will be described with reference to FIGS. 1 to 9.
  • FIG. 1 is a block diagram showing an outline of an object detection device 2a of the present embodiment connected to an imaging device 1 such as a stereo camera.
  • The object detection device 2a is an object detection device that realizes robust detection of the detection target even when the appearance of the detection target on the captured image of the imaging device 1 changes due to a change in the relative position between the imaging device 1 and the detection target.
  • In the object detection device 2a shown in FIG. 1, reference numeral 3 denotes an image acquisition unit that acquires image information within the measurement range based on input from the imaging device 1; 4 denotes a three-dimensional information acquisition unit that acquires three-dimensional information within the measurement range based on input from the imaging device 1; 5 denotes an identification candidate area extraction unit that extracts, from the measurement range, identification candidate areas in which a detection target may exist, using the image information and the three-dimensional information; 6 denotes a classifier information acquisition unit that acquires information on the classifiers 64 used in the object detection device 2a; 7a denotes an image conversion method determination unit that uses the classifier information to determine how to convert an identification candidate area into an image optimal as input to a classifier 64; 8 denotes an image conversion unit that acquires a converted image from the identification candidate area based on the determined image conversion method; and 9 denotes an identification unit that determines whether or not a detection target is included in the converted image.
  • Some or all of the units from the image acquisition unit 3 to the identification unit 9 need not be dedicated hardware; they may be realized by having an arithmetic device such as a CPU process programs stored in a main storage device such as semiconductor memory and data stored in an auxiliary storage device such as a hard disk.
  • The imaging device 1 is a device capable of acquiring image information and three-dimensional information of the measurement range.
  • Here, image information means luminance information of digital image data, and three-dimensional information means coordinate information of a three-dimensional point group in the measurement range (three-dimensional space).
  • The imaging device 1 may be a stereo camera composed of two or more cameras, or a combination of one camera and a distance sensor capable of acquiring three-dimensional information.
  • For example, a stereo camera photographs the same subject with two or more cameras and measures the distance from the camera to the subject using the principle of triangulation, so it can acquire both image information and three-dimensional information.
  • A distance sensor measures the distance to the target by calculating, from the phase difference between the projected light and the reflected light, the time taken for the projected light to be reflected by the target and return to the sensor; by combining it with a pre-calibrated camera, three-dimensional information and image information can be acquired in association with each other (a brief sketch of the stereo triangulation computation follows this item).
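As a hedged illustration of the triangulation principle mentioned above (standard stereo geometry, not equations taken from this publication; the function name and example values are assumptions), depth can be recovered from the disparity between two rectified cameras:

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth Z of a point seen by a rectified stereo pair: Z = f * B / d.
    disparity_px: horizontal pixel offset of the same point between the two images.
    focal_length_px: focal length expressed in pixels.
    baseline_m: distance between the two camera centers in meters.
    """
    d = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        # zero disparity corresponds to a point at infinity
        return np.where(d > 0, focal_length_px * baseline_m / d, np.inf)

# Example: f = 700 px, baseline = 0.12 m, disparity = 35 px  ->  Z = 2.4 m
print(depth_from_disparity(35.0, 700.0, 0.12))
```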
  • FIG. 2 shows the details of the identification candidate area extraction unit 5.
  • The identification candidate area extraction unit 5 extracts identification candidate areas 55 in which a detection target may exist, using the image information acquired by the image acquisition unit 3, the three-dimensional information acquired by the three-dimensional information acquisition unit 4, or both.
  • It comprises an image processing unit 51 that extracts identification candidate areas 55 using the image information, a three-dimensional information processing unit 52 that extracts identification candidate areas 55 using the three-dimensional information, an identification candidate area ID assigning unit 53 that assigns an ID to each of the one or more extracted identification candidate areas 55, and an identification candidate area information management unit 54 that acquires and manages identification candidate area information representing the position of each identification candidate area 55.
  • the image processing unit 51, the three-dimensional information processing unit 52, the identification candidate area ID assignment unit 53, and the identification candidate area information management unit 54 will be described in detail.
  • the image processing unit 51 extracts an identification candidate area 55 by performing image processing on the image information acquired by the imaging device 1.
  • Examples of the image processing executed here include background subtraction, in which a background image of the imaging space with no detection target present is acquired in advance and the difference between that background image and the captured image is computed (a brief sketch follows this item).
  • However, the method is not particularly limited as long as it can extract candidate regions of the detection target from image information, such as detection using color information (for example, skin-color detection).
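A minimal sketch of such background subtraction as a candidate-region extractor, assuming OpenCV 4 and grayscale images; the threshold and minimum-area values are illustrative assumptions, not values from this publication:

```python
import cv2

def extract_candidate_regions(background_gray, frame_gray, thresh=30, min_area=500):
    """Return bounding boxes (x, y, w, h) of regions that differ from the background image."""
    diff = cv2.absdiff(frame_gray, background_gray)                # per-pixel difference
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)  # binarize the difference
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)          # suppress small noise blobs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```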
  • the three-dimensional information processing unit 52 extracts the identification candidate area 55 by performing three-dimensional processing on the three-dimensional information acquired by the imaging device 1.
  • As the three-dimensional processing executed here, for example, background three-dimensional information of the imaging space with no detection target present may be acquired in advance and the difference between that background three-dimensional information and newly acquired three-dimensional information computed; however, the method is not particularly limited as long as identification candidate areas 55 are obtained by processing the three-dimensional information.
  • identification candidate area ID assignment unit 53 and the identification candidate area information management unit 54 will be described with reference to FIGS. 3A to 3C.
  • The identification candidate area ID assigning unit 53 assigns an ID to each identification candidate area 55 extracted by the image processing unit 51 and the three-dimensional information processing unit 52. The identification candidate area information management unit 54 then adds position information of the identification candidate area to the ID and manages it as identification candidate area information 54_n.
  • the position information is an image position indicating the start point and the end point in the two-dimensional image of the identification candidate area, and a three-dimensional position indicating the start point and the end point in the three-dimensional imaging space of the identification candidate area.
  • FIG. 3A exemplifies n pieces of identification candidate area information 54_n managed by the identification candidate area information management unit 54.
  • For each ID, the image position and the three-dimensional position, which are the position information of the corresponding identification candidate area 55, are recorded.
  • FIG. 3B specifically shows the image position of the identification candidate area information 54_1; 56a and 56b indicate the start point (x1, y1) and the end point (x1', y1') of the rectangular identification candidate area 55 in the captured image of the imaging device 1.
  • FIG. 3C specifically shows the three-dimensional position of the identification candidate area information 54_1; 57a and 57b indicate the start point (X1, Y1, Z1) and the end point (X1', Y1', Z1') of the rectangular parallelepiped identification candidate area 55.
  • Although FIG. 3B and FIG. 3C illustrate a rectangular or rectangular parallelepiped identification candidate area 55, identification candidate areas of other shapes may be used as long as the representation can identify the position of the identification candidate area 55. In that case, the image position and three-dimensional position information in FIG. 3A are of course also expressed in accordance with the shape of that identification candidate area.
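As a hedged sketch of one way the identification candidate area information 54_n described above (ID plus image position and three-dimensional position) might be held in memory; the class and field names are assumptions for illustration, not part of this publication:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class CandidateAreaInfo:
    area_id: int                              # ID assigned by the ID assigning unit 53
    image_start: Tuple[int, int]              # start point (x1, y1) in the captured image
    image_end: Tuple[int, int]                # end point (x1', y1') in the captured image
    space_start: Tuple[float, float, float]   # start point (X1, Y1, Z1) in the imaging space
    space_end: Tuple[float, float, float]     # end point (X1', Y1', Z1') in the imaging space

# Example entry standing in for identification candidate area information 54_1 (made-up values)
info_54_1 = CandidateAreaInfo(1, (120, 80), (180, 240), (0.4, 0.0, 2.1), (0.9, 1.7, 2.5))
```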
  • <Classifier Information Acquisition Unit> Next, the classifier information acquisition unit 6 will be described with reference to FIG. 4.
  • the discriminator information acquisition unit 6 selects an appropriate one from a plurality of prepared discriminators 64 and extracts the discriminator information 65 corresponding thereto.
  • 67_n is a discriminator ID given to manage the discriminator 64_n.
  • The classifiers 64 are used in identification processing to determine whether a detection target is included in a captured image of the imaging device 1, and each classifier 64_n has high discrimination ability for detection targets in a different posture.
  • The classifiers 64_n can be given different characteristics by training them, with a machine learning method, on a large number of learning samples consisting of images that include the detection target and images that do not.
  • A Support Vector Machine is commonly used as the machine learning method, but other machine learning methods may also be used.
  • The classifier information 65_n indicates the kind of input image for which the classifier 64_n exhibits particularly high discrimination capability.
  • FIG. 4 exemplifies, as classifier information, a template 66_1 effective for identifying an image of a person viewed from the front, a template 66_2 effective for identifying an image of a person viewed from above, and a template 66_n effective for identifying an image of a person viewed from the side.
  • Other information, such as color information, feature information representing contours, luminance information, or gradient information, may also be recorded as classifier information, as long as it represents an image, or a method of generating an image, suitable as input to the classifier 64_n (an illustrative in-memory layout follows this item).
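Purely as an illustration of how the classifier IDs 67_n and the classifier information 65_n (reduced here to one template image 66_n per classifier) could be organized for lookup; the variable name, the use of plain template images, and the common 128x64 size are assumptions, and the zero-filled arrays merely stand in for real templates:

```python
import numpy as np

# classifier ID (67_n) -> template image (66_n) standing in for the classifier information 65_n;
# all templates are assumed normalized to one size so they can be compared with a converted image
classifier_templates = {
    1: np.zeros((128, 64), dtype=np.float32),  # effective for a person viewed from the front
    2: np.zeros((128, 64), dtype=np.float32),  # effective for a person viewed from above
    3: np.zeros((128, 64), dtype=np.float32),  # effective for a person viewed from the side
}
```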
  • The image conversion method determination unit 7a determines a conversion method (parameters and the like) for converting the identification candidate area into an image optimal as input to the classifier 64, based on the three-dimensional information in the rectangular parallelepiped identification candidate area 55 illustrated in FIG. 3C.
  • In step S51, parameters for viewpoint conversion are determined.
  • In step S52, a viewpoint-converted image is generated using those parameters.
  • The similarity between the converted image and the classifier information 65 held for each of the plurality of classifiers 64 is then calculated (S53); if the similarity is above the threshold the process ends, and if it is below the threshold the parameters are changed to other values and the process returns to step S51 (S54).
  • steps S51, S52, S53, and S54 will be described in detail.
  • In step S51, the parameters α, β, and γ necessary for generating a viewpoint-converted image are determined. The details of each parameter will be described later. One method of determining the parameters α, β, and γ in step S51 is to change them exhaustively.
  • In step S52, the process shown in FIG. 6 is performed.
  • In FIG. 6, 82 is a viewpoint for observing the identification candidate area 55; 83, 84, and 85 are the x-axis, y-axis, and z-axis of the coordinate system of the three-dimensional space set in the measurement range; and 86_1 and 86_2 show examples of images converted by viewpoint conversion.
  • In step S52, using the parameters α, β, and γ determined in step S51, the three-dimensional information contained in the rectangular parallelepiped identification candidate area 55 is rotated by α about the x-axis 83, by β about the y-axis 84, and by γ about the z-axis 85, thereby performing viewpoint conversion to a state observed from an arbitrary viewpoint, and a converted image 86 is acquired by projecting the identification candidate area 55 after viewpoint conversion onto an image plane.
  • As the viewpoint conversion method, it is common to use conversion equations such as Equations 1 to 3, but other viewpoint conversion methods may be used.
  • As the projection method, perspective projection is typical, but other methods may be used (a hedged sketch of a standard formulation follows this item).
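Equations 1 to 3 themselves are not reproduced in this text; as a hedged sketch under the assumption of an ideal pinhole model with focal length f, a standard way to write the rotation about the three axes and the subsequent perspective projection is:

$$ R(\alpha,\beta,\gamma)=R_z(\gamma)\,R_y(\beta)\,R_x(\alpha),\qquad R_x(\alpha)=\begin{pmatrix}1&0&0\\ 0&\cos\alpha&-\sin\alpha\\ 0&\sin\alpha&\cos\alpha\end{pmatrix},\; R_y(\beta)=\begin{pmatrix}\cos\beta&0&\sin\beta\\ 0&1&0\\ -\sin\beta&0&\cos\beta\end{pmatrix},\; R_z(\gamma)=\begin{pmatrix}\cos\gamma&-\sin\gamma&0\\ \sin\gamma&\cos\gamma&0\\ 0&0&1\end{pmatrix} $$

$$ \begin{pmatrix}X'\\ Y'\\ Z'\end{pmatrix}=R(\alpha,\beta,\gamma)\begin{pmatrix}X\\ Y\\ Z\end{pmatrix},\qquad u=f\,\frac{X'}{Z'},\quad v=f\,\frac{Y'}{Z'} $$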
  • For example, when an identification candidate area 55 containing the three-dimensional information of an upright person is photographed by the imaging device 1 installed in the direction of the viewpoint 82, projecting the identification candidate area 55 without changing the viewpoint yields the converted image 86_1, in which the person is viewed from the top.
  • Next, an optimization process is performed to determine the image conversion method that produces the image most suitable as input to the classifier 64.
  • Specifically, the template 66 is obtained with reference to the classifier information 65, and the similarity between the template and the converted image 86_n obtained by applying viewpoint conversion to the identification candidate area 55 is calculated.
  • As the method of calculating the similarity, pattern matching such as normalized cross-correlation is commonly used, for example, but other methods may be used (a minimal sketch follows this item).
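A minimal sketch of normalized cross-correlation between a converted image and a template of the same size (windowing and resizing details are omitted; this is an illustrative assumption, not the implementation of this publication):

```python
import numpy as np

def ncc_similarity(converted_image, template):
    """Normalized cross-correlation in [-1, 1]; both inputs must share the same shape."""
    a = converted_image.astype(np.float64).ravel()
    b = template.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```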
  • Alternatively, taking the similarity as the evaluation function, an evaluation function with the parameters α, β, and γ as variables may be designed, and the parameters that maximize the similarity to the classifier 64_n may be obtained by solving an optimization problem that maximizes this evaluation function.
  • In step S54, the similarity between the converted image 86_n calculated in step S53 and the classifier information 65 is compared with a threshold; if the similarity is equal to or greater than the threshold, the process ends, and if it is less than the threshold, the parameters are changed to other values, the process returns to step S51, and the same processing is repeated.
  • The installer of the object detection device 2a may set the threshold used in step S54 arbitrarily, but the threshold may also be adjusted to an appropriate value by feeding back the accuracy of object detection obtained when the object detection device 2a performs detection using a given threshold.
  • For example, when the accuracy of the object detection device 2a using a certain threshold is judged to be insufficient, the threshold may be changed to a higher value.
  • When determining the parameters in step S51, the image conversion method determination unit 7a may create in advance a matrix map recording the vertical-to-horizontal ratio of the converted image generated by each combination of the parameters α, β, and γ, and determine the parameters with reference to it. Alternatively, the imaging space may be divided into a plurality of regions, a matrix map holding parameters α, β, and γ that are approximately effective for each region may be prepared, and the parameters may be determined by referring to that matrix map. In that case, the matrix map may be updated when parameters more suitable than the held α, β, and γ are found. Alternatively, approximately effective parameters α, β, and γ may be determined by acquiring camera parameters and information on the installation state of the imaging device 1.
  • It may also be determined whether or not to continue the process of calculating the similarity while changing the parameters α, β, and γ; if it is to be continued, the process returns to step S51, and otherwise the process is terminated.
  • The criterion for this decision may be, for example, whether the number of times the parameters α, β, and γ have been changed exceeds a preset number.
  • Alternatively, the process may be terminated when the similarity calculated in step S53 is equal to or less than a preset minimum value. By terminating the process even when the similarity never exceeds the threshold, wasteful repetition of the object detection processing of the object detection device 2a can be prevented in the case where the identification candidate area 55 does not contain the detection target (a sketch of the overall search loop of steps S51 to S54 follows this item).
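Putting steps S51 to S54 together, an exhaustive search over the rotation parameters could look like the sketch below, reusing the ncc_similarity and classifier_templates sketches above. The step size, threshold, trial cap, and the helper render_viewpoint (which would rotate the 3D points of the candidate area and project them to a template-sized image) are all assumptions made for illustration:

```python
import itertools
import numpy as np

def determine_conversion(candidate_points_3d, classifier_templates, render_viewpoint,
                         threshold=0.7, step_deg=30, max_trials=200):
    """Try rotation parameters (alpha, beta, gamma) exhaustively and return the first
    combination whose converted image matches some classifier template well enough.
    Returns ((alpha, beta, gamma), classifier_id, similarity), or None if the trial cap is hit."""
    angles = np.deg2rad(np.arange(0, 360, step_deg))
    trials = 0
    for alpha, beta, gamma in itertools.product(angles, repeat=3):           # step S51
        if trials >= max_trials:                # termination by a preset number of parameter changes
            return None
        trials += 1
        converted = render_viewpoint(candidate_points_3d, alpha, beta, gamma)  # step S52
        for cid, template in classifier_templates.items():                     # step S53
            sim = ncc_similarity(converted, template)
            if sim >= threshold:                                                # step S54
                return (alpha, beta, gamma), cid, sim
    return None
```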
  • Next, the effects of the image conversion method determination unit 7a will be described using FIGS. 7A and 7B.
  • Here, 82a, 82b, and 82c indicate viewpoints (installation positions and orientations) of the imaging device 1, and 87a, 87b, and 87c indicate rectangular images containing a person extracted from the images captured from the respective viewpoints.
  • The rectangular image 87a captured from the viewpoint 82a has a high similarity with the template 66_1 of the classifier 64_1, and the rectangular image 87c captured from the viewpoint 82c has a high similarity with the template 66_2 of the classifier 64_2. Therefore, for the rectangular image 87a and the rectangular image 87c, a person can easily be detected by using the classifier 64_1 or the classifier 64_2, respectively.
  • For the rectangular image 87b captured from the viewpoint 82b, however, neither the classifier 64_1 nor the classifier 64_2 can identify the person. Therefore, it has conventionally been difficult to detect a person at a site where only an imaging device 1 at the viewpoint 82b is installed.
  • In contrast, in the present embodiment, the parameters α, β, and γ are determined, and a converted image 86b is created by applying a virtual viewpoint conversion to the rectangular image 87b using those parameters as input. The similarity between the converted image 86b and the templates 66_1 and 66_2 is then calculated, and when there is a classifier 64_n showing a similarity equal to or above the threshold, the ID 67 of that classifier is acquired. If there is no classifier 64_n showing a similarity equal to or above the threshold, the parameters α, β, and γ are determined again and the same processing is performed. Although the rectangular image 87b is deformed because of the tilt of the camera, image information of the front of the person and the corresponding three-dimensional information can be acquired.
  • Next, using FIG. 7B, the advantage of the image conversion method determination unit 7a will be described for a situation in which an obstacle exists between the viewpoint 82b and the person, so that part of the human body (for example, the legs) is not visible in the rectangular image 87b.
  • In this case, if a conversion toward a frontal view is chosen, the converted image 86b lacks the legs just as the rectangular image 87b does, so the person cannot be detected by the classifier 64_1, which requires the legs for identification.
  • However, if the rectangular image 87b is viewpoint-converted to the viewpoint 82c, the person can be detected by the classifier 64_2, which does not require the legs for identification, even though the converted image 86b also lacks the legs, just as the rectangular image 87b does.
  • the image conversion unit 8 converts the identification candidate area 55 in accordance with the image conversion method determined by the image conversion method determination unit 7a, and acquires a converted image 86 suitable for input to the classifier 64.
  • As the image conversion method, conversion equations such as Equations 1 to 3 can be used, as in step S52, but other methods may be used.
  • FIG. 8 shows the details of the identification unit 9.
  • The identification unit 9 determines whether or not the detection target is included in the converted image 86_n acquired by the image conversion unit 8, and comprises a classifier recording unit 91 that records at least one classifier 64_n, an identification processing execution unit 92 that performs identification processing on the converted image 86_n using a classifier 64_n, and an identification result output unit 93 that outputs the result of the identification processing.
  • the identification process execution unit 92 and the identification result output unit 93 will be described in detail.
  • the discrimination processing execution unit 92 performs discrimination processing on the converted image 86_n output from the image conversion unit 8 using the classifier 64_n recorded in the classifier recording unit 91.
  • Specifically, the identification processing execution unit 92 acquires the ID of the classifier 64_n selected by the image conversion method determination unit 7a, selects the classifier 64_n corresponding to that ID, and then performs identification processing on the converted image 86_n output from the image conversion unit 8.
  • the identification result output unit 93 outputs the identification processing result of the identification processing execution unit 92 to the outside.
  • For example, when the object detection device 2a is connected to a display device such as a monitor, the image of the imaging space may be displayed on that display device.
  • When the identification processing execution unit 92 determines that the converted image 86_n includes the detection target, the identification candidate area information 54_n of the identification candidate area 55 on which the converted image 86_n is based is referred to, and the image position of the identification candidate area 55 in the captured image is acquired.
  • Then, a rectangular detection window or the like may be displayed at the position corresponding to the detection target in the captured image shown on the display device, or a message indicating that the detection target has been detected may be displayed.
  • In step S91, the imaging device 1 first acquires image information and three-dimensional information for the measurement range and outputs them to the object detection device 2a.
  • Then, based on the input from the imaging device 1, the image acquisition unit 3 acquires the image information and the three-dimensional information acquisition unit 4 acquires the three-dimensional information.
  • In step S92, identification candidate areas 55 are extracted using the identification candidate area extraction unit 5. Specifically, the rectangular areas extracted by the image processing unit 51 and the rectangular parallelepiped areas extracted by the three-dimensional information processing unit 52 are taken as identification candidate areas 55, and the identification candidate area ID assigning unit 53 then assigns an ID to each extracted identification candidate area 55.
  • In step S93, one identification candidate area 55 to be subjected to identification processing is selected from the extracted identification candidate areas 55.
  • In step S94, viewpoint conversion is performed on the selected identification candidate area 55, a converted image 86_n is acquired by projecting it onto an image plane, and the image conversion method that maximizes the similarity to the classifier information 65 is obtained.
  • In other words, the ID of the classifier 64_n whose template 66 has the highest similarity to the converted image 86_n obtained by viewpoint-converting the identification candidate area 55 is acquired, and an image conversion method appropriate for that classifier 64_n is determined.
  • In step S95, image conversion is performed on the selected identification candidate area 55 by the conversion method determined in step S94, and a converted image 86_n is acquired.
  • In step S96, identification processing is performed on the converted image 86_n acquired in step S95, using the classifier 64_n.
  • In step S97, it is determined whether, as a result of the identification processing, the converted image includes the detection target. If it does, the process proceeds to step S98; if it does not, the process proceeds to step S99.
  • In step S98, since it has been determined that the detection target is included in the converted image, the identification result is output.
  • For example, when the object detection device 2a is connected to a display device such as a monitor, an image of the imaging space may be displayed on the display device, a rectangular detection window may be displayed at the position corresponding to the identification candidate area 55 in that image, or a message indicating that a detection target has been detected may be displayed.
  • the position corresponding to the identification candidate area 55 is acquired with reference to the position information recorded in the identification candidate area information management unit 54.
  • In step S99, after the identification processing for the selected identification candidate area 55 is completed, it is determined whether identification processing has been performed for all of the identification candidate areas 55 extracted in step S92. If there is an identification candidate area 55 that has not yet been processed, the process returns to step S93; if all identification candidate areas 55 have been processed, the object detection processing ends (a sketch of this overall flow follows this item).
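As a rough end-to-end sketch of the loop over steps S93 to S99 (the candidate objects are assumed to carry their 3D point group as points_3d in addition to the fields sketched earlier, and render_viewpoint, classify, and determine_conversion are the placeholder helpers from the sketches above, not APIs defined by this publication):

```python
def detect_objects(candidates, classifier_templates, render_viewpoint, classify,
                   determine_conversion):
    """Simplified flow: for each extracted candidate area, choose a viewpoint conversion,
    generate the converted image, run the selected classifier, and collect the detections."""
    detections = []
    for area in candidates:                                                # steps S93 / S99 loop
        result = determine_conversion(area.points_3d, classifier_templates,
                                      render_viewpoint)                    # step S94
        if result is None:
            continue                                                       # no usable conversion found
        (alpha, beta, gamma), cid, _sim = result
        converted = render_viewpoint(area.points_3d, alpha, beta, gamma)   # step S95
        if classify(cid, converted):                                       # steps S96-S97
            detections.append((area.area_id, area.image_start, area.image_end))  # step S98
    return detections
```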
  • As described above, the object detection device 2a detects the detection target after converting each extracted identification candidate area 55, by virtual viewpoint conversion, into an image suitable for input to the classifier. As a result, the detection target can be detected with high accuracy even when the appearance of the detection target in the image differs from the template of the classifier, or when part of the detection target on the screen is hidden by an occluding object.
  • FIG. 10 is a block diagram showing an outline of the object detection device 2b of the second embodiment, connected to an imaging device 1 such as a stereo camera.
  • Whereas the object detection device 2a of the first embodiment uses the image conversion method determination unit 7a, which exhaustively changes the parameters α, β, and γ for rotating the three-dimensional information, the object detection device 2b of the present embodiment uses an image conversion method determination unit 7b that can determine the parameters α, β, and γ more efficiently.
  • the image conversion method determination unit 7b will be described in detail.
  • FIG. 11 shows a situation in which an upright person is photographed from the viewpoint 82d.
  • Xc, Yc, and Zc are the x-axis, y-axis, and z-axis of the camera coordinate system
  • 204 indicates the optical axis of the imaging device 1 installed at the viewpoint 82d.
  • the camera coordinate system is a three-dimensional coordinate system representing a shooting space, with the optical center of the camera of the imaging device 1 as the origin and the z axis (Zc) aligned with the direction of the optical axis 204 of the camera.
  • The x-axis (Xc) and the y-axis (Yc) are parallel to the horizontal and vertical directions of the image projection plane 205, respectively.
  • 205 is an image projection plane
  • 206 is an image captured from the viewpoint 82d
  • 207 is three-dimensional information acquired from the viewpoint 82d
  • 208 is a straight line indicating the posture direction of the detection target
  • 209 and 210 indicate point coordinates of the identification candidate area 55 and the straight line 208 in the camera coordinate system.
  • 211 and 212 indicate the point coordinates (Xct', Yct', Zct') and (Xcb', Ycb', Zcb') in the camera coordinate system of the identification candidate area 55 and the straight line 208 after the virtual viewpoint conversion.
  • The image conversion method determination unit 7b calculates the parameters α, β, and γ such that the straight line 208, which is inclined with respect to the y-axis (Yc), is converted into a straight line 208' parallel to the y-axis (Yc). Then, by generating a three-view drawing of the converted identification candidate area 55', the converted image 86 optimal as input to the classifier 64 is obtained.
  • FIG. 12 shows the processing flow of the image conversion method determination unit 7b, including the process of determining the parameters α, β, and γ. The following outlines this processing flow.
  • First, the parameter β is set to 0° (S121), and the straight line 208 is acquired (S122). Then, after the parameters γ and α for rotating the straight line 208 are set (S123), a three-view drawing of the detection target is generated using the set parameters α, β, and γ (S124). After one of the three views is selected as the converted image 86 (S125), the similarity between the selected converted image 86 and the classifier information 65 is calculated (S126). If the similarity is equal to or greater than the threshold, the selected view is determined as the input image to the classifier 64 and the process ends; if the similarity is less than the threshold, the process proceeds to step S128 (S127).
  • In step S129, the parameter β is changed, that is, the identification candidate area 55 is rotated about the y-axis (Yc), the process then returns to step S124, and the processing is repeated until a converted image 86 with a similarity equal to or greater than the threshold is obtained.
  • In step S122, the straight line 208 is acquired. For example, the three-dimensional information of the identification candidate area 55 is referred to, and the straight line connecting the two points of the three-dimensional point group whose Euclidean distance is maximal is taken as the straight line 208.
  • This is because the identification candidate area 55 containing a person can be expected to be a rectangular parallelepiped elongated in the vertical direction, so the direction in which the Euclidean distance is maximal gives a straight line 208 indicating the posture direction of the person.
  • Alternatively, principal component analysis may be performed on the three-dimensional point group of the identification candidate area 55, and a straight line in the direction of the first principal component may be used.
  • The straight line 208 may also be obtained using the direction orthogonal to the floor surface and the information of a single point corresponding to the head; it suffices to decide in advance which method is used (a sketch of the first two methods follows this item).
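A hedged sketch of the first two ways of obtaining the straight line 208 described above (the farthest point pair, or the first principal component of the point group); the point group is assumed to be an (N, 3) NumPy array and the function names are illustrative:

```python
import numpy as np

def posture_line_farthest_pair(points):
    """Return the two points of the 3D point group with the maximal Euclidean distance.
    O(N^2) in time and memory, which is acceptable for the points inside one candidate area."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    return points[i], points[j]

def posture_line_pca(points):
    """Return the centroid and the first principal component direction of the point group."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return points.mean(axis=0), vt[0]   # vt[0] is the dominant direction of the point group
```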
  • In step S123, the rotation angle about the z-axis (Zc) used to rotate the identification candidate area 55 corresponds to the parameter γ, and the rotation angle about the x-axis (Xc) used to rotate the identification candidate area 55 so that Zct' and Zcb' become equal corresponds to the parameter α. Since the parameter β is set to 0° in step S121, the parameters α, β, and γ can all be determined by the above processing (a sketch of this angle computation follows this item).
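Assuming the straight line 208 is given by its top and bottom end points in camera coordinates, the angles that make it parallel to the y-axis (Yc) can be sketched as follows; the sign convention is an assumption and would have to match the actual rotation definitions (the publication's Equations 1 to 3 are not reproduced here):

```python
import numpy as np

def align_line_to_y_axis(p_top, p_bottom):
    """Return (gamma, alpha): gamma rotates about Zc so the line's x-component vanishes,
    alpha then rotates about Xc so its z-component vanishes, leaving the line parallel to Yc.
    beta (rotation about Yc) is left at 0, as in step S121."""
    d = np.asarray(p_top, float) - np.asarray(p_bottom, float)   # line direction (dXc, dYc, dZc)
    gamma = np.arctan2(d[0], d[1])       # rotation about Zc that removes the x-component
    dy_after = np.hypot(d[0], d[1])      # remaining length in the Xc-Yc plane after that rotation
    alpha = np.arctan2(d[2], dy_after)   # rotation about Xc that removes the z-component
    return gamma, alpha
```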
  • In step S124, a three-view drawing of the identification candidate area 55 after the virtual viewpoint conversion is acquired.
  • Viewpoints 82e, 82f, and 82g are the viewpoints for generating the three views, and converted images 86e, 86f, and 86g indicate the converted images 86 generated from the respective viewpoints.
  • The three-view drawing is generated while changing the parameter β, and when a suitable parameter β is obtained, the converted image 86e is acquired; that is, the viewpoint can be virtually converted to a viewpoint facing the person in the identification candidate area 55 from the front, and the person can then be detected using the classifier 64 that has the corresponding template.
  • However, the appearance of the detection target from a specific direction may be unsuitable for identification because of occlusion or the like. Therefore, in addition to the converted image 86e from the frontal viewpoint 82e, the converted images 86f and 86g are also generated from the side viewpoint 82f and the top viewpoint 82g; this increases the number of candidate classifiers 64 and improves the detection accuracy.
  • As described above, according to the present embodiment, the parameters α, β, and γ can be determined more efficiently than in the first embodiment, and highly accurate person detection can be carried out even when the person appears deformed in the image or when occlusion occurs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Studio Devices (AREA)

Abstract

This object detection device, which determines whether there is a detection target within a measurement range, is characterized by comprising: a three-dimensional information acquisition unit which acquires three-dimensional information within the measurement range on the basis of an input from an image capturing device; an identification candidate area extraction unit which extracts identification candidate areas in which the detection target object may be present; identifiers which are used to detect the detection target object; an identifier information acquisition unit which acquires identifier information; an image conversion method determining unit which determines a parameter for virtually viewpoint-converting the three-dimensional information in the identification candidate areas; an image conversion execution unit which generates a converted image on the basis of the virtually viewpoint-converted three-dimensional information in the identification candidate areas; and an identification unit which detects the detection target object using the identifier on the basis of the converted image.

Description

Object detection device and object detection method
The present invention relates to an object detection device and an object detection method that realize robust object detection even when the installation state of the camera changes, or when the appearance of the detection target changes due to movement of the camera or of the detection target.
There is a strong need for object detection techniques that detect a target object (for example, a person, cargo, or a vehicle) from image information acquired by an imaging device such as a surveillance camera. Typical object detection techniques include background subtraction, in which a background image containing no detection target is prepared in advance and objects are detected by comparing the input captured image with that background image, and optical flow, which detects moving bodies from differences between feature points across video frames. However, such methods detect every moving object in the image and therefore cannot, for example, detect only a specific target in the image.
Therefore, a technique is required that detects a specific object using the object's contour information, or appearance information such as color and shape that can be read from its external appearance.
For example, Patent Document 1 describes, in paragraph 0034, an identification unit that "identifies as a person an image that is determined to be a person from contour information obtained from appearance-based feature amounts by HOG, and that is also determined to be foreground (moving or static state) from feature amounts based on spatiotemporal features obtained by pixel-state analysis". In addition to this description, Patent Document 1 discloses a technique that realizes person detection using a means for extracting the contour information of a person from learning samples consisting of images that include a person and images that do not, and generating a classifier that distinguishes persons from non-persons, together with a means for determining, using that classifier, whether or not a person exists in a predetermined region of an image.
Further, Patent Document 2 describes, in paragraph 0016, that "the deformation detection area 2100 on the monitoring image 2000 reflects information on each parameter of the camera device and, as shown in FIG. 3, is created in consideration of the distortion of the monitoring image 2000. The object recognition apparatus 1 (1a) then extracts a feature amount from the image information 100 of the deformation detection area 2100, which is created as an area containing a recognition target deformed by distortion or the like, and determines whether or not it is the target object." In addition to this description, Patent Document 2 discloses, as an application of the technique of Patent Document 1, a technique that assumes the detection target in the image is deformed by the lens distortion peculiar to the camera, and improves the detection rate by deforming the predetermined region before it is input to the classifier that determines whether or not a person exists.
Patent Document 1: JP 2009-181220 A; Patent Document 2: JP 2012-221437 A
In Patent Document 1, a classifier is trained with person images of a specific posture (for example, images of an upright posture photographed from the front) as learning samples, and persons in that specific posture are detected using this classifier, which raises the detection rate for persons in that posture.
However, in an actually captured image, the posture (appearance) of a person changes greatly depending on the relative positional relationship between the camera device and the person and on the lens distortion of the camera device. When the contour information of the person in the captured image differs from that of the learning samples, the classifier of Patent Document 1 therefore has the problem that the accuracy of person detection is reduced.
Further, in Patent Document 2, as shown in FIG. 13 and elsewhere in that document, the contour information of the person input to the classifier is deformed (normalized), based on the parameter information of the camera device and the positional relationship between the detection target and the camera device, so that it matches a preset specific posture; this makes it possible to maintain the classifier's detection rate even when the person's posture changes within a certain range.
However, when the appearance of the detection target differs greatly from what is assumed, or when part of the detection target is hidden behind an occluding object, the object recognition method of Patent Document 2 has the problem that the detection rate drops significantly. For example, even when a person can easily be detected from an image containing all of the person's head, arms, torso, and legs, the classifier of Patent Document 2 suffers a large drop in the person detection rate for an image of the person photographed from directly above, or an image in which the lower half of the body is hidden by an occluding object, because the legs cannot be detected in such images.
In order to solve such problems, an object of the present invention is to provide an object detection device capable of realizing highly accurate person detection even from captured images containing a person in a posture not covered by the classifier, or from images captured with part of the human body hidden in the shadow of an obstacle.
An object detection device according to the present invention is an object detection device that determines whether or not a detection target exists in a measurement range, and comprises: a three-dimensional information acquisition unit that acquires three-dimensional information in the measurement range based on input from an imaging device; an identification candidate area extraction unit that extracts identification candidate areas in which the detection target may exist; classifiers used to detect the detection target; a classifier information acquisition unit that acquires information on the classifiers; an image conversion method determination unit that determines parameters for virtually applying viewpoint conversion processing to the three-dimensional information in an identification candidate area; an image conversion execution unit that generates a converted image from the virtually viewpoint-converted three-dimensional information in the identification candidate area; and an identification unit that detects the detection target using a classifier based on the converted image.
According to the object detection device of the present invention, the target object can be detected with high accuracy even when using an image in which the relative position of the camera device and the object differs greatly from what is assumed, or an image in which part of the object is occluded.
FIG. 1 is a diagram showing a configuration example of the object detection device of the first embodiment. FIG. 2 is a diagram showing details of the identification candidate area extraction unit of the first embodiment. FIG. 3A is a diagram showing details of the identification candidate area information management unit of the first embodiment. FIG. 3B is a diagram showing an identification candidate area in a two-dimensional image. FIG. 3C is a diagram showing an identification candidate area in the three-dimensional imaging space. FIG. 4 is a diagram showing details of the classifier of the first embodiment. FIG. 5 is a diagram showing details of the image conversion method determination unit of the first embodiment. FIG. 6 is a diagram explaining the processing content of viewpoint conversion in the first embodiment. FIGS. 7A and 7B are diagrams explaining the effect of the image conversion method determination unit. FIG. 8 is a diagram showing details of the identification unit in the configuration example of the first embodiment. FIG. 9 is a diagram showing an example of the processing flow in the first embodiment. FIG. 10 is a diagram showing a configuration example of the object detection device of the second embodiment. FIG. 11 is a diagram explaining the processing of the image conversion method determination unit of the second embodiment. FIG. 12 is a diagram explaining the processing flow of the image conversion method determination unit of the second embodiment. FIG. 13 is a diagram explaining the details of the processing flow of FIG. 12.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings as appropriate. In the following, an example in which the detection target is a person is described, but the detection target is not limited to a person and may be cargo, a vehicle, or the like. Also, although an example of detecting the detection target from image information captured by an imaging device such as a camera is described, the information containing the detection target is not limited to image information captured by an imaging device and may be, for example, a heat map acquired by a thermal sensor.
The object detection device 2a of the first embodiment will be described with reference to FIGS. 1 to 9.
FIG. 1 is a block diagram showing an outline of the object detection device 2a of the present embodiment, connected to an imaging device 1 such as a stereo camera. The object detection device 2a is an object detection device that realizes robust detection of the detection target even when the appearance of the detection target on the captured image of the imaging device 1 changes due to a change in the relative position between the imaging device 1 and the detection target.
In the object detection device 2a shown in FIG. 1, reference numeral 3 denotes an image acquisition unit that acquires image information within the measurement range based on input from the imaging device 1; 4 denotes a three-dimensional information acquisition unit that acquires three-dimensional information within the measurement range based on input from the imaging device 1; 5 denotes an identification candidate area extraction unit that extracts, from the measurement range, identification candidate areas in which a detection target may exist, using the image information and the three-dimensional information; 6 denotes a classifier information acquisition unit that acquires information on the classifiers 64 used in the object detection device 2a; 7a denotes an image conversion method determination unit that uses the classifier information to determine how to convert an identification candidate area into an image optimal as input to a classifier 64; 8 denotes an image conversion unit that acquires a converted image from the identification candidate area based on the determined image conversion method; and 9 denotes an identification unit that determines whether or not a detection target is included in the converted image. Some or all of the units from the image acquisition unit 3 to the identification unit 9 need not be dedicated hardware; they may be realized by having an arithmetic device such as a CPU process programs stored in a main storage device such as semiconductor memory and data stored in an auxiliary storage device such as a hard disk.
 以下、図1に示した、撮像装置1、識別候補領域抽出部5、識別器情報取得部6、画像変換方法決定部7a、画像変換部8、識別部9について、個々に詳細説明する。
<撮像装置>
 撮像装置1は、計測範囲の画像情報と三次元情報を取得できる装置である。ここで、画像情報とはデジタル画像データにおける輝度情報、三次元情報とは計測範囲(三次元空間)における三次元点群の座標情報である。
Hereinafter, the imaging device 1, the identification candidate area extraction unit 5, the classifier information acquisition unit 6, the image conversion method determination unit 7a, the image conversion unit 8 and the identification unit 9 shown in FIG. 1 will be individually described in detail.
<Imaging device>
The imaging device 1 is a device capable of acquiring image information of a measurement range and three-dimensional information. Here, image information is luminance information in digital image data, and three-dimensional information is coordinate information of a three-dimensional point group in a measurement range (three-dimensional space).
The imaging device 1 may be a stereo camera composed of two or more cameras, or a combination of a single camera and a distance sensor capable of acquiring three-dimensional information. For example, a stereo camera photographs the same object with two or more cameras and measures the distance from the cameras to the object using the principle of triangulation, so that both image information and three-dimensional information can be acquired. A distance sensor measures the distance to an object by calculating, from the phase difference between the projected light and the reflected light, the time it takes for the projected light to be reflected by the object and return to the sensor; by combining it with a camera calibrated in advance, three-dimensional information and image information can be acquired in association with each other.
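As a minimal illustration of the triangulation principle mentioned above, the following sketch converts a stereo disparity map into per-pixel depth; the function name, the use of NumPy, and the parameter names are assumptions introduced here for illustration and are not part of the original description.

    import numpy as np

    def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
        # Depth Z = f * B / d for each pixel with a positive disparity d.
        depth = np.zeros_like(disparity_px, dtype=np.float64)
        valid = disparity_px > 0
        depth[valid] = focal_length_px * baseline_m / disparity_px[valid]
        return depth  # depth in metres; 0 where the disparity is unknown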
<Identification candidate area extraction unit>
FIG. 2 shows the details of the identification candidate area extraction unit 5. The identification candidate area extraction unit 5 uses the image information acquired by the image acquisition unit 3, the three-dimensional information acquired by the three-dimensional information acquisition unit 4, or both, to extract identification candidate areas 55 in which the detection target may exist. It comprises an image processing unit 51 that extracts identification candidate areas 55 using the image information, a three-dimensional information processing unit 52 that extracts identification candidate areas 55 using the three-dimensional information, an identification candidate area ID assignment unit 53 that assigns an ID to each of the one or more extracted identification candidate areas 55, and an identification candidate area information management unit 54 that acquires and manages identification candidate area information representing the positions of the identification candidate areas 55. Hereinafter, the image processing unit 51, the three-dimensional information processing unit 52, the identification candidate area ID assignment unit 53, and the identification candidate area information management unit 54 will be described in detail.
The image processing unit 51 extracts identification candidate areas 55 by performing image processing on the image information acquired by the imaging device 1. An example of the image processing executed here is background subtraction, in which a background image of the shooting space with no detection target present is acquired in advance and the difference between that background image and the captured image is calculated; however, any means capable of extracting candidate areas of the detection target from image information may be used, such as detection using color information (for example, skin color detection).
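The background subtraction mentioned above can be sketched as follows; this is only one possible realization under assumed names, and the grayscale frames and the threshold value are illustrative assumptions rather than part of the original description.

    import numpy as np

    def extract_candidate_mask(frame_gray, background_gray, threshold=30):
        # Pixels whose absolute difference from the background exceeds the
        # threshold are treated as belonging to a candidate (foreground) region.
        diff = np.abs(frame_gray.astype(np.int16) - background_gray.astype(np.int16))
        return diff > threshold  # boolean mask of identification candidate pixels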
The three-dimensional information processing unit 52 extracts identification candidate areas 55 by performing three-dimensional processing on the three-dimensional information acquired by the imaging device 1. An example of the three-dimensional processing executed here is to acquire in advance background three-dimensional information of the shooting space with no detection target present and to calculate the difference between that background three-dimensional information and newly acquired three-dimensional information; however, any method that obtains identification candidate areas 55 by three-dimensional processing may be used.
Next, the identification candidate area ID assignment unit 53 and the identification candidate area information management unit 54 will be described with reference to FIGS. 3A to 3C.
The identification candidate area ID assignment unit 53 assigns an ID to each of the identification candidate areas 55 extracted by the image processing unit 51 and the three-dimensional information processing unit 52. The identification candidate area information management unit 54 adds the position information of each identification candidate area to its ID and manages the pair as identification candidate area information 54_n. The position information consists of an image position indicating the start point and end point of the identification candidate area in the two-dimensional image, and a three-dimensional position indicating the start point and end point of the identification candidate area in the three-dimensional shooting space.
FIG. 3A exemplifies n pieces of identification candidate area information 54_n managed by the identification candidate area information management unit 54; each piece of identification candidate area information 54_n records, in addition to the ID, the image position and the three-dimensional position, which are the position information of the corresponding identification candidate area 55. FIG. 3B specifically shows the image position of the identification candidate area information 54_1: 56a and 56b indicate the start point (x1, y1) and the end point (x1', y1') of the rectangular identification candidate area 55 in the captured image of the imaging device 1. Similarly, FIG. 3C specifically shows the three-dimensional position of the identification candidate area information 54_1: 57a and 57b indicate the start point (X1, Y1, Z1) and the end point (X1', Y1', Z1') of the rectangular-parallelepiped identification candidate area 55. Although FIGS. 3B and 3C illustrate rectangular and rectangular-parallelepiped identification candidate areas 55, identification candidate areas of other shapes may be used as long as the representation can specify the position of the identification candidate area 55. In that case, it goes without saying that the image position and three-dimensional position information in FIG. 3A are also expressed in accordance with the identification candidate area of the other shape.
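One possible in-memory representation of the identification candidate area information described above is sketched below; the field names are assumptions chosen for illustration and do not appear in the original description.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class CandidateAreaInfo:
        area_id: int                             # ID assigned by the assignment unit 53
        image_start: Tuple[int, int]             # (x1, y1) in the captured image
        image_end: Tuple[int, int]               # (x1', y1')
        world_start: Tuple[float, float, float]  # (X1, Y1, Z1) in the shooting space
        world_end: Tuple[float, float, float]    # (X1', Y1', Z1')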
<Classifier information acquisition unit>
Next, the classifier information acquisition unit 6 will be described with reference to FIG. 4. The classifier information acquisition unit 6 selects an appropriate classifier from a plurality of prepared classifiers 64 and extracts the classifier information 65 corresponding to it. Note that 67_n is a classifier ID assigned in order to manage the classifier 64_n.
The classifiers 64 are used in the identification processing that determines whether a detection target is included in a captured image of the imaging device 1, and each classifier 64_n has a high discrimination capability for detection targets in a different posture. By learning a large number of images that include the detection target and images that do not (learning samples) with a machine learning method, each classifier 64_n can be given different characteristics. A Support Vector Machine is a common choice of machine learning method, but other machine learning methods may be used.
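A minimal sketch of training one such posture-specific classifier with a Support Vector Machine is shown below, assuming HOG features as the image descriptor; the use of scikit-learn and scikit-image, the feature choice, and the parameter values are assumptions made here for illustration and are not specified in the original description.

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    def train_posture_classifier(positive_images, negative_images):
        # positive_images: grayscale crops (of a common, fixed size) containing
        # the target in one posture; negative_images: crops without the target.
        feats, labels = [], []
        samples = [(img, 1) for img in positive_images] + [(img, 0) for img in negative_images]
        for img, label in samples:
            feats.append(hog(img, pixels_per_cell=(8, 8), cells_per_block=(2, 2)))
            labels.append(label)
        clf = LinearSVC(C=1.0)
        clf.fit(np.asarray(feats), np.asarray(labels))
        return clf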
The classifier information 65_n indicates the input images for which the classifier 64_n exhibits particularly high discrimination capability. FIG. 4 exemplifies, as classifier information, a template 66_1 strong at identifying person images viewed from the front, a template 66_2 strong at identifying person images viewed from above, and a template 66_n strong at identifying person images viewed from the side; however, other information may be recorded, such as color information, feature amounts expressing contours, luminance information, or gradient information, as long as it is classifier information expressing an image, or a method of generating an image, suitable as input to the classifier 64_n.
<Image conversion method determination unit>
Next, the processing flow of the image conversion method determination unit 7a will be described with reference to FIG. 5. Based on the three-dimensional information in the rectangular-parallelepiped identification candidate area 55 illustrated in FIG. 3C, the image conversion method determination unit 7a determines a conversion method (parameters and the like) for converting the area into an image optimal as input to a classifier 64. In the processing flow of the image conversion method determination unit 7a, parameters for viewpoint conversion are first determined (S51), and a viewpoint-converted image is generated using those parameters (S52). Then, the similarity between the converted image and the classifier information 65 held by each of the plurality of classifiers 64 is calculated (S53); if the similarity is higher than a threshold, the processing ends, and if it is equal to or below the threshold, the processing returns to step S51 and the parameters are changed to other values (S54). Hereinafter, steps S51, S52, S53, and S54 will be described in detail.
In step S51, the parameters α, β, and γ necessary for generating the viewpoint-converted image are determined. The details of each parameter are described later. One method of determining the parameters α, β, and γ in step S51 is to vary them exhaustively.
In step S52, the processing shown in FIG. 6 is performed. In FIG. 6, 82 is a viewpoint from which the identification candidate area 55 is observed; 83, 84, and 85 are the x-axis, y-axis, and z-axis of the coordinate system of the three-dimensional space set in the measurement range; and 86_1 and 86_2 show examples of converted images created by viewpoint conversion. In step S52, using the parameters α, β, and γ determined in step S51, the three-dimensional information contained in the rectangular-parallelepiped identification candidate area 55 is rotated by α about the x-axis 83, by β about the y-axis 84, and by γ about the z-axis 85, thereby performing a viewpoint conversion to a state observed from an arbitrary viewpoint, and the converted image 86 is obtained by projecting the viewpoint-converted identification candidate area 55 onto an image.
As the viewpoint conversion method, it is common to use conversion formulas such as Equations 1 to 3, but other viewpoint conversion methods may be used.
(Equations 1 to 3: the formulas for the rotations about the x-, y-, and z-axes; the formula images of the original publication are not reproduced in this text.)
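Since the original formula images are unavailable here, and assuming Equations 1 to 3 are the standard rotation matrices about the three coordinate axes (an assumption consistent with the description of step S52, not a reproduction of the published equations), they can be written as

$$R_x(\alpha)=\begin{pmatrix}1&0&0\\0&\cos\alpha&-\sin\alpha\\0&\sin\alpha&\cos\alpha\end{pmatrix},\quad R_y(\beta)=\begin{pmatrix}\cos\beta&0&\sin\beta\\0&1&0\\-\sin\beta&0&\cos\beta\end{pmatrix},\quad R_z(\gamma)=\begin{pmatrix}\cos\gamma&-\sin\gamma&0\\\sin\gamma&\cos\gamma&0\\0&0&1\end{pmatrix},$$

with a viewpoint-converted point obtained, for example, as $\mathbf{p}'=R_z(\gamma)\,R_y(\beta)\,R_x(\alpha)\,\mathbf{p}$; the composition order is likewise an assumption.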
A general method of projecting the identification candidate area 55 onto the converted image 86_n is perspective projection, but other methods may be used. For example, when an identification candidate area 55 containing the three-dimensional information of an upright person is captured by the imaging device 1 installed in the direction of the viewpoint 82, projecting the identification candidate area 55 without viewpoint conversion yields a converted image 86_1 in which the person is viewed from above. In contrast, if the identification candidate area 55 captured by the same imaging device 1 is subjected to a viewpoint conversion that rotates it by α = 0°, β = 0°, γ = 90° and is then projected onto an image with respect to the viewpoint 82, a converted image 86_2 in which the person is viewed from the side can be obtained.
Furthermore, in step S53 of FIG. 5, an optimization process is performed to determine the image conversion method that yields the image most suitable for the classifier 64. One way to determine the image conversion method is, for example, to obtain the template 66 by referring to the classifier information 65 and to calculate the similarity between it and the converted image 86_n obtained by applying viewpoint conversion to the identification candidate area 55. As a method of calculating the similarity, pattern matching such as Normalized Cross-Correlation is commonly used, but other methods may be used. In this case, the similarity is taken as an evaluation function with the parameters α, β, and γ as variables, and by solving the optimization problem that maximizes this evaluation function, the image conversion method that maximizes the similarity for the classifier 64_n is obtained. As illustrated in FIG. 4, when two or more classifiers 64 exist, the similarity between the converted image 86_n obtained from the identification candidate area 55 and each classifier is calculated, and the ID of the classifier 64_n with the highest similarity is recorded.
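A minimal sketch of the Normalized Cross-Correlation mentioned above, used as the similarity between a converted image and a template, is given below; resampling the converted image to the template size beforehand is an assumption made here for illustration.

    import numpy as np

    def normalized_cross_correlation(converted_image, template):
        # Both inputs are grayscale arrays of the same shape (the converted image
        # is assumed to have been resampled to the template size beforehand).
        a = converted_image.astype(np.float64) - converted_image.mean()
        b = template.astype(np.float64) - template.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        return float((a * b).sum() / denom) if denom > 0 else 0.0  # value in [-1, 1]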
In step S54, the similarity between the converted image 86_n calculated in step S53 and the classifier information 65 is compared with a threshold; if the similarity is equal to or greater than the threshold, the processing ends, and if it is less than the threshold, the processing returns to step S51, the parameters are changed to different values, and the same processing is repeated. The threshold used in step S54 may be set arbitrarily by the installer of the object detection device 2a, but it may also be changed to an appropriate value by feeding back the accuracy of object detection obtained when the object detection device 2a performs object detection with a given threshold. For example, when the accuracy of the object detection device 2a with a certain threshold is judged to be insufficient, the threshold may be changed to a higher value.
When determining the parameters in step S51, the image conversion method determination unit 7a may create in advance a matrix map recording the aspect ratio of the converted image generated by the parameters α, β, and γ, and determine the parameters by referring to it. Alternatively, the shooting space may be divided into a plurality of regions, and a matrix map holding parameters α, β, and γ that are roughly effective for each region may be prepared and referred to; in that case, the map may be updated whenever parameters more suitable than the α, β, and γ it holds are found. Alternatively, the roughly effective parameters α, β, and γ may be determined by acquiring camera parameters and thereby obtaining information on the installation state of the imaging device 1.
The image conversion method determination unit 7a may also judge whether or not to continue the process of changing the parameters α, β, and γ and calculating the similarity; if the process is to be continued, it returns to step S51, and if not, the processing ends. The criterion for this judgment may be, for example, whether the number of times the parameters α, β, and γ have been changed exceeds a preset number. Alternatively, the processing may be terminated when the similarity calculated in step S53 is equal to or less than a preset minimum value. By terminating the processing even when the similarity never reaches the threshold, it is possible to prevent the object detection processing of the object detection device 2a from being repeated wastefully when the identification candidate area 55 does not contain an object to be detected.
Next, the effects of the image conversion method determination unit 7a will be described with reference to FIGS. 7A and 7B. In FIG. 7A, 82a, 82b, and 82c indicate viewpoints (installation positions and directions) of the imaging device 1, and 87a, 87b, and 87c indicate rectangular images containing a person, extracted from the images captured from the respective viewpoints.
When a person is to be detected from the rectangular images 87a, 87b, and 87c of an upright person using the classifiers 64_1 and 64_2 shown in FIG. 7B, the rectangular image 87a captured from the viewpoint 82a has a high similarity to the template 66_1 of the classifier 64_1, and the rectangular image 87c captured from the viewpoint 82c has a high similarity to the template 66_2 of the classifier 64_2. Therefore, for the rectangular images 87a and 87c, the person can easily be detected by using the classifier 64_1 or the classifier 64_2.
On the other hand, the person in the rectangular image 87b captured from the viewpoint 82b is deformed (deviates from the templates) because of the inclination of the line of sight of the imaging device 1, and has a low similarity to both the template 66_1 and the template 66_2, so the classifiers 64_1 and 64_2 cannot identify the person. For this reason, it has conventionally been difficult to detect a person at a site where only the imaging device 1 at the viewpoint 82b is installed.
Even in such a case, using the image conversion method determination unit 7a of this embodiment makes it possible to detect the deformed detection target in the rectangular image 87b. The procedure for detecting the deformed person captured from the viewpoint 82b is described below.
First, the parameters α, β, and γ are determined, and with these as input, a virtual viewpoint conversion is applied to the rectangular image 87b to create a converted image 86b. Then, the similarity between the converted image 86b and the templates 66_1 and 66_2 is calculated, and if there is a classifier 64_n showing a similarity equal to or greater than the threshold, the ID 67 of that classifier is obtained. If there is no classifier 64_n showing a similarity equal to or greater than the threshold, the parameters α, β, and γ are determined again and the same processing is performed. Although the rectangular image 87b is deformed by the tilt of the camera, it still captures the image information and three-dimensional information of the front of the person. Therefore, when the rectangular image 87b captured from the viewpoint 82b is virtually viewpoint-converted to the viewpoint 82a, a converted image 86b similar to the rectangular image 87a can be obtained, and an image suitable for input to the classifier 64_1 can be obtained.
Similarly, when the rectangular image 87b captured from the viewpoint 82b is virtually viewpoint-converted to the viewpoint 82c, a converted image 86b similar to the rectangular image 87c can be obtained, and an image suitable for input to the classifier 64_2 can be obtained.
Here, the advantage of the image conversion method determination unit 7a in a situation where an obstacle exists between the viewpoint 82b and the person, so that part of the human body (for example, the legs) does not appear in the rectangular image 87b, will be described. When the rectangular image 87b is viewpoint-converted to the viewpoint 82a, the converted image 86b, like the rectangular image 87b, lacks the legs, so the person cannot be detected by the classifier 64_1, which requires leg detection. In contrast, when the rectangular image 87b is viewpoint-converted to the viewpoint 82c, the converted image 86b again lacks the legs, but the person can be detected by the classifier 64_2, for which leg detection is unnecessary. In other words, even when a rectangular image 87b with part of the human body missing is input, accurate person detection can be realized by determining an appropriate image conversion method in the image conversion method determination unit 7a and selecting the classifier 64 accordingly.
<Image conversion unit>
The image conversion unit 8 converts the identification candidate area 55 in accordance with the image conversion method determined by the image conversion method determination unit 7a, and obtains a converted image 86 suitable for input to the classifier 64. As in step S52, conversion formulas such as Equations 1 to 3 can be used as the image conversion method, but other methods may be used.
<Identification unit>
FIG. 8 shows the details of the identification unit 9. The identification unit 9 determines whether or not the detection target is included in the converted image 86_n obtained by the image conversion unit 8, and comprises a classifier recording unit 91 that records at least one classifier 64_n, an identification processing execution unit 92 that performs identification processing on the converted image 86_n using a classifier 64_n, and an identification result output unit 93 that outputs the result of the identification processing. Hereinafter, the identification processing execution unit 92 and the identification result output unit 93 will be described in detail.
The identification processing execution unit 92 performs identification processing on the converted image 86_n output by the image conversion unit 8, using a classifier 64_n recorded in the classifier recording unit 91. When two or more classifiers 64_n are recorded in the classifier recording unit 91, the identification processing execution unit 92 acquires the ID of the classifier 64_n selected by the image conversion method determination unit 7a, selects the classifier 64_n corresponding to that ID, and then performs identification processing on the converted image 86_n output by the image conversion unit 8.
The identification result output unit 93 outputs the identification processing result of the identification processing execution unit 92 to the outside. For example, when the object detection device 2a is connected to a display device such as a monitor, an image of the shooting space may be displayed on that display device. When the identification processing execution unit 92 determines that the converted image 86_n contains the detection target, the identification candidate area information 54_n of the identification candidate area 55 on which that converted image 86_n is based is referred to, and the image position of the identification candidate area 55 in the captured image of the imaging device 1 is obtained. A rectangular detection window or the like may then be displayed at the position corresponding to the detection target in the captured image shown on the display device, or a message indicating that the detection target has been detected may be displayed.
<Processing flow>
Next, the processing flow of object detection in the object detection device 2a of this embodiment will be described with reference to FIG. 9.
In step S91, the imaging device 1 first acquires image information and three-dimensional information corresponding to the measurement range and outputs them to the object detection device 2a. The image acquisition unit 3 acquires the image information based on the input from the imaging device 1, and the three-dimensional information acquisition unit 4 acquires the three-dimensional information based on the input from the imaging device 1.
In step S92, identification candidate areas 55 are extracted using the identification candidate area extraction unit 5. Specifically, the rectangular areas extracted by the image processing unit 51 and the rectangular-parallelepiped areas extracted by the three-dimensional information processing unit 52 are taken as identification candidate areas 55, and the identification candidate area ID assignment unit 53 then assigns an ID to each extracted identification candidate area 55.
In step S93, one identification candidate area 55 to be subjected to identification processing is selected from the extracted identification candidate areas 55.
In step S94, viewpoint conversion is applied to the selected identification candidate area 55, and a converted image 86_n is obtained by projecting it onto an image. Through the optimization processing, the image conversion method that maximizes the similarity to the classifier information 65 is obtained. When two or more classifiers 64_n exist, for example, the ID of the classifier 64_n for which the similarity between the template 66 and the converted image 86_n obtained by applying viewpoint conversion to the identification candidate area 55 is highest is obtained, and an image conversion method appropriate for the corresponding classifier 64_n is determined.
In step S95, image conversion is performed on the selected identification candidate area 55 by the conversion method determined in step S94, and a converted image 86_n is obtained.
In step S96, identification processing is performed on the converted image 86_n obtained in step S95, using the classifier 64_n.
In step S97, it is determined whether or not the converted image contains the detection target as a result of the identification processing. If it does, step S98 is performed; if it does not, step S99 is performed.
In step S98, when it is determined as a result of the identification processing that the converted image contains the detection target, the identification result is output. When the object detection device 2a is connected to a display device such as a monitor, an image of the shooting space may be displayed on the display device, and a rectangular detection window may be displayed at the position corresponding to the identification candidate area 55 in the image, or a message indicating that the detection target has been detected may be displayed. The position corresponding to the identification candidate area 55 is obtained by referring to the position information recorded in the identification candidate area information management unit 54.
In step S99, after the identification processing for the selected identification candidate area 55 is finished, it is determined whether identification processing has been performed for all the identification candidate areas 55 extracted in step S92. If there is an identification candidate area 55 for which identification processing has not yet been performed, step S93 is performed; if there is none, the object detection processing ends.
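The overall flow of steps S91 to S99 can be summarized by the following sketch; the function and variable names are assumptions introduced here for illustration, and the callables passed in stand in for the units described above rather than reproducing their exact processing.

    def detect_objects(capture, extract_candidates, determine_conversion,
                       convert_image, classify):
        # Each argument is a callable standing in for the corresponding unit.
        image, points3d = capture()                           # S91
        detections = []
        for area in extract_candidates(image, points3d):      # S92, S93
            params, clf_id = determine_conversion(area)       # S94
            converted = convert_image(area, params)           # S95
            if classify(clf_id, converted):                   # S96, S97
                detections.append(area)                       # S98
        return detections                                     # S99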
As described above, in the object detection device 2a of the first embodiment, the extracted identification candidate area 55 is converted, by virtual viewpoint conversion, into an image suitable for input to a classifier before detection of the detection target is performed. As a result, the detection target can be detected with high accuracy even when its appearance in the image differs from the classifier templates, or when part of the detection target in the image is hidden by an obstruction.
Next, an object detection device 2b according to a second embodiment will be described with reference to FIGS. 10 to 13. Points in common with the first embodiment are not described again.
FIG. 10 is a block diagram showing an outline of the object detection device 2b of this embodiment, which is connected to an imaging device 1 such as a stereo camera. While the object detection device 2a of the first embodiment uses the image conversion method determination unit 7a, which exhaustively varies the parameters α, β, and γ for rotating the three-dimensional information, the object detection device 2b of this embodiment uses an image conversion method determination unit 7b that can determine the parameters α, β, and γ more efficiently. The image conversion method determination unit 7b is described in detail below.
First, the outline of the processing in the image conversion method determination unit 7b will be described with reference to FIG. 11, which shows an upright person being photographed from a viewpoint 82d. In FIG. 11, Xc, Yc, and Zc are the x-axis, y-axis, and z-axis of the camera coordinate system, and 204 indicates the optical axis of the imaging device 1 installed at the viewpoint 82d. Here, the camera coordinate system is a three-dimensional coordinate system representing the shooting space, whose origin is the optical center of the camera of the imaging device 1, whose z-axis (Zc) coincides with the direction of the optical axis 204 of the camera, and whose x-axis (Xc) and y-axis (Yc) are parallel to the horizontal and vertical directions of the image projection plane 205. Further, 205 is the image projection plane, 206 is an image captured from the viewpoint 82d, 207 is the three-dimensional information acquired from the viewpoint 82d, 208 is a straight line indicating the posture direction of the detection target, 209 and 210 are the intersection coordinates (Xct, Yct, Zct) and (Xcb, Ycb, Zcb) of the identification candidate area 55 and the straight line 208 in the camera coordinate system, and 211 and 212 are the intersection coordinates (Xct', Yct', Zct') and (Xcb', Ycb', Zcb') of the identification candidate area 55 and the straight line 208 in the camera coordinate system after the virtual viewpoint conversion.
As shown in the lower part of FIG. 11, the image conversion method determination unit 7b of this embodiment calculates parameters α, β, and γ for converting the straight line 208, which is inclined with respect to the y-axis (Yc), into a straight line 208' parallel to the y-axis (Yc). Then, by generating a three-view drawing of the converted identification candidate area 55', it obtains the converted image 86 that is optimal as input to the classifier 64.
FIG. 12 shows the processing flow of the image conversion method determination unit 7b, including the processing for determining the parameters α, β, and γ described above. This processing flow is outlined below.
First, the viewpoint conversion parameter β is set to 0° (S121), and then the straight line 208 is obtained (S122). After arbitrary parameters α and γ for rotating the straight line 208 are set (S123), a three-view drawing of the detection target is generated using the set parameters α, β, and γ (S124). One of the three views is then selected as the converted image 86 (S125), and the similarity between the selected converted image 86 and the classifier information 65 is calculated (S126); if the similarity is equal to or greater than the threshold, the selected converted image 86 is determined to be the input image for the classifier 64 and the processing ends, whereas if the similarity is less than the threshold, the processing moves to step S128 (S127). In step S128, it is determined whether all of the generated three views have been selected as the converted image; if not all have been selected, the processing moves to step S125, and if the similarity has been calculated for all three views at the current β, the processing moves to step S129 (S128). In step S129, the parameter β is changed, that is, the identification candidate area 55 is rotated about the y-axis (Yc), and the processing returns to step S124 (S129); the processing is repeated until a converted image 86 whose similarity is equal to or greater than the threshold is obtained. The particularly important steps S122, S123, S124, and S129 are described in detail below.
In step S122, the straight line 208 is obtained. One way to obtain the straight line 208 is to refer to the three-dimensional information of the identification candidate area 55 and take the straight line connecting the two points of the three-dimensional point group whose mutual Euclidean distance is largest. This is because, when the detection target is an upright person, the identification candidate area 55 containing that person can be expected to be a rectangular parallelepiped elongated in the vertical direction, and the direction in which the Euclidean distance is largest can be presumed to be the straight line 208 indicating the posture direction of the person. Alternatively, principal component analysis may be performed on the three-dimensional point group of the identification candidate area 55 and the straight line may be taken in the direction of the first component. Alternatively, when the floor surface present in the shooting space can be detected by a common floor surface estimation method, the straight line 208 may be determined using the direction orthogonal to the floor surface and the information of a single point corresponding to the head.
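A minimal sketch of the first two ways of obtaining the posture-direction line 208 from the point group of a candidate area is given below; the exhaustive pairwise search and the use of NumPy's SVD for the principal component are implementation choices assumed here for illustration.

    import numpy as np
    from itertools import combinations

    def posture_line_by_farthest_pair(points):
        # points: (N, 3) array; exhaustively finds the two points with the
        # largest Euclidean distance (O(N^2), acceptable for small point groups).
        best = max(combinations(range(len(points)), 2),
                   key=lambda ij: np.linalg.norm(points[ij[0]] - points[ij[1]]))
        return points[best[0]], points[best[1]]

    def posture_direction_by_pca(points):
        # Direction of the first principal component of the point group.
        centered = points - points.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return vt[0]  # unit vector along the dominant extent of the points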
In step S123, parameters α and γ are determined such that the x-values and z-values of the intersection coordinates 211 and 212 become equal, that is, such that Xct' = Xcb' and Zct' = Zcb'. The rotation angle about the z-axis (Zc) when the identification candidate area 55 is rotated so that Xct' and Xcb' become equal corresponds to the parameter γ, and the rotation angle about the x-axis (Xc) when the identification candidate area 55 is rotated so that Zct' and Zcb' become equal corresponds to the parameter α. Since the parameter β was set to 0° in step S121, the parameters α, β, and γ can be determined by the above processing.
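As an illustrative reconstruction only (the original publication's formulas are not reproduced here, and the ordering of the two rotations and their sign conventions are assumptions), the angles can be expressed from the difference vector between the intersection points 209 and 210 as

$$\gamma=\arctan\frac{X_{ct}-X_{cb}}{Y_{ct}-Y_{cb}},\qquad \alpha=\arctan\frac{Z_{ct}-Z_{cb}}{\sqrt{(X_{ct}-X_{cb})^{2}+(Y_{ct}-Y_{cb})^{2}}},$$

where the rotation by $\gamma$ about the z-axis removes the x-component of the line direction and the subsequent rotation by $\alpha$ about the x-axis removes the remaining z-component.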
Next, the processing of step S124 will be described with reference to FIG. 13. In step S124, a three-view drawing of the identification candidate area 55 after the virtual viewpoint conversion is obtained. In FIG. 13, the viewpoints 82e, 82f, and 82g are the viewpoints for generating the three views, and the converted images 86e, 86f, and 86g are the converted images 86 generated from the respective viewpoints. After the parameters α and γ that make the straight line 208 parallel to the y-axis (Yc) have been determined, three-view drawings are generated while the parameter β is varied; at a certain value of β, as shown in the converted image 86e of FIG. 13, the viewpoint can be virtually converted to one facing the person in the identification candidate area 55 from the front, and the person can be detected using the classifier 64 that has the corresponding template.
In a real environment, however, the appearance of the detection target from a particular direction may be unsuitable for identification because of occlusion or the like. Therefore, in addition to the converted image 86e from the frontal viewpoint 82e, converted images 86f and 86g are also obtained from the side and top viewpoints 82f and 82g; this increases the number of candidate classifiers 64 and improves the accuracy of object detection. The viewpoint 82f, which looks at the side, can be set by further rotating the parameters by α = 0°, β = 90°, γ = 0° with respect to the viewpoint 82e, and the viewpoint 82g, which looks at the top, can be set by further rotating the parameters by α = 90°, β = 0°, γ = 0° with respect to the viewpoint 82e. By perspective-projecting the identification candidate area 55 at the viewpoints 82e, 82f, and 82g, the converted images 86e, 86f, and 86g can be obtained efficiently, and using them as a three-view drawing realizes efficient person detection.
In the object detection device of the second embodiment described above, the parameters α, β, and γ can be determined more efficiently than in the first embodiment, and highly accurate person detection can be carried out even when the person is deformed in the image or when occlusion occurs.
1: imaging device, 2a, 2b: object detection device, 3: image acquisition unit, 4: three-dimensional information acquisition unit, 5: identification candidate area extraction unit, 51: image processing unit, 52: three-dimensional information processing unit, 53: identification candidate area ID assignment unit, 54: identification candidate area information management unit, 54_n: identification candidate area information, 55: identification candidate area, 6: classifier information acquisition unit, 64: classifier, 65: classifier information, 66: template, 7a, 7b: image conversion method determination unit, 8: image conversion unit, 82: viewpoint, 86: converted image, 87: rectangular image, 9: identification unit, 91: classifier recording unit, 92: identification processing execution unit, 93: identification result output unit

Claims (18)

  1.  計測範囲内に検出対象が存在するか否かを判定する物体検出装置であって、
     撮像装置からの入力を基に前記計測範囲内の三次元情報を取得する三次元情報取得部と、
     前記検出対象が存在し得る識別候補領域を抽出する識別候補領域抽出部と、
     前記検出対象の検出に用いる識別器と、
     該識別器の情報を取得する識別器情報取得部と、
     前記識別候補領域内の三次元情報を仮想的に視点変換処理するパラメータを決定する画像変換方法決定部と、
     仮想的に視点変換処理した前記識別候補領域内の三次元情報を基に変換画像を生成する画像変換実施部と、
     該変換画像を基に前記識別器を用いて前記検出対象を検出する識別部と、
     を備えることを特徴とする物体検出装置。
    An object detection apparatus that determines whether a detection target exists within a measurement range, and
    A three-dimensional information acquisition unit that acquires three-dimensional information in the measurement range based on an input from an imaging device;
    An identification candidate area extraction unit that extracts an identification candidate area where the detection target may exist;
    A classifier used to detect the detection target;
    A classifier information acquisition unit that acquires information of the classifier;
    An image conversion method determination unit that determines parameters for virtually performing viewpoint conversion processing on three-dimensional information in the identification candidate area;
    An image conversion execution unit that generates a converted image based on three-dimensional information in the identification candidate area virtually subjected to viewpoint conversion processing;
    An identification unit that detects the detection target using the identifier based on the converted image;
    An object detection apparatus comprising:
  2.  請求項1に記載の物体検出装置において、
     前記撮像装置からの入力を基に前記計測範囲内の画像情報を取得する画像取得部、
     を更に備えることを特徴とする物体検出装置。
    In the object detection device according to claim 1,
    An image acquisition unit for acquiring image information within the measurement range based on an input from the imaging device;
    An object detection apparatus, further comprising:
  3.  請求項2に記載の物体検出装置において、
     前記識別候補領域抽出部は、
     前記画像情報、前記三次元情報、外部センサの少なくとも一つ以上を利用して、前記識別候補領域を抽出することを特徴とする物体検出装置。
    In the object detection device according to claim 2,
    The identification candidate area extraction unit
    An object detection apparatus characterized by extracting the identification candidate area using at least one or more of the image information, the three-dimensional information, and an external sensor.
  4.  請求項2または3に記載の物体検出装置において、
     前記画像変換方法決定部は、
     前記画像情報と前記三次元情報と前記識別器の情報を利用して、前記識別器の入力として最適な前記変換画像を生成するパラメータを決定することを特徴とする物体検出装置。
    In the object detection device according to claim 2 or 3,
    The image conversion method determination unit
    An object detection apparatus characterized by using the image information, the three-dimensional information, and the information of the discriminator to determine a parameter for generating the converted image optimal as an input of the discriminator.
  5.  請求項2から4のいずれかに記載の物体検出装置において、
     前記識別候補領域抽出部は、
     複数の前記識別候補領域にIDを付与する識別候補領域ID付与部と、
     前記識別候補領域のIDと、前記画像情報における位置と、前記三次元情報における位置を纏めて管理する識別候補領域情報管理部と、
     を備えることを特徴とする物体検出装置。
    In the object detection device according to any one of claims 2 to 4,
    The identification candidate area extraction unit
    An identification candidate area ID assigning unit that assigns IDs to a plurality of the identification candidate areas;
    An identification candidate area information management unit that collectively manages the ID of the identification candidate area, the position in the image information, and the position in the three-dimensional information;
    An object detection apparatus comprising:
  6.  請求項1から5のいずれかに記載の物体検出装置において、
     前記識別器情報取得部は、前記識別器のIDと、特に高い識別能力を示す入力信号を表現する識別器情報を取得することを特徴とする物体検出装置。
    In the object detection device according to any one of claims 1 to 5,
    The object detection device characterized in that the identifier information acquisition unit acquires the ID of the identifier and identifier information representing an input signal indicating particularly high discrimination ability.
  7.  請求項2から6のいずれかに記載の物体検出装置において、
     前記画像変換方法決定部は、
     前記仮想的な視点変換処理の結果に対して最適化処理を実施し、前記識別候補領域に前記検出対象が含まれるか否かを判定する識別器において最も適する画像への画像変換を実現する前記パラメータを決定することを特徴とする物体検出装置。
    The object detection apparatus according to any one of claims 2 to 6,
    The image conversion method determination unit
    The image processing to realize the most suitable image in the discriminator which performs optimization processing on the result of the virtual viewpoint conversion processing and determines whether the detection target is included in the identification candidate area An object detection apparatus characterized by determining a parameter.
  8.  請求項6に記載の物体検出装置において、
     前記識別器情報は、テンプレート、色情報、輝度情報、輪郭、勾配情報のいずれかであることを特徴とする物体検出装置。
    In the object detection device according to claim 6,
    The object detection device, wherein the discriminator information is any one of a template, color information, luminance information, an outline, and gradient information.
  9.  請求項8に記載の物体検出装置において、
     前記識別器情報がテンプレートである場合、
     前記画像変換方法決定部は、前記視点変換処理を実施して取得する画像と前記テンプレートの類似度を算出し、前記類似度が最大となる前記識別器を選択することを特徴とする物体検出装置。
    In the object detection device according to claim 8,
    If the identifier information is a template,
    The image conversion method determination unit calculates the similarity between the image acquired by performing the viewpoint conversion process and the acquired template, and selects the classifier that maximizes the similarity. .
  10.  請求項2から9のいずれかに記載の物体検出装置において、
     前記画像変換方法決定部は、前記撮像装置の設置状態を表現するカメラパラメータを利用し前記パラメータを決定する機能を備えることを特徴とする物体検出装置。
    In the object detection device according to any one of claims 2 to 9,
    An object detection apparatus characterized in that the image conversion method determination unit has a function of determining the parameter by using a camera parameter expressing an installation state of the imaging device.
  11.  請求項2から10のいずれかに記載の物体検出装置において、
     前記識別部は、前記検出対象に対して識別能力を有する識別部を少なくとも1つを記録する識別器記録部と、
     前記識別器を用いて前記変換画像に対して前記検出対象が含まれるか否かを識別する識別処理を実施する識別処理実施部と、
     前記変換画像に前記検出対象が含まれると判定された場合に結果を出力する識別結果出力部と、
     を備えることを特徴とする物体検出装置。
    The object detection apparatus according to any one of claims 2 to 10.
    The identification unit records an identification unit having identification ability with respect to the detection target, at least one identification unit recording unit;
    An identification processing execution unit that performs an identification process of identifying whether the detection target is included in the converted image using the classifier;
    An identification result output unit that outputs a result when it is determined that the detection target is included in the converted image;
    An object detection apparatus comprising:
  12.  請求項2から11のいずれかに記載の物体検出装置において、
     前記画像変換方法決定部は、前記検出対象の三次元形状に基づいて前記パラメータを決定することを特徴とする物体検出装置。
    The object detection apparatus according to any one of claims 2 to 11.
    The image detection method determination unit determines the parameter based on a three-dimensional shape of the detection target.
  13.  請求項12に記載の物体検出装置において、
     前記画像変換方法決定部は、前記識別候補領域を通過する、検出対象の一般的な姿勢方向を示す直線を取得し、
     前記直線が前記撮像装置のカメラ座標系のY軸と平行になるような仮想的な視点変換を実現する前記パラメータを取得する機能、
     を備えることを特徴とする物体検出装置。
    In the object detection device according to claim 12,
    The image conversion method determination unit acquires a straight line that passes through the identification candidate area and indicates a general posture direction of a detection target,
    A function of acquiring the parameters for realizing virtual viewpoint transformation such that the straight line is parallel to the Y axis of the camera coordinate system of the imaging device;
    An object detection apparatus comprising:
  14.  請求項13に記載の物体検出装置において、
     前記画像変換方法決定部は、前記直線が前記カメラ座標系のY軸と平行な状態に変換された後に、正面、側面、上面から前記識別候補領域を観測する視点へ仮想的な視点変換を実施し、それぞれの視点において前記変換画像を取得することを特徴とする物体検出装置。
    In the object detection device according to claim 13,
    The image conversion method determination unit performs virtual viewpoint conversion from the front, the side, and the top to the viewpoint for observing the identification candidate area after the straight line is converted to a state parallel to the Y axis of the camera coordinate system. An object detection device for acquiring the converted image at each viewpoint.
  15.  請求項13または14に記載の物体検出装置において、
     画像変換方法決定部は、
     前記識別候補領域に含まれる三次元点群の各点同士のユークリッド距離が最大となる2点を結ぶことで前記直線を決定することを特徴とする物体検出装置。
    In the object detection device according to claim 13 or 14,
    The image conversion method determination unit
    An object detection apparatus characterized in that the straight line is determined by connecting two points at which Euclidean distances of respective points of a three-dimensional point group included in the identification candidate area are maximum.
  16.  請求項13または14に記載の物体検出装置において、
     前記画像変換方法決定部は、
     前記識別候補領域に含まれる三次元点群に対して主成分分析を実施し、
     その第一成分の方向にとることで前記直線を決定することを特徴とする物体検出装置。
    In the object detection device according to claim 13 or 14,
    The image conversion method determination unit
    Principal component analysis is performed on the three-dimensional point group included in the identification candidate area;
    An object detection apparatus characterized in that the straight line is determined by taking the direction of the first component.
  17.  The object detection apparatus according to claim 13,
     wherein the image conversion method determination unit estimates the floor surface of the measurement range, detects a specific part of the detection target within the identification candidate area, and determines, as the straight line, a line that passes through one point corresponding to that part and extends in a direction orthogonal to the floor surface.
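For claim 17, once the floor plane and a point for the specific part (for example, a detected head position for a person) are available, the posture line follows directly. The SVD plane fit below is an assumed choice; the publication does not fix particular algorithms for floor estimation or part detection.

```python
import numpy as np

def fit_floor_plane(floor_points):
    """Least-squares plane through the floor points: returns (point on plane, unit normal)."""
    centroid = floor_points.mean(axis=0)
    _, _, vt = np.linalg.svd(floor_points - centroid)
    normal = vt[-1]                                # direction of least variance
    if normal[1] < 0:                              # orient consistently (sign convention is arbitrary here)
        normal = -normal
    return centroid, normal

def posture_line_through_part(part_point, floor_normal):
    """Line of claim 17: passes through the detected part, orthogonal to the floor."""
    direction = floor_normal / np.linalg.norm(floor_normal)
    return part_point, direction                   # parametric form: part_point + t * direction
```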
  18.  An object detection method for determining whether or not a detection target is present within a measurement range, the method comprising:
     acquiring three-dimensional information within the measurement range based on input from an imaging device;
     extracting an identification candidate area in which the detection target may exist;
     acquiring information on a classifier used to detect the detection target;
     determining parameters for virtually applying viewpoint conversion processing to the three-dimensional information within the identification candidate area;
     generating a converted image based on the three-dimensional information within the identification candidate area after the virtual viewpoint conversion; and
     detecting the detection target with the classifier based on the converted image.
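Read end to end, claim 18 describes a pipeline. The skeleton below strings the earlier sketches together; it reuses posture_direction_by_pca, virtually_rotate_candidate and front_side_top_views from above, and takes the candidate extraction result and the classifier call as caller-supplied inputs, since neither is specified here.

```python
def detect_objects(candidate_point_clouds, classifier, classify_fn):
    """Illustrative skeleton of the method of claim 18.

    candidate_point_clouds: list of N x 3 arrays, one per identification candidate area.
    classifier, classify_fn: the classifier information and the per-image decision
    function, both supplied by the caller (hypothetical interfaces, not from the publication).
    """
    detections = []
    for candidate in candidate_point_clouds:
        _, direction = posture_direction_by_pca(candidate)          # posture line (one option)
        aligned = virtually_rotate_candidate(candidate, direction)  # virtual viewpoint conversion
        for view in front_side_top_views(aligned):                  # converted images
            if classify_fn(classifier, view):                       # classifier decides per image
                detections.append(candidate)
                break
    return detections
```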
PCT/JP2017/026036 2017-07-19 2017-07-19 Object detection device and object detection method WO2019016879A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2017/026036 WO2019016879A1 (en) 2017-07-19 2017-07-19 Object detection device and object detection method
JP2019530278A JP6802923B2 (en) 2017-07-19 2017-07-19 Object detection device and object detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/026036 WO2019016879A1 (en) 2017-07-19 2017-07-19 Object detection device and object detection method

Publications (1)

Publication Number Publication Date
WO2019016879A1 true WO2019016879A1 (en) 2019-01-24

Family

ID=65015748

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/026036 WO2019016879A1 (en) 2017-07-19 2017-07-19 Object detection device and object detection method

Country Status (2)

Country Link
JP (1) JP6802923B2 (en)
WO (1) WO2019016879A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007271408A (en) * 2006-03-31 2007-10-18 Nippon Telegr & Teleph Corp <Ntt> Device and method for acquiring three-dimensional environmental information, and recoding medium storing program achieving the method
JP2015032185A (en) * 2013-08-05 2015-02-16 国立大学法人 東京大学 Three-dimensional environment restoration device
WO2016063796A1 (en) * 2014-10-24 2016-04-28 株式会社日立製作所 Calibration device
WO2016199244A1 (en) * 2015-06-10 2016-12-15 株式会社日立製作所 Object recognition device and object recognition system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAYATA, TAKESHI, ET AL.: "3-Jigen Tengun karano Multi Scale Tokucho Chushutsuho ni Kansaru Kento", ITE TECHNICAL REPORT, vol. 39, no. 29, 27 July 2015 (2015-07-27), pages 35 - 40, ISSN: 1342-6893 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11087169B2 (en) * 2018-01-12 2021-08-10 Canon Kabushiki Kaisha Image processing apparatus that identifies object and method therefor
JPWO2020241057A1 (en) * 2019-05-29 2020-12-03
WO2020241057A1 (en) * 2019-05-29 2020-12-03 コニカミノルタ株式会社 Image processing system, image processing program, and image processing method
JP7067672B2 (en) 2019-05-29 2022-05-16 コニカミノルタ株式会社 Image processing system, image processing program, and image processing method

Also Published As

Publication number Publication date
JP6802923B2 (en) 2020-12-23
JPWO2019016879A1 (en) 2020-03-26

Similar Documents

Publication Publication Date Title
JP6125188B2 (en) Video processing method and apparatus
JP5812599B2 (en) Information processing method and apparatus
JP5343042B2 (en) Point cloud data processing apparatus and point cloud data processing program
JP5838355B2 (en) Spatial information detection device, human position detection device
JP6352208B2 (en) 3D model processing apparatus and camera calibration system
US9740914B2 (en) Face location detection
JP5493108B2 (en) Human body identification method and human body identification device using range image camera
JP2014529727A (en) Automatic scene calibration
JP2017096939A (en) System and method for scoring clutter for use in 3d point cloud matching in vision system
JP6541920B1 (en) INFORMATION PROCESSING APPARATUS, PROGRAM, AND INFORMATION PROCESSING METHOD
JP2006343859A (en) Image processing system and image processing method
KR102110459B1 (en) Method and apparatus for generating three dimension image
JP2018120283A (en) Information processing device, information processing method and program
JP2010045501A (en) Image monitoring device
JP6802923B2 (en) Object detection device and object detection method
JP2018195070A (en) Information processing apparatus, information processing method, and program
JP5336325B2 (en) Image processing method
JP6027952B2 (en) Augmented reality image generation system, three-dimensional shape data generation device, augmented reality presentation device, augmented reality image generation method, and program
KR101578891B1 (en) Apparatus and Method Matching Dimension of One Image Up with Dimension of the Other Image Using Pattern Recognition
JP6374812B2 (en) 3D model processing apparatus and camera calibration system
JP5217917B2 (en) Object detection and tracking device, object detection and tracking method, and object detection and tracking program
JP6606340B2 (en) Image detection apparatus, image detection method, and program
JP3253328B2 (en) Distance video input processing method
JP2014002489A (en) Position estimation device, method, and program
US20220230342A1 (en) Information processing apparatus that estimates object depth, method therefor, and storage medium holding program therefor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17918024

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019530278

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17918024

Country of ref document: EP

Kind code of ref document: A1