WO2019016879A1 - Object detection device and object detection method - Google Patents

Object detection device and object detection method

Info

Publication number
WO2019016879A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
object detection
information
candidate area
identification
Prior art date
Application number
PCT/JP2017/026036
Other languages
French (fr)
Japanese (ja)
Inventor
亮祐 三木
聡 笹谷
誠也 伊藤
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所 filed Critical 株式会社日立製作所
Priority to PCT/JP2017/026036 priority Critical patent/WO2019016879A1/en
Priority to JP2019530278A priority patent/JP6802923B2/en
Publication of WO2019016879A1 publication Critical patent/WO2019016879A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • The present invention relates to an object detection device and an object detection method that realize robust object detection even when the installation state of the camera changes, or when the appearance of the detection target changes due to movement of the camera or of the detection target.
  • There is a strong need for object detection techniques that detect a target object (for example, a person, cargo, or a vehicle) from image information acquired by an imaging device such as a surveillance camera.
  • Typical techniques include background subtraction, in which a background image containing no detection target is prepared in advance and objects are detected by comparing the input captured image with that background image, and optical flow, which detects moving bodies from differences between feature points across video frames; however, such methods detect every moving object in the image and cannot, for example, extract only a specific target.
  • Patent Document 1 describes, in paragraph 0034, an identification unit that "identifies as a person an image that is determined to be a person from contour information obtained from appearance-based feature amounts by HOG, and that is also determined to be foreground (moving or static state) from feature amounts based on spatiotemporal features obtained by pixel-state analysis".
  • In addition to this description, Patent Document 1 discloses a technique that realizes person detection using a means for extracting the contour information of a person from learning samples consisting of images that include a person and images that do not, and generating a classifier that distinguishes persons from non-persons, together with a means for determining, using that classifier, whether or not a person exists in a predetermined region of an image.
  • Patent Document 2 describes, in paragraph 0016, that "the deformation detection area 2100 on the monitoring image 2000 reflects information on each parameter of the camera device and, as shown in FIG. 3, is created in consideration of the distortion of the monitoring image 2000. The object recognition apparatus 1 (1a) then extracts a feature amount from the image information 100 of the deformation detection area 2100, which is created as an area containing a recognition target deformed by distortion or the like, and determines whether or not it is the target object."
  • In addition to this description, Patent Document 2 discloses, as an application of the technique of Patent Document 1, a technique that assumes the detection target in the image is deformed by the lens distortion peculiar to the camera, and improves the detection rate by deforming the predetermined region before it is input to the classifier that determines whether or not a person exists.
  • In Patent Document 1, a classifier is trained with person images of a specific posture (for example, images of an upright posture photographed from the front) as learning samples, and persons in that specific posture are detected using this classifier, which raises the detection rate for persons in that posture.
  • In Patent Document 2, as shown in FIG. 13 and elsewhere in that document, the contour information of the person input to the classifier is deformed (normalized), based on the parameter information of the camera device and the positional relationship between the detection target and the camera device, so that it matches a preset specific posture; this makes it possible to maintain the classifier's detection rate even when the person's posture changes within a certain range.
  • However, when the appearance of the detection target differs greatly from what is assumed, or when part of the detection target is hidden behind an occluding object, the object recognition method of Patent Document 2 has the problem that the detection rate drops significantly.
  • For example, even when a person can easily be detected from an image containing all of the person's head, arms, torso, and legs, the classifier of Patent Document 2 suffers a large drop in the person detection rate for an image of the person photographed from directly above, or an image in which the lower half of the body is hidden by an occluding object, because the legs cannot be detected in such images.
  • In order to solve such problems, an object of the present invention is to provide an object detection device capable of realizing highly accurate person detection even from captured images containing a person in a posture not covered by the classifier, or from images captured with part of the human body hidden in the shadow of an obstacle.
  • An object detection device according to the present invention is an object detection device that determines whether or not a detection target exists in a measurement range, and comprises: a three-dimensional information acquisition unit that acquires three-dimensional information in the measurement range based on input from an imaging device; an identification candidate area extraction unit that extracts identification candidate areas in which the detection target may exist; classifiers used to detect the detection target; a classifier information acquisition unit that acquires information on the classifiers; an image conversion method determination unit that determines parameters for virtually applying viewpoint conversion processing to the three-dimensional information in an identification candidate area; an image conversion execution unit that generates a converted image from the virtually viewpoint-converted three-dimensional information in the identification candidate area; and an identification unit that detects the detection target using a classifier based on the converted image.
  • According to the object detection device of the present invention, the target object can be detected with high accuracy even when using an image in which the relative position of the camera device and the object differs greatly from what is assumed, or an image in which part of the object is occluded.
  • FIG. 1 is a diagram showing a configuration example of the object detection device of the first embodiment.
  • FIG. 2 is a diagram showing details of the identification candidate area extraction unit of the first embodiment.
  • FIG. 3A is a diagram showing details of the identification candidate area information management unit of the first embodiment; FIG. 3B is a diagram showing an identification candidate area in a two-dimensional image; FIG. 3C is a diagram showing an identification candidate area in the three-dimensional imaging space.
  • FIG. 4 is a diagram showing details of the classifier of the first embodiment.
  • FIG. 5 is a diagram showing details of the image conversion method determination unit of the first embodiment.
  • FIG. 6 is a diagram explaining the processing content of viewpoint conversion in the first embodiment; FIGS. 7A and 7B are diagrams explaining the effect of the image conversion method determination unit.
  • FIG. 8 is a diagram showing details of the identification unit in the configuration example of the first embodiment.
  • FIG. 9 is a diagram showing an example of the processing flow in the first embodiment.
  • FIG. 10 is a diagram showing a configuration example of the object detection device of the second embodiment.
  • FIG. 11 is a diagram explaining the processing of the image conversion method determination unit of the second embodiment.
  • FIG. 12 is a diagram explaining the processing flow of the image conversion method determination unit of the second embodiment; FIG. 13 is a diagram explaining the details of the processing flow of FIG. 12.
  • In the following, an example in which the detection target is a person is described, but the detection target is not limited to a person and may be cargo, a vehicle, or the like.
  • Likewise, the information containing the detection target is not limited to image information captured by an imaging device such as a camera; it may be, for example, a heat map acquired by a thermal sensor.
  • The object detection device 2a of the first embodiment will be described with reference to FIGS. 1 to 9.
  • FIG. 1 is a block diagram showing an outline of an object detection device 2a of the present embodiment connected to an imaging device 1 such as a stereo camera.
  • The object detection device 2a is an object detection device that realizes robust detection of the detection target even when the appearance of the detection target on the captured image of the imaging device 1 changes due to a change in the relative position between the imaging device 1 and the detection target.
  • In the object detection device 2a shown in FIG. 1, reference numeral 3 denotes an image acquisition unit that acquires image information within the measurement range based on input from the imaging device 1; 4 denotes a three-dimensional information acquisition unit that acquires three-dimensional information within the measurement range based on input from the imaging device 1; 5 denotes an identification candidate area extraction unit that extracts, from the measurement range, identification candidate areas in which a detection target may exist, using the image information and the three-dimensional information; 6 denotes a classifier information acquisition unit that acquires information on the classifiers 64 used in the object detection device 2a; 7a denotes an image conversion method determination unit that uses the classifier information to determine how to convert an identification candidate area into an image optimal as input to a classifier 64; 8 denotes an image conversion unit that acquires a converted image from the identification candidate area based on the determined image conversion method; and 9 denotes an identification unit that determines whether or not a detection target is included in the converted image.
  • Some or all of the units from the image acquisition unit 3 to the identification unit 9 need not be dedicated hardware; they may be realized by having an arithmetic device such as a CPU process programs stored in a main storage device such as semiconductor memory and data stored in an auxiliary storage device such as a hard disk.
  • The imaging device 1 is a device capable of acquiring image information and three-dimensional information of the measurement range.
  • Here, image information means luminance information of digital image data, and three-dimensional information means coordinate information of a three-dimensional point group in the measurement range (three-dimensional space).
  • The imaging device 1 may be a stereo camera composed of two or more cameras, or a combination of one camera and a distance sensor capable of acquiring three-dimensional information.
  • For example, a stereo camera photographs the same subject with two or more cameras and measures the distance from the camera to the subject using the principle of triangulation, so it can acquire both image information and three-dimensional information.
  • A distance sensor measures the distance to the target by calculating, from the phase difference between the projected light and the reflected light, the time taken for the projected light to be reflected by the target and return to the sensor; by combining it with a pre-calibrated camera, three-dimensional information and image information can be acquired in association with each other (a brief sketch of the stereo triangulation computation follows this item).
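As a hedged illustration of the triangulation principle mentioned above (standard stereo geometry, not equations taken from this publication; the function name and example values are assumptions), depth can be recovered from the disparity between two rectified cameras:

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth Z of a point seen by a rectified stereo pair: Z = f * B / d.
    disparity_px: horizontal pixel offset of the same point between the two images.
    focal_length_px: focal length expressed in pixels.
    baseline_m: distance between the two camera centers in meters.
    """
    d = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        # zero disparity corresponds to a point at infinity
        return np.where(d > 0, focal_length_px * baseline_m / d, np.inf)

# Example: f = 700 px, baseline = 0.12 m, disparity = 35 px  ->  Z = 2.4 m
print(depth_from_disparity(35.0, 700.0, 0.12))
```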
  • FIG. 2 shows the details of the identification candidate area extraction unit 5.
  • The identification candidate area extraction unit 5 extracts identification candidate areas 55 in which a detection target may exist, using the image information acquired by the image acquisition unit 3, the three-dimensional information acquired by the three-dimensional information acquisition unit 4, or both.
  • It comprises an image processing unit 51 that extracts identification candidate areas 55 using the image information, a three-dimensional information processing unit 52 that extracts identification candidate areas 55 using the three-dimensional information, an identification candidate area ID assigning unit 53 that assigns an ID to each of the one or more extracted identification candidate areas 55, and an identification candidate area information management unit 54 that acquires and manages identification candidate area information representing the position of each identification candidate area 55.
  • the image processing unit 51, the three-dimensional information processing unit 52, the identification candidate area ID assignment unit 53, and the identification candidate area information management unit 54 will be described in detail.
  • the image processing unit 51 extracts an identification candidate area 55 by performing image processing on the image information acquired by the imaging device 1.
  • Examples of the image processing executed here include background subtraction, in which a background image of the imaging space with no detection target present is acquired in advance and the difference between that background image and the captured image is computed (a brief sketch follows this item).
  • However, the method is not particularly limited as long as it can extract candidate regions of the detection target from image information, such as detection using color information (for example, skin-color detection).
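A minimal sketch of such background subtraction as a candidate-region extractor, assuming OpenCV 4 and grayscale images; the threshold and minimum-area values are illustrative assumptions, not values from this publication:

```python
import cv2

def extract_candidate_regions(background_gray, frame_gray, thresh=30, min_area=500):
    """Return bounding boxes (x, y, w, h) of regions that differ from the background image."""
    diff = cv2.absdiff(frame_gray, background_gray)                # per-pixel difference
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)  # binarize the difference
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)          # suppress small noise blobs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```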
  • the three-dimensional information processing unit 52 extracts the identification candidate area 55 by performing three-dimensional processing on the three-dimensional information acquired by the imaging device 1.
  • As the three-dimensional processing executed here, for example, background three-dimensional information of the imaging space with no detection target present may be acquired in advance and the difference between that background three-dimensional information and newly acquired three-dimensional information computed; however, the method is not particularly limited as long as identification candidate areas 55 are obtained by processing the three-dimensional information.
  • identification candidate area ID assignment unit 53 and the identification candidate area information management unit 54 will be described with reference to FIGS. 3A to 3C.
  • The identification candidate area ID assigning unit 53 assigns an ID to each identification candidate area 55 extracted by the image processing unit 51 and the three-dimensional information processing unit 52. The identification candidate area information management unit 54 then adds position information of the identification candidate area to the ID and manages it as identification candidate area information 54_n.
  • the position information is an image position indicating the start point and the end point in the two-dimensional image of the identification candidate area, and a three-dimensional position indicating the start point and the end point in the three-dimensional imaging space of the identification candidate area.
  • FIG. 3A exemplifies n pieces of identification candidate area information 54_n managed by the identification candidate area information management unit 54.
  • For each ID, the image position and the three-dimensional position, which are the position information of the corresponding identification candidate area 55, are recorded.
  • FIG. 3B specifically shows the image position of the identification candidate area information 54_1; 56a and 56b indicate the start point (x1, y1) and the end point (x1', y1') of the rectangular identification candidate area 55 in the captured image of the imaging device 1.
  • FIG. 3C specifically shows the three-dimensional position of the identification candidate area information 54_1; 57a and 57b indicate the start point (X1, Y1, Z1) and the end point (X1', Y1', Z1') of the rectangular parallelepiped identification candidate area 55.
  • Although FIG. 3B and FIG. 3C illustrate a rectangular or rectangular parallelepiped identification candidate area 55, identification candidate areas of other shapes may be used as long as the representation can identify the position of the identification candidate area 55. In that case, the image position and three-dimensional position information in FIG. 3A are of course also expressed in accordance with the shape of that identification candidate area.
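As a hedged sketch of one way the identification candidate area information 54_n described above (ID plus image position and three-dimensional position) might be held in memory; the class and field names are assumptions for illustration, not part of this publication:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class CandidateAreaInfo:
    area_id: int                              # ID assigned by the ID assigning unit 53
    image_start: Tuple[int, int]              # start point (x1, y1) in the captured image
    image_end: Tuple[int, int]                # end point (x1', y1') in the captured image
    space_start: Tuple[float, float, float]   # start point (X1, Y1, Z1) in the imaging space
    space_end: Tuple[float, float, float]     # end point (X1', Y1', Z1') in the imaging space

# Example entry standing in for identification candidate area information 54_1 (made-up values)
info_54_1 = CandidateAreaInfo(1, (120, 80), (180, 240), (0.4, 0.0, 2.1), (0.9, 1.7, 2.5))
```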
  • <Classifier Information Acquisition Unit> Next, the classifier information acquisition unit 6 will be described with reference to FIG. 4.
  • the discriminator information acquisition unit 6 selects an appropriate one from a plurality of prepared discriminators 64 and extracts the discriminator information 65 corresponding thereto.
  • 67_n is a discriminator ID given to manage the discriminator 64_n.
  • The classifiers 64 are used in identification processing to determine whether a detection target is included in a captured image of the imaging device 1, and each classifier 64_n has high discrimination ability for detection targets in a different posture.
  • The classifiers 64_n can be given different characteristics by training them, with a machine learning method, on a large number of learning samples consisting of images that include the detection target and images that do not.
  • A Support Vector Machine is commonly used as the machine learning method, but other machine learning methods may also be used.
  • The classifier information 65_n indicates the kind of input image for which the classifier 64_n exhibits particularly high discrimination capability.
  • FIG. 4 exemplifies, as classifier information, a template 66_1 effective for identifying an image of a person viewed from the front, a template 66_2 effective for identifying an image of a person viewed from above, and a template 66_n effective for identifying an image of a person viewed from the side.
  • Other information, such as color information, feature information representing contours, luminance information, or gradient information, may also be recorded as classifier information, as long as it represents an image, or a method of generating an image, suitable as input to the classifier 64_n (an illustrative in-memory layout follows this item).
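Purely as an illustration of how the classifier IDs 67_n and the classifier information 65_n (reduced here to one template image 66_n per classifier) could be organized for lookup; the variable name, the use of plain template images, and the common 128x64 size are assumptions, and the zero-filled arrays merely stand in for real templates:

```python
import numpy as np

# classifier ID (67_n) -> template image (66_n) standing in for the classifier information 65_n;
# all templates are assumed normalized to one size so they can be compared with a converted image
classifier_templates = {
    1: np.zeros((128, 64), dtype=np.float32),  # effective for a person viewed from the front
    2: np.zeros((128, 64), dtype=np.float32),  # effective for a person viewed from above
    3: np.zeros((128, 64), dtype=np.float32),  # effective for a person viewed from the side
}
```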
  • The image conversion method determination unit 7a determines a conversion method (parameters and the like) for converting the identification candidate area into an image optimal as input to the classifier 64, based on the three-dimensional information in the rectangular parallelepiped identification candidate area 55 illustrated in FIG. 3C.
  • In step S51, parameters for viewpoint conversion are determined.
  • In step S52, a viewpoint-converted image is generated using those parameters.
  • The similarity between the converted image and the classifier information 65 held for each of the plurality of classifiers 64 is then calculated (S53); if the similarity is above the threshold the process ends, and if it is below the threshold the parameters are changed to other values and the process returns to step S51 (S54).
  • steps S51, S52, S53, and S54 will be described in detail.
  • In step S51, the parameters α, β, and γ necessary for generating a viewpoint-converted image are determined. The details of each parameter will be described later. One method of determining the parameters α, β, and γ in step S51 is to change them exhaustively.
  • In step S52, the process shown in FIG. 6 is performed.
  • In FIG. 6, 82 is a viewpoint for observing the identification candidate area 55; 83, 84, and 85 are the x-axis, y-axis, and z-axis of the coordinate system of the three-dimensional space set in the measurement range; and 86_1 and 86_2 show examples of images converted by viewpoint conversion.
  • In step S52, using the parameters α, β, and γ determined in step S51, the three-dimensional information contained in the rectangular parallelepiped identification candidate area 55 is rotated by α about the x-axis 83, by β about the y-axis 84, and by γ about the z-axis 85, thereby performing viewpoint conversion to a state observed from an arbitrary viewpoint, and a converted image 86 is acquired by projecting the identification candidate area 55 after viewpoint conversion onto an image plane.
  • As the viewpoint conversion method, it is common to use conversion equations such as Equations 1 to 3, but other viewpoint conversion methods may be used.
  • As the projection method, perspective projection is typical, but other methods may be used (a hedged sketch of a standard formulation follows this item).
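Equations 1 to 3 themselves are not reproduced in this text; as a hedged sketch under the assumption of an ideal pinhole model with focal length f, a standard way to write the rotation about the three axes and the subsequent perspective projection is:

$$ R(\alpha,\beta,\gamma)=R_z(\gamma)\,R_y(\beta)\,R_x(\alpha),\qquad R_x(\alpha)=\begin{pmatrix}1&0&0\\ 0&\cos\alpha&-\sin\alpha\\ 0&\sin\alpha&\cos\alpha\end{pmatrix},\; R_y(\beta)=\begin{pmatrix}\cos\beta&0&\sin\beta\\ 0&1&0\\ -\sin\beta&0&\cos\beta\end{pmatrix},\; R_z(\gamma)=\begin{pmatrix}\cos\gamma&-\sin\gamma&0\\ \sin\gamma&\cos\gamma&0\\ 0&0&1\end{pmatrix} $$

$$ \begin{pmatrix}X'\\ Y'\\ Z'\end{pmatrix}=R(\alpha,\beta,\gamma)\begin{pmatrix}X\\ Y\\ Z\end{pmatrix},\qquad u=f\,\frac{X'}{Z'},\quad v=f\,\frac{Y'}{Z'} $$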
  • For example, when an identification candidate area 55 containing the three-dimensional information of an upright person is photographed by the imaging device 1 installed in the direction of the viewpoint 82, projecting the identification candidate area 55 without changing the viewpoint yields the converted image 86_1, in which the person is viewed from the top.
  • Next, an optimization process is performed to determine the image conversion method that produces the image most suitable as input to the classifier 64.
  • Specifically, the template 66 is obtained with reference to the classifier information 65, and the similarity between the template and the converted image 86_n obtained by applying viewpoint conversion to the identification candidate area 55 is calculated.
  • As the method of calculating the similarity, pattern matching such as normalized cross-correlation is commonly used, for example, but other methods may be used (a minimal sketch follows this item).
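A minimal sketch of normalized cross-correlation between a converted image and a template of the same size (windowing and resizing details are omitted; this is an illustrative assumption, not the implementation of this publication):

```python
import numpy as np

def ncc_similarity(converted_image, template):
    """Normalized cross-correlation in [-1, 1]; both inputs must share the same shape."""
    a = converted_image.astype(np.float64).ravel()
    b = template.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```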
  • Alternatively, taking the similarity as the evaluation function, an evaluation function with the parameters α, β, and γ as variables may be designed, and the parameters that maximize the similarity to the classifier 64_n may be obtained by solving an optimization problem that maximizes this evaluation function.
  • In step S54, the similarity between the converted image 86_n calculated in step S53 and the classifier information 65 is compared with a threshold; if the similarity is equal to or greater than the threshold, the process ends, and if it is less than the threshold, the parameters are changed to other values, the process returns to step S51, and the same processing is repeated.
  • The installer of the object detection device 2a may set the threshold used in step S54 arbitrarily, but the threshold may also be adjusted to an appropriate value by feeding back the accuracy of object detection obtained when the object detection device 2a performs detection using a given threshold.
  • For example, when the accuracy of the object detection device 2a using a certain threshold is judged to be insufficient, the threshold may be changed to a higher value.
  • When determining the parameters in step S51, the image conversion method determination unit 7a may create in advance a matrix map recording the vertical-to-horizontal ratio of the converted image generated by each combination of the parameters α, β, and γ, and determine the parameters with reference to it. Alternatively, the imaging space may be divided into a plurality of regions, a matrix map holding parameters α, β, and γ that are approximately effective for each region may be prepared, and the parameters may be determined by referring to that matrix map. In that case, the matrix map may be updated when parameters more suitable than the held α, β, and γ are found. Alternatively, approximately effective parameters α, β, and γ may be determined by acquiring camera parameters and information on the installation state of the imaging device 1.
  • It may also be determined whether or not to continue the process of calculating the similarity while changing the parameters α, β, and γ; if it is to be continued, the process returns to step S51, and otherwise the process is terminated.
  • The criterion for this decision may be, for example, whether the number of times the parameters α, β, and γ have been changed exceeds a preset number.
  • Alternatively, the process may be terminated when the similarity calculated in step S53 is equal to or less than a preset minimum value. By terminating the process even when the similarity never exceeds the threshold, wasteful repetition of the object detection processing of the object detection device 2a can be prevented in the case where the identification candidate area 55 does not contain the detection target (a sketch of the overall search loop of steps S51 to S54 follows this item).
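Putting steps S51 to S54 together, an exhaustive search over the rotation parameters could look like the sketch below, reusing the ncc_similarity and classifier_templates sketches above. The step size, threshold, trial cap, and the helper render_viewpoint (which would rotate the 3D points of the candidate area and project them to a template-sized image) are all assumptions made for illustration:

```python
import itertools
import numpy as np

def determine_conversion(candidate_points_3d, classifier_templates, render_viewpoint,
                         threshold=0.7, step_deg=30, max_trials=200):
    """Try rotation parameters (alpha, beta, gamma) exhaustively and return the first
    combination whose converted image matches some classifier template well enough.
    Returns ((alpha, beta, gamma), classifier_id, similarity), or None if the trial cap is hit."""
    angles = np.deg2rad(np.arange(0, 360, step_deg))
    trials = 0
    for alpha, beta, gamma in itertools.product(angles, repeat=3):           # step S51
        if trials >= max_trials:                # termination by a preset number of parameter changes
            return None
        trials += 1
        converted = render_viewpoint(candidate_points_3d, alpha, beta, gamma)  # step S52
        for cid, template in classifier_templates.items():                     # step S53
            sim = ncc_similarity(converted, template)
            if sim >= threshold:                                                # step S54
                return (alpha, beta, gamma), cid, sim
    return None
```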
  • Next, the effects of the image conversion method determination unit 7a will be described using FIGS. 7A and 7B.
  • Here, 82a, 82b, and 82c indicate viewpoints (installation positions and orientations) of the imaging device 1, and 87a, 87b, and 87c indicate rectangular images containing a person extracted from the images captured from the respective viewpoints.
  • The rectangular image 87a captured from the viewpoint 82a has a high similarity with the template 66_1 of the classifier 64_1, and the rectangular image 87c captured from the viewpoint 82c has a high similarity with the template 66_2 of the classifier 64_2. Therefore, for the rectangular image 87a and the rectangular image 87c, a person can easily be detected by using the classifier 64_1 or the classifier 64_2, respectively.
  • For the rectangular image 87b captured from the viewpoint 82b, however, neither the classifier 64_1 nor the classifier 64_2 can identify the person. Therefore, it has conventionally been difficult to detect a person at a site where only an imaging device 1 at the viewpoint 82b is installed.
  • In contrast, in the present embodiment, the parameters α, β, and γ are determined, and a converted image 86b is created by applying a virtual viewpoint conversion to the rectangular image 87b using those parameters as input. The similarity between the converted image 86b and the templates 66_1 and 66_2 is then calculated, and when there is a classifier 64_n showing a similarity equal to or above the threshold, the ID 67 of that classifier is acquired. If there is no classifier 64_n showing a similarity equal to or above the threshold, the parameters α, β, and γ are determined again and the same processing is performed. Although the rectangular image 87b is deformed because of the tilt of the camera, image information of the front of the person and the corresponding three-dimensional information can be acquired.
  • Next, using FIG. 7B, the advantage of the image conversion method determination unit 7a will be described for a situation in which an obstacle exists between the viewpoint 82b and the person, so that part of the human body (for example, the legs) is not visible in the rectangular image 87b.
  • In this case, if a conversion toward a frontal view is chosen, the converted image 86b lacks the legs just as the rectangular image 87b does, so the person cannot be detected by the classifier 64_1, which requires the legs for identification.
  • However, if the rectangular image 87b is viewpoint-converted to the viewpoint 82c, the person can be detected by the classifier 64_2, which does not require the legs for identification, even though the converted image 86b also lacks the legs, just as the rectangular image 87b does.
  • the image conversion unit 8 converts the identification candidate area 55 in accordance with the image conversion method determined by the image conversion method determination unit 7a, and acquires a converted image 86 suitable for input to the classifier 64.
  • As the image conversion method, conversion equations such as Equations 1 to 3 can be used, as in step S52, but other methods may be used.
  • FIG. 8 shows the details of the identification unit 9.
  • The identification unit 9 determines whether or not the detection target is included in the converted image 86_n acquired by the image conversion unit 8, and comprises a classifier recording unit 91 that records at least one classifier 64_n, an identification processing execution unit 92 that performs identification processing on the converted image 86_n using a classifier 64_n, and an identification result output unit 93 that outputs the result of the identification processing.
  • the identification process execution unit 92 and the identification result output unit 93 will be described in detail.
  • the discrimination processing execution unit 92 performs discrimination processing on the converted image 86_n output from the image conversion unit 8 using the classifier 64_n recorded in the classifier recording unit 91.
  • Specifically, the identification processing execution unit 92 acquires the ID of the classifier 64_n selected by the image conversion method determination unit 7a, selects the classifier 64_n corresponding to that ID, and then performs identification processing on the converted image 86_n output from the image conversion unit 8.
  • the identification result output unit 93 outputs the identification processing result of the identification processing execution unit 92 to the outside.
  • For example, when the object detection device 2a is connected to a display device such as a monitor, the image of the imaging space may be displayed on that display device.
  • When the identification processing execution unit 92 determines that the converted image 86_n includes the detection target, the identification candidate area information 54_n of the identification candidate area 55 on which the converted image 86_n is based is referred to, and the image position of the identification candidate area 55 in the captured image is acquired.
  • Then, a rectangular detection window or the like may be displayed at the position corresponding to the detection target in the captured image shown on the display device, or a message indicating that the detection target has been detected may be displayed.
  • In step S91, the imaging device 1 first acquires image information and three-dimensional information for the measurement range and outputs them to the object detection device 2a.
  • Then, based on the input from the imaging device 1, the image acquisition unit 3 acquires the image information and the three-dimensional information acquisition unit 4 acquires the three-dimensional information.
  • In step S92, identification candidate areas 55 are extracted using the identification candidate area extraction unit 5. Specifically, the rectangular areas extracted by the image processing unit 51 and the rectangular parallelepiped areas extracted by the three-dimensional information processing unit 52 are taken as identification candidate areas 55, and the identification candidate area ID assigning unit 53 then assigns an ID to each extracted identification candidate area 55.
  • In step S93, one identification candidate area 55 to be subjected to identification processing is selected from the extracted identification candidate areas 55.
  • In step S94, viewpoint conversion is performed on the selected identification candidate area 55, a converted image 86_n is acquired by projecting it onto an image plane, and the image conversion method that maximizes the similarity to the classifier information 65 is obtained.
  • In other words, the ID of the classifier 64_n whose template 66 has the highest similarity to the converted image 86_n obtained by viewpoint-converting the identification candidate area 55 is acquired, and an image conversion method appropriate for that classifier 64_n is determined.
  • In step S95, image conversion is performed on the selected identification candidate area 55 by the conversion method determined in step S94, and a converted image 86_n is acquired.
  • In step S96, identification processing is performed on the converted image 86_n acquired in step S95, using the classifier 64_n.
  • In step S97, it is determined whether, as a result of the identification processing, the converted image includes the detection target. If it does, the process proceeds to step S98; if it does not, the process proceeds to step S99.
  • In step S98, since it has been determined that the detection target is included in the converted image, the identification result is output.
  • For example, when the object detection device 2a is connected to a display device such as a monitor, an image of the imaging space may be displayed on the display device, a rectangular detection window may be displayed at the position corresponding to the identification candidate area 55 in that image, or a message indicating that a detection target has been detected may be displayed.
  • the position corresponding to the identification candidate area 55 is acquired with reference to the position information recorded in the identification candidate area information management unit 54.
  • In step S99, after the identification processing for the selected identification candidate area 55 is completed, it is determined whether identification processing has been performed for all of the identification candidate areas 55 extracted in step S92. If there is an identification candidate area 55 that has not yet been processed, the process returns to step S93; if all identification candidate areas 55 have been processed, the object detection processing ends (a sketch of this overall flow follows this item).
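As a rough end-to-end sketch of the loop over steps S93 to S99 (the candidate objects are assumed to carry their 3D point group as points_3d in addition to the fields sketched earlier, and render_viewpoint, classify, and determine_conversion are the placeholder helpers from the sketches above, not APIs defined by this publication):

```python
def detect_objects(candidates, classifier_templates, render_viewpoint, classify,
                   determine_conversion):
    """Simplified flow: for each extracted candidate area, choose a viewpoint conversion,
    generate the converted image, run the selected classifier, and collect the detections."""
    detections = []
    for area in candidates:                                                # steps S93 / S99 loop
        result = determine_conversion(area.points_3d, classifier_templates,
                                      render_viewpoint)                    # step S94
        if result is None:
            continue                                                       # no usable conversion found
        (alpha, beta, gamma), cid, _sim = result
        converted = render_viewpoint(area.points_3d, alpha, beta, gamma)   # step S95
        if classify(cid, converted):                                       # steps S96-S97
            detections.append((area.area_id, area.image_start, area.image_end))  # step S98
    return detections
```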
  • As described above, the object detection device 2a detects the detection target after converting each extracted identification candidate area 55, by virtual viewpoint conversion, into an image suitable for input to the classifier. As a result, the detection target can be detected with high accuracy even when the appearance of the detection target in the image differs from the template of the classifier, or when part of the detection target on the screen is hidden by an occluding object.
  • FIG. 10 is a block diagram showing an outline of the object detection device 2b of the second embodiment, connected to an imaging device 1 such as a stereo camera.
  • Whereas the object detection device 2a of the first embodiment uses the image conversion method determination unit 7a, which exhaustively changes the parameters α, β, and γ for rotating the three-dimensional information, the object detection device 2b of the present embodiment uses an image conversion method determination unit 7b that can determine the parameters α, β, and γ more efficiently.
  • the image conversion method determination unit 7b will be described in detail.
  • FIG. 11 shows a situation in which an upright person is photographed from the viewpoint 82d.
  • Xc, Yc, and Zc are the x-axis, y-axis, and z-axis of the camera coordinate system
  • 204 indicates the optical axis of the imaging device 1 installed at the viewpoint 82d.
  • the camera coordinate system is a three-dimensional coordinate system representing a shooting space, with the optical center of the camera of the imaging device 1 as the origin and the z axis (Zc) aligned with the direction of the optical axis 204 of the camera.
  • The x-axis (Xc) and the y-axis (Yc) are parallel to the horizontal and vertical directions of the image projection plane 205, respectively.
  • 205 is an image projection plane
  • 206 is an image captured from the viewpoint 82d
  • 207 is three-dimensional information acquired from the viewpoint 82d
  • 208 is a straight line indicating the posture direction of the detection target
  • 209 and 210 indicate point coordinates of the identification candidate area 55 and the straight line 208 in the camera coordinate system.
  • 211 and 212 indicate the point coordinates (Xct', Yct', Zct') and (Xcb', Ycb', Zcb') in the camera coordinate system of the identification candidate area 55 and the straight line 208 after the virtual viewpoint conversion.
  • The image conversion method determination unit 7b calculates the parameters α, β, and γ such that the straight line 208, which is inclined with respect to the y-axis (Yc), is converted into a straight line 208' parallel to the y-axis (Yc). Then, by generating a three-view drawing of the converted identification candidate area 55', the converted image 86 optimal as input to the classifier 64 is obtained.
  • FIG. 12 shows the processing flow of the image conversion method determination unit 7b, including the process of determining the parameters α, β, and γ. The following outlines this processing flow.
  • First, the parameter β is set to 0° (S121), and the straight line 208 is acquired (S122). Then, after the parameters γ and α for rotating the straight line 208 are set (S123), a three-view drawing of the detection target is generated using the set parameters α, β, and γ (S124). After one of the three views is selected as the converted image 86 (S125), the similarity between the selected converted image 86 and the classifier information 65 is calculated (S126). If the similarity is equal to or greater than the threshold, the selected view is determined as the input image to the classifier 64 and the process ends; if the similarity is less than the threshold, the process proceeds to step S128 (S127).
  • In step S129, the parameter β is changed, that is, the identification candidate area 55 is rotated about the y-axis (Yc), the process then returns to step S124, and the processing is repeated until a converted image 86 with a similarity equal to or greater than the threshold is obtained.
  • In step S122, the straight line 208 is acquired. For example, the three-dimensional information of the identification candidate area 55 is referred to, and the straight line connecting the two points of the three-dimensional point group whose Euclidean distance is maximal is taken as the straight line 208.
  • This is because the identification candidate area 55 containing a person can be expected to be a rectangular parallelepiped elongated in the vertical direction, so the direction in which the Euclidean distance is maximal gives a straight line 208 indicating the posture direction of the person.
  • Alternatively, principal component analysis may be performed on the three-dimensional point group of the identification candidate area 55, and a straight line in the direction of the first principal component may be used.
  • The straight line 208 may also be obtained using the direction orthogonal to the floor surface and the information of a single point corresponding to the head; it suffices to decide in advance which method is used (a sketch of the first two methods follows this item).
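A hedged sketch of the first two ways of obtaining the straight line 208 described above (the farthest point pair, or the first principal component of the point group); the point group is assumed to be an (N, 3) NumPy array and the function names are illustrative:

```python
import numpy as np

def posture_line_farthest_pair(points):
    """Return the two points of the 3D point group with the maximal Euclidean distance.
    O(N^2) in time and memory, which is acceptable for the points inside one candidate area."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    return points[i], points[j]

def posture_line_pca(points):
    """Return the centroid and the first principal component direction of the point group."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return points.mean(axis=0), vt[0]   # vt[0] is the dominant direction of the point group
```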
  • In step S123, the rotation angle about the z-axis (Zc) used to rotate the identification candidate area 55 corresponds to the parameter γ, and the rotation angle about the x-axis (Xc) used to rotate the identification candidate area 55 so that Zct' and Zcb' become equal corresponds to the parameter α. Since the parameter β is set to 0° in step S121, the parameters α, β, and γ can all be determined by the above processing (a sketch of this angle computation follows this item).
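Assuming the straight line 208 is given by its top and bottom end points in camera coordinates, the angles that make it parallel to the y-axis (Yc) can be sketched as follows; the sign convention is an assumption and would have to match the actual rotation definitions (the publication's Equations 1 to 3 are not reproduced here):

```python
import numpy as np

def align_line_to_y_axis(p_top, p_bottom):
    """Return (gamma, alpha): gamma rotates about Zc so the line's x-component vanishes,
    alpha then rotates about Xc so its z-component vanishes, leaving the line parallel to Yc.
    beta (rotation about Yc) is left at 0, as in step S121."""
    d = np.asarray(p_top, float) - np.asarray(p_bottom, float)   # line direction (dXc, dYc, dZc)
    gamma = np.arctan2(d[0], d[1])       # rotation about Zc that removes the x-component
    dy_after = np.hypot(d[0], d[1])      # remaining length in the Xc-Yc plane after that rotation
    alpha = np.arctan2(d[2], dy_after)   # rotation about Xc that removes the z-component
    return gamma, alpha
```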
  • In step S124, a three-view drawing of the identification candidate area 55 after the virtual viewpoint conversion is acquired.
  • Viewpoints 82e, 82f, and 82g are the viewpoints for generating the three views, and converted images 86e, 86f, and 86g indicate the converted images 86 generated from the respective viewpoints.
  • The three-view drawing is generated while changing the parameter β, and when a suitable parameter β is obtained, the converted image 86e is acquired; that is, the viewpoint can be virtually converted to a viewpoint facing the person in the identification candidate area 55 from the front, and the person can then be detected using the classifier 64 that has the corresponding template.
  • However, the appearance of the detection target from a specific direction may be unsuitable for identification because of occlusion or the like. Therefore, in addition to the converted image 86e from the frontal viewpoint 82e, the converted images 86f and 86g are also generated from the side viewpoint 82f and the top viewpoint 82g; this increases the number of candidate classifiers 64 and improves the detection accuracy.
  • As described above, according to the present embodiment, the parameters α, β, and γ can be determined more efficiently than in the first embodiment, and highly accurate person detection can be carried out even when the person appears deformed in the image or when occlusion occurs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Studio Devices (AREA)

Abstract

This object detection device, which determines whether there is a detection target within a measurement range, is characterized by comprising: a three-dimensional information acquisition unit which acquires three-dimensional information within the measurement range on the basis of an input from an image capturing device; an identification candidate area extraction unit which extracts identification candidate areas in which the detection target object may be present; identifiers which are used to detect the detection target object; an identifier information acquisition unit which acquires identifier information; an image conversion method determining unit which determines a parameter for virtually viewpoint-converting the three-dimensional information in the identification candidate areas; an image conversion execution unit which generates a converted image on the basis of the virtually viewpoint-converted three-dimensional information in the identification candidate areas; and an identification unit which detects the detection target object using the identifier on the basis of the converted image.

Description

Object detection device and object detection method
The present invention relates to an object detection device and an object detection method that realize robust object detection even when the installation state of the camera changes, or when the appearance of the detection target changes due to movement of the camera or of the detection target.
There is a strong need for object detection techniques that detect a target object (for example, a person, cargo, or a vehicle) from image information acquired by an imaging device such as a surveillance camera. Typical object detection techniques include background subtraction, in which a background image containing no detection target is prepared in advance and objects are detected by comparing the input captured image with that background image, and optical flow, which detects moving bodies from differences between feature points across video frames. However, such methods detect every moving object in the image and therefore cannot, for example, detect only a specific target in the image.
Therefore, a technique is required that detects a specific object using the object's contour information, or appearance information such as color and shape that can be read from its external appearance.
For example, Patent Document 1 describes, in paragraph 0034, an identification unit that "identifies as a person an image that is determined to be a person from contour information obtained from appearance-based feature amounts by HOG, and that is also determined to be foreground (moving or static state) from feature amounts based on spatiotemporal features obtained by pixel-state analysis". In addition to this description, Patent Document 1 discloses a technique that realizes person detection using a means for extracting the contour information of a person from learning samples consisting of images that include a person and images that do not, and generating a classifier that distinguishes persons from non-persons, together with a means for determining, using that classifier, whether or not a person exists in a predetermined region of an image.
Further, Patent Document 2 describes, in paragraph 0016, that "the deformation detection area 2100 on the monitoring image 2000 reflects information on each parameter of the camera device and, as shown in FIG. 3, is created in consideration of the distortion of the monitoring image 2000. The object recognition apparatus 1 (1a) then extracts a feature amount from the image information 100 of the deformation detection area 2100, which is created as an area containing a recognition target deformed by distortion or the like, and determines whether or not it is the target object." In addition to this description, Patent Document 2 discloses, as an application of the technique of Patent Document 1, a technique that assumes the detection target in the image is deformed by the lens distortion peculiar to the camera, and improves the detection rate by deforming the predetermined region before it is input to the classifier that determines whether or not a person exists.
Patent Document 1: JP 2009-181220 A; Patent Document 2: JP 2012-221437 A
In Patent Document 1, a classifier is trained with person images of a specific posture (for example, images of an upright posture photographed from the front) as learning samples, and persons in that specific posture are detected using this classifier, which raises the detection rate for persons in that posture.
However, in an actually captured image, the posture (appearance) of a person changes greatly depending on the relative positional relationship between the camera device and the person and on the lens distortion of the camera device. When the contour information of the person in the captured image differs from that of the learning samples, the classifier of Patent Document 1 therefore has the problem that the accuracy of person detection is reduced.
Further, in Patent Document 2, as shown in FIG. 13 and elsewhere in that document, the contour information of the person input to the classifier is deformed (normalized), based on the parameter information of the camera device and the positional relationship between the detection target and the camera device, so that it matches a preset specific posture; this makes it possible to maintain the classifier's detection rate even when the person's posture changes within a certain range.
However, when the appearance of the detection target differs greatly from what is assumed, or when part of the detection target is hidden behind an occluding object, the object recognition method of Patent Document 2 has the problem that the detection rate drops significantly. For example, even when a person can easily be detected from an image containing all of the person's head, arms, torso, and legs, the classifier of Patent Document 2 suffers a large drop in the person detection rate for an image of the person photographed from directly above, or an image in which the lower half of the body is hidden by an occluding object, because the legs cannot be detected in such images.
In order to solve such problems, an object of the present invention is to provide an object detection device capable of realizing highly accurate person detection even from captured images containing a person in a posture not covered by the classifier, or from images captured with part of the human body hidden in the shadow of an obstacle.
An object detection device according to the present invention is an object detection device that determines whether or not a detection target exists in a measurement range, and comprises: a three-dimensional information acquisition unit that acquires three-dimensional information in the measurement range based on input from an imaging device; an identification candidate area extraction unit that extracts identification candidate areas in which the detection target may exist; classifiers used to detect the detection target; a classifier information acquisition unit that acquires information on the classifiers; an image conversion method determination unit that determines parameters for virtually applying viewpoint conversion processing to the three-dimensional information in an identification candidate area; an image conversion execution unit that generates a converted image from the virtually viewpoint-converted three-dimensional information in the identification candidate area; and an identification unit that detects the detection target using a classifier based on the converted image.
According to the object detection device of the present invention, the target object can be detected with high accuracy even when using an image in which the relative position of the camera device and the object differs greatly from what is assumed, or an image in which part of the object is occluded.
FIG. 1 is a diagram showing a configuration example of the object detection device of the first embodiment. FIG. 2 is a diagram showing details of the identification candidate area extraction unit of the first embodiment. FIG. 3A is a diagram showing details of the identification candidate area information management unit of the first embodiment. FIG. 3B is a diagram showing an identification candidate area in a two-dimensional image. FIG. 3C is a diagram showing an identification candidate area in the three-dimensional imaging space. FIG. 4 is a diagram showing details of the classifier of the first embodiment. FIG. 5 is a diagram showing details of the image conversion method determination unit of the first embodiment. FIG. 6 is a diagram explaining the processing content of viewpoint conversion in the first embodiment. FIGS. 7A and 7B are diagrams explaining the effect of the image conversion method determination unit. FIG. 8 is a diagram showing details of the identification unit in the configuration example of the first embodiment. FIG. 9 is a diagram showing an example of the processing flow in the first embodiment. FIG. 10 is a diagram showing a configuration example of the object detection device of the second embodiment. FIG. 11 is a diagram explaining the processing of the image conversion method determination unit of the second embodiment. FIG. 12 is a diagram explaining the processing flow of the image conversion method determination unit of the second embodiment. FIG. 13 is a diagram explaining the details of the processing flow of FIG. 12.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings as appropriate. In the following, an example in which the detection target is a person is described, but the detection target is not limited to a person and may be cargo, a vehicle, or the like. Also, although an example of detecting the detection target from image information captured by an imaging device such as a camera is described, the information containing the detection target is not limited to image information captured by an imaging device and may be, for example, a heat map acquired by a thermal sensor.
The object detection device 2a of the first embodiment will be described with reference to FIGS. 1 to 9.
FIG. 1 is a block diagram showing an outline of the object detection device 2a of the present embodiment, connected to an imaging device 1 such as a stereo camera. The object detection device 2a is an object detection device that realizes robust detection of the detection target even when the appearance of the detection target on the captured image of the imaging device 1 changes due to a change in the relative position between the imaging device 1 and the detection target.
In the object detection device 2a shown in FIG. 1, reference numeral 3 denotes an image acquisition unit that acquires image information within the measurement range based on input from the imaging device 1; 4 denotes a three-dimensional information acquisition unit that acquires three-dimensional information within the measurement range based on input from the imaging device 1; 5 denotes an identification candidate area extraction unit that extracts, from the measurement range, identification candidate areas in which a detection target may exist, using the image information and the three-dimensional information; 6 denotes a classifier information acquisition unit that acquires information on the classifiers 64 used in the object detection device 2a; 7a denotes an image conversion method determination unit that uses the classifier information to determine how to convert an identification candidate area into an image optimal as input to a classifier 64; 8 denotes an image conversion unit that acquires a converted image from the identification candidate area based on the determined image conversion method; and 9 denotes an identification unit that determines whether or not a detection target is included in the converted image. Some or all of the units from the image acquisition unit 3 to the identification unit 9 need not be dedicated hardware; they may be realized by having an arithmetic device such as a CPU process programs stored in a main storage device such as semiconductor memory and data stored in an auxiliary storage device such as a hard disk.
 以下、図1に示した、撮像装置1、識別候補領域抽出部5、識別器情報取得部6、画像変換方法決定部7a、画像変換部8、識別部9について、個々に詳細説明する。
<撮像装置>
 撮像装置1は、計測範囲の画像情報と三次元情報を取得できる装置である。ここで、画像情報とはデジタル画像データにおける輝度情報、三次元情報とは計測範囲(三次元空間)における三次元点群の座標情報である。
Hereinafter, the imaging device 1, the identification candidate area extraction unit 5, the classifier information acquisition unit 6, the image conversion method determination unit 7a, the image conversion unit 8 and the identification unit 9 shown in FIG. 1 will be individually described in detail.
<Imaging device>
The imaging device 1 is a device capable of acquiring image information of a measurement range and three-dimensional information. Here, image information is luminance information in digital image data, and three-dimensional information is coordinate information of a three-dimensional point group in a measurement range (three-dimensional space).
The imaging device 1 may be a stereo camera composed of two or more cameras, or a combination of a single camera and a distance sensor capable of acquiring three-dimensional information. For example, a stereo camera photographs the same object with two or more cameras and measures the distance from the cameras to the object using the principle of triangulation, so that both image information and three-dimensional information can be acquired. A distance sensor measures the distance to an object by calculating, from the phase difference between the projected light and the reflected light, the time it takes for the projected light to be reflected by the object and return to the sensor; by combining it with a camera calibrated in advance, three-dimensional information and image information can be acquired in association with each other.
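As a minimal illustration of the triangulation principle mentioned above, the following sketch converts a stereo disparity map into per-pixel depth; the function name, the use of NumPy, and the parameter names are assumptions introduced here for illustration and are not part of the original description.

    import numpy as np

    def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
        # Depth Z = f * B / d for each pixel with a positive disparity d.
        depth = np.zeros_like(disparity_px, dtype=np.float64)
        valid = disparity_px > 0
        depth[valid] = focal_length_px * baseline_m / disparity_px[valid]
        return depth  # depth in metres; 0 where the disparity is unknown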
<Identification candidate area extraction unit>
FIG. 2 shows the details of the identification candidate area extraction unit 5. The identification candidate area extraction unit 5 uses the image information acquired by the image acquisition unit 3, the three-dimensional information acquired by the three-dimensional information acquisition unit 4, or both, to extract identification candidate areas 55 in which the detection target may exist. It comprises an image processing unit 51 that extracts identification candidate areas 55 using the image information, a three-dimensional information processing unit 52 that extracts identification candidate areas 55 using the three-dimensional information, an identification candidate area ID assignment unit 53 that assigns an ID to each of the one or more extracted identification candidate areas 55, and an identification candidate area information management unit 54 that acquires and manages identification candidate area information representing the positions of the identification candidate areas 55. Hereinafter, the image processing unit 51, the three-dimensional information processing unit 52, the identification candidate area ID assignment unit 53, and the identification candidate area information management unit 54 will be described in detail.
The image processing unit 51 extracts identification candidate areas 55 by performing image processing on the image information acquired by the imaging device 1. An example of the image processing executed here is background subtraction, in which a background image of the shooting space with no detection target present is acquired in advance and the difference between that background image and the captured image is calculated; however, any means capable of extracting candidate areas of the detection target from image information may be used, such as detection using color information (for example, skin color detection).
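The background subtraction mentioned above can be sketched as follows; this is only one possible realization under assumed names, and the grayscale frames and the threshold value are illustrative assumptions rather than part of the original description.

    import numpy as np

    def extract_candidate_mask(frame_gray, background_gray, threshold=30):
        # Pixels whose absolute difference from the background exceeds the
        # threshold are treated as belonging to a candidate (foreground) region.
        diff = np.abs(frame_gray.astype(np.int16) - background_gray.astype(np.int16))
        return diff > threshold  # boolean mask of identification candidate pixels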
The three-dimensional information processing unit 52 extracts identification candidate areas 55 by performing three-dimensional processing on the three-dimensional information acquired by the imaging device 1. An example of the three-dimensional processing executed here is to acquire in advance background three-dimensional information of the shooting space with no detection target present and to calculate the difference between that background three-dimensional information and newly acquired three-dimensional information; however, any method that obtains identification candidate areas 55 by three-dimensional processing may be used.
Next, the identification candidate area ID assignment unit 53 and the identification candidate area information management unit 54 will be described with reference to FIGS. 3A to 3C.
The identification candidate area ID assignment unit 53 assigns an ID to each of the identification candidate areas 55 extracted by the image processing unit 51 and the three-dimensional information processing unit 52. The identification candidate area information management unit 54 adds the position information of each identification candidate area to its ID and manages the pair as identification candidate area information 54_n. The position information consists of an image position indicating the start point and end point of the identification candidate area in the two-dimensional image, and a three-dimensional position indicating the start point and end point of the identification candidate area in the three-dimensional shooting space.
FIG. 3A exemplifies n pieces of identification candidate area information 54_n managed by the identification candidate area information management unit 54; each piece of identification candidate area information 54_n records, in addition to the ID, the image position and the three-dimensional position, which are the position information of the corresponding identification candidate area 55. FIG. 3B specifically shows the image position of the identification candidate area information 54_1: 56a and 56b indicate the start point (x1, y1) and the end point (x1', y1') of the rectangular identification candidate area 55 in the captured image of the imaging device 1. Similarly, FIG. 3C specifically shows the three-dimensional position of the identification candidate area information 54_1: 57a and 57b indicate the start point (X1, Y1, Z1) and the end point (X1', Y1', Z1') of the rectangular-parallelepiped identification candidate area 55. Although FIGS. 3B and 3C illustrate rectangular and rectangular-parallelepiped identification candidate areas 55, identification candidate areas of other shapes may be used as long as the representation can specify the position of the identification candidate area 55. In that case, it goes without saying that the image position and three-dimensional position information in FIG. 3A are also expressed in accordance with the identification candidate area of the other shape.
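One possible in-memory representation of the identification candidate area information described above is sketched below; the field names are assumptions chosen for illustration and do not appear in the original description.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class CandidateAreaInfo:
        area_id: int                             # ID assigned by the assignment unit 53
        image_start: Tuple[int, int]             # (x1, y1) in the captured image
        image_end: Tuple[int, int]               # (x1', y1')
        world_start: Tuple[float, float, float]  # (X1, Y1, Z1) in the shooting space
        world_end: Tuple[float, float, float]    # (X1', Y1', Z1')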
<Classifier information acquisition unit>
Next, the classifier information acquisition unit 6 will be described with reference to FIG. 4. The classifier information acquisition unit 6 selects an appropriate classifier from a plurality of prepared classifiers 64 and extracts the classifier information 65 corresponding to it. Note that 67_n is a classifier ID assigned in order to manage the classifier 64_n.
The classifiers 64 are used in the identification processing that determines whether a detection target is included in a captured image of the imaging device 1, and each classifier 64_n has a high discrimination capability for detection targets in a different posture. By learning a large number of images that include the detection target and images that do not (learning samples) with a machine learning method, each classifier 64_n can be given different characteristics. A Support Vector Machine is a common choice of machine learning method, but other machine learning methods may be used.
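A minimal sketch of training one such posture-specific classifier with a Support Vector Machine is shown below, assuming HOG features as the image descriptor; the use of scikit-learn and scikit-image, the feature choice, and the parameter values are assumptions made here for illustration and are not specified in the original description.

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    def train_posture_classifier(positive_images, negative_images):
        # positive_images: grayscale crops (of a common, fixed size) containing
        # the target in one posture; negative_images: crops without the target.
        feats, labels = [], []
        samples = [(img, 1) for img in positive_images] + [(img, 0) for img in negative_images]
        for img, label in samples:
            feats.append(hog(img, pixels_per_cell=(8, 8), cells_per_block=(2, 2)))
            labels.append(label)
        clf = LinearSVC(C=1.0)
        clf.fit(np.asarray(feats), np.asarray(labels))
        return clf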
The classifier information 65_n indicates the input images for which the classifier 64_n exhibits particularly high discrimination capability. FIG. 4 exemplifies, as classifier information, a template 66_1 strong at identifying person images viewed from the front, a template 66_2 strong at identifying person images viewed from above, and a template 66_n strong at identifying person images viewed from the side; however, other information may be recorded, such as color information, feature amounts expressing contours, luminance information, or gradient information, as long as it is classifier information expressing an image, or a method of generating an image, suitable as input to the classifier 64_n.
<Image conversion method determination unit>
Next, the processing flow of the image conversion method determination unit 7a will be described with reference to FIG. 5. Based on the three-dimensional information in the rectangular-parallelepiped identification candidate area 55 illustrated in FIG. 3C, the image conversion method determination unit 7a determines a conversion method (parameters and the like) for converting the area into an image optimal as input to a classifier 64. In the processing flow of the image conversion method determination unit 7a, parameters for viewpoint conversion are first determined (S51), and a viewpoint-converted image is generated using those parameters (S52). Then, the similarity between the converted image and the classifier information 65 held by each of the plurality of classifiers 64 is calculated (S53); if the similarity is higher than a threshold, the processing ends, and if it is equal to or below the threshold, the processing returns to step S51 and the parameters are changed to other values (S54). Hereinafter, steps S51, S52, S53, and S54 will be described in detail.
In step S51, the parameters α, β, and γ necessary for generating the viewpoint-converted image are determined. The details of each parameter are described later. One method of determining the parameters α, β, and γ in step S51 is to vary them exhaustively.
In step S52, the processing shown in FIG. 6 is performed. In FIG. 6, 82 is a viewpoint from which the identification candidate area 55 is observed; 83, 84, and 85 are the x-axis, y-axis, and z-axis of the coordinate system of the three-dimensional space set in the measurement range; and 86_1 and 86_2 show examples of converted images created by viewpoint conversion. In step S52, using the parameters α, β, and γ determined in step S51, the three-dimensional information contained in the rectangular-parallelepiped identification candidate area 55 is rotated by α about the x-axis 83, by β about the y-axis 84, and by γ about the z-axis 85, thereby performing a viewpoint conversion to a state observed from an arbitrary viewpoint, and the converted image 86 is obtained by projecting the viewpoint-converted identification candidate area 55 onto an image.
As the viewpoint conversion method, it is common to use conversion formulas such as Equations 1 to 3, but other viewpoint conversion methods may be used.
(Equations 1 to 3: the formulas for the rotations about the x-, y-, and z-axes; the formula images of the original publication are not reproduced in this text.)
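Since the original formula images are unavailable here, and assuming Equations 1 to 3 are the standard rotation matrices about the three coordinate axes (an assumption consistent with the description of step S52, not a reproduction of the published equations), they can be written as

$$R_x(\alpha)=\begin{pmatrix}1&0&0\\0&\cos\alpha&-\sin\alpha\\0&\sin\alpha&\cos\alpha\end{pmatrix},\quad R_y(\beta)=\begin{pmatrix}\cos\beta&0&\sin\beta\\0&1&0\\-\sin\beta&0&\cos\beta\end{pmatrix},\quad R_z(\gamma)=\begin{pmatrix}\cos\gamma&-\sin\gamma&0\\\sin\gamma&\cos\gamma&0\\0&0&1\end{pmatrix},$$

with a viewpoint-converted point obtained, for example, as $\mathbf{p}'=R_z(\gamma)\,R_y(\beta)\,R_x(\alpha)\,\mathbf{p}$; the composition order is likewise an assumption.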
A general method of projecting the identification candidate area 55 onto the converted image 86_n is perspective projection, but other methods may be used. For example, when an identification candidate area 55 containing the three-dimensional information of an upright person is captured by the imaging device 1 installed in the direction of the viewpoint 82, projecting the identification candidate area 55 without viewpoint conversion yields a converted image 86_1 in which the person is viewed from above. In contrast, if the identification candidate area 55 captured by the same imaging device 1 is subjected to a viewpoint conversion that rotates it by α = 0°, β = 0°, γ = 90° and is then projected onto an image with respect to the viewpoint 82, a converted image 86_2 in which the person is viewed from the side can be obtained.
Furthermore, in step S53 of FIG. 5, an optimization process is performed to determine the image conversion method that yields the image most suitable for the classifier 64. One way to determine the image conversion method is, for example, to obtain the template 66 by referring to the classifier information 65 and to calculate the similarity between it and the converted image 86_n obtained by applying viewpoint conversion to the identification candidate area 55. As a method of calculating the similarity, pattern matching such as Normalized Cross-Correlation is commonly used, but other methods may be used. In this case, the similarity is taken as an evaluation function with the parameters α, β, and γ as variables, and by solving the optimization problem that maximizes this evaluation function, the image conversion method that maximizes the similarity for the classifier 64_n is obtained. As illustrated in FIG. 4, when two or more classifiers 64 exist, the similarity between the converted image 86_n obtained from the identification candidate area 55 and each classifier is calculated, and the ID of the classifier 64_n with the highest similarity is recorded.
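A minimal sketch of the Normalized Cross-Correlation mentioned above, used as the similarity between a converted image and a template, is given below; resampling the converted image to the template size beforehand is an assumption made here for illustration.

    import numpy as np

    def normalized_cross_correlation(converted_image, template):
        # Both inputs are grayscale arrays of the same shape (the converted image
        # is assumed to have been resampled to the template size beforehand).
        a = converted_image.astype(np.float64) - converted_image.mean()
        b = template.astype(np.float64) - template.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        return float((a * b).sum() / denom) if denom > 0 else 0.0  # value in [-1, 1]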
In step S54, the similarity between the converted image 86_n calculated in step S53 and the classifier information 65 is compared with a threshold; if the similarity is equal to or greater than the threshold, the processing ends, and if it is less than the threshold, the processing returns to step S51, the parameters are changed to different values, and the same processing is repeated. The threshold used in step S54 may be set arbitrarily by the installer of the object detection device 2a, but it may also be changed to an appropriate value by feeding back the accuracy of object detection obtained when the object detection device 2a performs object detection with a given threshold. For example, when the accuracy of the object detection device 2a with a certain threshold is judged to be insufficient, the threshold may be changed to a higher value.
When determining the parameters in step S51, the image conversion method determination unit 7a may create in advance a matrix map recording the aspect ratio of the converted image generated by the parameters α, β, and γ, and determine the parameters by referring to it. Alternatively, the shooting space may be divided into a plurality of regions, and a matrix map holding parameters α, β, and γ that are roughly effective for each region may be prepared and referred to; in that case, the map may be updated whenever parameters more suitable than the α, β, and γ it holds are found. Alternatively, the roughly effective parameters α, β, and γ may be determined by acquiring camera parameters and thereby obtaining information on the installation state of the imaging device 1.
The image conversion method determination unit 7a may also judge whether or not to continue the process of changing the parameters α, β, and γ and calculating the similarity; if the process is to be continued, it returns to step S51, and if not, the processing ends. The criterion for this judgment may be, for example, whether the number of times the parameters α, β, and γ have been changed exceeds a preset number. Alternatively, the processing may be terminated when the similarity calculated in step S53 is equal to or less than a preset minimum value. By terminating the processing even when the similarity never reaches the threshold, it is possible to prevent the object detection processing of the object detection device 2a from being repeated wastefully when the identification candidate area 55 does not contain an object to be detected.
Next, the effects of the image conversion method determination unit 7a will be described with reference to FIGS. 7A and 7B. In FIG. 7A, 82a, 82b, and 82c indicate viewpoints (installation positions and directions) of the imaging device 1, and 87a, 87b, and 87c indicate rectangular images containing a person, extracted from the images captured from the respective viewpoints.
When a person is to be detected from the rectangular images 87a, 87b, and 87c of an upright person using the classifiers 64_1 and 64_2 shown in FIG. 7B, the rectangular image 87a captured from the viewpoint 82a has a high similarity to the template 66_1 of the classifier 64_1, and the rectangular image 87c captured from the viewpoint 82c has a high similarity to the template 66_2 of the classifier 64_2. Therefore, for the rectangular images 87a and 87c, the person can easily be detected by using the classifier 64_1 or the classifier 64_2.
On the other hand, the person in the rectangular image 87b captured from the viewpoint 82b is deformed (deviates from the templates) because of the inclination of the line of sight of the imaging device 1, and has a low similarity to both the template 66_1 and the template 66_2, so the classifiers 64_1 and 64_2 cannot identify the person. For this reason, it has conventionally been difficult to detect a person at a site where only the imaging device 1 at the viewpoint 82b is installed.
Even in such a case, using the image conversion method determination unit 7a of this embodiment makes it possible to detect the deformed detection target in the rectangular image 87b. The procedure for detecting the deformed person captured from the viewpoint 82b is described below.
First, the parameters α, β, and γ are determined, and with these as input, a virtual viewpoint conversion is applied to the rectangular image 87b to create a converted image 86b. Then, the similarity between the converted image 86b and the templates 66_1 and 66_2 is calculated, and if there is a classifier 64_n showing a similarity equal to or greater than the threshold, the ID 67 of that classifier is obtained. If there is no classifier 64_n showing a similarity equal to or greater than the threshold, the parameters α, β, and γ are determined again and the same processing is performed. Although the rectangular image 87b is deformed by the tilt of the camera, it still captures the image information and three-dimensional information of the front of the person. Therefore, when the rectangular image 87b captured from the viewpoint 82b is virtually viewpoint-converted to the viewpoint 82a, a converted image 86b similar to the rectangular image 87a can be obtained, and an image suitable for input to the classifier 64_1 can be obtained.
Similarly, when the rectangular image 87b captured from the viewpoint 82b is virtually viewpoint-converted to the viewpoint 82c, a converted image 86b similar to the rectangular image 87c can be obtained, and an image suitable for input to the classifier 64_2 can be obtained.
Here, the advantage of the image conversion method determination unit 7a in a situation where an obstacle exists between the viewpoint 82b and the person, so that part of the human body (for example, the legs) does not appear in the rectangular image 87b, will be described. When the rectangular image 87b is viewpoint-converted to the viewpoint 82a, the converted image 86b, like the rectangular image 87b, lacks the legs, so the person cannot be detected by the classifier 64_1, which requires leg detection. In contrast, when the rectangular image 87b is viewpoint-converted to the viewpoint 82c, the converted image 86b again lacks the legs, but the person can be detected by the classifier 64_2, for which leg detection is unnecessary. In other words, even when a rectangular image 87b with part of the human body missing is input, accurate person detection can be realized by determining an appropriate image conversion method in the image conversion method determination unit 7a and selecting the classifier 64 accordingly.
<Image conversion unit>
The image conversion unit 8 converts the identification candidate area 55 in accordance with the image conversion method determined by the image conversion method determination unit 7a, and obtains a converted image 86 suitable for input to the classifier 64. As in step S52, conversion formulas such as Equations 1 to 3 can be used as the image conversion method, but other methods may be used.
<Identification unit>
FIG. 8 shows the details of the identification unit 9. The identification unit 9 determines whether or not the detection target is included in the converted image 86_n obtained by the image conversion unit 8, and comprises a classifier recording unit 91 that records at least one classifier 64_n, an identification processing execution unit 92 that performs identification processing on the converted image 86_n using a classifier 64_n, and an identification result output unit 93 that outputs the result of the identification processing. Hereinafter, the identification processing execution unit 92 and the identification result output unit 93 will be described in detail.
The identification processing execution unit 92 performs identification processing on the converted image 86_n output by the image conversion unit 8, using a classifier 64_n recorded in the classifier recording unit 91. When two or more classifiers 64_n are recorded in the classifier recording unit 91, the identification processing execution unit 92 acquires the ID of the classifier 64_n selected by the image conversion method determination unit 7a, selects the classifier 64_n corresponding to that ID, and then performs identification processing on the converted image 86_n output by the image conversion unit 8.
The identification result output unit 93 outputs the identification processing result of the identification processing execution unit 92 to the outside. For example, when the object detection device 2a is connected to a display device such as a monitor, an image of the shooting space may be displayed on that display device. When the identification processing execution unit 92 determines that the converted image 86_n contains the detection target, the identification candidate area information 54_n of the identification candidate area 55 on which that converted image 86_n is based is referred to, and the image position of the identification candidate area 55 in the captured image of the imaging device 1 is obtained. A rectangular detection window or the like may then be displayed at the position corresponding to the detection target in the captured image shown on the display device, or a message indicating that the detection target has been detected may be displayed.
<Processing flow>
Next, the processing flow of object detection in the object detection device 2a of this embodiment will be described with reference to FIG. 9.
In step S91, the imaging device 1 first acquires image information and three-dimensional information corresponding to the measurement range and outputs them to the object detection device 2a. The image acquisition unit 3 acquires the image information based on the input from the imaging device 1, and the three-dimensional information acquisition unit 4 acquires the three-dimensional information based on the input from the imaging device 1.
In step S92, identification candidate areas 55 are extracted using the identification candidate area extraction unit 5. Specifically, the rectangular areas extracted by the image processing unit 51 and the rectangular-parallelepiped areas extracted by the three-dimensional information processing unit 52 are taken as identification candidate areas 55, and the identification candidate area ID assignment unit 53 then assigns an ID to each extracted identification candidate area 55.
In step S93, one identification candidate area 55 to be subjected to identification processing is selected from the extracted identification candidate areas 55.
In step S94, viewpoint conversion is applied to the selected identification candidate area 55, and a converted image 86_n is obtained by projecting it onto an image. Through the optimization processing, the image conversion method that maximizes the similarity to the classifier information 65 is obtained. When two or more classifiers 64_n exist, for example, the ID of the classifier 64_n for which the similarity between the template 66 and the converted image 86_n obtained by applying viewpoint conversion to the identification candidate area 55 is highest is obtained, and an image conversion method appropriate for the corresponding classifier 64_n is determined.
In step S95, image conversion is performed on the selected identification candidate area 55 by the conversion method determined in step S94, and a converted image 86_n is obtained.
In step S96, identification processing is performed on the converted image 86_n obtained in step S95, using the classifier 64_n.
In step S97, it is determined whether or not the converted image contains the detection target as a result of the identification processing. If it does, step S98 is performed; if it does not, step S99 is performed.
In step S98, when it is determined as a result of the identification processing that the converted image contains the detection target, the identification result is output. When the object detection device 2a is connected to a display device such as a monitor, an image of the shooting space may be displayed on the display device, and a rectangular detection window may be displayed at the position corresponding to the identification candidate area 55 in the image, or a message indicating that the detection target has been detected may be displayed. The position corresponding to the identification candidate area 55 is obtained by referring to the position information recorded in the identification candidate area information management unit 54.
In step S99, after the identification processing for the selected identification candidate area 55 is finished, it is determined whether identification processing has been performed for all the identification candidate areas 55 extracted in step S92. If there is an identification candidate area 55 for which identification processing has not yet been performed, step S93 is performed; if there is none, the object detection processing ends.
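The overall flow of steps S91 to S99 can be summarized by the following sketch; the function and variable names are assumptions introduced here for illustration, and the callables passed in stand in for the units described above rather than reproducing their exact processing.

    def detect_objects(capture, extract_candidates, determine_conversion,
                       convert_image, classify):
        # Each argument is a callable standing in for the corresponding unit.
        image, points3d = capture()                           # S91
        detections = []
        for area in extract_candidates(image, points3d):      # S92, S93
            params, clf_id = determine_conversion(area)       # S94
            converted = convert_image(area, params)           # S95
            if classify(clf_id, converted):                   # S96, S97
                detections.append(area)                       # S98
        return detections                                     # S99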
As described above, in the object detection device 2a of the first embodiment, the extracted identification candidate area 55 is converted, by virtual viewpoint conversion, into an image suitable for input to a classifier before detection of the detection target is performed. As a result, the detection target can be detected with high accuracy even when its appearance in the image differs from the classifier templates, or when part of the detection target in the image is hidden by an obstruction.
Next, an object detection device 2b according to a second embodiment will be described with reference to FIGS. 10 to 13. Points in common with the first embodiment are not described again.
FIG. 10 is a block diagram showing an outline of the object detection device 2b of this embodiment, which is connected to an imaging device 1 such as a stereo camera. While the object detection device 2a of the first embodiment uses the image conversion method determination unit 7a, which exhaustively varies the parameters α, β, and γ for rotating the three-dimensional information, the object detection device 2b of this embodiment uses an image conversion method determination unit 7b that can determine the parameters α, β, and γ more efficiently. The image conversion method determination unit 7b is described in detail below.
First, the outline of the processing in the image conversion method determination unit 7b will be described with reference to FIG. 11, which shows an upright person being photographed from a viewpoint 82d. In FIG. 11, Xc, Yc, and Zc are the x-axis, y-axis, and z-axis of the camera coordinate system, and 204 indicates the optical axis of the imaging device 1 installed at the viewpoint 82d. Here, the camera coordinate system is a three-dimensional coordinate system representing the shooting space, whose origin is the optical center of the camera of the imaging device 1, whose z-axis (Zc) coincides with the direction of the optical axis 204 of the camera, and whose x-axis (Xc) and y-axis (Yc) are parallel to the horizontal and vertical directions of the image projection plane 205. Further, 205 is the image projection plane, 206 is an image captured from the viewpoint 82d, 207 is the three-dimensional information acquired from the viewpoint 82d, 208 is a straight line indicating the posture direction of the detection target, 209 and 210 are the intersection coordinates (Xct, Yct, Zct) and (Xcb, Ycb, Zcb) of the identification candidate area 55 and the straight line 208 in the camera coordinate system, and 211 and 212 are the intersection coordinates (Xct', Yct', Zct') and (Xcb', Ycb', Zcb') of the identification candidate area 55 and the straight line 208 in the camera coordinate system after the virtual viewpoint conversion.
As shown in the lower part of FIG. 11, the image conversion method determination unit 7b of this embodiment calculates parameters α, β, and γ for converting the straight line 208, which is inclined with respect to the y-axis (Yc), into a straight line 208' parallel to the y-axis (Yc). Then, by generating a three-view drawing of the converted identification candidate area 55', it obtains the converted image 86 that is optimal as input to the classifier 64.
FIG. 12 shows the processing flow of the image conversion method determination unit 7b, including the processing for determining the parameters α, β, and γ described above. This processing flow is outlined below.
First, the viewpoint conversion parameter β is set to 0° (S121), and then the straight line 208 is obtained (S122). After arbitrary parameters α and γ for rotating the straight line 208 are set (S123), a three-view drawing of the detection target is generated using the set parameters α, β, and γ (S124). One of the three views is then selected as the converted image 86 (S125), and the similarity between the selected converted image 86 and the classifier information 65 is calculated (S126); if the similarity is equal to or greater than the threshold, the selected converted image 86 is determined to be the input image for the classifier 64 and the processing ends, whereas if the similarity is less than the threshold, the processing moves to step S128 (S127). In step S128, it is determined whether all of the generated three views have been selected as the converted image; if not all have been selected, the processing moves to step S125, and if the similarity has been calculated for all three views at the current β, the processing moves to step S129 (S128). In step S129, the parameter β is changed, that is, the identification candidate area 55 is rotated about the y-axis (Yc), and the processing returns to step S124 (S129); the processing is repeated until a converted image 86 whose similarity is equal to or greater than the threshold is obtained. The particularly important steps S122, S123, S124, and S129 are described in detail below.
In step S122, the straight line 208 is obtained. One way to obtain the straight line 208 is to refer to the three-dimensional information of the identification candidate area 55 and take the straight line connecting the two points of the three-dimensional point group whose mutual Euclidean distance is largest. This is because, when the detection target is an upright person, the identification candidate area 55 containing that person can be expected to be a rectangular parallelepiped elongated in the vertical direction, and the direction in which the Euclidean distance is largest can be presumed to be the straight line 208 indicating the posture direction of the person. Alternatively, principal component analysis may be performed on the three-dimensional point group of the identification candidate area 55 and the straight line may be taken in the direction of the first component. Alternatively, when the floor surface present in the shooting space can be detected by a common floor surface estimation method, the straight line 208 may be determined using the direction orthogonal to the floor surface and the information of a single point corresponding to the head.
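A minimal sketch of the first two ways of obtaining the posture-direction line 208 from the point group of a candidate area is given below; the exhaustive pairwise search and the use of NumPy's SVD for the principal component are implementation choices assumed here for illustration.

    import numpy as np
    from itertools import combinations

    def posture_line_by_farthest_pair(points):
        # points: (N, 3) array; exhaustively finds the two points with the
        # largest Euclidean distance (O(N^2), acceptable for small point groups).
        best = max(combinations(range(len(points)), 2),
                   key=lambda ij: np.linalg.norm(points[ij[0]] - points[ij[1]]))
        return points[best[0]], points[best[1]]

    def posture_direction_by_pca(points):
        # Direction of the first principal component of the point group.
        centered = points - points.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return vt[0]  # unit vector along the dominant extent of the points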
In step S123, parameters α and γ are determined such that the x-values and z-values of the intersection coordinates 211 and 212 become equal, that is, such that Xct' = Xcb' and Zct' = Zcb'. The rotation angle about the z-axis (Zc) when the identification candidate area 55 is rotated so that Xct' and Xcb' become equal corresponds to the parameter γ, and the rotation angle about the x-axis (Xc) when the identification candidate area 55 is rotated so that Zct' and Zcb' become equal corresponds to the parameter α. Since the parameter β was set to 0° in step S121, the parameters α, β, and γ can be determined by the above processing.
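As an illustrative reconstruction only (the original publication's formulas are not reproduced here, and the ordering of the two rotations and their sign conventions are assumptions), the angles can be expressed from the difference vector between the intersection points 209 and 210 as

$$\gamma=\arctan\frac{X_{ct}-X_{cb}}{Y_{ct}-Y_{cb}},\qquad \alpha=\arctan\frac{Z_{ct}-Z_{cb}}{\sqrt{(X_{ct}-X_{cb})^{2}+(Y_{ct}-Y_{cb})^{2}}},$$

where the rotation by $\gamma$ about the z-axis removes the x-component of the line direction and the subsequent rotation by $\alpha$ about the x-axis removes the remaining z-component.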
Next, the processing of step S124 will be described with reference to FIG. 13. In step S124, a three-view drawing of the identification candidate area 55 after the virtual viewpoint conversion is obtained. In FIG. 13, the viewpoints 82e, 82f, and 82g are the viewpoints for generating the three views, and the converted images 86e, 86f, and 86g are the converted images 86 generated from the respective viewpoints. After the parameters α and γ that make the straight line 208 parallel to the y-axis (Yc) have been determined, three-view drawings are generated while the parameter β is varied; at a certain value of β, as shown in the converted image 86e of FIG. 13, the viewpoint can be virtually converted to one facing the person in the identification candidate area 55 from the front, and the person can be detected using the classifier 64 that has the corresponding template.
In a real environment, however, the appearance of the detection target from a particular direction may be unsuitable for identification because of occlusion or the like. Therefore, in addition to the converted image 86e from the frontal viewpoint 82e, converted images 86f and 86g are also obtained from the side and top viewpoints 82f and 82g; this increases the number of candidate classifiers 64 and improves the accuracy of object detection. The viewpoint 82f, which looks at the side, can be set by further rotating the parameters by α = 0°, β = 90°, γ = 0° with respect to the viewpoint 82e, and the viewpoint 82g, which looks at the top, can be set by further rotating the parameters by α = 90°, β = 0°, γ = 0° with respect to the viewpoint 82e. By perspective-projecting the identification candidate area 55 at the viewpoints 82e, 82f, and 82g, the converted images 86e, 86f, and 86g can be obtained efficiently, and using them as a three-view drawing realizes efficient person detection.
In the object detection device of the second embodiment described above, the parameters α, β, and γ can be determined more efficiently than in the first embodiment, and highly accurate person detection can be carried out even when the person is deformed in the image or when occlusion occurs.
1: imaging device, 2a, 2b: object detection device, 3: image acquisition unit, 4: three-dimensional information acquisition unit, 5: identification candidate area extraction unit, 51: image processing unit, 52: three-dimensional information processing unit, 53: identification candidate area ID assignment unit, 54: identification candidate area information management unit, 54_n: identification candidate area information, 55: identification candidate area, 6: classifier information acquisition unit, 64: classifier, 65: classifier information, 66: template, 7a, 7b: image conversion method determination unit, 8: image conversion unit, 82: viewpoint, 86: converted image, 87: rectangular image, 9: identification unit, 91: classifier recording unit, 92: identification processing execution unit, 93: identification result output unit

Claims (18)

  1.  計測範囲内に検出対象が存在するか否かを判定する物体検出装置であって、
     撮像装置からの入力を基に前記計測範囲内の三次元情報を取得する三次元情報取得部と、
     前記検出対象が存在し得る識別候補領域を抽出する識別候補領域抽出部と、
     前記検出対象の検出に用いる識別器と、
     該識別器の情報を取得する識別器情報取得部と、
     前記識別候補領域内の三次元情報を仮想的に視点変換処理するパラメータを決定する画像変換方法決定部と、
     仮想的に視点変換処理した前記識別候補領域内の三次元情報を基に変換画像を生成する画像変換実施部と、
     該変換画像を基に前記識別器を用いて前記検出対象を検出する識別部と、
     を備えることを特徴とする物体検出装置。
    An object detection apparatus that determines whether a detection target exists within a measurement range, and
    A three-dimensional information acquisition unit that acquires three-dimensional information in the measurement range based on an input from an imaging device;
    An identification candidate area extraction unit that extracts an identification candidate area where the detection target may exist;
    A classifier used to detect the detection target;
    A classifier information acquisition unit that acquires information of the classifier;
    An image conversion method determination unit that determines parameters for virtually performing viewpoint conversion processing on three-dimensional information in the identification candidate area;
    An image conversion execution unit that generates a converted image based on three-dimensional information in the identification candidate area virtually subjected to viewpoint conversion processing;
    An identification unit that detects the detection target using the identifier based on the converted image;
    An object detection apparatus comprising:
  2.  請求項1に記載の物体検出装置において、
     前記撮像装置からの入力を基に前記計測範囲内の画像情報を取得する画像取得部、
     を更に備えることを特徴とする物体検出装置。
    In the object detection device according to claim 1,
    An image acquisition unit for acquiring image information within the measurement range based on an input from the imaging device;
    An object detection apparatus, further comprising:
  3.  請求項2に記載の物体検出装置において、
     前記識別候補領域抽出部は、
     前記画像情報、前記三次元情報、外部センサの少なくとも一つ以上を利用して、前記識別候補領域を抽出することを特徴とする物体検出装置。
    In the object detection device according to claim 2,
    The identification candidate area extraction unit
    An object detection apparatus characterized by extracting the identification candidate area using at least one or more of the image information, the three-dimensional information, and an external sensor.
  4.  請求項2または3に記載の物体検出装置において、
     前記画像変換方法決定部は、
     前記画像情報と前記三次元情報と前記識別器の情報を利用して、前記識別器の入力として最適な前記変換画像を生成するパラメータを決定することを特徴とする物体検出装置。
    In the object detection device according to claim 2 or 3,
    The image conversion method determination unit
    An object detection apparatus characterized by using the image information, the three-dimensional information, and the information of the discriminator to determine a parameter for generating the converted image optimal as an input of the discriminator.
  5.  請求項2から4のいずれかに記載の物体検出装置において、
     前記識別候補領域抽出部は、
     複数の前記識別候補領域にIDを付与する識別候補領域ID付与部と、
     前記識別候補領域のIDと、前記画像情報における位置と、前記三次元情報における位置を纏めて管理する識別候補領域情報管理部と、
     を備えることを特徴とする物体検出装置。
    In the object detection device according to any one of claims 2 to 4,
    The identification candidate area extraction unit
    An identification candidate area ID assigning unit that assigns IDs to a plurality of the identification candidate areas;
    An identification candidate area information management unit that collectively manages the ID of the identification candidate area, the position in the image information, and the position in the three-dimensional information;
    An object detection apparatus comprising:
  6.  請求項1から5のいずれかに記載の物体検出装置において、
     前記識別器情報取得部は、前記識別器のIDと、特に高い識別能力を示す入力信号を表現する識別器情報を取得することを特徴とする物体検出装置。
    In the object detection device according to any one of claims 1 to 5,
    The object detection device characterized in that the identifier information acquisition unit acquires the ID of the identifier and identifier information representing an input signal indicating particularly high discrimination ability.
  7.  請求項2から6のいずれかに記載の物体検出装置において、
     前記画像変換方法決定部は、
     前記仮想的な視点変換処理の結果に対して最適化処理を実施し、前記識別候補領域に前記検出対象が含まれるか否かを判定する識別器において最も適する画像への画像変換を実現する前記パラメータを決定することを特徴とする物体検出装置。
    The object detection apparatus according to any one of claims 2 to 6,
    The image conversion method determination unit
    The image processing to realize the most suitable image in the discriminator which performs optimization processing on the result of the virtual viewpoint conversion processing and determines whether the detection target is included in the identification candidate area An object detection apparatus characterized by determining a parameter.
  8.  請求項6に記載の物体検出装置において、
     前記識別器情報は、テンプレート、色情報、輝度情報、輪郭、勾配情報のいずれかであることを特徴とする物体検出装置。
    In the object detection device according to claim 6,
    The object detection device, wherein the discriminator information is any one of a template, color information, luminance information, an outline, and gradient information.
  9.  請求項8に記載の物体検出装置において、
     前記識別器情報がテンプレートである場合、
     前記画像変換方法決定部は、前記視点変換処理を実施して取得する画像と前記テンプレートの類似度を算出し、前記類似度が最大となる前記識別器を選択することを特徴とする物体検出装置。
    In the object detection device according to claim 8,
    If the identifier information is a template,
    The image conversion method determination unit calculates the similarity between the image acquired by performing the viewpoint conversion process and the acquired template, and selects the classifier that maximizes the similarity. .
  10.  請求項2から9のいずれかに記載の物体検出装置において、
     前記画像変換方法決定部は、前記撮像装置の設置状態を表現するカメラパラメータを利用し前記パラメータを決定する機能を備えることを特徴とする物体検出装置。
    In the object detection device according to any one of claims 2 to 9,
    An object detection apparatus characterized in that the image conversion method determination unit has a function of determining the parameter by using a camera parameter expressing an installation state of the imaging device.
  11.  請求項2から10のいずれかに記載の物体検出装置において、
     前記識別部は、前記検出対象に対して識別能力を有する識別部を少なくとも1つを記録する識別器記録部と、
     前記識別器を用いて前記変換画像に対して前記検出対象が含まれるか否かを識別する識別処理を実施する識別処理実施部と、
     前記変換画像に前記検出対象が含まれると判定された場合に結果を出力する識別結果出力部と、
     を備えることを特徴とする物体検出装置。
    The object detection apparatus according to any one of claims 2 to 10.
    The identification unit records an identification unit having identification ability with respect to the detection target, at least one identification unit recording unit;
    An identification processing execution unit that performs an identification process of identifying whether the detection target is included in the converted image using the classifier;
    An identification result output unit that outputs a result when it is determined that the detection target is included in the converted image;
    An object detection apparatus comprising:
  12.  請求項2から11のいずれかに記載の物体検出装置において、
     前記画像変換方法決定部は、前記検出対象の三次元形状に基づいて前記パラメータを決定することを特徴とする物体検出装置。
    The object detection apparatus according to any one of claims 2 to 11.
    The image detection method determination unit determines the parameter based on a three-dimensional shape of the detection target.
  13.  請求項12に記載の物体検出装置において、
     前記画像変換方法決定部は、前記識別候補領域を通過する、検出対象の一般的な姿勢方向を示す直線を取得し、
     前記直線が前記撮像装置のカメラ座標系のY軸と平行になるような仮想的な視点変換を実現する前記パラメータを取得する機能、
     を備えることを特徴とする物体検出装置。
    In the object detection device according to claim 12,
    The image conversion method determination unit acquires a straight line that passes through the identification candidate area and indicates a general posture direction of a detection target,
    A function of acquiring the parameters for realizing virtual viewpoint transformation such that the straight line is parallel to the Y axis of the camera coordinate system of the imaging device;
    An object detection apparatus comprising:
  14.  請求項13に記載の物体検出装置において、
     前記画像変換方法決定部は、前記直線が前記カメラ座標系のY軸と平行な状態に変換された後に、正面、側面、上面から前記識別候補領域を観測する視点へ仮想的な視点変換を実施し、それぞれの視点において前記変換画像を取得することを特徴とする物体検出装置。
    In the object detection device according to claim 13,
    The image conversion method determination unit performs virtual viewpoint conversion from the front, the side, and the top to the viewpoint for observing the identification candidate area after the straight line is converted to a state parallel to the Y axis of the camera coordinate system. An object detection device for acquiring the converted image at each viewpoint.
  15.  請求項13または14に記載の物体検出装置において、
     画像変換方法決定部は、
     前記識別候補領域に含まれる三次元点群の各点同士のユークリッド距離が最大となる2点を結ぶことで前記直線を決定することを特徴とする物体検出装置。
    In the object detection device according to claim 13 or 14,
    The image conversion method determination unit
    An object detection apparatus characterized in that the straight line is determined by connecting two points at which Euclidean distances of respective points of a three-dimensional point group included in the identification candidate area are maximum.
  16.  請求項13または14に記載の物体検出装置において、
     前記画像変換方法決定部は、
     前記識別候補領域に含まれる三次元点群に対して主成分分析を実施し、
     その第一成分の方向にとることで前記直線を決定することを特徴とする物体検出装置。
    In the object detection device according to claim 13 or 14,
    The image conversion method determination unit
    Principal component analysis is performed on the three-dimensional point group included in the identification candidate area;
    An object detection apparatus characterized in that the straight line is determined by taking the direction of the first component.
  17.  The object detection apparatus according to claim 13,
     wherein the image conversion method determination unit estimates the floor surface of the measurement range, detects a specific part of the detection target within the identification candidate area, and determines, as the straight line, a line that passes through one point corresponding to that part and extends in a direction orthogonal to the floor surface.
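For claim 17, once the floor plane and a point for the specific part (for example, a detected head position for a person) are available, the posture line follows directly. The SVD plane fit below is an assumed choice; the publication does not fix particular algorithms for floor estimation or part detection.

```python
import numpy as np

def fit_floor_plane(floor_points):
    """Least-squares plane through the floor points: returns (point on plane, unit normal)."""
    centroid = floor_points.mean(axis=0)
    _, _, vt = np.linalg.svd(floor_points - centroid)
    normal = vt[-1]                                # direction of least variance
    if normal[1] < 0:                              # orient consistently (sign convention is arbitrary here)
        normal = -normal
    return centroid, normal

def posture_line_through_part(part_point, floor_normal):
    """Line of claim 17: passes through the detected part, orthogonal to the floor."""
    direction = floor_normal / np.linalg.norm(floor_normal)
    return part_point, direction                   # parametric form: part_point + t * direction
```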
  18.  An object detection method for determining whether or not a detection target is present within a measurement range, the method comprising:
     acquiring three-dimensional information within the measurement range based on input from an imaging device;
     extracting an identification candidate area in which the detection target may exist;
     acquiring information on a classifier used to detect the detection target;
     determining parameters for virtually applying viewpoint conversion processing to the three-dimensional information within the identification candidate area;
     generating a converted image based on the three-dimensional information within the identification candidate area after the virtual viewpoint conversion; and
     detecting the detection target with the classifier based on the converted image.
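Read end to end, claim 18 describes a pipeline. The skeleton below strings the earlier sketches together; it reuses posture_direction_by_pca, virtually_rotate_candidate and front_side_top_views from above, and takes the candidate extraction result and the classifier call as caller-supplied inputs, since neither is specified here.

```python
def detect_objects(candidate_point_clouds, classifier, classify_fn):
    """Illustrative skeleton of the method of claim 18.

    candidate_point_clouds: list of N x 3 arrays, one per identification candidate area.
    classifier, classify_fn: the classifier information and the per-image decision
    function, both supplied by the caller (hypothetical interfaces, not from the publication).
    """
    detections = []
    for candidate in candidate_point_clouds:
        _, direction = posture_direction_by_pca(candidate)          # posture line (one option)
        aligned = virtually_rotate_candidate(candidate, direction)  # virtual viewpoint conversion
        for view in front_side_top_views(aligned):                  # converted images
            if classify_fn(classifier, view):                       # classifier decides per image
                detections.append(candidate)
                break
    return detections
```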
PCT/JP2017/026036 2017-07-19 2017-07-19 Object detection device and object detection method WO2019016879A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2017/026036 WO2019016879A1 (en) 2017-07-19 2017-07-19 Object detection device and object detection method
JP2019530278A JP6802923B2 (en) 2017-07-19 2017-07-19 Object detection device and object detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/026036 WO2019016879A1 (en) 2017-07-19 2017-07-19 Object detection device and object detection method

Publications (1)

Publication Number Publication Date
WO2019016879A1 true WO2019016879A1 (en) 2019-01-24

Family

ID=65015748

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/026036 WO2019016879A1 (en) 2017-07-19 2017-07-19 Object detection device and object detection method

Country Status (2)

Country Link
JP (1) JP6802923B2 (en)
WO (1) WO2019016879A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007271408A (en) * 2006-03-31 2007-10-18 Nippon Telegr & Teleph Corp <Ntt> Device and method for acquiring three-dimensional environmental information, and recoding medium storing program achieving the method
JP2015032185A (en) * 2013-08-05 2015-02-16 国立大学法人 東京大学 Three-dimensional environment restoration device
WO2016063796A1 (en) * 2014-10-24 2016-04-28 株式会社日立製作所 Calibration device
WO2016199244A1 (en) * 2015-06-10 2016-12-15 株式会社日立製作所 Object recognition device and object recognition system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAYATA, TAKESHI, ET AL.: "3-Jigen Tengun karano Multi Scale Tokucho Chushutsuho ni Kansaru Kento", ITE TECHNICAL REPORT, vol. 39, no. 29, 27 July 2015 (2015-07-27), pages 35 - 40, ISSN: 1342-6893 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11087169B2 (en) * 2018-01-12 2021-08-10 Canon Kabushiki Kaisha Image processing apparatus that identifies object and method therefor
JPWO2020241057A1 (en) * 2019-05-29 2020-12-03
WO2020241057A1 (en) * 2019-05-29 2020-12-03 コニカミノルタ株式会社 Image processing system, image processing program, and image processing method
JP7067672B2 (en) 2019-05-29 2022-05-16 コニカミノルタ株式会社 Image processing system, image processing program, and image processing method

Also Published As

Publication number Publication date
JP6802923B2 (en) 2020-12-23
JPWO2019016879A1 (en) 2020-03-26

Similar Documents

Publication Publication Date Title
JP6125188B2 (en) Video processing method and apparatus
JP5812599B2 (en) Information processing method and apparatus
JP5343042B2 (en) Point cloud data processing apparatus and point cloud data processing program
JP5838355B2 (en) Spatial information detection device, human position detection device
JP6352208B2 (en) 3D model processing apparatus and camera calibration system
US9740914B2 (en) Face location detection
JP5493108B2 (en) Human body identification method and human body identification device using range image camera
JP2014529727A (en) Automatic scene calibration
JP2017096939A (en) System and method for scoring clutter for use in 3d point cloud matching in vision system
JP6541920B1 (en) INFORMATION PROCESSING APPARATUS, PROGRAM, AND INFORMATION PROCESSING METHOD
JP2006343859A (en) Image processing system and image processing method
KR102110459B1 (en) Method and apparatus for generating three dimension image
JP2018120283A (en) Information processing device, information processing method and program
JP2010045501A (en) Image monitoring device
JP6802923B2 (en) Object detection device and object detection method
JP2018195070A (en) Information processing apparatus, information processing method, and program
JP5336325B2 (en) Image processing method
JP6027952B2 (en) Augmented reality image generation system, three-dimensional shape data generation device, augmented reality presentation device, augmented reality image generation method, and program
KR101578891B1 (en) Apparatus and Method Matching Dimension of One Image Up with Dimension of the Other Image Using Pattern Recognition
JP6374812B2 (en) 3D model processing apparatus and camera calibration system
JP5217917B2 (en) Object detection and tracking device, object detection and tracking method, and object detection and tracking program
JP6606340B2 (en) Image detection apparatus, image detection method, and program
JP3253328B2 (en) Distance video input processing method
JP2014002489A (en) Position estimation device, method, and program
US20220230342A1 (en) Information processing apparatus that estimates object depth, method therefor, and storage medium holding program therefor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17918024

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019530278

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17918024

Country of ref document: EP

Kind code of ref document: A1