JP5539102B2

JP5539102B2 - Object detection apparatus and object detection method

Info

Publication number: JP5539102B2
Application number: JP2010183390A
Authority: JP
Inventors: 貫之岩本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2010-08-18
Filing date: 2010-08-18
Publication date: 2014-07-02
Anticipated expiration: 2030-08-18
Also published as: JP2012043155A

Description

The present invention relates to an apparatus and a method for detecting an object in an image.

Conventionally, techniques for detecting a human body as a specific object in an image are known. For example, Non-Patent Document 1 discloses a technique that uses a histogram of luminance gradients over a predetermined range of an image as a feature amount and performs classification with a Support Vector Machine to detect a person in the image. However, while the technique disclosed in Non-Patent Document 1 is effective for detecting a human body in a specific posture, detection becomes difficult when the posture varies.

Non-Patent Document 2 proposes a human body detection method that addresses this problem and tolerates deformation of the human body. In the technique of Non-Patent Document 2, the detection scores of detectors corresponding to human body parts are summed while taking into account their positions relative to the center of gravity. First, a detector corresponding to each part of the human body is obtained by learning. Next, detection is performed on sample images of the human body with these detectors, and for each detector the probability distribution of the position of the detection point relative to the center of gravity of the human body is learned. A human body is then detected in an image by using the learned detectors and the learned probability distributions of the relative positions. At detection time, the detection score of each human body part is first calculated from the target image. The score of each part detector is then voted into the bin corresponding to the relative position of the center of gravity of the human body as seen from that detector. Finally, the votes from all detectors are totaled to obtain the final human body detection result.

Similarly, the technique disclosed in Patent Document 1 is also a human body detection method that tolerates deformation of the human body. In this technique, candidates considered to be human body parts are extracted from an image, a combination of part candidates that is highly probable as the detection target is selected using the part likelihood of each candidate and the arrangement relationship among the part candidates, and the likelihood of a human body is calculated probabilistically.

As described above, the methods described in Non-Patent Document 2 and Patent Document 1 learn the arrangement relationship of human body parts. In contrast, the method described in Non-Patent Document 3 performs object recognition with a model that disregards the geometric arrangement of the partial features, the so-called BoK (Bag of Keypoints) model.

Patent Document 1: JP 2005-165923 A

Non-Patent Document 1: Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection,” IEEE Computer Vision and Pattern Recognition, pp. 886-893, 2005.
Non-Patent Document 2: Lubomir Bourdev and Jitendra Malik, “Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations,” IEEE International Conference on Computer Vision, 2009.
Non-Patent Document 3: Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cedric Bray, “Visual Categorization with Bags of Keypoints,” in Workshop on Statistical Learning in Computer Vision, European Conference on Computer Vision, 2004.
Non-Patent Document 4: Krystian Mikolajczyk and Cordelia Schmid, “Scale & Affine Invariant Interest Point Detectors,” International Journal of Computer Vision, 60(1), pp. 63-86, 2004. (Non-Patent Document 4 is referred to in the detailed description of the embodiments.)
Non-Patent Document 5: D. Comaniciu and P. Meer, “Mean Shift: A Robust Approach Toward Feature Space Analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24(5), pp. 603-619. (Non-Patent Document 5 is referred to in the detailed description of the embodiments.)

However, object detection methods that divide the human body into parts and use the relative positional relationship between the parts, such as the techniques of Patent Document 1 and Non-Patent Document 2 described above, have the problem that learning the relative positional relationship of the parts is cumbersome, as described below.

As described above, the technique disclosed in Non-Patent Document 2 gives each part of the human body a degree of freedom in position by describing the positional relationship of the parts as an existence probability distribution in a relative coordinate system whose origin is the center of gravity of the human body image. However, taking the upper body as an example, the existence probability distribution when the human body faces the front differs from the distribution when the human body faces sideways, because the range of motion permitted by the joint structure differs between the two. Consequently, in Non-Patent Document 2 the existence probability distribution of each part of the human body must be learned at least for each orientation of the human body.

In the technique disclosed in Patent Document 1, candidates considered to be human body parts are extracted from an image, a combination of part candidates that is highly probable as the detection target is selected using the part likelihood of each candidate and the arrangement relationship among the part candidates, and the likelihood of a human body is calculated probabilistically. Therefore, like the technique disclosed in Non-Patent Document 2, the technique disclosed in Patent Document 1 also needs to learn the arrangement relationship of the human body part candidates at least for each orientation of the human body.

On the other hand, object recognition using a model that disregards the geometric arrangement of the partial features, as in Non-Patent Document 3, does not need to learn the existence probability distribution of each part of the human body separately for each orientation of the human body. However, the individual parts of the human body carry few distinctive features, so when the background is complex, detection with the BoK model is difficult.

The present invention has been made in view of the above problems, and its object is to make it possible to detect, with a high detection rate, an object that undergoes posture variation from an image without considering the relative positions of the parts constituting the object.

In order to achieve the above object, an object detection apparatus according to one aspect of the present invention has the following configuration. That is, it is
an object detection apparatus for detecting a specific object from an image, comprising:
storage means for storing, in association with one another, a posture parameter relating to the specific object, a partial feature of an image, and the probability that the partial feature exists in an image under the posture parameter;
acquisition means for acquiring partial features by detecting feature points from an image in which the specific object is to be detected, extracting local images containing the detected feature points, and calculating feature amounts;
input means for inputting a plurality of posture parameters independently of the image to be detected;
selection means for selecting, for each of the plurality of posture parameters, a predetermined number of partial features in descending order of the probability stored in the storage means;
matching means for calculating a matching degree indicating the degree of correlation between the partial features acquired by the acquisition means and the partial features selected by the selection means; and
determination means for determining whether the specific object exists in the image to be detected, based on the matching degrees calculated by the matching means for the plurality of posture parameters.

According to the present invention, an object that undergoes posture variation can be detected from an image with a high detection rate without considering the relative positions of the parts constituting the object.

FIG. 1 is a diagram showing an example of an image targeted by the object detection apparatus according to the embodiment.
FIG. 2 is a diagram showing an example of the image processing performed by the object detection apparatus according to the embodiment.
FIG. 3 is a block diagram showing an example of the functional configuration of the object detection apparatus according to the embodiment.
FIG. 4 is a diagram showing an example of the parameters of an object targeted by the object detection apparatus according to the embodiment.
FIG. 5 (a) is a diagram showing an example of the data configuration of the correspondence storage unit, and FIG. 5 (b) is a diagram showing an example of the data configuration of the partial feature storage unit.
FIG. 6A is a flowchart showing the processing of the object detection apparatus according to the embodiment.
FIG. 6B is a flowchart showing the processing of the object detection apparatus according to the embodiment.

Hereinafter, the present invention will be described in detail according to one of its preferred embodiments with reference to the accompanying drawings.

(Overview)
The object detection apparatus according to the present embodiment detects a person as a specific object in an image. An example is described below with reference to the drawings. Although the detection target object in the present embodiment is a person in an image, the present invention is not limited to this. Any object whose posture and shape can be specified by one or more parameters can be used as the detection target object, for example a rigid body such as an automobile, whose posture can be specified by the camera angle, or a flexible object such as a spring, whose deformation can be described by parameters.

FIG. 1 shows an image 100 to be processed in the present embodiment. In the image 100, an object 110 is a person to be detected. A detection window area 120 is an area in which the specific object (person) is searched for. A detection window area 130 is another area to be searched, different from the detection window area 120. An object 140 is a background object. As described later, the object detection apparatus has learned partial features of person images in advance, in association with posture parameters.

FIG. 2 shows examples of partial features learned in advance by the object detection apparatus according to the present embodiment. A partial feature group 210 is a schematic diagram of the partial features of the parts of a person. Partial features 221, 222, 223, 224, and 225 are each schematic diagrams of individual partial features.

Based on a posture parameter input from the outside, the object detection apparatus selects a plurality of partial features from the partial feature group 210, for example the partial features 221, 222, and 223. This selection is performed, for example, based on the probability, obtained in advance by learning, that each partial feature exists under the input posture parameter (details are described later). The selected partial features are matched against the partial features extracted from the detection window area 120, and if the sum of the matching degrees of the partial features is equal to or greater than a threshold, a vote is cast in the corresponding bin of the posture parameter space. The posture parameter space is a space whose number of dimensions equals the number of posture parameters, and the bins correspond to the subspaces obtained by dividing the posture parameter space into a plurality of regions.

A new posture parameter is then input from the outside, and the above processing is repeated for it. That is, based on the new posture parameter, a plurality of partial features, for example the partial features 221, 224, and 225, are selected from the partial feature group 210, and matching and voting are performed in the same manner. The sequence of posture parameter input, partial feature selection, matching, and voting is repeated a prescribed number of times. After the prescribed number of repetitions, if an extremum of the distribution of votes in the posture parameter space exceeds a certain threshold, it is determined that a person exists in the detection window area 120, and the posture of that person is determined to be the posture expressed by the posture parameter corresponding to the extremum. The same processing is performed on other areas in the image, such as the detection window area 130, so that all persons in the image are eventually detected. This processing is described in more detail below.
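The loop described in this overview can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus's actual implementation: the callables `sample_pose`, `select_features`, and `matching_degree`, the uniform bin width, and the use of the single largest bin in place of a general extremum search are all hypothetical choices made for the example.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Pose = Tuple[float, ...]
Bin = Tuple[int, ...]


def detect_in_window(
    sample_pose: Callable[[], Pose],                    # hypothetical: returns one posture parameter
    select_features: Callable[[Pose], Sequence[int]],   # hypothetical: ids of selected partial features
    matching_degree: Callable[[int], float],            # hypothetical: matching degree within the window
    bin_width: float,
    n_iterations: int,
    vote_threshold: float,
    detect_threshold: float,
) -> Tuple[bool, Optional[Bin]]:
    """Return (detected, best_bin) for one detection window."""
    votes: Dict[Bin, float] = {}
    for _ in range(n_iterations):
        pose = sample_pose()                 # input is independent of the image
        selected = select_features(pose)     # features likely to appear under this pose
        total = sum(matching_degree(f) for f in selected)
        if total >= vote_threshold:          # vote only when the summed matching degree is high enough
            key = tuple(int(p // bin_width) for p in pose)
            votes[key] = votes.get(key, 0.0) + total
    if not votes:
        return False, None
    best_bin = max(votes, key=votes.get)     # simplification: largest bin stands in for the extremum
    return votes[best_bin] > detect_threshold, best_bin
```

The same function would be called once per detection window (for example, areas 120 and 130) to cover the whole image.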

(Configuration)
FIG. 3 is a diagram showing an outline of the object detection apparatus 300 according to the present embodiment. The functions of the units described below are realized when a CPU (not shown) of the object detection apparatus 300 executes a computer program stored in a computer-readable memory (not shown). The object detection apparatus 300 can therefore be realized using an information processing apparatus such as a personal computer. In the object detection apparatus 300, the image input unit 301 may be, for example, a device that captures images in real time (for example, a camera), a device that reads images optically (for example, a scanner), or a storage that holds images captured or read in advance. The image input from the image input unit 301 is output to the feature amount calculation unit 302.

The feature amount calculation unit 302 calculates feature amounts of portions of the image input from the image input unit 301 (partial features). That is, partial features are acquired by detecting feature points in the image to be processed, extracting partial images containing the detected feature points (local images in the vicinity of the feature points), and calculating feature amounts from them. In the present embodiment, a partial feature is obtained by
・normalizing the shape of the Affine Invariant Region in the vicinity of a feature point detected by a corner detector such as the Harris Corner Detector, as described in Non-Patent Document 4, to obtain a local image, and
・calculating the Histograms of Oriented Gradients described in Non-Patent Document 1 for the obtained local image.
However, the partial features usable in the present embodiment are not limited to these, and features obtained by other well-known image feature calculation methods may also be used.

The posture parameter input unit 303 receives a posture parameter representing the posture of the target object and outputs it to the partial feature storage unit 304, the partial feature selection unit 305, and the correspondence storage unit 307. When the target object is being learned, the input posture parameter is taken, for example, from posture annotation data attached to the image input from the image input unit 301. That is, the posture parameter input unit 303 extracts the posture parameter from the posture annotation data attached to the image and outputs it to the partial feature storage unit 304, the partial feature selection unit 305, and the correspondence storage unit 307. When the target object is being detected, the posture parameter input unit 303 uses data generated by an external program as the posture parameter, for example random numbers generated according to a certain probability density. In other words, the posture parameter input unit 303 inputs a plurality of posture parameters independently of the image to be processed.

In the present embodiment, the detection target is a human body, and the posture parameter includes camera angle and joint angle data. FIG. 4 shows an example of posture parameters when the target is a human body. In FIG. 4, the angle φ_k is the k-th parameter defining the camera angle of the camera 410; in FIG. 4 it is the tilt (pitch) relative to the horizontal direction. The angle θ_j is the j-th parameter defining the posture of the human body; in FIG. 4 it is the angle formed by the upper arm and the forearm. When the camera angle is fixed, the camera parameters (angles φ_k) are unnecessary. Conversely, when the specific object to be detected does not deform and only the camera parameters change, the deformation parameters (angles θ_j) are unnecessary.

The partial feature storage unit 304 stores the partial features output from the feature amount calculation unit 302 in association with the posture parameters output from the posture parameter input unit 303. All partial features obtained by the feature amount calculation unit 302 for a plurality of sample images are clustered; the i-th cluster obtained is denoted Π_i, and the partial feature at the center of the cluster Π_i is denoted π_i. That is, the partial feature π_i is the partial feature representing the cluster Π_i. The method of determining the partial feature that represents a cluster is not limited to this; for example, the average of all partial features in the cluster may be used. A pair consisting of a partial feature π and the posture parameter θ = (φ₁, φ₂, φ₃, θ₁, θ₂, …, θ_n) assigned to it is written x = (π, θ). The probability P(Π_i|θ) that a partial feature belonging to the cluster Π_i appears under the posture parameter θ is calculated according to the following equations (1) and (2).
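Equations (1) and (2) are not reproduced in this text. One plausible reconstruction, consistent with the surrounding definitions, is given here as a sketch only. The form of equation (1) follows from the description of the set X^k_i; the normalization in equation (2), taken over all clusters j within the subspace Θ_k, is an assumption and is not confirmed by the source.

```latex
X_i^k = \{\, x = (\pi, \theta) \mid \pi \in \Pi_i,\ \theta \in \Theta_k \,\} \tag{1}

P(\Pi_i \mid \theta) = \frac{n(X_i^k)}{\sum_{j} n(X_j^k)}, \qquad \theta \in \Theta_k \tag{2}
```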

Here, n(X) denotes the number of elements in a set X, and Θ_k denotes the k-th subspace of the posture parameter space. In a program implementation, for example, Θ_k corresponds to the k-th bin obtained when the whole posture parameter space is divided into a plurality of bins. The set X^k_i in equation (1) is thus the set of elements x = (π, θ) whose partial feature π belongs to the cluster Π_i and whose posture parameter θ belongs to Θ_k. The partial feature storage unit 304 stores the partial feature π_i at the center of the i-th cluster Π_i in association with P(Π_i|θ) calculated by equation (2), for example as shown in FIG. 5 (b). In FIG. 5 (b), for the first cluster (central partial feature π₁), for example, the probability P(Π₁|θ) of its existence under each posture parameter θ is recorded in columns 501, 502, and so on. In this way, the partial feature storage unit 304 stores, in association with one another, a posture parameter (θ) relating to the specific object, a partial feature (π_i) of an image, and the probability P(Π_i|θ) that the partial feature exists in an image under the posture parameter.
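The counting described above can be illustrated with a short sketch. It assumes each learned partial feature has already been reduced to a pair (cluster index i, pose-bin index k), and it normalizes the counts over all clusters within each pose bin; that normalization, like the function name `estimate_probabilities`, is an assumption made for the example rather than something stated in the source.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def estimate_probabilities(
    samples: Iterable[Tuple[int, int]],
) -> Dict[Tuple[int, int], float]:
    """Map (cluster i, pose bin k) to an estimate of P(cluster i | pose bin k).

    Each sample is a pair (cluster index, pose-bin index) for one learned
    partial feature. The counts play the role of n(X_i^k).
    """
    counts = Counter(samples)          # n(X_i^k) for every observed (i, k)
    per_bin: Counter = Counter()       # total count of samples in each pose bin k
    for (_, k), n in counts.items():
        per_bin[k] += n
    # Assumed normalization: divide by the number of samples in the same pose bin.
    return {(i, k): n / per_bin[k] for (i, k), n in counts.items()}
```

The resulting dictionary corresponds loosely to the table of FIG. 5 (b), with one entry per cluster and pose bin.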

The partial feature selection unit 305 receives the posture parameter θ from the posture parameter input unit 303 and selects a predetermined number (hereinafter, M) of partial features based on P(Π_i|θ) stored in the partial feature storage unit 304.
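A minimal sketch of this selection step follows, assuming the M features are taken in descending order of the stored probability, as stated in the summary of the invention. The function name `select_top_m` and the list-based layout of the probabilities are hypothetical.

```python
from typing import List, Sequence


def select_top_m(probabilities: Sequence[float], m: int) -> List[int]:
    """Return the indices of the m clusters with the largest P(cluster | pose).

    probabilities[i] is the stored probability of cluster i under the pose
    bin that contains the input posture parameter.
    """
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return order[:m]
```

For example, with stored probabilities `[0.1, 0.5, 0.3, 0.05]` and M = 2, the clusters with indices 1 and 2 are selected.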

The object detection hypothesis generation unit 306 matches the partial features that the feature amount calculation unit 302 extracted and calculated from the detection window area 120 of the image shown in FIG. 1 against the partial features selected by the partial feature selection unit 305, and generates a hypothesis as to whether the detection target object is present. Let π be one of the partial features extracted and calculated by the feature amount calculation unit 302, and let Q be the set of all such partial features. Let π' be one of the partial features selected by the partial feature selection unit 305. The matching degree is then given, for example, by the following equation (3). According to equation (3), the matching degree is the largest correlation value with π' over all partial features belonging to the set Q. That is, the matching degree indicates the degree of correlation between the partial features acquired by the feature amount calculation unit 302 and the partial feature selected by the partial feature selection unit 305.

The matching degree is not limited to the calculation of equation (3). For example, it may instead be the largest value, over all partial features belonging to the set Q, of the likelihood output by a classifier trained in advance for π'. The object detection hypothesis generation unit 306 calculates the matching degree for all M partial features selected by the partial feature selection unit 305 and outputs their sum as the object detection hypothesis.
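Equation (3) is not reproduced in this text; from the description it takes the maximum, over all features in Q, of the correlation with π'. The sketch below follows that description. The specific correlation measure is not given in the source, so normalized correlation (cosine similarity) is assumed here, and all function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def correlation(a: Vector, b: Vector) -> float:
    """Normalized correlation (cosine similarity); an assumed choice of measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def matching_degree(window_features: Sequence[Vector], selected: Vector) -> float:
    """Largest correlation between `selected` and any feature in the window (set Q)."""
    return max((correlation(q, selected) for q in window_features), default=0.0)


def detection_hypothesis(window_features: Sequence[Vector],
                         selected_features: Sequence[Vector]) -> float:
    """Sum of the matching degrees over the M selected partial features."""
    return sum(matching_degree(window_features, s) for s in selected_features)
```

Replacing `correlation` with a classifier likelihood would give the alternative mentioned in the text without changing the other two functions.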

The correspondence storage unit 307 stores the object detection hypothesis generated by the object detection hypothesis generation unit 306 in association with the posture parameter input from the posture parameter input unit 303. FIG. 5 (a) shows an example of the data configuration in the correspondence storage unit 307, representing the storage areas corresponding to the posture parameters. In the example of FIG. 5 (a), the posture parameter is a two-dimensional vector consisting of θ₁ and θ₂, but an actual apparatus uses a vector with as many dimensions as are needed to express the posture of the target object. Each element of the posture parameter is divided into bins of width Δ. For example, the cell in the first row and second column of the table in FIG. 5 (a) holds the sum of all values output from the object detection hypothesis generation unit 306 while the posture parameter input from the posture parameter input unit 303 was in the range 0 ≤ θ₁ < Δ, Δ ≤ θ₂ < 2Δ.
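The table of FIG. 5 (a) can be sketched as an accumulator keyed by bin indices. This is an illustrative data structure only, assuming a uniform bin width Δ in every dimension; the class name `VoteTable` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Sequence, Tuple


class VoteTable:
    """Accumulates hypothesis scores per posture-parameter bin of width delta."""

    def __init__(self, delta: float) -> None:
        self.delta = delta
        self.cells: DefaultDict[Tuple[int, ...], float] = defaultdict(float)

    def bin_of(self, pose: Sequence[float]) -> Tuple[int, ...]:
        # Element p is assigned to bin b when b*delta <= p < (b+1)*delta.
        return tuple(int(p // self.delta) for p in pose)

    def vote(self, pose: Sequence[float], score: float) -> None:
        # Add the hypothesis score to the cell that contains this pose.
        self.cells[self.bin_of(pose)] += score
```

With Δ = 10, the poses (3, 12) and (9.9, 19) both fall in the cell with zero-based index (0, 1), which is the first-row, second-column cell of the example above.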

The detection determination unit 308 finds the extrema of the data stored in the correspondence storage unit 307. If the largest of these extrema exceeds a predetermined threshold, the unit determines that the target object has been detected and takes the corresponding posture parameter as the posture of the target object. The extrema of the data stored in the correspondence storage unit 307 can be found, for example, by the mean shift method described in Non-Patent Document 5. The above are the components of the object detection apparatus according to the present embodiment.
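The decision step can be sketched as follows. Note that this sketch does not implement the mean shift method named in the text; as a simpler stand-in it treats a bin as an extremum when its vote is at least as large as that of every adjacent bin. The function names and the dictionary layout are assumptions made for the example.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Bin = Tuple[int, ...]


def local_maxima(cells: Dict[Bin, float]) -> List[Bin]:
    """Bins whose vote is at least as large as that of every adjacent bin."""
    peaks = []
    for b, v in cells.items():
        offsets = product((-1, 0, 1), repeat=len(b))
        # All neighbours that differ by at most one bin per dimension, excluding b itself.
        neighbours = (tuple(x + d for x, d in zip(b, off)) for off in offsets if any(off))
        if all(v >= cells.get(n, 0.0) for n in neighbours):
            peaks.append(b)
    return peaks


def decide(cells: Dict[Bin, float], threshold: float) -> Optional[Bin]:
    """Return the largest local maximum if it exceeds the threshold, else None."""
    peaks = local_maxima(cells)
    if not peaks:
        return None
    best = max(peaks, key=lambda b: cells[b])
    return best if cells[best] > threshold else None
```

The returned bin identifies the range of posture parameters taken as the detected posture; `None` means no object is reported for the window.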

(Processing)
Next, the processing performed by the object detection apparatus 300 of the present embodiment is described using the flowcharts shown in FIGS. 6A and 6B. The program code following these flowcharts is stored in a memory such as a RAM or ROM (not shown) in the apparatus of the present embodiment, and is read out and executed by a CPU (not shown) or the like. First, posture parameters and partial features are learned from correct sample images in steps S601 to S606 of FIG. 6A, and then objects in a target image are detected in steps S607 to S616 of FIG. 6B. The computer (information processing apparatus) for the learning processing, which executes steps S601 to S606, and the computer (information processing apparatus) for the detection processing, which executes steps S607 to S616, need not be the same.

First, in step S601, the image input unit 301 reads a positive sample image I_p1. Next, in step S602, the posture parameter input unit 303 reads the posture parameter θ_p1 corresponding to the positive sample image I_p1. Because this is the learning process, the posture parameter input unit 303 reads θ_p1 from the posture annotation data attached to the positive sample image I_p1, as described above.

Next, in step S603, the feature amount calculation unit 302 extracts partial features from the positive sample image I_p1. In step S604, it is checked whether steps S601 to S603 have been completed for all positive samples. If any positive sample remains unprocessed, steps S601 to S603 are performed on it. Once all positive samples have been processed, the process proceeds to step S605.

In step S605, the feature amount calculation unit 302 clusters all the partial features obtained in steps S601 to S604 on the basis of feature similarity. Denoting the i-th cluster by Π_i, it stores the partial feature π_i at the center of the cluster in the partial feature storage unit 304 as the partial feature representing that cluster. Next, in step S606, the feature amount calculation unit 302 calculates, for each partial feature, the probability P(Π_i|θ) that a partial feature belonging to the cluster Π_i obtained in step S605 appears under the posture parameter θ, and stores it in the partial feature storage unit 304.
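
Steps S605 and S606 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of the embodiment: the text only says that features are clustered by similarity, so plain k-means with squared Euclidean distance is an assumed choice; P(Π_i|θ) is estimated here as the fraction of sample images with a given (discretized) posture that contain at least one feature of cluster i, which is likewise an assumption; and the names `nearest`, `kmeans`, and `cluster_given_pose` are hypothetical.

```python
from collections import defaultdict


def nearest(centers, f):
    """Index of the center closest to feature f (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(centers[i], f)))


def kmeans(features, k, iters=10):
    """Tiny k-means; the first k features seed the centers (assumed init)."""
    centers = [list(f) for f in features[:k]]
    for _ in range(iters):
        groups = defaultdict(list)
        for f in features:
            groups[nearest(centers, f)].append(f)
        for i, members in groups.items():
            # New center pi_i = mean of the features assigned to cluster i.
            centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def cluster_given_pose(samples, centers):
    """Estimate P(cluster i | pose) from (pose, [features]) pairs.

    Returns {pose: [p_0, p_1, ...]}, one probability per cluster.
    """
    counts = defaultdict(lambda: [0] * len(centers))
    totals = defaultdict(int)
    for pose, feats in samples:
        totals[pose] += 1
        # Count each cluster at most once per sample image.
        for i in {nearest(centers, f) for f in feats}:
            counts[pose][i] += 1
    return {pose: [c / totals[pose] for c in counts[pose]] for pose in totals}
```

The returned table corresponds to the kind of information stored in the partial feature storage unit 304: a representative feature per cluster plus a probability per posture.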

Steps S601 to S606 above complete the learning, from the positive samples of the detection target, of the correspondence between partial features and posture parameters, and information such as that shown in FIG. 5(b) is stored in the partial feature storage unit 304. Next, in steps S607 to S616, a specific object (a human body in this example) is detected in the target image.

First, in step S607, the image input unit 301 inputs the image to be examined. Next, in step S608, the feature amount calculation unit 302 sets a detection window at an arbitrary position in the image input in step S607. In step S609, the feature amount calculation unit 302 calculates partial features within the detection window set in step S608.

Next, in step S610, the posture parameter input unit 303 inputs a posture parameter θ. Because this is the detection stage, the posture parameter input unit 303 uses data generated by an external program as the posture parameter, for example a random number drawn according to a probability density such as the likelihood of each posture occurring. In step S611, the partial feature selection unit 305 selects M partial features corresponding to the posture parameter θ input in step S610, based on P(Π_i|θ) stored in the partial feature storage unit 304. For example, the M partial features are selected in descending order of probability. In step S612, the object detection hypothesis generation unit 306 calculates, for each of the M partial features selected in step S611, the matching degree (detection score) with the partial features calculated in step S609 (expression (3)). In step S613, the object detection hypothesis generation unit 306 adds the sum of the M matching degrees calculated in step S612 to the value held in the storage area of the correspondence relationship storage unit 307 that corresponds to the posture parameter θ input in step S610.
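
One hypothesis-generation pass (steps S611 to S613) might look like the Python sketch below. It is illustrative only: expression (3) is not reproduced in this section, so cosine similarity is used as an assumed stand-in for the matching degree; quantizing θ into bins of width `bin_width` is an assumed way of mapping a posture to its storage area; and all function and parameter names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; an assumed stand-in for the matching degree."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def to_bin(theta, bin_width):
    """Quantize a posture parameter vector into an accumulator key."""
    return tuple(int(math.floor(t / bin_width)) for t in theta)


def select_top_m(probs, m):
    """S611: indices of the m clusters with the highest P(cluster | pose)."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:m]


def vote(accumulator, theta, bin_width, table, centers, window_feats, m):
    """S611-S613 for one posture sample; returns the score that was added."""
    key = to_bin(theta, bin_width)
    probs = table.get(key)
    if probs is None:          # posture never seen during learning
        return 0.0
    total = 0.0
    for i in select_top_m(probs, m):
        # S612: best match between representative feature i and the window.
        total += max((cosine(centers[i], f) for f in window_feats), default=0.0)
    # S613: add the summed matching degrees to the area for this posture.
    accumulator[key] = accumulator.get(key, 0.0) + total
    return total
```

Note that the score depends only on whether the selected features are found in the window, not on where they are found, which mirrors the embodiment's independence from geometric arrangement.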

In step S614, it is determined whether steps S610 to S613 have been repeated the prescribed number of times; if not, the process returns to step S610. In this way, steps S610 to S613 are repeated the prescribed number of times. Once the prescribed number of iterations has been completed, the process proceeds to step S615. In step S615, the detection confirmation unit 308 determines whether the object is present from the data stored in the correspondence relationship storage unit 307. Specifically, it determines presence according to whether the accumulated sums contain an extreme value exceeding the threshold. If detection is to be performed in a region other than the detection window set in step S608, the process returns to step S608, the detection window is set at a new location, and steps S609 to S615 are performed (YES in S616). If no other region is to be examined, the process ends (NO in S616).
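
The control flow of steps S608 to S616 can be summarized by the Python sketch below. It is a sketch under assumptions rather than the embodiment's code: `sample_pose` and `score_hypothesis` are hypothetical callables standing in for the posture parameter input unit and the hypothesis scoring of steps S611 to S612, postures are assumed to be hashable values, and the decision in S615 is simplified to taking the largest accumulated value instead of a full search for extreme values.

```python
def detect_in_windows(windows, sample_pose, score_hypothesis, n_iter, threshold):
    """Run the per-window hypothesis loop and collect detections.

    windows: iterable of detection windows (any representation).
    sample_pose(): returns a hashable posture (S610).
    score_hypothesis(window, pose): summed matching degrees (S611-S612).
    Returns a list of (window, pose, score) tuples.
    """
    detections = []
    for window in windows:                      # S608, repeated via S616
        accumulator = {}
        for _ in range(n_iter):                 # S614: prescribed repetitions
            pose = sample_pose()                # S610
            score = score_hypothesis(window, pose)
            accumulator[pose] = accumulator.get(pose, 0.0) + score  # S613
        if not accumulator:
            continue
        # S615 (simplified): the largest accumulated value decides presence.
        pose, score = max(accumulator.items(), key=lambda kv: kv[1])
        if score > threshold:
            detections.append((window, pose, score))
    return detections
```

Because postures that are sampled more often accumulate more votes, the choice of sampling density in `sample_pose` directly affects which posture bins can exceed the threshold.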

The above processing completes the target object detection, and the detection target object and its posture can be detected from the image of FIG. 1. As described above, in this embodiment a set of human body part features corresponding to a posture parameter is selected, and a hypothesis about the presence or absence of a human body is generated without relying on the geometric arrangement of the selected features. Repeating this hypothesis generation for a plurality of posture parameters yields a plurality of hypotheses, and the human body in the image is detected by integrating them. That is, according to the present embodiment, a human body whose posture varies can be detected without learning the relative positional relationships between human body parts.

In the above embodiment a person is the detection target, but any object in an image can clearly be used as the detection target instead.

Although the embodiment has been described in detail above, the present invention can be embodied as, for example, a system, an apparatus, a method, a program, or a storage medium. Specifically, the present invention may be applied to a system composed of a plurality of devices, or to an apparatus consisting of a single device.

The present invention can also be realized by the following processing: software (a program) that implements the functions of the above-described embodiment is supplied to a system or apparatus via a network or any of various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads out and executes the program.

Claims

An object detection apparatus comprising:
storage means that stores, in association with one another, a posture parameter relating to a specific object, a partial feature of an image, and a probability that the partial feature exists in an image under the posture parameter;
acquisition means for acquiring a partial feature by detecting a feature point in a detection target image of the specific object, extracting a local image including the detected feature point, and calculating a feature amount;
input means for inputting a plurality of posture parameters irrespective of the detection target image;
selection means for selecting, for each of the plurality of posture parameters, a predetermined number of partial features in descending order of the probability stored in the storage means;
collating means for calculating a matching degree indicating a degree of correlation between the partial feature acquired by the acquisition means and the partial feature selected by the selection means; and
determination means for determining whether or not the specific object is present in the detection target image based on the matching degrees calculated by the collating means for the plurality of posture parameters.

The object detection apparatus according to claim 1, further comprising learning means for generating information to be stored in the storage means,
wherein the learning means:
obtains posture parameters of the specific object from annotation data of a plurality of sample images;
in each of the plurality of sample images, detects a feature point and acquires a partial feature by extracting a local image including the detected feature point and calculating a feature amount;
clusters the acquired partial features based on similarity and calculates the probability that partial features belonging to each cluster exist under each posture parameter; and
stores, in the storage means, a posture parameter, the existence probability of a cluster, and a partial feature representing the cluster in association with one another.

The object detection apparatus according to claim 1 or 2, wherein the determination means divides a posture parameter space, whose number of dimensions is the number of parameters of the posture parameter, into a plurality of subspaces; adds each of the matching degrees calculated by the collating means for the plurality of posture parameters to the subspace to which the corresponding posture parameter belongs; and determines that the specific object is present in the detection target image when the result of the addition contains an extreme value exceeding a predetermined threshold.

The object detection apparatus, wherein the determination means determines that the posture parameter corresponding to the subspace in which the extreme value exists is the posture of the specific object present in the detection target image.

The object detection apparatus according to claim 1, wherein the posture parameter includes a parameter indicating a camera angle of a camera that has captured an image.

The object detection apparatus according to claim 1, wherein the specific object has a joint, and the posture parameter includes a parameter indicating a joint angle of each joint.

An object detection method for detecting a specific object from an image, performed by an information processing apparatus including storage means that stores, in association with one another, a posture parameter relating to the specific object, a partial feature of an image, and a probability that a local image of the partial feature exists in an image under the posture parameter, the method comprising:
an acquisition step of acquiring a partial feature by extracting a local image from a detection target image of the specific object;
an input step in which input means inputs a plurality of posture parameters irrespective of the detection target image;
a selection step of selecting, for each of the plurality of posture parameters, a predetermined number of partial features in descending order of the probability stored in the storage means;
a matching step in which collating means calculates a matching degree indicating a degree of correlation between the partial feature acquired in the acquisition step and the partial feature selected in the selection step; and
a determination step of determining whether or not the specific object is present in the detection target image based on the matching degrees calculated in the matching step for the plurality of posture parameters.

A computer program for causing a computer to function as each means of the object detection apparatus according to any one of claims 1 to 6.

A computer-readable storage medium storing the computer program according to claim 8.