JP2009288917A

JP2009288917A - Information processor, information processing method and program

Info

Publication number: JP2009288917A
Application number: JP2008138995A
Authority: JP
Inventors: Keisuke Yamaoka; 啓介山岡
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2008-05-28
Filing date: 2008-05-28
Publication date: 2009-12-10

Abstract

PROBLEM TO BE SOLVED: To estimate an attitude of an object without imposing a burden on the object such as a person. SOLUTION: In this information processor, a silhouette extraction part 32 extracts a silhouette showing an area wherein a photographic subject appears from an image obtained by imaging the photographic subject, a line thinning part 33 thins the silhouette into a silhouette line simply showing the shape of the silhouette, a matching part 36 calculates a matching degree representing a degree of matching between an attitude model line and the silhouette line in each of a plurality of attitude model lines expressing prepared prescribed attitudes different from each other, and an attitude estimation part 37 estimates the attitude shown by the attitude model line having the maximum matching degree among the matching degrees calculated for each of the plurality of attitude model lines as the attitude of the photographic subject. The information processor can be applied to a personal computer, for example. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an information processing device, an information processing method, and a program, and in particular to an information processing device, information processing method, and program that make it possible to easily estimate the posture of an estimation target (the target whose posture is to be estimated) without imposing a burden on that target.

As gesture recognition technology, techniques have been proposed, such as the "Eagle & Hawk Digital System" (trademark) of Motion Analysis (USA) and "MX Motion Capture" (trademark) of Vicon Peak (USA), in which markers are attached to a person's body, or a glove with built-in special sensors is worn on the person's hand, the person is then imaged by a plurality of cameras, and the person's gesture is recognized by estimating the person's posture from the plurality of captured images.

Motion region extraction technology has also been proposed, as in the eye Toy (registered trademark of Sony Computer Entertainment Europe Limited) system for PlayStation (registered trademark of Sony Computer Entertainment Inc.), in which a person is imaged by a single camera and the region of the captured image in which the person is moving is extracted using the difference between the captured image and a background image containing only the background without the person, or the difference between frames of the captured image.

There is also a technique in which, when an incident image enters a holographic element on which a plurality of reference posture images have been recorded using reference beams with different incident angles, the intensity and direction of the light emitted from the holographic element are detected, and the detected intensity and direction are used to determine whether the incident image matches one of the reference posture images or none of them (see, for example, Patent Document 1).

Patent Document 1: JP 09-273920 A

However, with the gesture recognition technology described above, the person must be imaged by a plurality of cameras in a very large studio equipped with those cameras, after markers have been attached to the person's body or a glove has been put on the person's hand. This places a considerable burden on the person.

The motion region extraction technology requires no special setup such as attaching markers to the person, but its function is limited to extracting moving regions in the captured image.

The present invention has been made in view of these circumstances, and makes it possible to easily estimate the posture of an estimation target, such as a person, without imposing a burden on the target.

An information processing device or program according to one aspect of the present invention is an information processing device, or a program for causing a computer to function as an information processing device, that includes: extraction means for extracting, from a captured image of a subject, a silhouette representing the region in which the subject appears; thinning means for thinning the silhouette into a silhouette line that simplifies the shape of the silhouette; calculation means for calculating, for each of a plurality of posture model lines that are prepared in advance and represent mutually different predetermined postures, a matching degree representing the degree of match between the posture model line and the silhouette line; and estimation means for estimating, as the posture of the subject, the posture represented by the posture model line whose matching degree indicates the highest degree of match among the matching degrees calculated for the plurality of posture model lines.

The calculation means can calculate, as the matching degree, the distance between the posture model line and the silhouette line for each of the plurality of posture model lines, and the estimation means can estimate, as the posture of the subject, the posture represented by the posture model line corresponding to the shortest of the distances calculated for the plurality of posture model lines.

The estimation means can estimate the posture represented by the posture model line corresponding to the shortest distance as the posture of the subject when the shortest distance is less than a predetermined threshold.

The calculation means can calculate, for each of the plurality of posture model lines, the matching degree as the degree of match between a feature quantity representing a feature of the posture model line and a feature quantity representing a feature of the silhouette line.

An information processing method according to one aspect of the present invention includes steps in which: the extraction means extracts, from a captured image of a subject, a silhouette representing the region in which the subject appears; the thinning means thins the silhouette into a silhouette line that simplifies the shape of the silhouette; the calculation means calculates, for each of a plurality of posture model lines that are prepared in advance and represent mutually different predetermined postures, a matching degree representing the degree of match between the posture model line and the silhouette line; and the estimation means estimates, as the posture of the subject, the posture represented by the posture model line whose matching degree indicates the highest degree of match among the matching degrees calculated for the plurality of posture model lines.

In one aspect of the present invention, a silhouette representing the region in which a subject appears is extracted from a captured image of the subject; the silhouette is thinned into a silhouette line that simplifies the shape of the silhouette; a matching degree representing the degree of match between the posture model line and the silhouette line is calculated for each of a plurality of posture model lines that are prepared in advance and represent mutually different predetermined postures; and the posture represented by the posture model line whose matching degree indicates the highest degree of match among the calculated matching degrees is estimated as the posture of the subject.

According to the present invention, the posture of an estimation target can be easily estimated without imposing a burden on the target.

An embodiment is described below with reference to the drawings.

FIG. 1 shows a configuration example of an information processing device to which the present invention is applied.

The information processing device 1 comprises a camera 31, a silhouette extraction unit 32, a thinning unit 33, a distance calculation unit 34, a posture storage unit 35, a matching unit 36, and a posture estimation unit 37.

The camera 31 images a subject, for example a person, and supplies the resulting captured image to the silhouette extraction unit 32.

The silhouette extraction unit 32 detects (extracts), from the captured image supplied by the camera 31, a silhouette representing the region of the captured image in which the person appears, generates a silhouette image showing the detected silhouette, and supplies it to the thinning unit 33.

FIG. 2 shows an example of a silhouette image generated from a captured image by the silhouette extraction unit 32.

The silhouette image is, for example, a binary image in which the value of each pixel has been binarized to 0 or 1. The silhouette of the person is shown by white pixels, whose pixel value is 0, and the background is shown by black pixels, whose pixel value is 1.

One method that can be used to detect the silhouette of the person in the captured image is background subtraction, which takes the difference between the captured image from the camera 31 and a background image, captured and stored in advance, that contains only the background without the person. The silhouette can be detected more accurately by using a method based on graph cut and stereo vision ("Bi-Layer segmentation of binocular stereo video", V. Kolmogorov, A. Blake et al., Microsoft Research Ltd., Cambridge, UK).
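The background subtraction step can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the function name `extract_silhouette` and the difference threshold of 30 are assumptions, since the text does not specify how the difference is binarized. The output follows the pixel convention stated above (0 for the silhouette, 1 for the background).

```python
import numpy as np

def extract_silhouette(captured, background, threshold=30):
    """Return a binary silhouette image: 0 = person (white), 1 = background (black).

    `threshold` is a hypothetical tuning parameter, not a value from the source.
    """
    # Use a signed type so that the subtraction of uint8 images cannot wrap around.
    diff = np.abs(captured.astype(np.int32) - background.astype(np.int32))
    if diff.ndim == 3:
        # Colour image: keep the largest per-channel difference for each pixel.
        diff = diff.max(axis=2)
    return np.where(diff > threshold, 0, 1).astype(np.uint8)
```

Pixels whose difference from the stored background exceeds the threshold are treated as belonging to the person; everything else is treated as background.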

Returning to FIG. 1, the thinning unit 33 performs a thinning process that thins the silhouette of the person in the silhouette image supplied from the silhouette extraction unit 32 into a silhouette line that simplifies the shape of the silhouette, for example the center line of the silhouette with a line width of one pixel (shown in FIG. 3), and supplies the resulting thinned image to the distance calculation unit 34.

FIG. 3 shows an example of a thinned image obtained from a silhouette image by the thinning process performed by the thinning unit 33.

The thinned image in FIG. 3 is a binary image in which the silhouette line, which simplifies the shape of the person's silhouette, is shown by white pixels and the background is shown by black pixels.

In the thinning process, the silhouette of the person is thinned into a silhouette line in a way that preserves the topological properties of the silhouette (connectivity and Euler number). That is, the silhouette is thinned so that connecting parts of the silhouette such as elbows and knees do not disappear, a plurality of distinct connected parts are not merged into one, and holes formed by the silhouette neither disappear nor are newly created.

Besides using the one-pixel-wide center line of the silhouette as the silhouette line, the thinning process can also, for example, detect the contour of the person's silhouette and use the contour line representing the detected contour as the silhouette line.

Returning to FIG. 1, the distance calculation unit 34 performs a distance calculation process that calculates, for each pixel of the thinned image from the thinning unit 33, a distance such as the Euclidean distance, that is, the straight-line distance to the nearest white pixel forming the silhouette line, and supplies the resulting distance image to the matching unit 36. The distance image consists, for example, of the Euclidean distances calculated for the pixels of the thinned image.

FIG. 4 shows an example of a distance image obtained from a thinned image by the distance calculation process performed by the distance calculation unit 34.

FIG. 4A shows the thinned image, and FIG. 4B shows the distance image, in which the Euclidean distance of each pixel of the thinned image is rendered in grayscale.

In the distance image of FIG. 4B, darker gray (closer to black) indicates a shorter Euclidean distance, and lighter gray (closer to white) indicates a longer Euclidean distance.
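As an illustration of the distance calculation process, the following sketch computes the distance image by brute force: for every pixel it takes the minimum Euclidean distance to any silhouette-line pixel. The function name `distance_image` is an assumption, and the brute-force approach is chosen only for clarity; the text does not say which algorithm the distance calculation unit 34 uses.

```python
import numpy as np

def distance_image(thinned):
    """thinned: 2-D array with 0 = silhouette-line pixel (white), 1 = background.

    Returns an array of the same shape holding, for each pixel, the Euclidean
    distance to the nearest silhouette-line pixel.
    """
    thinned = np.asarray(thinned)
    line_pts = np.argwhere(thinned == 0)              # (K, 2) row/col of line pixels
    if len(line_pts) == 0:
        raise ValueError("thinned image contains no silhouette-line pixels")
    rows, cols = np.indices(thinned.shape)
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)        # (N, 2)
    # Pairwise distances between every pixel and every line pixel: shape (N, K).
    d = np.linalg.norm(pixels[:, None, :] - line_pts[None, :, :], axis=2)
    return d.min(axis=1).reshape(thinned.shape)
```

This costs memory proportional to (number of pixels) × (number of line pixels), so it suits only small images. For larger images, a dedicated Euclidean distance transform such as `scipy.ndimage.distance_transform_edt`, applied to the same 0/1 array, should give the same distances far more efficiently.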

Returning to FIG. 1, the posture storage unit 35 stores the coordinate values of a plurality of feature points (head, chest, each joint, and so on), which are the characteristic parts for indicating a person's posture and are used to generate a posture model line representing a predetermined posture.

FIG. 5 shows an example of a posture model image showing a posture model line generated from the coordinate values stored in the posture storage unit 35.

The posture model image in FIG. 5 shows a posture model line formed by connecting black circles (●) numbered 0 to 14 with line segments.

The black circles numbered 0 to 14 indicate the coordinate values of the feature points in the posture model image.

For example, number 0 indicates the position of the person's chest and number 1 indicates the position of the person's head. Numbers 2 and 3 indicate the positions of the person's left and right shoulders, numbers 4 and 5 the positions of the left and right elbows, and numbers 6 and 7 the positions of the left and right hands.

Number 8 indicates the center of the person's waist, and numbers 9 and 10 indicate the left and right sides of the waist. Numbers 11 and 12 indicate the positions of the person's left and right knees, and numbers 13 and 14 the positions of the left and right feet.

Returning to FIG. 1, the matching unit 36 reads out a set of 15 coordinate values (the black circles numbered 0 to 14) stored in the posture storage unit 35 and connects predetermined pairs of coordinate values with line segments, thereby generating a posture model image showing a given posture model line.
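Generating a posture model image from the 15 feature points might look like the sketch below. The text says only that "predetermined" pairs of points are connected, so the `EDGES` list here is a hypothetical skeleton chosen to be plausible for the numbering of FIG. 5, not the connectivity of the embodiment. The function name `draw_model_line` and the simple sampling-based line rasterization are likewise assumptions.

```python
import numpy as np

# Hypothetical connectivity (assumption): chest-head, chest-shoulders,
# shoulder-elbow-hand, chest-waist, waist centre-waist sides-knees-feet.
EDGES = [(0, 1), (0, 2), (0, 3), (2, 4), (3, 5), (4, 6), (5, 7),
         (0, 8), (8, 9), (8, 10), (9, 11), (10, 12), (11, 13), (12, 14)]

def draw_model_line(points, shape, edges=EDGES):
    """points: sequence of integer (row, col) feature-point coordinates.

    Returns a boolean mask of the given shape that is True on the model line.
    """
    mask = np.zeros(shape, dtype=bool)
    for a, b in edges:
        (r0, c0), (r1, c1) = points[a], points[b]
        # Sample one point per pixel step along the longer axis of the segment.
        n = max(abs(r1 - r0), abs(c1 - c0)) + 1
        rr = np.rint(np.linspace(r0, r1, n)).astype(int)
        cc = np.rint(np.linspace(c0, c1, n)).astype(int)
        mask[rr, cc] = True
    return mask
```

One such mask would be generated per stored posture, each from its own set of 15 coordinate values.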

Based on the plurality of generated posture model images and the distance image supplied from the distance calculation unit 34, the matching unit 36 also performs a distance matching process that calculates, for each posture model image, the chamfer distance, which represents the degree of match between the posture model line in the posture model image and the silhouette line in the thinned image.

For example, the matching unit 36 superimposes the posture model image on the distance image so that the centroid of the pixels forming the posture model line coincides with the centroid of the white pixels forming the silhouette line, and calculates, as the chamfer distance, the average of the Euclidean distances at the positions on the distance image that the posture model line overlaps.
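The chamfer distance calculation described above can be sketched as follows. The function name `chamfer_distance`, the use of boolean masks for the two lines, rounding the centroid shift to whole pixels, and ignoring model pixels that land outside the image are all assumptions made for this illustration.

```python
import numpy as np

def chamfer_distance(dist_img, model_mask, line_mask):
    """Average of dist_img over the model-line pixels after centroid alignment.

    dist_img   : distance image computed from the thinned image
    model_mask : boolean mask, True on the posture model line
    line_mask  : boolean mask, True on the silhouette line
    """
    model_pts = np.argwhere(model_mask)
    line_pts = np.argwhere(line_mask)
    # Shift the model line so that its centroid coincides with that of the
    # silhouette line (rounded to whole pixels: an assumption).
    shift = np.rint(line_pts.mean(axis=0) - model_pts.mean(axis=0)).astype(int)
    moved = model_pts + shift
    h, w = dist_img.shape
    inside = ((moved[:, 0] >= 0) & (moved[:, 0] < h) &
              (moved[:, 1] >= 0) & (moved[:, 1] < w))
    moved = moved[inside]          # assumption: drop pixels shifted off the image
    return float(dist_img[moved[:, 0], moved[:, 1]].mean())
```

A model line that lies exactly on the silhouette line after alignment yields a chamfer distance of 0; the further the model line strays from the silhouette line, the larger the average becomes.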

The distance matching process performed by the matching unit 36 is a general technique ordinarily used to calculate the degree of match between edge images, that is, images containing edges.

Furthermore, when the bounding rectangle circumscribing the silhouette line in the thinned image and the bounding rectangle circumscribing the posture model line in the posture model image differ in size, the matching unit 36 can enlarge or reduce one of the two rectangles to fit the other, so that the rectangles are of similar size, before performing the distance matching process. This makes it possible to calculate a more accurate chamfer distance.
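The bounding-rectangle normalization could be sketched as below, operating on point coordinates rather than on images. The function name `fit_to_bbox` is an assumption, as are two design choices the text leaves open: the model is scaled independently along each axis, and it is also translated so that the two rectangles coincide.

```python
import numpy as np

def fit_to_bbox(model_pts, line_pts):
    """Scale and translate model_pts so its bounding box equals that of line_pts.

    Both arguments are sequences of (row, col) coordinates.
    """
    model_pts = np.asarray(model_pts, dtype=float)
    line_pts = np.asarray(line_pts, dtype=float)
    m_min, m_max = model_pts.min(axis=0), model_pts.max(axis=0)
    l_min, l_max = line_pts.min(axis=0), line_pts.max(axis=0)
    # Guard against a degenerate (zero-width or zero-height) model box.
    m_size = np.where(m_max > m_min, m_max - m_min, 1.0)
    scale = (l_max - l_min) / m_size          # per-axis scale (assumption)
    return (model_pts - m_min) * scale + l_min
```

Applying this to the feature-point coordinates before rasterizing the posture model line keeps the line one pixel wide, which would not be the case if the rasterized image itself were resized.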

The matching unit 36 supplies the chamfer distance obtained for each posture model image by the distance matching process to the posture estimation unit 37.

The posture estimation unit 37 estimates that the posture of the person imaged by the camera 31 is the posture represented by the posture model line in the posture model image whose chamfer distance indicates the highest degree of match between the silhouette line and the posture model line, that is, for example, the posture model image corresponding to the shortest of the chamfer distances supplied by the matching unit 36, and outputs the estimation result.
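The estimation step reduces to picking the posture with the smallest chamfer distance, optionally rejecting the result when even the smallest distance is not below a threshold, as described earlier for the estimation means. The sketch below assumes the distances are held in a dictionary keyed by posture name; the function name `estimate_posture` and the posture names in the usage example are hypothetical.

```python
def estimate_posture(chamfer_by_posture, threshold=None):
    """Return the name of the posture with the shortest chamfer distance.

    If `threshold` is given, return None unless the shortest distance is
    strictly less than the threshold.
    """
    name, best = min(chamfer_by_posture.items(), key=lambda item: item[1])
    if threshold is not None and best >= threshold:
        return None
    return name
```

For example, `estimate_posture({"stand": 3.2, "arms_up": 1.1, "crouch": 2.4})` selects `"arms_up"`, while the same call with `threshold=1.0` reports no posture because the best distance of 1.1 is not below the threshold.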

Next, the thinning process performed by the thinning unit 33 is described in detail with reference to FIGS. 6 to 9.

The thinning process performed by the thinning unit 33 in FIG. 1 consists of a first phase, a second phase, a third phase, and a fourth phase. Performing these phases in that order yields the thinned image of FIG. 3, in which the silhouette of the person in the silhouette image of FIG. 2 has been converted into a silhouette line.

In the first phase, the white pixels in the silhouette image supplied from the silhouette extraction unit 32 to the thinning unit 33 are taken in turn as the pixel of interest in raster scan order, that is, scanning each horizontal line of the silhouette image from left to right, starting from the top-left pixel and proceeding from the upper left toward the lower right (hereinafter, the lower-right scan order). When the pixel of interest satisfies any one of the three conditions described later with reference to FIG. 6, the pixel of interest, which is a white pixel, is converted into a black pixel (its pixel value is changed from 0 to 1).

Next, with reference to FIG. 6, the conditions under which the pixel of interest is converted into a black pixel (hereinafter, the first conditions) are described for the case where the white pixels in the silhouette image supplied from the silhouette extraction unit 32 to the thinning unit 33 are taken in turn as the pixel of interest in the lower-right scan order.

FIGS. 6A to 6C each show a 3 × 3 (horizontal × vertical) block of pixels centered on the pixel of interest; 0, 1, and X indicate the pixel values of the corresponding pixels, where X stands for either 0 or 1. The same applies to the other drawings.

In the following, the pixels adjacent to the pixel of interest to the left, upper left, top, upper right, right, lower right, bottom, and lower left are called the left pixel, upper-left pixel, upper pixel, upper-right pixel, right pixel, lower-right pixel, lower pixel, and lower-left pixel, respectively.

As shown in FIG. 6A, when the first condition that the lower-left, lower, and lower-right pixels are black pixels (pixel value 1) and the upper-left, upper, and upper-right pixels are white pixels (pixel value 0) is satisfied, the pixel of interest is converted into a black pixel (its pixel value is changed from 0 to 1). The left and right pixels may each be either white or black.

As shown in FIG. 6B, when the first condition that the lower, lower-right, and right pixels are black pixels and the left, upper-left, and upper pixels are white pixels is satisfied, the pixel of interest is converted into a black pixel. The lower-left and upper-right pixels may each be either white or black.

As shown in FIG. 6C, when the first condition that the lower-right, right, and upper-right pixels are black pixels and the upper-left, left, and lower-left pixels are white pixels is satisfied, the pixel of interest is converted into a black pixel. The lower and upper pixels may each be either white or black.
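The first phase can be sketched as a raster scan that tests each white pixel against the three 3 × 3 patterns of FIGS. 6A to 6C. The patterns below are transcribed from the description above, with `None` standing for X. Two details are assumptions because the text does not state them: pixels outside the image are treated as black, and each conversion takes effect immediately, so it is visible to pixels visited later in the same scan. The names `matches` and `first_phase` are illustrative.

```python
import numpy as np

WHITE, BLACK = 0, 1   # source convention: silhouette = 0, background = 1
X = None              # "don't care": the pixel may be 0 or 1

# 3x3 neighbourhoods, rows listed top to bottom; the centre is the pixel of interest.
FIG_6A = [[WHITE, WHITE, WHITE],
          [X,     WHITE, X],
          [BLACK, BLACK, BLACK]]
FIG_6B = [[WHITE, WHITE, X],
          [WHITE, WHITE, BLACK],
          [X,     BLACK, BLACK]]
FIG_6C = [[WHITE, X,     BLACK],
          [WHITE, WHITE, BLACK],
          [WHITE, X,     BLACK]]

def matches(window, pattern):
    """True if every non-X entry of pattern equals the corresponding window pixel."""
    return all(p is None or window[r][c] == p
               for r, row in enumerate(pattern)
               for c, p in enumerate(row))

def first_phase(silhouette):
    """Apply the first-phase conditions in raster order and return the new image."""
    # Assumption: pixels outside the image count as background (black).
    padded = np.pad(np.asarray(silhouette), 1, constant_values=BLACK)
    h, w = padded.shape
    for r in range(1, h - 1):              # top row first
        for c in range(1, w - 1):          # left to right within each row
            if padded[r, c] != WHITE:
                continue
            window = padded[r - 1:r + 2, c - 1:c + 2]
            if any(matches(window, p) for p in (FIG_6A, FIG_6B, FIG_6C)):
                padded[r, c] = BLACK       # assumption: conversion is in place
    return padded[1:-1, 1:-1]
```

The remaining phases would follow the same structure, differing only in the scan order and in the set of patterns tested.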

In the thinning process, the second phase starts after the first phase ends. In the second phase, the white pixels in the silhouette image resulting from the first phase are taken in turn as the pixel of interest in reverse raster scan order, that is, scanning each horizontal line of the silhouette image from right to left, starting from the bottom-right pixel and proceeding from the lower right toward the upper left (hereinafter, the upper-left scan order). When the pixel of interest satisfies any one of the three conditions described later with reference to FIG. 7, the pixel of interest, which is a white pixel, is converted into a black pixel.

Next, with reference to FIG. 7, the conditions under which the pixel of interest is converted into a black pixel (hereinafter, the second conditions) are described for the case where the white pixels in the silhouette image resulting from the first phase are taken in turn as the pixel of interest in the upper-left scan order.

As shown in FIG. 7A, when the second condition that the upper-left, upper, and upper-right pixels are black pixels and the lower-left, lower, and lower-right pixels are white pixels is satisfied, the pixel of interest is converted into a black pixel. The left and right pixels may each be either white or black.

As shown in FIG. 7B, when the second condition that the left, upper-left, and upper pixels are black pixels and the lower, lower-right, and right pixels are white pixels is satisfied, the pixel of interest is converted into a black pixel. The lower-left and upper-right pixels may each be either white or black.

As shown in FIG. 7C, when the second condition that the upper-left, left, and lower-left pixels are black pixels and the lower-right, right, and upper-right pixels are white pixels is satisfied, the pixel of interest is converted into a black pixel. The lower and upper pixels may each be either white or black.
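Written out as 3 × 3 patterns, the three second conditions turn out to be the patterns of FIGS. 6A to 6C rotated by 180 degrees, which mirrors the reversal of the scan direction. The sketch below transcribes both sets from the descriptions, using -1 for X, and applies the FIG. 7 patterns in reverse raster order. As before, treating out-of-image pixels as black and converting pixels in place are assumptions, and the names are illustrative.

```python
import numpy as np

W, B, X = 0, 1, -1   # white (silhouette), black (background), don't care

FIG_6 = [np.array(p) for p in (
    [[W, W, W], [X, W, X], [B, B, B]],   # 6A
    [[W, W, X], [W, W, B], [X, B, B]],   # 6B
    [[W, X, B], [W, W, B], [W, X, B]],   # 6C
)]
FIG_7 = [np.array(p) for p in (
    [[B, B, B], [X, W, X], [W, W, W]],   # 7A
    [[B, B, X], [B, W, W], [X, W, W]],   # 7B
    [[B, X, W], [B, W, W], [B, X, W]],   # 7C
)]

def matches(window, pattern):
    """True if the window agrees with the pattern wherever the pattern is not X."""
    return bool(np.all((pattern == X) | (window == pattern)))

def second_phase(image):
    """Apply the second-phase conditions in reverse raster order."""
    p = np.pad(np.asarray(image), 1, constant_values=B)   # assumption: border is black
    h, w = p.shape
    for r in range(h - 2, 0, -1):            # bottom row first
        for c in range(w - 2, 0, -1):        # right to left within each row
            window = p[r - 1:r + 2, c - 1:c + 2]
            if p[r, c] == W and any(matches(window, pat) for pat in FIG_7):
                p[r, c] = B                  # assumption: conversion is in place
    return p[1:-1, 1:-1]
```

The rotation relationship can be checked with `np.array_equal(np.rot90(FIG_6[i], 2), FIG_7[i])` for each of the three patterns.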

The third phase starts after the second phase ends. In the third phase, the white pixels in the silhouette image resulting from the second phase are taken in turn as the pixel of interest in a scan order that scans each horizontal line of the silhouette image from right to left, starting from the top-right pixel and proceeding from the upper right toward the lower left (hereinafter, the lower-left scan order). When the pixel of interest satisfies any one of the three conditions described later with reference to FIG. 8, the pixel of interest, which is a white pixel, is converted into a black pixel.

Next, with reference to FIG. 8, the conditions under which the pixel of interest is converted into a black pixel (hereinafter, the third conditions) are described for the case where the white pixels in the silhouette image resulting from the second phase are taken in turn as the pixel of interest in the lower-left scan order.

図８Ａに示すように、第３の条件として、左上画素、左画素、左下画素が黒色画素であり、右下画素、右画素、右上画素が白色画素である条件を満たす場合、注目画素が黒色画素に変換される。なお、下画素および上画素は、白色画素または黒色画素のいずれかの画素である。 As shown in FIG. 8A, as a third condition, if the upper left pixel, the left pixel, and the lower left pixel are black pixels, and the lower right pixel, the right pixel, and the upper right pixel are white pixels, the target pixel is black Converted to pixels. The lower pixel and the upper pixel are either white pixels or black pixels.

図８Ｂに示すように、第３の条件として、左画素、左下画素、下画素が黒色画素であり、右画素、右上画素、上画素が白色画素である条件を満たす場合、注目画素が黒色画素に変換される。なお、左上画素および右下画素は、白色画素または黒色画素のいずれかの画素である。 As shown in FIG. 8B, as a third condition, if the left pixel, the lower left pixel, and the lower pixel are black pixels, and the right pixel, the upper right pixel, and the upper pixel are white pixels, the target pixel is a black pixel. Is converted to Note that the upper left pixel and the lower right pixel are either white pixels or black pixels.

図８Ｃに示すように、第３の条件として、左下画素、下画素、右下画素が黒色画素であり、左上画素、上画素、右上画素が白色画素である条件を満たす場合、注目画素が黒色画素に変換される。なお、左画素および右画素は、白色画素または黒色画素のいずれかの画素である。 As shown in FIG. 8C, as a third condition, when the lower left, lower, and lower right pixels are black pixels and the upper left, upper, and upper right pixels are white pixels, the target pixel is converted into a black pixel. The left pixel and the right pixel may each be either a white pixel or a black pixel.
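The three patterns of FIGS. 8A to 8C can be written as a small lookup table. The following Python sketch is only an illustration: the names `THIRD_CONDITION`, `pixel_at`, and `satisfies_third_condition` are hypothetical, the encoding 1 = white and 0 = black is an assumption, and treating pixels outside the image as black is also an assumption that the text does not state.

```python
# Neighbour offsets as (row delta, column delta); rows grow downward.
UL, U, UR = (-1, -1), (-1, 0), (-1, 1)
L, R = (0, -1), (0, 1)
LL, D, LR = (1, -1), (1, 0), (1, 1)

WHITE, BLACK = 1, 0  # assumed pixel encoding

# Each entry: (neighbours that must be black, neighbours that must be white).
# Neighbours not listed in an entry may be either colour.
THIRD_CONDITION = [
    ((UL, L, LL), (LR, R, UR)),  # FIG. 8A
    ((L, LL, D), (R, UR, U)),    # FIG. 8B
    ((LL, D, LR), (UL, U, UR)),  # FIG. 8C
]


def pixel_at(image, i, j):
    """Return image[i][j]; pixels outside the image count as black (assumption)."""
    if 0 <= i < len(image) and 0 <= j < len(image[0]):
        return image[i][j]
    return BLACK


def satisfies_third_condition(image, i, j):
    """True if the target pixel (i, j) matches any of the three patterns."""
    for black, white in THIRD_CONDITION:
        blacks_ok = all(pixel_at(image, i + di, j + dj) == BLACK for di, dj in black)
        whites_ok = all(pixel_at(image, i + di, j + dj) == WHITE for di, dj in white)
        if blacks_ok and whites_ok:
            return True
    return False
```

For example, a target pixel whose entire left column is black and whose entire right column is white matches the FIG. 8A pattern, whereas a target pixel surrounded only by white pixels matches none of the three.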

細線化処理において、第３のフェーズの終了後、第４のフェーズが開始される。第４のフェーズでは、シルエット画像内の水平ラインを左から右方向に走査していく走査順であって、シルエット画像内の最も左下の画素を開始位置として、左下から右上方向に走査する走査順（以下、右上方向の走査順という）に、順次、第３のフェーズの終了後のシルエット画像内の白色画素が注目画素とされる。そして、注目画素が、図９において後述する３つの条件のうちのいずれかの条件を満たす場合に、白色画素である注目画素が黒色画素に変換される。 In the thinning process, the fourth phase starts after the third phase ends. In the fourth phase, the white pixels in the silhouette image resulting from the third phase are taken in turn as the target pixel, in a scanning order that scans each horizontal line of the silhouette image from left to right, starting at the lower-leftmost pixel and proceeding from the lower left toward the upper right (hereinafter referred to as the upper-right scanning order). When the target pixel satisfies any one of the three conditions described below with reference to FIG. 9, the target pixel, which is a white pixel, is converted into a black pixel.

次に、図９を参照して、右上方向の走査順に、順次、第３のフェーズの終了後のシルエット画像内の白色画素が注目画素とされた場合に、その注目画素が黒色画素に変換される条件（以下、第４の条件という）を説明する。 Next, referring to FIG. 9, when a white pixel in the silhouette image after the end of the third phase is sequentially set as a target pixel in the order of scanning in the upper right direction, the target pixel is converted into a black pixel. The condition (hereinafter referred to as the fourth condition) will be described.

図９Ａに示すように、第４の条件として、右下画素、右画素、右上画素が黒色画素であり、左上画素、左画素、左下画素が白色画素である条件を満たす場合、注目画素が黒色画素に変換される。なお、下画素および上画素は、白色画素または黒色画素のいずれかの画素である。 As shown in FIG. 9A, as a fourth condition, when the lower right, right, and upper right pixels are black pixels and the upper left, left, and lower left pixels are white pixels, the target pixel is converted into a black pixel. The lower pixel and the upper pixel may each be either a white pixel or a black pixel.

図９Ｂに示すように、第４の条件として、右画素、右上画素、上画素が黒色画素であり、左画素、左下画素、下画素が白色画素である条件を満たす場合、注目画素が黒色画素に変換される。なお、左上画素および右下画素は、白色画素または黒色画素のいずれかの画素である。 As shown in FIG. 9B, as a fourth condition, when the right, upper right, and upper pixels are black pixels and the left, lower left, and lower pixels are white pixels, the target pixel is converted into a black pixel. The upper left pixel and the lower right pixel may each be either a white pixel or a black pixel.

図９Ｃに示すように、第４の条件として、左上画素、上画素、右上画素が黒色画素であり、左下画素、下画素、右下画素が白色画素である条件を満たす場合、注目画素が黒色画素に変換される。なお、左画素および右画素は、白色画素または黒色画素のいずれかの画素である。 As shown in FIG. 9C, as a fourth condition, when the upper left, upper, and upper right pixels are black pixels and the lower left, lower, and lower right pixels are white pixels, the target pixel is converted into a black pixel. The left pixel and the right pixel may each be either a white pixel or a black pixel.
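Read side by side, each pattern of the fourth condition (FIGS. 9A to 9C) is the corresponding pattern of the third condition (FIGS. 8A to 8C) with the black and white roles exchanged. The Python sketch below checks that observation directly from the pixel lists given in the text; the names `THIRD`, `FOURTH`, `swap_colours`, and `unconstrained` are hypothetical and used only for this illustration.

```python
UL, U, UR = "upper-left", "upper", "upper-right"
L, R = "left", "right"
LL, D, LR = "lower-left", "lower", "lower-right"
ALL_NEIGHBOURS = {UL, U, UR, L, R, LL, D, LR}

# (black neighbours, white neighbours), transcribed from the text.
THIRD = [
    ({UL, L, LL}, {LR, R, UR}),  # FIG. 8A
    ({L, LL, D}, {R, UR, U}),    # FIG. 8B
    ({LL, D, LR}, {UL, U, UR}),  # FIG. 8C
]
FOURTH = [
    ({LR, R, UR}, {UL, L, LL}),  # FIG. 9A
    ({R, UR, U}, {L, LL, D}),    # FIG. 9B
    ({UL, U, UR}, {LL, D, LR}),  # FIG. 9C
]


def swap_colours(patterns):
    """Exchange the black and white neighbour sets of every pattern."""
    return [(white, black) for black, white in patterns]


def unconstrained(pattern):
    """Neighbours that a pattern leaves free (white or black)."""
    black, white = pattern
    return ALL_NEIGHBOURS - black - white


fourth_is_swapped_third = swap_colours(THIRD) == FOURTH
```

Under this encoding `fourth_is_swapped_third` evaluates to `True`, and `unconstrained(FOURTH[0])` gives the upper and lower pixels, matching the remark that those two pixels may be either colour in FIG. 9A.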

細線化処理において、第１乃至第４のフェーズからなる処理が行われることにより、シルエット画像内の人物のシルエットが、シルエット線に細線化される。 In the thinning process, the processes of the first to fourth phases are performed, so that the silhouette of the person in the silhouette image is thinned into a silhouette line.

次に、図１０および図１１を参照して、細線化処理により得られた細線化画像内の各画素のユークリッド距離を算出する距離算出処理を説明する。 Next, a distance calculation process for calculating the Euclidean distance of each pixel in the thinned image obtained by the thinning process will be described with reference to FIGS. 10 and 11.

距離算出部３４が行う距離算出処理は、初期化処理、ラスタ走査処理、および逆ラスタ走査処理の３つの処理からなり、初期化処理、ラスタ走査処理、および逆ラスタ走査処理の順番で処理が行われる。 The distance calculation process performed by the distance calculation unit 34 consists of three processes: an initialization process, a raster scanning process, and a reverse raster scanning process, which are performed in that order.

初期化処理では、次式（１）を用いて、細線化画像の各画素f(i,j)を、初期距離d'(i,j)に変換する。 In the initialization process, each pixel f (i, j) of the thinned image is converted into an initial distance d ′ (i, j) using the following equation (1).

なお、画素f(i,j)とは、細線化画像内の最も左上の画素を０行０列目に存在する画素とした場合に、細線化画像内のi行j列目に存在する画素を表しており、d'(i,j)とは、f(i,j)を初期化して得られた初期距離を表している。 The pixel f (i, j) is a pixel existing in the i-th row and j-th column in the thinned image when the upper left pixel in the thinned image is the pixel existing in the 0th row and 0th column. Where d ′ (i, j) represents an initial distance obtained by initializing f (i, j).

初期化処理により、細線化画像の各画素f(i,j)に対応する初期距離d'(i,j)が得られた後、ラスタ走査処理が行われる。 After the initial distance d ′ (i, j) corresponding to each pixel f (i, j) of the thinned image is obtained by the initialization process, the raster scanning process is performed.

次に、図１０は、ラスタ走査処理により、細線化画像の各画素f(i,j)に対応する初期距離d'(i,j)が、仮距離d"(i,j)に変換されるときの一例を示している。 Next, in FIG. 10, the initial distance d ′ (i, j) corresponding to each pixel f (i, j) of the thinned image is converted into a temporary distance d ″ (i, j) by raster scanning processing. An example is shown.

図１０Ａには、細線化画像の各画素f(i,j)に対応する初期距離d'(i,j)が示されている。図１０Ｂには、初期距離d'(i,j)に対応する仮距離d"(i,j)が示されている。 FIG. 10A shows an initial distance d ′ (i, j) corresponding to each pixel f (i, j) of the thinned image. FIG. 10B shows a temporary distance d ″ (i, j) corresponding to the initial distance d ′ (i, j).

ラスタ走査処理では、右下方向の走査順に、順次、初期距離に注目し、次式（２）を用いて、注目している初期距離d'(i,j)を仮距離d"(i,j)に変換する。 In the raster scanning process, the initial distances are taken in turn in the lower-right scanning order, and the initial distance d'(i,j) of interest is converted into the temporary distance d"(i,j) using the following equation (2).

なお、仮距離d"(i,j)とは、d'(i,j)、d"(i-1,j)+1、およびd"(i,j-1)+1のうちの最小値を表している。 The temporary distance d "(i, j) is the minimum of d '(i, j), d" (i-1, j) +1, and d "(i, j-1) +1. Represents a value.

ラスタ走査処理により、初期距離d'(i,j)に対応する仮距離d"(i,j)が得られた後、逆ラスタ走査処理が行われる。 After the provisional distance d ″ (i, j) corresponding to the initial distance d ′ (i, j) is obtained by the raster scanning process, the reverse raster scanning process is performed.

次に、図１１は、逆ラスタ走査処理により、仮距離d"(i,j)が、ユークリッド距離d(i,j)に変換されるときの一例を示している。 Next, FIG. 11 shows an example when the temporary distance d ″ (i, j) is converted into the Euclidean distance d (i, j) by the reverse raster scanning process.

図１１Ａには、初期距離d'(i,j)に対応する仮距離d"(i,j)が示されている。図１１Ｂには、仮距離d"(i,j)に対応するユークリッド距離d(i,j)が示されている。 FIG. 11A shows the temporary distance d ″ (i, j) corresponding to the initial distance d ′ (i, j). FIG. 11B shows the Euclidean corresponding to the temporary distance d ″ (i, j). The distance d (i, j) is shown.

逆ラスタ走査処理では、左上方向の走査順に、順次、仮距離に注目し、次式（３）を用いて、注目している仮距離d"(i,j)を、ユークリッド距離d(i,j)に変換する。これにより、ユークリッド距離により構成される距離画像が生成される。 In the reverse raster scanning process, the temporary distances are taken in turn in the upper-left scanning order, and the temporary distance d"(i,j) of interest is converted into the Euclidean distance d(i,j) using the following equation (3). This generates a distance image composed of the Euclidean distances.

なお、ユークリッド距離d(i,j)とは、d"(i,j)、d(i+1,j)+1、およびd(i,j+1)+1のうちの最小値を表している。 Note that the Euclidean distance d(i,j) is the minimum of d"(i,j), d(i+1,j)+1, and d(i,j+1)+1.
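The two scanning passes described above can be sketched in Python as follows. Equation (1) is not reproduced in this text, so the initialization used here (0 for silhouette-line pixels, infinity for all others) is an assumption, as are the function name `distance_image`, the encoding 1 = silhouette-line pixel, and the handling of neighbours outside the image, which are simply skipped. The two update rules follow the minima stated for d"(i,j) and d(i,j).

```python
INF = float("inf")


def distance_image(thinned):
    """Two-pass distance computation on a binary thinned image.

    thinned[i][j] == 1 marks a silhouette-line pixel (assumed encoding).
    """
    rows, cols = len(thinned), len(thinned[0])

    # Initialization: stand-in for equation (1), which is an assumption here.
    d = [[0 if thinned[i][j] == 1 else INF for j in range(cols)]
         for i in range(rows)]

    # Raster scan (lower-right order):
    # d"(i,j) = min(d'(i,j), d"(i-1,j) + 1, d"(i,j-1) + 1)
    for i in range(rows):
        for j in range(cols):
            if i > 0:
                d[i][j] = min(d[i][j], d[i - 1][j] + 1)
            if j > 0:
                d[i][j] = min(d[i][j], d[i][j - 1] + 1)

    # Reverse raster scan (upper-left order):
    # d(i,j) = min(d"(i,j), d(i+1,j) + 1, d(i,j+1) + 1)
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if i < rows - 1:
                d[i][j] = min(d[i][j], d[i + 1][j] + 1)
            if j < cols - 1:
                d[i][j] = min(d[i][j], d[i][j + 1] + 1)

    return d
```

For a 3 × 3 image whose only line pixel is the centre, this sketch returns 0 at the centre, 1 at the four horizontally and vertically adjacent pixels, and 2 at the four diagonal pixels. Because both passes add 1 per horizontal or vertical step, the diagonal value comes out as 2 rather than √2 under these assumptions.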

次に、図１２のフローチャートを参照して、図１の情報処理装置１が行う最尤姿勢出力処理を説明する。 Next, the maximum likelihood posture output process performed by the information processing apparatus 1 of FIG. 1 will be described with reference to the flowchart of FIG. 12.

ステップＳ３１において、カメラ３１は、被写体として、例えば、人物を撮像し、その撮像により得られた撮像画像を、シルエット抽出部３２に供給する。 In step S31, the camera 31 images, for example, a person as a subject, and supplies a captured image obtained by the imaging to the silhouette extracting unit 32.

ステップＳ３２において、シルエット抽出部３２は、カメラ３１からの撮像画像から、撮像画像内の人物が現れた領域を表すシルエットを検出（抽出）し、検出した人物のシルエットが現れた画像であるシルエット画像を生成して、細線化部３３に供給する。 In step S 32, the silhouette extraction unit 32 detects (extracts) a silhouette representing a region in which the person appears in the captured image from the captured image from the camera 31, and a silhouette image that is an image in which the detected silhouette of the person appears. Is generated and supplied to the thinning unit 33.

ステップＳ３３において、細線化部３３は、シルエット抽出部３２から供給されたシルエット画像内の人物のシルエットを、そのシルエットの形状を簡略化したシルエット線として、例えば、線幅が１画素分の幅となるシルエットの中心線に細線化する細線化処理を行い、その細線化処理により得られた細線化画像を、距離算出部３４に供給する。 In step S33, the thinning unit 33 performs a thinning process that thins the silhouette of the person in the silhouette image supplied from the silhouette extraction unit 32 into a silhouette line that simplifies the shape of the silhouette, for example, the center line of the silhouette with a line width of one pixel, and supplies the thinned image obtained by the thinning process to the distance calculation unit 34.

ステップＳ３４において、距離算出部３４は、細線化部３３からの細線化画像の各画素について、例えば、最も近くに存在する、シルエット線を構成する白色画素との直線距離を表すユークリッド距離等の距離を算出する距離算出処理を行い、その距離算出処理により得られた距離画像を、マッチング部３６に供給する。 In step S34, the distance calculation unit 34, for each pixel of the thinned image from the thinning unit 33, for example, a distance such as a Euclidean distance that represents a straight line distance from the nearest white pixel that forms the silhouette line. A distance calculation process for calculating the distance image is performed, and the distance image obtained by the distance calculation process is supplied to the matching unit 36.

ステップＳ３５において、マッチング部３６は、姿勢記憶部３５に記憶されている複数の座標値を読み出し、予め定められた座標値どうしを線分で結ぶことにより、所定の姿勢モデル線が現れた姿勢モデル画像を生成する。 In step S35, the matching unit 36 reads a plurality of coordinate values stored in the posture storage unit 35, and connects predetermined coordinate values with line segments, whereby a posture model in which a predetermined posture model line appears. Generate an image.
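One way to picture the generation of a posture model image from stored coordinate values is to rasterize each predetermined line segment into a binary image. The sketch below uses Bresenham's line algorithm for this purpose; the text does not say how the line segments are drawn, so that choice, the (x, y) coordinate convention, and the names `draw_segment`, `pose_model_image`, `joints`, and `bones` are all assumptions made for illustration.

```python
def draw_segment(image, p0, p1):
    """Mark the pixels of segment p0-p1 with 1 (Bresenham's line algorithm)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        image[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy


def pose_model_image(width, height, joints, bones):
    """Build a binary image in which the posture model line is drawn.

    joints: hypothetical mapping from a joint name to its (x, y) coordinate.
    bones:  hypothetical list of (name_a, name_b) pairs to connect.
    """
    image = [[0] * width for _ in range(height)]
    for a, b in bones:
        draw_segment(image, joints[a], joints[b])
    return image
```

As a usage example, `pose_model_image(4, 3, {"a": (0, 0), "b": (3, 0), "c": (3, 2)}, [("a", "b"), ("b", "c")])` draws the top row and the right-hand column of a 4 × 3 image.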

また、マッチング部３６は、生成した複数の姿勢モデル画像と、距離算出部３４から供給された距離画像とに基づいて、複数の姿勢モデル画像毎に、姿勢モデル画像内の姿勢モデル線と細線化画像内のシルエット線との一致の度合いを表すチャンファ距離を算出する距離マッチング処理を行う。そして、その距離マッチング処理により得られた複数の姿勢モデル画像毎のチャンファ距離を、姿勢推定部３７に供給する。 Further, the matching unit 36 thins the posture model line in the posture model image for each of the plurality of posture model images based on the plurality of generated posture model images and the distance image supplied from the distance calculation unit 34. A distance matching process is performed to calculate a chamfer distance representing the degree of matching with the silhouette line in the image. Then, the chamfer distance for each of the plurality of posture model images obtained by the distance matching process is supplied to the posture estimation unit 37.

ステップＳ３６において、姿勢推定部３７は、複数の姿勢モデル画像のうち、シルエット線と姿勢モデル線との一致の度合いが最大のチャンファ距離に対応する姿勢モデル画像、すなわち、例えば、マッチング部３６からの複数のチャンファ距離のうち、最短のチャンファ距離に対応する姿勢モデル画像内の姿勢モデル線が表す姿勢を、カメラ３１で撮像された人物の姿勢であると推定して、その推定結果を出力する。 In step S36, the posture estimation unit 37 is a posture model image corresponding to the chamfer distance having the maximum degree of matching between the silhouette line and the posture model line among the plurality of posture model images, that is, for example, from the matching unit 36. Of the plurality of chamfer distances, the posture represented by the posture model line in the posture model image corresponding to the shortest chamfer distance is estimated to be the posture of the person imaged by the camera 31, and the estimation result is output.
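Steps S35 and S36 can be sketched as a lookup into the distance image followed by a minimum search. The exact definition of the chamfer distance is not given in this passage, so the version below, the mean of the distance-image values at the pixels of the posture model line, is an assumption; the names `chamfer_distance` and `estimate_pose` and the dictionary of model images are likewise hypothetical.

```python
def chamfer_distance(model_image, distance_image):
    """Mean distance-image value over the pixels of the posture model line.

    model_image[i][j] == 1 marks a posture-model-line pixel (assumed encoding).
    """
    values = [
        distance_image[i][j]
        for i, row in enumerate(model_image)
        for j, value in enumerate(row)
        if value == 1
    ]
    if not values:
        raise ValueError("posture model image contains no line pixels")
    return sum(values) / len(values)


def estimate_pose(model_images, distance_image):
    """Return the name of the model image with the shortest chamfer distance."""
    return min(
        model_images,
        key=lambda name: chamfer_distance(model_images[name], distance_image),
    )
```

With a distance image that is 0 at the centre of a 3 × 3 grid, a vertical model line through the centre scores lower than a vertical line along the left edge, so the former would be selected as the estimated posture.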

以上のように推定された人物の姿勢が出力された後、最尤姿勢出力処理は終了される。 After the posture of the person estimated as described above is output, the maximum likelihood posture output process ends.

次に、図１３のフローチャートを参照して、図１２のステップＳ３３における細線化処理の詳細を説明する。 Next, details of the thinning process in step S33 of FIG. 12 will be described with reference to the flowchart of FIG. 13.

ステップＳ６１において、細線化部３３は、１ずつインクリメント（加算）される変数nを１とする。 In step S61, the thinning unit 33 sets the variable n incremented (added) by 1 to 1.

ステップＳ６２乃至ステップＳ６５において、細線化部３３は、細線化処理の第nのフェーズによる処理を行う。 In steps S62 to S65, the thinning unit 33 performs processing in the nth phase of the thinning process.

すなわち、ステップＳ６２において、細線化部３３は、第nのフェーズによる処理を行うときの走査順（例えば、第１のフェーズでは、右下方向の走査順）で、順次、処理対象であるシルエット画像内の白色画素を、注目画素とする。 That is, in step S62, the thinning unit 33 sequentially processes the silhouette images that are the processing target in the scanning order when performing the processing in the nth phase (for example, the scanning order in the lower right direction in the first phase). The white pixel inside is set as the target pixel.

ステップＳ６３において、細線化部３３は、注目画素が第nの条件を満たしているか否かを判定し、注目画素が第nの条件を満たしていると判定した場合、処理は、ステップＳ６４に進められ、注目画素を黒色画素に変換する。 In step S63, the thinning unit 33 determines whether or not the target pixel satisfies the nth condition, and if it is determined that the target pixel satisfies the nth condition, the process proceeds to step S64. The target pixel is converted into a black pixel.

また、ステップＳ６３において、細線化部３３は、注目画素が第nの条件を満たしていないと判定した場合、処理は、ステップＳ６４をスキップして、ステップＳ６５に進められる。 If the thinning unit 33 determines in step S63 that the pixel of interest does not satisfy the nth condition, the process skips step S64 and proceeds to step S65.

ステップＳ６５において、細線化部３３は、処理対象であるシルエット画像内の白色画素すべてを、注目画素としたか否かを判定し、まだ白色画素すべてを注目画素としていないと判定した場合、処理は、ステップＳ６２に戻り、まだ注目画素とされていない白色画素を、新たな注目画素として、以下、同様の処理が繰り返される。 In step S65, the thinning unit 33 determines whether all the white pixels in the silhouette image to be processed are the target pixels, and if it is determined that all the white pixels are not yet the target pixels, the process is performed. The process returns to step S62, and the same processing is repeated thereafter with the white pixel that has not yet been set as the target pixel as a new target pixel.

また、ステップＳ６５において、細線化部３３は、処理対象であるシルエット画像内の白色画素すべてを、注目画素としたと判定した場合、すなわち、第ｎのフェーズの処理が終了した場合、処理は、ステップＳ６６に進められる。 In step S65, the thinning unit 33 determines that all the white pixels in the silhouette image to be processed are the target pixels, that is, if the processing of the nth phase is completed, The process proceeds to step S66.

ステップＳ６６において、細線化部３３は、変数nが４であるか否かを判定し、変数nが４でないと判定した場合、処理は、ステップＳ６７に進められる。そして、細線化部３３は、変数nに１をインクリメントして新たな変数nとし、処理は、ステップＳ６２に戻り、以下、同様の処理が繰り返される。 In step S66, the thinning unit 33 determines whether the variable n is 4, and if it is determined that the variable n is not 4, the process proceeds to step S67. Then, the thinning unit 33 increments the variable n by 1 to make a new variable n, the process returns to step S62, and the same process is repeated thereafter.

なお、ステップＳ６６において、細線化部３３は、変数nが４であると判定した場合、すなわち、第１乃至第４のフェーズからなる処理が終了した場合、細線化処理は終了されて、処理は、図１２のステップＳ３３にリターンされる。 If the thinning unit 33 determines in step S66 that the variable n is 4, that is, if the processing consisting of the first to fourth phases has finished, the thinning process ends and the process returns to step S33 of FIG. 12.
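The control flow of steps S61 to S67 can be sketched as a loop over phases, each with its own scanning order and condition. In the Python sketch below the phase conditions are passed in as functions rather than hard-coded; the names `scan_positions` and `thin`, the encoding 1 = white and 0 = black, and the choice to modify the image in place while scanning (so that a conversion can affect pixels examined later in the same phase) are assumptions for illustration.

```python
WHITE, BLACK = 1, 0  # assumed pixel encoding


def scan_positions(rows, cols, top_to_bottom, left_to_right):
    """Yield (i, j) positions line by line in the requested scanning order."""
    row_range = range(rows) if top_to_bottom else range(rows - 1, -1, -1)
    for i in row_range:
        col_range = range(cols) if left_to_right else range(cols - 1, -1, -1)
        for j in col_range:
            yield i, j


def thin(image, phases):
    """Apply each phase in turn (n = 1, 2, ... as in steps S61, S66, S67).

    phases: list of (top_to_bottom, left_to_right, condition) tuples, where
    condition(image, i, j) plays the role of the n-th condition.
    """
    rows, cols = len(image), len(image[0])
    for top_to_bottom, left_to_right, condition in phases:
        for i, j in scan_positions(rows, cols, top_to_bottom, left_to_right):  # S62
            if image[i][j] == WHITE and condition(image, i, j):               # S63
                image[i][j] = BLACK                                           # S64
    return image
```

Under this sketch, the lower-left scanning order of the third phase corresponds to `scan_positions(rows, cols, True, False)` and the upper-right scanning order of the fourth phase to `scan_positions(rows, cols, False, True)`.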

次に、図１４のフローチャートを参照して、図１２のステップＳ３４における距離算出処理の詳細を説明する。 Next, details of the distance calculation process in step S34 of FIG. 12 will be described with reference to the flowchart of FIG. 14.

ステップＳ９１において、距離算出部３４は、式（１）を用いて、細線化部３３からの細線化画像の各画素f(i,j)を、対応する初期距離d'(i,j)に変換する初期化処理を行う。ステップＳ９２において、距離算出部３４は、１ずつインクリメントされる変数iおよびjを、０とする。 In step S91, the distance calculation unit 34 sets each pixel f (i, j) of the thinned image from the thinning unit 33 to the corresponding initial distance d ′ (i, j) using Expression (1). Perform initialization processing to convert. In step S92, the distance calculation unit 34 sets the variables i and j incremented by 1 to zero.

ステップＳ９３乃至ステップＳ９８において、距離算出部３４は、右下方向の走査順に、順次、初期距離に注目し、式（２）を用いて、注目している初期距離d'(i,j)を仮距離d"(i,j)に変換するラスタ走査処理を行う。 In step S93 to step S98, the distance calculation unit 34 pays attention to the initial distance sequentially in the order of scanning in the lower right direction, and uses the equation (2) to determine the initial distance d ′ (i, j) of interest. A raster scanning process for converting to the temporary distance d "(i, j) is performed.

すなわち、ステップＳ９３において、距離算出部３４は、d'(i,j)、d"(i-1,j)+1、およびd"(i,j-1)+1のうちの最小値を、注目している初期距離d'(i,j)に対応する仮距離d"(i,j)に決定する。そして、ステップＳ９４において、距離算出部３４は、変数jが細線化画像の水平方向の画素数-1を表す定数jmaxであるか否か（注目している初期距離d'(i,j)に対応する画素が細線化画像の右端に存在する画素であるか否か）を判定する。 That is, in step S93, the distance calculation unit 34 sets the temporary distance d"(i,j) corresponding to the initial distance d'(i,j) of interest to the minimum of d'(i,j), d"(i-1,j)+1, and d"(i,j-1)+1. Then, in step S94, the distance calculation unit 34 determines whether the variable j equals the constant jmax, which is the number of pixels of the thinned image in the horizontal direction minus 1 (that is, whether the pixel corresponding to the initial distance d'(i,j) of interest is at the right end of the thinned image).

ステップＳ９４において、距離算出部３４は、変数jが定数jmaxでないと判定した場合（注目している初期距離d'(i,j)に対応する画素が細線化画像の右端に存在する画素でないと判定した場合）、処理は、ステップＳ９５に進められる。そして、距離算出部３４は、変数jに１をインクリメントして新たな変数jとし、処理は、ステップＳ９３に戻り、以下、同様の処理が繰り返される。 In step S94, when the distance calculation unit 34 determines that the variable j is not a constant jmax (a pixel corresponding to the initial distance d ′ (i, j) of interest is not a pixel present at the right end of the thinned image). If so, the process proceeds to step S95. Then, the distance calculation unit 34 increments the variable j by 1 to make a new variable j, the process returns to step S93, and the same process is repeated thereafter.

また、ステップＳ９４において、距離算出部３４は、変数jが定数jmaxであると判定した場合（注目している初期距離d'(i,j)に対応する画素が細線化画像の右端に存在する画素であると判定した場合）、処理は、ステップＳ９６に進められる。 In step S94, when the distance calculation unit 34 determines that the variable j is a constant jmax (a pixel corresponding to the initial distance d ′ (i, j) of interest exists at the right end of the thinned image). If it is determined that the pixel is a pixel), the process proceeds to step S96.

そして、ステップＳ９６において、距離算出部３４は、変数iが細線化画像の垂直方向の画素数-1を表す定数imaxであるか否か（注目している初期距離d'(i,j)に対応する画素が細線化画像の下端に存在する画素であるか否か）を判定する。 In step S96, the distance calculation unit 34 determines whether or not the variable i is a constant imax indicating the number of pixels -1 in the vertical direction of the thinned image (the initial distance d ′ (i, j) of interest). It is determined whether or not the corresponding pixel is a pixel existing at the lower end of the thinned image.

ステップＳ９６において、距離算出部３４は、変数iが定数imaxでないと判定した場合（注目している初期距離d'(i,j)に対応する画素が細線化画像の下端に存在する画素でないと判定した場合）、処理は、ステップＳ９７に進められる。そして、距離算出部３４は変数jを０とし、処理は、ステップＳ９８に進められ、変数iに１をインクリメントして新たな変数iとし、処理は、ステップＳ９３に戻り、以下、同様の処理が繰り返される。 In step S96, the distance calculation unit 34 determines that the variable i is not a constant imax (the pixel corresponding to the initial distance d ′ (i, j) of interest is not a pixel existing at the lower end of the thinned image). If so, the process proceeds to step S97. The distance calculation unit 34 sets the variable j to 0, and the process proceeds to step S98. The variable i is incremented by 1 to obtain a new variable i. The process returns to step S93. Repeated.

また、ステップＳ９６において、距離算出部３４は、変数iが定数imaxであると判定した場合（注目している初期距離d'(i,j)に対応する画素が細線化画像の下端に存在する画素であると判定した場合）、処理は、ステップＳ９９に進められる。 In step S96, if the distance calculation unit 34 determines that the variable i is a constant imax (a pixel corresponding to the initial distance d ′ (i, j) of interest exists at the lower end of the thinned image). If it is determined that the pixel is a pixel), the process proceeds to step S99.

ステップＳ９９乃至ステップＳ１０４において、距離算出部３４は、左上方向の走査順に、順次、仮距離に注目し、式（３）を用いて、注目している仮距離d"(i,j)をユークリッド距離d(i,j)に変換する逆ラスタ走査処理を行う。 In steps S99 to S104, the distance calculation unit 34 pays attention to the temporary distance sequentially in the order of scanning in the upper left direction, and uses the equation (3) to calculate the temporary distance d ″ (i, j) of interest. A reverse raster scanning process for converting the distance to d (i, j) is performed.

すなわち、ステップＳ９９において、距離算出部３４は、d"(i,j)、d(i+1,j)+1、およびd(i,j+1)+1のうちの最小値を、注目している仮距離d"(i,j)に対応するユークリッド距離d(i,j)に決定する。そして、ステップＳ１００において、距離算出部３４は、変数jが０であるか否か（注目している仮距離d"(i,j)に対応する画素が細線化画像の左端に存在する画素であるか否か）を判定する。 That is, in step S99, the distance calculation unit 34 pays attention to the minimum value among d "(i, j), d (i + 1, j) +1, and d (i, j + 1) +1. The Euclidean distance d (i, j) corresponding to the provisional distance d "(i, j) is determined. In step S100, the distance calculation unit 34 determines whether or not the variable j is 0 (the pixel corresponding to the temporary distance d ″ (i, j) of interest is a pixel existing at the left end of the thinned image). Whether or not there is).

ステップＳ１００において、距離算出部３４は、変数jが０でないと判定した場合（注目している仮距離d"(i,j)に対応する画素が細線化画像の左端に存在する画素でないと判定した場合）、処理は、ステップＳ１０１に進められる。そして、距離算出部３４は、変数jに-１をインクリメントして新たな変数jとし、処理は、ステップＳ９９に戻り、以下、同様の処理が繰り返される。 If the distance calculation unit 34 determines in step S100 that the variable j is not 0 (that is, the pixel corresponding to the temporary distance d"(i,j) of interest is not at the left end of the thinned image), the process proceeds to step S101. The distance calculation unit 34 then adds -1 to the variable j (decrements it by 1) to obtain a new variable j, the process returns to step S99, and the same processing is repeated thereafter.

また、ステップＳ１００において、距離算出部３４は、変数jが０であると判定した場合（注目している仮距離d"(i,j)に対応する画素が細線化画像が左端に存在する画素であると判定した場合）、処理は、ステップＳ１０２に進められる。 If the distance calculation unit 34 determines in step S100 that the variable j is 0 (that is, the pixel corresponding to the temporary distance d"(i,j) of interest is at the left end of the thinned image), the process proceeds to step S102.

そして、ステップＳ１０２において、距離算出部３４は、変数iが０であるか否か（注目している仮距離d"(i,j)に対応する画素が細線化画像の上端に存在する画素であるか否か）を判定する。 In step S102, the distance calculation unit 34 determines whether or not the variable i is 0 (the pixel corresponding to the temporary distance d ″ (i, j) of interest is a pixel existing at the upper end of the thinned image). Whether or not there is).

ステップＳ１０２において、距離算出部３４は、変数iが０でないと判定した場合（注目している仮距離d"(i,j)に対応する画素が細線化画像の上端に存在する画素でないと判定した場合）、処理は、ステップＳ１０３に進められる。そして、距離算出部３４は、変数jを定数jmaxとし、処理は、ステップＳ１０４に進められ、変数iに-１をインクリメントして新たな変数iとし、処理は、ステップＳ９９に戻り、以下、同様の処理が繰り返される。 If the distance calculation unit 34 determines in step S102 that the variable i is not 0 (that is, the pixel corresponding to the temporary distance d"(i,j) of interest is not at the upper end of the thinned image), the process proceeds to step S103. The distance calculation unit 34 sets the variable j to the constant jmax, and the process proceeds to step S104, where -1 is added to the variable i (it is decremented by 1) to obtain a new variable i. The process then returns to step S99, and the same processing is repeated thereafter.

また、ステップＳ１０２において、距離算出部３４は、変数iが０であると判定した場合（注目している仮距離d"(i,j)に対応する画素が細線化画像の上端に存在する画素であると判定した場合）、距離算出処理は終了される。そして、処理は、図１２のステップＳ３４にリターンされる。 If the distance calculation unit 34 determines in step S102 that the variable i is 0 (that is, the pixel corresponding to the temporary distance d"(i,j) of interest is at the upper end of the thinned image), the distance calculation process ends, and the process returns to step S34 of FIG. 12.

以上のように、図１２の最尤姿勢出力処理では、人物の身体にマーカ等を付加させることなく、人物が撮像された撮像画像から、人物の姿勢を推定するようにしたので、人物に負担をかけることなく、人物の姿勢を推定することができる。 As described above, in the maximum likelihood posture output process of FIG. 12, the posture of the person is estimated from a captured image of the person without attaching markers or the like to the person's body, so the posture can be estimated without imposing a burden on the person.

また、図１２の最尤姿勢出力処理では、複数の姿勢モデル画像と、距離画像とに基づいて、線分が延びる方向、長さ、および線分どうしの接続関係等について、複数の姿勢モデル線毎に、姿勢モデル線とシルエット線とを直接に比較するようにしたので、例えば、画像モーメント等の特徴量から姿勢を推定する場合と比較して、より正確な推定を行うことができる。 Further, in the maximum likelihood posture output process of FIG. 12, each of the plurality of posture model lines is compared directly with the silhouette line, on the basis of the plurality of posture model images and the distance image, with respect to the directions in which the line segments extend, their lengths, the connection relationships between line segments, and so on. Compared with, for example, estimating the posture from a feature amount such as an image moment, this allows a more accurate estimation.

すなわち、画像モーメント等の特徴量から姿勢を推定するためには、例えばニューラルネットワーク等の写像により、特徴量と身体の構造とを対応付ける必要があるが、特徴量と身体の構造とを対応付ける段階で、特徴量と身体の構造との対応付けに曖昧さが介在してしまう。このため、例えば、画像モーメント等の特徴量から姿勢を推定する場合には、正確な姿勢を推定することができないことがあった。 That is, in order to estimate the posture from the feature quantity such as the image moment, it is necessary to associate the feature quantity with the body structure by mapping such as a neural network, but at the stage of associating the feature quantity with the body structure. In addition, ambiguity is involved in the correspondence between the feature quantity and the body structure. For this reason, for example, when the posture is estimated from the feature amount such as the image moment, the accurate posture may not be estimated.

しかしながら、図１２の最尤姿勢出力処理では、シルエット線と姿勢モデル線とを、直接に比較することから、特徴量と身体の構造との対応付けを行う必要がないため、より正確な推定を行うことが可能となる。 In the maximum likelihood posture output process of FIG. 12, however, the silhouette line and the posture model line are compared directly, so there is no need to associate a feature amount with the structure of the body, and a more accurate estimation becomes possible.

なお、図１２の最尤姿勢出力処理では、ステップＳ３４（距離算出処理）において、距離算出部３４は、細線化画像の各画素f(i,j)について、ユークリッド距離d(i,j)を算出することとしたが、例えば、ユークリッド距離d(i,j)に代えて、シティブロック距離（マンハッタン距離）またはチェスボード距離等を算出するようにしてもよい。 In the maximum likelihood posture output process of FIG. 12, in step S34 (distance calculation process), the distance calculation unit 34 calculates the Euclidean distance d (i, j) for each pixel f (i, j) of the thinned image. For example, instead of the Euclidean distance d (i, j), a city block distance (Manhattan distance) or a chess board distance may be calculated.

この場合、ステップＳ３５において、マッチング部３６は、複数の姿勢モデル画像と、距離算出部３４からの、シティブロック距離またはチェスボード距離等により構成される距離画像とに基づいて、チャンファ距離を算出する距離マッチング処理を行い、ステップＳ３６において、姿勢推定部３７は、距離マッチング処理により得られた複数のチャンファ距離のうち、最短のチャンファ距離に対応する姿勢モデル画像内の姿勢モデル線が表す姿勢を、カメラ３１で撮像された人物の姿勢であると推定して、その推定結果を出力する。 In this case, in step S35, the matching unit 36 performs a distance matching process that calculates the chamfer distance on the basis of the plurality of posture model images and the distance image from the distance calculation unit 34, which is composed of city block distances, chessboard distances, or the like. In step S36, the posture estimation unit 37 estimates that the posture represented by the posture model line in the posture model image corresponding to the shortest of the chamfer distances obtained by the distance matching process is the posture of the person imaged by the camera 31, and outputs the estimation result.

次に、図１５を参照して、ユークリッド距離、シティブロック距離、およびチェスボード距離について説明する。 Next, the Euclidean distance, the city block distance, and the chessboard distance will be described with reference to FIG. 15.

ユークリッド距離とは、上述したように、２つの画素間の直線距離を表している。したがって、例えば、図１５Ａにおいて、横×縦が３×３画素のうち、中心に存在する中心画素（画素値が１の画素）と、その中心画素とのユークリッド距離は0.0となり、中心画素と、中心画素の上下左右方向に隣接するいずれの４画素とのユークリッド距離は1.0となる。また、例えば、中心画素と、中心画素の斜め方向に隣接するいずれの４画素とのユークリッド距離は1.41(=√２)となる。 As described above, the Euclidean distance is the straight-line distance between two pixels. Therefore, for example, in FIG. 15A, within the 3 × 3 (horizontal × vertical) pixels, the Euclidean distance from the central pixel (the pixel whose pixel value is 1) to itself is 0.0, and the Euclidean distance from the central pixel to each of the four pixels adjacent to it above, below, left, and right is 1.0. The Euclidean distance from the central pixel to each of the four diagonally adjacent pixels is 1.41 (= √2).

シティブロック距離とは、２つの画素のうち、所定の画素から、他の画素までの最短のパスであって、所定の画素と他の画素との間に存在する画素を上下左右方向にたどったときのパス上に存在する画素の個数-1を表している。 The city block distance is the number of pixels, minus 1, on the shortest path from one of two pixels to the other, where the path is traced through the pixels lying between them by moving only up, down, left, or right.

したがって、例えば、図１５Ｂにおいて、横×縦が３×３画素のうち、中心に存在する中心画素と、その中心画素とのシティブロック距離は０となり、中心画素と、中心画素の上下左右方向に隣接するいずれの４画素とのシティブロック距離は１となる。また、例えば、中心画素と、中心画素の斜め方向に隣接するいずれの４画素とのシティブロック距離は２となる。 Therefore, for example, in FIG. 15B, among the 3 × 3 pixels in the horizontal and vertical directions, the city block distance between the central pixel present at the center and the central pixel is 0, and the central pixel and the central pixel are vertically and horizontally. The city block distance between any four adjacent pixels is 1. For example, the city block distance between the central pixel and any four pixels adjacent to the central pixel in the oblique direction is 2.

チェスボード距離は、２つの画素のうち、所定の画素から、他の画素までの最短のパスであって、所定の画素と他の画素との間に存在する画素を上下左右方向の他、斜め方向にたどったときのパス上に存在する画素の個数-1を表している。 The chessboard distance is the number of pixels, minus 1, on the shortest path from one of two pixels to the other, where the path is traced through the pixels lying between them by moving diagonally as well as up, down, left, or right.

したがって、例えば、図１５Ｃにおいて、横×縦が３×３画素のうち、中心に存在する中心画素と、その中心画素とのチェスボード距離は０となり、中心画素と、中心画素の上下左右方向に隣接するいずれの４画素とのチェスボード距離は１となる。また、例えば、中心画素と、中心画素の斜め方向に隣接するいずれの４画素とのチェスボード距離は１となる。 Therefore, for example, in FIG. 15C, among the 3 × 3 pixels in the horizontal × vertical direction, the chessboard distance between the central pixel present at the center and the central pixel is 0, and the central pixel and the central pixel in the vertical and horizontal directions. The chess board distance between any four adjacent pixels is 1. Further, for example, the chessboard distance between the central pixel and any four pixels adjacent to the central pixel in the oblique direction is 1.
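The three distances described with reference to FIG. 15 can be compared on the same 3 × 3 neighbourhood. In the Python sketch below, pixels are given as (row, column) pairs; the function names `euclidean`, `city_block`, `chessboard`, and `table` are hypothetical and chosen only for this illustration.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two pixels."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def city_block(p, q):
    """Steps needed when only up/down/left/right moves are allowed."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chessboard(p, q):
    """Steps needed when diagonal moves are allowed as well."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def table(metric):
    """Distances from the central pixel (1, 1) to every pixel of a 3x3 block."""
    centre = (1, 1)
    return [[metric(centre, (i, j)) for j in range(3)] for i in range(3)]
```

Here `table(city_block)` gives 2 at the four corners and 1 at the four edge-adjacent pixels, `table(chessboard)` gives 1 at all eight neighbours, and `euclidean((1, 1), (0, 0))` rounds to 1.41, in line with the values described for FIGS. 15A to 15C.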

図１６は、人物の姿勢を推定する情報処理装置１の他の構成例を示している。 FIG. 16 shows another configuration example of the information processing apparatus 1 that estimates the posture of a person.

なお、図中、図１の場合に対応する部分については同一の符号を付してあり、以下、その説明は、適宜省略する。 In the figure, portions corresponding to those in FIG. 1 are denoted by the same reference numerals, and description thereof will be omitted as appropriate.

すなわち、図１６の情報処理装置１は、距離算出部３４、姿勢記憶部３５、マッチング部３６、および姿勢推定部３７に代えて、特徴量生成部６１、姿勢記憶部６２、マッチング部６３、および姿勢推定部６４が設けられている他は、図１の場合と同様に構成されている。 That is, the information processing apparatus 1 in FIG. 16 replaces the distance calculation unit 34, the posture storage unit 35, the matching unit 36, and the posture estimation unit 37 with a feature amount generation unit 61, a posture storage unit 62, a matching unit 63, and The configuration is the same as in the case of FIG. 1 except that the posture estimation unit 64 is provided.

特徴量生成部６１には、細線化部３３から細線化画像が供給される。特徴量生成部６１は、細線化部３３からの細線化画像内のシルエット線から、シルエット線の特徴を表す複数のヒストグラムにより構成されるシェイプコンテキスト（Shape Context）特徴量（以下、シルエット線のシェイプコンテキスト特徴量という）を生成する特徴量生成処理を行い、その結果得られたシルエット線のシェイプコンテキスト特徴量を、マッチング部６３に供給する。 A thinned image is supplied from the thinning unit 33 to the feature quantity generation unit 61. The feature quantity generation unit 61 uses a shape context (Shape Context) feature quantity (hereinafter referred to as a silhouette line shape) composed of a plurality of histograms representing the features of silhouette lines from the silhouette lines in the thinned image from the thinning unit 33. A feature amount generation process for generating a context feature amount is performed, and the shape context feature amount of the silhouette line obtained as a result is supplied to the matching unit 63.

なお、シェイプコンテキスト特徴量を生成する生成方法の詳細は、図１７を参照して後述する。また、シェイプコンテキスト特徴量のより詳細な生成方法は、例えば、「"Matching with Shape Contexts"(IEEE Workshop on Contentbased Access of Image and Video Libraries, 2000)」に記載されている。 The details of the method for generating the shape context feature amount will be described later with reference to FIG. 17. A more detailed description of how shape context feature amounts are generated can be found in, for example, "Matching with Shape Contexts" (IEEE Workshop on Content-based Access of Image and Video Libraries, 2000).

The posture storage unit 62 stores in advance, for each of the plurality of posture model lines, a shape context feature amount composed of a plurality of histograms representing the features of that posture model line (hereinafter referred to as the shape context feature amount of the posture model line). The shape context feature amount of a posture model line is generated, for example, by the same method that the feature amount generation unit 61 uses to generate the shape context feature amount of the silhouette line.

The matching unit 63 performs feature amount matching processing: for each of the plurality of posture model lines, it compares the shape context feature amount of the silhouette line from the feature amount generation unit 61 with the shape context feature amount of the posture model line stored in the posture storage unit 62, and calculates a degree of coincidence representing how closely the two agree (the degree of coincidence of the shape context feature amounts). It then supplies the degrees of coincidence obtained for the plurality of posture model lines to the posture estimation unit 64.

The posture estimation unit 64 estimates that the posture represented by the posture model line with the highest degree of matching among the plurality of posture model lines is the posture of the person imaged by the camera 31, and outputs the estimation result. Concretely, for example, this is the posture model line corresponding to the smallest of the degrees of coincidence of the shape context feature amounts supplied from the matching unit 63 (the degree of coincidence is computed as a sum of differences, so a smaller value indicates a closer match).

Next, the method by which the feature amount generation unit 61 generates the shape context feature amount (more precisely, a given histogram constituting it) will be described with reference to FIG. 17.

The left side of FIG. 17 shows a silhouette line together with a plurality of roughly fan-shaped regions, which are bounded by a plurality of concentric circles centered on a given white pixel of the silhouette line and by line segments extending radially from that pixel.

As described above, the silhouette line may be a contour line representing the outline of the silhouette instead of the center line of the silhouette shown in FIG. 3. The left side of FIG. 17 shows such a contour line as the silhouette line.

The right side of FIG. 17 shows a histogram whose horizontal axis is the bin number identifying each of the plurality of regions and whose vertical axis is the sample count, that is, the number of white pixels of the silhouette line lying in the region with that bin number.

The feature amount generation unit 61 takes each white pixel of the silhouette line from the thinning unit 33 in turn as the pixel of interest, and generates a histogram from the plurality of regions formed around that pixel, as shown on the left side of FIG. 17.

That is, for example, as shown on the left side of FIG. 17, when region A contains five white pixels, the feature amount generation unit 61 sets the sample count for the bin number of region A to 5, and when region B contains seven white pixels, it sets the sample count for the bin number of region B to 7, thereby generating a histogram like the one shown on the right side of FIG. 17.

The feature amount generation unit 61 supplies the plurality of histograms obtained in this way, one for each white pixel of the silhouette line taken as the pixel of interest, to the matching unit 63 as the shape context feature amount of the silhouette line.
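The histogram generation described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the feature amount generation unit 61: the function name `shape_context`, the parameters `n_radial`, `n_angular`, and `r_max`, and the equally spaced rings are all hypothetical choices (the text does not specify the ring spacing or the number of regions).

```python
import math

def shape_context(points, n_radial=3, n_angular=8, r_max=None):
    """Return one histogram (a list of bin counts) per point.

    points: list of (x, y) coordinates of the white pixels of a line.
    Bins are the fan-shaped regions bounded by n_radial concentric rings
    and n_angular radial segments around the pixel of interest.
    """
    if r_max is None:
        # Assumption: the outermost circle reaches the farthest pair of points.
        r_max = max(
            (math.dist(p, q) for p in points for q in points), default=0.0
        ) or 1.0
    histograms = []
    for cx, cy in points:
        hist = [0] * (n_radial * n_angular)
        for x, y in points:
            if (x, y) == (cx, cy):
                continue  # the pixel of interest itself is not counted
            r = math.hypot(x - cx, y - cy)
            theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
            ring = min(int(r / r_max * n_radial), n_radial - 1)
            sector = min(int(theta / (2 * math.pi) * n_angular), n_angular - 1)
            hist[ring * n_angular + sector] += 1  # sample count of this bin
        histograms.append(hist)
    return histograms
```

Each returned list corresponds to one histogram of the kind shown on the right side of FIG. 17, with the list index playing the role of the bin number.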

The matching unit 63 performs feature amount matching processing that calculates, for each of the plurality of posture model lines, the degree of coincidence between the shape context feature amount of the silhouette line from the feature amount generation unit 61 and the shape context feature amount of the posture model line stored in the posture storage unit 62 (the degree of coincidence of the shape context feature amounts), and supplies the degrees of coincidence obtained for the plurality of posture model lines to the posture estimation unit 64.

That is, for example, the matching unit 63 pairs the histogram obtained for the x-th white pixel of the silhouette line, with the white pixels ordered in, for example, raster scan order, with the histogram obtained for the x-th pixel of the posture model line, with its pixels likewise ordered in raster scan order. It then calculates, as the degree of coincidence between the two histograms, the sum over all bin numbers of the absolute differences between the sample counts of corresponding bins.

Other measures can also be used to evaluate the degree of coincidence between histograms, for example the chi-square distance, the KL divergence, or the Bhattacharyya distance.
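For reference, the three alternative measures named above can be sketched as follows. The text gives no formulas, so the definitions below are the commonly used ones and are assumptions here, as are the function names and the small smoothing constant `eps` used to avoid division by zero and the logarithm of zero.

```python
import math

def _normalize(hist, eps=1e-9):
    """Turn bin counts into a smoothed probability distribution (assumption)."""
    total = sum(hist) + eps * len(hist)
    return [(v + eps) / total for v in hist]

def chi_square(h1, h2):
    """Chi-square distance between two histograms of raw counts."""
    return 0.5 * sum(
        (a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0
    )

def kl_divergence(h1, h2):
    """KL divergence of h1 from h2 after normalization (not symmetric)."""
    p, q = _normalize(h1), _normalize(h2)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def bhattacharyya(h1, h2):
    """Bhattacharyya distance after normalization."""
    p, q = _normalize(h1), _normalize(h2)
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(min(coefficient, 1.0))
```

With all three measures, as with the sum of absolute differences, a smaller value indicates more similar histograms.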

The matching unit 63 then calculates the sum of the histogram-to-histogram degrees of coincidence over all of the paired histograms as the degree of coincidence of the shape context feature amounts, and supplies it to the posture estimation unit 64.
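The matching calculation described above, in which the x-th histograms are paired and their absolute differences summed, can be sketched as follows. The function names are hypothetical, and the handling of lines with different numbers of pixels (pairing stops at the shorter list) is an assumption, since the text does not address that case.

```python
def histogram_sad(h1, h2):
    """Sum of absolute differences between the sample counts of corresponding bins."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def shape_context_cost(silhouette_hists, model_hists):
    """Degree of coincidence of two shape context feature amounts.

    The x-th histogram of the silhouette line is paired with the x-th
    histogram of the posture model line (both in raster scan order).
    Assumption: if the lists differ in length, surplus histograms are ignored.
    A smaller result means a closer match.
    """
    return sum(
        histogram_sad(s, m) for s, m in zip(silhouette_hists, model_hists)
    )
```

For example, with region A holding 5 and region B holding 7 sample points in one histogram and 5 and 6 in the other, `histogram_sad([5, 7], [5, 6])` contributes 1 to the total.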

次に、図１８を参照して、シェイプコンテキスト特徴量を構成する所定のヒストグラムが、シルエット線等の線の一部分の特徴を一意に表していることを説明する。 Next, with reference to FIG. 18, it will be described that a predetermined histogram constituting the shape context feature amount uniquely represents a feature of a part of a line such as a silhouette line.

図１８左上側および図１８右上側には、それぞれ、ローマ字の「Ａ」の輪郭をなぞるように、その輪郭を表す画素である輪郭画素が示されている。 In the upper left side of FIG. 18 and the upper right side of FIG. 18, contour pixels, which are pixels representing the contour, are shown so as to trace the contour of the Roman letter “A”.

Region 91 of the "A" on the upper left side of FIG. 18 and region 92 of the "A" on the upper right side of FIG. 18 both contain a line segment, made up of a plurality of contour pixels, that extends from the upper right toward the lower left. Regions 91 and 92 are therefore similar to each other.

この場合、図１８下側に示すように、領域９１から得られるヒストグラム９１aと、領域９２から得られるヒストグラム９２aどうしが類似していることがわかる。 In this case, as shown in the lower side of FIG. 18, it can be seen that the histogram 91 a obtained from the region 91 and the histogram 92 a obtained from the region 92 are similar to each other.

In contrast, region 93 of the "A" on the left side of FIG. 18 contains a line segment, made up of a plurality of contour pixels, that extends from left to right, so it is entirely different from regions 91 and 92.

この場合、図１８下側に示すように、領域９３から得られるヒストグラム９３aと、領域９１から得られるヒストグラム９１a（領域９２から得られるヒストグラム９２a）どうしが異なることがわかる。 In this case, as shown in the lower side of FIG. 18, it can be seen that the histogram 93a obtained from the region 93 and the histogram 91a obtained from the region 91 (histogram 92a obtained from the region 92) are different.

図１８に示すように、領域内に存在する図形（輪郭画素の配置）どうしが類似する場合には、領域から得られるヒストグラムどうしも類似し、領域内に存在する図形どうしが類似しない場合には、領域から得られるヒストグラムどうしも類似しない。したがって、領域から得られるヒストグラムは、領域内に存在する図形を一意に表現しているものである。 As shown in FIG. 18, when the figures (placement of outline pixels) existing in the area are similar, the histograms obtained from the areas are similar, and when the figures existing in the area are not similar The histograms obtained from the regions are not similar. Therefore, the histogram obtained from the region uniquely represents a graphic existing in the region.

次に、図１９のフローチャートを参照して、図１６の情報処理装置１が行う他の最尤姿勢出力処理を説明する。 Next, another maximum likelihood posture output process performed by the information processing apparatus 1 in FIG. 16 will be described with reference to the flowchart in FIG. 19.

In steps S131 to S133, the same processing as in steps S31 to S33 of FIG. 12 is performed.

In step S134, the feature amount generation unit 61 performs feature amount generation processing that generates, from the silhouette line in the thinned image from the thinning unit 33, a shape context feature amount composed of a plurality of histograms representing the features of the silhouette line (the shape context feature amount of the silhouette line), and supplies the resulting shape context feature amount of the silhouette line to the matching unit 63.

In step S135, the matching unit 63 performs feature amount matching processing: for each of the plurality of posture model lines, it compares the shape context feature amount of the silhouette line from the feature amount generation unit 61 with the shape context feature amount of the posture model line stored in the posture storage unit 62, and calculates a degree of coincidence representing how closely the two agree (the degree of coincidence of the shape context feature amounts). It then supplies the degrees of coincidence obtained for the plurality of posture model lines to the posture estimation unit 64.

In step S136, the posture estimation unit 64 estimates that the posture represented by the posture model line with the highest degree of matching among the plurality of posture model lines, that is, for example, the posture model line corresponding to the smallest of the degrees of coincidence of the shape context feature amounts from the matching unit 63, is the posture of the person imaged by the camera 31, and outputs the estimation result.

次に、図２０のフローチャートを参照して、図１９のステップＳ１３４における特徴量生成処理の詳細を説明する。 Next, details of the feature quantity generation processing in step S134 of FIG. 19 will be described with reference to the flowchart of FIG.

ステップＳ１６１において、特徴量生成部６１は、細線化部３３からの細線化画像内のシルエット線を構成する白色画素のうちの所定の画素に注目し、注目画素とする。 In step S 161, the feature amount generation unit 61 pays attention to a predetermined pixel among white pixels constituting a silhouette line in the thinned image from the thinning unit 33 and sets it as a target pixel.

ステップＳ１６２において、特徴量生成部６１は、注目画素に対応して、図１７左側に示したように、略扇形の複数の領域を設定する。 In step S162, the feature quantity generation unit 61 sets a plurality of substantially sector-shaped areas as shown on the left side of FIG. 17 corresponding to the target pixel.

ステップＳ１６３において、特徴量生成部６１は、ステップＳ１６２で設定した複数の領域毎に、領域に含まれる白色画素を検出することにより、図１７右側に示したように、ヒストグラムを生成する。 In step S163, the feature quantity generation unit 61 generates a histogram as shown on the right side of FIG. 17 by detecting white pixels included in each area for each of the plurality of areas set in step S162.

ステップＳ１６４において、特徴量生成部６１は、シルエット線を構成する白色画素すべてを、注目画素としたか否かを判定し、まだ白色画素すべてを注目画素としていないと判定した場合、処理は、ステップＳ１６１に戻り、まだ注目画素とされていない白色画素を、新たな注目画素として、処理は、ステップＳ１６２に進められ、以下、同様の処理が繰り返される。 In step S164, the feature quantity generation unit 61 determines whether or not all the white pixels constituting the silhouette line are the target pixels, and if it is determined that all the white pixels are not yet the target pixels, the process proceeds to step S164. Returning to S161, the process proceeds to step S162 using a white pixel that has not yet been set as the target pixel as a new target pixel, and the same processing is repeated thereafter.

If the feature amount generation unit 61 determines in step S164 that all white pixels of the silhouette line have been taken as the pixel of interest, the plurality of histograms obtained in step S163, one for each white pixel of the silhouette line, are taken as the shape context feature amount of the silhouette line, and the processing returns to step S134 of FIG. 19.

As described above, the other maximum likelihood posture output processing of FIG. 19 estimates the posture of a person from a captured image of that person without attaching markers or the like to the person's body, so the posture can be estimated without placing a burden on the person.

In the maximum likelihood posture output processing of FIG. 12, for example in step S36, the posture estimation unit 37 estimates that the posture represented by the posture model line in the posture model image corresponding to the shortest of the chamfer distances from the matching unit 36 is the posture of the person imaged by the camera 31, and outputs the estimation result. Alternatively, for example, when the shortest chamfer distance is less than a predetermined threshold, the posture represented by the corresponding posture model line may be output as the estimation result, and when the shortest chamfer distance is equal to or greater than the threshold, it may be determined that no posture can be estimated for the person, and an indication to that effect may be output.

この場合、所定の閾値未満である最短のチャンファ距離に対応する姿勢モデル画像内の姿勢モデル線が表す姿勢のみが、推定結果として出力されるため、最短のチャンファ距離に対応する姿勢モデル画像内の姿勢モデル線が表す姿勢を必ず出力する場合と比較して、人物の姿勢を推定する精度を向上させることが可能となる。なお、図１９の他の最尤姿勢出力処理のステップＳ１３６についても、同様である。 In this case, since only the posture represented by the posture model line in the posture model image corresponding to the shortest chamfer distance that is less than the predetermined threshold is output as an estimation result, the posture model image corresponding to the shortest chamfer distance in the posture model image Compared with the case where the posture represented by the posture model line is always output, the accuracy of estimating the posture of the person can be improved. The same applies to step S136 of other maximum likelihood posture output processing in FIG.
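The selection with a rejection threshold described above can be sketched as follows. The function name `estimate_posture`, the use of a dictionary keyed by posture label, and the use of `None` to signal that no posture is estimated are hypothetical choices made for illustration only.

```python
def estimate_posture(costs, threshold=None):
    """Pick the posture whose cost (chamfer distance or shape context
    degree of coincidence, smaller = closer match) is the smallest.

    costs: dict mapping a posture label to its cost.
    Returns the best label, or None when there are no candidates or when
    the smallest cost is equal to or greater than the threshold.
    """
    if not costs:
        return None
    best = min(costs, key=costs.get)
    if threshold is not None and costs[best] >= threshold:
        return None  # no posture is considered to match
    return best
```

Without a threshold the function always returns the closest posture model, which corresponds to step S36 or step S136 as originally described; with a threshold, poor matches are rejected instead of being reported.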

In the above description, the matching unit 63 calculates the degree of coincidence of the shape context feature amounts by taking, as the degree of coincidence between histograms, the sum of absolute differences between the sample counts of corresponding bin numbers in the histogram obtained for the x-th white pixel of the silhouette line and the histogram obtained for the x-th pixel of the posture model line, with the pixels of each line ordered in raster scan order. However, the calculation is not limited to this.

That is, for example, the matching unit 63 may calculate the degree of coincidence of the shape context feature amounts for every combination in which the white pixels of the silhouette line and the pixels of the posture model line are associated one-to-one. The matching unit 63 may then find the minimum of the degrees of coincidence calculated for all the combinations, using a cost minimization algorithm such as the Hungarian method, and adopt that minimum as the final degree of coincidence of the shape context feature amounts.

In this case, a more appropriate degree of coincidence of the shape context feature amounts can be calculated than when, for example, the white pixels of the silhouette line and the pixels of the posture model line are associated one-to-one in a fixed order such as raster scan order.
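The idea of taking the minimum over all one-to-one associations can be illustrated with the following sketch. Note that it is a brute-force search over permutations, not the Hungarian method itself: it returns the same minimum that a cost minimization algorithm such as the Hungarian method would find, but it is practical only for a handful of pixels. The function names and the requirement that both lines have the same number of histograms are assumptions.

```python
from itertools import permutations

def histogram_sad(h1, h2):
    """Sum of absolute differences between corresponding bins."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def min_assignment_cost(silhouette_hists, model_hists):
    """Minimum total cost over all one-to-one pairings of histograms.

    Brute force for illustration only (factorial time); a Hungarian-method
    solver yields the same value in polynomial time.
    """
    n = len(silhouette_hists)
    if n != len(model_hists):
        raise ValueError("assumption: both lines have the same number of pixels")
    cost = [[histogram_sad(s, m) for m in model_hists] for s in silhouette_hists]
    return min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )
```

When the two lines list the same histograms in a different order, pairing by position gives a large cost, while the minimum over all pairings is zero, which is the improvement the paragraph above refers to.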

図１の情報処理装置１のシルエット抽出部３２は、例えば、人物を含まない背景のみが撮像された背景画像と、カメラ３１からの撮像画像との差分をとる背景差分法を用いることにより、撮像画像内の人物を検出することとしたが、さらに、撮像画像内の人物等の動きをも考慮すれば、より正確に、撮像画像内の人物を検出することが可能である。 The silhouette extraction unit 32 of the information processing apparatus 1 in FIG. 1 uses, for example, a background difference method that obtains a difference between a background image in which only a background not including a person is captured and a captured image from the camera 31. Although the person in the image is detected, it is possible to more accurately detect the person in the captured image if the movement of the person in the captured image is also taken into consideration.
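The background difference method mentioned above can be sketched as follows for grayscale images held as nested lists. The function name, the threshold value of 30, and the image representation are hypothetical; the sketch does not include the motion cue that the paragraph mentions as a possible refinement.

```python
def extract_silhouette(captured, background, threshold=30):
    """Return a binary silhouette mask from two grayscale images.

    captured, background: lists of rows of pixel intensities (same size).
    A pixel is marked 1 (subject) when it differs from the background
    image by more than the threshold, otherwise 0 (background).
    """
    return [
        [1 if abs(c - b) > threshold else 0 for c, b in zip(c_row, b_row)]
        for c_row, b_row in zip(captured, background)
    ]
```

The threshold absorbs small intensity changes caused by noise or lighting so that only pixels that differ clearly from the background are treated as part of the silhouette.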

The method of detecting a person in the captured image is not limited to this. For example, the camera 31 may be provided with a laser range finder that irradiates an imaging target such as a person with a laser and measures the time until the reflected laser light is detected, thereby calculating an imaging target distance representing the distance from the camera 31 to the imaging target. The person can then be detected on the basis of the imaging target distance calculated by the laser range finder.

具体的には、例えば、シルエット抽出部３２は、レーザレンジファインダにより算出された撮像対象距離が、人物等の被写体であるのか背景であるのかを判定するための被写体判定閾値未満であるか否かを判定し、被写体判定閾値未満である撮像対象を被写体とし、被写体判定閾値以上である撮像対象を背景とすることにより、人物等の被写体を検出することが可能である。 Specifically, for example, the silhouette extraction unit 32 determines whether or not the imaging target distance calculated by the laser range finder is less than a subject determination threshold for determining whether the subject is a subject such as a person or the background. It is possible to detect a subject such as a person by using an imaging target that is less than the subject determination threshold as a subject and using an imaging target that is greater than or equal to the subject determination threshold as a background.

また、その他、例えば、撮像対象距離を算出するレーザレンジファインダに代えて、カメラ３１のほか、カメラ３１とは異なる他のカメラを設け、カメラ３１と他のカメラとの視差により、撮像対象距離を算出するステレオ処理により、撮像対象距離を算出し、算出した撮像対象距離に基づいて、人物を検出するようにしてもよい。 In addition, for example, in place of the laser range finder for calculating the imaging target distance, in addition to the camera 31, another camera different from the camera 31 is provided, and the imaging target distance is determined by parallax between the camera 31 and the other camera. The imaging target distance may be calculated by the calculated stereo process, and a person may be detected based on the calculated imaging target distance.
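The distance-based detection described above can be sketched as follows. The text does not give a formula for the stereo processing, so the relation used here (distance = focal length × baseline / disparity, valid for a rectified pinhole stereo pair) is an assumption, as are the function and parameter names.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Imaging target distance in metres from the parallax between two cameras.

    Assumes a rectified stereo pair; zero or negative disparity is treated
    as an infinitely distant target.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px

def segment_by_depth(depth_map, subject_threshold):
    """Mark pixels nearer than the subject determination threshold as subject (1),
    and pixels at or beyond it as background (0)."""
    return [
        [1 if d < subject_threshold else 0 for d in row] for row in depth_map
    ]
```

The second function applies equally to distances measured by a laser range finder, since it only depends on the imaging target distance and the subject determination threshold.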

Although the posture storage unit 35 has been described as storing the coordinate values of a plurality of feature points, it may instead store, for example, posture model images in which the posture model lines appear. In this case, the matching unit 36 can perform the distance matching processing directly on the posture model images stored in the posture storage unit 35. The distance matching processing can therefore be performed more quickly than when a posture model line is first generated from the coordinate values and the distance matching processing is then performed on a posture model image in which the generated posture model line appears.

最尤姿勢出力処理、および他の最尤姿勢出力処理では、被写体として、例えば人物を採用することとしたが、例えば、猿や、オラウータン等の動物、およびマネキン等の、一定の姿勢をとることが可能なものを、被写体として採用することが可能である。 In the maximum likelihood posture output processing and other maximum likelihood posture output processing, for example, a person is adopted as a subject, but a certain posture such as a monkey, an animal such as an orangutan, and a mannequin is taken. Anything that can be used can be used as a subject.

本発明を適用した情報処理装置としては、例えば、パーソナルコンピュータ等を採用することができる。 As the information processing apparatus to which the present invention is applied, for example, a personal computer can be employed.

The series of processes described above can be executed by dedicated hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a recording medium onto a computer built into dedicated hardware, or onto, for example, a general-purpose personal computer that can execute various functions when various programs are installed.

図２１は、プログラムを実行することにより上述した一連の処理を行うコンピュータの構成例を示している。 FIG. 21 shows a configuration example of a computer that performs the above-described series of processes by executing a program.

CPU(Central Processing Unit)２０１は、ROM(Read Only Memory)２０２、または記憶部２０８に記憶されているプログラムにしたがって各種の処理を実行する。RAM(Random Access Memory)２０３には、CPU２０１が実行するプログラムやデータ等が適宜記憶される。これらのCPU２０１、ROM２０２、およびRAM２０３は、バス２０４により相互に接続されている。 A CPU (Central Processing Unit) 201 executes various processes according to a program stored in a ROM (Read Only Memory) 202 or a storage unit 208. A RAM (Random Access Memory) 203 appropriately stores programs executed by the CPU 201, data, and the like. These CPU 201, ROM 202, and RAM 203 are connected to each other by a bus 204.

CPU２０１にはまた、バス２０４を介して入出力インタフェース２０５が接続されている。入出力インタフェース２０５には、キーボード、マウス、マイクロホン等よりなる入力部２０６、モニタ、スピーカ等よりなる出力部２０７が接続されている。CPU２０１は、入力部２０６から入力される指令に対応して各種の処理を実行する。そして、CPU２０１は、処理の結果を出力部２０７に出力する。 An input / output interface 205 is also connected to the CPU 201 via the bus 204. Connected to the input / output interface 205 are an input unit 206 composed of a keyboard, a mouse, a microphone, and the like, and an output unit 207 composed of a monitor, a speaker, and the like. The CPU 201 executes various processes in response to commands input from the input unit 206. Then, the CPU 201 outputs the processing result to the output unit 207.

入出力インタフェース２０５に接続されている記憶部２０８は、例えばハードディスクからなり、CPU２０１が実行するプログラムや各種のデータを記憶する。通信部２０９は、インターネットやローカルエリアネットワーク等のネットワークを介して外部の装置と通信する。 A storage unit 208 connected to the input / output interface 205 includes, for example, a hard disk, and stores programs executed by the CPU 201 and various data. The communication unit 209 communicates with an external device via a network such as the Internet or a local area network.

The drive 210 connected to the input/output interface 205 drives a removable medium 211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, when one is loaded, and acquires the programs, data, and the like recorded on it. The acquired programs and data are transferred to the storage unit 208 and stored there as necessary.

As shown in FIG. 21, the recording medium that stores the program to be installed on the computer and made executable by the computer is constituted by the removable medium 211, which is a package medium such as a magnetic disk (including a flexible disk), an optical disk (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)), a magneto-optical disk, or a semiconductor memory, or by the ROM 202 in which the program is stored temporarily or permanently, or by the hard disk constituting the storage unit 208. The program is stored on the recording medium, as necessary, via the communication unit 209, which is an interface such as a router or a modem, using a wired or wireless communication medium such as a local area network, the Internet, or digital satellite broadcasting.

In this specification, the steps describing the program stored on the recording medium include not only processing performed in time series in the described order, but also processing that is executed in parallel or individually and is not necessarily performed in time series.

また、本発明の実施の形態は、上述した実施の形態に限定されるものではなく、本発明の要旨を逸脱しない範囲において種々の変更が可能である。 The embodiments of the present invention are not limited to the above-described embodiments, and various modifications can be made without departing from the scope of the present invention.

FIG. 1 is a block diagram showing a configuration example of an information processing apparatus to which the present invention is applied.
FIG. 2 is a diagram showing an example of a silhouette image.
FIG. 3 is a diagram showing an example of a thinned image.
FIG. 4 is a diagram showing an example of a distance image obtained from a thinned image.
FIG. 5 is a diagram showing an example of a posture model image.
FIG. 6 is a first diagram explaining the conditions under which a pixel of interest is converted into a black pixel.
FIG. 7 is a second diagram explaining the conditions under which a pixel of interest is converted into a black pixel.
FIG. 8 is a third diagram explaining the conditions under which a pixel of interest is converted into a black pixel.
FIG. 9 is a fourth diagram explaining the conditions under which a pixel of interest is converted into a black pixel.
FIG. 10 is a diagram showing an example of an initial distance being converted into a temporary distance.
FIG. 11 is a diagram showing an example of a temporary distance being converted into a Euclidean distance.
FIG. 12 is a flowchart explaining the details of the maximum likelihood posture output processing.
FIG. 13 is a flowchart explaining the details of the thinning processing.
FIG. 14 is a flowchart explaining the details of the distance calculation processing.
FIG. 15 is a diagram explaining the Euclidean distance, the city block distance, and the chessboard distance.
FIG. 16 is a block diagram showing another configuration example of an information processing apparatus to which the present invention is applied.
FIG. 17 is a diagram explaining the method of generating a given histogram constituting the shape context feature amount.
FIG. 18 is a diagram explaining that a given histogram constituting the shape context feature amount uniquely represents the features of a part of a line.
FIG. 19 is a flowchart explaining the details of the other maximum likelihood posture output processing.
FIG. 20 is a flowchart explaining the details of the feature amount generation processing.
FIG. 21 is a block diagram showing a configuration example of a computer.

Explanation of symbols

1 information processing apparatus, 31 camera, 32 silhouette extraction unit, 33 thinning unit, 34 distance calculation unit, 35 posture storage unit, 36 matching unit, 37 posture estimation unit, 61 feature amount generation unit, 62 posture storage unit, 63 matching unit, 64 posture estimation unit

Claims

An information processing apparatus comprising:
extraction means for extracting, from a captured image obtained by imaging a subject, a silhouette representing a region where the subject appears;
thinning means for thinning the silhouette into a silhouette line that simplifies the shape of the silhouette;
calculation means for calculating, for each of a plurality of posture model lines that are prepared in advance and represent mutually different predetermined postures, a degree of coincidence between the posture model line and the silhouette line; and
estimation means for estimating, as the posture of the subject, the posture represented by the posture model line having the maximum degree of coincidence among the degrees of coincidence calculated for the plurality of posture model lines.

The information processing apparatus according to claim 1, wherein
the calculation means calculates, as the degree of coincidence, a distance between the posture model line and the silhouette line for each of the plurality of posture model lines, and
the estimation means estimates, as the posture of the subject, the posture represented by the posture model line corresponding to the shortest distance among the distances calculated for the plurality of posture model lines.

The information processing apparatus according to claim 2, wherein, when the shortest distance is less than a predetermined threshold, the estimation means estimates the posture represented by the posture model line corresponding to the shortest distance as the posture of the subject.
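
The distance-based matching and threshold test recited above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the names `distance_image` and `estimate_posture` are hypothetical, the lines are assumed to be boolean NumPy masks of equal shape, the per-model distance is assumed to be the mean distance-image value over the model line's pixels, and returning `None` when the threshold is not met is a choice made only for this example.

```python
import numpy as np

def distance_image(line_mask):
    """Euclidean distance from every pixel to the nearest silhouette-line pixel.

    Brute force for clarity; line_mask is a 2-D boolean array with at least
    one True pixel.
    """
    ys, xs = np.nonzero(line_mask)
    line_pts = np.stack([ys, xs], axis=1)                  # (N, 2)
    gy, gx = np.indices(line_mask.shape)
    grid = np.stack([gy, gx], axis=-1)[:, :, None, :]      # (H, W, 1, 2)
    diff = grid - line_pts[None, None, :, :]               # (H, W, N, 2)
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=-1)  # (H, W)

def estimate_posture(silhouette_line, model_lines, threshold):
    """Return the name of the closest posture model, or None if none is close enough.

    model_lines maps a posture name to a boolean mask (assumed non-empty) of
    the same shape as silhouette_line. The mean over the model's pixels is an
    assumed aggregation chosen for this sketch.
    """
    dist = distance_image(silhouette_line)
    scores = {name: float(dist[mask].mean()) for name, mask in model_lines.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] < threshold else None
```

For realistic image sizes the brute-force distance image would be slow; a dedicated distance transform such as `scipy.ndimage.distance_transform_edt` would be a natural substitute.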

The information processing apparatus according to claim 1, wherein the calculation means calculates, for each of the plurality of posture model lines, the degree of coincidence between a feature amount representing a feature of the posture model line and a feature amount representing a feature of the silhouette line.
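
As an illustration of feature-based matching, the sketch below computes a simplified shape-context-style feature (one log-polar histogram per line pixel) and compares two lines with a chi-square cost. Everything here is an assumption made for the example: the function names, the bin counts, the radial bin edges, the scale normalisation, and the nearest-histogram pairing are not taken from the specification. A smaller cost is treated as a higher degree of coincidence.

```python
import numpy as np

def shape_context(points, n_r=3, n_theta=8):
    """One log-polar histogram per point, describing where the other points lie.

    points: (N, 2) array of distinct (x, y) line-pixel coordinates, N >= 2.
    Returns an (N, n_r * n_theta) array of normalised histograms.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]        # diff[i, j] = pts[j] - pts[i]
    r = np.hypot(diff[..., 0], diff[..., 1])
    theta = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)
    off_diag = ~np.eye(n, dtype=bool)
    r = r / r[off_diag].mean()                      # scale normalisation
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.clip(np.searchsorted(r_edges, r) - 1, 0, n_r - 1)
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        # Count every other point in its (radius, angle) bin.
        idx = (r_bin[i] * n_theta + t_bin[i])[off_diag[i]]
        hists[i] = np.bincount(idx, minlength=n_r * n_theta)
    return hists / hists.sum(axis=1, keepdims=True)

def feature_distance(h_a, h_b):
    """Mean chi-square cost from each histogram in h_a to its best match in h_b."""
    a = h_a[:, None, :]
    b = h_b[None, :, :]
    chi2 = 0.5 * ((a - b) ** 2 / np.maximum(a + b, 1e-12)).sum(axis=-1)
    return float(chi2.min(axis=1).mean())
```

Under these assumptions, the posture model line with the smallest `feature_distance` to the silhouette line would be selected as the estimated posture.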

An information processing method for an information processing apparatus that estimates the posture of a subject from a captured image obtained by imaging the subject,
the information processing apparatus including:
extraction means;
thinning means;
calculation means; and
estimation means,
the information processing method including steps in which:
the extraction means extracts, from the captured image obtained by imaging the subject, a silhouette representing a region where the subject appears;
the thinning means thins the silhouette into a silhouette line that simplifies the shape of the silhouette;
the calculation means calculates, for each of a plurality of posture model lines that are prepared in advance and represent mutually different predetermined postures, a degree of coincidence between the posture model line and the silhouette line; and
the estimation means estimates, as the posture of the subject, the posture represented by the posture model line having the maximum degree of coincidence among the degrees of coincidence calculated for the plurality of posture model lines.

A program for causing a computer to function as:
extraction means for extracting, from a captured image obtained by imaging a subject, a silhouette representing a region where the subject appears;
thinning means for thinning the silhouette into a silhouette line that simplifies the shape of the silhouette;
calculation means for calculating, for each of a plurality of posture model lines that are prepared in advance and represent mutually different predetermined postures, a degree of coincidence between the posture model line and the silhouette line; and
estimation means for estimating, as the posture of the subject, the posture represented by the posture model line having the maximum degree of coincidence among the degrees of coincidence calculated for the plurality of posture model lines.