JP5718632B2

JP5718632B2 - Part recognition device, part recognition method, and part recognition program

Info

Publication number: JP5718632B2
Application number: JP2010286417A
Authority: JP
Inventors: 悠樹鍵谷; 拓雄森口; 央智牛島; 健宏馬渕
Original assignee: SOHGO SECURITY SERVICES CO.,LTD.
Current assignee: SOHGO SECURITY SERVICES CO.,LTD.
Priority date: 2010-12-22
Filing date: 2010-12-22
Publication date: 2015-05-13
Anticipated expiration: 2030-12-22
Also published as: JP2012133666A

Description

The present invention relates to a part recognition device, a part recognition method, and a part recognition program, and in particular to a part recognition device, a part recognition method, and a part recognition program for recognizing, with high accuracy, a body part of a person included in video or images obtained by photographing or the like.

Conventionally, cameras have been installed for crime prevention and similar reasons in monitored areas such as banks, department stores, and retail stores such as convenience stores. When a monitoring operator or the like uses a camera for surveillance, there are two main methods: first, visually checking real-time or recorded images; second, visually checking only when a person is present, as determined by a person detection technique or the like. Through this visual check, suspicious features or suspicious behavior of a person captured by the camera are found, and measures such as issuing a warning or reporting to the police are taken.

Many of the cameras installed throughout the country are, for example, analog cameras that use NTSC coaxial cable for image transmission. Most of them are monocular cameras, each monitoring one predetermined area.

In recent years, methods have appeared for detecting a person and parts such as the hands from camera images (see, for example, Patent Document 1 and Patent Document 2). The technique disclosed in Patent Document 1 uses a compound-eye camera, detects the face region, detects as an arm region a region that has a parallax close to that of the face and a similar skin-color value, and then recognizes motion with reference to the detected arm region. The technique of Patent Document 1 also uses two cameras to separate the person (user) standing in front of the camera from the background region, and recognizes the head and the hands. The technique disclosed in Patent Document 2 exploits, in hand recognition, the fact that the hand region is more pointed than the head region.

Patent Document 1: JP 2010-123019 A
Patent Document 2: JP 2009-211563 A

The technique of Patent Document 1 described above addresses the problem of frequent false detections caused by skin-color detection. However, missed or false detections may still occur when the colors of the head and the hand differ, for example: when the hand is covered with a glove or the like; when the hand cannot be detected by skin-color detection and the person is facing backward; when the hand cannot be detected by skin-color detection and the face is covered; when the hand cannot be detected by skin-color detection and the color of the clothing is close to the color of the face; when the clothing is close to skin color; or when the hand cannot be detected by skin-color detection and either the hand or the face is in shadow or under strong illumination.

In addition, the technique of Patent Document 1 requires two cameras or a compound-eye camera, which incurs installation and processing costs. With two cameras there are also installation constraints, such as having to mount the cameras side by side.

The technique of Patent Document 2 may force the subject to wear a marker, to face the front, or to perform actions such as holding out or waving a hand. Also, when the technique of Patent Document 2 uses detailed shape information (such as the shape of the fingers), false detection may result under conditions such as: the hand is clenched and round; something is worn on the head (a hat or the like); the person is far from the camera and the fingers are not clearly visible; or the recognizable area is limited, making behavior recognition that involves movement impossible.

When a monocular camera is used, a method that recognizes only the rough position of the arm rather than the position of the hand cannot be applied to behavior recognition. Such methods also require an ideal background, such as a white wall. Furthermore, even when something is mistakenly detected as a human body, they may still attempt to detect a hand from it.

Furthermore, when only an infrared camera is used as the monocular camera, the type of camera is restricted. Such methods also do not assume the case where the person is holding an object.

The present invention has been made in view of the above problems, and an object thereof is to provide a part recognition device, a part recognition method, and a part recognition program for recognizing, with high accuracy, a body part of a person included in video or images obtained by photographing or the like.

To solve the above problems, the present invention adopts means having the following features.

The present invention provides a part recognition device that recognizes a part of a person included in a video or image, comprising: human body region detection means for detecting a human body region of at least one person included in the video or image; and part recognition means for recognizing a predetermined part from the human body region obtained by the human body region detection means, wherein the part recognition means thins the human body region obtained by the human body region detection means, generates a matrix indicating the relationship between end points and branch points based on the thinned information, and detects the predetermined part by comparing the generated matrix with a plurality of matrices of persons registered in advance.

The present invention also provides a part recognition device that recognizes a part of a person included in a video or image, comprising: human body region detection means for detecting a human body region of at least one person included in the video or image; and part recognition means for recognizing a predetermined part from the human body region obtained by the human body region detection means, wherein the part recognition means thins the human body region obtained by the human body region detection means, performs edge processing on the image of the human body region, detects, with reference to the positions of the line segments of the thinned human body region, a portion where the edges of the edge-processed human body region form a predetermined shape, and takes the region in which the predetermined shape is detected as the predetermined part.

The present invention also provides a part recognition method for recognizing a part of a person included in a video or image, comprising: a human body region detection step of detecting a human body region of at least one person included in the video or image; and a part recognition step of recognizing a predetermined part from the human body region obtained in the human body region detection step, wherein the part recognition step thins the human body region obtained in the human body region detection step, generates a matrix indicating the relationship between end points and branch points based on the thinned information, and detects the predetermined part by comparing the generated matrix with a plurality of matrices of persons registered in advance.

The present invention also provides a part recognition method for recognizing a part of a person included in a video or image, comprising: a human body region detection step of detecting a human body region of at least one person included in the video or image; and a part recognition step of recognizing a predetermined part from the human body region obtained in the human body region detection step, wherein the part recognition step thins the human body region obtained in the human body region detection step, performs edge processing on the image of the human body region, detects, with reference to the positions of the line segments of the thinned human body region, a portion where the edges of the edge-processed human body region form a predetermined shape, and takes the region in which the predetermined shape is detected as the predetermined part.

The present invention also provides a part recognition program that causes a computer to function as the part recognition device according to any one of claims 1 to 8.

According to the present invention, a body part of a person included in video or images obtained by photographing or the like can be recognized with high accuracy.

A diagram showing an example of the functional configuration of the part recognition device in the present embodiment.
A diagram showing an example of a hardware configuration capable of realizing the part recognition processing in the present embodiment.
A flowchart showing an example of the general processing procedure for part recognition in the present embodiment.
A diagram for explaining the first example of part recognition in the present embodiment.
A diagram showing an example for explaining the circle detection processing.
A diagram for explaining the second example of part recognition in the present embodiment.
A diagram for explaining the third example of part recognition in the present embodiment.
A diagram for explaining the fourth example of part recognition in the present embodiment.
A diagram for explaining the matrix conversion processing in the present embodiment.
A diagram for explaining graph matching.
A diagram for explaining the method of generating a weighted thinned graph.
A diagram for explaining the model graph database.
A diagram for explaining how the case where the target person's hand is raised is handled.
A diagram for explaining the sixth example of part recognition in the present embodiment.
A diagram for explaining the seventh example of part recognition in the present embodiment.
A diagram for explaining a specific example of the circle detection technique using human body edges in the present embodiment.
A flowchart showing an example of the processing procedure of (c).
A diagram for explaining an example of hand detection.
A diagram showing an example of a screen generated by the present embodiment.
A diagram showing another example of a screen generated by the present embodiment.

<About the present invention>
The present invention recognizes, with high accuracy, the body parts and motions of a person from video (including still images) captured by an imaging means such as a camera, or from various videos acquired externally via a communication network such as the Internet. Specifically, the present invention recognizes a specific part (for example, a hand, arm, head, foot, or toe) of a person or the like included in an image by performing image recognition on video captured with, for example, a single existing monocular camera. The present invention also performs accurate behavior recognition by tracking the recognized specific part over time (in time series).

Preferred embodiments of the part recognition device, part recognition method, and part recognition program of the present invention are described below with reference to the drawings. In the following processing, hand detection is described as an example of human body part recognition, but the present invention is not limited to this; the part may be, for example, the head (face) or a foot.

<Part recognition device: functional configuration example>
FIG. 1 is a diagram showing an example of the functional configuration of the part recognition device in the present embodiment. The part recognition device 10 shown in FIG. 1 comprises input means 11, output means 12, storage means 13, human body region detection means 14, part recognition means 15, behavior recognition means 16, screen generation means 17, notification means 18, transmission/reception means 19, and control means 20.

The input means 11 accepts from a user or the like the various instructions needed to realize the present embodiment, such as human body region detection instructions, part recognition instructions, behavior recognition instructions, screen generation instructions, notification instructions, and transmission/reception instructions. The input means 11 consists of, for example, a keyboard, a pointing device such as a mouse, or a voice input device such as a microphone.

The output means 12 displays the instructions entered through the input means 11 and various information, such as the progress or results of processing executed by each component according to control data generated from those instructions, and outputs the corresponding audio. The output means 12 has a screen display function such as a display and an audio output function such as a speaker.

Furthermore, the output means 12 outputs to external devices the results produced by each function, the information displayed on screens generated by the screen generation means 17, and the like. That is, the output means 12 has print and output functions for external devices: for example, printing to a printer; generating a file and writing it to the storage means 13, to a storage device such as a preset database, or to a recording medium; switching sensors in the monitored area (the facility under guard) on or off and turning lights on or off; and outputting a control signal that causes a mobile terminal carried by a security guard to display related information based on the part recognition result (such as the location and nature of an abnormality). The output means 12 can output to one or more of these external devices simultaneously.

The input means 11 and the output means 12 described above may be integrated as a touch panel.

The storage means 13 can store the various information needed to realize the present embodiment described above, and is read from and written to as necessary. Specifically, the storage means 13 stores information such as: various feature data used for face authentication and for estimating gender, age group, and the like; human body region detection results from the human body region detection means 14; detection results for parts such as the hand from the part recognition means 15; behavior recognition results from the behavior recognition means 16; screen generation results from the screen generation means 17; notification results from the notification means 18; information transmitted and received by the transmission/reception means 19; information controlled by the control means 20; error information recorded when errors occur; log information; and programs for realizing the present invention. The storage means 13 also stores a graph matching database, described later, and behavior pattern information corresponding to the movement trajectories of human body regions and predetermined parts acquired in time series.

The human body region detection means 14 performs human body region detection on the images in various kinds of video and determines whether a person is included. The video includes, for example: real-time video captured by imaging means such as cameras installed near the cash registers of convenience stores and department stores or in predetermined monitored areas such as bank reception counters, or cameras mounted on patrolling surveillance robots; the vast amount of surveillance video stored after capture; and video stored on remote image servers and the like that are reached, through the transmission/reception means 19, over a communication network such as the Internet.

Specifically, the human body region detection means 14 acquires, for example, video captured by a camera or the like via the transmission/reception means 19, captures predetermined images (every frame, images spaced several frames apart, or the like) from the time-series images in the acquired video, and detects one or more persons in the captured images.

The human body region detection means 14 also, for example, compares consecutive image frames and detects as a human body region a location where color information (luminance, chromaticity, etc.) changes within a predetermined time and where either the region enclosed by that location is at least a predetermined size or its range of movement over time is within a predetermined range. The human body detection technique used in the present invention is not limited to this.
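The frame comparison described above can be illustrated with a minimal sketch. It is not the patented implementation: it assumes grayscale frames given as nested lists of luminance values, and the function names and the `threshold` and `min_area` values are hypothetical choices made for illustration. Only the "region of at least a predetermined size" condition is covered; the movement-range condition is omitted.

```python
def changed_mask(prev, curr, threshold=30):
    """Mark pixels whose luminance changed by more than `threshold` (assumed value)."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def largest_region_area(mask):
    """Return the pixel count of the largest 4-connected region of 1s in the mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Flood-fill one connected region and count its pixels.
                stack, area = [(y, x)], 0
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    area += 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                best = max(best, area)
    return best


def contains_person(prev, curr, threshold=30, min_area=4):
    """Treat a sufficiently large changed region as a candidate human body region."""
    return largest_region_area(changed_mask(prev, curr, threshold)) >= min_area
```

In this sketch a single changed pixel is rejected as noise, while a changed block of at least `min_area` pixels is accepted as a candidate region.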

The human body region detection means 14 also detects the center coordinates of the human body region and its size in the image, acquires the information needed to display the region on screen so that it is clearly identifiable by compositing the region onto the original image in a predetermined shape, and stores that information in the storage means 13. The shape of the human body region may be, for example, a rectangle, a circle, an ellipse, another polygon, or a binary silhouette obtained by enlarging the outline of the person at a predetermined magnification. That is, the human body region detection means 14 can generate, for example, a silhouette image in which the human body region is filled with white and everything else with black. The human body region detection means 14 may further have functions for extracting color information on the hair, upper garment, lower garment, and so on, and for calculating the position coordinates of the person in real space.
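As a minimal sketch of obtaining the center coordinates and the on-image size from a binary silhouette, the following assumes the silhouette is a nested list in which 1 marks the human body region (white) and 0 the background (black). The function name and the choice of a bounding rectangle as the display shape are assumptions for illustration only.

```python
def region_geometry(mask):
    """Return the bounding-rectangle center and size of the 1-valued region, or None."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None  # no human body region in this silhouette
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    return {
        "center": ((x0 + x1) / 2, (y0 + y1) / 2),  # center of the bounding rectangle
        "width": x1 - x0 + 1,                      # size in pixels on the image
        "height": y1 - y0 + 1,
    }
```

The returned rectangle could then be drawn over the original image so that the detected region is clearly visible on screen.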

The part recognition means 15 detects, for example, a hand region as the predetermined part of the person region detected by the human body region detection means 14. Specifically, the part recognition means 15 thins the person region detected by the human body region detection means 14. The part recognition means 15 then converts the thinned image into a matrix (graph) representing the connection relationships among points such as end points and branch points.
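The conversion from a thinned image to a matrix of end points and branch points can be sketched as follows. This is an illustrative assumption rather than the method of the embodiment: thinning is taken as already done, the skeleton is given as text rows in which `#` marks a skeleton pixel, 4-connectivity is used, an end point is a pixel with exactly one neighbor, and a branch point is a pixel with three or more neighbors. The function names are hypothetical.

```python
def neighbours(p, pixels):
    """Return the 4-connected skeleton neighbors of pixel p = (x, y)."""
    x, y = p
    return [q for q in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)) if q in pixels]


def skeleton_to_graph(rows):
    """Build an adjacency matrix over the end points and branch points of a skeleton."""
    pixels = {(x, y) for y, row in enumerate(rows) for x, ch in enumerate(row) if ch == "#"}
    degree = {p: len(neighbours(p, pixels)) for p in pixels}
    end_points = sorted(p for p, d in degree.items() if d == 1)
    branch_points = sorted(p for p, d in degree.items() if d >= 3)
    nodes = end_points + branch_points
    index = {p: i for i, p in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for start in nodes:
        for step in neighbours(start, pixels):
            # Follow the line segment until the next end point or branch point.
            prev, cur = start, step
            while cur not in index:
                nxt = [q for q in neighbours(cur, pixels) if q != prev][0]
                prev, cur = cur, nxt
            matrix[index[start]][index[cur]] = 1
            matrix[index[cur]][index[start]] = 1
    return nodes, end_points, branch_points, matrix


# A small stick figure: head, two hands, two feet, with branches at shoulder and hip.
FIGURE = [
    "..#..",
    "#####",
    "..#..",
    "..#..",
    ".###.",
    ".#.#.",
]
nodes, end_points, branch_points, matrix = skeleton_to_graph(FIGURE)
```

For this stick figure the sketch yields five end points (head, two hands, two feet) and two branch points (shoulder and hip), connected by six line segments.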

The part recognition means 15 also performs edge detection on the entire original image and extracts the edges of the human body region from the edge information detected for the entire image. Based on the converted graph, the part recognition means 15 can also perform matching (model graph matching) or the like using person model graphs (matrix models) in which postures of persons and the like are registered in advance. This makes it possible to determine whether the obtained region is a person, to obtain from the dictionary the one or more end points that correspond to the hand, and to recognize those end points as the hand.
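Comparison of a generated matrix with registered model matrices can be sketched in a simplified form. The sketch assumes that "matching" means the two adjacency matrices describe the same graph up to a renumbering of nodes, and it tests this by brute force over node permutations, which is practical only for the small graphs produced by thinning. The model dictionary layout and function names are hypothetical, and the sketch decides only whether the topology matches a registered model; deciding which end point is the hand would need further information, such as segment weights or positions, that is not modeled here.

```python
from itertools import permutations


def same_graph(matrix, model):
    """Return True if the two adjacency matrices are equal under some node renumbering."""
    n = len(matrix)
    if n != len(model):
        return False
    # Cheap pre-check: the sorted node degrees must agree.
    if sorted(map(sum, matrix)) != sorted(map(sum, model)):
        return False
    for perm in permutations(range(n)):
        if all(matrix[i][j] == model[perm[i]][perm[j]]
               for i in range(n) for j in range(n)):
            return True
    return False


def best_model(matrix, models):
    """Return the name of the first registered model that matches, or None."""
    for name, model in models.items():
        if same_graph(matrix, model):
            return name
    return None
```

A region whose matrix matches no registered model would, under this sketch, be treated as not being a person.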

When the recognition target is the hand, the part recognition means 15 may recognize one hand or both hands. It may also simultaneously recognize one or more of a plurality of preset parts other than the hand, such as the head and feet, by performing pattern matching or similar processing using preset features of each part, such as shape and color. In hand detection, both hands can be recognized by, for example, first recognizing the hand region of one hand of the target person in the image and then recognizing the other hand region in the same image based on the features of the recognized hand region.

Furthermore, as described later, the part recognition means 15 can discriminate between parts of the human body such as the head and the hand, and can estimate hand candidates before identifying the position of a predetermined part such as the hand. Specific examples of the part recognition means 15 are described later.

The behavior recognition means 16 acquires, in time series from the captured video, the human body region detected by the human body region detection means 14 and/or the predetermined part (one or more parts) recognized by the part recognition means 15, and recognizes the behavior of the target person from the continuous direction of movement, the speed of movement, the number of repetitions of a predetermined motion, and the like of the acquired human body region or part. The behavior recognition means 16 can recognize the person's behavior by, for example, collating the time-series information of the human body region and the predetermined part with preset behavior patterns.
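The time-series quantities mentioned above (direction of movement, speed of movement, and number of repetitions) can be sketched for a tracked part as follows. The track format, a list of `(t, x, y)` samples, and the function names are assumptions for illustration; counting horizontal direction reversals is one simple stand-in for "number of repetitions of a predetermined motion".

```python
import math


def motion_features(track):
    """Return (speed, direction in degrees) for each consecutive pair of samples."""
    features = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order samples
        dx, dy = x1 - x0, y1 - y0
        features.append((math.hypot(dx, dy) / dt, math.degrees(math.atan2(dy, dx))))
    return features


def count_horizontal_reversals(track):
    """Count how often the horizontal direction of movement flips (left/right waving)."""
    signs = [(x1 > x0) - (x1 < x0) for (_, x0, _), (_, x1, _) in zip(track, track[1:])]
    signs = [s for s in signs if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)
```

Features such as these could then be compared against preset behavior patterns, for example by thresholding speed and reversal count.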

Specifically, the behavior recognition means 16 recognizes, for example, whether the hand has been thrust out and how it is behaving, from the state of movement of the hand over time (such as the hand being moved violently from side to side), the position information of the hand, and the like. As preset behavior patterns, the behavior recognition means 16 can, for example, recognize a person as suspicious when the human body region goes back and forth many times near the cash register of a convenience store, or remains stopped near the register for a predetermined time or longer.
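The two register-related patterns above (repeated passes near the register, and stopping there for a predetermined time or longer) can be sketched as a simple rule. The rectangular zone, the track format `(t, x, y)` with time in seconds, and the thresholds `max_entries` and `max_stay` are all hypothetical values chosen for illustration.

```python
def in_zone(x, y, zone):
    """Return True if (x, y) lies inside the rectangle zone = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = zone
    return x0 <= x <= x1 and y0 <= y <= y1


def zone_statistics(track, zone):
    """Return (number of entries into the zone, longest continuous stay in seconds)."""
    entries, longest, entered_at = 0, 0.0, None
    for t, x, y in track:
        if in_zone(x, y, zone):
            if entered_at is None:
                entered_at = t  # the body region has just entered the zone
                entries += 1
            longest = max(longest, t - entered_at)
        else:
            entered_at = None
    return entries, longest


def is_suspicious(track, zone, max_entries=3, max_stay=60.0):
    """Flag repeated passes near the register or a long stop there (assumed thresholds)."""
    entries, longest = zone_statistics(track, zone)
    return entries >= max_entries or longest >= max_stay
```

A person who passes the register zone once and leaves is not flagged, whereas three separate entries or a stay of 60 seconds or more is flagged under these assumed thresholds.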

The behavior recognition means 16 can also extract a face region from the human body region and from position information such as the installation location and position of the installed camera, acquire facial feature points from the face region, and recognize behavior from the orientation of the face and the like. A person's face is detected by, for example, acquiring the feature values of the face from the position information of the eyes, nose, mouth, and so on of the face in the captured image and performing matching or similar processing against preset collation patterns of feature values for detecting a face. Face detection is not limited to this processing; for example, face detection by edge detection or shape pattern detection, or by hue extraction or skin-color extraction, can also be used.

Furthermore, the behavior recognition means 16 can detect the center coordinates (position information) of the face region and the size of the region in the image, acquire the information needed to display the face region on screen so that it is clearly identifiable by compositing the region onto the original image in a predetermined shape, and store that information in the storage means 13. In the present invention, the shape of the face region may be a rectangle, a circle, an ellipse, another polygon, a silhouette obtained by enlarging the outline of the person's face at a predetermined magnification, or the like.

The screen generation means 17 generates the various screens needed to realize the part recognition processing of the present embodiment, such as the video captured by the camera, the human body region detected by the human body region detection means 14, a menu screen for part recognition, an input screen for part recognition, human body region results, part recognition results, behavior recognition results, and notification results from the notification means 18. The screen generation means 17 can generate not only screens that display the results processed by the components described above but also screens that display various data preset in the storage means 13 and the like. For example, it can display numerical data (such as coordinates, time information, and person information) related to the position information corresponding to the region of a captured person.

The information that the screen generation means 17 needs for screen generation can be read as required from the information stored in advance in the storage means 13. The screen generation means 17 can also display the generated screens on a display or the like serving as the output means 12, and output audio through a speaker or the like.

通知手段１８は、挙動認識手段１６により得られる認識結果において、人に襲い掛かる動作であったり、殴る、蹴る等の動作であった場合には、例えば危険人物である旨等を示す緊急信号を生成し、生成された緊急信号をユーザや管理者、警備会社等におけるそのビルの担当警備員、監視員、代表責任者、監視ロボット等の所定の連絡先に通知する。また、通知手段１８は、その挙動を認識した画像に関する情報（撮影日時、撮影場所、その前の所定時間分の映像等）と、その特定物体の情報を画面生成手段１７により生成させて、出力手段１２により表示させる。 When the recognition result obtained by the behavior recognition means 16 indicates an action such as attacking a person, hitting, or kicking, the notification means 18 generates an emergency signal indicating, for example, that the person is dangerous, and sends the generated signal to predetermined contacts such as the user, an administrator, the security guard in charge of the building at a security company, a monitoring staff member, a representative manager, or a monitoring robot. The notification means 18 also causes the screen generation means 17 to generate information about the image in which the behavior was recognized (shooting date and time, shooting location, video for a predetermined period beforehand, and so on) together with information on the specific object, and displays it via the output means 12.

なお、通知手段１８は、例えば監視ロボット等に通知を行う場合には、その監視ロボットが対象者と対面しているか又は監視ロボットが備える撮像手段により対象者が撮影されるほど接近した位置にいるため、監視ロボットから対象者に対して音声メッセージを出力させたり、警報ランプや非常音等により周囲に対して注意を促すような処理を行わせるような監視ロボットに対する制御信号を通知することもできる。 When the notification goes to a monitoring robot or the like, the robot is either facing the target person or close enough for the target person to be captured by the robot's imaging means. The notification means 18 can therefore also send the monitoring robot a control signal that makes it output a voice message to the target person or alert the surroundings with a warning lamp, an emergency sound, or the like.

送受信手段１９は、ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）やインターネット等の通信ネットワーク等を介して１又は複数のカメラや監視ロボットが備える撮像手段からの監視映像を受信する。また、送受信手段１９は、例えば、上記の通信ネットワーク等を介して遠隔地にある画像サーバ等に蓄積された各種映像の中から必要な情報を選択して取得することもできる。 The transmission / reception means 19 receives a monitoring video from an imaging means included in one or a plurality of cameras or a monitoring robot via a communication network such as a LAN (Local Area Network) or the Internet. The transmission / reception means 19 can also select and acquire necessary information from various videos stored in an image server or the like at a remote location via the communication network described above.

ここで、送受信手段１９は、カメラから直接監視映像を受信し、リアルタイムに処理して事前に犯罪を予防することが好ましいが、例えば予めカメラで取得した映像をどこかに一時的に保存しておき、その保存された情報をまとめて上述した本実施形態における各種処理を行ってもよい。 Here, it is preferable that the transmission / reception means 19 receives the monitoring video directly from the camera and processes it in real time to prevent crime in advance. For example, the transmission / reception means 19 temporarily stores the video previously acquired by the camera somewhere. In addition, the stored information may be collected and the various processes in the present embodiment described above may be performed.

また、送受信手段１９は、装置内の蓄積手段１３に蓄積されている各種プログラムや各種データを他の端末に送信したり、他の端末から各種データを受信するための通信インタフェースとして用いることができる。 The transmission / reception means 19 can be used as a communication interface for transmitting various programs and various data stored in the storage means 13 in the apparatus to other terminals and receiving various data from other terminals. .

制御手段２０は、部位認識装置１０における各機能構成全体の制御を行う。具体的には、制御手段２０は、入力手段１１により入力されたユーザからの指示情報等に基づいて、上述した各機能構成における処理を実行させる等の各種制御を行う。 The control means 20 controls the entire functional configuration of the part recognition device 10. Specifically, the control unit 20 performs various controls such as executing processing in each functional configuration described above based on instruction information from the user input by the input unit 11.

なお、上述した実施形態では、人体領域検出手段１４及び部位認識手段１５における機能を部位認識装置１０に含めているが、本発明においてはこれに限定されるものではなく、人体領域検出手段１４及び手先検出手段（部位認識手段）１５としての機能を部位認識装置（図示せず）とし、部位認識装置１０とは別体に設けてもよい。 In the above-described embodiment, the functions of the human body region detecting unit 14 and the site recognizing unit 15 are included in the site recognizing device 10, but the present invention is not limited to this, and the human body region detecting unit 14 and The function as the hand detection means (part recognition means) 15 may be a part recognition apparatus (not shown) and may be provided separately from the part recognition apparatus 10.

＜部位認識装置：ハードウェア構成例＞ <Part recognition device: hardware configuration example>
ここで、上述した部位認識装置１０においては、各機能をコンピュータに実行させることができる実行プログラム（部位認識プログラム）を生成し、例えば汎用のパーソナルコンピュータ（ＰＣ）、サーバ等にその実行プログラムをインストールすることにより、本実施形態における部位認識を実現することができる。 Here, for the part recognition device 10 described above, an execution program (part recognition program) that causes a computer to execute each function is generated, and installing that program on, for example, a general-purpose personal computer (PC) or a server makes it possible to realize the part recognition of the present embodiment.

ここで、本実施形態における部位認識処理が実現可能なコンピュータのハードウェア構成例について図を用いて説明する。図２は、本実施形態における部位認識処理が実現可能なハードウェア構成の一例を示す図である。図２における部位認識装置１０のコンピュータ本体には、入力装置３１と、出力装置３２と、ドライブ装置３３と、補助記憶装置３４と、メモリ装置３５と、各種制御を行うＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）３６と、ネットワーク接続装置３７とを有するよう構成されており、これらはシステムバスＢで相互に接続されている。 Here, a hardware configuration example of a computer capable of realizing the part recognition process of the present embodiment will be described with reference to the drawings. FIG. 2 is a diagram illustrating an example of a hardware configuration capable of realizing the part recognition process of the present embodiment. The computer main body of the part recognition device 10 in FIG. 2 includes an input device 31, an output device 32, a drive device 33, an auxiliary storage device 34, a memory device 35, a CPU (Central Processing Unit) 36 that performs various controls, and a network connection device 37, which are connected to one another via a system bus B.

入力装置３１は、使用者が操作するキーボード及びマウス等のポインティングデバイスを有しており、使用者からのプログラムの実行等、各種操作信号を入力する。出力装置３２は、本発明における部位認識等を行うためのコンピュータ本体を操作するのに必要な各種ウィンドウやデータ等を表示するモニタを有し、ＣＰＵ３６に有する制御プログラムに基づいてプログラム実行結果等を表示することができる。 The input device 31 has a pointing device such as a keyboard and a mouse operated by the user, and inputs various operation signals such as execution of a program from the user. The output device 32 has a monitor for displaying various windows and data necessary for operating the computer main body for performing part recognition or the like in the present invention, and displays a program execution result or the like based on a control program in the CPU 36. Can be displayed.

ここで、本発明において、コンピュータ本体にインストールされる実行プログラムは、例えば、ＣＤ−ＲＯＭ等の記録媒体３８等により提供される。プログラムを記録した記録媒体３８はドライブ装置３３にセット可能であり、記録媒体３８に含まれる実行プログラムが、記録媒体３８からドライブ装置３３を介して補助記憶装置３４にインストールされる。 Here, in the present invention, the execution program installed in the computer main body is provided by, for example, the recording medium 38 such as a CD-ROM. The recording medium 38 on which the program is recorded can be set in the drive device 33, and the execution program included in the recording medium 38 is installed from the recording medium 38 to the auxiliary storage device 34 via the drive device 33.

補助記憶装置３４は、ハードディスク等のストレージ手段であり、本発明における実行プログラムや、コンピュータに設けられた制御プログラムの他に、ドライブ装置３３から読み取ることができる各種データを蓄積し、必要に応じて入出力を行うことができる。また、上述した部位認識で得られる各種データ等を格納することもできる。 The auxiliary storage device 34 is a storage means such as a hard disk, and accumulates various data that can be read from the drive device 33 in addition to the execution program in the present invention and the control program provided in the computer. I / O can be performed. In addition, various data obtained by the above-described part recognition can be stored.

メモリ装置３５は、ＣＰＵ３６により補助記憶装置３４から読み出された実行プログラム等を格納する。なお、メモリ装置３５は、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）やＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）等からなる。 The memory device 35 stores an execution program read from the auxiliary storage device 34 by the CPU 36. The memory device 35 includes a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.

ＣＰＵ３６は、ＯＳ（ＯｐｅｒａｔｉｎｇＳｙｓｔｅｍ）等の制御プログラム、メモリ装置３５により読み出され格納されている実行プログラムに基づいて、各種演算や各ハードウェア構成部とのデータの入出力等、コンピュータ全体の処理を制御する。 Based on a control program such as an OS (Operating System) and the execution program read into and stored in the memory device 35, the CPU 36 controls the processing of the entire computer, including various computations and data input/output with each hardware component.

また、ＣＰＵ３６は、本発明における実行プログラム及び制御プログラムにより、本実施形態における部位認識処理を実現することができる。なお、プログラムの実行中に必要な各種情報は、補助記憶装置３４から取得することができ、また格納することもできる。 Further, the CPU 36 can realize the part recognition process in the present embodiment by the execution program and the control program in the present invention. Various kinds of information necessary during the execution of the program can be acquired from the auxiliary storage device 34 and can also be stored.

ネットワーク接続装置３７は、通信ネットワーク等と接続することにより、実行プログラムを通信ネットワークに接続されている他の端末等から取得したり、部位認識プログラムを実行することで得られた各種情報若しくは当該プログラム自体を他の端末等に提供することができる。 By connecting to a communication network or the like, the network connection device 37 can obtain the execution program from another terminal connected to the network, and can provide other terminals with the various information obtained by executing the part recognition program or with the program itself.

なお、本発明における実行プログラムはＣＤ−ＲＯＭ等の持ち運び可能な記録媒体３８に格納することにより任意の端末で、そのＣＤ−ＲＯＭから実行プログラムを取得し実行することができる。 The execution program in the present invention can be acquired and executed from the CD-ROM by an arbitrary terminal by storing it in a portable recording medium 38 such as a CD-ROM.

なお、記録媒体３８は、上述したＣＤ−ＲＯＭの他、フレキシブルディスク、光磁気ディスク等のように情報を光学的、電気的或いは磁気的に記録する記録媒体、ＲＯＭ、フラッシュメモリ等のように情報を電気的に記録する半導体メモリ等、様々なタイプの記録媒体を用いることができる。 In addition to the CD-ROM described above, the recording medium 38 is a recording medium that records information optically, electrically, or magnetically, such as a flexible disk or a magneto-optical disk, or information such as a ROM or a flash memory. Various types of recording media, such as a semiconductor memory that electrically records data, can be used.

上述したようなハードウェア構成により、特別な装置構成を必要とせず、低コストで高精度に本発明における部位認識処理を行うことができる。また、プログラムをインストールすることにより、汎用のパーソナルコンピュータ等で本発明における部位認識処理を容易に実現することができる。 With the hardware configuration as described above, it is possible to perform the site recognition process in the present invention with high accuracy at low cost without requiring a special device configuration. Further, by installing the program, the part recognition process in the present invention can be easily realized by a general-purpose personal computer or the like.

＜部位認識処理例＞ <Example of part recognition processing>
次に、上述した部位認識装置１０や部位認識プログラムを用いた本実施形態における部位認識処理手順について説明する。 Next, a part recognition processing procedure in the present embodiment using the part recognition apparatus 10 and the part recognition program described above will be described.

図３は、本実施形態における部位認識の概略処理手順の一例を示すフローチャートである。図３において、まず所定の位置に取り付けられたカメラ等の撮像手段により撮影された映像中に含まれる所定の画像をキャプチャ（取得）し（Ｓ０１）、キャプチャした画像に含まれる人体領域を検出する（Ｓ０２）。次に、Ｓ０２の処理の結果として人体領域があるか否かを判断し（Ｓ０３）、人体領域がある場合（Ｓ０３において、ＹＥＳ）、その人体領域に対して上述した人体の所定の部位（手先等）を認識し（Ｓ０４）、その認識した部位を時系列的に追跡し、その結果から挙動認識を行う（Ｓ０５）。なお、挙動認識は、必要に応じて選択的に行うことができる。なお、Ｓ０４の処理において、画像中に複数人数が撮影されていれば、その人物毎の所定の部位が抽出される。 FIG. 3 is a flowchart illustrating an example of a schematic processing procedure for part recognition in the present embodiment. In FIG. 3, a predetermined image contained in video shot by an imaging means such as a camera mounted at a predetermined position is first captured (S01), and a human body region contained in the captured image is detected (S02). Next, it is determined whether the process of S02 found a human body region (S03). If a human body region exists (YES in S03), the predetermined part of the human body described above (a hand or the like) is recognized within that region (S04), the recognized part is tracked in time series, and behavior recognition is performed from the result (S05). Behavior recognition can be performed selectively as needed. In the process of S04, if several people appear in the image, the predetermined part is extracted for each person.

その後、Ｓ０２の処理における人体領域検出結果やＳ０４の処理における部位認識結果、Ｓ０５の処理における挙動認識結果等をディスプレイの出力手段等に表示する画面を生成し（Ｓ０６）、生成した画面を表示する（Ｓ０７）。なお、Ｓ０７の処理では、１つの画像からだけではなく、例えば時系列の映像から選択された複数の画像における人体領域検出や部位検出、挙動等の比較を行って、その比較画像を表示することもできる。 Thereafter, a screen is generated for showing, on an output means such as a display, the human body region detection result of S02, the part recognition result of S04, the behavior recognition result of S05, and so on (S06), and the generated screen is displayed (S07). In the process of S07, the display need not be based on a single image: for example, human body region detection, part detection, behavior, and the like can be compared across several images selected from time-series video, and the comparison image can be displayed.

また、Ｓ０５の処理における挙動認識において、不審者又は危険人物等であると判断された場合には、ユーザや管理センタ、警備員等に通知を行う（Ｓ０８）。 Further, in the behavior recognition in the process of S05, when it is determined that the person is a suspicious person or a dangerous person, a notification is sent to the user, the management center, the guard, etc. (S08).

ここで、Ｓ０３の処理において人体領域がない場合（Ｓ０３において、ＮＯ）、又はＳ０８の処理が終了後、部位認識処理を終了するか否かを判断し（Ｓ０９）、部位認識処理を終了しない場合（Ｓ０９において、ＮＯ）、Ｓ０１に戻り、次の対象画像をキャプチャして後続の処理を行う。また、ユーザ等からの終了指示等により部位認識処理を終了する場合（Ｓ０９において、ＹＥＳ）、処理を終了する。 Here, when there is no human body region in the process of S03 (NO in S03), or after the process of S08 is finished, it is determined whether or not the part recognition process is finished (S09), and the part recognition process is not finished (NO in S09), the process returns to S01 to capture the next target image and perform subsequent processing. In addition, when the part recognition process is ended by an end instruction from the user or the like (YES in S09), the process ends.
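The control flow of FIG. 3 can be sketched as a simple loop. This is only an illustrative outline under stated assumptions: the function name `run_part_recognition`, the callback parameters, and the string label `"dangerous"` are hypothetical and do not appear in the patent, and the detectors themselves are passed in as placeholders.

```python
from typing import Callable, Iterable, List, Optional


def run_part_recognition(
    frames: Iterable[object],
    detect_body: Callable[[object], Optional[object]],
    recognize_parts: Callable[[object], list],
    recognize_behavior: Callable[[list], str],
    notify: Callable[[str], None],
    should_stop: Callable[[], bool] = lambda: False,
    alert_behaviors: frozenset = frozenset({"dangerous"}),
) -> List[str]:
    """Hypothetical driver loop mirroring steps S01 to S09 of FIG. 3."""
    screens: List[str] = []
    for frame in frames:                          # S01: capture the next image
        body = detect_body(frame)                 # S02: detect the human body region
        if body is not None:                      # S03: is there a human body region?
            parts = recognize_parts(body)         # S04: recognize the predetermined part
            behavior = recognize_behavior(parts)  # S05: behavior recognition
            # Screen generation and display (through S07), reduced to a text line.
            screens.append(f"frame={frame} parts={parts} behavior={behavior}")
            if behavior in alert_behaviors:
                notify(behavior)                  # S08: notify user, guard, etc.
        if should_stop():                         # S09: end instruction received?
            break
    return screens
```

Frames without a human body region skip straight to the S09 check, matching the NO branch of S03.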

＜本実施形態における部位認識例＞ <Example of part recognition in this embodiment>
次に、上述した本実施形態における部位認識手段１５における部位認識例について説明する。なお、以下の処理では、便宜上、人体領域検出手段１４における処理内容も説明する。 Next, an example of part recognition by the part recognition means 15 of the embodiment described above will be explained. For convenience, the following description also covers the processing performed by the human body region detection means 14.

＜第１の実施例：円検出＞ <First embodiment: circle detection>
まず本実施形態における部位認識の第１の実施例について具体的に説明する。図４は、本実施形態における部位認識の第１の実施例を説明するための図である。図４に示す第１の実施例では、例えば、画像から検出した人体領域全体に「手先」らしい円を探索し、その円から手先を検出するものである。 First, a first example of part recognition in the present embodiment will be specifically described. FIG. 4 is a diagram for explaining the first example of part recognition in the present embodiment. In the first embodiment shown in FIG. 4, for example, the entire human body region detected from the image is searched for a circle that looks like a "hand", and the hand is detected from that circle.

つまり、円検出は、人体のエッジ検出後、人体の上部から順に、予め設定された画像サイズに対応する人体の大きさを基準に設定した円の探索を行い、手先の位置を認識する。 In other words, in the circle detection, after detecting the edge of the human body, a circle search based on the size of the human body corresponding to the preset image size is performed in order from the top of the human body to recognize the position of the hand.

図４に示すように、所定の位置に取り付けられたカメラ等の撮像手段により撮影された映像中に含まれる所定の画像をキャプチャ（取得）し、キャプチャした原画像（図４（ａ））に含まれる人体領域を検出する。具体的には、人体領域検出手段１４により人体領域をシルエットとして検出する（図４（ｂ））。なお、図４（ｂ）の例では、人体領域を白塗りとし、その他を黒塗りとしているが、本発明においてはこれに限定されるものではない。 As shown in FIG. 4, a predetermined image contained in video shot by an imaging means such as a camera mounted at a predetermined position is captured, and the human body region contained in the captured original image (FIG. 4(a)) is detected. Specifically, the human body region detection means 14 detects the human body region as a silhouette (FIG. 4(b)). In the example of FIG. 4(b), the human body region is painted white and everything else black, but the present invention is not limited to this.

また、第１の実施例では、画像に人体領域がある場合、上述した人体領域検出と平行して画像全体に対するエッジを検出する（図４（ｃ））。なお、このエッジ検出処理も上述した人体領域検出手段１４により検出することができる。 Further, in the first embodiment, when a human body region is included in an image, an edge for the entire image is detected in parallel with the above-described human body region detection (FIG. 4C). This edge detection process can also be detected by the human body region detection means 14 described above.

次に、エッジ化された画像の中から上述した人体領域に対応させて人体のエッジを検出する（図４（ｄ））。なお、本実施形態におけるエッジとは、画像中における隣接画素間の色差や輝度差等に基づき、物体の輪郭部分等を抽出する処理等を意味している。 Next, the edge of the human body is detected from the edged image corresponding to the above-described human body region (FIG. 4D). Note that the edge in the present embodiment means a process for extracting an outline portion or the like of an object based on a color difference or luminance difference between adjacent pixels in an image.
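As a rough illustration of the two steps above (edge extraction from luminance differences between adjacent pixels, then restriction to the human body region), the following minimal sketch works on plain nested lists. The function names, the 4-neighbor comparison, and the threshold value are assumptions made for this example; the patent does not specify the edge operator.

```python
def edge_map(gray, threshold=30):
    """Mark a pixel as an edge if it differs from any 4-neighbor by >= threshold."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    if abs(gray[y][x] - gray[ny][nx]) >= threshold:
                        edges[y][x] = 1
                        break
    return edges


def human_edges(edges, silhouette):
    """Keep only the edge pixels that lie inside the human silhouette mask."""
    return [[e & s for e, s in zip(edge_row, sil_row)]
            for edge_row, sil_row in zip(edges, silhouette)]
```

Because a pixel is compared with all four neighbors, both sides of a boundary are marked, so masking with the silhouette keeps the inner contour of the body.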

次に、第１の実施例では、所定の部位（手先）部分を抽出するための円検出を行う。具体的には、部位認識手段１５は、人体エッジ領域を対象にして、頭等の端部から人体領域のエッジが円形に近い箇所を走査して探していく。なお、走査方向は、胴体方向から手先に対して行う。そして、例えば、最初に検出された円形に近い部分を手先部分として検出する（図４（ｅ））。 Next, in the first embodiment, circle detection for extracting a predetermined part (hand) part is performed. Specifically, the part recognizing means 15 searches the human body edge region by scanning a portion where the edge of the human body region is close to a circle from the end of the head or the like. The scanning direction is performed from the body direction to the hand. Then, for example, a portion close to a circle detected first is detected as a hand portion (FIG. 4E).

＜円検出処理について＞ <About circle detection processing>
ここで、上述した円検出処理について具体的に説明する。図５は、円検出処理を説明するための一例を示す図である。図５の例では、上述した人体エッジ検出処理により検出された人体エッジを含む予め設定された注目領域内に円又は円に近い形状（楕円や正方形、正六角形等）からなる所定の形状があるか否かを判断して円検出を行う。 Here, the circle detection process described above will be specifically described. FIG. 5 is a diagram illustrating an example for explaining the circle detection process. In the example of FIG. 5, circle detection is performed by determining whether a preset region of interest containing the human body edge detected by the human body edge detection process described above includes a predetermined shape, namely a circle or a shape close to a circle (an ellipse, a square, a regular hexagon, or the like).

具体的には、図５に示すように、ある点からのエッジ方向の角度に対して、領域内の全てのエッジ方向を、ヒストグラムに加算する。このとき、もしエッジの形状が円形であれば、全ての点のエッジ方向が注目点に対して９０°となり、ほぼ円形であれば、８０〜１００度以内のヒストグラムの値が大きくなる。そのため、これらの結果を、注目点を細線化した線分上で移動させて計測していくことで、円形部分を検出することができる。 Specifically, as shown in FIG. 5, all edge directions in the region are added to the histogram with respect to the angle of the edge direction from a certain point. At this time, if the shape of the edge is circular, the edge direction of all points is 90 ° with respect to the point of interest, and if it is almost circular, the value of the histogram within 80 to 100 degrees becomes large. Therefore, a circular portion can be detected by measuring these results by moving the point of interest on a thin line segment.

なお、注目領域内に円がない場合には、各エッジと中心がなす角度のヒストグラム結果はバラバラとなる。 When there is no circle in the attention area, the histogram results of the angles formed by the edges and the centers are different.

ここで、本実施形態では、例えば、加算値の値が予め設定された閾値以上のときにその部分が円形であると推測することができる。なお、閾値は、画質や画像中における人体領域の大きさ等により適宜変更することができる。また、円検出の際には、上述したグラフ化情報に基づいて胴体部分から手のほうに向けて注目点を移動していき、最初に円を検出した部分の注目点を手先の位置とする。これは、２番目以降の円検出は、把持物体である可能性が高いからである。なお、上述の処理は、画像中に含まれる全ての手先候補に対して行われる。 Here, in this embodiment, for example, when the value of the added value is equal to or greater than a preset threshold value, it can be estimated that the portion is circular. Note that the threshold value can be changed as appropriate depending on the image quality, the size of the human body region in the image, and the like. When detecting a circle, the attention point is moved from the body part toward the hand based on the graphed information described above, and the attention point of the part where the circle is first detected is set as the hand position. . This is because the second and subsequent circle detections are likely to be gripped objects. Note that the above-described processing is performed on all hand candidates included in the image.
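The histogram test described above can be sketched as follows. This is a simplified reading, not the patented implementation: each edge point is assumed to carry a tangent (edge direction) vector, the histogram uses 10-degree bins, and the function names, the tuple layout `(x, y, tx, ty)`, and the threshold are hypothetical.

```python
import math


def angle_histogram(center, edge_points, bin_width=10):
    """Histogram of angles (degrees, 0 to 180) between each edge direction
    and the line from `center` to that edge point."""
    bins = [0] * (180 // bin_width + 1)  # the last bin holds exactly 180 degrees
    cx, cy = center
    for x, y, tx, ty in edge_points:
        rx, ry = x - cx, y - cy
        nr, nt = math.hypot(rx, ry), math.hypot(tx, ty)
        if nr == 0 or nt == 0:
            continue  # angle is undefined for a zero-length vector
        cos_a = max(-1.0, min(1.0, (rx * tx + ry * ty) / (nr * nt)))
        angle = math.degrees(math.acos(cos_a))
        bins[int(angle // bin_width)] += 1
    return bins


def circle_votes(bins, bin_width=10):
    """Sum of the histogram between 80 and 100 degrees."""
    return sum(bins[80 // bin_width:100 // bin_width])


def first_circle(path, edge_points, threshold):
    """Move the point of interest along `path` (torso toward hand) and return
    the first position whose 80-100 degree vote count reaches the threshold."""
    for center in path:
        if circle_votes(angle_histogram(center, edge_points)) >= threshold:
            return center
    return None
```

Returning the first hit reflects the text's reasoning that later circles along the arm are more likely to be a held object than the hand.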

上述したように、第１の実施例における円検出処理を行うことで、画像中の人体に対する所定の部位（例えば、手先等）の位置を高精度に取得することができる。なお、本実施形態において、手先は、握った状態でも開いた状態でも、対応する所定の形状を用いて容易に認識することができる。また、所定の形状は、認識対象の部位毎に設定されており、例えば足であれば足先の形状が設定され、頭であれば大きめの円形状が設定される。 As described above, performing the circle detection process of the first embodiment makes it possible to obtain, with high accuracy, the position of a predetermined part (for example, a hand) of the human body in the image. In the present embodiment, the hand can be easily recognized with the corresponding predetermined shape whether it is clenched or open. The predetermined shape is set for each part to be recognized: for example, the shape of a toe is set for a foot, and a larger circular shape is set for a head.

＜第２の実施例：細線化画像＞ <Second embodiment: thinned image>
次に、本実施形態における部位認識の第２の実施例について具体的に説明する。図６は、本実施形態における部位認識の第２の実施例を説明するための図である。図６に示す第２の実施例では、例えば、上述した第１の実施例における円検出を行わずに、細線化処理を行い、細線化された線分の端点のうち、所定の画素の位置から手先位置を判断するものである。 Next, a second example of part recognition in the present embodiment will be specifically described. FIG. 6 is a diagram for explaining the second example of part recognition in the present embodiment. In the second embodiment shown in FIG. 6, for example, the circle detection of the first embodiment is not performed; instead, a thinning process is applied, and the hand position is determined from the pixel position of a predetermined end point among the end points of the thinned line segments.

具体的には、図６に示すように、まず原画像（図６（ａ））から人体領域を抽出する（図６（ｂ））。ここまでは、上述した第１の実施例と同様の処理を行うため、ここでの具体的な説明は省略する。 Specifically, as shown in FIG. 6, first, a human body region is extracted from the original image (FIG. 6A) (FIG. 6B). Up to this point, the same processing as in the first embodiment described above is performed, and a specific description thereof is omitted here.

次に、図６（ｂ）で得られた人体領域に対して細線化を行う。ここで、細線化処理は、図６（ｂ）に示す人体領域のシルエットを圧縮して得られるものであり、具体的にはシルエットの外周形状を基準にし、中心点を結んで細線化を行うものである。 Next, thinning is performed on the human body region obtained in FIG. 6(b). The thinned result is obtained by compressing the silhouette of the human body region shown in FIG. 6(b); specifically, thinning is performed by connecting center points determined with reference to the outer contour of the silhouette.
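The patent does not name a specific thinning algorithm. As one possible stand-in, the sketch below uses the well-known Zhang-Suen thinning scheme, which peels boundary pixels in two alternating sub-iterations until a roughly one-pixel-wide centerline remains. The function name and the nested-list image format are assumptions for this example, and the one-pixel image border is left untouched.

```python
def zhang_suen_thinning(image):
    """Thin a binary image (nested lists of 0/1) with the Zhang-Suen scheme.
    Returns a new image; the input is not modified."""
    img = [row[:] for row in image]
    h, w = len(img), len(img[0])

    def neighbors(y, x):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbors(y, x)
                    count = sum(p)  # number of foreground neighbors
                    # number of 0 -> 1 transitions around the neighborhood ring
                    trans = sum(1 for i in range(8)
                                if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, _, p4, _, p6, _, p8, _ = p
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= count <= 6 and trans == 1 and cond:
                        to_delete.append((y, x))
            # Delete simultaneously after scanning, as the scheme requires.
            for y, x in to_delete:
                img[y][x] = 0
            if to_delete:
                changed = True
    return img
```

On a horizontal bar three pixels thick, this leaves a short segment on the middle row, which is the kind of centerline the text describes.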

その後、細線化した情報に基づいて、その線分の形状や分岐点、端点、又は、その組み合わせ、他の分岐点や端点との相対位置関係等から画像中における手先部分を認識する（図６（ｃ））。 Thereafter, based on the thinned information, the hand portion in the image is recognized from the shape of the line segment, the branch point, the end point, or a combination thereof, the relative positional relationship with other branch points and end points, and the like (FIG. 6). (C)).

つまり、第２の実施例では、細線化した端点が、どの画素位置にあるかで手先位置を判断する。 That is, in the second embodiment, the hand position is determined according to which pixel position the thinned end point is located.
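To make the notion of end points and branch points on a thinned image concrete, here is a minimal sketch that classifies skeleton pixels by their crossing number (the number of 0-to-1 transitions around the 8-neighborhood). The crossing-number criterion and the function names are choices made for this illustration, not something stated in the patent; how an end point is then judged to be a hand from its pixel position is left out.

```python
# (dx, dy) offsets of the 8 neighbors, clockwise starting from the pixel above.
RING = [(0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1)]


def crossing_number(skeleton, x, y):
    """Count 0 -> 1 transitions while walking once around the 8-neighborhood."""
    h, w = len(skeleton), len(skeleton[0])
    ring = []
    for dx, dy in RING:
        nx, ny = x + dx, y + dy
        ring.append(skeleton[ny][nx] if 0 <= nx < w and 0 <= ny < h else 0)
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def classify_skeleton(skeleton):
    """Return (end_points, branch_points) as lists of (x, y) pixel positions."""
    end_points, branch_points = [], []
    for y, row in enumerate(skeleton):
        for x, value in enumerate(row):
            if not value:
                continue
            cn = crossing_number(skeleton, x, y)
            if cn == 1:
                end_points.append((x, y))      # the line terminates here
            elif cn >= 3:
                branch_points.append((x, y))   # three or more lines meet here
    return end_points, branch_points
```

For a T-shaped skeleton, the three tips come out as end points and the junction as the single branch point.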

＜第３の実施例：円検出＋細線化画像＞ <Third embodiment: circle detection + thinned image>
次に、本実施形態における部位認識の第３の実施例について具体的に説明する。図７は、本実施形態における部位認識の第３の実施例を説明するための図である。図７に示す第３の実施例では、上述した第１の実施例と第２の実施例とを組み合わせたものである。つまり、上述した第１の実施例における円検出を行う場合に、細線化された線上を中心候補として円検索を行うものである。 Next, a third example of part recognition in the present embodiment will be specifically described. FIG. 7 is a diagram for explaining the third example of part recognition in the present embodiment. The third embodiment shown in FIG. 7 combines the first and second embodiments described above. That is, when the circle detection of the first embodiment is performed, the circle search uses points on the thinned line as candidate centers.

具体的には、図７に示すように、上述した第１の実施例の図４（ａ）〜（ｄ）までは、図７（ａ）〜（ｄ）と同一の処理を行う。その後、図７（ｂ）に示す人体領域のシルエットを圧縮して細線化を行う（図７（ｅ））。 Specifically, as shown in FIG. 7, the processing in FIGS. 7(a) to 7(d) is the same as that in FIGS. 4(a) to 4(d) of the first embodiment described above. After that, the silhouette of the human body region shown in FIG. 7(b) is compressed and thinned (FIG. 7(e)).

その後、細線化した線分を円検出における円の中心部と併せて、線分の端部から移動させていくことで、人体領域のシルエット画像の外枠が円形の部分を抽出する。そして、検出された円形部分を手先部分として認識し、その中心位置を手先部分の位置座標として取得する（図７（ｆ））。 After that, the thinned line segment is moved from the end of the line segment together with the center of the circle in the circle detection, thereby extracting the circular portion of the outer frame of the silhouette image of the human body region. Then, the detected circular part is recognized as the hand part, and the center position is acquired as the position coordinate of the hand part (FIG. 7F).

第３の実施例に示すように、細線化した線上を円の中心候補として、円探索を行うことで、処理コストを削減し、効率的に円検出を行うことができる。 As shown in the third embodiment, by performing a circle search using the thinned line as a circle center candidate, the processing cost can be reduced and the circle can be detected efficiently.

＜第４の実施例：細線化グラフ＞ <Fourth embodiment: thinned graph>
次に、本実施形態における部位認識の第４の実施例について具体的に説明する。図８は、本実施形態における部位認識の第４の実施例を説明するための図である。図８に示す第４の実施例では、上述した第２の実施例における細線化画像を行列化し、予め設定された行列モデル（モデルグラフ）とマッチングを行い、一致した行列モデルに予め設定されている部位情報に基づいて、手先の位置を認識する。なお、以下の説明では、上述した行列モデルとのマッチング処理を、モデルグラフマッチングという。 Next, a fourth example of part recognition in the present embodiment will be specifically described. FIG. 8 is a diagram for explaining the fourth example of part recognition in the present embodiment. In the fourth embodiment shown in FIG. 8, the thinned image of the second embodiment is converted into a matrix and matched against preset matrix models (model graphs), and the position of the hand is recognized from the part information set in advance for the matching matrix model. In the following description, this matching against matrix models is referred to as model graph matching.

つまり、図８の例では、まず上述した第２の実施例等と同様に、キャプチャした原画像（図８（ａ））から人体領域を検出する（図８（ｂ））。その後、人体領域のシルエットに基づいて細線化処理を行い（図８（ｃ））、細線化された線分に基づいてモデルグラフマッチングを行い（図８（ｄ））、手先の検出を行う（図８（ｅ））。 That is, in the example of FIG. 8, the human body region is first detected from the captured original image (FIG. 8(a)) in the same manner as in the second embodiment described above (FIG. 8(b)). Thinning is then performed based on the silhouette of the human body region (FIG. 8(c)), model graph matching is performed based on the thinned line segments (FIG. 8(d)), and the hand is detected (FIG. 8(e)).

＜行列化処理について＞ <About matrix processing>
ここで、上述した細線化画像の行列化処理について、図を用いて具体的に説明する。図９は、本実施形態における行列化処理を説明するための図である。なお、本実施形態における行列化処理では、例えば、細線化した情報に対し、その線分中における分岐点と、端点とを設定し、設定された分岐点と端点とを行列により表記することにより、グラフ化を行っている。 Here, the matrix processing of the thinned image described above will be specifically described with reference to the drawings. FIG. 9 is a diagram for explaining matrix processing in the present embodiment. In the matrix processing of the present embodiment, for example, branch points and end points are set on the line segments of the thinned information, and graphing is performed by representing the set branch points and end points as a matrix.

具体的には、図９に示すように、行列の行を分岐点の番号とし、行列の列を分岐点と端点の番号とし、行列要素が０の場合には、「接続関係なし」とし、行列要素が１の場合には、「接続関係あり」として行列によるグラフ化を行う。 Specifically, as shown in FIG. 9, the row of the matrix is the branch point number, the matrix column is the branch point and end point number, and when the matrix element is 0, “no connection”, When the matrix element is 1, graphing is performed using a matrix as “connected”.

つまり、図９の例では、分岐点の０番は、分岐点の１番、端点の０，１，２番と接続し、分岐点の１番は、分岐点の０番、端点の３番、４番と接続していることを意味している。 That is, in the example of FIG. 9, the branch point No. 0 is connected to the branch point No. 1 and the end points 0, 1, and 2, and the branch point No. 1 is the branch point No. 0 and the end point No. 3 This means that it is connected to No.4.
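The connection matrix for the FIG. 9 example can be written out as a small sketch. The column order (branch-point columns first, then end-point columns) and the helper name `build_connection_matrix` are assumptions, since the figure itself is not reproduced here.

```python
def build_connection_matrix(n_branch, n_end, links):
    """Rows are branch points; columns are branch points followed by end points.
    `links` holds (branch_index, kind, index) with kind "B" (branch) or "E" (end).
    An element is 1 when the two points are connected, otherwise 0."""
    matrix = [[0] * (n_branch + n_end) for _ in range(n_branch)]
    for branch, kind, index in links:
        if kind == "B":
            matrix[branch][index] = 1
            matrix[index][branch] = 1  # branch-to-branch links are symmetric
        else:
            matrix[branch][n_branch + index] = 1
    return matrix


# FIG. 9 example: branch 0 connects to branch 1 and end points 0, 1, 2;
# branch 1 connects to branch 0 and end points 3, 4.
FIG9_LINKS = [(0, "B", 1), (0, "E", 0), (0, "E", 1), (0, "E", 2),
              (1, "E", 3), (1, "E", 4)]
fig9_matrix = build_connection_matrix(2, 5, FIG9_LINKS)
```

With this column order, row 0 is `[0, 1, 1, 1, 1, 0, 0]` and row 1 is `[1, 0, 0, 0, 0, 1, 1]`.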

なお、上述したようにして生成された行列は、予め設定された行列モデルとマッチングを行うが、その行列モデルには、予め各分岐点又は端点が、どのような部位であるかを示す部位情報、及びその位置情報が設定されている。したがって、モデルグラフマッチングにより、一致した行列モデルを抽出することで、手先の部位に相当する端点を容易且つ正確に取得することができる。 The matrix generated as described above is matched with a preset matrix model, and the matrix model includes part information indicating what part each branch point or end point is in advance. And its position information are set. Therefore, by extracting the matched matrix model by model graph matching, the end point corresponding to the hand part can be easily and accurately acquired.

なお、上述した図８（ｄ）では、便宜上、細線化されたモデルが表示されているが、実際には、図９に示すように、グラフ化された行列モデルとして蓄積手段１３等に蓄積されている。 In FIG. 8D described above, the thinned model is displayed for convenience, but actually, as shown in FIG. 9, it is stored in the storage means 13 as a graphed matrix model. ing.

＜モデルグラフマッチング処理について＞ <About model graph matching processing>
ここで、上述したモデルグラフマッチング処理について具体的に説明する。図１０は、グラフマッチングを説明するための図である。 Here, the model graph matching process described above will be specifically described. FIG. 10 is a diagram for explaining the graph matching.

図１０に示すように、予め設定された人物の所定の動作パターンに対応する複数のモデルグラフが蓄積されたモデルグラフデータベース（行列モデルデータベース）を用いて、グラフ化された情報に対してそれが人体領域であるか否かを正確に確認することができる。 As shown in FIG. 10, using a model graph database (matrix model database) in which a plurality of model graphs corresponding to a predetermined motion pattern of a person set in advance is stored, It is possible to accurately confirm whether or not it is a human body region.

本実施形態では、例えば、各モデルグラフに予め正確な手、頭、足等の部位情報を設定しておき、その部位情報に基づいて、マッチングしたグラフから、その人体の手先部位等を高精度に取得することができる。 In the present embodiment, for example, accurate part information such as hands, heads, feet, etc. is set in advance in each model graph, and based on the part information, the hand part of the human body etc. is accurately obtained from the matched graph. Can be obtained.

したがって、図１０に示すように、モデルグラフデータベースを用いることで、例えば画像から得られたグラフ化データが、人体ではない場合やノイズ等で複雑に分岐した場合等のときに一致しないようにすることで、より高精度に人体の検出やその手先の位置や向き等の情報を取得することができる。なお、この場合には手先の状態は限定されず、握っていても開いていてもよい。 Therefore, as shown in FIG. 10, by using a model graph database, for example, graphed data obtained from an image is not matched when it is not a human body or when it is complicatedly branched due to noise or the like. Thus, it is possible to acquire information such as the detection of the human body and the position and orientation of the hand with higher accuracy. In this case, the state of the hand is not limited, and it may be grasped or opened.

また、例えば、対象人物が大きな帽子を被っていたり、手を繋いでいたり、杖をついている等、人物そのものとは異なる場合であっても、対応するモデルグラフを予め設定しておくことで、適切に部位認識を行うことができる。 Also, even when the silhouette differs from that of the person alone, for example when the target person is wearing a large hat, holding hands with someone, or using a cane, parts can be recognized appropriately by setting corresponding model graphs in advance.

なお、上述したモデルグラフデータベースは、予め蓄積手段１３に蓄積されていてもよく、送受信手段１９を用いてインターネット等の通信ネットワークを介して、外部装置から取得してもよい。 The model graph database described above may be stored in the storage unit 13 in advance, or may be acquired from an external device via a communication network such as the Internet using the transmission / reception unit 19.

上述したマッチングを行うことにより、特定の姿勢（手を挙げている等）を認識でき、人体検出時に誤って検出された領域を、無駄な処理することなく除外することができる。 By performing the above-described matching, it is possible to recognize a specific posture (such as raising a hand), and it is possible to exclude an area that is erroneously detected at the time of human body detection without wasteful processing.

なお、図１０に示すモデルグラフは概念図であって、実際のモデルグラフのデータは、具体的には行列等に基づいて登録されている。 The model graph shown in FIG. 10 is a conceptual diagram, and the data of the actual model graph is specifically registered based on a matrix or the like.

モデルグラフを用いることで、無駄な情報を省いて迅速なマッチング処理を行うことができ、また高精度に部位の認識を行うことができる。 By using the model graph, it is possible to perform a quick matching process by omitting useless information and to recognize a part with high accuracy.
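A minimal sketch of model graph matching follows. It assumes the simplest possible criterion, exact equality between the observed matrix and a stored model matrix with consistent point numbering; a practical matcher would need to tolerate renumbering and noise, which the patent text does not detail. The database layout, the pose name, and the part labels are all hypothetical.

```python
def match_model(observed, model_database):
    """Return the first stored model whose matrix equals the observed one,
    or None when nothing matches (e.g. non-human shapes or noisy skeletons)."""
    for model in model_database:
        if model["matrix"] == observed:
            return model
    return None


def endpoints_for_part(model, part):
    """Indices of the end points that the matched model labels as `part`."""
    return [i for i, label in enumerate(model["endpoint_parts"]) if label == part]


# Hypothetical database with one model: two branch points, five end points.
MODEL_DATABASE = [
    {
        "pose": "standing",
        "matrix": [[0, 1, 1, 1, 1, 0, 0], [1, 0, 0, 0, 0, 1, 1]],
        "endpoint_parts": ["head", "hand", "hand", "foot", "foot"],
    },
]
```

A graph that matches no model is simply rejected, which corresponds to excluding falsely detected regions before any further processing.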

＜第５の実施例：重み付き細線化グラフ＞ <Fifth embodiment: weighted thinned graph>
次に、本実施形態における部位認識の第５の実施例について具体的に説明する。第５の実施例では、上述した第４の実施例における処理とほぼ同様の処理を行う（図８（ａ）〜（ｅ））。 Next, a fifth example of part recognition in this embodiment will be specifically described. In the fifth embodiment, substantially the same processing as that in the above-described fourth embodiment is performed (FIGS. 8A to 8E).

しかしながら、第５の実施例では、第４の実施例と比較すると、細線化画像を重み付きの行列にし、その行列に基づいてモデルグラフマッチングを行うものである。 However, in the fifth embodiment, compared with the fourth embodiment, the thinned image is made into a weighted matrix, and model graph matching is performed based on the matrix.

ここで、上述の内容について、図を用いて具体的に説明する。図１１は、重みを付与した細線化グラフの生成手法について説明するための図である。 Here, the above-mentioned content is concretely demonstrated using a figure. FIG. 11 is a diagram for explaining a method for generating a thinned graph with weights.

In the graphing process of the fifth example, weights are assigned to the values being graphed (the matrix elements).

Specifically, as shown in FIG. 11, a predetermined frame image is captured from video shot by a camera or the like, a human body region is detected from the captured image as described above, and the detected human body region is thinned. Then, based on the thinning information, graphing is performed using a matrix composed of branch points and end points, as described above.

At this time, the image region containing the human body region is further subdivided into a plurality of regions (zones) according to the camera installation position and the like, and the subdivided data is weighted when it is graphed (converted into a matrix).

In the example of FIG. 11, the image region corresponding to the human body region is subdivided into three zones based on its overall vertical length: for example, the top 1/5 is the head zone, the bottom portion is the foot zone, and the remaining region is the hand zone. The head zone is given a weight of 1, the hand zone a weight of 2, and the foot zone a weight of 3, and each weight is reflected in the graph. Furthermore, using different values for branch points and end points (for example, 9 for a branch point and 1 for an end point) makes the distinction between them clear.

By assigning weights preset for each zone and values that differ by point type in this way, one can easily tell which part an end point belongs to simply by referring to the graph, and the hand or other parts (for example, the head or feet) can be easily detected from it. In addition, by assigning higher weights to end points or branch points detected in more important zones, the end points and branch points can be managed according to zone importance. The model graphs that are set in advance and stored in the storage means 13 or the like are weighted in the same way.
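The zone weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the registered matrix format: the names `zone_weight`, `weighted_adjacency`, and `BRANCH_LINK` are hypothetical, the foot zone is taken as the bottom 1/5 of the body region, and each connection to an end point is assumed to carry that end point's zone weight while a connection between two branch points carries 9.

```python
# Assumed zone weights and branch-to-branch value, following the example values in the text.
HEAD_WEIGHT, HAND_WEIGHT, FOOT_WEIGHT = 1, 2, 3
BRANCH_LINK = 9


def zone_weight(y, top, height):
    """Return the zone weight for image row y inside a body region.

    top is the first row of the body region and height its vertical length.
    """
    rel = (y - top) / height  # 0.0 at the top of the region, 1.0 at the bottom
    if rel < 0.2:             # top 1/5: head zone
        return HEAD_WEIGHT
    if rel >= 0.8:            # bottom 1/5 (assumed): foot zone
        return FOOT_WEIGHT
    return HAND_WEIGHT        # remaining middle part: hand zone


def weighted_adjacency(nodes, edges, top, height):
    """Build a symmetric weighted adjacency matrix.

    nodes: list of (kind, x, y) with kind "branch" or "end".
    edges: list of (i, j) index pairs of connected nodes.
    """
    n = len(nodes)
    mat = [[0] * n for _ in range(n)]
    for i, j in edges:
        if nodes[i][0] == "branch" and nodes[j][0] == "branch":
            w = BRANCH_LINK
        else:
            # Use the zone of the end point on this connection.
            end = nodes[i] if nodes[i][0] == "end" else nodes[j]
            w = zone_weight(end[2], top, height)
        mat[i][j] = mat[j][i] = w
    return mat
```

With such a matrix, an entry of 1, 2, or 3 in a row identifies an end point in the head, hand, or foot zone respectively, which is the property the text relies on when reading parts directly from the graph.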

Therefore, according to the fifth example, matching the weighted matrix against the model graphs makes it possible to obtain the model graph identical to the matrix more reliably, and thereby to obtain the position of a predetermined part such as the hand with high accuracy.

<Part Information Corresponding to the Model Graph>
Here, a specific example of the part information corresponding to the model graph described above will be described with reference to the drawings. FIG. 12 is a diagram for explaining the model graph database. FIG. 12(a) shows an example of the data items of the model graph database, and FIG. 12(b) shows a specific example of its data.

The data items shown in FIG. 12(a) include, for example, "ModelID", identification information that identifies a model by its model graph number; "branchnum", the number of branch points in the model; "nodenum", the number of nodes in the model (branch points + end points); "graph", the model graph (matrix model), which may be a weighted adjacency matrix such as that shown in FIG. 11; and "handCount", the number of hands in the model. As shown in FIG. 12(b), the database stores data for these items for a plurality of models. The arrangement of this data, the conditions on its values, and the like are not particularly limited in the present invention.
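As an illustration of the data items listed above, one record of the model graph database might be represented as follows. This is a sketch only: the class name `ModelGraphRecord`, the consistency check, and the `candidates` pre-filter are hypothetical additions, and the text does not specify the storage format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelGraphRecord:
    model_id: int           # "ModelID": model graph number
    branchnum: int          # "branchnum": number of branch points
    nodenum: int            # "nodenum": branch points + end points
    graph: List[List[int]]  # "graph": adjacency matrix, possibly weighted
    hand_count: int         # "handCount": number of hands in the model

    def is_consistent(self) -> bool:
        """Check that graph is a nodenum x nodenum matrix and the counts agree."""
        n = self.nodenum
        square = len(self.graph) == n and all(len(row) == n for row in self.graph)
        return square and 0 <= self.branchnum <= n


def candidates(db, branchnum, nodenum):
    """Hypothetical pre-filter: keep models whose node counts equal the input graph's."""
    return [m for m in db if m.branchnum == branchnum and m.nodenum == nodenum]
```

Filtering on `branchnum` and `nodenum` before comparing matrices is one plausible way to use these items to narrow the search; whether the device does so is an assumption here.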

For example, in model graph matching for recognizing the hand, the "graph" item among the components of the model graph database represents hand, head, and foot information and is weighted to identify each part. As described above, one weighting method is to divide the obtained human body region into five equal parts, take the top 1/5 as the head zone, the bottom 1/5 as the foot zone, and the remaining middle portion as the hand zone, and assign weights 1 to 3 to these zones in order from the top. In addition, a value of 9 can be assigned, for example, to connections between branch points. In this embodiment, the weighting may also be omitted, as in the fourth example described above.

As noted above, when a weighting method such as dividing the human body region into five equal parts is used, a raised hand produces two head candidates, which makes the hand difficult to recognize. In this case, however, matching the weighted thinned graph against model graphs prepared in advance makes it possible to recognize the specific posture (such as a raised hand), and regions erroneously detected during human body detection can be excluded without wasted processing.

Here, the handling of a case where the target person's hand is raised will be described with reference to the drawings. FIG. 13 is a diagram for explaining this handling. FIG. 13(a) shows the thinned state when a hand is raised, and FIG. 13(b) shows an example of the items of an extended model graph database. Compared with the data items of FIG. 12(a), the item example of FIG. 13(b) adds "head", a parameter for the head position determination process when a hand is raised, and "handsUp", which indicates that the model has a raised hand.

That is, in the example of FIG. 13, whether a hand is raised is determined from "handsUp"; for example, when "handsUp" is 1 or more, processing is performed according to the value of "head" to separate the hand from the head. Specifically, when there are two end points in the head zone as shown in FIG. 13(a), their positions in the x direction (horizontal direction) are compared, and the one farther from the branch point is taken as the hand.

As one example, when one hand is raised (handsUp: 1), the x coordinate of the 0th branch point is compared with those of the head/hand candidates, and the candidate whose x coordinate is closer is taken as the head, thereby separating the hand from the head. "head" indicates which branch point to compare against. This handling is invoked when the matching process yields the parameter indicating a raised hand, at which point the head/hand discrimination process described above is performed. The head/hand discrimination process is performed by the part detection means 15 described above.
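The head/hand discrimination for a single raised hand can be sketched as below. The function name `split_head_and_hand` and the tuple layout are hypothetical; only the rule itself (the candidate whose x coordinate is closer to the referenced branch point is the head) comes from the description above.

```python
def split_head_and_hand(candidates, branch_points, head_index=0):
    """Separate two head-zone end points into (head, hand).

    candidates: two (x, y) end points found in the head zone.
    branch_points: list of (x, y) branch points of the thinned graph.
    head_index: which branch point to compare against (the "head" parameter).
    """
    bx = branch_points[head_index][0]
    # The candidate horizontally closer to the branch point is taken as the head.
    ordered = sorted(candidates, key=lambda p: abs(p[0] - bx))
    return ordered[0], ordered[1]
```

For instance, with a branch point at x = 50 and candidates at x = 52 and x = 80, the first is returned as the head and the second as the hand.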

As described above, performing the model graph matching process makes it possible to recognize a specific posture (such as a raised hand) and to exclude regions erroneously detected during human body detection without wasted processing.

<Sixth Example: Application of Circle Detection + Thinned Image>
Next, a sixth example of part recognition in this embodiment will be specifically described. FIG. 14 is a diagram for explaining the sixth example of part recognition in this embodiment. In the sixth example shown in FIG. 14, hand candidates are estimated when the circle detection and thinning of the third example described above are performed. The hand candidate estimation is performed by the part detection means 15 described above.

In the sixth example, as shown in FIG. 14, an arm region is estimated as the hand candidate (FIG. 14(f)). Specifically, the processing of FIGS. 14(a) to 14(e) is the same as in the third example (FIG. 7), but in the sixth example the arm region is additionally estimated from the thinned data. The arm region may be estimated by displaying the thinned line segments on a screen and taking a region designated by the user with the input means or the like as the arm region, or it may be estimated from the positional relationship between the end points and branch points of the line segments.

After that, in the sixth example, circle detection is performed only on the estimated hand candidate region to identify the hand (FIG. 14(g)). Because this reduces the search range for circle detection, part recognition can be performed more efficiently and quickly.

<Seventh Example: Application of Circle Detection + Thinned Graph (Including Weighted)>
Next, a seventh example of part recognition in this embodiment will be specifically described. FIG. 15 is a diagram for explaining the seventh example of part recognition in this embodiment. In the seventh example shown in FIG. 15, in addition to the processing of the sixth example, graph matching against the model graph described above is performed (FIG. 15(f)) when estimating hand candidates from the line segments produced by thinning (FIG. 15(e)).

Specifically, one or more positions or regions of the arm that serve as hand candidates are set in advance in the matrix model (model graph) described above. The thinned line segments are then converted into a matrix (graphed) and matched against the matrix model, so that the hand candidate region (FIG. 15(g)) can be obtained easily.

After that, in the seventh example, circle detection is performed only on the estimated hand candidate position or region obtained by model graph matching, and the hand is identified (FIG. 15(h)). This allows the candidates to be estimated more accurately and further reduces the search range for circle detection, so that part recognition can be performed more efficiently and quickly.

<Hand Circle Detection Using Human Body Edges and Graph Matching>
Here, the hand circle detection method using human body edges and graph matching, as shown for example in the seventh example above, will be specifically described with reference to the drawings. FIG. 16 is a diagram for explaining specific examples of the circle detection method using human body edges in this embodiment.

In the circle detection method using human body edges in this embodiment, the following examples (a) to (d) are conceivable.
(a) Edge detection → edge binarization → accumulation in a histogram and discrimination.
(b) Edge detection → edge binarization → edge thinning → accumulation in a histogram and discrimination.
(c) Edge detection → edge binarization → edge thinning → accumulation of each edge's occupancy of the circle in a histogram, followed by discrimination.
(d) Any of methods (a) to (c) above, applied after narrowing down the circle center candidates using the thinned image.
The specific processing of (a) to (c) is described below.

<Method (a)>
First, after the edges of the entire image are detected as described above, the edges are binarized by threshold processing or the like so that only valid edges remain. The edges of the relevant region are then accumulated into a histogram, and circle determination is performed. When adding to the histogram, 1 is added to the corresponding angle for each pixel that contains an edge. Alternatively, the edge value before binarization may be added to the histogram.
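The histogram accumulation of method (a) can be sketched as follows. This assumes, as an interpretation rather than something stated in this passage, that the "corresponding angle" is the direction of the edge pixel as seen from the center of the region of interest. The names `angle_histogram` and `filled_fraction`, the bin count, and the use of the fraction of non-empty bins as a circle criterion are all hypothetical.

```python
import math


def angle_histogram(edge_pixels, center, bins=36):
    """Add 1 to the angle bin of every edge pixel, measured around center.

    edge_pixels: iterable of (x, y) coordinates of binarized edge pixels.
    center: (cx, cy) of the region of interest.
    """
    cx, cy = center
    hist = [0] * bins
    width = 360.0 / bins
    for x, y in edge_pixels:
        if x == cx and y == cy:
            continue  # the angle is undefined at the center itself
        angle = math.degrees(math.atan2(y - cy, x - cx)) % 360.0
        hist[int(angle // width) % bins] += 1
    return hist


def filled_fraction(hist):
    """Fraction of angle bins that received at least one vote (assumed criterion)."""
    return sum(1 for h in hist if h > 0) / len(hist)
```

A closed contour around the center fills every bin, whereas a straight edge fills at most about half of them, so a threshold on `filled_fraction` is one simple way to perform the circle determination.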

At this stage, as shown in FIG. 16(a), the result of edge detection on the original image carries edge strength information, whereas after binarization each pixel carries only the information of whether or not it is an edge.

<Method (b)>
In method (a) described above, when the edge strength is high, for example, a circle may be drawn with a line two or more pixels wide, which increases the values added to the histogram and may cause many circle-like locations to be detected. In method (b), therefore, the edges are thinned to one-pixel-wide lines as shown in FIG. 16(b), which suppresses spurious detection of locations that are not circles. Here, Canny edge detection, a common technique, is used for edge binarization and thinning, but the present invention is not limited to this. In method (b) as well, when adding to the histogram, 1 is added to the corresponding angle for each pixel that contains an edge.

<Method (c)>
In method (b) described above, the larger the circle, the more pixels make up the circle and the higher the value added to the histogram. A small circle, on the other hand, yields a small total, so when circles are detected by applying a threshold to the number of votes, small circles may be difficult to detect. Detecting small circles requires lowering the threshold, and as a result a cluster of edges that is not a circle may be erroneously detected as a circle.

In method (c), therefore, the distance from the center of the region of interest to the pixel of interest is used. On the assumption that the pixel of interest lies on a perfect circle, the proportion of the circumference that the pixel occupies (hereinafter referred to as the "circle occupancy") is added to the histogram. This allows large and small circles to be handled uniformly, so that circles can be detected regardless of their size.

Here, method (c) will be described more concretely with reference to FIG. 16(c) and the like. First, let r be the distance from the center of the region of interest to the pixel of interest. If the pixel of interest is an edge, it is assumed to lie on a perfect circle centered on the center of the region of interest and to occupy a length of 1 (the size of one pixel) on the circumference. The circumference of a circle of radius r is 2πr, so the circle occupancy of the pixel of interest is 1/(2πr). The circle occupancy calculated in this way is added to the histogram, and the same processing is performed for every pixel in the region of interest. If the figure in the region of interest is an unbroken perfect circle centered on the center of the region, there are ideally 2πr edge pixels, and the histogram value can be regarded as 1 (that is, edges line 100% of the circumference without a break), as shown in FIG. 16(c). This makes it possible to discriminate circles with a threshold such as "treat as a circle if the circle occupancy is 50% or more".
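The arithmetic behind the circle occupancy can be written out as a short worked example. The symbols o(r) for the per-pixel occupancy and K for the number of edge pixels at distance r are introduced here only for illustration.

```latex
\begin{align}
o(r) &= \frac{1}{2\pi r} \\
\sum_{k=1}^{K} o(r) &= \frac{K}{2\pi r},
  \qquad K = 2\pi r \;\Rightarrow\; \text{total} = 1 \\
r = 10:\quad o(10) &= \frac{1}{20\pi} \approx 0.0159,
  \qquad K \approx 31 \;\Rightarrow\; \text{total} \approx 0.5
\end{align}
```

The last line corresponds to a half circle of radius 10, which sits right at the 50% threshold mentioned in the text, whatever the radius.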

Even when the figure in the region of interest is elliptical, this method indicates how much of the circumference is occupied by edges, which is a useful index when comparing against other regions. For an ellipse, edges close to the center (near the minor axis) have a larger occupancy (each vote weighs more), while edges far from the center (near the major axis) have a smaller occupancy (each vote weighs less). If the ellipse is unbroken, the circle occupancy is expected to be approximately 1.

However, because each edge pixel is assumed to occupy a length of 1 on the circumference wherever it lies on the circle, the circle occupancy can exceed 1 for an unbroken circle. For this reason, when a histogram based on circle occupancy is used, binarizing and thinning the edges makes the processing more effective.

Here, the processing procedure of method (c) will be described using a flowchart. FIG. 17 is a flowchart showing an example of the processing procedure of method (c). In FIG. 17, edges are first detected as described above (S11), the detected edges are binarized (S12), and the edges are thinned (S13).

Next, a loop over the regions of interest is performed. Within it, a loop over the pixels of the region of interest first determines whether the pixel of interest is an edge (S14). If it is an edge (YES in S14), the circle occupancy is calculated as described above (S15) and added to the histogram (S16). After S16 is completed, or if the pixel of interest is not an edge (NO in S14), the same processing is performed on the next pixel.

After the loop over the pixels is completed, whether the region of interest is a circle is determined from the resulting histogram (S17). If it is not a circle (NO in S17), another region of interest is set and the same processing is performed on it. If the region of interest is a circle (YES in S17), the processing ends.
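Steps S14 to S17 of this procedure can be sketched as follows. The sketch assumes the edge map has already been binarized and thinned (S11 to S13), treats each region of interest as a square window around a candidate center, and accumulates the occupancy into a single total. The names `circle_occupancy` and `find_first_circle`, the window parameter `half`, and the default threshold of 0.5 are hypothetical.

```python
import math


def circle_occupancy(edge_map, center, half):
    """Sum 1/(2*pi*r) over the edge pixels in a square window around center.

    edge_map: 2D list of 0/1 values after binarization and thinning.
    center: (cx, cy) candidate circle center; half: half-size of the window.
    """
    cx, cy = center
    rows, cols = len(edge_map), len(edge_map[0])
    total = 0.0
    for y in range(max(0, cy - half), min(rows, cy + half + 1)):
        for x in range(max(0, cx - half), min(cols, cx + half + 1)):
            if not edge_map[y][x]:  # S14: not an edge, go to the next pixel
                continue
            r = math.hypot(x - cx, y - cy)
            if r == 0:
                continue  # the center pixel has no defined circumference
            total += 1.0 / (2.0 * math.pi * r)  # S15 and S16
    return total


def find_first_circle(edge_map, centers, half, threshold=0.5):
    """Return the first candidate center judged to be a circle (S17), else None."""
    for center in centers:
        if circle_occupancy(edge_map, center, half) >= threshold:
            return center
    return None
```

Passing the arm points of the thinned image as `centers`, ordered from the torso toward the hand, would correspond to the loop order suggested in the description.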

For the loop over regions of interest described above, for example, points on the arm in the thinned image can be set in advance as circle center candidates, and the loop can run from the torso side toward the hand; however, the present invention is not limited to this.

In the processing described above, the procedure ends when a circle is first detected or when no circle is detected in any region. However, the present invention is not limited to this; for example, the position of each region of interest in which a circle was detected may be stored and the loop continued.

<Application Example of Hand Detection>
Next, an application example of hand detection in this embodiment will be described. With the hand detection method described above, both hands in an image can be detected by applying the same processing to each hand. However, this embodiment is not limited to this; for example, an as yet undetected hand can be detected based on information obtained from the hand that has already been detected. A specific example is described below. FIG. 18 is a diagram for explaining an example of hand detection.

In the example of FIG. 18, the two hands of the same person in a single image have the same or similar color information and shape features. Therefore, a template is generated from features such as the shape pattern and image color information of the detected hand region, matching is performed over the entire human body region based on the generated template, and the best-matching location is taken as the other hand.

More specifically, filter information is first created from the detected hand. This filter information includes hand information such as the size, shape, and position of the hand relative to the human body captured in the target image. In the example of FIG. 18, a circle generated from the hand size, shape, and the like estimated from the height of the captured human body is used as the filter. The content of the filter is the pixel values of the detected hand as they are.

Next, template matching is performed using this filter information. In FIG. 18, the filter is moved in order from the top of the obtained human body region toward the right, and when the degree of match exceeds a preset threshold, that location is taken as the hand.

The shape included in the filter information can be set freely: besides a circle, it may be a point, a rectangle, or any other shape (such as an ellipse or an oblong). As for the values inside the filter, besides the pixel values of the detected hand, a value defined as skin color may be spread over the entire filter region, the pixel value at the hand coordinates may be spread over the entire region, or the average pixel value of the detected hand region may be spread over the region.

Template matching can be performed on, for example, the entire image, the human body region, or the hand zone within the human body region (the range weighted as containing the hands), and the processing can follow a preset order and position.

Furthermore, in this example the similarity can also be computed by a formula. The formula generally used is R_SSD, as in Equation (1), and when the computed R_SSD is at or above a preset value, the location is judged to match. The "R" in R_SSD denotes similarity, and "SSD" stands for "Sum of Squared Difference", meaning that the distance is computed as the sum of squared differences.

Here, M in Equation (1) is the number of pixels across the width of the template image, and N is the number of pixels along its height.
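Equation (1) itself does not survive in this text. For reference, the commonly used form of the sum of squared differences, consistent with the definitions of M and N above, is shown below. The symbols I (the image region under the template) and T (the template), and the zero-based indexing, are assumptions made for illustration and may differ from the notation of the original equation.

```latex
R_{SSD} = \sum_{j=0}^{N-1} \sum_{i=0}^{M-1} \bigl( I(i,j) - T(i,j) \bigr)^{2}
```

In this form the sum is 0 for a perfect match and grows with the difference, so how the preset value is compared against it depends on how the similarity is derived from the sum, which the text does not spell out.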

The formula above does not limit the present invention. For example, the distance used in Equation (1) is called the Euclidean distance, which is the straight-line distance between pixels. Other notions of distance, such as the chessboard distance and the city-block distance, also exist and can be used in the present invention.
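The three distances named above can be compared with a short sketch. The function names are hypothetical; the definitions are the standard ones for equal-length coordinate or value vectors.

```python
import math


def euclidean(p, q):
    """Straight-line distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def chessboard(p, q):
    """Chessboard (Chebyshev) distance: the largest absolute difference."""
    return max(abs(a - b) for a, b in zip(p, q))


def city_block(p, q):
    """City-block (Manhattan) distance: the sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))
```

For the points (0, 0) and (3, 4), these give 5.0, 4, and 7 respectively.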

As an alternative to taking a location that exceeds the threshold as the hand, the location with the highest degree of match may be taken as the hand.
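The best-match variant can be sketched with a plain sum of squared differences over grayscale values. This is an illustration under assumptions: the names `ssd` and `best_match` are hypothetical, images are 2D lists of intensities, the scan covers the whole image rather than only the human body region, and the location with the smallest sum is treated as the best match.

```python
def ssd(image, template, left, top):
    """Sum of squared differences between template and the image patch at (left, top)."""
    total = 0
    for j, row in enumerate(template):      # j runs over the N template rows
        for i, t in enumerate(row):         # i runs over the M template columns
            d = image[top + j][left + i] - t
            total += d * d
    return total


def best_match(image, template):
    """Scan from the top row, moving right, and return (score, left, top) of the best fit."""
    n, m = len(template), len(template[0])
    best = None
    for top in range(len(image) - n + 1):
        for left in range(len(image[0]) - m + 1):
            score = ssd(image, template, left, top)
            if best is None or score < best[0]:
                best = (score, left, top)
    return best
```

A threshold-based variant, as in the scan described for FIG. 18, would instead stop at the first position whose score passes a preset value.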

As yet another example, the cuff position may be obtained based on a preset cuff shape such as a rectangle, and a filter that pairs the hand with the cuff position may be applied. Alternatively, when many hand candidates appear, the circle detection method described above may be used to take the location with the highest circularity (the most circle-like) as the hand.

<Examples of Screens Generated by the Screen Generation Means>
Next, examples of screens generated by the screen generation means 17 in this embodiment will be described with reference to the drawings. FIG. 19 is a diagram showing an example of a screen generated in this embodiment.

On the screen 40 shown in FIG. 19, an original image 41, a thinned image 42, a human body edge image 43, a part recognition result image 44, and the like can be displayed at once. Selecting any of these images enlarges it or displays the video in time series. When video is displayed, the original image 41, the thinned image 42, the human body edge image 43, and the part recognition result image 44 are preferably synchronized so that all show the same point in time, but at least one of the images may show a different point in time.

In this embodiment, predetermined text information (for example, hand coordinates, part information such as the head, and the capture time) can also be displayed on the image, as in the part recognition result image 44 shown in FIG. 19.

The content, layout, and the like of the screen generated by the screen generation means 17 are not limited to the above in the present invention. For example, the behavior result obtained by the behavior recognition means 16, the content notified by the notification means 18, and various error messages issued during processing can also be displayed. Furthermore, the screen generation means 17 can generate a screen on which part recognition results from different times are displayed side by side in sequence or combined into one image.

FIG. 20 is a diagram showing other examples of screens generated in this embodiment. For example, as shown in FIG. 20(a), when the person's behavior is recognized as walking, a screen is generated that displays predetermined frames from the movement side by side.

To recognize walking, the part recognition means 15 described above first recognizes both hands and both feet, and the behavior recognition means 16 then recognizes the behavior from the spacing between the hands and between the feet (for example, large, medium, small, or none) and from the distance moved by the human body region (movement of the x coordinate of the rectangle center, movement of the head coordinates, or the like). That is, in the example of FIG. 20(a), the spacing between the parts changes while walking and the change is periodic, so it can be determined that the person is swinging their arms and opening and closing their legs.

In the present embodiment, the screen generation means 17 can also display a composite image in which a plurality of frames are combined, as shown in FIG. 20(b). Even though the human body region is moving, the regions from different frames may overlap, depending on the speed of movement. The luminance of the human body region obtained by the human body region detection means 14 can therefore be changed so that the body is displayed semi-transparently, as shown in FIG. 20(b). When displaying the composite image, the screen generation means 17 can also generate a screen that overlays the movement of the head coordinates using dots, arrows, or the like, as shown in FIG. 20(b). As described above, the behavior recognition means 16 can also determine a behavior (such as moving) from the displacement of the head coordinates alone.
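One way to picture the semi-transparent composite is alpha blending restricted to each frame's human body mask. The text only states that the luminance of the body region is changed, so the blending rule, the `opacity` parameter, and the list-of-lists grayscale image format below are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0..255 values (assumed format)
Mask = List[List[int]]   # 1 inside the human body region, 0 elsewhere


def composite_frames(
    background: Image,
    frames: List[Image],
    masks: List[Mask],
    opacity: float = 0.5,  # hypothetical blending weight
) -> Image:
    """Overlay the body region of each frame onto the background.

    Inside each mask the frame is blended with what is already drawn, so
    bodies that overlap between frames remain visible through each other.
    """
    out = [row[:] for row in background]  # copy so the input is not modified
    for frame, mask in zip(frames, masks):
        for y, mask_row in enumerate(mask):
            for x, inside in enumerate(mask_row):
                if inside:
                    blended = (1 - opacity) * out[y][x] + opacity * frame[y][x]
                    out[y][x] = round(blended)
    return out
```

Restricting the masks to the moving part only (for example, the arm) gives the partial-region variant in which only that part is rendered semi-transparently.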

FIG. 20(c) shows consecutive frame images when a hand-raising action is recognized, and FIG. 20(d) shows an example in which the consecutive frames of FIG. 20(c) are superimposed in the same way as in FIG. 20(b). In this case only the arm moves, so the region whose luminance is changed can be limited to a partial region (for example, the arm only), as shown in FIG. 20(d).

FIG. 20(e) shows consecutive frame images when a punching action is recognized, and FIG. 20(f) shows an example in which the consecutive frames of FIG. 20(e) are superimposed in the same way as in FIG. 20(b). In this case only the arm moves, so the region whose luminance is changed can be limited to a partial region (for example, the arm only), as shown in FIG. 20(f).

In the present embodiment, as shown in FIGS. 20(a) to 20(f), a part recognized by the part recognition means 15 can be highlighted with a circle or the like, and the movement trajectory obtained by the behavior recognition means 16 can be displayed with an arrow or the like.

The screen generation means 17 can also enlarge the position at which a hand was recognized, which makes it easy to check information useful for crime prevention, such as a held object or the act of grasping an object. When real-time video from the camera is being displayed, the behavior recognition means 16 can use the functions of the capturing camera to display a zoomed-in image of the part.
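Enlarging the recognized hand position on an already captured image can be sketched as a clamped crop followed by nearest-neighbour upscaling. The function name, the window size `half`, the `scale` factor, and the list-of-lists image format are all assumptions made for illustration. The text does not specify how the enlargement is performed.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values (assumed format)


def zoom_on_point(image: Image, cx: int, cy: int, half: int = 1, scale: int = 2) -> Image:
    """Crop a window around (cx, cy) and enlarge it by pixel repetition.

    The window spans `half` pixels on each side of the point and is clamped
    to the image borders, so points near an edge yield a smaller crop.
    """
    height, width = len(image), len(image[0])
    x0, x1 = max(0, cx - half), min(width, cx + half + 1)
    y0, y1 = max(0, cy - half), min(height, cy + half + 1)
    crop = [row[x0:x1] for row in image[y0:y1]]
    # Repeat each pixel `scale` times horizontally and each row `scale` times vertically.
    return [
        [px for px in row for _ in range(scale)]
        for row in crop
        for _ in range(scale)
    ]
```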

As described above, the present invention makes it possible to recognize, with high accuracy, the parts of a person included in a video or image obtained by photographing or the like. Specifically, because the present invention does not use color information, it works even when the face or hands are covered. Because it is robust to changes in illumination, part recognition is possible not only when the face and hands differ in color but also in dark scenes, as long as the edges are visible. It likewise works under strong illumination, with monochrome images, and with poorly adjusted color balance.

Part recognition can also be performed when the person is not facing the camera. The present invention is not limited to analog cameras and applies widely, for example to network cameras, USB cameras, and infrared cameras. Furthermore, because the present invention requires only video or images acquired from a single camera, multiple cameras are unnecessary and the cost is low. Video and images from the analog cameras already installed in large numbers throughout the country can be used as they are.

The present invention also places no restriction on the image background and does not require the target person to perform a predetermined action or wear markers. Even when the person is holding an object, a predetermined part can be acquired easily by using circle detection or the like.

As an application example, recognizing a specific behavior makes it possible to notify monitoring staff or the police automatically and to reduce the workload of the monitoring staff. Specifically, the present invention can be used for processing such as treating an action in which the hand moves back and forth as "punching" and immediately notifying the police, or excluding from monitoring a person who merely passes through while swinging the arms.
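The two handling examples above amount to a mapping from a recognized behavior to a follow-up action. A minimal sketch of such a mapping is shown below. The behavior labels, the `Action` names, and the default of continuing to monitor unlisted behaviors are hypothetical and are not taken from the embodiment.

```python
from enum import Enum


class Action(Enum):
    NOTIFY_POLICE = "notify_police"
    EXCLUDE_FROM_MONITORING = "exclude_from_monitoring"
    KEEP_MONITORING = "keep_monitoring"


# Hypothetical behavior labels mapped to the two handling examples in the text.
RULES = {
    "punch": Action.NOTIFY_POLICE,
    "pass_by_swinging_arms": Action.EXCLUDE_FROM_MONITORING,
}


def decide_action(behavior_label: str) -> Action:
    """Look up the follow-up action for a recognized behavior.

    Assumption: any behavior without an explicit rule stays under monitoring.
    """
    return RULES.get(behavior_label, Action.KEEP_MONITORING)
```

Keeping the rules in a table rather than in branching code means further behaviors can be added without changing the lookup function.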

In addition, when the present invention is used to detect the feet, walking can also be detected using information on the foot positions. Because only predetermined actions need to be checked, the number of items to check decreases, and one member of the monitoring staff can watch multiple displays at the same time. Costs can therefore be reduced.

Furthermore, when security guards make their rounds, they confirm items such as locked doors and the positions of fire extinguishers by "pointing and calling", a confirmation made by pointing at the item. Recognizing whether pointing and calling was performed can be applied to attendance management for the guards.

The behavior recognition technique of the present invention can also serve as a user interface function, for example "turn off the lights when a hand is waved". Recognizing specific behaviors, such as a customer picking up a product and putting it back, also allows detailed records of user actions to be reflected in marketing.

By processing video from a vehicle-mounted camera, the present invention also makes it possible to recognize a person raising a hand to hail a taxi or the gestures of a traffic controller. By applying the human body region detection and part recognition described above to generate animated video, the present invention can also make use of existing security cameras to create privacy-preserving video that represents only a person's movements.

The present invention can also be applied to a system that supports rehabilitation automatically, even when no doctor or physical therapist is present. For example, it can be applied to a system that films a person undergoing rehabilitation and measures the progress of motor function recovery training for the limbs, or that guides the limbs to an appropriate position with instructions such as "raise your hand higher".

Preferred embodiments of the present invention have been described in detail above. However, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

DESCRIPTION OF SYMBOLS
10 Part recognition device
11 Input means
12 Output means
13 Accumulation means
14 Human body region detection means
15 Part recognition means
16 Behavior recognition means
17 Screen generation means
18 Notification means
19 Transmission/reception means
20 Control means
31 Input device
32 Output device
33 Drive device
34 Auxiliary storage device
35 Memory device
36 CPU (Central Processing Unit)
37 Network connection device
38 Recording medium
40 Screen
41 Original image
42 Thinned image
43 Human body edge image
44 Part recognition result image

Claims

A part recognition device for recognizing a part of a person included in a video or an image, comprising:
human body region detection means for detecting a human body region of at least one person included in the video or image; and
part recognition means for recognizing a predetermined part from the human body region obtained by the human body region detection means,
wherein the part recognition means
thins the human body region obtained by the human body region detection means, generates, based on the thinned information, a matrix indicating the relationship between end points and branch points, and detects the predetermined part by comparing the generated matrix with matrices of a plurality of persons registered in advance.

The part recognition device according to claim 1, wherein the part recognition means
subdivides the image including the human body region into a plurality of regions and generates the matrix by assigning weights to the end points and the branch points for each subdivided region.

A part recognition device for recognizing a part of a person included in a video or an image, comprising:
human body region detection means for detecting a human body region of at least one person included in the video or image; and
part recognition means for recognizing a predetermined part from the human body region obtained by the human body region detection means,
wherein the part recognition means
thins the human body region obtained by the human body region detection means, performs edge processing on the image of the human body region, detects, with reference to the position of a line segment of the thinned human body region, a portion where an edge of the edge-processed human body region has a predetermined shape, and sets the region in which the predetermined shape is detected as the predetermined part.

The part recognition device according to claim 3, wherein the part recognition means
estimates, within the thinned human body region, a region in which the predetermined part is present, detects, with reference to the position of the line segment of the thinned human body region included in the estimated region, a portion where an edge of the edge-processed human body region has the predetermined shape, and sets the region in which the predetermined shape is detected as the predetermined part.

The part recognition device according to claim 3 or 4, wherein the part recognition means
generates, based on the thinned information, a matrix indicating the relationship between end points and branch points, and detects the predetermined part by comparing the generated matrix with matrices of a plurality of persons registered in advance.

The part recognition device according to claim 5, wherein the part recognition means
subdivides the image including the human body region into a plurality of regions and generates the matrix by assigning weights to the end points and the branch points for each subdivided region.

The part recognition device according to any one of claims 1 to 6, wherein the part recognition means
recognizes, based on features of the hand region of one hand of the person, the other hand region in the same image.

The part recognition device according to any one of claims 1 to 7, further comprising behavior recognition means for recognizing the behavior of the person by tracking, in time series, the human body region obtained by the human body region detection means and/or the part obtained by the part recognition means.

A part recognition method for recognizing a part of a person included in a video or an image, comprising:
a human body region detection step of detecting a human body region of at least one person included in the video or image; and
a part recognition step of recognizing a predetermined part from the human body region obtained in the human body region detection step,
wherein the part recognition step
thins the human body region obtained in the human body region detection step, generates, based on the thinned information, a matrix indicating the relationship between end points and branch points, and detects the predetermined part by comparing the generated matrix with matrices of a plurality of persons registered in advance.

A part recognition method for recognizing a part of a person included in a video or an image, comprising:
a human body region detection step of detecting a human body region of at least one person included in the video or image; and
a part recognition step of recognizing a predetermined part from the human body region obtained in the human body region detection step,
wherein the part recognition step
thins the human body region obtained in the human body region detection step, performs edge processing on the image of the human body region, detects, with reference to the position of a line segment of the thinned human body region, a portion where an edge of the edge-processed human body region has a predetermined shape, and sets the region in which the predetermined shape is detected as the predetermined part.

A part recognition program for causing a computer to function as the part recognition device according to any one of claims 1 to 8.