JP2016177757A

JP2016177757A - Information processing system, information processing method and program for detecting foot position

Info

Publication number: JP2016177757A
Application number: JP2015059409A
Authority: JP
Inventors: 裕一中谷; Yuichi Nakatani
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2015-03-23
Filing date: 2015-03-23
Publication date: 2016-10-06
Anticipated expiration: 2035-03-23
Also published as: JP6481960B2

Abstract

PROBLEM TO BE SOLVED: To provide an information processing system, information processing method and program capable of suitably detecting a foot position.
SOLUTION: An information processing system includes detection means for detecting a foot area by collating an input image with a plurality of templates, management means for managing foot position information about a foot position in each template of the plurality of templates, and output means for outputting the foot position on the basis of the foot position information corresponding to a template used to detect the foot area.
SELECTED DRAWING: Figure 1

Description

Some aspects of the present invention relate to an information processing system, an information processing method, and a program.

Devices have been developed in which a monocular camera is attached to a car to monitor the road ahead, high-risk pedestrians are detected from the captured images, and a warning is issued. In such devices, finding pedestrians in the image is extremely important. Among pedestrians, those closer to the car have a higher risk of collision, so it is desirable to be able to detect the distance from the car to each pedestrian. The distance can easily be detected with a ranging sensor such as a millimeter-wave radar, but this raises issues such as increased product cost.

If the pedestrian's foot position can be accurately detected in the image, the distance between the car and the pedestrian can be measured even from a monocular camera image, using the correspondence between the camera and the real world.

Image processing for images in which a person appears includes, for example, the technique described in Patent Document 1. Patent Document 1 discloses preparing a plurality of template images and calculating the degree of matching between those template images and an input image, thereby identifying the state of the arms and legs.
Related techniques are also disclosed in Patent Documents 2 to 4.

Patent Document 1: Japanese Patent Laid-Open No. 08-279044
Patent Document 2: Japanese Patent Laid-Open No. 10-255057
Patent Document 3: JP 2007-004721 A
Patent Document 4: JP 2007-188417 A

However, although the technique described in Patent Document 1 identifies the state of each body part such as the arms and legs, it gives no consideration to detecting the position of a specific part, such as the feet.

Some aspects of the present invention have been made in view of the above problem, and one object thereof is to provide an information processing system, an information processing method, and a program capable of suitably detecting a foot position.

An information processing system according to the present invention includes: detection means for detecting a foot area by collating an input image with a plurality of templates; management means for managing, for each of the plurality of templates, foot position information regarding a foot position; and output means for outputting a foot position based on the foot position information corresponding to the template used to detect the foot area.

In an information processing method according to the present invention, an information processing device performs the steps of: detecting a foot area by collating an input image with a plurality of templates; managing, for each of the plurality of templates, foot position information regarding a foot position; and outputting a foot position based on the foot position information corresponding to the template used to detect the foot area.

A program according to the present invention causes a computer to execute: a process of detecting a foot area by collating an input image with a plurality of templates; a process of managing, for each of the plurality of templates, foot position information regarding a foot position; and a process of outputting a foot position based on the foot position information corresponding to the template used to detect the foot area.

In the present invention, the terms "unit", "means", "device", and "system" do not simply mean physical means; they also cover cases where the functions of the "unit", "means", "device", or "system" are realized by software. Furthermore, the functions of one "unit", "means", "device", or "system" may be realized by two or more physical means or devices, and the functions of two or more "units", "means", "devices", or "systems" may be realized by a single physical means or device.

According to the present invention, it is possible to provide an information processing system, an information processing method, and a program capable of suitably detecting a foot position.

FIG. 1 is a functional block diagram showing the schematic configuration of the foot position detection device according to the first embodiment.
FIG. 2 is a flowchart showing the processing flow of the foot position detection device shown in FIG. 1.
FIG. 3 is a flowchart showing a specific example of the processing flow for generating the foot position distribution information.
FIG. 4 is a block diagram showing a hardware configuration on which the foot position detection device shown in FIG. 1 can be implemented.
FIG. 5 is a functional block diagram showing the schematic configuration of the foot position detection device according to the second embodiment.

Embodiments of the present invention are described below. In the following description and in the drawings referred to, the same or similar components are denoted by the same or similar reference numerals.

(1 First Embodiment)
FIGS. 1 to 4 are diagrams for explaining the first embodiment. The embodiment is described below with reference to these drawings, in the following order. Section 1.1 gives an overall overview, and Section 1.2 outlines the functional configuration of the system. Section 1.3 then describes the flow of processing, and Section 1.4 gives a specific example of a hardware configuration capable of realizing the system. Finally, Section 1.5 onward describes the effects and other aspects of the embodiment.

(1.1 Overview)
Devices already exist in which a monocular camera is attached to a car to monitor the road ahead, high-risk pedestrians are detected from the captured images, and a warning is issued. Detecting pedestrians in images is therefore important. One conceivable way to detect pedestrians is to collect large numbers of pedestrian images and non-pedestrian images and generate a statistical classifier from them.

Among detected pedestrians, those closer to the car are considered to have a higher risk of collision, so it is desirable to be able to detect the distance from the car to each pedestrian. One way to detect the distance is to use a ranging sensor such as a millimeter-wave radar, but this requires a sensor separate from the camera and therefore increases product cost.

In this regard, if the pedestrian's foot position can be accurately detected in the image, the distance between the car and the pedestrian can be measured even from a monocular camera image, using the correspondence between the camera and the real world.
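The document does not spell out the correspondence between the camera and the real world. As a hedged illustration only, the sketch below assumes a pinhole camera with a horizontal optical axis mounted at a known height above a flat road; under those assumptions the distance to a ground point follows from the image row of the foot. The function name `ground_distance_m` and all of its parameters are hypothetical and are not taken from the source.

```python
def ground_distance_m(foot_row_px: float, horizon_row_px: float,
                      focal_length_px: float, camera_height_m: float) -> float:
    """Distance along a flat road to the ground point seen at image row foot_row_px.

    Assumptions of this sketch (not stated in the source): pinhole camera,
    horizontal optical axis, flat ground, image rows increasing downward.
    """
    rows_below_horizon = foot_row_px - horizon_row_px
    if rows_below_horizon <= 0:
        # A point on the ground must project below the horizon row.
        raise ValueError("foot position must lie below the horizon row")
    # Similar triangles: distance / camera_height = focal_length / rows_below_horizon
    return focal_length_px * camera_height_m / rows_below_horizon
```

Under this model, an error of a few rows in the foot position translates into a larger distance error the closer the foot is to the horizon row, which is one way to see why an accurate foot position matters.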

As with the pedestrian detection method described above, one conceivable way to find a pedestrian's feet is to collect large numbers of foot images and non-foot images and generate a statistical classifier from them. In particular, one pedestrian detection approach does not use an image of the pedestrian's whole body; instead it decomposes the pedestrian into parts such as the head, lower body, and feet, detects each part, and integrates the results. When pedestrians are detected in this way, the detection result can be reused to identify the foot area in the image at no additional computational cost.

However, even if the foot area is found in this way, it is difficult to pinpoint the foot position. An image of the foot area can be in various states: only the part below the shin may be detected as the foot area, everything below the thigh may be detected, or the legs may be apart because the pedestrian is walking. Because of this wide variation, the foot position cannot be detected with high accuracy even when the foot area has been identified.

Furthermore, the computing resources available to an in-vehicle device are usually quite limited compared with a general PC (Personal Computer) or workstation, because of tight constraints on the available power and on the heat generated by the device. It is therefore necessary to improve the accuracy of foot position estimation at as low a computational cost as possible. The foot position detection technique of this embodiment takes this into account.

An overview of the technique according to this embodiment follows. To find a pedestrian's feet, the foot position detection device according to this embodiment is assumed to collect many foot images and non-foot images and to generate a statistical classifier from them, which is then used to find the foot area.

In this embodiment, a template-type classifier such as GLVQ (Generalized Learning Vector Quantization) is used as the statistical classifier. A template-type classifier learns foot templates and non-foot templates in a feature space and uses the learned templates to identify foot areas. In foot detection with such a classifier, the template closest to the feature values of a candidate partial image region is searched for; if that template is a foot template, the region is judged to be a foot area, and if it is a non-foot template, the region is judged to be a non-foot area.
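The nearest-template decision described above can be sketched as follows. This is a minimal illustration that assumes squared Euclidean distance in feature space and uses the negated distance as a matching score; it covers only the lookup at detection time, not GLVQ training. The `Template` class, the `classify` function, and the toy feature vectors are hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Template:
    number: int                   # template number, usable later as a lookup key
    is_foot: bool                 # True for a foot template, False for a non-foot template
    features: Tuple[float, ...]   # reference vector in feature space


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(features: Sequence[float],
             templates: Sequence[Template]) -> Tuple[Template, float]:
    """Return the closest template and a matching score (higher means closer)."""
    best = min(templates, key=lambda t: squared_distance(features, t.features))
    return best, -squared_distance(features, best.features)


# Toy example with two foot templates and one non-foot template.
templates = [
    Template(1, True, (0.9, 0.1)),
    Template(2, True, (0.1, 0.9)),
    Template(3, False, (0.5, 0.5)),
]
best, score = classify((0.8, 0.2), templates)
print(best.number, best.is_foot)  # prints "1 True"
```

The region is treated as a foot area exactly when `best.is_foot` is true, which mirrors the foot versus non-foot decision in the text.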

Suppose, for example, that there are 100 foot templates. Each of the 100 foot templates represents a different foot shape, so the corresponding foot position (coordinate position) differs from template to template. For example, the foot position may be in the lower right part of the first foot template and in the lower left part of the second. The position (coordinates) that should be regarded as the foot thus varies between foot templates.

In the foot position detection method according to this embodiment, the variation in foot position for each template is learned in advance, which makes it possible to improve foot position detection accuracy with a simple configuration.

(1.2 System functional configuration)
The system configuration of the foot position detection device 100, which is the information processing system according to this embodiment, is described below with reference to FIG. 1. FIG. 1 is a block diagram showing the functional configuration of the foot position detection device 100. The foot position detection device 100 includes an input unit 110, a foot classifier 120, a foot position calculation unit 130, an output unit 140, and a storage unit 150.

The input unit 110 reads images one at a time from, for example, a storage medium or an input interface (not shown), and passes an input partial image, which is a partial region of the image, to the foot classifier 120.

The foot classifier 120 collates the feature values of the input partial image with a plurality of templates 151 learned in advance, each representing a foot or a non-foot. The foot classifier 120 then outputs the template 151 with the highest matching score, together with that score, to the foot position calculation unit 130.

The foot position calculation unit 130 takes the most likely template 151 received from the foot classifier 120, that is, the one with the highest matching score, and uses its template number as a key to retrieve the corresponding foot position distribution information 152 from the storage unit 150.

The output unit 140 outputs the resulting foot position distribution information 152 to external software, a display device, or the like. The information output by the output unit 140 need not be the foot position distribution information 152 itself; it may be, for example, the mean position of the distribution or the peak position of the distribution.
The storage unit 150 is a storage device that stores the templates 151 and the foot position distribution information 152.
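As a small illustration of the two summaries mentioned for the output unit 140, the sketch below reduces a foot position distribution to its peak position and its mean position. The representation of the distribution as a dictionary from (x, y) coordinates to probabilities, and the function names, are assumptions made for this example.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]  # (x, y) inside the partial image


def peak_position(dist: Dict[Coord, float]) -> Coord:
    """Coordinate at which the distribution takes its largest value."""
    return max(dist, key=dist.get)


def mean_position(dist: Dict[Coord, float]) -> Tuple[float, float]:
    """Probability-weighted average coordinate of the distribution."""
    total = sum(dist.values())
    if total <= 0:
        raise ValueError("distribution has no mass")
    mean_x = sum(x * p for (x, _), p in dist.items()) / total
    mean_y = sum(y * p for (_, y), p in dist.items()) / total
    return mean_x, mean_y


# Toy distribution for one template: the foot is most often at x = 1 on row 4.
example = {(1, 4): 0.5, (2, 4): 0.25, (3, 4): 0.25}
print(peak_position(example))  # prints "(1, 4)"
print(mean_position(example))  # prints "(1.75, 4.0)"
```

The peak always lands on a grid coordinate, whereas the mean can fall between pixels and, for a distribution with two separate modes, may lie at a location where the foot rarely appears.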

The templates 151 are obtained by collecting and learning from many foot images and non-foot images. The templates 151 can include a plurality of foot templates corresponding to the feature values of foot partial images of various shapes, and a plurality of non-foot templates corresponding to the feature values of non-foot partial images of various shapes. As described above, the foot classifier 120 collates the input partial image with each template 151 and selects the most likely template 151, thereby determining whether the input partial image is a foot area or a non-foot area.

The foot position distribution information 152 is information on the distribution of foot positions corresponding to each of the templates 151. It is generated by the learning procedure of FIG. 3, described later.

(1.3 Process flow)
Next, the processing flow of the foot position detection device 100 is described with reference to FIGS. 2 and 3. The processing steps described below may be reordered or executed in parallel as long as no inconsistency arises in the processing, and other steps may be added between them. Furthermore, a step described as a single step for convenience may be executed as multiple steps, and steps described separately for convenience may be executed as a single step.

(1.3.1 Flow of foot position detection)
First, the flow of foot position detection is described with reference to FIG. 2. FIG. 2 is a flowchart showing the processing performed when the foot position detection device 100 according to this embodiment detects a foot position.

First, the input unit 110 reads images one at a time from, for example, a storage medium or an input interface (not shown), and outputs an input partial image, which is a partial region of the image, to the foot classifier 120 (S201). The foot classifier 120 calculates feature values from the input partial image and collates them with each template 151 stored in the storage unit 150 (S203). The foot classifier 120 then outputs the most likely template 151 and its matching score to the foot position calculation unit 130 (S205).

If the most likely template 151 is a foot template, the foot position calculation unit 130 searches the foot position distribution information 152 using the template number of that template as a key (S207). This yields the foot position distribution information 152 corresponding to the most likely template 151.

The output unit 140 outputs information on the foot position based on the foot position distribution information 152 obtained in this way (S209). The output may be the foot position distribution information 152 itself, the mean position of the distribution it represents, or the position at which the distribution peaks.
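Steps S203 to S209 can be sketched end to end as below. This is a simplified illustration under stated assumptions: templates are plain tuples, matching uses squared Euclidean distance, and the function returns the stored distribution unchanged (or `None` when the best match is a non-foot template), leaving any reduction to a single coordinate to the caller. The names `lookup_foot_distribution`, `Template`, and the toy data are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Coord = Tuple[int, int]
# (template number, is_foot flag, reference feature vector)
Template = Tuple[int, bool, Sequence[float]]


def lookup_foot_distribution(
    features: Sequence[float],
    templates: Sequence[Template],
    foot_distributions: Dict[int, Dict[Coord, float]],
) -> Optional[Dict[Coord, float]]:
    """Match features against all templates, then look up the stored distribution."""
    # S203/S205: collate with every template and keep the closest one.
    def distance(t: Template) -> float:
        return sum((f - r) ** 2 for f, r in zip(features, t[2]))

    number, is_foot, _ = min(templates, key=distance)
    # S207: only a foot template has foot position distribution information.
    if not is_foot:
        return None
    # The template number is the key into the precomputed table.
    return foot_distributions[number]


templates = [(1, True, (0.9, 0.1)), (2, False, (0.1, 0.9))]
foot_distributions = {1: {(3, 8): 0.7, (4, 8): 0.3}}
print(lookup_foot_distribution((0.8, 0.2), templates, foot_distributions))
# prints "{(3, 8): 0.7, (4, 8): 0.3}"
```

At detection time the only work beyond template matching is a table lookup, which is consistent with the low computational cost the text aims for.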

(1.3.2 Flow of generation of foot position distribution information 152)
Next, the flow of processing for generating the foot position distribution information 152 is described. The foot position distribution information 152 may be generated by the foot position detection device 100 itself, or it may be generated by another information processing device and then stored in the foot position detection device 100.

In the following description, there are assumed to be n foot templates, and the variable k is an integer indicating a template number. (px, py) are variables corresponding to the coordinates of the foot position within a learning foot image, which serves as the input partial image. There are assumed to be n learning foot images, and the coordinates of the foot position are assumed to be known for each learning foot image.

P_k(x, y) denotes the probability that the foot position for the template with template number k lies at coordinates (x, y) in the input partial image. X and Y denote the size of the input partial image in the x and y directions, respectively. c(k) is a variable that holds the number of times the template with template number k has been selected.

First, P_k(x, y) and c(k) are all initialized to 0 for every combination of k, x, and y (S301). The variable i, which indicates the number of the learning foot image used as the input partial image, is initialized to 1 (S303). There are assumed to be n learning foot images in total.

The input unit 110 receives the i-th learning foot image, learning foot image [i] (S305). The coordinates (px, py) within the learning foot image are initialized to (1, 1) (S307). The input unit 110 inputs learning foot image [i] to the foot classifier 120 so that the foot position falls at (px, py) (S309).

The foot classifier 120 compares the feature values of learning foot image [i] with the foot templates included in the templates 151 and identifies the closest foot template m (S311). P_m(px, py) and c(m) are then each incremented by 1 (S313).

To move to the next coordinate, px is incremented by 1 (S315). If px is then less than or equal to the width X of learning foot image [i] (No in S315), processing returns to S309 and continues with the new (px, py). If px is greater than the width X of learning foot image [i] (Yes in S315), all positions for the current y coordinate have been processed, so py is incremented by 1 and px is reset to 1 (S319). If py is less than or equal to the height Y of learning foot image [i] (No in S321), processing returns to S309 for that (px, py).

If py is greater than the height Y of learning foot image [i] (Yes in S321), all coordinates have been processed, so i is incremented by 1 in order to process a new learning foot image (S323). If i is then less than or equal to the number n of learning foot images (No in S325), processing returns to S305 and the new learning foot image is processed.

On the other hand, if i is greater than the number n of learning foot images (Yes in S325), all learning foot images have been processed. P_k(x, y) = P_k(x, y) / c(k) is therefore calculated for every combination of (k, x, y) (S327). This produces the foot position distribution information 152, which gives, for each template k, the distribution of the foot position over the coordinates (x, y).
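The counting and normalization of steps S301 to S327 can be sketched as follows. This is a minimal illustration under several assumptions that the source does not state: images are lists of pixel rows, the feature vector is simply the flattened window, the window is shifted so that the labelled foot pixel lands at (px, py), pixels outside the image are padded with 0.0, and templates that are never selected are left out of the result to avoid dividing by zero. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values
Coord = Tuple[int, int]    # (x, y)


def crop_with_foot_at(image: Image, foot: Coord, px: int, py: int,
                      width: int, height: int) -> List[float]:
    """Cut a width x height window so the labelled foot lands at 1-based (px, py).

    `foot` is the labelled foot pixel in 0-based image coordinates.
    """
    left, top = foot[0] - (px - 1), foot[1] - (py - 1)
    rows, cols = len(image), len(image[0])
    return [image[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0
            for r in range(top, top + height)
            for c in range(left, left + width)]


def learn_foot_distributions(samples: Sequence[Tuple[Image, Coord]],
                             foot_templates: Dict[int, Sequence[float]],
                             width: int, height: int
                             ) -> Dict[int, Dict[Coord, float]]:
    cells = [(x, y) for y in range(1, height + 1) for x in range(1, width + 1)]
    p = {k: {xy: 0.0 for xy in cells} for k in foot_templates}   # S301
    c = {k: 0 for k in foot_templates}
    for image, foot in samples:                                   # S303-S305, S323-S325
        for px, py in cells:                                      # S307, S315-S321
            feats = crop_with_foot_at(image, foot, px, py, width, height)  # S309
            m = min(foot_templates, key=lambda k: sum(
                (a - b) ** 2 for a, b in zip(feats, foot_templates[k])))   # S311
            p[m][(px, py)] += 1.0                                 # S313
            c[m] += 1
    # S327: divide counts by c(k); never-selected templates are skipped (assumption).
    return {k: {xy: v / c[k] for xy, v in p[k].items()} for k in p if c[k] > 0}
```

For example, with a single 3 x 1 learning image `[[0.0, 1.0, 0.0]]` whose foot is labelled at `(1, 0)`, a 2 x 1 window, and templates `{1: (1.0, 0.0), 2: (0.0, 1.0)}`, template 1 is selected when the foot sits at (1, 1) and template 2 when it sits at (2, 1), so each template's distribution concentrates on one cell.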

To smooth the shape of the distribution P_k(x, y), smoothing such as convolution with a Gaussian filter may additionally be applied. The foot position distribution information 152 may be P_k(x, y) itself, but the mean position or the peak position of the distribution may be stored instead of, or in addition to, it.
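One possible form of the smoothing step is sketched below. It assumes that P_k is stored as a list of rows, uses a 3 x 3 binomial kernel as a small approximation of a Gaussian filter, treats cells outside the grid as zero, and renormalizes so that the result still sums to 1. The kernel size, the border handling, and the renormalization are choices made for this example rather than details given in the source.

```python
from typing import List

# 3x3 binomial kernel, a common small approximation of a Gaussian (weights sum to 16).
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def smooth_distribution(p: List[List[float]]) -> List[List[float]]:
    """Convolve a distribution grid with KERNEL and renormalize it to sum to 1."""
    height, width = len(p), len(p[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    # Cells outside the grid contribute nothing (zero padding).
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += KERNEL[dy + 1][dx + 1] * p[yy][xx]
            out[y][x] = acc / 16.0
    total = sum(map(sum, out))
    if total <= 0:
        return out
    return [[v / total for v in row] for row in out]
```

Applied to a 3 x 3 grid with all mass in the center cell, this spreads the mass to 0.25 at the center, 0.125 at each edge neighbor, and 0.0625 at each corner. Smoothing of this kind keeps a template that was selected only a few times during learning from producing a distribution with isolated spikes.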

(1.4 Hardware configuration)
An example of a hardware configuration for realizing the foot position detection device 100 described above with a computer is described below with reference to FIG. 4. The functions of the foot position detection device 100 need not be realized by a single information processing device such as one computer; they can also be realized by multiple information processing devices.

As shown in FIG. 4, the foot position detection device 100 includes a processor 401, a memory 403, a storage device 405, an input interface (I/F) 407, a data I/F 409, a communication I/F 411, and a display device 413.

The processor 401 controls various processes in the foot position detection device 100 by executing programs stored in the memory 403. For example, the processing of the input unit 110, the foot classifier 120, the foot position calculation unit 130, and the output unit 140 described with reference to FIG. 1 can be realized as programs that are temporarily stored in the memory 403 and run mainly on the processor 401.

The memory 403 is a storage medium such as a RAM (Random Access Memory). The memory 403 temporarily stores the program code of programs executed by the processor 401 and the data needed while those programs run. For example, a stack area required during program execution is reserved in the storage area of the memory 403.

The storage device 405 is a non-volatile storage medium such as a hard disk or flash memory. Serving as the storage unit 150, the storage device 405 stores an operating system, various programs for realizing the input unit 110, the foot classifier 120, the foot position calculation unit 130, and the output unit 140, and various data including the templates 151 and the foot position distribution information 152. Programs and data stored in the storage device 405 are loaded into the memory 403 as needed and referenced by the processor 401.

The input I/F 407 is a device for receiving input from a user. Specific examples of the input I/F 407 include a keyboard, a mouse, a touch panel, and various sensors. The input I/F 407 may be connected to the foot position detection device 100 via an interface such as USB (Universal Serial Bus).

The data I/F 409 is a device for inputting data from outside the foot position detection device 100. A specific example of the data I/F 409 is a drive device for reading data stored on various storage media. The data I/F 409 may be provided outside the foot position detection device 100, in which case it is connected to the foot position detection device 100 via an interface such as USB.

The communication I/F 411 is a device for wired or wireless data communication with devices external to the foot position detection device 100. The communication I/F 411 may be provided outside the foot position detection device 100, in which case it is connected to the foot position detection device 100 via an interface such as USB.

The display device 413 is a device for displaying various kinds of information. Specific examples of the display device 413 include a liquid crystal display and an organic EL (Electro-Luminescence) display. The display device 413 may be provided outside the foot position detection device 100, in which case it is connected to the foot position detection device 100 via, for example, a display cable.

(1.5 Effects of this embodiment)
As described above, the foot position detection device 100 according to the present embodiment learns in advance the relationship between each template used for foot detection and the foot position, and can therefore estimate the foot position from the detected foot area. This allows the foot position detection device 100, which detects the foot area using templates, to detect the foot position with high accuracy. In addition, because the method according to the present embodiment requires little computation to detect the foot position, it can be applied to low-cost systems.
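The following is a minimal sketch of why the run-time cost can stay low, not the implementation described in this publication. It assumes a hypothetical convention in which the foot position is stored as an offset from the top of the detected foot area, expressed as a fraction of the area height. The names `learn_foot_offsets`, `estimate_foot_y`, and the sample values are illustrative assumptions.

```python
from collections import defaultdict


def learn_foot_offsets(samples):
    """Learn, in advance, the mean foot offset for each template.

    samples: iterable of (template_id, offset) pairs. Each offset is an
    annotated foot position relative to the top of the detected area,
    as a fraction of the area height (hypothetical convention).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for template_id, offset in samples:
        sums[template_id] += offset
        counts[template_id] += 1
    return {t: sums[t] / counts[t] for t in sums}


def estimate_foot_y(area_top, area_height, template_id, mean_offsets):
    """Run-time step: one table lookup and one multiply-add."""
    return area_top + mean_offsets[template_id] * area_height


# Illustrative training data: (template id, annotated relative offset).
training = [(0, 0.90), (0, 0.94), (1, 0.80), (1, 0.84)]
mean_offsets = learn_foot_offsets(training)

# A foot area detected with template 1, spanning rows 100 to 150.
foot_y = estimate_foot_y(area_top=100, area_height=50,
                         template_id=1, mean_offsets=mean_offsets)
print(round(foot_y, 2))
```

All learning happens offline in `learn_foot_offsets`; at detection time only `estimate_foot_y` runs, which is the kind of constant-time step that suits an inexpensive system.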

(2 Second Embodiment)
The second embodiment is described below with reference to FIG. 5. FIG. 5 is a block diagram illustrating the functional configuration of a foot position detection device 500, which is an information processing system. As shown in FIG. 5, the foot position detection device 500 includes a detection unit 510, a management unit 520, and an output unit 530.

The detection unit 510 detects the foot area by matching the input image against a plurality of templates. One conceivable way to detect the foot area with these templates is to use an advanced statistical method such as GLVQ (Generalized Learning Vector Quantization).
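As a rough illustration of matching against a plurality of templates, the sketch below shows the nearest-prototype decision that learning-vector-quantization classifiers typically use at inference time. GLVQ training itself is not shown, and this passage does not specify the feature representation, so the feature vectors, the labels, and the function `match_templates` are assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_templates(feature, templates):
    """Return (index, label, distance) of the template nearest to feature.

    templates: list of (label, vector) pairs; the labels "foot" and
    "background" are hypothetical.
    """
    best_index = min(
        range(len(templates)),
        key=lambda i: squared_distance(feature, templates[i][1]),
    )
    label, vector = templates[best_index]
    return best_index, label, squared_distance(feature, vector)


# Illustrative templates (prototype vectors), not taken from the source.
templates = [
    ("foot", [1.0, 0.0, 1.0]),
    ("foot", [0.0, 1.0, 1.0]),
    ("background", [0.0, 0.0, 0.0]),
]

index, label, dist = match_templates([0.9, 0.1, 0.8], templates)
print(index, label)
```

Returning the index of the matched template, not only its label, matters here: the index is what lets a later step look up the foot position information associated with that particular template.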

The management unit 520 manages, for each of the plurality of templates, foot position information regarding the position of the foot. The output unit 530 outputs the foot position based on the foot position information corresponding to the template used to detect the foot area.
Implemented in this way, the foot position detection device 500 according to the present embodiment can suitably detect the foot position.
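The roles of the management unit and the output unit can be sketched as follows. This is an assumption-laden illustration, not the device's actual implementation: the foot position information is modeled as a per-template histogram of pixel offsets from the top of the detected area, and the output is either the mean or the peak of that histogram. The histogram values and the names `foot_position_info` and `output_foot_position` are hypothetical.

```python
def distribution_mean(histogram):
    """Mean offset of a histogram given as {offset: count}."""
    total = sum(histogram.values())
    return sum(pos * n for pos, n in histogram.items()) / total


def distribution_peak(histogram):
    """Offset with the highest count (the peak of the distribution)."""
    return max(histogram, key=histogram.get)


# Managed per template: histogram of foot offsets in pixels below the
# top of the detected area (illustrative numbers only).
foot_position_info = {
    0: {46: 1, 47: 5, 48: 2},
    1: {40: 3, 41: 3, 44: 6},
}


def output_foot_position(area_top, template_id, mode="peak"):
    """Look up the histogram for the matched template and output a position."""
    histogram = foot_position_info[template_id]
    if mode == "peak":
        offset = distribution_peak(histogram)
    else:
        offset = distribution_mean(histogram)
    return area_top + offset


print(output_foot_position(100, 1, mode="peak"))
print(output_foot_position(100, 1, mode="mean"))
```

For a skewed distribution such as template 1 above, the peak and the mean differ, so which statistic to output is a design choice that depends on how the learned distributions look.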

(3 Additional Notes)
The configurations of the above-described embodiments may be combined, or some of their components may be interchanged. The configuration of the present invention is not limited to the above-described embodiments, and various modifications may be made without departing from the gist of the present invention.

Part or all of each of the above-described embodiments can also be described as in the following appendices, but is not limited to them. The program of the present invention may be any program that causes a computer to execute the operations described in each of the above embodiments.

(Appendix 1)
An information processing system comprising: detection means for detecting a foot area by matching an input image against a plurality of templates; management means for managing, for each template of the plurality of templates, foot position information regarding the position of the foot; and output means for outputting a foot position based on the foot position information corresponding to the template used for detecting the foot area.

(Appendix 2)
The information processing system according to appendix 1, wherein the foot position information managed by the management means is information indicating a distribution of the foot positions for each template.

(Appendix 3)
The information processing system according to appendix 1 or appendix 2, wherein the foot position output by the output means is at least one of the foot position information, an average value of the foot positions, or a position indicating a peak of the distribution of the foot positions.

(Appendix 4)
The information processing system according to any one of appendix 1 to appendix 3, wherein the detection means detects the foot area by GLVQ (Generalized Learning Vector Quantization).

(Appendix 5)
An information processing method in which an information processing apparatus performs: a step of detecting a foot area by matching an input image against a plurality of templates; a step of managing, for each template of the plurality of templates, foot position information regarding the position of the foot; and a step of outputting a foot position based on the foot position information corresponding to the template used for detecting the foot area.

(Appendix 6)
The information processing method according to appendix 5, wherein the managed foot position information is information indicating a distribution of the foot positions for each template.

(Appendix 7)
The information processing method according to appendix 5 or appendix 6, wherein the output foot position is at least one of the foot position information, an average value of the foot positions, or a position indicating a peak of the distribution of the foot positions.

(Appendix 8)
The information processing method according to any one of appendix 5 to appendix 7, wherein the foot area is detected by GLVQ (Generalized Learning Vector Quantization).

(Appendix 9)
A program that causes a computer to execute: a process of detecting a foot area by matching an input image against a plurality of templates; a process of managing, for each template of the plurality of templates, foot position information regarding the position of the foot; and a process of outputting a foot position based on the foot position information corresponding to the template used for detecting the foot area.

(Appendix 10)
The program according to appendix 9, wherein the managed foot position information is information indicating a distribution of the foot positions for each template.

(Appendix 11)
The program according to appendix 9 or appendix 10, wherein the output foot position is at least one of the foot position information, an average value of the foot positions, or a position indicating a peak of the distribution of the foot positions.

(Appendix 12)
The program according to any one of appendix 5 to appendix 7, wherein the foot area is detected by GLVQ (Generalized Learning Vector Quantization).

100: Foot position detection device
110: Input unit
120: Foot identifier
130: Foot position calculation unit
140: Output unit
150: Storage unit
151: Template
152: Foot position distribution information
401: Processor
403: Memory
405: Storage device
407: Input interface
409: Data interface
411: Communication interface
413: Display device
500: Foot position detection device
510: Detection unit
520: Management unit
530: Output unit

Claims

1. An information processing system comprising:
detection means for detecting a foot area by matching an input image against a plurality of templates;
management means for managing, for each template of the plurality of templates, foot position information regarding the position of the foot; and
output means for outputting a foot position based on the foot position information corresponding to the template used for detecting the foot area.

2. The information processing system according to claim 1,
wherein the foot position information managed by the management means is information indicating a distribution of the foot positions for each template.

3. The information processing system according to claim 1 or 2,
wherein the foot position output by the output means is at least one of the foot position information, an average value of the foot positions, or a position indicating a peak of the distribution of the foot positions.

4. The information processing system according to any one of claims 1 to 3,
wherein the detection means detects the foot area by GLVQ (Generalized Learning Vector Quantization).

5. An information processing method in which an information processing apparatus performs:
a step of detecting a foot area by matching an input image against a plurality of templates;
a step of managing, for each template of the plurality of templates, foot position information regarding the position of the foot; and
a step of outputting a foot position based on the foot position information corresponding to the template used for detecting the foot area.

6. A program for causing a computer to execute:
a process of detecting a foot area by matching an input image against a plurality of templates;
a process of managing, for each template of the plurality of templates, foot position information regarding the position of the foot; and
a process of outputting a foot position based on the foot position information corresponding to the template used for detecting the foot area.