JP2013152595A

JP2013152595A - Information processing apparatus and method, and program

Info

Publication number: JP2013152595A
Application number: JP2012012863A
Authority: JP
Inventors: Yasushi Shu; 寧周; Jun Yokono; 順横野
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2012-01-25
Filing date: 2012-01-25
Publication date: 2013-08-08

Abstract

PROBLEM TO BE SOLVED: To achieve fast and robust recognition of a recognition object. SOLUTION: A recognition unit recognizes a recognition object from an input image, and a learning unit learns a feature quantity of the recognition object for each frame using feature quantities whose scores change in time series. The recognition unit recognizes the recognition object in the current frame on the basis of the feature quantity of the previous frame obtained as the learning result of the learning unit. This technique can be applied to, for instance, an information processing apparatus such as a personal computer.

Description

The present technology relates to an information processing apparatus, an information processing method, and a program, and more particularly to an information processing apparatus, method, and program that enable a recognition object to be recognized quickly and robustly.

In recent years, various object tracking methods have been proposed in order to realize gesture control, automatic monitoring systems, and the like.

For example, "SixthSense" from the Massachusetts Institute of Technology (MIT) and "Kinect" (trademark) from Microsoft Corporation are practical examples of tracking objects whose shape and texture change, such as hands and bodies, using feature quantities such as color and depth.

As a technique for recognizing hand gestures, one proposed approach first obtains an image containing only the hand, either by using an image in which only the hand is captured or by having the position of the hand in the image specified, and then recognizes the hand gesture in the extracted image by techniques such as skin color information, motion detection, and pattern matching (see Patent Document 1).

Another proposed approach defines postures and gestures, such as multiple hand shapes, as multiple dictionaries through prior learning, and at recognition time either switches among the dictionaries or uses them simultaneously, depending on the state of the recognition object and its time-series changes.

Patent Document 1: JP 2007-333690 A

However, a technique such as that of Patent Document 1 requires recognition objects in various states to be learned in advance, and even then can recognize only recognition objects in the same states as those learned in advance.

That is, when the shape or color of the hand serving as the recognition object changes greatly, recognition performance is affected. Even if a large number of dictionaries are defined to cope with changes in the recognition object, there is a limit to how much of the entire state space (feature space) they can cover.

In particular, when a changing recognition object is recognized online, the various states and time-series changes of the recognition object need to be learned in advance, and the object must then be recognized quickly and robustly.

The present technology has been made in view of such circumstances, and enables a recognition object to be recognized quickly and robustly.

An information processing apparatus according to one aspect of the present technology includes a recognition unit that recognizes a recognition object from an input image, and a learning unit that learns the feature quantity of the recognition object for each frame using feature quantities whose scores change in time series. The recognition unit recognizes the recognition object in the current frame based on the feature quantity of the previous frame obtained as a learning result of the learning unit.

The information processing apparatus may further include a pre-learning unit that learns in advance, from a learning image containing the recognition object whose state changes in time series, the feature quantity of the recognition object for each frame, and a calculation unit that calculates in advance the score of each per-frame feature quantity obtained as a learning result of the pre-learning unit. The learning unit may then learn the feature quantity of the recognition object for each frame of the input image using feature quantities chosen according to the scores calculated in advance by the calculation unit.

The information processing apparatus may further include a selection unit that selects, from the feature quantities that correspond to the current frame and carry the scores calculated in advance by the calculation unit, those whose scores are higher than a predetermined threshold. The learning unit may learn the feature quantity of the recognition object in the current frame using the feature quantities selected by the selection unit.

The information processing apparatus may further include a storage unit that stores the feature quantities obtained as a learning result of the learning unit, and an update unit that updates the feature quantities stored in the storage unit for each frame according to the learning result of the learning unit. The recognition unit may recognize the recognition object in the current frame based on the feature quantities of the previous frame as updated by the update unit.

The selection unit may select, from the feature quantities of the previous frame stored in the storage unit, those whose scores are lower than another threshold, and the learning unit may learn the feature quantity of the recognition object in the current frame using the feature quantities that remain after the selected ones are excluded.

The update unit may update the feature quantities obtained as a learning result of the pre-learning unit for each frame according to the learning result of the learning unit.

An information processing method according to one aspect of the present technology is a method for an information processing apparatus that includes a recognition unit that recognizes a recognition object from an input image and a learning unit that learns the feature quantity of the recognition object for each frame using feature quantities whose scores change in time series. The method includes steps in which the information processing apparatus learns the feature quantity of the recognition object for each frame using feature quantities whose scores change in time series, and recognizes the recognition object in the current frame based on the feature quantity of the previous frame obtained as a learning result of the learning unit.

A program according to one aspect of the present technology causes a computer to execute processing including a recognition step of recognizing a recognition object from an input image, and a learning step of learning the feature quantity of the recognition object for each frame using feature quantities whose scores change in time series. The recognition step recognizes the recognition object in the current frame based on the feature quantity of the previous frame obtained as a learning result of the learning step.

In one aspect of the present technology, the feature quantity of the recognition object is learned for each frame using feature quantities whose scores change in time series, and the recognition object in the current frame is recognized based on the feature quantity of the previous frame obtained as a learning result.

According to one aspect of the present technology, a recognition object can be recognized quickly and robustly.

FIG. 1 is a block diagram showing the configuration of an embodiment of an information processing apparatus to which the present technology is applied.
FIG. 2 is a block diagram showing the configuration of an embodiment of a personal computer to which the present technology is applied.
FIG. 3 is a flowchart explaining the offline learning process.
FIG. 4 is a diagram showing time-series changes in the scores of feature quantities.
FIG. 5 is a flowchart explaining the object tracking process.
FIG. 6 is a flowchart explaining the object recognition process.
FIG. 7 is a flowchart explaining the learning process.
FIG. 8 is a diagram explaining the replacement of feature quantities according to their scores.

Hereinafter, embodiments of the present technology will be described with reference to the drawings. The description is given in the following order.
1. Configuration of the information processing apparatus
2. Configuration of the personal computer
3. Offline learning process
4. Object tracking process
5. Object recognition process
6. Learning process
7. Other

<1. Configuration of the information processing apparatus>
FIG. 1 is a block diagram showing the configuration of an embodiment of an information processing apparatus to which the present technology is applied.

The information processing apparatus 1 in FIG. 1 recognizes a recognition object, that is, the target of recognition, from an input image and outputs the recognition result. The information processing apparatus 1 consists of an offline processing unit 2 and an online processing unit 3. The offline processing unit 2 learns information about the recognition object in advance (offline learning) from learning data prepared beforehand, and supplies the learning result to the online processing unit 3. Based on that learning result, the online processing unit 3 recognizes the recognition object from the input image and also learns information about the recognition object (online learning).

The offline processing unit 2 includes a learning data storage unit 11 and a learning processing unit 12. The online processing unit 3 includes an offline dictionary storage unit 13, an input unit 14, a recognition unit 15, and a learning processing unit 16.

The learning data storage unit 11 stores learning data collected in advance for learning information about the recognition object. The learning data is a moving image containing the recognition object (hereinafter also called a learning image).

The learning processing unit 12 learns in advance (offline learning), from the learning images stored in the learning data storage unit 11, the feature quantities of the recognition object as per-frame information about the recognition object, and supplies the learning result to the offline dictionary storage unit 13.

The learning processing unit 12 includes a learning unit 21 and a calculation unit 22. The learning unit 21 learns the feature quantities of the recognition object for each frame in advance, and the calculation unit 22 calculates the scores of those feature quantities for each frame. A score is a parameter expressing how well the recognition object can be recognized with a given feature quantity: the higher the score, the better suited the feature quantity is to recognizing the recognition object accurately.

The offline dictionary storage unit 13 stores the learning result from the learning processing unit 12 as a dictionary (offline dictionary). A dictionary is a recognizer provided for each pattern of change in the shape or state of the recognition object, and has parameters that include the learned feature quantities and their scores.
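One way to picture such a dictionary is as a per-frame table of feature quantities and their scores. The sketch below is only an illustration under that assumption; the class names `Feature` and `FrameDictionary` and the method names are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Feature:
    """One feature quantity and its score (hypothetical representation)."""
    name: str     # illustrative identifier such as "f1"
    score: float  # higher means better suited to recognizing the object


@dataclass
class FrameDictionary:
    """Dictionary (recognizer parameters) for a single frame."""
    frame_index: int
    features: Dict[str, Feature] = field(default_factory=dict)

    def add(self, name: str, score: float) -> None:
        """Register or overwrite a feature quantity and its score."""
        self.features[name] = Feature(name, score)

    def best(self) -> Feature:
        """Return the feature quantity with the highest score."""
        return max(self.features.values(), key=lambda f: f.score)
```

An offline dictionary for a whole gesture could then be a list of `FrameDictionary` objects, one per frame of the learning image.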

The input unit 14 receives a moving image obtained by imaging a subject with a camera or the like, and outputs that moving image (the input image) to the recognition unit 15. The input unit 14 may itself be a camera that images the subject. The camera is assumed to be a fixed camera whose shooting direction does not change.

The recognition unit 15 recognizes the recognition object in the input image frame by frame, and supplies the recognition result to the learning processing unit 16.

The learning processing unit 16 uses the dictionary stored in the offline dictionary storage unit 13 to learn the feature quantities of the recognition object in the input image (online learning), based on the recognition result from the recognition unit 15.

The learning processing unit 16 includes a selection unit 23, a learning unit 24, a storage unit 25, and an update unit 26. The selection unit 23 selects feature quantities contained in the dictionary of the offline dictionary storage unit 13 according to their scores. Details of the score are described later. The learning unit 24 learns the feature quantities of the recognition object using the feature quantities selected by the selection unit 23. The storage unit 25 stores the learning result of the learning unit 24 as a dictionary (online dictionary). The update unit 26 updates the online dictionary stored in the storage unit 25 according to the learning result of the learning unit 24.

The information processing apparatus 1 can also be implemented as a personal computer 31, shown in FIG. 2, that realizes the required functions by executing software.

<2. Configuration of the personal computer>
FIG. 2 is a block diagram showing the configuration of an embodiment of a personal computer to which the present technology is applied.

The personal computer 31, serving as the information processing apparatus, consists of a bus 41, a CPU (Central Processing Unit) 42, a ROM (Read Only Memory) 43, a RAM (Random Access Memory) 44, an input unit 45, an output unit 46, a storage unit 47, a communication unit 48, a drive 49, and a removable medium 50.

The bus 41 interconnects the CPU 42, the ROM 43, the RAM 44, the input unit 45, the output unit 46, the storage unit 47, the communication unit 48, and the drive 49.

The CPU 42 realizes the various functions of the information processing apparatus 1 in FIG. 1 by controlling the various operations of the personal computer 31.

The ROM 43 stores the processing programs executed on the personal computer 31 and the data those programs require. The RAM 44 is used as a work area for the various processes, for example to temporarily hold data obtained during processing.

The input unit 45 includes a keyboard, a mouse, a microphone, and the like. The output unit 46 includes a display, a speaker, and the like. The storage unit 47 includes a hard disk, a nonvolatile memory, and the like.

The communication unit 48 includes a network interface and the like. The drive 49 drives the removable medium 50, which is a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like.

In the personal computer 31 configured as described above, the various processes are performed when the CPU 42 loads a program stored in, for example, the ROM 43 or the storage unit 47 into the RAM 44 via the bus 41 and executes it.

The program executed by the CPU 42 is provided, for example, recorded on the removable medium 50 as a package medium or the like.

The package medium may be a magnetic disk (including a flexible disk), an optical disk (such as a CD-ROM (Compact Disc-Read Only Memory) or a DVD (Digital Versatile Disc)), a magneto-optical disk, a semiconductor memory, or the like.

The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.

In the personal computer 31, the program can be installed in the storage unit 47 via the bus 41 by mounting the removable medium 50 in the drive 49.

The program can also be received by the communication unit 48 via a wired or wireless transmission medium and installed in the storage unit 47. Alternatively, the program can be installed in the ROM 43 or the storage unit 47 in advance.

The program executed by the personal computer 31 may be a program whose processing is performed in time series in the order described in this specification, or a program whose processing is performed in parallel or at the necessary timing, such as when it is called.

<3. Offline learning process>
Next, the offline learning process performed by the offline processing unit 2 of the information processing apparatus 1 will be described with reference to the flowchart of FIG. 3.

In step S11, the learning processing unit 12 acquires the learning images stored in the learning data storage unit 11. A learning image is a moving image containing a recognition object whose shape and state change in time series. Here, the recognition object is assumed to be a hand, and the learning image is a moving image containing a hand whose shape changes in time series. Specifically, for example, moving images in which a hand starts from an open-palm ("paper") shape and makes a gesture of brushing something aside, captured multiple times for multiple people, are prepared as learning images.

In step S12, the learning unit 21 of the learning processing unit 12 learns the time-series states of the recognition object from the learning images acquired from the learning data storage unit 11. Specifically, the learning unit 21 learns the time-series changes in the feature quantities of the recognition object from the learning images. This learning may perform clustering, interpolation, or the like based on label information that includes information on the position and/or size of the recognition object, or may use a modeling technique such as an HMM (Hidden Markov Model).

In step S13, the learning processing unit 12 normalizes the learning images in time series using the learning result of the learning unit 21.

In step S14, the calculation unit 22 of the learning processing unit 12 calculates the score of each feature quantity for each frame using the normalized learning images.

FIG. 4 shows an example of time-series changes in the scores of feature quantities.

FIG. 4 shows, for the feature quantities f1, f2, and fn out of n feature quantities f1, f2, ..., fn, how the score changes with time t, that is, from frame to frame.

As shown in FIG. 4, the score of the feature quantity f1 is always higher than the scores of the feature quantities f2 and fn. In other words, the feature quantity f1 is better suited than f2 and fn to recognizing the hand shape change in the learning images.

In step S15, the learning processing unit 12 supplies the per-frame feature quantities of the recognition object obtained as the learning result, together with their scores, to the offline dictionary storage unit 13, where they are stored as an offline dictionary (offline recognizer).

In this way, offline learning of the recognition object is performed using the learning images.

In the present embodiment, online learning is performed using the result of offline learning, and the online learning learns the same hand shape change as the one learned in offline learning.

In conventional learning, the learning of a feature quantity (parameter adjustment) and the calculation of its score (evaluation) are generally performed on the same frame. In the offline learning of the present embodiment, however, in consideration of the characteristics of the online learning described later, the feature quantity is learned on the current frame of interest, and its score is calculated on the next frame, which follows the current frame in time.
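The "learn on the current frame, score on the next frame" scheme can be sketched as follows. The specification does not say how a feature quantity is learned or how its score is computed, so this sketch assumes a simple threshold classifier per feature and uses classification accuracy on the next frame as the score; the function names and the sample layout (a list of `(feature_vector, label)` pairs per frame) are hypothetical.

```python
def fit_threshold(samples, j):
    """Learn a threshold for feature j from one frame's labeled samples."""
    pos = [x[j] for x, y in samples if y == 1]
    neg = [x[j] for x, y in samples if y == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    # Threshold halfway between class means; polarity says which side is positive.
    return (mean_pos + mean_neg) / 2, mean_pos >= mean_neg


def accuracy(samples, j, threshold, positive_above):
    """Score feature j: fraction of samples classified correctly."""
    correct = 0
    for x, y in samples:
        predicted = 1 if (x[j] > threshold) == positive_above else 0
        correct += int(predicted == y)
    return correct / len(samples)


def offline_scores(frames, n_features):
    """scores[t][j]: feature j learned on frame t, evaluated on frame t + 1."""
    scores = []
    for t in range(len(frames) - 1):
        row = []
        for j in range(n_features):
            threshold, positive_above = fit_threshold(frames[t], j)
            row.append(accuracy(frames[t + 1], j, threshold, positive_above))
        scores.append(row)
    return scores
```

Evaluating on the next frame mirrors how the online stage uses the result: a dictionary learned at one frame is applied to the frame that follows it.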

In the above description, offline learning is performed for each feature quantity, but it may instead be performed for each feature quantity set, that is, a combination of multiple feature quantities.

<4. Object tracking process>
Next, the object tracking process performed by the online processing unit 3 of the information processing apparatus 1 will be described with reference to the flowchart of FIG. 5. The object tracking process tracks a recognition object that changes in the same way as the pattern of shape and state changes learned in the offline learning process. Specifically, for example, the object tracking process tracks, in the input image, a hand that starts from an open-palm ("paper") shape and makes a gesture of brushing something aside.

In step S31, the input unit 14 sets the recognition object. Here, an open-palm hand, which is the initial state of the shape change of the hand serving as the recognition object, is set.

In step S32, the input unit 14 acquires an image. That is, an input image obtained by imaging the subject is acquired.

In step S33, the input unit 14 detects the recognition object. That is, the open-palm hand set in step S31 is detected in the input image acquired in step S32. This detection may be performed in response to a user operation, or the input unit 14 may perform it using an object detection technique.

In step S34, the recognition unit 15 performs object recognition processing on the input image (frame) from the input unit 14 by calculating the hand shape probability and the hand color pattern. This processing can use, for example, the technique of JP 2010-108475 A, which builds a recognizer with Boosting using Steerable Filter responses as feature quantities. Alternatively, SSD (Sum of Squared Differences), a color histogram template matching method, or the like may be used.
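As an illustration of the SSD alternative mentioned above, the following is a minimal sketch of exhaustive template matching on a grayscale image stored as a list of rows. It is a generic SSD search, not the recognizer of the embodiment; the function names are hypothetical.

```python
def ssd(patch, template):
    """Sum of squared differences between two equally sized 2-D patches."""
    return sum(
        (p - t) ** 2
        for patch_row, template_row in zip(patch, template)
        for p, t in zip(patch_row, template_row)
    )


def best_match(image, template):
    """Slide the template over the image; return (ssd, x, y) of the best fit."""
    template_h, template_w = len(template), len(template[0])
    image_h, image_w = len(image), len(image[0])
    best = None
    for y in range(image_h - template_h + 1):
        for x in range(image_w - template_w + 1):
            patch = [row[x:x + template_w] for row in image[y:y + template_h]]
            distance = ssd(patch, template)
            if best is None or distance < best[0]:
                best = (distance, x, y)
    return best
```

A lower SSD means a closer match, so the position with the minimum value is taken as the location of the hand template in the frame.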

<5. Details of the object recognition process>
Here, the details of the object recognition process in step S34 of the flowchart of FIG. 5 will be described with reference to the flowchart of FIG. 6.

In step S51, the recognition unit 15 acquires from the storage unit 25 of the learning processing unit 16 the online dictionary that is the result of online learning for the previous frame. When the frame of the input image is the first frame, the recognition unit 15 instead acquires from the offline dictionary storage unit 13 the result of offline learning (offline dictionary) for the frame corresponding to the first frame.

In step S52, the recognition unit 15 recognizes the recognition object by calculating the scores of the feature quantities based on the acquired dictionary.

In step S53, the recognition unit 15 generates a recognition result based on the calculated feature quantity scores, and the process returns to step S34 in FIG. 5.
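Steps S51 to S53 can be sketched as below. The dictionary selection follows the text; how the scores are combined into a recognition result is not specified, so this sketch assumes a score-weighted sum of feature responses over candidate regions. All names and the data layout are hypothetical.

```python
def select_dictionary(frame_index, online_dict, offline_dicts):
    """S51: use the previous frame's online dictionary, or the offline
    dictionary for the first frame when no online result exists yet."""
    if frame_index == 0 or online_dict is None:
        return offline_dicts[0]
    return online_dict


def recognize(candidates, dictionary):
    """S52 and S53: score every candidate region and return the best one.

    candidates: {region: {feature_name: response}}
    dictionary: {feature_name: score}, used here as a weight (an assumption).
    """
    def total(responses):
        return sum(
            weight * responses.get(name, 0.0)
            for name, weight in dictionary.items()
        )

    region = max(candidates, key=lambda r: total(candidates[r]))
    return region, total(candidates[region])
```

For example, with an offline dictionary `{"f1": 0.9, "f2": 0.2}`, a region that responds strongly to `f1` is preferred over one that responds strongly only to `f2`.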

Returning to the flowchart of FIG. 5, in step S35 the learning processing unit 16 executes the learning process for the recognition object using the recognition result generated by the recognition unit 15.

<6. Details of the learning process>
Here, the details of the learning process in step S35 of the flowchart of FIG. 5 will be described with reference to the flowchart of FIG. 7.

In step S71, the selection unit 23 of the learning processing unit 16 acquires the recognition result generated by the recognition unit 15.

In step S72, the selection unit 23 selects, in the online dictionary of the previous frame (the online learning result for the frame preceding the current frame in time) stored in the storage unit 25, the feature quantities whose scores are lower than a first threshold. When the current frame is the first frame of the input image, step S72 is skipped.

In step S73, the selection unit 23 selects, in the offline dictionary (offline learning result) stored in the offline dictionary storage unit 13, the feature quantities that correspond to the current frame and whose scores are higher than a second threshold.

Here, the feature quantities obtained by taking the online learning result of the previous frame, removing the feature quantities selected in step S72, and adding the feature quantities selected in step S73 are called the learning feature quantities.

The learning feature quantities may also be obtained by swapping the feature quantities selected in step S72 for the feature quantities selected in step S73.

Specifically, among the time-varying feature quantity scores of the offline learning result shown in the upper part of FIG. 8, the feature quantities f1 and f2, whose scores are high at time T corresponding to the current frame, may be swapped in for the feature quantities fy and fz, whose scores are low in the online learning result of the previous frame shown in the middle part of FIG. 8. The result is the set of learning feature quantities shown in the lower part of FIG. 8, which includes f1 and f2 and excludes fy and fz.
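The selection in steps S72 and S73 amounts to two threshold filters followed by a merge. The sketch below assumes each dictionary is a plain mapping from feature name to score; the function name, the threshold values in the usage note, and the rule that a feature kept from the online dictionary retains its online score are assumptions made for illustration.

```python
def learning_features(prev_online, offline_current, first_threshold, second_threshold):
    """Build the learning feature quantities for the current frame.

    prev_online:     {name: score} from the previous frame's online dictionary,
                     or None for the first frame (S72 is skipped in that case).
    offline_current: {name: score} from the offline dictionary for this frame.
    """
    previous = prev_online or {}
    # S72: features scoring below the first threshold are selected for removal.
    kept = {n: s for n, s in previous.items() if s >= first_threshold}
    # S73: offline features scoring above the second threshold are added.
    added = {n: s for n, s in offline_current.items() if s > second_threshold}
    # Kept online entries take precedence over offline ones (an assumption).
    return {**added, **kept}
```

With `prev_online = {"fx": 0.8, "fy": 0.2, "fz": 0.1}`, `offline_current = {"f1": 0.9, "f2": 0.85, "fn": 0.3}`, and thresholds 0.5 and 0.7, the result contains `fx`, `f1`, and `f2`, matching the swap of `fy` and `fz` for `f1` and `f2` illustrated in FIG. 8.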

In step S74, the learning unit 24 learns the feature amount of the recognition object using the learning feature amounts obtained as a result of the selection by the selection unit 23. Learning is performed using, for example, an online boosting technique. Online boosting is disclosed in, for example, the following document.
Helmut Grabner and Horst Bischof, "On-line Boosting and Vision", In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 260-267, 2006

In step S75, the update unit 26 updates the online dictionary stored in the storage unit 25 with the learning result of the learning unit 24, and the process returns to step S35 in FIG. 5.

Returning to the flowchart of FIG. 5, in step S36, the update unit 26 updates the offline learning result stored in the offline dictionary storage unit 13 with the learning result of the learning unit 24. This improves recognition performance for a specific recognition object. Step S36 is optional and may be omitted when it is not needed.

In step S37, the recognition unit 15 outputs the recognition result to, for example, a display device (not shown). That is, the recognition result generated in step S34 is displayed on the display device. Specifically, a frame surrounding the recognized hand is displayed on the input image shown by the display device.

This allows the user to confirm that the hand, as the recognition object, is being tracked.

Note that steps S35 and S36 in FIG. 5 can be executed in parallel with step S37. That is, step S37 may be executed before steps S35 and S36 have finished.
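
As a rough illustration of this parallel execution, the following Python sketch submits the learning work to a worker thread and outputs the recognition result without waiting for it. The names `process_frame`, `learn`, and `output` are hypothetical placeholders, and the use of a thread pool is an assumption; the source does not specify a concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def process_frame(frame, result, learn, output, executor):
    """Run learning (steps S35/S36) in the background while outputting (step S37)."""
    # Start online learning (and any offline-dictionary update) on a worker thread.
    future = executor.submit(learn, frame, result)
    # The recognition result can be displayed before learning has finished.
    output(result)
    # Wait for the updated dictionary so the next frame is recognized with it.
    return future.result()


if __name__ == "__main__":
    shown = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        updated = process_frame(
            "frame-0", "hand-box",
            learn=lambda f, r: {"frame": f, "box": r},
            output=shown.append,
            executor=pool,
        )
    print(updated, shown)
```

Blocking on `future.result()` before the next frame keeps recognition dependent on the previous frame's learning result, which the overall flow requires.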

In step S38, the input unit 14 acquires an image. That is, a new frame of the input image is acquired.

Note that the newly acquired image may be the frame immediately following the previously acquired image, or a frame a predetermined number of frames after it.

That is, a new image may be acquired every frame or once every predetermined number of frames.

After step S38, the process returns to step S34, and the subsequent steps are repeated.
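
The repeated flow of steps S34 to S38 can be summarized by the following Python sketch. It is only an outline under stated assumptions: `recognize`, `learn_online`, `update_offline`, and `output` are hypothetical callables standing in for the recognition unit 15, the learning processing unit 16, the optional offline update of step S36, and the display output, and `frame_step` models acquiring an image once every predetermined number of frames.

```python
import itertools


def run_tracking(frames, recognize, learn_online, output,
                 update_offline=None, frame_step=1):
    """Process frames in order, carrying the online dictionary from frame to frame."""
    online_dict = None  # no online learning result exists before the first frame
    # Frame acquisition (step S38 for every frame after the first).
    for frame in itertools.islice(frames, 0, None, frame_step):
        # Step S34: recognize using the previous frame's learning result.
        result = recognize(frame, online_dict)
        # Step S35: online learning yields the dictionary for the next frame.
        online_dict = learn_online(frame, result, online_dict)
        # Step S36 (optional): feed the result back into the offline dictionary.
        if update_offline is not None:
            update_offline(online_dict)
        # Step S37: output the recognition result.
        output(result)
    return online_dict


if __name__ == "__main__":
    shown = []
    final = run_tracking(
        ["a", "b", "c", "d", "e"],
        recognize=lambda f, d: (f, d),
        learn_online=lambda f, r, d: f,
        output=shown.append,
        frame_step=2,
    )
    print(final, shown)
```

The stub callables in the usage example only show how the dictionary produced for one frame is passed to recognition of the next acquired frame.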

In the above description, the recognition object is a hand, which is part of the user's body. However, the recognition object may be anything whose shape or state changes over time, such as a part of the body other than the hand, the entire body, or an object other than the user's body.

The processing described above provides the following effects.
(1) Online learning over the entire feature space is difficult, but by performing learning that uses prior knowledge of the recognition object (offline learning), a recognition object whose shape and appearance change greatly can be recognized quickly and robustly.
(2) In offline learning, a score representing the goodness of each feature amount of the recognition object, or the goodness of a feature amount set that combines a plurality of feature amounts, can be evaluated using its temporal distribution.
(3) In online learning, feature amounts suited to learning are selected and computed based on the offline learning result, so learning is fast. Because feature amounts specialized for the recognition target are used, learning is also robust.
(4) Since the online learning result is fed back to the offline learning result (offline dictionary), a dictionary for a specific recognition object, that is, a personalized dictionary, can be generated.
(5) Since a recognition object whose shape and appearance change greatly can be recognized quickly and robustly, the technology is easy to apply to gesture recognition applications.

The present technology can be applied, for example, to the case where an information processing apparatus such as a television receiver or a personal computer is remotely operated by gestures.

<7. Other>

Embodiments of the present technology are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present technology.

For example, the present technology can adopt a cloud computing configuration in which a single function is shared and processed jointly by a plurality of devices via a network.

In addition, each step described in the above flowcharts can be executed by a single device or shared among a plurality of devices.

Furthermore, when a single step includes a plurality of processes, those processes can be executed by a single device or shared among a plurality of devices.

The present technology can also take the following configurations.
(1) An information processing apparatus including:
a recognition unit that recognizes a recognition object from an input image; and
a learning unit that learns a feature amount of the recognition object for each frame using feature amounts having scores that change in time series,
wherein the recognition unit recognizes the recognition object in a current frame based on the feature amount of a previous frame obtained as a learning result of the learning unit.
(2) The information processing apparatus according to (1), further including:
a pre-learning unit that learns in advance the feature amount of the recognition object for each frame from a learning image including the recognition object whose state changes in time series; and
a calculation unit that calculates in advance the score of the feature amount for each frame obtained as a learning result of the pre-learning unit,
wherein the learning unit learns the feature amount of the recognition object for each frame of the input image using the feature amount corresponding to the score calculated in advance by the calculation unit.
(3) The information processing apparatus according to (2), further including a selection unit that selects, from the feature amounts corresponding to the current frame and having the scores calculated in advance by the calculation unit, the feature amount having a score higher than a predetermined threshold,
wherein the learning unit learns the feature amount of the recognition object in the current frame using the feature amount selected by the selection unit.
(4) The information processing apparatus according to any one of (1) to (3), further including:
a storage unit that stores the feature amount obtained as a learning result of the learning unit; and
an update unit that updates the feature amount stored in the storage unit for each frame according to the learning result of the learning unit,
wherein the recognition unit recognizes the recognition object in the current frame based on the feature amount of the previous frame updated by the update unit.
(5) The information processing apparatus according to any one of (1) to (4),
wherein the selection unit selects, from the feature amounts of the previous frame stored in the storage unit, the feature amount having a score lower than another threshold, and
the learning unit learns the feature amount of the recognition object in the current frame using the feature amounts excluding the feature amount selected by the selection unit.
(6) The information processing apparatus according to (4) or (5),
wherein the update unit updates the feature amount obtained as a learning result of the pre-learning unit for each frame according to the learning result of the learning unit.
(7) An information processing method of an information processing apparatus including
a recognition unit that recognizes a recognition object from an input image, and
a learning unit that learns a feature amount of the recognition object for each frame using feature amounts having scores that change in time series,
the method including steps in which the information processing apparatus:
learns the feature amount of the recognition object for each frame using the feature amounts having scores that change in time series; and
recognizes the recognition object in a current frame based on the feature amount of a previous frame obtained as a learning result of the learning unit.
(8) A program causing a computer to execute processing including:
a recognition step of recognizing a recognition object from an input image; and
a learning step of learning a feature amount of the recognition object for each frame using feature amounts having scores that change in time series,
wherein the recognition step recognizes the recognition object in a current frame based on the feature amount of a previous frame obtained as a learning result of the learning step.

DESCRIPTION OF SYMBOLS: 1 information processing apparatus, 2 offline processing unit, 3 online processing unit, 11 learning data storage unit, 12 learning processing unit, 13 offline dictionary storage unit, 14 input unit, 15 recognition unit, 16 learning processing unit, 21 learning unit, 22 calculation unit, 23 selection unit, 24 learning unit, 25 storage unit, 26 update unit

Claims

An information processing apparatus comprising:
a recognition unit that recognizes a recognition object from an input image; and
a learning unit that learns a feature amount of the recognition object for each frame using feature amounts having scores that change in time series,
wherein the recognition unit recognizes the recognition object in a current frame based on the feature amount of a previous frame obtained as a learning result of the learning unit.

The information processing apparatus according to claim 1, further comprising:
a pre-learning unit that learns in advance the feature amount of the recognition object for each frame from a learning image including the recognition object whose state changes in time series; and
a calculation unit that calculates in advance the score of the feature amount for each frame obtained as a learning result of the pre-learning unit,
wherein the learning unit learns the feature amount of the recognition object for each frame of the input image using the feature amount corresponding to the score calculated in advance by the calculation unit.

The information processing apparatus according to claim 2, further comprising a selection unit that selects, from the feature amounts corresponding to the current frame and having the scores calculated in advance by the calculation unit, the feature amount having a score higher than a predetermined threshold,
wherein the learning unit learns the feature amount of the recognition object in the current frame using the feature amount selected by the selection unit.

The information processing apparatus according to claim 3, further comprising:
a storage unit that stores the feature amount obtained as a learning result of the learning unit; and
an update unit that updates the feature amount stored in the storage unit for each frame according to the learning result of the learning unit,
wherein the recognition unit recognizes the recognition object in the current frame based on the feature amount of the previous frame updated by the update unit.

The information processing apparatus according to claim 4,
wherein the selection unit selects, from the feature amounts of the previous frame stored in the storage unit, the feature amount having a score lower than another threshold, and
the learning unit learns the feature amount of the recognition object in the current frame using the feature amounts excluding the feature amount selected by the selection unit.

The information processing apparatus according to claim 4, wherein the update unit updates the feature amount obtained as a learning result of the pre-learning unit for each frame according to the learning result of the learning unit.

An information processing method of an information processing apparatus including
a recognition unit that recognizes a recognition object from an input image, and
a learning unit that learns a feature amount of the recognition object for each frame using feature amounts having scores that change in time series,
the method comprising steps in which the information processing apparatus:
learns the feature amount of the recognition object for each frame using the feature amounts having scores that change in time series; and
recognizes the recognition object in a current frame based on the feature amount of a previous frame obtained as a learning result of the learning unit.

A program causing a computer to execute processing comprising:
a recognition step of recognizing a recognition object from an input image; and
a learning step of learning a feature amount of the recognition object for each frame using feature amounts having scores that change in time series,
wherein the recognition step recognizes the recognition object in a current frame based on the feature amount of a previous frame obtained as a learning result of the learning step.