JP2015052994A

JP2015052994A - Feature selection device, learning device, method, and program

Info

Publication number: JP2015052994A
Application number: JP2013186415A
Authority: JP
Inventors: Minoru Mori (森 稔); Seiichi Uchida (内田 誠一)
Original assignee: Kyushu University NUC; Nippon Telegraph and Telephone Corp
Current assignee: Kyushu University NUC; Nippon Telegraph and Telephone Corp
Priority date: 2013-09-09
Filing date: 2013-09-09
Publication date: 2015-03-19

Abstract

PROBLEM TO BE SOLVED: To select features effective for identification while maintaining time-order consistency among the features. SOLUTION: A feature extraction unit 32 extracts, from each of a plurality of learning data, a feature defined by the values at two times, for each pair of times in the learning data. Each learning datum is time-series data expressed by a value at each time and is labeled in advance with the class it represents. An initial feature selection unit 342 selects, for each class, at least one pair of times whose feature is effective for identifying the class, based on the features of the pairs of times and the classes assigned to the learning data. A time-order feature selection unit 344 selects, for each class, from the pairs of times selected for that class, the pairs of times whose features are used to identify whether time-series data to be identified represents the class, such that time-order consistency among the features is maintained.

Description

The present invention relates to a feature selection device, a learning device, a method, and a program.

Conventionally, objects such as characters and gestures represented by time-series data have been recognized, as in online handwritten character recognition and gesture recognition. In one representative technique for recognizing time-series data, time-series patterns are generated for both the standard pattern and the input pattern by concatenating all character strokes or motion trajectories in time order, and recognition is performed by DP matching (matching by dynamic programming) between the pen-point coordinate sequences of the online characters or between the motion feature points at each sampling time of the gestures (Non-Patent Document 1).

As features for DP matching, some techniques use not only the coordinates of the feature point at each sampling time but also the local direction, defined as the relative coordinate value given by the difference between feature points at adjacent sampling times.

Furthermore, a technique has been proposed that uses global features, whose relative coordinate values include differences between feature points not only at adjacent sampling times but also at sampling times further apart (Non-Patent Document 2).

In addition, to perform DP matching with global features, a technique has been proposed that, starting from a preset start position, selects a plurality of features in time order using an index of feature effectiveness so that time-order consistency is maintained (Non-Patent Document 3).

Non-Patent Document 1: Yukio Sato, Hidetsuna Adachi, "Online Recognition of Scribbled Characters," IEICE Transactions (D), Vol. J68-D, No. 12, pp. 2116-2122
Non-Patent Document 2: Minoru Mori, Seiichi Uchida, Sakano et al., "Online Number Recognition Using Global Structure Information," IEICE Technical Report, Pattern Recognition and Media Understanding (PRMU), vol. 111, no. 317, PRMU2011-102, pp. 19-24, Nov. 2011
Non-Patent Document 3: Minoru Mori, Seiichi Uchida, Sakano et al., "DP Matching for Global Features," IEICE Transactions (D), Vol. J96-D, No. 7, pp. 1654-1657

However, DP matching that uses the feature points at each sampling time, as in the technique of Non-Patent Document 1, can absorb partial coordinate shifts but cannot absorb large coordinate variations, so the input may be misrecognized as another pattern.

DP matching that uses the local direction is more robust to shifts in coordinate values, but because it does not use the feature points at each sampling time, the input may be misrecognized as a different pattern that has a similar shape at different coordinates.

A technique that uses global features, as in Non-Patent Document 2, improves robustness to larger coordinate variations by using global structural information. However, because the features are not arranged in time series, recognition techniques that tolerate variation in the time-series direction, such as DP matching, cannot be applied.

The technique of Non-Patent Document 3 makes it possible to apply global features to DP matching, but the start position of the DP matching must be decided top-down, and features are selected so as to maintain consistency in time order from that start position. The selectable features are therefore limited, and features effective for identification are not necessarily selected.

The present invention has been made to solve the above problems, and an object thereof is to provide a feature selection device, method, and program capable of selecting features effective for identification while maintaining time-order consistency among the features.

Another object is to provide a learning device capable of obtaining features for accurate identification while maintaining time-order consistency among the features.

To achieve the above object, the feature selection device of the first invention includes: a feature extraction unit that extracts, from each of a plurality of learning data to which the class represented by time-series data expressed by a value at each time is assigned in advance, a feature defined by the values at two times, for each pair of times in the learning data; an initial feature selection unit that selects, for each of the plurality of classes, a plurality of pairs of times whose features are effective for identifying the class, based on the features of the pairs of times extracted from each of the learning data by the feature extraction unit and on the class assigned to each of the learning data; and a time-order feature selection unit that selects, for each class, as the pairs of times of the features used to identify whether time-series data to be identified represents the class, pairs of times from those selected for the class by the initial feature selection unit such that time-order consistency among the features is maintained.

The feature selection method of the second invention is a feature selection method in a feature selection device including a feature extraction unit, an initial feature selection unit, and a time-order feature selection unit. The feature extraction unit extracts, from each of a plurality of learning data to which the class represented by time-series data expressed by a value at each time is assigned in advance, a feature defined by the values at two times, for each pair of times in the learning data. The initial feature selection unit selects, for each class, at least one pair of times whose feature is effective for identifying the class, based on the features of the pairs of times extracted from each of the learning data by the feature extraction unit and on the class assigned to each of the learning data. The time-order feature selection unit selects, for each class, as the pairs of times of the features used to identify whether time-series data to be identified represents the class, pairs of times from those selected for the class by the initial feature selection unit such that time-order consistency among the features is maintained.

The learning device of the third invention includes: the feature selection device according to claim 1; and a selection feature extraction unit that, for each class, extracts from each of the learning data to which the class is assigned in advance the features of the learning data for the pairs of times selected for the class by the time-order feature selection unit, as features used to identify whether time-series data to be identified represents the class.

The program of the present invention is a program for causing a computer to function as each unit of the above feature selection device.

As described above, the feature selection device, method, and program can select features effective for identification while maintaining time-order consistency among the features.

The learning device can obtain features for accurate identification while maintaining time-order consistency among the features.

FIG. 1 is a block diagram showing the functional configuration of the feature selection device according to the embodiment of the present invention.
FIG. 2 is a schematic diagram showing an example of global features.
FIG. 3 is a schematic diagram for explaining DP matching.
FIG. 4 is a schematic diagram showing an example of global features selected with time-order consistency maintained and global features selected without it.
FIG. 5 is a schematic diagram for explaining the selection of features.
FIG. 6 is a flowchart showing the learning processing routine in the feature selection device according to the embodiment of the present invention.
FIG. 7 is a flowchart showing the identification processing routine in the feature selection device according to the embodiment of the present invention.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings. This embodiment describes an example in which the present invention is applied to a character identification device that identifies online characters. An online character is handwriting expressed as a pen-point coordinate sequence for each stroke, that is, time-series data in which a character pattern is expressed by the pen-point coordinate values at each sampling time.

<Configuration of the Feature Selection Device According to This Embodiment>
First, the configuration of the feature selection device according to the embodiment of the present invention will be described. As shown in FIG. 1, the feature selection device 100 according to the embodiment can be implemented by a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing a learning processing routine and an identification processing routine described later. Functionally, as shown in FIG. 1, the feature selection device 100 includes an input unit 10, a calculation unit 20, and an output unit 50.

The input unit 10 receives, from an input device such as a keyboard, online character data for learning whose character class is known (hereinafter, "learning data"). The input unit 10 also receives, from an input device such as a keyboard, identification target data, which is time-series data whose character class is unknown. The input unit 10 may instead receive data input from outside via a network or the like. In this embodiment, the ten digit classes (0 to 9) are handled as learning data, so a plurality of learning data is received for each class.

The calculation unit 20 includes a selection feature storage unit 22, a learning unit 30, and an identification unit 40.

The selection feature storage unit 22 stores, for each learning datum, the selected features extracted by the learning unit 30.

The learning unit 30 includes a feature extraction unit 32 and a feature selection unit 34.

For each learning datum received by the input unit 10 and labeled with one of the ten classes, the feature extraction unit 32 extracts, as features, the relative vectors connecting any two points on the stroke. A pair of two points is an example of a pair of two times.

Specifically, the size of the stroke represented by the learning data is normalized, and the data is resampled into time-series data consisting of N feature points. Here, resampling means fixing the number of feature points assigned to one character stroke and then sampling the feature points. If feature points on a stroke written in real time were taken at fixed time intervals, the number of feature points per character would vary because writing speed differs from person to person. For example, with N feature points per character stroke, resampling is performed so that the feature points are spaced at a constant distance. N must be large enough to represent the character shape adequately.
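The normalization and equal-distance resampling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `normalize_and_resample`, the choice of scaling by the longer side of the bounding box, and the use of linear interpolation between pen points are all assumptions.

```python
import math

def normalize_and_resample(points, n_points):
    """Scale a stroke to unit size and resample it to n_points placed at
    equal arc-length intervals (needs >= 2 input points, n_points >= 2)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)
    # Size normalization (assumed form): divide by the longer bounding-box side.
    scale = max(max(xs) - x0, max(ys) - y0) or 1.0
    pts = [((x - x0) / scale, (y - y0) / scale) for x, y in points]
    # Cumulative arc length at each original pen point.
    arc = [0.0]
    for (ax, ay), (bx, by) in zip(pts, pts[1:]):
        arc.append(arc[-1] + math.hypot(bx - ax, by - ay))
    resampled, i = [], 0
    for k in range(n_points):
        target = arc[-1] * k / (n_points - 1)
        # Advance to the segment that contains the target arc length.
        while i < len(pts) - 2 and arc[i + 1] < target:
            i += 1
        span = arc[i + 1] - arc[i]
        r = 0.0 if span == 0 else (target - arc[i]) / span
        resampled.append((pts[i][0] + r * (pts[i + 1][0] - pts[i][0]),
                          pts[i][1] + r * (pts[i + 1][1] - pts[i][1])))
    return resampled
```

Because the targets are spaced by distance rather than by time, a slow writer and a fast writer who trace the same shape end up with the same N points.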

Next, the relative vector (dx, dy) defined between two feature points is extracted as a feature. Here, the first-dimension feature, ..., the j-th dimension feature, ..., and the J-th dimension feature are extracted, where J = N × (N - 1) / 2 is the number of ways to choose two of the N feature points. Thus, this embodiment extracts global features represented by the relative vectors between any two points on the stroke, including relative vectors between adjacent points and between non-adjacent points. FIG. 2 shows an example of the global features at a feature point P_n, where P_n denotes the n-th of the N feature points obtained by resampling. For example, if the coordinates of feature point P_n are (x_n, y_n) and those of feature point P_N are (x_N, y_N), the relative vector between the two points (P_n → P_N) is expressed by the relative position of P_N with respect to P_n in the x-axis direction, dx = x_n - x_N, and in the y-axis direction, dy = y_n - y_N.
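As a small illustration of the extraction step, the sketch below enumerates every pair of resampled points and computes its relative vector with the sign convention given in the text (dx = x_n - x_m, dy = y_n - y_m). The function name `global_features` and the 0-based indexing are assumptions made for the example.

```python
from itertools import combinations

def global_features(points):
    """Return {(n, m): (dx, dy)} for every pair n < m of feature points,
    with dx = x_n - x_m and dy = y_n - y_m as in the text."""
    feats = {}
    for n, m in combinations(range(len(points)), 2):
        feats[(n, m)] = (points[n][0] - points[m][0],
                         points[n][1] - points[m][1])
    return feats
```

For N points the dictionary holds N × (N - 1) / 2 entries, matching the dimension count J.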

Based on the features extracted from each learning datum by the feature extraction unit 32, the feature selection unit 34 selects, for each class, features that are effective for identification and that maintain time-order consistency, so that they can be applied to the identification processing unit 44 (described in detail later), which performs identification processing that tolerates variation in the time-series direction.

The reason the feature selection unit 34 selects features that maintain time-order consistency is as follows. The description here takes DP matching as the identification processing, performed by the identification processing unit 44, that tolerates variation in the time-series direction. "Identification processing that tolerates variation in the time-series direction" means identification processing that can establish and evaluate correspondences even when the time-series data is stretched or compressed in the time-series direction at some point. In other words, when two time-series data sampled at a fixed time interval or a fixed distance interval are compared, the processing is applicable even if one consists of M points and the other of N points (N > M or N < M).

In DP matching using global features, as shown in FIG. 3, the positions in the input data that correspond to the global features on the reference data are optimized sequentially. In the algorithm for optimal correspondence of global features by DP matching, the set of variables to be optimized is {(s(k), t(k)) | k = 1, ..., K}, as shown in FIG. 3. Here, s(k) and t(k) are the points on the input data corresponding to the two points S_k and T_k that define the k-th global feature selected from the reference data. Consider determining these variables one after another in the order (s(1), t(1)), ..., (s(k), t(k)), ..., (s(K), t(K)). The objective function is the sum of the K Euclidean distances between corresponding global features, and the value of each variable, that is, the correspondence, is determined so as to minimize the objective function.

When the global features are selected as shown in FIG. 4(a), that is, when the sequence of global features is totally ordered so that the indices (times) of the feature points at the start points and end points of the relative vectors increase or decrease monotonically, the determination of (s(k), t(k)) is affected only by the value of (s(k-1), t(k-1)). This decision process is therefore a Markov decision process with time-order consistency, and its optimal solution can be found by DP matching. In contrast, as shown in FIG. 4(b), when the selected features are arranged in sequence and the order of the start-point feature points reverses partway, the order of the end-point feature points reverses partway, or the order of the start points and the order of the end points cross each other, time-order consistency is not maintained and the correspondence cannot be optimized by the above DP matching. This is because values other than those at k-1 affect the decision at k, so the process is no longer a Markov decision process with time-order consistency.
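To make the Markov structure concrete, here is a hedged sketch of DP matching over global features. It is not the patent's algorithm as specified: the function name `dp_match`, the constraints s(k) >= s(k-1), t(k) >= t(k-1), and s(k) < t(k), and the assumption that the reference vectors are already sorted in time order are all illustrative choices. The cost at step k depends only on the best cost reachable at step k-1, which is what the running minimum table exploits.

```python
import math

def dp_match(ref_vectors, pts):
    """Minimum total Euclidean distance between K reference relative vectors
    and input relative vectors (x_s - x_t, y_s - y_t), with non-decreasing
    correspondences s(k), t(k) and s(k) < t(k) (assumed constraints)."""
    m = len(pts)
    # best[s][t]: lowest cost of the previous step over all s' <= s, t' <= t.
    best = [[0.0] * m for _ in range(m)]
    for rx, ry in ref_vectors:
        cur = [[math.inf] * m for _ in range(m)]
        for s in range(m):
            for t in range(s + 1, m):
                dx = pts[s][0] - pts[t][0]
                dy = pts[s][1] - pts[t][1]
                cur[s][t] = best[s][t] + math.hypot(rx - dx, ry - dy)
        # Turn cur into a 2-D running minimum so the next step can look up
        # its best predecessor in constant time.
        for s in range(m):
            for t in range(m):
                if s > 0:
                    cur[s][t] = min(cur[s][t], cur[s - 1][t])
                if t > 0:
                    cur[s][t] = min(cur[s][t], cur[s][t - 1])
        best = cur
    return best[m - 1][m - 1]
```

If the reference features were not consistent in time order, the predecessor of step k could no longer be summarized by the single table `best`, which is the failure case the text describes for FIG. 4(b).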

For the above reasons, the feature selection unit 34 must select features that maintain time-order consistency. To satisfy this condition, the technique of Non-Patent Document 3 decides the position at which feature selection starts in advance and selects features in order from that start position based on an index of effectiveness for identification. As a result, the start position restricts the selectable features, and it is difficult to decide which position is an appropriate start position (see Non-Patent Document 3).

This embodiment adopts a method that, unlike the technique of Non-Patent Document 3, does not require a start position to be set in advance and can select features more effective for identification.

The method of this embodiment uses a feature selection technique that can select features more effective for identification but does not by itself maintain consistency in time order, and adds a new restriction to the selection condition. This maintains consistency in time order, removes the need to set a selection start position in advance, and allows effective features to be selected independently of any start position.

The feature selection unit 34 includes an initial feature selection unit 342, a time-order feature selection unit 344, and a selection feature extraction unit 346. FIG. 5 shows an example of the process by which the feature selection unit 34 selects features.

For each class, the initial feature selection unit 342 selects, from the features extracted by the feature extraction unit 32, point pairs whose features are effective for identifying the class, as initial features. In this embodiment, the initial feature selection uses boosting: for each class, based on the learning data of that class and the learning data assigned to the other classes, point pairs whose features are effective for identifying the class are selected from the point-pair features extracted by the feature extraction unit 32. For the boosting, the well-known AdaBoost (Adaptive Boosting) method can be used, for example (see Non-Patent Document 4: Y. Freund and R. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, 1997). From the viewpoint of minimizing the misclassification rate, AdaBoost successively selects new point pairs whose features compensate for the weaknesses of the point pairs already selected, so a plurality of point pairs can be selected as initial features in descending order of effectiveness.

Because boosting operates by comparing two classes, some adaptation is needed for a task such as that of this embodiment, in which more than two classes are to be identified. Since this embodiment identifies the ten digit classes, the data is split into class "0" versus the other classes, class "1" versus the other classes, and so on, in ten ways for 0 to 9, and initial features are selected for the digit of each class. In each round of boosting, one point pair is selected as an initial feature for the class; in the second round, for example, the point pair with the second-highest effectiveness is selected.
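The one-versus-rest use of boosting can be sketched as below. This is a simplified discrete AdaBoost written for illustration only: the text does not specify the weak learner, so the threshold stump on a single scalar feature value, the rule that each round must pick a feature not chosen before, and the names `one_vs_rest_labels`, `stump`, and `adaboost_select` are all assumptions. In the patent each feature is a two-dimensional relative vector; mapping a scalar column back to its point pair is left to the caller.

```python
import math

def one_vs_rest_labels(classes, target):
    """Relabel multi-class data as target (+1) versus all other classes (-1)."""
    return [1 if c == target else -1 for c in classes]

def stump(value, threshold, polarity):
    """Threshold weak learner (an assumption; the text names none)."""
    return polarity if value >= threshold else -polarity

def adaboost_select(X, y, rounds):
    """Return the feature index chosen in each boosting round.
    X: samples as lists of scalar feature values; y: labels in {+1, -1}."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    chosen = []
    for _ in range(min(rounds, d)):
        best = None  # (weighted error, feature index, threshold, polarity)
        for j in range(d):
            if j in chosen:
                continue  # assumption: each round picks a new feature
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, x, yi in zip(w, X, y)
                              if stump(x[j], thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.append(j)
        # Raise the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump(x[j], thr, pol))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen
```

Running `adaboost_select` once per digit, each time with labels from `one_vs_rest_labels`, yields one ranked candidate list per class, which is the input the time-order selection step expects.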

However, as described above, feature selection techniques such as boosting select features in descending order of effectiveness for identification, using minimization of the misclassification rate as the criterion. Consistency in time order is therefore not maintained, and the selected features cannot be applied as they are to techniques such as DP matching.

For each class, the time-order feature selection unit 344 determines whether the point pair selected by the initial feature selection unit 342 as an initial feature of the class is consistent in time order with the point pairs already selected for the class. If the initial feature is determined to be consistent in time order with the point pairs already selected for the class, it is adopted as a selected point pair of the class; otherwise it is rejected. The adopted point pairs are stored in a memory (not shown) for each class. The initial feature judged most effective by the initial feature selection unit 342 is adopted and stored as a selected point pair without the consistency determination. The processing of the initial feature selection unit 342 and that of the time-order feature selection unit 344 are repeated until all features selected as initial features by the boosting have been processed.

Specifically, a point pair selected as an initial feature of a class is regarded as consistent in time order with the point pairs already selected for the class when the candidate satisfies expression (1) with respect to every selected point pair. Here, S_p and T_p denote the two points of a selected feature, and S_q and T_q denote the two points of the initial feature under determination.
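The accept-or-reject loop can be sketched as follows. Expression (1) is not reproduced in this text, so the pairwise test in `is_consistent` is an assumed stand-in derived from the description of FIG. 4: two features are treated as consistent when their start indices and end indices do not move in opposite directions, that is, (S_p - S_q) × (T_p - T_q) >= 0. The function names are hypothetical.

```python
def is_consistent(p, q):
    """Assumed stand-in for expression (1): features p = (S_p, T_p) and
    q = (S_q, T_q) are consistent if start and end indices do not cross."""
    return (p[0] - q[0]) * (p[1] - q[1]) >= 0

def select_time_ordered(ranked_pairs):
    """ranked_pairs: point pairs in decreasing order of effectiveness
    (for example, the order in which boosting picked them).
    Returns the pairs accepted as selected features."""
    selected = []
    for cand in ranked_pairs:
        # The most effective candidate is accepted without a check,
        # because all() over an empty list is True.
        if all(is_consistent(sel, cand) for sel in selected):
            selected.append(cand)
    return selected
```

With the ranked input `[(3, 8), (1, 5), (4, 6), (5, 9)]`, the pair `(4, 6)` is rejected because its start lies after that of `(3, 8)` while its end lies before it; the other three are kept and can be sorted into a single monotone sequence. No start position has to be chosen in advance, since the first accepted pair is simply the most effective one.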

For each class and for each learning datum belonging to the class, the selection feature extraction unit 346 extracts from the learning datum the feature (relative vector) for each point pair selected for the class by the time-order feature selection unit 344, and stores it in the selection feature storage unit 22 as a selected feature.

The identification unit 40 includes a feature extraction unit 42 and an identification processing unit 44.

Like the feature extraction unit 32, the feature extraction unit 42 extracts the feature between every pair of points from the identification target data received by the input unit 10, which is time-series data whose character class is unknown.

The identification processing unit 44 includes a similarity value calculation unit 442 and a determination unit 444.

For each piece of learning data stored in the selected feature storage unit 22, the similarity value calculation unit 442 calculates a similarity value between the identification target data and that learning data using a technique such as DP matching, based on the features extracted from the identification target data by the feature extraction unit 42 and the selected features of that learning data stored in the selected feature storage unit 22.
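The text names DP matching only as one possible technique and does not specify its details. As a rough sketch, assuming the textbook dynamic-time-warping recurrence with Euclidean local cost (both are assumptions, as are the names `dp_distance` and `similarity`), a similarity value between two sequences of feature vectors could be computed like this:

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dp_distance(a: List[Vector], b: List[Vector]) -> float:
    """Classic dynamic-time-warping cost between two non-empty vector sequences."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    inf = math.inf
    # cost[i][j]: best accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)]


def similarity(a: List[Vector], b: List[Vector]) -> float:
    """Turn the DP cost into a similarity value: higher means more alike."""
    return -dp_distance(a, b)


slow = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
fast = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(dp_distance(slow, fast))
```

Because the recurrence only ever moves forward in both sequences, it presupposes that the features being compared are arranged in a consistent time order.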

Based on the similarity values calculated for each piece of learning data by the similarity value calculation unit 442, the determination unit 444 extracts the learning data with the highest similarity value, determines the class of that learning data to be the class of the identification target data, and outputs the determined class to the output unit 50 as the identification result.

<Operation of the Feature Selection Device According to This Embodiment>
Next, the operation of the feature selection device 100 according to the embodiment of the present invention will be described. When learning data, each piece labeled with one of the ten digit classes (0 to 9), is input through the input unit 10, the feature selection device 100 executes the learning processing routine shown in FIG. 6.

First, in step S100, the learning data for each of the ten digit classes (0 to 9) input through the input unit 10 is accepted.

Next, in step S101, for each piece of learning data acquired by the input unit 10, the size of the character pattern is normalized and the data is resampled into time-series data consisting of N feature points.
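The text does not say how the normalization or the resampling is carried out. One common way, shown here purely as an assumption, is to scale the bounding box to unit size and then place N points at equal arc-length intervals along the pen trajectory. The function names are hypothetical, and the sketch treats the pattern as a single polyline.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_size(points: List[Point]) -> List[Point]:
    """Translate to the origin and scale so the longer bounding-box side is 1."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0  # avoid division by zero
    return [((x - min_x) / scale, (y - min_y) / scale) for x, y in points]


def resample(points: List[Point], n: int) -> List[Point]:
    """Return n points spaced evenly along the polyline through `points`."""
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two input points and n >= 2")
    # Cumulative arc length at each input point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out: List[Point] = []
    seg = 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


stroke = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
print(resample(normalize_size(stroke), 5))
```

Equal arc-length spacing makes the N feature points comparable across samples written at different speeds, which is one reason it is often paired with DP-based matching.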

Next, in step S102, for each piece of learning data resampled in step S101, the relative vector (dx, dy) defined between every pair of feature points is extracted as a feature (global feature).
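Step S102 can be illustrated with a few lines of code. This is a sketch under assumptions: the function name `relative_vectors` is hypothetical, and the choice to index each pair as (s, t) with s < t and to take the vector from point s to point t is made for the example rather than stated in the text.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Pair = Tuple[int, int]


def relative_vectors(points: List[Point]) -> Dict[Pair, Point]:
    """Map each index pair (s, t) with s < t to the vector (dx, dy) from point s to point t."""
    return {
        (s, t): (points[t][0] - points[s][0], points[t][1] - points[s][1])
        for s, t in combinations(range(len(points)), 2)
    }


sample = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
features = relative_vectors(sample)
print(features)
```

With N feature points this yields N(N-1)/2 candidate features per sample, which is why a later selection step is needed.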

Next, in step S104, the class to be processed is selected.

Next, in step S106, an initial feature of the class to be processed is selected based on three inputs: the features extracted in step S102 from each piece of learning data belonging to the class selected in step S104; the features extracted in step S102 from each piece of learning data belonging to the classes not selected in step S104; and the initial features of the same class selected in previous executions of step S106.

Next, in step S112, it is determined whether the initial feature selected in step S106 satisfies time-order consistency with the selected-feature point pairs obtained for the class to be processed in the processing so far. If it does, the process proceeds to step S114; if it does not, the process proceeds to step S116. When this check is reached for the first time for the class to be processed, the process proceeds to step S114.

Next, in step S114, the initial feature selected in step S106 is stored in the memory (not shown) as a selected-feature point pair of the class to be processed.

Next, in step S116, it is determined whether processing has been completed for all features that can be selected as initial features for the class to be processed. If it has, the process proceeds to step S118; if it has not, the process returns to step S106 and steps S106 to S116 are repeated.

In step S118, for each piece of learning data belonging to the class to be processed, the feature (relative vector) of that learning data is extracted at each of the selected-feature point pairs obtained in step S114 and stored in the selected feature storage unit 22 as a selected feature.

Next, in step S120, it is determined whether processing has been completed for all classes. If it has, the routine ends; if it has not, the process returns to step S104, the class to be processed is changed, and steps S104 to S120 are repeated.

Next, the identification processing routine of the feature selection device 100 according to the embodiment of the present invention will be described. When identification target data, which is online character data of unknown class, is input through the input unit 10, the feature selection device 100 executes the identification processing routine shown in FIG. 7.

First, in step S200, the identification target data input through the input unit 10 is accepted.

Next, in step S202, the selected features of each piece of learning data stored in the selected feature storage unit 22 are read.

Next, in step S204, the features of the identification target data acquired in step S200 are extracted in the same way as in step S102.

Next, in step S206, the learning data to be processed is selected.

Next, in step S208, a similarity value between the identification target data and the learning data selected in step S206 is calculated using a technique such as DP matching, based on the selected features of that learning data read in step S202 and the features extracted in step S204.

Next, in step S210, it is determined whether processing has been completed for all target learning data. If it has, the process proceeds to step S212; if it has not, the process returns to step S206 and steps S206 to S210 are repeated.

Next, in step S212, based on the similarity values to each piece of learning data calculated in step S208, the learning data with the highest similarity value is extracted, and the class to which that learning data belongs is identified as the class of the identification target data.
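Steps S206 to S212 amount to a nearest-neighbour decision over the stored learning data. The following sketch shows that control flow only; the function name `classify`, the label type, and the toy similarity used in the example are assumptions made for illustration.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Sample = TypeVar("Sample")


def classify(
    query: Sample,
    training: Iterable[Tuple[Sample, str]],
    similarity: Callable[[Sample, Sample], float],
) -> str:
    """Return the class label of the training sample most similar to `query`."""
    best_label, best_score = None, float("-inf")
    for sample, label in training:         # S206 / S210: visit every learning sample
        score = similarity(query, sample)  # S208: similarity value
        if score > best_score:             # S212: keep the highest value seen so far
            best_label, best_score = label, score
    if best_label is None:
        raise ValueError("training set is empty")
    return best_label


# Toy data: scalar "samples" and a similarity that is higher for closer numbers.
toy_training = [(1.0, "1"), (7.0, "7")]
print(classify(6.2, toy_training, lambda a, b: -abs(a - b)))
```

In the device described here, the similarity callable would be the DP-matching-based value computed in step S208.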

Next, in step S214, the result identified in step S212 is output to the output unit 50, and the routine ends.

As described above, the feature selection device according to the embodiment of the present invention can select features that are effective for identification while maintaining time-order consistency among the features.

It can also obtain features for accurate identification while maintaining time-order consistency among the features.

In addition, when an identification method that tolerates variation along the time axis, such as DP matching, is applied to features that include global features, features that are more effective for identification can be selected.

Furthermore, by selecting features that are more effective for identification from the multiple global features extracted from the time-series data, while also choosing them so that time-order consistency is maintained, an identification method that tolerates variation along the time axis can be applied in the identification processing even when features including global features are used, and not only the coordinate values of each feature point or changes in local direction. That is, identification remains possible even when the time-series data is stretched or compressed along the time axis.

Also, unlike Non-Patent Document 3, there is no need to set a selection start position in advance, so features that are more effective for identification can be selected without being constrained by the start position.

Also, by repeating the process of choosing selected features from the initial features, features that are both effective for identification and time-order consistent can be selected from among the identification-effective features extracted from the learning data.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, this embodiment describes a boosting process, specifically one using the AdaBoost method, as the initial feature selection process. The invention is not limited to this; other boosting methods, or feature selection methods other than boosting, may be applied.

This embodiment describes the case where online character data is input as the time-series data and character patterns are identified. The invention is not limited to this and can also be applied to other time-series data, such as gesture identification.

This embodiment describes the case where the learning unit and the identification unit are implemented on the same computer. The invention is not limited to this, and they may be implemented on separate computers.

This embodiment describes the case where all features of the identification target data are used when identifying its class. The invention is not limited to this; of the features extracted from the identification target data, only those corresponding to the same selected features as the learning data may be used to identify the class.

This embodiment describes the case where only global features, each represented by the relative vector between two points, are used as the feature values for pattern identification. Global features represented by relative vectors can absorb coordinate shifts, but misidentification can occur when the relative vectors are similar even though the coordinate positions differ greatly. Coordinate values may therefore be used as feature values together with the global features. When coordinate values are used as feature values, the j-th coordinate value of the i-th sample is written P_{i,j} = (P_{i,j,x}, P_{i,j,y}) and can be handled in the same way as the global features of the embodiment above. In this case too, by using the AdaBoost method as in the embodiment above, when a coordinate value itself represents an important structure of the character, the feature value representing that coordinate value is adopted as a weak classifier.
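The trade-off between relative vectors and coordinate values can be seen in a tiny example. The function name and the numbers below are made up for illustration: the same stroke segment is placed at two distant positions, so the relative vectors coincide while the coordinate values do not.

```python
from typing import Tuple

Point = Tuple[float, float]


def relative_vector(p: Point, q: Point) -> Point:
    """Vector (dx, dy) from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])


# The same horizontal segment drawn at two different positions.
segment_a = ((1.0, 1.0), (3.0, 1.0))
segment_b = ((6.0, 8.0), (8.0, 8.0))

print(relative_vector(*segment_a) == relative_vector(*segment_b))  # relative vectors agree
print(segment_a[0] == segment_b[0])                                # start coordinates differ
```

Keeping the coordinate values available as additional candidate features gives the selection step a way to tell such cases apart when position matters.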

This embodiment is described on the assumption that the program is installed in advance. The invention is not limited to this, and the program may also be provided stored on a computer-readable recording medium.

DESCRIPTION OF SYMBOLS
10 Input unit
20 Calculation unit
22 Selected feature storage unit
30 Learning unit
32 Feature extraction unit
34 Feature selection unit
40 Identification unit
42 Feature extraction unit
44 Identification processing unit
50 Output unit
100 Feature selection device
342 Initial feature selection unit
344 Time-order feature selection unit
346 Selected feature extraction unit
442 Similarity value calculation unit
444 Determination unit

Claims

A feature selection device comprising:
a feature extraction unit that extracts, from each of a plurality of pieces of learning data, each being time-series data represented by a value at each time and assigned in advance the class it represents, a feature defined by the values at two times, for each pair of two times of the learning data;
an initial feature selection unit that selects, for each class, the two times of at least one feature effective for identifying the class, based on the features extracted for each pair of two times from each piece of learning data by the feature extraction unit and on the class assigned to each piece of learning data; and
a time-order feature selection unit that selects, for each class, as the two times of the features used to identify whether time-series data to be identified represents the class, two times of features from among the two times of the features selected for the class by the initial feature selection unit, such that time-order consistency among the features is maintained.

A learning device comprising:
the feature selection device according to claim 1; and
a selected feature extraction unit that, for each class, extracts, from each piece of learning data assigned that class in advance, the features of the learning data at the two times of the features selected for the class by the time-order feature selection unit, as features used to identify whether time-series data to be identified represents the class.

A feature selection method in a feature selection device including a feature extraction unit, an initial feature selection unit, and a time-order feature selection unit, wherein:
the feature extraction unit extracts, from each of a plurality of pieces of learning data, each being time-series data represented by a value at each time and assigned in advance the class it represents, a feature defined by the values at two times, for each pair of two times of the learning data;
the initial feature selection unit selects, for each class, the two times of at least one feature effective for identifying the class, based on the features extracted for each pair of two times from each piece of learning data by the feature extraction unit and on the class assigned to each piece of learning data; and
the time-order feature selection unit selects, for each class, as the two times of the features used to identify whether time-series data to be identified represents the class, two times of features from among the two times of the features selected for the class by the initial feature selection unit, such that time-order consistency among the features is maintained.

A program for causing a computer to function as:
a feature extraction unit that extracts, from each of a plurality of pieces of learning data, each being time-series data represented by a value at each time and assigned in advance the class it represents, a feature defined by the values at two times, for each pair of two times of the learning data;
an initial feature selection unit that selects, for each class, the two times of at least one feature effective for identifying the class, based on the features extracted for each pair of two times from each piece of learning data by the feature extraction unit and on the class assigned to each piece of learning data; and
a time-order feature selection unit that selects, for each class, as the two times of the features used to identify whether time-series data to be identified represents the class, two times of features from among the two times of the features selected for the class by the initial feature selection unit, such that time-order consistency among the features is maintained.