JP6676894B2

JP6676894B2 - Information processing apparatus, information processing method and program

Info

Publication number: JP6676894B2
Application number: JP2015146086A
Authority: JP
Inventors: 崇之原
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2015-07-23
Filing date: 2015-07-23
Publication date: 2020-04-08
Anticipated expiration: 2035-07-23
Also published as: JP2017027411A

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

Methods have been devised that construct (learn) a discriminant function from training data consisting of pairs of an input vector and a class label, and then predict the class to which an unknown input vector belongs. One such method is the support vector machine (hereinafter, SVM). An SVM is known to achieve high generalization performance by maximizing the margin between the training samples and the decision surface. In addition, because learning the discriminant function is a quadratic programming problem, global optimality of the solution is guaranteed, so the SVM does not suffer from convergence to local optima as a neural network can.

On the other hand, an SVM has the problem that the processing time required for classification increases in proportion to the number of support vectors obtained by learning. Techniques have therefore been devised to reduce the computational cost of SVM classification. For example, one approach approximates the learned discriminant function with a smaller number of reference vectors, reducing processing time while keeping the loss of classification performance as small as possible.

Non-Patent Document 1 proposes an "integration" method in which two support vectors are integrated and replaced with one new support vector. Non-Patent Document 2 proposes a method that performs "deletion", "projection", and "integration" of support vectors for a multi-class SVM.

"Deletion" of a support vector can change the discriminant function substantially when no support vector with a small weight exists, which makes it difficult to maintain classification accuracy stably. "Projection" updates the weights of all remaining support vectors, so approximating the discriminant function takes processing time. Moreover, when the support vectors are far apart in the feature space, projection can absorb only part of the change in the discriminant function. "Integration" modifies only two support vectors, so its computational cost is lower than that of "projection", and it is unlikely to degrade the discriminant function significantly. In terms of reducing processing time while maintaining classification accuracy, the "integration" approach offers a good balance.

The "integration" approach in the prior art uses, as its evaluation criterion, minimization of the change in the weight vector in the high-dimensional feature space. However, this criterion does not take the distribution of input vectors into account, so the approximation accuracy tends to deteriorate.

The present invention has been made in view of the above, and an object thereof is to improve the approximation accuracy of a discriminant function.

To solve the above problem and achieve the object, the present invention is an information processing apparatus that processes a discriminant function expressed as a sum of values of a kernel function taking a plurality of reference vectors and an input vector as arguments. The apparatus includes: an acquisition unit that acquires the plurality of reference vectors; a generation unit that generates an integrated vector by integrating a first reference vector and a second reference vector included in the plurality of reference vectors; an evaluation value calculation unit that calculates an evaluation value representing the expected value, based on the probability distribution of the input vector, of the approximation error between the discriminant function and an approximation function obtained by replacing the first reference vector and the second reference vector with the integrated vector; and an output control unit that outputs the first reference vector and the second reference vector for which the evaluation value is minimized.

According to the present invention, the approximation accuracy of a discriminant function can be improved.

FIG. 1 is a block diagram illustrating the configuration of the information processing apparatus according to the first embodiment.
FIG. 2 is a flowchart illustrating an example of the integration processing in the first embodiment.
FIG. 3 is a block diagram illustrating the configuration of the information processing apparatus according to the second embodiment.
FIG. 4 is a block diagram illustrating the configuration of the information processing apparatus according to the third embodiment.
FIG. 5 is a block diagram illustrating the configuration of the information processing apparatus according to the fourth embodiment.
FIG. 6 is a flowchart illustrating an example of the integration processing in the fourth embodiment.
FIG. 7 is a block diagram illustrating the configuration of the information processing apparatus according to the fifth embodiment.
FIG. 8 is a flowchart illustrating an example of the classification processing in the fifth embodiment.
FIG. 9 is an explanatory diagram illustrating an example hardware configuration of the information processing apparatus according to each embodiment.

An embodiment of an information processing apparatus, an information processing method, and a program according to the present invention will be described in detail below with reference to the accompanying drawings.

Each of the following embodiments relates to an information processing apparatus that processes information indicating a discriminant function expressed, as in an SVM, by a sum (linear combination) of kernel functions that take a plurality of reference vectors and an input vector as arguments. The information processing apparatus of each embodiment integrates reference vectors so as to minimize the expected approximation error of the discriminant function based on the probability distribution of the input vector. Approximating the discriminant function with a smaller number of reference vectors in this way improves processing speed. The information processing apparatus of each embodiment can be applied to classification apparatuses (applications) that identify patterns while learning in real time, for example object detection in images such as video, image-based object tracking, character recognition, and speech recognition.

(First embodiment)
In the first embodiment, an example targeting the discriminant function of an SVM is described. In this case, the support vectors serve as the reference vectors. This embodiment also describes an example that uses a general kernel function. The applicable classification method is not limited to the SVM.

First, the principle is described. Training a two-class SVM yields M support vectors s^(i) ∈ R^D (i = 1, 2, ..., M; M is a positive integer), corresponding weights α_i ∈ R (i = 1, 2, ..., M), and an offset b ∈ R, which give the discriminant function f(x) of equation (1) for an input vector x ∈ R^D. D denotes the number of dimensions of the input space and the feature space.

Here, K(·,·) is a kernel function (R^D × R^D → R). The class to which the input vector x belongs is predicted from the sign of f(x). As equation (1) shows, the amount of computation for the discriminant function f(x) is proportional to the number of support vectors M. If f(x) is approximated as in equation (2) using a reduced set of L (< M) support vectors, the amount of computation can be reduced.
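As a minimal sketch of why the cost scales with M, the kernel-sum form described for equation (1), f(x) = Σ α_i K(s^(i), x) + b, can be evaluated as follows. The function names, the RBF example kernel, and the default gamma are assumptions made for illustration and do not come from the patent.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    # Example kernel K(a, b) = exp(-gamma * ||a - b||^2); any kernel could be substituted.
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def decision_function(x, support_vectors, alphas, b, kernel=rbf_kernel):
    # f(x) = sum_i alpha_i * K(s_i, x) + b: one kernel evaluation per support vector,
    # so the cost grows linearly with M = len(support_vectors).
    return sum(a * kernel(s, x) for s, a in zip(support_vectors, alphas)) + b

def predict(x, support_vectors, alphas, b, kernel=rbf_kernel):
    # The predicted class is given by the sign of f(x).
    return 1 if decision_function(x, support_vectors, alphas, b, kernel) >= 0 else -1
```

Replacing the M support vectors with L < M reference vectors shortens the loop inside `decision_function` accordingly.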

In this embodiment, the number of support vectors is reduced by selecting two support vectors and integrating them into one support vector. This integration approach is also used, for example, in Non-Patent Documents 1 and 2, which integrate support vectors under the criterion of "minimizing the change in the weight in the feature space to which the input vectors are mapped in high dimension". Specifically, let φ be the mapping from the input space to the feature space corresponding to the kernel K(·,·). Then equation (1) can be rewritten as equation (3).

Here, w is the weight in the high-dimensional feature space and is given by equation (4).

Defining the weight given by equation (5), equation (2) can be expressed as equation (6).

Non-Patent Documents 1 and 2 use as the evaluation function J of equation (7), which is the L2 norm of the difference between these weights.

On the other hand, from equations (3) and (6), the approximation error of the discriminant function is given by equation (8). As this equation shows, minimizing equation (7) does not minimize the approximation error with respect to the distribution of the input vector x, although it does bound the approximation error from above (see Non-Patent Document 2).
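The equation images for (3) to (8) are not reproduced in this text. Judging from the surrounding definitions, they presumably have the following form. The tilde notation for the reduced set (\tilde{s}^{(k)}, \tilde{\alpha}_k, \tilde{w}, \tilde{f}) and the shared offset b are assumptions introduced here, and (7) may equally be the squared norm, which has the same minimizer.

```latex
\begin{align}
f(x) &= w^{\top}\phi(x) + b \tag{3}\\
w &= \sum_{i=1}^{M} \alpha_i\,\phi\bigl(s^{(i)}\bigr) \tag{4}\\
\tilde{w} &= \sum_{k=1}^{L} \tilde{\alpha}_k\,\phi\bigl(\tilde{s}^{(k)}\bigr) \tag{5}\\
\tilde{f}(x) &= \tilde{w}^{\top}\phi(x) + b \tag{6}\\
J &= \lVert w - \tilde{w} \rVert_2 \tag{7}\\
f(x) - \tilde{f}(x) &= (w - \tilde{w})^{\top}\phi(x) \tag{8}
\end{align}
```

One way to see the upper bound is the Cauchy-Schwarz inequality, |f(x) - \tilde{f}(x)| ≤ ‖w - \tilde{w}‖ ‖φ(x)‖: a small J limits the error for every x, but says nothing about how the error is weighted by the distribution of φ(x).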

For example, if φ(x) follows a normal distribution with mean 0 and covariance Σ, the expected value of equation (8) is given by equation (9).

Unless Σ is the identity matrix, minimizing equation (7) differs from minimizing this expected value. This embodiment addresses this point: support vectors are integrated so as to minimize equation (10), the expected value of the approximation error with respect to the distribution p(x) of the input vector x.
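A small numeric illustration of this point, as a sketch: for zero-mean φ(x) with covariance Σ, the expected squared error of a weight change Δw is ΔwᵀΣΔw (a standard identity, assuming the error of equation (8) is squared before the expectation is taken). The covariance and the two weight changes below are hypothetical values chosen only for illustration.

```python
def expected_sq_error(dw, cov):
    # E[(dw^T phi)^2] = dw^T Sigma dw when phi has zero mean and covariance Sigma.
    n = len(dw)
    return sum(dw[r] * cov[r][c] * dw[c] for r in range(n) for c in range(n))

def sq_norm(dw):
    # Squared L2 norm of the weight change, the quantity the prior-art criterion looks at.
    return sum(v * v for v in dw)

cov = [[4.0, 0.0], [0.0, 0.25]]  # hypothetical non-identity covariance
dw_a = [1.0, 0.0]                # weight change along the high-variance direction
dw_b = [0.0, 1.0]                # weight change along the low-variance direction
```

Both changes have the same norm, so a norm-based criterion cannot distinguish them, yet `expected_sq_error` differs by a factor of 16 between them.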

Suppose that the i-th and j-th (j = 1, 2, ..., M) support vectors are integrated to generate a new support vector s and weight α. The evaluation function of equation (10) can then be expressed as equation (11). Hereinafter, every expected value E[·] is taken with respect to p(x).

Partially differentiating J(s, α, i, j) with respect to α gives equation (12).

From the extremum condition, the s and α that minimize J(s, α, i, j) satisfy equation (13).

Likewise, partially differentiating J(s, α, i, j) with respect to s gives equation (14).

From the extremum condition, the s and α that minimize J(s, α, i, j) satisfy equation (15).
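Equations (10) to (15) are not reproduced in this text. Assuming the approximation error is measured as a squared error and that only the i-th and j-th terms of the discriminant function change, a reconstruction consistent with the description is the following; \tilde{f} denotes the approximated function and is notation introduced here.

```latex
\begin{align}
J &= E\bigl[(f(x) - \tilde{f}(x))^2\bigr]
   = \int \bigl(f(x) - \tilde{f}(x)\bigr)^2 p(x)\,dx \tag{10}\\
J(s,\alpha,i,j) &= E\Bigl[\bigl(\alpha_i K(s^{(i)},x) + \alpha_j K(s^{(j)},x)
   - \alpha K(s,x)\bigr)^2\Bigr] \tag{11}\\
\frac{\partial J}{\partial \alpha} &= -2\,E\Bigl[\bigl(\alpha_i K(s^{(i)},x)
   + \alpha_j K(s^{(j)},x) - \alpha K(s,x)\bigr) K(s,x)\Bigr] \tag{12}\\
\alpha\,E\bigl[K(s,x)^2\bigr] &= \alpha_i E\bigl[K(s^{(i)},x)K(s,x)\bigr]
   + \alpha_j E\bigl[K(s^{(j)},x)K(s,x)\bigr] \tag{13}\\
\frac{\partial J}{\partial s} &= -2\alpha\,E\Bigl[\bigl(\alpha_i K(s^{(i)},x)
   + \alpha_j K(s^{(j)},x) - \alpha K(s,x)\bigr)
   \frac{\partial K(s,x)}{\partial s}\Bigr] \tag{14}\\
0 &= E\Bigl[\bigl(\alpha_i K(s^{(i)},x) + \alpha_j K(s^{(j)},x) - \alpha K(s,x)\bigr)
   \frac{\partial K(s,x)}{\partial s}\Bigr] \tag{15}
\end{align}
```

Equation (13) is the ordinary least-squares weight for fitting αK(s, x) to the two-term sum, and (15) follows from (14) when α ≠ 0.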

The s and α obtained by solving equations (13) and (15) simultaneously minimize J(s, α, i, j) for fixed i and j. However, solving equations (13) and (15) for a general kernel function K is not easy (a solution for a special case is described later). The following heuristic is therefore used. First, s is expressed as a linear combination of s^(i) and s^(j), as in equation (16).

Because support vectors in the high-dimensional feature space combine linearly in this way, a similar relationship is assumed to hold approximately in the input space. The accuracy of this approximation depends on the kernel function. α is eliminated from equations (13) and (15), and the u that minimizes the squared error of the resulting condition is found by a one-dimensional search. From the u obtained in this way, s is obtained from equation (16), and α is then obtained from equation (13).

For some kernel functions, the expected values in equations (13) and (15) cannot be obtained analytically. In that case, multiple samples of x can be drawn from p(x) using a Monte Carlo method (for example, Markov chain Monte Carlo) and the expected values computed approximately.
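The heuristic and the sampling-based expectation can be sketched together as below. This is an illustration under stated assumptions, not the patent's exact procedure: it takes s = u·s^(i) + (1 − u)·s^(j) with u on a grid over [0, 1], computes α as the least-squares weight with sample means in place of expectations, and, as a simplification, picks the u with the smallest sampled squared error directly rather than minimizing the residual of the stationarity condition. The function names, the RBF example kernel, and the standard-normal p(x) drawn by plain sampling (not MCMC) are all hypothetical.

```python
import math
import random

def rbf(a, b, gamma=1.0):
    # Example kernel; the search below works with any kernel callable.
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def draw_samples(n, dim, seed=0):
    # Hypothetical p(x): independent standard normals (plain sampling, not MCMC).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

def merge_pair_mc(s_i, s_j, a_i, a_j, samples, kernel=rbf, n_grid=101):
    # Sampled values of the two-term function that the merged vector must replace.
    target = [a_i * kernel(s_i, x) + a_j * kernel(s_j, x) for x in samples]
    best = None
    for n in range(n_grid):
        u = n / (n_grid - 1)
        s = [u * p + (1.0 - u) * q for p, q in zip(s_i, s_j)]    # linear combination
        k_s = [kernel(s, x) for x in samples]
        denom = sum(k * k for k in k_s)                          # ~ N * E[K(s, x)^2]
        if denom == 0.0:
            continue
        alpha = sum(t * k for t, k in zip(target, k_s)) / denom  # least-squares weight
        j_val = sum((t - alpha * k) ** 2 for t, k in zip(target, k_s)) / len(samples)
        if best is None or j_val < best[0]:
            best = (j_val, u, s, alpha)
    return best  # (estimated J, u, merged vector s, merged weight alpha)
```

A finer grid or a proper one-dimensional optimizer could replace the fixed grid; the grid keeps the sketch short.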

The remaining problem is the choice of the support vectors to integrate, that is, the choice of i and j. The simplest way is to try every combination of support vectors belonging to the same class and find the i and j that minimize J(s, α, i, j). When there are very many support vectors, such an exhaustive search is time-consuming. The minimization may therefore be performed with a metaheuristic algorithm such as simulated annealing, tabu search, or a genetic algorithm. This avoids searching over all pairs of reference vectors and reduces the computational cost of approximating the discriminant function.

The details of this embodiment, which is built on the above principle, are described below. FIG. 1 is a block diagram illustrating the configuration of the information processing apparatus 100 according to the first embodiment.

As shown in FIG. 1, the information processing apparatus 100 of this embodiment includes a storage unit 20, an acquisition unit 101, a selection unit 102, a generation unit 103, a weight calculation unit 104, an evaluation value calculation unit 105, and an output control unit 106.

The storage unit 20 stores various kinds of information used in processing. For example, the storage unit 20 stores information indicating the discriminant function, such as the support vectors, weights, and offset obtained by learning. The storage unit 20 may be provided outside the information processing apparatus 100. The storage unit 20 can be any commonly used storage medium, such as an HDD (Hard Disk Drive), an optical disk, a memory card, or RAM (Random Access Memory).

The acquisition unit 101 acquires the information to be processed. For example, the acquisition unit 101 acquires information such as the support vectors and weights by reading it from the storage unit 20.

The selection unit 102 selects, from the acquired support vectors, the support vectors to be processed. For example, the selection unit 102 selects, from the acquired support vectors, an i-th support vector (first reference vector) and a j-th support vector (second reference vector) that belong to the same class.

The generation unit 103 generates an integrated vector by integrating the selected support vectors. For example, following the principle described above, the generation unit 103 generates the integrated vector s by integrating the i-th and j-th support vectors.

The weight calculation unit 104 calculates the weight of the generated integrated vector s. For example, following the principle described above, the weight calculation unit 104 calculates the weight α of the integrated vector s.

The evaluation value calculation unit 105 calculates an evaluation value representing the approximation error between the discriminant function and an approximation function, that is, the discriminant function in which the support vectors before integration (the i-th and j-th support vectors) are replaced with the integrated vector. For example, the evaluation value calculation unit 105 obtains, as the evaluation value, the value of the evaluation function (equation (11)) when s^(i) and s^(j) are replaced with s and α_i and α_j are replaced with α (the evaluation function value).

The output control unit 106 outputs various kinds of information, such as the results of information processing. For example, the output control unit 106 outputs the first reference vector and the second reference vector for which the evaluation value (evaluation function value) is smallest.

The acquisition unit 101, the selection unit 102, the generation unit 103, the weight calculation unit 104, the evaluation value calculation unit 105, and the output control unit 106 may be implemented in software, that is, by having a processing device such as a CPU (Central Processing Unit) execute a program; in hardware such as an IC (Integrated Circuit); or by a combination of software and hardware.

Next, the support vector integration processing performed by the information processing apparatus 100 according to the first embodiment configured in this way is described with reference to FIG. 2. FIG. 2 is a flowchart illustrating an example of the integration processing in the first embodiment.

The acquisition unit 101 acquires from the storage unit 20 the support vectors s^(i) (i = 1, 2, ..., M) obtained by SVM learning and the corresponding weights α_i (i = 1, 2, ..., M) (step S101).

The selection unit 102 selects, from the acquired support vectors, the i-th and j-th support vectors belonging to the same class (step S102). For example, the selection unit 102 starts from i = 1, j = 1, increments i and j in turn, and skips to the next combination whenever s^(i) and s^(j) belong to different classes.

The generation unit 103 integrates the selected support vectors s^(i) and s^(j) to generate the integrated vector s (step S103). The weight calculation unit 104 then calculates the weight α of the generated integrated vector s (step S104).

The evaluation value calculation unit 105 obtains the evaluation function value, which is the value of the evaluation function (equation (11)) when s^(i) and s^(j) are replaced with s and α_i and α_j are replaced with α (step S105). If the obtained evaluation function value is smaller than the smallest evaluation function value obtained so far, the evaluation value calculation unit 105 stores the indices i and j of the support vectors to be integrated, s, α, and the evaluation function value in, for example, the storage unit 20.

The selection unit 102 determines whether to terminate (step S106). For example, the selection unit 102 determines that processing is complete when every pair of support vectors belonging to the same class has been selected. If processing is not complete (step S106: No), the flow returns to step S102 and the processing is repeated.

If processing is complete (step S106: Yes), the output control unit 106 outputs the support vector indices i and j, s, and α for which the evaluation function value is smallest (step S107).

The above processing yields the integration of support vectors that minimizes the expected value of the approximation error based on the distribution of the input vectors. This reduces the processing time required for classification while preserving the approximation accuracy of the discriminant function as far as possible.
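The loop of steps S101 to S107 can be sketched as an exhaustive search over same-class pairs. The names below are hypothetical: `labels` is an explicit class label per support vector, and `merge_pair` stands for whatever routine performs steps S103 to S105 and returns the evaluation function value, the integrated vector, and its weight.

```python
from itertools import combinations

def find_best_pair(support_vectors, alphas, labels, merge_pair):
    # merge_pair(s_i, s_j, a_i, a_j) -> (j_val, s, alpha) is a caller-supplied,
    # hypothetical callable covering steps S103 (integrate), S104 (weight), S105 (evaluate).
    best = None
    for i, j in combinations(range(len(support_vectors)), 2):  # S102: next candidate pair
        if labels[i] != labels[j]:
            continue                                           # different classes: skip
        j_val, s, alpha = merge_pair(
            support_vectors[i], support_vectors[j], alphas[i], alphas[j]
        )
        if best is None or j_val < best[0]:                    # keep the running minimum
            best = (j_val, i, j, s, alpha)
    return best  # S107: (evaluation value, i, j, s, alpha), or None if no same-class pair
```

The search visits each unordered pair once, so its cost is quadratic in the number of support vectors, which is what motivates the faster selection strategies.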

In the above processing, two support vectors are integrated into one; repeating this multiple times reduces the number of support vectors to any desired number. In each repetition, the integrated support vectors may be included among the integration candidates, or only support vectors other than the integrated ones may be candidates. When integration is repeated, the evaluation function value for each pair of support vectors may be stored during the first iteration. This makes it unnecessary to compute evaluation function values in the second and subsequent iterations, speeding up the processing. To reduce the number of support vectors by B, B pairs are selected one after another, each time taking the pair with the smallest evaluation function value among the pairs of support vectors not yet selected.

(Modification)
A metaheuristic algorithm (simulated annealing, tabu search, or a genetic algorithm) may be used to optimize the choice of support vectors to integrate. In this case, in the first iteration of step S102, the selection unit 102 selects i and j at random as initial values. In the second and subsequent iterations, the selection unit 102 selects i and j according to the metaheuristic algorithm, using the discriminant function values obtained in the previous iteration. In this modification, evaluation function values need not be obtained for every combination of i and j, so the support vector integration processing can be sped up.

Alternatively, the repetition via step S106 may be omitted and a heuristic criterion used when selecting i and j in step S102. The evaluation function value (equation (11)) tends to be small when α_i and α_j are small and the distance between s^(i) and s^(j) is small. The selection unit 102 therefore selects the s^(i) and s^(j) for which α_i and α_j are at or below a specific threshold and the distance is smallest (that is, for which K(s^(i), s^(j)) is largest). For example, to reduce the number of support vectors by B, the selection unit 102 may select B non-overlapping pairs s^(i), s^(j), in increasing order of distance, from the pairs whose α_i and α_j are at or below the specific threshold.
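The heuristic selection can be sketched as a single pass over same-class pairs that keeps the pair with the largest kernel value among those whose weights are small. The function and parameter names are hypothetical, and comparing the magnitude |α| against the threshold is an assumption made here to handle signed weights.

```python
import math
from itertools import combinations

def rbf(a, b, gamma=1.0):
    # Example kernel; a larger value means the two vectors are closer.
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def select_pair_heuristic(support_vectors, alphas, labels, threshold, kernel=rbf):
    # Candidate pairs: same class, both weight magnitudes at or below the threshold.
    candidates = [
        (kernel(support_vectors[i], support_vectors[j]), i, j)
        for i, j in combinations(range(len(support_vectors)), 2)
        if labels[i] == labels[j]
        and abs(alphas[i]) <= threshold
        and abs(alphas[j]) <= threshold
    ]
    if not candidates:
        return None
    _, i, j = max(candidates)  # largest kernel value, i.e. smallest distance
    return i, j
```

No evaluation function value is computed here, which is why this selection is cheaper than the loop over step S106.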

In addition, when the integrated vector is generated in step S103, u may be fixed at the ratio of the weights, as in equation (17), instead of being found by a one-dimensional search.
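Equations (16) and (17) are not reproduced in this text. Assuming (16) weights s^{(i)} by u and s^{(j)} by 1 - u (the assignment of u to s^{(i)} is an assumption made here), fixing u at the weight ratio would read:

```latex
\begin{align}
s &= u\,s^{(i)} + (1-u)\,s^{(j)} \tag{16}\\
u &= \frac{\alpha_i}{\alpha_i + \alpha_j} \tag{17}
\end{align}
```

Because the two support vectors belong to the same class, α_i and α_j presumably share a sign, in which case u lies in [0, 1] and s is a point on the segment between s^{(i)} and s^{(j)}.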

Simplifying the calculation in this way may degrade the approximation accuracy of the discriminant function, but it reduces the processing time required to integrate support vectors. For example, when online learning is performed on each frame of a video, the time taken to integrate support vectors also affects real-time performance, so this kind of speed-up is worthwhile.

The above deals with two-class classification, but the method extends readily to multi-class classification. In the multi-class case, as many discriminant functions as there are classes are prepared, and each of these discriminant functions may be approximated in the same way.

(Second embodiment)
The second embodiment describes the case where an RBF (Radial Basis Function) kernel is used as the kernel function. The RBF kernel is defined by equation (18), where γ is a parameter.

When the RBF kernel is used, the expected value E[K(a, x)K(b, x)] is given by equation (19).

Assuming a uniform distribution as the distribution p(x) of the input vector x ∈ R^D, equation (20) holds. In this embodiment, the expected value E[f(x)] under the uniform distribution is defined as the cumulative value (integral) of f(x) over the entire domain.

Therefore, E[K(a, x)K(b, x)] can be expressed by equation (21).
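Equations (18) to (21) are not reproduced in this text. The following derivation uses the standard Gaussian form of the RBF kernel and the Gaussian integral over R^D; which of these lines the original numbers as (19), (20), and (21) is an assumption.

```latex
\begin{align}
K(a,b) &= \exp\bigl(-\gamma \lVert a-b \rVert^2\bigr) \tag{18}\\
K(a,x)K(b,x) &= \exp\Bigl(-\tfrac{\gamma}{2}\lVert a-b \rVert^2\Bigr)
   \exp\Bigl(-2\gamma \bigl\lVert x - \tfrac{a+b}{2} \bigr\rVert^2\Bigr) \notag\\
E\bigl[K(a,x)K(b,x)\bigr] &= \exp\Bigl(-\tfrac{\gamma}{2}\lVert a-b \rVert^2\Bigr)\,
   E\Bigl[\exp\Bigl(-2\gamma \bigl\lVert x - \tfrac{a+b}{2} \bigr\rVert^2\Bigr)\Bigr] \tag{19}\\
\int_{\mathbb{R}^D} \exp\bigl(-2\gamma \lVert x-c \rVert^2\bigr)\,dx
   &= \Bigl(\frac{\pi}{2\gamma}\Bigr)^{D/2} \tag{20}\\
E\bigl[K(a,x)K(b,x)\bigr] &= \Bigl(\frac{\pi}{2\gamma}\Bigr)^{D/2}
   \exp\Bigl(-\tfrac{\gamma}{2}\lVert a-b \rVert^2\Bigr) \tag{21}
\end{align}
```

The second line follows from the identity ‖a − x‖² + ‖b − x‖² = 2‖x − (a + b)/2‖² + ‖a − b‖²/2. Under the uniform assumption the dependence on a and b therefore reduces to the factor exp(−(γ/2)‖a − b‖²), which equals K(a, b)^{1/2}.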

Similarly, assuming a normal distribution with mean μ and covariance Σ as the distribution p(x) of the input vector x, equation (22) holds.

Therefore, E[K(a, x)K(b, x)] can be expressed by equation (23).

In either case, E[K(a, x)K(b, x)] can be expressed as in equation (24) using a constant C that does not depend on a or b. Note, however, that equation (24) does not hold for arbitrary p(x).

Using this result, equation (13) can be rewritten as equation (25).

Equation (26) also holds.

Assuming a uniform distribution for the distribution p(x) of the input vector x, equation (26) can be expressed as equation (27), where C' is a constant that does not depend on a or b.

Using this result, equation (15) can be rewritten as equation (28).

ここで、ｕを以下の（２９）式で表す。

Here, u is represented by the following equation (29).

このｕを用いると（２８）式は、以下の（３０）式で表される。第１の実施形態では（１６）式で人為的にｓをｓ^（ｉ）およびｓ^（ｊ）の線形結合で表現したが、ＲＢＦカーネルでは自然に線形結合の表現が得られる。

Using this u, the expression (28) is expressed by the following expression (30). In the first embodiment, s is artificially represented by the linear combination of s ⁽ⁱ⁾ and s ^(j) in Expression (16). However, the RBF kernel naturally provides a linear combination.

（３０）式を（２９）式に代入してｓを消去すると、以下の（３１）式が得られる。

By substituting equation (30) into equation (29) and eliminating s, the following equation (31) is obtained.

（３１）式の右辺をｇ（ｕ）と置くと、ｕを求めることは関数ｇ（ｕ）の不動点を求める問題である。ｇ’（ｕ）＜１の領域に解が存在する場合（例えば閉区間［０１］でｇ’（ｕ）＜１）は、適当な初期値（例えば（１７）式）から始めてｕ←ｇ（ｕ）（ｕへのｇ（ｕ）の代入）を繰り返すことで、解を求めることができる。ｇ’（ｕ）＜１の領域に解が存在しない場合は、ニュートン法で解くか、一次元探索により解くことができる。 Letting g(u) denote the right side of equation (31), finding u amounts to finding a fixed point of the function g(u). If a solution exists in a region where g′(u) < 1 (for example, g′(u) < 1 on the closed interval [0, 1]), the solution can be obtained by starting from an appropriate initial value (for example, equation (17)) and repeating u ← g(u) (assigning g(u) to u). If no solution exists in a region where g′(u) < 1, u can be found by Newton's method or by a one-dimensional search.
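The iteration described above can be sketched as follows. This is a minimal illustration, not the patented implementation: equation (31) is not reproduced here, so `example_g` and `steep_g` are hypothetical stand-ins for g(u), and the bisection fallback is one possible choice of "one-dimensional search" that assumes g(u) − u changes sign on [0, 1].

```python
import math


def fixed_point(g, u0, tol=1e-10, max_iter=200):
    """Repeat u <- g(u) until successive values agree to within tol."""
    u = u0
    for _ in range(max_iter):
        u_next = g(u)
        if abs(u_next - u) < tol:
            return u_next, True
        u = u_next
    return u, False  # did not converge (e.g. the map is not a contraction)


def bisect_fixed_point(g, lo=0.0, hi=1.0, tol=1e-10):
    """Fallback 1-D search: bisection on h(u) = g(u) - u over [lo, hi]."""
    h_lo = g(lo) - lo
    h_hi = g(hi) - hi
    if h_lo * h_hi > 0:
        raise ValueError("g(u) - u does not change sign on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        h_mid = g(mid) - mid
        if h_lo * h_mid <= 0:
            hi = mid
        else:
            lo, h_lo = mid, h_mid
    return 0.5 * (lo + hi)


def solve_u(g, u0=0.5):
    """Try the simple iteration first, then fall back to bisection."""
    u, converged = fixed_point(g, u0)
    return u if converged else bisect_fixed_point(g)


def example_g(u):
    """Hypothetical contraction on [0, 1]; stands in for equation (31)."""
    return 0.5 * math.exp(-u) + 0.25


def steep_g(u):
    """Hypothetical map with slope -3, for which plain iteration diverges."""
    return 2.0 - 3.0 * u
```

With `example_g` the plain iteration converges; with `steep_g` it does not, and `solve_u` falls back to the bisection search.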

ｕが求まると（３０）式からｓが求まり、さらに（２５）式からαが求まる。このときの評価関数値は、以下の（３２）式で表される。

When u is found, s is found from equation (30), and α is found from equation (25). The evaluation function value at this time is expressed by the following equation (32).

この評価関数値を最小化するサポートベクトルのインデックスｉ，ｊが求められる。 The indices i and j of the support vectors that minimize this evaluation function value are then determined.

以上の原理を元に構成される本実施形態の詳細について以下に説明する。図３は、第２の実施形態にかかる情報処理装置２００の構成を示すブロック図である。 The details of the present embodiment based on the above principle will be described below. FIG. 3 is a block diagram illustrating a configuration of an information processing apparatus 200 according to the second embodiment.

図３に示すように、本実施形態の情報処理装置２００は、記憶部２０と、取得部１０１と、選択部１０２と、生成部２０３と、重み算出部２０４と、評価値算出部２０５と、出力制御部１０６と、を備えている。 As illustrated in FIG. 3, the information processing apparatus 200 of the present embodiment includes a storage unit 20, an acquisition unit 101, a selection unit 102, a generation unit 203, a weight calculation unit 204, an evaluation value calculation unit 205, and an output control unit 106.

第２の実施形態では、生成部２０３、重み算出部２０４、および、評価値算出部２０５の機能が第１の実施形態と異なっている。その他の構成および機能は、第１の実施形態にかかる情報処理装置１００のブロック図である図１と同様であるので、同一符号を付し、ここでの説明は省略する。また第２の実施形態の統合処理の全体の流れは、第１の実施形態の統合処理を示す図２と同様であるため説明を省略する。以下では、第１の実施形態との相違部分を記載する。 In the second embodiment, the functions of the generation unit 203, the weight calculation unit 204, and the evaluation value calculation unit 205 differ from those of the first embodiment. The other configurations and functions are the same as those in FIG. 1, which is a block diagram of the information processing apparatus 100 according to the first embodiment; they are therefore denoted by the same reference numerals, and their description is omitted here. The overall flow of the integration processing of the second embodiment is the same as that in FIG. 2, which shows the integration processing of the first embodiment, and its description is likewise omitted. Only the differences from the first embodiment are described below.

図２のステップＳ１０３に対応する処理として、生成部２０３は、選択されたサポートベクトルｓ^（ｉ）およびｓ^（ｊ）を統合して、統合ベクトルｓを生成する。第１の実施形態の生成部１０３は、例えば（１３）式、（１５）式および（１６）式などを用いて統合ベクトルｓを生成した。本実施形態の生成部２０３は、例えば、（３０）式および（３１）式などを用いて統合ベクトルｓを生成する。 As processing corresponding to step S103 in FIG. 2, the generation unit 203 integrates the selected support vectors s^(i) and s^(j) to generate an integrated vector s. The generation unit 103 of the first embodiment generated the integrated vector s using, for example, equations (13), (15), and (16). The generation unit 203 of the present embodiment generates the integrated vector s using, for example, equations (30) and (31).

図２のステップＳ１０４に対応する処理として、重み算出部２０４は、生成された統合ベクトルｓの重みαを算出する。重み算出部２０４は、例えば（２５）式を用いて重みαを算出する。 As processing corresponding to step S104 in FIG. 2, the weight calculation unit 204 calculates the weight α of the generated integrated vector s. The weight calculation unit 204 calculates the weight α using, for example, Equation (25).

図２のステップＳ１０５に対応する処理として、評価値算出部２０５は、ｓ^（ｉ）、ｓ^（ｊ）を、ｓで置き換え、α_ｉ、α_ｊをαで置き換えた場合の評価関数（（３２）式）の値である評価関数値を求める。求めた評価関数値が、それ以前に求められた評価関数値の最小値よりも小さい場合、評価値算出部２０５は、統合するサポートベクトルのインデックスｉ，ｊ、ｓ、α、および、評価関数値を例えば記憶部２０に記憶する。 As processing corresponding to step S105 in FIG. 2, the evaluation value calculation unit 205 obtains the evaluation function value, that is, the value of the evaluation function (equation (32)) when s^(i) and s^(j) are replaced with s and α_i and α_j are replaced with α. If the obtained evaluation function value is smaller than the minimum of the evaluation function values obtained so far, the evaluation value calculation unit 205 stores the indices i and j of the support vectors to be integrated, s, α, and the evaluation function value in, for example, the storage unit 20.
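The generate, weight, and evaluate steps above (corresponding to steps S103 to S105) can be sketched as a search over pairs that keeps the minimum evaluation value. The sketch below is illustrative only and makes several assumptions that are not taken from this document: the RBF kernel is written as exp(−γ‖a − x‖²), the merge coefficient `u` is a simple weight ratio rather than the solution of equation (31), the weight α is a least-squares fit rather than equation (25), and the expected error is estimated by averaging over hypothetical sample points instead of using equation (32).

```python
import itertools
import math


def rbf(a, b, gamma=1.0):
    """Assumed RBF form exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def merge_pair(si, sj, ai, aj, samples, gamma=1.0):
    """Merge two reference vectors and score the merge on sample inputs."""
    # Hypothetical merge rule: convex combination weighted by |alpha|
    # (assumes the two weights are not both zero).
    u = abs(ai) / (abs(ai) + abs(aj))
    s = [u * p + (1.0 - u) * q for p, q in zip(si, sj)]
    # Contribution of the original pair, and kernel values of the merged vector.
    target = [ai * rbf(si, x, gamma) + aj * rbf(sj, x, gamma) for x in samples]
    ks = [rbf(s, x, gamma) for x in samples]
    # Least-squares weight for the merged vector.
    alpha = sum(t * k for t, k in zip(target, ks)) / sum(k * k for k in ks)
    # Sample estimate of the expected squared approximation error.
    err = sum((t - alpha * k) ** 2 for t, k in zip(target, ks)) / len(samples)
    return s, alpha, err


def best_merge(vectors, alphas, samples, gamma=1.0):
    """Return (i, j, s, alpha, err) for the pair with the smallest error."""
    best = None
    for i, j in itertools.combinations(range(len(vectors)), 2):
        s, alpha, err = merge_pair(
            vectors[i], vectors[j], alphas[i], alphas[j], samples, gamma
        )
        if best is None or err < best[4]:
            best = (i, j, s, alpha, err)  # keep the running minimum
    return best
```

An exhaustive pair loop costs O(n²) merges per reduction step; it is used here only because it is the simplest way to show the "store if smaller than the current minimum" rule.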

本実施形態によれば、カーネル関数にＲＢＦカーネルを使う場合にも、入力ベクトルの分布に基づく近似誤差の期待値を最小化するサポートベクトルの統合が得られる。これにより、識別関数の近似精度を極力保った上で、識別にかかる処理時間を低減することができる。入力ベクトルの確率分布として一様分布または正規分布を用いることにより、入力ベクトルの一様性、または、入力ベクトルの偏りを反映した上で識別関数を近似することができる。 According to the present embodiment, even when the RBF kernel is used as the kernel function, integration of the support vectors that minimizes the expected value of the approximation error based on the distribution of the input vector can be obtained. As a result, the processing time required for identification can be reduced while maintaining the approximation accuracy of the identification function as much as possible. By using a uniform distribution or a normal distribution as the probability distribution of the input vector, it is possible to approximate the discriminant function while reflecting the uniformity of the input vector or the bias of the input vector.

（第３の実施形態）
第３の実施形態では、カーネル関数に指数Ｂｈａｔｔａｃｈａｒｙｙａカーネルを使う場合を説明する。指数Ｂｈａｔｔａｃｈａｒｙｙａカーネルは、以下の（３３）式により定義される。β、γはパラメータである。なお入力ベクトルｘは各要素が０以上の実数で、総和が１になるように正規化されている。指数Ｂｈａｔｔａｃｈａｒｙｙａカーネルは、特にヒストグラム特徴を扱う場合に効果を発揮する。

(Third embodiment)
In the third embodiment, a case in which an exponential Bhattacharyya kernel is used as the kernel function will be described. The exponential Bhattacharyya kernel is defined by the following equation (33), where β and γ are parameters. Each element of the input vector x is a non-negative real number, and x is normalized so that its elements sum to 1. The exponential Bhattacharyya kernel is particularly effective for histogram features.
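The normalization condition on the input vector can be illustrated with a short sketch. Equation (33) is not reproduced here, so the kernel form below, β·exp(γ·Σ_t √(a_t b_t)), is only an assumed parameterization built on the standard Bhattacharyya coefficient; the function names are hypothetical and the way β and γ enter equation (33) may differ.

```python
import math


def normalize_histogram(counts):
    """Scale non-negative bin counts so that the elements sum to 1."""
    if any(c < 0 for c in counts):
        raise ValueError("histogram bins must be non-negative")
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must have a positive total")
    return [c / total for c in counts]


def bhattacharyya_coefficient(a, b):
    """Sum over bins of sqrt(a_t * b_t); equals 1 when a == b (normalized)."""
    return sum(math.sqrt(p * q) for p, q in zip(a, b))


def exp_bhattacharyya_kernel(a, b, beta=1.0, gamma=1.0):
    """ASSUMED form: beta * exp(gamma * Bhattacharyya coefficient)."""
    return beta * math.exp(gamma * bhattacharyya_coefficient(a, b))
```

For normalized histograms the coefficient lies in [0, 1], so with γ > 0 this assumed kernel is largest when the two histograms coincide.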

指数Ｂｈａｔｔａｃｈａｒｙｙａカーネルを使用する場合は、期待値Ｅ［Ｋ（ａ，ｘ）Ｋ（ｂ，ｘ）］は以下の（３４）式により表される。

ここで、入力ベクトルｘの分布として領域［０１］^Ｄ内での一様分布を考える（厳密には入力ベクトルｘはＤ次元単体上に存在するが近似をする）。このとき、以下の（３５）式が成り立つ。

When the exponential Bhattacharyya kernel is used, the expected value E[K(a, x)K(b, x)] is expressed by the following equation (34).

Here, a uniform distribution over the region [0, 1]^D is considered as the distribution of the input vector x (strictly speaking, the input vector x lies on a D-dimensional simplex, but the uniform distribution is used as an approximation). In this case, the following equation (35) holds.

ここでｅｘｐを二次の項までテーラー展開すると、以下の（３６）式が得られる。

Here, expanding exp in a Taylor series up to the second-order term yields the following equation (36).

（３４）式および（３６）式より、以下の（３７）式が得られる。

From the equations (34) and (36), the following equation (37) is obtained.

この（３７）式を用いると、（１３）式は以下の（３８）式のように書き換えることができる。

Using this equation (37), equation (13) can be rewritten as equation (38) below.

また、以下の（３９）式が成り立つ。

Also, the following equation (39) holds.

ここで、入力ベクトルｘの分布として領域［０１］^Ｄ内での一様分布を考え、ｅｘｐを二次の項までテーラー展開すると、以下の（４０）式が得られる。

Here, considering a uniform distribution over the region [0, 1]^D as the distribution of the input vector x and expanding exp in a Taylor series up to the second-order term, the following equation (40) is obtained.

この結果を用いると、（１５）式は以下の（４１）式のように書き換えることができる。ただし、ｔ＝１，２，・・・，Ｄである。

Using this result, equation (15) can be rewritten as equation (41) below. Here, t = 1, 2,..., D.

（３８）式および（４１）式の拘束式からｓおよびαを求めることができるが、これらの式を解くのは容易ではない。そこで、第１の実施形態と同じ近似解法を用いる。まず、ｓをｓ^（ｉ）およびＳ^（ｊ）の線形結合で以下の（４２）式のように表す。

Although s and α can in principle be obtained from the constraint equations (38) and (41), these equations are not easy to solve. Therefore, the same approximate solution method as in the first embodiment is used. First, s is represented as a linear combination of s^(i) and s^(j), as in the following equation (42).

（３８）式を（４１）式に代入してαを消去して、条件式の二乗誤差を最小化するｕを一次元探索で求める。このようにして求めたｕから、（４１）式よりｓを求め、さらに（３８）式からαを求める。このときの評価関数値は、以下の（４３）式で表される。

Equation (38) is substituted into equation (41) to eliminate α, and the value of u that minimizes the squared error of the conditional expression is obtained by a one-dimensional search. From the u obtained in this manner, s is obtained from equation (41), and α is then obtained from equation (38). The evaluation function value at this time is expressed by the following equation (43).
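A one-dimensional search for u can be sketched with a golden-section search, shown below as one possible choice; the document does not specify which search method is used. The objective `toy_squared_error` is a hypothetical stand-in for the squared error of the conditional expression, since equations (38) and (41) are not reproduced here, and the search assumes the objective has a single minimum on [0, 1].

```python
import math


def golden_section_minimize(objective, lo=0.0, hi=1.0, tol=1e-8):
    """Minimize a unimodal objective on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = objective(c), objective(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = objective(c)
        else:
            # Minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = objective(d)
    return 0.5 * (lo + hi)


def toy_squared_error(u):
    """Hypothetical stand-in for the squared error as a function of u."""
    return (u - 0.3) ** 2
```

Each iteration reuses one of the two previous objective values, so only one new evaluation of the squared error is needed per step.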

以上の原理を基に構成される本実施形態の詳細について以下に説明する。図４は、第３の実施形態にかかる情報処理装置３００の構成を示すブロック図である。 Details of the present embodiment based on the above principle will be described below. FIG. 4 is a block diagram illustrating a configuration of an information processing device 300 according to the third embodiment.

図４に示すように、本実施形態の情報処理装置３００は、記憶部２０と、取得部１０１と、選択部１０２と、生成部３０３と、重み算出部３０４と、評価値算出部３０５と、出力制御部１０６と、を備えている。 As illustrated in FIG. 4, the information processing apparatus 300 of the present embodiment includes a storage unit 20, an acquisition unit 101, a selection unit 102, a generation unit 303, a weight calculation unit 304, an evaluation value calculation unit 305, and an output control unit 106.

第３の実施形態では、生成部３０３、重み算出部３０４、および、評価値算出部３０５の機能が第１の実施形態と異なっている。その他の構成および機能は、第１の実施形態にかかる情報処理装置１００のブロック図である図１と同様であるので、同一符号を付し、ここでの説明は省略する。また第３の実施形態の統合処理の全体の流れは、第１の実施形態の統合処理を示す図２と同様であるため説明を省略する。以下では、第１の実施形態との相違部分を記載する。 In the third embodiment, the functions of the generation unit 303, the weight calculation unit 304, and the evaluation value calculation unit 305 differ from those of the first embodiment. The other configurations and functions are the same as those in FIG. 1, which is a block diagram of the information processing apparatus 100 according to the first embodiment; they are therefore denoted by the same reference numerals, and their description is omitted here. The overall flow of the integration processing of the third embodiment is the same as that in FIG. 2, which shows the integration processing of the first embodiment, and its description is likewise omitted. Only the differences from the first embodiment are described below.

図２のステップＳ１０３に対応する処理として、生成部３０３は、選択されたサポートベクトルｓ^（ｉ）およびｓ^（ｊ）を統合して、統合ベクトルｓを生成する。本実施形態の生成部３０３は、例えば、（４２）式および（３１）式などを用いて統合ベクトルｓを生成する。 As processing corresponding to step S103 in FIG. 2, the generation unit 303 integrates the selected support vectors s^(i) and s^(j) to generate an integrated vector s. The generation unit 303 of the present embodiment generates the integrated vector s using, for example, equations (42) and (31).

図２のステップＳ１０４に対応する処理として、重み算出部３０４は、生成された統合ベクトルｓの重みαを算出する。重み算出部３０４は、例えば（３８）式を用いて重みαを算出する。 As processing corresponding to step S104 in FIG. 2, the weight calculation unit 304 calculates the weight α of the generated integrated vector s. The weight calculation unit 304 calculates the weight α using, for example, Expression (38).

図２のステップＳ１０５に対応する処理として、評価値算出部３０５は、ｓ^（ｉ）、ｓ^（ｊ）を、ｓで置き換え、α_ｉ、α_ｊをαで置き換えた場合の評価関数（（４２）式）の値である評価関数値を求める。求めた評価関数値が、それ以前に求められた評価関数値の最小値よりも小さい場合、評価値算出部３０５は、統合するサポートベクトルのインデックスｉ，ｊ、ｓ、α、および、評価関数値を例えば記憶部２０に記憶する。 As processing corresponding to step S105 in FIG. 2, the evaluation value calculation unit 305 obtains the evaluation function value, that is, the value of the evaluation function when s^(i) and s^(j) are replaced with s and α_i and α_j are replaced with α. If the obtained evaluation function value is smaller than the minimum of the evaluation function values obtained so far, the evaluation value calculation unit 305 stores the indices i and j of the support vectors to be integrated, s, α, and the evaluation function value in, for example, the storage unit 20.

本実施形態によれば、カーネル関数に指数Ｂｈａｔｔａｃｈａｒｙｙａカーネルを使う場合にも、入力ベクトルの分布に基づく近似誤差の期待値を最小化するサポートベクトルの統合が得られる。これにより、識別関数の近似精度を極力保った上で、識別にかかる処理時間を低減することができる。 According to the present embodiment, even when the exponential Bhattacharyya kernel is used for the kernel function, integration of support vectors that minimizes the expected value of the approximation error based on the distribution of the input vector can be obtained. As a result, the processing time required for identification can be reduced while maintaining the approximation accuracy of the identification function as much as possible.

（第４の実施形態）
第４の実施形態では、学習データから入力ベクトルの分布を推定し、その推定分布に基づき識別関数の近似誤差を最小化するようにサポートベクトルを統合する例を説明する。図５は、第４の実施形態にかかる情報処理装置４００の構成を示すブロック図である。

(Fourth embodiment)
In the fourth embodiment, an example will be described in which the distribution of input vectors is estimated from learning data, and the support vectors are integrated based on the estimated distribution so as to minimize the approximation error of the discriminant function. FIG. 5 is a block diagram illustrating a configuration of an information processing device 400 according to the fourth embodiment.

図５に示すように、本実施形態の情報処理装置４００は、記憶部２０と、推定部４０１と、取得部１０１と、選択部１０２と、生成部１０３と、重み算出部１０４と、評価値算出部１０５と、出力制御部１０６と、を備えている。 As illustrated in FIG. 5, the information processing apparatus 400 of the present embodiment includes a storage unit 20, an estimation unit 401, an acquisition unit 101, a selection unit 102, a generation unit 103, a weight calculation unit 104, an evaluation value calculation unit 105, and an output control unit 106.

第４の実施形態では、推定部４０１を追加したことが第１の実施形態と異なっている。その他の構成および機能は、第１の実施形態にかかる情報処理装置１００のブロック図である図１と同様であるので、同一符号を付し、ここでの説明は省略する。 The fourth embodiment differs from the first embodiment in that an estimation unit 401 is added. The other configurations and functions are the same as those in FIG. 1, which is a block diagram of the information processing apparatus 100 according to the first embodiment; they are therefore denoted by the same reference numerals, and their description is omitted here.

推定部４０１は、記憶部２０などに記憶される学習データに含まれるＮ個の入力ベクトルｓ^（ｉ）（ｉ＝１，２，・・・，Ｎ）を基に入力ベクトル分布ｐ（ｘ）を推定する。推定部４０１による入力ベクトル分布の推定処理の詳細については後述する。 The estimation unit 401 estimates the input vector distribution p(x) based on the N input vectors s^(i) (i = 1, 2, ..., N) included in the learning data stored in the storage unit 20 or the like. The details of the estimation processing of the input vector distribution by the estimation unit 401 are described later.

次に、このように構成された第４の実施形態にかかる情報処理装置４００によるサポートベクトルの統合処理について図６を用いて説明する。図６は、第４の実施形態における統合処理の一例を示すフローチャートである。 Next, the support vector integration processing performed by the information processing apparatus 400 according to the fourth embodiment configured as described above will be described with reference to FIG. 6. FIG. 6 is a flowchart illustrating an example of the integration processing in the fourth embodiment.

推定部４０１は、記憶部２０から学習データを読み込み、学習データに含まれるＮ個の入力ベクトルｓ^（ｉ）（ｉ＝１，２，・・・，Ｎ）を基に入力ベクトル分布ｐ（ｘ）を推定する（ステップＳ２０１）。 The estimation unit 401 reads the learning data from the storage unit 20 and estimates the input vector distribution p(x) based on the N input vectors s^(i) (i = 1, 2, ..., N) included in the learning data (step S201).

ステップＳ２０２からステップＳ２０８までは、第１の実施形態にかかる情報処理装置１００におけるステップＳ１０１からステップＳ１０７までと同様の処理なので、その説明を省略する。 Steps S202 to S208 are the same as steps S101 to S107 in the information processing apparatus 100 according to the first embodiment, and a description thereof will be omitted.

次に、入力ベクトル分布の推定処理について説明する。推定部４０１は、例えば以下の（４４）式で示される経験分布を用いることができる。ここで、δはデルタ関数である。

Next, the process of estimating the input vector distribution will be described. The estimating unit 401 can use, for example, an empirical distribution represented by the following equation (44). Here, δ is a delta function.
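Equation (44) is not reproduced here; assuming it is the usual empirical distribution that places equal mass 1/N on each of the N training inputs, an expected value under it reduces to a sample average. The sketch below shows this under that assumption; `empirical_expectation` and `rbf` are hypothetical helper names, and the RBF form is an assumed parameterization used only for the usage example.

```python
import math


def empirical_expectation(func, samples):
    """E[func(x)] under the empirical distribution: the mean over samples."""
    if not samples:
        raise ValueError("at least one training input is required")
    return sum(func(x) for x in samples) / len(samples)


def rbf(a, b, gamma=1.0):
    """Assumed RBF form exp(-gamma * ||a - b||^2), for illustration only."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def expected_kernel_product(a, b, samples, gamma=1.0):
    """Sample estimate of E[K(a, x) K(b, x)] over the training inputs."""
    return empirical_expectation(
        lambda x: rbf(a, x, gamma) * rbf(b, x, gamma), samples
    )
```

Because the delta functions collapse every integral into a sum over training inputs, no closed-form expression for the kernel expectation is needed in this case.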

入力ベクトル分布ｐ（ｘ）の推定方法はこれに限られるものではない。例えば推定部４０１は、ｐ（ｘ）にパラメトリックなモデルを仮定し、最尤推定によりそのパラメータを求めてもよい。例えばｐ（ｘ）に、正規分布、ラプラス分布、一様分布（領域限定）、ポアソン分布、ベータ分布、ガンマ分布、および、これらの混合分布を考えることができる。推定部４０１は、さらにパラメータに事前分布を仮定してＭＡＰ推定およびベイズ推定によって分布を推定してもよい。 The method of estimating the input vector distribution p(x) is not limited to this. For example, the estimation unit 401 may assume a parametric model for p(x) and obtain its parameters by maximum likelihood estimation. Candidate models for p(x) include a normal distribution, a Laplace distribution, a uniform distribution (over a limited region), a Poisson distribution, a beta distribution, a gamma distribution, and mixtures of these. The estimation unit 401 may further assume a prior distribution on the parameters and estimate the distribution by MAP estimation or Bayesian estimation.
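As one concrete instance of the parametric approach, the maximum likelihood estimates for a normal distribution are the sample mean and the sample covariance with divisor N. The sketch below shows only this case; the function name `fit_normal_mle` is hypothetical, and the other listed distributions would each need their own estimator.

```python
def fit_normal_mle(samples):
    """Maximum likelihood mean and covariance of a multivariate normal.

    samples: list of equal-length lists of floats (the training inputs).
    The covariance uses divisor N (the MLE), not the unbiased N - 1.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    dim = len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(dim)]
    cov = [
        [
            sum((x[r] - mean[r]) * (x[c] - mean[c]) for x in samples) / n
            for c in range(dim)
        ]
        for r in range(dim)
    ]
    return mean, cov
```

The resulting mean and covariance could then serve as μ and Σ in the normal-distribution case discussed for the second embodiment.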

別のアプローチとして、推定部４０１は、学習データから得られた識別関数ｆ（ｘ）をもとに入力ベクトル分布を推定してもよい。例えば推定部４０１は、以下の（４５）式のようにｐ（ｘ）を設定する。

As another approach, the estimation unit 401 may estimate an input vector distribution based on a discriminant function f (x) obtained from learning data. For example, the estimating unit 401 sets p (x) as in the following equation (45).

厳密に言えば、（４５）式の分布は入力ベクトルの推定分布ではなく、近似誤差の重み付けを行うための便宜的な分布である。ＳＶＭの目的はクラス分類であり、識別境界付近の識別関数の近似精度が重要である。そこで、（４５）式のようにｆ（ｘ）＝０となる識別境界付近に入力ベクトルが集中していると考えて近似誤差の期待値を取ることで、識別境界付近の近似精度を高めることができる。（４５）式に基づく近似誤差の期待値計算は、例えばマルコフ連鎖モンテカルロ法により実施することができる。 Strictly speaking, the distribution of equation (45) is not an estimated distribution of the input vector but a distribution of convenience for weighting the approximation error. The purpose of an SVM is classification, so the approximation accuracy of the identification function near the identification boundary is what matters. Therefore, by regarding the input vectors as concentrated near the identification boundary where f(x) = 0, as in equation (45), and taking the expected value of the approximation error accordingly, the approximation accuracy near the identification boundary can be improved. The expected value of the approximation error based on equation (45) can be computed by, for example, a Markov chain Monte Carlo method.

（４５）式の分布は、識別境界からの距離が大きいほど、確率密度の値が小さくなる分布の例である。このような確率分布を用いることにより、識別精度への影響が大きい識別境界付近に重点を置いて識別関数を近似することが可能となる。 The distribution of Expression (45) is an example of a distribution in which the value of the probability density decreases as the distance from the identification boundary increases. By using such a probability distribution, it is possible to approximate the discriminant function with emphasis on the vicinity of the discriminating boundary that greatly affects the discriminating accuracy.
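A Markov chain Monte Carlo estimate of an expectation under a boundary-focused distribution can be sketched as follows. Equation (45) is not reproduced here, so the density p(x) ∝ exp(−λ·f(x)²) is purely an assumed example of a distribution whose density decreases with distance from the boundary f(x) = 0; the parameter names `lam` and `step` and the random-walk Metropolis sampler are likewise illustrative choices, not the document's method.

```python
import math
import random


def metropolis_samples(f, x0, lam=2.0, step=1.0,
                       n_samples=20000, burn_in=1000, seed=0):
    """Random-walk Metropolis sampling from p(x) proportional to
    exp(-lam * f(x)**2)  (ASSUMED boundary-weighted density)."""
    rng = random.Random(seed)

    def log_density(x):
        return -lam * f(x) ** 2

    x = list(x0)
    logp = log_density(x)
    out = []
    for t in range(burn_in + n_samples):
        proposal = [xi + rng.gauss(0.0, step) for xi in x]
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        if t >= burn_in:
            out.append(list(x))
    return out


def expected_value(g, samples):
    """Monte Carlo estimate of E[g(x)] from the drawn samples."""
    return sum(g(x) for x in samples) / len(samples)
```

In use, `g` would be the squared difference between the original identification function and its approximation, so that errors near f(x) = 0 dominate the estimate.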

なお、本実施形態は一般的なカーネルの場合だけでなく、第２の実施形態のＲＢＦカーネル、および、第３の実施形態の指数Ｂｈａｔｔａｃｈａｒｙｙａカーネルの場合にも同様に適用することができる。 The present embodiment can be applied not only to a general kernel but also to the RBF kernel of the second embodiment and the exponential Bhattacharyya kernel of the third embodiment.

本実施形態によれば、学習データを基に入力ベクトルの分布を推定し、その分布に基づいた識別関数の近似誤差の期待値最小化を行うことで、識別関数の近似精度を高めることができる。また、学習データの分布を反映した上で識別関数を近似することができる。 According to the present embodiment, the approximation accuracy of the identification function can be improved by estimating the distribution of the input vector from the learning data and minimizing the expected value of the approximation error of the identification function under that distribution. In addition, the identification function can be approximated in a way that reflects the distribution of the learning data.

（第５の実施形態）
本実施形態では、基準ベクトルの統合により得られた識別関数を用いてパターン認識などを実行する識別装置として情報処理装置を実現する例を説明する。図７は、第５の実施形態にかかる情報処理装置５００の構成を示すブロック図である。

(Fifth embodiment)
In the present embodiment, an example will be described in which an information processing apparatus is realized as an identification apparatus that performs pattern recognition and the like using an identification function obtained by integrating reference vectors. FIG. 7 is a block diagram illustrating a configuration of an information processing device 500 according to the fifth embodiment.

図７に示すように、本実施形態の情報処理装置５００は、記憶部２０と、取得部１０１と、選択部１０２と、生成部１０３と、重み算出部１０４と、評価値算出部１０５と、出力制御部１０６と、認識部５０１と、を備えている。 As illustrated in FIG. 7, the information processing apparatus 500 of the present embodiment includes a storage unit 20, an acquisition unit 101, a selection unit 102, a generation unit 103, a weight calculation unit 104, an evaluation value calculation unit 105, an output control unit 106, and a recognition unit 501.

第５の実施形態では、認識部５０１を追加したことが第１の実施形態と異なっている。その他の構成および機能は、第１の実施形態にかかる情報処理装置１００のブロック図である図１と同様であるので、同一符号を付し、ここでの説明は省略する。 The fifth embodiment differs from the first embodiment in that a recognition unit 501 is added. The other configurations and functions are the same as those in FIG. 1, which is a block diagram of the information processing apparatus 100 according to the first embodiment; they are therefore denoted by the same reference numerals, and their description is omitted here.

認識部５０１は、第１の実施形態の方法で識別関数が近似されたＳＶＭを用いた識別（認識）処理を実行する。例えば認識部５０１は、識別の対象となる特徴量ベクトルがいずれのクラスに属するかを、ＳＶＭを用いて識別する。特徴量ベクトルは、例えば、画像の特徴、文字の特徴、および、音声の特徴を表すベクトルである。このように、本実施形態の識別装置は、画像認識、文字認識、および、音声認識などの任意の識別処理に適用できる。 The recognition unit 501 executes identification (recognition) processing using an SVM whose identification function has been approximated by the method of the first embodiment. For example, the recognition unit 501 uses the SVM to identify the class to which a feature amount vector to be identified belongs. The feature amount vector is, for example, a vector representing features of an image, a character, or speech. The identification device of the present embodiment can thus be applied to any identification processing, such as image recognition, character recognition, and speech recognition.

次に、このように構成された第５の実施形態にかかる情報処理装置５００による識別処理について図８を用いて説明する。図８は、第５の実施形態における識別処理の一例を示すフローチャートである。 Next, the identification processing performed by the information processing apparatus 500 according to the fifth embodiment configured as described above will be described with reference to FIG. 8. FIG. 8 is a flowchart illustrating an example of the identification processing in the fifth embodiment.

認識部５０１は、識別の対象となる特徴量ベクトルを、記憶部２０または外部装置などから入力する（ステップＳ３０１）。認識部５０１は、ＳＶＭを用いて特徴量ベクトルがいずれのクラスに属するかを識別する（ステップＳ３０２）。認識部５０１は、識別結果を出力し（ステップＳ３０３）、識別処理を終了する。 The recognition unit 501 inputs a feature amount vector to be identified from the storage unit 20 or an external device (step S301). The recognizing unit 501 identifies to which class the feature vector belongs using the SVM (step S302). The recognition unit 501 outputs the identification result (Step S303), and ends the identification processing.
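Steps S301 to S303 can be sketched as evaluating an identification function of the form f(x) = Σ α_i K(s^(i), x) over the reference vectors and reading the class from its sign. The sketch below is a two-class illustration under stated assumptions: the RBF form exp(−γ‖s − x‖²), the optional `bias` term, and the function names are hypothetical and not taken from this document.

```python
import math


def rbf_kernel(s, x, gamma=1.0):
    """Assumed RBF form exp(-gamma * ||s - x||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(s, x)))


def decision_value(x, reference_vectors, weights, bias=0.0, kernel=rbf_kernel):
    """Sum of weighted kernel values between x and each reference vector."""
    return sum(w * kernel(s, x) for s, w in zip(reference_vectors, weights)) + bias


def classify(x, reference_vectors, weights, bias=0.0, kernel=rbf_kernel):
    """Step S302 for two classes: +1 if f(x) >= 0, otherwise -1."""
    value = decision_value(x, reference_vectors, weights, bias, kernel)
    return 1 if value >= 0.0 else -1
```

The loop makes one kernel evaluation per reference vector, which is why reducing the number of reference vectors by integration shortens identification time.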

なお、本実施形態は第１の実施形態だけでなく、第２〜第４の実施形態にも同様に適用することができる。本実施形態によれば、より少ない基準ベクトルを用いて近似された識別関数を用いることができるため、識別性能の劣化を極力抑えた上で処理時間を低減することが可能となる。 This embodiment can be applied not only to the first embodiment but also to the second to fourth embodiments. According to the present embodiment, since an identification function approximated by using a smaller number of reference vectors can be used, it is possible to reduce processing time while minimizing deterioration of identification performance.

次に、上記各実施形態にかかる情報処理装置のハードウェア構成について図９を用いて説明する。図９は、各実施形態にかかる情報処理装置のハードウェア構成例を示す説明図である。 Next, the hardware configuration of the information processing apparatus according to each of the above embodiments will be described with reference to FIG. 9. FIG. 9 is an explanatory diagram illustrating a hardware configuration example of the information processing apparatus according to each embodiment.

各実施形態にかかる情報処理装置は、ＣＰＵ（Central Processing Unit）５１などの制御装置と、ＲＯＭ（Read Only Memory）５２やＲＡＭ（Random Access Memory）５３などの記憶装置と、ネットワークに接続して通信を行う通信Ｉ／Ｆ５４と、各部を接続するバス６１を備えている。 The information processing apparatus according to each embodiment includes a control device such as a CPU (Central Processing Unit) 51, storage devices such as a ROM (Read Only Memory) 52 and a RAM (Random Access Memory) 53, a communication I/F 54 that connects to a network and performs communication, and a bus 61 that connects the respective units.

なお、本実施形態の情報処理装置で実行されるプログラムは、ＲＯＭ等に予め組み込まれて提供される。 The program executed by the information processing apparatus according to the present embodiment is provided by being incorporated in a ROM or the like in advance.

本実施形態の情報処理装置で実行されるプログラムは、インストール可能な形式又は実行可能な形式のファイルでＣＤ−ＲＯＭ、フレキシブルディスク（ＦＤ）、ＣＤ−Ｒ、ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｋ）等のコンピュータで読み取り可能な記録媒体に記録してコンピュータ・プログラム・プロダクトとして提供するように構成してもよい。 The program executed by the information processing apparatus according to the present embodiment may instead be recorded, as a file in an installable or executable format, on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, or a DVD (Digital Versatile Disk), and provided as a computer program product.

さらに、本実施形態の情報処理装置で実行されるプログラムを、インターネット等のネットワークに接続されたコンピュータ上に格納し、ネットワーク経由でダウンロードさせることにより提供するように構成してもよい。また、本実施形態の情報処理装置で実行されるプログラムをインターネット等のネットワーク経由で提供または配布するように構成してもよい。 Furthermore, the program executed by the information processing apparatus of the present embodiment may be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. Further, the program executed by the information processing apparatus of the present embodiment may be provided or distributed via a network such as the Internet.

本実施形態の情報処理装置で実行されるプログラムは、上述した各部を含むモジュール構成となっており、実際のハードウェアとしてはＣＰＵ（プロセッサ）が上記ＲＯＭからプログラムを読み出して実行することにより上記各部が主記憶装置上にロードされ、各部が主記憶装置上に生成されるようになっている。 The program executed by the information processing apparatus according to the present embodiment has a module configuration including the units described above. As actual hardware, the CPU (processor) reads the program from the ROM and executes it, whereby the units are loaded onto the main storage device and generated on the main storage device.

２０記憶部
１００、２００、３００、４００、５００情報処理装置
１０１取得部
１０２選択部
１０３、２０３、３０３生成部
１０４、２０４、３０４重み算出部
１０５、２０５、３０５評価値算出部
１０６出力制御部
４０１推定部
５０１認識部

Reference Signs List
20 storage unit
100, 200, 300, 400, 500 information processing device
101 acquisition unit
102 selection unit
103, 203, 303 generation unit
104, 204, 304 weight calculation unit
105, 205, 305 evaluation value calculation unit
106 output control unit
401 estimation unit
501 recognition unit

D. Nguyen, T. Ho, “An efficient method for simplifying support vector machines”, ICML, 617-624, 2005.
Z. Wang, K. Crammer, S. Vucetic, “Multi-Class Pegasos on a Budget”, In Proc. ICML, 2010.

Claims

An information processing apparatus that processes an identification function represented by a sum of values of a kernel function having a plurality of reference vectors and an input vector as arguments, the information processing apparatus comprising:
an acquisition unit that acquires the plurality of reference vectors;
a generation unit that generates an integrated vector obtained by integrating a first reference vector and a second reference vector included in the plurality of reference vectors;
an evaluation value calculation unit that calculates an evaluation value representing an expected value, based on a probability distribution of the input vector, of an approximation error between the identification function and an approximation function obtained by replacing the first reference vector and the second reference vector with the integrated vector; and
an output control unit that outputs the first reference vector and the second reference vector that minimize the evaluation value.

The reference vector is a support vector obtained by learning a support vector machine,
The information processing device according to claim 1.

The probability distribution of the input vector is a uniform distribution or a normal distribution,
The information processing device according to claim 1.

The probability distribution of the input vector is estimated from learning data,
The information processing device according to claim 1.

The probability distribution of the input vector is a distribution in which the value of the probability density decreases as the distance from the identification boundary increases.
The information processing device according to claim 1.

The output control unit selects the first reference vector and the second reference vector that minimize the evaluation value by a metaheuristic algorithm, including simulated annealing, tabu search, and a genetic algorithm, and outputs the selected first reference vector and second reference vector,
The information processing device according to claim 1.

The acquisition unit acquires one or more identification functions for identifying one or more classes, respectively.
The information processing device according to claim 1.

An information processing method executed by an information processing apparatus that processes an identification function represented by a sum of values of a kernel function having a plurality of reference vectors and an input vector as arguments, the information processing method comprising:
an acquisition step of acquiring the plurality of reference vectors;
a generation step of generating an integrated vector obtained by integrating a first reference vector and a second reference vector included in the plurality of reference vectors;
an evaluation value calculation step of calculating an evaluation value representing an expected value, based on a probability distribution of the input vector, of an approximation error between the identification function and an approximation function obtained by replacing the first reference vector and the second reference vector with the integrated vector; and
an output control step of outputting the first reference vector and the second reference vector that minimize the evaluation value.

A program for causing an information processing apparatus that processes an identification function represented by a sum of values of a kernel function having a plurality of reference vectors and an input vector as arguments to function as:
an acquisition unit that acquires the plurality of reference vectors;
a generation unit that generates an integrated vector obtained by integrating a first reference vector and a second reference vector included in the plurality of reference vectors;
an evaluation value calculation unit that calculates an evaluation value representing an expected value, based on a probability distribution of the input vector, of an approximation error between the identification function and an approximation function obtained by replacing the first reference vector and the second reference vector with the integrated vector; and
an output control unit that outputs the first reference vector and the second reference vector that minimize the evaluation value.