JP5130934B2

JP5130934B2 - Recognition system, information processing apparatus, design apparatus, and program

Info

Publication number: JP5130934B2
Application number: JP2008029072A
Authority: JP
Inventors: 震一田村
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2007-03-05
Filing date: 2008-02-08
Publication date: 2013-01-30
Anticipated expiration: 2028-02-08
Also published as: JP2008250990A

Description

The present invention relates to a method of deriving the calculation model of a recognition system that, from a given input value, calculates a value representing the category corresponding to the input value by means of a predetermined calculation model and thereby recognizes that category; to a recognition system that recognizes categories using this calculation model; and to an information processing apparatus, a design apparatus, a program, and the like used to carry out this method.

Conventionally, methods using neural networks or support vector machines are known as ways of implementing recognition on a computer (see, for example, Patent Document 1 and Non-Patent Document 1).

A neural network is a mathematical model of the function of nerve cells. A nerve cell emits a pulse when the potential applied by its input signals exceeds a threshold. A neural network realizes this function with a nonlinear function such as a sigmoid function. That is, a neural network substitutes an input value into a nonlinear function and then substitutes the resulting output value into the nonlinear function corresponding to the next nerve cell. The output value corresponding to the recognition result is obtained from the output of the terminal nonlinear function.

Each synapse connecting nerve cells has a different propagation efficiency, and the recognition result varies with the connections between nerve cells and the propagation efficiency of each connection. A neural network models this by weighting the output value of each nonlinear function by a connection weight W before substituting it into the next nonlinear function.

To construct a recognition system using such a neural network, the type of the neural network, that is, the connections between the units corresponding to nerve cells, is determined first. In other words, the type of a neural network is a neural network whose connection weights W are unknown. The connection weights W are therefore determined next.

Specifically, each piece of learning data combines a sample of the vector input to the neural network with the output value that should be obtained when that sample is input, and the connection weights W are learned from a plurality of such learning data. Hierarchical networks are a known kind of neural network, and the backpropagation method is a known method of learning the connection weights W of a hierarchical network. Conventionally, the calculation model of a recognition system is derived in this way and the recognition system is built from it.

Recognition by a support vector machine, on the other hand, calculates a value y representing the category corresponding to an input vector U by means of a discriminant function y₁ or a discriminant function y₂, and thereby recognizes the category corresponding to the vector U.

[Equations for y₁ and y₂ not reproduced in this text.]

In this specification, the inner product of a vector A and a vector B is written <A, B>. The function sgn(x) is the sign function, which takes the value 1 when x > 0 and −1 when x ≤ 0.

The parameters α_s (s = 1, 2, …, S) and β contained in the discriminant functions y₁ and y₂ are determined from S pieces of learning data D_v(s) = {U(s), y(s)} (s = 1, 2, …, S), each combining a sample U(s) of the vector U with a value y(s) representing the category to which that sample corresponds.

The function K(U, U(s), γ) is called a kernel. It is used to compute the inner product <φ(U), φ(U(s))> of the vectors φ(U) and φ(U(s)) obtained when the vectors U and U(s) are nonlinearly transformed by a mapping φ into a high-dimensional space. The Gaussian kernel K₁ and the polynomial kernel K₂ are known examples of the kernel K(U, U(s), γ).
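The formulas for y₁, y₂, K₁, and K₂ do not appear in this text. For orientation, the standard textbook forms are shown below. They are an assumption, not the patent's own equations: in particular, whether α_s absorbs the label y(s), and how γ enters each kernel (here the width of the Gaussian and the degree of the polynomial), may differ in the original.

```latex
\begin{aligned}
y_1 &= \operatorname{sgn}\Big(\sum_{s=1}^{S} \alpha_s\, y(s)\, \langle U, U(s)\rangle + \beta\Big), \\
y_2 &= \operatorname{sgn}\Big(\sum_{s=1}^{S} \alpha_s\, y(s)\, K(U, U(s), \gamma) + \beta\Big), \\
K_1(U, U(s), \gamma) &= \exp\big(-\gamma\, \lVert U - U(s) \rVert^{2}\big), \\
K_2(U, U(s), \gamma) &= \big(\langle U, U(s)\rangle + 1\big)^{\gamma}.
\end{aligned}
```

In this reading, y₂ is y₁ with the inner product <U, U(s)> replaced by the kernel value, which stands in for <φ(U), φ(U(s))>.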

The discriminant function y₂ is called a nonlinear support vector machine. To construct a recognition system using it, the type of the nonlinear support vector machine is determined first, that is, a nonlinear support vector machine whose parameters α_s, β, and γ are unknown. The parameter γ is then set to a suitable value, the parameters α_s and β are solved for on the basis of the learning data D_v(s) = {U(s), y(s)} (s = 1, 2, …, S), and the nonlinear support vector machine is thereby designed.

As is well known, when a support vector machine is designed, the parameters α_s and β are solved for so as to separate samples of different categories and to find the hyperplane that lies farthest from each group of samples of different categories.

Patent Document 1: JP 2005-316888 A
Non-Patent Document 1: Vladimir N. Vapnik, "The Nature of Statistical Learning Theory", 2nd edition, (USA), Springer, 1999, pp. 132-140

However, the conventionally known neural network learning method and support vector machine design method have the following problems.
As described above, backpropagation is the conventionally known learning method for neural networks. Backpropagation corrects the connection weights W in the direction that reduces the squared error between the output obtained when the input sample of a piece of learning data is fed to the neural network and the output value (teacher signal) given by that learning data. Depending on the initial values given to the connection weights W at the start of learning, the optimal solution may therefore not be found.

The learning of the connection weights W by backpropagation is explained here using a simple neural network as an example: a three-layer feedforward neural network with two input units, two intermediate units, and one output unit, in which a sigmoid function is adopted as the nonlinear function (see FIG. 12(a)). The input/output relationship of this neural network is expressed by the following equation.

[Equation not reproduced in this text.]
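The example network can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sigmoid is taken to be the logistic function, and the assignment of w1 to w9 to particular connections and thresholds is hypothetical, since neither formula appears in this text.

```python
import math


def sigmoid(a: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-a)), written with tanh to avoid overflow.

    Assumed form: the patent's own sigmoid formula is not shown in this text.
    """
    return 0.5 * (1.0 + math.tanh(0.5 * a))


def forward(w, x1: float, x2: float) -> float:
    """Output of a 2-2-1 feedforward network with nine weights.

    Hypothetical indexing: w[0..2] feed intermediate unit 1 (two inputs and a
    bias), w[3..5] feed intermediate unit 2, and w[6..8] feed the output unit.
    """
    z1 = sigmoid(w[0] * x1 + w[1] * x2 + w[2])
    z2 = sigmoid(w[3] * x1 + w[4] * x2 + w[5])
    return sigmoid(w[6] * z1 + w[7] * z2 + w[8])


print(forward([0.0] * 9, 1.0, -1.0))
```

With a logistic output unit the result lies strictly between 0 and 1, so a different output nonlinearity would be needed to reach category values such as −1 and +1.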

Accordingly, if the connection weights W = {w1, …, w9} of this neural network are learned from S pieces of learning data D(s) = {x1(s), x2(s), y(s)} (s = 1, 2, …, S), backpropagation seeks the connection weights W that minimize the squared error E.

[Equation for E not reproduced in this text.]

Here y(s) is the value representing the category corresponding to the sample X(s) = {x1(s), x2(s)} (the value that the neural network should calculate) and takes, for example, −1 or +1.

However, because the squared error E is a nonlinear function of w1, …, w9, it has multiple local minima, as shown in FIG. 12(b). Depending on the initial values of w1, …, w9 at the start of learning, learning may converge to a local minimum that is not the global minimum, and the solution for w1, …, w9 is then taken from that point. That is, because the conventional method can find only a local solution for the connection weights W, it may fail to yield an appropriate solution.
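The dependence on initial values can be explored with a small experiment. The sketch below is not the patent's procedure: it assumes a logistic sigmoid, a hypothetical ordering of w1 to w9, an XOR-like toy data set with targets 0 and 1 (chosen to match the logistic output range), and plain gradient descent with a numerical gradient in place of backpropagation.

```python
import math
import random


def sigmoid(a):
    # Logistic function, written with tanh so large |a| cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * a))


def forward(w, x1, x2):
    # 2-2-1 network; the weight ordering is a hypothetical choice.
    z1 = sigmoid(w[0] * x1 + w[1] * x2 + w[2])
    z2 = sigmoid(w[3] * x1 + w[4] * x2 + w[5])
    return sigmoid(w[6] * z1 + w[7] * z2 + w[8])


def squared_error(w, data):
    # E = sum over samples of (teacher signal - network output)^2
    return sum((y - forward(w, x1, x2)) ** 2 for x1, x2, y in data)


def numeric_gradient(w, data, eps=1e-6):
    # Central-difference estimate of dE/dw_i for each weight.
    grad = []
    for i in range(len(w)):
        up = w[:i] + [w[i] + eps] + w[i + 1:]
        down = w[:i] + [w[i] - eps] + w[i + 1:]
        grad.append((squared_error(up, data) - squared_error(down, data)) / (2 * eps))
    return grad


def descend(w, data, rate=0.5, steps=300):
    # Move the weights against the gradient of E for a fixed number of steps.
    for _ in range(steps):
        g = numeric_gradient(w, data)
        w = [wi - rate * gi for wi, gi in zip(w, g)]
    return w


DATA = [(0.0, 0.0, 0.0), (0.0, 1.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 0.0)]

for seed in range(3):
    rng = random.Random(seed)
    w0 = [rng.uniform(-1.0, 1.0) for _ in range(9)]
    print(seed, squared_error(descend(w0, DATA), DATA))
```

Runs started from different seeds may end at different values of E, which is the behaviour the paragraph above attributes to the multiple local minima of E.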

Moreover, the conventional method merely solves for the connection weights W so that the squared error E becomes small for the learning data D(s). When a value other than the learning data D(s) used to compute this solution is input to the neural network, an appropriate recognition result is not necessarily obtained.

Support vector machines, for their part, have had the following problem. To design a support vector machine, the set of samples φ(U(s)) obtained by mapping the samples U(s) into a higher dimension with the mapping φ must be linearly separable into per-category sets. If no mapping φ can be found that makes the set of φ(U(s)) linearly separable without an enormous amount of computation, a recognition system using a support vector machine cannot be designed for the samples U(s).

The kernel K(U, U(s), γ) described above can be used to simplify the computation of the inner product <φ(U), φ(U(s))> in the discriminant function y₂. As is well known, however, an arbitrary function cannot be used as the kernel K. Even when a nonlinear support vector machine is designed using the kernel K(U, U(s), γ), a nonlinear support vector machine therefore cannot be designed for arbitrary learning data.

In addition, the value of the parameter γ of the kernel K(U, U(s), γ) has conventionally been chosen by the designer through trial and error, which has hindered the design of an optimal nonlinear support vector machine.

The present invention has been made in view of these problems. Its object is to make it possible to derive a calculation model of a recognition system that is more suitable than conventional ones, and thereby to construct a suitable recognition system.

To achieve this object, the present invention provides a model derivation method for a recognition system that calculates, from an input value X = {x1, …, xN₁} (where N₁ is an integer of 2 or more), a value y representing the category corresponding to the input value X by means of a predetermined calculation model and thereby recognizes that category. The method derives the calculation model from an arbitrary set of S pieces of learning data D(s) = {X(s), y(s)} (s = 1, …, S), each combining a sample X(s) of the input value X with a value y(s) representing the category to which the sample belongs, through the following procedures [a] to [c].

In the model derivation method of the present invention, an (L₁+1)-layer neural network (where L₁ is an integer of 2 or more) is first set as the neural network to be trained, with layer 0 as an input layer consisting of N₁ input units and layer L₁ as the output layer.

The unknown learning parameters W₁ of this neural network are then learned from the S pieces of learning data D(s), using the value y(s) of each piece of learning data D(s) as the teacher signal, to obtain a solution for the learning parameters W₁ (procedure [a]). The solution for W₁ can be obtained by the well-known backpropagation method.

After the solution for the learning parameters W₁ has been obtained in procedure [a], the trained (L₁+1)-layer neural network in which that solution is set is used to generate new learning data D_v(s) corresponding to each piece of learning data D(s) (procedure [b]).

Specifically, for each piece of learning data D(s), the sample X(s) is supplied as the input value X to the trained (L₁+1)-layer neural network, and the output value Z = {z1, …, zN₂} of layer (L₁−1) of this network is obtained. This output value Z is set as a new sample U(s), and the data D_v(s) = {U(s), y(s)}, combining the new sample U(s) with the value y(s) given by the learning data D(s), is generated as the new learning data corresponding to D(s). Layer (L₁−1) is assumed to consist of N₂ intermediate units (where N₂ is an integer of 2 or more).
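Procedure [b] amounts to running each sample through the trained network and keeping the outputs of the layer just below the output layer. The following sketch illustrates this; the data layout (a list of layers, each a list of `(weights, bias)` pairs), the function names, and the logistic sigmoid are all assumptions made for the example.

```python
import math


def sigmoid(a):
    # Logistic function (assumed nonlinearity), overflow-safe via tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * a))


def layer_outputs(layers, x):
    """Return the outputs of every layer, from layer 0 (the input) to layer L1.

    layers[k] describes layer k + 1 as a list of (weights, bias) pairs,
    one pair per unit. This layout is hypothetical.
    """
    outputs = [list(x)]
    for units in layers:
        prev = outputs[-1]
        outputs.append(
            [sigmoid(sum(w * v for w, v in zip(ws, prev)) + b) for ws, b in units]
        )
    return outputs


def make_svm_learning_data(layers, data):
    """Procedure [b]: turn D(s) = (X(s), y(s)) into D_v(s) = (U(s), y(s)).

    U(s) is the output Z of layer (L1 - 1), the layer just below the output.
    """
    new_data = []
    for x, y in data:
        z = layer_outputs(layers, x)[-2]
        new_data.append((z, y))
    return new_data


# Toy network with N1 = 2 inputs, N2 = 2 intermediate units, 1 output unit.
toy_layers = [
    [([0.0, 0.0], 0.0), ([0.0, 0.0], 0.0)],
    [([0.0, 0.0], 0.0)],
]
toy_data = [([1.0, 2.0], 1), ([0.0, -1.0], -1)]
print(make_svm_learning_data(toy_layers, toy_data))
```

The category values y(s) pass through unchanged; only the samples are replaced.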

After procedure [b] is complete, a nonlinear support vector machine that calculates the value y representing the category corresponding to an input value U = {u1, …, uN} is designed on the basis of the learning data D_v(s) generated in procedure [b] (procedure [c]). Here N is the number of dimensions of the samples U(s) of the learning data D_v(s), which serve as samples of the input value U.

The nonlinear support vector machine calculates +1 or −1 as the value y representing the category corresponding to the input value U by means of the discriminant function y.

[Equation not reproduced in this text.]

Accordingly, the value of the parameter γ is set in the type of the nonlinear support vector machine, that is, a nonlinear support vector machine whose parameters α_s, β, and γ are unknown. The parameters α_s and β are then solved for on the basis of the learning data D_v(s) = {U(s), y(s)} (s = 1, 2, …, S), and the solution is set in the nonlinear support vector machine to complete its design.

In the present invention, the learning parameters W₁ and the nonlinear support vector machine obtained through procedures [a] to [c] are then used to derive the following calculation model as the calculation model of the recognition system.

Namely, the derived model combines a calculation model capable of computing the output value Z of layer (L₁−1) of the trained (L₁+1)-layer neural network with the nonlinear support vector machine designed in procedure [c]. It calculates the output value Z of layer (L₁−1) from the input value X and, taking Z as the input value U of the nonlinear support vector machine, calculates the output value y of that machine.
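Structurally, the derived model is a composition of two functions: X to Z, then Z (as U) to y. A minimal sketch of that composition follows; `make_recognizer`, `toy_hidden`, and `toy_svm` are hypothetical names, and the two toy stand-ins are placeholders rather than a trained network or a designed support vector machine.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def make_recognizer(
    hidden_layer: Callable[[Vector], Vector],
    svm: Callable[[Vector], int],
) -> Callable[[Vector], int]:
    """Compose X -> Z (layer L1 - 1 of the trained network) with Z -> y (SVM)."""

    def recognize(x: Vector) -> int:
        z = hidden_layer(x)  # output value Z of layer (L1 - 1)
        return svm(z)        # Z is used as the SVM input value U

    return recognize


# Toy stand-ins (hypothetical): a one-unit "hidden layer" and a threshold "SVM".
toy_hidden = lambda x: [x[0] + x[1]]
toy_svm = lambda u: 1 if u[0] > 0 else -1

recognize = make_recognizer(toy_hidden, toy_svm)
print(recognize([1.0, 2.0]), recognize([-3.0, 1.0]))
```

The output layer of the neural network is not part of the composed model; only the layers up to (L₁−1) are kept.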

The calculation model is derived in this way for two reasons. A recognition method using a neural network offers no theoretical guarantee that the optimal solution for the learning parameters W₁ will be found. A recognition method using a nonlinear support vector machine can find the optimal solution for the parameters α_s and β, but a nonlinear support vector machine cannot be designed for an arbitrary recognition target.

Compared with a neural network, a nonlinear support vector machine is superior in that it can obtain correct recognition results even for inputs other than the learning data. As described above, however, designing a nonlinear support vector machine requires that the samples U(s) of the input value U form a set that is easy to separate linearly, so conventional methods could not apply the nonlinear support vector machine technique to arbitrary recognition systems.

In the present invention, therefore, new learning data D_v(s) is generated from the learning data D(s), replacing each sample X(s) with a value Z that is easy to separate linearly, so that a nonlinear support vector machine can be designed regardless of the samples X(s).

The value Z is easy to separate linearly because the output y of the output unit of the neural network represents a value corresponding to the signed distance of the output value Z = {z1, …, zN₂} of layer (L₁−1), one layer below, from a hyperplane.

That is, the output y of the output unit of the neural network can be expressed by the following equation, where W(i) denotes the connection coefficient between the i-th intermediate unit of layer (L₁−1) and the output unit, and W0 denotes the threshold of the output unit.

[Equation not reproduced in this text.]

Meanwhile, the signed distance L_g from a hyperplane in N₂ dimensions can be expressed by the following equation, where the normal vector of the hyperplane is a normal vector G = {g1, …, gN₂} of length 1.

[Equation not reproduced in this text.]

Therefore, if an appropriate solution for the learning parameters W(i) corresponding to the learning data D(s) has been obtained, the set of output values Z of layer (L₁−1) obtained when each sample X(s) is input to the neural network is linearly separable by a hyperplane, category by category.
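The link between the output unit and a signed distance can be made explicit with a short derivation. It is a sketch under assumptions not confirmed by this text: f denotes the output unit's nonlinear function, the threshold W0 enters with a plus sign, g0 is a hypothetical offset term of the hyperplane, and ‖W‖ is the Euclidean norm of the coefficients W(1), …, W(N₂).

```latex
\begin{aligned}
y &= f\Big(\sum_{i=1}^{N_2} W(i)\, z_i + W_0\Big), \\
L_g &= \sum_{i=1}^{N_2} g_i\, z_i + g_0, \qquad \sum_{i=1}^{N_2} g_i^{2} = 1, \\
g_i &= \frac{W(i)}{\lVert W \rVert}, \quad g_0 = \frac{W_0}{\lVert W \rVert}
\;\Longrightarrow\;
y = f\big(\lVert W \rVert\, L_g\big).
\end{aligned}
```

Because a sigmoid-type f is monotonically increasing, the sign of L_g, that is, the side of the hyperplane on which Z lies, decides whether y falls above or below f(0). A network that classifies the learning data correctly therefore places the Z values of the two categories on opposite sides of that hyperplane.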

Of course, the type of the neural network is set by the designer through trial and error. If the type adopted by the designer, in other words the number of layers or the number of intermediate units, is not appropriate, the set of output values Z of layer (L₁−1) may not be linearly separable by a hyperplane. Even so, the set of output values Z can at least be said to be easier to separate linearly than the set of samples X(s).

In the present invention, therefore, new learning data D_v is generated from the output values Z, and the nonlinear support vector machine is designed on the basis of this learning data D_v. Because the set of output values Z is at least far easier to separate linearly than the set of X(s), it can be linearly separated relatively easily by the mapping φ.

Thus, according to the present invention, an appropriate nonlinear support vector machine can be designed regardless of the distribution of the samples X(s), and the nonlinear support vector machine makes it possible to derive a calculation model of a recognition system that is more suitable than conventional ones.

In other words, because only a local solution can be obtained for the learning parameters W₁ (connection coefficients) of a neural network, there has been no guarantee that a theoretically optimal neural network can be designed. The present invention derives the calculation model of the recognition system using a support vector machine, for which a global solution for the parameters α_s and β can be found. The support vector machine thus compensates for this drawback of the neural network, and an appropriate calculation model of the recognition system can be obtained.

Furthermore, a neural network does not necessarily yield an appropriate recognition result when a value other than the learning data is input, whereas a support vector machine, as described above, can yield correct recognition results for such values as well. Using a calculation model that incorporates the support vector machine derived by the method of the present invention therefore makes it possible to construct a recognition system with better recognition performance than before.

That is, if a recognition system is constructed that uses the calculation model derived by the model derivation method of the present invention to calculate, from a given input value X = {x1, …, xN₁}, the value y representing the category corresponding to X and thereby recognizes that category, the recognition system can perform recognition better than conventional systems.

The recognition system described above is preferably configured to output, as the recognition result for the input value X, the value y representing the category corresponding to X, together with the absolute value |p| of the input value p to the sign function of the nonlinear support vector machine, obtained in the course of calculating y, as information representing the confidence of the recognition result. A recognition system configured in this way can also output information on the reliability of the recognition result, so that when the confidence is low, actions such as repeating the recognition operation become possible.
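Returning both y and |p| takes only a small change to the decision function. The sketch below assumes the standard textbook discriminant, p = Σ α_s y(s) K(U, U(s), γ) + β, and a Gaussian kernel of the form exp(−γ‖U − U(s)‖²). Neither formula appears in this text, and the function names and toy parameter values are hypothetical.

```python
import math


def gaussian_kernel(u, v, gamma):
    # Assumed parameterization: exp(-gamma * squared Euclidean distance).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def classify_with_confidence(u, samples, labels, alphas, beta, gamma):
    """Return (y, |p|): the category value and the confidence measure."""
    p = sum(
        a * y * gaussian_kernel(u, us, gamma)
        for a, y, us in zip(alphas, labels, samples)
    ) + beta
    y = 1 if p > 0 else -1  # sgn as defined in this specification
    return y, abs(p)


# Toy parameters (hypothetical, not the result of any training).
samples = [[0.0], [2.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]

print(classify_with_confidence([2.0], samples, labels, alphas, 0.0, 1.0))
print(classify_with_confidence([1.0], samples, labels, alphas, 0.0, 1.0))
```

An input midway between the two toy samples gives p = 0 and therefore a confidence of 0, the kind of case in which a caller might repeat the recognition operation.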

In the model derivation method described above, the learning data D_v(s) may also be generated as follows. In procedure [b], for each piece of learning data D(s), the value Z calculated from the sample X(s) is reduced in dimension by a predetermined algorithm, and the dimension-reduced value V = {v1, …, vN₃} (where N₃ is an integer of 2 or more that is smaller than N₂) is set as the new sample U(s) to generate the learning data D_v(s) = {U(s), y(s)}.

When the learning data D_v(s) is generated in this way, the calculation model derived for the recognition system calculates the output value Z of layer (L₁−1) from the input value X, reduces the dimension of Z by the predetermined algorithm to convert it into an N₃-dimensional value V = {v1, …, vN₃}, and calculates the output value y of the nonlinear support vector machine with V as its input value U. Reducing the dimension of the input value U in this way reduces the amount of computation required for designing the nonlinear support vector machine and for recognition.

Specifically, the dimension reduction can be realized by principal component analysis. In procedure [b], for each piece of learning data D(s), the value Z calculated from the sample X(s) is reduced in dimension using N₃ N₂-dimensional vectors Jm (m = 1, …, N₃) obtained by principal component analysis. This converts Z into an N₃-dimensional value V = {v1, …, vN₃} whose m-th element vm is the inner product <Jm, Z> of the vector Jm and the value Z. The dimension-reduced value V is then set as the new sample U(s) to generate the learning data D_v(s) = {U(s), y(s)}.

When the learning data D_v(s) is generated in this way, the calculation model derived for the recognition system calculates the output value Z of layer (L₁−1) from the input value X, reduces the dimension of Z using the N₃ N₂-dimensional vectors Jm, and calculates the output value y of the nonlinear support vector machine with the dimension-reduced value V as its input value U.
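The projection step itself is a set of inner products, vm = <Jm, Z>. The sketch below shows only that step and assumes the vectors Jm have already been obtained by principal component analysis; the function names and the toy basis are hypothetical.

```python
def inner(a, b):
    # Inner product <A, B> of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def reduce_dimension(z, basis):
    """Map an N2-dimensional Z to an N3-dimensional V with v_m = <J_m, Z>.

    basis is the list [J_1, ..., J_N3] of N2-dimensional vectors, assumed to
    come from a principal component analysis performed elsewhere.
    """
    return [inner(j, z) for j in basis]


# Toy example with N2 = 3 and N3 = 2 (the basis is illustrative only).
toy_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(reduce_dimension([3.0, 4.0, 5.0], toy_basis))
```

The same vectors Jm must be used when generating D_v(s) and when recognizing new inputs, since the support vector machine is designed for the reduced values V.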

When the kernel K(U, U(s), γ) is used in the nonlinear support vector machine, the parameter γ must be set before the parameters α_s and β are solved for on the basis of the learning data D_v. Setting γ by the conventional method is not only laborious but also does not necessarily give γ an appropriate value. The solution for the parameter γ is therefore preferably obtained as follows.

In procedure [c], for the nonlinear support vector machine that calculates the value y representing the category from the input value U, a suitable value of the parameter γ of the kernel K(U, U(s), γ) is obtained by the following procedures [1] to [3], and the nonlinear support vector machine is designed using the kernel K(U, U(s), γ) with γ set to that value.

Procedure [1]: As the neural network to be trained, an (L₂+1)-layer neural network is set in which the 0th layer is an input layer consisting of N input units and the L₂-th layer is the output layer (where L₂ is an integer of 2 or more). The unknown learning parameter W₂ of this neural network is then learned based on the S learning data D_v(s), using the value y(s) of each learning data D_v(s) as the teacher signal, and a solution of the learning parameter W₂ is obtained.

Procedure [2]: For each learning data D_v(s), the sample U(s) of the learning data D_v(s) is input to the trained (L₂+1)-layer neural network in which the solution of the learning parameter W₂ obtained in procedure [1] is set, and the output value H(s) = {h1(s), …, hN_h(s)} of the (L₂−1)-th layer of this neural network is obtained. Here, the (L₂−1)-th layer consists of N_h intermediate units (where N_h is an integer larger than the number of dimensions N of the input value U).

Procedure [3]: Using the learning data D_v(s) and the value H(s) corresponding to the sample U(s) of each learning data D_v(s) obtained in procedure [2], a solution γ^* of the parameter γ at which the squared error E given by the following equation takes a local minimum is obtained as the appropriate value of the parameter γ.
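The equation for the squared error E does not appear in this text. One form consistent with the surrounding definitions, offered only as an assumption, compares the inner products of the intermediate-layer outputs with the kernel values:

```latex
E(\gamma) = \sum_{a=1}^{S}\sum_{b=1}^{S}\bigl(Q_{ab} - Q'_{ab}(\gamma)\bigr)^{2},
\qquad
Q_{ab} = \langle H(a), H(b) \rangle,
\qquad
Q'_{ab}(\gamma) = K\bigl(U(a), U(b), \gamma\bigr)
```

The plain sum-of-squares form, the absence of any scaling factor, and the reading of Q'ab as the kernel value for the pair U(a), U(b) are all assumptions; only the definition of Qab as the inner product <H(a), H(b)> is stated in the description of the embodiment.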

If the solution of the parameter γ is obtained by procedures [1] to [3], the appropriate value of the parameter γ to be set in the kernel can be obtained easily by calculation, and a more appropriate value can be set for the parameter γ than when the value is set by trial and error as in the conventional approach.

As the kernel K(U, U(s), γ), a Gaussian kernel or a polynomial kernel can be used. When the Gaussian kernel is used, Q'ab in procedure [3] is obtained by the following equation.

When a polynomial kernel is used, Q'ab in procedure [3] can be obtained by the following equation.

When a calculation model of the recognition system is derived by the method of the present invention, the following information processing apparatus is preferably used.

The information processing apparatus of the present invention is used to derive the calculation model of a recognition system that calculates, from a given input value X = {x1, …, xN₁}, a value y representing the category corresponding to the input value X by a predetermined calculation model and thereby recognizes the category corresponding to the input value X. The apparatus comprises the following acquisition means, learning means, new data generation means, support vector machine design means, and output means.

In this information processing apparatus, the acquisition means acquires S learning data D(s) = {X(s), y(s)}, each a combination of a sample X(s) of the input value X and a value y(s) representing the category to which the sample belongs, and the learning means obtains a solution of the learning parameter W₁ of the neural network by the above-described procedure [a], based on each learning data D(s) acquired by the acquisition means.

また、新データ生成手段は、上述の手順［ｂ］により、学習手段で求められた学習パラメータＷ₁の解を設定してなる学習後の（Ｌ₁＋１）層のニューラルネットワークを用いて、取得手段が取得した各学習データＤ（ｓ）に対応する新たな学習データＤ_v（ｓ）を生成し、サポートベクタマシン設計手段は、これら各学習データＤ_v（ｓ）に基づき、上述の手順［ｃ］により、非線形サポートベクタマシンを設計する。 Further, the new data generation means is obtained by using the (L ₁ +1) layer neural network after learning, in which the solution of the learning parameter W ₁ obtained by the learning means is set by the procedure [b] described above. New learning data D _v (s) corresponding to each learning data D (s) acquired by the means is generated, and the support vector machine designing means, based on each learning data D _v (s), the above procedure [ c] to design a non-linear support vector machine.

そして、この情報処理装置においては、出力手段が、学習手段により求められた学習パラメータＷ₁の解を表す情報、及び、サポートベクタマシン設計手段により設計された非線形サポートベクタマシンを表す情報を出力する。尚、非線形サポートベクタマシンを表す情報としては、非線形サポートベクタマシンとしての識別関数ｙを構成するパラメータα_s及びパラメータβやカーネルのパラメータγ等の情報を挙げることができる。 In this information processing apparatus, the output means outputs information representing the solution of the learning parameter W ₁ obtained by the learning means and information representing the nonlinear support vector machine designed by the support vector machine design means. . The information representing the non-linear support vector machine may include information such as the parameter α _s and the parameter β constituting the discriminant function y as the non-linear support vector machine, the kernel parameter γ, and the like.

With the information processing apparatus configured in this way, the user can easily obtain information on the calculation model of the recognition system corresponding to the learning data D(s) simply by supplying the learning data D(s) to the apparatus.

When the new data generation means reduces the dimension of the value Z calculated from the sample X(s) of the learning data D(s), sets the dimension-reduced value V = {v1, …, vN₃} as a new sample U(s), and generates the learning data D_v(s) = {U(s), y(s)}, the output means can be configured to output information representing the solution of the learning parameter W₁ obtained by the learning means, information representing the nonlinear support vector machine designed by the support vector machine design means, and information representing the method of converting the value Z into the value V.

With the information processing apparatus configured in this way, a calculation model of a recognition system that can perform recognition with a small amount of computation can be derived.

When the value Z calculated from the sample X(s) of the learning data D(s) is dimension-reduced using the N₃ N₂-dimensional vectors Jm obtained by principal component analysis, the output means can be configured to output those N₃ N₂-dimensional vectors Jm as the information representing the method of converting the value Z into the value V.

The functions of the acquisition means, learning means, new data generation means, support vector machine design means, and output means of the information processing apparatus of the present invention can be realized on a computer by a program.

In addition, the design apparatus of the present invention is an apparatus for designing a nonlinear support vector machine that calculates a value y representing a category from a given input value U = {u1, …, uN}. It comprises: acquisition means for acquiring S learning data D_v(s) = {U(s), y(s)} (where s = 1, …, S), each a combination of a sample U(s) of the input value U and a value y(s) representing the category to which the sample belongs; appropriate value calculation means for obtaining, by the above-described procedures [1] to [3] and based on the S learning data D_v(s) acquired by the acquisition means, an appropriate value of the parameter γ of the kernel K(U, U(s), γ) employed in the nonlinear support vector machine; design means for designing the nonlinear support vector machine based on the S learning data D_v(s) acquired by the acquisition means, using the kernel K(U, U(s), γ) in which the appropriate value of the parameter γ obtained by the appropriate value calculation means is set; and output means for outputting information representing the nonlinear support vector machine designed by the design means.

With this design apparatus, the user can obtain information on a nonlinear support vector machine suited to the learning data D_v(s) simply by supplying the learning data D_v(s) to the design apparatus, and can efficiently derive the calculation model of the recognition system by the method of the present invention.

The design apparatus of the present invention can be configured to obtain a solution of the parameter γ of a Gaussian kernel as the parameter γ of the kernel K(U, U(s), γ). The functions of the acquisition means, appropriate value calculation means, design means, and output means of the design apparatus of the present invention can be realized on a computer by a program.

以下に本発明の実施例について、図面と共に説明する。 Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a model deriving device 1 of a first embodiment to which the present invention is applied. As shown in FIG. 1, the model deriving device 1 of this embodiment, like a well-known personal computer, comprises a CPU 11 that performs various arithmetic processes, a ROM 13 that stores a boot program and the like, a RAM 15 used as a work area during program execution, a hard disk device 17 that stores an operating system and various other programs and data, a display device 21 consisting of a liquid crystal display, a user interface 23 composed of a keyboard, a pointing device, and the like, and a drive device 25 capable of reading and writing data from and to a flexible disk.

このモデル導出装置１は、ハードディスク装置１７に記録されたオペレーティングシステムにより動作し、例えば、ユーザインタフェース２３を通じてプログラムの実行指令が入力されると、ユーザにより指定されたプログラムに基づいた処理を、当該オペレーティングシステムの管理下で、ＣＰＵ１１により実行する。 The model deriving device 1 operates by an operating system recorded on the hard disk device 17. For example, when a program execution command is input through the user interface 23, the model deriving device 1 performs processing based on the program specified by the user. It is executed by the CPU 11 under the management of the system.

具体的に、モデル導出装置１は、認識システムの計算モデルを導出するためのモデル導出プログラムを、ハードディスク装置１７に備えている。図２は、ＣＰＵ１１が、このモデル導出プログラムに従って実行するモデル導出処理を表すフローチャートである。 Specifically, the model deriving device 1 includes a model deriving program for deriving a calculation model of the recognition system in the hard disk device 17. FIG. 2 is a flowchart showing a model derivation process executed by the CPU 11 according to the model derivation program.

ＣＰＵ１１は、ユーザインタフェース２３を通じてモデル導出プログラムの実行指令が入力されると、図２に示す処理を実行し、認識システムの計算モデルとして、ニューラルネットワークと非線形サポートベクタマシンとを組み合わせてなる計算モデルの設計値を、データファイルに出力すると共に、表示装置２１に出力する。 When the execution instruction of the model derivation program is input through the user interface 23, the CPU 11 executes the processing shown in FIG. 2, and the calculation model of the recognition system is a combination of a neural network and a non-linear support vector machine. The design value is output to the data file and output to the display device 21.

FIG. 3(a) is a basic configuration diagram of the calculation model derived by this model derivation process. In this embodiment, as the calculation model of a recognition system that calculates a value y representing a category from an input value X = {x1, …, xN1} (where N1 is an integer of 2 or more), a calculation model is derived that performs computation with a neural network in its first half and with a nonlinear support vector machine in its second half to calculate the value y representing the category. For comparison with the calculation model derived in this embodiment, FIG. 3(b) shows the configuration of a recognition system composed of a conventional neural network, and FIG. 3(c) shows the configuration of a recognition system composed of a conventional nonlinear support vector machine.

When the model derivation process shown in FIG. 2 is started, the CPU 11 first, in S110, displays a GUI file selection screen on the display device 21 and, through this screen, has the user select the data file to be read, in which the data used for parameter learning of the calculation model is recorded. Specifically, in S110, a list of the data files that are recorded in the hard disk device 17 and the drive device 25 and that correspond to the model derivation program is displayed on the GUI file selection screen.

そして、ファイル選択画面を通じ、読出対象のデータファイルが選択されると（Ｓ１２０でＹｅｓ）、選択されたデータファイルを読み出し、このデータファイルの記述内容に従って、以降の処理で学習する三層フィードフォワードニューラルネットワークの入力ユニットの数Ｎ１を設定すると共に（Ｓ１３０）、このニューラルネットワークの中間ユニットの数Ｎ２を設定する（Ｓ１４０）。但し、設定するＮ１，Ｎ２は、２以上の整数値である。 Then, when a data file to be read is selected through the file selection screen (Yes in S120), the selected data file is read, and a three-layer feedforward neural network that learns in subsequent processing according to the description content of the data file The number N1 of network input units is set (S130), and the number N2 of intermediate units of the neural network is set (S140). However, N1 and N2 to be set are integer values of 2 or more.

また、Ｓ１４０の処理を終えると、読み出した上記データファイルの記述内容に従って、上記ニューラルネットワークとは別のニューラルネットワークであって、非線形サポートベクタマシンのカーネルパラメータの解を導出する際に用いるニューラルネットワークの中間ユニットの数Ｎｈを設定し（Ｓ１５０）、その後、学習データの総数Ｓを設定する（Ｓ１６０）。但し、設定するＮｈは、Ｎ２よりも大きい２以上の整数値である。この処理を終えると、Ｓ１７０に移行する。 When the processing of S140 is finished, the neural network is different from the neural network according to the description content of the read data file, and the neural network used for deriving the kernel parameter solution of the nonlinear support vector machine The number Nh of intermediate units is set (S150), and then the total number S of learning data is set (S160). However, Nh to be set is an integer value of 2 or more larger than N2. When this process ends, the process proceeds to S170.

尚、図４（ａ）は、当該モデル導出処理で読み出されるデータファイルの構成を表した図である。図４（ａ）に示すように、当該モデル導出処理で読み出されるデータファイルには、入力ユニットの個数Ｎ１、中間ユニットの個数Ｎ２，Ｎｈ、学習データの総数Ｓについてのユーザ設定値が記述されており、Ｓ１３０〜Ｓ１６０では、この記述内容に従って、各パラメータＮ１，Ｎ２，Ｎｈ，Ｓの値を設定することになる。 FIG. 4A shows the structure of the data file read out in the model derivation process. As shown in FIG. 4A, in the data file read by the model derivation process, user set values for the number of input units N1, the number of intermediate units N2, Nh, and the total number S of learning data are described. In S130 to S160, the values of the parameters N1, N2, Nh, and S are set according to the description content.

また、当該データファイルには、学習データとして、入力値Ｘ＝｛ｘ１，ｘ２，…，ｘＮ１｝のサンプル及び当該サンプルが属するカテゴリを表す値ｙ（＋１又は−１）の組が、Ｓ個記述されており、Ｓ１７０〜Ｓ２００では、この記述内容に従い、以降の処理に用いるＳ個の学習データＤ（１），Ｄ（２），…，Ｄ（Ｓ）を設定する。 Further, in the data file, as learning data, S sets of samples of the input value X = {x1, x2,..., XN1} and a value y (+1 or −1) representing the category to which the sample belongs are described. In S170 to S200, S pieces of learning data D (1), D (2),..., D (S) used for the subsequent processing are set according to the description content.

即ち、Ｓ１７０では、パラメータｓを値１に設定し、Ｓ１８０では、データＤ（ｓ）＝｛Ｘ（ｓ），ｙ（ｓ）｝を構成するパラメータＸ（ｓ）＝｛ｘ１（ｓ），…，ｘＮ１（ｓ）｝に、当該データファイルで先頭からｓ番目に記述されたサンプルＸ＝｛ｘ１，ｘ２，…，ｘＮ１｝の値を設定し、データＤ（ｓ）＝｛Ｘ（ｓ），ｙ（ｓ）｝を構成するパラメータｙ（ｓ）に、データファイルに記述された当該サンプルが属するカテゴリを表す値ｙを設定する。また、Ｓ１９０では、パラメータｓの値を１加算し、Ｓ２００では、加算後のパラメータｓの値がサンプルの総数Ｓより大きいか否かを判断する。そして、ｓ≦Ｓである場合には（Ｓ２００でＮｏ）、Ｓ１８０に移行し、ｓ＞Ｓである場合には、Ｓ２１０に移行する。このようにして、Ｓ１７０〜Ｓ２００では、Ｓ個の学習データＤ（１），Ｄ（２），…，Ｄ（Ｓ）を設定する。 That is, the parameter s is set to the value 1 in S170, and the parameter X (s) = {x1 (s),... Constituting the data D (s) = {X (s), y (s)} in S180. , XN1 (s)} is set to the value of the sample X = {x1, x2,..., XN1} described in the sth from the top in the data file, and the data D (s) = {X (s), A value y representing the category to which the sample described in the data file belongs is set in the parameter y (s) constituting y (s)}. In S190, the value of the parameter s is incremented by 1, and in S200, it is determined whether or not the value of the parameter s after the addition is larger than the total number S of samples. If s ≦ S (No in S200), the process proceeds to S180, and if s> S, the process proceeds to S210. In this way, in S170 to S200, S pieces of learning data D (1), D (2),..., D (S) are set.

In S210, the CPU 11 obtains, based on the S learning data D(1), …, D(S), the solution W1^* of the unknown learning parameter W1 = {Wa(0,1), …, Wa(i,j), …, Wa(N1,N2), Wb(0), …, Wb(j), …, Wb(N2)} of the three-layer feedforward neural network having N1 input units, N2 intermediate units, and one output unit, by the well-known backpropagation method, using the value y(s) of each learning data D(s) as the teacher signal.

尚、図５（ａ）に示すように、パラメータＷａ（ｉ，ｊ）は、第ｉ入力ユニット−第ｊ中間ユニット間の結合係数に対応するパラメータであり、パラメータＷｂ（ｊ）は、第ｊ中間ユニット−出力ユニット間の結合係数に対応するパラメータである。また、パラメータＷａ（０，ｊ）は、第ｊ中間ユニットの閾値に対応するパラメータであり、パラメータＷｂ（０）は、出力ユニットの閾値に対応するパラメータである。 As shown in FIG. 5A, the parameter Wa (i, j) is a parameter corresponding to the coupling coefficient between the i-th input unit and the j-th intermediate unit, and the parameter Wb (j) is This is a parameter corresponding to the coupling coefficient between the intermediate unit and the output unit. The parameter Wa (0, j) is a parameter corresponding to the threshold value of the j-th intermediate unit, and the parameter Wb (0) is a parameter corresponding to the threshold value of the output unit.

That is, in S210, as the solution W1^* of the learning parameter W1, the solutions Wa^*(i,j) of the (N1·N2+N2) learning parameters Wa(i,j) (where i = 0, 1, …, N1 and j = 1, 2, …, N2) and the solutions Wb^*(j) of the (N2+1) learning parameters Wb(j) (where j = 0, 1, …, N2) are obtained.

入力値をＸ＝｛ｘ１，…，ｘＮ１｝とする入力ユニットＮ１個、中間ユニットＮ２個、出力ユニット１個の三層フィードフォワードニューラルネットワークの出力値ｙは、次式で表すことができる。但し、ｘ０＝１，ｚ０＝１である。また、関数ｆ（ｘ）は、非線形関数であり、本実施例では、シグモイド関数を採用する（ｆ（ｘ）＝ｓｉｇ（ｘ））。 The output value y of the three-layer feedforward neural network having N1 input units, N2 intermediate units, and one output unit with input values X = {x1,..., XN1} can be expressed by the following equation. However, x0 = 1 and z0 = 1. The function f (x) is a non-linear function, and in this embodiment, a sigmoid function is adopted (f (x) = sig (x)).
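The equation for the network output does not appear in this text. With x0 = 1 and z0 = 1 absorbing the threshold parameters Wa(0,j) and Wb(0), a form consistent with these definitions would be the following; whether the output unit also applies the function f is an assumption.

```latex
z_j = f\!\left(\sum_{i=0}^{N1} Wa(i,j)\, x_i\right) \quad (j = 1, \dots, N2),
\qquad
y = f\!\left(\sum_{j=0}^{N2} Wb(j)\, z_j\right)
```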

Therefore, in S210, based on the above equation, the value W1^* that minimizes the squared error E between the value y(s) representing the category to which the sample X(s) belongs and the output value y(W1, X(s)) obtained when the sample X(s) is input to the neural network is obtained by the gradient method as the solution W1^* of the learning parameter W1.

That is, the squared error E is defined by the following equation,

the correction amounts ΔWa(i,j), ΔWb(j) of the learning parameters Wa(i,j), Wb(j) are determined by the following equation using a constant η representing the learning rate,

and the values of the learning parameters Wa(i,j), Wb(j) are updated from their initial values by ΔWa(i,j), ΔWb(j) at each step. The values of the learning parameters Wa(i,j), Wb(j) at the point where they have converged are taken as the value W1^* that minimizes the squared error E. Although this is expressed here as "the squared error E is minimized", the above method can only find a value W1^* at which the squared error E is locally minimal. In other words, S210 in effect obtains the value W1^* at which the squared error E takes a local minimum as the solution W1^* of the learning parameter W1.
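The update rule described for S210 can be sketched as plain batch gradient descent. This is an illustration under stated assumptions, not the patent's exact procedure: the error is taken as E = ½ Σ (y − y(s))², the intermediate units use the logistic sigmoid, and the output unit uses tanh so that teacher signals of +1 and −1 are reachable. The names `forward` and `train`, the initial-value scheme, and the sample data are all hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(Wa, Wb, X):
    """Wa: (N1+1) x N2, Wb: (N2+1,). Index 0 holds the threshold terms."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x0 = 1
    Z = sigmoid(X1 @ Wa)                            # intermediate outputs z1..zN2
    Z1 = np.hstack([np.ones((Z.shape[0], 1)), Z])   # prepend z0 = 1
    y = np.tanh(Z1 @ Wb)                            # assumed output nonlinearity
    return X1, Z, Z1, y

def train(X, t, n2, eta=0.05, epochs=2000, seed=0):
    """Batch gradient descent on E = 0.5 * sum((y - t)^2)."""
    rng = np.random.default_rng(seed)
    Wa = rng.normal(scale=0.5, size=(X.shape[1] + 1, n2))
    Wb = rng.normal(scale=0.5, size=n2 + 1)
    for _ in range(epochs):
        X1, Z, Z1, y = forward(Wa, Wb, X)
        delta_out = (y - t) * (1.0 - y ** 2)                     # dE/d(output pre-activation)
        grad_Wb = Z1.T @ delta_out
        delta_hid = np.outer(delta_out, Wb[1:]) * Z * (1.0 - Z)  # back-propagated error
        grad_Wa = X1.T @ delta_hid
        Wb -= eta * grad_Wb                                      # delta Wb = -eta * dE/dWb
        Wa -= eta * grad_Wa                                      # delta Wa = -eta * dE/dWa
    return Wa, Wb

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([-1., -1., -1., 1.])      # made-up teacher signals y(s)
Wa, Wb = train(X, t, n2=3)
```

As the text notes, a procedure of this kind stops at a local minimum of E, so the result depends on the initial values.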

また、Ｓ２１０での処理を終えると、ＣＰＵ１１は、Ｓ２２０に移行し、図６に示す新学習データ生成処理を実行する。図６は、ＣＰＵ１１が実行する新学習データ生成処理を表すフローチャートである。 When the process in S210 is completed, the CPU 11 proceeds to S220 and executes a new learning data generation process shown in FIG. FIG. 6 is a flowchart showing new learning data generation processing executed by the CPU 11.

Ｓ２２０において、新学習データ生成処理を開始すると、ＣＰＵ１１は、まず、パラメータＮを値Ｎ２に設定して、Ｎ＝Ｎ２次元のパラメータＵ（ｓ）（ｓ＝１，…，Ｓ）を生成する（Ｓ２２１）。また、パラメータｓを値１に設定する（Ｓ２２２）。 When the new learning data generation process is started in S220, the CPU 11 first sets the parameter N to the value N2, and generates an N = N2-dimensional parameter U (s) (s = 1,..., S) ( S221). The parameter s is set to a value 1 (S222).

その後、Ｓ２２３に移行して、Ｓ２１０で算出した学習パラメータＷ１の解Ｗ１^*を設定してなる三層フィードフォワードニューラルネットワークに、サンプルＸ（ｓ）を入力したときの中間層の出力値Ｚ＝｛ｚ１，…，ｚＮ２｝を、学習データＤ（ｓ）に基づき求める。 Thereafter, the process proceeds to S223, and the output value Z = {of the intermediate layer when the sample X (s) is input to the three-layer feedforward neural network in which the solution W1 ^* of the learning parameter W1 calculated in S210 is set. z1,..., zN2} are obtained based on the learning data D (s).

具体的には、次式に従って、中間層の出力値Ｚを算出する（但し、ｘ０（ｓ）＝１とする。）。 Specifically, the output value Z of the intermediate layer is calculated according to the following equation (provided that x0 (s) = 1).

After this processing, the calculated value Z is set in the parameter U(s) = {u1(s), …, uN(s)}, and new learning data Dv(s) = {U(s) = Z, y(s)} is generated (S225). The parameter s is then incremented by 1 (S227), and it is determined whether the updated value of the parameter s is larger than the total number S of learning data (S229).

そして、ｓ≦Ｓである場合には（Ｓ２２９でＮｏ）、Ｓ２２３に移行する。このようにして、Ｓ２２３〜Ｓ２２９の処理を繰返し実行することにより、当該新学習データ生成処理では、ｓ＝１からｓ＝Ｓの範囲で、各学習データＤ（ｓ）から、学習データＤｖ（ｓ）を生成する。そして、全学習データＤ（ｓ）について、新たな学習データＤｖ（ｓ）を生成し終えると、Ｓ２２９においてＹｅｓと判断し、当該新学習データ生成処理を終了する。 If s ≦ S (No in S229), the process proceeds to S223. In this way, by repeatedly executing the processing of S223 to S229, in the new learning data generation processing, from the learning data D (s) to the learning data Dv (s) in the range of s = 1 to s = S. ) Is generated. When generation of new learning data Dv (s) is completed for all learning data D (s), it is determined Yes in S229, and the new learning data generation processing ends.
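The loop of S223 to S229 amounts to mapping every sample through the trained intermediate layer. A minimal sketch, assuming the logistic sigmoid for f and a weight array `Wa` of shape (N1+1) x N2 whose row 0 holds the thresholds; the function names and the numeric values are made up for illustration.

```python
import numpy as np

def hidden_output(Wa, x):
    """z_j = f(sum_i Wa[i, j] * x_i) with x_0 = 1; f is the logistic sigmoid."""
    x1 = np.concatenate(([1.0], x))
    return 1.0 / (1.0 + np.exp(-(x1 @ Wa)))

def make_new_learning_data(Wa, D):
    """Map each D(s) = (X(s), y(s)) to Dv(s) = (U(s) = Z, y(s))."""
    return [(hidden_output(Wa, x), y) for x, y in D]

# Made-up trained weights for N1 = 2 inputs and N2 = 3 intermediate units.
Wa = np.array([[0.1, -0.2, 0.3],
               [0.5, 0.4, -0.6],
               [-0.7, 0.8, 0.9]])
D = [(np.array([0.0, 1.0]), -1), (np.array([1.0, 1.0]), +1)]
Dv = make_new_learning_data(Wa, D)
```

Each U(s) has N = N2 elements, which matches the setting of the parameter N in S221.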

また、Ｓ２２０において、新学習データ生成処理を終了すると、ＣＰＵ１１は、Ｓ２３０に移行し、図７に示すカーネルパラメータ設定処理を実行する。図７は、ＣＰＵ１１が実行するカーネルパラメータ設定処理を表すフローチャートである。 In S220, when the new learning data generation process ends, the CPU 11 proceeds to S230 and executes the kernel parameter setting process shown in FIG. FIG. 7 is a flowchart showing kernel parameter setting processing executed by the CPU 11.

When the kernel parameter setting process is started, the CPU 11 sets a three-layer feedforward neural network having N input units, Nh intermediate units, and one output unit as a new neural network to be trained, and obtains the solution W2^* of the unknown learning parameter W2 = {Wc(0,1), …, Wc(j,k), …, Wc(N,Nh), Wd(0), …, Wd(Nh)} of this neural network by the backpropagation method, based on the S learning data Dv(1), Dv(2), …, Dv(S) generated in S220 and using the value y(s) indicated by the learning data Dv(s) as the teacher signal (S231).

尚、図５（ｂ）に示すように、パラメータＷｃ（ｊ，ｋ）は、第ｊ入力ユニット−第ｋ中間ユニット間の結合係数に対応するパラメータであり、パラメータＷｄ（ｋ）は、第ｋ中間ユニット−出力ユニット間の結合係数に対応するパラメータである。また、パラメータＷｃ（０，ｋ）は、第ｋ中間ユニットの閾値に対応するパラメータであり、パラメータＷｄ（０）は、出力ユニットの閾値に対応するするパラメータである。 As shown in FIG. 5B, the parameter Wc (j, k) is a parameter corresponding to the coupling coefficient between the jth input unit and the kth intermediate unit, and the parameter Wd (k) is the kth This is a parameter corresponding to the coupling coefficient between the intermediate unit and the output unit. The parameter Wc (0, k) is a parameter corresponding to the threshold value of the k-th intermediate unit, and the parameter Wd (0) is a parameter corresponding to the threshold value of the output unit.

That is, in S231, as the solution W2^* of the learning parameter W2, the solutions Wc^*(j,k) of the (N·Nh+Nh) learning parameters Wc(j,k) (where j = 0, 1, …, N and k = 1, 2, …, Nh) and the solutions Wd^*(k) of the (Nh+1) learning parameters Wd(k) (where k = 0, 1, …, Nh) are obtained.

入力値をＵ＝｛ｕ１，…，ｕＮ｝とする入力ユニットＮ個、中間ユニットＮｈ個、出力ユニット１個の三層フィードフォワードニューラルネットワークの出力値ｙは、次式で表すことができる（但し、ｕ０＝１，ｈ０＝１とする。）。 The output value y of the three-layer feedforward neural network with N input units, Nh intermediate units, and one output unit with input values U = {u1,..., UN} can be expressed as , U0 = 1 and h0 = 1.)

Therefore, in S231, based on the above equation, the value W2^* that minimizes the squared error E between the value y(s) representing the category to which the sample U(s) belongs and the output value y(W2, U(s)) obtained when the sample U(s) is input to the neural network is obtained by the gradient method as the solution W2^* of the learning parameter W2.

However, in S231, as in S210, what is obtained in effect is the value W2^* at which the squared error E takes a local minimum, and this value is used as the solution W2^* of the learning parameter W2.

When the process in S231 is completed, the CPU 11 sets the parameter s to the value 1 (S232) and then proceeds to S233.

In S233, the CPU 11 obtains, based on the learning data Dv(s), the output value H(s) = {h1(s), …, hNh(s)} of the intermediate layer when the sample U(s) is input to the three-layer feedforward neural network in which the solution W2^* of the learning parameter W2 calculated in S231 is set.

具体的には、次式に従って、中間層の出力値Ｈを算出する（但し、ｕ０（ｓ）＝１である。）。 Specifically, the output value H of the intermediate layer is calculated according to the following equation (provided that u0 (s) = 1).

Thereafter, the value of the parameter s is incremented by 1 (S234), and the process proceeds to S235. In S235, it is determined whether the updated value of the parameter s is larger than the total number S of learning data; if s ≤ S (No in S235), the process returns to S233. By repeating S233 to S235 in this way, the kernel parameter setting process calculates, for s = 1 to s = S, the intermediate-layer output value H(s) obtained when the sample U(s) of the learning data Dv(s) is input to the neural network. When s > S, the determination in S235 is Yes, and the process proceeds to S236.

また、Ｓ２３６に移行すると、ＣＰＵ１１は、第ａ行第ｂ列の要素Ｑ_abが、Ｈ（ｓ＝ａ）とＨ（ｓ＝ｂ）との内積＜Ｈ（ａ），Ｈ（ｂ）＞で表されるＳ行Ｓ列の行列Ｑを算出する。 In S236, the CPU 11 determines that the element Q _ab in the a-th row and the b-th column is an inner product <H (a), H (b)> of H (s = a) and H (s = b). A matrix Q of S rows and S columns represented is calculated.

When this processing is completed, the CPU 11 calculates, by the gradient method and based on the S learning data Dv(1), …, Dv(S), the solution γ^* of the parameter γ that minimizes the following squared error E (S237).

In this embodiment, a Gaussian kernel is adopted as the kernel. Because S237 obtains the solution γ^* of the parameter γ by the gradient method, what is obtained in effect is the value γ^* at which the squared error E takes a local minimum.
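S236 and S237 can be sketched as follows. Since the equations do not appear in this text, the sketch rests on two assumptions: the Gaussian kernel has the form exp(−γ‖U(a) − U(b)‖²), and E is the sum of squared differences between the elements of Q and the kernel values. The function names, step size, and data are hypothetical; `Q_check` is synthesized from a known γ purely to exercise the routine, whereas in the described process Q would come from `gram_matrix(H)`.

```python
import numpy as np

def gram_matrix(H):
    """Q[a, b] = <H(a), H(b)> for intermediate-layer outputs H (shape S x Nh)."""
    return H @ H.T

def sq_dists(U):
    """Pairwise squared Euclidean distances between the rows of U."""
    g = U @ U.T
    n = np.diag(g)
    return np.maximum(n[:, None] + n[None, :] - 2.0 * g, 0.0)

def squared_error(gamma, Q, D2):
    return np.sum((Q - np.exp(-gamma * D2)) ** 2)

def fit_gamma(Q, U, gamma0=1.0, eta=0.01, steps=5000):
    """Gradient descent on E(gamma); reaches a local minimum only."""
    D2 = sq_dists(U)
    gamma = gamma0
    for _ in range(steps):
        K = np.exp(-gamma * D2)
        grad = 2.0 * np.sum((Q - K) * D2 * K)   # dE/dgamma
        gamma -= eta * grad
    return gamma

U = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 1.]])
Q_check = np.exp(-0.5 * sq_dists(U))   # synthetic target built with gamma = 0.5
gamma_star = fit_gamma(Q_check, U)     # moves from 1.0 toward 0.5
```

A fixed step size is used here for brevity; a practical implementation would likely need a step-size rule suited to the scale of Q.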

また、Ｓ２３７での処理を終えると、ＣＰＵ１１は、算出した解γ^*をカーネルＫ（Ｕ，Ｕ（ｓ），γ）のパラメータγに設定して、非線形サポートベクタマシンの設計に用いるカーネルＫを、次のように設定する（Ｓ２３８）。 When the processing in S237 is completed, the CPU 11 sets the calculated solution γ ^* as the parameter γ of the kernel K (U, U (s), γ), and sets the kernel K used for the design of the nonlinear support vector machine. The following settings are made (S238).

Thereafter, the kernel parameter setting process ends.

When the kernel parameter setting process in S230 ends, the CPU 11 proceeds to S240 and designs a nonlinear support vector machine.
Specifically, in S240, the solutions α_s*, β* of the unknown parameters α_s, β of the nonlinear support vector machine are obtained, as in the known technique, by solving the following constrained quadratic optimization problem.

That is, the values α_s* (s = 1, 2, ..., S) that maximize the objective function Lα under the constraint conditions are calculated as the solutions α_s* of the parameters α_s of the nonlinear support vector machine.

Further, the solution β* of the parameter β is obtained by the following equation using the U(s) that correspond to nonzero α_s* (s = 1, 2, ..., S), that is, the support vectors.
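The equations referred to in S240 are not reproduced in this text. For orientation, the textbook hard-margin dual problem of a kernel support vector machine, written in the symbols of this description and assuming category values y(s) of +1 or -1, has the form below. This is an assumption about the general shape of the omitted equations, not a transcription of them; a soft-margin variant would additionally place an upper bound on each α_s.

```latex
\begin{aligned}
&\text{maximize}\quad
L_\alpha = \sum_{s=1}^{S} \alpha_s
 - \frac{1}{2}\sum_{s=1}^{S}\sum_{t=1}^{S}
   \alpha_s \alpha_t \, y(s)\, y(t)\, K\bigl(U(s), U(t), \gamma^*\bigr) \\
&\text{subject to}\quad
\alpha_s \ge 0 \;\; (s = 1, \dots, S), \qquad
\sum_{s=1}^{S} \alpha_s \, y(s) = 0, \\
&\beta^* = y(s') - \sum_{s=1}^{S} \alpha_s^* \, y(s)\,
   K\bigl(U(s), U(s'), \gamma^*\bigr)
\quad \text{for any support vector } U(s').
\end{aligned}
```

Because the objective is concave in α and the constraints are linear, any maximizer found is a global one, which is the property the description relies on when it contrasts the support vector machine with neural network training.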

When the solutions α_s*, β* of the parameters α_s, β of the nonlinear support vector machine have been calculated in S240 in this way, the CPU 11 proceeds to S250. There it generates a data file describing the solution Wa*(i, j) of the learning parameters Wa(i, j) calculated in S210 described above, that is, the learning parameters required to calculate the output value Z of the intermediate layer of the neural network, together with the solutions α_s*, β*, γ* of the parameters α_s, β, γ of the nonlinear support vector machine designed in S240, and writes this file to the hard disk device 17. FIG. 4(b) is an explanatory diagram showing the configuration of the data file output in S250.

When this process is finished, the CPU 11 proceeds to S260 and displays, through the display device 21, a model derivation result output screen that shows the conversion formula from the input value X = {x1, ..., xi, ..., xN1} to the above-described intermediate-layer output value Z = U = {u1, ..., uj, ..., uN} and the designed nonlinear support vector machine (discriminant function). The model derivation process then ends.

The derivation result of the calculation model corresponding to the learning data D(1), ..., D(S), output from the model derivation device 1 in this way, is used by the designer of the recognition system. That is, in accordance with this derivation result, the recognition system is configured to calculate, from the input value X, the output value Z of the intermediate layer one layer before the output layer of the neural network, to use this output value Z as the input value U of the nonlinear support vector machine, to calculate the output value y of the nonlinear support vector machine, and thereby to recognize the category corresponding to the input value X.

FIG. 8 is a block diagram showing a configuration example of the recognition device 30 on which the calculation model derived by the model derivation device 1 is mounted. The recognition device 30 includes a feature extraction unit 31, a recognition unit 33, and an output interface 35. When a recognition target pattern such as audio data or image data is input from the outside, the feature extraction unit 31 extracts the features of the recognition target pattern to generate an N1-dimensional feature vector, which becomes the input X = {x1, ..., xN1} to the recognition unit 33.

When the feature vector X = {x1, ..., xN1} is input from the feature extraction unit 31, the recognition unit 33 executes the following recognition process. FIG. 9 is a flowchart showing the recognition process executed by the recognition unit 33.

When the recognition process is started, the recognition unit 33 first converts the input value X = {x1, ..., xN1} into an N-dimensional vector U = {u1, ..., uN} (S310). The value uj of the j-th element of the vector U is given by the following equation.

Wa*(i, j) is a constant and corresponds to the solution of the learning parameter Wa(i, j) calculated by the model derivation device 1 at the design stage. Here, the value uj is calculated with x0 = 1.
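The conversion in S310 can be sketched as follows. The equation itself is not reproduced in this text, so the sketch assumes the usual form in which each uj is a sigmoid of the weighted sum of the inputs, with x0 = 1 acting as the bias input as stated above. The function name `hidden_output`, the `Wa[i][j]` list layout, and the sample weights are hypothetical.

```python
import math

def hidden_output(x, Wa):
    """Map inputs x1..xN1 to intermediate-layer outputs u1..uN.

    Wa[i][j] is the weight from input i to unit j; row i = 0 multiplies
    the constant input x0 = 1. A sigmoid activation is assumed here.
    """
    xs = [1.0] + list(x)  # prepend x0 = 1
    n_units = len(Wa[0])
    out = []
    for j in range(n_units):
        total = sum(Wa[i][j] * xs[i] for i in range(len(xs)))
        out.append(1.0 / (1.0 + math.exp(-total)))
    return out

# Two inputs, two intermediate units (made-up weights)
Wa = [[0.0, 1.0],   # weights for x0 = 1
      [0.0, 0.0],   # weights for x1
      [0.0, 0.0]]   # weights for x2
u = hidden_output([3.0, -2.0], Wa)
```

Since the weights are constants fixed at the design stage, this step involves no learning at recognition time.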

When this process is finished, the recognition unit 33 calculates the value y representing the category from the calculated vector U by the following equation (S320).

The values S, β*, γ* and α_s*, y(s), U(s) (s = 1, 2, ..., S) are likewise constants. They are set at the design stage to values corresponding to the solutions α_s*, β*, γ* calculated by the above-described method and to the learning data Dv(1), ..., Dv(S) used to derive those solutions.
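The calculation in S320 can be sketched as a kernel discriminant function. The equation is not reproduced in this text, so the sketch assumes the standard form y = sign(Σ α_s* y(s) K(U, U(s), γ*) + β*) and a Gaussian kernel of the form exp(-γ‖U - U(s)‖²); the exact parameterization in the specification may differ. All names and the sample constants are hypothetical.

```python
import math

def gaussian_kernel(u, v, gamma):
    """Assumed Gaussian kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decide(u, support, alphas, labels, beta, gamma):
    """Return (y, p): the category value y = sign(p) and the raw value p."""
    p = beta + sum(a * t * gaussian_kernel(u, us, gamma)
                   for a, t, us in zip(alphas, labels, support))
    return (1 if p >= 0 else -1), p

# Two stored samples with opposite categories (made-up constants)
support = [[0.0, 0.0], [2.0, 0.0]]
alphas = [1.0, 1.0]
labels = [-1, 1]
y, p = svm_decide([1.9, 0.0], support, alphas, labels, beta=0.0, gamma=1.0)
```

Only samples with nonzero α_s* contribute to the sum, so in practice just the support vectors need to be stored in the recognition unit.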

When the process of S320 is completed, the recognition unit 33 outputs the calculated value y representing the category through the output interface 35 as the recognition result of the input pattern (S330). The recognition process then ends. The value y output from the recognition device 30 in this way is input to a subsequent information processing device (not shown), which executes processing corresponding to the recognition result.

The configurations of the model derivation device 1 and the recognition device 30 of the first embodiment have been described above. In this embodiment, a calculation model that combines a neural network and a nonlinear support vector machine is derived as the calculation model of the recognition system. Specifically, to derive a suitable calculation model for an arbitrary recognition target, the neural network is first trained based on the sample X(s) of the input value X and the teacher signal y(s) indicated by each of the S pieces of learning data D(s). The trained neural network then converts each sample X(s) into a value Z that is easier to separate linearly, and new learning data Dv(s) is generated. A nonlinear support vector machine is then designed based on this learning data Dv(s), which yields a nonlinear support vector machine capable of separating the samples by category and thus a calculation model that can accurately recognize the category corresponding to the input value X.

With conventionally known neural network learning methods, only a local solution can be obtained as the solution of the learning parameter W1, so there was no theoretical guarantee that an optimal neural network could be constructed. In this embodiment, by contrast, the calculation model is derived using a nonlinear support vector machine, for which an optimal solution can theoretically be obtained, so an optimal calculation model of the recognition system can be derived based on the learning data.

In addition, a neural network cannot recognize inputs outside the learning data with sufficient accuracy. In this embodiment, because the calculation model is derived using a nonlinear support vector machine, it is possible to derive a calculation model that gives good recognition results even for pattern inputs outside the learning data.

Further, the conventional design method for a nonlinear support vector machine had the problem that a suitable nonlinear support vector machine could not be designed unless the samples X(s) already exhibited high linear separability. The model derivation device 1 of this embodiment designs the nonlinear support vector machine after replacing the input values X(s) with values Z(s) that are easier to separate linearly, so the nonlinear support vector machine technique can be applied to a wider variety of recognition systems than before.

In addition, in this embodiment, the kernel parameter γ used in the nonlinear support vector machine is also learned based on the learning data Dv(s). The designer therefore does not need to set the value of the parameter γ by trial and error as in the past, and the recognition system can be designed efficiently.

Next, the model derivation device 1 and the recognition device 30 of the second embodiment will be described. The model derivation device 1 of the second embodiment differs from that of the first embodiment only in the contents of the model derivation process and the new learning data generation process executed by the CPU 11, and the recognition device 30 of the second embodiment differs from that of the first embodiment only in the contents of the recognition process executed by the recognition unit 33. Therefore, only these differing processes are described below, with reference to FIGS. 10 and 11.

FIG. 10(a) is a flowchart showing the model derivation process executed by the CPU 11 in the second embodiment, excerpting the latter half of the process. FIG. 10(b) is a flowchart showing the new learning data generation process executed by the CPU 11 in the second embodiment.

In the second embodiment, when the model derivation process is started, the CPU 11 executes the processes from S110 to S210 as in the first embodiment and then, in S400, executes the new learning data generation process shown in FIG. 10(b).

When the new learning data generation process is started, the CPU 11 first sets the parameter s to the value 1 (S410). It then obtains, based on the learning data D(s), the intermediate-layer output value Z(s) = {z1(s), ..., zN2(s)} produced when the sample X(s) is input to the three-layer feedforward neural network in which the solution W1* of the learning parameter W1 calculated in S210 is set (S420).

Specifically, the output value Z(s) of the intermediate layer is calculated according to the following equation (where x0(s) = 1).

After this process, the parameter s is incremented by 1 (S425), and it is determined whether or not the updated value of the parameter s is larger than the total number S of learning data (S430).

If s ≦ S (No in S430), the process returns to S420. By repeatedly executing S420 to S430 in this way, the new learning data generation process obtains, for s = 1 to s = S, the intermediate-layer output value Z(s) = {z1(s), ..., zN2(s)} produced when the sample X(s) of each learning data D(s) is input.

When the corresponding intermediate-layer output values Z(s) = {z1(s), ..., zN2(s)} have been obtained for the samples X(s) of all the learning data D(s), the determination in S430 is Yes, and the process proceeds to S440.

In S440, the CPU 11 calculates the mean value μ = {μ1, ..., μN2} of the calculated Z(1), ..., Z(S) according to the following equation.

Thereafter, the variance-covariance matrix C is calculated from the calculated μ according to the following equation (S445). The superscript T denotes transposition.

When the variance-covariance matrix C has been calculated in S445, the CPU 11 calculates the eigenvalues λr and eigenvectors Jr (r = 1, 2, ...) of the matrix C (S450), and sets N3 eigenvectors Jr, taken in descending order of eigenvalue λr, as the principal axes J1, ..., Jm, ..., JN3 (S455). The number N3 of eigenvectors set as principal axes is 2 or more and smaller than the dimension N2 of Z(s). Specifically, N3 may be fixed in advance at the design stage as a value smaller than the dimension N2 of Z(s) by a predetermined number, or it may be set to an optimum value based on the obtained eigenvalues λr (r = 1, 2, ...).
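Steps S440 to S455 amount to a principal component analysis of the Z(s). The sketch below works under stated assumptions: the mean is taken as μ = (1/S) Σ Z(s), the covariance is normalized by S (the specification's equation is not reproduced here, so the normalization is a guess), and the leading eigenvectors are found by power iteration with deflation, a simple technique swapped in for whatever eigen-solver the original implementation used. All function names and the sample data are hypothetical.

```python
import math

def mean_vector(Z):
    S = len(Z)
    return [sum(z[k] for z in Z) / S for k in range(len(Z[0]))]

def covariance(Z, mu):
    # C = (1/S) * sum_s (Z(s) - mu)(Z(s) - mu)^T  (normalization assumed)
    S, n = len(Z), len(mu)
    return [[sum((z[a] - mu[a]) * (z[b] - mu[b]) for z in Z) / S
             for b in range(n)] for a in range(n)]

def top_eigenpairs(C, n3, iters=500):
    """Return the n3 largest (eigenvalue, unit eigenvector) pairs of symmetric C."""
    n = len(C)
    A = [row[:] for row in C]
    pairs = []
    for r in range(n3):
        v = [1.0 / (k + 1 + r) for k in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(A[a][b] * v[b] for b in range(n)) for a in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(A[a][b] * v[b] for b in range(n)) for a in range(n))
        pairs.append((lam, v))
        # Deflation: remove the component just found before the next pass
        A = [[A[a][b] - lam * v[a] * v[b] for b in range(n)] for a in range(n)]
    return pairs

# Six made-up 3-dimensional outputs Z(s), spread mostly along the first axis
Z = [[2, 0, 0], [-2, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 0.1], [0, 0, -0.1]]
mu = mean_vector(Z)
C = covariance(Z, mu)
pairs = top_eigenpairs(C, n3=2)     # N3 = 2 principal axes
axes = [v for _, v in pairs]        # J1, J2
```

Inspecting the eigenvalues in `pairs` is one way to choose N3 from the data, as the description allows: axes whose eigenvalues are negligible carry little of the variance of Z(s).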

When the processing in S455 is completed in this way, the CPU 11 proceeds to S460, sets the parameter N to the number N3 of principal axes, and generates N = N3-dimensional parameters U(s) (s = 1, ..., S) (S460). It also sets the parameter s to the value 1 (S465).

Thereafter, Z(s) calculated in S420 is converted into an N3-dimensional vector V = {v1, ..., vm, ..., vN3} using the principal axes J1, ..., Jm, ..., JN3 (S470). The value vm of the m-th element of the vector V is calculated by the following equation.

When the vector Z(s) has been converted into the vector V in this way, the CPU 11 sets the calculated value V as the parameter U(s) = {u1(s), ..., uN(s)} and generates new learning data Dv(s) = {U(s) = V, y(s)} (S475). Thereafter, the parameter s is incremented by 1 (S480), and it is determined whether or not the updated value of the parameter s is larger than the total number S of learning data (S490).

If s ≦ S (No in S490), the process returns to S470. By repeatedly executing S470 to S490 in this way, the new learning data generation process generates, for s = 1 to s = S, the learning data Dv(s) from the Z(s) calculated based on each learning data D(s). When new learning data Dv(s) has been generated for all the learning data D(s), the determination in S490 is Yes, and the new learning data generation process ends.
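The loop over S470 to S490 can be sketched as follows. The equation for vm is not reproduced in this text; the sketch assumes vm is the inner product of the principal axis Jm with Z(s), without subtracting the mean, on the grounds that only the axes Jm are described as being passed on to the recognition stage. That reading is an assumption, and the function names and sample values are hypothetical.

```python
def project(z, axes):
    """Map an N2-dimensional z to the N3-dimensional V with v_m = <J_m, z>."""
    return [sum(j_k * z_k for j_k, z_k in zip(J, z)) for J in axes]

def make_new_learning_data(Z, y, axes):
    """Build Dv(s) = (U(s), y(s)) for s = 1..S, with U(s) = V."""
    return [(project(z, axes), label) for z, label in zip(Z, y)]

# Made-up principal axes J1, J2 and two samples with their category values
axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
Z = [[2.0, 3.0, 4.0], [-1.0, 0.5, 9.0]]
y = [1, -1]
Dv = make_new_learning_data(Z, y, axes)
```

The category value y(s) is carried over unchanged; only the sample is replaced by its lower-dimensional projection.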

When the new learning data generation process in S400 ends, the CPU 11 proceeds to S500 and executes the kernel parameter setting process as in the first embodiment to obtain the solution γ* of the parameter γ. The process then proceeds to S510.

In S510, the CPU 11 designs a nonlinear support vector machine using the learning data Dv(1), ..., Dv(S) and the solution γ* of the parameter γ. That is, as in S240, it solves the constrained quadratic optimization problem to obtain the solutions α_s*, β* of the unknown parameters α_s, β of the nonlinear support vector machine.

When the solutions α_s*, β* of the parameters α_s, β of the nonlinear support vector machine have been calculated in S510 in this way, the CPU 11 proceeds to S520. There it generates a data file describing the solution Wa*(i, j) of the learning parameters Wa(i, j) calculated in S210 described above, that is, the learning parameters required to calculate the output value Z of the intermediate layer; the principal axes J1, ..., JN3 required to convert the vector Z into the vector V; and the solutions α_s*, β*, γ* of the parameters α_s, β, γ of the nonlinear support vector machine designed in S510, and writes this file to the hard disk device 17. Specifically, in S520, the data file is generated by adding the values of the N3 principal axes J1, J2, ..., JN3 to the description shown in FIG. 4(b), and is written to the hard disk device 17.

When this process is finished, the CPU 11 proceeds to S530 and displays, through the display device 21, a model derivation result output screen that shows the conversion formula from the input value X = {x1, ..., xi, ..., xN1} to the above-described vector V = U = {u1, ..., um, ..., uN} and the nonlinear support vector machine designed in S510. The model derivation process then ends.

The model derivation process of the second embodiment has been described above. In the second embodiment, based on this output result, the recognition device 30 is equipped with a calculation model that calculates, from the input value X, the output value Z of the intermediate layer one layer before the output layer of the neural network, reduces the dimension of this output value Z using the N3 N2-dimensional vectors Jm, and calculates the output value y of the nonlinear support vector machine using the dimension-reduced value V as the input value U of the nonlinear support vector machine.

Next, the recognition process executed by the recognition unit 33 of the recognition device 30 of the second embodiment, on which this calculation model is mounted, will be described. FIG. 11 is a flowchart showing the recognition process executed by the recognition unit 33 of the second embodiment.

When the feature vector X is input from the feature extraction unit 31 and the recognition process shown in FIG. 11 is started, the recognition unit 33 first converts the input value X = {x1, ..., xN1} into an N2-dimensional vector Z = {z1, ..., zN2} (S610). The value zj of the j-th (j = 1, 2, ..., N2) element of the vector Z is given by the following equation.

Wa*(i, j) is a constant and corresponds to the solution of the learning parameter Wa(i, j) calculated by the model derivation device 1 at the design stage. Here, the value zj is calculated with x0 = 1 (S610).

When this process is finished, the recognition unit 33 reduces the dimension of the calculated vector Z according to the following equation, converting it into an N-dimensional vector U = {u1, ..., um, ..., uN} (S620). The vectors Jm correspond to the principal axes calculated by the model derivation device 1 at the design stage.

Thereafter, based on the calculated vector U, the value y representing the category is calculated by the following equation (S630).

When the process of S630 is completed, the recognition unit 33 outputs the calculated value y representing the category through the output interface 35 as the recognition result of the input pattern (S640). The recognition process then ends.

The operations of the model derivation device 1 and the recognition device 30 of the second embodiment have been described above. According to this embodiment, because the dimension of the output value Z of the intermediate layer of the neural network is reduced, a calculation model of a recognition system capable of pattern recognition with a small amount of computation can be derived, which is very convenient.

The acquisition means of the information processing apparatus of the present invention is realized by the processes of S110 to S200 in the above embodiments, and the learning means is realized by the process of S210. The new learning data generation means is realized by the process of S220 in the first embodiment and by the process of S400 in the second embodiment.

The support vector machine design means is realized by the processes of S230 and S240 in the first embodiment and by the processes of S500 and S510 in the second embodiment. The output means is realized by the processes of S250 to S260 in the first embodiment and by the processes of S520 to S530 in the second embodiment.

In addition, the acquisition means of the design apparatus of the present invention is realized by the process of S220 in the first embodiment and by the process of S400 in the second embodiment. The appropriate value calculation means is realized by the kernel parameter setting process, and the design means is realized by the process of S240 in the first embodiment and by the process of S510 in the second embodiment.

The present invention is not limited to the above-described embodiments and can take various forms.
For example, in the above embodiments a nonlinear support vector machine using a Gaussian kernel is designed, but the nonlinear support vector machine may be designed using another kernel such as a polynomial kernel. A nonlinear support vector machine may also be designed without using a kernel.

When a nonlinear support vector machine is designed without using a kernel, it suffices, for example, to take N5 combinations of integers of 1 or more {e1(n), e2(n), ..., eN(n)}, chosen so that no combination is repeated over n = 1, 2, ..., N5, and to define from them a mapping φ that replaces the N-dimensional vector U = {u1, ..., uN} with a higher-dimensional N5-dimensional vector φ(U) = {b1, ..., bN5}. Of course, when the model derivation device 1 is configured in this way, the model derivation process should be configured so that the kernel parameter setting process is not executed.
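An explicit mapping φ of this kind can be sketched as follows. The defining equation for bn is not reproduced in this text; the sketch assumes the monomial form bn = u1^e1(n) × u2^e2(n) × ... × uN^eN(n), which is one natural reading of the exponent combinations described above. The function name `feature_map` and the sample exponents are hypothetical.

```python
def feature_map(u, exponents):
    """Map U = (u1..uN) to phi(U) = (b1..bN5).

    exponents[n] is the combination (e1(n), ..., eN(n)) of integers >= 1;
    b_n is assumed to be the product of u_i ** e_i(n) over i.
    """
    if len(set(map(tuple, exponents))) != len(exponents):
        raise ValueError("exponent combinations must not repeat")
    phi = []
    for e in exponents:
        b = 1.0
        for u_i, e_i in zip(u, e):
            b *= u_i ** e_i
        phi.append(b)
    return phi

# N = 2 inputs mapped to N5 = 3 features (made-up exponent combinations)
exponents = [(1, 1), (2, 1), (1, 2)]
phi = feature_map([2.0, 3.0], exponents)
```

A linear support vector machine designed on φ(U) is then nonlinear in U, which is why no kernel, and hence no kernel parameter γ, is needed in this variant.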

In addition, the recognition device 30 may be configured so that, in the processes of S330 and S640, information representing the confidence of the recognition result (value y) is output through the output interface 35 together with the value y representing the category.

Specifically, the absolute value |p| of the input value p to the sign function constituting the nonlinear support vector machine, which is obtained in S320 and S630 when calculating the value y, can be used as the information representing the confidence of the recognition result.

As described above, the input value p to the sign function is a value corresponding to the signed distance from the hyperplane. Therefore, if the absolute value of p is large, the confidence of the recognition result can be said to be high, and if it is small, the confidence can be said to be low.

Therefore, if the recognition device 30 is configured to output such confidence information (value |p|) together with the recognition result (value y), the information processing device at the output destination can judge from the confidence information whether the recognition process needs to be redone. When the confidence is low, the recognition process can be redone by, for example, having the user utter the speech to be recognized again.
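The use of |p| as a confidence value can be sketched as follows. The threshold and the function names are hypothetical; the specification does not state how the receiving device decides that the confidence is too low.

```python
def classify_with_confidence(p):
    """Return (y, |p|) for the value p fed to the sign function."""
    y = 1 if p >= 0 else -1
    return y, abs(p)

def needs_retry(confidence, threshold=0.2):
    """Hypothetical rule: request a new input when |p| is below a threshold."""
    return confidence < threshold

y_far, conf_far = classify_with_confidence(-1.7)    # far from the hyperplane
y_near, conf_near = classify_with_confidence(0.05)  # close to the hyperplane
```

With these values, the first result would be accepted and the second would trigger a request to repeat the input, for example by asking the user to speak again.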

FIG. 1 is a block diagram showing the configuration of the model derivation device 1.
FIG. 2 is a flowchart showing the model derivation process executed by the CPU 11.
FIG. 3 shows the basic configuration of the calculation model derived by the model derivation process (a) and the configurations of conventional calculation models (b), (c).
FIG. 4 is an explanatory diagram showing the structure of the input and output data files.
FIG. 5 is an explanatory diagram showing the correspondence between parameters and coupling coefficients.
FIG. 6 is a flowchart showing the new learning data generation process executed by the CPU 11.
FIG. 7 is a flowchart showing the kernel parameter setting process executed by the CPU 11.
FIG. 8 is a block diagram showing a configuration example of the recognition device 30.
FIG. 9 is a flowchart showing the recognition process executed by the recognition unit 33.
FIG. 10 shows a flowchart of the model derivation process executed by the CPU 11 in the second embodiment (a) and a flowchart of the new learning data generation process (b).
FIG. 11 is a flowchart showing the recognition process executed by the recognition unit 33 in the second embodiment.
FIG. 12 shows an explanatory diagram of an example of a calculation method using a neural network (a) and an explanatory diagram of a problem of the neural network (b).

Explanation of symbols

1 ... model derivation device, 11 ... CPU, 13 ... ROM, 15 ... RAM, 17 ... hard disk device, 21 ... display device, 23 ... user interface, 25 ... drive device, 30 ... recognition device, 31 ... feature extraction unit, 33 ... recognition unit, 35 ... output interface

Claims

A recognition system that recognizes the category corresponding to a given input value X = {x1, ..., xN₁} (where the value N₁ is an integer of 2 or more) by calculating, with a predetermined calculation model, a value y representing the category corresponding to the input value X, wherein the calculation model is derived by a model derivation method that derives the calculation model from S arbitrary pieces of learning data D(s) = {X(s), y(s)} (where s = 1, ..., S), each consisting of a combination of a sample X(s) of the input value X and a value y(s) representing the category to which the sample belongs, the model derivation method including:
a procedure [a] of setting, as the neural network to be trained, an (L₁+1)-layer neural network in which the 0th layer is an input layer composed of N₁ input units and the L₁-th layer is an output layer (where the value L₁ is an integer of 2 or more), training the unknown learning parameters W₁ of this neural network on the S pieces of learning data D(s) using the value y(s) of each learning data D(s) as the teacher signal, and obtaining a solution of the learning parameters W₁;
a procedure [b] of generating new learning data D_v(s) corresponding to each learning data D(s) by using the trained (L₁+1)-layer neural network in which the solution of the learning parameters W₁ obtained in the procedure [a] is set, in which, for each learning data D(s), the sample X(s) of the learning data D(s) is given as the input value X to the trained (L₁+1)-layer neural network, the output value Z = {z1, ..., zN₂} of the (L₁−1)-th layer composed of N₂ intermediate units of this neural network (where the value N₂ is an integer of 2 or more) is obtained, the obtained output value Z is set as a new sample U(s), and data D_v(s) = {U(s), y(s)} consisting of the combination of the new sample U(s) and the value y(s) of the learning data D(s) is generated as the new learning data D_v(s) corresponding to the learning data D(s); and
a procedure [c] of designing, based on the learning data D_v(s) generated in the procedure [b], a nonlinear support vector machine that calculates, from an input value U = {u1, ..., uN} (where the value N is the number of dimensions of the sample U(s) of the learning data D_v(s) used as a sample of the input value U), a value y representing the category corresponding to the input value U,
the model derivation method deriving, as the calculation model of the recognition system, a calculation model that combines a calculation model capable of calculating the output value Z of the (L₁−1)-th layer of the trained (L₁+1)-layer neural network with the nonlinear support vector machine designed in the procedure [c], and that calculates the output value Z of the (L₁−1)-th layer from the input value X and calculates the output value y of the nonlinear support vector machine using the output value Z as the input value U of the nonlinear support vector machine,
wherein the recognition system calculates the value y representing the category corresponding to the given input value X = {x1, ..., xN₁} using the calculation model derived by the model derivation method, and thereby recognizes the category corresponding to the input value X.
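
The combined calculation model described in this claim can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a network with a single hidden layer (L₁ = 2), sigmoid units, a Gaussian kernel of the form exp(-gamma * ||u - v||^2), and a decision value p = Σ coeff_s · K(U, U(s), γ) + bias. The names `MODEL`, `hidden_layer_output`, `svm_decision_value` and `recognize`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def hidden_layer_output(x, weights, biases):
    # Z = {z1, ..., zN2}: one sigmoid unit per weight row (assumes L1 = 2).
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def gaussian_kernel(u, v, gamma):
    # Assumed kernel form: exp(-gamma * ||u - v||^2).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision_value(u, support_samples, coeffs, bias, gamma):
    # p = sum_s coeff_s * K(u, U(s), gamma) + bias (assumed decision function).
    return sum(c * gaussian_kernel(u, us, gamma)
               for c, us in zip(coeffs, support_samples)) + bias

def recognize(x, model):
    # Step 1: compute the hidden-layer output Z from the input X.
    z = hidden_layer_output(x, model["weights"], model["biases"])
    # Step 2: feed Z to the nonlinear SVM as its input U and take the sign.
    p = svm_decision_value(z, model["support_samples"], model["coeffs"],
                           model["bias"], model["gamma"])
    return 1 if p >= 0 else -1

# Hypothetical parameters; in practice they would come from procedures [a] to [c].
MODEL = {
    "weights": [[1.0, 0.0], [0.0, 1.0]],
    "biases": [0.0, 0.0],
    "support_samples": [[0.5, 0.5], [1.0, 1.0]],
    "coeffs": [1.0, -1.0],
    "bias": 0.0,
    "gamma": 1.0,
}
```

Calling `recognize([0.0, 0.0], MODEL)` runs both stages in sequence. The output layer of the trained network is never evaluated, because only the hidden-layer output is passed on to the support vector machine.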

The recognition system according to claim 1, wherein the model derivation method is a model derivation method in which:
in the procedure [b], for each learning data D(s), the value Z calculated from the sample X(s) of the learning data D(s) is reduced in dimension by a predetermined algorithm, the value V = {v1, ..., vN₃} after dimension reduction (where the value N₃ is an integer of 2 or more that is smaller than the value N₂) is set as the new sample U(s), and the learning data D_v(s) = {U(s), y(s)} is generated; and
as the calculation model of the recognition system, a calculation model is derived that calculates the output value Z of the (L₁−1)-th layer from the input value X, converts the output value Z into the N₃-dimensional value V = {v1, ..., vN₃} by reducing its dimension with the predetermined algorithm, and calculates the output value y of the nonlinear support vector machine using the converted value V as the input value U of the nonlinear support vector machine.

The recognition system according to claim 2, wherein the model derivation method is a model derivation method in which:
in the procedure [b], for each learning data D(s), the value Z calculated from the sample X(s) of the learning data D(s) is reduced in dimension using N₃ N₂-dimensional vectors Jm (where m = 1, ..., N₃) obtained by principal component analysis, so that the value Z is converted into an N₃-dimensional value V = {v1, ..., vN₃} whose m-th element vm is the inner product <Jm, Z> of the vector Jm and the value Z, the value V = {v1, ..., vN₃} after dimension reduction is set as the new sample U(s), and the learning data D_v(s) = {U(s), y(s)} is generated; and
as the calculation model of the recognition system, a calculation model is derived that calculates the output value Z of the (L₁−1)-th layer from the input value X, reduces the dimension of the output value Z using the N₃ N₂-dimensional vectors Jm, and calculates the output value y of the nonlinear support vector machine using the value V after dimension reduction as the input value U of the nonlinear support vector machine.
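
The dimension reduction in this claim, vm = <Jm, Z>, can be sketched as follows. The claim only states that the vectors Jm are obtained by principal component analysis. The power iteration with deflation used here to find them is an assumption made to keep the example dependency-free, and the names `principal_vectors`, `reduce_dimension`, `Z_SAMPLES`, `J` and `V` are hypothetical.

```python
import math

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def principal_vectors(samples, n_components, iters=200):
    """Return n_components unit vectors Jm (power iteration with deflation)."""
    n = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(n)]
    centered = [[s[i] - mean[i] for i in range(n)] for s in samples]
    # Covariance matrix of the hidden-layer outputs Z.
    cov = [[sum(c[i] * c[j] for c in centered) / len(samples)
            for j in range(n)] for i in range(n)]
    basis = []
    for _ in range(n_components):
        v = [1.0] * n  # starting vector; assumed not orthogonal to the target
        for _ in range(iters):
            w = [inner(row, v) for row in cov]
            norm = math.sqrt(inner(w, w))
            if norm == 0.0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        eigenvalue = inner(v, [inner(row, v) for row in cov])
        # Deflation: remove the component just found before the next search.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
               for i in range(n)]
        basis.append(v)
    return basis

def reduce_dimension(z, basis):
    # V = {v1, ..., vN3} with vm = <Jm, Z>, as stated in the claim.
    return [inner(j, z) for j in basis]

# Hypothetical hidden-layer outputs (N2 = 3), reduced to N3 = 2 dimensions.
Z_SAMPLES = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
J = principal_vectors(Z_SAMPLES, n_components=2)
V = reduce_dimension([3.0, 4.0, 5.0], J)
```

The same vectors Jm are used at derivation time and at recognition time, so the support vector machine sees inputs in the same reduced space in both cases.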

The recognition system according to claim 1, wherein the model derivation method is a model derivation method in which, in the procedure [a], the solution of the learning parameters W₁ is obtained by the back-propagation method.

The recognition system according to claim 1, wherein the model derivation method is a model derivation method in which, in the procedure [c], an appropriate value of the parameter γ of the kernel K(U, U(s), γ) constituting the nonlinear support vector machine that calculates the value y representing the category from the input value U is obtained by the following procedures [1] to [3], and the nonlinear support vector machine is designed using the kernel K(U, U(s), γ) in which the appropriate value is set as the parameter γ.
[1] As the neural network to be trained, an (L₂+1)-layer neural network is set in which the 0th layer is an input layer composed of N input units and the L₂-th layer is an output layer (where the value L₂ is an integer of 2 or more); the unknown learning parameters W₂ of this neural network are trained on the S pieces of learning data D_v(s), using the value y(s) of each learning data D_v(s) as the teacher signal; and a solution of the learning parameters W₂ is obtained.
[2] For each learning data D_v(s), the sample U(s) of the learning data D_v(s) is given as the input value to the trained (L₂+1)-layer neural network in which the solution of the learning parameters W₂ obtained in the procedure [1] is set, and the output value H(s) = {h1(s), ..., hN_h(s)} of the (L₂−1)-th layer, composed of N_h intermediate units of this neural network (where the value N_h is an integer larger than the number of dimensions N of the input value U), is obtained.
[3] Using the learning data D_v(s) and the value H(s) obtained in the procedure [2] for the sample U(s) of each learning data D_v(s),

according to the above expression, a solution γ* of the parameter γ at which the square error E takes its minimum value is obtained as the appropriate value of the parameter γ (where <H(s=a), H(s=b)> represents the inner product of the value H(s=a) and the value H(s=b)).
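
Procedure [3] can be sketched as below, with two caveats. The expression for the square error E does not appear in this text, so the form used here, E(γ) = Σ_a Σ_b (K(U(a), U(b), γ) − <H(a), H(b)>)², is an assumption inferred from the surrounding wording. The grid search over candidate values is likewise only one possible way to find the minimum. The Gaussian kernel form and the names `square_error`, `select_gamma`, `U_SAMPLES` and `H_VALUES` are hypothetical.

```python
import math

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def gaussian_kernel(u, v, gamma):
    # Assumed kernel form: exp(-gamma * ||u - v||^2).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def square_error(gamma, samples, hidden):
    # Assumed E(gamma): squared gap between the kernel value and the inner
    # product of the hidden-layer outputs, summed over all sample pairs (a, b).
    n = len(samples)
    return sum((gaussian_kernel(samples[a], samples[b], gamma)
                - inner(hidden[a], hidden[b])) ** 2
               for a in range(n) for b in range(n))

def select_gamma(candidates, samples, hidden):
    # Grid search (an assumption): keep the candidate with the smallest E.
    return min(candidates, key=lambda g: square_error(g, samples, hidden))

# Hypothetical data: N = 1 input dimension, N_h = 2 hidden units, built so
# that the inner products of H match a Gaussian kernel with gamma = 0.5.
c = math.exp(-0.5)
U_SAMPLES = [[0.0], [1.0]]
H_VALUES = [[1.0, 0.0], [c, math.sqrt(1.0 - c * c)]]
GAMMA_STAR = select_gamma([0.1, 0.5, 1.0, 2.0], U_SAMPLES, H_VALUES)
```

The idea being illustrated is that the hidden layer of the second network acts as an explicit feature map, and γ is chosen so that the kernel reproduces the inner products in that feature space as closely as possible.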

The recognition system according to claim 5, wherein the kernel K(U, U(s), γ) is a Gaussian kernel.

The recognition system according to claim 1, wherein, as the recognition result for the category corresponding to the input value X, the value y representing the category corresponding to the input value X is output, and the absolute value |p| of the input value p to the sign function constituting the nonlinear support vector machine, obtained when the value y is calculated, is output as information indicating the accuracy of the recognition result.
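
A minimal sketch of this output, assuming the convention that the sign function returns +1 for p >= 0 and -1 otherwise (the text does not state how p = 0 is handled). The function names and the threshold are hypothetical. The threshold only illustrates how a downstream consumer might use |p| and is not part of the claim.

```python
def sign_with_confidence(p):
    # y is the sign of the decision value p; |p| is reported alongside it.
    y = 1 if p >= 0 else -1
    return y, abs(p)

def is_reliable(p, threshold):
    # Hypothetical use of |p|: treat results close to the boundary as uncertain.
    return abs(p) >= threshold
```

A decision value far from zero means the input lies far from the decision boundary of the support vector machine, which is why |p| can serve as a measure of how trustworthy the recognized category is.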

An information processing apparatus used for deriving the calculation model of a recognition system that calculates, from a given input value X = {x1, ..., xN₁} (where the value N₁ is an integer of 2 or more), a value y representing the category corresponding to the input value X by a predetermined calculation model and thereby recognizes the category corresponding to the input value X, the information processing apparatus comprising:
acquisition means for acquiring S pieces of learning data D(s) = {X(s), y(s)} (where s = 1, ..., S), each consisting of a combination of a sample X(s) of the input value X and a value y(s) representing the category to which the sample belongs;
learning means for training the unknown learning parameters W₁ of an (L₁+1)-layer neural network, in which the 0th layer is an input layer composed of N₁ input units and the L₁-th layer is an output layer (where the value L₁ is an integer of 2 or more), on the S pieces of learning data D(s) acquired by the acquisition means, using the value y(s) of each learning data D(s) as the teacher signal, and for obtaining a solution of the learning parameters W₁;
new data generating means for generating new learning data D_v(s) corresponding to each learning data D(s) acquired by the acquisition means by using the trained (L₁+1)-layer neural network in which the solution of the learning parameters W₁ obtained by the learning means is set, the new data generating means, for each learning data D(s), giving the sample X(s) of the learning data D(s) as the input value X to the trained (L₁+1)-layer neural network, obtaining the output value Z = {z1, ..., zN₂} of the (L₁−1)-th layer composed of N₂ intermediate units of this neural network (where the value N₂ is an integer of 2 or more), setting the obtained output value Z as a new sample U(s), and generating data D_v(s) = {U(s), y(s)} consisting of the combination of the new sample U(s) and the value y(s) of the learning data D(s) as the new learning data D_v(s) corresponding to the learning data D(s);
support vector machine design means for designing, based on the learning data D_v(s) generated by the new data generating means, a nonlinear support vector machine that calculates, from an input value U = {u1, ..., uN} (where the value N is the number of dimensions of the sample U(s) of the learning data D_v(s) used as a sample of the input value U), a value y representing the category corresponding to the input value U; and
output means for outputting information representing the solution of the learning parameters W₁ obtained by the learning means and information representing the nonlinear support vector machine designed by the support vector machine design means.

The information processing apparatus according to claim 8, wherein:
the new data generating means is configured to, for each learning data D(s), reduce the dimension of the value Z calculated from the sample X(s) of the learning data D(s) by a predetermined algorithm, set the value V = {v1, ..., vN₃} after dimension reduction (where the value N₃ is an integer of 2 or more that is smaller than the value N₂) as the new sample U(s), and generate the learning data D_v(s) = {U(s), y(s)}; and
the output means is configured to output information representing the solution of the learning parameters W₁ obtained by the learning means, information representing the nonlinear support vector machine designed by the support vector machine design means, and information representing the method of converting the value Z into the value V.

The information processing apparatus according to claim 9, wherein:
the new data generating means is configured to, for each learning data D(s), reduce the dimension of the value Z calculated from the sample X(s) of the learning data D(s) using N₃ N₂-dimensional vectors Jm (where m = 1, ..., N₃) obtained by principal component analysis, thereby converting the value Z into an N₃-dimensional value V = {v1, ..., vN₃} whose m-th element vm is the inner product <Jm, Z> of the vector Jm and the value Z, set the value V = {v1, ..., vN₃} after dimension reduction as the new sample U(s), and generate the learning data D_v(s) = {U(s), y(s)}; and
the output means is configured to output the N₃ N₂-dimensional vectors Jm obtained by principal component analysis as the information representing the method of converting the value Z into the value V.

The information processing apparatus according to claim 8, wherein the learning means is configured to obtain the solution of the learning parameters W₁ by the back-propagation method.

The information processing apparatus according to claim 8, wherein the support vector machine design means is configured to obtain, by the following procedures [1] to [3], an appropriate value of the parameter γ of the kernel K(U, U(s), γ) constituting the nonlinear support vector machine that calculates the value y representing the category from the input value U, and to design the nonlinear support vector machine using the kernel K(U, U(s), γ) in which the appropriate value is set as the parameter γ.
[1] As the neural network to be trained, an (L₂+1)-layer neural network is set in which the 0th layer is an input layer composed of N input units and the L₂-th layer is an output layer (where the value L₂ is an integer of 2 or more); the unknown learning parameters W₂ of this neural network are trained on the S pieces of learning data D_v(s), using the value y(s) of each learning data D_v(s) as the teacher signal; and a solution of the learning parameters W₂ is obtained.
[2] For each learning data D_v(s), the sample U(s) of the learning data D_v(s) is given as the input value to the trained (L₂+1)-layer neural network in which the solution of the learning parameters W₂ obtained in the procedure [1] is set, and the output value H(s) = {h1(s), ..., hN_h(s)} of the (L₂−1)-th layer, composed of N_h intermediate units of this neural network (where the value N_h is an integer larger than the number of dimensions N of the input value U), is obtained.
[3] Using the learning data D_v(s) and the value H(s) obtained in the procedure [2] for the sample U(s) of each learning data D_v(s),

The information processing apparatus according to claim 12, wherein the kernel K(U, U(s), γ) is a Gaussian kernel.

A program for causing a computer to realize the functions of the acquisition means, the learning means, the new data generating means, the support vector machine design means, and the output means included in the information processing apparatus according to claim 8.

A design apparatus for a nonlinear support vector machine that calculates, from an input value U = {u1, ..., uN} (where the value N is an integer of 2 or more), a value y representing the category of the input value U, the design apparatus comprising:
acquisition means for acquiring S pieces of learning data D_v(s) = {U(s), y(s)} (where s = 1, ..., S), each consisting of a combination of a sample U(s) of the input value U and a value y(s) representing the category to which the sample belongs;
appropriate value calculating means for obtaining, based on the S pieces of learning data D_v(s) acquired by the acquisition means, an appropriate value of the parameter γ of the kernel K(U, U(s), γ) employed in the nonlinear support vector machine by the following procedures [1] to [3];
design means for designing the nonlinear support vector machine based on the S pieces of learning data D_v(s) acquired by the acquisition means, using the kernel K(U, U(s), γ) in which the obtained appropriate value of the parameter γ is set; and
output means for outputting information representing the nonlinear support vector machine designed by the design means.
[1] As the neural network to be trained, an (L+1)-layer neural network is set in which the 0th layer is an input layer composed of N input units and the L-th layer is an output layer (where the value L is an integer of 2 or more); the unknown learning parameters W of this neural network are trained on the S pieces of learning data D_v(s), using the value y(s) of each learning data D_v(s) as the teacher signal; and a solution of the learning parameters W is obtained.
[2] For each learning data D_v(s), the sample U(s) of the learning data D_v(s) is given as the input value to the trained (L+1)-layer neural network in which the solution of the learning parameters W obtained in the procedure [1] is set, and the output value H(s) = {h1(s), ..., hN_h(s)} of the (L−1)-th layer, composed of N_h intermediate units of this neural network (where the value N_h is an integer larger than the number of dimensions N of the input value U), is obtained.
[3] Using the learning data D_v(s) and the value H(s) obtained in the procedure [2] for the sample U(s) of each learning data D_v(s),

The design apparatus according to claim 15, wherein the kernel K(U, U(s), γ) is a Gaussian kernel.

A program for causing a computer to realize the functions of the acquisition means, the appropriate value calculating means, the design means, and the output means included in the design apparatus according to claim 15 or 16.