JP7230324B2 - Neural network learning method, computer program and computer device

Info

Publication number: JP7230324B2
Application number: JP2018230323A
Authority: JP
Inventors: 強福趙
Original assignee: University of Aizu
Current assignee: University of Aizu
Priority date: 2018-12-07
Filing date: 2018-12-07
Publication date: 2023-03-01
Anticipated expiration: 2038-12-07
Also published as: JP2020091813A

Description

The present invention relates to a neural network learning method and to a computer program for causing a computer device to execute the learning method.

Machine learning based on multilayer perceptrons (MLPs) with many layers is called deep learning. Deep learning can be said to be the main trigger of the recent AI boom. In fact, a neural network with multiple hidden layers (DMLP: deep MLP) is more efficient and more effective than a neural network with only one hidden layer (SHL-MLP: single hidden layer MLP). That is, for a given problem, a DMLP is known to achieve higher approximation accuracy with a smaller total number of neurons (Non-Patent Document 1).

Non-Patent Document 1: Stephan Trenn, "Multilayer Perceptrons: Approximation Order and Necessary Number of Hidden Units," IEEE Transactions on Neural Networks, Vol. 19, No. 5, pp. 836-844, 2008.
Non-Patent Document 2: Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, "Greedy layer-wise training of deep networks," J. Platt et al. (Eds), Advances in Neural Information Processing Systems 19, pp. 153-160, MIT Press, 2007.

However, when a neural network with multiple hidden layers (DMLP) is used, there is no effective way to determine the number of hidden layers, so the number of layers is usually fixed to a sufficiently large value. A layer-wise training method that grows the network one layer at a time has also been proposed (Non-Patent Document 2), but there too the number of layers is fixed in advance to a sufficiently large value. As a result, the design (learning) cost of a DMLP becomes very high. Cost here has two main aspects. One is the learning cost of collecting learning data, and the other is the amount of computational resources (memory, computation time, etc.) required for learning (decision cost). Moreover, when a DMLP obtained in this way is actually used, it performs unnecessary computation, so decision efficiency suffers.

Accordingly, an object of the present invention is to provide a neural network learning method, a computer program therefor, and a computer device capable of efficiently determining the number of hidden layers.

To achieve the above object, a first neural network learning method according to the present invention is a learning method for a neural network having k (k = 0, 1, ..., K) hidden layers (the mapping corresponding to the 0th hidden layer, where k = 0, is the unit (identity) mapping), comprising: a first step of obtaining, as the output of the 0th hidden layer, the data signals in a set of learning data consisting of pairs of a data signal and a teacher signal prepared in advance; a second step of, for k > 0, generating and learning the mapping of the k-th hidden layer and the mapping of the k-th output layer based on the output of the (k-1)-th hidden layer; a third step of generating the output of the k-th hidden layer and evaluating the output of the k-th output layer; a fourth step of terminating learning if the evaluation of the output of the k-th output layer is at or above a predetermined level; and a fifth step of, if the evaluation of the output of the k-th output layer is below the predetermined level, setting k = k+1, terminating learning if k > K, and otherwise repeating the second to fourth steps.

A second neural network learning method according to the present invention is a learning method for a neural network having k (k = 0, 1, ..., K) hidden layers (the mapping corresponding to the 0th hidden layer, where k = 0, is the unit (identity) mapping), comprising: a first step of obtaining, as the output of the 0th hidden layer, the data signals in a set of learning data consisting of pairs of a data signal and a teacher signal prepared in advance; a second step of, for k > 0, generating and learning the mapping of the k-th hidden layer and the mapping of the k-th output layer based on the output of the (k-1)-th hidden layer; a third step of generating the output of the k-th hidden layer and evaluating the output of the k-th output layer for each class; a fourth step of terminating learning if the evaluation of the output of the k-th output layer is at or above a predetermined level for all classes; and a fifth step of, if the evaluation of the output of the k-th output layer is below the predetermined level for at least one class, retaining only the outputs of the classes whose evaluation is below the predetermined level, setting k = k+1, terminating learning if k > K, and otherwise repeating the second to fourth steps.

A third neural network learning method according to the present invention is a learning method for a neural network having k (k = 0, 1, ..., K) hidden layers (the mapping corresponding to the 0th hidden layer, where k = 0, is the unit (identity) mapping), comprising: a first step of obtaining, as the output of the 0th hidden layer, the data signals in a set of learning data consisting of pairs of a data signal and a teacher signal prepared in advance; a second step of, for k > 0, generating and learning the mapping of the k-th hidden layer and the mapping of the k-th output layer based on the outputs of the 0th to (k-1)-th hidden layers; a third step of generating the output of the k-th hidden layer and evaluating the output of the k-th output layer for each class; a fourth step of terminating learning if the evaluation of the output of the k-th output layer is at or above a predetermined level for all classes; and a fifth step of, if the evaluation of the output of the k-th output layer is below the predetermined level for at least one class, retaining only the outputs of the classes whose evaluation is below the predetermined level, setting k = k+1, terminating learning if k > K, and otherwise repeating the second to fourth steps.

Also provided are a computer program for causing a computer device to execute the first to third learning methods, and the computer device itself. The computer device is a server device or a terminal device such as a personal computer or a mobile device, and may also be a configuration in which a plurality of server devices and terminal devices are connected.

In the learning method of the present invention, a neural network having a plurality of hidden layers (DMLP: deep MLP) grows one layer at a time, and growth can be terminated early once the given problem can be solved. This allows only as many layers as are needed to be set, which makes costs more efficient. In addition, when solving a pattern classification problem, the difficulty of each class usually differs, so the present invention can determine the number of layers for each class. This reduces both the learning cost and the decision cost. Furthermore, using the information from all lower layers when generating a higher layer makes it possible to obtain a good model at an earlier stage and to terminate learning early.

FIG. 1 is a diagram showing a neural network learning model according to the first learning method in an embodiment of the present invention.
FIG. 2 is a diagram showing a neural network learning model according to the second learning method in an embodiment of the present invention.
FIG. 3 is a diagram showing a neural network learning model according to the third learning method in an embodiment of the present invention.
FIG. 4 is a diagram showing an implementation example of a neural network learning model according to the third learning method.

Embodiments of the present invention are described below with reference to the drawings. These embodiments do not, however, limit the technical scope of the present invention.

FIG. 1 is a diagram showing a neural network learning model according to the first learning method in an embodiment of the present invention. The symbols in FIG. 1 are defined as follows. $x^{(k)} \in \mathbb{R}^{N_k}$ is the output of the k-th hidden layer ($k = 0, 1, \ldots, K$), and $\xi_k$ is the mapping (or transformation, or function) corresponding to the k-th hidden layer. Here $\xi_0$ is the unit (identity) mapping and $x^{(0)}$ is the original learning data itself; that is, the 0th hidden layer is not a layer of ordinary neurons but a buffer, and it can also be regarded as the input layer.

$N_k$ is the number of neurons in the k-th hidden layer, and $N_0 = N_f$ is the dimension of the original feature space. $\varphi_k$ is the mapping corresponding to the k-th output layer ($k = 0, 1, \ldots, K$), and $y^{(k)} \in \mathbb{R}^{N_c}$ is its output, where $N_c$ is the number of classes.

The design (learning) method for the model in FIG. 1 is described next. First, assume that the set of learning data $\Omega$ is given as follows.

$$\Omega = \{\langle x_i^{(0)}, d_i \rangle,\ i = 1, 2, \ldots, N_d\}$$
where $x_i^{(0)} \in \mathbb{R}^{N_0}$ is the i-th data item (feature vector) and $d_i \in \mathbb{R}^{N_c}$ is its label (teacher signal). $N_f = N_0$ is the dimension of the data, $N_d$ is the number of learning data items, and $N_c$ is the number of classes.

In this embodiment, a pattern classification problem is taken as an example. The label (teacher signal) may be a one-dimensional integer taking $N_c$ values, or an $N_c$-element binary code in which each element takes one of two values ($\{-1, 1\}$ or $\{0, 1\}$). This embodiment adopts the latter, the binary code. Therefore, the number of neurons in each output layer of the model in FIG. 1 is $N_c$. With the above definitions, the steps of the first learning method shown in FIG. 1 are as follows.
(Step 1)
Let $k = 0$; $\Omega_k = \Omega$; $\Xi = \{\}$; $\Phi = \{\}$.
Here, $\Xi$ is the set of hidden layers, $\Phi$ is the set of output layers, $\{\}$ is the empty set, and $\Omega_k$ is the database composed of the outputs $x^{(k)}$ of the k-th hidden layer.
(Step 2)
Based on $\Omega_0$, generate (learn) $\varphi_0$ and add it to $\Phi$.
(Step 3)
Evaluate the performance of $\varphi_k$. The evaluation method is explained in the next paragraph.
If the performance of $\varphi_k$ exceeds a predetermined level, learning ends successfully and $\Xi$ and $\Phi$ are output.

Specifically, the performance of $\varphi_k$ is evaluated by recognition rate and confidence. Performance is normally evaluated not on the learning data itself but on a validation set prepared in advance. The recognition rate is obtained from the difference between the output of $\varphi_k$ and the teacher signal. The confidence is obtained as the posterior probability of making a particular decision given particular input data. When the recognition rate and the confidence are both at or above predetermined thresholds, the performance of $\varphi_k$ is judged to exceed the predetermined level. If the performance of $\varphi_k$ does not exceed the predetermined level, proceed to (Step 4).
(Step 4)
Let $k = k + 1$. That is, the hidden layers are increased one at a time.
(Step 5)
If $k > K$, learning ends (in this case, learning has failed). Otherwise, proceed to (Step 6).
(Step 6)
Generate $\xi_k$ and $\varphi_k$ based on $\Omega_{k-1}$, and let $\Xi = \Xi + \{\xi_k\}$; $\Phi = \Phi + \{\varphi_k\}$. To generate $\xi_k$ and $\varphi_k$ simultaneously, a known learning method for a three-layer MLP is used (for example, the well-known error backpropagation method).
(Step 7)
Obtain $\Omega_k = \xi_k(\Omega_{k-1})$ and return to (Step 3).

That is, as long as $k > K$ does not hold, (Step 3) to (Step 7) are repeated, so that hidden layers are generated one at a time until the performance of $\varphi_k$ exceeds the predetermined level. Once the performance of $\varphi_k$ exceeds the predetermined level, generation of hidden layers stops and learning ends.
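The loop in (Step 1) to (Step 7) can be sketched as follows. This is a minimal illustration, not the patented implementation: `grow_network`, `fit_output`, `fit_layer`, and `score` are hypothetical names, the caller is assumed to supply the actual training and evaluation routines (for example, backpropagation on a three-layer MLP), and for brevity the score is computed on the same data that is propagated through the layers, whereas the text evaluates on a separate validation set.

```python
import numpy as np

def grow_network(X, D, fit_output, fit_layer, score, level, K):
    """Grow hidden layers one at a time and stop once phi_k is good enough.

    Hypothetical callables supplied by the caller (not from the source):
      fit_output(X, D) -> phi         trains the 0th output map
      fit_layer(X, D)  -> (xi, phi)   trains one hidden map and one output map
      score(phi, X, D) -> float       performance of phi, higher is better
    Returns (Xi, Phi, succeeded).
    """
    Xi, Phi = [], []                # Step 1: empty sets of hidden / output maps
    omega = np.asarray(X)           # Omega_0 is the raw data (xi_0 is the identity)
    phi = fit_output(omega, D)      # Step 2: learn phi_0 from Omega_0
    Phi.append(phi)
    k = 0
    while True:
        # Step 3: the claims use "equal to or higher than" the level, hence >=.
        if score(phi, omega, D) >= level:
            return Xi, Phi, True
        k += 1                      # Step 4: add one more hidden layer
        if k > K:                   # Step 5: layer budget exhausted, learning failed
            return Xi, Phi, False
        xi, phi = fit_layer(omega, D)   # Step 6: learn xi_k and phi_k from Omega_{k-1}
        Xi.append(xi)
        Phi.append(phi)
        omega = xi(omega)           # Step 7: Omega_k = xi_k(Omega_{k-1}); labels D unchanged
```

Passing the training and scoring routines as arguments keeps the sketch independent of any particular MLP library; the number of hidden layers actually built is simply `len(Xi)` on return.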

$\Omega_k = \xi_k(\Omega_{k-1})$ means that the k-th hidden layer is used to map all learning data into a new feature space by $\xi_k$. The mapping changes the data itself, but the labels (teacher signals) do not change.

In the first learning method described above, hidden layers are added one at a time, and based on the intermediate evaluation result for each layer in (Step 3), only as many layers as needed are set, so learning can end early. In (Step 3), $\Xi$ and $\Phi$ are output as the result of the performance evaluation; however, instead of all output layers (every output layer generated each time a layer is added), only the last output layer (the one generated when learning ended successfully) may be output as the result.

The performance evaluation in (Step 3) is carried out in order to terminate learning early. To perform the evaluation, part of the learning data is usually held out as a validation set, and the performance of the output layer is evaluated on it. Put simply, if the output matches the teacher signal for most of the validation data (for example, 99%), correct answers can be expected with high probability for new observations in the future. However, if only the recognition rate or the error rate is evaluated, the generalization ability of the resulting system may not be high. This is the well-known ill-posed problem. It can usually be addressed by including an appropriate regularization term (regularization factor) in the evaluation function. In fact, the smaller the system size (number of parameters) needed to achieve the same approximation accuracy, the better the generalization ability of the system. Because early termination of learning effectively limits the size of the system, it can also improve the generalization ability of the system.
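A minimal sketch of the check in (Step 3), assuming the output layer produces posterior estimates for each class on a held-out validation set. The function name `passes_level`, the default thresholds, and the choice to summarize confidence as the mean posterior of the predicted class are illustrative assumptions; the text only states that recognition rate and confidence must both reach predetermined thresholds.

```python
import numpy as np

def passes_level(probs, labels, min_rate=0.99, min_conf=0.9):
    """Decide whether an output layer is good enough on validation data.

    probs:  (n, Nc) array of posterior estimates, one row per validation sample
    labels: (n,) array of true class indices
    The thresholds and the mean-of-max confidence summary are assumptions.
    Returns (passed, recognition_rate, confidence).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    pred = probs.argmax(axis=1)                  # decision for each sample
    rate = float((pred == labels).mean())        # recognition rate
    conf = float(probs.max(axis=1).mean())       # mean posterior of the decision taken
    return (rate >= min_rate and conf >= min_conf), rate, conf
```

A regularization term, as mentioned above, would enter the training objective rather than this acceptance test, so it is left out of the sketch.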

FIG. 2 is a diagram showing the learning model of the second learning method in an embodiment of the present invention. Compared with the first learning method shown in FIG. 1, only the dimension of the output $y^{(k)}$ of $\varphi_k$ differs. Here, the dimension $M_k \le N_c$ of $y^{(k)}$ is the number of classes that $\varphi_k$ can recognize with sufficiently high accuracy, and the outputs of $\varphi_k$ correspond to the indices of those classes. If, partway through learning, the learning (or validation) data of a certain class can be recognized with sufficiently high accuracy, the data of that class need not be used in subsequent learning. Learning can end once all classes are recognized with high accuracy. Specifically, the second learning method is as follows.
(Step 1)
Let $k = 0$; $A = \{1, 2, \ldots, N_c\}$; $\Omega_k = \Omega$; $\Xi = \{\}$; $\Phi = \{\}$.
Here, $A$ is the set of indices of the classes that are not yet recognized accurately.
(Step 2)
Based on $\Omega_0$, generate $\varphi_0$ and let $\Phi = \Phi + \{\varphi_0\}$.
(Step 3)
Evaluate the performance of $\varphi_k$.
$B_k = \{\}$;
Take each index $i$ out of $A$; if the recognition accuracy for class $i$ is sufficiently high, move it to $B_k$ ($B_k = B_k + \{i\}$), otherwise it stays in $A$;
If $A = \{\}$, end learning and output $\Xi$ and $\Phi$.
Here, $B_k$ is the set of indices of the classes that $\varphi_k$ can recognize accurately. $M_k$ in FIG. 2 is the size (number of elements) of $B_k$.
(Step 4)
Let $k = k + 1$.
(Step 5)
If $k > K$, learning ends (in this case, learning has failed). Otherwise, proceed to (Step 6).
(Step 6)
Generate $\xi_k$ and $\varphi_k$ based on $\Omega_{k-1}$, and let $\Xi = \Xi + \{\xi_k\}$; $\Phi = \Phi + \{\varphi_k\}$.
(Step 7)
Obtain $\Omega_k = \xi_k(\Omega_{k-1})$ and return to (Step 3).

Like the first learning method, the second learning method learns while adding hidden layers one at a time. In addition, it exploits the fact that classes differ in difficulty, so learning can be terminated early class by class. When the trained model is used to recognize a new pattern, a result can be produced earlier if the pattern belongs to a class that is easy to recognize.
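The per-class bookkeeping in (Step 3) of the second learning method can be sketched as below. The name `update_classes`, the 0.99 threshold, and the use of per-class recall on validation data as the "recognition accuracy for class i" are assumptions made for illustration; the text does not fix a particular per-class measure.

```python
import numpy as np

def update_classes(A, pred, labels, min_rate=0.99):
    """Split the unresolved class set A after evaluating phi_k.

    A:      set of class indices not yet recognized accurately
    pred:   (n,) predicted class indices on validation data
    labels: (n,) true class indices
    Per-class recall as the accuracy measure is an assumption.
    Returns (A_remaining, B_k), where B_k holds the classes phi_k handles well.
    """
    pred = np.asarray(pred)
    labels = np.asarray(labels)
    B_k = set()
    for i in sorted(A):
        mask = labels == i
        # A class with no validation samples cannot be judged, so it stays in A.
        if mask.any() and float((pred[mask] == i).mean()) >= min_rate:
            B_k.add(i)
    return set(A) - B_k, B_k
```

Learning ends when the returned `A_remaining` is empty; the size of `B_k` plays the role of $M_k$ in FIG. 2.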

FIG. 3 is a diagram showing the learning model of the third learning method in an embodiment of the present invention. In contrast to the second learning method, in the third learning method each layer uses the features detected by all the layers below it. This lets useful information be used more efficiently, so learning or decision making can finish at an earlier stage.

The hidden-layer neurons of a DMLP are the smallest perceptive agents and have the function of detecting "features" that are not contained in the original data. However, because a conventional DMLP takes the form of a pipeline, the final decision is made using only the features extracted by the neurons of the last hidden layer. The third learning method is intended to improve on this. Specifically, the third learning method is as follows.
(Step 1)
Let $k = 0$; $A = \{1, 2, \ldots, N_c\}$; $\Omega_k = \Omega$; $\Xi = \{\}$; $\Phi = \{\}$.
(Step 2)
Based on $\Omega_0$, generate $\varphi_0$ and let $\Phi = \Phi + \{\varphi_0\}$.
(Step 3)
Evaluate the performance of $\varphi_k$.
$B_k = \{\}$;
Take each index $i$ out of $A$; if the recognition accuracy for class $i$ is sufficiently high, move it to $B_k$ ($B_k = B_k + \{i\}$), otherwise it stays in $A$;
If $A = \{\}$, end learning and output $\Xi$ and $\Phi$.
Here, $B_k$ is the set of indices of the classes that $\varphi_k$ can recognize accurately. $M_k$ in FIG. 2 is the size (number of elements) of $B_k$.
(Step 4)
Let $k = k + 1$.
(Step 5)
If $k > K$, learning ends (in this case, learning has failed). Otherwise, proceed to (Step 6).
(Step 6)
Generate $\xi_k$ and $\varphi_k$ based on $\Omega_{k-1}$, and let $\Xi = \Xi + \{\xi_k\}$; $\Phi = \Phi + \{\varphi_k\}$.
(Step 7)
Obtain $\Omega_k$ and return to (Step 3).
Here, $\Omega_k$ is obtained as follows.
$$\Omega_0 = \Omega = \{\langle x_i^{(0)}, d_i \rangle,\ i = 1, 2, \ldots, N_d\}$$
For $k > 0$:
$$x_i^{(k)} = x_i^{(k-1)} + \xi_k(x_i^{(k-1)}); \quad \Omega_k = \{\langle x_i^{(k)}, d_i \rangle,\ i = 1, 2, \ldots, N_d\}$$
where the operator $+$ denotes the concatenation of two vectors.
For example, if $x = (x_1\ x_2)^T$ and $y = (y_1\ y_2)^T$, then $x + y = (x_1\ x_2\ y_1\ y_2)^T$. When the learning data for each layer is obtained in this way, the dimension of the k-th data is $n_k = \sum_{j=0}^{k} N_j$.
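The concatenation rule for $\Omega_k$ can be checked numerically with a short sketch. The helper names `next_dataset` and `random_layer` are hypothetical, and the tanh layers with random weights merely stand in for trained maps $\xi_k$; only the shapes matter here.

```python
import numpy as np

def next_dataset(X_prev, xi_k):
    """Build x^(k) = x^(k-1) + xi_k(x^(k-1)), where + is vector concatenation.

    X_prev: (N_d, n_{k-1}) array, one row per sample
    xi_k:   callable mapping (N_d, n_{k-1}) to (N_d, N_k)
    Returns an (N_d, n_{k-1} + N_k) array; labels are carried along unchanged.
    """
    return np.concatenate([X_prev, xi_k(X_prev)], axis=1)

rng = np.random.default_rng(0)

def random_layer(n_in, n_out):
    # Stand-in for a trained hidden map xi_k (assumption: tanh of a linear map).
    W = rng.standard_normal((n_in, n_out))
    return lambda X: np.tanh(X @ W)

X0 = rng.standard_normal((5, 4))              # N_d = 5 samples, N_0 = 4
X1 = next_dataset(X0, random_layer(4, 3))     # n_1 = N_0 + N_1 = 4 + 3
X2 = next_dataset(X1, random_layer(7, 2))     # n_2 = N_0 + N_1 + N_2 = 4 + 3 + 2
print(X1.shape, X2.shape)                     # (5, 7) (5, 9)
```

Note that each $\xi_k$ here takes the full concatenated vector of dimension $n_{k-1}$ as input, which is how features from all lower layers reach the new layer.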

Implementing a DMLP is usually computationally expensive. The learning methods described above can limit the number of hidden layers to what is needed, but even so, a deep DMLP with many layers may be required to solve relatively difficult problems. In such cases, instead of implementing all hidden layers and output layers on a single computer device, the hidden layers can be implemented on a server device on a network, such as a cloud server, and the output layers on a terminal device, such as a local server or a mobile terminal. Specifically:
- When the first learning method is used, the hidden layers are implemented on the cloud server and the last output layer on the mobile terminal.
- When the second learning method is used, the hidden layers are implemented on the cloud server and the output layer of each level on the mobile terminal. The second learning method requires the output layers of all levels to be implemented, but the total number of outputs is $N_c$, so the amount of computation on the terminal device (mobile terminal) does not increase significantly.
- When the third learning method is used, the hidden layers are likewise implemented on the cloud server and the output layer of each level on the mobile device. Instead of implementing the output layers as neural networks, they can be implemented as decision trees (DT). This reduces the number of features used and keeps costs down. Cost here has two meanings: one is the amount of data received from the cloud server (communication volume or communication charges), and the other is the amount of computation needed to make a decision.

FIG. 4 is a diagram showing an implementation example of a learning model according to the third learning method. The hidden layers are implemented on the cloud server, and the last output layer is implemented on the mobile terminal.

Implementing the hidden layers on the cloud server makes the user's intent invisible to the server, which allows privacy to be protected. Furthermore, implementing $\xi_0$ with an appropriate random matrix makes it possible to reach correct decisions while protecting the raw user data.
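One possible reading of "implementing $\xi_0$ with a random matrix" is a fixed random linear map applied on the user's device before any data is sent to the server. The sketch below assumes a square Gaussian matrix generated from a seed that stays on the device; the name `make_xi0`, the matrix distribution, and the seed handling are all assumptions, since the text does not specify them, and the sketch makes no claim about the strength of the resulting protection.

```python
import numpy as np

def make_xi0(n_features, seed):
    """Return a hypothetical xi_0: a fixed random linear map x -> x R.

    n_features: dimension N_f of the raw feature vector
    seed:       kept on the user's device (assumption), so R can be regenerated there
    """
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((n_features, n_features))   # square Gaussian matrix (assumption)
    return lambda X: np.asarray(X, dtype=float) @ R

xi0 = make_xi0(n_features=4, seed=123)
raw = np.array([[1.0, 2.0, 3.0, 4.0]])   # raw user data, never sent as-is
masked = xi0(raw)                         # what the server would receive instead
```

Because the same seed always yields the same matrix, the hidden layers on the server can be trained and queried on consistently transformed data.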

Claims

A computer program that causes a computer device to execute a learning method for a neural network having k (k=0,1,...,K) hidden layers (the map corresponding to the 0th hidden layer where k=0 is a unit map) in the computer device,
a first step of determining the data signal in a learning data set consisting of a set of a data signal and a teacher signal prepared in advance as an output of the 0th hidden layer;
a second step of learning by generating a map of the k-th hidden layer and a map of the k-th output layer based on the output of the k-1-th hidden layer when k>0;
a third step of generating the output of the kth hidden layer and evaluating the output of the kth output layer;
a fourth step of terminating learning if the evaluation of the output of the k-th output layer is equal to or higher than a predetermined level;
a fifth step of, if the evaluation of the output of the k-th output layer is less than the predetermined level, setting k=k+1, terminating learning if k>K, and repeating the second step to the fourth step if k>K is not satisfied.

A computer program characterized in that, in the first step, a map of the 0th output layer is generated based on the output of the 0th hidden layer and the output of the 0th output layer is evaluated; if the evaluation of the output of the 0th output layer is equal to or higher than a predetermined level, learning is terminated, and if the evaluation is less than the predetermined level, the process proceeds to the second step.

A computer program that causes a computer device to execute a learning method for a neural network having k (k=0,1,...,K) hidden layers (the map corresponding to the 0th hidden layer where k=0 is a unit map) in the computer device,
a first step of determining the data signal in a learning data set consisting of a set of a data signal and a teacher signal prepared in advance as an output of the 0th hidden layer;
a second step of learning by generating a map of the k-th hidden layer and a map of the k-th output layer based on the output of the k-1-th hidden layer when k>0;
a third step of generating the output of the kth hidden layer and evaluating the output of the kth output layer by class;
a fourth step of terminating learning if the evaluation of the output of the k-th output layer is equal to or higher than a predetermined level for all classes;
a fifth step of, if the evaluation of the output of the k-th output layer is less than the predetermined level for at least one class, retaining only the outputs of the classes whose evaluation is less than the predetermined level, setting k=k+1, terminating learning if k>K, and repeating the second step to the fourth step if k>K is not satisfied.

A computer program as claimed in claim 3, characterized in that, in the first step, a map of the 0th output layer is generated based on the output of the 0th hidden layer and the output of the 0th output layer is evaluated; if the evaluation of the output of the 0th output layer is equal to or higher than a predetermined level for all classes, learning is terminated, and if the evaluation is less than the predetermined level for at least one class, the process proceeds to the second step.

A computer program that causes a computer device to execute a learning method for a neural network having k (k=0,1,...,K) hidden layers (the map corresponding to the 0th hidden layer where k=0 is a unit map) in the computer device,
a first step of determining the data signal in a learning data set consisting of a set of a data signal and a teacher signal prepared in advance as an output of the 0th hidden layer;
a second step of learning by generating a map of the k-th hidden layer and a map of the k-th output layer based on the outputs of the 0th to k-1th hidden layers when k>0;
a third step of generating the output of the kth hidden layer and evaluating the output of the kth output layer by class;
a fourth step of terminating learning if the evaluation of the output of the k-th output layer is equal to or higher than a predetermined level for all classes;
a fifth step of, if the evaluation of the output of the k-th output layer is less than the predetermined level for at least one class, retaining only the outputs of the classes whose evaluation is less than the predetermined level, setting k=k+1, terminating learning if k>K, and repeating the second step to the fourth step if k>K is not satisfied.

6. The computer program according to claim 5, wherein, in the first step, a map of the 0th output layer is generated based on the output of the 0th hidden layer, the output of the 0th output layer is evaluated, learning is terminated if the evaluation of the output of the 0th output layer is equal to or higher than a predetermined level for all classes, and the process proceeds to the second step if the evaluation of the output of the 0th output layer is less than the predetermined level for at least one class.

A learning method for a neural network having k (k = 0, 1, ..., K) hidden layers (the map corresponding to the 0th hidden layer where k = 0 is a unit map), the learning method comprising:
a first step of determining the data signal in a learning data set consisting of a set of a data signal and a teacher signal prepared in advance as an output of the 0th hidden layer;
a second step of learning by generating a map of the k-th hidden layer and a map of the k-th output layer based on the output of the k-1-th hidden layer when k>0;
a third step of generating the output of the kth hidden layer and evaluating the output of the kth output layer;
a fourth step of terminating learning if the evaluation of the output of the k-th output layer is equal to or higher than a predetermined level;
and a fifth step of, if the evaluation of the output of the k-th output layer is less than the predetermined level, setting k=k+1, terminating learning if k>K, and repeating the second step to the fourth step if k>K is not satisfied.
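As a non-limiting illustration, the five steps of the above method can be sketched in Python. The sketch rests on several assumptions that the claim does not specify: the hidden-layer map is a fixed random projection followed by tanh, the output-layer map is fitted by least squares, the evaluation is classification accuracy on the learning data, and learning starts at k = 1. The names `train_layerwise`, `width`, `level`, and `seed` are hypothetical.

```python
import numpy as np

def train_layerwise(X, T, K=5, level=0.95, width=16, seed=0):
    """Add hidden layers one at a time until the evaluation reaches `level` or k > K.

    X: data signals, shape (n_samples, n_features)
    T: one-hot teacher signals, shape (n_samples, n_classes)
    Returns the list of (hidden_map, output_map) pairs and the last evaluation.
    """
    rng = np.random.default_rng(seed)
    H = X          # first step: the data signal is the output of the 0th hidden layer
    layers = []
    score = 0.0
    k = 1          # assumption: learning starts at k = 1 (the second step applies when k > 0)
    while k <= K:
        # Second step: generate the k-th hidden map and the k-th output map
        # from the output of the (k-1)-th hidden layer.
        W = rng.normal(size=(H.shape[1], width))       # assumed: fixed random hidden map
        H_k = np.tanh(H @ W)                           # output of the k-th hidden layer
        V, *_ = np.linalg.lstsq(H_k, T, rcond=None)    # assumed: least-squares output map
        layers.append((W, V))
        # Third step: evaluate the output of the k-th output layer (assumed: accuracy).
        Y = H_k @ V
        score = float(np.mean(Y.argmax(axis=1) == T.argmax(axis=1)))
        # Fourth step: terminate when the evaluation reaches the predetermined level.
        if score >= level:
            break
        # Fifth step: k = k + 1; the loop condition ends learning once k > K.
        H = H_k
        k += 1
    return layers, score
```

In this sketch the number of hidden layers actually generated is `len(layers)`, which never exceeds K and is smaller whenever the predetermined level is reached early.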

A learning method for a neural network having k (k = 0, 1, ..., K) hidden layers (the map corresponding to the 0th hidden layer where k = 0 is a unit map), the learning method comprising:
a first step of determining the data signal in a learning data set consisting of a set of a data signal and a teacher signal prepared in advance as an output of the 0th hidden layer;
a second step of learning by generating a map of the k-th hidden layer and a map of the k-th output layer based on the output of the k-1-th hidden layer when k>0;
a third step of generating the output of the kth hidden layer and evaluating the output of the kth output layer by class;
a fourth step of terminating learning if the evaluation of the output of the k-th output layer is equal to or higher than a predetermined level for all classes;
and a fifth step of, if the evaluation of the output of the k-th output layer is less than the predetermined level for at least one class, retaining only the outputs of the classes whose evaluation is less than the predetermined level, setting k=k+1, ending learning if k>K, and repeating the second step to the fourth step if k>K is not satisfied.
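As a non-limiting illustration, the class-by-class evaluation of the third step and the selection made in the fifth step might be sketched as follows. The evaluation measure used here (the fraction of each class's learning samples that are classified correctly) is an assumption, as is the reading that "retaining" means carrying only the below-level classes forward to the next layer. The function names `per_class_evaluation` and `classes_below_level` are hypothetical.

```python
import numpy as np

def per_class_evaluation(Y, T):
    """Return, for each class, the fraction of its samples classified correctly.

    Y: outputs of the k-th output layer, shape (n_samples, n_classes)
    T: one-hot teacher signals, same shape
    """
    pred = Y.argmax(axis=1)
    true = T.argmax(axis=1)
    scores = np.zeros(T.shape[1])
    for c in range(T.shape[1]):
        members = true == c
        if members.any():
            scores[c] = np.mean(pred[members] == c)
    return scores

def classes_below_level(scores, level):
    """Indices of the classes whose evaluation is less than the predetermined level."""
    return np.flatnonzero(scores < level)
```

If `classes_below_level` returns an empty array, every class meets the predetermined level, which corresponds to the termination condition of the fourth step.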

A learning method for a neural network having k (k = 0, 1, ..., K) hidden layers (the map corresponding to the 0th hidden layer where k = 0 is a unit map), the learning method comprising:
a first step of determining the data signal in a learning data set consisting of a set of a data signal and a teacher signal prepared in advance as an output of the 0th hidden layer;
a second step of learning by generating a map of the k-th hidden layer and a map of the k-th output layer based on the outputs of the 0th to k-1th hidden layers when k>0;
a third step of generating the output of the kth hidden layer and evaluating the output of the kth output layer by class;
a fourth step of terminating learning if the evaluation of the output of the k-th output layer is equal to or higher than a predetermined level for all classes;
and a fifth step of, if the evaluation of the output of the k-th output layer is less than the predetermined level for at least one class, retaining only the outputs of the classes whose evaluation is less than the predetermined level, setting k=k+1, ending learning if k>K, and repeating the second step to the fourth step if k>K is not satisfied.
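As a non-limiting illustration, the second step of this method differs from the preceding method in that the k-th hidden layer is generated from the outputs of all of the 0th to (k-1)-th hidden layers. One possible way to combine those outputs, assumed here and not stated in the claim, is feature-wise concatenation. The function name `hidden_layer_input` and the example arrays are hypothetical.

```python
import numpy as np

def hidden_layer_input(hidden_outputs):
    """Concatenate the outputs of the 0th to (k-1)-th hidden layers feature-wise.

    hidden_outputs: list of arrays, each of shape (n_samples, n_units_j)
    Returns an array of shape (n_samples, sum of n_units_j).
    """
    return np.concatenate(hidden_outputs, axis=1)

# Example with k = 2: the input to the 2nd hidden layer combines H0 and H1.
H0 = np.ones((5, 4))    # output of the 0th hidden layer (the data signal)
H1 = np.zeros((5, 3))   # output of the 1st hidden layer
Z = hidden_layer_input([H0, H1])
```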

10. A computer device for executing the computer program according to any one of claims 1 to 6.

11. The computer device according to claim 10, wherein the computer device includes a server device on a network and a terminal device connected to the server device through a communication line, the k hidden layers are implemented in the server device, and the output layer is implemented in the terminal device.