JPH10254846A

JPH10254846A - Learning method for regression-type neural network

Info

Publication number: JPH10254846A
Application number: JP9061473A
Authority: JP
Inventors: Kenichi Arai; 賢一新井
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1997-03-14
Filing date: 1997-03-14
Publication date: 1998-09-25

Abstract

PROBLEM TO BE SOLVED: To suppress falling into local minima and to achieve stable learning with small error, by correcting the connection weight parameters in the direction that reduces the error most while adjusting the learning coefficient so that the error does not increase, starting from the stage at which the error has fallen to or below a reference value. SOLUTION: A neural network initialization part 7 constructs a neural network from parameters such as the number of elements, initializes the connection weights with random numbers, and sets a neurogain parameter β and a learning coefficient η. A β value deciding part 9 then decides the value of the neurogain parameter β. Once all the time-series data have been input, a steepest descent direction calculation part 11 calculates the steepest descent direction. An η value deciding part 10 decides the value of the learning coefficient η, and a connection weight matrix correction part 12 corrects the connection weights based on the decided learning coefficient η and the steepest descent direction.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a learning method for a recurrent neural network, in which the neural network stores and learns time-series data. It belongs to the technology that uses a recurrently connected neural network to estimate the input/output relationship of concrete input/output time-series data as an iterated map with external input, and thereby performs time-series pattern recognition, speech recognition, grammar analysis, finite-state machine design, and the like.

[0002]

2. Description of the Related Art

A neural network is an information processing device devised by modeling the network of nerve cells in a biological brain. The most commonly used type of neural network is briefly described below.

[0003] The nerve-cell network of a biological brain is regarded as a system composed of a large number of elements, called nerve cells, and the connections between those elements.

[0004] Each neural element outputs impulses according to its membrane potential. In a model in which the output of each element takes continuous values, the output value represents the pulse density. Because the membrane potential rises when the element receives impulses from other elements, it can be computed as a weighted sum that accounts for the pulse density of each sending element and the coupling efficiency with that element. The output value increases monotonically as the membrane potential rises, but the pulse density has an upper limit, so the output value saturates.

[0005] From the above, let S_i(t) be the output value of the i-th neural element at time t, h_i(t) its membrane potential, and w_ij the coupling efficiency from element j to element i. In a discrete-time model,

(Equation 1)

S_i(t+1) = σ(h_i(t))   (2)

holds. Here σ is a sigmoid function; specifically, the logistic function

(Equation 2)

is most often used. For later use, β is introduced as in

(Equation 3)

and is called the neurogain parameter. Unless β is stated explicitly, β = 1.0.
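The discrete-time model above can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation: the equation images are missing from this text, so the weighted-sum form h_i(t) = Σ_j w_ij S_j(t) and the gain form σ_β(x) = 1 / (1 + exp(-βx)) are assumptions based on the surrounding description, and the names `sigmoid` and `step` are hypothetical.

```python
import math

def sigmoid(x, beta=1.0):
    """Logistic function with neurogain parameter beta (assumed form; beta = 1.0 by default)."""
    return 1.0 / (1.0 + math.exp(-beta * x))

def step(S, W, beta=1.0):
    """One discrete-time update of all elements.

    h_i(t)   = sum_j W[i][j] * S[j]   (assumed weighted-sum form of the membrane potential)
    S_i(t+1) = sigmoid(h_i(t))        (equation (2))
    """
    h = [sum(w_ij * s_j for w_ij, s_j in zip(row, S)) for row in W]
    return [sigmoid(h_i, beta) for h_i in h]
```

With all weights zero, every membrane potential is 0 and every output is 1/2, regardless of β. The naive `math.exp` call can overflow for very large negative arguments, so this form is only suitable for moderate values of β and h.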

[0006] Some or all of the neural elements of the network receive input from outside; this part is called the input unit or input layer. Likewise, some or all of the neural elements send output to the outside or are observed from outside; this part is called the output unit or output layer.

[0007] A network whose connections carry signals only one way, from the input layer to the output layer, is called a feedforward neural network. In contrast, a network that has connections in the feedback direction, so that signals can flow in loops, is called a recurrent neural network.

[0008] Consider a neural network composed of three layers: an input layer, an intermediate layer, and an output layer. As shown in FIG. 5, a recurrent neural network in which the input-layer elements connect to the intermediate-layer elements, and the intermediate-layer elements connect both to the output-layer elements and to the intermediate-layer elements, is called an Elman net. It is one of the widely used recurrent network structures.

[0009] Let I be the set of input-layer elements and N_I their number, U the set of intermediate-layer elements and N_U their number, and O the set of output-layer elements and N_O their number. Let N be the total number of elements. Collecting the output values of the output, intermediate, and input elements into vectors gives the N_O-dimensional vector S^O, the N_U-dimensional vector S^U, and the N_I-dimensional vector S^I, respectively; the output-value vector of all elements is the N-dimensional vector S.

[0010]

(Equation 4)

Here, let w_ij^OU be the connection weight from the j-th intermediate-layer element to the i-th output-layer element, w_ij^UU the connection weight from the j-th intermediate-layer element to the i-th intermediate-layer element, and w_ij^UI the connection weight from the j-th input-layer element to the i-th intermediate-layer element. Then w_ij, w_ij^OU, w_ij^UU, and w_ij^UI can be regarded as the (i, j) elements of the matrices W, W^OU, W^UU, and W^UI, of sizes (N_O + N_U) × (N_U + N_I), N_O × N_U, N_U × N_U, and N_U × N_I, respectively. These matrices are related as follows.

[0011]

(Equation 5)

Because there is no direct connection from the input layer to the output layer, the upper-right block of the matrix is the N_O × N_I zero matrix. Putting the above together, the time evolution of the neural network can be written as follows.

[0012]

(Equation 6)

Here σ is a vector function that applies the sigmoid function to each element.
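The block structure described above suggests the following sketch of one Elman-net time step. It is a sketch under assumptions, since the equation images are not available here: the block layout W = [[W^OU, 0], [W^UU, W^UI]] acting on the stacked vector (S^U(t), S^I(t)) is inferred from the stated matrix sizes and the zero upper-right block, and the function names are hypothetical.

```python
import math

def sigmoid(x, beta=1.0):
    """Logistic function with neurogain parameter beta (assumed form)."""
    return 1.0 / (1.0 + math.exp(-beta * x))

def matvec(M, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def elman_step(S_U, S_I, W_OU, W_UU, W_UI, beta=1.0):
    """One time step of the three-layer recurrent net.

    Output layer:       S_O(t+1) = sigma(W_OU S_U(t))               (no input-to-output block)
    Intermediate layer: S_U(t+1) = sigma(W_UU S_U(t) + W_UI S_I(t))
    """
    h_O = matvec(W_OU, S_U)
    h_U = [a + b for a, b in zip(matvec(W_UU, S_U), matvec(W_UI, S_I))]
    S_O_next = [sigmoid(h, beta) for h in h_O]
    S_U_next = [sigmoid(h, beta) for h in h_U]
    return S_O_next, S_U_next
```

Feeding a sequence then amounts to calling `elman_step` once per time step, passing the returned intermediate-layer vector back in as `S_U`.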

[0013] Next, learning in neural networks is described. Given several pairs of input values ξ(t) and output values ζ(t), these neural networks can approximately reproduce the system that realizes the relationship between those input/output pairs. What kind of system can be approximated depends on the configuration of the network, such as how the elements are connected. For example, a feedforward neural network can approximate multidimensional real-valued functions and mappings from a feature space to variables representing classes, while a recurrent neural network can approximate dynamical systems and mappings from time-series data to variables representing classes. Recurrent neural networks are used in particular for sequence data, where the data change over time.

[0014] A neural network approximates these various systems by treating the coupling efficiencies w_ij between elements as parameters and setting them to appropriate values.

[0015] However, because a neural network is a nonlinear system, the optimal parameters generally cannot be found in a single step. The parameters w_ij are therefore usually determined iteratively. This is called "learning".

[0016] The given pairs of input/output values are called learning data.

[0017]

{ξ^(k)(1), ξ^(k)(2), …, ξ^(k)(t)}   (8)
{ζ^(k)(1), ζ^(k)(2), …, ζ^(k)(t)}   (9)

where k = 1, 2, …, P. That is, P sets of learning data are given. ξ^(k)(t) and ζ^(k)(t) are N_I-dimensional and N_O-dimensional vectors, respectively.

[0018]

ξ(t) = (ξ_1(t), …, ξ_{N_I}(t))'
ζ(t) = (ζ_1(t), …, ζ_{N_O}(t))'

Here ' denotes transposition. The input learning data are fed to the neural network.

[0019]

(Equation 7)

Or, written compactly:

[0020]

(Equation 8)

The difference between the network output S^O(k)(t) and the learning-data output ζ^(k)(t) is called the error. The sum of squares E of all errors is defined as follows. In what follows, S^O(k)(t) is written simply as S^(k)(t).

[0021]

(Equation 9)

If the error E becomes 0, the network has completely acquired the input/output correspondence of the learning data. The question is then how to correct w_ij so as to reduce E. The most commonly used method is the steepest descent method, which corrects w_ij in the direction that reduces E most, that is, the steepest descent direction. Since that direction is obtained by partially differentiating E with respect to w_ij, the correction Δw_ij is

(Equation 10)

where η, called the learning coefficient, is a positive coefficient that stabilizes learning. Several methods of computing ∂E/∂w_ij are known; two representative ones are described below.
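For reference, the error and the steepest-descent update described above are conventionally written as follows. The equation images are missing from this text, so this is a reconstruction under assumptions: the factor 1/2 and the exact index ranges are conventional choices, not confirmed by the source.

```latex
E = \frac{1}{2} \sum_{k=1}^{P} \sum_{t} \sum_{i \in O}
    \bigl( S_i^{(k)}(t) - \zeta_i^{(k)}(t) \bigr)^2 ,
\qquad
\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}, \quad \eta > 0 .
```

The minus sign moves w_ij against the gradient, which is the direction of steepest decrease of E.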

[0022] First, real-time recurrent learning (RTRL) is described concretely as one way to compute the steepest descent direction. The index (k) denoting the learning-data number is omitted below; the final Δw_ij is obtained by summing the corrections of w_ij over all learning data.

[0023] From equations (15) and (16),

(Equation 11)

Here, from equations (13) and (14),

(Equation 12)

so that p_pq^i(t) can be computed successively, with the initial condition

(Equation 13)

[0024] Δw_ij(t) can be computed from the successively obtained p_pq^i(t). RTRL thus has the feature that, as each item of time-series data arrives, the correction to the connection weights can be computed at that time and learning can proceed.
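The forward-in-time character of RTRL can be illustrated with a small sketch. It is not the patent's formulation, whose equations (11) to (13) are missing from this text: it assumes a fully recurrent net in which every unit has a target, an error E = 1/2 Σ (S_i(t) - ζ_i(t))², and sensitivities p[i][a][b] = ∂S_i/∂W[a][b] that start at zero. All function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def run_error(W, inputs, targets):
    """E = 1/2 * sum over t and i of (S_i(t) - target_i(t))^2, starting from S(0) = 0."""
    S, E = [0.0] * len(W), 0.0
    for x, zeta in zip(inputs, targets):
        z = S + list(x)  # unit outputs followed by external inputs
        S = [sigmoid(sum(w * v for w, v in zip(row, z))) for row in W]
        E += 0.5 * sum((s - d) ** 2 for s, d in zip(S, zeta))
    return E

def rtrl_gradient(W, inputs, targets):
    """Accumulate dE/dW forward in time using the sensitivities p[i][a][b] = dS_i/dW[a][b]."""
    n, m = len(W), len(W[0])
    S = [0.0] * n
    p = [[[0.0] * m for _ in range(n)] for _ in range(n)]  # initial condition: all zero
    grad = [[0.0] * m for _ in range(n)]
    for x, zeta in zip(inputs, targets):
        z = S + list(x)
        S_new = [sigmoid(sum(w * v for w, v in zip(row, z))) for row in W]
        p_new = [[[0.0] * m for _ in range(n)] for _ in range(n)]
        for i in range(n):
            slope = S_new[i] * (1.0 - S_new[i])  # derivative of the logistic function
            for a in range(n):
                for b in range(m):
                    recurrent = sum(W[i][j] * p[j][a][b] for j in range(n))
                    direct = z[b] if i == a else 0.0
                    p_new[i][a][b] = slope * (recurrent + direct)
        S, p = S_new, p_new
        # The gradient contribution of this time step is available immediately.
        for i in range(n):
            e = S[i] - zeta[i]
            for a in range(n):
                for b in range(m):
                    grad[a][b] += e * p[i][a][b]
    return grad
```

A weight correction would then be Δw = -η · grad. The sensitivities hold n · n · m values, which is the usual memory cost of being able to update as each sample arrives.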

[0025] Next, back propagation through time (BPTT) is described concretely as another way to compute the steepest descent direction. Here too, the index (k) denoting the learning-data number is omitted.

[0026] Again from equation (15),

(Equation 14)

If we set

(Equation 15)

then equation (24) can be written as follows.

[0027]

(Equation 16)

z_i(τ) can be obtained as follows.

[0028]

(Equation 17)

In this way, BPTT computes z_i(t) backward in time to obtain Δw_ij. The terminal condition is

z_i(T) = e_i(T)   (30)
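The backward-in-time computation can be sketched as follows. This is an illustration under assumptions rather than the patent's equations (14) to (17), which are missing from this text: it uses a fully recurrent net in which every unit has a target, E = 1/2 Σ (S_i(t) - ζ_i(t))², and a quantity playing the role of z_i that equals the error e_i at the final time step, in line with the terminal condition (30). The function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(W, inputs):
    """Run the net from S(0) = 0; return the list of z(t) and the list of S(t+1)."""
    S = [0.0] * len(W)
    zs, Ss = [], []
    for x in inputs:
        z = S + list(x)  # unit outputs followed by external inputs
        S = [sigmoid(sum(w * v for w, v in zip(row, z))) for row in W]
        zs.append(z)
        Ss.append(S)
    return zs, Ss

def total_error(W, inputs, targets):
    _, Ss = forward(W, inputs)
    return 0.5 * sum((s - d) ** 2 for S, zeta in zip(Ss, targets) for s, d in zip(S, zeta))

def bptt_gradient(W, inputs, targets):
    """Accumulate dE/dW by sweeping backward over the stored trajectory."""
    n, m = len(W), len(W[0])
    zs, Ss = forward(W, inputs)
    grad = [[0.0] * m for _ in range(n)]
    carried = [0.0] * n  # error propagated from later steps; zero at the final step
    for t in reversed(range(len(inputs))):
        # Total sensitivity of E to each unit output at this step (equals e_i at the last step).
        zvec = [Ss[t][i] - targets[t][i] + carried[i] for i in range(n)]
        back = [zvec[i] * Ss[t][i] * (1.0 - Ss[t][i]) for i in range(n)]
        for a in range(n):
            for b in range(m):
                grad[a][b] += back[a] * zs[t][b]
        carried = [sum(back[k] * W[k][i] for k in range(n)) for i in range(n)]
    return grad
```

Unlike the forward method, this approach needs the whole sequence before any gradient is available, but it stores only the trajectory rather than one sensitivity per weight and unit.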

[0029] Next, learning of a finite automaton is described as an example of learning discrete-time, discrete-valued time-series data. A Moore-type finite automaton M is defined as follows.

[0030] M = (X, Y, S, f_s, f_o, s_o)
・X: input symbol set
・Y: output symbol set
・S: state set
・s_o ∈ S: initial state
・f_s: X × S → S, state transition function
・f_o: S → Y, output function

Suppose an input symbol x is given to a finite automaton whose current state is s. The automaton moves to the state s' = f_s(s, x) according to the state transition function f_s and then outputs the symbol y = f_o(s'). In other words, in response to each input symbol, the automaton changes its state and outputs a symbol determined by the new state. Given an input symbol string, state transitions and symbol outputs are repeated in this way, so that the finite automaton returns an output symbol string.
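The Moore automaton defined above can be sketched directly. The parity machine used as an example is hypothetical and is not taken from the patent; `run_moore` and the dictionary encoding of f_s and f_o are illustrative choices.

```python
def run_moore(f_s, f_o, s0, inputs):
    """Run a Moore automaton: for each input x, move to s' = f_s(s, x), then emit f_o(s')."""
    s = s0
    outputs = []
    for x in inputs:
        s = f_s[(s, x)]        # state transition
        outputs.append(f_o[s])  # output depends only on the new state
    return outputs

# Hypothetical example: the output is the parity of the number of 1s read so far.
parity_f_s = {
    ("even", 0): "even", ("even", 1): "odd",
    ("odd", 0): "odd",   ("odd", 1): "even",
}
parity_f_o = {"even": 0, "odd": 1}
```

For instance, `run_moore(parity_f_s, parity_f_o, "even", [1, 1, 0, 1])` returns `[1, 0, 0, 1]`.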

[0031] In finite-automaton learning, the input and output symbols are converted into continuous-valued vectors: each input symbol into an input vector fed to the input layer, and each output symbol into an output vector corresponding to the output of the output layer. Learning is performed on these sequences of input/output vectors.
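One common way to turn symbols into vectors is one-hot encoding, sketched below. The text does not state which encoding is used, so this choice is an assumption, and the helper names are hypothetical.

```python
def one_hot(symbol, alphabet):
    """Encode a symbol as a vector with 1.0 at its position in the alphabet and 0.0 elsewhere."""
    if symbol not in alphabet:
        raise ValueError(f"unknown symbol: {symbol!r}")
    return [1.0 if s == symbol else 0.0 for s in alphabet]

def encode_sequence(symbols, alphabet):
    """Encode a symbol string as a list of vectors, one per time step."""
    return [one_hot(s, alphabet) for s in symbols]
```

With an input alphabet of size N_I and an output alphabet of size N_O, the encoded vectors have exactly the dimensions expected by the input and output layers.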

[0032] When learning succeeds, the trajectory of the intermediate-layer output values concentrates in disconnected regions of the phase space; that is, it is known to form several clusters. Because these clusters correspond to the "states" of a finite automaton, the learned finite automaton can be reconstructed.

[0033]

PROBLEMS TO BE SOLVED BY THE INVENTION

The steepest descent method, which corrects the parameters in the direction that reduces the error most, has the following problem: once the parameters fall into the valley of a local minimum that is not the overall minimum, they cannot escape from it and the overall minimum cannot be reached. A locally minimal value is called a local minimum, and the minimum over the entire domain is called the global minimum.

[0034] In addition, even a neural network that has acquired the input/output relationship of the learning data is not guaranteed to operate stably as an automaton, and it can behave unstably on long data sequences.

[0035] Furthermore, when the neurogain parameter β is large, the element outputs can change sensitively in response to learning, that is, to changes in w_ij, and the error can surge partway through learning.

[0036] The present invention has been made in view of the above. Its object is to provide a learning method for a recurrent neural network that suppresses falling into local minima and performs stable learning with small error.

[0037]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, the invention of claim 1 is a learning method for a recurrent neural network in which, given several pairs of discrete-time, discrete-valued input time-series data and the corresponding discrete-time, discrete-valued target output time-series data, the input time-series data are fed to a discrete-time, continuous-valued recurrent neural network, the connection weight parameters are corrected iteratively in the steepest descent direction of the error surface so as to reduce the error between the actual output data of the network and the target output time-series data, and the function relating the input and output time-series data is thereby acquired. The gist of the method is that, from the stage at which learning has progressed to some extent and the error has fallen to or below a reference value, learning proceeds while the neurogain parameter is increased in such a way that the error does not increase, and the connection weight parameters are corrected in the direction that reduces the error most while the learning coefficient is adjusted so that the error does not increase.

[0038] In the invention of claim 1, from the stage at which learning has progressed to some extent and the error has fallen to or below the reference value, learning proceeds while the value of the neurogain parameter is increased. The size of each increase in the neurogain parameter and the value of the learning coefficient are adjusted adaptively so that neither the increase in the neurogain parameter nor the correction of the connection matrix raises the error above the reference value.

[0039]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Before the embodiments of the present invention are described, the basic matters and principles underlying the invention are explained.

[0040] The error E is a function of the connection weights w_ij of the neural network. If E is regarded as a function of the variables w_ij, then

y = E({w_ij | i, j = 1, …, N})   (31)

defines a surface, which is called the error surface.

[0041] The steepest descent method computes ∂E/∂w_ij, and this vector gives the direction in which the slope of the error surface is steepest. In other words, under steepest descent, w_ij moves from its initial value in the steepest direction of the error surface.

[0042] When the neurogain parameter β is 0, the error surface is flat, with zero slope. This is because σ_{β=0}(x) = 1/2, so the output value of every element is constant regardless of w_ij.

[0043] When the neurogain parameter β is ∞, the error surface is a finely stepped surface. This is because the sigmoid function becomes a perfect {0, 1} step function, so that E also takes only discrete values.

[0044] As β is increased from 0, the error surface, flat at first, gradually acquires a structure of gently sloping hills and valleys. As β increases further, the slopes grow steeper and the terrain becomes rugged. Finally, at β = ∞, it becomes a stepped surface of sheer cliffs.
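The two limiting cases can be checked numerically with a small sketch. The gain form σ_β(x) = 1 / (1 + exp(-βx)) is an assumption, since the defining equation image is missing from this text; the branch on the sign of βx is only there to avoid floating-point overflow.

```python
import math

def sigmoid(x, beta):
    """Overflow-safe logistic function with neurogain parameter beta (assumed form)."""
    a = beta * x
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)  # a < 0, so exp(a) cannot overflow
    return e / (1.0 + e)

def profile(beta, xs=(-1.0, -0.1, 0.1, 1.0)):
    """Element outputs at a few membrane potentials, to show how the curve sharpens with beta."""
    return [sigmoid(x, beta) for x in xs]
```

With beta = 0 every output is exactly 1/2, so the error cannot depend on the weights. With a very large beta the outputs are numerically 0 or 1, which is why the error surface turns into flat terraces separated by cliffs.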

[0045] It follows that when β is large, learning is difficult: it stalls on the flat parts, and on the steep slopes at the edges of the flat parts the element outputs become sensitive to changes in w_ij. With a relatively small β, the error surface slopes gently and learning can be expected to progress reliably. Next, consider the relationship between the neurogain parameter β and the behavior of the neural network. The operation of the neural network as an automaton stabilizes once β exceeds a certain value β_0, so that stable state transitions and symbol outputs are obtained for input symbol strings of any length.

[0046] The reason why the network then operates stably as an automaton is explained here. First, some terms and symbols are defined.

[0047] Consider the N_U-dimensional Euclidean space R^{N_U}; the output vector of the intermediate elements can be regarded as a point in R^{N_U}. Because 0 ≤ S_i^O ≤ 1, however, it exists only inside the N_U-dimensional unit hypercube [0, 1]^{N_U}. For simplicity, S^O is written below as S, omitting the O. Let v denote a vertex of this hypercube, that is, v = (v_1, …, v_{N_U}) with v_i ∈ {0, 1}, and let V (= {0, 1}^{N_U}) be the set of all vertices.

[0048] Suppose a connection matrix W and a finite number of input vectors ξ_α (α = α_1, …, α_p) are given. Consider the hyperplane in R^{N_U}

(Equation 18)

and call it H_iα. These hyperplanes H_iα (α = α_1, …, α_p; i = 1, …, N_U) divide R^{N_U} into subspaces P_{v,α}.

[0049]

(Equation 19)

Next, D_v is defined as follows.

[0050]

D_v = f_β(P_{v,α}, α)   (33)

where f_β(S, α) has the following meaning.

[0051]

f_β(S, α) = σ_β(W^UU S + W^UI ξ_α)   (34)

Note that D_v depends on neither α nor β. To see this, we show that for S ∈ P_{v,α} the range of f_β(S, α) depends on neither α nor β. The image S' of S is

(Equation 20)

and, from equation (32), when v_i = 0,

(Equation 21)

so that 0 < (S')_i < 1/2. When v_i = 1,

(Equation 22)

so that 1/2 < (S')_i < 1. Hence the image of S lies in a region that depends on neither α nor β. That is,

(Equation 23)

[0052] F: V × X → V is called the inter-vertex transition function. When there exists an input symbol string α_1 α_2 … α_n by which F takes v_p to v_q, v_p and v_q are said to be connected. Let V_c (⊆ V) be the set of connected vertices. Define the function g by g(S) = v for S ∈ D_v. Using g, F is defined concretely as follows.

[0053]

F(v, α) = g ∘ f_β(v, α)   (37)

N_v is defined as follows.

[0054]

N_v = D_v ∩ P_{F(v,α_1),α_1} ∩ … ∩ P_{F(v,α_P),α_P}   (38)

Just as v is mapped by F to v' = F(v, α_i) when α_i is given, P_{F(v,α_1),α_1} denotes the region that f_β maps into D_{v'}.

[0055] In other words, if N_v ≠ ∅, then

g ∘ f(N_v, α_i) = F(v, α_i), where i = 1, …, P   (39)
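The maps f_β, g, and F introduced above can be sketched as follows. This is an illustration under assumptions: D_v is taken to be the region in which coordinate i lies below 1/2 when v_i = 0 and above 1/2 when v_i = 1, as the bounds on (S')_i suggest, so g reduces to thresholding at 1/2. Assigning the boundary value 1/2 to 0 is an arbitrary choice made here, and all function names are hypothetical.

```python
import math
from itertools import product

def sigmoid(x, beta=1.0):
    return 1.0 / (1.0 + math.exp(-beta * x))

def f_beta(S, xi, W_UU, W_UI, beta=1.0):
    """f_beta(S, alpha) = sigma_beta(W_UU S + W_UI xi_alpha), as in equation (34)."""
    return [
        sigmoid(sum(w * s for w, s in zip(row_uu, S))
                + sum(w * x for w, x in zip(row_ui, xi)), beta)
        for row_uu, row_ui in zip(W_UU, W_UI)
    ]

def g(S):
    """Map an intermediate-layer vector to the hypercube vertex v whose region D_v contains it."""
    return tuple(1 if s > 0.5 else 0 for s in S)

def F(v, xi, W_UU, W_UI, beta=1.0):
    """Inter-vertex transition function F(v, alpha) = g(f_beta(v, alpha))."""
    return g(f_beta(v, xi, W_UU, W_UI, beta))

def vertices(n):
    """All vertices of the n-dimensional unit hypercube, i.e. the set V = {0, 1}^n."""
    return list(product((0, 1), repeat=n))
```

Because g only looks at which side of 1/2 each coordinate falls on, F depends on the sign of each membrane potential and not on the value of β, as long as β is positive.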

[0056] [Theorem] If v ∈ N_v^c for every element v of V_c (where N_v^c is the region N_v together with its boundary), then there exists a finite β_0 with the following property. For any β ≥ β_0,

f_{βα_n} ∘ … ∘ f_{βα_2} ∘ f_{βα_1}(D_{v_0}) ⊂ N_{v_1},  v_1 = F_{α_n} ∘ … ∘ F_{α_2} ∘ F_{α_1}(v_0)   (40)

holds for any v_0 ∈ V_c and any input symbol string α_1 α_2 … α_n ∈ X^* of any length. Here F_α(v) and f_{βα} have the same meanings as F(v, α) and f_β(S, ξ_α), respectively. β_0 is called the critical neurogain parameter.

[0057] [Proof] We show that, for all input symbols α ∈ X, that is, for all kinds of input vectors ξ_α, hypercubes T_v (v ∈ V_c) satisfying

f_{βα}(N_v) ⊆ T_{F(v,α)} ⊆ N_{F(v,α)}

can be constructed under a condition on β.

[0058] Suppose that, for an input symbol string α_1 α_2 …, the vertex transition function F produces the transitions v_0 → v_1 → v_2 → …. If T_v as above can be constructed, the state vector formed by the output vector of the intermediate elements visits N_{v_0}, N_{v_1}, N_{v_2}, … in order, and the state transitions are stable.

[0059] For S and an input vector ξ_α, define

(Equation 24)

[0060] When S ∈ N_v, f_β(S, ξ_α) ∈ D_{F(v,α)}. Hence h_iα > 0 if (F(v, α))_i = 1, and h_iα < 0 if (F(v, α))_i = 0. Consequently, as β approaches infinity, σ_β(h_iα) approaches 1 and 0, respectively.

[0061] Let z_k (k = 1, …, K) be the vertices of N_v, and define φ_{v,i} and ψ_{v,i} as follows.

[0062]

(Equation 25)

Using φ_{v,i} and ψ_{v,i}, T_v can be determined as follows.

[0063]

(Equation 26)

Note that T_v ⊆ N_v.

[0064] In addition, β_i^{v,α} is defined as follows.

[0065] For each v, α, i and all S ∈ N_v, let β_i^{v,α} be the smallest β that satisfies

(Equation 27)

[0066] Here v' = F(v, α). Because N_v is a convex region, the maximum and minimum of h_iα occur when S is at a vertex of N_v. It is therefore unnecessary to examine every S ∈ N_v; it suffices to check whether the condition holds at all the vertices. Moreover, if β^{v,α} = max_i β_i^{v,α}, then for β ≥ β^{v,α},

f_β(S, ξ_α) ∈ T_{v'}   (46)

holds. Therefore, if β_1 is chosen as

(Equation 28)

then f_{β_1}(N_v, α) is a subset of T_{v'} for all v ∈ V_c and α ∈ X.

[0067] Therefore, if β is chosen larger than the β_1 determined above, then for any S_0 ∈ N_{v_0},

T_{v_1} ∋ f_{βα}(S_0) = S_1, T_{v_2} ∋ f_{βα}(S_1) = S_2, T_{v_3} ∋ f_{βα}(S_2) = S_3, …

so that, for any input symbol string, the trajectory is always mapped into N_v. Hence a critical neurogain parameter β_0 (≤ β_1) exists, which proves the theorem.

[0068] In view of the above, the neurogain parameter β is first set to a small value and w_ij is corrected so that the error is minimized. Efficient learning on gentle slopes and avoidance of local minima can then be expected.

[0069] Next, β is increased gradually; once it reaches β_0, stable state transitions are obtained. When β is large, the element outputs can change sensitively with changes in the connection weights, and the error can increase sharply. Here, however, η is determined adaptively, which means that the size of each weight correction is adjusted carefully, so the error does not increase sharply.
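The overall strategy (start with a small β, descend, then raise β while keeping the error under control through an adaptive η) can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the single-unit model, the acceptance rules for β and η, the constants, and the use of a finite-difference gradient in place of RTRL or BPTT. The rules actually used by the β and η value determination units are not given in this part of the text.

```python
import math

def sigmoid(x, beta):
    a = beta * x
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def error(w, beta, data):
    """Sum-of-squares error of a single unit S = sigmoid(w * x, beta) (toy model)."""
    return 0.5 * sum((sigmoid(w * x, beta) - target) ** 2 for x, target in data)

def train(data, w=0.1, beta=0.5, eta=1.0, threshold=0.05,
          beta_growth=1.1, beta_max=20.0, steps=300, h=1e-6):
    history = []
    for _ in range(steps):
        E = error(w, beta, data)
        history.append((beta, E))
        # Assumed rule: once E is at or below the threshold, try a larger beta and
        # keep it only if the error stays at or below the threshold.
        if E <= threshold and beta < beta_max:
            trial = min(beta * beta_growth, beta_max)
            if error(w, trial, data) <= threshold:
                beta = trial
                E = error(w, beta, data)
        # Steepest-descent direction by central difference (stand-in for RTRL/BPTT).
        grad = (error(w + h, beta, data) - error(w - h, beta, data)) / (2 * h)
        # Assumed rule: halve eta until the weight step does not increase the error.
        step_eta = eta
        while error(w - step_eta * grad, beta, data) > E and step_eta > 1e-12:
            step_eta *= 0.5
        if error(w - step_eta * grad, beta, data) <= E:
            w -= step_eta * grad
    return w, beta, history
```

For example, `train([(1.0, 1.0), (-1.0, 0.0)])` first lowers the error at β = 0.5 and only then starts raising β. In this sketch a weight step is never accepted if it increases the error, and a β increase is never accepted if it pushes the error above the threshold.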

【００７０】Next, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a time-series data learning device that implements the learning method for a recurrent neural network according to an embodiment of the present invention. In FIG. 1, reference numeral 1 denotes a memory that stores parameters, data, intermediate calculation results, and the like; 2 denotes a data input unit to which learning data is input; 3 denotes a data output unit that outputs the output values of the output elements; 4 denotes a neural element value storage unit that stores the values of the neural elements; 5 denotes a connection weight storage unit that stores the values of the connection weights; 6 denotes a control unit that controls the operation of each unit; 7 denotes a neural network initialization unit that constructs the neural network from parameters and initializes the connection weights with random numbers; 8 denotes an element time evolution calculation unit that calculates the time evolution of the elements and updates the element values; 9 denotes a β value determination unit that determines the value of the neurogain parameter β; 10 denotes an η value determination unit that determines the value of the learning coefficient η; 11 denotes a steepest descent direction calculation unit that calculates the steepest descent direction; and 12 denotes a connection weight matrix correction unit that corrects the connection weights.

【００７１】Next, the operation of the time-series data learning device configured as described above will be described with reference to the flowchart shown in FIG. 2.

【００７２】First, the neural network initialization unit 7 constructs the neural network from parameters such as the number of elements and initializes the connection weights with random numbers (step S21), and the neurogain parameter β and the learning coefficient η are set to their initial values (step S22). The initial value of the neurogain parameter β is set to a small value.

【００７３】Next, the β value determination unit 9 determines the value of the neurogain parameter β (step S23). This determination process is described in detail later with reference to FIG. 3. The element values are then initialized (step S24), time-series data is input from the data input unit 2 to the elements of the input layer (step S25), and the element values are updated according to the time evolution equation (step S26). It is then checked whether all patterns have been processed, that is, whether all the time-series data has been input (step S27). If not, the process returns to step S24 and repeats until all the time-series data has been input. Once all the time-series data has been input, the process proceeds to step S28, where the steepest descent direction calculation unit 11 calculates the steepest descent direction.

【００７４】Then, the η value determination unit 10 determines the value of the learning coefficient η (step S29), and the connection weight matrix correction unit 12 corrects the connection weights based on the determined learning coefficient η and the steepest descent direction (step S30). It is then checked whether the error is within the allowable range (step S31). If it is, the process ends; otherwise, the process returns to step S23 and the same processing is repeated.
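
The outer loop of FIG. 2 (steps S23 to S31) can be sketched as follows. This is only an illustrative outline, not the device itself: the network is abstracted into caller-supplied `error` and `gradient` functions, and the names `train`, `pick_beta`, `pick_eta`, `tolerance`, and `max_epochs` are hypothetical. The `max_epochs` cap is an added safeguard that the flowchart does not mention.

```python
from typing import Callable, List, Tuple

Weights = List[float]

def train(
    w: Weights,
    beta: float,
    error: Callable[[Weights, float], float],
    gradient: Callable[[Weights, float], Weights],
    pick_beta: Callable[[Weights, float], float],
    pick_eta: Callable[[Callable[[float], float]], float],
    tolerance: float,
    max_epochs: int = 1000,  # assumption: safety cap, not in the flowchart
) -> Tuple[Weights, float]:
    for _ in range(max_epochs):
        # S23: decide the neurogain parameter for this pass.
        beta = pick_beta(w, beta)
        # S24-S28: run all sequences through the network and obtain the
        # steepest descent direction (here delegated to `gradient`).
        direction = [-g for g in gradient(w, beta)]
        # S29: decide the learning coefficient from the error along that direction.
        eta = pick_eta(
            lambda step: error([wi + step * di for wi, di in zip(w, direction)], beta)
        )
        # S30: correct the connection weights.
        w = [wi + eta * di for wi, di in zip(w, direction)]
        # S31: stop once the error is within the allowable range.
        if error(w, beta) <= tolerance:
            break
    return w, beta
```

For example, with a toy quadratic error `sum((x - 1) ** 2 for x in w)`, a fixed β, and a constant step of 0.25, the loop drives the weights toward 1 and stops once the error falls below the tolerance.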

【００７５】Next, the process of adaptively determining the value of the neurogain parameter β in step S23 of FIG. 2 will be described with reference to the flowchart shown in FIG. 3. In this process, the maximum error is the quantity defined by

【数２９】(Equation 29)

and it can also be used to determine whether all the learning data has been learned correctly. If this value is smaller than 0.5, all the data has been learned correctly.
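
Equation 29 is not reproduced in this text, so the following sketch rests on an assumption: that the maximum error is the largest absolute difference between a target output and the corresponding actual output over all sequences and time steps. With discrete (0 or 1) targets, a value below 0.5 then means every output rounds to its target. The names `max_error` and `all_learned` are hypothetical.

```python
from typing import Sequence

def max_error(targets: Sequence[Sequence[float]],
              outputs: Sequence[Sequence[float]]) -> float:
    """Assumed definition: largest |target - output| over all data."""
    return max(
        abs(t - y)
        for target_seq, output_seq in zip(targets, outputs)
        for t, y in zip(target_seq, output_seq)
    )

def all_learned(targets: Sequence[Sequence[float]],
                outputs: Sequence[Sequence[float]]) -> bool:
    """True when every output is on the correct side of 0.5."""
    return max_error(targets, outputs) < 0.5

targets = [[0, 1], [1, 0]]
outputs = [[0.1, 0.8], [0.7, 0.4]]
print(all_learned(targets, outputs))
```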

【００７６】In the process of FIG. 3, it is first checked whether the maximum error Emax is less than 0.5 (step S41). If it is less than 0.5, the value of the neurogain parameter β and its increment Δ are initialized to β₀ and Δ₀, respectively (step S43). If it is not less than 0.5, the process proceeds to step S42, where the neurogain parameter β is output unchanged and the process ends.

【００７７】Then, it is checked whether the maximum error E(β) is greater than E(β+Δ) (step S44). If it is greater, the neurogain parameter β and the increment are increased by β←β+Δ, Δ←2Δ (step S45). It is then checked whether the maximum error E(β) is smaller than E(β+Δ) (step S46). If it is smaller, the value of the neurogain parameter β is taken as determined, that value is output, and the process ends (step S47). If the maximum error E(β) is not smaller than E(β+Δ), the process returns to step S45, β and the increment are again increased by β←β+Δ, Δ←2Δ, and the same processing is repeated.

【００７８】If the check in step S44 finds that the maximum error E(β) is equal to or smaller than E(β+Δ), the process proceeds to step S48, where the increment Δ of the neurogain parameter β is reduced to Δ/2. It is then checked whether the maximum error E(β) is greater than E(β+Δ) (step S49). If it is greater, β+Δ is output as the determined value of the neurogain parameter β, and the process ends (step S50).

【００７９】If the check in step S49 finds that the maximum error E(β) is equal to or smaller than E(β+Δ), the process proceeds to step S51, where it is checked whether the number of repetitions of steps S48 and S49 has exceeded a predetermined limit. If it has not, the process returns to step S48 and the same processing is repeated. If it has, the process proceeds to step S52, where β₀ is output as the determined value of the neurogain parameter β, and the process ends.
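
The β search of FIG. 3 (steps S41 to S52) can be sketched as below. This is an illustrative reading of the flowchart text, not a definitive implementation. The function name `choose_beta` and its parameters are hypothetical, `max_error_at(beta)` is assumed to return the maximum error obtained with gain `beta`, and `max_doublings` is an added safeguard that the flowchart does not include.

```python
from typing import Callable

def choose_beta(
    max_error_at: Callable[[float], float],
    beta: float,
    beta0: float,
    delta0: float,
    e_max: float,
    max_halvings: int = 10,   # the "predetermined limit" checked in S51
    max_doublings: int = 50,  # assumption: safety cap, not in the flowchart
) -> float:
    # S41/S42: leave beta unchanged until all data are learned correctly.
    if not e_max < 0.5:
        return beta
    # S43: start the search from beta0 with increment delta0.
    beta, delta = beta0, delta0
    # S44: does a larger gain reduce the maximum error?
    if max_error_at(beta) > max_error_at(beta + delta):
        for _ in range(max_doublings):
            # S45: advance beta and double the increment.
            beta, delta = beta + delta, 2.0 * delta
            # S46/S47: stop when a further increase would raise the error.
            if max_error_at(beta) < max_error_at(beta + delta):
                return beta
        return beta
    for _ in range(max_halvings):
        # S48: shrink the increment.
        delta /= 2.0
        # S49/S50: accept the first smaller increment that reduces the error.
        if max_error_at(beta) > max_error_at(beta + delta):
            return beta + delta
    # S52: no improving increment found within the limit.
    return beta0

# Toy error curve with its minimum at beta = 3.
print(choose_beta(lambda b: abs(b - 3.0) * 0.1, 0.3, 1.0, 0.5, e_max=0.2))
```

Doubling Δ on success and halving it on failure lets the search take large steps while the error keeps falling and back off quickly when it does not.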

【００８０】Next, the process of adaptively determining the value of the learning coefficient η in step S29 of FIG. 2 will be described in detail with reference to the flowchart shown in FIG. 4. In this process, E(η) denotes the total sum of squared errors E (Equation (15)) calculated after the connection weights have been changed using the learning coefficient η.

【００８１】In the process of FIG. 4, the value of the learning coefficient η is first initialized to η₀ (step S61). It is then checked whether E(0) is smaller than E(η) (step S62). If it is smaller, the value of the learning coefficient η is reduced to η/2 (step S66), and it is checked whether E(0) is greater than E(η) (step S67). If it is greater, the process proceeds to step S68, where the value of the learning coefficient η is output as the determined value, and the process ends. If E(0) is equal to or smaller than E(η), the process returns to step S66, the learning coefficient η is again reduced to η/2, and the same processing is repeated.

【００８２】On the other hand, if E(0) is equal to or greater than E(η) in step S62, the process proceeds to step S63, where it is checked whether E(η) is smaller than E(2η). If it is not smaller, the process proceeds to step S65, where the value of the learning coefficient η is increased to 2η, and the same processing is repeated. If it is smaller, the process proceeds to step S64, where the learning coefficient η is output as the determined value, and the process ends.
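
The η search of FIG. 4 (steps S61 to S68) amounts to a halving and doubling line search along the steepest descent direction. The sketch below follows the flowchart text under two assumptions: after step S65 the process returns to the check in step S63, and `max_iter` is an added cap that the flowchart does not include. The names `choose_eta`, `error_at`, and `max_iter` are hypothetical; `error_at(eta)` is assumed to return E after a step of size `eta`, so `error_at(0.0)` is the current error E(0).

```python
from typing import Callable

def choose_eta(
    error_at: Callable[[float], float],
    eta0: float,
    max_iter: int = 50,  # assumption: safety cap, not in the flowchart
) -> float:
    e0 = error_at(0.0)
    eta = eta0                                   # S61
    if e0 < error_at(eta):                       # S62: this step raises the error
        for _ in range(max_iter):
            eta /= 2.0                           # S66
            if e0 > error_at(eta):               # S67
                return eta                       # S68
        return 0.0  # assumption: no improving step found, do not move
    for _ in range(max_iter):
        if error_at(eta) < error_at(2.0 * eta):  # S63
            return eta                           # S64
        eta *= 2.0                               # S65
    return eta

# Toy error along the search direction with its minimum at eta = 1.
quadratic = lambda eta: (1.0 - eta) ** 2
print(choose_eta(quadratic, 4.0))  # an overly large initial step is halved
```

Because a step is only accepted when it does not raise the error above E(0), a weight correction made with the returned η cannot undo what has already been learned, which matches the stated aim of the adaptive adjustment.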

【００８３】[0083]

[Effects of the Invention] As described above, according to the present invention, a small neurogain parameter β is used in the initial stage of learning time-series data with a neural network having recurrent connections, which suppresses falling into local minima as far as possible and allows efficient learning. From the stage where learning has progressed to some extent, learning proceeds while the neurogain parameter β is gradually increased, so that the neural network acquires stable state transitions. Furthermore, stable learning is achieved by adaptively adjusting the neurogain parameter β and the learning coefficient η so as not to destroy the input/output relationship once it has been acquired.

[Brief description of the drawings]

【図１】FIG. 1 is a block diagram showing the configuration of a time-series data learning device that implements a learning method for a recurrent neural network according to an embodiment of the present invention.

【図２】FIG. 2 is a flowchart showing the operation of the time-series data learning device shown in FIG. 1.

【図３】FIG. 3 is a flowchart showing the process of adaptively determining the value of the neurogain parameter β in step S23 of FIG. 2.

【図４】FIG. 4 is a flowchart showing the process of adaptively determining the value of the learning coefficient η in step S29 of FIG. 2.

【図５】FIG. 5 is a diagram showing the configuration of an Elman net, which is one type of recurrent neural network.

[Explanation of symbols]

1: memory
2: data input unit
3: data output unit
4: neural element value storage unit
5: connection weight storage unit
6: control unit
7: neural network initialization unit
8: element time evolution calculation unit
9: β value determination unit
10: η value determination unit
11: steepest descent direction calculation unit
12: connection weight matrix correction unit

Claims

[Claims]

A learning method for a recurrent neural network in which, given several sets of discrete-time, discrete-valued input time-series data and the corresponding discrete-time, discrete-valued target output time-series data, the input time-series data is input to a discrete-time, continuous-valued recurrent neural network, and the connection weight parameters are successively corrected in the steepest descent direction of the error surface so as to reduce the error between the actual output data of the neural network and the target output time-series data, thereby acquiring the function relating the input and output time-series data, the method being characterized in that, from the stage where learning has progressed to some extent and the error has decreased to or below a reference value, learning proceeds while the neurogain parameter is increased in such a way that the error does not increase, and the connection weight parameters are corrected in the direction in which the error decreases most while the learning coefficient is adjusted so that the error does not increase.