JPH0554163A

JPH0554163A - Neural network and learning method thereof

Info

Publication number: JPH0554163A
Application number: JP21964491A
Authority: JP
Inventors: Sumio Watanabe; 澄夫渡辺
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1991-06-12
Filing date: 1991-08-30
Publication date: 1993-03-05

Abstract

PURPOSE: To clarify the properties of a neural network by frequency analysis and thereby provide an efficient learning method. CONSTITUTION: The constants $a_i$, $w_i$ and $\theta_i$ are chosen to minimize the error of the output data with respect to the teacher data $f(x)$. First, the Fourier transform $F(k)$ of the teacher data $f(x)$ is computed (S1). Next, a sampling interval $T$ is chosen and the sequence $g_n = nT\,F(nT)$ is generated (S2). From $g_n$, a sequence $p_i$ satisfying $g_n + p_1 g_{n-1} + p_2 g_{n-2} + \dots + p_N g_{n-N} = 0$ is computed (S3). The roots $Z_i$ of $x^N + p_1 x^{N-1} + p_2 x^{N-2} + \dots + p_N = 0$ are then computed from $p_i$ (S4), and the constants $w_i$ and $\theta_i$ are obtained from $w_i = -T/\log|Z_i|$ and $\theta_i = \arg(Z_i)/\log|Z_i|$ (S5). Finally, the hidden-layer output is replaced by $\tan^{-1}(w_i x + \theta_i)$ and evaluated with these $w_i$ and $\theta_i$. The constants $a_i$ are then obtained by least squares, which completes the process (S6).

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to neural networks. It can be applied in any field where neural networks are used, such as speech recognition, speech synthesis, character recognition, robot control and stock price prediction.

[0002]

2. Description of the Related Art In recent years there has been active research on speech and image recognition, time-series prediction and similar tasks using neural networks, and their effectiveness has been confirmed. Hardware that implements neural networks has also begun to be sold and applied to various products, and full-scale practical use is about to begin. For example, Watanabe, Yoneyama et al., "Ultrasonic three-dimensional object recognition method using a neural network" (IEICE Technical Report (Shingaku Giho), US90-29, 1990), reports quite good results, and its practical application is under study.

[0003] The learning problem of a neural network can be formulated as follows. For a three-layer perceptron, the learning problem is to expand a given real function $f(x)$, called the teacher data, as a linear sum of a predetermined monotonically increasing function $\sigma(x)$:

[Equation 1] $f(x) = \sum_{i=1}^{N} a_i\,\sigma(w_i x + \theta_i) + \varepsilon(x)$

where
- $a_i$ is the synaptic weight of the output layer ($i = 1, 2, \dots, N$, where $N$ is the number of hidden-layer neurons);
- $w_i$ is the synaptic weight of the hidden layer;
- $\theta_i$ is the threshold of the hidden layer;
- $\varepsilon(x)$ is the error.

In other words, the problem is to find the constants $a_i$, $w_i$ and $\theta_i$ that make $\varepsilon(x)$ as small as possible under a given criterion. Here $\sigma(w_i x + \theta_i)$ is the output of the $i$-th hidden-layer neuron, and the sigmoid function $\sigma(x) = 1/(1 + \exp(-x))$ is used as $\sigma$.
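The following minimal sketch (Python, standard library only) evaluates the network output $\sum_i a_i\,\sigma(w_i x + \theta_i)$ and the error $\varepsilon(x)$ defined above. The function names are illustrative and do not come from the source. Writing the sigmoid in a two-branch form to avoid overflow is an implementation choice.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def network_output(x, a, w, theta, sigma=sigmoid):
    """Output of a 1-input, N-hidden-unit, 1-output perceptron: sum_i a_i * sigma(w_i x + theta_i)."""
    return sum(a_i * sigma(w_i * x + t_i) for a_i, w_i, t_i in zip(a, w, theta))


def epsilon(f, x, a, w, theta, sigma=sigmoid):
    """Error epsilon(x): the teacher value f(x) minus the network output."""
    return f(x) - network_output(x, a, w, theta, sigma)
```

For example, `network_output(0.0, [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])` is 1.0, because each hidden unit outputs $\sigma(0) = 0.5$. Passing `sigma=math.atan` evaluates the arctan network used in step S6.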

[0004] So far, error backpropagation is the only known method for obtaining the best constants $a_i$, $w_i$ and $\theta_i$ for given teacher data $f(x)$. Error backpropagation first computes an error function $E$ by

[Equation 2]

It then uses $E$ to obtain the constants $a_i$, $w_i$ and $\theta_i$ by steepest descent: $\Delta a_i = -\eta\,\partial E/\partial a_i$, $\Delta w_i = -\eta\,\partial E/\partial w_i$, $\Delta\theta_i = -\eta\,\partial E/\partial\theta_i$.
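The steepest-descent updates above can be sketched for the one-input network as follows. The source does not reproduce [Equation 2], so this sketch assumes a discrete squared-error criterion $E = \frac{1}{2M}\sum_s \varepsilon(x_s)^2$ over $M$ sample points $x_s$. The learning rate $\eta$, the epoch count and the random initialization are also illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_sq_error(xs, ys, a, w, th):
    """Assumed criterion E = 1/(2M) * sum_s eps(x_s)^2."""
    total = 0.0
    for x, y in zip(xs, ys):
        out = sum(a_i * sigmoid(w_i * x + t_i) for a_i, w_i, t_i in zip(a, w, th))
        total += (y - out) ** 2
    return total / (2 * len(xs))


def train_backprop(xs, ys, n_hidden, eta=0.1, epochs=1000, seed=0):
    """Full-batch steepest descent on E, starting from random initial values."""
    rng = random.Random(seed)
    a = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    th = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    m = len(xs)
    for _ in range(epochs):
        ga, gw, gt = [0.0] * n_hidden, [0.0] * n_hidden, [0.0] * n_hidden
        for x, y in zip(xs, ys):
            h = [sigmoid(w[i] * x + th[i]) for i in range(n_hidden)]
            eps = y - sum(a[i] * h[i] for i in range(n_hidden))
            for i in range(n_hidden):
                s = a[i] * h[i] * (1.0 - h[i])  # derivative of a_i*sigma(u) w.r.t. u = w_i x + theta_i
                ga[i] -= eps * h[i] / m          # accumulates dE/da_i
                gw[i] -= eps * s * x / m         # accumulates dE/dw_i
                gt[i] -= eps * s / m             # accumulates dE/dtheta_i
        for i in range(n_hidden):                # delta q = -eta * dE/dq
            a[i] -= eta * ga[i]
            w[i] -= eta * gw[i]
            th[i] -= eta * gt[i]
    return a, w, th
```

Calling `train_backprop(xs, ys, 3, epochs=0)` returns the random starting point. Comparing `mean_sq_error` at that point with the value after training shows the error decreasing, but only gradually, one small step per epoch.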

[0005]

3. Problems to be Solved by the Invention Error backpropagation made it possible to train neural networks with three or more layers, which had previously been difficult. However, the method starts from random initial values and learns by steepest descent, so it has the following drawbacks:
- Learning takes an enormous amount of time.
- Learning can become trapped in a local minimum and stop progressing.
- It is unclear which features have been learned, so the learning results cannot be interpreted.
- The role of the hidden layer is unclear, so the number of hidden units must be found by trial and error. Too few units and learning does not progress; too many and the network overfits.

[0006] Several attempts have been made to improve error backpropagation and remedy these drawbacks. None of them is a fundamental improvement, however, because all were made without clarifying the mathematical properties of neural networks. Clarifying those properties has also been considered difficult because of the networks' nonlinearity. An object of the present invention is to clarify the properties of a neural network through frequency analysis and thereby to provide an efficient learning method.

[0007]

SUMMARY OF THE INVENTION The present invention concerns a network comprising an input layer, an intermediate (hidden) layer of N neuron units, and an output layer. The synaptic weights from the input layer to the hidden layer are $w_i$ ($i = 1, 2, \dots, N$), the threshold of each hidden neuron is $\theta_i$, and the synaptic weights from the hidden layer to the output layer are $a_i$. The invention is characterized in that the constants $w_i$, $\theta_i$ and $a_i$ are determined by six processing steps:
1. Obtain the Fourier transform $F(k)$ of the teacher data $f(x)$.
2. Choose an arbitrary sampling interval $T$ and generate the sequence $\{g_n\}$ by $g_n = nT\,F(nT)$.
3. From $\{g_n\}$, obtain a sequence $\{p_i\}$ satisfying $g_n + p_1 g_{n-1} + p_2 g_{n-2} + \dots + p_N g_{n-N} = 0$.
4. Using $\{p_i\}$, solve the complex algebraic equation of degree $N$, $x^N + p_1 x^{N-1} + p_2 x^{N-2} + \dots + p_N = 0$, for its roots $\{Z_i\}$.
5. Obtain the constants $w_i$ and $\theta_i$ from $w_i = -T/\log|Z_i|$ and $\theta_i = \arg(Z_i)/\log|Z_i|$.
6. Compute the hidden-layer outputs as $\tan^{-1}(w_i x + \theta_i)$. From the result, obtain the constants $a_i$ by least squares so that the error of the output data with respect to the teacher data $f(x)$ is minimized.

[0008]

Operation The present invention elucidates the structure of a neural network through frequency analysis. It then solves the learning problem by a direct method rather than an iterative one. The frequency analysis shows that learning in a neural network reduces to a problem of linear prediction on the frequency axis. Linear prediction has already been studied in depth as part of the speech analysis methods long used in speech synthesis and related areas. The neural-network problem can be solved in almost the same way.

[0009]

DESCRIPTION OF THE PREFERRED EMBODIMENTS FIG. 1 is a flowchart showing the processing procedure of the present invention. As shown in FIG. 2, the invention considers a three-layer perceptron with three parts:
- an input layer 1;
- an intermediate (hidden) layer 2 of N neurons;
- an output layer 3.

The synaptic weights from the input layer 1 to the hidden layer 2 are $w_i$ ($i = 1, 2, \dots, N$). The threshold of each neuron in the hidden layer 2 is $\theta_i$. The synaptic weights from the hidden layer 2 to the output layer 3 are $a_i$. The aim is to find the constants $a_i$, $w_i$ and $\theta_i$ that minimize the error $\varepsilon(x)$ of the output data with respect to the teacher data $f(x)$, the desired output. That is, in

[Equation 3] $f(x) = \sum_{i=1}^{N} a_i\,\sigma(w_i x + \theta_i) + \varepsilon(x)$

the constants $a_i$, $w_i$ and $\theta_i$ are chosen so that $\varepsilon(x)$ is minimized under a predetermined criterion.

[0010] Referring to FIG. 1, the Fourier transform $F(k)$ of the teacher data $f(x)$ is obtained first, where $F(k) \equiv \int f(x)\exp(ikx)\,dx$ (step S1).
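For $\tan^{-1}$-type teacher data, $f(x)$ does not decay at infinity. $F(k)$ therefore exists only as a tempered distribution (see [0017]), and a direct numerical integral of $f(x)e^{ikx}$ does not converge. One workable sketch assumes that the derivative $f'(x)$ is available and decays, and uses the identity $\int f'(x)e^{ikx}dx = -ik\,F(k)$, i.e. $F(k) = (i/k)\int f'(x)e^{ikx}dx$ for $k \neq 0$. The truncation limits and the step count are illustrative assumptions.

```python
import cmath


def fourier_transform(df, k, x_min=-100.0, x_max=100.0, n_steps=100_000):
    """Approximate F(k) = ∫ f(x) e^{ikx} dx (tempered-distribution sense) for k != 0.

    Uses F(k) = (i / k) * ∫ f'(x) e^{ikx} dx, where df is the derivative f'(x).
    The integral is truncated to [x_min, x_max] and evaluated by the trapezoidal rule,
    so df must decay fast enough for the truncation error to be negligible.
    """
    if k == 0:
        raise ValueError("F(k) is singular at k = 0 for arctan-type teacher data")
    h = (x_max - x_min) / n_steps
    total = 0j
    for m in range(n_steps + 1):
        x = x_min + m * h
        weight = 0.5 if m in (0, n_steps) else 1.0  # trapezoidal end-point weights
        total += weight * df(x) * cmath.exp(1j * k * x)
    return (1j / k) * total * h
```

For $f(x) = \tan^{-1}(x + 0.5)$, i.e. `df = lambda x: 1 / (1 + (x + 0.5)**2)`, the result at $k = 1$ agrees with the closed form $i\pi e^{-(1 + 0.5i)}$ to within $10^{-3}$.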

[0011] Next, a suitable sampling interval $T$ is chosen and the sequence $\{g_n\}$ is generated by $g_n = nT\,F(nT)$, where $n$ is an integer (step S2).
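Step S2 only samples $kF(k)$ on the grid $k = nT$. The sketch below makes two assumptions. First, it uses $n = 1, 2, \dots$ and skips $n = 0$, where $F$ is singular. Second, for testing it uses a closed-form transform of an arctan network, $F(k) = \sum_i i\pi a_i\,e^{-k(1+i\theta_i)/w_i}/k$ for $k > 0$ and $w_i > 0$, derived under the convention of step S1. `arctan_network_ft` is a hypothetical helper and does not come from the source.

```python
import cmath


def make_sequence(F, T, n_max):
    """Step S2: g_n = nT * F(nT) for n = 1, ..., n_max.

    n = 0 is skipped because F(k) is singular at k = 0 for arctan-type data.
    """
    return [n * T * F(n * T) for n in range(1, n_max + 1)]


def arctan_network_ft(a, w, theta):
    """Hypothetical helper: closed-form F(k), k > 0, of f(x) = sum_i a_i atan(w_i x + theta_i), w_i > 0."""
    def F(k):
        return sum(1j * cmath.pi * a_i * cmath.exp(-k * (1 + 1j * t_i) / w_i) / k
                   for a_i, w_i, t_i in zip(a, w, theta))
    return F
```

With a single hidden unit, the ratio $g_{n+1}/g_n$ is the constant $e^{-T(1+i\theta)/w}$, so $\{g_n\}$ is a geometric sequence. Step S3 exploits this structure.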

[0012] Next, a sequence $\{p_i\}$ satisfying $g_n + p_1 g_{n-1} + p_2 g_{n-2} + \dots + p_N g_{n-N} = 0$ is obtained from the sequence $\{g_n\}$ (step S3). Choosing the order $N$ corresponds to choosing the number of hidden-layer neurons. Finding $\{p_i\}$ is a linear prediction problem. It can be solved by the covariance method or the autocorrelation method, both described later.
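For noise-free data, the recurrence can be solved exactly from $2N$ consecutive samples by taking the $N$ equations $n = N, \dots, 2N-1$. This is a Prony-style choice made for this sketch. The covariance and autocorrelation methods described in [0033] instead fit over a whole range of $t$ in the least-squares sense. A small Gaussian-elimination solver is written out so that the block needs only the standard library.

```python
def _solve(A, b):
    """Gaussian elimination with partial pivoting for a small complex system A x = b."""
    n = len(b)
    m = [[complex(v) for v in row] + [complex(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def linear_prediction_coeffs(g, N):
    """Step S3: find p_1..p_N with g_n + p_1 g_{n-1} + ... + p_N g_{n-N} = 0.

    Uses only the N equations n = N, ..., 2N-1 (exact for noise-free data; needs len(g) >= 2N).
    g[0] is simply the first available sample of the sequence.
    """
    if len(g) < 2 * N:
        raise ValueError("need at least 2N samples")
    A = [[g[n - j] for j in range(1, N + 1)] for n in range(N, 2 * N)]
    b = [-g[n] for n in range(N, 2 * N)]
    return _solve(A, b)
```

For $g_n = \sum_i c_i Z_i^n$ with two distinct $Z_i$, the result is $p_1 = -(Z_1 + Z_2)$ and $p_2 = Z_1 Z_2$. These are the coefficients of $(x - Z_1)(x - Z_2)$.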

[0013] Next, the complex algebraic equation of degree $N$, $x^N + p_1 x^{N-1} + p_2 x^{N-2} + \dots + p_N = 0$, is solved with the obtained $\{p_i\}$ to find its roots $\{Z_i\}$ (step S4). The well-known DKA (Durand–Kerner–Aberth) method is fast and accurate, and can be used to solve this equation numerically.
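The following sketch shows the simultaneous root iteration behind the DKA method. It applies the Durand–Kerner (Weierstrass) update $z_i \leftarrow z_i - P(z_i)/\prod_{j \neq i}(z_i - z_j)$, starting from Aberth-style points evenly spaced on a circle around the root centroid $-p_1/N$. The circle radius $1 + \max_i|p_i|$ and the stopping tolerance are assumptions of this sketch, not values from the source.

```python
import cmath
import math


def durand_kerner(p, tol=1e-12, max_iter=500):
    """All roots of x^N + p_1 x^{N-1} + ... + p_N = 0 by Durand-Kerner (Weierstrass) iteration.

    Starting points: Aberth-style, evenly spaced on a circle around the centroid -p_1/N.
    """
    N = len(p)
    coeffs = [1.0] + list(p)

    def poly(z):
        acc = 0j
        for c in coeffs:            # Horner evaluation of the monic polynomial
            acc = acc * z + c
        return acc

    center = -p[0] / N
    radius = 1.0 + max(abs(c) for c in p)
    z = [center + radius * cmath.exp(1j * (2 * math.pi * k / N + math.pi / (2 * N)))
         for k in range(N)]
    for _ in range(max_iter):
        new_z, max_step = [], 0.0
        for i in range(N):
            denom = 1 + 0j
            for j in range(N):
                if j != i:
                    denom *= z[i] - z[j]
            step = poly(z[i]) / denom
            new_z.append(z[i] - step)  # all roots updated from the previous iterate
            max_step = max(max_step, abs(step))
        z = new_z
        if max_step < tol:
            break
    return z
```

All $N$ roots are refined at the same time. For simple roots, which the theorem below requires, convergence is quadratic once the iterates are close.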

[0014] Next, the constants $w_i$ and $\theta_i$ are obtained from the following equations (step S5): $w_i = -T/\log|Z_i|$ and $\theta_i = \arg(Z_i)/\log|Z_i|$, where $\arg(x)$ denotes the argument of the complex number $x$.
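Step S5 is a direct formula. The sketch assumes $0 < |Z_i| < 1$, so that $w_i > 0$. It also assumes that the principal value of $\arg$ returned by `cmath.phase` is the intended branch.

```python
import cmath
import math


def weights_from_roots(Z, T):
    """Step S5: (w_i, theta_i) = (-T / log|Z_i|, arg(Z_i) / log|Z_i|) for each root Z_i.

    Assumes 0 < |Z_i| < 1 (so w_i > 0) and uses the principal argument from cmath.phase.
    """
    params = []
    for z in Z:
        if not 0 < abs(z) < 1:
            raise ValueError("expected 0 < |Z_i| < 1")
        log_mod = math.log(abs(z))
        params.append((-T / log_mod, cmath.phase(z) / log_mod))
    return params
```

As a round-trip check, roots built as $Z = \exp(-T(1 + j\theta)/w)$ return the original $(w, \theta)$ whenever $|T\theta/w| < \pi$.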

[0015] Next, the hidden-layer output $\sigma(w_i x + \theta_i)$ is replaced by $\tan^{-1}(w_i x + \theta_i)$ and evaluated with the constants $w_i$ and $\theta_i$ obtained above. The constants $a_i$ are then obtained from it by least squares (step S6), and the process ends.
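Once $w_i$ and $\theta_i$ are fixed, the model is linear in $a_i$, so step S6 is an ordinary linear least-squares problem. The sketch assumes the teacher data are given as samples $(x_s, f(x_s))$. It solves the normal equations $(H^\top H)a = H^\top y$ with $H_{si} = \tan^{-1}(w_i x_s + \theta_i)$. For ill-conditioned problems, a QR- or SVD-based solver would be the safer choice.

```python
import math


def fit_output_weights(xs, ys, w, theta):
    """Step S6: least-squares a_i for fixed (w_i, theta_i), with hidden outputs atan(w_i x + theta_i).

    Solves the normal equations (H^T H) a = H^T y by Gaussian elimination with partial pivoting.
    """
    n = len(w)
    H = [[math.atan(w_i * x + t_i) for w_i, t_i in zip(w, theta)] for x in xs]
    A = [[sum(row[i] * row[j] for row in H) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(H, ys)) for i in range(n)]
    aug = [A[i] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (aug[r][n] - sum(aug[r][k] * a[k] for k in range(r + 1, n))) / aug[r][r]
    return a
```

If the teacher data are generated exactly by an arctan network with known $w_i$ and $\theta_i$, this recovers its output weights $a_i$ up to rounding error.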

[0016] Step S6 relies on the observation that setting $\sigma(x) = \tan^{-1}(x)$ makes the learning problem above analytically solvable. The constants $a_i$, $w_i$ and $\theta_i$ can then be obtained directly, without the iteration that conventional methods require. $\tan^{-1}(x)$ has a shape very similar to the sigmoid function normally used as the output function of a neural network, so using it does not change the capability of the network.
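To put a number on "very similar shape", one can rescale $\tan^{-1}$ to the sigmoid's range $(0, 1)$ and match its value $1/2$ and slope $1/4$ at $x = 0$. This gives $\tfrac12 + \tfrac1\pi\tan^{-1}(\pi x/4)$. This particular rescaling is an assumption made for illustration. The sketch measures the largest gap between the two curves on $[-10, 10]$.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def scaled_atan(x):
    """1/2 + atan(pi x / 4) / pi: range (0, 1), value 1/2 and slope 1/4 at x = 0, like the sigmoid."""
    return 0.5 + math.atan(math.pi * x / 4) / math.pi


grid = [k / 100 for k in range(-1000, 1001)]
max_gap = max(abs(sigmoid(x) - scaled_atan(x)) for x in grid)
```

The gap stays below 0.1; its maximum is roughly 0.08, near $|x| \approx 3.5$. The main difference is in the tails: the arctan approaches its limits like $1/x$, whereas the sigmoid approaches them exponentially.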

[0017] The following theorem explains why this series of processes (steps S1 to S6) can train the neural network. In what follows, "Fourier transform" means the Fourier transform in the sense of tempered distributions. $\tan^{-1}(x)$ and the sigmoid function have no Fourier transform as ordinary functions, but they do have one as tempered distributions.

[0018] [Theorem] The following three propositions are equivalent.

Proposition 1: Using constants $a_i$, $w_i$, $\theta_i$ with $(w_i, \theta_i) \neq (w_j, \theta_j)$ for $i \neq j$, the function $f(x)$ can be expressed as

[Equation 4] $f(x) = \sum_{i=1}^{N} a_i \tan^{-1}(w_i x + \theta_i)$

Proposition 2: Let the Fourier transform of $f(x)$ be $F(k) \equiv \int f(x)\exp(ikx)\,dx$. Then, using constants $c_i$ and $d_i$ ($d_i$ complex, $d_i \neq d_j$ for $i \neq j$), $kF(k)$ can be expressed as

[Equation 5] $kF(k) = \sum_{i=1}^{N} c_i \exp(d_i k)$ ... (4)

Proposition 3: Let the Fourier transform of $f(x)$ be $F(k) \equiv \int f(x)\exp(ikx)\,dx$, and let $g_n \equiv nT\,F(nT)$, where $n$ is an integer and $T$ an arbitrary constant. Then there exist constants $(p_1, p_2, \dots, p_N)$ such that

$g_n + p_1 g_{n-1} + p_2 g_{n-2} + \dots + p_N g_{n-N} = 0$ ... (5)

holds, where the complex algebraic equation of degree $N$, $x^N + p_1 x^{N-1} + p_2 x^{N-2} + \dots + p_N = 0$, has no multiple roots. [End of theorem]

[0019] [Proof] Propositions 1 ⇔ 2: This follows from the fact that the Fourier transform of $\tan^{-1}(wx + \theta)$ is $j(\pi/2)^{1/2}\exp[-k(1 + j\theta)/w]/k$.
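The transform quoted in the proof can be checked under the convention $F(k) = \int f(x)e^{jkx}dx$ of step S1. The check assumes $w > 0$ and $k > 0$ (the only values sampled in step S2) and discards the distributional terms at $k = 0$. This derivation gives the prefactor $j\pi$ rather than the $j(\pi/2)^{1/2}$ quoted above, which presumably reflects a different normalization. The exponent, which is all that steps S3 to S5 use, is the same. It yields $c_i = j\pi a_i$ and $d_i = -(1 + j\theta_i)/w_i$.

```latex
\int_{-\infty}^{\infty}\frac{e^{jkx}}{1+x^{2}}\,dx = \pi e^{-|k|},
\qquad
\mathcal{F}[f'](k) = -jk\,\mathcal{F}[f](k)
\;\Longrightarrow\;
\mathcal{F}[\tan^{-1}](k) = \frac{\pi e^{-|k|}}{-jk} = \frac{j\pi e^{-|k|}}{k}

\mathcal{F}[\tan^{-1}(wx+\theta)](k)
  = \frac{1}{w}\,e^{-jk\theta/w}\,\mathcal{F}[\tan^{-1}]\!\left(\frac{k}{w}\right)
  = \frac{j\pi}{k}\exp\!\left[-\frac{k(1+j\theta)}{w}\right]

kF(k) = \sum_{i=1}^{N} j\pi a_i \exp(d_i k),
\qquad d_i = -\frac{1+j\theta_i}{w_i}
```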

[0020] Propositions 2 ⇒ 3: Setting $Z_i = \exp(d_i T)$, equation (4) gives

[Equation 6] $g_n = \sum_{i=1}^{N} c_i Z_i^{\,n}$ ... (6)

Next, consider the polynomial $S(x)$ and denote its expansion coefficients by $p_i$:

$S(x) \equiv (x - Z_1)(x - Z_2)\cdots(x - Z_N) = x^N + p_1 x^{N-1} + p_2 x^{N-2} + \dots + p_N$ ... (7)

Now compute the quantity

[Equation 7] $g_n + p_1 g_{n-1} + \dots + p_N g_{n-N}$

Using equation (6), it equals

[Equation 8] $\sum_{i=1}^{N} c_i Z_i^{\,n-N}\left(Z_i^N + p_1 Z_i^{N-1} + \dots + p_N\right) = \sum_{i=1}^{N} c_i Z_i^{\,n-N} S(Z_i) = 0$

This proves equation (5). The proof also shows how the coefficients $p_1, p_2, \dots, p_N$ are related to the $d_i$.

[0021] Propositions 3 ⇒ 2: A sequence satisfying equation (5) is completely determined once its first $N$ terms are given, so the solution is unique. It is therefore enough to show that a sequence of the form (6) can be obtained (that is, that the $c_i$ can be determined) for any choice of initial values. For this, it is enough to show that the matrix

[Equation 9] $V = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ Z_1 & Z_2 & \cdots & Z_N \\ \vdots & \vdots & & \vdots \\ Z_1^{N-1} & Z_2^{N-1} & \cdots & Z_N^{N-1} \end{pmatrix}$

has an inverse. Its determinant is the Vandermonde determinant $\prod_{i>j}(Z_i - Z_j)$. Because $x^N + p_1 x^{N-1} + p_2 x^{N-2} + \dots + p_N = 0$ has no multiple roots, the determinant is nonzero. [End of proof]

[0022] [Corollary] The constants in the theorem above are related by $d_i = -(1 + j\theta_i)/w_i$. Moreover, the values $\{\exp(d_i T)\}$ are the roots of $x^N + p_1 x^{N-1} + p_2 x^{N-2} + \dots + p_N = 0$. [End of corollary]
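The formulas of step S5 follow from this corollary by taking the complex logarithm of $Z_i = \exp(d_i T)$. The inversion below assumes $w_i > 0$, so that $|Z_i| < 1$. It also assumes $|T\theta_i/w_i| < \pi$, so that the principal argument of $Z_i$ is not wrapped. Otherwise $\theta_i$ is recovered only modulo $2\pi w_i/T$.

```latex
\log Z_i = d_i T = -\frac{T}{w_i} - j\,\frac{T\theta_i}{w_i}
\;\Longrightarrow\;
\log|Z_i| = -\frac{T}{w_i},\qquad \arg Z_i = -\frac{T\theta_i}{w_i}

w_i = -\frac{T}{\log|Z_i|},\qquad
\theta_i = -\frac{w_i \arg Z_i}{T} = \frac{\arg Z_i}{\log|Z_i|}
```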

[0023] The theorem above clarifies the following points about neural-network learning, and the learning algorithm described earlier is built on them:
- Learning in a neural network reduces to a linear prediction problem on the frequency axis.
- The number of hidden units equals the order of the linear prediction.
- The linear prediction coefficients $\{p_i\}$ are determined solely by the input-to-hidden weights $w_i$ and thresholds $\theta_i$. The hidden-to-output weights $a_i$ do not affect them. In other words, linear prediction on the frequency axis determines the weights from the input layer to the hidden layer.

[0024] The following engineering insights follow:
- Whether a neural network suits a given engineering problem can be judged by whether linear prediction on the frequency axis suits it.
- The number of hidden units is determined by whether the linear prediction residual becomes small enough.
- The weights from the hidden layer to the output layer are determined by least squares after the linear prediction has been performed. The essence of a neural network therefore lies in determining the weights from the input layer to the output layer.

[0025] This fact makes it possible to determine the number of hidden units automatically. Learning is first performed with the number of hidden units set to some value. If the linear prediction residual does not become small enough, there were too few hidden units, so their number is increased and the linear prediction is redone. Repeating this until the residual falls below a preset value gives the minimum number of hidden units required.
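The procedure above amounts to an order-selection loop. The minimal sketch below makes two assumptions that the source does not specify. First, the residual is measured as the relative least-squares residual of recurrence (5) over all available samples. Second, the threshold `tol` is chosen by the user; the source only speaks of "a preset value". For noise-free data, orders above the true one make the normal equations singular, but the loop stops before reaching them.

```python
import math


def _solve(A, b):
    """Gaussian elimination with partial pivoting for a small complex system A x = b."""
    n = len(b)
    m = [[complex(v) for v in row] + [complex(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def lp_residual(g, N):
    """Relative least-squares residual of g_n + p_1 g_{n-1} + ... + p_N g_{n-N} = 0, n = N..len(g)-1."""
    rows = [[g[n - j] for j in range(1, N + 1)] for n in range(N, len(g))]
    rhs = [-g[n] for n in range(N, len(g))]
    A = [[sum(r[i].conjugate() * r[j] for r in rows) for j in range(N)] for i in range(N)]
    b = [sum(r[i].conjugate() * y for r, y in zip(rows, rhs)) for i in range(N)]
    p = _solve(A, b)  # normal equations A^H A p = A^H rhs
    err = sum(abs(sum(r[j] * p[j] for j in range(N)) - y) ** 2 for r, y in zip(rows, rhs))
    return math.sqrt(err / sum(abs(y) ** 2 for y in rhs))


def choose_hidden_units(g, tol, n_max):
    """Increase N until the linear-prediction residual drops below tol; None if n_max is not enough."""
    for N in range(1, n_max + 1):
        if lp_residual(g, N) < tol:
            return N
    return None
```

For a sequence built from two exponentials $c_1 Z_1^n + c_2 Z_2^n$, the residual is large at $N = 1$ and essentially zero at $N = 2$, so the loop returns 2.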

[0026] So far, the description has covered the case of one input unit, one output unit and N hidden units. The case of multiple input units is described next. It, too, reduces to the one-dimensional case.

[0027] [Lemma] The Fourier transform $F(k_1, k_2, \dots, k_N)$ of

[Equation 10] $f(x_1, \dots, x_N) = \sum_i a_i\,\sigma(w_{i1}x_1 + w_{i2}x_2 + \dots + w_{iN}x_N + \theta_i)$ ... (9)

is given by

[Equation 11]

where the function $\rho$ is the Fourier transform of the function $\sigma$. [End of lemma]

[0028] [Corollary] The Fourier transform of a function such as equation (9) is zero everywhere except on the following combination of straight lines through the origin:

[Equation 12] $(k_1, k_2, \dots, k_N) = t\,(w_{i1}, w_{i2}, \dots, w_{iN}),\quad t \in \mathbb{R}$ ... (11)

[End of corollary]

Once the straight lines (11) have been found from the Fourier transform of the given function, the ratios $w_{i1} : w_{i2} : w_{i3} : \dots : w_{iN}$ ... (12) are known. These ratios do not depend on the shape of $\sigma$. Finding the value of $w_{i1}$ itself requires the shape of $\sigma$, but the ratios in (12) are invariants that do not depend on $\sigma$.

[0029] To find straight lines such as (11), regard $|F(k_1, k_2, \dots, k_N)|$ as an N-dimensional image and find all the straight lines passing through the origin. On each line one solves

[Equation 13]

which is a one-dimensional problem. The constants $w_{i1}$ and $\theta_i$ can again be obtained by using the case $\sigma(x) = \tan^{-1}(x)$.

[0030] The exclusive-OR problem cannot be solved by a two-layer perceptron but can be solved with three or more layers. A simulation of this problem gave the results shown in FIG. 3:
- (a) the teacher data;
- (b) the result after 10,000 iterations of error backpropagation;
- (c) the result of the learning method of the present invention.

Five hidden units were used.

[0031] These results show that the learning method of the present invention (c) captures the teacher data more faithfully than the conventional method (b). Learning took about 10 hours with the conventional method but only a few seconds with the method of the present invention, which confirms its effectiveness.

[0032] Next, the solution of the linear prediction problem in step S3 is described. The solution is well known, but it is outlined briefly to show that the processing procedure above can be implemented.

[0033] First, consider linear prediction of order $N$ for a complex sequence $\{x_t\}_{t=0,1,\dots,M-1}$:

$x_t + \alpha_1 x_{t-1} + \dots + \alpha_N x_{t-N} = \varepsilon_t$ ... (14)

where $\alpha_i$ and $\varepsilon_t$ are complex. The error $E$ is defined as follows. Setting

[Equation 14] $E = \sum_{t=t_0}^{t_1} |\varepsilon_t|^2,\qquad c_{ij} = \sum_{t=t_0}^{t_1} x_{t-i}\,x_{t-j}^{*}$

where $c_{ij}$ is self-adjoint ($c_{ij}^{*} = c_{ji}$), one obtains

[Equation 15] $E = \sum_{i=0}^{N}\sum_{j=0}^{N} \alpha_i\,\alpha_j^{*}\,c_{ij}\quad(\alpha_0 = 1)$

The minimizing coefficients $\alpha_i = p_i + \sqrt{-1}\,q_i$ are characterized by $\partial E/\partial p_i = \partial E/\partial q_i = 0$. Writing $c_{ij} = a_{ij} + \sqrt{-1}\,b_{ij}$, they are found by solving the simultaneous equations

[Equation 16]

Solving these equations with the optimization range $(t_0, t_1)$ chosen as $(N, M-1)$ is called the covariance method. Choosing $(-\infty, +\infty)$ and setting the values outside $(0, M-1)$ to zero is called the autocorrelation method.
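The following compact sketch implements both windowing choices. It takes $c_{ij} = \sum_t x_{t-i}\,x_{t-j}^{*}$ summed over the chosen range of $t$. Instead of splitting real and imaginary parts as in [Equation 16], it solves the equivalent complex normal equations $\sum_{i=1}^{N}\alpha_i c_{ij} = -c_{0j}$ ($j = 1, \dots, N$). These come from setting the derivative of $E$ with respect to $\alpha_j^{*}$ to zero. This complex formulation is a choice made for brevity, not the source's own formulation.

```python
def _solve(A, b):
    """Gaussian elimination with partial pivoting for a small complex system A x = b."""
    n = len(b)
    m = [[complex(v) for v in row] + [complex(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def lpc(x, N, method="covariance"):
    """Order-N linear prediction x_t + alpha_1 x_{t-1} + ... + alpha_N x_{t-N} = eps_t.

    covariance:      sum over t = N .. M-1, where every x_{t-i} is an actual sample;
    autocorrelation: sum over all t, with x_t taken as 0 outside 0 .. M-1.
    """
    M = len(x)
    if method == "covariance":
        ts = range(N, M)
    elif method == "autocorrelation":
        ts = range(0, M + N)  # every t at which some x_{t-i}, 0 <= i <= N, is non-zero
    else:
        raise ValueError(f"unknown method: {method}")

    def sample(t):
        return x[t] if 0 <= t < M else 0j

    # c_ij = sum_t x_{t-i} * conj(x_{t-j}); Hermitian, c_ij^* = c_ji
    c = [[sum(sample(t - i) * sample(t - j).conjugate() for t in ts)
          for j in range(N + 1)] for i in range(N + 1)]
    # dE/d(alpha_j^*) = 0  =>  sum_{i=1}^{N} alpha_i c_ij = -c_0j,  j = 1..N
    A = [[c[i][j] for i in range(1, N + 1)] for j in range(1, N + 1)]
    b = [-c[0][j] for j in range(1, N + 1)]
    return _solve(A, b)
```

On noise-free exponential data, the covariance method recovers the exact coefficients. The autocorrelation method pads the sequence with zeros, so it is biased on short records, but it always yields a Hermitian Toeplitz system.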

[0034]

Effects of the Invention According to the present invention, $y = \tan^{-1}(x)$ is chosen as the nonlinear function of the neural network. The learning method is built on the fact that learning then reduces to a linear prediction problem on the frequency axis. As a result, learning is nearly 1,000 times faster than with error backpropagation, and the essence of neural networks can be understood mathematically. Significant progress can therefore be expected in fields such as speech recognition, speech synthesis, character recognition, object recognition, stock price prediction and robot control.

[Brief Description of the Drawings]

FIG. 1 is a flowchart showing the processing procedure of the present invention.

FIG. 2 is a block diagram of a three-layer perceptron.

FIG. 3 shows simulation results for the exclusive-OR problem.

[Explanation of symbols]

1: input layer; 2: intermediate (hidden) layer; 3: output layer

Claims

[Claims]

1. A neural network comprising an input layer, an intermediate layer having N neuron units, and an output layer. The synaptic weights from the input layer to the intermediate layer are $w_i$ ($i = 1, 2, \dots, N$), the threshold of each neuron in the intermediate layer is $\theta_i$, and the synaptic weights from the intermediate layer to the output layer are $a_i$. The neural network is characterized in that the constants $w_i$, $\theta_i$ and $a_i$ are determined by the following processing steps: a first processing step of obtaining the Fourier transform $F(k)$ of the teacher data $f(x)$; a second processing step of choosing an arbitrary sampling interval $T$ and generating the sequence $\{g_n\}$ by $g_n = nT\,F(nT)$; a third processing step of obtaining from the sequence $\{g_n\}$ a sequence $\{p_i\}$ satisfying $g_n + p_1 g_{n-1} + p_2 g_{n-2} + \dots + p_N g_{n-N} = 0$; a fourth processing step of solving, with the sequence $\{p_i\}$, the complex algebraic equation of degree $N$, $x^N + p_1 x^{N-1} + p_2 x^{N-2} + \dots + p_N = 0$, to obtain its roots $\{Z_i\}$; a fifth processing step of obtaining the constants $w_i$ and $\theta_i$ from $w_i = -T/\log|Z_i|$ and $\theta_i = \arg(Z_i)/\log|Z_i|$; and a sixth processing step of computing the output of the intermediate layer as $\tan^{-1}(w_i x + \theta_i)$ and, from the result, obtaining the constants $a_i$ by least squares so that the error of the output data with respect to the teacher data $f(x)$ is minimized.

2. A learning method for a neural network comprising an input layer, an intermediate layer having N neuron units, and an output layer. The synaptic weights from the input layer to the intermediate layer are $w_i$ ($i = 1, 2, \dots, N$), the threshold of each neuron in the intermediate layer is $\theta_i$, and the synaptic weights from the intermediate layer to the output layer are $a_i$. The method applies to the learning of a multilayer perceptron in which the constants $w_i$, $\theta_i$ and $a_i$ that minimize the error of the output data with respect to the teacher data $f(x)$ are obtained. It is characterized by: a first processing step of obtaining the Fourier transform $F(k)$ of the teacher data $f(x)$; a second processing step of choosing an arbitrary sampling interval $T$ and generating the sequence $\{g_n\}$ by $g_n = nT\,F(nT)$; a third processing step of obtaining from the sequence $\{g_n\}$ a sequence $\{p_i\}$ satisfying $g_n + p_1 g_{n-1} + p_2 g_{n-2} + \dots + p_N g_{n-N} = 0$; a fourth processing step of solving, with the sequence $\{p_i\}$, the complex algebraic equation of degree $N$, $x^N + p_1 x^{N-1} + p_2 x^{N-2} + \dots + p_N = 0$, to obtain its roots $\{Z_i\}$; a fifth processing step of obtaining the constants $w_i$ and $\theta_i$ from $w_i = -T/\log|Z_i|$ and $\theta_i = \arg(Z_i)/\log|Z_i|$; and a sixth processing step of computing the output of the intermediate layer as $\tan^{-1}(w_i x + \theta_i)$ and, from the result, obtaining the constants $a_i$ by least squares.

3. The learning method for a neural network according to claim 2, characterized in that, after learning has been performed, learning ends if the error is smaller than a predetermined value. Otherwise, the number of units in the intermediate layer is increased and learning is performed again. The number of intermediate-layer units is determined by repeating learning until the error becomes smaller than the predetermined value.

4. A learning method for a neural network, characterized in that, in the learning of a multilayer perceptron having three or more layers, when the input data have two or more dimensions, straight lines passing through the origin are fitted to the Fourier transform of a given function, and learning is performed by exploiting the fact that the problem then reduces to linear prediction.

5. A function approximation method using a neural network, characterized in that, when a given function is approximated with a neural network, the Fourier transform of the function is obtained, the function is approximated in frequency space, and the approximating function is then obtained by the inverse Fourier transform.