JPH07121498A

JPH07121498A - Method for constructing neural network

Info

Publication number: JPH07121498A
Application number: JP5287649A
Authority: JP
Inventors: Shinichi Tamura; 震一田村
Original assignee: NipponDenso Co Ltd
Current assignee: Denso Corp
Priority date: 1993-10-22
Filing date: 1993-10-22
Publication date: 1995-05-12
Anticipated expiration: 2018-02-04
Also published as: JP3374476B2

Abstract

PURPOSE: To enable construction of an efficient, optimal neural network by optimally determining the number of neurons in the hidden layers of a feedforward neural network. CONSTITUTION: The feedforward neural network has an input layer, an output layer, and at least one hidden layer, and its link weights are set by learning based on learning data. For the n (n >= 2) neurons contained in a hidden layer, the rank of the m x n matrix O of the linear relation Ow = t is obtained, where the vector w is formed by the link weights w_j of the j-th (1 <= j <= n) neuron and the vector t is formed by the m learning values t_i (i = 1, 2, ..., m) of the learning data. The value of that rank is adopted as the optimal number of neurons in the hidden layer. An efficient, optimal neural network can thus be constructed.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method for constructing a neural network (NN), and more particularly to a method for optimally determining the number of neurons contained in a hidden layer (also called an intermediate layer) of a feedforward neural network.

[0002]

[Prior Art] Conventionally, when a feedforward neural network (also called a hierarchical NN; hereinafter FFNN) is applied to a specific problem, the number of layers, the number of neurons in each hidden layer, and so on are chosen by the researcher's experience and intuition. A layered FFNN is generally trained by BP learning (backpropagation learning). This is a learning algorithm that, once an output value has been obtained for an input value of some system or particular problem, modifies the link weights (coupling coefficients) connecting the layers, starting from the output-layer side, so that the system outputs the desired value for that input. The problems neural networks address are usually nonlinear relationships, and building a system whose output corresponds perfectly to its input would require infinitely many neurons. The usual practice is therefore to take an FFNN whose layer structure has been fixed provisionally, train it by BP learning, and, if the learning error falls below a predetermined value (that is, if learning succeeds), put that FFNN to use. This practice stems from the fact that a theoretical analysis of neural networks in general, not only of FFNNs, has not yet been achieved. Because the constructed network is not necessarily the optimal configuration, another approach is to prepare several models and adopt the network that performs best after learning.

[0003]

[Problems to Be Solved by the Invention] In either case, even when learning has finished and an FFNN suitable for the application is available, there is no guarantee that it has the minimum hidden-layer configuration the application needs, and in general it carries surplus hidden neurons. The network must then perform more computation than necessary and needs extra storage, which lowers the limit on the problems it can handle and increases computation time. When the network is implemented in hardware, the device also becomes larger than necessary. A means of constructing an optimal neural network has therefore been sought.

[0004] The inventors noted that the relations obtained after training an FFNN, which characterize the network, can be written as a matrix. By applying matrix theory they found a method of optimization, which is proposed here as a method for the optimal construction of an FFNN.

[0005]

[Means for Solving the Problems] To solve the above problem, the present invention takes a feedforward neural network that has an input layer, an output layer, and at least one hidden layer, and whose link weights have been set by learning based on learning data. For the n (n ≥ 2) neurons contained in the hidden layer, the linear relation of Equation 1, $Ow = t$, is formed from the vector $w$ of the link weights $w_j$ of the j-th (1 ≤ j ≤ n) neuron and the vector $t$ of the m learning values $t_i$ (i = 1, 2, ..., m) of the learning data. The rank of the matrix $O$ (an m × n matrix) in that relation is obtained, and the value of the rank is taken as the optimal number of neurons in the hidden layer.

[0006] In a related aspect of the invention, the matrix $O$ obtained from Equation 1 is expressed as Equation 2, $O = U D V^t$, the product of two orthogonal matrices $U$ (m × m) and $V$ (n × n) and a diagonal matrix $D$ (m × n), where the superscript $t$ denotes the transpose. With the diagonal components of $D$ written as Equation 3, $D_{iag} = (\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_r)$, where $r = \min(m, n)$ and $\sigma_i \ge 0$ (1 ≤ i ≤ r), let k be the number of $\sigma_i$ (2 ≤ i ≤ r) that are smaller than a given positive error value e. The rank is then taken to be r - k, and r - k is used as the optimal number of neurons in the hidden layer.
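The thresholded rank described in this paragraph can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: it assumes NumPy, the function name `effective_rank` and the small example matrix are made up for the sketch, and it counts every singular value below e (the text counts from i = 2, which gives the same result whenever the largest singular value is not below e).

```python
import numpy as np

def effective_rank(O, e):
    """Return r - k: the number of singular values of O that are not below e."""
    # compute_uv=False returns only the singular values, in descending order.
    sigma = np.linalg.svd(np.asarray(O, dtype=float), compute_uv=False)
    k = int(np.sum(sigma < e))   # singular values treated as zero
    return sigma.size - k        # proposed number of hidden neurons

# Hypothetical 4 x 3 matrix whose third column is the sum of the first two,
# so only two columns are linearly independent.
O_example = np.array([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 1.0],
                      [1.0, 1.0, 2.0],
                      [2.0, 1.0, 3.0]])
print(effective_rank(O_example, e=1e-8))  # 2
```

Working from the singular values rather than from exact row reduction is what lets a single tolerance e absorb both floating-point noise and learning error.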

[0007] In another related aspect, the network has p (p ≥ 2) hidden layers. A rank value is obtained for the p-th hidden layer, which lies immediately before the output layer, and is taken as the number of neurons in the p-th hidden layer. Then, for the (p-1)-th hidden layer, a rank value is obtained from the matrix $O$ that satisfies the linear relation of Equation 4, $O w^{(j)} = t^{(j)}$, for each j-th neuron of the p-th hidden layer, and that rank value is taken as the number of neurons in the (p-1)-th hidden layer. Proceeding layer by layer in this way, the optimal number of neurons is determined for every hidden layer.

[0008]

[Operation] The number of neurons in a hidden layer corresponds to the number of independent variables in a linear problem, and the minimum number of variables needed to solve a linear problem is essentially given by the rank of the matrix in matrix theory. The rank therefore corresponds to the minimum number of neurons the hidden layer needs. Accordingly, the relations of the constructed and trained neural network are written in matrix form and the rank of that matrix is obtained. If the number of neurons in use exceeds the rank, surplus neurons corresponding to dependent variables exist and can be omitted. Because of nonlinearity, computational error, and the like, the diagonal elements of the matrix often do not become exactly 0 when the rank is computed, so the rank cannot be determined directly. By ignoring (approximating as zero) the diagonal elements at or below a fixed value e, however, a rank can be assumed within that error range. A hidden-layer neuron regarded as surplus does still have some influence on the network output, but if that influence is negligible, omitting the neuron improves efficiency. In practice this approximation introduces little error.

[0009]

[Effects of the Invention] Parts of the network that needlessly increase the amount of computation can be removed, and an efficient, optimal neural network can be constructed. This saves computation time and storage capacity, so a computer of the same scale can handle more advanced problems.

[0010]

[Embodiments] The present invention is described below on the basis of specific embodiments. FIG. 1 shows the procedure of the invention for determining the number of hidden-layer neurons required. The FFNN to which this procedure is applied is described using the four-layer neural network shown in FIG. 2.

[0011] The FFNN of FIG. 2 has two neurons 1 and 2 as the input layer, six neurons 3 to 8 as the first hidden layer, six neurons 9 to 14 as the second hidden layer, and a single neuron 15 as the output layer. Each neuron in a layer is linked to every neuron in the next layer. The network has already been trained on the learning data by the BP learning method, so the value of every link weight is fixed. The neurons of the first and second hidden layers each use the well-known sigmoid function $s(x) = 1/(1 + \exp(-x))$ as their transfer function, and the neurons of the input and output layers have only a linear relation.

[0012] In other words, as shown in FIG. 3(a), each hidden-layer neuron multiplies the value $o_i$ on each line linked to it by the link weight $w_i$ and sums the products. That is,

[Equation 5] $o = s\left(\sum_i w_i o_i + \theta\right)$ (where $s$ is the sigmoid function)

is the output value of the neuron. Here θ is the potential, or bias, of the neuron; it is a real number and can be set to a value suited to the system. Each hidden layer in FIG. 2 consists of six such neurons connected side by side. The input and output neurons follow the linear relation shown in FIG. 3(b).
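Equation 5 can be written directly as a short function. The sketch below uses only the Python standard library; the function names and the sample inputs are illustrative assumptions, not taken from the patent.

```python
import math

def sigmoid(x):
    """Sigmoid transfer function s(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def hidden_neuron_output(inputs, weights, theta):
    """Equation 5: weighted sum of the incoming values plus the bias, passed through s."""
    weighted_sum = sum(w * o for w, o in zip(weights, inputs))
    return sigmoid(weighted_sum + theta)

# With zero inputs and zero bias the weighted sum is 0, and s(0) = 0.5.
print(hidden_neuron_output([0.0, 0.0], [1.0, 1.0], 0.0))  # 0.5
```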

[0013] Now suppose this FFNN has been trained with ten learning-data output values (teacher signals) $t_i$ (i = 1, 2, ..., 10). First, for the second hidden layer, which is nearest the output layer, let $o(i,k)$ be the output of the k-th neuron of the second hidden layer when the i-th learning datum is presented, and form the following 10 × 6 matrix $O$.

[Equation 6] $O = \begin{pmatrix} o(1,9) & o(1,10) & \cdots & o(1,14) \\ o(2,9) & o(2,10) & \cdots & o(2,14) \\ \vdots & \vdots & \ddots & \vdots \\ o(10,9) & o(10,10) & \cdots & o(10,14) \end{pmatrix}$ (where i = 1, 2, ..., 10 and k = 9, 10, ..., 14, six values)

Together with the vector $W$ of link weights on the lines from the six neurons 9 to 14 of the second hidden layer to neuron 15 of the output layer, and the learning-data vector $t$, this matrix $O$ satisfies the linear equation of Equation 1 to within the learning error of the FFNN. That is,

[Equation 7] $OW = t$

Here the vector $t$ has the learning-data output values (teacher signals) $t_i$ (i = 1, 2, ..., 10) as its elements. The element $w_j$ (j = 1, 2, ..., 6) of the vector $W$ is the link weight from neuron number (j+8) of the second hidden layer to output 15. If there were 20 learning data, there would be correspondingly more relations. Since learning data usually contain nonlinear relationships, it is desirable, in order to reduce error, to use more learning data distributed widely over the possible data range, and if possible all of the learning data.
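As a rough illustration of how the 10 × 6 matrix O arises, the sketch below runs ten inputs through a 2-6-6-1 network shaped like the one in FIG. 2 and collects the second-hidden-layer outputs row by row. It assumes NumPy; the random weights and biases merely stand in for trained values, the output neuron is assumed to be linear with no bias, and all names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_output_matrix(X, hidden_layers):
    """Return the m x n matrix O of last-hidden-layer outputs.

    X: (m, d) array, one learning input per row.
    hidden_layers: list of (weight matrix, bias vector), one pair per hidden layer.
    """
    A = np.asarray(X, dtype=float)
    for weights, bias in hidden_layers:
        A = sigmoid(A @ weights + bias)   # Equation 5 applied to a whole layer
    return A

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(10, 2))                            # ten learning inputs
hidden_layers = [(rng.normal(size=(2, 6)), rng.normal(size=6)),     # first hidden layer
                 (rng.normal(size=(6, 6)), rng.normal(size=6))]     # second hidden layer
W = rng.normal(size=6)                     # link weights to the output neuron

O = hidden_output_matrix(X, hidden_layers) # row i: outputs for the i-th learning datum
network_output = O @ W                     # left-hand side of Equation 7
print(O.shape)  # (10, 6)
```

For a trained network, `network_output` would match the teacher vector t to within the learning error, which is the sense in which Equation 7 holds.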

[0014] Once the matrix $O$ has been obtained, its rank R is determined. From linear algebra it is known that any m × n matrix $M$ can be expressed, using two orthogonal matrices (an m × m matrix $X$ and an n × n matrix $Y$) and one m × n diagonal matrix $G$, as

[Equation 8] $M = X G Y^t$ (the superscript $t$ denotes the transpose of the matrix $Y$)

The matrix $O$ of Equation 6 can therefore also be decomposed, giving Equation 2. The diagonal components $D_{iag}$ of the diagonal matrix $D$, given in Equation 3, are called singular values, and they can be arranged from the upper left in descending order ($\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \cdots \ge \sigma_r \ge 0$ in Equation 3), because, by linear algebra, exchanging rows or columns of the matrix does not change the relation. This decomposition and rearrangement of the matrix is called singular value decomposition.

[0015] It is also known from linear algebra that the number of nonzero diagonal components of the diagonal matrix $G$ is the rank of the original matrix $M$. It is further known that if the p smallest singular values among the diagonal components of $G$ are set to 0 to give a matrix $G'$, the matrix $M'$ obtained from $G'$ is, among all matrices whose rank is lower than that of $M$ by p, the one at the smallest least-squares distance from $M$. This means that $M'$ is a very good approximation of the original matrix $M$. In pure mathematics a σ in Equation 3 can be exactly 0, but for results learned by a neural network the σ values are not necessarily 0, because the network inherently carries learning error. Small σ values nevertheless fall below some error value e, so it is reasonable to regard them as 0 for the purpose of approximation.

[0016] Ignoring the diagonal components at or below an error value e and setting them to 0 is known, from linear algebra, to give the approximating matrix for which the sum of squared errors (the squared distance) is smallest. That is, if p of the σ values in Equation 3 are nonzero and the q smallest of them are set to 0, the matrix $O'$ produced from the resulting diagonal matrix is, among all matrices whose rank is lower than that of $O$ by q, the one at the smallest squared distance from $O$, i.e. the closest approximation. This yields the linear relation that gives the best approximation, which means the approximation is carried out while the computational error of the FFNN stays small.
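The approximation property above can be checked numerically. In the sketch below (NumPy assumed; the function name and the random test matrix are illustrative), the q smallest singular values are zeroed and the squared distance between O and the resulting O' is compared with the sum of squares of the discarded singular values, which is the value the low-rank approximation result predicts.

```python
import numpy as np

def zero_smallest_singular_values(O, q):
    """Return O' obtained by zeroing the q smallest singular values, plus all singular values."""
    U, s, Vt = np.linalg.svd(O, full_matrices=False)
    s_kept = s.copy()
    if q > 0:                       # guard: s_kept[-0:] would select every entry
        s_kept[-q:] = 0.0
    return (U * s_kept) @ Vt, s     # (U * s_kept) scales the columns of U, i.e. U @ diag(s_kept)

rng = np.random.default_rng(0)
O_full = rng.normal(size=(10, 6))   # hypothetical full-rank 10 x 6 matrix
O_approx, s = zero_smallest_singular_values(O_full, q=2)

squared_distance = np.sum((O_full - O_approx) ** 2)
discarded = s[-1] ** 2 + s[-2] ** 2
print(np.isclose(squared_distance, discarded))   # True
```

So the cost of dropping q neurons' worth of rank is governed entirely by the singular values that fall below the threshold.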

[0017] Accordingly, the rank of the matrix $O$ of Equation 6 obtained by learning can be taken as the number of diagonal components $\sigma_i$ that remain after those at or below the error value e are set to 0. In linear algebra the rank is the number of independent elements, so the rank value obtained gives the number of independent neurons, that is, the minimum number of neurons required.

[0018] If the rank value equals the number of neurons N, all the neurons are independent and all are needed; none can be omitted. Forcibly omitting one would increase the error.

[0019] If the rank value R is smaller than the number of neurons N, N - R neurons can be removed. That is, neurons can be removed from the second hidden layer without greatly increasing the learning error of the FFNN (more precisely, in theory, at the cost of an increase in the least-squares error, or of an increase by the sum of the singular values at or below the error value e).

[0020] Suppose, then, that the rank R is 5 and that, among the outputs of the second hidden layer in FIG. 2, the vector $o_1$, for example, can be expressed in terms of the other vectors $o_i$ (i = 2 to 6). Equation 1 then becomes

[Equation 9] $\left(\sum_{i=2}^{6} k_i o_i,\ o_2,\ o_3,\ o_4,\ o_5,\ o_6\right) W = t$

Expanded, this is nothing more than Equation 1 expressed with the vectors $o_i$ (i = 2 to 6) alone, so five neurons, one fewer, suffice. When the rank value is smaller still, applying the same expansion repeatedly shows that N - R neurons are unnecessary.
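Expanding Equation 9 shows where the removed neuron's contribution goes: substituting $o_1 = \sum_i k_i o_i$ gives $\sum_{i=2}^{6} (w_i + k_i w_1)\, o_i = t$, so each remaining weight absorbs $k_i w_1$. The text does not spell this step out, so the sketch below is an illustration under that reading. It assumes NumPy, estimates the coefficients $k_i$ by least squares, and uses synthetic data in which the first column is exactly dependent on the others; all names are hypothetical.

```python
import numpy as np

def fold_dependent_neuron(O, W, j=0):
    """Remove column j of O and fold its link weight into the remaining weights."""
    others = [c for c in range(O.shape[1]) if c != j]
    # Coefficients k with O[:, j] ~= O[:, others] @ k (exact if column j is dependent).
    k, *_ = np.linalg.lstsq(O[:, others], O[:, j], rcond=None)
    W_reduced = W[others] + k * W[j]
    return O[:, others], W_reduced

rng = np.random.default_rng(1)
independent = rng.normal(size=(10, 5))          # stand-ins for o_2 ... o_6
k_true = rng.normal(size=5)
O_dep = np.column_stack([independent @ k_true, independent])  # o_1 depends on the rest
W_dep = rng.normal(size=6)

O_reduced, W_reduced = fold_dependent_neuron(O_dep, W_dep, j=0)
print(np.allclose(O_dep @ W_dep, O_reduced @ W_reduced))   # True
```

When the dependence is only approximate, as with a trained network, the two sides differ by a residual, which is one reason the second embodiment retrains after pruning.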

[0021] Next, attention turns to the first hidden layer. A matrix $O$ is first formed in the same way as for the second hidden layer. Here the link-weight vector $W$ is the column vector of link weights on the lines from the first hidden layer to the j-th neuron of the second hidden layer. Because the second hidden layer, which acts as the output here, has six neurons, six instances of Equation 1 are obtained, and a single matrix $O$ satisfying all of them is sought. As for the values corresponding to the learning vector $t$: once the learning value of output 15 is given and the link weights of the second hidden layer have been fixed by learning, the output required from the first hidden layer is prescribed from the second-hidden-layer side, and these prescribed output values are used. Applying the method of FIG. 1 described above to these equations in the same way, the rank R obtained is again the minimum number of neurons required in the first hidden layer, and the number of unnecessary neurons can be determined. The approach is not limited to this embodiment: for an FFNN with any number of layers, the number of neurons in each hidden layer can be determined in turn, starting from the output side, in the same manner.

[0022] The procedure for carrying out the invention is described with reference to the flowchart of FIG. 1. In step 100, as initialization, the error value e described above is set in advance, for example to 0.1, and diagonal elements smaller than e are set to be discarded and not counted toward the rank. In step 102, the number of hidden layers H is assigned to the calculation parameter i. From the data of the learning results obtained, the coefficient matrix $O$ of the i-th hidden layer is calculated (step 104). The singular value decomposition of this coefficient matrix $O$ is then performed (step 106). The number of singular values smaller than the error value e is assigned to the array element F(i) (step 108). The procedure then moves to the next hidden layer (steps 110, 112) and repeats until no hidden layers remain. The array F(i) finally obtained holds the number of unnecessary neurons in the i-th hidden layer (step 114). This procedure calculates the number of unnecessary neurons, but outputting the rank value, that is, the number of neurons required, is equivalent.
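The flowchart steps can be sketched as a loop over the hidden layers, starting from the one nearest the output. This is a minimal sketch assuming NumPy: the coefficient matrix of each hidden layer is passed in precomputed rather than derived from a trained network (step 104 is reduced to a lookup), F is held as a dictionary keyed by layer number, and the function name and example matrices are hypothetical.

```python
import numpy as np

def count_unneeded_neurons(layer_matrices, e=0.1):
    """Return F, where F[i] is the number of singular values below e for hidden layer i.

    layer_matrices[i - 1] is the coefficient matrix O of the i-th hidden layer.
    """
    H = len(layer_matrices)
    F = {}
    i = H                                            # step 102: start at the last hidden layer
    while i >= 1:                                    # steps 110, 112: loop until none remain
        O = np.asarray(layer_matrices[i - 1], dtype=float)   # step 104
        sigma = np.linalg.svd(O, compute_uv=False)   # step 106
        F[i] = int(np.sum(sigma < e))                # step 108
        i -= 1
    return F                                         # step 114

# Hypothetical two-hidden-layer case: layer 1 has three independent columns,
# layer 2 has a third column equal to the sum of the first two.
layer1 = np.eye(3)
layer2 = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0],
                   [1.0, 1.0, 2.0],
                   [2.0, 1.0, 3.0]])
print(count_unneeded_neurons([layer1, layer2], e=0.1))  # {2: 1, 1: 0}
```

Subtracting F[i] from the current neuron count of layer i gives the rank value that the text describes as the equivalent output.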

[0023] (Second Embodiment) As another embodiment of the invention, a case is shown in which the determination of the optimal number of neurons is repeated several times. For a trained neural network, the rank value is first obtained once to determine the optimal number of neurons. The neural network configured with that number of neurons is then trained again, the rank is obtained again from the new link weights, and this procedure is repeated to arrive at a more nearly optimal number of neurons.

[0024] An FFNN with a three-layer structure (not shown; two input-layer neurons, 20 hidden-layer neurons, and one output-layer neuron) was applied to a square-region detection problem. FIG. 4(a) shows the simulation result: the singular values obtained by singular value decomposition from the link weights learned on 100 learning data. In this figure the singular values drop somewhat more sharply at the fifth and sixth neurons, so the number of neurons estimated to be unnecessary is about 14, and the number of neurons doing most of the work is estimated at six or seven. The hidden layer was therefore reduced to seven neurons, the FFNN was trained on the same learning data, and the singular values were obtained again, giving the graph of FIG. 4(b). From FIG. 4(b), a further two neurons are estimated to be unnecessary, so the hidden layer can be reduced to five neurons. A separate argument has shown that four neurons suffice for this problem (R.P.Lippman et al., "An Introduction to Computing with Neural Nets", IEEE ASSP Magazine, pp.4-23, Apr.1987), so the construction method of the invention agrees well with that result.

[0025] Which neurons are independent and which are dependent is determined by the result of learning and cannot be decided beforehand. A more accurate neural network is therefore constructed as follows: first train the network with an empirically chosen number of hidden-layer neurons, obtain the rank from the resulting linear relation between the input-side and output-side learning data, remove the unnecessary number of neurons, and train again to obtain new link weights. Alternatively, the neurons that show small singular values may be removed and the network retrained starting from the remaining link weights, which speeds the convergence of BP learning in reaching the optimal network.

[0026] As described above, the invention makes clear how to determine, from the learning data, the optimal number of neurons in the hidden layers of a trained FFNN, regardless of the number of neurons in each layer. A feedforward neural network that computes efficiently and gives effective results can therefore be constructed.

[Brief description of drawings]

FIG. 1 is a flowchart showing an example of the procedure for determining the number of hidden-layer neurons.

FIG. 2 is a configuration diagram of a neural network illustrating an application of the present invention.

FIG. 3 is an explanatory diagram showing the function of the neurons in each layer.

FIG. 4 is a characteristic diagram showing simulation results for the singular values.

FIG. 5 is a configuration diagram of a general neural network.

[Explanation of symbols]

1, 2: input-layer neurons
3, 4, 5, 6, 7, 8: first-hidden-layer neurons
9, 10, 11, 12, 13, 14: second-hidden-layer neurons
15: output-layer neuron

Claims

[Claims]

1. A method for constructing a neural network, characterized in that, in a feedforward neural network having an input layer, an output layer, and at least one hidden layer, with link weights set by learning based on learning data, for the n (n ≥ 2) neurons contained in the hidden layer, the rank of the matrix $O$ (an m × n matrix) of the linear relation [Equation 1] $Ow = t$, formed by the vector $w$ of the link weights $w_j$ of the j-th (1 ≤ j ≤ n) neuron and the vector $t$ of the m learning values $t_i$ (i = 1, 2, ..., m) of the learning data, is obtained, and the value of the rank is taken as the optimal number of neurons in the hidden layer.

2. The method for constructing a neural network according to claim 1, characterized in that the matrix $O$ obtained from Equation 1 is expanded as the product of two orthogonal matrices $U$ (m × m matrix) and $V$ (n × n matrix) and a diagonal matrix $D$ (m × n matrix), [Equation 2] $O = U D V^t$ (the superscript $t$ denotes the transpose of the matrix $V$); the diagonal components $D_{iag}$ of the diagonal matrix $D$ are written as [Equation 3] $D_{iag} = (\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_r)$ (where $r = \min(m, n)$ and $\sigma_i \ge 0$ for 1 ≤ i ≤ r); and, with k the number of $\sigma_i$ (2 ≤ i ≤ r) smaller than a given positive error value e, the rank value is taken as r - k and r - k is taken as the optimal number of neurons in the hidden layer.

3. The method for constructing a neural network according to claim 2, characterized in that there are p (p ≥ 2) hidden layers; a rank value is obtained for the p-th hidden layer formed immediately before the output layer and is taken as the number of neurons in the p-th hidden layer; then, for the (p-1)-th hidden layer, a rank value is obtained from the matrix $O$ that satisfies, for each j-th neuron of the p-th hidden layer, the linear relation [Equation 4] $O w^{(j)} = t^{(j)}$ (where $w^{(j)}$ is the vector of link weights to the j-th neuron from each neuron of the (p-1)-th hidden layer, and $t^{(j)}$ is the output of the j-th neuron for the learning values $t_i$), and that rank value is taken as the number of neurons in the (p-1)-th hidden layer; and the optimal number of neurons is determined in turn for all hidden layers.