WO1991002315A1 - Methods for configuring a parallel distributed processing network - Google Patents

Methods for configuring a parallel distributed processing network Download PDF

Info

Publication number
WO1991002315A1
WO1991002315A1 PCT/US1990/004037 US9004037W WO9102315A1
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
input
eigenvectors
layer
hidden
Prior art date
Application number
PCT/US1990/004037
Other languages
English (en)
Inventor
Aaron James Owens
Michael Joseph Piovoso
Original Assignee
E.I. Du Pont De Nemours And Company
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by E.I. Du Pont De Nemours And Company filed Critical E.I. Du Pont De Nemours And Company
Publication of WO1991002315A1 publication Critical patent/WO1991002315A1/fr

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • the present invention relates to various methods for configuring a parallel distributed processing network and, in particular, to a method for estimating the initial number of hidden nodes and for initializing the connection weights between the input and the hidden nodes and between the hidden and the output nodes, as well as to a method for optimizing the number of hidden nodes in a parallel distributed processing network.
  • PDPN Parallel Distributed Processing Network
  • Copending Application Serial Number 07/316,717, filed February 28, 1989 (ED-0373) in the name of Aaron James Owens and assigned to the assignee of the present invention relates to an apparatus and method for controlling a process using a trained parallel distributed processing network.
  • the data which is presented to the network is either the pattern to be recognized or the input data to the process or functional map.
  • the parallel distributed processing network then learns the appropriate weights to relate the input to the known outputs.
  • the training of the network represents the most computationally time consuming part of the process.
  • Back Propagation Networks are feedforward multilayered networks with the first layer being the input layer and the last layer being the output layer.
  • the intermediate layer(s) is(are) referred to as hidden (or intermediate) layers.
  • Figure 1 illustrates such a network having only one hidden layer, but in general there may be many such hidden layers.
  • the Back Propagation Network works in the following way.
  • Each layer in the network is composed of nodes (artificial neurons) which are mathematical computation elements which have multiple inputs and a single output.
  • the output is then fanned out to serve as an input for additional neurons in the next layer of the network.
  • let a i(j) be the output of the i th neuron in the j th layer of the network and W i(g),j(h) be the weight, or connection strength, of the connection from the i th neuron in layer g to the j th neuron in layer h.
  • the output of a neuron j in layer g-1 is determined in a two step process. First, the "affine transformation" from the previous layer is defined by X j(g-1) = Σ(i=1 to Q) W i(g),j(g-1) · a i(g) + W j(g-1), where W j(g-1) is the bias term and Q is the number of neurons in layer g.
  • second, the result of the affine transformation, X j(g-1), is transformed by a nonlinear squashing function such as the hyperbolic tangent, tanh(X j(g-1)), or the sigmoid function, 1/(1 + exp(-X j(g-1))), to give the activation a j(g-1).
  • Figures 2 and 3 illustrate the hyperbolic and the sigmoid squashing functions, respectively.
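To make the two-step node computation concrete, here is a minimal sketch in Python with NumPy (the patent's own Appendix uses Fortran-77; the function names, toy sizes and values below are illustrative assumptions, not taken from the patent).

```python
import numpy as np

def affine(a_prev, W, bias):
    """Affine transformation X_j = sum_i W_ij * a_i + bias_j for a whole layer.

    a_prev : (Q,) activations of the Q neurons in the previous layer
    W      : (Q, P) connection weights to the P neurons of this layer
    bias   : (P,) bias terms W_j
    """
    return a_prev @ W + bias

def squash(X, kind="tanh"):
    """Nonlinear squashing function applied to the result of the affine transformation."""
    if kind == "tanh":                    # hyperbolic tangent (Figure 2)
        return np.tanh(X)
    return 1.0 / (1.0 + np.exp(-X))       # sigmoid (Figure 3)

# Example: 3 inputs feeding 2 hidden neurons
a_in = np.array([0.5, -1.0, 2.0])
W    = np.array([[0.1, -0.2],
                 [0.3,  0.4],
                 [-0.5, 0.6]])
bias = np.array([0.05, -0.05])
print(squash(affine(a_in, W, bias)))      # activations of the hidden layer
```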
  • the Back Propagation Network is trained by presenting it with a set of input exemplars together with their known outputs.
  • the weights W i(g),j(g - 1) are adjusted to minimize the sum-of-squared errors between the exact known outputs and the output of the network.
  • the corresponding weights then form the least-squared error solution and define the resulting Back Propagation Network.
  • Techniques for minimizing the error are found in Rumelhart and McClelland (David E. Rumelhart and James L. McClelland, Parallel Distributed Processing, Volume 1: Foundations, MIT Press, Cambridge, MA, 1986) and in Lippmann (Richard P. Lippmann, "Introduction to Computing with Neural Nets", IEEE ASSP Magazine, April 1987, pages 4-22).
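As a rough illustration of the error-minimization step described in those references, the sketch below trains a one-hidden-layer tanh network by full-batch gradient descent on the sum-of-squared error. The learning rate, iteration count and array names are assumptions of this illustration; the cited works describe the back propagation procedure in full.

```python
import numpy as np

def train_bpn(X, T, n_hidden, lr=0.01, n_iter=5000, seed=0):
    """Fit a single-hidden-layer tanh network by minimizing the sum-of-squared error.

    X : (N, M) input exemplars; T : (N, Y) known outputs scaled into (-1, 1).
    Returns the weight matrices and bias vectors of the trained network.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape
    Y = T.shape[1]
    W1 = rng.normal(scale=0.1, size=(M, n_hidden));  b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, Y));  b2 = np.zeros(Y)

    for _ in range(n_iter):
        A1 = np.tanh(X @ W1 + b1)                 # hidden-layer activations
        A2 = np.tanh(A1 @ W2 + b2)                # network outputs
        dX2 = 2.0 * (A2 - T) * (1.0 - A2 ** 2)    # gradient at the output-layer affine input
        dX1 = (dX2 @ W2.T) * (1.0 - A1 ** 2)      # gradient back-propagated to the hidden layer
        W2 -= lr * (A1.T @ dX2);  b2 -= lr * dX2.sum(axis=0)
        W1 -= lr * (X.T @ dX1);   b1 -= lr * dX1.sum(axis=0)
    return W1, b1, W2, b2
```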
  • the method requires an initialization procedure for choosing the weights W i (g),j(g- 1) and the biases W j(g-1) as well as the number of hidden nodes or neurons in the hidden layer of the parallel distributed processing network.
  • the initial weights must not all be set to zero, because convergence to the optimal solution would then not be possible.
  • a standard procedure for choosing the weights and biases is to take values from a random distribution.
  • Current criteria for choosing the number of hidden nodes are heuristic. One suggestion is to guess the number of hidden nodes and to train the network to the best possible error. If this error is on the order of the noise level of the training response variable, then the number of hidden nodes is assumed correct. If the error is larger or smaller than this noise level, then nodes are deleted or added until the number of nodes chosen permits the error to reach that level.
  • the present invention relates to various methods for configuring a parallel distributed processing network.
  • This method uses principal component analysis to determine the set of eigenvectors and their corresponding eigenvalues from a total number of input information members, where the total number of input information members is derived from a predetermined number of sets of input exemplars, each set having a plurality of input information members.
  • the number of hidden nodes is determined by the number of eigenvectors in the set whose eigenvalues exceed a threshold.
  • the threshold may be determined using a predetermined percentage R of the maximum one of the eigenvalues or by simply assigning a reference eigenvalue as the threshold.
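A minimal sketch of this criterion in Python/NumPy follows (rather than the NAG/IMSL routines cited later in the text); the function name and the default R = 5 percent are illustrative assumptions.

```python
import numpy as np

def estimate_hidden_nodes(X, R=5.0, fixed_threshold=None):
    """Estimate the initial number of hidden nodes from the N x M input exemplars X.

    Eigenvalues of the variance-covariance matrix are kept if they exceed either
    R percent of the largest eigenvalue or, alternatively, a fixed reference value.
    """
    C = np.cov(X, rowvar=False)                        # M x M variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)               # eigenvalues in ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1] # re-sort in descending order
    if fixed_threshold is not None:
        keep = eigvals > fixed_threshold               # e.g. keep all eigenvalues > 1
    else:
        keep = eigvals > (R / 100.0) * eigvals.max()
    return int(keep.sum()), eigvals, eigvecs
```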
  • the invention relates to the initial assignment of values of the connection weights between the nodes in the input and in the hidden layers. These initial values of weights between each of the nodes in the input layer and a selected node in the hidden layer are assigned in accordance with the values of the components of a selected one of the eigenvectors in the subset. Preferably, the eigenvectors are prioritized according to their relative eigenvalues.
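Continuing the sketch, the input-to-hidden weights can be filled from the retained eigenvectors. The bias choice shown here (centering each projection on the mean of the training inputs) is one plausible reading and an assumption of this illustration; the patent specifies only that the weights take the eigenvector components.

```python
import numpy as np

def init_input_to_hidden(X, eigvecs, n_hidden):
    """Initial weights from the M input nodes to the P = n_hidden hidden nodes.

    Column j of W1 holds the components of the eigenvector with the j-th largest
    eigenvalue (eigvecs is assumed sorted in descending eigenvalue order).
    """
    V = eigvecs[:, :n_hidden]       # M x P matrix of retained eigenvectors
    W1 = V.copy()                   # W_i(3),j(2) = i-th component of the j-th eigenvector
    b1 = -X.mean(axis=0) @ V        # assumption: bias centers the projection (not specified in the patent)
    return W1, b1
```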
  • the invention relates to the initial assignment of values of the connection weights between the nodes in the hidden and in the output layers.
  • the activation level of each of the nodes in the hidden layer is determined by applying to the input nodes a predetermined input set of exemplars. Each input set of exemplars has a corresponding output set of exemplars.
  • the activation levels of the hidden layer and the output set of exemplars transformed by the inverse of the squashing function are regressed thereby to form a matrix of column vectors each of which is composed of coefficients.
  • the coefficients of each column vector correspond to the predetermined order of nodes in the hidden layer.
  • Each of the column vectors sequentially corresponds to each of the output nodes.
  • the connection weight between each of the nodes in the hidden layer and a selected one of the Y nodes in the output layer is assigned in accordance with the values of the coefficients of the column vector corresponding to the selected output node.
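The following sketch carries out that regression with NumPy's least-squares solver. It assumes the tanh squashing function (so its inverse is arctanh) and that the training outputs have already been scaled into the open interval (-1, 1); those assumptions, and the names used, are mine rather than the patent's.

```python
import numpy as np

def init_hidden_to_output(X, T, W1, b1):
    """Initial weights from the P hidden nodes to the Y output nodes.

    X : (N, M) input exemplars; T : (N, Y) target outputs scaled into (-1, 1).
    The hidden activation levels are regressed against arctanh(T), the targets
    transformed by the inverse of the squashing function.
    """
    A1 = np.tanh(X @ W1 + b1)                        # (N, P) hidden-layer activation levels
    targets = np.arctanh(T)                          # inverse squashing of the output exemplars
    A1_aug = np.hstack([A1, np.ones((len(A1), 1))])  # extra column for the bias term
    coef, *_ = np.linalg.lstsq(A1_aug, targets, rcond=None)
    W2, b2 = coef[:-1, :], coef[-1, :]               # column k holds the weights into output node k
    return W2, b2
```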
  • the invention relates to a method for choosing the optimal number of hidden nodes.
  • This method starts from a least-squares linear model which is systematically improved into a better Back Propagation Network model, leading to the model, generated from the training set, which best reproduces a prediction set. As the number of hidden nodes is increased, the prediction error decreases until a minimum is found; this determines the optimal number of hidden nodes.
  • Figure 1 is a schematic description of a three-layer feed forward back propagation network
  • Figure 2 is a graph of the hyperbolic squashing function
  • Figure 3 is a graph of the sigmoid squashing function
  • Figure 4 is a graph of the sum-of-squared error versus number of hidden nodes for the training data set;
  • Figure 5 is a graph of the sum-of-squared error versus number of hidden nodes for the prediction data set;
  • Figure 6 is a block diagram of the steps of the hidden node estimation and the weight initialization methods;
  • Figure 7 is a block diagram of the steps of the method to optimize the number of hidden nodes; and
  • Figure 8 is a graph of the sum-of-squared error versus number of hidden nodes for the polymer composition data of the example.
  • subroutine F04YAF of the NAG library or CORVC from the IMSL library computes the variance-covariance matrix; other routines, F02ABF [NAG] or PRINC [IMSL], compute its eigenvalues and eigenvectors. Block 1 of Figure 6 involves these operations.
  • the block numbers in Figure 6 refer to those in the program listing (pages 23 to 27 of the Appendix) which perform the operations described in the boxes.
  • the eigenvalues tell the relative importance of each eigenvector by giving the amount of variance explained by looking in the direction of that eigenvector. Block 4 of Figure 6 illustrates this step.
  • the inverse of the nonlinear squashing function applied to the output data is used to obtain the target inputs to the output layer. This is illustrated in Block 5 of Figure 6.
  • An Appendix (Pages 18 to 29 ) including a listing of a program in Fortran-77 source language is attached to and forms part of this application.
  • Several aspects of the present invention relate to methods for rationally picking the number of hidden nodes and for determining excellent starting values for the weights. First discussed are methods to determine the initialization of the weights and to choose the appropriate number of hidden units.
  • PCA Principal Component Analysis
  • the invention in another aspect uses the back propagation algorithm to further refine the required number of hidden nodes and the final weights.
  • the Back Propagation Network is trained using a back propagation algorithm.
  • the trained network is tested against a "prediction" data set (i.e., input-output correspondences) which is withheld from the training phase.
  • each of these exemplars is presented to the trained network, and the sum-of-squared error of the outputs is computed.
  • This sum is a measure of how well the trained network interpolates on similar data.
  • Block 3 of Figure 7 illustrates this. The sum is compared with the previous sum to see if there is any improvement. If the sum increases, then the present network is doing a poorer job and the previously trained network is optimal. If the network produces a sum less than the previous, then the present one is better. This decision is illustrated in block 4 of Figure 7.
  • the weights for the old connections are the same as were previously calculated.
  • the weights for the new connections are chosen as small random numbers. This guarantees that the search for a better model begins near the solution of the previous model. It also guarantees that the new sum-of-squared error on the training set is monotonically decreasing. This is not always the case with standard back propagation algorithms.
  • the FORTRAN 77 computer program SHAPE (pages 28 to 29 of the Appendix) implements these steps, illustrated as Block 6 of Figure 7.
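A sketch of that growing step follows (Python/NumPy). The function adds one hidden node to an existing set of weight arrays, copying the previously calculated weights unchanged and drawing the new connections from a small random distribution; the function name, the scale of the random numbers and the array layout are assumptions of this illustration, not details taken from the SHAPE listing. Retraining then starts from these weights, so the search for a better model begins near the previous solution.

```python
import numpy as np

def add_hidden_node(W1, b1, W2, b2, scale=0.01, seed=0):
    """Grow the hidden layer by one node, reusing all previously trained weights.

    W1 : (M, P) input-to-hidden weights     b1 : (P,) hidden biases
    W2 : (P, Y) hidden-to-output weights    b2 : (Y,) output biases
    Returns arrays of shape (M, P+1), (P+1,), (P+1, Y) and (Y,).
    """
    rng = np.random.default_rng(seed)
    M, P = W1.shape
    Y = W2.shape[1]
    W1_new = np.hstack([W1, rng.uniform(-scale, scale, size=(M, 1))])   # new input connections
    b1_new = np.concatenate([b1, rng.uniform(-scale, scale, size=1)])   # new hidden bias
    W2_new = np.vstack([W2, rng.uniform(-scale, scale, size=(1, Y))])   # new output connections
    return W1_new, b1_new, W2_new, b2                                   # output biases unchanged
```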
  • a procedure is defined based on the predictability of the trained network itself.
  • the initialization weights between the hidden and the output layers are determined by the multiple linear regression of the activation levels of the hidden nodes with the transformation of the output data by the inverse of the squashing function.
  • the determination of the initial weights is based on Principal Component Regression (PCR), which combines Principal Component Analysis (PCA) with multiple linear regression.
  • Subroutines F04YAF from the NAG Library or CORVC from the IMSL library can be used to evaluate the variance-covariance matrix of the input data, and subroutines F02ABF [NAG] or PRINC [IMSL] will compute the matrix's eigenvalues and eigenvectors.
  • the NAG Library is available from Numerical Algorithms Group, Downers Grove, Illinois while the IMSL Library is available from IMSL, Inc., Houston, Texas.
  • the eigenvectors determine a new, orthogonal system of coordinates whose axes define directions in the data space, ordered in descending fashion by the amount of variability in the data that they account for.
  • the eigenvalue is related to the variability accounted for by the corresponding eigenvector. If the hidden nodes were linear and the weights from the input layer to the hidden layer were initialized with the components of the eigenvectors, then the output would be the projection of a given exemplar onto the space spanned by the eigenvectors. Furthermore, if one takes the projection of each exemplar onto the eigenvectors and regresses these projections onto the outputs, then a Principal Component Analysis (PCA) model is developed. Principal Component Regression of the input exemplars is used to determine the initial weights for the neural network from the input to the hidden layer and to determine the initial estimate of the number of hidden nodes.
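As an illustration of that purely linear view, the sketch below builds the Principal Component Regression model directly: the mean-centered exemplars are projected onto the leading eigenvectors and the projections (scores) are regressed onto the outputs. Function and variable names are illustrative, not the patent's.

```python
import numpy as np

def pcr_model(X, T, n_components):
    """Linear Principal Component Regression of the outputs T on the inputs X.

    Projects the centered exemplars onto the leading eigenvectors of the
    variance-covariance matrix and regresses the projections onto T.
    Returns a function that predicts outputs for new input exemplars.
    """
    mean = X.mean(axis=0)
    C = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    V = eigvecs[:, ::-1][:, :n_components]              # leading eigenvectors (descending order)
    scores = (X - mean) @ V                              # projection of each exemplar
    scores_aug = np.hstack([scores, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(scores_aug, T, rcond=None)

    def predict(X_new):
        s = np.hstack([(X_new - mean) @ V, np.ones((len(X_new), 1))])
        return s @ coef

    return predict
```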
  • PCA Principal Component Analysis
  • the last (M-P) eigenvalues would be associated with noise and hence be very small.
  • a threshold could be used to exclude eigenvalues less than R% of the maximum eigenvalue, where R might be set by the amount of noise in the input data.
  • Another objective criterion would be to keep all eigenvectors associated with eigenvalues greater than some fixed value, for example 1.
  • the number of eigenvectors kept defines the initial estimate of the number of hidden nodes.
  • the initial weights for the connections between the input layer and the hidden node j are the components of the eigenvector corresponding to the j th eigenvalue.
  • the initialization of the input layer to the hidden layer of the parallel distributed processing network represented in Figure 1 is to set the w i(3),j(2) to be the i th component of the j th eigenvector.
  • MLR Multiple Linear Regression
  • each of the input exemplars is projected onto each of the eigenvectors by evaluating the affine transformation
  • the output set of Y output information members are transformed by the inverse of the nonlinear squashing function.
  • the activation levels of the hidden layer are regressed with the target inputs of the output nodes, resulting in a matrix of column vectors, each of which is composed of coefficients.
  • the coefficients of each column vector correspond to the predetermined order of the hidden nodes, and each of the column vectors sequentially corresponds to one of the Y output nodes.
  • the coefficients of the j th column vector are the initial weights from each of the P hidden nodes to the j th output node.
  • the initialization of the hidden layer to the first or output layer is to set W j(2),k(1) equal to the j th coefficient of the k th column vector obtained from the regression.
  • the actual number of nodes required and the optimal weights must be determined iteratively.
  • the optimal number of nodes P o is usually close to this number P in the subset of eigenvectors, but in general not the same.
  • Parallel distributed processing networks are nonlinear processes, while principal component regression is a linear process which will only approximate the parallel distributed processing network solution.
  • the objective of implementing a back propagation network is not merely to represent the input data set but rather to reliably predict what the output would be for an input data vector not previously seen by the network.
  • the optimal number of hidden nodes to do this is the objective of the back propagation network representation.
  • if the error of the prediction outputs with respect to the set of target outputs of the prediction data set does not decrease, start again with a parallel distributed processing network having one or two fewer hidden nodes and repeat the procedure.
  • the training data set consisted of 15 exemplars N, each with 39 inputs M and one output Y.
  • TRAIN.DAT Example Data Set
  • the output of the PROGRAM FACTW is given below, labeled "Initialization: Example Run.” Both objective criteria discussed above result in an initial selection of two significant eigenvectors with corresponding eigenvalues of 32.75 and 4.65 in the Example Run. Choosing two as the number of hidden nodes, the weights were initialized as per the method of this invention.
  • the initial file TRAIN.WF produced by FACTW is given in the example printout, with weight entries identified.
  • Figure 8 shows the results of applying the methodology in Figure 7 to this polymer composition data set. Predictions were made on 21 exemplars in a distinct prediction data set. The prediction error has a minimum at 2 hidden units. [Although the present invention would not normally evaluate a model with

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Various methods using principal component analysis and multiple linear regression techniques are described for configuring a parallel distributed processing network. In particular, a method for estimating the initial number of hidden nodes (j), a method for initializing the connection weights between the input nodes (i) and the hidden nodes (j) and between the hidden nodes (j) and the output nodes (k), and a method for optimizing the number of hidden nodes (j) in a parallel distributed processing network are described.
PCT/US1990/004037 1989-08-01 1990-07-24 Procedes relatifs a la configuration d'un reseau de traitement reparti en parallele WO1991002315A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US38818489A 1989-08-01 1989-08-01
US388,184 1989-08-01

Publications (1)

Publication Number Publication Date
WO1991002315A1 true WO1991002315A1 (fr) 1991-02-21

Family

ID=23533037

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1990/004037 WO1991002315A1 (fr) 1989-08-01 1990-07-24 Procedes relatifs a la configuration d'un reseau de traitement reparti en parallele

Country Status (1)

Country Link
WO (1) WO1991002315A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0583217A2 (fr) * 1992-08-11 1994-02-16 Hitachi Europe Limited Optimisation d'un réseau neuronal à multicouches
DE10139682A1 (de) * 2001-08-11 2003-02-27 Deneg Gmbh Verfahren zum Generieren von neuronalen Netzen
CN103605323A (zh) * 2013-08-09 2014-02-26 中国蓝星(集团)股份有限公司 化工生产的离散控制方法及装置
CN108369664A (zh) * 2015-11-30 2018-08-03 谷歌有限责任公司 调整神经网络的大小

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4933872A (en) * 1988-11-15 1990-06-12 Eastman Kodak Company Method and system for wavefront reconstruction

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4933872A (en) * 1988-11-15 1990-06-12 Eastman Kodak Company Method and system for wavefront reconstruction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Introduction to Computing with Neural Nets", RICHARD P. LIPPMANN, IEEE ASSP MAGAZINE, April 1987, pages 4-22 *
"Parallel Distributed Processing", DAVID E. RUMELHART and JAMES L. MCCLELLAND, 1986, MIT Press. *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0583217A2 (fr) * 1992-08-11 1994-02-16 Hitachi Europe Limited Optimisation d'un réseau neuronal à multicouches
EP0583217A3 (fr) * 1992-08-11 1995-03-15 Hitachi Europ Ltd Optimisation d'un réseau neuronal à multicouches.
DE10139682A1 (de) * 2001-08-11 2003-02-27 Deneg Gmbh Verfahren zum Generieren von neuronalen Netzen
DE10139682B4 (de) * 2001-08-11 2004-08-05 Deneg Gmbh Verfahren zum Generieren von neuronalen Netzen
CN103605323A (zh) * 2013-08-09 2014-02-26 中国蓝星(集团)股份有限公司 化工生产的离散控制方法及装置
CN103605323B (zh) * 2013-08-09 2016-03-30 蓝星(北京)技术中心有限公司 化工生产的离散控制方法及装置
CN108369664A (zh) * 2015-11-30 2018-08-03 谷歌有限责任公司 调整神经网络的大小

Similar Documents

Publication Publication Date Title
US11687788B2 (en) Generating synthetic data examples as interpolation of two data examples that is linear in the space of relative scores
WO2019067960A1 (fr) Développement agressif à l'aide de générateurs coopératifs
JP5395241B2 (ja) 入力ベクトルがニューロンによって既知であるか未知であるかを決定する方法
MacKay A practical Bayesian framework for backpropagation networks
Nauck Neuro-fuzzy systems: review and prospects
Yingwei et al. A sequential learning scheme for function approximation using minimal radial basis function neural networks
Ghosn et al. Bias learning, knowledge sharing
EP0694192B1 (fr) Systeme de reconnaissance et methode
WO1991002315A1 (fr) Procedes relatifs a la configuration d'un reseau de traitement reparti en parallele
Plagianakos et al. An improved backpropagation method with adaptive learning rate
Chakraborty et al. Connectionist models for part-family classifications
Pazouki et al. Adaptive learning algorithm for RBF neural networks in kernel spaces
Mehta et al. Feedforward Neural Networks for Process Identification and Prediction
Hsu Solving multi-response problems through neural networks and principal component analysis
Mashor Modified recursive prediction error algorithm for training layered neural network
JPH0535710A (ja) ニユーラルネツトワークの学習方法および学習装置
Vassiljeva et al. Genetic algorithm based structure identification for feedback control of nonlinear mimo systems
Swingler A comparison of learning rules for mixed order hyper networks
Chaturvedi Factors affecting the performance of artificial neural network models
Brown et al. On the condition of adaptive neurofuzzy models
Brandusoiu HOW TO FINE-TUNE NEURAL NETWORKS FOR CLASSIFICATION
Sørheim ART2/BP architecture for adaptive estimation of dynamic processes
MacKay The evidence for neural networks
Soldić-Aleksić An application of discriminant analysis and artificial neural networks to classification problems
Mu et al. Design of Control Systems Using Neural Network

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB IT LU NL SE

NENP Non-entry into the national phase

Ref country code: CA