FR2668625A1

FR2668625A1 - Automatic learning process and multilayer connectional network for the implementation of this process

Info

Publication number: FR2668625A1
Application number: FR9013445A
Authority: FR
Inventors: Burel Gilles
Original assignee: Thomson CSF SA
Current assignee: Thales SA
Priority date: 1990-10-30
Filing date: 1990-10-30
Publication date: 1992-04-30
Also published as: FR2668625B1

Abstract

The automatic learning process, which consists in modifying weighting coefficients of a multilayer neural network so that the outputs of the neural network are representative of a group of data, is characterised in that the learning speed is set at the start of learning by fixing a value of relative decrease of an error function between the output values obtained and the desired output values, and is then frozen at the end of learning. Furthermore, the process consists in imposing an upper bound on the perturbation that weighting coefficient modifications due to the presentation of one example cause to a quantity, termed the neural potential, corresponding to another example, and in prohibiting modifications of the weighting coefficients that tend to increase the absolute value of the neural potential too greatly. Applications include speech and image processing, robotics, and signal processing.

Description

AUTOMATIC LEARNING METHOD
AND MULTI-LAYER CONNECTIONIST NETWORK
FOR THE IMPLEMENTATION OF THIS METHOD
The invention relates to the field of machine learning using a multi-layer connectionist network, also called a multi-layer neural network. The applications of the invention cover all fields in which multi-layer connectionist networks are or will be applied, in particular speech and image processing, robotics, and signal processing.

In recent years, connectionist systems have attracted renewed interest, mainly owing to the publication of neuron models (inspired by the operation of biological neurons) and of learning algorithms for neural networks.

The learning method that currently appears to perform best uses a so-called "backpropagation" algorithm. This algorithm is described in detail in the following document:
D.E. RUMELHART, G.E. HINTON, R.J. WILLIAMS,
"Learning internal representations by error backpropagation", Parallel Distributed Processing, D.E. RUMELHART and J.L. McCLELLAND, Chap. 8, Bradford Book, MIT Press, 1986.
This learning algorithm modifies the weights of the connections between neurons so that the outputs of the neural network are representative of a group of data having similar characteristics, that is, so that determined output values correspond to a group of data.

The backpropagation algorithm is intended for multi-layer neural networks, each neuron being modeled by a weighted summer associated with a device for computing a nonlinear function. However, this algorithm has the following drawbacks:
- training of the neural network is generally slow;
- the parameter called "learning speed" is difficult to adjust;
- the neural system sometimes becomes stuck in a bad configuration, in particular when the learning speed has been poorly set.

To solve these problems, the invention relates to an automatic learning method that improves the learning algorithm through automatic adjustment of the network parameters.

According to the invention, the automatic learning method, which consists in modifying weighting coefficients of a multi-layer neural network so that the outputs of the neural network are representative of a group of data, is characterized in that it consists in:
- successively presenting characteristic values of different examples, taken in a random order, to the inputs of the neural network;
- computing, for each example, a mean squared error function defined as the mean value of the total output error between the output values of the neural network and the desired output values;
- modifying, for each example, the weighting coefficients of the neural network as a function of the error gradient weighted by a parameter corresponding to a learning speed, so as to reduce the value of the error function;
- adjusting the learning speed by fixing, for the first examples, a value of relative decrease of the error function, and by freezing the value of the learning speed for the last examples.
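
The passage above does not say how a learning speed is derived from a prescribed relative decrease of the error function. The Python sketch below is one possible reading, not the patent's formula. It assumes a first-order approximation in which a step of size a along the negative gradient lowers the error by about a times the squared gradient norm, so that a relative decrease r is obtained with a = r·E / ‖g‖². The names `learning_speed`, `SpeedSchedule`, `rel_decrease` and `n_adaptive` are hypothetical.

```python
def learning_speed(error, gradient, rel_decrease, eps=1e-12):
    """Learning speed giving a first-order relative error decrease (assumed rule).

    With weights moved by a * (negative gradient), the error changes by about
    -a * ||g||^2, so asking for delta_E = -rel_decrease * error gives the value
    returned here. eps guards against a zero gradient.
    """
    sq_norm = sum(g * g for g in gradient)
    return rel_decrease * error / (sq_norm + eps)


class SpeedSchedule:
    """Adapt the learning speed for the first examples, then freeze it."""

    def __init__(self, rel_decrease, n_adaptive):
        self.rel_decrease = rel_decrease
        self.n_adaptive = n_adaptive  # hypothetical count of "first examples"
        self.count = 0
        self.a = 0.0

    def next(self, error, gradient):
        if self.count < self.n_adaptive:
            self.a = learning_speed(error, gradient, self.rel_decrease)
        self.count += 1
        return self.a  # frozen once count reaches n_adaptive
```

Once `n_adaptive` examples have been seen, `next` keeps returning the last computed value, which corresponds to freezing the learning speed for the last examples.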

The invention also relates to a multi-layer connectionist network comprising at least three layers of neurons, each neuron comprising a device for computing the neural potential associated with a device for computing a nonlinear function, characterized in that it comprises a learning speed computation device that receives as input information from all the neurons of the network and delivers as output a learning speed optimized at each example according to the value of the error function.

Other features and advantages of the invention will become clear from the following description, given by way of non-limiting example and with reference to the appended figures, which show:
- FIG. 1, a block diagram of a neuron model according to the prior art;
- FIG. 2, the diagram of a multi-layer neural network;
- FIG. 3, a flowchart of the backpropagation algorithm;
- FIG. 4, a block diagram of a neuron model according to the invention.

FIG. 1 shows an example of a circuit modeling a neuron, according to the prior art. A neuron J can be modeled by a weighted summer, 10, associated with a device for computing a function, 20. The values at the inputs $O_1, O_2, \ldots, O_i, \ldots, O_n$ of the weighted summer 10 are multiplied by weighting coefficients $W_{1j}, W_{2j}, \ldots, W_{ij}, \ldots, W_{nj}$ respectively, then added to form an intermediate value $X_j$, called the potential of neuron J, to which a corresponding function computation $F(X_j)$ is applied, yielding the value of the output $O_j$ of this neuron J. The function $F(X_j)$ is preferably a nonlinear hyperbolic tangent function:

$O_j = F(X_j) = \tanh(X_j / 2)$

This hyperbolic tangent function yields an output value $O_j$ that tends toward +1 or -1 as the intermediate values $X_j$ increase in absolute value.
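
As an illustration of the neuron model of FIG. 1, the following minimal Python sketch computes the potential as a weighted sum and applies the function tanh(X/2). The function name `neuron_output` and the list-based representation are choices made for this example only.

```python
import math


def neuron_output(inputs, weights):
    """Return (X_j, O_j): the potential and the output of one neuron.

    inputs  : values O_1 .. O_n presented to the weighted summer
    weights : weighting coefficients W_1j .. W_nj
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    potential = sum(o * w for o, w in zip(inputs, weights))  # weighted summer 10
    return potential, math.tanh(potential / 2.0)             # function device 20
```

For a large positive potential the output approaches +1, and for a large negative potential it approaches -1.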

FIG. 2 shows the diagram of a multi-layer neural network. This neural network is organized in several levels, the outputs of one level forming the inputs of the next level. Characteristic values computed on examples are presented to the inputs of the neurons of the first level of the network, and the weighting coefficients evolve during learning until, for all the examples of a group of data having similar characteristics, the neurons of the last level of the network deliver a value of +1 on the output of the neuron corresponding to this group of data and values of -1 on the outputs of all the other neurons.

The neural network shown in FIG. 2 is a threshold network, that is, it comprises on each of its levels, except the output level, an additional neuron called the threshold neuron, NS, whose state always remains equal to 1 and which adds an extra degree of freedom to the body of information formed by the weighting coefficients, called "weights". This extra degree of freedom allows the network to learn more easily and more quickly.

The network of FIG. 2 comprises three levels:
- the first level comprises a number of neurons N1 equal to the number of inputs, that is, to the number of characteristic values computed for each group of data, and in addition a threshold neuron that is not connected to the inputs;
- the second level comprises a number N2 of neurons lying between N1 and the number N3 of neurons of the last level. Each neuron of the second level has (N1+1) inputs connected to the (N1+1) outputs of the neurons of the first level and of the threshold neuron. The second level further comprises a threshold neuron that is not connected to the neurons of the first level;
- the third level comprises a number of neurons N3 equal to the number of groups of data. The inputs of the neurons of the last level are connected to the (N2+1) outputs of the neurons of the second level. This level does not include a threshold neuron.
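
The three-level threshold network described above can be sketched in Python as follows. The sketch assumes that the first-level neurons simply pass the characteristic values through, and it represents each threshold neuron as a constant 1 appended to the outputs of its level. The names `layer_forward` and `network_forward` are hypothetical.

```python
import math


def layer_forward(outputs_below, weights):
    """Compute the outputs of one level.

    outputs_below : outputs of the previous level, without its threshold neuron
    weights       : one row per neuron of this level; each row holds
                    len(outputs_below) + 1 coefficients, the last one
                    weighting the threshold neuron (state always 1)
    """
    extended = list(outputs_below) + [1.0]  # append the threshold neuron NS
    result = []
    for row in weights:
        potential = sum(w * o for w, o in zip(row, extended))
        result.append(math.tanh(potential / 2.0))
    return result


def network_forward(features, w_hidden, w_output):
    """Propagate N1 characteristic values through the N2 and N3 levels."""
    hidden = layer_forward(features, w_hidden)  # N2 neurons, N1 + 1 inputs each
    return layer_forward(hidden, w_output)      # N3 neurons, N2 + 1 inputs each
```

With this layout, `w_hidden` has N2 rows of N1+1 coefficients and `w_output` has N3 rows of N2+1 coefficients, matching the connection counts given in the text.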

The neural network is trained with the backpropagation algorithm as follows. The weighting coefficients $W_{ij}$ are first initialized, for example using a uniform probability law $P(W_{ij})$ over an interval $[-M, +M]$.
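
A minimal Python sketch of this initialization is given below. The function name `init_weights` and the extra column reserved for the threshold neuron are assumptions of this example.

```python
import random


def init_weights(n_neurons, n_inputs, m, rng=None):
    """Draw every weighting coefficient uniformly in [-m, +m].

    Each of the n_neurons rows has n_inputs + 1 coefficients, the last one
    being reserved for the threshold neuron of the level below (assumption).
    """
    rng = rng or random.Random()
    return [[rng.uniform(-m, m) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]
```

Passing a seeded `random.Random` instance makes the draw reproducible, which is convenient when comparing training runs.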

Learning then consists in modifying the weighting coefficients $W_{ij}$, which proceeds as follows: at a given iteration rank n, an example is drawn from the set of examples and the characteristic values computed for this example are applied to the inputs of the neural network. The corresponding outputs of the network are then computed. Since this example is characteristic of a group of data, at the end of learning it must give a value equal or close to +1 at the output of the neuron associated with this group, the outputs of the other neurons of the last level giving negative values as close to -1 as possible. The output values $O_j$ of the network are therefore compared with the desired output values.

The error on an output, denoted $E_j$, with respect to the desired output $S_j$ is measured by the value $E_j = \tfrac{1}{2}(O_j - S_j)^2$, and the total output error $E$ is the sum of the errors $E_j$ thus computed over all the outputs, from j = 1 to N3.
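
The error measure can be sketched in Python as follows, reading the per-output error as half the squared difference between the output obtained and the desired output. The function name `output_errors` is hypothetical.

```python
def output_errors(outputs, targets):
    """Return (per-output errors E_j, total error E) for one example.

    outputs : values O_j delivered by the last level
    targets : desired values S_j (+1 for the right group, -1 elsewhere)
    """
    per_output = [0.5 * (o - s) ** 2 for o, s in zip(outputs, targets)]
    return per_output, sum(per_output)
```

For example, an output of -1 where +1 is desired contributes an error of 2, while a correct output contributes 0.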

A mean squared error function $E_q$ is then defined as the mean value of the total error $E$, and the weighting coefficients $W_{ij}$ are modified so as to reduce the value of the error function, taking into account the derivative of the mean squared error function $E_q$ weighted by a parameter $a$ corresponding to a learning speed, as follows:

(1) $\Delta W_{ij}(n) = a \, g_{ij}(n)$

where $g_{ij}(n)$ is the negative of the error gradient:

(2) $g_{ij} = -\dfrac{\partial E_q}{\partial W_{ij}}$

Expanding expression (2), $g_{ij}$ can also be expressed as the mean value of the quantity $d_j O_i$, where $d_j = -\partial E / \partial X_j$. When neuron J is located on the output layer of the neural network, the expression of $d_j$ is easily computed from the error function and is written $d_j = -(O_j - S_j)\, F'(X_j)$. When neuron J is located on another layer, the expression of $d_j$ is written $d_j = \left(\sum_k d_k W_{jk}\right) F'(X_j)$, where $k$ is an index running over all the neurons located on the layers above that of neuron J. Thus $d_j$ is a coefficient proportional to the slope of the nonlinear function and depends on the state of the neurons located on the layer above neuron J. Since the examples are presented to the inputs of the neural network in a random order, $g_{ij}$ can be approximated by low-pass filtering the quantity $d_j O_i$ with a filter of coefficient $b$.
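
The two expressions of the coefficient d_j can be sketched in Python as follows. The sketch assumes F(X) = tanh(X/2), whose derivative is (1 - F(X)²)/2, and takes the output-layer sign from the definition of d_j as the negative derivative of the error with respect to the potential. The function names are hypothetical.

```python
import math


def f_prime(x):
    """Derivative of F(x) = tanh(x / 2), equal to (1 - F(x)^2) / 2."""
    return 0.5 * (1.0 - math.tanh(x / 2.0) ** 2)


def output_deltas(outputs, targets, potentials):
    """d_j on the output layer: -(O_j - S_j) * F'(X_j)."""
    return [-(o - s) * f_prime(x)
            for o, s, x in zip(outputs, targets, potentials)]


def hidden_deltas(upper_deltas, upper_weights, potentials):
    """d_j on an inner layer: (sum over k of d_k * W_jk) * F'(X_j).

    upper_weights[k][j] holds W_jk, the weight from neuron j of this layer
    to neuron k of the layer above.
    """
    deltas = []
    for j, x in enumerate(potentials):
        back = sum(d_k * upper_weights[k][j]
                   for k, d_k in enumerate(upper_deltas))
        deltas.append(back * f_prime(x))
    return deltas
```

Because F'(X) shrinks toward 0 when |X| is large, both expressions give very small coefficients for a saturated neuron.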

This modification of the weighting coefficients makes the weights $W_{ij}$ vary in the direction opposite to the error gradient while taking into account the variations made at the previous iteration, that is, by performing a low-pass filtering that avoids oscillations and preserves the mean value of the weighting coefficient.
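
The text does not give the exact form of the low-pass filter. The Python sketch below assumes a first-order recursive filter in which the new estimate is (1 - b) times the current product d_j·O_i plus b times the previous estimate, followed by a weight change equal to a times the filtered estimate. This filter form and the name `filtered_update` are assumptions of this example.

```python
def filtered_update(weights, g_prev, deltas, inputs, a, b):
    """Apply one filtered weight modification to one level.

    weights : rows W_j of the level, one row per neuron
    g_prev  : previous filtered estimates g_ij, same shape as weights
    deltas  : coefficients d_j, one per neuron
    inputs  : outputs O_i feeding the level (threshold neuron included)
    a, b    : learning speed and filter coefficient
    Returns (new_weights, new_g).
    """
    new_weights, new_g = [], []
    for w_row, g_row, d_j in zip(weights, g_prev, deltas):
        # Assumed first-order low-pass filter of the quantity d_j * O_i.
        g_row_new = [(1.0 - b) * d_j * o_i + b * g
                     for g, o_i in zip(g_row, inputs)]
        new_g.append(g_row_new)
        new_weights.append([w + a * g for w, g in zip(w_row, g_row_new)])
    return new_weights, new_g
```

With b = 0 the update reduces to a plain step along d_j·O_i, while values of b close to 1 give more weight to the previous variations.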

Once these coefficients have been modified, a new example is presented to the inputs of the neural network, and the same operation is repeated several times over all the available examples until the outputs of the neural network are always in the state corresponding to the group of data considered, that is, until, whatever the example considered, the outputs are close to -1 for all those that do not correspond to the example and close to +1 for the one that does.

The learning parameter $a$ and the filtering parameter $b$ are chosen by the operator.

When the learning phase is complete, the neural network is able to recognize which group of data the presented examples belong to, and the weighting coefficients are frozen and saved.

To summarize, and with reference to FIG. 3, the learning phase according to the backpropagation algorithm comprises the following steps.
The first step, 1, initializes the weighting coefficients, also called weights, and then initializes a parameter denoted "n", which corresponds to the number of examples already presented to the inputs of the neural network.

In a second step, 2, an example is drawn at random from all the available examples and presented to the inputs of the neural network. Each neuron then computes its potential and the value of the nonlinear function in a step 3, computes the error function in a step 4, and modifies its weights in a step 5 so as to reduce the value of the error function.

The number of examples n is then incremented and a test is performed in step 6: if n is equal to the number of available examples, the process moves on to the next step; otherwise another example is drawn at random and the weights of the network are modified again to reduce the new value of the error function, and so on until the examples are exhausted.

In the next step, 7, once all the examples have been presented at least once, a test is performed to check whether or not all the examples have been correctly classified.

If this is not the case, the parameter n is reset to zero and the whole process is repeated until all the examples presented to the inputs of the neural network are correctly classified.

When this is the case, the state of the network, that is, the values of the weights, is saved in a final step 8, and the neural network can then be used to recognize a new group of data.
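
The steps of FIG. 3 can be sketched in Python as the loop below. It is a simplified illustration rather than the flowchart itself, and it makes several assumptions: a three-level threshold network, a fixed learning speed `a` without low-pass filtering, a random order obtained by shuffling the examples at each pass, a decision rule that picks the largest output, and a safety cap `max_passes`. All names are hypothetical.

```python
import math
import random


def f(x):
    """Nonlinear function F(X) = tanh(X / 2)."""
    return math.tanh(x / 2.0)


def forward(features, w_hid, w_out):
    """Return (hidden outputs, network outputs); a constant 1 is the threshold neuron."""
    hid = [f(sum(w * o for w, o in zip(row, features + [1.0]))) for row in w_hid]
    out = [f(sum(w * o for w, o in zip(row, hid + [1.0]))) for row in w_out]
    return hid, out


def classify(features, w_hid, w_out):
    """Assumed decision rule: index of the largest output."""
    out = forward(list(features), w_hid, w_out)[1]
    return max(range(len(out)), key=lambda k: out[k])


def train(examples, n_hidden, a=0.5, m=0.5, max_passes=200, seed=0):
    """examples: list of (features, class_index). Returns (w_hid, w_out, passes, done)."""
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    n_out = 1 + max(cls for _, cls in examples)
    # Step 1: initialise the weights uniformly in [-m, +m].
    w_hid = [[rng.uniform(-m, m) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-m, m) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    passes, done = 0, False
    while not done and passes < max_passes:
        passes += 1
        order = list(examples)
        rng.shuffle(order)
        n = 0  # number of examples presented in this pass
        while n < len(order):  # Step 6: stop when n reaches the example count
            x, cls = list(order[n][0]), order[n][1]   # Step 2: next random example
            hid, out = forward(x, w_hid, w_out)       # Step 3: potentials and outputs
            target = [1.0 if k == cls else -1.0 for k in range(n_out)]
            # Step 4: error terms d_j, using F'(X) = (1 - O^2) / 2.
            d_out = [-(o - s) * 0.5 * (1.0 - o * o) for o, s in zip(out, target)]
            d_hid = [sum(d_out[k] * w_out[k][j] for k in range(n_out))
                     * 0.5 * (1.0 - hid[j] ** 2) for j in range(n_hidden)]
            # Step 5: modify the weights along d_j * O_i, scaled by a.
            for k in range(n_out):
                for i, o_i in enumerate(hid + [1.0]):
                    w_out[k][i] += a * d_out[k] * o_i
            for j in range(n_hidden):
                for i, o_i in enumerate(x + [1.0]):
                    w_hid[j][i] += a * d_hid[j] * o_i
            n += 1
        # Step 7: test whether every example is correctly classified.
        done = all(classify(x, w_hid, w_out) == cls for x, cls in examples)
    return w_hid, w_out, passes, done  # Step 8: the caller may save the weights
```

The returned flag `done` tells the caller whether the classification test of step 7 was passed before the cap on the number of passes was reached.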

FIG. 4 shows a block diagram of a neuron model according to the invention.

The conventional backpropagation algorithm is intended for multi-layer neural networks, each neuron being modeled by a weighted summer 10 associated with a device for computing a nonlinear function 20. It has the following drawbacks:
- the learning speed of the network is difficult to adjust;
- it does not converge satisfactorily for certain examples, which are then difficult to learn. This problem is mainly due to a saturation phenomenon on the inner layer of the neural network;
- learning is sometimes disturbed and the neural network then becomes stuck in a bad configuration.

These drawbacks are overcome by the device shown in FIG. 4, in which the learning speed is computed automatically and in which saturation and disturbances are controlled automatically.

To this end, neuron J, intended to receive on its inputs a group of data represented by the vector $\vec{e}$ and having weighting coefficients, or weights, previously initialized and represented by the vector $\vec{W}_j$, comprises:
- a device for computing the neural potential, 40, receiving as input the vectors $\vec{e}$ and $\vec{W}_j$, the vector $\vec{W}_j$ being modified at each example by a weight computation device, 65, this device 40 being associated with a device for computing a nonlinear function, 50, which delivers the output value $O_j$ of neuron J;

- a norm computation device, 71, receiving the vector $\vec{e}$ as input and delivering the norm $\|\vec{e}\|$ of this vector as output. The norm $\|\vec{e}\|$ is then averaged in a low-pass filter, 72, whose output is connected to one input of a multiplication device, 74, which receives the norm $\|\vec{e}\|$ on another input and forms the product $\|\vec{e}\| \cdot \overline{\|\vec{e}\|}$ of the norm of the input vector and the mean value of this norm;

- a disturbance control device, 70, receiving as input, on the one hand, the product $\|\vec{e}\| \cdot \overline{\|\vec{e}\|}$ of the norm of the input vector and the mean value of this norm and, on the other hand, the coefficient $d_j$ computed in a $d_j$ computation device, 73, from information coming from the neurons of the upper layers, and delivering as output a coefficient denoted $d_j^*$;
- a saturation control device, 60, receiving on its inputs the negative of the error gradient, represented by the vector $\vec{G}_j$, the coefficient $d_j^*$ and the neural potential $X_j$, and delivering as output the vector $\vec{G}_j^*$, which is then averaged in a low-pass filter, 68. The vector $\vec{G}_j$ has previously been obtained by a product computation device, 75, which receives as input the input vector $\vec{e}$ and the coefficient $d_j^*$ and forms the product of $d_j^*$ by $\vec{e}$. The mean value of the negative of the error gradient delivered at the output of the low-pass filter 68 is then used by a device, 69, which receives as input the value of the learning speed computed by the network in a learning speed computation device, 80, and which delivers as output the variation of the weights, represented by the vector $\Delta\vec{W}_j$, this variation of the weights then being used by the weight computation device, 65.

The roles of the saturation control device 60, the disturbance control device 70 and the learning speed calculation device 80 are described below.

During the learning phase, for each example presented, the backpropagation algorithm modifies the weighting coefficients $W_{ij}$ of each neuron J as a function of the error gradient, and therefore in proportion to the slope of the hyperbolic tangent function $F(X_j)$. When, for an example represented by the input vector $\vec{e}$, the weighting coefficients are large, the weighted sum $X_j$ is large and the function $F(X_j)$ saturates: its slope is very small and the weighting coefficients are corrected very slowly. To limit the effects of saturation, each neuron J comprises a saturation control device 60 whose role is to forbid the weight modification device 65 from making modifications of the weighting coefficients that tend to increase the absolute value of the potential $X_j$ too much. To this end, the saturation control device 60, which receives as input the coefficients $g_{ij}$ (opposite of the error gradient) represented by the vector $\vec{G}_j$ and delivers as output modified coefficients represented by the vector $\vec{G}_j^*$, imposes a constraint such that the absolute value of the potential $|X_j|$ remains below a limit value $V_1$ when saturation tends to increase. If $\Delta X_j$ denotes the variation of the potential induced by the modifications $\Delta W_{ij}$ of the weighting coefficients (represented by the vector $\Delta\vec{W}_j$), saturation tends to increase when $X_j\,\Delta X_j > 0$, which is equivalent to $X_j\,d_j^* > 0$. The coefficient $d_j^*$ is the coefficient $d_j$ modified by a disturbance control device 70 described below. The function performed by the saturation control device 60 is then the following:

(3) if $|X_j| > V_1$ and $X_j\,d_j^* > 0$ then $\vec{G}_j^* = \vec{0}$; in all other cases $\vec{G}_j^* = \vec{G}_j$.
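Rule (3) of the saturation control device 60 can be illustrated by a minimal Python sketch. The function name `saturation_control`, the use of plain lists for the vector $\vec{G}_j$ and the default value of `V1` are assumptions made for illustration only; they are not taken from the patent.

```python
def saturation_control(G_j, X_j, d_star_j, V1=6.0):
    """Return G_j* according to rule (3).

    G_j      : list of coefficients g_ij (opposite of the error gradient)
    X_j      : neuronal potential of neuron J for the current example
    d_star_j : coefficient d_j* delivered by the disturbance control
    V1       : limit value on |X_j| (illustrative default)
    """
    # The update is forbidden only when the neuron is already beyond the
    # limit AND the update would push the potential further out.
    if abs(X_j) > V1 and X_j * d_star_j > 0:
        return [0.0] * len(G_j)
    # In all other cases the coefficients pass through unchanged.
    return list(G_j)
```

With this sketch, an update that would bring a saturated neuron back towards zero potential is still allowed, since only updates of the same sign as $X_j$ are blocked.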

For example, one can choose $V_1 = 6$, which corresponds to a variation by a factor of 100 in the slope of the nonlinear function.

A low-pass filter 68 with constant b, placed at the output of the saturation control device 60, computes a mean value of $\vec{G}_j^*$.
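The text does not specify the form of the low-pass filter 68. One common way to obtain a running mean with a single constant is a first-order recursive filter, sketched below in Python. The update rule `mean = b * mean + (1 - b) * x`, the class name `LowPass` and the zero initial state are assumptions, not details given in the patent.

```python
class LowPass:
    """First-order recursive averaging filter (assumed form)."""

    def __init__(self, b, size):
        self.b = b                  # filter constant, assumed 0 <= b < 1
        self.mean = [0.0] * size    # running mean, assumed to start at zero

    def update(self, x):
        """Blend the new vector x into the running mean and return it."""
        self.mean = [self.b * m + (1.0 - self.b) * xi
                     for m, xi in zip(self.mean, x)]
        return self.mean
```

Under this assumption, a value of b close to 1 averages over many examples, while a value close to 0 follows the most recent vector.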

The backpropagation algorithm has another drawback: it sometimes blocks the neural network in a bad configuration, especially when the learning speed has been poorly adjusted. When an example is presented to the neural network, the weighting coefficients are modified according to this example so as to reduce the value of the error function. If this modification has too strong an influence on the following examples, learning is disturbed. To overcome this drawback, the neuron J comprises a device 70 for controlling the disturbances of the neuronal potential, which receives as input the coefficient $d_j$ and delivers as output a modified coefficient $d_j^*$. The coefficient $d_j$ has previously been computed in a device 73 from the information transmitted by the neurons belonging to the layer above neuron J.

The role of this disturbance control device is to minimize the influence of the disturbance caused by one example on the potential corresponding to another example.

To this end, consider the modification $\Delta\vec{W}_j$ of the weight vector induced by a first input vector $\vec{e}$, and the influence $\Delta X'_j$ of this modification on the potential $X'_j$ corresponding to another input vector $\vec{e}'$. This influence can be estimated in absolute value by replacing the input vector $\vec{e}'$ with a vector of norm $\bar{e}$, parallel to $\vec{e}$ and in the same direction as $\vec{e}'$. The value $\bar{e}$ is a mean value of the norm of the input vectors; this norm $e$ is computed in the device 71 and its mean value $\bar{e}$ is obtained by a low-pass filter 72 with constant b. To limit the influence of $\Delta\vec{W}_j$ on the potential corresponding to another input, the value of $|\Delta X'_j|$ must be bounded, for example so that it is less than 1% of the peak-to-peak amplitude tolerated on the potential by the saturation control device 60, that is:

(3) $|\Delta X'_j|_{\max} = (2 \times V_1)/100$

The influence $\Delta X'_j$ of the weight modification on the potential is expressed as a function of the coefficient $d_j^*$ as follows:

(4) $|\Delta X'_j| = a\,|d_j^*|\,e\,\bar{e}$

The function to be performed by the disturbance control device 70 is then:

(5) $d_j^* = \operatorname{sign}(d_j) \times \min\bigl(2V_1/(100\,a\,e\,\bar{e}),\ |d_j|\bigr)$
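The clipping performed by the disturbance control device 70 in equation (5) can be sketched in Python as follows. The function name `disturbance_control`, its argument names and the handling of a zero denominator are illustrative assumptions; only the formula itself comes from the text.

```python
import math

def disturbance_control(d_j, a, e_norm, e_mean, V1=6.0):
    """Return d_j* according to equation (5).

    d_j    : coefficient computed from the upper layers
    a      : learning speed
    e_norm : norm e of the current input vector
    e_mean : mean value of that norm over the examples
    V1     : saturation limit (illustrative default)
    """
    denom = 100.0 * a * e_norm * e_mean
    if denom == 0.0:
        # Assumption: with no possible influence there is nothing to bound.
        return d_j
    bound = 2.0 * V1 / denom
    # Keep the sign of d_j and clip its magnitude to the bound.
    return math.copysign(min(bound, abs(d_j)), d_j)
```

When the bound is active, substituting the result into equation (4) gives $|\Delta X'_j| = 2V_1/100$, that is 1% of the peak-to-peak amplitude allowed by the saturation control.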
Finally, in the backpropagation algorithm, the parameter a, which corresponds to the learning speed of the neural network, is difficult to adjust. To overcome this drawback, the neural network comprises a device 80 for automatically calculating the learning speed, which receives as input global information from the network and delivers as output the learning parameter a. This learning parameter is then communicated to each neuron of the network.

Choosing a learning speed is not easy and often requires many trials before good convergence of the backpropagation algorithm is obtained. Too large a value of the parameter a causes irregular learning and strong oscillations of the error function, and the neural network generally becomes blocked in a local minimum.

On the other hand, too small a value of the parameter a causes very slow learning. The optimal learning speed depends on the size of the network and on the number of iterations of the learning phase.

It is therefore possible to fix a value k of relative decrease of the error function such that $\Delta E_q = -k\,E_q$.

Let $\vec{W}$ denote the vector containing all the weights $W_{ij}$ of the network and $\vec{g}$ the vector equal to the opposite of the error gradient, containing all the values $g_{ij}$ of the network. The weight variation when an example is presented to the network is then written $\Delta\vec{W} = a\,\vec{g}$, and the corresponding decrease of the error is written:

(6) $\Delta E_q = -\vec{g}\cdot\Delta\vec{W} = -a\,g^2$

The value of a needed to obtain the chosen decrease of the error is then given by the following equation:

(7) $a = k\,E_q / g^2$

However, problems often arise at the end of the learning phase, because the error gradient tends to zero.

The learning speed calculation device therefore computes the value of a according to equation (7) during the first iterations of the learning phase, which poses no problem when the weights have been correctly initialized, and then, after several iterations, freezes the value of a.
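The behaviour of the learning speed calculation device 80 can be sketched in Python as below: equation (7) is applied during the first iterations, then the value is frozen. The class name `LearningSpeed`, the parameter `freeze_after` (the text only says "after several iterations") and the choice to keep the previous value when the gradient is exactly zero are assumptions for illustration.

```python
class LearningSpeed:
    """Sketch of device 80: a = k * E_q / g^2, frozen after some steps."""

    def __init__(self, k, freeze_after):
        self.k = k                        # chosen relative decrease of E_q
        self.freeze_after = freeze_after  # hypothetical number of iterations
        self.step = 0
        self.a = 0.0

    def update(self, E_q, g):
        """Return the learning speed for the current example.

        E_q : current value of the error function
        g   : iterable of all g_ij (opposite of the error gradient)
        """
        if self.step < self.freeze_after:
            g2 = sum(gi * gi for gi in g)     # squared norm of g
            if g2 > 0.0:                      # assumption: keep a if g == 0
                self.a = self.k * E_q / g2    # equation (7)
        self.step += 1
        return self.a
```

Freezing avoids the division by a vanishing $g^2$ that equation (7) would otherwise produce near the end of learning.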

The present invention is not limited to the embodiments specifically described; in particular, for some applications, it is sufficient to control the learning speed automatically in order to obtain good convergence of the learning algorithm.

Claims

1 - Automatic learning method consisting in modifying weighting coefficients of a multi-layer neural network so that the output values of the neural network are representative of a group of data, characterized in that it consists in:

- successively presenting characteristic values of various examples, taken in random order, to the inputs of the neural network,

- calculating, for each example, a mean squared error function (Eq) defined as the mean value of the total output error between the output values (Oj) of the neural network and the desired output values,

- modifying, for each example, the weighting coefficients (W) of the neural network as a function of the error gradient weighted by a parameter (a) corresponding to a learning speed, so as to reduce the value of the error function,

- adjusting the learning speed (a) by fixing, for the first examples, a value (k) of relative decrease of the error function and by freezing the value of the learning speed (a) for the last examples.

2 - Learning method according to claim 1, characterized in that it further consists in imposing, for each example, an upper bound ($|\Delta X'_j|_{\max}$) on the influence of the modifications of the weighting coefficients due to the presentation of one example on a quantity, called the neuronal potential, corresponding to another example, in order to control the disturbances of this neuronal potential.

3 - Learning method according to one of claims 1 or 2, characterized in that it further consists in prohibiting, for each example, the modifications of the weighting coefficients which tend to increase the absolute value of the neuronal potential too much, in order to control the saturation of the network.

4 - Multi-layer connectionist network for implementing the method according to claim 1, comprising at least three layers of neurons, each neuron (J) comprising a device (40) for calculating the neuronal potential (Xj) associated with a device (50) for calculating a non-linear function (F(Xj)), characterized in that it comprises a learning speed calculation device (80) receiving as input information coming from all the neurons of the network and delivering as output a learning speed (a) optimized at each example according to the value of the error function.

5 - Multi-layer connectionist network according to claim 4, characterized in that, in addition, each neuron comprises:

- a weight modification device (65) receiving as input the output values of the saturation control device (60), averaged by a low-pass filter (68) and weighted by the learning speed (a), and delivering as output the modified values of the weighting coefficients,

- a disturbance control device (70) receiving as input coefficient values (dj) proportional to the derivative of the error with respect to the potential and delivering as output modified coefficient values (dj*) making it possible to minimize the influence of the modifications of the weighting coefficients due to the presentation of one example on the neuronal potential corresponding to another example.

6 - Multi-layer connectionist network according to one of claims 4 or 5, characterized in that, in addition, each neuron comprises a saturation control device (60) associated with a low-pass filter (68), receiving as input the values of the opposite of the error gradient vector and delivering as output either zero values, to prohibit the modification of the weighting coefficients, or values equal to the input values, to allow the modification of the weighting coefficients.