DE4138053A1

DE4138053A1 - Hybrid learning process for neural networks, e.g. for pattern recognition in speech processing - uses combination of stochastic and deterministic processes for optimising system

Info

Publication number: DE4138053A1
Application number: DE4138053A
Authority: DE
Inventors: Jochen Heistermann
Original assignee: Siemens AG
Current assignee: Siemens AG
Priority date: 1991-11-19
Filing date: 1991-11-19
Publication date: 1993-05-27

Abstract

A stochastic process is first used to optimise the weighting coefficients of the network. This is followed by a deterministic phase in which an error backpropagation method performs a local optimisation. The learning process can be accelerated by running it on a multiprocessor system. The first phase may use a genetic algorithm or the Metropolis algorithm. The second phase is preferably initiated when a value corresponding to the learning progress has dropped below a predetermined threshold. ADVANTAGE - Reduced processing time; can handle a large number of variables.

Description

In various fields of technology, such as pattern recognition or speech processing, algorithmic approaches generally fail. In many of these cases, the optimization problems that arise in these fields can be solved with the help of neural networks. Neural networks are not programmed; instead, they solve problems on their own by selectively changing their weight structure. In neural networks, this autonomous changing of weights is generally referred to as learning.

The known learning methods for neural networks can in principle be classified as stochastic or deterministic. An important representative of the stochastic learning methods is simulated annealing (Metropolis, N.; Rosenbluth, A.; Rosenbluth, M.; Teller, A.; Teller, E.: Equation of State Calculations by Fast Computing Machines; Journal of Chemical Physics, Vol. 21, 1953), which is used, for example, in a neural network known as the Boltzmann machine (Hinton, G.E.; Sejnowski, T.J.; Ackley, D.H.: Boltzmann Machines: Constraint Satisfaction Networks that Learn; Technical Report CMU-CS-84-119, Carnegie-Mellon University, 1984). One of the best-known deterministic learning methods for neural networks is the error backpropagation method, which is essentially based on gradient descent.

An advantage of stochastic learning methods is that they can optimize even non-differentiable objective functions with a large number of local optima. The success of such methods is also largely independent of the choice of particular starting solutions. On the other hand, stochastic learning methods have the significant disadvantage that they are generally very computationally intensive and therefore often require unacceptably long periods of time to run.

Deterministic learning methods such as error backpropagation, by contrast, can usually be carried out in comparatively short periods of time and with comparatively little computational effort. However, these methods have the disadvantages that their results depend on the choice of suitable starting values, that they require a differentiable objective function, and that they are generally not suited to correctly optimizing objective functions with a large number of local optima.

The object of the invention is to provide a learning method for artificial neural networks that avoids the disadvantages of both types of method while combining their advantages in a single method, in particular one that can optimize even non-differentiable objective functions in the presence of many local optima with reasonable computational effort and within reasonable periods of time. According to the invention, this object is achieved by a learning method for artificial neural networks having the features of claim 1.

This learning method is characterized in that it runs in two successive phases, a stochastic optimization method being used in the first phase and a deterministic optimization method in the second. It is advantageous to use a genetic optimization method or the Metropolis algorithm in the first phase. For the second phase, by contrast, gradient methods such as the backpropagation algorithm are particularly suitable. It is particularly advantageous to initiate the transition from the first to the second phase as soon as a measure of the learning progress during the first phase has fallen below a predetermined threshold. Such a measure of learning progress can particularly advantageously be derived from a measure of the similarity of the individuals in a population of neural networks. The similarity of two neural networks is preferably determined from a measure of the distance between their weight vectors.
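As an illustration of the Metropolis algorithm named above as one option for the first phase, the following is a minimal sketch, not the patent's implementation. The function names, the Gaussian proposal, the fixed temperature, and the toy error function in the usage note are all assumptions made for this example.

```python
import math
import random


def metropolis_accept(delta_error, temperature, rng):
    """Metropolis rule: always accept an improvement; accept a worsening
    step with probability exp(-delta_error / temperature)."""
    if delta_error <= 0:
        return True
    return rng.random() < math.exp(-delta_error / temperature)


def metropolis_search(error_fn, weights, steps, temperature, step_size, rng):
    """Random walk over a weight vector using the Metropolis acceptance rule.

    error_fn is a hypothetical callable mapping a weight list to a scalar
    error; the Gaussian perturbation is an assumed proposal distribution.
    """
    current = list(weights)
    current_error = error_fn(current)
    best, best_error = list(current), current_error
    for _ in range(steps):
        candidate = [w + rng.gauss(0.0, step_size) for w in current]
        candidate_error = error_fn(candidate)
        if metropolis_accept(candidate_error - current_error, temperature, rng):
            current, current_error = candidate, candidate_error
            if current_error < best_error:
                best, best_error = list(current), current_error
    return best, best_error
```

A call such as `metropolis_search(lambda w: sum(x * x for x in w), [3.0, -2.0], 200, 0.1, 0.3, random.Random(0))` returns the best weight vector seen and its error; in the hybrid scheme that vector would serve as the starting point for the second phase.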

The learning method according to the invention runs particularly advantageously on a multiprocessor system in which autonomous populations are processed by different processors. Individuals or subpopulations can be exchanged between the processors at certain times or when certain events occur. Preferably, the probability of such a migration of individuals or subpopulations between different processors increases monotonically with time. In another variant of the learning method, a further, central processor selects offspring of the populations and distributes them to processors of the multiprocessor system. Finally, it has proven particularly advantageous to run the learning method in such a way that chromosomes or genes of individuals are implemented as vectors and genetic operations are executed in parallel through vectorization.

Further advantageous embodiments of the invention emerge from the dependent claims.

The invention is described further below with reference to a preferred embodiment.

The hybrid learning method for neural networks according to the invention consists of two parts. In a first phase, a stochastic optimization method, starting from a preferably random initialization of the weight coefficients, finds a nearly optimal solution. In a subsequent second phase, a deterministic optimization method, preferably an error backpropagation algorithm, starts from the best solution found by the stochastic optimization method and seeks out a local optimum.
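The two-phase structure can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: a toy multimodal error function stands in for a network's training error, plain gradient descent with an analytic gradient stands in for backpropagation, and the genetic operators (truncation selection, one-point crossover, Gaussian mutation) and all parameter values are choices made for this example.

```python
import math
import random


def error(w):
    # Toy error surface with many local minima (assumed stand-in for a
    # neural network's training error as a function of its weights).
    return sum(x * x + 2.0 * (1.0 - math.cos(3.0 * x)) for x in w)


def gradient(w):
    # Analytic gradient of the toy error function above.
    return [2.0 * x + 6.0 * math.sin(3.0 * x) for x in w]


def genetic_phase(pop_size, dim, generations, rng):
    """Phase 1: stochastic search starting from random weight vectors."""
    pop = [[rng.uniform(-4.0, 4.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=error)
        parents = pop[: pop_size // 2]            # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim)           # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([x + rng.gauss(0.0, 0.1) for x in child])  # mutation
        pop = parents + children
    return min(pop, key=error)                    # best individual found


def gradient_phase(w, learning_rate, steps):
    """Phase 2: deterministic local optimization by gradient descent."""
    w = list(w)
    for _ in range(steps):
        w = [x - learning_rate * g for x, g in zip(w, gradient(w))]
    return w


def hybrid_learn(seed=0):
    rng = random.Random(seed)
    start = genetic_phase(pop_size=20, dim=4, generations=30, rng=rng)
    refined = gradient_phase(start, learning_rate=0.02, steps=200)
    return start, refined
```

Calling `hybrid_learn()` returns the best individual from the first phase and its locally refined version. The learning rate is kept small relative to the curvature of the toy function so that the second phase does not increase the error.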

An important problem is switching between the two phases at the right moment. One advantageous option is to let the genetic optimization method run until the population remains localized in a small region of the search space; in such a situation the individuals of the population become increasingly similar to one another. The similarity of the individuals of a population can be measured, for example, by applying a suitable distance measure, such as the difference, to the values of their genes. If the individuals consist of bit sequences, for example, the Hamming distance can advantageously be used for this purpose.
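One possible way to turn this switching criterion into code is sketched below. The text specifies only that a distance measure between individuals is used; averaging the distance over all pairs and comparing it with a threshold are assumptions of this example, as are the function names.

```python
import math
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Distance between two real-valued weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pairwise_distance(population, distance):
    """Average distance over all pairs; needs at least two individuals."""
    pairs = list(combinations(population, 2))
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def should_switch(population, distance, threshold):
    """Assumed rule: switch to phase two once the population has become
    similar enough, i.e. its mean pairwise distance is below the threshold."""
    return mean_pairwise_distance(population, distance) < threshold
```

For the bit-string population `["0000", "0001", "0011"]` the pairwise Hamming distances are 1, 2 and 1, so the mean is 4/3 and `should_switch(..., hamming_distance, 2)` returns `True`, while a threshold of 1 returns `False`.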

Using a suitable multiprocessor system, the learning method can be accelerated decisively through parallelization. Various approaches are suitable for this:

A coarse-grained parallelization can be achieved by having several autonomous populations, instead of a single population, processed by different processors of a multiprocessor system. Such a system needs no central control unit, because the individual populations develop independently of one another. The populations grow up in parallel and explore different parts of the search space. Because of their mutual isolation, no horizontal transfer of information takes place between the individual populations. To avoid any disadvantages that may result from this, it is advantageous to exchange individuals or subpopulations between the processors at certain times or when certain events occur. Examples of such events are the appearance of particularly well-adapted individuals within a subpopulation or an increasing similarity of the individuals within an independent population. By suitably controlling the probability of migration of individuals between populations, the method can be made to couple the populations more strongly or more weakly. A good heuristic is to let each population develop separately at first and then slowly increase the migration probability of the individuals until, finally, all populations on the individual processors behave like a single large population.
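The migration heuristic can be illustrated with a small single-process sketch. The linear probability schedule, the ring topology, and the policy of replacing the receiving population's worst individual with a copy of the sender's best are all assumptions made for this example; the text requires only that the migration probability grow over time. Each inner list stands in for the population held by one processor.

```python
import random


def migration_probability(generation, total_generations, p_max=1.0):
    """Assumed schedule: probability rises linearly from 0 to p_max and
    therefore increases monotonically with the generation count."""
    return p_max * min(1.0, generation / total_generations)


def migrate(islands, error_fn, probability, rng):
    """With the given probability, each island sends a copy of its best
    individual to the next island in a ring (assumed topology), where it
    replaces that island's worst individual."""
    # Pick all emigrants first so that arrivals in this round are not re-sent.
    emigrants = [min(island, key=error_fn) for island in islands]
    for i, emigrant in enumerate(emigrants):
        if rng.random() < probability:
            target = islands[(i + 1) % len(islands)]
            worst = max(range(len(target)), key=lambda k: error_fn(target[k]))
            target[worst] = list(emigrant)
```

In a training loop, `migrate(islands, error_fn, migration_probability(g, total), rng)` would be called once per generation `g`, so the populations start out isolated and become increasingly coupled.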

At an intermediate level, a further kind of parallelization is possible with a multiprocessor system that contains an additional, central processor which selects offspring of the populations and distributes them to different processors of the system. Finally, at a lower level, a further kind of parallelization is possible by implementing chromosomes or genes of individuals as vectors and executing genetic operations in parallel through vectorization.

The method according to the invention combines the advantages of both stochastic and deterministic learning methods while avoiding the disadvantages of both types. The described method has been applied successfully to the problem of speech sound recognition in the field of speech processing and compared with known standard learning methods. These experiments clearly demonstrated the superiority of the described hybrid learning method.

Claims

1. Learning method for artificial neural networks, in which a stochastic optimization method is used in a first phase and a deterministic optimization method is used in a subsequent second phase.

2. Learning method according to claim 1, in which a genetic optimization method is used in the first phase.

3. Learning method according to one of the preceding claims, in which the Metropolis algorithm is used in the first phase.

4. Learning method according to one of the preceding claims, in which gradient descent is used in the second phase.

5. Learning method according to one of the preceding claims, in which a backpropagation algorithm is used in the second phase.

6. Learning method according to one of the preceding claims, in which the transition from the first to the second phase takes place as soon as a measure of the learning progress during the first phase has fallen below a predetermined threshold.

7. Learning method according to claims 2 and 6, in which the measure of learning progress is derived from a measure of the similarity of the individuals in a population of neural networks.

8. Learning method according to claim 7, in which the similarity of two neural networks is determined from a measure of the distance between their weight vectors.

9. Learning method according to claim 2, or one of claims 4 to 8 in conjunction with claim 2, in which autonomous populations are processed by different processors of a multiprocessor system.

10. Learning method according to claim 9, in which individuals or subpopulations are exchanged between the processors at certain times or when certain events occur.

11. Learning method according to claim 10, in which the probability of a migration of individuals or subpopulations between different processors increases monotonically with time.

12. Learning method according to one of claims 9 to 11, in which a further, central processor selects offspring of the populations and distributes them to different processors of the multiprocessor system.

13. Learning method according to claim 2, or one of claims 4 to 12 in conjunction with claim 2, in which chromosomes or genes of individuals are implemented as vectors and in which genetic operations are executed in parallel through vectorization.