DE102020204544A1

DE102020204544A1 - Computer-implemented method and device for control with an artificial neural network

Info

Publication number: DE102020204544A1
Application number: DE102020204544.5A
Authority: DE
Inventors: Volker Fischer
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2019-04-26
Filing date: 2020-04-08
Publication date: 2020-10-29
Also published as: CN111860762A

Abstract

Computer-implemented method for operating an image classifier with an artificial neural network, characterized in that an output variable of the artificial neural network is determined (608), wherein the output variable is determined (606) as a function of input data for the artificial neural network and as a function of an activation function of the artificial neural network and characterizes a classification of the input data, wherein the activation function is at least piecewise continuously differentiable and monotonically increasing, wherein the activation function has at least three fixed points, wherein the number of fixed points of the activation function is either finite and odd or countable and discrete, and wherein a derivative of the activation function is alternately monotonically increasing and monotonically decreasing in regions between adjacent fixed points.

Description

State of the art

The invention is based on a computer-implemented method and a device for control with an artificial neural network. Artificial neural networks store information by means of which input variables, for example from sensors, can be mapped onto output variables for the control.

Activation functions are an essential component of deep artificial neural networks. In certain architectures, and particularly in recurrent architectures, they lead to exploding or vanishing activations. In this context, exploding or vanishing activation means that the states of individual units, i.e. neurons, of a neural network assume numerical values that lead to numerically unstable behavior of the neural network, both during training and during inference, i.e. in the application phase. During training, numerically exploding or vanishing activations lead to numerically unstable gradients in the gradient method used to determine the parameters of the artificial neural network. These exploding or vanishing gradients can prevent the model, i.e. the parameters stored in the artificial neural network, from being learned any further. The information stored in the artificial neural network then becomes unusable, and the result can be undesired control. In addition, exploding or vanishing activations during control, i.e. in the inference phase, can in themselves lead to undesired behavior of the system. This is all the more the case with recurrent neural networks, since their repeated feedback calculations amplify unstable behavior.
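The effect described above can be illustrated with a single self-connected unit. The following is a minimal sketch and is not taken from the patent: the helper name `iterate_state`, the weights 0.5 and 1.5, and the use of tanh and ReLU as conventional activation functions are assumptions chosen only to make the two regimes visible.

```python
import math


def iterate_state(activation, weight, h0, steps):
    """Repeatedly apply h <- activation(weight * h), like one unit fed back onto itself."""
    h = h0
    for _ in range(steps):
        h = activation(weight * h)
    return h


def relu(x):
    """Conventional rectified linear unit, used here only for comparison."""
    return max(0.0, x)


# Hypothetical weights: one below one and one above one.
vanishing = iterate_state(math.tanh, 0.5, 1.0, 50)  # state shrinks toward zero
exploding = iterate_state(relu, 1.5, 1.0, 50)       # state grows like 1.5 ** 50

print(f"tanh, weight 0.5: {vanishing:.3e}")
print(f"relu, weight 1.5: {exploding:.3e}")
```

After 50 feedback steps the first state is numerically indistinguishable from zero and the second has grown by many orders of magnitude, which is the instability the paragraph above attributes to repeated feedback calculations.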
It is desirable to provide an improved control with an artificial neural network.

Disclosure of the invention

This is achieved by the subject matter of the independent claims.

A computer-implemented method for operating an image classifier with an artificial neural network, in particular a convolutional neural network (CNN), is characterized in that an output variable of the artificial neural network is determined (608), wherein the output variable is determined as a function of input data for the artificial neural network and as a function of an activation function of the artificial neural network and characterizes a classification of the input data, wherein the activation function is at least piecewise continuously differentiable and monotonically increasing, wherein the activation function has at least three fixed points, wherein the number of fixed points of the activation function is either finite and odd or countable and discrete, and wherein a derivative of the activation function is alternately monotonically increasing and monotonically decreasing in regions between adjacent fixed points.

The input data can be image data, which can comprise, for example, frames of an output signal of a video sensor, a radar sensor, a LiDAR sensor or an ultrasonic sensor.

A classification can here also be understood to mean a semantic segmentation, i.e. a region-wise, in particular pixel-wise, classification, and/or a detection, i.e. a classification of whether or not an object was recognized in an image section.

A computer-implemented method for control as a function of the determined output variable can also provide that a robot, a vehicle, a household appliance, a tool, a production machine, an assistance system for a person, an access control system or a system for transmitting information is controlled with a control variable, wherein the control variable is determined as a function of the output variable.

In the case of a recurrent neural network that has, in a layer, feedback from individual units of the recurrent neural network onto themselves, the activation function is applied iteratively to the internal state of the layer. In particular when the units of the layer are fully connected to one another and inputs from other layers of the recurrent neural network are also provided, this activation function avoids the occurrence of vanishing or exploding gradients. A fixed point is understood to mean a point at which the activation function maps a value x from the domain of the activation function onto the same value x in the range of the activation function. For example, the activation function has a first fixed point F₁ at a first value x₁ of its domain, a second fixed point F₂ at a second value x₂ of its domain, and a third fixed point F₃ at a third value x₃ of its domain. In this example, the first value x₁ is smaller than the second value x₂, and the second value x₂ is smaller than the third value x₃. The derivative of the activation function is in this example monotonically increasing in a first region between the first value x₁ and the second value x₂ and monotonically decreasing in a second region between the second value x₂ and the third value x₃. In this example, the first region and the second region are adjacent. This means that the activation function has no further fixed point between the first value x₁ and the second value x₂ and no further fixed point between the second value x₂ and the third value x₃.
This class of activation functions has the advantage over conventional activation functions that a gradient of the activation function is for the most part different from zero. This class of activation functions is advantageous in particular for an application phase, i.e. an inference phase, of recurrent neural networks, since these activation functions exhibit stable, converging behavior, particularly under repeated application, i.e. repeated feedback calculation. In addition, this class of activation functions improves the learning speed, in particular during training. The fixed points are easy to specify, which facilitates an iterative choice of the activation function, in particular during training.
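One function that satisfies these conditions on the interval from zero to one is the cubic 3x² − 2x³. It is used here purely as an illustrative assumption; the text does not name a specific formula. Its fixed points are 0, 0.5 and 1, and its derivative 6x(1 − x) rises between 0 and 0.5 and falls between 0.5 and 1. The sketch below, with the hypothetical names `smooth_activation` and `iterate`, shows how repeated application settles on one of the outer fixed points.

```python
def smooth_activation(x):
    """Illustrative cubic 3x^2 - 2x^3 on [0, 1]; an assumption, not the patent's formula."""
    return 3.0 * x * x - 2.0 * x * x * x


def iterate(f, x0, steps):
    """Apply f repeatedly, as a unit with feedback onto itself would."""
    x = x0
    for _ in range(steps):
        x = f(x)
    return x


# The three fixed points satisfy f(x) == x.
for fixed_point in (0.0, 0.5, 1.0):
    assert abs(smooth_activation(fixed_point) - fixed_point) < 1e-12

# States below the middle fixed point settle at 0, states above it settle at 1.
low = iterate(smooth_activation, 0.4, 30)
high = iterate(smooth_activation, 0.6, 30)
print(low, high)
```

In this sketch the middle fixed point repels and the two outer fixed points attract, so the iterated state stays bounded instead of vanishing or exploding.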

It is preferably provided that the derivative of the activation function is less than one for values of the domain that are smaller than the smallest value of the domain for which a fixed point of the activation function is defined, or that the derivative of the activation function is less than one for values of the domain that are greater than the largest value of the domain for which a fixed point of the activation function is defined. This class of activation functions is particularly suitable for replacing conventional sigmoid functions.

It is preferably provided that the derivative is greater than zero, in particular for values of the domain for which no fixed point of the activation function is defined. This class of activation functions has the advantage over conventional activation functions that the gradient of the activation function is different from zero at least away from the fixed points. This further improves the learning speed.

It is preferably provided that the range of the activation function is defined by a numerical interval between a lower value and an upper value, in particular by the interval from zero to one, both inclusive. This allows the activation function to be used directly in conventional artificial neural networks.

It is preferably provided that the activation function is continuously differentiable. As a result, calculations in a gradient method can be carried out particularly efficiently.

It is preferably provided that the activation function is a smooth function or a piecewise linear function, wherein the fixed point with the smallest value of all fixed points of the activation function has the value zero and the fixed point with the largest value of all fixed points has the value one. This function is particularly well suited to replacing a sigmoid function.

It is preferably provided that the activation function is a piecewise linear function, wherein the activation function has, between two directly adjacent fixed points, at least two sections with linear functions of different slopes. This function enables behavior similar to rounding to a fixed point, but without impairing a gradient method through a gradient with the value zero. For this purpose, the activation function has a first slope in a first of the sections and a second slope in a second of the sections. The first slope is very much smaller than the second slope. The first section is very much larger than the second section. As a result, the values from the domain of the activation function that lie in the first section are mapped onto values in the range of the activation function that lie very close to one another and very close to one of the fixed points.
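A piecewise linear function of this kind can be sketched with a short table of breakpoints. Everything concrete here is an assumption for illustration: the fixed points 0, 0.5 and 1, the flat slope 0.05, the breakpoint positions 0.45 and 0.55, and the name `piecewise_linear_activation` do not come from the text. The flat sections are long and the steep sections are short, and the outer flat sections are extended beyond the interval from zero to one so that the slope is never zero.

```python
from bisect import bisect_right

# Hypothetical breakpoints (x, y). Fixed points lie at 0, 0.5 and 1.
# Long flat sections have slope 0.05, short steep sections have slope 9.55.
BREAKPOINTS = [(0.0, 0.0), (0.45, 0.0225), (0.5, 0.5), (0.55, 0.9775), (1.0, 1.0)]
XS = [point[0] for point in BREAKPOINTS]


def piecewise_linear_activation(x):
    """Interpolate linearly between BREAKPOINTS; outer sections continue beyond [0, 1]."""
    # Select the section containing x, clamped to the first or last section.
    i = min(max(bisect_right(XS, x) - 1, 0), len(XS) - 2)
    (x0, y0), (x1, y1) = BREAKPOINTS[i], BREAKPOINTS[i + 1]
    return y0 + (y1 - y0) / (x1 - x0) * (x - x0)


# Inputs from the long flat section land close together near the fixed point 0.
near_zero = [piecewise_linear_activation(x) for x in (0.1, 0.3, 0.45)]
# Inputs from the flat section on the other side land near the fixed point 1.
near_one = [piecewise_linear_activation(x) for x in (0.55, 0.7, 0.9)]
print(near_zero)
print(near_one)
```

Between 0 and 0.5 the slope steps up from flat to steep, and between 0.5 and 1 it steps down from steep to flat, which matches the alternating behavior of the derivative between adjacent fixed points.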

It is preferably provided that the values of the fixed points are powers of two. As a result, particularly suitable values from the range of the activation function are assigned, in particular in the case of the two sections with linear functions of different slopes.

A device for control with an artificial neural network provides that the device comprises an output for controlling a robot, a vehicle, a household appliance, a tool, a production machine, an assistance system for a person, an access control system or a system for transmitting information with a control variable, wherein the device comprises a computing device, a memory for the artificial neural network and an input for input data, wherein the device is designed to determine the control variable according to the described method. This device is suitable for training the artificial neural network and for control after training has taken place.

Further advantageous embodiments emerge from the following description and the drawing. In the drawing:

Fig. 1 shows a schematic representation of a device with an artificial neural network,
Fig. 2 shows a schematic representation of a first activation function,
Fig. 3 shows a schematic representation of a second activation function,
Fig. 4 shows a schematic representation of a third activation function,
Fig. 5 shows a schematic representation of a fourth activation function,
Fig. 6 shows steps in a method for control.

Fig. 1 schematically shows a device 100 with an artificial neural network.

The device 100 comprises an output 102 for controlling a robot, a vehicle, a household appliance, a tool, a production machine, an assistance system for a person, an access control system or a system for transmitting information with a control variable 104.

The device 100 comprises a computing device 106, a memory 108 for an artificial neural network and an input 110 for input data 112. The input data are, for example, sensor data that are passed to an input layer of the artificial neural network. For this purpose, the input data are normalized, for example. In the example, an output variable at an output layer of the artificial neural network is passed to the output 102 for determining the control variable 104. The control variable 104 is determined, for example, for controlling an actuator.

The device 100 is designed to determine the control variable 104 according to the method described below. In one aspect, the device 100 is designed for training the artificial neural network, and in another aspect for controlling the robot, the vehicle, the household appliance, the tool, the production machine, the assistance system for a person, the access control system or the system for transmitting information after training has taken place. The device 100 can also be designed to be self-learning during operation.

The artificial neural network is, for example, a deep artificial neural network (deep neural network), in particular an artificial neural network designed as a recurrent neural network (RNN). The RNN preferably has feedback for some layers. The RNN can be designed as a fully connected RNN. A plurality of long short-term memory (LSTM) units or a gated recurrent unit (GRU) module can also be provided.

The artificial neural network uses an activation function in units 114. This activation function can be the same for one layer 116 or for all layers with an activation function; different activation functions can also be used in the individual units or layers. Feedback 118 can be provided from units of a layer to units of a previous layer or from a unit of a layer to itself. In Fig. 1, units that comprise a transfer function are denoted by Σ. Units 120 that comprise parameters are also provided. The parameters are weights for the artificial neural network. The parameters can be learned by means of a gradient method in training or, in the case of a self-learning artificial neural network, during operation.

Figs. 2 to 5 show exemplary activation functions. The activation functions are plotted in a coordinate system in which values y from the range of the activation function are plotted against values x from the domain of the activation function.

The activation functions have at least three fixed points. The activation functions are piecewise continuously differentiable and monotonically increasing. The number of fixed points of the activation function is either finite and odd or countable and discrete.

A derivative of the activation functions is alternately monotonically increasing and monotonically decreasing in regions between adjacent fixed points.

A fixed point is understood to mean a point at which the activation function maps a value x from the domain of the activation function onto the same value x in the range of the activation function.
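Given a candidate activation function, its fixed points can be located numerically by looking for zeros of f(x) − x. The following is a sketch under stated assumptions: the helper `find_fixed_points`, its grid-plus-bisection strategy, and the stand-in function `cubic_activation` (3x² − 2x³) are hypothetical and are not part of the described method.

```python
def cubic_activation(x):
    """Illustrative stand-in with fixed points 0, 0.5 and 1 (an assumption)."""
    return 3.0 * x * x - 2.0 * x * x * x


def find_fixed_points(f, lo, hi, samples=1000, tol=1e-12):
    """Find x in [lo, hi] with f(x) == x from sign changes of g(x) = f(x) - x.

    Fixed points where g touches zero without changing sign are only found
    if they coincide with a grid point.
    """
    def g(x):
        return f(x) - x

    def add(points, root):
        # Skip duplicates that arise when a root sits on a grid point.
        if not points or abs(root - points[-1]) > 1e-9:
            points.append(root)

    points = []
    step = (hi - lo) / samples
    for k in range(samples):
        a, b = lo + k * step, lo + (k + 1) * step
        ga, gb = g(a), g(b)
        if ga == 0.0:
            add(points, a)
        elif ga * gb < 0.0:
            # Bisection keeps the sign change of g inside [a, b].
            while b - a > tol:
                mid = 0.5 * (a + b)
                if ga * g(mid) <= 0.0:
                    b = mid
                else:
                    a = mid
                    ga = g(a)
            add(points, 0.5 * (a + b))
    if g(hi) == 0.0:
        add(points, hi)
    return points


fixed_points = find_fixed_points(cubic_activation, 0.0, 1.0)
print(fixed_points)
print(len(fixed_points) >= 3 and len(fixed_points) % 2 == 1)
```

The final line checks the two counting conditions for the finite case: at least three fixed points, and an odd number of them.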

For example, the first activation function 200 shown in Fig. 2 has a first fixed point F₁ at a first value x₁ of the domain of the first activation function 200. This first activation function 200 has a second fixed point F₂ at a second value x₂ of the domain of the first activation function 200. This first activation function 200 has a third fixed point F₃ at a third value x₃ of the domain of the first activation function 200.

In this example, the first value x₁ lies at the origin of the coordinate system and is smaller than the second value x₂, and the second value x₂ is smaller than the third value x₃.

In this example, the derivative of the first activation function 200 is monotonically increasing in a first region 202 between the first value x₁ and the second value x₂, and monotonically decreasing in a second region 204 between the second value x₂ and the third value x₃.

In this example, the first region 202 and the second region 204 are adjacent. That is, the first activation function 200 has no further fixed point between the first value x₁ and the second value x₂, and no further fixed point between the second value x₂ and the third value x₃.

The first activation function 200 is continuously differentiable.
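As a purely illustrative sketch, one smooth function with the properties described for the first activation function 200 is f(x) = x - sin(2πx)/(4π), considered on the interval from zero to one. The closed form, the function names and the fixed-point locations 0, 0.5 and 1 are assumptions chosen for this example; the description itself gives no formula.

```python
import math


# Hypothetical example function; the description does not give a closed form.
def smooth_activation(x: float) -> float:
    """f(x) = x - sin(2*pi*x) / (4*pi); on [0, 1] it fixes 0, 0.5 and 1."""
    return x - math.sin(2.0 * math.pi * x) / (4.0 * math.pi)


def smooth_activation_derivative(x: float) -> float:
    """f'(x) = 1 - cos(2*pi*x) / 2, which always lies in [0.5, 1.5]."""
    return 1.0 - 0.5 * math.cos(2.0 * math.pi * x)


# The derivative rises on [0, 0.5] and falls on [0.5, 1], and it is
# positive everywhere, so the function itself is monotonically increasing.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, smooth_activation(x), smooth_activation_derivative(x))
```

In this sketch the derivative is below one at the two outer fixed points and above one at the middle fixed point.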

Fig. 3 shows a second activation function 300. In the example, the second activation function 300 has the same fixed points as the first activation function 200. Unlike the first activation function 200, the second activation function 300 is a piecewise linear function.

Between two immediately adjacent fixed points, the second activation function 300 has at least two sections with linear functions of different slope. Fig. 3 shows a first section 302 between the first fixed point F₁ and the second fixed point F₂, and a second section 304 between the second fixed point F₂ and the third fixed point F₃.

Preferably, the first section 302 adjoins, at the first fixed point F₁, a further section with the same slope. Preferably, the second section 304 is followed, at the third fixed point F₃, by a further section with the same slope.
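A piecewise linear function of this kind can be sketched as follows. The knot positions, the slopes 0.5 and 1.5 and all names are assumptions made for illustration; only the qualitative shape (three fixed points, two slopes between adjacent fixed points, the outer slope continued beyond the outermost fixed points) follows the description.

```python
# Hypothetical knots: fixed points at 0, 0.5 and 1, with one extra kink
# between each pair of adjacent fixed points.
KNOTS = [(0.0, 0.0), (0.25, 0.125), (0.5, 0.5), (0.75, 0.875), (1.0, 1.0)]
OUTER_SLOPE = 0.5  # slope continued beyond the outermost fixed points


def piecewise_linear_activation(x: float) -> float:
    """Evaluate the piecewise linear map defined by KNOTS."""
    if x <= KNOTS[0][0]:
        return KNOTS[0][1] + OUTER_SLOPE * (x - KNOTS[0][0])
    if x >= KNOTS[-1][0]:
        return KNOTS[-1][1] + OUTER_SLOPE * (x - KNOTS[-1][0])
    for (x0, y0), (x1, y1) in zip(KNOTS, KNOTS[1:]):
        if x0 <= x <= x1:
            # Linear interpolation inside the segment [x0, x1].
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x must be a finite number")
```

With these knots the segment slopes are 0.5, 1.5, 1.5 and 0.5, so the derivative steps up between the first and second fixed point and steps down between the second and third.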

The third activation function 400 shown in Fig. 4 is piecewise linear and has fixed points whose values x, in the example, are integers. The example shows n fixed points F₁, ..., F_n. The slope of the piecewise linear third activation function 400 is as described for the second activation function 300. This produces alternately monotonically increasing and monotonically decreasing regions between the fixed points.

The fourth activation function 500 shown in Fig. 5 is piecewise linear and has fixed points whose values x, in the example, are powers of two. The example shows n fixed points F₁, ..., F_n.
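The construction for integer or power-of-two fixed points can be sketched generically. The helper below is hypothetical: it places one kink at the midpoint between adjacent fixed points and alternates the order of a low and a high slope from interval to interval. The midpoint placement, the slope values and the restriction to a finite, odd number of fixed points are assumptions of this sketch.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def make_activation(fixed_points: Sequence[float],
                    low: float = 0.5) -> Callable[[float], float]:
    """Build a piecewise linear map that fixes every given value.

    Hypothetical helper: slopes alternate (low, high), (high, low), ...
    across successive intervals, with high = 2 - low.
    """
    pts = sorted(set(fixed_points))
    if len(pts) < 3 or len(pts) % 2 == 0:
        raise ValueError("expected an odd number (>= 3) of distinct fixed points")
    if not 0.0 < low < 1.0:
        raise ValueError("low slope must lie strictly between 0 and 1")
    high = 2.0 - low
    xs, ys = [pts[0]], [pts[0]]
    for i, (a, b) in enumerate(zip(pts, pts[1:])):
        mid = (a + b) / 2.0
        first = low if i % 2 == 0 else high  # slope of the first half
        xs += [mid, b]
        ys += [a + first * (mid - a), b]

    def activation(x: float) -> float:
        if x <= xs[0]:
            return ys[0] + low * (x - xs[0])
        if x >= xs[-1]:
            return ys[-1] + low * (x - xs[-1])
        j = bisect_right(xs, x) - 1  # index of the segment containing x
        return ys[j] + (ys[j + 1] - ys[j]) * (x - xs[j]) / (xs[j + 1] - xs[j])

    return activation


integer_activation = make_activation([0, 1, 2, 3, 4])
power_of_two_activation = make_activation([1, 2, 4, 8, 16])
```

Because the average of the two slopes is one, each interval ends exactly on the next fixed point, and the slope beyond the outermost fixed points stays below one.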

The range of the described activation function is preferably defined by a numerical interval between a lower value and an upper value. Preferably, the range is the interval from zero to one, both inclusive.

Preferably, the derivative of the described activation functions is less than one for domain values smaller than the smallest domain value at which a fixed point of the activation function is defined. Preferably, the derivative is likewise less than one for domain values greater than the largest domain value at which a fixed point of the activation function is defined.

The derivative of the described activation functions is greater than zero, in particular in the range of domain values at which no fixed point of the activation function is defined.
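Written compactly, the preferred derivative conditions can be stated as follows. The symbols f for the activation function and x_min, x_max for the smallest and largest domain values at which a fixed point is defined are introduced here only for illustration.

```latex
f'(x) < 1 \quad \text{for } x < x_{\min} \text{ or } x > x_{\max},
\qquad
f'(x) > 0 \quad \text{for all } x \text{ with } f(x) \neq x .
```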

A method that uses one of the described activation functions is described below with reference to Fig. 6.

The computer-implemented method serves to train the artificial neural network or to control the robot, the vehicle, the household appliance, the tool, the manufacturing machine, the assistance system for a person, the access control system or the system for transmitting information with the control variable 104.

In a step 602, input data 112 are acquired. The input data 112 are, for example, sensor data.

In a step 604, the input data 112 are passed to the input layer of the artificial neural network. The input data 112 are, for example, normalized beforehand.

In a step 606, the output variable of the artificial neural network is determined. The output variable is determined as a function of the input data for the artificial neural network and as a function of an activation function of the artificial neural network. The activation function is continuously differentiable, at least in sections, and monotonically increasing. The activation function has at least three fixed points. The number of fixed points of the activation function is either finite and odd or countable and discrete. The derivative of the activation function is alternately monotonically increasing and monotonically decreasing in regions between adjacent fixed points. In the example, one of the described activation functions is used.

Optionally, a gradient method is used in this step to determine parameter updates for the parameters of the artificial neural network as a function of the derivative of the activation function.
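How a parameter update can depend on the derivative of the activation function is sketched below for a single neuron with a squared-error loss. The smooth activation f(x) = x - sin(2πx)/(4π), the loss, the learning rate and all names are assumptions made for illustration, not details taken from the description.

```python
import math


def activation(z: float) -> float:
    """Hypothetical smooth activation with fixed points at 0, 0.5 and 1."""
    return z - math.sin(2.0 * math.pi * z) / (4.0 * math.pi)


def activation_derivative(z: float) -> float:
    return 1.0 - 0.5 * math.cos(2.0 * math.pi * z)


def loss(w: float, b: float, x: float, target: float) -> float:
    """Squared error of a single neuron y = f(w * x + b)."""
    return 0.5 * (activation(w * x + b) - target) ** 2


def gradient_step(w: float, b: float, x: float, target: float,
                  learning_rate: float = 0.1) -> tuple[float, float]:
    """One gradient-descent update of the weight w and the bias b."""
    z = w * x + b
    error = activation(z) - target
    # Chain rule: the derivative of the activation scales the error signal.
    grad_z = error * activation_derivative(z)
    return w - learning_rate * grad_z * x, b - learning_rate * grad_z


w, b = 0.3, 0.1
w_new, b_new = gradient_step(w, b, x=1.0, target=1.0)
print(loss(w, b, 1.0, 1.0), loss(w_new, b_new, 1.0, 1.0))
```

Because the derivative of this example activation never reaches zero, the error signal is never blocked entirely by the activation.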

It is particularly advantageous, in a recurrent neural network, to determine the state of at least one neuron of the recurrent neural network by repeated feedback computation of the state using the gradient method and as a function of the activation function. Exploding or vanishing gradients are thereby avoided both in a training phase and in an inference phase, i.e. when the artificial neural network is applied.
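The effect of repeated feedback computation on a neuron state can be illustrated with a minimal sketch that simply feeds the state back through the activation. It uses the hypothetical smooth activation f(x) = x - sin(2πx)/(4π) with fixed points at 0, 0.5 and 1, and it omits weights, inputs and the gradient method; all of these simplifications are assumptions of the example.

```python
import math


def activation(x: float) -> float:
    """Hypothetical smooth activation with fixed points at 0, 0.5 and 1."""
    return x - math.sin(2.0 * math.pi * x) / (4.0 * math.pi)


def iterate_state(state: float, steps: int) -> float:
    """Feed the state back through the activation `steps` times."""
    for _ in range(steps):
        state = activation(state)
    return state


# States starting below 0.5 settle near 0, states starting above 0.5
# settle near 1, and the fixed points themselves stay where they are.
for start in (0.3, 0.5, 0.7):
    print(start, iterate_state(start, 200))
```

In this sketch the outer fixed points attract nearby states because the derivative there is below one, while the middle fixed point separates the two basins.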

In a step 608, the control variable is determined as a function of the output variable of the artificial neural network.

In a step 610, the robot, the vehicle, the household appliance, the tool, the manufacturing machine, the assistance system for a person, the access control system or the system for transmitting information is controlled with the control variable 104.

Step 602 is then executed.
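One pass through steps 602 to 610 can be sketched as a single control cycle. Everything concrete here is hypothetical: the single-neuron network, the min-max normalization, the threshold that turns the output into a control variable, and the callables `read_sensor` and `actuate` standing in for the sensor and the actuator.

```python
import math
from typing import Callable, Sequence


def activation(x: float) -> float:
    """Hypothetical smooth activation with fixed points at 0, 0.5 and 1."""
    return x - math.sin(2.0 * math.pi * x) / (4.0 * math.pi)


def normalize(raw: Sequence[float], lo: float, hi: float) -> list[float]:
    """Min-max scaling of raw sensor values to the interval [0, 1]."""
    return [(v - lo) / (hi - lo) for v in raw]


def forward(inputs: Sequence[float], weights: Sequence[float],
            bias: float) -> float:
    """Single-neuron network: weighted sum followed by the activation."""
    z = sum(w * v for w, v in zip(weights, inputs)) + bias
    return activation(z)


def control_cycle(read_sensor: Callable[[], Sequence[float]],
                  actuate: Callable[[int], None],
                  weights: Sequence[float], bias: float,
                  lo: float, hi: float) -> int:
    raw = read_sensor()                      # step 602: acquire input data
    inputs = normalize(raw, lo, hi)          # step 604: normalize and pass on
    output = forward(inputs, weights, bias)  # step 606: network output
    control = 1 if output >= 0.5 else 0      # step 608: assumed thresholding
    actuate(control)                         # step 610: drive the actuator
    return control
```

A caller would invoke `control_cycle` repeatedly for as long as the control is switched on.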

The method is started, for example, for closed-loop or open-loop control of the actuator as a function of sensor data, and ends when the closed-loop or open-loop control is switched off.

The sensor data are, for example, LiDAR, radar, wheel speed sensor, video or audio data, data from an acceleration or yaw rate sensor, or data with information about a state of a drive engine, a steering angle or an accelerator pedal position. For example, the artificial neural network is trained to control a partially autonomous motor vehicle as a function of the sensor data.

The actuator is, for example, an active steering system, an engine control unit or a valve therefor. The actuator can also be an access control device, for example a device that admits persons one at a time or a door closing and opening mechanism that grants or denies access depending on a person recognized in an image or audio signal.

The device and the method make the training particularly efficient. Because of these activation functions, a self-learning artificial neural network can be put to use particularly quickly.

Claims

Computer-implemented method for operating an image classifier with an artificial neural network, characterized in that an output variable of the artificial neural network is determined (608), the output variable being determined (606) as a function of input data for the artificial neural network and as a function of an activation function of the artificial neural network and characterizing a classification of the input data, the activation function being, at least in sections, continuously differentiable and monotonically increasing, the activation function having at least three fixed points, the number of fixed points of the activation function being either finite and odd or countable and discrete, and a derivative of the activation function being alternately monotonically increasing and monotonically decreasing in regions between adjacent fixed points.

Method according to claim 1, characterized in that the derivative of the activation function is less than one for domain values of the activation function that are smaller than the smallest domain value at which a fixed point of the activation function is defined, or in that the derivative of the activation function is less than one for domain values of the activation function that are greater than the largest domain value at which a fixed point of the activation function is defined.

Method according to claim 1 or 2, characterized in that the derivative is greater than zero, in particular in the range of domain values at which no fixed point of the activation function is defined.

Method according to one of the preceding claims, characterized in that a range of the activation function is defined by a numerical interval between a lower value and an upper value, in particular by the interval from zero to one, both inclusive.

Method according to one of the preceding claims, characterized in that the activation function is continuously differentiable.

Method according to one of the preceding claims, characterized in that the activation function is a smooth function or a piecewise linear function, the fixed point with the smallest value of all fixed points of the activation function having the value zero, and the fixed point with the largest value of all fixed points having the value one.

Method according to one of claims 1 to 5, characterized in that the activation function is a piecewise linear function, the activation function having, between two immediately adjacent fixed points, at least two sections with linear functions of different slope.

Method according to one of the preceding claims, characterized in that the values of the fixed points are powers of two.

Method according to one of the preceding claims, characterized in that the state of a neuron in a recurrent neural network is determined by repeated feedback calculation of the state with a gradient method and depending on the activation function.

Device (100) for control with an artificial neural network, characterized in that the device (100) has an output (102) for controlling a robot, a vehicle, a household appliance, a tool, a manufacturing machine, an assistance system for a person, an access control system or a system for transmitting information with a control variable (104), the device (100) comprising a computing device (106), a memory (108) for the artificial neural network and an input (10) for input data (112), wherein the device (100) is designed to determine the control variable (104) by the method according to one of claims 1 to 9.

Computer program, characterized in that the computer program comprises computer-readable instructions which, when executed by a computer, cause the method according to one of claims 1 to 9 to be carried out.

Computer program product, characterized by a storage medium on which the computer program according to claim 11 is stored.