DE102021208877A1

DE102021208877A1 - Training of neural networks for equivariance or invariance against changes in the input image

Info

Publication number: DE102021208877A1
Application number: DE102021208877.5A
Authority: DE
Inventors: Ivan Sosnovik; Jan Hendrik Metzen; Arnold Smeulders; Sadaf Gulshad
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2023-02-16
Also published as: WO2023016859A1

Abstract

Method (100) for training a neural network (1) that is designed to process input images (2) and comprises a plurality of convolutional layers, each of these convolutional layers being designed to map the input f of the respective convolutional layer onto at least one feature map Φ(f,κ) by applying at least one filter kernel κ, comprising the steps:
• a set 𝒯 of transformations T is provided (110) with respect to which the neural network (1) is to be enabled, during training, to learn to generate at least one equivariant or invariant feature map Φ(f,κ) when these transformations are applied to the input f of at least one convolutional layer;
• this feature map Φ(f,κ) is expressed (120) by an aggregation (5), parameterized with parameters (5a), of feature maps Φ_j(f, T_j[κ]), each of which is obtained by applying transformations T_j ∈ 𝒯 to the at least one filter kernel κ;
• learning images (2a) and learning outputs (3a), onto which the trained neural network (1) should ideally map these learning images (2a), are provided (130);
• the learning images (2a) are mapped (140) by the neural network (1) onto outputs (3);
• deviations of these outputs (3) from the learning outputs (3a) are evaluated (150) with a predetermined cost function (4);
• parameters (5a) of the parameterized aggregation (5) and further parameters (1a) that characterize the behavior of the neural network (1) are optimized (160) with the aim that, upon further processing of learning images (2a), the evaluation (4a) by the cost function (4) is expected to improve.

Description

The present invention relates to the training of neural networks that process images and map them, for example, to classification scores with respect to classes of a given classification.

State of the art

Many driver assistance systems and systems for at least partially automated driving use classifiers to process the measurement data recorded by a vehicle's sensors into classification scores with respect to one or more classes of a given classification. On the basis of these classification scores, decisions are then made, for example, about interventions in the driving dynamics of the vehicle.

Training such classifiers requires training data with a high degree of variability so that the classifier can generalize well to measurement data not seen during training. Recording training data on test drives with the vehicle, and especially the largely manual labeling of this training data with target classification scores, is time-consuming and expensive.

Therefore, the training data is often enriched with synthetically generated training data. For example, DE 10 2018 204 494 B3 discloses a method with which radar signals can be generated synthetically in order to enrich physically recorded radar signals for training a classifier.

Disclosure of the invention

As part of the invention, a method for training a neural network was developed. This neural network is designed to process input images and comprises several convolutional layers. Each convolutional layer is designed to map its respective input f onto at least one feature map Φ(f,κ) by applying at least one filter kernel κ. Typically, this feature map Φ(f,κ) has a significantly reduced dimensionality compared to the input f.
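The mapping f ↦ Φ(f,κ) can be illustrated for a single-channel input with a minimal sketch. It assumes a plain "valid" cross-correlation with an optional stride; the text fixes neither padding nor stride, and `feature_map` is a hypothetical helper name. A stride above 1 is one common way for the feature map to end up with a reduced dimensionality:

```python
import numpy as np

def feature_map(f, kappa, stride=1):
    """Φ(f, κ): slide kernel κ over a single-channel input f ('valid' cross-correlation)."""
    kh, kw = kappa.shape
    oh = (f.shape[0] - kh) // stride + 1
    ow = (f.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            # weighted sum of the input patch under the kernel
            out[i, j] = np.sum(f[r:r + kh, c:c + kw] * kappa)
    return out

f = np.arange(16.0).reshape(4, 4)
print(feature_map(f, np.ones((2, 2)), stride=2))   # 2 x 2 map from a 4 x 4 input
```

Practical frameworks use optimized multi-channel implementations; the explicit loop only makes the definition visible.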

For example, an image classifier that maps input images to classification scores with respect to one or more classes of a given classification can be chosen as the neural network. In particular, the feature maps supplied by the last convolutional layer in a sequence of convolutional layers can be evaluated to obtain the classification scores.

As part of the method, a set 𝒯 of transformations T is provided with respect to which the neural network is to be enabled, during training, to learn to generate at least one equivariant or invariant feature map Φ(f,κ) when these transformations are applied to the input f of at least one convolutional layer. This does not mean that the feature map Φ(f,κ) always becomes equivariant or invariant to all transformations T from the set 𝒯. Rather, the aim is to make the feature map Φ(f,κ) equivariant or invariant to transformations to the extent that such transformations occur in the learning images used during training.

For this purpose, the feature map Φ(f,κ) that is to be made equivariant or invariant is expressed by an aggregation, parameterized with parameters, of feature maps Φ_j(f, T_j[κ]), each of which is obtained by applying a transformation T_j ∈ 𝒯 to the at least one filter kernel κ. These parameters serve as additional degrees of freedom when training the neural network.

For supervised training, learning images and learning outputs, onto which the trained neural network should ideally map these learning images, are provided. The neural network maps the learning images onto outputs, and deviations of these outputs from the learning outputs are evaluated with a predetermined cost function.

Parameters of the parameterized aggregation and further parameters that characterize the behavior of the neural network are then optimized with the aim that, upon further processing of learning images, the evaluation by the cost function is expected to improve. These further parameters can in particular be, for example, weights with which the inputs fed to neurons or other processing units of the neural network are weighted and summed to form the activation of that neuron or processing unit.

The term "expected" is to be understood in this context to mean that iterative numerical optimization algorithms select the new parameter values for the next iteration on the basis of the history of previous iterations, in the expectation that this will improve the evaluation by the cost function. This expectation need not be fulfilled in every iteration; that is, an iteration can also turn out to be a "step backwards". The optimization algorithm can, however, also use feedback of this kind to ultimately arrive at parameter values for which the evaluation by the cost function improves.
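Joint optimization of the aggregation parameters and the other network parameters can be shown with a deliberately tiny toy model. This is a sketch under several assumptions not taken from the text: inputs are flattened vectors, the transformations are hypothetical cyclic-shift matrices T_j (shift 0 is the identity), the aggregation weights β_j are constants rather than functions β_j(f), σ is a plain weighted sum, and the cost function is the mean squared error minimized by gradient descent. In this linear toy a shift could also be absorbed by w alone; the point is only that β and w are updated together from the same cost, and that the loss is tracked over iterations rather than required to fall at every step.

```python
import numpy as np

def make_shift_matrices(d, k):
    """Hypothetical transformation set: cyclic shifts by 0..k-1 positions (shift 0 = identity)."""
    return np.stack([np.roll(np.eye(d), s, axis=1) for s in range(k)])

def forward(beta, w, T, X):
    # phi[n, j] = w^T T_j x_n, i.e. (K x T_j) x f in the matrix view of Φ(f, T_j[κ])
    phi = np.einsum("i,jik,nk->nj", w, T, X)
    return phi @ beta, phi              # weighted-sum aggregation over the transformations j

def train(X, t, T, steps=300, lr=0.01, seed=0):
    """Gradient descent on the mean squared error, updating beta and w together."""
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.standard_normal(X.shape[1])   # "further parameters" (kernel weights)
    beta = np.zeros(T.shape[0])
    beta[0] = 1.0                               # aggregation parameters: identity only at start
    losses = []
    for _ in range(steps):
        y, phi = forward(beta, w, T, X)
        err = y - t
        losses.append(float(np.mean(err ** 2)))
        grad_beta = 2.0 * phi.T @ err / len(t)
        dy_dw = np.einsum("j,jik,nk->ni", beta, T, X)   # d y_n / d w = sum_j beta_j T_j x_n
        grad_w = 2.0 * dy_dw.T @ err / len(t)
        beta = beta - lr * grad_beta
        w = w - lr * grad_w
    return beta, w, losses

rng = np.random.default_rng(1)
T = make_shift_matrices(d=4, k=3)
X = rng.standard_normal((64, 4))
w_true = rng.standard_normal(4)
t = X @ (T[1].T @ w_true)               # toy targets generated with the shift j = 1
beta, w, losses = train(X, t, T)
```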

By using the parameters of the parameterized aggregation as additional degrees of freedom for training, the neural network learns to make feature maps equivariant or invariant to transformations of the input to exactly the extent that actually benefits the network's performance in the specific application. This is somewhat analogous to fitting glasses at an optician. Here, the transformations T correspond to the various corrective lenses for short-sightedness, long-sightedness, astigmatism and other aberrations of the eye. Exactly those corrections are applied that allow the customer to best recognize the numbers and letters presented for testing.

The benefit of the trained equivariances and invariances is, in particular, that the neural network recognizes objects and situations in different input images as the same when these images differ only by an application of the said transformations and otherwise have the same content. The insight that, for example, a vehicle that has been rotated, scaled or viewed from a different perspective is still a vehicle therefore no longer has to be conveyed to the neural network implicitly, by presenting it with a large number of such modified learning images that are all labeled with the same learning output.

Accordingly, the variability of the learning images used can concentrate on those properties that the neural network is meant to examine. A given quantitative level of performance on the network's task, measured for an image classifier, for example, as classification accuracy on a set of test or validation data, can then be achieved with fewer learning images overall. Learning images of traffic situations labeled with learning outputs are particularly expensive to obtain, since long test drives are required and labeling requires manual work.

Only approximate knowledge of the transformations with respect to which learning an equivariance or invariance could be advantageous is sufficient to benefit from this for the task assigned to the neural network. Here, too, the analogy to the optician holds: the optician initially only knows which types of aberrations can occur at all, and finds out the type and strength of the aberrations of a specific eye only through the iterative fitting process.

In a particularly advantageous embodiment, the filter kernel κ is expressed as a linear combination Σ_i w_i ψ_i of basis functions ψ_i, parameterized with parameters w_i. The effect of the transformations T on the basis functions ψ_i can then be computed in advance and reused repeatedly. During training, only the parameters w_i are varied to adapt the linear combination. Each adjustment of the linear combination during a training step therefore requires less computational effort.
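The precomputation described above can be sketched as follows, assuming kernels are small 2-D arrays and the transformations are callables acting on such arrays. The random basis, the three example transformations and all helper names are hypothetical placeholders. The transformed basis bank is built once; afterwards every T_j[κ] is a single contraction with the current weights w:

```python
import numpy as np

def kernel_from_basis(w, psi):
    """κ = Σ_i w_i ψ_i for a basis stack psi of shape (N, k, k) and weights w of shape (N,)."""
    return np.tensordot(w, psi, axes=1)

def precompute_transformed_bases(psi, transforms):
    """Apply every transformation T_j to every basis function once; result has shape (J, N, k, k)."""
    return np.stack([np.stack([T(p) for p in psi]) for T in transforms])

def transformed_kernels(w, psi_t):
    """All T_j[κ] = w · T_j[ψ] in one contraction over the basis index; shape (J, k, k)."""
    return np.einsum("i,jikl->jkl", w, psi_t)

rng = np.random.default_rng(0)
psi = rng.standard_normal((5, 3, 3))                   # placeholder basis functions
transforms = [lambda p: p, np.rot90, np.fliplr]        # hypothetical T_0 (identity), T_1, T_2
psi_t = precompute_transformed_bases(psi, transforms)  # computed once, before training
w = rng.standard_normal(5)                             # only w changes during training
kernels = transformed_kernels(w, psi_t)
```

Because rotation and flipping act linearly on the kernel values, each precomputed combination equals the transformation applied to the assembled kernel.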

Applying a transformation T to the input f of the convolutional layer turns the feature map Φ(f,κ) into the feature map Φ(T[f],κ). If K is a matrix representation of the filter kernel κ and f is a matrix representation of the input f, then Φ(f,κ) = K × f. Applying the transformation T with the matrix representation T means that f is multiplied by T before the filter kernel κ is applied. By the associative law of matrix multiplication:

$\Phi(T'[f], \kappa) = K \times (T \times f) = (K \times T) \times f = \Phi(f, T[\kappa])$

Transforming the input f is therefore equivalent to transforming the filter kernel κ. The distinct notations T'[f] on the one hand and T[κ] on the other express that multiplication by the matrix T is not commutative: multiplying by T from the left does not give the same result as multiplying by T from the right.
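The associativity argument can be checked numerically in one dimension. This sketch assumes a "valid" 1-D cross-correlation written as a banded matrix K and uses signal reversal as a hypothetical example of T; `conv_matrix` is an illustrative helper, not part of the described method. Here K is 4 × 6 while T is 6 × 6, so T × K is not even defined, which makes the order of multiplication explicit:

```python
import numpy as np

def conv_matrix(kernel, n):
    """Matrix K with K @ f equal to the 'valid' 1-D cross-correlation of f with kernel."""
    k = len(kernel)
    K = np.zeros((n - k + 1, n))
    for r in range(n - k + 1):
        K[r, r:r + k] = kernel   # each row holds the kernel, shifted by one position
    return K

n = 6
kappa = np.array([1.0, -2.0, 0.5])
f = np.arange(n, dtype=float) ** 2
K = conv_matrix(kappa, n)
T = np.eye(n)[::-1]          # hypothetical transformation: reverse the signal
lhs = K @ (T @ f)            # Φ(T'[f], κ): transform the input first
rhs = (K @ T) @ f            # Φ(f, T[κ]): absorb T into the filter operator
```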

The feature map Φ_j(f, T_j[κ]) obtained by applying the transformation T_j to a filter kernel κ can be written as:

$\Phi_j(f, T_j[\kappa]) = \Phi_j\left(f, T_j\left[\sum_i w_i \psi_i\right]\right) = \Phi_j\left(f, \sum_i w_i T_j[\psi_i]\right) = \Phi_j(f, w \cdot T_j[\psi]),$

since the coefficients w of the decomposition κ = Σ_i w_i ψ_i = w · ψ into basis functions do not change under the transformation T_j: T_j acts linearly on the kernel and only transforms the basis functions themselves.

The feature maps can then be weighted against one another in the aggregation, for example with weights β_j(f) that depend on the input f. A feature map Φ(f, 𝒯[κ]) obtained by applying one or more transformations T_j ∈ 𝒯 can then be written as:

$\Phi(f, \mathcal{T}[\kappa]) = \sigma\begin{pmatrix} \beta_0(f)\,\Phi(f, w \cdot T_0[\psi]) \\ \beta_1(f)\,\Phi(f, w \cdot T_1[\psi]) \\ \vdots \\ \beta_k(f)\,\Phi(f, w \cdot T_k[\psi]) \end{pmatrix}$

Here σ is an arbitrary aggregation function, and the T_j are the transformations from the set 𝒯. This set 𝒯 can, for example, also contain the identity as a transformation. Training can then be initialized, for example, such that initially only the weight β_j(f) for the identity equals 1 and the weights β_j(f) for all other transformations T_j equal 0.
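The weighted aggregation and the identity initialization can be sketched as below, under stated simplifications: β is a fixed vector instead of an input-dependent β_j(f), the transformations are the four 90° rotations (a hypothetical choice), and the kernel is transformed directly, which by the preceding derivation matches w · T_j[ψ]. One subtlety: with σ = max, the zero-weighted maps still contribute zeros, so the identity initialization reproduces the plain convolution exactly only for a sum-type σ (or for non-negative feature maps), which is why the default here is a sum.

```python
import numpy as np

def correlate2d_valid(f, kappa):
    """Plain 'valid' 2-D cross-correlation, Φ(f, κ) for a single channel."""
    kh, kw = kappa.shape
    out = np.empty((f.shape[0] - kh + 1, f.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i + kh, j:j + kw] * kappa)
    return out

def aggregated_feature_map(f, kappa, transforms, beta, sigma=np.sum):
    """σ over j of β_j · Φ(f, T_j[κ]); beta is a fixed vector here (simplification)."""
    maps = np.stack([b * correlate2d_valid(f, T(kappa)) for T, b in zip(transforms, beta)])
    return sigma(maps, axis=0)   # aggregate along the transformation dimension j

transforms = [lambda k: k, np.rot90, lambda k: np.rot90(k, 2), lambda k: np.rot90(k, 3)]
beta = np.array([1.0, 0.0, 0.0, 0.0])   # identity-only initialization
```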

In particular, functions that depend on the spatial coordinates x, y in the input f at least via Hermite polynomials H_n, H_m can be chosen as basis functions ψ for the filter kernels κ:

$\psi_\pi(x, y) = A \cdot \frac{1}{\pi^2} \cdot H_n\left(\frac{x}{\pi}\right) \cdot H_m\left(\frac{y}{\pi}\right) \cdot \exp\left(-\frac{x^2 + y^2}{2\pi^2}\right)$

with a normalization constant A and the scaling factor π (a free scale parameter here, not the circle constant). Such basis functions can be used in particular to construct filter kernels κ that are especially well suited for recognizing features in images.
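The basis functions above can be sampled on a kernel grid. This sketch assumes the physicists' Hermite polynomials as provided by `numpy.polynomial.hermite.hermval` (the text does not state the convention), a square grid centred at (0, 0), and a parameter `scale` standing in for the scaling factor written as π; `hermite_basis` is a hypothetical name.

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_basis(n, m, size=7, scale=1.0, A=1.0):
    """ψ(x, y) = A / s² · H_n(x/s) · H_m(y/s) · exp(-(x² + y²) / (2 s²)) on a size × size grid.

    `scale` plays the role of the scaling factor written as π in the text.
    """
    r = (size - 1) / 2
    coords = np.linspace(-r, r, size)           # kernel centre at (0, 0)
    x, y = np.meshgrid(coords, coords, indexing="xy")
    h_n = hermval(x / scale, [0] * n + [1])     # coefficient vector selecting H_n
    h_m = hermval(y / scale, [0] * m + [1])
    return A / scale**2 * h_n * h_m * np.exp(-(x**2 + y**2) / (2 * scale**2))

# A small basis bank with all orders n + m <= 2
bank = np.stack([hermite_basis(n, m, scale=1.5)
                 for n in range(3) for m in range(3) if n + m <= 2])
```

Odd orders give antisymmetric kernels (edge-like detectors), even orders symmetric ones (blob- or ridge-like detectors).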

In a particularly advantageous embodiment, at least one feature map Φ(f,κ) is chosen that contains the sum of the input f and a processing product obtained by successively applying several filter kernels κ₁, κ₂, ... to the input f. In this way, the adjustments learned for the different convolutional layers stacked in a so-called "residual block" can be coordinated with one another. A feature map Φ(f, 𝒯[κ₁, κ₂, ...]) obtained by applying one or more transformations T_j ∈ 𝒯 can then be written as:

$\Phi(f, \mathcal{T}[\kappa_1, \kappa_2, \dots]) = f + \sigma\begin{pmatrix} \beta_0(f)\,\Phi(f, w_1 \cdot T_0[\psi_1], w_2 \cdot T_0[\psi_2], \dots) \\ \beta_1(f)\,\Phi(f, w_1 \cdot T_1[\psi_1], w_2 \cdot T_1[\psi_2], \dots) \\ \vdots \\ \beta_k(f)\,\Phi(f, w_1 \cdot T_k[\psi_1], w_2 \cdot T_k[\psi_2], \dots) \end{pmatrix}$

Here ψ₁, ψ₂, ... are the basis functions from which the filter kernels κ₁, κ₂, ... are formed.
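The residual form can be sketched for a single channel, assuming zero-padded "same" convolutions so that every branch keeps the shape of f, constant weights β_j, no nonlinearity between the chained kernels (the text does not specify one), and σ = sum. The same transformation T_j is applied to every kernel of a branch, as in the formula; helper names are hypothetical. With all β_j = 0 the block reduces to the identity mapping f.

```python
import numpy as np

def correlate2d_same(f, kappa):
    """Zero-padded cross-correlation with an odd-sized kernel; output has the shape of f."""
    kh, kw = kappa.shape
    fp = np.pad(f, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty_like(f, dtype=float)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            out[i, j] = np.sum(fp[i:i + kh, j:j + kw] * kappa)
    return out

def residual_feature_map(f, kernels, transforms, beta, sigma=np.sum):
    """f + σ_j( β_j · Φ(f, T_j[κ1], T_j[κ2], ...) ), with the same T_j for every kernel."""
    branches = []
    for T, b in zip(transforms, beta):
        h = f
        for kappa in kernels:            # successive application of κ1, κ2, ...
            h = correlate2d_same(h, T(kappa))
        branches.append(b * h)
    return f + sigma(np.stack(branches), axis=0)
```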

The transformations can in particular be elastic transformations, for example. These are transformations that can be described, at least approximately, as a field of displacements τ of the spatial coordinates x of the input image f with a strength ε:

$T[f(x)](\varepsilon) \approx f(x + \varepsilon\,\tau(x))$

This makes it possible to approximate a large class of transformations that arise when, for example, a camera used to record an image changes its perspective relative to the scene.
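An elastic transformation of this kind can be sketched by resampling the image at the displaced coordinates x + ε τ(x). Assumptions not taken from the text: the image is a 2-D NumPy array, τ is a callable returning (row, column) displacements, and values between pixels are obtained by bilinear interpolation with clamping at the image border. `elastic_warp` is a hypothetical name.

```python
import numpy as np

def elastic_warp(f, tau, eps):
    """Sample f at x + eps * tau(x) with bilinear interpolation (edge-clamped)."""
    H, W = f.shape
    yy, xx = np.mgrid[0:H, 0:W].astype(float)
    dy, dx = tau(yy, xx)                         # displacement field τ(x)
    ys = np.clip(yy + eps * dy, 0, H - 1)
    xs = np.clip(xx + eps * dx, 0, W - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = ys - y0, xs - x0                    # fractional offsets for interpolation
    top = (1 - wx) * f[y0, x0] + wx * f[y0, x1]
    bot = (1 - wx) * f[y1, x0] + wx * f[y1, x1]
    return (1 - wy) * top + wy * bot
```

For ε = 0 the image is returned unchanged; a constant field τ = (0, 1) with ε = 1 shifts the content by one column.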

The elastic transformations can in particular include, for example, linear stretchings and/or rotational scalings. Such transformations arise, for example, when the perspective of a camera relative to an object changes.

The coordinates x', y' in the input image after a linear stretching can, for example, result from the original coordinates x, y according to

$\begin{aligned} x' &= \sqrt{x^2 + y^2} \cdot \left(\sin(\theta)\sin(\delta) + \gamma\cos(\theta)\cos(\delta)\right), \\ y' &= \sqrt{x^2 + y^2} \cdot \left(-\cos(\theta)\sin(\delta) + \gamma\sin(\theta)\cos(\delta)\right) \end{aligned}$

Here γ = 1/(e^{-6} + cos(α)), δ = θ - φ, φ = arctan(y/x), θ is a small deflection angle, and α is an elasticity coefficient. The very small, arbitrarily chosen positive constant e^{-6} is intended to prevent division by zero when cos(α) = 0.
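The linear stretching formulas translate directly into code. This sketch assumes `np.arctan2` as the quadrant-aware form of arctan(y/x) and exposes the small constant as a parameter `tiny`, defaulting to e^{-6} as written; since the constant only serves to avoid division by zero, any small positive value fulfils that purpose. `linear_stretch` is a hypothetical name. For θ = 0 the formulas reduce to x' = γx, y' = y, which the check below uses.

```python
import numpy as np

def linear_stretch(x, y, theta, alpha, tiny=np.exp(-6)):
    """Coordinates (x', y') after the linear stretching described in the text."""
    gamma = 1.0 / (tiny + np.cos(alpha))   # tiny guards against cos(alpha) == 0
    r = np.hypot(x, y)                     # sqrt(x² + y²)
    phi = np.arctan2(y, x)                 # arctan(y/x), quadrant-aware
    delta = theta - phi
    xp = r * (np.sin(theta) * np.sin(delta) + gamma * np.cos(theta) * np.cos(delta))
    yp = r * (-np.cos(theta) * np.sin(delta) + gamma * np.sin(theta) * np.cos(delta))
    return xp, yp

print(linear_stretch(3.0, 4.0, theta=0.0, alpha=0.0))   # approximately (3 * gamma, 4)
```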

The coordinates x', y' in the input image after a rotational scaling can, for example, result from the original coordinates x, y according to

$\begin{aligned} x' &= x + \alpha\left(x\cos(\theta) + y\sin(\theta)\right), \\ y' &= y + \alpha\left(-x\sin(\theta) + y\cos(\theta)\right) \end{aligned}$

For both transformations it is assumed that the center of the filter kernel κ lies at the point (0, 0) and is a fixed point of the transformation.
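The rotational scaling can be sketched in the same way, with the kernel centre at (0, 0) as assumed above; `rotational_scaling` is a hypothetical name. For θ = 0 the map is an isotropic scaling by (1 + α), for α = 0 it is the identity, and the origin stays fixed for all parameters:

```python
import numpy as np

def rotational_scaling(x, y, theta, alpha):
    """Coordinates (x', y') after the rotational scaling described in the text."""
    xp = x + alpha * (x * np.cos(theta) + y * np.sin(theta))
    yp = y + alpha * (-x * np.sin(theta) + y * np.cos(theta))
    return xp, yp

print(rotational_scaling(2.0, -1.0, theta=0.0, alpha=0.5))   # isotropic scaling by 1.5
```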

The aggregation function σ for aggregating the feature maps Φ_j(f, T_j[κ]) can, for example, form for each element of the feature maps

• an element-wise maximum,
• a smoothed element-wise maximum, or
• an element-wise mean

along the dimension j of the transformations T_j ∈ 𝒯.

For example, if an input image f has height H, width W and C color channels, it can be represented as a tensor of shape C × H × W. The K transformations T from the set 𝒯 add a further dimension. The aggregation function σ can then, for example, map from a space of dimension K × C × H × W back into the space of dimension C × H × W and, in doing so, in particular select the transformation T_j that best matches the available training data. This can be measured, for example, by how large the activations of the neurons responsible for particular transformations are.

In this context, an element-wise maximum or element-wise mean means in particular that a maximum or a mean is formed separately for each entry in the dimensions C × H × W along the dimension K of the transformations. A smoothed maximum can be determined, for example, using the LogSumExp function

$\sigma(x) = \log \sum_{i} \exp(x_{i})$

where the x_i are the K values of one entry along the transformation dimension.
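Why LogSumExp counts as a smoothed maximum follows from a short bound. With m = max_j x_j over the K values of one entry, exp(m) ≤ Σ_j exp(x_j) ≤ K·exp(m); taking logarithms gives the first line below. The temperature τ in the second line is an assumption added for illustration (the formula above corresponds to τ = 1): as τ → 0, the function approaches the hard maximum.

```latex
\begin{aligned}
\max_j x_j &\le \log \sum_{j=1}^{K} \exp(x_j) \le \max_j x_j + \log K \\
\max_j x_j &\le \tau \log \sum_{j=1}^{K} \exp(x_j/\tau) \le \max_j x_j + \tau \log K, \qquad \tau > 0
\end{aligned}
```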

In a further advantageous embodiment, aggregating the feature maps Φ_j(f,T_j[κ]) includes forming a norm over one or more spatial dimensions of each feature map and selecting one or more feature maps on the basis of this norm. For example, l_p norms can be formed along the dimensions C, H × W or C × H × W. It can then be determined along the K dimension which transformations yield the largest norms. In this way, a feature map, and thus also a transformation, can be selected that best fits the available data.
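Norm-based selection with one l_p norm per transformation over C × H × W, followed by picking the largest, might look as follows. This sketch assumes the K × C × H × W layout described above; `select_by_norm`, the default p = 2 and the `top` parameter are illustrative choices. Norms over only C or only H × W would yield one norm per spatial position or per channel, which this sketch does not cover.

```python
import numpy as np

def select_by_norm(stack: np.ndarray, p: float = 2.0, top: int = 1):
    """Return the indices of the `top` transformations whose feature maps have the
    largest l_p norm over C x H x W, together with those feature maps."""
    k = stack.shape[0]
    flat = np.abs(stack.reshape(k, -1))
    norms = (flat ** p).sum(axis=1) ** (1.0 / p)   # one l_p norm per transformation
    idx = np.argsort(norms)[::-1][:top]             # largest norms first
    return idx, stack[idx]

rng = np.random.default_rng(0)
stack = rng.normal(size=(5, 3, 4, 4))               # K = 5 hypothetical feature maps
idx, chosen = select_by_norm(stack, p=2.0, top=2)
print(idx, chosen.shape)
```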

As explained above, training that incorporates invariances and equivariances enables the neural network to perform its usual task better. For an image classifier, for example, this shows up as higher classification accuracy on test or validation images.

Therefore, in a further advantageous embodiment, input images recorded with at least one sensor are supplied to the trained neural network, so that the neural network maps these input images onto outputs. A control signal is determined from the outputs. A vehicle and/or a system for product quality control and/or a system for monitoring areas is controlled with this control signal.

Because the output of the neural network is then more likely to be correct, the probability that the action carried out by the controlled system is appropriate to the situation captured by the sensor is advantageously increased.

In particular, the method can be fully or partially computer-implemented. The invention therefore also relates to a computer program with machine-readable instructions which, when executed on one or more computers, cause the computer or computers to carry out the described method. In this sense, control units for vehicles and embedded systems for technical devices that are likewise able to execute machine-readable instructions are also to be regarded as computers.

The invention also relates to a machine-readable data carrier and/or a download product with the computer program. A download product is a digital product that can be transmitted over a data network, i.e. downloaded by a user of the data network, and that can be offered for sale, for example, in an online shop for immediate download.

Furthermore, a computer can be equipped with the computer program, with the machine-readable data carrier or with the download product.

Further measures improving the invention are presented in more detail below, together with the description of the preferred exemplary embodiments of the invention, with reference to the figures.

Exemplary embodiments

The figures show:

Fig. 1 an exemplary embodiment of the method 100 for training a neural network 1;
Fig. 2 an exemplary effect of training with the method 100 on the classification accuracy of an image classifier.

Fig. 1 is a schematic flowchart of an exemplary embodiment of the method 100 for training a neural network 1. In particular, an image classifier can be selected as the neural network 1 in step 105, for example. The neural network 1 is designed to process input images 2 and comprises several convolutional layers. Each of these convolutional layers is designed to map the input f of the respective convolutional layer onto at least one feature map Φ(f,κ) by applying at least one filter kernel κ.

In step 110, a set T of transformations T is provided. As part of the training described here, the neural network 1 can learn to generate at least one equivariant or invariant feature map Φ(f,κ) when one or more of these transformations T are applied to the input f of at least one convolutional layer of the network 1.

According to block 111, elastic transformations, which can be described as a field of displacements in the spatial coordinates of the input image, can be selected as transformations T ∈ T. According to block 111a, these elastic transformations can in particular include linear stretching and/or rotational scaling.
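An elastic transformation can be represented as a displacement field u(x, y) that states where each pixel coordinate moves. For the linear transformations of block 111a, the field is u(p) = A·p - p for a 2 × 2 matrix A. The sketch below builds such fields on a grid centred at the image centre; the centring convention and all function names are assumptions for illustration, not the patent's parameterization.

```python
import numpy as np

def displacement_field(h: int, w: int, a) -> np.ndarray:
    """Displacement u(p) = A·p - p at every pixel p = (x, y) of an h x w grid
    centred on the image centre. Returns shape (2, h, w) holding (u_x, u_y)."""
    ys, xs = np.meshgrid(np.arange(h) - (h - 1) / 2,
                         np.arange(w) - (w - 1) / 2, indexing="ij")
    coords = np.stack([xs, ys])                              # (2, h, w), x first
    moved = np.einsum("ij,jhw->ihw", np.asarray(a, dtype=float), coords)
    return moved - coords

def stretching(sx: float, sy: float) -> np.ndarray:
    """Linear stretching by sx along x and sy along y."""
    return np.diag([float(sx), float(sy)])

def rotation_scaling(angle_rad: float, scale: float) -> np.ndarray:
    """Rotation by angle_rad combined with isotropic scaling."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return scale * np.array([[c, -s], [s, c]])

u = displacement_field(5, 5, rotation_scaling(np.pi / 6, 1.2))
print(u.shape)
```

The identity matrix yields a zero field, so small deviations of A from the identity correspond to small elastic deformations.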

In step 120, this feature map Φ(f,κ) is expressed by an aggregation 5, parameterized with parameters 5a, of feature maps Φ_j(f,T_j[κ]), each of which is obtained by applying transformations T_j ∈ T to the at least one filter kernel κ. That is, the output of the corresponding convolutional layer depends on the parameters 5a.
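Step 120 can be illustrated for a single-channel input: each transformation T_j is applied to the kernel κ, the input is correlated with every transformed kernel, and the resulting stack of Φ_j(f,T_j[κ]) is aggregated. As an assumption for a compact example, the transformation set here consists of 90° rotations (which need no interpolation), and the aggregation is either an element-wise maximum or a softmax-weighted sum whose logits `alpha` stand in for the parameters 5a. All names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate_valid(f: np.ndarray, k: np.ndarray) -> np.ndarray:
    """2-D cross-correlation without padding, as computed by a conv layer."""
    windows = sliding_window_view(f, k.shape)        # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, k)

def rotations(k: np.ndarray):
    """Hypothetical transformation set T: kernel rotated by 0, 90, 180, 270 degrees."""
    return [np.rot90(k, j) for j in range(4)]

def transformed_feature_maps(f: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Stack of Φ_j(f, T_j[κ]) for all T_j in T, shape (K, H', W')."""
    return np.stack([correlate_valid(f, tk) for tk in rotations(k)])

def aggregated_layer(f, k, alpha=None):
    """Φ(f, κ): element-wise max over j, or a softmax(alpha)-weighted sum."""
    stack = transformed_feature_maps(f, k)
    if alpha is None:
        return stack.max(axis=0)
    weights = np.exp(alpha - alpha.max())
    weights /= weights.sum()
    return np.tensordot(weights, stack, axes=1)

rng = np.random.default_rng(1)
f, k = rng.normal(size=(6, 6)), rng.normal(size=(3, 3))
print(aggregated_layer(f, k).shape)
```

With the maximum over all four rotations and a square kernel, rotating the input by 90° rotates the output feature map by 90°, which is the kind of equivariance the training aims for. A non-uniform softmax weighting ties each weight to a specific rotation and therefore generally breaks this exact symmetry, which is where learning the aggregation parameters comes in.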

According to block 121, the filter kernel κ can be expressed as a linear combination Σ_i w_i ψ_i of basis functions ψ_i, parameterized with parameters w_i. In particular, according to block 121a, basis functions ψ_i can be chosen that depend at least on Hermite polynomials of the spatial coordinates x, y in the input f.
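A kernel built from Hermite-polynomial basis functions can be sketched as below. The concrete form ψ_nm(x, y) = H_n(x/σ) H_m(y/σ) exp(-(x² + y²)/(2σ²)), the unit-norm normalization and the order limit n + m ≤ max_order are assumptions; the source only states that the basis functions depend on Hermite polynomials of x and y. NumPy's `hermval` evaluates series in the physicists' Hermite polynomials.

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_basis(size: int, sigma: float, max_order: int) -> np.ndarray:
    """Basis ψ_nm(x, y) = H_n(x/σ) H_m(y/σ) exp(-(x²+y²)/(2σ²)) for n + m <= max_order,
    each normalized to unit Frobenius norm. Returns shape (num_basis, size, size)."""
    r = np.arange(size) - (size - 1) / 2
    ys, xs = np.meshgrid(r, r, indexing="ij")
    gauss = np.exp(-(xs**2 + ys**2) / (2 * sigma**2))
    basis = []
    for n in range(max_order + 1):
        for m in range(max_order + 1 - n):
            hn = hermval(xs / sigma, [0] * n + [1])   # H_n(x/σ)
            hm = hermval(ys / sigma, [0] * m + [1])   # H_m(y/σ)
            psi = hn * hm * gauss
            basis.append(psi / np.linalg.norm(psi))
    return np.stack(basis)

def kernel_from_weights(w: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Filter kernel κ = Σ_i w_i ψ_i."""
    return np.tensordot(w, basis, axes=1)

basis = hermite_basis(size=7, sigma=1.5, max_order=2)
w = np.random.default_rng(0).normal(size=basis.shape[0])   # parameters w_i
kappa = kernel_from_weights(w, basis)
print(basis.shape, kappa.shape)
```

Under this assumed form, a rescaled version of κ can be obtained by keeping the weights w_i and rebuilding the basis with a different σ, which is one way a transformation could act on the kernel through its basis.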

According to block 122, the feature maps Φ_j(f,T_j[κ]) in the aggregation 5 can be weighted relative to one another with weights β_j(f) that depend on the input f.
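Input-dependent weights β_j(f) could, for example, come from a softmax over one score per transformed feature map. Using the mean activation of each Φ_j as the score and the temperature `tau` are hypothetical choices made only for this sketch; the source does not say how β_j(f) is computed.

```python
import numpy as np

def input_dependent_aggregation(stack: np.ndarray, tau: float = 1.0):
    """Φ = Σ_j β_j(f) Φ_j with β(f) = softmax(score_j / tau).
    Hypothetical score: the mean activation of each Φ_j. stack has shape (K, ...)."""
    k = stack.shape[0]
    scores = stack.reshape(k, -1).mean(axis=1) / tau
    beta = np.exp(scores - scores.max())     # subtract max for numerical stability
    beta /= beta.sum()
    return np.tensordot(beta, stack, axes=1), beta

rng = np.random.default_rng(0)
maps = rng.normal(size=(4, 2, 5, 5))          # K = 4 hypothetical feature maps Φ_j
out, beta = input_dependent_aggregation(maps)
print(out.shape, beta.round(3))
```

Because the scores are computed from the feature maps themselves, a different input f generally produces different weights β_j(f).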

According to block 123, at least one feature map Φ(f,κ) can be chosen for the parameterization with the parameters 5a that contains the sum of the input f and a processing product obtained by successively applying several filter kernels κ₁, κ₂, ... to the input f. Such a feature map is the output of a "residual block".
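A residual feature map in the sense of block 123 adds the input f to the result of applying several kernels one after another. The sketch uses zero padding (output the same size as the input, odd kernel sizes) so that the sum is well defined; practical residual blocks usually also insert nonlinearities and normalization between the kernels, which are omitted here. Function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate_same(f: np.ndarray, k: np.ndarray) -> np.ndarray:
    """2-D cross-correlation with zero padding; output keeps f's shape (odd kernels)."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    padded = np.pad(f, ((ph, ph), (pw, pw)))
    windows = sliding_window_view(padded, k.shape)
    return np.einsum("ijkl,kl->ij", windows, k)

def residual_feature_map(f: np.ndarray, kernels) -> np.ndarray:
    """Φ(f) = f + κ_n ⋆ (... ⋆ (κ_1 ⋆ f)): input plus successively filtered input."""
    out = f
    for k in kernels:
        out = correlate_same(out, k)
    return f + out

rng = np.random.default_rng(0)
f = rng.normal(size=(8, 8))
kappas = [rng.normal(size=(3, 3)), rng.normal(size=(3, 3))]   # κ₁, κ₂
print(residual_feature_map(f, kappas).shape)
```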

According to block 124, aggregating the feature maps Φ_j(f,T_j[κ]) can include forming, for each element of these feature maps, an element-wise maximum, a smoothed element-wise maximum or an element-wise mean along the dimension j of the transformations T_j ∈ T.

Alternatively or in combination, according to block 125, aggregating the feature maps Φ_j(f,T_j[κ]) can include forming a norm over one or more spatial dimensions of each feature map and selecting one or more feature maps on the basis of this norm.

In step 130, learning images 2a are provided together with learning outputs 3a onto which the trained neural network 1 should ideally map these learning images 2a.

In step 140, the neural network 1 maps the learning images 2a onto outputs 3.

In step 150, deviations of these outputs 3 from the learning outputs 3a are evaluated with a predetermined cost function 4.

In step 160, parameters 5a of the parameterized aggregation 5 and further parameters 1a, which characterize the behavior of the neural network 1, are optimized with the aim that the evaluation 4a by the cost function 4 is expected to improve when further learning images 2a are processed. The fully trained states of the parameters 1a and 5a are denoted by the reference symbols 1a* and 5a*, respectively. The fully trained neural network 1, whose behavior is characterized by the parameters 1a* and 5a*, is denoted by the reference symbol 1*.
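Steps 130 to 160 can be illustrated with a toy problem in which the aggregation parameters and the remaining network parameters are optimized together. Everything here is a stand-in chosen for brevity: each sample is reduced to K scalar responses (one per transformation), the aggregation is a softmax-weighted sum with logits `alpha` (playing the role of the parameters 5a), the "network" is a linear head `w`, `b` (playing the role of the parameters 1a), and the cost function is the mean squared error, minimized by plain gradient descent with hand-derived gradients.

```python
import numpy as np

def softmax(v: np.ndarray) -> np.ndarray:
    e = np.exp(v - v.max())
    return e / e.sum()

def train_toy(x: np.ndarray, y: np.ndarray, steps: int = 500, lr: float = 0.1):
    """Jointly fit aggregation logits alpha (stand-in for 5a) and a linear head
    (w, b) (stand-in for 1a) by gradient descent on the mean squared error."""
    n, k = x.shape
    alpha, w, b = np.zeros(k), 0.0, 0.0
    losses = []
    for _ in range(steps):
        a = softmax(alpha)
        z = x @ a                          # aggregated feature per sample
        r = w * z + b - y                  # outputs minus learning outputs
        losses.append(float(np.mean(r**2)))
        grad_w = 2 * np.mean(r * z)
        grad_b = 2 * np.mean(r)
        # Softmax Jacobian: dz/dalpha_j = a_j * (x_j - z)
        grad_alpha = 2 * w * np.mean(r[:, None] * a * (x - z[:, None]), axis=0)
        alpha -= lr * grad_alpha
        w -= lr * grad_w
        b -= lr * grad_b
    return alpha, w, b, losses

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4))      # responses for K = 4 hypothetical transformations
y = 3.0 * x[:, 2]                  # synthetic target depends only on j = 2
alpha, w, b, losses = train_toy(x, y)
print(np.argmax(softmax(alpha)), round(losses[0], 2), round(losses[-1], 4))
```

Because the synthetic target depends only on transformation j = 2, the learned aggregation weights should concentrate on that index while the loss falls, mirroring how a trained aggregation can favour the transformation that best fits the data.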

In step 170, input images 2 recorded with at least one sensor 51 are supplied to the trained neural network 1*. The neural network 1 maps these input images 2 onto outputs 3.

In step 180, a control signal 180a is determined from the outputs 3.

In step 190, a vehicle 50 and/or a system 60 for product quality control and/or a system 70 for monitoring areas is controlled with this control signal 180a.

Fig. 2 shows, for a neural network 1 of the WideResnet-18 architecture designed as an image classifier, the loss ΔA in classification accuracy that occurs when the input images are corrupted with noise of strength P. Curves a to e relate to states of the neural network 1 after different training runs. The experiment was carried out on the publicly available STL-10 dataset, which contains 5000 training images and 8000 test images of 96 × 96 pixels from 10 different classes.
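The accuracy loss ΔA described above can be estimated for any classifier by comparing accuracy on clean and noisy versions of the same test images. The noise model (additive Gaussian noise with standard deviation P, clipped to [0, 1]), the sign convention (ΔA > 0 means accuracy was lost) and the `classify` callable are assumptions; the source does not specify how the noise strength P is defined.

```python
import numpy as np

def accuracy(classify, images: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of images whose predicted class matches the label."""
    return float(np.mean(classify(images) == labels))

def accuracy_drop(classify, images, labels, strength: float, seed: int = 0) -> float:
    """ΔA = clean accuracy - noisy accuracy (positive means accuracy was lost).
    Assumed noise: additive Gaussian with std `strength`, clipped to [0, 1]."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=strength, size=images.shape)
    noisy = np.clip(images + noise, 0.0, 1.0)
    return accuracy(classify, images, labels) - accuracy(classify, noisy, labels)

# Hypothetical stand-in classifier: class 1 if mean brightness exceeds 0.5.
def toy_classifier(images: np.ndarray) -> np.ndarray:
    return (images.mean(axis=(1, 2, 3)) > 0.5).astype(int)

labels = np.repeat([0, 1], 100)
images = np.where(labels[:, None, None, None] == 1, 0.8, 0.2) * np.ones((200, 3, 8, 8))
for p in (0.0, 1.0, 10.0):
    print(p, accuracy_drop(toy_classifier, images, labels, p))
```

Sweeping `strength` over a range of values and plotting the result for differently trained networks gives curves of the kind compared in Fig. 2.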

Curve a relates to conventional training. Curves b to e relate to different examples of training with the method 100 described here. The improved training at least partially compensates for the loss of accuracy caused by the noisy input images. For some configurations, there is a gain even on noise-free input images (the curve lies above curve a).

CITATIONS INCLUDED IN THE DESCRIPTION

This list of documents cited by the applicant was generated automatically and is included solely for the better information of the reader. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited patent literature

DE 102018204494 B3 [0004]

Claims

Method (100) for training a neural network (1) which is designed to process input images (2) and comprises a plurality of convolution layers, each of these convolution layers being designed to map the input f of the respective convolution layer onto at least one feature map Φ(f,κ) by applying at least one filter kernel κ, comprising the steps: • a set T of transformations T is provided (110) with respect to which the neural network (1) is to be enabled, during training, to learn to generate at least one equivariant or invariant feature map Φ(f,κ) when these transformations are applied to the input f of at least one convolution layer; • this feature map Φ(f,κ) is expressed (120) by an aggregation (5), parameterized with parameters (5a), of feature maps Φ_j(f,T_j[κ]), each of which is obtained by applying transformations T_j ∈ T to the at least one filter kernel κ; • learning images (2a) and learning outputs (3a), onto which the trained neural network (1) should ideally map these learning images (2a), are provided (130); • the learning images (2a) are mapped (140) by the neural network (1) onto outputs (3); • deviations of these outputs (3) from the learning outputs (3a) are evaluated (150) using a predetermined cost function (4); • parameters (5a) of the parameterized aggregation (5) and further parameters (1a) that characterize the behavior of the neural network (1) are optimized (160) with the aim that, upon further processing of learning images (2a), the evaluation (4a) by the cost function (4) is expected to improve.

Method (100) according to claim 1, wherein the filter kernel κ is expressed (121) as a linear combination Σ_i w_i ψ_i of basis functions ψ_i parameterized with parameters w_i.

Method (100) according to claim 2, wherein basis functions ψ_i are chosen (121a) that depend at least on Hermite polynomials of spatial coordinates x, y in the input f.

Method (100) according to any one of claims 1 to 3, wherein the feature maps Φ_j(f,T_j[κ]) in the aggregation (5) are weighted (122) relative to one another with weights β_j(f) that depend on the input f.

Method (100) according to any one of claims 1 to 4, wherein elastic transformations, which can be described as a field of displacements in spatial coordinates of the input image, are chosen (111) as transformations T ∈ T.

Method (100) according to claim 5, wherein linear stretching and/or rotational scaling are chosen (111a) as transformations T ∈ T.

Method (100) according to any one of claims 1 to 6, wherein at least one feature map Φ(f,κ) is chosen (123) that contains a sum of the input f and a processing product resulting from the successive application of a plurality of filter kernels κ₁, κ₂, ... to the input f.

Method (100) according to any one of claims 1 to 7, wherein the feature maps Φ_j(f,T_j[κ]) are aggregated (124) by forming, for each element of the feature maps, • an element-wise maximum, • a smoothed element-wise maximum or • an element-wise mean along the dimension j of the transformations T_j ∈ T.

Method (100) according to any one of claims 1 to 7, wherein aggregating the feature maps Φ_j(f,T_j[κ]) includes forming a norm over one or more spatial dimensions of each feature map and selecting (125) one or more feature maps on the basis of this norm.

Method (100) according to any one of claims 1 to 9, wherein an image classifier that maps input images onto classification scores relating to one or more classes of a given classification is selected (105) as the neural network (1).

Method (100) according to any one of claims 1 to 10, wherein • the trained neural network (1*) is supplied (170) with input images (2) recorded with at least one sensor (51), so that these input images (2) are mapped by the neural network (1) onto outputs (3); • a control signal (180a) is determined (180) from the outputs (3); and • a vehicle (50), and/or a system (60) for product quality control, and/or a system (70) for monitoring areas, is controlled (190) with this control signal (180a).

Computer program containing machine-readable instructions which, when executed on one or more computers, cause the computer or computers to carry out a method according to any one of claims 1 to 11.

Machine-readable data carrier with the computer program according to claim 12.

One or more computers with the computer program according to claim 12 and/or with the machine-readable data carrier according to claim 13.