DE102022208614A1 - Reconstruction of training examples in federated training of neural networks

Info

Publication number: DE102022208614A1
Application number: DE102022208614.7A
Authority: DE
Inventors: Andres Mauricio Munoz Delgado
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2022-08-19
Filing date: 2022-08-19
Publication date: 2024-02-22
Also published as: US20240062073A1; CN117592553A

Abstract

Method (100) for reconstructing training examples x with which a given neural network (1) was trained to optimize a given cost function L, comprising the steps:
• a quality function R is provided (110) which measures, for a reconstructed training example x̃, the extent to which it belongs to an expected domain or distribution of the training examples x;
• a size B of a batch of training examples x with which the neural network (1) was trained is provided (120);
• a gradient dL/dM_W of the cost function L with respect to parameters M_W that characterize the behavior of the neural network (1), determined during this training, is divided into a partition of B components P_j (130);
• from each component P_j of the gradient dL/dM_W, a training example ${\tilde{x}}_{j}^{T}$ is reconstructed (140) using the functional dependence of the outputs y_i of neurons in the input layer of the neural network (1), which receives the training examples x, on the parameters M_W,i of these neurons and on the training examples x;
• the reconstructions ${\tilde{x}}_{j}^{T}$ obtained in this way are evaluated with the quality function R (150);
• the partition into the components P_j is optimized (160) with the goal that, when the gradient dL/dM_W of the cost function L is partitioned again and new training examples ${\tilde{x}}_{j}^{T}$ are reconstructed, their evaluation by the quality function R improves.

Description

The present invention relates to the federated training of neural networks, in which multiple clients contribute to the training on the basis of local stocks of training examples.

State of the art

Training neural networks that can be used, for example, as classifiers for images or other measurement data requires a large number of training examples with sufficient variability. If the training examples contain personal data, such as images of faces or license plates, collecting training examples from many countries, each with different data protection regulations, becomes legally problematic. In addition, image or video data, for example, are very large in volume, so that centralized collection requires a great deal of bandwidth and storage space.

Therefore, in federated learning, a central entity distributes the neural network to many clients, each of which then trains the network on its local stock of training examples and determines proposals for changes to the network's parameters. The central entity aggregates these proposals into a final update of the parameters.
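The aggregation step can be illustrated with a minimal sketch. It assumes that each client reports its proposal as a flat list of gradient values and that the central entity combines them with a plain element-wise mean; the source does not specify the aggregation rule, and the names `aggregate_updates` and `apply_update` are hypothetical.

```python
def aggregate_updates(client_updates):
    """Average the clients' proposed parameter changes element-wise.

    Assumption: a plain mean; the source does not fix the aggregation rule.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    n_clients = len(client_updates)
    n_params = len(client_updates[0])
    return [
        sum(update[k] for update in client_updates) / n_clients
        for k in range(n_params)
    ]


def apply_update(params, update, learning_rate=0.1):
    """Take one gradient-descent step with the aggregated update."""
    return [p - learning_rate * u for p, u in zip(params, update)]


# Two hypothetical clients report gradients for a two-parameter model.
aggregated = aggregate_updates([[1.0, 2.0], [3.0, 4.0]])
new_params = apply_update([1.0, 1.0], aggregated, learning_rate=0.5)
```

Only the gradient lists cross the client boundary in this sketch, which mirrors the exchange described above.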

In this way, only parameters of the neural network and changes to them are exchanged between the central entity and the clients; the training examples themselves remain with the clients.

The flip side is that some control over the quality of the eventual training result is given up.

Disclosure of the invention

The invention provides a method for reconstructing training examples x with which a given neural network was trained to optimize a given cost function L. In federated training in particular, the cost function L is known to all participants.

As part of the method, a quality function R is first provided. Regardless of how a reconstructed training example x̃ was obtained, this quality function R measures the extent to which that example belongs to an expected domain or distribution of the training examples. The quality function R therefore outputs a score that indicates how well the reconstructed training example x̃ fits into the expected domain or distribution. This makes the goal that the reconstructed training example x̃ fits into this domain or distribution amenable to optimization, true to Archimedes' maxim that any load can be moved provided there is a point at which to apply a lever.

Furthermore, a size B of a batch of training examples x with which the neural network was trained is provided. In federated training, which is carried out by a plurality of decentralized clients C and coordinated by a central entity Q, B is usually either specified by the central entity Q or reported to the central entity Q by each client C. If B is not known, an estimate can be used instead, and the refinement of this estimate can be included in the optimization described below.

The size B is used to divide a gradient dL/dM_W of the cost function L with respect to parameters M_W that characterize the behavior of the neural network, determined during the training, into a partition of B components P_j. In federated learning, the gradient dL/dM_W is typically what clients C report back to a coordinating central entity Q. The partition can, for example, be implemented as a sum according to

$\sum_{j = 1, \dots, B} \frac{1}{B} P_{j} = \frac{d L}{d M_{W}}.$
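The sum condition on the partition can be checked numerically. The following is a small sketch, assuming the gradient and each component P_j are stored as flat lists of equal length; the helper name `is_valid_partition` and the tolerance are illustrative choices, not part of the source.

```python
def is_valid_partition(components, gradient, tol=1e-9):
    """Check that (1/B) * sum_j P_j reproduces the aggregated gradient."""
    batch_size = len(components)  # B, the number of components
    for k, g_k in enumerate(gradient):
        mean_k = sum(p[k] for p in components) / batch_size
        if abs(mean_k - g_k) > tol:
            return False
    return True


# Hypothetical aggregated gradient and a candidate partition with B = 2.
gradient = [0.5, -1.0, 2.0]
components = [[1.0, -3.0, 1.0], [0.0, 1.0, 3.0]]
partition_ok = is_valid_partition(components, gradient)
```

Many different sets of components satisfy this condition for the same gradient, which is why the method treats the choice of partition as the quantity to be optimized.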

From each component P_j of the gradient dL/dM_W, a training example ${\tilde{x}}_{j}^{T}$ is reconstructed using the functional dependence of the outputs y_i of neurons in the input layer of the neural network, which receives the training examples x, on the parameters M_W,i of these neurons and on the training examples x. As explained in more detail below, such a reconstruction is possible under the simplifying assumption that a single training example x activates at least one neuron in the input layer.

Such a reconstruction requires that the gradient dL/dM_W of the cost function L attributable to this training example x is known. In federated learning, however, the gradient dL/dM_W that is reported back is typically aggregated over all B training examples of the batch, so that no direct conclusion about any single training example can be drawn from it. The method proposed here therefore carries out the reconstruction separately for each component P_j of the gradient dL/dM_W and thereby reduces the problem to the task of finding the correct partition of the gradient dL/dM_W into components P_j.
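A short worked equation may make the obstacle concrete. It rests on assumptions that are not stated at this point in the source: L is the batch mean of per-example losses L_k, neuron i is active (y_i > 0) for every example x_k of the batch, and the single-neuron relation dL_k/dw_i^T = (dL_k/db_i) x_k^T derived later in the description holds. With the shorthand g_k := dL_k/db_i, the aggregated gradients are

$\frac{d L}{d b_{i}} = \frac{1}{B} \sum_{k = 1}^{B} g_{k}, \qquad \frac{d L}{d w_{i}^{T}} = \frac{1}{B} \sum_{k = 1}^{B} g_{k} x_{k}^{T},$

so that dividing one by the other gives

${(\frac{d L}{d b_{i}})}^{- 1} (\frac{d L}{d w_{i}^{T}}) = \frac{\sum_{k} g_{k} x_{k}^{T}}{\sum_{k} g_{k}}.$

Under these assumptions the ratio is a weighted mixture of all B examples rather than any single one, which is consistent with the need to split the gradient into components first.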

To this end, the reconstructions ${\tilde{x}}_{j}^{T}$ obtained for all components P_j are evaluated with the quality function R. The partition into the components P_j is then optimized with the goal that, when the gradient of the cost function is partitioned again and new training examples ${\tilde{x}}_{j}^{T}$ are reconstructed, their evaluation by the quality function R improves.

Ultimately, the space of possible partitions of the gradient dL/dM_W into components P_j is searched for the partition which, when a reconstruction ${\tilde{x}}_{j}^{T}$ of a training example is generated for each component P_j according to this partition, leads to reconstructions ${\tilde{x}}_{j}^{T}$ that belong to the expected domain or distribution of the training examples. Thus, only prior knowledge of this expected domain or distribution is needed to reconstruct each individual training example x_1, ..., x_B at least approximately.

Even an approximate reconstruction that is not of the best quality already provides valuable indications of the quality of the training. In particular, it can be checked whether the right kind of training examples x, as specified by the central coordinating entity Q, was used at all. For example, if a neural network that classifies or otherwise processes images of traffic situations is trained for a driving assistance system or for an at least partially automated vehicle, training examples of traffic situations recorded from the perspective of a motor vehicle are required. One of the clients could misunderstand the instruction for collecting training examples and use training examples recorded with a cyclist's helmet camera. Introducing these training examples could make the performance of the neural network, which is ultimately intended for motor vehicles, worse rather than better. Even an imperfect reconstruction allows such errors to be discovered.

Images as training examples x are a preferred use case in a further respect as well. Compared to other types of data they are high in volume, so that federated learning saves a particularly large amount of storage space and data transmission bandwidth. Images are also particularly sensitive with regard to data protection. Thus, in a particularly advantageous embodiment, a neural network is selected which is designed to determine, from an input image divided into pixels or composed of pixels, on the basis of the pixel values

• classification scores with respect to one or more classes of a given classification, and/or
• a semantic segmentation in which each pixel is assigned to a class, and/or
• instances of objects contained in the input image.

In a particularly advantageous embodiment, shares p_j · dL/dM_W with weights p_j and Σ_j p_j = 1 are chosen as the components P_j of the partition. The weights p_j then satisfy 0 < p_j < 1, which is advantageous for numerical optimization.
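As an illustration of this embodiment, the sketch below builds the components P_j = p_j · dL/dM_W from a list of weights. The flat-list representation of the gradient and the name `weighted_components` are assumptions made for the example.

```python
def weighted_components(gradient, weights, tol=1e-9):
    """Return the components P_j = p_j * dL/dM_W for weights summing to one."""
    if any(not 0.0 < p < 1.0 for p in weights):
        raise ValueError("each weight must lie strictly between 0 and 1")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    return [[p * g for g in gradient] for p in weights]


# Hypothetical aggregated gradient, split into B = 2 shares.
gradient = [0.5, -1.0, 2.0]
components = weighted_components(gradient, [0.25, 0.75])
# Summing the shares recovers the aggregated gradient.
recovered = [sum(c[k] for c in components) for k in range(len(gradient))]
```

With this parametrization the search space shrinks from B full-size components to B scalar weights.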

In a further advantageous embodiment, a gradient of the quality function R is backpropagated to changes of the weights p_j. Proven gradient-based methods, such as stochastic gradient descent, can then be used to find the optimum.
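One possible way to keep the constraints 0 < p_j < 1 and Σ_j p_j = 1 satisfied during such gradient-based updates is to express the weights through unconstrained auxiliary values z_j via a softmax. This parametrization and the symbol z_j are assumptions for illustration; the source does not prescribe them. With

$p_{j} = \frac{e^{z_{j}}}{\sum_{k} e^{z_{k}}}, \qquad \frac{\partial p_{j}}{\partial z_{k}} = p_{j} (\delta_{jk} - p_{k}),$

the chain rule gives the gradient of R with respect to the auxiliary values as

$\frac{\partial R}{\partial z_{k}} = \sum_{j} \frac{\partial R}{\partial p_{j}} \frac{\partial p_{j}}{\partial z_{k}} = p_{k} (\frac{\partial R}{\partial p_{k}} - \sum_{j} p_{j} \frac{\partial R}{\partial p_{j}}).$

An update of the z_k along this gradient changes the weights p_j without ever leaving the admissible range.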

The weights p_j can, for example, be initialized with softmax values formed from logits of the neural network. These logits are raw outputs of a layer of the neural network and thus provide a first indication of which of the training examples x in the batch contributed particularly strongly to the gradient dL/dM_W.
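A softmax initialization of this kind might look as follows. Which logits are used and how they map to the B weights is not specified in the source; the sketch simply assumes one hypothetical logit per batch element.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits, one per training example in a batch of B = 3.
initial_weights = softmax([2.0, 0.0, -1.0])
```

The resulting values are strictly between 0 and 1 and sum to 1, so they can serve directly as starting weights p_j.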

In a particularly advantageous embodiment, a neural network is selected that comprises weights $w_{i}^{T}$ and bias values b_i as parameters M_W,i. In such a network,

• an i-th neuron multiplies a training example x fed to this neuron by the weights $w_{i}^{T}$,
• the neuron adds a bias value b_i to the result to obtain an activation value of the neuron, and
• the neuron determines an output y_i by applying a nonlinear activation function to this activation value.
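The three steps performed by the neuron can be sketched in a few lines. ReLU is used as the activation function here because the description names it as an example; the function names are illustrative.

```python
def relu(value):
    """Rectified Linear Unit: pass positive values through, clamp the rest to 0."""
    return value if value > 0.0 else 0.0


def neuron_output(weights, bias, x):
    """Compute y_i = ReLU(w_i^T x + b_i) for one neuron."""
    # Steps 1 and 2: weighted sum of the inputs plus bias gives the activation value.
    activation = sum(w * x_k for w, x_k in zip(weights, x)) + bias
    # Step 3: apply the nonlinear activation function.
    return relu(activation)


y_active = neuron_output([0.5, -1.0], 0.25, [2.0, 0.5])    # activation 0.75
y_inactive = neuron_output([0.5, -1.0], -1.0, [2.0, 0.5])  # activation -0.5
```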

The activation value is then a linear function of the training example x. The activation function can, for example, be designed to be at least piecewise linear. The Rectified Linear Unit (ReLU) function, for instance, passes the positive part of its argument through unchanged.

If the input layer of the neural network is immediately followed by a dense layer whose neurons are connected to all neurons of the input layer, the output y_i of the i-th neuron is given by

$y_{i} = \mathrm{ReLU} (w_{i}^{T} x + b_{i}),$

so that for outputs y_i > 0 it follows that

$\frac{d L}{d b_{i}} = \frac{d L}{d y_{i}} \frac{d y_{i}}{d b_{i}} = \frac{d L}{d y_{i}}$

because dy_i/db_i = 1. Analogously,

$\frac{d L}{d w_{i}^{T}} = \frac{d L}{d y_{i}} \frac{d y_{i}}{d w_{i}^{T}} = \frac{d L}{d b_{i}} x^{T}.$

The reconstruction x̃^T of the training example x^T can therefore be calculated as

${\tilde{x}}^{T} = {(\frac{d L}{d b_{i}})}^{- 1} (\frac{d L}{d w_{i}^{T}})$

provided that dL/db_i ≠ 0, a condition that the neural network must also satisfy.
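The formula can be tried out on a single training example. The sketch below assumes one ReLU neuron and a squared-error loss L = 0.5 · (y - target)^2; the loss, the numbers, and the function names are illustrative assumptions, since the source leaves the cost function general.

```python
def single_sample_gradients(weights, bias, x, target):
    """Gradients of L = 0.5 * (y - target)^2 for y = ReLU(w^T x + b)."""
    activation = sum(w * x_k for w, x_k in zip(weights, x)) + bias
    y = max(activation, 0.0)
    dL_dy = y - target
    relu_slope = 1.0 if activation > 0.0 else 0.0
    dL_db = dL_dy * relu_slope            # dy/db = 1 where the neuron is active
    dL_dw = [dL_db * x_k for x_k in x]    # dL/dw^T = (dL/db) * x^T
    return dL_dw, dL_db


def reconstruct_input(dL_dw, dL_db):
    """Recover x^T as (dL/db)^-1 * dL/dw^T; requires dL/db != 0."""
    if dL_db == 0.0:
        raise ValueError("dL/db_i is zero; this neuron cannot be used")
    return [g / dL_db for g in dL_dw]


x_true = [2.0, 0.5, -1.0]
dL_dw, dL_db = single_sample_gradients([0.5, -1.0, 0.25], 0.25, x_true, target=2.0)
x_reconstructed = reconstruct_input(dL_dw, dL_db)
```

In this single-example setting the reconstruction matches the input up to floating-point rounding; the guard in `reconstruct_input` corresponds to the condition dL/db_i ≠ 0.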

As explained above, this calculation is carried out separately for each component P_j of the gradient dL/dM_W in order to obtain a reconstruction ${\tilde{x}}_{j}^{T}$ in each case. That is, gradients dL/db_i of the cost function L with respect to the bias b_i and gradients $d L / d w_{i}^{T}$ of the cost function L with respect to the weights $w_{i}^{T}$ are determined from the component P_j of the gradient dL/dM_W, and the sought reconstruction ${\tilde{x}}_{j}^{T}$ of the training example is determined from these gradients dL/db_i and $d L / d w_{i}^{T}$. As the optimization of the partition of the gradient dL/dM_W into the components P_j progresses, the reconstructions ${\tilde{x}}_{j}^{T}$ become better and better.

In a particularly advantageous embodiment, a trained discriminator of a Generative Adversarial Network (GAN) is chosen as the quality function R. Such a discriminator has learned to distinguish real samples from the expected domain or distribution from samples generated with a generator of the GAN. A classification score output by the discriminator can, for example, be used as the value of the quality function R. Probabilistic models that make it possible to estimate density distributions of the training examples x via likelihood functions (such as the Bayesian models also used for spam filtering of emails) can also be used.
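As a lightweight stand-in for the probabilistic variant, the following sketch uses the log-likelihood under a diagonal Gaussian fitted to reference samples as the quality function R. This specific model is an assumption chosen for brevity; it is neither the GAN discriminator nor a model named in the source, and the function names are hypothetical.

```python
import math


def fit_diagonal_gaussian(samples, min_variance=1e-6):
    """Estimate per-feature mean and variance from reference samples."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(dim)]
    variances = [
        max(sum((s[k] - means[k]) ** 2 for s in samples) / n, min_variance)
        for k in range(dim)
    ]
    return means, variances


def quality_score(x, means, variances):
    """Log-likelihood of x under the fitted Gaussian; higher means a better fit."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (x_k - mu) ** 2 / var)
        for x_k, mu, var in zip(x, means, variances)
    )


# Hypothetical reference samples from the expected domain.
reference = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
means, variances = fit_diagonal_gaussian(reference)
in_domain = quality_score([1.0, 2.0], means, variances)
out_of_domain = quality_score([10.0, -5.0], means, variances)
```

A candidate close to the reference samples receives a higher score than one far away, which is the property the optimization of the partition relies on.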

The training examples x can, for example, represent images and/or time series of measured values. Images in particular are large in volume and sensitive with regard to data protection, so that federated training is particularly advantageous. Detailed time series of measured values in industrial plants can likewise allow conclusions to be drawn about internals of a production process that are not intended for the public. The reconstructed training examples ${\tilde{x}}_{j}^{T}$ are not quite as rich in detail and are therefore harder for unauthorized persons to exploit.

In a particularly advantageous embodiment, the reconstructed training examples ${\tilde{x}}_{j}^{T}$ are fed to the neural network as validation data. The outputs then delivered by the neural network are compared with target outputs with which these reconstructed training examples are labeled (from any source). The result of this comparison is used to determine the extent to which the neural network generalizes sufficiently to unseen data. The reconstructed training examples ${\tilde{x}}_{j}^{T}$ are ideal test objects in that, as attested by the quality function R, they demonstrably belong to the domain or distribution of the original training examples x without being identical to any of these training examples x.
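The validation step can be sketched as a comparison of model outputs with labels. The accuracy metric, the threshold of 0.9, and the toy classifier are assumptions for illustration; the source does not fix how the comparison is evaluated.

```python
def validation_accuracy(model, examples, labels):
    """Fraction of examples for which the model's output matches the label."""
    if not examples:
        raise ValueError("need at least one validation example")
    correct = sum(1 for x, label in zip(examples, labels) if model(x) == label)
    return correct / len(examples)


def generalizes(model, examples, labels, threshold=0.9):
    """Decide whether the accuracy on the validation data is sufficient."""
    return validation_accuracy(model, examples, labels) >= threshold


# Hypothetical stand-in classifier: class 1 if the feature sum is positive.
def toy_model(x):
    return 1 if sum(x) > 0 else 0


# Hypothetical reconstructed examples with labels from some external source.
reconstructions = [[1.0, 2.0], [-1.0, -0.5], [0.5, -2.0], [3.0, 0.1]]
labels = [1, 0, 1, 1]
accuracy = validation_accuracy(toy_model, reconstructions, labels)
```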

If this check shows that the neural network generalizes sufficiently to unseen data, the network can be used in its intended productive operation. Measurement data recorded with at least one sensor are then advantageously fed to the neural network. A control signal is determined from the output then delivered by the neural network. A vehicle, a driving assistance system, a quality control system, an area monitoring system, and/or a medical imaging system is controlled with the control signal. In this context, the reconstruction of training examples with the method proposed here ultimately provides increased assurance that the reaction of the controlled system to the control signal is appropriate to the situation represented by the measurement data.

As explained above, in federated training the reconstruction is advantageously carried out by a central entity Q, which distributes the neural network to a plurality of clients C for the purpose of federated training. The gradient dL/dM_W of the cost function L with respect to the parameters M_W is determined by a client C when training the neural network on a batch of B training examples x and is aggregated over these B training examples x. In this way it can be checked whether the contributions of all clients C are actually useful with regard to the intended application of the neural network. The example has already been mentioned in which, due to a misunderstanding between client C and central entity Q, training examples are used that do not fit the intended application at all. It is also possible, for example, that individual clients C persistently use training examples of poor technical quality. Camera images may, for instance, be incorrectly exposed and/or blurred, so that their essential content cannot be recognized.

For example, a time development and/or statistics of the reconstructed training examples $\tilde{x}_j^T$ can be determined. Based on this time development and/or statistics, a drift in the behavior of the neural network, and/or a deterioration in the behavior of the neural network with respect to previous training examples x during further training with new training examples x, can then be detected. For example, incremental training of the neural network with ever new batches of training examples could cause "knowledge" learned from earlier training examples to be "forgotten" again (so-called "catastrophic forgetting").

Alternatively or in combination with this, a control intervention in the cooperation between the central entity Q and the clients C can be carried out. The aim of this control intervention can be, for example, to halt or reverse a previously detected deterioration or drift. A control intervention can, for example, include temporarily or permanently disregarding the gradients dL/dM_W supplied by at least one client C.

The method can in particular be wholly or partially computer-implemented. The invention therefore also relates to a computer program with machine-readable instructions which, when executed on one or more computers and/or compute instances, cause the computer(s) and/or compute instance(s) to carry out the described method. In this sense, control units for vehicles and embedded systems for technical devices that are likewise capable of executing machine-readable instructions are also to be regarded as computers. Examples of compute instances are virtual machines, containers, or serverless execution environments for executing machine-readable instructions in a cloud.

Likewise, the invention also relates to a machine-readable data carrier and/or to a download product containing the computer program. A download product is a digital product that can be transmitted via a data network, i.e. downloaded by a user of the data network, and that can be offered for sale, for example, in an online shop for immediate download.

Furthermore, a computer can be equipped with the computer program, with the machine-readable data carrier, or with the download product.

Further measures that improve the invention are presented in more detail below, together with the description of the preferred exemplary embodiments of the invention, with reference to the figures.

Exemplary embodiments

The figures show:

FIG. 1 an exemplary embodiment of the method 100 for reconstructing training examples x;
FIG. 2 an illustration of the reconstruction, which is reduced to an optimization of components P_j of a partition.

FIG. 1 is a schematic flow diagram of an exemplary embodiment of the method 100 for reconstructing training examples x with which a given neural network 1 was trained to optimize a given cost function L.

In step 110, a quality function R is provided which, for a reconstructed training example x̃, measures the extent to which it belongs to an expected domain or distribution of the training examples x. According to block 111, this quality function R can be a trained discriminator of a Generative Adversarial Network, GAN. As explained above, probabilistic models, for example, can also be used.

In step 120, a size B of a batch of training examples x with which the neural network was trained is provided.

In step 130, a gradient dL/dM_W of the cost function L with respect to parameters M_W that characterize the behavior of the neural network 1, determined during this training, is divided into a partition of B components P_j. According to block 131, for example, portions p_j · dL/dM_W with weights p_j and Σ_j p_j = 1 can be chosen as components P_j of the partition.
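The partition of block 131 can be sketched as follows. This is a minimal illustration only: the gradient is treated as a flat list of numbers, and the function name `partition_gradient` as well as all numeric values are assumptions that do not appear in the source.

```python
def partition_gradient(gradient, weights):
    """Split a flat gradient into len(weights) components p_j * gradient."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights p_j must sum to 1")
    return [[p * g for g in gradient] for p in weights]


gradient = [0.4, -1.2, 0.8]   # toy stand-in for dL/dM_W (assumed values)
weights = [0.5, 0.3, 0.2]     # weights p_j for a batch of size B = 3 (assumed)
components = partition_gradient(gradient, weights)

# Because the weights sum to 1, the components add back up to the gradient.
recombined = [sum(column) for column in zip(*components)]
```

The constraint Σ_j p_j = 1 is what makes the components a partition in the sense that nothing of the original gradient is lost or counted twice.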

In step 140, a training example $\tilde{x}_j^T$ is reconstructed from each component P_j of the gradient dL/dM_W, using the functional dependence of the outputs y_i of neurons in the input layer of the neural network 1, which receives the training examples x, on the parameters M_W,i of these neurons and on the training examples x.

The parameters of the neural network can be, for example, multiplicative weights $w_i^T$ and additive bias values b_i that are applied to the training example x at the i-th neuron in the input layer of the neural network 1. According to block 141, gradients dL/db_i of the cost function L with respect to the bias b_i and gradients $dL/dw_i^T$ of the cost function L with respect to the weights $w_i^T$ can then be determined from the component P_j of the gradient dL/dM_W. According to block 142, the sought reconstruction $\tilde{x}_j^T$ of the training example can then be determined from these gradients dL/db_i and $dL/dw_i^T$. As explained above, the activation function should be a ReLU function for this purpose, and the input layer of the neural network 1 should be immediately followed by a dense layer. Furthermore, the neural network 1 must ensure (dL/db_i) ≠ 0.
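One common way to read blocks 141 and 142 is the following: for a neuron with activation value a_i = w_i^T x + b_i, the chain rule gives dL/dw_i^T = (dL/da_i) · x^T and dL/db_i = dL/da_i, so dividing the weight gradient by the bias gradient returns x^T whenever dL/db_i ≠ 0. The sketch below checks this for a single training example and a single ReLU neuron. The squared-error loss, the function names, and all numeric values are assumptions chosen for illustration.

```python
def neuron_gradients(w, b, x, target):
    """Gradients of L = 0.5 * (relu(w.x + b) - target)**2 w.r.t. w and b."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b      # activation value
    y = max(0.0, a)                                   # ReLU output y_i
    dL_da = (y - target) * (1.0 if a > 0 else 0.0)    # zero if the ReLU is inactive
    dL_dw = [dL_da * xi for xi in x]                  # dL/dw = dL/da * x
    dL_db = dL_da                                     # dL/db = dL/da
    return dL_dw, dL_db


def reconstruct(dL_dw, dL_db):
    """Recover the input from the weight and bias gradients of one neuron."""
    if dL_db == 0.0:
        raise ValueError("dL/db_i must be non-zero for the reconstruction")
    return [g / dL_db for g in dL_dw]


x = [0.2, 0.7, 0.1]                                   # assumed training example
dL_dw, dL_db = neuron_gradients(w=[0.5, -0.3, 0.8], b=0.1, x=x, target=1.0)
x_rec = reconstruct(dL_dw, dL_db)                     # equals x up to rounding
```

The explicit check for dL/db_i = 0 mirrors the requirement stated above: an inactive ReLU neuron yields a zero bias gradient and therefore carries no information about x.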

According to block 143, the reconstruction can be carried out by a central entity Q, which distributes the neural network 1 to a plurality of clients C for the purpose of federated training. This goes hand in hand with the fact that, according to block 132, the gradient dL/dM_W is determined by a client C when training the neural network 1 on a batch of B training examples x and is aggregated over these B training examples x.
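The effect of the aggregation in block 132 can be illustrated with a small sketch. It assumes that the client sums the per-example gradients of a single ReLU neuron under a squared-error loss; the source does not specify the form of aggregation, and all names and values here are hypothetical. Dividing the aggregated weight gradient by the aggregated bias gradient then yields only a blend of the B training examples rather than any individual one, which is why the aggregated gradient has to be split into B components first.

```python
def per_example_gradients(w, b, x, target):
    """Weight and bias gradient of 0.5 * (relu(w.x + b) - target)**2 for one example."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    delta = (max(0.0, a) - target) * (1.0 if a > 0 else 0.0)
    return [delta * xi for xi in x], delta


def aggregate(batch, targets, w, b):
    """Sum the per-example gradients over the batch (assumed form of aggregation)."""
    dw_sum, db_sum = [0.0] * len(w), 0.0
    for x, t in zip(batch, targets):
        dw, db = per_example_gradients(w, b, x, t)
        dw_sum = [s + g for s, g in zip(dw_sum, dw)]
        db_sum += db
    return dw_sum, db_sum


batch = [[1.0, 0.0], [0.0, 1.0]]      # B = 2 assumed training examples
targets = [2.0, 2.0]
dw, db = aggregate(batch, targets, w=[0.5, 0.5], b=0.5)

# The ratio of the aggregated gradients mixes both examples.
blend = [g / db for g in dw]          # [0.5, 0.5]
```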

In step 150, the reconstructions $\tilde{x}_j^T$ obtained are evaluated with the quality function R.

In step 160, the partition into the components P_j is optimized with the goal that, when the gradient dL/dM_W of the cost function L is partitioned again and new training examples $\tilde{x}_j^T$ are reconstructed, their evaluation by the quality function R improves.

According to block 161, a gradient of the quality function R can be backpropagated to changes in the weights p_j.

According to block 162, the weights p_j can be initialized with softmax values formed from logits of the neural network 1.
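The initialization of block 162 can be sketched with a numerically stable softmax. Which logits of the neural network 1 are mapped to the B weights is not specified in the source, so the logit values below are purely hypothetical.

```python
import math


def softmax(logits):
    """Map arbitrary real logits to positive weights that sum to 1."""
    m = max(logits)                                  # subtract the maximum for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]      # hypothetical logits, one per batch slot j
p = softmax(logits)           # initial weights p_j
```

A softmax initialization has the convenient property that the constraint Σ_j p_j = 1 from block 131 holds by construction.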

In step 170, the reconstructed training examples $\tilde{x}_j^T$ are supplied to the neural network 1 as validation data.

In step 180, the outputs 3 then supplied by the neural network 1 are compared with target outputs 3a.

In step 190, the result of this comparison is used to determine to what extent the neural network 1 generalizes sufficiently to unseen data. In the example shown in FIG. 1, this is classified in binary terms.
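Steps 180 and 190 can be sketched as a comparison followed by a binary decision. The source does not say how the comparison result is turned into a truth value; the accuracy criterion, the threshold `min_accuracy`, and the class labels below are assumptions.

```python
def generalizes(outputs, targets, min_accuracy=0.9):
    """Return 1 if the share of outputs matching the targets reaches min_accuracy, else 0."""
    if not outputs or len(outputs) != len(targets):
        raise ValueError("outputs and targets must be non-empty and of equal length")
    correct = sum(1 for o, t in zip(outputs, targets) if o == t)
    return 1 if correct / len(outputs) >= min_accuracy else 0


outputs = ["car", "car", "tree", "car"]    # outputs 3 for the validation data (assumed)
targets = ["car", "car", "tree", "tree"]   # target outputs 3a (assumed)

verdict = generalizes(outputs, targets, min_accuracy=0.7)   # 3 of 4 correct -> 1
```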

If the neural network 1 generalizes sufficiently (truth value 1), measurement data 2 recorded with at least one sensor are supplied to the neural network 1 in step 200.

In step 210, a control signal 210a is determined from the output 3 then supplied by the neural network 1.

In step 220, a vehicle 50, a driving assistance system 60, a system 70 for quality control, a system 80 for monitoring areas, and/or a system 90 for medical imaging is controlled with the control signal 210a.

The reconstructed training examples $\tilde{x}_j^T$ can, alternatively or in combination with this, also be used in other ways. To this end, in step 230, a time development and/or statistics 4 of the reconstructed training examples $\tilde{x}_j^T$ is determined. Based on this time development and/or statistics 4,

• in step 240, a drift 5a of the behavior of the neural network 1, and/or a deterioration 5b of the behavior of the neural network 1 with respect to previous training examples x during further training with new training examples x, is detected, and/or
• in step 250, a control intervention 6 is made in the cooperation of the central entity Q with the clients C.
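Steps 230 and 240 can be sketched with a simple statistic. The source leaves open which statistic is tracked and how drift is decided. The sketch assumes that one scalar score per training round is recorded (for example a mean rating of that round's reconstructions) and that drift is flagged when the recent average falls clearly below the initial average. The function name, window size, tolerance, and score values are all hypothetical.

```python
from statistics import mean


def detect_drift(scores_per_round, window=3, tolerance=0.1):
    """Flag drift when the last `window` rounds average more than `tolerance` below the first `window` rounds."""
    if len(scores_per_round) < 2 * window:
        return False                      # not enough history to compare
    baseline = mean(scores_per_round[:window])
    recent = mean(scores_per_round[-window:])
    return baseline - recent > tolerance


history = [0.82, 0.80, 0.81, 0.79, 0.62, 0.58, 0.55]   # assumed per-round scores
drift_detected = detect_drift(history)
```

A positive result could then serve as the trigger for the control intervention of step 250.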

According to block 251, the control intervention 6 can include, for example, temporarily or permanently disregarding or underweighting the gradients dL/dM_W supplied by at least one client C, for example by scaling them down.
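Block 251 can be sketched as a weighted combination of the client gradients in which each client receives a scale factor. The source does not state how the central entity Q combines the gradients, so the weighted average used here is an assumption, as are the client names and all values. A factor of 0 disregards a client, and a factor between 0 and 1 underweights it.

```python
def aggregate_client_gradients(client_gradients, scale):
    """Weighted average of the client gradients; scale[c] = 0 drops client c."""
    total = sum(scale[c] for c in client_gradients)
    if total <= 0.0:
        raise ValueError("at least one client must have a positive scale factor")
    dim = len(next(iter(client_gradients.values())))
    result = [0.0] * dim
    for client, gradient in client_gradients.items():
        for k in range(dim):
            result[k] += scale[client] * gradient[k] / total
    return result


client_gradients = {
    "C1": [1.0, 2.0],
    "C2": [3.0, 4.0],
    "C3": [100.0, -100.0],    # client whose contributions were found unsuitable
}
scale = {"C1": 1.0, "C2": 1.0, "C3": 0.0}

update = aggregate_client_gradients(client_gradients, scale)   # [2.0, 3.0]
```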

FIG. 2 illustrates the reconstruction in an application of federated learning in which a central entity Q distributes the neural network to a plurality of clients C. Each client C trains the neural network 1 on a locally available batch of B training examples x, determines the gradient dL/dM_W of the cost function L with respect to the parameters M_W, and passes this gradient dL/dM_W on to the central entity Q.

The central entity Q decomposes the gradient dL/dM_W into a partition of components P_j with j = 1, ..., B. A separate training example $\tilde{x}_j^T$ is reconstructed from each component P_j. The reconstructed training examples $\tilde{x}_j^T$ are evaluated with the quality function R. The weights p_j with which the components P_j of the partition were determined are varied with the aim of improving the evaluation $R(\tilde{x}_j^T)$ by the quality function R. If this iterative process is continued until any desired termination criterion is met, the result is reconstructed training examples $\tilde{x}_j^T$ that are at least similar to the original training examples x and belong to the domain or distribution of these training examples x.
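The iterative process described for FIG. 2 can be sketched as a loop that varies the weights until a termination criterion is met. Several simplifications are made and should be read as assumptions: the weights are parameterized through softmax logits so that they keep summing to 1, finite differences stand in for the backpropagation of block 161, and `toy_score` is a hypothetical placeholder for the rating of the reconstructions as a function of the weights, since the source does not give that dependence in closed form.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def optimize_weights(logits, score, lr=0.5, eps=1e-4, tol=1e-9, max_steps=1000):
    """Vary the weights p = softmax(logits) to increase score(p) until no step improves it."""
    logits = list(logits)
    best = score(softmax(logits))
    for _ in range(max_steps):
        # Finite-difference estimate of d score / d logits (stand-in for backpropagation).
        grad = []
        for k in range(len(logits)):
            bumped = list(logits)
            bumped[k] += eps
            grad.append((score(softmax(bumped)) - best) / eps)
        candidate = [z + lr * g for z, g in zip(logits, grad)]
        new = score(softmax(candidate))
        if new - best < tol:          # termination criterion: no meaningful improvement
            break
        logits, best = candidate, new
    return softmax(logits), best


def toy_score(p):
    """Placeholder rating that peaks at an arbitrary weight vector (assumed)."""
    preferred = [0.6, 0.3, 0.1]
    return -sum((pi - ti) ** 2 for pi, ti in zip(p, preferred))


start = [0.0, 0.0, 0.0]               # uniform initial weights
initial_score = toy_score(softmax(start))
p_opt, final_score = optimize_weights(start, toy_score)
```

In an actual implementation, the score would be obtained by re-partitioning the gradient with the current weights, reconstructing the training examples, and rating them with the quality function R in every iteration.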

Claims

Method (100) for reconstructing training examples x with which a given neural network (1) was trained to optimize a given cost function L, comprising the steps: • a quality function R is provided (110) which, for a reconstructed training example x̃, measures the extent to which it belongs to an expected domain or distribution of the training examples x; • a size B of a batch of training examples x with which the neural network (1) was trained is provided (120); • a gradient dL/dM_W of the cost function L with respect to parameters M_W that characterize the behavior of the neural network (1), determined during this training, is divided into a partition of B components P_j (130); • from each component P_j of the gradient dL/dM_W, a training example $\tilde{x}_j^T$ is reconstructed (140) using the functional dependence of the outputs y_i of neurons in the input layer of the neural network (1), which receives the training examples x, on the parameters M_W,i of these neurons and on the training examples x; • the reconstructions $\tilde{x}_j^T$ thus obtained are evaluated with the quality function R (150); • the partition into the components P_j is optimized (160) with the goal that, when the gradient dL/dM_W of the cost function L is partitioned again and new training examples $\tilde{x}_j^T$ are reconstructed, their evaluation by the quality function R improves.

Method (100) according to Claim 1, wherein portions p_j · dL/dM_W with weights p_j and Σ_j p_j = 1 are chosen as components P_j of the partition (131).

Method (100) according to Claim 2, wherein a gradient of the quality function R is backpropagated to changes in the weights p_j (161).

Method (100) according to one of Claims 1 to 3, wherein the weights p_j are initialized with softmax values formed from logits of the neural network (1) (162).

Method (100) according to one of Claims 1 to 4, wherein a neural network (1) is chosen that has weights $w_i^T$ and bias values b_i as parameters M_W,i, wherein an i-th neuron • multiplies a training example x supplied to this neuron by the weights $w_i^T$, • adds a bias value b_i to the result in order to obtain an activation value of the neuron, and • determines an output y_i by applying a nonlinear activation function to this activation value.

Method (100) according to Claim 5, wherein • gradients dL/db_i of the cost function L with respect to the bias b_i and gradients $dL/dw_i^T$ of the cost function L with respect to the weights $w_i^T$ are determined (141) from the component P_j of the gradient dL/dM_W, and • the sought reconstruction $\tilde{x}_j^T$ of the training example is determined (142) from these gradients dL/db_i and $dL/dw_i^T$.

Method (100) according to one of Claims 1 to 6, wherein a trained discriminator of a Generative Adversarial Network, GAN, is chosen as the quality function R (111).

Method (100) according to one of Claims 1 to 7, wherein training examples x are chosen that represent images and/or time series of measured values.

Method (100) according to one of Claims 1 to 8, wherein • the reconstructed training examples $\tilde{x}_j^T$ are supplied to the neural network (1) as validation data (170), • the outputs (3) then supplied by the neural network (1) are compared with target outputs (3a) (180), and • based on the result of this comparison, it is determined (190) to what extent the neural network (1) generalizes sufficiently to unseen data.

Method (100) according to Claim 9, wherein • in response to the neural network (1) generalizing sufficiently to unseen data, measurement data (2) recorded with at least one sensor are supplied to the neural network (1) (200), • a control signal (210a) is determined (210) from the output (3) then supplied by the neural network (1), and • a vehicle (50), a driving assistance system (60), a system (70) for quality control, a system (80) for monitoring areas, and/or a system (90) for medical imaging is controlled (220) with the control signal (210a).

Method (100) according to one of Claims 1 to 10, wherein • the reconstruction is carried out by a central entity Q (143), which distributes the neural network (1) to a plurality of clients C for the purpose of federated training, and • the gradient dL/dM_W is determined by a client C when training the neural network (1) on a batch of B training examples x and is aggregated over these B training examples x (132).

Method (100) according to Claim 11, wherein • a time development and/or statistics (4) of the reconstructed training examples $\tilde{x}_j^T$ is determined (230), and, based on this time development and/or statistics (4), • a drift (5a) of the behavior of the neural network (1), and/or a deterioration (5b) of the behavior of the neural network (1) with respect to previous training examples x during further training with new training examples x, is detected (240), and/or • a control intervention (6) is made in the cooperation of the central entity Q with the clients C (250).

Method (100) according to Claim 12, wherein the control intervention (6) includes (251) temporarily or permanently disregarding or underweighting the gradients dL/dM_W supplied by at least one client C.

Computer program containing machine-readable instructions which, when executed on one or more computers, cause the computer or computers to carry out a method according to one of Claims 1 to 13.

Machine-readable data carrier with the computer program according to Claim 14.

One or more computers with the computer program according to Claim 14, and/or with the machine-readable data carrier according to Claim 15.