DE102021104672A1

DE102021104672A1 - Generation of counterfactual images for the evaluation of image classifiers

Info

Publication number: DE102021104672A1
Application number: DE102021104672.6A
Authority: DE
Inventors: Claudia Blaiotta; Prateek Katiyar
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2021-02-26
Filing date: 2021-02-26
Publication date: 2022-09-01

Abstract

A method (100) for generating a counterfactual image x_g for an input image x that a given image classifier (3) classifies into a source class c selected from a plurality of available classes of a given classification, comprising the steps of:
• mapping (110) the input image x to a lower-dimensional representation z in a latent space by means of a trained encoder network (1); and
• mapping (120) at least a combination of the lower-dimensional representation z and an indication of a target class c'≠c, selected from the plurality of available classes, to a counterfactual image x_g that the given image classifier (3) classifies into the target class c', by means of a trained generator network (2).
A method (200) for training a combination of an encoder network (1) and a generator network (2) for use in the method (100).
A device (10) for generating, from a source image x that is part of a source domain of images, a target image x_g that is a different sample from the same source domain.

Description

The present invention relates to the generation of counterfactual images that can be used, among other things, to evaluate which parts of an image an image classifier has based its decision on.

Background

When products are mass-produced, it is usually a requirement to monitor the quality of production continuously. The goal is to detect quality problems as early as possible, so that the root cause can be fixed quickly and as few units of the product as possible are lost as scrap.

Optical inspection of the geometry and/or surface of a product is fast and non-destructive. WO 2018/197 074 A1 discloses an inspection device in which an object can be exposed to a wide variety of illumination configurations. In each illumination configuration, a camera captures images of the object. The topography of the object is evaluated from these images.

Images of the product can also be mapped directly to one or more classes of a given classification by means of an image classifier based on neural networks. Based on the result, the product can be assigned to one or more predetermined quality classes. In the simplest case, this is a binary classification of the form "OK" / "not OK" = "NOK".

Disclosure of the Invention

The invention provides a method for generating a counterfactual image x_g for an input image x. An image classifier classifies the input image x into a source class c from a plurality of available classes of a given classification. The term "counterfactual" is understood to imply that the image x_g is classified into a target class c' selected from the plurality of available classes, and that this target class c' differs from the source class c.

In the course of the method, the input image x is mapped to a lower-dimensional representation z in a latent space by means of a trained encoder network. This latent space can be defined, for example, by training the encoder network together with a decoder that is to reconstruct the original input image x. Such an encoder-decoder structure is called an "autoencoder". One type of autoencoder that can be used to advantage in the context of the present invention is the variational autoencoder (VAE). The latent space can be understood as a submanifold of a Cartesian space of arbitrary dimensionality. Because of this property, it is typically easier to obtain a sample z from this latent space by taking a sample in the space of input images x and transforming it into the sample z by means of the encoder network than it is to find directly a sample z that belongs to the latent space. Because the dimensionality of the representation z is lower than that of the input image x (and of its reconstruction), computing the representation and then reconstructing the original input image forces the information through a "bottleneck". This bottleneck forces the encoder to encode into the representation z only the information that is most important for reconstructing the original image x.
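The bottleneck effect can be illustrated with a deliberately tiny, hand-written stand-in for an autoencoder. The functions `encode`, `decode` and `reconstruction_error` below are hypothetical and are not taken from the application; a real encoder and decoder would be trained neural networks. The sketch only shows that a representation z of lower dimension than x generally cannot preserve every detail of x.

```python
def encode(x):
    """Toy encoder (assumption): average adjacent pixel pairs, dimension 4 -> 2."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]


def decode(z):
    """Toy decoder (assumption): repeat each latent value twice, dimension 2 -> 4."""
    return [value for value in z for _ in range(2)]


def reconstruction_error(x, x_d):
    """Mean squared error between the input x and its reconstruction x_d."""
    return sum((a - b) ** 2 for a, b in zip(x, x_d)) / len(x)


x = [1.0, 1.0, 0.0, 0.5]      # input "image" with four pixels
z = encode(x)                 # lower-dimensional representation
x_d = decode(z)               # reconstruction from z
err = reconstruction_error(x, x_d)
```

The first pixel pair is uniform and survives the bottleneck exactly, while the second pair is only recovered as its average. Training an actual autoencoder would adjust the encoder and decoder parameters so that this reconstruction error becomes small on the image domain of interest.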

By means of a trained generator network, at least a combination of the lower-dimensional representation z and an indication of a target class c'≠c, selected from the plurality of available classes, is mapped to the sought counterfactual image x_g. Here, "at least" means that the generator network may receive further inputs, such as a noise sample drawn at random from some distribution.
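One possible way to form the generator input is sketched below. The application does not state how the class indication is encoded or how the inputs are combined, so the one-hot encoding, the plain concatenation, the Gaussian noise and the names `one_hot` and `build_generator_input` are all assumptions made for illustration.

```python
import random


def one_hot(index, num_classes):
    """Encode a class index as a one-hot list (an assumed encoding)."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def build_generator_input(z, source_class, target_class, num_classes,
                          noise_dim=0, rng=None):
    """Concatenate z, the one-hot target class c' and optional noise."""
    if target_class == source_class:
        # A counterfactual requires a target class different from the source class.
        raise ValueError("target class c' must differ from source class c")
    rng = rng or random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return list(z) + one_hot(target_class, num_classes) + noise


# Example: 3-dimensional z, binary OK/NOK classification, 4 noise components.
g_in = build_generator_input([0.3, -1.2, 0.8], source_class=0,
                             target_class=1, num_classes=2, noise_dim=4)
```

The resulting list has 3 + 2 + 4 = 9 entries and would be passed to the trained generator network, which is not modelled here.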

Generator networks, such as the generator parts of generative adversarial networks (GANs), are known to produce images that belong to a desired domain and are therefore "realistic" in the context of other images belonging to that domain. These images can be further conditioned on additional desired properties, such as a class to which the generated image should belong. By combining the generator network with the encoder network and feeding the lower-dimensional representation z into the generator network, the resulting counterfactual image x_g can be generated specifically so that it is as similar as possible to the original input image. That is, the most important information from the original input image x, which is condensed in the lower-dimensional representation z, is preserved in the counterfactual image x_g.

In a particularly advantageous embodiment, the counterfactual image x_g is compared with the input image x. The areas in which the counterfactual image x_g differs from the input image x are determined to be areas that are significant with respect to the class boundary between the source class c and the target class c'. It can then be determined whether these areas actually contain information that is relevant to the application at hand.
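The comparison of x_g with x can be sketched as a pixel-wise difference followed by a threshold. The application does not prescribe a particular comparison, so the absolute difference, the threshold value and the function name `significant_regions` are assumptions.

```python
def significant_regions(x, x_g, threshold=0.1):
    """Return a binary mask that is 1 where |x_g - x| exceeds the threshold."""
    return [
        [1 if abs(g - a) > threshold else 0 for a, g in zip(row_x, row_g)]
        for row_x, row_g in zip(x, x_g)
    ]


# 3x3 grayscale example: the counterfactual removes a bright spot in the
# centre and adds only a negligible change in the bottom-right corner.
x = [[0.0, 0.0, 0.0],
     [0.0, 0.9, 0.0],
     [0.0, 0.0, 0.0]]
x_g = [[0.0, 0.0, 0.0],
       [0.0, 0.1, 0.0],
       [0.0, 0.0, 0.05]]
mask = significant_regions(x, x_g)
```

Only the centre pixel is marked in `mask`, which suggests that, in this toy case, the centre is what separates the two classes for the classifier.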

For example, a classification score output by an image classifier during the optical inspection of products is not very credible if the decision turns out to be based on features in the background of the image that have nothing to do with the product itself. Likewise, if changing some features in a shop window display causes a traffic situation on the road in front of that shop window to be classified differently, there is reasonable doubt as to whether this classifier is actually suited to assessing traffic situations correctly. Such behavior of an image classifier is somewhat similar to buying a product A instead of a similar product B for the sole reason that the sales representative for product B is taller than the sales representative for product A.

For the assessment of traffic situations in particular, it is also very important to know which parts of an object an image classifier relies on to recognize the presence of that object. In many traffic situations, objects are partially occluded by other objects but must still be recognized. For example, if an octagonal stop sign is covered by snow, a driver, and likewise an automated vehicle, is still expected to recognize it and behave accordingly. The timely detection of pedestrians who are about to cross the path of the vehicle also depends critically on which parts of the pedestrian must be visible for the pedestrian to be recognized. Many dangerous situations arise when a pedestrian who is wholly or partially occluded by parked cars or other obstacles suddenly steps into the path of a vehicle. If the image classifier requires at least the head and both legs to be visible before it recognizes the pedestrian, and the legs are occluded for a long time, the pedestrian may not be recognized until it is too late to avert a collision. If, on the other hand, the appearance of the head, the torso or an arm is already sufficient to detect the pedestrian, much more valuable time is gained for averting the collision.

Therefore, in a further particularly advantageous embodiment, a given metric is used to determine how well the areas that are significant with respect to a class boundary agree with given areas of the input image x that contain features of the input image considered to be salient. A score corresponding to the result output by the given metric is attributed to the image classifier. For example, when mass-produced products are optically inspected, the product and the camera are always in the same spatial arrangement relative to each other. It is therefore known in advance which parts of the captured image belong to the product. Likewise, in traffic situations it is known that the sky contains no other vehicles to which a car must react.
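The application leaves the metric open. One common choice for comparing two regions is the intersection over union (IoU) of their binary masks; the sketch below uses it purely as an assumed example, and the name `region_agreement` is hypothetical.

```python
def region_agreement(significant, salient):
    """Intersection over union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for row_s, row_t in zip(significant, salient):
        for s, t in zip(row_s, row_t):
            intersection += 1 if (s and t) else 0
            union += 1 if (s or t) else 0
    # Two empty masks agree trivially.
    return intersection / union if union else 1.0


# Mask of areas significant for the class boundary (from comparing x and x_g).
significant = [[0, 1, 1],
               [0, 1, 0]]
# Mask of areas known in advance to contain the product.
salient = [[0, 1, 0],
           [0, 1, 1]]
score = region_agreement(significant, salient)
```

Here two pixels are shared and four pixels are marked in at least one mask, so the score is 0.5. A score near 1 would indicate that the classifier bases its decision on the areas considered salient.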

In particular, in the optical inspection use case, the metric can measure whether the areas that are significant for classifying a product into the class "not OK = NOK", or into any other non-optimal class, correspond to areas with concrete defects or deficiencies. This brings automated optical inspection in line with manual inspection, where a human quality inspector who is asked why a particular product should be discarded is expected to point to specific defects or deficiencies.

The score attributed to the image classifier can be used as feedback to improve the image classifier. In a further advantageous embodiment, parameters that characterize the behavior of the image classifier are therefore optimized such that, when the computation of the counterfactual image x_g and the subsequent evaluation of this counterfactual image x_g are repeated and the score for the image classifier is computed again, this score is likely to improve.

One use case of this is application-specific further training of an image classifier that was previously pre-trained in a more general manner. For example, an image classifier can be trained generally to detect certain defects or deficiencies, and in a concrete application it can then be trained further to look at the right places, namely those that belong to the actual product and not to the background.

In a further advantageous embodiment, in the course of mapping the lower-dimensional representation z and the target class c' to the counterfactual image x_g, the counterfactual image x_g can be mapped to a classification score c# by a given image classifier. This image classifier can be the same one that already classified the input image x into the class c, but this is not required. A given classification loss function is used to determine how well the classification score c# agrees with a classification of the counterfactual image x_g into the target class c'. At least one input to the generator network is optimized such that recomputing the counterfactual image x_g from the changed input is likely to improve the value of the classification loss function. In this way, the generation of the counterfactual image x_g can be fine-tuned so that it is classified more unambiguously into the target class c'.

For example, one or more components of the lower-dimensional representation z, or of a noise sample that is additionally fed into the generator, can be changed.
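The refinement of a generator input can be sketched as gradient descent on the classification loss with respect to z. Everything below is a toy assumption: `generator` and `classifier` are small closed-form stand-ins for trained networks, the loss is a cross-entropy with respect to the target class c', and the gradient is approximated by central finite differences rather than backpropagation.

```python
import math


def generator(z):
    """Hypothetical stand-in for a trained generator: a two-pixel image."""
    return [math.tanh(z[0] + z[1]), math.tanh(z[0] - z[1])]


def classifier(image):
    """Hypothetical stand-in classifier: probability of the target class c'."""
    logit = 2.0 * image[0] - 1.0 * image[1]
    return 1.0 / (1.0 + math.exp(-logit))


def classification_loss(z):
    """Cross-entropy of the classification score against the target class c'."""
    return -math.log(classifier(generator(z)))


def refine_input(z, steps=100, lr=0.1, eps=1e-5):
    """Change components of z so that the classification loss decreases."""
    z = list(z)
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            z_plus, z_minus = z[:], z[:]
            z_plus[i] += eps
            z_minus[i] -= eps
            grad.append((classification_loss(z_plus)
                         - classification_loss(z_minus)) / (2.0 * eps))
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


z0 = [-0.5, 0.2]
z1 = refine_input(z0)
loss_before = classification_loss(z0)
loss_after = classification_loss(z1)
```

In this toy setting the loss after refinement is lower than before, meaning the regenerated image is assigned to c' with higher confidence. With real networks the same loop would typically use automatic differentiation, and only selected components of z or of the noise input might be updated.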

As discussed before, the input image x can be an image of a manufactured product that was captured in the course of optical inspection of the product, with the classes of the given classification representing quality grades for the product. In another example, the input image x can be an image of a traffic situation, and the classes of the given classification can represent objects that are relevant for interpreting the traffic situation.

The invention also provides a method for training a combination of an encoder network and a generator network for use in the method described above.

In the course of this training method, a decoder network is provided. This decoder network is configured to map a lower-dimensional representation z in a latent space, obtained from the encoder network, to a reconstructed image x_d in the domain of original input images x. Parameters that characterize the behavior of the encoder and decoder networks are then optimized with the goal that the reconstructed image x_d matches the original input image x from which the representation z was obtained. As discussed before, the encoder network and the decoder network then form an autoencoder with an information bottleneck, which leads to a concentration of the most important image features in the representation z.

Furthermore, a discriminator network is provided. The discriminator network is configured to distinguish whether an image comes from the domain of original input images x or from the domain of generated images x_f. An image classifier is also provided. This image classifier is configured to map the original input image x and the generated image x_f to one or more classes of the given classification. The generated images are also called "fake" images.

The generator network and the discriminator network are trained adversarially. For example, the training can alternate between training the generator network and training the discriminator network.

Parameters that characterize the behavior of the generator network are optimized with the goals that

• the accuracy with which the discriminator network distinguishes between original input images x and generated images x_f decreases, and
• the image classifier maps fake (i.e., generated) images x_f to their given target class c'.

On the other hand, parameters that characterize the behavior of the discriminator network are optimized with the goal that the accuracy with which the discriminator network distinguishes between original input images x and fake images x_f increases.

This adversarial training can be accomplished, for example, by optimizing a loss function that comprises

• an adversarial loss that measures the accuracy with which the discriminator network distinguishes between original input images x and generated images x_f, and
• a classification loss that measures how well the image classifier maps generated images x_f to their given target class c'. This classification loss can be a cross-entropy loss, for example.
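The two loss terms listed above can be sketched as follows. The application names a cross-entropy loss as one option for the classification term but does not give a formula for the adversarial term; the sketch assumes the standard GAN value function, mean log D(x) + mean log(1 - D(x_f)), which is larger when the discriminator is more accurate. The weighting factor `weight` and all function names are likewise assumptions.

```python
import math


def adversarial_loss(d_real, d_fake):
    """Assumed GAN value: mean log D(x) + mean log(1 - D(x_f)).

    d_real: discriminator outputs in (0, 1) for original images x.
    d_fake: discriminator outputs in (0, 1) for generated images x_f.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def cross_entropy(class_probs, target_classes):
    """Mean cross-entropy of classifier outputs against target classes c'."""
    total = sum(-math.log(probs[t])
                for probs, t in zip(class_probs, target_classes))
    return total / len(target_classes)


def combined_loss(d_real, d_fake, class_probs, target_classes, weight=1.0):
    """Adversarial term plus weighted classification term."""
    return (adversarial_loss(d_real, d_fake)
            + weight * cross_entropy(class_probs, target_classes))


# A discriminator that separates real from fake well yields a larger value ...
confident = adversarial_loss([0.9, 0.8], [0.1, 0.2])
# ... than one that is fooled by the generated images.
fooled = adversarial_loss([0.9, 0.8], [0.6, 0.7])

# Classification term: generated image assigned to target class 1.
good_cls = cross_entropy([[0.1, 0.9]], [1])
bad_cls = cross_entropy([[0.6, 0.4]], [1])
```

Under this sign convention, the generator parameters would be updated to decrease the combined value, while the discriminator parameters would be updated to increase the adversarial term.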

The overall goal of the adversarial training is to minimize this combined loss function. Here, the classification loss depends only on the parameters that characterize the behavior of the generator network, whereas the adversarial loss additionally depends on the parameters that characterize the behavior of the discriminator network. The generator parameters can be optimized to minimize the adversarial loss while the discriminator parameters are optimized to maximize it, or vice versa.

After the adversarial training has been completed, the combination of the encoder network and the generator network can be used in the method for generating a counterfactual image x_g described above, in order to produce a generated image x_f that can serve as the sought counterfactual image x_g.

The adversarial training can be performed after the training of the autoencoder. That is, once the autoencoder has been trained, the parameters of the encoder network and the decoder network can remain fixed. However, the training of the autoencoder and the adversarial training can also be combined into a single training procedure.

In a further advantageous embodiment, the parameters that characterize the behavior of the encoder network are additionally optimized towards the goal of maximizing the mutual information between the original input image x and its representation z. This ensures that the representation z preserves some important attributes of the input image, so that they can be carried over to the generated image x_f. It avoids situations in which the encoder network and the decoder network achieve a very good reconstruction of the original input image x at the price that the representation z has little or no visible correlation with the input image x.

Alternatively or in combination, the parameters that characterize the behavior of the generator network are additionally optimized towards the goal of maximizing the mutual information between the fake image x_f and the representation z. This causes attributes of the original input image x that were carried over to the representation z to propagate further to the generated image x_f. In particular, such attributes can be features that are not captured by the class to which the image belongs. Examples are image style attributes, such as line thicknesses or colors, that are present across all classes rather than being tied to specific classes.
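To illustrate what the mutual-information objective rewards, the following minimal sketch computes the empirical mutual information of paired discrete samples. It is an illustration under simplifying assumptions: the description does not state how mutual information is estimated, and for real images and continuous latent representations an estimator or variational bound would be needed instead. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, zs):
    """Empirical mutual information I(X; Z) in bits for paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, zs))
    count_x = Counter(xs)
    count_z = Counter(zs)
    mi = 0.0
    for (x, z), count in joint.items():
        p_xz = count / n
        # p(x, z) / (p(x) * p(z)) expressed with raw counts
        ratio = count * n / (count_x[x] * count_z[z])
        mi += p_xz * math.log2(ratio)
    return mi


# Toy data (hypothetical): four distinct "images" and two candidate encodings.
images = [0, 1, 2, 3]
informative_z = [0, 1, 2, 3]  # every image keeps its own code
collapsed_z = [0, 0, 0, 0]    # the code ignores the image entirely
```

With these toy values the informative encoding carries 2 bits about the image, while the collapsed encoding carries none, which is the degenerate situation the additional objective is meant to avoid.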

In a further advantageous embodiment, the image classifier comprises a trainable network that is configured to map input images x and generated images x_f to a combination of a lower-dimensional representation z in the latent space and a classification score c#. Parameters that characterize the behavior of this trainable network are optimized with the goals that:

• the image classifier maps an original input image x to a combination of: a representation z that corresponds to the representation z produced by the encoder network, and a classification score c# that is consistent with a ground-truth label c of the input image x; and
• the image classifier maps a generated image x_f to a combination of: a representation z that corresponds to the representation z from which the generated image x_f was produced, and a classification score c# that is consistent with the target class c' for which the generated image x_f was produced.

In this manner, the image classifier does not only provide feedback about the class of the generated image x_f. Rather, it also serves to monitor the self-consistency in the latent space of representations z.
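The two goals for the trainable classifier can be combined into a single training loss. The sketch below is one possible choice, not the one prescribed by the description: it assumes a mean squared error for the latent-consistency term, a softmax cross-entropy for the class term, and a hypothetical weighting factor `class_weight`.

```python
import math


def classifier_loss(z_pred, scores, z_ref, class_index, class_weight=1.0):
    """Loss for a classifier that outputs both a latent estimate and class scores.

    z_pred      -- latent representation predicted by the classifier
    scores      -- raw (unnormalized) classification scores c#
    z_ref       -- reference representation (from the encoder, or the one the
                   generated image was produced from)
    class_index -- ground-truth class c for real images, target class c' for
                   generated images
    """
    # Latent consistency term (assumed form: mean squared error).
    latent_term = sum((a - b) ** 2 for a, b in zip(z_pred, z_ref)) / len(z_ref)
    # Class consistency term (assumed form: softmax cross-entropy),
    # computed with the log-sum-exp trick for numerical stability.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    class_term = log_norm - scores[class_index]
    return latent_term + class_weight * class_term
```

The same function serves both bullet points: for an original input image, `z_ref` comes from the encoder network and `class_index` is the ground-truth label; for a generated image, `z_ref` is the representation fed to the generator and `class_index` is the target class.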

More generally, the invention also provides a device for generating, from a source image x that is part of a source distribution of images (such as images that show a certain kind of scenery), a target image x_g that is a different sample from the same source distribution. This device comprises:

• a trained encoder network that is configured to map the source image x to a lower-dimensional representation z in a latent space; and
• a trained generator network that is configured to map at least the lower-dimensional representation z to the target image x_g.

As discussed before, compared with using a trained generator network alone, the combination with the trained encoder network has the effect that the target image x_g not only lies in the same domain as the source image x, but also is semantically similar to this source image x.

In an advantageous embodiment, the generator network is configured to map a combination of the lower-dimensional representation z and a target class c' to a target image x_g that is a member of the target class c'. The target class c' is one of several classes contained in a predetermined classification of images in the source domain. In particular, this target class c' can differ from a source class c to which the source image x belongs. In this manner, the target image x_g becomes a counterfactual image in the sense that it is similar to the source image x while at the same time belonging to a different class.
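The data flow through such a device can be sketched as follows. Everything here is a toy stand-in chosen for illustration: the encoder and generator are trivial hand-written functions rather than trained networks, and appending a one-hot class code to z is only one assumed way of forming the "combination" of representation and target class.

```python
def one_hot(index, num_classes):
    """One-hot code for the target class (assumed encoding)."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def toy_encoder(image):
    """Stand-in for the encoder network: halves the dimension by averaging pixel pairs."""
    return [(image[i] + image[i + 1]) / 2.0 for i in range(0, len(image) - 1, 2)]


def toy_generator(generator_input, latent_dim):
    """Stand-in for the generator network: upsamples z and adds a class-dependent offset."""
    z = generator_input[:latent_dim]
    class_code = generator_input[latent_dim:]
    offset = sum(i * c for i, c in enumerate(class_code))
    return [value + offset for value in z for _ in range(2)]


def generate_target_image(source_image, target_class, num_classes):
    """Encoder followed by generator, conditioned on the target class."""
    z = toy_encoder(source_image)
    generator_input = z + one_hot(target_class, num_classes)
    return toy_generator(generator_input, len(z))
```

Because the target image is computed from z rather than from random noise, it inherits content of the source image, while the class code steers it towards the target class.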

The methods described above may be wholly or partially computer-implemented and thus embodied in software. The invention therefore also relates to a computer program with machine-readable instructions that, when executed by one or more computers, cause the one or more computers to perform a method described above. In this respect, control units for vehicles and other embedded systems that can execute program code are likewise to be understood as computers. A non-transitory storage medium and/or a download product may comprise the computer program. A download product is an electronic product that can be sold online and transferred over a network for immediate fulfillment. One or more computers may be equipped with the computer program and/or with the non-transitory storage medium and/or download product.

In the following, the invention and its preferred embodiments are illustrated using figures, without any intention to limit the scope of the invention.

The figures show:

FIG. 1 an exemplary embodiment of the method 100 for generating a counterfactual image x_g;
FIG. 2 an exemplary embodiment of the method 200 for training a combination of an encoder network 1 and a generator network 2;
FIG. 3 an exemplary embodiment of a configuration for performing the method 200.

FIG. 1 is a schematic flowchart of an embodiment of the method 100 for generating a counterfactual image x_g for an input image x that belongs to a source class c of a given classification according to a given image classifier 3. In step 110, a trained encoder network 1 maps an input image x to a lower-dimensional representation z in a latent space. In step 120, a trained generator network maps a combination of this representation z and an indication of a target class c'≠c to a counterfactual image x_g that is classified into the target class c' by the given image classifier 3.

In particular, the final counterfactual image x_g can be obtained by an optimization process. According to block 121, an initial counterfactual image x_g can be mapped to a classification score c# by the given image classifier 3. By means of a given classification loss function Lc, it can then be determined according to block 122 how well the classification score c# agrees with a classification of the counterfactual image x_g into the target class c'. According to block 123, at least one input to the generator network 2 can then be optimized such that recomputing the counterfactual image x_g on the basis of the changed input is likely to improve the value of the classification loss function Lc.
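Blocks 121 to 123 amount to an optimization loop over a generator input while the generator and the classifier stay fixed. The following sketch shows the idea under strong simplifications: the generator and classifier are hypothetical two-dimensional toy functions, the optimized input is assumed to be the representation z, the loss Lc is assumed to be a softmax cross-entropy, and gradients are approximated by central finite differences instead of backpropagation.

```python
import math


def softmax_cross_entropy(scores, target):
    """Assumed classification loss Lc: cross-entropy of softmax(scores) vs. target."""
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores)) - scores[target]


def toy_generator(z):
    """Hypothetical fixed generator: two-pixel image from a two-dimensional input."""
    return [z[0] + z[1], z[0] - z[1]]


def toy_classifier(image):
    """Hypothetical fixed classifier: one raw score per class (block 121)."""
    return [image[0], image[1]]


def classification_loss(z, target):
    """Block 122: how well the score agrees with the target class."""
    return softmax_cross_entropy(toy_classifier(toy_generator(z)), target)


def optimize_input(z, target, steps=200, lr=0.1, eps=1e-5):
    """Block 123: adjust the generator input so that the loss improves."""
    z = list(z)
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            up = list(z)
            down = list(z)
            up[i] += eps
            down[i] -= eps
            diff = classification_loss(up, target) - classification_loss(down, target)
            grad.append(diff / (2.0 * eps))
        z = [value - lr * g for value, g in zip(z, grad)]
    return z


z_initial = [0.5, 1.0]
z_final = optimize_input(z_initial, target=1)
```

In this toy setting the initial input yields an image that the classifier assigns to class 0; after the loop, the recomputed image is assigned to the target class 1.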

In step 130, the counterfactual image x_g is compared with the input image x. In step 140, regions Δ in which the counterfactual image x_g differs from the input image x are determined to be regions S that are significant with respect to the class boundary between the source class c and the target class c'.
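A minimal sketch of steps 130 and 140, assuming that images are given as nested lists of gray values and that "differs" means an absolute pixel difference above a hypothetical threshold (the description does not fix how the comparison is carried out):

```python
def significant_regions(input_image, counterfactual_image, threshold=0.1):
    """Return a boolean mask S that is True where x_g differs noticeably from x."""
    return [
        [abs(a - b) > threshold for a, b in zip(row_x, row_g)]
        for row_x, row_g in zip(input_image, counterfactual_image)
    ]


x = [[0.0, 0.5], [1.0, 1.0]]
x_g = [[0.0, 0.9], [1.0, 0.95]]
mask_s = significant_regions(x, x_g)
```

Here only the top-right pixel changes by more than the threshold, so it is the only pixel marked as significant for the class boundary.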

In step 150, it is determined, by means of a given metric 4, how well these regions S agree with given regions S* of the input image x that contain features of the input image x considered to be salient. A score 3a that corresponds to the result 4a output by the given metric 4 is attributed to the image classifier 3 in step 160.
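The description leaves the metric open. One common choice for comparing two region masks is the intersection over union, used in the sketch below purely as an assumed example; the convention of returning 1.0 for two empty masks is likewise an assumption.

```python
def intersection_over_union(mask_s, mask_s_star):
    """Agreement between significant regions S and salient regions S* (assumed metric)."""
    intersection = 0
    union = 0
    for row_s, row_star in zip(mask_s, mask_s_star):
        for s, s_star in zip(row_s, row_star):
            if s and s_star:
                intersection += 1
            if s or s_star:
                union += 1
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


score = intersection_over_union(
    [[True, True], [False, False]],   # S: regions changed in the counterfactual
    [[True, False], [False, False]],  # S*: regions considered salient
)
```

A high value indicates that the classifier's decision boundary depends on the same regions that are considered salient, which is what the score attributed to the classifier is meant to express.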

In step 170, parameters 3b that characterize the behavior of the given image classifier 3 are optimized such that, when the computation of the counterfactual image x_g in step 120 and the subsequent evaluation of this counterfactual image x_g in steps 130 to 160 are repeated, the score 3a of the image classifier 3 is likely to improve. The finally optimized state of the parameters 3b is labeled with the reference sign 3b*.

FIG. 2 is a schematic flowchart of an embodiment of the method 200 for training a combination of an encoder network 1 and a generator network 2 for use in the previously described method 100.

In step 210, a decoder network 5 is provided. This decoder network 5 is configured to map a lower-dimensional representation z in a latent space, obtained from the encoder network 1, to a reconstructed image x_d in the domain of original input images x. In step 220, parameters 1a, 5a that characterize the behavior of the encoder network 1 and the decoder network 5 are optimized with the goal that the reconstructed image x_d matches the original input image x from which the representation z was obtained. In this manner, the encoder 1 and the decoder 5 are trained so that they become a (variational) autoencoder (V)AE. The finally optimized states of the parameters 1a and 5a are labeled with the reference signs 1a* and 5a*, respectively. According to block 221, an additional goal of the optimization may be that the mutual information between the original input image x and its representation z is maximized.
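The reconstruction objective of step 220 can be illustrated with a deliberately tiny, non-variational linear autoencoder. All specifics are assumptions made for the sketch: two-pixel "images", a one-dimensional latent space, a squared-error reconstruction loss, plain gradient descent, and hypothetical weight vectors `w` (encoder) and `v` (decoder).

```python
def reconstruct(x, w, v):
    """Encode a two-pixel image to one latent value z, then decode it again."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return z, [vi * z for vi in v]


def mean_reconstruction_loss(data, w, v):
    """Mean squared reconstruction error between x_d and x over the data set."""
    total = 0.0
    for x in data:
        _, x_d = reconstruct(x, w, v)
        total += sum((d - xi) ** 2 for d, xi in zip(x_d, x))
    return total / len(data)


def train_autoencoder(data, steps=2000, lr=0.02):
    """Jointly optimize encoder weights w and decoder weights v by gradient descent."""
    w = [0.1, 0.1]
    v = [0.1, 0.1]
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            z, x_d = reconstruct(x, w, v)
            residual = [d - xi for d, xi in zip(x_d, x)]
            back = sum(r * vi for r, vi in zip(residual, v))
            for i in range(2):
                grad_v[i] += 2.0 * residual[i] * z / len(data)
                grad_w[i] += 2.0 * back * x[i] / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        v = [vi - lr * g for vi, g in zip(v, grad_v)]
    return w, v


# Toy data (hypothetical): all images lie on one line, so one latent value suffices.
data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.5, 1.0)]
w_trained, v_trained = train_autoencoder(data)
```

Because the toy data has only one degree of freedom, the one-dimensional representation z can capture it completely and the reconstruction error approaches zero after training.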

In step 230, a discriminator network 6 is provided. This discriminator network 6 is configured to distinguish whether an image comes from the domain of original input images x or from the domain of fake images x_f. An image classifier 3 is also provided. This image classifier 3 is configured to map the original input image x and the fake image x_f to one or more classes of the given classification.

This image classifier 3 can be fixed and used as is. In the embodiment shown in FIG. 2, however, the image classifier 3 is trained as well. According to block 241, the image classifier 3 may comprise a trainable network that is configured to map input images x and generated images x_f to a combination of a lower-dimensional representation z in the latent space and a classification score c#. According to block 242, parameters 3b that characterize the behavior of this trainable network are optimized with the goals that:

• the image classifier 3 maps an original input image x to a combination of: a representation z that corresponds to the representation z produced by the encoder network 1, and a classification score c# that is consistent with a ground-truth label c of the input image x; and
• the image classifier 3 maps a fake image x_f to a combination of: a representation z that corresponds to the representation z from which the fake image x_f was produced, and a classification score c# that is consistent with the target class c' for which the image x_f was produced.

The finally optimized state of the parameters 3b is labeled with the reference sign 3b*.

In step 250, parameters 2a that characterize the behavior of the generator network 2 are optimized with the goals that:

• the accuracy with which the discriminator network 6 distinguishes between original input images x and fake images x_f decreases, and
• the image classifier 3 maps generated images x_f to their given target class c'.

According to block 251, a further optimization goal may be that the mutual information between the image x_f and the representation z is maximized. The finally optimized state of the parameters 2a is labeled with the reference sign 2a*.

At the same time, in step 260, parameters 6a that characterize the behavior of the discriminator network 6 are optimized with the goal that the accuracy with which the discriminator network 6 distinguishes between original input images x and generated images x_f increases. The finally optimized state of the parameters 6a is labeled with the reference sign 6a*.

That is, the training 250 of the generator network 2 and the training 260 of the discriminator network 6 are performed in an adversarial manner.
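The opposing objectives of steps 250 and 260 can be sketched as two loss functions evaluated on the same discriminator output. The concrete forms are assumptions: binary cross-entropy for the discriminator, the widely used non-saturating variant for the generator's adversarial term, a negative log-likelihood for the class term, and a hypothetical weighting factor `class_weight`.

```python
import math


def binary_cross_entropy(prob, label):
    """Cross-entropy between a predicted probability and a 0/1 label."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """Step 260: small when real images score near 1 and generated images near 0."""
    return binary_cross_entropy(d_real, 1.0) + binary_cross_entropy(d_fake, 0.0)


def generator_loss(d_fake, class_probs, target_class, class_weight=1.0):
    """Step 250: small when the discriminator is fooled and the target class is hit."""
    fool_term = binary_cross_entropy(d_fake, 1.0)
    class_term = -math.log(max(class_probs[target_class], 1e-12))
    return fool_term + class_weight * class_term
```

Here `d_real` and `d_fake` stand for the discriminator's estimated probability that an original or a generated image is real, and `class_probs` for the classifier's class probabilities on the generated image. In an alternating scheme, one update step lowers `discriminator_loss` with respect to the discriminator parameters and the next lowers `generator_loss` with respect to the generator parameters.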

FIG. 3 shows an exemplary configuration that can be used to perform the training method 200 described above. The encoder network 1 and the decoder network 5 form a (variational) autoencoder (V)AE. The generator network 2 and the discriminator network 6 form a generative adversarial network, GAN. For each image input to it, the discriminator network 6 outputs a classification as to whether this image is in the domain of original input images x or whether it is a fake image x_f. In the embodiment shown in FIG. 3, the image classifier 3 is used to predict not only a classification score c#, but also a lower-dimensional representation z that corresponds to the original input image x or the generated image x_f, respectively. It can therefore be regarded as part of the GAN. The encoder network 1 and the generator network 2 form the device 10 for generating, from a source image x that is part of a source distribution of images, a target image x_g that is a different sample from the same source distribution.

CITATIONS INCLUDED IN THE DESCRIPTION

This list of documents cited by the applicant was generated automatically and is included solely for the better information of the reader. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Patent literature cited

WO 2018/197074 A1 [0003]

Claims

1. A method (100) for generating a counterfactual image x_g for an input image x that a given image classifier (3) classifies into a source class c selected from a plurality of available classes of a given classification, comprising the steps of:
• mapping (110) the input image x to a lower-dimensional representation z in a latent space by means of a trained encoder network (1); and
• mapping (120), by means of a trained generator network (2), at least a combination of the lower-dimensional representation z and an indication of a target class c'≠c selected from the plurality of available classes to a counterfactual image x_g that is classified into the target class c' by the given image classifier (3).

2. The method (100) according to claim 1, further comprising the steps of:
• comparing (130) the counterfactual image x_g with the input image x; and
• determining (140) regions (Δ) in which the counterfactual image x_g differs from the input image x to be regions (S) that are significant with respect to the class boundary between the source class c and the target class c'.

3. The method (100) according to claim 2, further comprising the steps of:
• determining (150), by means of a given metric (4), how well the regions (S) that are significant with respect to the class boundary agree with given regions (S*) of the input image x that contain features of the input image x considered to be salient; and
• attributing (160) to the image classifier (3) a score (3a) that corresponds to the result (4a) output by the given metric (4).

4. The method (100) according to claim 3, further comprising: optimizing (170) parameters (3b) that characterize the behavior of the given image classifier (3) such that, when the preceding steps (120-160) of the method (100) are repeated, the score (3a) of the image classifier (3) is likely to improve.

5. The method (100) according to any one of claims 1 to 4, wherein the mapping (120) of the lower-dimensional representation z and the target class c' to the counterfactual image x_g further comprises:
• mapping (121) the counterfactual image x_g to a classification score c# by means of the given image classifier (3);
• determining (122), by means of a given classification loss function (Lc), how well the classification score c# agrees with a classification of the counterfactual image x_g into the target class c'; and
• optimizing (123) at least one input to the generator network (2) such that recomputing the counterfactual image x_g on the basis of the changed input is likely to improve the value of the classification loss function (Lc).

6. The method (100) according to any one of claims 1 to 5, wherein the input image x is an image of a manufactured product that was captured in the course of a visual inspection of the product, and the classes of the given classification represent quality grades for the product.

7. The method (100) according to any one of claims 1 to 5, wherein the input image x is an image of a traffic situation and the classes of the given classification represent objects that are relevant for the interpretation of the traffic situation.

8. A method (200) for training a combination of an encoder network (1) and a generator network (2) for use in the method (100) according to any one of claims 1 to 5, comprising the steps of:
• providing (210) a decoder network (5) that is configured to map a lower-dimensional representation z in a latent space, obtained from the encoder network (1), to a reconstructed image x_d in the domain of original input images x;
• optimizing (220) parameters (1a, 5a) that characterize the behavior of the encoder network (1) and the decoder network (5) with the goal that the reconstructed image x_d matches the original input image x from which the representation z was obtained;
• providing (230) a discriminator network (6) that is configured to distinguish whether an image comes from the domain of original input images x or from the domain of generated images x_f;
• providing (240) an image classifier (3) that is configured to map the original input image x and the generated image x_f to one or more classes of the given classification;
• optimizing (250) parameters (2a) that characterize the behavior of the generator network (2) with the goals that
◯ the accuracy with which the discriminator network (6) distinguishes between original input images x and generated images x_f decreases, and
◯ the image classifier (3) maps generated images x_f to their given target class c'; and
• optimizing (260) parameters (6a) that characterize the behavior of the discriminator network (6) with the goal that the accuracy with which the discriminator network (6) distinguishes between original input images x and generated images x_f increases.

9. The method (200) according to claim 8, wherein
• the parameters (1a) that characterize the behavior of the encoder network (1) are additionally optimized (221) towards the goal of maximizing the mutual information between the original input image x and its representation z; and/or
• the parameters (2a) that characterize the behavior of the generator network (2) are additionally optimized (251) towards the goal of maximizing the mutual information between the generated image x_f and the representation z.

10. The method (200) according to any one of claims 8 to 9, wherein
• the image classifier (3) comprises a trainable network (241) that is configured to map input images x and generated images x_f to a combination of a lower-dimensional representation z in the latent space and a classification score c#; and
• parameters (3b) that characterize the behavior of this trainable network are optimized (242) with the goals that:
◯ the image classifier (3) maps an original input image x to a combination of: a representation z that corresponds to the representation z produced by the encoder network (1), and a classification score c# that is consistent with a ground-truth label c of the input image x; and
◯ the image classifier (3) maps a generated image x_f to a combination of: a representation z that corresponds to the representation z from which the generated image x_f was produced, and a classification score c# that is consistent with the target class c' for which the generated image x_f was produced.

11. A device (10) for generating, from a source image x that is part of a source distribution of images, a target image x_g that is a different sample from the same source domain, comprising:
• a trained encoder network (1) that is configured to map the source image x to a lower-dimensional representation z in a latent space; and
• a trained generator network (2) that is configured to map at least the lower-dimensional representation z to the target image x_g.

12. The device (10) according to claim 11, wherein the generator network (2) is configured to map a combination of the lower-dimensional representation z and a target class c' to a target image x_g that is a member of the target class c', the target class c' being one of a plurality of classes contained in a predetermined classification of images in the source domain.

13. A computer program with machine-readable instructions that, when executed by one or more computers, cause the one or more computers to perform a method (100, 200) according to any one of claims 1 to 10.

14. A non-transitory storage medium with the computer program according to claim 13.

15. One or more computers with the computer program according to claim 13 and/or with the non-transitory storage medium according to claim 14.