EP3785169A1

EP3785169A1 - Method and device for converting an input image of a first domain into an output image of a second domain

Info

Publication number: EP3785169A1
Application number: EP19721223.6A
Authority: EP
Inventors: Andrej Junginger; Markus Hanselmann; Thilo Strauss; Holger Ulmer
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2018-04-23
Filing date: 2019-04-18
Publication date: 2021-03-03
Also published as: DE102018206806A1; WO2019206792A1

Abstract

The invention relates to a method for training a first neural network for converting an input image (E) of a first domain into an output image (A) of a second domain, wherein the training is carried out on training images (T) of the second domain and input images (E) of the first domain provided for the training, comprising the following steps: providing a GAN network with a generator network (2) having the first neural network and a discriminator network (3) having a second neural network; training the discriminator network (3) based on a discriminator error value (DF) and one or more training images (T) and/or one or more output images (A), which are generated by processing one or more of the input images via the generator network (2), wherein the discriminator error value (DF) is determined depending on a respective quality (C) of the one or more training images (T) and/or the one or more output images; training the generator network (2) based on an input image (E) provided for the training and a generator error value (GF) which depends upon a quality (C) of the output image (A), provided by the generator network (2) depending on the input image (E), and upon a degree of similarity (S) between the input image (E) and the output image (A), which indicates a degree of structural similarity.

Description

Title

Method and device for converting an input image of a first domain into an output image of a second domain

Technical field

The invention relates to methods for training a neural network for converting an input image of a first domain or in a first display style into an output image of a second domain or in a second display style.

Technical background

Motor vehicles are often equipped with camera systems that capture image information about a vehicle environment, in particular an image of a vehicle environment ahead in the direction of travel. This image information is used to perform driver assistance functions to assist the driver and autonomous driving functions. Examples of such driver assistance functions may include a recognition system for traffic signs or a brake assist, which recognizes, for example, that a pedestrian is in a collision area in front of the motor vehicle or moves into it.

One problem in developing such functions is that there is insufficient image data with which these functions can be tested. In particular, it is laborious to provide image data for critical situations. Furthermore, available image data usually contain no meta information, such as image segmentation information, i.e. information indicating which pixel regions of the image data belong to a pedestrian, to a surrounding area, to a street area, to a building area and the like. Often, such information must be created manually, which is a costly and, above all, time-consuming process.

Known approaches for artificially generating image data for possible traffic situations as an artificial camera image describe the desired traffic situations by script, i.e. in a formal language, and visualize them with a graphics engine. However, the images or image data obtained in this way represent the traffic situations artificially and not photorealistically, which makes them unsuitable for developing and testing driver assistance functions and autonomous driving functions under realistic conditions.

Other methods known in the art propose a style transfer from an input image to an output image. While simple approaches to training such a system use paired image data of the input image and the output image, both of which show the same image content and differ only in style (i.e. in their domain), more advanced methods can use input and output image data of the corresponding styles that need not have any relation to each other.

A disadvantage of the methods described above is that a so-called cycle consistency must be calculated during training, whereby the input image data must be explicitly converted into the output image data and vice versa. This makes the training very computationally intensive and thus time-consuming.

Disclosure of the invention

According to the invention, a method for training a neural network for converting an input image of a first domain into an output image of a second domain according to claim 1 and a corresponding device according to the independent claim are provided. Further embodiments are specified in the dependent claims.

According to a first aspect, there is provided a method of training a first neural network to convert an input image of a first domain to an output image of a second domain, wherein the training is performed on first domain input images provided for the training and second domain training images; with the following steps:

Providing a GAN network with a generator network comprising the first neural network and a discriminator network comprising a second neural network;

Training the discriminator network based on a discriminator error value and one or more training images and/or one or more output images generated by processing one or more of the input images with the generator network, the discriminator error value being determined depending on a respective quality of the one or more training images and/or the one or more output images;

Training the generator network based on an input image provided for training and a generator error value that depends on a quality of the output image provided by the generator network in response to the input image and on a similarity quantity between the input image and the output image that indicates a measure of structural similarity.

The aim of the above method is to train a neural network so that a given input image is converted into an output image. The input and output images should have different styles, i.e. the input image data should be available in a first domain and the output image data in a second domain. The styles correspond to presentation styles, such as a segmentation representation in which, for example, different color areas are assigned to different objects or image areas, a photorealistic image, a comic image, a line drawing, a watercolor sketch, and the like. To create and test driver assistance functions and/or autonomous driving functions for a motor vehicle based on an evaluation of camera images of the current vehicle environment, a sufficient number of photorealistic images of the vehicle environment must be provided. These images are intended to replace camera images and to be as indistinguishable from them as possible. They may optionally be provided with meta information including, for example, segmentation information that associates image areas of the photorealistic image with particular objects or backgrounds. Thus, an important application for such a trained neural network is the conversion of an input image, described for example by a scripting language or as a hand sketch, into an artificially generated photorealistic output image that corresponds to the input image in content or scene but differs from it in its presentation style.

In the following, this generation process is referred to as converting an input image of a first style into an output image of a second style, or an input image in a first domain into an output image in a second domain. For example, an input image indicating only image areas for particular objects and/or backgrounds, such as image areas representing a person, a cyclist, a road area, a built-up area, a vegetation area, and the like, may be processed by the trained neural network such that the corresponding image areas are provided with realistic structures of the corresponding objects.

For this purpose, the above method envisages using a GAN network (GAN: Generative Adversarial Network) in which a generator network corresponding to a first neural network is to be trained by means of a discriminator network which corresponds to a second neural network. The generator network then generates output image data in a second domain from provided input image data in a first domain.

In a GAN network, the quality of the training of the generator network is improved by means of the discriminator network. As the information relevant for training the generator network, the discriminator network provides a rating label for the output image generated by the generator network. To provide the rating label, the discriminator network is trained to evaluate whether an image provided at its input is an image in the second domain. The discriminator network is trained simultaneously or in alternation with the generator network, based on output images generated by the generator network and on training images in the second domain, wherein the training images are assigned a rating label indicating a high degree of association with the second domain (i.e. indicating that the images in question are images of the second domain). In addition, the discriminator network is supplied with the output images generated by the generator network, together with a rating label indicating a low degree of association with the second domain (i.e. indicating that the respective images were artificially generated by the generator network).

The generator network and the discriminator network can be trained alternately, so that both neural networks are improved iteratively and the generator network finally learns to convert a provided input image in the first domain into an output image in the second domain.

To train the generator network and the discriminator network, loss functions or cost functions are used. To train the generator network, a generator error function that includes two parts is used as the cost function. The first part forces the generated output image to be assigned to the second domain. For this purpose, the output image generated by the generator network is supplied to the discriminator network and the distance to the desired rating label (the rating label for a training image of the second domain) is minimized. The second part ensures that the image content of the output image generated by the generator network corresponds to the original image by minimizing a structural distance between the output image and the input image, i.e. the output image differs from the input image in its presentation style (domain) but only slightly in its image content or the scene shown.

The structural distance can be determined, for example, by a similarity value, which is a measure of the structural similarity of two images in different domains. For example, an SSIM index (SSIM: Structural Similarity Index), which indicates the structural similarity between the input image and the output image in a known manner, is suitable for this purpose.

In this way, the generator network can be trained to transform an input image in the first domain, i.e. in a first presentation style, into an output image in a second domain, i.e. in a second presentation style. For this, the input images of the first presentation style and the training images of the second presentation style must be specified, but the input images and the training images need not show similar or identical content, i.e. it is not necessary to provide input images that differ from the training images only in their presentation style.

Thus, the above method can be used to train a neural network (generator network) that automatically, and in a monitored manner, generates photorealistic output images of a traffic situation from synthetic input images that show the corresponding traffic situation, for example, schematically or in stylized form. The output images can then be used to develop and/or test driver assistance functions or autonomous driving functions. One particular advantage is that situations can be created that cannot be tested in reality, for example a person running onto the roadway, in order to test a brake assist system or the evasive behavior of an autonomous driving function.

Overall, the training method described above can achieve a significantly improved conversion of an input image of a first presentation style into a corresponding output image of a second presentation style, wherein the training method can be implemented in a simple manner and has high reliability and robustness. The above training method also yields better results than corresponding conventional methods, i.e. a more precise conversion of the input image of the first presentation style into the output image of the second presentation style. Furthermore, the training of the discriminator network and the generator network can be repeated simultaneously or alternately, in particular using a backpropagation method, until an abort condition is met.

It can be provided that the termination condition is fulfilled if a number of passes or a predetermined quality of the output images generated by the generator network is reached.

Furthermore, the quality of the one or more training images and / or the one or more output images may each be determined by the discriminator network and may correspond to a rating of the extent to which the image in question is an image of the second domain.

In particular, the discriminator error value may depend on a deviation measure for the deviation between the respective quality of the one or more training images and a rating label indicating a training image as a real image of the second domain, and on a deviation measure for the deviation between the respective quality of the one or more output images and a rating label indicating an output image generated by the generator network as a false image of the second domain, the deviation measure corresponding in particular to a mean squared error or a binary cross entropy.

It can be provided that the similarity quantity depends on or corresponds to an SSIM index for a structural similarity between one of the input images and an output image generated by the generator network from the relevant input image.

Furthermore, the first and/or the second neural network can be configured as a convolutional neural network, wherein in particular the first and/or the second neural network is a series connection of several convolutional layer blocks (convolution blocks), several ResNet blocks, and several deconvolutional blocks, each of which may contain a ReLU, leaky-ReLU, tanh, or sigmoid function as an activation function.

Furthermore, the generator error value may depend on a deviation measure for the deviation between the quality of the output image provided by the generator network as a function of the input image and a rating label of the discriminator network indicating a second domain image, wherein the deviation measure corresponds in particular to a mean squared error or a binary cross entropy.

According to one embodiment, the training of the discriminator network and / or the generator network can only be performed if a condition dependent on the current discriminator error value and / or on the generator error value is satisfied.

Furthermore, a method for providing a controller for a technical system, in particular for a robot, a vehicle, a tool or a factory machine, may be provided, wherein the above method for training a first neural network is carried out, and wherein the trained first neural network is used to generate training images, i.e. output images of the second domain, with which the controller, which in particular contains a neural network, is trained. In particular, the technical system can be operated using the controller.

According to another aspect, a first neural network trained in accordance with the above method is used for generating photorealistic output images in a second domain depending on predetermined input images in a first domain, which are created in particular via a script-based description.

Furthermore, the generated photorealistic output images may be used as artificial camera images for creating a classifier for environmental situations.

In another aspect, a GAN network is provided for training a first neural network to convert an input image of a first domain into an output image of a second domain, wherein the training is performed on first domain input images provided for training and on second domain training images, wherein the GAN network comprises a generator network comprising the first neural network and a discriminator network comprising a second neural network, the GAN network being adapted to:

train the discriminator network based on a discriminator error value and one or more training images and/or one or more output images generated by processing one or more of the input images with the generator network, the discriminator error value being determined depending on a respective quality of the one or more training images and/or the one or more output images; and

train the generator network based on an input image provided for the training and a generator error value that depends on a quality of the output image provided by the generator network in response to the input image and on a similarity quantity between the input image and the output image that indicates a measure of structural similarity.

Brief description of the drawings

Embodiments are explained below with reference to the accompanying drawings. In the drawings:

Figures 1a and 1b show exemplary representations of an image of a first presentation style and an associated image of a second presentation style;

Figure 2 shows a block diagram illustrating a system for training a GAN network to convert an input image of a first presentation style into an output image of a second presentation style; and

Figure 3 shows a flow chart illustrating a method for training a neural network for converting an input image into an output image of a different presentation style.

Description of embodiments

A neural network is to be trained that is able to convert an input image into an output image. The goal is that the input image, which is in a first domain, i.e. in a first presentation style, is converted into an output image that corresponds to the input image and is in a second domain, i.e. in a second presentation style different from the first. "Presentation style" herein refers to the form in which the information contained in the corresponding image is represented.

For example, a segmentation image, which indicates a segmentation of object and background areas of a photorealistic image, or another artificially generated (synthetic) image, such as a sketch, may serve as the input image and represent a template from which a photorealistic image is generated as the output image, so that the input image and the output image correspond to different presentation styles. Figures 1a and 1b show exemplary representations of a synthetic image in sketch form and of a photorealistic image corresponding to the synthetic image as a realistic representation.

A possible application of such a trained neural network could be to convert a given input image in the form of a segmentation image, in which only segmentation regions are given, into an artificially generated photorealistic output image. Thus, as shown in Figures 1a and 1b as a sketch image and as a real image, a segmentation image (Figure 1a) in which only areas are marked, for example areas for a roadway area, a built-up area, a vegetation area, other vehicles, pedestrians, cyclists or other objects, can be converted into a corresponding photorealistic image (Figure 1b). Such a photorealistic image may then be used in a test or development environment for testing and/or creating driver assistance functions or autonomous driving functions.

To train a neural network, a system may be used that structurally corresponds to the block diagram of Figure 2. Figure 2 essentially shows the basic structure of a GAN network 1 with a generator network 2 comprising a first neural network and a discriminator network 3 comprising a second neural network. The first and/or second neural network may in particular be designed as convolutional neural networks or as other types of neural networks.

Various architectures known per se are conceivable for the first neural network of the generator network 2. In particular, a series connection of several convolutional layer blocks (convolution blocks), several ResNet blocks and several deconvolutional blocks can be selected. Each of these blocks may optionally include a batch normalization or another type of normalization. Each of the blocks may further contain no, one or more activation functions, such as a ReLU, leaky-ReLU, tanh or sigmoid function.

For the second neural network of the discriminator network 3, various network architectures known per se can also be provided. As a network architecture, it is possible to use a series of blocks, such as a plurality of convolutional-layer blocks, some ResNet blocks, and a few deconvolutional blocks. Each of these blocks may contain a batch or other type of normalization. Furthermore, each of the blocks may contain none, one or more activation functions, such as a ReLU, leaky-ReLU, tanh or sigmoid function.

The generator network 2 is designed to generate an output image A of a second presentation style based on an input image E of a first presentation style. The input image E can be an image with one or more color channels, in particular three color channels, and the output image A can be a tensor of the same or a different format. Alternatively, a random tensor may be added to the input image E so that the output image A has higher variability.

The generator network 2 is trained based on a provided generator error value GF, in particular using a backpropagation method. The generator error value GF is generated in an evaluation block 4 and indicates, on the one hand, the structural similarity S or dissimilarity between the input image E and the output image A generated by the generator network 2 from that input image E (image similarity, i.e. similarity of the image content or the scene, regardless of the domain or presentation style) and, on the other hand, the quality C of the output image A. The quality C of the output image A indicates how close the presentation style of the output image A is to the presentation style of the predetermined training images T.

The quality C of the output image A is determined by means of the discriminator network 3, to which the generated output image A is provided as input. Taking the quality C into account when training the generator network 2 ensures that the generated output image A assumes the second presentation style. In addition, taking the structural similarity S between the input image E and the output image A into account ensures that the images have the same image content.

For training, the discriminator network 3 can be supplied with training images T, which are images of the second presentation style and which are each provided with a rating label $B_T$ that confirms the second presentation style of the training images. For example, the training images T may be provided with a rating label $B_T$ of 1, indicating that the training images T correspond to the second presentation style. In order to improve the discrimination capability of the discriminator network 3, the discriminator network 3 can also be provided with the output images A generated by the generator network 2, which are provided with a rating label $B_A$ of 0, indicating that the presentation style of these images differs significantly from the second presentation style. By providing the training images T and the output images A with the associated rating labels $B_T$, $B_A$, the discriminator network 3 can be trained, for example using the backpropagation method or another training method, to determine the quality C of the output images A provided by the generator network 2.

The discriminator network 3 can be trained with the aid of a discriminator error function DFK, such as a mean squared error, a binary cross entropy or another suitable cost function. As a result, through its influence on the generator error value, the discriminator network 3 enables the generator network 2 to generate output images A that not only correspond to the second presentation style but at the same time have the same image content as the input image E of the first presentation style supplied to the generator network 2.

Through an alternating or simultaneous training phase, the generator network 2 and the discriminator network 3 can be improved iteratively. In this case, the generator network 2 is trained with the generator error value GF by means of a backpropagation method or another training method, the generator error value GF being determined by the structural similarity between the input image E and the output image A generated by the generator network 2 and by the quality C, determined by the discriminator network 3, of the output image A generated by the generator network 2.

Depending on the training image T as the input of the discriminator network 3, a tensor $B_x$ is provided. This can be multidimensional or correspond to a real number. The tensor $B_x$ corresponds to the rating label and can indicate 1 for the training images and 0 for the images generated by the generator network. The rating labels thus correspond to $B_1$ for a training image T and $B_0$ for an output image A generated by the generator network. The dimension of the rating label B is essentially freely selectable and depends on the selected network architecture. The rating label B can also be provided with a different normalization, and in particular so-called soft rating labels B can be used, i.e. instead of the values 1 and 0, correspondingly slightly noisy values can be assumed, which can improve the stability of the training depending on the application. The mapping performed by the discriminator network 3 corresponds to $D_{\theta_D}$, where $\theta_D$ are the discriminator parameters (weights) of the neural network of the discriminator network 3 to be optimized. Analogously, the mapping performed by the generator network 2 corresponds to $G_{\theta_G}$, where $\theta_G$ are the generator parameters (weights) of the neural network of the generator network 2 to be optimized.
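The soft rating labels mentioned above can be sketched as follows. This is a minimal illustration, not the method of the description: the function name `soft_labels`, the uniform noise distribution and the noise width of 0.1 are assumptions chosen for the example.

```python
import numpy as np

def soft_labels(hard_value, shape, noise=0.1, rng=None):
    """Return rating labels close to hard_value (0 or 1) with slight noise.

    Assumption: labels for training images (hard_value=1) lie in (1 - noise, 1],
    labels for generated images (hard_value=0) lie in [0, noise).
    """
    rng = np.random.default_rng() if rng is None else rng
    offsets = rng.uniform(0.0, noise, size=shape)
    # Move the label slightly towards the interior of [0, 1].
    return hard_value - offsets if hard_value == 1 else hard_value + offsets

rng = np.random.default_rng(0)
b_t = soft_labels(1, (4,), rng=rng)  # soft labels for four training images T
b_a = soft_labels(0, (4,), rng=rng)  # soft labels for four generated images A
```

Keeping the noisy labels inside the interval from 0 to 1 means they remain valid targets for both the mean squared error and the binary cross entropy.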

The discriminator error function for training the discriminator network 3 serves to determine the discriminator error value DF, which is used in training to optimize the discriminator parameters $\theta_D$. The loss function has several summands. The discriminator error function is implemented in the discriminator evaluation block 4 and, for training, evaluates a deviation $l_D$ between a quality $C(T) = D_{\theta_D}(T)$ or $C(A) = D_{\theta_D}(A)$ determined for an applied image (training image T or output image A) and a rating label $B_A$, $B_T$ assigned to the applied image (e.g. 1 for a training image T or 0 for an output image A). The discriminator error function DFK used for this training of the discriminator network 3 must implement a deviation measure $l_D$ that indicates how far $C(T)$, $C(A)$ and the corresponding rating label $B_A$, $B_T$ differ from each other. Any suitable distance function can be used to determine the deviation measure $l_D$; the mean squared error (MSE) and the binary cross entropy (BCE) are particularly suitable for this purpose. Thus, the two quantities

$l_D(T) = \mathrm{MSE}(C(T), B_T)$ or $l_D(T) = \mathrm{BCE}(C(T), B_T)$

and

$l_D(A) = \mathrm{MSE}(C(A), B_A)$ or $l_D(A) = \mathrm{BCE}(C(A), B_A)$

enter into the loss function of the discriminator network 3, so that, for example, $DF = l_D(T) + l_D(A)$ can be selected.
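The example choice $DF = l_D(T) + l_D(A)$ can be illustrated with a short NumPy sketch. The function names, the clipping constant `eps` and the sample quality values are assumptions made for the illustration. The qualities are assumed to lie between 0 and 1, which the binary cross entropy requires.

```python
import numpy as np

def mse(c, b):
    """Mean squared error between qualities c and rating labels b."""
    c = np.asarray(c, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((c - b) ** 2))

def bce(c, b, eps=1e-7):
    """Binary cross entropy; c is clipped so that log() stays finite."""
    c = np.clip(np.asarray(c, dtype=float), eps, 1.0 - eps)
    b = np.asarray(b, dtype=float)
    return float(np.mean(-(b * np.log(c) + (1.0 - b) * np.log(1.0 - c))))

def discriminator_error(c_t, c_a, b_t=1.0, b_a=0.0, deviation=mse):
    """DF = l_D(T) + l_D(A) for qualities C(T) and C(A)."""
    return deviation(c_t, b_t) + deviation(c_a, b_a)

c_t = [0.9, 0.8]  # hypothetical qualities C(T) of two training images
c_a = [0.2, 0.1]  # hypothetical qualities C(A) of two generated images
df_mse = discriminator_error(c_t, c_a)
df_bce = discriminator_error(c_t, c_a, deviation=bce)
```

Passing the deviation measure as a parameter mirrors the description, which leaves the choice between MSE, BCE and other distance functions open.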

For training the generator network 2, a generator error function is used to generate a generator error value consisting of two parts. The first part corresponds to a deviation measure $l_G$ between the quality C of the output image $A_T$, generated from an input image $E_T$ applied for training, and a rating label B indicating complete achievement of the second presentation style, in particular the rating label $B_T$ that is assigned to the training images T for the training of the discriminator network 3, preferably a rating label of 1.

Any suitable distance function can be used to determine the deviation measure $l_G$ by means of the generator error function; the mean squared error (MSE) and the binary cross entropy (BCE) are particularly suitable for this purpose:

$l_G = \mathrm{MSE}(D(A_T), B_T)$ or $l_G = \mathrm{BCE}(D(A_T), B_T)$, where $A_T = G(E_T)$.

The second part of the generator error function corresponds to a similarity quantity S, which is determined in a similarity block 6 by means of a similarity evaluation function. The similarity evaluation function calculates a measure of the structural similarity of the two images based on the input image $E_T$ of the first presentation style and the output image $A_T$ of the second presentation style generated from it by the generator network 2. In particular, a function may be provided as the similarity evaluation function which assumes a value close to 1 for high structural similarity and a value close to -1 for no structural similarity. A suitable similarity evaluation function is, for example, a so-called SSIM function, which indicates an index of structural similarity, or an MSSIM based thereon, as described in Zhou Wang et al., Image Quality Assessment: From Error Visibility to Structural Similarity, IEEE Transactions on Image Processing, vol. 13, no. 4, April 2004, pages 600-612.
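As a rough illustration of such a similarity evaluation function, the SSIM formula can be evaluated once over the whole image. This is a deliberate simplification and an assumption of this sketch: the cited paper computes the index in local windows and averages the results to obtain the MSSIM. The function name `global_ssim` and the test images are hypothetical. The constants 0.01 and 0.03 are the values commonly used with SSIM.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM index computed from whole-image statistics (single window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

ramp = np.linspace(0.0, 1.0, 64).reshape(8, 8)  # simple gray-value ramp
s_same = global_ssim(ramp, ramp)                # identical structure
s_inverted = global_ssim(ramp, 1.0 - ramp)      # inverted structure
```

Identical images yield an index of 1, while the inverted ramp yields a negative index, which matches the value range of -1 to 1 given above.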

The flow chart of Figure 3 describes the training method for the first neural network of the generator network 2, so that the trained generator network 2 can be used to change the presentation style of an input image E. Initially, the first neural network is initialized with the generator parameters $\theta_G$ and the second neural network with the discriminator parameters $\theta_D$.

In step S1, based on an input image $E_T$ of a first presentation style provided for training, an output image $A_T$ of a second presentation style is calculated: $A_T = G_{\theta_G}(E_T)$.

In step S2, with the aid of the discriminator network 3, a quality $C(A_T)$ of the output image $A_T$ generated by the generator network 2 is determined, and from this the deviation measure $l_G = \mathrm{MSE}(C(A_T), B_T)$ or $l_G = \mathrm{BCE}(C(A_T), B_T)$.

In step S3, the similarity quantity S between the input image $E_T$ provided for training and the corresponding output image $A_T$ is calculated:

$S = \mathrm{SSIM}(A_T, E_T)$

Based on the deviation measure $l_G$ and the similarity quantity S, a generator error value GF for the generated output image $A_T$ is determined in step S4 using the generator error function GFF:

$GF = \mathrm{GFF}(l_G, S)$

For example, the generator error value GF can be determined as $GF = l_G + k \cdot S$, where the optimization factor k can be chosen empirically and can in particular be selected in the range -1 ... 3 if $l_G \in (0; 1)$ and $S \in (-1; 1)$.
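The example formula $GF = l_G + k \cdot S$ can be sketched as below, here with the mean squared error as the deviation measure. The function name and the sample values are hypothetical. The choice $k = -1$ is an assumption of this sketch: since S is close to 1 for structurally similar images, a negative k makes the error value smaller for more similar images when GF is minimized.

```python
import numpy as np

def generator_error(c_a, s, k, b_t=1.0):
    """GF = l_G + k * S with l_G = MSE(C(A_T), B_T)."""
    l_g = float(np.mean((np.asarray(c_a, dtype=float) - b_t) ** 2))
    return l_g + k * s

# Same hypothetical quality C(A_T) = 0.8, different structural similarity S.
gf_similar = generator_error(c_a=[0.8], s=0.9, k=-1.0)
gf_dissimilar = generator_error(c_a=[0.8], s=0.1, k=-1.0)
```

With these values, the output image that preserves the structure of the input image receives the lower generator error value.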

Based on the generator error value GF, a learning step for the first neural network of the generator network 2 is performed in step S5, in particular based on a backpropagation method. In this step, the generator parameters $\theta_G$ are updated based on the partial derivatives $\partial GF / \partial \theta_G$.
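One common way to use these partial derivatives is a plain gradient descent step, written out below. The learning rate $\eta$ is an assumption introduced here for illustration; the description does not prescribe a particular update rule.

```latex
\theta_G \leftarrow \theta_G - \eta \, \frac{\partial GF}{\partial \theta_G}
```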

If appropriate, steps S1 to S5 of the training of the generator network 2 can be repeated with the same or with another input image $E_T$ provided for training.

Now the training of the discriminator network 3 begins. In a subsequent step S6, a quality C(T_1..n) corresponding to the current training state of the discriminator network 3 is determined for one or more predetermined training images T_1..n in the second display style, and from this, in step S7, the deviation measure l_D(T_n) is determined: l_D(T) = MSE(C(T), B_T) or l_D(T) = BCE(C(T), B_T)

Furthermore, in step S8, a quality C(A_1..m) is determined for one or more most recently generated output images A_1..m, and from this, in step S9, the deviation measure l_D(A_1..m) is determined: l_D(A) = MSE(C(A), B_A) or l_D(A) = BCE(C(A), B_A)
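The two deviation measures of steps S7 and S9 can be sketched as follows. The function names, the label values B_T = 1.0 and B_A = 0.0, and the quality values are assumptions for illustration; how the two measures are combined into the discriminator error value DF is left open here.

```python
def mse(qualities, label):
    """Mean squared error between a batch of qualities and one rating label."""
    return sum((q - label) ** 2 for q in qualities) / len(qualities)


def discriminator_deviations(train_qualities, output_qualities,
                             label_real=1.0, label_fake=0.0):
    """Return (l_D(T), l_D(A)) for training images and generated images.

    Assumption: real images of the second domain carry the label 1.0 (B_T),
    generated output images the label 0.0 (B_A).
    """
    l_d_t = mse(train_qualities, label_real)   # step S7
    l_d_a = mse(output_qualities, label_fake)  # step S9
    return l_d_t, l_d_a


# Hypothetical qualities C(T_1..2) and C(A_1..2) from the discriminator.
l_d_t, l_d_a = discriminator_deviations([0.9, 0.8], [0.2, 0.4])
```

Both measures shrink as the discriminator rates training images closer to B_T and generated images closer to B_A.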

In a next step S10, a discriminator error value DF is determined, for example, according to the following formula:

Based on the discriminator error value DF, a learning step for the second neural network of the discriminator network 3 may be performed in step S11. In this step, the discriminator parameters θ_D are updated in a backpropagation method using the corresponding partial derivatives ∂DF/∂θ_D.

Of course, the backpropagation method can also be carried out based on only a training image T and / or an output image A. In addition, for training the discriminator network 3, not only generated images in the second display style but also further training images in the first presentation style with a rating label of 0 (or near 0) may be used. This may make it easier for the discriminator to learn the differences between the two domains.

Now, in step S12, an abort condition is checked. If the abort condition is not fulfilled (alternative: no), the method is continued with step S1, otherwise (alternative: yes) the method is continued with step S13. An abort condition can be, for example, reaching a predetermined number of passes, reaching a predetermined discriminator error value DF and / or generator error value GF, or reaching a predetermined quality C(A) of the output images A generated by the generator network 2.

In step S13, the trained generator network 2 is provided as a system for converting an input image E of a first presentation style or a first domain into an output image A of a second presentation style or a second domain.
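The check in step S12 can be sketched as a simple predicate. The threshold parameters, their default values, and the comparison directions (error values at or below a target, quality at or above a target) are assumptions made for this sketch.

```python
def abort_condition_met(passes, df, gf, quality, max_passes=1000,
                        df_target=None, gf_target=None, quality_target=None):
    """Return True if any configured abort condition of step S12 is fulfilled."""
    if passes >= max_passes:
        return True  # predetermined number of passes reached
    if df_target is not None and df <= df_target:
        return True  # predetermined discriminator error value reached
    if gf_target is not None and gf <= gf_target:
        return True  # predetermined generator error value reached
    if quality_target is not None and quality >= quality_target:
        return True  # predetermined quality C(A) of the output images reached
    return False


# Hypothetical training states.
keep_going = abort_condition_met(passes=3, df=0.5, gf=0.5, quality=0.3, max_passes=10)
stop_passes = abort_condition_met(passes=10, df=0.5, gf=0.5, quality=0.3, max_passes=10)
stop_quality = abort_condition_met(passes=3, df=0.5, gf=0.5, quality=0.95,
                                   max_passes=10, quality_target=0.9)
```

A training loop would call this predicate after each pass through steps S1 to S11 and continue with step S1 while it returns False.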

The method described above can be modified in many ways. Thus it is possible that the discriminator parameters θ_D and the generator parameters θ_G are only updated under certain conditions, e.g. depending on the current discriminator error value DF for training the discriminator network 3 and on the generator error value GF for training the generator network 2. The size of the batches for the training of the discriminator network 3 or the generator network 2 can also be varied.

Furthermore, in the discriminator error function DFK of the discriminator network 3, an input image deviation measure, which indicates a deviation of the quality C of the input image from a rating label B_A for a fake image, i.e. an output image A generated by the generator network, can additionally be added. This can increase the stability of the training.

The trained generator network 2 can then be applied to input images E of a first presentation style that are created via a script-based description and show, e.g., traffic situations. If the generator network 2 has been trained based on images of the first presentation style and photorealistic images of traffic situations, the artificially generated input images E can be assigned photorealistic images that represent a corresponding traffic situation. As a result, the generator network 2 can be used to create any number of photorealistic images that represent desired traffic situations.
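A conditional parameter update can be sketched as follows. The concrete condition (update only while the current error value exceeds a threshold), the threshold, the learning rate, and all names are assumptions; the text above only states that updates may depend on the current error values.

```python
def should_update(error_value, threshold):
    """Assumed condition: train a network only while its error is above a threshold."""
    return error_value > threshold


def conditional_step(params, gradients, error_value, threshold, learning_rate=0.01):
    """Apply a gradient step only if the condition on the error value is met."""
    if not should_update(error_value, threshold):
        return list(params)  # parameters stay unchanged in this pass
    return [p - learning_rate * g for p, g in zip(params, gradients)]


# Hypothetical parameters and gradients for one network.
skipped = conditional_step([1.0, 2.0], [0.5, -0.5], error_value=0.05, threshold=0.1)
applied = conditional_step([1.0, 2.0], [0.5, -0.5], error_value=0.30, threshold=0.1)
```

Such a rule can be used, for example, to pause the training of one network while the other catches up.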

The generator network 2 can also be trained in a reverse manner to convert photorealistic images into synthetic images, for example to remove reflections or the like from the photorealistic images, for example when a classifier can better classify synthetic images than photorealistic images.

Furthermore, the above system may also be trained to create segmented images from photorealistic images, in which case the photorealistic images correspond to the first style of presentation and the segmented images to the images of the second style of presentation.

Claims

1. A method for training a first neural network for converting an input image (E) of a first domain into an output image (A) of a second domain, wherein the training is carried out based on input images (E) of the first domain provided for the training and training images (T) of the second domain, comprising the following steps:

Providing a GAN network having a generator network (2) comprising the first neural network and a discriminator network (3) comprising a second neural network;

Training the discriminator network (3) based on a discriminator error value (DF) and one or more training images (T) and / or one or more output images (A) generated by processing one or more of the input images by the generator network (2), wherein the discriminator error value (DF) is determined depending on a respective quality (C) of the one or more training images (T) and / or the one or more output images;

Training the first neural network of the generator network (2) based on an input image (E) provided for the training and a generator error value (GF) which depends on a quality (C) of the output image (A) provided by the generator network (2) depending on the input image (E), and on a similarity quantity (S) between the input image (E) and the output image (A), which indicates a measure of a structural similarity.

2. The method of claim 1, wherein the training of the discriminator network (3) and the generator network (2) is carried out simultaneously or alternately repeatedly, in particular by means of a backpropagation method, until an abort condition is met.

3. The method of claim 2, wherein the abort condition is satisfied when a number of passes or a predetermined quality (C) of the output images generated by the generator network (2) is reached.

4. The method according to any of claims 1 to 3, wherein the quality (C) of the one or more training images (T) and / or the one or more output images (A) is determined by the discriminator network (3) and in each case corresponds to a rating of the extent to which the image in question is an image of the second domain.

5. The method according to claim 4, wherein the discriminator error value (DF) is determined depending on a respective deviation measure for the deviation between the respective quality (C) of the one or more training images (T) and a rating label (B) that indicates an affiliation of the one or more training images (T) to the second domain, and / or depending on a respective deviation measure for the deviation between the respective quality (C) of the one or more output images (A) and a rating label (B) that indicates an output image generated by the generator network (2) as an image not associated with the second domain, wherein the deviation measure corresponds in particular to a mean squared error or a binary cross entropy.

6. The method of claim 1, wherein the similarity quantity is dependent on or corresponds to an SSIM index for a structural similarity between one of the input images and an output image generated by the generator network from the respective input image.

7. The method according to any one of claims 1 to 6, wherein the first and / or the second neural network are formed as convolutional neural networks, wherein in particular the first and / or the second neural network comprise a series connection of some convolutional layer blocks, some ResNet blocks, and some deconvolutional blocks, wherein in particular each of the blocks contains, as an activation function, a ReLU, leaky-ReLU, tanh, or sigmoid function.

8. The method according to any one of claims 1 to 7, wherein the generator error value (GF) depends on a respective deviation measure for the deviation between the respective quality (C) of the output image (A) provided by the generator network (2) as a function of the input image (E) and a rating label (B) indicating an image of the second domain, wherein the deviation measure corresponds in particular to a mean squared error or a binary cross entropy.

9. The method according to any one of claims 1 to 8, wherein the training of the discriminator network and / or the generator network is performed only if a condition dependent on the current discriminator error value (DF) and / or the generator error value (GF) is met.

10. A method for providing a controller for a technical system, in particular for a robot, a vehicle, a tool or a factory machine, wherein the method for training a first neural network is carried out according to one of claims 1 to 9, wherein the trained first neural network is used to generate training images with which the controller, which in particular contains a neural network, is trained.

11. The method of claim 10, wherein the technical system is operated by the controller.

12. Use of a first neural network, which is trained according to a method according to one of claims 1 to 9, for generating photorealistic output images in a second domain depending on predetermined input images (E) in a first domain, which are created in particular via a script-based description.

13. Use according to claim 12, wherein the generated photorealistic output images (A) are used as artificial camera images for producing a classifier for environmental situations.

14. A GAN network for training a first neural network for converting an input image (E) of a first domain into an output image (A) of a second domain, wherein the training is based on input images (E) of the first domain and training images (T) of the second domain, the GAN network comprising a generator network (2) comprising the first neural network and a discriminator network (3) comprising a second neural network, wherein the GAN network is configured

to train the discriminator network (3) based on a discriminator error value (DF) and one or more training images (T) and / or one or more output images (A) generated by processing one or more of the input images by the generator network (2), wherein the discriminator error value (DF) is determined depending on a respective quality (C) of the one or more training images (T) and / or the one or more output images; and to train the generator network (2) based on an input image (E) provided for the training and a generator error value (GF), which depends on a quality (C) of the output image (A) provided by the generator network (2) and on a similarity quantity (S) between the input image (E) and the output image (A), which indicates a measure of structural similarity.

15. Computer program with program code means, which is adapted to carry out a method according to one of claims 1 to 9, when the computer program is executed on a computing unit, in particular a mobile computing unit.

16. A machine-readable storage medium with a computer program stored thereon according to claim 15.