CN116611500A - Method and device for training neural network

Info

Publication number
CN116611500A
Authority
CN
China
Prior art keywords
image
determining
machine learning
training
object detector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310180171.6A
Other languages
Chinese (zh)
Inventor
M. Menko
T. Wenzel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Robert Bosch GmbH
Original Assignee
Robert Bosch GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch GmbH filed Critical Robert Bosch GmbH
Publication of CN116611500A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776 Validation; Performance evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A computer-implemented method for training a machine learning system, the method comprising: providing a source image from a source domain and a target image from a target domain; determining, using a first generator of the machine learning system, a generated first image based on the source image, and determining, using a second generator of the machine learning system, a first reconstruction based on the generated first image; determining, using the second generator, a generated second image based on the target image, and determining, using the first generator, a second reconstruction based on the generated second image; determining a first loss value (L₁), wherein the first loss value (L₁) characterizes a first difference of the source image and the first reconstruction, wherein the first difference is weighted according to a first attention map; determining a second loss value (L₂), wherein the second loss value (L₂) characterizes a second difference of the target image and the second reconstruction, wherein the second difference is weighted according to a second attention map; and training the machine learning system based on the first loss value (L₁) and/or the second loss value (L₂).

Description

Method and device for training neural network
Technical Field
The invention relates to a method for training a machine learning system, a method for training an object detector, a method for operating a control system, a computer program and a machine-readable storage medium.
Background
Many modern technical systems use machine learning methods to process data recorded from the environment of the technical system. These methods are generally capable of making predictions about the data, specifically on the basis of statistical knowledge obtained from a set of training data.
Machine learning systems typically encounter problems if the statistical distribution of the data they process at inference time differs from the statistical distribution of the data used to train the machine learning system. This problem is also known in the field of machine learning as domain shift (English: domain shift).
There are many examples of technical systems that are more or less affected by natural or unavoidable domain shifts. For example, in the field of at least partially autonomous vehicles, new vehicle models regularly appear on the road. For the sensors of an at least partially autonomous vehicle, such as LIDAR sensors, camera sensors or radar sensors, such vehicles typically result in measurements that are not contained in a potential training set, since these vehicles are by definition new and thus the sensor measurements recorded of them are also new.
Another form of domain shift may occur when one product generation of a product is replaced by the next. For example, there are camera sensors that include a machine learning system in order to evaluate the environment recorded by the camera (i.e., camera images of the environment), for example with respect to the positions of objects. A large amount of training data is typically required to train such machine learning systems. If the product generation of the camera is now changed, for example if a new image sensor (English: imager) is used, the machine learning system is usually no longer able to achieve the same prediction accuracy as with the previous camera generation without adaptation. A change of product generation therefore means that new training data must be determined for the machine learning system. While the acquisition of the raw data itself is typically inexpensive, obtaining the annotations required for training is much more difficult and more cost-intensive, since the annotations typically have to be created by human experts.
Advantages of the invention
Advantageously, the method with the features of independent claim 1 allows adapting a source domain to a target domain (English: domain adaptation). In contrast to known methods, this method makes it possible to introduce a priori information about which parts of the source domain are particularly important when adapting to the target domain. This a priori information is determined automatically; the method can therefore advantageously perform unsupervised domain adaptation. The inventors were able to determine that the domain adaptation becomes more accurate due to the a priori information.
Disclosure of Invention
In a first aspect, the present invention relates to a computer-implemented method for training a machine learning system, the method comprising the steps of:
providing a source image from a source domain and a target image from a target domain;
determining a generated first image based on the source image using a first generator of the machine learning system, and determining a first reconstruction (Rekonstruktion) based on the generated first image using a second generator of the machine learning system;
determining, using a second generator, a generated second image based on the target image, and determining, using the first generator, a second reconstruction based on the generated second image;
determining a first loss value, wherein the first loss value characterizes a first difference of the source image and the first reconstruction, wherein the first difference is weighted according to a first attention map (Aufmerksamkeitskarte), and determining a second loss value, wherein the second loss value characterizes a second difference of the target image and the second reconstruction, wherein the second difference is weighted according to a second attention map;
training the machine learning system by training the first generator and/or the second generator based on the first loss value and/or the second loss value.
The machine learning system may be understood in this context as being configured to accept an image as input and to determine another image as output based on the input. The machine learning system may be trained using the method such that the machine learning system is capable of converting an image of a source domain to an image of a target domain.
A domain may be understood as a probability distribution of an image that may be generated. Thus, the method can also be understood as converting an image from one probability distribution (source domain) to another probability distribution (target domain).
An image is understood in particular as a sensor recording, i.e., a measurement of a sensor. In particular, a camera sensor, a LIDAR sensor, a radar sensor, an ultrasonic sensor or a thermal imaging camera may be used as a sensor that can determine an image as a measurement. However, an image may also be generated synthetically (synthetisch), for example based on computer simulations, e.g., by rendering a virtual world. For such synthetic images, an automated determination of annotations is typically very simple to implement; the method can then generate further images from the synthetic images whose appearance (Erscheinungsbild) is similar to that of, for example, recordings of a camera sensor.
To determine the generated first image, the machine learning system uses a first generator that may be trained during the method. In the context of the present invention, a generator may be understood as a machine learning method that determines an output image based on an input image. In particular, the described generators may be understood as determining images of the same size as the image used as input.
In order to achieve a mapping from the source domain to the target domain, and thus a suitable adaptation from the source domain to the target domain, the machine learning system further comprises a second generator configured to map (projizieren) images of the target domain back to the source domain. If an image is first processed by one generator of the machine learning system and the image so determined is then processed by the other generator, the image determined by the other generator can be understood as a reconstruction. The goal of known methods is to train the generators such that, for images from the source domain and images from the target domain respectively, the corresponding reconstruction is identical to the corresponding image. Advantageously, the use of the first attention map and the second attention map enables the training to be controlled such that certain regions of the images of the source domain and of the images of the target domain can be classified as particularly important. The regions declared (deklarieren) important in this way by the attention maps may preferably contain the objects that are to be identified in the images. The machine learning method is thereby enabled to place the focus of the reconstruction specifically on the objects. The inventors were able to determine that this achieves a domain adaptation that transfers objects from images of the source domain into images of the target domain very accurately. In this way, for example, a training dataset for an object detector of the target domain may be determined, wherein the training images can be generated from the images of a dataset of the source domain, and the annotations of the images of that dataset can be used as annotations of the generated training images.
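As a non-authoritative sketch of this cycle, assuming PyTorch and two arbitrary image-to-image networks standing in for the first generator (71) and the second generator (72) (all names here are illustrative, not part of the disclosure):

```python
import torch

def cycle_pass(generator_st, generator_ts, source_image, target_image):
    # First generator (71): source domain -> target domain.
    generated_first = generator_st(source_image)
    # Second generator (72) maps back to the source domain: first reconstruction.
    first_reconstruction = generator_ts(generated_first)
    # Second generator (72): target domain -> source domain.
    generated_second = generator_ts(target_image)
    # First generator (71) maps back to the target domain: second reconstruction.
    second_reconstruction = generator_st(generated_second)
    return (generated_first, first_reconstruction,
            generated_second, second_reconstruction)
```

Training then drives each reconstruction towards its corresponding input image, with the attention maps steering where this consistency is enforced.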
Unlike known methods, a priori information is conveyed to the machine learning system during training via the first and second attention maps, indicating to the machine learning system which parts are particularly relevant in the domain adaptation.
In a preferred embodiment, it is possible that the first attention map characterizes, for each pixel of the source image, whether the pixel belongs to an object imaged (abbilden) in the source image, and/or that the second attention map characterizes, for each pixel of the target image, whether the pixel belongs to an object imaged in the target image.
The image and the correspondingly determined reconstruction may in particular be compared pixel by pixel, i.e., a difference, such as the Euclidean distance or the squared Euclidean distance, may be determined between the pixels at the same position in the image and in the reconstruction. An attention map may then be used to assign a weight to each of the determined differences, wherein the weight to be used for a difference is taken from the attention map. In particular, the pixels of an image may be weighted according to whether they characterize an object. An attention map is to be understood here as an image with one channel, or as a matrix. The differences determined between the image and the reconstruction can likewise be understood as an image with one channel, or as a matrix, in which the difference at a specific position characterizes the difference between the pixels of the image and of the reconstruction at that same position. The Hadamard product of the attention map and the difference matrix may then be used to weight the values of the difference matrix. A loss value may then be determined on this basis, for example by summing, preferably in weighted form, all elements of the result of the Hadamard product.
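A minimal sketch of this weighting, assuming PyTorch tensors with images of shape (channels, height, width) and an attention map of shape (height, width); the squared Euclidean distance serves here as one of the distances mentioned above:

```python
import torch

def attention_weighted_difference(image, reconstruction, attention_map):
    # Pixel-wise squared Euclidean distance over the channel dimension
    # yields a (height, width) difference matrix.
    difference = ((image - reconstruction) ** 2).sum(dim=0)
    # Hadamard product with the attention map weights each difference;
    # summing all elements yields the loss value.
    return (attention_map * difference).sum()
```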
In one embodiment, it is possible that the first attention map assigns a weight of 1 to a pixel of the source image if the pixel belongs to an object imaged in the image, and a weight of 0 if the pixel does not belong to any object. In the typical terminology of the field of object detection, the attention map can thus be designed to distinguish foreground from background, namely by different values (1 representing foreground, 0 representing background). Alternatively or additionally, it is possible that the second attention map assigns a weight of 1 to a pixel of the target image if the pixel belongs to an object imaged in the image, and a weight of 0 if the pixel does not belong to any object.
Alternatively, it is also possible that the first attention map and/or the second attention map characterize, for each pixel, the probability that the pixel belongs to an object.
In a preferred embodiment, it is possible that the first attention map is determined based on the source image using an object detector, and/or that the second attention map is determined based on the target image using the object detector.
The object detector may in particular be a machine learning system designed for object detection, such as a neural network. The object detector may preferably be trained based on images of the source domain. To train the machine learning system presented in the invention, the object detector may then determine the first attention map based on the source image. For example, all pixels of the source image that the object detector recognizes as belonging to an object may be assigned the value 1 in the first attention map, while all other values of the attention map may be set to 0. Similarly, all pixels of the target image that the object detector recognizes as belonging to an object may be assigned the value 1 in the second attention map, while all other values of the attention map may be set to 0.
In general, an object detector is designed such that each pixel of an image can be assigned a probability that the pixel belongs to an object. For example, common neural networks for object detection output object detections in the form of bounding boxes (English: bounding box) together with the probability that the bounding box contains an object known to the neural network. In an attention map, the pixels within a bounding box may each be assigned this probability. If the neural network determines overlapping object detections, the maximum of the probabilities determined for a pixel may be used.
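A sketch of how such an attention map could be filled from bounding-box detections; the detection format (corner coordinates plus probability) is an assumption for illustration:

```python
import torch

def attention_map_from_detections(height, width, detections):
    # detections: iterable of (x_min, y_min, x_max, y_max, probability),
    # with integer pixel coordinates; a hypothetical detector output format.
    attention = torch.zeros(height, width)
    for x_min, y_min, x_max, y_max, probability in detections:
        box = attention[y_min:y_max, x_min:x_max]
        # For overlapping detections, keep the maximum probability per pixel.
        attention[y_min:y_max, x_min:x_max] = torch.maximum(
            box, torch.full_like(box, probability))
    return attention
```

Setting the probability to 1.0 for every detection recovers the binary foreground/background variant described above.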
Preferably, in corresponding embodiments of the method, the steps of the method may also be performed iteratively, with the object detector determining a first attention map for the source image in each iteration and/or a second attention map for the target image in each iteration.
In general, the source image may be understood as originating from a dataset of the source domain, and the target image as originating from a dataset of the target domain. In particular, a plurality of images from the respective datasets may be used for training, and the steps of the training method may be performed iteratively. In particular, the images of the target domain need not be annotated. In each iteration step of the training, the object detector may then determine object detections for the source image and the target image of that iteration, and on this basis the first attention map and the second attention map may be determined as set out in one of the embodiments above. This has the advantage that, through the iterative training, the images of the source-domain dataset are transformed better with each iteration step, i.e., the images transformed from the source domain approximate the images of the target domain more and more closely.
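The iteration could be sketched as follows, reusing the hypothetical helpers from above; source_loader, target_loader, detector, generator_st, generator_ts and optimizer are assumptions, and the detector is frozen here and only supplies the attention maps:

```python
# Sketch of iterative training with attention maps recomputed per iteration.
for source_image, target_image in zip(source_loader, target_loader):
    height, width = source_image.shape[-2:]
    with torch.no_grad():  # the object detector itself is not trained here
        m1 = attention_map_from_detections(height, width, detector(source_image))
        m2 = attention_map_from_detections(height, width, detector(target_image))
    a1, r1, a2, r2 = cycle_pass(generator_st, generator_ts,
                                source_image, target_image)
    # Attention-weighted reconstruction losses; the GAN loss terms on a1 and
    # a2 (see the description of fig. 1) are omitted here for brevity.
    loss = (attention_weighted_difference(source_image, r1, m1)
            + attention_weighted_difference(target_image, r2, m2))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```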
The machine learning system trained in the method preferably characterizes a neural network, in particular a CycleGAN. Alternatively, the machine learning system may also characterize another neural network capable of image-to-image translation (English: image-to-image translation), such as MADAN or VAE-GAN.
In another aspect, the invention relates to a computer-implemented method for training an object detector, the method comprising the steps of:
providing an input image and an annotation characterizing a position of at least one object imaged in the input image;
determining an intermediate image based on the input image using a first generator of a machine learning system, the machine learning system having been trained according to one of the embodiments of the first aspect of the invention;
training an object detector, wherein the object detector is trained such that the object detector predicts the one or more objects characterized by the annotation for the intermediate image as input.
The method for training the object detector can be understood as follows: using the trained machine learning system, images corresponding in appearance to images of the target domain are first determined based on the images of the source domain, and the object detector is then trained based on these images (i.e., the intermediate images). The object detector may in particular be trained iteratively, wherein the training image dataset of the source domain can be transformed into a dataset of intermediate images before the training, and these intermediate images are then used to train the object detector. Alternatively, in each iteration step, an image from the source domain may be transformed into an intermediate image, which is then used to train the object detector.
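Under the same assumptions as above, transforming the source dataset once before training could look like this; source_dataset, detector, detector_optimizer and detection_loss are hypothetical names, with detection_loss standing for any suitable detector loss:

```python
import torch

# Transform the annotated source dataset into intermediate images once;
# the source annotations are reused unchanged as training targets.
intermediate_dataset = []
with torch.no_grad():
    for source_image, annotation in source_dataset:
        intermediate_dataset.append((generator_st(source_image), annotation))

# Train the object detector on the intermediate images.
for intermediate_image, annotation in intermediate_dataset:
    prediction = detector(intermediate_image)
    loss = detection_loss(prediction, annotation)  # assumed loss function
    detector_optimizer.zero_grad()
    loss.backward()
    detector_optimizer.step()
```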
Advantageously, the object detector may be adapted to the target domain in such a way that no object annotations are required for the images of the target domain. This speeds up the method for training the object detector, since the time for annotating target-domain images is eliminated. The object detector can therefore be trained (anlernen) on more images within the same time budget, which in turn may improve the performance of the object detector.
In general, an object detector in the sense of the invention may be understood as being set up such that, for an object detection, it determines, in addition to the position and size of the object in the image, a class characterizing the detected object.
In the method for training the object detector, the machine learning system may be understood as having been trained according to an embodiment of the method for training the machine learning system. In particular, the method steps for training the machine learning system may thus be part of the method for training the object detector, and may precede the method steps for training the object detector.
In another aspect, the invention relates to a computer-implemented method for determining a control signal for controlling an actuator and/or a display device, the method comprising the steps of:
providing an input image;
determining an object imaged on the input image using an object detector, wherein the object detector has been trained according to an embodiment of the method for training the object detector;
determining the control signal based on the determined object;
controlling the actuator and/or the display device according to the control signal.
An actuator is understood to mean, in particular, a technical system or a part of a technical system that influences a movement within a technical system. For example, the actuator may be a motor, such as an electric motor, that effects the movement of a robot. Alternatively, it is also possible that the actuator controls a hydraulic system; for example, the actuator may be a pump driving a hydraulic cylinder. The actuator may also be a valve for controlling the inflow of a liquid or a gas.
Drawings
Embodiments of the present invention are explained in more detail below with reference to the accompanying drawings. In the drawings:
FIG. 1 illustrates a machine learning system;
FIG. 2 schematically illustrates a method for training a machine learning system;
FIG. 3 schematically illustrates a training system;
fig. 4 schematically shows the structure of a control system for manipulating an actuator;
FIG. 5 schematically illustrates an embodiment for controlling an at least partially autonomous robot;
FIG. 6 schematically illustrates an exemplary embodiment for controlling a manufacturing system;
FIG. 7 schematically illustrates an exemplary embodiment for controlling an access system;
FIG. 8 shows a schematic diagram of an exemplary embodiment for controlling a monitoring system.
Detailed Description
Fig. 1 shows how loss values for training a machine learning system (70) are determined based on a source image (x₁) and a target image (x₂).
The source image (x₁) is passed to a first generator (71) of the machine learning system (70), wherein the first generator (71) determines a generated first image (a₁) based on the source image (x₁). Further, the target image (x₂) is passed to a second generator (72) of the machine learning system (70), wherein the second generator (72) determines a generated second image (a₂) based on the target image (x₂).
The generated first image (a₁) is fed to the second generator (72) to determine a first reconstruction (r₁). Subsequently, pixel-wise distances between the source image (x₁) and the first reconstruction (r₁) are determined, for example according to an Lₚ norm. These differences are then weighted using the first attention map (m₁), and the weighted differences are summed to determine a first loss value (L₁).
The generated second image (a₂) is fed to the first generator (71) to determine a second reconstruction (r₂). Subsequently, pixel-wise distances between the target image (x₂) and the second reconstruction (r₂) are determined, for example according to an Lₚ norm. These differences are then weighted using the second attention map (m₂), and the weighted differences are summed to determine a second loss value (L₂).
The target image (x₂) and the generated first image (a₁) are further passed to a first discriminator (73). The first generator (71) and the first discriminator (73) may be understood as a generative adversarial network (English: generative adversarial network, GAN). Based on the target image (x₂) and the generated first image (a₁), the first discriminator (73) then determines a first GAN loss value for each pixel of the generated first image (a₁) and each pixel of the target image (x₂). That is, unlike a normal GAN loss value, the average of the pixel-wise loss values is not taken. The first GAN loss value can be understood as a matrix of loss values in which the loss value at a position corresponds to the pixel position of the target image (x₂) and of the generated first image (a₁). The first GAN loss values are then weighted using the first attention map (m₁), and the weighted loss values are summed to determine a third loss value (L₃).
The source image (x₁) and the generated second image (a₂) are further passed to a second discriminator (74). The second generator (72) and the second discriminator (74) may likewise be understood as a GAN. Based on the source image (x₁) and the generated second image (a₂), the second discriminator (74) then determines a second GAN loss value for each pixel of the generated second image (a₂) and each pixel of the source image (x₁). That is, unlike a normal GAN loss value, the average of the pixel-wise loss values is not taken. The second GAN loss value can be understood as a matrix of loss values in which the loss value at a position corresponds to the pixel position of the source image (x₁) and of the generated second image (a₂). The second GAN loss values are then weighted using the second attention map (m₂), and the weighted loss values are summed to determine a fourth loss value (L₄).
The loss values (L₁, L₂, L₃, L₄) may then be summed, preferably weighted, to obtain a single loss value, based on which the parameters of the first generator (71) and/or of the second generator (72) and/or of the first discriminator (73) and/or of the second discriminator (74) may be changed. The weights of the individual loss values (L₁, L₂, L₃, L₄) are hyperparameters of the method.
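A sketch of the attention-weighted per-pixel GAN loss and of the weighted combination; a PatchGAN-style discriminator producing one score per pixel is an assumption here, as is the binary cross-entropy criterion:

```python
import torch
import torch.nn.functional as F

def attention_weighted_gan_loss(pixel_scores, attention_map, real):
    # pixel_scores: per-pixel discriminator logits of shape (height, width).
    target = torch.ones_like(pixel_scores) if real else torch.zeros_like(pixel_scores)
    # No averaging over pixels, in line with the description above.
    per_pixel = F.binary_cross_entropy_with_logits(
        pixel_scores, target, reduction="none")
    # Weight the loss value matrix with the attention map and sum.
    return (attention_map * per_pixel).sum()

def single_loss(l1, l2, l3, l4, w=(1.0, 1.0, 1.0, 1.0)):
    # The weights w are the hyperparameters of the method.
    return w[0] * l1 + w[1] * l2 + w[2] * l3 + w[3] * l4
```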
Fig. 2 shows, in the form of a flowchart, the sequence of a training method (100) for the machine learning system (70). In this embodiment, the machine learning system is configured as a CycleGAN according to fig. 1; other configurations are possible in other embodiments.
In a first step (101) a source image is provided from a dataset of source domains and a target image is provided from a dataset of target domains.
In a second step (102), the source image (x₁) and the target image (x₂) are processed using a pre-trained object detector (e.g., a neural network configured for object detection) in order to determine object detections for each of them. Based on the object detections, a first attention map (m₁) for the source image (x₁) and a second attention map (m₂) for the target image (x₂) are then determined.
In a third step (103), a first reconstruction (r₁) is determined according to fig. 1.
In a fourth step (104), a second reconstruction (r₂) is determined according to fig. 1.
In a fifth step (105), a single loss value is determined according to fig. 1.
In a sixth step (106), the parameters of the first generator (71), the parameters of the second generator (72), the parameters of the first discriminator (73) and the parameters of the second discriminator (74) are trained using a gradient descent method, and thus the machine learning system (70) is trained.
The steps of the method may preferably be repeated iteratively. As the termination criterion for the iteration loop, the completion of a specific number of iterations may be chosen, for example. Alternatively, the training may also be ended based on the single loss value, or based on a loss value determined on another dataset.
Fig. 3 shows an embodiment of a training system (140) for training the object detector (60) by means of a training dataset (T). The training dataset (T) comprises a plurality of source images (xᵢ) of the source domain, which are used for training the object detector (60), wherein the training dataset (T) further comprises, for each source image (xᵢ), a desired output signal (tᵢ) that corresponds to the source image (xᵢ) and characterizes an object detection of the source image (xᵢ).
For training, a training data unit (150) accesses a computer-implemented database (St₂), wherein the database (St₂) provides the training dataset (T). The training data unit (150) determines, preferably randomly, at least one source image (xᵢ) from the training dataset (T) together with the desired output signal (tᵢ) corresponding to the source image (xᵢ), and passes the source image (xᵢ) to the first generator (71) of the trained machine learning system (70). The first generator (71) determines an intermediate image based on the source image (xᵢ). The intermediate image is similar in appearance to the images of the target domain. The intermediate image is then passed to the object detector (60). The object detector (60) determines an output signal (yᵢ) based on the intermediate image.
The desired output signal (tᵢ) and the determined output signal (yᵢ) are passed to a changing unit (180).
Based on the desired output signal (tᵢ) and the determined output signal (yᵢ), the changing unit (180) then determines new parameters (Φ') for the object detector (60). For this purpose, the changing unit (180) compares the desired output signal (tᵢ) and the determined output signal (yᵢ) using a loss function (English: loss function). The loss function determines a first loss value, which characterizes how far the determined output signal (yᵢ) deviates from the desired output signal (tᵢ). In this embodiment, a negative log-likelihood function (English: negative log-likelihood function) is chosen as the loss function. Other loss functions are also conceivable in alternative embodiments.
It is also conceivable that the determined output signal (yᵢ) and the desired output signal (tᵢ) each comprise a plurality of sub-signals, for example in the form of tensors, wherein each sub-signal of the desired output signal (tᵢ) corresponds to a sub-signal of the determined output signal (yᵢ). For example, it is conceivable that a first sub-signal characterizes, with respect to a part of the source image, a probability of occurrence of an object, while a second sub-signal characterizes the exact position of the object. If the determined output signal (yᵢ) and the desired output signal (tᵢ) comprise a plurality of corresponding sub-signals, a second loss value is preferably determined for each pair of corresponding sub-signals using a suitable loss function, and the determined second loss values are suitably combined (zusammenführen) into the first loss value, for example by weighted summation.
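A sketch of this combination, with an assumed sub-signal layout of class log-probabilities plus box coordinates (the dictionary keys and weights are illustrative only):

```python
import torch.nn.functional as F

def first_loss_value(predicted, desired, w_cls=1.0, w_box=1.0):
    # Second loss values per pair of corresponding sub-signals: a negative
    # log-likelihood term for the class sub-signal and an L1 term for the
    # position sub-signal (layout assumed for illustration).
    cls_loss = F.nll_loss(predicted["class_log_probs"], desired["class"])
    box_loss = F.l1_loss(predicted["box"], desired["box"])
    # Weighted summation combines them into the first loss value.
    return w_cls * cls_loss + w_box * box_loss
```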
The changing unit (180) determines the new parameters (Φ') based on the first loss value. In this embodiment, this is done using a gradient descent method, preferably stochastic gradient descent, Adam, or AdamW. In further embodiments, the training may also be based on an evolutionary algorithm or on second-order optimization (English: second-order optimization).
The determined new parameters (Φ') are stored in a model parameter memory (St₁). The determined new parameters (Φ') are preferably provided to the object detector (60) as its parameters (Φ).
In a further preferred embodiment, the described training is iteratively repeated for a predefined number of iteration steps or iteratively repeated until the first loss value is below a predefined threshold. Alternatively or additionally, it is also conceivable to end the training when the average first loss value for the test or validation data set is below a predefined threshold value. In at least one of the iterations, the new parameter (Φ') determined in the previous iteration is used as the parameter (Φ) of the object detector (60).
Furthermore, the training system (140) may include at least one processor (145) and at least one machine-readable storage medium (146) containing instructions that, when executed by the processor (145), cause the training system (140) to perform a training method according to one of the aspects of the invention.
Fig. 4 shows the use of the object detector (60) within a control system (40) for controlling an actuator (10) in an environment (20) of the actuator (10). The environment (20) is detected, at preferably regular time intervals, by a sensor (30), in particular an imaging sensor such as a camera sensor; the sensor may also be provided as a plurality of sensors, for example a stereo camera. The sensor signal (S) of the sensor (30), or one sensor signal (S) each in the case of a plurality of sensors, is transmitted to the control system (40). The control system (40) thus receives a sequence (Folge) of sensor signals (S), from which it determines control signals (A) that are transmitted to the actuator (10).
The control system (40) receives a sequence of sensor signals (S) of the sensor (30) in an optional receiving unit (50) which converts the sequence of sensor signals (S) into a sequence of input signals (x) (alternatively, each sensor signal (S) can also be employed directly as input signal (x)). For example, the input signal (x) may be a segment (Ausschnitt) of the sensor signal (S) or further processing. In other words, the input signal (x) is determined from the sensor signal (S). The sequence of input signals (x) is fed to an object detector (60).
Preferably, the object detector (60) is parameterized by a parameter (Φ) stored in and provided by the parameter memory (P).
The object detector (60) determines an output signal (y) from the input signal (x). The output signal (y) is fed to an optional conversion unit (80), which determines therefrom a control signal (A), which is fed to the actuator (10) for controlling the actuator (10) accordingly.
The actuator (10) receives the control signal (A), is controlled accordingly and performs a corresponding action. The actuator (10) may in this case comprise (not necessarily structurally integrated) actuation logic which determines a second actuation signal from the actuation signal (a) and then actuates the actuator (10) using the second actuation signal.
In further embodiments, the control system (40) includes a sensor (30). In still further embodiments, the control system (40) may alternatively or additionally further comprise an actuator (10).
In a further preferred embodiment, the control system (40) comprises at least one processor (45) and at least one machine readable storage medium (46), on which machine readable storage medium (46) instructions are stored which, when executed on the at least one processor (45), cause the control system (40) to perform the method according to the invention.
In an alternative embodiment, a display unit (10 a) is provided instead of or in addition to the actuator (10).
Fig. 5 shows how the control system (40) may be used to control an at least partly autonomous robot, here an at least partly autonomous motor vehicle (100).
The sensor (30) may be, for example, a video sensor preferably arranged in a motor vehicle (100).
An object detector (60) is arranged to identify identifiable objects on the input image (x).
The actuator (10), preferably arranged in the motor vehicle (100), can be, for example, a brake, a drive or a steering system of the motor vehicle (100). The control signal (A) can then be determined in such a way that the actuator or actuators (10) are actuated such that, for example, the motor vehicle (100) is prevented from colliding with an object identified by the object detector (60), in particular if the object belongs to a certain class, for example a pedestrian.
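Purely as an illustrative sketch (class names, box formats and the control interface are assumptions, not part of the disclosure), the conversion unit (80) could derive a braking command like this:

```python
def boxes_overlap(box_a, box_b):
    # Boxes are (x_min, y_min, x_max, y_max) in pixel coordinates.
    ax_min, ay_min, ax_max, ay_max = box_a
    bx_min, by_min, bx_max, by_max = box_b
    return ax_min < bx_max and bx_min < ax_max and ay_min < by_max and by_min < ay_max

def determine_control_signal(detections, ego_corridor):
    # detections: list of dicts with a "class" label and a "box";
    # ego_corridor: image region covering the vehicle's path ahead.
    for detection in detections:
        if detection["class"] == "pedestrian" and boxes_overlap(
                detection["box"], ego_corridor):
            return {"brake": 1.0}  # request full braking
    return {"brake": 0.0}
```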
Alternatively or additionally, the display unit (10a) can be actuated with the control signal (A), for example to display the identified objects. It is also conceivable that the display unit (10a) is actuated with the control signal (A) such that it outputs an optical or acoustic warning signal if it is determined that the motor vehicle (100) is about to collide with one of the identified objects. The warning can also be given by means of a haptic warning signal, for example via a vibration of the steering wheel of the motor vehicle (100).
Alternatively, the at least partially autonomous robot may also be another mobile robot (not shown), such as a robot that moves by flying, swimming, diving or walking. The mobile robot may also be, for example, an at least partially autonomous mower or an at least partially autonomous cleaning robot. In these cases, the actuating signal (A) can also be determined in such a way that the drive and/or steering system of the mobile robot is actuated in such a way that the at least partially autonomous robot is prevented from, for example, collision with an object identified by the object detector (60).
Fig. 6 shows an embodiment in which the control system (40) is used to control the manufacturing machine (11) of the manufacturing system (200) in such a way that the actuators (10) of the manufacturing machine (11) are controlled. The manufacturing machine (11) may be, for example, a machine for punching, sawing, drilling, welding and/or cutting. It is also conceivable that the manufacturing machine (11) is configured to grip the manufactured product (12 a,12 b) by means of a gripper.
The sensor (30) may be, for example, a video sensor, which for example detects a conveying surface of the conveyor belt (13), on which conveyor belt (13) the manufactured goods (12 a,12 b) may be located, in which case the input signal (x) is an input image (x). The object detector (60) may for example be arranged to determine the position of the manufactured goods (12 a,12 b) on the conveyor belt, the actuator (10) controlling the manufacturing machine (11) may then be manipulated in dependence on the determined position of the manufactured goods (12 a,12 b).
It is also contemplated that the object detector (60) is configured to determine other characteristics of the manufactured article (12 a,12 b) in lieu of or in addition to the location. In particular, it is contemplated that the object detector (60) determines whether the manufactured article (12 a,12 b) is defective and/or damaged. In this case, the actuator (10) can be actuated in such a way that the manufacturing machine (11) sorts out defective and/or damaged finished products (12 a,12 b).
Fig. 7 shows an embodiment in which the control system (40) is used to control an access system (300). The access system (300) may comprise a physical access control, for example a door (401). The sensor (30) may in particular be a video sensor or a thermal imaging sensor arranged to detect the area in front of the door (401). The object detector (60) may in particular detect persons on the transmitted input image (x). If a plurality of persons are detected simultaneously, the identity of the persons can be determined particularly reliably by associating the persons (i.e., the objects) with one another, for example by analyzing their movements.
The actuator (10) may be a lock that releases or does not release the access control depending on the control signal (A), for example opens or does not open the door (401). For this purpose, the control signal (A) can be selected depending on the output signal (y) determined for the input image (x) by means of the object detector (60). For example, it is conceivable that the output signal (y) comprises information characterizing the identity of a person detected by the object detector (60), and the control signal (A) is then selected based on the identity of the person.
Instead of a physical access control, a logical access control may also be provided.
Fig. 8 shows an embodiment in which the control system (40) is used to control a monitoring system (400). This embodiment differs from the embodiment shown in fig. 4 in that, instead of the actuator (10), a display unit (10a) is actuated by the control system (40). For example, the sensor (30) may record an input image (x) on which at least one person is to be identified, and the position of the at least one person may be detected by the object detector (60). The input image (x) can then be shown on the display unit (10a), wherein the detected persons can be shown highlighted in color.
The term "computer" includes any device for executing a predefinable calculation rule. These calculation rules can exist in software or in hardware or also in a mixture of software and hardware.
In general, a plurality is understood to be indexed, i.e., each element of the plurality is assigned a unique index, preferably by assigning consecutive integers to the elements contained in the plurality. Preferably, if a plurality comprises N elements, where N is the number of elements in the plurality, the elements are assigned the integers from 1 to N.

Claims (11)

1. A computer-implemented method (100) for training a machine learning system (70), the method comprising the steps of:
providing (101) a source image (x₁) from a source domain and a target image (x₂) from a target domain;
determining (103), using a first generator (71) of the machine learning system (70), a generated first image (a₁) based on the source image (x₁), and determining, using a second generator (72) of the machine learning system (70), a first reconstruction (r₁) based on the generated first image (a₁);
determining (104), using the second generator (72), a generated second image (a₂) based on the target image (x₂), and determining, using the first generator (71), a second reconstruction (r₂) based on the generated second image (a₂);
determining (105) a first loss value (L₁), wherein the first loss value (L₁) characterizes a first difference of the source image (x₁) and the first reconstruction (r₁), wherein the first difference is weighted according to a first attention map (m₁); determining a second loss value (L₂), wherein the second loss value (L₂) characterizes a second difference of the target image (x₂) and the second reconstruction (r₂), wherein the second difference is weighted according to a second attention map (m₂);
training the machine learning system (70) by training the first generator (71) and/or the second generator (72) based on the first loss value (L₁) and/or the second loss value (L₂).
2. The method (100) of claim 1, wherein the first attention map (m₁) characterizes, for each pixel of the source image (x₁), whether the pixel belongs to an object imaged in the source image (x₁), and/or wherein the second attention map (m₂) characterizes, for each pixel of the target image (x₂), whether the pixel belongs to an object imaged in the target image (x₂).
3. The method (100) according to claim 1 or 2, wherein an object detector is used to determine the first attention map (m₁) based on the source image (x₁), and/or wherein the object detector is used to determine the second attention map (m₂) based on the target image (x₂).
4. The method (100) according to claim 3, wherein the steps of the method are performed iteratively and the object detector determines, in each iteration, a first attention map (m₁) for the source image (x₁) and/or a second attention map (m₂) for the target image (x₂).
5. The method (100) according to claim 4, wherein the object detector is designed to determine objects in an image of a road traffic scene.
6. The method (100) of any one of claims 1 to 5, wherein the machine learning system (70) characterizes CycleGAN.
7. A computer-implemented method for training an object detector, the method comprising the steps of:
providing an input image and an annotation, wherein the annotation characterizes a position of at least one object imaged in the input image;
determining an intermediate image based on the input image using a first generator (71) of a machine learning system (70) which has been trained according to one of claims 1 to 6;
training an object detector, wherein the object detector is trained such that the object detector predicts the one or more objects characterized by the annotation for the intermediate image as input.
8. A computer-implemented method for determining a control signal (A) for controlling an actuator (10) and/or a display device (10a), the method comprising the steps of:
providing an input image (x);
determining an object imaged on the input image (x) using an object detector, wherein the object detector has been trained according to claim 7;
determining the control signal (A) based on the determined object;
controlling the actuator (10) and/or the display device (10a) according to the control signal (A).
9. Training device (140) arranged to perform the method according to any of claims 1 to 7.
10. Computer program arranged, when executed by a processor (45, 145), to perform the method according to any of claims 1 to 8.
11. A machine readable storage medium (46, 146) having stored thereon a computer program according to claim 10.
CN202310180171.6A 2022-02-17 2023-02-15 Method and device for training neural network Pending CN116611500A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102022201679.3A DE102022201679A1 (en) 2022-02-17 2022-02-17 Method and device for training a neural network
DE102022201679.3 2022-02-17

Publications (1)

Publication Number Publication Date
CN116611500A true CN116611500A (en) 2023-08-18

Family

ID=87430808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310180171.6A Pending CN116611500A (en) 2022-02-17 2023-02-15 Method and device for training neural network

Country Status (3)

Country Link
US (1) US20230260259A1 (en)
CN (1) CN116611500A (en)
DE (1) DE102022201679A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102022204263A1 (en) 2022-04-29 2023-11-02 Robert Bosch Gesellschaft mit beschränkter Haftung Method and device for training a neural network

Also Published As

Publication number Publication date
DE102022201679A1 (en) 2023-08-17
US20230260259A1 (en) 2023-08-17

Similar Documents

Publication Title
CN109478239B (en) Method for detecting object in image and object detection system
US20210089895A1 (en) Device and method for generating a counterfactual data sample for a neural network
US20220222929A1 (en) Method and device for testing the robustness of an artificial neural network
US20150325046A1 (en) Evaluation of Three-Dimensional Scenes Using Two-Dimensional Representations
CN112560886A (en) Training-like conditional generation of countermeasure sequence network
CN111797709B (en) Real-time dynamic gesture track recognition method based on regression detection
JP2020038660A (en) Learning method and learning device for detecting lane by using cnn, and test method and test device using the same
WO2022253148A1 (en) Systems and methods for sparse convolution of unstructured data
JP2021174556A (en) Semantic hostile generation based on function test method in automatic driving
CN114386614A (en) Method and apparatus for training a machine learning system
Jemilda et al. Moving object detection and tracking using genetic algorithm enabled extreme learning machine
CN116611500A (en) Method and device for training neural network
EP3943972A1 (en) Methods and systems for predicting a trajectory of an object
CN113994349A (en) Method and apparatus for training a machine learning system
US20230031755A1 (en) Generative adversarial network for processing and generating images and label maps
EP3767534A1 (en) Device and method for evaluating a saliency map determiner
CN114386449A (en) Method for determining an output signal by means of a machine learning system
US20230351741A1 (en) Method and device for training a neural network
JP2021197184A (en) Device and method for training and testing classifier
CN112149790A (en) Method and apparatus for checking robustness of artificial neural network
Mozaffari et al. Facial expression recognition using deep neural network
Kalirajan et al. Deep Learning for Moving Object Detection and Tracking
US20240135699A1 (en) Device and method for determining an encoder configured image analysis
US20220327387A1 (en) More robust training for artificial neural networks
JP2023005937A (en) Method for executing class discrimination processing on data to be discriminated using machine learning model, information processing apparatus, and computer program

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication