DE102021202342A1

DE102021202342A1 - Method and device for training a classifier and/or regressor

Info

Publication number: DE102021202342A1
Application number: DE102021202342.8A
Authority: DE
Inventors: Simon Weissenmayer
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2021-03-10
Filing date: 2021-03-10
Publication date: 2022-09-15

Abstract

Computer-implemented method for training a neural network, the neural network being designed to carry out a classification and/or a regression based on an input image, and the neural network carrying out the classification and/or the regression based on a layer of the neural network, comprising the steps:
• determining a layer input of the layer based on a training input image;
• determining a plurality of output pixels using the layer, wherein an output pixel is determined based on at least one input pixel of the layer input and at least one other output pixel;
• determining a layer output based on the plurality of output pixels;
• determining a classification and/or a regression value based on the layer output;
• training the neural network based on a deviation of the determined classification and/or the determined regression value from a desired classification and/or a desired regression value of the training input image.

Description

State of the art

Convolutional neural networks are known from LeCun et al., "Object Recognition with Gradient-Based Learning", 1999, available online: http://yann.lecun.com/exdb/publis/pdf/lecun-99.pdf.

Advantages of the Invention

Convolutional neural networks (CNNs) can be used for many image classification problems. In general, the input to a CNN is a two- or three-dimensional matrix (e.g., the pixels of a grayscale or color image). The CNN comprises so-called neurons, which are arranged in convolutional layers. The output of each neuron is computed via a discrete convolution (hence the name convolutional). Intuitively, a comparatively small convolution matrix (also called a filter kernel) is moved step by step over the input. The input of a neuron in the convolutional layer is calculated as the inner product of the filter kernel with the image section currently underneath it. Accordingly, neighboring neurons in the convolutional layer respond to overlapping areas (local neighborhoods in the image).
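The sliding inner product described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation; the function name `conv2d_valid` and the example kernel are hypothetical.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` and take the inner product at each position.

    As in most CNN frameworks, the kernel is not flipped (strictly a
    cross-correlation), and only positions where the kernel fits entirely
    inside the image are evaluated.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]      # image section under the kernel
            out[i, j] = np.sum(patch * kernel)     # inner product = input of one neuron
    return out


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # hypothetical 2x2 diagonal-difference kernel
result = conv2d_valid(image, kernel)               # 3x3 array, every entry -5.0
```

Neighboring output entries share three of their four input pixels here, which is the "overlapping areas" property mentioned in the text.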

The image section that a filter kernel can evaluate is exactly as large as the filter kernel itself, i.e., the filter kernel is a filter with a finite impulse response (FIR).
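The finite reach of such a kernel can be made visible by feeding a single bright pixel (an impulse) through a 3x3 filter: only a 3x3 block of outputs reacts. The sketch below uses causal offsets 0, -1, -2 to match the exemplary embodiment described later; the name `causal_fir_2d` is hypothetical.

```python
import numpy as np


def causal_fir_2d(u: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Output pixel (x, y) = sum of kernel[m, n] * u[x - m, y - n]; zero outside the image."""
    K = kernel.shape[0]
    padded = np.pad(u, ((K - 1, 0), (K - 1, 0)))   # zero border above and to the left
    out = np.zeros_like(u)
    for x in range(u.shape[0]):
        for y in range(u.shape[1]):
            window = padded[x:x + K, y:y + K][::-1, ::-1]  # window[m, n] == u[x - m, y - n]
            out[x, y] = np.sum(window * kernel)
    return out


impulse = np.zeros((11, 11))
impulse[5, 5] = 1.0
response = causal_fir_2d(impulse, np.ones((3, 3)))
support = np.argwhere(response != 0)               # only rows 5-7 and columns 5-7 respond
```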

To evaluate information from larger image areas, the information can be condensed with a pooling layer and then evaluated with a filter kernel. So that high-resolution information from small image sections is not lost through pooling, the pooling layer can be preceded by a convolutional layer.
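How much pooling widens the area that a later kernel sees can be quantified with the standard receptive-field recurrence for stacked layers (a well-known rule, not part of the patent). The layer configurations below are illustrative, and the computation is along one image axis.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a single output pixel.

    layers: list of (kernel_size, stride) pairs, applied in order.
    Recurrence: rf += (kernel_size - 1) * jump; jump *= stride, where jump is the
    distance in input pixels between neighbouring positions of the current layer.
    """
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


two_convs = receptive_field([(3, 1), (3, 1)])                # 5
conv_pool_conv = receptive_field([(3, 1), (2, 2), (3, 1)])   # 8
```

Inserting a 2x2 pooling layer lets the second 3x3 kernel cover 8 input pixels instead of 5, at the cost of halving the resolution that kernel works on.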

In general, CNNs can achieve higher prediction accuracy with greater depth, i.e., a greater number of layers. However, using a relatively large number of layers means that training the CNN is often made more difficult by the problem of vanishing gradients.

To solve this problem, a different method of training CNNs is proposed. One and the same filter kernel should be able to evaluate image sections that are significantly larger than the filter kernel itself, while also taking very high-resolution image sections into account. The number of layers required for this in the CNN should be as small as possible, because the vanishing gradient problem grows with the number of layers in a network and is one of the main limiting factors. In addition, as few weights as possible should be required.

For this purpose, the filter kernel is implemented as a filter with an infinite impulse response (IIR). This means that the filter kernel is recurrent, i.e., it takes into account output values that have already been computed in the immediate vicinity of the output value currently being computed. As a result, the filter is in principle able to propagate information across image areas that are far larger than the filter kernel itself.
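The difference between a finite and an infinite impulse response is easiest to see in one dimension. The sketch below compares a hypothetical first-order recursive filter with a 3-tap FIR filter; the coefficient values are arbitrary.

```python
import numpy as np


def iir_1d(u, b, a):
    """First-order recursive (IIR) filter: y[n] = b * u[n] + a * y[n - 1], with y[-1] = 0."""
    y = np.zeros(len(u))
    prev = 0.0
    for n, u_n in enumerate(u):
        prev = b * u_n + a * prev      # reuse the previously computed output value
        y[n] = prev
    return y


impulse = np.zeros(50)
impulse[0] = 1.0
iir_response = iir_1d(impulse, b=1.0, a=0.8)                 # 0.8**n: non-zero at all 50 samples
fir_response = np.convolve(impulse, [1.0, 0.8, 0.64])[:50]  # 3-tap FIR: only 3 non-zero samples
```

With just two weights the recursive filter reaches every sample, while the FIR filter's reach is bounded by its number of taps.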

Disclosure of the Invention

In a first aspect, the invention relates to a computer-implemented method for training a neural network, the neural network being designed to carry out a classification and/or a regression based on an input image and carrying out the classification and/or the regression based on a layer of the neural network, the method comprising the steps:

• determining a layer input of the layer based on a training input image;
• determining a plurality of output pixels using the layer, wherein an output pixel is determined based on at least one input pixel of the layer input and at least one further output pixel;
• determining a layer output based on the plurality of output pixels;
• determining a classification and/or a regression value based on the layer output;
• training the neural network based on a deviation of the determined classification and/or the determined regression value from a desired classification and/or a desired regression value of the training input image.
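Read as a procedure, the claimed steps can be sketched end to end. Everything below is a hypothetical toy: a single-channel layer with one feedback weight per direction, a mean/max pooling head with three classes, a negative log-likelihood as the deviation, and central finite differences standing in for backpropagation.

```python
import numpy as np


def recursive_layer(u, b, a_x, a_y):
    """Hypothetical minimal layer: output pixel h[x, y] depends on input pixel u[x, y]
    and on the already computed output pixels h[x-1, y] and h[x, y-1]."""
    h = np.zeros_like(u)
    for x in range(u.shape[0]):
        for y in range(u.shape[1]):
            h[x, y] = b * u[x, y]
            if x > 0:
                h[x, y] += a_x * h[x - 1, y]
            if y > 0:
                h[x, y] += a_y * h[x, y - 1]
    return h


def loss(theta, image, target):
    b, a_x, a_y = theta[:3]
    W = theta[3:].reshape(3, 2)                      # 3 classes, 2 pooled features
    h = recursive_layer(image, b, a_x, a_y)          # steps 1-3: layer input -> output pixels -> layer output
    logits = W @ np.array([h.mean(), h.max()])       # step 4: classification scores
    logits = logits - logits.max()                   # numerically stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[target]                        # deviation from the desired class


def numerical_grad(f, theta, eps=1e-6):
    """Central finite differences; a slow stand-in for backpropagation."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
image = rng.random((8, 8))                           # stand-in training input image
target = 2                                           # stand-in desired class
theta = np.concatenate([[0.5, 0.2, 0.2], 0.1 * rng.standard_normal(6)])
objective = lambda t: loss(t, image, target)
loss_before = objective(theta)
theta = theta - 0.05 * numerical_grad(objective, theta)   # step 5: one gradient-descent update
loss_after = objective(theta)
```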

The training input image or the input image can in particular be recorded by a sensor, for example a video sensor, LIDAR sensor, radar sensor, ultrasonic sensor, or thermal camera.

If the layer is the first layer of the neural network, the layer input can in particular be the training input image itself or a section of the training input image. Otherwise, the layer input can be the output of a layer preceding the layer.

In both cases, the layer input can preferably be present as a tensor. The tensor has a discrete height and a discrete width, with a pixel at each point along the height and width. In the input image, for example, each position can hold a pixel that characterizes a red, a green, and a blue channel, preferably as a vector. If the layer input is the output of the preceding layer, this output can comprise a plurality of feature maps, with each pixel characterizing the values of several feature maps at one point along their height and width.
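Assuming a channel-last layout (height x width x channels), which the text does not prescribe, the tensors described above look like this in NumPy; the sizes are arbitrary.

```python
import numpy as np

height, width = 4, 6
rgb = np.zeros((height, width, 3))        # input image: one (R, G, B) vector per position
rgb[2, 5] = [255.0, 128.0, 0.0]           # set the pixel at row 2, column 5
pixel = rgb[2, 5]                         # the vector (255, 128, 0)

features = np.zeros((height, width, 16))  # output of a preceding layer with 16 feature maps
feature_pixel = features[2, 5]            # one value per feature map at that position
```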

The layer is configured to determine the layer output based on the layer input. For this purpose, the layer preferably comprises a plurality of filters. Each filter is configured to process a portion of the layer input and to determine an output for that portion, which is provided in the layer output. This output is preferably part of a feature map, which in turn is output as part of the layer output. Like the layer input, the layer output comprises pixels.

In contrast to known convolutional layers, the layer determines its output both on the basis of the layer input and on the basis of already determined values of the layer output. This is preferably done according to a processing rule characterized by the formula

$h_{x, y} = B \cdot u_{x, y} + A \cdot h_{*},$

where B and A are matrices of weights of the layer that can be adjusted during training, u_{x,y} is a vector of input pixels in a neighborhood of a position x, y of the layer input, and h_* is a vector of output pixels in a neighborhood of that position in the layer output.
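A direct (and slow) way to evaluate this processing rule is a nested scan over the image. The sketch assumes a causal K x K neighborhood with offsets 0 to -(K-1) in both directions, as in the 3x3 exemplary embodiment; the function name and the array indexing convention are illustrative choices, not taken from the patent.

```python
import numpy as np


def recursive_filter2d(u: np.ndarray, B: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Sketch of h[x, y] = B . u_{x,y} + A . h_* for a causal K x K neighbourhood.

    B[m, n] weights the input pixel u[x - m, y - n] and A[m, n] the output pixel
    h[x - m, y - n] (the patent's B_{-m,-n} and A_{-m,-n}). A[0, 0] is ignored
    because h[x, y] is the pixel being computed. Pixels outside the image count
    as zero, and the scan starts in the top-left corner.
    """
    K = B.shape[0]
    Nx, Ny = u.shape
    u_pad = np.pad(u, ((K - 1, 0), (K - 1, 0)))   # zero border above and to the left
    h_pad = np.zeros((Nx + K - 1, Ny + K - 1))
    A_fb = A.copy()
    A_fb[0, 0] = 0.0                               # never feed back the current pixel
    for x in range(Nx):
        for y in range(Ny):
            # windows flipped so that window[m, n] corresponds to offset (-m, -n)
            u_win = u_pad[x:x + K, y:y + K][::-1, ::-1]
            h_win = h_pad[x:x + K, y:y + K][::-1, ::-1]
            h_pad[x + K - 1, y + K - 1] = np.sum(B * u_win) + np.sum(A_fb * h_win)
    return h_pad[K - 1:, K - 1:]


u = np.zeros((8, 8))
u[0, 0] = 1.0                                      # single impulse in the top-left corner
B = np.zeros((3, 3))
B[0, 0] = 1.0
A = np.zeros((3, 3))
A[1, 0] = A[0, 1] = 0.3                            # hypothetical feedback weights
h = recursive_filter2d(u, B, A)
```

With only two small feedback weights, the single impulse already reaches all 64 output pixels, whereas a 3x3 FIR kernel would affect only nine.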

To avoid having to apply the processing rule repeatedly, the processing rule according to which the layer determines the output pixel from the at least one input pixel and the at least one further output pixel can preferably be transformed by means of a Z-transform, and the layer output can be computed based on the transformed processing rule.

In this case, the Z-transform of the processing rule can preferably be characterized by the formula

$H_{z} = \frac{B \cdot Z}{A \cdot Z} U_{z},$

where U_z and H_z are the transformed layer input and layer output, Z is a vector each element of which contains a matrix of discrete spatial frequencies, and B and A are matrices of weights of the layer that are adjusted during training, and/or where B and A are transferred to the non-transformed processing rule and/or the output of the network is computed using the non-transformed processing rule.

In particular, this allows the CNN to be trained in the frequency domain while the weights obtained can be used directly in the spatial domain. The transformation into the frequency domain can therefore optionally be omitted at inference time.
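The statement that weights found in the frequency domain can be reused unchanged in the spatial recursion can be checked numerically. The sketch makes two assumptions the text leaves open: the denominator follows the usual IIR convention A(z) = 1 - (sum of feedback terms), and the image is zero-padded so that the periodic boundary implied by the discrete Fourier transform does not wrap the decaying response back into the image. The weights are random stand-ins rather than trained values.

```python
import numpy as np


def spatial_recursion(u, B, A_fb):
    """Causal scan: h[x,y] = sum B[m,n] u[x-m,y-n] + sum_{(m,n) != (0,0)} A_fb[m,n] h[x-m,y-n]."""
    K = B.shape[0]
    h = np.zeros_like(u)
    for x in range(u.shape[0]):
        for y in range(u.shape[1]):
            acc = 0.0
            for m in range(min(K, x + 1)):          # stay inside the image
                for n in range(min(K, y + 1)):
                    acc += B[m, n] * u[x - m, y - n]
                    if (m, n) != (0, 0):
                        acc += A_fb[m, n] * h[x - m, y - n]
            h[x, y] = acc
    return h


def frequency_filter(u, B, A_fb):
    """Same weights applied as H = B(z) / A(z) * U on the 2-D DFT grid (periodic boundary)."""
    A_den = -A_fb.copy()
    A_den[0, 0] = 1.0                               # assumed convention: A(z) = 1 - feedback terms
    Bz = np.fft.fft2(B, s=u.shape)                  # B evaluated at every discrete spatial frequency
    Az = np.fft.fft2(A_den, s=u.shape)
    return np.fft.ifft2(Bz / Az * np.fft.fft2(u)).real


rng = np.random.default_rng(1)
B = rng.uniform(-0.5, 0.5, size=(3, 3))             # hypothetical "trained" weights
A_fb = np.zeros((3, 3))
A_fb[1, 0], A_fb[0, 1], A_fb[1, 1] = 0.25, 0.25, 0.1
u = np.zeros((64, 64))
u[:12, :12] = rng.random((12, 12))                  # image with zero padding below and to the right
h_spatial = spatial_recursion(u, B, A_fb)
h_freq = frequency_filter(u, B, A_fb)
```

Without the zero padding the two results differ, mostly near the top and left borders, because the frequency-domain form treats the image as periodic while the spatial scan starts from zero.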

To determine the layer output using the transformed processing rule, the layer input can preferably be transformed by a two-dimensional Fourier transform, and the transformed layer input can be used as the layer input in the method.

So that the CNN also learns non-linear relationships in the training images, it is advantageous to insert activation functions between at least some of the layers of the CNN.

For this purpose, the layer output can preferably be transformed by means of an inverse two-dimensional Fourier transform, the transformed layer output can be fed to an activation function, and the classification and/or the regression value can be determined based on an output of the activation function.

If the output of the activation function is to serve as the layer input of a further convolutional layer, it can be transformed again using a two-dimensional Fourier transform.
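The sequence of transforms described in the last three paragraphs can be sketched as follows. The Gaussian low-pass transfer function G stands in for a learned B(z)/A(z) and is purely hypothetical, as is the choice of ReLU as activation.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal((16, 16))                 # layer input in the spatial domain

fx, fy = np.meshgrid(np.fft.fftfreq(16), np.fft.fftfreq(16), indexing="ij")
G = np.exp(-4.0 * np.hypot(fx, fy))               # hypothetical transfer function of the layer

U = np.fft.fft2(u)                                # 2-D FFT of the layer input
H = G * U                                         # layer output in the frequency domain
h = np.fft.ifft2(H).real                          # inverse 2-D FFT back to the spatial domain
a = np.maximum(h, 0.0)                            # non-linear activation, applied pixel-wise
U_next = np.fft.fft2(a)                           # transformed again as input of the next layer
```

The activation is applied in the spatial domain because a pixel-wise non-linearity such as ReLU has no equally simple element-wise counterpart in the frequency domain.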

Embodiments of the invention are explained in more detail below with reference to the accompanying drawings. The drawings show:

Fig. 1 schematically shows a training system for training a neural network;
Fig. 2 schematically shows a structure of a control system for controlling an actuator;
Fig. 3 schematically shows an embodiment for controlling an at least partially autonomous robot;
Fig. 4 schematically shows an embodiment for controlling a manufacturing system;
Fig. 5 schematically shows an embodiment for controlling an access system;
Fig. 6 schematically shows an embodiment for controlling a monitoring system;
Fig. 7 schematically shows an embodiment for controlling a personal assistant;
Fig. 8 schematically shows an embodiment for controlling a medical imaging system;
Fig. 9 schematically shows an embodiment of a medical analysis device.

Description of the exemplary embodiments

Fig. 1 shows an embodiment of a training system (140) for training a neural network (60) of the control system (40) using a training data set (T). The training data set (T) comprises a plurality of input images (x_i) used to train the neural network (60). For each input image (x_i), the training data set (T) also comprises a desired output signal (t_i), which corresponds to the input image (x_i) and characterizes a classification of the input image (x_i) or a regression value describing the input image (x_i).

For training, a training data unit (150) accesses a computer-implemented database (St₂), which provides the training data set (T). The training data unit (150) selects, preferably at random, at least one input image (x_i) and the desired output signal (t_i) corresponding to it from the training data set (T) and transmits the input image (x_i) to the neural network (60). The neural network (60) determines an output signal (y_i) on the basis of the input image (x_i).

To determine the output signal (y_i), the neural network (60) includes a convolutional layer with filter kernels of a fixed size. In the exemplary embodiment, the size is 3×3; other sizes can be chosen in other exemplary embodiments. Specifically, a convolution tensor B of weights of the convolutional layer is combined with pixels u of the layer input, and a further convolution tensor A is combined with pixels h of the layer output of the convolutional layer, as follows:

$\begin{aligned} h_{x, y} = {} & B_{0,0} \cdot u_{x, y} + B_{0,-1} \cdot u_{x, y-1} + B_{0,-2} \cdot u_{x, y-2} + B_{-1,0} \cdot u_{x-1, y} + \dots + B_{-2,-2} \cdot u_{x-2, y-2} \\ & + A_{0,-1} \cdot h_{x, y-1} + A_{0,-2} \cdot h_{x, y-2} + A_{-1,0} \cdot h_{x-1, y} + \dots + A_{-2,-2} \cdot h_{x-2, y-2} \\ = {} & B \cdot u_{x, y} + A \cdot h_{*} \end{aligned}$

To avoid a resource-intensive iterative calculation (caused by algebraic loops), only output values that have already been computed are included in the calculation, i.e., output values with a negative index relative to the output value currently being computed. Because previously computed output values are taken into account, it matters in which of the four corners of the layer input the filtering starts. For example, if filtering starts in the top-left corner, information there can influence the output in the bottom-right corner of the image, but not vice versa. The convolutional layer can use filter kernels for each of the four image corners.
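Starting the scan from a different corner amounts to flipping the layer input, running the same top-left recursion, and flipping the result back. The sketch uses a hypothetical single-weight recursion to keep the direction dependence visible; the function names are illustrative.

```python
import numpy as np


def causal_scan(u, a):
    """Hypothetical recursion from the top-left: h[x, y] = u[x, y] + a * (h[x-1, y] + h[x, y-1])."""
    h = np.zeros_like(u)
    for x in range(u.shape[0]):
        for y in range(u.shape[1]):
            h[x, y] = u[x, y]
            if x > 0:
                h[x, y] += a * h[x - 1, y]
            if y > 0:
                h[x, y] += a * h[x, y - 1]
    return h


def scan_from_corner(u, a, corner):
    """Run the top-left recursion from any corner by flipping input and result."""
    axes = {"top_left": (), "top_right": (1,), "bottom_left": (0,), "bottom_right": (0, 1)}[corner]
    if not axes:
        return causal_scan(u, a)
    return np.flip(causal_scan(np.flip(u, axis=axes), a), axis=axes)


u = np.zeros((6, 6))
u[0, 0] = 1.0                                               # information in the top-left corner
from_top_left = scan_from_corner(u, 0.5, "top_left")         # reaches every pixel
from_top_right = scan_from_corner(u, 0.5, "top_right")       # reaches only the first column
from_bottom_right = scan_from_corner(u, 0.5, "bottom_right") # stays at the corner
```

Stacking the four results as separate feature maps is one way to let every output pixel depend on the whole layer input.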

The neural network can be trained using the known methods for training recurrent neural networks, i.e., by unrolling the filter kernels over the layer input. Particularly for high-resolution images, this unrolling significantly increases the effective depth of the neural network. As a result, the gradients used in training can become either much too large or much too small (exploding or vanishing gradient problem). This can mean that layers at the beginning of the neural network cannot be trained correctly. To avoid this, the weights of the filter kernel are trained in the spatial frequency domain and then used for the recurrent execution.
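The effect of unrolling can be seen in one dimension: for a hypothetical recursion h[n] = b·u[n] + a·h[n-1] along a row of N pixels, the chain rule gives d h[N-1] / d u[0] = b·a^(N-1). The sketch backpropagates through the unrolled steps explicitly; the row length and weights are illustrative.

```python
def grad_last_wrt_first(a: float, b: float, n_steps: int) -> float:
    """d h[n_steps - 1] / d u[0] for the unrolled recursion h[n] = b * u[n] + a * h[n - 1]."""
    grad = 1.0
    for _ in range(n_steps - 1):
        grad *= a          # chain rule through one unrolled step: d h[n] / d h[n-1] = a
    return grad * b        # last link: d h[0] / d u[0] = b


vanishing = grad_last_wrt_first(a=0.9, b=1.0, n_steps=1000)   # about 2e-46
exploding = grad_last_wrt_first(a=1.1, b=1.0, n_steps=1000)   # about 2e41
```

A row of 1000 pixels already behaves like a 1000-layer network, so even feedback weights close to 1 make the gradient vanish or explode.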

The difference equation is transformed into the spectral domain using the Z-transform:

$H_{z} = \frac{B \cdot Z}{A \cdot Z} U_{z},$

$Z = \left( z_{x}^{0} \cdot z_{y}^{0\,T},\; z_{x}^{0} \cdot z_{y}^{-1\,T},\; z_{x}^{0} \cdot z_{y}^{-2\,T},\; \dots,\; z_{x}^{-2} \cdot z_{y}^{-2\,T} \right),$

$z_{x} = e^{j \frac{2 \pi}{N_{x}} x},$

$z_{y} = e^{j \frac{2 \pi}{N_{y}} y},$

$x = (0, 1, \dots, N_{x} - 1)^{T},$

$y = (0, 1, \dots, N_{y} - 1)^{T},$

where N_x and N_y denote the number of elements of the layer input along the width and height, the exponentials and powers are taken element-wise, and Z is a vector each of whose elements is a matrix of discrete spatial frequencies. First, the scalar products in the numerator and denominator are computed, and the resulting numerator and denominator matrices are divided element by element. The result is then multiplied element by element by the transformed layer input U_z. The layer input is preferably transformed with a fast two-dimensional Fourier transform.
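Following the steps just described (scalar products, element-wise division, element-wise multiplication with U_z), a NumPy sketch might look like this. It assumes that A is given in denominator form with A[0, 0] = 1 (the text does not fix this convention) and that B[m, n] and A[m, n] belong to the offset (-m, -n); the helper names are hypothetical. Because the entries of Z are exactly the DFT kernels, B·Z coincides with a zero-padded 2-D FFT of B, which is a cheaper way to compute it.

```python
import numpy as np


def z_vector(Nx: int, Ny: int, K: int = 3) -> np.ndarray:
    """Stack of the matrices z_x^{-m} . z_y^{-n T} for m, n = 0 .. K-1 (shape K*K x Nx x Ny)."""
    z_x = np.exp(1j * 2 * np.pi / Nx * np.arange(Nx))
    z_y = np.exp(1j * 2 * np.pi / Ny * np.arange(Ny))
    return np.stack([np.outer(z_x ** (-m), z_y ** (-n)) for m in range(K) for n in range(K)])


def layer_output_spectrum(u: np.ndarray, B: np.ndarray, A: np.ndarray) -> np.ndarray:
    """H_z = (B . Z) / (A . Z) * U_z, with division and product taken element-wise."""
    Z = z_vector(*u.shape, K=B.shape[0])
    numerator = np.tensordot(B.ravel(), Z, axes=1)     # scalar product B . Z -> Nx x Ny matrix
    denominator = np.tensordot(A.ravel(), Z, axes=1)   # scalar product A . Z
    U = np.fft.fft2(u)                                 # transformed layer input U_z
    return numerator / denominator * U


rng = np.random.default_rng(2)
u = rng.random((16, 12))
B = rng.uniform(-0.5, 0.5, size=(3, 3))
A = np.array([[1.0, -0.3, 0.0], [-0.3, -0.1, 0.0], [0.0, 0.0, 0.0]])  # assumed A[0, 0] = 1
H = layer_output_spectrum(u, B, A)
h = np.fft.ifft2(H).real                               # back to the spatial domain
```

The result h satisfies the difference equation with periodic (wrap-around) boundaries, which is what the discrete Fourier transform implies.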

The layer output H is preferably transformed back to the spatial domain with a fast inverse two-dimensional Fourier transform and passed to a non-linear activation function. Further layers of the neural network can then determine further layer outputs based on the output of the non-linear activation function. If the layer is the last layer of the neural network, the layer output after the inverse transformation can also serve directly as the output signal (y_i) of the neural network.

The desired output signal (t_i) and the determined output signal (y_i) are transmitted to a modification unit (180).

Based on the desired output signal (t_i) and the determined output signal (y_i), the modification unit (180) then determines new parameters (Φ′) for the classifier (60). For this purpose, the modification unit (180) compares the desired output signal (t_i) and the determined output signal (y_i) using a loss function. The loss function determines a first loss value that characterizes how far the determined output signal (y_i) deviates from the desired output signal (t_i). In the exemplary embodiment, the negative log-likelihood function is chosen as the loss function. Other loss functions are also conceivable in alternative exemplary embodiments.
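Assuming the network outputs unnormalised class scores (logits) that are turned into probabilities with a softmax, which the text does not specify, the negative log-likelihood over a batch can be computed as follows; the function name is hypothetical.

```python
import numpy as np


def negative_log_likelihood(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean negative log-likelihood of integer class targets under a softmax over axis 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)        # avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


uniform = negative_log_likelihood(np.zeros((2, 4)), np.array([0, 3]))            # log(4), about 1.386
confident = negative_log_likelihood(np.array([[10.0, 0.0, 0.0]]), np.array([0])) # close to 0
```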

It is also conceivable that the determined output signal (y_i) and the desired output signal (t_i) each comprise a plurality of sub-signals, for example in the form of tensors, where each sub-signal of the desired output signal (t_i) corresponds to a sub-signal of the determined output signal (y_i). For example, the neural network (60) may be designed for object detection. In that case, a first sub-signal characterizes the probability that an object occurs in a part of the input signal (x_i), and a second sub-signal characterizes the exact position of the object. If the determined output signal (y_i) and the desired output signal (t_i) comprise a plurality of corresponding sub-signals, a second loss value is preferably determined for each pair of corresponding sub-signals by means of a suitable loss function. The determined second loss values are then combined into the first loss value in a suitable manner, for example via a weighted sum.
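The weighted combination of second loss values into the first loss value can be sketched as follows. The choice of per-sub-signal losses is a hypothetical illustration matching the object-detection example. Binary cross-entropy is used for the occurrence probability and an L1 loss for the position. The weights are also hypothetical; the document only states that a suitable loss function and, for example, a weighted sum are used.

```python
import numpy as np

def binary_cross_entropy(p, t, eps=1e-12):
    """Hypothetical loss for the occurrence-probability sub-signal."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

def l1_loss(pred, target):
    """Hypothetical loss for the position sub-signal."""
    return float(np.mean(np.abs(pred - target)))

def combined_loss(y_subs, t_subs, loss_fns, weights):
    """Second loss value per sub-signal pair, merged into a first loss value by weighted sum."""
    second_losses = [fn(y, t) for fn, y, t in zip(loss_fns, y_subs, t_subs)]
    first_loss = sum(w * l for w, l in zip(weights, second_losses))
    return first_loss, second_losses

y_subs = [np.array([0.5, 0.5]), np.array([[1.0, 2.0]])]  # predicted probability, position
t_subs = [np.array([1.0, 0.0]), np.array([[1.0, 3.0]])]  # desired probability, position
total, parts = combined_loss(y_subs, t_subs, [binary_cross_entropy, l1_loss], weights=(1.0, 2.0))
```

Here the two second loss values are log(2) and 0.5, so the first loss value is log(2) + 2 · 0.5.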

The modification unit (180) determines the new parameters (Φ') based on the first loss value. In the exemplary embodiment, this is done using a gradient descent method, preferably stochastic gradient descent, Adam, or AdamW.
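As an illustration of one of the named optimizers, a single AdamW update can be sketched in NumPy. It follows the decoupled weight-decay formulation used by common deep-learning frameworks. The hyperparameter defaults and the `state` dictionary layout are assumptions for this sketch, not values from the document.

```python
import numpy as np

def init_state(theta):
    """Optimizer state: step counter plus first/second moment estimates."""
    return {"t": 0, "m": np.zeros_like(theta), "v": np.zeros_like(theta)}

def adamw_step(theta, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
    """Return new parameters Φ' from parameters Φ (theta) and the loss gradient."""
    b1, b2 = betas
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad          # running mean of gradients
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2     # running mean of squared gradients
    m_hat = state["m"] / (1 - b1 ** state["t"])             # bias corrections
    v_hat = state["v"] / (1 - b2 ** state["t"])
    theta = theta * (1 - lr * weight_decay)                 # decoupled weight decay
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

# Usage: one step on the toy loss f(θ) = θ², whose gradient is 2θ
theta = np.array([1.0])
theta_new = adamw_step(theta, 2 * theta, init_state(theta), lr=0.1, weight_decay=0.0)
theta_wd = adamw_step(theta, 2 * theta, init_state(theta), lr=0.1, weight_decay=0.01)
```

On the first step, the bias-corrected ratio m̂/√v̂ equals the sign of the gradient. The parameter therefore moves by roughly `lr`, here from 1.0 to about 0.9.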

The determined new parameters (Φ') are stored in a model parameter memory (St₁). The determined new parameters (Φ') are preferably provided as the parameters (Φ) of the neural network (60).

In further preferred exemplary embodiments, the described training is repeated iteratively for a predefined number of iteration steps, or until the first loss value falls below a predefined threshold value. Alternatively or additionally, the training can be ended when an average first loss value on a test or validation data set falls below a predefined threshold value. In at least one of the iterations, the new parameters (Φ') determined in a previous iteration are used as the parameters (Φ) of the neural network (60).
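The three stopping criteria can be combined in one training loop, sketched below. The names `step_fn` and `val_loss_fn` are hypothetical placeholders for one training iteration and for the average loss on a validation set. The toy problem, fitting a scalar to 3.0 by gradient descent, is only there to make the loop executable.

```python
def train(params, step_fn, max_iters, loss_threshold=None, val_loss_fn=None, val_threshold=None):
    """Repeat training steps until an iteration budget or a loss threshold is reached."""
    history = []
    for _ in range(max_iters):                       # criterion 1: fixed number of iterations
        params, loss = step_fn(params)               # Φ' of this step becomes Φ of the next
        history.append(loss)
        if loss_threshold is not None and loss < loss_threshold:
            break                                    # criterion 2: training loss below threshold
        if val_loss_fn is not None and val_threshold is not None \
                and val_loss_fn(params) < val_threshold:
            break                                    # criterion 3: validation loss below threshold
    return params, history

TARGET = 3.0

def toy_step(w, lr=0.1):
    """Hypothetical training step: one gradient descent step on (w - TARGET)^2."""
    loss = (w - TARGET) ** 2
    grad = 2 * (w - TARGET)
    return w - lr * grad, loss

w, hist = train(0.0, toy_step, max_iters=1000, loss_threshold=1e-6)
```

Each criterion is optional, so the same loop covers "fixed number of iterations", "stop on training loss", and "stop on validation loss", individually or together.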

Furthermore, the training system (140) can comprise at least one processor (145) and at least one machine-readable storage medium (146) containing instructions which, when executed by the processor (145), cause the training system (140) to carry out a training method according to one of the aspects of the invention.

Figure 2 shows an actuator (10) in its environment (20) interacting with a control system (40). At preferably regular time intervals, the environment (20) is captured by a sensor (30), in particular an imaging sensor such as a camera sensor. The sensor can also consist of a plurality of sensors, for example a stereo camera. The sensor signal (S) of the sensor (30), or one sensor signal (S) per sensor in the case of several sensors, is transmitted to the control system (40). The control system (40) thus receives a sequence of sensor signals (S). From these, the control system (40) determines control signals (A), which are transmitted to the actuator (10).

The control system (40) receives the sequence of sensor signals (S) from the sensor (30) in an optional receiving unit (50), which converts the sequence of sensor signals (S) into a sequence of input signals (x). Alternatively, each sensor signal (S) can also be adopted directly as the input signal (x). The input signal (x) can, for example, be a section of the sensor signal (S) or the result of further processing of it. In other words, the input signal (x) is determined as a function of the sensor signal (S). The sequence of input signals (x) is fed to the neural network (60).

The neural network (60) is preferably parameterized by parameters (Φ) that are stored in and provided by a parameter memory (P).

The neural network (60) determines output signals (y) from the input signals (x). The output signals (y) are fed to an optional conversion unit (80). The conversion unit uses them to determine control signals (A), which are fed to the actuator (10) in order to control it accordingly.
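The signal chain S → x → y → A of the control system (40) can be sketched as a small composition of callables. The class and argument names are hypothetical. The sketch assumes that a missing receiving unit (50) passes S through unchanged, as the text allows, and that a missing conversion unit (80) passes y through as A, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

import numpy as np

@dataclass
class ControlSystem:
    network: Callable[[Any], Any]                           # neural network (60): x -> y
    receiving_unit: Optional[Callable[[Any], Any]] = None   # optional unit (50): S -> x
    conversion_unit: Optional[Callable[[Any], Any]] = None  # optional unit (80): y -> A

    def step(self, sensor_signal: Any) -> Any:
        """Turn one sensor signal S into one control signal A."""
        x = self.receiving_unit(sensor_signal) if self.receiving_unit else sensor_signal
        y = self.network(x)
        return self.conversion_unit(y) if self.conversion_unit else y

# Usage with stand-in components: crop a section of S, score it, threshold into A
signals = [np.full((4, 4), v) for v in (0.2, 0.9)]
cs = ControlSystem(
    network=lambda x: float(x.mean()),        # stand-in for the classifier
    receiving_unit=lambda s: s[1:3, 1:3],     # section of the sensor signal
    conversion_unit=lambda y: int(y > 0.5),   # hypothetical mapping to a control signal
)
controls = [cs.step(s) for s in signals]
```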

The actuator (10) receives the control signals (A), is controlled accordingly, and carries out a corresponding action. The actuator (10) can include control logic, which need not be structurally integrated. This logic determines a second control signal from the control signal (A), and the actuator (10) is then controlled with this second signal.

In further embodiments, the control system (40) includes the sensor (30). In still other embodiments, the control system (40) alternatively or additionally includes the actuator (10).

In further preferred embodiments, the control system (40) comprises at least one processor (45) and at least one machine-readable storage medium (46) on which instructions are stored. When executed on the at least one processor (45), these instructions cause the control system (40) to carry out the method according to the invention.

In alternative embodiments, a display unit (10a) is provided as an alternative or in addition to the actuator (10).

Figure 3 shows how the control system (40) can be used to control an at least partially autonomous robot, here an at least partially autonomous motor vehicle (100).

The sensor (30) can be, for example, a video sensor that is preferably arranged in the motor vehicle (100). In this case, the input signals (x) can be understood as input images and the neural network (60) as an image classifier.

The image classifier (60) is configured to identify objects that are recognizable in the input images (x).

The actuator (10), which is preferably arranged in the motor vehicle (100), can be, for example, a brake, a drive, or the steering system of the motor vehicle (100). The control signal (A) can then be determined such that the actuator or actuators (10) are controlled so that the motor vehicle (100) avoids, for example, a collision with the objects identified by the image classifier (60). This applies in particular if these are objects of certain classes, e.g. pedestrians.

Alternatively or additionally, the display unit (10a) can be controlled with the control signal (A), for example to display the identified objects. It is also conceivable that the display unit (10a) is controlled with the control signal (A) such that it emits an optical or acoustic warning signal if it is determined that the motor vehicle (100) is about to collide with one of the identified objects. The warning can also be given as a haptic warning signal, for example via a vibration of the steering wheel of the motor vehicle (100).

Alternatively, the at least partially autonomous robot can be another mobile robot (not shown), for example one that moves by flying, swimming, diving, or walking. The mobile robot can also be, for example, an at least partially autonomous lawn mower or an at least partially autonomous cleaning robot. In these cases too, the control signal (A) can be determined such that the drive and/or steering of the mobile robot are controlled so that the robot avoids, for example, a collision with objects identified by the image classifier (60).

Figure 4 shows an exemplary embodiment in which the control system (40) is used to control a production machine (11) of a production system (200) by controlling an actuator (10) that drives the production machine (11). The production machine (11) can be, for example, a machine for punching, sawing, drilling, and/or cutting. It is also conceivable that the production machine (11) is designed to grip a manufactured product (12a, 12b) by means of a gripper.

The sensor (30) can then be, for example, a video sensor that captures the conveying surface of a conveyor belt (13), on which manufactured products (12a, 12b) may be located. In this case, the input signals (x) are input images (x) and the neural network (60) is an image classifier. The image classifier (60) can be configured, for example, to determine the position of the manufactured products (12a, 12b) on the conveyor belt. The actuator (10) controlling the production machine (11) can then be controlled depending on the determined positions of the manufactured products (12a, 12b). For example, the actuator (10) can be controlled such that it punches, saws, drills, and/or cuts a manufactured product (12a, 12b) at a predetermined point on that product.

It is also conceivable that the image classifier (60) is designed to determine further properties of a manufactured product (12a, 12b) as an alternative or in addition to its position. In particular, the image classifier (60) may determine whether a manufactured product (12a, 12b) is defective and/or damaged. In this case, the actuator (10) can be controlled such that the production machine (11) sorts out a defective and/or damaged product (12a, 12b).

Figure 5 shows an embodiment in which the control system (40) is used to control an access system (300). The access system (300) can include a physical access control, for example a door (401). The sensor (30) can in particular be a video sensor or thermal imaging sensor configured to capture an area in front of the door (401). The neural network (60) can therefore be understood as an image classifier. A captured image can be interpreted by means of the image classifier (60). In particular, the image classifier (60) can detect persons in an input image (x) transmitted to it. If several persons have been detected at the same time, their identity can be determined particularly reliably by relating the persons (i.e. the objects) to one another, for example by analyzing their movements.

The actuator (10) can be a lock that, depending on the control signal (A), releases the access control or not, for example opens the door (401) or not. For this purpose, the control signal (A) can be selected depending on the output signal (y) that the image classifier (60) determines for the input image (x). For example, the output signal (y) may include information characterizing the identity of a person detected by the image classifier (60), and the control signal (A) may be selected based on that identity.

A logical access control can also be provided instead of the physical access control.

Figure 6 shows an embodiment in which the control system (40) is used to control a monitoring system (400). This embodiment differs from the embodiment shown in Figure 4 in that the display unit (10a), controlled by the control system (40), is provided instead of the actuator (10). For example, the sensor (30) can record an input image (x) in which at least one person can be recognized, and the position of the at least one person can be detected by means of the image classifier (60). The input image (x) can then be shown on the display unit (10a), with the detected persons highlighted in color.

Figure 7 shows an embodiment in which the control system (40) is used to control a personal assistant (250). The sensor (30) is preferably an optical sensor that receives images of a gesture made by a user (249), for example a video sensor or a thermal imaging camera. In this case, the neural network (60) is an image classifier.

Depending on the signals from the sensor (30), the control system (40) determines a control signal (A) for the personal assistant (250), for example by having the image classifier (60) perform gesture recognition. This control signal (A) is then transmitted to the personal assistant (250), which is controlled accordingly. The control signal (A) can in particular be chosen such that it corresponds to a presumed desired control by the user (249). This presumed desired control can be determined depending on the gesture recognized by the image classifier (60). Depending on the presumed desired control, the control system (40) can then select the control signal (A) for transmission to the personal assistant (250), and/or select the control signal (A) for transmission to the personal assistant (250) in accordance with the presumed desired control.

This control can include, for example, the personal assistant (250) retrieving information from a database and presenting it in a form the user (249) can perceive.

Instead of the personal assistant (250), a household appliance (not shown) can also be provided and controlled accordingly, in particular a washing machine, a stove, an oven, a microwave, or a dishwasher.

Figure 8 shows an embodiment in which the control system (40) is used to control a medical imaging system (500), for example an MRI, X-ray, or ultrasound device. The sensor (30) can be, for example, an imaging sensor. The neural network (60) can therefore be understood as an image classifier. The display unit (10a) is controlled by the control system (40).

The sensor (30) is configured to capture an image of a patient, for example an X-ray image, an MRI image, or an ultrasound image. At least part of the image is passed to the image classifier (60) as the input image (x). The image classifier (60) can be configured, for example, to classify different types of tissue visible in the input image (x), for example by means of semantic segmentation.

The control signal (A) can then be chosen such that the determined types of tissue are highlighted in color on the display unit (10a).
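Highlighting segmented tissue types in color can be sketched as an alpha blend of a per-pixel class map onto a grayscale image. This is a hypothetical illustration. The document does not specify the display method, the color palette, the blending factor, or that class 0 means "background"; all of these are assumptions here. The image is assumed to be normalized to [0, 1].

```python
import numpy as np

def highlight_classes(image, class_map, palette, alpha=0.5):
    """Blend class colors into a grayscale image; class 0 is treated as background.

    image: (H, W) floats in [0, 1]; class_map: (H, W) ints; palette: (num_classes, 3) RGB.
    """
    rgb = np.repeat(image[..., None], 3, axis=-1).astype(float)  # gray -> RGB
    colors = palette[class_map]                                  # (H, W, 3) color per pixel
    mask = class_map > 0                                         # only highlight tissue classes
    rgb[mask] = (1 - alpha) * rgb[mask] + alpha * colors[mask]
    return rgb

image = np.zeros((2, 2))
class_map = np.array([[0, 1], [2, 0]])
palette = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # hypothetical colors
overlay = highlight_classes(image, class_map, palette)
```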

In further exemplary embodiments (not shown), the imaging system (500) can also be used for non-medical purposes, for example to determine material properties of a workpiece. For this purpose, the imaging system (500) can record an image of a workpiece. In this case, the image classifier (60) can be configured to accept at least part of the image as an input image (x) and to classify it with respect to the material properties of the workpiece. This can be done, for example, by semantic segmentation of the input image (x). The resulting classification can, for example, be shown together with the input image on the display unit (10a).

Figure 9 shows an embodiment in which the control system (40) controls a medical analysis device (600). A microarray (601) is fed to the analysis device (600). The microarray comprises a plurality of test fields (602) that have been coated with a sample. The sample can, for example, come from a swab taken from a patient.

The microarray (601) can be a DNA microarray or a protein microarray.

The sensor (30) is configured to record the microarray (601). In particular, an optical sensor, preferably a video sensor, can be used as the sensor (30). The neural network (60) can therefore be understood as an image classifier.

The image classifier (60) is configured to determine the result of an analysis of the sample based on an image of the microarray (601). In particular, the image classifier can be configured to classify, based on the image, whether the microarray indicates the presence of a virus in the sample.

The control signal (A) can then be chosen such that the result of the classification is shown on the display unit (10a).

The term "computer" covers any device for processing predefinable calculation rules. These calculation rules can be in the form of software, hardware, or a mixture of software and hardware.

In general, a plurality can be understood as indexed, i.e. each element of the plurality is assigned a unique index, preferably by assigning consecutive integers to the elements contained in the plurality. Preferably, if a plurality comprises N elements, where N is the number of elements in the plurality, the integers 1 to N are assigned to the elements.
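The indexing convention can be illustrated with a minimal Python sketch. The helper name `index_plurality` and the example labels are hypothetical; the code only shows one way of assigning the consecutive integers 1 to N to the N elements of a plurality.

```python
from typing import Dict, Sequence, TypeVar

T = TypeVar("T")


def index_plurality(elements: Sequence[T]) -> Dict[int, T]:
    """Assign each element a unique index, using consecutive integers 1..N in order."""
    return {index: element for index, element in enumerate(elements, start=1)}


test_fields = ["field_a", "field_b", "field_c"]  # hypothetical labels for test fields (602)
print(index_plurality(test_fields))  # prints {1: 'field_a', 2: 'field_b', 3: 'field_c'}
```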

Claims

A computer-implemented method for training a neural network, the neural network being designed to carry out a classification and/or a regression based on an input image, and the neural network carrying out the classification and/or the regression based on a layer of the neural network, the method comprising the steps:
• determining a layer input of the layer based on a training input image;
• determining a plurality of output pixels by means of the layer, wherein an output pixel is determined based on at least one input pixel of the layer input and at least one further output pixel;
• determining a layer output based on the plurality of output pixels;
• determining a classification and/or a regression value based on the layer output;
• training the neural network based on a deviation of the determined classification and/or the determined regression value from a desired classification and/or a desired regression value of the training input image.
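The steps of this claim can be sketched in plain Python under several assumptions that are not taken from the patent: the "further output pixel" is taken to be the already computed output pixel to the left in the same row, the layer has only two scalar weights `a` and `b`, the classification is a logistic output on the mean of the layer output, and gradients are estimated by central finite differences instead of backpropagation to keep the example short. All function names and the toy images are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]


def recurrent_layer(u: Image, a: float, b: float) -> Image:
    """Output pixel = b * input pixel + a * output pixel to its left (assumed neighbour)."""
    h: Image = []
    for row in u:
        out_row: List[float] = []
        left = 0.0  # no left neighbour at the start of a row
        for pixel in row:
            left = b * pixel + a * left
            out_row.append(left)
        h.append(out_row)
    return h


def predict(params: Sequence[float], image: Image) -> float:
    """Probability of class 1: logistic function of the mean layer output."""
    a, b, w, c = params
    h = recurrent_layer(image, a, b)
    mean = sum(map(sum, h)) / (len(h) * len(h[0]))
    return 1.0 / (1.0 + math.exp(-(w * mean + c)))


def loss(params: Sequence[float], data: Sequence[Tuple[Image, int]]) -> float:
    """Mean binary cross-entropy between predicted and desired classification."""
    eps = 1e-12
    total = 0.0
    for image, label in data:
        p = predict(params, image)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(data)


def train_step(params: Sequence[float], data: Sequence[Tuple[Image, int]],
               lr: float = 0.5, delta: float = 1e-5) -> List[float]:
    """One gradient-descent step using central finite-difference gradients."""
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += delta
        minus[i] -= delta
        grads.append((loss(plus, data) - loss(minus, data)) / (2 * delta))
    return [p - lr * g for p, g in zip(params, grads)]


bright = [[1.0, 1.0], [1.0, 1.0]]  # toy training image, desired class 1
dark = [[0.0, 0.0], [0.0, 0.0]]    # toy training image, desired class 0
data = [(bright, 1), (dark, 0)]

params = [0.5, 0.5, 0.0, 0.0]  # a, b, w, c
before = loss(params, data)    # about ln(2): both predictions start at 0.5
for _ in range(200):
    params = train_step(params, data)
after = loss(params, data)
print(f"loss {before:.3f} -> {after:.3f}")
```

Because each output pixel depends on its left neighbour, the first pixel of a row sees only its own input, while later pixels accumulate information along the row. This dependence on other output pixels is what distinguishes such a layer from an ordinary convolution, whose outputs depend only on input pixels.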

The method according to claim 1, wherein a non-transformed processing rule, according to which the layer determines the output pixel based on the at least one input pixel and the at least one further output pixel, is transformed by means of a Z-transformation, and the layer output is trained based on the transformed processing rule.

The method according to claim 2, wherein the layer input is determined based on a two-dimensional Fourier transform.
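To make the two-dimensional Fourier transform in this claim concrete, here is a naive Python sketch of the 2-D discrete Fourier transform that could produce a frequency-domain layer input. Applying it directly to the image is an assumption for illustration, and the function name `dft2` is hypothetical; a practical implementation would use an FFT routine such as `numpy.fft.fft2` rather than this O((MN)²) loop.

```python
import cmath
from typing import List, Sequence


def dft2(image: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """Naive 2-D DFT: F[k1][k2] = sum over (n1, n2) of f[n1][n2] * exp(-2πi (k1 n1 / M + k2 n2 / N))."""
    rows, cols = len(image), len(image[0])
    spectrum: List[List[complex]] = []
    for k1 in range(rows):
        spectrum_row: List[complex] = []
        for k2 in range(cols):
            acc = 0j
            for n1 in range(rows):
                for n2 in range(cols):
                    phase = -2j * cmath.pi * (k1 * n1 / rows + k2 * n2 / cols)
                    acc += image[n1][n2] * cmath.exp(phase)
            spectrum_row.append(acc)
        spectrum.append(spectrum_row)
    return spectrum


# A constant image puts all of its energy into the zero-frequency term F[0][0] = 4.
layer_input = dft2([[1.0, 1.0], [1.0, 1.0]])
```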

The method according to claim 2 or 3, wherein the non-transformed processing rule is characterized by the formula

h_{x,y} = B \cdot u_{x,y} + A \cdot h_{*},

where B and A are matrices of weights of the layer that can be adjusted during training, u_{x,y} is a vector of input pixels in a neighborhood of a position x,y of the layer input, and h_{*} is a vector of output pixels in a neighborhood of that position in the layer output.
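Evaluated at a single position, the processing rule of this claim amounts to two matrix-vector products and a sum. The Python sketch below represents B, A and the neighbourhood vectors as plain nested lists; the neighbourhood sizes, the example values and the function names are hypothetical, since the claim does not fix them.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]
Vector = Sequence[float]


def matvec(m: Matrix, v: Vector) -> List[float]:
    """Matrix-vector product of a row-major matrix and a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def processing_rule(B: Matrix, A: Matrix, u_xy: Vector, h_star: Vector) -> List[float]:
    """Non-transformed rule h_{x,y} = B · u_{x,y} + A · h_* at one position x,y."""
    return [bu + ah for bu, ah in zip(matvec(B, u_xy), matvec(A, h_star))]


B = [[1.0, 0.0], [0.0, 2.0]]   # hypothetical input weights
A = [[0.5, 0.0], [0.0, 0.5]]   # hypothetical weights on neighbouring output pixels
print(processing_rule(B, A, [3.0, 4.0], [2.0, 2.0]))  # prints [4.0, 9.0]
```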

The method according to claim 4, wherein the Z-transformation of the processing rule is given by the formula

H_{z} = \frac{B \cdot Z}{A \cdot Z} u_{z}
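For intuition only, consider a textbook one-dimensional special case of the rule from claim 4, which is not necessarily the claimed formula: B and A are scalars, and h_{*} is assumed to be the single preceding output pixel along x. Applying the unilateral Z-transformation along x with a zero initial condition then gives a first-order recursive filter:

```latex
\begin{aligned}
h_x  &= B\,u_x + A\,h_{x-1} \\
H(z) &= B\,U(z) + A\,z^{-1} H(z) \\
H(z) &= \frac{B}{1 - A\,z^{-1}}\,U(z) = \frac{B\,z}{z - A}\,U(z)
\end{aligned}
```

In this special case the pixel-by-pixel recursion is fully described by a rational transfer function with a single pole at z = A, which is stable for |A| < 1.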

The method according to any one of the preceding claims, wherein the layer output is transformed by means of an inverse two-dimensional Fourier transform, the transformed layer output is fed to an activation function, and the classification and/or the regression value is determined based on an output of the activation function.
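The processing chain of this claim (inverse two-dimensional Fourier transform, then activation function, then classification) can be sketched as follows. Taking the real part before a ReLU activation, and classifying by comparing the mean activation with a threshold, are assumptions made only for illustration; the claim specifies neither the activation function nor the classification head, and the function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def idft2(spectrum: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """Naive inverse 2-D DFT with 1/(M*N) normalisation."""
    rows, cols = len(spectrum), len(spectrum[0])
    image: List[List[complex]] = []
    for n1 in range(rows):
        image_row: List[complex] = []
        for n2 in range(cols):
            acc = 0j
            for k1 in range(rows):
                for k2 in range(cols):
                    phase = 2j * cmath.pi * (k1 * n1 / rows + k2 * n2 / cols)
                    acc += spectrum[k1][k2] * cmath.exp(phase)
            image_row.append(acc / (rows * cols))
        image.append(image_row)
    return image


def relu_real(image: Sequence[Sequence[complex]]) -> List[List[float]]:
    """Assumed activation: ReLU applied to the real part of each pixel."""
    return [[max(0.0, pixel.real) for pixel in row] for row in image]


def classify(activation: Sequence[Sequence[float]], threshold: float = 0.5) -> int:
    """Hypothetical head: class 1 if the mean activation exceeds the threshold, else 0."""
    values = [v for row in activation for v in row]
    return int(sum(values) / len(values) > threshold)


layer_output = [[4.0 + 0j, 0j], [0j, 0j]]  # only a zero-frequency component
spatial = idft2(layer_output)               # every pixel becomes 1.0
print(classify(relu_real(spatial)))         # prints 1
```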

A computer-implemented method for classifying an image and/or for determining a regression value with respect to an image, comprising the steps:
• training the neural network according to any one of the preceding claims;
• determining a classification and/or the regression value based on the image by means of the classifier.

The method according to claim 7, wherein an actuator (10) is controlled based on the classification and/or the regression value.

A neural network according to any one of claims 1 to 6.

A training device (140) configured to carry out the method according to any one of claims 1 to 6.

A control device (40) configured to carry out the method according to claim 7 or 8.

A computer program configured to carry out the method according to any one of claims 1 to 8 when it is executed by a processor (45, 145).

A machine-readable storage medium (46, 146) on which the computer program according to claim 12 is stored.