DE102020209651A1

DE102020209651A1 - METHOD OF IMAGE PROCESSING BY A NEURAL NETWORK

Info

Publication number: DE102020209651A1
Application number: DE102020209651.1A
Authority: DE
Inventors: Pauline Lux; Annika Baumann
Original assignee: Basler AG
Current assignee: Basler AG
Priority date: 2020-07-30
Filing date: 2020-07-30
Publication date: 2022-02-03

Abstract

The present invention relates to a computer-implemented method in an artificial neural network (ANN), the ANN comprising at least one convolutional layer with at least one filter kernel, the method comprising the steps of: rotating each filter kernel by at least one angle from among 90, 180 and 270 degrees, whereby a rotated filter kernel is generated for each filter kernel and for each angle used for rotating, the generated rotated filter kernels forming a kernel group together with the respective filter kernel; filtering an image with each filter kernel of a kernel group, whereby a feature map is generated for each filter kernel of the kernel group; and combining all generated feature maps into an overall feature map. A corresponding device and a computer-readable medium are also proposed.

Description

The present invention relates to a computer-implemented method for processing an image in a convolutional neural network (CNN), which is a type of artificial neural network (ANN). The terms CNN, ANN and NN are used interchangeably in this document.

Convolutional neural networks (CNNs) are known, e.g. from "Object Recognition with Gradient-Based Learning" by Yann LeCun et al. A CNN automatically extracts features from the images. Within a convolutional layer (also abbreviated CL or conv layer), the input is processed in the same way by at least one filter kernel. This is also called parameter sharing and contributes to the translation equivariance of the convolutional layers. The automated feature extraction substantially increases the efficiency of a CNN, for example in image classification. For this reason, CNNs are also particularly well suited to processing two-dimensional images.

Owing to technological progress, so-called artificial intelligence (AI), and in particular machine learning (ML) and deep learning (DL), is being applied to increasingly complex and nonlinear problems in a wide range of fields of science and technology.

As a result, the use of deep learning algorithms to automate manual image processing tasks such as classification, localization and segmentation is increasing rapidly, for example in ophthalmology, dermatology and microscopy, i.e. in diagnostic 2D imaging, as well as in medical diagnostics and genetics. One important application is cell classification. However, the invention is not limited to this and can be applied to image data of any kind.

Solving an image processing task with artificial intelligence usually comprises data acquisition, training and evaluation of the model. The term data acquisition is used here in a general sense for obtaining the data needed for training. Data acquisition can also include operations that prepare or optimize this data for training. One example of such an operation is data augmentation, which is used to enlarge the data set.

In general, motifs, i.e. objects, living beings, symbols such as characters or numbers, or other content depicted in images, can be present in various transformations that are not decisive features for classifying the motif. In particular, these transformations can be shifts or rotations, i.e. translation and rotation.

Such transformation variations cannot be captured by standard architectures 200, as shown in Fig. 2, which use known layers such as convolutional layers (CL) 220a and 220b, MaxPooling 235a and 235b, and a fully connected layer (FCL) 260. Reference numeral 210 denotes the input image and reference numeral 270 the corresponding output. The feature maps of the various layers are denoted by reference numerals 230a and 240a as well as 230b and 240b.

In particular, in standard architectures the input image 210 is filtered by the filter kernels of a CL 220a, whereby the filter kernels generate a feature map 230a. Fig. 2 shows three filter kernels at 220a, which generate the feature map 230a. The five filter kernels of the CL 220b correspondingly generate the feature map 230b.

The feature map 230a or 230b is a 3D feature map consisting of the three or five 2D feature maps generated by the three or five filter kernels, respectively.

It is also known to use a MaxPool layer 235a, 235b to reduce the data, as shown in Fig. 2, for example following a convolutional layer 220a and 220b. Fig. 8 shows an illustration with the parameters kernel size = 2 and stride = 2. A filter kernel is moved across the image 810 with a given stride (step size), and the maximum value of each filter region is carried over into the output image 820. A stride ≥ 2 results in a dimension reduction. Because the Nyquist theorem is disregarded, the MaxPool layer cannot preserve translation or rotation equivariance, as is known.
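The MaxPool operation with kernel size = 2 and stride = 2 described above can be sketched in plain Python as follows. This is a minimal illustration only; the function name `max_pool2d` and the sample values are assumptions and do not come from the application.

```python
def max_pool2d(image, kernel_size=2, stride=2):
    """Slide a kernel_size x kernel_size window over the image and keep each window's maximum."""
    rows = (len(image) - kernel_size) // stride + 1
    cols = (len(image[0]) - kernel_size) // stride + 1
    return [
        [
            max(
                image[r * stride + i][c * stride + j]
                for i in range(kernel_size)
                for j in range(kernel_size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 4x4 input; with stride 2 each dimension is halved.
image = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 4, 8],
]
pooled = max_pool2d(image)
print(pooled)  # [[4, 5], [6, 8]]
```

With a stride of 2, the 4x4 input is reduced to a 2x2 output, which corresponds to the dimension reduction mentioned above.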

In Fig. 2, the feature map 230a is reduced by the MaxPool layer 235a to the feature map 240a.

Fig. 2 thus shows a convolution by the CL 220a. This is followed by an analogous convolution by a further CL 220b. The latter has five filter kernels, which generate a feature map 230b, which in turn is reduced by the MaxPool layer 235b to a reduced feature map 240b.

Accordingly, the reduced feature maps 240a and 240b are also 3D feature maps.

For further processing, the feature map, which represents a multidimensional output, is then converted into a one-dimensional vector. This operation is called "flattening", and its result is shown as vector 255.

Classification by the NN can be made more robust to geometric transformations of the objects if the NN is trained on such shifted or rotated motifs.

One approach to this is to generate such training data artificially, for example by geometric data augmentation methods, i.e. the existing training data is modified by translation and rotation, thereby producing new, additional training data.

However, since standard architectures as defined above are not invariant to rotations and translations, these variations must be covered by the training data whenever the application requires it. Because of the larger amount of training data, this leads to higher computational demand, a longer training time for the network and a larger required network capacity. For example, the network must capture all rotated variants of an object in its feature extraction.

In addition, geometric data augmentation offers only limited possibilities for varying the training data with respect to different transformations. Since the new data is generated synthetically, it may not reflect reality exactly, for example because of lossy rotations.

If, in addition, it is not known which transformation variations are present in the original data set, a data augmentation method can create an imbalance between different manifestations of a class.

A disadvantage of data augmentation methods is therefore that they offer only limited possibilities for varying the training data with respect to different transformations, and that they also increase the overall effort of the solution process. In particular, memory and computational requirements increase.

It is therefore the object of the invention to overcome the disadvantages of the prior art and to provide a resource-saving method.

The object is achieved by the subject matter of independent claim 1.

Accordingly, a computer-implemented method in an ANN, the ANN comprising at least one convolutional layer 320 with at least one filter kernel, comprises the following steps: rotating 110 each filter kernel by at least one angle from among 90, 180 and 270 degrees, whereby a rotated filter kernel is generated for each filter kernel and for each angle used for rotating, the generated rotated filter kernels forming a kernel group together with the respective filter kernel; filtering 120 an image with each filter of each kernel group, whereby a feature map is generated for each filter of each kernel group; and combining 130 all generated feature maps of a kernel group into an overall feature map.

The feature maps generated by each individual filter kernel are two-dimensional. The overall feature maps are likewise two-dimensional. However, the entirety of all overall feature maps can be three-dimensional if there is more than one original filter kernel and thus more than one kernel group.

Together, the 2D overall feature maps thus form a 3D overall feature map, at least when there is more than one kernel group.

As a result, all variants, i.e. rotations and shifts, can be captured by one and the same feature extraction.

A corresponding device and a computer-readable medium are also proposed.

Preferred embodiments of the invention are specified in the dependent claims. Further advantages, features and properties of the invention are explained in the following description of preferred embodiments with reference to the accompanying drawings, in which:

Fig. 1 shows a method according to an exemplary embodiment,
Fig. 2 shows an NN according to the prior art,
Fig. 3 shows an NN according to an exemplary embodiment,
Fig. 4 shows sample data from the MNIST data set,
Fig. 5 shows sample data of cell images,
Fig. 6 shows filter kernels according to an exemplary embodiment,
Fig. 7 shows the combining of the feature maps according to an exemplary embodiment,
Fig. 8 shows an example of a MaxPool layer according to the prior art,
Fig. 9 shows a MaxBlurPool layer, and
Fig. 10 shows modules for an NN according to an exemplary embodiment.

In the following, the same or similar reference numerals are used for the same or similar components or steps.

Microscopy data, as shown for example in Fig. 5, or other motifs, for example characters such as digits as shown in Fig. 4, are to be processed by an ANN, for example classified, regardless of their orientation and position in the image. Fig. 4 shows the so-called MNIST data set, which comes from the MNIST database (Modified National Institute of Standards and Technology database) and contains handwritten digits centered in the image. This data set is frequently used in research for training AI systems. It is clear that conventional NNs cannot readily perform classification, and thus image recognition, on rotated and/or shifted test data or digits.

The transformation variations mentioned above are not covered by conventional network architectures.

It is also difficult and time-consuming to capture all possible rotations and shifts when recording a data set. For these reasons, the transformation variations have to be covered by other means, for example data augmentation.

Data augmentation in general is used as a method for learning more robust models. It serves as a remedy for training with insufficient data sets. One possible data augmentation approach relies on geometric transformations. It artificially enlarges the training data set so that as many different manifestations of a class as possible are covered by different transformations. However, this increases the effort of data acquisition and thus of the entire AI-driven solution process.

The present invention therefore solves the problem in such a way that transformation variations no longer have to be represented in the data set. Instead, the network architecture is designed so that it does not require additional transformation variations in the data.

To this end, the architecture as a whole is to have rotation and translation invariance as a property. This means that the network is invariant to a shift or rotation of the depicted motif, i.e. the image can be classified independently of the shift or rotation.

Invariance here generally means that the exact location or orientation of a detected feature plays a lesser role, i.e. the exact position or orientation of a feature or object in an image does not lead to a different classification. In other words, whether a dog sits on the right or the left of an image does not change the classification "dog". Likewise, rotating the dog motif within the image does not change the classification either.

Equivariance allows the network to generalize the detection of edges, textures and shapes across different locations and orientations. In other words, the edges of an object in an image can be detected at different locations or orientations by similar computations.

The principle according to the invention is implemented, for example, by a computer-implemented method 100 in an ANN. This ANN comprises at least one convolutional layer (CL). Each CL contains at least one filter kernel.

The method 100 in Fig. 1 comprises the following steps.

In step 110, each filter kernel 610, 710 of the CL is rotated by at least one angle from among 90, 180 and 270 degrees. Depending on the intended use, one or two of the angles may be sufficient to meet the application requirements. All three angles can also be applied. Applying each angle to the original kernel generates one rotated filter kernel 620, 630, 640, 720, 730, 740 in each case. If, for example, only the 90 degree angle is applied, only one rotated filter kernel is generated. If the 90, 180 and 270 degree angles are all applied, three rotated filter kernels are generated for the original kernel. The generated rotated filter kernels form a kernel group together with the respective original filter kernel 610, 710.
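Step 110 can be sketched as follows in plain Python. This is a minimal sketch under stated assumptions: the names `rotate90` and `build_kernel_group` are hypothetical, kernels are represented as nested lists, and the rotation is taken to be counterclockwise, since the text does not specify a direction.

```python
def rotate90(kernel):
    """Rotate a 2D kernel (list of rows) by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*kernel)][::-1]


def build_kernel_group(kernel, angles=(90, 180, 270)):
    """Return the original kernel followed by one rotated copy per requested angle."""
    group = [kernel]
    for angle in angles:
        if angle not in (90, 180, 270):
            raise ValueError("angle must be 90, 180 or 270 degrees")
        rotated = kernel
        for _ in range(angle // 90):  # apply the 90 degree rotation repeatedly
            rotated = rotate90(rotated)
        group.append(rotated)
    return group


# Hypothetical 3x3 kernel, chosen so that each rotation is easy to recognize.
kernel = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
group = build_kernel_group(kernel)                       # original + 3 rotated kernels
small_group = build_kernel_group(kernel, angles=(90,))   # original + 1 rotated kernel
```

Because rotations by multiples of 90 degrees only permute the kernel entries, no interpolation is needed and the rotated kernels share the weights of the original kernel.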

In step 120, an image that is, for example, to be classified is filtered by each kernel group. In this way, each filter of a kernel group generates a feature map for the image. Put differently, one feature map is generated for each filter of a kernel group with respect to the image being filtered.
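The filtering of step 120 might look as follows for a single kernel group. This sketch assumes that filtering means an unpadded cross-correlation, as is common in CNN implementations; the text does not fix padding or stride. The helper names and sample values are hypothetical.

```python
def rotate90(matrix):
    """Rotate a 2D list by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*matrix)][::-1]


def correlate_valid(image, kernel):
    """Cross-correlate image with kernel without padding ('valid' region only)."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 1, 2],
    [3, 4, 5, 6],
]
kernel = [[1, 0], [0, -1]]

# Kernel group: the original kernel plus its 90, 180 and 270 degree rotations.
group = [kernel]
for _ in range(3):
    group.append(rotate90(group[-1]))

# One 2D feature map per kernel of the group.
feature_maps = [correlate_valid(image, k) for k in group]

# Filtering a rotated image yields the same maps, rotated and cyclically reordered.
rotated_maps = [correlate_valid(rotate90(image), k) for k in group]
same_up_to_order = all(
    rotated_maps[i] == rotate90(feature_maps[(i - 1) % 4]) for i in range(4)
)
```

The final check illustrates why a full kernel group is useful: rotating the input by 90 degrees does not produce new feature content, it only rotates the feature maps and shifts them by one position within the group.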

In step 130, the generated feature maps of a kernel group are combined into an overall feature map.

The generated feature maps can be combined by various operations.

For example, the feature maps can be added. In this case, the generated feature maps are added pixel by pixel to form an overall feature map.

Alternatively or additionally, a maximum function can be used. Here, the maximum of the pixel values of the feature maps of the image is determined along the channel dimension, i.e. at each position (x,y) across all feature maps simultaneously. The maximum then forms the new pixel value at position (x,y) of the overall feature map.

Instead or in addition, a minimum function and/or an average function can also be used. These are defined analogously.

Finally, a weighted sum of the pixel values can also be used. Here, the pixel values of the feature maps are summed, with each summand being given a weight in the form of a factor. These weights can be defined in advance or trained by any learning approach.
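The combining operations listed above (sum, maximum, minimum, average and weighted sum) can be sketched as a single pixel-wise reduction over the feature maps of a kernel group. The function name `combine`, the `mode` strings and the sample feature maps are illustrative assumptions; the weights here are fixed by hand rather than trained.

```python
def combine(feature_maps, mode="max", weights=None):
    """Combine equally sized 2D feature maps pixel by pixel into one overall feature map."""
    if mode == "weighted_sum" and (weights is None or len(weights) != len(feature_maps)):
        raise ValueError("weighted_sum needs one weight per feature map")
    reducers = {
        "sum": sum,
        "max": max,
        "min": min,
        "mean": lambda values: sum(values) / len(values),
        "weighted_sum": lambda values: sum(w * v for w, v in zip(weights, values)),
    }
    if mode not in reducers:
        raise ValueError("unknown mode: " + mode)
    reduce_fn = reducers[mode]
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    # At each position (x, y), reduce over the values of all feature maps.
    return [
        [reduce_fn([fm[r][c] for fm in feature_maps]) for c in range(cols)]
        for r in range(rows)
    ]


# Four hypothetical 2x2 feature maps, e.g. from a kernel group with three rotations.
maps = [
    [[1, 5], [2, 0]],
    [[3, 1], [2, 4]],
    [[0, 2], [6, 1]],
    [[4, 0], [2, 3]],
]
print(combine(maps, "sum"))   # [[8, 8], [12, 8]]
print(combine(maps, "max"))   # [[4, 5], [6, 4]]
print(combine(maps, "min"))   # [[0, 0], [2, 0]]
print(combine(maps, "mean"))  # [[2.0, 2.0], [3.0, 2.0]]
print(combine(maps, "weighted_sum", weights=[0.5, 0.5, 0, 0]))  # [[2.0, 3.0], [2.0, 2.0]]
```

Sum, maximum, minimum and average do not depend on the order of the feature maps within the group, whereas a weighted sum with unequal weights does.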

Which of the functions listed above is used to combine the feature maps can be passed to the ANN as a parameter. The corresponding external selection information can, for example, be specified by a user or generated by a further ANN.

Optionally, a step 140 can follow, in which the amount of data in the overall feature map is reduced by applying a pooling layer 335 to the overall feature map. This generates a reduced overall feature map 340.

Steps 110, 120 and 130 together may also be referred to as convolution, and step 140 may be referred to as pooling.

As a further embodiment, convolution and pooling can be carried out in any combination, and also multiple times.

Optionally, a first Fourier transform, FFT, can furthermore be applied to the overall feature map or the reduced overall feature map, thereby generating a first modified overall feature map.

In addition, the axes of the first modified overall feature map can optionally be converted to a polar coordinate representation, thereby generating a converted overall feature map.

Furthermore, a further Fourier transform can be applied to the first modified overall feature map or the converted overall feature map, thereby generating a second modified overall feature map.

In the method, the pooling can also be performed by a MaxBlurPool layer.

Analogously, a device is also presented; the features of the methods can be implemented, individually or in combination, for the devices as well.

In particular, Fig. 3 shows a device 300 which as a whole represents an ANN, the ANN comprising at least one convolutional layer 320a, 320b, each with at least one filter kernel 610, 710. Reference numeral 310 denotes the input image, and reference numeral 370 the corresponding output. The filter kernels are shown in Figs. 6 and 7.

In particular, Fig. 3 shows that the input image 310 is filtered by the filter kernels of a CL 320a, with each filter kernel generating a feature map. The feature maps of a kernel group are then combined to form an overall feature map 330a.

Fig. 3 shows three initial filter kernels at 320a by way of example. The image 310 is accordingly filtered by the filter kernels of three kernel groups. Depending on how many rotated filter kernels are generated for each initial filter kernel, the three kernel groups together can comprise up to twelve filter kernels: one to three rotated filter kernels can be generated for each of the three initial filter kernels, which, together with the initial filter kernels, gives a maximum of twelve filter kernels.

From the resulting maximum of twelve feature maps, the feature maps generated by each kernel group are then combined, so that one overall feature map 330a results for each of the kernel groups.

Here again, the feature maps generated by the respective filter kernels are two-dimensional. The overall feature maps 330a, which are generated by combining for each of the kernel groups, are likewise two-dimensional. However, if there is more than one initial filter kernel, and thus more than one kernel group, the entirety of all overall feature maps can be three-dimensional.

The filtering and combining is denoted 325a in Fig. 3 and takes place in the CL 320a. With the three filters shown in the CL 320a as an example, the result is three 2D overall feature maps 330a, which together form a 3D overall feature map.

Fig. 3 shows some elements more than once; this is intended to illustrate that these elements can occur multiple times and is not to be understood as limiting.

In particular, in Fig. 3 the 3D overall feature map 330a is reduced by a pooling layer 335a, for example a MaxBlurPool layer, to a reduced 3D overall feature map 340a. Fig. 3 thus shows a convolution according to the invention by the CL 320a.

Following this, a second convolution performed analogously by a further CL 320b is shown, which follows the same principles of the invention. CL 320b has five filter kernels, producing five 2D overall feature maps 330b that together form a 3D overall feature map. These in turn are reduced by a pooling layer 335b, for example a MaxBlurPool layer, to five reduced 2D overall feature maps 340b, which together form a 3D overall feature map.

In the following, the invention is described using one initial filter kernel 610, 710 in a CL 320 (320a or 320b). As mentioned above, however, a CL can have several initial filter kernels, and an ANN can also have several convolutional layers (320a and 320b).

In particular, convolution 325a, 325b and pooling 335a, 335b can be used in any combination.

The ANN is configured to rotate each filter kernel 610, 710 by at least one of the angles 90, 180 and 270 degrees. Depending on the requirements of the specific application and the intended use, one or two of the angles may be sufficient. All three angles can also be applied. Applying each angle to the initial kernel generates one rotated filter kernel 620, 630, 640, 720, 730, 740 in each case. If, for example, only the 90-degree angle is applied, only one rotated kernel 620, 720 is generated. If the 90-, 180- and 270-degree angles are all applied, three rotated filter kernels 620, 630, 640, 720, 730, 740 are generated for the initial kernel. The generated rotated filter kernels 620, 630, 640, 720, 730, 740, together with the respective initial filter kernel 610, 710, form a kernel group.
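As an illustration of this step, the following minimal sketch builds a kernel group with NumPy. The helper name `build_kernel_group` is hypothetical, and the counterclockwise direction of `np.rot90` is an assumption; the description does not fix a direction of rotation.

```python
import numpy as np


def build_kernel_group(base_kernel, angles=(90, 180, 270)):
    """Return [base kernel, rotated kernels...] for the requested angles.

    Only multiples of 90 degrees are allowed, so every rotation is an exact
    permutation of the kernel entries and no interpolation is needed.
    """
    allowed = {90, 180, 270}
    if not angles or not set(angles) <= allowed:
        raise ValueError("angles must be a non-empty subset of 90, 180, 270")
    base_kernel = np.asarray(base_kernel)
    group = [base_kernel]
    for angle in angles:
        # np.rot90 rotates counterclockwise by k * 90 degrees
        group.append(np.rot90(base_kernel, k=angle // 90))
    return group
```

With all three angles the group holds four kernels; with only the 90-degree angle it holds two.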

The ANN is further configured to filter an image 701, which is to be classified, for example, with each kernel of a kernel group 710, 720, 730, 740. This is shown in Fig. 7. As a result, each kernel 710, 720, 730, 740 of the kernel group generates a feature map 711, 721, 731, 741 for the image 701. In other words, a feature map is generated for each kernel of the kernel group with respect to the image being filtered.

The ANN is further configured to combine the generated feature maps 711, 721, 731, 741 of a kernel group into an overall feature map 771.

For combining 770 the feature maps of a kernel group, the ANN has available the functions described above for the method: addition of all generated feature maps, selection of the maximum of all generated feature maps, selection of the minimum of all generated feature maps, calculation of the average of all generated feature maps, and/or formation of a weighted sum of all generated feature maps.

The description of the functions given above for the methods applies analogously to the device.

The ANN can be configured to receive the selection from among the functions from a user or from a further ANN. The further ANN can be part of the device, or can constitute a further device or be part of one.

Furthermore, the ANN can have a pooling layer 335a, 335b, and the ANN can be configured to apply the pooling layer to the overall feature map so that its amount of data is reduced. This produces a reduced overall feature map.

The ANN can further be configured to apply Fourier transforms and/or axis conversions to a polar coordinate representation to overall feature maps. This can be done, for example, in a transformation module 350.

The pooling layer 335a, 335b can also be designed as a MaxBlurPool layer.

In standard architectures, the convolutional layer is translation-equivariant, since the convolution operation as such has the property of translation equivariance. Due to the nature of the operations in an ANN, however, the Fully Connected Layer, FCL, is not able to preserve either translation equivariance or translation invariance.

To make this possible, and thus to extend the entire network by the property of translation invariance, a transformation module 350 can be used, which is placed before the FC layer 360 and whose task is to transfer translation-equivariant features into a translation-invariant space on which the Fully Connected Layer, FCL, 360 can operate.

This transformation module 350 can accordingly perform an FFT, which is known from classical image and signal processing and whose result is composed of amplitude and phase. A translation in the spatial domain appears as a phase shift in Fourier space, so that the amplitudes of two mutually shifted objects are identical in Fourier space.
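The shift property stated above can be checked numerically. The sketch below is an illustration only: the function name `fft_amplitude` is hypothetical, and the exact equality holds for circular shifts (as produced by `np.roll`), which is the shift model of the discrete Fourier transform.

```python
import numpy as np


def fft_amplitude(feature_map):
    """Amplitude spectrum of a 2D feature map; the phase is discarded."""
    return np.abs(np.fft.fft2(feature_map))


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 8))
# Circularly shift the map by 2 rows and 3 columns
shifted = np.roll(feature_map, shift=(2, 3), axis=(0, 1))

# The complex spectra differ (phase shift), the amplitudes do not
same_amplitude = np.allclose(fft_amplitude(feature_map), fft_amplitude(shifted))
same_spectrum = np.allclose(np.fft.fft2(feature_map), np.fft.fft2(shifted))
```

For shifts that move content across the image border without wrapping around, the amplitudes are only approximately equal.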

By adding the transformation module 350, a translation-invariant network architecture can be provided.

The convolution in a CNN is not rotation-equivariant but, as described above, only equivariant with respect to translation.

In order to provide an ANN that can work just as independently of rotation as of translation, for example to perform a classification, a rotation-invariant network architecture is presented below. To this end, a rotation-equivariant convolutional layer is introduced as follows.

Here, a rotation-equivariant convolution 320 is achieved by several dependent filters, so-called filter kernels, in order to cover possible rotations of an object. The rotation-equivariant convolutional layer can also be referred to as an EquiConv layer.

For further processing, the 2D overall feature maps, which together form a 3D overall feature map, are then converted into a one-dimensional vector.

This process is called "flattening", and its result is shown as vector 355.

Fig. 6 shows the filter generation of the rotation-equivariant convolutional layer. An initial filter F0° 610 can learn its weights during network training. Depending on the initial filter F0° 610, one, two or three further filter kernels 620, 630, 640 are generated, which result from rotating the initial filter by 90°, 180° and 270°.

The weights of these rotated filters are not optimized during training. Rather, they are continuously regenerated from the trained initial filter, so that the same features are extracted by the same combination of filterings regardless of the rotation of the object.

The filter result, i.e. the overall feature map 771, is accordingly composed of the sum of the filterings of the input image with all filters, i.e. of the feature maps 711, 721, 731, 741; see Fig. 7. Although all filters are shown here, it may be sufficient, depending on the intended use, the separability of the filter kernels or other requirements of the specific application, to generate only one, two or three rotated filters, or filter kernels. Instead of the sum, other operations can also be used for combining 770 the feature maps 711, 721, 731, 741, as described above.

In particular, the rotations by 90, 180 and 270 degrees can be carried out without loss, so that the Nyquist theorem can be satisfied in every case. If separable filters are learned, the equivariant convolution can be reduced to two filter kernels, 0° and 90°, since the filter kernels for the rotations by 180° and 270° are then the same as these.

If the separability of the filters cannot be assumed during training and the application requires it, the combination of all three rotations of the filter and the initial filter F0° is used for the rotation-equivariant convolutional layer. Within the framework of this combination of the lossless rotations of the filter kernels, i.e. by combining the individual feature maps of a kernel group into an overall feature map, a rotation-equivariant filtering can thus be produced.
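The equivariance obtained from the full kernel group can be demonstrated on a small example. The following is a sketch under stated assumptions, not the implementation of the application: filtering is modelled as a "valid" cross-correlation, the feature maps are combined by summation, and the names `correlate_valid` and `equivariant_filter` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(image, kernel):
    """2D cross-correlation without padding ('valid' mode)."""
    windows = sliding_window_view(image, kernel.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def equivariant_filter(image, base_kernel):
    """Filter with the base kernel and its 90/180/270 degree rotations, then sum."""
    group = [np.rot90(base_kernel, k) for k in range(4)]
    maps = [correlate_valid(image, kernel) for kernel in group]
    return np.sum(maps, axis=0)


rng = np.random.default_rng(1)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

out = equivariant_filter(image, kernel)
out_rotated_input = equivariant_filter(np.rot90(image), kernel)
# Rotating the input by 90 degrees rotates the overall feature map by 90 degrees
is_equivariant = np.allclose(out_rotated_input, np.rot90(out))
```

Rotating the image permutes which kernel of the group produces which feature map, and a combining function that does not depend on the order of the maps (such as the sum or the maximum) therefore yields the rotated overall feature map.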

By using the described method, adding the additional directional information of the 90°, 180° and 270° rotations of the filters results in translation- and rotation-equivariant behavior of the convolutional layer 320a, 320b.

The Fully Connected Layer 360 in conventional ANNs is likewise not rotation-invariant. It is proposed to extend the transformation module 350 used for translation invariance by a polar coordinate representation and to perform a second Fourier transform.

As already described above for the property of translation invariance, a transformation module 350 can also be placed before the FC layer 360 for rotation invariance; or, if a transformation module 350 is already in use, it can be extended by the following functions. The Fourier transform can likewise be exploited to transform rotation-equivariant features into a rotation-invariant space.

By converting the axes of a Fourier transform to a polar coordinate representation, the rotation between two objects can be converted into a translation. A further Fourier transform can be used to resolve this translation. In order to transfer translation and rotation equivariance into a common invariant space, a transformation of the data into polar coordinates is appended to the already existing Fourier transformation module 350.

Since a central pixel is required for this operation, the amplitude of the first Fourier transform can be cropped to an odd number of pixels per side, so that a single pixel exists at the center of each axis. The axes can then be converted to obtain the polar coordinate representation.

This operation is only necessary if a polar coordinate representation is to be generated in order to establish rotation invariance.

A further Fourier transform can follow, of which only the amplitude serves as the new input for the FCL 360. This results in a translation- and rotation-invariant NN design.
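The cropping, the polar conversion and the further Fourier transform can be sketched as follows. Several points are assumptions made for illustration and are not specified in the description: the input is taken to be an amplitude map whose zero frequency has already been shifted to the center (for example with `np.fft.fftshift`), the polar conversion uses nearest-neighbour sampling, the second transform is taken along the angle axis only, and all function names and the parameter `n_angles` are hypothetical.

```python
import numpy as np


def crop_to_odd(a):
    """Drop the first row/column where needed so that both side lengths are odd."""
    return a[(a.shape[0] + 1) % 2:, (a.shape[1] + 1) % 2:]


def to_polar(a, n_angles=16):
    """Resample a square, odd-sized map onto a (radius, angle) grid.

    Nearest-neighbour sampling around the central pixel (an assumption).
    """
    n = a.shape[0]
    c = (n - 1) // 2                                   # index of the central pixel
    radii = np.arange(c + 1)[:, None]
    thetas = 2 * np.pi * np.arange(n_angles)[None, :] / n_angles
    ys = c + np.rint(radii * np.sin(thetas)).astype(int)
    xs = c + np.rint(radii * np.cos(thetas)).astype(int)
    return a[ys, xs]


def rotation_invariant_descriptor(centered_amplitude, n_angles=16):
    """Polar conversion followed by the amplitude of an FFT along the angle axis."""
    polar = to_polar(crop_to_odd(centered_amplitude), n_angles)
    # A rotation of the input is a circular shift along the angle axis,
    # which only changes the phase of this second transform.
    return np.abs(np.fft.fft(polar, axis=1))


rng = np.random.default_rng(2)
amplitude = rng.random((9, 9))
d_original = rotation_invariant_descriptor(amplitude)
d_rotated = rotation_invariant_descriptor(np.rot90(amplitude))
is_invariant = np.allclose(d_original, d_rotated)
```

In this sketch the invariance is exact for rotations by multiples of 90 degrees when `n_angles` is divisible by four; other angles and other interpolation schemes are outside its scope.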

A further possibility, optional as an advantageous embodiment, is to reduce the output data further. For this purpose, a pooling layer, in particular a MaxBlurPool layer, can subsequently be used. This can be seen in Fig. 3 and is shown in detail in Fig. 9.

Fig. 9 shows a MaxBlurPool layer, which is composed of a simple MaxPool layer 915 (with stride = 1) at the transition from the input feature map 910 to the feature map 920, a convolution with a blur kernel 930, and a subsequent downsampling layer 945 (with stride = 2) at the transition from the convolved feature map 940 to the reduced output feature map 950.
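The three stages of such a layer can be written down compactly with NumPy. This is a minimal sketch: the 2x2 pooling window, the 3x3 binomial blur kernel and the edge padding are assumptions, since the description fixes only the strides, and the function name `max_blur_pool` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_blur_pool(feature_map, pool_size=2):
    """MaxPool (stride 1), blur convolution, then downsampling (stride 2)."""
    # 1) Max pooling with stride 1: every window position is evaluated
    windows = sliding_window_view(feature_map, (pool_size, pool_size))
    pooled = windows.max(axis=(2, 3))

    # 2) Blur with a normalized 3x3 binomial kernel (assumed kernel choice);
    #    edge padding keeps the size of the pooled map
    blur = np.outer([1.0, 2.0, 1.0], [1.0, 2.0, 1.0]) / 16.0
    padded = np.pad(pooled, 1, mode="edge")
    blurred = np.einsum("ijkl,kl->ij", sliding_window_view(padded, blur.shape), blur)

    # 3) Downsampling with stride 2: keep every second row and column
    return blurred[::2, ::2]
```

Blurring before subsampling acts as a low-pass filter, so that small shifts of the input change the reduced output less than with plain max pooling at stride 2.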

Fig. 10 shows how an ANN can be trained using the method or the device of this application. First, only the model 1040 with the rotation-equivariant convolutional layer is trained, which consists of a feature extraction module 1010 and an application module 1030. The application module 1030 can be, for example, a classifier module or a detector module. A feature extraction module 1010 serves to extract features. A classifier module serves for classification. Under certain conditions, for example if a transformation module 1022 is used as described herein, the weights of the Fully Connected Layer, FCL, of the classifier module learned in this step are no longer needed in the further course.

The aim is for the entire NN to become invariant with respect to rotation and translation.

The first training part 1040 serves to train the feature extraction module, which is intended to generate translation- and rotation-equivariant features in the spatial domain. The feature extraction module can then be combined with an FFT transformation module 1021, which performs an FFT amplitude calculation, and with a classifier module 1030; this combination 1050 can be referred to as the FFT model. This is useful for problems that only require translation invariance and do not need rotation invariance.

If both translation and rotation invariance are required, a further combination is also possible, which is shown in Fig. 10 as 1060 and can be referred to as the polar FFT model. Here, the feature extraction module 1010 is supplemented by an FFT transformation module 1022, with an FFT amplitude calculation and an axis conversion, and by the classifier module 1030.

In order to train the FFT model 1050 or the polar FFT model 1060, the parameters of the previously trained feature extraction module 1010 of the model 1040 with the rotation-equivariant convolutional layer are frozen, so that, for example, stochastic gradient descent backpropagates only through the weights of the classifier module. The FFT amplitude calculation and the polar FFT module are merely image transformations and hold no learnable parameters.

Other training methods are also conceivable; in particular, end-to-end training is possible for all modules of Fig. 10. In that case, an ANN receives raw data and a task, such as a classification, and learns to perform this task automatically. In other words, no layer weights are frozen, all module parts can be optimized in one training step, and therefore no longer need to be exchanged.

In summary, a concept is presented which, through a simple adaptation of the standard convolution operation and the addition of a transformation module, constructs a network architecture that satisfies the requirement of rotation and translation invariance.
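The adapted convolution summarized above can be sketched in NumPy as follows: a filter kernel is rotated by 90, 180 and 270 degrees, the image is filtered with every kernel of the resulting kernel group, and the feature maps are combined into an overall feature map. The sketch assumes a single-channel image, filtering implemented as cross-correlation without padding, and summation (or another element-wise reduction) as the combination; the function names are hypothetical.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Cross-correlation without padding (the usual CNN 'convolution')."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def kernel_group(kernel, quarter_turns=(1, 2, 3)):
    """Original kernel plus its copies rotated by 90, 180 and 270 degrees."""
    return [kernel] + [np.rot90(kernel, k) for k in quarter_turns]

def rotation_equivariant_conv(image, kernel, combine=np.sum):
    """Filter with every kernel of the group and combine the feature maps."""
    maps = np.stack([correlate2d_valid(image, k) for k in kernel_group(kernel)])
    return combine(maps, axis=0)

rng = np.random.default_rng(0)
image = rng.normal(size=(10, 10))
kernel = rng.normal(size=(3, 3))

overall = rotation_equivariant_conv(image, kernel)
overall_of_rotated = rotation_equivariant_conv(np.rot90(image), kernel)
# Equivariance: rotating the input rotates the overall feature map.
equivariant = np.allclose(overall_of_rotated, np.rot90(overall))
print(equivariant)
```

Passing `combine=np.max`, `np.min` or `np.mean` swaps in the other element-wise combinations; the equivariance check holds for each of them because the rotated input only permutes the feature maps within the group.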

Furthermore, a computer-readable medium is proposed which comprises instructions that, when executed by a processor, cause the processor to carry out the steps of the method as described above.

Claims

Computer-implemented method (100) in an artificial neural network, ANN, wherein the ANN comprises at least one convolutional layer, each with at least one filter kernel, the method comprising: rotating (110) each filter kernel by at least one angle from 90, 180 and 270 degrees, thereby generating, for each filter kernel, a rotated filter kernel for each angle used for rotating, the generated rotated filter kernels together with the respective filter kernel forming a kernel group; filtering (120) an image through each filter kernel of a kernel group, thereby generating a feature map for each filter kernel of a kernel group; and combining (130) all generated feature maps into an overall feature map.

Method according to claim 1, wherein the combining of all generated feature maps is effected by one or more of the following steps: - adding all generated feature maps; - selecting the maximum of all generated feature maps; - selecting the minimum of all generated feature maps; - calculating the average value of all generated feature maps; and/or - forming a weighted sum of all generated feature maps.

Method according to claim 2, wherein the selection of the steps for combining the feature maps is performed by a further artificial neural network, ANN.

Method according to any one of claims 1 to 3, further comprising reducing (140) the amount of data of the overall feature map by applying a pooling layer to the overall feature map, thereby generating a reduced overall feature map.

Method according to claim 4, wherein the method steps of claim 1 are collectively referred to as convolution (325a, 325b) and method step (140) of claim 4 is referred to as pooling (335a, 335b), and wherein convolution and pooling are performed multiple times in any combination.

Method according to any one of claims 1 to 5, further comprising applying a first Fourier transform to the overall feature map or the reduced overall feature map, thereby generating a first modified overall feature map.

Method according to claim 6, further comprising converting the axes of the first modified overall feature map into a polar coordinate representation, thereby generating a converted overall feature map.

Method according to claim 6 or 7, further comprising applying a further Fourier transform to the first modified overall feature map or the converted overall feature map, thereby generating a second modified overall feature map.

Method according to any one of claims 4 to 8, wherein the pooling is performed by a MaxBlurPool layer.

Device (300) with an artificial neural network, ANN, wherein the ANN comprises: at least one convolutional layer (320a, 320b), each having at least one filter kernel (610, 710); wherein the ANN is set up to rotate each filter kernel (610, 710) by at least one angle from 90, 180 and 270 degrees in order to generate, for each filter kernel, a rotated filter kernel for each angle used for rotating, the generated rotated filter kernels (620, 630, 640, 720, 730, 740) together with the respective filter kernel forming a kernel group; wherein the ANN is set up to filter an image (310, 701) through each filter kernel (610, 620, 630, 640, 710, 720, 730, 740) of a kernel group in order to generate a respective feature map (711, 721, 731, 741) for each filter kernel (610, 620, 630, 640, 710, 720, 730, 740) of a kernel group; and wherein the ANN is set up to combine all generated feature maps (711, 721, 731, 741) into an overall feature map (330a, 330b, 771).

Device according to claim 10, wherein the ANN is set up to combine all generated feature maps by one or more of: - adding all generated feature maps; - selecting the maximum of all generated feature maps; - selecting the minimum of all generated feature maps; - calculating the average value of all generated feature maps; and/or - forming a weighted sum of all generated feature maps.

Device according to claim 11, wherein the ANN is set up to receive the selection of the functions (770) for combining the feature maps as input from a further artificial neural network, ANN, which may be part of the device.

Device according to any one of claims 10 to 12, wherein the ANN is further set up to reduce the amount of data of the overall feature map (330a, 330b, 771) by applying a pooling layer (335a, 335b) in order to generate a reduced overall feature map.

Device according to any one of claims 10 to 13, wherein the ANN further comprises a transformation module (350) set up to apply Fourier transforms and/or axis conversions into a polar coordinate representation to overall feature maps.

Device according to claim 13 or 14, wherein the pooling layer is a MaxBlurPool layer.

A computer-readable medium comprising instructions which, when executed by a processor, cause the processor to carry out the steps of the method according to any one of claims 1 to 9.