WO2023016859A1 - Training of neural networks for equivariance or invariance with respect to changes in the input image - Google Patents

Training of neural networks for equivariance or invariance with respect to changes in the input image Download PDF

Info

Publication number
WO2023016859A1
WO2023016859A1 PCT/EP2022/071667 EP2022071667W
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
input
transformations
learning
images
Prior art date
Application number
PCT/EP2022/071667
Other languages
German (de)
English (en)
Inventor
Ivan Sosnovik
Sadaf Gulshad
Arnold Smeulders
Jan Hendrik Metzen
Original Assignee
Robert Bosch Gmbh
Priority date
Filing date
Publication date
Application filed by Robert Bosch Gmbh filed Critical Robert Bosch Gmbh
Publication of WO2023016859A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

Definitions

  • the present invention relates to the training of neural networks that process images and map them, for example, to classification scores in relation to classes of a given classification.
  • driver assistance systems and systems for at least partially automated driving process the measurement data recorded by sensors of a vehicle with classifiers to form classification scores in relation to one or more classes of a specified classification. On the basis of these classification scores, for example, decisions are then made about interventions in the driving dynamics of the vehicle.
  • the training of such classifiers requires training data with a high degree of variability, so that the classifier can generalize well to measurement data previously unseen in the training.
  • the recording of training data on test drives with the vehicle and especially the largely manual labeling of this training data with target classification scores are time-consuming and expensive.
  • the training data is often enriched with synthetically generated training data.
  • DE 10 2018 204494 B3 discloses a method with which radar signals can be synthetically generated in order to enrich physically recorded radar signals for training a classifier.

Disclosure of Invention
  • a method for training a neural network was developed as part of the invention.
  • This neural network is designed to process input images and comprises multiple convolutional layers.
  • Each convolutional layer is designed to map its respective input f to at least one feature map Φ(f, κ) by applying at least one filter kernel κ.
  • This feature map Φ(f, κ) has a significantly reduced dimensionality compared to the input f.
  • an image classifier that maps input images to classification scores related to one or more classes of a given classification can be chosen as the neural network.
  • the feature maps supplied by the last convolutional layer in a sequence of convolutional layers can be evaluated with regard to the classification scores.
  • A set 𝒯 of transformations T is provided. With respect to these transformations, the neural network is to be enabled during training to learn the generation of at least one equivariant or invariant feature map Φ(f, κ) when these transformations are applied to the input f of at least one convolutional layer.
  • This does not mean that the feature map Φ(f, κ) always becomes equivariant or invariant against all transformations T from the set 𝒯. Rather, the aim is to make the feature map Φ(f, κ) equivariant or invariant to transformations to the extent that such transformations occur in the learning images used during training.
  • The feature map Φ(f, κ) that is to be made equivariant or invariant is expressed by an aggregation, parameterized with parameters, of feature maps Φ_j(f, T_j[κ]), each of which is obtained by applying transformations T_j ∈ 𝒯 to the at least one filter kernel κ.
  • These parameters are used as additional degrees of freedom when training the neural network.
  • Learning images and learning outputs, onto which the trained neural network should ideally map these learning images, are provided for the supervised training.
  • the learning images are mapped onto outputs by the neural network, and deviations of these outputs from the learning outputs are evaluated using a predetermined cost function.
  • Parameters of the parameterized aggregation and other parameters that characterize the behavior of the neural network are now optimized with the aim that the evaluation by the cost function is expected to improve with further processing of learning images.
  • These further parameters can in particular be weights, for example, with which inputs that are fed to neurons or other processing units of the neural network are weighted and summed to result in an activation of this neuron or this processing unit.
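  • As a purely illustrative sketch of this supervised optimization (PyTorch is used here as an assumption; the module name EquivariantClassifier and all other identifiers are hypothetical and not taken from the patent), the parameters of the parameterized aggregation and the other weights of the network can be updated jointly, since both are exposed as ordinary trainable parameters:

```python
import torch

def training_step(model, cost_function, optimizer, learning_images, learning_outputs):
    """One supervised step: map learning images to outputs, evaluate the deviation
    with the cost function, and update aggregation parameters and network weights
    together (both are trainable parameters of the model)."""
    outputs = model(learning_images)                    # map learning images to outputs
    loss = cost_function(outputs, learning_outputs)     # evaluate deviations from the learning outputs
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                    # optimize all trainable parameters
    return loss.item()

# Usage sketch (EquivariantClassifier is a placeholder for a network whose
# convolutional layers expose the aggregation parameters as trainable parameters):
# model = EquivariantClassifier()
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# training_step(model, torch.nn.CrossEntropyLoss(), optimizer, images, labels)
```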
  • The neural network thus learns to make feature maps equivariant or invariant to transformations of the input to exactly the extent that this is actually beneficial for the performance of the neural network in the particular concrete application. This is somewhat analogous to the fitting of glasses at an optician.
  • The transformations T here correspond to the various corrective lenses for short-sightedness, long-sightedness, astigmatism and other imaging errors of the eye. Exactly those corrections are applied with which the customer can best recognize the numbers and letters presented for testing.
  • The useful effect of the equivariances and invariances acquired during training is, in particular, that the neural network recognizes objects and facts as the same across different input images that differ only by an application of the said transformations and are otherwise identical in content.
  • the variability of the learning images used can concentrate on those properties that are to be examined with the neural network.
  • A given quantitative level of performance with respect to the task of the neural network (for an image classifier, for example, measured by the classification accuracy on a set of test or validation data) can then be achieved overall with a smaller amount of training images.
  • Learning images of traffic situations labeled with learning outputs are particularly expensive to obtain, since long test drives are necessary and the labeling requires manual work.
  • The filter kernel κ is expressed as a linear combination of basis functions ψ_i, parameterized with parameters.
  • The effect of the transformations T on the basis functions ψ_i can then be calculated in advance and reused again and again. During the training, only the parameters of the linear combination are varied.
  • Each adjustment of the linear combination in the course of a training step therefore entails a lower computational effort.
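  • A minimal sketch of this precomputation idea (shapes, names and the use of PyTorch are illustrative assumptions): the transformed basis functions T_j[ψ_i] are computed once and stored, and each training step only re-mixes them with the trainable coefficients of the linear combination:

```python
import torch

def build_kernels(transformed_bases, coeffs):
    """transformed_bases: precomputed tensor of the T_j[psi_i], shape (J, I, k, k)
       coeffs:            trainable coefficients of the linear combination, shape (C_out, C_in, I)
       returns:           one filter kernel per transformation, shape (J, C_out, C_in, k, k)"""
    # kappa_j = sum_i coeffs_i * T_j[psi_i]; only coeffs changes during training
    return torch.einsum('oci,jikl->jockl', coeffs, transformed_bases)

# Usage sketch: J=4 transformations, I=6 basis functions, 5x5 kernels, 3 -> 16 channels
transformed_bases = torch.randn(4, 6, 5, 5)             # stand-in for the precomputed T_j[psi_i]
coeffs = torch.nn.Parameter(0.1 * torch.randn(16, 3, 6))
kernels = build_kernels(transformed_bases, coeffs)      # shape (4, 16, 3, 5, 5)
```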
  • The feature maps can then be weighted among one another in the aggregation, for example with weights w_j(f) dependent on the input f.
  • The feature map can then, for example, be written as Φ(f, κ) = 𝒜_j(w_j(f) · Φ_j(f, T_j[κ])), where 𝒜 is an arbitrary aggregation function and the T_j are the transformations from the set 𝒯, each applied to the filter kernel κ.
  • The set 𝒯 can in particular also contain the identity as a transformation, for example.
  • The training can then be initialized, for example, in such a way that initially only the weight w_j(f) for the identity is equal to 1 and the weights w_j(f) for all other transformations T_j are equal to 0.
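  • The following sketch shows one possible (assumed, simplified) realization of such an aggregation layer in PyTorch: the responses to all transformed kernels are combined by a weighted sum as the aggregation function, the weights w_j(f) are derived from the input by pooling and a softmax, and the weight head is initialized so that almost all weight initially lies on the identity transformation (the exact 1/0 initialization is approximated here by large positive and negative logits):

```python
import torch
import torch.nn.functional as F

class WeightedTransformAggregation(torch.nn.Module):
    """Aggregates the conv responses of J transformed kernels with input-dependent weights.
    Assumes kernels[0] corresponds to the identity transformation."""
    def __init__(self, in_channels, num_transforms):
        super().__init__()
        self.weight_head = torch.nn.Linear(in_channels, num_transforms)
        # Initialize so that the softmax puts (almost) all weight on the identity (index 0)
        torch.nn.init.zeros_(self.weight_head.weight)
        with torch.no_grad():
            self.weight_head.bias.copy_(torch.tensor(
                [5.0] + [-5.0] * (num_transforms - 1)))

    def forward(self, f, kernels):
        # f: (B, C_in, H, W); kernels: (J, C_out, C_in, k, k) built from transformed bases
        responses = torch.stack(
            [F.conv2d(f, k, padding=k.shape[-1] // 2) for k in kernels])   # (J, B, C_out, H, W)
        w = torch.softmax(self.weight_head(f.mean(dim=(2, 3))), dim=-1)    # w_j(f), shape (B, J)
        w = w.transpose(0, 1)[:, :, None, None, None]                      # (J, B, 1, 1, 1)
        return (w * responses).sum(dim=0)                                  # weighted aggregation
```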
  • Functions that depend on the spatial coordinates x, y in the input f at least via Hermite polynomials H_m, H_n can be selected as basis functions ψ for the filter kernels κ, for example of the form ψ_{m,n}(x, y) = A · H_m(x/σ) · H_n(y/σ) · exp(−(x² + y²)/(2σ²)), with a normalization constant A and a scaling factor σ.
  • Such basis functions can be used in particular to construct filter kernels κ that are particularly suitable for recognizing features in images.
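  • A numpy sketch of such basis functions, under the assumption of the Gaussian-windowed form given above; the exact window, normalization and grid size used in practice may differ:

```python
import numpy as np

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the recurrence
    H_0 = 1, H_1 = 2x, H_{n+1} = 2x H_n - 2n H_{n-1}."""
    h_prev, h = np.ones_like(x), 2 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2 * x * h - 2 * k * h_prev
    return h

def hermite_basis(m, n, size=5, sigma=1.5):
    """2D basis function psi_{m,n}(x, y) on a size x size grid (assumed Gaussian window)."""
    coords = np.arange(size) - (size - 1) / 2.0
    x, y = np.meshgrid(coords, coords)
    psi = hermite(m, x / sigma) * hermite(n, y / sigma) * np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return psi / np.abs(psi).max()          # simple choice of the normalization constant A

# Usage: a small filter basis from low-order Hermite polynomials
basis = [hermite_basis(m, n) for m in range(3) for n in range(3)]
```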
  • At least one feature map Φ(f, κ) is selected that contains a sum of the input f and a processing result obtained by the successive application of a plurality of filter kernels κ_1, κ_2, ... to the input f.
  • A feature map Φ(f, T_j[κ_1, κ_2, ...]) obtained by applying one or more transformations T_j ∈ 𝒯 can then be expressed in terms of the basis functions ψ_1, ψ_2, ... from which the filter kernels κ_1, κ_2, ... are formed.
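  • A sketch of how such a "residual block" could look when every filter kernel is replaced by its transformed counterpart T_j[κ_i] (the placement of the nonlinearities and the requirement that the last kernel maps back to the channel count of f are assumptions of this illustration, not statements of the patent):

```python
import torch
import torch.nn.functional as F

def residual_block_response(f, kernels, transform=None):
    """f: input of shape (B, C, H, W); kernels: list of filter kernels kappa_1, kappa_2, ...
    transform: optional callable T_j applied to every kernel before use.
    Returns the sum of the input and the successively filtered input."""
    h = f
    for kappa in kernels:
        k = transform(kappa) if transform is not None else kappa
        h = F.relu(F.conv2d(h, k, padding=k.shape[-1] // 2))   # successive application of the kernels
    return f + h                                               # residual sum with the input
```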
  • The transformations can in particular be elastic transformations, for example. These are transformations that can be described at least approximately as a field of displacements τ in the spatial coordinates x of the input image f with a strength s, i.e. the transformed image samples f at the shifted coordinates x + s · τ(x).
  • the elastic transformations can in particular include, for example, linear stretching and/or rotational scaling. These are transformations that are caused, for example, by changing the perspective of a camera relative to an object.
  • Coordinates x', y' in the input image after a linear stretching can, for example, be obtained from the original coordinates x, y by scaling with a factor of the form 1/(ε + α · cos(φ)), where φ = arctan(y/x), ε is a small design constant and α is a coefficient of elasticity.
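  • Purely as an illustration, the following sketch applies a general elastic transformation of the kind described above to an image, under the assumption that the displacement field τ is applied by bilinear resampling at the shifted coordinates x + s · τ(x):

```python
import torch
import torch.nn.functional as F

def elastic_transform(image, displacement, strength):
    """image: (B, C, H, W); displacement: (B, H, W, 2) field tau(x) in pixels;
    samples the image at x + strength * tau(x) via bilinear interpolation."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing='ij')
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    grid = grid + strength * displacement
    # normalize pixel coordinates to [-1, 1] as expected by grid_sample
    grid = 2.0 * grid / torch.tensor([w - 1, h - 1], dtype=torch.float32) - 1.0
    return F.grid_sample(image, grid, mode='bilinear', align_corners=True)

# Usage sketch: a smooth random displacement field applied with strength 3 pixels
image = torch.rand(1, 3, 96, 96)
tau = F.avg_pool2d(torch.randn(1, 2, 96, 96), 9, stride=1, padding=4).permute(0, 2, 3, 1)
warped = elastic_transform(image, tau, strength=3.0)
```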
  • The aggregation function 𝒜 for the aggregation of the feature maps Φ_j(f, T_j[κ]) can, for example, form an element-by-element maximum or an element-by-element mean of these feature maps.
  • An element-by-element maximum or an element-by-element mean is understood in particular to mean, for example, that a maximum or a mean is formed separately for each entry in the dimensions C × H × W along the dimension K of the transformations.
  • A smoothed maximum can, for example, be determined using the logsumexp function.
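  • The following sketch shows these aggregation choices over a stack of feature maps of shape (K, C, H, W), where K indexes the transformations (the function names and the temperature parameter of the smoothed maximum are illustrative assumptions):

```python
import torch

def aggregate(feature_maps, mode='max', temperature=1.0):
    """feature_maps: tensor of shape (K, C, H, W), one feature map per transformation.
    Aggregates along the transformation dimension K, separately for every C x H x W entry."""
    if mode == 'max':
        return feature_maps.max(dim=0).values           # element-by-element maximum
    if mode == 'mean':
        return feature_maps.mean(dim=0)                  # element-by-element mean
    if mode == 'logsumexp':
        # smoothed maximum; approaches the hard maximum as the temperature goes to 0
        return temperature * torch.logsumexp(feature_maps / temperature, dim=0)
    raise ValueError(mode)

stack = torch.randn(4, 16, 32, 32)                       # K = 4 transformations
smoothed = aggregate(stack, mode='logsumexp', temperature=0.5)
```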
  • The aggregation of the feature maps Φ_j can also include forming a norm over one or more spatial dimensions of each feature map and selecting one or more feature maps based on this norm.
  • For example, ℓ_p norms can be formed along the dimensions C, H × W or C × H × W. It can then be determined along the K dimension for which transformations the largest norms result. A feature map, and thus also a transformation, can be selected that best fits the existing data.
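  • A sketch of this norm-based selection (the choice of p = 2 and of the reduced dimensions C × H × W is only one example configuration):

```python
import torch

def select_by_norm(feature_maps, p=2, dims=(1, 2, 3)):
    """feature_maps: (K, C, H, W). Forms an l_p norm over the given dimensions
    (here C x H x W) for each transformation and returns the best-fitting feature map."""
    norms = feature_maps.abs().pow(p).sum(dim=dims).pow(1.0 / p)   # one norm per transformation
    best = torch.argmax(norms)                                     # transformation with the largest norm
    return feature_maps[best], int(best)

stack = torch.randn(4, 16, 32, 32)
selected_map, selected_transform = select_by_norm(stack)
```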
  • the trained neural network is supplied with input images that were recorded with at least one sensor, so that these input images are mapped onto outputs by the neural network.
  • a control signal is determined from the outputs.
  • a vehicle and/or a system for product quality control and/or a system for monitoring areas is controlled with this control signal.
  • the probability that the action carried out by the respectively controlled system is appropriate to the situation detected by the sensor is then advantageously increased.
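  • Purely as an illustration of this downstream use (the thresholding rule and all identifiers are hypothetical and not part of the patent; a real control device would derive its control signal in an application-specific way):

```python
import torch

def control_from_outputs(trained_model, camera_image, obstacle_class=0, threshold=0.9):
    """Maps a recorded sensor image to classification scores and derives a simple control signal."""
    with torch.no_grad():
        scores = torch.softmax(trained_model(camera_image), dim=-1)
    # e.g. request braking if the "obstacle" class is detected with sufficient certainty
    return {'brake': bool(scores[0, obstacle_class] > threshold)}
```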
  • the method can be fully or partially computer-implemented.
  • the invention therefore also relates to a computer program with machine-readable instructions which, when executed on one or more computers, cause the computer or computers to carry out the method described.
  • control devices for vehicles and embedded systems for technical devices that are also able to execute machine-readable instructions are also to be regarded as computers.
  • the invention also relates to a machine-readable data carrier and/or a download product with the computer program.
  • a downloadable product is a digital product that can be transmitted over a data network, i.e. can be downloaded by a user of the data network and that can be offered for sale in an online shop for immediate download, for example.
  • a computer can be equipped with the computer program, with the machine-readable data carrier or with the downloadable product.
  • FIG. 1 embodiment of the method 100 for training a neural network 1
  • FIG. 2 Exemplary effect of training with method 100 on the classification accuracy of an image classifier.
  • FIG. 1 is a schematic flowchart of an exemplary embodiment of the method 100 for training a neural network 1.
  • an image classifier can be selected as the neural network 1 in step 105, for example.
  • The neural network 1 is designed to process input images 2 and comprises a number of convolutional layers. Each of these convolutional layers is designed to map the input f of the respective convolutional layer to at least one feature map Φ(f, κ) by applying at least one filter kernel κ.
  • A set 𝒯 of transformations T is provided.
  • The neural network 1 can learn how to generate at least one equivariant or invariant feature map Φ(f, κ) when applying one or more of these transformations T to the input f of at least one convolutional layer of the network 1.
  • Elastic transformations, for example, which can be described as a field of displacements in spatial coordinates of the input image, can be selected as transformations T ∈ 𝒯.
  • These elastic transformations can include, for example, linear stretching and/or rotational scaling in accordance with block 111a.
  • In step 120, this feature map Φ(f, κ) is expressed by an aggregation 5 of feature maps Φ_j(f, T_j[κ]) parameterized with parameters 5a, each obtained by applying transformations T_j ∈ 𝒯 to the at least one filter kernel κ. That is, the output of the corresponding convolutional layer changes depending on the parameters 5a.
  • The filter kernel κ can be expressed as a linear combination of basis functions ψ_i, parameterized with parameters.
  • In particular, basis functions ψ_i can be selected that depend at least via Hermite polynomials on the spatial coordinates x, y in the input f.
  • The feature maps Φ_j(f, T_j[κ]) in the aggregation 5 can be weighted among one another with weights w_j(f) dependent on the input f.
  • For the parameterization with the parameters 5a, at least one feature map Φ(f, κ) can be selected that is a sum of the input f and a processing result obtained by the successive application of a plurality of filter kernels κ_1, κ_2, ... to the input f.
  • Such a feature map is the work result of a "residual block".
  • According to block 124, aggregating the feature maps Φ_j(f, T_j[κ]) may include, for each element of these feature maps, forming an element-by-element maximum, an element-by-element mean and/or a smoothed maximum.
  • The aggregation of the feature maps Φ_j(f, T_j[κ]) according to block 125 can include forming a norm over one or more spatial dimensions of each feature map and selecting one or more feature maps based on this norm.
  • In step 130, learning images 2a and learning outputs 3a, onto which the trained neural network 1 should ideally map these learning images 2a, are provided.
  • In step 140, the learning images 2a are mapped by the neural network 1 onto outputs 3.
  • In step 150, deviations of these outputs 3 from the learning outputs 3a are evaluated using a predetermined cost function 4.
  • In step 160, parameters 5a of the parameterized aggregation 5 and other parameters 1a that characterize the behavior of the neural network 1 are optimized with the aim that the evaluation 4a by the cost function 4 is expected to improve with further processing of learning images 2a.
  • the fully trained states of the parameters 1a and 5a are denoted by the reference symbols 1a* and 5a*, respectively.
  • the completely trained neural network 1, whose behavior is characterized by the parameters 1a* and 5a*, is denoted by the reference symbol 1*.
  • In step 170, input images 2 that were recorded with at least one sensor 51 are supplied to the trained neural network 1*. These input images 2 are mapped onto outputs 3 by the neural network 1*.
  • In step 180, a control signal 180a is determined from the outputs 3.
  • In step 190, a vehicle 50 and/or a system 60 for product quality control and/or a system 70 for monitoring areas is controlled with this control signal 180a.
  • The loss ΔA in classification accuracy, which occurs when the input images are noisy with a strength P, is plotted for a neural network 1 of the WideResNet-18 architecture designed as an image classifier.
  • the curves a to e relate to states of the neural network 1 after different training sessions.
  • The experiment used the publicly accessible STL-10 data set, which contains 5000 training images and 8000 test images with a size of 96×96 pixels from 10 different classes.
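  • A sketch of a corresponding evaluation setup (the additive Gaussian noise model and its strength are assumptions made here for illustration; torchvision's STL10 loader is used for the data set):

```python
import torch
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

def accuracy_under_noise(model, noise_strength, batch_size=64):
    """Classification accuracy on the STL-10 test split with additive noise of a given strength."""
    test_set = torchvision.datasets.STL10(root='./data', split='test', download=True,
                                          transform=T.ToTensor())
    loader = DataLoader(test_set, batch_size=batch_size)
    correct = total = 0
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            noisy = (images + noise_strength * torch.randn_like(images)).clamp(0.0, 1.0)
            predictions = model(noisy).argmax(dim=-1)
            correct += (predictions == labels).sum().item()
            total += labels.numel()
    return correct / total

# Loss of classification accuracy relative to the noise-free case, as plotted in FIG. 2:
# delta_a = accuracy_under_noise(model, 0.0) - accuracy_under_noise(model, 0.1)
```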
  • Curve a refers to conventional training.
  • the curves b to e relate to different examples of training according to the method 100 described here.
  • the loss of accuracy caused by the noisy input images can be at least partially compensated again by the improved training. For some configurations, even with noise-free input images, there is already a gain (curve above curve a).


Abstract

The invention relates to a method (100) for training a neural network (1) that is designed to process input images (2) and comprises multiple convolutional layers, each convolutional layer being designed to map the input f of the respective convolutional layer onto at least one feature map Φ(f, κ) by applying at least one filter kernel κ. The method comprises the steps of: • providing (110) a set 𝒯 of transformations T with respect to which the neural network (1) is to be enabled to learn how to generate at least one equivariant or invariant feature map Φ(f, κ) when these transformations are applied to the input f of at least one convolutional layer; • expressing (120) the feature map Φ(f, κ) by an aggregation (5), parameterized with parameters (5a), of feature maps Φ_j(f, T_j[κ]), each of which is obtained by applying transformations T_j ∈ 𝒯 to the filter kernel(s) κ; • providing (130) learning images (2a) and learning outputs (3a) onto which the trained neural network (1) should ideally map the learning images (2a); • mapping (140) the learning images (2a) onto outputs (3) by the neural network (1); • evaluating (150) deviations of the outputs (3) from the learning outputs (3a) using a specified cost function (4); and • optimizing (160) parameters (5a) of the parameterized aggregation (5) as well as further parameters (1a) that characterize the behavior of the neural network (1), with the aim of an expected improvement of the evaluation (4a) by the cost function (4) when further learning images (2a) are processed.
PCT/EP2022/071667 2021-08-12 2022-08-02 Training of neural networks for equivariance or invariance with respect to changes in the input image WO2023016859A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102021208877.5 2021-08-12
DE102021208877.5A DE102021208877A1 (de) 2021-08-12 2021-08-12 Training von neuronalen Netzwerken auf Äquivarianz oder Invarianz gegen Änderungen des Eingabe-Bildes

Publications (1)

Publication Number Publication Date
WO2023016859A1 true WO2023016859A1 (fr) 2023-02-16

Family

ID=83115415

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/071667 WO2023016859A1 (fr) 2021-08-12 2022-08-02 Entraînement de réseaux neuronaux pour une équivariance ou une invariance par rapport à des changements dans l'image d'entrée

Country Status (2)

Country Link
DE (1) DE102021208877A1 (fr)
WO (1) WO2023016859A1 (fr)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102018204494B3 (de) 2018-03-23 2019-08-14 Robert Bosch Gmbh Erzeugung synthetischer Radarsignale

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102019214402A1 (de) 2019-09-20 2021-03-25 Robert Bosch Gmbh Verfahren und vorrichtung zum verarbeiten von daten mittels eines neuronalen konvolutionsnetzwerks

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102018204494B3 (de) 2018-03-23 2019-08-14 Robert Bosch Gmbh Erzeugung synthetischer Radarsignale

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Artificial neural network", WIKIPEDIA, 8 August 2021 (2021-08-08), pages 1 - 11, XP055978327, Retrieved from the Internet <URL:https://en.wikipedia.org/w/index.php?title=Artificial_neural_network&oldid=1037676373> [retrieved on 20221107] *
ANONYMOUS: "predict", 19 June 2021 (2021-06-19), pages 1 - 9, XP055978487, Retrieved from the Internet <URL:https://web.archive.org/web/20210619230904/https://www.mathworks.com/help/stats/classificationneuralnetwork.predict.html> [retrieved on 20221107] *
SADAF GULSHAD ET AL: "Built-in Elastic Transformations for Improved Robustness", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 20 July 2021 (2021-07-20), XP091014161 *

Also Published As

Publication number Publication date
DE102021208877A1 (de) 2023-02-16

Similar Documents

Publication Publication Date Title
DE102017218889A1 (de) Unscharf parametriertes KI-Modul sowie Verfahren zum Betreiben
DE112020000448T5 (de) Kameraselbstkalibrierungsnetz
DE102018208763A1 (de) Verfahren, Vorrichtung und Computerprogramm zum Betreiben eines maschinellen Lernsystems
DE102019209644A1 (de) Verfahren zum Trainieren eines neuronalen Netzes
EP3948688A1 (fr) Entraînement pour réseaux neuronaux artificiels avec une meilleure utilisation des jeux de données d&#39;apprentissage
DE102018220941A1 (de) Auswertung von Messgrößen mit KI-Modulen unter Berücksichtigung von Messunsicherheiten
DE102021203587A1 (de) Verfahren und Vorrichtung zum Trainieren eines Stilencoders eines neuronalen Netzwerks und Verfahren zum Erzeugen einer einen Fahrstil eines Fahrers abbildenden Fahrstilrepräsentation
DE102018222294A1 (de) Verfahren, Computerprogramm, maschinenlesbares Speichermedium sowie Vorrichtung zur Datenvorhersage
WO2023016859A1 (fr) Entraînement de réseaux neuronaux pour une équivariance ou une invariance par rapport à des changements dans l&#39;image d&#39;entrée
DE102019219734A1 (de) Auswertungssystem für Messdaten aus mehreren Domänen
EP3857455A1 (fr) Système d&#39;apprentissage automatique ainsi que procédé, programme informatique et dispositif pour créer le système d&#39;apprentissage automatique
DE102020214850A1 (de) Energie- und speichereffizientes Training neuronaler Netzwerke
WO2021245151A1 (fr) Apprentissage non surveillé d&#39;une présentation commune de données provenant de capteurs de modalité différente
DE102020208765A1 (de) Bildklassifikator mit variablen rezeptiven Feldern in Faltungsschichten
WO2021175783A1 (fr) Procédé mis en oeuvre par ordinateur, système de génération de données de capteur synthétiques et procédé d&#39;apprentissage
EP3748574A1 (fr) Correction adaptative des données mesurées en fonction de différents types de défaillances
WO2006134011A1 (fr) Procede de traitement de donnees numeriques assiste par ordinateur
DE102019210167A1 (de) Robusteres Training für künstliche neuronale Netzwerke
DE102019114049A1 (de) Verfahren zur Validierung eines Fahrerassistenzsystems mithilfe von weiteren generierten Testeingangsdatensätzen
DE102019103192A1 (de) Verfahren zum Erzeugen von Trainingsdaten für ein digitales, lernfähiges Kamerasystem
DE102022202999A1 (de) Erzeugung von Testdatensätzen für die Prüfung, inwieweit ein trainierter Klassifikator zur Generalisierung fähig ist
DE102021201019A1 (de) Semantische Segmentierung von Bildern ohne kleinteilig gelabelte Trainingsbilder
DE102021124252A1 (de) Neuronale Netzwerksysteme für abstraktes Denken
DE102021115251A1 (de) Erzeugen eines Eingangsbilds für einen Algorithmus zur Computer-Vision
DE102021132542A1 (de) Verfahren zum bereitstellen eines bit-flip-fehlerrobusten; perturbationsrobusten und komprimierten neuronalen netzes; computerprogramm; fahrerassistenzsystem

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22761104

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE