FR3113273A1

FR3113273A1 - Automated Neural Network Compression for Autonomous Driving

Info

Publication number: FR3113273A1
Application number: FR2008291A
Authority: FR
Inventors: Hugo Tessier; Thomas Hannagan; Vincent Gripon
Original assignee: PSA Automobiles SA
Current assignee: PSA Automobiles SA
Priority date: 2020-08-05
Filing date: 2020-08-05
Publication date: 2022-02-11

Abstract

The invention relates to the autonomous driving of a vehicle, and in particular to the generation of a driving instruction from data obtained from at least one sensor of the vehicle, by executing (34), on a device included in the vehicle, an algorithm based on a neural network. Representative figure: FIG. 1.

Description

Automated Neural Network Compression for Autonomous Driving

The present invention belongs to the field of autonomous driving, and in particular to the neural networks used in this field. More specifically, it concerns neural networks whose size and complexity are optimized by an automated compression performed during training.

“Motor land vehicle” means any type of vehicle, such as a motor vehicle, a moped, a motorcycle, a storage robot in a warehouse, etc.

The term “neural network” means any type of artificial neural network belonging to the family of artificial intelligence methods based on learning. A convolutional neural network (CNN), a recurrent cascade-correlation network and a backpropagation network are examples of neural networks. A neural network is also referred to as a neural net. In the present invention, a deep neural network, that is, one comprising at least five layers, is preferably used.

Autonomous vehicles use neural networks to perform autonomous driving functions. These networks require many parameters, in the form of connection weights and, for convolutional networks, convolution kernels, which have a cost in terms of onboard memory.

In particular, these neural networks require significant computational resources, since they involve many floating-point tensor multiplications and additions: the ResNet50 network, for example, can require 3.8 billion floating-point operations (FLOPs) per image. These networks can also require a lot of memory: whereas ResNet50 has about 25 million parameters, the largest networks reach hundreds of billions of parameters (Microsoft's DeepSpeed library, a registered trademark, aims to make it possible to train such networks). Such specifications lead to a high energy requirement: known solutions can consume up to 30 watts, which can reduce the vehicle's range.

Such resource demands arise because the performance of neural networks benefits greatly from increasing their number of parameters during training, which explains the explosion in network size over the last decade. However, this does not mean that all these parameters remain useful once the network is trained, hence the emergence of pruning techniques: they make it possible to benefit from the capacity of large networks to find optimal solutions while limiting the impact of their size on resource requirements, by eliminating unnecessary parameters after training.

To lighten a deep neural network, so as to reduce the computational and memory resources it requires, it is possible to delete, or “prune”, the connections deemed least necessary according to a criterion that varies from one technique to another.

One of the most widespread criteria is based on the value of the parameters, i.e. their “magnitude”: parameters of smaller magnitude are considered less important and are therefore eliminated first. However, for parameter pruning to yield an actual hardware speedup, entire neurons must be pruned, which is called “structured pruning”.
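The difference between the two pruning granularities can be sketched in plain Python. This is a minimal illustration, assuming a convolution weight tensor stored as nested lists `weights[out][in][k][k]` and, for the structured case, ranking output filters by their L1 norm (one common criterion, assumed here rather than taken from the cited documents); all function names and values are hypothetical.

```python
def l1_norm(filt):
    """Sum of absolute values of one output filter filt[in][k][k]."""
    return sum(abs(w) for ch in filt for row in ch for w in row)


def unstructured_prune(weights, threshold):
    """Zero every individual weight with |w| < threshold.
    The tensor keeps its shape, so dense hardware still does the same work."""
    return [[[[w if abs(w) >= threshold else 0.0 for w in row]
              for row in ch] for ch in filt] for filt in weights]


def structured_prune(weights, n_keep):
    """Keep the n_keep output filters (neurons) with the largest L1 norm.
    Whole filters disappear, so the layer actually shrinks."""
    ranked = sorted(range(len(weights)),
                    key=lambda o: l1_norm(weights[o]), reverse=True)
    kept = sorted(ranked[:n_keep])
    return [weights[o] for o in kept], kept


# Hypothetical layer: 3 output filters, 1 input channel, 2x2 kernels.
weights = [
    [[[0.9, -0.1], [0.05, 0.8]]],     # L1 = 1.85
    [[[0.01, 0.02], [-0.03, 0.04]]],  # L1 = 0.10
    [[[-0.5, 0.6], [0.7, -0.2]]],     # L1 = 2.0
]
sparse = unstructured_prune(weights, threshold=0.15)
smaller, kept = structured_prune(weights, n_keep=2)
print(len(sparse), len(smaller), kept)
```

Unstructured pruning returns a tensor of the same shape with scattered zeros, which dense hardware still processes in full; structured pruning returns a tensor with fewer filters, which is what produces the speedup mentioned above.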

The following documents review various pruning techniques:

The State of Sparsity in Deep Neural Networks (https://arxiv.org/pdf/1902.09574), Rethinking the Value of Network Pruning (https://arxiv.org/pdf/1810.05270) and What is the state of neural network pruning? (https://arxiv.org/pdf/2003.03033.pdf);
Method and system for vision-centric deep-learning-based road situation analysis - US9760806B1;
Systems and methods involving features of adaptive and/or autonomous traffic control - US20150134232A1;
“AutoPrune: Automatic Network Pruning by Regularizing Auxiliary Parameters” (http://papers.nips.cc/paper/9521-autoprune-automatic-network-pruning-by-regularizing-auxiliary-parameters.pdf).

Parameter pruning is usually the result of an arbitrary process imposed manually on the network during training, rather than something the network does by itself. Although it reduces the number of parameters, this process generally works against the algorithm used to train the network (gradient backpropagation), which results in a loss of performance; the network must then be retrained to compensate for this loss (fine-tuning).

As for AutoPrune, it uses a technique introduced in “Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1” (https://arxiv.org/pdf/1602.02830.pdf), which involves an arbitrary alteration of the gradient (the mathematical tool used to train neural networks). This alteration consists in computing the gradients with respect to the binarized parameters but applying them to the corresponding non-binarized parameters, which introduces an inconsistency between what the gradients are applied to and how they were computed.
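The inconsistency can be seen on a single weight. The sketch below assumes a one-weight linear model, a squared-error loss and the plain identity form of the straight-through estimator (the BNN paper additionally cancels the gradient when |w| > 1, which is omitted here). It computes the gradient with respect to the binarized weight and applies it to the real-valued one; `ste_step` is a hypothetical name.

```python
def sign(w):
    """Binarization used in the forward pass (sign(0) taken as +1)."""
    return 1.0 if w >= 0 else -1.0


def ste_step(w, x, y, lr):
    """One update of a single real-valued weight w for the model
    pred = sign(w) * x with loss (pred - y)^2."""
    b = sign(w)                     # forward pass on the binarized weight
    grad_b = 2.0 * (b * x - y) * x  # gradient w.r.t. b, not w
    # The exact derivative of sign(w) is 0 almost everywhere, so the true
    # gradient w.r.t. w is 0; the straight-through estimator nonetheless
    # applies grad_b to the real-valued w.
    return w - lr * grad_b


w_new = ste_step(0.2, x=1.0, y=-1.0, lr=0.1)
print(w_new, sign(w_new))
```

The update flips the binarized weight even though the loss, viewed as a function of the real-valued w, has zero derivative at w = 0.2: the gradient is computed for one quantity and applied to another.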

There is therefore a need to save memory and computing resources for neural networks embedded in autonomous vehicles. Moreover, the computational cost of training a network with state-of-the-art pruning techniques is too high.

The present invention improves the situation.

To this end, a first aspect of the invention relates to a method for the autonomous driving of a vehicle, for generating a driving instruction from data obtained from at least one sensor of the vehicle, by executing, on a device included in the vehicle, an algorithm based on applying a neural network to the data, the method comprising the steps of:

- receiving the data obtained from the sensor;
- generating the driving instruction by applying the algorithm;

characterized in that the neural network is simplified during its training by pruning, the pruning consisting in training the neural network while applying masks to it, the application of the masks during training removing at least one convolution kernel of the neural network.

Pruning by simply applying masks configured to directly remove convolution kernels requires fewer steps. In particular, whereas the usual compression methods require three steps (training, pruning, retraining), which may be repeated several times, the method according to the invention is limited to a single step: training with masks.

The only external intervention (pruning the weights) is guaranteed to have no impact on the network, since it only removes weights whose masks are already zero, and is thus fully transparent, while reducing the computation time required for training.

Since the masks are elements of the network made up of parameters, they can be learned during training. For the network, this amounts to learning by itself to reduce its own architecture during training, in a purely automated and natural way, without any external intervention that could distort learning.

In one embodiment, the application of the masks during training includes the sub-steps of:

- obtaining predetermined training data;
- applying the neural network to the predetermined training data, the application comprising applying an activation function.

In one embodiment, the activation function is a non-saturating activation function.

In one embodiment, the non-saturating activation function is a rectified linear unit (ReLU).

Thus, all the negative parameters of the masks become zero. In this way, pruning is natural and differentiable, and does not break the computational graph (used by deep learning frameworks to perform learning by error backpropagation). Moreover, since the derivative of ReLU is zero for negative inputs, feature maps (the inputs/outputs between the layers of a neural network) that have been cancelled by negative mask elements cannot be reactivated: these negative parameters receive no gradient and can therefore no longer learn. As a result, the corresponding neurons are pruned during training.
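The zero-gradient argument can be checked by hand on a toy layer. The sketch below assumes one scalar mask parameter per feature map, feature maps flattened to lists, and a given upstream gradient; it shows that a negative mask entry both zeroes its feature map in the forward pass and receives a zero gradient in the backward pass. All names and values are hypothetical.

```python
def relu(v):
    return v if v > 0 else 0.0


def relu_grad(v):
    """Derivative of ReLU (taken as 0 at v == 0)."""
    return 1.0 if v > 0 else 0.0


def masked_forward(mask, feats):
    """Multiply each feature map by relu(mask_j)."""
    return [[relu(m) * a for a in fmap] for m, fmap in zip(mask, feats)]


def mask_grads(mask, feats, upstream):
    """dL/dm_j = relu'(m_j) * sum_k upstream[j][k] * feats[j][k]."""
    return [relu_grad(m) * sum(g * a for g, a in zip(gs, fmap))
            for m, fmap, gs in zip(mask, feats, upstream)]


mask = [0.5, -0.3, 1.2]
feats = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
upstream = [[1.0, 1.0]] * 3  # assume dL/dy = 1 for every activation

out = masked_forward(mask, feats)
grads = mask_grads(mask, feats, upstream)
print(out, grads)
```

The second feature map is zeroed and its mask gradient is exactly 0, so gradient descent can never move that mask back above zero.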

A second aspect of the invention relates to a computer program comprising instructions for implementing the method according to the first aspect of the invention when these instructions are executed by a processor.

A third aspect of the invention relates to a device for the autonomous driving of a vehicle, for generating a driving instruction from data obtained from at least one sensor of the vehicle, by executing, on a device included in the vehicle, an algorithm based on applying a neural network to the data, the device comprising at least one memory and at least one processor arranged to perform the operations of:

- receiving the data obtained from the sensor;

- generating the driving instruction by applying the algorithm.

A fourth aspect of the invention relates to a vehicle configured to include the device according to the third aspect of the invention.

Other features and advantages of the invention will become apparent on examining the detailed description below and the appended drawings, in which:

Figure 1 is a diagram illustrating the steps of a method according to one embodiment of the invention;

Figure 2 is a diagram illustrating the theoretical application of masks in one embodiment of the invention;

Figure 3 is a diagram illustrating a result of a convolution in one embodiment of the invention;

Figure 4 is a diagram illustrating experimental results for one embodiment of the invention;

Figure 5 illustrates the structure of a device according to one embodiment of the invention.

The invention is described below in its non-limiting application to a convolutional neural network (CNN, or ConvNet) used to analyze an image acquired by a camera of an autonomous vehicle. Other applications of the present invention are naturally possible. For example, the method according to the invention can be implemented by the autonomous vehicle for sensor data fusion steps (aggregating images acquired by the cameras with data from radars, lidars, ultrasonic sensors or lasers) or for driving decision steps.

Figure 1 illustrates a method according to one embodiment of the invention.

In a step 30, data Img is obtained from at least one sensor of an autonomous vehicle. In the present embodiment, Img corresponds to a plurality of images acquired by a camera, such as a multifunction camera located at the top center of the vehicle's windshield.

In a step 32, Img is processed to extract data P_Img usable by a neural network. Such processing consists, for example, in a first filtering of the images to enhance contrast or to remove visual disturbances, such as those caused by dirt on the camera. Another possible processing is fusion with data from other sensors, such as a radar or a laser, to enrich the images with additional information or with reliability information about the images.

In a step 34, an algorithm based on a neural network N is applied to P_Img, yielding an output S. In a particular embodiment, N is applied directly to Img.

In the case of image recognition, S includes data relating to the vehicle's environment, typically the identification of objects relevant to autonomous driving. For example, S includes the information that two trucks are present in a given area of the image, that a pedestrian is near a zebra crossing, etc.

The neural network is simplified during its training by pruning. Pruning consists in training the neural network while applying masks to it. Applying the masks during training removes at least one convolution kernel from the neural network.

In particular, the application of the masks during training includes the sub-steps of:

- obtaining predetermined training data;
- applying the neural network to the predetermined training data, the application comprising applying an activation function.

The activation function is conventionally applied between layers of the network. In addition, the activation function is configured so that the negative parameters of the neurons to which it is applied become zero.

The activation function is a non-saturating activation function, such as a rectified linear unit (ReLU).

Learning therefore consists in learning masks made up of ordinary parameters and, before multiplying them with the feature maps, applying a ReLU activation function to the masks so that all their negative parameters become zero. The cancellation of negative parameters is explained below with reference to figures 2 and 3.

In the present description, “parameter” means a learned parameter. A distinction is made between learned and non-learned parameters: it is common, as in the present description, to call the learned parameters “parameters” and the non-learned ones “meta-parameters” or “hyperparameters”. Depth, stride and padding are not learned; they are meta-parameters. In contrast, the connection weights and the elements of the convolution kernels are learned parameters, which the present invention proposes to compress.

In a step 36, post-processing is applied to S, yielding P_S. For example, the post-processing comprises applying a decision block, which may itself be based on a neural network, that translates the environmental information into driving instructions (steering angle, acceleration, turn signals, etc.).

In a step 38, the driving instruction is applied to the components in charge of autonomous driving. For example, the steering angle is transmitted to the ESP (Electronic Stability Program, or electronic stability control) computer, which applies it.

Figure 2 illustrates the application of masks leading to the cancellation of feature maps.

The neural network N is made up of layers, each applying a function to an input and returning an output, whose dimensions may differ. The example taken here is the convolution layer, but the principle extends to dense layers or to any other type of layer: it suffices to associate each mask parameter with the input connections or with the output of a single neuron.
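For a dense layer, the same idea can be sketched with one mask parameter per hidden neuron. The sketch assumes a two-layer perceptron with ReLU hidden units and no biases (a simplification for illustration; all names and values are hypothetical). A neuron whose mask is non-positive can be removed, which deletes its row in the first weight matrix and its column in the second, without changing the output.

```python
def relu(v):
    return v if v > 0 else 0.0


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mlp(W1, W2, x, mask=None):
    """Two dense layers with ReLU hidden units; optional per-neuron mask."""
    h = [relu(z) for z in matvec(W1, x)]
    if mask is not None:
        h = [relu(m) * hj for m, hj in zip(mask, h)]
    return matvec(W2, h)


def prune_neurons(W1, W2, mask):
    """Remove hidden neurons whose relu(mask) is 0: their row of W1 and
    their column of W2. Surviving mask values are folded into W2."""
    keep = [j for j, m in enumerate(mask) if relu(m) > 0]
    W1p = [W1[j] for j in keep]
    W2p = [[row[j] * relu(mask[j]) for j in keep] for row in W2]
    return W1p, W2p, keep


W1 = [[1.0, -2.0], [0.5, 0.5], [3.0, 1.0]]  # 3 hidden neurons, 2 inputs
W2 = [[1.0, 2.0, -1.0]]                     # 1 output
mask = [0.8, -0.4, 1.5]
x = [1.0, 0.25]

W1p, W2p, keep = prune_neurons(W1, W2, mask)
print(mlp(W1, W2, x, mask), mlp(W1p, W2p, x))
```

The mask values are folded into the second layer rather than the first because the mask multiplies the neuron's output after its activation; folding them there is valid for any activation function.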

Consider, as illustrated in figure 2, a convolution taking as input f_in feature maps (in a sense, an image with f_in channels) of size in_w × in_h and returning as output f_out feature maps of size out_w × out_h.

In figure 2, each input and output feature map of convolution 16 is associated with one mask parameter. A product 14 is computed between these parameters and their corresponding input feature maps, and likewise a product 18 on the output. Since mask parameters can be zero (grayed out, references 10 and 12), this amounts to cancelling the corresponding feature maps (also shown in red).

Such a cancellation is therefore equivalent to the situation illustrated in figure 3, which shows the result on the convolution, represented by operation 20. The input and output dimensions can then be considered to be no longer f_in and f_out but f_in' and f_out', each equal to the L0 norm (the number of non-zero elements) of the corresponding mask. The network has thus actually learned f_in' and f_out', whereas without this technique these dimensions are hyperparameters (parameters used to control the training process) on which the network depends without being able to learn them.
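The equivalence between masking and a physically smaller convolution can be checked numerically. The sketch below assumes 1×1 kernels, so that the convolution reduces to a per-pixel weighted sum of channels (the s×s case follows the same channel indexing), folds the positive mask values into the remaining weights, and reads f_in' and f_out' off the ReLU-activated masks. Function names and values are hypothetical.

```python
def relu(v):
    return v if v > 0 else 0.0


def conv1x1(W, maps):
    """1x1 convolution: out[o][p] = sum_i W[o][i] * maps[i][p]."""
    n_pix = len(maps[0])
    return [[sum(W[o][i] * maps[i][p] for i in range(len(maps)))
             for p in range(n_pix)]
            for o in range(len(W))]


def masked_conv(W, maps, m_in, m_out):
    """Input masks (product 14), convolution (16), output masks (product 18)."""
    x = [[relu(m) * v for v in fmap] for m, fmap in zip(m_in, maps)]
    y = conv1x1(W, x)
    return [[relu(m) * v for v in fmap] for m, fmap in zip(m_out, y)]


def shrink(W, m_in, m_out):
    """Keep only channels whose ReLU-activated mask is non-zero and fold
    the mask values into the remaining weights."""
    ins = [i for i, m in enumerate(m_in) if relu(m) > 0]
    outs = [o for o, m in enumerate(m_out) if relu(m) > 0]
    Wp = [[relu(m_out[o]) * W[o][i] * relu(m_in[i]) for i in ins]
          for o in outs]
    return Wp, ins, outs


W = [[1.0, 2.0, 0.5], [-1.0, 0.0, 1.0], [0.3, 0.7, -0.2]]
maps = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]  # 3 input maps of 2 pixels
m_in, m_out = [1.0, -0.2, 0.5], [0.9, 0.0, 2.0]

full = masked_conv(W, maps, m_in, m_out)
Wp, ins, outs = shrink(W, m_in, m_out)
small = conv1x1(Wp, [maps[i] for i in ins])
print(len(ins), len(outs))  # f_in', f_out'
```

The smaller convolution reproduces every non-cancelled output map of the masked one, which is why removing the masked weights after training has no effect on the network.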

Figure 4 is a diagram illustrating experimental results for one embodiment of the invention.

To encourage the masks to have as many zero elements as possible (in practice, negative elements, which the ReLU maps to zero in this implementation), a regularization term is added to the loss function. This term seeks to estimate the number of parameters dedicated to processing each feature map. For a network N with parameters w, trained on the dataset D with the loss function L and composed of layers indexed by i, the optimization problem solved by training, with this regularization term added, is written as follows:

In this formula, s is the size of the convolution kernel. It is squared because a square kernel of size s contains s × s weights: for a kernel of size 3, there are 3 × 3 = 9 parameters.
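The exact regularized objective is not reproduced in this text. The sketch below illustrates one plausible form, stated purely as an assumption: the penalty for layer i is s_i² multiplied by the number of active input maps and the number of active output maps, which is the parameter count of the pruned layer (biases ignored), and the total objective is the task loss plus λ times the sum over layers. The class `MaskedConv` and all function names are hypothetical. Note that this count is piecewise constant in the mask values, so gradient-based training would need a differentiable surrogate, which the text does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MaskedConv:
    """Hypothetical description of one masked convolutional layer i."""
    kernel_size: int        # s_i
    mask_in: List[float]    # raw mask values, one per input feature map
    mask_out: List[float]   # raw mask values, one per output feature map


def l0_after_relu(mask: List[float]) -> int:
    """Number of mask entries that survive ReLU, i.e. ||ReLU(m)||_0."""
    return sum(1 for m in mask if m > 0.0)


def layer_params(layer: MaskedConv) -> int:
    """Weights kept in layer i: s_i^2 * f_in' * f_out' (ASSUMED penalty form, biases ignored)."""
    s = layer.kernel_size
    return s * s * l0_after_relu(layer.mask_in) * l0_after_relu(layer.mask_out)


def param_penalty(layers: List[MaskedConv]) -> int:
    """Estimated total number of parameters of the pruned network."""
    return sum(layer_params(layer) for layer in layers)


def regularized_loss(task_loss: float, layers: List[MaskedConv], lam: float) -> float:
    """Task loss plus lambda times the estimated parameter count."""
    return task_loss + lam * param_penalty(layers)


if __name__ == "__main__":
    layers = [
        MaskedConv(kernel_size=3, mask_in=[1.0, 0.3, -0.4], mask_out=[0.9, -0.1, 0.2, 0.7]),
        MaskedConv(kernel_size=1, mask_in=[0.5, 0.5, 0.5], mask_out=[1.0, 2.0]),
    ]
    print(param_penalty(layers))   # 9*2*3 + 1*3*2 = 60
    print(regularized_loss(1.25, layers, lam=0.01))
```

With λ = 0 the objective reduces to the plain task loss; any positive λ adds a cost proportional to the number of surviving weights.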

Increasing λ (the coefficient that sets the weight of the regularization term during training) forces the network to reduce its number of parameters.

The trade-off between the number of parameters and the accuracy of the network is shown in Figure 4. In this figure, the left part shows experimental results on the CIFAR-10 dataset with a ResNet20 network, and the right part shows experimental results on the CIFAR-10 dataset with a ResNet18 network. In both parts, the vertical axis 40 is the test accuracy and the horizontal axis 42 is the number of parameters.

The point with the largest number of parameters corresponds to λ = 0. The other points correspond to non-zero values of λ: the larger λ was, the fewer parameters remained at the end of training.

In another embodiment, whereas the term added to the preceding equation penalizes the number of parameters, the number of operations can be penalized instead by multiplying the penalty term of each layer by a layer-specific factor. These two types of penalty (memory and computation) can be combined so as to let training choose which aspect to minimize with the least negative impact on network performance. This matters because penalizing operations alone risks pushing training to prune the first layers most heavily, even though these are the most important to keep for performance.
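The difference between a memory penalty and a computation penalty can be illustrated with a short sketch. It assumes, as is standard for convolutions but not stated explicitly in this text, that the per-layer factor converting a parameter count into an operation count is the spatial size H × W of the layer's output feature maps (each weight is used once per output position). The class `LayerCost`, the weights `mu_mem` and `mu_ops`, and the example layer shapes are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerCost:
    """Hypothetical per-layer summary after masking."""
    kernel_size: int   # s_i
    active_in: int     # f_in' (non-zero input mask entries)
    active_out: int    # f_out' (non-zero output mask entries)
    out_h: int         # ASSUMPTION: height of the layer's output feature maps
    out_w: int         # ASSUMPTION: width of the layer's output feature maps


def params(layer: LayerCost) -> int:
    """Memory cost: s^2 * f_in' * f_out' weights."""
    return layer.kernel_size ** 2 * layer.active_in * layer.active_out


def ops(layer: LayerCost) -> int:
    """Compute cost in multiply-accumulates: each weight is used once per output position."""
    return params(layer) * layer.out_h * layer.out_w


def combined_penalty(layers: List[LayerCost], mu_mem: float, mu_ops: float) -> float:
    """Weighted sum of the memory and computation penalties over all layers."""
    return sum(mu_mem * params(layer) + mu_ops * ops(layer) for layer in layers)


if __name__ == "__main__":
    early = LayerCost(3, 16, 16, 32, 32)   # large feature maps, few channels
    late = LayerCost(3, 64, 64, 8, 8)      # small feature maps, many channels
    for name, layer in (("early", early), ("late", late)):
        print(name, params(layer), ops(layer))
    # early 2304 2359296
    # late 36864 2359296
```

In this example the two layers cost the same number of operations, but each early-layer parameter carries a weight of 32 × 32 = 1024 under the operation penalty versus 8 × 8 = 64 for the late layer. An operation-only penalty therefore makes early-layer weights 16 times more expensive, which is the bias toward pruning the first layers that combining the two penalties is meant to temper.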

Figure 5 shows an example of a device D included in the vehicle VEH, in the network CLD or in the server SRVR. This device D can be used as a centralized device responsible for at least some steps of the method described above with reference to Figure 1. In one embodiment, it corresponds to an autonomous driving computer.

In the present invention, the device D is included in the vehicle.

This device D can take the form of a housing containing printed circuit boards, of any type of computer, or even of a smartphone.

The device D comprises a random access memory 1 for storing instructions for the implementation, by a processor 2, of at least one step of the methods described above. The device also comprises a mass memory 3 for storing data intended to be kept after the method has been carried out.

The device D may further comprise a digital signal processor (DSP) 4. This DSP 4 receives data in order to shape, demodulate and amplify it, in a manner known per se.

The device also comprises an input interface 5 for receiving the data used by the methods according to the invention and an output interface 6 for transmitting the data produced by the method.

The present invention is not limited to the embodiments described above by way of example; it extends to other variants.

Thus, an embodiment applying a CNN-type neural network to image analysis has been described. Other applications are possible, for example applying a recurrent neural network (RNN) to decide autonomous driving instructions from environmental information.

Claims

1. Method for the autonomous driving of a vehicle, for generating a driving instruction from data obtained from at least one sensor of the vehicle, by executing, on a device included in the vehicle, an algorithm based on a neural network, the method comprising the steps of:

receiving (30) the data obtained from the sensor;
generating the driving instruction from the data obtained, by applying the algorithm;

characterized in that the neural network is simplified during its training by pruning, the pruning consisting in training the neural network while applying masks to it, the application of the masks during training removing at least one convolution kernel of the neural network.

2. Method according to claim 1, wherein applying the masks during training comprises the sub-steps of:

3. Method according to claim 2, wherein the activation function is a non-saturating activation function.

4. Method according to claim 3, wherein the non-saturating activation function is a rectified linear unit (ReLU).

5. Computer program comprising instructions for implementing the method according to any one of the preceding claims when these instructions are executed by a processor (2).

6. Device (D) for the autonomous driving of a vehicle, for generating a driving instruction from data obtained from at least one sensor of the vehicle, by executing, on a device included in the vehicle, an algorithm based on a neural network, the device comprising at least one memory and at least one processor arranged to perform the operations of:

receiving the data obtained from the sensor;
generating the driving instruction from the data obtained, by applying the algorithm;

7. Vehicle configured to include the device according to claim 6.