EP4012620A1

EP4012620A1 - Method for automatically learning by transfer

Info

Publication number: EP4012620A1
Application number: EP21213450.6A
Authority: EP
Inventors: Mohamed El Amine SEDDIK; Mohamed TAMAAZOUSTI
Original assignee: Commissariat à l'Energie Atomique CEA; Commissariat à l'Energie Atomique et aux Energies Alternatives CEA
Current assignee: Commissariat à l'Energie Atomique et aux Energies Alternatives CEA
Priority date: 2020-12-14
Filing date: 2021-12-09
Publication date: 2022-06-15
Also published as: FR3117647A1

Abstract

A computer-implemented method of transfer learning comprising the steps of:
- training (401) an artificial neural network (RS) to solve a first, source classification problem from at least a first set of source training data (DAS),
- updating (404) the synaptic weights of the last layer of the trained artificial neural network from a second set of target training data (DAc), in order to solve a second, target classification problem,
- for each neuron of the last layer of the artificial neural network associated with a class, said synaptic weights being proportional to the normalized centered mean of the representations of the second set of target training data (DAc) that are extracted from the last hidden layer of the artificial neural network and belong to said class.

Description

The invention relates to the field of machine learning methods, and in particular to artificial neural networks.

More specifically, the invention relates to the field of transfer learning, which aims to transfer knowledge from a source task to a target task. The general problem of transfer learning can be seen as the ability of a system to recognize and apply knowledge, learned from previous tasks, to new tasks or domains that share similarities.

In the case of artificial neural networks, transfer learning consists, for example, in learning to solve a particular problem starting from a network that has previously learned to solve a more general problem.

In the field of transfer learning applied to artificial neural networks, one problem to be solved is to optimize a network pre-trained on a general problem in order to adapt it to a more specific problem, while limiting computational resources, both in terms of the computation time allocated to learning the specific problem and of the memory needed to store the data used for that learning.

In other words, one objective in this field is to perform machine learning for a particular problem while benefiting from a network's training on a more general problem, thereby limiting the implementation complexity of learning the particular problem.

Another problem lies in the availability of the large amounts of training data that are generally required for effective machine learning.

For some applications, training data can be expensive to acquire and is not always available in large quantities. There is thus a need to reuse neural networks pre-trained on general training databases, with a minimum of adaptation, in order to solve a specific problem from a limited quantity of new training data.

This need is all the more pressing as many pre-trained network models are now available online. A pre-trained network is a neural network whose parameters (in particular the network weights, or synaptic coefficients) have been determined by training on a predefined training database to solve a given problem.

In particular, the problem may be a classification problem. For example, a network pre-trained to recognize cars in a scene can subsequently be used to solve the more specific problem of classifying the types of cars identified.

In this context, another problem faced by the user is choosing the pre-trained network best suited to solving the specific problem when several networks and/or several training databases are available.

A similar problem is to select the training data that best adapt the performance of the pre-trained network to solving the initial problem.

In general, the pre-trained network must be optimized for the specific problem in a way that limits the number of operations and the memory space occupied.

In the field of transfer learning, known solutions generally consist in performing conventional training, by means of a back-propagation algorithm based on a gradient method, of at least the last hidden layer of the pre-trained neural network, while adapting the last layer of the network to the new specific problem to be solved.

This method has the disadvantage of being costly in operations and in memory, because it requires complete training of at least the last hidden layer, and this for every pre-trained network and/or every new set of training data available.

The invention aims to solve the various problems mentioned above by proposing an optimization criterion for determining the parameters of the last hidden layer of a network pre-trained on a source problem, from new training data used to solve a target problem.

To that end, the invention replaces conventional training of the network with an update of the synaptic weights of the last layer based on the new training data and the target problem. The invention applies advantageously to classification problems.

It thus limits the computational cost of the second training by replacing it with a direct optimization of the parameters of the last layer of the network.

The invention thus makes it possible to select, at a reduced processing cost, the pre-trained network and/or the training data that solve the target problem with the best performance.

The subject of the invention is a computer-implemented method of transfer learning comprising the steps of:

Training an artificial neural network to solve a first, source classification problem from at least a first set of source training data,
Updating the synaptic weights of the last layer of the trained artificial neural network from a second set of target training data, in order to solve a second, target classification problem,
For each neuron of the last layer of the artificial neural network associated with a class, said synaptic weights being proportional to the normalized centered mean of the representations of the second set of target training data that are extracted from the last hidden layer of the artificial neural network and belong to said class.
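As an illustration only, the weight update stated above can be sketched in Python with NumPy. The function names `class_mean_weights` and `predict`, the array layout, and the exact reading of "normalized centered mean" (centering on the mean of all target representations, then scaling each class vector to unit Euclidean norm) are assumptions made for this sketch; the text only requires the weights to be proportional to that quantity.

```python
import numpy as np

def class_mean_weights(features, labels, num_classes):
    """Build one last-layer weight vector per class from hidden representations.

    Assumptions of this sketch (not stated in the source text):
    - "centered" means subtracting the mean over all target samples;
    - "normalized" means scaling each class vector to unit Euclidean norm.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    center = features.mean(axis=0)
    centered = features - center
    weights = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        members = centered[labels == c]
        if len(members) == 0:
            continue  # no target sample for this class: row stays at zero
        mean_c = members.mean(axis=0)
        norm = np.linalg.norm(mean_c)
        weights[c] = mean_c / norm if norm > 0 else mean_c
    return weights, center

def predict(features, weights, center):
    """Score each sample against every class vector and keep the best class."""
    scores = (np.asarray(features, dtype=float) - center) @ weights.T
    return scores.argmax(axis=1)
```

No gradient step is involved: the weights come from a single pass over the target representations, which is where the reduced cost discussed earlier would come from.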

According to a particular aspect, the method according to the invention further comprises a step of replacing the last layer of the trained artificial neural network with a layer specific to the second, target classification problem.

According to a particular aspect of the invention, the specific layer is a layer based on the softmax function.

According to a particular aspect of the invention, to update the synaptic weights of the last layer of the trained artificial neural network, the target training data of the second set are propagated from the input of said network to its last hidden layer, so that said representations can be extracted from it.
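A minimal sketch of this extraction step, assuming a small fully connected network whose hidden layers are given as a list of `(W, b)` pairs and use a hyperbolic tangent activation. The function name `extract_representations` and this layer format are hypothetical and chosen only for illustration.

```python
import numpy as np

def extract_representations(x, hidden_layers):
    """Propagate inputs through the hidden layers only.

    hidden_layers is a hypothetical list of (W, b) pairs, input side first.
    The output layer is deliberately left out, so the returned array holds
    the activations of the last hidden layer.
    """
    h = np.asarray(x, dtype=float)
    for W, b in hidden_layers:
        h = np.tanh(h @ W + b)  # tanh is an assumed activation for this sketch
    return h
```

The pre-trained hidden layers are only read here, never modified, so the same extracted representations can be reused for every class.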

According to a particular aspect of the invention, the artificial neural network is trained on several sets of source training data in order to produce several trained neural networks, the synaptic weights are updated for each trained neural network, and the method further comprises a step of selecting the trained neural network that performs best at solving the second, target classification problem.
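The selection step can be sketched as follows. This is an illustration under several assumptions: each candidate network is represented by a hypothetical callable that maps inputs to last-hidden-layer representations, every class has at least one target sample, and "best performance" is measured as accuracy on a held-out set, which the text does not specify.

```python
import numpy as np

def fit_head(feats, labels, num_classes):
    """Class vectors from centered, unit-normalized class means (assumed reading)."""
    feats = np.asarray(feats, dtype=float)
    labels = np.asarray(labels)
    center = feats.mean(axis=0)
    W = np.stack([(feats[labels == c] - center).mean(axis=0)
                  for c in range(num_classes)])
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return center, W / np.where(norms > 0, norms, 1.0)

def accuracy(feats, labels, center, W):
    """Fraction of samples whose best-scoring class matches the label."""
    pred = ((np.asarray(feats, dtype=float) - center) @ W.T).argmax(axis=1)
    return float((pred == np.asarray(labels)).mean())

def select_network(extractors, x_train, y_train, x_val, y_val, num_classes):
    """Update the last layer for each candidate and keep the best scorer."""
    scores = {}
    for name, extract in extractors.items():
        center, W = fit_head(extract(x_train), y_train, num_classes)
        scores[name] = accuracy(extract(x_val), y_val, center, W)
    best = max(scores, key=scores.get)
    return best, scores
```

Because each candidate only needs one forward pass and a few averages, comparing many pre-trained networks this way stays inexpensive compared with retraining each of them.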

According to a particular aspect of the invention, the second set of target training data is smaller than a set of source training data.

According to a particular aspect of the invention, the synaptic weights are updated for several sets of target training data, the second, target problem is identical to the first, source problem, and the method further comprises a step of selecting the set of target training data that yields the worst performance in solving the second, target classification problem.

In a variant embodiment, the method according to the invention further comprises a new step of training the artificial neural network on the selected set of target training data, or on a combination of the selected set of target training data and the set of source training data.

According to a particular aspect of the invention, the last hidden layer of the artificial neural network has an odd activation function, for example a hyperbolic tangent function.
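An odd function f satisfies f(-x) = -f(x). The short check below, given purely as an illustration, confirms numerically that the hyperbolic tangent has this property on a sample grid, whereas a rectified linear unit (used here only as a counter-example, not mentioned in the text) does not.

```python
import numpy as np

# Symmetric grid of sample points around zero.
x = np.linspace(-3.0, 3.0, 13)

# tanh is odd: tanh(-x) equals -tanh(x) at every point.
tanh_is_odd = np.allclose(np.tanh(-x), -np.tanh(x))

# ReLU is not odd: relu(-3) is 0 while -relu(3) is -3.
def relu(v):
    return np.maximum(v, 0.0)

relu_is_odd = np.allclose(relu(-x), -relu(x))
```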

The invention also relates to a computing device for implementing an artificial neural network that has learned to solve a target classification problem using a transfer learning method according to the invention.

The invention also relates to a computer program comprising instructions for executing the method according to the invention when the program is executed by a processor, and to a processor-readable recording medium on which is recorded a program comprising instructions for executing the method according to the invention when the program is executed by a processor.

Other characteristics and advantages of the present invention will become more apparent on reading the following description with reference to the appended drawings:

[Fig. 1] Figure 1 is a general diagram of an example of an artificial neural network configured to solve a general classification problem, called the source problem,
[Fig. 2] Figure 2 is a diagram of the artificial neural network of Figure 1 modified to solve a specific classification problem, called the target problem,
[Fig. 3] Figure 3 shows the last two layers of the network of Figure 2,
[Fig. 4] Figure 4 is a flowchart of the steps for implementing a transfer learning method according to a first embodiment of the invention,
[Fig. 5] Figure 5 is a flowchart of the steps for implementing a transfer learning method according to a second embodiment of the invention,
[Fig. 6] Figure 6 is a flowchart of the steps for implementing a transfer learning method according to a third embodiment of the invention,
[Fig. 7] Figure 7 shows a comparative example of results obtained with a prior-art method and with the method according to the invention.

Figure 1 is a general diagram of an artificial neural network, for example a convolutional neural network. A neural network is conventionally composed of several layers C_e, C_l, C_l+1, C_L, C_s of interconnected neurons. The network comprises at least an input layer C_e, an output layer C_s and several intermediate layers C_l, C_l+1, C_L, also called hidden layers. The neurons N_i,e of the input layer C_e receive input data. The input data can be of different kinds depending on the intended application; for example, the input data may be the pixels of an image. The general function of a neural network is to learn to solve a given problem, for example a classification, regression, recognition or identification problem. A neural network is used, for example, in image classification or image recognition or, more generally, in the recognition of features that may be visual, audio or textual.

Each neuron of a layer is connected, by its input and/or its output, to all the neurons of the preceding or following layer. More generally, a neuron may be connected to only some of the neurons of another layer, in particular in the case of a convolutional network. The connections between two neurons N_i,e, N_i,l of two successive layers are made through artificial synapses S₁, S₂, S₃, which can be implemented, in particular, by digital memories or by memristive devices. The synapse coefficients are optimized by a learning mechanism of the neural network, whose objective is to train the network to solve a defined problem. This mechanism comprises two distinct phases: a first phase in which data are propagated from the input layer to the output layer, and a second phase in which errors are back-propagated from the output layer to the input layer with, for each layer, an update of the synapse weights. The errors calculated by the output neurons at the end of the data propagation phase are tied to the problem to be solved. In general, this involves determining an error between the value of an output neuron and an expected or target value that depends on the problem to be solved.

During the first, data propagation phase, training data, for example reference images or image sequences, are supplied to the neurons of the input layer and propagated through the network. During this phase, each neuron implements a function that integrates the data it receives; in the case of a convolutional network, this consists in calculating a sum of the received data weighted by the synapse coefficients. In other words, each neuron performs a convolution operation with a convolution filter whose size corresponds to the size of the sub-matrix of neurons of the previous layer to which it is connected. Each neuron then propagates the result of the convolution to the neurons of the next layer. The integration function a neuron performs may vary depending on the neuron model chosen.
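The weighted sum performed by each neuron can be illustrated with a small single-channel example. This is a sketch under simplifying assumptions (no padding, stride 1, no bias, no activation); the function name `conv2d_valid` is hypothetical. As is usual in neural network practice, the filter is applied without flipping, which is strictly a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Each output value is the sum of one input window weighted by the filter."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Window of previous-layer neurons connected to output neuron (i, j).
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out
```

With a 3x3 input holding the values 0 to 8 and a 2x2 filter of ones, the four outputs are the sums of the four 2x2 windows: 8, 12, 20 and 24.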

The neurons N_i,s of the output layer C_s perform additional processing: they calculate an error between the output value of the neuron N_i,s and an expected or target value, which corresponds to the final state of the output-layer neuron that is desired given the training input data and the problem the network is to solve. For example, if the network must solve a classification problem, the expected final state of a neuron corresponds to the class it is supposed to identify in the input data.

More generally, each particular problem to be solved (classification, regression, prediction) is associated with one or more cost functions, or objective functions, to be optimized (by minimization or maximization). One objective of the error back-propagation phase is to find the network parameters that optimize (for example, minimize) the objective function.

The objective function depends on the data propagated between each layer of the network from the input layer, but also on other parameters, for example the identifiers of the classes to which the data belong in the case of a classification problem.
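As one possible example of such an objective function for classification, the sketch below computes a softmax cross-entropy cost. The choice of cross-entropy and the function names are assumptions made for illustration; the text does not name a specific cost function here. The cost depends both on the values propagated to the output layer (`logits`) and on the class identifiers (`labels`).

```python
import numpy as np

def softmax(logits):
    """Turn output-layer values into class probabilities, row by row."""
    logits = np.asarray(logits, dtype=float)
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean negative log-probability assigned to the correct class."""
    probs = softmax(logits)
    labels = np.asarray(labels)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())
```

When all output values are equal, each of K classes receives probability 1/K and the cost equals log(K); the cost approaches zero as the correct class dominates.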

During the second, error back-propagation phase, the neurons of the output layer C_s transmit the calculated errors to the neurons of the previous layer C_l+1, which calculate a local error from the error back-propagated to them and in turn transmit this local error to the previous layer C_l. In parallel, each neuron calculates, from the local error, an update value for the weights of the synapses to which it is connected and updates those synapses. The process continues layer by layer up to the penultimate layer, which is responsible for updating the weights of the synapses connecting it to the input layer C_e.
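The two phases can be sketched for a very small network with one tanh hidden layer and a softmax output. This is an illustrative sketch only: the architecture, the cross-entropy cost, the absence of bias terms, the learning rate `lr` and the function name `train_step` are all assumptions, not details taken from the text.

```python
import numpy as np

def train_step(x, y_onehot, W1, W2, lr=0.1):
    """One propagation phase followed by one back-propagation phase."""
    # Phase 1: propagate data from the input layer to the output layer.
    h = np.tanh(x @ W1)
    logits = h @ W2
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    loss = float(-(y_onehot * log_p).sum(axis=1).mean())

    # Phase 2: back-propagate errors from the output layer toward the input.
    err_out = (np.exp(log_p) - y_onehot) / x.shape[0]  # error at the output layer
    grad_W2 = h.T @ err_out                            # update for output synapses
    err_hidden = (err_out @ W2.T) * (1.0 - h ** 2)     # local error, tanh derivative
    grad_W1 = x.T @ err_hidden                         # update for input synapses

    # Gradient-descent update of both sets of synapse weights.
    return W1 - lr * grad_W1, W2 - lr * grad_W2, loss
```

Calling `train_step` repeatedly on the same data should make the returned loss decrease, which is the behaviour the back-propagation phase is meant to achieve.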

Figure 1 shows an example of a neural network given by way of illustration and without limitation. In general, each intermediate layer of a network can be fully or partially connected to another layer.

The input data of the network can be organized in the form of matrices. The network can be configured to output a score, classification information, or any other type of output suited to the problem to be solved.

For example, if the input data are images or features extracted from images or image sequences, a neural network can be configured to recognize the objects extracted from the images and classify them into different predefined classes or categories. The problem solved by the network is then a classification problem.

A classification problem can also be applied to textual data, documents, Internet pages, or audio data.

The problem to be solved by the neural network can be very diverse in nature. It may be a problem of classifying elements into different categories with a view to learning to recognize these elements.

The problem can also consist of approximating an unknown function, of an accelerated modeling of a function that is known but costly to compute, or of a regression or prediction problem.

The present invention is more particularly applicable to a classification problem.

Consider a network called the source network, of the type described in figure 1, which is pre-trained on one or more training data set(s) to solve a source classification problem.

For example, the source problem consists in identifying, in a scene captured via one or more image(s), the presence of cars. In other words, this source problem consists in classifying the objects present in the scene into different categories, one of them being the car category.

To this end, one or more database(s) of training images are used to train the network, in the manner described above, so as to optimize the synaptic weights of all the layers so that the network learns to classify the objects corresponding to cars.

Figure 2 schematically shows a second neural network, called the target network, whose objective is to learn to solve a target classification problem, which is generally more specific than the source classification problem. For example, the target problem consists in classifying cars according to their type or make.

A general objective of transfer learning is to exploit the knowledge of the optimized source network of figure 1 in order to solve the target problem, without carrying out a complete training on the same quantity of training data as for the source network.

To this end, one possible method consists in replacing the last layer of the source network with an output layer SM specific to the target problem. In other words, the output layer SM of the resulting target network contains as many output neurons a1,a2,a3 as there are categories associated with the target problem (for example corresponding to the different categories of cars to be classified).

Next, a partial training of this network is carried out to update only the weights w1,w2,w3 of the last layer, which are the most specific to the targeted problem. This partial training is carried out on new training data using a back-propagation algorithm in the same way as for the source network of figure 1, except that only the weights of the last layer are updated.
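The partial training described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular implementation: the hidden layers are assumed frozen and already applied (the array `features` stands for their output), the last layer has no bias term, the loss is the softmax cross-entropy, and plain batch gradient descent is used. All names (`train_last_layer`, `lr`, `epochs`) and the synthetic data are hypothetical.

```python
import numpy as np

def softmax(scores):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_last_layer(features, labels, num_classes, lr=0.5, epochs=200):
    """Gradient descent on the last-layer weights only (hidden layers frozen)."""
    n, d = features.shape
    weights = np.zeros((num_classes, d))   # one weight vector per class
    one_hot = np.eye(num_classes)[labels]  # labels as one-hot rows
    for _ in range(epochs):
        probs = softmax(features @ weights.T)
        # Gradient of the mean cross-entropy with respect to the weights.
        grad = (probs - one_hot).T @ features / n
        weights -= lr * grad
    return weights

# Synthetic, well-separated "representations" for three classes (assumption).
rng = np.random.default_rng(0)
centers = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
labels = np.repeat(np.arange(3), 20)
features = centers[labels] + 0.1 * rng.standard_normal((60, 2))

weights = train_last_layer(features, labels, num_classes=3)
predictions = (features @ weights.T).argmax(axis=1)
accuracy = float((predictions == labels).mean())
```

Even in this reduced form, the procedure is iterative: every epoch requires a full pass over the target data.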

One objective of the invention is to adapt the source network into a target network without carrying out the partial training, that is to say without using a complete back-propagation algorithm.

Figure 4 illustrates, in a flowchart, the steps for implementing a machine transfer learning method according to a first embodiment of the invention.

The first step 401 consists, as explained above for figure 1, in carrying out a complete training of the source neural network R_S on a set of source training data DA_S, with the objective of solving a source classification problem.

The second step 402 consists in replacing the last layer of the source network R_S with a layer specific to the target problem, in order to generate the architecture of a target network R_c. The target network R_c differs from the source network R_S only in its last layer. Advantageously, the last layer of the target network is based on the SoftMax function, as shown in figure 3.

Figure 3 shows, in a non-limiting example, the last two layers of a target network R_c, that is to say the final SoftMax layer and the last hidden layer C_L. In this example, the SoftMax layer comprises three neurons a1,a2,a3 associated with three categories of the target classification problem. Each of these three neurons is linked to a vector of synaptic weights w₁,w₂,w₃.

The third step 403 takes as input a new set of training data intended to be used to solve the target classification problem. Typically, this new data set is small compared with the first source training data set. For example, it consists of data acquired in an industrial context to solve a specific problem; acquiring these data is expensive, so the quantity of data available is limited. By contrast, the source training data correspond to a more general source problem and are consequently available in greater quantity. For example, they are publicly available databases such as the ImageNet databases or other public databases intended for training neural networks for classification.

The third step 403 consists in propagating the target training data from the input of the target network up to the last hidden layer C_L, then in extracting the representations of these data from this last hidden layer. In the example of figure 3, the extracted representations correspond to the features x1,x2,x3,x4,x5.
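A minimal sketch of this extraction step is given below, assuming a small fully connected network with hyperbolic tangent activations whose pre-trained weights are frozen. The function name `extract_representations`, the layer sizes and the random weights standing in for a pre-trained model are all hypothetical; the last hidden layer has five units to mirror the features x1 to x5 of figure 3.

```python
import numpy as np

def extract_representations(inputs, hidden_layers):
    """Forward pass through the frozen hidden layers only.

    hidden_layers is a list of (weight, bias) pairs; the output of the
    final pair is the representation taken from the last hidden layer.
    """
    activations = inputs
    for weight, bias in hidden_layers:
        activations = np.tanh(activations @ weight + bias)
    return activations

# Random weights stand in for a pre-trained source network (assumption).
rng = np.random.default_rng(0)
hidden_layers = [
    (rng.standard_normal((8, 6)), np.zeros(6)),
    (rng.standard_normal((6, 5)), np.zeros(5)),  # last hidden layer: 5 features
]

target_inputs = rng.standard_normal((4, 8))      # 4 target samples, 8 inputs each
representations = extract_representations(target_inputs, hidden_layers)
```

No gradient is computed here: the step is a single forward pass per target sample.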

From these representations, in a fourth step 404, the updated values of the synaptic weights w₁,w₂,w₃ of the last layer are calculated as being proportional to the normalized centered mean of the representations x1,x2,x3,x4,x5 belonging to each respective class a1,a2,a3.

More precisely, the target training data used are labeled according to the category to which they belong. These labels are propagated with the data up to the last hidden layer.

Next, for each class associated with a label, the means m_l of the representations are calculated over all the training data corresponding to that label.

The centered means are then calculated using the following formula: $\tilde{m}_{l} = m_{l} - \frac{1}{k} \sum_{j = 1}^{k} m_{j}$, where k is the number of classes.
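As a worked illustration with hypothetical values (k = 3 classes and two-dimensional representations, chosen only for the arithmetic), suppose the class means are $m_{1} = (3, 0)$, $m_{2} = (-3, 0)$ and $m_{3} = (0, 3)$. Then:

$\frac{1}{3} \sum_{j = 1}^{3} m_{j} = (0, 1)$

$\tilde{m}_{1} = (3, -1), \quad \tilde{m}_{2} = (-3, -1), \quad \tilde{m}_{3} = (0, 2)$

By construction the centered means sum to zero: $\sum_{l = 1}^{k} \tilde{m}_{l} = 0$.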

Finally, the weights of the last layer are calculated using the formula: $w_{l} = Q_{l} \cdot \tilde{m}_{l}$, where Q_l is a normalization factor which is, for example, equal or proportional to $Q_{l} = \frac{1}{\| \tilde{m}_{l} \|}$.
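The three calculations above (class means, centered means, normalized weights) can be sketched in a few lines. This is an illustrative sketch under assumptions: labels are integers from 0 to k-1, every class has at least one sample, the normalization factor is taken exactly equal to the inverse Euclidean norm, and the function name `closed_form_last_layer` and the toy data are hypothetical.

```python
import numpy as np

def closed_form_last_layer(representations, labels, num_classes):
    """Last-layer weights as normalized centered class means (no gradient)."""
    # m_l: mean representation of each class.
    class_means = np.stack([
        representations[labels == c].mean(axis=0) for c in range(num_classes)
    ])
    # m~_l = m_l - (1/k) * sum_j m_j
    centered = class_means - class_means.mean(axis=0, keepdims=True)
    # w_l = Q_l * m~_l with Q_l = 1 / ||m~_l||
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / norms

# Toy representations from the last hidden layer (hypothetical values).
representations = np.array([
    [2.0, 0.0], [4.0, 0.0],    # class 0, mean (3, 0)
    [-2.0, 0.0], [-4.0, 0.0],  # class 1, mean (-3, 0)
    [0.0, 2.0], [0.0, 4.0],    # class 2, mean (0, 3)
])
labels = np.array([0, 0, 1, 1, 2, 2])

weights = closed_form_last_layer(representations, labels, num_classes=3)
# SoftMax is monotonic, so the predicted class is the argmax of the scores.
predictions = (representations @ weights.T).argmax(axis=1)
```

The cost is one pass over the target representations, with no iteration over epochs. A class whose centered mean is exactly zero would need separate handling, which this sketch does not attempt.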

This approximation of the calculation of the synaptic weights avoids the need for a back-propagation algorithm based on gradient descent, which has a significant cost in terms of the number of operations.

The proposed approximation is valid in particular for a final layer based on the SoftMax function and for odd activation functions (for example hyperbolic tangent, sine or linear) for the last hidden layer of the network.
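A function f is odd when f(-x) = -f(x) for all x. The short numerical check below illustrates this property for the three activations named above; the rectified linear unit is added only as a contrasting example that is not odd and is not mentioned in the description. The helper name `is_odd` and the sample grid are hypothetical, and a numerical check on a finite grid is an illustration, not a proof.

```python
import numpy as np

def is_odd(activation, samples):
    """Check f(-x) == -f(x) numerically on the given sample points."""
    return bool(np.allclose(activation(-samples), -activation(samples)))

samples = np.linspace(-3.0, 3.0, 13)

odd_report = {
    "tanh": is_odd(np.tanh, samples),
    "sine": is_odd(np.sin, samples),
    "linear": is_odd(lambda x: x, samples),
    # Contrast case (assumption, not from the description): ReLU is not odd.
    "relu": is_odd(lambda x: np.maximum(x, 0.0), samples),
}
```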

Recall that the Softmax function is defined by: $f(A)_{j} = \frac{e^{a_{j}}}{\sum_{k = 1}^{K} e^{a_{k}}}$, where A is a vector whose components a_1, ..., a_K correspond to each class of the classification problem.
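For illustration, the Softmax definition can be implemented as below. Subtracting the maximum component before exponentiating is a common numerical-stability choice that is not part of the description; it leaves the result unchanged because the common factor cancels between numerator and denominator. The input values are hypothetical.

```python
import numpy as np

def softmax(a):
    """Softmax of a 1-D score vector, one component per class."""
    shifted = a - np.max(a)   # stability shift; does not change the result
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])   # hypothetical class scores
probabilities = softmax(scores)

# Direct transcription of the formula, for comparison on small inputs.
naive = np.exp(scores) / np.exp(scores).sum()
```

The output components are positive and sum to one, which is why they can be read as class probabilities or classification scores.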

This approximation is obtained by analyzing the theoretical behavior of the Softmax layer for classifier networks.

Figure 5 describes a second embodiment of the invention, whose objective is to select, among several networks pre-trained for a source problem, the neural network best suited to the target problem to be solved.

Indeed, when several training databases are available, one problem to be solved consists in determining which network, pre-trained on which training database, is best suited to solving the target problem.

In this second embodiment, the source neural network is trained to solve the source problem for several training data sets DA_S1,..., DA_SN. In a variant, several source neural networks having different architectures are trained for different training data sets DA_S1,..., DA_SN.

At the end of the training step 401, several pre-trained source neural networks R_S1,..., R_SN are therefore obtained, each corresponding to a training data set (for example a public image database in the case of an image classification problem) and/or to a particular architecture.

Step 402 is applied to each pre-trained network by replacing the last layer with a layer specific to the target problem. Steps 403 and 404 of the method described in figure 4 are then applied to each network to obtain several target networks R_c1,..., R_cN from the same set of target training data DA_C.

In a final step 405, the classification performances of all the target networks R_c1,..., R_cN are compared, and the target network R_ci that performs best at solving the target problem is retained.

Step 405 is carried out, for example, by using labeled test data and executing an inference phase of each target network on these test data.

At the end of the inference phase, the SoftMax layer of the target network calculates a probability or a classification score (as illustrated in figure 3).

The best-performing target network is the one that yields the most reliable classification scores for the same test data.
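The selection of step 405 can be sketched as follows. The description does not fix a precise reliability measure; this sketch assumes classification accuracy on labeled test data. The names `select_best_network` and `candidates`, the weights and the test values are hypothetical, and for brevity both candidate networks are given the same test representations, whereas in practice each network would produce its own.

```python
import numpy as np

def accuracy(weights, representations, labels):
    # Predicted class = argmax of the last-layer scores (SoftMax is monotonic).
    predictions = (representations @ weights.T).argmax(axis=1)
    return float((predictions == labels).mean())

def select_best_network(candidates, test_labels):
    """candidates maps a network name to (last-layer weights, test representations)."""
    scores = {
        name: accuracy(weights, reps, test_labels)
        for name, (weights, reps) in candidates.items()
    }
    return max(scores, key=scores.get), scores

test_labels = np.array([0, 1, 0, 1])
test_reps = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

candidates = {
    "R_c1": (np.array([[1.0, 0.0], [0.0, 1.0]]), test_reps),
    "R_c2": (np.array([[1.0, 0.2], [0.0, 0.1]]), test_reps),
}

best_name, scores = select_best_network(candidates, test_labels)
```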

Thus, this second embodiment makes it possible to select the pre-trained network best suited to the target problem without needing to carry out a complete training of the last layer of each pre-trained network.

Figure 6 describes a third embodiment of the invention, whose objective is to select, among several available sets of target training data, the one that best improves the training of the pre-trained network for solving the target problem.

In this third embodiment, the target problem is identical to the source problem, and the last layer of the target network therefore contains as many neurons as the last layer of the source network (unlike the previous embodiments).

One objective of this third embodiment is to determine which set of target training data is most likely to modify the pre-trained model of the source network. In order to avoid redoing a complete training of the source network for each new set of target training data available, the target network is used to determine a criterion for selecting the most relevant target training data.

In this third embodiment, step 401 is identical to that described in figure 4 and is executed for a single set of source training data in order to produce a pre-trained source network model R_S.

Step 402 consists in replacing the last layer of the source network with a SoftMax layer, but this new layer comprises as many neurons as the last layer of the source network so as to address the same source problem.

Steps 403 and 404 are executed for several sets of target training data DA_c1,..., DA_cN. This scenario corresponds to a situation where new target training data are acquired over time and are available for updating the source network. For example, an autonomous car can continuously acquire new images of its environment, which can constitute new training data.

At the end of step 404, several target networks R_c1,..., R_cN are obtained, one for each set of target training data.

A final step 605 is then applied to select the target network, and therefore the associated set of target training data, that departs the most from the model initially pre-trained to generate the source network R_S. In other words, using test data, an inference phase is executed for each target network and the classification scores determined by the SoftMax layer are compared.

The target network that gives the worst scores is then retained. Indeed, a poor classification score reflects the fact that the new target training data used are far removed from the source training data DA_S used to generate the model of the source network R_S.

Conversely, if the target network achieves a good classification score even though the weights of the last layer are calculated from a normalized centered mean criterion, this means that the data used to update these weights are not very new compared with the source data.

Selecting the target training data DA_ci corresponding to the target network that performs worst in terms of classification therefore amounts to selecting training data that are genuinely new with respect to the source training data DA_S.
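The novelty criterion of this third embodiment can be sketched as follows: for each candidate target data set, the last-layer weights are computed in closed form, the resulting network is scored on test data, and the data set giving the lowest score is retained. This is a sketch under assumptions: accuracy is used as the score, the representations are assumed already extracted from the frozen source network, and all names and values (`select_most_novel`, `target_sets`, the toy arrays) are hypothetical.

```python
import numpy as np

def closed_form_last_layer(representations, labels, num_classes):
    # Normalized centered class means, as in step 404.
    means = np.stack([
        representations[labels == c].mean(axis=0) for c in range(num_classes)
    ])
    centered = means - means.mean(axis=0, keepdims=True)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

def accuracy(weights, representations, labels):
    predictions = (representations @ weights.T).argmax(axis=1)
    return float((predictions == labels).mean())

def select_most_novel(target_sets, test_reps, test_labels, num_classes):
    """Return the name of the target set whose network scores worst, plus all scores."""
    scores = {}
    for name, (reps, labels) in target_sets.items():
        weights = closed_form_last_layer(reps, labels, num_classes)
        scores[name] = accuracy(weights, test_reps, test_labels)
    return min(scores, key=scores.get), scores

# Test representations for the (unchanged) source problem, two classes.
test_reps = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [-2.0, 0.0]])
test_labels = np.array([0, 1, 0, 1])

target_sets = {
    # Close to the test distribution: expected to score well.
    "DA_c1": (np.array([[1.0, 0.0], [-1.0, 0.0]]), np.array([0, 1])),
    # Very different class geometry: expected to score poorly, hence "novel".
    "DA_c2": (np.array([[-0.2, 1.0], [0.2, -1.0]]), np.array([0, 1])),
}

selected, scores = select_most_novel(target_sets, test_reps, test_labels, num_classes=2)
```

Only the selected data set would then be used for a complete retraining of the source network.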

It is therefore useful to carry out a new complete training of the source network R_S on these new target training data DA_ci in order to update the model of the source network. The method then loops back to step 401 to update the model of the source network R_S. The new training of the source network R_S can also be carried out by taking the source training database DA_S and supplementing it with the new target training data DA_ci.

One advantage of this third embodiment is that it avoids carrying out a new complete training of the source network for each new training data set when such new data become available regularly over time. The invention makes it possible to define a criterion for evaluating the novelty of the newly available data, so that only the data satisfying it are selected for a new complete training of the source network.

Figure 7 shows a comparative example of the performances obtained with a conventional training method by back-propagation and with a method according to the invention.

The diagram of figure 7 shows the classification performance as a function of the quantity of data tested over time for each of the two aforementioned methods, during training and during a test phase.

Curves 701 and 711 correspond to the performances obtained during training with a back-propagation method (701) and with a method according to the invention (711), respectively.

Curves 702 and 712 correspond to the performances obtained during an inference phase on test data for a network trained with a back-propagation method (702) and with a method according to the invention (712), respectively.

The results of figure 7 are obtained with a network whose last hidden layer uses an activation function of the hyperbolic tangent type.

The curves of the diagram of figure 7 show that the invention achieves classification performances similar to those obtained with conventional training.

The invention can be implemented as a computer program comprising instructions for its execution. The computer program can be recorded on a processor-readable recording medium.

Reference to a computer program which, when executed, performs any of the functions described above is not limited to an application program running on a single host computer. Rather, the terms computer program and software are used here in a general sense to refer to any type of computer code (for example, application software, firmware, microcode, or any other form of computer instruction) that can be used to program one or more processors to implement aspects of the techniques described here. The computing means or resources can in particular be distributed ("cloud computing"), possibly using peer-to-peer technologies. The software code can be executed on any suitable processor (for example, a microprocessor), processor core or set of processors, whether provided in a single computing device or distributed among several computing devices (for example, as may be accessible in the environment of the device). The executable code of each program allowing the programmable device to implement the processes according to the invention can be stored, for example, on the hard disk or in read-only memory. In general, the program or programs can be loaded into one of the storage means of the device before being executed. The central unit can control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, which instructions are stored on the hard disk, in the read-only memory, or in the other aforementioned storage elements.

The target neural network can be implemented on a computing device based, for example, on an embedded processor. The processor may be a generic processor, a specific processor, an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). The computing device may use one or more dedicated electronic circuits or a general-purpose circuit. The technique of the invention can be implemented on a reprogrammable computing machine (a processor or a microcontroller, for example) executing a program comprising a sequence of instructions, or on a dedicated computing machine (for example a set of logic gates such as an FPGA or an ASIC, or any other hardware module).

The invention applies in many fields in which there is a need to solve a classification problem.

For example, the invention applies to the analysis of image data, video data, or more generally multimedia data to solve a classification problem. In this case, the input data of the network are multimedia data.

The invention can be implemented in embedded devices comprising several silicon layers respectively dedicated to the acquisition of the input data (for example images or video) and to the computations relating to the neural network.

The invention applies to any problem of classifying objects within data. For example, it may involve classifying images, audio sequences, or textual data according to their content.

Claims

A computer-implemented method of machine learning by transfer comprising the steps of:

- training (401) an artificial neural network (R_S) to solve a first source classification problem, from at least a first set of source training data (DA_S),

- updating (404) the synaptic weights of the last layer of the trained artificial neural network, from a second set of target training data (DA_c), to solve a second target classification problem,

- for each neuron of the last layer of the artificial neural network associated with a class, said synaptic weights being proportional to the normalized centered mean of the representations of the second set of target training data (DA_c) extracted from the last hidden layer of the artificial neural network and belonging to said class.
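As an illustration only, and not as part of the claims, the weight update of step (404) might be sketched as follows. The sketch assumes that "normalized centered mean" means the per-class mean of the hidden representations, centered by subtracting the mean over all target representations and divided by its Euclidean norm; that reading, the `scale` proportionality factor, and the function names `class_mean_weights` and `predict` are assumptions made for the example, not definitions taken from the text.

```python
import math


def class_mean_weights(features, labels, scale=1.0):
    """Build one last-layer weight vector per class.

    features: list of equal-length float lists (last-hidden-layer outputs
              for the target training data)
    labels:   list of class labels, one per feature vector
    scale:    hypothetical proportionality factor
    """
    n, dim = len(features), len(features[0])
    # Mean over all target representations (assumed centering reference).
    global_mean = [sum(f[j] for f in features) / n for j in range(dim)]

    # Group the representations by class.
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)

    weights = {}
    for y, rows in by_class.items():
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
        centered = [m - g for m, g in zip(mean, global_mean)]
        norm = math.sqrt(sum(c * c for c in centered))
        # A zero centered mean cannot be normalized; fall back to zeros.
        weights[y] = [scale * c / norm if norm > 0 else 0.0 for c in centered]
    return weights


def predict(weights, feature):
    """Return the class whose weight vector has the largest dot product.

    A softmax over these scores is monotonic, so the argmax is the same.
    """
    return max(
        weights,
        key=lambda y: sum(w * x for w, x in zip(weights[y], feature)),
    )
```

No gradient step appears in this sketch: under the stated assumptions the last-layer weights are computed directly from class statistics of the target data.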

A method according to claim 1, further comprising a step of replacing (402) the last layer of the trained artificial neural network with a layer specific to the second target classification problem.

A method according to claim 2 wherein the specific layer is a layer based on the softmax function.

A method according to any preceding claim wherein, to update the synaptic weights of the last layer of the trained artificial neural network, the target training data (DA_c) of the second set are propagated (403) from the input of said network up to its last hidden layer so that said representations can be extracted therefrom.

A method according to any one of the preceding claims, in which the artificial neural network is trained on several sets of source training data (DA_S1, ..., DA_SN) in order to produce several trained neural networks (R_S1, ..., R_SN), the updating of the synaptic weights is performed for each trained neural network, and the method further comprises a step (405) of selecting the trained neural network (R_ci) which has the best performance in solving the second target classification problem.
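As an illustration only, the selection step (405) might be sketched as below. The text does not specify the performance measure; accuracy on labeled target samples is assumed here, and `accuracy`, `select_best_network`, and the representation of each candidate network as a prediction function are hypothetical.

```python
def accuracy(predict_fn, samples, labels):
    """Fraction of samples for which predict_fn returns the expected label."""
    correct = sum(1 for x, y in zip(samples, labels) if predict_fn(x) == y)
    return correct / len(labels)


def select_best_network(candidates, samples, labels):
    """Pick the candidate with the highest accuracy on the target data.

    candidates: dict mapping a network name to its prediction function,
                one entry per network whose last layer has been updated
    Returns the best name and the score of every candidate.
    """
    scores = {
        name: accuracy(fn, samples, labels)
        for name, fn in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Because each candidate only needs its last layer recomputed, comparing many source networks in this way stays inexpensive relative to retraining each one on the target data.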

A method according to claim 5 wherein the second target training data set (DA _c ) is smaller in size than a source training data set.

A method according to any one of claims 1 to 4, in which the updating of the synaptic weights is carried out for several sets of target training data (DA_C1, ..., DA_CN), the second target problem is identical to the first source problem, and the method further comprises a step of selecting (605) the target training data set which results in the worst performance in solving the second target classification problem.

A method according to claim 7, further comprising a further step of training the artificial neural network from the selected target training data set or from a combination of the selected target training data set and the source training data set.

Method according to any one of the preceding claims, in which the last hidden layer of the artificial neural network comprises an odd activation function, for example a hyperbolic tangent function.
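As an illustration only, a function f is odd when f(-x) = -f(x) for every x. The short check below confirms numerically that the hyperbolic tangent has this property while a rectified linear unit, shown purely for contrast, does not; the helper names `is_odd` and `relu` and the sample points are assumptions made for the example.

```python
import math


def is_odd(f, points=(0.1, 0.5, 1.0, 2.0, 5.0), tol=1e-12):
    """Check f(-x) == -f(x) at a few sample points (numerical check only)."""
    return all(abs(f(-x) + f(x)) <= tol for x in points)


def relu(x):
    """Rectified linear unit, included only as a non-odd counterexample."""
    return max(0.0, x)


tanh_is_odd = is_odd(math.tanh)  # tanh(-x) = -tanh(x)
relu_is_odd = is_odd(relu)       # relu(-x) = 0 for x > 0, not -relu(x)
```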

A computing device for implementing an artificial neural network which has learned to solve a target classification problem using a transfer learning method according to any one of the preceding claims.

A computer program downloadable from a communication network and/or stored on a medium readable by a processor, the program comprising instructions for the execution of the method according to any one of claims 1 to 9, when the program is executed by a processor.

A processor-readable recording medium on which is recorded a program comprising instructions for the execution of the method according to any one of claims 1 to 9, when the program is executed by a processor.