FR3140973A1

FR3140973A1 - Method for distributing the parameters of a neural network, inference method and associated devices

Info

Publication number: FR3140973A1
Application number: FR2210604A
Authority: FR
Inventors: Nihel KABOUBI; Loïc LETONDEUR; Thierry Coupaye
Original assignee: Orange SA
Current assignee: Orange SA
Priority date: 2022-10-14
Filing date: 2022-10-14
Publication date: 2024-04-19
Anticipated expiration: 2042-10-14
Also published as: FR3140973B1

Abstract

This method of distributing parameters of a neural network to at least one device is implemented by an orchestrator. It comprises steps of:
- partitioning said network into a set of subnetworks, said partitioning comprising:
(i) at least one step (E20), called vertical cutting, of cutting the neural network between two layers of the network to obtain at least one so-called vertical subnetwork, a vertical subnetwork consisting of one or more consecutive layers of said network; and
(ii) at least one step (E40), called horizontal cutting, of cutting a set of at least one layer of the network to obtain at least one so-called horizontal subnetwork comprising only part of the neurons of the set, the result of an inference of a horizontal subnetwork being solely a function of the parameters of the neurons of this subnetwork;
- sending (E60, E70) the parameters of at least one said subnetwork so that they are distributed to at least one execution device (W) configured to perform an inference of the subnetwork,
the result of the inference of the subnetwork contributing to the result of an inference of said neural network. Fig. 4

Description

Method for distributing parameters of a neural network, inference method and associated devices

The present invention lies in the field of neural network execution, in the inference phase. Recall that, in machine learning, the inference phase refers to the execution of a model that has already been trained on a training data set and then tested on a validation data set. The inference phase (or inference) refers to the deployment of the model and its use in a real production setting.

More particularly, the invention lies in the context of the execution of neural networks by devices that are highly constrained in resources, in particular in memory.

Nowadays, particularly in the IoT (Internet of Things) context, connected objects generate data, some of which can be processed by neural networks as part of a service. For example, a camera captures images and then transmits them, via a communication network, to an image recognition service provider that uses a neural network to detect and recognize people.

Neural network inference generally requires more resources than are available to the connected objects that consume the results produced by these networks. In the current state of the art, it is therefore usual to deploy neural networks on cloud servers ("cloud computing"); the connected objects send their data to be processed to the neural network and retrieve the result of the inference via the network.

This mechanism has the disadvantage of causing a massive data transfer between the connected objects (or, more generally, the devices that call on these neural networks) and the servers that execute these neural networks. This massive transfer introduces a latency that may not be acceptable for certain applications.

Another drawback concerns preserving the confidentiality of the data exchanged. For example, if a connected object wishes to call on a neural network to process images, certain applications require that the images not be disclosed.

Furthermore, in the event of a network outage, such a mode of operation cannot guarantee the proper execution of a service.

Current solutions are therefore not satisfactory.

Subject and summary of the invention

According to a first aspect, the invention relates to a method for distributing parameters of a neural network to at least one device, the method being implemented by a distribution orchestrator and comprising steps of:
- partitioning said network into a set of subnetworks, said partitioning comprising:
(i) at least one step, called vertical cutting, of cutting the neural network between two layers to obtain at least one so-called vertical subnetwork, a vertical subnetwork consisting of one or more consecutive layers of said network; and
(ii) at least one step, called horizontal cutting, of cutting a set of at least one layer of the network to obtain at least one so-called horizontal subnetwork comprising only part of the neurons of the set, the result of an inference of a horizontal subnetwork being solely a function of the parameters of the neurons of this subnetwork;
- sending the parameters of at least one said subnetwork so that they are distributed to at least one execution device configured to perform an inference of the subnetwork,
the result of the inference of the subnetwork contributing to the result of an inference of said neural network.

Thus, in general terms, the invention proposes partitioning a neural network into several subnetworks and distributing the parameters of at least one subnetwork to an execution device configured to perform an inference of this subnetwork.

According to the invention, the partitioning comprises at least one vertical cut and at least one horizontal cut.

Generally speaking, a neural network processes input data to determine output data (for example an image classification or a regression score). This processing can be broken down into several intermediate processing operations, each requiring fewer resources (memory and processor) than the complete processing. Each intermediate processing operation is performed by a sub-part of the neural network. A neural network can thus be cut into subnetworks.

The terms "partitioning" and "cutting" of a neural network here denote the decomposition of the neural network into several subnetworks smaller than the initial neural network, each of these subnetworks being able to carry out part of the processing that the neural network must carry out.

"Vertical cutting" here refers to a decomposition of a network composed of successive layers of neurons. The neural network is thus decomposed into several subnetworks, each composed of a subset of successive layers.

The expression "vertical cutting" is a label. It is used figuratively because, by convention, a neural network is drawn with its input layer(s) on the left and its output layer on the right, each layer being drawn vertically. The notion of "vertical cutting" refers to a decomposition of a network into subnetworks composed of one or more entire layers.

In what follows, the layers of a neural network are considered to be ordered, the first layer being the input layer and the last layer being the output layer.

By way of example, the figure shows a VGG16 neural network composed of convolution layers, pooling layers and dense layers.

Recall that:
- convolution layers perform convolutional filtering to detect the presence of a set of features in the data received as input;
- pooling layers are layers, generally placed between two convolution layers, that reduce the number of parameters in the network and help avoid overfitting; and that
- dense layers are layers in which each neuron is connected to all the neurons of the previous layer.

In the figure, braces mark seven vertical subnetworks SRV1 to SRV7 obtained by vertical cutting of the VGG16 network.

In vertical cutting, the order of the subnetworks in the neural network is preserved for processing the input data: the first subnetwork computes first intermediate data from the input data, then the second subnetwork computes second intermediate data from the first intermediate data, and so on until the last subnetwork computes the output data of the neural network. Each subnetwork produces results, or outputs, the result produced by the last subnetwork corresponding to the result that the original neural network would produce without partitioning. Except for the last subnetwork, the outputs of a subnetwork constitute the inputs of the next subnetwork.
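By way of illustration only, the chaining described above can be sketched in Python. The toy layers (`scale`, `shift`, `relu`), the helper names `vertical_cut` and `run_layers`, and the cut points are assumptions made for this sketch and do not come from the description; a real network would use convolution, pooling and dense layers instead.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_layers(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply consecutive layers in order and return the final output."""
    for layer in layers:
        x = layer(x)
    return x


def vertical_cut(layers: Sequence[Layer], cut_points: Sequence[int]) -> List[List[Layer]]:
    """Cut the ordered list of layers between layers.

    A cut point k means "cut between layer k-1 and layer k" (0-based indices).
    Each returned subnetwork is made of whole, consecutive layers.
    """
    bounds = [0, *sorted(cut_points), len(layers)]
    return [list(layers[a:b]) for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy layers, standing in for real ones.
def scale(factor: float) -> Layer:
    return lambda x: [factor * v for v in x]


def shift(offset: float) -> Layer:
    return lambda x: [v + offset for v in x]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


network = [scale(2.0), shift(-3.0), relu, scale(0.5), shift(1.0)]
subnetworks = vertical_cut(network, cut_points=[2, 3])

# The outputs of each subnetwork become the inputs of the next one.
data = [1.0, 2.0, -4.0]
intermediate = data
for subnetwork in subnetworks:
    intermediate = run_layers(subnetwork, intermediate)

# The last subnetwork yields what the uncut network would have produced.
assert intermediate == run_layers(network, data)
```

In this sketch each subnetwork could be hosted on a different execution device, the only data exchanged being the intermediate vectors.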

"Horizontal cutting" here refers to the partitioning of one or more consecutive layers of neurons into several subnetworks, each comprising a subset of the neurons. This cutting is performed not between layers but within them, the constraint being that the result of an inference of a horizontal subnetwork must be solely a function of the parameters of the neurons of this subnetwork.

The figure illustrates an example of horizontal cutting of a layer into three horizontal subnetworks SRH1 to SRH3.

For example, the horizontal cutting, by the invention, of a dense layer composed of 1000 neurons can produce 10 dense sub-layers of 100 neurons each, each of these sub-layers determining, from the same input data, part of the output data of the layer of neurons.

The person skilled in the art will understand that, when an original network is cut horizontally into horizontal subnetworks, the outputs of these subnetworks must be merged to produce the output that the original network would produce without cutting. This merging operation induces a cost, referred to hereinafter as the synchronization cost.
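The 1000-neuron example and the merging step can be sketched as follows. This is a minimal illustration under stated assumptions: the weights are random, the input size of 8 is arbitrary, the activation function is omitted, the merge is taken to be a concatenation in neuron order, and the helper names `dense` and `horizontal_cut` are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row of weights per neuron


def dense(weights: Matrix, biases: Vector, x: Vector) -> Vector:
    """Output of a dense layer: one weighted sum per neuron (activation omitted)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def horizontal_cut(weights: Matrix, biases: Vector, parts: int) -> List[Tuple[Matrix, Vector]]:
    """Split the neurons of a dense layer into `parts` sub-layers of equal size."""
    n_neurons = len(weights)
    if n_neurons % parts != 0:
        raise ValueError("this sketch only handles equal-sized sub-layers")
    size = n_neurons // parts
    return [(weights[i:i + size], biases[i:i + size]) for i in range(0, n_neurons, size)]


rng = random.Random(0)
n_inputs, n_neurons = 8, 1000  # n_inputs is an arbitrary choice for the sketch
weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_neurons)]
biases = [rng.uniform(-1, 1) for _ in range(n_neurons)]
x = [rng.uniform(-1, 1) for _ in range(n_inputs)]

sub_layers = horizontal_cut(weights, biases, parts=10)

# Every sub-layer receives the same input; merging concatenates the partial outputs.
merged: Vector = []
for sub_weights, sub_biases in sub_layers:
    merged.extend(dense(sub_weights, sub_biases, x))

# The merged output equals the output of the uncut layer.
assert merged == dense(weights, biases, x)
```

Each sub-layer only needs its own 100 rows of weights and biases, which is what allows it to be placed on a device with less memory. An element-wise activation such as ReLU could be applied inside each sub-layer without changing the result; an activation that mixes all outputs of the layer would have to be applied after the merge.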

The figure illustrates an example of hybrid partitioning, within the meaning of the invention, into vertical subnetworks SRVi and horizontal subnetworks SRHj. As shown in this figure:
- the vertical subnetworks SRVi consist of one or more successive, complete layers;
- the result of an inference of a horizontal subnetwork is solely a function of the parameters of the neurons of this subnetwork. This property follows from the fact that the neurons of a horizontal subnetwork resulting from the horizontal cutting of a layer or of a set of layers are not connected to any neuron of another horizontal subnetwork resulting from the same cutting.
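One possible reading of the independence property above is a check on the connections between neurons: no connection may link two different horizontal subnetworks of the same cut. The sketch below is only an illustration of that reading; the function name `is_valid_horizontal_cut`, the edge-list representation and the neuron names are assumptions, not elements of the description.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source neuron, destination neuron)


def is_valid_horizontal_cut(edges: Iterable[Edge], groups: List[Set[str]]) -> bool:
    """Return True if no connection links neurons of two different groups.

    `groups` lists the neurons of each candidate horizontal subnetwork.
    Connections to neurons outside the cut set (for example the inputs of
    the set of layers) are ignored.
    """
    owner: Dict[str, int] = {}
    for index, group in enumerate(groups):
        for neuron in group:
            if neuron in owner:
                return False  # a neuron may belong to one subnetwork only
            owner[neuron] = index
    for source, destination in edges:
        if source in owner and destination in owner and owner[source] != owner[destination]:
            return False
    return True


# Two consecutive layers (a1, a2) and (b1, b2), cut into two horizontal subnetworks.
groups = [{"a1", "b1"}, {"a2", "b2"}]
edges_ok = [("in", "a1"), ("in", "a2"), ("a1", "b1"), ("a2", "b2")]
edges_bad = edges_ok + [("a1", "b2")]  # this connection crosses the cut

assert is_valid_horizontal_cut(edges_ok, groups)
assert not is_valid_horizontal_cut(edges_bad, groups)
```

For a single dense layer the check always passes, since neurons of the same layer are not connected to one another; it becomes restrictive when the set being cut spans several layers.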

By proposing a hybrid partitioning of the original neural network, the invention makes it possible to perform the inference of large neural networks by distributing it over devices that have fewer resources than those needed to perform the inference of the original network in its entirety, without partitioning.
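To give an order of magnitude, the short calculation below estimates the parameter memory of one dense layer before and after horizontal cutting. The figures are assumptions chosen for illustration and are not taken from the description: the layer is sized like the first dense layer of the standard VGG16 architecture for 224x224 inputs (25088 inputs, 4096 neurons), parameters are stored as 32-bit floats, and the layer is cut into 8 equal horizontal subnetworks.

```python
BYTES_PER_PARAM = 4  # assumption: 32-bit floating-point parameters
MIB = 1024 * 1024


def dense_param_count(n_inputs: int, n_neurons: int) -> int:
    """Number of parameters of a dense layer: one weight per input plus one bias, per neuron."""
    return n_inputs * n_neurons + n_neurons


n_inputs, n_neurons, n_workers = 25088, 4096, 8  # illustrative values

total_params = dense_param_count(n_inputs, n_neurons)
per_worker_params = dense_param_count(n_inputs, n_neurons // n_workers)

print(f"whole layer: {total_params * BYTES_PER_PARAM / MIB:.1f} MiB")
print(f"per worker : {per_worker_params * BYTES_PER_PARAM / MIB:.1f} MiB")

# Cutting horizontally divides the parameters evenly; nothing is duplicated.
assert per_worker_params * n_workers == total_params
```

Under these assumptions the layer holds roughly 392 MiB of parameters, against about 49 MiB per execution device after cutting. Each device still has to receive the full input vector of the layer, which is part of the synchronization cost.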

In what follows, the devices that perform the inferences of the subnetworks are called execution devices (or "workers"). The execution devices can be physical machines or virtual machines.

In one embodiment, a distribution orchestrator performs the hybrid partitioning (or, equivalently, the "hybrid partition") of the original neural network and itself distributes the parameters of the subnetworks to the execution devices.

In another embodiment, the distribution orchestrator that partitions the original network into subnetworks can delegate the partitioning of at least one of these subnetworks to a level-2 distribution orchestrator. This level-2 distribution orchestrator can perform a hybrid partitioning of the subnetwork into sub-subnetworks and provide the parameters of at least one sub-subnetwork to a device configured to obtain the result of an inference of this sub-subnetwork.

There may be any number of orchestration levels.
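The delegation between orchestration levels can be pictured with the following sketch. It only illustrates the recursion: the classes `Worker` and `Orchestrator` are hypothetical, parameters are represented by plain integers, and the hybrid partitioning itself is replaced by a placeholder that splits the parameter list into contiguous chunks, one per target.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Worker:
    """Hypothetical execution device that simply stores the parameters it receives."""
    name: str
    received: List[List[int]] = field(default_factory=list)

    def receive(self, params: List[int]) -> None:
        self.received.append(params)


@dataclass
class Orchestrator:
    """Hypothetical distribution orchestrator of a given level."""
    level: int
    targets: List[Union["Orchestrator", Worker]]

    def distribute(self, params: List[int]) -> None:
        if not params:
            return
        # Placeholder for the hybrid partitioning: contiguous chunks, one per target.
        size = -(-len(params) // len(self.targets))  # ceiling division
        chunks = [params[i:i + size] for i in range(0, len(params), size)]
        for target, chunk in zip(self.targets, chunks):
            if isinstance(target, Orchestrator):
                target.distribute(chunk)  # delegate to the next orchestration level
            else:
                target.receive(chunk)     # send directly to an execution device


w1, w2, w3 = Worker("w1"), Worker("w2"), Worker("w3")
level2 = Orchestrator(level=2, targets=[w2, w3])
level1 = Orchestrator(level=1, targets=[w1, level2])

level1.distribute(list(range(8)))

assert w1.received == [[0, 1, 2, 3]]
assert w2.received == [[4, 5]]
assert w3.received == [[6, 7]]
```

Because an orchestrator treats a lower-level orchestrator like any other target, the same code handles any number of levels.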

The invention also relates to a distribution orchestrator configured to distribute parameters of a neural network to at least one device, said distribution orchestrator comprising:
- a module for partitioning the network into a set of subnetworks, said module being configured to:
(i) perform at least one so-called vertical cutting of the neural network between two layers to obtain at least one so-called vertical subnetwork, a vertical subnetwork consisting of one or more consecutive layers of said network; and
(ii) perform at least one so-called horizontal cutting of a set of at least one layer of the network to obtain at least one so-called horizontal subnetwork comprising only part of the neurons of the set, the result of an inference of a horizontal subnetwork being solely a function of the parameters of the neurons of this subnetwork;
- a module for sending the parameters of at least one said subnetwork so that they are distributed to an execution device configured to perform an inference of the subnetwork,
the result of the inference of the subnetwork contributing to the result of an inference of said neural network.

According to another aspect, the invention relates to an inference method for a neural network partitioned into a plurality of ordered subnetworks, the plurality of subnetworks comprising:
(i) at least one so-called vertical subnetwork consisting of one or more consecutive layers of said network; and
(ii) at least one so-called horizontal subnetwork comprising only part of the neurons of a vertical subnetwork, the result of an inference of a horizontal subnetwork being solely a function of the parameters of the neurons of this subnetwork;
the parameters of each of said subnetworks being provided to a device configured to obtain an inference of said subnetwork,
said subnetworks being ordered, the order of a subnetwork corresponding to the order, in the neural network, of the layer from which it originates;
the method comprising a step of receiving initial data and, for each of the subnetworks of order n taken in said order, steps of:
- sending input data to said subnetwork, said input data being:
(i) for said at least one subnetwork of order 1, at least part of the initial data; or
(ii) for a subnetwork of order n greater than 1, data obtained from output data obtained by the inference of at least one subnetwork of order n-1;
- the inference of said neural network being obtained from the output data of said at least one subnetwork of maximum order.
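The ordering rule of this inference method can be sketched as a simple loop over the orders. This is a minimal sketch under stated assumptions: each subnetwork is modelled as a local Python function (in a deployment it would be a call to an execution device), the whole of the initial data is sent to the subnetworks of order 1, the outputs of subnetworks sharing the same order are merged by concatenation, and the names `infer` and `dense_rows` as well as the toy weights are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Subnetwork = Callable[[Vector], Vector]


def infer(plan: Sequence[Tuple[int, Subnetwork]], initial_data: Vector) -> Vector:
    """Run the subnetworks in increasing order n and return the final output.

    `plan` lists (order, subnetwork) pairs; several subnetworks may share
    the same order when they result from a horizontal cut.
    """
    by_order: Dict[int, List[Subnetwork]] = defaultdict(list)
    for order, subnetwork in plan:
        by_order[order].append(subnetwork)

    data = initial_data
    for order in sorted(by_order):
        # Subnetworks of the same order are independent and could run in parallel.
        outputs = [subnetwork(data) for subnetwork in by_order[order]]
        # Assumed merge: concatenate partial outputs in plan order.
        data = [value for output in outputs for value in output]
    return data


def dense_rows(rows: List[Vector], biases: Vector) -> Subnetwork:
    """Toy dense (sub-)layer with ReLU, holding only its own neurons' parameters."""
    def run(x: Vector) -> Vector:
        return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(rows, biases)]
    return run


W1, b1 = [[1.0, -1.0], [0.5, 2.0]], [0.0, 1.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0, 0.0, 0.0]
W3, b3 = [[1.0, 1.0, 1.0, 1.0]], [0.5]

plan = [
    (1, dense_rows(W1, b1)),            # vertical subnetwork
    (2, dense_rows(W2[:2], b2[:2])),    # horizontal subnetwork, first half of layer 2
    (2, dense_rows(W2[2:], b2[2:])),    # horizontal subnetwork, second half of layer 2
    (3, dense_rows(W3, b3)),            # vertical subnetwork
]

result = infer(plan, [2.0, 1.0])
reference = dense_rows(W3, b3)(dense_rows(W2, b2)(dense_rows(W1, b1)([2.0, 1.0])))
assert result == reference
```

In this toy case the distributed inference and the inference of the uncut network give the same output, since the layers themselves are unchanged and only their placement differs.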

The invention also relates to an inference orchestrator configured to perform an inference of a neural network partitioned into a plurality of ordered subnetworks, the plurality of subnetworks comprising:
(i) at least one so-called vertical subnetwork consisting of one or more consecutive layers of said network; and
(ii) at least one so-called horizontal subnetwork comprising only part of the neurons of a vertical subnetwork, the result of an inference of a horizontal subnetwork being solely a function of the parameters of the neurons of this subnetwork;
the parameters of each of said subnetworks being provided to a device configured to obtain an inference of said subnetwork,
said subnetworks being ordered, the order of a subnetwork corresponding to the order, in the neural network, of the layer from which it originates;
the inference orchestrator comprising:
- a module for receiving initial data;
- a scheduling module configured to send, to each of the subnetworks of order n taken in said order, input data of this subnetwork, said input data being:
(i) for said at least one subnetwork of order 1, at least part of the initial data; or
(ii) for a subnetwork of order n greater than 1, data obtained from output data obtained by the inference of at least one subnetwork of order n-1;
- a module for returning the inference of said neural network, this inference being obtained from the output data of said at least one subnetwork of maximum order.

When an orchestrator is said to be configured to send or provide parameters of at least one subnetwork so that they are distributed to at least one device configured to obtain an inference of a subnetwork, this may in particular mean that:
- the orchestrator is configured to provide these parameters directly to an execution device configured to perform the inference of the subnetwork and to return the result of this inference to the orchestrator; or that
- the orchestrator is configured to provide these parameters to another orchestrator, of a higher level, and to obtain the result of this inference from this other orchestrator.

Dans un mode de réalisation, les orchestrateurs de distribution et d’inférence sont implémentés sur la même machine, physique ou virtuelle.In one embodiment, the distribution and inference orchestrators are implemented on the same machine, physical or virtual.

The invention also relates to a system comprising:
- a distribution orchestrator and/or an inference distributor as mentioned above; and
- at least one execution device configured to perform the inference of the subnetwork.

At least some execution devices may be located at the periphery of a communication network (so-called “edge” devices), for example in home gateways.

The inventors found that a simple vertical partitioning of a deep neural network does not, in practice, make it possible to distribute the inference of this network over edge-type devices. The hybrid partitioning of the invention solves this problem.

Very advantageously, the inference of the original neural network is obtained from the inferences of the subnetworks without any degradation in performance, because the architecture of the model is not modified, only distributed.

Very advantageously, the method does not require retraining one or more neural networks; such retraining would require having data nearby, would consume energy, and could negatively impact the performance of the original neural network.

In one embodiment, the orchestrator can entrust the execution of the same task to several devices and compare the results produced by these devices in order to detect abnormal behavior of a device, caused for example by a failure or by a malicious act. The invention thus makes it possible to perform several intermediate calculations, this redundancy making it possible to detect calculation errors.
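As an illustration only, the comparison of redundant results could be sketched as a majority vote. The function name, the device identifiers and the use of exact equality between results are assumptions of this sketch and are not specified by the description.

```python
from collections import Counter


def detect_abnormal_devices(results):
    """Compare the results returned by several devices for the same task.

    `results` maps a (hypothetical) device identifier to the hashable result
    it returned. Returns (consensus, suspects): the result shared by a strict
    majority of devices and the devices that disagree with it. When no strict
    majority exists, consensus is None and every device is a suspect.
    """
    counts = Counter(results.values())
    consensus, votes = counts.most_common(1)[0]
    if votes <= len(results) / 2:
        return None, sorted(results)
    suspects = sorted(dev for dev, res in results.items() if res != consensus)
    return consensus, suspects


# The same task is entrusted to three devices; W3 returns a different result.
outputs = {"W1": (0.1, 0.9), "W2": (0.1, 0.9), "W3": (0.7, 0.3)}
consensus, suspects = detect_abnormal_devices(outputs)
```

In practice a tolerance on floating-point results might be preferable to exact equality; that choice is left open here.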

The policy for partitioning the network and allocating the subnetworks can be defined according to different criteria, for example performance criteria and/or confidentiality criteria.

The performance of the partition can be affected in particular by:
- the cost of communicating the data exchanged between the orchestrators, possibly the (sub-)orchestrator(s), and the execution devices; and by
- the synchronization cost induced by merging the results produced by the horizontal subnetworks.

In a particular embodiment of the invention, the vertical division is performed after a pooling layer. Such a vertical division minimizes the volume of data to be transmitted between the vertical subnetworks, and therefore the communication cost.
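A minimal sketch of a vertical division placed after each pooling layer is given below. The representation of a network as a list of (name, kind) pairs, the kind label "pool" and the layer names are hypothetical and serve only to illustrate the idea.

```python
def cut_after_pooling(layers):
    """Divide a list of (name, kind) layers after each layer of kind 'pool'.

    Returns a list of vertical subnetworks, each being a list of consecutive
    layer names.
    """
    subnetworks, current = [], []
    for name, kind in layers:
        current.append(name)
        if kind == "pool":
            subnetworks.append(current)
            current = []
    if current:  # trailing layers after the last pooling layer
        subnetworks.append(current)
    return subnetworks


# Hypothetical small network used only for illustration.
network = [
    ("conv1", "conv"), ("pool1", "pool"),
    ("conv2", "conv"), ("conv3", "conv"), ("pool2", "pool"),
    ("fc1", "dense"),
]
vertical_subnetworks = cut_after_pooling(network)
```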

In one embodiment, the distribution method according to the invention comprises a step of determining whether at least one execution device has sufficient available resources to perform the inference of a vertical subnetwork.

In one embodiment, if this is not the case and if the vertical subnetwork comprises more than one layer, the vertical subnetwork is in turn divided vertically. This step can be repeated until vertical subnetworks consisting of a single layer are obtained.

In one embodiment, only vertical subnetworks consisting of a single layer can be divided horizontally.

For example, dense layers can be divided horizontally.
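To illustrate why a dense layer lends itself to horizontal division, the sketch below splits the neurons of a small dense layer into groups, evaluates each group separately and concatenates the outputs. The weights, the helper names and the omission of the activation function are assumptions of this example; an element-wise activation could be applied per group without changing the conclusion.

```python
def dense(weights, biases, x):
    """Output of a dense layer; `weights` holds one row per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def split_horizontally(weights, biases, parts):
    """Split the neurons into at most `parts` contiguous groups."""
    n = len(weights)
    size = -(-n // parts)  # ceiling division
    return [(weights[i:i + size], biases[i:i + size]) for i in range(0, n, size)]


# Hypothetical layer with 4 neurons and 2 inputs.
W = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.0]]
b = [0.0, 1.0, -1.0, 0.5]
x = [2.0, 3.0]

full = dense(W, b, x)
# Each horizontal subnetwork only needs the parameters of its own neurons.
merged = [y for Wp, bp in split_horizontally(W, b, 2) for y in dense(Wp, bp, x)]
```

The merged output equals the output of the undivided layer, since each neuron's output depends only on its own parameters and on the shared input.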

In one embodiment, the distribution method comprises:
- a step of determining whether at least one execution device has sufficient resources to perform an inference of a said subnetwork, by comparing the number of parameters of this subnetwork with a maximum number of parameters that can be processed by this execution device; and, if this is not the case,
- at least one step of vertical or horizontal division of said subnetwork.

Partitioning thus ensures that the neural subnetworks are processed by execution devices with sufficient resources.

In practice, this maximum number will be chosen so as not to overload the execution device. It will represent, for example, a determined percentage of the available resources of the execution device, for example 30%.
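By way of a hedged example, the maximum number of parameters could be derived from a share of the available memory. Treating memory as the relevant resource, assuming 4 bytes per parameter, and the function names are all assumptions of this sketch; only the 30% figure comes from the text.

```python
def max_parameters(available_bytes, share_percent=30, bytes_per_param=4):
    """Maximum number of parameters a device accepts.

    Only `share_percent` of the available memory is budgeted for the
    subnetwork (30% in the example of the description). The 4 bytes per
    parameter are an assumption of this sketch.
    """
    budget_bytes = available_bytes * share_percent // 100
    return budget_bytes // bytes_per_param


def has_sufficient_resources(num_params, available_bytes, share_percent=30):
    """Compare the parameter count of a subnetwork with the device maximum."""
    return num_params <= max_parameters(available_bytes, share_percent)


# Hypothetical device with 1 GB of available memory.
MAX = max_parameters(1_000_000_000)
```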

Indeed, from an experimental point of view, tests have shown that inference times could be reduced for models running on execution devices whose resources were not saturated beyond a certain threshold by the inference processing.

The partitioning policy can be defined according to different criteria, for example performance criteria and/or confidentiality criteria.

In one embodiment, the policy for partitioning the network and allocating the subnetworks takes confidentiality criteria into account.

For example, the distribution method comprises a step of determining whether the data processed or generated by a subnetwork is sensitive, and at least (i) the partitioning of the network into subnetworks or (ii) a selection of the devices to which the parameters of said subnetworks are provided is carried out according to this determination.

For example, if a subnetwork receives or produces sensitive data:
i/ this subnetwork can in turn be divided, for example horizontally, to spread the sensitive data across several subnetworks; and/or
ii/ this subnetwork can be distributed to a secure execution device.

A secure execution device is, for example, a device running in a secure zone, for example a zone isolated from the Internet, a bastion, etc.

It is recalled that, in information systems security, a bastion is an element of a computer network with reinforced security, separated from the other activities of the computer network and accessible from the outside, for example from the Internet. A bastion can for example be a demilitarized zone of an intranet, partially filtered by a firewall.

In one embodiment, the orchestrator (distribution and/or inference) is placed in a bastion.

In a particular embodiment, the orchestrator (distribution and/or inference) is executed by a home gateway.

In a particular embodiment, the orchestrator (distribution and/or inference) is installed in a micro data center and the subnetworks are distributed to home gateways located near this data center.

Advantageously from a security point of view, the execution devices are not aware that the task they are executing (the inference of a subnetwork) is part of a more general task corresponding to an inference of the original neural network.

Advantageously, in one embodiment, the execution devices have no knowledge of the entire original network, which results from training that requires know-how, data, energy, etc., and therefore represents value. The method can therefore protect this value.

In one embodiment, the execution devices have no knowledge of the other devices. They communicate only with an orchestrator (or sub-orchestrator).

In a particular embodiment, the inference orchestrator stores the results produced by the subnetwork inferences executed by the execution devices so as to enable recovery after a failure.

Thus, if the operation of an execution device is interrupted during the execution of a task T, the inference orchestrator can resubmit this task to the same execution device or to another execution device without having to request a new execution of the tasks needed to produce the input data of this task T.
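The recovery behavior described above can be sketched as follows. The chain of tasks, the `DeviceFailure` exception, the device callables and the policy of trying the devices in list order are hypothetical choices made for this illustration.

```python
class DeviceFailure(Exception):
    """Raised by a (hypothetical) device whose execution is interrupted."""


def run_with_recovery(tasks, initial_input, devices):
    """Run a chain of tasks, storing each result so a failed task can be
    resubmitted without re-running the tasks that produced its input."""
    stored = {}
    data = initial_input
    for name, fn in tasks:
        for device in devices:
            try:
                data = device(fn, data)  # `data` is the stored input of the task
                break
            except DeviceFailure:
                continue  # resubmit the same task to the next device
        else:
            raise RuntimeError(f"no device could run task {name}")
        stored[name] = data
    return data, stored


calls = []

def t1(x):
    calls.append("T1")
    return x + 1

def t2(x):
    calls.append("T2")
    return x * 2


class FlakyDevice:
    """Device that fails on its `fail_on`-th request, before computing."""
    def __init__(self, fail_on):
        self.fail_on, self.requests = fail_on, 0

    def __call__(self, fn, data):
        self.requests += 1
        if self.requests == self.fail_on:
            raise DeviceFailure()
        return fn(data)


def reliable(fn, data):
    return fn(data)


result, stored = run_with_recovery([("T1", t1), ("T2", t2)], 3,
                                   [FlakyDevice(fail_on=2), reliable])
```

In this run the first device fails while handling T2; T2 is resubmitted to the second device using the stored output of T1, and T1 is executed only once.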

In a particular embodiment, the various steps of the distribution method or of the inference method are determined by computer program instructions or are implemented by a silicon chip comprising transistors adapted to form logic gates of non-programmable hard-wired logic.

Consequently, the invention also relates to a computer program on an information medium, this program being capable of being implemented in a controller computer, this program comprising instructions adapted to the implementation of the steps of a distribution method and/or of an inference method as described above.

This program may use any programming language, and may be in the form of source code, object code, or code intermediate between source code and object code, such as in a partially compiled form, or in any other desirable form.

The invention also relates to a computer-readable information medium comprising instructions of a computer program as mentioned above. The information medium can be any entity or device capable of storing the program. For example, the medium can comprise a storage means, such as a ROM, a non-volatile memory of the flash type or a magnetic recording means, for example a hard disk. Furthermore, the information medium can be a transmissible medium such as an electrical or optical signal, which can be conveyed via an electrical or optical cable, by radio or by other means. The program according to the invention can in particular be downloaded from a network such as the Internet. Alternatively, the information medium can be an integrated circuit in which the program is incorporated, the circuit being adapted to execute or to be used in the execution of the method in question.

Other characteristics and advantages of the present invention will emerge from the description given below, with reference to the attached drawings, which illustrate non-limiting exemplary embodiments thereof. In the figures:

Figure 1, already described, represents the vertical division of a VGG16-type neural network;

Figure 2 illustrates an example of horizontal division;

Figure 3 illustrates an example of hybrid division;

Figure 4 represents, in the form of a flowchart, the main steps of a distribution method in accordance with a particular embodiment of the invention;

Figure 5 represents, in the form of a flowchart, the main steps of an inference method in accordance with a particular embodiment of the invention;

Figure 6 illustrates the inference of a neural network partitioned by the invention;

Figure 7 illustrates an example of application of the invention;

Figure 8 represents the functional architecture of a distribution orchestrator in accordance with a particular embodiment of the invention;

Figure 9 represents the hardware architecture of an inference orchestrator in accordance with a particular embodiment of the invention.

We will now describe an embodiment of the invention to illustrate how a neural network RN can be partitioned using a hybrid partitioning to minimize the communication cost and the inference response latency.

In this embodiment, an orchestrator ORC acts as both distribution orchestrator and inference orchestrator. This orchestrator is shown in Figures 8 and 9.

Figure 4 presents the main steps of a distribution method according to a particular embodiment of the invention. This method is implemented by the orchestrator ORC.

During a step E10, the orchestrator ORC obtains a neural network RN. The network can be obtained by any means, in particular via a communication network.

This neural network RN will be partitioned and distributed during steps E20 to E40. To distinguish it from the subnetworks resulting from this hybrid partition, this network RN may also be called the original network.

This original network RN can be the complete network as obtained from the training and validation phases.

This original network RN can also be a subnetwork resulting from the hybrid partition of the complete network (called the level 1 partition), or a sub-subnetwork resulting from the hybrid partition of a subnetwork itself resulting from the hybrid partition of the complete network (level 2 partition), and so on.

In the embodiment described here, during a step E20, the orchestrator ORC performs a vertical division of the complete network RN to obtain vertical subnetworks SRVi.

In a hybrid partition of level 1, this vertical division can consist of dividing the original network RN after the pooling layers.

In a hybrid partition of level n greater than 1, this vertical division can consist of dividing the original network RN vertically in two. In other words, if the neural network RN has 2N layers (respectively 2N+1 layers), the vertical division of this network can produce two subnetworks of N layers (respectively one subnetwork of N layers and one subnetwork of N+1 layers).
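The division in two described above can be sketched with integer division. The list-of-names representation of the layers is an assumption of this example.

```python
def split_in_two(layers):
    """Divide a list of layers vertically into two consecutive halves.

    With 2N layers both halves have N layers; with 2N+1 layers the first
    half has N layers and the second N+1.
    """
    if len(layers) < 2:
        raise ValueError("a single-layer network cannot be divided vertically")
    mid = len(layers) // 2
    return layers[:mid], layers[mid:]


even = split_in_two(["L1", "L2", "L3", "L4"])        # 2N layers, N = 2
odd = split_in_two(["L1", "L2", "L3", "L4", "L5"])   # 2N+1 layers, N = 2
```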

A neural network RN consisting of a single layer cannot be divided vertically.

In the embodiment described here, it is assumed that all execution devices can process the same maximum number MAX of parameters.

In the embodiment described here, during a step E30, the orchestrator ORC determines, for each vertical subnetwork SRVi, whether the number of parameters P(SRVi) of this subnetwork is greater than the maximum number MAX of parameters that can be processed by each of the execution devices. If so, no execution device has the capacity to perform an inference of this subnetwork.

In the embodiment described here, if the number NL of layers of this subnetwork is strictly greater than 1, the vertical subnetwork SRVi is again divided vertically by returning to step E20.

Conversely, if the number NL of layers of this subnetwork is equal to 1, the vertical subnetwork SRVi is divided horizontally during a step E40.

In the embodiment described here, during this step E40, the orchestrator ORC divides the subnetwork horizontally into a partition of horizontal sub-subnetworks in which each horizontal sub-subnetwork comprises the maximum number MAX of parameters that can be processed by an execution device.

As a variant, during test E30, the orchestrator compares the number of parameters P(SRVi) of a subnetwork with the maximum number MAX of parameters that can actually be processed by each execution device at the current time.

In the embodiment described here, after step E40, all vertical and horizontal subnetworks can be processed by an execution device.
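The loop formed by steps E20, E30 and E40 can be sketched recursively as below. This is a simplified illustration, not the claimed method: layers are reduced to hypothetical (name, parameter count) pairs, the vertical division always halves the subnetwork (the level 1 division after pooling layers is omitted), and the last horizontal piece is allowed to hold fewer than MAX parameters.

```python
def hybrid_partition(layers, max_params):
    """Partition `layers`, a list of (name, num_params), so that every
    resulting subnetwork holds at most `max_params` parameters
    (`max_params` must be positive)."""
    total = sum(params for _, params in layers)
    if total <= max_params:          # test E30: a device can run it as is
        return [layers]
    if len(layers) > 1:              # more than one layer: back to E20
        mid = len(layers) // 2
        return (hybrid_partition(layers[:mid], max_params)
                + hybrid_partition(layers[mid:], max_params))
    # Single layer that is too large: horizontal division (E40).
    name, remaining = layers[0]
    parts, index = [], 1
    while remaining > 0:
        chunk = min(max_params, remaining)
        parts.append([(f"{name}P{index}", chunk)])
        remaining -= chunk
        index += 1
    return parts


# Hypothetical three-layer network and a device limit of 100 parameters.
subnetworks = hybrid_partition([("L1", 30), ("L2", 40), ("L3", 250)], 100)
```

With these hypothetical numbers, L1 and L2 end up as single-layer vertical subnetworks and L3 is divided horizontally into three pieces.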

Any other hybrid partitioning strategy is part of the invention. A hybrid partitioning can for example start with horizontal divisions, alternate horizontal and vertical divisions, etc.

In the embodiment described here, the allocation of the subnetworks to the execution devices is carried out according to a confidentiality criterion.

Thus, in the embodiment described here, during a step E50, the orchestrator ORC determines whether the data processed or generated by a subnetwork is sensitive.

If the data processed or generated by a subnetwork is considered sensitive, the parameters of this subnetwork are provided to execution devices DES considered to offer a high level of security (step E60). If the data processed or generated by a subnetwork is considered non-sensitive, the parameters of this subnetwork can be provided to execution devices DEC considered to offer a lower level of security (step E70). Of course, more than two levels of sensitivity or security can be envisaged.
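A minimal sketch of steps E50 to E70 follows. The boolean sensitivity flag, the device names and the round-robin choice within each pool are assumptions made for illustration; the description does not specify how a device is picked within a pool.

```python
def assign_subnetworks(subnetworks, secure_devices, standard_devices):
    """Assign each (name, is_sensitive) subnetwork to a device.

    Sensitive subnetworks go to the secure pool (DES, step E60), the others
    to the standard pool (DEC, step E70). Devices are chosen round-robin
    within each pool, which is an assumption of this sketch.
    """
    assignment = {}
    next_index = {True: 0, False: 0}
    for name, is_sensitive in subnetworks:
        pool = secure_devices if is_sensitive else standard_devices
        assignment[name] = pool[next_index[is_sensitive] % len(pool)]
        next_index[is_sensitive] += 1
    return assignment


assignment = assign_subnetworks(
    [("SR1", True), ("SR2", False), ("SR3", True), ("SR4", False)],
    secure_devices=["DES1"],
    standard_devices=["DEC1", "DEC2"],
)
```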

In the embodiment described here, the subnetworks are distributed to the different execution devices through a communication network NET. Each execution device receives the parameters of the subnetworks and installs them.

The following describes, with reference to the figures, a method for performing an inference of a neural network RN whose partitioning is illustrated in the figures. This method is implemented by the orchestrator ORC.

In the embodiment described here, the network RN comprises 21 ordered layers L1 to L21, the first layer receiving input data D_IN and the 21st and last layer producing output data D_OUT.

As shown in this figure, it is assumed that this neural network RN has been partitioned and that the subnetworks have been distributed across four execution devices D1 to D4.

In this hybrid partitioning:
- a first vertical cut was made between layers L9 and L10. The parameters of layers L1 to L9 were provided to device D1;
- a second vertical cut was made between layers L14 and L15. The parameters of layers L10 to L14 were provided to device D2;
- a third vertical cut was made between layers L18 and L19. The parameters of layers L15 to L18 were provided to device D3;
- a horizontal cut was made within layer L19 to obtain four horizontal subnetworks L19P1 to L19P4. The parameters of a subnetwork L19Pi were provided to device Di;
- a fourth vertical cut was made after layer L19. The parameters of layers L20 and L21 were provided to device D3.
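As an illustration only, the hybrid partitioning listed above can be written down as a small table. The `Subnetwork` class, its field names and the helper functions are hypothetical and do not come from the patent; only the layer ranges and the device assignments are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnetwork:
    name: str         # label used in the description, e.g. "L19P2"
    first_layer: int  # first layer covered by the subnetwork
    last_layer: int   # last layer covered by the subnetwork
    kind: str         # "vertical" or "horizontal"
    device: str       # execution device holding the parameters


# Partition of the 21-layer network RN over devices D1 to D4.
PARTITION = [
    Subnetwork("L1-L9", 1, 9, "vertical", "D1"),
    Subnetwork("L10-L14", 10, 14, "vertical", "D2"),
    Subnetwork("L15-L18", 15, 18, "vertical", "D3"),
    # Layer L19 is cut horizontally into four parts, one per device.
    *[Subnetwork(f"L19P{i}", 19, 19, "horizontal", f"D{i}") for i in range(1, 5)],
    Subnetwork("L20-L21", 20, 21, "vertical", "D3"),
]


def covered_layers(partition):
    """Return the set of layer indices covered by the partition."""
    layers = set()
    for sub in partition:
        layers.update(range(sub.first_layer, sub.last_layer + 1))
    return layers


def subnetworks_on(partition, device):
    """Return the names of the subnetworks installed on a device."""
    return [sub.name for sub in partition if sub.device == device]
```

With this table, `subnetworks_on(PARTITION, "D3")` lists the three subnetworks held by device D3, and `covered_layers(PARTITION)` can be used to check that no layer of RN was left out.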

The subnetworks Li(Pj) are ordered according to the position, in the neural network, of the layer from which they come. Here, the order of a subnetwork Li(Pj) is i.

During a step F10, the orchestrator receives initial input data DIN.

During a step F20, the orchestrator determines whether the inference is complete.

If it is not, during a step F30, the orchestrator determines whether the next subnetwork is a vertical subnetwork. If so:
- it sends the data DIN to this subnetwork (step F32);
- it receives the data DOUT produced by the inference of this subnetwork on the data DIN (step F34) and stores them (step F36).

At least part of the output data DOUT constitutes input data DIN for the next subnetwork (step F38), and the method returns to step F20.

If, during step F30, the orchestrator determines that the next subnetwork results from a horizontal cut, then for each horizontal subnetwork resulting from this cut, the orchestrator:
- sends part of the data DIN to this subnetwork (step F42);
- receives the data DOUT produced by the inference of this subnetwork on these data (step F44) and stores them (step F46).

The orchestrator merges (step F47) the data DOUT received from these subnetworks to produce merged data DFOUT and stores them (step F48). Because the merge has to wait for the outputs of all these horizontal subnetworks, this step can be described as a synchronization.

The merged output data DFOUT constitute input data DIN for the next subnetwork (step F49), and the method returns to step F20.

When all the subnetworks have been invoked, the orchestrator determines, during step F20, that the inference is complete. It produces (step F50) the result of this inference, which corresponds to the data DIN from the last occurrence of step F38 or F49.
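Steps F10 to F50 can be sketched as a simple loop. This is a minimal sketch under several assumptions that are not in the patent: remote execution devices are simulated by local Python callables, the merge of step F47 is taken to be a concatenation, and every horizontal subnetwork receives the whole of DIN (the text only says "part of the data DIN"). The class and function names are hypothetical.

```python
class Vertical:
    """A vertical subnetwork, simulated by a local callable (assumption)."""

    def __init__(self, infer):
        self.infer = infer


class Horizontal:
    """The horizontal subnetworks resulting from one horizontal cut."""

    def __init__(self, parts):
        self.parts = parts  # one callable per horizontal subnetwork


def run_inference(stages, initial_data):
    d_in = list(initial_data)            # F10: receive the initial data DIN
    stored = []                          # outputs kept by the orchestrator
    for stage in stages:                 # F20: loop until no subnetwork is left
        if isinstance(stage, Vertical):  # F30: vertical subnetwork?
            d_out = stage.infer(d_in)    # F32 send DIN, F34 receive DOUT
            stored.append(d_out)         # F36: store DOUT
            d_in = d_out                 # F38: DOUT becomes the next DIN
        else:
            outputs = []
            for part in stage.parts:
                outputs.append(part(d_in))  # F42 send, F44 receive
            stored.append(outputs)          # F46: store each DOUT
            # F47: merge; concatenation is an assumption of this sketch.
            d_fout = [value for out in outputs for value in out]
            stored.append(d_fout)           # F48: store DFOUT
            d_in = d_fout                   # F49: DFOUT becomes the next DIN
    return d_in                          # F50: result of the inference
```

In a real deployment the calls to `stage.infer` and `part` would be messages exchanged with the execution devices, and the horizontal branch could issue its requests in parallel before waiting for all the answers.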

Application example

With reference to the figure, an example of implementation of the invention is now presented, for a manufacturer that produces aluminum aeronautical parts by an extrusion process.

To detect defects in these parts, the manufacturer uses an artificial intelligence solution based on image recognition, using a neural network of the VGG16 type.

In this example, the manufacturer wishes to:
i/ involve in the image recognition underused devices of its infrastructure that are located in departments (accounting COMPTA, secretariat SECR, graphics GRAPH, etc.) that are less secure than a production department PROD;
ii/ limit the risk of disclosure of manufacturing secrets by controlling the location of sensitive data (raw images in particular);
iii/ protect its know-how and investments by limiting the risk of a third party gaining knowledge of its entire artificial intelligence model;
iv/ limit the risk of a third party spying on its activity.

For this purpose, two types of devices are distinguished:
i/ those whose security policy is compatible with and controlled by the production environment (represented in thick lines);
ii/ those considered “unsafe” by the production environment (represented in thin lines).

In this example, an orchestrator ORC partitions the original neural network VGG16 into subnetworks according to the technical characteristics of the devices it manages.

In a particular embodiment, the orchestrator ORC can manage sub-orchestrators and delegate all or part of the hybrid partitioning to them.

Each orchestrator or sub-orchestrator manages the execution of tasks by the subnetworks resulting from the hybrid partitioning for which it is responsible. The following notation is used:
- DPi: secure devices of the production department PROD;
- DGi: less secure devices of the graphics department GRAPH;
- DMi: less secure devices of the accounting department COMPTA;
- DSi: less secure devices of the secretariat department SECR.

In the embodiment described here, the orchestrator ORC is considered to manage all the devices.

The orchestrator ORC is responsible for scheduling the tasks to be executed, through the different subnetworks, by the devices (also called “workers”).

The devices execute the tasks assigned to them by the orchestrator ORC (or, where applicable, by a sub-orchestrator) via a communication network.

In the example shown in the figure, a routing module MR enables communication between the orchestrator ORC and the devices.

Thanks to this architecture:
- only the orchestrator ORC and the routing module MR know the entire model of the original neural network VGG16;
- only the orchestrator ORC and the routing module MR know the data in their entirety;
- the devices know only the subnetwork or subnetworks entrusted to them and the portion of the data entering and leaving these subnetworks.

In this example, the subnetworks obtained by partitioning the original network VGG16 that process or produce the data considered most sensitive can be distributed to the most secure devices. The less secure devices are then used only to process intermediate, partial data that are difficult to interpret because they have undergone unknown transformations.
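The placement rule described above can be sketched as follows. This is only one possible reading: the function `place`, the boolean sensitivity flag attached to each subnetwork, and the round-robin choice among devices are assumptions made for the illustration, not elements of the patent.

```python
from itertools import cycle


def place(subnetworks, devices):
    """Assign each subnetwork to a device.

    subnetworks: list of (name, is_sensitive) pairs, in network order.
    devices: dict mapping device name to True if the device is secure.
    Sensitive subnetworks go only to secure devices; the others go to
    the less secure devices when there are any (hypothetical policy).
    """
    if not devices:
        raise ValueError("no execution device available")
    secure = [name for name, is_secure in devices.items() if is_secure]
    others = [name for name, is_secure in devices.items() if not is_secure]
    secure_cycle = cycle(secure)
    other_cycle = cycle(others or secure)
    placement = {}
    for name, is_sensitive in subnetworks:
        if is_sensitive:
            if not secure:
                raise ValueError(f"no secure device for sensitive subnetwork {name}")
            placement[name] = next(secure_cycle)
        else:
            placement[name] = next(other_cycle)
    return placement
```

For instance, marking the first subnetwork (which sees the raw images) and the last one (which produces the final result) as sensitive keeps both on the DPi devices, while the intermediate subnetworks are spread over the DGi, DMi and DSi devices.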

The execution solution thus allows the manufacturer to make use of its infrastructure while protecting its secrets, including on devices whose security policies are weaker.

In a particular embodiment, the routing module used is MQTT.

As a reminder, MQTT is a communication protocol suited to traversing different networks and built around an addressing logic.

In the embodiment described here, the orchestrator can address the devices either point-to-point or in broadcast mode.
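MQTT addresses messages through hierarchical topics rather than device addresses, so one plausible way to obtain both modes is to give each device its own topic and to add a topic shared by all devices. The sketch below assumes such a layout; the topic names (`nn/task/D1`, `nn/task/all`, `nn/result/+`) and the helper functions are hypothetical, and the matcher is a simplified version of MQTT topic-filter matching (it ignores the special handling of topics starting with `$`).

```python
def topic_matches(topic_filter, topic):
    """Simplified MQTT filter matching: '+' is one level, '#' is the rest."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard, assumed to be the last level
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    return len(filter_levels) == len(topic_levels)


def worker_subscriptions(device_id):
    """Topics a device subscribes to: its own topic and the shared one."""
    return [f"nn/task/{device_id}", "nn/task/all"]


def recipients(topic, device_ids):
    """Devices that would receive a message published on the given topic."""
    return [
        device_id
        for device_id in device_ids
        if any(topic_matches(f, topic) for f in worker_subscriptions(device_id))
    ]
```

Publishing on `nn/task/D2` then reaches a single device, publishing on `nn/task/all` reaches all of them, and the orchestrator could collect the outputs by subscribing to a filter such as `nn/result/+`.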

In one embodiment, the orchestrator can entrust the execution of the same task to several devices and compare the results produced by these devices in order to detect abnormal behavior of a device, caused for example by a failure or a malicious act.

This embodiment makes the method more reliable.
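The text does not say how the results are compared. One simple possibility, shown here purely as an assumption, is a majority vote over rounded outputs: devices whose output differs from the majority are reported as suspect. The function name and the rounding tolerance are hypothetical.

```python
from collections import Counter


def detect_abnormal(results, digits=6):
    """Compare the outputs of devices that ran the same task.

    results: dict mapping device name to its output (list of floats).
    Returns (majority_output, suspects). Outputs are rounded before
    comparison so that tiny numerical differences are ignored.
    """
    rounded = {
        device: tuple(round(value, digits) for value in output)
        for device, output in results.items()
    }
    counts = Counter(rounded.values())
    majority, votes = counts.most_common(1)[0]
    if votes <= len(results) / 2:
        raise ValueError("no strict majority among the devices")
    suspects = sorted(d for d, out in rounded.items() if out != majority)
    return list(majority), suspects
```

A device flagged this way is not necessarily malicious; as the description notes, a plain failure produces the same symptom.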

Using MQTT to distribute the execution of a neural network is original.

In the embodiment described here, an optimization policy can be put in place to distribute the subnetworks over the different devices so as to limit a cost function taking into account:
i/ a communication cost taking into account the volume of data sent by the orchestrator ORC to each of the devices, the volume of data received by the orchestrator ORC from each of the devices, and a distance (physical distance, number of network hops, latency, etc.) between the orchestrator ORC and each of the devices;
ii/ a synchronization cost for merging the results produced by the different subnetworks resulting from the same horizontal cut.
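The description names the ingredients of the cost function but not how they are combined. The sketch below assumes one simple combination: data volume weighted by distance for the communication cost, a cost proportional to the number of outputs to merge for the synchronization cost, and a weighted sum of the two. The formula, the weights `alpha` and `beta`, and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Link:
    bytes_sent: int      # volume sent by ORC to the device
    bytes_received: int  # volume received by ORC from the device
    distance: float      # e.g. hops or latency; the unit is an assumption


def communication_cost(links):
    """Total data volume exchanged, weighted by the distance to each device."""
    return sum((l.bytes_sent + l.bytes_received) * l.distance for l in links)


def synchronization_cost(horizontal_group_sizes, cost_per_output=1.0):
    """Cost of merging, proportional to the number of outputs per horizontal cut."""
    return sum(cost_per_output * size for size in horizontal_group_sizes)


def total_cost(links, horizontal_group_sizes, alpha=1.0, beta=1.0):
    """Weighted sum of both costs (hypothetical combination)."""
    return alpha * communication_cost(links) + beta * synchronization_cost(
        horizontal_group_sizes
    )
```

An optimization policy could then evaluate `total_cost` for several candidate placements and keep the cheapest one.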

Advantageously, the optimization policy can limit the loading and unloading of sub-models in memory by assigning each device a task (in other words, a subnetwork) that does not change over time. This policy has a twofold benefit: it saves the significant time needed to load sub-models, and it makes it possible to map precisely what each device knows. This policy is of particular interest for batch processing of tasks.

A distribution orchestrator ORCD according to one embodiment of the invention will now be described with reference to the figure. This orchestrator is a piece of computer equipment, such as a computer.

It comprises:

- a processing unit or processor 801, or CPU, intended to load instructions into memory, execute them, and carry out operations;

- a set of memories, including a volatile memory 802, or RAM, used to execute code instructions, store variables, etc., and a storage memory 803 of the EEPROM type. In particular, the storage memory 803 is arranged to store a parameter distribution software module comprising code instructions for implementing the steps of the distribution method described above.

The distribution orchestrator also comprises:
- a module MPD for partitioning a neural network into a set of subnetworks, said module being configured to:
(i) perform at least one so-called vertical cut of the neural network between two layers to obtain at least one so-called vertical subnetwork, a vertical subnetwork consisting of one or more consecutive layers of said network; and
(ii) perform at least one so-called horizontal cut of a set of at least one layer of the network to obtain at least one so-called horizontal subnetwork comprising only part of the neurons of the set, the result of an inference of a horizontal subnetwork being solely a function of the parameters of the neurons of this subnetwork;
- a module MED for sending the parameters of at least one said subnetwork so that they are distributed to at least one execution device (W) configured to perform an inference of the subnetwork,
the result of the inference of the subnetwork contributing to the result of an inference of said neural network.
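The property required of a horizontal subnetwork (its output depends only on the parameters of its own neurons) holds naturally for a dense layer cut by neurons: each output value uses one row of weights and one bias. The sketch below illustrates this with plain Python lists; the function names are hypothetical, no activation is applied, and each part is assumed to receive the full input vector.

```python
def dense(weights, biases, x):
    """Output of a dense layer: one weighted sum plus bias per neuron."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]


def split_horizontally(weights, biases, n_parts):
    """Cut a dense layer into n_parts groups of consecutive neurons."""
    n_neurons = len(weights)
    if not 1 <= n_parts <= n_neurons:
        raise ValueError("n_parts must be between 1 and the number of neurons")
    base, extra = divmod(n_neurons, n_parts)
    parts, start = [], 0
    for index in range(n_parts):
        size = base + (1 if index < extra else 0)
        parts.append((weights[start:start + size], biases[start:start + size]))
        start += size
    return parts


def dense_distributed(parts, x):
    """Run each part independently, then concatenate the partial outputs."""
    merged = []
    for part_weights, part_biases in parts:
        merged.extend(dense(part_weights, part_biases, x))
    return merged
```

Concatenating the partial outputs gives the same vector as the unsplit layer. The same remains true with an activation applied element by element, but not with one that needs all the outputs of the layer at once.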

An inference orchestrator ORCI according to one embodiment of the invention will now be described with reference to the figure. This orchestrator is a piece of computer equipment, such as a computer.

It comprises:

- a processing unit or processor 901, or CPU, intended to load instructions into memory, execute them, and carry out operations;

- a set of memories, including a volatile memory 902, or RAM, used to execute code instructions, store variables, etc., and a storage memory 903 of the EEPROM type. In particular, the storage memory 903 is arranged to store an inference software module comprising code instructions for implementing the steps of the inference method described above.

The inference orchestrator also comprises:
- a module MRI for receiving initial data;

- a scheduling module MOI configured to send, to each of the subnetworks of order n resulting from a hybrid partition, input data for this subnetwork, said input data being:
(i) at least part of the initial data, for said at least one subnetwork of order 1; or
(ii) for a subnetwork of order n greater than 1, data obtained from output data DOUT, DFOUT obtained by the inference of at least one subnetwork of order n-1;
- a module MCI for returning the inference of said neural network, the latter being obtained from the output data of said at least one subnetwork of maximum order.

According to the invention, the subnetworks are ordered, the order of a subnetwork corresponding to the position, in the neural network, of the layer from which it comes.

Claims

Method for distributing parameters of a neural network to at least one device, the method being implemented by an orchestrator (ORC, SORC) and comprising steps of:
- partitioning said network into a set of subnetworks, said partitioning comprising:
(i) at least one step (E20) called vertical cutting of the neural network between two layers of the network to obtain at least one so-called vertical sub-network, a vertical sub-network being made up of one or more consecutive layers of said network; and
(ii) at least one step (E40) called horizontal cutting of a set of at least one layer of the network to obtain at least one so-called horizontal sub-network comprising only part of the neurons of the set, the result of an inference of a horizontal sub-network being solely a function of the parameters of the neurons of this sub-network;
- sending (E60, E70) the parameters of at least one said subnetwork so that they are distributed to at least one execution device (W) configured to perform an inference of the subnetwork,
the result of the inference of the subnetwork contributing to the result of an inference of said neural network.

Distribution method according to claim 1, characterized in that at least one vertical cutting is carried out after a grouping layer of the neural network.

Distribution method according to claim 1 or 2 comprising:
- a step (E30) for determining whether at least one execution device has sufficient resources to perform an inference of a said subnetwork by comparing the number of parameters of this subnetwork with a maximum number of parameters that can be processed by this execution device; and if this is not the case;
- at least one step (E20, E40) of vertical or horizontal cutting of said subnetwork.

Distribution method according to any one of claims 1 to 3, characterized in that said at least one horizontal cutting is applied to an assembly consisting of a single layer of said network.

Distribution method according to claim 4, characterized in that said at least one horizontal cutting is applied to a dense layer.

Distribution method according to any one of claims 3 to 5, wherein said at least one horizontal cutting is applied to a layer if and only if the method does not identify any execution device having sufficient resources to perform an inference of this layer.

Distribution method according to any one of claims 1 to 6, characterized in that it comprises a step (E50) of determining a sensitive nature of the data processed or generated by a subnetwork and in which at least the partitioning of said network into subnetworks or a selection of the devices to which the parameters of said subnetworks are provided is carried out according to this determination.

A method of inferring a neural network partitioned into a plurality of ordered subnetworks, the plurality of subnetworks comprising:
(i) at least one so-called vertical sub-network consisting of one or more consecutive layers of said network; and
(ii) at least one so-called horizontal subnetwork comprising only part of the neurons of a vertical subnetwork, the result of an inference of a horizontal subnetwork being solely a function of the parameters of the neurons of this subnetwork;
the parameters of each of said subnetworks being provided to a device (SORC, W) configured to obtain an inference of said subnetwork,
said subnetworks being ordered, the order of a subnetwork corresponding to the order of the layer from which it comes in the neural network;
the method comprising a step of receiving (F10) initial data; and for each of the sub-networks of order n taken in said order, steps of:
- sending (F32, F42) input data to said subnetwork, said input data being:
(i) at least part of the initial data for said at least one subnetwork of order 1 or;
(ii) for a subnetwork of order n greater than 1, data obtained from output data (DOUT, DFOUT) obtained by the inference of at least one subnetwork of order n-1;
- the inference of said neural network being obtained from the output data of said at least one maximum order sub-network.

Inference method according to claim 8, comprising a step of storing (F36, F46) the output data produced (DOUT) by the inference of a said sub-network.

Distribution orchestrator (ORCD) configured to distribute parameters of a neural network to at least one device, said orchestrator comprising:
- a module (MPD) for partitioning said network into a set of subnetworks, said module being configured to:
(i) perform at least one so-called vertical cutting of the neural network between two layers to obtain at least one so-called vertical subnetwork, a vertical subnetwork being made up of one or more consecutive layers of said network; and
(ii) perform at least one so-called horizontal cutting of a set of at least one layer of the network to obtain at least one so-called horizontal sub-network comprising only part of the neurons of the set, the result of an inference of a horizontal sub-network being solely a function of the parameters of the neurons of this sub-network;
- a module (MED) for sending the parameters of at least one said subnetwork so that they are distributed to at least one execution device (W) configured to perform an inference of the subnetwork,
the result of the inference of the subnetwork contributing to the result of an inference of said neural network.

An inference orchestrator (ORC) configured to perform inference of a neural network partitioned into a plurality of ordered subnetworks, the plurality of subnetworks comprising:
(i) at least one so-called vertical sub-network consisting of one or more consecutive layers of said network; and
(ii) at least one so-called horizontal subnetwork comprising only part of the neurons of a vertical subnetwork, the result of an inference of a horizontal subnetwork being solely a function of the parameters of the neurons of this subnetwork;
the parameters of each of said subnetworks being provided to a device (SORC, W) configured to obtain an inference of said subnetwork,
said subnetworks being ordered, the order of a subnetwork corresponding to the order of the layer from which it comes in the neural network;
the inference orchestrator comprising:
- a module (MRI) for receiving initial data;
- a scheduling module (MOI) configured to send to each of the subnetworks of order n taken in said order, input data of this subnetwork, said input data being:
(i) at least part of the initial data for said at least one subnetwork of order 1 or;
(ii) for a subnetwork of order n greater than 1, data obtained from output data (DOUT, DFOUT) obtained by the inference of at least one subnetwork of order n-1;
- a module (MCI) for returning the inference of said neural network, this being obtained from the output data of said at least one maximum order sub-network.

System comprising:
- a partition orchestrator (ORC) according to claim 10 and/or an inference orchestrator according to claim 11; and
- at least one execution device (W) configured to perform the inference of at least one subnetwork.

System according to claim 12 wherein said execution device (W) is integrated into a home gateway.

Computer program (PG) comprising instructions for executing the steps of the distribution method according to any one of claims 1 to 7 and/or instructions for executing the steps of the inference method according to any one of claims 8 or 9 when said program is executed by a computer.

Computer-readable recording medium on which a computer program (PG) according to claim 14 is recorded.