WO2020094995A1

WO2020094995A1 - Method of neural network construction for the simulation of physical systems

Info

Publication number: WO2020094995A1
Application number: PCT/FR2019/052649
Authority: WO
Inventors: Manuel BOMPARD; Mathieu CAUSSE; Florent MASMOUDI; Mohamed Masmoudi; Houcine TURKI
Original assignee: Adagos
Priority date: 2018-11-09
Filing date: 2019-11-07
Publication date: 2020-05-14

Abstract

The subject of the invention is a method for constructing a forward propagation neural network, a set of nodes and of connection between the nodes forming a topology organized into layers, such that each layer is defined by a set of computable nodes that can be calculated during one and the same step, and the input of a processing node of a layer can be connected to the output of a node of any one of the previous layers, the method comprising a step of initializing a neural network according to an initial topology and at least one topological optimization phase, of which each phase comprises: - at least one additive phase comprising the modification of the topology of the network by the addition of at least one node and/or a connection link between the input of a node of a layer and the output of a node of any one of the previous layers, and/or - at least one subtractive phase comprising the modification of the topology of the network by the deletion of at least one node and/or a connection link between two layers, and in which each topology modification comprises the selecting of a topology modification from among a plurality of candidate modifications, on the basis of an estimation of the variation of the error of the network between each topology modified according to a candidate modification and the previous topology.

Description

Title: Method for building a neural network for the simulation of physical systems

Technical field

The invention relates to the learning of phenomena representing real systems with sparse neural networks, having very few connections.

The invention is particularly applicable to the simulation of a real static system, for example to assess the response of the real system in new situations, but also to the simulation of a real dynamic system over long times, for example to model the evolution of a real system. The dynamic model is based on a recurrent form of a forward propagation neural network, which we will call the "recurrent pattern" in the following.

The invention finds an advantageous application in the simulation, at least in real time, of complex physical systems.

Prior art

The present invention provides a method of learning real phenomena by sparse neural networks, having very few connections. This can concern physical, biological, chemical or even computer phenomena.

State-of-the-art methods have been largely inspired by the highly redundant biological brain. Redundancy helps protect the brain from the loss of neural cells. This loss can be accidental or not. It turns out that redundant choice in artificial neural networks plays a major role in the learning process.

The first cause of redundancy is linked to the organization of the topology of the neural network by layers of neural cells. It is up to the user to define the number of layers and the number of cells per layer. This construction is done manually, by trial and error. The neural network must be large enough to perform the learning, but it is not minimal in size and is necessarily redundant.

This redundant nature plays a major role in the learning process. Indeed, according to the publication LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep learning". Nature. 521 (7553): 436-444, the learning process is not trapped by local minima when the neural network is sufficiently large.

This fundamental property makes the gradient method a possible candidate for learning. But this method, reputed to have a very low convergence rate (https://en.wikipedia.org/wiki/Gradient_descent), ensures a very good descent of the error at the start of the learning process. Hence the idea of the stochastic gradient: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT2010 (pp. 177-186). Physica-Verlag HD, which reinforces this property by changing the error function with each iteration of the gradient. This involves applying an iteration of the gradient to each learning sample in turn. Sometimes the stochastic gradient method is applied in small groups of samples. The stochastic gradient, like the gradient, does not have good local convergence. The answer to this problem is redundancy. Indeed, due to this redundant nature, the learning process must stop prematurely to avoid the phenomenon of over-learning. Thus the gradient and stochastic gradient methods are used only in their field of efficiency.
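By way of illustration only, the per-sample update described above can be sketched as follows. This is a minimal sketch that is not taken from the cited publications: the one-parameter model y = w x, the data, the learning rate and the function name `sgd_fit` are all assumptions chosen for the example.

```python
def sgd_fit(samples, lr=0.05, epochs=200):
    """Stochastic gradient descent on J(w) = sum_i (w * x_i - y_i)**2 / 2.

    Hypothetical one-parameter model y = w * x, used only for illustration.
    """
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:          # one gradient iteration per learning sample
            grad = (w * x - y) * x    # gradient of the single-sample error
            w -= lr * grad
    return w


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # synthetic data with y = 2 x
w_hat = sgd_fit(samples)
print(w_hat)
```

Applying the update to small groups of samples instead of single samples would only change the inner loop, which would then accumulate the gradient over each group before updating w.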

Finally, in a redundant context, the large number of connection weights to be determined requires the use of massive amounts of data. The state of the art goes hand in hand with what is known as "big data".

The state of the art represents a coherent building based on redundancy. But the absence of local convergence shows that the state of the art is oriented towards qualitative learning. If the answer is greater than 0.5, it is assimilated to one and if it is less than 0.5, it is assimilated to 0. Quantitative responses have precision requirements which are not taken into account by these methods.

The present invention meets the needs of the emerging field of modeling complex physical systems by creating a digital copy, also called a digital twin, of the physical system, adapted to accurately predict the state of the physical system faster than the real system, and preferably thousands of times faster, so as to be able to simulate a large number of possible scenarios impacting the physical system before making the best decision for the real system.

The notion of digital twin has been introduced in the following publications:

- Glaessgen, E. H. & Stargel, D. (April 2012), "The Digital Twin paradigm for future NASA and US Air Force vehicles", In 53rd Struct. Dyn. Mater. Conf. Special Session: Digital Twin, Honolulu, HI, US.

- Tuegel, E. J., Ingraffea, A. R., Eason, T. G. & Spottswood, S. M. (2011), "Reengineering aircraft structural life prediction using a digital twin", International Journal of Aerospace Engineering, 2011.

Most learning methods, when applied to quantitative phenomena, are generally limited to relatively simple cases which require only shallow models. In addition to neural methods, we can cite methods such as kriging and support vector machine regression:

- Lophaven, S. N., Nielsen, H. B., & Sondergaard, J. (2002). DACE: a Matlab kriging toolbox (Vol. 2). IMM, Informatics and Mathematical Modeling, The Technical University of Denmark.

- Balabin, R. M., & Lomakina, E. I. (2011). Support vector machine regression (SVR/LS-SVM) - an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data. Analyst, 136 (8), 1703-1712.

These two extremely popular methods can be likened to shallow neural networks, having only three layers of neurons.

These methods, as well as neural networks with a small number of layers, cover most of the needs in the field of modeling quantitative phenomena.

The need for deep and quantitative learning appears in special cases such as:

- Dynamic modeling with recurrent neural networks. A prediction over 1000 time steps is equivalent to the creation of a neural network with several thousand layers,

- Non-linear compression of data by neural networks, where the compression rate increases considerably with the number of layers of the neural networks.

Even if the manual determination of the topology of the neural network dominates the state of the art, the question of determining a topology adapted to the problem remains open. The automatic search for an optimal topology is a long-standing research subject in the neural network field. We can cite for example Attik, M., Bougrain, L., & Alexandre, F. (2005, September). Neural network topology optimization. In International Conference on Artificial Neural Networks (pp. 53-58). Springer, Berlin, Heidelberg, which is representative of pruning techniques to simplify a network.

We can cite other topological optimization methods:

- Mineu, N. L., Ludermir, T. B., & Almeida, L. M. (2010, July). Topology optimization for artificial neural networks using differential evolution. In Neural Networks (IJCNN), The 2010 International Joint Conference on (pp. 1-7). IEEE.

- Nazghelichi, T., Aghbashlo, M., & Kianmehr, M. H. (2011). Optimization of an artificial neural network topology using coupled response surface methodology and genetic algorithm for fluidized bed drying. Computers and electronics in agriculture, 75 (1), 84-91.

They are based on genetic algorithms. These methods are known to be very slow. Thanks to the computing resources now available, these methods are increasingly used, starting from redundant neural networks.

However, there are also applications for which the amount of data available is very limited (this is called "small data"), and in this case the redundant structures of neural networks cannot be used because they require more data than is available.

Other approaches consist in creating a reduced model by relying on heavy simulation software, which requires hours of calculation and is not compatible with real time. These approaches consist in creating a space of reduced dimension onto which the parameters of the system are projected. For example, in the case of a dynamic system, denoting by $X_i$ the solution of the non-reduced problem at time $i$, a solver must, to determine $X_{i+1}$ from $X_i$, solve a system of $N$ equations of the type $F(X_i, X_{i+1}) = 0$.

The number $N$ is also the dimension of the vectors $X_i$ and $X_{i+1}$. The implementation of a reduced model consists in determining a reduced orthonormal basis, denoted $U = (U_1, U_2, \ldots, U_n)$, where $n \ll N$. We can therefore compress $X_i$ by $x_i = U^T X_i$, where the $x_i$ are the coefficients, of size $n$, of $X_i$ in the reduced basis $U$, and we can decompress $x_i$ to obtain $X_i$ as follows: $X_i \approx U x_i$.

The reduced model consists in solving at each time step a system $F(U x_i, U x_{i+1}) = 0$ whose unknown $x_{i+1}$ is of small size $n$. This system is solved in the least squares sense. As shown schematically in Figure 1, once the compressed datum $x_{i+1}$ has been determined from $x_i$, it is decompressed to implement a recursion loop on the real data.
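The compression and decompression by a reduced orthonormal basis described above can be illustrated by the following minimal sketch. The basis, the vector and the function names `compress` and `decompress` are assumptions chosen for the example (here N = 3 and n = 2); the sketch does not solve the reduced system itself.

```python
import math


def compress(U, X):
    """x = U^T X, with U stored as a list of n orthonormal basis vectors of length N."""
    return [sum(u_k * X_k for u_k, X_k in zip(u, X)) for u in U]


def decompress(U, x):
    """X ~ U x: linear combination of the basis vectors weighted by the coefficients."""
    N = len(U[0])
    return [sum(x_j * U[j][k] for j, x_j in enumerate(x)) for k in range(N)]


s = 1.0 / math.sqrt(2.0)
U = [[1.0, 0.0, 0.0], [0.0, s, s]]   # hypothetical orthonormal basis, n = 2, N = 3

X = [3.0, 1.0, 1.0]                  # lies in the span of U: recovered exactly
x = compress(U, X)
X_rec = decompress(U, x)

X_out = [0.0, 1.0, -1.0]             # orthogonal to the span of U: lost entirely
X_out_rec = decompress(U, compress(U, X_out))
```

The second vector shows the limit of the approach: whatever lies outside the reduced basis cannot be recovered after decompression.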

This reduced model approach has for example been proposed in the following publications:

- Carlberg, K., Farhat, C., Cortial, J., & Amsallem, D. (2013). The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows. Journal of Computational Physics, 242, 623-647,

- Chinesta, F., Ladeveze, P., & Cueto, E. (2011). A short review on model order reduction based on proper generalized decomposition. Archives of Computational Methods in Engineering, 18 (4), 395-404.

This approach is not without drawbacks, however.

First, the reduced problem is highly unstable, which means that a small data disturbance leads to a large deviation from the solution. Therefore, the approximation of the state of a complex physical system with such a model is difficult.

In addition, the minimization of $\|F\|^2$ implies the computation of a residual of large dimension $N$ a certain number of times, which can prove to be costly in computation time. However, because of the instability problem, the residual must be minimized with the greatest precision at each step. Consequently the current methods are insufficiently precise to describe complex non-linear physical systems, and too costly in computation time to be used in real time in embedded systems.

The basic idea of these methods is to extract modeling information from the simulation software through the residual calculation. Our approach is so parsimonious that it manages to capture the physical and biological phenomena conveyed by the data.

So there is no solution to date to accurately and quickly model a complex physical system, over long periods of time, in order to reproduce it in the form of a digital copy.

Statement of the invention

The invention aims to remedy the shortcomings of the prior art described above, based on the use of redundant neural networks for learning real phenomena representing real systems.

In particular, an object of the invention is to propose a method of dynamic simulation of a complex physical system provided with excellent prediction capacities over long times and which is faster than the real time of the physical system.

Another object of the invention is to be applicable to both static and dynamic modeling of complex physical systems, and to also be applicable to nonlinear compression of complex systems. In fact, the compression ratio increases drastically with the depth of the network. This compression is the basis of dynamic prediction over long times.

The invention finally aims to provide a neural network structure adapted to the application for which it is subsequently used, this structure being parsimonious, that is to say as small as possible so as to require only a small amount of data for its training.

More particularly, the invention relates to a method of constructing a forward propagation neural network, comprising a set of processing nodes and connections between the nodes forming a topology organized in layers, such that each layer is defined by a set of simultaneously calculable nodes, and the input of a processing node of a layer can be connected to the output of a node of any of the previously calculated layers,

the method comprising a step of initializing a neural network according to an initial topology comprising an input layer, at least one hidden layer comprising at least one node, and a set of output nodes,

and at least one topological optimization phase, each topological optimization phase comprising:

- at least one additive phase comprising the modification of the network topology by adding at least one node and/or a connection link between the input of a node of a layer and the output of a node of any of the preceding layers, and/or

- at least one subtractive phase comprising the modification of the network topology by the removal of at least one node and/or a connection link between two layers,

and in which each modification of topology includes the selection of a topology modification from among several candidate modifications, from an estimate of the variation of the network error, calculated on training data, between each topology modified according to a candidate modification and the previous topology.

Advantageously, but optionally, the modification of topology selected is that, among the candidate modifications, optimizing the variation of the error compared to the previous topology.
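The selection rule stated above can be sketched as follows. This is only an illustration: the candidate labels, the estimated variations and the function name `select_modification` are hypothetical, and the way each variation is estimated is left to the caller.

```python
def select_modification(candidates, estimate_variation):
    """Return the candidate whose estimated error variation is the most negative.

    `candidates` is any iterable of candidate modifications and
    `estimate_variation` maps a candidate to the estimated change of the
    network error with respect to the previous topology (negative = better).
    """
    best = min(candidates, key=estimate_variation)
    return best, estimate_variation(best)


# Hypothetical candidates with made-up estimated variations of the error.
estimates = {
    ("add_link", "x1->y"): -0.30,
    ("add_node", "layer1"): -0.12,
    ("add_link", "x2->h1"): 0.0,
}
best, variation = select_modification(list(estimates), estimates.get)
print(best, variation)
```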

In one embodiment, the network error for a given topology is defined by $J(\Gamma, W^*)$ where

- $J$ is an error function between a data output from the network and a target result,

- $\Gamma$ is the network topology, and

- $W^*$ is the network connection weight matrix minimizing the error function $J$ with fixed topology $\Gamma$.

In one embodiment, the variation of the network error between a candidate topology and the previous topology is estimated by calculating the quantity $J(\Gamma^n, W^n) - J(\Gamma^{n-1}, W^{n-1*})$ where, by abuse of notation,

- $\Gamma^n$ is the topology of the candidate network at iteration $n$,

- $W^n$ is a matrix of network connection weights after at least one learning iteration of the network following the candidate topological modification at iteration $n$, and

- $W^{n-1*}$ is the matrix of connection weights of the network of iteration $n-1$ minimizing the error function $J$ with the topology $\Gamma^{n-1}$ fixed.

$W^n$ can then be initialized with the same connection weights as the matrix $W^{n-1*}$ for the connections common to the two topologies and, in the case of an additive phase, a connection weight of zero for each link created during the additive phase.
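The initialization of the candidate weights described above can be sketched as follows, assuming (for illustration only) that a topology is represented by a set of links, each link being a pair (source node, target node), and that the weights are stored in a dictionary keyed by link. The function name `warm_start` and the node labels are hypothetical.

```python
def warm_start(prev_weights, candidate_links):
    """Initialise the candidate weights from the optimal weights of the previous topology.

    Links shared by both topologies keep their previous weight; links created
    by an additive modification start at zero; links absent from the candidate
    topology are simply dropped.
    """
    return {link: prev_weights.get(link, 0.0) for link in candidate_links}


prev_weights = {("x1", "h1"): 0.8, ("h1", "y"): -1.5}

# Additive candidate: one new link from the input x1 directly to the output y.
added = warm_start(prev_weights, [("x1", "h1"), ("h1", "y"), ("x1", "y")])

# Subtractive candidate: the link from x1 to h1 is removed.
removed = warm_start(prev_weights, [("h1", "y")])
```

With a zero weight on each created link, the candidate network initially computes the same output as the previous one, so the error starts from the previous optimum.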

In one embodiment, the estimation of the variation of the network error between a modified topology and the previous topology comprises the estimation of the network error according to the modified topology from the Lagrange operator applied to the connection weights of the neural network, $\mathcal{L}(\Gamma, W, X, \Lambda)$, where:

- $\mathcal{L}$ is the Lagrange operator,

- $\Gamma$ is the network topology,

- $W$ is a network connection weight matrix,

- $X = (X^0, \ldots, X^{nc})$ represents the outputs of all the nodes of the network and $X^i$ represents the outputs of the nodes of layer $i$, and

- $\Lambda_i$ is the Lagrange multiplier associated with the expression defining the elements of layer $X^i$.

Advantageously, during an additive phase, the variation of the network error between a candidate topology and the previous topology is estimated by calculating the quantity $\mathcal{L}(\Gamma^n, W^n, X, \Lambda) - J(\Gamma^{n-1}, W^{n-1*})$ where:

- $\Gamma^n$ is the topology of the candidate network at iteration $n$,

- $W^{n-1*}$ is the matrix of the connection weights of the network of the topology of iteration $n-1$ minimizing the error function $J$ for a fixed topology,

- $W^n$ is a matrix of the network connection weights after the candidate topological modification at iteration $n$, said matrix being initialized with the same connection weights as the matrix $W^{n-1*}$ for the connections common to the candidate topology of iteration $n$ and the topology of iteration $n-1$, and a zero connection weight for each link created during the additive phase. An update of $W^n$ is then obtained by minimizing $\mathcal{L}$ with respect to the weights of the links created.
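The following toy example is a sketch of the idea only, not the method of the invention. It assumes a hypothetical three-node scalar network (input, one tanh hidden node, one linear output node) and a quadratic error, and it evaluates the creation of a single link from the input to the output. In this special case the output is linear in the new weight, so minimizing over that weight alone gives the exact variation of the error; in general the quantity would only be an estimate. All names and data are illustrative.

```python
import math


def net_output(w01, w12, w02, x0):
    x1 = math.tanh(w01 * x0)           # hidden node
    return w12 * x1 + w02 * x0         # linear output node with optional input->output link


def net_error(w01, w12, w02, data):
    return 0.5 * sum((net_output(w01, w12, w02, x) - y) ** 2 for x, y in data)


def estimate_new_link(w01, w12, data):
    """Estimate the error variation when the input->output link is created.

    The link starts with a zero weight and only this weight is optimised,
    all other weights being kept at their previous values.
    """
    # Output residuals at zero weight; they play the role of the output-layer
    # multiplier, up to the sign convention chosen for the Lagrangian.
    lam = [net_output(w01, w12, 0.0, x) - y for x, y in data]
    g = sum(l * x for l, (x, _) in zip(lam, data))   # derivative w.r.t. the new weight
    h = sum(x * x for x, _ in data)                  # curvature w.r.t. the new weight
    w02 = -g / h                                     # minimiser over the new weight only
    return w02, -0.5 * g * g / h


data = [(-1.0, -1.2), (0.5, 0.7), (1.0, 1.3), (2.0, 2.9)]   # synthetic samples
w01, w12 = 1.0, 1.0
w02, dJ_est = estimate_new_link(w01, w12, data)
dJ_true = net_error(w01, w12, w02, data) - net_error(w01, w12, 0.0, data)
```

The point of the sketch is that the effect of the candidate link is obtained from quantities already available for the previous network (the residuals and the node outputs), without retraining the whole network for each candidate.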

Advantageously, during a subtractive phase, the variation of the network error between a candidate topology and the previous topology is estimated by calculating the quantity $\mathcal{L}(\Gamma^n, W^n, X, \Lambda) - J(\Gamma^{n-1}, W^{n-1*})$ where $W^n = W^{n-1*}|_{\Gamma^n}$ is a restriction of $W^{n-1*}$ to the topology $\Gamma^n$.

In one embodiment, the neural network is adapted to simulate a real system governed by an equation of the type $Y = f(X)$ where $X$ is an input datum and $Y$ is a response of the physical system, and the error $J$ of the neural network is defined according to the topology $\Gamma$ and the matrix $W$ of the network connection weights, by:

$J(\Gamma, W) = \sum_i \| f_{\Gamma,W}(X_i) - Y_i \|^2$

where $f_{\Gamma,W}(X_i)$ is the output of the neural network, and $X_i$ and $Y_i$ are respectively input and output data generated by measurements on the real system.

In one embodiment, the method comprises, once the topology modification has been selected, the determination of a matrix of network connection weights by an error descent method with respect to said matrix. This step is a training of the network in the topology obtained after the topological modification.

Unlike the state of the art, this learning process is based on a descent method having rapid convergence of the Gauss-Newton type.
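To illustrate what a descent of the Gauss-Newton type looks like, the following sketch fits a single parameter b of a hypothetical model y = exp(b x) to synthetic data. The model, the data and the function name are assumptions made for the example; training a network would apply the same principle to the matrix of connection weights.

```python
import math


def gauss_newton_fit(data, b=0.0, iterations=20):
    """Gauss-Newton iterations for the least-squares fit of y = exp(b * x)."""
    for _ in range(iterations):
        residuals = [math.exp(b * x) - y for x, y in data]
        jacobian = [x * math.exp(b * x) for x, _ in data]    # d residual / d b
        jtj = sum(j * j for j in jacobian)                   # Gauss-Newton approximation of the Hessian
        jtr = sum(j * r for j, r in zip(jacobian, residuals))
        b -= jtr / jtj                                       # solve (J^T J) db = -J^T r
    return b


data = [(x, math.exp(0.5 * x)) for x in (0.0, 1.0, 2.0, 3.0)]   # synthetic, b = 0.5
b_hat = gauss_newton_fit(data)
print(b_hat)
```

Near the solution, and when the residuals are small, this iteration converges much faster than a plain gradient step, which is the property invoked above.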

Advantageously, the topological optimization step is implemented as a function of average errors of the neural network on training data on the one hand, and on validation data on the other hand, in which:

- at least one additive step is implemented to reduce the error on the training data,

- at least one subtractive step is implemented if the error on the training data becomes less than the error on the validation data beyond a predetermined tolerance, and

- topological optimization is stopped when no additive or subtractive step results any longer in a reduction of the error on the training data and on the validation data.
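One possible reading of this control logic is sketched below. The function name, the argument names and the exact ordering of the tests are assumptions; in particular, the two Boolean flags stand for the outcome of the candidate evaluation, which is not detailed here.

```python
def next_phase(train_err, valid_err, tol, add_improves, sub_improves):
    """Choose the next topological step from the average errors.

    train_err, valid_err: average errors on training and validation data.
    tol: predetermined tolerance on the gap between the two errors.
    add_improves, sub_improves: whether the best additive / subtractive
    candidate is expected to reduce the error (hypothetical inputs).
    """
    if valid_err - train_err > tol:
        # Training error is below validation error beyond the tolerance.
        return "subtractive" if sub_improves else "stop"
    return "additive" if add_improves else "stop"


print(next_phase(0.10, 0.11, 0.05, True, True))    # additive
print(next_phase(0.01, 0.20, 0.05, True, True))    # subtractive
print(next_phase(0.10, 0.11, 0.05, False, True))   # stop
```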

In one embodiment, the neural network comprises at least one compression block adapted to generate compressed data, and a decompression block, the method comprising at least one topological optimization phase implemented on the compression block and the decompression block, and further comprising, after the topological optimization of the blocks, a learning phase of the entire neural network with fixed topology.

In this case, the initialization step of the neural network includes:

- the creation of an initial neural network including:

  - an input layer receiving an input $X_i$,

  - an output layer generating an output $X_i$, and

  - a central hidden layer placed between the input layer and the output layer,

- the implementation of a learning of the initial neural network,

- the replacement, in the initial neural network, of the central hidden layer by a first intermediate layer, a new central layer, and a second intermediate layer, the intermediate layers being copies of the replaced central layer, and

- the definition of the compression block as all of the layers between the input layer and the central layer, and of the decompression block as all of the layers between the central layer and the output layer.
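The replacement of the central layer and the definition of the two blocks can be sketched as follows, under the simplifying assumption that the network is described only by the list of its layer sizes. The function name `split_central`, the layer sizes and the choice to count the central layer in both blocks are assumptions made for the example; the way the size of the new central layer is chosen is not specified here.

```python
def split_central(layers, central_index, new_central_size):
    """Replace the central layer by (copy, new central layer, copy).

    `layers` is a list of layer sizes from input to output. Returns the new
    list of layers and the compression and decompression blocks, the central
    layer being counted as the boundary of both blocks.
    """
    old = layers[central_index]
    new_layers = (layers[:central_index]
                  + [old, new_central_size, old]
                  + layers[central_index + 1:])
    new_central = central_index + 1
    compression = new_layers[:new_central + 1]
    decompression = new_layers[new_central:]
    return new_layers, compression, decompression


initial = [8, 4, 8]   # input, central hidden layer, output (hypothetical sizes)
new_layers, compression, decompression = split_central(initial, 1, 2)
print(new_layers)     # [8, 4, 2, 4, 8]
```

Calling the same function again on the new central layer gives one way to picture an iterative subdivision of the central layer.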

The method can also include the iterative implementation of:

- a step of subdividing the central layer into a new first intermediate layer, a new central layer, and a new second intermediate layer,

- a redefinition of the compression and decompression blocks to include the layers obtained at the end of the subdivision step, and

- a topological optimization of the compression and decompression blocks.

In one embodiment, the method further comprises the selection of the compression block or of the decompression block and the addition of a modeling block, respectively at the output of the compression block or at the input of the decompression block, in which at least one topological optimization phase is implemented on the modeling block, and a learning phase with fixed topology is implemented on the assembly comprising the modeling block and the compression or decompression block.

In one embodiment, the method further comprises the insertion, between the compression block and the decompression block, of a modeling block adapted to model the evolution of a dynamic system governed by an equation of the form

$X_{i+1} = F(X_i, P_i) + G_i, \quad i \geq 0$

where $X_i$ is a measurable characteristic of the physical system at a given time, $P_i$ describes the internal state of the physical system, and $G_i$ describes an excitation, and the modeling block is adapted to calculate an output $x_{i+1}$ of the form:

$x_{i+1} = h_{\hat{\Gamma},\hat{W}}(x_i, p_i) + g_i, \quad i \geq 0, \quad x_0 = C_X(X_0)$ (17)

where:

- $x_i$ is a compression of $X_i$ by the compression block, $x_i = C_X(X_i)$,

- $h_{\hat{\Gamma},\hat{W}}$ is the function calculated by the modeling block, $\hat{\Gamma}$ and $\hat{W}$ being respectively the topology and the matrix of the connection weights of the modeling block, and

- $p_i$ and $g_i$ are the data representative, respectively, of the internal state and of the excitation of the system, supplied to the modeling block.

The invention also relates to a neural network, characterized in that it is obtained by the implementation of the method according to the preceding description.

The invention also relates to a computer program product, comprising code instructions for implementing the method according to the preceding description, when it is executed by a processor.

The invention also relates to a method for simulating a real system governed by an equation of type Y = f (X) where X is an input data and Y is a response of the real system, comprising:

- the construction of a neural network adapted to calculate a function $f_{\Gamma,W}$ such that $Y \approx f_{\Gamma,W}(X)$, by implementing the method according to the preceding description, the neural network possibly comprising a compression block, and

- the application, to a new input datum $X_i$, of the neural network in order to deduce therefrom a simulated response $Y_i$ of the system.

The invention also relates to a method for simulating a dynamic physical system governed by an equation of the form $X_{i+1} = F(X_i, P_i) + G_i$, $i \geq 0$, where $X_i$ is a measurable quantity of the physical system at a given time, $P_i$ describes the internal state of the physical system, and $G_i$ describes an excitation, the method comprising the steps of:

- acquisition of $X_i$, $P_i$ and $G_i$,

- compression of $X_i$ to obtain a compressed datum $x_i$,

- recurrent application, a number $k$ of times, of a neural network modeling the dynamic physical system on the compressed datum $x_i$ to obtain at least one subsequent compressed datum $x_{i+k}$, and

- decompression of the subsequent compressed datum $x_{i+k}$ to obtain a modeling of a subsequent quantity $X_{i+k}$.
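The steps above can be sketched as a simple loop, in which the recursion is carried out on the compressed data and the decompression is applied only once at the end. The three callables are placeholders for the compression block, the modeling block and the decompression block; the toy functions and numerical values used in the usage example are purely hypothetical and stand in for trained networks.

```python
def simulate(X_i, p_seq, g_seq, compress, model, decompress):
    """Predict X_{i+k} from X_i by recurrence in the compressed space.

    p_seq and g_seq hold the k successive internal-state and excitation
    data (already in the form expected by the modeling block).
    """
    x = compress(X_i)                                  # compression of X_i
    for p, g in zip(p_seq, g_seq):                     # k recurrent applications
        h = model(x, p)
        x = [h_c + g_c for h_c, g_c in zip(h, g)]      # model output plus excitation
    return decompress(x)                               # decompression of x_{i+k}


# Toy stand-ins, for illustration only.
compress = lambda X: [X[0]]                 # keep one coefficient
decompress = lambda x: [x[0], x[0]]         # rebuild a two-component state
model = lambda x, p: [p * x[0]]             # linear "dynamics" in compressed space

X_next = simulate([1.0, 1.0], [0.5, 0.5, 0.5], [[1.0], [1.0], [1.0]],
                  compress, model, decompress)
print(X_next)   # [1.875, 1.875]
```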

In one embodiment, the simulation method is implemented by means of a neural network constructed according to the method described above and comprising a compression block and a decompression block, and the steps of compression of $X_i$, application of a neural network and decompression of $x_{i+1}$ are implemented respectively by means of the compression block, the modeling block and the decompression block of the neural network constructed.

The invention finally relates to a data compression method comprising:

- the construction, by the implementation of the compression method according to the above description, of a neural network comprising a compression block receiving as input a datum $X$ and a decompression block generating at output the datum $X$, in which the construction of the neural network includes the implementation of at least one topological optimization phase on the compression block and the decompression block, and

- the application, to at least one datum representative of the state of a real system, of the compression block of the neural network constructed.

The method of constructing a neural network according to the invention makes it possible to obtain a neural network whose structure depends on the intended use or application, since the construction comprises a topological optimization phase which is governed by the network error on training and validation data.

In other words, the construction method simultaneously comprises the construction, and the learning, for a specific task, of the neural network. This allows a user of this process not to need to have specific mathematical knowledge to choose a neural network structure adapted to the targeted technical application.

More particularly, the construction method according to the invention makes it possible to build a sparse neural network, that is to say where any redundancy is removed, optimized for the intended task. This property is obtained by an incremental construction from a possibly minimal initial topology, that is to say comprising a single hidden layer comprising a single neuron, then by implementing an iterative process comprising a learning step in the current state of the network, using a method of rapid local convergence, such as the Gauss-Newton method, and a step of topological modification of the network to improve learning. In addition, the implementation of a topological optimization technique in construction plays a double role:

- Avoid local minima, where at each (rapid) convergence of the learning process, the additive topological optimization technique enriches the neural network with the element (node or link) that best improves learning. Indeed, the state of the art avoids local minima by uncontrolled redundancy and we avoid local minima by enrichment controlled by the topological gradient.

- Create a sparse neural network, and in particular reduce its depth in order to alleviate the learning problems mentioned above, and allow learning of the network even with scarce data or in small quantities.

The topological optimization method gives the neural network an innovative structure insofar as a neuron of a layer, including the output layer, can be connected to a neuron of any previous layer, including the input layer. Indeed, when a physical phenomenon depends on a large number of parameters, most of these parameters contribute in a linear way to the response of the system. Hence the advantage of connecting the corresponding inputs directly to the output layer of the neural network. The effect of weakly non-linear parameters can be taken into account by a single intermediate layer between the input and the output and so on.
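The structure described above, in which a node may receive the output of a node of any previous layer, can be illustrated by the following minimal forward pass. The representation of the topology (a list of layers and a dictionary of weighted links), the tanh activation, the node names and the weights are assumptions chosen for the example.

```python
import math


def forward_pass(layers, links, inputs, linear_nodes=()):
    """Evaluate the network layer by layer.

    layers: list of lists of node names, the first one being the input layer.
    links: {(source, target): weight}; the source may belong to ANY previous layer.
    inputs: {input node name: value}.
    linear_nodes: nodes without activation function (here, the output node).
    """
    values = dict(inputs)
    for layer in layers[1:]:
        for node in layer:
            s = sum(w * values[src] for (src, dst), w in links.items() if dst == node)
            values[node] = s if node in linear_nodes else math.tanh(s)
    return values


layers = [["a", "b"], ["h"], ["o"]]
links = {
    ("a", "h"): 1.0,   # non-linear path through the hidden node
    ("h", "o"): 2.0,
    ("b", "o"): 3.0,   # input b contributes linearly: direct link to the output
}
values = forward_pass(layers, links, {"a": 0.0, "b": 1.0}, linear_nodes=("o",))
print(values["o"])     # 3.0
```

Here the input b reaches the output without going through the hidden layer, so no hidden node is spent on merely copying it forward.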

The reduction in the complexity of the neural network in fact improves its generalization capacity (its ability to give the right answer on data not seen during learning). It also makes it possible to attenuate the learning difficulties (exploding gradients and vanishing gradients) by reducing the number of layers. Indeed, in a network structured in layers, certain cells may be used simply to duplicate previous cells in order to make them available to the next layer. This increases the complexity of the network unnecessarily.

This neural network, used for modeling a complex physical system, provides very good simulation quality for reduced computation times, and in particular less than the real time of the physical system. The simulation model can be constructed from measurements made during normal operation of the physical system or during test phases.

In addition, the topological optimization of the network is advantageously carried out by the use of the Lagrange operator, or Lagrangian, applied to the connection weights of the neural network. This method makes it possible to calculate in a particularly fast way the effect of a topological modification of the network (addition / elimination of a neural cell, addition / elimination of a link), which makes it possible to quickly assess and select at each stage the best topological improvement of the neural network.

The forward propagation neural network is advantageously used, as a recurrent pattern, in the context of the dynamic simulation of physical systems to predict a future state of the system as a function of an initial state and possible source or excitation terms.

The neural network is advantageously combined with an approach in which the data representative of the state of the physical system are compressed. The dynamic model simulates the future state of the system on the compressed data, then decompresses the simulated data to return to real space. Unlike the state of the art on reduced bases described above, the recursion loop is not done in real space but in the compressed data space, which eliminates noise on the data while ensuring better stability of the dynamic model. This also makes it possible to reduce the computation times in the learning and simulation phases.

Topological optimization plays a major role in the control of dynamic models. Indeed, if we perform m iterations of a recurring pattern having n layers, the learning difficulty is equivalent to that of a neural network having n x m layers. The invention therefore makes it possible to reduce n, and consequently the number of calculations and their duration, in two different ways:

- By compression which reduces the size of the recurring pattern,

- By topological optimization which reduces the number of layers of the recurring pattern.

Brief description of the drawings

Other characteristics, details and advantages of the invention will appear on reading the detailed description below, and on analysis of the accompanying drawings, in which:

[Fig. 1], already described, schematically represents a dynamic simulation process by means of a reduced projection basis.

[Fig. 2] schematically represents a system for implementing a method of building a neural network and simulating a physical system.

[Fig. 3] schematically represents the main steps in the construction of a neural network according to an embodiment of the invention.

[Fig. 4a] represents an example of a neural network obtained by state-of-the-art software for a given application. This is the best result obtained by the prior art software after carrying out fifty trial-and-error experiments.

[Fig. 4b] represents an example of a neural network obtained by implementing the construction method according to an embodiment of the invention for the same application as that of [Fig. 4a].

[Fig. 4c] represents another example of a neural network obtained by implementing the construction method according to an embodiment of the invention for modeling a complex system involving fluid-structure interactions in the automotive field.

[Fig. 5] schematically represents an example of construction of a neural network comprising a compression block and a decompression block.

[Fig. 6a] represents the implementation of a dynamic modeling method according to an alternative embodiment of the invention.

[Fig. 6b] represents the implementation of a dynamic modeling method according to another variant embodiment of the invention.

[Fig. 7a] schematically shows a top view of an installation for measuring the progress of a sodium melting front.

[Fig. 7b] represents three different power controls of an electrical resistance of the installation of FIG. 7a.

[Fig. 8a] represents the compression / decompression network produced to model the data of the sensors of the installation of FIG. 7a.

[Fig. 8b] represents the dynamic modeling network produced to model the data of the sensors of the installation of FIG. 7a.

[Fig. 9a] represents, for a sensor of the installation of FIG. 7a, a comparison between the sensor data and the modeling data for one of the three power controls of FIG. 7b.

[Fig. 9b] represents, for a sensor of the installation of FIG. 7a, a comparison between the sensor data and the modeling data for another of the three power controls of FIG. 7b.

[Fig. 9c] represents, for a sensor of the installation of FIG. 7a, a comparison between the sensor data and the modeling data for the last of the three power controls of FIG. 7b.

Description of the embodiments

We will now describe a method of building a sparse neural network that can be used for modeling a physical system or phenomenon. This method, as well as the data compression methods and the methods for simulating a static or dynamic system described below, is implemented by a computer 1 shown diagrammatically in FIG. 2, comprising a calculation unit (for example a processor) 10 and a memory 11, the calculation unit being adapted to execute instructions stored in the memory 11 for the implementation of these methods. The computer advantageously comprises or can be connected to at least one sensor 12 suitable for acquiring measurements of physical quantities.

The method comprises two phases: a phase of learning and building the model, and a simulation phase for operating the model. The two phases can be carried out on different equipment. Only the simulation phase is intended to run in real time.

In what follows, the term “real system” means any system whose state can at least in part be measured by sensors of physical quantities. Real systems notably include physical, biological, chemical and computer systems. We suppose that the real system that we are trying to model is governed by a model of the type:

$$Y = f(X) \tag{1}$$

where $X$ and $Y$ are respectively input and output variables characterizing the state of the system.

For the construction of this model, we have a database of the type $(X_i, Y_i)_{i=1}^{M}$, generated by measurements on the real system, the data being able to be stored in the memory 11, where:

- $X_i \in \mathbb{R}^{n_0}$ is an input data item comprising a number $n_0$ of components, the last of which, for example, is fixed at 1 and the remaining $n_0 - 1$ typically correspond to physical quantities representative of the state of the system, these data having been measured by means of sensors 12, and

- $Y_i \in \mathbb{R}^{n_O}$ is an output data item comprising a number $n_O$ of components, which correspond to other physical quantities of the system, these data having also been measured by means of sensors 12.

This database is divided into two disjoint subsets, the first of which constitutes a learning database formed by the indices, for example, $i = 1, \dots, M_1$, with $M_1 < M$, and the rest of the indices form a validation database. The purpose of this split is to implement a cross-validation method on the learning of the constructed neural network.
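The split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_database` and the toy data are hypothetical, and the sketch simply takes the first indices for learning and the rest for validation, as in the example given in the text.

```python
def split_database(pairs, m1):
    """Split a list of (X_i, Y_i) pairs into two disjoint subsets.

    The first m1 pairs form the learning database, the remaining
    pairs form the validation database (hypothetical helper).
    """
    if not 0 < m1 < len(pairs):
        raise ValueError("m1 must satisfy 0 < m1 < M")
    return pairs[:m1], pairs[m1:]


# Toy database: each input ends with a component fixed at 1.
database = [([0.1 * i, 1.0], [0.2 * i]) for i in range(10)]
learning, validation = split_database(database, m1=7)
```

Both subsets are disjoint by construction, and together they cover the whole database.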

The objective of the physical system modeling method is to construct an approximate model of (1) of the form:

$$Y \approx f_{\Gamma,W}(X) \tag{2}$$

where $f_{\Gamma,W}$ is a simulation function calculated by a neural network defined by a topology $\Gamma$ and a matrix or a list of matrices of connection weights $W$, so as to be able to simulate the output $Y$ from an input variable $X$.

The topology $\Gamma$ and the matrix $W$ of the connection weights are determined by minimizing an error function $J$ of the neural network:

$$\min_{\Gamma, W} J(\Gamma, W) \tag{3}$$

where $J$ quantifies the error between the output of the neural network calculated on the input data $X_i$ and the corresponding target result $Y_i$, calculated on the database.

Neural network

Referring to Figure 3, there is shown schematically a method of building a neural network used for modeling the physical system. This neural network includes a set of processing nodes, also called neurons, and of connections between the processing nodes, each connection being weighted by a weighting coefficient, the processing nodes and the connections forming a topology organized in layers.

Unlike a conventional neural network, each layer of which takes its inputs from the outputs of the previous one and is therefore only connected to the previous layer, the neural network according to the invention is a calculation graph, of which each layer is defined by the set of nodes which can be calculated simultaneously, and the input of a processing node of a layer can be connected to the output of a processing node of any of the layers previously calculated.

Consequently, the set of processing nodes calculating the outputs of the neural network, hereinafter called the “set of output nodes”, does not form a layer, because the output nodes can be calculated in different steps and be spread across multiple layers.

In addition, the neural network is of the forward propagation type, that is to say that it does not include any calculation loop bringing the output of a processing node back to the input of the same node or of a node of a previous layer.

Finally, the learning of the neural network is carried out during its construction, so as to adapt the structure of the neural network to the function which it must calculate.

We denote by $X^i$, $i = 1, \dots, nc$, the layer formed by the cells which can be calculated simultaneously during step $i$, and by $\mathbf{X}^i = (X^0, \dots, X^i)$ the layers already calculated at step $i$. We set $X^0 = (X_i)_{i=1}^{M_1}$, which is of size $n_0 \times M_1$ and represents the state of the input layer (in other words, the neural network is applied to the data of the database at our disposal). We set $Y = (Y_i)_{i=1}^{M_1}$, the target values corresponding to the input $X^0$.

Denoting by $nc$ the number of layers of the neural network, and associating with layer $i$ a number $n_i$ of processing nodes, we associate with each layer a matrix of connection weights $W_i$ of size $n_{i+1} \times \sum_{j \le i} n_j$. The matrix $W_i$ is very sparse. Most of its columns are zero, and those which are not zero contain many zeros. The set of connection weights of the entire neural network is then $W = (W_0, \dots, W_{nc-1})$. By abuse of language, we will call this object a matrix.

The neural network then implements the following calculations (hereinafter described as "the calculation algorithm") on the input data $X^0$:

$\mathbf{X}^0 = X^0$

For $i = 1$ to $nc$,

$$X^i = f_{SI}(W_{i-1} * \mathbf{X}^{i-1})$$

End

where $\mathbf{X}^{i-1} = (X^0, \dots, X^{i-1})$ gathers the layers already calculated, and the function $f_{SI}$ is the identity function for the output processing nodes and the sigmoid $f_{SI}(x) = \frac{1}{1 + e^{-x}}$ for the other processing nodes. We suppose that, for example, the last row of $X^0$ is formed by 1s. This means that the last cell of the input layer is a bias cell. In conventional architectures, each layer, other than the output layer, has a bias cell. In the architecture according to this invention, only the input layer has a bias cell. Cells from other layers can connect directly to this cell.
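The calculation algorithm can be illustrated on a single sample by the following sketch. It is written under stated assumptions: the function `forward`, the dense list-of-rows storage of each weight matrix and the `output_nodes` set are hypothetical choices made for readability (a real sparse network would store only the non-zero weights).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x0, weights, output_nodes):
    """Run the layer-by-layer calculation on one input sample.

    x0           -- input layer values, last component fixed at 1 (bias cell)
    weights      -- weights[i-1] holds one row per node of layer i and one
                    column per node of all layers 0..i-1, concatenated
    output_nodes -- set of (layer, row) pairs using the identity activation
    """
    computed = list(x0)        # concatenation of the layers already calculated
    layers = [list(x0)]
    for i, w in enumerate(weights, start=1):
        layer = []
        for row_index, row in enumerate(w):
            s = sum(wij * xj for wij, xj in zip(row, computed))
            is_output = (i, row_index) in output_nodes
            layer.append(s if is_output else sigmoid(s))
        layers.append(layer)
        computed.extend(layer)  # the new layer becomes visible to later layers
    return layers


# One input plus bias, one hidden node, one output node that reads both
# the hidden node and the input directly (a link skipping a layer).
example_layers = forward(
    [0.5, 1.0],
    [[[2.0, -1.0]], [[3.0, 0.0, 1.0]]],
    output_nodes={(2, 0)},
)
```

In this toy network the output node is connected both to the hidden node and directly to the input, which a conventional layer-to-layer architecture would not allow.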

The error function $J$ of the neural network is then written:

$$J = \| O\mathbf{X}^{nc} - Y \|^2$$

where $O$ is the observation matrix making it possible to extract the output elements from $\mathbf{X}^{nc}$. Indeed, the number of cells of the last layer, denoted $n_{nc}$, is less than or equal to the size $n_O$ of the output data of the neural network. It is for this reason that the observation operator applies to $\mathbf{X}^{nc}$, that is to say to all the cells of the network.

The topology $\Gamma$ of the neural network is defined by the incidence matrices of the calculation graph, $\Gamma = (M_0, \dots, M_{nc-1})$, where $M_i$ is an incidence matrix which has the same size as $W_i$, with entries equal to 1 for the non-zero coefficients of $W_i$ and zero elsewhere.
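As a small illustration of these two definitions, the sketch below derives an incidence matrix from a weight matrix and evaluates a squared error on already extracted outputs. The helper names `incidence` and `squared_error` are hypothetical, and the observation step (extracting the output cells) is assumed to have been done beforehand.

```python
def incidence(w):
    """Return the 0/1 incidence matrix of a weight matrix (list of rows)."""
    return [[1 if coefficient != 0 else 0 for coefficient in row] for row in w]


def squared_error(outputs, targets):
    """Squared Euclidean distance between extracted outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))


weights_1 = [[3.0, 0.0, 1.0]]
topology_1 = incidence(weights_1)           # [[1, 0, 1]]
link_count = sum(map(sum, topology_1))      # number of links in this matrix
```

Counting the ones of all incidence matrices gives the number of links of the network, which is the quantity compared between FIG. 4a and FIG. 4b.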

Returning to FIG. 3, the method for constructing the neural network comprises a first step 100 of initializing a neural network according to an initial topology which may be minimal, namely comprising:

- An input layer, comprising a set of input processing nodes whose number is imposed by the number $n_0$ of input data, including a bias,

- An output layer whose number of nodes $n_{nc}$ is less than the number $n_O$ of output data, and

- At least one hidden layer containing at least one neuron.

The initialization step also includes a determination of the optimal connection weights $W^{1*}$, that is to say the connection weights minimizing the error function $J$ for the fixed initial topology $\Gamma^1$, the minimum being denoted $J(\Gamma^1, W^{1*})$. This determination is made by training the neural network on the learning data. Gradient backpropagation can be used for this purpose, but quantitative and deep phenomena require the use of the zero-memory Gauss-Newton method, described in:

- Fehrenbach, J., Masmoudi, M., Souchon, R., & Trompette, P. (2006). Detection of small inclusions by elastography. Inverse problems, 22 (3), 1055.

The zero-memory Gauss-Newton method combines the backpropagation of the gradient with a forward propagation method of the gradient. It significantly improves local convergence.

The method then comprises at least one topological optimization phase 200 of the neural network, determined so as to reduce the error J of the network.

The topological optimization phase can include:

- at least one additive phase, in which at least one processing node and/or at least one connection is added to the neural network, the added connection being such that it connects the input of a neuron to the output of a neuron from any previous layer, and/or

- at least one subtractive phase, in which at least one processing node and/or at least one connection is deleted from the neural network.

In addition, each topology modification 210, additive or subtractive, comprises the selection 212 of one modification from among a plurality of candidate topological modifications, on the basis of an estimate 211 of the variation of the network error between each topology modified according to a candidate modification and the previous topology, the selected topological modification being the one that optimizes the variation of the error compared to the previous topology, with the objective of maximizing the reduction of the error at each iteration. As will be seen, however, subtractive topology modifications can induce an increase in the error $J$ on the learning data at a given iteration, but nevertheless make it possible to improve the accuracy of the network by reducing its error on the validation data.

It remains to define the choice of candidate topological modifications. In the case of a subtractive phase, all the nodes and links are candidates for a topological modification in turn.

In an additive phase, one can connect by a link two nodes which do not belong to the same layer and which are not already connected. One can add nodes to any layer other than the input and output layers of the network. One can also create a new layer by inserting a node between two successive layers. A created node must be connected to the network by at least two links: at least one input link and at least one output link. The choice of links to add can be made randomly. In an additive phase, if the network is large, one can choose a thousand candidate topological modifications taken at random. The estimate of the variation is calculated for these candidate modifications. The best modifications, namely:

- those which produce the smallest estimated increase in the error $J$, for a subtractive phase,

- those which offer the largest estimated drop in the error $J$, for an additive phase,

are used to define the topology $\Gamma^n$.
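The selection rule above can be sketched as follows. This is an illustrative sketch only: `select_modification`, its parameters and the toy estimates are hypothetical, and the estimator of the error variation is passed in as a function. Since the variation is signed, taking the minimum covers both cases: the largest drop for an additive phase and the smallest increase for a subtractive phase.

```python
import random


def select_modification(candidates, estimate_variation,
                        max_candidates=1000, rng=None):
    """Pick the candidate with the lowest estimated error variation.

    candidates         -- iterable of candidate topological modifications
    estimate_variation -- function returning the estimated signed variation
                          of the error for one candidate
    max_candidates     -- size of the random sample drawn for large networks
    """
    rng = rng or random.Random(0)
    pool = list(candidates)
    if len(pool) > max_candidates:
        pool = rng.sample(pool, max_candidates)
    return min(pool, key=estimate_variation)


# Additive phase: all estimated variations are negative or zero.
additive_estimates = {"a": -0.1, "b": -0.4, "c": -0.2}
best_addition = select_modification(additive_estimates, additive_estimates.get)

# Subtractive phase: all estimated variations are positive or zero.
subtractive_estimates = {"a": 0.3, "b": 0.05, "c": 0.2}
best_removal = select_modification(subtractive_estimates,
                                   subtractive_estimates.get)
```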

The variation in network error between a modified topology (candidate at iteration $n$) and the previous topology (iteration $n-1$) is measured with the optimal connection weights for each topology considered, that is to say that it is written:

$$J(\Gamma^n, W^{n*}) - J(\Gamma^{n-1}, W^{n-1*})$$

where $\Gamma^n$ is the topology modified according to the candidate modification at iteration $n$, and $W^{n*}$ is the matrix of optimal connection weights for this topology.

However, the calculation of a matrix of optimal connection weights for a given topology is very long, and it is not easy to calculate this error variation for all the candidate topological modifications considered.

We will therefore describe how we estimate this error variation rather than calculating it.

According to a first embodiment, for an additive phase, the connection weights $W^n$ of the modified topology are initialized by:

$$W^n_\gamma = W^{n-1*}_\gamma$$

with $\gamma$ the set of links of $\Gamma^n$ included in that of $\Gamma^{n-1}$, and the other links of $\Gamma^n$ are initialized to 0.

This initialization does not degrade the error: we have $J(\Gamma^n, W^n) = J(\Gamma^{n-1}, W^{n-1*})$.

Then we perform a few learning iterations to improve $W^n$, and we estimate the variation of the error by $J(\Gamma^n, W^n) - J(\Gamma^{n-1}, W^{n-1*})$, which is necessarily negative or zero. The aim of the additive phase is to ensure learning.

In the case of a subtractive phase, the connection weights $W^n$ of the modified topology are initialized by $W^n = W^{n-1*}_{|\Gamma^n}$, the restriction of $W^{n-1*}$ to the links of $\Gamma^n$; we can then proceed to a few learning iterations to improve $W^n$.

The estimate of the error variation is then also: $J(\Gamma^n, W^n) - J(\Gamma^{n-1}, W^{n-1*})$.

This variation is necessarily positive or zero; otherwise $W^{n-1*}$ would not be optimal, since the matrix $W^n$, with zeros on the removed links, would offer a better solution. This phase, which only increases the error, aims to ensure generalization: the ability of the neural network to make predictions on data that is not part of the learning set. When the error function $J$ increases, the average error on the validation data tends to decrease.
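The two initializations of this first embodiment can be sketched with weights stored per link. The dictionary representation (a link is a `(source, destination)` pair) and the helper names are assumptions made for this illustration, not the storage used by the invention.

```python
def init_additive(previous_weights, new_links):
    """Keep the previous optimal weights and set each created link to zero."""
    weights = dict(previous_weights)
    for link in new_links:
        weights.setdefault(link, 0.0)
    return weights


def init_subtractive(previous_weights, removed_links):
    """Restrict the previous optimal weights to the links that remain."""
    removed = set(removed_links)
    return {link: value for link, value in previous_weights.items()
            if link not in removed}


previous = {("a", "c"): 0.4, ("b", "c"): -0.2}
after_addition = init_additive(previous, [("a", "d"), ("d", "c")])
after_removal = init_subtractive(previous, [("b", "c")])
```

With the additive initialization the created links carry a zero weight, so the network output, and hence the error, is unchanged before the few learning iterations are applied.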

According to a more advantageous alternative embodiment, the variation of the error between a modified topology and the previous topology is estimated using the Lagrange operator, or Lagrangian, applied to the internal variables of the neural network, which are the network layers $\mathbf{X} = (X^0, \dots, X^{nc})$, and which is written:

$$\mathcal{L}(\Gamma, W, \mathbf{X}, \Lambda) = \| O\mathbf{X}^{nc} - Y \|^2 + \sum_{i=1}^{nc} \operatorname{tr}\left(\Lambda_i^T \left(X^i - f_{SI}(W_{i-1} * \mathbf{X}^{i-1})\right)\right) \tag{5}$$

where the first term is the error $J$ expressed with the layer variables, and $\Lambda = (\Lambda_i)$, $\Lambda_i$ being the Lagrange multiplier associated with the equation defining $X^i$. The multiplier $\Lambda_i$ has the same size as $X^i$. The function $\operatorname{tr}$ is the trace, that is to say the sum of the diagonal terms of a matrix. According to the calculation algorithm of the neural network described above, if $W$ and $X^0$ are known, it is possible to construct all the $X^i$ and then all the $\Lambda_i$. The $\Lambda_i$ are well defined and are obtained by solving the equations:

$$\partial_{X^i} \mathcal{L}(\Gamma, W, \mathbf{X}_W, \Lambda) = 0 \tag{7}$$

We refer to the Annex at the end of the description for the resolution of these equations.

We can see that for any given $W$, if $\mathbf{X}$ is obtained by the calculation algorithm described above, then the terms under the sum sign of equation (5) cancel out, and we obtain the following equality:

$$J(\Gamma, W) = \mathcal{L}(\Gamma, W, \mathbf{X}_W, \Lambda) \tag{6}$$

Thus, for all $W$, there is an equality between the error of the neural network and the Lagrangian applied to it. We can deduce:

$$d_W J(\Gamma, W)\,\delta W = d_W \mathcal{L}(\Gamma, W, \mathbf{X}_W, \Lambda)\,\delta W \tag{8}$$

where $d_W$ is the total derivative with respect to $W$ and $\delta W$ is the variation of $W$. As $J$ only depends on $W$ via $\mathbf{X}$, the total derivative is written:

$$d_W J(\Gamma, W)\,\delta W = \partial_W J(\Gamma, W)\,\delta W + \partial_X J(\Gamma, W)\,\partial_W \mathbf{X}\,\delta W = 2(O\mathbf{X}^{nc} - Y)\,\partial_W \mathbf{X}\,\delta W \tag{9}$$

Here the total derivative $d_W$ takes into account $\partial_W$, the partial derivative with respect to $W$, and the variation via the variable $\mathbf{X}$. This expression cannot be used in practice because of the cost of calculating $\partial_W \mathbf{X}$. According to equality (6), this derivative of $J$ can also be calculated explicitly without having to calculate $\partial_W \mathbf{X}$:

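$$d_W J(\Gamma, W)\,\delta W = d_W \mathcal{L}(\Gamma, W, \mathbf{X}_W, \Lambda)\,\delta W = \partial_W \mathcal{L}(\Gamma, W, \mathbf{X}_W, \Lambda)\,\delta W + \partial_X \mathcal{L}(\Gamma, W, \mathbf{X}_W, \Lambda)\,\partial_W \mathbf{X}\,\delta W \tag{10}$$

However, by construction of $\Lambda$, we have $\partial_X \mathcal{L} = 0$, and we therefore obtain the following formula:

$$d_W J(\Gamma, W)\,\delta W = \partial_W \mathcal{L}(\Gamma, W, \mathbf{X}_W, \Lambda)\,\delta W \tag{11}$$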

The $\Lambda_i$ are chosen so that the variation of the Lagrangian with respect to the $X^i$ is zero. The Lagrangian behaves as if the variable $X^i$ had been eliminated locally. It follows that, for any fixed $W_0$, we calculate $\mathbf{X}_{W_0}$ and $\Lambda_{W_0}$, and for all $W$ close to $W_0$ we have:

$$J(\Gamma, W) \approx \mathcal{L}(\Gamma, W, \mathbf{X}_{W_0}, \Lambda_{W_0}) \tag{12}$$
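The role of the multipliers can be checked numerically on a very small network. The sketch below is an illustration under stated assumptions, not the procedure of the Annex: the network has one scalar input, one sigmoid hidden node and one identity output node that also reads the input directly, and the multiplier terms are taken with the sign "layer value minus computed value". The gradient obtained from the multipliers, without differentiating the layers with respect to the weights, is compared with a finite-difference gradient of the error.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def error(w, x0, y):
    """Squared error of the toy network for weights w = [w0, w1, w2]."""
    x1 = sigmoid(w[0] * x0)            # hidden node
    x2 = w[1] * x1 + w[2] * x0         # output node (identity activation)
    return (x2 - y) ** 2


def gradient_via_multipliers(w, x0, y):
    """Gradient of the error obtained from the Lagrange multipliers."""
    # Forward calculation of the layers.
    x1 = sigmoid(w[0] * x0)
    x2 = w[1] * x1 + w[2] * x0
    # Multipliers: the Lagrangian is stationary with respect to x2, then x1.
    lam2 = -2.0 * (x2 - y)
    lam1 = lam2 * w[1]
    # Partial derivative of the Lagrangian with respect to each weight,
    # the layers and the multipliers being held fixed.
    return [-lam1 * x1 * (1.0 - x1) * x0, -lam2 * x1, -lam2 * x0]


def finite_difference(w, x0, y, h=1e-6):
    """Central finite-difference gradient of the error, for comparison."""
    gradient = []
    for k in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[k] += h
        w_minus[k] -= h
        gradient.append((error(w_plus, x0, y) - error(w_minus, x0, y)) / (2 * h))
    return gradient


toy_weights = [0.3, -1.2, 0.7]
adjoint_gradient = gradient_via_multipliers(toy_weights, 0.5, 1.0)
reference_gradient = finite_difference(toy_weights, 0.5, 1.0)
```

The two gradients agree up to the finite-difference error, which illustrates why the partial derivative of the Lagrangian can replace the costly total derivative of the error.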

This result is advantageously transposed to the selection of a candidate topological modification which minimizes the error function. Indeed, for a subtractive topological modification at iteration $n$, the variation of the network error between a topology $\Gamma^n$ calculated according to a candidate modification and the previous topology $\Gamma^{n-1}$ can be estimated by calculating the quantity:

$$\mathcal{L}(\Gamma^n, W^n, \mathbf{X}, \Lambda) - J(\Gamma^{n-1}, W^{n-1*}) \tag{13}$$

where $W^n = W^{n-1*}_{|\Gamma^n}$ is a simple restriction of $W^{n-1*}$ to the new topology $\Gamma^n$. The quantity (13) can be calculated quickly and therefore makes it possible to select the best candidate modification at each iteration.

In the case of an additive topological modification, the variation of the network error between a calculated topology and the previous topology is estimated by calculating the quantity:

$$\mathcal{L}(\Gamma^n, W^n, \mathbf{X}, \Lambda) - J(\Gamma^{n-1}, W^{n-1*}) \tag{14}$$

where $W^n$ is a matrix of network connection weights after the candidate topological modification at iteration $n$, said matrix being initialized with the same connection weights as the matrix $W^{n-1*}$ for the shared connections and a zero connection weight for each link created during the additive phase. At this initialization stage, the variation given by (14) is equal to zero. To estimate the potential variation after a learning phase, it suffices to minimize the Lagrangian with respect to the created links only. It is a form of application of the Pontryagin principle:

- Ross, I. M. (2015). A primer on Pontryagin's principle in optimal control (Vol. 2). San Francisco, CA: Collegiate publishers.

The error variation estimates (13) and (14) can be improved by updating $W^n$:

- It suffices to apply to $W^n$ one or two learning iterations with $\Gamma^n$ fixed,

- By analogy with the Pontryagin minimization principle, we can minimize $\mathcal{L}(\Gamma^n, W^n, \mathbf{X}, \Lambda)$ with respect to $W^n$. This minimization is done with $\mathbf{X}$ and $\Gamma^n$ fixed.

Returning to FIG. 3, the topological optimization phase 200 typically includes several topological modifications of each additive and subtractive type.

The additive phases are implemented to lower the value of the error J on the training data. The subtractive phases are implemented if the error on the training data becomes less than the error on the validation data beyond a certain limit. This indeed means that the neural network has performed an over-learning process which leads it to give a bad response for the unlearned data (validation data).
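The alternation between phases described above can be summarized by a simple decision rule. The sketch below is only an illustration: the function `next_phase` and the threshold `gap_limit` are hypothetical names, and the text does not specify how the limit is chosen.

```python
def next_phase(learning_error, validation_error, gap_limit):
    """Choose the next type of topological modification.

    A subtractive phase is triggered when the learning error has fallen
    below the validation error by more than gap_limit (over-learning);
    otherwise an additive phase continues to lower the learning error.
    """
    if validation_error - learning_error > gap_limit:
        return "subtractive"
    return "additive"


phase_when_overfitting = next_phase(0.01, 0.05, gap_limit=0.02)
phase_when_balanced = next_phase(0.04, 0.05, gap_limit=0.02)
```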

Finally, the topological optimization iterations stop when no change in topology leads to an improvement in the accuracy of the network, that is to say when it no longer lowers the errors on the validation data or on the learning data after optimizing the connection weights.

Finally, for each topological optimization phase 200, once a topological modification has been selected, the method comprises an update 213 of the network connection weight matrix by a gradient descent method of the backpropagation type:

$$W^n \leftarrow W^n - \rho \nabla J(W^n) \tag{15}$$

where $\rho$ is the learning rate. The zero-memory Gauss-Newton method can also be used.
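The weight update by gradient descent can be illustrated on a toy error function. In this sketch the quadratic error, its gradient and the helper names are assumptions chosen so that the example stays self-contained; in the method itself the gradient would come from backpropagation through the network.

```python
def gradient_step(weights, gradient, learning_rate):
    """Apply one gradient descent update to a flat list of weights."""
    return [w - learning_rate * g for w, g in zip(weights, gradient)]


def quadratic_error(weights, target):
    """Toy error: squared distance between the weights and a target."""
    return sum((w - t) ** 2 for w, t in zip(weights, target))


def quadratic_gradient(weights, target):
    """Gradient of the toy error with respect to the weights."""
    return [2.0 * (w - t) for w, t in zip(weights, target)]


target = [0.0, 0.5]
weights = [1.0, -2.0]
initial_error = quadratic_error(weights, target)
for _ in range(50):
    weights = gradient_step(weights, quadratic_gradient(weights, target), 0.1)
final_error = quadratic_error(weights, target)
```

With a learning rate of 0.1 each step shrinks the distance to the target by a constant factor, so the error decreases monotonically on this toy problem.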

If we compare this approach with that of the prior art, we see that learning is carried out after each topological modification, which requires a fast convergence algorithm. The prior art relies on redundancy to avoid local minima. In the parsimonious context, local minima are present, but the addition of new degrees of freedom makes it possible to modify the error function $J$ locally.

FIGS. 4a and 4b show an example of comparison between a neural network (FIG. 4a) obtained by the application of state-of-the-art software for a telephone localization application and a neural network constructed according to the method described above (FIG. 4b) for the same application.

It is observed that the neural network provided by the prior art software is organized by layers, each layer of which communicates only with the adjacent layers, and this neural network has 22,420 links. The one obtained by applying the above method comprises 291 links and the layers which are visible are only the graphic display of the processing nodes which can be calculated simultaneously. We see that the processing nodes of a layer can communicate with the nodes of all the previous layers.

Simulation process

Once the neural network has been obtained and trained on the database $(X_i, Y_i)_{i=1}^{M}$, it can be applied to new data, denoted $(X_i)_{i \in S}$, which are theoretical data or data acquired by one or more sensors on the physical system to be simulated, in order to generate results $(Y_i)_{i \in S}$. $S$ represents the set of data for the simulation, and it is therefore disjoint from the set of learning and validation data indexed from 1 to $M$.

Typically, the data $(X_i)_{i \in S}$ are representative of certain quantities characterizing the state of the real system, these data being able to be measured, and the data $(Y_i)_{i \in S}$ can be representative of other quantities characterizing the state of the physical system, these data possibly being more difficult to measure, hence the need to simulate them. The data $(X_i)_{i \in S}$ can include command or actuator status data; the purpose of the simulation can then be to determine the choice of $(X_i)_{i \in S}$ which allows the best response of the system $(Y_i)_{i \in S}$.

We can envisage many possible applications such as for example:

- Location of a mobile phone based on the strength of the signals received by several telephone relays,

- Determination of the energy consumption of a building from meteorological data,

- Expression of the torque of an electric motor as a function of the three phases of the electric supply.

For these three examples, a simulation of each system was made by means of a neural network according to the preceding description, and compared to a simulation by means of the prior art software already compared in the previous section. In this comparison, the neural network according to the invention is executed only once on each test case. On the other hand, the prior art software requires specifying the number of layers, the number of cells per layer and the weight of the links between the cells, so that 50 trial-and-error tests were made with this prior art software. [Table 1] below shows the mean of the error, the standard deviation of the error and the best error obtained; we note that the error obtained by the neural network described above is always less than the best error obtained by the prior art software.

[Table 1]

Another comparison can be made between the performance of the invention applied to the modeling of a complex phenomenon involving fluid-structure interactions in the automotive field, and the performance obtained by a major player in the digital field using a solution available on the market. The neural network obtained by the invention for this application is represented in FIG. 4c, and the comparison of performances can be found in [Table 2] below.

[Table 2]

Compression

The neural network construction method described above can also be used for data compression.

In this regard, and with reference to FIG. 5, a neural network is constructed comprising a compression block C and a decompression block D, in which the compression block and the decompression block are neural networks built according to the process described above, using learning and validation databases comprising pairs of the form $(X_i, X_i)_{i=1}^{M}$.

The construction of the compression neural network includes a step 100 of initializing a neural network which comprises:

- An input layer receiving an input $X_i$,

- A set of processing nodes forming an output layer generating an output $X_i$ identical to the input, and

- A hidden layer that has the same number of processing nodes as the input layer and the output layer.

The method then comprises a step 101 of learning this initial neural network on the learning database, then a subtractive phase 102 conforming to a subtractive phase of the topological optimization step described above, to reduce the size of the hidden layer without degrading the learning. We denote by $x_i$ the compression of $X_i$ at the level of the hidden layer. The method then comprises a step 103 of subdividing the hidden layer into three layers of the same size, and a repetition of the learning step 101 on the sub-network thus formed, and of the subtractive step 102 on the new central layer.

Then, in a step 104, a compression block C is defined, consisting of all the layers between the input layer and the central layer, together with a decompression block D, consisting of all the layers between the central layer and the output layer, and the topological optimization step 200 is implemented on each block separately.

The method then comprises a step 300 of learning on the entire network thus formed. Steps 103 to 300 can then be iterated until it becomes impossible to reduce the size of the compressed vector without significantly degrading the decompression error.

The compression ratio obtained makes it possible to describe very complex structures with only a few variables. To illustrate the power of these nonlinear compression methods, we can give an example where $X_i = e_i$, the $i$-th element of the canonical basis. No compression is possible by conventional linear methods. But we note that the vectors $X_i$ are parameterized by a single variable, the index $i$.
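The canonical basis example can be made concrete with a hand-written nonlinear codec. This sketch is not a neural network and is not the compression block of the invention: it only shows that a single variable, the index, is enough to recover each vector exactly. The function names are hypothetical and indices start at 0.

```python
def canonical_vector(i, n):
    """Return the i-th element of the canonical basis of dimension n."""
    return [1.0 if k == i else 0.0 for k in range(n)]


def compress(x):
    """Nonlinear compression: keep only the index of the largest component."""
    return max(range(len(x)), key=lambda k: x[k])


def decompress(index, n):
    """Rebuild the full vector from its single compressed variable."""
    return canonical_vector(index, n)


dimension = 5
round_trips = [decompress(compress(canonical_vector(i, dimension)), dimension)
               for i in range(dimension)]
```

Every vector of dimension 5 is represented here by one integer, whereas a linear projection onto fewer than 5 directions cannot reproduce all five basis vectors exactly.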

Advantageously, one can use the compression block, and / or the decompression block thus created to model a real system whose inputs and / or outputs are of great dimensionality.

In the case of a high-dimensional input, it is possible for example to insert a modeling block just after the compression block, to obtain a neural network comprising:

- A compression block, suitable for compressing input data $X_i$, such that $x_i = C(X_i)$, and

- A modeling block adapted to calculate on compressed data a function $Y_i = f(x_i)$.

Here the decompression block is only used to ensure that the $x_i$ represent the $X_i$ well, by ensuring that $X_i = D(x_i)$. In this case the construction method advantageously comprises at least one additional learning phase with fixed topology over the entire network $f \circ C$. This makes it possible to correct the decompression as a function of the application, that is to say the modeling. Indeed, the compression process ignores the objective to reach, $Y_i$.

We can take the example of a system modeling the risk of developing a pathology based on the genetic characteristics of an individual. Network input data can have hundreds of thousands of inputs, while output is reduced to a single scalar. The best results in this area are based on the process outlined above.

High-dimensional outputs give rise to a high compression ratio. This phenomenon can be explained by the cause-and-effect link between the $X_i$ and the $Y_i$. We can for example insert a modeling block just before the decompression block, to obtain a neural network comprising:

- A modeling block in which the outputs $Y_i$ have been replaced by their compressed version $y_i$, which gives $y_i = f(X_i)$,

- A decompression block, suitable for obtaining the output data $Y_i$ from the compression coefficients $y_i$, such that $Y_i = D(y_i)$.

One can advantageously carry out a final learning phase with fixed topology on the global network $D \circ f$.

In the experimental approach, in particular for simulated experiments, one can have $X_i$ of very large dimension which are by construction not compressible, whereas the $Y_i$ are generally compressible. Indeed, the resolution of partial differential equations has a regularizing effect. The fact of constructing the model $y_i = f(X_i)$ shows that, in a certain sense, the $X_i$ are ultimately compressible: their effect on $Y_i$ is compressible.

Dynamic system

The neural network construction method can also be used for the modeling of a dynamic physical system, in which one seeks to determine a future state of a physical system from information on its current state.

In this regard, a neural network is constructed comprising a compression block, a modeling block, and a decompression block, in which at least the compression block and the decompression block are neural networks constructed according to the method described above, using learning and validation databases comprising pairs of the form $(X_i, X_i)_{i=1}^{M}$.

Here, each $X_i$ represents the state of the system at successive instants. If $(z_i)_{i=-p}^{M}$ represents the instantaneous state of the system studied, then $X_i = (z_i, z_{i-1}, \dots, z_{i-p}, 1)^T$.

For reasons explained above, the bias is added to the data. In methods such as the ARMA method or recurrent networks of the NARX type, the next step depends on p + 1 previous steps. The use of this technique improves the stability of the model. But it also increases the size of the model and reduces its generalization capacity.
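The construction of a state from p + 1 successive steps can be sketched as follows. This is written under an assumption: the state is taken to be the concatenation of the p + 1 most recent instantaneous measurements, most recent first, followed by a bias component equal to 1. The function name `delay_state` and the toy measurements are hypothetical.

```python
def delay_state(z, i, p):
    """Stack z[i], z[i-1], ..., z[i-p] and append a bias component.

    z -- list of instantaneous states, each a list of floats
    i -- index of the most recent instant
    p -- number of additional past instants to include
    """
    if i - p < 0:
        raise ValueError("not enough past instants for this value of p")
    state = []
    for k in range(p + 1):
        state.extend(z[i - k])
    state.append(1.0)  # bias component
    return state


measurements = [[0.0], [1.0], [2.0]]
state_at_2 = delay_state(measurements, i=2, p=1)
```

The size of the state grows linearly with p, which is the increase in model size mentioned above.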

The compression of the Xs makes it possible to reduce the size of the recurring pattern, while increasing p to ensure better stability.

This compression has the advantage of filtering the noise of the X's, which is essential in the context of measured data.

For the modeling of a dynamic physical system, with reference to FIGS. 6a and 6b, a block h adapted to model the dynamic behavior of the real system is inserted between the compression block C and the decompression block D constructed in accordance with the above description. This behavior is of the form:

$$X_{i+1} = F(X_i, P_i) + G_i, \quad i \ge 0 \tag{16}$$

where $G_i$ corresponds to one or more excitations representing the environment of the simulated system and $P_i$ describes the internal state of the system.

The system is only known through a few measurements made over time:

The modeling block is advantageously a neural network adapted to reproduce a model of the form:

where:

- $x_i$ is a compression of $X_i$ by the compression block, $x_i = C_X(X_i)$,

- $h_{\hat\Gamma,\hat W}$ is the function calculated by the modeling block, $\hat\Gamma$ and $\hat W$ being respectively the topology and the matrix of the connection weights of the modeling block, and

- $p_k$ and $g_k$ are the data representative of the excitation and the internal state of the system on which the modeling block is implemented.

In one embodiment, shown diagrammatically in FIG. 6a, the number of parameters for the internal state $P_i$ and the number of excitations $G_i$ are small, for example less than the size of the $x_i$; it is then possible to take $p_i = P_i$ and $g_i = G_i$. The determination of $h_{\hat\Gamma,\hat W}$ is then done by solving the following optimization problem:

The minimization with respect to $\hat\Gamma$ is advantageously carried out by the topological optimization step 200 described above, and for fixed $\hat\Gamma$, a zero-memory Gauss-Newton technique is used to estimate $\hat W$.

Otherwise, in the case where the number of parameters for $P_i$ and $G_i$ is higher, these parameters are also compressed to obtain $p_i = C_P(P_i)$ and $g_i = C_G(G_i)$, where:

- $C_P$ is a compression operator, possibly linear, adapted to compress data $P_i$ into data $p_i$, and

- $C_G$ is a compression operator, possibly linear, adapted to compress data $G_i$ into data $g_i$ of size equal to that of the data $x_i$.

It is compression induced by that of X ,. Even if the P, and G, do not lend themselves easily to compression, their effect on the dynamic system is compressible. FIG. 6b shows that the compression of the X induces a compression on the excitations G,. Indeed, the X, being resulting from the integration of an equation with the differences, they are more regular than the excitations P, and G ,. Therefore, their effect on the model is compressible.

This embodiment is shown schematically in Figure 6b. In this case the determination of $h_{\hat{\Gamma},\hat{W}}$ is done by solving the following optimization problem:

The minimization with respect to the topology $\hat{\Gamma}$ is carried out by the topological optimization step 200 described above, and for fixed $\hat{\Gamma}$, a zero-memory Gauss-Newton technique is used to estimate the weights $\hat{W}$ and the operators $C_P$ and $C_G$.

In this method, the recursion loop is performed not in the real space of the $X_i$ but in the space of the compressed data. This compression reduces the noise on the data and ensures better stability of the dynamic model, while reducing the calculation times in the learning and simulation phases.

Whatever method is used for initializing W and possibly updating it, the number of topological changes to be tested can increase very quickly with the size of the neural network. To limit the amount of calculation, the configurations to be tested can be chosen at random, and only the one that gives the best estimate of the error reduction is selected.
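The random choice of configurations to be tested could look like the following sketch. The function `select_modification`, the representation of a candidate as a `(source, target)` link and the toy error estimate are assumptions made for illustration; in the method itself the estimate would be the estimated variation of the network error for each candidate.

```python
import random

def select_modification(candidates, estimate_error_variation, n_samples, seed=None):
    """Test a random subset of candidate modifications and keep the best one."""
    rng = random.Random(seed)
    sampled = rng.sample(candidates, min(n_samples, len(candidates)))
    # The most negative estimated variation is the largest estimated error reduction.
    return min(sampled, key=estimate_error_variation)

# Toy example: candidates are hypothetical links (source node, target node),
# and the estimated error variation is a made-up lookup table.
candidates = [(i, j) for j in range(1, 6) for i in range(j)]
toy_estimate = {c: -1.0 / (1 + c[0] + c[1]) for c in candidates}

best = select_modification(candidates, toy_estimate.get, n_samples=4, seed=0)
```

Only `n_samples` estimates are evaluated per modification step, so the cost no longer grows with the total number of candidate changes.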

Example

By way of illustration, one possible but in no way limiting application is the modeling of the melting of a block of solid sodium.

Referring to Figure 7a, we consider a small square container which contains sodium in the solid state. Figure 7a is a top view of this container, which is heated by an electrical resistor positioned in a corner of the container for the purpose of melting the sodium.

Three experiments are carried out. During each experiment, the resistor is powered according to one of the three power profiles shown in Figure 7b. In this figure, the time in seconds is shown on the abscissa and the power delivered to the resistor, in watts, on the ordinate.

The response of this system is represented by 9 temperature sensors 2, each of which supplies only the value 0 if the temperature does not exceed the sodium melting threshold, and 1 if this threshold is exceeded.

If we denote by $z_i$ the vector formed by the 9 measurements at an instant i, then $X_i$ represents the state of the system at the successive times i and i-1: $X_i = (z_i, z_{i-1})$.

A "digital twin" of this dynamic system is established from data measured during the first experiment with the first power profile, and in accordance with the dynamic system simulation method previously described, by first performing a compression of the $X_i$.

The compression gives rise to a neural network comprising 18 inputs (two for each of the nine sensors) and 18 outputs. With reference to FIG. 8a, which represents a neural network for compression then decompression of the $X_i$, it is found that the compression makes it possible to represent each $X_i$ by only two coefficients.
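As an illustration of how the 18-component inputs could be assembled, the sketch below stacks each 9-sensor reading with the previous one, assuming the state is the concatenation $X_i = (z_i, z_{i-1})$. The function name `build_states` and the sample readings are hypothetical and are not taken from the experiments.

```python
import numpy as np

def build_states(z):
    """z has shape (T, 9); returns shape (T-1, 18) whose rows are (z_i, z_{i-1})."""
    z = np.asarray(z)
    return np.hstack([z[1:], z[:-1]])

# Made-up binary readings: 0 = below the melting threshold, 1 = above it
z = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
])
X = build_states(z)
```

Each row of `X` has 18 components, two per sensor, which matches the number of inputs of the compression network.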

A dynamic modeling block is then inserted between the compression block and the decompression block in the form of a recurrent neural network, the diagram of which is represented in FIG. 8b. The third input of this network (at ordinate 2) corresponds to the power injected into the resistor. In FIG. 8b, the thickness of each connection represents its intensity, that is to say it is representative, in relative terms, of the weight of the connection. It can be seen that the excitation plays an important role. To improve the readability of FIGS. 8a and 8b, the bias, which is connected to practically all the nodes of the network, is not shown.

Referring to Figures 9a to 9c, there is shown a comparison between the actual data (solid lines) and the prediction data of the model thus constructed (dotted lines), for a central sensor 20 of the container and for each of the three experiments: FIG. 9a corresponds to experiment 1, FIG. 9b to experiment 2 and FIG. 9c to experiment 3. The time elapsed in seconds is shown on the abscissa, and the response of the sensor on the ordinate; it is recalled that this response takes only the values 0 for solid sodium and 1 for liquid sodium.

It can be noted from these figures that the position of the sodium liquefaction front depends significantly on the excitation, and that the model constructed succeeds in predicting this position in the validation cases, which are those of Figures 9b and 9c.

Annex

The derivative of the sum being equal to the sum of the derivatives, the result is established for a single learning datum: M1 = 1.

This gives:

Here designates the dot product in $\mathbb{R}^{n_o}$.

It follows from this that $\Lambda_{nc} = 2\,(O X^{nc} - Y)^T O$.

And we get

$= 0, \ \forall \phi$, where $W_j^i$ represents the sub-matrix of $W_j$ which acts on the components of $X^i$. The notation $.*$ indicates the component-by-component product of two matrices of the same size.

By making F traverse the elements of the canonical base of ¾ ⁿⁱ , we obtain A [= å _j > _î tf ((fsi ( ^w _j -i * Y ¹ )

is a row vector with $n_i$ elements. By transposing, we get:

, for $i = nc - 1, \ldots, 0$.

We can also write this in the form $\Lambda_i = \sum_{j>i} \left( \operatorname{diag}\left( f'_{SI}(W_{j-1}^i * X^i) \right) * (W_{j-1}^i)^T \right) \Lambda_j$, for $i = nc - 1, \ldots, 0$, where diag(x) denotes the diagonal matrix whose diagonal terms are formed by the elements of the vector x.
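A backward recursion of this kind can be checked numerically on a simplified case. The sketch below is not the annex's general setting: it assumes a plain chain network $X^{i+1} = \tanh(W_i X^i)$ in which each layer is connected only to the previous one, an error $J = \|O X^{nc} - Y\|^2$, and multipliers stored as column vectors. All names and values are hypothetical.

```python
import numpy as np

def forward(Ws, x0):
    """Return the layer outputs X^0..X^nc and the pre-activations W_i X^i."""
    xs, pre = [x0], []
    for W in Ws:
        a = W @ xs[-1]
        pre.append(a)
        xs.append(np.tanh(a))
    return xs, pre

def adjoints(Ws, O, y, x0):
    """Backward recursion giving lam[i] = dJ/dX^i for J = ||O X^nc - y||^2."""
    xs, pre = forward(Ws, x0)
    lam = [None] * len(xs)
    lam[-1] = 2.0 * O.T @ (O @ xs[-1] - y)      # terminal condition, column form
    for i in range(len(Ws) - 1, -1, -1):
        d = 1.0 - np.tanh(pre[i]) ** 2           # derivative of tanh at W_i X^i
        lam[i] = Ws[i].T @ (d * lam[i + 1])      # propagate one layer backwards
    return lam

# Made-up weights and data for a 3 -> 4 -> 2 chain
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
O = rng.standard_normal((2, 2))
y = rng.standard_normal(2)
x0 = rng.standard_normal(3)
lam = adjoints(Ws, O, y, x0)
```

The terminal value `lam[-1]` is the transpose of $2\,(O X^{nc} - Y)^T O$, and `lam[0]` can be compared with a finite-difference gradient of J with respect to the input to validate the recursion.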

Claims

[Claim 1] Method for building a forward propagation neural network, comprising a set of processing nodes and connections between the nodes forming a topology organized in layers, such that each layer is defined by a set of nodes which can be calculated simultaneously, and the input of a processing node of a layer can be connected to the output of a node of any of the previously calculated layers, the method comprising a step of initialization (100) of a neural network according to an initial topology comprising an input layer, at least one hidden layer comprising at least one node, and a set of output nodes, and at least one topological optimization phase (200), each topological optimization phase including:

- at least one additive phase comprising the modification of the network topology by adding at least one node and/or a connection link between the input of a node of a layer and the output of a node of any of the preceding layers, and/or

- at least one subtractive phase comprising the modification of the topology of the network by the removal of at least one node and/or a connection link between two layers, and in which each modification of topology comprises the selection (212) of a modification of topology among several candidate modifications, from an estimate (211) of the variation of the network error, calculated on training data, between each topology modified according to a candidate modification and the previous topology.

[Claim 2] Construction method according to claim 1, in which the modification of topology selected is that which, among the candidate modifications, optimizes the variation of the error compared to the previous topology, and the network error for a given topology is defined by $J(\Gamma, W^*)$, where

- $J$ is an error function between network outputs and a target result,

- $\Gamma$ is the network topology, and

- $W^*$ is the matrix of network connection weights minimizing the error function $J$ with fixed topology $\Gamma$.

[Claim 3] Construction method according to one of the preceding claims, in which the estimation of the variation of the network error between a modified topology and the preceding topology comprises the estimation of the network error according to the modified topology from the Lagrange operator applied to the connection weights of the neural network, $\mathcal{L}(\Gamma, W, X, \Lambda)$, where:

- $\mathcal{L}$ is the Lagrange operator,

- $\Gamma$ is the network topology,

- $W$ is a network connection weight matrix,

- $X = (X^0, \ldots, X^{nc})$ represents the outputs of all the nodes of the network and $X^i$ represents the outputs of the cells of layer i, and

- $\Lambda_i$ is the Lagrange multiplier associated with the expression defining the elements of layer $X^i$.

[Claim 4] Construction method according to the preceding claim, in which, during an additive phase, the variation of the network error between a candidate topology and the previous topology is estimated by calculating the quantity:

$\mathcal{L}(\Gamma^n, W^n, X, \Lambda) - J(\Gamma^{n-1}, W^{n-1*})$

where:

- $\Gamma^n$ is the topology of the candidate network for iteration n,

- $W^n$ is a matrix of network connection weights after the candidate topological modification for iteration n, said matrix being initialized with the same connection weights as the matrix $W^{n-1*}$ for the connections common to the candidate topology for iteration n and the topology of iteration n-1, and with a zero connection weight for each link created during the additive phase, then updated by minimizing $\mathcal{L}$ with respect to the weights of the links created.

[Claim 5] Construction method according to one of claims 3 or 4 in which, during a subtractive phase, the variation of the network error between a calculated topology and the previous topology is estimated by calculating the quantity:

$\mathcal{L}(\Gamma^n, W^n, X, \Lambda) - J(\Gamma^{n-1}, W^{n-1*})$

where $W^n$ is the restriction of $W^{n-1*}$ to the topology $\Gamma^n$.

[Claim 6] Construction method according to one of the preceding claims, in which the neural network is adapted to simulate a physical system governed by an equation of type Y = f(X) where X is an input datum and Y is a response from the physical system, and the error J of the neural network is defined according to the topology $\Gamma$ and the matrix W of the network connection weights, by:

where $f_{\Gamma,W}(X_i)$ is the output of the neural network, and $X_i$ and $Y_i$ are respectively input and output data generated by measurements on the real system.

[Claim 7] Construction method according to one of the preceding claims, comprising, once the topology modification has been selected, the determination (213) of a network connection weight matrix by a method of descent of the error with respect to said matrix.

[Claim 8] Construction method according to one of the preceding claims, in which the topological optimization step (200) is implemented as a function of average errors of the neural network on training data on the one hand, and on validation data on the other hand, in which:

- at least one additive step is implemented to reduce the average error on the training data,

- at least one subtractive step is implemented, if the error on the training data becomes less than the error on the validation data beyond a predetermined tolerance, and

- topological optimization is stopped when any additive or subtractive step no longer results in a reduction of the error on the training data and on the validation data.

[Claim 9] Construction method according to one of the preceding claims, in which the neural network comprises at least one compression block adapted to generate compressed data and a decompression block, the method comprising at least one topological optimization phase (200) implemented on the compression block and the decompression block, and further comprising, after the topological optimization of the blocks, a learning phase (300) of the entire neural network with fixed topology.

[Claim 10] Construction method according to the preceding claim, further comprising the selection of the compression and decompression block and the addition of a modeling block, respectively at the output of the compression block or at the input of the decompression block, in which at least one topological optimization phase (200) is implemented on the modeling block, and a learning phase with fixed topology is implemented on the assembly comprising the modeling block and the compression block or the decompression block.

[Claim 11] Construction method according to claim 9, further comprising inserting, between the compression block and the decompression block, a modeling block adapted to model the evolution of a dynamic system governed by an equation of the form

$X_{i+1} = F(X_i, P_i) + G_i, \quad i \ge 0$

where $X_i$ is a measurable characteristic of the physical system at a given time, $P_i$ describes the internal state of the physical system, and $G_i$ describes an excitation,

and the modeling block is adapted to calculate an output $x_{i+1}$ of the form:

$x_{i+1} = h_{\hat{\Gamma},\hat{W}}(x_i, p_i) + g_i, \quad i \ge 0$

$x_0 = C_X(X_0)$ (17)

where:

- $x_i$ is a compression of $X_i$ by the compression block: $x_i = C_X(X_i)$,

- $h_{\hat{\Gamma},\hat{W}}$ is the function calculated by the modeling block, $\hat{\Gamma}$ and $\hat{W}$ being respectively the topology and the matrix of the connection weights of the modeling block, and

- $p_k$ and $g_k$ are the data representative of the internal state and of the excitation of the system on which the modeling block is implemented.

[Claim 12] Neural network, characterized in that it is obtained by the implementation of the method according to one of the preceding claims.

[Claim 13] Computer program product, comprising code instructions for implementing the method according to one of claims 1 to 11, when executed by a processor (10).

[Claim 14] Method for simulating a real system governed by an equation of type Y = f(X) where X is an input datum and Y is a response of the real system, comprising:

- the construction of a neural network adapted to calculate a function $f_{\Gamma,W}$ such that $Y \approx f_{\Gamma,W}(X)$, by the implementation of the method according to one of claims 1 to 11, and

- the application, to a new input datum $X_i$ representative of a physical quantity of the system, of the neural network to deduce a simulation of the response $Y_i$ of the system.

[Claim 15] Simulation method according to the preceding claim, in which the neural network further comprises a data compression block, the data compression block being obtained by the implementation of the method according to claim 9.

[Claim 16] Method for simulating a dynamic physical system governed by an equation of the form

$X_{i+1} = F(X_i, P_i) + G_i, \quad i \ge 0$

where $X_i$ is a measurable quantity of the physical system at a given time, $P_i$ describes the internal state of the physical system, and $G_i$ describes an excitation, the method comprising the steps of:

- acquisition of $X_i$, $P_i$ and $G_i$,

- compression of $X_i$ to obtain a compressed datum $x_i$,

- recurrent application, a number k of times, of a neural network modeling the dynamic physical system to the compressed datum $x_i$, to obtain at least one subsequent compressed datum $x_{i+k}$, and

- decompression of the subsequent compressed datum $x_{i+k}$ to obtain a modeling of a subsequent quantity $X_{i+k}$.

[Claim 17] Simulation method according to the preceding claim, implemented by means of a neural network constructed by the implementation of the method according to claim 11, and wherein the steps of compression of $X_i$, of application of a neural network and of decompression of $x_{i+k}$ are implemented respectively by means of the compression block, the modeling block and the decompression block of the neural network constructed.

[Claim 18] Data compression method comprising:

- the construction, by the implementation of the method according to one of claims 1 to 11, of a neural network comprising a compression block receiving as input a datum X and a decompression block generating as output the datum X, in which the construction of the neural network includes the implementation of at least one topological optimization phase on the compression block and the decompression block, and

- the application, to at least one datum representative of the state of a real system, of the compression block of the neural network constructed.