WO2021233758A1

WO2021233758A1 - Determining the state of health of a system on the basis of a network of measured time sequences

Info

Publication number: WO2021233758A1
Application number: PCT/EP2021/062666
Authority: WO
Inventors: Themis Palpanas; Paul Boniol; Mohammed Meftah; Emmanuel Remy
Original assignee: Electricite De France; Universite De Paris
Priority date: 2020-05-20
Filing date: 2021-05-12
Publication date: 2021-11-25
Also published as: FR3110719A1; FR3110719B1

Abstract

The invention proposes a method for determining a state of health of a system of interest equipped with a sensor, comprising: - obtaining a time series of measurements, - extracting sub-sequences of the series, - constructing a network representing the time series, in which each node represents sub-sequences, and each weighted connection between two nodes represents the number of times a sub-sequence represented by one of the nodes is followed by a sub-sequence represented by the other node, - attributing a normality score to at least one sub-sequence of the series on the basis of a set of connected nodes of the network forming the sub-sequence, - identifying at least one abnormal sub-sequence, on the basis of the normality scores, and on the basis of the identified abnormal sub-sequence, determining the state of health of the system of interest.

Description

Title: Determination of the state of health of a system from a network of measured temporal sequences

Technical field

[0001] The present disclosure lies in the field of data science.

[0002] More particularly, the present disclosure relates to methods for determining the state of health of systems equipped with sensors, as well as to computer programs, computer-readable storage devices and processing circuits for implementing such methods.

Prior art

[0003] Monitoring equipment in operation consists of setting up a system that reports in real time on the status of the equipment. Conventionally, the equipment is fitted with sensors, in particular digital sensors, which measure physical parameters (flow rate, pressure, temperature, number of beats per minute, etc.).

[0004] The measurements from these sensors can therefore be stored and analyzed, mainly in the form of time series, which are sequences of time-stamped values.

[0005] The analysis of these time series, in particular the detection of anomalies, therefore makes it possible to monitor the state of the equipment and, if necessary, to raise alarms when the equipment leaves its normal operating range, which can have consequences on operation such as production stoppages, premature wear or the like.

[0006] The detection of anomalies in large time series, typically time series formed from data accumulated over several years with a granularity of the order of a second, is an important problem with applications in a large number of areas, including in particular:

- aeronautics, automobiles and railways, for example for monitoring the operation of vehicles,

- smart cities and factories, for example for forecasting electricity consumption,

- the Internet of Things, for example for the detection of gestures and movements from a connected watch,

- the industrial internet of things, for example for monitoring the operation of an industrial device or system, such as for example a production line bringing together several devices, or an industrial site such as a factory or a grouping of such sites,

- control systems such as SCADA systems, for example integrated into electricity production sites,

- health, for example for monitoring a physiological parameter such as cardiac activity or sleep activity,

- economics and finance, for example for the detection of fraud,

- telecommunications and information systems, for example for the management of data centers,

- cybersecurity, for example for intrusion detection,

- web services, for example for analyzing user web sessions for the detection of new behaviors, and

- law, for example for the analysis of court cases and the characterization of discriminating elements.

[0007] It is then desirable to detect, among data samples from a time series, each sample being formed from a sub-sequence of time-stamped values extracted from the time series, those deviating from a standard and thus constituting anomalies.

[0008] Several known methods for detecting anomalies can be applied either directly to the data samples without preliminary preprocessing, or to a discrete representation of the time series.

[0009] It is in fact known to define a representation space, using a minimum number of variables, in which the data samples can be represented and classified. Various known methods make it possible to define such a representation space, in particular discrete Fourier transforms, wavelet transforms, singular value decompositions using a principal component analysis, approximations by piecewise linear functions such as SAX, etc. These methods make it possible to transform each data sample obtained into a set of n values. It is thus possible to graphically represent all the data samples in the form of a point cloud in an n-dimensional space, each point corresponding to a data sample. The resemblance between two data samples can be expressed as the Euclidean distance between the two points corresponding to these two samples in n-dimensional space. The smaller this distance, the more similar the two samples are.

A known anomaly detection method consists in detecting the data samples whose Euclidean distance from their nearest neighbor in n-dimensional space is the greatest. This method is based on the notion of discord. The notion of discord of a time series T is defined as follows. Among all the sub-sequences of size ℓ in T, the discord of T is the sub-sequence T_{i,ℓ} which has the greatest distance from its nearest neighbor. Formally, the nearest neighbor is defined as follows:

Discord is therefore defined as follows:

An illustration of this definition is shown in Figure 1. In this figure, a point symbolizes a sub-sequence of T. Three groups of sub-sequences 201, 202, 203 are represented there, including an isolated sub-sequence T_{i,ℓ}. Even if the information provided by the notion of discord is useful and interesting in some use cases, approaches using it fail when the time series of interest contains several similar anomalous samples. Here, only isolated abnormal sub-sequences, such as the sub-sequence T_{i,ℓ} 201, can be detected by their distance d_i from their nearest neighbor. Recurrent anomalies T_{j,ℓ} or T_{k,ℓ}, having a relatively small distance d_j or d_k from their nearest neighbor, remain undetected.

The notion of m-th Discord has been proposed to solve this problem. The m-th Discord of a time series T is defined as follows. Among all the sub-sequences of size ℓ in T, the m-th Discord of T is the sub-sequence T_{i,ℓ} which has the greatest distance from its m-th nearest neighbor. An illustration of this definition is shown in Figure 2. In this figure, a dot symbolizes a sub-sequence of T. As in Figure 1, three groups of sub-sequences 201, 202, 203 are shown there, as well as an isolated sub-sequence T_{i,ℓ}. If the 3rd Discord of T, i.e. the distance between each sub-sequence and its third nearest neighbor, is retained as the parameter defining the abnormality of a sub-sequence, then each isolated sub-sequence T_{i,ℓ} such as 201, each isolated pair of sub-sequences such as 202 and each isolated triplet T_{k,ℓ} of sub-sequences such as 203 are detected as anomalies by their distance d_{i,3}, d_{k,3}. On the other hand, the groups of sub-sequences T_{j,ℓ} comprising more than three sub-sequences, and therefore having a relatively small distance d_{j,3} from their third nearest neighbor, are not detected as anomalies.

The two preceding concepts can be grouped together in the concept of Top-k m-th Discord, defined as follows. A sub-sequence T_{i,ℓ} is the Top-k m-th Discord of T if it has the k-th greatest distance from its m-th nearest neighbor. Therefore, the Discord of T is also the Top-1 1st Discord, and the m-th Discord of T is the Top-1 m-th Discord.

In general, the known methods based on the notion of m-th Discord aim to find the sub-sequences whose m-th nearest neighbor is the farthest away. However, these methods prove to be very sensitive to the value of the parameter m. Small variations in this parameter can cause false positives to appear, that is to say, sub-sequences detected as abnormal when they are not.
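The discord definitions above can be illustrated by a brute-force sketch. It is only an illustration of the prior-art notion, not an implementation taken from the cited methods: the function names are hypothetical, indices are 0-based, and excluding overlapping sub-sequences from the neighbor search is an assumption made here to avoid trivial matches.

```python
import math


def subsequences(series, length):
    """Return every sub-sequence of `length` consecutive points of `series`."""
    return [series[i:i + length] for i in range(len(series) - length + 1)]


def euclidean(a, b):
    """Euclidean distance between two equally sized sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mth_nn_distances(series, length, m):
    """Distance from each sub-sequence to its m-th nearest neighbor."""
    subs = subsequences(series, length)
    result = []
    for i, current in enumerate(subs):
        dists = sorted(
            euclidean(current, other)
            for j, other in enumerate(subs)
            if abs(i - j) >= length  # assumption: ignore overlapping sub-sequences
        )
        result.append(dists[m - 1] if len(dists) >= m else math.inf)
    return result


def top_k_mth_discord(series, length, m, k=1):
    """Start index of the sub-sequence with the k-th greatest m-th-NN distance."""
    distances = mth_nn_distances(series, length, m)
    order = sorted(range(len(distances)), key=lambda i: distances[i], reverse=True)
    return order[k - 1]


# Hypothetical periodic series containing the same anomaly twice.
series = [0, 1, 0, -1] * 10
series[9] = 5
series[25] = 5
```

With this sample series, every sub-sequence of size 4 has an exact twin elsewhere, so the distances to the nearest neighbor (m = 1) are all zero and the repeated anomaly is not singled out. With m = 2, only the sub-sequences containing one of the two anomalous points keep a non-zero distance.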

[0013] The previously cited methods do not cover all the possible cases of anomaly detection. In the case where the number of anomalies is not known (which is moreover the majority of cases, including those concerning the detection of hardware failures of sensors in their operating history), and in the case where the anomalies are repeated (and therefore each anomaly has a very close neighbor), the methods using these definitions do not work optimally. They either have difficulty providing a reliable response (with a low rate of correct detections), or require a high computation time.

Other methods falling within the field of outlier detection, not specifically dedicated to time series, are known. Methods based on the local outlier factor (the expression "Local Outlier Factor" (LOF) is frequently used) are examples of such known methods. Similarly to the methods using the m-th Discord, the methods based on the local outlier factor comprise a step of calculating a degree measuring the density of the neighborhood of each sub-sequence. These methods require a parameter k indicating the number of neighbors to consider in order to measure the density of the neighborhood.

Likewise, other known methods aim to evaluate the isolation of each sub-sequence. This isolation is measured by constructing random binary trees which divide the space of the sub-sequences of the time series in question in two at each node, until only one sub-sequence remains in each area of the space. The depth of the tree is used to construct a score indicating the sub-sequences considered abnormal. The greater the depth needed to reach the sub-sequence being evaluated, the more normal that sub-sequence is considered. Conversely, the smaller the depth, the more abnormal the sub-sequence is considered. In order to homogenize and stabilize the score, several random trees are constructed and an average score is established.

As indicated previously, these methods are not specifically dedicated to the sub-sequences of time series; they therefore fail in certain cases tested during our experimental evaluation. Not being able to detect all types of anomalies is detrimental because the state of the system studied is then not precisely monitored. Thus, the ability to predict premature wear, failure or degradation is negatively affected.

[0017] Finally, solutions using deep machine learning methods, more particularly recurrent neural networks of the type known as "Long Short-Term Memory" (LSTM), have recently been proposed. A disadvantage of these methods is that the correct detection rate is only optimized on the condition that examples of normal sub-sequences, or even in some cases examples of different types of anomalies, are provided beforehand and identified as such. These methods therefore require prior supervision, which hinders their dissemination.

The approaches which have been proposed so far in the literature for the detection of anomalies in time series, for example time series from sensors gathering measurements of a physical parameter over time, have serious limits: either they require prior knowledge of the field, or they become cumbersome and expensive to use in situations where recurring anomalies of the same type occur.

There is therefore a need to be able to detect a large number of types of operating anomalies in a generic and scalable manner, adaptable to the monitoring of any system equipped with a sensor capable of measuring a value indicative of the current operating state of the system. It is desirable that the detection be reliable, that is to say that operating anomalies and normal operation alike are correctly identified as such. It is, moreover, desirable that the detection does not require any supervision.

Summary

The present disclosure improves the situation.

In this regard, there is proposed a method for determining a state of health of a system of interest equipped with a sensor, the method being implemented by a processing circuit comprising a processor and a memory, and comprising:

- an obtaining OBT T of a time series formed by a sequence of measurements from the sensor as a function of time,

- an extraction EXTR T_{j,ℓ} of a plurality of sub-sequences of the same size ℓ from the time series, each extracted sub-sequence being formed by a number ℓ of measurements of said sequence of measurements that are consecutive in time,

- a construction CONST G_ℓ(N, ℰ) of a network representing the time series, the network comprising a set of nodes and weighted connections between nodes, where each node represents a set of extracted sub-sequences, and each weighted connection between two nodes represents the number of times a sub-sequence represented by one of the nodes is followed by a sub-sequence represented by the other node,

- an attribution SCOR T_{j,ℓ} of a normality score to at least one sub-sequence of the time series having a size greater than or equal to the size ℓ of the extracted sub-sequences, on the basis of a set of connected nodes of the network forming the sub-sequence,

- an identification ID T_{k,ℓ} of at least one abnormal sub-sequence, indicating an operating anomaly of the system of interest, on the basis of the assigned normality scores, and

- on the basis of said at least one identified abnormal sub-sequence, a determination DET SoH of the state of health of the system of interest.

In one embodiment, the method comprises the extraction EXTR T_{j,ℓ}, from the time series, of all the sub-sequences of the time series formed by the same number ℓ of measurements consecutive in time.

In one embodiment, the construction CONST G_ℓ(N, ℰ) of the network comprises a projection PRJ T_{j,ℓ} of each extracted sub-sequence into a vector of a two-dimensional space, and a construction CONST N of a set of nodes of the network, where each node corresponds to a dense area of the two-dimensional space.

In one embodiment, the projection PRJ T_{j,ℓ} of each extracted sub-sequence into a vector of a two-dimensional space comprises:

- for each extracted sub-sequence, a construction REP V_{j,ℓ} of a vector representing the sub-sequence, each term of the vector being a sum of a subset of components of the sub-sequence,

- a reduction of dimensions RED 3D of a first matrix formed by the set of vectors representing the extracted sub-sequences, to obtain a second matrix where each vector representing an extracted sub-sequence comprises only three dimensions,

- a calculation ROT of a third matrix obtained by rotating the second matrix, where a first dimension of the third matrix is defined so that the sub-sequences extending along this dimension are constant sub-sequences, and the other two dimensions are orthogonal to the first, and

- the definition of the two dimensions of the projection space of the sub-sequences as the other two dimensions of the third matrix.

In one embodiment, each vector representing an extracted sub-sequence comprises a number ℓ - λ of terms, where each term is a sum of λ components of the sub-sequence, and λ is between 0.1 * ℓ and 0.5 * ℓ.

In one embodiment, the vector representing a sub-sequence is defined by:

where T_k is the k-th component of the sub-sequence and λ is an integer of determined value.

In one embodiment, the reduction of dimensions of the first matrix is implemented by Principal Component Analysis.

In one embodiment, the construction CONST N of the nodes of the network comprises the implementation of a circular scan of the two-dimensional space by a set of radial scan vectors of different angular positions, and for each radial scan vector:

- the identification of the set of points of intersection of the projections of the extracted sub-sequences with the radial scan vector, and

- the construction of a node of the network as a local maximum of density of the identified points of intersection.

In other words, the number of nodes is not fixed.

In one embodiment, the construction CONST N of the nodes of the network comprises the calculation of a kernel density estimation function applied to each set of intersection points corresponding to a position in space of the radial scan vector, the estimation function being defined by:

where ℐ_ψ is the set of points of intersection of the projections of the time series with a radial scan vector forming an angle ψ with respect to the x-axis of the two-dimensional space, r is the number of radial scan vectors, h is a bandwidth parameter, μ is a mean, σ is a standard deviation and n is the number of points contained in ℐ_ψ.

In one embodiment, each extracted sub-sequence is associated with the node closest to its two-dimensional projection, and the construction of the network further comprises the construction EXT CNX of a connection between two nodes corresponding respectively to extracted sub-sequences which follow one another.

In one embodiment, the attribution of a normality score to a sub-sequence comprises:

- the identification of a path formed by a consecutive series of nodes of the network forming said sub-sequence, and

- the calculation of the normality score as a function of the weight of the connections of the network forming the path, and of the number of connections associated with each node of the network included in the path.

According to another object, there is proposed a computer program comprising instructions for the implementation of the method according to the above description, when this program is executed by a processor.

According to another object, there is proposed a non-transitory computer-readable recording medium on which is recorded a program for the implementation of the method according to the above description, when this program is executed by a processor.

According to another object, there is proposed a processing circuit comprising a processor connected to a non-transitory recording medium according to the above description.

The proposed method makes it possible to determine a state of health of a system of interest, for example by detecting anomalies in a time series measured on the system of interest, without prior knowledge of the system.
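The construction of weighted connections and the attribution of a normality score summarized above can be sketched as follows. This is a minimal sketch under stated assumptions: the sequence of node identifiers (one per extracted sub-sequence, in time order) is taken as given, all function names are hypothetical, and the scoring formula (mean of connection weight multiplied by the degree of the source node minus one) is only one possible function of the weights and degrees, since the text does not fix the formula here.

```python
from collections import Counter


def build_connections(node_sequence):
    """Weight of (a, b) = number of times node a is immediately followed by node b."""
    return Counter(zip(node_sequence, node_sequence[1:]))


def node_degrees(connections):
    """Number of distinct connections entering or leaving each node."""
    degrees = Counter()
    for source, target in connections:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def normality_score(path, connections, degrees):
    """Assumed score: mean over the path of weight * (degree of source node - 1)."""
    edges = list(zip(path, path[1:]))
    if not edges:
        return 0.0
    total = sum(connections[(a, b)] * (degrees[a] - 1) for a, b in edges)
    return total / len(edges)


# Hypothetical node sequence: the cycle 0 -> 1 -> 2 recurs, 0 -> 3 -> 2 occurs once.
nodes = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 3, 2, 0, 1, 2]
connections = build_connections(nodes)
degrees = node_degrees(connections)
usual = normality_score([0, 1, 2], connections, degrees)
rare = normality_score([0, 3, 2], connections, degrees)
```

In this example the frequently travelled path receives a higher score than the path travelled once, which is the behavior expected of a normality score.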
In particular, the proposed method makes it possible to correctly identify, without prior knowledge, unique anomalies, that is to say behavior which has never been observed, but also recurring anomalies, which may be linked for example to degraded operation of the equipment.

In addition, the network constructed from sub-sequences of a determined size then makes it possible to evaluate the normal or abnormal character of any sub-sequence of the series whose size is greater than or equal to the size of the sub-sequences used for the construction of the network.

The method also has good detection precision for a reduced computation time compared to the prior art. In particular, projecting the sub-sequences into a two-dimensional space makes it possible to reduce the computation time necessary for the construction of the network. In addition, the proposed method involves only a limited number of linear passes over the time series, namely: a first pass for the calculation of the projection of the time series, a second for the extraction of the nodes and a last for the extraction of the connections.

Brief description of the drawings

[0038] Other characteristics, details and advantages will become apparent on reading the detailed description below, and on analyzing the appended drawings, in which:

Fig. 1

[0039] [Fig. 1] graphically represents a known example of anomaly detection for an example of distribution of sub-sequences.

Fig. 2

[0040] [Fig. 2] graphically represents another known example of anomaly detection for an example of distribution of sub-sequences.

Fig. 3a

[0041] [Fig. 3a] represents an example of a time series T.

Fig. 3b

[0042] [Fig. 3b] represents examples of sub-sequences T1, T2 and T3 extracted from the time series T of Fig. 3a.

Fig. 3c

[0043] [Fig. 3c] is a representation of a projection, along three determined dimensions, of the sub-sequences extracted from the time series T of Fig. 3a.
Fig. 3d

[0044] [Fig. 3d] represents the two-dimensional projection of the sub-sequences extracted from the time series T of Figure 3b, where the sub-sequences T1, T2 and T3 are indicated, and the creation of nodes.

Fig. 3e

[0045] [Fig. 3e] represents an example of some of the nodes and connections between nodes obtained from the projection of Fig. 3d.

Fig. 4

[0046] [Fig. 4] shows an example of the construction of sets of points of intersection between the projections of the extracted sub-sequences and radial scan vectors.

Fig. 5a

[0047] [Fig. 5a] represents, on the top graph, an example of a time series, and on the bottom graph a normality score associated with the sub-sequences forming the time series.

Fig. 5b

[0048] [Fig. 5b] represents an example of a network constructed from the time series of Fig. 5a.

Fig. 6

[0049] [Fig. 6] schematically shows a system provided with three sensors and a processing circuit for implementing the method for determining the state of health of the system.

Fig. 7

[0050] [Fig. 7] represents the main steps of a method for determining a state of health of a system according to one embodiment.

Description of the Embodiments

[0051] The drawings and the description below essentially contain elements of a certain nature. They can therefore not only serve to better understand this disclosure, but also contribute to its definition, if applicable.

With reference to Fig. 6, many systems SYS are equipped with sensors C making it possible to measure quantities indicative of their operation in the form of time series, which are sequences of time-stamped values. For example, on an industrial site, a pump is equipped with a flow sensor reporting the outlet speed of a fluid. In medicine, a patient can be fitted with an EKG machine to report cardiac activity (especially heart rate).
The system of interest can therefore be a technical system, such as for example an industrial installation (factory, electricity production installation, etc.), a connected object, a vehicle, a building, an electrical or electronic device, etc., equipped with one or more sensors capable of measuring at least one physical quantity representative of the state of the system of interest. The physical quantity is a value liable to change over time, and may be, for example but not limited to, a quantity relating to a temperature, a position, a speed, a frequency, a wavelength, a quantity of heat or thermal flux, a luminous flux, a current, a voltage, etc., as well as their time derivatives.

The system of interest can also be a living being, such as for example a human being or an animal, equipped with one or more sensors capable of measuring at least one physiological quantity of the system of interest. The physiological quantity is liable to change over time and may be, for example but not limited to, a pulse, a temperature, a heart rate, an oxygen level in the blood, a blood glucose value, etc.

These systems SYS can be equipped with processing circuits making it possible to store and process the measurements locally. With the emergence of so-called intelligent and communicating systems, it is also possible to transmit the acquired measurements to a remote processing circuit with a view to centralized processing.

The processing of the acquired measurements can make it possible to qualify the operation of the system considered.

For example, by considering as the system an industrial device that has to follow a preprogrammed temperature cycle, and as the associated sensor a temperature probe, an objective may be to detect, on the basis of the temperature measurements made by the sensor, whether the industrial device is functioning properly.
Ideally, this detection is implemented automatically and without prior knowledge of the preprogrammed temperature cycle or of any anomalies in relation to this cycle, in other words without supervision.

[0058] As another example, considering a person or an animal as the system and an electrocardiograph as the associated sensor, an objective may be to detect, on the basis of electrocardiograms, whether the electrical activity of the heart of the person or animal is normal. This detection is carried out automatically and without supervision, in particular without first providing examples of normal EKGs or of EKGs with abnormal characteristics.

Yet another example is that of connected objects, such as an intelligent factory where a sensor measures a pressure or a temperature in an installation, or a connected vehicle whose behavior can be monitored, for example, by analyzing vibration data measured by a sensor.

An example of such a processing circuit PROC, performing the method for processing measured data described below, is shown in Figure 6. The processing circuit shown comprises a processor CPU connected to a non-transitory recording medium MEM on which is recorded a program for the implementation of a method as described below when this program is executed by the processor CPU.

Reference is now made to Fig. 7, which illustrates the main steps of an embodiment of a method for determining a state of health of a system of interest equipped with a sensor. In one example, a representation of which is provided in Figs. 5a and 5b, the system of interest may be a steam generator of an electricity production plant, equipped, among other things, with a water level sensor.

By “determination of a state of health” is meant, for example, the determination of whether or not the system of interest is in a state of normal operation, or the determination of whether or not the system of interest is in a faulty state.
The method makes it possible to determine this state on the basis of an analysis of at least one series of measurements of one or more physical quantities of the system of interest, acquired by the sensor(s) with which it is equipped.

A time series T, formed of a sequence of measurements from the sensor as a function of time, is obtained OBT T (S1), the sequence being liable to include anomalies in these measurements. In the example indicated above, the time series obtained is a history of measurements taken by the level sensor, spaced by a regular time interval, each measurement corresponding to a value of the level of steam in the steam generator at the time of measurement. The size of the time series T, that is to say the total number of measurement points, is denoted |T|.

The time series thus obtained is then processed in order to determine, as the “state of health of the system of interest”, whether the steam generator exhibits normal or abnormal behavior over the period of interest considered, corresponding to the time series. The processing is therefore carried out without prior knowledge of the presence or absence of anomalies in the time series.

It should be noted that, of course, in various industrial applications, many systems of interest are equipped with a plurality of sensors and configured to obtain a time series from each sensor. For example, a centrifugal pump is equipped with at least two pressure sensors (suction and discharge) and a flow sensor, all of which are necessary to determine its performance and therefore to quantify the proper functioning of the equipment. Although the determination method makes it possible to process several time series, together or separately, this exemplary embodiment considers, for reasons of simplicity, the processing of a single time series originating from a single sensor in order to determine the state of health of a system of interest.
All the sub-sequences T_{i,ℓ} are extracted EXTR T_{i,ℓ} (S2) from the time series T, these sub-sequences possibly comprising anomalies, which makes the process unsupervised. The extracted sub-sequences T_{i,ℓ} are subsets of consecutive measurements within the time series. Each sub-sequence T_{i,ℓ} begins at index i, i.e. at the i-th measurement point of T, and contains ℓ consecutive points. Therefore, a given sub-sequence T_{i,ℓ} has the size ℓ, and a single point of T can be seen as a sub-sequence of size 1. For example, the point of the time series T having the index i can be alternatively denoted T_{i,1} or T_i. Here, all the extracted sub-sequences have the same size ℓ. In one embodiment, step S2 comprises the extraction of all the sub-sequences of size ℓ of the time series T.

With reference to Fig. 3a, an example of a time series T has been shown, from which three sub-sequences T1, T2 and T3 of the same size are extracted. These sub-sequences are represented in more detail in Fig. 3b, where it can be seen that the sub-sequences T1 and T2 have substantially the same shape, but with differences in values, while the sub-sequence T3 has a different shape from the first two. These sub-sequences are used as a non-limiting and purely illustrative example in the description which follows.

The size of the extracted sub-sequences can be determined by a user. In the example considered, each sub-sequence thus extracted can correspond to a fixed number of consecutive measurements, for example of the order of 10, 20, 50 or 100 measurements, within a time sequence of several hours or several days, with a measurement step for example of the order of a few seconds to a few minutes.

The method then comprises the construction CONST G_ℓ(N, ℰ) (step S3) of a network R representing the time series T from all the extracted sub-sequences. Thus, the construction of this model is carried out with the sub-sequences comprising the anomalies.
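The extraction step S2 described above amounts to a sliding window over the time series. The following minimal sketch assumes a plain Python list as input and 0-based indices (the text counts from the i-th measurement point); the function name is hypothetical.

```python
def extract_subsequences(series, length):
    """Return all sub-sequences made of `length` consecutive points of `series`.

    The sub-sequence at position i of the result starts at index i of the series.
    """
    if not 1 <= length <= len(series):
        raise ValueError("length must be between 1 and len(series)")
    return [list(series[i:i + length]) for i in range(len(series) - length + 1)]


# Hypothetical level measurements taken at a regular time step.
measurements = [1, 2, 3, 4, 5]
windows = extract_subsequences(measurements, 3)
```

No labels are involved at this stage: normal and anomalous portions of the series are extracted in exactly the same way.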
The following notations are used hereinafter. A node is defined as an abstract object identified by an integer, and a set of nodes is denoted N. A connection is a tuple (x_i, x_j) with x_i, x_j ∈ N, and the weight of a connection is denoted w(x_i, x_j). A set of connections is denoted ℰ. A graph is defined by the pair (N, ℰ) and denoted G(N, ℰ). A network is a graph G_ℓ(N, ℰ, x, y), where x and y are respectively values assigned to the nodes and to the connections. Finally, the degree of a node is the number of connections entering and leaving the node. It is denoted deg(N^(i)).

The network is constructed such that each node of the network represents a set of extracted sub-sequences, and each connection between two nodes represents the number of times an extracted sub-sequence corresponding to one of the nodes linked by the connection follows another corresponding to the other node, in the time series T.

To do this, the construction of the network includes the projection PRJ T_{i,ℓ} (S31) of each sub-sequence extracted from the time series T into a two-dimensional space.

In one embodiment, this projection firstly comprises the representation REP V_{i,ℓ} (S311) of each extracted sub-sequence by a vector, each term of which is a sum of a subset of components of the sub-sequence. Typically, each sub-sequence T_{i,ℓ} is represented by a vector defined as follows:

where λ is an integer parameter less than ℓ which can be set by a user. Advantageously, λ can be set between λ = 0.1 * ℓ and λ = 0.5 * ℓ. A convolution operation of size λ is therefore applied to each extracted sub-sequence to obtain the vector representing it. This representation makes it possible to remove noise and residual disturbances, while keeping the main evolutions of the sub-sequence.

Each extracted sub-sequence being represented by a vector of size ℓ - λ, the set of sub-sequences thus represented forms a matrix denoted Proj(T, ℓ, λ) ∈ M_{|T|, ℓ-λ}(ℝ), where M_{|T|, ℓ-λ}(ℝ) is the set of real matrices having |T| rows and ℓ - λ columns.

An operation of reduction of dimensions RED 3D (S312) of this first matrix Proj(T, ℓ, λ) is then carried out to arrive at a three-dimensional space, that is to say to obtain a second matrix still comprising |T| rows, each row corresponding to a vector representing an extracted sub-sequence, but comprising only three columns. This second matrix is denoted Proj_r(T, ℓ, λ).

The method then comprises the calculation ROT (S313) of a third matrix obtained by rotating this second matrix, such that a first dimension of this third matrix is collinear with the constant sub-sequences of the time series, and that the other two dimensions are orthogonal to it. From this third matrix, only these two other dimensions are retained as the two-dimensional space for the projection of the extracted sub-sequences. Indeed, projecting the sub-sequences onto these two dimensions makes it possible to keep, after projection, only information on the shape of the sub-sequences, which makes it possible to favor the detection of shape anomalies of the sub-sequences, as opposed to value anomalies, which are characterized by the mean value of the sub-sequences and are instead detectable along the first dimension.
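The vector representation of step S311 can be sketched as a sum over a sliding window of λ components. The sketch below follows the textual description (ℓ - λ terms, each the sum of λ consecutive components); the exact index bounds are an assumption, and the names `embed` and `projection_matrix` are hypothetical.

```python
def embed(subsequence, lam):
    """Represent a sub-sequence of size l by l - lam sums of lam consecutive components."""
    size = len(subsequence)
    if not 0 < lam < size:
        raise ValueError("lam must satisfy 0 < lam < len(subsequence)")
    # Assumption: the windows start at offsets 0 .. l - lam - 1.
    return [sum(subsequence[j:j + lam]) for j in range(size - lam)]


def projection_matrix(series, length, lam):
    """One embedded row per extracted sub-sequence of `series`."""
    count = len(series) - length + 1
    return [embed(series[i:i + length], lam) for i in range(count)]


row = embed([1, 2, 3, 4, 5, 6], 2)
matrix = projection_matrix([1, 2, 3, 4, 5, 6, 7], 6, 2)
```

Summing neighboring components acts as a simple smoothing filter, which is why isolated noisy values weigh less in the resulting vector than the overall evolution of the sub-sequence.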

A vector characterizing the component of constant subsequences is defined from a unit subsequence as follows:

where PCA₃ returns the three most important components of the Principal Component Analysis mentioned above, and min and max correspond respectively to the minimum and maximum values of the time series. From the unit vectors of the orthonormal basis resulting from the principal component analysis, the following angles are calculated:

And from these angles, the following rotation matrix is calculated:

where the rotation matrices are respectively associated with the angles Φ_x, Φ_y, Φ_z. The matrix resulting from this operation therefore has its first component aligned with the vector characterizing the constant subsequences. The underlying unit vectors are the vectors resulting from the previous rotation. The two-dimensional space which is kept for the projection of the subsequences in the following steps is therefore SProj(T, ℓ, λ) restricted to the two dimensions orthogonal to that vector.

Figure 3c shows the projection of the subsequences extracted from the time series T of Figure 3a in three dimensions, at the end of the dimension reduction step S312. Figure 3d shows the two-dimensional projection, keeping only the two dimensions orthogonal to the dimension of the constant subsequences. In this figure, the subsequences T1 and T2 are close in the two-dimensional projection space because they differ not in shape but in mean value, while the subsequence T3 is far from the first two due to its difference in shape.
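The effect of discarding the dimension of the constant subsequences can be illustrated with a small sketch. Note that it deliberately swaps in a different technique from the three rotation matrices of the description: it builds an orthonormal basis of the plane orthogonal to a reference direction by Gram-Schmidt orthogonalization. The plane obtained is the same, but the two in-plane axes may differ from those of the description by an in-plane rotation. The reference vector `v_ref` and all function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(a):
    norm = math.sqrt(dot(a, a))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in a]


def remove_component(a, b):
    """Return a minus its component along the unit vector b."""
    s = dot(a, b)
    return [x - s * y for x, y in zip(a, b)]


def plane_basis(v_ref, tol=1e-6):
    """Two orthonormal 3D vectors spanning the plane orthogonal to v_ref."""
    u = unit(v_ref)
    basis = []
    for seed in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
        w = remove_component(seed, u)
        for b in basis:
            w = remove_component(w, b)
        if math.sqrt(dot(w, w)) > tol:  # skip seeds already spanned
            basis.append(unit(w))
        if len(basis) == 2:
            break
    return basis


def project_2d(point, basis):
    """Coordinates of a 3D point in the plane spanned by the basis."""
    return [dot(point, b) for b in basis]


v_ref = [1.0, 1.0, 1.0]          # assumed direction of constant subsequences
basis = plane_basis(v_ref)
p1 = [0.2, -0.1, 0.4]            # some subsequence in the reduced 3D space
p2 = [x + 3.0 * v for x, v in zip(p1, v_ref)]  # same shape, shifted level
p3 = [0.9, -0.7, 0.1]            # a different shape
q1, q2, q3 = (project_2d(p, basis) for p in (p1, p2, p3))
```

In this toy setting `q1` and `q2` coincide up to rounding, mirroring the behavior of T1 and T2 in Figure 3d, while `q3` lands elsewhere.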

Once the subsequences are projected into the two-dimensional space, the construction of the network comprises the construction CONST N (S32) of a set of nodes N, where each node corresponds to a dense region of the two-dimensional projection space. The number of nodes is not fixed in advance.
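As a rough illustration of how dense regions can yield a variable number of nodes, the sketch below estimates a one-dimensional Gaussian kernel density over positions measured along a single scan direction and keeps each local maximum as a candidate node. The input positions are made up, the function names are hypothetical, and the bandwidth h = σ · n^(-1/5) (Scott's rule of thumb for one dimension) is an assumption of this sketch.

```python
import math


def scott_bandwidth(points):
    """Scott's rule of thumb in one dimension: sigma * n ** (-1/5)."""
    n = len(points)
    mean = sum(points) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in points) / n)
    return std * n ** (-1.0 / 5.0)


def gaussian_kde(points, h):
    """Return the Gaussian kernel density estimate as a function of x."""
    norm = 1.0 / (len(points) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points)

    return density


def density_peaks(points, grid_size=400):
    """Positions of the local maxima of the estimated density on a grid."""
    h = scott_bandwidth(points)
    f = gaussian_kde(points, h)
    lo, hi = min(points) - 3.0 * h, max(points) + 3.0 * h
    xs = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    ys = [f(x) for x in xs]
    return [
        xs[k]
        for k in range(1, grid_size - 1)
        if ys[k - 1] < ys[k] >= ys[k + 1]
    ]


# Two clusters of positions along one scan direction give two candidate nodes.
positions = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0, 5.1, 4.9, 5.05, 4.95]
peaks = density_peaks(positions)
```

With a different set of positions the number of peaks changes accordingly, which is the sense in which the number of nodes adapts to the data.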

For this, the two-dimensional space is scanned by a set of radial vectors, each defined by the angle ψ formed between the radial vector and the x-axis of the two-dimensional space. Consider all the segments [x_{i-1}, x_i] formed by two consecutive points of the projection of the series T in the two-dimensional space, P = SProj(T, ℓ, λ). The radius subset associated with a radial vector is the set of intersection points of these segments with that vector. Formally:

where × is the cross product. FIG. 4 shows two examples of radius subsets for two different values of the angle ψ. Once the radius subsets have been obtained, the nodes of the network are defined by estimating the density of each radius subset, then by assigning each local maximum to a node. Formally, denoting by Ψ the set of values of ψ, the set of nodes N is constructed as follows:

The function f_h is a kernel density estimation function applied to each radius subset, the function μ represents the average of a radius subset, that is to say the average position of the intersection points forming the radius subset, and the function σ represents the standard deviation of the positions of these intersection points. n is the number of points in the radius subset considered, which can be denoted |ℐ_ψ|. r is the number of radius subsets, which corresponds to the number of angles ψ in the set Ψ, and therefore to the number of radial scan vectors. r can be set by the user. It is preferably between 1 and 360, and more preferably between 20 and 100, to allow a precise scan of the space while limiting the number of nodes and therefore the computation time needed to build and use the network. For example, r can be equal to 50. Finally, h is a parameter called the bandwidth of the function f_h, which controls the degree of smoothing of the density estimate. An optimal value of h can be set according to the publication by D. W. Scott, "Multivariate Density Estimation: Theory, Practice, and Visualization", Wiley, 1992.

Returning to Figure 3d, we can observe the nodes of the network constructed from the local maxima identified for the two radius subsets represented (ψ and ψ + 1), which are the same as those in Figure 4. The number of nodes therefore adapts to the number of dense areas in the two-dimensional space, and therefore to the dynamics of the time series.

Once all the nodes of the network have been obtained, the method comprises the extraction EXT CNX (S33) of the connections between the nodes as well as the weights associated with each connection. For this, the set SProj(T, ℓ, λ) of the two-dimensional projections of the extracted subsequences is traversed, and each subsequence contained in this set is associated with one of the nodes of the network. Denoting by N the set of network nodes and ℰ the set of network connections, the latter is constructed as follows:

In other words, the function S finds the node closest to each point of P, where a point of P is the two-dimensional projection of a subsequence extracted from the time series, and d is the geometric distance. The network associated with the time series T is denoted G_ℓ(N, ℰ), the index ℓ coming from the fact that it is built from the set of projected subsequences of size ℓ. Then, a connection is created between two nodes each time a subsequence of the set P corresponding to one node is followed by a subsequence corresponding to another node. The number of times this transition occurs is the weight of the connection. Figure 3e shows an example of connections between the nodes previously identified in Figure 3d.

Once the network has been obtained, a normality score is assigned SCOR Tj,ℓ (S4) to at least one subsequence T_{j,ℓq}, where ℓq ≥ ℓ, that is to say that the subsequence to which a score is assigned may have a size greater than or equal to that of the subsequences used for the construction of the network. Indeed, the normality score is defined as a function of the path that must be taken in the network to obtain the subsequence, or in other words, as a function of the set of nodes and of connections between the nodes forming the subsequence. In one embodiment, a normality score is assigned to several subsequences, for example to all the subsequences of the time series whose size is greater than or equal to ℓ.

Series2Path denotes the function which associates with a subsequence of the time series T the succession of nodes of the network corresponding to this subsequence.
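The nearest-node assignment, the counting of transitions as connection weights, and the mapping of a run of projected points to a path of nodes can be sketched together as follows. The function names are hypothetical, nodes are given as 2D coordinates, and skipping transitions from a node to itself is an assumption based on the wording "followed by a subsequence corresponding to another node".

```python
import math
from collections import Counter


def nearest_node(point, nodes):
    """Index of the node closest to the point (Euclidean distance)."""
    return min(range(len(nodes)), key=lambda k: math.dist(point, nodes[k]))


def series_to_path(points, nodes):
    """Successive node indices visited by a run of projected points."""
    return [nearest_node(p, nodes) for p in points]


def connection_weights(path):
    """Count how often each ordered pair of distinct nodes follows one another."""
    return Counter((a, b) for a, b in zip(path, path[1:]) if a != b)


nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
points = [(0.1, 0.0), (0.9, 0.1), (0.1, 0.9),
          (0.0, 0.1), (1.1, 0.0), (0.0, 1.2)]
path = series_to_path(points, nodes)
weights = connection_weights(path)
```

Here `path` is the node sequence for the whole toy series; applying `series_to_path` to a slice of `points` gives the path of the corresponding subsequence.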

The score of a subsequence of the time series is determined as a function of the weights of the connections traversed to form the subsequence, and of the degrees of the nodes traversed. Regarding connection weights, the higher the weight of a connection, the more often the corresponding transition takes place in the time series. Moreover, the degree of a node provides information on the centrality of the node in the network: the more central the node, the higher the score. In one embodiment, the normality score Norm is defined as follows:

where w is the weight of a connection and deg is the degree of a node.
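The dependence of the score on connection weights and node degrees can be illustrated with a small sketch. The particular combination used here (connection weight multiplied by the degree of the departure node minus one, averaged over the path) is an assumption made for illustration, as are the function names and the toy network.

```python
def node_degrees(weights):
    """Number of connections entering or leaving each node."""
    deg = {}
    for a, b in weights:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def normality(path, weights, deg):
    """Average of weight * (degree - 1) over the connections of a path.

    Assumes the path has no consecutive repeated nodes; a connection
    absent from the network contributes zero.
    """
    steps = list(zip(path, path[1:]))
    if not steps:
        raise ValueError("a path needs at least two nodes")
    total = sum(weights.get((a, b), 0) * (deg.get(a, 0) - 1) for a, b in steps)
    return total / len(steps)


# Toy network: 0 <-> 1 is a frequent loop, the detour through 2 is rare.
weights = {(0, 1): 10, (1, 0): 9, (1, 2): 1, (2, 0): 1}
deg = node_degrees(weights)
frequent = normality([0, 1, 0, 1], weights, deg)
rare = normality([0, 1, 2, 0], weights, deg)
```

In this toy network `frequent` is larger than `rare`, because the second path uses lightly weighted connections through a less connected node.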

[0086] Alternatively, the normality score can be inverted to become an anomaly score.

Once the scores are calculated, it is possible to identify ID Tk, ℓ (S5) at least one abnormal subsequence in the time series used to construct the graph, indicating an anomaly in the functioning of the system of interest. For example, a subsequence with a particularly low normality score (or respectively a high abnormality score) may be considered abnormal.
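One simple way to turn scores into identified subsequences, sketched under stated assumptions, is to keep the k positions with the lowest normality score while skipping positions that overlap an already selected one. The number k, the exclusion distance and the function name are all assumptions of this sketch; the description only requires that particularly low normality scores be treated as abnormal.

```python
def lowest_scoring_positions(scores, k, min_gap):
    """Start positions of the k lowest scores, at least min_gap apart."""
    selected = []
    for i in sorted(range(len(scores)), key=lambda idx: scores[idx]):
        if all(abs(i - j) >= min_gap for j in selected):
            selected.append(i)
        if len(selected) == k:
            break
    return selected


# One normality score per subsequence start position (made-up values).
scores = [5.0, 5.0, 1.0, 1.2, 5.0, 5.0, 0.5, 5.0]
abnormal = lowest_scoring_positions(scores, k=2, min_gap=2)
```

Position 3 also has a low score, but it overlaps position 2 and is therefore reported as part of the same anomaly rather than as a separate one.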

Having the abnormal subsequences gives the times and the different types of anomalies detected on the sensor of the system of interest, which makes it possible to determine DET SoH (S6) a state of health of the system of interest. For example, it is possible to determine a cause of the abnormal subsequences, such as degradation, wear of a component, an unexpected event, etc. Then, depending on the analysis of the causes of the anomalies, corrective, repair or prediction actions on the operation of the system can be implemented. For example, an alert can be generated to draw the attention of an operator of the system of interest (or, if the system of interest is a person, the person themselves or a doctor or caregiver, or if the system of interest is an animal, the owner of the animal or a veterinarian) to the need to intervene to remedy this anomaly. As a variant, an intervention, repair or maintenance operation can be planned or rescheduled, for example if a maintenance operation was planned but must be brought forward. As a variant, additional processing can be implemented to identify or diagnose the nature or the cause of the failure of the system of interest. It is also possible to use this information to enrich feedback on the operation of the system, for example by updating a database relating to the operation of the system.

Figure 5a shows, in the top graph, an example of a time series representing a measurement of the water level in a steam generator of a power plant, and Figure 5b shows a graph obtained from this time series by applying the above method. In this representation, the width of the connections between the nodes is proportional to their weight. The arrow TN indicates an example of a recurrent transition, and the arrow TA an example of a rare or abnormal transition. Returning to Figure 5a, the bottom graph represents an anomaly score calculated for the subsequences forming the time series used to construct the graph of Figure 5b, which makes it possible to quickly identify the abnormal subsequences.

Claims

[Claim 1] Method for determining a state of health of a system of interest equipped with a sensor, the method being implemented by a processing circuit comprising a processor and a memory, and comprising:
- an obtaining OBT T (S1) of a time series (T) which is a series of measurements acquired by the sensor as a function of time and in which an operating anomaly of the system of interest is sought,
- an extraction EXTR Tj,ℓ (S2) of all the subsequences of size ℓ of the time series T, each extracted subsequence being formed by a number ℓ of consecutive measurements in time of said series of measurements,
- a construction CONST G_ℓ(N, ℰ) (S3) of a network representing the time series, the network comprising a set of nodes and weighted connections between nodes, where each node represents a set of extracted subsequences, and each weighted connection between two nodes represents the number of times a subsequence represented by one of the nodes is followed by a subsequence represented by the other node,
- an assignment SCOR Tj,ℓ (S4) of a normality score to subsequences Tj,ℓ of the time series T having a size greater than or equal to the size ℓ of the extracted subsequences, from a set of connected nodes of the network G_ℓ(N, ℰ) representing the subsequence Tj,ℓ,
- an identification ID Tk,ℓ (S5) of at least one abnormal subsequence, indicating an anomaly in the functioning of the system of interest, on the basis of the assigned normality scores, and
- on the basis of said at least one identified abnormal subsequence, a determination DET SoH (S6) of the state of health of the system of interest.
[Claim 2] Determination method according to the preceding claim, wherein the construction CONST G_ℓ(N, ℰ) of the network comprises a projection PRJ Tj,ℓ (S31) of each extracted subsequence into a vector of a two-dimensional space, and a construction CONST N (S32) of a set of nodes of the network, where each node corresponds to a dense area of the two-dimensional space.

[Claim 3] Determination method according to the preceding claim, wherein the projection PRJ Tj,ℓ (S31) of each extracted subsequence into a vector of a two-dimensional space comprises:
- for each extracted subsequence, a construction REP Vj,ℓ (S311) of a vector representing the subsequence, each term of the vector being a sum of a subset of components of the subsequence,
- a reduction of dimensions RED 3D (S312) of a first matrix formed by the set of vectors representing the extracted subsequences, to obtain a second matrix where each vector representing an extracted subsequence has only three dimensions,
- a calculation ROT (S313) of a third matrix obtained by rotating the second matrix, where a first dimension of the third matrix is defined such that the subsequences extending along this dimension are constant subsequences, and the other two dimensions are orthogonal to the first, and
- the definition of the two dimensions of the projection space of the subsequences as the other two dimensions of the third matrix.

[Claim 4] The method of claim 3, wherein each vector representing an extracted subsequence comprises a number ℓ-λ of terms where each term is a sum of λ components of the subsequence, and λ is between 0.1 * ℓ and 0.5 * ℓ.

[Claim 5] The method of claim 3 or 4, wherein the vector representing a subsequence is defined by:

where T_k is the k-th component of the subsequence and λ is an integer of determined value.

[Claim 6] The method according to one of claims 3 to 5, wherein the reduction of dimensions of the first matrix is carried out by Principal Component Analysis.

[Claim 7] The method according to one of claims 3 to 6, wherein the construction CONST N (S32) of the nodes of the network comprises the implementation of a circular scan of the two-dimensional space by a set of radial scan vectors of different angular positions, and for each radial scan vector:
- the identification of the set of intersection points of the projections of the extracted subsequences with the radial scan vector, and
- the construction of a network node as a local maximum of the density of the identified intersection points.

[Claim 8] The method of the preceding claim, wherein the construction CONST N (S32) of the nodes of the network comprises calculating a kernel density estimation function applied to each set of intersection points corresponding to a position of the radial scan vector in the space, the estimation function being defined by:

where ℐ_ψ is the set of intersection points of the projections of the time series with a radial scan vector forming an angle ψ with respect to the x-axis of the two-dimensional space, r is the number of radial scan vectors, h is a bandwidth parameter, μ is an average, σ is a standard deviation and n is the number of points contained in ℐ_ψ.

[Claim 9] The method according to one of claims 2 to 8, wherein each extracted subsequence is associated with the node closest to its projection in two dimensions, and the construction of the network further comprises the construction EXT CNX (S33) of a connection between two nodes corresponding respectively to extracted subsequences which follow one another.

[Claim 10] Method according to one of the preceding claims, in which the assignment of a normality score to a subsequence comprises:
- the identification of a path formed by a consecutive series of nodes of the network forming said subsequence, and
- the calculation of the normality score as a function of the weights of the connections of the network forming the path, and of the number of connections associated with each node of the network included in the path.

[Claim 11] Method according to one of the preceding claims, in which the system of interest is equipped with a plurality of sensors and configured to obtain a time series from each sensor.

[Claim 12] Method according to one of the preceding claims, further comprising, from the state of health determined for the system of interest, the implementation of at least one action from the group comprising:
- generation of an alert,
- identification of a failure of the system of interest,
- planning of a maintenance or repair operation of the system of interest.

[Claim 13] Method according to one of the preceding claims, in which the system of interest is a technical system, and the sensor is able to measure a physical quantity of the technical system, or the system of interest is a person or an animal, and the sensor is able to measure at least one physiological quantity of the person or of the animal.

[Claim 14] Method according to one of the preceding claims, in which the system is a centrifugal pump equipped with two suction and discharge pressure sensors, and a flow sensor.

[Claim 15] Method according to one of the preceding claims, in which the system is a steam generator and each measurement corresponds to a value of the level of steam in the steam generator.

[Claim 16] Method according to one of the preceding claims, wherein the system is a person or an animal, the associated sensor is an electrocardiograph, and the method makes it possible to detect, on the basis of electrocardiograms, whether the electrical activity of the heart of the person or animal is normal.

[Claim 17] Method according to one of the preceding claims, in which the system is a connected object, such as a smart factory where a sensor makes it possible to measure a pressure or a temperature in an installation, or else a connected vehicle whose behavior is monitored by the analysis of vibration data measured by a sensor.

[Claim 18] Computer program comprising instructions for implementing the method according to one of claims 1 to 17 when this program is executed by a processor (CPU).

[Claim 19] Non-transient recording medium readable by a computer on which is recorded a program for the implementation of the method according to one of claims 1 to 17 when this program is executed by a processor (CPU).

[Claim 20] A processing circuit comprising a processor connected to a non-transient recording medium according to claim 19.