WO2024078746A1

WO2024078746A1 - A computer-implemented method for providing one or more missing values of a feature in a graph and a corresponding system

Info

Publication number: WO2024078746A1
Application number: PCT/EP2023/054218
Authority: WO
Inventors: Federico ERRICA; Timo SZTYLER
Original assignee: NEC Laboratories Europe GmbH
Priority date: 2022-10-14
Filing date: 2023-02-20
Publication date: 2024-04-18

Abstract

A computer-implemented method for providing one or more missing values of a feature in a graph is provided, wherein the graph results from or is extracted from a node of a network of nodes, comprising the following steps: recording or collecting data from at least one node of the network or from the network; transforming at least a part of the data into a graph; extracting at least one representation or node posterior representation from the graph using a definable deep architecture of Graph Convolutional, GC, layers; constructing a distribution over at least one missing feature of the graph or node using the at least one representation or node posterior representation; and extracting a value or a vector for the at least one missing feature from the distribution by means of an imputation method. Further, a corresponding system is provided.

Description

A COMPUTER-IMPLEMENTED METHOD FOR PROVIDING ONE OR MORE MISSING VALUES OF A FEATURE IN A GRAPH AND A CORRESPONDING SYSTEM

The present invention relates to a computer-implemented method for providing one or more missing values of a feature in a graph, wherein the graph results from or is extracted from a node of a network of nodes.

Further, the present invention relates to a corresponding system for providing one or more missing values of a feature in a graph, wherein the graph results from or is extracted from a node of a network of nodes.

Corresponding prior art documents are listed as follows:

[1] Pigott, Therese D. "A review of methods for missing data." Educational Research and Evaluation 7.4 (2001): 353-383.

[2] Rossi, Emanuele, et al. "On the unreasonable effectiveness of feature propagation in learning on graphs with missing node features." arXiv preprint arXiv:2111.12128 (2021).

[3] Malone, Brandon, Alberto Garcia-Duran, and Mathias Niepert. "Learning representations of missing data for predicting patient outcomes." AAAI Workshop 2019.

[4] Taguchi, Hibiki, Xin Liu, and Tsuyoshi Murata. "Graph convolutional networks for graphs containing missing features." Future Generation Computer Systems 117 (2021): 155-168.

[5] Chen, Xu, et al. "Learning on attribute-missing graphs." IEEE transactions on pattern analysis and machine intelligence (2020).

[6] You, Jiaxuan, et al. "Handling missing data with graph representation learning." Advances in Neural Information Processing Systems 33 (2020): 19075-19087.

[7] Gordon, David, et al. "TSI-GNN: Extending Graph Neural Networks to Handle Missing Data in Temporal Settings." Frontiers in Big Data (2021): 78.

Further prior art documents:

US 2020/0342362 A1 discloses systems, apparatus, instructions, and methods for medical machine time-series event data generation.

US 2010/0057651 A1 discloses knowledge-based interpretable predictive modeling comprising outputting of a representation of the model.

US 2019/0287680 A1 discloses methods and systems for generating and using graph models to perform entity-specific mappings to investigatory events.

EP 3 888 103 A1 discloses a method that can comprise predicting a target value by means of a neural network.

US 2020/0176121 A1 discloses methods and systems for imputing values for missing sensor readings.

US 2022/0157466 A1 refers to provision of missing values by means of a probabilistic graphical model.

US 8 751 273 B2 discloses model generation using a variety of tools and features of a model generation platform.

TSI-GNN: “Extending Graph Neural Networks to Handle Missing Data in Temporal Settings”, David Gordon et al., Frontiers in Big Data, www.frontiersin.org, 1 September 2021, Volume 4, Article 693869, presents an approach for imputing missing data into bipartite graphs.

Further, “Essentials to Understand Probabilistic Graphical Models: A Tutorial about Inference and Learning”, Christine Sinoquet, https://doi.org/10.1093/acprof:oso/9780198709022.003.0002, Pages 30-82, Published: September 2014, informs about probabilistic graphical models as well as about parameter and structure learning.

Due to external factors that are typically out of our control, real-world data collection processes do not always operate as expected and produce observations that are incomplete. For instance, a sensor monitoring soil parameters may stop recording the temperature due to a hardware failure that is caused by adverse weather conditions. Similarly, patients never undergo all possible medical exams each time they are admitted to the hospital. Whenever data analytics and predictive systems have to be used to extract value out of the data, imputation is used to fill in the missing information [1]. Imputation methods are typically based on the underlying statistics of the missing information and treat each observation as independent of the others; Mean Value Imputation, MVI, Non-Negative Matrix Factorization, NMF, and Maximum Likelihood Estimation, MLE, approaches such as Gaussian Mixture Models, GMMs, trained with Expectation-Maximization, EM, are the de facto choices for data that can be represented as a vector.

However, when an entity with partially missing information belongs to a network, e.g., a community, we can exploit the structure of said network and assume that the values - observed and missing - of connected entities depend on each other. To give some examples, temperature sensors that are geographically close to each other may share similar measurements, and “friends” in a social network usually share one or more common interests. Whenever an imputation technique considers such observations as independent, it may fail to precisely reconstruct the missing information, and it typically relies on frequentist statistics that do not convey much information.

In the recent literature, techniques based on graph representation learning have been proposed to address the phenomenon of missing data from different points of view. In [2], a simple aggregation mechanism based on the heat-diffusion equation of the graph fills missing data information by propagating the features of nearby nodes. While this mechanism is shown to be effective on downstream tasks, nothing is said in terms of the quality of reconstruction; but most importantly, this method has no way of telling when two or more values are equally plausible when reconstructing the missing feature.

Similarly to [2], the Embedding Propagation, EP, [3] architecture generates unsupervised node embeddings using a contrastive loss objective that encourages adjacent nodes to have similar representations. However, EP deals with the problem of missing data by learning an embedding that encodes “missingness” for each data modality, rather than performing reconstruction of the missing features; this hampers the efficacy of explainable machine learning predictors built upon the reconstructed features, and does not provide the user with potential alternatives about plausible reconstructed values to consider.

In [4], a GMM is used to impute values of missing features and the resulting graph is processed by a Deep Graph Network to solve a downstream task. The architecture can be trained in an end-to-end fashion, and uncertainty can be assessed through the GMM. However, there are two main problems: 1) the GMM does not exploit the structural information and 2) the loss objective does not focus on the reconstruction of the features, rather it tunes the GMM to fill information in the missing gaps in such a way that the loss value decreases. Therefore, accurate reconstruction of the missing information is not part of the objective of this work.

The work presented in [5], instead, explicitly focuses on the reconstruction of both node features and links of the input graph. It achieves the goal through a variational objective that forces node and link representations to be mutually informative for the reconstruction. This method exhibits some limitations, in particular 1) the node cannot have partially missing attributes: either it is completely missing or it is fully observable, which limits the applicability to real-world scenarios; 2) it is not inductive, meaning that the method can only be trained on a single graph and cannot transfer to different structures; 3) it does not provide an uncertainty measure of the reconstructed values.

The work of [6] tackles missing data in tabular datasets by predicting links in a bipartite graph of nodes and features using graph representation learning methods. Therefore, it does not exploit an available graph structure to reconstruct missing values, i.e., the input data itself has no graph structure.

Finally, the TSI-GNN model of [7] extends the setting of [6] to the temporal scenario, but the data remains flat rather than graph-structured.

It is an object of the present invention to improve and further develop a computer-implemented method for providing a missing value of a feature in a graph and a corresponding system for providing a missing value of a feature in a graph, such that a missing value is provided efficiently and by simple means.

In accordance with the invention, the aforementioned object is accomplished by a computer-implemented method for providing one or more missing values of a feature in a graph, wherein the graph results from or is extracted from a node of a network of nodes, comprising the following steps: recording or collecting data from at least one node of the network or from the network; transforming at least a part of the data into a graph; extracting at least one representation or node posterior representation from the graph using a definable deep architecture of Graph Convolutional, GC, layers; constructing a distribution over at least one missing feature of the graph or node using the at least one representation or node posterior representation; and extracting a value or a vector for the at least one missing feature from the distribution by means of an imputation method.

Further, the aforementioned object is accomplished by a system for providing one or more missing values of a feature in a graph, preferably for carrying out the above computer-implemented method for providing a missing value of a feature in a graph, wherein the graph results from or is extracted from a node of a network of nodes, comprising: recording or collecting means for recording or collecting data from at least one node of the network or from the network; transforming means for transforming at least a part of the data into a graph; extracting means for extracting at least one representation or node posterior representation from the graph using a definable deep architecture of Graph Convolutional, GC, layers; constructing means for constructing a distribution over at least one missing feature of the graph or node using the at least one representation or node posterior representation; and extracting means for extracting a value or a vector for the at least one missing feature from the distribution by means of an imputation method.

According to the invention it has been recognized that it is possible to provide a very efficient computer-implemented method by simply using a definable deep architecture of GC layers and constructing a distribution over at least one missing feature of the graph or node using the at least one representation or node posterior representation. On this basis, a value or a vector for the at least one missing feature can be extracted from the distribution by means of an imputation method.

Thus, on the basis of the invention an efficient method and system for provision of a missing value by simple means are provided.

According to an embodiment of the invention the recorded or collected data can additionally be used during the constructing step or graph building step. This enhances effectiveness of the method. The recorded or collected data can be vital parameters of patients or data from soil sensors, particularly for determining where and when it is necessary to irrigate the soil in order to keep humidity and temperature at optimal levels, for example. Such sensors can be humidity or temperature sensors. Within embodiments of the invention the data can be collected by means of a sensor or a sensor network.

Within a further embodiment at least one learned emission parameter can additionally be used during the reconstruction step for further enhancing effectiveness and accuracy of the method. According to a further embodiment the distribution can be a multimodal distribution. Thus, a corresponding system can output multimodal distributions of reconstructed information, rather than single scalars.

Within a further embodiment the method can comprise modeling of an uncertainty or of an uncertainty estimate over at least one plausible missing value for the at least one missing feature. This enhances effectiveness and accuracy of the method.

According to a further embodiment of the invention a graph of at least partially available or completed node features and/or unsupervised node embeddings can be used in subsequent downstream tasks or be subjected to data analytics tools. As a result a very versatile and flexible method is provided.

Within a further embodiment the method can further comprise conditioning a mixture model at each GC layer on a set of states of nodes in the neighborhood of the at least one node and learning appropriate neighborhood aggregation functions. As a result a very effective method and system can be provided.

According to a further embodiment the method can further comprise providing an end-to-end architecture made of Graph Convolutional layers, which behave as mixture models conditioned on a graph structure, for simultaneously generating unsupervised node embeddings dependent on the graph structure and using them to impute missing values or features. These features further contribute in providing an effective and accurate method for providing a missing value of a feature in a graph.

Within a further embodiment the vector for the at least one missing feature can contain values for the recorded or collected data. Thus, the structure of the vector can be very simple contributing to effectiveness and simplicity of the method.

According to a further embodiment, for performing the method a Node Posterior Module comprising the deep architecture of Graph Convolution, GC, layers, a Feature Distribution Generation Module for computing the generative distribution of a feature f of a node u for all features and nodes in the graph and an imputation strategy or conditional mean imputation strategy can be used. Such modules and/or imputation strategy contribute in providing a very versatile simple structure of the system enhancing effectiveness of a method performed with these modules and/or imputation strategy.

Within a further embodiment Bayes’ theorem can be applied to the observable node features or only to the observable node features for computing the posterior distribution encoded as a vector of dimension C. This feature enhances effectiveness and simplicity of the disclosed method.

According to a further embodiment the architecture is trained to maximize the log-likelihood of at least one node feature. As a result, effectiveness and simplicity of the method can be enhanced.

Within a further embodiment, during the method a different mixture model for each node and consequently a different multimodal distribution for each feature value can be defined. This enhances versatility and effectiveness of the method.

According to a further embodiment the method can comprise a reconstructing of plausible missing values in a graph or graphs and/or a discovering of anomalies in the graph, using probabilistic graph learning or a probabilistic graph Artificial Intelligence, AI, system. These features can also contribute to effectiveness and simplicity of the disclosed method and system.

Advantages and aspects of embodiments of the present invention are summarized as follows:

Embodiments can comprise the following steps:

A) By conditioning the mixture model at each GC layer on a set of neighboring states and learning the most appropriate neighborhood aggregation functions, a graph-aware multimodal distribution for each missing feature is produced.

B) By proposing an end-to-end architecture made of Graph Convolutional layers, which behave as mixture models conditioned on the graph structure, a user can simultaneously: a. Generate unsupervised node embeddings dependent on the structure; b. Use them to impute missing features

Embodiments can provide a method for reconstruction of plausible missing values in a graph using Probabilistic Graph Learning.

Embodiments can require information from the user about the family of distribution to use for each feature (see section 2.2 in the following description of figures).

Embodiments can comprise the following method steps (see also Fig. 1):

1) Record or collect information, e.g., from a sensor network, and transform it into a graph with potential missing information (see also the first block of Fig. 1 and section 2.1 in the following description of figures).

2) Extract node posterior representations from the graph using the proposed deep architecture of GC layers (see the above step B).

3) Reconstruct distribution over nodes’ missing features using the learned emission parameters and the node posterior representations (see the above step A).

4) Choose an imputation method and produce a value or a vector for the reconstructed missing feature (see Section 2.4 in the following description of figures).

Embodiments propose an efficient reconstruction of plausible missing values in a graph or graphs using probabilistic graph learning.

Further embodiments propose a probabilistic graph AI system able to reconstruct at least partially missing observations or values in a network and detect local abnormalities. A corresponding system can output multimodal distributions of reconstructed information, rather than single scalars, thus modeling an uncertainty over plausible missing observations or values.

Nodes of embodiments can be connected by wire or radio communication.

Further advantages and aspects of embodiments of the present invention are summarized as follows:

There is no prior art showing embodiments capable of simultaneously:

• Process graph structures in a fully probabilistic and end-to-end fashion

• Model the distribution of node features conditioned on the graph

• Reconstruct missing features by capturing a potential multimodal distribution

• Discover anomalies in the graph

Embodiments of this invention provide an all-in-one method to reason probabilistically over graphs while keeping the benefits of end-to-end training with backpropagation, which has always been the main argument in favor of neural networks in the past. For instance, embodiments of the method can be appealing because they produce unsupervised node embeddings that take advantage of huge amounts of unlabeled data, in contrast to fully-supervised methods that cannot do so. This makes it possible to improve performance when supervised labels are very scarce, a common problem for many companies. Also, due to its probabilistic nature, the invention naturally deals with and reconstructs missing features, so it can be used to discover anomalies even when not all information is available. Embodiments of the invention are useful in situations wherein at least one feature of a graph is missing and a reconstruction of missing node features or of partially missing node features has to be provided.

Further embodiments exploit the structural information or graph structure to guide the reconstruction.

Contrary to prior art methods, embodiments of the present invention simultaneously focus on 1) reconstruction of missing node features or partially missing node features; 2) modeling the uncertainty over the plausible values for a missing feature; 3) exploiting the structural information to guide the reconstruction.

Embodiments of the disclosed method comprise a graphical model operating on networks/graph-structured data. Embodiments of the disclosed method generally collect information and organize it as a network. However, these embodiments are completely domain-agnostic, and the data collection is just a necessary building block.

Within embodiments of the disclosed method the graph building is a step to pre-process the data.

Generally, embodiments of the method and/or system can provide a distribution over plausible missing values of a feature.

Embodiments of the method and/or system can be part of a medical device, for example.

Prior art documents do not predict missing information of an entity conditioned on neighboring nodes. In these methodologies, there is no notion of graph neighbors because they do not operate on graphs but on vectorial data. The neighbors they mention relate to those in the graphical models, not in the input graph.

There are several ways how to design and further develop the teaching of the present invention in an advantageous way. To this end it is to be referred to the following explanation of examples of embodiments of the invention, illustrated by the drawing. In the drawing

Fig. 1 shows in a block diagram components of an embodiment of the system of the invention for node feature reconstruction when applied to Embodiment 2,

Fig. 2 shows in a diagram an embodiment of a Graph Convolution Layer,

Fig. 3 shows a schema of distribution generation for a continuous missing feature f of node u of an embodiment of the proposed invention and

Fig. 4 shows in a diagram nodes with varying portions of missing features.

1. Introduction: Embodiments of the invention can consist of three components that determine how the missing values are reconstructed. These are the “Node Posterior Module”, the “Feature Distribution Generation” module and the “Imputation Phase” module for performing an imputation strategy, as shown in Fig. 1. In the following, each element of the above pipeline is described in detail; for ease of understanding, the pipeline is instantiated for the specific case of Embodiment 2, which is described later.

2.1 Input and problem definition: There can be given a collection of graphs $\{g_1, g_2, \ldots, g_N\}$ extracted from soil sensors, which are considered to be connected if their distance falls below a certain range. Each graph is a tuple $(V_g, E_g, \mathcal{X}_g)$, where $V_g$ is a set of nodes (here: sensors), $E_g$ is a set of directed edges $(u,v)$ connecting pairs of nodes, $\mathcal{X}_g$ is a domain of node features and $\mathcal{A}_g$ is a domain of edge features. We denote by $x_u$ the feature vector of node u (e.g. containing measurements of the soil's temperature, humidity etc.). As depicted in Fig. 1, some node features, i.e., positions in the vector $x_u$, can be missing (bold black X) due to sensors’ malfunctioning, so there is no access to that information.
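The graph construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_sensor_graph`, the use of `None` to mark a missing feature, and the example coordinates and readings are all assumptions introduced here.

```python
import math


def build_sensor_graph(positions, features, max_distance):
    """Connect sensors whose Euclidean distance falls below max_distance.

    positions: dict mapping node id -> (x, y) coordinates (hypothetical layout)
    features:  dict mapping node id -> list of readings, None = missing entry
    Returns (nodes, edges, features); each connection is stored in both
    directions, so edges is a list of directed pairs (u, v).
    """
    nodes = sorted(positions)
    edges = []
    for u in nodes:
        for v in nodes:
            if u != v and math.dist(positions[u], positions[v]) < max_distance:
                edges.append((u, v))
    return nodes, edges, features


# Three hypothetical soil sensors; s2 lost its temperature, s3 its humidity.
positions = {"s1": (0.0, 0.0), "s2": (1.0, 0.0), "s3": (5.0, 5.0)}
features = {"s1": [21.5, 0.30], "s2": [None, 0.32], "s3": [19.0, None]}
nodes, edges, x = build_sensor_graph(positions, features, max_distance=2.0)
print(edges)
```

With the threshold chosen here, only s1 and s2 are connected, so s3 has no neighbors to draw information from; the choice of range therefore directly affects what can be reconstructed.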

2.2 Node Posterior module: The analysis is now restricted to a single graph in the collection, noting that embodiments of the invention handle graphs of different sizes and shapes. The first component to process the input graph is the Node Posterior module, a deep architecture of L layers where each layer is a Graph Convolution, GC.

To understand how the overall architecture works, Fig. 2 shows a schematized representation of each intermediate layer $\ell$. Here, $X_u$ represents the random variable whose realization is the feature vector $x_u$ of node u, which may include some missing information. Instead, $Q_u^\ell$ is a categorical random variable that denotes the hidden state - out of C possible states - encoding the structural information around node u. In practice, this layer has the same graphical structure as a conditional mixture model, where $Q_u^\ell$ determines the weight of each mixture component and the emission parameters $\theta^\ell$ specify the parameters of the generative distributions $P_{\theta^\ell}(X_u \mid Q_u^\ell = i)$ for each feature and each mixture component. Importantly, the choice of the family of distributions to use depends on the features and has to be made by a user. The latent variable value also depends on the state of the neighboring latent variables computed at the previous layer, i.e., on $P(Q_u^\ell = i \mid q_{\mathcal{N}_u}^{\ell-1})$, where $q_{\mathcal{N}_u}^{\ell-1}$ denotes the states of the neighbors of node u at layer $\ell-1$. By applying Bayes’ theorem only to the observable node features, there will be computed the posterior distribution encoded as a vector of dimension C. Each entry i of the vector is computed as follows:
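A hedged reconstruction of this posterior, obtained by applying Bayes' theorem to the quantities named in the text. The symbols are assumptions where the source is illegible: $x_u^{\mathrm{obs}}$ stands for the observed entries of the feature vector of node u, $\theta^\ell$ for the emission parameters of layer $\ell$, and $q_{\mathcal{N}_u}^{\ell-1}$ for the states of the neighbors of u at the previous layer.

$$
P\big(Q_u^\ell = i \mid x_u^{\mathrm{obs}}, q_{\mathcal{N}_u}^{\ell-1}\big)
= \frac{P_{\theta^\ell}\big(x_u^{\mathrm{obs}} \mid Q_u^\ell = i\big)\, P\big(Q_u^\ell = i \mid q_{\mathcal{N}_u}^{\ell-1}\big)}
       {\sum_{j=1}^{C} P_{\theta^\ell}\big(x_u^{\mathrm{obs}} \mid Q_u^\ell = j\big)\, P\big(Q_u^\ell = j \mid q_{\mathcal{N}_u}^{\ell-1}\big)}
$$

The denominator only normalizes the C entries so that they sum to one.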

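The simplest way to compute the neighborhood aggregation, i.e., the probability that the hidden state of node u takes the value i given the states of the neighboring nodes at the previous layer, is as follows: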
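The source formula is not legible here. One common choice, shown purely as an assumption and not confirmed by the source, is to average the neighbors' posterior vectors from the previous layer. Here $\mathcal{N}_u$ denotes the set of neighbors of node u and $q_v^{\ell-1}(i)$ the posterior probability of state i for neighbor v at layer $\ell-1$:

$$
P\big(Q_u^\ell = i \mid q_{\mathcal{N}_u}^{\ell-1}\big)
= \frac{1}{|\mathcal{N}_u|} \sum_{v \in \mathcal{N}_u} q_v^{\ell-1}(i)
$$

Because each $q_v^{\ell-1}$ sums to one over the C states, the average is again a valid distribution over states.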

The entire machine learning architecture is trained to maximize the log-likelihood of the node features. To compute it for each node u, there will be combined the mixing weights of the variable $Q_u^L$ at the last layer with the emission parameters $\theta^\ell$ of all layers. A simple choice would be to take the emission parameters of the last layer only, but any function applied to the set $\{\theta^1, \theta^2, \ldots, \theta^L\}$ that returns valid emission parameters is admissible. This deep generative architecture will be trained in an end-to-end fashion using any standard gradient ascent optimizer. The log-likelihood is therefore computed as follows:
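A hedged sketch of the objective described in words above, written as a standard mixture log-likelihood. The notation is an assumption: $\pi_u(i)$ denotes the mixing weight of state i of the last-layer variable of node u, $\theta$ denotes the emission parameters obtained from an admissible function of the per-layer parameters, and $x_u^{\mathrm{obs}}$ denotes the observed features of node u.

$$
\mathcal{L} = \sum_{u \in V_g} \log \sum_{i=1}^{C} \pi_u(i)\, P_{\theta}\big(x_u^{\mathrm{obs}} \mid Q_u^L = i\big)
$$

Every term is differentiable with respect to the parameters, which is what permits end-to-end training with a gradient ascent optimizer.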

The architecture produces, for each node u, a vector of posterior probabilities associated with $Q_u^\ell$ for each layer $\ell$. These vectors, once concatenated, represent the node embedding of node u.

Whenever a feature is missing for a specific node, there will be ignored the contribution of that feature when computing $P_{\theta^\ell}(X_u \mid Q_u^\ell = i)$. This is known as conditional mean imputation in the literature. To be able to do so, the simplest way is to assume independence of the features when conditioned on the latent state $Q_u^\ell$.
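Skipping missing features under the conditional-independence assumption can be sketched as follows. This is a minimal illustration assuming univariate Gaussian emissions per feature; the function names, the use of `None` for a missing entry, and all numeric values are hypothetical and not taken from the source.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def masked_posterior(x_u, prior, means, stds):
    """Posterior over C hidden states using only the observed features.

    x_u:    feature values of one node, None marks a missing entry
    prior:  C mixing weights (state probabilities given the neighbors)
    means, stds: means[i][f] and stds[i][f] are the assumed Gaussian
                 emission parameters of state i for feature f
    Returns (posterior, log_evidence).
    """
    unnormalized = []
    for i, weight in enumerate(prior):
        likelihood = 1.0
        for f, value in enumerate(x_u):
            if value is None:
                continue  # a missing feature contributes a factor of 1
            likelihood *= gaussian_pdf(value, means[i][f], stds[i][f])
        unnormalized.append(weight * likelihood)
    evidence = sum(unnormalized)
    posterior = [w / evidence for w in unnormalized]
    return posterior, math.log(evidence)


means = [[0.0, 100.0], [10.0, -100.0]]
stds = [[1.0, 1.0], [1.0, 1.0]]
# The second feature is missing, so only the first one informs the posterior.
posterior, log_evidence = masked_posterior([0.0, None], [0.5, 0.5], means, stds)
print(posterior, log_evidence)
```

When every feature of a node is missing, the product is empty and the posterior falls back to the prior supplied by the neighbors, which is how structural information alone can still drive the reconstruction.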

2.3 Feature Distribution Generation: This module is responsible for computing the generative distribution of the feature f of node u for all features and nodes in the graph. For the purpose of the invention, there is interest in the multimodal distribution of the missing node features. Fig. 3 visually depicts what happens inside the module.

Without loss of generality, as described above, there is considered both the posterior of the variable $Q_u^L$ and the emission parameters $\theta^L$ at the last layer. The analysis is restricted to the parameters of the missing feature f, that is, there will be picked the subset of parameters $\theta^L(f)$. These parameters define a unimodal distribution for each mixture component, whose family depends on the input data. Combined with the posterior distribution computed using the observable node features, there will be effectively defined a different mixture model for each node, and consequently a different multimodal distribution for each feature value. These distributions capture the uncertainty of the model with respect to different plausible values for the reconstructed value. For a univariate Gaussian mixture associated with node u and feature f, there can be computed its distribution in the following way, wherein N stands for the Gaussian distribution:
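A hedged reconstruction of the mixture described in the preceding paragraph. The symbols are assumptions: $x_u^{\mathrm{obs}}$ denotes the observed features of node u, $Q_u^L$ the hidden state of node u at the last layer L, and $\mu_i(f)$, $\sigma_i^2(f)$ the mean and variance that the last-layer emission parameters assign to feature f under mixture component i.

$$
P\big(X_u(f) = x \mid x_u^{\mathrm{obs}}\big)
= \sum_{i=1}^{C} P\big(Q_u^L = i \mid x_u^{\mathrm{obs}}\big)\;
  \mathcal{N}\big(x;\, \mu_i(f),\, \sigma_i^2(f)\big)
$$

The component parameters are shared across nodes, while the posterior weights differ per node, which is why each node obtains its own multimodal distribution.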

2.4 Imputation Phase: Once there is produced a multimodal distribution for each missing feature, there can be chosen a method to extract the imputed value. There are multiple options, each with its own advantages and disadvantages, and this invention is not restricted to one of those. One option is to take the weighted mean of the unimodal distributions that are part of the mixture model, whereas another is to use heuristics to approximate the global maximum point of the distribution. In principle, there can also be used both the emission parameters and the posterior information as a substitute for a specific imputed value; this way, the information about uncertainty is preserved and the reconstructed value can be interpreted as a distribution by humans or subsequent AI systems.
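The two imputation options mentioned above can be sketched for a univariate Gaussian mixture as follows. This is a minimal illustration under stated assumptions: the function names and the numeric values are hypothetical, and the mode heuristic shown (evaluating the density only at the component means) is just one possible heuristic, not one prescribed by the source.

```python
import math


def mixture_pdf(x, weights, means, stds):
    """Density of a univariate Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def impute_weighted_mean(weights, means):
    """Option 1: weighted mean of the unimodal components."""
    return sum(w * m for w, m in zip(weights, means))


def impute_approx_mode(weights, means, stds):
    """Option 2 (heuristic): best density among the component means."""
    return max(means, key=lambda m: mixture_pdf(m, weights, means, stds))


# Hypothetical bimodal reconstruction of a temperature reading.
weights = [0.75, 0.25]   # posterior over two hidden states
means = [10.0, 30.0]     # per-component means for the missing feature
stds = [1.0, 1.0]

mean_value = impute_weighted_mean(weights, means)
mode_value = impute_approx_mode(weights, means, stds)
print(mean_value, mode_value)
```

In this example the weighted mean falls between the two modes, in a region where the mixture assigns very little density, whereas the mode heuristic returns a value the model considers plausible. This trade-off is one reason to keep the full distribution rather than a single scalar.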

Summary: There is proposed the first technique that can reconstruct partially missing node features while exploiting the graph structure and providing uncertainty estimates on the reconstructed values. For these reasons, the proposed layer and architecture constitute important novelties of the present approach.

The graph of reconstructed node features, as well as the unsupervised node embeddings, can then be used in subsequent downstream tasks or be subjected to data analytics tools.

Embodiment 1 : Missing value reconstruction in a patient graph for improved prediction of clinical risk. The computer-implemented method is useful in the medical field and provides one or more missing values of a missing feature or of a missing patient information.

Use Case: Predicting clinical risk in hospitals well ahead of time can save lives by determining in advance which patients could be subject to certain illnesses, e.g., sepsis, acute kidney injuries etc. Patient records always contain missing information, and accurate reconstruction of missing features or values of missing features can drastically improve the performance of AI mortality predictors. A node of the network with partially missing values can be built from a patient's record or patient's parameters or corresponding values. Such nodes can form a network of nodes by connecting nodes with similar values, for instance using a distance-based strategy. Embodiments of the present invention can be used to improve the accuracy of such reconstruction whenever a sensible graph of patients is provided, which in turn leads to better predictions of clinical risk and better decisions on which actions to take. The original and reconstructed features can be fed into a predictive system that outputs, together with a prediction of the clinical risk, a set of additional exams to take, e.g., by providing the configuration of a CBC blood test machine.

Data Source: A patient network, which consists of partially available data of patients, wherein patients can form corresponding nodes of the network. This data includes, but is not limited to, vital parameters or data of patients, values of parameters extracted from blood samples, basic vital measurements and laboratory exams such as heart rate, oxygen saturation, weight, height, glucose, temperature, pH etc. A blood sample of the patient is available for further use.

The present computer-implemented Method: Embodiments of the invention will reconstruct, with estimates of uncertainty, the missing features of the patient graphs using information from neighboring patients. The quality of the structure will impact the reconstruction.

Output: A list of patients with their original and reconstructed features or values of missing features.

Physical Change (Technicity): The output of the system is fed into a predictive AI system for clinical risk, which can diagnose a potential illness and require an additional set of specialized blood tests. The system activates an automated blood test pipeline (like a Tempus600), which first retrieves the patient’s blood samples from storage and then executes the required blood tests for obtaining corresponding data.

Embodiment 2: Predictive Maintenance: Missing values reconstruction in a defective soil sensor network. The computer-implemented method is useful in the field of agriculture and provides one or more missing values of a missing feature or of a sensor measurement.

Use Case: Precision agriculture seeks to improve productivity while reducing costs using smart devices that constantly monitor the target environment. Soil monitoring is one example of how a network of sensors can be used to determine where and when it is necessary to irrigate the soil in order to keep humidity and temperature at optimal levels, wherein each sensor can form a node of the network. Data resulting from the sensors, such as temperature, moisture, and chemical composition, can be recorded or collected. At least a part of the data will be transformed into a graph using the geographical position to connect nearby sensors. Due to adverse weather conditions, it may happen that some sensors become defective, and part or all of their measurements become unavailable until the sensor is replaced, an action which can take time. By inferring the missing measurements from the neighboring sensors, the defect could be mitigated while providing accurate approximations of the exact measurements. If predictions were sufficiently accurate, it might be possible to completely remove some of the sensors, with savings in terms of money and maintenance jobs.

Data Source: A sensor network, which consists of partially available data of humidity, temperature, and other metrics of interest related to the soil.

The present computer-implemented Method: Embodiments of the invention will predict, with estimates of uncertainty, plausible values for the missing sensor measurements using information available from nearby geographical areas. The granularity of the sensor network may impact the reconstruction; if not, some sensors may be deemed unnecessary and completely removed from the network.
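How a single value and an uncertainty estimate can be read off a multimodal distribution is sketched below. The sketch assumes that the distribution over the missing measurement is a mixture of Gaussian components, and takes the mixture mean as the imputed value and the mixture standard deviation as the uncertainty. The function name and the numbers are hypothetical; the component type is an assumption and is not confirmed by this section.

```python
import math

def mixture_mean_and_std(weights, means, stds):
    """Mean and standard deviation of a one-dimensional Gaussian mixture.

    `weights` are the mixing proportions of the components, `means` and
    `stds` their parameters (all assumed to be given by a trained model).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    mean = sum(w * m for w, m in zip(weights, means))
    # E[x^2] of the mixture; the variance is E[x^2] - E[x]^2.
    second_moment = sum(
        w * (s ** 2 + m ** 2) for w, m, s in zip(weights, means, stds)
    )
    variance = max(second_moment - mean ** 2, 0.0)
    return mean, math.sqrt(variance)

# Hypothetical two-component distribution over a missing humidity value.
estimate, uncertainty = mixture_mean_and_std([0.5, 0.5], [10.0, 20.0], [1.0, 1.0])
```

A large standard deviation, as in this bimodal example, signals that the mean alone is a weak summary of the plausible values.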

Output: A list of reconstructed measurements for the defective sensors. These can be incorporated with the available data and processed by an automated irrigation system.

Physical Change (Technicity): The output of the system can be used in two ways:

1) A presentation of information, which credibly assists the user in performing a technical task by means of a continued and/or guided maintenance process.

2) As input to an automated irrigation system. This system is able to determine how much water specific areas of land should receive to keep productivity high and resource consumption low.

Embodiment 3: Predictive Maintenance/Smart City: Detecting node anomalies in a highly-automated factory. The computer-implemented method is useful in the industrial field and provides one or more missing values of a missing feature or of a monitored parameter.

Use Case: As industrial factories become more and more bound to Internet of Things, IoT, systems, there is a great amount of data to be processed and stored in real time. Smart machines in a factory are logically connected by the production process, so it is possible to consider a factory as a network of machines, wherein each machine or selected machines can form nodes of the network. Detecting anomalies in advance, even when partial information is not available in a given instant, is crucial to prevent damage to the individual and possibly neighboring machines.

Data Source: A network of smart industrial machines, each of which streams a given set of control parameters that have to be constantly monitored. Values of the machines' parameters, such as stress level, status of lubrication products, temperature and time of activity, can be recorded or collected as data for the nodes of the network. Machines that are physically or logically connected define the connections of the network/graph.

The present computer-implemented Method: The invention will predict, on the basis of the data or training data, a likelihood measure associated with each machine at every instant of time. Whenever the likelihood falls below a given security threshold, security systems of the building are activated and the cause is investigated by looking at which control parameters have caused the anomaly.

Output: A list of likelihood scores for each machine. Whenever one of them falls below a set threshold, the security system of the building is activated.
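The thresholding of likelihood scores can be sketched as follows. The sketch assumes that each machine's reading is scored by the log-likelihood of a one-dimensional Gaussian mixture; the function names, the parameter values and the threshold are hypothetical and serve only to illustrate the comparison against a set threshold.

```python
import math

def mixture_log_likelihood(x, weights, means, stds):
    """Log-likelihood of reading `x` under a 1-D Gaussian mixture."""
    density = sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )
    return math.log(density) if density > 0 else float("-inf")

def flag_anomalies(scores, threshold):
    """Return the machines whose score falls below the threshold."""
    return [machine for machine, score in scores.items() if score < threshold]

# Hypothetical temperature readings scored against a single-component model
# (mean 60, standard deviation 5); a real model would differ per machine.
readings = {"press": 61.0, "lathe": 95.0}
scores = {
    machine: mixture_log_likelihood(value, [1.0], [60.0], [5.0])
    for machine, value in readings.items()
}
flagged = flag_anomalies(scores, threshold=-10.0)
```

Here only the lathe is flagged, because its reading lies seven standard deviations from the expected temperature.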

Physical Change (Technicity): The output of the system can be used to control a smart/automated security system. This system can stop production machines, lock or unlock (emergency) doors, enable an acoustic and visual alarm, etc.

The following presents some reconstruction results on a synthetic dataset created to demonstrate a causal dependency of a node feature on the network’s structure. Fig. 4 shows how, for nodes with varying portions of missing features (the portion is determined by a Gamma distribution with shape 1.5 and rate 2), the model performs better than a Gaussian Mixture Model, GMM, at reconstructing the features in terms of Mean Squared Error, MSE. When averaged across the entire test set (1K samples), the average MSE is 1689.08 (±138.0) for GC and 1871.10 (±192.25) for GMM. A naive baseline that reconstructs a node feature by averaging the neighboring values is not shown and performs much worse.
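The naive baseline and the error measure mentioned above can be sketched as follows. The sketch assumes an adjacency mapping and a dictionary of node values in which a missing value is stored as `None`; the function names and the toy data are hypothetical and unrelated to the synthetic dataset used for Fig. 4.

```python
def neighbor_average(graph, values, node):
    """Reconstruct a node's value as the mean of its observed neighbors."""
    observed = [values[n] for n in graph[node] if values.get(n) is not None]
    return sum(observed) / len(observed) if observed else None

def mean_squared_error(truth, estimates):
    """Mean Squared Error over the reconstructed nodes only."""
    return sum((truth[k] - estimates[k]) ** 2 for k in estimates) / len(estimates)

# Toy graph: node "a" has neighbors "b" and "c"; its own value is missing.
graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
values = {"a": None, "b": 2.0, "c": 4.0}
estimates = {"a": neighbor_average(graph, values, "a")}
error = mean_squared_error({"a": 3.5}, estimates)
```

The baseline yields a single point estimate and no distribution, so unlike the mixture-based reconstruction it gives no measure of uncertainty.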

Many modifications and other embodiments of the invention set forth herein will come to mind to the one skilled in the art to which the invention pertains having the benefit of the teachings presented in the foregoing description and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

1. A computer-implemented method for providing one or more missing values of a feature in a graph, wherein the graph results from or is extracted from a node of a network of nodes, comprising the following steps: recording or collecting data from at least one node of the network or from the network; transforming at least a part of the data into a graph; extracting at least one representation or node posterior representation from the graph using a definable deep architecture of Graph Convolutional, GC, layers; constructing a distribution over at least one missing feature of the graph or node using the at least one representation or node posterior representation; and extracting a value or a vector for the at least one missing feature from the distribution by means of an imputation method.

2. A method according to claim 1, wherein the recorded or collected data is additionally used during the constructing step or graph building step.

3. A method according to claim 1 or 2, wherein the distribution is a multimodal distribution.

4. A method according to any of claims 1 to 3, wherein the method comprises modeling of an uncertainty or of an uncertainty estimate over at least one plausible missing value for the at least one missing feature.

5. A method according to any of claims 1 to 4, wherein a graph of at least partially available or completed node features and/or unsupervised node embeddings are used in subsequent downstream tasks or subject of data analytics tools.

6. A method according to any of claims 1 to 5, wherein the method further comprises conditioning a mixture model at each GC layer on a set of states of nodes in the neighborhood of the at least one node and learning appropriate neighborhood aggregation functions.

7. A method according to any of claims 1 to 6, wherein the method further comprises providing an end-to-end architecture made of Graph Convolutional layers, which behave as mixture models conditioned on a graph structure, for simultaneously generating unsupervised node embeddings dependent on the graph structure and using them to impute missing values or features.

8. A method according to any of claims 1 to 7, wherein the vector for the at least one missing feature contains values for the recorded or collected data.

9. A method according to any of claims 1 to 8, wherein for performing the method a Node Posterior Module comprising the deep architecture of Graph Convolutional, GC, layers, a Feature Distribution Generation Module for computing the generative distribution of a feature f of a node u for all features and nodes in the graph and an imputation strategy or conditional mean imputation strategy are used.

10. A method according to any of claims 1 to 9, wherein Bayes’ theorem is applied to the observable node features for computing the posterior distribution encoded as a vector of dimension C.

11. A method according to any of claims 1 to 10, wherein the architecture is trained to maximize the log-likelihood of at least one node feature.

12. A method according to any of claims 1 to 11, wherein during the method a different mixture model for each node and consequently a different multimodal distribution for each feature value is defined.

13. A method according to any of claims 1 to 12, wherein the method comprises a reconstructing of plausible missing values in a graph or graphs.

14. A method according to any of claims 1 to 13, wherein the method comprises a discovering of anomalies in the graph, using probabilistic graph learning or a probabilistic graph Artificial Intelligence, AI, system.

15. A system for providing one or more missing values of a feature in a graph, preferably for carrying out the computer-implemented method for providing a missing value of a feature in a graph according to any one of claims 1 to 14, wherein the graph results from or is extracted from a node of a network of nodes, comprising: recording or collecting means for recording or collecting data from at least one node of the network or from the network; transforming means for transforming at least a part of the data into a graph; extracting means for extracting at least one representation or node posterior representation from the graph using a definable deep architecture of Graph Convolutional, GC, layers; constructing means for constructing a distribution over at least one missing feature of the graph or node using the at least one representation or node posterior representation; and extracting means for extracting a value or a vector for the at least one missing feature from the distribution by means of an imputation method.