WO2020191001A1 - Real-world network link analysis and prediction using extended probabilistic matrix factorization models with labeled nodes - Google Patents
- Publication number
- WO2020191001A1 (PCT/US2020/023264)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- nodes
- pmf
- extended
- sets
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
- H04L41/147—Network analysis or design for predicting network behaviour
- H04L41/149—Network analysis or design for prediction of maintenance
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- the present invention generally relates to link prediction techniques, and more particularly, to link prediction techniques that extend the Poisson matrix factorization (PMF) model by incorporating node-specific covariates for two sets of nodes in the PMF framework, modeling sparsity on both latent feature matrices, and/or accounting for seasonal effects to determine whether new links are anomalous.
- PMF Poisson matrix factorization
- Link prediction is defined as the problem of predicting the presence of an edge between two nodes in the network based on observed edges and attributes of the nodes.
- a link is an edge, and the determination of whether a new edge is anomalous in some embodiments is based on analysis of node attributes and previously observed edges, which, in the extended PMF model, determine the probability of a link.
- the link prediction problem has been an active field of research, and is somewhat similar to recommender systems, especially in its static formulation.
- links between entities in computer networks may provide insights into adversary behavior.
- users or computing systems are just one example of nodes for certain applications in cybersecurity, and many other applications are possible without deviating from the scope of the invention. In other words, some embodiments could be used in other application domains, in which case the nodes would represent other kinds of entities.
- Prediction and modeling of links within a computer network have important implications for cybersecurity. Relationships between entities within the computer network, such as user interactions with computing systems or system libraries and the corresponding processes that use them, can provide key insights into adversary behavior.
- Entities in some embodiments may be users or computing systems, for example. However, some embodiments may be applied to other entities without deviating from the scope of the invention.
- Certain embodiments of the present invention may provide solutions to the problems and needs in the art that have not yet been fully identified, appreciated, or solved by conventional cybersecurity technologies.
- some embodiments of the present invention pertain to link prediction techniques that extend the PMF model by incorporating node-specific covariates for two sets of nodes in the PMF framework, modeling sparsity on both latent feature matrices, and/or accounting for seasonal effects to improve link prediction.
- a computer program is embodied on a non-transitory computer-readable medium.
- the program is configured to cause at least one processor to observe a real-world network over time and construct a matrix for two sets of nodes based on the observation of the real-world network over time.
- the program is also configured to cause the at least one processor to fit an extended PMF model to the matrix for the two sets of nodes to learn posterior estimates for model parameters for predictive analytical purposes, the extended PMF model incorporating node-specific covariates for the two sets of nodes, modeling sparsity on latent feature matrices for the two sets of nodes, accounting for seasonal effects, or any combination thereof, to predict links.
- the program is further configured to cause the at least one processor to use the learned posterior estimates for the model parameters to make predictions for future network observations or to determine anomaly scores about links observed after the training period or previously unobserved links. Additionally, the program is configured to cause the at least one processor to output the predictions, the anomaly scores, the model parameters themselves, or any combination thereof.
- the extended PMF model uses a variational inference procedure for binary matrices. The variational inference procedure provides inference on the marginal posterior distributions of the parameters for the two sets of nodes, as well as for the covariates, since these underpin a predictive distribution over which edges are likely to be observed in the future.
- a computer-implemented method includes fitting, by a computing system, an extended PMF model to a matrix for two sets of nodes based on the observation of the real-world network over time to learn posterior estimates for model parameters for predictive analytical purposes, the extended PMF model incorporating node-specific covariates for the two sets of nodes, modeling sparsity on latent feature matrices for the two sets of nodes, accounting for seasonal effects, or any combination thereof, to predict links.
- the computer-implemented method also includes using the learned posterior estimates for the model parameters, by the computing system, to make predictions for future network observations or to determine anomaly scores about links observed after the training period or previously unobserved links and outputting the predictions, the anomaly scores, the model parameters themselves, or any combination thereof, by the computing system.
- FIG. 1 is a graph illustrating a full extended Poisson matrix factorization (PMF) model, according to an embodiment of the present invention.
- FIG. 2 is a flowchart illustrating a process for new link prediction using an extended PMF model, according to an embodiment of the present invention.
- FIG. 3 is a block diagram illustrating a computing system configured to perform new link prediction using an extended PMF model, according to an embodiment of the present invention.
- Some embodiments of the present invention pertain to an extension to, and a practical application of, a Poisson matrix factorization (PMF) model for binary matrices.
- PMF is extended in some embodiments to include scenarios that are commonly encountered in real life networks, such as those motivated by applications in cybersecurity.
- the methodology is described in the context of computer networks.
- the technique of some embodiments may be extended to other practical applications, such as biological or social networks, without deviating from the scope of the invention.
- the extension in some embodiments explicitly includes known covariates associated with the nodes.
- nodes may be users or computing systems, for instance.
- IBP Indian Buffet Process
- a seasonal version of PMF may be employed to handle dynamic networks.
- Fast inference schemes using variational inference and Gibbs sampling may be employed. These techniques may be used individually or in combination. Such embodiments have shown improved performance over the standard PMF model and other known link prediction techniques in testing at Los Alamos National Laboratory.
- some embodiments incorporate this information into the learning process by extending the PMF model to allow for covariates for both sets of nodes (e.g., users and computing systems), showing improved performance in scoring and predicting links.
- the model of some embodiments may also incorporate seasonality, making the model applicable to a more realistic dynamic computer network setting.
- some embodiments provide new link prediction techniques that extend the PMF model by incorporating node-specific covariates for two sets of nodes (e.g., a first set of nodes for users and a second set of nodes for items) in the PMF framework, modeling sparsity on both latent feature matrices for each set of nodes, and/or accounting for seasonal effects (i.e., seasonality) to predict links.
- the standard PMF model may therefore be extended in three directions in some embodiments, which can all be implemented simultaneously in certain embodiments to produce more accurate inference and prediction. Furthermore, the model of some embodiments may be extended to properly deal with binary edges, where only the existence of an edge is observed rather than an associated count along the edge, corresponding to the number of observed links between the two nodes during a given time period.
- the objective of the link prediction procedure of some embodiments is to reliably predict the structure of the subsequently observed graph A_{T+1}.
- Standard PMF has prior distributions on the latent features α and β. However, standard PMF does not have a prior on … HPMF adds a prior on the …
- a relevant advantage of PMF over competing models is that the likelihood only depends on the number of observed links (i.e., evaluating the likelihood is O(nnz(A)), where nnz(·) is the number of non-zero elements in the matrix).
- the PMF model has been used as a building block for multiple extensions.
- a social Poisson factorization (SPF) has been developed that includes latent social influence in the recommender system. It has also been proposed to combine PMF with the standard collective matrix factorization model to tackle the problem of cold starts and jointly model relational matrices.
- a collaborative topic Poisson factorization (CTPF) has been developed that adds a document topic offset to the standard PMF model to provide content-based recommendations. Note that this allows for item-specific covariate information to be incorporated into the model, but not user-specific covariate information. The approach of some embodiments allows for both user-specific and item-specific covariate information to be included.
- a non-negative matrix factorization model with a Poisson likelihood has also been proposed, with sparsity constraints imposed on only one of the two matrices in the decomposition; structured stochastic mean-field variational inference is used to infer the model parameters.
- sparsity is imposed on both matrices.
- the PMF model is modified to treat the counts N_ij as latent variables and to treat A_ij as a censored count.
- This type of link has been previously used, and is sometimes referred to as the Bernoulli-Poisson (BerPo) link, where Gibbs sampling is commonly used for inference.
- some embodiments use a variational inference procedure instead.
- Variational inference schemes have been successfully used for PMF models with a number of different link functions.
- Variational inference schemes have been successfully used for matrices of counts.
- Gibbs sampling or hybrid approaches (e.g., structured stochastic variational inference, which involves Gibbs sampling steps)
- a variational inference scheme for binary matrices is used instead.
- the covariate “job title” divides users into different groups according to their jobs, such as managers or scientists. Therefore, for the remainder of this disclosure, the covariates are assumed to be categorical. It is also assumed that the observed values A_ij are obtained from binary truncations of Poisson draws using the following hierarchical model:
- inference is on the marginal posterior distributions of the parameters α_i and β_j for all the users and items, and for the covariates, since these underpin the predictive distribution over which edges are likely to be observed in the future. Inference is straightforward using Gibbs sampling since the prior distributions are chosen to be conjugate, but sampling-based approaches do not scale well with the size of the network. Therefore, inference is usually performed using variational inference, which turns the problem of sampling from the posterior into an optimization task.
- l refers to a (k, h) covariate pair
- Variational inference is an optimization-based technique for approximating intractable posterior distributions, such as
- the proxy distribution q(·) is usually chosen to make the above approximation possible, and has a much simpler form than the posterior distribution.
- the mean-field variational family is used, where the latent variables in the posterior are considered to be independent and governed by their own distribution, such that:
- v_j is an element of a “partition” of the full set of parameters v
- v_{-j} denotes the full set v excluding the parameters in the subset v_j.
- the expectation is taken with respect to the variational approximation for the parameters v_{-j}, excluding the component v_j.
- shp and rte refer to the two parameters that fully characterize a Gamma distribution.
- the shape parameter shp controls the shape of the distribution and the rate parameter rte controls its variability. Note that all the update equations in Algorithm 1 only depend on the elements of the matrix where A_ij > 0, providing computational efficiency for large sparse matrices.
- Convergence of the CAVI algorithm is determined by monitoring the change in the ELBO.
- since the ELBO can have many local optima, the solution can be highly dependent on the initial starting values. Therefore, it is generally advisable to run the algorithm multiple times using different starting points. Also, computing the ELBO on very large matrices is computationally costly. Assessing convergence may be achieved by calculating the average predictive log-likelihood on a small held-out dataset, providing a proxy to calculating the ELBO on the entire dataset.
- the problem statement with respect to anomaly detection is to determine whether an observed user-item pair is normal with respect to the model parameters learned over some training period.
- An anomaly score can be given by the posterior predictive upper tail p-value.
- since A_ij is a Bernoulli random variable, this is equivalent
- the number of latent features R must be specified in advance.
- the PMF model given in Eq. (2) is generalized by introducing binary coefficients D ∈ {0,1} used to switch the latent variables α_ir and β_jr on or off, allowing for a much sparser representation. It should be noted that binary variables for the covariate coefficients could also be added to simultaneously turn the covariate pairs on and off.
- the resulting model is an extension to Beta-Process Non-Negative Matrix Factorization (BPNNMF):
- the IBP is the infinite limit of a Beta-Bernoulli process:
- Some embodiments extend the PMF model by incorporating node-specific covariates for two sets of nodes (e.g., a first set of nodes for users and a second set of nodes for items) in the PMF framework, modeling sparsity on both latent feature matrices for each set of nodes, and/or accounting for seasonal effects (i.e., seasonality) to improve prediction of links.
- the standard PMF model may therefore be extended in three directions, which can all be implemented simultaneously in some embodiments to produce more accurate inference and prediction. The following equation summarizes the multiple models discussed herein:
- the component does not contribute to the probability of a link.
- each latent feature is associated only with a subset of the nodes. This allows for a more precise assessment of the probability of a link, provides a framework for model selection, and can be used to select automatically the appropriate number of latent features.
- FIG. 2 is a flowchart illustrating a process for link prediction using an extended PMF model, according to an embodiment of the present invention.
- the process begins with observing a real-world network over time and constructing a matrix for the two sets of nodes (e.g., one for users and another for items) based on the observations at 210 over a training period.
- An extended PMF model is then fit to the matrix for the two sets of nodes at 220 to learn posterior estimates for the model parameters for predictive analytical purposes.
- the extended PMF model incorporates node-specific covariates for the two sets of nodes in the graph, models sparsity on latent feature matrices for the two sets of nodes, accounts for seasonal effects, or any combination thereof, to predict the links.
- variational inference or a Gibbs sampler may be used.
- Gibbs sampling is a common Monte Carlo method for inference; variational inference is a common optimization-based alternative.
- variational inference and Gibbs samplers have been modified to account for binary edges.
- the parameters of the model have been learned. This estimate allows for predictions to be made for anomaly detection, recommendations, etc.
- the learned posterior estimates for the model parameters are thus used to make predictions for future network observations or to determine anomaly scores about links observed after the training period or previously unobserved links at 230.
- the predictions or anomaly scores (and in some embodiments, the model parameters themselves) are then output at 240 for review.
- the link prediction results may include the top-M most likely links for recommendations, the top-M least likely links for anomaly detection, etc.
- the results could include the top-M (or top-a%) most anomalous links, which could be further examined by security experts for assessment of the threat to the system.
- the parameters of the model could be output for secondary analyses for interpretability of results, such as to identify strongly linked users or items, or understand which covariates are important in making predictions.
- FIG. 3 is a block diagram illustrating a computing system configured to perform new link prediction using an extended PMF model, according to an embodiment of the present invention.
- Computing system 300 includes a bus 305 or other communication mechanism for communicating information, and processor(s) 310 coupled to bus 305 for processing information.
- Processor(s) 310 may be any type of general or specific purpose processor, including a central processing unit (CPU), application specific integrated circuit (ASIC), field programmable gate array (FPGA), etc.
- Processor(s) 310 may also have multiple processing cores, and at least some of the cores may be configured to perform specific functions. Multi-parallel processing may be used in some embodiments.
- Computing system 300 further includes a memory 315 for storing information and instructions to be executed by processor(s) 310.
- Memory 315 can be comprised of any combination of random access memory (RAM), read only memory (ROM), flash memory, cache, static storage such as a magnetic or optical disk, or any other types of non-transitory computer-readable media or combinations thereof.
- computing system 300 includes a communication device 320, such as a transceiver and antenna, to wirelessly provide access to a communications network.
- Non-transitory computer-readable media may be any available media that can be accessed by processor(s) 310 and may include volatile media, non-volatile media, or both. The media may also be removable, non-removable, or both.
- Processor(s) 310 are further coupled via bus 305 to a display 325, such as a Liquid Crystal Display (LCD), for displaying information to a user.
- a keyboard 330 and a cursor control device 335 are further coupled to bus 305 to enable a user to interface with computing system 300.
- a physical keyboard and mouse may not be present, and the user may interact with the device solely through display 325 and/or a touchpad (not shown). Any type and combination of input devices may be used as a matter of design choice.
- no physical input device is present. For instance, the user may interact with computing system 300 remotely via another computing system in communication therewith, or computing system 300 may operate autonomously.
- Memory 315 stores software modules that provide functionality when executed by processor(s) 310.
- the modules include an operating system 340 for computing system 300.
- the modules further include a module 345 that is configured to perform new link prediction using an extended PMF model by employing any of the approaches discussed herein or derivatives thereof.
- Computing system 300 may include one or more additional functional modules 350 that include additional functionality.
- a “system” could be embodied as a server, an embedded computing system, a personal computer, a console, a personal digital assistant (PDA), a cell phone, a tablet computing device, or any other suitable computing device, or combination of devices.
- PDA personal digital assistant
- Presenting the above-described functions as being performed by a “system” is not intended to limit the scope of the present invention in any way, but is intended to provide one example of many embodiments of the present invention. Indeed, methods, systems and apparatuses disclosed herein may be implemented in localized and distributed forms consistent with computing technology, including cloud computing systems.
- modules may be implemented as a hardware circuit comprising custom very large scale integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components.
- VLSI very large scale integration
- a module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, graphics processing units, or the like.
- a module may also be at least partially implemented in software for execution by various types of processors.
- An identified unit of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions that may, for instance, be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the module and achieve the stated purpose for the module.
- modules may be stored on a computer-readable medium, which may be, for instance, a hard disk drive, flash device, RAM, tape, or any other such medium used to store data.
- a module of executable code could be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices.
- operational data may be identified and illustrated herein within modules, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network.
- the process steps performed in FIG. 2 may be performed by a computer program, encoding instructions for the processor(s) to perform at least the process described in FIG. 2, in accordance with embodiments of the present invention.
- the computer program may be embodied on a non-transitory computer-readable medium.
- the computer-readable medium may be, but is not limited to, a hard disk drive, a flash device, RAM, a tape, or any other such medium used to store data.
- the computer program may include encoded instructions for controlling the processor(s) to implement the process described in FIG. 2, which may also be stored on the computer-readable medium.
- the computer program can be implemented in hardware, software, or a hybrid implementation.
- the computer program can be composed of modules that are in operative communication with one another, and which are designed to pass information or instructions to display.
- the computer program can be configured to operate on a general purpose computer, an ASIC, or any other suitable device.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Optimization (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Algebra (AREA)
- Probability & Statistics with Applications (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A practical adaptation and application of a Poisson matrix factorization (PMF) model for binary matrices to scenarios encountered in real-world computer networks is disclosed. Link prediction techniques may extend the PMF model by incorporating node-specific covariates for two sets of nodes in the PMF framework, modeling sparsity on both latent feature matrices, and/or accounting for seasonal effects to predict links. The standard PMF model may therefore be extended in three directions, which may all be implemented simultaneously to produce more accurate inference and prediction. Furthermore, the model may be extended to properly deal with binary edges, where only the existence of an edge is observed rather than an associated count along the edge.
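The treatment of binary edges can be illustrated with a short simulation. The following is a minimal sketch, assuming each edge indicator is 1 whenever an underlying Poisson count is positive, so that the link probability is 1 - exp(-rate). The rate value, the sample size, and the helper names `link_probability` and `sample_poisson` are hypothetical and chosen only for illustration.

```python
import math
import random


def link_probability(rate: float) -> float:
    """P(A = 1) when A = 1 if N > 0 and N ~ Poisson(rate)."""
    return 1.0 - math.exp(-rate)


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw a Poisson count with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)
rate = 0.7  # hypothetical Poisson rate for a single pair of nodes
draws = [sample_poisson(rate, rng) for _ in range(20000)]
# Keep only the existence of an edge, discarding the count along it.
binary_edges = [1 if n > 0 else 0 for n in draws]
empirical = sum(binary_edges) / len(binary_edges)
print(round(link_probability(rate), 4), round(empirical, 4))
```

The two printed values should be close, which shows that discarding the counts still leaves a well-defined Bernoulli probability for each edge.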
Description
TITLE
REAL-WORLD NETWORK LINK ANALYSIS AND PREDICTION USING EXTENDED PROBABILISTIC MATRIX FACTORIZATION MODELS WITH LABELED NODES
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 62/819,912 filed March 18, 2019. The subject matter of this earlier filed application is hereby incorporated by reference in its entirety.
STATEMENT OF FEDERAL RIGHTS
[0002] The United States government has rights in this invention pursuant to Contract No. 89233218CNA000001 between the United States Department of Energy and Triad National Security, LLC for the operation of Los Alamos National Laboratory.
FIELD
[0003] The present invention generally relates to link prediction techniques, and more particularly, to link prediction techniques that extend the Poisson matrix factorization (PMF) model by incorporating node-specific covariates for two sets of nodes in the PMF framework, modeling sparsity on both latent feature matrices, and/or accounting for seasonal effects to determine whether new links are anomalous.
BACKGROUND
[0004] Graphs and networks have emerged as popular mathematical structures to represent datasets that are commonly encountered in real-world applications in fields such as computer science, biology, and the social sciences. A network may be defined as a graph G = (V, E), where V is a set of nodes (e.g., users or computing systems) and E ⊆ V × V is a set of edges connecting at least some of the nodes. If a node x ∈ V interacts with a node y ∈ V, then (x, y) ∈ E. In the case of a computer network, x and y could be, for example, users or computing systems.
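To make the definition concrete, a set of observed interactions (x, y) can be stored as a binary matrix whose rows and columns index two sets of nodes. The following is a minimal sketch, assuming the interactions arrive as (user, computer) pairs; the event list and the function name `build_binary_adjacency` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_binary_adjacency(
    events: Iterable[Tuple[str, str]],
) -> Tuple[List[List[int]], Dict[str, int], Dict[str, int]]:
    """Return a matrix A with A[i][j] = 1 if row node i was linked to column node j."""
    events = list(events)
    # Give every distinct node a stable index, in sorted order.
    row_index = {name: i for i, name in enumerate(sorted({u for u, _ in events}))}
    col_index = {name: j for j, name in enumerate(sorted({c for _, c in events}))}
    matrix = [[0] * len(col_index) for _ in row_index]
    for user, computer in events:
        # Repeated events collapse to a single 1: only edge existence is kept.
        matrix[row_index[user]][col_index[computer]] = 1
    return matrix, row_index, col_index


# Hypothetical (user, computer) events observed during a training period.
observed = [("alice", "srv1"), ("bob", "srv2"), ("alice", "srv1"), ("alice", "srv2")]
A, users, computers = build_binary_adjacency(observed)
print(A)  # [[1, 1], [0, 1]]
```

A dense list of lists is used here only for readability; for networks with many nodes a sparse representation would be the more natural choice.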
[0005] Link prediction is defined as the problem of predicting the presence of an edge between two nodes in the network based on observed edges and attributes of the nodes. A link is an edge, and the determination of whether a new edge is anomalous in some embodiments is based on analysis of node attributes and previously observed edges, which, in the extended PMF model, determine the probability of a link. The link prediction problem has been an active field of research, and is somewhat similar to recommender systems, especially in its static formulation.
[0006] In some embodiments, links between entities in computer networks, such as user interactions with computers or system libraries and the corresponding processes that use them, may provide insights into adversary behavior. However, it should be noted that users or computing systems are just one example of nodes for certain applications in cybersecurity, and many other applications are possible without deviating from the scope of the invention. In other words, some embodiments could be used in other application domains, in which case the nodes would represent other kinds of entities.
[0007] Prediction and modeling of links within a computer network have important implications for cybersecurity. Relationships between entities within the computer network, such as user interactions with computing systems or system libraries and the corresponding processes that use them, can provide key insights into adversary behavior. Previously unobserved edges may be of particular interest since many attack behaviors, such as lateral movement, phishing, and data retrieval, create new links between such entities. Entities in some embodiments may be users or computing systems, for example. However, some embodiments may be applied to other entities without deviating from the scope of the invention.
[0008] Existing approaches for anomaly detection in cybersecurity research involve building models of normal behavioral patterns and detecting deviations. In this sense, previously observed links can be scored and predicted. However, probabilistically assigning anomaly scores to new links is a serious challenge because no prior observations exist. In other words, because there is no previous link, it is difficult to determine whether a new link is actually anomalous. Accordingly, an approach that represents rich, complex datasets and assigns probabilistic scores for new link formation over time may be beneficial.
[0009] The link prediction problem in computer networks is particularly challenging because the number of nodes involved is potentially very large and the structure of the network is inherently dynamic. For use in practical cybersecurity applications, it is typically necessary to use relatively simple and scalable techniques, given the size and dynamic nature of the networks.
[0010] Matrix factorization approaches, especially probabilistic matrix factorization techniques (e.g., classical Gaussian matrix factorization), are currently widely used in the tech industry. Poisson matrix factorization (PMF) recently emerged as a suitable model in the link prediction framework due to its flexibility and scalability. In previous work by Turcotte et al., it was shown that the hierarchical PMF model performs well in the context of cybersecurity applications. See Melissa J. M. Turcotte et al., “Poisson Factorization for Peer-Based Anomaly Detection,” 2016 IEEE Conference on Intelligence and Security Informatics (ISI), pages 208-210 (2016).
SUMMARY
[0011] Certain embodiments of the present invention may provide solutions to the problems and needs in the art that have not yet been fully identified, appreciated, or solved by conventional cybersecurity technologies. For example, some embodiments of the present invention pertain to link prediction techniques that extend the PMF model by incorporating node-specific covariates for two sets of nodes in the PMF framework, modeling sparsity on both latent feature matrices, and/or accounting for seasonal effects to improve link prediction.
[0012] In an embodiment, a computer program is embodied on a non-transitory computer-readable medium. The program is configured to cause at least one processor to observe a real-world network over time and construct a matrix for two sets of nodes based on the observation of the real-world network over time. The program is also configured to cause the at least one processor to fit an extended PMF model to the matrix for the two sets of nodes to learn posterior estimates for model parameters for predictive
analytical purposes, the extended PMF model incorporating node-specific covariates for the two sets of nodes, modeling sparsity on latent feature matrices for the two sets of nodes, accounting for seasonal effects, or any combination thereof, to predict links. The program is further configured to cause the at least one processor to use the learned posterior estimates for the model parameters to make predictions for future network observations or to determine anomaly scores about links observed after the training period or previously unobserved links. Additionally, the program is configured to cause the at least one processor to output the predictions, the anomaly scores, the model parameters themselves, or any combination thereof.
[0013] In another embodiment, a computer program is embodied on a non-transitory computer-readable medium. The program is configured to cause at least one processor to observe a real-world network over time and construct a matrix for two sets of nodes based on the observation of the real-world network over time. The program is also configured to cause the at least one processor to fit an extended PMF model to the matrix for the two sets of nodes to learn posterior estimates for model parameters for predictive analytical purposes, the extended PMF model incorporating node-specific covariates for the two sets of nodes, modeling sparsity on latent feature matrices for the two sets of nodes, accounting for seasonal effects, or any combination thereof, to predict links. The program is further configured to cause the at least one processor to use the learned posterior estimates for the model parameters to make predictions for future network observations or to determine anomaly scores about links observed after the training period or previously unobserved links. Additionally, the program is configured to cause the at least one processor to output the predictions, the anomaly scores, the model
parameters themselves, or any combination thereof. The extended PMF model uses a variational inference procedure for binary matrices. The variational inference procedure provides inference on marginal posterior distributions of parameters for the two sets of nodes, as well as for the covariates since this underpins a predictive distribution on which edges are likely to be observed in the future.
[0014] In yet another embodiment, a computer-implemented method includes fitting, by a computing system, an extended PMF model to a matrix for two sets of nodes based on the observation of the real-world network over time to learn posterior estimates for model parameters for predictive analytical purposes, the extended PMF model incorporating node-specific covariates for the two sets of nodes, modeling sparsity on latent feature matrices for the two sets of nodes, accounting for seasonal effects, or any combination thereof, to predict links. The computer-implemented method also includes using the learned posterior estimates for the model parameters, by the computing system, to make predictions for future network observations or to determine anomaly scores about links observed after the training period or previously unobserved links and outputting the predictions, the anomaly scores, the model parameters themselves, or any combination thereof, by the computing system.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] In order that the advantages of certain embodiments of the invention will be readily understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings. While it should be understood that these drawings depict only
typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:
[0016] FIG. 1 is a graph illustrating a full extended Poisson matrix factorization (PMF) model, according to an embodiment of the present invention.
[0017] FIG. 2 is a flowchart illustrating a process for new link prediction using an extended PMF model, according to an embodiment of the present invention.
[0018] FIG. 3 is a block diagram illustrating a computing system configured to perform new link prediction using an extended PMF model, according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0019] Some embodiments of the present invention pertain to an extension to, and a practical application of, a Poisson matrix factorization (PMF) model for binary matrices. PMF is extended in some embodiments to include scenarios that are commonly encountered in real life networks, such as those motivated by applications in cybersecurity. For the purposes of this disclosure, the methodology is described in the context of computer networks. However, it should be noted that the technique of some embodiments may be extended to other practical applications, such as biological or social networks, without deviating from the scope of the invention.
[0020] In particular, the extension in some embodiments explicitly includes known covariates associated with the nodes. By way of nonlimiting example, and per the above, nodes may be users or computing systems, for instance. A doubly sparse PMF with
Indian Buffet Process (IBP) priors, which further refines the edge probabilities, may be used. A seasonal version of PMF may be employed to handle dynamic networks. Fast inference schemes using variational inference and Gibbs sampling may be employed. These techniques may be used individually or in combination. Such embodiments have shown improved performance over the standard PMF model and other known link prediction techniques in testing at Los Alamos National Laboratory.
[0021] In order to identify and predict links, it is typically beneficial to develop an understanding of the normal structure of the network graph, and then perform prediction procedures by associating a score with each link. The scores obtained for each edge can be used to flag the connection as anomalous if the event is probabilistically “surprising” according to the model (e.g., if the event is improbable by a predetermined threshold for that event based on past network analysis). In order to compute the scores, it is typically beneficial to use relatively computationally inexpensive and scalable techniques, given the huge amount of data and information available in most networks. PMF is a suitable model in this framework due to its flexibility and scalability. However, known covariate-level information about entities is typically discarded or included after the fact.
[0022] Accordingly, some embodiments incorporate this information into the learning process by extending the PMF model to allow for covariates for both sets of nodes (e.g., users and computing systems), showing improved performance in scoring and predicting links. The model of some embodiments may also incorporate seasonality, making the model applicable to a more realistic dynamic computer network setting. In other words, some embodiments provide new link prediction techniques that extend the PMF model by incorporating node-specific covariates for two sets of nodes (e.g., a first
set of nodes for users and a second set of nodes for items) in the PMF framework, modeling sparsity on both latent feature matrices for each set of nodes, and/or accounting for seasonal effects (i.e., seasonality) to predict links. The standard PMF model may therefore be extended in three directions in some embodiments, which can all be implemented simultaneously in certain embodiments to produce more accurate inference and prediction. Furthermore, the model of some embodiments may be extended to properly deal with binary edges, where only the existence of an edge is observed rather than an associated count along the edge, corresponding to the number of observed links between the two nodes during a given time period.
[0023] It is assumed herein that a computer network can be represented by a bipartite dynamic graph G_t = (U, V, E_t) observed at discrete time intervals, where U is the set of users (e.g., user accounts), V is the set of items (e.g., computing systems, also referred to as “hosts” herein), and the set E_t ⊆ U × V is the observed set of edges over a time period (t − 1, t]. Assuming a given user i and a host j, (i, j) ∈ E_t if i connects to j within the time period (t − 1, t]. From E_t, a rectangular adjacency matrix A_t can readily be obtained, with {A_t}_ij = 1_{E_t}{(i, j)}, where 1_S(x) denotes the indicator function: given a set S and an atom x, 1_S(x) = 1 if x ∈ S, and 0 otherwise. Given a single adjacency matrix A, or a sequence of observed adjacency matrices A_1, ..., A_T, the objective of the link prediction procedure of some embodiments is to reliably predict the structure of the subsequently observed graph A_{T+1}.
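By way of nonlimiting illustration, the construction of the adjacency matrix from an observed edge set can be sketched in Python as follows. The function name `adjacency_matrix` and the example user and host names are hypothetical and are chosen only for this sketch.

```python
def adjacency_matrix(users, hosts, edges):
    """Return the |U| x |V| binary matrix A with A[i][j] = 1 iff (users[i], hosts[j]) is in edges."""
    user_index = {u: i for i, u in enumerate(users)}
    host_index = {h: j for j, h in enumerate(hosts)}
    A = [[0] * len(hosts) for _ in users]
    for u, h in edges:
        # Repeated connections within the time period collapse to a single 1.
        A[user_index[u]][host_index[h]] = 1
    return A


# Hypothetical observations for one time period (t - 1, t].
users = ["alice", "bob"]
hosts = ["db01", "web01", "mail01"]
edges_t = [("alice", "db01"), ("alice", "db01"), ("bob", "mail01")]
A_t = adjacency_matrix(users, hosts, edges_t)
```

A dense list of lists is used only for readability; for networks of realistic size, a sparse representation that stores only the observed pairs would typically be preferred.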
[0024] POISSON MATRIX FACTORIZATION (PMF) AND DEFICIENCIES FOR NEW LINK PREDICTION
[0025] For |U| users and |V| items, let N ∈ ℕ^{|U| × |V|} be a matrix containing counts N_ij, representing the number of times the i-th user connected to the j-th item. The hierarchical Poisson factorization model (HPMF) models the counts N_ij using a Poisson link function with a rate given by the inner product between user-specific latent features α_i and item-specific latent features β_j:

N_ij ~ Poisson(α_i^T β_j) = Poisson(Σ_{r=1}^{R} α_ir β_jr).
[0026] Standard PMF has prior distributions on the latent features α and β. However, standard PMF does not place a prior on the parameters of those prior distributions, whereas HPMF does. This second layer of priors is what makes it hierarchical.
[0027] The specification of the model is completed in a Bayesian framework using Gamma hierarchical priors on the latent parameters:
[0028] A relevant advantage of PMF over competing models is that the likelihood only depends on the number of observed links (i.e., evaluating the likelihood is O(nnz(N)), where nnz(·) is the number of non-zero elements in the matrix). Matrices observed in real-world applications tend to be extremely sparse, with nnz(N) ≪ |U| × |V|. This makes the PMF scalable to graphs of enormous sizes using relatively straightforward algorithms.
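The scalability argument can be illustrated with a minimal Python sketch, under the assumption that the rate for the pair (i, j) is the inner product of R-dimensional latent feature vectors. The sum of all rates factorizes over the latent dimension, so only the non-zero counts need to be visited individually. The function names and numerical values are hypothetical.

```python
import math


def poisson_loglik_sparse(nonzeros, alpha, beta):
    """Poisson log-likelihood touching only non-zero counts.

    nonzeros: dict mapping (i, j) -> count N_ij > 0
    alpha: |U| x R list of user features; beta: |V| x R list of item features
    """
    R = len(alpha[0])
    # sum_ij sum_r alpha_ir * beta_jr = sum_r (sum_i alpha_ir) * (sum_j beta_jr)
    total_rate = sum(
        sum(a[r] for a in alpha) * sum(b[r] for b in beta) for r in range(R)
    )
    loglik = -total_rate
    for (i, j), n in nonzeros.items():
        rate = sum(alpha[i][r] * beta[j][r] for r in range(R))
        loglik += n * math.log(rate) - math.lgamma(n + 1)
    return loglik


def poisson_loglik_dense(N, alpha, beta):
    """Reference computation that visits every entry of the count matrix."""
    loglik = 0.0
    for i, row in enumerate(N):
        for j, n in enumerate(row):
            rate = sum(a * b for a, b in zip(alpha[i], beta[j]))
            loglik += n * math.log(rate) - rate - math.lgamma(n + 1)
    return loglik


alpha = [[0.5, 1.0], [2.0, 0.1]]
beta = [[1.0, 0.3], [0.2, 0.7], [0.4, 0.4]]
N = [[2, 0, 0], [0, 1, 0]]
nonzeros = {(0, 0): 2, (1, 1): 1}
sparse_value = poisson_loglik_sparse(nonzeros, alpha, beta)
dense_value = poisson_loglik_dense(N, alpha, beta)
```

Both functions return the same value, but the sparse version costs on the order of R · (nnz(N) + |U| + |V|) operations rather than R · |U| · |V|.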
[0029] The PMF model has been used as a building block for multiple extensions. For example, a social Poisson factorization (SPF) has been developed that includes the latent social influence in the recommender system. It has also been proposed to combine PMF with the standard collective matrix factorization model to tackle the problem of cold-starts and jointly model relational matrices. A collaborative topic Poisson factorization (CTPF) has been developed that adds a document topic offset to the standard PMF model to provide content-based recommendations. Note that this allows for item-specific covariate information to be incorporated into the model, but not user-specific covariate information. The approach of some embodiments in this article allows for both user-specific and item-specific covariate information to be included.
[0030] Model selection issues have been tackled previously, where a Bayesian non-parametric model for automatic choice of the number of latent features R is developed based on the Gamma process. The Gamma process construction has also been used to jointly model the adjacency matrix and side information. Content and social trust information have previously been included in the PMF framework. It should be noted that a concept related to PMF is Poisson factor analysis (PFA), which has been further extended to hierarchical (deep) topic models. Notably, all of these approaches are constructed for count matrices only. As such, they have not been appropriately adapted to the case of binary matrices, nor do they provide any indication as to why one may do so.
[0031] Sparse latent features are considered in a number of different models. A generic framework for modeling dyadic data, called binary matrix factorization (BMF), has been proposed using a product of sparse matrices and a matrix of weights. The model has been generalized to a Bayesian non-parametric context with Indian buffet process (IBP) priors. It should be noted that “priors” means “prior distributions.” A prior distribution encompasses the prior knowledge about the properties of the parameters of the process. In this context, priors are chosen such that the property of conjugacy holds, which makes inference procedures analytically tractable. A non-negative matrix factorization model has also been proposed with Poisson likelihood with sparsity constraints imposed only on one of the two matrices in the decomposition, and structured stochastic mean-field variational inference is used to infer the model parameters. However, in some embodiments, sparsity is imposed on both matrices.
[0032] Dynamic extensions to PMF have also been studied. Kalman filter updates have been used to dynamically correct the rates of the Poisson distribution. A temporal version of PMF has been proposed using the two main tensor factorization algorithms: CP decomposition (also known as CANDECOMP/PARAFAC, or canonical decomposition/parallel factors decomposition) and Tucker decomposition. It has also been suggested to combine the PMF model with the Poisson process to produce dynamic recommendations. In general, despite the extensive treatments of PMF in a dynamic context, seasonality has not been explicitly accounted for. As such, some embodiments employ a seasonal PMF model.
[0033] INCLUDING COVARIATES IN THE PMF MODEL
[0034] In cybersecurity applications, the counts associated with the links are usually extremely difficult to model due to repeated observations, beaconing behavior, and intrinsic burstiness of the events. As a result, the Poisson model for N_ij is likely not appropriate and, arguably, no parametric distribution is able to reliably capture the properties of counts of connections between computing systems. Therefore, some embodiments work directly with the adjacency matrix A, obtained by setting A_ij = 1_{ℕ+}(N_ij). Existing approaches often retain the Poisson link for binary data for convenience, despite the difference in the ranges. In some embodiments, the PMF model is modified to treat the counts N_ij as latent variables and to treat A_ij as a censored count. This type of link has been previously used, and is sometimes referred to as the Bernoulli-Poisson (BerPo) link, where Gibbs sampling is commonly used for inference. Variational inference schemes have been successfully used for PMF models with a number of different link functions and for matrices of counts. For binary matrices, Gibbs sampling or hybrid approaches (e.g., structured stochastic variational inference, which involves Gibbs sampling steps) are typically used. In some embodiments, a variational inference scheme for binary matrices is used instead.
[0035] Moreover, in many applications, users and items (or any other desired two sets of nodes) usually have associated covariates. Suppose that there are K covariates associated with each user and H covariates for each item. Let the value of the covariate k for the user i be denoted as x_ik. Similarly, let the value of the covariate h for the item j be y_jh. In cybersecurity applications, and more generically in network applications, the main interest is typically in categorical covariates, which provide known groupings or clusters of nodes. Covariates group or cluster nodes by dividing a group of nodes into different groups. For instance, the covariate “job title” divides users into different groups according to their jobs, such as managers or scientists. Therefore, for the remainder of this disclosure, the covariates are assumed to be categorical. It is also assumed that the observed values A_ij are obtained from binary truncations of Poisson draws using the following hierarchical model:

A_ij = 1_{ℕ+}(N_ij),
N_ij ~ Poisson(α_i^T β_j + 1_K^T (Φ ⊙ x_i y_j^T) 1_H) = Poisson(Σ_{r=1}^{R} α_ir β_jr + Σ_{k=1}^{K} Σ_{h=1}^{H} Φ_kh x_ik y_jh),   (2)

[0036] where 1_n is a vector of n ones, ⊙ is the Hadamard element-wise product, and x_i = {x_ik} and y_j = {y_jh} are the K-dimensional and H-dimensional binary vectors of covariates. In the model of Eq. (2), Φ = {Φ_kh} is a K × H matrix of
interaction terms for each combination of the covariates.
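As a nonlimiting sketch, and assuming that the Poisson rate is the sum of a latent-feature inner product and the covariate interaction terms selected by the binary covariate vectors, the probability of observing a link under the binary truncation can be computed as one minus the Poisson probability of a zero count. The function names and all numerical values below are hypothetical.

```python
import math


def link_rate(alpha_i, beta_j, Phi, x_i, y_j):
    """Poisson rate: latent-feature inner product plus covariate interaction terms."""
    latent = sum(a * b for a, b in zip(alpha_i, beta_j))
    covariate = sum(
        Phi[k][h] * x_i[k] * y_j[h]
        for k in range(len(x_i))
        for h in range(len(y_j))
    )
    return latent + covariate


def link_probability(rate):
    """P(A_ij = 1) = P(N_ij > 0) = 1 - exp(-rate) for a Poisson count N_ij."""
    return 1.0 - math.exp(-rate)


# Hypothetical one-hot covariates: user categories (manager, scientist),
# host categories (office, research lab).
x_i = [1, 0]                      # user i is a manager
y_j = [0, 1]                      # host j is in a research lab
Phi = [[0.1, 0.8], [0.3, 0.2]]    # interaction term for each (k, h) pair
alpha_i = [0.2, 0.5]
beta_j = [1.0, 0.4]

rate_ij = link_rate(alpha_i, beta_j, Phi, x_i, y_j)
prob_ij = link_probability(rate_ij)
```

With one-hot categorical covariates, exactly one interaction term is selected per covariate pairing, here the manager and research lab entry.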
[0037] Assume for a cybersecurity example that a covariate for the employment type “manager” is used for the users and that a covariate for the location “research lab” is used for the hosts. If the user i is a manager and the host j is located in a research laboratory, then Φ_kh expresses a correction to the rate for a manager connecting to a research laboratory. The link for the covariates is inspired by the bilinear mixed-effects models for network data. The same priors given in Eq. (1) are used for α_i and β_j, and the following prior distribution completes the specification of the model:
[0038] Given an observed matrix A, inference is on the marginal posterior distributions of the parameters α_i and β_j for all the users and items, and of Φ for the covariates, since this underpins the predictive distribution on which edges are likely to be observed in the future. Inference is straightforward using Gibbs sampling since the prior distributions are chosen to be conjugate, but sampling-based approaches do not scale well with the size of the network. Therefore, inference is usually performed using variational inference, which turns the problem of sampling from the posterior into an optimization task.
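[0039] In order to perform inference efficiently, a common latent variable approach is used. Given the unobserved count N_ij, a further set of latent variables Z_ijl, l = 1, ..., R + KH, is added. Z_ijl represents the contribution of the component l to the total latent count N_ij. For l = 1, ..., R:

Z_ijl ~ Poisson(α_il β_jl),

and for l = R + 1, ..., R + KH, with l indexing the covariate pair (k, h):

Z_ijl ~ Poisson(Φ_kh x_ik y_jh), with N_ij = Σ_{l=1}^{R+KH} Z_ijl.

[0041] This construction ensures that N_ij has precisely the Poisson distribution specified in Eq. (2).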
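The latent variable construction can be sketched in Python under two assumptions that follow from standard properties of the Poisson distribution: given an observed link, the latent count is a zero-truncated Poisson draw with rate equal to the sum of the component rates, and, given the count, the component contributions are multinomial with probabilities proportional to the component rates. The function names and rates are hypothetical, and this is an illustration rather than the sampler of any particular embodiment.

```python
import math
import random


def sample_zero_truncated_poisson(rate, rng):
    """Inverse-CDF draw from a Poisson(rate) conditioned on being at least 1."""
    u = rng.random() * (1.0 - math.exp(-rate))  # uniform over the mass of {1, 2, ...}
    k = 1
    p = rate * math.exp(-rate)                  # P(N = 1)
    cum = p
    while u > cum and p > 0.0:                  # p > 0 guards against round-off
        k += 1
        p *= rate / k                           # P(N = k) from P(N = k - 1)
        cum += p
    return k


def sample_latent_counts(a_ij, component_rates, rng):
    """Draw (N_ij, Z_ij) given the binary observation a_ij and the component rates."""
    if a_ij == 0:
        # No link observed: the latent count and all contributions are zero.
        return 0, [0] * len(component_rates)
    n = sample_zero_truncated_poisson(sum(component_rates), rng)
    z = [0] * len(component_rates)
    # Allocate each of the n events to a component, proportionally to its rate.
    for l in rng.choices(range(len(component_rates)), weights=component_rates, k=n):
        z[l] += 1
    return n, z


rng = random.Random(0)
n_ij, z_ij = sample_latent_counts(1, [0.3, 0.2, 0.5], rng)
```

The contributions always sum to the latent count, which is what allows each component to be updated separately during inference.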
[0042] GIBBS SAMPLING
[0043] Since the prior distributions are chosen to be conjugate, the conditionals are all available analytically. First, note that the latent vector
[0044] where is the probability vector proportional to
[0045] where Pois+(·) denotes the zero-truncated Poisson distribution. The complete conditionals for the user and item latent features are Gamma, where
[0048] where l is the index corresponding to the covariate pair (k, h).
[0049] VARIATIONAL INFERENCE
[0050] Variational inference is an optimization-based technique for approximating intractable posterior distributions, such as
[0051] with a proxy distribution
[0052] from a given family and then finding the member q*(·) of the family that minimizes the Kullback-Leibler (KL) divergence to the true posterior. Usually, the KL divergence cannot be explicitly computed. Therefore, an alternative equivalent objective, called the evidence lower bound (ELBO), is maximized instead:
[0053] The proxy distribution q(.) is usually chosen to make the above approximation possible, and is in a much simpler form than the posterior distribution. The mean-field variational family is used, where the latent variables in the posterior are considered to be independent and governed by their own distribution, such that:
conditional for each of the parameters given above in the discussion of Gibbs sampling, taking advantage of the fact that the complete conditionals are in the exponential family. Note that under the approximation in Eq. (7), the ELBO given by Eq. (6) is analytically tractable.
[0056] The variational parameters are optimized using coordinate ascent mean-field variational inference (CAVI), where each parameter is optimized while holding the others fixed. Using this algorithm, the optimal form of the variational factors is:

q*(v_j) ∝ exp{E_{-j}[log p(v_j | v_{-j}, A)]},

[0057] where v_j is an element of a “partition” of the full set of parameters v, and v_{-j} denotes the full set v excluding the parameters in the subset v_j. Importantly, the expectation E_{-j} is taken with respect to the variational approximation for the parameters v_{-j}, excluding the component v_j. Under the mean-field assumption,
[0058] The full variational inference algorithm is detailed in Algorithm 1 below.
(4) repeat
(5) for each entry of A such that Aij > 0, update the rate of the truncated Poisson distribution for
where ψ(·) is the digamma function;
where l in the lower equation corresponds to a pair (k, h)
(7) update the user-specific parameters:
(8) update the item-specific parameters:
(9) update the covariate-specific parameters:
(10) until convergence (in ELBO or predictive log-likelihood on a held- out dataset).
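As a nonlimiting sketch of the kind of computation performed inside the loop, the following Python code applies the standard result for variational inference in Poisson factorization: when the variational factors are Gamma distributions with shape and rate parameters, the multinomial probabilities for the latent-feature components are proportional to the exponential of the expected log features, where the expected log of a Gamma variable is the digamma function of the shape minus the log of the rate. The covariate components are omitted, the digamma approximation is a hypothetical helper written for this sketch, and all parameter values are illustrative.

```python
import math


def digamma(x):
    """Digamma function for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x   # psi(x) = psi(x + 1) - 1 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))


def expected_log_gamma(shape, rate):
    """E[log X] for X ~ Gamma(shape, rate)."""
    return digamma(shape) - math.log(rate)


def update_probabilities(elog_alpha_i, elog_beta_j):
    """Normalized exp(E[log alpha_ir] + E[log beta_jr]) over components r."""
    logits = [a + b for a, b in zip(elog_alpha_i, elog_beta_j)]
    m = max(logits)                          # subtract the max for numerical stability
    weights = [math.exp(v - m) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical variational parameters (shp, rte) for one user and one item, R = 3.
alpha_params = [(2.0, 1.0), (0.5, 2.0), (1.5, 1.5)]
beta_params = [(1.0, 1.0), (3.0, 0.5), (0.8, 2.0)]
elog_alpha = [expected_log_gamma(s, r) for s, r in alpha_params]
elog_beta = [expected_log_gamma(s, r) for s, r in beta_params]
chi_ij = update_probabilities(elog_alpha, elog_beta)
```

In keeping with the sparsity argument, such a vector would only be computed for the pairs (i, j) with an observed link.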
[0059] Obtaining the update equations for the variational parameters is straightforward, and similar to the variational inference algorithm for standard Poisson factorization. Note that these are all parameters of Gamma distributions, each governed by a shape and a rate, referred to in Algorithm 1 with the superscripts “shp” and “rte,” respectively. The superscripts “shp” and “rte” refer to the two parameters that fully characterize a Gamma distribution. The shape parameter shp controls the shape of the distribution and the rate parameter rte controls its variability. Note that all the update equations in Algorithm 1 only depend on the elements of the matrix where A_ij > 0, providing computational efficiency for large sparse matrices.
[0060] Convergence of the CAVI algorithm is determined by monitoring the change in the ELBO. As the ELBO can have many local optima, it can be highly dependent on the initial starting values. Therefore, it is generally advisable to run the algorithm multiple times using different starting points. Also, computing the ELBO on very large matrices is computationally costly. Assessing convergence may be achieved by calculating the average predictive log-likelihood on a small held-out dataset, providing a proxy to calculating the ELBO on the entire dataset.
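A minimal Python sketch of the held-out convergence check follows. It assumes that each held-out pair is scored with a Bernoulli log-likelihood whose success probability is one minus the exponential of the negative fitted rate, and that iteration stops once the relative change between successive averages falls below a tolerance. The function names, the `rate_fn` callback, and the tolerance value are hypothetical.

```python
import math


def heldout_avg_loglik(heldout, rate_fn):
    """Average Bernoulli log-likelihood over held-out (i, j, a_ij) triples."""
    total = 0.0
    for i, j, a in heldout:
        p = 1.0 - math.exp(-rate_fn(i, j))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
        total += math.log(p) if a else math.log(1.0 - p)
    return total / len(heldout)


def has_converged(history, tol=1e-4):
    """True when the last two values differ by at most tol in relative terms."""
    if len(history) < 2:
        return False
    prev, curr = history[-2], history[-1]
    return abs(curr - prev) <= tol * abs(prev)


# Hypothetical held-out set and a constant fitted rate of log(2), so p = 0.5.
heldout = [(0, 0, 1), (0, 1, 0), (1, 2, 1)]
avg = heldout_avg_loglik(heldout, lambda i, j: math.log(2.0))
```

In practice, `rate_fn` would evaluate the fitted rate from the current variational parameters, and the history would be extended once per iteration or once every few iterations.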
[0061] LINK PREDICTION AND ANOMALY DETECTION
[0062] If a Gibbs sampler is used for inference then, given S samples from the joint posterior, an estimate can be obtained for the posterior predictive distribution of future observations
[0063] Similarly, given the optimized values of the parameters of the variational approximation to the posterior, the estimate can be equivalently obtained using Eq.
[0065] where, for example, for the Gibbs sampler, or
computational burden. The approximation in Eq. (12) has been successfully used for link prediction and network anomaly detection purposes in Turcotte et al. (2016). However, if available, it is in general strongly recommended to use Eq. (11), which is a standard Monte Carlo estimate of a probability.
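The difference between a Monte Carlo estimate and a cheaper approximation can be illustrated with the following Python sketch. It assumes that the cheaper approximation plugs posterior mean parameters into the link function, which is an assumption of this sketch rather than a restatement of Eq. (12), and it omits the covariate terms for brevity. The sample values are hypothetical.

```python
import math


def rate(alpha_i, beta_j):
    """Latent-feature inner product used as the Poisson rate."""
    return sum(a * b for a, b in zip(alpha_i, beta_j))


def monte_carlo_link_probability(alpha_samples, beta_samples):
    """Average of 1 - exp(-rate) over S joint posterior samples."""
    probs = [
        1.0 - math.exp(-rate(a, b)) for a, b in zip(alpha_samples, beta_samples)
    ]
    return sum(probs) / len(probs)


def plugin_link_probability(alpha_samples, beta_samples):
    """1 - exp(-rate) evaluated once at the posterior means of the parameters."""
    S = len(alpha_samples)
    R = len(alpha_samples[0])
    alpha_mean = [sum(s[r] for s in alpha_samples) / S for r in range(R)]
    beta_mean = [sum(s[r] for s in beta_samples) / S for r in range(R)]
    return 1.0 - math.exp(-rate(alpha_mean, beta_mean))


# Two hypothetical posterior samples for one user and one host, R = 1.
alpha_samples = [[0.1], [1.9]]
beta_samples = [[1.0], [1.0]]
mc = monte_carlo_link_probability(alpha_samples, beta_samples)
plugin = plugin_link_probability(alpha_samples, beta_samples)
```

In this example the two estimates differ because the link function is nonlinear in the rate, which is consistent with the recommendation to prefer the Monte Carlo estimate when posterior samples are available.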
[0066] The problem statement with respect to anomaly detection is to determine whether an observed user-item pair is normal with respect to the model parameters learned over some training period. An anomaly score can be given by the posterior predictive upper tail p-value. As Aij is a Bernoulli random variable, this is equivalent
[0067] DOUBLY SPARSE PMF WITH IBP PRIORS
[0068] One of the main limitations of the PMF model from Eq. (2) is that the coefficients are non-negative, and they all contribute to the
summation in the Poisson rate. Also, the number of latent features R must be specified in advance. The most common criteria for selection of R used in the literature, inspired by the literature on principal component analysis, are based on visual inspection of the scree plot of singular values and on the position of an elbow in the graph, or alternatively, on maximizing the predictive performance on a held-out data set over various values of the number of latent features R. In this section, the PMF model given in Eq. (2) is generalized by introducing binary coefficients D ∈ {0,1} used to switch the latent variables α_ir and β_jr on or off, allowing for a much sparser representation. It should be noted that binary variables for the covariate coefficients could also be added to simultaneously turn the covariate pairs on and off. The resulting model is an extension to Beta-Process Non-Negative Matrix Factorization (BPNNMF):
the corresponding vectors. The model in Eq. (13) is called “doubly sparse” since it allows for sparsity on α and β simultaneously. It should be noted that in this case, the number of latent features is not restricted to a fixed value R, but allowed to be potentially infinite. The binary indicators are a variable selection tool that can be used to assess
the impact of each covariate on the link probabilities. A suitable prior on the infinite matrices D and
is the IBP, which is used in the Bayesian non-parametric literature primarily for latent feature models. The IBP is the infinite limit of a Beta-Bernoulli process:
[0070] This approximation is particularly convenient for the model given in Eq. (13) for reasons that will be discussed later herein. Similarly, equivalent priors can be placed on the binary variables corresponding to the covariates:
[0071] INFERENCE VIA GIBBS SAMPLING
[0072] It should be noted that coordinate ascent mean-field variational inference in this model cannot be trivially applied since in Eq. (16), the expectation infinite, as
slight modification of those given in the discussion of Gibbs sampling above. Alternatively, structured stochastic variational inference with Gibbs sampling, which is a hybrid between the two techniques, could be used. The conditional distribution for Nij and Zij follows Eq. (3), except that the rate for the Poisson and probability vectors for the Multinomial will now depend on the binary variables. The conditional distribution of air , conditioned on Da and Db, is
approximation:
[0075] and a similar equation can be obtained for In the full IBP setting,
0 in Eq. (14), and new non-zero columns should be resampled. In the linear Gaussian model, an explicit expression has been derived for the marginal likelihood for this type of move. In this model, this step is particularly complicated. If a new non-empty column is proposed in Da, new columns should also be proposed for Db, α and β. Hence, the Beta-Bernoulli approximation (or finite-dimensional IBP) is used for simplicity.
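A finite-dimensional Beta-Bernoulli prior on a binary switch matrix can be sketched in Python as follows. The parameterization used here, in which each column probability is drawn from a Beta distribution with first parameter equal to a concentration divided by the number of features and second parameter equal to one, is the common finite approximation to the IBP and is an assumption of this sketch. The function names, the masked rate, and all values are hypothetical.

```python
import random


def sample_beta_bernoulli(n_rows, n_features, concentration, rng):
    """Draw column probabilities and an n_rows x n_features binary matrix."""
    probs = [
        rng.betavariate(concentration / n_features, 1.0) for _ in range(n_features)
    ]
    D = [[1 if rng.random() < p else 0 for p in probs] for _ in range(n_rows)]
    return probs, D


def masked_rate(d_alpha_i, alpha_i, d_beta_j, beta_j):
    """Rate in which a component contributes only if both switches are on."""
    return sum(
        da * a * db * b
        for da, a, db, b in zip(d_alpha_i, alpha_i, d_beta_j, beta_j)
    )


rng = random.Random(1)
probs, D_alpha = sample_beta_bernoulli(4, 5, 2.0, rng)
example_rate = masked_rate([1, 0], [0.5, 2.0], [1, 1], [1.0, 3.0])
```

Switching a component off on either side removes its contribution to the rate, which is what allows sparsity on both latent feature matrices at once.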
is:
[0078] DYNAMIC NETWORKS AND SEASONAL PMF
[0079] In the previous sections, it has been assumed that a single adjacency matrix A is available. Now consider that a discrete sequence of adjacency matrices A_1, ..., A_T is observed, where it is assumed that the sequence has seasonal dynamics with some known fixed seasonal period P. To include time dependence, a third index t is added to some of the parameters to denote the time at which an adjacency matrix was observed. As before, the counts N_ijt are treated as latent variables, and the sequence of observed adjacency matrices is obtained as A_ijt = 1_{ℕ+}(N_ijt). The latent counts are modeled as follows:
[0080] where g: ℕ+ → {1, ..., P} maps the observation time t to a seasonal segment. For example, with a fixed seasonal period of a week and daily observations, g(t) = t mod 7 + 1 could correspond to each day of the week. The priors on α_ir and β_jr do not change from the previous sections, and represent a baseline level of activity, which is constant over time. On the other hand, represent
should be noted that for some applications, it may not be expected that there is a seasonal adjustment to the rate for the interaction terms of the covariates, in which case the dependency on ω_{kh g(t)} could be dropped. For identifiability, it may be necessary to impose constraints on the seasonal adjustments. For example,
dropped, but the model can be appropriately modified using the same technique presented in Eq. (13). For example, α_i in Eq. (15) could be replaced with
[0082] Inference in the seasonal model follows the same principles used in the previous sections. Gibbs sampling and variational inference procedures can be used, and details and equations are discussed further below.
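The seasonal mapping g described above for daily observations with a weekly period can be sketched in Python as follows. The function names are hypothetical, and the grouping helper is included only to show which observation times would share the same seasonal parameters.

```python
def seasonal_segment(t, period=7):
    """g(t) = t mod period + 1, mapping t = 1, 2, ... to a segment in {1, ..., period}."""
    return t % period + 1


def group_by_segment(num_observations, period=7):
    """Group observation times 1..T by the seasonal segment they map to."""
    groups = {p: [] for p in range(1, period + 1)}
    for t in range(1, num_observations + 1):
        groups[seasonal_segment(t, period)].append(t)
    return groups


groups = group_by_segment(10)
```

Adjacency matrices whose observation times fall in the same group are modeled with the same seasonal adjustment, while the baseline latent features are shared across all times.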
[0083] VARIATIONAL INFERENCE IN THE PMF MODEL
[0084] All of the factors in the variational approximation given in Eq. (8) are of closed form and take the same distributional form of the complete conditionals for each of the parameters. Let denote the rate å of the
where k is a constant with respect to Nij and Zij· .
with the domain of Z_ij restricted to have
[0087] Evaluating the normalizing constants for the distribution in Eq. (17) gives the optimal variational distribution with the same form as Eq. (18) below, so
of the zero truncated Poisson is updated using (see Eq. (9) above for the final expression), and the update
for the vector of probabilities ij (see Eq. (10) above) is given by an extension of the standard result for variational inference in the PMF model:
[0089] INFERENCE IN THE SEASONAL MODEL
[0090] The Gibbs sampler for the seasonal model in Eq. (15) follows the same guidelines followed above for the non-seasonal models. Given the unobserved count N_ijt, latent variables are added, representing the contribution of the component l
to the total count The full conditional for follows Eq.
(3), except the rate for the Poisson and probability vectors for the Multinomial will now depend on the seasonal parameters Letting q denote a
Also:
to Eq. (4).
mean-field variational family is again used, implying a factorization similar to Eq. (7), so that:
form of the full conditional distributions for the corresponding parameter or group of parameters. Again, the variational parameters are updated using CAVI and a similar
algorithm is obtained to that detailed in Algorithm 1, where steps 7, 8, and 9 are modified to include the time-dependent parameters. It follows that for the user-specific parameters, the update equations take the form:
[0097] and similarly for
the expectation:
can be evaluated in a similar manner to Eq. (16).
[0100] Hence, one can derive the update equations for similar to in the
section above:
[0101] PMF EXTENSION
[0102] Some embodiments extend the PMF model by incorporating node-specific covariates for two sets of nodes (e.g., a first set of nodes for users and a second set of nodes for items) in the PMF framework, modeling sparsity on both latent feature matrices for each set of nodes, and/or accounting for seasonal effects (i.e., seasonality) to improve prediction of links. The standard PMF model may therefore be extended in three directions, which can all be implemented simultaneously in some embodiments to produce more accurate inference and prediction. The following equation summarizes the multiple models discussed herein:
that only the binary indicator is observed. “Censored” means that only whether two entities connected is assumed to be observed, and not the number of connections, as in most PMF applications. In other words, for a given user and host over a given time period, a link either is or is not observed (i.e., counts either exist or they do not). This differs from conventional PMF applications, in which the counts themselves are observed.
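The censoring described above can be illustrated with a short sketch. It assumes, for illustration only, that the latent count for a pair of nodes follows a Poisson distribution with a given rate, so that the probability of observing a link is one minus the probability of a zero count. The function names and the example data are hypothetical and are not taken from the specification.

```python
import math


def link_probability(rate: float) -> float:
    """P(A_ij = 1) when A_ij = 1{N_ij > 0} and N_ij ~ Poisson(rate)."""
    if rate < 0:
        raise ValueError("rate must be non-negative")
    # 1 - exp(-rate), computed in a numerically stable way for small rates.
    return -math.expm1(-rate)


def censor(counts):
    """Collapse a matrix of counts to the binary indicators that are observed."""
    return [[1 if n > 0 else 0 for n in row] for row in counts]


# Hypothetical latent counts for 2 users and 3 hosts.
counts = [[0, 3, 1], [2, 0, 0]]
print(censor(counts))
print(link_probability(1.0))
```

Only the output of `censor` would be available to the model; the counts themselves are treated as latent.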
[0104] Starting from the hierarchical PMF model discussed above, which only includes the latent features, covariates have been included through the matrix
movie ratings. If it is observed that a user watched a movie, but did not rate the movie, it is unknown whether and how much a user liked the movie. Based on application of the extended PMF of some embodiments, if covariates are known about the users and movies, it can be predicted from the covariate coefficients and the latent features and
whether the user would have liked the movie. The covariates can improve the predictive performance, especially in cases where there is little to no known information about what other movies the user has watched. Seasonal adjustments for the coefficients are obtained through the variables Similar to
seasonal corrections to F, and its elements are denoted
Inference schemes using variational inference and Gibbs sampling are discussed. In particular, a variational inference scheme for the Bernoulli-Poisson link is proposed. The model is summarized graphically in graph 100 of FIG. 1.
[0106] The techniques of some embodiments have been applied to a user authentication graph of the Los Alamos National Laboratory, showing improvements over competing models for link prediction purposes. Including covariates improves area under the curve (AUC) scores and average predictive log-likelihoods. Including covariates also enables prediction of cold-starts where new nodes enter the network. Using binary variables for doubly sparse latent features guarantees improvements in the log-likelihood on a held-out dataset. Seasonal corrections enable calculation of time-varying anomaly scores and produce different predictions on different time frames, producing more accurate results.
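The two evaluation metrics mentioned above, AUC and average predictive log-likelihood, can be computed on a held-out set of binary links roughly as follows. This is a minimal sketch; the function names, the clipping constant, and the example data are assumptions made for illustration, and the specification does not state how the reported scores were computed.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_log_likelihood(probs, labels, eps=1e-12):
    """Mean Bernoulli log-likelihood of held-out links under predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(labels)


# Hypothetical predicted link probabilities and held-out outcomes.
predicted = [0.9, 0.8, 0.3, 0.1]
observed = [1, 0, 1, 0]
print(auc(predicted, observed))
print(average_log_likelihood(predicted, observed))
```

The pairwise AUC is quadratic in the number of held-out pairs, which is acceptable for a sketch but would likely be replaced by a rank-based computation on a large network.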
[0107] The complexity of some real-world networks (e.g., larger computer networks) requires adaptations to the standard PMF model in order to be practical. Often, nodes within a network have associated covariates providing prior knowledge about groupings of nodes, which can be used to improve the predictive power of the model. Also, some networks are intrinsically dynamic with strong seasonal patterns. Therefore, it is important to include the time of observation to obtain more accurate predictions of observed links in some cases. For example, in the computer network application where the observed network is users authenticating to computers, it may be normal for a user U to connect to computing system X during the week. However, on the weekend, this behavior may be extremely abnormal. Without incorporating seasonal effects, this information would be lost. Also, by using sparse latent feature matrices, each latent feature is associated only with a subset of the nodes. This allows for a more precise assessment of the probability of a link, provides a framework for model selection, and can be used to automatically select the appropriate number of latent features.
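The weekday versus weekend example can be made concrete with a small sketch. It assumes, purely for illustration, that the seasonal effect enters as a per-period multiplier on a Poisson rate; this multiplicative form, the numeric values, and the function names are assumptions and are not claimed to match the parameterization of Eq. (15).

```python
import math


def seasonal_link_probability(base_rate: float, seasonal_factor: float) -> float:
    """Link probability when the Poisson rate is scaled by a period-specific factor."""
    if base_rate < 0 or seasonal_factor < 0:
        raise ValueError("rate and seasonal factor must be non-negative")
    return -math.expm1(-base_rate * seasonal_factor)


def anomaly_score(probability: float) -> float:
    """Surprise of an observed link; larger values are more anomalous."""
    return -math.log(probability)


# Hypothetical values for user U connecting to computing system X.
base_rate = 2.0
factors = {"weekday": 1.0, "weekend": 0.01}
for period, factor in factors.items():
    p = seasonal_link_probability(base_rate, factor)
    print(period, round(p, 4), round(anomaly_score(p), 2))
```

With these illustrative numbers, the same connection is unremarkable on a weekday and receives a much higher anomaly score on the weekend, which is the information a non-seasonal model would lose.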
[0108] FIG. 2 is a flowchart illustrating a process for link prediction using an extended PMF model, according to an embodiment of the present invention. The process begins with observing a real-world network over time and constructing a matrix for the two sets of nodes (e.g., one for users and another for items) based on the observations at 210 over a training period. An extended PMF model is then fit to the matrix for the two sets of nodes at 220 to learn posterior estimates for the model parameters for predictive analytical purposes. The extended PMF model incorporates node-specific covariates for the two sets of nodes in the graph, models sparsity on latent feature matrices for the two sets of nodes, accounts for seasonal effects, or any combination thereof, to predict the links.
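The matrix construction at 210 might look like the following sketch, which turns a log of timestamped (source, destination) events into a rectangular binary adjacency matrix for a training window. The event format, the half-open window convention, and the function name are hypothetical choices made for illustration.

```python
def build_adjacency(events, t_start, t_end):
    """Build a binary user-by-host matrix from events observed in [t_start, t_end).

    events: iterable of (user, host, timestamp) tuples.
    Returns (users, hosts, matrix) where matrix[i][j] is 1 if users[i]
    connected to hosts[j] at least once during the window, else 0.
    """
    in_window = [(u, h) for u, h, t in events if t_start <= t < t_end]
    users = sorted({u for u, _ in in_window})
    hosts = sorted({h for _, h in in_window})
    user_index = {u: i for i, u in enumerate(users)}
    host_index = {h: j for j, h in enumerate(hosts)}
    matrix = [[0] * len(hosts) for _ in users]
    for u, h in in_window:
        # Repeated connections collapse to a single binary indicator.
        matrix[user_index[u]][host_index[h]] = 1
    return users, hosts, matrix


# Hypothetical authentication log; the last event falls outside the window.
events = [("alice", "h1", 1), ("alice", "h1", 2), ("bob", "h2", 3), ("carol", "h1", 12)]
users, hosts, matrix = build_adjacency(events, t_start=0, t_end=10)
print(users, hosts, matrix)
```

Nodes that first appear after the training window, such as the third user in this example, correspond to the cold-start case in which covariates carry the prediction.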
[0109] To actually learn parameters, variational inference or a Gibbs sampler may be used. Gibbs sampling is a common Markov chain Monte Carlo method for inference, while variational inference is a common optimization-based alternative that approximates the posterior. However, in some embodiments, variational inference and Gibbs samplers have been modified to account for binary edges.
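One way to picture a Gibbs step adapted to binary edges is sketched below: for a pair with an observed link, a latent count is drawn from a zero-truncated Poisson and then split across latent features with a multinomial draw. This is a simplified illustration that omits covariates, sparsity indicators, and seasonal terms, and it does not include the Gamma updates for the latent features. All names are hypothetical, and the inverse-CDF sampler is only suitable for moderate rates.

```python
import math
import random


def sample_zero_truncated_poisson(rate, rng):
    """Inverse-CDF draw from Poisson(rate) conditioned on the count being >= 1."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    u = rng.random()
    # P(N = 1 | N >= 1) = rate / (exp(rate) - 1)
    pmf = rate / math.expm1(rate)
    cdf = pmf
    n = 1
    while u > cdf:
        n += 1
        pmf *= rate / n  # P(N = n) = P(N = n - 1) * rate / n
        cdf += pmf
        if pmf == 0.0:  # guard against floating-point underflow in the tail
            break
    return n


def allocate_count(n, weights, rng):
    """Multinomial split of a count n across components proportional to weights."""
    counts = [0] * len(weights)
    for r in rng.choices(range(len(weights)), weights=weights, k=n):
        counts[r] += 1
    return counts


def blocked_step(alpha_i, beta_j, rng):
    """Jointly resample the latent count and its per-feature allocation for one link."""
    weights = [a * b for a, b in zip(alpha_i, beta_j)]
    n = sample_zero_truncated_poisson(sum(weights), rng)
    return n, allocate_count(n, weights, rng)


rng = random.Random(0)
n, z = blocked_step([0.5, 1.0], [1.0, 0.2], rng)
print(n, z)
```

For pairs with no observed link, the latent count is zero by construction, so no draw is needed.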
[0110] After these techniques have been applied, the parameters of the model have been learned. This estimate allows for predictions to be made for anomaly detection, recommendations, etc. The learned posterior estimates for the model parameters are thus used to make predictions for future network observations or to determine anomaly scores about links observed after the training period or previously unobserved links at 230. The predictions or anomaly scores (and in some embodiments, the model parameters themselves) are then output at 240 for review.
[0111] In some embodiments, the link prediction results may include the top-M most likely links for recommendations, the top-M least likely links for anomaly detection, etc. For anomaly detection, the results could include the top-M (or top-a%) most anomalous links, which could be further examined by security experts for assessment of the threat to the system. In addition, the parameters of the model could be output for secondary analyses for interpretability of results, such as to identify strongly linked users or items, or understand which covariates are important in making predictions.
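Selecting the top-M links from a set of predicted probabilities can be sketched as follows. The dictionary layout, the function name, and the example probabilities are hypothetical; in practice the probabilities would come from the fitted model.

```python
import heapq


def top_m_links(probabilities, m, most_likely=True):
    """Return the m links with the highest (or lowest) predicted probability.

    probabilities: dict mapping (source, destination) to a link probability.
    """
    pick = heapq.nlargest if most_likely else heapq.nsmallest
    return pick(m, probabilities.items(), key=lambda item: item[1])


# Hypothetical predicted probabilities for three user-host pairs.
probs = {("u1", "h1"): 0.9, ("u1", "h2"): 0.05, ("u2", "h1"): 0.4}

# Recommendation: most likely links.
print(top_m_links(probs, 1))
# Anomaly detection: least likely links among those observed.
print(top_m_links(probs, 2, most_likely=False))
```

For anomaly detection, the dictionary would typically be restricted to links actually observed after the training period, so that the lowest-probability entries are the most surprising observations.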
[0112] FIG. 3 is a block diagram illustrating a computing system configured to perform new link prediction using an extended PMF model, according to an embodiment of the present invention. Computing system 300 includes a bus 305 or other communication mechanism for communicating information, and processor(s) 310 coupled to bus 305 for processing information. Processor(s) 310 may be any type of general or specific purpose processor, including a central processing unit (CPU), application specific integrated circuit (ASIC), field programmable gate array (FPGA), etc. Processor(s) 310 may also have multiple processing cores, and at least some of the cores may be configured to perform specific functions. Multi-parallel processing may be used in some embodiments. Computing system 300 further includes a memory 315 for storing information and instructions to be executed by processor(s) 310. Memory 315 can be comprised of any combination of random access memory (RAM), read only memory (ROM), flash memory, cache, static storage such as a magnetic or optical disk, or any other types of non-transitory computer-readable media or combinations thereof. Additionally, computing system 300 includes a communication device 320, such as a transceiver and antenna, to wirelessly provide access to a communications network.
[0113] Non-transitory computer-readable media may be any available media that can be accessed by processor(s) 310 and may include volatile media, non-volatile media, or both. The media may also be removable, non-removable, or both.
[0114] Processor(s) 310 are further coupled via bus 305 to a display 325, such as a Liquid Crystal Display (LCD), for displaying information to a user. A keyboard 330 and a cursor control device 335, such as a computer mouse, are further coupled to bus 305 to enable a user to interface with computing system 300. However, in certain embodiments, a physical keyboard and mouse may not be present, and the user may interact with the device solely through display 325 and/or a touchpad (not shown). Any type and combination of input devices may be used as a matter of design choice. In certain embodiments, no physical input device is present. For instance, the user may interact with computing system 300 remotely via another computing system in communication therewith, or computing system 300 may operate autonomously.
[0115] Memory 315 stores software modules that provide functionality when executed by processor(s) 310. The modules include an operating system 340 for computing system 300. The modules further include a module 345 that is configured to perform new link prediction using an extended PMF model by employing any of the approaches discussed herein or derivatives thereof. Computing system 300 may include one or more additional functional modules 350 that include additional functionality.
[0116] One skilled in the art will appreciate that a “system” could be embodied as a server, an embedded computing system, a personal computer, a console, a personal digital assistant (PDA), a cell phone, a tablet computing device, or any other suitable computing device, or combination of devices. Presenting the above-described functions as being performed by a “system” is not intended to limit the scope of the present invention in any way, but is intended to provide one example of many embodiments of the present invention. Indeed, methods, systems and apparatuses disclosed herein may be implemented in localized and distributed forms consistent with computing technology, including cloud computing systems.
[0117] It should be noted that some of the system features described in this specification have been presented as modules, in order to more particularly emphasize their implementation independence. For example, a module may be implemented as a hardware circuit comprising custom very large scale integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, graphics processing units, or the like.
[0118] A module may also be at least partially implemented in software for execution by various types of processors. An identified unit of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions that may, for instance, be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the module and achieve the stated purpose for the module. Further, modules may be stored on a computer-readable medium, which may be, for instance, a hard disk drive, flash device, RAM, tape, or any other such medium used to store data.
[0119] Indeed, a module of executable code could be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Similarly, operational data may be identified and illustrated herein within modules, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network.
[0120] The process steps performed in FIG. 2 may be performed by a computer program, encoding instructions for the processor(s) to perform at least the process described in FIG. 2, in accordance with embodiments of the present invention. The computer program may be embodied on a non-transitory computer-readable medium. The computer-readable medium may be, but is not limited to, a hard disk drive, a flash device, RAM, a tape, or any other such medium used to store data. The computer program may include encoded instructions for controlling the processor(s) to implement the process described in FIG. 2, which may also be stored on the computer-readable medium.
[0121] The computer program can be implemented in hardware, software, or a hybrid implementation. The computer program can be composed of modules that are in operative communication with one another, and which are designed to pass information or instructions to display. The computer program can be configured to operate on a general purpose computer, an ASIC, or any other suitable device.
[0122] It will be readily understood that the components of various embodiments of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present invention, as represented in the attached figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.
[0123] The features, structures, or characteristics of the invention described throughout this specification may be combined in any suitable manner in one or more embodiments. For example, reference throughout this specification to “certain embodiments,” “some embodiments,” or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in certain embodiments,” “in some embodiments,” “in other embodiments,” or similar language throughout this specification do not necessarily all refer to the same group of embodiments and the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0124] It should be noted that reference throughout this specification to features, advantages, or similar language does not imply that all of the features and advantages that may be realized with the present invention should be or are in any single embodiment of the invention. Rather, language referring to the features and advantages is understood to mean that a specific feature, advantage, or characteristic described in connection with an embodiment is included in at least one embodiment of the present invention. Thus, discussion of the features and advantages, and similar language, throughout this specification may, but does not necessarily, refer to the same embodiment.
[0125] Furthermore, the described features, advantages, and characteristics of the invention may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize that the invention can be practiced without one or more of the specific features or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments of the invention.
[0126] One having ordinary skill in the art will readily understand that the invention as discussed above may be practiced with steps in a different order, and/or with hardware elements in configurations which are different than those which are disclosed. Therefore, although the invention has been described based upon these preferred embodiments, it would be apparent to those of skill in the art that certain modifications, variations, and alternative constructions are possible while remaining within the spirit and scope of the invention. In order to determine the metes and bounds of the invention, therefore, reference should be made to the appended claims.
Claims
1. A computer program embodied on a non-transitory computer-readable medium, the program configured to cause at least one processor to:
observe a real-world network over time and construct a matrix for two sets of nodes based on the observation of the real-world network over time;
fit an extended Poisson matrix factorization (PMF) model to the matrix for the two sets of nodes to learn posterior estimates for model parameters for predictive analytical purposes, the extended PMF model incorporating node-specific covariates for the two sets of nodes, modeling sparsity on latent feature matrices for the two sets of nodes, accounting for seasonal effects, or any combination thereof, to predict links;
use the learned posterior estimates for the model parameters to make predictions for future network observations or to determine anomaly scores about links observed after the training period or previously unobserved links; and
output the predictions, the anomaly scores, the model parameters themselves, or any combination thereof.
2. The computer program of claim 1, wherein the extended PMF model is a doubly sparse PMF with Indian Buffet Process (IBP) priors that further refine edge probabilities.
3. The computer program of claim 1, wherein the extended PMF model employs fast inference schemes using variational inference and Gibbs sampling either individually or in combination.
4. The computer program of claim 3, wherein for the Gibbs sampling, a latent vector conditional on counts Nij has a multinomial distribution such that so that the latent vector and the counts Nij are jointly resampled in a blocked Gibbs sampler step, and complete conditionals for the latent features for each set of nodes are Gamma.
5. The computer program of claim 1, wherein the extended PMF model is extended to deal with binary edges, where only existence of an edge is observed during a predetermined time period.
6. The computer program of claim 1, the program further configured to cause the at least one processor to:
form at least one rectangular adjacency matrix using an indicator function; and
use a structure of the at least one rectangular adjacency matrix to predict a structure of a subsequently observed graph of the computer network, wherein
the adjacency matrix is obtained by setting is an indicator function providing an N-dimensional vector of ones that indicates whether counts are present, thus treating counts as latent variables and treating as a censored count.
7. The computer program of claim 1, wherein
the extended PMF model uses a variational inference procedure for binary matrices, and
the variational inference procedure provides inference on marginal posterior distributions of parameters for the two sets of nodes, as well as for the covariates since this underpins a predictive distribution on which edges are likely to be observed in the future.
8. The computer program of claim 1, wherein the extended PMF model uses a common latent variable approach to inference.
9. The computer program of claim 1, wherein seasonality is accounted for by considering a discrete sequence of adjacency matrices A1, ... , AT representing observation during time periods 1 to T, including time dependence.
10. The computer program of claim 1, wherein the node-specific covariates for both sets of nodes in the PMF model, sparsity on latent feature matrices for the two sets of nodes, and seasonal effects are accounted for simultaneously in the extended PMF model to produce more accurate inference and link prediction.
11. The computer program of claim 1, wherein the output comprises at least one Internet Protocol (IP) address, at least one Media Access Control (MAC) address, or a combination thereof identifying the computing system initiating the link, the computing system receiving the link, or both.
12. The computer program of claim 1, wherein the program is further configured to cause the at least one processor to:
generate a bipartite graph representative of the computer network over a plurality of discrete time intervals, the bipartite graph comprising a set of users in the computer network, a set of hosts in the computer network, and an observed set of links between the user accounts and the hosts over a predetermined time period.
13. The computer program of claim 1, wherein the extended PMF model is implemented as follows:
where counts Nijt have been considered as censored, only a binary indicator is observed, and represent latent features, covariates are included through a matrix of coefficients F that contains interaction terms between the covariates between both sets of nodes, and seasonal adjustments for the coefficients of F are obtained through variables and sparsity and variable selection issues are tackled using binary random vectors D
14. A computer program embodied on a non-transitory computer-readable medium, the program configured to cause at least one processor to:
observe a real-world network over time and construct a matrix for two sets of nodes based on the observation of the real-world network over time;
fit an extended Poisson matrix factorization (PMF) model to the matrix for the two sets of nodes to learn posterior estimates for model parameters for predictive analytical purposes, the extended PMF model incorporating node-specific covariates for the two sets of nodes, modeling sparsity on latent feature matrices for the two sets of nodes, accounting for seasonal effects, or any combination thereof, to predict links;
use the learned posterior estimates for the model parameters to make predictions for future network observations or to determine anomaly scores about links observed after the training period or previously unobserved links; and
output the predictions, the anomaly scores, the model parameters themselves, or any combination thereof, wherein
the extended PMF model uses a variational inference procedure for binary matrices, and
the variational inference procedure provides inference on marginal posterior distributions of parameters for the two sets of nodes, as well as for the covariates since this underpins a predictive distribution on which edges are likely to be observed in the future.
15. The computer program of claim 14, wherein
the extended PMF model employs fast inference schemes using variational inference and Gibbs sampling either individually or in combination, and
for the Gibbs sampling, a latent vector conditional on counts Nij has a multinomial distribution such that so that the latent vector and the counts Nij are jointly resampled in a blocked Gibbs sampler step, and complete conditionals for the latent features for each set of nodes are Gamma.
16. The computer program of claim 14, wherein the node-specific covariates for both sets of nodes in the PMF model, sparsity on latent feature matrices for the two sets of nodes, and seasonal effects are accounted for simultaneously in the extended PMF model to produce more accurate inference and link prediction.
17. The computer program of claim 14, wherein the output comprises at least one Internet Protocol (IP) address, at least one Media Access Control (MAC) address, or a combination thereof identifying the computing system initiating the link, the computing system receiving the link, or both.
18. The computer program of claim 14, wherein the program is further configured to cause the at least one processor to:
generate a bipartite graph representative of the computer network over a plurality of discrete time intervals, the bipartite graph comprising a set of users in the computer network, a set of hosts in the computer network, and an observed set of links between the user accounts and the hosts over a predetermined time period.
19. A computer-implemented method, comprising:
fitting, by a computing system, an extended Poisson matrix factorization (PMF) model to a matrix for two sets of nodes based on the observation of the real-world network over time to learn posterior estimates for model parameters for predictive analytical purposes, the extended PMF model incorporating node-specific covariates for the two sets of nodes, modeling sparsity on latent feature matrices for the two sets of nodes, accounting for seasonal effects, or any combination thereof, to predict links;
using the learned posterior estimates for the model parameters, by the computing system, to make predictions for future network observations or to determine anomaly scores about links observed after the training period or previously unobserved links; and
outputting the predictions, the anomaly scores, the model parameters themselves, or any combination thereof, by the computing system.
20. The computer-implemented method of claim 19, wherein the node-specific covariates for both sets of nodes in the PMF model, sparsity on latent feature matrices for the two sets of nodes, and seasonal effects are accounted for simultaneously in the extended PMF model to produce more accurate inference and link prediction.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962819912P | 2019-03-18 | 2019-03-18 | |
US62/819,912 | 2019-03-18 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2020191001A1 (en) | 2020-09-24 |
WO2020191001A8 (en) | 2021-04-15 |
Family
ID=72519144
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2020/023264 WO2020191001A1 (en) | 2019-03-18 | 2020-03-18 | Real-world network link analysis and prediction using extended probailistic maxtrix factorization models with labeled nodes |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2020191001A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180232646A1 (en) * | 2017-02-14 | 2018-08-16 | Cognitive Scale, Inc. | Augmented Gamma Belief Network Operation |
US20180307994A1 (en) * | 2017-04-25 | 2018-10-25 | Nec Laboratories America, Inc. | Identifying multiple causal anomalies in power plant systems by modeling local propagations |
2020-03-18 WO PCT/US2020/023264 patent/WO2020191001A1/en active Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11418526B2 (en) | 2019-12-20 | 2022-08-16 | Microsoft Technology Licensing, Llc | Detecting anomalous network activity |
US11556636B2 (en) | 2020-06-30 | 2023-01-17 | Microsoft Technology Licensing, Llc | Malicious enterprise behavior detection tool |
CN115278706A (en) * | 2021-04-29 | 2022-11-01 | 中国移动通信集团河北有限公司 | Network structure evaluation method, device, equipment and computer storage medium |
CN115278706B (en) * | 2021-04-29 | 2023-08-15 | 中国移动通信集团河北有限公司 | Network structure evaluation method, device, equipment and computer storage medium |
US11949701B2 (en) | 2021-08-04 | 2024-04-02 | Microsoft Technology Licensing, Llc | Network access anomaly detection via graph embedding |
CN113780584A (en) * | 2021-09-28 | 2021-12-10 | 京东科技信息技术有限公司 | Label prediction method, apparatus, storage medium and program product |
CN113780584B (en) * | 2021-09-28 | 2024-03-05 | 京东科技信息技术有限公司 | Label prediction method, label prediction device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2020191001A8 (en) | 2021-04-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20772749; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 20772749; Country of ref document: EP; Kind code of ref document: A1 |