WO2017019391A1 - Graph-based intrusion detection using process traces - Google Patents


Info

Publication number
WO2017019391A1
WO2017019391A1 (PCT/US2016/043040, US2016043040W)
Authority
WO
WIPO (PCT)
Prior art keywords
entities
graph
score
entity
patterns
Prior art date
Application number
PCT/US2016/043040
Other languages
French (fr)
Inventor
Zhengzhang CHEN
Luan Tang
Boxiang Dong
Guofei Jiang
Haifeng Chen
Original Assignee
Nec Laboratories America, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US15/213,896 external-priority patent/US10305917B2/en
Application filed by Nec Laboratories America, Inc. filed Critical Nec Laboratories America, Inc.
Priority to DE112016002806.7T priority Critical patent/DE112016002806T5/en
Priority to JP2018502363A priority patent/JP6557774B2/en
Publication of WO2017019391A1 publication Critical patent/WO2017019391A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566: Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/552: Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting

Definitions

  • the present invention relates to computer and information security and, more particularly, to host-level intrusion detection on massive process traces.
  • Enterprise networks are key systems in corporations, carrying the vast majority of mission-critical information. As a result of their importance, these networks are frequent targets of attack. To guarantee information security in a network of computers, an intrusion detection system is needed to keep track of the running status of the entire network and identify scenarios associated with potential attacks or malicious behaviors.
  • At the host level, detection systems collect rich information about process/program events (e.g., when a program opens a file) on a particular host or machine. While this information enables intrusion detection systems to monitor intrusive behavior accurately, signature-based detection techniques fail to detect new threats, while anomaly-based detection techniques either focus on detecting a single abnormal process or need an offline model to be built from training data containing purely normal events.
  • More importantly, intrusion detection systems often rely on the coordinated or sequential, rather than independent, action of several system events to determine what state a given system is in.
  • the system monitoring data is typically made up of low-level process events or interactions between various system entities, such as processes, files and sockets (e.g., when a program opens a file or connects to a server) with exact time stamps, while attempted intrusions are higher-level activities which usually involve multiple different process events.
  • a network attack called Advanced Persistent Threat (APT) is composed of a set of stealthy and continuous computer hacking processes. APT first attempts to gain a foothold in the environment. Then, using the compromised systems as an access into the target network, APT deploys additional tools that help fulfill the attack objective.
  • a method for detecting malicious processes includes modeling system data as a graph comprising vertices that represent system entities and edges that represent events between respective system entities. Each edge has one or more timestamps corresponding to respective events between two system entities.
  • a set of valid path patterns that relate to potential attacks is generated. One or more event sequences in the system are determined to be suspicious based on the graph and the valid path patterns using a random walk on the graph.
  • a system for detecting malicious processes includes a modeling module configured to model system data as a graph that has vertices that represent system entities and edges that represent events between respective system entities. Each edge includes one or more timestamps corresponding to respective events between two system entities.
  • a malicious process path discovery module includes a processor configured to generate a set of valid path patterns that relate to potential attacks and to determine one or more event sequences in the system to be suspicious based on the graph and the valid path patterns using a random walk on the graph.
  • FIG. 1 is directed to a network graph representing communities and roles of nodes in accordance with the present principles.
  • FIG. 2 is a block/flow diagram of a method of discovering community and role memberships and detecting anomalies in accordance with the present principles.
  • FIG. 3 is a block diagram of a host-level analysis module in accordance with the present principles.
  • FIG. 4 is a block/flow diagram of a method for detecting suspicious host-level event sequences in accordance with the present principles.
  • FIG. 5 is a segment of pseudo-code for detecting suspicious host-level event sequences in accordance with the present principles.
  • FIG. 6 is a block diagram of a processing system in accordance with the present principles.
  • the present embodiments provide malicious process path discovery, detecting abnormal process paths that are related to intrusion activities. This is accomplished using process traces. A set of valid sequence patterns is generated and a random walk based process is used to learn system functions and discover suspicious process sequences. To eliminate score bias from the path length, the Box-Cox power transformation is applied to normalize the anomaly scores of process sequences.
  • the present embodiments thereby provide complete evidence of an attacker's activity trace (i.e., the process path) after an attack has occurred.
  • the present embodiments more accurately detect malicious process paths, reducing the number of false positives and false negatives in less time and with less computational complexity.
  • a compact graph structure may be used to reduce memory load, a set of valid sequence patterns may be generated and used to reduce the search space, and a random walk approach may be used to reduce the computational cost.
  • the present embodiments are able to detect new attacks because no training data is needed.
  • an automatic security intelligence system (ASI) architecture is shown.
  • the ASI system includes three major components: agents 10, installed on each machine of an enterprise network to collect operational data; a backend server 20, which receives data from the agents 10, pre-processes it, and sends the pre-processed data to an analysis server 30; and the analysis server 30, which runs the security application program to analyze the data.
  • Each agent 10 includes an agent manager 11, an agent updater 12, and agent data 13, which in turn may include information regarding active processes, file access, net sockets, number of instructions per cycle, and host information.
  • the backend server 20 includes an agent updater server 21 and surveillance data storage.
  • Analysis server 30 includes intrusion detection 31, security policy compliance assessment 32, incident backtrack and system recovery 33, and centralized threat search and query 34.
  • There are five modules in the intrusion detection engine 31: a data distributor 41 that receives data from the backend server 20 and distributes it to the network level module 42 and the host level module 43; a network analysis module 42 that processes network communications (including TCP and UDP) and detects abnormal communication events; a host level analysis module 43 that processes host level events, including user-to-process events, process-to-file events, and user-to-registry events; an anomaly fusion module 44 that integrates network level and host level anomalies and refines the results into trustworthy intrusion events; and a visualization module 45 that outputs the detection results to end users.
  • the host level analysis module 43 includes a hardware processor 312 and a memory 314.
  • the host level analysis module 43 includes one or more functional modules that may, in one embodiment, be stored in memory 314 and executed by hardware processor 312.
  • the functional modules may alternatively be implemented as one or more discrete hardware components in the form of, e.g., application-specific integrated circuits or field-programmable gate arrays.
  • the host level analysis module 43 includes a number of different analysis and detection functions.
  • a process-to-file anomaly detection module 302 takes host level process-to-file events from data distributor 41 as input and discovers abnormal process-to- file events. These events may include, e.g., reading from or writing to a file.
  • a user-to-process anomaly detection module 304 takes all streaming process events as input from the data distributor 41 and models each user's behavior at the process level, identifying suspicious processes run by each user.
  • a USB event anomaly detection module 306 also considers the streaming process events and identifies all USB device-related events to detect anomalous device activity.
  • a process signature anomaly detection module 308 takes process names and signatures as input from data distributor 41 and detects processes with suspicious signatures.
  • a malicious process path discovery module 310 takes current active processes from the data distributor 41 as starting points and tracks all of the possible process paths by combining the incoming and previous events in a time window.
  • the malicious process path discovery module 310 detects anomalous process sequences/paths as described in greater detail below.
  • a blueprint graph is used as input.
  • the blueprint graph is a heterogeneous graph constructed from a historical dataset of communications in a network, with nodes of the blueprint graph representing physical devices on an enterprise network and edges reflecting the normal communication patterns among the nodes.
  • Block 402 performs graph modeling, using a compact graph structure to capture the complex interactions between system entities as an acyclic, multipartite graph.
  • Block 404 then generates a set of valid sequence patterns based on a maximum sequence length. The maximum sequence length may be set by a user or an optimal value may be determined automatically. Forming the valid sequence patterns dramatically reduces the search space size of the graph.
  • Block 406 then scans the graph to determine candidate event sequences that are consistent with the patterns.
  • a "pattern" refers to an ordered set of system entity types, while a "sequence" refers to an ordered set of specific system entities. Thus, a sequence is consistent with a pattern if its series of system entities' respective types match those of the pattern. Sequences are alternatively referred to herein as "paths."
  • Block 408 applies a random walk to extract the characteristics of every entity. Based on a discovered entity's nature, block 408 calculates an anomaly score of each candidate process to evaluate how abnormal the process is. As there may be multiple different sequence patterns of different lengths, and as the scores of two paths with different lengths are not directly comparable, the anomaly score distribution for each sequence pattern is transformed into a single distribution by block 410 using, e.g., a Box-Cox power transformation. Block 410 measures the deviation between the suspicious sequences and the normal sequences, reporting those sequences that have a higher-than-threshold deviation.
  • Attacking behavior often involves multiple system entities (e.g., processes, files, sockets, etc.). Therefore the present embodiments incorporate the interactions between multiple different system entities.
  • the amount of information provided by system monitoring can be extremely large, making direct storage and access of that information in memory impractical.
  • the information has a large degree of redundancy. Redundancy can be found first in attributes, as each event record includes not only the involved entities but also the attributes of these entities. Storing the attributes repeatedly for each event is redundant. Second, saving events that involve the same entities repeatedly with only time stamps changing between them is redundant. Third, storage of records that are irrelevant to the detection of intrusion attacks is unneeded.
  • the graph model is therefore generated by block 402 to incorporate the meaningful information from the monitoring data in a compressed way.
  • the graph is denoted G = (V, E, T), where V is the set of vertices (the system entities), E is the set of edges (the events between entities), and T assigns to each edge the set of timestamps at which the corresponding event occurred.
  • the attribute values for a unique entity are stored only once. For each event sequence of length l, there is a corresponding path through G of l edges.
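  • The compact storage scheme described above (entity attributes stored once, repeated events between the same pair of entities collapsed into a single edge carrying a timestamp list) can be sketched as follows. This is an illustrative sketch, not the patent's implementation; all class and field names are invented for the example.

```python
from collections import defaultdict

class CompactEventGraph:
    """Compact event graph: entity attributes are stored once per entity,
    and repeated events between the same pair of entities collapse into a
    single edge carrying a list of timestamps."""

    def __init__(self):
        self.attrs = {}                  # entity id -> attribute dict (stored once)
        self.edges = defaultdict(list)   # (src, dst) -> event timestamps

    def add_entity(self, eid, **attributes):
        self.attrs.setdefault(eid, attributes)

    def add_event(self, src, dst, timestamp):
        # Only the timestamp is appended; entity attributes are not repeated.
        self.edges[(src, dst)].append(timestamp)

    def timestamps(self, src, dst):
        return self.edges.get((src, dst), [])

g = CompactEventGraph()
g.add_entity("p1", type="process", name="sshd")
g.add_entity("f1", type="file", path="/etc/passwd")
g.add_event("f1", "p1", 100)
g.add_event("f1", "p1", 105)   # same edge, only a new timestamp is stored
```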
  • To extract the most suspicious paths from the graph G, a naive approach would be to explore all of the existing paths. However, it is impractical to enumerate all the possible paths from a densely connected graph.
  • To provide guidance for candidate searching in block 406, block 404 generates a set of valid path patterns B. Only those paths that conform to the valid path patterns are related to potential attacks; others may be discarded.
  • Each path pattern B of length l includes l entities and/or entity types.
  • the path pattern B may include both specific entities (e.g., particular files) as well as more general designations of entity type in the same path.
  • B is determined to be a valid path pattern only if there exists at least one path p ∈ G that is consistent with B. Considering that l may be a small number, all of the possible patterns can be enumerated, and searching the graph G permits all of the valid patterns to be extracted.
  • the valid path patterns B may, for example, be generated by experts using their experience with previous intrusion attacks. However, it may be difficult to obtain an accurate and complete set of valid path patterns from such experts. As such, path patterns may be generated automatically, with each pattern element set to a specific system entity type. For example, all paths that correspond to information leakage must begin with a file entity (F) and end with an Internet socket entity (I). Given a path p ∈ G and a path pattern B, p[i] and B[i] represent the i-th node in p and B, respectively.
  • the path p is therefore consistent with B, denoted as p ∼ B, if p and B have the same length and if, for each i, p[i] ∈ B[i] (i.e., the specific entity p[i] belongs to the entity type B[i]). Then B is a valid path pattern if there exists at least one path p in G such that p ∼ B.
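  • The consistency test p ∼ B can be sketched as a direct element-wise check; a minimal sketch, in which the entity-type lookup is modeled as a plain dictionary and all entity names are illustrative, not taken from the patent:

```python
def consistent(path, pattern, entity_type):
    """Return True if path ~ pattern: same length, and each node either
    equals the pattern element (a specific entity) or has that entity type."""
    if len(path) != len(pattern):
        return False
    for node, spec in zip(path, pattern):
        if node != spec and entity_type.get(node) != spec:
            return False
    return True

entity_type = {"/etc/passwd": "F", "sshd": "P", "10.0.0.5:443": "I"}
# A pattern may mix a specific entity ("/etc/passwd") with general types (P, I):
pattern = ["/etc/passwd", "P", "I"]
path = ["/etc/passwd", "sshd", "10.0.0.5:443"]
```

A path matches both the mixed pattern above and the fully general pattern ["F", "P", "I"], since "/etc/passwd" has type F.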
  • block 406 searches for paths in the multipartite graph satisfying the patterns.
  • Each candidate event sequence seq = (e_1, e_2, ..., e_r) corresponds to a path p = (v_1, v_2, ..., v_(r+1)) in the graph G.
  • a time order constraint is applied to candidate path searching.
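  • Candidate searching under the time order constraint can be sketched as a depth-first walk that only extends a path along an edge carrying a timestamp later than the time reached so far. This is a hypothetical sketch of the idea, not the patent's algorithm; taking the earliest feasible timestamp at each step preserves the most room for later events.

```python
def find_candidates(edges, pattern, entity_type):
    """edges: (src, dst) -> list of timestamps.
    Return paths consistent with `pattern` whose events occur in
    strictly increasing time order."""
    succ = {}
    for (u, v), ts in edges.items():
        succ.setdefault(u, []).append((v, ts))

    def matches(node, spec):
        return node == spec or entity_type.get(node) == spec

    results = []

    def extend(path, t_prev):
        i = len(path) - 1          # number of events taken so far
        if i == len(pattern) - 1:
            results.append(list(path))
            return
        for v, ts in succ.get(path[-1], []):
            if matches(v, pattern[i + 1]):
                later = [t for t in ts if t > t_prev]
                if later:
                    # earliest feasible timestamp keeps future options open
                    extend(path + [v], min(later))

    for node in {u for (u, _) in edges}:
        if matches(node, pattern[0]):
            extend([node], float("-inf"))
    return results

edges = {("f1", "p1"): [10], ("p1", "s1"): [5, 20]}
entity_type = {"f1": "F", "p1": "P", "s1": "I"}
```

Here the path f1 → p1 → s1 satisfies the pattern (F, P, I) because the second edge has a timestamp (20) after the first (10); with only timestamp 5 on the second edge, no time-ordered candidate would exist.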
  • a candidate path is determined to be suspicious by block 408 if, in the path, the involved entities behave differently from their normal roles.
  • the information senders and receivers are identified as entity roles.
  • the sender and receiver score should be accurately learned from the computer system, as they are used to set the profile of normal behavior.
  • To learn these scores, a random walk is applied to the graph G. From G, an N × N square transition matrix A is calculated as A(i, j) = |T(v_i, v_j)| / Σ_k |T(v_i, v_k)|, where N is the total number of entities and T(v_i, v_j) is the set of timestamps at which an event between v_i and v_j has occurred. A(i, j) thus denotes the probability that the information flows from v_i to v_j in G.
  • Because A is the matrix representation of the multipartite graph G, A can equivalently be written in block form, with nonzero blocks only between partitions of entity types that interact.
  • Based on this transition matrix, each entity's sender and receiver scores can be iteratively generated as x_(m+1) = A · y_m and y_(m+1) = Aᵀ · x_m, where x and y denote the sender and receiver score vectors, respectively.
  • an entity that sends information to a large number of entities having high receiver scores is itself an important information sender, and similarly an entity that receives information from a large number of entities having high sender scores is an important information receiver.
  • the sender and receiver scores for an entity are therefore iteratively calculated by accumulating the receiver and sender scores related to the entity. For example, the file /etc/passwd on a UNIX® system will have a high sender score and a low receiver score, because it is sent to many processes for checking access permissions, but it is rarely modified.
  • a result of this iterative refinement is that the learned score values will depend on the initial score values. However, the effect of the initial score values can be eliminated using the steady state property of the matrix. Given a general square matrix M and a general initial vector π_0, the vector π can be repeatedly updated as π_(m+1) = M · π_m.
  • a convergence state is possible such that π_(m+1) ≈ π_m for sufficiently large values of m. In this case, there is only one unique vector that satisfies the convergence state: π = M · π, i.e., π is the principal eigenvector of M.
  • the convergence state has the property that the converged vector is only dependent on the matrix M, but is independent of the initial vector value π_0.
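  • The timestamp-count transition matrix and the mutually reinforcing sender/receiver iteration described above can be sketched as follows. The row normalization and the HITS-style update order are a reading of the surrounding text, not code from the patent, and the entity names are illustrative.

```python
def transition_matrix(edges, nodes):
    """A[i][j] estimates the probability that information flows from
    nodes[i] to nodes[j]: the number of timestamps on the (i, j) edge,
    row-normalized over all of node i's outgoing events."""
    idx = {n: k for k, n in enumerate(nodes)}
    n = len(nodes)
    A = [[0.0] * n for _ in range(n)]
    for (u, v), ts in edges.items():
        A[idx[u]][idx[v]] = float(len(ts))
    for row in A:
        s = sum(row)
        if s:
            for j in range(n):
                row[j] /= s
    return A

def sender_receiver_scores(A, iters=10):
    """Mutual refinement: an entity sending to high-score receivers gets a
    high sender score, and an entity fed by high-score senders gets a high
    receiver score."""
    n = len(A)
    x = [1.0 / n] * n          # sender scores
    y = [1.0 / n] * n          # receiver scores
    for _ in range(iters):
        x_new = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
        y_new = [sum(A[i][j] * x[i] for i in range(n)) for j in range(n)]
        for v in (x_new, y_new):   # L1-normalize so scores stay comparable
            s = sum(v) or 1.0
            for k in range(n):
                v[k] /= s
        x, y = x_new, y_new
    return x, y

nodes = ["f1", "p1", "p2"]                        # one file, two processes
edges = {("f1", "p1"): [1, 2, 3], ("f1", "p2"): [4]}
A = transition_matrix(edges, nodes)               # row for f1 is [0, 0.75, 0.25]
x, y = sender_receiver_scores(A)                  # f1 emerges as the dominant sender
```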
  • a graph G is irreducible if and only if, for any two nodes, there exists at least one path between them.
  • the period of a node is the minimum path length from the node back to itself, and the period of a graph is the greatest common divisor of all of the nodes' period values.
  • a graph G is aperiodic if and only if it is irreducible and the period of G is 1.
  • To guarantee these properties, a new transition matrix Ã is defined as Ã = (1 − c) · A + (c / N) · E, where E is the N × N all-ones matrix and c is a value between 0 and 1 referred to as the restart ratio.
  • the new transition matrix is guaranteed to be irreducible and aperiodic, providing converged sender score and receiver score vectors.
  • the convergence rate can be controlled by controlling the restart rate value.
  • One exemplary value for a number of iterations to use to ensure convergence is about 10.
  • for each candidate path, an anomaly score is then calculated from the learned sender and receiver scores of the entities along the path, so that paths whose entities deviate from their normal roles receive high scores.
  • the anomaly scores for paths of different lengths have different distributions. Therefore, to compare the suspiciousness of paths of different lengths, a transformation is performed to put paths of different length on equal footing.
  • the path anomaly score can have an arbitrary distribution and is generally not a normal distribution.
  • after normalization, the suspiciousness of a path of length r can be defined as the deviation of its transformed anomaly score from the mean of the transformed score distribution for length-r paths.
  • the top k suspicious paths are those having the largest suspiciousness score.
  • a Box-Cox power transformation is used as the normalization function.
  • Let Q(r) denote the set of anomaly scores calculated from paths of length r. Each score q ∈ Q(r) is transformed as T(q, λ) = (q^λ − 1)/λ for λ ≠ 0 and T(q, λ) = ln(q) for λ = 0, where λ is a normalization parameter. Different values of λ yield different transformed distributions. The aim is to select the value of λ that yields a transformed distribution as close as possible to the normal distribution (i.e., T(Q(r), λ) ~ N(μ, σ²)).
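  • The Box-Cox normalization step can be sketched with a simple grid search over λ that maximizes the Box-Cox normal log-likelihood. The grid-search fitting method is an illustrative choice; the patent does not fix how λ is selected.

```python
import math

def box_cox(q, lam):
    """Box-Cox power transform of a positive score q."""
    return math.log(q) if lam == 0 else (q ** lam - 1.0) / lam

def fit_lambda(scores, grid=None):
    """Pick the lambda that maximizes the Box-Cox normal log-likelihood,
    i.e. the lambda whose transformed scores look most normal."""
    grid = grid or [k / 10.0 for k in range(-20, 21)]
    n = len(scores)
    best_lam, best_ll = None, -float("inf")
    for lam in grid:
        t = [box_cox(q, lam) for q in scores]
        mu = sum(t) / n
        var = sum((v - mu) ** 2 for v in t) / n
        if var <= 0:
            continue
        # Profile log-likelihood: -n/2 * ln(var) + (lam - 1) * sum(ln q)
        ll = -0.5 * n * math.log(var) + (lam - 1.0) * sum(map(math.log, scores))
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam

scores = [1.2, 3.4, 2.2, 8.9, 1.1, 4.5, 2.8, 6.1]   # hypothetical anomaly scores
lam = fit_lambda(scores)
normalized = [box_cox(q, lam) for q in scores]
```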
  • the top k suspicious paths are not considered related to intrusion attacks unless they are sufficiently distinctive from the normal paths.
  • the t-value is calculated between the two groups of paths. Due to the large number of normal paths, an efficient solution based on Monte Carlo simulation is used to estimate the expectation and variance from a sample of relatively small size without computing the full summation.
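  • The Monte Carlo shortcut can be sketched as estimating the normal group's mean and variance from a random sample and forming a Welch-style t-statistic. The exact statistic and sampling scheme here are illustrative assumptions, not the patent's specification.

```python
import random

def t_value(suspicious, normal_scores, sample_size=1000, seed=7):
    """Welch-style t-statistic between the suspicious group and a Monte
    Carlo sample of the (much larger) normal group, avoiding a full pass
    over all normal paths."""
    rng = random.Random(seed)
    k = min(sample_size, len(normal_scores))
    sample = [rng.choice(normal_scores) for _ in range(k)]

    def stats(v):
        m = sum(v) / len(v)
        var = sum((s - m) ** 2 for s in v) / max(len(v) - 1, 1)
        return m, var

    m1, v1 = stats(suspicious)
    m2, v2 = stats(sample)
    denom = (v1 / len(suspicious) + v2 / len(sample)) ** 0.5
    return (m1 - m2) / denom if denom else float("inf")
```

A group of suspicious paths scoring far above the normal population yields a large t-value; scores drawn from the same range as the normal paths yield a t-value near zero.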
  • the host-level analysis module 43 provides information regarding the anomaly, generating a report that may include one or more alerts.
  • the anomaly fusion module 44 integrates these host-level alerts with other host- and network-level anomalies and automatically filters out false alarms.
  • the resulting list of anomalies is provided via visualization module 45 to a user.
  • certain anomalies or classes of anomalies may be addressed automatically, for example by deploying security countermeasures or mitigations.
  • an automatic response to a detected anomaly may be to shut down a device showing the anomalous behavior until it can be reviewed by an administrator.
  • Sender and receiver score vectors X and Y are created using the random walk process.
  • a queue of files F_X is sorted in descending order of X, and a second queue of files F_Y is sorted in descending order of Y.
  • Queues of processes P_X and P_Y, queues of UNIX® sockets U_X and U_Y, and queues of Internet sockets S_X and S_Y are created in the same way.
  • the paths are then processed to find a path set that conforms to the event sequence pattern and time constraint.
  • Embodiments described herein may be entirely hardware, entirely software or including both hardware and software elements.
  • the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
  • a computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the medium can be magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • the medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.
  • Each computer program may be tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein.
  • the inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
  • a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus.
  • the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution.
  • I/O devices including but not limited to keyboards, displays, pointing devices, etc. may be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
  • Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
  • FIG. 6 shows an exemplary processing system 600, which may represent the analysis server 30, the intrusion detection module 31, and/or the host level analysis module 43.
  • the processing system 600 includes at least one processor (CPU) 604 operatively coupled to other components via a system bus 602.
  • a first storage device 622 and a second storage device 624 are operatively coupled to system bus 602 by the I/O adapter 620.
  • the storage devices 622 and 624 can be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid state magnetic device, and so forth.
  • the storage devices 622 and 624 can be the same type of storage device or different types of storage devices.
  • a speaker 632 is operatively coupled to system bus 602 by the sound adapter 630.
  • a transceiver 642 is operatively coupled to system bus 602 by network adapter 640.
  • a display device 662 is operatively coupled to system bus 602 by display adapter 660.
  • a first user input device 652, a second user input device 654, and a third user input device 656 are operatively coupled to system bus 602 by user interface adapter 650.
  • the user input devices 652, 654, and 656 can be any of a keyboard, a mouse, a keypad, an image capture device, a motion sensing device, a microphone, a device incorporating the functionality of at least two of the preceding devices, and so forth. Of course, other types of input devices can also be used, while maintaining the spirit of the present principles.
  • the user input devices 652, 654, and 656 can be the same type of user input device or different types of user input devices.
  • the user input devices 652, 654, and 656 are used to input and output information to and from system 600.
  • processing system 600 may also include other elements (not shown), as readily contemplated by one of skill in the art, as well as omit certain elements.
  • various other input devices and/or output devices can be included in processing system 600, depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art.
  • various types of wireless and/or wired input and/or output devices can be used.
  • additional processors, controllers, memories, and so forth, in various configurations can also be utilized as readily appreciated by one of ordinary skill in the art.


Abstract

Methods and systems for detecting malicious processes include modeling system data as a graph comprising vertices that represent system entities and edges that represent events between respective system entities. Each edge has one or more timestamps corresponding to respective events between two system entities. A set of valid path patterns that relate to potential attacks is generated. One or more event sequences in the system are determined to be suspicious based on the graph and the valid path patterns using a random walk on the graph.

Description

GRAPH-BASED INTRUSION DETECTION USING PROCESS TRACES
RELATED APPLICATION INFORMATION
[0001] This application is a continuation-in-part of Application No. 15/098,861, filed on April 14, 2016, which in turn claims priority to Application Serial No. 62/148,232, filed on April 16, 2015. This application further claims priority to Application Serial No. 62/196,404, filed on July 24, 2015, and to Application Serial No. 62/360,572, filed on July 11, 2016, all of which are incorporated herein by reference in their entireties.
BACKGROUND
Technical Field
[0002] The present invention relates to computer and information security and, more particularly, to host-level intrusion detection on massive process traces.
Description of the Related Art
[0003] Enterprise networks are key systems in corporations, carrying the vast majority of mission-critical information. As a result of their importance, these networks are frequent targets of attack. To guarantee information security in a network of computers, an intrusion detection system is needed to keep track of the running status of the entire network and identify scenarios associated with potential attacks or malicious behaviors.
[0004] At the host level, detection systems collect rich information about process/program events (e.g., when a program opens a file) on a particular host or machine. While this information enables intrusion detection systems to monitor intrusive behavior accurately, signature-based detection techniques fail to detect new threats, while anomaly-based detection techniques either focus on detecting a single abnormal process or need an offline model to be built from training data with purely normal events.
[0005] More importantly, intrusion detection systems often rely on a coordinated or sequential, not independent, action of several system events to determine what state a given system is in. The system monitoring data is typically made up of low-level process events or interactions between various system entities, such as processes, files and sockets (e.g., when a program opens a file or connects to a server) with exact time stamps, while attempted intrusions are higher-level activities which usually involve multiple different process events. For example, a network attack called Advanced Persistent Threat (APT) is composed of a set of stealthy and continuous computer hacking processes. APT first attempts to gain a foothold in the environment. Then, using the compromised systems as an access into the target network, APT deploys additional tools that help fulfill the attack objective. The gap existing between the levels of process events and intrusion activities makes it hard to infer which process events are related to real malicious activities, especially considering that there are massive, "noisy" process events happening in between. Hence, conventional attack detection techniques that identify individual suspicious process events are inadequate to address this scenario.
SUMMARY
[0006] A method for detecting malicious processes includes modeling system data as a graph comprising vertices that represent system entities and edges that represent events between respective system entities. Each edge has one or more timestamps corresponding to respective events between two system entities. A set of valid path patterns that relate to potential attacks is generated. One or more event sequences in the system are determined to be suspicious based on the graph and the valid path patterns using a random walk on the graph.
[0007] A system for detecting malicious processes includes a modeling module configured to model system data as a graph that has vertices that represent system entities and edges that represent events between respective system entities. Each edge includes one or more timestamps corresponding to respective events between two system entities. A malicious process path discovery module includes a processor configured to generate a set of valid path patterns that relate to potential attacks and to determine one or more event sequences in the system to be suspicious based on the graph and the valid path patterns using a random walk on the graph.
[0008] These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF DRAWINGS
[0009] The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:
[0010] FIG. 1 is directed to a network graph representing communities and roles of nodes in accordance with the present principles.
[0011] FIG. 2 is a block/flow diagram of a method of discovering community and role memberships and detecting anomalies in accordance with the present principles.

[0012] FIG. 3 is a block diagram of a host-level analysis module in accordance with the present principles.
[0013] FIG. 4 is a block/flow diagram of a method for detecting suspicious host-level event sequences in accordance with the present principles.
[0014] FIG. 5 is a segment of pseudo-code for detecting suspicious host-level event sequences in accordance with the present principles.
[0015] FIG. 6 is a block diagram of a processing system in accordance with the present principles.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0016] In accordance with the present principles, the present embodiments provide malicious process path discovery, detecting abnormal process paths that are related to intrusion activities. This is accomplished using process traces. A set of valid sequence patterns is generated and a random walk based process is used to learn system functions and discover suspicious process sequences. To eliminate score bias from the path length, the Box-Cox power transformation is applied to normalize the anomaly scores of process sequences.
[0017] The present embodiments thereby provide complete evidence of an attacker's activity trace (i.e., the process path) after an attack has occurred. In addition, the present embodiments more accurately detect malicious process paths, reducing the number of false positives and false negatives in less time and with less computational complexity. A compact graph structure may be used to reduce memory load, a set of valid sequence patterns may be generated and used to reduce the search space, and a random walk approach may be used to reduce the computational cost. Furthermore, the present embodiments are able to detect new attacks because no training data is needed.
[0018] Referring now in detail to the figures in which like numerals represent the same or similar elements and initially to FIG. 1, an automatic security intelligence system (ASI) architecture is shown. The ASI system includes three major components: an agent 10 installed in each machine of an enterprise network to collect operational data; a backend server 20 that receives data from the agents 10, pre-processes the data, and sends the pre-processed data to an analysis server 30; and the analysis server 30, which runs the security application program to analyze the data.
[0019] Each agent 10 includes an agent manager 11, an agent updater 12, and agent data 13, which in turn may include information regarding active processes, file access, net sockets, number of instructions per cycle, and host information. The backend server 20 includes an agent updater server 21 and surveillance data storage. Analysis server 30 includes intrusion detection 31, security policy compliance assessment 32, incident backtrack and system recovery 33, and centralized threat search and query 34.
[0020] Referring now to FIG. 2, additional detail on intrusion detection 31 is shown. There are five modules in an intrusion detection engine: a data distributor 41 that receives the data from backend server 20 and distributes the corresponding data to the network level module 42 and the host level module 43; a network analysis module 42 that processes the network communications (including TCP and UDP) and detects abnormal communication events; a host level analysis module 43 that processes host level events, including user-to-process events, process-to-file events, and user-to-registry events; an anomaly fusion module 44 that integrates network level anomalies and host level anomalies and refines the results for trustworthy intrusion events; and a visualization module 45 that outputs the detection results to end users.
[0021] Referring now to FIG. 3, additional detail on the host level analysis module 43 is shown. The host level analysis module 43 includes a hardware processor 312 and a memory 314. In addition, the host level analysis module 43 includes one or more functional modules that may, in one embodiment, be stored in memory 314 and executed by hardware processor 312. In an alternative embodiment, the functional modules may be implemented as one or more discrete hardware components in the form of, e.g., application-specific integrated circuits or field programmable gate arrays.
[0022] The host level analysis module 43 includes a number of different analysis and detection functions. A process-to-file anomaly detection module 302 takes host level process-to-file events from data distributor 41 as input and discovers abnormal process-to-file events. These events may include, e.g., reading from or writing to a file. A user-to-process anomaly detection module 304 takes all streaming process events as input from the data distributor 41 and models each user's behavior at the process level, identifying suspicious processes run by each user. A USB event anomaly detection module 306 also considers the streaming process events and identifies all USB device-related events to detect anomalous device activity. A process signature anomaly detection module 308 takes process names and signatures as input from data distributor 41 and detects processes with suspicious signatures. Finally, a malicious process path discovery module 310 takes current active processes from the data distributor 41 as starting points and tracks all of the possible process paths by combining the incoming and previous events in a time window. The malicious process path discovery module 310 detects anomalous process sequences/paths as described in greater detail below.
[0023] Referring now to FIG. 4, a method for malicious process path detection is shown. A blueprint graph is used as input. The blueprint graph is a heterogeneous graph constructed from a historical dataset of communications in a network, with nodes of the blueprint graph representing physical devices on an enterprise network and edges reflecting the normal communication patterns among the nodes. Block 402 performs graph modeling, using a compact graph structure to capture the complex interactions between system entities as an acyclic, multipartite graph. Block 404 then generates a set of valid sequence patterns based on a maximum sequence length. The maximum sequence length may be set by a user or an optimal value may be determined automatically. Forming the valid sequence patterns dramatically reduces the search space size of the graph.
[0024] Block 406 then scans the graph to determine candidate event sequences that are consistent with the patterns. A "pattern" refers to an ordered set of system entity types, while a "sequence" refers to an ordered set of specific system entities. Thus, a sequence is consistent with a pattern if its series of system entities' respective types match those of the pattern. Sequences are alternatively referred to herein as "paths."
[0025] Block 408 applies a random walk to extract the characteristics of every entity. Based on a discovered entity's nature, block 408 calculates an anomaly score of each candidate process to evaluate how abnormal the process is. As there may be multiple different sequence patterns of different lengths, and as the scores of two paths with different lengths are not directly comparable, the anomaly score distribution for each sequence pattern is transformed into a single distribution by block 410 using, e.g., a Box-Cox power transformation. Block 410 measures the deviation between the suspicious sequences and the normal sequences, reporting those sequences that have a higher-than-threshold deviation.
[0026] Attacking behavior often involves multiple system entities (e.g., processes, files, sockets, etc.). Therefore the present embodiments incorporate the interactions between multiple different system entities.
[0027] The amount of information provided by system monitoring can be extremely large, making direct storage and access of that information in memory impractical. However, the information has a large degree of redundancy. Redundancy can be found first in attributes, as each event record includes not only the involved entities but also the attributes of these entities. Storing the attributes repeatedly for each event is redundant. Second, saving events that involve the same entities repeatedly with only time stamps changing between them is redundant. Third, storage of records that are irrelevant to the detection of intrusion attacks is unneeded.
[0028] The graph model is therefore generated by block 402 to incorporate the meaningful information from the monitoring data in a compressed way. The graph model is represented as a directed graph G = (V, E, T), where T is the set of timestamps, E ⊆ V × V × T is the set of edges, and V = F ∪ P ∪ U ∪ S is the set of vertices, where F is the set of files residing in the computer system, P is the set of processes, U is the set of UNIX® sockets, and S is the set of Internet sockets. For a specific edge (v_i, v_j) in E, T(v_i, v_j) denotes the set of timestamps on the edge. For each event e occurring at time t, if the corresponding edge already exists in G, the timestamp t is added to the edge's timestamp set. If not, block 402 adds such an edge to G with the timestamp set T(v_i, v_j) = {t}. In this structure, the attribute values for a unique entity are stored only once. For each event sequence of a length l, there is a corresponding path through G of l edges.
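As an illustrative sketch (not the patent's implementation; the class and field names are invented for this example), the compressed structure described above can be modeled with attributes stored once per entity and each edge keeping only a growing timestamp set:

```python
from collections import defaultdict

class CompactEventGraph:
    """Hypothetical sketch of the compact graph structure G = (V, E, T)."""

    def __init__(self):
        self.attrs = {}                # entity -> attribute dict, stored once
        self.edges = defaultdict(set)  # (src, dst) -> set of timestamps T(v_i, v_j)
        self.adj = defaultdict(set)    # src -> set of dst, for traversal

    def add_entity(self, entity, **attributes):
        # Attribute values for a unique entity are stored only once.
        self.attrs.setdefault(entity, {}).update(attributes)

    def add_event(self, src, dst, timestamp):
        # If the edge already exists, only its timestamp set grows;
        # otherwise the edge is created with T(src, dst) = {timestamp}.
        self.edges[(src, dst)].add(timestamp)
        self.adj[src].add(dst)

g = CompactEventGraph()
g.add_entity("/etc/passwd", type="F")
g.add_entity("sshd", type="P")
g.add_event("/etc/passwd", "sshd", 100)
g.add_event("/etc/passwd", "sshd", 105)   # same edge, new timestamp only
```

Two events between the same pair of entities thus cost one extra timestamp rather than a full duplicated record.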
[0029] To extract the most suspicious paths from the graph G, a naive approach would be to explore all of the existing paths. However, it is impractical to enumerate all the possible paths from a densely connected graph. To provide guidance to candidate searching in block 406, block 404 generates a set of valid path patterns B. Only those paths that conform to the valid path patterns are related to potential attacks— others may be discarded.
[0030] Each path pattern B of length l includes l entities and/or entity types. Thus the path pattern B may include both specific entities (e.g., particular files) and more general designations of entity type in the same path. B is determined to be a valid path pattern only if there exists at least one path p ∈ G that is consistent with B. Considering that l may be a small number, all of the possible patterns can be enumerated, and searching the graph G permits all of the valid patterns to be extracted.
[0031] The valid path patterns B may, for example, be generated by experts based on their experience with previous intrusion attacks. However, it may be difficult to obtain an accurate and complete set of valid path patterns from such experts. As such, path patterns may be automatically generated. Each entity is set as a specific system entity type. All paths that correspond to information leakage must begin with a File entity (F) and end with an Internet socket entity (I). Given a path p ∈ G and a path pattern B of G, p[i] and B[i] represent the i-th node in p and B, respectively. The path p is consistent with B, denoted as p ≺ B, if p and B have the same length and if, for each i, p[i] ∈ B[i] (i.e., the specific entity p[i] belongs to the entity type B[i]). Then B is a valid path pattern if there exists at least one path p in G such that p ≺ B. In one example following the above constraints, there are four potential path patterns of length three using the four entity types described above: {F, F, I}, {F, P, I}, {F, U, I}, and {F, I, I}. Because only a Process node can connect a File node to an Internet socket node, only {F, P, I} is valid. In this manner, all of the valid patterns in G can be discovered.
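The pattern-validity check described above can be sketched as follows. This is a hypothetical illustration in which candidate type sequences are enumerated under the leakage constraint (start at F, end at I) and kept only if some path in the graph is consistent with them; the entity-type labels and function names are assumptions:

```python
from itertools import product

TYPES = ["F", "P", "U", "I"]   # file, process, UNIX socket, Internet socket

def consistent(path_types, pattern):
    # p < B: same length, and each node's type matches the pattern slot
    return len(path_types) == len(pattern) and all(
        t == b for t, b in zip(path_types, pattern))

def valid_patterns(paths, length):
    # enumerate candidate patterns under the information-leakage constraint
    candidates = [b for b in product(TYPES, repeat=length)
                  if b[0] == "F" and b[-1] == "I"]
    # a pattern is valid only if at least one existing path is consistent with it
    return [b for b in candidates
            if any(consistent(p, b) for p in paths)]

# in practice only a Process node connects a File node to an Internet socket:
observed_paths = [("F", "P", "I")]
print(valid_patterns(observed_paths, 3))
```

With the single observed type sequence above, only {F, P, I} survives out of the four length-three candidates.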
[0032] Based on the generated valid path patterns B, block 406 searches for paths in the multipartite graph satisfying the patterns. Given an event sequence seq = {e_1, e_2, …, e_r}, there must be an equivalent path p = {v_1, v_2, …, v_{r+1}} in the graph G. Because the events follow time order, a time order constraint is applied to candidate path searching. By applying the path patterns and the time order constraint to a breadth first search, the candidate paths can be discovered with a one-time scan of G. The candidate paths C are defined as:
C = {p | p ∈ G, ∃b ∈ B s.t. p ≺ b}
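A minimal sketch of the pattern-guided, time-ordered breadth-first search might look like this. The data layout (adjacency lists of neighbor/timestamp pairs, a type map) and all names are assumptions for illustration, not the patent's code:

```python
from collections import deque

def candidate_paths(graph, types, pattern):
    """graph: node -> list of (neighbor, set-of-timestamps); types: node -> entity type."""
    results = []
    # seed the search with every node matching the first pattern slot
    queue = deque([([v], float("-inf"))
                   for v in graph if types[v] == pattern[0]])
    while queue:
        path, last_t = queue.popleft()
        if len(path) == len(pattern):
            results.append(path)
            continue
        want = pattern[len(path)]
        for nxt, stamps in graph.get(path[-1], []):
            if types.get(nxt) != want:
                continue
            # time order constraint: the next event must happen strictly later
            later = [t for t in stamps if t > last_t]
            if later:
                queue.append((path + [nxt], min(later)))
    return results

graph = {"f1": [("p1", {1, 5})], "p1": [("s1", {3})]}
types = {"f1": "F", "p1": "P", "s1": "I"}
print(candidate_paths(graph, types, ("F", "P", "I")))
```

Each queue entry tracks the earliest usable timestamp of its last event, so an edge whose timestamps all precede that time is pruned, enforcing the time order constraint during the single scan.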
[0033] Even with a filtering policy based on path patterns and the time order constraint, there can still be a large number of candidate paths remaining in the graph G, most of which will be related to normal behavior. As such, the present embodiments extract suspicious paths from the larger set of candidate paths.
[0034] A candidate path is determined to be suspicious by block 408 if, in the path, the involved entities behave differently from their normal roles. In a computer system, the information senders and receivers are identified as entity roles. The sender and receiver scores should be accurately learned from the computer system, as they are used to set the profile of normal behavior. To achieve this, a random walk is applied to the graph G. From G, an N × N square transition matrix A is calculated as:

A[i, j] = |T(v_i, v_j)| / Σ_k |T(v_i, v_k)|

where N is the total number of entities and T(v_i, v_j) is the set of timestamps on which an event between v_i and v_j has ever happened. Thus A[i, j] denotes the probability that information flows from v_i to v_j in G.
[0035] As A is the matrix representation of the multipartite graph G, A can also be denoted in block form, with rows and columns ordered as P, F, S, U:

        P        F        S        U
  P  [  0      A_P→F    A_P→S    A_P→U ]
  F  [ A_F→P     0        0        0   ]
  S  [ A_S→P     0        0        0   ]
  U  [ A_U→P     0        0      A_U→U ]

where zero represents a zero sub-matrix, and where the arrow operator indicates a direction of information flow. For example, P → F indicates a flow of information from process to file. It should be noted that the non-zero sub-matrices of A only appear between processes and files and between processes and sockets, but not between respective processes, because process-process interactions do not come with an information flow. These are constraints set by UNIX® systems.
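Under one plausible reading of the text (transition probability proportional to the timestamp count on each edge, normalized per row), the matrix A could be computed as sketched below. The function and variable names are illustrative assumptions:

```python
def transition_matrix(entities, edge_stamps):
    """Build a row-stochastic transition matrix from edge timestamp sets.

    entities: ordered list of entity ids; edge_stamps: (src, dst) -> set of timestamps.
    """
    idx = {v: i for i, v in enumerate(entities)}
    n = len(entities)
    A = [[0.0] * n for _ in range(n)]
    for (src, dst), stamps in edge_stamps.items():
        A[idx[src]][idx[dst]] = float(len(stamps))   # |T(v_i, v_j)|
    for row in A:
        total = sum(row)
        if total > 0:            # normalize each non-empty row to probabilities
            for j in range(n):
                row[j] /= total
    return A

entities = ["p1", "f1", "s1"]
edge_stamps = {("p1", "f1"): {1, 2}, ("p1", "s1"): {3}, ("f1", "p1"): {4}}
A = transition_matrix(entities, edge_stamps)
print(A[0])
```

Here process p1 wrote to file f1 on two timestamps and to socket s1 on one, so its outgoing probabilities split 2/3 and 1/3; rows with no outgoing events stay zero.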
[0036] Letting X be the sender score vector, with X[i] denoting the sender score of entity v_i, and letting Y be the receiver score vector, each entity's sender and receiver scores can be iteratively generated as:

X_{m+1} = A × Y_m
Y_{m+1} = A^T × X_m

with initial vectors X_0 and Y_0 being randomly generated, and with m referring to the number of the current iteration. Stated generally, an entity that sends information to a large number of entities having high receiver scores is itself an important information sender, and similarly an entity that receives information from a large number of entities having high sender scores is an important information receiver. The sender and receiver scores for an entity are therefore iteratively calculated by accumulating the receiver and sender scores related to the entity. For example, the file /etc/passwd on a UNIX® system will have a high sender score and a low receiver score, because it is sent to many processes for checking access permissions, but it is rarely modified.
[0037] A result of this iterative refinement is that the learned score values will depend on the initial score values. However, the effect of the initial score values can be eliminated using the steady state property of the matrix. Given a general square matrix M and a general vector π, the vector can be repeatedly updated as:

π_{m+1} = M × π_m

A convergence state is possible such that π_{m+1} ≈ π_m for sufficiently large values of m. In this case, there is only one unique value π_c which can reach the convergence state:

π_c = M × π_c

The convergence state has the property that the converged vector is only dependent on the matrix M, but is independent of the initial vector value π_0.
[0038] To reach the convergence state, the matrix M needs to satisfy two conditions: irreducibility and aperiodicity. A graph G is irreducible if and only if, for any two nodes, there exists at least one path between them. The period of a node is the minimum path length from the node back to itself, and the period of a graph is the greatest common divisor of all of the nodes' period values. A graph G is aperiodic if and only if it is irreducible and the period of G is 1.
[0039] As the system graph G is not always strongly connected, the above-described iteration will not always reach convergence. To ensure convergence, a restart matrix R is added, which is an N × N square matrix with each cell value being 1/N. A new transition matrix Ã is defined as:

Ã = (1 − c) × A + c × R

where c is a value between 0 and 1 and is referred to as the restart ratio. Ã is guaranteed to be irreducible and aperiodic, providing converged sender score and receiver score vectors. The convergence rate can be controlled by controlling the restart ratio value. One exemplary number of iterations to use to ensure convergence is about 10.
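The restarted iteration can be sketched as below, assuming the HITS-style updates X_{m+1} = ÃY_m and Y_{m+1} = Ã^T X_m with normalization at each step. This is an illustration under those assumptions, not the patented implementation:

```python
import random

def sender_receiver_scores(A, c=0.15, iterations=10, seed=0):
    """A: transition matrix as nested lists; c: restart ratio in (0, 1)."""
    n = len(A)
    # A~ = (1 - c) * A + c * R, with R the uniform 1/N restart matrix
    At = [[(1 - c) * A[i][j] + c / n for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    X = [rng.random() for _ in range(n)]   # random initial sender scores
    Y = [rng.random() for _ in range(n)]   # random initial receiver scores
    for _ in range(iterations):
        # a sender is important if it feeds important receivers, and vice versa
        Xn = [sum(At[i][j] * Y[j] for j in range(n)) for i in range(n)]
        Yn = [sum(At[i][j] * X[i] for i in range(n)) for j in range(n)]
        sx, sy = sum(Xn), sum(Yn)
        X = [x / sx for x in Xn]           # normalize to keep scores bounded
        Y = [y / sy for y in Yn]
    return X, Y
```

Because Ã is irreducible and aperiodic, running enough iterations should wash out the random initial vectors, leaving scores that depend only on the graph.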
[0040] Based on the sender and receiver scores, and given a path p, the anomaly score for the path is calculated as:

Score(p) = 1 − NS(p)

where NS(p) is the normality score of the path p, computed from the sender and receiver scores of the entities along p.
[0041] As noted above, the anomaly scores for paths of different lengths have different distributions. Therefore, to compare the suspiciousness of paths of different lengths, a transformation is performed to put paths of different lengths on equal footing. The path anomaly score can have an arbitrary distribution and is generally not a normal distribution. The suspiciousness of a path p of length r can be defined as:

susp(p | r) = Prob(T(Score(p′)) ≤ T(Score(p)) | p′ ∈ G, |p′| = r)

where T is the normalization function.
[0042] The top k suspicious paths are those having the largest suspiciousness scores. Mathematically, it is feasible to find a transformation that converts a normal distribution to any other distribution, but it is difficult to find the inverse function. To solve this problem, a Box-Cox power transformation is used as the normalization function. In particular, letting Q(r) denote the set of anomaly scores calculated from paths of length r, for each score q ∈ Q(r):

T(q, λ) = (q^λ − 1) / λ,  if λ ≠ 0
T(q, λ) = ln(q),          if λ = 0
where λ is a normalization parameter. Different values of λ yield different transformed distributions. The aim is to select a value of λ that yields a transformed distribution as close as possible to the normal distribution (i.e., T(Q(r), λ) ~ N(μ, σ²)).
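A hedged sketch of this normalization step: the Box-Cox transform plus a simple grid search over λ that maximizes the standard Box-Cox profile log-likelihood. The patent does not specify how λ is chosen, so the grid search itself is an assumption; scores are assumed strictly positive:

```python
import math

def box_cox(q, lam):
    # T(q, lambda): power transform for lambda != 0, log for lambda == 0
    return math.log(q) if lam == 0 else (q ** lam - 1) / lam

def best_lambda(scores, grid):
    """Pick the lambda from 'grid' maximizing the Box-Cox profile log-likelihood."""
    n = len(scores)
    log_sum = sum(math.log(q) for q in scores)   # scores must be > 0
    best, best_ll = None, float("-inf")
    for lam in grid:
        t = [box_cox(q, lam) for q in scores]
        mean = sum(t) / n
        var = sum((x - mean) ** 2 for x in t) / n
        if var == 0:
            continue
        # profile log-likelihood with mu, sigma maximized out
        ll = -0.5 * n * math.log(var) + (lam - 1) * log_sum
        if ll > best_ll:
            best, best_ll = lam, ll
    return best
```

Applying `best_lambda` separately to each Q(r) puts the score distributions of different path lengths on comparable, approximately normal footing.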
[0043] The top k suspicious paths are not considered related to intrusion attacks unless they are sufficiently distinctive from the normal paths. To measure the deviation of the suspicious paths from the normal paths, the t-value is calculated between the two groups of paths. Due to the large number of normal paths, an efficient solution based on a Monte Carlo simulation is used to calculate the expectation and variance from a sample of relatively small size, without computing the full summation.
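One way to sketch the Monte Carlo shortcut is below. The exact statistic is not spelled out in the text, so this t-like deviation measure, estimated from a small random sample of the normal-path scores, is an assumption:

```python
import math
import random

def t_value(suspicious, normal_scores, sample_size=1000, seed=0):
    """Estimate how far the suspicious group's mean score deviates from normal paths.

    The normal-path mean and variance are estimated from a small Monte Carlo
    sample instead of summing over all (possibly millions of) normal paths.
    """
    rng = random.Random(seed)
    sample = [rng.choice(normal_scores) for _ in range(sample_size)]
    mu = sum(sample) / len(sample)
    var = sum((x - mu) ** 2 for x in sample) / (len(sample) - 1)
    m = sum(suspicious) / len(suspicious)
    # t-like statistic: group-mean deviation scaled by the sampled spread
    return (m - mu) / math.sqrt(var / len(suspicious) + 1e-12)
```

A large positive value indicates the top-k paths are distinctive enough to report; a value near zero suggests they blend in with normal behavior.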
[0044] Once a suspicious path has been detected, the host-level analysis module 43 provides information regarding the anomaly, generating a report that may include one or more alerts. The anomaly fusion module 44 integrates these host-level alerts with other host- and network-level anomalies and automatically filters out false alarms. The resulting list of anomalies is provided via visualization module 45 to a user. In an alternative embodiment, certain anomalies or classes of anomalies may be addressed automatically, for example by deploying security countermeasures or mitigations. In one specific example, an automatic response to a detected anomaly may be to shut down a device showing the anomalous behavior until it can be reviewed by an administrator.

[0045] Referring now to FIG. 5, pseudo-code is shown for discovering the top k suspicious paths "SP." Sender and receiver score vectors X and Y are created using the random walk process. A queue of files F_X is sorted in descending order of X, and a second queue of files F_Y is sorted in descending order of Y. Queues of processes P_X and P_Y, queues of UNIX® sockets U_X and U_Y, and queues of Internet sockets S_X and S_Y are created in the same way. The paths are then processed to find a path set that conforms to the event sequence pattern and time constraint.
[0046] Embodiments described herein may be entirely hardware, entirely software or including both hardware and software elements. In a preferred embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
[0047] Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. A computer-usable or computer-readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be a magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. The medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, an optical disk, etc.

[0048] Each computer program may be tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
[0049] A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers.
[0050] Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
[0051] Referring now to FIG. 6, an exemplary processing system 600 is shown which may represent the analysis server 30, the intrusion detection system 31, and/or the host level analysis module 43. The processing system 600 includes at least one processor (CPU) 604 operatively coupled to other components via a system bus 602. A cache 606, a Read Only Memory (ROM) 608, a Random Access Memory (RAM) 610, an input/output (I/O) adapter 620, a sound adapter 630, a network adapter 640, a user interface adapter 650, and a display adapter 660 are operatively coupled to the system bus 602.
[0052] A first storage device 622 and a second storage device 624 are operatively coupled to system bus 602 by the I/O adapter 620. The storage devices 622 and 624 can be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid state magnetic device, and so forth. The storage devices 622 and 624 can be the same type of storage device or different types of storage devices.
[0053] A speaker 632 is operatively coupled to system bus 602 by the sound adapter 630. A transceiver 642 is operatively coupled to system bus 602 by network adapter 640. A display device 662 is operatively coupled to system bus 602 by display adapter 660.
[0054] A first user input device 652, a second user input device 654, and a third user input device 656 are operatively coupled to system bus 602 by user interface adapter 650. The user input devices 652, 654, and 656 can be any of a keyboard, a mouse, a keypad, an image capture device, a motion sensing device, a microphone, a device incorporating the functionality of at least two of the preceding devices, and so forth. Of course, other types of input devices can also be used, while maintaining the spirit of the present principles. The user input devices 652, 654, and 656 can be the same type of user input device or different types of user input devices. The user input devices 652, 654, and 656 are used to input and output information to and from system 600.
[0055] Of course, the processing system 600 may also include other elements (not shown), as readily contemplated by one of skill in the art, as well as omit certain elements. For example, various other input devices and/or output devices can be included in processing system 600, depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art. For example, various types of wireless and/or wired input and/or output devices can be used. Moreover, additional processors, controllers, memories, and so forth, in various configurations can also be utilized as readily appreciated by one of ordinary skill in the art. These and other variations of the processing system 600 are readily contemplated by one of ordinary skill in the art given the teachings of the present principles provided herein.
[0056] The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.

Claims

WHAT IS CLAIMED IS:
1. A method for detecting malicious processes, comprising:
modeling system data as a graph comprising vertices that represent system entities and edges that represent events between respective system entities, each edge comprising one or more timestamps corresponding to respective events between two system entities;

generating a set of valid path patterns that relate to potential attacks; and

determining one or more event sequences in the system to be suspicious based on the graph and the valid path patterns using a random walk on the graph.
2. The method of claim 1, wherein the system entities comprise files in the system, processes in the system, UNIX® sockets in the system, and Internet sockets in the system.
3. The method of claim 1, wherein the valid patterns are determined based on properties of the system entity types.
4. The method of claim 3, wherein the valid path patterns are determined based on definitions provided by security experts according to their experiences based on previous intrusion attacks.
5. The method of claim 1, wherein determining the one or more event sequences to be suspicious comprises performing a breadth first search of candidate paths within the graph.
6. The method of claim 5, wherein the breadth first search comprises a time order constraint based on the edge timestamps.
7. The method of claim 1, wherein determining the one or more event sequences to be suspicious comprises determining that the entities on an edge deviate from the normal roles for those entities.
8. The method of claim 7, wherein determining that entities deviate from the normal role for those entities comprises determining a sender score for a sender entity and a receiver score for a receiver entity.
9. The method of claim 8, wherein determining one or more event sequences to be suspicious comprises calculating an anomaly score based on the sender score and receiver score for each entity in each event sequence.
10. The method of claim 9, wherein determining one or more event sequences to be suspicious comprises normalizing anomaly scores using a Box-Cox power transformation.
11. A system for detecting malicious processes, comprising:
a modeling module configured to model system data as a graph that comprises vertices that represent system entities and edges that represent events between respective system entities, each edge comprising one or more timestamps corresponding to respective events between two system entities; and
a malicious process path discovery module comprising a processor configured to generate a set of valid path patterns that relate to potential attacks and to determine one or more event sequences in the system to be suspicious based on the graph and the valid path patterns using a random walk on the graph.
12. The system of claim 11, wherein the system entities comprise files in the system, processes in the system, UNIX® sockets in the system, and Internet sockets in the system.
13. The system of claim 11, wherein the malicious process path discovery module is further configured to determine valid patterns based on properties of the system entity types.
14. The system of claim 13, wherein the malicious process path discovery module is further configured to determine valid path patterns based on definitions provided by security experts according to their experiences based on previous intrusion attacks.
15. The system of claim 11, wherein the malicious process path discovery module is further configured to perform a breadth first search of candidate paths within the graph.
16. The system of claim 15, wherein the breadth first search comprises a time order constraint based on the edge timestamps.
17. The system of claim 11, wherein the malicious process path discovery module is further configured to determine whether the entities on an edge deviate from the normal roles for those entities.
18. The system of claim 17, wherein the malicious process path discovery module is further configured to determine a sender score for a sender entity and a receiver score for a receiver entity.
19. The system of claim 18, wherein the malicious process path discovery module is further configured to calculate an anomaly score based on the sender score and receiver score for each entity in each event sequence.
20. The system of claim 19, wherein the malicious process path discovery module is further configured to normalize anomaly scores using a Box-Cox power transformation.
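The claims above outline a pipeline: model system events as a timestamped graph, enumerate candidate paths with a breadth-first search under a time-order constraint and a set of valid path patterns, score paths from per-entity sender/receiver scores, and normalize the scores with a Box-Cox power transformation. The Python sketch below is purely illustrative of those claimed steps — the class and function names, the representation of a "valid path pattern" as a list of entity types, the multiplicative score combination, and the choice of the Box-Cox parameter λ are all assumptions, not the implementation disclosed in the patent.

```python
from collections import defaultdict, deque
import math

class EventGraph:
    """Vertices are system entities; edges are timestamped events (hypothetical model)."""
    def __init__(self):
        # adjacency: sender entity -> list of (receiver entity, timestamp)
        self.edges = defaultdict(list)
        self.types = {}  # entity name -> "process" | "file" | "socket"

    def add_entity(self, name, etype):
        self.types[name] = etype

    def add_event(self, sender, receiver, ts):
        self.edges[sender].append((receiver, ts))

def time_ordered_bfs(graph, start, pattern):
    """Breadth-first enumeration of candidate paths whose entity types match
    `pattern` and whose edge timestamps are strictly increasing (claims 15-16)."""
    if graph.types.get(start) != pattern[0]:
        return []
    results = []
    queue = deque([(start, [], 0.0)])  # (entity, path so far, last timestamp)
    while queue:
        node, path, last_ts = queue.popleft()
        path = path + [node]
        if len(path) == len(pattern):
            results.append(path)
            continue
        for nxt, ts in graph.edges[node]:
            # time-order constraint: a later hop must occur after the earlier one
            if ts > last_ts and graph.types.get(nxt) == pattern[len(path)]:
                queue.append((nxt, path, ts))
    return results

def boxcox(x, lam):
    """Box-Cox power transformation; lam = 0 degenerates to the log transform."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def anomaly_score(path, sender_score, receiver_score, lam=0.5):
    """Combine per-entity sender/receiver scores along a path (claims 18-19)
    and normalize with a Box-Cox transformation (claim 20). The multiplicative
    combination and lam=0.5 are illustrative choices."""
    raw = 1.0
    for a, b in zip(path, path[1:]):
        raw *= sender_score(a) * receiver_score(b)
    return boxcox(raw, lam)
```

For example, a path pattern such as `["process", "file", "socket"]` would match a process writing a file that is later read out over a socket, but only when the event timestamps respect that order; paths whose scores remain anomalous after the Box-Cox normalization would be flagged as suspicious event sequences.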
PCT/US2016/043040 2015-07-24 2016-07-20 Graph-based intrusion detection using process traces WO2017019391A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
DE112016002806.7T DE112016002806T5 (en) 2015-07-24 2016-07-20 Graph-based intrusion detection using process traces
JP2018502363A JP6557774B2 (en) 2015-07-24 2016-07-20 Graph-based intrusion detection using process trace

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201562196404P 2015-07-24 2015-07-24
US62/196,404 2015-07-24
US15/213,896 US10305917B2 (en) 2015-04-16 2016-07-19 Graph-based intrusion detection using process traces
US15/213,896 2016-07-19

Publications (1)

Publication Number Publication Date
WO2017019391A1 true WO2017019391A1 (en) 2017-02-02

Family

ID=57885033

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/043040 WO2017019391A1 (en) 2015-07-24 2016-07-20 Graph-based intrusion detection using process traces

Country Status (3)

Country Link
JP (1) JP6557774B2 (en)
DE (1) DE112016002806T5 (en)
WO (1) WO2017019391A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070209074A1 (en) * 2006-03-04 2007-09-06 Coffman Thayne R Intelligent intrusion detection system utilizing enhanced graph-matching of network activity with context data
EP1919162A2 (en) * 2006-10-30 2008-05-07 Juniper Networks, Inc. Identification of potential network threats using a distributed threshold random walk
US20080178293A1 (en) * 2007-01-23 2008-07-24 Arthur Keen Network intrusion detection
US20090222435A1 (en) * 2008-03-03 2009-09-03 Microsoft Corporation Locally computable spam detection features and robust pagerank
US20100312669A1 (en) * 2005-04-11 2010-12-09 Microsoft Corporation Method and system for performing searches and returning results based on weighted criteria

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5121622B2 (en) * 2008-08-05 2013-01-16 日本電信電話株式会社 Access destination scoring method and program
JP2011053893A (en) * 2009-09-01 2011-03-17 Hitachi Ltd Illicit process detection method and illicit process detection system

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11194901B2 (en) 2016-03-30 2021-12-07 British Telecommunications Public Limited Company Detecting computer security threats using communication characteristics of communication protocols
US11159549B2 (en) 2016-03-30 2021-10-26 British Telecommunications Public Limited Company Network traffic threat identification
US12079345B2 (en) 2017-04-14 2024-09-03 The Trustees Of Columbia University In The City Of New York Methods, systems, and media for testing insider threat detection systems
US11194915B2 (en) 2017-04-14 2021-12-07 The Trustees Of Columbia University In The City Of New York Methods, systems, and media for testing insider threat detection systems
US11973778B2 (en) 2018-12-03 2024-04-30 British Telecommunications Public Limited Company Detecting anomalies in computer networks
US11520882B2 (en) 2018-12-03 2022-12-06 British Telecommunications Public Limited Company Multi factor network anomaly detection
US11960610B2 (en) 2018-12-03 2024-04-16 British Telecommunications Public Limited Company Detecting vulnerability change in software systems
US11989307B2 (en) 2018-12-03 2024-05-21 British Telecommunications Public Company Limited Detecting vulnerable software systems
US11989289B2 (en) 2018-12-03 2024-05-21 British Telecommunications Public Limited Company Remediating software vulnerabilities
US11552977B2 (en) 2019-01-09 2023-01-10 British Telecommunications Public Limited Company Anomalous network node behavior identification using deterministic path walking
CN111651761A (en) * 2019-03-04 2020-09-11 腾讯科技(深圳)有限公司 Method, apparatus, server and storage medium for detecting black-market electronic devices
US11483326B2 (en) 2019-08-30 2022-10-25 Palo Alto Networks, Inc. Context informed abnormal endpoint behavior detection
US11888881B2 (en) 2019-08-30 2024-01-30 Palo Alto Networks, Inc. Context informed abnormal endpoint behavior detection
WO2021041901A1 (en) * 2019-08-30 2021-03-04 Palo Alto Networks, Inc. Context informed abnormal endpoint behavior detection
CN112487421A (en) * 2020-10-26 2021-03-12 中国科学院信息工程研究所 Heterogeneous network-based android malicious application detection method and system
CN112487421B (en) * 2020-10-26 2024-06-11 中国科学院信息工程研究所 Android malicious application detection method and system based on heterogeneous network
US20240089091A1 (en) * 2022-09-13 2024-03-14 Capital One Services, Llc Secure cryptographic transfer using multiparty computation

Also Published As

Publication number Publication date
JP6557774B2 (en) 2019-08-07
JP2018526728A (en) 2018-09-13
DE112016002806T5 (en) 2018-03-22

Similar Documents

Publication Publication Date Title
US10305917B2 (en) Graph-based intrusion detection using process traces
WO2017019391A1 (en) Graph-based intrusion detection using process traces
US12126636B2 (en) Anomaly alert system for cyber threat detection
US11463472B2 (en) Unknown malicious program behavior detection using a graph neural network
Rabbani et al. A hybrid machine learning approach for malicious behaviour detection and recognition in cloud computing
EP3211854B1 (en) Cyber security
US10419466B2 (en) Cyber security using a model of normal behavior for a group of entities
US11316891B2 (en) Automated real-time multi-dimensional cybersecurity threat modeling
Friedberg et al. Combating advanced persistent threats: From network event correlation to incident detection
US20160308725A1 (en) Integrated Community And Role Discovery In Enterprise Networks
Hu et al. A simple and efficient hidden Markov model scheme for host-based anomaly intrusion detection
US10298607B2 (en) Constructing graph models of event correlation in enterprise security systems
JP7302019B2 (en) Hierarchical Behavior Modeling and Detection Systems and Methods for System-Level Security
Garitano et al. A review of SCADA anomaly detection systems
Alserhani et al. MARS: multi-stage attack recognition system
Dong et al. Efficient discovery of abnormal event sequences in enterprise security systems
Cotroneo et al. Automated root cause identification of security alerts: Evaluation in a SaaS Cloud
Cheng et al. A novel probabilistic matching algorithm for multi-stage attack forecasts
WO2018071356A1 (en) Graph-based attack chain discovery in enterprise security systems
Ahmed Thwarting dos attacks: A framework for detection based on collective anomalies and clustering
WO2023163842A1 (en) Thumbprinting security incidents via graph embeddings
Kang et al. Actdetector: A sequence-based framework for network attack activity detection
Jeon et al. An Effective Threat Detection Framework for Advanced Persistent Cyberattacks
Al Mallah et al. On the initial behavior monitoring issues in federated learning
Li et al. Few-shot multi-domain knowledge rearming for context-aware defence against advanced persistent threats

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16831068

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018502363

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 112016002806

Country of ref document: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16831068

Country of ref document: EP

Kind code of ref document: A1