CN114637892A - Overview map generation method of system log dependency map for attack investigation and recovery - Google Patents

Info

Publication number
CN114637892A
Authority
CN
China
Prior art keywords
graph
nodes
node
subgraph
system entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210107372.9A
Other languages
Chinese (zh)
Inventor
孟丹
文雨
徐志强
张博洋
杨纯
郑阳
张东雪
杜莹莹
吴艳娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202210107372.9A priority Critical patent/CN114637892A/en
Publication of CN114637892A publication Critical patent/CN114637892A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for generating a summary graph of a system log dependency graph for attack investigation and recovery, comprising: determining a system entity dependency graph of an attack event to be investigated and recovered, wherein the dependency graph comprises system entity nodes related to the attack event and the call relationships between those nodes, and the system entity nodes comprise process nodes and resource nodes; performing a hierarchical random walk on the process nodes in the dependency graph to determine a behavior representation of each process node; clustering the process nodes based on the behavior representations, and dividing the dependency graph into at least one first subgraph based on the clustering result; compressing each first subgraph to obtain at least one second subgraph; and generating a summary for each second subgraph to obtain a summary graph corresponding to the dependency graph. By dividing the dependency graph into several subgraphs and giving each subgraph a concise summary, the invention makes it easy to view an overview of the related system activity and the summary information of attack-related subgraphs.

Description

Overview map generation method of system log dependency map for attack investigation and recovery
Technical Field
The invention relates to the technical field of network security, in particular to a method for generating a summary graph of a system log dependency graph for attack investigation and restoration.
Background
In order to cope with network attacks, causal analysis based on system monitoring becomes an important method for attack investigation.
Causal analysis uses a system entity dependency graph to represent system call events. Based on this graph, the context of an attack can be investigated by reconstructing the chain of events leading to a point-of-interest (POI) event, and this context can effectively reveal the events related to the attack. However, because of the dependency explosion problem, it is difficult to extract the required context efficiently from a large graph, and extensive manual inspection is needed.
To address dependency explosion, existing methods mainly filter irrelevant events out of the dependency graph automatically and reveal attack-related events. Although these attack investigation techniques based on the system entity dependency graph work well, manual attack investigation is still required, so their practical scope of application remains relatively limited.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a method for generating a summary graph of a system log dependency graph for attack investigation and restoration.
In a first aspect, the present invention provides a method for generating a summary graph of a system log dependency graph for attack investigation and recovery, including:
determining a system entity dependency relationship graph of an attack event to be investigated and restored, wherein the system entity dependency relationship graph comprises system entity nodes associated with the attack event to be investigated and restored and call relationships among the system entity nodes; the system entity nodes comprise process nodes and resource nodes, and the calling relationship among the system entity nodes represents system activities;
performing a hierarchical random walk on the process nodes in the system entity dependency graph, and determining the behavior representation of the process nodes;
clustering the process nodes based on the behavior representation of the process nodes, and dividing the system entity dependency relationship graph into at least one first subgraph based on the clustering result;
compressing each first sub-graph of the at least one first sub-graph to obtain at least one second sub-graph, wherein the at least one second sub-graph is in one-to-one correspondence with the at least one first sub-graph;
and generating a summary corresponding to each second subgraph in the at least one second subgraph, and obtaining a summary graph corresponding to the system entity dependency graph.
Optionally, according to a method for generating a summary graph of a system log dependency graph for attack investigation and recovery provided by the present invention, performing hierarchical random walk on a process node in the system entity dependency graph to determine a behavior representation of the process node, includes:
randomly walking by a preset length by taking each process node in the system entity dependency relationship graph as a starting point to generate a walking route;
and acquiring the behavior representation of the process node by adopting a word vector model based on the walking route.
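The walk-generation step can be illustrated with a minimal sketch. It assumes a plain uniform random walk over an undirected view of the graph, not the hierarchical walk scheme itself, whose details are not given in this passage. The function names, the `walks_per_node` parameter, and the toy edges are hypothetical. Each resulting route is a list of node names that could then be passed, like a sentence, to a word vector model (not shown here).

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency lists from (source, target) pairs."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        adj[dst].append(src)
    return adj


def random_walks(adj, process_nodes, walk_length, walks_per_node=1, seed=0):
    """Start from every process node and walk up to walk_length nodes."""
    rng = random.Random(seed)
    walks = []
    for start in process_nodes:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj.get(walk[-1], [])
                if not neighbors:  # isolated node: the route ends early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy graph (hypothetical): processes bash, leak, tar, bzip2 and one file.
edges = [
    ("bash", "leak"),
    ("leak", "tar"),
    ("tar", "./upload.tar"),
    ("./upload.tar", "bzip2"),
]
adj = build_adjacency(edges)
walks = random_walks(adj, ["leak", "tar", "bzip2"], walk_length=5)
```

Fixing the seed makes the routes reproducible, which is convenient when comparing the behavior representations learned from different runs.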
Optionally, according to a summary graph generating method of a system log dependency graph for attack investigation and recovery provided by the present invention, compressing each of the at least one first sub-graph to obtain at least one second sub-graph includes:
determining a first pattern in a target subgraph of the at least one first subgraph, the first pattern being a pattern in which the same process node spawns at least two identical sets of process nodes that access the same resource node, wherein each set of process nodes comprises at least one child process node, and the resource node comprises a file node or a network node;
and merging the identical child process nodes in the first pattern and merging the edges connected to those child process nodes, thereby completing the compression of the target subgraph and obtaining the second subgraph corresponding to the target subgraph.
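As an illustration of the first pattern, the following sketch merges child processes that are spawned by the same parent, share the same executable name, and access the same resources with the same operations. Treating "identical" as this exact triple is an assumption made for the example. The function name, the identifiers `p1` to `p4`, and the file name are hypothetical.

```python
from collections import defaultdict


def merge_repeated_children(spawn_edges, access_edges, names):
    """Merge identical child processes of the same parent.

    spawn_edges:  list of (parent_id, child_id)
    access_edges: list of (process_id, resource, operation)
    names:        dict mapping process_id to executable name
    """
    accessed = defaultdict(set)
    for proc, res, op in access_edges:
        accessed[proc].add((res, op))

    # Children are "identical" (assumption) when parent, name and the set of
    # accessed (resource, operation) pairs all match.
    groups = defaultdict(list)
    for parent, child in spawn_edges:
        key = (parent, names[child], frozenset(accessed[child]))
        groups[key].append(child)

    mapping = {}
    for children in groups.values():
        representative = children[0]
        for child in children:
            mapping[child] = representative

    # Rewriting the edges through the mapping also merges the duplicate edges.
    new_spawn = sorted({(mapping.get(p, p), mapping[c]) for p, c in spawn_edges})
    new_access = sorted({(mapping.get(p, p), r, op) for p, r, op in access_edges})
    return new_spawn, new_access, mapping


names = {"p1": "bash", "p2": "vim", "p3": "vim", "p4": "cat"}
spawn = [("p1", "p2"), ("p1", "p3"), ("p1", "p4")]
access = [
    ("p2", "notes.txt", "write"),
    ("p3", "notes.txt", "write"),
    ("p4", "notes.txt", "read"),
]
new_spawn, new_access, mapping = merge_repeated_children(spawn, access, names)
```

In this toy input the two `vim` children collapse into one node, while `cat` is kept because it accesses the file with a different operation.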
Optionally, according to a summary graph generating method of a system log dependency graph for attack investigation and recovery provided by the present invention, compressing each of the at least one first sub-graph to obtain at least one second sub-graph includes:
determining a second pattern in a target subgraph of the at least one first subgraph, the second pattern being a pattern in which the same process node accesses different resource nodes at least twice, wherein the resource nodes comprise file nodes or network nodes;
and merging the different resource nodes in the second pattern, thereby completing the compression of the target subgraph and obtaining the second subgraph corresponding to the target subgraph.
Optionally, according to a summary graph generating method of a system log dependency graph for attack investigation and restoration provided by the present invention, before the compressing each of the at least one first sub-graph, the method further includes:
in a case where at least two process nodes access the same resource node and belong to different first subgraphs, creating at least one copy node of the resource node;
and allocating the resource node and the at least one copy node, in one-to-one correspondence, to the first subgraphs in which the at least two process nodes are located, and creating a directed edge between the resource node and the at least one copy node to connect them.
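A minimal sketch of this copy-node step is shown below. The direction of the connecting edge (original node to copy), the naming scheme for copies, and the rule that the first subgraph in sorted order keeps the original node are all assumptions made for illustration.

```python
def place_shared_resource(resource, accessing_processes, community_of):
    """Give each subgraph that accesses a shared resource its own node.

    Returns (placement, link_edges): placement maps a subgraph name to the
    node it receives; link_edges connect the original node to its copies.
    """
    communities = sorted({community_of[p] for p in accessing_processes})
    placement = {}
    link_edges = []
    for index, community in enumerate(communities):
        if index == 0:
            # The first subgraph keeps the original resource node (assumption).
            placement[community] = resource
        else:
            copy_id = f"{resource}#copy{index}"  # hypothetical naming scheme
            placement[community] = copy_id
            link_edges.append((resource, copy_id))
    return placement, link_edges


# Hypothetical example: leak.sh is accessed by processes in C1 and C2.
placement, link_edges = place_shared_resource(
    "leak.sh", ["curl", "leak"], {"curl": "C1", "leak": "C2"}
)
```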
Optionally, according to a summary graph generating method of a system log dependency graph for attack investigation and recovery provided by the present invention, before performing hierarchical random walk on a process node in the system entity dependency graph, the method further includes:
merging the parallel edges between each process node and each resource node in the system entity dependency graph, where parallel edges are edges between the same pair of nodes that have the same operation type, i.e. the same read operation or the same write operation.
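The parallel-edge merge can be sketched as follows. The event tuple layout and the choice to keep the earliest start time and latest end time on the merged edge are assumptions for the example, not details stated in this passage.

```python
def merge_parallel_edges(events):
    """Merge events that share source, target and operation type.

    events: list of (source, target, operation, start_time, end_time)
    """
    merged = {}
    for src, dst, op, start, end in events:
        key = (src, dst, op)
        if key in merged:
            first_start, last_end = merged[key]
            merged[key] = (min(first_start, start), max(last_end, end))
        else:
            merged[key] = (start, end)
    # Dicts preserve insertion order, so edges keep their first-seen order.
    return [(s, d, o, st, en) for (s, d, o), (st, en) in merged.items()]


events = [
    ("tar", "./upload.tar", "write", 1, 2),
    ("tar", "./upload.tar", "write", 3, 5),
    ("bzip2", "./upload.tar", "read", 6, 7),
]
merged_events = merge_parallel_edges(events)
```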
Optionally, according to a summary graph generation method of a system log dependency graph for attack investigation and recovery provided by the present invention, before performing hierarchical random walk on a process node in the system entity dependency graph, the method further includes:
and deleting the resource nodes which only have input edges but not output edges in the system entity dependency relationship graph.
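The pruning step can be sketched as a degree check on each resource node. The sketch takes the edge direction as given by the `(source, target)` pairs, which is an assumption about how the graph is stored, and the toy file names are hypothetical.

```python
def drop_sink_resources(edges, resource_nodes):
    """Remove resource nodes that have incoming edges but no outgoing edges.

    edges: list of (source, target); resource_nodes: set of node names.
    Returns (kept_edges, removed_nodes).
    """
    has_outgoing = {src for src, _ in edges}
    has_incoming = {dst for _, dst in edges}
    removed = {
        node
        for node in resource_nodes
        if node in has_incoming and node not in has_outgoing
    }
    # Only edges into a removed node disappear; removed nodes have no out-edges.
    kept = [(src, dst) for src, dst in edges if dst not in removed]
    return kept, removed


edges = [
    ("tar", "./upload.tar"),
    ("./upload.tar", "bzip2"),
    ("bzip2", "./log.txt"),
]
kept_edges, removed = drop_sink_resources(edges, {"./upload.tar", "./log.txt"})
```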
Optionally, according to the method for generating a summary graph of a system log dependency graph for attack investigation and recovery provided by the present invention, the summary graph includes at least one of the following items:
a main process;
a time span;
a target information stream;
wherein the main process represents the parent process node of the system activity included in the second subgraph;
the time span represents the interval between the earliest start time and the latest end time of the system activity included in the second subgraph;
and the target information flows are those information flows, among all information flows corresponding to the system activity in the second subgraph, whose priority ranks within a preset number of top positions.
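The three summary items can be assembled as in the sketch below. It assumes that each information flow already carries a numeric priority score; how that priority is computed is not described in this passage. The `Summary` class, the field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    main_process: str
    time_span: tuple  # (earliest start time, latest end time)
    top_flows: list   # descriptions of the highest-priority information flows


def summarize(main_process, event_times, flows, top_k):
    """Build a summary for one compressed subgraph.

    event_times: list of (start_time, end_time), one per event
    flows:       list of (priority, description); a higher priority ranks first
    top_k:       the preset number of information flows to keep
    """
    earliest = min(start for start, _ in event_times)
    latest = max(end for _, end in event_times)
    ranked = sorted(flows, key=lambda flow: flow[0], reverse=True)[:top_k]
    return Summary(main_process, (earliest, latest), [d for _, d in ranked])


summary = summarize(
    "leak",
    [(10, 12), (11, 20), (15, 18)],
    [(0.2, "tar -> ./upload.tar"), (0.9, "gpg -> ./upload"), (0.5, "curl -> xxx")],
    top_k=2,
)
```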
Optionally, according to the summary graph generation method of a system log dependency graph for attack investigation and recovery provided by the present invention, merging the different resource nodes in the second pattern includes:
merging the different resource nodes in the second pattern into a single merged resource node, wherein the attributes of the merged resource node are the union of the attributes of the different resource nodes.
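A minimal sketch of this merge follows. As an added assumption, it only merges resources that are accessed by a single process with a single operation, so that data dependencies between processes are not collapsed. Attribute values are stored as sets so that the union is direct. The function name and the file names are hypothetical.

```python
from collections import defaultdict


def merge_resources(access_edges, attrs):
    """Merge resources accessed by one process and take the union of attributes.

    access_edges: list of (process, resource, operation)
    attrs:        dict mapping resource to {attribute name: set of values}
    """
    users = defaultdict(set)
    for proc, res, op in access_edges:
        users[res].add((proc, op))

    # Group resources that have exactly one (process, operation) user.
    groups = defaultdict(list)
    for res, uses in users.items():
        if len(uses) == 1:
            groups[next(iter(uses))].append(res)

    mapping = {}
    merged_attrs = {}
    for resources in groups.values():
        if len(resources) < 2:
            continue
        merged_id = "+".join(sorted(resources))  # hypothetical naming scheme
        union = defaultdict(set)
        for res in resources:
            mapping[res] = merged_id
            for key, values in attrs[res].items():
                union[key] |= values
        merged_attrs[merged_id] = dict(union)

    for res in users:  # resources that were not merged keep their attributes
        if res not in mapping:
            merged_attrs[res] = attrs[res]
    new_edges = sorted({(p, mapping.get(r, r), op) for p, r, op in access_edges})
    return new_edges, merged_attrs


access = [
    ("scp", "a.txt", "read"),
    ("scp", "b.txt", "read"),
    ("scp", "c.txt", "read"),
    ("vim", "d.txt", "write"),
]
attrs = {n: {"Name": {n}, "Path": {"/home/u"}} for n in ["a.txt", "b.txt", "c.txt", "d.txt"]}
new_edges, merged_attrs = merge_resources(access, attrs)
```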
In a second aspect, the present invention further provides an apparatus for generating a summary graph of a system log dependency graph for attack investigation and recovery, including:
a first determination module, configured to determine a system entity dependency graph of an attack event to be investigated and recovered, wherein the system entity dependency graph comprises system entity nodes associated with the attack event and the call relationships among the system entity nodes; the system entity nodes comprise process nodes and resource nodes, and the call relationships among the system entity nodes represent system activities;
a second determination module, configured to perform a hierarchical random walk on the process nodes in the system entity dependency graph and determine the behavior representation of the process nodes;
a subgraph division module, configured to cluster the process nodes based on their behavior representations and divide the system entity dependency graph into at least one first subgraph based on the clustering result;
a subgraph compression module, configured to compress each first subgraph of the at least one first subgraph to obtain at least one second subgraph, the at least one second subgraph being in one-to-one correspondence with the at least one first subgraph;
and a summary graph generation module, configured to generate a summary corresponding to each of the at least one second subgraph and obtain a summary graph corresponding to the system entity dependency graph.
In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the program, the processor implements the steps of the method for generating a summary graph of a system log dependency graph for attack investigation and recovery according to the first aspect.
In a fourth aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for generating a summary graph of a system log dependency graph for attack investigation and recovery according to the first aspect.
In a fifth aspect, the present invention further provides a computer program product, which includes a computer program, wherein the computer program, when executed by a processor, implements the steps of the method for generating a summary graph of a system log dependency graph for attack investigation and recovery as described in any of the above.
The invention provides a method for generating a summary graph of a system log dependency graph for attack investigation and recovery. The method determines a system entity dependency graph of an attack event to be investigated and recovered, performs a hierarchical random walk on the process nodes in that graph to determine their behavior representations, divides the graph into at least one first subgraph based on those behavior representations, compresses each first subgraph to obtain at least one second subgraph, and finally generates a summary of each second subgraph to obtain the summary graph corresponding to the system entity dependency graph. Because the system entity dependency graph is divided into several subgraphs, each containing only closely related processes that jointly complete a system task, and each subgraph is given a concise summary, the resulting summary graph preserves the semantics of the system activity while hiding less important details. Presenting the graph in summary form reduces its size and makes it easy to view an overview of the related system activity and the summary information of attack-related communities.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a first schematic flowchart of the summary graph generation method provided by the present invention;
FIG. 2 is a schematic diagram of a system entity dependency graph in the summary graph generation method provided by the present invention;
FIG. 3 is a first schematic diagram of overlapping nodes in the summary graph generation method provided by the present invention;
FIG. 4 is a second schematic diagram of overlapping nodes in the summary graph generation method provided by the present invention;
FIG. 5 is a third schematic diagram of overlapping nodes in the summary graph generation method provided by the present invention;
FIG. 6 is a schematic diagram of a summary graph in the summary graph generation method provided by the present invention;
FIG. 7 is a schematic diagram of the first pattern in the summary graph generation method provided by the present invention;
FIG. 8 is a schematic diagram of the second pattern in the summary graph generation method provided by the present invention;
FIG. 9 is a second schematic flowchart of the summary graph generation method provided by the present invention;
FIG. 10 is a schematic diagram of the number of subgraphs monitored by the summary graph generation method provided by the present invention;
FIG. 11 is a schematic diagram of the size distribution of the subgraphs monitored by the summary graph generation method provided by the present invention;
FIG. 12 is a schematic diagram of the node compression ratio distribution of the summary graph generation method provided by the present invention;
FIG. 13 is a schematic diagram of the edge compression ratio distribution of the summary graph generation method provided by the present invention;
FIG. 14 is a schematic structural diagram of the summary graph generation apparatus provided by the present invention;
FIG. 15 is a schematic diagram of the physical structure of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms first, second and the like in the description and in the claims of the present invention are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application are capable of operation in sequences other than those illustrated or described herein, and that the terms "first," "second," etc. are generally used in a generic sense and do not limit the number of terms, e.g., a first term can be one or more than one. In addition, "and/or" in the specification and claims means at least one of connected objects, a character "/" generally means that a preceding and succeeding related objects are in an "or" relationship.
To facilitate a clearer understanding of embodiments of the present invention, some relevant background information is first presented below.
In order to cope with network attacks, causal analysis based on system monitoring has become an important method for attack investigation. System monitoring observes system calls and generates kernel-level audit events as system audit logs. These logs enable causal analysis to identify the entry point of an intrusion (backward tracking) and the ramifications of an attack (forward tracking), which has proven effective in assisting attack investigation and system recovery.
Although causal analysis has been successful in some areas, existing methods require extensive manual inspection, which prevents their widespread use. Causal analysis treats the system entities (e.g., files, processes, and network connections) involved in the same system call event (e.g., a process reading a file) as having a causal dependency. Based on these dependencies, the methods represent system call events with a system entity dependency graph, in which the nodes are system entities, the edges are events, and the connections between edges reflect the dependencies derived from the system events.
Using a system entity dependency graph, the context of an attack may be investigated by reconstructing the chain of events that leads to a POI (point of interest) event (e.g., an alarm event reported by an intrusion detection system). Such context can effectively reveal the events related to the attack. However, because of the dependency explosion problem, it is difficult to efficiently extract the required context from a large graph (typically containing more than 100K edges).
To overcome the dependency explosion that arises when a system entity dependency graph is used in attack investigation, existing methods mainly filter irrelevant events automatically and reveal attack-related events. Although these techniques work well, manual attack investigation is still essential for three main reasons:
(1) there is always some residual risk in the system; although this risk is small, automated techniques cannot reveal it accurately, especially techniques that rely heavily on system profiles;
(2) threats keep evolving to evade defense techniques, for example through emerging attack tactics and newly developed adversary techniques;
(3) the prior art relies mainly on heuristic rules, which can cause information loss, and some techniques require changes to the system (such as binary detection) in order to discover attack behavior, so they generalize poorly, which hinders their practical application.
The following describes the method and apparatus for generating a summary graph of a system log dependency graph for attack investigation and recovery provided by the present invention with reference to FIG. 1 to FIG. 14.
FIG. 1 is a first schematic flowchart of the summary graph generation method provided by the present invention. As shown in FIG. 1, the method includes the following steps:
step 100, determining a system entity dependency relationship graph of an attack event to be investigated and restored, wherein the system entity dependency relationship graph comprises system entity nodes associated with the attack event to be investigated and restored and call relations among the system entity nodes; the system entity nodes comprise process nodes and resource nodes, and the calling relationship among the system entity nodes represents system activities;
step 110, executing hierarchical random walking on the process nodes in the system entity dependency relationship graph, and determining the behavior representation of the process nodes;
step 120, clustering the process nodes based on the behavior representation of the process nodes, and dividing the system entity dependency relationship graph into at least one first subgraph based on the clustering result;
step 130, compressing each first sub-graph of the at least one first sub-graph to obtain at least one second sub-graph, wherein the at least one second sub-graph corresponds to the at least one first sub-graph one by one;
step 140, generating a summary corresponding to each of the at least one second subgraph, and obtaining a summary graph corresponding to the system entity dependency graph.
To overcome the dependency explosion that arises when a system entity dependency graph is used in attack investigation, the invention divides the system entity dependency graph into communities (subgraphs), each containing only closely related processes, and generates a summary for each community. The summaries preserve the semantics of the system activity in the dependency graph while hiding less important details, and presenting the system activity in summary form reduces the size of the graph and makes it easy to view an overview of the related system activity and the summary information of attack-related communities.
Optionally, a system entity dependency graph of the attack events to be investigated and restored may be determined.
Optionally, the system entity dependency graph may include system entity nodes associated with the attack events to be investigated and restored and invocation relationships between the respective system entity nodes.
Optionally, the system entity nodes may include process nodes and resource nodes.
Optionally, the resource nodes may include file nodes and network nodes.
Optionally, the calling relationship between system entity nodes may characterize system activity.
Optionally, a hierarchical random walk may be performed on the process nodes in the system entity dependency graph to determine the behavior representation of each process node.
Optionally, the process nodes may be clustered based on their behavior representations, so that process nodes with similar behavior representations are grouped into the same cluster.
Optionally, in order to compute overlapping clusters of process nodes from their behavior representations, the process nodes may be clustered with the soft clustering method FCM (Fuzzy C-means).
Specifically, unlike hard clustering methods (e.g., K-means), which assign each process node to exactly one cluster, FCM minimizes an objective function and outputs the degree of membership of each process node in each cluster.
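To make the difference from hard clustering concrete, the sketch below computes the standard fuzzy C-means membership degrees for fixed cluster centers. The full algorithm alternates this step with a center update until convergence, which is omitted here. The fuzzifier `m`, the two-dimensional toy vectors, and the membership threshold used to read off overlapping clusters are assumptions for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership u[i][j] of point i in cluster j for fixed centers.

    Uses the standard FCM update
        u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)),
    where d_ij is the Euclidean distance from point i to center j.
    """
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # A point lying on a center belongs fully to that center.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append(
            [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
        )
    return memberships


def overlapping_clusters(row, threshold=0.3):
    """Clusters whose membership reaches a (hypothetical) threshold."""
    return [j for j, u in enumerate(row) if u >= threshold]


# Toy behavior representations (hypothetical 2-D vectors) and two centers.
points = [(0.0, 0.0), (0.25, 0.0), (0.5, 0.0)]
centers = [(0.0, 0.0), (1.0, 0.0)]
u = fcm_memberships(points, centers)
```

A node midway between two centers receives comparable membership in both clusters, which is how a soft clustering can place one process node in more than one community.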
Optionally, the system entity dependency graph may be partitioned into at least one first subgraph based on the results of the process node clustering, so that closely related groups of processes may be determined as process node-centric subgraphs.
Optionally, each of the at least one first sub-graph may be compressed to obtain at least one second sub-graph.
For example, redundant edges or redundant nodes present in the first subgraph can be merged or deleted.
For example, edges included in the first subgraph that have the same read operation or the same write operation may be merged.
For example, read-only file nodes in the first subgraph that do not contain useful attack-related information may be deleted.
Optionally, the at least one second subgraph corresponds one-to-one to the at least one first subgraph; that is, compressing a first subgraph yields the compressed subgraph, which serves as the corresponding second subgraph.
Optionally, a summary corresponding to each of the at least one second subgraph may be generated to obtain a summary graph corresponding to the system entity dependency graph.
Optionally, the summary corresponding to each second sub-graph can be used to characterize summary information of the relevant system activity corresponding to that second sub-graph.
Optionally, after generating the summary corresponding to each second sub-graph, a summary graph corresponding to the system entity dependency graph may be obtained.
Optionally, system audit events, including process events, file events, and network events, may be collected using system audit tools that run on mainstream operating systems (e.g., Windows, Linux, Mac OS, and Android). For each collected entity and event, the attributes essential to security analysis may be recorded (e.g., the PID (Process Identifier), file name, and IP (Internet Protocol) address of an entity, and the start time, end time, and operation of an event), as shown in Tables 1 and 2.
TABLE 1 System entity attributes
Entity     Attributes
Process    PID, Name, User, Cmd
File       Name, Path
Network    IP, Port, Protocol
TABLE 2 System event attributes
[Table 2 is reproduced as images in the original publication: Figure BDA0003494382810000111 and Figure BDA0003494382810000121]
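One possible in-memory form of these records is sketched below. The entity fields follow Table 1. The event fields are limited to the operation, start time, and end time named in the text, since the contents of Table 2 are not available here. The class names, field types, and sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessEntity:
    pid: int
    name: str
    user: str
    cmd: str


@dataclass(frozen=True)
class FileEntity:
    name: str
    path: str


@dataclass(frozen=True)
class NetworkEntity:
    ip: str
    port: int
    protocol: str


@dataclass(frozen=True)
class Event:
    source: object     # entity that initiates the event
    target: object     # entity that the event acts on
    operation: str     # e.g. "read" or "write"
    start_time: float
    end_time: float


# Hypothetical sample record: a tar process writes an archive file.
tar = ProcessEntity(pid=4321, name="tar", user="alice", cmd="tar cf upload.tar docs")
archive = FileEntity(name="upload.tar", path="./upload.tar")
event = Event(tar, archive, "write", start_time=10.0, end_time=12.5)
```

Frozen dataclasses are hashable, so entities can be used directly as node keys when the events are assembled into a graph.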
Optionally, given a POI event (e.g., an alert about a file download), the system entity dependency graph can be constructed by performing backward causal analysis to track the dependencies between system entities.
For example, FIG. 2 is a schematic diagram of a system entity dependency graph in the summary graph generation method provided by the present invention. As shown in FIG. 2, starting from the POI event, causal analysis iteratively finds the events that lie on a dependency path of the POI event and occurred before it, and the events found in this way (i.e., edges) form the system entity dependency graph of the POI event.
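The backward tracking described above can be sketched as a breadth-first search over events. The sketch assumes that each event is a `(source, target, timestamp)` tuple with information flowing from source to target. The event list is a made-up example and is not taken from FIG. 2.

```python
from collections import deque


def backward_track(events, poi):
    """Collect the events that a POI event causally depends on.

    An earlier event is included when its target is the source of an already
    collected event and its timestamp is strictly smaller.
    """
    incoming = {}
    for event in events:
        incoming.setdefault(event[1], []).append(event)

    collected = [poi]
    seen = {poi}
    queue = deque([poi])
    while queue:
        source, _, timestamp = queue.popleft()
        for previous in incoming.get(source, []):
            if previous[2] < timestamp and previous not in seen:
                seen.add(previous)
                collected.append(previous)
                queue.append(previous)
    return collected


# Hypothetical audit events: (source, target, timestamp).
events = [
    ("bash", "leak", 1),
    ("leak", "curl", 2),
    ("xxx", "curl", 3),
    ("curl", "leak.sh", 4),
    ("vim", "leak.sh", 9),
]
poi = ("curl", "leak.sh", 4)
dependency_edges = backward_track(events, poi)
```

The write by `vim` at time 9 is excluded because it neither precedes the POI event nor lies on a path leading to it.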
Optionally, groups of tightly connected processes may be identified as process-node-centric communities (subgraphs).
Optionally, a process-node-centric community is a subgraph that may include a main process node, a set of child process nodes (representing child processes that are spawned by the main process and have data dependencies on one another through resource nodes), and a set of resource nodes accessed by the main process and the child processes.
For example, leak in FIG. 2 is the main process of community C3; it spawns the child processes tar, bzip2, gpg, and curl to compress and upload files, and each of these has a data dependency with at least one other child process, as in the dependency path shown in FIG. 2: tar → ./upload.tar → bzip2 → ./upload.tar.bz2 → gpg → ./upload → curl → xxx->xxx.
Optionally, a process-node-centric community may further include process nodes or resource nodes that belong to multiple communities; such nodes are referred to as overlapping nodes.
For example, in fig. 2, leak first cooperates with curl to complete execution of script leak.sh in C2, and then generates child processes tar, bzip2, gpg, and curl to compress and upload files in C3. In this case, leak is an overlapping node in C2 and C3.
For example, FIG. 3, FIG. 4, and FIG. 5 are the first, second, and third schematic diagrams of overlapping nodes in the summary graph generation method provided by the present invention. As shown in FIG. 3 to FIG. 5, overlapping nodes can be divided into the following three types:
(1) main process node: cooperates with different sets of child processes to carry out different system activities;
(2) child process node: cooperates with its peer processes to complete a system activity, while also spawning child processes of its own to complete different system activities;
(3) resource node: a resource node accessed by process nodes from different communities.
Optionally, a summary may be generated for each community, which may visualize the major system activities included by the corresponding community.
For example, fig. 6 is a schematic diagram of a summary graph produced by the summary graph generation method provided by the present invention. As shown in fig. 6, a concise summary is generated for each community (C1, C2, …, C10) to visualize the system activities of that community.
The invention provides a method for generating a summary graph of a system log dependency graph for attack investigation and restoration, which comprises the steps of determining a system entity dependency graph of an attack event to be investigated and restored, executing hierarchical random walking on process nodes in the system entity dependency graph, determining behavior representation of the process nodes, dividing the system entity dependency graph into at least one first subgraph based on the behavior representation of the process nodes, compressing each first subgraph to obtain at least one second subgraph, and finally generating the summary of each second subgraph so as to obtain the summary graph corresponding to the system entity dependency graph; the system entity dependency graph is divided into a plurality of sub-graphs, and a concise summary is provided for each sub-graph to generate a summary graph, each sub-graph only contains closely related processes and completes system tasks together, the generated summary graph keeps the semantics of system activities in the system entity dependency graph by hiding less important details and is visualized in a summary form, so that the size of the system entity dependency graph is reduced, and the summary of related system activities and summary information of communities related to attacks can be conveniently viewed.
Optionally, the performing hierarchical random walk on a process node in the system entity dependency graph to determine a behavior representation of the process node includes:
randomly walking by a preset length by taking each process node in the system entity dependency relationship graph as a starting point to generate a walking route;
and acquiring the behavior representation of the process node by adopting a word vector model based on the walking route.
Optionally, a random walk of a preset length may be performed starting from each process node in the system entity dependency graph to generate a walking route.
Optionally, the preset length may be 5, 10, or 15; the present invention does not specifically limit it.
For example, a random walk starting from node $v_1$ generates a walking route $W = \{v_1, \ldots, v_n\}$ of a specified length, in which each $v_i \in W$ is selected randomly according to transition probabilities. The transition probability from $v_i$ to its neighbor node $n$ is

$$P(n \mid v_i) = \frac{w(v_i, n)}{\sum_{m \in N(v_i)} w(v_i, m)}$$

where $w(v_i, n)$ denotes the weight of walking from $v_i$ to $n$, and $\sum_{m \in N(v_i)} w(v_i, m)$ denotes the sum of the walking weights between $v_i$ and all of its neighbor nodes $N(v_i)$. Unlike existing random walk algorithms, which treat all neighbor nodes with equal probability, the walk algorithm of the present invention assigns a higher probability to those neighbor nodes of $v_i$ that are more likely to be process nodes tightly coupled with $v_i$.
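The transition rule described above (the weight of walking to one neighbor divided by the sum of the weights to all neighbors) can be sketched as follows. This is a minimal illustration, not the patented implementation: the adjacency format, the function names, and the example weights are assumptions, and how the walking weights themselves are derived is left to the surrounding description.

```python
import random
from typing import Dict, List, Optional

# Hypothetical weighted adjacency: node -> {neighbor: walking weight w(v_i, n)}
Graph = Dict[str, Dict[str, float]]


def transition_probabilities(graph: Graph, v: str) -> Dict[str, float]:
    """Normalize the walking weights of v's neighbors into probabilities."""
    weights = graph.get(v, {})
    total = sum(weights.values())
    if total == 0:
        return {}
    return {n: w / total for n, w in weights.items()}


def random_walk(graph: Graph, start: str, length: int,
                rng: Optional[random.Random] = None) -> List[str]:
    """Generate a walking route of at most `length` nodes starting at `start`."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        probs = transition_probabilities(graph, walk[-1])
        if not probs:  # dead end: no neighbors to move to
            break
        neighbors = list(probs)
        step = rng.choices(neighbors, weights=[probs[n] for n in neighbors], k=1)[0]
        walk.append(step)
    return walk


# Illustrative weights only: a tightly coupled neighbor gets a larger weight.
example_graph: Graph = {
    "leak": {"tar": 2.0, "bzip2": 1.0, "leak.sh": 1.0},
    "tar": {"leak": 1.0},
    "bzip2": {"leak": 1.0},
    "leak.sh": {"leak": 1.0},
}
example_walk = random_walk(example_graph, "leak", 5, random.Random(0))
```

Because the probabilities are normalized per node, a neighbor with twice the weight of another is twice as likely to be chosen as the next step.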
In particular, the neighbor nodes of a process node and the global process lineage tree can be considered at the same time to ensure that closely related process nodes are sampled to the same walking path so that they will have similar contexts. For each process node p, the single-hop neighbor nodes of p are examined and p is associated with the parent process node, child process nodes, and accessed resource nodes.
Alternatively, the word vector model may be used to obtain the behavior representation of the process node based on the generated walking route.
Optionally, a word2vec model can be adopted to obtain the behavior representation of each process node based on the walking route.
For example, an analogy can be made by treating nodes in the system entity dependency graph as words and the walking route as an ordered sequence of words.
Alternatively, a widely used word representation learning algorithm SkipGram may be used to learn the behavior representation of the process nodes included in the walking route.
The method takes each process node in the system entity dependency relationship graph as a starting point to randomly walk for a preset length so as to generate a walking route, then learns the behavior representation of the process nodes by adopting a word vector model based on the walking route, and finally divides the process nodes with similar behavior representation into the same subgraph based on the behavior representation of the process nodes, thereby effectively determining the closely-connected process group as a process-centered community.
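To make the word/sentence analogy concrete, the sketch below enumerates the (center, context) training pairs that a SkipGram-style model would see for one walking route. The window size and the function name are illustrative assumptions; in practice the routes would be handed to a word2vec implementation rather than expanded by hand.

```python
from typing import List, Tuple


def skipgram_pairs(walk: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every node within `window` steps."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# A walking route is treated as a sentence whose words are node names.
route = ["tar", "bzip2", "gpg"]
pairs = skipgram_pairs(route, window=1)
```

Nodes that frequently share a route end up with many common context nodes, which is why closely related process nodes receive similar behavior representations.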
Optionally, the compressing each of the at least one first sub-graph to obtain at least one second sub-graph includes:
determining a first pattern in a target subgraph of the at least one first subgraph, the first pattern comprising: a pattern in which the same process node generates at least two identical sets of process nodes to access the same resource node, wherein each set of process nodes comprises at least one child process node, and the resource node comprises a file node or a network node;
and merging the identical child process nodes in the first pattern and merging the edges connecting those child process nodes, thereby completing compression of the target subgraph and acquiring a second subgraph corresponding to the target subgraph.
Optionally, a first pattern in a target sub-graph of at least one first sub-graph in the system entity dependency graph may be determined.
Optionally, the first pattern may include a pattern in which the same process node in the target subgraph generates at least two identical sets of process nodes to access the same resource node, where each set of process nodes includes at least one child process node.
Optionally, the identical child process nodes in the first pattern may be merged, and the edges connecting those child process nodes may be merged, to complete compression of the target subgraph.
Optionally, a first pattern in each first subgraph in the system entity dependency graph may be identified, redundant nodes and redundant edges in the first pattern are merged to obtain a compressed second subgraph, and then a summary corresponding to the second subgraph is generated based on the compressed second subgraph.
Alternatively, the first mode may be used to describe repeated activities that produce the same set of processes to handle certain resources.
For example, fig. 7 is a schematic diagram of the first pattern in the summary graph generation method provided by the present invention. As shown in fig. 7, process node P0 repeatedly spawns child process nodes named P1 and P2 to write to file F1. Such repeated activity provides no additional value for security analysis, so the multiple P1 nodes and the multiple P2 nodes may each be merged in the manner shown in fig. 7 to remove redundant nodes and redundant edges.
Optionally, the first subgraph may be compressed by identifying the first pattern in it through four steps (building a process lineage tree, associating processes with the resources they access, mining process-based patterns, and pattern-based compression) and then merging the duplicate nodes and edges in the first pattern.
The invention realizes the compression of the subgraph by identifying the first mode in the subgraph and combining the repeated edges and nodes in the first mode, thereby effectively reducing the redundant edges and nodes and reducing the size of the entity dependency graph of the system.
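The merging step for the first pattern can be illustrated with a small sketch. It assumes a simplified data model that is not taken from the patent: the children of one parent are given as a mapping from node id to process name, plus a mapping from node id to the resources that child accesses. Children with the same name and the same accessed resources are treated as repeats and collapsed into one merged node.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

GroupKey = Tuple[str, FrozenSet[str]]  # (process name, resources accessed)


def compress_first_pattern(
    parent: str,
    children: Dict[str, str],       # child node id -> process name
    accesses: Dict[str, Set[str]],  # child node id -> resource nodes it accesses
) -> Tuple[Dict[str, List[str]], Set[Tuple[str, str]]]:
    """Merge repeated child processes and return (merged nodes, merged edges)."""
    groups: Dict[GroupKey, List[str]] = defaultdict(list)
    for child_id, name in children.items():
        key = (name, frozenset(accesses.get(child_id, set())))
        groups[key].append(child_id)

    merged: Dict[str, List[str]] = {}
    edges: Set[Tuple[str, str]] = set()
    for (name, resources), members in groups.items():
        label = f"{name}[{','.join(sorted(resources))}]"  # hypothetical label format
        merged[label] = members
        edges.add((parent, label))          # one spawn edge per merged node
        for res in resources:
            edges.add((label, res))         # one access edge per resource
    return merged, edges


# P0 spawns P1 and P2 twice; every child writes to F1.
child_names = {"p1_a": "P1", "p2_a": "P2", "p1_b": "P1", "p2_b": "P2"}
child_access = {cid: {"F1"} for cid in child_names}
merged_nodes, merged_edges = compress_first_pattern("P0", child_names, child_access)
```

In this example four child nodes and eight edges shrink to two merged nodes and four edges, while the parent-child-resource structure is unchanged.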
Optionally, the compressing each of the at least one first sub-graph to obtain at least one second sub-graph includes:
determining a second pattern in a target subgraph of the at least one first subgraph, the second pattern comprising: a pattern in which the same process node accesses different resource nodes at least twice, wherein the resource nodes comprise file nodes or network nodes;
and combining the different resource nodes in the second mode, completing the compression of the target subgraph, and acquiring a second subgraph corresponding to the target subgraph.
Optionally, a second pattern in a target sub-graph in at least one first sub-graph in the system entity dependency graph may be determined.
Optionally, the second pattern may include a pattern in which the same process node in the target subgraph accesses different resource nodes at least twice.
For example, fig. 8 is a schematic diagram of the second pattern in the summary graph generation method provided by the present invention. As shown in fig. 8, process node P0 repeatedly accesses the resource nodes F1, F2, …, Fn. Such repeated activity provides no additional value for security analysis, so the resource nodes F1, F2, …, Fn may be merged in the manner shown in fig. 8 to remove redundant nodes and redundant edges.
Optionally, different resource nodes in the second mode may be merged to complete the compression of the target subgraph.
Optionally, a second pattern in each first subgraph in the system entity dependency graph may be identified, redundant nodes and redundant edges in the second pattern are merged to obtain a compressed second subgraph, and a summary corresponding to the second subgraph is generated based on the compressed second subgraph.
Optionally, to identify the second pattern, each process node may first be associated with the resource nodes it accesses, and each resource node may then be searched to identify repeated accesses.
Optionally, the resource nodes in an identified second pattern may be merged into one node.
Optionally, the attributes of the resource node obtained after merging may be the union of the attributes of the original resource nodes.
The invention realizes the compression of the subgraph by identifying the second mode in the subgraph and combining the repeated edges and nodes in the second mode, thereby effectively reducing the redundant edges and nodes and reducing the size of the entity dependency graph of the system.
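A minimal sketch of the second-pattern merge follows. It assumes, as an illustration only, that accesses are given as (process, operation, resource) triples and that a resource is mergeable when exactly one (process, operation) pair touches it; the attribute sets are hypothetical stand-ins for whatever attributes the resource nodes carry.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Access = Tuple[str, str, str]  # (process, operation, resource)


def compress_second_pattern(
    accesses: List[Access],
    attrs: Dict[str, Set[str]],  # resource -> attribute set (e.g. file paths)
) -> Dict[Tuple[str, str], Set[str]]:
    """Merge resources accessed only by one (process, operation) pair.

    Returns (process, operation) -> union of the merged resources' attributes.
    """
    users: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for proc, op, res in accesses:
        users[res].add((proc, op))

    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for res, who in users.items():
        if len(who) == 1:  # resources shared with other accessors are left alone
            groups[next(iter(who))].append(res)

    merged: Dict[Tuple[str, str], Set[str]] = {}
    for key, resources in groups.items():
        if len(resources) >= 2:  # the pattern needs at least two resource nodes
            union: Set[str] = set()
            for res in resources:
                union |= attrs.get(res, set())
            merged[key] = union
    return merged


example_accesses = [
    ("P0", "write", "F1"),
    ("P0", "write", "F2"),
    ("P0", "write", "F3"),
    ("P1", "read", "F3"),  # F3 is also read by P1, so it is not merged
]
example_attrs = {"F1": {"/tmp/a"}, "F2": {"/tmp/b"}, "F3": {"/tmp/c"}}
merged_resources = compress_second_pattern(example_accesses, example_attrs)
```

Excluding resources that are also touched by another process is a conservative design choice in this sketch, so that merging never hides a dependency between two processes.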
Optionally, before the compressing each of the at least one first sub-graph, the method further comprises:
under the condition that at least two process nodes access the same resource node and come from different first subgraphs, creating at least one copy node of the resource node;
and allocating the resource nodes and the at least one copy node to the first subgraph in which the at least two process nodes are positioned in a one-to-one correspondence manner, and creating directional edges between the resource nodes and the at least one copy node to connect the resource nodes and the copy node.
Specifically, before compressing the first subgraph, the first subgraph can be searched for an overlapping node, which is a resource node and which is connected with a plurality of process nodes from different first subgraphs; after the overlapping nodes are searched, duplicate nodes of the overlapping nodes can be created, and the created duplicate nodes and resource nodes are allocated to the first subgraph where the plurality of process nodes are located in a one-to-one correspondence mode.
Optionally, in a case where at least two process nodes access the same resource node, and at least two process nodes are from different first subgraphs, at least one replica node of the resource node may be created.
Optionally, the resource node and the at least one replica node may be allocated to the first subgraph in which the at least two process nodes are located in a one-to-one correspondence.
Optionally, directed edges may be created between the resource nodes and the respective replica nodes to connect the resource nodes and the created replica nodes.
For example, given two replica nodes $v_1$ and $v_2$ of a resource node $v$, where $v_1$ is contained in subgraph $C_i$ and $v_2$ is contained in subgraph $C_j$, suppose that $v_1$ has an input edge $e_1$, $v_2$ has an output edge $e_2$, and the start time of $e_1$ is earlier than that of $e_2$; then a directed edge $v_1 \to v_2$ can be created.
For example, given a resource node r and a process node p, r is associated with the community to which p belongs if r and p are connected by an edge. If a resource node is connected to multiple process nodes from different communities, the resource node is an overlapping node, and copies of the resource node can be created and assigned to each of those communities. Because these copies lack a visible information flow direction between them, directed edges are created in the present invention to connect them (e.g., the thin dashed arrows in fig. 2).
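The replica rule above can be sketched as follows, under assumptions that are not stated in the patent: each access to the resource is recorded as (process, direction, start time), where "in" means an edge entering the resource and "out" means an edge leaving it, and replicas are named with a hypothetical `resource@community` label.

```python
from typing import Dict, List, Tuple

ResourceAccess = Tuple[str, str, float]  # (process, "in" | "out", start time)


def replicate_overlapping_resource(
    resource: str,
    accesses: List[ResourceAccess],
    community_of: Dict[str, str],  # process -> community id
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Create one replica per community and the directed edges between replicas."""
    per_comm: Dict[str, List[Tuple[str, float]]] = {}
    for proc, direction, start in accesses:
        per_comm.setdefault(community_of[proc], []).append((direction, start))
    if len(per_comm) < 2:
        return [], []  # not an overlapping node: nothing to replicate

    replicas = {c: f"{resource}@{c}" for c in per_comm}
    edges = set()
    for ci, acc_i in per_comm.items():
        in_times = [t for d, t in acc_i if d == "in"]
        if not in_times:
            continue
        for cj, acc_j in per_comm.items():
            if ci == cj:
                continue
            out_times = [t for d, t in acc_j if d == "out"]
            # Some input edge e1 of v1 starts before some output edge e2 of v2.
            if out_times and min(in_times) < max(out_times):
                edges.add((replicas[ci], replicas[cj]))
    return sorted(replicas.values()), sorted(edges)


# A file written by a process in Ci and later read by a process in Cj.
replica_nodes, replica_edges = replicate_overlapping_resource(
    "r",
    [("p1", "in", 1.0), ("p2", "out", 2.0)],
    {"p1": "Ci", "p2": "Cj"},
)
```

The resulting directed edge records that information written in one community may later have been consumed in the other.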
Optionally, dependencies between communities may be classified as edge-based dependencies (i.e., dependencies represented by inter-community edges between communities) and node-based dependencies (i.e., dependencies represented by overlapping nodes).
Optionally, before performing the hierarchical random walk on the process node in the system entity dependency graph, the method further includes:
merging the parallel edges between each process node and each resource node in the system entity dependency graph, where the parallel edges include: edges having the same operation type, i.e., the same read operation or the same write operation.
Specifically, before performing subgraph division on the system entity dependency graph based on the hierarchical random walk method, preprocessing may be performed on the system entity dependency graph first, where the preprocessing may include merging parallel edges between process nodes and resource nodes in the system entity dependency graph to remove redundant edges.
Optionally, before performing the hierarchical random walk on the process nodes in the system entity dependency graph, the parallel edges between each process node and each resource node in the system entity dependency graph may be merged.
Alternatively, the parallel edges may include edges having the same read operation or the same write operation type.
It will be appreciated that a system entity dependency graph typically has many parallel edges between a process node and a file node or network node, i.e., there are typically read or write operations that are repeated by the operating system for a short period of time. This is because operating systems typically perform read or write tasks by proportionally distributing data to multiple system calls. These parallel edges do not provide additional useful information for attack investigation, so parallel edges of the same operation type can be merged directly into one edge.
The parallel edges which do not contain useful attack related information in the system entity dependency relationship graph are merged, so that the redundant edges are effectively reduced, and the semantic summary information of the main system activity is conveniently generated.
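As a minimal sketch of this preprocessing step, the function below collapses events that share the same source, destination, and operation type into a single edge. The event tuple layout is an assumption, and keeping the earliest start and latest end time on the merged edge is an illustrative choice rather than something the description prescribes.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str, str, float, float]  # (src, dst, operation, start, end)


def merge_parallel_edges(events: List[Event]) -> Dict[Tuple[str, str, str], Tuple[float, float]]:
    """Merge edges with the same (src, dst, operation) into one edge."""
    merged: Dict[Tuple[str, str, str], Tuple[float, float]] = {}
    for src, dst, op, start, end in events:
        key = (src, dst, op)
        if key in merged:
            old_start, old_end = merged[key]
            merged[key] = (min(old_start, start), max(old_end, end))
        else:
            merged[key] = (start, end)
    return merged


example_events: List[Event] = [
    ("tar", "./upload.tar", "write", 1.0, 1.2),
    ("tar", "./upload.tar", "write", 1.3, 1.5),   # repeated write: parallel edge
    ("./upload.tar", "bzip2", "read", 2.0, 2.1),
]
merged_events = merge_parallel_edges(example_events)
```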
Optionally, before the performing hierarchical random walk on the process node in the system entity dependency graph, the method further includes:
and deleting the resource nodes which only have input edges but not output edges in the system entity dependency relationship graph.
Optionally, before performing hierarchical random walks on the process nodes in the system entity dependency graph, resource nodes in the system entity dependency graph that have only input edges and no output edges may be deleted.
For example, a system entity dependency graph contains many read-only files, such as libraries, configuration files, and resources used for process initialization (e.g., /lib64/libdl.so.2), which do not contain useful attack-related information. Such read-only file nodes may therefore be filtered out (deleted), while process nodes are retained to preserve the semantics of the main system activities.
The invention deletes the resource nodes which do not contain useful attack related information in the system entity dependency relationship diagram, thereby not only effectively reducing redundant nodes, but also being convenient for generating the summary information of the semantics of the main system activities.
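The read-only filter can be sketched as below. The sketch assumes each event is labeled with its operation type and treats a file node as read-only when every event that touches it is a read; this operation-based test is a simplification chosen for illustration, and the tuple layout and function name are hypothetical.

```python
from typing import List, Set, Tuple

FileEvent = Tuple[str, str, str]  # (process, resource, operation)


def drop_read_only_files(
    events: List[FileEvent], file_nodes: Set[str]
) -> Tuple[List[FileEvent], Set[str]]:
    """Remove file nodes that are only ever read, together with their events."""
    touched = {res for _, res, _ in events if res in file_nodes}
    written = {res for _, res, op in events if res in file_nodes and op != "read"}
    read_only = touched - written
    kept = [ev for ev in events if ev[1] not in read_only]
    return kept, read_only


example_file_events: List[FileEvent] = [
    ("bash", "/lib64/libdl.so.2", "read"),  # library loaded at process start
    ("tar", "./upload.tar", "write"),
    ("bzip2", "./upload.tar", "read"),
]
kept_events, removed_files = drop_read_only_files(
    example_file_events, {"/lib64/libdl.so.2", "./upload.tar"}
)
```

Only resource nodes are removed; the process nodes that read them stay in the graph, so the process lineage is unaffected.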
Optionally, the summary includes at least one of:
a master process;
a time span;
a target information stream;
wherein the master process represents a parent process node of system activity included in the second subgraph;
the time span represents a time interval between an earliest start time and a latest end time of system activity included in the second subgraph;
and the target information flow represents the information flows whose priority scores rank within a preset top number among all information flows corresponding to the system activity in the second subgraph.
Optionally, the summary of the generated second subgraph may include at least one of: a master process, a time span, and a target information flow.
Alternatively, the master process may represent a parent process node, or root process node, of the system activity included in the second subgraph.
Optionally, the time span may represent a time interval between an earliest start time and a latest end time of the system activity included in the second sub-graph.
Optionally, the target information flow may represent a preset number of information flows with priority scores ranked first in the information flow corresponding to the system activity included in the second sub-graph.
Optionally, the target information flow may be an information flow with a higher priority score among information flows corresponding to the system activities included in the second sub-graph.
Optionally, the target information flow may be the information flows whose priority scores rank within a preset top number among all information flows.
For example, the target information flow may be the information flows ranked in the top 2, top 3, or top 4 by priority; the present invention does not specifically limit this number.
Optionally, before generating the summary corresponding to the second sub-graph, information stream extraction may be performed on the second sub-graph.
Alternatively, the input nodes and output nodes of each second subgraph can be identified according to the information flow among a plurality of second subgraphs in the system entity dependency graph, and then the information flow is generated by searching the path of each pair of input nodes and output nodes.
For example, given a process-node-centric community (subgraph), its input nodes and output nodes may be identified first. The input nodes represent the incoming information flows of the community, i.e., the target nodes of inter-community edges and the network nodes without output edges; for a community without input edges, the master process node may be selected as the input node. The output nodes represent the outgoing information flows of the community, i.e., the source nodes of inter-community edges. Network nodes with input edges represent external IPs and the POI. Then, for each pair of input and output nodes, a Depth First Search (DFS) algorithm can be used to find the longest path that contains no repeated nodes, and this path is taken as the information flow. Such a path typically covers more activity information than a shorter path.
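The path search described above can be sketched with a backtracking DFS that enumerates simple paths and keeps the longest one. The adjacency format and node names are illustrative assumptions. Exhaustive enumeration is exponential in the worst case, so a sketch like this is only reasonable for small subgraphs.

```python
from typing import Dict, List, Set


def longest_simple_path(adj: Dict[str, List[str]], source: str, target: str) -> List[str]:
    """Return the longest path from source to target with no repeated nodes."""
    best: List[str] = []

    def dfs(node: str, path: List[str], visited: Set[str]) -> None:
        nonlocal best
        if node == target:
            if len(path) > len(best):
                best = list(path)
            return
        for nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                dfs(nxt, path, visited)
                path.pop()          # backtrack
                visited.remove(nxt)

    dfs(source, [source], {source})
    return best


# Two routes from the input node to the output node; the longer one is kept.
example_adj = {
    "in": ["p1", "p2"],
    "p1": ["f1"],
    "f1": ["p2"],
    "p2": ["out"],
}
flow = longest_simple_path(example_adj, "in", "out")
```

An empty list is returned when the output node cannot be reached from the input node.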
Optionally, the information streams extracted in the second sub-graph may be prioritized.
For example, priority scores of all information flows in the second subgraph can be calculated according to the possibility that the information flows represent main activities (such as attack behaviors), the information flows are prioritized based on the priority scores, and finally the information flows ranked in the top three of the ranks of all the information flows are used as target information flows in the summary corresponding to the second subgraph.
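Putting the three summary fields together, a summary could be assembled roughly as follows. The `Summary` class, the input formats, and the example scores are hypothetical; in particular, the priority scores are taken as given inputs here because the way they are computed is not reproduced in this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Summary:
    master_process: str
    time_span: Tuple[float, float]  # (earliest start, latest end)
    top_flows: List[List[str]]      # highest-priority information flows


def summarize(
    master_process: str,
    event_times: List[Tuple[float, float]],          # (start, end) of each event
    scored_flows: List[Tuple[float, List[str]]],     # (priority score, flow)
    k: int = 3,
) -> Summary:
    """Build a community summary from its master process, events, and flows."""
    start = min(s for s, _ in event_times)
    end = max(e for _, e in event_times)
    ranked = sorted(scored_flows, key=lambda item: item[0], reverse=True)
    return Summary(master_process, (start, end), [flow for _, flow in ranked[:k]])


# Illustrative values only.
example_summary = summarize(
    "leak",
    [(10.0, 11.0), (10.5, 14.0), (12.0, 13.0)],
    [(0.2, ["a", "b"]), (0.9, ["a", "c", "d"]), (0.5, ["a", "d"]), (0.1, ["b", "d"])],
    k=3,
)
```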
Optionally, the merging the different resource nodes in the second mode includes:
merging the different resource nodes in the second mode into one node as a merged resource node, wherein the attribute of the merged resource node is a union of the attributes of the different resource nodes.
Alternatively, different resource nodes in the second mode may be merged into one node as a merged resource node.
Alternatively, the attributes of the merged resource node may be the union of the attributes of the different resource nodes.
For example, resource nodes F1, F2, …, Fn in fig. 8 may be merged, and the attribute of the merged node is the union of the attributes F1, F2, …, Fn.
Fig. 9 is the second schematic flow chart of the summary graph generation method provided by the present invention. As shown in fig. 9, the method mainly includes five parts: (1) dependency graph generation; (2) dependency graph preprocessing; (3) community detection; (4) community compression; (5) community summarization. For the specific implementation of each part, reference may be made to the corresponding description of the summary graph generation method of the system log dependency graph for attack investigation and restoration given above, which is not repeated here.
The schematic diagram generation method of the system log dependency diagram for attack investigation and restoration provided by the invention is verified through experiments.
(1) Two data sets are used:
(1.1) Attack data set.
The attack data set was collected from 6 Linux hosts with 10 active users using Sysdig (a powerful system tool). Routine system tasks on these hosts include web browsing, text editing, code development, and some other services (e.g., databases). On these hosts, 6 multi-step attacks were performed based on known vulnerabilities and kill chains. The collected data set contains 1 million events over 3 days.
(1.2) DARPA TC data set.
The DARPA TC data set is dedicated to developing forensics and detection techniques for Advanced Persistent Threats (APTs). It records the attack traces of various vulnerability exploits on different operating systems (e.g., Linux and Windows). Based on the attack descriptions, failed attacks were excluded, and 8 attacks (50 million events) were used in the evaluation.
(2) Experimental results.
(a) Overall effect in terms of summary dependency graph.
The summary graph generation method of the system log dependency graph for attack investigation and restoration provided by the present invention is used to generate summary graphs for the dependency graphs listed in table 3, and the number and size of the detected communities are measured.
Table 3. Statistics of the attack dependency graphs (table image not reproduced)
Fig. 10 is a schematic diagram of the number of subgraphs detected by the summary graph generation method provided by the present invention. As shown in fig. 10, the method divides each dependency graph into 18.4 communities (subgraphs) on average. Compared with the original dependency graphs, whose average size is 1302.1 nodes, the graph size is reduced by a factor of 70.7. These results show that, because the number of communities is small, all communities can be visualized and a summary of all relevant system activities can be easily viewed. In addition, fig. 10 shows that the largest number of communities, 48, occurs for the phishing email attack (C.S), whose communities cover different system tasks (e.g., browsing web pages in Firefox, sending or receiving emails, and calendar services).
Fig. 11 is a diagram of the size distribution of the subgraphs detected by the summary graph generation method provided by the present invention, showing the community size distribution for the 14 attacks. As can be seen from fig. 11, the communities (subgraphs) are relatively small (15.7 nodes on average), which greatly reduces the workload of inspecting each community. These results also show that community compression is very effective at compressing repeated edges, removing on average 216.4 redundant edges per community compared with the original dependency graph. Furthermore, the summary graph requires only 2.26 MB of storage on average, whereas the original dependency graph requires 344.32 MB on average.
Table 4 lists edge statistics for the method of the present invention and NoDoze. As shown in table 4, the number of events in the top-1, top-2, and top-3 information flows of all communities is compared with the number of events determined by NoDoze, a state-of-the-art dependency graph reduction method that learns execution profiles from benign system behavior and reduces the dependency graph based on anomaly scores computed for each path in the dependency graph using those profiles.
Table 4. Edge statistics for the method of the present invention and NoDoze (table image not reproduced)
As can be seen from table 4, the top-3 information flows produced by the summary graph generation method provided by the present invention contain on average 21 times fewer edges than the output of NoDoze. NoDoze performs relatively poorly because its effectiveness depends largely on whether the execution profiles cover all benign events and are representative, which is very difficult to achieve because the runtime environments of most systems vary widely. Execution profiles learned on one system are therefore hard to generalize to other systems. The method provided by the present invention is not subject to this limitation, because it does not require any additional execution profile.
(b) Collaboration with the HOLMES method.
The summary graph generation method of the system log dependency graph for attack investigation and restoration provided by the present invention is combined with HOLMES, one of the most advanced attack investigation techniques. HOLMES constructs a High-level Scenario Graph (HSG) that integrates Tactics, Techniques and Procedures (TTPs), which are important indicators for describing the steps of an Advanced Persistent Threat (APT); the HSG is used to map low-level event information flows to the kill chain.
HSGs are first constructed for the 14 attack cases and then used to map the top-ranked information flows to the kill chain. The results show that HOLMES identified 35 of the 37 attack-related communities with a recall of 96.2%, and that the top-2 information flows were sufficient to find the kill chain. In addition, with the summary graph generation method provided by the present invention, the attack-related communities not detected by HOLMES can still be easily identified, because information flows usually enter these communities from one attack-related community and leave them for another attack-related community, which makes them indispensable steps of the attack chain. These results show that the summary graph generation method provided by the present invention can easily cooperate with other automated techniques, highlight the attack-related communities, and help identify further attack-related communities that the automated techniques did not discover.
(c) Comparison of community detection.
The summary graph generation method of the system log dependency graph for attack investigation and restoration provided by the present invention is compared with other state-of-the-art community detection algorithms to verify the effectiveness of its community detection. Considering the overlapping nature of dependency graphs, 9 typical overlapping community detection algorithms were selected as baselines, including NISE (2016), ego splitter (2017), NMNF (2017), DANMF (2018), PMCV (2019), CGAN (2019), VGRAPH (2019), CNRL (2019), and DeepWalk (2014). The overall correspondence between the detected communities and the labeled ground-truth communities was evaluated using F1 scores, and the experimental results are shown in table 5.
Table 5. Community detection results for the 14 attack events (table image not reproduced)
As can be seen from table 5, the F1 score obtained by the summary graph generation method provided by the present invention is on average 2.29 times higher than that of the baselines, which indicates that the method can effectively detect process-node-centric communities, whereas the baselines perform poorly.
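The exact F1 variant used in the evaluation is not spelled out here. As one plausible reading, the sketch below implements a commonly used average F1 for overlapping communities: each detected community is matched to its best ground-truth community and vice versa, and the two averages are combined. The choice of this variant, the function names, and the example sets are all assumptions.

```python
from typing import List, Set


def f1(a: Set[str], b: Set[str]) -> float:
    """F1 score of community a measured against community b."""
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(a)
    recall = overlap / len(b)
    return 2 * precision * recall / (precision + recall)


def average_f1(detected: List[Set[str]], truth: List[Set[str]]) -> float:
    """Symmetric best-match F1 between detected and ground-truth communities."""
    if not detected or not truth:
        return 0.0
    d_to_t = sum(max(f1(d, t) for t in truth) for d in detected) / len(detected)
    t_to_d = sum(max(f1(t, d) for d in detected) for t in truth) / len(truth)
    return 0.5 * (d_to_t + t_to_d)


perfect = average_f1([{"a", "b"}, {"c"}], [{"a", "b"}, {"c"}])
under_split = average_f1([{"a", "b", "c"}], [{"a", "b"}, {"c"}])
```

Because communities are compared as plain node sets, a node that belongs to several communities is simply counted in each of them, which suits overlapping communities.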
(d) The effect of community compression.
Fig. 12 is a schematic diagram of node compression ratio distribution of the schematic diagram generating method provided by the present invention, and fig. 13 is a schematic diagram of edge compression ratio distribution of the schematic diagram generating method provided by the present invention, as shown in fig. 12 and fig. 13, it can be seen that, for a community, the number of nodes and the number of edges are respectively reduced by 38.4% and 44.7% on average, and the maximum reduction amount is 97.3% of the nodes and 98.9% of the edges. Furthermore, it is verified that the information flow is not altered after compression because the repeated activities have the same information flow, which typically enters the subgraph formed by the repeated activities through a single node and then leaves the subgraph through another single node, so compressing the repeated activities does not change the events in the information flow. In summary, compressing these repeated activities still preserves the semantics of the tasks represented by the community.
(e) Validity of information flow ranking.
Table 6 shows the first 3 information flows for community C3 containing attack-related events and community C8 containing no attack-related events. The event in C3 indicates that an attacker runs a malicious script to compress, encrypt and upload sensitive files to a remote server. As can be seen from the table, the attack behaviors can be effectively represented by using the top-1 information flow with the priority of 0.8234, and although top-2 and top-3 can also cover the behaviors, the input node of the top-1 information flow is a malicious script process and is more helpful for further tracking the community which creates the malicious script. The event in C8 shows that the user logs on to his host through sshd, transfers the compressed file from the server to the host, and then decompresses the file. As can be seen from the table, the top 1 stream with the highest priority (0.4914) can represent all these activities, while the top 2 stream lacks events for sshd login, the top 3 stream lacks sshd login events, and contains a file event (/ dev/null! bash) that occurs in many communities.
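Table 6. Top-3 information flows of community C3 (containing attack-related events) and community C8 (containing no attack-related events)
[Table 6 is reproduced as images in the original publication: Figure BDA0003494382810000271 and Figure BDA0003494382810000281]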
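The selection of the top-ranked information flows can be sketched as follows. This is a minimal illustration that assumes each flow already carries a priority score; how that score is computed is not described in this section, so the scores, the node names, and the `InformationFlow` and `top_k_flows` names below are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationFlow:
    path: tuple      # node names from the input node to the output node
    priority: float  # priority score, assumed to be computed elsewhere


def top_k_flows(flows, k=3):
    """Return the k flows with the highest priority, highest first."""
    return heapq.nlargest(k, flows, key=lambda flow: flow.priority)


# Hypothetical flows of one community; the scores are made up.
flows = [
    InformationFlow(("p1", "f1", "p2"), 0.4),
    InformationFlow(("p0", "p1", "f1", "p2", "net"), 0.9),
    InformationFlow(("p2", "net"), 0.1),
    InformationFlow(("p1", "f1", "p2", "net"), 0.5),
]

top3 = top_k_flows(flows, k=3)
for rank, flow in enumerate(top3, start=1):
    print(rank, flow.priority, " -> ".join(flow.path))
```

Keeping the full path with each flow makes it possible to inspect the input node of the top-1 flow, which the discussion above uses to trace back to the community that created the malicious script.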
The invention provides a method for generating a summary graph of a system log dependency graph for attack investigation and recovery. The method determines a system entity dependency graph of an attack event to be investigated and restored, performs a hierarchical random walk on the process nodes in the system entity dependency graph to determine a behavior representation of the process nodes, divides the system entity dependency graph into at least one first subgraph based on the behavior representation of the process nodes, compresses each first subgraph to obtain at least one second subgraph, and finally generates a summary of each second subgraph, thereby obtaining the summary graph corresponding to the system entity dependency graph. Because the system entity dependency graph is divided into a plurality of subgraphs and a concise summary is provided for each subgraph, each subgraph contains only closely related processes that together complete a system task. The generated summary graph preserves the semantics of the system activities in the system entity dependency graph while hiding less important details, and it is presented in summary form, so that the size of the dependency graph is reduced and the summaries of related system activities and of attack-related communities can be viewed conveniently.
The following describes the summary graph generation apparatus for a system log dependency graph for attack investigation and recovery provided by the present invention. The apparatus described below and the summary graph generation method described above may be read with reference to each other.
Fig. 14 is a schematic structural diagram of the summary graph generation apparatus provided by the present invention. As shown in Fig. 14, the apparatus includes: a first determining module 1410, a second determining module 1420, a subgraph division module 1430, a sub-graph compressing module 1440, and a summary graph generating module 1450; wherein:
the first determining module 1410 is configured to determine a system entity dependency graph of an attack event to be investigated and restored, where the system entity dependency graph includes system entity nodes associated with the attack event to be investigated and restored and call relationships between the system entity nodes; the system entity nodes comprise process nodes and resource nodes, and the calling relationship among the system entity nodes represents system activities;
the second determining module 1420 is configured to perform a hierarchical random walk on a process node in the system entity dependency graph to determine a behavioral representation of the process node;
the subgraph division module 1430 is configured to cluster the process nodes based on the behavior representation of the process nodes, and divide the system entity dependency graph into at least one first subgraph based on the clustering result;
the sub-graph compressing module 1440 is configured to compress each of the at least one first sub-graph to obtain at least one second sub-graph, where the at least one second sub-graph corresponds to the at least one first sub-graph one to one;
the summary graph generating module 1450 is configured to generate a summary corresponding to each of the at least one second sub-graph, and obtain a summary graph corresponding to the system entity dependency graph.
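The way the five modules hand results to one another can be sketched as a simple pipeline. The sketch below is an assumption-laden illustration: the class name `SummaryGraphPipeline`, its field names, and the trivial stand-in functions are hypothetical, and each stand-in only marks where the real logic of the corresponding module would go.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class SummaryGraphPipeline:
    build_dependency_graph: Callable[[Any], Any]          # role of module 1410
    represent_processes: Callable[[Any], Dict[Any, Any]]  # role of module 1420
    partition: Callable[[Any, Dict[Any, Any]], List[Any]] # role of module 1430
    compress: Callable[[Any], Any]                        # role of module 1440
    summarize: Callable[[Any], Any]                       # role of module 1450

    def run(self, audit_events):
        graph = self.build_dependency_graph(audit_events)
        behaviours = self.represent_processes(graph)
        first_subgraphs = self.partition(graph, behaviours)
        # One second subgraph per first subgraph (one-to-one correspondence).
        second_subgraphs = [self.compress(sg) for sg in first_subgraphs]
        return [(sg, self.summarize(sg)) for sg in second_subgraphs]


def _partition(graph, behaviours):
    """Stand-in: group edges by the behaviour label of their process."""
    groups = {}
    for proc, res in graph:
        groups.setdefault(behaviours[proc], []).append((proc, res))
    return list(groups.values())


# Trivial stand-ins: a graph is a list of (process, resource) edges.
pipeline = SummaryGraphPipeline(
    build_dependency_graph=lambda events: list(events),
    represent_processes=lambda graph: {proc: proc[0] for proc, _ in graph},
    partition=_partition,
    compress=lambda subgraph: sorted(set(subgraph)),
    summarize=lambda subgraph: {"edges": len(subgraph)},
)

events = [("sshd", "f1"), ("sshd", "f1"), ("scp", "f2"), ("tar", "f3")]
result = pipeline.run(events)
print(result)
```

Passing the stages in as callables keeps the data flow between modules explicit while leaving each module's internals open, which matches the statement that the modules may be selected or distributed according to actual needs.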
The invention provides a summary graph generation apparatus for a system log dependency graph for attack investigation and recovery. The apparatus determines a system entity dependency graph of an attack event to be investigated and restored, performs a hierarchical random walk on the process nodes in the system entity dependency graph to determine a behavior representation of the process nodes, divides the system entity dependency graph into at least one first subgraph based on the behavior representation of the process nodes, compresses each first subgraph to obtain at least one second subgraph, and finally generates a summary of each second subgraph, thereby obtaining the summary graph corresponding to the system entity dependency graph. Because the system entity dependency graph is divided into a plurality of subgraphs and a concise summary is provided for each subgraph, each subgraph contains only closely related processes that together complete a system task. The generated summary graph preserves the semantics of the system activities in the system entity dependency graph while hiding less important details, and it is presented in summary form, so that the size of the system entity dependency graph is reduced and the summaries of related system activities and of attack-related communities can be viewed conveniently.
Fig. 15 is a schematic diagram of the physical structure of an electronic device provided by the present invention. As shown in Fig. 15, the electronic device may include a processor 1510, a communications interface 1520, a memory 1530, and a communication bus 1540, wherein the processor 1510, the communications interface 1520, and the memory 1530 communicate with one another via the communication bus 1540. The processor 1510 may call logic instructions in the memory 1530 to execute the summary graph generation method for a system log dependency graph for attack investigation and recovery provided above, the method comprising:
determining a system entity dependency relationship graph of an attack event to be investigated and restored, wherein the system entity dependency relationship graph comprises system entity nodes related to the attack event to be investigated and restored and call relations among the system entity nodes; the system entity nodes comprise process nodes and resource nodes, and the calling relationship among the system entity nodes represents system activities;
performing a hierarchical random walk on the process nodes in the system entity dependency relationship graph, and determining the behavior representation of the process nodes;
clustering the process nodes based on the behavior representation of the process nodes, and dividing the system entity dependency relationship graph into at least one first subgraph based on the clustering result;
compressing each first subgraph in the at least one first subgraph to obtain at least one second subgraph, wherein the at least one second subgraph is in one-to-one correspondence with the at least one first subgraph;
and generating a summary corresponding to each second subgraph in the at least one second subgraph, and obtaining a summary graph corresponding to the system entity dependency graph.
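The walk-generation part of the second step can be sketched as below. Note the substitution: this is a plain uniform random walk of a preset length started from each process node, as recited in claim 2, and not the hierarchical variant, whose details are not given in this section. The adjacency data, node names, and the `random_walks` function are hypothetical.

```python
import random


def random_walks(adjacency, process_nodes, walk_length, seed=0):
    """Generate one walk of at most walk_length nodes per process node.

    adjacency: dict mapping a node to the list of its neighbours.
    A walk stops early if it reaches a node without neighbours.
    """
    rng = random.Random(seed)  # seeded for reproducible walks
    walks = []
    for start in process_nodes:
        walk = [start]
        current = start
        for _ in range(walk_length - 1):
            neighbours = adjacency.get(current, [])
            if not neighbours:
                break
            current = rng.choice(neighbours)
            walk.append(current)
        walks.append(walk)
    return walks


# Hypothetical dependency graph, treated as undirected for walking.
adjacency = {
    "bash": ["script.sh", "tar"],
    "tar": ["bash", "secret.txt", "out.tar"],
    "script.sh": ["bash"],
    "secret.txt": ["tar"],
    "out.tar": ["tar"],
}

walks = random_walks(adjacency, ["bash", "tar"], walk_length=5)
for walk in walks:
    print(" ".join(walk))
```

Each walk plays the role of a sentence and each node the role of a word, so the resulting routes can be passed to a word vector model to obtain the behavior representation of the process nodes; the choice of a particular model is left open here.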
In addition, the logic instructions in the memory 1530 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the summary graph generation method for a system log dependency graph for attack investigation and recovery provided above, the method comprising:
determining a system entity dependency relationship graph of an attack event to be investigated and restored, wherein the system entity dependency relationship graph comprises system entity nodes associated with the attack event to be investigated and restored and call relationships among the system entity nodes; the system entity nodes comprise process nodes and resource nodes, and the calling relationship among the system entity nodes represents system activities;
performing a hierarchical random walk on the process nodes in the system entity dependency relationship graph, and determining the behavior representation of the process nodes;
clustering the process nodes based on the behavior representation of the process nodes, and dividing the system entity dependency relationship graph into at least one first subgraph based on the clustering result;
compressing each first sub-graph of the at least one first sub-graph to obtain at least one second sub-graph, wherein the at least one second sub-graph is in one-to-one correspondence with the at least one first sub-graph;
and generating a summary corresponding to each second sub-graph in the at least one second sub-graph, and obtaining a summary graph corresponding to the system entity dependency relationship graph.
In still another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, performing the summary graph generation method for a system log dependency graph for attack investigation and recovery provided above, the method comprising:
determining a system entity dependency relationship graph of an attack event to be investigated and restored, wherein the system entity dependency relationship graph comprises system entity nodes associated with the attack event to be investigated and restored and call relations among the system entity nodes; the system entity nodes comprise process nodes and resource nodes, and the calling relationship among the system entity nodes represents system activities;
performing a hierarchical random walk on the process nodes in the system entity dependency relationship graph, and determining the behavior representation of the process nodes;
clustering the process nodes based on the behavior representation of the process nodes, and dividing the system entity dependency relationship graph into at least one first subgraph based on the clustering result;
compressing each first sub-graph of the at least one first sub-graph to obtain at least one second sub-graph, wherein the at least one second sub-graph is in one-to-one correspondence with the at least one first sub-graph;
and generating a summary corresponding to each second sub-graph in the at least one second sub-graph, and obtaining a summary graph corresponding to the system entity dependency relationship graph.
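The final step, generating a summary for one second subgraph, can be sketched using the three summary elements listed in claim 8: the master process, the time span, and the target information flows. The sketch rests on stated assumptions: the master process is taken to be the process whose parent lies outside the subgraph, events are `(source, target, start, end)` tuples, and flow priorities are given. The function name `summarize` and all data are hypothetical.

```python
def summarize(events, parent, flows, k=3):
    """Build a summary for one compressed subgraph.

    events: list of (source, target, start_time, end_time) tuples.
    parent: dict mapping each process in the subgraph to its parent process.
    flows:  list of (path, priority) pairs; priorities are assumed given.
    """
    processes = set(parent)
    # Assumption: the master process is the one whose parent is outside
    # the subgraph, i.e. the root of the subgraph's process tree.
    roots = sorted(p for p in processes if parent[p] not in processes)
    time_span = (min(e[2] for e in events), max(e[3] for e in events))
    ranked = sorted(flows, key=lambda item: item[1], reverse=True)
    return {
        "master_process": roots[0],
        "time_span": time_span,
        "target_flows": [path for path, _ in ranked[:k]],
    }


# Hypothetical subgraph data.
parent = {"bash": "sshd", "tar": "bash", "curl": "bash"}
events = [
    ("bash", "tar", 10.0, 10.5),
    ("tar", "out.tar", 11.0, 14.0),
    ("curl", "remote", 15.0, 18.5),
]
flows = [
    (("bash", "tar", "out.tar"), 0.7),
    (("tar", "out.tar"), 0.4),
    (("bash", "curl", "remote"), 0.9),
    (("bash", "tar"), 0.2),
]

summary = summarize(events, parent, flows, k=2)
print(summary)
```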
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (13)

1. A method for generating a summary graph of a system log dependency graph for attack investigation and recovery, comprising:
determining a system entity dependency graph of an attack event to be investigated and restored, wherein the system entity dependency graph comprises system entity nodes associated with the attack event to be investigated and restored and call relationships among the system entity nodes; the system entity nodes comprise process nodes and resource nodes, and the call relationships among the system entity nodes represent system activities;
performing a hierarchical random walk on the process nodes in the system entity dependency graph, and determining a behavior representation of the process nodes;
clustering the process nodes based on the behavior representation of the process nodes, and dividing the system entity dependency graph into at least one first subgraph based on the clustering result;
compressing each first subgraph of the at least one first subgraph to obtain at least one second subgraph, wherein the at least one second subgraph is in one-to-one correspondence with the at least one first subgraph;
and generating a summary corresponding to each second subgraph of the at least one second subgraph, and obtaining a summary graph corresponding to the system entity dependency graph.
2. The method for generating a summary graph of a system log dependency graph for attack investigation and recovery as claimed in claim 1, wherein the performing a hierarchical random walk on the process nodes in the system entity dependency graph and determining a behavior representation of the process nodes comprises:
performing a random walk of a preset length starting from each process node in the system entity dependency graph to generate a walk route;
and obtaining the behavior representation of the process node from the walk route by using a word vector model.
3. The method for generating a summary graph of a system log dependency graph for attack investigation and recovery as claimed in claim 1, wherein the compressing each first subgraph of the at least one first subgraph to obtain at least one second subgraph comprises:
determining a first pattern in a target subgraph of the at least one first subgraph, the first pattern comprising: a pattern in which the same process node generates at least two identical sets of process nodes that access the same resource node, wherein each set of process nodes comprises at least one child process node, and the resource node comprises a file node or a network node;
and merging the identical child process nodes in the first pattern and merging the edges connected to those child process nodes, thereby completing compression of the target subgraph and obtaining the second subgraph corresponding to the target subgraph.
4. The method for generating a summary graph of a system log dependency graph for attack investigation and recovery as claimed in claim 1, wherein the compressing each first subgraph of the at least one first subgraph to obtain at least one second subgraph comprises:
determining a second pattern in a target subgraph of the at least one first subgraph, the second pattern comprising: a pattern in which the same process node accesses different resource nodes at least twice, wherein the resource nodes comprise file nodes or network nodes;
and merging the different resource nodes in the second pattern, thereby completing compression of the target subgraph and obtaining the second subgraph corresponding to the target subgraph.
5. The method for generating a summary graph of a system log dependency graph for attack investigation and recovery as claimed in any one of claims 1 to 4, wherein before the compressing each first subgraph of the at least one first subgraph, the method further comprises:
in a case where at least two process nodes access the same resource node and belong to different first subgraphs, creating at least one copy node of the resource node;
and allocating the resource node and the at least one copy node, in one-to-one correspondence, to the first subgraphs in which the at least two process nodes are located, and creating a directed edge between the resource node and the at least one copy node to connect the resource node and the copy node.
6. The method for generating a summary graph of a system log dependency graph for attack investigation and recovery as claimed in claim 1, wherein before the performing a hierarchical random walk on the process nodes in the system entity dependency graph, the method further comprises:
merging the parallel edges between each process node and each resource node in the system entity dependency graph, wherein the parallel edges comprise: edges having the same operation type, namely the same read operation or the same write operation.
7. The method for generating a summary graph of a system log dependency graph for attack investigation and recovery as claimed in claim 1, wherein before the performing a hierarchical random walk on the process nodes in the system entity dependency graph, the method further comprises:
deleting, from the system entity dependency graph, the resource nodes that have only incoming edges and no outgoing edges.
8. The method of generating a summary graph of a system log dependency graph for attack investigation and recovery as claimed in claim 1, wherein the summary includes at least one of:
a master process;
a time span;
target information flows;
wherein the master process represents a parent process node of the system activity included in the second subgraph;
the time span represents the time interval between the earliest start time and the latest end time of the system activity included in the second subgraph;
and the target information flows represent those information flows, among all information flows corresponding to the system activity in the second subgraph, whose priority ranks within a preset number of top positions.
9. The method for generating a summary graph of a system log dependency graph for attack investigation and recovery as claimed in claim 4, wherein the merging the different resource nodes in the second pattern comprises:
merging the different resource nodes in the second pattern into one node as a merged resource node, wherein the attributes of the merged resource node are the union of the attributes of the different resource nodes.
10. A summary graph generation apparatus for a system log dependency graph for attack investigation and recovery, comprising:
a first determining module, configured to determine a system entity dependency graph of an attack event to be investigated and restored, wherein the system entity dependency graph comprises system entity nodes associated with the attack event to be investigated and restored and call relationships among the system entity nodes; the system entity nodes comprise process nodes and resource nodes, and the call relationships among the system entity nodes represent system activities;
a second determining module, configured to perform a hierarchical random walk on the process nodes in the system entity dependency graph and determine a behavior representation of the process nodes;
a subgraph division module, configured to cluster the process nodes based on the behavior representation of the process nodes and divide the system entity dependency graph into at least one first subgraph based on the clustering result;
a subgraph compression module, configured to compress each first subgraph of the at least one first subgraph to obtain at least one second subgraph, wherein the at least one second subgraph is in one-to-one correspondence with the at least one first subgraph;
and a summary graph generating module, configured to generate a summary corresponding to each second subgraph of the at least one second subgraph and obtain a summary graph corresponding to the system entity dependency graph.
11. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for generating a summary graph of a system log dependency graph for attack investigation and recovery as claimed in any one of claims 1 to 9.
12. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for generating a summary graph of a system log dependency graph for attack investigation and recovery as claimed in any one of claims 1 to 9.
13. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method for generating a summary graph of a system log dependency graph for attack investigation and recovery as claimed in any one of claims 1 to 9.
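The two preprocessing steps recited in claims 6 and 7 can be sketched as follows. This is a minimal illustration under stated assumptions: edges are `(source, target, operation, timestamp)` tuples pointing in the direction of information flow, merged edges keep the earliest and latest timestamps (a choice made here, not stated in the claims), and the function names and toy data are hypothetical.

```python
def merge_parallel_edges(edges):
    """Collapse edges with the same endpoints and operation type.

    Returns a dict mapping (source, target, operation) to the
    (earliest, latest) timestamps of the merged edges.
    """
    merged = {}
    for src, dst, op, ts in edges:
        key = (src, dst, op)
        first, last = merged.get(key, (ts, ts))
        merged[key] = (min(first, ts), max(last, ts))
    return merged


def drop_sink_resources(merged, process_nodes):
    """Remove resource nodes that have incoming but no outgoing edges."""
    has_outgoing = {src for (src, _, _) in merged}
    return {
        key: window
        for key, window in merged.items()
        if key[1] in process_nodes or key[1] in has_outgoing
    }


# Hypothetical audit events.
process_nodes = {"bash", "tar"}
edges = [
    ("bash", "/tmp/a", "write", 1),
    ("bash", "/tmp/a", "write", 3),
    ("/tmp/a", "tar", "read", 4),
    ("tar", "/dev/null", "write", 5),
    ("tar", "/dev/null", "write", 6),
]

merged = merge_parallel_edges(edges)
pruned = drop_sink_resources(merged, process_nodes)
print(pruned)
```

Here the two writes from `bash` to `/tmp/a` become one edge, and `/dev/null`, which is only written to, is removed together with its incoming edge.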
CN202210107372.9A 2022-01-28 2022-01-28 Overview map generation method of system log dependency map for attack investigation and recovery Pending CN114637892A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210107372.9A CN114637892A (en) 2022-01-28 2022-01-28 Overview map generation method of system log dependency map for attack investigation and recovery

Publications (1)

Publication Number Publication Date
CN114637892A true CN114637892A (en) 2022-06-17

Family

ID=81946559

Country Status (1)

Country Link
CN (1) CN114637892A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200143052A1 (en) * 2018-11-02 2020-05-07 Microsoft Technology Licensing, Llc Intelligent system for detecting multistage attacks
CN111259204A (en) * 2020-01-13 2020-06-09 深圳市联软科技股份有限公司 APT detection correlation analysis method based on graph algorithm
CN111919001A (en) * 2018-03-20 2020-11-10 住友重机械工业株式会社 Shovel, information processing device, information processing method, and program
CN112182567A (en) * 2020-09-29 2021-01-05 西安电子科技大学 Multi-step attack tracing method, system, terminal and readable storage medium
CN112955610A (en) * 2018-11-08 2021-06-11 住友建机株式会社 Shovel, information processing device, information processing method, information processing program, terminal device, display method, and display program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
文雨;王伟平;孟丹;: "面向内部威胁检测的用户跨域行为模式挖掘", 计算机学报, no. 08, 24 January 2016 (2016-01-24) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114915501A (en) * 2022-07-15 2022-08-16 北京微步在线科技有限公司 Intrusion event detection method and device based on process behavior diagram and electronic equipment
CN114915501B (en) * 2022-07-15 2022-09-13 北京微步在线科技有限公司 Intrusion event detection method and device based on process behavior diagram and electronic equipment
CN115146271A (en) * 2022-09-02 2022-10-04 浙江工业大学 APT (advanced persistent threat) source tracing and researching method based on causal analysis
CN115514580A (en) * 2022-11-11 2022-12-23 华中科技大学 Method and device for detecting source-tracing intrusion of self-encoder
CN115514580B (en) * 2022-11-11 2023-04-07 华中科技大学 Method and device for detecting source-tracing intrusion of self-encoder
CN115664863A (en) * 2022-12-27 2023-01-31 北京微步在线科技有限公司 Network attack event processing method, device, storage medium and equipment
CN116795850A (en) * 2023-05-31 2023-09-22 山东大学 Method, device and storage medium for concurrent execution of massive transactions of alliance chains
CN116795850B (en) * 2023-05-31 2024-04-12 山东大学 Method, device and storage medium for concurrent execution of massive transactions of alliance chains

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination