CN116015939A

CN116015939A - Advanced persistent threat interpretation method based on atomic technology template

Info

Publication number: CN116015939A
Application number: CN202211730402.8A
Authority: CN
Inventors: 袁淇萱; 朱添田; 应杰; 程雯睿; 陈铁明; 吕明琪
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2022-12-30
Filing date: 2022-12-30
Publication date: 2023-04-25

Abstract

The invention discloses an advanced persistent threat (APT) interpretation method based on an atomic technology template, belonging to the technical field of network security. The method comprises: constructing attack technique template graphs based on attack command lines; interpreting attack techniques based on a graph alignment algorithm; and mining real attacks based on APT attack interpretation sequences. The template graph of each attack technique is constructed from real APT attack commands, and the attack entities are divided into fine-grained types, which yields a more accurate description of each technique. A graph alignment algorithm that takes multi-hop equivalent semantics into account improves the matching of attack techniques. Manually triaging the potential attack paths generated from the points of interest (POIs) found by EDR takes a great deal of time; interpreting those paths with atomic techniques removes false alarms and reduces labor cost.

Description

Advanced persistent threat interpretation method based on atomic technology template

Technical Field

The invention belongs to the technical field of network security, and particularly relates to an advanced persistent threat interpretation method based on an atomic technology template.

Background

An advanced persistent threat (APT) attack is a sustained network threat campaign launched by sophisticated, well-resourced attackers against organizations with specific high-value targets. Such attacks are targeted, stealthy, persistent and advanced, and have become increasingly common in recent years. To discover and respond to potential threats inside an organization in a more timely way, enterprises and other organizations deploy various endpoint detection and response (EDR) systems.

After the EDR finds suspicious nodes (points of interest, POIs), a large number of potential attack paths are found through forward and backward search or machine learning methods. However, only a small fraction of these potential attack paths are real attack paths; that is, the detection of advanced persistent threats produces false alarms. Although a security practitioner can use professional knowledge to analyze manually whether each potential attack path is a real attack chain, this analysis is so time-consuming that the window for effective response is missed. On the other hand, current automatic attack detection strategies based on statistical behavior features, strategy dispatch and behavior learning have limited precision and cannot reasonably explain their detection results. The ATT&CK model of the MITRE Corporation is a framework describing the attack techniques used in advanced persistent threat attacks, and Red Canary further defines atomic technology templates that can be used to carry out each attack technique, which brings new ideas for technique-level interpretation of advanced persistent threat attacks.

In view of the above, the problem to be solved is how to design reasonable entity and relationship types to represent the existing atomic attack technique templates and, on that basis, how to use a graph alignment algorithm to match techniques in potential attack paths so as to find the real attack chain.

Disclosure of Invention

The invention aims to provide an advanced persistent threat interpretation method based on an atomic technology template so as to accurately find out a real attack chain.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

an advanced persistent threat interpretation method based on an atomic technology template, comprising:

step 1, constructing an attack technique template graph based on attack command lines:

step 1-1, extracting the command words and entities in each attack command line according to the predefined attack entity types and the command word mapping relation;

step 1-2, converting each atomic technique in the atomic technology template into an attack technique template graph, based on the command word mapping relation and the command words and entities extracted from each attack command line;

step 2, interpreting attack techniques based on a graph alignment algorithm:

step 2-1, searching the system traceability graph for candidate nodes corresponding to every node in the attack technique template graph, forming a candidate node set for each node;

step 2-2, traversing the candidate node set of each node to calculate the node score of every candidate node, and taking the candidate node whose node score is the highest and greater than a threshold as the fixed node of that node;

step 2-3, calculating the graph alignment score of the attack technique template graph and the system traceability graph from the nodes in the template graph and their fixed nodes in the system traceability graph; if the graph alignment score is greater than a threshold, the attack technique template graph exists in the system traceability graph; matching is repeated to obtain the attack chain present in the system traceability graph;

step 3, mining real attacks based on APT attack interpretation sequences:

step 3-1, matching the attack technique template graphs against attack scenario graphs formed from CTI reports, and combining the matched attack techniques into attack chains in their order of occurrence to form an attack interpretation sequence library;

step 3-2, traversing the attack chains in the attack interpretation sequence library and calculating their similarity to the attack chain of the system traceability graph; if an attack chain with similarity greater than a threshold exists, the system traceability graph is considered to represent a real attack; otherwise, a false attack.

Several alternatives are provided below. They are not additional limitations on the overall scheme described above, only further additions or preferences; provided there is no technical or logical contradiction, each alternative may be combined with the overall scheme individually, and multiple alternatives may be combined with one another.

Preferably, the predefined attack entity types include: hub process, executable file process, shortcut instruction process, office class application software process, entertainment class application software process, graphic image class application software process, management class application software process, data class application software process, other processes, system configuration sensitive file, user configuration sensitive file, application configuration sensitive file, log sensitive file, library file, executable file, other files, registry and socket.

Preferably, the predefined command word mapping relationship includes:

the entity dependency relationship corresponding to the command words call and start is exec, and the number of operation parameters is 1;

the entity dependency relationship corresponding to the command words del, delete, drop, clean, remove and pop is unlink, and the number of operation parameters is 1;

the entity dependency relationship corresponding to the command words echo, inject, push, mkond, dump, nc -l and mkdir is write, and the number of operation parameters is 1;

the entity dependency relationship corresponding to the command words cat, ls, net, ipconfig, tasklist, netstat, whoami, schtasks, query and wmic is read, and the number of operation parameters is 1;

the entity dependency relationship corresponding to the command words ping, mov and nc -l is send, and the number of operation parameters is 1;

the entity dependency relationship corresponding to the command word ping is receive, and the number of operation parameters is 1;

the entity dependency relationship corresponding to the command words ping and wmic is connect, and the number of operation parameters is 1;

the entity dependency relationship corresponding to the command word gate is load, and the number of operation parameters is 1;

the entity dependency relationship corresponding to the command words taskkill, kill and stop is exit, and the number of operation parameters is 1;

the entity dependency relationship corresponding to the command words cp and scp is copy, and the number of operation parameters is 2.

Preferably, the extracting command words and entities in the attack command line includes:

extracting command words from the input attack command line text;

traversing the text of the attack command line forward from one command word until the next command word, deleting invalid information from the traversed content, and obtaining the valid information in the current attack command line;

generating regular expressions and entity dictionaries according to the predefined attack entity types, matching target entities in the valid information based on the number of operation parameters given for the command word in the command word mapping relation, and labeling each successfully matched entity with its entity type.

Preferably, converting the atomic techniques in the atomic technology template into attack technique template graphs based on the command word mapping relation includes:

creating a process with the same name as the command word, and creating an entity dependency relationship from that process to the entity according to the command word mapping relation;

creating an initial process mal that plays the role of the attacker, wherein mal, as the hub process, has a fork relationship with the process named after each command word;

for an attack command line containing the pipe character '|', creating processes from the command words on the left and right of the '|' respectively, and setting a fork relationship between the processes created on the left and right sides, thereby obtaining an attack technique template graph representing one attack technique.

Preferably, whether a node in the system traceability graph is a candidate node for a node in the attack technique template graph is judged as follows:

where Candi(i:k) represents the correspondence between a node i in the attack technique template graph and a node k in the system traceability graph, i_type is the entity type of node i, k_type is the entity type of node k, i_degree is the degree of node i, and k_degree is the degree of node k;

if Candi(i:k) = 1, node k is a candidate node for node i; if Candi(i:k) = 0, node k is not a candidate node for node i.

Preferably, the node score of the candidate node is calculated as follows:

where NodeScore(i, k) is the node score of candidate node k for node i, and G_q is the attack technique template graph. The formula ranges over paths that start at node i and end at node j, each with its hop count, and PathScore(i, j: k) is the path score of such a path. Nodes i′ and j′ are nodes on that path, i′→j′ is the edge formed by nodes i′ and j′, the hop count of the edge i′→j′ is also used, and EdgeScore(i′, j′: k) is the edge score of the edge i′→j′. Γ_c(j) is the candidate node set of node j, and G_p is the system traceability graph. On the G_p side, the formula uses paths that start at node k and end at node l, each with its hop count, and MHES(i′, j′: k, l) is the multi-hop semantic equivalence result of the edge i′→j′ and such a path.

Preferably, whether the edge i′→j′ and the path from node k to node l are multi-hop semantically equivalent is judged as follows:

a triplet <subject, object, event type> is defined to represent the information of an edge; the edge is taken as i′→j′ = (<v_i′, v_j′, e_v>), and the path is taken as an n-1 hop path described by its sequence of triplets. If the following formula is satisfied, the edge i′→j′ and the path are multi-hop semantically equivalent; otherwise, they are not;

where i′.type is the entity type of node i′, and U is the suspicious semantic transfer rule set.

Preferably, the suspicious semantic transfer rule set includes:

subject Process and object File with event type Read; subject Process and object File with event type Write; subject Process and object File with event type Load; subject Process and object Socket with event type Receive; subject Process and object Socket with event type Send; and subject Process and object Process with event type Fork/Clone.

Preferably, the graph alignment score is calculated as follows:

where Γ(G_p : G_q) is the graph alignment score of the system traceability graph G_p and the attack technique template graph G_q, Γ_F(i) is the fixed node of node i, and Γ_F(j) is the fixed node of node j. The formula uses the path that starts at node i and ends at node j together with its hop count, and the path that starts at the fixed node Γ_F(i) and ends at the fixed node Γ_F(j) together with its hop count.

Compared with the prior art, the advanced persistent threat interpretation method based on the atomic technology template has the following beneficial effects:

1. The template graph of each attack technique is constructed from real APT attack commands, and the attack entities are divided into fine-grained categories, so a more accurate description of each technique is obtained.

2. A graph alignment algorithm that takes multi-hop equivalent semantics into account improves the matching of attack techniques.

3. Manually triaging the potential attack paths generated from the POIs found by EDR takes a great deal of time; interpreting them with atomic techniques removes false alarms and reduces labor cost.

Drawings

Fig. 1 is a flow chart of the advanced persistent threat interpretation method based on an atomic technology template of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

Referring to fig. 1, an advanced persistent threat interpretation method based on an atomic technology template includes the steps of:

(1) Constructing an attack technique template graph based on attack command lines: the entity types and entity dependency relationships involved in the atomic technology template are defined, and on that basis a template graph representing a given type of APT attack technique is constructed.

(1-1) extracting the entity in the attack command line and labeling the entity type according to the predefined attack entity type and command word mapping relation.

Defining the entity types and entity dependency relationships (namely the command word mapping relation) of the atomic technology template: building on the techniques described in ATT&CK, Red Canary provides command lines that can be run to carry out a given attack technique, which gives a more precise information source for technique matching in the attack traceability graph. An attack command line contains many different types of entities, such as files, processes and sockets. Entities of the same type must be divided further according to their source, sensitivity, function and so on, to improve the accuracy of the subsequent technique matching. Table 1 defines the entity types of the atomic technology template and their descriptions.

Table 1 definition of entity types in atomic technology templates

Through analysis of a large number of atomic technology templates, the following 10 kinds of entity dependency relationships are defined: exec, unlink, write, read, send, receive, connect, load, exit, copy.

An attack command line is typically composed of command words (shortcut instructions), options and parameters. Command words have forms such as cat, rm and ls. Shortcut instructions carry rich dependency relationships between entities, but they cannot be extracted directly as verbs that map to the low-level system log. To capture this semantic information, common command words in attack scenarios were analyzed, and the mapping between shortcut instructions and entity dependency relationships, together with the number of parameters each shortcut instruction operates on, is defined in Table 2.

Table 2 command word mapping and parameter number definition

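Entity dependency | Command words | Number of operation parameters
exec | call, start | 1
unlink | del, delete, drop, clean, remove, pop | 1
write | echo, inject, push, mkond, dump, nc -l, mkdir | 1
read | cat, ls, net, ipconfig, tasklist, netstat, whoami, schtasks, query, wmic | 1
send | ping, mov, nc -l | 1
receive | ping | 1
connect | ping, wmic | 1
load | gate | 1
exit | taskkill, kill, stop | 1
copy | cp, scp | 2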

The key information in an attack command line is the command word, the target the command word operates on, and the implicit dependency relationship between the two. A typical command line looks like "mkdir -p /tmp/atomic", where the command word is mkdir, the target entity is /tmp/atomic, and the dependency relationship between them is write according to Table 2; -p is non-critical information and is filtered out. Based on this case, key information is extracted automatically from an attack command line in the following steps:

(1) Extract the predefined command words from the input attack command line text.

(2) Traverse the attack command line text forward from a command word until the next command word is found, and delete invalid information of the form "-p" from the traversed content.

(3) Design the regular expressions shown in Table 3 and the entity dictionaries shown in Table 4 according to the entity types defined in Table 1; then, using the parameter counts defined in Table 2, match target entities in the valid information remaining after step (2), and label each successfully matched entity with its entity type.

Table 3 entity types and their corresponding regular expressions

Table 4 entity types and corresponding entity dictionaries

It should be noted that entries may be added to or removed from the entity dictionary disclosed in this embodiment to obtain an entity dictionary better suited to a particular application.
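The three extraction steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `COMMAND_MAP` lists only a few command words from the command word mapping, and `ENTITY_PATTERNS` holds hypothetical stand-in patterns, because the regular expressions of Table 3 and the dictionaries of Table 4 are not reproduced in this text. The names `label_entity` and `extract` are also assumptions made for the sketch.

```python
import re
import shlex

# Command word -> (entity dependency, number of operation parameters).
# Only a few entries of the command word mapping are listed here.
COMMAND_MAP = {
    "mkdir": ("write", 1),
    "echo": ("write", 1),
    "cat": ("read", 1),
    "del": ("unlink", 1),
    "cp": ("copy", 2),
}

# Hypothetical stand-ins for the Table 3 regular expressions (assumption).
# The first matching pattern decides the entity type.
ENTITY_PATTERNS = [
    ("log sensitive file", re.compile(r".*\.log$")),
    ("system configuration sensitive file", re.compile(r"^/etc/.*")),
    ("other files", re.compile(r"^/.*")),
]


def label_entity(token):
    """Return the entity type of a token, or None if no pattern matches."""
    for entity_type, pattern in ENTITY_PATTERNS:
        if pattern.match(token):
            return entity_type
    return None


def extract(command_line):
    """Extract command words, dependencies and typed entities from one line."""
    tokens = shlex.split(command_line)
    results = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        if word not in COMMAND_MAP:  # step (1): look for a command word
            i += 1
            continue
        relation, n_params = COMMAND_MAP[word]
        # Step (2): walk forward to the next command word, dropping options.
        j = i + 1
        operands = []
        while j < len(tokens) and tokens[j] not in COMMAND_MAP:
            if not tokens[j].startswith("-"):
                operands.append(tokens[j])
            j += 1
        # Step (3): match at most n_params target entities and label them.
        entities = []
        for tok in operands:
            entity_type = label_entity(tok)
            if entity_type is not None and len(entities) < n_params:
                entities.append((tok, entity_type))
        results.append({"command": word, "relation": relation, "entities": entities})
        i = j
    return results
```

For the example command line "mkdir -p /tmp/atomic", `extract` returns a single record with the command word mkdir, the dependency write, and the entity /tmp/atomic labeled by the first stand-in pattern that matches it.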

(1-2) converting the atomic technology template containing the attack command lines into an attack technique template graph, based on the command word mapping relation and the command words and entities extracted from each attack command line.

After the command words in the attack commands and their operation targets (entities) are obtained, they are combined into a connected attack technique template graph as follows. First, a process entity (P) with the same name as the command word is created, and an entity dependency relationship from that process to the operation target is created according to the command word mapping relation defined in Table 2. Second, an atomic technique may include several attack command lines that have no dependency relationship with one another; to keep the whole template graph connected, an initial process mal is created to play the role of the attacker, and mal, as the hub process, has a fork relationship with each command word process (e.g., mal fork cat). Finally, for special cases such as pipelines, processes are created from the command words on the left and right of the '|' character, and the relationship between the left-hand process P1 and the right-hand process P2 is taken to be P1 fork P2. A connected graph representing the attack technique template is thus obtained.

Steps (1-1) and (1-2) are executed for each atomic technique in the atomic technology template to obtain the attack technique template graph corresponding to each atomic technique (attack technique).
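The graph construction described above can be sketched as follows. The sketch represents the template graph as a list of (subject, relation, object) edges and makes several assumptions that are not stated in the source: every command word operates on one target, options start with "-", the process on the right of a pipe is attached only to the process on its left, and processes are identified by command word alone. `COMMAND_RELATION` and `build_template_graph` are hypothetical names.

```python
# Command word -> entity dependency (a small subset of the mapping).
COMMAND_RELATION = {"cat": "read", "echo": "write", "mkdir": "write", "del": "unlink"}


def build_template_graph(command_lines):
    """Return the edges of a connected template graph rooted at the process mal."""
    edges = []  # each edge is (subject, relation, object)
    for line in command_lines:
        previous_process = None
        for segment in line.split("|"):  # pipeline: left and right parts
            tokens = segment.split()
            if not tokens or tokens[0] not in COMMAND_RELATION:
                continue
            process = tokens[0]  # process named after the command word
            if previous_process is None:
                # The hub process mal forks the first process of each line.
                edges.append(("mal", "fork", process))
            else:
                # Pipe case: left-hand process P1 forks right-hand process P2.
                edges.append((previous_process, "fork", process))
            for tok in tokens[1:]:
                if not tok.startswith("-"):  # skip options such as -p
                    edges.append((process, COMMAND_RELATION[process], tok))
                    break  # assumption: one operation target per command word
            previous_process = process
    return edges
```

Because every command line hangs off mal, either directly or through a pipe, the resulting edge list always describes a connected graph, which is the property the construction is meant to guarantee.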

(2) Interpreting attack techniques based on a graph alignment algorithm: a graph alignment algorithm that takes multi-hop semantic equivalence into account is designed to obtain the alignment score of the system traceability graph and the attack technique template graph.

(2-1) Searching the system traceability graph for candidate nodes corresponding to every node in the attack technique template graph, forming a candidate node set for each node.

Definition of multi-hop semantic equivalence: during an attack, interaction between a suspicious process and other entities propagates the suspicious information flow and control flow carried by that process according to certain rules, so that the other entities also acquire suspicious semantics. To judge whether a single-hop edge i→j and a multi-hop path are semantically equivalent, the suspicious semantic transfer rules are defined as shown in Table 5. The single-hop edge is written i→j = (<v_i, v_j, e_v>), the multi-hop path is an n-1 hop path, triplets <subject, object, event type> represent the information of edges, and the suspicious semantic transfer rules in the table are defined as the set U.

If Formula 1 is satisfied, the edge i→j and the multi-hop path are considered semantically equivalent.

In the formula, i.type is the entity type of node i, j.type is the entity type of node j, v_1.type is the entity type of the subject of the first triplet in the multi-hop path, and v_n.type is the entity type of the object of the last triplet in the multi-hop path.

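Table 5 Suspicious semantic transfer rules

Subject | Object | Event type
Process | File | Read
Process | File | Write
Process | File | Load
Process | Socket | Receive
Process | Socket | Send
Process | Process | Fork/Clone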
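A possible reading of the equivalence check is sketched below. Formula 1 itself is not reproduced in this text, so the condition used here is an assumption inferred from the symbol definitions: the subject type of the first triplet equals the type of i, the object type of the last triplet equals the type of j, and every triplet of the path belongs to U. The function name `mhes` and the tuple layout are hypothetical.

```python
# Suspicious semantic transfer rule set U as
# (subject type, object type, event type) triplets.
U = {
    ("Process", "File", "Read"),
    ("Process", "File", "Write"),
    ("Process", "File", "Load"),
    ("Process", "Socket", "Receive"),
    ("Process", "Socket", "Send"),
    ("Process", "Process", "Fork/Clone"),
}


def mhes(edge, path):
    """Check an assumed form of multi-hop semantic equivalence.

    edge: (subject type, object type, event type) of the single-hop edge i -> j.
    path: list of such triplets describing the multi-hop path.
    """
    if not path:
        return False
    i_type, j_type, _event = edge
    starts_match = path[0][0] == i_type   # v_1.type equals i.type
    ends_match = path[-1][1] == j_type    # v_n.type equals j.type
    all_in_u = all(hop in U for hop in path)  # every hop follows a rule in U
    return starts_match and ends_match and all_in_u
```

Under this reading, a template edge in which a process writes a file is equivalent to a two-hop path in which a process forks a second process that then writes the file, since both hops follow rules in U and the endpoint types agree.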

Searching candidate nodes: first, with reference to Poirot (Milajerdi S M, Eshete B, Gjomemo R, et al. Poirot: Aligning attack behavior with kernel audit records for cyber threat hunting[C]//Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 2019: 1795-1812.) and the entity types defined above, a system traceability graph G_p representing the kernel audit log is constructed. Then, for every node i in the attack technique template graph G_q, the corresponding candidate nodes k are searched for in G_p; the candidate node set of node i is denoted Γ_c(i). The process is shown in Formula 2, where i_type denotes the entity type of node i and i_degree denotes the degree of node i. If Candi(i:k) is 1, k is added to Γ_c(i).
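The candidate search can be illustrated with a short sketch. Formula 2 is not reproduced in this text; the condition below (equal entity type, and a degree in G_p at least as large as the degree in G_q) is an assumption chosen only because the formula is defined over entity type and degree. The function name and the dictionary layout are hypothetical.

```python
def find_candidates(template_nodes, traceability_nodes):
    """Build the candidate node set for every template node.

    Both arguments map a node id to a pair (entity type, degree).
    Assumed rule: k is a candidate for i when the entity types are equal
    and the degree of k is not smaller than the degree of i.
    """
    candidates = {}
    for i, (i_type, i_degree) in template_nodes.items():
        candidates[i] = {
            k
            for k, (k_type, k_degree) in traceability_nodes.items()
            if k_type == i_type and k_degree >= i_degree
        }
    return candidates
```

Filtering on type and degree before any scoring keeps the later node score calculation restricted to a small set of plausible nodes per template node.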

(2-2) Fixing candidate nodes: the candidate node set Γ_c(i) is traversed to calculate the node score of every candidate node of node i, and the node m ∈ Γ_c(i) with the highest node score that is also greater than a threshold is selected as the fixed node of i, denoted Γ_F(i). It should be noted that if no candidate has a node score greater than the threshold, node i has no fixed node.
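The selection rule of step (2-2) can be sketched as follows, assuming the node scores of the candidates have already been computed by some scoring function. The name `fix_node` and the dictionary input are hypothetical.

```python
def fix_node(node_scores, threshold):
    """Pick the fixed node for one template node.

    node_scores maps each candidate node id to its node score.
    Returns the candidate with the highest score if that score exceeds
    the threshold, otherwise None.
    """
    if not node_scores:
        return None
    best = max(node_scores, key=node_scores.get)
    return best if node_scores[best] > threshold else None
```

Returning None makes the case without a fixed node explicit, so later steps can skip template nodes that could not be anchored in the system traceability graph.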

The node score is calculated as shown in Formula 3, which uses the hop count of the path starting at node i and ending at node j; nodes i and k are fixed as inputs. The path score is calculated as shown in Formula 4, where i′ and j′ are nodes on that path, i′→j′ is the edge formed by i′ and j′, and the hop count of an edge on the path is 1. The edge score is calculated as shown in Formula 5, where MHES denotes multi-hop semantic equivalence and can be calculated using Formula 6.

(2-3) Graph alignment score calculation: once the fixed nodes of all nodes in the attack technique template graph G_q have been determined, the graph alignment score Γ(G_p : G_q) between the two graphs is calculated according to Formula 7, giving a quantitative judgment of whether G_q exists in the system traceability graph G_p. Steps (2-1), (2-2) and (2-3) are repeated to match every attack technique template graph G_q against G_p, which yields all template graphs contained in G_p and, finally, an attack technique sequence representing G_p.
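The repeated matching can be sketched as a loop over all template graphs. Formula 7 is not reproduced in this text, so the alignment score is passed in as a caller-supplied function; `match_techniques`, `alignment_score` and the dictionary of templates are hypothetical names. The sketch keeps the templates in iteration order and does not attempt the time ordering of the resulting sequence.

```python
def match_techniques(templates, alignment_score, threshold):
    """Return the names of the techniques whose template graph is found.

    templates: maps a technique name to its template graph object.
    alignment_score: callable that returns the graph alignment score of a
        template graph against the system traceability graph (assumed to
        be implemented elsewhere).
    """
    matched = []
    for name, template_graph in templates.items():
        # A template is considered present when its score exceeds the threshold.
        if alignment_score(template_graph) > threshold:
            matched.append(name)
    return matched
```

Passing the scoring function as a parameter keeps the threshold decision separate from the way the score is computed.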

(3) Real attack mining based on APT attack interpretation sequences: the similarity between the attack technique chain obtained by matching and the existing attack interpretation sequences is calculated to judge whether the attack is a real attack.

(3-1) Attack interpretation sequence construction based on cyber threat intelligence reports: a cyber threat intelligence (CTI) report organizes and summarizes information such as an attacker's motivation, purpose and means, and gives a complete description of one attack campaign. Such reports come from many sources and exist in large numbers. By crawling large volumes of threat intelligence and keeping the higher-quality reports, a reliable data source of real attack interpretation sequences can be formed. The attack technique template graphs obtained in step (1) are therefore matched against the attack scenario graphs formed from CTI reports, and the matched techniques are combined into attack chains in their order of occurrence, which builds an attack interpretation sequence library covering a full range of attack types.

The attack technique in this embodiment refers to an attack technique defined in the ATT&CK model. An attack technique template graph represents one specific attack technique, and an attack chain is a time-ordered sequence of attack techniques (an attack technique sequence). The attack interpretation sequence library is a large collection of real attack chains obtained by matching against a large number of data sources.

(3-2) Real attack mining based on similarity calculation: let the attack technique sequence to be confirmed, obtained in step (2), be T_C, and let an attack chain in the attack interpretation sequence library be T_E. The attack techniques that T_C and T_E have in common form the set T_S = {t_1, t_2, t_3, ..., t_n}, where t_i denotes an attack technique shared by T_C and T_E; the index of t_i in T_C is c_i and its index in T_E is e_i. To judge whether the attack represented by T_C is a real attack, the similarity is calculated by Formula 8:

where d_i is the absolute value of the difference between c_i and e_i. If the resulting Sim is greater than the threshold, the attack is considered a real attack; otherwise, a false attack.
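The inputs of the similarity calculation can be prepared as in the sketch below, which computes the shared set T_S together with c_i, e_i and d_i for each shared technique. Formula 8 is not reproduced in this text, so the sketch stops before the final Sim value. Zero-based indices and the use of the first occurrence of a repeated technique are assumptions, and the technique identifiers in the usage note are illustrative only.

```python
def index_differences(t_c, t_e):
    """Return {technique: (c_i, e_i, d_i)} for techniques shared by both chains.

    t_c: attack technique sequence to be confirmed (list of technique ids).
    t_e: one attack chain from the attack interpretation sequence library.
    """
    # Position of the first occurrence of each technique in t_e.
    first_pos_e = {}
    for idx, technique in enumerate(t_e):
        first_pos_e.setdefault(technique, idx)

    shared = {}
    for c_i, technique in enumerate(t_c):
        if technique in first_pos_e and technique not in shared:
            e_i = first_pos_e[technique]
            shared[technique] = (c_i, e_i, abs(c_i - e_i))  # d_i = |c_i - e_i|
    return shared
```

For example, with T_C = [T1059, T1003, T1041] and T_E = [T1566, T1059, T1041], the shared techniques are T1059 with d_i = 1 and T1041 with d_i = 0; a small d_i means the technique occupies a similar position in both chains.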

The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.

The above examples merely represent a few embodiments of the present invention, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of the invention should be assessed as that of the appended claims.

Claims

1. An advanced persistent threat interpretation method based on an atomic technology template, characterized in that the advanced persistent threat interpretation method based on an atomic technology template comprises the following steps:

step 2, attack technical explanation based on a graph alignment algorithm;

step 3, real attack mining based on APT attack interpretation sequence:

2. The advanced persistent threat interpretation method based on atomic technology templates as claimed in claim 1, wherein the predefined attack entity types include: hub process, executable file process, shortcut instruction process, office class application software process, entertainment class application software process, graphic image class application software process, management class application software process, data class application software process, other processes, system configuration sensitive file, user configuration sensitive file, application configuration sensitive file, log sensitive file, library file, executable file, other files, registry and socket.

3. The advanced persistent threat interpretation method based on an atomic technology template as claimed in claim 1, wherein the predefined command word mapping relationship comprises:

4. The advanced persistent threat interpretation method based on an atomic technology template as claimed in claim 1, wherein said extracting command words and entities in an attack command line comprises:

extracting command words from the input attack command line text;

5. The advanced persistent threat interpretation method based on atomic technology template according to claim 1, wherein the converting atomic technologies in atomic technology template into attack technology template graph based on command word mapping relation comprises:

6. The advanced persistent threat interpretation method based on atomic technology template according to claim 1, wherein determining whether a node in the system trace-source graph is a candidate node for a node in the attack technical template graph is as follows:

7. The advanced persistent threat interpretation method based on atomic technology template as claimed in claim 1, wherein the node score of the candidate node is calculated as follows:

where NodeScore(i, k) is the node score of candidate node k for node i, and G_q is the attack technique template graph. The formula ranges over paths that start at node i and end at node j, each with its hop count, and PathScore(i, j: k) is the path score of such a path. Nodes i′ and j′ are nodes on that path, i′→j′ is the edge formed by nodes i′ and j′, the hop count of the edge i′→j′ is also used, and EdgeScore(i′, j′: k) is the edge score of the edge i′→j′. Γ_c(j) is the candidate node set of node j, and G_p is the system traceability graph. On the G_p side, the formula uses paths that start at node k and end at node l, each with its hop count, and MHES(i′, j′: k, l) is the multi-hop semantic equivalence result of the edge i′→j′ and such a path.

8. The advanced persistent threat interpretation method based on an atomic technology template as claimed in claim 7, wherein whether the edge i′→j′ and the path from node k to node l are multi-hop semantically equivalent is judged as follows:

defining a triplet <subject, object, event type> to represent the information of an edge, taking the edge i′→j′ = (<v_i′, v_j′, e_v>), and taking the path as an n-1 hop path described by its sequence of triplets; if the following formula is satisfied, the edge i′→j′ and the path are multi-hop semantically equivalent; otherwise, they are not;

where i′.type is the entity type of node i′, and U is the suspicious semantic transfer rule set.

9. The advanced persistent threat interpretation method based on an atomic technology template in accordance with claim 8, wherein said suspicious semantic delivery rule set comprises:

10. An advanced persistent threat interpretation method based on an atomic technology template, as claimed in claim 1, wherein the graph alignment score is calculated as follows:

where Γ(G_p : G_q) is the graph alignment score of the system traceability graph G_p and the attack technique template graph G_q, Γ_F(i) is the fixed node of node i, and Γ_F(j) is the fixed node of node j. The formula uses the path that starts at node i and ends at node j together with its hop count, and the path that starts at the fixed node Γ_F(i) and ends at the fixed node Γ_F(j) together with its hop count.