CN111988339B

CN111988339B - Network attack path discovery, extraction and association method based on DIKW model

Info

Publication number: CN111988339B
Application number: CN202010929301.8A
Authority: CN
Inventors: 宋磊; 李俊
Original assignee: Zhuhai Yizhi Security Technology Co ltd
Current assignee: Guangdong Yizhi Security Technology Co ltd; Guangzhou Yizhi Security Technology Co ltd
Priority date: 2020-09-07
Filing date: 2020-09-07
Publication date: 2022-03-11
Anticipated expiration: 2040-09-07
Also published as: CN111988339A

Abstract

A network attack path discovery, extraction and association method based on a DIKW model, belonging to the technical field of network security. The invention addresses the problem that traditional network detection methods can only find isolated attack events and judge the threat level of a single attack event, and lack the ability to extract and analyze attack event information and to judge the overall attack situation. The method comprises: summarizing network attack information to form a knowledge graph; comparing and matching acquired network behavior data D against a knowledge K base; expanding, de-duplicating and merging the matched network behavior data D to obtain attack information I; feeding the attack information I into the corresponding intelligent model W; and judging whether the information I and its associated information satisfy a logical relationship in the intelligent model W. If so, an attack event chain is generated; otherwise, the attack information I is stored. The method is applicable to network security for the traditional internet, the industrial internet, the mobile internet, the internet of things and similar fields.

Description

Network attack path discovery, extraction and association method based on DIKW model

Technical Field

The invention belongs to the technical field of network security, and particularly relates to a network attack path discovery, extraction and association method.

Background

In the current complex international environment, network attacks are becoming more frequent and more damaging, and the techniques and means they employ are becoming more sophisticated. In particular, organized and purposeful attackers often use tools and methods already built into the system to probe and attack network programs, and existing detection means often fail to identify such activity. Traditional network attack path analysis is based on attack graphs: it extracts the source and destination of an attack from each attack event and then analyzes, in isolation, the vulnerabilities of individual devices in the target network. It cannot comprehensively analyze the security risk an attack event poses to the whole network or the risks it may cause in the future. During analysis, relationships between different attack events are generally established only through network device IP addresses, host names and the like, so the logical relationships between attack events cannot be analyzed in depth.

At present, a network attack is generally divided into stages such as attack probing, privilege acquisition, privilege maintenance, lateral movement and target acquisition. Traditional detection means such as antivirus software, firewalls, IDS and IPS work from isolated attack events and can only detect certain attacks in a certain stage. For example, antivirus software can generally only detect the viruses and Trojans that attackers leave behind during privilege maintenance, and an IDS can generally only detect some of the attack attempts made during attack probing. These means rely mainly on signature libraries (virus and Trojan signatures, network attack signatures and so on) and threat intelligence. Their features are narrow: they can only identify a single attack and raise a single alarm, and they cannot detect malicious operations carried out with legitimate system software such as PowerShell. Traditional analysis can therefore only judge the threat level of a single attack event and cannot grasp the network attack as a whole. Because of these technical shortcomings, the attack logs generated by both NTA traffic probes and EDR terminal probes are single alarms, for example "XXX performed SQL injection against XXX" or "virus BBB was found on host AAA", so the security state and attack situation of the whole network cannot be comprehensively understood.

Traditional analysis is based on known attack signatures and network vulnerability analysis and lacks the ability to analyze and judge the attack situation. Detection means such as signature libraries and threat intelligence lag seriously behind: they can detect threat events that have already occurred, but not unknown attack events. Newer detection methods, such as machine learning and artificial intelligence, only address the lag of traditional signature and virus libraries and do not achieve attack correlation or awareness of the overall situation.

Disclosure of Invention

The invention provides the following technical scheme for solving the problems that the traditional network detection method can only find isolated attack events and judge the threat degree of a single attack event and lacks the capabilities of extracting and analyzing attack event information and judging the whole attack situation:

The invention discloses a network attack path discovery, extraction and association method based on a DIKW model, which comprises the following steps:

step one, according to known network attack information, summarizing network attack information of the same type to generate knowledge K; combining knowledge K of different types into a knowledge K base, and converting the knowledge K base into a graph, forming a knowledge graph that represents the logical relationships between different knowledge K;

step two, acquiring network traffic data with an NTA traffic probe, acquiring device activity behavior data and device log data with an EDR terminal probe, and taking the network traffic data, the device activity behavior data and the device log data as network behavior data D in the same format as the data of the knowledge K base;

step three, comparing and matching the network behavior data D with the knowledge K in the knowledge K base to obtain network behavior data D and knowledge K of the same type, expanding, de-duplicating and merging the network behavior data D of the same type according to the knowledge K to obtain attack information I, and storing the network behavior data D that does not match any knowledge K;

step four, inputting the attack information I into an intelligent model W, the intelligent model W aggregating and associating the attack information I according to the logical relationships in the knowledge graph to obtain associated information with the same attributes as the attack information I;

step five, judging whether the attack information I and the associated information satisfy the logical relationship in the intelligent model W:

if so, the intelligent model W automatically constructs a model corresponding to the attack information I and uses that model to generate an attack event chain, wherein the information displayed by the attack event chain comprises attack source information, privilege acquisition information and association aggregation information; the attack source information is used for discovering the network attack path, the privilege acquisition information is used for extracting the network attack path, and the association aggregation information is used for associating the network attack path;

otherwise, saving the attack information I.
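The steps above can be read as a small pipeline. The following Python sketch is illustrative only: the `Knowledge` predicate format, the `WisdomModel` class (a stand-in for the intelligent model W) and the `process` function are assumptions made for this example and are not defined by the method itself. The aggregation and association step is reduced here to ordering the matched information by the model's required sequence.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Knowledge:
    """One knowledge K entry: a name plus a predicate over a D record (assumed format)."""
    name: str
    matches: Callable[[dict], bool]


@dataclass
class WisdomModel:
    """Stand-in for intelligent model W: knowledge names that must all be observed."""
    name: str
    required: List[str]

    def satisfied_by(self, info: List[dict]) -> bool:
        seen = {item["knowledge"] for item in info}
        return all(k in seen for k in self.required)


def process(records: List[dict], knowledge_base: List[Knowledge],
            model: WisdomModel) -> Tuple[Optional[List[dict]], List[dict], List[dict]]:
    """Return (attack_event_chain or None, stored_info, unmatched_records)."""
    info, unmatched = [], []
    for rec in records:
        # Compare each D record with every K entry.
        hits = [k.name for k in knowledge_base if k.matches(rec)]
        if hits:
            info.extend({"knowledge": h, "record": rec} for h in hits)
        else:
            unmatched.append(rec)  # D that matches no K is stored
    if model.satisfied_by(info):
        # Order the matched information by the model's required sequence.
        order = {k: n for n, k in enumerate(model.required)}
        chain = sorted((i for i in info if i["knowledge"] in order),
                       key=lambda i: order[i["knowledge"]])
        return chain, [], unmatched
    return None, info, unmatched  # otherwise, attack information I is saved
```

In this sketch a chain is produced only when every required knowledge entry has been observed; until then the matched information is kept, mirroring the "otherwise, saving the attack information I" branch.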

Preferably, in step one, the information data in the knowledge K base includes: device process creation data, device process end data, device file activity data, device startup item data, device user login data, device user modification data and network traffic data; the device process creation data includes: process PID data, process name data, process user name data, parent process PID data and parent process user name data; the device process end data includes: process PID data, process name data and process user name data.

Preferably, in the second step, the network traffic data includes: quintuple data, application layer protocol data, source IP data, source port data, destination IP data and destination port data; the device activity behavior data includes: process path data, command line data, authorized user data and start time data; the device log data includes: device name data, time data, user data, activity type data, activity impact data, and log source data.
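The field lists above could be carried in typed records. The sketch below is one possible layout, not a format prescribed by the method: the class names, attribute names and the `to_behavior_data` helper are assumptions, and the `transport` field is added so that the four address fields plus the transport protocol make up the five-tuple.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass(frozen=True)
class NetworkTraffic:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    transport: str       # together with the four fields above: the five-tuple
    app_protocol: str    # application layer protocol, for example "HTTP"


@dataclass(frozen=True)
class DeviceActivity:
    process_path: str
    command_line: str
    authorized_user: str
    start_time: datetime


@dataclass(frozen=True)
class DeviceLog:
    device_name: str
    time: datetime
    user: str
    activity_type: str
    activity_impact: str
    log_source: str


def to_behavior_data(record) -> dict:
    """Flatten any of the three record types into one dict tagged with its kind,
    so that all network behavior data D shares a single format."""
    return {"kind": type(record).__name__, **asdict(record)}
```

Tagging each record with its kind is a design choice for the sketch: it lets later matching code select the knowledge entries that apply to that kind of data.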

Preferably, in step four, the intelligent model W can construct a corresponding model according to the attack information I and the associated information, where the model includes: a ransomware attack model, a phishing email model, a cryptomining attack model, a data theft model, a botnet model, an APT attack model, an attack key technology point model, an attack team model, an attack tactics model and a machine learning model.

Preferably, the intelligent model W is used for evaluating a certain type of attack behavior in the process of analyzing the attack information I, and the attack behavior includes the following procedures:

(1) opening a mail attachment by using a mail client;

(2) the mail attachment starts the Word process of the Office program;

(3) the Word process starts the cmd.exe process;

(4) the cmd.exe process starts the certutil.exe process;

(5) the certutil.exe process connects to an external network attack IP address to carry out the attack behavior.

The intelligent model W aggregates and associates the attack behavior according to the logical relationships in the knowledge graph to generate associated information. When the attack behavior and the associated information satisfy the logical relationship in the intelligent model W, the intelligent model W detects an attack event based on a mail attachment and matches it to the phishing email model, which generates the attack event chain. From the information in the chain, it can be seen that the attack is in the initial privilege acquisition stage, and it can be predicted that information collection and intranet expansion will follow. Based on the associated information of the phishing email model, the attacking organization and the attack method are associated, from which the potential purpose and scope of impact of the attack are derived.

Beneficial effects: the network attack path discovery, extraction and association method based on the DIKW model departs from traditional identification and analysis methods. Because it relies on general characteristics that an attack necessarily exhibits, and uses knowledge graph matching together with pipelined model processing, it greatly improves the reliability and operability of the technique. Based on the correlation between events, the invention can detect a network attack at an early stage; the intelligent model analyzes the whole attack event chain, directly generates the complete attack chain of the attack event, further discovers suspicious attack targets, and reasonably predicts the future security situation.

Drawings

FIG. 1 is a flow chart of a method for network attack path discovery, extraction and correlation based on a DIKW model;

FIG. 2 is a schematic diagram of a privilege escalation attack knowledge graph;

FIG. 3 is a schematic flow chart of acquiring attack information I;

FIG. 4 is a schematic flow chart of the chain of attack events generated by the intelligent model W.

Detailed Description

The first embodiment is as follows: this embodiment is described with reference to fig. 1. As shown in fig. 1, the network attack path discovery, extraction and association method based on a DIKW model in this embodiment includes the following steps:

otherwise, saving the attack information I.

The network attack path is discovered, extracted and associated on the basis of a DIKW model. Constructing the DIKW model comprises summarizing the knowledge graph K, collecting the network behavior data D, obtaining the attack information I, and matching it against the intelligent model W.

The second embodiment is as follows: this embodiment further describes the network attack path discovery, extraction and association method based on a DIKW model. In this embodiment, the information data in the knowledge K base in step one includes device process creation data, device process end data, device file activity data, device startup item data, device user login data, device user modification data and network traffic data; the device process creation data includes process PID data, process name data, process user name data, parent process PID data and parent process user name data; the device process end data includes process PID data, process name data and process user name data.

In step one, a knowledge K entry is a description of a certain attack behavior and method, the knowledge K base is a collection of a large number of knowledge K entries of different types, and the knowledge graph is a structured description of the knowledge K that can also describe the logical relationships of a certain type of attack behavior, for example the following privilege escalation attack behavior:

In a Windows system, a common attack type is privilege escalation. When a user is unintentionally infected by a malicious program, the program has that user's privileges, and malicious programs generally seek to obtain the SYSTEM (highest) privileges of the computer. Security experts have summarized the following attack methods:

A. changing the current privileges of the malicious program from user privileges to SYSTEM privileges through an operating system kernel vulnerability;

B. changing the privileges of a new program created by the malicious program from user privileges to SYSTEM privileges through an operating system kernel vulnerability;

Each attack method is one privilege escalation knowledge entry, and the privilege escalation knowledge base is a collection of a large number of privilege escalation knowledge entries of different types. The privilege escalation knowledge base is converted into a graph to form a knowledge graph that represents the logical relationships between privilege escalation attack behaviors. As shown in fig. 2, the privilege escalation knowledge graph describes the following information:

1. data sources required for privilege escalation attack behavior:

(1) the process creation event data includes: process PID data, process name data, process command line data, process user name data, parent process PID data, parent process user name data and process Token level data;

(2) the process end event data includes: process PID data, process name data, process user name data and process Token level data;

2. judging whether the data is attack information:

(1) when a process is created, judging whether the parent process user name is consistent with the current process user name; if not, the data is attack information;

(2) when a process ends, judging whether the process user name is consistent with the user name at creation; if not, the data is attack information.
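The two judgment rules can be written as simple predicates over process events. The sketch below is a minimal illustration; the `ProcessCreate` and `ProcessEnd` classes and the function names are assumptions, and only the fields the two rules need are modeled (command line and Token level are omitted).

```python
from dataclasses import dataclass


@dataclass
class ProcessCreate:
    pid: int
    name: str
    user: str
    parent_pid: int
    parent_user: str


@dataclass
class ProcessEnd:
    pid: int
    name: str
    user: str


def is_attack_on_create(event: ProcessCreate) -> bool:
    """Rule (1): the parent process user name differs from the new process user name."""
    return event.parent_user != event.user


def is_attack_on_end(created: ProcessCreate, ended: ProcessEnd) -> bool:
    """Rule (2): for the same Pid, the user name at process end differs from
    the user name recorded at creation."""
    return created.pid == ended.pid and created.user != ended.user
```

Rule (1) corresponds to attack method B (a new program created with changed privileges), and rule (2) corresponds to attack method A (a running program whose privileges changed before it ended).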

The third embodiment is as follows: this embodiment further describes the network attack path discovery, extraction and association method based on a DIKW model of the first embodiment. In this embodiment, the devices that acquire the network behavior data D include, but are not limited to, traffic hardware acquisition devices, traffic software acquisition devices, terminal behavior acquisition devices and terminal log acquisition devices.

In this embodiment, in step two, the NTA traffic probe collects all network traffic data of the target network and extracts it layer by layer, which saves storage space. An NTA traffic probe can be deployed inline or in bypass mode; the invention uses bypass deployment to minimize the impact on the original network. By configuring switch port mirroring, optical splitters and similar equipment in the network, all network traffic packets in the data center and the office network are sent to the network card of the NTA traffic probe, and a network traffic acquisition and protocol analysis program deployed on the probe captures the packets from the network card to acquire the network traffic data. The analysis software then parses the packet format according to the RFC standard protocol documents, extracts the source IP, source port, destination IP and application protocol data, and parses the key information in the protocol according to the application protocol and its RFC document. For example, the key information of the HTTP protocol includes the HTTP request header, HTTP response code, HTTP response header, HTTP Host field and URI.
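As a rough illustration of layer-by-layer extraction, the sketch below reads the addresses and ports from a raw IPv4 packet carrying TCP and then pulls the request line and Host field out of an HTTP/1.x request payload. It is a simplified example under stated assumptions: the input is a single unfragmented IPv4 packet without a link-layer header, no TCP stream reassembly is performed, and the function names are hypothetical. Malformed input raises an exception rather than being handled.

```python
import ipaddress
import struct


def parse_ipv4_tcp(packet: bytes) -> dict:
    """Extract addresses, ports and TCP payload from a raw IPv4/TCP packet."""
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ip_header_len = (version_ihl & 0x0F) * 4      # IHL is counted in 32-bit words
    if packet[9] != 6:                            # protocol number 6 is TCP
        raise ValueError("not a TCP segment")
    src_ip = str(ipaddress.IPv4Address(packet[12:16]))
    dst_ip = str(ipaddress.IPv4Address(packet[16:20]))
    src_port, dst_port = struct.unpack(
        "!HH", packet[ip_header_len:ip_header_len + 4])
    # The upper four bits of byte 12 of the TCP header give its length in words.
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
    return {
        "src_ip": src_ip, "src_port": src_port,
        "dst_ip": dst_ip, "dst_port": dst_port,
        "payload": packet[ip_header_len + tcp_header_len:],
    }


def parse_http_request(payload: bytes) -> dict:
    """Extract method, URI, Host field and headers from an HTTP/1.x request."""
    head, _, _ = payload.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return {"method": method, "uri": uri, "version": version,
            "host": headers.get("host"), "headers": headers}
```

Keeping only the parsed fields rather than the full packet is what allows layer-by-layer extraction to save storage space.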

The EDR terminal probe is probe software installed on each terminal (computer devices, internet of things devices and the like). Its device behavior log acquisition terminal acquires the device activity behavior data and device log data of the operating systems in the data center by calling operating system interfaces, hooking operating system functions and similar means.

The network traffic data includes: quintuple data, application layer protocol data, source IP data, source port data, destination IP data and destination port data; the device activity behavior data includes: process path data, command line data, authorized user data and start time data; the device log data includes device name data, time data, user data, activity type data, activity impact data, and log source data.

As shown in fig. 3, the device behavior log acquisition terminal of the EDR terminal probe collects the device activity behavior data and device log data of the data center, and the network traffic acquisition and protocol analysis program of the NTA traffic probe acquires the network traffic data in the data center and the office network. The network traffic data, device activity behavior data and device log data, taken together as network behavior data D, are compared with the knowledge K base to obtain knowledge 1, knowledge 2, … that have the same format as the network behavior data D. Knowledge 1, knowledge 2, … are then used to match, expand, de-duplicate and merge the network behavior data D of the same format, yielding information 1, information 2, …. Data expansion acquires data related to a record. For example, suppose the network traffic data acquired by the NTA traffic probe is as follows:

source IP: 94.191.2.168;

source port: 4245;

destination IP: 69.63.176.59;

destination port: 22;

The network traffic data is compared and matched with the knowledge K in the knowledge K base. The matched knowledge contains geographic location data, but the acquired network traffic data does not, so the knowledge K base expands the geographic location data of this record according to the destination IP 69.63.176.59. The expanded data is:

source IP: 94.191.2.168;

source port: 4245;

destination IP: 69.63.176.59;

destination port: 22;

source geographic location: Chongqing, China;

destination geographic location: New York, USA;
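The expansion step can be sketched as a lookup that adds fields the matched knowledge requires but the raw record lacks. In the example below the lookup table is hypothetical and holds only the two addresses from the record above; a real system would consult a geolocation data source. The field names and the `expand_geo` function are likewise assumptions.

```python
# Hypothetical lookup table containing only the addresses from the example.
GEO_TABLE = {
    "94.191.2.168": "Chongqing, China",
    "69.63.176.59": "New York, USA",
}


def expand_geo(record: dict, geo_table: dict) -> dict:
    """Return a copy of the record with geographic location fields added
    for every address found in the lookup table."""
    expanded = dict(record)
    for side in ("source", "destination"):
        location = geo_table.get(record.get(f"{side}_ip"))
        if location is not None:
            expanded[f"{side}_geo"] = location
    return expanded


record = {
    "source_ip": "94.191.2.168",
    "source_port": 4245,
    "destination_ip": "69.63.176.59",
    "destination_port": 22,
}
expanded = expand_geo(record, GEO_TABLE)
```

The original record is left unchanged, so the unexpanded data D can still be stored alongside the expanded information.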

If the same source IP is detected accessing different destination IPs on the same destination port more than 5 times within 3 seconds, the knowledge K base de-duplicates the repeated source IP and merges the different destination IPs, finally obtaining one piece of attack information.
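One way to realize this de-duplication and merging rule is a sliding time window per (source IP, destination port) pair. The sketch below is an assumption about how the rule might be implemented: the event tuple layout, the output field names and the `merge_repeated_access` function are all hypothetical, while the defaults (more than 5 accesses within 3 seconds) come from the text.

```python
from collections import defaultdict, deque


def merge_repeated_access(events, window=3.0, threshold=5):
    """events: iterable of (timestamp_seconds, src_ip, dst_ip, dst_port).

    Returns one merged record for each (src_ip, dst_port) pair that has more
    than `threshold` accesses inside some span of `window` seconds.
    """
    by_key = defaultdict(list)
    for ts, src, dst, port in events:
        by_key[(src, port)].append((ts, dst))

    merged = []
    for (src, port), hits in by_key.items():
        hits.sort()
        recent = deque()          # timestamps inside the current window
        triggered = False
        for ts, _ in hits:
            recent.append(ts)
            while ts - recent[0] > window:
                recent.popleft()
            if len(recent) > threshold:
                triggered = True
                break
        if triggered:
            merged.append({
                "source_ip": src,                       # repeated source kept once
                "destination_port": port,
                "destination_ips": sorted({dst for _, dst in hits}),
                "access_count": len(hits),
            })
    return merged
```

The merged record keeps a single copy of the source IP and a set of distinct destination IPs, so one piece of attack information replaces many near-identical traffic records.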

The fourth embodiment is as follows: this embodiment further describes the network attack path discovery, extraction and association method based on a DIKW model of the first embodiment. In this embodiment, the intelligent model W in step four can construct a corresponding model according to the attack information I and the associated information, and the corresponding model includes a ransomware attack model, a phishing email model, a cryptomining attack model, a data theft model, a botnet model, an APT attack model, an attack key technology point model, an attack team model, an attack tactics model and a machine learning model.

As shown in fig. 4, when the attack information I and the associated information satisfy the event logical relationship in the intelligent model W, the intelligent model W automatically matches each type of attack information to its corresponding model. Information 1 to information 5 enter the corresponding intelligent models in turn and generate an attack event chain describing the attack behavior, where the information displayed by the attack event chain includes the information used for discovering the network attack source, the information used for privilege extraction, and the information used for aggregation and association. The attack event chain not only describes the origin and course of the attack event but also predicts the development of the whole event and future threats, and it can be matched to the attacking organization and attack method, so that the potential purpose and scope of impact of the attack can be inferred and a solution proposed.

The fifth embodiment is as follows: this embodiment further describes the method of the fourth embodiment. The intelligent model W is the basis on which a security expert evaluates a certain type of attack behavior when analyzing an attack event. It takes the form of a knowledge graph that contains the data types and logical relationships of various attack events. When the attack information I and the associated information satisfy the logical relationship in the intelligent model W, the intelligent model W automatically constructs a model corresponding to the attack information I and generates an attack event chain. Consider, for example, the following mail attachment attack behavior:

1. opening a mail attachment by using a mail client;

2. the mail attachment starts the Word process of the Office program;

3. the Word process starts the cmd.exe process;

4. the cmd.exe process starts the process with Pid 12 and name certutil.exe;

5. the process with Pid 12 and name certutil.exe connects with IP 10.130.0.1.

The intelligent model W aggregates and associates the mail attachment attack behavior according to the logical relationships in the knowledge graph. The information at this point is:

the process with Pid 12 and name certutil.exe accesses the network, source IP 10.130.0.1, destination IP 141.22.11.22.

The associated information includes the process creation information of process Pid 12:

process creation Pid: 12;

process name: certutil.exe;

process username: test;

parent process Pid: 10;

parent process username: exe.

Process Pid 10 is then associated in turn, with the following information:

process creation Pid: 10;

process name: exe;

process username: test;

parent process Pid: 8;

parent process username: word.exe;

process end Pid: 10;

process name: exe;

process username: SYSTEM.

When the mail attachment attack behavior and the associated information satisfy the logical relationship in the intelligent model W, an attack event based on a mail attachment is considered to have occurred and is matched to the phishing email model, which generates an attack event chain. From the information in the chain, it can be seen that the attack is in the initial privilege acquisition stage, and it can be predicted that information collection and intranet expansion will follow. Based on the associated information of the phishing email model, the attacking organization and attack method can be matched, from which the potential purpose and scope of impact of the attack are obtained.
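The association in this example amounts to following parent Pid links backwards from the process that made the network connection, and comparing user names at creation and at end. The sketch below illustrates that idea only. The Pids and user names follow the example; the process name for Pid 10 is an assumption taken from the numbered attack steps (cmd.exe started by the Word process), and the dictionary layout and function names are hypothetical.

```python
# Process creation records keyed by Pid. The name for Pid 10 is assumed from
# the numbered attack steps; Pid 8 (the Word process) has no creation record here.
CREATES = {
    12: {"name": "certutil.exe", "user": "test", "parent_pid": 10},
    10: {"name": "cmd.exe", "user": "test", "parent_pid": 8},
}

# Process end records keyed by Pid.
ENDS = {10: {"user": "SYSTEM"}}


def build_chain(creates: dict, start_pid: int) -> list:
    """Follow parent_pid links from start_pid back to the earliest known
    ancestor and return the Pids ordered from ancestor to descendant."""
    chain, seen, pid = [], set(), start_pid
    while pid in creates and pid not in seen:
        seen.add(pid)
        chain.append(pid)
        pid = creates[pid]["parent_pid"]
    if pid not in seen:
        chain.append(pid)  # ancestor without a creation record
    return list(reversed(chain))


def escalated_pids(creates: dict, ends: dict) -> list:
    """Pids whose user name at process end differs from the one at creation."""
    return [pid for pid, rec in creates.items()
            if pid in ends and ends[pid]["user"] != rec["user"]]


chain = build_chain(CREATES, 12)
suspicious = escalated_pids(CREATES, ENDS)
```

With the records above, the chain runs from Pid 8 through Pid 10 to Pid 12, and Pid 10 is flagged because it was created as user test and ended as SYSTEM.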

Claims

1. A network attack path discovery, extraction and association method based on a DIKW model, wherein the DIKW model is a model of the relationship among network behavior data D, attack information I, knowledge K and an intelligent model W;

characterized in that the method comprises the following steps:

if so, the intelligent model W automatically matches, for each type of attack information, a model corresponding to the attack information I, and uses that model to generate an attack event chain, wherein the information displayed by the attack event chain comprises attack source information, privilege acquisition information and association aggregation information; the attack source information is used for discovering the network attack path, the privilege acquisition information is used for extracting the network attack path, and the association aggregation information is used for associating the network attack path;

otherwise, saving the attack information I.

2. The method for network attack path discovery, extraction and association based on a DIKW model as claimed in claim 1, wherein in step one, the information data in the knowledge K base includes: device process creation data, device process end data, device file activity data, device startup item data, device user login data, device user modification data and network traffic data; the device process creation data includes: process PID data, process name data, process user name data, parent process PID data and parent process user name data; the device process end data includes: process PID data, process name data and process user name data.

3. The method for network attack path discovery, extraction and correlation based on DIKW model as claimed in claim 1, wherein in step two, the network traffic data comprises: quintuple data, application layer protocol data, source IP data, source port data, destination IP data and destination port data; the device activity behavior data includes: process path data, command line data, authorized user data and start time data; the device log data includes: device name data, time data, user data, activity type data, activity impact data, and log source data.

4. The method for network attack path discovery, extraction and association based on a DIKW model as claimed in claim 1, wherein the intelligent model W in step four is capable of constructing a corresponding model according to the attack information I and the associated information, and the model includes: a ransomware attack model, a phishing email model, a cryptomining attack model, a data theft model, a botnet model, an APT attack model, an attack key technology point model, an attack team model, an attack tactics model and a machine learning model.

5. The DIKW model-based network attack path discovery, extraction and association method of claim 4, wherein the intelligent model W is used for evaluating a certain type of attack behavior in the process of analyzing the attack information I, and the attack behavior comprises the following procedures:

(1) opening a mail attachment by using a mail client;

(2) the mail attachment starts the Word process of the Office program;

(3) the Word process starts the cmd.exe process;

(4) the cmd.exe process starts the certutil.exe process;

(5) the certutil.exe process connects to an external network attack IP address to carry out the attack behavior;