CN117201165A - Threat alarm association analysis method based on network threat information

Info

Publication number: CN117201165A
Application number: CN202311276189.2A
Authority: CN
Inventors: 乌吉斯古愣; 俞赛赛; 刘晓影; 梁丰麒; 王淮; 陈静; 刘文瀚; 谭震; 李斌
Original assignee: CETC 15 Research Institute
Current assignee: CETC 15 Research Institute
Priority date: 2023-09-29
Filing date: 2023-09-29
Publication date: 2023-12-08
Anticipated expiration: 2043-09-29
Also published as: CN117201165B

Abstract

The invention discloses a threat alarm association analysis method based on network threat information, which comprises the following steps: acquiring network threat information; constructing a threat alarm association analysis model by utilizing the network threat information, wherein the network threat information comprises a network threat event report library, a network countermeasure tactics technology, a general knowledge document and an alarm log, and the threat alarm association analysis model comprises a threat semantic dictionary, an alarm type mapping dictionary and a technical number pair; and analyzing and processing the network threat information by using the threat alarm association analysis model to obtain a threat alarm association analysis result. On the basis of association identification driven by threat information, the invention uses a rule set to find the intrinsic associations between alarms and construct an alarm association graph, provides a complete description of a single intrusion attack scenario, and improves the accuracy of threat alarms.

Description

Threat alarm association analysis method based on network threat information

Technical Field

The invention relates to the technical field of network security, in particular to a threat alarm association analysis method and device based on network threat information.

Background

With the rapid development of the internet, network intrusion has gradually become a research hotspot for scholars and security enterprises. As an important tool for detecting and analyzing malicious network behavior, intrusion detection systems (IDS) have been widely deployed in different institutions and play a vital role in the field of network security. The massive log data generated by an IDS contains not only information about individual attacks but also potentially complex multi-step attack patterns.

Large institutions generate a large number of alarms every day. Discovering the logical associations between alarms becomes increasingly challenging, especially when complex attack scenarios must be built at a higher level of abstraction. Research in this area therefore focuses on how to reduce the impact of redundant alarms on analysis results, how to extract the behavioral patterns of an attacker from the raw alarm data, and how to understand and interpret an attack event at the semantic level. At present, IDS based on rule detection or anomaly detection have several disadvantages, such as a high false alarm rate, a high repeated alarm rate, and a low detection rate for multi-step attacks. A network intrusion detection system (NIDS) mainly analyzes and matches anomalies at the network packet level. Because context information is insufficient, the associations between threat alarms raised for packets sent by the same attacker are difficult to find, so the overall picture of a multi-step attack is difficult to grasp and the accuracy of threat alarms is reduced.

Disclosure of Invention

Many complex multi-step attacks against computer networks occur only rarely over a long time range, so the number of attack samples in an alarm data set that can be used for system training is limited; even when the data volume is huge, the samples available for threat information analysis remain limited. To address this problem, the invention discloses a threat alarm association analysis method and device based on network threat information. The invention automates the threat analysis process by correlating adversary tactics, techniques and procedures (TTPs) with the attack detection mechanism of the IDS.

The invention discloses a threat alarm association analysis method based on network threat information, which comprises the following steps:

S1, acquiring network threat information;

S2, constructing a threat alarm association analysis model by utilizing the network threat information; the threat alarm association analysis model comprises a threat semantic dictionary, an alarm type mapping dictionary and a technical number pair;

S3, analyzing and processing the network threat information by using the threat alarm association analysis model to obtain a threat alarm association analysis result; the threat alarm association analysis result represents the association information of the threat alarms in the network threat information.

The acquiring network threat information comprises the following steps:

S11, acquiring a network threat event report library, a network countermeasure tactics technology and a general knowledge document; the network threat event report library comprises a plurality of network threat event reports;

S12, acquiring alarm logs generated by network intrusion detection equipment deployed at different threat observation points; each alarm log comprises alarm time, alarm type, source IP, source port, target IP, target port and protocol information;

S13, constructing the network threat information by using the network threat event report library, the network countermeasure tactics technology, the general knowledge document and the alarm logs.

The construction of the threat alarm association analysis model by utilizing the network threat information comprises the following steps:

S21, constructing a first attack behavior technology chain library and a threat semantic dictionary by using the network threat event report library, the network countermeasure tactics technology and the general knowledge document;

S22, constructing a second attack behavior technology chain library and an alarm type mapping dictionary by using the threat semantic dictionary and the first attack behavior technology chain library;

S23, constructing a technical chain pair library by using the second attack behavior technology chain library.

The construction of the first attack behavior technology chain library and the threat semantic dictionary by using the network threat event report library, the network countermeasure tactics technology and the general knowledge document comprises the following steps:

S211, performing an extraction operation on each network threat event report to obtain the description sentences of the corresponding attack behaviors;

S212, sorting the description sentences of the attack behaviors extracted from each network threat event report according to the time at which the described behaviors occurred, to obtain the first attack behavior technical chain of that network threat event report; the first attack behavior technical chain is a directed sequence of description sentences of attack behaviors;

S213, integrating the first attack behavior technical chains of all the network threat event reports to obtain a first attack behavior technology chain library;

S214, performing an extraction operation on the network countermeasure tactics technology and the general knowledge document to obtain attack technical documents; each attack technical document comprises attack alarm types, attack technology description sentences and the corresponding technical numbers;

S215, carrying out semantic similarity calculation between the description sentences of the attack behaviors in the first attack behavior technical chain and the attack technology description sentences in the attack technical documents to obtain an attack technology similarity value;

S216, judging whether the attack technology similarity value is larger than a set similarity threshold; if so, establishing a mapping relation between the attack technology description sentence in the attack technical document and the attack behavior corresponding to the description sentence in the first attack behavior technical chain;

S217, constructing a threat semantic dictionary by using all the established mapping relations.
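Steps S215 to S217 can be illustrated with a minimal sketch. It is not the patented implementation: the token-level Jaccard measure stands in for the unspecified semantic similarity calculation, keeping only the best match above the threshold is a design choice of this sketch, and the names `token_jaccard`, `build_threat_semantic_dictionary`, the technique identifiers and the threshold value are all hypothetical.

```python
def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased token sets (a stand-in similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_threat_semantic_dictionary(behaviour_sentences, technique_docs,
                                     threshold=0.5, similarity=token_jaccard):
    """Map each behaviour sentence to the best technique whose score exceeds the threshold.

    technique_docs: dict of technical number -> attack technology description sentence.
    """
    mapping = {}
    for sentence in behaviour_sentences:
        best_id, best_score = None, threshold
        for tech_id, description in technique_docs.items():
            score = similarity(sentence, description)
            if score > best_score:  # strictly larger than the threshold (S216)
                best_id, best_score = tech_id, score
        if best_id is not None:
            mapping[sentence] = best_id  # one mapping relation (S217)
    return mapping


# Hypothetical inputs for illustration only.
technique_docs = {
    "T1": "adversary sends spearphishing email with malicious attachment",
    "T2": "adversary scans network ports for open services",
}
sentences = [
    "the attacker sends a spearphishing email with a malicious attachment",
    "the weather was nice",
]
dictionary = build_threat_semantic_dictionary(sentences, technique_docs)
```

Sentences with no technique above the threshold are simply left unmapped, which mirrors the "if so" condition in S216.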

The construction of the second attack behavior technology chain library and the alarm type mapping dictionary by using the threat semantic dictionary and the first attack behavior technology chain library comprises the following steps:

S221, mapping the description sentences of the attack behaviors in each first attack behavior technical chain to attack technology description sentences by using the mapping relations of the threat semantic dictionary; after all the description sentences of the attack behaviors in a first attack behavior technical chain have been mapped, determining the mapped chain as the corresponding second attack behavior technical chain;

S222, integrating all the second attack behavior technical chains to obtain a second attack behavior technology chain library;

S223, grouping the alarm logs by target IP to obtain a plurality of alarm log sets; each alarm log set comprises a plurality of alarm logs with the same target IP;

S224, carrying out statistical analysis on the number of co-occurrences of attack alarm types and technical numbers in the attack technical documents to obtain the correspondence between attack alarm types and technical numbers; and constructing an alarm type mapping dictionary by using all the correspondences.

The construction of the technical chain pair library by using the second attack behavior technical chain library comprises the following steps:

S231, for each second attack behavior technical chain in the second attack behavior technology chain library, constructing technical number pairs from the technical numbers of each two adjacent attack technology description sentences contained in the chain;

S232, constructing a technical chain pair library by using the technical number pairs constructed from all the second attack behavior technical chains.

The method for analyzing the network threat information by using the threat alert association analysis model to obtain a threat alert association analysis result comprises the following steps:

S31, sorting all alarm logs in each alarm log set in chronological order to obtain a corresponding alarm log sequence;

S32, for each alarm log sequence, carrying out binary association judgment on every two adjacent alarm logs in the sequence to obtain binary association relation information of the alarm log sequence;

S33, carrying out an alarm chain identification operation on the binary association relation information of the alarm log sequence to obtain the alarm threat chains of the alarm log sequence;

S34, integrating the alarm threat chains of all alarm log sequences to obtain an alarm threat chain library;

S35, judging the association relation between every two alarm log sets to obtain the attack association relations between the alarm logs of the two alarm log sets;

S36, taking the alarm logs in the alarm log sets as nodes, and taking the attack association relations and binary association relations among different alarm logs as edges; constructing an alarm association graph by using the nodes and the edges; in the alarm association graph, nodes corresponding to alarm logs that have an attack association relation or a binary association relation are connected by edges, and nodes corresponding to alarm logs that have neither relation are not connected;

S37, constructing the threat alarm association analysis result by using the alarm threat chain library and the alarm association graph.

Performing binary association judgment on every two adjacent alarm logs in the alarm log sequence to obtain the binary association relation information of the alarm log sequence comprises the following steps:

S321, for every two adjacent alarm logs in the alarm log sequence, acquiring the two corresponding alarm types; determining the two technical numbers corresponding to the two alarm types by using the alarm type mapping dictionary; determining the two technical numbers as a technical number pair to be matched; the order of the two technical numbers in the pair to be matched follows the chronological order of the two corresponding alarm logs;

S322, searching whether the technical chain pair library contains a technical number pair identical to the technical number pair to be matched; if so, determining that a binary association relationship exists between the two alarm logs corresponding to the technical number pair to be matched, and determining the two alarm logs as an associated alarm log pair; if not, determining that no binary association relationship exists between the two alarm logs corresponding to the technical number pair to be matched;

S323, constructing the binary association relation information from the judging results obtained in S322, namely whether a binary association relationship exists and which associated alarm log pairs exist.

Carrying out the alarm chain identification operation on the binary association relation information of the alarm log sequence to obtain the alarm threat chains of the alarm log sequence comprises the following steps:

S331, sorting the associated alarm log pairs of the alarm log sequence according to their occurrence time to obtain an associated alarm log pair sequence;

S332, determining the first associated alarm log pair of the associated alarm log pair sequence as the current judging log pair;

S333, constructing an initialized alarm threat chain by using the current judging log pair; the initialized alarm threat chain comprises the current judging log pair;

S334, determining the last associated alarm log pair in the alarm threat chain as the current judging log pair;

S335, determining the later alarm log of the last associated alarm log pair in the alarm threat chain as the log to be judged;

S336, determining the associated alarm log pair that follows the current judging log pair in the associated alarm log pair sequence as the current matching log pair; when the current matching log pair is the last associated alarm log pair of the associated alarm log pair sequence, executing S339;

S337, judging whether the earlier alarm log of the current matching log pair is the same as the log to be judged, to obtain a log judging result;

S338, if the log judging result is that they are the same, adding the current matching log pair after the current judging log pair in the alarm threat chain, updating the alarm threat chain, and executing S334; if the log judging result is that they are different, replacing the current matching log pair with the next associated alarm log pair in the associated alarm log pair sequence, and executing S337; when the current matching log pair is the last associated alarm log pair of the associated alarm log pair sequence, executing S339;

S339, judging whether the alarm threat chain comprises only one associated alarm log pair, and if so, deleting the alarm threat chain; if it comprises more than one associated alarm log pair, storing the alarm threat chain, deleting the first associated alarm log pair from the associated alarm log pair sequence to update the sequence, and judging whether the associated alarm log pair sequence is empty; if not, executing S332; if it is empty, determining all stored alarm threat chains as the alarm threat chains of the alarm log sequence.
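The chain identification loop of S331 to S339 can be sketched as follows. This is an illustrative reading of the steps, not the patented implementation. Two points are assumptions of the sketch: every later pair, including the last one, is tested against the log to be judged, and the first pair is removed from the sequence whether or not the chain it started was kept (otherwise the loop would not terminate). The function name `identify_alarm_chains` and the tuple representation of a pair are hypothetical.

```python
def identify_alarm_chains(pairs):
    """pairs: associated alarm-log pairs (earlier_log, later_log), sorted by time (S331)."""
    remaining = list(pairs)
    chains = []
    while remaining:
        chain = [remaining[0]]  # S332-S333: start a chain from the first pair
        position = 0            # index in `remaining` of the chain's last pair
        while True:
            tail = chain[-1][1]  # S335: later log of the chain's last pair
            # S336-S338: look for a later pair whose earlier log equals the tail
            match = next((i for i in range(position + 1, len(remaining))
                          if remaining[i][0] == tail), None)
            if match is None:
                break
            chain.append(remaining[match])
            position = match
        if len(chain) > 1:       # S339: chains with a single pair are discarded
            chains.append(chain)
        remaining.pop(0)         # S339: drop the first pair and start again
    return chains


chains = identify_alarm_chains([("G1", "G2"), ("G2", "G3")])
```

Because only the first pair is dropped on each pass, a stored chain of three or more pairs will also produce its own suffixes as separate chains; a deployment might choose to filter those out.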

Judging the association relation between every two alarm log sets to obtain the attack association relations between the alarm logs of the two alarm log sets comprises the following steps:

S351, traversing every two alarm log sets to obtain two alarm logs belonging to different alarm log sets, and judging whether the target IP addresses of the two alarm logs are the same; if so, determining that an attack association relationship exists between the two alarm logs; if not, determining that no attack association relationship exists between the two alarm logs;

S352, executing S351 on the alarm logs belonging to different alarm log sets for every two alarm log sets, to obtain the attack association relations between the alarm logs of the two alarm log sets.

The beneficial effects of the invention are as follows:

The invention discovers the intrinsic associations between alarms, an important feature of intrusion alarms, on the basis of grouping the alarms by target IP address. By identifying complex multi-step attack scenarios, the invention constructs an efficient IDS alarm association framework. The framework is based on the TTPs adopted by different attack organizations in actual network threat events, as extracted from threat information, and uses the ATT&CK framework as the representation standard for TTPs. A mapping dictionary between the alarm types of different IDS devices and ATT&CK tactics and techniques is constructed at the same time, which links low-level threat information, such as IDS device alarms, with the high-level threat semantic information, such as TTPs, contained in threat information. The framework can be applied to scenarios such as the detection of advanced complex threats in cyberspace and attack process reconstruction and prediction, and improves the accuracy of threat alarms.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

For a better understanding of the present disclosure, an embodiment is presented herein. FIG. 1 is a flow chart of the method of the present invention.

S1, acquiring network threat information;

The acquiring network threat information comprises the following steps:

S13, constructing and obtaining network threat information by using the network threat event report library, the network countermeasure tactics technology, the general knowledge document and the alarm log;

Constructing the network threat information from the network threat event report library, the network countermeasure tactics technology, the general knowledge document and the alarm log means combining these four sources to obtain the network threat information;

A directed sequence is a sequence in which each element points to the next element, and the pointing relation between a previous element and the next element is fixed.

The extraction operation can be implemented with a natural language processing library such as TextBlob or PyTorch-NLP;

The attack behavior technical chain is called C-TTPs in the invention. General CTIRs rarely refer to TTPs by standard identifiers; instead, TTPs are presented as unstructured, human-readable text descriptions, which not only describe the different attack behaviors in the attack process but also contain their contextual semantic relationships. The attack behavior technical chain is constructed from the attack tactics, techniques and procedures contained in the attack behaviors of the different attack steps of a network threat event in a CTIR. During construction, the execution order of the attack behaviors of the different attack steps in the current threat event is identified and used as the basis for associating attack techniques, so that the resulting attack behavior technical chain describes the attack process of the threat event.

The network countermeasure tactics technology and the general knowledge document are obtained from the ATT&CK knowledge base of adversary tactics and techniques, which is available on the internet.

The invention adopts a semantic similarity calculation method based on a threat semantic dictionary to map and convert the C-TTPs extracted from CTIRs with different sources, structures and description formats into TTPs characterized by the unified standard of the ATT&CK framework, thereby providing prior knowledge with a standardized structure for the threat detection and analysis processes.

The semantic similarity calculation comprises the following calculation expressions:

where R(s1, s2) denotes the correlation calculation result for sentence s1 and sentence s2, s1(i) denotes the i-th element of the vector corresponding to sentence s1, s2(i) denotes the i-th element of the vector corresponding to sentence s2, and L denotes the number of elements contained in a sentence vector.
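The expression for R(s1, s2) is not reproduced in this text. One common measure that is consistent with the symbols defined here (element-wise access s1(i) and s2(i) over L elements) is cosine similarity; the sketch below uses it purely as an assumption and should not be read as the formula of the invention. The function name `cosine_similarity` is hypothetical.

```python
from math import sqrt
from typing import Sequence


def cosine_similarity(s1: Sequence[float], s2: Sequence[float]) -> float:
    """Cosine of the angle between two sentence vectors of equal length L."""
    if len(s1) != len(s2):
        raise ValueError("both sentence vectors must contain L elements")
    dot = sum(a * b for a, b in zip(s1, s2))
    norm = sqrt(sum(a * a for a in s1)) * sqrt(sum(b * b for b in s2))
    # A zero vector has no direction; treat its similarity as 0.
    return dot / norm if norm else 0.0
```

Whatever measure is used, the result is compared against the similarity threshold of S216 to decide whether a mapping relation is established.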

S222, integrating all the second attack behavior technical chains to obtain a second attack behavior technical chain library;

S223, grouping the alarm logs by target IP to obtain a plurality of alarm log sets; each alarm log set comprises a plurality of alarm logs with the same target IP;

A typical alarm type mapping dictionary is shown in Table 1.

Table 1. Alarm type mapping dictionary

Alarm type	Attack technique number
A	1
B	2
C	3
D	4

Carrying out statistical analysis on the number of co-occurrences of attack alarm types and technical numbers in the attack technical documents to obtain the correspondence between attack alarm types and technical numbers comprises the following steps:

counting the number of occurrences of each combination of attack alarm type and technical number in the attack technical documents to obtain a statistical value for each combination; presetting a statistical threshold;

for each combination in turn, judging whether its statistical value is larger than the statistical threshold; if it is larger, determining that the attack alarm type and the technical number in the combination correspond to each other; otherwise, determining that the attack alarm type and the technical number in the combination do not correspond.
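The counting and thresholding described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the co-occurrences are supplied as a flat list of (alarm type, technical number) tuples, an alarm type may map to more than one technical number, and the names `build_alarm_type_mapping` and `min_count` are hypothetical.

```python
from collections import Counter


def build_alarm_type_mapping(observations, min_count=2):
    """observations: iterable of (alarm_type, technical_number) co-occurrences.

    Returns a dict of alarm_type -> set of technical numbers whose
    co-occurrence count is strictly larger than min_count.
    """
    counts = Counter(observations)  # statistical value of each combination
    mapping = {}
    for (alarm_type, technical_number), count in counts.items():
        if count > min_count:
            mapping.setdefault(alarm_type, set()).add(technical_number)
    return mapping


# Hypothetical co-occurrence data for illustration only.
observations = [("A", 1)] * 3 + [("A", 2)] + [("B", 2)] * 3
alarm_type_mapping = build_alarm_type_mapping(observations, min_count=2)
```

Here the combination ("A", 2) occurs only once and is therefore excluded, while ("A", 1) and ("B", 2) each exceed the threshold.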

Extracting technical number pairs from the attack behavior technology chains in the second attack behavior technology chain library may proceed as follows. Suppose the second attack behavior technology chain library comprises the following attack behavior technology chains:

attack behavior technology chain 1: attack technical description statement 1 (technical number 1) → attack technical description statement 2 (technical number 2);

attack behavior technology chain 2: attack technical description statement 1 (technical number 1) → attack technical description statement 2 (technical number 2) → attack technical description statement 3 (technical number 3);

attack behavior technology chain 3: attack technical description statement 3 (technical number 3) → attack technical description statement 4 (technical number 4) → attack technical description statement 5 (technical number 5);

then the technical number pairs that can be extracted are: technical number 1 → technical number 2, technical number 2 → technical number 3, technical number 3 → technical number 4, and technical number 4 → technical number 5.
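The three example chains can be used to illustrate S231 and S232. The sketch below assumes each chain is given as a list of technical numbers in execution order; the function name `extract_technical_number_pairs` is hypothetical.

```python
def extract_technical_number_pairs(chains):
    """Collect the ordered pairs of adjacent technical numbers from every chain."""
    pairs = set()
    for chain in chains:
        # zip(chain, chain[1:]) yields each element with its direct successor
        pairs.update(zip(chain, chain[1:]))
    return pairs


# The three attack behavior technology chains from the example above.
chains = [[1, 2], [1, 2, 3], [3, 4, 5]]
pair_library = extract_technical_number_pairs(chains)
```

Storing the pairs in a set removes the duplicate pair (1, 2) contributed by chains 1 and 2 and makes the later lookup in S322 a constant-time membership test.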

The construction method of the alarm association graph can adopt a finite graph construction method in graph theory.

The alarm log is expressed as Alert <time, type, source IP, source port, target IP, target port, protocol>, where the field "type" is used to represent the name of the alarm log.
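The seven-field record can be modelled as a small immutable type. The sketch below is illustrative only: the comma-separated input format, the ISO 8601 timestamp and the names `Alert` field identifiers and `parse_alert` are assumptions, since the text does not specify how the logs are serialized.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Alert:
    time: datetime
    type: str          # alarm type, also used as the name of the alarm log
    source_ip: str
    source_port: int
    target_ip: str
    target_port: int
    protocol: str


def parse_alert(line: str) -> Alert:
    """Parse one comma-separated record in the field order listed above."""
    time, type_, sip, sport, tip, tport, proto = (f.strip() for f in line.split(","))
    return Alert(datetime.fromisoformat(time), type_, sip, int(sport),
                 tip, int(tport), proto)


alert = parse_alert("2023-09-29T10:00:00, A, 10.0.0.1, 4444, 10.0.0.3, 80, TCP")
```

Making the record frozen keeps it hashable, so alarm logs can be used directly as graph nodes or dictionary keys.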

In step S3, assuming that the raw alarms arrive at the system in real time, two points should be noted when grouping the alarms: 1) the source IP and source port are not used as a basis for grouping alarm logs, because they are easily tampered with by malicious attackers; 2) the protocol is also excluded from the grouping of alarm logs, because most alarms involve only a few protocols; in other words, this field may coarsen the granularity of the behavior extraction process.
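Grouping by target IP only (S223) and then ordering each group by time (S31) can be sketched as below. The dictionary keys `"time"`, `"type"` and `"target_ip"` and the function name `group_by_target_ip` are hypothetical; the source IP, source port and protocol are deliberately ignored, as explained above.

```python
from collections import defaultdict


def group_by_target_ip(alerts):
    """Group alarm logs by target IP and sort each group chronologically."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert["target_ip"]].append(alert)
    # Each sorted group is one alarm log sequence.
    return {ip: sorted(group, key=lambda a: a["time"]) for ip, group in groups.items()}


# Hypothetical alarm logs; integer times are used for brevity.
alerts = [
    {"time": 3, "type": "C", "target_ip": "10.0.0.3"},
    {"time": 1, "type": "A", "target_ip": "10.0.0.3"},
    {"time": 2, "type": "B", "target_ip": "10.0.0.9"},
]
sequences = group_by_target_ip(alerts)
```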

In step S36, the alarm association graph is defined as a directed acyclic graph, denoted G = <V(n), E(f)>, where V represents the set of nodes and E represents the set of edges of the graph. Each node represents an alarm log, and an edge between two nodes represents an attack association relationship or a binary association relationship between the two alarm logs. In V(n), n represents the name of an intrusion session; in E(f), f represents the frequency of the binary association.
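A minimal sketch of the structure G = <V(n), E(f)> is given below, using plain Python containers rather than a graph library. It assumes that nodes are identified by alarm names and that repeated associations between the same two nodes increase the edge frequency f; the function name `build_association_graph` is hypothetical, and the sketch does not itself check that the result is acyclic.

```python
from collections import Counter


def build_association_graph(alert_names, associations):
    """Return (nodes, edges) where edges maps (source, target) -> frequency f."""
    nodes = set(alert_names)
    edges = Counter()
    for source, target in associations:
        # Only connect alarm logs that are actually present as nodes.
        if source in nodes and target in nodes:
            edges[(source, target)] += 1
    return nodes, dict(edges)


nodes, edges = build_association_graph(
    ["G1", "G2", "G3"],
    [("G1", "G2"), ("G1", "G2"), ("G2", "G3")],
)
```

Alarm logs without any association remain in the node set as isolated nodes, which matches the rule that unrelated alarm logs are not connected.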

S322, searching whether the technical chain pair library contains a technical number pair identical to the technical number pair to be matched; if so, determining that a binary association relationship exists between the two alarm logs corresponding to the technical number pair to be matched, and determining the two alarm logs as an associated alarm log pair; if not, determining that no binary association relationship exists between the two alarm logs corresponding to the technical number pair to be matched.

S323, constructing the binary association relation information from the judging results obtained in S322; the binary association relation information comprises whether a binary association relationship exists and the associated alarm log pairs.

In step S32, whether a binary association relationship exists between two alarm logs may be determined from the alarm type and time attributes of the alarm logs in a grouped alarm log set. For example, given alarm G1: <t1, A, IP1, port2, IP3, port3, protocol 1> and alarm G2: <t2, B, IP2, port2, IP3, port3, protocol 1>, based on the mapping dictionary and the constructed technical number pairs, if t2 is later than t1, the directed binary association relationship G1→G2 is considered to exist.
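The G1/G2 example can be expressed as a short check. The sketch assumes a one-to-one alarm type mapping as in Table 1 and a pair library containing the technical number pair (1, 2); the function name `has_binary_association` and the (time, alarm type) tuple representation are hypothetical.

```python
def has_binary_association(first, second, type_to_number, pair_library):
    """first, second: (time, alarm_type) tuples of two adjacent alarm logs."""
    if second[0] <= first[0]:
        return False  # the second alarm must occur later than the first
    pair = (type_to_number.get(first[1]), type_to_number.get(second[1]))
    if None in pair:
        return False  # an alarm type without a mapped technical number cannot match
    return pair in pair_library


type_to_number = {"A": 1, "B": 2, "C": 3, "D": 4}  # Table 1
pair_library = {(1, 2), (2, 3)}                    # assumed technical number pairs
g1, g2 = (1, "A"), (2, "B")                        # t1 = 1, t2 = 2
associated = has_binary_association(g1, g2, type_to_number, pair_library)
```

The relation is directed: swapping the two alarms, or presenting a pair of types whose technical numbers are not adjacent in any known chain, yields no association.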

S339, judging whether the alarm threat chain comprises only one associated alarm log pair, and if so, deleting the alarm threat chain; if it comprises more than one associated alarm log pair, storing the alarm threat chain, deleting the first associated alarm log pair from the associated alarm log pair sequence to update the sequence, and judging whether the associated alarm log pair sequence is empty; if not, executing S332; if it is empty, determining all stored alarm threat chains as the alarm threat chains of the alarm log sequence;

The threat alarm chain identification comprises: after the binary association relationships among all alarms in an alarm group have been found, identifying the threat alarm chains; for example, if the directed binary association relationships G1→G2 and G2→G3 are found, the threat alarm chain G1→G2→G3 can be identified.

Step S33 is responsible for revealing the logical association between alarms, which becomes increasingly apparent when two alarms occur successively in the same attack scenario, and therefore determining the threat alert chain that constitutes the attack scenario is an important task.

Using the attack behavior technology chains, composed of the tactics, techniques and procedures contained in actual threat event cases extracted from CTIRs, together with the mapping table between alarm types and attack techniques/sub-techniques under the ATT&CK framework, the highly stable associations hidden in each group of alarms can be found quickly, and redundant alarms are deleted from each alarm set, which reduces the influence of false alarms.

The alarm log pair comprises two alarm logs, which are called a previous alarm log and a next alarm log;

And judging the attack association relation, namely judging whether the target IP addresses of the two alarm logs are the same.

The network threat event comprises a plurality of attack behaviors;

In the application, C-TTPs refer to attack behavior technical chains; TTPs refer to the tactics, techniques and procedures included in attack behaviors; CTIRs are network threat event reports, and the publicly available network threat event reports include APT reports and the TC reports published by DARPA; ATT&CK is an open-source knowledge base of network attack behavior; and IDS equipment is network intrusion detection equipment.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims

1. A threat alarm association analysis method based on network threat information is characterized by comprising the following steps:

S1, acquiring network threat information;

2. The threat alert association analysis method based on network threat information of claim 1, wherein the acquiring the network threat information comprises:

3. The threat alert association analysis method based on network threat information according to claim 2, wherein the constructing a threat alert association analysis model using the network threat information comprises:

4. The threat alert association analysis method based on cyber threat information of claim 3, wherein constructing a first attack behavior technology chain library and a threat semantic dictionary by using the cyber threat event report library and a cyber countermeasure tactical technology and a common knowledge document comprises:

5. The threat alert association analysis method based on network threat information of claim 3, wherein constructing a second attack behavior technology chain library and an alert type mapping dictionary using the threat semantic dictionary and the first attack behavior technology chain library comprises:

6. The threat alert correlation analysis method based on network threat information of claim 3, wherein constructing a technology chain pair library using the second attack behavior technology chain library comprises:

7. The threat alert association analysis method based on network threat information according to claim 2, wherein the analyzing the network threat information by using the threat alert association analysis model to obtain a threat alert association analysis result comprises:

8. The threat alert association analysis method based on network threat information of claim 7, wherein the performing binary association discrimination on every two front-back adjacent alert logs in the alert log sequence to obtain binary association relationship information of the alert log sequence includes:

9. The threat alert association analysis method based on network threat information of claim 7, wherein the performing an alert chain identification operation on the binary association information of the alert log sequence to obtain an alert threat chain of the alert log sequence includes:

10. The threat alert association analysis method based on network threat information of claim 7, wherein the performing association relationship discrimination between each two alert log sets to obtain attack association relationship between alert logs of the two alert log sets comprises: