CN111709022B - Hybrid alarm association method based on AP clustering and causal relationship - Google Patents

Info

Publication number
CN111709022B
Authority
CN
China
Prior art keywords
alarm
clustering
data
similarity
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010549657.9A
Other languages
Chinese (zh)
Other versions
CN111709022A (en)
Inventor
陶晓玲
赵培超
石兰
顾涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202010549657.9A
Publication of CN111709022A
Application granted
Publication of CN111709022B
Active (current legal status)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/552: Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques

Abstract

The invention discloses a hybrid alarm association method based on AP clustering and causal relationship. First, the obtained original alarm log is standardized based on the Intrusion Detection Message Exchange Format (IDMEF), and the alarm log is obtained from the extracted alarm attributes. Second, the similarity among the data points in the alarm log and the reference degree of each node are obtained, the attraction degree matrix and the attribution degree matrix are updated iteratively, and a damping factor is introduced into the AP clustering algorithm for attenuation; iteration continues until the candidate cluster centers are stable or the number of iterations reaches a threshold, and the cluster centers and the data set divided into attack scenes are output. Finally, the overall similarity value of any two pieces of alarm data is calculated with a weighted average algorithm, the negated overall similarity values are clustered and divided with the AP clustering algorithm, the data are sorted in chronological order, and the alarm data are associated by matching, which improves association precision and time efficiency.

Description

Hybrid alarm association method based on AP clustering and causal relationship
Technical Field
The invention relates to the technical field of network security, in particular to a hybrid alarm association method based on AP clustering and causal relationship.
Background
With the rapid development of global informatization, massive information systems have become key infrastructure of countries and governments. As digitization accelerates, however, network security threats affect not only traditional industries and key information infrastructure; risks in emerging fields such as artificial intelligence, cloud computing, big data and the Internet of Things are also increasing rapidly. In recent years, China has continuously strengthened its response capability in the face of a severe network security situation and has achieved certain results. To protect network security effectively, administrators widely deploy security products such as firewalls and intrusion detection systems (IDS) in network systems. However, as networks keep growing in scale and structural complexity, existing IDSs expose many problems. While improving intrusion detection algorithms and optimizing detection models, researchers have also opened up a new line of thought: performing fusion analysis on mutually independent alarm information through alarm correlation technology.
With the wide application and development of alarm correlation technology in the intrusion detection field, some representative methods and results have appeared, which can be roughly divided into: (1) association methods based on predefined attack scenarios; (2) association methods based on causal relationship; (3) association methods based on alarm attribute similarity; and (4) association methods based on statistical analysis. Alarm correlation based on causal relationship requires building a knowledge base, establishing causal rules for each attack action, and performing correlation analysis by pattern matching. Alarm correlation based on attribute similarity has advantages such as a simple algorithm and strong real-time performance, but there is no standard for attribute similarity, the final result is strongly influenced by manually set parameters such as similarity weight coefficients, and the result is poorly structured logically, so it cannot help an administrator understand the attack intention or the relationships between attack actions. After years of research and development, alarm correlation technology has made some progress, but recent research shows that no single processing method solves the alarm correlation problem perfectly: existing methods inevitably increase computational overhead while improving correlation precision. Adopting a multi-type hybrid architecture in which different methods complement each other is therefore a direction worth studying.
Disclosure of Invention
The invention aims to provide a hybrid alarm association method based on AP clustering and causal relationship, which improves association precision and time efficiency.
To achieve the above object, the present invention provides a hybrid alarm association method based on AP clustering and causal relationship, comprising:
normalizing the obtained original alarm log to obtain an alarm log;
dividing the attack scenes using an improved similarity measure combined with the AP clustering method;
and matching and associating the alarm data within each attack scene based on the idea of causal relationship.
Normalizing the obtained original alarm log to obtain the alarm log comprises:
standardizing the obtained original alarm log based on the Intrusion Detection Message Exchange Format (IDMEF), deleting repeated alarm logs, and then extracting a plurality of alarm attributes from the original alarm data to form a 7-tuple, thereby obtaining the alarm log.
Dividing the attack scenes using the improved similarity measure combined with the AP clustering method comprises:
obtaining the similarity among a plurality of data points in the alarm log and the reference degree of the corresponding nodes, iteratively updating the attraction degree matrix according to the corresponding similarity and reference degree, and iteratively updating the attribution degree matrix according to the attraction degree values produced by the iterative update of the attraction degree matrix.
Dividing the attack scenes using the improved similarity measure combined with the AP clustering method further comprises:
introducing a damping factor into the AP clustering algorithm to attenuate the attraction degree matrix and the attribution degree matrix respectively, and judging whether the candidate cluster centers are stable or whether the number of iterations has reached a threshold.
Judging whether the candidate cluster centers are stable or whether the number of iterations has reached the threshold comprises:
if the candidate cluster centers are not stable and the number of iterations has not reached the threshold, continuing to iteratively update the attraction degree matrix and the attribution degree matrix until the candidate cluster centers are stable or the number of iterations reaches the threshold, and then outputting the cluster centers and the divided attack scene set;
if the candidate cluster centers are stable or the number of iterations has reached the threshold, judging whether the sum of the self-attraction degree and the self-attribution degree is greater than zero;
if the sum of the self-attraction degree and the self-attribution degree is greater than zero, the corresponding node is a cluster center;
and if the sum of the self-attraction degree and the self-attribution degree is less than or equal to zero, the corresponding node is not a cluster center.
After the cluster centers and the divided attack scene set are output, the method further comprises:
calculating the overall similarity value of any two pieces of alarm data in the attack scene set with a weighted average algorithm, from their attack type similarity value, IP address similarity value, port similarity value and time similarity value, and then negating the overall similarity values and clustering and dividing them with the AP clustering algorithm.
Matching and associating the alarm data within each attack scene based on the idea of causal relationship comprises:
sorting the clustered and divided data in chronological order, and then matching each first piece of data in turn against the subsequent pieces of data with a step length of 1; a first piece of data whose port and IP address match is classified as associated with the corresponding alarm data, and a first piece of data whose port and IP address do not match is placed in an isolated alarm queue, until all data have been matched.
In the hybrid alarm association method based on AP clustering and causal relationship of the invention, the obtained original alarm log is first standardized based on the Intrusion Detection Message Exchange Format (IDMEF), and the alarm log is obtained from the extracted alarm attributes. Second, the similarity among the data points in the alarm log and the reference degree of each node are obtained, the attraction degree matrix and the attribution degree matrix are updated iteratively, and a damping factor is introduced into the AP clustering algorithm for attenuation; iteration continues until the candidate cluster centers are stable or the number of iterations reaches a threshold, and the cluster centers and the data set divided into attack scenes are output. Finally, the overall similarity value of any two pieces of alarm data is calculated with a weighted average algorithm, the negated overall similarity values are clustered and divided with the AP clustering algorithm, the data are sorted in chronological order, and the alarm data are associated by matching, which improves association precision and time efficiency.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic step diagram of a hybrid alarm association method based on AP clustering and causal relationship provided in the present invention.
FIG. 2 is a diagram of the manner in which information is communicated between data points provided by the present invention.
FIG. 3 is a flow chart of causal relationship based alarm association provided by the present invention.
Fig. 4 is a network topology diagram of the honeypot system provided by the invention.
Fig. 5 is a comparison graph of the run time provided by the present invention.
Fig. 6 is a first attack diagram provided by the present invention.
Fig. 7 is a second attack diagram provided by the present invention.
Fig. 8 is a third attack diagram provided by the present invention.
Fig. 9 is a fourth attack diagram provided by the present invention.
Fig. 10 is a fifth attack diagram provided by the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting it.
Referring to fig. 1, the present invention provides a hybrid alarm association method based on AP clustering and causal relationship, including:
S101, normalizing the acquired original alarm log to obtain the alarm log.
Specifically, to protect network security effectively, an administrator usually deploys several intrusion detection systems in a network security protection system. The alarm data generated by different intrusion detection systems have different formats and cannot be used directly for correlation analysis, so normalizing the alarm log is the basis of the subsequent work. The Intrusion Detection Message Exchange Format (IDMEF) is a standard format initiated and established by the Intrusion Detection Working Group (IDWG); it provides a basis for information sharing and format exchange among IDSs of different types. The acquired original alarm log is therefore standardized based on IDMEF. After repeated alarm logs are deleted, 8 alarm attributes are extracted from the original alarm data to form a 7-tuple, yielding the alarm log. The meanings of the extracted alarm attributes are shown in Table 1.
Table 1. Meanings of the alarm data attributes
Figure BDA0002541996710000041
Figure BDA0002541996710000051
Because the same intrusion event can trigger several intrusion detection devices in the network security system at the same time, a large number of repeated alarms exist in the replayed data. To address this, alarms that occur within a certain time threshold and agree on every attribute other than the signature attribute and the time attribute are merged into a single alarm, which reduces repeated alarm data and prepares for the subsequent alarm correlation work.
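The merging rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Alert` record, its field names, the 2-second default threshold, and the choice to compare each alarm against the most recently kept alarm with the same attributes are all hypothetical, since the attribute table is only available as an image.

```python
from collections import namedtuple

# Hypothetical alarm record; the field names are illustrative assumptions,
# not the attribute names used in Table 1.
Alert = namedtuple(
    "Alert", "signature time src_ip dst_ip src_port dst_port attack_type"
)


def merge_duplicates(alerts, threshold_s=2.0):
    """Drop alerts that repeat an already kept alert.

    Two alerts count as repeats when they agree on every attribute except
    signature and time, and the later one occurs within `threshold_s`
    seconds of the most recently kept alert with the same attributes.
    """
    last_kept = {}  # attribute key -> time of the most recently kept alert
    merged = []
    for alert in sorted(alerts, key=lambda a: a.time):
        key = (alert.src_ip, alert.dst_ip, alert.src_port,
               alert.dst_port, alert.attack_type)
        previous = last_kept.get(key)
        if previous is not None and alert.time - previous <= threshold_s:
            continue  # repeat of an alert that was already kept
        last_kept[key] = alert.time
        merged.append(alert)
    return merged
```

Times are assumed to be seconds expressed as numbers; a real log would first need its timestamps parsed into that form.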
S102, dividing the attack scenes using an improved similarity measure combined with the AP clustering method.
Specifically, the similarity among the data points in the alarm log and the reference degree of each node are obtained. A similarity matrix S formed by the similarities among N data points is used as the input of the cluster analysis, where S(i, j) denotes the similarity between node i and node j. The concept of reference degree (Preference) is introduced to express how suitable a data point is as a cluster center; the reference degree of point i is written P(i) or S(i, i), and the larger its value, the more likely point i is to become a cluster center. Because the AP algorithm regards every data point as a possible cluster center, all P take the same value. The final number of clusters is strongly influenced by the reference degree, and P is usually set to the median or the minimum of the input similarity values.
The attraction degree matrix is updated iteratively according to the corresponding similarity and reference degree, and the attribution degree matrix is updated iteratively according to the attraction degree values produced by each iteration of the attraction degree matrix. Two concepts are introduced, attraction degree (Responsibility) and attribution degree (Availability); by repeatedly updating the attraction degree matrix and the attribution degree matrix, messages are passed and updated between data points until the final cluster centers are obtained. The attraction degree r(i, j) is the message sent from data point i to data point j and reflects how well suited point j is to serve as the cluster center of point i; the larger r is, the more likely point j is to be the cluster center of point i. The attribution degree a(i, j) is the message sent from data point j to data point i and reflects how appropriate it is for point i to choose point j as its cluster center; the larger a is, the more likely point i is to choose point j as its cluster center. The way messages are passed between data points is shown in Fig. 2. The update formula of the attraction degree matrix R is:
Figure BDA0002541996710000052
the update formula of the attribution degree matrix A is as follows:
Figure BDA0002541996710000061
Meanwhile, to avoid oscillation of the data during the matrix updates, a damping factor λ is introduced into the AP clustering algorithm to attenuate the attraction degree matrix and the attribution degree matrix respectively. The update formulas are:
$$R_{t+1}(i,k) = \lambda \, r_t(i,k) + (1-\lambda) \, r_{t+1}(i,k)$$
$$A_{t+1}(i,k) = \lambda \, a_t(i,k) + (1-\lambda) \, a_{t+1}(i,k)$$
where $r_{t+1}(i,k)$ denotes the attraction degree between point i and point k after the (t+1)-th update, and $R_{t+1}(i,k)$ denotes the attraction degree after attenuation; $a_{t+1}(i,k)$ denotes the attribution degree after the (t+1)-th update, and $A_{t+1}(i,k)$ denotes the attribution degree after attenuation.
It is then judged whether the candidate cluster centers are stable or whether the number of iterations has reached the threshold. If the candidate cluster centers are not stable and the number of iterations has not reached the threshold, the attraction degree matrix and the attribution degree matrix continue to be updated iteratively until the candidate cluster centers are stable or the number of iterations reaches the threshold, after which the cluster centers and the divided attack scene set are output.
If the candidate cluster centers are stable or the number of iterations has reached the threshold, it is judged whether the sum of the self-attraction degree r(i, i) and the self-attribution degree a(i, i) is greater than zero. If r(i, i) + a(i, i) > 0, the corresponding node i is a cluster center; if r(i, i) + a(i, i) ≤ 0, the node is not a cluster center. For a non-center point j, the data point k that maximizes r(j, k) + a(j, k) is the cluster center of point j. Finally, the cluster centers and the data set assigned to each cluster are output.
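The message-passing loop described above can be sketched in plain Python. Because the update formulas for R and A appear only as images in this text, the sketch assumes the standard affinity propagation update rules (written out in the comments); the function name, the default values and the `stable_iters` stopping rule are illustrative assumptions rather than the patented implementation. It expects at least two data points.

```python
def affinity_propagation(S, preference, damping=0.5, max_iter=200,
                         stable_iters=15):
    """Cluster n >= 2 points from an n x n similarity matrix (list of lists).

    Returns (centers, labels); labels[i] is the index of the cluster center
    chosen for point i.
    """
    n = len(S)
    S = [list(row) for row in S]
    for i in range(n):
        S[i][i] = preference              # reference degree P(i) = S(i, i)
    R = [[0.0] * n for _ in range(n)]     # attraction degree matrix
    A = [[0.0] * n for _ in range(n)]     # attribution degree matrix
    previous, unchanged = None, 0
    centers = []
    for _ in range(max_iter):
        # Assumed standard rule:
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], then damped.
        for i in range(n):
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        # Assumed standard rule:
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k)), then damped.
        for k in range(n):
            positive = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # A node is a cluster center when r(i,i) + a(i,i) > 0.
        centers = [i for i in range(n) if R[i][i] + A[i][i] > 0]
        unchanged = unchanged + 1 if centers == previous else 0
        previous = centers
        if centers and unchanged >= stable_iters:
            break                         # candidate centers are stable
    labels = []
    for i in range(n):
        if i in centers or not centers:
            labels.append(i)
        else:
            # A non-center point joins the center k maximizing r(i,k) + a(i,k).
            labels.append(max(centers, key=lambda k: R[i][k] + A[i][k]))
    return centers, labels
```

The damping step here blends each new value with the previously stored (already damped) value, which is the usual reading of the attenuation formulas. For realistic data volumes a vectorized library implementation would be preferable to these nested loops.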
The standard AP algorithm uses Euclidean distance as its similarity criterion. This is a similarity measure for numerical variables and is not suitable for the non-numerical attributes of alarm logs. Therefore, a different similarity calculation is defined for each attribute of the alarm data, the overall similarity between alarms is obtained as the weighted average of the attribute similarities, and the resulting similarity matrix serves as the input of the AP algorithm.
First, the attack type similarity value, the IP address similarity value, the port similarity value and the time similarity value are calculated for any two pieces of alarm data in the data set, as follows.
Attack type similarity:
For the alarm type attribute, if two pieces of alarm data $alert_i$ and $alert_j$ have the same alarm type, the similarity is set to 1; otherwise it is set to 0:
$$sim_{type}(alert_i, alert_j) = \begin{cases} 1, & \text{if } alert_i \text{ and } alert_j \text{ have the same alarm type} \\ 0, & \text{otherwise} \end{cases}$$
IP address similarity calculation:
IP addresses in the alarm log are expressed in decimal form. Following reference [79], the similarity is calculated by comparing the number of identical prefix bits, so the IP addresses are first converted to a binary representation. The calculation formula is as follows:
Figure BDA0002541996710000071
where r denotes the number of consecutive identical bits, counted from the high-order bit to the low-order bit, in the IP addresses of the two pieces of alarm data.
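As an illustration of the prefix comparison, the following sketch counts r with Python's standard `ipaddress` module. Dividing r by 32 to obtain a value between 0 and 1 is an assumption, because the formula itself is only available as an image; the function name is likewise hypothetical, and only IPv4 addresses are handled.

```python
import ipaddress


def ip_similarity(ip_a, ip_b):
    """Similarity of two IPv4 addresses as (shared leading bits) / 32.

    The division by 32 is an assumption; the source text defines only r,
    the number of identical leading bits.
    """
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    diff = a ^ b                    # a 1 bit marks a position that differs
    r = 32 - diff.bit_length()      # number of identical leading bits
    return r / 32
```

For example, `ip_similarity("61.163.217.30", "61.163.217.255")` returns 0.75, since the two addresses share their first 24 bits.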
Port similarity calculation:
The port number is treated as a Boolean attribute: if the ports of two alarms are identical, the similarity is 1; otherwise it is 0:
$$sim_{port}(alert_i, alert_j) = \begin{cases} 1, & \text{if } alert_i \text{ and } alert_j \text{ have the same port} \\ 0, & \text{otherwise} \end{cases}$$
Time similarity calculation:
Most studies set a time window to define a limiting threshold on alarm time, and the chosen window size affects the final similarity calculation. The date attributes are compared first: for alarms with the same date, the time similarity is calculated with a sigmoid function; otherwise the similarity is 0. The calculation formula is as follows:
Figure BDA0002541996710000073
wherein
Figure BDA0002541996710000074
Then, a weighted average algorithm is used to calculate the overall similarity value of two pieces of alarm data. To avoid the subjectivity of manual assignment when determining the attribute weights, the weight of each characteristic attribute is determined by principal component analysis. The overall similarity of two alarms $alert_i$ and $alert_j$ is calculated as follows:
Figure BDA0002541996710000075
where $sim_l$ denotes the similarity of each attribute and $\omega_l$ denotes the weight corresponding to each attribute.
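A minimal sketch of the weighted average is given below. The normalization by the sum of the weights, the dictionary layout and the example weights in the usage note are assumptions; in the method described here the weights come from principal component analysis, which is not reproduced in this sketch.

```python
def type_similarity(type_a, type_b):
    """1.0 if both alarms have the same attack type, otherwise 0.0."""
    return 1.0 if type_a == type_b else 0.0


def port_similarity(port_a, port_b):
    """1.0 if both alarms use the same port, otherwise 0.0."""
    return 1.0 if port_a == port_b else 0.0


def overall_similarity(sims, weights):
    """Weighted average of the per-attribute similarities.

    `sims` and `weights` are dicts keyed by attribute name. Dividing by the
    sum of the weights is an assumption that keeps the result in [0, 1]
    even when the weights do not sum to 1.
    """
    total_weight = sum(weights[name] for name in sims)
    weighted = sum(weights[name] * sims[name] for name in sims)
    return weighted / total_weight
```

With hypothetical weights `{"type": 0.4, "ip": 0.3, "port": 0.2, "time": 0.1}` and similarities `{"type": 1.0, "ip": 0.75, "port": 0.0, "time": 0.5}`, the result is approximately 0.675.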
Based on the similarity calculation methods above, a similarity matrix can be obtained. The overall similarity values are negated and then clustered and divided with the AP clustering algorithm, so that alarm data with high similarity are grouped into the same attack scene. The pseudo-code for attack scene division is as follows:
Attack scene partitioning based on AP clustering
Input: alarm data set Alert = {a_1, a_2, …, a_n}
Output: attack scene set Sence = {sence_1, sence_2, …, sence_n}
// Similarity matrix calculation
Simility = [sim_11, sim_12, …, sim_nn]
// Initialize the attraction degree and attribution degree matrices
Initialresponsibility = [r_11, r_12, …, r_nn]
Initialavailability = [a_11, a_12, …, a_nn]
// Update the attraction degree and attribution degree matrices
If r(i,i) + a(i,i) < 0
    For (i = 0; i < n; i++)
    {
        R_{t+1}(i,j) = λ * r_t(i,j) + (1 - λ) * r_{t+1}(i,j)
        A_{t+1}(i,j) = λ * a_t(i,j) + (1 - λ) * a_{t+1}(i,j)
    }
Endif
// Record the cluster centers
String center = updatecenter(classlist, center)
Readcenter.filewriter(center)
Return Sence = {sence_1, sence_2, …, sence_n}
S103, matching and associating alarm data in the attack scene based on the causal relationship idea.
Specifically, with the AP clustering method, the massive unordered alarm logs are divided according to similarity into a set of attack scenes with small intra-class distances and large inter-class distances. The causal relationship-based alarm association flow is shown in Fig. 3:
The clustered alarm data sets are read in turn, and association analysis is performed on the data of each attack scene in sequence. The clustered and divided data are sorted in chronological order, and each first piece of data is then matched in turn against the subsequent pieces of data with a step length of 1. A first piece of data whose port and IP address match is classified as associated with the corresponding alarm data, and one whose port and IP address do not match is placed in an isolated alarm queue, until all data have been matched.
The causal association method holds that most complete network attacks are carried out in a certain logical order through attack behaviors at different stages, with each attack behavior providing the conditions for the next intrusion step. For example, to carry out a TCP SYN denial-of-service attack on a host, an attacker must first send a message to the target computer, and can take the next attack action only after the target has received it and established a connection. Each attack action therefore has prerequisites that must be satisfied and produces corresponding consequences; by linking the prerequisites and consequences of alarm information according to certain rules, a complete attack sequence can be found. The method draws on this idea to perform association analysis on the alarm data within the same attack scene, thereby describing the logical relationships among the alarm data.
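The matching pass over one attack scene might look like the sketch below. The text does not spell out which IP address and port fields are compared, so the default rule (a later alert targeting the same destination IP address and port) is purely an assumption, as are the dictionary keys and the function name; a different rule can be supplied through `is_match`.

```python
def associate_alerts(alerts, is_match=None):
    """Return (links, isolated) for the alerts of one attack scene.

    links: (earlier, later) pairs whose IP address and port match.
    isolated: alerts that matched no other alert.
    """
    if is_match is None:
        def is_match(first, later):
            # Assumed rule: both alerts target the same IP address and port.
            return ((first["dst_ip"], first["dst_port"])
                    == (later["dst_ip"], later["dst_port"]))
    ordered = sorted(alerts, key=lambda a: a["time"])   # chronological order
    links, linked = [], set()
    for i, first in enumerate(ordered):
        # Compare with every later alert, advancing one alert at a time.
        for j in range(i + 1, len(ordered)):
            if is_match(first, ordered[j]):
                links.append((first, ordered[j]))
                linked.update((i, j))
    isolated = [a for i, a in enumerate(ordered) if i not in linked]
    return links, isolated
```

This version records every matching pair; linking each alert only to its next matching successor would be an equally plausible reading and would give a sparser attack graph.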
Experiments were carried out with the method on a machine configured with the Windows 10 operating system, an Intel Core i7-8550U 2.4 GHz processor, 8 GB of memory and a 1 TB hard disk. The alarm correlation algorithm was programmed in Python 3.6 in the Windows environment using PyCharm Community 2017.3, and the scikit-learn library was used to process the alarm data files simply and efficiently. After alarm association is completed, the Graphviz graph drawing tool is used to visualize the attack scenes as attack graphs. The Honeypot data set of the Honeynet Project, a security research organization, was selected for study and analysis. The honeypot data set is collected from honeypots, which are built to simulate a real network system and run disguised services that lure intruders into launching attacks so that the related information can be captured and analyzed. The network topology of the honeypot system is shown in Fig. 4. Thousands of virtual systems can be deployed in one virtual honeypot system, each using a different IP address and port number, which guarantees the diversity of the data sources. The data come from real network attacks and cover aspects such as operating systems, brute-force network attacks, host vulnerabilities and port scanning. The attack types contained in the data are shown in Table 3.
Table 3. Honeypot data attack types
Figure BDA0002541996710000091
Figure BDA0002541996710000101
The data must be preprocessed before alarm association. A total of 123854 pieces of alarm data from October 2014 in the honeypot data set were selected; after preprocessing and deduplication the number of alarms dropped to 80928, so about 35% of the alarms were removed as repeats, which avoids repeated association. Because the complexity of the alarm correlation method is concentrated mainly in the attack scene division based on AP clustering, the K-means algorithm, the most common clustering method in machine learning, was selected for comparison and analysis.
1. Time complexity analysis
The AP method uses the similarity to calculate the attraction degree and the attribution degree during clustering. N samples form an N-order similarity input matrix, the attraction degree and the attribution degree are updated N times, and attenuation is performed after each update. The number of iterations depends on the sample size N and the reference degree p; in the limiting case the number of iterations is p log N, and since p is a constant, the time complexity of the AP algorithm is $O(N^2 \log N)$. The K-means algorithm, by contrast, performs one loop over all data for each of the K cluster centers, so its time complexity is only $O(N \times K)$. The running time comparison is shown in Fig. 5.
As the figure shows, the AP algorithm runs faster when the sample size is small, while K-means is clearly more efficient than the AP algorithm once the data size increases to 500. However, the cluster centers can be extracted during an earlier training phase, after which new input data can be assigned to a cluster simply by computing against the set of cluster centers. This effectively reduces the number of samples handled in real-time association and compensates well for the AP clustering algorithm's weakness in time efficiency. In addition, the sum of squared errors is used as a validation index to compare the two algorithms; the calculation formula is as follows:
Figure BDA0002541996710000111
where r denotes the number of attributes in the sample data and $n_i$ denotes the i-th attribute of the n-th sample. The fitting accuracy of the model is checked by comparing, for each attribute, the difference between the sample values and the mean of the n samples. The results are shown in Table 4; they indicate that AP clustering fits considerably better than the K-means algorithm.
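For illustration, one common way to compute a sum of squared errors for a clustering result is sketched below. The exact formula used in the experiments is available only as an image, so the per-cluster means, the absence of any normalization and the function name are assumptions; the sketch will not necessarily reproduce the values in Table 4.

```python
def sum_squared_error(samples, labels):
    """Sum of squared deviations of each sample from its cluster mean.

    `samples` is a list of equal-length numeric attribute vectors and
    `labels[i]` is the cluster label of `samples[i]`.
    """
    clusters = {}
    for sample, label in zip(samples, labels):
        clusters.setdefault(label, []).append(sample)
    sse = 0.0
    for members in clusters.values():
        # Mean of every attribute over the members of this cluster.
        means = [sum(column) / len(members) for column in zip(*members)]
        sse += sum((value - mean) ** 2
                   for row in members
                   for value, mean in zip(row, means))
    return sse
```

A smaller value means the samples lie closer to their cluster means, which is the sense in which a lower sum of squared errors indicates a better fit.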
Table 4. Comparison of degree of fit
Clustering method | Sum of squared errors
AP clustering | 0.46
K-means clustering | 2.38
Correlation results analysis
After the attack scenes have been divided by clustering, the attack process can be restored by causal association and the resulting attack graphs drawn with graph-drawing software. Five representative attack graphs are selected for analysis:
The first attack graph (Fig. 6) restores a complete DoS attack. The attacker first scans the hosts in the network; on finding an active target, it performs a vulnerability scan (sadmind ping) against that target and launches a buffer overflow attack on the host at IP address 61.163.217.30 to obtain root privileges. Using that host as a springboard, it then sends an mstream DoS atomic attack to host 61.163.217.255, thereby controlling puppet hosts to attack the network.
The attack process in the second attack graph (Fig. 7) is relatively common. The actions involved are only simple ones such as scanning and initiating remote connections; they do not amount to a real attack but still trigger alarms. Although these actions pose little threat, they nevertheless expose vulnerabilities present in the network.
The third attack graph (Fig. 8) restores a process in which one target host is attacked by several intruders. Within the same time period, the intruders perform SYN or FIN scans on the target host and obtain information on its active ports from the returned messages. They then attack the host, obtain high-level privileges, and use it as a springboard to launch different types of distributed attacks on other hosts in the network.
The fourth attack graph (Fig. 9) shows a distributed port attack. The attack source attacks the same port at different IP addresses and takes control of puppet hosts by remote login. Working from the current host, it searches for a target port, launches a local or remote attack, and compromises the target host through a buffer overflow attack.
The fifth attack graph (Fig. 10) shows a typical process of exploiting vulnerabilities to obtain privileges and carry out a distributed attack: the attacker scans different targets for vulnerability information and attacks the vulnerabilities found in order to obtain higher-level privileges.
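The causal restoration step behind these graphs (sort the alarms of one attack scene by time, then match IP addresses and ports pairwise with a step length of 1) can be sketched as follows. This is only an illustration under stated assumptions: the `Alarm` fields, the matching rule in `is_causally_linked` (same target IP and port, or the earlier target reappearing as a later source) and the documentation-range addresses are all hypothetical, since the exact matching predicate is not spelled out in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    time: float
    src_ip: str
    dst_ip: str
    dst_port: int
    attack_type: str


def is_causally_linked(earlier, later):
    # Assumed rule: both alarms hit the same IP and port, or the host
    # attacked earlier is later used as a springboard (source).
    same_target = (earlier.dst_ip == later.dst_ip
                   and earlier.dst_port == later.dst_port)
    springboard = earlier.dst_ip == later.src_ip
    return same_target or springboard


def associate(alarms):
    """Return (edges, isolated) for the alarms of one attack scene."""
    ordered = sorted(alarms, key=lambda a: a.time)
    edges, linked = [], set()
    for i, first in enumerate(ordered):
        # Compare the current alarm with each later alarm, one step at a time.
        for j in range(i + 1, len(ordered)):
            if is_causally_linked(first, ordered[j]):
                edges.append((first, ordered[j]))
                linked.update((i, j))
    # Alarms that matched nothing go to the isolated alarm queue.
    isolated = [a for k, a in enumerate(ordered) if k not in linked]
    return edges, isolated


alarms = [
    Alarm(3.0, "192.0.2.20", "192.0.2.255", 0, "dos"),
    Alarm(1.0, "192.0.2.10", "192.0.2.20", 111, "scan"),
    Alarm(2.0, "192.0.2.10", "192.0.2.20", 111, "overflow"),
    Alarm(2.5, "198.51.100.7", "198.51.100.9", 23, "telnet"),
]
edges, isolated = associate(alarms)
for earlier, later in edges:
    print(earlier.attack_type, "->", later.attack_type)
print("isolated:", [a.attack_type for a in isolated])
```

With this toy input the scan, overflow and DoS alarms are chained into one attack path, while the unrelated telnet alarm ends up in the isolated queue.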
2. Correlation efficiency analysis
The correlation ratio and the false alarm rate are suitable metrics for verifying the effectiveness of alarm correlation; they are calculated as follows, respectively:
[Formula image BDA0002541996710000121: correlation ratio]
[Formula image BDA0002541996710000122: false alarm rate]
The association analysis method based on a causal knowledge network and the association analysis method based on attribute similarity and a knowledge base are selected for comparison with the method proposed herein; the comparison is shown in Table 5.
Table 5. Comparison of false alarm rate and correlation ratio
Correlation analysis method                                             False alarm rate    Correlation ratio
The method proposed herein                                              2.1%                96.7%
Correlation analysis method based on causal knowledge network           10.7%               83.6%
Correlation analysis method based on attribute similarity and knowledge base    4.5%        93.2%
As Table 5 shows, the hybrid alarm correlation methods achieve higher correlation accuracy than the single correlation method and effectively reduce the false alarm rate. Because the proposed method uses the AP algorithm, which fits the data better, its correlation ratio (96.7%) is higher than that of the method based on attribute similarity and a knowledge base (93.2%), and its false alarm rate is lower (2.1% versus 4.5%). This indicates that the method can uncover more of the internal logical relations in massive alarm data and effectively reduces the number of isolated alarms.
The invention relates to a hybrid alarm correlation method based on AP clustering and causal relationship. First, the obtained original alarm logs are standardized based on the intrusion detection information exchange format, and the alarm log is obtained from the extracted alarm attributes. Second, the similarity between data points in the alarm log and the reference degree (preference) of each node are obtained, the attraction degree (responsibility) matrix and the attribution degree (availability) matrix are updated iteratively, and a damping factor is introduced into the AP clustering algorithm to attenuate the updates until the candidate cluster centers are stable or the number of iterations reaches a threshold; the cluster centers and the data set divided into attack scenes are then output. Finally, the overall similarity value of any two pieces of alarm data is calculated with a weighted average algorithm, the negated overall similarity values are clustered by the AP clustering algorithm, the data are sorted in time order, and the alarm data are matched and associated, which improves both association accuracy and time efficiency.
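The damped AP clustering loop summarized above can be sketched as follows. This is a minimal, unoptimized illustration of standard affinity propagation, not the patent's implementation: the function name, the default values of `damping`, `max_iter` and `stable_iter`, the toy one-dimensional points and the chosen preference are all assumptions. The comments map the patent's terms to the usual AP vocabulary (attraction degree = responsibility, attribution degree = availability, reference degree = preference).

```python
def affinity_propagation(S, damping=0.5, max_iter=200, stable_iter=15):
    """Cluster n >= 2 points from an n x n similarity matrix S.

    S[k][k] holds the reference degree (preference) of point k.
    Returns (exemplars, labels); labels[i] is the cluster center of point i.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # attraction degree (responsibility)
    A = [[0.0] * n for _ in range(n)]  # attribution degree (availability)
    exemplars, unchanged = [], 0

    for _ in range(max_iter):
        # Attraction update: r(i,k) = s(i,k) - max_{j != k} [a(i,j) + s(i,j)]
        for i in range(n):
            scores = [A[i][j] + S[i][j] for j in range(n)]
            for k in range(n):
                rival = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)

        # Attribution update, based on the positive attraction each k receives.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # sum over i' != k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

        # A node is a cluster center when self-attraction + self-attribution > 0.
        current = [k for k in range(n) if R[k][k] + A[k][k] > 0]
        unchanged = unchanged + 1 if current and current == exemplars else 0
        exemplars = current
        if unchanged >= stable_iter:  # candidate centers are stable
            break

    if not exemplars:
        return [], [-1] * n
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Hypothetical toy data: negative squared distance as similarity.
points = [0, 1, 2, 10, 11, 12]
preference = -20.0
S = [[preference if i == j else -float((a - b) ** 2)
      for j, b in enumerate(points)]
     for i, a in enumerate(points)]
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)
```

The damping factor blends each new matrix value with the previous one, which is what suppresses oscillation between candidate centers; the loop stops on either of the two conditions named in the text (stable centers or the iteration threshold).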
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (2)

1. A hybrid alarm association method based on AP clustering and causal relationship, characterized by comprising the following steps:
normalizing the obtained original alarm logs to obtain an alarm log;
dividing attack scenes by combining an improved similarity measure with an AP clustering method;
matching and associating the alarm data in the attack scenes based on the idea of causality;
wherein dividing the attack scenes by combining the improved similarity measure with the AP clustering method comprises:
acquiring the similarity between a plurality of data points in the alarm log and the reference degree of the corresponding nodes, iteratively updating an attraction degree matrix according to the corresponding similarity and reference degree, and iteratively updating an attribution degree matrix according to the attraction degree values produced by iterating the attraction degree matrix;
introducing a damping factor into the AP clustering algorithm, attenuating the attraction degree matrix and the attribution degree matrix respectively, and judging whether the candidate cluster centers are stable or whether the number of iterations has reached a threshold;
wherein judging whether the candidate cluster centers are stable or whether the number of iterations has reached the threshold comprises:
if the candidate cluster centers are not stable and the number of iterations has not reached the threshold, continuing to iteratively update the attraction degree matrix and the attribution degree matrix until the candidate cluster centers are stable or the number of iterations reaches the threshold, and outputting the cluster centers and the divided attack scene set;
if the candidate cluster centers are stable or the number of iterations has reached the threshold, judging whether the sum of the self-attraction degree and the self-attribution degree is greater than zero;
if the sum of the self-attraction degree and the self-attribution degree is greater than zero, the corresponding node is a cluster center;
if the sum of the self-attraction degree and the self-attribution degree is less than or equal to zero, the corresponding node is not a cluster center;
after the cluster centers and the divided attack scene set are output, the method further comprises:
calculating the overall similarity value of any two pieces of alarm data in the attack scene set by a weighted average algorithm from their attack type similarity value, IP address similarity value, port similarity value and time similarity value, and clustering the negated overall similarity values based on the AP clustering algorithm;
wherein matching and associating the alarm data in the attack scenes based on the idea of causality comprises:
after the clustered data are sorted in time order, matching a first piece of data with a plurality of second pieces of data in sequence with a step length of 1, classifying a first piece of data whose ports and IP addresses match as associated alarm data, and placing a first piece of data whose ports and IP addresses do not match in an isolated alarm queue, until all data have been matched.
2. The hybrid alarm association method based on AP clustering and causal relationship of claim 1, wherein normalizing the obtained original alarm logs to obtain the alarm log comprises:
standardizing the obtained original alarm logs based on the intrusion detection information exchange format and, after deleting repeated alarm logs, extracting a plurality of alarm attributes from the original alarm data to form a 7-tuple, thereby obtaining the alarm log.
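The normalization into a 7-tuple and the weighted overall similarity named in the claims can be sketched together. Everything specific below is an assumption made for illustration: the claims do not enumerate the seven fields, so the `Alarm` layout is a guess, and the per-attribute similarity functions, the equal weights and the one-hour `window` are hypothetical stand-ins for the measures defined elsewhere in the description.

```python
from typing import NamedTuple


class Alarm(NamedTuple):
    # Assumed 7-tuple layout; not specified in the claims.
    alarm_id: int
    time: float  # seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    attack_type: str


def normalize(raw_records):
    """Delete repeated records, then build one 7-tuple per remaining record."""
    seen, alarms = set(), []
    for rec in raw_records:
        key = (rec["time"], rec["src_ip"], rec["src_port"],
               rec["dst_ip"], rec["dst_port"], rec["attack_type"])
        if key in seen:
            continue
        seen.add(key)
        alarms.append(Alarm(len(alarms), *key))
    return alarms


def ip_similarity(a, b):
    """Fraction of leading dotted-quad octets shared (assumed measure)."""
    shared = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        shared += 1
    return shared / 4


def overall_similarity(a, b, weights=(0.25, 0.25, 0.25, 0.25), window=3600.0):
    """Weighted average of attack type, IP, port and time similarity."""
    w_type, w_ip, w_port, w_time = weights
    type_sim = 1.0 if a.attack_type == b.attack_type else 0.0
    ip_sim = (ip_similarity(a.src_ip, b.src_ip)
              + ip_similarity(a.dst_ip, b.dst_ip)) / 2
    port_sim = 1.0 if a.dst_port == b.dst_port else 0.0
    time_sim = max(0.0, 1.0 - abs(a.time - b.time) / window)
    total = (w_type * type_sim + w_ip * ip_sim
             + w_port * port_sim + w_time * time_sim)
    return total / sum(weights)


raw = [
    {"time": 0.0, "src_ip": "192.0.2.10", "src_port": 1024,
     "dst_ip": "192.0.2.20", "dst_port": 111, "attack_type": "scan"},
    {"time": 0.0, "src_ip": "192.0.2.10", "src_port": 1024,
     "dst_ip": "192.0.2.20", "dst_port": 111, "attack_type": "scan"},
    {"time": 1800.0, "src_ip": "192.0.2.10", "src_port": 1025,
     "dst_ip": "192.0.2.99", "dst_port": 111, "attack_type": "overflow"},
]
alarms = normalize(raw)
similarity = overall_similarity(alarms[0], alarms[1])
print(len(alarms), similarity)  # 2 0.59375
```

In the claimed method, the negated overall similarity of each pair is what is handed to the second round of AP clustering; the choice of weights determines how strongly each attribute drives that grouping.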
CN202010549657.9A 2020-06-16 2020-06-16 Hybrid alarm association method based on AP clustering and causal relationship Active CN111709022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010549657.9A CN111709022B (en) 2020-06-16 2020-06-16 Hybrid alarm association method based on AP clustering and causal relationship

Publications (2)

Publication Number Publication Date
CN111709022A CN111709022A (en) 2020-09-25
CN111709022B true CN111709022B (en) 2022-08-19

Family

ID=72540951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010549657.9A Active CN111709022B (en) 2020-06-16 2020-06-16 Hybrid alarm association method based on AP clustering and causal relationship

Country Status (1)

Country Link
CN (1) CN111709022B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282639B (en) * 2021-04-27 2022-08-05 深圳市中燃科技有限公司 Gas leakage data monitoring method and system, intelligent terminal and storage medium
CN113420802B (en) * 2021-06-04 2023-05-30 桂林电子科技大学 Alarm data fusion method based on improved spectral clustering
CN114287903A (en) * 2021-12-31 2022-04-08 佳禾智能科技股份有限公司 Heart rate detection method and device based on piezoelectric sensor and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105471623A (en) * 2015-11-16 2016-04-06 中国烟草总公司江苏省公司 Key IP address safety alarm association analysis method based on fuzzy scene
CN106911629A (en) * 2015-12-22 2017-06-30 中国移动通信集团公司 A kind of alert correlation method and device
CN109450946A (en) * 2018-12-27 2019-03-08 浙江大学 A kind of unknown attack scene detection method based on alert correlation analysis

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070008098A1 (en) * 2005-07-08 2007-01-11 Hsing-Kuo Wong Method and architecture for online classification-based intrusion alert correlation
CN100518089C (en) * 2006-07-19 2009-07-22 华为技术有限公司 Security event associative analysis method and system
CN103746961B (en) * 2013-12-12 2017-03-15 中国人民解放军63928部队 A kind of causal knowledge method for digging of cyber attack scenarios, device and server
US20190166457A1 (en) * 2017-11-27 2019-05-30 Pelmorex Corp. Systems and methods for location-based alert generation
CN108833139B (en) * 2018-05-22 2021-02-19 桂林电子科技大学 OSSEC alarm data aggregation method based on category attribute division
WO2020013958A1 (en) * 2018-07-10 2020-01-16 Siemens Aktiengesellschaft Hybrid unsupervised machine learning framework for industrial control system intrusion detection
CN110381015A (en) * 2019-06-03 2019-10-25 西安电子科技大学 A kind of clustering method based on intruding detection system warning message
CN110474885B (en) * 2019-07-24 2021-10-22 桂林电子科技大学 Alarm correlation analysis method based on time sequence and IP address
CN110659997B (en) * 2019-08-15 2023-06-27 中国平安财产保险股份有限公司 Data cluster recognition method, device, computer system and readable storage medium

Legal Events

Code    Title
PB01    Publication
SE01    Entry into force of request for substantive examination
GR01    Patent grant
EE01    Entry into force of recordation of patent licensing contract

Application publication date: 20200925

Assignee: Guangxi Jun'an Network Security Technology Co.,Ltd.

Assignor: GUILIN University OF ELECTRONIC TECHNOLOGY

Contract record no.: X2022450000459

Denomination of invention: A Hybrid Alarm Correlation Method Based on AP Clustering and Causality

Granted publication date: 20220819

License type: Common License

Record date: 20221228
