CN111709022B - Hybrid alarm association method based on AP clustering and causal relationship

Info

Publication number: CN111709022B
Application number: CN202010549657.9A
Authority: CN
Inventors: 陶晓玲; 赵培超; 石兰; 顾涛
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2020-06-16
Filing date: 2020-06-16
Publication date: 2022-08-19
Anticipated expiration: 2040-06-16
Also published as: CN111709022A

Abstract

The invention discloses a hybrid alarm association method based on AP (affinity propagation) clustering and causal relationship. First, the acquired original alarm log is standardized based on the Intrusion Detection Message Exchange Format (IDMEF), and the alarm log is obtained from the extracted alarm attributes. Second, the similarity among the data points in the alarm log and the reference degree (preference) of the corresponding nodes are obtained, the attraction degree (responsibility) matrix and the attribution degree (availability) matrix are updated iteratively, and a damping factor is introduced into the AP clustering algorithm for attenuation; iteration continues until the candidate clustering centers are stable or the number of iterations reaches a threshold, and the clustering centers and the data set divided into attack scenes are output. Finally, the overall similarity value of any two pieces of alarm data is calculated by a weighted average algorithm, the negated overall similarity values are clustered with the AP clustering algorithm, the data are sorted in time order, and the alarm data are matched and associated, thereby improving association precision and time efficiency.

Description

Hybrid alarm association method based on AP clustering and causal relationship

Technical Field

The invention relates to the technical field of network security, in particular to a hybrid alarm association method based on AP clustering and causal relationship.

Background

With the rapid development of global informatization, large-scale information systems have become key infrastructure for countries and governments. As digitization accelerates, network security threats affect not only traditional industries and critical information infrastructure; risks in emerging fields such as artificial intelligence, cloud computing, big data and the Internet of Things are also increasing rapidly. In recent years, China has continuously strengthened its response capability in the face of a severe network security situation and has achieved certain results. To protect network security effectively, administrators widely deploy security products such as firewalls and intrusion detection systems (IDS) in network systems. However, as networks keep growing in scale and structural complexity, existing IDSs have exposed many problems. While improving intrusion detection algorithms and optimizing detection models, researchers have also opened up a new line of work: performing fusion analysis on mutually independent alarm information by means of alarm correlation technology.

With the wide application and development of alarm correlation technology in the field of intrusion detection, a number of representative methods and results have appeared, which can be roughly divided into the following categories: (1) association methods based on predefined attack scenarios; (2) association methods based on causal relationship; (3) association methods based on alarm attribute similarity; (4) association methods based on statistical analysis. Alarm correlation based on causal relationship needs to establish a knowledge base, define causal rules for each attack action, and perform correlation analysis by pattern matching. Alarm correlation based on attribute similarity has advantages such as a simple algorithm and strong real-time performance, but there is no standard for attribute similarity, the final correlation result is strongly influenced by manually set parameters such as the similarity weight coefficients, the correlation result is weak in logic, and it cannot help an administrator understand the attack intention or the relations between attack actions. After years of research and development, alarm correlation technology has made some progress, but recent research shows that no single processing method can solve the alarm correlation problem perfectly, and existing methods inevitably increase computational overhead while improving correlation precision. Adopting a hybrid architecture of multiple types, so that different methods complement one another, is therefore a direction worth studying.

Disclosure of Invention

The invention aims to provide a hybrid alarm association method based on AP clustering and causal relationship, which improves association precision and time efficiency.

In order to achieve the above object, the present invention provides a hybrid alarm association method based on AP clustering and causal relationship, including:

normalizing the obtained original alarm log to obtain an alarm log;

dividing attack scenes by using an improved similarity measure combined with the AP clustering method;

and matching and associating alarm data in the attack scene based on the causal relationship idea.

The normalizing process of the obtained original alarm log to obtain the alarm log includes:

and standardizing the obtained original alarm log based on an intrusion detection information exchange format, and after deleting repeated alarm logs, extracting a plurality of alarm attributes from the original alarm data to form a 7-tuple to obtain the alarm log.

Wherein, dividing the attack scenes by using the improved similarity measure combined with the AP clustering method comprises the following steps:

and obtaining the similarity among a plurality of data points in the alarm log and the reference degree of corresponding nodes, carrying out iterative update on an attraction degree matrix according to the corresponding similarity and the reference degree, and carrying out iterative update on an attribution degree matrix according to a corresponding attraction degree value generated by the iterative update of the attraction degree matrix.

Wherein, dividing the attack scenes by using the improved similarity measure combined with the AP clustering method further comprises the following steps:

and introducing a damping factor into the AP clustering algorithm, respectively attenuating the attraction degree matrix and the attribution degree matrix, and judging whether the candidate clustering center is stable or whether the iteration times reach a threshold value.

Judging whether the candidate clustering center is stable or whether the iteration frequency reaches a threshold value comprises the following steps:

if the candidate clustering centers are not stable and the number of iterations has not reached the threshold, continuing to iteratively update the attraction degree matrix and the attribution degree matrix until the candidate clustering centers are stable or the number of iterations reaches the threshold, and outputting the clustering centers and the divided attack scene set;

if the candidate clustering centers are stable or the number of iterations has reached the threshold, judging whether the sum of the self-attraction degree and the self-attribution degree is greater than zero;

if the sum of the self-attraction degree and the self-attribution degree is greater than zero, the corresponding node is a clustering center;

and if the sum of the self-attraction degree and the self-attribution degree is less than or equal to zero, the corresponding node is not a clustering center.

After the cluster center and the divided attack scene set are output, the method further comprises the following steps:

and calculating the overall similarity values of any two pieces of alarm data by using a weighted average algorithm according to the attack type similarity value, the IP address similarity value, the port similarity value and the time similarity value of any two pieces of alarm data in the attack scene set, and clustering and dividing the overall similarity values after taking negative values based on an AP clustering algorithm.

The matching and association of the alarm data in the attack scene based on the causal relationship idea comprises the following steps:

after the clustered and divided data are sorted according to the time sequence, a first piece of data is sequentially matched with a plurality of second pieces of data according to the step length of 1, the first piece of data with matched ports and IP addresses is divided into related and corresponding alarm data, and the first piece of data with unmatched ports and IP addresses is divided into an isolated alarm queue until all data are matched.

In the hybrid alarm association method based on AP clustering and causal relationship of the invention, the acquired original alarm log is first standardized based on the Intrusion Detection Message Exchange Format (IDMEF), and the alarm log is obtained from the extracted alarm attributes. Next, the similarity among the data points in the alarm log and the reference degree of the corresponding nodes are obtained, the attraction degree matrix and the attribution degree matrix are updated iteratively, and a damping factor is introduced into the AP clustering algorithm for attenuation; iteration continues until the candidate clustering centers are stable or the number of iterations reaches a threshold, and the clustering centers and the data set divided into attack scenes are output. Finally, the overall similarity value of any two pieces of alarm data is calculated by a weighted average algorithm, the negated overall similarity values are clustered with the AP clustering algorithm, the data are sorted in time order, and the alarm data are matched and associated, so that association precision and time efficiency are improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a schematic step diagram of a hybrid alarm association method based on AP clustering and causal relationship provided in the present invention.

FIG. 2 is a diagram of the manner in which information is communicated between data points provided by the present invention.

FIG. 3 is a flow chart of causal relationship based alarm association provided by the present invention.

Fig. 4 is a network topology diagram of the honeypot system provided by the invention.

Fig. 5 is a comparison graph of the run time provided by the present invention.

Fig. 6 is a first attack diagram provided by the present invention.

Fig. 7 is a second attack diagram provided by the present invention.

Fig. 8 is a third attack diagram provided by the present invention.

Fig. 9 is a fourth attack diagram provided by the present invention.

Fig. 10 is a fifth attack diagram provided by the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.

Referring to fig. 1, the present invention provides a hybrid alarm association method based on AP clustering and causal relationship, including:

S101, normalizing the acquired original alarm log to obtain the alarm log.

Specifically, in order to effectively protect network security, an administrator usually deploys a plurality of intrusion detection systems in a network security protection system. Alarm data generated by different intrusion detection systems have different formats and cannot be used directly for correlation analysis, so normalization of the alarm log is the basis of the subsequent work. The Intrusion Detection Message Exchange Format (IDMEF) is a standard format initiated and established by the Intrusion Detection Working Group (IDWG); it provides a basis for information sharing and format exchange among IDSs of different types. Therefore, the acquired original alarm log is standardized based on the intrusion detection message exchange format, and after repeated alarm logs are deleted, 8 alarm attributes are extracted from the original alarm data to form a 7-tuple to obtain the alarm log. The meanings of the extracted alarm attributes are shown in Table 1.

TABLE 1 alarm data Attribute meanings

Because the same intrusion event can trigger several intrusion detection devices in the network security system at the same time, the data contain a large number of repeated alarms. To address this problem, alarms that fall within a certain time threshold and agree on every attribute other than the signature attribute and the time attribute are merged into a single alarm, which reduces repeated alarm data and prepares for the subsequent alarm correlation work.
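The merging rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: Table 1 is not reproduced in this text, so the seven field names of `Alarm`, the `merge_duplicates` helper and the 2-second default window are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    # Hypothetical 7-tuple; the real attribute list is given in Table 1.
    timestamp: float   # seconds
    signature: str     # detector-specific signature
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    attack_type: str


def merge_duplicates(alarms, window=2.0):
    """Keep one alarm per group of alarms that agree on every attribute
    except signature and timestamp and that occur within `window` seconds
    of the most recently kept alarm of that group."""
    last_kept = {}  # attribute key -> timestamp of the last kept alarm
    merged = []
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        key = (alarm.src_ip, alarm.dst_ip, alarm.src_port,
               alarm.dst_port, alarm.attack_type)
        previous = last_kept.get(key)
        if previous is not None and alarm.timestamp - previous <= window:
            continue  # treated as a repeat of an alarm already kept
        last_kept[key] = alarm.timestamp
        merged.append(alarm)
    return merged
```

The window is measured from the last alarm that was kept, so a long burst of identical alarms still yields one representative per window rather than a single alarm for the whole burst; other merging policies are equally compatible with the description.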

S102, dividing attack scenes by using the improved similarity measure combined with the AP clustering method.

Specifically, the similarity among the data points in the alarm log and the reference degree of the corresponding nodes are obtained. A similarity matrix S formed by the similarities among N data points is used as the input for cluster analysis, and s(i, j) denotes the similarity between node i and node j. The concept of reference degree (Preference) is introduced to represent how suitable a data point is as a clustering center; the reference degree of point i is denoted p(i) or s(i, i), and the higher the value, the more likely point i is to be chosen as a clustering center. Because the AP algorithm considers that every data point may serve as a clustering center, all p take the same value. The final number of clusters is strongly influenced by the reference degree, and p is usually set to the median or the minimum of the input similarity values.

The attraction degree matrix is updated iteratively according to the corresponding similarity and reference degree, and the attribution degree matrix is updated iteratively according to the attraction degree values produced by the iteration of the attraction degree matrix. Two concepts are introduced, attraction degree (Responsibility) and attribution degree (Availability); messages are passed and updated between data points by repeatedly updating the attraction degree matrix and the attribution degree matrix, yielding the final clustering center points. The attraction degree r(i, j) is the message sent from data point i to data point j and reflects how suitable point j is to serve as the clustering center of point i; the larger r is, the more likely point j is to be the clustering center of point i. The attribution degree a(i, j) is the message sent from data point j to data point i and reflects how appropriate it is for point i to select point j as its clustering center; the larger a is, the more likely point i is to select point j as its clustering center. The way messages are passed between data points is shown in Fig. 2. The update formula of the attraction degree matrix R is:

the update formula of the attribution degree matrix A is as follows:

meanwhile, in order to avoid the problem of data oscillation in the matrix updating process, a damping factor lambda is introduced into the AP clustering algorithm to respectively attenuate the attraction degree matrix and the attribution degree matrix, and the updating formula is as follows:

$$R_{t+1}(i,k) = \lambda\, r_t(i,k) + (1-\lambda)\, r_{t+1}(i,k)$$

$$A_{t+1}(i,k) = \lambda\, a_t(i,k) + (1-\lambda)\, a_{t+1}(i,k)$$

where $r_{t+1}(i,k)$ denotes the attraction degree between point i and point k after the (t+1)-th update and $R_{t+1}(i,k)$ denotes the attraction degree after attenuation; $a_{t+1}(i,k)$ denotes the attribution degree after the (t+1)-th update and $A_{t+1}(i,k)$ denotes the attribution degree after attenuation.

It is then judged whether the candidate clustering centers are stable or whether the number of iterations has reached a threshold. If the candidate clustering centers are not stable and the number of iterations has not reached the threshold, the attraction degree matrix and the attribution degree matrix continue to be updated iteratively until the candidate clustering centers are stable or the number of iterations reaches the threshold, and the clustering centers and the divided attack scene set are output.

If the candidate clustering centers are stable or the number of iterations has reached the threshold, it is judged whether the sum of the self-attraction degree r(i, i) and the self-attribution degree a(i, i) is greater than zero. If the sum is greater than zero, that is, r(i, i) + a(i, i) > 0, the corresponding node i is a clustering center; if the sum is less than or equal to zero, that is, r(i, i) + a(i, i) ≤ 0, the corresponding node is not a clustering center. For a non-center point j, if data point k maximizes r(j, k) + a(j, k), then point k is the clustering center of data point j. Finally, the clustering centers and the data set assigned to each cluster are output.
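The iteration described above can be sketched in plain Python. The update formulas for R and A are not reproduced in this text, so the sketch assumes the standard affinity propagation rules of Frey and Dueck, r(i,k) = s(i,k) - max over k' ≠ k of (a(i,k') + s(i,k')) and a(i,k) = min(0, r(k,k) + Σ max(0, r(i',k))) with a(k,k) = Σ max(0, r(i',k)). The function name, the `stable_iter` parameter and the default values are illustrative assumptions.

```python
def affinity_propagation(S, damping=0.5, max_iter=200, stable_iter=15):
    """Cluster n >= 2 points from an n x n similarity matrix S (list of lists).

    S[i][i] holds the reference degree (preference) of point i.
    Returns (centers, labels); labels[i] is the center chosen for point i.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # attraction degree (responsibility)
    A = [[0.0] * n for _ in range(n)]  # attribution degree (availability)
    centers, previous, unchanged = [], None, 0
    for _ in range(max_iter):
        # Assumed standard rule for r(i,k), followed by damping.
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Assumed standard rule for a(i,k) and a(k,k), followed by damping.
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(positive) - positive[k]  # sum over i' != k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # A point is a clustering center when r(i,i) + a(i,i) > 0.
        centers = [i for i in range(n) if R[i][i] + A[i][i] > 0]
        unchanged = unchanged + 1 if centers == previous else 0
        previous = centers
        if centers and unchanged >= stable_iter:
            break  # candidate centers are stable
    if not centers:
        return [], [None] * n
    # Each non-center point j picks the center k maximizing r(j,k) + a(j,k).
    labels = [i if i in centers else
              max(centers, key=lambda k: R[i][k] + A[i][k]) for i in range(n)]
    return centers, labels
```

For a toy input such as the points 0, 1, 2, 10, 11, 12 with s(i, j) = -(x_i - x_j)² and a hand-picked preference of -10 on the diagonal, the two middle points emerge as centers. In practice a library implementation would normally be preferred over this quadratic-memory, cubic-time-per-iteration sketch.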

The standard AP algorithm uses Euclidean distance as its similarity criterion. This is a similarity measure for numerical variables and is not suitable for the non-numerical attributes of alarm logs. Therefore, a different similarity calculation method is defined for each attribute of the alarm data, the overall similarity between alarms is obtained as the weighted average of the per-attribute similarities, and the resulting similarity matrix is used as the input of the AP algorithm.

Firstly, calculating an attack type similarity value, an IP address similarity value, a port similarity value and a time similarity value of any two pieces of alarm data in the data set, specifically:

attack type similarity:

For the alarm type attribute, if two pieces of alarm data alert_i and alert_j have the same alarm type, the similarity is set to 1; otherwise it is set to 0:

IP address similarity calculation:

The IP addresses in the alarm log are expressed in decimal form. Following reference [79], the similarity is calculated by comparing the number of identical prefix bits, so the IP addresses are first converted into binary representation. The calculation formula is as follows:

where r denotes the number of consecutive identical bits of the IP addresses of the two pieces of alarm data, counted from the high-order bit to the low-order bit.
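The prefix comparison can be illustrated with a short sketch. The formula itself is not reproduced in this text; the sketch assumes IPv4 addresses and assumes that the similarity is r divided by the address length of 32 bits, which is one natural normalization but is not confirmed by the source. The function name `ip_similarity` is illustrative.

```python
import ipaddress


def ip_similarity(ip_a: str, ip_b: str) -> float:
    """Share of identical leading bits between two IPv4 addresses."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    # XOR leaves a 1 at every differing bit; the position of the highest
    # 1 tells how many leading bits agree.
    r = 32 - (a ^ b).bit_length()
    return r / 32  # assumed normalization by the address length
```

Under this assumption, two addresses in the same /24 network that differ in the top bit of the last octet, such as 61.163.217.30 and 61.163.217.255, share r = 24 leading bits and obtain a similarity of 0.75.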

Port similarity calculation:

the port number is a boolean attribute, and if two alarm ports are completely the same, the similarity is considered to be 1, and if they are different, the similarity is 0:

Time similarity calculation:

In most studies, a time window is set to define a limiting threshold on alarm time, and the chosen window size affects the final similarity calculation. Here, the date attributes are compared first; for alarms with the same date attribute, the time similarity is calculated with a sigmoid function, and otherwise the similarity is 0. The calculation formula is as follows:

wherein

Then, the weighted average algorithm is used to calculate the overall similarity value of two pieces of alarm data. In determining the attribute weights, to avoid the subjectivity of manual assignment, the weight of each characteristic attribute is determined by principal component analysis. The overall similarity of two alarms alert_i and alert_j is calculated as follows:

where $sim_l$ denotes the similarity of the l-th attribute and $\omega_l$ denotes the weight corresponding to that attribute.
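A minimal sketch of the weighted average follows. The formula is not reproduced in this text, so dividing by the sum of the weights is an assumption, and the weights passed in are placeholders: the description derives them by principal component analysis, which is not shown here. All function names are illustrative.

```python
def type_similarity(type_a, type_b):
    """1 if the two alarm types are identical, otherwise 0."""
    return 1.0 if type_a == type_b else 0.0


def port_similarity(port_a, port_b):
    """1 if the two ports are identical, otherwise 0."""
    return 1.0 if port_a == port_b else 0.0


def overall_similarity(attribute_sims, weights):
    """Weighted average of per-attribute similarities sim_l with weights w_l."""
    if len(attribute_sims) != len(weights):
        raise ValueError("one weight per attribute is required")
    total_weight = sum(weights)  # assumed normalization
    return sum(w * s for w, s in zip(weights, attribute_sims)) / total_weight
```

For example, with attribute similarities of 1.0 (type), 0.75 (IP address), 0.0 (port) and 0.5 (time) and placeholder weights 0.4, 0.3, 0.2 and 0.1, the overall similarity is 0.675.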

Based on the above similarity calculation method, a similarity matrix can be obtained. The overall similarity values are negated and then clustered with the AP clustering algorithm, so that alarm data with high similarity are grouped into the same attack scene. The pseudo-code for attack scene division is as follows:

Attack scene partitioning based on AP clustering
Input: alarm data set Alert = {a_1, a_2, ..., a_n}
Output: attack scene set Scene = {scene_1, scene_2, ..., scene_n}
// similarity matrix calculation
Similarity = [sim_11, sim_12, ..., sim_nn]
// initialize attraction degree and attribution degree matrices
InitialResponsibility = [r_11, r_12, ..., r_nn]
InitialAvailability = [a_11, a_12, ..., a_nn]
// update attraction degree and attribution degree matrices
If r(i,i) + a(i,i) < 0
	For (i = 0; i < n; i++)
	{
		R_{t+1}(i,j) = λ * r_t(i,j) + (1-λ) * r_{t+1}(i,j)
		A_{t+1}(i,j) = λ * a_t(i,j) + (1-λ) * a_{t+1}(i,j)
	}
Endif
// record cluster centers
String center = updatecenter(classlist, center)
Readcenter.filewriter(center)
Return Scene = {scene_1, scene_2, ..., scene_n}

S103, matching and associating alarm data in the attack scene based on the causal relationship idea.

Specifically, by means of the AP clustering method, the massive unordered alarm logs are divided according to the similarity principle into a set of attack scenes with small intra-class distances and large inter-class distances. As shown in the causal-relationship-based alarm association flow chart of Fig. 3:

The clustered alarm data sets are read in turn, and association analysis is performed on the data in each attack scene in turn. The clustered and divided data are sorted in time order, and a first piece of data is matched in sequence against the subsequent second pieces of data with a step length of 1. A first piece of data whose ports and IP addresses match is assigned to the corresponding associated alarm data, and a first piece of data whose ports and IP addresses do not match is placed in an isolated alarm queue, until all data have been matched.
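The matching loop can be sketched as below. The text does not spell out what counts as a port and IP address match, so `ip_port_match` encodes one hypothetical rule: a later alarm is linked to an earlier one if it targets the same destination IP and port, or if it originates from the host the earlier alarm targeted (a stepping-stone pattern). The dictionary keys and both function names are likewise assumptions.

```python
def ip_port_match(earlier, later):
    """Hypothetical matching rule; the real rule is defined by the method."""
    same_target = (later["dst_ip"] == earlier["dst_ip"]
                   and later["dst_port"] == earlier["dst_port"])
    pivot = later["src_ip"] == earlier["dst_ip"]
    return same_target or pivot


def associate(scene, matches=ip_port_match):
    """Sort one attack scene by time and link matching alarm pairs.

    Returns (ordered, links, isolated): the time-ordered alarms, the list
    of (earlier_index, later_index) links, and the unlinked alarms.
    """
    ordered = sorted(scene, key=lambda alarm: alarm["time"])
    links, linked = [], set()
    for i, first in enumerate(ordered):
        for j in range(i + 1, len(ordered)):  # advance with step length 1
            if matches(first, ordered[j]):
                links.append((i, j))
                linked.update((i, j))
    isolated = [ordered[i] for i in range(len(ordered)) if i not in linked]
    return ordered, links, isolated
```

The resulting links form the edges of an attack graph for the scene, and the isolated list corresponds to the isolated alarm queue.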

The causal association method holds that a complete network attack is, in most cases, carried out through attack behaviors in different stages that follow a certain logical order, with each attack behavior providing the conditions for the next intrusion step. For example, to mount a TCP SYN denial-of-service attack on a host, an attacker must first send a message to the target computer, and can carry out the next attack action only after the target receives it and establishes a connection. Each attack action therefore has preconditions that it requires and consequences that it produces; by linking the preconditions and consequences of alarm information according to certain rules, a complete attack sequence can be found. The method draws on this idea to perform association analysis on the alarm data within the same attack scene and thereby describe the logical relations among the alarm data.

The method was evaluated experimentally on a machine running the Windows 10 operating system with an Intel Core i7-8550U 2.4 GHz processor, 8 GB of memory and a 1 TB hard disk. The alarm correlation algorithm was programmed in Python 3.6 in a Windows environment, with PyCharm Community 2017.3 as the programming platform, and the alarm data files were processed simply and efficiently with the help of the scikit-learn library. After alarm association is completed, the attack scenes are visualized as attack graphs using the Graphviz graph drawing tool. The Honeypot data set of the Honeynet Project, a security research organization, was selected for research and analysis. A honeypot is constructed to simulate a real network system and runs disguised services that induce intruders to launch attacks, so that related information can be captured and analyzed; the network topology of the honeypot system is shown in Fig. 4. Thousands of virtual systems can be deployed in one virtual honeypot system, and each system uses a different IP address and port number, which guarantees the diversity of data sources. The data come from real network attacks, and the data types cover operating systems, brute-force network attacks, host vulnerabilities, port scanning and other aspects. The attack types contained in the data are shown in Table 3.

TABLE 3 honeypot data attack types

The data need to be preprocessed before alarm association is carried out. A total of 123854 alarms from October 2014 in the honeypot data set were selected; after preprocessing and de-duplication the number of alarms was reduced to 80928, that is, about 35% of the alarms were deleted as repeated alarms, which avoids repeated association. Because the complexity of the alarm correlation method is concentrated mainly in the AP-clustering-based attack scene division, the K-means algorithm, the most common clustering method in machine learning, is selected for comparison and analysis.

1. Time complexity analysis

The AP method uses the similarity to calculate the attraction degree and the attribution degree during clustering. N samples form an N-order similarity input matrix, the attraction degree and the attribution degree are each updated N times, and attenuation is applied after every update. The number of iterations depends on the sample size N and the reference degree p, and in the limiting case is p log N; because p is a constant, the time complexity of the AP algorithm is O(N² log N). The K-means algorithm, by contrast, performs one loop over all data for each of K different clustering centers, so its time complexity is only O(N × K). The running time comparison is shown in Fig. 5.

It can be seen from the figure that the AP algorithm runs faster when the sample size is small, whereas once the data size increases to 500 the running efficiency of K-means is clearly better than that of the AP algorithm. However, the clustering centers can be extracted during an earlier training stage, and newly input data can then be assigned to clusters merely by computing against the set of clustering centers. This effectively reduces the number of samples in real-time association and largely compensates for the weakness of the AP clustering algorithm in time efficiency. In addition, the sum of squared errors is used as a verification index to compare the two algorithms; the calculation formula is as follows:

where r denotes the number of attributes in the sample data and n_i denotes the i-th attribute of the n-th sample. The fitting accuracy of the model is checked by comparing the differences between the sample values and the mean of the n samples under each attribute. The results are shown in Table 4; they indicate that the AP clustering fits much better than the K-means algorithm.

Table 4 Comparison of goodness of fit

Clustering method	Sum of squared errors
AP clustering	0.46
K-means clustering	2.38
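The verification index can be illustrated with a short sketch. The exact formula is not reproduced in this text; the sketch assumes the usual definition of the sum of squared errors, in which each attribute value is compared with the mean of that attribute over all samples. How the samples are grouped per cluster and scaled to obtain the values in Table 4 is not stated, so the function below is only a generic illustration with a hypothetical name.

```python
def sum_squared_error(samples):
    """Sum over samples and attributes of (value - attribute mean) squared.

    `samples` is a non-empty list of equal-length numeric rows.
    """
    n = len(samples)
    r = len(samples[0])  # number of attributes
    means = [sum(row[i] for row in samples) / n for i in range(r)]
    return sum((row[i] - means[i]) ** 2 for row in samples for i in range(r))
```

A smaller value means the samples lie closer to their mean, which is how the comparison in Table 4 is read.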

Correlation results analysis

After the attack scene is clustered and divided, the attack process can be restored according to the idea of causal association, then the attack graph is described by using graph drawing software, and 5 representative attack graphs are selected for analysis:

The first attack diagram, shown in Fig. 6, restores a complete DoS attack process. The attacker first scans the hosts in the network; on finding an active target, the attacker performs vulnerability scanning (sadmind ping) on it and launches a buffer overflow attack on the host with IP address 61.163.217.30 to obtain its root privileges. Using that host as a stepping stone, the attacker then sends an mstream DoS atomic attack to host 61.163.217.255, thereby controlling puppet hosts to attack the network.

The attack process shown in the second attack diagram of Fig. 7 is relatively common. These attack actions are only simple behaviors such as scanning and initiating remote connections; they do not complete a real attack but still trigger alarm information. Such behaviors are not highly threatening, yet they still expose vulnerabilities in the network.

The third attack diagram, shown in Fig. 8, restores the process in which one target host is attacked by multiple hackers. Several intruders perform SYN or FIN scanning on the target host within the same time period, obtain active port information from the returned messages, attack the host further to obtain high-level privileges, and then use the host as a stepping stone to launch different types of distributed attacks on different hosts in the network.

The fourth attack diagram, shown in Fig. 9, depicts a distributed port attack process. The attack source attacks the same port at different IP addresses, controls the puppet hosts by remote login, searches for target ports with the current host as its base, launches local or remote attacks, and damages the target host through a buffer overflow attack.

The fifth attack diagram, shown in Fig. 10, is a typical process of exploiting vulnerabilities to obtain privileges and carry out a distributed attack. The attacker scans for vulnerability information on different targets and launches attacks against the different vulnerabilities to obtain higher-level privileges.

2. Correlation efficiency analysis

The correlation ratio and the false alarm rate are reasonable indexes for verifying the validity of alarm correlation; their calculation formulas are, respectively:

An association analysis method based on a causal knowledge network and an association analysis method based on attribute similarity and a knowledge base are selected for comparison with the method proposed herein; the comparison is shown in Table 5.

Table 5 Comparison of correlation ratio and false alarm rate

Correlation analysis method	False alarm rate	Correlation ratio
Method proposed herein	2.1%	96.7%
Correlation analysis method based on causal knowledge network	10.7%	83.6%
Correlation analysis method based on attribute similarity and knowledge base	4.5%	93.2%

As can be seen from Table 5, the hybrid alarm correlation method achieves higher correlation accuracy than the single correlation methods and effectively reduces the false alarm rate. Because the proposed method selects the AP algorithm, which has a better fit, its correlation ratio is higher than that of the association analysis method based on attribute similarity and a knowledge base, and its false alarm rate is also lower. This shows that the method can find more internal logical relations in massive alarm data and effectively reduces isolated alarms.

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A mixed alarm association method based on AP clustering and causal relationship is characterized by comprising the following steps:

normalizing the obtained original alarm log to obtain an alarm log;

matching and associating alarm data in the attack scene based on the causality idea;

the method for dividing the attack scenes by improving the similarity and combining the AP clustering method comprises the following steps:

acquiring similarity among a plurality of data points in the alarm log and reference degree of corresponding nodes, performing iterative update on an attraction matrix according to the corresponding similarity and reference degree, and performing iterative update on an attribution matrix according to a corresponding attraction value generated by the attraction matrix iteration;

introducing damping factors into an AP clustering algorithm, respectively attenuating the attraction degree matrix and the attribution degree matrix, and judging whether a candidate clustering center is stable or whether the iteration times reach a threshold value;

judging whether the candidate clustering center is stable or whether the iteration number reaches a threshold value, including:

if the candidate clustering center is not stable or the iteration times do not reach the threshold value, continuously carrying out iteration updating on the attraction degree matrix and the attribution degree matrix until the candidate clustering center is stable or the iteration times reach the threshold value, and outputting the clustering center and the divided attack scene set;

if the candidate clustering center is stable or the iteration number reaches a threshold value, judging whether the sum of the self-attraction degree and the self-attribution degree is greater than zero;

if the sum of the self-attraction degree and the self-attribution degree is greater than zero, the corresponding node is a clustering center;

if the sum of the self-attraction degree and the self-attribution degree is less than or equal to zero, the corresponding node is a non-clustering center;

calculating the overall similarity values of two pieces of alarm data by using a weighted average algorithm according to the attack type similarity value, the IP address similarity value, the port similarity value and the time similarity value of any two pieces of alarm data in the attack scene set, and clustering and dividing the overall similarity values after taking a negative value based on an AP clustering algorithm;

after the data after clustering division are sequenced according to the time sequence, a first piece of data is sequentially matched with a plurality of second pieces of data according to the step length of 1, the first piece of data with matched ports and IP addresses is divided into related corresponding alarm data, and the first piece of data with unmatched ports and IP addresses is divided into an isolated alarm queue until all data are matched.

2. The AP clustering and causal relationship-based hybrid alarm correlation method of claim 1, wherein said normalizing said obtained raw alarm logs to obtain alarm logs comprises: