CN111709022A - Hybrid alarm association method based on AP clustering and causal relationship - Google Patents

Info

Publication number
CN111709022A
Authority
CN
China
Prior art keywords
alarm
clustering
data
similarity
attack
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010549657.9A
Other languages
Chinese (zh)
Other versions
CN111709022B (en)
Inventor
陶晓玲
赵培超
石兰
顾涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202010549657.9A
Publication of CN111709022A
Application granted
Publication of CN111709022B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/552: Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques

Abstract

The invention discloses a hybrid alarm association method based on AP (affinity propagation) clustering and causal relationships. First, the acquired original alarm log is standardized according to the Intrusion Detection Message Exchange Format (IDMEF), and the alarm log is built from the extracted alarm attributes. Second, the similarity between the data points in the alarm log and the reference degree (preference) of each node are obtained, the attraction (responsibility) matrix and the attribution (availability) matrix are updated iteratively, and a damping factor is introduced into the AP clustering algorithm to attenuate the updates until the candidate cluster centers are stable or the number of iterations reaches a threshold; the cluster centers and the data set divided into attack scenes are then output. Finally, the overall similarity of any two pieces of alarm data is calculated with a weighted average, the negated overall similarity values are clustered and divided with the AP clustering algorithm, and the alarm data are sorted chronologically and matched for association, which improves association accuracy and time efficiency.

Description

Hybrid alarm association method based on AP clustering and causal relationship
Technical Field
The invention relates to the technical field of network security, in particular to a hybrid alarm association method based on AP clustering and causal relationship.
Background
With the rapid development of global informatization, large-scale information systems have become key infrastructure for nations and governments. As digitization accelerates, however, network security threats affect traditional industries and critical information infrastructure, and the risks faced by emerging fields such as artificial intelligence, cloud computing, big data and the Internet of Things are rising rapidly. In recent years China has continuously strengthened its own response capability in the face of a severe network security situation and has achieved some results. To protect network security effectively, administrators widely deploy security products such as firewalls and intrusion detection systems (IDS) in their networks. As networks keep growing in scale and structural complexity, existing IDSs have exposed many problems. Besides improving intrusion detection algorithms and optimizing detection models, researchers have therefore opened up a new line of work: fusing and analyzing mutually independent alarm information by means of alarm correlation technology.
With the wide application and development of alarm correlation technology in intrusion detection, a number of representative methods and results have appeared. They can be roughly divided into: (1) association methods based on predefined attack scenarios; (2) association methods based on causal relationships; (3) association methods based on alarm attribute similarity; and (4) association methods based on statistical analysis. Alarm correlation based on causal relationships requires a knowledge base: causal rules are established for each attack action, and correlation analysis is performed by pattern matching. Alarm correlation based on attribute similarity has a simple algorithm and good real-time performance, but there is no standard for attribute similarity, the final correlation result is strongly affected by manually set parameters such as similarity weight coefficients, and the result has poor logical structure, so it does not help an administrator understand the attack intention or the relationships between attack actions. After years of research, alarm correlation technology has made some progress, but recent work shows that no single processing method solves the alarm correlation problem perfectly, and existing methods inevitably increase computational overhead as they improve correlation accuracy. Using a hybrid architecture of several method types, so that different methods complement one another, is therefore a direction worth studying.
Disclosure of Invention
The invention aims to provide a hybrid alarm association method based on AP clustering and causal relationship, which improves association precision and time efficiency.
To achieve the above object, the invention provides a hybrid alarm association method based on AP clustering and causal relationship, comprising:
normalizing the acquired original alarm log to obtain an alarm log;
dividing attack scenes by using an improved similarity measure in combination with the AP clustering method;
and matching and associating the alarm data within each attack scene based on the idea of causal relationship.
Normalizing the acquired original alarm log to obtain the alarm log comprises:
standardizing the acquired original alarm log according to the Intrusion Detection Message Exchange Format (IDMEF) and, after deleting repeated alarm logs, extracting a plurality of alarm attributes from the original alarm data to form a 7-tuple, thereby obtaining the alarm log.
Dividing attack scenes by using the improved similarity measure in combination with the AP clustering method comprises:
acquiring the similarity between the data points in the alarm log and the reference degree of each node, iteratively updating the attraction matrix according to the corresponding similarity and reference degree, and iteratively updating the attribution matrix according to the attraction values produced by the update of the attraction matrix.
Dividing attack scenes by using the improved similarity measure in combination with the AP clustering method further comprises:
introducing a damping factor into the AP clustering algorithm, attenuating the attraction matrix and the attribution matrix respectively, and judging whether the candidate cluster centers are stable or whether the number of iterations has reached a threshold.
Judging whether the candidate cluster centers are stable or whether the number of iterations has reached the threshold comprises:
if the candidate cluster centers are not stable and the number of iterations has not reached the threshold, continuing to iteratively update the attraction matrix and the attribution matrix until the candidate cluster centers are stable or the number of iterations reaches the threshold, and then outputting the cluster centers and the divided attack scene set;
if the candidate cluster centers are stable or the number of iterations has reached the threshold, judging whether the sum of the self-attraction degree and the self-attribution degree is greater than zero;
if the sum of the self-attraction degree and the self-attribution degree is greater than zero, the corresponding node is a cluster center;
and if the sum of the self-attraction degree and the self-attribution degree is less than or equal to zero, the corresponding node is not a cluster center.
After the cluster centers and the divided attack scene set are output, the method further comprises:
calculating the overall similarity of any two pieces of alarm data in the attack scene set as a weighted average of their attack type similarity, IP address similarity, port similarity and time similarity, and clustering and dividing the negated overall similarity values with the AP clustering algorithm.
Matching and associating the alarm data within each attack scene based on the idea of causal relationship comprises:
sorting the clustered and divided data chronologically and then, with a step length of 1, matching a first piece of data against the subsequent pieces of data in turn; a first piece of data whose ports and IP addresses match is assigned to the associated alarm data, and a first piece of data whose ports and IP addresses do not match is placed in an isolated alarm queue, until all data have been matched.
In the hybrid alarm association method based on AP clustering and causal relationship of the invention, the acquired original alarm log is first standardized according to the Intrusion Detection Message Exchange Format (IDMEF), and the alarm log is built from the extracted alarm attributes. Second, the similarity between the data points in the alarm log and the reference degree of each node are obtained, the attraction matrix and the attribution matrix are updated iteratively, and a damping factor is introduced into the AP clustering algorithm to attenuate the updates until the candidate cluster centers are stable or the number of iterations reaches a threshold; the cluster centers and the data set divided into attack scenes are then output. Finally, the overall similarity of any two pieces of alarm data is calculated with a weighted average, the negated overall similarity values are clustered and divided with the AP clustering algorithm, and the alarm data are sorted chronologically and matched for association, which improves association accuracy and time efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic step diagram of a hybrid alarm association method based on AP clustering and causal relationship provided in the present invention.
FIG. 2 is a diagram of the information transfer between data points provided by the present invention.
FIG. 3 is a flow chart of causal relationship based alarm association provided by the present invention.
Fig. 4 is a network topology diagram of the honeypot system provided by the invention.
Fig. 5 is a comparison graph of the run time provided by the present invention.
Fig. 6 is a first attack diagram provided by the present invention.
Fig. 7 is a second attack diagram provided by the present invention.
Fig. 8 is a third attack diagram provided by the present invention.
Fig. 9 is a fourth attack diagram provided by the present invention.
Fig. 10 is a fifth attack diagram provided by the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar function. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting it.
Referring to fig. 1, the present invention provides a hybrid alarm association method based on AP clustering and causal relationship, including:
S101, normalizing the acquired original alarm log to obtain the alarm log.
Specifically, to protect network security effectively, an administrator usually deploys several intrusion detection systems in the network security protection system. The alarm data generated by different intrusion detection systems have different formats and cannot be used directly for correlation analysis, so normalizing the alarm log is the basis of all subsequent work. The Intrusion Detection Message Exchange Format (IDMEF) is a standard format initiated and established by the Intrusion Detection Working Group (IDWG); it provides a basis for information sharing and format exchange between IDSs of different types. The acquired original alarm log is therefore standardized according to IDMEF and, after repeated alarm logs are deleted, 8 alarm attributes are extracted from the original alarm data to form a 7-tuple, which yields the alarm log. The meanings of the extracted alarm attributes are shown in Table 1.
Table 1 Meanings of alarm data attributes
[Table 1 is an image in the original publication and is not reproduced in the text.]
Because the same intrusion event triggers several intrusion detection devices in the network security system at the same time, the replayed data contain a large number of repeated alarms. To address this, alarms that fall within a given time threshold and whose attributes are identical, apart from the signature attribute and the time attribute, are merged into a single alarm. This reduces the repeated alarm data and prepares for the subsequent alarm correlation work.
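The merging rule above can be sketched in Python as follows. This is a minimal illustration only: the field names (`signature`, `time`, `src_ip`, and so on), the 2-second window and the choice to measure the window from the most recently kept alarm are assumptions, since Table 1 and the actual threshold are not reproduced in the text.

```python
from datetime import datetime, timedelta

# Attributes ignored when deciding whether two alarms are duplicates.
IGNORED = ("signature", "time")

def merge_duplicates(alarms, window=timedelta(seconds=2)):
    """Keep one alarm per group of alarms that agree on every attribute
    except signature and time and lie within `window` of the kept alarm."""
    kept = []
    last_kept_time = {}  # duplicate key -> time of the most recently kept alarm
    for alarm in sorted(alarms, key=lambda a: a["time"]):
        key = tuple(sorted((k, v) for k, v in alarm.items() if k not in IGNORED))
        previous = last_kept_time.get(key)
        if previous is None or alarm["time"] - previous > window:
            kept.append(alarm)
            last_kept_time[key] = alarm["time"]
    return kept

# Hypothetical sample alarms (field names are illustrative).
t0 = datetime(2014, 10, 1, 12, 0, 0)
base = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "dst_port": 80, "type": "scan"}
alarms = [
    {**base, "signature": "ids-A", "time": t0},
    {**base, "signature": "ids-B", "time": t0 + timedelta(seconds=1)},   # merged
    {**base, "signature": "ids-A", "time": t0 + timedelta(seconds=10)},  # outside window
    {**base, "dst_port": 22, "signature": "ids-A", "time": t0},          # different port
]
merged = merge_duplicates(alarms)
```

In this sample the second alarm is absorbed by the first, while the alarm ten seconds later and the alarm on a different port are kept.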
S102, dividing attack scenes by using the improved similarity measure in combination with the AP clustering method.
Specifically, the similarity between the data points in the alarm log and the reference degree of each node are obtained. A similarity matrix S formed by the similarities between N data points is the input to the cluster analysis, where S(i, j) denotes the similarity between node i and node j. The concept of reference degree (preference) is introduced to indicate how suitable a data point is as a cluster center. The reference degree of point i is written P(i) or S(i, i); the larger its value, the more likely point i is to become a cluster center. Because the AP algorithm treats every data point as a potential cluster center, all P take the same value. The final number of clusters is strongly affected by the reference degree, and P is usually set to the median or the minimum of the input similarity values.
The attraction matrix is updated iteratively according to the corresponding similarity and reference degree, and the attribution matrix is updated iteratively according to the attraction values produced by updating the attraction matrix. Two concepts are introduced, the attraction degree (responsibility) and the attribution degree (availability); messages are passed and updated between data points by repeatedly updating the attraction matrix and the attribution matrix, which finally yields the cluster center points. The attraction degree r(i, j) is the message sent from data point i to data point j and reflects how suitable data point j is as the cluster center of data point i; the larger r is, the more likely data point j is to be the cluster center of data point i. The attribution degree a(i, j) is the message sent from data point j to data point i and reflects how appropriate it is for data point i to choose data point j as its cluster center; the larger a is, the more likely data point i is to choose point j as its cluster center. The way messages are passed between data points is shown in Fig. 2. The update formula of the attraction matrix R is:
[Formula image not reproduced in the text: update formula of the attraction matrix R]
The update formula of the attribution matrix A is:
[Formula image not reproduced in the text: update formula of the attribution matrix A]
Meanwhile, to avoid data oscillation during the matrix updates, a damping factor λ is introduced into the AP clustering algorithm to attenuate the attraction matrix and the attribution matrix respectively. The update formulas are:
$$R_{t+1}(i,k) = \lambda \, r_t(i,k) + (1-\lambda)\, r_{t+1}(i,k)$$
$$A_{t+1}(i,k) = \lambda \, a_t(i,k) + (1-\lambda)\, a_{t+1}(i,k)$$
where $r_{t+1}(i,k)$ is the attraction degree between point i and point k after the (t+1)-th update and $R_{t+1}(i,k)$ is the attraction degree after attenuation; $a_{t+1}(i,k)$ is the attribution degree after the (t+1)-th update and $A_{t+1}(i,k)$ is the attribution degree after attenuation.
It is then judged whether the candidate cluster centers are stable or whether the number of iterations has reached a threshold. If the candidate cluster centers are not stable and the number of iterations has not reached the threshold, the attraction matrix and the attribution matrix continue to be updated iteratively until the candidate cluster centers are stable or the number of iterations reaches the threshold, and the cluster centers and the divided attack scene set are output.
If the candidate cluster centers are stable or the number of iterations has reached the threshold, it is judged whether the sum of the self-attraction degree r(i, i) and the self-attribution degree a(i, i) is greater than zero. If the sum is greater than zero, that is, r(i, i) + a(i, i) > 0, the corresponding node i is a cluster center. If the sum is less than or equal to zero, that is, r(i, i) + a(i, i) ≤ 0, the corresponding node is not a cluster center; for such a non-center point j, the data point k that maximizes r(j, k) + a(j, k) is the cluster center of data point j. Finally, the cluster centers and the data set to which each classified item belongs are output.
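The iteration described above can be sketched in pure Python. Because the update formulas for R and A are images in the original, this sketch assumes the standard affinity propagation updates, r(i,k) = s(i,k) − max over k' ≠ k of [a(i,k') + s(i,k')], a(i,k) = min(0, r(k,k) + Σ max(0, r(i',k))) for i ≠ k, and a(k,k) = Σ max(0, r(i',k)); the patent's exact formulas may differ. It also reads the damping formulas as blending each new value with the previously stored one. The parameter names and the stability window `stable_iter` are illustrative.

```python
from statistics import median

def affinity_propagation(S, damping=0.5, max_iter=200, stable_iter=15):
    """S is an n x n similarity matrix (n >= 2) whose diagonal holds the
    reference degree. Returns (cluster centers, label of each point)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # attraction (responsibility)
    A = [[0.0] * n for _ in range(n)]  # attribution (availability)
    centers, previous, unchanged = [], None, 0
    for _ in range(max_iter):
        # Attraction update, damped with the previously stored value.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Attribution update, damped the same way.
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]  # sum over i' != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # A node is a cluster center when r(k,k) + a(k,k) > 0.
        centers = [k for k in range(n) if R[k][k] + A[k][k] > 0]
        unchanged = unchanged + 1 if centers == previous else 0
        previous = centers
        if centers and unchanged >= stable_iter:
            break
    if not centers:
        return [], [None] * n
    # Non-center points join the center k that maximizes r(i,k) + a(i,k).
    labels = [i if i in centers
              else max(centers, key=lambda k: R[i][k] + A[i][k])
              for i in range(n)]
    return centers, labels

# Toy data: two well-separated groups on a line, negative squared distance
# as similarity, and the median similarity as the shared reference degree.
points = [0, 1, 2, 10, 11, 12]
S = [[-float((x - y) ** 2) for y in points] for x in points]
preference = median(S[i][j] for i in range(6) for j in range(6) if i != j)
for i in range(6):
    S[i][i] = preference
centers, labels = affinity_propagation(S)
```

On this toy input the two groups of points end up under two different cluster centers. The nested loops cost O(N²) per iteration, which is consistent with the complexity discussion later in the text.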
The standard AP algorithm uses the Euclidean distance as its similarity criterion. This is a similarity measure for numerical variables and is not suitable for the non-numerical attributes of alarm logs. A different similarity calculation is therefore defined for each attribute of the alarm data, the overall similarity between pieces of alarm data is obtained as the weighted average of the attribute similarities, and the resulting similarity matrix is used as the input of the AP algorithm.
First, the attack type similarity, IP address similarity, port similarity and time similarity of any two pieces of alarm data in the data set are calculated, specifically:
Attack type similarity:
For the alarm type attribute, if two pieces of alarm data $alert_i$ and $alert_j$ have the same alarm type, the similarity is set to 1; otherwise it is set to 0:
$$sim_{type}(alert_i, alert_j) = \begin{cases} 1, & \text{if the alarm types are the same} \\ 0, & \text{otherwise} \end{cases}$$
IP address similarity calculation:
The IP addresses in the alarm log are expressed in decimal form. Following reference [79], the similarity is calculated by comparing the number of identical prefix bits, so the IP addresses are first converted into binary representation. The calculation formula is:
[Formula image not reproduced in the text: IP address similarity]
where r is the number of consecutive identical bits, counted from the high-order end to the low-order end, in the IP addresses of the two pieces of alarm data.
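A minimal sketch of the prefix comparison follows. The text defines r but the formula itself is an image, so dividing r by the 32 bits of an IPv4 address is an assumption made here for illustration; the function name is also hypothetical.

```python
import ipaddress

def ip_prefix_similarity(ip_a: str, ip_b: str) -> float:
    """Share of leading bits that two IPv4 addresses have in common."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    # The XOR has its highest set bit at the first position where a and b differ.
    r = 32 - (a ^ b).bit_length()  # number of identical leading bits
    return r / 32                  # assumed normalization by the address length

same_subnet = ip_prefix_similarity("61.163.217.30", "61.163.217.255")
identical = ip_prefix_similarity("10.0.0.1", "10.0.0.1")
unrelated = ip_prefix_similarity("10.0.0.1", "192.168.0.1")
```

The two addresses from the first attack diagram share their first 24 bits, so under this assumed normalization they score 0.75, while addresses that differ in the very first bit score 0.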
Port similarity calculation:
The port number is treated as a Boolean attribute: if the ports of two alarms are exactly the same, the similarity is 1, and if they differ, the similarity is 0:
$$sim_{port}(alert_i, alert_j) = \begin{cases} 1, & \text{if the ports are the same} \\ 0, & \text{otherwise} \end{cases}$$
Time similarity calculation:
Most studies set a time window to define a limiting threshold on the alarm time, and the chosen window size affects the final similarity. Here the date attributes are compared first: for alarms with the same date, the time similarity is calculated with a sigmoid function; otherwise the similarity is 0. The calculation formula is:
[Formula image not reproduced in the text: time similarity]
where
[Formula image not reproduced in the text: auxiliary definition used in the time similarity]
Next, the overall similarity of the two pieces of alarm data is calculated with a weighted average. To avoid the subjectivity of manually assigned values, the weight of each characteristic attribute is determined by principal component analysis. The overall similarity of two alarms $alert_i$ and $alert_j$ is calculated as follows:
[Formula image not reproduced in the text: overall similarity]
where $sim_l$ is the similarity of the l-th attribute and $\omega_l$ is the weight of that attribute.
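The weighted average can be illustrated with a short sketch. The weights below are placeholders chosen for the example, not the values the principal component analysis produced, and dividing by the sum of the weights is an assumption that makes the result a true weighted average even when the weights do not sum to 1.

```python
def overall_similarity(sims, weights):
    """Weighted average of per-attribute similarities."""
    total_weight = sum(weights.values())
    return sum(weights[name] * sims[name] for name in weights) / total_weight

# Placeholder weights and per-attribute similarities (illustrative values).
weights = {"type": 0.4, "ip": 0.3, "port": 0.2, "time": 0.1}
sims = {"type": 1.0, "ip": 0.75, "port": 0.0, "time": 0.5}
score = overall_similarity(sims, weights)
```

With these numbers the result is 0.4·1.0 + 0.3·0.75 + 0.2·0.0 + 0.1·0.5 = 0.675.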
A similarity matrix is obtained with the similarity calculations above. The negated overall similarity values are then clustered and divided with the AP clustering algorithm, so that alarm data with high similarity are grouped into the same attack scene. The pseudocode for attack scene division is as follows:
Attack scene division based on AP clustering
Input: alarm data set Alert = {a_1, a_2, …, a_n}
Output: attack scene set Scene = {scene_1, scene_2, …, scene_n}
// compute the similarity matrix
Similarity = [sim_11, sim_12, …, sim_nn]
// initialize the attraction and attribution matrices
InitialResponsibility = [r_11, r_12, …, r_nn]
InitialAvailability = [a_11, a_12, …, a_nn]
// update the attraction and attribution matrices
If r(i,i) + a(i,i) < 0
    For (i = 0; i < n; i++)
    {
        R_{t+1}(i,j) = λ * r_t(i,j) + (1-λ) * r_{t+1}(i,j)
        A_{t+1}(i,j) = λ * a_t(i,j) + (1-λ) * a_{t+1}(i,j)
    }
EndIf
// record the cluster centers
String center = updatecenter(classlist, center)
Readcenter.filewriter(center)
Return Scene = {scene_1, scene_2, …, scene_n}
S103, matching and associating the alarm data within each attack scene based on the idea of causal relationship.
Specifically, the AP clustering method divides the massive, unordered alarm logs according to similarity into a set of attack scenes with small intra-class distances and large inter-class distances. The causal-relationship-based alarm association flow is shown in Fig. 3:
The clustered alarm data sets are read in turn, and association analysis is performed on the data in each attack scene one after another. The clustered and divided data are sorted chronologically; then, with a step length of 1, a first piece of data is matched against the subsequent pieces of data in turn. A first piece of data whose ports and IP addresses match is assigned to the associated alarm data, and a first piece of data whose ports and IP addresses do not match is placed in an isolated alarm queue, until all data have been matched.
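The matching loop can be sketched as follows. The text does not spell out what counts as a port and IP match, so the predicate `default_match` is a hypothetical rule introduced for illustration (same destination IP and port, or the later alarm originating from the earlier alarm's target); the field names are likewise assumptions.

```python
def default_match(first, later):
    """Hypothetical matching rule: same target IP and port, or the later
    alarm is sent from the host that the first alarm targeted."""
    same_target = (first["dst_ip"], first["dst_port"]) == (later["dst_ip"], later["dst_port"])
    pivot = later["src_ip"] == first["dst_ip"]
    return same_target or pivot

def associate(alarms, matches=default_match):
    """Sort one attack scene by time, then compare each unassigned alarm
    with every later alarm, advancing one alarm at a time (step length 1)."""
    ordered = sorted(alarms, key=lambda a: a["time"])
    chains, isolated, used = [], [], set()
    for i, first in enumerate(ordered):
        if i in used:
            continue
        chain = [first]
        for j in range(i + 1, len(ordered)):
            if j not in used and matches(first, ordered[j]):
                chain.append(ordered[j])
                used.add(j)
        if len(chain) > 1:
            chains.append(chain)    # associated alarm data
        else:
            isolated.append(first)  # isolated alarm queue
    return chains, isolated

# Hypothetical scene: scan, overflow, DoS via the compromised host, and noise.
scene = [
    {"time": 3, "src_ip": "2.2.2.2", "dst_ip": "3.3.3.3", "dst_port": 80, "type": "dos"},
    {"time": 1, "src_ip": "1.1.1.1", "dst_ip": "2.2.2.2", "dst_port": 111, "type": "scan"},
    {"time": 4, "src_ip": "9.9.9.9", "dst_ip": "8.8.8.8", "dst_port": 22, "type": "login"},
    {"time": 2, "src_ip": "1.1.1.1", "dst_ip": "2.2.2.2", "dst_port": 111, "type": "overflow"},
]
chains, isolated = associate(scene)
```

Passing the predicate as a parameter keeps the loop independent of whichever matching rule is actually used.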
The causal association method holds that most complete network attacks are carried out in a certain logical order through attack behaviors at different stages, with each attack behavior providing the conditions for the next intrusion step. For example, to mount a TCP SYN denial-of-service attack on a host, an attacker must first send messages to the target computer and can only carry out the next attack action after the target receives them and establishes a connection. Every attack action therefore has a required premise and produces a corresponding result; by linking the premises and results of the alarm information according to certain rules, a complete attack sequence can be found. The method borrows this idea to perform association analysis on the alarm data within the same attack scene, and thereby describes the logical relationships among the alarm data.
Experiments with the method were run on a Windows 10 machine with an Intel Core i7-8550U processor (2.4 GHz), 8 GB of memory and a 1 TB hard disk. The alarm correlation algorithm was programmed in Python 3.6 under Windows, using PyCharm Community 2017.3 as the programming platform, and the scikit-learn library was used to process the alarm data files simply and efficiently. After alarm association is finished, the attack scenes are visualized as attack graphs with the Graphviz graph drawing tool. The honeypot data set of the Honeynet Project, a security research organization, was selected for research and analysis. A honeypot system is built to simulate a real network system and runs disguised services in order to lure intruders into launching attacks, so that the related information can be captured and analyzed. The network topology of the honeypot system is shown in Fig. 4. Thousands of virtual systems can be deployed in one virtual honeypot system, each with a different IP address and port numbers, which guarantees the diversity of the data sources. The data come from real network attacks and cover several aspects, including operating systems, brute-force network attacks, host vulnerabilities and port scanning. The attack types contained in the data are shown in Table 3.
Table 3 Honeypot data attack types
[Table 3 is an image in the original publication and is not reproduced in the text.]
The data must be preprocessed before alarm association. A total of 123854 pieces of alarm data from the October 2014 honeypot data set were selected; after preprocessing and de-duplication the number of alarms fell to 80928, so about 35% of the alarms were removed as repeats, which avoids repeated association. Because the complexity of the alarm association method is concentrated mainly in the AP-clustering-based attack scene division, the K-means algorithm, the most commonly used clustering method in machine learning, was selected for comparison and analysis.
1. Time complexity analysis
During clustering the AP method uses the similarities to calculate the attraction and attribution degrees. The N samples form an N-order similarity input matrix, the attraction and attribution degrees are updated N times, and attenuation is applied after each update. The number of iterations depends on the sample size N and the reference degree p, and is p log N in the limiting case. Because p is a constant, the time complexity of the AP algorithm is O(N² log N). For the K-means algorithm, the cost is determined jointly by the sample size N and the number of cluster centers K: every data point is compared in a loop with K different cluster centers, so the time complexity of the algorithm is only O(N × K). The running-time comparison is shown in Fig. 5.
The figure shows that the AP algorithm runs fast when the sample size is small, but once the data size grows to 500 the K-means algorithm is clearly more efficient. However, the cluster centers can be extracted during earlier training, so that newly input data only need to be compared with the set of cluster centers to complete the cluster division. This effectively reduces the number of samples handled in real-time association and compensates for the AP clustering algorithm's weakness in time efficiency. In addition, the sum of squared errors is used as a verification index to compare the two algorithms; it is calculated as follows:
[Formula image not reproduced in the text: sum of squared errors]
where r is the number of attributes in the sample data and $n_i$ is the i-th attribute of the n-th sample. The fitting accuracy of the model is checked by comparing, for each attribute, the difference between the sample values and the mean of the n samples. The results are shown in Table 4; they show that AP clustering fits considerably better than the K-means algorithm.
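As an illustration of the index described above, the following sketch computes a sum of squared errors for one group of samples, taking the squared deviation of each attribute value from that attribute's mean. The exact formula in the original is an image, so any normalization behind the values in Table 4 is unknown; this is the plain, unnormalized form and the function name is hypothetical.

```python
def sum_squared_errors(samples):
    """Sum over samples and attributes of (value - attribute mean) squared."""
    n = len(samples)
    r = len(samples[0])  # number of attributes per sample
    means = [sum(row[i] for row in samples) / n for i in range(r)]
    return sum((row[i] - means[i]) ** 2 for row in samples for i in range(r))

# Two samples with two attributes each; the attribute means are 2 and 3.
sse = sum_squared_errors([[1.0, 2.0], [3.0, 4.0]])
```

Each of the four attribute values deviates from its mean by 1, so the result is 4.0. A lower value indicates that the samples lie closer to their mean.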
Table 4 Comparison of goodness of fit
Clustering method       Sum of squared errors
AP clustering           0.46
K-means clustering      2.38
Correlation result analysis
After the attack scenes are clustered and divided, the attack process can be restored according to the idea of causal association, and the attack graphs are then drawn with graph drawing software. Five representative attack graphs are selected for analysis:
The first attack diagram, shown in Fig. 6, restores a complete DoS attack process. The attacker first scans the hosts in the network and, on finding an active target, performs a vulnerability scan (sadmind ping) on it. The attacker then launches a buffer overflow attack on the host with IP address 61.163.217.30 to obtain its root privileges and, using that host as a stepping stone, sends an Mstream DoS atomic attack to the host 61.163.217.255, thereby controlling the puppet host to launch an attack on the network.
The attack process shown in the second attack diagram, in Fig. 7, is relatively common. The attack actions are only simple ones such as scanning and initiating remote connections, and no real attack is completed, but alarm information is still triggered. These actions are not a serious threat, yet they still expose vulnerabilities in the network.
The third attack diagram, in Fig. 8, restores a process in which one target host is attacked by several hackers. Several intruders perform SYN or FIN scans on the target host in the same time period and obtain information about active ports from the returned messages. They then attack the host further, obtain high-level privileges, and use the host as a stepping stone to launch different types of distributed attacks on different hosts in the network.
The fourth attack diagram, in Fig. 9, shows a distributed port attack process. The attack source attacks the same port at different IP addresses and takes control of the puppet hosts by remote login. With the current host as its base, it then searches for a target port, launches a local or remote attack, and damages the target host with a buffer overflow attack.
The fifth attack diagram, in Fig. 10, is a typical process of exploiting vulnerabilities to obtain privileges and carry out a distributed attack. The attacker obtains vulnerability information for different targets through vulnerability scanning and attacks the different vulnerabilities to obtain higher-level privileges.
2. Correlation efficiency analysis
The correlation ratio and the false alarm rate are suitable indicators for verifying the effectiveness of alarm correlation; they are calculated, respectively, as follows:
[Equation image BDA0002541996710000121: correlation ratio]
[Equation image BDA0002541996710000122: false alarm rate]
A correlation analysis method based on a causal knowledge network and a correlation analysis method based on attribute similarity and a knowledge base are selected for comparison with the method proposed herein; the comparison is shown in Table 5.
Table 5. Comparison of correlation ratio and false alarm rate
Correlation analysis method | False alarm rate | Correlation ratio
Method proposed herein | 2.1% | 96.7%
Correlation analysis method based on causal knowledge network | 10.7% | 83.6%
Correlation analysis method based on attribute similarity and knowledge base | 4.5% | 93.2%
As Table 5 shows, hybrid alarm correlation achieves higher correlation accuracy than a single correlation method and effectively reduces the false alarm rate. Because the method of the invention uses the AP algorithm, which has a higher fitting degree, its correlation ratio is higher than that of the method based on attribute similarity and a knowledge base (96.7% versus 93.2%), and its false alarm rate is also lower (2.1% versus 4.5%). This shows that the method can uncover more internal logical relations in massive alarm data and effectively reduces the number of isolated alarms.
The invention relates to a hybrid alarm association method based on AP clustering and causal relationship. First, the obtained original alarm log is standardized based on the Intrusion Detection Message Exchange Format (IDMEF), and the alarm log is obtained from the extracted alarm attributes. Second, the similarity between the data points in the alarm log and the reference degree (preference) of the corresponding nodes are obtained, the attraction degree (responsibility) matrix and the attribution degree (availability) matrix are updated iteratively, and a damping factor is introduced into the AP clustering algorithm for attenuation until the candidate cluster centers are stable or the number of iterations reaches a threshold; the cluster centers and the data set divided into attack scenes are then output. Finally, the overall similarity value of any two pieces of alarm data is calculated with a weighted average algorithm, the negated overall similarity values are clustered and divided with the AP clustering algorithm, and the alarm data are sorted in time order and matched for association, which improves both association accuracy and time efficiency.
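The iterative update summarized above can be sketched in a few lines. The code below is a minimal, pure-Python rendering of the standard affinity propagation (AP) update rules, in which the attraction degree matrix corresponds to what the AP literature usually calls the responsibility matrix and the attribution degree matrix to the availability matrix. It is not the patent's implementation: the damping value, the stability window `stable_iters`, the fallback for the case where no center is found, and the toy data with negative squared distance as similarity are all assumptions made for illustration.

```python
import statistics


def affinity_propagation(S, damping=0.5, max_iter=200, stable_iters=15):
    """Cluster n points from an n x n similarity matrix S (list of lists).

    S[k][k] holds the preference ("reference degree") of node k.
    Returns (exemplars, labels), where labels[i] is the exemplar of point i.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # attraction (responsibility) matrix
    A = [[0.0] * n for _ in range(n)]  # attribution (availability) matrix
    previous, stable = None, 0

    for _ in range(max_iter):
        # Attraction update, attenuated by the damping factor.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(v for j, v in enumerate(scores) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)

        # Attribution update, using the freshly updated attraction values.
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(positive) - positive[k]  # sum over i' != k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

        # A node is a candidate center when self-attraction + self-attribution > 0.
        candidates = [k for k in range(n) if R[k][k] + A[k][k] > 0]
        if candidates and candidates == previous:
            stable += 1
            if stable >= stable_iters:
                break
        else:
            previous, stable = candidates, 0

    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        # Guard added for this sketch only: fall back to the best-scoring node.
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    # Each center labels itself; other points join the most similar center.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Toy one-dimensional data, invented for illustration.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
n = len(points)
S = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
# Assumed preference: the median of the off-diagonal similarities.
preference = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
for k in range(n):
    S[k][k] = preference
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)
```

A larger damping factor slows each update and makes oscillation less likely, at the cost of more iterations before the candidate centers stabilize.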
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A hybrid alarm association method based on AP clustering and causal relationship, characterized by comprising the following steps:
normalizing the obtained original alarm log to obtain an alarm log;
dividing attack scenes by combining an improved similarity measure with the AP clustering method; and
matching and associating the alarm data within each attack scene based on the idea of causal relationship.
2. The hybrid alarm association method based on AP clustering and causal relationship of claim 1, wherein normalizing the obtained original alarm log to obtain an alarm log comprises:
standardizing the obtained original alarm log based on the intrusion detection message exchange format and, after deleting duplicate alarm logs, extracting a plurality of alarm attributes from the original alarm data to form a 7-tuple, thereby obtaining the alarm log.
3. The hybrid alarm association method based on AP clustering and causal relationship of claim 2, wherein dividing attack scenes by combining an improved similarity measure with the AP clustering method comprises:
acquiring the similarity between a plurality of data points in the alarm log and the reference degree of the corresponding nodes, iteratively updating an attraction degree matrix according to the corresponding similarity and reference degree, and iteratively updating an attribution degree matrix according to the attraction values produced by the iterative update of the attraction degree matrix.
4. The hybrid alarm association method based on AP clustering and causal relationship of claim 3, wherein dividing attack scenes by combining an improved similarity measure with the AP clustering method further comprises:
introducing a damping factor into the AP clustering algorithm, attenuating the attraction degree matrix and the attribution degree matrix respectively, and judging whether the candidate cluster centers are stable or whether the number of iterations has reached a threshold.
5. The hybrid alarm association method based on AP clustering and causal relationship of claim 4, wherein judging whether the candidate cluster centers are stable or whether the number of iterations has reached a threshold comprises:
if the candidate cluster centers are not stable or the number of iterations has not reached the threshold, continuing to iteratively update the attraction degree matrix and the attribution degree matrix until the candidate cluster centers are stable or the number of iterations reaches the threshold, and outputting the cluster centers and the divided attack scene set;
if the candidate cluster centers are stable or the number of iterations has reached the threshold, judging whether the sum of the self-attraction degree and the self-attribution degree is greater than zero;
if the sum of the self-attraction degree and the self-attribution degree is greater than zero, the corresponding node is a cluster center; and
if the sum of the self-attraction degree and the self-attribution degree is less than or equal to zero, the corresponding node is not a cluster center.
6. The hybrid alarm association method based on AP clustering and causal relationship of claim 5, wherein outputting the cluster centers and the divided attack scene set further comprises:
calculating the overall similarity value of any two pieces of alarm data in the attack scene set as a weighted average of their attack type similarity value, IP address similarity value, port similarity value, and time similarity value, and clustering and dividing the data with the AP clustering algorithm after negating the overall similarity values.
7. The hybrid alarm association method based on AP clustering and causal relationship of claim 6, wherein matching and associating the alarm data within each attack scene based on the idea of causal relationship comprises:
after sorting the clustered and divided data in time order, sequentially matching a first piece of data against a plurality of second pieces of data with a step length of 1; classifying first pieces of data whose ports and IP addresses match as associated alarm data, and placing first pieces of data whose ports and IP addresses do not match in an isolated alarm queue, until all data have been matched.
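The weighted-average similarity of claim 6 and the time-ordered matching of claim 7 can be illustrated with a short sketch. Everything specific in it is an assumption: the `Alert` fields, the weights, the attacker address and port numbers, and above all the rule in `default_match`, since the claims state only that ports and IP addresses must match, not how. The sketch also keeps an alert out of the isolated queue when an earlier alert has already been linked to it, which is one possible reading of the claim.

```python
from collections import namedtuple

# Hypothetical alert record; the field names are illustrative only.
Alert = namedtuple("Alert", "time src_ip dst_ip dst_port attack_type")


def overall_similarity(parts, weights):
    """Weighted average of the attack type, IP, port and time similarity values."""
    return sum(w * p for w, p in zip(weights, parts)) / sum(weights)


def default_match(earlier, later):
    # Assumed rule: the later alert hits the same host and port, or it
    # originates from the host that the earlier alert targeted.
    same_target = (earlier.dst_ip == later.dst_ip
                   and earlier.dst_port == later.dst_port)
    pivot = earlier.dst_ip == later.src_ip
    return same_target or pivot


def correlate(alerts, match=default_match):
    """Return (links, isolated) for the alerts of one attack scene."""
    ordered = sorted(alerts, key=lambda a: a.time)  # sort in time order
    links, isolated, linked = [], [], set()
    for i, first in enumerate(ordered):
        found = False
        for j in range(i + 1, len(ordered)):  # step length of 1
            if match(first, ordered[j]):
                links.append((first, ordered[j]))
                linked.update((i, j))
                found = True
        if not found and i not in linked:
            isolated.append(first)  # no predecessor and no successor
    return links, isolated


# Input to AP clustering is the negated overall similarity (claim 6).
ap_input = -overall_similarity([1.0, 0.5, 0.0, 0.5], [4, 3, 2, 1])

# Toy alerts loosely following the first attack graph; attacker address,
# ports and the unrelated fourth alert are invented for illustration.
a1 = Alert(1, "203.0.113.5", "61.163.217.30", 111, "sadmind-ping")
a2 = Alert(2, "203.0.113.5", "61.163.217.30", 111, "buffer overflow")
a3 = Alert(3, "61.163.217.30", "61.163.217.255", 9999, "Mstream DoS")
a4 = Alert(4, "198.51.100.7", "192.0.2.9", 23, "remote login")
links, isolated = correlate([a3, a1, a4, a2])
print(len(links), isolated)
```

Passing the matching rule as a parameter keeps the pairing loop independent of whatever port and IP criteria are actually used.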
CN202010549657.9A 2020-06-16 2020-06-16 Hybrid alarm association method based on AP clustering and causal relationship Active CN111709022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010549657.9A CN111709022B (en) 2020-06-16 2020-06-16 Hybrid alarm association method based on AP clustering and causal relationship

Publications (2)

Publication Number Publication Date
CN111709022A true CN111709022A (en) 2020-09-25
CN111709022B CN111709022B (en) 2022-08-19

Family

ID=72540951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010549657.9A Active CN111709022B (en) 2020-06-16 2020-06-16 Hybrid alarm association method based on AP clustering and causal relationship

Country Status (1)

Country Link
CN (1) CN111709022B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1878093A (en) * 2006-07-19 2006-12-13 华为技术有限公司 Security event associative analysis method and system
US20070008098A1 (en) * 2005-07-08 2007-01-11 Hsing-Kuo Wong Method and architecture for online classification-based intrusion alert correlation
CN103746961A (en) * 2013-12-12 2014-04-23 中国人民解放军63928部队 Method, apparatus and server for mining causal knowledge of network attack scenario
CN105471623A (en) * 2015-11-16 2016-04-06 中国烟草总公司江苏省公司 Key IP address safety alarm association analysis method based on fuzzy scene
CN106911629A (en) * 2015-12-22 2017-06-30 中国移动通信集团公司 A kind of alert correlation method and device
CN108833139A (en) * 2018-05-22 2018-11-16 桂林电子科技大学 A kind of OSSEC alert data polymerization divided based on category attribute
CN109450946A (en) * 2018-12-27 2019-03-08 浙江大学 A kind of unknown attack scene detection method based on alert correlation analysis
CA3025386A1 (en) * 2017-11-27 2019-05-27 Pelmorex Corp. Systems and methods for location-based alert generation
CN110381015A (en) * 2019-06-03 2019-10-25 西安电子科技大学 A kind of clustering method based on intruding detection system warning message
CN110474885A (en) * 2019-07-24 2019-11-19 桂林电子科技大学 Alert correlation analysis method based on time series and IP address
CN110659997A (en) * 2019-08-15 2020-01-07 中国平安财产保险股份有限公司 Data cluster identification method and device, computer system and readable storage medium
WO2020013958A1 (en) * 2018-07-10 2020-01-16 Siemens Aktiengesellschaft Hybrid unsupervised machine learning framework for industrial control system intrusion detection

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FENG XUEWEI: "An Approach of Discovering Causal Knowledge for Alert Correlating Based on Data Mining", 《2014 IEEE 12TH INTERNATIONAL CONFERENCE ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING》 *
YANG BO 等: "novel correlation analysis of alarms based on block matching similarities", 《INDUSTRIAL & ENGINEERING CHEMISTRY PROCESS DESIGN AND DEVELOPMENT》 *
林斌等: "基于近邻传播聚类的电力通信告警分析方法", 《电子设计工程》 *
王硕 等: "基于因果知识网络的攻击场景构建方法", 《计算机研究与发展》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282639A (en) * 2021-04-27 2021-08-20 深圳市中燃科技有限公司 Gas leakage data monitoring method and system, intelligent terminal and storage medium
CN113420802A (en) * 2021-06-04 2021-09-21 桂林电子科技大学 Alarm data fusion method based on improved spectral clustering
CN113420802B (en) * 2021-06-04 2023-05-30 桂林电子科技大学 Alarm data fusion method based on improved spectral clustering
CN114287903A (en) * 2021-12-31 2022-04-08 佳禾智能科技股份有限公司 Heart rate detection method and device based on piezoelectric sensor and storage medium

Also Published As

Publication number Publication date
CN111709022B (en) 2022-08-19

Similar Documents

Publication Publication Date Title
Aljawarneh et al. Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model
CN111565205B (en) Network attack identification method and device, computer equipment and storage medium
CN112738015B (en) Multi-step attack detection method based on interpretable convolutional neural network CNN and graph detection
CN111709022B (en) Hybrid alarm association method based on AP clustering and causal relationship
Peng et al. Network intrusion detection based on deep learning
CN110474885B (en) Alarm correlation analysis method based on time sequence and IP address
Niu et al. Identifying APT malware domain based on mobile DNS logging
CN112333195B (en) APT attack scene reduction detection method and system based on multi-source log correlation analysis
Tong et al. A method for detecting DGA botnet based on semantic and cluster analysis
Laftah Al-Yaseen et al. Hybrid modified-means with C4. 5 for intrusion detection systems in Multiagent Systems
CN112182567B (en) Multi-step attack tracing method, system, terminal and readable storage medium
Yang et al. Fast3DS: A real-time full-convolutional malicious domain name detection system
Dasari et al. Detection of Different DDoS Attacks Using Machine Learning Classification Algorithms.
Brandao et al. Log Files Analysis for Network Intrusion Detection
Wang et al. Res-TranBiLSTM: An intelligent approach for intrusion detection in the Internet of Things
Tao et al. A hybrid alarm association method based on AP clustering and causality
CN115795330A (en) Medical information anomaly detection method and system based on AI algorithm
CN112215300A (en) Network structure enhancement-based graph convolution model defense method, device and system
CN114726634B (en) Knowledge graph-based hacking scene construction method and device
CN111953665A (en) Server attack access identification method and system, computer equipment and storage medium
CN115242487B (en) APT attack sample enhancement and detection method based on meta-behavior
Zhang et al. A Intrusion Detection Model Based on Convolutional Neural Network and Feature Selection
CN111901137A (en) Method for mining multi-step attack scene by using honeypot alarm log
Srilatha et al. DDoSNet: A Deep Learning Model for detecting Network Attacks in Cloud Computing
CN113420791B (en) Access control method and device for edge network equipment and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20200925

Assignee: Guangxi Jun'an Network Security Technology Co.,Ltd.

Assignor: GUILIN University OF ELECTRONIC TECHNOLOGY

Contract record no.: X2022450000459

Denomination of invention: A Hybrid Alarm Correlation Method Based on AP Clustering and Causality

Granted publication date: 20220819

License type: Common License

Record date: 20221228
