CN114726589A

CN114726589A - Alarm data fusion method

Info

Publication number: CN114726589A
Application number: CN202210267375.9A
Authority: CN
Inventors: 陶星宇; 黄义杰; 高翔; 肖华
Original assignee: Jiangsu Paienjie Network Security Co ltd; Nanjing Polytechnic Institute
Current assignee: Jiangsu Paienjie Network Security Co ltd; Nanjing Polytechnic Institute
Priority date: 2022-03-17
Filing date: 2022-03-17
Publication date: 2022-07-08

Abstract

The invention discloses an alarm data fusion method comprising the following steps: preprocessing the obtained alarm data into a preset format, namely alarm sequences, and combining all alarm sequences into an alarm time window set according to a preset time difference; performing multi-attribute similarity calculation on each sub-time-window set; substituting the calculated attribute similarities into a preset judgment matrix and calculating the eigenvalues and corresponding eigenvectors of the judgment matrix; fusing the alarm data of each sub-time-window set that reaches a preset similarity threshold and adding the fused data to a fused data set; adding each sub-time-window set that does not reach the preset similarity threshold to the fused data set directly; and combining the fused data sets of all sub-time-window sets into a reduced alarm data set for output. The invention addresses the large number of redundant or false alarms commonly present in alarm data and helps identify key security events.

Description

Alarm data fusion method

Technical Field

The invention relates to the technical field of network security, in particular to an alarm data fusion method.

Background

As network security grows in importance, intrusion detection has become a research hotspot across the whole field of computer science. Since intrusion detection was first proposed, its techniques have continuously developed and matured, for example techniques classified by detection mechanism and by detection data source, and related products now include host-based IDS, network-based IDS, distributed IDS and others. Researchers in China and abroad have also studied intrusion detection methods extensively. Traditional security protection systems process large numbers of alarms inefficiently, have high error rates and easily overlook key alarm information. Alarm fusion technology was proposed to reduce redundant and false alarms in the alarm data generated by IDS and to supply valuable alarm data for the next stage, alarm correlation analysis. Its main idea is to combine highly similar alarm data so as to reduce redundant and false alarm data.

Disclosure of Invention

1. Technical problem to be solved:

Aiming at this technical problem, the invention provides an alarm data fusion method that performs similarity calculation on the attributes of the repeated and low-level data contained in the large amount of alarm data generated by an attack event, and adopts …

2. Technical solution:

An alarm data fusion method, characterized by: preprocessing the obtained alarm data into a preset format, namely the alarm sequences; dividing all alarm sequences by alarm time, where an alarm whose time difference from the previous alarm is smaller than a preset interval threshold is placed in the previous time window i-1, and an alarm whose time difference is greater than or equal to the preset interval threshold becomes the starting point of the next window, giving the current sub-time window i; on this basis, dividing all alarm sequences into n sub-time-window sets and combining the n sub-time-window sets into an alarm time window set;

performing multi-attribute similarity calculation on each sub-time-window set, the attribute similarities comprising IP address, port number, detection occurrence time and attack type similarity; substituting the calculated attribute similarities into a preset judgment matrix, calculating the eigenvalues and corresponding eigenvectors of the judgment matrix, and solving for the maximum eigenvalue and its corresponding eigenvector; fusing the alarm data of each sub-time-window set that reaches a preset similarity threshold and adding the fused data to a fused data set; if a sub-time-window set does not reach the preset similarity threshold, adding it to the fused data set directly;

and combining the fused data sets of all sub-time-window sets into a reduced alarm data set for output.
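The fusion step can be sketched in Python as follows. This is a minimal sketch under assumptions the text does not fix: the similarity of a sub-time-window set is taken to be the mean pairwise similarity of its alarms, and "fusing" is taken to mean keeping the first alarm's attributes plus a count of merged alarms. The names `window_similarity`, `fuse_window` and `reduce_alarms` are hypothetical, and the pairwise similarity function is passed in as a parameter.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence

Alarm = Dict[str, object]
PairSimilarity = Callable[[Alarm, Alarm], float]


def window_similarity(window: Sequence[Alarm], pair_sim: PairSimilarity) -> float:
    """ASSUMPTION: a window's similarity is the mean similarity over all alarm pairs."""
    pairs = list(combinations(window, 2))
    if not pairs:  # fewer than two alarms: nothing to fuse
        return 0.0
    return sum(pair_sim(a, b) for a, b in pairs) / len(pairs)


def fuse_window(window: Sequence[Alarm]) -> Alarm:
    """HYPOTHETICAL merge rule: keep the first alarm's attributes and count merged alarms."""
    fused = dict(window[0])
    fused["count"] = len(window)
    return fused


def reduce_alarms(windows: Sequence[Sequence[Alarm]], pair_sim: PairSimilarity,
                  threshold: float) -> List[Alarm]:
    """Fuse windows that reach the threshold; pass the other windows through unchanged."""
    reduced: List[Alarm] = []
    for window in windows:
        if window_similarity(window, pair_sim) >= threshold:
            reduced.append(fuse_window(window))
        else:
            reduced.extend(window)
    return reduced
```

For example, with a pairwise function that returns 1.0 for equal attack types, a window of two identical DoS alarms collapses into one record with a count of 2, while a window mixing a scan and a DoS alarm passes through unchanged.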

Further, the preprocessing specifically comprises extracting the key attributes of the alarm data from the original data set and converting the original data into a unified sequence according to the Intrusion Detection Message Exchange Format (IDMEF) to obtain all alarm sequences; the key attributes include a feature string, an alarm category, an alarm date, an alarm timestamp, a source IP, a source port, a destination IP and a destination port.
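A minimal sketch of the preprocessing step, assuming raw alarms arrive as Python dictionaries. The raw field names in `RAW_FIELDS` are hypothetical placeholders (real IDS output and IDMEF messages name these fields differently), and the sketch only normalizes the eight key attributes into one record type and orders the records by timestamp; it does not parse IDMEF.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Alarm:
    signature: str    # feature string
    category: str     # alarm category
    date: str         # alarm date
    timestamp: float  # alarm timestamp, in seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# HYPOTHETICAL mapping from Alarm fields to the keys of a raw alarm record.
RAW_FIELDS = {
    "signature": "sig", "category": "class", "date": "date", "timestamp": "ts",
    "src_ip": "sip", "src_port": "sport", "dst_ip": "dip", "dst_port": "dport",
}


def to_alarm(raw: Dict[str, object]) -> Alarm:
    """Extract the eight key attributes and convert them to uniform types."""
    def get(field: str) -> object:
        return raw[RAW_FIELDS[field]]

    return Alarm(
        signature=str(get("signature")),
        category=str(get("category")),
        date=str(get("date")),
        timestamp=float(get("timestamp")),
        src_ip=str(get("src_ip")),
        src_port=int(get("src_port")),
        dst_ip=str(get("dst_ip")),
        dst_port=int(get("dst_port")),
    )


def preprocess(raw_records: Iterable[Dict[str, object]]) -> List[Alarm]:
    """Return the unified alarm sequence, ordered by alarm timestamp."""
    return sorted((to_alarm(r) for r in raw_records), key=lambda a: a.timestamp)
```

Sorting by timestamp here prepares the sequence for the time-window division, which assumes alarms are processed in time order.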

Further, in the attribute similarity calculation, the IP address similarity is calculated as:

In formula (1), l is the number of consecutive identical bits of the two IP addresses, l ∈ [1, 32], and ε is a preset IP similarity threshold;

the port similarity is calculated as:

In formula (2), alert.port denotes the port number of an alarm, so alert1.port is the port number of the first alarm being compared;

the detection occurrence time similarity is as follows:

In formula (3), Tmin is a preset minimum alarm time threshold and Tmax is a preset maximum alarm time threshold; the time interval is alert1.time - alert2.time, i.e., the time difference between two consecutive alarms;

the attack type similarity is calculated as:

In formula (4), type denotes the alarm type.

Further, the preset judgment matrix is $A=(a_{ij})_{n\times n}$, where $a_{ij}$ is the importance of preset key attribute i relative to key attribute j and takes an integer value in the interval [1, 9]: the values 1, 3, 5, 7 and 9 indicate that attribute i is equally important, slightly more important, clearly more important, strongly more important and absolutely more important than attribute j, respectively, and the values 2, 4, 6 and 8 lie between two adjacent judgments.

3. Advantages:

The invention provides an alarm data fusion method that addresses the problem that alarm data generally contain a large number of redundant or false alarms, which makes key security events difficult to identify. Because the attributes of alarm data are related to one another and each attribute field has a different relative importance, the method uses attribute similarity calculation to construct the similarity matrix between alarms in place of the traditional similarity measure used in spectral clustering, achieving better clustering while preserving the relationships among alarms. The method thus achieves better clustering-based fusion without destroying the relationships between alarms, reduces information loss, improves the fusion rate and lowers the false alarm rate of the alarm data.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The present invention will be described in detail with reference to the accompanying drawings.

As shown in FIG. 1, an alarm data fusion method is characterized by: preprocessing the obtained alarm data into a preset format, namely the alarm sequences; dividing all alarm sequences by alarm time, where an alarm whose time difference from the previous alarm is smaller than a preset interval threshold is placed in the previous time window i-1, and an alarm whose time difference is greater than or equal to the preset interval threshold becomes the starting point of the next window, giving the current sub-time window i; on this basis, dividing all alarm sequences into n sub-time-window sets and combining the n sub-time-window sets into an alarm time window set.

When a port is under a DoS attack, a large number of identical or similar alarms are generated in a short time. Generally, alarms triggered by one complete, continuous attack have short trigger intervals and a concentrated distribution, so this method effectively separates alarms triggered by the same attack event from those triggered by different attack events.
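The windowing rule can be sketched in Python as below. It is a minimal sketch: alarms are sorted by time, and each alarm joins the current window when its gap to the previous alarm is below the interval threshold; otherwise it opens a new window. The function name `split_windows` and the `time_of` accessor are hypothetical, and the threshold in the example is arbitrary.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split_windows(alarms: Sequence[T], time_of: Callable[[T], float],
                  gap: float) -> List[List[T]]:
    """Divide alarms into sub-time windows using a preset interval threshold `gap`."""
    windows: List[List[T]] = []
    prev_time: Optional[float] = None
    for alarm in sorted(alarms, key=time_of):
        t = time_of(alarm)
        if prev_time is not None and t - prev_time < gap:
            windows[-1].append(alarm)   # close to the previous alarm: stays in window i-1
        else:
            windows.append([alarm])     # first alarm or gap >= threshold: starts window i
        prev_time = t
    return windows
```

For example, `split_windows([0, 1, 2, 30, 31, 100], lambda t: t, gap=5)` yields `[[0, 1, 2], [30, 31], [100]]`: a burst such as a DoS flood stays in one window, while alarms separated by long pauses fall into different windows.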

When calculating attribute similarity, attributes of different types carry very different meanings, so each attribute requires its own similarity calculation method. The four attributes whose similarity is calculated are the IP address, the port number, the detection occurrence time and the attack type.

Further, the preprocessing specifically comprises extracting the key attributes of the alarm data from the original data set and converting the original data into a unified sequence according to the Intrusion Detection Message Exchange Format (IDMEF) to obtain all alarm sequences; the key attributes include a feature string, an alarm category, an alarm date, an alarm timestamp, a source IP, a source port, a destination IP and a destination port.

The IP address similarity is calculated by formula (1), in which l is the number of consecutive identical bits of the two IP addresses, l ∈ [1, 32], and ε is a preset IP similarity threshold.

The parameter l indicates how likely the two IP addresses are to belong to the same subnet. The larger l is, the more similar the two addresses, and the stronger the evidence that the attacks come from the same attack source or target the same attack target, because the IP addresses of the same attack source are similar, as are the IP addresses of the same attack target.
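Computing l can be sketched with Python's standard `ipaddress` module. The sketch assumes that l counts consecutive identical bits from the most significant bit (the common prefix, matching the subnet interpretation above). Because the exact form of formula (1) is not given in the surrounding text, it also uses an ASSUMED form: similarity l/32 when l ≥ ε and 0 otherwise. `common_prefix_bits` and `ip_similarity` are hypothetical names.

```python
import ipaddress


def common_prefix_bits(ip1: str, ip2: str) -> int:
    """Number of consecutive identical bits, counted from the most significant bit."""
    a = int(ipaddress.IPv4Address(ip1))
    b = int(ipaddress.IPv4Address(ip2))
    # XOR marks differing bits; the highest set bit ends the shared prefix.
    return 32 - (a ^ b).bit_length()


def ip_similarity(ip1: str, ip2: str, epsilon: int) -> float:
    """ASSUMED form of formula (1): l / 32 if l >= epsilon, else 0."""
    prefix_len = common_prefix_bits(ip1, ip2)  # this is l; 0 if the first bits differ
    return prefix_len / 32 if prefix_len >= epsilon else 0.0
```

For example, 192.168.1.10 and 192.168.1.20 share 27 leading bits, so with ε = 16 this sketch gives 27/32; 10.0.0.1 and 192.168.0.1 already differ in the first bit and score 0.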

The port similarity is calculated as:

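$$\mathrm{sim}_{port}=\begin{cases}1, & \text{alert1.port} = \text{alert2.port}\\ 0, & \text{otherwise}\end{cases}\qquad (2)$$

In formula (2), alert.port denotes the port number of an alarm, so alert1.port and alert2.port are the port numbers of the two alarms being compared. If the port numbers are the same, the similarity is 1; otherwise, it is 0.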

The detection occurrence time similarity is as follows:

In formula (3), Tmin is a preset minimum alarm time threshold and Tmax is a preset maximum alarm time threshold; the time interval is alert1.time - alert2.time, i.e., the difference between the times of two consecutive alarms. The time-attribute similarity is thus calculated from the difference between the two alarm times.
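The exact form of formula (3) is not given in the surrounding text. A common shape for a similarity bounded by a minimum and a maximum threshold is a piecewise-linear decay, which the hypothetical `time_similarity` below ASSUMES: similarity 1 when the interval is at most Tmin, 0 when it is at least Tmax, and a linear decrease in between. It also takes the absolute value of alert1.time - alert2.time so that the argument order does not matter.

```python
def time_similarity(time1: float, time2: float, t_min: float, t_max: float) -> float:
    """ASSUMED piecewise-linear form of formula (3)."""
    if t_max <= t_min:
        raise ValueError("t_max must be greater than t_min")
    interval = abs(time1 - time2)  # |alert1.time - alert2.time|
    if interval <= t_min:
        return 1.0  # alarms close in time: fully similar
    if interval >= t_max:
        return 0.0  # alarms far apart: not similar
    # Linear decrease from 1 at t_min to 0 at t_max.
    return (t_max - interval) / (t_max - t_min)
```

With Tmin = 2 and Tmax = 10, for instance, alarms 6 seconds apart score (10 - 6) / (10 - 2) = 0.5 under this assumed form.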

The attack type similarity is calculated as:

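$$\mathrm{sim}_{type}=\begin{cases}1, & \text{alert1.type} = \text{alert2.type}\\ 0, & \text{otherwise}\end{cases}\qquad (4)$$

In formula (4), type denotes the alarm type. If the attack types are the same, the similarity is 1; otherwise, it is 0.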
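Formulas (2) and (4) are both exact-match indicators, so they can share one helper. A minimal sketch; the function names are hypothetical, and because the text does not say whether the source or the destination port is compared, the caller chooses which port to pass.

```python
from typing import Hashable


def exact_match(value1: Hashable, value2: Hashable) -> float:
    """Indicator similarity: 1 when the two attribute values are equal, else 0."""
    return 1.0 if value1 == value2 else 0.0


def port_similarity(port1: int, port2: int) -> float:
    # Formula (2): the caller decides whether source or destination ports are compared.
    return exact_match(port1, port2)


def type_similarity(type1: str, type2: str) -> float:
    # Formula (4): identical attack types score 1, different types score 0.
    return exact_match(type1, type2)
```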

Further, the preset judgment matrix is $A=(a_{ij})_{n\times n}$, where $a_{ij}$ is the importance of preset key attribute i relative to key attribute j and specifically takes an integer value in the interval [1, 9]: the values 1, 3, 5, 7 and 9 indicate that attribute i is equally important, slightly more important, clearly more important, strongly more important and absolutely more important than attribute j, respectively, and the values 2, 4, 6 and 8 lie between two adjacent judgments.
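The eigenvector step can be sketched with power iteration, which converges to the maximum eigenvalue and its eigenvector for a positive matrix such as a judgment matrix. The sketch ASSUMES the usual analytic hierarchy process (AHP) convention that a_ji = 1/a_ij and that the normalized principal eigenvector serves as the attribute weights; the source only states that the maximum eigenvalue and its eigenvector are computed. `principal_eigen` and `weighted_similarity` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def principal_eigen(matrix: Sequence[Sequence[float]], max_iter: int = 1000,
                    tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Power iteration: returns (lambda_max, eigenvector normalized to sum to 1)."""
    n = len(matrix)
    weights = [1.0 / n] * n
    lambda_max = 0.0
    for _ in range(max_iter):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        # With weights summing to 1, sum(A w) approaches lambda_max as w converges.
        lambda_max = total
        new_weights = [x / total for x in product]
        converged = max(abs(a - b) for a, b in zip(new_weights, weights)) < tol
        weights = new_weights
        if converged:
            break
    return lambda_max, weights


def weighted_similarity(weights: Sequence[float], similarities: Sequence[float]) -> float:
    """Combine per-attribute similarities using the eigenvector weights."""
    return sum(w * s for w, s in zip(weights, similarities))
```

As an illustrative (not source-provided) 4×4 matrix for the attributes IP, port, time and type, `[[1, 2, 4, 8], [1/2, 1, 2, 4], [1/4, 1/2, 1, 2], [1/8, 1/4, 1/2, 1]]` is perfectly consistent, so this sketch returns a maximum eigenvalue of about 4 and weights of about 8/15, 4/15, 2/15 and 1/15.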

Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An alarm data fusion method, characterized by: preprocessing the obtained alarm data into a preset format, namely the alarm sequences; dividing all alarm sequences by alarm time, where an alarm whose time difference from the previous alarm is smaller than a preset interval threshold is placed in the previous time window i-1, and an alarm whose time difference is greater than or equal to the preset interval threshold becomes the starting point of the next window, giving the current sub-time window i; on this basis, dividing all alarm sequences into n sub-time-window sets and combining the n sub-time-window sets into an alarm time window set;

performing multi-attribute similarity calculation on each sub-time-window set, the attribute similarities comprising IP address, port number, detection occurrence time and attack type similarity; substituting the calculated attribute similarities into a preset judgment matrix, calculating the eigenvalues and corresponding eigenvectors of the judgment matrix, and solving for the maximum eigenvalue and its corresponding eigenvector; fusing the alarm data of each sub-time-window set that reaches a preset similarity threshold and adding the fused data to a fused data set; if a sub-time-window set does not reach the preset similarity threshold, adding it to the fused data set directly;

2. The alarm data fusion method according to claim 1, characterized in that: the preprocessing specifically comprises extracting the key attributes of the alarm data from an original data set and converting the original data into a unified sequence according to the Intrusion Detection Message Exchange Format (IDMEF) to obtain all alarm sequences; the key attributes include a feature string, an alarm category, an alarm date, an alarm timestamp, a source IP, a source port, a destination IP and a destination port.

3. The alarm data fusion method according to claim 1, characterized in that: in the attribute similarity calculation, the IP address similarity is calculated as:

In formula (1), l is the number of consecutive identical bits of the two IP addresses, l ∈ [1, 32], and ε is a preset IP similarity threshold;

the port similarity is calculated as:

the detection occurrence time similarity is as follows:

the attack type similarity is calculated as:

In formula (4), type denotes the alarm type.

4. The alarm data fusion method according to claim 1, characterized in that: the preset judgment matrix is $A=(a_{ij})_{n\times n}$, where $a_{ij}$ is the importance of preset key attribute i relative to key attribute j and takes an integer value in the interval [1, 9]: the values 1, 3, 5, 7 and 9 indicate that attribute i is equally important, slightly more important, clearly more important, strongly more important and absolutely more important than attribute j, respectively, and the values 2, 4, 6 and 8 lie between two adjacent judgments.