CN115378738B

CN115378738B - Alarm filtering method, system and equipment based on classification algorithm

Info

Publication number: CN115378738B
Application number: CN202211298642.5A
Authority: CN
Inventors: 吴麒麟; 路冰; 唐上; 卢延科; 梁宇
Original assignee: Zhongfu Safety Technology Co Ltd
Current assignee: Zhongfu Safety Technology Co Ltd
Priority date: 2022-10-24
Filing date: 2022-10-24
Publication date: 2023-03-24
Anticipated expiration: 2042-10-24
Also published as: CN115378738A

Abstract

The application discloses an alarm filtering method, system and device based on a classification algorithm. It relates to the technical field of alarm filtering and aims to solve problems such as the poor effectiveness of existing false alarm prediction. The method comprises: acquiring predicted alarm data, alarm types and typical training data from the historical false alarm logs of a server within a preset time period; obtaining an average distance value for the predicted alarm data using a preset classification algorithm; extracting verification alarm data from the server's historical false alarm logs as input to the preset classification algorithm to obtain an average distance value for the extracted verification alarm data; obtaining estimated false alarm data within the verification alarm data; updating a preset screening value; feeding the predicted alarm data to the algorithm again to obtain its average distance value; and completing alarm filtering according to the updated preset screening value. The method improves the false alarm prediction rate.

Description

Alarm filtering method, system and equipment based on classification algorithm

Technical Field

The present application relates to the field of alarm filtering technologies, and in particular, to an alarm filtering method, system and device based on a classification algorithm.

Background

With the rapid development of networks, the growing number of network attack types and detection rules places a heavy workload on intrusion detection systems. At the same time, because detection rules are limited, intrusion detection systems suffer from serious false alarms.

At present, false alarms are mainly avoided as follows: an alarm training set is first constructed from alarm data; a prior alarm neural network is then trained on the alarm training set using the Spark engine and updated according to a network structure score; finally, real-time alarms are analyzed with the alarm neural network to find the associations among them and thereby locate the root alarm.

However, the prior probabilities in the prior alarm neural network rely heavily on assumptions. When the assumed prior model differs greatly from the real event probability model, the mismatch leads to poor prediction. Moreover, in practice different types of alarms are not mutually independent, which greatly degrades the alarm accuracy of the prior alarm neural network.

Disclosure of Invention

In view of the above deficiencies of the prior art, the present invention provides an alarm filtering method, system and device based on a classification algorithm to solve the technical problems described above.

In a first aspect, the present application provides an alarm filtering method based on a classification algorithm, the method comprising: obtaining predicted alarm data, the alarm type corresponding to the predicted alarm data, and the typical training data corresponding to the alarm type from the historical false alarm logs of a server within a preset time period; obtaining, according to a preset classification algorithm, the Euclidean distance between the predicted alarm data and the typical training data, then obtaining the feature dimension N of the predicted alarm data and the average distance value in the N-dimensional space; extracting verification alarm data from the historical false alarm logs of the server as input to the preset classification algorithm, and obtaining the average distance values over a plurality of alarm types for the extracted verification alarm data; obtaining estimated false alarm data within the verification alarm data by comparing the average distance value with a preset screening value; determining the ratio of the amount of real alarm data in the estimated false alarm data to the total amount of real alarm data in the verification alarm data; updating the preset screening value based on the ratio and a preset increment value until the ratio is smaller than a preset ratio threshold; feeding the predicted alarm data to the algorithm again to obtain its average distance value; and completing alarm filtering according to the updated preset screening value.

Further, obtaining the predicted alarm data, the corresponding alarm type and the typical training data corresponding to the alarm type from the historical false alarm logs of the server within the preset time period specifically includes: acquiring the historical false alarm logs within the preset time period and reading the alarm message features in them, the alarm message features including at least the IP, port, protocol type, alarm grade, alarm type and IP packet header of the alarm record; computing, for each alarm type, the type ratio of the number of historical false alarm logs of that type to the total number of historical false alarm logs; and determining the quantity of typical training data for each alarm type according to its type ratio, thereby completing the acquisition of the typical training data.

Further, obtaining the Euclidean distance between the predicted alarm data and the typical training data according to the preset classification algorithm, and then obtaining the average distance value of the predicted alarm data over a plurality of alarm types, specifically comprises: normalizing the predicted alarm data and the typical training data; taking the normalized predicted alarm data as input to the preset classification algorithm and calculating, through it, the Euclidean distance between the predicted alarm data and the typical training data; obtaining the average Euclidean distance for each alarm type; determining a weight value for each alarm type based on its average Euclidean distance; and determining the average distance value of the predicted alarm data from the average Euclidean distances and the weight values.

Further, calculating the Euclidean distance between the predicted alarm data and the typical training data through the preset classification algorithm specifically comprises: calculating the Euclidean distance between the predicted alarm data and the typical training data through a KNN algorithm.

Further, extracting verification alarm data from the historical false alarm logs of the server as input to the preset classification algorithm specifically comprises: calculating the Euclidean distance between the predicted alarm data and the typical training data through the preset classification algorithm; determining predicted alarm data whose Euclidean distance is smaller than a preset minimum distance threshold as reference false alarm data; determining predicted alarm data whose Euclidean distance is greater than a preset maximum distance threshold as reference real alarm data; and randomly extracting a preset quantity of data from the reference false alarm data and the reference real alarm data as the verification alarm data.

In a second aspect, the present application provides an alarm filtering system based on a classification algorithm, the system comprising: an acquisition module, configured to acquire predicted alarm data, the alarm type corresponding to the predicted alarm data and the typical training data corresponding to the alarm type from the historical false alarm logs of the server within a preset time period, to obtain, according to a preset classification algorithm, the Euclidean distance between the predicted alarm data and the typical training data, and then to obtain the feature dimension N of the predicted alarm data and the average distance value in the N-dimensional space; an acquisition module, configured to extract verification alarm data from the historical false alarm logs of the server as input to the preset classification algorithm, to obtain the average distance values over a plurality of alarm types for the extracted verification alarm data, and to obtain estimated false alarm data within the verification alarm data by comparing the average distance value with the preset screening value; and a completion module, configured to determine the ratio of the amount of real alarm data in the estimated false alarm data to the total amount of real alarm data in the verification alarm data, to update the preset screening value based on the ratio and a preset increment value until the ratio is smaller than a preset ratio threshold, to feed the predicted alarm data to the algorithm again to obtain the average distance value, and then to complete alarm filtering according to the updated preset screening value.

Further, the acquisition module further comprises an acquisition unit, configured to acquire the historical false alarm logs within a preset time period and read the alarm message features in them, the alarm message features including at least the IP, port, protocol type, alarm grade, alarm type and IP packet header of the alarm record; to compute the type ratio of the number of historical false alarm logs of each alarm type to the total number of historical false alarm logs; and to determine the quantity of typical training data for each alarm type according to its type ratio, thereby completing the acquisition of the typical training data.

In a third aspect, the present application provides an alarm filtering device based on a classification algorithm, where the device includes: a processor; and a memory having executable code stored thereon, the executable code, when executed, causing the processor to perform a method of alarm filtering based on a classification algorithm as described above.

As can be appreciated by those skilled in the art, the present invention has at least the following beneficial effects:

(1) A characterization method that classifies alarm types based on a preset classification algorithm (for example, the KNN algorithm) is provided. Alarm data are mapped into a multidimensional feature space (per alarm type), and the average distance value of the multidimensional features in that space is calculated.

(2) The preset screening value is continuously updated using test data. When the average distance value of new alarm data is smaller than the preset screening value, the alarm is judged to be a false alarm; otherwise it is judged to be a real alarm. The method can therefore adapt threshold screening to different problems and is more robust.

(3) The method and device require no change to the intrusion detection system and reduce the number of false alarms by analyzing the alarm data alone, so they can be deployed as an external plug-in.

(4) The method and device need no prior probabilities or prior expert knowledge, can filter various types of alarms, and are suitable for alarm filtering in various products, such as network attack detection platforms and network supervision platforms.

Drawings

Some embodiments of the disclosure are described below with reference to the accompanying drawings, in which:

Fig. 1 is a flowchart of an alarm filtering method based on a classification algorithm according to an embodiment of the present application.

Fig. 2 is a schematic diagram of an internal structure of an alarm filtering system based on a classification algorithm according to an embodiment of the present application.

Fig. 3 is a schematic diagram of an internal structure of an alarm filtering device based on a classification algorithm according to an embodiment of the present application.

Detailed Description

It should be understood by those skilled in the art that the embodiments described below are only preferred embodiments of the present disclosure, and do not mean that the present disclosure can be implemented only by the preferred embodiments, which are merely for explaining the technical principles of the present disclosure and are not intended to limit the scope of the present disclosure. All other embodiments that can be derived by one of ordinary skill in the art from the preferred embodiments provided by the disclosure without undue experimentation will still fall within the scope of the disclosure.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus comprising the element.

The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.

An embodiment of the present application provides an alarm filtering method based on a classification algorithm. As shown in Fig. 1, the method mainly includes the following steps:

Step 110: obtaining predicted alarm data, the alarm type corresponding to the predicted alarm data, and the typical training data corresponding to the alarm type from the historical false alarm logs of a server within a preset time period; and obtaining, according to a preset classification algorithm, the Euclidean distance between the predicted alarm data and the typical training data, then obtaining the feature dimension N of the predicted alarm data and the average distance value in the N-dimensional space.
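Step 110 does not say how an alarm record becomes a point in the N-dimensional space. The following Python sketch shows one possible encoding of the alarm message features this description lists (IP, port, protocol type, alarm grade, alarm type, IP packet header). The codebooks `PROTOCOL_CODES` and `ALARM_TYPE_CODES`, the `AlarmRecord` class, and the use of a header length as the packet-header feature are hypothetical illustrations, not part of the patent.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical codebooks: the patent does not define how categorical
# features are turned into numbers.
PROTOCOL_CODES = {"tcp": 0, "udp": 1, "icmp": 2}
ALARM_TYPE_CODES = {"scan": 0, "brute_force": 1, "sql_injection": 2}


@dataclass
class AlarmRecord:
    """One alarm record read from a historical false alarm log (illustrative)."""
    src_ip: str
    port: int
    protocol: str
    grade: int        # alarm grade (severity level)
    alarm_type: str
    header_len: int   # hypothetical stand-in for an IP-packet-header feature


def to_feature_vector(rec: AlarmRecord) -> list:
    """Encode a record as a numeric vector; its length is the dimension N."""
    return [
        float(int(ipaddress.IPv4Address(rec.src_ip))),  # IPv4 address as an integer
        float(rec.port),
        float(PROTOCOL_CODES[rec.protocol]),
        float(rec.grade),
        float(ALARM_TYPE_CODES[rec.alarm_type]),
        float(rec.header_len),
    ]


vec = to_feature_vector(AlarmRecord("192.168.0.1", 443, "tcp", 2, "scan", 20))
N = len(vec)  # 6 features, so this alarm is a point in 6-dimensional space
```

Raw features such as the integer IP address and the port have very different scales, which is one reason the normalization step described for the distance computation matters.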

Specifically, the preset time period may be any feasible time period. The predicted alarm data are the data on which false alarm filtering is performed. The typical training data provide sample references for the preset classification algorithm, which may be any feasible classification algorithm, such as the KNN algorithm. Because there are several alarm types and each alarm type corresponds to several typical training data, the average distance value is obtained as follows: first, for one alarm type, calculate the mean of the Euclidean distances from the predicted alarm data to the typical training data of that type; then combine the means of the several alarm types into the average distance value. The combination may be a plain average of the per-type means, or a weighted average in which each mean is weighted according to its magnitude relative to the others.
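The two-level averaging described above (per-type mean distance, then a mean over types) can be sketched as follows, using the plain-average variant. Representing the typical training data as a dict from alarm type to a list of feature vectors is an assumption made for illustration.

```python
import math


def average_distance_value(sample, typical_by_type):
    """Unweighted average distance value of one alarm.

    For each alarm type, average the Euclidean distances from `sample` to that
    type's typical training vectors; then average those per-type means.
    """
    per_type = {}
    for alarm_type, vectors in typical_by_type.items():
        distances = [math.dist(sample, v) for v in vectors]
        per_type[alarm_type] = sum(distances) / len(distances)
    overall = sum(per_type.values()) / len(per_type)
    return overall, per_type


typical = {"scan": [(0.0, 0.0), (0.0, 2.0)], "dos": [(3.0, 4.0)]}
overall, per_type = average_distance_value((0.0, 0.0), typical)
# per_type == {"scan": 1.0, "dos": 5.0}; overall == 3.0
```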

Acquiring the predicted alarm data, the corresponding alarm type and the typical training data corresponding to the alarm type from the historical false alarm logs of the server within the preset time period may specifically be: acquiring the historical false alarm logs within the preset time period and reading the alarm message features in them, the alarm message features including at least the IP, port, protocol type, alarm grade, alarm type and IP packet header of the alarm record; computing, for each alarm type, the type ratio of the number of historical false alarm logs of that type to the total number of historical false alarm logs; and determining the quantity of typical training data for each alarm type according to its type ratio (the larger the type ratio, the more typical training data), thereby completing the acquisition of the typical training data.
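The text fixes only the direction of the relationship between type ratio and sample count. The sketch below assumes a fixed total budget of typical training samples (the parameter `total_typical` is hypothetical) and splits it in proportion to each type's share of the historical false alarm logs, using largest-remainder rounding so the counts sum exactly to the budget; both choices are ours, not the patent's.

```python
from collections import Counter


def allocate_typical_counts(log_types, total_typical):
    """Split a budget of typical training samples across alarm types in
    proportion to each type's share of the historical false alarm logs."""
    counts = Counter(log_types)
    n_logs = sum(counts.values())
    ratios = {t: c / n_logs for t, c in counts.items()}  # type ratios
    raw = {t: r * total_typical for t, r in ratios.items()}
    alloc = {t: int(x) for t, x in raw.items()}  # floor of each share
    # Largest-remainder rounding: give leftover slots to the biggest fractions.
    leftover = total_typical - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda t: raw[t] - alloc[t], reverse=True)
    for t in by_remainder[:leftover]:
        alloc[t] += 1
    return ratios, alloc


logs = ["scan"] * 6 + ["dos"] * 3 + ["sql_injection"] * 1
ratios, alloc = allocate_typical_counts(logs, total_typical=7)
# ratios: scan 0.6, dos 0.3, sql_injection 0.1; alloc: scan 4, dos 2, sql_injection 1
```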

Obtaining the Euclidean distance between the predicted alarm data and the typical training data according to the preset classification algorithm, and then the average distance value of the predicted alarm data over the plurality of alarm types, may specifically be: normalizing the predicted alarm data and the typical training data; taking the normalized predicted alarm data as input to the preset classification algorithm (the KNN algorithm) to calculate the Euclidean distance between the predicted alarm data and the typical training data; obtaining the average Euclidean distance for each alarm type (the mean of the Euclidean distances to the typical training data of that type); obtaining the weight value of each alarm type from its average Euclidean distance through a mapping between average Euclidean distance and weight value; and taking the weighted average of the per-type average Euclidean distances with these weight values as the average distance value of the predicted alarm data.
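The patent does not name a normalization method, and it states that a mapping from average Euclidean distance to weight exists without giving it. The sketch below therefore assumes min-max scaling and a hypothetical inverse-distance weight mapping; both are stand-ins chosen for illustration.

```python
def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def weighted_average_distance(per_type_mean, eps=1e-9):
    """Weighted average of per-type mean distances.

    Hypothetical weight mapping: w_t is proportional to 1 / d_t, so the
    alarm types the sample is closest to dominate the result.
    """
    raw = {t: 1.0 / (d + eps) for t, d in per_type_mean.items()}
    total = sum(raw.values())
    weights = {t: w / total for t, w in raw.items()}
    value = sum(weights[t] * d for t, d in per_type_mean.items())
    return value, weights


# Fit the scaling on predicted and typical data together so they share one scale.
scaled = min_max_normalize([(0, 10), (5, 20), (10, 30)])
# scaled == [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
value, weights = weighted_average_distance({"scan": 1.0, "dos": 3.0})
# weights ≈ {"scan": 0.75, "dos": 0.25}; value ≈ 1.5
```

With this particular mapping the weighted average equals (up to `eps`) the harmonic mean of the per-type distances, so it is pulled toward the alarm type the sample is closest to. Any other monotone mapping could be substituted.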

In addition, calculating the Euclidean distance between the predicted alarm data and the typical training data through the preset classification algorithm may specifically be: calculating the Euclidean distance between the predicted alarm data and the typical training data through a KNN algorithm.
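The patent names KNN as the algorithm that produces the Euclidean distances but gives no details such as the value of k. Below is a plain brute-force KNN search. Restricting the per-type mean distance to the k nearest neighbours is our assumption about how KNN might combine with the per-type averaging of Step 110; with this variant, an alarm type absent from the neighbourhood gets no mean at all.

```python
import heapq
import math


def k_nearest(sample, labeled_vectors, k):
    """Brute-force KNN search: the k closest (distance, alarm_type) pairs."""
    scored = ((math.dist(sample, vec), label) for vec, label in labeled_vectors)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


def knn_mean_distance_by_type(sample, labeled_vectors, k):
    """Mean Euclidean distance per alarm type over the k nearest neighbours only."""
    by_type = {}
    for distance, label in k_nearest(sample, labeled_vectors, k):
        by_type.setdefault(label, []).append(distance)
    return {t: sum(ds) / len(ds) for t, ds in by_type.items()}


typical = [((0.0, 0.0), "scan"), ((1.0, 0.0), "scan"),
           ((3.0, 4.0), "dos"), ((9.0, 9.0), "dos")]
neighbours = k_nearest((0.0, 0.0), typical, k=3)
# [(0.0, "scan"), (1.0, "scan"), (5.0, "dos")]
means = knn_mean_distance_by_type((0.0, 0.0), typical, k=3)
# {"scan": 0.5, "dos": 5.0}
```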

Step 120: extracting verification alarm data from the historical false alarm logs of the server as input to the preset classification algorithm, and obtaining the average distance values over a plurality of alarm types for the extracted verification alarm data; and obtaining estimated false alarm data within the verification alarm data by comparing the average distance value with the preset screening value.

It should be noted that the verification alarm data may be obtained randomly from an external database. To better fit the predicted alarm data, the verification alarm data may instead be extracted from the predicted alarm data.

To ensure that verification alarm data extracted from the predicted alarm data are highly accurate, the method comprises: obtaining predicted alarm data from the historical false alarm logs of the server; calculating the Euclidean distance between the predicted alarm data and the typical training data through the preset classification algorithm; determining predicted alarm data whose Euclidean distance is smaller than a preset minimum distance threshold as reference false alarm data; determining predicted alarm data whose Euclidean distance is greater than a preset maximum distance threshold as reference real alarm data; and randomly extracting a preset quantity of data from the reference false alarm data and the reference real alarm data as the verification alarm data. The preset minimum and maximum distance thresholds may be any feasible values, and those skilled in the art may set them according to actual requirements. The random extraction may be implemented with any existing method or technique, and the application places no particular limitation on it.
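The labelling-then-sampling procedure above can be sketched as follows. Two assumptions are ours: the "Euclidean distance" of an alarm is taken as its mean distance to all typical training vectors, and the preset extraction quantity (the hypothetical parameter `n_per_pool`) is applied to each reference pool separately so that both real and false alarms appear in the verification set.

```python
import math
import random


def mean_distance(vec, typical):
    """Mean Euclidean distance from one alarm vector to the typical training data."""
    return sum(math.dist(vec, t) for t in typical) / len(typical)


def build_verification_set(predicted, typical, d_min, d_max, n_per_pool, seed=None):
    """Label predicted alarms by distance, then randomly sample from each pool."""
    ref_false, ref_real = [], []
    for vec in predicted:
        d = mean_distance(vec, typical)
        if d < d_min:
            ref_false.append((vec, "false"))  # close to known false alarms
        elif d > d_max:
            ref_real.append((vec, "real"))    # far from known false alarms
        # alarms with d_min <= d <= d_max are ambiguous and are discarded
    rng = random.Random(seed)
    picked = rng.sample(ref_false, min(n_per_pool, len(ref_false)))
    picked += rng.sample(ref_real, min(n_per_pool, len(ref_real)))
    return picked


typical = [(0.0, 0.0)]
predicted = [(0.1, 0.0), (0.15, 0.0), (0.5, 0.0), (2.0, 0.0), (3.0, 0.0)]
verification = build_verification_set(predicted, typical, d_min=0.2, d_max=1.0,
                                      n_per_pool=1, seed=42)
# one reference false alarm and one reference real alarm; which ones depends on the seed
```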

Step 130: determining the ratio of the amount of real alarm data in the estimated false alarm data to the total amount of real alarm data in the verification alarm data; updating the preset screening value based on the ratio and a preset increment value until the ratio is smaller than a preset ratio threshold; feeding the predicted alarm data to the algorithm again to obtain its average distance value; and completing alarm filtering according to the updated preset screening value.
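The ratio in Step 130 is the share of the verification set's real alarms that the current screening value would wrongly estimate as false alarms. The sketch takes the classification rule from beneficial effect (2): an alarm whose average distance value is below the screening value is estimated as false. The `avg_distance` callable is a hypothetical hook for whichever average distance computation is in use.

```python
def real_in_false_ratio(verification, avg_distance, screening_value):
    """Fraction of the verification set's real alarms that would be estimated as
    false alarms, i.e. whose average distance value is below the screening value."""
    real = [item for item, label in verification if label == "real"]
    if not real:
        return 0.0
    flagged = [item for item in real if avg_distance(item) < screening_value]
    return len(flagged) / len(real)


# Toy verification set whose items are already average distance values.
verification = [(0.05, "false"), (0.3, "real"), (0.9, "real")]
ratio = real_in_false_ratio(verification, lambda d: d, screening_value=0.5)
# ratio == 0.5: one of the two real alarms falls below the screening value
```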

It should be noted that the initial value of the preset screening value, the preset increment value and the preset ratio threshold may be any feasible values; for example, the initial screening value may be 0.1, the preset increment value 0.2, and the preset ratio threshold 5%.

Updating the preset screening value based on the ratio and the preset increment value specifically comprises: when the ratio is larger than the preset ratio threshold, increasing the preset screening value by the preset increment value, and repeating until the ratio is smaller than the preset ratio threshold.
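The update loop and the final filtering can be sketched as below, with the example defaults from the text (initial value 0.1, increment 0.2, threshold 5%). One caveat: under the rule "average distance value below the screening value means false alarm", raising the screening value can only flag more real alarms, so the ratio cannot fall by increasing it. This sketch therefore treats the increment as signed and caps the iterations with a hypothetical `max_iter`; the usage example uses a negative step. This reading is our assumption, not the patent's.

```python
def real_in_false_ratio(verification, avg_distance, screening_value):
    """Share of real alarms estimated as false (average distance < screening value)."""
    real = [item for item, label in verification if label == "real"]
    if not real:
        return 0.0
    return sum(avg_distance(i) < screening_value for i in real) / len(real)


def calibrate_screening_value(verification, avg_distance, start=0.1, step=0.2,
                              ratio_threshold=0.05, max_iter=100):
    """Move the screening value by `step` until the ratio drops below the threshold."""
    screening = start
    for _ in range(max_iter):
        if real_in_false_ratio(verification, avg_distance, screening) < ratio_threshold:
            return screening
        screening += step
    raise RuntimeError("no screening value met the threshold; check the sign of step")


def filter_alarms(predicted, avg_distance, screening):
    """Final filtering: split predicted alarms into (real, false) by the calibrated value."""
    kept = [a for a in predicted if avg_distance(a) >= screening]
    dropped = [a for a in predicted if avg_distance(a) < screening]
    return kept, dropped


verification = [(0.05, "false"), (0.3, "real"), (0.9, "real")]
s = calibrate_screening_value(verification, lambda d: d, start=1.0, step=-0.25)
# ratios 1.0 -> 0.5 -> 0.5 -> 0.0, so s == 0.25
kept, dropped = filter_alarms([0.1, 0.3, 0.9], lambda d: d, s)
# kept == [0.3, 0.9]; dropped == [0.1]
```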

In addition, Fig. 2 shows an alarm filtering system based on a classification algorithm according to an embodiment of the present application. As shown in Fig. 2, the system mainly includes:

an obtaining module 210, configured to obtain predicted alarm data, the alarm type corresponding to the predicted alarm data and the typical training data corresponding to the alarm type from the historical false alarm logs of a server within a preset time period, to obtain, according to a preset classification algorithm, the Euclidean distance between the predicted alarm data and the typical training data, and then to obtain the feature dimension N of the predicted alarm data and the average distance value in the N-dimensional space;

the obtaining module 210 further includes an obtaining unit 211, configured to acquire historical false alarm logs within a preset time period and read the alarm message features in them, the alarm message features including at least the IP, port, protocol type, alarm grade, alarm type and IP packet header of the alarm record; to compute the type ratio of the number of historical false alarm logs of each alarm type to the total number of historical false alarm logs; and to determine the quantity of typical training data for each alarm type according to its type ratio, thereby completing the acquisition of the typical training data;

an obtaining module 220, configured to extract verification alarm data from the historical false alarm logs of a server as input to the preset classification algorithm, to obtain the average distance values over a plurality of alarm types for the extracted verification alarm data, and to obtain estimated false alarm data within the verification alarm data by comparing the average distance value with the preset screening value;

and a completion module 230, configured to determine the ratio of the amount of real alarm data in the estimated false alarm data to the total amount of real alarm data in the verification alarm data, to update the preset screening value based on the ratio and a preset increment value until the ratio is smaller than a preset ratio threshold, to feed the predicted alarm data to the algorithm again to obtain the average distance value, and then to complete alarm filtering according to the updated preset screening value.

In addition, an embodiment of the present application further provides an alarm filtering device based on a classification algorithm, as shown in Fig. 3, on which executable instructions are stored; when executed, the instructions implement the alarm filtering method based on the classification algorithm described above. Specifically, the server sends an execution instruction to the memory over the bus; on receiving it, the memory sends an execution signal to the processor over the bus to activate the processor.

The processor is configured to: acquire predicted alarm data, the alarm type corresponding to the predicted alarm data and the typical training data corresponding to the alarm type from the historical false alarm logs of the server within a preset time period; obtain, according to a preset classification algorithm, the Euclidean distance between the predicted alarm data and the typical training data, then obtain the feature dimension N of the predicted alarm data and the average distance value in the N-dimensional space; extract verification alarm data from the historical false alarm logs of the server as input to the preset classification algorithm, and obtain the average distance values over a plurality of alarm types for the extracted verification alarm data; obtain estimated false alarm data within the verification alarm data by comparing the average distance value with the preset screening value; determine the ratio of the amount of real alarm data in the estimated false alarm data to the total amount of real alarm data in the verification alarm data; update the preset screening value based on the ratio and a preset increment value until the ratio is smaller than a preset ratio threshold; feed the predicted alarm data to the algorithm again to obtain the average distance value; and complete alarm filtering according to the updated preset screening value.

So far, the technical solutions of the present disclosure have been described in connection with the foregoing embodiments, but it is easily understood by those skilled in the art that the scope of the present disclosure is not limited to only these specific embodiments. The technical solutions in the above embodiments can be split and combined, and equivalent changes or substitutions can be made on related technical features by those skilled in the art without departing from the technical principles of the present disclosure, and any changes, equivalents, improvements, and the like made within the technical concept and/or technical principles of the present disclosure will fall within the protection scope of the present disclosure.

Claims

1. An alarm filtering method based on a classification algorithm is characterized by comprising the following steps:

obtaining predicted alarm data, an alarm type corresponding to the predicted alarm data and typical training data corresponding to the alarm type from historical false alarm logs of a server within a preset time period; obtaining, according to a preset classification algorithm, the Euclidean distance between the predicted alarm data and the typical training data, then obtaining the feature dimension N of the predicted alarm data and the average distance value in the N-dimensional space; wherein the predicted alarm data are the data on which false alarm filtering is performed;

extracting verification alarm data from the historical false alarm logs of the server as input to the preset classification algorithm, and obtaining the average distance values over a plurality of alarm types for the extracted verification alarm data; obtaining estimated false alarm data within the verification alarm data by comparing the average distance value with a preset screening value; wherein the verification alarm data are at least alarm data obtained randomly from an external database or alarm data extracted from the predicted alarm data;

determining the ratio of the amount of real alarm data in the estimated false alarm data to the total amount of real alarm data in the verification alarm data; updating the preset screening value based on the ratio and a preset increment value until the ratio is smaller than a preset ratio threshold; feeding the predicted alarm data to the algorithm again to obtain the average distance value; and completing alarm filtering according to the updated preset screening value.

2. The alarm filtering method based on the classification algorithm according to claim 1, wherein the steps of obtaining predicted alarm data, an alarm type corresponding to the predicted alarm data, and typical training data corresponding to the alarm type from historical false alarm logs in a preset time period of a server specifically include:

acquiring historical false alarm logs within a preset time period, and reading the alarm message features in the historical false alarm logs; wherein the alarm message features include at least the IP, port, protocol type, alarm grade, alarm type and IP packet header of the alarm record;

acquiring a type ratio of the number of historical false alarm logs corresponding to each alarm type to the total number of the historical false alarm logs;

determining the quantity of typical training data corresponding to each alarm type according to its type ratio, thereby completing the acquisition of the typical training data.

3. The alarm filtering method based on the classification algorithm according to claim 1, wherein obtaining, according to the preset classification algorithm, the Euclidean distance between the predicted alarm data and the typical training data, and then obtaining the average distance value of the predicted alarm data over a plurality of alarm types, specifically comprises:

normalizing the predicted alarm data and the typical training data;

taking the normalized predicted alarm data as input to the preset classification algorithm, and calculating the Euclidean distance between the predicted alarm data and the typical training data through the preset classification algorithm; obtaining the average Euclidean distance corresponding to each alarm type;

determining a weight value corresponding to each alarm type based on the average Euclidean distance; and determining an average distance value corresponding to the predicted alarm data according to the average Euclidean distance and the weight value.

4. The alarm filtering method based on the classification algorithm according to claim 3, wherein calculating the Euclidean distance between the predicted alarm data and the typical training data through the preset classification algorithm specifically comprises:

calculating the Euclidean distance between the predicted alarm data and the typical training data through a KNN algorithm.

5. The alarm filtering method based on the classification algorithm according to claim 1, wherein the extracting of the verification alarm data from the historical false alarm log of the server as the input of the preset classification algorithm specifically comprises:

obtaining predicted alarm data from a historical false alarm log of a server;

calculating the Euclidean distance between the predicted alarm data and the typical training data through a preset classification algorithm;

determining predicted alarm data with Euclidean distance smaller than a preset minimum distance threshold as reference false alarm data; determining the predicted alarm data with Euclidean distance greater than the preset maximum distance threshold as reference real alarm data;

randomly extracting a preset quantity of data from the reference false alarm data and the reference real alarm data as the verification alarm data.

6. An alarm filtering system based on a classification algorithm, the system comprising:

an acquisition module, configured to acquire predicted alarm data, an alarm type corresponding to the predicted alarm data and typical training data corresponding to the alarm type from historical false alarm logs of the server within a preset time period; to obtain, according to a preset classification algorithm, the Euclidean distance between the predicted alarm data and the typical training data, and then to obtain the average distance values over a plurality of alarm types for the predicted alarm data; wherein the predicted alarm data are the data on which false alarm filtering is performed;

an acquisition module, configured to extract verification alarm data from the historical false alarm logs of the server as input to the preset classification algorithm, to obtain the average distance values over a plurality of alarm types for the extracted verification alarm data, and to obtain estimated false alarm data within the verification alarm data by comparing the average distance value with the preset screening value; wherein the verification alarm data are at least alarm data obtained randomly from an external database or alarm data extracted from the predicted alarm data;

a completion module, configured to determine the ratio of the amount of real alarm data in the estimated false alarm data to the total amount of real alarm data in the verification alarm data; to update the preset screening value based on the ratio and a preset increment value until the ratio is smaller than a preset ratio threshold; to feed the predicted alarm data to the algorithm again to obtain the average distance value; and then to complete alarm filtering according to the updated preset screening value.

7. The alarm filtering system based on the classification algorithm according to claim 6, wherein the obtaining module further comprises an obtaining unit;

the obtaining unit is configured to acquire historical false alarm logs within a preset time period and read the alarm message features in the historical false alarm logs, wherein the alarm message features include at least the IP, port, protocol type, alarm grade, alarm type and IP packet header of the alarm record; to compute the type ratio of the number of historical false alarm logs corresponding to each alarm type to the total number of historical false alarm logs; and to determine the quantity of typical training data corresponding to each alarm type according to its type ratio, thereby completing the acquisition of the typical training data.

8. An alarm filtering device based on a classification algorithm, characterized in that it comprises:

a processor;

and a memory having executable code stored thereon, which when executed, causes the processor to perform a method of alarm filtering based on a classification algorithm according to any one of claims 1-5.