CN112561236A - Alarm information compression method based on frequent item set mining - Google Patents

Publication number
CN112561236A
Authority
CN
China
Prior art keywords
alarm
compression
alarm information
initial
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011320123.5A
Other languages
Chinese (zh)
Other versions
CN112561236B (en
Inventor
庞晓健
苏扬
陶文伟
吴金宇
张文哲
曾初阳
易思瑶
陈刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Southern Power Grid Co Ltd
Original Assignee
China Southern Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Southern Power Grid Co Ltd
Priority to CN202011320123.5A
Publication of CN112561236A
Application granted
Publication of CN112561236B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Debugging And Monitoring (AREA)
  • Alarm Systems (AREA)

Abstract

The application relates to an alarm information compression method, an alarm information compression device, computer equipment and a storage medium. The method comprises the following steps: acquiring real-time alarm information; and comparing the real-time alarm information with a compression rule set mined based on historical alarm information of a preset time period to obtain a compression result of the real-time alarm information. By the method, the alarm information can be compressed in real time on the premise of ensuring the compression efficiency.

Description

Alarm information compression method based on frequent item set mining
Technical Field
The application relates to the field of industrial control system network security, in particular to a mass alarm compression method, a mass alarm compression device, computer equipment and a storage medium.
Background
As the digitalization of the Southern Power Grid deepens and network security operation monitoring expands, security equipment is deployed ever more widely. Owing to service configuration errors, zombie services and similar causes, the volume of generated alarm information keeps growing and seriously interferes with network security operation monitoring.
For this reason, alarm data compression algorithms have been designed for network security alarm information in existing power monitoring systems. For example, China Energy Engineering Group Hunan Electric Power Design Institute Co., Ltd. proposed an alarm merging and tracing method based on main and distribution network integration (application number: CN201810882998.0, publication number: CN108520370A, application date: 2018.08.06). However, that algorithm has a low compression rate and poor real-time performance when compressing alarm data, and it is difficult for it to achieve lossless compression, which hinders alarm tracing work.
Disclosure of Invention
In view of the above, it is necessary to provide an alarm compression method, apparatus, computer device and storage medium for solving the above technical problems.
A method for compressing alarm information based on frequent item set mining, the method comprises:
acquiring real-time alarm information;
comparing the real-time alarm information with a compression rule set mined based on historical alarm information of a preset time period to obtain a compression result of the real-time alarm information;
the method for mining the compression rule set based on the historical alarm information of the preset time period comprises the following steps:
acquiring historical alarm information of a preset time period;
obtaining a first initial alarm set and a second initial alarm set based on the historical alarm information, wherein the access direction of the historical alarm information in the first initial alarm set is that a local host accesses a remote host; the access direction of the historical alarm information in the second initial alarm set is that the remote host accesses the local host;
respectively mining a frequent item set from the first initial alarm set and the second initial alarm set to obtain a first frequent item set and a second frequent item set;
and obtaining a first compression rule set and a second compression rule set based on the first frequent item set and the second frequent item set, and merging the first compression rule set and the second compression rule set to obtain a compression rule set.
In one embodiment, the step of obtaining a first initial alarm set and a second initial alarm set based on the historical alarm information comprises:
forming a historical alarm set based on the historical alarm information, and recording the historical alarm information in the historical alarm set as historical alarm information;
adding a first identifier before a local host IP and a port item of historical alarm information in the historical alarm set, and adding a second identifier before a remote host IP and a port item to obtain the historical alarm set after the identifiers are added, wherein the first identifier and the second identifier are used for distinguishing the IP and the port item of the local host and the remote host;
and obtaining a first initial alarm set and a second initial alarm set based on the historical alarm set after the identifier is added.
In one embodiment, the obtaining a first initial alarm set and a second initial alarm set based on the history alarm set after the identifier is added includes:
defining an initial alarm compression set;
judging whether the initial alarm compression set is empty or not;
if yes, acquiring historical alarm information in the historical alarm set, and adding one piece of historical alarm information in the historical alarm set to the initial alarm compression set;
matching the historical alarm information in the historical alarm set with the historical alarm information in the initial alarm compression set in sequence to obtain a matched initial alarm compression set;
if not, matching the historical alarm information in the historical alarm set with the historical alarm information in the initial alarm compression set in sequence to obtain a matched initial alarm compression set;
and obtaining a first initial alarm set and a second initial alarm set according to the matched initial alarm compression set.
In one embodiment, matching the historical alarm information in the historical alarm set with the historical alarm information in the initial alarm compression set in sequence to obtain a matched initial alarm compression set includes:
if the quadruple of a piece of historical alarm information in the historical alarm set is completely consistent with the quadruple of any piece of historical alarm information in the initial alarm compression set, moving on to match the next piece of historical alarm information;
if exactly three elements of the quadruple of the historical alarm information in the historical alarm set are consistent with the quadruple of a piece of historical alarm information in the initial alarm compression set, replacing the one inconsistent element of that quadruple in the initial alarm compression set with a wildcard;
otherwise, directly adding the historical alarm information in the historical alarm set to the initial alarm compression set, and repeating until all historical alarm information in the historical alarm set has been matched.
In one embodiment, obtaining a first initial alarm set and a second initial alarm set according to the matched initial alarm compression set includes:
determining the access direction of the historical alarm information in the initial alarm compression set according to the first identifier, the second identifier and the wildcard: if the wildcard is located at the position of the first identifier, recording the access direction as forward access; if the wildcard is located at the position of the second identifier, recording the access direction as reverse access;
deleting the wildcard;
and based on the access direction, dividing the historical alarm information in the initial alarm compression set into a first initial alarm set and a second initial alarm set.
In one embodiment, mining frequent item sets from the first initial alarm set and the second initial alarm set respectively to obtain a first frequent item set and a second frequent item set comprises:
setting a minimum support threshold, and mining the first frequent item set and the second frequent item set from the first initial alarm set and the second initial alarm set respectively.
In one embodiment, deriving a first compression rule set and a second compression rule set based on the first frequent item set and the second frequent item set comprises:
selecting the item sets that are 2-tuples and 3-tuples from the first frequent item set and the second frequent item set, and merging them into a set to obtain the first compression rule set and the second compression rule set.
In one embodiment, the method further comprises:
and ordering the compression rules in the compression rule set according to the support degree.
In one embodiment, comparing the real-time alarm information with a compression rule set mined based on historical alarm information of a predetermined time period to obtain a compression result of the real-time alarm information includes:
comparing the real-time alarm information with the compression rules in the compression rule set in sequence;
if the compression rule is a subset of the real-time alarm information, which indicates that the compression rule hits the real-time alarm information, adding the compression rule to a real-time compression result set;
if the compression rule is not a subset of the real-time alarm information, comparing the real-time alarm information with the next compression rule;
if none of the compression rules is a subset of the real-time alarm information, adding the real-time alarm information itself to the real-time compression result set;
and obtaining a compression result of the real-time alarm information according to the compression result set.
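The comparison steps above can be illustrated with a minimal Python sketch. It assumes that rules and alarms are represented as sets of identifier-tagged strings, that the rule list is already ordered, and that comparison stops at the first hit; the function name `compress_realtime` and the use of a Python `set` as the result set are hypothetical choices, not taken from the application.

```python
from typing import FrozenSet, Iterable, List, Set


def compress_realtime(alarm: Iterable[str],
                      rules: List[FrozenSet[str]],
                      result_set: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Compare one real-time alarm with the ordered rules and record the outcome."""
    items = frozenset(alarm)
    for rule in rules:
        # A rule "hits" the alarm when every item of the rule occurs in the alarm.
        if rule <= items:
            result_set.add(rule)
            return rule  # assumption: stop at the first rule that hits
    # No rule is a subset of the alarm, so the alarm itself is kept unchanged.
    result_set.add(items)
    return items


# Hypothetical rule and alarms, tagged with 'L' (local) and 'R' (remote).
rules = [frozenset({"L10.88.3.18", "R6000"})]
results: Set[FrozenSet[str]] = set()
hit_1 = compress_realtime(("L10.88.3.18", "L4775", "R10.88.49.66", "R6000"), rules, results)
hit_2 = compress_realtime(("L10.88.3.18", "L4999", "R10.88.49.70", "R6000"), rules, results)
miss = compress_realtime(("L10.1.1.1", "L1", "R10.2.2.2", "R2"), rules, results)
```

In this sketch the two alarms that share the local IP and remote port collapse into the single rule, while the unmatched alarm is stored whole, so no alarm disappears from the result set.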
In one embodiment, the method further comprises:
and updating the compression rule set based on a preset compression time period.
An alarm information compression apparatus based on frequent item set mining, the apparatus comprising:
the real-time alarm information acquisition module is used for acquiring real-time alarm information;
the compression result acquisition module is used for comparing the real-time alarm information with a compression rule set mined based on historical alarm information of a preset time period to acquire a compression result of the real-time alarm information;
and the compression rule set mining module is used for mining the compression rule set based on the historical alarm information of the preset time period.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
According to the alarm information compression method, the device, the computer equipment and the storage medium, the real-time alarm information is obtained, and the real-time alarm information is compared with the compression rule set mined based on the historical alarm information of the preset time period, so that the real-time compression of the alarm information is realized, and the compression result of the real-time compression of the alarm information is obtained. By the method, the alarm information is compressed in real time on the premise of ensuring the compression efficiency and lossless compression.
Drawings
FIG. 1 is a diagram of an application environment of an alarm information compression method in one embodiment;
FIG. 2 is a flow chart illustrating a method for compressing alarm information according to an embodiment;
FIG. 3 is a flowchart illustrating a method for compressing alarm information according to an embodiment;
FIG. 4 is a flow chart of a preprocessing for obtaining a first initial alarm set and a second initial alarm set of the alarm information compression method in one embodiment;
FIG. 5 is a diagram of the historical alarm information rules before compression in an alarm information compression method according to an embodiment;
FIG. 6 is a diagram of the historical alarm information rules after compression in an alarm information compression method according to an embodiment;
FIG. 7 is a flowchart illustrating a method for compressing alarm information according to an embodiment;
FIG. 8 is a block diagram showing the construction of an alarm information compressing apparatus according to an embodiment;
FIG. 9 is a block diagram showing the construction of an alarm information compressing apparatus according to an embodiment;
FIG. 10 is a block diagram showing the construction of an alarm information compressing apparatus according to an embodiment;
FIG. 11 is a block diagram showing the construction of an alarm information compressing apparatus according to an embodiment;
FIG. 12 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The alarm information compression method provided by the application can be applied to the application environment shown in fig. 1. The server 104 compresses the alarm information, that is, the information generated by the security device 106 during communication that does not conform to the security policy. It compares the real-time alarm information with the compression rule set mined from the historical alarm information of the predetermined time period to obtain the compression result of the real-time alarm information, and outputs the real-time compression result to the terminal 102 for display. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers. The security device 106 may be a firewall or a similar security device.
In an embodiment, as shown in fig. 2, an alarm information compression method is provided, which is described by taking the method as an example applied to the terminal 102 in fig. 1, and includes the following steps:
Step 202, acquiring real-time alarm information.
The alarm information refers to information that does not conform to the security policy, and the real-time alarm information refers to such information generated at the current moment. Alarm information generally includes the host name, host type, local host IP (network address), local host port, remote host IP, remote host port, alarm time and so on; the local host IP, local host port, remote host IP and remote host port may be selected to constitute the alarm information.
Step 204, comparing the real-time alarm information with a compression rule set mined based on historical alarm information of a preset time period to obtain a compression result of the real-time alarm information.
The historical alarm information refers to information which is generated in the communication process of the local host and the remote host within a past period and does not conform to the security policy, for example, alarm information generated in the past 24 hours can be acquired, and alarm information generated in the past 10 hours can also be acquired. The compression rule set refers to a set consisting of compression rules, the compression rules are mined frequent item sets, and the frequent item sets are item sets with the support degree larger than a minimum support degree threshold value.
In one embodiment, as shown in fig. 3, the compression rule set is mined from the historical alarm information of the predetermined time period as follows:
Step 302, acquiring historical alarm information of a preset time period;
The historical alarm information of the preset time period is the alarm information of a preset past time period, and the preset time period can be adjusted according to actual conditions.
Step 304, obtaining a first initial alarm set and a second initial alarm set based on the historical alarm information, wherein the access direction of the historical alarm information in the first initial alarm set is that a local host accesses a remote host; the access direction of the historical alarm information in the second initial alarm set is that the remote host accesses the local host;
The first initial alarm set and the second initial alarm set store the compressed historical alarm information. For part of the historical alarm information the access direction is that the local host accesses the remote host, and for another part it is that the remote host accesses the local host, so the alarm information of the two directions is classified by setting up the first initial alarm set and the second initial alarm set.
Step 306, respectively mining a frequent item set from the first initial alarm set and the second initial alarm set to obtain a first frequent item set and a second frequent item set;
wherein, let $I = \{i_1, i_2, \ldots, i_M\}$ be a set of $M$ different elements; each element is called an item, and a set of several items is called an item set. The number of items in an item set is referred to as the length of the item set, and an item set of length $k$ is referred to as a $k$-item set. The frequent item set refers to an item set whose support is greater than the minimum support threshold. Let an item set $P \subseteq I$. The number of times the item set $P$ appears in $I$ is called the support count, and the frequency with which the item set $P$ appears in $I$ is defined as the support of $P$. The minimum support threshold is a configured algorithm parameter: an item set $P$ is mined as a frequent item set only if its frequency of occurrence in $I$ is greater than the minimum support threshold.
In one embodiment, a piece of alarm information may be represented by a 4-item set whose elements are the local host IP address, the local host port, the remote host IP address and the remote host port. After the compression preprocessing, the alarm information in the first initial alarm set and the second initial alarm set may give rise to 1-item sets, 2-item sets, 3-item sets and 4-item sets. A 1-item set comprises any one of the local host IP address, local host port, remote host IP address and remote host port; a 2-item set comprises a combination of any two of these elements; and a 3-item set comprises a combination of any three of them. Frequent item sets are mined from the first initial alarm set and the second initial alarm set respectively using the Apriori algorithm (an association rule mining algorithm). For the first initial alarm set, the minimum support threshold in the Apriori algorithm can be set to 2 divided by the total number of pieces of alarm information in the first initial alarm set; an item set formed from the alarm information in the first initial alarm set whose support is greater than this threshold is mined into the first frequent item set. Similarly, for the second initial alarm set, the minimum support threshold can be set to 2 divided by the total number of pieces of alarm information in the second initial alarm set, and an item set formed from the alarm information in the second initial alarm set whose support is greater than this threshold is mined into the second frequent item set.
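The following Python sketch illustrates the thresholding described above. It is not an Apriori implementation: because each alarm holds at most four items, it simply counts every subset of every alarm, which yields the same frequent item sets that Apriori would find for the same threshold. The function name `frequent_itemsets` and the sample alarms are hypothetical. With a threshold of 2 divided by the number of alarms, an item set is frequent exactly when it occurs in more than two alarms.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Sequence, Tuple


def frequent_itemsets(alarm_set: Sequence[Tuple[str, ...]],
                      min_support: Optional[float] = None) -> Dict[FrozenSet[str], float]:
    """Return {item set: support} for item sets whose support exceeds the threshold."""
    n = len(alarm_set)
    if n == 0:
        return {}
    if min_support is None:
        min_support = 2 / n  # threshold suggested in the text: 2 / total alarm count
    counts: Counter = Counter()
    for alarm in alarm_set:
        items = sorted(set(alarm))
        # Count every non-empty subset of the alarm once per alarm.
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    # Support is the fraction of alarms containing the item set; keep strict ">".
    return {s: c / n for s, c in counts.items() if c / n > min_support}


# Hypothetical first initial alarm set after preprocessing (wildcards removed).
first_initial = [
    ("L10.88.3.18", "R10.88.49.66", "R6000"),
    ("L10.88.3.18", "R10.88.49.67", "R6000"),
    ("L10.88.3.18", "R10.88.49.68", "R6000"),
    ("L10.88.3.19", "R10.88.49.66", "R22"),
]
first_frequent = frequent_itemsets(first_initial)
```

Here the threshold is 2/4 = 0.5, so the item R10.88.49.66, which appears in exactly two of the four alarms, is not frequent, whereas the pair {L10.88.3.18, R6000} appears in three alarms and is.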
Step 308, obtaining a first compression rule set and a second compression rule set based on the first frequent item set and the second frequent item set, and merging the first compression rule set and the second compression rule set to obtain a compression rule set.
A first compression rule set and a second compression rule set are obtained from the mined first frequent item set and second frequent item set. In one embodiment, a frequent 1-item set retains only a single IP or port of the alarm quadruple, so the alarm information is severely lost, while a frequent 4-item set performs no compression at all and offers no guidance for compressing alarms. Therefore, the frequent 2-item sets and frequent 3-item sets can be selected from the mined frequent item sets and merged into one set, and each frequent item set in that set is regarded as a compression rule. Here, a frequent 1-item set is a 1-item set whose support is greater than the minimum support threshold set in the Apriori algorithm, and frequent 2-item sets, frequent 3-item sets and frequent 4-item sets are defined in the same way.
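As a rough illustration of the rule selection described above, the Python sketch below keeps only the frequent 2-item and 3-item sets from both directions, merges them, and orders the result by support. The function name `build_rule_set`, the descending sort order, and keeping the larger support when the same item set occurs in both inputs are assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Tuple


def build_rule_set(first_frequent: Dict[FrozenSet[str], float],
                   second_frequent: Dict[FrozenSet[str], float]
                   ) -> List[Tuple[FrozenSet[str], float]]:
    """Select frequent 2- and 3-item sets from both inputs and merge them."""
    merged: Dict[FrozenSet[str], float] = {}
    for frequent in (first_frequent, second_frequent):
        for itemset, support in frequent.items():
            # 1-item sets lose too much information; 4-item sets compress nothing.
            if len(itemset) in (2, 3):
                merged[itemset] = max(support, merged.get(itemset, 0.0))
    # Assumption: rules with higher support are tried first.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical mining results for the two access directions.
first = {
    frozenset({"L10.88.3.18"}): 0.75,
    frozenset({"L10.88.3.18", "R6000"}): 0.75,
}
second = {
    frozenset({"L10.88.3.20", "L22", "R10.88.49.66"}): 0.6,
    frozenset({"L10.88.3.20", "L22", "R10.88.49.66", "R5000"}): 0.6,
}
rule_set = build_rule_set(first, second)
```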
In the alarm information compression method, real-time alarm information is obtained and compared with a compression rule set mined based on historical alarm information of a preset time period to obtain a compression result of the real-time alarm information. The compression rule set is mined as follows: acquiring historical alarm information of a preset time period; obtaining a first initial alarm set and a second initial alarm set based on the historical alarm information, wherein the access direction of the historical alarm information in the first initial alarm set is that a local host accesses a remote host, and the access direction of the historical alarm information in the second initial alarm set is that the remote host accesses the local host; respectively mining frequent item sets from the first initial alarm set and the second initial alarm set to obtain a first frequent item set and a second frequent item set; and obtaining a first compression rule set and a second compression rule set based on the first frequent item set and the second frequent item set, and merging the first compression rule set and the second compression rule set to obtain the compression rule set. By this method, the alarm information can be compressed in real time while the compression efficiency is ensured.
In one embodiment, after obtaining the historical alarm information for a predetermined period of time, the step of obtaining a first initial alarm set and a second initial alarm set based on the historical alarm information includes:
forming a historical alarm set based on the historical alarm information, and recording the historical alarm information in the historical alarm set as historical alarm information;
adding a first identifier before a local host IP and a port item of historical alarm information in the historical alarm set, and adding a second identifier before a remote host IP and a port item to obtain the historical alarm set after the identifiers are added, wherein the first identifier and the second identifier are used for distinguishing the IP and the port item of the local host and the remote host;
and obtaining a first initial alarm set and a second initial alarm set based on the historical alarm set after the identifier is added.
The identifiers are used to distinguish the IP and port of the local host from those of the remote host; a feature code may be added instead, as long as the two can be told apart. In one embodiment, the quadruple of the historical alarm information is identified with letters: based on the obtained historical alarm information, an identifier 'L' is added before the local host IP and port, and an identifier 'R' is added before the remote host IP and port. Specifically, if the quadruple of the historical alarm information is {10.88.3.18, 4775, 10.88.49.66, 6000}, the quadruple after adding the identifiers is {L10.88.3.18, L4775, R10.88.49.66, R6000}.
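The identifier step can be sketched in a few lines of Python. The `Alarm` type and the `tag_quadruple` function are hypothetical names introduced for illustration; only the 'L' and 'R' prefixes and the sample quadruple come from the text.

```python
from typing import NamedTuple, Tuple


class Alarm(NamedTuple):
    """Quadruple of one alarm (hypothetical container for this sketch)."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


def tag_quadruple(alarm: Alarm, local_id: str = "L",
                  remote_id: str = "R") -> Tuple[str, str, str, str]:
    """Prefix local items with the first identifier and remote items with the second."""
    return (
        f"{local_id}{alarm.local_ip}",
        f"{local_id}{alarm.local_port}",
        f"{remote_id}{alarm.remote_ip}",
        f"{remote_id}{alarm.remote_port}",
    )


tagged = tag_quadruple(Alarm("10.88.3.18", 4775, "10.88.49.66", 6000))
```

Prefixing keeps a local port such as 6000 distinct from a remote port 6000 once the quadruple is treated as an unordered item set during mining.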
In one embodiment, the obtaining a first initial alarm set and a second initial alarm set based on the history alarm set after the identifier is added includes:
defining an initial alarm compression set;
judging whether the initial alarm compression set is empty or not;
if yes, acquiring historical alarm information in the historical alarm set, and adding one piece of historical alarm information in the historical alarm set to the initial alarm compression set;
matching the historical alarm information in the historical alarm set with the historical alarm information in the initial alarm compression set in sequence to obtain a matched initial alarm compression set;
if not, matching the historical alarm information in the historical alarm set with the historical alarm information in the initial alarm compression set in sequence to obtain a matched initial alarm compression set;
and obtaining a first initial alarm set and a second initial alarm set according to the matched initial alarm compression set.
The initial alarm compression set is used for storing the historical alarm information after initial compression. It may or may not already contain historical alarm information, so the subsequent steps depend on whether it is empty. If it is empty, one piece of historical alarm information from the historical alarm set is first added to it, and the historical alarm information in the historical alarm set is then matched in sequence against the historical alarm information in the initial alarm compression set. If it is not empty, the matching is performed directly. In both cases the result is the matched initial alarm compression set, from which the first initial alarm set and the second initial alarm set are obtained.
In one embodiment, matching the historical alarm information in the historical alarm set with the historical alarm information in the initial alarm compression set in sequence to obtain a matched initial alarm compression set includes:
if the quadruple of a piece of historical alarm information in the historical alarm set is completely consistent with the quadruple of any piece of historical alarm information in the initial alarm compression set, matching the next piece of historical alarm information;
if exactly three items of the quadruple of the historical alarm information in the historical alarm set are consistent with the quadruple of a piece of historical alarm information in the initial alarm compression set, replacing the one inconsistent item of that quadruple in the initial alarm compression set with a wildcard;
otherwise, directly adding the historical alarm information in the historical alarm set to the initial alarm compression set, until all the historical alarm information in the historical alarm set has been matched.
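The matching embodiment above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the names `WILDCARD`, `merge_alarm` and `preprocess` are hypothetical, the character `*` is an assumed placeholder (the text does not name the wildcard character), and the choice to leave an entry that already contains a wildcard unmerged on a further mismatch is an assumption, since the text does not cover that case.

```python
WILDCARD = "*"  # assumed placeholder; the text does not name the wildcard character


def merge_alarm(compressed, alarm):
    """Match one tagged quadruple against the initial alarm compression set."""
    for i, entry in enumerate(compressed):
        # Positions where the entry and the alarm disagree; a wildcard
        # position in the entry is treated as agreeing with any value.
        diff = [k for k in range(4)
                if entry[k] != WILDCARD and entry[k] != alarm[k]]
        if not diff:
            return  # completely consistent: nothing to add, match the next alarm
        if len(diff) == 1 and WILDCARD not in entry:
            k = diff[0]
            # Exactly three items agree: replace the inconsistent item.
            compressed[i] = entry[:k] + (WILDCARD,) + entry[k + 1:]
            return
    compressed.append(alarm)  # no entry matched: add the alarm as-is


def preprocess(history):
    """Build the matched initial alarm compression set from the historical set."""
    compressed = []
    for alarm in history:
        merge_alarm(compressed, alarm)
    return compressed
```

Quadruples are represented here as 4-tuples of identifier-prefixed strings in the fixed order local IP, local port, remote IP, remote port, so that the position of the wildcard can later be used to infer the access direction.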
In one embodiment, obtaining a first initial alarm set and a second initial alarm set according to the matched initial alarm compression set includes:
determining the access direction of the historical alarm information in the initial alarm compression set according to the first identifier, the second identifier and the wildcard: if the wildcard is located at the position of the first identifier, recording the access direction as forward access; if the wildcard is located at the position of the second identifier, recording the access direction as reverse access;
deleting the wildcard;
and based on the access direction, dividing the historical alarm information in the initial alarm compression set into a first initial alarm set and a second initial alarm set.
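A minimal sketch of the direction-splitting embodiment above, assuming each entry is a 4-tuple in the order local IP, local port, remote IP, remote port and that `*` stands for the wildcard. The function name `split_by_direction` and the third list for entries that carry no wildcard are assumptions made for illustration; the text only describes the forward and reverse sets.

```python
WILDCARD = "*"  # assumed placeholder for the wildcard


def split_by_direction(compressed):
    """Split the matched initial alarm compression set by access direction."""
    forward, reverse, undetermined = [], [], []
    for entry in compressed:
        # Delete the wildcard item; the remaining items form the alarm.
        items = frozenset(x for x in entry if x != WILDCARD)
        if WILDCARD in entry[:2]:
            # Wildcard at a first-identifier (local) position: forward access.
            forward.append(items)
        elif WILDCARD in entry[2:]:
            # Wildcard at a second-identifier (remote) position: reverse access.
            reverse.append(items)
        else:
            # No wildcard: direction not covered by the text (assumption).
            undetermined.append(items)
    return forward, reverse, undetermined
```

The `forward` and `reverse` lists correspond to the first and second initial alarm sets respectively.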
In one embodiment, deriving a first set of compression rules and a second set of compression rules based on the first set of frequent items and the second set of frequent items comprises:
selecting the item sets containing two items (2-item sets) and three items (3-item sets) from the first frequent item set and the second frequent item set, and merging, for each frequent item set, the selected 2-item sets and 3-item sets into one set, to obtain the first compression rule set and the second compression rule set respectively.
In one embodiment, translating the compression result of the real-time alarm information into an alarm output with semantics comprises:
translating the alarm information in the compression result set into alarms with semantics, wherein the translation rule is as follows: if the alarm information is forward alarm information, then according to which of the local host IP, local host port, remote host IP and remote host port items are present in the alarm information, the alarm is translated into a port of a local address accessing a group of remote addresses, a group of ports of the local address accessing a group of ports of the remote address, or a group of ports of a group of local addresses accessing a port of the remote address; similarly, if the alarm information is reverse alarm information, the alarm is correspondingly translated into a port of a remote address accessing a group of local addresses, a group of ports of the remote address accessing a group of ports of the local address, or a group of ports of a group of remote addresses accessing a port of the local address; if the alarm information is an alarm with unknown direction, the alarm is translated into a certain port of a local address communicating with a certain port of a remote address.
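One possible way to render such semantic alarms is sketched below. It is not the translation table of the embodiment: the helper names `parse`, `side_phrase` and `translate` are hypothetical, the wording of the output strings is invented for illustration, and the sketch assumes 'L'/'R' identifier prefixes and dotted IPv4 addresses (an item containing a dot is taken to be an IP, otherwise a port). Items missing from a compressed alarm are rendered as "a group of".

```python
def parse(items):
    """Map identifier-prefixed items to (side, kind) -> value."""
    fields = {}
    for item in items:
        side = "local" if item[0] == "L" else "remote"
        kind = "ip" if "." in item else "port"  # assumes dotted IPv4 addresses
        fields[(side, kind)] = item[1:]
    return fields


def side_phrase(fields, side):
    """Describe one side; a missing item becomes 'a group of ...'."""
    ip = fields.get((side, "ip"))
    port = fields.get((side, "port"))
    addr = f"{side} address {ip}" if ip else f"a group of {side} addresses"
    prt = f"port {port}" if port else "a group of ports"
    return f"{prt} of {addr}"


def translate(items, direction):
    """Render a compressed alarm as a sentence according to its direction."""
    fields = parse(items)
    local = side_phrase(fields, "local")
    remote = side_phrase(fields, "remote")
    if direction == "forward":
        return f"{local} accesses {remote}"
    if direction == "reverse":
        return f"{remote} accesses {local}"
    return f"{local} communicates with {remote}"  # unknown direction
```

For example, a forward alarm that retains only the local IP, the remote IP and the remote port is rendered as a group of ports of the local address accessing one port of the remote address.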
In one embodiment, the method further comprises the following steps: and updating the compression rule set based on a preset compression time period. Specifically, at intervals of 30 minutes, Apriori algorithm is adopted to mine historical alarm information of the last 24 hours, the compression rule is updated once, and a new compression rule set is generated.
In one embodiment, the method further comprises the following steps: and deleting the repeated real-time alarm information in the real-time compression result set to obtain the compression result of the real-time alarm information.
In one embodiment, as shown in fig. 4, a flow chart of preprocessing for obtaining a first initial set of alarms and a second initial set of alarms is provided. The pre-processing flow diagram is illustrated by way of example.
An initial alarm compression set is defined, which is used for storing the compressed historical alarm information from the historical alarm set. Specifically, any one piece of historical alarm information is added to the initial alarm compression set, for example, the historical alarm information with the quadruple {L10.88.3.18, L4775, R10.88.49.66, R6000}.
If the quadruple of the next piece of historical alarm information is {L10.88.3.18, L4001, R10.88.49.66, R6000}, it differs from the alarm information in the initial alarm compression set in only one port item, so the differing item in the initial alarm compression set is replaced with a wildcard; the wildcard is used only to identify the access direction of the alarm information. If the quadruple of the next piece of historical alarm information is completely consistent with the quadruple of a piece of historical alarm information in the initial alarm compression set, that piece is deleted from the historical alarm information and the next piece is matched. Otherwise, the alarm information is added directly to the initial alarm compression set. This continues until all the historical alarm information in the historical alarm set has been matched.
The access direction of each piece of historical alarm information in the initial alarm compression set is then determined from the position at which the wildcard appears relative to the identifiers. Historical alarm information whose access direction is the local host accessing the remote host forms the first initial alarm set, and historical alarm information whose access direction is the remote host accessing the local host forms the second initial alarm set.
In one embodiment, fig. 5 and fig. 6 respectively show the scale of the historical alarm information before compression and after compression. One week of historical alarm information was randomly selected as input. As shown in fig. 5, the average total number of pieces of historical alarm information per day before compression is 1.3 million; as shown in fig. 6, the average total number per day after compression is 90. The result verifies the effectiveness of the invention and shows that the method can serve as an efficient alarm information compression technique.
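The two daily averages reported above (130 ten-thousands, i.e. 1.3 million, before compression and 90 after) imply the following compression ratio and reduction. The variable names are chosen here for illustration only.

```python
before = 1_300_000  # average alarms per day before compression (fig. 5)
after = 90          # average alarms per day after compression (fig. 6)

ratio = before / after          # how many raw alarms map to one output alarm
reduction = 1 - after / before  # fraction of alarms removed

print(f"compression ratio about {ratio:.0f}:1, reduction {reduction:.4%}")
```

In other words, on these figures roughly one output alarm remains for every 14,000 or so raw alarms.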
In one embodiment, as shown in fig. 7, a flowchart of an alarm information compression method is shown. Embodiments of the alert information compression method are illustrated by way of example.
1. Acquiring historical alarm information of a preset time period, and compressing the historical alarm information by applying a heuristic preprocessing method, wherein the specific steps comprise:
In the first step, historical alarm information of a predetermined time period is obtained. The predetermined time period can be set to 24 hours, in which case the historical alarm information of the past 24 hours is obtained and formed into a historical alarm set. Each piece of historical alarm information is a quadruple comprising a local host IP, a local host port, a remote host IP and a remote host port. To distinguish the local items from the remote items, different identifiers can be added in front of the host IP and port items of each piece of historical alarm information. The identifiers can be numbers or letters, as long as the IPs and ports of the local host and the remote host can be distinguished. For example, an identifier 'L' may be added before the local host IP and port items, and an identifier 'R' may be added before the remote host IP and port items.
In the second step, an initial alarm compression set is defined for storing the historical alarm information after initial compression.
In the third step, one piece of historical alarm information in the historical alarm set is added to the initial alarm compression set, and the historical alarm information in the historical alarm set is then matched in sequence against the alarm information in the initial alarm compression set.
If the local host IP, the local host port, the remote host IP and the remote host port in the two pieces of historical alarm information are completely consistent, the piece from the historical alarm set is deleted and the next piece of historical alarm information is matched; if exactly three of these four items are consistent, the inconsistent item in the initial alarm compression set is replaced with a wildcard character; in all other cases, the historical alarm information is added directly to the initial alarm compression set.
In the fourth step, the host access direction of each piece of historical alarm information in the initial alarm compression set is recorded. The position of the wildcard is checked: if the wildcard is at the position of the local host IP or port, the host access direction is that the local host accesses the remote host, and it is recorded as forward access; if the wildcard is at the position of the remote host IP or port, the host access direction is that the remote host accesses the local host, and it is recorded as reverse access.
In the fifth step, the wildcard items of the historical alarm information in the initial alarm compression set are deleted.
In the sixth step, the historical alarm information in the initial alarm compression set is divided into two sets according to the access direction, namely a first initial alarm set and a second initial alarm set.
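The first step of the preprocessing above (collecting the past 24 hours and adding identifiers) might look like the sketch below. The record layout `(timestamp, local_ip, local_port, remote_ip, remote_port)` and the names `tag_alarm` and `build_history` are assumptions made for this example; the 'L' and 'R' identifiers follow the example given in the text.

```python
from datetime import datetime, timedelta


def tag_alarm(local_ip, local_port, remote_ip, remote_port):
    """Prefix local items with 'L' and remote items with 'R'."""
    return (f"L{local_ip}", f"L{local_port}", f"R{remote_ip}", f"R{remote_port}")


def build_history(records, now, window=timedelta(hours=24)):
    """Form the historical alarm set from records inside the time window.

    Each record is assumed to be
    (timestamp, local_ip, local_port, remote_ip, remote_port).
    """
    return [
        tag_alarm(*record[1:])
        for record in records
        if now - window <= record[0] <= now
    ]
```

The resulting tagged quadruples are the input to the matching, direction-recording and splitting steps described above.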
2. After a heuristic preprocessing method is applied to obtain a first initial alarm set and a second initial alarm set, a compression rule mining algorithm is used for mining a compression rule, and the specific steps comprise:
Firstly, compression rules are mined by using an association algorithm. Specifically, the Apriori algorithm is used to mine frequent item sets from the first initial alarm set; to achieve the maximum compression rate, the minimum support threshold parameter of the Apriori algorithm can be set to 2 divided by the total number of alarms in the first initial alarm set.
Secondly, the frequent 2-item sets and frequent 3-item sets are selected from the mined frequent item sets and combined into one set. Each frequent item set in this set is regarded as a compression rule, so the set is called the first compression rule set.
Thirdly, the second initial alarm set is processed in the same way as the first initial alarm set to obtain a second compression rule set.
Fourthly, the first compression rule set and the second compression rule set are combined, and the compression rules in the combined compression rule set are sorted in descending order of support.
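The four mining steps above can be sketched with a small level-wise Apriori implementation. This is an illustrative sketch rather than the embodiment's code: the function names `apriori`, `mine_rules` and `build_rule_set` are hypothetical, alarms are assumed to be `frozenset`s of identifier-prefixed items, and the minimum support of 2/N is expressed as an absolute count of 2, which is equivalent.

```python
from itertools import combinations


def apriori(transactions, min_count=2, max_size=3):
    """Level-wise Apriori; returns {itemset: support count}."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent and k <= max_size:
        prev = list(frequent)
        # Join step: unions of frequent (k-1)-item sets that have k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


def mine_rules(alarm_set):
    """Keep frequent 2- and 3-item sets as rules, with relative support."""
    freq = apriori(alarm_set)  # count >= 2 is the same as support >= 2/N
    return {s: c / len(alarm_set) for s, c in freq.items() if len(s) in (2, 3)}


def build_rule_set(first_set, second_set):
    """Merge forward and reverse rules, sorted by descending support."""
    rules = [(s, sup, "forward") for s, sup in mine_rules(first_set).items()]
    rules += [(s, sup, "reverse") for s, sup in mine_rules(second_set).items()]
    return sorted(rules, key=lambda r: r[1], reverse=True)
```

Each rule keeps a direction label so that a later hit can be classified as forward or reverse alarm information; supports here are computed relative to each initial alarm set separately, which is an assumption of this sketch.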
3. And after the compression rules in the compression rule set are sorted in a descending order according to the support degree, compressing the real-time alarm information by using a real-time alarm compression method.
Firstly, the real-time alarm is compared with the compression rules in the compression rule set in sequence from top to bottom. If a compression rule is a subset of the alarm, the rule hits the alarm, and the rule is added to the real-time compression result set;
if the compression rule is not a subset of the alarm, the alarm is compared with the next rule; if none of the compression rules in the compression rule set hits the alarm, the alarm itself is added directly to the real-time compression result set;
alarm information hit by a forward compression rule is called forward alarm information, alarm information hit by a reverse compression rule is called reverse alarm information, and an alarm not hit by any compression rule is called unknown-direction alarm information.
Secondly, for repeated alarm information in the compression result set, only one copy is retained.
Thirdly, the alarm information in the compression result set is translated into alarms with semantics, wherein the translation rule is as follows: if the alarm information is forward alarm information, then according to which of the local host IP, local host port, remote host IP and remote host port items are present in the alarm information, the alarm information is translated into a port of a local address accessing a group of remote addresses, a group of ports of the local address accessing a group of ports of the remote address, or a group of ports of a group of local addresses accessing a port of the remote address; similarly, if the alarm information is reverse alarm information, the alarm information is correspondingly translated into a port of a remote address accessing a group of local addresses, a group of ports of the remote address accessing a group of ports of the local address, or a group of ports of a group of remote addresses accessing a port of the local address; if the alarm information is unknown-direction alarm information, the alarm information is translated into a certain port of a local address communicating with a certain port of a remote address.
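The rule-matching and de-duplication steps of the real-time compression above can be sketched as follows. The function name `compress` and the data layout are assumptions: each rule is a pair `(itemset, direction)` already sorted by descending support, and each real-time alarm is a tagged quadruple.

```python
def compress(alarms, rules):
    """Compress real-time alarms against an ordered list of rules.

    rules: list of (frozenset_of_items, direction), highest support first.
    Returns the de-duplicated real-time compression result set, in order
    of first appearance.
    """
    results, seen = [], set()
    for alarm in alarms:
        items = frozenset(alarm)
        # First rule that is a subset of the alarm hits it; if none does,
        # the alarm itself is kept and labelled as unknown direction.
        hit = next(
            ((rule, direction) for rule, direction in rules if rule <= items),
            (items, "unknown"),
        )
        if hit not in seen:  # keep only one copy of repeated results
            seen.add(hit)
            results.append(hit)
    return results
```

Because the rules are tried from top to bottom, the rule with the highest support wins when several rules are subsets of the same alarm.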
4. The compression rules are updated by using a dynamic compression rule updating method. The update interval is set according to the actual situation; for example, the compression rules can be updated once every 5 minutes, with the method for obtaining the compression rules in the above steps being applied to the original alarm data of the last 24 hours to generate a new compression rule set.
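A minimal sketch of such a periodic update is given below, assuming a 5-minute interval and a 24-hour sliding window as in the example above. The class `RuleSetUpdater`, its `maybe_update` method and the injected `mine` callable are hypothetical names; `mine` stands for whatever rule-mining procedure is used and is not specified here.

```python
from datetime import datetime, timedelta


class RuleSetUpdater:
    """Re-mine the compression rule set at a fixed interval."""

    def __init__(self, mine, interval=timedelta(minutes=5),
                 window=timedelta(hours=24)):
        self.mine = mine          # callable: list of alarms -> rule set
        self.interval = interval  # how often the rules are refreshed
        self.window = window      # how much history is mined each time
        self.rules = []
        self.last_update = None

    def maybe_update(self, records, now):
        """Refresh the rules if the interval has elapsed; return True if so.

        records: iterable of (timestamp, alarm) pairs.
        """
        if self.last_update is not None and now - self.last_update < self.interval:
            return False
        recent = [alarm for ts, alarm in records
                  if now - self.window <= ts <= now]
        self.rules = self.mine(recent)
        self.last_update = now
        return True
```

Passing the current time in as an argument, rather than reading the clock inside the class, keeps the sketch deterministic and easy to test.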
It should be understood that although the steps in the flowcharts of fig. 2-3 and 7 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least some of the steps in fig. 2-3 and 7 may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, there is provided an alarm information compression apparatus including an alarm information obtaining module, a compression result obtaining module and a compression rule set mining module, wherein:
An alarm information obtaining module 802, configured to obtain real-time alarm information.
A compression result obtaining module 804, configured to compare the real-time alarm information with a compression rule set mined based on historical alarm information of a predetermined time period, and obtain a compression result of the real-time alarm information.
A compression rule set mining module 806 for mining a compression rule set based on historical alarm information for a predetermined time period.
In one embodiment, as shown in FIG. 9, the compression rule set mining module comprises:
a historical alarm information obtaining module 902, configured to obtain historical alarm information for a predetermined time period.
An initial alarm set obtaining module 904, configured to obtain a first initial alarm set and a second initial alarm set based on the historical alarm information.
A frequent item set mining module 906, configured to mine frequent item sets from the first initial alarm set and the second initial alarm set, respectively, to obtain a first frequent item set and a second frequent item set.
A compression rule set obtaining module 908, configured to obtain a first compression rule set and a second compression rule set based on the first frequent item set and the second frequent item set, and merge the first compression rule set and the second compression rule set to obtain a compression rule set.
In one embodiment, as shown in fig. 10, the apparatus further includes:
a historical alarm set composing module 1002, configured to compose a historical alarm set based on historical alarm information;
the identifier adding module 1004 is configured to add a first identifier before a local host IP and a port entry of each piece of historical alarm information in the historical alarm set, and add a second identifier before a remote host IP and a port entry.
In one embodiment, as shown in fig. 11, the apparatus further includes:
an initial alarm compression set generation module 1102, configured to define an initial alarm compression set.
And the historical alarm information adding module 1104 is configured to sequentially obtain historical alarm information in a historical alarm set, and add one piece of historical alarm information in the historical alarm set to the initial alarm compression set.
And an alarm information matching module 1106, configured to sequentially match the historical alarm information in the historical alarm set with the historical alarm information in the initial alarm compression set, so as to obtain a matched initial alarm compression set.
In one embodiment, the apparatus further comprises:
the compression rule sorting module is used for sorting the compression rules in the compression rule set according to the support degree;
and the compression rule updating module is used for updating the compression rule set based on a preset compression time period.
For the specific limitations of the alarm information compression apparatus, reference may be made to the above limitations of the alarm information compression method, which are not repeated here. All or part of the modules in the alarm information compression apparatus can be implemented by software, hardware or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 12. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of alert information compression. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 12 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, which includes a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the above-mentioned alarm information compression method when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned alert information compression method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent application shall be subject to the appended claims.

Claims (10)

1. A method for compressing alarm information based on frequent item set mining is characterized by comprising the following steps:
acquiring real-time alarm information;
comparing the real-time alarm information with a compression rule set mined based on historical alarm information of a preset time period to obtain a compression result of the real-time alarm information;
the method for mining the compression rule set based on the historical alarm information of the preset time period comprises the following steps:
acquiring historical alarm information of a preset time period;
obtaining a first initial alarm set and a second initial alarm set based on the historical alarm information, wherein the access direction of the historical alarm information in the first initial alarm set is that a local host accesses a remote host; the access direction of the historical alarm information in the second initial alarm set is that the remote host accesses the local host;
respectively mining a frequent item set from the first initial alarm set and the second initial alarm set to obtain a first frequent item set and a second frequent item set;
and obtaining a first compression rule set and a second compression rule set based on the first frequent item set and the second frequent item set, and merging the first compression rule set and the second compression rule set to obtain a compression rule set.
2. The method of claim 1, wherein obtaining a first initial set of alarms and a second initial set of alarms based on the historical alarm information comprises:
forming a historical alarm set based on the historical alarm information;
adding a first identifier before a local host IP and a port item of each piece of historical alarm information in the historical alarm set, and adding a second identifier before a remote host IP and a port item to obtain the historical alarm set after the identifiers are added, wherein the first identifier and the second identifier are used for distinguishing the IP and the port items of the local host and the remote host;
and obtaining a first initial alarm set and a second initial alarm set based on the historical alarm set after the identifier is added.
3. The method of claim 2, wherein obtaining a first initial set of alarms and a second initial set of alarms based on the set of historical alarms after adding the identifier comprises:
defining an initial alarm compression set;
judging whether the initial alarm compression set is empty or not;
if yes, acquiring historical alarm information in the historical alarm set, and adding one piece of historical alarm information in the historical alarm set to the initial alarm compression set;
matching the historical alarm information in the historical alarm set with the historical alarm information in the initial alarm compression set in sequence to obtain a matched initial alarm compression set; if not, matching the historical alarm information in the historical alarm set with the historical alarm information in the initial alarm compression set in sequence to obtain a matched initial alarm compression set;
and obtaining a first initial alarm set and a second initial alarm set according to the matched initial alarm compression set.
4. The method according to claim 3, wherein the matching the historical alarm information in the historical alarm set with the historical alarm information in the initial alarm compression set in sequence to obtain the matched initial alarm compression set comprises:
if the quadruple of a piece of historical alarm information in the historical alarm set is completely consistent with the quadruple of any piece of historical alarm information in the initial alarm compression set, matching the next piece of historical alarm information;
if exactly three items of the quadruple of the historical alarm information in the historical alarm set are consistent with the quadruple of a piece of historical alarm information in the initial alarm compression set, replacing the one inconsistent item of that quadruple in the initial alarm compression set with a wildcard;
otherwise, directly adding the historical alarm information in the historical alarm set to the initial alarm compression set, until all the historical alarm information in the historical alarm set has been matched.
5. The method according to any of claims 1 to 4, wherein obtaining a first initial alarm set and a second initial alarm set according to the matched initial alarm compression set comprises:
determining the access direction of the historical alarm information in the initial alarm compression set according to the first identifier, the second identifier and the wildcard, and recording the access direction as forward access if the wildcard is located at the position of the first identifier; if the wildcard is located at the position of the second identifier, recording the wildcard as reverse access;
deleting the wildcard;
and based on the access direction, dividing the historical alarm information in the initial alarm compression set into a first initial alarm set and a second initial alarm set.
6. The method of claim 1, wherein mining a frequent item set from a first initial alarm set and a second initial alarm set, respectively, to obtain a first frequent item set and a second frequent item set, comprises:
and setting a minimum support threshold, and mining the first frequent item set and the second frequent item set from the first initial alarm set and the second initial alarm set.
7. The method of claim 1, wherein deriving a first set of compression rules and a second set of compression rules based on the first set of frequent items and the second set of frequent items comprises:
selecting the item sets containing two items and three items from the first frequent item set and the second frequent item set, and combining the selected two-item sets and three-item sets into a set, to obtain the first compression rule set and the second compression rule set.
8. The method of claim 1, comprising:
and ordering the compression rules in the compression rule set according to the support degree.
9. The method according to any one of claims 1-8, wherein comparing the real-time alarm information with a compression rule set mined based on historical alarm information for a predetermined time period to obtain a compression result of the real-time alarm information comprises:
comparing the real-time alarm information with the compression rules in the compression rule set in sequence;
if the compression rule is a subset of the real-time alarm information, which indicates that the compression rule hits the real-time alarm information, adding the compression rule into a real-time compression result set;
if the compression rule is not the subset of the real-time alarm information, comparing the real-time alarm with the next compression rule;
if all compression rules are not the subset of the real-time alarm information, adding the real-time alarm information into a real-time compression result set;
and obtaining a compression result of the real-time alarm information according to the compression result set.
10. The method of claim 1, comprising:
and updating the compression rule set based on a preset compression time period.
CN202011320123.5A 2020-11-23 2020-11-23 Alarm information compression method based on frequent item set mining Active CN112561236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011320123.5A CN112561236B (en) 2020-11-23 2020-11-23 Alarm information compression method based on frequent item set mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011320123.5A CN112561236B (en) 2020-11-23 2020-11-23 Alarm information compression method based on frequent item set mining

Publications (2)

Publication Number Publication Date
CN112561236A true CN112561236A (en) 2021-03-26
CN112561236B CN112561236B (en) 2022-12-06

Family

ID=75044843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011320123.5A Active CN112561236B (en) 2020-11-23 2020-11-23 Alarm information compression method based on frequent item set mining

Country Status (1)

Country Link
CN (1) CN112561236B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09128212A (en) * 1995-10-30 1997-05-16 Kokusai Electric Co Ltd Method and device for compressing data on bill information
US20160179903A1 (en) * 2014-12-23 2016-06-23 Ran Bittmann Enhancing frequent itemset mining
CN106789145A (en) * 2016-03-30 2017-05-31 新华三技术有限公司 A kind of warning information method for pushing and device
CN110096410A (en) * 2019-03-15 2019-08-06 中国平安人寿保险股份有限公司 Alarm information processing method, system, computer installation and readable storage medium storing program for executing
CN110752942A (en) * 2019-09-06 2020-02-04 平安科技(深圳)有限公司 Alarm information decision method and device, computer equipment and storage medium
CN111752811A (en) * 2020-06-29 2020-10-09 平安普惠企业管理有限公司 Abnormal alarm information processing method, electronic device and storage medium

Also Published As

Publication number Publication date
CN112561236B (en) 2022-12-06

Similar Documents

Publication Publication Date Title
CN108090567B (en) Fault diagnosis method and device for power communication system
EP3899770B1 (en) System and method for detecting data anomalies by analysing morphologies of known and/or unknown cybersecurity threats
KR20150038738A (en) Detection of confidential information
CN111159413A (en) Log clustering method, device, equipment and storage medium
EP4080842A1 (en) Method and apparatus for obtaining malicious event information, and electronic device
US20160019211A1 (en) A process for obtaining candidate data from a remote storage server for comparison to a data to be identified
CN112148305A (en) Application detection method and device, computer equipment and readable storage medium
CN111475324A (en) Log information analysis method and device, computer equipment and storage medium
CN114036059A (en) Automatic penetration testing system and method for power grid system and computer equipment
CN115001753A (en) Method and device for analyzing associated alarm, electronic equipment and storage medium
CN114693192A (en) Wind control decision method and device, computer equipment and storage medium
CN117061254B (en) Abnormal flow detection method, device and computer equipment
CN114240344A (en) Enterprise personnel data processing method and device, computer equipment and storage medium
CN114157480A (en) Method, device, equipment and storage medium for determining network attack scheme
CN112561236B (en) Alarm information compression method based on frequent item set mining
CN110838940B (en) Underground cable inspection task configuration method and device
CN115567572A (en) Method, device and equipment for determining abnormality degree of object and storage medium
CN111666501A (en) Abnormal community identification method and device, computer equipment and storage medium
CN115589339A (en) Network attack type identification method, device, equipment and storage medium
CN115660073A (en) Intrusion detection method and system based on harmony whale optimization algorithm
CN116015677A (en) Network safety protection method and device based on key dynamics characteristics
US20220374524A1 (en) Method and system for anamoly detection in the banking system with graph neural networks (gnns)
CN114064723A (en) Association rule mining method and device, computer equipment and storage medium
CN113254672A (en) Abnormal account identification method, system, equipment and readable storage medium
CN112261006B (en) Mining method, terminal and storage medium for discovering dependency relationship among threat behaviors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant