CN115544519A

CN115544519A - Method for carrying out security association analysis on threat information of metering automation system

Info

Publication number: CN115544519A
Application number: CN202211284766.8A
Authority: CN
Inventors: 孙文龙; 何智帆; 刘涛; 姜和芳; 马越; 梁洪浩
Original assignee: Shenzhen Power Supply Bureau Co Ltd
Current assignee: Shenzhen Power Supply Bureau Co Ltd
Priority date: 2022-10-20
Filing date: 2022-10-20
Publication date: 2022-12-30

Abstract

The invention discloses a method for security association analysis of threat intelligence in a metering automation system, comprising the following steps: acquiring multi-source heterogeneous threat intelligence; preprocessing the data and constructing a threat intelligence transaction data set; performing association analysis using multi-algorithm fusion; and identifying potential risks from the association analysis results. Implementing the method improves the degree of aggregation of threat intelligence and the efficiency of association analysis, so that security analysis can be produced quickly to counter potential risks, thereby improving the security of the metering automation system.

Description

Method for carrying out security association analysis on threat information of metering automation system

Technical Field

The invention relates to the technical field of computer security, in particular to a method for carrying out security association analysis on threat information of a metering automation system based on a fusion algorithm.

Background

With the arrival of the big data era, network services account for a growing share of daily life and networks continue to expand in scale. The number and impact of network attacks have risen sharply, causing great harm to network and information security. Ensuring the efficient and stable operation of networks and information systems is the basis for the normal conduct of all market activities. An existing metering automation system requires an overall security analysis of the system, so that reliable first-hand and third-party data are available for the security evaluation of electric energy metering and the effectiveness and reliability of that evaluation are improved. To this end, network threat intelligence needs to be analyzed and studied, security protection of the network and service systems strengthened, and a security defense system established to guarantee their normal operation.

Threat intelligence is evidence-based knowledge, including context, mechanisms, indicators, implications, and actionable advice. It describes an existing or imminent threat or hazard to an asset and may be used to inform the subject's decision on how to respond to that threat or hazard.

Because threat intelligence data comes from complex sources with differing data formats and analysis methods, data silos form easily, truly valuable threat intelligence cannot be obtained, and association analysis cannot be performed across all of the threat intelligence. In the prior art, security analysts mostly search for threat intelligence manually and correlate it with various threats by hand. This is inefficient: the threat intelligence cannot be aggregated effectively, and security analysis cannot be produced quickly to counter potential risks.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method for security association analysis of threat intelligence of a metering automation system, so as to improve the degree of aggregation of the various kinds of threat intelligence and the efficiency of analysis.

As an aspect of the present invention, a method for security association analysis of threat intelligence of a metering automation system is provided, which at least comprises the following steps:

step 101, collecting multi-source heterogeneous threat intelligence data, wherein the threat intelligence comprises internally sourced threat intelligence and/or externally sourced threat intelligence; the internally sourced threat intelligence is key infrastructure data; the externally sourced threat intelligence comprises security events provided by different OSINT sources;

step 102, preprocessing the collected threat information, unifying formats, and constructing a threat information transaction database;

step 103, selecting a transaction data set to be analyzed from the transaction database, associating the selected transaction data set by using a FiDoop algorithm, and pruning to generate frequent item sets; obtaining the strong association rules corresponding to the frequent item sets through an Apriori algorithm to form an association analysis result;

and step 104, finding potential risks according to the association analysis result, and locating and handling them to ensure the safe operation of the system.

Preferably, the step 102 further comprises:

normalizing the collected threat information data, and unifying the data or items with the same meaning into the same description language;

extracting keywords capable of completely expressing items from the threat description language as transaction data to perform association analysis;

and removing repeated or meaningless data or items for association analysis to form a source transaction data set, thereby constructing the transaction database.
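The three preprocessing sub-steps above (unify wording, extract keywords, drop repeated or meaningless entries) can be illustrated with a minimal Python sketch. The synonym table, the stop list, and the function names are hypothetical and are not taken from the patent; a real deployment would derive them from the intelligence sources actually in use.

```python
import re

# Hypothetical synonym table: maps source-specific wording to one shared term.
SYNONYMS = {
    "brute force": "brute_force",
    "bruteforce": "brute_force",
    "password guessing": "brute_force",
    "sqli": "sql_injection",
    "sql injection": "sql_injection",
}

# Hypothetical stop list of tokens that carry no meaning for association analysis.
MEANINGLESS = {"the", "a", "an", "on", "of", "from", "detected", "attempt"}


def normalize(text):
    """Lower-case a description and unify terms that have the same meaning."""
    text = text.lower()
    # Replace longer phrases first so multi-word phrases win over short ones.
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", SYNONYMS[phrase], text)
    return text


def extract_keywords(text):
    """Turn one threat description into a set of keyword items."""
    tokens = re.findall(r"[a-z0-9_]+", normalize(text))
    return frozenset(t for t in tokens if t not in MEANINGLESS)


def build_transactions(records):
    """Build the source transaction data set, skipping empty and repeated entries."""
    seen = set()
    transactions = []
    for record in records:
        items = extract_keywords(record)
        if items and items not in seen:
            seen.add(items)
            transactions.append(items)
    return transactions


records = [
    "Brute force attempt on SSH from gateway",
    "bruteforce detected on ssh from gateway",
    "SQLi on web portal",
    "",
]
transactions = build_transactions(records)
```

Here the first two records collapse into a single transaction because they normalize to the same keyword set, and the empty record is dropped.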

Preferably, the step 103 further comprises:

step 201, selecting a transaction data set to be analyzed from a transaction database, and sorting the transaction data set according to intelligence data;

step 202, appointing a minimum support degree and a minimum confidence degree;

step 203, finding all frequent item sets from the transaction data set by adopting a FiDoop algorithm, wherein the frequent item sets are non-empty subsets with the support degree greater than the minimum support degree after each iteration process;

and step 204, after all frequent item sets are found, performing association analysis by using an Apriori algorithm: for every frequent item set of length greater than 1, obtaining the confidence of each association rule by using Apriori's subset-based rule generation, comparing it with the minimum confidence threshold, and obtaining the strong association rules that satisfy the condition to form an association analysis result.

Preferably, the step 203 further comprises:

step 301, in a transaction data set, adopting a first MapReduce operation to find all frequent 1-item sets;

step 302, finding out a frequent k-item set by adopting a second MapReduce operation;

and step 303, mining the frequent item set by adopting a third MapReduce operation to obtain all frequent item sets.

Preferably, the step 301 further includes:

the first MapReduce job is responsible for finding all frequent 1-item sets; the input of the Map task at this stage is the original transaction data set, and the output of the Reduce task is all frequent 1-item sets; the transaction data set is divided into a plurality of segments stored on the data nodes, and each Mapper reads its transaction segment locally in the form of key-value pairs <offset, itemset>, wherein offset is the offset value of the transaction and itemset is the transaction itself;

then, each Mapper counts the frequency of each local item and generates a local frequent 1-item set; 1-item sets with the same key are sent to a designated Reducer and merged to generate the global 1-item set; the global frequent 1-item set is then obtained by comparing against the minimum support and pruning the infrequent items, and is output by the first MapReduce job in the form of key-value pairs <item, count>.

Preferably, the step 302 further includes:

scanning the database again in the second MapReduce job, removing the infrequent items in each transaction, and generating frequent k-item sets;

taking the output of the first MapReduce job as the input of the second job, scanning the transaction data set a second time, and pruning the infrequent items in each transaction; if a transaction contains k frequent items after pruning, its item set is a k-item set; this process is carried out in the same manner as the first job, and the frequent k-item sets are generated by reference to all the frequent 1-item sets generated previously;

and after the Mapper finishes, outputting intermediate key-value pairs in the form <k-itemsets, 1>, wherein k-itemsets records the number of frequent 1-item sets remaining in the pruned transaction and the content of the item set. The Reducer merges these pairs, sums their counts, and outputs key-value pairs in the form <k, (k-itemsets, count)>, wherein the key is the length of the item set and the value is the content and count of an item set of that length; the k-itemsets generated by this job are arranged in ascending dictionary order.

Preferably, the step 303 further comprises:

in the third MapReduce job, the k-itemsets generated by the second job are decomposed into shorter item sets, and item sets of the same length are merged in local memory with the k-itemsets of the same k value to construct a k-FIU-tree; using the distributed processing capacity of MapReduce, a group of new key-value pairs <k, k-FIU-tree> is generated in this process, meaning that a group of local FIU-trees with path length k is generated; item sets of the same length are distributed to the same Reducer, which aggregates the local FIU-trees of that length produced by the respective Mappers and constructs the global k-FIU-tree; the leaf nodes of an FIU-tree have two attributes, item name and count, and all frequent item sets are obtained by comparing the count values of the leaf nodes of the global k-FIU-trees with the minimum support S_min.

Preferably, the step 104 further comprises:

performing association analysis on the obtained threat intelligence through a system security association analysis model based on the FiDoop-Apriori fusion algorithm to find the vulnerability information existing in the system and identify potential risks, including software and hardware security risks, external attack risks, network vulnerabilities and the like, thereby helping analysts identify, evaluate and classify multi-source heterogeneous threat intelligence and locate and handle system vulnerabilities.

The embodiment of the invention has the following beneficial effects:

the embodiment of the invention provides a method for security association analysis of threat intelligence of a metering automation system. The method first acquires multi-source heterogeneous threat intelligence, collecting security events provided by different OSINT sources and key infrastructure data from internal sources. It then preprocesses the collected threat intelligence, unifies its format, and generates a source transaction data set. Next, it performs association analysis on the preprocessed intelligence data using big data and machine learning techniques, fusing the FiDoop algorithm and the Apriori algorithm, to find the vulnerability information existing in the current system. Finally, it identifies potential risks from the association analysis result and carries out related operations such as locating and handling them, to ensure the safe operation of the system.

By implementing the method, association analysis can be performed on threat intelligence data of different sources and formats. Associated threat indicator features are studied and analyzed on the basis of understanding the intelligence and summarizing its characteristics, potential risks are predicted through the model, and suggested measures for responding to threat activity are provided according to the results. The method thus improves the degree of aggregation of threat intelligence and the efficiency of analysis, allows security analysis to be produced quickly to counter potential risks, and improves the security of the metering automation system.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic main flow chart of an embodiment of a method for security association analysis of threat intelligence of a metering automation system according to the present invention;

FIG. 2 is a more detailed flow chart of step 103 of FIG. 1;

FIG. 3 is a more detailed flow chart of step 203 of FIG. 2;

FIG. 4 is a diagram of a process of MapReduce operation corresponding to the FiDoop algorithm in FIG. 3 according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating an example of the FIU-tree construction process according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.

Fig. 1 is a main flow diagram illustrating an embodiment of a method for security association analysis of threat intelligence of a metering automation system according to the present invention; referring to fig. 2 to 5 together, in the present embodiment, the method at least includes:

step 101, multi-source heterogeneous threat intelligence is obtained.

Multi-source heterogeneous threat intelligence data is collected, wherein the threat intelligence comprises internally sourced threat intelligence and/or externally sourced threat intelligence; the internally sourced threat intelligence is key infrastructure data; the externally sourced threat intelligence comprises security events from different open-source intelligence (OSINT) tools;

more specifically, for the metering automation system, threat intelligence can be divided into internal sources and external sources. Internal sources mainly comprise asset logs, network traffic, operating state data and the like. Among external sources, commercial threat intelligence is mainly provided by domestic and foreign security vendors and by government information security organizations, while the providers of open-source intelligence are system vendors and network operators. Open-source intelligence can be collected through OSINT, and internal intelligence is obtained from the state data of the system's equipment information platform.

Step 102, data preprocessing is carried out and a threat intelligence transaction data set is constructed.

Data preprocessing is performed on the collected threat intelligence: the intelligence data is divided into structured and unstructured data; false-data removal, de-duplication, consistency analysis and data fusion are carried out on the data as a whole to eliminate redundant and repeated information; the multi-source intelligence is managed effectively and fused with the corresponding system's own data to form intelligence data suitable for algorithmic association analysis; and this data is stored in a database to construct a transaction data set, which improves the efficiency of the algorithmic analysis;

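more specifically, in one example, the step 102 further comprises:

normalizing the collected threat intelligence data, and unifying data or items with the same meaning into the same description language;

extracting keywords capable of completely expressing items from the threat description language as transaction data for association analysis;

and removing data or items that are repeated or meaningless for association analysis to form a source transaction data set, thereby constructing the transaction database.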

Step 103, association analysis is performed using multi-algorithm fusion. A transaction data set to be analyzed is selected from the transaction database, the selected transaction data set is associated by using a FiDoop algorithm, and pruning is performed to generate frequent item sets; the strong association rules corresponding to the frequent item sets are obtained through an Apriori algorithm to form an association analysis result;

in one specific example, a transaction data set constructed using threat intelligence is obtained from a database, and association rules for system security analysis are extracted. Association rules may discover some regularity that exists between two or more variable values, i.e., the association or correlation that exists between sets of items in a large amount of data. The method mainly comprises two stages:

the first stage is as follows: all the frequent item sets are found from the transactional data set.

And a second stage: the final association rules are generated from the frequent set of items.

The fusion-algorithm-based method for security association analysis of threat intelligence of the metering automation system fuses the FiDoop algorithm with the Apriori algorithm. First, the FiDoop algorithm associates the transaction data sets in the database to generate frequent item sets, and the item sets whose support is greater than or equal to the minimum support are retained during pruning. Second, the Apriori algorithm processes the frequent item sets to generate strong association rules for the subsequent association analysis. The MapReduce distributed parallel computing framework is used to process massive data, the FiDoop algorithm and the Apriori algorithm are fused, and frequent pattern mining and association analysis are performed on the frequent item sets. As shown in fig. 2, in an example, the step 103 further includes:

step 201, sorting transaction data sets according to intelligence data;

the transaction data set to be analyzed is selected from the transaction database, and the corresponding object mined by the association rule is generally the transaction data set.

More specifically, the association rule may be defined as follows. Let $T = \{I_1, I_2, I_3, \ldots, I_k, \ldots\}$ be a transaction database, where $I_k$ is the $k$-th transaction of $T$. Each group of data items forms a transaction data set $I_k = \{i_1, i_2, i_3, \ldots, i_k, \ldots\}$, where $i_k$ is a transaction item and each $I_k$ contains a plurality of items $i_k$. All transaction data sets together form the set $T = \{I_1, I_2, I_3, \ldots, I_k, \ldots\}$. Association analysis is then performed on $T$ to obtain the relations among the individual items $i_k$ in $T$, or between item sets $N$ and $M$ (each a set of some of the items $i_k$), together with the support and confidence of these relations.

Step 202, appointing a minimum support degree and a minimum confidence degree;

the support represents the probability that the combination of X and Y occurs in the total transaction records, that is, the probability of occurrence of a given item set: the proportion of the number of threat intelligence records containing the item set to the total number of threat intelligence records. X and Y are subsets of I, and the association between X and Y found in the transaction data set is written as

$$X \Rightarrow Y$$

where $P(X)$ denotes the proportion of records containing item set X and $P(X \cup Y)$ denotes the proportion of records in which item set X and item set Y occur together. The support can be expressed as:

$$\mathrm{Support}(X \Rightarrow Y) = P(X \cup Y)$$

the confidence represents the probability that the combination of X and Y occurs among the transactions in which item set X occurs. The probability that item set Y appears when item set X appears is written as

$$P(Y \mid X)$$

The confidence is the ratio of the threat intelligence records containing both item set X and item set Y to those containing item set X, and can be expressed as:

$$\mathrm{Confidence}(X \Rightarrow Y) = P(Y \mid X) = \frac{P(X \cup Y)}{P(X)}$$

in the present embodiment, a minimum support $S_{min}$ and a minimum confidence $C_{min}$ are set as the thresholds for support and confidence, and only association rules that reach both thresholds are called strong association rules. In the subsequent process, the non-empty subsets that satisfy the minimum support are retained as frequent item sets after each iteration, and the association rules finally generated are composed from these frequent item sets.
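As a small illustration of the two measures and the strong-rule test, the following Python sketch computes support and confidence over a toy transaction data set. The transactions and the function names are hypothetical and serve only to make the definitions concrete.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Support of X and Y together divided by the support of X."""
    x, y = frozenset(x), frozenset(y)
    sx = support(transactions, x)
    return support(transactions, x | y) / sx if sx else 0.0


def is_strong(transactions, x, y, s_min, c_min):
    """A rule X => Y is strong when it reaches both thresholds."""
    joint = frozenset(x) | frozenset(y)
    return (support(transactions, joint) >= s_min
            and confidence(transactions, x, y) >= c_min)


# Hypothetical transaction data set of four threat intelligence records.
T = [
    frozenset({"ssh", "brute_force"}),
    frozenset({"ssh", "brute_force", "gateway"}),
    frozenset({"ssh"}),
    frozenset({"web", "sql_injection"}),
]

s = support(T, {"ssh", "brute_force"})        # 2 of 4 records
c = confidence(T, {"brute_force"}, {"ssh"})   # every brute_force record has ssh
```

With thresholds of 0.5 for both measures, the rule brute_force => ssh is strong in this toy data, while web => sql_injection fails the support threshold.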

Step 203, finding all frequent item sets from the transaction data set by adopting a FiDoop algorithm;

this stage serves as the first stage of association rule mining: all the frequent itemsets are found from the transactional dataset. The FiDoop algorithm is adopted to efficiently discover a frequent item set.

The FiDoop algorithm is based on the FIUT algorithm and optimizes it by means of three MapReduce jobs. The FIUT algorithm mainly comprises two stages: the first stage scans the database twice, obtaining the frequent 1-item sets in the first scan and the k-item sets produced by pruning infrequent items in the second; the second stage obtains all frequent k-item sets. MapReduce is a mature programming framework (programming model) for distributed computing, used for parallel processing of large data sets.

As shown in fig. 3, which is a flowchart of the three MapReduce operations in step 203, the FiDoop algorithm may be utilized to implement efficient mining on frequent item sets through the three MapReduce operations. Specifically, the step 203 further includes:

step 301, finding all frequent 1-item sets, namely finding all frequent 1-item sets in a transaction data set by adopting a first MapReduce operation;

the first MapReduce job is responsible for discovering all frequent 1-item sets; the input of the Map task at this stage is the original transaction data set, and the output of the Reduce task is all frequent 1-item sets. The transaction data set is divided into a plurality of segments stored on the data nodes, and each Mapper reads its transaction segment locally in the form of key-value pairs <offset, itemset>, wherein offset is the offset value of the transaction and itemset is the transaction itself. Then, each Mapper counts the frequency of each local item and generates a local frequent 1-item set; 1-item sets with the same key are sent to a designated Reducer and merged to generate the global 1-item set. The global frequent 1-item set is then obtained by comparing against the minimum support and pruning the infrequent items, and is output by the first MapReduce job in the form of key-value pairs <item, count>.
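The first job can be imitated in plain Python, without Hadoop, to show the data flow. This is a minimal sketch under stated assumptions: the list index stands in for the offset, segments are formed by simple striding, each Mapper emits local counts, and pruning against the minimum support happens only after the global merge. All function names are hypothetical.

```python
from collections import Counter


def map_count_items(segment):
    """Mapper: count each item inside one locally stored segment.

    `segment` is a list of (offset, itemset) pairs, mirroring <offset, itemset>.
    """
    local = Counter()
    for _offset, itemset in segment:
        local.update(set(itemset))  # count an item once per transaction
    return local


def reduce_frequent_items(local_counts, n_transactions, s_min):
    """Reducer: merge local counts by key and prune infrequent items."""
    merged = Counter()
    for local in local_counts:
        merged.update(local)
    min_count = s_min * n_transactions
    return {item: count for item, count in merged.items() if count >= min_count}


def job1_frequent_1_itemsets(transactions, s_min, n_segments=2):
    """Run the simulated first job and return <item, count> for frequent items."""
    pairs = list(enumerate(transactions))  # index used as a stand-in offset
    segments = [pairs[i::n_segments] for i in range(n_segments)]
    local_counts = [map_count_items(seg) for seg in segments]
    return reduce_frequent_items(local_counts, len(transactions), s_min)


# Hypothetical transactions with abstract item names.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
frequent_1 = job1_frequent_1_itemsets(transactions, s_min=0.5)
```

In this toy run item d occurs in only one of four transactions and is pruned, while a, b and c survive.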

Step 302, finding out a frequent k-item set, namely finding out the frequent k-item set by adopting a second MapReduce operation;

the database is scanned again in the second MapReduce job, and the infrequent items in each transaction are removed to generate frequent k-item sets. The output of the first MapReduce job is taken as the input of the second job, the transaction data set is scanned a second time, and the infrequent items in each transaction are pruned; if a transaction contains k frequent items after pruning, its item set is a k-item set. This process is carried out in the same manner as the first job, and the frequent k-item sets are generated by reference to all the frequent 1-item sets generated previously. After the Mapper finishes, it outputs intermediate key-value pairs in the form <k-itemsets, 1>, wherein k-itemsets records the number of frequent 1-item sets remaining in the pruned transaction and the content of the item set. The Reducer merges these pairs, sums their counts, and outputs key-value pairs in the form <k, (k-itemsets, count)>, wherein the key is the length of the item set and the value is the content and count of an item set of that length. The k-itemsets generated by this job are an important data source for constructing the k-FIU-tree in the third MapReduce job, and are therefore arranged in ascending dictionary order.
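The second job can be sketched in the same plain-Python style. The sketch below is an illustration only, with hypothetical function names: the Mapper prunes each transaction against the frequent items from the first job and emits <k-itemset, 1>, and the Reducer sums the counts and groups the results by length k in ascending dictionary order.

```python
from collections import Counter, defaultdict


def map_prune_transaction(itemset, frequent_items):
    """Mapper: drop infrequent items and emit <k-itemset, 1>."""
    kept = tuple(sorted(set(itemset) & set(frequent_items)))
    return (kept, 1) if kept else None


def reduce_group_by_length(pairs):
    """Reducer: sum counts per itemset, then emit <k, (k-itemset, count)>."""
    counts = Counter()
    for itemset, one in pairs:
        counts[itemset] += one
    by_length = defaultdict(list)
    for itemset in sorted(counts):  # ascending dictionary order
        by_length[len(itemset)].append((itemset, counts[itemset]))
    return dict(by_length)


def job2_k_itemsets(transactions, frequent_items):
    """Run the simulated second job over all transactions."""
    emitted = (map_prune_transaction(t, frequent_items) for t in transactions)
    return reduce_group_by_length(p for p in emitted if p is not None)


# Hypothetical transactions; frequent items as produced by a first job.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"], ["b", "a", "e"]]
by_length = job2_k_itemsets(transactions, frequent_items={"a", "b", "c"})
```

After pruning, the transactions [a, b] and [b, a, e] both become the 2-itemset (a, b), so it is reported with a count of 2.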

Step 303, mining the frequent item set, namely mining the frequent item set by adopting a third MapReduce operation to obtain all frequent item sets;

in the third MapReduce job, the k-itemsets generated by the second job are decomposed into shorter item sets, and item sets of the same length are merged in local memory with the k-itemsets of the same k value to construct a k-FIU-tree. Using the distributed processing capacity of MapReduce, each Mapper generates a group of new key-value pairs <k, k-FIU-tree> in this process, meaning that a group of local FIU-trees with path length k is generated. Item sets of the same length are distributed to the same Reducer, which aggregates the local FIU-trees of that length produced by the respective Mappers and constructs the global k-FIU-tree. The leaf nodes of an FIU-tree have two attributes, item name and count, so the whole tree does not need to be traversed recursively: all frequent item sets are obtained by comparing the count values of the leaf nodes of the global k-FIU-trees with the minimum support S_min.
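The counting performed by the third job can be approximated as follows. This sketch deliberately does not build FIU-trees: a flat counter keyed by item set stands in for the leaf nodes of the global k-FIU-trees, which is a simplification assumed here for illustration. The function names are hypothetical, and the threshold is given as an absolute count rather than a ratio.

```python
from collections import Counter
from itertools import combinations


def map_decompose(k_itemsets):
    """Mapper: emit every sub-itemset of size 2..k with the parent's count."""
    for itemset, count in k_itemsets:
        for size in range(2, len(itemset) + 1):
            for subset in combinations(itemset, size):
                yield size, subset, count


def reduce_leaf_counts(emitted, min_count):
    """Reducer: sum counts per itemset and keep those reaching min_count.

    The Counter plays the role of the leaf counts of the global k-FIU-trees.
    """
    leaves = Counter()
    for _size, subset, count in emitted:
        leaves[subset] += count
    return {s: c for s, c in leaves.items() if c >= min_count}


def job3_frequent_itemsets(by_length, min_count):
    """Run the simulated third job on the <k, (k-itemset, count)> output."""
    all_itemsets = [pair for group in by_length.values() for pair in group]
    return reduce_leaf_counts(map_decompose(all_itemsets), min_count)


# Hypothetical output of a second job over five transactions.
by_length = {
    3: [(("a", "b", "c"), 1)],
    2: [(("a", "b"), 2), (("b", "c"), 1)],
    1: [(("a",), 1)],
}
frequent = job3_frequent_itemsets(by_length, min_count=2)
```

Item sets of length 1 are not emitted here because the frequent 1-item sets are already known from the first job.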

For ease of understanding, the steps described above are illustrated together in figs. 4 and 5. FIG. 4 shows the MapReduce job process of the FiDoop algorithm in one embodiment of the invention. FIG. 5 is an exemplary diagram of the FIU-tree construction process, in which the initial test transaction data set contains 8 pieces of intelligence data and the minimum support and minimum confidence are both set to 0.5.

Step 204, performing Apriori algorithm association analysis;

after all frequent item sets are found, association analysis is performed using the Apriori algorithm: for every frequent item set of length greater than 1, the confidence of each association rule is obtained using Apriori's subset-based rule generation and compared with the minimum confidence threshold, and the strong association rules that satisfy the condition form the association analysis result. Specifically, the association rules are generated as follows:

(1) For each frequent item set I, generate all non-empty subsets, and for each non-empty subset X of I, compute its confidence; if

$$\mathrm{Confidence}(X) \ge \mathit{minconfidence}$$

then the rule

$$X \Rightarrow (I - X)$$

holds. This gives the following theorem: let $X_1$ be a subset of the set $X$. If the rule

$$X \Rightarrow (I - X)$$

is not a strong rule, then

$$X_1 \Rightarrow (I - X_1)$$

is not a strong rule either; and if the rule

$$X_1 \Rightarrow (I - X_1)$$

is a strong rule, then

$$X \Rightarrow (I - X)$$

must also be a strong rule.

(2) Verify whether the generated strong association rules meet the minimum support and the minimum confidence.
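The rule-generation step can be sketched in Python as follows. The sketch assumes the frequent item sets are available as a mapping from sorted tuples to counts and that every subset of a frequent item set is also present in the mapping. It enumerates all antecedents directly and does not use the pruning shortcut implied by the theorem above; the function name and sample counts are hypothetical.

```python
from itertools import combinations


def generate_rules(frequent, n_transactions, c_min):
    """Return (antecedent, consequent, support, confidence) for strong rules.

    `frequent` maps each frequent itemset (a sorted tuple) to its count and
    must also contain every non-empty subset of those itemsets.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # rules need a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                consequent = tuple(i for i in itemset if i not in antecedent)
                conf = count / frequent[antecedent]
                if conf >= c_min:
                    rules.append(
                        (antecedent, consequent, count / n_transactions, conf)
                    )
    return rules


# Hypothetical counts over five transactions.
frequent = {
    ("a",): 4, ("b",): 4, ("c",): 2,
    ("a", "b"): 3, ("b", "c"): 2,
}
rules = generate_rules(frequent, n_transactions=5, c_min=0.8)
```

With a minimum confidence of 0.8 only the rule c => b survives in this toy data, since every transaction containing c also contains b.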

Step 104, finding potential risks according to the association analysis result;

The embodiment of the invention has the following beneficial effects:

the embodiment of the invention provides a method for security association analysis of threat intelligence of a metering automation system. The method first acquires multi-source heterogeneous threat intelligence, collecting security events provided by different OSINT sources and key infrastructure data from internal sources. It then preprocesses the collected threat intelligence, unifies its format, and generates a source transaction data set. Next, it performs association analysis on the preprocessed intelligence data using big data and machine learning techniques, fusing the FiDoop algorithm and the Apriori algorithm, to find the vulnerability information existing in the current system. Finally, it identifies potential risks from the association analysis result and carries out related operations such as locating and handling them, to ensure the safe operation of the system.

The above description is only a preferred embodiment of the present invention and should not be taken as limiting the scope of the claims; equivalent changes and modifications made without departing from the spirit of the present invention therefore remain within the scope of the invention.

Claims

1. A method for security association analysis of threat intelligence of a metering automation system is characterized by at least comprising the following steps:

step 101, collecting multi-source heterogeneous threat intelligence data, wherein the threat intelligence comprises internally sourced threat intelligence and/or externally sourced threat intelligence; the internally sourced threat intelligence is key infrastructure data; the externally sourced threat intelligence comprises security events provided by different OSINT sources;

step 102, preprocessing the collected threat intelligence and constructing a threat intelligence transaction database;

2. The method of claim 1, wherein the step 102 further comprises:

extracting keywords capable of completely expressing items from the threat description language as transaction data to perform correlation analysis;

3. The method of claim 1, wherein the step 103 further comprises:

step 202, appointing a minimum support degree and a minimum confidence degree;

step 203, finding all frequent item sets from the transaction data set by adopting a FiDoop algorithm, wherein the frequent item sets are non-empty subsets with the support degree larger than the minimum support degree after each iteration process;

4. The method of claim 3, wherein said step 203 further comprises:

5. The method of claim 4, further comprising, in the step 301:

the first MapReduce job is responsible for finding all frequent 1-item sets; the input of the Map task at this stage is the original transaction data set, and the output of the Reduce task is all frequent 1-item sets; the transaction data set is divided into a plurality of segments stored on the data nodes, and each Mapper reads its transaction segment locally in the form of key-value pairs <offset, itemset>, wherein offset is the offset value of the transaction and itemset is the transaction itself;

then, each Mapper counts the frequency of each local item and generates a local frequent 1-item set; 1-item sets with the same key are sent to a designated Reducer and merged to generate the global 1-item set; the global frequent 1-item set is then obtained by comparing against the minimum support and pruning the infrequent items, and is output by the first MapReduce job in the form of key-value pairs <item, count>.

6. The method of claim 5, further comprising, in the step 302:

taking the output of the first MapReduce job as the input of the second job, scanning the transaction data set a second time, and pruning the infrequent items in each transaction; if a transaction contains k frequent items after pruning, its item set is a k-item set; this process is carried out in the same manner as the first job; generating frequent k-item sets by reference to all the frequent 1-item sets generated previously;

after the Mapper finishes, outputting intermediate key-value pairs in the form <k-itemsets, 1>, wherein k-itemsets records the number of frequent 1-item sets remaining in the pruned transaction and the content of the item set; merging these pairs in the Reducer, summing their counts, and outputting key-value pairs in the form <k, (k-itemsets, count)>, wherein the k-itemsets generated by this job are arranged in ascending dictionary order.

7. The method of claim 6, further comprising, in step 303:

in the third MapReduce job, decomposing the k-itemsets generated by the second job into shorter item sets, merging item sets of the same length in local memory with the k-itemsets of the same k value to construct a k-FIU-tree, and generating a group of new key-value pairs <k, k-FIU-tree> in this process by using the distributed processing capacity of MapReduce, meaning that a group of local FIU-trees with path length k is generated; item sets of the same length are distributed to the same Reducer, which aggregates the local FIU-trees of that length produced by the respective Mappers and constructs the global k-FIU-tree; the leaf nodes of an FIU-tree have two attributes, item name and count, and all frequent item sets are obtained by comparing the count values of the leaf nodes of the global k-FIU-trees with S_min.

8. The method of any of claims 1 to 7, wherein the step 104 further comprises: