CN114880380A - Method for realizing power grid alarm data association traceability system based on density clustering and self-organizing network - Google Patents

Publication number
CN114880380A
Authority
CN
China
Prior art keywords
data
alarm
alarm data
target
correlation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210597168.XA
Other languages
Chinese (zh)
Inventor
王光辉
李海龙
苏生平
王云翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Telecommunication Branch of State Grid Qinghai Electric Power Co Ltd
Original Assignee
Information and Telecommunication Branch of State Grid Qinghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Telecommunication Branch of State Grid Qinghai Electric Power Co Ltd filed Critical Information and Telecommunication Branch of State Grid Qinghai Electric Power Co Ltd
Priority to CN202210597168.XA priority Critical patent/CN114880380A/en
Publication of CN114880380A publication Critical patent/CN114880380A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0631Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for realizing a power grid alarm data association traceability system based on density clustering and a self-organizing network, comprising the following steps: segmenting the original alarm data of multiple power stations along multiple dimensions, and vectorizing the features using the feature keywords of each dimension; extracting target suspicious alarm data from the vectorized alarm data, establishing a correlation index ConValue, and clustering the target alarm data with the alarm data of each power station using DBSCAN to obtain a plurality of associated data records; extracting the alarm start time and latest occurrence time of the associated data and the target alarm data, defining inbound and outbound associated data with the time of the target alarm data as the origin, and constructing attack propagation graphs in two time dimensions; and performing secondary feature segmentation on the associated data of the multiple power stations, and performing secondary feature mining with a self-organizing network to obtain a fine-grained feature description of the associated data. The invention helps ensure the safe and stable operation of the power monitoring system.

Description

Method for realizing power grid alarm data association traceability system based on density clustering and self-organizing network
Technical Field
The invention relates to a method for realizing a power grid alarm data correlation traceability system based on density clustering and a self-organizing network, belonging to the technical field of power grid information safety.
Background
With the rapid development of computer technology and the industrial Internet of Things, more and more fields are undertaking informatization. As a key information infrastructure, the power system has likewise adopted many computer technologies during this rapid informatization and digitization to keep its operation stable. At the same time, because electric power is a pillar energy source for the nation, the security of the power system cannot be neglected.
The power monitoring system is an important component in guaranteeing the security of the power system. It mainly undertakes real-time data acquisition, switch state detection, remote control and similar tasks for power equipment at different levels, captures abnormal states of the power equipment in time, and reports them promptly to the power dispatching system. Because the types and number of connected power devices are increasing rapidly, the external threats faced by the power monitoring system are also growing, and analyzing the various abnormal alarm data and the attack behaviors behind them is the key to guaranteeing the security and stability of the power monitoring system. Existing power monitoring systems are deficient in network security behavior analysis, and their data processing and security prediction capabilities are limited, mainly in the following respects: (1) the ability to analyze network security events in the power monitoring system is poor; (2) the ability to trace and locate attacks in the power monitoring system is poor; (3) the ability to determine the security state of the power monitoring system is insufficient.
Current research shows that although the alarm data of a power monitoring system is complex, some of it is intrinsically associated. For example, multiple alarms across several time periods may be caused by the connection of an unauthorized USB device, and multiple alarms captured at several stations may be caused by one station's unauthorized access to the Internet. Attack behavior is usually captured by the monitoring system, presented in the form of alarm data, and classified by station. It is therefore necessary to identify possible attack behaviors from the massive alarm data of multiple stations, trace them to their source, and analyze the internal behavioral characteristics of suspicious attacks, so as to reduce network security risk and guarantee the safe and stable operation of the power monitoring system.
Disclosure of Invention
The invention addresses the problem that real attack behaviors are difficult to extract from the massive alarm data of multiple power stations, so that efficient tracing cannot be carried out, by providing a method for realizing a power grid alarm data association traceability system based on density clustering and a self-organizing network. Starting from the alarm data collected by the power monitoring system, the method uses a density clustering algorithm to analyze the deep temporal and spatial correlations among the alarm data of multiple stations, seeks the relationships between the alarm data of different stations, uncovers the cause of particular abnormal alarm data, and locates the source of a specific alarm event, thereby realizing association analysis and tracing of suspicious attack behaviors across multiple dimensions of the massive alarm data. The method also uses a self-organizing map network to perform secondary feature segmentation on the resulting associated data and analyzes the proportion of each dimension's features within the associated data, thereby mining further features of the attack chain. The final result is a complete attack chain of the suspicious attack behavior in time and space, together with attack features of different weights. This effectively strengthens the power monitoring system's ability to analyze and defend against attack events and enables in-depth study and association tracing of suspicious attack behaviors.
The technical scheme adopted by the invention to solve the technical problem is as follows: a method for realizing a power grid alarm data association traceability system based on density clustering and a self-organizing network. The method segments the alarm data along multiple dimensions, analyzes associated data using density clustering, and, on the basis of secondary feature processing, uses a self-organizing network to construct the attack chain and describe its deeper features. The method comprises the following steps:
Step 1: collecting alarm data from a plurality of power stations, segmenting the original data along a plurality of dimensions, and vectorizing the features using the feature keywords of each dimension to form multi-station vectorized alarm data;
Step 2: on the basis of step 1, extracting target suspicious alarm data from the vectorized alarm data, establishing a correlation index ConValue that represents the degree of correlation between the target suspicious alarm data and the other data, calculating the two parameters Eps and MinPts of the DBSCAN algorithm, and clustering the target alarm data with the alarm data of each power station using DBSCAN to obtain a plurality of associated data records;
Step 3: on the basis of step 2, extracting the alarm start time and latest occurrence time of the associated data and the target alarm data, defining inbound and outbound associated data with the time of the target alarm data as the origin, and constructing attack propagation graphs in two time dimensions;
Step 4: on the basis of steps 2 and 3, performing secondary feature segmentation on the associated data of the multiple power stations, performing secondary feature mining with a self-organizing network to obtain a fine-grained feature description of the associated data, and finally producing a more detailed attack propagation graph.
Further, step 1 of the invention, in which alarm data is collected from a plurality of power stations, the original data is segmented along a plurality of dimensions, and the features are vectorized using the feature keywords of each dimension to form multi-station vectorized alarm data, comprises:
Step 1-1, dimension segmentation: segmenting all original alarm data of the plurality of power stations within a given time range into 11 dimensions, such as alarm level, alarm content, alarm device, reporting device and alarm start time, yielding a structured, object-style description of each alarm record;
Step 1-2, feature vectorization: for the text-type alarm data after dimension segmentation, searching the content of each dimension for keywords using a keyword matching method, and determining the vector for each dimension according to whether each keyword is present, so that the data set is converted from text into simple numerical data;
Step 1-3, time dimension conversion: for the two time dimensions of the alarm data, alarm start time and latest occurrence time, directly applying timestamp conversion to turn the original date into a standard 10-digit timestamp;
Step 1-4, data summarization: collecting the processed data set, so that the original text alarm data is fully segmented into vectorized numerical data with multiple dimensions, which serves as the target data set for subsequent operations.
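The keyword matching of step 1-2 can be illustrated with a minimal Python sketch. The dimension names and keyword lists below are hypothetical, since the patent does not enumerate them; the sketch only shows how the presence or absence of each keyword could become one 0/1 vector component.

```python
# Hypothetical keyword lists per dimension (not specified in the patent).
KEYWORDS = {
    "alarm_level": ["urgent", "important", "general"],
    "alarm_content": ["USB", "illegal access", "serial port"],
}


def vectorize(record):
    """Turn one segmented text alarm record into a flat 0/1 vector.

    Each keyword contributes one component: 1 if the keyword occurs in the
    text of its dimension (case-insensitive), otherwise 0.
    """
    vector = []
    for dimension, words in KEYWORDS.items():
        text = record.get(dimension, "").lower()
        vector.extend(1 if word.lower() in text else 0 for word in words)
    return vector


example = {
    "alarm_level": "Urgent",
    "alarm_content": "Illegal access by USB device",
}
print(vectorize(example))
```

A real deployment would cover all 11 dimensions and would likely need a more careful matching rule than plain substring search.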
Further, step 2 of the invention, in which, on the basis of step 1, target suspicious alarm data is extracted from the vectorized alarm data, a correlation index ConValue representing the degree of correlation between the target suspicious alarm data and the other data is established, the two parameters Eps and MinPts of the DBSCAN algorithm are calculated, and DBSCAN is used to cluster the target alarm data with the alarm data of each power station to obtain a plurality of associated data records, comprises:
Step 2-1: based on the segmented multi-station alarm data, selecting an alarm whose content is particularly suspicious as the target suspicious data, and placing a copy of its vectorized form into the alarm database of each power station, so that the overall data set is divided into a plurality of per-station sub-data sets;
Step 2-2: establishing a correlation index ConValue that represents the degree of correlation between the target suspicious alarm data and the data in the per-station sub-data sets; typically ConValue = N × 0.2, where N is the total number of alarm records of the power station;
Step 2-3: calculating the DBSCAN clustering parameters Eps and MinPts from the correlation index ConValue, where Eps = ConValue × 0.07 + 0.1 and MinPts = Round(ConValue × 0.2 + 2); Round is the rounding function, which guarantees that MinPts, the minimum number of samples within the Eps-neighborhood of a sample, is an integer;
Step 2-4: based on the parameters Eps and MinPts, clustering the alarm data of each power station using the DBSCAN algorithm to obtain the alarm data associated with the target suspicious alarm data, and collecting it into an associated data set.
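As a worked example of the formulas in steps 2-2 and 2-3, the following Python sketch derives ConValue, Eps and MinPts from the number of alarm records N of one station. The function names are illustrative, and the half-up rounding rule is an assumption: the patent only says that Round yields an integer.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, halves rounded up (assumed rule)."""
    return math.floor(x + 0.5)


def dbscan_params(n_alarms):
    """Compute (ConValue, Eps, MinPts) for a station with n_alarms records."""
    con_value = n_alarms * 0.2                      # ConValue = N x 0.2
    eps = con_value * 0.07 + 0.1                    # Eps = ConValue x 0.07 + 0.1
    min_pts = round_half_up(con_value * 0.2 + 2)    # MinPts = Round(ConValue x 0.2 + 2)
    return con_value, eps, min_pts


# For N = 50: ConValue = 10, Eps is about 0.8, MinPts = 4.
print(dbscan_params(50))
```

Because both parameters grow with N, a station with more alarm records gets a larger neighborhood radius and a higher density threshold.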
Further, step 3 of the invention, in which, on the basis of step 2, the alarm start time and latest occurrence time of the associated data and the target alarm data are extracted, inbound and outbound associated data are defined with the time of the target alarm data as the origin, and attack propagation graphs in two time dimensions are constructed, comprises:
Step 3-1: for the target alarm data and the associated data records obtained by DBSCAN clustering, extracting the alarm start time and latest occurrence time of each record, and sorting the timestamps directly to obtain the chronological order of all data in the two time dimensions;
Step 3-2: taking the time of the target alarm data as the origin, defining inbound associated data as associated data whose timestamp is smaller than the origin timestamp and outbound associated data as associated data whose timestamp is larger than the origin timestamp, and dividing the target alarm data and the associated data according to these definitions in two dimensions, the alarm start timestamp and the latest occurrence timestamp;
Step 3-3: drawing two graphs on a map of the power stations, one showing the inbound and outbound associated data by alarm start time and the other showing the inbound and outbound associated data by latest occurrence time, the two graphs serving as the attack propagation graphs for the target alarm data in the two time dimensions.
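The inbound/outbound division of steps 3-1 and 3-2 can be sketched in Python as follows. The record layout (dictionaries with `station`, `start_ts` and `latest_ts` keys) is a hypothetical choice for illustration. Records whose timestamp equals the origin fall into neither group, because the patent does not say how ties are handled.

```python
def split_by_origin(target, associated, key):
    """Split associated records into (inbound, outbound) around the target.

    Inbound: timestamp smaller than the target's; outbound: larger.
    Both lists are sorted chronologically on the chosen time dimension.
    """
    origin = target[key]
    inbound = sorted((r for r in associated if r[key] < origin), key=lambda r: r[key])
    outbound = sorted((r for r in associated if r[key] > origin), key=lambda r: r[key])
    return inbound, outbound


def propagation_views(target, associated):
    """Build one inbound/outbound split per time dimension."""
    return {key: split_by_origin(target, associated, key)
            for key in ("start_ts", "latest_ts")}


target = {"station": "T", "start_ts": 1000, "latest_ts": 2000}
associated = [
    {"station": "A", "start_ts": 900, "latest_ts": 2100},
    {"station": "B", "start_ts": 1100, "latest_ts": 1900},
    {"station": "C", "start_ts": 800, "latest_ts": 1800},
]
views = propagation_views(target, associated)
print([r["station"] for r in views["start_ts"][0]])   # inbound by start time
print([r["station"] for r in views["latest_ts"][1]])  # outbound by latest time
```

Each of the two views holds the ordered station lists from which one of the two propagation graphs of step 3-3 could be drawn.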
Further, step 4 of the invention, in which, on the basis of steps 2 and 3, secondary feature segmentation is performed on the associated data of the multiple power stations, secondary feature mining is performed with a self-organizing network to obtain a fine-grained feature description of the associated data, and a more detailed attack propagation graph is finally produced, comprises:
Step 4-1: based on the associated data set obtained in step 3, taking the alarm content of the alarm data and formulating feature criteria in more dimensions, such as whether a partition is involved, the partition name, whether a USB device was connected, the USB device manufacturer, the serial port name and the serial port protocol, and using these criteria to perform secondary feature quantization on the associated data;
Step 4-2: performing feature mining on the associated data and the target alarm data after secondary feature quantization using a self-organizing network (self-organizing map, SOM) to obtain deeper feature information about the data;
Step 4-3: adding the obtained feature information to the two attack propagation graphs constructed in step 3 to refine them further, and reporting the multi-dimensional features with a high proportion, which describe what the associated data has in common.
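To make step 4-2 more concrete, here is a minimal pure-Python sketch of a self-organizing map trained on secondary feature vectors. The grid size, learning-rate schedule, neighborhood schedule and sample vectors are all assumptions chosen for illustration; the patent does not specify them.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(samples, rows=2, cols=2, epochs=100, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) to a weight vector."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.1)    # shrinking neighborhood radius
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Hypothetical secondary features: [usb_access, usb_vendor_known, serial_port, partition]
usb_like = [[1, 1, 0, 0]] * 3
serial_like = [[0, 0, 1, 1]] * 3
som = train_som(usb_like + serial_like)
print(best_matching_unit(som, [1, 1, 0, 0]), best_matching_unit(som, [0, 0, 1, 1]))
```

Records that land on the same map node share a feature profile, and the weight vector of that node gives a compact summary of the profile.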
Advantageous Effects:
1. The method extends from a single power station to multiple power stations, making the technique portable; compared with other approaches, it saves analysis time and improves research efficiency.
2. By defining the correlation index, the invention describes in multiple dimensions the degree of association between the target alarm data and the data of the other power stations: the larger the correlation index, the more similar the mined associated data is to the target data. The two core parameters required by DBSCAN clustering are also calculated from the correlation index, so the practical effect of the clustering can be controlled and low clustering accuracy or inconsistent clustering criteria are avoided. Compared with other approaches, the practical effect is better and the associated data is mined more accurately.
3. The invention establishes the order of the associated data in the attack propagation process and describes the associated data in the time dimension by defining inbound and outbound data. It also formulates a secondary feature mining standard that enables further mining and analysis of the features. This not only achieves association tracing of the target suspicious alarm data, but also offers an approach for later research into the internal features and patterns of attack behavior, and helps guarantee the security and stability of the power monitoring system to a certain extent.
Drawings
FIG. 1 is an overall flow chart of the present invention.
FIG. 2 is a block diagram of the structure of density clustering in the present invention.
FIG. 3 is a block diagram of the structure of attack propagation graph construction in the present invention.
FIG. 4 is a block diagram of the secondary feature mining structure in the present invention.
Detailed Description
In the following description, for purposes of explanation, numerous implementation details are set forth in order to provide a thorough understanding of the embodiments of the invention. It should be understood, however, that these implementation details are not to be interpreted as limiting the invention. That is, in some embodiments of the invention, such implementation details are not necessary.
As shown in FIGS. 1-4, the invention provides a method for realizing a power grid alarm data association traceability system based on density clustering and a self-organizing network. The method performs multi-dimensional segmentation and association clustering on the original alarm data, then performs secondary feature mining, and traces the associations of the target alarm data. It comprises the following steps:
Step 1: collecting alarm data from a plurality of power stations, segmenting the original data along a plurality of dimensions, and vectorizing the features using the feature keywords of each dimension to form multi-station vectorized alarm data, specifically:
(1) Dimension segmentation: segmenting all original alarm data of the plurality of power stations within a given time range into 11 dimensions, such as alarm level, alarm content, alarm device, reporting device and alarm start time, yielding a structured, object-style description of each alarm record;
(2) Feature vectorization: for the text-type alarm data after dimension segmentation, searching the content of each dimension for keywords using a keyword matching method, and determining the vector for each dimension according to whether each keyword is present, so that the data set is converted from text into simple numerical data;
(3) Time dimension conversion: for the two time dimensions of the alarm data, alarm start time and latest occurrence time, directly applying timestamp conversion to turn the original date into a standard 10-digit timestamp;
(4) Data summarization: collecting the processed data set, so that the original text alarm data is fully segmented into vectorized numerical data with multiple dimensions, which serves as the target data set for subsequent operations.
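The time dimension conversion in (3) can be sketched in Python as follows. The input format `YYYY-MM-DD HH:MM:SS` and the UTC+8 time zone are assumptions, since the patent specifies neither; a 10-digit timestamp is taken here to mean Unix time in whole seconds.

```python
from datetime import datetime, timedelta, timezone

# Assumption: alarm times are recorded in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def to_unix_seconds(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Convert a local date string into a Unix timestamp in seconds."""
    local = datetime.strptime(text, fmt).replace(tzinfo=CST)
    return int(local.timestamp())


start_ts = to_unix_seconds("2022-05-27 10:30:00")
print(start_ts, len(str(start_ts)))
```

Integer timestamps make the later sorting and the inbound/outbound comparison of step 3 a plain numeric comparison.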
Step 2: on the basis of step 1, extracting target suspicious alarm data from the vectorized alarm data, establishing a correlation index ConValue that represents the degree of correlation between the target suspicious alarm data and the other data, calculating the two parameters Eps and MinPts of the DBSCAN algorithm, and clustering the target alarm data with the alarm data of each power station using DBSCAN to obtain a plurality of associated data records, as shown in FIG. 2, specifically:
Step 2-1: based on the segmented multi-station alarm data, selecting an alarm whose content is particularly suspicious as the target suspicious data, and placing a copy of its vectorized form into the alarm database of each power station, so that the overall data set is divided into a plurality of per-station sub-data sets;
Step 2-2: establishing a correlation index ConValue that represents the degree of correlation between the target suspicious alarm data and the data in the per-station sub-data sets; typically ConValue = N × 0.2, where N is the total number of alarm records of the power station;
Step 2-3: calculating the DBSCAN clustering parameters Eps and MinPts from the correlation index ConValue, where Eps = ConValue × 0.07 + 0.1 and MinPts = Round(ConValue × 0.2 + 2); Round is the rounding function, which guarantees that MinPts, the minimum number of samples within the Eps-neighborhood of a sample, is an integer;
Step 2-4: based on the parameters Eps and MinPts, clustering the alarm data of each power station using the DBSCAN algorithm to obtain the alarm data associated with the target suspicious alarm data, and collecting it into an associated data set.
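Step 2-4 can be illustrated with a compact, textbook-style DBSCAN written in pure Python. This is a sketch of the standard algorithm rather than the patent's own implementation; the Euclidean distance, the toy 2-D points and the Eps/MinPts values are assumptions for illustration. Because the target record has been copied into the station's sub-data set, the associated data is simply whatever shares the target's cluster label.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks unclustered points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster            # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:    # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def associated_with_target(points, target_index, eps, min_pts):
    """Indices of points that fall in the same cluster as the target."""
    labels = dbscan(points, eps, min_pts)
    target_label = labels[target_index]
    if target_label == NOISE:
        return []
    return [i for i, lab in enumerate(labels)
            if lab == target_label and i != target_index]


# Index 0 is the copied target record; the rest are one station's alarms.
station = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (9, 9)]
print(associated_with_target(station, 0, eps=0.3, min_pts=3))
```

In practice a library implementation of DBSCAN would normally be preferred; the explicit loop above is only meant to show where Eps and MinPts enter the computation.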
Step 3: on the basis of step 2, extracting the alarm start time and latest occurrence time of the associated data and the target alarm data, defining inbound and outbound associated data with the time of the target alarm data as the origin, and constructing attack propagation graphs in two time dimensions, as shown in FIG. 3, specifically:
(3-1) For the target alarm data and the associated data records obtained by DBSCAN clustering, extracting the alarm start time and latest occurrence time of each record, and sorting the timestamps directly to obtain the chronological order of all data in the two time dimensions;
(3-2) Taking the time of the target alarm data as the origin, defining inbound associated data as associated data whose timestamp is smaller than the origin timestamp and outbound associated data as associated data whose timestamp is larger than the origin timestamp, and dividing the target alarm data and the associated data according to these definitions in two dimensions, the alarm start timestamp and the latest occurrence timestamp;
(3-3) Drawing two graphs on a map of the power stations, one showing the inbound and outbound associated data by alarm start time and the other showing the inbound and outbound associated data by latest occurrence time, the two graphs serving as the attack propagation graphs for the target alarm data in the two time dimensions.
Step 4: on the basis of steps 2 and 3, performing secondary feature segmentation on the associated data of the multiple power stations, performing secondary feature mining with a self-organizing network to obtain a fine-grained feature description of the associated data, and finally producing a more detailed attack propagation graph, as shown in FIG. 4, specifically:
Step 4-1: based on the associated data set obtained in step 3, taking the alarm content of the alarm data and formulating feature criteria in more dimensions, such as whether a partition is involved, the partition name, whether a USB device was connected, the USB device manufacturer, the serial port name and the serial port protocol, and using these criteria to perform secondary feature quantization on the associated data;
Step 4-2: performing feature mining on the associated data and the target alarm data after secondary feature quantization using a self-organizing network to obtain deeper feature information about the data;
Step 4-3: adding the obtained feature information to the two attack propagation graphs constructed in step 3 to refine them further, and reporting the multi-dimensional features with a high proportion, which describe what the associated data has in common.
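The "features with a high proportion" of step 4-3 can be illustrated with a simple frequency count over the secondary features. This is a plain counting stand-in, not the self-organizing network itself, and the feature names and the 0.6 threshold are hypothetical.

```python
def feature_shares(records, feature_names):
    """Fraction of records in which each binary secondary feature is set."""
    total = len(records)
    if total == 0:
        return {}
    return {name: sum(1 for r in records if r.get(name)) / total
            for name in feature_names}


def dominant_features(records, feature_names, threshold=0.6):
    """Features whose share reaches the threshold, highest share first."""
    shares = feature_shares(records, feature_names)
    selected = [name for name, share in shares.items() if share >= threshold]
    return sorted(selected, key=lambda name: -shares[name])


associated = [
    {"usb_access": 1, "partition_involved": 1, "serial_port": 1},
    {"usb_access": 1, "partition_involved": 0, "serial_port": 1},
    {"usb_access": 1, "partition_involved": 0, "serial_port": 1},
    {"usb_access": 0, "partition_involved": 0, "serial_port": 1},
]
names = ["usb_access", "partition_involved", "serial_port"]
print(dominant_features(associated, names))
```

The resulting list is the kind of common-feature summary that could be attached as annotations to the two attack propagation graphs.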
In summary, the method analyzes the original alarm data of multiple power stations, segments it along multiple dimensions, and converts the text data into vectorized data using keyword matching. It defines a correlation index to describe the degree of association between the target suspicious alarm data and the alarm data of the multiple power stations, calculates the DBSCAN parameters from that index, and performs association clustering to obtain a plurality of associated alarm records. It then performs secondary feature mining on the alarm data, obtains finer-grained data features using a self-organizing network, and constructs and annotates attack propagation graphs in two time dimensions. The method thus realizes association tracing of the target suspicious alarm data in a multi-station setting, which effectively improves the power monitoring system's ability to defend against attacks and supports safer and more stable operation.
The above description is only an embodiment of the present invention, and is not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (5)

1. A method for implementing a power grid alarm data correlation traceability system based on density clustering and a self-organizing network, characterized in that the method comprises the following steps:
step 1: collecting alarm data from a plurality of power stations, segmenting the original data along a plurality of dimensions, and vectorizing features using the feature keywords of each dimension to form vectorized multi-station alarm data;
step 2: extracting target suspicious alarm data from the vectorized alarm data, formulating a correlation index ConValue to represent the degree of correlation between the target suspicious alarm data and the other data, calculating the two parameters Eps and MinPts of the DBSCAN algorithm, and clustering the target alarm data with the alarm data of each power station using DBSCAN to obtain a plurality of associated data items;
step 3: extracting the alarm starting time and the latest occurrence time of the associated data and of the target alarm data, defining inbound and outbound associated data with the time of the target alarm data as the origin, and constructing attack propagation graphs in two time dimensions;
step 4: performing secondary feature segmentation on the associated data of the multiple power stations, performing secondary feature mining using a self-organizing network to obtain a fine-grained feature description of the associated data, and finally producing more detailed attack propagation graphs.
2. The method for implementing the power grid alarm data correlation traceability system based on density clustering and the self-organizing network as claimed in claim 1, characterized in that step 1 comprises the following steps:
step 1-1, dimension division: segmenting all original alarm data of a plurality of power stations within a certain time range according to 11 dimensions, including alarm level, alarm content, alarm equipment, reporting equipment, and alarm starting time, so as to describe the alarm data in an object-oriented manner;
step 1-2, feature vectorization: for the text-type alarm data after dimension segmentation, searching the content of each dimension for keywords using a keyword matching method, and determining the vector for each dimension according to whether the keywords are present, thereby converting the data set from text into simple numerical data;
step 1-3, time dimension conversion: for the two dimensions of alarm starting time and latest occurrence time in the alarm data, directly converting the original date into a standard 10-digit timestamp using a timestamp conversion method;
step 1-4, data summarization: summarizing the processed data set, so that the original text alarm data is fully segmented into vectorized numerical data with multiple dimensions, which serves as the target data set for subsequent operations.
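Steps 1-2 and 1-3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the keyword lists, the dimension names, the date format, and the choice of UTC are hypothetical, since the claim does not enumerate keywords or specify a time zone.

```python
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical keyword lists per dimension; the patent does not enumerate them.
KEYWORDS: Dict[str, List[str]] = {
    "alarm_level": ["urgent", "important", "general"],
    "alarm_content": ["USB", "serial port", "login failed"],
}


def vectorize(text: str, keywords: List[str]) -> List[int]:
    """1 if the keyword occurs in the text (case-insensitive), else 0."""
    lowered = text.lower()
    return [1 if kw.lower() in lowered else 0 for kw in keywords]


def to_timestamp(date_text: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed to be UTC) to epoch seconds."""
    parsed = datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def vectorize_alarm(alarm: Dict[str, str]) -> Dict[str, object]:
    """Turn one segmented text alarm into numeric vectors and timestamps."""
    record: Dict[str, object] = {
        dim: vectorize(alarm[dim], kws) for dim, kws in KEYWORDS.items()
    }
    record["start_time"] = to_timestamp(alarm["start_time"])
    record["latest_time"] = to_timestamp(alarm["latest_time"])
    return record


# A made-up alarm record, already segmented by dimension.
raw_alarm = {
    "alarm_level": "Important",
    "alarm_content": "USB device inserted on host A",
    "start_time": "2022-05-30 00:00:00",
    "latest_time": "2022-05-30 00:10:00",
}
record = vectorize_alarm(raw_alarm)
```

Epoch seconds for present-day dates have ten decimal digits, which matches the 10-digit timestamp form mentioned in step 1-3.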
3. The method for implementing the power grid alarm data correlation traceability system based on density clustering and the self-organizing network as claimed in claim 1, characterized in that step 2 comprises the following steps:
step 2-1, based on the segmented multi-station alarm data, selecting an item whose alarm content is more suspicious as the target suspicious data, and placing a copy of the vectorized target suspicious data into the alarm database of each power station, so that the total data set is divided into a plurality of power station sub-data sets;
step 2-2, establishing a correlation index ConValue representing the degree of correlation between the target suspicious alarm data and the data in each power station sub-data set, where ConValue = N × 0.2 and N is the total number of alarm data items of the power station;
step 2-3, calculating the DBSCAN clustering parameters Eps and MinPts from the correlation index ConValue, where Eps = ConValue × 0.07 + 0.1 and MinPts = Round(ConValue × 0.2 + 2), Round being a rounding function that ensures that MinPts, the required number of samples within the Eps-neighborhood of a sample, is an integer;
step 2-4, based on the parameters Eps and MinPts, clustering the alarm data within each power station using the DBSCAN algorithm to obtain a plurality of alarm data items associated with the target suspicious alarm data, and summarizing them into an associated data set.
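The parameter formulas of steps 2-2 and 2-3 and the clustering of step 2-4 can be sketched as below. The formulas come from the claim; everything else is an assumption made for illustration: Euclidean distance, a neighborhood that counts the point itself, Python's tie-breaking in `round`, treating the records that share the target's cluster as the associated data, and all function names.

```python
import math
from typing import List, Optional, Sequence, Tuple

NOISE = -1


def dbscan_params(n_alarms: int) -> Tuple[float, float, int]:
    """ConValue, Eps and MinPts from the formulas in steps 2-2 and 2-3."""
    con_value = n_alarms * 0.2
    eps = con_value * 0.07 + 0.1
    # Python's round() sends exact halves to the nearest even integer; the
    # claim only says "Round", so this tie-breaking rule is an assumption.
    min_pts = round(con_value * 0.2 + 2)
    return con_value, eps, min_pts


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN; returns one cluster label per point, NOISE for outliers."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbours(i: int) -> List[int]:
        # The neighbourhood includes point i itself (assumption).
        return [j for j in range(len(points))
                if euclidean(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def associated_with(points, target_index: int, n_alarms: int) -> List[int]:
    """Indices of records that land in the same cluster as the target."""
    _, eps, min_pts = dbscan_params(n_alarms)
    labels = dbscan(points, eps, min_pts)
    if labels[target_index] == NOISE:
        return []
    return [i for i, lab in enumerate(labels)
            if lab == labels[target_index] and i != target_index]


# Index 0 is the copied target record; the other five are made-up station alarms.
station = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
related = associated_with(station, target_index=0, n_alarms=5)
```

In this made-up station, record 2 lies farther than Eps from the target but is still associated, because it is density-reachable through record 1.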
4. The method for implementing the power grid alarm data correlation traceability system based on density clustering and the self-organizing network as claimed in claim 1, characterized in that step 3 comprises the following steps:
step 3-1, for the target alarm data and the associated data obtained by DBSCAN clustering, extracting the alarm starting time and the latest occurrence time of each data item, and sorting the timestamps directly to obtain the chronological order of all data in the two time dimensions;
step 3-2, with the time of the target alarm data as the origin, defining inbound associated data as associated data whose timestamp is smaller than the origin timestamp and outbound associated data as associated data whose timestamp is larger than the origin timestamp, and dividing the associated data around the target alarm data into inbound and outbound according to these definitions, using the alarm starting timestamp and the latest occurrence timestamp as the two dimensions;
step 3-3, drawing two graphs on a map of the power stations, one showing the inbound and outbound associated data at the alarm starting time and the other showing the inbound and outbound associated data at the latest alarm occurrence time, the two graphs serving as the attack propagation graphs for the target alarm data in the two time dimensions.
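The sorting and inbound/outbound division of steps 3-1 and 3-2 can be sketched as follows. The record layout and field names are hypothetical, and the handling of timestamps exactly equal to the origin (left out of both groups here) is an assumption, since the claim defines only the strictly smaller and strictly larger cases. Drawing the graphs on a station map is not covered.

```python
from typing import Dict, List, Tuple

Alarm = Dict[str, object]  # hypothetical record: id, start_time, latest_time


def split_by_origin(target: Alarm, associated: List[Alarm],
                    key: str) -> Tuple[List[str], List[str]]:
    """Sort associated records by one time dimension and split them
    around the target's timestamp in that dimension."""
    origin = target[key]
    ordered = sorted(associated, key=lambda alarm: alarm[key])
    inbound = [a["id"] for a in ordered if a[key] < origin]
    outbound = [a["id"] for a in ordered if a[key] > origin]
    # Records with a timestamp equal to the origin fall in neither group.
    return inbound, outbound


# Made-up target and associated records with 10-digit timestamps.
target = {"id": "T", "start_time": 1653868800, "latest_time": 1653870000}
associated = [
    {"id": "A", "start_time": 1653868700, "latest_time": 1653870100},
    {"id": "B", "start_time": 1653868900, "latest_time": 1653869900},
    {"id": "C", "start_time": 1653868600, "latest_time": 1653869000},
]

# One inbound/outbound division per time dimension, i.e. one per graph.
graphs = {
    key: split_by_origin(target, associated, key)
    for key in ("start_time", "latest_time")
}
```

The same record can be inbound in one dimension and outbound in the other, as record A is here, which is why the claim keeps two separate graphs.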
5. The method for implementing the power grid alarm data correlation traceability system based on density clustering and the self-organizing network as claimed in claim 1, characterized in that step 4 comprises the following steps:
step 4-1, selecting the alarm content of the alarm data and formulating feature criteria in additional dimensions, such as whether a partition is involved and the partition name, whether a USB device is connected and the USB device manufacturer, the serial port name, and the serial port protocol, and performing secondary feature quantization on the associated data using these criteria;
step 4-2, performing feature mining with a self-organizing network on the associated data and the target alarm data after secondary feature quantization, to obtain deeper feature information about the data;
step 4-3, adding the obtained feature information to the two attack propagation graphs constructed in step 3 to further refine them, and reporting the multi-dimensional features that account for a high proportion of the associated data, which describe the information the associated data have in common.
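Step 4-2 does not say how the self-organizing network is configured. The following is a minimal self-organizing map sketch in plain Python, offered only as one possible reading: the 2 × 2 grid, the learning-rate and neighborhood schedules, the random initialization, and the sample vectors are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]  # rows x cols x feature weights


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def best_matching_unit(weights: Grid, sample: Sequence[float]) -> Tuple[int, int]:
    """Grid coordinates of the node whose weights are closest to the sample."""
    best, best_dist = (0, 0), math.inf
    for r, row in enumerate(weights):
        for c, node in enumerate(row):
            dist = euclidean(node, sample)
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data: List[Sequence[float]], rows: int = 2, cols: int = 2,
              epochs: int = 50, lr0: float = 0.5, sigma0: float = 1.0,
              seed: int = 0) -> Grid:
    """Train a small SOM with linearly decaying learning rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.1)  # keep the radius strictly positive
        for sample in data:
            br, bc = best_matching_unit(weights, sample)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood around the winner on the grid.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2 * sigma ** 2))
                    node = weights[r][c]
                    for k in range(dim):
                        node[k] += lr * h * (sample[k] - node[k])
    return weights


# Made-up secondary feature vectors for the target and associated records.
samples = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
som = train_som(samples)
placement = [best_matching_unit(som, s) for s in samples]
```

Records that map to the same grid node can then be treated as sharing a feature pattern, which is one way the finer-grained description in step 4-3 could be derived.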
CN202210597168.XA 2022-05-30 2022-05-30 Method for realizing power grid alarm data association traceability system based on density clustering and self-organizing network Pending CN114880380A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210597168.XA CN114880380A (en) 2022-05-30 2022-05-30 Method for realizing power grid alarm data association traceability system based on density clustering and self-organizing network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210597168.XA CN114880380A (en) 2022-05-30 2022-05-30 Method for realizing power grid alarm data association traceability system based on density clustering and self-organizing network

Publications (1)

Publication Number Publication Date
CN114880380A true CN114880380A (en) 2022-08-09

Family

ID=82679783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210597168.XA Pending CN114880380A (en) 2022-05-30 2022-05-30 Method for realizing power grid alarm data association traceability system based on density clustering and self-organizing network

Country Status (1)

Country Link
CN (1) CN114880380A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024066292A1 (en) * 2022-09-26 2024-04-04 中兴通讯股份有限公司 Device group fault identification method and apparatus, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN107835087B (en) Automatic extraction method of alarm rule of safety equipment based on frequent pattern mining
CN107391598B (en) Automatic threat information generation method and system
CN107104951B (en) Method and device for detecting network attack source
Efstathopoulos et al. Operational data based intrusion detection system for smart grid
CN105376193A (en) Intelligent association analysis method and intelligent association analysis device for security events
CN109376797B (en) Network traffic classification method based on binary encoder and multi-hash table
CN110519231A (en) A kind of cross-domain data exchange supervisory systems and method
CN114880380A (en) Method for realizing power grid alarm data association traceability system based on density clustering and self-organizing network
CN105654392A (en) Familial defect analysis method of equipment based on clustering algorithm
CN110287237B (en) Social network structure analysis based community data mining method
CN115034671A (en) Secondary system information fault analysis method based on association rule and cluster
CN115664703A (en) Attack tracing method based on multi-dimensional information
Zhao Research on network security defence based on big data clustering algorithms
CN111143622B (en) Fault data set construction method based on big data platform
CN109615558B (en) User electricity habit analysis method based on electricity parameter big data statistics
CN112529191A (en) Pump station fault tree establishment method based on chaotic algorithm
CN115169234B (en) Power network reliability assessment method based on big data analysis
Gao et al. Machine Learning-Based Reliability Improvement of Ambient Mode Extraction for Smart Grid Utilizing Isolation Forest
Zhu et al. MCFM: Discover Sensitive Behavior from Encrypted Traffic in Industrial Control System
Lu et al. Anomaly Recognition Method for Massive Data of Power Internet of Things Based on Bayesian Belief Network
Ishikawa et al. A dynamic mobility histogram construction method based on Markov chains
Yu et al. Heterogeneous IoT and data fusion communication algorithms for power distribution station areas
CN110990791A (en) Big data processing-based data analysis method for smart power grid construction
Wang et al. A Comprehensive Model for Analysing Correlation and Traceability of Alarm Data from Power Stations by Coupling DBSCAN and SOM Algorithms
Deng et al. An overview on key technologies of secure and efficient data transmission for Energy Internet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination