CN108062363A - A kind of data filtering method and system towards active power distribution network - Google Patents

Info

Publication number
CN108062363A
Authority
CN
China
Prior art keywords
data
attribute
power distribution
distribution network
active power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711265255.0A
Other languages
Chinese (zh)
Inventor
岳东
李诗玥
邓松
葛辉
张利平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University filed Critical Nanjing Post and Telecommunication University
Priority to CN201711265255.0A priority Critical patent/CN108062363A/en
Publication of CN108062363A publication Critical patent/CN108062363A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • H02J13/0013

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data filtering method and system for an active power distribution network. The method comprises: cleaning the active power distribution network data; performing attribute reduction on the cleaned data to obtain reduced training data and data to be filtered that contain only the condition attributes in the minimum relative attribute set; and classifying and filtering the reduced data to be filtered, in which a gene expression programming algorithm mines the training data to obtain the classification function relation between the condition attributes and the known sensitive attribute, the condition attributes of the data to be filtered are substituted into that relation to calculate an actual function value, and the calculated value is used to judge whether the attribute is a sensitive attribute and to filter out the sensitive data having sensitive attributes. The invention can classify and filter all kinds of service sensitive data of the active power distribution network and thereby actively protect the sensitive data, which addresses the security of data transmission in the active power distribution network.

Description

Data filtering method and system for active power distribution network
Technical Field
The invention relates to a data filtering method and system for an active power distribution network, and belongs to the technical field of power information transmission.
Background
As an important component of the power system, the power distribution network is one of the core elements of smart grid construction. In recent years in particular, with the wide access of large-scale distributed energy sources, energy storage, flexible loads and units composed of multiple distributed devices (such as micro-grids and charging piles), power distribution networks are developing towards active power distribution networks. Compared with the traditional power distribution network, the composition of the active power distribution network is more complex, and the interactions among source, network and load are more frequent.
With economic development and technological progress, active power distribution networks are widely applied, but many problems remain to be solved and improved. With the large-scale application of advanced information and communication technologies such as wireless communication and the Internet of Things in active power distribution networks, network security problems are becoming more and more serious. They include attacks by viruses, trojans and hackers from the Internet, and even malicious attacks from inside the information communication network, any of which can cause the control network of the whole active power distribution network to crash. The existing safety protection scheme of the power secondary system does not consider safety protection for interaction within the active power distribution network or between it and the outside. In particular, as smart grid construction deepens, the active power distribution network has a more complex access environment, flexible and varied access modes (such as GPRS, WiFi and optical fiber communication), a large number of intelligent access terminals (such as information acquisition terminals of various distributed energy resources, power distribution equipment and line operation state monitoring terminals) and dynamically distributed mass access data. The safe operation of the active power distribution network is closely tied to the interests of the public, and protecting the secure transmission of mass data in the active power distribution network is a key link in achieving that safe operation.
Along with the rapid development of the active power distribution network, the importance of data security transmission is increasingly outstanding, and the characteristics of complex data structure, wide source, huge information quantity, frequent interaction and the like under the active power distribution network bring great challenges to the security guarantee of data transmission and interaction under the active power distribution network. The traditional data transmission adopts an encryption mode to protect data security, and the efficiency of data transmission of the active power distribution network is inevitably influenced along with the expansion of the data volume of encrypted information. The data filtering can be essentially understood as binary or multivariate data classification, and the existing data classification technology cannot meet the safety protection requirement of various service sensitive data of the active power distribution network transmitted through the wireless public network in the aspects of accuracy and efficiency. Therefore, it is urgent to explore relevant theories and methods for data filtering of the active power distribution network, and provide theoretical and technical support for safe and stable operation of the active power distribution network.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art, provide a data filtering method and system for an active power distribution network, solve the safety problem of data transmission of the active power distribution network, and classify and filter various service sensitive data of the active power distribution network by using the method, so as to actively protect the sensitive data.
The invention specifically adopts the following technical scheme to solve the technical problems:
a data filtering method for an active power distribution network comprises the following steps:
cleaning the data of the active power distribution network;
and performing attribute reduction on the cleaned active power distribution network data, wherein the attribute reduction comprises the following steps: selecting known sensitive data from the cleaned data of the active power distribution network to form training data, marking the training data with known sensitive attributes, and forming data to be filtered by the residual data; determining an attribute set from the training data, wherein the attribute set comprises defined data condition attributes and known sensitivity attributes; calculating a dependency calculation value between the condition attribute and the known sensitive attribute of the training data in the attribute set, and selecting the condition attribute according to the calculated dependency calculation value to generate a minimum relative attribute set; determining training data and data to be filtered which are obtained after reduction and contain conditional attributes in the minimum relative attribute set based on the minimum relative attribute set;
classifying and filtering the reduced data to be filtered, comprising the following steps: carrying out data mining on the training data by using a gene expression programming algorithm to obtain a classification function relation between the condition attribute and the known sensitive attribute; substituting the condition attribute in the data to be filtered into the obtained classification function relationship, calculating to obtain a function actual value, comparing the calculated function actual value with a target value to judge whether the attribute of the data is sensitive attribute, and filtering out the sensitive data with the sensitive attribute.
Further, as a preferred technical solution of the present invention: the method for cleaning the data of the active power distribution network comprises the following steps: eliminating noise data, removing redundant attributes, judging whether missing values exist in the data and filling the missing values.
Further, as a preferred technical solution of the present invention: in the method, the dependency value between the condition attributes and the known sensitive attribute of the training data in the attribute set is calculated by the formula:
$\gamma_{M_1}(F) = \dfrac{\operatorname{card}\left(POS_{M_1}(F)\right)}{\operatorname{card}(U)}$
wherein $M_1 = \{m_1, m_2, \dots, m_m\}$ is the set of condition attributes; F is the known sensitive attribute; U is the universe of data objects; $POS_{M_1}(F)$ is the positive region of the equivalence classes U/F with respect to $M_1$; and card(·) denotes the cardinality of a set.
Further, as a preferred technical solution of the present invention, in the method, a conditional attribute is selected according to the calculated dependency calculation value to generate a minimum relative attribute set, specifically:
calculating to obtain a calculated value of the dependence of all condition attributes of the data in the attribute set relative to the known sensitive attribute;
respectively calculating the calculated value of the dependence of each condition attribute of the data in the attribute set relative to the known sensitive attribute, selecting the condition attribute according to the calculated value of the dependence to generate a current minimum relative attribute set, and calculating the dependence values of all the condition attributes in the current minimum relative attribute set relative to the known sensitive attribute;
and comparing the calculated dependency values of all condition attributes in the current minimum relative attribute set relative to the known sensitive attributes with a set threshold, and determining to obtain a minimum relative attribute set according to the comparison result.
Further, as a preferred technical solution of the present invention: the method further comprises the step of carrying out encryption security protection on the sensitive data with the sensitive attribute.
The invention also provides a data filtering system facing the active power distribution network, which comprises:
the data cleaner is used for cleaning the data of the active power distribution network;
the data attribute reducer is used for carrying out attribute reduction on the active power distribution network data cleaned by the data cleaner, selecting known sensitive data from the cleaned active power distribution network data to form training data, marking the known sensitive data, and forming data to be filtered by the residual data; determining an attribute set from the training data, wherein the attribute set comprises defined data condition attributes and known sensitivity attributes; calculating a dependency calculation value between the condition attribute and the known sensitive attribute of the training data in the attribute set, and selecting the condition attribute according to the calculated dependency calculation value to generate a minimum relative attribute set; determining training data and data to be filtered which are obtained after reduction and contain conditional attributes in the minimum relative attribute set based on the minimum relative attribute set;
the data filter is used for classifying and filtering the data to be filtered after the data attribute reducer reduces, and comprises: carrying out data mining on the training data by using a gene expression programming algorithm to obtain a classification function relation between the condition attribute and the known sensitive attribute; substituting the condition attribute in the data to be filtered into the obtained classification function relationship, calculating to obtain a function actual value, comparing the calculated function actual value with a target value to judge whether the attribute of the data is a sensitive attribute, and filtering out the sensitive data with the sensitive attribute.
By adopting the technical scheme, the invention can produce the following technical effects:
The invention provides a data filtering method and system for an active power distribution network. Because the original active power distribution network data may contain noise data, redundant attributes and incomplete attribute values, the data is cleaned, which improves the security of data transmission; the cleaned data undergoes simplification and dimensionality reduction, which improves the efficiency and accuracy of data filtering and, by simplifying the attribute dimensionality of the data, greatly improves data mining efficiency; and a gene expression programming algorithm is used to mine functions from the complex active power distribution network data, which has the advantages of simple coding and the ability to solve complex problems and requires no prior knowledge.
Therefore, the method mainly solves the safety problem of the complex active power distribution network data in the transmission process, and can filter and actively protect the sensitive data in the active power distribution network by using the method provided by the invention.
Drawings
Fig. 1 is a schematic block diagram of a data filtering system for an active power distribution network according to the present invention.
Fig. 2 is a schematic flow chart of the data filtering method for the active power distribution network according to the present invention.
Detailed Description
The following describes embodiments of the present invention with reference to the drawings.
As shown in fig. 1, the present invention designs a data filtering system for an active power distribution network, which is divided into three parts: the system comprises a data cleaner, a data attribute reducer and a data filter; by adopting the system, the data filtering method facing the active power distribution network comprises the following steps:
Step 1: clean the active power distribution network data using the data cleaner.
In this embodiment, the data cleaner cleans the original active power distribution network data by filling in missing attribute values, smoothing noise data and the like; the data is then converted, by methods such as standardization, into a form that can be used for classification and filtering.
Protecting the secure transmission of mass data is a key link in the safe operation of the active power distribution network. Because the original active power distribution network data may contain noise data, redundant attributes and incomplete attribute values, data cleaning is performed in the following steps:
Step 11: first judge whether noise data exists in the data, and if so, eliminate the noise data.
Step 12: judge whether the data has redundant attributes, and if so, remove the redundant attributes.
Step 13: finally judge whether the data has missing values; if a value is missing under a certain attribute, fill it with the average value of all the data of that attribute.
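Steps 11 to 13 can be sketched in a few lines of Python. This is a minimal illustration and not the patented implementation: the function names `is_number` and `clean_records`, the rule that a non-numeric value such as `0asa` marks a whole record as noise, and the caller-supplied list of redundant attributes are all assumptions made for the example.

```python
from statistics import mean


def is_number(value):
    """Return True when the value can be read as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def clean_records(records, redundant_attrs=()):
    """Clean a list of dict records that all share the same keys."""
    # Step 11 (assumed rule): drop any record holding a value that is
    # neither missing (None) nor numeric, e.g. the string "0asa".
    denoised = [
        r for r in records
        if all(v is None or is_number(v) for v in r.values())
    ]
    # Step 12: remove the attributes the caller flags as redundant.
    trimmed = [
        {k: (None if v is None else float(v))
         for k, v in r.items() if k not in redundant_attrs}
        for r in denoised
    ]
    # Step 13: fill each missing value with the mean of the observed
    # values of the same attribute.
    attrs = list(trimmed[0].keys()) if trimmed else []
    for attr in attrs:
        observed = [r[attr] for r in trimmed if r[attr] is not None]
        if not observed:
            continue  # nothing to average; leave the gaps as they are
        fill = mean(observed)
        for r in trimmed:
            if r[attr] is None:
                r[attr] = fill
    return trimmed
```

The order of operations follows the text: noise first, then redundant attributes, then missing values, so the fill averages are computed only from records that survived noise removal.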
Step 2: perform attribute reduction on the cleaned active power distribution network data using the data attribute reducer, which comprises: selecting known sensitive data from the active power distribution network data cleaned by the data cleaner to form training data, marking the training data with the known sensitive attribute, and forming the data to be filtered from the remaining data; determining an attribute set from the training data, wherein the attribute set comprises the defined data condition attributes and the known sensitive attribute; calculating the dependency value between the condition attributes and the known sensitive attribute of the training data in the attribute set, and selecting condition attributes according to the calculated dependency values to generate a minimum relative attribute set; and determining, based on the minimum relative attribute set, the reduced training data and data to be filtered that contain only the condition attributes in the minimum relative attribute set.
Because the active power distribution network has rich data structures and high data dimensionality, the cleaned active power distribution network data needs to be simplified and dimensionality reduced so as to improve the efficiency and accuracy of data filtering. The attribute reduction algorithm based on dependency ranking is utilized to reduce m attributes of the original data into n (m > n) attributes, so that the decisive influence of the m attributes of the original data on the sensitive attributes is kept, the attribute dimensionality of the data is simplified, and the data mining efficiency can be greatly improved.
Known sensitive data are selected to form the training data and are marked with the known sensitive attribute, specifically as follows. Let the data sample decision table be T = <U, M, V, f>, where U is a non-empty finite set of objects, called the universe of discourse; $M = M_1 \cup \{F\}$ is the attribute set, in which $M_1 = \{m_1, m_2, \dots, m_m\}$ is the set of condition attributes and F is the known sensitive attribute; V is the set of all attribute values of the training data; and $f: U \times M \to V$ is a mapping that gives the attribute value of each object x in U.
The dependency of the condition attribute set $M_1$ on the known sensitive attribute F is defined as:
$\gamma_{M_1}(F) = \dfrac{\operatorname{card}\left(POS_{M_1}(F)\right)}{\operatorname{card}(U)}$
wherein $M_1 = \{m_1, m_2, \dots, m_m\}$ is the set of condition attributes; F is the known sensitive attribute; $POS_{M_1}(F)$ is the positive region of the equivalence classes U/F with respect to $M_1$; and card(·) denotes the cardinality of a set.
The data attribute reduction is specifically divided into the following steps:
Step 21: initialize the minimum relative attribute set as an empty set;
Step 22: calculate the dependency value Q of the whole set of condition attributes of the active power distribution network data relative to the known sensitive attribute;
Step 23: calculate the dependency value of each individual condition attribute of the active power distribution network data relative to the known sensitive attribute;
Step 24: sort the condition attributes in descending order of the dependency values calculated in step 23;
Step 25: following the order from step 24, add the corresponding condition attributes to the minimum relative attribute set one at a time, and calculate the dependency value Q' of all condition attributes in the current set relative to the known sensitive attribute;
Step 26: compare the dependency value Q from step 22 with the dependency value Q' obtained in step 25; if the difference is less than a given threshold, the current set is taken as the minimum relative attribute set, otherwise return to step 25 and add the next attribute;
Step 27: according to the minimum relative attribute set from step 26, screen the values of the corresponding attributes in the training data and the data to be filtered, that is, reduce the training data and the data to be filtered in the active power distribution network data so that they contain only the data of the attributes in the minimum relative attribute set.
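The dependency measure and steps 21 to 26 can be sketched as follows. This is an illustrative sketch under assumptions, not the patent's code: it uses the standard rough-set reading of the dependency (size of the positive region divided by the size of the universe), assumes attribute values are already discrete, and the names `partition`, `dependency`, `reduce_attributes` and the tolerance `tol` are hypothetical.

```python
def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = {}
    for index, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(index)
    return list(blocks.values())


def dependency(rows, cond_attrs, decision):
    """card(POS_cond(decision)) / card(U) for a list of dict rows."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, cond_attrs):
        # A block lies in the positive region when every row in it
        # carries the same value of the decision (sensitive) attribute.
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def reduce_attributes(rows, cond_attrs, decision, tol=1e-9):
    """Dependency-ranked forward selection (steps 21 to 26)."""
    q_all = dependency(rows, cond_attrs, decision)            # step 22
    ranked = sorted(                                          # steps 23-24
        cond_attrs,
        key=lambda a: dependency(rows, [a], decision),
        reverse=True,
    )
    selected = []                                             # step 21
    for attr in ranked:                                       # step 25
        selected.append(attr)
        q_current = dependency(rows, selected, decision)
        if q_all - q_current < tol:                           # step 26
            break
    return selected
```

Step 27 then amounts to keeping only the keys in the returned list (plus the sensitive attribute for training data). Continuous measurements such as voltages would need to be discretized before equivalence classes are meaningful; the source does not say how this is done.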
Step 3: classify and filter the data to be filtered, as reduced by the data attribute reducer, using the data filter, which comprises: performing data mining on the training data with a gene expression programming algorithm to obtain the classification function relation between the condition attributes and the known sensitive attribute; substituting the condition attributes of the data to be filtered into the obtained classification function relation to calculate an actual function value; comparing the calculated actual value with a target value to judge whether the attribute of the data is a sensitive attribute; and filtering out the sensitive data having the sensitive attribute.
The invention optimizes the selection strategy applied to each generation of the population in the gene expression programming (GEP) algorithm. The traditional roulette strategy selects the individuals that enter the next generation according to the fitness value of each chromosome: each sector of the wheel represents the fitness proportion of one chromosome, and the wheel is spun the corresponding number of times to obtain the next generation of chromosomes, so that the population size remains unchanged. This selection strategy is simple and convenient, but it has a disadvantage: in the initial stage of population evolution, if a chromosome has a fitness value significantly higher than that of the other individuals, it is more likely than the others to be selected on every spin, so the population may lose diversity and the global search capability of GEP is impaired.
The selection strategy provided by the invention sorts each evolved generation in descending order of chromosome fitness and divides it, in that order, into 10 groups of chromosomes. The group with the lowest fitness values is removed, the wheel is divided into 9 sectors that represent the remaining 9 groups, the individual with the highest fitness value is preferentially retained, and the wheel is spun the corresponding number of times to obtain the next generation of chromosomes. This improved roulette strategy directly eliminates the chromosomes with the lowest fitness values and preferentially retains the best individual of each generation, which accelerates the convergence of the GEP algorithm and helps avoid falling into a local optimum in the initial stage of population evolution. The invention uses the gene expression programming algorithm for data mining; it has the advantages of simple coding and the ability to solve complex problems, requires no prior knowledge, and can process complex active power distribution network data.
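One possible reading of the improved roulette strategy is sketched below. The source does not specify how a wheel sector maps to an individual, so two assumptions are made and should be treated as such: each of the 9 sectors is weighted by the total fitness of its group, and a spin that lands on a group picks one of its members uniformly at random. The function name `improved_roulette` and its parameters are hypothetical, and fitness values are assumed to be positive.

```python
import random


def improved_roulette(population, fitness, rng=random, groups=10):
    """Select the next generation; population size is preserved."""
    size = len(population)
    if size < groups:
        raise ValueError("population must have at least `groups` individuals")
    # Rank individual indices from highest to lowest fitness.
    order = sorted(range(size), key=lambda i: fitness[i], reverse=True)
    # Split the ranking into `groups` consecutive, nearly equal blocks.
    blocks = [
        order[size * g // groups: size * (g + 1) // groups]
        for g in range(groups)
    ]
    survivors = blocks[:-1]  # discard the group with the lowest fitness
    # Assumption: a sector's share of the wheel is its group's total fitness.
    weights = [sum(fitness[i] for i in block) for block in survivors]
    # Preferentially retain the individual with the highest fitness.
    next_generation = [population[order[0]]]
    while len(next_generation) < size:
        block = rng.choices(survivors, weights=weights, k=1)[0]
        # Assumption: members of the chosen group are equally likely.
        next_generation.append(population[rng.choice(block)])
    return next_generation
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing this strategy against plain roulette selection.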
The invention uses a gene expression programming algorithm to perform data mining on the training data. In essence, it mines the classification function relation $F = F(m_1, m_2, \dots, m_n)$ between the condition attributes and the known sensitive attribute of the training data in the active power distribution network data, specifically through the following steps:
Step 31: collect the active power distribution network data that has undergone data cleaning and attribute reduction (steps 1 and 2) to form the training samples to be mined;
Step 32: set the parameters of the gene expression programming algorithm and initialize the population;
Step 33: evaluate the fitness value of each individual in the current population according to the fitness function $f_i = \sum_{j \in C_t} \left( N - \left| C_{(i,j)} - T_j \right| \right)$, where N is the selection range, $C_{(i,j)}$ is the value returned by chromosome individual i for fitness sample j (from the fitness sample set $C_t$), and $T_j$ is the sample value of fitness sample j;
Step 34: judge whether the current population meets the termination condition; if so, jump to step 39, otherwise continue with the following operations;
Step 35: sort the chromosomes according to the fitness values obtained in step 33, divide them equally into 10 groups, and remove the group of chromosomes with the lowest fitness values;
Step 36: divide the wheel into 9 sectors that represent the remaining 9 groups of chromosomes, preferentially retain the individual with the highest fitness value, and spin the corresponding number of times to obtain the next generation of chromosomes;
Step 37: perform genetic operations such as recombination and mutation in sequence according to their probabilities;
Step 38: generate the new population and jump to step 33;
Step 39: decode the individual with the largest fitness value into the functional relation $F = F(m_1, m_2, \dots, m_n)$;
Step 310: substitute the condition attributes of the data to be filtered in the active power distribution network data into the mined classification function to calculate a function value F'; if the difference between the function value F' and the sensitive attribute target value F is within a given threshold range, judge that the attribute is a sensitive attribute, mark the data as sensitive data and filter it out, and preferably apply encryption security protection to the filtered sensitive data.
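The fitness evaluation of step 33 and the threshold test of step 310 can be illustrated with the short sketch below. It assumes the absolute-error form of the fitness function (the selection range N minus the absolute deviation, summed over the fitness samples) and treats the mined classification function as an ordinary Python callable; `fitness`, `filter_sensitive`, `classify`, `target_value` and `threshold` are hypothetical names chosen for the example.

```python
def fitness(predict, samples, targets, selection_range):
    """Sum over fitness samples of N - |C_ij - T_j| for one individual.

    `predict` stands in for the expression decoded from a chromosome.
    """
    return sum(
        selection_range - abs(predict(x) - t)
        for x, t in zip(samples, targets)
    )


def filter_sensitive(records, classify, target_value, threshold):
    """Split records into (sensitive, remaining) as in step 310.

    A record is marked sensitive when the value F' returned by the mined
    function lies within `threshold` of the target value F.
    """
    sensitive, remaining = [], []
    for record in records:
        actual = classify(record)
        if abs(actual - target_value) <= threshold:
            sensitive.append(record)
        else:
            remaining.append(record)
    return sensitive, remaining
```

The sensitive list is what would subsequently be passed to encryption, while the remaining records can be transmitted without that overhead.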
To verify that the present invention can implement fast data cleaning, a specific embodiment is described.
Data in an active power distribution network is high-dimensional, voluminous and structurally complex, and it has great analytical value for users, companies and the social economy. For example, as in Table 1, if the active power distribution network data Data contains some known sensitive data Data1, data filtering needs to be carried out on Data so that the sensitive data is protected.
TABLE 1 Data of active distribution network
The specific implementation scheme is as follows:
(1) User A requests active encryption protection for the data transmission of the active power distribution network, and the active power distribution network data Data over a short period of time is collected.
(2) Data cleaning is performed on Data. First judge whether noise data exists in the data; for example, if one value is abnormal, such as 0asa, the noise data is removed first;
(3) Judge whether redundant attributes exist in the data; for example, if there are two voltage attributes, remove the repeated one;
(4) Judge whether the data has missing values; if a value is missing under a certain attribute, fill it with the average value of all the data of that attribute;
(5) Convert the cleaned Data into a form suitable for data mining, as in Table 1; select 1000 pieces of known sensitive data as the training data Data1, mark the known sensitive attribute, and determine the attribute set, which comprises the defined data condition attributes and the known sensitive attribute; the remaining active power distribution network data forms the data to be filtered, Data2;
(6) Using the data attribute reducer, which is designed with the dependency-ranking-based attribute reduction algorithm, initialize the minimum relative attribute set; calculate the dependency value of each condition attribute of the data relative to the known sensitive attribute; add the condition attributes to the minimum relative attribute set one by one in descending order of dependency value, calculating the dependency value Q' of the current set each time, until Q' is approximately equal to the dependency value Q of the whole set of condition attributes relative to the known sensitive attribute, which yields the minimum relative attribute set;
(7) According to the minimum relative attribute set, obtain the reduced training data Data1 and data to be filtered Data2;
(8) Run the gene expression programming algorithm to mine a function from the reduced training data Data1, obtaining the functional relation $F = F(m_1, m_2, \dots, m_n)$;
(9) Substitute each piece of the reduced data to be filtered Data2 into $F = F(m_1, m_2, \dots, m_n)$ to obtain the actual value F', and calculate the error between the target value F and the actual value F'; if the error is smaller than a given threshold, judge that the attribute corresponding to the actual value F' is a sensitive attribute, filter out the data, and preferably apply encryption protection.
In conclusion, the method and system can classify and filter all kinds of service sensitive data of the active power distribution network, so that the sensitive data can be actively protected and the security problem of data transmission in the active power distribution network is addressed.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (6)

1. A data filtering method for an active power distribution network is characterized by comprising the following steps:
cleaning the data of the active power distribution network;
carrying out attribute reduction on the cleaned active power distribution network data, including: selecting known sensitive data from the cleaned active power distribution network data to form training data, marking the known sensitive attributes of the training data, and forming data to be filtered from the remaining data; determining an attribute set from the training data, wherein the attribute set comprises defined data condition attributes and known sensitive attributes; calculating a dependency value between the condition attributes and the known sensitive attribute of the training data in the attribute set, and selecting condition attributes according to the calculated dependency value to generate a minimum relative attribute set; determining, based on the minimum relative attribute set, the reduced training data and the reduced data to be filtered, which contain the condition attributes in the minimum relative attribute set;
classifying and filtering the reduced data to be filtered, comprising the following steps: carrying out data mining on the training data by using a gene expression programming algorithm to obtain a classification function relation between the condition attribute and the known sensitive attribute; substituting the condition attribute in the data to be filtered into the obtained classification function relationship, calculating to obtain a function actual value, comparing the calculated function actual value with a target value to judge whether the attribute of the data is sensitive attribute, and filtering the sensitive data with the sensitive attribute.
2. The active power distribution network-oriented data filtering method according to claim 1, characterized in that: the method for cleaning the data of the active power distribution network comprises the following steps: eliminating noise data, removing redundant attributes, judging whether missing values exist in the data and filling the missing values.
3. The active power distribution network-oriented data filtering method according to claim 1, characterized in that: in the method, the dependency value between the condition attributes and the known sensitive attribute of the training data in the attribute set is calculated by the formula:
$\mathrm{card}(\mathrm{POS}_{M_1}(F)) / \mathrm{card}(U)$
wherein $M_1 = \{m_1, m_2, \ldots, m_m\}$ is the set of condition attributes, $F$ is the known sensitive attribute, $U$ is the universe, $\mathrm{POS}_{M_1}(F)$ is the positive region of the equivalence classes $U/F$ with respect to $M_1$, and $\mathrm{card}(\cdot)$ denotes the cardinality of a set.
4. The active power distribution network-oriented data filtering method according to claim 1, wherein in the method, a condition attribute is selected according to the calculated dependency calculation value to generate a minimum relative attribute set, specifically:
calculating to obtain a calculated value of the dependency of all condition attributes of the data in the attribute set relative to the known sensitive attributes;
respectively calculating the calculated value of the dependence of each condition attribute of the data in the attribute set relative to the known sensitive attribute, selecting the condition attribute according to the calculated value of the dependence to generate a current minimum relative attribute set, and calculating the dependence values of all the condition attributes in the current minimum relative attribute set relative to the known sensitive attribute;
and comparing the calculated dependency values of all condition attributes in the current minimum relative attribute set relative to the known sensitive attributes with a set threshold, and determining to obtain a minimum relative attribute set according to the comparison result.
5. The active power distribution network-oriented data filtering method according to claim 1, characterized in that: the method further comprises the step of carrying out encryption security protection on the sensitive data with the sensitive attribute.
6. An active power distribution network-oriented data filtering system, comprising:
the data cleaner is used for cleaning the data of the active power distribution network;
the data attribute reducer is used for carrying out attribute reduction on the active power distribution network data cleaned by the data cleaner, including: selecting known sensitive data from the cleaned active power distribution network data to form training data, marking the known sensitive attributes of the training data, and forming data to be filtered from the remaining data; determining an attribute set from the training data, wherein the attribute set comprises defined data condition attributes and known sensitive attributes; calculating a dependency value between the condition attributes and the known sensitive attribute of the training data in the attribute set, and selecting condition attributes according to the calculated dependency value to generate a minimum relative attribute set; determining, based on the minimum relative attribute set, the reduced training data and the reduced data to be filtered, which contain the condition attributes in the minimum relative attribute set;
the data filter is used for classifying and filtering the data to be filtered after the data attribute reducer reduces, and comprises: carrying out data mining on the training data by using a gene expression programming algorithm to obtain a classification function relation between the condition attribute and the known sensitive attribute; and substituting the condition attribute in the data to be filtered into the obtained classification function relationship, calculating to obtain a function actual value, comparing the calculated function actual value with a target value to judge whether the attribute of the data is a sensitive attribute, and filtering the sensitive data with the sensitive attribute.
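The cleaning operations named in claim 2 (eliminating noise data, removing redundant attributes, filling missing values) are not specified further in the claims. The sketch below shows one possible reading under explicit assumptions: noise is a numeric value outside a user-supplied valid range, a redundant attribute is one whose observed value never varies, and missing values (`None`) are filled with the attribute mean. The field names and ranges are hypothetical.

```python
from statistics import mean

def clean(records, valid_ranges):
    """Clean numeric records; every rule here is an assumption for illustration."""
    # 1. Eliminate noise: drop records with a value outside its valid range.
    kept = [
        r for r in records
        if all(r.get(k) is None or lo <= r[k] <= hi
               for k, (lo, hi) in valid_ranges.items())
    ]
    if not kept:
        return []
    # 2. Remove redundant attributes: those whose observed values never vary.
    varying = [
        a for a in kept[0]
        if len({r[a] for r in kept if r[a] is not None}) > 1
    ]
    # 3. Fill missing values with the mean of the observed values.
    fill = {a: mean(r[a] for r in kept if r[a] is not None) for a in varying}
    return [
        {a: (fill[a] if r[a] is None else r[a]) for a in varying}
        for r in kept
    ]

# Invented measurements (assumption): one missing voltage, one noisy current.
records = [
    {"voltage": 10.2, "current": 5.0, "unit": 1},
    {"voltage": None, "current": 5.5, "unit": 1},
    {"voltage": 10.6, "current": 900.0, "unit": 1},
    {"voltage": 10.4, "current": 6.0, "unit": 1},
]
cleaned = clean(records, {"voltage": (0.0, 20.0), "current": (0.0, 100.0)})
```

Mean imputation and range checks are simple defaults chosen for the example; a real data cleaner would pick rules suited to each measurement type.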
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711265255.0A CN108062363A (en) 2017-12-05 2017-12-05 A kind of data filtering method and system towards active power distribution network

Publications (1)

Publication Number Publication Date
CN108062363A (en) 2018-05-22

Family

ID=62136137

Country Status (1)

Country Link
CN (1) CN108062363A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706883A (en) * 2009-11-09 2010-05-12 北京航空航天大学 Data mining method and device
CN102915423A (en) * 2012-09-11 2013-02-06 中国电力科学研究院 System and method for filtering electric power business data on basis of rough sets and gene expressions
CN104298873A (en) * 2014-10-10 2015-01-21 浙江大学 Attribute reduction method and mental state assessment method on the basis of genetic algorithm and rough set
CN104615789A (en) * 2015-03-06 2015-05-13 苏州大学 Data classifying method and device
CN106405319A (en) * 2015-07-30 2017-02-15 南京理工大学 Rough set electric power system fault diagnosis method based on heuristic information

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109980781A (en) * 2019-03-26 2019-07-05 惠州学院 A kind of transformer substation intelligent monitoring system
CN109980781B (en) * 2019-03-26 2023-03-03 惠州学院 Intelligent monitoring system of transformer substation
CN111352966A (en) * 2020-02-24 2020-06-30 交通运输部水运科学研究所 Data tag calibration method in autonomous navigation
CN111460505A (en) * 2020-04-02 2020-07-28 深圳前海微众银行股份有限公司 Modeling method, device, equipment and storage medium based on privacy protection
CN111934437A (en) * 2020-09-22 2020-11-13 中科全维科技(苏州)有限公司 Active power distribution network big data transmission method based on behavior mark and lightweight encryption
CN116578557A (en) * 2023-03-03 2023-08-11 齐鲁工业大学(山东省科学院) Missing data filling method for data center
CN116578557B (en) * 2023-03-03 2024-04-02 齐鲁工业大学(山东省科学院) Missing data filling method for data center

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2018-05-22)