CN102915423A - System and method for filtering electric power business data on basis of rough sets and gene expressions - Google Patents

Publication number
CN102915423A
Authority
CN
China
Prior art keywords
data
power business
business data
attribute
filtering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210335416XA
Other languages
Chinese (zh)
Other versions
CN102915423B (en)
Inventor
邓松
张涛
林为民
马媛媛
李伟伟
时坚
汪晨
周诚
胡斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Original Assignee
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI
Priority to CN201210335416.XA
Publication of CN102915423A
Application granted
Publication of CN102915423B
Expired - Fee Related
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a novel system and method for filtering electric power business data, which solve the sensitive data filtering problem of an electric power business data protection system. The system mainly comprises a data preprocessor, a data attribute reduction controller and a data filtering controller. A rough set method is used to reduce the electric power business data sets and lower data complexity, and a gene expression method is used to build an electric power business data classification model. Based on this classification model, the sensitivity of the electric power business data is actively identified, and leakage is prevented in cooperation with a policy knowledge base.

Description

Power business data filtering system and method based on rough sets and gene expression
Technical field
The present invention relates to sensitive data filtering in a power business data security protection system. It is mainly used to solve the problem of filtering sensitive data in a power business data protection system and belongs to the field of information security.
Background
Since the Eleventh Five-Year Plan, the State Grid Corporation of China (SGCC) has, through the "SG186" project, built information systems that largely cover its main business areas, and informatization has clearly supported the company's strategic development. For the Twelfth Five-Year Plan, SGCC has centered its informatization on the smart grid and proposed the "SG-ERP" plan to provide information support for it. The goal is to use modern communication and information technology, building on grid digitization and automation, to deepen data acquisition, transmission, storage and utilization in every link of the power system, so that data acquisition becomes digital, production processes automated, business processing interactive, operation and management information-based, and strategic decision-making scientific. This supports smart grid construction and raises the company's overall level of production, operation, management and decision-making. As an informatization project, SG-ERP thus takes business integration as its primary starting point, aims to end the earlier situation in which power automation and power informatization developed relatively independently of each other, and extends informatization into the core business of power production.
As SG-ERP informatization is implemented and deepened, more and more power business application systems (safety production, marketing management, material management, etc.) will widely use mobile intelligent terminals to access the power information intranet for real-time and non-real-time data communication and data exchange. At the same time, with the construction of the strong smart grid, the use of large numbers of intelligent acquisition devices and intelligent terminals, and the wide application of wireless communication technologies such as 3G and Wi-Fi, the ways in which power business application data can be damaged or leaked have multiplied. Protecting power business application data is the basis for the safe and stable operation of the business systems in every link of the smart grid: generation, transmission, transformation, distribution and consumption. In addition, with the construction of SGCC's three major data centers, data from the various business systems are increasingly stored centrally, so the reliable storage and protection of sensitive data become even more important.
The core of power business application-aware data protection is to identify power business application data effectively during storage and transmission and to filter it according to the corresponding policies, thereby preventing leakage. Many data identification and filtering methods exist, for example methods based on policy matching or on BP (back-propagation) neural networks. These methods differ considerably in their effect, mainly in performance, degree of automation, day-to-day management and extensibility. The ultimate purpose of identifying and filtering power business application data is to protect it effectively, through suitable policies, during storage and transmission, and to prevent the leakage of business data involving SGCC secrets. Studying an effective data identification and filtering scheme is therefore of great significance for improving the ability to protect sensitive data, reducing leakage, and ensuring the safe and stable operation of power business systems within a power business application-aware data protection system.
As SGCC's information technology continues to develop, various information technologies have matured and are gradually being applied in power business applications. The data they use reside on different storage nodes, and as data are shared among power business applications, existing security mechanisms cannot guarantee that the data will not be leaked during storage and transmission. To secure power business application data during storage and transmission and prevent the leakage of sensitive data, methods such as full-text encryption, policy matching and artificial intelligence can be used. Full-text encryption secures data during storage and transmission, but for massive power business data it is difficult to guarantee performance and accuracy. Policy matching can address the security of power business data while maintaining performance and accuracy, but different power business systems require different policies, so a fairly complex policy base is needed to prevent leakage of power business application data. Artificial intelligence methods can meet the data protection requirements and can also exploit their intelligence and self-learning ability to improve filtering performance.
The data identification and filtering method mainly considers the following aspects: (1) to facilitate later filtering, the collected power business data are preprocessed, for example by data cleaning and noise removal; (2) the attributes of the preprocessed power business data are reduced with the rough set method, lowering data complexity; (3) for the attribute-reduced power business data, the gene expression method is used to build a power business data classification model; based on this model, the sensitivity of power business data is actively identified during storage and transmission, and leakage is prevented in cooperation with the policy knowledge base.
Summary of the invention
To solve the above problems, the object of the present invention is to provide a new system and method for filtering power business data that solves the sensitive data filtering problem in a power business data protection system. The mechanism adopted is a strategy-based method. With the present invention, leakage of sensitive power business data can be prevented to the greatest possible extent when such data are transmitted between terminals and the network, thereby protecting the safe and stable operation of power business systems.
According to one aspect of the present invention, a power business data filtering system based on rough sets and gene expression is proposed. The system comprises:
a data preprocessor, used to preprocess the power business data to be processed; the preprocessing may include data cleaning, data transformation and the like;
a data attribute reduction controller, used to reduce the power business data set and thereby simplify it;
a data filtering controller, used to filter sensitive power business data intelligently and ensure the security of power business data transmission.
According to one aspect of the present invention, in the power business data security protection system, the data attribute reduction controller uses the rough set method to reduce the power business data set and lower data complexity.
According to one aspect of the present invention, in the power business data security protection system, the data filtering controller uses the gene expression method to build a power business data classification model, actively identifies the sensitivity of power business data based on this model, and prevents leakage in cooperation with the policy knowledge base.
According to one aspect of the present invention, a power business data filtering method based on rough sets and gene expression is proposed. The method is used for power business data security protection and comprises the following steps:
preprocessing, by the data preprocessor, the power business data to be processed; the preprocessing may include data cleaning, data transformation and the like;
reducing, by the data attribute reduction controller, the power business data set and thereby simplifying it;
filtering, by the data filtering controller, sensitive power business data intelligently and ensuring the security of power business data transmission.
According to one aspect of the present invention, in the data filtering method, the data attribute reduction controller uses the rough set method to reduce the power business data set and lower data complexity.
According to one aspect of the present invention, in the data filtering method, the data filtering controller uses the gene expression method to build a power business data classification model, actively identifies the sensitivity of power business data based on this model, and prevents leakage in cooperation with the policy knowledge base.
Description of drawings
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a structural diagram of data filtering according to an embodiment of the invention, mainly comprising a data preprocessor, a data attribute reduction controller, a data filtering controller and a data filtering operation core.
Fig. 2 is a schematic diagram of a reference architecture according to an embodiment of the invention, showing the components the invention comprises.
Fig. 3 is a schematic flow chart of the method according to an embodiment of the invention.
Embodiments
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
The method of the invention is a strategy-based method. Power business data in storage or in transmission are first preprocessed to obtain a power business sample data set suitable for rough set processing. Rough set theory is then used to perform effective attribute reduction on the preprocessed sample data set, which greatly reduces the time complexity of identifying and filtering power business data. Finally, the reduced data set is classified with a gene expression algorithm, which realizes the filtering of the power business data.
Fig. 1 shows the structure of the power business data filtering system based on rough sets and gene expression. It mainly comprises four parts: the data preprocessor, the data attribute reduction controller, the data filtering controller and the data filtering operation core. The data filtering operation core in the figure comprises the concrete operations needed to filter the data once the data have been preprocessed and classified. The invention adds the other three parts so that data filtering proceeds more smoothly and effectively, maximizing the ability to identify and filter data and reducing the risk that sensitive power business data are leaked.
A detailed description follows:
Data preprocessor: to ensure the safe and stable operation of power business systems, what matters most is the security of power business data during storage and transmission, and preventing data leakage is the top priority. Before power business data are identified and protected, the data to be processed must undergo preprocessing such as data cleaning and data transformation. To improve the quality of the filtered power business data and reduce time complexity, the data are first cleaned, for example by filling in missing attribute values and smoothing noisy data; they are then converted, by methods such as standardization, into a form suitable for identification and filtering. This patent places no restriction on the specific implementation of data preprocessing.
Data attribute reduction controller: because of the nature of the power business itself, the preprocessed power business data set has a large number of attributes. Without a suitable attribute reduction method, the complexity of identifying and filtering the data set increases greatly and processing efficiency drops sharply. For the preprocessed data set, the data attribute reduction controller uses the rough set method to reduce the data set without changing its inherent classification ability, deleting redundant attribute information and simplifying the power business data set.
Data filtering controller: when power business data are transmitted between terminals and the network, the system must identify quickly and effectively whether the data contain sensitive information. In a data protection system, relying on keyword matching alone is far from sufficient, because keyword matching cannot effectively detect behavior such as a malicious leaker changing the attributes of the data themselves, and so cannot guarantee the confidentiality of sensitive power business data in transmission. If a malicious leaker changes the file name, file attributes, etc. of the sensitive data to be transmitted, the data protection system cannot guarantee their security. An intelligent, automated data identification and filtering model must therefore be built. This patent uses gene expression to build the identification and filtering model for power business data.
1. Data preprocessor
Whether a power business data set needs preprocessing depends on whether its format meets the requirements of the identification and filtering method. To decide promptly whether the corresponding preprocessing operations are needed, the method sets up a data preprocessing rule base. When the user performs data filtering, the data preprocessing rule base is queried first to determine whether any attribute value in the current data set is missing; if an attribute value is missing, it is filled with the mean of that attribute over the current data set. Next, a clustering method is used to determine whether the current data set contains noisy data; if so, the corresponding noisy records are deleted. Finally, it is determined whether the current data set contains character-type data; if so, it is converted to numeric data by a normalization method, yielding a power business data set that meets the requirements of the data filtering method. Take the original power business data set ODataSet as an example; its data structure is shown in Table 1.
Table 1: Data structure of ODataSet
No. | Voltage (V) | Current (A) | Reference wind speed (m/s) | ... | Wind direction
1 | 220 | 25 | 25 | ... | Southeast
2 | 220 | (missing) | 20 | ... | Northwest
3 | 220 | 25 | 20 | ... | East
4 | 2000 | 25 | 20 | ... | Southeast
... | ... | ... | ... | ... | ...
As Table 1 shows, the current value of record 2 is missing; it is filled with the mean (25 A) of that attribute computed over the original power business data set ODataSet. The clustering method reveals that the abnormal voltage value in record 4 is noise; so as not to affect the performance and accuracy of the final data filtering, this record is deleted. The wind direction attribute of ODataSet is of character type, which does not meet the requirement of the data filtering method proposed in this patent that attribute values be numeric, so the wind direction values are converted to numbers according to their combinations. The data structure of the final data set UDataSet after all processing is shown in Table 2.
Table 2: Data structure of UDataSet
No. | Voltage (V) | Current (A) | Reference wind speed (m/s) | ... | Wind direction
1 | 220 | 25 | 25 | ... | 5
2 | 220 | 25 | 20 | ... | 8
3 | 220 | 25 | 20 | ... | 1
... | ... | ... | ... | ... | ...
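The three preprocessing operations (mean filling, noise removal, conversion of character data to numbers) can be illustrated on the rows of Table 1. The following is a minimal sketch under stated assumptions, not the patent's implementation: the field names and helper functions are hypothetical, the clustering-based noise detection is replaced here by a much simpler median-deviation rule, and the wind direction code table contains only the three codes that appear in Table 2.

```python
from statistics import mean, median

# Only these three codes appear in Table 2; the full code table is not given in the patent.
WIND_CODES = {"East": 1, "Southeast": 5, "Northwest": 8}

def fill_missing(rows, key):
    """Fill missing (None) values of one attribute with the mean of the known values."""
    fill = mean(r[key] for r in rows if r[key] is not None)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def drop_outliers(rows, key, rel_tol=0.5):
    """Assumed stand-in for clustering: drop rows far from the median of one attribute."""
    med = median(r[key] for r in rows)
    return [r for r in rows if abs(r[key] - med) <= rel_tol * abs(med)]

def encode(rows, key, codes):
    """Replace character-type values of one attribute with numeric codes."""
    return [{**r, key: codes[r[key]]} for r in rows]

# Rows of Table 1 (ODataSet); None marks the missing current value of record 2.
odataset = [
    {"voltage": 220, "current": 25, "wind_speed": 25, "wind_dir": "Southeast"},
    {"voltage": 220, "current": None, "wind_speed": 20, "wind_dir": "Northwest"},
    {"voltage": 220, "current": 25, "wind_speed": 20, "wind_dir": "East"},
    {"voltage": 2000, "current": 25, "wind_speed": 20, "wind_dir": "Southeast"},
]

udataset = encode(
    drop_outliers(fill_missing(odataset, "current"), "voltage"),
    "wind_dir",
    WIND_CODES,
)
```

With these assumptions, `udataset` holds the three records of Table 2: record 2 receives the mean current, record 4 is dropped because of its voltage, and the wind directions become numeric codes.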
2. Data attribute reduction controller
If the attributes of the preprocessed power business data set are not themselves reduced before data filtering, filtering performance drops considerably. To improve the efficiency of identifying and filtering sensitive power business data sets, the data attribute reduction controller uses the rough set method to reduce the preprocessed power business data set without changing its inherent classification ability, which greatly lowers the complexity of filtering the data set.
To describe the attribute reduction workflow clearly, first define the sample decision table $T = \langle U, C \cup D, V, f \rangle$, where $U$ is the set of objects in the sample data; $C \cup D = R$ is the attribute set of the sample data; $C = \{c_1, c_2, \ldots, c_n\}$ is the set of condition attributes; $D = \{d_1, d_2, \ldots, d_m\}$ is the set of decision attributes; $V = \bigcup_{r \in R} V_r$ is the set of attribute values, with $V_r$ the value range of attribute $r \in R$; and $f: U \times R \to V$ is an information function that assigns an attribute value to every object $x$ in $U$, that is, $f(x, r) \in V_r$ for all $x \in U$ and $r \in R$.
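For reference, the rough set notions that the workflow relies on (consistency and the positive region) can be written in standard textbook form. The symbols $IND(B)$, $[x]_B$ and $\underline{B}X$ are not defined in the patent text and are introduced here only for illustration.

```latex
\begin{aligned}
IND(B) &= \{(x, y) \in U \times U : f(x, b) = f(y, b)\ \ \forall b \in B\}, \qquad B \subseteq R \\
[x]_B &= \{y \in U : (x, y) \in IND(B)\} \\
\underline{B}X &= \{x \in U : [x]_B \subseteq X\}, \qquad X \subseteq U \\
POS_C(D) &= \bigcup_{X \in U / IND(D)} \underline{C}X \\
T \text{ is consistent} &\iff POS_C(D) = U
\end{aligned}
```

In words, the positive region collects the objects whose condition attribute values determine their decision values unambiguously, and a decision table is consistent when every object has this property.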
The main workflow is as follows:
(1) First determine whether the sample decision table $T$ is consistent. If it is inconsistent, split $T$ into a consistent sample decision table $T'$ and an inconsistent sample decision table $T''$, and add all condition attributes of $T''$ to the final attribute reduction set reductionSet;
(2) Then, for each condition attribute $c$ in the condition attribute set of the consistent table $T'$, determine whether the positive region of the condition attributes of $T'$ with respect to the decision attributes, $POS_C(D)$, equals the positive region with respect to the decision attributes after $c$ is removed from the condition attribute set, $POS_{C-\{c\}}(D)$. If they are equal, $c$ is reducible, and $c$ is added to the final attribute reduction set reductionSet;
(3) Finally, remove the attribute reduction set reductionSet from the condition attribute set $C$ of $T = \langle U, C \cup D, V, f \rangle$ to obtain the reduced sample decision table $T = \langle U, C' \cup D, V, f \rangle$.
For a consistent sample decision table, whether each condition attribute can be reduced is decided by computing the positive region of the condition attributes relative to the decision attributes. This reduces the condition attributes of the sample data while preserving the inherent classification and decision ability of the original sample data.
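The positive-region test can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the patent's implementation: the function names, attribute names and toy decision table are hypothetical; the sketch handles only the consistent part $T'$ and leaves the treatment of $T''$ aside; and it tests attributes one after another against the already-reduced set, a slightly more conservative variant of step (2) that keeps the positive region intact even when two attributes are redundant with each other.

```python
from collections import defaultdict

def positive_region(rows, cond, dec):
    """Indices of rows whose cond-equivalence class has a single decision value."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in cond)].append(i)
    pos = set()
    for members in classes.values():
        if len({tuple(rows[i][d] for d in dec) for i in members}) == 1:
            pos.update(members)
    return pos

def reduce_attributes(rows, cond, dec):
    """Return (consistent rows T', kept attributes C', removed attributes)."""
    # Step (1): keep only the consistent part T' of the decision table.
    consistent = [rows[i] for i in sorted(positive_region(rows, cond, dec))]
    target = positive_region(consistent, cond, dec)
    kept, removed = list(cond), []
    # Step (2), sequential variant: drop c if the positive region is unchanged.
    for c in cond:
        trial = [a for a in kept if a != c]
        if positive_region(consistent, trial, dec) == target:
            kept, removed = trial, removed + [c]
    # Step (3): C' is what remains after removing the reducible attributes.
    return consistent, kept, removed

# Toy decision table (assumed): c1..c3 are condition attributes, d is the decision.
table = [
    {"c1": 0, "c2": 0, "c3": 0, "d": 0},
    {"c1": 0, "c2": 1, "c3": 0, "d": 0},
    {"c1": 1, "c2": 0, "c3": 1, "d": 1},
    {"c1": 1, "c2": 1, "c3": 1, "d": 1},
    {"c1": 1, "c2": 1, "c3": 1, "d": 0},  # conflicts with the previous row
]
consistent, kept, removed = reduce_attributes(table, ["c1", "c2", "c3"], ["d"])
```

In this toy table the last two rows share the same condition values but differ in the decision, so they fall outside the positive region; on the remaining three rows, `c3` alone is enough to determine `d`.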
3. Data filtering controller
To ensure quickly and effectively that sensitive power business data are not leaked while power business data are being transmitted, sensitive power business data must be filtered intelligently during transmission so that the transmission is secure. To obtain an intelligent data filtering method that both secures data transmission and substantially improves the filtering performance for sensitive data, the invention proposes a power business data filtering method based on gene expression.
First, power business training sample data with a filter attribute are constructed according to the expert knowledge base, as shown in Table 3. After preprocessing and attribute reduction, the gene expression method is used to mine from these training samples the functional relationship $F = f(x_1, x_2, \ldots, x_n)$ between the filter attribute $F$ and the corresponding condition attributes $\{x_1, x_2, \ldots, x_n\}$. Each record in the power business test sample data to be filtered is then substituted into $F = f(x_1, x_2, \ldots, x_n)$ to obtain a value $F'$, and the error between the expected value $F'$ and the actual value $F$ is computed. If the error satisfies a preset threshold, the record is judged to be sensitive data, and its transmission is blocked according to the expert knowledge base.
Table 3: Example of power business training sample data with a filter attribute
[Table 3 is an image in the original publication: Figure BDA00002124245900071]
Here a, b, c, e, f, g denote values of the condition attributes $x_1, x_2, \ldots, x_n$. A filter attribute value of $F = 1$ means the record is sensitive data; $F = 0$ means it is ordinary data.
The main workflow of the data filtering controller is as follows:
(1) Construct power business data with a filter attribute and power business data without a filter attribute, and apply preprocessing and attribute reduction to both types of data, forming the training sample data set to be mined and the test sample data set to be filtered, respectively;
(2) Determine the parameters of the gene expression method according to the characteristics of the training sample data set, and initialize the population;
(3) Evaluate the fitness function value of each individual in the population;
(4) Check whether the termination condition is satisfied; if so, go to step (7), otherwise continue;
(5) Perform the genetic operations according to their probabilities;
(6) Generate the new population and go to step (3);
(7) Return the functional relationship $F = f(x_1, x_2, \ldots, x_n)$, substitute the test sample data to be filtered into it, and judge the records whose function value $F$ is close to 1 to be sensitive data.
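Steps (2) to (7) can be sketched as a small gene expression programming loop. This is a minimal sketch under stated assumptions, not the patent's implementation: the function set (+, -, *), the head length, the population size, the mutation rate, tournament selection with elitism and the toy training samples are all assumptions, since the patent does not specify them. Each individual is a fixed-length gene whose head may hold functions or attribute indices and whose tail holds attribute indices only, decoded breadth-first.

```python
import random

# Assumed function set; the patent does not list the operators it uses.
FUNCS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
HEAD = 4          # assumed head length
TAIL = HEAD + 1   # tail length for binary functions: HEAD * (2 - 1) + 1

def evaluate(gene, x):
    """Decode a gene breadth-first and evaluate it on the record x."""
    children, nxt, i = [], 1, 0
    while i < nxt:                      # visit only the symbols the expression uses
        if gene[i] in FUNCS:
            children.append((nxt, nxt + 1))
            nxt += 2
        else:
            children.append(())
        i += 1
    vals = [0] * nxt
    for i in range(nxt - 1, -1, -1):    # children always sit at higher indices
        if gene[i] in FUNCS:
            a, b = children[i]
            vals[i] = FUNCS[gene[i]](vals[a], vals[b])
        else:
            vals[i] = x[gene[i]]        # terminal: index of a condition attribute
    return vals[0]

def error(gene, samples):
    """Sum of absolute errors over the training samples; lower is fitter."""
    return sum(abs(evaluate(gene, x) - f) for x, f in samples)

def random_gene(rng, n_vars):
    symbols = list(FUNCS) + list(range(n_vars))
    head = [rng.choice(symbols) for _ in range(HEAD)]
    return head + [rng.randrange(n_vars) for _ in range(TAIL)]

def mutate(rng, gene, n_vars, rate=0.1):
    """Point mutation that keeps functions out of the tail."""
    symbols = list(FUNCS) + list(range(n_vars))
    return [
        (rng.choice(symbols) if i < HEAD else rng.randrange(n_vars))
        if rng.random() < rate else s
        for i, s in enumerate(gene)
    ]

def crossover(rng, a, b):
    """One-point recombination; head and tail positions stay aligned."""
    p = rng.randrange(1, len(a))
    return a[:p] + b[p:]

def evolve(samples, n_vars, pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [random_gene(rng, n_vars) for _ in range(pop_size)]   # step (2)
    history = []
    for _ in range(generations):
        pop.sort(key=lambda g: error(g, samples))               # step (3)
        history.append(error(pop[0], samples))
        if history[-1] == 0:                                    # step (4)
            break
        new_pop = [pop[0]]                                      # keep the best
        while len(new_pop) < pop_size:                          # steps (5)-(6)
            a = min(rng.sample(pop, 3), key=lambda g: error(g, samples))
            b = min(rng.sample(pop, 3), key=lambda g: error(g, samples))
            new_pop.append(mutate(rng, crossover(rng, a, b), n_vars))
        pop = new_pop
    best = min(pop, key=lambda g: error(g, samples))            # step (7)
    return best, history

# Toy training samples (assumed): two reduced attributes, F = 1 only when both are 1.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
best, history = evolve(samples, n_vars=2)
```

Because the best individual of each generation is carried over unchanged, the recorded error in `history` never increases. A fuller implementation would typically add the transposition operators and multi-gene chromosomes of gene expression programming, which this sketch omits.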
According to one aspect of the present invention, the new data filtering method in a power business data security protection system can be implemented with the following steps:
Step 1: Construct power business data sets A and B, with and without a filter attribute respectively. Through the data preprocessor, the user queries the data preprocessing rule base to decide whether data preprocessing is needed. If preprocessing is needed, go to the next step; otherwise go to Step 3;
Step 2: First determine whether any attribute values in the power business data sets A and B are missing; if so, fill each with the mean of that attribute over A or B, respectively. Next, use a clustering method to determine whether A and B contain noisy data; if so, delete the corresponding noisy records. Finally, determine whether A and B contain character-type data; if so, convert it to numeric data by a normalization method. This yields the power business training sample data set and the power business test sample data set that meet the requirements of the data filtering method;
Step 3: From the preprocessed power business training sample data and test sample data, construct the corresponding sample decision tables $T_{Train}$ and $T_{Test}$, and determine whether each is consistent. If not, split $T_{Train}$ and $T_{Test}$ into the consistent sample decision tables $T'_{Train}$ and $T'_{Test}$ and the inconsistent sample decision tables $T''_{Train}$ and $T''_{Test}$, and add all condition attributes of $T''_{Train}$ and $T''_{Test}$ to the final attribute reduction sets $reductionSet_{Train}$ and $reductionSet_{Test}$, respectively;
Step 4: For each condition attribute $c$ in the condition attribute sets of the consistent tables $T'_{Train}$ and $T'_{Test}$, determine whether the positive region of the condition attributes with respect to the decision attributes equals the positive region obtained after removing $c$ from the condition attribute set of the corresponding table. If they are equal, $c$ is reducible in that table, and $c$ is added to the corresponding attribute reduction set;
Step 5: Remove the attribute reduction sets $reductionSet_{Train}$ and $reductionSet_{Test}$ from the condition attribute sets $C$ of $T'_{Train}$ and $T'_{Test}$, respectively, to obtain the reduced training and test sample data sets $RT'_{Train}$ and $RT'_{Test}$;
Step 6: Determine the parameters of the gene expression method according to the characteristics of the training sample data set, and initialize the population;
Step 7: Run the gene expression method to mine the functional relationship $F = f(x_1, x_2, \ldots, x_n)$ between the filter attribute and the condition attributes of the training sample data;
Step 8: Substitute the test sample data to be filtered into this functional relationship. If the absolute error between the resulting function value $F$ and 1 is less than 0.001, the record is judged to be sensitive data, and according to the filtering rule base its transmission is blocked;
Step 9: The power business data filtering ends.
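The decision rule of Step 8 can be sketched as follows. This is a minimal sketch: the function names and the stand-in relation `f` are hypothetical (in practice `f` would be the relationship mined in Step 7), and only the tolerance 0.001 and the comparison against 1 come from the text above.

```python
def is_sensitive(f, record, tol=0.001):
    """Step 8: a record is sensitive if |f(record) - 1| is below the tolerance."""
    return abs(f(record) - 1.0) < tol

def filter_records(f, records, tol=0.001):
    """Split records into those allowed to be transmitted and those blocked."""
    allowed, blocked = [], []
    for record in records:
        (blocked if is_sensitive(f, record, tol) else allowed).append(record)
    return allowed, blocked

# Hypothetical mined relation over two reduced condition attributes.
def f(x):
    return x[0] * x[1]

allowed, blocked = filter_records(f, [(1, 1), (0, 1), (1, 0)])
```

Here only the record `(1, 1)` yields a function value within 0.001 of 1, so it is the only one placed in `blocked`.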
According to an aspect of the present invention, adopt a kind of power business data filtering method based on rough set and gene expression, be mainly used in solving in terminal and network transmission process, the anti-leak problem of power business sensitive data, the method that proposes in the application of the invention can effectively realize the identification filtration of power business data, stop its leakage in terminal and network transmission process, thereby improve the security of power business sensitive data.
The below provides specific description.
Data attribute yojan controller is at first by after the data pre-processor pre-service power business training and testing data, in the more situation of power business training data conditional attribute, if can not effectively carry out yojan to its conditional attribute, will certainly have influence on the performance of data filtering, thereby strengthen the risk that responsive power business data are revealed.So in data attribute yojan controller, introduced the attribute reduction method based on rough set, at first make up corresponding sample decision table for pretreated power business training and testing data set, then whether training of judgement and test sample book decision table are coordinated respectively, if inharmonious, the sample decision table and the inharmonic sample decision table that then training and testing sample decision table are divided into respectively corresponding coordination, and all conditions attribute in inharmonic sample decision table joined respectively in the set of final attribute reduction; Secondly respectively for each the conditional attribute c in the sample decision table conditional attribute set of coordinating, respectively by judging whether its conditional attribute equals conditional attribute collection in the corresponding sample decision table with respect to the positive territory of decision attribute and remove behind the c positive territory with respect to decision attribute, if equate, but then represent the conditional attribute c yojan in this sample decision table, and this conditional attribute c is joined in the corresponding attribute reduction set.Respectively the sample decision table conditional community set of coordinating is removed at last corresponding attribute reduction set, thereby finally obtain respectively the training and testing sample data collection after the yojan.
The data filtering controller makes power business data filtering more intelligent by using the gene expression method. When power business data are transmitted at terminals and over the network, it must be determined promptly whether they are sensitive data. A data security protection system that relies on keyword matching alone cannot meet practical requirements, because keyword matching cannot effectively detect a malicious leaker altering the data's own attributes: if the leaker changes the file name, file attributes, and so on of the sensitive data to be transmitted, the data protection system cannot guarantee the safety of the data in transit. The data filtering controller therefore introduces the gene expression method. After the constructed power business training and test data sets have been preprocessed and their attributes reduced, gene expression programming is run according to the characteristics of the resulting training sample set to be mined, constructing the functional relation F = f(x_1, x_2, ..., x_n) between the filter function F and the condition attributes (x_1, x_2, ..., x_n) of the training sample data. With this functional relation, the user can readily obtain the filter function value from the attributes of the reduced test data set and use it to judge whether the data are sensitive. Finally, the filtering rule base is applied to protect the sensitive data.
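To make the idea of a mined filter function concrete, the sketch below decodes and evaluates one candidate expression in the breadth-first (Karva) notation commonly used in gene expression programming. It is an illustration under assumptions: the source does not specify the function set, the gene layout, or the fitness measure, so the three arithmetic operators, the gene shown, and the name `evaluate_k_expression` are all hypothetical. Selection, mutation, and the other evolutionary steps are omitted.

```python
import operator

# Assumed function set: symbol -> (arity, implementation). Any other symbol is a terminal.
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}

def evaluate_k_expression(gene, variables):
    """Decode a K-expression level by level and return its value.

    gene      -- list of symbols; the first symbol is the root of the expression tree
    variables -- mapping from terminal symbol (e.g. "x1") to a numeric value
    """
    arities = [FUNCTIONS[s][0] if s in FUNCTIONS else 0 for s in gene]
    children = []      # children[i] holds the gene positions of node i's arguments
    next_free = 1      # position of the next symbol not yet attached to a parent
    i = 0
    while i < next_free:
        if next_free + arities[i] > len(gene):
            raise ValueError("gene is too short to complete the expression")
        children.append(list(range(next_free, next_free + arities[i])))
        next_free += arities[i]
        i += 1
    # Children always sit at higher positions than their parent, so a right-to-left
    # pass guarantees every argument is computed before the node that uses it.
    values = [None] * next_free
    for i in range(next_free - 1, -1, -1):
        symbol = gene[i]
        if symbol in FUNCTIONS:
            values[i] = FUNCTIONS[symbol][1](*(values[j] for j in children[i]))
        else:
            values[i] = variables[symbol]
    return values[0]

# Hypothetical gene: decodes to (x2 * x3) + x1; the two trailing symbols are unused.
GENE = ["+", "*", "x1", "x2", "x3", "x1", "x2"]
F = evaluate_k_expression(GENE, {"x1": 1.0, "x2": 2.0, "x3": 3.0})
```

Symbols beyond the decoded part of the gene are ignored, which is what lets fixed-length genes encode expressions of different sizes.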
In a practical application, a power enterprise holds power business training data X, which carry a filter attribute, and test data Y, which do not. To build a data security protection system, the functional relation between the filter attribute and the condition attributes of X is established from the characteristics of X using the gene expression method. This functional relation is then used to judge whether the power business test data Y, which lack the filter attribute, are sensitive data, and the filtering rule base determines whether Y may be transmitted and exchanged between the network and terminals.
The specific embodiment is as follows:
(1) Determine from the data preprocessing rule base whether training data set X and test data set Y need preprocessing. If X or Y have missing attribute values, contain noisy data, or contain character-type data, a data preprocessing request is issued. The data pre-processor then accepts the request and fills in the missing attribute values, removes the noisy data, and converts the character-type data to numeric data, finally producing a power business training sample data set and a power business test sample data set that meet the requirements of the data filtering method;
(2) From the preprocessed power business training sample data and test sample data, build the corresponding sample decision tables and check whether each is consistent; if a table is inconsistent, decompose it into a consistent sample decision table and an inconsistent sample decision table;
(3) For each condition attribute c in the consistent sample decision table, compute whether the positive region of the condition attribute set with respect to the decision attribute set equals the positive region obtained after removing c. If they are equal, c is reducible in that sample decision table and is added to the corresponding attribute reduction set;
(4) Remove the corresponding attribute reduction set from the condition attribute sets of the training and test sample decision tables, obtaining the reduced training and test sample data sets RT'_Train and RT'_Test;
(5) According to the characteristics of the training sample data set RT'_Train, determine the parameters of the gene expression method and initialize the population;
(6) Run the gene expression method to mine the functional relation between the filter attribute and the condition attributes of the training sample data;
(7) Substitute the test sample data to be filtered into this functional relation. If the absolute error between the resulting function value F and 1 is less than 0.001, the data are judged to be sensitive and, according to the filtering rule base, their transmission is blocked. The data filtering process then ends.
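The decision rule in step (7) can be sketched in Python as below. The threshold 0.001 and the target value 1 come from the text; everything else is an assumption made for illustration, including the function names, the record layout, and `example_filter_function`, which merely stands in for whatever relation step (6) would produce.

```python
def is_sensitive(filter_value, target=1.0, tolerance=0.001):
    """Step (7): data are sensitive when |F - 1| is below the tolerance."""
    return abs(filter_value - target) < tolerance

def apply_filter(records, filter_function):
    """Split records into those allowed to be transmitted and those blocked."""
    allowed, blocked = [], []
    for record in records:
        if is_sensitive(filter_function(record)):
            blocked.append(record)   # sensitive: transmission is not permitted
        else:
            allowed.append(record)
    return allowed, blocked

def example_filter_function(record):
    """Hypothetical stand-in for the mined relation F = f(x_1, ..., x_n)."""
    return record["x1"] * record["x2"]

# Hypothetical reduced test samples with two remaining condition attributes.
TEST_RECORDS = [
    {"id": "a", "x1": 0.5, "x2": 2.0},   # F = 1.0, judged sensitive
    {"id": "b", "x1": 0.5, "x2": 1.0},   # F = 0.5, judged not sensitive
]
allowed, blocked = apply_filter(TEST_RECORDS, example_filter_function)
```

Keeping the tolerance as a parameter makes it easy to see how the blocked set changes if a deployment chose a looser or tighter bound than the value stated in the text.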
Although embodiments of the present invention and their functional modules have been described in specific embodiments, it should be understood that embodiments of the invention can be implemented in hardware, software, firmware, middleware, or a combination thereof, and can be used in a variety of systems, subsystems, components, or sub-components. When implemented in software or firmware, the elements of the invention are the instructions or code segments that perform the necessary tasks. The program or code segments can be stored in a machine-readable medium (for example, a processor-readable medium or a computer program product), or transmitted over a transmission medium or communication link as a computer data signal embodied in a carrier wave or in a signal modulated by a carrier. A machine-readable medium can include any medium that can store or transmit information in a form readable and executable by a machine (for example, a processor or a computer). Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable programmable ROM (EPROM), floppy disks, compact discs (CD-ROM), optical discs, hard disks, fiber-optic media, and radio frequency (RF) links. A computer data signal can include any signal that can propagate over a transmission medium, such as electronic network channels, optical fiber, air, electromagnetic media, RF links, or bar codes. Code segments can be downloaded via networks such as the Internet or an intranet.
Although the present invention has been shown and described in detail with reference to a specific embodiment, those skilled in the art will understand that various changes in form and detail can be made without departing from the spirit and scope of the invention. All such changes fall within the scope of protection of the claims of the present invention.

Claims (6)

1. A power business data filtering system based on rough sets and gene expression, used to ensure the security of power business data, characterized in that the power business data filtering system comprises:
a data pre-processor, for preprocessing the various kinds of power business data to be processed, wherein the preprocessing may include data cleaning and data transformation;
a data attribute reduction controller, for reducing the power business data set and thereby simplifying it;
a data filtering controller, for intelligently filtering sensitive power business data to ensure the security of power business data transmission.
2. The power business data security protection system according to claim 1, characterized in that:
the data attribute reduction controller uses the rough set method to reduce the power business data set, lowering the complexity of the data.
3. The power business data security protection system according to claim 2, characterized in that:
the data filtering controller uses the gene expression method to build a power business data classification model, actively identifies the sensitivity of power business data based on this classification model, and works with the policy knowledge base to prevent leakage.
4. A power business data filtering method based on rough sets and gene expression, characterized in that the method comprises the steps of:
preprocessing, by a data pre-processor, the various kinds of power business data to be processed, wherein the preprocessing may include data cleaning and data transformation;
reducing, by a data attribute reduction controller, the power business data set and thereby simplifying it;
intelligently filtering, by a data filtering controller, sensitive power business data to ensure the security of power business data transmission.
5. The data filtering method according to claim 4, characterized in that:
the data attribute reduction controller uses the rough set method to reduce the power business data set, lowering the complexity of the data.
6. The data filtering method according to claim 5, characterized in that:
the data filtering controller uses the gene expression method to build a power business data classification model, actively identifies the sensitivity of power business data based on this classification model, and works with the policy knowledge base to prevent leakage.
CN201210335416.XA 2012-09-11 2012-09-11 A kind of power business data filtering system based on rough set and gene expression and method Expired - Fee Related CN102915423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210335416.XA CN102915423B (en) 2012-09-11 2012-09-11 A kind of power business data filtering system based on rough set and gene expression and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210335416.XA CN102915423B (en) 2012-09-11 2012-09-11 A kind of power business data filtering system based on rough set and gene expression and method

Publications (2)

Publication Number Publication Date
CN102915423A true CN102915423A (en) 2013-02-06
CN102915423B CN102915423B (en) 2016-01-20

Family

ID=47613786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210335416.XA Expired - Fee Related CN102915423B (en) 2012-09-11 2012-09-11 A kind of power business data filtering system based on rough set and gene expression and method

Country Status (1)

Country Link
CN (1) CN102915423B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103297302A (en) * 2013-05-07 2013-09-11 河北旭辉电气股份有限公司 Digital substation Ethernet data processing device
CN104750813A (en) * 2015-03-30 2015-07-01 浪潮集团有限公司 Data cleaning method based on data reduction model
CN106156046A (en) * 2015-03-27 2016-11-23 中国移动通信集团云南有限公司 A kind of informatization management method, device, system and analytical equipment
CN107679089A (en) * 2017-09-05 2018-02-09 全球能源互联网研究院 A kind of cleaning method for electric power sensing data, device and system
CN108062363A (en) * 2017-12-05 2018-05-22 南京邮电大学 A kind of data filtering method and system towards active power distribution network
CN109978715A (en) * 2017-12-28 2019-07-05 北京南瑞电研华源电力技术有限公司 User side distributed generation resource Data Reduction method and device
CN111222139A (en) * 2020-02-24 2020-06-02 南京邮电大学 GEP optimization-based smart power grid data anomaly effective identification method
CN113449060A (en) * 2021-06-29 2021-09-28 金陵科技学院 Geographic big data security risk assessment method based on mixed gene expression programming

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706883A (en) * 2009-11-09 2010-05-12 北京航空航天大学 Data mining method and device
CN102457893A (en) * 2010-10-26 2012-05-16 中国移动通信集团公司 Data processing method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
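Liu Fasheng (刘发升) et al.: "A new data preprocessing algorithm based on rough sets", Computer Engineering and Applications (《计算机工程与应用》), 1 May 2005 (2005-05-01), pages 177 - 179 *
Wu Weiying (吴为英): "Business data mining based on rough set theory", China Excellent Theses Full-text Database (《中国优秀学位论文全文数据库》), 30 April 2003 (2003-04-30) *
Li Wenbo (李文波) et al.: "Research on sensitive information filtering based on kernel methods", Journal on Communications (《通信学报》), 30 April 2008 (2008-04-30) *
Duan Lei (段磊) et al.: "Design and implementation of an ORF filter operator for gene expression programming", Journal of Sichuan University (《四川大学学报》), 30 November 2007 (2007-11-30) *
Huang Rongwei (黄容伟) et al.: "Data preprocessing based on rough set theory", Journal of Guangxi Teachers Education University (《广西师范学院学报》), vol. 23, no. 4, 31 December 2006 (2006-12-31), pages 87 - 92 *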

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103297302A (en) * 2013-05-07 2013-09-11 河北旭辉电气股份有限公司 Digital substation Ethernet data processing device
CN103297302B (en) * 2013-05-07 2016-08-24 河北旭辉电气股份有限公司 Digital transformer substation Ethernet data processing means
CN106156046A (en) * 2015-03-27 2016-11-23 中国移动通信集团云南有限公司 A kind of informatization management method, device, system and analytical equipment
CN106156046B (en) * 2015-03-27 2021-03-30 中国移动通信集团云南有限公司 Information management method, device and system and analysis equipment
CN104750813A (en) * 2015-03-30 2015-07-01 浪潮集团有限公司 Data cleaning method based on data reduction model
CN107679089A (en) * 2017-09-05 2018-02-09 全球能源互联网研究院 A kind of cleaning method for electric power sensing data, device and system
CN107679089B (en) * 2017-09-05 2021-10-15 全球能源互联网研究院 Cleaning method, device and system for power sensing data
CN108062363A (en) * 2017-12-05 2018-05-22 南京邮电大学 A kind of data filtering method and system towards active power distribution network
CN109978715A (en) * 2017-12-28 2019-07-05 北京南瑞电研华源电力技术有限公司 User side distributed generation resource Data Reduction method and device
CN111222139A (en) * 2020-02-24 2020-06-02 南京邮电大学 GEP optimization-based smart power grid data anomaly effective identification method
CN111222139B (en) * 2020-02-24 2022-06-03 南京邮电大学 GEP optimization-based smart power grid data anomaly effective identification method
CN113449060A (en) * 2021-06-29 2021-09-28 金陵科技学院 Geographic big data security risk assessment method based on mixed gene expression programming

Also Published As

Publication number Publication date
CN102915423B (en) 2016-01-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160120

Termination date: 20160911
