CN102915423A - System and method for filtering electric power business data on basis of rough sets and gene expressions - Google Patents
- Publication number
- CN102915423A (application number CN201210335416XA)
- Authority
- CN
- China
- Prior art keywords
- data
- power business
- business data
- filtering
- attribute
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a new system and method for filtering power business data, addressing the problem of sensitive-data filtering in power business data protection systems. The system consists mainly of a data preprocessor, a data attribute reduction controller, and a data filtering controller. The rough set method is used to reduce the power business data set and lower the complexity of the data. The gene expression method is used to build a classification model of the power business data; based on this model, the sensitivity of power business data is actively identified, and leakage is prevented in conjunction with a policy knowledge base.
Description
Technical Field
The invention relates to sensitive-data filtering in power business data security protection systems. It is mainly used to solve the problem of filtering sensitive data in such systems and belongs to the field of information security.
Background Art
From the Eleventh Five-Year Plan period to the present, State Grid Corporation of China has, through the "SG186" project, built information systems that basically cover its main business areas, and informatization has clearly supported the company's strategic development. For informatization during the Twelfth Five-Year Plan period, State Grid, focusing on the smart grid, proposed the "SG-ERP" plan to provide information support for the smart grid. Its goals are: to use modern communication and information technology, building on grid digitization and automation, to deepen data collection, transmission, storage, and utilization in every link of the power system; to achieve digitized data collection, automated production processes, interactive business processing, informatized operation and management, and scientific strategic decision-making; and to support smart grid construction and comprehensively raise the company's level of production, operation, management, and decision-making. As an informatization project, SG-ERP thus takes business integration as its primary starting point, aims to end the earlier situation in which power automation and power informatization developed relatively independently, and extends informatization into the core business of power production.
As SG-ERP informatization is implemented and advanced, more and more power business application systems (safety production, marketing management, materials management, and so on) will widely use mobile smart terminals to connect to the power information intranet for real-time and non-real-time data communication and exchange. At the same time, with the construction of the strong smart grid, the use of large numbers of smart acquisition and smart terminal devices, and the widespread use of wireless communication technologies such as 3G and Wi-Fi, the paths by which power business application data can be damaged or leaked have increased greatly. Protecting power business application data is the basis for the secure and stable operation of business systems in every link of the smart grid: generation, transmission, transformation, distribution, and consumption. In addition, with the construction of State Grid's three major data centers, business system data is increasingly stored centrally, so the reliable storage and protection of sensitive data is becoming ever more important.
For the protection of sensitive power business application data, the core technique is to identify the data effectively during storage and transmission and to filter it according to the corresponding policies, thereby preventing its leakage. Many data identification and filtering methods exist, for example methods based on policy matching and methods based on BP networks. Different methods differ in effectiveness, mainly in performance, degree of automation, day-to-day management, and scalability. The ultimate purpose of identifying and filtering power business application data is to protect it effectively by means of suitable policies during storage and transmission, and to prevent the leakage of business data involving State Grid's confidential information. Studying an effective data identification and filtering scheme for sensitive-data protection systems is therefore important for improving protection of sensitive data, reducing leakage, and keeping power business systems running securely and stably.
As State Grid's information technology continues to develop, various information technologies have matured and been applied in power business applications. The data they use resides on different storage nodes, and as data is shared among the various power business applications, existing security mechanisms cannot guarantee that it will not be leaked during storage and transmission. To secure this data during storage and transmission and prevent leakage of sensitive data, methods such as full-text encryption, policy matching, and artificial intelligence can be used. Full-text encryption secures data during storage and transmission, but for massive volumes of power business data it is difficult to guarantee performance and accuracy. Policy matching can address the security of power business data while maintaining performance and accuracy, but different power business systems need different policies, so a fairly complex policy library is required to prevent leakage of power business application data. Artificial intelligence methods can meet data security requirements and can also exploit their intelligence and self-learning ability to improve data filtering performance.
The data identification and filtering method is considered from the following aspects: (1) the collected power business data is preprocessed, for example by data cleaning and removal of noisy data, to ease later filtering; (2) the attributes of the preprocessed power business data are reduced with the rough set method to lower the complexity of the data; (3) for the attribute-reduced power business data, a classification model is built with the gene expression method; based on this model, the sensitivity of power business data is actively identified during storage and transmission, and leakage is prevented in conjunction with the policy knowledge base.
Summary of the Invention
To solve the above problems, the object of the present invention is to provide a new system and method for filtering power business data that solves the problem of sensitive-data filtering in power business data protection systems. The mechanism adopted is a policy-based method. With the present invention, leakage of sensitive power business data can be prevented to the greatest possible extent while such data is transmitted between terminals and the network, thereby protecting the secure and stable operation of power business systems.
According to one aspect of the present invention, a power business data filtering system based on rough sets and gene expressions is provided. The power business data filtering system comprises:
a data preprocessor, configured to preprocess the power business data to be processed, where the preprocessing may include data cleaning and data transformation;
a data attribute reduction controller, configured to reduce the power business data set and thereby simplify it;
a data filtering controller, configured to intelligently filter sensitive power business data and ensure the security of power business data transmission.
According to one aspect of the present invention, in the power business data security protection system, the data attribute reduction controller uses the rough set method to reduce the power business data set and lower the complexity of the data.
According to one aspect of the present invention, in the power business data security protection system, the data filtering controller uses the gene expression method to build a classification model of the power business data, actively identifies the sensitivity of power business data based on this model, and prevents leakage in conjunction with the policy knowledge base.
According to one aspect of the present invention, a power business data filtering method based on rough sets and gene expressions is provided. The power business data filtering method is used for power business data security protection and comprises the following steps:
preprocessing the power business data to be processed with the data preprocessor, where the preprocessing may include data cleaning and data transformation;
reducing the power business data set with the data attribute reduction controller, thereby simplifying the data set;
intelligently filtering sensitive power business data with the data filtering controller to ensure the security of power business data transmission.
According to one aspect of the present invention, in the data filtering method, the data attribute reduction controller uses the rough set method to reduce the power business data set and lower the complexity of the data.
According to one aspect of the present invention, in the data filtering method, the data filtering controller uses the gene expression method to build a classification model of the power business data, actively identifies the sensitivity of power business data based on this model, and prevents leakage in conjunction with the policy knowledge base.
Brief Description of the Drawings
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a structural diagram of data filtering according to an embodiment of the present invention, comprising mainly a data preprocessor, a data attribute reduction controller, a data filtering controller, and a data filtering operation core.
Fig. 2 is a schematic diagram of a reference architecture according to an embodiment of the present invention, showing the components that the invention includes.
Fig. 3 is a schematic flowchart of a method according to an embodiment of the present invention.
Detailed Description
To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain the invention and do not limit it.
The method of the present invention is a policy-based method. Power business data in storage or in transit is first preprocessed to obtain a sample data set suitable for rough set processing. Rough set theory is then used to reduce the attributes of the preprocessed sample data set, which greatly lowers the time complexity of identifying and filtering power business data. Finally, the reduced data set is classified with the gene expression algorithm, which achieves the filtering of the various kinds of power business data.
Fig. 1 shows the structure of the power business data filtering system based on rough sets and gene expressions. It comprises four main parts: the data preprocessor, the data attribute reduction controller, the data filtering controller, and the data filtering operation core. The data filtering operation core contains the concrete operations needed to filter data once it has been preprocessed and classified. The present invention adds the other three parts so that filtering proceeds more smoothly and effectively, the ability to identify and filter data is maximized, and the risk of leaking sensitive power business data is reduced.
Each part is described below:
Data preprocessor: To keep power business systems running securely and stably, the most important requirement is the security of power business data during storage and transmission, and preventing data leakage is the top priority. Before power business data is identified and protected, it must be preprocessed, for example by data cleaning and data transformation. To improve the quality of the data to be filtered and lower the time complexity, the data is first cleaned by filling in missing attribute values, smoothing noisy data, and so on; it is then converted, for example by normalization, into a form suitable for identification and filtering. This patent places no restriction on the specific implementation of data preprocessing.
Data attribute reduction controller: Because of the nature of the power business, the preprocessed power business data set has a large number of attributes. Without an attribute reduction method, the complexity of identifying and filtering the data set rises sharply and processing efficiency falls sharply. The data attribute reduction controller applies the rough set method to the preprocessed data set, without changing its inherent classification ability, to remove redundant attribute information and simplify the data set.
Data filtering controller: To identify quickly and effectively whether power business data contains sensitive information while it is transmitted between terminals and the network, keyword matching alone is far from sufficient in a data protection system. Keyword matching cannot detect behavior such as a malicious leaker changing the attributes of the data itself, so it cannot guarantee the confidentiality of sensitive power business data in transit. If a malicious leaker changes the file name, file attributes, or similar properties of the sensitive data to be transmitted, the data protection system can no longer keep it secure. An intelligent, automated data identification and filtering model is therefore required. This patent uses gene expressions to build the identification and filtering model for power business data.
1. Data preprocessor
Whether a power business data set needs preprocessing depends on whether its format meets the requirements of the identification and filtering method. To determine promptly whether a data set needs preprocessing, this method establishes a data preprocessing rule base. When a user performs data filtering, the rule base is first queried to determine whether any attribute value in the current data set is missing; a missing attribute value is filled with the mean of that attribute over the current data set. Next, a clustering method is used to determine whether the current data set contains noisy data; if so, the noisy records are deleted. Finally, the data set is checked for character-type data; if any is present, it is converted to numeric data by a normalization method. The result is a power business data set that meets the requirements of the data filtering method. The original power business data set ODataSet is used here as an example; its data structure is shown in Table 1.
Table 1. Data structure of ODataSet
As Table 1 shows, the current attribute of the second record in the original power business data set ODataSet is missing, so it is filled with the mean of that attribute (25 A). Clustering reveals that the abnormal voltage value in the fourth record is noise; the record is deleted so that it does not affect the performance and accuracy of the final filtering. The wind-direction attribute values in ODataSet are all character type, which does not satisfy the requirement of the proposed filtering method that attribute values be numeric, so they are converted to numeric values according to the combinations of wind-direction values. The data structure of the final processed data set UDataSet is shown in Table 2.
Table 2. Data structure of UDataSet
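The three preprocessing steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record values, the field names (`current`, `voltage`, `wind`), and the helper names are hypothetical, since Tables 1 and 2 are not reproduced in the text. The patent detects noise by clustering; the sketch swaps in a simpler median-absolute-deviation rule as a stand-in, and it encodes text values as consecutive integers, which is only one possible numeric conversion.

```python
from statistics import mean, median

def fill_missing(rows, key):
    """Replace None in column `key` with the mean of the values that are present."""
    present = [r[key] for r in rows if r[key] is not None]
    fill = mean(present)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def drop_noise(rows, key, k=5.0):
    """Drop rows whose `key` is more than k median-absolute-deviations from the median.
    Stand-in for the clustering-based noise detection described in the text."""
    values = [r[key] for r in rows]
    med = median(values)
    mad = median([abs(v - med) for v in values]) or 1.0
    return [r for r in rows if abs(r[key] - med) <= k * mad]

def encode_text(rows, key):
    """Map each distinct text value in column `key` to an integer code (1, 2, ...)."""
    codes = {}
    encoded = []
    for r in rows:
        code = codes.setdefault(r[key], len(codes) + 1)
        encoded.append({**r, key: code})
    return encoded, codes

# Hypothetical stand-in for ODataSet: record 2 lacks a current value,
# record 4 has an abnormal voltage, and wind direction is text.
o_data_set = [
    {"current": 20.0, "voltage": 220.0, "wind": "N"},
    {"current": None, "voltage": 221.0, "wind": "SE"},
    {"current": 30.0, "voltage": 219.0, "wind": "N"},
    {"current": 25.0, "voltage": 9999.0, "wind": "W"},
    {"current": 25.0, "voltage": 220.0, "wind": "SE"},
]

rows = fill_missing(o_data_set, "current")   # step 1: fill missing values
rows = drop_noise(rows, "voltage")           # step 2: remove noisy records
u_data_set, wind_codes = encode_text(rows, "wind")  # step 3: text to numbers
```

With these made-up values the missing current is filled with 25.0, matching the 25 A mean mentioned in the text, and the fourth record is dropped before the wind directions are encoded.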
2. Data attribute reduction controller
If the preprocessed power business data set is filtered without first reducing its attributes, filtering performance drops sharply. To make the identification and filtering of sensitive power business data sets more efficient, the data attribute reduction controller uses the rough set method to reduce the preprocessed data set without changing its inherent classification ability, which greatly lowers the complexity of filtering.
To describe the attribute reduction workflow clearly, let the sample decision table be T = <U, C∪D, V, f>, where U is the set of objects in the sample data; R = C∪D is the attribute set of the sample data; C = {c1, c2, ..., cn} is the set of condition attributes; D = {d1, d2, ..., dm} is the set of decision attributes; V = ∪ Vr (r ∈ R) is the set of attribute values, with Vr denoting the value range of attribute r ∈ R; and f: U×R → V is an information function that assigns an attribute value to every object x in U, that is, f(x, r) ∈ Vr for every x ∈ U and r ∈ R.
The main workflow is as follows:
(1) First determine whether the sample decision table T is consistent. If it is not, split T into a consistent sample decision table T′ and an inconsistent sample decision table T″, and add all condition attributes of T″ to the final attribute reduction set reductionSet.
(2) Then, for each condition attribute c in the condition attribute set of the consistent table T′, determine whether the positive region of the condition attributes with respect to the decision attributes, POS_C(D), equals the positive region after c is removed from the condition attribute set, POS_{C−{c}}(D). If the two are equal, c is reducible and is added to reductionSet.
(3) Finally, remove the attributes in reductionSet from the condition attribute set C of the sample decision table T = <U, C∪D, V, f> to obtain the reduced sample decision table T = <U, C′∪D, V, f>, where C′ is C with the attributes in reductionSet removed.
For a consistent sample decision table, computing the positive region of the condition attributes with respect to the decision attributes shows whether each condition attribute is reducible. The condition attributes of the sample data are thereby reduced without changing the inherent classification and decision ability of the original sample data.
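The positive-region test in step (2) can be sketched as follows for a consistent decision table. The rows, attribute names, and function names are hypothetical. One deliberate deviation, stated as an assumption: instead of testing every attribute against the full set C and removing all that pass, the sketch tests each attribute against the set that remains after earlier removals, so that two mutually redundant attributes are not both dropped.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices by their values on `attrs` (indiscernibility classes)."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def positive_region(rows, cond_attrs, dec_attr):
    """POS_C(D): rows whose indiscernibility class carries a single decision value."""
    pos = set()
    for block in partition(rows, cond_attrs):
        if len({rows[i][dec_attr] for i in block}) == 1:
            pos.update(block)
    return pos

def reduce_attributes(rows, cond_attrs, dec_attr):
    """Drop each condition attribute whose removal leaves the positive region unchanged."""
    full = positive_region(rows, cond_attrs, dec_attr)
    kept = list(cond_attrs)
    for c in cond_attrs:
        trial = [a for a in kept if a != c]
        # Keep at least one attribute; compare against the original positive region.
        if trial and positive_region(rows, trial, dec_attr) == full:
            kept = trial
    return kept

# Hypothetical discretized sample table; F is the decision (filter) attribute.
table = [
    {"voltage": 1, "current": 1, "wind": 1, "F": 1},
    {"voltage": 1, "current": 2, "wind": 2, "F": 0},
    {"voltage": 2, "current": 1, "wind": 1, "F": 0},
    {"voltage": 2, "current": 2, "wind": 1, "F": 1},
]

reduced = reduce_attributes(table, ["voltage", "current", "wind"], "F")
```

In this made-up table, removing `voltage` or `current` makes some records indistinguishable while their decisions differ, so both are kept; removing `wind` changes nothing, so it is reduced away.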
3. Data filtering controller
To ensure quickly and effectively that sensitive power business data is not leaked while power business data is being transmitted, sensitive data must be intelligently filtered during transmission so that the transmission is secure. To obtain an intelligent filtering method that both secures data transmission and maximizes the filtering performance for sensitive data, the present invention proposes a power business data filtering method based on gene expressions.
First, power business training sample data with a filter attribute is constructed from the expert knowledge base, as shown in Table 3. After preprocessing and attribute reduction, the gene expression method is applied to the training sample data to mine the functional relationship F = f(x1, x2, ..., xn) between the filter attribute F and the condition attributes {x1, x2, ..., xn}. Each record of the test sample data to be filtered is then substituted into F = f(x1, x2, ..., xn) to obtain a value F′, and the error between F′ and the actual value F is computed. If the error satisfies a predefined threshold, the record is judged to be sensitive data and its transmission is blocked according to the expert knowledge base.
Table 3. Example of power business training sample data with a filter attribute
Here a, b, c, e, f, and g denote the attribute values of the condition attributes x1, x2, and xn. A filter attribute value F of 1 indicates that the record is sensitive data; a value of 0 indicates that it is ordinary data.
The main workflow of the data filtering controller is as follows:
(1) Construct power business data with and without the filter attribute, then preprocess both types of data and reduce their attributes, forming the training sample data set to be mined and the test sample data set to be filtered;
(2) Determine the parameters of the gene expression method from the characteristics of the training sample data set, and initialize the population;
(3) Evaluate the fitness function value of every individual in the population;
(4) Check whether the termination condition is satisfied; if so, go to step (7), otherwise continue;
(5) Perform the genetic operations according to their probabilities;
(6) Generate a new population and go to step (3);
(7) Return the functional relationship F = f(x1, x2, ..., xn) and evaluate it on the test sample data to be filtered; the record whose function value F is closest to 1 is judged to be sensitive data.
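The loop in steps (2) to (7) can be illustrated with a heavily simplified gene expression programming sketch. Everything specific here is an assumption for illustration: a single gene per individual, a function set of `+`, `-`, `*`, absolute-error fitness, truncation selection with elitism, and only point mutation and one-point crossover. The patent does not specify these choices, and full GEP also uses transposition and multi-gene chromosomes.

```python
import operator
import random

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}  # all binary

def evaluate(gene, row):
    """Evaluate a Karva-notation gene (breadth-first tree layout) on one record."""
    children, nxt, i = [], 1, 0
    while i < nxt:  # walk only the symbols that belong to the expression tree
        arity = 2 if gene[i] in FUNCS else 0
        children.append(tuple(range(nxt, nxt + arity)))
        nxt += arity
        i += 1
    values = [0.0] * len(children)
    for j in reversed(range(len(children))):  # children always have larger indices
        sym = gene[j]
        if sym in FUNCS:
            a, b = children[j]
            values[j] = FUNCS[sym](values[a], values[b])
        else:
            values[j] = row[sym]  # terminal: index of a condition attribute
    return values[0]

def random_gene(head_len, n_vars, rng):
    terminals = list(range(n_vars))
    head = [rng.choice(list(FUNCS) + terminals) for _ in range(head_len)]
    tail = [rng.choice(terminals) for _ in range(head_len + 1)]  # t = h*(2-1)+1
    return head + tail

def fitness(gene, samples):
    """Negative total absolute error; 0 means a perfect fit."""
    return -sum(abs(evaluate(gene, x) - f) for x, f in samples)

def mutate(gene, head_len, n_vars, rng, rate=0.1):
    terminals = list(range(n_vars))
    out = list(gene)
    for i in range(len(out)):
        if rng.random() < rate:
            pool = list(FUNCS) + terminals if i < head_len else terminals
            out[i] = rng.choice(pool)
    return out

def evolve(samples, n_vars, head_len=4, pop_size=40, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [random_gene(head_len, n_vars, rng) for _ in range(pop_size)]  # step (2)
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, samples), reverse=True)        # step (3)
        if fitness(pop[0], samples) == 0:                                # step (4)
            break
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - 1:                              # step (5)
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))
            children.append(mutate(a[:cut] + b[cut:], head_len, n_vars, rng))
        pop = [pop[0]] + children                                        # step (6)
    return max(pop, key=lambda g: fitness(g, samples))                   # step (7)

# Hypothetical training samples: (condition attribute values, filter attribute F).
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
best = evolve(samples, n_vars=2)
```

Because crossover cuts both parents at the same position and mutation draws tail symbols from terminals only, every offspring keeps the head and tail structure and therefore always decodes to a valid expression.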
According to one aspect of the present invention, a new data filtering method for a power business data security protection system can be implemented with the following steps:
Step 1: Construct power business data sets A and B, with and without the filter attribute respectively. Through the data preprocessor, the user queries the data preprocessing rule base to determine whether data preprocessing is required. If it is, go to the next step; otherwise go to step 3;
Step 2: First, check whether any attribute values are missing in the power business data sets A and B currently being processed; if so, fill each missing value with the mean of that attribute in the corresponding data set. Second, use a clustering method to check whether data sets A and B contain noisy data; if so, delete the noisy data. Finally, check whether data sets A and B contain character data; if so, convert it to numeric data by a normalization method. This yields a power business training sample data set and a power business test sample data set that meet the requirements of the data filtering method;
Step 3: From the preprocessed power business training and test sample data, construct the corresponding sample decision tables T_train and T_test, then check whether each table is consistent. If a table is inconsistent, split T_train and T_test into the consistent sample decision tables T′_train and T′_test and the inconsistent sample decision tables T″_train and T″_test, and add all condition attributes of T″_train and T″_test to the final attribute reduction sets reductionSet_train and reductionSet_test, respectively;
Step 4: For each condition attribute c in the condition attribute sets of the consistent sample decision tables T′_train and T′_test, check whether the positive region of the condition attributes with respect to the decision attribute equals the positive region obtained after c is removed from the condition attribute set of the same table. If the two are equal, the condition attribute c is reducible in that table and is added to the corresponding attribute reduction set;
Step 5: Remove the attribute reduction sets reductionSet_train and reductionSet_test from the condition attribute set C of the sample decision tables T′_train and T′_test, respectively, to obtain the reduced training and test sample data sets RT′_train and RT′_test;
Step 6: Determine the parameters of the gene expression method from the characteristics of the training sample data set, and initialize the population;
Step 7: Run the gene expression method to mine the functional relationship F = f(x1, x2, ..., xn) between the filter attribute and the condition attributes of the training sample data;
Step 8: Evaluate this functional relationship on the test sample data to be filtered. If the absolute error between the resulting function value F and 1 is less than 0.001, the record is judged to be sensitive data and, according to the filtering rule base, its transmission is blocked;
Step 9: Power business data filtering ends.
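Two of the preprocessing operations in step 2 can be sketched as below. This is only an illustration under stated assumptions: missing values are represented as `None`, and character data is mapped to integer codes, which is one possible reading of the "normalization method" that the text does not specify. The clustering-based noise removal is omitted, and the attribute names `load` and `dept` are hypothetical.

```python
def fill_missing_with_mean(rows, attr):
    """Replace None in a numeric attribute with the mean of the observed values.
    Assumes at least one value is present."""
    present = [r[attr] for r in rows if r[attr] is not None]
    mean = sum(present) / len(present)
    return [{**r, attr: mean if r[attr] is None else r[attr]} for r in rows]

def encode_categorical(rows, attr):
    """Map each distinct character value to an integer code (assumed scheme)."""
    codes = {v: i for i, v in enumerate(sorted({r[attr] for r in rows}))}
    return [{**r, attr: codes[r[attr]]} for r in rows], codes

# Hypothetical records from data set A.
raw = [
    {"load": 10.0, "dept": "ops"},
    {"load": None, "dept": "billing"},
    {"load": 20.0, "dept": "ops"},
]
filled = fill_missing_with_mean(raw, "load")
numeric, codes = encode_categorical(filled, "dept")
```

Both helpers return new lists rather than modifying the input, so the same raw data can be reused to build the training and test sets separately.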
According to one aspect of the present invention, a power business data filtering method based on rough sets and gene expression is adopted. It mainly addresses the leakage of sensitive power business data during terminal and network transmission. The proposed method identifies and filters power business data effectively and prevents its leakage during terminal and network transmission, which improves the security of sensitive power business data.
A specific description is given below.
In the data attribute reduction controller, the data preprocessor first preprocesses the power business training and test data. The training data may still have many condition attributes, and if these cannot be reduced effectively, the performance of data filtering will inevitably suffer, which increases the risk of leaking sensitive power business data. A rough-set-based attribute reduction method is therefore introduced in the data attribute reduction controller. First, sample decision tables are constructed for the preprocessed training and test data sets, and each table is checked for consistency. An inconsistent table is split into a consistent sample decision table and an inconsistent sample decision table, and all condition attributes of the inconsistent table are added to the final attribute reduction set. Second, for each condition attribute c in the condition attribute set of a consistent sample decision table, the positive region of the condition attributes with respect to the decision attribute is compared with the positive region obtained after c is removed from the condition attribute set. If the two are equal, c is reducible in that table and is added to the corresponding attribute reduction set. Finally, the corresponding attribute reduction set is removed from the condition attribute set of each consistent sample decision table, which yields the reduced training and test sample data sets.
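The consistency check and the split into consistent and inconsistent tables can be sketched as follows. The function name `split_by_consistency` and the dictionary row layout are assumptions for illustration. A table is treated as consistent when records that agree on all condition attributes also agree on the decision attribute, which is the usual rough-set definition.

```python
from collections import defaultdict

def split_by_consistency(rows, cond_attrs, decision):
    """Split a decision table into consistent and inconsistent parts.
    Records sharing all condition values but not the decision value
    go to the inconsistent part."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in cond_attrs)].append(row)
    consistent, inconsistent = [], []
    for members in groups.values():
        same_decision = len({r[decision] for r in members}) == 1
        (consistent if same_decision else inconsistent).extend(members)
    return consistent, inconsistent

# Hypothetical table: the last two records conflict on F.
table = [
    {"x1": 0, "x2": 1, "F": 1},
    {"x1": 1, "x2": 0, "F": 0},
    {"x1": 1, "x2": 1, "F": 0},
    {"x1": 1, "x2": 1, "F": 1},
]
consistent, inconsistent = split_by_consistency(table, ["x1", "x2"], "F")
```

The table as a whole is consistent exactly when the second returned list is empty.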
The data filtering controller makes power business data filtering more intelligent by using the gene expression method. When power business data is transmitted between terminals and over the network, the system must be able to judge promptly whether the data is sensitive. In a data security protection system, keyword matching alone cannot meet practical needs, because it cannot effectively detect a malicious leaker who alters the attributes of the data itself. If such a leaker changes the file name, file attributes, or similar properties of the sensitive data to be transmitted, the data protection system can no longer guarantee security during transmission. The gene expression method is therefore introduced in the data filtering controller. After the constructed power business training and test data sets have been preprocessed and their attributes reduced, the gene expression programming algorithm is run according to the characteristics of the training sample set to be mined, and it constructs the functional relationship F = f(x1, x2, ..., xn) between the filter function F and the condition attributes (x1, x2, ..., xn) of the training sample data. With this relationship, the user can easily compute the filter function value from the attributes of the reduced test data set and use it to judge whether the data is sensitive. Finally, sensitive data is protected with the help of the filtering rule base.
In a practical application, a power company holds power business training data X with the filter attribute and test data Y without it. To build a data security protection system, the gene expression method is used, based on the characteristics of the training data X, to establish the functional relationship between the filter attribute and the condition attributes of X. This relationship is then used to judge whether the test data Y, which has no filter attribute, is sensitive data, and the filtering rule base decides whether Y may be transmitted and exchanged between the network and terminals.
The specific implementation is as follows:
(1) Use the data preprocessing rule base to judge whether the training data set X and the test data set Y need preprocessing. If X or Y has missing attribute values, contains noisy data, or contains character data, a data preprocessing request is issued. After accepting the request, the data preprocessor fills in missing attribute values, removes noisy data, and converts character data to numeric data in X and Y, which yields a power business training sample data set and a power business test sample data set that meet the requirements of the data filtering method;
(2) From the preprocessed power business training and test sample data, construct the corresponding sample decision tables and check whether each is consistent. If a table is inconsistent, decompose it into one consistent sample decision table and one inconsistent sample decision table;
(3) For each condition attribute c in a consistent sample decision table, check whether the positive region of the condition attribute set with respect to the decision attribute set equals the positive region obtained after c is removed from the condition attribute set. If the two are equal, the condition attribute c is reducible in that table and is added to the corresponding attribute reduction set;
(4) Remove the corresponding attribute reduction sets from the condition attribute sets of the training and test sample decision tables to obtain the reduced training and test sample data sets RT′_train and RT′_test;
(5) Determine the parameters of the gene expression method from the characteristics of the training sample data set RT′_train, and initialize the population;
(6) Run the gene expression method to mine the functional relationship between the filter attribute and the condition attributes of the training sample data;
(7) Evaluate this functional relationship on the test sample data to be filtered. If the absolute error between the resulting function value F and 1 is less than 0.001, the record is judged to be sensitive data and, according to the filtering rule base, its transmission is blocked. The data filtering process then ends.
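The decision rule in step (7) can be written as a short Python sketch. The 0.001 tolerance comes from the text. The function name `filter_records`, the tuple record layout, and the stand-in relationship `f` are assumptions for illustration; in practice `f` would be whatever expression the gene expression method mined in step (6).

```python
def filter_records(f, records, tol=0.001):
    """Split records into (allowed, blocked) using |f(record) - 1| < tol."""
    allowed, blocked = [], []
    for rec in records:
        is_sensitive = abs(f(*rec) - 1.0) < tol
        (blocked if is_sensitive else allowed).append(rec)
    return allowed, blocked

# Hypothetical mined relationship F = x1 * x2, used only as a stand-in.
def f(x1, x2):
    return x1 * x2

test_records = [(1, 1), (0, 1), (1, 0.9995), (1, 0.99)]
allowed, blocked = filter_records(f, test_records)
```

Keeping the tolerance as a parameter makes it easy to observe how the number of blocked records changes when the threshold is loosened or tightened.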
Although embodiments of the present invention and their various functional components have been described in specific embodiments, it should be understood that embodiments of the present invention can be implemented in hardware, software, firmware, middleware, or a combination thereof, and can be used in a variety of systems, subsystems, components, or subcomponents thereof. When implemented in software or firmware, the elements of the present invention are the instructions or code segments that perform the necessary tasks. The program or code segments can be stored in a machine-readable medium (for example, a processor-readable medium or a computer program product), or transmitted over a transmission medium or communication link by a computer data signal embodied in a carrier wave or in a signal modulated by a carrier wave. A machine-readable medium may include any medium that can store or transmit information in a form readable and executable by a machine (for example, a processor or a computer). Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable programmable ROM (EPROM), floppy disks, compact discs (CD-ROM), optical discs, hard disks, fiber-optic media, and radio frequency (RF) links. A computer data signal may include any signal that can propagate over a transmission medium such as an electronic network channel, optical fiber, air, an electromagnetic medium, an RF link, or a bar code. The code segments can be downloaded via networks such as the Internet or an intranet.
Although the present invention has been shown and described in detail with reference to a specific embodiment, those skilled in the art should understand that various changes in form and detail may be made without departing from the spirit and scope of the present invention. All such changes fall within the scope of protection of the claims of the present invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210335416.XA CN102915423B (en) | 2012-09-11 | 2012-09-11 | A kind of power business data filtering system based on rough set and gene expression and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210335416.XA CN102915423B (en) | 2012-09-11 | 2012-09-11 | A kind of power business data filtering system based on rough set and gene expression and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102915423A true CN102915423A (en) | 2013-02-06 |
CN102915423B CN102915423B (en) | 2016-01-20 |
Family
ID=47613786
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210335416.XA Expired - Fee Related CN102915423B (en) | 2012-09-11 | 2012-09-11 | A kind of power business data filtering system based on rough set and gene expression and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102915423B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103297302A (en) * | 2013-05-07 | 2013-09-11 | 河北旭辉电气股份有限公司 | Digital substation Ethernet data processing device |
CN104750813A (en) * | 2015-03-30 | 2015-07-01 | 浪潮集团有限公司 | Data cleaning method based on data reduction model |
CN106156046A (en) * | 2015-03-27 | 2016-11-23 | 中国移动通信集团云南有限公司 | A kind of informatization management method, device, system and analytical equipment |
CN107679089A (en) * | 2017-09-05 | 2018-02-09 | 全球能源互联网研究院 | A kind of cleaning method for electric power sensing data, device and system |
CN108062363A (en) * | 2017-12-05 | 2018-05-22 | 南京邮电大学 | A kind of data filtering method and system towards active power distribution network |
CN109978715A (en) * | 2017-12-28 | 2019-07-05 | 北京南瑞电研华源电力技术有限公司 | User side distributed generation resource Data Reduction method and device |
CN111222139A (en) * | 2020-02-24 | 2020-06-02 | 南京邮电大学 | An effective identification method of smart grid data anomalies based on GEP optimization |
CN113449060A (en) * | 2021-06-29 | 2021-09-28 | 金陵科技学院 | Geographic big data security risk assessment method based on mixed gene expression programming |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101706883A (en) * | 2009-11-09 | 2010-05-12 | 北京航空航天大学 | Data mining method and device |
CN102457893A (en) * | 2010-10-26 | 2012-05-16 | 中国移动通信集团公司 | Data processing method and equipment |
Non-Patent Citations (5)
Title |
---|
刘发升等: "一种基于粗糙集的新的数据预处理算法", 《计算机工程与应用》, 1 May 2005 (2005-05-01), pages 177 - 179 * |
吴为英: "基于粗糙集理论的商业数据挖掘", 《中国优秀学位论文全文数据库》, 30 April 2003 (2003-04-30) * |
李文波等: "基于核方法的敏感信息过滤的研究", 《通信学报》, 30 April 2008 (2008-04-30) * |
段磊等: "基因表达式编程ORF过滤算子的设计和实现", 《四川大学学报》, 30 November 2007 (2007-11-30) * |
黄容伟等: "基于粗糙集理论的数据预处理", 《广西师范学院学报》, vol. 23, no. 4, 31 December 2006 (2006-12-31), pages 87 - 92 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103297302A (en) * | 2013-05-07 | 2013-09-11 | 河北旭辉电气股份有限公司 | Digital substation Ethernet data processing device |
CN103297302B (en) * | 2013-05-07 | 2016-08-24 | 河北旭辉电气股份有限公司 | Digital transformer substation Ethernet data processing means |
CN106156046A (en) * | 2015-03-27 | 2016-11-23 | 中国移动通信集团云南有限公司 | A kind of informatization management method, device, system and analytical equipment |
CN106156046B (en) * | 2015-03-27 | 2021-03-30 | 中国移动通信集团云南有限公司 | Information management method, device and system and analysis equipment |
CN104750813A (en) * | 2015-03-30 | 2015-07-01 | 浪潮集团有限公司 | Data cleaning method based on data reduction model |
CN107679089A (en) * | 2017-09-05 | 2018-02-09 | 全球能源互联网研究院 | A kind of cleaning method for electric power sensing data, device and system |
CN107679089B (en) * | 2017-09-05 | 2021-10-15 | 全球能源互联网研究院 | A cleaning method, device and system for power sensing data |
CN108062363A (en) * | 2017-12-05 | 2018-05-22 | 南京邮电大学 | A kind of data filtering method and system towards active power distribution network |
CN109978715A (en) * | 2017-12-28 | 2019-07-05 | 北京南瑞电研华源电力技术有限公司 | User side distributed generation resource Data Reduction method and device |
CN111222139A (en) * | 2020-02-24 | 2020-06-02 | 南京邮电大学 | An effective identification method of smart grid data anomalies based on GEP optimization |
CN111222139B (en) * | 2020-02-24 | 2022-06-03 | 南京邮电大学 | GEP optimization-based smart power grid data anomaly effective identification method |
CN113449060A (en) * | 2021-06-29 | 2021-09-28 | 金陵科技学院 | Geographic big data security risk assessment method based on mixed gene expression programming |
Also Published As
Publication number | Publication date |
---|---|
CN102915423B (en) | 2016-01-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20160120 Termination date: 20160911 |