CN110766272A - Power business collaborative classification method and system based on ID3 decision tree algorithm - Google Patents


Info

Publication number
CN110766272A
Authority
CN
China
Legal status
Pending
Application number
CN201910860591.2A
Other languages
Chinese (zh)
Inventor
司为国
朱炯
张博
张玉鹏
赵开
郭小茜
张�浩
俞成彪
严志毅
闫宇铎
曹杰人
金仁云
宋惠忠
李骏
柳志军
唐鸣
张益军
施萌
张俊
侯伟宏
钟晓红
何可人
高瑾
吴颖
陈晨
厉律阳
徐国锋
章晨璐
朱小炜
孙远
向新宇
华玫
沈志强
朱坚
孙建军
仲从杰
毛无穷
刘磊
Current Assignee
Hangzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Zhejiang Zhongxin Electric Power Engineering Construction Co Ltd
Original Assignee
Hangzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Application filed by Hangzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN201910860591.2A
Publication of CN110766272A

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q10/00 Administration; Management
            • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
              • G06Q10/063 Operations research, analysis or management
                • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
                  • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
                • G06Q10/0635 Risk analysis of enterprise or organisation activities
          • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
            • G06Q50/06 Energy or water supply
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The invention discloses a power business collaborative classification method and system based on the ID3 decision tree algorithm. The method comprises: acquiring a power business collaboration database and extracting a sample set S from it; extracting an index set A containing the indexes used to evaluate business collaboration data; calculating, based on the ID3 algorithm, the information entropy and information gain of each index on the sample set S in order to select suitable root and intermediate nodes; constructing a decision tree from the selected root node; and evaluating and selecting business cooperation schemes based on the decision tree. A corresponding system is also disclosed. Because the method relies on information entropy and information gain, it requires relatively little computation and achieves high classification accuracy. Applied to the calculation and analysis of collaborative data for services such as power outsourcing, it builds the decision tree by selecting the best splitting feature at each node and then classifies the data quickly and effectively, enabling collaborative management of power outsourcing and similar services.

Description

Power business collaborative classification method and system based on ID3 decision tree algorithm
Technical Field
The invention relates to the technical field of data processing for power systems, and in particular to a power business collaborative classification method and system based on the ID3 decision tree algorithm.
Background
To meet the construction requirements of the ubiquitous power Internet of Things, China continues to accelerate the pace of development and promote reform of the power system, and Chinese power supply enterprises are introducing advanced data technologies such as big data and artificial intelligence to safeguard the strategic goal of building world-class energy Internet enterprises with global competitiveness.
To achieve intensive management of human resources, finance and materials and meet the requirements of the new situation, power supply enterprises sometimes need to outsource part of their business in order to mobilize social resources. However, these business processes carry risks at the project establishment, tendering, contract, payment and closing stages.
Power supply enterprises therefore need to collect the composition, thresholds, analysis rules, typical cases and data of outsourcing risk factors; study the current risk situation of their outsourced business; standardize outsourcing management; and establish a collaborative supervision system for outsourced business that covers business risk across all domains. This improves production and management efficiency and helps ensure a safe, reliable power supply and high-quality service.
In a collaborative outsourcing system, departments such as marketing, production management, bidding and finance, together with their data, processes and personnel, are linked by complex temporal, process and organizational associations. The result is a complex network with scale-free characteristics: most nodes connect to only a few other nodes, while a few nodes connect to very many. The raw sample data is therefore cluttered. Without effective classification and evaluation, an optimized power collaboration scheme cannot be obtained, and working efficiency and benefits are likely to suffer.
Disclosure of Invention
The invention provides a power business collaborative classification method and system based on an ID3 decision tree algorithm to solve the technical problems.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
according to a first aspect of the embodiments of the present invention, there is provided a method for classifying outsourced items of an electric power system based on an ID3 decision tree algorithm, including the following steps:
step 101, acquiring a power business cooperation related database, and extracting a sample set S from the power business cooperation related database;
step 102, extracting an index set A, wherein the index set A contains the indexes used to evaluate business collaboration data;
step 103, calculating the information entropy and information gain of each index on the sample set S based on the ID3 algorithm, in order to select a suitable root node and suitable intermediate nodes;
step 104, constructing a decision tree from the selected root node;
step 105, evaluating each business cooperation scheme based on the decision tree, and selecting schemes as required.
Preferably, step 103 specifically includes:
step 1031, calculating the information entropy and information gain of each index on the sample set S based on the ID3 algorithm;
step 1032, testing the information entropy and information gain obtained in step 1031 on data other than the training data set S;
step 1033, selecting a suitable root node and suitable intermediate nodes after comparison.
Preferably, the process of calculating the information entropy and the information gain of each index for the sample set S based on the ID3 algorithm is as follows:
selecting an index C in the index set A, wherein the index C has m possible values $C = \{C_1, C_2, \ldots, C_m\}$ and the value $C_i$ occurs in the training set S with frequency $p_i$, where $1 \le i \le m$ and m and i are integers; the information entropy of the training set S is:
$$\mathrm{Entropy}(S) = -\sum_{i=1}^{m} p_i \log_2 p_i \qquad (1)$$
another index B is selected as the root node and used to divide the sample set S into sample subsets $S_j$ ($j = 1, 2, \ldots, k$); the information gain of dividing S by the index B is:
$$\mathrm{Gain}(S, B) = \mathrm{Entropy}(S) - \mathrm{Entropy}_B(S) \qquad (2)$$
the information entropy of the sample subsets after dividing S by the index B is:
$$\mathrm{Entropy}_B(S) = \sum_{j=1}^{k} \frac{|S_j|}{|S|}\,\mathrm{Entropy}(S_j) \qquad (3)$$
where $|S_j|$ is the number of samples in the sample subset $S_j$, $|S|$ is the number of samples in the sample set S, $1 \le j \le k$, and k and j are integers.
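Equations (1) to (3) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes each sample is a Python dict mapping index names to values, with the class stored under a key passed as `target`; the function names `entropy`, `split_entropy` and `information_gain` and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Equation (1): Entropy(S) = -sum_i p_i * log2(p_i).

    Written as p * log2(1/p), which is the same quantity. Values that never
    occur are absent from the Counter, so p_i = 0 contributes nothing.
    """
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def split_entropy(samples, index, target):
    """Equation (3): Entropy_B(S) = sum_j |S_j|/|S| * Entropy(S_j)."""
    subsets = {}
    for s in samples:
        # Group the class labels by the value each sample takes on `index`.
        subsets.setdefault(s[index], []).append(s[target])
    n = len(samples)
    return sum(len(sub) / n * entropy(sub) for sub in subsets.values())


def information_gain(samples, index, target):
    """Equation (2): Gain(S, B) = Entropy(S) - Entropy_B(S)."""
    return entropy([s[target] for s in samples]) - split_entropy(samples, index, target)


# Hypothetical toy data: "degree" is the class, "marketing" plays index B.
S = [
    {"marketing": "high sales", "degree": "good"},
    {"marketing": "high sales", "degree": "good"},
    {"marketing": "small market share", "degree": "poor"},
    {"marketing": "small market share", "degree": "poor"},
]
print(information_gain(S, "marketing", "degree"))  # 1.0
```

Here index B separates the two classes perfectly, so the entropy after the split is 0 and the gain equals the full entropy of S.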
Preferably, the following steps are further included between step 101 and step 102:
step 111, judging whether all sample data in the sample set S are of the same type, if so, turning to step 112, otherwise, executing step 102;
step 112, selecting the class to which all the sample data belong as a root node, and going to step 104.
Preferably, the following steps are further included between step 102 and step 103:
step 121, determining whether the sample set S and the index set A are empty; if yes, going to step 123, otherwise executing step 103;
step 123, selecting the class with the highest proportion in the sample set S as the root node, and jumping to step 104.
Preferably, the following steps are further included between step 121 and step 103:
step 122, determining whether each index in the index set A takes only a single value; if yes, going to step 123, otherwise executing step 103.
Preferably, the following step is further included between step 1032 and step 1033:
step 10321, determining whether there is a misclassification; if yes, returning to step 1031, otherwise going to step 1033.
Preferably, between the step 103 and the step 104, the following steps may be further included:
step 131, determining whether all indexes have been traversed, if not, going to step 132, if yes, executing step 104;
step 132, eliminating the traversed indexes, generating a sample subset S without the traversed indexes, and jumping to step 101.
According to a second aspect of the embodiments of the present invention, there is provided a power system business collaborative classification system based on the ID3 decision tree algorithm, comprising:
a sample set extraction module, configured to acquire a power business collaboration database and extract a sample set S from it;
an index set extraction module, configured to extract an index set A containing the indexes used to evaluate business collaboration data;
an information entropy and information gain calculation module, configured to calculate the information entropy and information gain of each index on the sample set S based on the ID3 algorithm, in order to select a suitable root node and suitable intermediate nodes;
a decision tree construction module, configured to construct a decision tree from the selected root node;
and a scheme selection module, configured to evaluate each business cooperation scheme based on the decision tree and select schemes as required.
Preferably, the system further comprises a user interaction module, used to visualize the data after the decision tree has been constructed and the data classified, and to configure interfaces and application programs.
Compared with the prior art, the method relies on information entropy and information gain, so it requires relatively little computation and achieves high classification accuracy. Applied to the calculation and analysis of collaborative data for services such as power outsourcing, it builds the decision tree by selecting the best splitting feature at each node and then classifies the data quickly and effectively, enabling collaborative management of power outsourcing and similar services.
Drawings
FIG. 1 is a flowchart of the power business collaborative classification method based on the ID3 decision tree algorithm according to the present invention;
FIG. 2 is another flowchart of the power business collaborative classification method based on the ID3 decision tree algorithm according to the present invention;
FIG. 3 is a flowchart of step 103 of the power business collaborative classification method based on the ID3 decision tree algorithm according to the present invention;
FIG. 4 is a structural block diagram of the power business collaborative classification system based on the ID3 decision tree algorithm according to the present invention.
In the figures: 201, sample set extraction module; 202, index set extraction module; 203, information entropy and information gain calculation module; 204, decision tree construction module; 205, scheme selection module; 206, user interaction module.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments shown in the drawings. These embodiments are not intended to limit the present invention, and structural, methodological, or functional changes made by those skilled in the art according to these embodiments are included in the scope of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
As shown in fig. 1, a power service collaborative classification method based on an ID3 decision tree algorithm includes the following specific steps.
Step 101, obtaining a power business cooperation related database, and extracting a sample set S from the power business cooperation related database.
The sample set S can be selected at random from the power business collaboration database as the training data set, so that the data is not biased toward special cases and the set is not so large that the algorithm converges slowly. For example, in the risk policy system, the data in the sample set S may fall into categories such as marketing, production management (PMS), bidding and finance, as determined by the values of all samples. If the samples in S take multiple values under a certain category, that category can be divided into multiple types.
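Random selection of the training set can be sketched as below; the held-out remainder is the kind of data that step 1032 later tests against. The function name, the `seed` parameter, the record layout and the choice of 15 of 20 records are illustrative assumptions; the patent does not specify a sampling procedure or a training-set size.

```python
import random


def draw_training_set(records, train_size, seed=None):
    """Randomly draw a training set S; return (S, remaining records)."""
    if not 0 <= train_size <= len(records):
        raise ValueError("train_size must be between 0 and len(records)")
    rng = random.Random(seed)      # seeded generator for reproducible draws
    shuffled = list(records)       # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


records = [{"id": i} for i in range(20)]   # hypothetical database rows
S, held_out = draw_training_set(records, 15, seed=42)
print(len(S), len(held_out))  # 15 5
```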
Step 102, extracting an index set A, wherein the index set A contains indexes used for evaluating business collaboration data.
Here the index set $A = \{A_1, A_2, \ldots, A_n\}$, where n is an integer. Indexes such as marketing, production management, bidding and finance can be preset, and each index may take m values in the sample set S, where m is a non-negative integer. For example, the marketing index may take values such as high sales, low sales, high advertisement popularity, low advertisement popularity, high market share, low market share, good customer satisfaction and poor customer satisfaction; the production management index may take values such as good or poor production quality; the bidding index may take values such as high bid amount; and the finance index may take values such as good or poor financial status. Finally, the data can be classified and summarized to determine whether the degree of power business collaboration is good or poor.
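One way to hold the index set A and observe the m values each index takes in S is sketched below. The dict-per-sample layout, the English index names, the sample rows and the `index_values` helper are hypothetical choices made for illustration.

```python
def index_values(samples, indexes):
    """Map each index A_i to the sorted list of distinct values it takes in S."""
    return {a: sorted({s[a] for s in samples}) for a in indexes}


A = ["marketing", "finance"]  # hypothetical subset of the index set A
S = [
    {"marketing": "high sales", "finance": "good", "degree": "good"},
    {"marketing": "low sales", "finance": "poor", "degree": "poor"},
    {"marketing": "high sales", "finance": "poor", "degree": "good"},
]
values = index_values(S, A)
print({a: len(v) for a, v in values.items()})  # {'marketing': 2, 'finance': 2}
```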
Step 103, calculating the information entropy and information gain of each index on the sample set S based on the ID3 algorithm, in order to select a suitable root node and suitable intermediate nodes.
For each index, the information entropy measures how disordered its distribution is; the information gain of the other indexes is then calculated from this entropy, and the index with the largest information gain is chosen to split the child nodes. Because the ID3 algorithm selects indexes by information gain, it requires relatively little computation, is accurate and simple to implement, and allows pruning to be applied during tree construction. Choosing the index attribute with the largest information gain as the splitting attribute yields the best splitting index, which is used to generate important nodes such as the root node.
Step 104, constructing a decision tree from the selected root node. Once the root node and its branches (child nodes and so on) have been obtained, the complete decision tree can be constructed and displayed in real time through interactive operation.
Step 105, evaluating each business cooperation scheme based on the decision tree, and selecting schemes as required.
Based on the decision tree constructed above, the most suitable business cooperation scheme can be selected by evaluating the data of related outsourced cooperation business against multiple indexes such as marketing, the production management system (PMS), bidding and finance. A risk assessment system is established through intelligent recognition of the underlying data (such as graph, pattern and speech recognition) and further deep learning techniques (such as quantitative analysis, machine learning, knowledge graphs and big-data topology). Risk factors are then analyzed, data is classified effectively, decision support is provided, and outsourcing business risks are effectively avoided.
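The selection rule of step 103, choosing the index with the largest information gain, can be sketched as follows. It repeats a compact version of equations (1) to (3) so that it runs on its own; the sample layout, data and function names are hypothetical, and ties are broken by the order of `indexes`, an arbitrary choice the patent does not address.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def gain(samples, index, target):
    groups = {}
    for s in samples:
        groups.setdefault(s[index], []).append(s[target])
    after = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    return entropy([s[target] for s in samples]) - after


def best_index(samples, indexes, target):
    """Return the index with the largest information gain (ID3 split rule)."""
    return max(indexes, key=lambda a: gain(samples, a, target))


S = [
    {"marketing": "high sales", "finance": "good", "degree": "good"},
    {"marketing": "high sales", "finance": "poor", "degree": "poor"},
    {"marketing": "small market share", "finance": "good", "degree": "good"},
    {"marketing": "small market share", "finance": "poor", "degree": "poor"},
]
print(best_index(S, ["marketing", "finance"], "degree"))  # finance
```

Here "marketing" leaves both classes equally mixed in each subset (gain 0), while "finance" separates them completely (gain 1).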
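Evaluating a business cooperation scheme with a finished tree amounts to walking from the root to a leaf. The sketch below assumes a hypothetical tree encoding (internal nodes as dicts with `index` and `branches`, leaves as class labels) and a hypothetical fallback label for index values the tree never saw; the tree and scheme names are invented for illustration.

```python
def classify(tree, sample, default="unknown"):
    """Follow the branch matching the sample's value at each node until a leaf."""
    while isinstance(tree, dict):
        value = sample.get(tree["index"])
        if value not in tree["branches"]:
            return default          # value not seen during training
        tree = tree["branches"][value]
    return tree


tree = {
    "index": "finance",
    "branches": {
        "good": "good",
        "poor": {"index": "marketing",
                 "branches": {"high sales": "good", "small market share": "poor"}},
    },
}
schemes = {
    "scheme 1": {"finance": "good", "marketing": "small market share"},
    "scheme 2": {"finance": "poor", "marketing": "small market share"},
    "scheme 3": {"finance": "poor", "marketing": "high sales"},
}
selected = [name for name, s in schemes.items() if classify(tree, s) == "good"]
print(selected)  # ['scheme 1', 'scheme 3']
```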
As shown in fig. 2, the following steps may also be included between step 101 and step 102:
step 111, judging whether all sample data in the sample set S belong to the same class; if so, turning to step 112, otherwise executing step 102;
step 112, selecting the class to which all the sample data belong as a root node, and going to step 104.
When all sample data in the sample set S belong to a single class, that class can be set directly as the root node without computing information entropy or information gain.
The following steps can be further included between step 102 and step 103:
step 121, determining whether the sample set S and the index set A are empty; if yes, going to step 123, otherwise executing step 122;
step 122, judging whether each index in the index set A takes only a single value; if so, turning to step 123, otherwise executing step 103;
step 123, selecting the class with the highest proportion in the sample set S as the root node, and jumping to step 104.
When the sample set S and the index set A are empty, the class to which the sample data belong can be set directly as the root node without computing information entropy or information gain, for example an "unaffected data" class. When each index in the index set A takes only a single value, the resulting decision tree has no branches, and any one index can be used for the root node and the child nodes. Steps 121 and 122 may be used separately or performed in sequence as described above; if they are used separately, a "no" result leads directly to step 103 and a "yes" result leads to step 123.
Thus, the root node used in step 104 may be determined by step 103, or by step 111, step 121 or step 122; the intermediate nodes and leaf nodes are then determined on this basis to construct the decision tree.
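The early-exit checks of steps 111, 121, 122 and 123 can be sketched as one function that returns a leaf class when recursion should stop and `None` otherwise. Assumptions: step 121 is read as "S is empty or A is empty"; for an empty S, where no majority class exists, the function returns a caller-supplied `default` (the text's example is an "unaffected data" class); the function name and data are hypothetical.

```python
from collections import Counter


def leaf_label(samples, indexes, target, default=None):
    """Return the class for a leaf if a stopping rule applies, else None."""
    if not samples:                                  # step 121: S is empty
        return default
    counts = Counter(s[target] for s in samples)
    if len(counts) == 1:                             # step 111: one class only
        return next(iter(counts))
    majority = counts.most_common(1)[0][0]           # step 123: largest class
    if not indexes:                                  # step 121: A is empty
        return majority
    if all(len({s[a] for s in samples}) == 1 for a in indexes):
        return majority                              # step 122: no index can split S
    return None                                      # continue with step 103


mixed = [
    {"finance": "good", "degree": "good"},
    {"finance": "good", "degree": "good"},
    {"finance": "good", "degree": "poor"},
]
print(leaf_label(mixed, ["finance"], "degree"))  # good (step 122: finance never varies)
```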
Between the step 103 and the step 104, the following steps may be further included:
step 131, determining whether all indexes have been traversed, if not, going to step 132, if yes, executing step 104;
step 132, eliminating the traversed indexes, generating a sample subset S without the traversed indexes, and jumping to step 101.
This loop repeats until every index has been traversed once and the index set A is empty.
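Putting steps 103, 131 and 132 together gives the usual recursive form of ID3, sketched below: pick the index with the largest gain, remove it from the remaining indexes, and repeat on each subset until a stopping rule applies. This is a generic textbook-style implementation written for illustration under the same hypothetical dict-based sample and tree layout, not code from the patent, and it performs no pruning.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def gain(samples, index, target):
    groups = {}
    for s in samples:
        groups.setdefault(s[index], []).append(s[target])
    after = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    return entropy([s[target] for s in samples]) - after


def build_id3(samples, indexes, target):
    """Build a tree of nested dicts; leaves are class labels."""
    labels = [s[target] for s in samples]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping rules: a single class (step 111), no indexes left (step 121),
    # or no index that still varies within the samples (step 122).
    if (len(set(labels)) == 1 or not indexes
            or all(len({s[a] for s in samples}) == 1 for a in indexes)):
        return majority
    best = max(indexes, key=lambda a: gain(samples, a, target))   # step 103
    remaining = [a for a in indexes if a != best]                 # step 132
    branches = {}
    for value in sorted({s[best] for s in samples}):
        subset = [s for s in samples if s[best] == value]         # never empty
        branches[value] = build_id3(subset, remaining, target)
    return {"index": best, "branches": branches}


rows = [  # hypothetical (marketing, finance, degree) records
    ("high sales", "good", "good"), ("small market share", "good", "good"),
    ("high sales", "poor", "good"), ("small market share", "poor", "poor"),
    ("small market share", "poor", "poor"), ("high sales", "good", "good"),
    ("small market share", "good", "good"),
]
S = [{"marketing": m, "finance": f, "degree": d} for m, f, d in rows]
tree = build_id3(S, ["marketing", "finance"], "degree")
print(tree["index"])  # finance
```

"Finance" is chosen at the root because its "good" branch is already pure; the "poor" branch is then split by "marketing", the only index left.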
Step 103 is described in further detail below. As shown in FIG. 3, it specifically includes:
step 1031, calculating the information entropy and information gain of each index on the sample set S based on the ID3 algorithm;
step 1032, testing the information entropy and information gain obtained in step 1031 on data other than the training data set S;
step 1033, selecting a suitable root node and suitable intermediate nodes after comparison.
In step 1031, the process of calculating the information entropy and the information gain of each index for the sample set S based on the ID3 algorithm is as follows:
selecting an index C in the index set A, wherein the index C has m possible values $C = \{C_1, C_2, \ldots, C_m\}$ and the value $C_i$ occurs in the training set S with frequency $p_i$, where $1 \le i \le m$ and m and i are integers; the information entropy of the training set S is:
$$\mathrm{Entropy}(S) = -\sum_{i=1}^{m} p_i \log_2 p_i \qquad (1)$$
another index B is selected as the root node and used to divide the sample set S into sample subsets $S_j$ ($j = 1, 2, \ldots, k$); the information gain of dividing S by the index B is:
$$\mathrm{Gain}(S, B) = \mathrm{Entropy}(S) - \mathrm{Entropy}_B(S) \qquad (2)$$
the information entropy of the sample subsets after dividing S by the index B is:
$$\mathrm{Entropy}_B(S) = \sum_{j=1}^{k} \frac{|S_j|}{|S|}\,\mathrm{Entropy}(S_j) \qquad (3)$$
where $|S_j|$ is the number of samples in the sample subset $S_j$, $|S|$ is the number of samples in the sample set S, $1 \le j \le k$, and k and j are integers.
The information gain Gain(S, B) of dividing S by the index B is thus obtained by subtracting the weighted entropy of the sample subsets $S_j$ produced by B from the entropy of the sample set S.
For example, suppose the power business collaboration degree is chosen as the final classification, and the sample set S contains 15 samples, of which 8 have a good collaboration degree and 7 a poor one. Taking $p_i \log_2 p_i = 0$ whenever $p_i = 0$, the information entropy of S is:
$$\mathrm{Entropy}(S) = -\frac{8}{15}\log_2\frac{8}{15} - \frac{7}{15}\log_2\frac{7}{15} \approx 0.997$$
Let the index B in the index set A be marketing, with the values {high sales, high advertisement popularity, small market share}. Dividing the sample set S by this index yields 3 sample subsets: $S_1$ (marketing = high sales), $S_2$ (marketing = high advertisement popularity) and $S_3$ (marketing = small market share).
Suppose $S_1$ contains 6 samples, of which the proportion with good power business collaboration is 4/6 and the proportion with poor collaboration is 2/6; $S_2$ contains 4 samples, with proportions 3/4 good and 1/4 poor; and $S_3$ contains 5 samples, with proportions 1/5 good and 4/5 poor.
The information entropy of the three branches is:
$$\mathrm{Entropy}(S_1) = -\frac{4}{6}\log_2\frac{4}{6} - \frac{2}{6}\log_2\frac{2}{6} \approx 0.918$$
$$\mathrm{Entropy}(S_2) = -\frac{3}{4}\log_2\frac{3}{4} - \frac{1}{4}\log_2\frac{1}{4} \approx 0.811$$
$$\mathrm{Entropy}(S_3) = -\frac{1}{5}\log_2\frac{1}{5} - \frac{4}{5}\log_2\frac{4}{5} \approx 0.722$$
The information entropy of the sample subsets after dividing S by the index B is:
$$\mathrm{Entropy}_B(S) = \frac{6}{15}(0.918) + \frac{4}{15}(0.811) + \frac{5}{15}(0.722) \approx 0.824$$
The information gain of dividing S by the index B is:
Gain(S,B)=Entropy(S)-Entropy_B(S)=0.997-0.824=0.173
Information gain is the difference between the impurity (entropy) of the sample set before the split and the impurity (entropy) of the sample subsets after the split. The larger the information gain, the purer the subsets produced by index B, and the more useful B is for classification. The information entropy and information gain of the other indexes are obtained in the same way.
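The arithmetic of the marketing example can be checked with a few lines of Python. The subset class counts are those of the example (good, poor); the helper works on counts rather than samples, which is a simplification for this check only. Computed without intermediate rounding, the gain comes out at about 0.1725, which rounds to 0.172; the 0.173 in the text results from subtracting the already-rounded 0.997 and 0.824.

```python
import math


def entropy(counts):
    """Equation (1) for a class distribution given as absolute counts."""
    n = sum(counts)
    return sum((c / n) * math.log2(n / c) for c in counts if c > 0)


whole = entropy([8, 7])                                   # Entropy(S)
subsets = {"S1": [4, 2], "S2": [3, 1], "S3": [1, 4]}      # [good, poor]
n = sum(sum(c) for c in subsets.values())                 # 15 samples
weighted = sum(sum(c) / n * entropy(c) for c in subsets.values())  # Entropy_B(S)
print(round(whole, 3), round(weighted, 3), round(whole - weighted, 3))
# 0.997 0.824 0.172
```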
In step 1032, the information entropy and information gain obtained in step 1031 are tested on data other than the training data set S.
In addition, the following step may be included between step 1032 and step 1033:
step 10321, determining whether there is a misclassification; if yes, returning to step 1031, otherwise executing step 1033. The criterion for this judgment can be based on the classification results of the training data set S.
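Steps 1032 and 10321 amount to running the tree on data outside S and checking for misclassified samples. A minimal sketch follows, reusing a hypothetical nested-dict tree layout; what counts as a misclassification here (predicted class differs from the recorded class) and treating index values unseen in training as errors are assumptions, and the data is invented.

```python
def classify(tree, sample, default=None):
    """Walk from the root to a leaf; unseen values yield `default`."""
    while isinstance(tree, dict):
        value = sample.get(tree["index"])
        if value not in tree["branches"]:
            return default
        tree = tree["branches"][value]
    return tree


def misclassified(tree, held_out, target):
    """Return held-out samples whose predicted class differs from the recorded one."""
    return [s for s in held_out if classify(tree, s) != s[target]]


tree = {"index": "finance", "branches": {"good": "good", "poor": "poor"}}
held_out = [
    {"finance": "good", "degree": "good"},
    {"finance": "poor", "degree": "good"},      # the tree predicts "poor"
    {"finance": "average", "degree": "poor"},   # value never seen in training
]
errors = misclassified(tree, held_out, "degree")
print(len(errors))  # 2 -> per step 10321, go back to step 1031
```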
Corresponding to the foregoing embodiment of the power system service collaborative classification method based on the ID3 decision tree algorithm, the present invention also provides an embodiment of a power system service collaborative classification system based on the ID3 decision tree algorithm.
Referring to fig. 4, a block diagram of an embodiment of the power system service collaborative classification system based on ID3 decision tree algorithm according to the present invention is shown, the system includes:
the sample set extraction module 201 is configured to obtain a power service collaborative correlation database, and extract a sample set S therefrom;
an index set extraction module 202, configured to extract an index set a, where the index set a contains an index for evaluating business collaboration data;
an information entropy and information gain calculation module 203, configured to calculate information entropy and information gain for each index of the sample set S based on an ID3 algorithm, so as to select a suitable root node and an appropriate intermediate node;
a decision tree construction module 204, configured to construct a decision tree according to the selected root node;
and the scheme selection module 205 is configured to evaluate each business cooperation scheme based on the decision tree and select the business cooperation scheme according to requirements.
Further, the sample set extraction module 201 is also configured to determine whether all sample data in the sample set S belong to the same class and, if so, to select that class as the root node; the index set extraction module 202 is also configured to determine whether the sample set S and the index set A are empty and, if so, to select the class with the highest proportion in the sample set S as the root node; and the index set extraction module 202 is also configured to determine whether each index in the index set A takes only a single value and, if so, to select the class with the highest proportion in the sample set S as the root node.
The information entropy and information gain calculating module 203 may specifically include:
an information entropy and information gain calculation submodule, which calculates the information entropy and information gain of each index on the sample set S based on the ID3 algorithm;
a test calculation submodule, which tests the calculated information entropy and information gain on data other than the training data set S;
and a node selection submodule, which selects a suitable root node and suitable intermediate nodes after comparison.
The test calculation submodule is further configured to determine whether there is a misclassification and, if so, to feed it back to the information entropy and information gain calculation submodule.
For the system in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiment and is not repeated here.
In particular, the power system business collaborative classification system based on the ID3 decision tree algorithm may further include a user interaction module 206, used to visualize data after the decision tree has been constructed and the data classified, and to configure interfaces and application programs.
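As a stand-in for the visualization provided by the user interaction module, the sketch below renders a nested-dict tree as indented text. The tree layout, the example tree and the `render` function are hypothetical; an actual module would presumably use a graphical front end.

```python
def render(tree, indent=""):
    """Return one text line per branch test and per leaf, indented by depth."""
    if not isinstance(tree, dict):
        return [f"{indent}-> {tree}"]
    lines = []
    for value, subtree in tree["branches"].items():
        lines.append(f"{indent}{tree['index']} = {value}")
        lines.extend(render(subtree, indent + "    "))
    return lines


tree = {
    "index": "finance",
    "branches": {
        "good": "good",
        "poor": {"index": "marketing",
                 "branches": {"high sales": "good", "small market share": "poor"}},
    },
}
print("\n".join(render(tree)))
```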
Since the system embodiment basically corresponds to the method embodiment, reference may be made to the relevant parts of the description of the method embodiment. The system embodiments described above are merely illustrative, and some or all of the modules may be selected according to actual needs to achieve the purpose of the disclosed solution. One of ordinary skill in the art can understand and implement them without inventive effort.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A power business collaborative classification method based on the ID3 decision tree algorithm, characterized by comprising the following steps:
step 101, acquiring a power business cooperation related database and extracting a sample set S from it;
step 102, extracting an index set A, wherein the index set A contains the indexes used for evaluating business collaboration data;
step 103, calculating the information entropy and information gain of each index on the sample set S based on the ID3 algorithm, so as to select a suitable root node and intermediate nodes;
step 104, constructing a decision tree according to the selected root node;
step 105, evaluating and selecting each business cooperation scheme based on the decision tree.
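Step 105 can be pictured by writing each root-to-leaf path of the constructed tree as a rule and classifying candidate schemes against those rules. This is a hypothetical sketch, not the claimed evaluation procedure. The rule encoding, the scheme attributes, the class names, and the policy of selecting the schemes assigned to one chosen target class are all assumptions. The claim does not say which evaluation criterion is applied.

```python
from typing import Dict, Hashable, List, Mapping, Optional, Sequence, Tuple

# Hypothetical encoding of a built tree: one (conditions, class label) pair per
# root-to-leaf path, where conditions maps index name -> required index value.
Rule = Tuple[Dict[str, Hashable], Hashable]
Scheme = Mapping[str, Hashable]


def evaluate_scheme(rules: Sequence[Rule], scheme: Scheme) -> Optional[Hashable]:
    """Return the class of the first path whose conditions all match the scheme."""
    for conditions, label in rules:
        if all(scheme.get(index) == value for index, value in conditions.items()):
            return label
    return None  # no path matches, e.g. an index value unseen during training


def select_schemes(rules: Sequence[Rule], schemes: Mapping[str, Scheme],
                   wanted: Hashable) -> List[str]:
    """Names of candidate schemes that the tree assigns to the wanted class."""
    return [name for name, scheme in schemes.items()
            if evaluate_scheme(rules, scheme) == wanted]
```

Paths taken from a single decision tree are mutually exclusive, so at most one rule matches a fully specified scheme, and the order of `rules` does not affect the result.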
2. The power business collaborative classification method based on the ID3 decision tree algorithm according to claim 1, wherein step 103 specifically comprises:
step 1031, calculating the information entropy and information gain of each index on the sample set S based on the ID3 algorithm;
step 1032, using the information entropy and information gain obtained in step 1031 to test data other than the training data set S;
step 1033, selecting a suitable root node and intermediate nodes after comparison.
3. The power business collaborative classification method based on the ID3 decision tree algorithm according to claim 2, wherein the information entropy and information gain of each index on the sample set S are calculated based on the ID3 algorithm as follows:
selecting an index C in the index set A, where C has m possible values $C = \{C_1, C_2, \ldots, C_m\}$ and the value $C_i$ occurs in the training set S with frequency $p_i$ ($1 \le i \le m$; m and i are integers), the information entropy of the training set S is:
$$\mathrm{Entropy}(S) = -\sum_{i=1}^{m} p_i \log_2 p_i \qquad (1)$$
another index B is selected as the root node and used to divide the sample set S into sample subsets $S_j$ ($j = 1, 2, \ldots, k$); the information gain obtained by dividing S according to index B is:
$$\mathrm{Gain}(S,B) = \mathrm{Entropy}(S) - \mathrm{Entropy}_B(S) \qquad (2)$$
and the information entropy of the sample subsets after dividing S according to index B is:
$$\mathrm{Entropy}_B(S) = \sum_{j=1}^{k} \frac{|S_j|}{|S|}\,\mathrm{Entropy}(S_j) \qquad (3)$$
where $|S_j|$ is the number of samples contained in the sample subset $S_j$, $|S|$ is the number of samples contained in the sample set S, $1 \le j \le k$, and k and j are integers.
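Equations (1) to (3) translate directly into code. The sketch below makes two assumptions, as in standard ID3: the frequencies $p_i$ in equation (1) are taken over the class (target) values of the samples, and the logarithm is base 2. The function names and the toy data (indexes `load` and `region`) are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Mapping, Sequence

Row = Mapping[str, Hashable]  # hypothetical layout: index name -> index value


def entropy(labels: Sequence[Hashable]) -> float:
    """Eq. (1): Entropy(S) = -sum_i p_i * log2(p_i), p_i = relative frequency in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(rows: Sequence[Row], labels: Sequence[Hashable], index: str) -> float:
    """Eq. (3): Entropy_B(S) = sum_j |S_j|/|S| * Entropy(S_j), grouping S by index B."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[index], []).append(y)
    n = len(labels)
    return sum(len(sub) / n * entropy(sub) for sub in groups.values())


def gain(rows: Sequence[Row], labels: Sequence[Hashable], index: str) -> float:
    """Eq. (2): Gain(S, B) = Entropy(S) - Entropy_B(S)."""
    return entropy(labels) - conditional_entropy(rows, labels, index)
```

Take four rows where `load` is `high, high, low, low`, `region` is `urban, rural, urban, rural`, and the labels are `adopt, adopt, reject, reject`. On this data `load` separates the two classes perfectly (gain 1.0) while `region` carries no information (gain 0.0), so ID3 would choose `load` as the root node.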
4. The power business collaborative classification method based on the ID3 decision tree algorithm according to claim 1, wherein, between step 101 and step 102, the method further comprises the following steps:
step 111, determining whether all sample data in the sample set S belong to the same class; if so, going to step 112, otherwise executing step 102;
step 112, selecting the class to which all the sample data belong as the root node, and going to step 104.
5. The power business collaborative classification method based on the ID3 decision tree algorithm according to claim 1, wherein, between step 102 and step 103, the method further comprises the following steps:
step 121, determining whether the sample set S and the index set A are empty; if so, going to step 123, otherwise executing step 103;
step 123, selecting the class with the highest proportion in the sample set S as the root node, and jumping to step 104.
6. The power business collaborative classification method based on the ID3 decision tree algorithm according to claim 5, wherein, between step 121 and step 103, the method further comprises the following step:
step 122, determining whether the values of all indexes in the index set A are unique; if so, going to step 123, otherwise executing step 103.
7. The power business collaborative classification method based on the ID3 decision tree algorithm according to claim 2, wherein, between step 1032 and step 1033, the method further comprises the following step:
step 10321, determining whether any misclassification exists; if so, returning to step 1031, otherwise executing step 1033.
8. The power business collaborative classification method based on the ID3 decision tree algorithm according to any one of claims 1-7, wherein, between step 103 and step 104, the method further comprises the following steps:
step 131, determining whether all indexes have been traversed; if not, going to step 1032, and if so, executing step 104;
step 132, eliminating the traversed indexes, generating a sample subset S that excludes the traversed indexes, and jumping to step 101.
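The loop of steps 131 and 132 amounts to the usual recursive ID3 construction: pick the best index, eliminate it, and repeat on each resulting subset. The following is a minimal sketch with these assumptions:

- rows are mappings from index name to value, and the tree is a nested dictionary;
- S is non-empty;
- the stopping checks reduce to two conditions: a single class, or no index left;
- ties in information gain go to the index listed first.

The misclassification feedback of step 10321 is not modeled.

```python
import math
from collections import Counter
from typing import Any, Dict, Hashable, List, Mapping, Sequence

Row = Mapping[str, Hashable]  # hypothetical layout: index name -> index value


def _entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def _gain(rows: Sequence[Row], labels: Sequence[Hashable], index: str) -> float:
    groups: Dict[Hashable, List[Hashable]] = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[index], []).append(y)
    return _entropy(labels) - sum(len(g) / len(labels) * _entropy(g) for g in groups.values())


def build_id3(rows: Sequence[Row], labels: Sequence[Hashable], indexes: Sequence[str]) -> Any:
    """Split on the highest-gain index, eliminate it, and recurse on each subset S_j.

    Assumes a non-empty sample set; leaves are class labels, internal nodes are
    {"index": name, "children": {value: subtree}} (hypothetical layout).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not indexes:
        return majority
    best = max(indexes, key=lambda a: _gain(rows, labels, a))
    remaining = [a for a in indexes if a != best]  # eliminate the traversed index
    node: Dict[str, Any] = {"index": best, "children": {}}
    for value in dict.fromkeys(row[best] for row in rows):  # distinct values, first-seen order
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        node["children"][value] = build_id3([r for r, _ in sub], [y for _, y in sub], remaining)
    return node
```

Because each recursive call receives only the indexes not yet used, every root-to-leaf path tests each index at most once. This mirrors generating a subset "without the traversed indexes" before returning to step 101.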
9. A power business collaborative classification system based on the ID3 decision tree algorithm, characterized by comprising:
a sample set extraction module, configured to acquire a power business cooperation related database and extract a sample set S from it;
the index set extraction module is used for extracting an index set A, and the index set A contains indexes used for evaluating business collaborative data;
the information entropy and information gain calculation module is used for calculating information entropy and information gain of each index of the sample set S based on an ID3 algorithm so as to select proper root nodes and middle nodes;
the decision tree construction module is used for constructing a decision tree according to the selected root node;
and the scheme selection module is used for evaluating and selecting each business cooperation scheme based on the decision tree.
10. The power business collaborative classification system based on the ID3 decision tree algorithm according to claim 1, further comprising a user interaction module for the visual display of data and for configuring interfaces and applications after decision tree construction and classification.
CN201910860591.2A 2019-09-11 2019-09-11 Power business collaborative classification method and system based on ID3 decision tree algorithm Pending CN110766272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910860591.2A CN110766272A (en) 2019-09-11 2019-09-11 Power business collaborative classification method and system based on ID3 decision tree algorithm

Publications (1)

Publication Number Publication Date
CN110766272A true CN110766272A (en) 2020-02-07

Family

ID=69329451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910860591.2A Pending CN110766272A (en) 2019-09-11 2019-09-11 Power business collaborative classification method and system based on ID3 decision tree algorithm

Country Status (1)

Country Link
CN (1) CN110766272A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105894177A (en) * 2016-03-25 2016-08-24 国家电网公司 Decision-making-tree-algorithm-based analysis and evaluation method for operation risk of power equipment
CN106022583A (en) * 2016-05-12 2016-10-12 中国电力科学研究院 Electric power communication service risk calculation method and system based on fuzzy decision tree

Non-Patent Citations (6)

Title
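吕晓丹 et al.: "Credit evaluation model based on decision trees and an empirical study", 《市场周刊(理论研究)》 (Market Weekly, Theoretical Research) *
周艳 et al.: "An improved decision tree algorithm based on a decision attribute selection strategy", 《沈阳师范大学学报(自然科学版)》 (Journal of Shenyang Normal University, Natural Science Edition) *
戴小廷 et al.: "Application of an information-entropy-based decision tree mining algorithm in intelligent power marketing", 《郑州轻工业学院学报(自然科学版)》 (Journal of Zhengzhou University of Light Industry, Natural Science Edition) *
蒲天添: "Research on engineering project management optimization based on decision trees", 《现代电子技术》 (Modern Electronics Technique) *
陈燕, 李桃迎 (authors): 《数据挖掘与聚类分析》 (Data Mining and Cluster Analysis), 30 November 2012, 大连海事大学出版社 (Dalian Maritime University Press) *
陈立荣 et al.: "Application of decision-tree-based supplier identification in ERP", 《电力信息化》 (Electric Power Informatization) *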

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN111651579A (en) * 2020-06-03 2020-09-11 腾讯科技(深圳)有限公司 Information query method and device, computer equipment and storage medium
CN111651579B (en) * 2020-06-03 2023-05-09 腾讯科技(深圳)有限公司 Information query method, device, computer equipment and storage medium
CN112149731A (en) * 2020-09-23 2020-12-29 内蒙古电力(集团)有限责任公司乌海电业局 Power system fault classification method and system based on ID3 algorithm
CN112884700A (en) * 2020-12-11 2021-06-01 武汉光谷航天三江激光产业技术研究院有限公司 Laser cleaning image classification method and device based on decision tree
CN112884700B (en) * 2020-12-11 2023-08-22 武汉光谷航天三江激光产业技术研究院有限公司 Laser cleaning image classification method and device based on decision tree
CN113052269A (en) * 2021-04-29 2021-06-29 上海德衡数据科技有限公司 Intelligent cooperative identification method, system, equipment and medium
CN114881419A (en) * 2022-04-11 2022-08-09 核动力运行研究所 Automatic flow analysis method for nuclear power evaluation data
CN116029613A (en) * 2023-02-17 2023-04-28 国网浙江省电力有限公司 Novel power system index data processing method and platform

Similar Documents

Publication Publication Date Title
CN110796331A (en) Power business collaborative classification method and system based on C4.5 decision tree algorithm
CN110766272A (en) Power business collaborative classification method and system based on ID3 decision tree algorithm
Qin et al. Blockchain: a carbon-neutral facilitator or an environmental destroyer?
CN108446886A (en) Personnel recruitment system and method based on big data
CN104182474A (en) Method for recognizing pre-churn users
CN109087140A (en) A kind of closed loop target client's recognition methods based on spark big data
Zhang et al. A system for tender price evaluation of construction project based on big data
CN109118155B (en) Method and device for generating operation model
CN105069080B (en) A kind of document retrieval method and system
CN108961031A (en) Realize information processing method, device and the computer readable storage medium of loan examination & approval
CN111967971A (en) Bank client data processing method and device
CN110147389A (en) Account number treating method and apparatus, storage medium and electronic device
Trąpczyński et al. Identification of Linkages between the Competitive Potential and Competitive Position of SMEs Related to their Internationalization Patterns Shortly after the Economic Crisis.
CN112732786A (en) Financial data processing method, device, equipment and storage medium
CN113282623A (en) Data processing method and device
CN105868415B (en) A kind of microblogging real time filtering model based on historical weibo
CN113435713B (en) Risk map compiling method and system based on GIS technology and two-model fusion
Gupta et al. Forecasting macroeconomic variables using large datasets: dynamic factor model versus large-scale BVARs
CN110110962A (en) A kind of task gunz executes the preferred method of team
Fu et al. Expert recommendation in oss projects based on knowledge embedding
Tabak et al. Topological properties of bank networks: the case of Brazil
CN109918482A (en) A kind of Students' Innovation plan of starting an undertaking the system of analysis and appraisal
CN102681979A (en) Content editing intelligent verifying method facing to open knowledge community
Barbazza et al. Consensus modeling in multiple criteria multi-expert real options-based valuation of patents
Yang et al. Study on the application of data mining for customer groups based on the modified ID3 algorithm in the e-commerce

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20200930

Address after: 310000, No. 219, Jianguo Middle Road, Shangcheng District, Zhejiang, Hangzhou

Applicant after: HANGZHOU POWER SUPPLY COMPANY, STATE GRID ZHEJIANG ELECTRIC POWER Co.,Ltd.

Applicant after: ZHEJIANG ZHONGXIN ELECTRIC POWER ENGINEERING CONSTRUCTION Co.,Ltd.

Address before: 310000, No. 219, Jianguo Middle Road, Shangcheng District, Zhejiang, Hangzhou

Applicant before: HANGZHOU POWER SUPPLY COMPANY, STATE GRID ZHEJIANG ELECTRIC POWER Co.,Ltd.

SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200207