CN107808245A

CN107808245A - Pipe network scheduling system based on an improved decision tree method

Info

Publication number: CN107808245A
Application number: CN201711015342.0A
Authority: CN
Inventors: 马湧; 孙彦广; 张云贵
Original assignee: Automation Research and Design Institute of Metallurgical Industry
Current assignee: Automation Research and Design Institute of Metallurgical Industry
Priority date: 2017-10-25
Filing date: 2017-10-25
Publication date: 2018-03-16

Abstract

A pipe network scheduling system based on an improved decision tree method, belonging to the technical field of pipe network scheduling. The hardware comprises a relational database server, a real-time database server, an application server, and an engineer station. The relational database server is connected to the engineer station and the application server; the application server, in addition to its connection to the relational database server, is also connected to the real-time database and the engineer station, so that data is exchanged among the three. The application modules comprise a relational database, a data acquisition module, a scheduling rule result display module, and a decision tree rule base generation module. The scheduling rule result display module is deployed on the engineer station, the decision tree rule base module on the application server, the relational database on the relational database server, and the data acquisition module on the real-time database. The advantage is that a steam pipe network scheduling rule system is established in a scientific and reasonable manner, which reduces pipe network fluctuations and ensures efficient and safe operation of the pipe network.

Description

Pipe network scheduling system based on an improved decision tree method

Technical field

The invention belongs to the technical field of pipe network scheduling, and in particular provides a pipe network scheduling system based on an improved decision tree method, which makes steam pipe network scheduling rules programmatic, fast, and scientific.

Background technology

In large integrated iron and steel enterprises, the steam system is a complex object characterized by large time delay, large inertia, time-varying parameters, nonlinearity, and multivariable coupling. Faced with such complex operating conditions, managers still largely direct system operation based on experience accumulated over years of production, which makes it difficult to avoid steam venting and degraded use of steam and causes great waste. This inevitably leads to blind scheduling and inefficient pipe network operation. At the same time, steam consumption varies with factors such as production output and seasonal change, so optimized scheduling and management of the steam system is an effective measure for enterprises to save energy, improve efficiency, and reduce environmental pollution. Establishing a decision-tree-based scheduling rule base and combining it with the enterprise's existing scheduling rule base to reasonably optimize steam pipe network scheduling can greatly reduce the blindness and lag of manual scheduling, thereby raising the level of steam system scheduling.

The decision tree algorithm is a classification algorithm based on the decision tree data structure. A commonly used algorithm for constructing decision trees is ID3. A decision tree is a tree structure similar to a flow chart, in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class or class distribution. To classify an unknown sample, the attribute values of the sample are tested one by one starting from the root, following the matching branch at each node, until a leaf node is reached; the class represented by that leaf node is the class to which the sample belongs. Based on the decision tree method, a scheduling rule decision tree system for the steam pipe network of a steel enterprise is established. Using the system entropy and decision attribute methods of decision tree theory, and combining the existing enterprise scheduling rule base, expert knowledge base, and fact base, an improved version of the greedy decision tree construction algorithm ID3 is used to establish this system, so that the pipe network scheduling strategy becomes more reasonable and efficient and the fluctuation range of the pipe network is effectively reduced.
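The root-to-leaf classification walk can be illustrated with a minimal Python sketch. The nested-dictionary tree layout, the attribute names (`pressure`, `demand`), and the action labels are hypothetical and chosen only for illustration; the source does not specify them.

```python
# Hypothetical tree: internal nodes test one attribute, leaves hold a class label.
TREE = {
    "attribute": "pressure",
    "branches": {
        "high": {"label": "reduce_supply"},
        "normal": {"label": "hold"},
        "low": {
            "attribute": "demand",
            "branches": {
                "rising": {"label": "increase_supply"},
                "steady": {"label": "hold"},
            },
        },
    },
}


def classify(tree, sample):
    """Walk from the root to a leaf, following the branch that matches the sample."""
    node = tree
    while "label" not in node:
        value = sample[node["attribute"]]   # test the attribute at this node
        node = node["branches"][value]      # descend along the matching branch
    return node["label"]


print(classify(TREE, {"pressure": "low", "demand": "rising"}))
```

A sample only needs values for the attributes that lie on its own path, so `{"pressure": "high"}` is classified without a `demand` value.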

Summary of the invention

The object of the invention is to provide a pipe network scheduling system based on an improved decision tree method. By means of an improved version of the greedy decision tree construction algorithm ID3, steam pipe network scheduling rules are made programmatic and scientific, so that the pipe network operates more scientifically and reasonably. The tree is built in a top-down, recursive, divide-and-conquer manner, starting from the training sample set and its associated attributes. As the decision tree grows deeper, the training sample set is recursively divided into several smaller subsets. The path from the root to each node corresponds to an association rule, so the whole decision tree corresponds to a complete set of association rules.
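The correspondence between tree paths and rules can be sketched as follows. This is an illustrative Python example that lists one rule per root-to-leaf path; the tree layout and all attribute and action names are assumptions, not taken from the source.

```python
# Hypothetical tree in the same nested-dictionary form used for illustration.
TREE = {
    "attribute": "pressure",
    "branches": {
        "high": {"label": "reduce_supply"},
        "low": {
            "attribute": "demand",
            "branches": {
                "rising": {"label": "increase_supply"},
                "steady": {"label": "hold"},
            },
        },
    },
}


def extract_rules(node, conditions=()):
    """Return one (conditions, label) pair for every root-to-leaf path."""
    if "label" in node:
        return [(list(conditions), node["label"])]
    rules = []
    for value, child in node["branches"].items():
        step = (node["attribute"], value)
        rules.extend(extract_rules(child, conditions + (step,)))
    return rules


for conditions, label in extract_rules(TREE):
    tests = " AND ".join(f"{name}={value}" for name, value in conditions)
    print(f"IF {tests} THEN {label}")
```

Each leaf yields exactly one rule, so the number of rules equals the number of leaves.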

The hardware of the invention comprises a relational database server, a real-time database server, an application server, and an engineer station. The relational database server is connected to the engineer station and the application server; the application server, in addition to its connection to the relational database server, is also connected to the real-time database and the engineer station, so that data is exchanged among the three. The application modules comprise a relational database, a data acquisition module, a scheduling rule result display module, and a decision tree rule base generation module. The scheduling rule result display module is deployed on the engineer station, the decision tree rule base module on the application server, the relational database on the relational database server, and the data acquisition module on the real-time database.

The relational database is the data communication medium between the display module and the decision tree rule base module. The decision tree rule base module writes the generated decision rules into the relational database, and the display module then reads them from the relational database and displays them.
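The write-then-read exchange through the relational database can be sketched with Python's built-in `sqlite3` module. The source does not name a database product or schema, so SQLite, the table `decision_rules`, and its columns are all assumptions made for illustration.

```python
import sqlite3


def write_rules(conn, rules):
    """Rule base module side: store generated rules (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS decision_rules ("
        "id INTEGER PRIMARY KEY, rule_condition TEXT, rule_action TEXT)"
    )
    conn.executemany(
        "INSERT INTO decision_rules (rule_condition, rule_action) VALUES (?, ?)",
        rules,
    )
    conn.commit()


def read_rules(conn):
    """Display module side: fetch the stored rules in insertion order."""
    cursor = conn.execute(
        "SELECT rule_condition, rule_action FROM decision_rules ORDER BY id"
    )
    return cursor.fetchall()


# An in-memory database stands in for the relational database server.
conn = sqlite3.connect(":memory:")
write_rules(conn, [
    ("pressure=high", "reduce_supply"),
    ("pressure=low AND demand=rising", "increase_supply"),
])
for condition, action in read_rules(conn):
    print(f"IF {condition} THEN {action}")
```

Because the two modules share only the table, neither needs to know where the other is deployed.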

Relational database: stores the scheduling records, the decision tree rule base, and the data to be displayed.

Data acquisition module: consists of the real-time database, the field acquisition instruments, and the transmission network; the field acquisition instruments transmit information to the real-time database in real time.

Scheduling rule result display module: the data interface part, which provides the data input function for the decision tree algorithm, including reading data files.

Decision tree rule base module: its functions include:

1. The tree starts as a single node representing the training sample set;

2. If the training samples at a node all belong to the same class, the node becomes a leaf and is labeled with that class;

3. Otherwise, using the information gain measure as the splitting criterion, the attribute that best classifies the samples is selected as the splitting attribute of the node;

4. A branch is created for each known value of the splitting attribute, and the sample set is partitioned accordingly;

5. The same process is applied recursively to form a decision tree for each partition. Once an attribute has been used at a node, it need not be considered again in any of that node's descendants;

6. The recursive partitioning stops when one of the following conditions is met:

a) all samples at the given node belong to the same class;

b) no attributes remain for further partitioning the samples;

c) a branch contains no samples.

Entropy before the sample set is split:

Let S be a set of s data samples whose class attribute C takes m distinct discrete values c₁, c₂, ..., c_m (i.e., S will eventually be divided into m classes), and let the numbers of samples with class values c₁, c₂, ..., c_m be s₁, s₂, ..., s_m respectively. Before splitting, the total entropy (expected information) of the sample set S is:

$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

where p_i is the probability that an arbitrary sample in S belongs to class C_i, estimated by s_i/s. Note that the logarithm is taken to base 2 because information is encoded in binary. It is easy to see that the total entropy of S before splitting is the weighted average of the information content of the samples belonging to the different classes.
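As a minimal sketch, the expected information -Σ p_i log₂ p_i with p_i = s_i/s can be computed directly from a list of class labels. The function name and the example labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Expected information of a list of class labels, in bits.

    Each probability p_i is estimated as s_i / s, the share of class i.
    """
    total = len(labels)
    counts = Counter(labels)          # s_i for every class that occurs
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Two equally frequent classes carry exactly one bit of information.
print(entropy(["hold", "hold", "adjust", "adjust"]))
```

A skewed sample such as three `hold` records and one `adjust` record has lower entropy than the evenly split one, because its class is easier to predict.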

Entropy after the sample set is split:

Suppose attribute A has n distinct values {a₁, a₂, ..., a_n}. Attribute A can then be used to divide the data set S into n subsets {S₁, S₂, ..., S_n}, where all samples in subset S_j have the value a_j for attribute A.

Let the total number of samples in subset S_j be s_j, of which the numbers of samples with class values c₁, c₂, ..., c_m are s_1j, s_2j, ..., s_mj. The entropy of subset S_j is:

$$I(s_{1j}, s_{2j}, \ldots, s_{mj}) = -\sum_{i=1}^{m} p_{ij} \log_2 p_{ij}$$

where p_ij = s_ij/s_j is the probability that a sample in S_j belongs to class C_i.

After the data set S is divided into n subsets by attribute A, the total entropy of S is the weighted average of the entropies of the n subsets:

$$E(A) = \sum_{j=1}^{n} \frac{s_j}{s}\, I(s_{1j}, s_{2j}, \ldots, s_{mj})$$

where s_j/s is the weight of subset S_j, i.e., the proportion of S_j in the data set S.
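The weighted average entropy after a split can be sketched in Python by grouping the samples on one attribute and weighting each subset's entropy by its share s_j/s of the data. The record format (a dictionary of attribute values paired with a class label) and the sample data are assumptions for illustration.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical training records: (attribute values, scheduling action).
SAMPLES = [
    ({"pressure": "high", "season": "winter"}, "reduce_supply"),
    ({"pressure": "high", "season": "summer"}, "reduce_supply"),
    ({"pressure": "low", "season": "winter"}, "increase_supply"),
    ({"pressure": "low", "season": "summer"}, "increase_supply"),
]


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def split_entropy(samples, attribute):
    """Entropy after splitting on `attribute`: subset entropies weighted by s_j / s."""
    subsets = defaultdict(list)
    for features, label in samples:
        subsets[features[attribute]].append(label)   # one subset per value a_j
    total = len(samples)
    return sum(len(labels) / total * entropy(labels) for labels in subsets.values())


print(split_entropy(SAMPLES, "season"))
```

In this data, splitting on `pressure` yields pure subsets and an entropy of zero, while splitting on `season` leaves each subset evenly mixed, so the entropy stays at one bit.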

Information gain:

Information gain represents the amount of information the system obtains from the classification and is measured by the reduction in system entropy. The information gain of data set S after splitting on attribute A is defined as the difference between the entropy of S before and after the split:

Gain(A) = I(s₁, s₂, ..., s_m) - E(A)

The algorithm computes the information gain of each attribute, then selects the attribute with the highest information gain as the decision attribute of the given data set S, creates a node labeled with that attribute, creates a branch for each value of the attribute, and partitions the samples accordingly.
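The selection of the attribute with the highest information gain can be sketched as follows. The helper names, the record format, and the sample data are hypothetical; only the definition Gain(A) = I - E(A) comes from the text.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical training records: (attribute values, scheduling action).
SAMPLES = [
    ({"pressure": "high", "season": "winter"}, "reduce_supply"),
    ({"pressure": "high", "season": "summer"}, "reduce_supply"),
    ({"pressure": "low", "season": "winter"}, "increase_supply"),
    ({"pressure": "low", "season": "summer"}, "increase_supply"),
]


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def gain(samples, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    subsets = defaultdict(list)
    for features, label in samples:
        subsets[features[attribute]].append(label)
    after = sum(len(s) / len(samples) * entropy(s) for s in subsets.values())
    return entropy([label for _, label in samples]) - after


def best_attribute(samples, attributes):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: gain(samples, a))


print(best_attribute(SAMPLES, ["season", "pressure"]))
```

Here `pressure` separates the two actions completely, so it is chosen over `season`, which provides no information about the action.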

Improved ID3 algorithms

ID3 is a typical decision tree algorithm based on information gain; it learns by constructing the decision tree in a top-down, recursive, divide-and-conquer manner. Specifically, all candidate attributes are tested, the attribute with the largest information gain is selected as the best splitting attribute and used as the root node of the decision tree, and branches are constructed from the different values of that attribute. The same procedure is then repeated on the subset of each branch to construct the remaining branches of the tree in turn, until every subset contains only training samples of the same class. The resulting ID3 decision tree model can then be used to classify and predict new data samples. All attributes are discrete, or numeric attributes are converted to discrete ones by preprocessing beforehand.
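The source does not describe how numeric attributes are discretized. One common option, shown here purely as an assumption, is to bin a reading against fixed thresholds; the threshold values, units, and category names below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical pressure thresholds (MPa) and the category on each side of them.
PRESSURE_THRESHOLDS = [0.8, 1.2]
PRESSURE_NAMES = ["low", "normal", "high"]


def discretize(value, thresholds, names):
    """Map a numeric reading to a category; thresholds must be ascending."""
    return names[bisect_right(thresholds, value)]


readings = [0.5, 1.0, 1.5]
print([discretize(r, PRESSURE_THRESHOLDS, PRESSURE_NAMES) for r in readings])
```

With `bisect_right`, a reading exactly equal to a threshold falls into the higher category.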

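To increase solving speed, the traditional ID3 algorithm is improved. At each node, traditional ID3 must compute the information gain of every attribute and then select the attribute with the largest gain for that node. Because the information gain calculation involves logarithms, library functions must be called during computation, which increases computation time. The present method uses a new attribute selection criterion that greatly reduces the amount of computation and speeds up the algorithm. In the ID3 algorithm, suppose the sizes of the positive example set PE and the negative example set NE in the vector space are p and q respectively. The information entropy I can then be expressed as:

$$I(p, q) = -\frac{p}{p+q}\log_2\frac{p}{p+q} - \frac{q}{p+q}\log_2\frac{q}{p+q}$$

If attribute A divides the samples into n subsets, the i-th containing p_i positive and q_i negative examples, the weighted information entropy is:

$$E(A) = \sum_{i=1}^{n} \frac{p_i + q_i}{p + q}\, I(p_i, q_i)$$

Substituting:

$$E(A) = \frac{1}{(p+q)\ln 2} \sum_{i=1}^{n} \left( -p_i \ln\frac{p_i}{p_i+q_i} - q_i \ln\frac{q_i}{p_i+q_i} \right)$$

By the principle of equivalent infinitesimals, if x is very small, ln(1+x) ≈ x. Writing p_i/(p_i+q_i) = 1 - q_i/(p_i+q_i) and q_i/(p_i+q_i) = 1 - p_i/(p_i+q_i) gives ln(p_i/(p_i+q_i)) ≈ -q_i/(p_i+q_i) and ln(q_i/(p_i+q_i)) ≈ -p_i/(p_i+q_i).

After substituting:

$$E(A) \approx \frac{1}{(p+q)\ln 2} \sum_{i=1}^{n} \frac{2 p_i q_i}{p_i+q_i}$$

Computing the weighted average entropy with $\sum_{i=1}^{n} \frac{2 p_i q_i}{p_i+q_i}$, which involves no logarithms, greatly improves data processing capacity.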
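The logarithm-free criterion can be compared with the exact weighted entropy in a short Python sketch. It assumes that the simplified score takes the form Σ 2·p_i·q_i/(p_i+q_i), obtained by applying ln(1+x) ≈ x to the two-class weighted entropy; the candidate attributes and their positive and negative counts are hypothetical.

```python
from math import log2


def binary_entropy(p, q):
    """Exact two-class entropy I(p, q) in bits; empty classes contribute zero."""
    total = p + q
    return -sum(n / total * log2(n / total) for n in (p, q) if n > 0)


def exact_weighted_entropy(subsets):
    """Weighted entropy over subsets given as (positive, negative) counts."""
    total = sum(p + q for p, q in subsets)
    return sum((p + q) / total * binary_entropy(p, q) for p, q in subsets)


def approx_score(subsets):
    """Assumed logarithm-free score: sum of 2*p_i*q_i / (p_i + q_i)."""
    return sum(2 * p * q / (p + q) for p, q in subsets if p + q > 0)


# Hypothetical candidate attributes and the (positive, negative) counts per value.
CANDIDATES = {
    "pressure": [(4, 0), (0, 4)],
    "demand": [(3, 1), (1, 3)],
    "season": [(2, 2), (2, 2)],
}

by_exact = sorted(CANDIDATES, key=lambda a: exact_weighted_entropy(CANDIDATES[a]))
by_approx = sorted(CANDIDATES, key=lambda a: approx_score(CANDIDATES[a]))
print(by_exact, by_approx)
```

On this small example both criteria rank the attributes in the same order, with the lowest score marking the best split. The constant factor 1/((p+q) ln 2) is the same for every attribute at a node, so it can be dropped when only the ranking matters.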

Assume T denotes the current sample set and T_attributelist denotes the current set of candidate attributes, in which all attributes are discrete or numeric attributes have been converted to discrete ones by preprocessing beforehand. The flow of the improved ID3 algorithm GID3formtree(T, T_attributelist) is then as follows:

Step 1: Create root node N;

Step 2: If all samples in T belong to the same class C, return N as a leaf node labeled with class C;

Step 3: If T_attributelist is empty, return N as a leaf node labeled with the most frequent class in T;

Step 4: For each attribute in T_attributelist, compute the information gain;

Step 5: Set the test attribute of N, test_attribute, to the attribute in T_attributelist with the highest gain value;

Step 6: For each value of test_attribute, grow a new leaf node from node N. If the sample set corresponding to the new leaf node is empty, do not split this leaf node and label it with the most frequent class in T; otherwise, execute GID3formtree(T, T_attributelist) on the leaf node to continue splitting it.
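Steps 1 to 6 can be sketched as a recursive Python function. This is an illustrative reading, not the patented implementation: it uses the standard entropy-based information gain in Step 4, and the `domains` parameter (the known values of each attribute), the record format, and the sample data are assumptions.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def gain(samples, attribute):
    subsets = defaultdict(list)
    for features, label in samples:
        subsets[features[attribute]].append(label)
    after = sum(len(s) / len(samples) * entropy(s) for s in subsets.values())
    return entropy([label for _, label in samples]) - after


def majority(samples):
    """Most frequent class label among the samples."""
    return Counter(label for _, label in samples).most_common(1)[0][0]


def form_tree(samples, attributes, domains):
    labels = {label for _, label in samples}
    if len(labels) == 1:                                   # Step 2: single class
        return {"label": labels.pop()}
    if not attributes:                                     # Step 3: no attributes left
        return {"label": majority(samples)}
    best = max(attributes, key=lambda a: gain(samples, a))  # Steps 4 and 5
    node = {"attribute": best, "branches": {}}             # Step 1: node N
    remaining = [a for a in attributes if a != best]
    for value in domains[best]:                            # Step 6: one branch per value
        subset = [s for s in samples if s[0][best] == value]
        if not subset:                                     # empty branch: majority leaf
            node["branches"][value] = {"label": majority(samples)}
        else:
            node["branches"][value] = form_tree(subset, remaining, domains)
    return node


# Hypothetical training data and attribute domains.
SAMPLES = [
    ({"pressure": "high", "demand": "steady"}, "reduce_supply"),
    ({"pressure": "high", "demand": "rising"}, "reduce_supply"),
    ({"pressure": "low", "demand": "rising"}, "increase_supply"),
    ({"pressure": "low", "demand": "steady"}, "hold"),
]
DOMAINS = {"pressure": ["high", "low", "normal"], "demand": ["rising", "steady"]}

tree = form_tree(SAMPLES, ["pressure", "demand"], DOMAINS)
print(tree)
```

The `normal` pressure value has no training samples, so its branch becomes a leaf labeled with the most frequent class of the parent sample set, as Step 6 prescribes. The recursive call receives the branch's sample subset and the attribute list without the attribute just used.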

The advantage of the invention is that a steam pipe network scheduling rule system is established based on the improved decision tree method, making pipe network scheduling programmatic, fast, and scientific, ensuring safe and efficient pipe network operation, improving operating efficiency, and contributing to industrial energy conservation and emission reduction.

Brief description of the drawings

Fig. 1 is a diagram of the relationships among the modules of the system of the invention.

Fig. 2 is a flow chart for solving the decision tree rules.

Embodiment

Fig. 1 shows the relationships among the modules of the system of the invention. The system comprises a relational database, a data acquisition module, a scheduling rule result display module, and a decision tree rule base generation module. The scheduling rule result display module is deployed on the engineer station, the decision tree rule base generation module on the application server, the relational database on the relational database server, and the data acquisition module on the real-time database. The relational database is the data communication medium between the scheduling rule result display module and the decision tree rule base generation module: the generation module writes the generated rule base into the relational database, and the display module then reads it from the relational database and displays it.

Fig. 2 is the flow chart for solving the decision tree rules. First, root node N is created. If all samples in T belong to the same class C, N is returned as a leaf node labeled with class C. Next, if T_attributelist is empty, N is returned as a leaf node labeled with the most frequent class in T. Then, for each attribute in T_attributelist, the information gain is computed, and the test attribute of N, test_attribute, is set to the attribute in T_attributelist with the highest gain value. Finally, for each value of test_attribute, a new leaf node is grown from node N; if the sample set corresponding to the new leaf node is empty, the leaf node is not split and is labeled with the most frequent class in T; otherwise GID3formtree(T, T_attributelist) is executed on the leaf node to continue splitting it.

Claims

1. A pipe network scheduling system based on an improved decision tree method, characterized in that it comprises a relational database server, a real-time database server, an application server, and an engineer station; the relational database server is connected to the engineer station and the application server, and the application server, in addition to its connection to the relational database server, is also connected to the real-time database and the engineer station, so that data is exchanged among the three; the application modules comprise a relational database, a data acquisition module, a scheduling rule result display module, and a decision tree rule base generation module; the scheduling rule result display module is deployed on the engineer station, the decision tree rule base module on the application server, the relational database on the relational database server, and the data acquisition module on the real-time database;

the relational database is the data communication medium between the display module and the decision tree rule base module; the decision tree rule base module writes the generated decision rules into the relational database, and the display module then reads them from the relational database and displays them.
2. The system according to claim 1, characterized in that:

the relational database stores the scheduling records, the decision tree rule base, and the data to be displayed;

the data acquisition module consists of the real-time database, the field acquisition instruments, and the transmission network; the field acquisition instruments transmit information to the real-time database in real time;

the scheduling rule result display module is the data interface part, which provides the data input function for the decision tree algorithm, including reading data files.
3. The system according to claim 1, characterized in that the functions of the decision tree rule base are:

the tree starts as a single node representing the training sample set;

if the training samples at a node all belong to the same class, the node becomes a leaf and is labeled with that class;

otherwise, using the information gain measure as the splitting criterion, the attribute that best classifies the samples is selected as the splitting attribute of the node;

a branch is created for each known value of the splitting attribute, and the sample set is partitioned accordingly;

the same process is applied recursively to form a decision tree for each partition; once an attribute has been used at a node, it need not be considered again in any of that node's descendants;

the recursive partitioning stops when one of the following conditions is met:

a) all samples at the given node belong to the same class;

b) no attributes remain for further partitioning the samples;

c) a branch contains no samples.