CN106294667A - ID3-based decision tree implementation method and device - Google Patents

ID3-based decision tree implementation method and device

Info

Publication number
CN106294667A
Authority
CN
China
Prior art keywords
nodes
sub
attribute
matrix
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610635132.0A
Other languages
Chinese (zh)
Inventor
谢京华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Jiuzhou Electric Group Co Ltd
Original Assignee
Sichuan Jiuzhou Electric Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Jiuzhou Electric Group Co Ltd
Priority to CN201610635132.0A priority Critical patent/CN106294667A/en
Publication of CN106294667A publication Critical patent/CN106294667A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an ID3-based decision tree implementation method and device. The method includes: a matrix construction step, which reads in data to build an attribute matrix and a data matrix; a current-node determination step, which selects, as the current node, the attribute in the attribute matrix with the largest current information gain computed from the data matrix; a child-node determination step, which removes the attribute of the current node and reconstructs the attribute matrix, selects the attribute with the largest information gain computed from the reconstructed attribute matrix as the current child node, and repeats the child-node determination step with this child node as the current node; and a decision tree realization step, which realizes the decision tree from the current node and all of its corresponding child nodes. The invention can obtain decision knowledge fully, efficiently, practically and reliably, thereby realizing data mining.

Description

ID3-based decision tree implementation method and device
Technical field
The invention belongs to the field of information technology, and in particular relates to an ID3-based decision tree implementation method and device.
Background art
With the rapid progress of information technology, advances in data collection and data storage allow organizations of all kinds to accumulate massive amounts of data, and extracting useful information from such massive data has become a huge challenge. Because the data volume is so large, it cannot be handled with traditional data analysis tools and techniques. Sometimes, even when the data volume is relatively small, the data itself has non-traditional characteristics that traditional methods cannot process. In still other cases, the problems faced cannot be solved with existing data analysis techniques at all. New data processing methods therefore need to be developed.
Data mining is a technology that combines traditional data analysis methods with sophisticated algorithms for processing massive data in order to mine information from these large volumes of data. At present, the decision tree has become an important data mining method. In decision tree construction, the ID3 algorithm, proposed by Quinlan in 1986, is the most influential decision tree method. Quinlan described the decision tree and the related theory of the ID3 algorithm, and many experts and scholars have since studied decision trees in depth. In concrete practical applications, however, the ID3 algorithm exhibits certain defects when processing incomplete training data or ambiguous training data.
Summary of the invention
To solve the above problems, the invention provides an ID3-based decision tree implementation method and device for obtaining decision knowledge fully, efficiently, practically and reliably, thereby realizing data mining.
According to one aspect of the invention, an ID3-based decision tree implementation method is provided, including:
a matrix construction step, which reads in data to build an attribute matrix and a data matrix;
a current-node determination step, which selects, as the current node, the attribute in the attribute matrix with the largest current information gain computed from the data matrix;
a child-node determination step, which removes the attribute of the current node and reconstructs the attribute matrix, selects the attribute with the largest information gain computed from the reconstructed attribute matrix as the current child node, and repeats the child-node determination step with this child node as the current node;
a decision tree realization step, which realizes the decision tree from the current node and all of its corresponding child nodes.
According to one embodiment of the invention, the child-node determination step further includes:
judging whether the current child node is a leaf node; if so, judging whether this leaf node is an effective leaf node; otherwise removing this child node, reconstructing the attribute matrix and returning to the child-node determination step;
if this leaf node is an effective leaf node, forming a branch of the decision tree; otherwise removing this child node, reconstructing the attribute matrix and returning to the child-node determination step.
According to one embodiment of the invention, when the current child node is judged to be an effective leaf node, the support of the branch corresponding to this leaf node is also calculated.
According to one embodiment of the invention, for a removed child node, after the computation of the sibling child nodes at the same level has been completed, the already-computed child nodes are removed, the attribute matrix is reconstructed, and the child-node determination step is returned to.
According to one embodiment of the invention, the decision tree realization step further includes:
traversing and determining all sibling child nodes at the same level;
traversing and determining all sibling current nodes at the same level, wherein, if the current node corresponds to the last attribute, the decision tree is formed from all the child nodes and the current node; otherwise the attribute of this current node is removed, the attribute matrix is reconstructed, and the current-node determination step is returned to.
According to another aspect of the invention, an ID3-based decision tree realization device is also provided, including:
a matrix construction module, which reads in data to build an attribute matrix and a data matrix;
a current-node determination module, which selects, as the current node, the attribute in the attribute matrix with the largest current information gain computed from the data matrix;
a child-node determination module, which removes the attribute of the current node and reconstructs the attribute matrix, selects the attribute with the largest information gain computed from the reconstructed attribute matrix as the current child node, and reruns the child-node determination module with this child node as the current node;
a decision tree realization module, which realizes the decision tree from the current node and all of its corresponding child nodes.
According to one embodiment of the invention, the child-node determination module determines the child node in the following manner:
judging whether the current child node is a leaf node; if so, judging whether this leaf node is an effective leaf node; otherwise removing this child node, reconstructing the attribute matrix and returning to the child-node determination module;
if this leaf node is an effective leaf node, forming a branch of the decision tree; otherwise removing this child node, reconstructing the attribute matrix and returning to the child-node determination module.
According to one embodiment of the invention, the child-node determination module also calculates the support of the branch corresponding to the leaf node when the current child node is judged to be an effective leaf node.
According to one embodiment of the invention, for a child node removed by the child-node determination module, after the computation of the sibling child nodes at the same level has been completed, the already-computed child nodes are removed, the attribute matrix is reconstructed, and the child-node determination module is returned to.
According to one embodiment of the invention, the decision tree realization module realizes the decision tree in the following manner:
traversing and determining all sibling child nodes at the same level;
traversing and determining all sibling current nodes at the same level, wherein, if the current node corresponds to the last attribute, the decision tree is formed from all the child nodes and the current node; otherwise the attribute of this current node is removed, the attribute matrix is reconstructed, and the current-node determination module is returned to.
Beneficial effects of the invention:
On the one hand, the invention uses the idea of in-process pruning to remove incomplete decision knowledge and ambiguous decision knowledge; on the other hand, it uses the idea of in-process computation to complete the calculation of the support of each piece of decision knowledge during the process. It has high time and space efficiency, the mined knowledge converges effectively, and it better fits practical applications, so that decision knowledge can be obtained fully, efficiently, practically and reliably, thereby realizing data mining.
Other features and advantages of the invention will be set forth in the following description, will in part be apparent from the description, or will be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims and the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the accompanying drawings used in describing the embodiments are briefly introduced below:
Fig. 1 is a method flowchart according to an embodiment of the invention;
Fig. 2 is an algorithm flowchart according to an embodiment of the invention;
Fig. 3 is a screenshot of the actual program run showing the attribute matrix and the data matrix according to an embodiment of the invention;
Fig. 4 is a screenshot of the entropy calculation results according to an embodiment of the invention;
Fig. 5 is a screenshot of a matrix reconstruction result according to an embodiment of the invention;
Fig. 6 is a screenshot of another matrix reconstruction result according to an embodiment of the invention;
Fig. 7 is a screenshot of a decision output result according to an embodiment of the invention;
Fig. 8 is a screenshot of another decision output result according to an embodiment of the invention;
Fig. 9 is a screenshot of yet another decision output result according to an embodiment of the invention;
Fig. 10 is a schematic diagram of the result obtained by calculating support without in-process computation according to an embodiment of the invention;
Fig. 11 is a schematic diagram of the result obtained by calculating support with in-process computation according to an embodiment of the invention.
Detailed description of the embodiments
The embodiments of the invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve technical problems and achieve technical effects can be fully understood and implemented. It should be noted that, as long as no conflict arises, the embodiments of the invention and the features in the embodiments may be combined with each other, and the resulting technical solutions all fall within the protection scope of the invention.
The core idea of the classical ID3 algorithm is to use information gain as the attribute-selection metric and to split on the attribute with the largest information gain after splitting. The algorithm uses a top-down greedy search to traverse the space of possible decision trees. Specifically, the key issue for the ID3 algorithm is how to determine each branch node of the decision tree from the training samples. In ID3, the attribute-test criterion at each node is the information gain method: the attribute with the highest information gain is selected as the split attribute of the current node. In ID3 decision tree classification, each instance is usually described by several attributes (also called features), and each attribute is restricted to a set of discrete values in the data set.
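For reference, the quantities involved here are the standard ID3 definitions (not written out explicitly in this text): for a record set $D$ with decision-value proportions $p_k$ and a candidate split attribute $A$,

$$H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k, \qquad \mathrm{Gain}(D, A) = H(D) - \sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|}\, H(D_v),$$

where $D_v$ is the subset of $D$ in which $A$ takes the value $v$. Since $H(D)$ is the same for every candidate attribute, choosing the attribute with the smallest conditional entropy $\sum_v \frac{|D_v|}{|D|} H(D_v)$ is equivalent to choosing the one with the largest information gain; this is the "smallest entropy" selection rule used in the embodiments below.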
The ID3 decision tree algorithm is simple and has strong classification ability; it is suitable for large-scale machine learning problems and is an effective means in the fields of data mining and machine learning. The advantages of the ID3 algorithm are summarized as follows:
1) The ID3 decision tree algorithm is based on information theory and selects the split attribute node by information gain; the algorithm is simple and easy to implement, can form the classification decision tree with the smallest number of nodes, and is also complete over the search space;
2) Because the ID3 decision tree algorithm adopts a top-down recursive strategy and its time complexity is linear in the product of the training sample size, the number of attributes and the number of nodes, the classification speed of the algorithm is fast;
3) The ID3 decision tree algorithm uses a tree structure with hierarchical characteristics, so "if-then" classification rules that are easy for users to understand can readily be induced from the data.
However, the ID3 decision tree algorithm also has inherent shortcomings:
1) The ID3 decision tree algorithm is based on a greedy strategy; it cannot be trained on data incrementally and is therefore not suitable for incremental machine learning tasks;
2) The ID3 decision tree algorithm does not backtrack during the search, so while building the decision tree it may converge to a local optimum and can hardly converge to the global optimum;
3) The ID3 decision tree algorithm is based on information theory, and this way of computing is biased toward attributes with more distinct values, whereas attributes with more values are not necessarily better for classification performance;
4) When the training data is incomplete or ambiguous, the ID3 decision tree algorithm has certain flaws, so its application is also somewhat limited. For example, in the training data sample shown in Table 1, the 13th and 14th rows are identical in every condition item except the decision item; in this case, the classical ID3 algorithm will produce abnormal decision judgments, which will interfere with the decision knowledge formed subsequently.
Table 1
No. Lon Ran Azi Typ Judge
1 101 201 301 401 120
2 101 201 301 402 120
3 101 201 301 403 120
4 102 201 301 401 110
5 102 201 301 403 110
6 103 202 301 401 120
7 103 202 301 403 120
8 103 201 302 401 110
9 103 201 302 402 120
10 101 202 301 401 120
11 101 202 301 403 120
12 103 202 302 401 120
13 103 202 302 403 120
14 103 202 302 403 110
15 101 202 302 402 110
16 102 202 301 402 110
17 102 202 301 403 110
18 102 201 302 401 110
19 103 202 301 402 120
20 102 201 302 403 110
At present, improvements at home and abroad mostly address shortcomings 1), 2) and 3), while research on shortcoming 4) has to some extent been neglected. In summary, the approach mainly taken for shortcoming 4) is pruning, including pre-pruning and post-pruning. In pre-pruning, growth of the tree is stopped early so that branches are pruned once and the node becomes a leaf of the tree; this leaf holds the most frequent class in the sample subset, but because it is not a leaf node reached by a judgment condition, the decision condition is meaningless in many practical applications. In post-pruning, branches are pruned from the fully grown tree, which is relatively inefficient.
Therefore, the invention provides an ID3-based decision tree implementation method for obtaining decision knowledge fully, efficiently, practically and reliably, so as to realize data mining. Fig. 1 shows a method flowchart according to an embodiment of the invention and Fig. 2 shows an algorithm flowchart according to an embodiment of the invention; the invention is described in detail below with reference to Fig. 1 and Fig. 2.
First comes step S110, the matrix construction step, which reads in data to build the attribute matrix and the data matrix.
Specifically, in this step data is first read in from the data source, and the attribute matrix and the data matrix are constructed from the data that has been read in. Fig. 3 is a screenshot of the actual run of constructing the attribute matrix and the data matrix, in which Azi, Lon, Ran and Tye denote the attributes in the attribute matrix; <Azi, 0>, <Lon, 0>, <Ran, 0>, <Tye, 0> form the attribute matrix; and <Azi, 301>, <Judge, 120>, <Lon, 101>, <Ran, 201>, <Tye, 401>, ... represent the data matrix. This is equivalent to matching attribute names with attribute values and decision items with decision values, which together form one large data matrix, serving as the data preprocessing work for the subsequent program run.
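As an illustration only, a minimal sketch of this kind of attribute-matrix and data-matrix construction is given below; the helper name build_matrices, the dict-based record format and the attribute spelling (Typ as in Table 1 versus Tye in Fig. 3) are assumptions for the example, not taken from the patent.

```python
# Minimal sketch: build an "attribute matrix" of <attribute, 0> entries and a
# "data matrix" of <name, value> pairs per record from tabular training data.
ATTRIBUTES = ["Lon", "Ran", "Azi", "Typ"]   # condition items (assumed spelling)
DECISION = "Judge"                          # decision item

def build_matrices(rows):
    """rows: list of dicts such as {"Lon": 101, "Ran": 201, "Azi": 301, "Typ": 401, "Judge": 120}."""
    attribute_matrix = [(name, 0) for name in ATTRIBUTES]
    data_matrix = [[(name, row[name]) for name in ATTRIBUTES + [DECISION]] for row in rows]
    return attribute_matrix, data_matrix

# Example with the first record of Table 1:
rows = [{"Lon": 101, "Ran": 201, "Azi": 301, "Typ": 401, "Judge": 120}]
attr_m, data_m = build_matrices(rows)
print(attr_m)   # [('Lon', 0), ('Ran', 0), ('Azi', 0), ('Typ', 0)]
print(data_m)   # [[('Lon', 101), ('Ran', 201), ('Azi', 301), ('Typ', 401), ('Judge', 120)]]
```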
Next comes step S120, the current-node determination step, which selects, as the current node, the attribute in the attribute matrix with the largest current information gain computed from the data matrix.
Specifically, the information gain of each attribute in the attribute matrix is calculated from the data matrix. A screenshot of part of the actual program run is shown in Fig. 4. In this example, the entropy of attribute Azi is 0.932751, the entropy of attribute Lon is 0.519518, the entropy of attribute Ran is 0.966097, and the entropy of attribute Tye is 0.987567. Based on the calculated entropy of each attribute, the attribute with the smallest entropy, namely the attribute Lon with the largest information gain, is chosen as the current node Nd.
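A minimal sketch of this entropy-based selection is given below, assuming the training records are held as dicts keyed by attribute name (as in the previous sketch); the function names are illustrative and not taken from the patent. The "entropy of an attribute" quoted above corresponds to the weighted conditional entropy computed here.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of a list of decision values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def conditional_entropy(rows, attribute, decision="Judge"):
    """Weighted entropy of the decision item after splitting on `attribute`."""
    total = len(rows)
    result = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[decision] for r in rows if r[attribute] == value]
        result += len(subset) / total * entropy(subset)
    return result

def best_attribute(rows, attributes, decision="Judge"):
    """Attribute with the smallest conditional entropy, i.e. the largest information gain."""
    return min(attributes, key=lambda a: conditional_entropy(rows, a, decision))
```

Applied to the 20 records of Table 1, conditional_entropy(rows, "Lon") evaluates to about 0.5195, the smallest value among the four attributes, so best_attribute picks Lon, matching the choice of the current node Nd above.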
It should be noted that the decision tree is sequential (trunk, boughs, twigs, leaves); that is to say, once the four attribute items Azi, Lon, Ran and Tye have been ordered by their information gain, the order between them cannot be changed arbitrarily.
Next comes step S130, the child-node determination step, which removes the attribute of the current node and reconstructs the attribute matrix, selects the attribute with the largest information gain computed from the reconstructed attribute matrix as the current child node, and repeats the child-node determination step with this child node as the current node.
Specifically, the attribute of the current node is removed and the attribute matrix is reconstructed, the information gain of each attribute in the reconstructed attribute matrix is calculated, and the attribute with the largest information gain is selected as the child node.
Screenshots of part of the actual program run corresponding to the child-node attributes are shown in Fig. 5 and Fig. 6. In this example, the data matrices for attribute Lon equal to 101, 102 and 103 are as shown in the figures, and the attribute among them with the largest information gain is selected as the child node. As shown in the screenshot in Fig. 4, in this example the entropy of the next-level attribute Azi under Lon is 0, the entropy of attribute Ran is 0.459148, and the entropy of attribute Tye is 0.333333. The attribute with the smallest entropy, namely the attribute Azi with the largest information gain, is chosen as the next-level node (the child node).
The child-node determination step further includes judging whether the current child node is a leaf node; if so, it is further judged whether this leaf node is an effective leaf node. Otherwise this child node is removed, the attribute matrix is reconstructed, and the process returns to the child-node determination step; that is, this child node can still be split into next-level nodes.
If this leaf node is an effective leaf node, a branch of the decision tree is formed; that is, since this node is already a leaf node, this branch of the decision tree has been formed and there is no need to waste further processor resources on it. Otherwise, that is, if this child node can still be split further, this child node is removed, the attribute matrix is reconstructed, and the process returns to the child-node determination step. The removed node is not a useless node; it is stored separately in a buffer and does not take part in the subsequent calculations, which reduces the input of the subsequent calculations and improves computational efficiency. After the subsequent calculations are completed, the node is placed back in front, so that the decision tree branch is formed. In other words, for a removed child node, after the computation of the sibling child nodes at the same level has been completed, the already-computed child nodes are removed, the attribute matrix is reconstructed, and the process returns to the child-node determination step.
An effective leaf node means, on the one hand, that the very last decision item can correspond to a value of Judge; if it cannot, it is certainly not a leaf node, because the final purpose of the decision tree (the leaves) must be a conclusion item. On the other hand, it concerns whether the leaf node conflicts: if condition A=x can yield B=-1 and condition A=x can simultaneously also yield B=1, this condition A=x is obviously meaningless, because it leads to conflicting conclusions. When the current child node is judged to be an effective leaf node, the support of the branch corresponding to this leaf node is also calculated.
Judging whether a child node is a leaf node and an effective leaf node is precisely the judgment of whether to perform in-process pruning. For example, since the data of the 13th and 14th rows of Table 1 produce contradictory decisions, a flag indicating that the judgment result is invalid can be obtained directly during the judging process of the program run, so this decision information is simply not recorded; this is in-process pruning. In-process pruning not only greatly reduces the amount of data processing, but also removes incomplete decision knowledge and ambiguous decision knowledge.
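As an illustration of this in-process pruning judgment, the sketch below classifies a partition of training records reached by a sequence of attribute tests; the function name and return labels are assumptions for the example, not part of the patent.

```python
def leaf_status(rows, remaining_attributes, decision="Judge"):
    """Classify a partition reached by a sequence of attribute tests.

    "effective_leaf": all records agree on the decision value -> forms a branch;
    "conflict": no attributes are left to split on, yet the decision values still
                disagree -> the branch is pruned in-process and not recorded;
    "splittable": the partition can still be split on a remaining attribute.
    """
    values = set(r[decision] for r in rows)
    if len(values) == 1:
        return "effective_leaf"
    if not remaining_attributes:
        return "conflict"
    return "splittable"

# Rows 13 and 14 of Table 1: identical condition items, different Judge values.
partition = [
    {"Lon": 103, "Ran": 202, "Azi": 302, "Typ": 403, "Judge": 120},
    {"Lon": 103, "Ran": 202, "Azi": 302, "Typ": 403, "Judge": 110},
]
print(leaf_status(partition, remaining_attributes=[]))   # -> "conflict"
```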
Finally comes step S140, the decision tree realization step, which realizes the decision tree from the current node and all of its corresponding child nodes.
As shown in Fig. 2, this includes having traversed and determined all sibling child nodes at the same level and having traversed and determined all sibling current nodes at the same level, wherein, if the current node corresponds to the last attribute, the decision tree is formed from all the child nodes and the current node; otherwise the attribute of this current node is removed, the attribute matrix is reconstructed, and the process returns to the current-node determination step.
Specifically, the following steps are included. It is first judged whether the child nodes of the current node have all been traversed, i.e. whether all child nodes have undergone the above leaf-node and effective-leaf-node judgments. If so, it is further judged whether the attribute corresponding to the current node Nd is the last attribute; if it is the last attribute, the decision tree is formed (screenshots of part of the actual program run are shown in Fig. 7, Fig. 8 and Fig. 9); otherwise the attribute corresponding to the current node Nd is deleted, the attribute matrix is reconstructed, and the process returns to the information gain calculation step. If it is judged that the child-level attributes have not all been traversed, the information gain of each attribute is recalculated after deleting the child-level attribute and reconstructing the attribute matrix.
In the process of traversing the child nodes to form the decision tree, by judging whether the child nodes have been traversed, the child nodes that have already undergone the leaf-node and effective-leaf-node judgments can be excluded, and only the remaining child nodes are subjected to attribute reconstruction and the leaf-node and effective-leaf-node judgments, thereby reducing the amount of data operations. By judging whether the attribute corresponding to the current node Nd is the last attribute, the attribute corresponding to this current node Nd and the attributes corresponding to its child nodes can be excluded, and only the other sibling node attributes at the same level as the current node Nd are analyzed, thereby reducing the amount of data operations.
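Putting the preceding sketches together, the following compact recursive sketch shows one way the overall realization loop could be organized under the same assumptions (records as dicts, Judge as the decision item); all names are illustrative, and the sibling-buffering optimization described above is omitted for brevity.

```python
import math
from collections import Counter

def _entropy(values):
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def _cond_entropy(rows, attr, decision):
    total = len(rows)
    return sum(len(sub) / total * _entropy([r[decision] for r in sub])
               for v in set(r[attr] for r in rows)
               for sub in [[r for r in rows if r[attr] == v]])

def build_tree(rows, attributes, decision="Judge", path=()):
    """Return the decision knowledge as a list of (path, decision value, support) branches."""
    judges = set(r[decision] for r in rows)
    if len(judges) == 1:                      # effective leaf: record the branch and its support
        return [(path, judges.pop(), len(rows))]
    if not attributes:                        # conflicting decisions: pruned in-process
        return []
    best = min(attributes, key=lambda a: _cond_entropy(rows, a, decision))
    rest = [a for a in attributes if a != best]
    branches = []
    for v in set(r[best] for r in rows):      # traverse all sibling child nodes under `best`
        subset = [r for r in rows if r[best] == v]
        branches += build_tree(subset, rest, decision, path + ((best, v),))
    return branches
```

On Table 1 this recursion isolates the contradictory pair in rows 13 and 14 in a partition with no attributes left and two Judge values, so that pair is dropped instead of being emitted as decision knowledge.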
In one embodiment of the invention, after a child node is judged to be an effective leaf node, a support calculation step for the corresponding branch is also included: after the branch of the decision tree has been formed, the support of this branch is obtained synchronously, i.e. in-process computation. A screenshot of part of the actual program run is shown in Fig. 4. In this example, the first piece of decision knowledge output, namely the first path from the root node of the decision tree to a leaf node, is 0, 101, 301 -> 120; the node levels in the decision tree are 0, 1 and 2 respectively; and the support count of this branch is 5.
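A minimal sketch of such a support count is given below, assuming a branch is represented by the attribute-value tests on its path; the helper name and path encoding are assumptions for the example.

```python
def branch_support(rows, path):
    """Number of training records matching every attribute-value test on the branch `path`."""
    return sum(all(r[a] == v for a, v in path.items()) for r in rows)

sample = [
    {"Lon": 101, "Azi": 301, "Judge": 120},
    {"Lon": 101, "Azi": 302, "Judge": 110},
    {"Lon": 102, "Azi": 301, "Judge": 110},
]
print(branch_support(sample, {"Lon": 101, "Azi": 301}))   # -> 1
# Over the full 20 records of Table 1, the path {"Lon": 101, "Azi": 301} matches
# 5 records, consistent with the support count of 5 quoted for the 0, 101, 301 -> 120 branch.
```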
Taking the training data sample shown in Table 1 as an example, the invention was tested under a development environment with a 3.4 GHz CPU, 4 GB of memory, Windows XP SP3 and VC2008, and a timing routine was added to time the run results. Fig. 10 shows the result obtained by calculating the support without in-process computation, and Fig. 11 shows the result obtained by calculating the support with in-process computation.
Comparing Fig. 10 and Fig. 11 shows that the invention uses the idea of in-process pruning to effectively remove incomplete decision knowledge and ambiguous decision knowledge; at the same time, it uses the idea of in-process computation to complete the calculation of the support of each piece of decision knowledge accurately and efficiently during the process. Since this example is limited by the data volume, when the data volume is larger the operating efficiency of the invention is obviously improved over traditional methods. In addition, the invention has an interface for, and the capability of, fusing with external data, and can therefore support data fusion across multiple mining means.
The invention can be widely applied to early-warning detection, identification and monitoring in national defense systems, and to supervision and flow control in the civil aviation and general aviation fields; through mining analysis of historical and real-time data, it provides decision support for optimizing regional management, monitoring and control for operational command. In addition, the invention can also be applied to medicine, business intelligence, web search and other trades and professions of the national economy, providing a reference for relevant personnel to formulate more reasonable, clear and wise decisions.
According to another aspect of the invention, an ID3-based decision tree realization device is also provided, including a matrix construction module, a current-node determination module, a child-node determination module and a decision tree realization module.
The matrix construction module reads in data to build the attribute matrix and the data matrix. The current-node determination module selects, as the current node, the attribute in the attribute matrix with the largest information gain computed from the data matrix. The child-node determination module removes the attribute of the current node and reconstructs the attribute matrix, selects the attribute with the largest information gain computed from the reconstructed attribute matrix as the current child node, and reruns the child-node determination module with this child node as the current node. The decision tree realization module realizes the decision tree from the current node and all of its corresponding child nodes.
In one embodiment of the invention, the child-node determination module determines the child node in the following manner:
judging whether the current child node is a leaf node; if so, judging whether this leaf node is an effective leaf node; otherwise removing this child node, reconstructing the attribute matrix and returning to the child-node determination module;
if this leaf node is an effective leaf node, forming a branch of the decision tree; otherwise removing this child node, reconstructing the attribute matrix and returning to the child-node determination module.
In one embodiment of the invention, the child-node determination module also calculates the support of the branch corresponding to the leaf node when the current child node is judged to be an effective leaf node.
In one embodiment of the invention, for a child node removed by the child-node determination module, after the computation of the sibling child nodes at the same level has been completed, the already-computed child nodes are removed, the attribute matrix is reconstructed, and the child-node determination module is returned to.
In one embodiment of the invention, the decision tree realization module realizes the decision tree in the following manner:
traversing and determining all sibling child nodes at the same level;
traversing and determining all sibling current nodes at the same level, wherein, if the current node corresponds to the last attribute, the decision tree is formed from all the child nodes and the current node; otherwise the attribute of this current node is removed, the attribute matrix is reconstructed, and the current-node determination module is returned to.
Although the embodiments disclosed above have been described, the content is provided only to facilitate understanding of the invention and its implementation, and is not intended to limit the invention. Any person skilled in the technical field to which the invention belongs may make modifications and changes in the form and details of the implementation without departing from the spirit and scope disclosed by the invention; however, the scope of patent protection of the invention is still defined by the appended claims.

Claims (10)

1. An ID3-based decision tree implementation method, including:
a matrix construction step, which reads in data to build an attribute matrix and a data matrix;
a current-node determination step, which selects, as the current node, the attribute in the attribute matrix with the largest current information gain computed from the data matrix;
a child-node determination step, which removes the attribute of the current node and reconstructs the attribute matrix, selects the attribute with the largest information gain computed from the reconstructed attribute matrix as the current child node, and repeats the child-node determination step with this child node as the current node;
a decision tree realization step, which realizes the decision tree from the current node and all of its corresponding child nodes.
2. The method according to claim 1, characterized in that the child-node determination step further includes:
judging whether the current child node is a leaf node; if so, judging whether this leaf node is an effective leaf node; otherwise removing this child node, reconstructing the attribute matrix and returning to the child-node determination step;
if this leaf node is an effective leaf node, forming a branch of the decision tree; otherwise removing this child node, reconstructing the attribute matrix and returning to the child-node determination step.
3. The method according to claim 2, characterized in that, when the current child node is judged to be an effective leaf node, the support of the branch corresponding to this leaf node is also calculated.
4. The method according to claim 2, characterized in that, for a removed child node, after the computation of the sibling child nodes at the same level has been completed, the already-computed child nodes are removed, the attribute matrix is reconstructed, and the child-node determination step is returned to.
5. The method according to claim 4, characterized in that the decision tree realization step further includes:
traversing and determining all sibling child nodes at the same level;
traversing and determining all sibling current nodes at the same level, wherein, if the current node corresponds to the last attribute, the decision tree is formed from all the child nodes and the current node; otherwise the attribute of this current node is removed, the attribute matrix is reconstructed, and the current-node determination step is returned to.
6. An ID3-based decision tree realization device, including:
a matrix construction module, which reads in data to build an attribute matrix and a data matrix;
a current-node determination module, which selects, as the current node, the attribute in the attribute matrix with the largest current information gain computed from the data matrix;
a child-node determination module, which removes the attribute of the current node and reconstructs the attribute matrix, selects the attribute with the largest information gain computed from the reconstructed attribute matrix as the current child node, and reruns the child-node determination module with this child node as the current node;
a decision tree realization module, which realizes the decision tree from the current node and all of its corresponding child nodes.
7. The device according to claim 6, characterized in that the child-node determination module determines the child node in the following manner:
judging whether the current child node is a leaf node; if so, judging whether this leaf node is an effective leaf node; otherwise removing this child node, reconstructing the attribute matrix and returning to the child-node determination module;
if this leaf node is an effective leaf node, forming a branch of the decision tree; otherwise removing this child node, reconstructing the attribute matrix and returning to the child-node determination module.
8. The device according to claim 7, characterized in that the child-node determination module also calculates the support of the branch corresponding to the leaf node when the current child node is judged to be an effective leaf node.
9. The device according to claim 7, characterized in that, for a child node removed by the child-node determination module, after the computation of the sibling child nodes at the same level has been completed, the already-computed child nodes are removed, the attribute matrix is reconstructed, and the child-node determination module is returned to.
10. The device according to claim 9, characterized in that the decision tree realization module realizes the decision tree in the following manner:
traversing and determining all sibling child nodes at the same level;
traversing and determining all sibling current nodes at the same level, wherein, if the current node corresponds to the last attribute, the decision tree is formed from all the child nodes and the current node; otherwise the attribute of this current node is removed, the attribute matrix is reconstructed, and the current-node determination module is returned to.
CN201610635132.0A 2016-08-05 2016-08-05 ID3-based decision tree implementation method and device Pending CN106294667A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610635132.0A CN106294667A (en) 2016-08-05 2016-08-05 ID3-based decision tree implementation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610635132.0A CN106294667A (en) 2016-08-05 2016-08-05 ID3-based decision tree implementation method and device

Publications (1)

Publication Number Publication Date
CN106294667A true CN106294667A (en) 2017-01-04

Family

ID=57665564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610635132.0A Pending CN106294667A (en) 2016-08-05 2016-08-05 ID3-based decision tree implementation method and device

Country Status (1)

Country Link
CN (1) CN106294667A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107678531A (en) * 2017-09-30 2018-02-09 广东欧珀移动通信有限公司 Application cleaning method and device, storage medium and electronic device
US11422831B2 (en) 2017-09-30 2022-08-23 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Application cleaning method, storage medium and electronic device
CN107894827A (en) * 2017-10-31 2018-04-10 广东欧珀移动通信有限公司 Application cleaning method and device, storage medium and electronic device
CN107894827B (en) * 2017-10-31 2020-07-07 Oppo广东移动通信有限公司 Application cleaning method and device, storage medium and electronic equipment
CN107943537A (en) * 2017-11-14 2018-04-20 广东欧珀移动通信有限公司 Application cleaning method and device, storage medium and electronic device
CN107943537B (en) * 2017-11-14 2020-01-14 Oppo广东移动通信有限公司 Application cleaning method and device, storage medium and electronic equipment
CN109961075A (en) * 2017-12-22 2019-07-02 广东欧珀移动通信有限公司 User gender prediction method, apparatus, medium and electronic equipment
CN108170769A (en) * 2017-12-26 2018-06-15 上海大学 Assembly manufacturing quality data processing method based on decision tree algorithm
CN109150845A (en) * 2018-07-26 2019-01-04 曙光信息产业(北京)有限公司 Method and system for monitoring terminal traffic
CN109614415A (en) * 2018-09-29 2019-04-12 阿里巴巴集团控股有限公司 Data mining and processing method, device, equipment and medium


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20170104