CN106294667A - A kind of decision tree implementation method based on ID3 and device - Google Patents
- Publication number
- CN106294667A (application CN201610635132.0A)
- Authority
- CN
- China
- Prior art keywords
- nodes
- sub
- attribute
- matrix
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Abstract
The invention discloses an ID3-based decision tree implementation method and device. The method includes: a matrix construction step, which reads in data to build an attribute matrix and a data matrix; a current-node determination step, which selects as the current node the attribute with the maximum information gain in the attribute matrix, computed from the data matrix; a child-node determination step, which removes the current node's attribute, reconstructs the attribute matrix, selects as the current child node the attribute with the maximum information gain computed from the reconstructed attribute matrix, and repeats the child-node determination step with this child node as the current node; and a decision tree realization step, which realizes the decision tree from the current node and all its corresponding child nodes. The invention can obtain decision knowledge fully, efficiently, practically and reliably, thereby realizing data mining.
Description
Technical field
The invention belongs to the field of information technology, and in particular relates to an ID3-based decision tree implementation method and device.
Background art
With the rapid development of information technology, advances in data collection and data storage have enabled organizations to accumulate massive amounts of data. Extracting useful information from such data has become a huge challenge. Often the data volume is so large that traditional data analysis tools and techniques cannot process it. Sometimes, even when the volume is relatively small, the data has non-traditional characteristics that traditional methods cannot handle. In still other cases, the problem at hand cannot be solved with existing data analysis techniques at all. New data-processing methods are therefore needed.
Data mining is a technology that combines traditional data analysis methods with sophisticated algorithms for processing massive data, in order to mine the information hidden inside that data. The decision tree has become an important data mining method. Among decision tree construction algorithms, ID3, proposed by Quinlan in 1986, is the most influential. Quinlan set out the theory of decision trees and of the ID3 algorithm, and many experts and scholars have since studied decision trees in depth. In concrete practical applications, however, the ID3 algorithm has certain defects when processing incomplete or ambiguous training data.
Summary of the invention
To solve the above problems, the invention provides an ID3-based decision tree implementation method and device for obtaining decision knowledge fully, efficiently, practically and reliably, thereby realizing data mining.
According to one aspect of the invention, an ID3-based decision tree implementation method is provided, including:
a matrix construction step: reading in data to build an attribute matrix and a data matrix;
a current-node determination step: selecting as the current node the attribute with the maximum information gain in the attribute matrix, computed from the data matrix;
a child-node determination step: removing the current node's attribute and reconstructing the attribute matrix, selecting as the current child node the attribute with the maximum information gain computed from the reconstructed attribute matrix, and repeating the child-node determination step with this child node as the current node;
a decision tree realization step: realizing the decision tree from the current node and all its corresponding child nodes.
According to one embodiment of the invention, the child-node determination step further includes:
judging whether the current child node is a leaf node; if so, judging whether the leaf node is an effective leaf node, otherwise rejecting the child node, reconstructing the attribute matrix and returning to the child-node determination step;
if the leaf node is an effective leaf node, forming a branch of the decision tree; otherwise rejecting the child node, reconstructing the attribute matrix and returning to the child-node determination step.
According to one embodiment of the invention, the method also includes computing the support of the corresponding branch when the current child node is judged to be an effective leaf node.
According to one embodiment of the invention, a rejected child node waits until its sibling child nodes at the same level have been computed; the already-computed child nodes are then removed, the attribute matrix is reconstructed, and the child-node determination step is re-entered.
According to one embodiment of the invention, the decision tree realization step further includes:
traversing to determine all sibling child nodes at the same level;
traversing to determine all sibling current nodes at the same level, wherein if the current node corresponds to the last attribute, the decision tree is formed from all child nodes and current nodes; otherwise the current node's attribute is rejected, the attribute matrix is reconstructed, and the current-node determination step is re-entered.
According to another aspect of the invention, an ID3-based decision tree implementation device is also provided, including:
a matrix construction module, which reads in data to build an attribute matrix and a data matrix;
a current-node determination module, which selects as the current node the attribute with the maximum information gain in the attribute matrix, computed from the data matrix;
a child-node determination module, which removes the current node's attribute and reconstructs the attribute matrix, selects as the current child node the attribute with the maximum information gain computed from the reconstructed attribute matrix, and reruns with this child node as the current node;
a decision tree realization module, which realizes the decision tree from the current node and all its corresponding child nodes.
According to one embodiment of the invention, the child-node determination module determines child nodes as follows:
judging whether the current child node is a leaf node; if so, judging whether the leaf node is an effective leaf node, otherwise rejecting the child node, reconstructing the attribute matrix and returning to the child-node determination module;
if the leaf node is an effective leaf node, forming a branch of the decision tree; otherwise rejecting the child node, reconstructing the attribute matrix and returning to the child-node determination module.
According to one embodiment of the invention, the child-node determination module also computes the support of the corresponding branch when the current child node is judged to be an effective leaf node.
According to one embodiment of the invention, a child node rejected by the child-node determination module waits until its sibling child nodes at the same level have been computed; the already-computed child nodes are then removed, the attribute matrix is reconstructed, and the child-node determination module is re-entered.
According to one embodiment of the invention, the decision tree realization module realizes the decision tree as follows:
traversing to determine all sibling child nodes at the same level;
traversing to determine all sibling current nodes at the same level, wherein if the current node corresponds to the last attribute, the decision tree is formed from all child nodes and current nodes; otherwise the current node's attribute is rejected, the attribute matrix is reconstructed, and the current-node determination module is re-entered.
Beneficial effects of the invention:
On the one hand, the invention uses in-process pruning to weed out incomplete decision knowledge and ambiguous decision knowledge; on the other hand, it uses in-process computing to complete the calculation of the support of each item of decision knowledge during construction. It offers higher time and space efficiency, effective convergence of the mined knowledge, and a better fit to practical applications, and can obtain decision knowledge fully, efficiently, practically and reliably, thereby realizing data mining.
Other features and advantages of the invention will be set out in the following description, will in part become apparent from the description, or will be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims and the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings required for describing the embodiments are briefly introduced below:
Fig. 1 is a method flow chart according to an embodiment of the invention;
Fig. 2 is an algorithm flow chart according to an embodiment of the invention;
Fig. 3 is a screenshot of an actual program run showing the attribute matrix and the data matrix, according to an embodiment of the invention;
Fig. 4 is a screenshot of entropy calculation results according to an embodiment of the invention;
Fig. 5 is a screenshot of a matrix reconstruction result according to an embodiment of the invention;
Fig. 6 is a screenshot of another matrix reconstruction result according to an embodiment of the invention;
Fig. 7 is a screenshot of a decision output result according to an embodiment of the invention;
Fig. 8 is a screenshot of another decision output result according to an embodiment of the invention;
Fig. 9 is a screenshot of yet another decision output result according to an embodiment of the invention;
Fig. 10 is a screenshot of the result obtained without using in-process computing of the support, according to an embodiment of the invention;
Fig. 11 is a screenshot of the result obtained using in-process computing of the support, according to an embodiment of the invention.
Detailed description of the invention
Embodiments of the invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve technical problems and achieve technical effects can be fully understood and implemented. Provided no conflict arises, the embodiments of the invention and the individual features of each embodiment may be combined with one another, and the resulting technical solutions all fall within the protection scope of the invention.
The core idea of the classical ID3 algorithm is to use information gain as the attribute-selection metric, splitting on the attribute with the maximum information gain. The algorithm traverses the space of possible decision trees with a top-down greedy search. Specifically, the key problem for ID3 is how to determine each branch node of the decision tree from the training samples. At each node, ID3 uses information gain as the attribute test criterion, choosing the attribute with the highest information gain as the split attribute of the current node. In ID3 decision tree classification, each entity is typically described by several attributes (also called features), and each attribute is restricted to a set of discrete values in the data set.
The ID3 decision tree algorithm is simple, has strong classification capability, and is suited to processing large-scale machine learning problems; it is an effective tool in data mining and machine learning. Its advantages can be summarized as follows:
1) ID3 is based on information theory and selects split attributes by information gain; the algorithm is simple and easy to implement, can form the classification decision tree with the fewest nodes, and is complete over the search space;
2) Because ID3 uses a top-down recursive strategy, its time complexity is linear in the product of the training sample size, the number of attributes and the number of nodes, so its classification speed is fast;
3) ID3 uses a tree structure with hierarchical features, from which "if-then" classification rules that are easy for users to understand can be readily induced from the data.
However, the ID3 decision tree algorithm also has intrinsic shortcomings:
1) ID3 is based on a greedy strategy: it cannot be trained incrementally and is unsuited to incremental machine learning tasks;
2) ID3 does not backtrack during its search, so while building the decision tree it can converge to a local optimum but can hardly converge to the global optimum;
3) Being based on information gain, ID3 is biased toward attributes with many distinct values in the data, yet such attributes are not necessarily best for classification performance;
4) ID3 is flawed when the training data is incomplete or ambiguous, which limits its applicability. For example, in the training data sample shown in Table 1, rows 13 and 14 are identical in every condition item but differ in the decision item. Here the classical ID3 algorithm produces an abnormal decision judgment, which interferes with the decision knowledge formed subsequently.
Table 1
No. | Lon | Ran | Azi | Typ | Judge
1 | 101 | 201 | 301 | 401 | 120 |
2 | 101 | 201 | 301 | 402 | 120 |
3 | 101 | 201 | 301 | 403 | 120 |
4 | 102 | 201 | 301 | 401 | 110 |
5 | 102 | 201 | 301 | 403 | 110 |
6 | 103 | 202 | 301 | 401 | 120 |
7 | 103 | 202 | 301 | 403 | 120 |
8 | 103 | 201 | 302 | 401 | 110 |
9 | 103 | 201 | 302 | 402 | 120 |
10 | 101 | 202 | 301 | 401 | 120 |
11 | 101 | 202 | 301 | 403 | 120 |
12 | 103 | 202 | 302 | 401 | 120 |
13 | 103 | 202 | 302 | 403 | 120 |
14 | 103 | 202 | 302 | 403 | 110 |
15 | 101 | 202 | 302 | 402 | 110 |
16 | 102 | 202 | 301 | 402 | 110 |
17 | 102 | 202 | 301 | 403 | 110 |
18 | 102 | 201 | 302 | 401 | 110 |
19 | 103 | 202 | 301 | 402 | 120 |
20 | 102 | 201 | 302 | 403 | 110 |
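The contradiction between rows 13 and 14 can be detected mechanically by grouping rows on their condition items and checking whether any group maps to more than one decision value. A minimal sketch (not from the patent), using only rows 12-14 of Table 1:

```python
from collections import defaultdict

# Rows 12-14 of Table 1: (Lon, Ran, Azi, Typ, Judge)
rows = [
    (103, 202, 302, 401, 120),  # row 12
    (103, 202, 302, 403, 120),  # row 13
    (103, 202, 302, 403, 110),  # row 14: same conditions as row 13, different Judge
]

decisions = defaultdict(set)
for *cond, judge in rows:
    decisions[tuple(cond)].add(judge)

# Condition tuples mapped to more than one decision are ambiguous; classical
# ID3 has no mechanism for discarding them, which is the defect at issue here.
conflicts = [c for c, js in decisions.items() if len(js) > 1]
```

Here `conflicts` contains exactly the shared condition tuple of rows 13 and 14.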
At present, research at home and abroad has mostly addressed shortcomings 1), 2) and 3), while research on shortcoming 4) has to some extent been neglected. In summary, the main approach taken to shortcoming 4) is pruning, including pre-pruning and post-pruning. In pre-pruning, branch trimming is stopped early by halting tree construction in advance: the node then becomes a leaf of the tree, holding the most frequent class among the subset samples. But because such a leaf is not a leaf node of the judgment condition, the decision condition is meaningless in many practical applications. Post-pruning trims branches from the fully grown tree, and its efficiency is relatively low.
Therefore, the invention provides an ID3-based decision tree implementation method for obtaining decision knowledge fully, efficiently, practically and reliably, thereby realizing data mining. Fig. 1 shows a method flow chart according to an embodiment of the invention, and Fig. 2 shows an algorithm flow chart according to an embodiment of the invention; the invention is described in detail below with reference to Fig. 1 and Fig. 2.
The first step is S110, the matrix construction step: reading in data to build the attribute matrix and the data matrix.
Specifically, in this step data is first read in from the data source, and the attribute matrix and the data matrix are constructed from the data read in. Fig. 3 is a screenshot of the actual run building the attribute matrix and the data matrix, where Azi, Lon, Ran and Tye denote the attributes in the attribute matrix; <Azi, 0>, <Lon, 0>, <Ran, 0>, <Tye, 0> form the attribute matrix; and <Azi, 301>, <Judge, 120>, <Lon, 101>, <Ran, 201>, <Tye, 401>, ... represent the data matrix. This is equivalent to matching attribute names with attribute values and decision items with decision values, together forming one large data matrix - the data preprocessing for the subsequent program run.
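The exact in-memory layout is only shown as a screenshot (Fig. 3), so the following is a guess at its spirit rather than the patented representation: attribute names paired with an initial level of 0 for the attribute matrix, and <name, value> pairs per record for the data matrix. The records and field names follow Table 1.

```python
# Two illustrative records in the shape of Table 1 (condition attributes
# first, decision item "Judge" last).
records = [
    {"Lon": 101, "Ran": 201, "Azi": 301, "Typ": 401, "Judge": 120},
    {"Lon": 102, "Ran": 201, "Azi": 301, "Typ": 401, "Judge": 110},
]

# Attribute matrix: <attribute, level> pairs, level 0 before any split,
# mirroring the <Azi, 0>, <Lon, 0>, ... pairs of Fig. 3.
attribute_matrix = [(name, 0) for name in records[0] if name != "Judge"]

# Data matrix: every record flattened to <name, value> pairs, decision item
# included, mirroring <Azi, 301>, <Judge, 120>, ... of Fig. 3.
data_matrix = [list(rec.items()) for rec in records]
```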
Next is S120, the current-node determination step: selecting as the current node the attribute with the maximum information gain in the attribute matrix, computed from the data matrix.
Specifically, the information gain of each attribute in the attribute matrix is computed from the data matrix. A screenshot of part of the actual run is shown in Fig. 4: in this example, the entropy of attribute Azi is 0.932751, the entropy of attribute Lon is 0.519518, the entropy of attribute Ran is 0.966097, and the entropy of attribute Tye is 0.987567. From the computed entropies, the attribute with the minimum entropy - that is, with the maximum information gain - is chosen as the current node Nd; here that is Lon.
Note that the decision tree is ordered (trunk, boughs, twigs, leaves): once fixed by the information-gain ranking, the order among the four attribute items Azi, Lon, Ran and Tye cannot be changed arbitrarily.
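The entropies quoted from Fig. 4 can be reproduced directly from Table 1. A sketch of the S120 calculation follows (the attribute named Tye in the screenshots appears as Typ in Table 1; Typ is used here):

```python
import math
from collections import Counter, defaultdict

# Table 1: (Lon, Ran, Azi, Typ, Judge), one tuple per row.
ROWS = [
    (101, 201, 301, 401, 120), (101, 201, 301, 402, 120),
    (101, 201, 301, 403, 120), (102, 201, 301, 401, 110),
    (102, 201, 301, 403, 110), (103, 202, 301, 401, 120),
    (103, 202, 301, 403, 120), (103, 201, 302, 401, 110),
    (103, 201, 302, 402, 120), (101, 202, 301, 401, 120),
    (101, 202, 301, 403, 120), (103, 202, 302, 401, 120),
    (103, 202, 302, 403, 120), (103, 202, 302, 403, 110),
    (101, 202, 302, 402, 110), (102, 202, 301, 402, 110),
    (102, 202, 301, 403, 110), (102, 201, 302, 401, 110),
    (103, 202, 301, 402, 120), (102, 201, 302, 403, 110),
]
ATTRS = ("Lon", "Ran", "Azi", "Typ")

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def cond_entropy(rows, i):
    """Weighted entropy of Judge after splitting on attribute column i."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[i]].append(r[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

h = {a: cond_entropy(ROWS, i) for i, a in enumerate(ATTRS)}
# h reproduces Fig. 4: Azi 0.932751, Lon 0.519518, Ran 0.966097, Typ 0.987567.
root = min(h, key=h.get)  # minimum entropy == maximum information gain -> "Lon"
```

Minimizing this conditional entropy is equivalent to maximizing information gain, since the base entropy of Judge is the same for all four attributes.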
Next is S130, the child-node determination step: removing the current node's attribute and reconstructing the attribute matrix, selecting as the current child node the attribute with the maximum information gain computed from the reconstructed attribute matrix, and repeating the child-node determination step with this child node as the current node.
Specifically, the current node's attribute is removed and the attribute matrix reconstructed; the information gain of each attribute in the reconstructed matrix is computed, and the attribute with the maximum information gain is selected as the child node.
Screenshots of the program run for the child-node attributes are shown in Fig. 5 and Fig. 6: the data matrices for Lon = 101, 102 and 103 are shown respectively, and within each the attribute with the maximum information gain is selected as the child node. As the screenshot in Fig. 4 shows, for the level below Lon the entropy of attribute Azi is 0, the entropy of attribute Ran is 0.459148, and the entropy of attribute Tye is 0.333333; the attribute with the minimum entropy - maximum information gain - namely Azi, is chosen as the next-level node (child node).
The child-node determination step further includes judging whether the current child node is a leaf node; if so, it is further judged whether the leaf node is an effective leaf node. Otherwise the child node is rejected, the attribute matrix is reconstructed, and the child-node determination step is re-entered; that is, the child node can still be split into next-level nodes.
If the leaf node is an effective leaf node, a branch of the decision tree is formed: the node is a leaf, the branch is complete, and no further processor resources need be wasted on it. Otherwise - when the child node can still branch - the child node is rejected, the attribute matrix is reconstructed, and the child-node determination step is re-entered. A rejected node is not a useless node: it is stored separately in a buffer and excluded from subsequent calculation, which reduces the input to later computation and improves computational efficiency; after the later computation completes, the node is prepended again, forming the decision tree branch. In other words, a rejected child node waits until its sibling child nodes at the same level have been computed; the computed child nodes are then removed, the attribute matrix is reconstructed, and the child-node determination step is re-entered.
An effective leaf node means, first, that the last decision item can map to a value of Judge - if it cannot, the node is certainly not a leaf, because the final destinations (leaves) of the decision tree must be conclusion items. Second, it means that the leaf node is free of conflicts: if condition A = x yields B = -1 and at the same time condition A = x also yields B = 1, then A = x is obviously meaningless, since it leads to a contradictory conclusion. When the current child node is judged to be an effective leaf node, the support of the corresponding branch is also computed.
Judging whether a child node is a leaf node and an effective leaf node is exactly the judgment of in-process pruning. Because the data of rows 13 and 14 of Table 1 produce contradictory decisions, the program's judging procedure directly marks the verdict as invalid, and the decision information is simply not recorded; this is in-process pruning. In-process pruning not only greatly reduces the amount of data processed, it also weeds out incomplete decision knowledge and ambiguous decision knowledge.
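A hedged sketch of the effective-leaf test just described (the function name is illustrative, not from the patent): a candidate leaf is effective only if the training rows remaining on its branch agree on one Judge value; conflicting values mark the branch for in-process pruning.

```python
def leaf_status(judge_values):
    """Classify a candidate leaf by the decision values left on its branch.

    One distinct value -> effective leaf: the branch is emitted.
    Several values     -> contradiction: in-process pruning drops the branch
                          instead of recording ambiguous decision knowledge.
    """
    distinct = set(judge_values)
    if len(distinct) == 1:
        return "effective_leaf", distinct.pop()
    return "pruned", None

# Rows 1-3, 10 and 11 of Table 1 under Lon=101, Azi=301 all end in Judge=120;
# rows 13 and 14 under their shared conditions disagree.
print(leaf_status([120, 120, 120, 120, 120]))  # ('effective_leaf', 120)
print(leaf_status([120, 110]))                 # ('pruned', None)
```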
Finally comes S140, the decision tree realization step: realizing the decision tree from the current node and all its corresponding child nodes.
As shown in Fig. 2, this includes completing the traversal that determines all sibling child nodes at the same level and the traversal that determines all sibling current nodes at the same level, wherein if the current node corresponds to the last attribute, the decision tree is formed from all child nodes and current nodes; otherwise the current node's attribute is rejected, the attribute matrix is reconstructed, and the current-node determination step is re-entered.
Specifically, the following steps are included. It is first judged whether the child nodes of the current node have all been traversed, i.e. whether every child node has undergone the leaf-node and effective-leaf-node judgments above. If so, it is then judged whether the attribute corresponding to the current node Nd is the last attribute; if it is, the decision tree is formed (screenshots of the program run are shown in Figs. 7, 8 and 9); otherwise the attribute corresponding to the current node Nd is deleted, the attribute matrix is reconstructed, and the information-gain calculation step is re-entered. If some child attribute has not yet been traversed, that child attribute is deleted, the attribute matrix is reconstructed, and the information gain of each attribute is recomputed.
While traversing the child nodes to form the decision tree, checking whether a child node has already been traversed weeds out the child nodes that have already undergone the leaf-node and effective-leaf-node judgments, so that attribute reconstruction and the leaf-node and effective-leaf-node judgments are applied only to the remaining child nodes, reducing the amount of data computation. Likewise, checking whether the attribute corresponding to the current node Nd is the last attribute weeds out that attribute and the attributes corresponding to its child nodes, so that only the other sibling node attributes at the same level as Nd are analyzed, again reducing the amount of data computation.
In one embodiment of the invention, after a child node is judged to be an effective leaf node, a support calculation step for the branch follows: after the branch of the decision tree is formed, its support is obtained synchronously - this is in-process computing. A screenshot of part of the run is shown in Fig. 4: the first item of decision knowledge output, namely the first path from the root node of the decision tree to a leaf node, is 0, 101, 301 -> 120; the node levels in the decision tree are 0, 1 and 2 respectively; and the support count of this branch is 5.
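Putting the steps together, the whole run on Table 1 can be imitated in a few lines. This is an illustrative reconstruction, not the patented program: it grows branches recursively by minimum conditional entropy, prunes contradictory branches in-process, and counts each branch's support as the branch is formed. It reproduces the first output branch 0, 101, 301 -> 120 with support 5, and drops the row-13/14 contradiction.

```python
import math
from collections import Counter, defaultdict

# Table 1: (Lon, Ran, Azi, Typ, Judge).
ROWS = [
    (101, 201, 301, 401, 120), (101, 201, 301, 402, 120),
    (101, 201, 301, 403, 120), (102, 201, 301, 401, 110),
    (102, 201, 301, 403, 110), (103, 202, 301, 401, 120),
    (103, 202, 301, 403, 120), (103, 201, 302, 401, 110),
    (103, 201, 302, 402, 120), (101, 202, 301, 401, 120),
    (101, 202, 301, 403, 120), (103, 202, 302, 401, 120),
    (103, 202, 302, 403, 120), (103, 202, 302, 403, 110),
    (101, 202, 302, 402, 110), (102, 202, 301, 402, 110),
    (102, 202, 301, 403, 110), (102, 201, 302, 401, 110),
    (103, 202, 301, 402, 120), (102, 201, 302, 403, 110),
]
NAMES = ("Lon", "Ran", "Azi", "Typ")

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def cond_entropy(rows, i):
    groups = defaultdict(list)
    for r in rows:
        groups[r[i]].append(r[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def build(rows, attrs, path=()):
    """Return (branch, support) pairs; contradictions are pruned in-process."""
    judges = {r[-1] for r in rows}
    if len(judges) == 1:                      # effective leaf: emit the branch
        return [(path + (("Judge", judges.pop()),), len(rows))]
    if not attrs:                             # conflicting Judge values remain:
        return []                             # in-process pruning drops them
    best = min(attrs, key=lambda i: cond_entropy(rows, i))
    branches = []
    for v in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == v]
        rest = tuple(a for a in attrs if a != best)
        branches += build(subset, rest, path + ((NAMES[best], v),))
    return branches

branches = build(ROWS, (0, 1, 2, 3))
# branches[0] is (("Lon", 101), ("Azi", 301), ("Judge", 120)) with support 5,
# matching the 0, 101, 301 -> 120 output quoted in the description.
```

The rows 13/14 subtree (Lon=103, Azi=302, Typ=403) yields no branch: its Judge values conflict after every attribute is exhausted, so it is discarded during construction rather than by post-pruning.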
Taking the training data sample shown in Table 1 as an example, the invention was tested in a development environment with a 3.4 GHz CPU, 4 GB of memory, Windows XP SP3 and VC2008, with a timing routine added to time the program run. Fig. 10 shows the result obtained without using in-process computing of the support, and Fig. 11 shows the result obtained using in-process computing of the support.
Comparing Fig. 10 and Fig. 11, the invention's in-process pruning effectively weeds out incomplete decision knowledge and ambiguous decision knowledge; meanwhile, its in-process computing completes the calculation of the support of every item of decision knowledge accurately and efficiently during construction. This example is limited by its data volume; when the data volume is larger, the operational efficiency of the invention improves markedly over traditional methods. In addition, the invention has an interface for, and the capability of, fusion with external data, and can therefore support data fusion across multiple mining means.
The invention can be widely applied to early-warning detection, identification and monitoring in national defense systems, and to supervision and flow control in civil and general aviation: through mining and analysis of historical and real-time data, it provides decision support for optimizing regional management, monitoring and control and for operational command. The invention can also be applied to medicine, business intelligence, web search and other sectors of the economy, providing a reference for relevant personnel to formulate more reasonable, clear and wise decisions.
According to another aspect of the invention, an ID3-based decision tree implementation device is also provided, including a matrix construction module, a current-node determination module, a child-node determination module and a decision tree realization module.
The matrix construction module reads in data to build the attribute matrix and the data matrix. The current-node determination module selects as the current node the attribute with the maximum information gain in the attribute matrix, computed from the data matrix. The child-node determination module removes the current node's attribute, reconstructs the attribute matrix, selects as the current child node the attribute with the maximum information gain computed from the reconstructed attribute matrix, and reruns with this child node as the current node. The decision tree realization module realizes the decision tree from the current node and all its corresponding child nodes.
In one embodiment of the invention, the child-node determination module determines child nodes as follows:
judging whether the current child node is a leaf node; if so, judging whether the leaf node is an effective leaf node, otherwise rejecting the child node, reconstructing the attribute matrix and returning to the child-node determination module;
if the leaf node is an effective leaf node, forming a branch of the decision tree; otherwise rejecting the child node, reconstructing the attribute matrix and returning to the child-node determination module.
In one embodiment of the invention, the child-node determination module also computes the support of the corresponding branch when the current child node is judged to be an effective leaf node.
In one embodiment of the invention, a child node rejected by the child-node determination module waits until its sibling child nodes at the same level have been computed; the already-computed child nodes are then removed, the attribute matrix is reconstructed, and the child-node determination module is re-entered.
In one embodiment of the invention, the decision tree realization module realizes the decision tree as follows:
traversing to determine all sibling child nodes at the same level;
traversing to determine all sibling current nodes at the same level, wherein if the current node corresponds to the last attribute, the decision tree is formed from all child nodes and current nodes; otherwise the current node's attribute is rejected, the attribute matrix is reconstructed, and the current-node determination module is re-entered.
Although embodiments of the invention are disclosed above, the content described is only an embodiment adopted to facilitate understanding of the invention, and does not limit the invention. Any person skilled in the technical field of the invention may make modifications and variations in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention is still defined by the appended claims.
Claims (10)
1. A decision tree implementation method based on ID3, comprising:
a matrix construction step: reading in data to build an attribute matrix and a data matrix;
a current-node determining step: selecting, as the current node, the attribute in the attribute matrix with the maximum information gain calculated from the data matrix;
a child-node determining step: removing the current-node attribute and reconstructing the attribute matrix, selecting, as the current child node, the attribute with the maximum information gain calculated from the reconstructed attribute matrix, and repeating the child-node determining step with this child node as the current node;
a decision-tree realization step: realizing the decision tree from the current node and all corresponding child nodes.
2. The method according to claim 1, wherein the child-node determining step further comprises:
judging whether the current child node is a leaf node: if so, judging whether the leaf node is a valid leaf node; otherwise, removing the child node, reconstructing the attribute matrix, and returning to the child-node determining step;
if the leaf node is a valid leaf node, forming a branch of the decision tree; otherwise, removing the child node, reconstructing the attribute matrix, and returning to the child-node determining step.
3. The method according to claim 2, wherein, when the current child node is judged to be a valid leaf node, the method further comprises calculating the support of the branch corresponding to that leaf node.
4. The method according to claim 2, wherein a removed child node is rejected after its sibling child nodes at the same level have been calculated, whereupon the attribute matrix is reconstructed and the child-node determining step is returned to.
5. The method according to claim 4, wherein the decision-tree realization step further comprises:
traversing and determining all sibling child nodes at the same level;
traversing and determining all current nodes at the same level, wherein, if the current node corresponds to the last attribute, the decision tree is formed from all the child nodes and current nodes; otherwise, the current-node attribute is removed, the attribute matrix is reconstructed, and the current-node determining step is returned to.
6. An ID3-based decision tree implementation device, comprising:
a matrix construction module, which reads in data to build an attribute matrix and a data matrix;
a current-node determining module, which selects, as the current node, the attribute in the attribute matrix with the maximum information gain calculated from the data matrix;
a child-node determining module, which removes the current-node attribute and reconstructs the attribute matrix, selects, as the current child node, the attribute with the maximum information gain calculated from the reconstructed attribute matrix, and reruns the child-node determining module with this child node as the current node;
a decision-tree realization module, which realizes the decision tree from the current node and all corresponding child nodes.
7. The device according to claim 6, wherein the child-node determining module further determines child nodes by:
judging whether the current child node is a leaf node: if so, judging whether the leaf node is a valid leaf node; otherwise, removing the child node, reconstructing the attribute matrix, and returning to the child-node determining module;
if the leaf node is a valid leaf node, forming a branch of the decision tree; otherwise, removing the child node, reconstructing the attribute matrix, and returning to the child-node determining module.
8. The device according to claim 7, wherein, when the child-node determining module judges that the current child node is a valid leaf node, it further calculates the support of the branch corresponding to that leaf node.
9. The device according to claim 7, wherein a child node removed by the child-node determining module is rejected after its sibling child nodes at the same level have been calculated, whereupon the attribute matrix is reconstructed and processing returns to the child-node determining module.
10. The device according to claim 9, wherein the decision-tree realization module realizes the decision tree by:
traversing and determining all sibling child nodes at the same level;
traversing and determining all current nodes at the same level, wherein, if the current node corresponds to the last attribute, the decision tree is formed from all the child nodes and current nodes; otherwise, the current-node attribute is removed, the attribute matrix is reconstructed, and processing returns to the current-node determining module.
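The recursive flow claimed in the method claims (select the maximum-gain attribute as the current node, reconstruct the attribute matrix without it, recurse over each branch, and record a valid leaf's branch support) can be sketched as follows. This is an illustrative reading, not the patented implementation; the dictionary tree layout and all identifiers are assumptions.

```python
# Illustrative recursive ID3 construction with leaf detection and
# branch support, loosely following the claimed steps. Names and the
# dict-based tree representation are assumptions, not from the patent.
import math
from collections import Counter

def _entropy(labels):
    """Shannon entropy of a label list."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def _gain(rows, labels, a):
    """Information gain of splitting on attribute index `a`."""
    groups = {}
    for r, y in zip(rows, labels):
        groups.setdefault(r[a], []).append(y)
    rem = sum(len(g) / len(labels) * _entropy(g) for g in groups.values())
    return _entropy(labels) - rem

def build_id3(rows, labels, attrs, total=None):
    """Pick the max-gain attribute as the current node, drop it from the
    attribute set (the 'reconstructed attribute matrix'), and recurse."""
    total = total if total is not None else len(labels)
    if len(set(labels)) == 1:            # pure subset: a valid leaf
        return {"leaf": labels[0], "support": len(labels) / total}
    if not attrs:                        # no attributes left: majority class
        return {"leaf": Counter(labels).most_common(1)[0][0],
                "support": len(labels) / total}
    a = max(attrs, key=lambda i: _gain(rows, labels, i))
    rest = [i for i in attrs if i != a]  # attribute set without `a`
    node = {"attr": a, "children": {}}
    for v in {r[a] for r in rows}:       # one branch per sibling child node
        sub = [(r, y) for r, y in zip(rows, labels) if r[a] == v]
        srows, slabels = zip(*sub)
        node["children"][v] = build_id3(list(srows), list(slabels), rest, total)
    return node
```

Each leaf carries its class label and a `support` value (the fraction of training rows reaching that branch), mirroring the support calculation of claims 3 and 8.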
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610635132.0A CN106294667A (en) | 2016-08-05 | 2016-08-05 | A kind of decision tree implementation method based on ID3 and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610635132.0A CN106294667A (en) | 2016-08-05 | 2016-08-05 | A kind of decision tree implementation method based on ID3 and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106294667A true CN106294667A (en) | 2017-01-04 |
Family
ID=57665564
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610635132.0A Pending CN106294667A (en) | 2016-08-05 | 2016-08-05 | A kind of decision tree implementation method based on ID3 and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106294667A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107678531A (en) * | 2017-09-30 | 2018-02-09 | 广东欧珀移动通信有限公司 | Using method for cleaning, device, storage medium and electronic equipment |
US11422831B2 (en) | 2017-09-30 | 2022-08-23 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Application cleaning method, storage medium and electronic device |
CN107894827A (en) * | 2017-10-31 | 2018-04-10 | 广东欧珀移动通信有限公司 | Using method for cleaning, device, storage medium and electronic equipment |
CN107894827B (en) * | 2017-10-31 | 2020-07-07 | Oppo广东移动通信有限公司 | Application cleaning method and device, storage medium and electronic equipment |
CN107943537A (en) * | 2017-11-14 | 2018-04-20 | 广东欧珀移动通信有限公司 | Using method for cleaning, device, storage medium and electronic equipment |
CN107943537B (en) * | 2017-11-14 | 2020-01-14 | Oppo广东移动通信有限公司 | Application cleaning method and device, storage medium and electronic equipment |
CN109961075A (en) * | 2017-12-22 | 2019-07-02 | 广东欧珀移动通信有限公司 | User gender prediction method, apparatus, medium and electronic equipment |
CN108170769A (en) * | 2017-12-26 | 2018-06-15 | 上海大学 | A kind of assembling manufacturing qualitative data processing method based on decision Tree algorithms |
CN109150845A (en) * | 2018-07-26 | 2019-01-04 | 曙光信息产业(北京)有限公司 | Monitor the method and system of terminal flow |
CN109614415A (en) * | 2018-09-29 | 2019-04-12 | 阿里巴巴集团控股有限公司 | A kind of data mining, processing method, device, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106294667A (en) | A kind of decision tree implementation method based on ID3 and device | |
CN106651188A (en) | Electric transmission and transformation device multi-source state assessment data processing method and application thereof | |
CN107273429A (en) | A kind of Missing Data Filling method and system based on deep learning | |
CN105956016A (en) | Associated information visualization processing system | |
CN107515952A (en) | The method and its system of cloud data storage, parallel computation and real-time retrieval | |
Li et al. | Improved Bayesian network-based risk model and its application in disaster risk assessment | |
Konikov et al. | Research of the possibilities of application of the Data Warehouse in the construction area | |
CN105354208A (en) | Big data information mining method | |
CN102881039B (en) | Based on the tree three-dimensional vector model construction method of laser three-dimensional scanning data | |
CN107871183A (en) | Permafrost Area highway distress Forecasting Methodology based on uncertain Clouds theory | |
Guo et al. | Monitoring and simulation of dynamic spatiotemporal land use/cover changes | |
CN104219088A (en) | Hive-based network alarm information OLAP method | |
CN116129262A (en) | Cultivated land suitability evaluation method and system for suitable mechanized transformation | |
CN111814528A (en) | Connectivity analysis noctilucent image city grade classification method | |
CN107818338A (en) | A kind of method and system of building group pattern-recognition towards Map Generalization | |
CN113779105B (en) | Distributed track flow accompanying mode mining method | |
CN102637227A (en) | Land resource assessment factor scope dividing method based on shortest path | |
WO2018196214A1 (en) | Statistics system and statistics method for geographical influence on vernacular architectural form | |
Li et al. | Research on spatial data mining based on uncertainty in Government GIS | |
Zhu et al. | A Method of Random Forest Classification based on Fuzzy Comprehensive Evaluation | |
CN111611667A (en) | Road network automatic selection method combining POI data | |
Lyu et al. | Intelligent clustering analysis model for mining area mineral resource prediction | |
Gao et al. | Exploring influence of groundwater and lithology on data-driven stability prediction of soil slopes using explainable machine learning: a case study | |
Chen et al. | Internet of things technology in ecological security assessment system of intelligent land | |
Ren et al. | Research on owner project management maturity model of highway construction project |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170104 |