CN102930290A - Integrated classifier and classification method thereof - Google Patents
- Publication number
- CN102930290A (application CN201210379640A)
- Authority
- CN
- China
- Prior art keywords
- attribute
- computing process
- classifier
- coarse
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to an integrated classifier and a classification method thereof, which address the low speed, low accuracy, biased attribute subsets, and nondeterministic polynomial (NP) attribute subset selection encountered in supervised classification of spatial raster data. The integrated classifier and the classification method construct training data subsets by attribute partitioning and combine them with parallel computing, so they can be applied to high-dimensional raster data. Fuzzy rough set theory serves as the criterion for the parallel partitioning of high-dimensional attributes, so each subset has its own independent characteristics while the integrity of the decision is preserved. The integrated classifier and the classification method are therefore applicable to heterogeneous data with discrete and continuous attributes and can be applied in the fields of remote sensing and geographic information systems.
Description
Technical field
The present invention relates to the fields of remote sensing (RS) and geographic information systems (GIS).
Background technology
In the existing field of supervised classification of spatial raster data, the main techniques include neural networks, support vector machines (SVM), decision trees, Bayesian classifiers, and KNN. These algorithms learn from input training data to produce a "classification model", which is then used to predict the class information of further data. For high-dimensional data, an "attribute selection" algorithm is usually applied to reduce the dimensionality and improve speed.
Another important technique in current use is the "integrated classifier", which combines several heterogeneous classifiers by voting in the expectation of obtaining higher classification accuracy than a single classifier.
When processing spatial raster data, one often faces massive, ultra-high-dimensional data. For example, some spatial data contain more than 2000 spatial attributes and exceed several TB in volume. Processing such data quickly and effectively faces the following difficulties:
(1) Speed: when the data volume is very large, and especially when the dimensionality grows, the cost of training a classification model grows as well. Currently popular C++-based SVM programs (e.g., LIBSVM) may fail to produce a training result after several hours, or may exhaust memory before the analysis results can be stored.
(2) Attribute subsets: to improve speed, many algorithms adopt "attribute selection". On the one hand, choosing a suitable attribute subset from a very large attribute set is a nondeterministic polynomial (NP) problem, and the combinations are too numerous to enumerate exhaustively. On the other hand, a near-optimal attribute subset usually has a "bias", so the prediction accuracy for some classes suffers.
(3) Accuracy: to improve accuracy, many algorithms adopt the "integrated classifier" technique: the training data are divided into several subsets, a classifier is trained on each, and the results are combined by voting. For high-dimensional raster data, on the one hand, the data volume is large, so it is hard to guarantee diversity among the sub-classifiers, and sub-classifiers that are too similar defeat the purpose of "integration" and "voting". On the other hand, training data subsets with a large number of attributes lead to overfitting. Both problems reduce classification accuracy.
In summary, existing supervised classification of spatial raster data suffers from low speed, low accuracy, biased attribute subsets, and attribute subset selection being an NP problem.
Summary of the invention
To solve the problems of low speed, low accuracy, biased attribute subsets, and NP attribute subset selection in existing supervised classification of spatial raster data, the present invention proposes an integrated classifier and a classification method thereof.
The classification method of the integrated classifier comprises the following steps:
Step 1: read the raster data to be processed using a combination of multiple processes and multiple threads, specifically:
A. Input the number n of sub-classifiers of the integrated classifier;
n is the number of sub-classifiers, with n greater than or equal to 2; the algorithm is intended to divide all spatial attributes of the raster data into n parts according to decision-making capability, so that each classifier possesses the overall classification capability of the full attribute set,
B. Start n+1 processes;
The n+1 processes are Rank 0, Rank 1, …, Rank n; Rank 0 is the managing process and Rank 1, …, Rank n are the computing processes, which correspond one-to-one to the n sub-classifiers,
C. When the current process is the managing process Rank 0, construct an empty rough relation table, distribute the raster data to be processed evenly among the computing processes, and start n threads, each corresponding to one computing process;
The threads are the 1st thread, the 2nd thread, …, the n-th thread,
D. When the current process is a computing process, all computing processes read the raster data to be processed simultaneously;
Step 2: the managing process Rank 0 maintains an attribute discretization interval table and distributes it evenly among multiple threads; the threads start simultaneously, each discretizing the raster data of its corresponding continuous spatial attribute;
Step 3: the managing process Rank 0 distributes the spatial attributes evenly among the n computing processes, collects the results of the n computing processes, constructs the complete rough relation table, and sends it to every computing process; each computing process builds one attribute subset from the rough relation table;
Step 4: the managing process Rank 0 has each computing process train its sub-classifier in parallel on the corresponding attribute subset to generate a model, each sub-classifier corresponding one-to-one to a computing process; each sub-classifier predicts the class from its corresponding attribute subset; the predictions of all sub-classifiers are tallied, and the prediction with the most votes is chosen by voting.
The integrated classifier comprises the following devices:
A device for reading the raster data to be processed using a combination of multiple processes and multiple threads, comprising the following modules:
A module for inputting the number n of sub-classifiers of the integrated classifier;
where n is the number of sub-classifiers, with n greater than or equal to 2; the algorithm is intended to divide all spatial attributes of the raster data into n parts according to decision-making capability, so that each classifier possesses the overall classification capability of the full attribute set,
A module for starting n+1 processes;
where the n+1 processes are Rank 0, Rank 1, …, Rank n; Rank 0 is the managing process and Rank 1, …, Rank n are the computing processes, which correspond one-to-one to the n sub-classifiers,
A module for, when the current process is the managing process Rank 0, constructing an empty rough relation table, distributing the raster data to be processed evenly among the computing processes, and starting n threads, each corresponding to one computing process;
where the threads are the 1st thread, the 2nd thread, …, the n-th thread,
A module for having all computing processes read the raster data to be processed simultaneously when the current process is a computing process;
A device by which the managing process Rank 0 maintains an attribute discretization interval table and distributes it evenly among multiple threads, the threads starting simultaneously to discretize the raster data of their corresponding continuous spatial attributes;
A device by which the managing process Rank 0 distributes the spatial attributes evenly among the n computing processes, collects the results of the n computing processes, constructs the complete rough relation table, and sends it to every computing process, each computing process building one attribute subset from the rough relation table;
A device by which the managing process Rank 0 has each computing process train its sub-classifier in parallel on the corresponding attribute subset to generate a model, each sub-classifier corresponding one-to-one to a computing process; each sub-classifier predicts the class from its corresponding attribute subset, the predictions of all sub-classifiers are tallied, and the prediction with the most votes is chosen by voting.
The present invention has the following advantages:
(1) Training data subsets are constructed by attribute partitioning rather than by sample partitioning.
(2) Training data subsets are combined with parallel computing and applied to high-dimensional raster data.
(3) Fuzzy rough set theory is used as the criterion for the parallel partitioning of high-dimensional attributes, so that each subset has its own independent characteristics while the integrity of the decision is preserved.
(4) The method suits heterogeneous data with discrete and continuous attributes.
Description of drawings
Fig. 1 is a flowchart of the classification method of the integrated classifier;
Fig. 2 is a flowchart of the specific steps for reading the raster data to be processed using a combination of multiple processes and multiple threads;
Fig. 3 is a flowchart of the specific steps by which each thread discretizes the raster data of its corresponding continuous spatial attribute;
Fig. 4 shows the relationship among the threads during discretization, where 2 ≤ l ≤ n;
Fig. 5 shows the relationship between the construction of the rough relation table and the attribute usage table;
Fig. 6 is a flowchart of the training stage in which the models are generated.
Embodiment
Embodiment one: this embodiment is described with reference to Fig. 1 and Fig. 2. The classification method of the integrated classifier described in this embodiment comprises the following steps:
Step 1: read the raster data to be processed using a combination of multiple processes and multiple threads, specifically:
A. Input the number n of sub-classifiers of the integrated classifier;
n is the number of sub-classifiers, with n greater than or equal to 2; the algorithm is intended to divide all spatial attributes of the raster data into n parts according to decision-making capability, so that each classifier possesses the overall classification capability of the full attribute set,
B. Start n+1 processes;
The n+1 processes are Rank 0, Rank 1, …, Rank n; Rank 0 is the managing process and Rank 1, …, Rank n are the computing processes, which correspond one-to-one to the n sub-classifiers,
C. When the current process is the managing process Rank 0, construct an empty rough relation table, distribute the raster data to be processed evenly among the computing processes, and start n threads, each corresponding to one computing process;
The threads are the 1st thread, the 2nd thread, …, the n-th thread,
D. When the current process is a computing process, all computing processes read the raster data to be processed simultaneously;
Step 2: the managing process Rank 0 maintains an attribute discretization interval table and distributes it evenly among multiple threads; the threads start simultaneously, each discretizing the raster data of its corresponding continuous spatial attribute;
Step 3: the managing process Rank 0 distributes the spatial attributes evenly among the n computing processes, collects the results of the n computing processes, constructs the complete rough relation table, and sends it to every computing process; each computing process builds one attribute subset from the rough relation table;
Step 4: the managing process Rank 0 has each computing process train its sub-classifier in parallel on the corresponding attribute subset to generate a model, each sub-classifier corresponding one-to-one to a computing process; each sub-classifier predicts the class from its corresponding attribute subset; the predictions of all sub-classifiers are tallied, and the prediction with the most votes is chosen by voting.
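The "even distribution" of spatial attributes among the n computing processes in step 3 can be illustrated with a minimal sketch. The helper name `split_evenly` and the attribute names `band_1` … `band_7` are hypothetical; the text does not say how a remainder is distributed, so giving the extra attributes to the lowest ranks is an assumption.

```python
def split_evenly(items, n):
    """Split `items` into n contiguous chunks whose sizes differ by at most one.

    Chunk i is intended for computing process Rank i+1; Rank 0 (the managing
    process) keeps no attributes. Remainder handling is an assumption.
    """
    if n < 2:
        raise ValueError("the method requires n >= 2 sub-classifiers")
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for rank in range(1, n + 1):
        size = base + (1 if rank <= extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Seven hypothetical spatial attributes shared among n = 3 computing processes.
attributes = [f"band_{i}" for i in range(1, 8)]
for rank, chunk in enumerate(split_evenly(attributes, 3), start=1):
    print(f"Rank {rank}: {chunk}")
```

In a real deployment the chunks would be sent to separate processes (the Rank numbering suggests an MPI-style runtime, though the text does not name one).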
In this embodiment, after step 3 every process has obtained its "attribute subset". Each process then trains a designated classifier in parallel on its attribute subset (e.g., ID3, SVM, or a neural network, all of which are conventional algorithms). Because each subset is comparatively small (relative to hundreds of dimensions, each subset usually has 10-20 dimensions, which reduces the data volume by a factor of tens), the models can be trained quickly. During decision-making the models are combined by voting, as shown in Fig. 6, which effectively prevents overfitting and increases classification accuracy.
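A minimal sketch of training one sub-classifier per attribute subset is given below. It is an illustration under stated assumptions, not the patented implementation: a tiny 1-nearest-neighbour class stands in for ID3/SVM/neural networks, Python threads stand in for the computing processes, and the attribute names (`b1`, `b2`, `b3`) and class labels are invented.

```python
from concurrent.futures import ThreadPoolExecutor


class NearestNeighbour:
    """Tiny 1-NN stand-in for the ID3/SVM/neural-network sub-classifiers."""

    def __init__(self, attrs):
        self.attrs = attrs      # the attribute subset this sub-classifier sees
        self.samples = []

    def fit(self, rows, labels):
        # Project every training row onto this classifier's attribute subset.
        self.samples = [([row[a] for a in self.attrs], y)
                        for row, y in zip(rows, labels)]
        return self

    def predict(self, row):
        point = [row[a] for a in self.attrs]

        def squared_distance(sample):
            return sum((p - q) ** 2 for p, q in zip(point, sample[0]))

        return min(self.samples, key=squared_distance)[1]


def train_sub_classifiers(rows, labels, attribute_subsets):
    """Train one model per attribute subset, each in its own worker."""
    with ThreadPoolExecutor(max_workers=len(attribute_subsets)) as pool:
        futures = [pool.submit(NearestNeighbour(subset).fit, rows, labels)
                   for subset in attribute_subsets]
        return [f.result() for f in futures]


rows = [
    {"b1": 0.0, "b2": 0.1, "b3": 5.0},
    {"b1": 0.2, "b2": 0.0, "b3": 5.2},
    {"b1": 1.0, "b2": 0.9, "b3": 9.0},
    {"b1": 0.9, "b2": 1.0, "b3": 9.1},
]
labels = ["water", "water", "forest", "forest"]
models = train_sub_classifiers(rows, labels, [["b1"], ["b2", "b3"]])
query = {"b1": 0.95, "b2": 0.8, "b3": 8.7}
print([m.predict(query) for m in models])
```

Each model only ever sees its own attributes, which is the attribute-partitioning idea; every model still sees all training samples.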
The voting scheme of this embodiment is as follows: given n classifiers and an object x to be predicted, each of the n classifiers makes its own prediction; suppose m1 classifiers decide "type A" and m2 classifiers decide "type B". Following the principle that the minority yields to the majority, the decision endorsed by more classifiers is taken as the decision of the integrated classifier as a whole. This is the voting process.
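The voting rule can be sketched in a few lines. The function name `majority_vote` is hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption, since the text does not specify what happens when m1 = m2.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of sub-classifiers.

    Ties go to the label that appears first in `predictions`; this
    tie-breaking rule is an assumption, not taken from the text.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


votes = ["A", "B", "A", "A", "B"]   # m1 = 3 votes for A, m2 = 2 votes for B
print(majority_vote(votes))          # prints "A"
```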
Embodiment two: this embodiment differs from the classification method of the integrated classifier described in embodiment one in that the raster data of step A are high-dimensional raster data.
For massive high-dimensional raster data, conventional algorithms are slow and inaccurate. This embodiment processes the raster data quickly to obtain a classification model, and because a heterogeneous decision mechanism is adopted, the classification accuracy is also higher.
Embodiment three: this embodiment is described with reference to Fig. 3. It differs from the classification method of the integrated classifier described in embodiment one or two in that the specific steps by which each thread in step 2 discretizes the raster data of its corresponding continuous spatial attribute are:
Step 2.1: set the number of clusters to ceil;
Step 2.2: compute evenly distributed initial cluster centers between the maximum and the minimum of the continuous spatial attribute handled by this thread;
Step 2.3: run the K-Means algorithm from the evenly distributed initial cluster centers, forming ceil clusters;
Step 2.4: for each cluster, output its minimum and maximum, forming ceil value intervals;
Step 2.5: assemble the ceil value intervals into an interval list.
Discretization yields the discretization intervals, through which the original continuous data are converted into a limited set of discrete levels such as 1, 2, 3, 4. This makes the relations distinct and speeds up comparison and analysis. In the multi-process case, the processing flow of all data is shown in Fig. 4.
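Steps 2.1 to 2.5 can be sketched for a single continuous attribute as follows. This is a plain-Python illustration under assumptions: `k` plays the role of the cluster number ceil, the function names are hypothetical, and mapping a value that falls between two intervals to the nearest interval is a choice the text does not make.

```python
def evenly_spaced_centers(values, k):
    """Step 2.2: k initial centers spread evenly between min and max."""
    lo, hi = min(values), max(values)
    if k == 1:
        return [(lo + hi) / 2.0]
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]


def kmeans_1d(values, k, max_iter=100):
    """Step 2.3: one-dimensional K-Means started from the even centers."""
    centers = evenly_spaced_centers(values, k)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster received no values.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


def discretization_intervals(values, k):
    """Steps 2.4-2.5: (min, max) of every non-empty cluster, as a sorted list."""
    return sorted((min(c), max(c)) for c in kmeans_1d(values, k) if c)


def discretize(value, intervals):
    """Map a continuous value to the 1-based index of the nearest interval."""
    def distance(interval):
        lo, hi = interval
        if lo <= value <= hi:
            return 0.0
        return min(abs(value - lo), abs(value - hi))
    return 1 + min(range(len(intervals)), key=lambda i: distance(intervals[i]))


values = [0.1, 0.2, 0.3, 5.0, 5.1, 9.8, 10.0]
intervals = discretization_intervals(values, 3)
print(intervals)
print([discretize(v, intervals) for v in values])
```

In the described method one such job would run per thread, each thread handling its own continuous attribute.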
Embodiment four: this embodiment differs from the classification method of the integrated classifier described in embodiment one or two in that the rough relation table of step 3 is a two-dimensional table expressing the degree of direct overlap between two attributes: a rough relation of 1 means that the attributes are most strongly directly correlated, and a rough relation of 0 means that they are least correlated. The rough relation table is as follows:
Embodiment five: this embodiment differs from the classification method of the integrated classifier described in embodiment four in that the specific steps by which each computing process builds an attribute subset from the rough relation table in step 3 are:
Step 3.1: in the rough relation table of the computing process, randomly select an attribute that is uncorrelated in the rough relation and whose state is "unused", add it to the attribute subset of the computing process (each computing process has exactly one such subset), and mark it as "used",
The state of an attribute is either "used" or "unused";
Step 3.2: in the computing process, compute the relation between each "unused" attribute and the attribute subset of the computing process according to formula (8),
The rough relation between an attribute and an attribute subset is given by formula (8), where b denotes the attribute subset of the computing process, $a_n$ denotes any "unused" attribute, and $RT_{(b, a_n)}$ denotes the rough relation between the attribute subset of the computing process and that attribute;
Step 3.3: select the attribute with the smallest computed result, add it to the attribute subset of the computing process, and mark it as "used";
Step 3.4: compute the relation between the attribute subset of the computing process and the full dimension set D according to formula (6);
In formula (6), w denotes the attribute subset of the computing process; IND(w) is the indiscernibility relation corresponding to the subset w, that is, elements in the same set are regarded as indistinguishable and incomparable; Card(U) computes the cardinality of a set (Card abbreviates "cardinal"); $POS_D(X)$ is the positive region of X with respect to D, or, put more plainly, the set X is fully contained by D;
Step 3.5: when $\gamma_D(w) = 1$, output the attribute subset of the computing process;
Step 3.6: when $\gamma_D(w) = 0$, return to step 3.2 and again compute, in the computing process, the relation between each "unused" attribute and the attribute subset of the computing process according to formula (8).
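The greedy loop of steps 3.1 to 3.6 can be sketched as follows. Several parts are assumptions and are labelled as such: the attribute-to-subset relation of formula (8) is taken to be the sum of the pairwise table entries, the stopping test of steps 3.4 to 3.6 is passed in as a hypothetical callback `is_complete`, and `start` is an optional parameter added only to make the example reproducible.

```python
import random


def build_attribute_subset(relation, is_complete, start=None, rng=None):
    """Greedily grow one attribute subset from a rough relation table.

    relation    -- nested dict, relation[a][b] is the table entry for (a, b)
    is_complete -- hypothetical callback standing in for the gamma == 1 test
    start       -- optional first attribute; chosen at random when omitted
    """
    unused = set(relation)                      # every attribute starts "unused"
    if start is None:
        start = (rng or random.Random()).choice(sorted(unused))   # step 3.1
    subset = [start]
    unused.remove(start)                        # mark as "used"
    while unused and not is_complete(subset):   # steps 3.4-3.6
        # Step 3.2: relation of each unused attribute to the current subset,
        # ASSUMED here to be the sum of pairwise table entries.
        def relation_to_subset(attr):
            return sum(relation[attr][member] for member in subset)
        # Step 3.3: the least related attribute joins the subset.
        candidate = min(sorted(unused), key=relation_to_subset)
        subset.append(candidate)
        unused.remove(candidate)
    return subset


# Toy rough relation table for three hypothetical attributes.
relation = {
    "a1": {"a1": 1.0, "a2": 0.9, "a3": 0.1},
    "a2": {"a1": 0.9, "a2": 1.0, "a3": 0.2},
    "a3": {"a1": 0.1, "a2": 0.2, "a3": 1.0},
}
two_attributes_suffice = lambda subset: len(subset) >= 2   # stand-in stop test
print(build_attribute_subset(relation, two_attributes_suffice, start="a1"))
```

Starting from `a1`, the sketch picks `a3` (relation 0.1) over `a2` (relation 0.9), so weakly related attributes end up together.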
According to Pawlak's rough set theory, an information system S can be regarded as a data table. It can be represented as $S = (U, A)$, where the universe $U$ is a non-empty finite set and $A$ is a non-empty finite set of attributes. For each element $a \in A$ there is a mapping $a: U \to V_a$, where $V_a$ is the value set of $a$. A decision table is an information system of the form $S = (U, A \cup \{d\})$, where $d \notin A$ is the decision attribute. For any attribute set $P \subseteq A$ there is an indiscernibility relation $IND(P)$:
$$IND(P) = \{(x, y) \in U \times U \mid \forall a \in P,\ a(x) = a(y)\} \quad (1)$$
where x and y are multi-dimensional vectors in the multi-dimensional space;
The equivalence class based on the indiscernibility relation of P can be defined as:
$$[x]_P = \{y \in U \mid (x, y) \in IND(P)\} \quad (2)$$
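The lower and upper approximation sets can be defined from the indiscernibility relation. Let $X \subseteq U$; X can be approximated by the following two sets:
Lower approximation: $\underline{P}X = \{x \in U \mid [x]_P \subseteq X\}$
Upper approximation: $\overline{P}X = \{x \in U \mid [x]_P \cap X \neq \emptyset\}$
If $\underline{P}X \neq \overline{P}X$, the pair $(\underline{P}X, \overline{P}X)$ is called a rough set. The positive region, negative region and boundary region are defined as follows, where X is a set, $\underline{D}X$ denotes the lower approximation of X, and the upper approximation of X is determined by $[x]_D \cap X$:
$$POS_D(X) = \underline{D}X \quad (3)$$
The negative region $NEG_D(X)$ is defined by formula (4).
$$BND_D(X) = NEG_D(X) - POS_D(X) \quad (5)$$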
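The equivalence classes and the two approximations can be made concrete with a small sketch. It follows the standard Pawlak definitions; the data table, the attribute names `a1`, `a2` and the helper names are invented for illustration.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Partition the object ids by their values on `attrs`, i.e. U / IND(P)."""
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    return list(groups.values())


def lower_approximation(table, attrs, target):
    """Union of the equivalence classes that lie entirely inside `target`."""
    classes = equivalence_classes(table, attrs)
    return set().union(*(c for c in classes if c <= target))


def upper_approximation(table, attrs, target):
    """Union of the equivalence classes that intersect `target`."""
    classes = equivalence_classes(table, attrs)
    return set().union(*(c for c in classes if c & target))


# Four objects described by two discretized attributes (toy data).
table = {
    1: {"a1": 0, "a2": 0},
    2: {"a1": 0, "a2": 1},
    3: {"a1": 1, "a2": 1},
    4: {"a1": 1, "a2": 1},
}
X = {1, 2, 3}
print(lower_approximation(table, ["a1"], X))   # {1, 2}
print(upper_approximation(table, ["a1"], X))   # {1, 2, 3, 4}
```

Objects 3 and 4 share the same values, so no attribute set in this table can separate them; X is therefore rough, and its lower and upper approximations differ.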
A key concept of rough set theory is the degree of dependency between attributes. The degree to which attribute Q depends on attribute D (the feature dependency degree, denoted γ) can be defined as:
$$\gamma_D(Q) = \frac{Card\left(\bigcup_{X \in U/IND(Q)} POS_D(X)\right)}{Card(U)} \quad (6)$$
According to formula (6), when the subset is the union of all dimensions (attributes), the result of formula (6) is 1. For $D_1 = \{a_1\}$ and $D_2 = \{a_1, a_2\}$, compute $Diff(D_2, D_1) = \gamma_{D_2}(Q) - \gamma_{D_1}(Q)$. If the Diff value is large, the decision regions covered by the dimensions $a_1$ and $a_2$ differ and cover a larger range, so the two attributes are suitable for combination. If the Diff value is small, the decision regions covered by $a_1$ and $a_2$ are close (in the extreme case $a_2 = a_1$, Diff = 0 and the two attributes do not differ at all), so the two attributes are not suitable for combination. Each entry of the rough relation table is therefore computed as:
$$RT_{(a_1, a_2)} = 1 - \left(\gamma_{\{a_1, a_2\}}(Q) - \gamma_{a_1}(Q)\right) \quad (7)$$
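A worked example may help to read formulas (6) and (7). The sketch below uses the standard rough-set dependency degree (size of the positive region divided by the size of the universe) and then evaluates formula (7) on an invented four-object decision table; the names `partition`, `dependency` and `rough_relation` are hypothetical.

```python
from collections import defaultdict


def partition(table, attrs):
    """Equivalence classes of the objects under the attributes `attrs`."""
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    return list(groups.values())


def dependency(table, cond_attrs, decision_attr):
    """gamma: fraction of objects whose class is determined by `cond_attrs`."""
    cond_classes = partition(table, cond_attrs)
    positive = set()
    for decision_class in partition(table, [decision_attr]):
        for c in cond_classes:
            if c <= decision_class:      # class lies in the lower approximation
                positive |= c
    return len(positive) / len(table)


def rough_relation(table, a1, a2, decision_attr):
    """Formula (7): RT(a1, a2) = 1 - (gamma_{a1,a2}(Q) - gamma_{a1}(Q))."""
    gain = (dependency(table, [a1, a2], decision_attr)
            - dependency(table, [a1], decision_attr))
    return 1.0 - gain


# XOR-like toy table: neither attribute alone decides d, both together do.
table = {
    1: {"a1": 0, "a2": 0, "d": "yes"},
    2: {"a1": 0, "a2": 1, "d": "no"},
    3: {"a1": 1, "a2": 0, "d": "no"},
    4: {"a1": 1, "a2": 1, "d": "yes"},
}
print(rough_relation(table, "a1", "a2", "d"))   # 0.0: the attributes complement each other
print(rough_relation(table, "a1", "a1", "d"))   # 1.0: an attribute adds nothing to itself
```

The two printed values reproduce the extremes described in the text: Diff is maximal for complementary attributes and zero when $a_2 = a_1$.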
For an attribute b and an attribute set $D = \{a_1, a_2, \ldots, a_n\}$, the rough relation is defined by formula (8) in terms of the pairwise table entries $RT_{(b, a_i)}$.
Computing the rough relations is expensive and must be done in parallel as shown in Fig. 5. Once the table has been computed, the "attribute-to-set relation" is obtained by traversing the rough relation table; this is a summation statistic and its computational cost is small. After the rough relation table is established, each attribute subset is obtained.
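The division of labour described here (expensive pairwise entries computed in parallel, cheap summation afterwards) can be sketched as follows. Python threads stand in for the computing processes, the pairwise values are invented toy numbers rather than results of formula (7), and treating formula (8) as a plain sum of table entries is an assumption based on the "summation statistic" remark.

```python
from concurrent.futures import ThreadPoolExecutor


def build_relation_table(attrs, pair_relation, workers=4):
    """Compute the entry for every ordered attribute pair, one row per task."""
    def row(a):
        return {b: pair_relation(a, b) for b in attrs}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(row, attrs))    # rows come back in input order
    return dict(zip(attrs, rows))


def relation_to_subset(table, attr, subset):
    """ASSUMED form of formula (8): sum of the entries RT(attr, member)."""
    return sum(table[attr][member] for member in subset)


# Toy stand-in for formula (7); it is symmetric only for brevity.
toy_values = {("a1", "a2"): 0.9, ("a1", "a3"): 0.1, ("a2", "a3"): 0.2}

def toy_relation(a, b):
    if a == b:
        return 1.0
    return toy_values.get((a, b), toy_values.get((b, a)))


table = build_relation_table(["a1", "a2", "a3"], toy_relation)
print(relation_to_subset(table, "a3", ["a1", "a2"]))
```

Because formula (7) is not symmetric in general, the sketch fills every ordered pair rather than only the upper triangle of the table.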
Embodiment six: the integrated classifier described in this embodiment comprises the following devices:
A device for reading the raster data to be processed using a combination of multiple processes and multiple threads, comprising the following modules:
A module for inputting the number n of sub-classifiers of the integrated classifier;
where n is the number of sub-classifiers, with n greater than or equal to 2; the algorithm is intended to divide all spatial attributes of the raster data into n parts according to decision-making capability, so that each classifier possesses the overall classification capability of the full attribute set,
A module for starting n+1 processes;
where the n+1 processes are Rank 0, Rank 1, …, Rank n; Rank 0 is the managing process and Rank 1, …, Rank n are the computing processes, which correspond one-to-one to the n sub-classifiers,
A module for, when the current process is the managing process Rank 0, constructing an empty rough relation table, distributing the raster data to be processed evenly among the computing processes, and starting n threads, each corresponding to one computing process;
where the threads are the 1st thread, the 2nd thread, …, the n-th thread,
A module for having all computing processes read the raster data to be processed simultaneously when the current process is a computing process;
A device by which the managing process Rank 0 maintains an attribute discretization interval table and distributes it evenly among multiple threads, the threads starting simultaneously to discretize the raster data of their corresponding continuous spatial attributes;
A device by which the managing process Rank 0 distributes the spatial attributes evenly among the n computing processes, collects the results of the n computing processes, constructs the complete rough relation table, and sends it to every computing process, each computing process building one attribute subset from the rough relation table;
A device by which the managing process Rank 0 has each computing process train its sub-classifier in parallel on the corresponding attribute subset to generate a model, each sub-classifier corresponding one-to-one to a computing process.
A device by which the managing process Rank 0 has each computing process train its sub-classifier in parallel on the corresponding attribute subset to generate a model, each sub-classifier corresponding one-to-one to a computing process; each sub-classifier predicts the class from its corresponding attribute subset according to fuzzy rough set theory, the predictions of all sub-classifiers are tallied, and the prediction with the most votes is chosen by voting.
Embodiment seven: this embodiment differs from the integrated classifier described in embodiment six in that the raster data are high-dimensional raster data.
Embodiment eight: this embodiment differs from the integrated classifier described in embodiment six or seven in that the device by which the managing process Rank 0 maintains the attribute discretization interval table and distributes it evenly among multiple threads, the threads starting simultaneously to discretize the raster data of their corresponding continuous spatial attributes, comprises the following modules:
A module for setting the number of clusters to ceil;
A module for computing evenly distributed initial cluster centers between the maximum and the minimum of the continuous spatial attribute handled by this thread;
A module for running the K-Means algorithm from the evenly distributed initial cluster centers, forming ceil clusters;
A module for outputting the minimum and maximum of each cluster, forming ceil value intervals;
A module for assembling the ceil value intervals into an interval list.
Embodiment nine: this embodiment differs from the integrated classifier described in embodiment six in that the rough relation table is a two-dimensional table expressing the degree of direct overlap between two attributes: a rough relation of 1 means that the attributes are most strongly directly correlated, and a rough relation of 0 means that they are least correlated. The rough relation table is as follows:
Embodiment ten: this embodiment differs from the integrated classifier described in embodiment six in that the device by which the managing process Rank 0 distributes the spatial attributes evenly among the n computing processes, collects the results of the n computing processes, constructs the complete rough relation table, and sends it to every computing process, each computing process building one attribute subset from the rough relation table, comprises the following modules:
A module for randomly selecting, in the rough relation table of the computing process, an attribute that is uncorrelated in the rough relation and whose state is "unused", adding it to the attribute subset of the computing process (each computing process has exactly one such subset), and marking it as "used",
The state of an attribute is either "used" or "unused";
A module for computing, in the computing process, the relation between each "unused" attribute and the attribute subset of the computing process according to formula (8),
The rough relation between an attribute and an attribute subset is given by formula (8), where b denotes the attribute subset of the computing process, $a_n$ denotes any "unused" attribute, and $RT_{(b, a_n)}$ denotes the rough relation between the attribute subset of the computing process and that attribute;
A module for selecting the attribute with the smallest computed result, adding it to the attribute subset of the computing process, and marking it as "used";
A module for computing the relation between the attribute subset of the computing process and the full dimension set D according to formula (6);
In formula (6), w denotes the attribute subset of the computing process, IND(w) is the indiscernibility relation corresponding to the subset w, Card(U) computes the cardinality of a set, and $POS_D(X)$ is the positive region of X with respect to D;
A module for outputting the attribute subset of the computing process when $\gamma_D(w) = 1$;
A module for computing again, in the computing process, the relation between each "unused" attribute and the attribute subset of the computing process according to formula (8) when $\gamma_D(w) = 0$.
Claims (10)
1. A classification method of an integrated classifier, characterized in that it comprises the following steps:
Step 1: read the raster data to be processed using a combination of multiple processes and multiple threads, specifically:
A. Input the number n of sub-classifiers of the integrated classifier;
n is the number of sub-classifiers, with n greater than or equal to 2; the algorithm is intended to divide all spatial attributes of the raster data into n parts according to decision-making capability, so that each classifier possesses the overall classification capability of the full attribute set,
B. Start n+1 processes;
The n+1 processes are Rank 0, Rank 1, …, Rank n; Rank 0 is the managing process and Rank 1, …, Rank n are the computing processes, which correspond one-to-one to the n sub-classifiers,
C. When the current process is the managing process Rank 0, construct an empty rough relation table, distribute the raster data to be processed evenly among the computing processes, and start n threads, each corresponding to one computing process;
The threads are the 1st thread, the 2nd thread, …, the n-th thread,
D. When the current process is a computing process, all computing processes read the raster data to be processed simultaneously;
Step 2: the managing process Rank 0 maintains an attribute discretization interval table and distributes it evenly among multiple threads; the threads start simultaneously, each discretizing the raster data of its corresponding continuous spatial attribute;
Step 3: the managing process Rank 0 distributes the spatial attributes evenly among the n computing processes, collects the results of the n computing processes, constructs the complete rough relation table, and sends it to every computing process; each computing process builds one attribute subset from the rough relation table;
Step 4: the managing process Rank 0 has each computing process train its sub-classifier in parallel on the corresponding attribute subset to generate a model, each sub-classifier corresponding one-to-one to a computing process; each sub-classifier predicts the class from its corresponding attribute subset; the predictions of all sub-classifiers are tallied, and the prediction with the most votes is chosen by voting.
2. The classification method of the integrated classifier according to claim 1, characterized in that the raster data of step A are high-dimensional raster data.
3. the sorting technique of integrated classifier according to claim 1 and 2 is characterized in that, described each thread of step 2 starts the concrete steps that raster data to corresponding space connection attribute carries out discretize and is:
Step 2 one, the cluster number is set is ceil;
Step 2 two, between the maximal value of the space connection attribute that this thread starts and minimum value, ask for even distributional clustering initial center;
Step 2 three, according to the K-Means algorithm even distributional clustering initial center is carried out cluster, form ceil cluster;
Step 2 four, export its minimum and maximal value for each cluster, it is interval to form ceil codomain;
Step 2 five, consist of an interval tabulation with described ceil codomain is interval.
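The discretization steps of claim 3 can be sketched for a single continuous attribute as follows. This is a minimal illustration under stated assumptions: the parameter `k` stands for the cluster number ceil, the initial centers include both endpoints of the value range (the claim only says they are evenly distributed), and the function names are hypothetical.

```python
def evenly_spaced_centers(values, k):
    """Place k initial centers evenly between min(values) and max(values)."""
    lo, hi = min(values), max(values)
    if k == 1:
        return [(lo + hi) / 2.0]
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]


def kmeans_1d(values, k, max_iter=100):
    """Plain one-dimensional K-Means started from evenly spaced centers."""
    centers = evenly_spaced_centers(values, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


def discretization_intervals(values, k):
    """Return the (min, max) value interval of every non-empty cluster."""
    return [(min(c), max(c)) for c in kmeans_1d(values, k) if c]
```

Because empty clusters are dropped in this sketch, the interval list can contain fewer than `k` entries when the data has fewer natural groups than requested.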
4. The classification method of the integrated classifier according to claim 1 or 2, characterized in that the coarse relation table in step 3 is a two-dimensional table representing the degree of overlap between two attributes; a coarse relation of 1 indicates that the two attributes are most strongly correlated, and a coarse relation of 0 indicates that they are least correlated; the coarse relation table is as follows:
5. The classification method of the integrated classifier according to claim 4, characterized in that the specific steps by which each computing process builds an attribute subset according to the coarse relation table in step 3 are:
Step 3.1: in the coarse relation table of the computing process, randomly selecting a pair of attributes whose coarse relation indicates that they are uncorrelated and whose state is "not using", adding these attributes to the attribute subset of the computing process, the subset corresponding one-to-one with the computing process, and marking them as "using";
The state of an attribute is either "using" or "not using";
Step 3.2: in the computing process, computing according to formula (8) the relation between each attribute marked "using" and the attribute subset of the computing process;
The coarse relation between an attribute and the attribute subset is:
where b denotes the attribute subset of the computing process, $a_n$ denotes any attribute marked "not using", and $RT_{(b, a_n)}$ denotes the coarse relation between the attribute subset of the computing process and any attribute marked "using";
Step 3.3: selecting the attribute with the smallest computed result, adding this attribute to the attribute subset of the computing process, and marking the attribute subset of the computing process as "using";
Step 3.4: computing according to formula (6) the relation between the attribute subset of the computing process and the full dimension set D;
where w denotes the attribute subset of the computing process, IND(w) is the indiscernibility relation corresponding to subset w, Card(U) denotes the cardinality of a set, and $POS_D(X)$ is the positive region of X with respect to D;
Step 3.5: when $\gamma_D(w) = 1$, outputting the attribute subset of the computing process;
Step 3.6: when $\gamma_D(w) = 0$, computing in the computing process, according to formula (8), the relation between each attribute marked "using" and the attribute subset of the computing process.
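The loop of claim 5 can be sketched in Python under several loudly stated assumptions, because formulas (6) and (8) are not reproduced in this text. The sketch uses the standard rough-set dependency degree (the fraction of samples whose values on the subset determine the class unambiguously) as a stand-in for formula (6), and the mean pairwise coarse relation as a stand-in for formula (8). It also starts deterministically from the least related pair instead of a random one, and keeps adding attributes while the dependency degree is below 1. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def dependency_degree(rows, subset, decision):
    """Standard rough-set dependency degree of `decision` on `subset`.

    Assumed stand-in for formula (6): size of the positive region
    divided by the number of samples.
    """
    decisions = defaultdict(set)  # equivalence class -> decision values seen
    sizes = defaultdict(int)      # equivalence class -> number of samples
    for row in rows:
        key = tuple(row[a] for a in subset)
        decisions[key].add(row[decision])
        sizes[key] += 1
    positive = sum(sizes[k] for k, d in decisions.items() if len(d) == 1)
    return positive / len(rows)


def relation_to_subset(relation, subset, candidate):
    """Assumed stand-in for formula (8): mean pairwise coarse relation."""
    return sum(relation[b][candidate] for b in subset) / len(subset)


def build_attribute_subset(relation, rows, decision):
    """Greedily grow an attribute subset until it determines the decision."""
    attributes = sorted(relation)
    # Assumption: start from the least related pair (the claim picks an
    # uncorrelated pair at random).
    first, second = min(
        combinations(attributes, 2), key=lambda p: relation[p[0]][p[1]]
    )
    subset = [first, second]
    unused = [a for a in attributes if a not in subset]
    while unused and dependency_degree(rows, subset, decision) < 1.0:
        # Add the unused attribute least related to the current subset.
        best = min(unused, key=lambda a: relation_to_subset(relation, subset, a))
        subset.append(best)
        unused.remove(best)
    return subset
```

Here `relation` is a symmetric nested dictionary holding the coarse relation table, and `rows` is a list of dictionaries of already discretized attribute values plus the class label.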
6. An integrated classifier, characterized in that it comprises the following devices:
a device for reading the raster data to be processed using a combination of multiple processes and multiple threads, this device comprising the following modules:
a module for inputting the number n of sub-classifiers of the integrated classifier;
wherein n is the number of sub-classifiers and n is greater than or equal to 2; the algorithm is expected to divide all spatial attributes of the raster data into n parts according to decision-making ability, such that each classifier retains the overall classification ability of the full set;
a module for starting n+1 processes;
wherein the n+1 processes are Rank 0, Rank 1, ..., Rank n; Rank 0 is the managing process and Rank 1, ..., Rank n are the computing processes, which correspond one-to-one with the n sub-classifiers;
a module for, when the current process is the managing process Rank 0, constructing an empty coarse relation table, evenly distributing the raster data to be processed among the computing processes, and starting n threads, each thread corresponding to one computing process;
wherein the threads comprise the 1st thread, the 2nd thread, ..., the n-th thread;
a module for, when the current process is a computing process, having all computing processes read the raster data to be processed simultaneously;
a device for the managing process Rank 0 to maintain an attribute discretization interval table and evenly distribute this table among a plurality of threads, the threads simultaneously discretizing the raster data of their corresponding continuous spatial attributes;
a device for the managing process Rank 0 to evenly distribute the spatial attributes among the n computing processes for processing, collect the results of the n computing processes, construct the complete coarse relation table, and send this table to each computing process, each computing process building an attribute subset according to the coarse relation table;
a device for the managing process Rank 0 to have each computing process train, in parallel, a sub-classifier model on its corresponding attribute subset, the sub-classifiers corresponding one-to-one with the computing processes, each sub-classifier predicting the class from its corresponding attribute subset, the predictions of all sub-classifiers being tallied, and the prediction receiving the most votes being chosen by majority voting.
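The even distribution of spatial attributes from the managing process Rank 0 to the computing processes Rank 1 to Rank n can be sketched as a simple partitioning helper. The claim only states that the distribution is even, so the contiguous chunking below, the function names, and the attribute names in the usage note are assumptions made for illustration.

```python
def split_evenly(items, n):
    """Split `items` into n contiguous chunks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for index in range(n):
        # The first `extra` chunks receive one additional item.
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def assign_to_ranks(attributes, n):
    """Map computing ranks 1..n to their share of attributes.

    Rank 0 is left out because it is reserved for the managing process.
    """
    return {rank + 1: chunk for rank, chunk in enumerate(split_evenly(attributes, n))}
```

For example, `assign_to_ranks(["b1", "b2", "b3", "b4", "b5"], 2)` gives the first three attributes to Rank 1 and the remaining two to Rank 2.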
7. The integrated classifier according to claim 6, characterized in that the raster data is high-dimensional raster data.
8. The integrated classifier according to claim 6 or 7, characterized in that the device for the managing process Rank 0 to maintain an attribute discretization interval table and evenly distribute this table among a plurality of threads, the threads simultaneously discretizing the raster data of their corresponding continuous spatial attributes, comprises the following modules:
a module for setting the number of clusters to ceil;
a module for computing evenly distributed initial cluster centers between the minimum and maximum values of the continuous spatial attribute handled by this thread;
a module for clustering with the K-Means algorithm, starting from the evenly distributed initial cluster centers, to form ceil clusters;
a module for outputting the minimum and maximum value of each cluster to form ceil value intervals;
a module for composing an interval list from the ceil value intervals.
9. The integrated classifier according to claim 6, characterized in that the coarse relation table is a two-dimensional table representing the degree of overlap between two attributes; a coarse relation of 1 indicates that the two attributes are most strongly correlated, and a coarse relation of 0 indicates that they are least correlated; the coarse relation table is as follows:
10. The classification method of the integrated classifier for raster data classification according to claim 9, characterized in that the device for the managing process Rank 0 to evenly distribute the spatial attributes among the n computing processes for processing, collect the results of the n computing processes, construct the complete coarse relation table, and send this table to each computing process, each computing process building an attribute subset according to the coarse relation table, comprises the following modules:
a module for randomly selecting, in the coarse relation table of the computing process, a pair of attributes whose coarse relation indicates that they are uncorrelated and whose state is "not using", adding these attributes to the attribute subset of the computing process, the subset corresponding one-to-one with the computing process, and marking them as "using";
The state of an attribute is either "using" or "not using";
a module for computing in the computing process, according to formula (8), the relation between each attribute marked "using" and the attribute subset of the computing process;
The coarse relation between an attribute and the attribute subset is:
where b denotes the attribute subset of the computing process, $a_n$ denotes any attribute marked "not using", and $RT_{(b, a_n)}$ denotes the coarse relation between the attribute subset of the computing process and any attribute marked "using";
a module for selecting the attribute with the smallest computed result, adding this attribute to the attribute subset of the computing process, and marking the attribute subset of the computing process as "using";
a module for computing according to formula (6) the relation between the attribute subset of the computing process and the full dimension set D;
where w denotes the attribute subset of the computing process, IND(w) is the indiscernibility relation corresponding to subset w, Card(U) denotes the cardinality of a set, and $POS_D(X)$ is the positive region of X with respect to D;
a module for outputting the attribute subset of the computing process when $\gamma_D(w) = 1$;
a module for computing in the computing process, according to formula (8), the relation between each attribute marked "using" and the attribute subset of the computing process when $\gamma_D(w) = 0$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210379640.9A (CN102930290B) | 2012-10-09 | 2012-10-09 | Integrated classifier and classification method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102930290A (en) | 2013-02-13 |
CN102930290B (en) | 2015-08-19 |
Family
ID=47645087
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210379640.9A (CN102930290B), Expired - Fee Related | Integrated classifier and classification method thereof | 2012-10-09 | 2012-10-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102930290B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101251896A (en) * | 2008-03-21 | 2008-08-27 | 腾讯科技(深圳)有限公司 | Object detecting system and method based on multiple classifiers |
US7562017B1 (en) * | 2003-05-29 | 2009-07-14 | At&T Intellectual Property Ii, L.P. | Active labeling for spoken language understanding |
Non-Patent Citations (1)
Title |
---|
潘欣等: "粗集属性划分的集成遥感分类", 《遥感学报》, 31 December 2009 (2009-12-31) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104484404A (en) * | 2014-12-15 | 2015-04-01 | 中国科学院东北地理与农业生态研究所 | Improved processing method for geo-raster data file in distributed file system |
CN104484404B (en) * | 2014-12-15 | 2017-11-07 | 中国科学院东北地理与农业生态研究所 | One kind improves geographical raster data document handling method in distributed file system |
CN105303470A (en) * | 2015-11-26 | 2016-02-03 | 国网辽宁省电力有限公司大连供电公司 | Electric power project planning and construction method based on big data |
CN107203775A (en) * | 2016-03-18 | 2017-09-26 | 阿里巴巴集团控股有限公司 | A kind of method of image classification, device and equipment |
CN107203775B (en) * | 2016-03-18 | 2021-07-27 | 斑马智行网络(香港)有限公司 | Image classification method, device and equipment |
CN111259273A (en) * | 2018-11-30 | 2020-06-09 | 顺丰科技有限公司 | Webpage classification model construction method, classification method and device |
Also Published As
Publication number | Publication date |
---|---|
CN102930290B (en) | 2015-08-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20150819; Termination date: 20181009 |