CN102930290A - Integrated classifier and classification method thereof - Google Patents


Info

Publication number
CN102930290A
Authority
CN
China
Prior art keywords
attribute
computing process
classifier
coarse
sub
Legal status
Granted
Application number
CN2012103796409A
Other languages
Chinese (zh)
Other versions
CN102930290B
Inventor
张淑清 (Zhang Shuqing)
潘欣 (Pan Xin)
张策 (Zhang Ce)
姜春雷 (Jiang Chunlei)
Current Assignee
Northeast Institute of Geography and Agroecology of CAS
Original Assignee
Northeast Institute of Geography and Agroecology of CAS
Application filed by Northeast Institute of Geography and Agroecology of CAS
Priority to CN201210379640.9A
Publication of CN102930290A
Application granted
Publication of CN102930290B
Status: Expired - Fee Related


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an integrated classifier and a classification method thereof, intended to solve the problems of low speed and low accuracy in the supervised classification of spatial raster data, as well as the bias of selected attribute subsets and the fact that selecting an attribute subset is a nondeterministic polynomial (NP) problem. The integrated classifier and the classification method construct training-data subsets by partitioning attributes, combine these subsets with parallel computing, and can be applied to high-dimensional raster data. Because fuzzy rough set theory is used as the criterion for the parallel partitioning of high-dimensional attributes, each subset is independent while the integrity of the decision is preserved. The integrated classifier and the classification method are therefore applicable to heterogeneous discrete and continuous data and can be applied in remote sensing and geographic information systems.

Description

Integrated classifier and classification method thereof
Technical field
The present invention relates to the fields of remote sensing (RS) and geographic information systems (GIS).
Background art
In the existing field of supervised classification of spatial raster data, the main techniques used include neural networks, support vector machines (SVM), decision trees, Bayesian classifiers and KNN. These algorithms essentially take training data as input and learn a "classification model", which is then used to predict the class of unknown data. For high-dimensional data, an "attribute selection" algorithm is usually applied to reduce dimensionality and improve speed.
Another important technique currently in use is the "integrated classifier" (ensemble classifier), in which multiple heterogeneous classifiers are combined by voting in the hope of achieving higher classification accuracy than any single classifier.
When processing spatial raster data, one often faces massive, ultra-high-dimensional data; for example, some spatial datasets contain more than 2000 spatial attributes and several TB of data. Processing such data quickly and effectively faces several difficulties:
(1) Speed: when the data volume is very large, and especially as dimensionality grows, the cost of training a classification model grows as well. Currently popular C++ SVM implementations (e.g., LIBSVM) may fail to produce a training result within several hours, or may exhaust memory before producing any analysis result.
(2) Attribute subset: to improve speed, many algorithms use "attribute selection". On the one hand, choosing a suitable attribute subset from a very large attribute set is a nondeterministic polynomial problem, because the number of combinations is too large to enumerate exhaustively; on the other hand, an approximately optimal attribute subset usually has a "bias" characteristic, so prediction accuracy for some classes suffers.
(3) Accuracy: to improve accuracy, many algorithms use the "integrated classifier" technique, i.e., they divide the training data into multiple subsets and then train and vote. For high-dimensional raster data, on the one hand, the large data volume makes it difficult to guarantee diversity among the sub-classifiers, and sub-classifiers that are too similar defeat the purpose of "integration" and "voting"; on the other hand, training-data subsets that retain a large number of attributes lead to "overfitting". Both problems reduce classification accuracy.
In summary, the existing field of supervised classification of spatial raster data suffers from slow speed, low accuracy, attribute subsets with a bias characteristic, and attribute subset selection being a nondeterministic polynomial problem.
Summary of the invention
To solve the problems of slow speed, low accuracy, attribute subsets with a bias characteristic, and attribute subset selection being a nondeterministic polynomial problem in the existing field of supervised classification of spatial raster data, the present invention proposes an integrated classifier and a classification method thereof.
The classification method of the integrated classifier comprises the following steps:
Step 1: read the raster data to be processed using a combination of multiple processes and multiple threads, specifically as follows:
A. Input the number n of sub-classifiers of the integrated classifier;
where n is the number of sub-classifiers and n ≥ 2; an expectation algorithm divides all spatial attributes of the raster data into n parts according to their decision-making capability, and each sub-classifier retains the classification capability of the full attribute set;
B. Start n+1 processes;
where the n+1 processes are Rank 0, Rank 1, ..., Rank n; Rank 0 is the management process and Rank 1, ..., Rank n are computing processes, which correspond one-to-one with the n sub-classifiers;
C. If the current process is the management process Rank 0, construct an empty rough relation table, distribute the raster data to be processed evenly among the computing processes, and start n threads, each corresponding to one computing process;
where the threads are the 1st thread, the 2nd thread, ..., the n-th thread;
D. If the current process is a computing process, every computing process reads the raster data to be processed simultaneously;
Step 2: the management process Rank 0 maintains an attribute discretization interval table and distributes it evenly among the threads; the threads simultaneously discretize the raster data of their corresponding continuous spatial attributes;
Step 3: the management process Rank 0 distributes the spatial attributes evenly among the n computing processes for processing, collects the results of the n computing processes, builds the complete rough relation table and sends it to every computing process; each computing process builds an attribute subset from the rough relation table;
Step 4: the management process Rank 0 has each computing process train, in parallel, a sub-classifier model on its attribute subset, the sub-classifiers corresponding one-to-one with the processes; each sub-classifier predicts the class from its attribute subset, the predictions of all sub-classifiers are tallied, and the prediction receiving the most votes is selected by majority voting.
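Steps 1 and 3 have Rank 0 allocate raster data and spatial attributes "evenly" to Rank 1 ... Rank n, but the text does not state the split rule. The Python sketch below assumes contiguous, near-equal blocks. The names `partition_evenly` and `assign_to_ranks` are hypothetical, and the Rank numbering (which suggests an MPI-style setup, itself an assumption) is only modelled as dictionary keys.

```python
def partition_evenly(count: int, n: int) -> list[tuple[int, int]]:
    """Split indices 0..count-1 into n contiguous [start, stop) ranges.

    The first (count % n) ranges get one extra item, so sizes differ by at most 1.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    base, extra = divmod(count, n)
    ranges = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def assign_to_ranks(count: int, n: int) -> dict[int, tuple[int, int]]:
    """Hypothetical allocation table built by the management process Rank 0.

    Rank 0 keeps no block; computing processes Rank 1..n each receive one range
    (of raster rows in Step 1, or of attribute indices in Step 3).
    """
    return {rank: block for rank, block in enumerate(partition_evenly(count, n), start=1)}
```

For example, `assign_to_ranks(10, 3)` gives `{1: (0, 4), 2: (4, 7), 3: (7, 10)}`. Contiguous blocks keep each rank's reads sequential; a round-robin split would balance equally well but scatter the reads.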
The integrated classifier comprises the following devices:
a device for reading the raster data to be processed using a combination of multiple processes and multiple threads, comprising the following modules:
a module for inputting the number n of sub-classifiers of the integrated classifier;
where n is the number of sub-classifiers and n ≥ 2; an expectation algorithm divides all spatial attributes of the raster data into n parts according to their decision-making capability, and each sub-classifier retains the classification capability of the full attribute set;
a module for starting n+1 processes;
where the n+1 processes are Rank 0, Rank 1, ..., Rank n; Rank 0 is the management process and Rank 1, ..., Rank n are computing processes, which correspond one-to-one with the n sub-classifiers;
a module for, when the current process is the management process Rank 0, constructing an empty rough relation table, distributing the raster data to be processed evenly among the computing processes, and starting n threads, each corresponding to one computing process;
where the threads are the 1st thread, the 2nd thread, ..., the n-th thread;
a module for, when the current process is a computing process, having every computing process read the raster data to be processed simultaneously;
a device for the management process Rank 0 to maintain an attribute discretization interval table and distribute it evenly among multiple threads, the threads simultaneously discretizing the raster data of their corresponding continuous spatial attributes;
a device for the management process Rank 0 to distribute the spatial attributes evenly among the n computing processes for processing, collect the results of the n computing processes, build the complete rough relation table and send it to every computing process, each computing process building an attribute subset from the rough relation table;
a device for the management process Rank 0 to have each computing process train, in parallel, a sub-classifier model on its attribute subset, the sub-classifiers corresponding one-to-one with the processes; each sub-classifier predicts the class from its attribute subset, the predictions of all sub-classifiers are tallied, and the prediction receiving the most votes is selected by majority voting.
The present invention has following advantage:
(1) adopt the attribute dividing mode, rather than sample dividing mode structure training data subset.
(2) training data subset and parallel computing are combined, be applied to the high latitude raster data.
(3) use the Fuzzy Rough Sets theory as the parallel standard of dividing of high latitude attribute, so that every subset namely has own autonomous behavior, kept again the decision-making integrality.
(4) be adapted to the isomeric data of discrete type, continuous type.
Description of drawings
Fig. 1 is the process flow diagram of the sorting technique of integrated classifier;
Fig. 2 reads the process flow diagram of pending raster data concrete steps for the mode that adopts the combination of multi-process and multithreading;
Fig. 3 starts the concrete steps process flow diagram that the raster data of corresponding space connection attribute is carried out discretize for each thread;
Fig. 4 is the graph of a relation between each thread in the discretize process, 2≤l among the figure≤n;
Fig. 5 is the structure of coarse relation table and the graph of a relation of attribute use table;
Fig. 6 is the process flow diagram in training production model stage.
Embodiments
Embodiment 1: this embodiment is described with reference to Fig. 1 and Fig. 2. The classification method of the integrated classifier in this embodiment comprises the following steps:
Step 1: read the raster data to be processed using a combination of multiple processes and multiple threads, specifically as follows:
A. Input the number n of sub-classifiers of the integrated classifier;
where n is the number of sub-classifiers and n ≥ 2; an expectation algorithm divides all spatial attributes of the raster data into n parts according to their decision-making capability, and each sub-classifier retains the classification capability of the full attribute set;
B. Start n+1 processes;
where the n+1 processes are Rank 0, Rank 1, ..., Rank n; Rank 0 is the management process and Rank 1, ..., Rank n are computing processes, which correspond one-to-one with the n sub-classifiers;
C. If the current process is the management process Rank 0, construct an empty rough relation table, distribute the raster data to be processed evenly among the computing processes, and start n threads, each corresponding to one computing process;
where the threads are the 1st thread, the 2nd thread, ..., the n-th thread;
D. If the current process is a computing process, every computing process reads the raster data to be processed simultaneously;
Step 2: the management process Rank 0 maintains an attribute discretization interval table and distributes it evenly among the threads; the threads simultaneously discretize the raster data of their corresponding continuous spatial attributes;
Step 3: the management process Rank 0 distributes the spatial attributes evenly among the n computing processes for processing, collects the results of the n computing processes, builds the complete rough relation table and sends it to every computing process; each computing process builds an attribute subset from the rough relation table;
Step 4: the management process Rank 0 has each computing process train, in parallel, a sub-classifier model on its attribute subset, the sub-classifiers corresponding one-to-one with the processes; each sub-classifier predicts the class from its attribute subset, the predictions of all sub-classifiers are tallied, and the prediction receiving the most votes is selected by majority voting.
In this embodiment, after Step 3 every process has obtained its "attribute subset". Each process then trains a specified classifier in parallel on its attribute subset (for example ID3, SVM or a neural network; such models are conventional algorithms). Compared with the hundreds of dimensions of the full data, each subset in this method usually has only 10 to 20 dimensions, shrinking the data volume by a factor of tens, so the models can be trained quickly on much less data. The models are combined only at decision time, by voting as shown in Fig. 6, which effectively prevents overfitting and improves classification accuracy.
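The parallel training of one sub-classifier per attribute subset can be illustrated with a toy stand-in. This sketch assumes a nearest-centroid model in place of the ID3, SVM or neural-network sub-classifiers named in the text, and uses a thread pool rather than separate processes purely to stay self-contained; `train_sub_classifier`, `predict` and `train_all` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

Row = list[float]
Model = tuple[list[int], dict[str, list[float]]]


def train_sub_classifier(rows: list[Row], labels: list[str], attrs: list[int]) -> Model:
    """Fit a nearest-centroid model that only looks at the attribute indices in attrs."""
    sums: dict[str, list[float]] = defaultdict(lambda: [0.0] * len(attrs))
    counts: dict[str, int] = defaultdict(int)
    for row, label in zip(rows, labels):
        acc = sums[label]
        for j, a in enumerate(attrs):
            acc[j] += row[a]
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return attrs, centroids


def predict(model: Model, row: Row) -> str:
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    attrs, centroids = model
    x = [row[a] for a in attrs]
    return min(centroids, key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[c])))


def train_all(rows: list[Row], labels: list[str], subsets: list[list[int]]) -> list[Model]:
    """Train one sub-classifier per attribute subset concurrently (one worker per subset)."""
    with ThreadPoolExecutor(max_workers=len(subsets)) as pool:
        return list(pool.map(lambda s: train_sub_classifier(rows, labels, s), subsets))
```

Each model only ever reads the columns in its own subset, which is what keeps per-model training cheap; in a real deployment the pool would be replaced by the computing processes Rank 1 ... Rank n.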
The voting scheme of this embodiment is as follows: suppose there are n classifiers. For an object x to be predicted, each of the n classifiers makes a prediction; suppose m1 classifiers decide "class A" and m2 classifiers decide "class B". The vote follows the principle that the minority yields to the majority: the decision endorsed by more classifiers is taken as the decision of the integrated classifier as a whole. This is the voting process.
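The voting rule fits in a few lines. This sketch assumes string class labels and breaks ties in favour of the label that appears first among the predictions, a choice the text leaves open; `majority_vote` is a hypothetical name. It relies on `collections.Counter.most_common`, which orders equal counts by first occurrence.

```python
from collections import Counter


def majority_vote(predictions: list[str]) -> str:
    """Return the class predicted by the most sub-classifiers.

    On a tie, the label encountered first in `predictions` wins (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]
```

For instance, with three sub-classifiers voting `["A", "B", "A"]`, the integrated classifier outputs `"A"`.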
Embodiment 2: this embodiment differs from the classification method of the integrated classifier of Embodiment 1 in that the raster data in step A is high-dimensional raster data.
This embodiment targets massive high-dimensional raster data, for which traditional algorithms are slow and inaccurate. The method of this patent processes the raster data quickly to obtain a classification model, and because it uses a heterogeneous decision mechanism, its classification accuracy is also higher.
Embodiment 3: this embodiment is described with reference to Fig. 3. It differs from the classification method of the integrated classifier of Embodiment 1 or 2 in that the specific steps by which each thread in Step 2 discretizes the raster data of its corresponding continuous spatial attribute are:
Step 2.1: set the number of clusters to ceil;
Step 2.2: compute uniformly distributed initial cluster centers between the maximum and minimum values of the continuous spatial attribute handled by this thread;
Step 2.3: starting from the uniformly distributed initial cluster centers, cluster with the K-Means algorithm to form ceil clusters;
Step 2.4: output the minimum and maximum value of each cluster, forming ceil value intervals;
Step 2.5: combine the ceil value intervals into an interval list.
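The sub-steps of Step 2 above describe a one-dimensional K-Means discretization of a single continuous attribute. The Python sketch below is one possible reading, assuming the initial centres are evenly spaced from the minimum to the maximum (both included), nearest-centre assignment, and a fixed iteration cap. `discretize_kmeans_1d` and `max_iter` are hypothetical names; `ceil` is the cluster count named in the text. Empty clusters keep their previous centre and produce no interval.

```python
def discretize_kmeans_1d(values: list[float], ceil: int, max_iter: int = 100) -> list[tuple[float, float]]:
    """Cluster one attribute's values and return sorted (min, max) intervals."""
    if ceil < 1 or not values:
        raise ValueError("need ceil >= 1 and at least one value")
    lo, hi = min(values), max(values)
    # Step 2.2: evenly spaced initial centres between the minimum and maximum
    if ceil == 1:
        centres = [(lo + hi) / 2]
    else:
        step = (hi - lo) / (ceil - 1)
        centres = [lo + k * step for k in range(ceil)]
    for _ in range(max_iter):
        # Step 2.3: assign every value to its nearest centre ...
        clusters: list[list[float]] = [[] for _ in centres]
        for v in values:
            k = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[k].append(v)
        # ... then move each centre to the mean of its members
        new_centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    # Step 2.4: each non-empty cluster yields a (min, max) value interval
    intervals = [(min(c), max(c)) for c in clusters if c]
    # Step 2.5: the sorted intervals form the interval list
    return sorted(intervals)
```

For example, `discretize_kmeans_1d([1, 2, 3, 10, 11, 12, 20, 21, 22], 3)` returns `[(1, 3), (10, 12), (20, 22)]`.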
Through discretization, this embodiment obtains discretization intervals. With these intervals, the original continuous data can be mapped to a finite number of integer levels such as 1, 2, 3 and 4, which makes the relationships between values explicit and speeds up comparative analysis. In the multi-process case, the processing flow of all data is shown in Fig. 4.
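Using the interval list, each continuous value can be replaced by a small integer level. A minimal sketch, assuming the intervals are sorted and non-overlapping; a value that falls in a gap between intervals, above the last one, or below the first one is snapped to the nearest lower interval (or to level 1), which the text does not address. `to_level` is a hypothetical name.

```python
import bisect


def to_level(value: float, intervals: list[tuple[float, float]]) -> int:
    """Map a continuous value to a 1-based level using a sorted interval list."""
    lows = [lo for lo, _ in intervals]
    # index of the last interval whose lower bound is <= value
    idx = bisect.bisect_right(lows, value) - 1
    return max(idx, 0) + 1
```

With the intervals `[(1, 3), (10, 12), (20, 22)]`, the values 2, 11 and 21 map to levels 1, 2 and 3.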
Embodiment 4: this embodiment differs from the classification method of the integrated classifier of Embodiment 1 or 2 in that the rough relation table in Step 3 is a two-dimensional table expressing the degree of direct overlap between two attributes: a rough relation of 1 indicates that the two attributes are most strongly related, and 0 indicates that they are least related. The rough relation table is as follows:
[Table: rough relation table (image not reproduced)]
Embodiment 5: this embodiment differs from the classification method of the integrated classifier of Embodiment 4 in that the specific steps by which each computing process builds an attribute subset from the rough relation table in Step 3 are:
Step 3.1: in the rough relation table of the computing process, randomly select a pair of attributes whose rough relation shows them to be uncorrelated and whose state is "unused"; add them to the attribute subset of the computing process (each computing process has its own subset) and mark them as "used";
the state of an attribute is either "used" or "unused";
Step 3.2: in the computing process, compute according to formula (8) the relation between each "unused" attribute and the attribute subset of the computing process,
where the rough relation between an attribute and an attribute subset is:
$$RTD = \sum_{i=1}^{n} RT(b, a_i) \qquad (8)$$
where b is a candidate ("unused") attribute, a_1, ..., a_n are the attributes in the attribute subset of the computing process, and RT(b, a_i) is the rough relation between b and a_i;
Step 3.3: select the attribute with the smallest computed value, add it to the attribute subset of the computing process, and mark it as "used";
Step 3.4: compute, according to formula (6), the relation between the attribute subset of the computing process and the full dimension set D;
$$\gamma_D(w) = \frac{\mathrm{Card}\left(\bigcup_{X \in U/IND(w)} POS_D(X)\right)}{\mathrm{Card}(U)} \qquad (6)$$
where w is the attribute subset of the computing process; IND(w) is the indiscernibility relation induced by w, i.e., elements that agree on every attribute in w are regarded as indistinguishable and cannot be told apart; Card(·) is the cardinality (number of elements) of a set, Card being short for "cardinal"; and POS_D(X) is the positive region of X with respect to D, i.e., informally, the part of X that is entirely covered by D;
Step 3.5: when γ_D(w) = 1, output the attribute subset of the computing process;
Step 3.6: when γ_D(w) = 0, compute again, according to formula (8), the relation between each "unused" attribute and the attribute subset of the computing process in that process.
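The sub-steps of Step 3 above (3.1 to 3.6) describe a greedy loop. The sketch below is one interpretation under stated assumptions: the rough (coarse) relation table is given as a matrix `rt`, the dependency of formula (6) is supplied as a callback `gamma`, "uncorrelated" in Step 3.1 means an RT value at or below a hypothetical `threshold`, and the loop continues whenever the dependency is below 1 (the text mentions only the value 0). `build_attribute_subset` is a hypothetical name.

```python
import random
from typing import Callable, Optional, Sequence


def build_attribute_subset(
    rt: Sequence[Sequence[float]],
    gamma: Callable[[list[int]], float],
    threshold: float = 0.5,
    seed: Optional[int] = None,
) -> list[int]:
    """Greedy construction of one computing process's attribute subset.

    rt[i][j] is the rough relation RT(a_i, a_j) of formula (7); gamma(subset)
    returns the dependency degree of formula (6) for a list of attribute indices.
    """
    m = len(rt)
    if m < 2:
        raise ValueError("need at least two attributes")
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    # Step 3.1: random "uncorrelated" pair (RT <= threshold, an assumption),
    # falling back to the least-related pair if none qualifies.
    candidates = [p for p in pairs if rt[p[0]][p[1]] <= threshold]
    first, second = rng.choice(candidates) if candidates else min(pairs, key=lambda p: rt[p[0]][p[1]])
    subset = [first, second]
    unused = set(range(m)) - {first, second}  # every other attribute starts "unused"
    while unused:
        # Steps 3.2-3.3: formula (8) for each unused attribute b; keep the smallest
        best = min(sorted(unused), key=lambda b: sum(rt[b][a] for a in subset))
        subset.append(best)
        unused.discard(best)  # mark as "used"
        # Steps 3.4-3.6: stop at full dependency, otherwise go round again
        if gamma(subset) >= 1.0:
            break
    return subset
```

Picking the candidate with the smallest summed RT favours attributes that overlap least with what is already selected, which is how each subset keeps its own "autonomous" decision power.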
According to Pawlak's rough set theory, an information system S can be regarded as a data table, represented as S = (U, A), where the universe U is a non-empty finite set of objects and A is a non-empty finite set of attributes; for each attribute a ∈ A there is a mapping a: U → V_a, where V_a is the set of values of a. A decision table is an information system of the form S = (U, A ∪ {d}), where d ∉ A is the decision attribute. For any attribute subset P ⊆ A there is an indiscernibility relation IND(P):
$$IND(P) = \{(x, y) \in U^2 \mid \forall a \in P,\ a(x) = a(y)\} \qquad (1)$$
where x and y are multidimensional vectors in the attribute space;
the equivalence class of x under the indiscernibility relation IND(P) is defined as:
$$[x]_P = \{y \in U \mid (x, y) \in IND(P)\} \qquad (2)$$
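Formulas (1) and (2) amount to grouping objects by their values on the attributes in P. A short sketch, assuming each object is a dictionary from attribute name to value; `equivalence_classes` is a hypothetical name:

```python
from collections import defaultdict


def equivalence_classes(table: list[dict], attrs: list[str]) -> list[set[int]]:
    """U/IND(P): objects with identical values on every attribute in attrs share a class."""
    groups: dict[tuple, set[int]] = defaultdict(set)
    for idx, row in enumerate(table):
        groups[tuple(row[a] for a in attrs)].add(idx)
    return list(groups.values())
```

Adding attributes to P can only split classes further, never merge them, which is why larger attribute sets discern more.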
Lower and upper approximations can be defined from the indiscernibility relation. Let X ⊆ U; X can be approximated by the following two sets:
Lower approximation:
$$\underline{P}X = \{x \in U \mid [x]_P \subseteq X\}$$
Upper approximation:
$$\overline{P}X = \{x \in U \mid [x]_P \cap X \neq \emptyset\}$$
If $\underline{P}X \neq \overline{P}X$, the pair $(\underline{P}X, \overline{P}X)$ is called a rough set. The positive, negative and boundary regions are defined as follows. For a set X, $\underline{D}X = \{x \in U \mid [x]_D \subseteq X\}$ denotes the lower approximation of X and $\overline{D}X = \{x \in U \mid [x]_D \cap X \neq \emptyset\}$ denotes the upper approximation of X;
$$POS_D(X) = \underline{D}X \qquad (3)$$
$$NEG_D(X) = U - \overline{D}X \qquad (4)$$
$$BND_D(X) = \overline{D}X - \underline{D}X \qquad (5)$$
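The approximations and regions can be computed directly from the equivalence classes. This sketch uses the standard Pawlak definitions: POS is the lower approximation, NEG is U minus the upper approximation, and BND is the upper minus the lower approximation. Objects are dictionaries, X is a set of object indices, and `regions` is a hypothetical name.

```python
from collections import defaultdict


def regions(table: list[dict], attrs: list[str], target: set[int]) -> dict[str, set[int]]:
    """Lower/upper approximations of target and the derived POS, NEG and BND regions."""
    groups: dict[tuple, set[int]] = defaultdict(set)
    for idx, row in enumerate(table):
        groups[tuple(row[a] for a in attrs)].add(idx)
    lower: set[int] = set()
    upper: set[int] = set()
    for cls in groups.values():
        if cls <= target:  # class lies entirely inside X
            lower |= cls
        if cls & target:   # class overlaps X
            upper |= cls
    universe = set(range(len(table)))
    return {
        "lower": lower,
        "upper": upper,
        "POS": lower,
        "NEG": universe - upper,
        "BND": upper - lower,
    }
```

With objects whose attribute a takes the values 1, 1, 2, 3 and X = {0, 2}, object 2 is certainly in X (POS), object 3 is certainly outside (NEG), and objects 0 and 1 cannot be separated by a, so they form the boundary.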
A key concept of rough sets is the degree of dependency between attributes. The degree to which attribute Q depends on attribute set D (the feature dependency degree, denoted γ) can be defined as:
$$\gamma_D(w) = \frac{\mathrm{Card}\left(\bigcup_{X \in U/IND(w)} POS_D(X)\right)}{\mathrm{Card}(U)} \qquad (6)$$
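Formula (6) is written in the patent's own notation. The sketch below assumes it corresponds to the standard Pawlak dependency degree: the fraction of objects whose equivalence class under the chosen condition attributes falls entirely inside one decision class (the positive region). Rows are dictionaries; `dependency`, `cond_attrs` and `decision` are hypothetical names.

```python
from collections import defaultdict


def dependency(table: list[dict], cond_attrs: list[str], decision: str) -> float:
    """Share of objects whose IND(cond_attrs) class carries a single decision value."""
    groups: dict[tuple, list] = defaultdict(list)
    for row in table:
        groups[tuple(row[a] for a in cond_attrs)].append(row[decision])
    # a class is in the positive region iff all its members share one decision
    consistent = sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)
    return consistent / len(table)
```

On a consistent decision table this returns 1 when all condition attributes are used, which matches the remark that follows.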
From formula (6), when the subset R contains all dimensions (attributes), the result of formula (6) is 1. Let D_1 = {a_1} and D_2 = {a_1, a_2}, and compute Diff(D_2, D_1) = γ_{D_2}(Q) - γ_{D_1}(Q). If Diff is large, the decision regions covered by dimensions a_1 and a_2 differ and together cover a larger range, so the two attributes are suitable to combine; if Diff is small, the decision regions covered by a_1 and a_2 are similar (in the extreme case a_2 = a_1, Diff = 0 and the two attributes do not differ at all), so the two attributes are not suitable to combine. Each entry of the rough relation table is therefore computed as:
$$RT(a_1, a_2) = 1 - \left(\gamma_{\{a_1, a_2\}}(Q) - \gamma_{\{a_1\}}(Q)\right) \qquad (7)$$
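Formula (7) can be evaluated directly from the dependency degree. This sketch reuses the same assumed reading of formula (6) as the standard positive-region dependency; `rough_relation_table` is a hypothetical name. As written, formula (7) is not symmetric in a_1 and a_2, so the sketch fills the full matrix rather than one triangle; diagonal entries come out as 1, since combining an attribute with itself adds nothing.

```python
from collections import defaultdict


def dependency(table: list[dict], cond_attrs: list[str], decision: str) -> float:
    """Assumed reading of formula (6): positive-region share of the decision."""
    groups: dict[tuple, list] = defaultdict(list)
    for row in table:
        groups[tuple(row[a] for a in cond_attrs)].append(row[decision])
    consistent = sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)
    return consistent / len(table)


def rough_relation_table(table: list[dict], attrs: list[str], decision: str) -> list[list[float]]:
    """Formula (7): RT(a_i, a_j) = 1 - (gamma({a_i, a_j}) - gamma({a_i}))."""
    m = len(attrs)
    single = [dependency(table, [a], decision) for a in attrs]
    rt = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            pair = dependency(table, [attrs[i], attrs[j]], decision)
            rt[i][j] = 1.0 - (pair - single[i])
    return rt
```

Every entry needs its own pass over the data, which is why the text calls this computation expensive and parallelizes it.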
For an attribute b and an attribute set D = {a_1, a_2, ..., a_n}, the rough relation is:
$$RTD = \sum_{i=1}^{n} RT(b, a_i) \qquad (8)$$
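Formula (8) is a plain sum over one row of the rough (coarse) relation table. A minimal sketch, with the table stored as a nested list indexed by attribute number (an assumption) and `rtd` as a hypothetical name:

```python
from typing import Sequence


def rtd(rt: Sequence[Sequence[float]], b: int, subset: Sequence[int]) -> float:
    """Formula (8): sum of RT(b, a_i) over the attributes a_i in the subset."""
    return sum(rt[b][a] for a in subset)
```

For example, with `rt = [[1, 0.2, 0.4], [0.2, 1, 0.9], [0.4, 0.9, 1]]`, attribute 2 against the subset {0, 1} gives 0.4 + 0.9 = 1.3.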
Computing the rough relations is expensive and must be parallelized as shown in Fig. 5. Once they are computed, the "attribute-to-set relation" is obtained by traversing the rough relation table; this is merely a summation, so its cost is small. After the rough relation table has been built, each attribute subset is obtained.
Embodiment 6: the integrated classifier of this embodiment comprises the following devices:
a device for reading the raster data to be processed using a combination of multiple processes and multiple threads, comprising the following modules:
a module for inputting the number n of sub-classifiers of the integrated classifier;
where n is the number of sub-classifiers and n ≥ 2; an expectation algorithm divides all spatial attributes of the raster data into n parts according to their decision-making capability, and each sub-classifier retains the classification capability of the full attribute set;
a module for starting n+1 processes;
where the n+1 processes are Rank 0, Rank 1, ..., Rank n; Rank 0 is the management process and Rank 1, ..., Rank n are computing processes, which correspond one-to-one with the n sub-classifiers;
a module for, when the current process is the management process Rank 0, constructing an empty rough relation table, distributing the raster data to be processed evenly among the computing processes, and starting n threads, each corresponding to one computing process;
where the threads are the 1st thread, the 2nd thread, ..., the n-th thread;
a module for, when the current process is a computing process, having every computing process read the raster data to be processed simultaneously;
a device for the management process Rank 0 to maintain an attribute discretization interval table and distribute it evenly among multiple threads, the threads simultaneously discretizing the raster data of their corresponding continuous spatial attributes;
a device for the management process Rank 0 to distribute the spatial attributes evenly among the n computing processes for processing, collect the results of the n computing processes, build the complete rough relation table and send it to every computing process, each computing process building an attribute subset from the rough relation table;
a device for the management process Rank 0 to have each computing process train, in parallel, a sub-classifier model on its attribute subset, the sub-classifiers corresponding one-to-one with the processes; each sub-classifier predicts the class from its attribute subset based on fuzzy rough set theory, the predictions of all sub-classifiers are tallied, and the prediction receiving the most votes is selected by majority voting.
Embodiment 7: this embodiment differs from the integrated classifier of Embodiment 6 in that the raster data is high-dimensional raster data.
Embodiment 8: this embodiment differs from the integrated classifier of Embodiment 6 or 7 in that the device for the management process Rank 0 to maintain an attribute discretization interval table and distribute it evenly among multiple threads, the threads simultaneously discretizing the raster data of their corresponding continuous spatial attributes, comprises the following modules:
a module for setting the number of clusters to ceil;
a module for computing uniformly distributed initial cluster centers between the maximum and minimum values of the continuous spatial attribute handled by this thread;
a module for clustering with the K-Means algorithm, starting from the uniformly distributed initial cluster centers, to form ceil clusters;
a module for outputting the minimum and maximum value of each cluster, forming ceil value intervals;
a module for combining the ceil value intervals into an interval list.
Embodiment 9: this embodiment differs from the integrated classifier of Embodiment 6 in that the rough relation table is a two-dimensional table expressing the degree of direct overlap between two attributes: a rough relation of 1 indicates that the two attributes are most strongly related, and 0 indicates that they are least related. The rough relation table is as follows:
[Table: rough relation table (image not reproduced)]
Embodiment 10: this embodiment differs from the integrated classifier of Embodiment 6 in that the device for the management process Rank 0 to distribute the spatial attributes evenly among the n computing processes for processing, collect the results of the n computing processes, build the complete rough relation table and send it to every computing process, each computing process building an attribute subset from the rough relation table, comprises the following modules:
a module for randomly selecting, in the rough relation table of the computing process, a pair of attributes whose rough relation shows them to be uncorrelated and whose state is "unused", adding them to the attribute subset of the computing process (each computing process has its own subset), and marking them as "used";
the state of an attribute is either "used" or "unused";
a module for computing, in the computing process and according to formula (8), the relation between each "unused" attribute and the attribute subset of the computing process,
where the rough relation between an attribute and an attribute subset is:
$$RTD = \sum_{i=1}^{n} RT(b, a_i) \qquad (8)$$
where b is a candidate ("unused") attribute, a_1, ..., a_n are the attributes in the attribute subset of the computing process, and RT(b, a_i) is the rough relation between b and a_i;
a module for selecting the attribute with the smallest computed value, adding it to the attribute subset of the computing process, and marking it as "used";
a module for computing, according to formula (6), the relation between the attribute subset of the computing process and the full dimension set D;
$$\gamma_D(w) = \frac{\mathrm{Card}\left(\bigcup_{X \in U/IND(w)} POS_D(X)\right)}{\mathrm{Card}(U)} \qquad (6)$$
where w is the attribute subset of the computing process, IND(w) is the indiscernibility relation induced by w, Card(U) is the cardinality of U, and POS_D(X) is the positive region of X with respect to D;
a module for outputting the attribute subset of the computing process when γ_D(w) = 1;
a module for computing again, when γ_D(w) = 0, in the computing process and according to formula (8), the relation between each "unused" attribute and the attribute subset of the computing process.

Claims (10)

1. A classification method for an integrated classifier, characterized in that it comprises the following steps:
Step 1: read the raster data to be processed using a combination of multiple processes and multiple threads; the detailed procedure comprises the following steps:
A. Input the number n of sub-classifiers of the integrated classifier;
where n is the number of sub-classifiers and n ≥ 2; all spatial attributes of the raster data are divided into n parts according to their decision-making capability by an expectation algorithm, and each sub-classifier possesses the overall classification capability of the full set;
B. Start n+1 processes;
where the n+1 processes are Rank 0, Rank 1, ..., Rank n; Rank 0 is the management process and Rank 1, ..., Rank n are computing processes, which correspond one-to-one to the n sub-classifiers;
C. When the current process is the management process Rank 0, construct an empty rough relation table, allocate the raster data to be processed evenly among the computing processes, and start n threads, each corresponding to one computing process;
where the threads are the 1st thread, the 2nd thread, ..., the nth thread;
D. When the current process is a computing process, all computing processes read the raster data to be processed simultaneously;
Step 2: the management process Rank 0 maintains an attribute discretization interval table and allocates it evenly among multiple threads, which simultaneously discretize the raster data of their corresponding continuous spatial attributes;
Step 3: the management process Rank 0 distributes the spatial attributes evenly among the n computing processes for processing, collects the results of the n computing processes, constructs the complete rough relation table, and sends this table to every computing process; each computing process builds one attribute set from the rough relation table;
Step 4: the management process Rank 0, together with each computing process, trains the sub-classifiers in parallel on their corresponding attribute sets to produce models, each sub-classifier corresponding one-to-one to a process; each sub-classifier predicts the class for its corresponding attribute set, the predictions of all sub-classifiers are tallied, and the prediction receiving the most votes is selected by majority voting.
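The vote at the end of step 4 can be sketched as per-pixel majority voting. This Python sketch assumes each sub-classifier has already produced one label per pixel and breaks ties in favour of the label encountered first, a rule the claim does not specify; `majority_vote` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-pixel labels from several sub-classifiers by plurality vote.

    predictions[k][p] is the label that sub-classifier k assigns to pixel p.
    Ties go to the label that appears first among the votes (assumed rule).
    """
    if not predictions:
        raise ValueError("at least one sub-classifier is required")
    n_pixels = len(predictions[0])
    if any(len(p) != n_pixels for p in predictions):
        raise ValueError("every sub-classifier must label the same pixels")
    # zip(*predictions) yields, for each pixel, the tuple of votes from all sub-classifiers.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


preds = [
    ["water", "forest", "urban"],   # sub-classifier on Rank 1
    ["water", "urban", "urban"],    # sub-classifier on Rank 2
    ["forest", "forest", "water"],  # sub-classifier on Rank 3
]
print(majority_vote(preds))  # ['water', 'forest', 'urban']
```

`Counter.most_common` orders equal counts by first occurrence, which is what makes the tie rule deterministic here.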
2. The classification method for an integrated classifier according to claim 1, characterized in that the raster data in step A is high-dimensional raster data.
3. The classification method for an integrated classifier according to claim 1 or 2, characterized in that, in step 2, each thread discretizes the raster data of its corresponding continuous spatial attribute through the following steps:
Step 2-1: set the number of clusters to ceil;
Step 2-2: compute uniformly distributed initial cluster centers between the maximum and minimum values of the continuous spatial attribute handled by the thread;
Step 2-3: cluster from the uniformly distributed initial centers using the K-Means algorithm, forming ceil clusters;
Step 2-4: output the minimum and maximum value of each cluster, forming ceil value intervals;
Step 2-5: assemble the ceil value intervals into an interval list.
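Steps 2-1 to 2-5 amount to one-dimensional K-Means discretization. The Python sketch below follows them for a single attribute, assuming the "uniformly distributed" initial centers sit at the midpoints of ceil equal-width bins between the minimum and maximum (the claim does not fix the spacing). `discretize_intervals` is a hypothetical name, and clusters that end up empty are simply dropped, so fewer than ceil intervals can be returned.

```python
from typing import List, Sequence, Tuple


def discretize_intervals(values: Sequence[float], ceil: int, max_iter: int = 100) -> List[Tuple[float, float]]:
    """1-D K-Means with evenly spaced initial centres; returns (min, max) per cluster."""
    if ceil < 1:
        raise ValueError("ceil must be at least 1")
    if not values:
        return []
    lo, hi = min(values), max(values)
    step = (hi - lo) / ceil
    # Step 2-2 (assumed spacing): centres at the midpoints of ceil equal-width bins.
    centers = [lo + (i + 0.5) * step for i in range(ceil)]
    clusters: List[List[float]] = []
    for _ in range(max_iter):
        # Step 2-3: assign each value to its nearest centre ...
        clusters = [[] for _ in range(ceil)]
        for v in values:
            nearest = min(range(ceil), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # ... then move each centre to the mean of its members (empty clusters keep theirs).
        new_centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    # Steps 2-4 and 2-5: one (min, max) interval per non-empty cluster, sorted into a list.
    return sorted((min(c), max(c)) for c in clusters if c)


print(discretize_intervals([1.0, 1.2, 0.8, 5.0, 5.1, 9.0, 9.2], 3))
# [(0.8, 1.2), (5.0, 5.1), (9.0, 9.2)]
```

Each thread would run this on its own continuous attribute; a pixel value is then replaced by the index of the interval containing it.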
4. The classification method for an integrated classifier according to claim 1 or 2, characterized in that the rough relation table in step 3 is a two-dimensional table expressing the degree of direct overlap between two attributes; a rough relation of 1 indicates the strongest direct correlation between the attributes and a rough relation of 0 indicates the weakest correlation. The rough relation table is as follows:
[Figure FDA00002233108800021: rough relation table]
5. The classification method for an integrated classifier according to claim 4, characterized in that, in step 3, each computing process builds one attribute set from the rough relation table through the following steps:
Step 3-1: in the rough relation table of the computing process, randomly select a pair of attributes that are uncorrelated according to the rough relation and whose state is "unused"; add them to the attribute set of the computing process (this subset corresponds one-to-one to the computing process) and mark them as "used";
The state of an attribute is either "used" or "unused";
Step 3-2: in the computing process, compute according to formula (8) the relation between each "unused" attribute and the attribute set of the computing process;
The rough relation between an attribute and an attribute set is:
$$RT_D = \sum_{1}^{n} RT(b, a_n) \qquad (8)$$
Where b denotes the attribute set of the computing process, a_n denotes any attribute whose state is "unused", and RT(b, a_n) denotes the rough relation between the attribute set of the computing process and that attribute;
Step 3-3: select the attribute with the smallest computed result, add it to the attribute set of the computing process, and mark it as "used";
Step 3-4: compute according to formula (6) the relation (dependency degree) between the attribute set of the computing process and the full dimension set D;
$$\gamma_D(w) = \frac{\mathrm{Card}\left(\bigcup_{X \Subset IND(w)} POS_D(X)\right)}{\mathrm{Card}(U)} \qquad (6)$$
Where w denotes the attribute set of the computing process, IND(w) is the indiscernibility relation corresponding to the subset w, Card(U) is the cardinality of the set U, and POS_D(X) is the positive region of D corresponding to X;
Step 3-5: when γ_D(w) = 1, output the attribute set of the computing process;
Step 3-6: when γ_D(w) = 0, compute according to formula (8) the relation between each "unused" attribute and the attribute set of the computing process.
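Steps 3-1 to 3-6 describe a greedy loop. The Python sketch below is one reading of it under several stated assumptions: the rough relation table is a symmetric matrix whose indices are the condition columns of `rows`; an "uncorrelated" starting pair means a pair with the smallest table value, picked at random by a seedable generator; formula (8) is read as summing RT between each attribute already in the set and the candidate; and the loop keeps adding attributes while γ_D(w) < 1 (the claim names γ_D(w) = 0), stopping when γ_D(w) = 1 or no candidates remain. `select_attributes` and the example data are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def dependency_degree(rows: Sequence[Sequence[Hashable]], w: Sequence[int], d: Sequence[int]) -> float:
    """Rough-set dependency: share of rows whose w-block is consistent on d (formula (6), as read here)."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in w)].append(tuple(row[a] for a in d))
    positive = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return positive / len(rows) if rows else 0.0


def select_attributes(rt: Sequence[Sequence[float]], rows, d: Sequence[int], seed: int = 0) -> List[int]:
    """Greedy attribute-set construction following steps 3-1 to 3-6 (sketch)."""
    n = len(rt)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return list(range(n))
    # Step 3-1: pick at random one of the least-related pairs and mark both "used".
    lowest = min(rt[i][j] for i, j in pairs)
    first = random.Random(seed).choice([p for p in pairs if rt[p[0]][p[1]] == lowest])
    used = list(first)
    unused = [a for a in range(n) if a not in used]
    while unused:
        # Step 3-2, formula (8) as read here: sum of RT(b, a) over attributes b already used.
        score = {a: sum(rt[b][a] for b in used) for a in unused}
        # Step 3-3: add the candidate least related to the current set.
        best = min(unused, key=score.__getitem__)
        used.append(best)
        unused.remove(best)
        # Steps 3-4 to 3-6: stop once the set reaches full dependency.
        if dependency_degree(rows, used, d) == 1.0:
            break
    return used


rt = [[1.0, 0.0, 0.5, 0.9],
      [0.0, 1.0, 0.2, 0.8],
      [0.5, 0.2, 1.0, 0.3],
      [0.9, 0.8, 0.3, 1.0]]
rows = [(0, 0, 0, 0, "a"), (0, 0, 1, 1, "b"), (1, 1, 0, 0, "a"), (1, 0, 1, 1, "b")]
print(select_attributes(rt, rows, [4]))  # [0, 1, 2]
```

Here attributes 0 and 1 start the set (their relation is 0), attribute 2 scores 0.5 + 0.2 = 0.7 against 1.7 for attribute 3, and after it is added every row is distinguishable, so γ reaches 1 and the loop stops.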
6. An integrated classifier, characterized in that it comprises the following devices:
A device for reading the raster data to be processed using a combination of multiple processes and multiple threads, the device comprising the following modules:
A module for inputting the number n of sub-classifiers of the integrated classifier;
where n is the number of sub-classifiers and n ≥ 2; all spatial attributes of the raster data are divided into n parts according to their decision-making capability by an expectation algorithm, and each sub-classifier possesses the overall classification capability of the full set;
A module for starting n+1 processes;
where the n+1 processes are Rank 0, Rank 1, ..., Rank n; Rank 0 is the management process and Rank 1, ..., Rank n are computing processes, which correspond one-to-one to the n sub-classifiers;
A module for, when the current process is the management process Rank 0, constructing an empty rough relation table, allocating the raster data to be processed evenly among the computing processes, and starting n threads, each corresponding to one computing process;
where the threads are the 1st thread, the 2nd thread, ..., the nth thread;
A module for having all computing processes read the raster data to be processed simultaneously when the current process is a computing process;
A device for the management process Rank 0 to maintain an attribute discretization interval table and allocate it evenly among multiple threads, which simultaneously discretize the raster data of their corresponding continuous spatial attributes;
A device for the management process Rank 0 to distribute the spatial attributes evenly among the n computing processes for processing, collect the results of the n computing processes, construct the complete rough relation table, and send this table to every computing process, each computing process building one attribute set from the rough relation table;
A device for the management process Rank 0 to have each computing process train its sub-classifier in parallel on the corresponding attribute set to produce a model, each sub-classifier corresponding one-to-one to a process; each sub-classifier predicts the class for its corresponding attribute set, the predictions of all sub-classifiers are tallied, and the prediction receiving the most votes is selected by majority voting.
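The method and device both have Rank 0 allocate raster data, and later spatial attributes, "evenly" among the n computing processes without fixing a scheme. One plausible reading is contiguous blocks whose sizes differ by at most one, as in this Python sketch. `even_split`, `assign_to_ranks` and the band names are hypothetical; the items could equally be raster row indices, and a real deployment would more likely send each share with an MPI library such as mpi4py than build a Python dict.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def even_split(items: Sequence[T], n: int) -> List[List[T]]:
    """Split items into n contiguous parts whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be positive")
    base, extra = divmod(len(items), n)
    parts, start = [], 0
    for k in range(n):
        size = base + (1 if k < extra else 0)  # the first `extra` parts take one more item
        parts.append(list(items[start:start + size]))
        start += size
    return parts


def assign_to_ranks(items: Sequence[T], n: int) -> Dict[int, List[T]]:
    """Give computing processes Rank 1..Rank n one share each; Rank 0 only manages."""
    if n < 2:
        raise ValueError("the claims require at least two sub-classifiers")
    return {rank: part for rank, part in enumerate(even_split(items, n), start=1)}


print(assign_to_ranks(["band1", "band2", "band3", "band4", "band5"], 2))
# {1: ['band1', 'band2', 'band3'], 2: ['band4', 'band5']}
```

Contiguous blocks keep each computing process's share of a raster spatially compact, which is convenient when rows are read from disk in order.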
7. The integrated classifier according to claim 6, characterized in that the raster data is high-dimensional raster data.
8. The integrated classifier according to claim 6 or 7, characterized in that the device for the management process Rank 0 to maintain an attribute discretization interval table and allocate it evenly among multiple threads, which simultaneously discretize the raster data of their corresponding continuous spatial attributes, comprises the following modules:
A module for setting the number of clusters to ceil;
A module for computing uniformly distributed initial cluster centers between the maximum and minimum values of the continuous spatial attribute handled by the thread;
A module for clustering from the uniformly distributed initial centers using the K-Means algorithm, forming ceil clusters;
A module for outputting the minimum and maximum value of each cluster, forming ceil value intervals;
A module for assembling the ceil value intervals into an interval list.
9. The integrated classifier according to claim 6, characterized in that the rough relation table is a two-dimensional table expressing the degree of direct overlap between two attributes; a rough relation of 1 indicates the strongest direct correlation between the attributes and a rough relation of 0 indicates the weakest correlation. The rough relation table is as follows:
[Figure FDA00002233108800041: rough relation table]
10. The classification method of the integrated classifier for raster-data classification according to claim 9, characterized in that the device for the management process Rank 0 to distribute the spatial attributes evenly among the n computing processes for processing, collect the results of the n computing processes, construct the complete rough relation table, and send this table to every computing process, each computing process building one attribute set from the rough relation table, comprises the following modules:
A module for randomly selecting, in the rough relation table of the computing process, a pair of attributes that are uncorrelated according to the rough relation and whose state is "unused", adding them to the attribute set of the computing process (this subset corresponds one-to-one to the computing process), and marking them as "used";
The state of an attribute is either "used" or "unused";
A module for computing, in the computing process and according to formula (8), the relation between each "unused" attribute and the attribute set of the computing process;
The rough relation between an attribute and an attribute set is:
$$RT_D = \sum_{1}^{n} RT(b, a_n) \qquad (8)$$
Where b denotes the attribute set of the computing process, a_n denotes any attribute whose state is "unused", and RT(b, a_n) denotes the rough relation between the attribute set of the computing process and that attribute;
A module for selecting the attribute with the smallest computed result, adding it to the attribute set of the computing process, and marking it as "used";
A module for computing, according to formula (6), the relation (dependency degree) between the attribute set of the computing process and the full dimension set D;
$$\gamma_D(w) = \frac{\mathrm{Card}\left(\bigcup_{X \Subset IND(w)} POS_D(X)\right)}{\mathrm{Card}(U)} \qquad (6)$$
Where w denotes the attribute set of the computing process, IND(w) is the indiscernibility relation corresponding to the subset w, Card(U) is the cardinality of the set U, and POS_D(X) is the positive region of D corresponding to X;
A module for outputting the attribute set of the computing process when γ_D(w) = 1;
A module for computing, according to formula (8), the relation between each "unused" attribute and the attribute set of the computing process when γ_D(w) = 0.
CN201210379640.9A 2012-10-09 2012-10-09 The sorting technique of integrated classifier and this device Expired - Fee Related CN102930290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210379640.9A CN102930290B (en) 2012-10-09 2012-10-09 The sorting technique of integrated classifier and this device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210379640.9A CN102930290B (en) 2012-10-09 2012-10-09 The sorting technique of integrated classifier and this device

Publications (2)

Publication Number Publication Date
CN102930290A true CN102930290A (en) 2013-02-13
CN102930290B CN102930290B (en) 2015-08-19

Family

ID=47645087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210379640.9A Expired - Fee Related CN102930290B (en) 2012-10-09 2012-10-09 The sorting technique of integrated classifier and this device

Country Status (1)

Country Link
CN (1) CN102930290B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101251896A (en) * 2008-03-21 2008-08-27 Tencent Technology (Shenzhen) Co., Ltd. Object detection system and method based on multiple classifiers
US7562017B1 (en) * 2003-05-29 2009-07-14 At&T Intellectual Property Ii, L.P. Active labeling for spoken language understanding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
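Pan Xin et al.: "粗集属性划分的集成遥感分类" (Ensemble remote sensing classification based on rough-set attribute partitioning), 《遥感学报》 (Journal of Remote Sensing), 31 December 2009 (2009-12-31) *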

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104484404A (en) * 2014-12-15 2015-04-01 Northeast Institute of Geography and Agroecology of CAS Improved processing method for geo-raster data file in distributed file system
CN104484404B (en) * 2014-12-15 2017-11-07 Northeast Institute of Geography and Agroecology of CAS Improved method for processing geographic raster data files in a distributed file system
CN105303470A (en) * 2015-11-26 2016-02-03 State Grid Liaoning Electric Power Co., Ltd. Dalian Power Supply Company Electric power project planning and construction method based on big data
CN107203775A (en) * 2016-03-18 2017-09-26 Alibaba Group Holding Ltd. Image classification method, device and equipment
CN107203775B (en) * 2016-03-18 2021-07-27 Banma Zhixing Network (Hong Kong) Co., Ltd. Image classification method, device and equipment
CN111259273A (en) * 2018-11-30 2020-06-09 SF Technology Co., Ltd. Webpage classification model construction method, classification method and device

Also Published As

Publication number Publication date
CN102930290B (en) 2015-08-19

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150819

Termination date: 20181009