CN1588363A - Backward coarse collecting attribute reducing method using directed search - Google Patents



Publication number
CN1588363A
CN1588363A (application CN200410067151A)
Authority
CN
China
Prior art keywords
attribute
memory block
transient state
attribute set
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200410067151
Other languages
Chinese (zh)
Other versions
CN1300730C (en)
Inventor
杨胜 (Yang Sheng)
施鹏飞 (Shi Pengfei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CNB2004100671515A priority Critical patent/CN1300730C/en
Publication of CN1588363A publication Critical patent/CN1588363A/en
Application granted granted Critical
Publication of CN1300730C publication Critical patent/CN1300730C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The directed-search backward rough set attribute reduction method uses the mutual information and the redundancy-synergy coefficient of an attribute subset as the measures for rough set attribute reduction. The initial attribute set is first sorted; from the children of the initial attribute set, several equivalent attribute subsets with minimum redundancy-synergy coefficient are selected and stored in the directed memory block. Next, several equivalent attribute subsets with minimum redundancy-synergy coefficient are selected from the children of those subsets for further search, and so on, until no further equivalent attribute subset can be found. The attribute subsets ultimately stored in the directed memory block constitute the attribute reduction result. The method is flexible, simple, and general, and can be applied in all fields of rough set attribute reduction.

Description

Backward rough set attribute reduction method using directed search
Technical field
The present invention relates to a rough set attribute reduction method, and in particular to a backward rough set attribute reduction method that uses mutual information as the reduction measure and adopts a directed (beam) search technique, providing a good approach for rough set knowledge acquisition. The invention belongs to the field of information processing.
Background technology
With the rapid development of information technology and the wide application of database management systems (DBMS), the data people accumulate grow ever larger. The rapidly growing data hide much important information, and people hope to analyze them at a higher level in order to make better use of them. Current database systems can efficiently perform functions such as data entry, querying, and statistics, but they cannot discover the relations and rules hidden in the data, nor can they predict future trends from existing data. The lack of means to mine the knowledge hidden behind the data has caused the phenomenon of "data explosion but knowledge poverty." Research on methods that can draw conclusions from massive information therefore becomes increasingly important, yet advanced intelligent data analysis technology is still far from mature.
Rough set theory, proposed by Z. Pawlak, is a theoretical method for inducing and expressing uncertain and incomplete knowledge and data. It is widely used in data mining, machine learning, artificial intelligence, fault diagnosis, and other fields, and has become a research focus in recent years. Rough set theory obtains classification rules through attribute reduction and value reduction, and then handles classification problems. Attribute reduction is a basic operation in the classification rule acquisition process of rough set theory: it deletes irrelevant and redundant attributes while preserving the classification capability of the initial attribute set. On the basis of attribute reduction, further value reduction yields simplified classification rules.
Minimum (also called optimal) attribute reduction obtains a minimal attribute subset whose classification capability equals that of the initial attribute set. The goal of rough set attribute reduction is minimum attribute reduction, which has been proved to be NP-hard. Current attribute reduction methods fall into two broad classes:
(1) Complete search methods. A complete search evaluates every possible attribute subset and obtains the minimum attribute reduction result. The most complete search is exhaustive combinatorial search, i.e., evaluating every attribute combination; this is the most time-consuming approach, e.g., forward exhaustive combination search. When the evaluation measure is monotonic, branch-and-bound search can be used. When mutual information serves as the attribute reduction measure, branch-and-bound methods such as automatic branch and bound (ABB) and branch and bound (B&B) apply; both use the mutual information of the initial attribute set as the bound. The difference is that the former is a breadth-first search while the latter adopts depth-first search. Only complete search can guarantee minimum attribute reduction, but its time complexity is exponential; when the attribute set is too large (typically > 20 attributes), complete search becomes inapplicable because the running time is too long.
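To make the complete-search baseline concrete, the following is a minimal sketch, not the patent's code: the function names, the count-based mutual-information estimator, the tolerance, and the toy data are all illustrative assumptions. It evaluates every attribute combination, smallest first, and returns the minimum subsets that preserve the mutual information of the full attribute set.

```python
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(samples, attrs, labels):
    # Count-based estimate of I(A;P); `attrs` is a tuple of column indices.
    n = len(samples)
    joint, am, pm = Counter(), Counter(), Counter(labels)
    for s, y in zip(samples, labels):
        k = tuple(s[i] for i in attrs)
        joint[(k, y)] += 1
        am[k] += 1
    return sum(c / n * log2((c / n) / (am[k] / n * pm[y] / n))
               for (k, y), c in joint.items())

def exhaustive_reduct(samples, labels, p, tol=1e-12):
    # Evaluate every attribute combination, smallest first; return all
    # minimum subsets whose MI with the class equals that of the full set.
    target = mutual_information(samples, tuple(range(p)), labels)
    for size in range(1, p + 1):
        hits = [set(c) for c in combinations(range(p), size)
                if abs(mutual_information(samples, c, labels) - target) < tol]
        if hits:
            return hits
    return [set(range(p))]

# Toy data (hypothetical): the label is f0 XOR f1; f2 is constant, hence removable.
samples = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
labels = [0, 1, 1, 0]
```

On this toy set the only minimum reduct is {f0, f1}; the exponential cost comes from the `combinations` enumeration, which is exactly what the beam-search method of the invention avoids.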
(2) Heuristic search. Heuristic search proceeds along a chosen direction; the most common form is best-first search. A common heuristic attribute reduction method examines each attribute in turn to see whether it can be deleted, so the result obviously varies with the order in which attributes are examined. There are also best-first heuristic attribute reduction methods based on mutual information, which start from the core and perform reduction with maximizing mutual information as the search direction. The drawback of heuristics is that they are unidirectional: only one direction is advanced and explored. Their running time is greatly reduced compared with complete search, but they often produce a poor attribute reduction result.
Summary of the invention
The objective of the present invention is to overcome the shortcomings of existing rough set attribute reduction methods and to provide a new rough set attribute reduction method that achieves high-quality attribute reduction with fast computation, meeting the practical needs of classification learning.
In order to realize such a purpose, the present invention uses the mutual information of an attribute subset together with its redundancy-synergy coefficient (RSC), defined as RSC(A) = I(A;P) / Σ_{i=1}^{a} I(f_i;P) for A = {f_i | i = 1,…,a}, as the measure for rough set attribute reduction. Starting from the sorted initial attribute set F, the method chooses from the children of the initial attribute set (a child is the attribute subset obtained by deleting one attribute) the M equivalent attribute subsets (an equivalent attribute subset is one whose mutual information is equal) with minimum redundancy-synergy coefficient and stores them in the directed memory block. Then, from the children of these M equivalent subsets, it again chooses the M equivalent subsets with minimum redundancy-synergy coefficient, stores them in the directed memory block, and searches further; and so on, until no equivalent attribute subset can be found. The attribute subsets finally stored in the directed memory block are the attribute reduction result.
The concrete steps of the inventive method are as follows:
1. Initialization: sort the attributes of the initial attribute set F in ascending order of mutual information, and deposit the sorted initial attribute set F into the directed memory block (Beam).
2. Beam search: empty the transient memory block (Queue). For each attribute subset in the directed memory block, by the properties of the redundancy-synergy coefficient, its M equivalent children with minimum redundancy-synergy coefficient — i.e., the first M equivalent children — can be found by deleting attributes one by one from front to back; deposit them into the transient memory block. Here the redundancy-synergy coefficient is RSC(A) = I(A;P) / Σ_{i=1}^{a} I(f_i;P), A = {f_i | i = 1,…,a}, where A denotes an attribute subset, f_i denotes an attribute, I(A;P) denotes the mutual information between A and the class attribute P, and I(f_i;P) denotes the mutual information between f_i and P. If an attribute subset has fewer than M equivalent children, deposit all of its equivalent children into the transient memory block.
3. Beam search stop condition: if the transient memory block contains attribute subsets, empty the directed memory block, find the M attribute subsets with minimum redundancy-synergy coefficient in the transient memory block, and deposit them into the directed memory block; if the transient memory block holds fewer than M attribute subsets, deposit all of them into the directed memory block; then continue with the beam search of step 2. If the transient memory block contains no attribute subset, output all attribute subsets in the directed memory block as the attribute reduction result.
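The three steps above can be sketched as follows — an illustrative Python rendering, not the patent's implementation; the count-based mutual-information estimator, the equality tolerance, and the toy data are assumptions:

```python
from collections import Counter
from math import log2

def mi(samples, attrs, labels):
    # Count-based estimate of I(A;P); `attrs` is a tuple of column indices.
    n = len(samples)
    joint, am, pm = Counter(), Counter(), Counter(labels)
    for s, y in zip(samples, labels):
        k = tuple(s[i] for i in attrs)
        joint[(k, y)] += 1
        am[k] += 1
    return sum(c / n * log2((c / n) / (am[k] / n * pm[y] / n))
               for (k, y), c in joint.items())

def rsc(samples, attrs, labels):
    # RSC(A) = I(A;P) / sum_i I(f_i;P); infinite when the denominator is 0.
    denom = sum(mi(samples, (i,), labels) for i in attrs)
    return mi(samples, attrs, labels) / denom if denom else float("inf")

def backward_reduction(samples, labels, p, M=2, tol=1e-12):
    # Step 1: sort attributes in ascending order of single-attribute MI.
    order = sorted(range(p), key=lambda i: mi(samples, (i,), labels))
    beam = [tuple(order)]                      # directed memory block
    target = mi(samples, tuple(range(p)), labels)
    while True:
        queue = []                             # transient memory block
        for parent in beam:
            kids = []
            for j in range(len(parent)):       # delete attributes front to back
                child = parent[:j] + parent[j + 1:]
                if child and abs(mi(samples, child, labels) - target) < tol:
                    kids.append(child)         # equivalent child found
                if len(kids) == M:
                    break                      # first M children suffice
            queue.extend(kids)
        if not queue:                          # step 3: stop condition
            return [set(b) for b in beam]
        queue = list(dict.fromkeys(queue))     # drop duplicates across parents
        beam = sorted(queue, key=lambda a: rsc(samples, a, labels))[:M]

# Toy data (hypothetical): label = f0 XOR f1; f2 is constant, hence removable.
samples = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
labels = [0, 1, 1, 0]
```

On the toy data the only reduct is {f0, f1}, found both with M = 2 and with M = 1, matching the observation that best-first search is the special case M = 1.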
The method of the present invention can guarantee both fast computation and the quality of the attribute reduction result through a flexible choice of M. An initial value of M can be set according to the size of the initial attribute set — the larger the initial attribute set, the smaller the initial M — and then adjusted according to the running time: if the running time is too long, decrease M; otherwise, increase M, until a satisfactory attribute reduction result is obtained. A larger M enlarges the search range and thus yields better attribute reduction results, while a smaller M guarantees fast computation. The present invention is a heuristic attribute reduction method; unlike the usual best-first method, it can be regarded as an extension of best-first search — or, equivalently, best-first search is its special case.
The present invention uses the mutual information of an attribute subset and an inter-attribute information redundancy measure — the redundancy-synergy coefficient — as the attribute reduction measures, performing attribute reduction in a single backward sweep. The method is flexible and simple to implement, well targeted and highly general, has polynomial time complexity, and can be applied in all fields of rough set attribute reduction.
Description of drawings
Fig. 1 is a schematic diagram of the beam search in the method of the invention.
Embodiment
For a better understanding of the technical scheme of the present invention, it is further described below with reference to the drawings and an embodiment.
(1) initialization:
Sort the attributes of the initial attribute set F in ascending order of mutual information I(f_i;P), and deposit the sorted initial attribute set F into the directed memory block (Beam). Arranging the attributes in ascending order of mutual information makes it convenient to find, for each attribute subset in the directed memory block, the first M equivalent children with minimum redundancy-synergy coefficient; this compresses the beam search space and reduces the search time.
Note that the redundancy-synergy coefficient describes, as a quotient of information quantities, the degree of redundancy and the combined synergy of an attribute subset. For A ⊆ F (A = {f_i | f_i ∈ A, i = 1,…,a}), RSC(A) is called the redundancy-synergy coefficient of the attribute subset A and is computed as in formula (1):

RSC(A) = I(A;P) / Σ_{i=1}^{a} I(f_i;P)    (1)
The redundancy-synergy coefficient is a relative information measure whose value ranges over (0, ∞). The smaller the redundancy-synergy coefficient, the weaker the combined synergy of the attributes and the greater the redundancy of class information among them, so the more attributes can be deleted while keeping the mutual information unchanged. It has the following two properties:
(1) If I(A;P) = I(B;P) and A ⊆ B, then RSC(A) ≥ RSC(B).
(2) For an attribute subset A ⊆ F, A = {f_1, f_2, …, f_a}, if I(f_1;P) < I(f_2;P) < … < I(f_a;P) and I(A−{f_i};P) = I(A;P) for i = 1, 2, …, a, then RSC(A−{f_1}) < RSC(A−{f_2}) < … < RSC(A−{f_a}).
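As an illustration of the measure's two regimes (a sketch under assumed toy data; the count-based MI estimator and the data are hypothetical), a fully redundant attribute pair gives RSC = 1/2, while a purely synergistic (XOR) pair drives the denominator to zero:

```python
from collections import Counter
from math import log2

def mi(samples, attrs, labels):
    # Count-based estimate of I(A;P); `attrs` is a tuple of column indices.
    n = len(samples)
    joint, am, pm = Counter(), Counter(), Counter(labels)
    for s, y in zip(samples, labels):
        k = tuple(s[i] for i in attrs)
        joint[(k, y)] += 1
        am[k] += 1
    return sum(c / n * log2((c / n) / (am[k] / n * pm[y] / n))
               for (k, y), c in joint.items())

def rsc(samples, attrs, labels):
    # RSC(A) = I(A;P) / sum_i I(f_i;P); infinite when the denominator is 0.
    denom = sum(mi(samples, (i,), labels) for i in attrs)
    return mi(samples, attrs, labels) / denom if denom else float("inf")

# Redundant pair: two copies of a perfect predictor -> small RSC (1/2),
# signalling that one attribute can be deleted without losing mutual information.
dup = [(0, 0), (1, 1), (0, 0), (1, 1)]
dup_y = [0, 1, 0, 1]
# Synergistic pair (XOR): each attribute alone carries no class information,
# so the denominator vanishes and RSC diverges.
xor = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_y = [0, 1, 1, 0]
```

Low RSC thus marks subsets with deletable redundancy, which is why the method keeps the M lowest-RSC equivalent subsets at each level.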
In the present invention the attributes of the initial attribute set F are first arranged in ascending order of mutual information. By property (2) of the redundancy-synergy coefficient, with this ordering the first M equivalent children of each parent attribute subset can be found simply by deleting one attribute at a time from front to back, without considering all children of the parent. Because for each node (attribute subset) in the directed memory block Beam these first M equivalent children have the minimum redundancy-synergy coefficients, this greatly saves running time. This is why the initialization step sorts the attributes of F in ascending order of mutual information.
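The effect of property (2) can be checked arithmetically: holding the numerator I(A;P) fixed across equivalent children, deleting the attribute with the smallest individual mutual information leaves the largest denominator and hence the smallest RSC. The figures below are made up purely for illustration:

```python
# Assume every child A - {f_i} stays equivalent to A, so all children share
# the numerator I(A;P); the numbers are hypothetical.
I_full = 0.9                        # I(A;P), shared by all equivalent children
single_mi = [0.1, 0.2, 0.4, 0.5]    # I(f_i;P), already sorted ascending
total = sum(single_mi)
# Child A - {f_i} keeps every term except I(f_i;P) in its denominator.
child_rsc = [I_full / (total - m) for m in single_mi]
# Deleting the front (smallest-MI) attribute leaves the largest denominator,
# hence the smallest RSC: front-to-back deletion yields children already
# ordered by ascending RSC, so the first M deletions are the M best.
```

This is the arithmetic reason the method only needs front-to-back deletion instead of examining all children of each parent.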
(2) beam search:
Best-first search normally takes the single node with the best evaluation measure as the starting point of the next search step, whereas beam search takes the M best-evaluated nodes as the starting points of the next step. Beam search can be viewed as a "tree search of finite width" whose tree search width M is called the directed width. The beam search process is shown in Fig. 1, where dark nodes are the nodes used for further search, white nodes are the nodes discarded during the search, and the directed width M is 2. At each level, the two best tree nodes satisfying the optimality condition serve as the starting points of the next search step, until the stop condition is satisfied; the final result is nodes 1 and 2. If only K (K < M) equivalent attribute subsets with minimum redundancy-synergy coefficient can be found, those K subsets are used for further search.
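Independently of attribute reduction, the finite-width tree search described here can be sketched generically; the toy node and score definitions below are illustrative assumptions, not part of the patent:

```python
import heapq

def beam_search(start, children, score, M):
    # Finite-width tree search: keep the M best-scoring nodes per level
    # (lower score is better) and stop when no node yields children.
    beam = [start]
    while True:
        level = [c for node in beam for c in children(node)]
        if not level:
            return beam
        beam = heapq.nsmallest(M, level, key=score)

# Toy tree (hypothetical): nodes are strings grown by 'a'/'b' up to length 3;
# the score penalizes 'b', so a width-2 beam steers toward all-'a' strings.
kids = lambda s: [s + c for c in "ab"] if len(s) < 3 else []
penalty = lambda s: s.count("b")
```

With M = 1 this degenerates to best-first search; with M equal to the branching factor times the depth it approaches a full breadth-first sweep, which mirrors the M trade-off described above.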
The redundancy-synergy coefficient measures the redundancy and combined synergy of the attributes in an attribute subset. The smaller the coefficient, the greater the redundancy and the more redundant attributes can be deleted — that is, the more likely a smaller attribute subset equivalent to F can be found. Therefore the redundancy-synergy coefficient can serve as the selection measure for attribute subsets and, combined with the beam search method, backward attribute-deleting reduction is performed.
(3) the beam search stop condition is differentiated:
When the transient memory block is empty, no equivalent attribute subset has been found, so the equivalent attribute subsets last stored in the directed memory block are taken as the smallest equivalent attribute subsets found; the beam search therefore stops and the attribute reduction result is obtained. Otherwise, further beam search can be made: find the M attribute subsets with minimum redundancy-synergy coefficient in the transient memory block and deposit them into the directed memory block (if the transient memory block holds fewer than M attribute subsets, deposit them all), and continue with the search of step (2).
The running time of the attribute reduction method of the present invention depends on two factors: (1) the computation of the mutual information of attribute subsets; (2) the search space, i.e., the number of attribute subsets evaluated. The time to evaluate one attribute subset depends on the partition of the sample set (p attributes, m samples) induced by the subset; using hashing to compute the partition, the time complexity of one evaluation is O(m). If r is the size of a resulting reduct, the number of attribute subsets evaluated by the method is at most 0.5·M·(p−r)·(p−1+r)+p+1, so the time complexity of the invention is O(mMp²). In practice, because attribute sorting and the child-generation scheme eliminate unnecessary subset evaluations, the search space of the invention is much smaller than 0.5·M·(p−r)·(p−1+r)+p+1. When M = 1, the time complexity of the invention is O(mp).
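For a sense of scale, the bound can be evaluated directly; the instance below uses Mushroom-sized figures (p = 22 attributes, reduct size r = 4, taken from the tables that follow), and even with M = p the bound is orders of magnitude below the 2^p subsets a complete search may face:

```python
def search_space_bound(M, p, r):
    # The patent's upper bound on the number of attribute subsets evaluated.
    return 0.5 * M * (p - r) * (p - 1 + r) + p + 1

# Mushroom-sized instance: p = 22 attributes, reduct size r = 4, M = p.
evaluated = search_space_bound(22, 22, 4)
exhaustive = 2 ** 22   # subsets an exhaustive combinatorial search may face
```

Here the bound evaluates to a few thousand subset evaluations versus roughly four million for exhaustive enumeration, consistent with the polynomial O(mMp²) complexity claimed above.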
The experiments use five UCI benchmark data sets: Corral, Monk1, Parity5+2, Vote, and Mushroom. First the ABB method was used for attribute reduction; its results and running times are shown in Table 1. For the Mushroom data set the running time exceeded 2 hours, so the ABB method is considered inapplicable there, denoted by "-". The attribute reduction results of the method of the invention with M set to 1, p, and 2p respectively are shown in Table 2. As the tables show, the method obtains almost all of the minimum attribute reduction subsets while the running time drops greatly relative to the ABB method. For the Mushroom data set the method also obtains good attribute reduction results, which the ABB method, being a complete search, cannot.
Table 1. Data set information and ABB attribute reduction results

Data set    Samples  Initial attributes  u  AS                               t (ms)
Corral      128      6                   2  {f1-f4}                          3
Monk1       432      6                   2  {f1,f2,f5}                       19
Parity5+2   1024     10                  2  {f1-f5}(1), {f1,f3-f5,f7}(2),    650
                                            {f2-f6}(3), {f3-f7}(4)
Vote        435      16                  2  {f1-f4,f9,f11,f13,f15,f16}       2697
Mushroom    8124     22                  2  -                                -

u is the number of classes, AS is the attribute reduction subset, and t is the running time.
Table 2. Attribute reduction results of the method of the invention

Data set    M    AS                                           t (ms)
Corral      2p   {f1-f4}                                      2
            p    {f1-f4}                                      2
            1    {f1-f4}                                      2
Monk1       2p   {f1,f2,f5}                                   13
            p    {f1,f2,f5}                                   13
            1    {f1,f2,f5}                                   4
Parity5+2   2p   {f3-f7}(1), {f1,f3-f5,f7}(2),                403
                 {f2-f6}(3), {f1-f5}(4)
            p    {f3-f7}(1), {f1,f3-f5,f7}(2), {f2-f6}(3)     397
            1    {f2-f5}                                      49
Vote        2p   {f1-f4,f9,f11,f13,f15,f16}                   985
            p    {f1-f4,f9,f11,f13,f15,f16}                   765
            1    {f1-f4,f9,f11,f13,f15,f16}                   42
Mushroom    2p   {f5,f20,f21,f22}(1), {f4,f5,f12,f22}(2)      659219
            p    15                                           369640
            1    {f5,f8,f12,f19,f20}                          2389

Claims (1)

1. A backward rough set attribute reduction method using directed search, characterized by comprising the steps of:
1) initialization: sorting the attributes of the initial attribute set in ascending order of mutual information, and depositing the sorted initial attribute set into a directed memory block;
2) beam search: emptying a transient memory block; for each attribute subset in the directed memory block, by the properties of the redundancy-synergy coefficient, finding its M equivalent children with minimum redundancy-synergy coefficient — i.e., the first M equivalent children — by deleting attributes one by one from front to back, and depositing them into the transient memory block, wherein the redundancy-synergy coefficient is RSC(A) = I(A;P) / Σ_{i=1}^{a} I(f_i;P), A = {f_i | i = 1,…,a}, A denotes an attribute subset, f_i denotes an attribute, I(A;P) denotes the mutual information between A and the class attribute P, and I(f_i;P) denotes the mutual information between f_i and P; if an attribute subset has fewer than M equivalent children, depositing all of its equivalent children into the transient memory block; wherein the initial value of M is set according to the size of the initial attribute set and is adjusted with the running time — the larger the initial attribute set, the smaller the initial M; if the running time is too long, M is decreased, otherwise M is increased, until a satisfactory attribute reduction result is obtained;
3) beam search stop condition: if the transient memory block contains attribute subsets, emptying the directed memory block, finding the M attribute subsets with minimum redundancy-synergy coefficient in the transient memory block and depositing them into the directed memory block — if the transient memory block holds fewer than M attribute subsets, depositing all of them into the directed memory block — and then continuing with the beam search of step 2); if the transient memory block contains no attribute subset, outputting all attribute subsets in the directed memory block as the attribute reduction result.
CNB2004100671515A 2004-10-14 2004-10-14 Backward coarse collecting attribute reducing method using directed search Expired - Fee Related CN1300730C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100671515A CN1300730C (en) 2004-10-14 2004-10-14 Backward coarse collecting attribute reducing method using directed search


Publications (2)

Publication Number Publication Date
CN1588363A true CN1588363A (en) 2005-03-02
CN1300730C CN1300730C (en) 2007-02-14

Family

ID=34604132

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100671515A Expired - Fee Related CN1300730C (en) 2004-10-14 2004-10-14 Backward coarse collecting attribute reducing method using directed search

Country Status (1)

Country Link
CN (1) CN1300730C (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103787969B (en) 2012-10-30 2016-07-06 上海京新生物医药有限公司 A kind of (1S)-1-phenyl-3,4-dihydro-2(1H) preparation method of-isoquinolinecarboxylic acid ester

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6263332B1 (en) * 1998-08-14 2001-07-17 Vignette Corporation System and method for query processing of structured documents

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336790A (en) * 2013-06-06 2013-10-02 湖州师范学院 Hadoop-based fast neighborhood rough set attribute reduction method
CN103336790B (en) * 2013-06-06 2015-02-25 湖州师范学院 Hadoop-based fast neighborhood rough set attribute reduction method
CN112435742A (en) * 2020-10-22 2021-03-02 北京工业大学 Neighborhood rough set method for feature reduction of fMRI brain function connection data
CN112435742B (en) * 2020-10-22 2023-10-20 北京工业大学 Neighborhood rough set method for feature reduction of fMRI brain function connection data

Also Published As

Publication number Publication date
CN1300730C (en) 2007-02-14


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070214

Termination date: 20091116