CN107291854B - The lossless compression method of frequent co-location patterns - Google Patents


Publication number
CN107291854B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710430303.0A
Other languages
Chinese (zh)
Other versions
CN107291854A (en)
Inventor
王丽珍
陈红梅
肖清
包旭光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi zhengrudder Network Technology Co., Ltd
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU
Priority to CN201710430303.0A
Publication of CN107291854A
Application granted
Publication of CN107291854B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Abstract

The invention discloses a lossless compression method for frequent co-location patterns. The input data is first preprocessed, and the feature neighbor transaction set is stored in a lexicographic prefix tree structure. Based on the prefix trees of the feature neighbor transaction set, star SPI-closed candidate patterns are generated, and the star SPI-closed candidates are combined to generate clique SPI-closed candidate patterns. After the clique SPI-closed candidates are generated, the candidate table instances of each candidate pattern are obtained by scanning the neighbor transaction set NT, and the table instances that truly satisfy the clique relationship are then obtained by checking the neighbor relationships among the other instances. The participation index PI of a pattern is calculated from its table instance, and at the same time it is determined whether the pattern is an SPI-closed co-location pattern. The method provides a smaller compressed representation of the frequent co-location pattern set that loses no participation index information.

Description

The lossless compression method of frequent co-location patterns
Technical field
The invention belongs to the technical field of spatial co-location pattern mining, and in particular relates to a lossless compression method for frequent co-location patterns.
Background art
Spatial co-location pattern mining (mining prevalent co-location patterns from spatial data sets) identifies groups of spatial features whose instances are frequently located close to one another in geographic space. Fig. 1 shows an example spatial data set, in which different icons represent different spatial features, such as houses. The figure contains 5 features, each with 4 spatial instances. It can be observed that instances of the features withered tree and wildfire, and of the features house and bird, tend to occur frequently in close proximity. These two patterns suggest that "wildfires are frequently associated with withered trees" and that "houses frequently occur together with birds, which indicates an improved living environment".
Applications of spatial co-location pattern mining include finding animals (or plants) whose living (or growing) spaces overlap, identifying places where specific groups of people often gather so that targeted advertisements can be placed, and understanding the relationships among different elements of the Earth's climate system.
Because this research field has high theoretical and practical value, many researchers at home and abroad have proposed a variety of spatial co-location pattern mining algorithms. As with mining frequent itemsets from a transaction database, the main challenge in mining the frequent co-location pattern set from a large spatial database is that mining usually generates a large number of spatial co-location patterns that satisfy the minimum frequency threshold M, especially when M is set very low. This is because, if a co-location pattern is frequent, all of its subsets are also frequent.
For this reason, the prior art (J. S. Yoo and M. Bow. Mining top-k closed co-location patterns [C]. In Proceedings of IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM), pp. 100-105, 2011; denoted document 2) proposed the concept of top-k closed co-location patterns and a corresponding mining algorithm. In that document, a co-location pattern c is closed if and only if its participation index is greater than the participation index of every one of its super-patterns, that is, PI(c) > PI(c') for every super-pattern c' ⊃ c. Top-k closed co-location patterns are defined as follows: let L be the list of all closed co-location patterns sorted in descending order of participation index, and let p be the participation index of the k-th closed co-location pattern in L; the top-k closed co-location patterns are the set of all closed co-location patterns whose participation index is greater than or equal to p.
The top-k closed co-location mining method proceeds as follows. First, the input data is organized into an instance neighbor transaction set and a feature neighbor transaction set. The advantages of neighbor transaction sets are: 1. no instance pair with a neighbor relationship is lost; 2. they are easy to construct; 3. candidate co-location patterns can easily be generated from them; 4. they provide an upper bound on the participation index of a co-location pattern. Second, an FP-tree-like structure stores the spatial features that have neighbor relationships, from which star neighbor candidate patterns are obtained. Then, by combining related star neighbor candidate patterns, the candidate co-location patterns and the upper bounds of their participation indexes are obtained. Next, an internal minimum frequency threshold θ defines a pruning framework that reduces the number of candidate patterns in the search space. The basic idea of this framework is to set θ to the minimum participation index in the current top-k result set; if the upper bound of the participation index of the next candidate is less than θ, the candidate and all of its supersets can be pruned. Subsequently, for each candidate that cannot be pruned, the table instance is obtained from the instance neighbor transaction set, so that the true participation index of the candidate can be calculated. Finally, binary search is used to compare the participation index of the co-location pattern against the ordered top-k result set, which yields the final top-k closed pattern set.
For the spatial data set shown in Fig. 2(a), F = {A, B, C, D} denotes the set of spatial features, and an instance is written as "feature.number", such as "A.1". A line between two instances in the figure indicates that a neighbor relationship exists between them. In the spatial data set of Fig. 2(a), feature A has 4 instances, B has 5, C has 3, and D has 4. The participation ratio and the participation index are introduced to select all frequent co-location patterns over F. If the minimum participation index threshold is set to M = 0.3, the frequent co-location patterns contained in this data set are: {A,B,C,D}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B}, {A,C}, {A,D}, {B,C}, {B,D} and {C,D} (Fig. 2(b) gives the table instances, participation ratios and participation index values of all co-location patterns of the data set in Fig. 2(a)).
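As a minimal sketch of how a participation ratio and a participation index could be computed from a table instance, the following Python fragment may help. The function names are illustrative assumptions and not part of the patent. The feature instance counts are those stated for Fig. 2(a). The first row of the table instance appears in the description, while the second row is hypothetical and was chosen only so that the ratios agree with the values quoted in Example 1.

```python
from fractions import Fraction

def participation_ratios(pattern, table_instance, feature_counts):
    """PR(c, f): distinct instances of f in the table instance of c,
    divided by the total number of instances of f."""
    ratios = {}
    for pos, feature in enumerate(pattern):
        distinct = {row[pos] for row in table_instance}
        ratios[feature] = Fraction(len(distinct), feature_counts[feature])
    return ratios

def participation_index(ratios):
    """PI(c): the minimum participation ratio over the features of c."""
    return min(ratios.values())

# Instance counts per feature as stated for Fig. 2(a).
feature_counts = {"A": 4, "B": 5, "C": 3, "D": 4}

# Table instance of {A,B,C,D}; the second row is HYPOTHETICAL.
rows = [
    ("A.2", "B.2", "C.1", "D.2"),
    ("A.3", "B.4", "C.1", "D.3"),
]

pr = participation_ratios(("A", "B", "C", "D"), rows, feature_counts)
pi = participation_index(pr)
is_frequent = pi >= Fraction(3, 10)   # threshold M = 0.3
```

Exact fractions are used instead of floating-point numbers so that equality tests between ratios, which the later definitions rely on, are not affected by rounding.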
However, because co-location patterns satisfy the downward closure property, a large number of redundant patterns are produced. How to use a reduced (smaller) set that can both describe the original result set and be used to derive it has become a new research focus. Scholars have therefore proposed two concepts: the maximal co-location pattern set and the closed co-location pattern set. The maximal co-location pattern set is a lossy reduced set: although the original result set can be derived from it, the corresponding participation index (PI) values cannot necessarily be derived. The closed co-location pattern set is a lossless reduced set, which solves the problem that the maximal co-location pattern set cannot derive the corresponding PI values. However, this also greatly limits the reducing power of the closed co-location pattern set. For example, the closed co-location pattern set of the data set in Fig. 2(a) is {{A,B,C,D}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B}, {A,D}, {B,D}, {C,D}}, which is still a large reduced set.
In practice, the number of frequent co-location patterns generated from a spatial data set may be very large, so it is useful to identify a smaller, representative compressed subset from which all other frequent co-location patterns can be derived. The frequent closed co-location pattern set (denoted the PI-closed co-location pattern set) provides a compressed representation of the frequent co-location pattern set that loses no participation index information. The existing concept of frequent closed co-location patterns follows the idea of traditional frequent closed itemsets; as a result, its compression rate is very low, that is, the number of frequent closed co-location patterns is still rather large.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a lossless compression method for frequent co-location patterns, which provides a smaller compressed representation of the frequent co-location pattern set that loses no participation index information.
The technical solution adopted by the present invention is a lossless compression method for frequent co-location patterns, which first defines SPI-closed co-location patterns:
Definition 1. For two given co-location patterns c and c' with c ⊂ c', the super participation index SPI(c | c') of c in c' is defined as the minimum of the participation ratios of all features of c calculated from the table instance of c', that is: SPI(c | c') = min{PR(c', f_i), f_i ∈ c};
Definition 2. A co-location pattern c is an SPI-closed co-location pattern if and only if the PI value of c is greater than the SPI value of c in every one of its SPI-closed super-patterns c', that is: c is an SPI-closed co-location pattern if and only if PI(c) > SPI(c | c') for every c' such that c ⊂ c' and c' is an SPI-closed co-location pattern;
Definition 3. An SPI-closed co-location pattern c is an SPI-closed frequent co-location pattern if and only if c is SPI-closed and PI(c) ≥ M, where M is the frequency threshold specified by the user. For a co-location pattern c, if there exists a co-location pattern c' such that c ⊂ c' and PI(c) = SPI(c | c') (respectively PI(c) = PI(c')), then c' is said to SPI-cover (respectively PI-cover) c;
Then the following steps are carried out:
Step 1: preprocess the input data to generate the neighbor transaction set NT and the feature neighbor transaction set ENT;
Step 2: store the feature neighbor transaction set in a lexicographic prefix tree structure; based on the prefix trees of the feature neighbor transaction set, generate star SPI-closed candidate patterns, and combine the star SPI-closed candidates to generate clique SPI-closed candidate patterns;
Step 3: after the clique SPI-closed candidate patterns are generated, obtain the candidate table instances of each candidate pattern by scanning the neighbor transaction set NT, and then obtain the table instances that truly satisfy the clique relationship by checking the neighbor relationships among the other instances; calculate the participation index PI of the pattern from its table instance, and determine whether the pattern is an SPI-closed co-location pattern as follows:
For a size-k candidate pattern c, if PI(c) = UPI(c) (the participation index of c equals its upper-bound participation index UPI), then c must be an SPI-closed co-location pattern. Otherwise, all size-(k-1) sub-patterns of c that were pruned must first be generated as candidates; next, if PI(c) < M, c is pruned; if PI(c) ≥ M, Definition 2 and Definition 3 are used to determine whether c is an SPI-closed co-location pattern.
Further, in step 1, the input data is preprocessed as follows: the input data set is processed with a given distance threshold to obtain all neighboring instance pairs; the neighboring instance pairs are grouped to generate the neighbor transaction set NT; then the feature neighbor transaction set ENT is generated from NT. For a spatial feature instance f.i ∈ S, its instance neighbor transaction set is the set containing f.i and all instances of other spatial features that have a neighbor relationship with f.i, that is, NT(f.i) = {f.i, g.j ∈ S | NR(f.i, g.j) = true and f ≠ g}, where NR denotes the neighbor relationship between spatial instances and f.i is called the reference instance. The collection of the neighbor transaction sets of all instances is called the neighbor transaction set of the spatial data, denoted NT. The lexicographically ordered set of the distinct spatial features in an instance neighbor transaction set is called a feature neighbor transaction set, denoted ENT.
Further, in step 2, the feature neighbor transaction set is stored in the lexicographic prefix tree structure as follows:
Step 1. Define the lexicographic prefix tree: the feature type of the reference instance is the root node, and the neighbor features in the feature neighbor transaction set are child nodes. Each child node consists of three parts: feature type, count, and node link. The feature type identifies the node; the count is the number of paths in the whole feature transaction set that reach this feature type from the feature type of the reference instance; the node link connects the node to the nodes in the same tree that have the same feature type;
Step 2. Because all child nodes in a lexicographic prefix tree have a neighbor relationship with the root node, star SPI-closed candidate co-location patterns can be generated, and the upper bound of the participation ratio (UPR) of each star SPI-closed co-location pattern can also be obtained from the tree. If, in the same tree, the upper-bound participation ratio of a candidate equals that of one of its super-candidates, the star candidate is marked in red; if the upper-bound participation ratio of a candidate is less than the threshold M, the candidate is deleted;
Step 3. Combine k related star SPI-closed candidate co-location patterns to generate a size-k clique SPI-closed candidate co-location pattern; the minimum upper-bound participation ratio among these k star candidates is the upper-bound participation index (UPI) of the size-k clique SPI-closed candidate pattern.
The beneficial effect of the invention is that it proposes a new lossless compression method for frequent co-location patterns, called the SPI-closed co-location pattern mining method. The SPI-closed co-location pattern set provides a smaller compressed representation of the frequent co-location pattern set (about 30% smaller than the set of closed co-location patterns proposed in the prior art, referred to as PI-closed co-location patterns) that loses no participation index information.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an example of a spatial data set.
Fig. 2a is an example spatial data set, and Fig. 2b shows the co-location instances, participation ratios and participation index values of all possible co-location patterns in Fig. 2a.
Fig. 3 is an example of candidate pattern generation: Fig. 3a shows the lexicographic prefix trees of the features in the data set of Table 1(a); Fig. 3b shows the star SPI-closed candidate patterns; Fig. 3c shows the clique SPI-closed candidate patterns.
Fig. 4 compares the running times of the SPI-closed miner and the PI-closed miner in the embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
First, the definitions and lemmas related to SPI-closed co-location patterns are given; then a method that mines this class of patterns directly is presented.
Definition 1. Super participation index, SPI(c | c')
For two given co-location patterns c and c' with c ⊂ c', the super participation index SPI(c | c') of c in c' is defined as the minimum of the participation ratios of all features of c calculated from the table instance of c'. That is: SPI(c | c') = min{PR(c', f_i), f_i ∈ c}.
Example 1. In Fig. 2(a), SPI({A,C,D} | {A,B,C,D}) = min{PR({A,B,C,D}, A) = 2/4, PR({A,B,C,D}, C) = 1/3, PR({A,B,C,D}, D) = 2/4} = 1/3. Similarly, SPI({A,B,D} | {A,B,C,D}) = 2/5.
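The calculation in Example 1 can be sketched in a few lines of Python. The function name `spi` is an illustrative assumption. The ratios of A, C and D in {A,B,C,D} are those quoted in Example 1; the ratio 2/5 for B is not stated explicitly but is inferred from SPI({A,B,D} | {A,B,C,D}) = 2/5, since the ratios of A and D are both 2/4.

```python
from fractions import Fraction

def spi(sub_pattern, super_ratios):
    """SPI(c | c'): the minimum, over the features of c, of the participation
    ratios computed from the table instance of the super-pattern c'."""
    missing = set(sub_pattern) - set(super_ratios)
    if missing:
        raise ValueError(f"features not in the super-pattern: {sorted(missing)}")
    return min(super_ratios[f] for f in sub_pattern)

# Participation ratios of each feature in {A,B,C,D} (B is inferred, see above).
pr_abcd = {
    "A": Fraction(2, 4),
    "B": Fraction(2, 5),
    "C": Fraction(1, 3),
    "D": Fraction(2, 4),
}

spi_acd = spi("ACD", pr_abcd)   # SPI({A,C,D} | {A,B,C,D})
spi_abd = spi("ABD", pr_abcd)   # SPI({A,B,D} | {A,B,C,D})
```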
Definition 2. SPI-closed co-location patterns
A co-location pattern c is an SPI-closed co-location pattern if and only if the PI value of c is greater than the SPI value of c in every one of its SPI-closed super-patterns c'. That is: c is an SPI-closed co-location pattern if and only if PI(c) > SPI(c | c') for every c' such that c ⊂ c' and c' is an SPI-closed co-location pattern.
For convenience in the following description, the traditional closed co-location patterns (that is, the closed co-location patterns proposed in document 2) are denoted PI-closed co-location patterns.
Example 2. For the spatial data set in Fig. 2(a), if M = 0.3, then {A,B,C,D} is an SPI-closed co-location pattern. Because PI({A,B,C}) = SPI({A,B,C} | {A,B,C,D}) = PI({A,B,C,D}) and PI({A,B,D}) = SPI({A,B,D} | {A,B,C,D}) > PI({A,B,C,D}), neither {A,B,C} nor {A,B,D} is an SPI-closed co-location pattern; however, {A,B,D} is a PI-closed co-location pattern.
Definition 3. SPI-closed frequent co-location patterns (SPI-closed prevalent co-location patterns)
An SPI-closed co-location pattern c is an SPI-closed frequent co-location pattern if and only if c is SPI-closed and PI(c) ≥ M, where M is the frequency threshold specified by the user.
To simplify the wording, "SPI-closed co-location pattern" is used below in place of "SPI-closed frequent co-location pattern". For a co-location pattern c, if there exists a co-location pattern c' such that c ⊂ c' and PI(c) = SPI(c | c') (respectively PI(c) = PI(c')), then c' is said to SPI-cover (respectively PI-cover) c.
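Definitions 2 and 3 can be illustrated with a small top-down filter, given here only as a sketch. All names are illustrative assumptions. Patterns are processed from the largest size downward, so that every SPI-closed super-pattern of c is known before c is tested. The ratios for {A,B,C,D} follow Example 1 and PI({A,B}) = 3/5 matches the value given later for Fig. 3(c); the remaining ratios are hypothetical values chosen to agree with Example 2.

```python
from fractions import Fraction as F

def pi(ratios):
    """PI(c): minimum participation ratio over the features of c."""
    return min(ratios.values())

def spi(c, super_ratios):
    """SPI(c | c'): minimum ratio of c' restricted to the features of c."""
    return min(super_ratios[f] for f in c)

def spi_closed_frequent(pr, min_prev):
    """Return the SPI-closed patterns (Definition 2) with PI >= min_prev
    (Definition 3). `pr` maps each pattern to its feature ratios."""
    closed = []
    for c in sorted(pr, key=len, reverse=True):      # largest patterns first
        supers = [s for s in closed if c < s]        # proper SPI-closed supersets
        if all(pi(pr[c]) > spi(c, pr[s]) for s in supers):
            closed.append(c)
    return [c for c in closed if pi(pr[c]) >= min_prev]

ABCD, ABC, ABD, AB = (frozenset(p) for p in ("ABCD", "ABC", "ABD", "AB"))
pr = {
    ABCD: {"A": F(2, 4), "B": F(2, 5), "C": F(1, 3), "D": F(2, 4)},
    ABC:  {"A": F(2, 4), "B": F(2, 5), "C": F(1, 3)},   # hypothetical, PI = 1/3
    ABD:  {"A": F(2, 4), "B": F(2, 5), "D": F(2, 4)},   # hypothetical, PI = 2/5
    AB:   {"A": F(3, 4), "B": F(3, 5)},                 # hypothetical, PI = 3/5
}

result = spi_closed_frequent(pr, F(3, 10))   # threshold M = 0.3
```

With these values, {A,B,C} and {A,B,D} are SPI-covered by {A,B,C,D} and are dropped, while {A,B} is retained because its PI of 3/5 exceeds SPI({A,B} | {A,B,C,D}) = 2/5.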
Lemma 1. If c ⊂ c' and c' PI-covers c, then c' must SPI-cover c.
Lemma 2. On the frequent co-location pattern set, the SPI-cover relation is a pseudo partial order, which satisfies:
(1) c SPI-covers c. (reflexivity)
(2) If c' SPI-covers c and c SPI-covers c', then c = c'. (antisymmetry)
(3) If c ⊂ c' ⊂ c'', PI(c) = PI(c') and c'' SPI-covers c', then c'' must SPI-cover c. (pseudo-transitivity)
Note that the PI-cover relation is transitive, but the SPI-cover relation is not. This is why Definition 2 carries the additional condition "c' is an SPI-closed co-location pattern". In addition, SPI-closed co-location patterns must be discovered top-down, that is, they are mined from larger sizes to smaller sizes. Finally, the SPI-closed pattern set compresses better than the PI-closed pattern set: for the same data set, the SPI-closed pattern set S_SPI-closed contains fewer patterns than the PI-closed pattern set S_PI-closed.
Lemma 3. If c ∈ S_SPI-closed, then c ∈ S_PI-closed; the converse does not necessarily hold.
An effective method for directly mining SPI-closed co-location patterns, called the SPI-closed miner, is described below.
To generate the SPI-closed co-location pattern set quickly, the input data must be preprocessed as follows:
For a spatial feature instance f.i ∈ S, its instance neighbor transaction set is the set containing f.i and all instances of other spatial features that have a neighbor relationship with f.i. That is, NT(f.i) = {f.i, g.j ∈ S | NR(f.i, g.j) = true and f ≠ g}, where NR denotes the neighbor relationship between spatial instances and f.i is called the reference instance.
For example, in Fig. 2(a), the instance neighbor transaction set of instance A.1 is {A.1, B.1, C.1, D.1}. Table 1(a) gives the neighbor transaction sets of all spatial instances in Fig. 2(a).
The lexicographically ordered set of the distinct spatial features in an instance neighbor transaction set is called a feature neighbor transaction set; Table 1(b) gives the feature neighbor transaction sets corresponding to Table 1(a).
Table 1. Instance neighbor transaction sets and feature neighbor transaction sets of the spatial data set in Fig. 2(a)
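The preprocessing described above can be sketched as follows, assuming that the neighboring instance pairs have already been computed with the distance threshold. The function names and the pair list are illustrative assumptions; the pairs are hypothetical except that they reproduce the stated transaction of A.1. Placing the reference feature first in each feature transaction is also an assumption, suggested by the notation {C, A, B, D} used later for the prefix tree of feature C.

```python
def feature_of(instance):
    """'A.1' -> 'A' (instances are written as feature.number)."""
    return instance.split(".")[0]

def neighbor_transactions(instances, neighbor_pairs):
    """NT(f.i): f.i together with every neighboring instance of another feature."""
    nt = {inst: {inst} for inst in instances}
    for a, b in neighbor_pairs:
        if feature_of(a) != feature_of(b):   # condition f != g
            nt[a].add(b)
            nt[b].add(a)
    return nt

def feature_transactions(nt):
    """ENT: reference feature first, then the other distinct features in
    lexicographic order (the ordering is an ASSUMPTION)."""
    ent = {}
    for ref, members in nt.items():
        others = sorted({feature_of(m) for m in members} - {feature_of(ref)})
        ent[ref] = [feature_of(ref)] + others
    return ent

# HYPOTHETICAL neighbor pairs for four instances.
instances = ["A.1", "B.1", "C.1", "D.1"]
pairs = [("A.1", "B.1"), ("A.1", "C.1"), ("A.1", "D.1"), ("B.1", "C.1")]

nt = neighbor_transactions(instances, pairs)
ent = feature_transactions(nt)
```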
To generate candidate SPI-closed co-location patterns quickly and to prune the candidate search space, the feature neighbor transaction set is stored in a lexicographic prefix tree structure.
First, the lexicographic prefix tree is defined. The feature type of the reference instance is the root node, and the neighbor features in the feature neighbor transaction set are child nodes. Each node consists of three parts: feature type, count, and node link. The feature type identifies the node; the count is the number of paths in the whole feature transaction set that reach this feature type from the feature type of the reference instance; the node link connects the node to the nodes in the same tree that have the same feature type.
For example, the lexicographic prefix trees of the feature neighbor transaction sets in Table 1(b) are shown in Fig. 3(a).
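A minimal sketch of such a prefix tree is given below. The class and function names are illustrative assumptions, and the four feature transactions of reference feature A are hypothetical (Table 1(b) is not reproduced in this text); they were chosen so that the counts agree with the upper-bound ratios quoted for feature A in the example that follows.

```python
class Node:
    """One prefix-tree node: feature type, count, children and node link."""
    def __init__(self, feature):
        self.feature = feature
        self.count = 0        # number of transactions whose path reaches this node
        self.children = {}    # feature type -> child Node
        self.link = None      # next node in the same tree with the same feature type

def build_prefix_tree(root_feature, transactions):
    """Insert feature transactions (reference feature first, rest sorted)."""
    root = Node(root_feature)
    header, last_seen = {}, {}   # first / most recent node per feature type
    for features in transactions:
        root.count += 1
        node = root
        for f in features[1:]:
            child = node.children.get(f)
            if child is None:
                child = Node(f)
                node.children[f] = child
                if f in last_seen:
                    last_seen[f].link = child   # chain nodes of the same type
                else:
                    header[f] = child
                last_seen[f] = child
            child.count += 1
            node = child
    return root, header

# HYPOTHETICAL feature neighbor transactions of the four instances of A.
transactions_a = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "D"],
    ["A", "C", "D"],
    ["A", "B"],
]
root, header = build_prefix_tree("A", transactions_a)
```

The `header` dictionary is a convenience added in this sketch so that the chain of node links for a feature type can be followed from its first node.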
Second, because all child nodes in a lexicographic prefix tree have a neighbor relationship with the root node, star SPI-closed candidate co-location patterns can be generated. The upper bound of the participation ratio of each star SPI-closed co-location pattern (that is, the upper bound of the participation ratio of the root node feature) can also be obtained from the tree. If, in the same tree, the upper-bound participation ratio of a candidate equals that of one of its super-candidates, the star candidate is marked in red. If the upper-bound participation ratio of a candidate is less than the threshold M, the candidate is deleted.
For example, for the lexicographic prefix tree of feature A in Fig. 3(a), if M = 0.3, the star candidates and their upper-bound participation ratios are: {A,B,C,D}: 2/4, {A,B,C}: 2/4, {A,B,D}: 2/4, {A,C,D}: 3/4, {A,B}: 3/4, {A,C}: 3/4, {A,D}: 3/4. The candidates that must be marked in red are {A,B,C}: 2/4, {A,B,D}: 2/4, {A,C}: 3/4 and {A,D}: 3/4. The star SPI-closed candidate co-location patterns generated by the 4 prefix trees in Fig. 3(a) are shown in Fig. 3(b).
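The star candidates of feature A listed above can be reproduced with the following sketch. For brevity it enumerates subsets directly from the feature transactions rather than traversing a prefix tree, which is a simplification and not the tree-based procedure of the method. The function name is an illustrative assumption, and the four transactions are hypothetical, chosen to reproduce the ratios quoted in the example.

```python
from fractions import Fraction
from itertools import combinations

def star_candidates(root, transactions, n_instances, min_prev):
    """Return (upr, red): upper-bound participation ratios of the star
    candidates rooted at `root`, and the candidates marked in red."""
    counts = {}
    for t in transactions:
        neighbors = sorted(set(t) - {root})
        for k in range(1, len(neighbors) + 1):
            for combo in combinations(neighbors, k):
                key = frozenset((root,) + combo)
                counts[key] = counts.get(key, 0) + 1
    upr = {c: Fraction(n, n_instances) for c, n in counts.items()}
    upr = {c: r for c, r in upr.items() if r >= min_prev}   # delete if below M
    # Red: the ratio equals that of some proper super-candidate in the same tree.
    red = {c for c in upr if any(c < s and upr[s] == upr[c] for s in upr)}
    return upr, red

# HYPOTHETICAL feature neighbor transactions of the four instances of A.
transactions_a = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "D"],
    ["A", "C", "D"],
    ["A", "B"],
]
upr, red = star_candidates("A", transactions_a, n_instances=4,
                           min_prev=Fraction(3, 10))
```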
Finally, k related star SPI-closed candidate co-location patterns are combined to generate a size-k clique SPI-closed candidate co-location pattern, and the minimum upper-bound participation ratio among these k star candidates is the upper-bound participation index (UPI) of the size-k clique SPI-closed candidate pattern.
Pruning 1 (non-frequent pruning): if a co-location pattern c is not a star SPI-closed candidate co-location pattern of the prefix tree of some feature f_i (f_i ∈ c), then c can be pruned.
For example, if M = 0.4, the patterns {C, A, B, D} and {C, A, B} are not star SPI-closed candidate co-location patterns of the prefix tree of feature C. Therefore, {A, B, C, D} and {A, B, C} cannot be combined into clique SPI-closed candidate co-location patterns, and both can be pruned.
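The combination step and pruning 1 can be sketched together: a pattern becomes a clique candidate only if it is a star candidate in the tree of every one of its features, and its UPI is the minimum of those star ratios. The function name is an illustrative assumption, and the star ratios below are hypothetical values for a three-feature excerpt in which {A,B,C} has already been deleted from the tree of C.

```python
from fractions import Fraction as F

def clique_candidates(star):
    """`star` maps each root feature to {pattern: upper-bound ratio}.
    Returns {pattern: UPI} for patterns that survive pruning 1."""
    all_patterns = set().union(*(set(p) for p in star.values()))
    result = {}
    for c in all_patterns:
        # Pruning 1: c must be a star candidate in the tree of each f in c.
        if all(c in star.get(f, {}) for f in c):
            result[c] = min(star[f][c] for f in c)
    return result

AB, AC, BC, ABC = (frozenset(p) for p in ("AB", "AC", "BC", "ABC"))
# HYPOTHETICAL star candidates after deleting those below the threshold.
star = {
    "A": {AB: F(3, 4), AC: F(3, 4), ABC: F(2, 4)},
    "B": {AB: F(3, 5), BC: F(3, 5), ABC: F(2, 5)},
    "C": {AC: F(2, 3), BC: F(2, 3)},      # {A,B,C} is absent from the tree of C
}
cands = clique_candidates(star)
```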
Pruning 2 (non-SPI-closed pruning 1): if the UPI value of a clique SPI-closed candidate pattern c is marked in red, and UPI(c) = UPI(c') (where c' is a clique SPI-closed candidate pattern that is a super-pattern of c), then c can be pruned.
For example, in Fig. 3(c), UPI({A,B,C}) = UPI({A,B,C,D}); if {A,B,C,D} is a clique SPI-closed candidate pattern, then {A,B,C} can be pruned. Similarly, {A,C} and {B,C} can also be pruned.
Pruning 3 (non-SPI-closed pruning 2): if the UPI value of a clique SPI-closed candidate pattern c is marked in red, and UPI(c) = USPI(c | c') (where c' is a clique SPI-closed candidate pattern that is a super-pattern of c, and USPI(c | c') is the upper bound of the super participation index), then c can be pruned.
For example, in Fig. 3(c), UPI({A,B,D}) = USPI({A,B,D} | {A,B,C,D}) = 2/5; if {A,B,C,D} is a clique SPI-closed candidate pattern, then {A,B,D} can be pruned. Similarly, {A,D} can also be pruned. However, because UPI({B,D}) ≠ USPI({B,D} | {B,C,D}), {B,D} cannot be pruned.
As shown in Fig. 3(c), if M = 0.3, the resulting clique SPI-closed candidate patterns and their UPI values are: {A,B,C,D}: 1/3, {A,C,D}: 2/3, {B,C,D}: 2/3, {A,B}: 3/5, {B,D}: 3/4 and {C,D}: 1. Note that, for the spatial data set in Fig. 2(a), the above pruning process removes all non-SPI-closed patterns.
Note also that pruning 3 subsumes pruning 2, that is, every candidate pattern that can be pruned by pruning 2 can also be pruned by pruning 3. Pruning 2 is retained for two reasons: 1. when values are compared, the computational complexity of pruning 2 is lower than that of pruning 3; 2. most non-SPI-closed patterns satisfy the condition of pruning 2.
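Prunings 2 and 3 can be sketched as a top-down pass over the clique candidates. The text does not spell out how USPI is computed, so the sketch assumes USPI(c | c') to be the minimum star upper-bound ratio of c' over the features of c; this reading, the function names, the red marks and the star ratios are all assumptions, chosen to agree with the UPI and USPI values quoted in the examples above.

```python
from fractions import Fraction as F

def upi(c, star):
    """UPI(c): minimum star upper-bound ratio of c over the trees of its features."""
    return min(star[f][c] for f in c)

def uspi(c, c_super, star):
    """ASSUMED USPI(c | c'): minimum star upper-bound ratio of the
    super-pattern c', taken over the features of c only."""
    return min(star[f][c_super] for f in c)

def prune_non_closed(candidates, red, star):
    """Top-down pass: drop red candidates that meet pruning 2 or pruning 3
    with respect to an already retained super-candidate."""
    kept = []
    for c in sorted(candidates, key=len, reverse=True):
        supers = [s for s in kept if c < s]
        rule2 = any(upi(c, star) == upi(s, star) for s in supers)
        rule3 = any(upi(c, star) == uspi(c, s, star) for s in supers)
        if c in red and (rule2 or rule3):
            continue
        kept.append(c)
    return kept

ABCD, ABC, ABD = frozenset("ABCD"), frozenset("ABC"), frozenset("ABD")
# HYPOTHETICAL star upper-bound ratios per root feature.
star = {
    "A": {ABCD: F(2, 4), ABC: F(2, 4), ABD: F(2, 4)},
    "B": {ABCD: F(2, 5), ABC: F(2, 5), ABD: F(2, 5)},
    "C": {ABCD: F(1, 3), ABC: F(1, 3)},
    "D": {ABCD: F(2, 4), ABD: F(2, 4)},
}
red = {ABC, ABD}   # assumed red marks
kept = prune_non_closed([ABCD, ABC, ABD], red, star)
```

Under these assumptions {A,B,C} meets both conditions, whereas {A,B,D} meets only the condition of pruning 3, which is consistent with the remark that pruning 3 subsumes pruning 2.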
Once the candidate patterns have been generated, the table instance of each candidate pattern must be found and its true participation index (PI) value calculated. This is done in a top-down manner.
By scanning the neighbor transaction set, the candidate table instances of a candidate pattern are obtained; then, by checking the neighbor transaction sets of the other instances, the table instances that truly satisfy the clique relationship are obtained. For example, in Fig. 3(a), {A.2, B.2, C.1, D.2} is a true table instance of the candidate pattern {A, B, C, D}, but {A.2, B.1, C.1, D.2} is not.
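The two-stage check described above can be sketched as follows: candidate rows are drawn from the neighbor transaction of each reference instance, and a row is kept only if every pair of its instances is neighboring. The function names are illustrative assumptions, and the neighbor transactions below are hypothetical (Table 1(a) is not reproduced in this text).

```python
from itertools import combinations, product

def feature_of(instance):
    return instance.split(".")[0]

def table_instance(pattern, nt):
    """Rows of `pattern` whose instances are pairwise neighbors (a clique).
    `nt` maps each instance to its neighbor transaction set."""
    first, rest = pattern[0], pattern[1:]
    rows = []
    for ref, members in nt.items():
        if feature_of(ref) != first:
            continue
        # Candidate rows: one neighbor of the reference instance per feature.
        choices = [sorted(m for m in members if feature_of(m) == f) for f in rest]
        for combo in product(*choices):
            row = (ref,) + combo
            # Clique check against the transactions of the other instances.
            if all(b in nt[a] for a, b in combinations(row, 2)):
                rows.append(row)
    return rows

# HYPOTHETICAL neighbor transactions: B.1 and D.2 are not neighbors.
nt = {
    "A.2": {"A.2", "B.1", "B.2", "C.1", "D.2"},
    "B.1": {"B.1", "A.2", "C.1"},
    "B.2": {"B.2", "A.2", "C.1", "D.2"},
    "C.1": {"C.1", "A.2", "B.1", "B.2", "D.2"},
    "D.2": {"D.2", "A.2", "B.2", "C.1"},
}
rows = table_instance(("A", "B", "C", "D"), nt)
```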
For a size-k candidate pattern c, if PI(c) = UPI(c), then c must be an SPI-closed co-location pattern. Otherwise, all size-(k-1) sub-patterns of c must first be generated; next, if PI(c) < M, c is pruned; if PI(c) ≥ M, Definition 2 and Definition 3 are used to check whether c is an SPI-closed co-location pattern.
Note that the UPI value of a size-2 co-location pattern is its true PI value.
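The decision rule for a single candidate can be sketched as a small function. The name `classify`, the returned labels and the argument that carries the SPI values of c in its already confirmed SPI-closed super-patterns are illustrative assumptions; generating the size-(k-1) sub-patterns is outside the scope of this sketch. The first call uses the values given for {A,B,C,D}; the UPI of 1/2 in the second call and the values in the third call are hypothetical.

```python
from fractions import Fraction as F

def classify(pi_c, upi_c, min_prev, spi_in_closed_supers):
    """Decide the status of one candidate pattern c.
    spi_in_closed_supers: SPI(c | c') for each SPI-closed super-pattern c'."""
    if pi_c == upi_c:
        return "spi-closed"          # shortcut: PI(c) = UPI(c)
    if pi_c < min_prev:
        return "pruned"              # not frequent
    if all(pi_c > s for s in spi_in_closed_supers):
        return "spi-closed"          # Definition 2 holds, and PI(c) >= M
    return "not spi-closed"

M = F(3, 10)
status_abcd = classify(F(1, 3), F(1, 3), M, [])             # PI equals UPI
status_abd = classify(F(2, 5), F(1, 2), M, [F(2, 5)])       # covered by {A,B,C,D}
status_low = classify(F(1, 5), F(1, 2), M, [])              # below the threshold
```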
The algorithm of the SPI-closed miner, which mines SPI-closed co-location patterns directly, is given below:
The main function of the third part is to calculate the true PI value of each candidate pattern in CNCC and to generate the SPI-closed frequent co-location pattern set Ω. In particular, if a candidate pattern c satisfies PI(c) = UPI(c), c is moved directly from CNCC into Ω. If PI(c) ≠ UPI(c), the candidate is processed further by Steps 25)-27).
A set of experiments (the embodiment) is presented below to verify the performance of the SPI-closed co-location pattern set and the SPI-closed miner proposed by the present invention. The programming tool used in the experiments is Visual C++. The environment in which the SPI-closed miner was run is: CPU: Intel Core i5 3337U @ 1.80GHz; RAM: 2GB; operating system: Microsoft Windows 7.
The data used in the embodiment is the plant distribution dataset of the "Three Parallel Rivers of Yunnan Protected Areas". It has a small number of spatial features but a large number of feature instances. The data is distributed over a 110000 m × 160000 m area and contains both discretely distributed data and clustered data, as shown in Table 1.
Table 1. Plant distribution dataset of the Three Parallel Rivers of Yunnan Protected Areas
Dataset name | Number of features | Number of instances | (Max, Min) | Instance distribution area (m)
Three Parallel Rivers of Yunnan plant distribution dataset | 15 | 501046 | (55646, 8706) | 110000 × 160000
(Max, Min): the maximum and minimum number of instances over all features in this dataset
Using the dataset shown in Table 1, we compare the SPI-closed miner with the PI-closed miner. Table 2 gives, as the pattern size grows, the number of candidate patterns generated and the number of final result patterns. It can be seen that the SPI-closed miner generates fewer candidate patterns than the PI-closed miner. In addition, as the pattern size grows, the number of candidate patterns generated by the SPI-closed miner gets closer and closer to the number of final result patterns. This significantly reduces the running time of the algorithm, because judging whether a long candidate pattern is a closed pattern takes longer than judging a shorter one.
Table 2. Comparison of the SPI-closed miner and the PI-closed miner
In this experiment, we set d = 10000 and M = 0.3.
As shown in Fig. 4, when M and d are smaller, the SPI-closed miner runs faster than the PI-closed miner. In particular, when M = 0.1, the SPI-closed miner is three times faster than the PI-closed miner.
The advantages of the present invention are: 1. The proposed SPI-closed co-location pattern set provides a representation of the frequent co-location pattern set that is smaller (by about 30%) than the closed co-location patterns proposed in Document 2 (called PI-closed co-location patterns), without losing participation index information. 2. The designed SPI-closed miner takes less running time than traditional PI-closed co-location pattern mining. First, because the constraint on SPI-closed patterns is stronger than the constraint on PI-closed patterns, fewer candidate patterns are generated in the SPI-closed miner than in PI-closed pattern mining algorithms. Second, during the generation of SPI-closed co-location patterns, a large amount of time is spent generating co-location instances and computing PI values. Therefore, during candidate generation, three pruning strategies are used to remove as many non-SPI-closed patterns as possible; for the data shown in Fig. 1(a), all non-SPI-closed co-location patterns are pruned in the combination and filtering stage.
The embodiments in this specification are described in a related manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
The foregoing describes only the preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (3)

1. A lossless compression method for frequent co-location patterns, characterized in that SPI-closed co-location patterns are first defined:
Definition 1. For two given co-location patterns c and c' with c ⊂ c', the super participation index SPI(c | c') of c in c' is defined as the minimum of the participation ratios of all features of c computed from the table instances of c', i.e.: SPI(c | c') = min{PR(c', f_i), f_i ∈ c};
Definition 2. A co-location pattern c is an SPI-closed co-location pattern if and only if the PI value of c is greater than the SPI value of c in every one of its SPI-closed super-patterns c', i.e.: c is an SPI-closed co-location pattern if and only if ∀c' ((c ⊂ c' and c' is an SPI-closed co-location pattern) → PI(c) > SPI(c | c'));
Definition 3. An SPI-closed co-location pattern c is an SPI-closed frequent co-location pattern if and only if c is SPI-closed and PI(c) ≥ M, where M is the prevalence threshold specified by the user; and for a co-location pattern c, if there exists a co-location pattern c' such that c ⊂ c' and PI(c) = SPI(c | c'), then c' is said to SPI-cover c; if there exists a co-location pattern c' such that c ⊂ c' and PI(c) = PI(c'), then c' is said to PI-cover c;
Then, the following steps are performed:
Step 1, preprocess the input data: generate the neighbor transaction set NT and the feature neighbor transaction set ENT;
Step 2, store the feature neighbor transaction set ENT in a lexicographic prefix-tree structure; based on the prefix-tree structure of the feature neighbor transaction set, generate star SPI-closed candidate patterns, and combine the star SPI-closed candidate patterns to generate clique SPI-closed candidate patterns;
Step 3, after the clique SPI-closed candidate patterns are generated, scan the neighbor transaction set NT to obtain the candidate table instances of the candidate patterns, and then check the neighbor relationships of the other instances to obtain the table instances that truly satisfy the clique relationship; compute the participation index PI of each pattern from its table instances, and judge whether a pattern is an SPI-closed co-location pattern as follows:
For a size-k candidate pattern c, if PI(c) = UPI(c), where UPI is the upper-bound participation index, then c must be an SPI-closed co-location pattern; otherwise, all size-(k-1) sub-patterns of c that were pruned must first be generated as candidates; next, if PI(c) < M, c is pruned; if PI(c) ≥ M, whether c is an SPI-closed co-location pattern must be judged according to Definition 2 and Definition 3.
2. The lossless compression method for frequent co-location patterns according to claim 1, characterized in that, in Step 1, the input data is preprocessed as follows: the input dataset is processed with a given neighbor distance threshold to obtain all neighboring instance pairs; the neighbor transaction set NT is generated by grouping the neighboring instance pairs; then the feature neighbor transaction set ENT is generated from the neighbor transaction set NT; S denotes the set of spatial feature instances; for a spatial feature instance f.i ∈ S, its neighbor transaction NT(f.i) is the set containing f.i and all other spatial feature instances that have a neighbor relationship with f.i, that is, NT(f.i) = {f.i, g.j ∈ S | NR(f.i, g.j) = true and f ≠ g}, where NR denotes the neighbor relationship between spatial instances, g.j denotes the j-th instance of feature g, and f.i is called the reference instance; the collection of the neighbor transactions of all instances is called the neighbor transaction set of the spatial data, denoted NT; the lexicographically ordered set of the distinct spatial features in a neighbor transaction of NT is called a feature neighbor transaction, and these form the feature neighbor transaction set ENT.
3. The lossless compression method for frequent co-location patterns according to claim 1, characterized in that, in Step 2, the feature neighbor transaction set ENT is stored in a lexicographic prefix-tree structure as follows:
Step 1, define the lexicographic prefix tree: the feature type of the reference instance is the root node, and the neighbor features in the feature neighbor transaction set are the child nodes; each child node consists of three parts: a feature type, a count value and a node-link; the feature type identifies the node; the count value is the number of paths in the whole feature transaction set that reach this feature type from the feature type of the reference instance; the node-link connects the node to the nodes in this tree that have the same feature type;
Step 2, since all child nodes in a lexicographic prefix tree have a neighbor relationship with the root node, star SPI-closed candidate co-location patterns are generated; the upper-bound participation index of each star SPI-closed co-location pattern is also obtained from the lexicographic prefix tree; if, within the same tree, the upper-bound participation index of a candidate equals the upper-bound participation index of one of its super-candidates, this star candidate is marked (in red); if the upper-bound participation index of a candidate is less than the threshold M, it is deleted;
Step 3, size-k clique SPI-closed co-location candidate patterns are generated by combining k related star SPI-closed co-location candidate patterns, and the minimum upper-bound participation index among these k star candidates is the upper-bound participation index of the size-k clique SPI-closed co-location candidate pattern.
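The preprocessing recited in claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the neighbor relationship NR is taken to be Euclidean distance not exceeding a threshold d, instances are compared pairwise by brute force, and the function names, data layout and sample coordinates are hypothetical.

```python
from itertools import combinations
from math import dist

def neighbor_transactions(points, d):
    """Build NT: each instance maps to itself plus its neighbors of other features.

    points maps an instance id such as "A.1" to (feature, x, y) (assumed layout).
    NR(f.i, g.j) is assumed to hold when the Euclidean distance is <= d and f != g.
    """
    nt = {iid: {iid} for iid in points}
    for (a, (fa, xa, ya)), (b, (fb, xb, yb)) in combinations(points.items(), 2):
        if fa != fb and dist((xa, ya), (xb, yb)) <= d:
            nt[a].add(b)
            nt[b].add(a)
    return nt

def feature_neighbor_transactions(nt, points):
    """Build ENT: the lexicographically ordered distinct features of each transaction."""
    return {
        iid: sorted({points[member][0] for member in members})
        for iid, members in nt.items()
    }

# Hypothetical instances, for illustration only.
points = {
    "A.1": ("A", 0, 0),
    "A.2": ("A", 1, 0),
    "B.1": ("B", 3, 3),
    "C.1": ("C", 100, 100),
}
NT = neighbor_transactions(points, d=5)
ENT = feature_neighbor_transactions(NT, points)
```

With these points, B.1 lies within distance 5 of both A.1 and A.2, so NT(B.1) = {B.1, A.1, A.2} and its feature neighbor transaction is [A, B]; C.1 has no neighbors, so its transaction contains only itself. The pairwise scan is quadratic in the number of instances; a spatial index or grid would be a natural replacement for a dataset of the size used in the embodiment.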
CN201710430303.0A 2017-06-09 2017-06-09 The lossless compression method of frequent co-location patterns Active CN107291854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710430303.0A CN107291854B (en) 2017-06-09 2017-06-09 The lossless compression method of frequent co-location patterns


Publications (2)

Publication Number Publication Date
CN107291854A CN107291854A (en) 2017-10-24
CN107291854B true CN107291854B (en) 2018-10-19

Family

ID=60096809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710430303.0A Active CN107291854B (en) 2017-06-09 2017-06-09 The lossless compression method of frequent co-location patterns

Country Status (1)

Country Link
CN (1) CN107291854B (en)


Also Published As

Publication number Publication date
CN107291854A (en) 2017-10-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200609

Address after: Room b214, zone a, 968, Xiujiang East Road, Yiyang New District, Yuanzhou District, Yichun City, Jiangxi Province

Patentee after: Jiangxi zhengrudder Network Technology Co., Ltd

Address before: 650091 Yunnan Province, Kunming city Wuhua District Lake Road No. 2

Patentee before: YUNNAN University