CN107291854B - Lossless compression method for frequent co-location patterns - Google Patents
- Publication number
- CN107291854B (application number CN201710430303.0A)
- Authority
- CN
- China
- Prior art keywords
- spi
- location
- patterns
- neighbours
- candidate
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Abstract
The invention discloses a lossless compression method for frequent co-location patterns. The input data are first pre-processed, and the feature neighbor transaction set is stored in a lexicographic prefix tree. Based on this prefix tree, star SPI-closed candidate patterns are generated, and related star SPI-closed candidates are combined to generate clique SPI-closed candidate patterns. After the clique SPI-closed candidates are generated, the candidate table instances of each candidate are obtained by scanning the neighbor transaction set NT, and the table instances that truly satisfy the clique relationship are then obtained by checking the neighbor relationships among the other instances. The participation index PI of a pattern is computed from its table instances, which also determines whether the pattern is an SPI-closed co-location pattern. The method provides a smaller compressed representation of the set of frequent co-location patterns that loses no participation information.
Description
Technical field
The invention belongs to the technical field of spatial co-location (juxtaposition) pattern mining, and more particularly relates to a lossless compression method for frequent co-location (juxtaposition) patterns.
Background
Spatial co-location pattern mining (mining prevalent co-location patterns from spatial data sets) identifies groups of spatial features that frequently appear close together in geographic spatial data sets.
Fig. 1 is an example of a spatial data set, in which different icons represent different spatial features, such as houses. The figure contains 5 features in total, each with 4 spatial instances. As can be observed in the figure, instances of the spatial features "withered tree" and "wildfire", and of "house" and "bird", tend to appear close together frequently. These two patterns imply that "wildfires are closely related to withered trees" and that "houses frequently co-occur with birds, indicating an improved living environment".
Applications of spatial co-location pattern mining include finding animal (or plant) species whose living (or growing) spaces overlap, identifying locations where a specific group of people often gathers so that targeted advertisements can be placed, and understanding the relationships among the different elements of the Earth's climate system.
Because this research area has high theoretical and practical value, many researchers in China and abroad have proposed spatial co-location pattern mining algorithms. As with mining frequent itemsets from transaction databases, a major challenge in mining frequent co-location patterns from large spatial databases is that mining usually produces a large number of spatial co-location patterns satisfying the minimum prevalence threshold M, especially when M is set low. This is because if a co-location pattern is frequent, all of its subsets are also frequent. To address this, the prior art (J. S. Yoo and M. Bow, "Mining top-k closed co-location patterns," in Proceedings of the IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM), pp. 100-105, 2011; denoted Document 2) proposes the concept of top-k closed co-location patterns and a corresponding mining algorithm. In that publication, a co-location pattern c is closed if and only if its participation index (PI) value is greater than the PI value of every one of its super-patterns. Top-k closed co-location patterns are defined as follows: let L be the list of all closed co-location patterns sorted in descending order of PI, and let p be the PI of the k-th closed co-location pattern in L; the top-k closed co-location patterns are all closed co-location patterns whose PI is greater than or equal to p.
The top-k closed co-location mining method works as follows. First, the input data are organized into an instance neighbor transaction set and a feature neighbor transaction set. Neighbor transaction sets have four advantages: (1) no instance pair with a neighbor relationship is lost; (2) they are easy to build; (3) candidate co-location patterns are easily generated from them; (4) they provide an upper bound on the PI of a co-location pattern. Second, the spatial features that have neighbor relationships are stored in an FP-tree-like structure, from which star neighbor candidate patterns are obtained. Then, combining related star neighbor candidates yields the candidate co-location patterns and an upper bound on each candidate's PI. Next, an internal minimum prevalence threshold θ defines a pruning framework that reduces the number of candidates in the search space. Its basic idea is that θ is set to the smallest PI in the current top-k result set, and if the PI upper bound of the next candidate is less than θ, that candidate and all of its supersets can be pruned. Subsequently, for candidates that cannot be pruned, their table instances are obtained from the instance neighbor transaction set, so that their true PI values can be computed. Finally, binary search is used to compare the PI values of co-location patterns within the ordered top-k set, which yields the final set of top-k closed patterns.
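As a rough illustration of the two definitions above (a brute-force sketch over a precomputed table of PI values, not the pruning and binary-search algorithm of Document 2), the Python below selects the closed patterns and the top-k closed set. The function names `pi_closed` and `top_k_closed` and the PI values in the example are hypothetical.

```python
from fractions import Fraction

def pi_closed(pi):
    """Closed patterns: PI strictly greater than the PI of every super-pattern in `pi`."""
    return {c for c, v in pi.items()
            if all(v > w for s, w in pi.items() if c < s)}

def top_k_closed(pi, k):
    """Closed patterns whose PI is >= p, the PI of the k-th closed pattern (descending PI)."""
    closed = sorted(pi_closed(pi), key=lambda c: pi[c], reverse=True)
    if not closed:
        return set()
    p = pi[closed[min(k, len(closed)) - 1]]
    return {c for c in closed if pi[c] >= p}

P = frozenset  # P("AB") == frozenset({"A", "B"})
# Hypothetical PI values (anti-monotone: no super-pattern has a larger PI).
pi = {P("AB"): Fraction(1, 2), P("AC"): Fraction(2, 5),
      P("BC"): Fraction(3, 5), P("ABC"): Fraction(2, 5)}
print(sorted("".join(sorted(c)) for c in pi_closed(pi)))        # ['AB', 'ABC', 'BC']
print(sorted("".join(sorted(c)) for c in top_k_closed(pi, 2)))  # ['AB', 'BC']
```

{A, C} is not closed because its super-pattern {A, B, C} has the same PI.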
For the spatial data set shown in Fig. 2(a), F = {A, B, C, D} denotes the set of spatial features, and instances are denoted "feature.number", such as "A.1"; a line between two instances in the figure indicates a neighbor relationship between them. In the spatial data set of Fig. 2(a), feature A has 4 instances, B has 5, C has 3, and D has 4. Participation ratios and participation indexes can then be used to identify all frequent co-location patterns over F. If the minimum participation threshold is set to M = 0.3, the frequent co-location patterns in this data set are {A,B,C,D}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B}, {A,C}, {A,D}, {B,C}, {B,D} and {C,D}. (Fig. 2(b) gives the table instances, participation ratios and participation index values of all co-location patterns of the data set in Fig. 2(a).)
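The text uses participation ratios and participation indexes without restating them. The sketch below assumes the standard co-location mining definitions: PR(c, f) is the number of distinct instances of f in the table instance of c divided by the total number of instances of f, and PI(c) is the minimum PR over the features of c. The instance counts come from Fig. 2(a); the two table-instance rows are hypothetical (chosen only to be consistent with Example 1 below), not copied from Fig. 2(b), and the function names are illustrative.

```python
from fractions import Fraction

def participation_ratio(pattern, rows, feature, n_instances):
    """PR(c, f): distinct instances of f in the table instance of c / all instances of f."""
    col = pattern.index(feature)
    return Fraction(len({row[col] for row in rows}), n_instances[feature])

def participation_index(pattern, rows, n_instances):
    """PI(c): minimum participation ratio over the features of c."""
    return min(participation_ratio(pattern, rows, f, n_instances) for f in pattern)

n_instances = {"A": 4, "B": 5, "C": 3, "D": 4}          # instance counts of Fig. 2(a)
pattern = ("A", "B", "C", "D")
rows = [("A.1", "B.1", "C.1", "D.1"),                   # hypothetical rows, one instance
        ("A.2", "B.2", "C.1", "D.2")]                   # per feature, in pattern order
print(participation_index(pattern, rows, n_instances))  # 1/3
```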
However, because co-location patterns satisfy the downward closure property, a large number of redundant patterns are produced. How to use a reduced (smaller) set that can both describe the original result set and allow the original result set to be derived from it has therefore become a new research focus. Researchers have proposed two concepts: the maximal co-location pattern set and the closed co-location pattern set. The maximal co-location pattern set is a lossy compressed reduction: although it can derive the original result set, it cannot necessarily derive the corresponding participation index (PI) values. The closed co-location pattern set is a lossless compressed reduction that solves this problem. However, this also makes the reducing power of the closed co-location pattern set very limited. For example, the closed co-location pattern set of the data set in Fig. 2(a) is {{A,B,C,D}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B}, {A,D}, {B,D}, {C,D}}, which is still large.
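Why the closed set is lossless can be sketched in a few lines: because PI is anti-monotone, the PI of any frequent pattern equals the largest PI among its closed super-patterns (itself included). The helper `derive_pi` and the PI values below are hypothetical and reuse the toy values of the earlier top-k sketch.

```python
from fractions import Fraction

def derive_pi(c, closed_pi):
    """Recover PI(c) from the closed patterns: the largest PI among closed super-patterns
    of c (c itself included). Returns None when no closed super-pattern exists, i.e. c is
    not frequent."""
    sup = [v for p, v in closed_pi.items() if c <= p]
    return max(sup) if sup else None

P = frozenset
closed_pi = {P("AB"): Fraction(1, 2), P("BC"): Fraction(3, 5), P("ABC"): Fraction(2, 5)}
print(derive_pi(P("AC"), closed_pi))  # 2/5, although {A, C} itself is not stored
```

A maximal set keeps only the patterns, not these PI values, which is why it cannot recover them.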
In practice, the number of frequent co-location patterns generated from a spatial data set may be very large, so it is useful to identify a smaller, representative compressed subset from which all other frequent co-location patterns can be derived. The set of frequent closed co-location patterns (denoted the PI-closed co-location pattern set) provides a compressed representation of the frequent co-location pattern set that loses no participation information. However, the existing concept of frequent closed co-location patterns follows the idea of traditional frequent closed itemsets; as a result, its compression ratio is very low, i.e., the number of frequent closed co-location patterns remains large.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a lossless compression method for frequent co-location patterns. The method provides a smaller compressed representation of the frequent co-location pattern set that loses no participation information.
The technical solution adopted by the present invention is a lossless compression method for frequent co-location patterns that first defines SPI-closed co-location patterns:
Definition 1. For two given co-location patterns c and c' with c ⊆ c', the super participation index of c in c', SPI(c | c'), is defined as the minimum participation ratio over all features of c, computed from the table instance of c', i.e. SPI(c | c') = min{PR(c', f_i) : f_i ∈ c}.
Definition 2. A co-location pattern c is an SPI-closed co-location pattern if and only if the PI value of c is greater than its SPI value in every SPI-closed super-pattern c', i.e. c is SPI-closed if and only if PI(c) > SPI(c | c') for every c' ⊃ c that is an SPI-closed co-location pattern.
Definition 3. An SPI-closed co-location pattern c is an SPI-closed frequent co-location pattern if and only if c is SPI-closed and PI(c) ≥ M, where M is the user-specified prevalence threshold. For a co-location pattern c, if there exists a co-location pattern c' with c ⊆ c' and PI(c) = SPI(c | c'), then c' is said to SPI-cover c; if PI(c) = PI(c'), c' is said to PI-cover c.
The following steps are then carried out:
Step 1, pre-process the input data: generate the neighbor transaction set NT and the feature neighbor transaction set ENT.
Step 2, store the feature neighbor transaction set in a lexicographic prefix tree; based on this prefix tree, generate star SPI-closed candidate patterns, and combine the star SPI-closed candidates to generate clique SPI-closed candidate patterns.
Step 3, after the clique SPI-closed candidates are generated, scan NT to obtain the candidate table instances of each candidate, and then check the neighbor relationships among the other instances to obtain the table instances that truly satisfy the clique relationship. The participation index PI of a pattern is computed from its table instances, and whether a pattern is an SPI-closed co-location pattern is decided as follows:
For a size-k candidate pattern c, if PI(c) = UPI(c) (the PI of c equals its upper-bound participation index UPI), then c must be an SPI-closed co-location pattern. Otherwise, all size-(k-1) sub-patterns of c that were pruned are first generated as candidates; then, if PI(c) < M, c is pruned; if PI(c) ≥ M, Definitions 2 and 3 are used to decide whether c is an SPI-closed co-location pattern.
Further, in step 1 the input data are pre-processed as follows: the input data set is processed with a given neighbor threshold to obtain all neighboring instance pairs; grouping these pairs generates the neighbor transaction set NT, and the feature neighbor transaction set ENT is then generated from NT. For a spatial feature instance f.i ∈ S, its instance neighbor transaction is the set containing f.i and all other spatial feature instances that have a neighbor relationship with f.i, i.e. NT(f.i) = {f.i} ∪ {g.j ∈ S | NR(f.i, g.j) = true and f ≠ g}, where NR denotes the neighbor relationship between spatial instances and f.i is called the reference instance. The collection of the neighbor transactions of all instances is called the neighbor transaction set of the spatial data, denoted NT. The lexicographically ordered set of the distinct spatial features in an instance neighbor transaction is called its feature neighbor transaction, and the collection of these is the feature neighbor transaction set ENT.
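The pre-processing of step 1 can be sketched as follows, assuming that NR(a, b) means "Euclidean distance at most d" (the text only says "a given neighbor threshold") and using a brute-force O(n²) pair search for brevity; a real implementation would use a spatial index or grid. The helpers `feature_of`, `build_nt` and `build_ent` and the coordinates are hypothetical.

```python
import math
from collections import defaultdict

def feature_of(inst):
    """'A.1' -> 'A' (the 'feature.number' naming used in the text)."""
    return inst.split(".")[0]

def build_nt(points, d):
    """NT(f.i) = [f.i] + sorted neighbors of other features (brute-force pair search)."""
    nbrs = defaultdict(set)
    ids = sorted(points)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # NR(a, b): different features and distance within the threshold d.
            if feature_of(a) != feature_of(b) and math.dist(points[a], points[b]) <= d:
                nbrs[a].add(b)
                nbrs[b].add(a)
    return {x: [x] + sorted(nbrs[x]) for x in ids}

def build_ent(nt):
    """ENT: reference feature plus the sorted distinct features of its neighbors."""
    return {x: (feature_of(x), tuple(sorted({feature_of(y) for y in ys[1:]})))
            for x, ys in nt.items()}

points = {"A.1": (0, 0), "B.1": (1, 0), "C.1": (0, 1), "A.2": (10, 10), "B.2": (10.5, 10)}
nt = build_nt(points, d=1.5)
ent = build_ent(nt)
print(nt["A.1"], ent["A.1"])  # ['A.1', 'B.1', 'C.1'] ('A', ('B', 'C'))
```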
Further, in step 2 the feature neighbor transaction set is stored in a lexicographic prefix tree as follows:
(1) Define the lexicographic prefix tree: the feature type of the reference instance is the root node, and the neighbor features in the feature neighbor transactions are the child nodes. Each child node consists of three parts: feature type, count and node link. The feature type identifies the node; the count is the number of paths in the whole feature transaction set that reach this feature type from the feature type of the reference instance; the node link connects to the nodes in the same tree that have the same feature type.
(2) Because all child nodes in a lexicographic prefix tree have a neighbor relationship with the root node, star SPI-closed candidate co-location patterns can be generated, and the prefix tree also gives the upper bound on the participation ratio (UPR) of each star SPI-closed candidate. If, in the same tree, the UPR of a candidate equals the UPR of one of its super-candidates, that star candidate is marked red; if the UPR of a candidate is less than the threshold M, it is deleted.
(3) Combining k related star SPI-closed candidate co-location patterns generates a size-k clique SPI-closed candidate co-location pattern, and the minimum UPR among these k star candidates is the upper-bound participation index (UPI) of the size-k clique SPI-closed candidate.
The beneficial effect of the invention is that it proposes a new lossless compression method for frequent co-location patterns, called the SPI-closed co-location pattern mining method. The SPI-closed co-location pattern set provides a smaller compressed representation of the frequent co-location pattern set that loses no participation information (about 30% smaller than the closed co-location pattern set proposed in the prior art, referred to as the PI-closed co-location pattern set).
Description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an example of a spatial data set.
Fig. 2a is an example of a spatial data set, and Fig. 2b gives the co-location instances, participation ratios and participation index values of all possible co-location patterns in Fig. 2a.
Fig. 3 is an example of candidate pattern generation, in which Fig. 3a shows the lexicographic prefix trees of the features of the data set in Table 1(a); Fig. 3b shows the star SPI-closed candidate patterns; Fig. 3c shows the clique SPI-closed candidate patterns.
Fig. 4 compares the run times of the SPI-closed miner and the PI-closed miner in the embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
First, the definitions and lemmas related to SPI-closed co-location patterns are given, followed by a method that mines this class of patterns directly.
Definition 1. Super participation index, SPI(c | c')
For two given co-location patterns c and c' with c ⊆ c', the super participation index SPI(c | c') of c in c' is defined as the minimum participation ratio over all features of c, computed from the table instance of c', i.e. SPI(c | c') = min{PR(c', f_i) : f_i ∈ c}.
Example 1. In Fig. 2(a), SPI({A,C,D} | {A,B,C,D}) = min{PR({A,B,C,D}, A) = 2/4, PR({A,B,C,D}, C) = 1/3, PR({A,B,C,D}, D) = 2/4} = 1/3. Similarly, SPI({A,B,D} | {A,B,C,D}) = 2/5.
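Definition 1 reduces to a minimum over the participation ratios of the super-pattern, restricted to the features of the sub-pattern. The sketch below reproduces Example 1; `spi` is a hypothetical helper name, and PR({A,B,C,D}, B) = 2/5 is not stated directly but is implied by SPI({A,B,D} | {A,B,C,D}) = 2/5 together with PR({A,B,C,D}, A) = PR({A,B,C,D}, D) = 2/4.

```python
from fractions import Fraction

def spi(c, pr_super):
    """SPI(c | c') = min{PR(c', f) : f in c}; pr_super maps each feature f of c' to PR(c', f)."""
    return min(pr_super[f] for f in c)

# PR values of c' = {A, B, C, D} (Example 1; the B value is implied as explained above).
pr_abcd = {"A": Fraction(2, 4), "B": Fraction(2, 5), "C": Fraction(1, 3), "D": Fraction(2, 4)}
print(spi("ACD", pr_abcd), spi("ABD", pr_abcd))  # 1/3 2/5
```

SPI(c | c) is simply PI(c), which is what makes the SPI-cover relation reflexive (Lemma 2 below).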
Definition 2. SPI-closed co-location patterns
A co-location pattern c is an SPI-closed co-location pattern if and only if the PI value of c is greater than the SPI value of c in every SPI-closed super-pattern c' of c, i.e. c is SPI-closed if and only if PI(c) > SPI(c | c') for every c' ⊃ c that is an SPI-closed co-location pattern.
For convenience in what follows, we call the traditional closed co-location patterns (i.e. the closed co-location patterns proposed in Document 2) PI-closed co-location patterns.
Example 2. For the spatial data set in Fig. 2(a), with M = 0.3, {A,B,C,D} is an SPI-closed co-location pattern. Because PI({A,B,C}) = SPI({A,B,C} | {A,B,C,D}) = PI({A,B,C,D}) and PI({A,B,D}) = SPI({A,B,D} | {A,B,C,D}) > PI({A,B,C,D}), neither {A,B,C} nor {A,B,D} is an SPI-closed co-location pattern; however, {A,B,D} is a PI-closed co-location pattern.
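Because Definition 2 refers only to SPI-closed super-patterns, the check must run from larger to smaller patterns. The sketch below applies Definitions 2 and 3 in that order and reproduces Example 2. Assumptions: only patterns in the given frequent set are considered as super-patterns c', and PR values are supplied for every pattern that can become SPI-closed; `spi_closed_prevalent` is a hypothetical name. The PI values are those of Examples 1 and 2.

```python
from fractions import Fraction

def spi_closed_prevalent(pi, pr, M):
    """Top-down application of Definitions 2 and 3.

    pi: frozenset pattern -> PI; pr: pattern -> {feature: PR}, required for every pattern
    that turns out SPI-closed, because it then acts as c' for smaller patterns.
    """
    closed = []
    for c in sorted(pi, key=len, reverse=True):        # larger patterns first
        if pi[c] < M:                                  # Definition 3: PI(c) >= M
            continue
        if all(pi[c] > min(pr[s][f] for f in c)        # Definition 2: PI(c) > SPI(c | c')
               for s in closed if c < s):
            closed.append(c)
    return set(closed)

P = frozenset
pi = {P("ABCD"): Fraction(1, 3), P("ABC"): Fraction(1, 3), P("ABD"): Fraction(2, 5)}
pr = {P("ABCD"): {"A": Fraction(2, 4), "B": Fraction(2, 5),
                  "C": Fraction(1, 3), "D": Fraction(2, 4)}}
print(spi_closed_prevalent(pi, pr, Fraction(3, 10)) == {P("ABCD")})  # True
```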
Definition 3. SPI-closed prevalent co-location patterns
An SPI-closed co-location pattern c is an SPI-closed frequent co-location pattern if and only if c is SPI-closed and PI(c) ≥ M, where M is the user-specified prevalence threshold.
To simplify the wording, "SPI-closed co-location pattern" is used below in place of "SPI-closed frequent co-location pattern". For a co-location pattern c, if there exists a co-location pattern c' with c ⊆ c' and PI(c) = SPI(c | c'), we say that c' SPI-covers c; if c ⊆ c' and PI(c) = PI(c'), we say that c' PI-covers c.
Lemma 1. If c ⊆ c' and c' PI-covers c, then c' necessarily SPI-covers c.
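The two cover relations can be compared directly on the values of Examples 1 and 2: {A,B,C,D} both PI-covers and SPI-covers {A,B,C}, as Lemma 1 requires, but it only SPI-covers {A,B,D}. That is exactly why {A,B,D} is PI-closed without being SPI-closed. The predicate names are hypothetical.

```python
from fractions import Fraction

def spi_covers(c_sup, c, pi, pr):
    """c_sup SPI-covers c: c ⊆ c_sup and PI(c) = SPI(c | c_sup)."""
    return c <= c_sup and pi[c] == min(pr[c_sup][f] for f in c)

def pi_covers(c_sup, c, pi):
    """c_sup PI-covers c: c ⊆ c_sup and PI(c) = PI(c_sup)."""
    return c <= c_sup and pi[c] == pi[c_sup]

P = frozenset
pi = {P("ABCD"): Fraction(1, 3), P("ABC"): Fraction(1, 3), P("ABD"): Fraction(2, 5)}
pr = {P("ABCD"): {"A": Fraction(2, 4), "B": Fraction(2, 5),
                  "C": Fraction(1, 3), "D": Fraction(2, 4)}}
abcd, abc, abd = P("ABCD"), P("ABC"), P("ABD")
print(pi_covers(abcd, abc, pi), spi_covers(abcd, abc, pi, pr))  # True True
print(pi_covers(abcd, abd, pi), spi_covers(abcd, abd, pi, pr))  # False True
```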
Lemma 2. On the set of frequent co-location patterns, the SPI-cover relation is a pseudo partial order; it satisfies:
(1) c SPI-covers c. (reflexivity)
(2) If c' SPI-covers c and c SPI-covers c', then c = c'. (antisymmetry)
(3) If c ⊆ c' ⊆ c'', PI(c) = PI(c') and c'' SPI-covers c', then c'' necessarily SPI-covers c.
(pseudo-transitivity)
Note that the PI-cover relation is transitive, but the SPI-cover relation is not. This is why Definition 2 includes the additional condition "c' is an SPI-closed co-location pattern". In addition, finding SPI-closed co-location patterns must proceed top-down; that is, SPI-closed co-location patterns are mined from larger sizes to smaller sizes. Finally, the SPI-closed pattern set compresses better than the PI-closed pattern set: for the same data set, the SPI-closed pattern set S_SPI-closed contains fewer patterns than the PI-closed pattern set S_PI-closed.
Lemma 3. If c ∈ S_SPI-closed, then c ∈ S_PI-closed; the converse does not necessarily hold.
An efficient method for directly mining SPI-closed co-location patterns, called the SPI-closed miner, is described below.
To generate the SPI-closed co-location pattern set quickly, the input data are pre-processed as follows:
For a spatial feature instance f.i ∈ S, its instance neighbor transaction is the set containing f.i and all other spatial feature instances that have a neighbor relationship with f.i, i.e. NT(f.i) = {f.i} ∪ {g.j ∈ S | NR(f.i, g.j) = true and f ≠ g}, where NR denotes the neighbor relationship between spatial instances and f.i is called the reference instance.
For example, in Fig. 2(a), the instance neighbor transaction of instance A.1 is {A.1, B.1, C.1, D.1}. Table 1(a) gives the neighbor transactions of all spatial instances in Fig. 2(a).
The lexicographically ordered set of the distinct spatial features in an instance neighbor transaction is called a feature neighbor transaction; Table 1(b) is the feature neighbor transaction set corresponding to Table 1(a).
Table 1. Instance neighbor transaction set and feature neighbor transaction set of the spatial data set in Fig. 2(a)
To generate candidate SPI-closed co-location patterns quickly and to prune the candidate search space, the feature neighbor transaction set is stored in a lexicographic prefix tree.
First, we define the lexicographic prefix tree. The feature type of the reference instance is the root node, and the neighbor features in the feature neighbor transactions are the child nodes. Each node consists of three parts: feature type, count and node link. The feature type identifies the node; the count is the number of paths in the whole feature transaction set that reach this feature type from the feature type of the reference instance; the node link connects to the nodes in the same tree that have the same feature type.
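A minimal sketch of this node layout, assuming that the root's count records the number of reference instances (the text defines counts only for child nodes) and that each new node is prepended to its feature's node-link chain. The class names and the four transactions are hypothetical; the transactions are the ones reused in the star-candidate sketch after the next example.

```python
class Node:
    """Prefix-tree node: feature type, count, children and node link."""
    def __init__(self, feature):
        self.feature = feature
        self.count = 0
        self.children = {}
        self.link = None  # next node in this tree with the same feature type

class LexPrefixTree:
    def __init__(self, root_feature):
        self.root = Node(root_feature)
        self.heads = {}  # feature -> first node of its node-link chain

    def insert(self, neighbor_features):
        """Insert one feature neighbor transaction (neighbor features of one instance)."""
        self.root.count += 1  # assumption: root counts reference instances
        node = self.root
        for f in sorted(neighbor_features):  # lexicographic order
            child = node.children.get(f)
            if child is None:
                child = Node(f)
                node.children[f] = child
                child.link = self.heads.get(f)  # prepend to the node-link chain
                self.heads[f] = child
            child.count += 1
            node = child

    def feature_count(self, f):
        """Paths from the root that reach f: sum of counts along f's node-link chain."""
        total, node = 0, self.heads.get(f)
        while node is not None:
            total += node.count
            node = node.link
        return total

tree = LexPrefixTree("A")
for t in [("B", "C", "D"), ("B", "C", "D"), ("C", "D"), ("B",)]:  # hypothetical ENT of A
    tree.insert(t)
print(tree.root.count, tree.feature_count("D"))  # 4 3
```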
For example, the lexicographic prefix trees of the feature neighbor transaction set in Table 1(b) are shown in Fig. 3(a).
Second, because all child nodes in a lexicographic prefix tree have a neighbor relationship with the root node, star SPI-closed candidate co-location patterns can be generated. The prefix tree also gives the upper bound on the participation ratio of each star SPI-closed candidate (it is the upper bound on the participation ratio of the root feature). If, in the same tree, the upper-bound participation ratio of a candidate equals that of one of its super-candidates, this star candidate is marked red. If the upper-bound participation ratio of a candidate is less than the threshold M, it is deleted.
For example, for the lexicographic prefix tree of feature A in Fig. 3(a), with M = 0.3, the star candidates and their upper-bound participation ratios are {A,B,C,D}: 2/4, {A,B,C}: 2/4, {A,B,D}: 2/4, {A,C,D}: 3/4, {A,B}: 3/4, {A,C}: 3/4 and {A,D}: 3/4. The candidates to be marked red are {A,B,C}: 2/4, {A,B,D}: 2/4, {A,C}: 3/4 and {A,D}: 3/4. The star SPI-closed candidate co-location patterns generated from the 4 prefix trees in Fig. 3(a) are shown in Fig. 3(b).
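The example can be reproduced with a small sketch that counts, for each feature subset S, the feature neighbor transactions of the root feature containing S; this gives the same counts as summing node-link counts in the prefix tree, but skips the tree for brevity. The four transactions of A are hypothetical, reconstructed only so that they yield the UPR values listed above; they are not read from Table 1(b). `star_candidates` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations

def star_candidates(root, transactions, n_root, M):
    """Star candidates {root} ∪ S with UPR = |{t : S ⊆ t}| / n_root, kept when UPR >= M.
    A candidate is marked red when a super-candidate in the same tree has the same UPR."""
    feats = sorted(set().union(*map(set, transactions)))
    cands = {}
    for r in range(1, len(feats) + 1):
        for s in combinations(feats, r):
            upr = Fraction(sum(set(s) <= set(t) for t in transactions), n_root)
            if upr >= M:
                cands[frozenset((root,) + s)] = upr
    red = {c for c, u in cands.items()
           if any(c < d and u == ud for d, ud in cands.items())}
    return cands, red

ent_a = [("B", "C", "D"), ("B", "C", "D"), ("C", "D"), ("B",)]  # hypothetical ENT of A
cands, red = star_candidates("A", ent_a, n_root=4, M=Fraction(3, 10))
print(sorted("".join(sorted(c)) for c in red))  # ['ABC', 'ABD', 'AC', 'AD']
```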
Finally, combining k related star SPI-closed candidate co-location patterns generates a size-k clique SPI-closed candidate co-location pattern, and the minimum upper-bound participation ratio among these k star candidates is the upper-bound participation index of the size-k clique SPI-closed candidate.
Pruning 1 (non-frequent pruning): if a co-location pattern c is not a star SPI-closed candidate in the prefix tree of some feature f_i ∈ c, then c can be pruned.
For example, with M = 0.4, the patterns {C,A,B,D} and {C,A,B} are not star SPI-closed candidates of the prefix tree of feature C. Therefore {A,B,C,D} and {A,B,C} cannot be combined into clique SPI-closed candidates, and both can be pruned.
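The combination step and Pruning 1 fit in one function: a pattern survives only if every one of its features has it as a star candidate, and its UPI is the minimum of those UPRs. In the example, UPR({A,B}) = 3/4 in A's tree comes from the list above, and 3/5 in B's tree is implied by the UPI {A,B}: 3/5 reported for Fig. 3(c); the dropped {A,C} entry (missing from C's tree) is hypothetical. `clique_candidates` is a hypothetical name.

```python
from fractions import Fraction

def clique_candidates(stars):
    """Combine star candidates into clique candidates.

    stars: root feature -> {star candidate pattern: UPR}. A pattern c is kept only if it is
    a star candidate in the tree of every feature of c (Pruning 1); UPI(c) = min UPR.
    """
    patterns = set().union(*(s.keys() for s in stars.values()))
    return {c: min(stars[f][c] for f in c)
            for c in patterns
            if all(c in stars.get(f, {}) for f in c)}

P = frozenset
stars = {"A": {P("AB"): Fraction(3, 4), P("AC"): Fraction(3, 4)},
         "B": {P("AB"): Fraction(3, 5)}}               # {A, C} is absent from C's tree
print(clique_candidates(stars))                       # {frozenset({'A', 'B'}): Fraction(3, 5)}
```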
Pruning 2 (non-SPI-closed pruning 1): if the UPI value (upper-bound participation index) of a clique SPI-closed candidate c is marked red and UPI(c) = UPI(c'), where c' is a clique SPI-closed candidate that is a super-pattern of c, then c can be pruned.
For example, in Fig. 3(c), UPI({A,B,C}) = UPI({A,B,C,D}); if {A,B,C,D} is an SPI-closed candidate, then {A,B,C} can be pruned. Similarly, {A,C} and {B,C} can also be pruned.
Pruning 3 (non-SPI-closed pruning 2): if the UPI value of a clique SPI-closed candidate c is marked red and UPI(c) = USPI(c | c'), where c' is a clique SPI-closed candidate that is a super-pattern of c and USPI(c | c') denotes the upper bound of the super participation index, then c can be pruned.
For example, in Fig. 3(c), UPI({A,B,D}) = USPI({A,B,D} | {A,B,C,D}) = 2/5; if {A,B,C,D} is an SPI-closed candidate, then {A,B,D} can be pruned. Similarly, {A,D} can also be pruned. However, because UPI({B,D}) ≠ USPI({B,D} | {B,C,D}), {B,D} cannot be pruned.
As shown in Fig. 3(c), with M = 0.3, the resulting SPI-closed candidates and their UPI values are {A,B,C,D}: 1/3, {A,C,D}: 2/3, {B,C,D}: 2/3, {A,B}: 3/5, {B,D}: 3/4 and {C,D}: 1. Note that for the spatial data set in Fig. 2(a), the pruning steps above remove all non-SPI-closed patterns.
Note also that Pruning 3 subsumes Pruning 2: any candidate that Pruning 2 removes can also be removed by Pruning 3. Pruning 2 is retained because (1) when values are compared directly, Pruning 2 has lower computational cost than Pruning 3; and (2) most non-SPI-closed patterns satisfy the condition of Pruning 2.
Once the candidates are generated, the table instances of each candidate must be found and their true participation index (PI) values computed. This is done top-down.
By scanning the neighbor transaction set, the candidate table instances of a candidate pattern are obtained; then, by checking the neighbor transactions of the other instances, the table instances that truly satisfy the clique relationship are obtained. For example, in Fig. 3(a), {A.2, B.2, C.1, D.2} is a true table instance of the candidate pattern {A,B,C,D}, but {A.2, B.1, C.1, D.2} is not.
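A minimal sketch of this two-stage check: candidate rows are the combinations of one neighbor per remaining feature taken from each reference instance's neighbor transaction (a star), and a row is kept only if all remaining pairs are neighbors as well (a clique). The function names and the small neighbor transaction set are hypothetical; in the example, A.2's star {A.2, B.2, C.2} is rejected because B.2 and C.2 are not neighbors.

```python
from itertools import combinations, product

def feature_of(inst):
    return inst.split(".")[0]

def table_instances(pattern, nt):
    """Table instances of a candidate (features in lexicographic order).

    nt: instance -> [instance, neighbors...]. Candidate rows are stars around each instance
    of the first feature; a row is kept only if its other instances are pairwise neighbors.
    """
    first, rest = pattern[0], pattern[1:]
    nbrs = {x: set(ys[1:]) for x, ys in nt.items()}
    rows = []
    for ref, ys in sorted(nt.items()):
        if feature_of(ref) != first:
            continue
        groups = [[y for y in ys[1:] if feature_of(y) == f] for f in rest]
        for combo in product(*groups):                              # candidate rows
            if all(b in nbrs[a] for a, b in combinations(combo, 2)):  # clique check
                rows.append((ref,) + combo)
    return rows

nt = {"A.1": ["A.1", "B.1", "C.1"], "A.2": ["A.2", "B.2", "C.2"],
      "B.1": ["B.1", "A.1", "C.1"], "B.2": ["B.2", "A.2"],
      "C.1": ["C.1", "A.1", "B.1"], "C.2": ["C.2", "A.2"]}
print(table_instances(("A", "B", "C"), nt))  # [('A.1', 'B.1', 'C.1')]
```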
For a size-k candidate pattern c, if PI(c) = UPI(c), then c must be an SPI-closed co-location pattern. Otherwise, all size-(k-1) sub-patterns of c are first generated; then, if PI(c) < M, c is pruned; if PI(c) ≥ M, Definitions 2 and 3 are used to check whether c is an SPI-closed co-location pattern.
Note that the UPI value of a size-2 co-location pattern is its true PI value.
The algorithm of the SPI-closed miner, which directly mines SPI-closed co-location patterns, is given below:
The main function of Part III is to compute the true PI value of each candidate pattern in CNCC and to generate the SPI-closed frequent co-location pattern set Ω. In particular, if a candidate pattern c satisfies PI(c) = UPI(c), c is moved directly from CNCC to Ω; if PI(c) ≠ UPI(c), c is processed further by Steps 25) to 27).
The following set of experiments (embodiments) verifies the performance of the proposed SPI-closed co-location pattern set and of the SPI-closed miner. The programming tool used is Visual C++. The experimental environment for running the SPI-closed miner is: CPU: Intel Core i5 3337U @ 1.80GHz; RAM: 2GB; Operating System: Microsoft Windows 7.
The data used in the embodiment are the plant distribution data set of the "Three Parallel Rivers of Yunnan" protected area, which has few spatial features but a large number of instances. The data are distributed over a 110000 m × 160000 m area and contain both scattered (randomly distributed) data and clustered data, as shown in Table 1.
Table 1. Plant distribution data set of the Three Parallel Rivers of Yunnan protected area
| Data set | Number of features | Number of instances | (Max, Min) | Instance distribution area (m) |
| Three Parallel Rivers of Yunnan plant distribution data set | 15 | 501046 | (55646, 8706) | 110000 × 160000 |
(Max, Min): the maximum and minimum numbers of instances over all features in this data set.
Using the data set shown in Table 1, we compare the SPI-closed miner with the PI-closed miner. Table 2 gives, for increasing pattern size, the number of candidate patterns generated and the number of final result patterns. The SPI-closed miner generates fewer candidate patterns than the PI-closed miner. Moreover, as the pattern size grows, the number of candidates generated by the SPI-closed miner comes increasingly close to the number of final result patterns. This can significantly reduce the running time of the algorithm, because checking whether a long candidate pattern is a closed pattern takes longer than checking a short one.
Table 2. Comparison of the SPI-closed miner and the PI-closed miner
In this experiment, we set d = 10000 and M = 0.3.
As Figure 4 shows, when M and d are small, the SPI-closed miner runs faster than the PI-closed miner. In particular, at M = 0.1 the SPI-closed miner is three times as fast as the PI-closed miner.
The advantages of the invention are as follows. 1. The proposed SPI-closed co-location pattern set is a smaller representation of the set of frequent co-location patterns than the closed co-location patterns proposed in Document 2 (called PI-closed co-location patterns). It reduces the set by about 30% without losing participation information. 2. The designed SPI-closed miner takes less running time than traditional PI-closed co-location pattern mining. First, the constraint on SPI-closed patterns is stronger than the constraint on PI-closed patterns, so the SPI-closed miner generates fewer candidate patterns than PI-closed pattern mining algorithms. Second, in SPI-closed co-location pattern generation, much of the time is spent generating co-location instances and computing PI values. Therefore, during candidate generation we use three pruning strategies to remove as many non-SPI-closed patterns as possible. For the data shown in Fig. 1(a), all non-SPI-closed co-location patterns are pruned in the combination filtering stage.
The embodiments in this specification are described in a related manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is substantially similar to the method embodiment, so its description is relatively brief; for related details, refer to the description of the method embodiment.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (3)
1. A lossless compression method for frequent co-location patterns, characterized in that SPI-closed co-location patterns are first defined as follows:
Definition 1. For two given co-location patterns c and c' with $c \subset c'$, the super participation index $SPI(c \mid c')$ of c in c' is the minimum of the participation ratios of all features of c, computed from the table instances of c', i.e., $SPI(c \mid c') = \min\{PR(c', f_i) : f_i \in c\}$;
Definition 2. A co-location pattern c is an SPI-closed co-location pattern if and only if the PI value of c is greater than its SPI value in every SPI-closed super-pattern c' of c. That is, c is SPI-closed if and only if, for every $c' \supset c$ such that c' is SPI-closed, $PI(c) > SPI(c \mid c')$;
Definition 3. An SPI-closed co-location pattern c is an SPI-closed frequent co-location pattern if and only if c is SPI-closed and $PI(c) \geq M$, where M is a user-specified frequency threshold. Furthermore, for a co-location pattern c, if there exists a co-location pattern c' such that $c \subset c'$ and $PI(c) = SPI(c \mid c')$, then c' is said to SPI-cover c; if there exists a co-location pattern c' such that $c \subset c'$ and $PI(c) = PI(c')$, then c' is said to PI-cover c;
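Definitions 1 to 3 can be expressed as a few predicates. This is a hedged sketch, not the patent's code. The function names are hypothetical, and PR is assumed to follow the standard participation-ratio definition computed over the table-instance rows of c'. Table instances of c' project onto table instances of c, so $SPI(c \mid c') \leq PI(c)$ always holds. Definition 2 therefore requires strict inequality for every SPI-closed super-pattern, and it holds vacuously when there is none.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def participation_ratio(features: Sequence[str], rows: List[Tuple[str, ...]],
                        f: str, total: Dict[str, int]) -> float:
    """PR(c', f): share of f's instances appearing in the table instance of c'."""
    col = list(features).index(f)
    return len({row[col] for row in rows}) / total[f]


def spi(c: Sequence[str], c_prime: Sequence[str], rows_prime: List[Tuple[str, ...]],
        total: Dict[str, int]) -> float:
    """Definition 1: SPI(c | c') = min over f in c of PR(c', f)."""
    if not set(c) < set(c_prime):
        raise ValueError("c must be a proper sub-pattern of c'")
    return min(participation_ratio(c_prime, rows_prime, f, total) for f in c)


def is_spi_closed(pi_c: float, spi_in_closed_supers: Iterable[float]) -> bool:
    """Definition 2: PI(c) > SPI(c | c') for every SPI-closed super-pattern c'."""
    return all(pi_c > s for s in spi_in_closed_supers)


def is_spi_closed_frequent(pi_c: float, spi_in_closed_supers: Iterable[float], m: float) -> bool:
    """Definition 3: SPI-closed and PI(c) >= M."""
    return is_spi_closed(pi_c, spi_in_closed_supers) and pi_c >= m


def spi_covers(pi_c: float, spi_c_in_c_prime: float) -> bool:
    """c' SPI-covers c when PI(c) == SPI(c | c')."""
    return math.isclose(pi_c, spi_c_in_c_prime, rel_tol=0.0, abs_tol=1e-12)


def pi_covers(pi_c: float, pi_c_prime: float) -> bool:
    """c' PI-covers c when PI(c) == PI(c')."""
    return math.isclose(pi_c, pi_c_prime, rel_tol=0.0, abs_tol=1e-12)


# Usage with made-up data: c = {A, B}, c' = {A, B, C}.
rows_abc = [("A.1", "B.1", "C.1"), ("A.2", "B.1", "C.1")]
s = spi(("A", "B"), ("A", "B", "C"), rows_abc, {"A": 4, "B": 2, "C": 1})  # min(2/4, 1/2)
```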
Then, the following steps are performed:
Step 1, preprocess the input data: generate the neighbor transaction set NT and the feature neighbor transaction set ENT;
Step 2, store the feature neighbor transaction set ENT in a lexicographic prefix tree structure. Based on this prefix tree, generate star SPI-closed candidate patterns, then combine the star SPI-closed candidate patterns to generate clique SPI-closed candidate patterns;
Step 3, after the clique SPI-closed candidate patterns are generated, scan the neighbor transaction set NT to obtain the candidate table instances of each candidate pattern. Then check the neighbor relationships among the remaining instances to obtain the table instances that truly satisfy the clique relationship. Compute the participation index PI of the pattern from its table instances, and determine whether the pattern is an SPI-closed co-location pattern as follows:
for a size-k candidate pattern c, if PI(c) = UPI(c), where UPI is the upper-bound participation index, then c must be an SPI-closed co-location pattern. Otherwise, all pruned size-(k-1) subpatterns of c must first be generated as candidates. Next, if PI(c) < M, c is pruned; if PI(c) >= M, Definitions 2 and 3 are used to determine whether c is an SPI-closed co-location pattern.
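The candidate-instance check in Step 3 can be sketched as follows. For each reference instance of the pattern's first feature, one neighbor per remaining feature is chosen from its neighbor transaction, giving candidate (star) rows. A row is kept only if the non-reference instances are also pairwise neighbors. This is an illustrative sketch under stated assumptions: `nt` maps each instance label such as "A.1" to the set of its neighbors (the reference itself may be included or not). `clique_instances` and `feature_of` are hypothetical names. The naive `itertools.product` enumeration is a stand-in for the patent's scan.

```python
from itertools import combinations, product
from typing import Dict, List, Set, Tuple


def feature_of(inst: str) -> str:
    """Feature type of an instance label such as 'A.1'."""
    return inst.split(".")[0]


def clique_instances(pattern: Tuple[str, ...], nt: Dict[str, Set[str]]) -> List[Tuple[str, ...]]:
    """Table instances of `pattern` whose members are pairwise neighbours."""
    head, rest = pattern[0], pattern[1:]
    rows: List[Tuple[str, ...]] = []
    for ref, neigh in nt.items():
        if feature_of(ref) != head:
            continue
        # Candidate (star) rows: one neighbour of the reference per remaining feature.
        pools = [sorted(n for n in neigh if feature_of(n) == f) for f in rest]
        for combo in product(*pools):
            # Clique check: every pair of non-reference instances must be neighbours.
            if all(b in nt.get(a, set()) for a, b in combinations(combo, 2)):
                rows.append((ref,) + combo)
    return rows


# Made-up neighbour relation: B.2 is a neighbour of A.1 but not of C.1.
nt = {"A.1": {"B.1", "C.1", "B.2"}, "B.1": {"A.1", "C.1"},
      "C.1": {"A.1", "B.1"}, "B.2": {"A.1"}}
rows_abc = clique_instances(("A", "B", "C"), nt)
```

The star row (A.1, B.2, C.1) is discarded because B.2 and C.1 are not neighbors.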
2. The lossless compression method for frequent co-location patterns according to claim 1, characterized in that the detailed process of preprocessing the input data in Step 1 is as follows. Process the input data set with a given neighbor threshold to obtain all neighboring instance pairs. Group the neighboring instance pairs to generate the neighbor transaction set NT, and then generate the feature neighbor transaction set ENT from NT. Let S denote the set of spatial feature instances. For a spatial feature instance $f.i \in S$, its neighbor transaction $NT(f.i)$ is the set containing f.i and all other spatial feature instances that have a neighbor relationship with f.i, i.e., $NT(f.i) = \{f.i\} \cup \{g.j \in S \mid NR(f.i, g.j) = \mathrm{true},\ f \neq g\}$. Here NR denotes the neighbor relationship between spatial instances, g.j denotes the j-th instance of feature g, and f.i is called the reference instance. The collection of the neighbor transactions of all instances is called the neighbor transaction set of the spatial data, denoted NT. The lexicographically ordered sets of the distinct spatial features in the transactions of NT are called the feature neighbor transaction set ENT.
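Step 1's preprocessing can be sketched in a few lines. The sketch makes several assumptions not stated in the patent: instances are given as hypothetical `(feature, number, x, y)` tuples, NR is Euclidean distance <= d between instances of different features, and neighbor pairs are found by a naive O(n²) scan rather than any particular spatial index. NT keeps the reference instance in its own transaction, as in the definition. ENT is the lexicographically sorted tuple of distinct features in each transaction.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical input layout: (feature, instance number, x, y).
Instance = Tuple[str, int, float, float]


def label(inst: Instance) -> str:
    """Render an instance as 'feature.number'; feature names must not contain '.'."""
    return f"{inst[0]}.{inst[1]}"


def build_nt_ent(instances: List[Instance], d: float
                 ) -> Tuple[Dict[str, Set[str]], Dict[str, Tuple[str, ...]]]:
    # NT(f.i) starts with the reference instance f.i itself.
    nt: Dict[str, Set[str]] = {label(i): {label(i)} for i in instances}
    # Naive pairwise scan: NR(f.i, g.j) holds when f != g and distance <= d.
    for a, b in combinations(instances, 2):
        if a[0] != b[0] and math.dist(a[2:], b[2:]) <= d:
            nt[label(a)].add(label(b))
            nt[label(b)].add(label(a))
    # ENT: lexicographically ordered distinct features of each neighbour transaction.
    ent = {ref: tuple(sorted({m.split(".")[0] for m in members}))
           for ref, members in nt.items()}
    return nt, ent


# Usage with made-up coordinates and d = 5.
instances = [("A", 1, 0.0, 0.0), ("B", 1, 3.0, 4.0), ("C", 1, 100.0, 0.0), ("B", 2, 1.0, 0.0)]
nt, ent = build_nt_ent(instances, 5.0)
```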
3. The lossless compression method for frequent co-location patterns according to claim 1, characterized in that the detailed process of storing the feature neighbor transaction set ENT with a lexicographic prefix tree structure in Step 2 is as follows:
Step 1. Define the lexicographic prefix tree. The feature type of the reference instance is the root node, and the neighbor features in the feature neighbor transactions are child nodes. Each child node consists of three parts: a feature type, a count value, and a node link. The feature type identifies the node. The count value is the number of paths in the whole feature neighbor transaction set that reach this feature type from the feature type of the reference instance. The node link connects to the nodes in this tree that have the same feature type;
Step 2. Because all child nodes in the lexicographic prefix tree have a neighbor relationship with the root node, star SPI-closed candidate co-location patterns are generated. The lexicographic prefix tree also yields the upper bound of the participation ratio of each star SPI-closed co-location pattern. If, within the same tree, the upper-bound participation ratio of a candidate equals that of its super-candidate, the star candidate is marked red. If the upper-bound participation ratio of a candidate is less than the threshold M, the candidate is deleted;
Step 3, combine k related star SPI-closed candidate co-location patterns to generate a size-k clique SPI-closed candidate co-location pattern. The minimum upper-bound participation ratio among these k star candidates is the upper-bound participation index of the size-k clique SPI-closed candidate pattern.
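The prefix-tree steps of claim 3 can be illustrated with a small trie. This is a sketch under assumptions, not the patented structure. It builds one tree per reference feature, keeps a count per node, and records same-feature nodes in a `links` list as a stand-in for node links. It counts the transactions that contain a feature set by following those links, in the style of FP-tree support counting. The star upper bound for root feature f is taken as (reference instances of f whose transaction contains every other feature of the candidate) / (all instances of f). This is a valid upper bound because any f instance in a clique instance must have neighbors of every other feature. The clique UPI is the minimum over the k star bounds, as Step 3 states. The red-marking rule is not modelled, and all class and function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class Node:
    """Prefix-tree node: feature type, count value, parent pointer for path walks."""
    def __init__(self, feature: str, parent: Optional["Node"]) -> None:
        self.feature = feature
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "Node"] = {}


class LexPrefixTree:
    """One tree per reference feature; `links` groups nodes by feature type."""
    def __init__(self, root_feature: str) -> None:
        self.root = Node(root_feature, None)
        self.links: Dict[str, List[Node]] = {}
        self.n_transactions = 0  # number of reference instances of the root feature

    def insert(self, features: Sequence[str]) -> None:
        """Add one feature transaction; the root feature is ignored if present."""
        self.n_transactions += 1
        node = self.root
        for f in sorted(set(features) - {self.root.feature}):
            child = node.children.get(f)
            if child is None:
                child = Node(f, node)
                node.children[f] = child
                self.links.setdefault(f, []).append(child)
            child.count += 1
            node = child

    def support(self, others: Sequence[str]) -> int:
        """Number of transactions containing every feature in `others`."""
        wanted = sorted(set(others))
        if not wanted:
            return self.n_transactions
        prefix, total = set(wanted[:-1]), 0
        # Each matching transaction passes exactly one node of its largest wanted feature.
        for node in self.links.get(wanted[-1], []):
            path, p = set(), node.parent
            while p is not None and p is not self.root:
                path.add(p.feature)
                p = p.parent
            if prefix <= path:
                total += node.count
        return total


def build_trees(ent: Sequence[Tuple[str, Sequence[str]]]) -> Dict[str, LexPrefixTree]:
    """Group (reference feature, feature transaction) pairs into one tree per feature."""
    trees: Dict[str, LexPrefixTree] = {}
    for ref_feature, features in ent:
        if ref_feature not in trees:
            trees[ref_feature] = LexPrefixTree(ref_feature)
        trees[ref_feature].insert(features)
    return trees


def star_upr(tree: LexPrefixTree, others: Sequence[str]) -> float:
    """Upper-bound participation ratio of the root feature in star {root} + others."""
    return tree.support(others) / tree.n_transactions


def clique_upi(trees: Dict[str, LexPrefixTree], pattern: Sequence[str]) -> float:
    """Size-k clique UPI: minimum of the k star upper bounds, one rooted at each feature."""
    return min(star_upr(trees[f], [g for g in pattern if g != f]) for f in pattern)


# Made-up ENT: (reference feature, neighbour features) for instances A.1, A.2, B.1, B.2, C.1.
ent = [("A", ("B", "C")), ("A", ("B",)), ("B", ("A", "C")), ("B", ("A",)), ("C", ("A", "B"))]
trees = build_trees(ent)
```

Under these assumptions, a clique candidate whose `clique_upi` falls below M could be dropped before any table instances are generated.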
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710430303.0A CN107291854B (en) | 2017-06-09 | 2017-06-09 | The lossless compression method of frequent co-location patterns |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710430303.0A CN107291854B (en) | 2017-06-09 | 2017-06-09 | The lossless compression method of frequent co-location patterns |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107291854A | 2017-10-24 |
CN107291854B | 2018-10-19 |
Family
ID=60096809
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710430303.0A Active CN107291854B (en) | 2017-06-09 | 2017-06-09 | The lossless compression method of frequent co-location patterns |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107291854B (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9092454B2 (en) * | 2008-04-22 | 2015-07-28 | Microsoft Technology Licensing, Llc | Discovering co-located queries in geographic search logs |
US8326834B2 (en) * | 2008-06-25 | 2012-12-04 | Microsoft Corporation | Density-based co-location pattern discovery |
2017-06-09 CN CN201710430303.0A (CN107291854B), Active
Also Published As
Publication number | Publication date |
---|---|
CN107291854A (en) | 2017-10-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111639237A (en) | Electric power communication network risk assessment system based on clustering and association rule mining | |
Peng et al. | A two-stage deanonymization attack against anonymized social networks | |
CN110347881A (en) | A kind of group's discovery method for recalling figure insertion based on path | |
CN110232078B (en) | Enterprise group relationship acquisition method and system | |
CN106452825B (en) | A kind of adapted telecommunication net alarm correlation analysis method based on improvement decision tree | |
CN110334391A (en) | A kind of various dimensions constraint wind power plant collection electric line automatic planning | |
CN106599230A (en) | Method and system for evaluating distributed data mining model | |
CN106202430A (en) | Live platform user interest-degree digging system based on correlation rule and method for digging | |
CN105183796A (en) | Distributed link prediction method based on clustering | |
CN105976048A (en) | Power transmission network extension planning method based on improved artificial bee colony algorithm | |
CN111651613B (en) | Knowledge graph embedding-based dynamic recommendation method and system | |
CN109376544A (en) | A method of prevent the community structure in complex network from being excavated by depth | |
John et al. | Energy saving cluster head selection in wireless sensor networks for internet of things applications | |
CN104700311B (en) | A kind of neighborhood in community network follows community discovery method | |
CN107291854B (en) | The lossless compression method of frequent co-location patterns | |
Bao et al. | Mining non-redundant co-location patterns | |
Wang et al. | Spatial Co-location Pattern Mining Based on Fuzzy Neighbor Relationship. | |
CN103164487A (en) | Clustering algorithm based on density and geometrical information | |
Janeja et al. | Random walks to identify anomalous free-form spatial scan windows | |
Liu et al. | Wl-align: Weisfeiler-lehman relabeling for aligning users across networks via regularized representation learning | |
CN107944015A (en) | Threedimensional model typical structure based on simulated annealing excavates and method for evaluating similarity | |
CN109033746A (en) | A kind of protein complex recognizing method based on knot vector | |
CN114911849A (en) | Mobile network traffic pattern mining method based on complex network theory | |
Arab et al. | A modularity maximization algorithm for community detection in social networks with low time complexity | |
CN107147520A (en) | A kind of terroristic organization's Web Mining algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20200609 Address after: Room b214, zone a, 968, Xiujiang East Road, Yiyang New District, Yuanzhou District, Yichun City, Jiangxi Province Patentee after: Jiangxi zhengrudder Network Technology Co., Ltd Address before: 650091 Yunnan Province, Kunming city Wuhua District Lake Road No. 2 Patentee before: YUNNAN University |