CN105808754A - Method for rapidly discovering accumulation mode from movement trajectory data - Google Patents

Info

Publication number
CN105808754A
Authority
CN
China
Prior art keywords
group
bunch
guan
participant
snapshot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610144268.1A
Other languages
Chinese (zh)
Inventor
郑凯
贾梦迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201610144268.1A
Publication of CN105808754A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method for rapidly discovering gathering patterns (accumulation modes) from movement trajectory data. It introduces the concept of a gathering and proposes a series of algorithms for discovering gatherings in trajectories efficiently and updating them promptly: a closed-crowd discovery algorithm, an R-tree index, a grid index, a test-and-divide (TAD) algorithm, a TAD algorithm based on bit vector signatures, and a growth algorithm. The method ensures the precision and accuracy of gathering discovery while greatly improving data mining efficiency.

Description

A method for rapidly discovering gathering patterns from movement trajectory data
Technical field
The present invention relates to the fields of databases, data analysis, data mining, trajectory data analysis and trajectory data mining, and in particular to a method for rapidly discovering gathering patterns from movement trajectory data.
Background art
The growing availability of location-acquisition technology makes it possible to collect large numbers of trajectories of almost any moving object. Discovering useful patterns in the behavior of these objects can provide valuable information to a variety of critical applications. To this end, a new concept called a gathering is proposed: a trajectory pattern that models various group events such as celebrations, parades, protests and traffic jams. Discovering gathering patterns from trajectories poses two challenges:
(1) Defining a suitable model:
First, earlier work identifies dense regions by overlaying a fixed grid on the geographical space, which may not match the true shape of a gathering. Although the problem can be mitigated to some extent by choosing a grid with a more suitable cell size, the resulting complexity grows exponentially, which makes this solution computationally infeasible.
Second, and more fundamentally, the only criterion previously used to identify a dense region is whether the number of individuals in it exceeds a given threshold, regardless of whether the individuals in the region share common behavior.
Previously proposed concepts include flock, convoy and swarm. A flock is bounded by a circle, which may not reflect the shape of a real group and can lead to the so-called lossy-flock problem. Flock and convoy both impose strict requirements on the continuity of the time period. In addition, all three concepts require a group to contain the same set of individuals throughout its lifetime. In a real group event such as a business promotion, members inevitably join and leave frequently, so this requirement is unrealistic.
Another concept, the moving cluster, requires any two clusters at consecutive timestamps to share a sufficient number of common individuals, which is still hard to satisfy in real group events. Moreover, two consecutive clusters of a moving cluster may be far apart, whereas a gathering usually occurs in a fairly stable region.
(2) Designing efficient discovery algorithms:
In earlier work, the flock discovery algorithm can only find groups within a fixed circular region. The moving cluster algorithm repeatedly appends a cluster at the next timestamp to the current cluster as long as the two share enough common individuals. The CuTS algorithm first clusters simplified trajectories to obtain convoy candidates and then applies the moving cluster algorithm to obtain the correct result.
None of these algorithms suits our problem, because we do not require any two consecutive clusters to share common objects.
Furthermore, the object growth algorithm attempts to enumerate all subsets of the object set and checks whether each is a swarm. To keep the computational complexity manageable, it uses Apriori pruning, backward pruning and forward closure checking to reduce the search space effectively.
However, we cannot use these techniques, because the gathering pattern does not have the downward closure property.
Summary of the invention
The main technical problem solved by the present invention is to provide a method for rapidly discovering gathering patterns from movement trajectory data. The method is highly reliable and convenient to search with, and has broad market prospects in application and popularization.
To solve the above technical problem, the present invention adopts the following technical solution:
A method for rapidly discovering gathering patterns from movement trajectory data is provided, comprising the following steps:
(1) Snapshot clustering stage:
Definition of a gathering: a crowd Cr is called a gathering if and only if every snapshot cluster of Cr contains at least m_p participators. A gathering is called closed if it has no super-crowd that is also a gathering. Here a snapshot cluster is a group of objects of arbitrary shape and size, Cr denotes a crowd, o is the trajectory of a moving object, t is a time point in the time domain of the database, o(t) is the position of the moving object o at time t, and Par(Cr) is the set of participators of the crowd Cr;
Definition 2: given the set of trajectories of moving objects, a support threshold, a variation threshold and a lifetime threshold k_c, a crowd Cr is a sequence of snapshot clusters at consecutive timestamps that satisfies the following requirements: the lifetime of Cr is no less than the lifetime threshold; at any time Cr contains at least the number of objects given by the support threshold; and the distance between any two consecutive snapshot clusters is no greater than the variation threshold.
Because a snapshot cluster is essentially a set of points, the distance between snapshot clusters is measured with the Hausdorff distance. For two point sets P and Q, the Hausdorff distance d_H(P, Q) is defined as:
d_H(P, Q) = max{ max_{p in P} min_{q in Q} d(p, q), max_{q in Q} min_{p in P} d(p, q) }, where d(p, q) is the distance between points p and q.
At each time point of the trajectory database, the moving objects are clustered with a density-based method to find all snapshot clusters. The original trajectories are first simplified with a curve compression algorithm, and the resulting line segments are then clustered; the objects contained in each cluster of line segments may form snapshot clusters at some time points. The output of this stage is the database of snapshot clusters.
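The distance test in Definition 2 can be illustrated with a short sketch. The Python code below is an illustration under stated assumptions, not the patented implementation: a snapshot cluster is taken to be a list of (x, y) points, point distance is assumed to be Euclidean, and the names hausdorff, is_crowd, m_c (support threshold), k_c (lifetime threshold) and delta (variation threshold) are chosen here for readability.

```python
from math import dist

def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(A, B):
        # Largest distance from a point of A to its nearest neighbour in B.
        return max(min(dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))

def is_crowd(clusters, m_c, k_c, delta):
    """Check the three requirements of Definition 2.

    clusters: snapshot clusters at consecutive timestamps,
              each a list of (x, y) points (an assumption of this sketch).
    """
    if len(clusters) < k_c:                      # lifetime requirement
        return False
    if any(len(c) < m_c for c in clusters):      # support requirement
        return False
    # Variation requirement: consecutive clusters stay within delta.
    return all(hausdorff(a, b) <= delta
               for a, b in zip(clusters, clusters[1:]))
```

The naive Hausdorff computation compares every pair of points, which is why the later stages try to avoid calling it whenever a cheaper bound already settles the question.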
(2) Crowd discovery stage, i.e. finding all closed crowds in the snapshot cluster database:
(2.1) Lemma 1: let Cr be a crowd whose first snapshot cluster is at time t_i and whose last snapshot cluster is at time t_j. If there is no snapshot cluster at the next timestamp that can be appended to Cr to produce a new crowd, then Cr is a closed crowd; otherwise Cr is not closed;
(2.2) Using the basic RangeSearch method, the R-tree index method or the grid index method, closed crowds are found by appending snapshot clusters at the next time point to the set V of current crowd candidates. A crowd candidate is a sequence of clusters that may still grow into a crowd;
RangeSearch method: at the current timestamp, RangeSearch() searches the set of clusters for those whose Hausdorff distance from the last cluster of a crowd candidate is no greater than the variation threshold. It is implemented by computing the Hausdorff distance for every pair formed by a current crowd candidate and a cluster at the current time point, and selecting all clusters whose distance is no greater than the threshold;
(3) Gathering detection stage:
Using the test-and-divide algorithm TAD or the bit vector signature test-and-divide algorithm, determine whether each closed crowd obtained in the previous step is a closed gathering or contains closed gatherings.
In a preferred embodiment of the present invention, finding the closed crowds with the RangeSearch method, by appending snapshot clusters at the next time point to the set V of current crowd candidates, comprises the following steps:
Obtain the database of snapshot clusters, and preset the support threshold of a crowd, the lifetime threshold of a crowd and the variation threshold used in the definition of a crowd;
At each timestamp, check the last cluster of each crowd candidate Cr to determine whether Cr can be extended by appending one more cluster: obtain the current timestamp, compute the Hausdorff distance from the last cluster of Cr to each cluster in the set of clusters at the current timestamp, and search for the clusters whose Hausdorff distance is no greater than the variation threshold, where C denotes a set of clusters. If such a cluster is found, Cr can be extended, and the extended sequence is inserted into the candidate set V as a new candidate. If no such cluster is found, Cr cannot be extended: if the lifetime of Cr is no less than the lifetime threshold, Cr is a closed crowd according to Lemma 1, and if the lifetime of Cr is less than the threshold, Cr is not a crowd. At any timestamp, the clusters R that cannot be appended to any existing crowd candidate are taken as new crowd candidates.
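The candidate-extension loop described above can be sketched as follows. This is a minimal illustration, assuming that the snapshot cluster database is a list indexed by timestamp, that each cluster is a list of (x, y) points, and that clusters smaller than the support threshold are skipped up front; the names closed_crowds, range_search, m_c, k_c and delta are hypothetical. Candidates still alive when the data ends are reported if they are long enough, which is also an assumption of this sketch.

```python
from math import dist

def hausdorff(P, Q):
    d = lambda A, B: max(min(dist(a, b) for b in B) for a in A)
    return max(d(P, Q), d(Q, P))

def range_search(last, clusters, delta):
    """Brute force: clusters of the current timestamp within delta of `last`."""
    return [c for c in clusters if hausdorff(last, c) <= delta]

def closed_crowds(cluster_db, m_c, k_c, delta):
    """cluster_db[t] is the list of snapshot clusters at timestamp t."""
    closed, V = [], []              # V: current crowd candidates
    for clusters in cluster_db:
        big = [c for c in clusters if len(c) >= m_c]
        next_V, appended = [], set()
        for cand in V:
            hits = range_search(cand[-1], big, delta)
            for c in hits:
                next_V.append(cand + [c])   # extended candidate
                appended.add(id(c))
            if not hits and len(cand) >= k_c:
                closed.append(cand)         # Lemma 1: cannot grow further
        # Clusters appended to no candidate (R) start new candidates.
        next_V += [[c] for c in big if id(c) not in appended]
        V = next_V
    # Sketch assumption: flush candidates that survive to the last timestamp.
    closed += [cand for cand in V if len(cand) >= k_c]
    return closed
```

Every candidate is compared against every cluster of the current timestamp, so the cost of range_search dominates; the index-based variants replace only that function.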
In a preferred embodiment of the present invention, finding the closed crowds with the R-tree index method, by appending snapshot clusters at the next time point to the set V of current crowd candidates, comprises the following steps:
Obtain the database of snapshot clusters, and preset the support threshold of a crowd, the lifetime threshold of a crowd and the variation threshold used in the definition of a crowd;
Represent each cluster by its minimum bounding rectangle (MBR) and consider the minimum distance between two rectangles. Lemma 2: for two given clusters, where c denotes a cluster in C, the Hausdorff distance between the clusters is no less than the minimum distance between their MBRs;
Obtain the last cluster of each crowd candidate Cr, which serves as the query cluster, and compare it with each cluster in the set of clusters at the current timestamp by Hausdorff distance;
During the search, first retrieve from the set of clusters the candidate clusters whose MBR lies within the variation threshold, in minimum distance, of the MBR of the query cluster, then refine these candidates to find all clusters that satisfy the distance requirement. For this purpose the MBRs of the clusters in C are indexed with an R-tree and a query window is built on the R-tree: the window is the MBR of the query cluster expanded by the variation threshold. Clusters contained in nodes that do not overlap the window are not candidates;
Lemma 3: consider the a-th side of a rectangle M, a = 1, 2, 3, 4. A distance function defined over the sides of the MBRs of two clusters gives a lower bound on their Hausdorff distance, and it is used to obtain the set of snapshot clusters whose distance from Cr is within the variation threshold;
Retrieve the candidates in the R-tree with the Lemma 3 distance function, then refine them to obtain the set of clusters satisfying Lemma 3. The MBRs of the clusters in C are indexed with an R-tree and a query window is built on it: each side of the MBR of the query cluster is expanded by the variation threshold so that four rectangles are obtained, one for each side a = 1, 2, 3, 4. During R-tree traversal, a node is examined further only if it intersects all four rectangles;
Check the last cluster of each crowd candidate to see whether the candidate can be extended by appending one more cluster. If it can, the extended candidate is inserted into the candidate set V as a new candidate. If it cannot be extended and the lifetime of Cr is no less than the lifetime threshold, Cr is a closed crowd according to Lemma 1; if it cannot be extended and the lifetime of Cr is less than the threshold, Cr is not a crowd. At any timestamp, the clusters R that cannot be appended to any existing crowd candidate are taken as new crowd candidates.
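The filter-and-refine idea behind Lemma 2 can be sketched without an actual R-tree. The Python code below is a simplified illustration: it scans the MBRs linearly instead of traversing an R-tree, it covers only the Lemma 2 bound (not the side-based bound of Lemma 3), and the names mbr, d_min, range_search_mbr and delta are assumptions made for this sketch.

```python
from math import dist, hypot

def mbr(points):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def d_min(r1, r2):
    """Minimum distance between two axis-aligned rectangles (0 if they touch)."""
    dx = max(r1[0] - r2[2], r2[0] - r1[2], 0.0)
    dy = max(r1[1] - r2[3], r2[1] - r1[3], 0.0)
    return hypot(dx, dy)

def hausdorff(P, Q):
    d = lambda A, B: max(min(dist(a, b) for b in B) for a in A)
    return max(d(P, Q), d(Q, P))

def range_search_mbr(last, clusters, delta):
    q = mbr(last)
    # Filter: the MBR distance is a lower bound on the Hausdorff distance,
    # so clusters whose MBR is farther than delta can be discarded safely.
    survivors = [c for c in clusters if d_min(q, mbr(c)) <= delta]
    # Refine: only the survivors pay for the exact distance.
    return [c for c in survivors if hausdorff(last, c) <= delta]
```

Testing d_min(q, mbr(c)) <= delta is equivalent to asking whether the MBR of c overlaps the query MBR expanded by delta, which is the window query an R-tree would answer without scanning every cluster.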
In a preferred embodiment of the present invention, finding the closed crowds with the grid index method, by appending snapshot clusters at the next time point to the set V of current crowd candidates, comprises the following steps:
Obtain the database of snapshot clusters, and preset the support threshold of a crowd, the lifetime threshold of a crowd and the variation threshold used in the definition of a crowd;
Definition of the affect region: for the cell in row a and column b of a grid G, its affect region is the set of cells whose minimum distance from that cell is no greater than the variation threshold;
First, the whole space is partitioned by a grid G into cells g, each of which is a square of fixed side length. For each timestamp t, a grid index is built in a single scan of the set of clusters using two data structures: a cell list for each cluster, which records the cells occupied by the cluster, and an inverted list for each cell, which stores the clusters covering that cell;
The affect region of a cell g contains every point whose distance to a point in g is no greater than the variation threshold. The last cluster of a crowd candidate Cr is given as the query cluster, together with the grid index of the next timestamp;
Pruning stage: for each cell g in the cell list of the query cluster, find the clusters whose cell lists intersect the affect region of g. Only clusters that cover the affect region of every cell of the query cluster can be candidates; otherwise the query cluster contains at least one point that is farther than the variation threshold from the cluster;
Refinement stage: the distance between any two points in the same cell never exceeds the variation threshold, so the cells shared by the query cluster and a candidate cluster need not be checked and only the cells in which the two differ have to be examined. To search the set of clusters for those whose Hausdorff distance from the query cluster is no greater than the threshold, first intersect the cell lists of the query cluster and the candidate cluster to obtain their common cells; then, for each point p outside the common cells, compute the minimum distance from p to the other cluster, considering only the points that fall inside the affect region of the cell containing p;
If a cluster whose Hausdorff distance from the query cluster is no greater than the variation threshold is found, Cr can be extended, and the extended sequence is inserted into the candidate set V as a new candidate. If no such cluster is found, Cr cannot be extended, and if the lifetime of Cr is no less than the lifetime threshold, Cr is a closed crowd according to Lemma 1.
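The grid index and its pruning stage can be sketched as below. This is an illustrative reading, not the patent's exact procedure: the cell side length is passed in as a parameter (the text implies that two points in one cell are never farther apart than the variation threshold, which for a square cell means a side of at most delta divided by the square root of 2, but the exact value is an assumption here), clusters are lists of (x, y) points, and the names cell_of, build_grid_index, affect_region and prune are hypothetical. Only the pruning stage is shown; the refinement stage is omitted.

```python
from math import floor, ceil, hypot

def cell_of(p, side):
    """Grid cell (column, row) containing point p."""
    return (floor(p[0] / side), floor(p[1] / side))

def build_grid_index(clusters, side):
    """One scan over the clusters builds both structures of the index."""
    cell_lists, inverted = [], {}
    for idx, c in enumerate(clusters):
        cells = {cell_of(p, side) for p in c}      # cell list of cluster idx
        cell_lists.append(cells)
        for g in cells:
            inverted.setdefault(g, set()).add(idx)  # inverted list of cell g
    return cell_lists, inverted

def affect_region(g, side, delta):
    """Cells whose minimum distance to cell g is at most delta."""
    r = ceil(delta / side) + 1
    region = set()
    for i in range(-r, r + 1):
        for j in range(-r, r + 1):
            gap = side * hypot(max(abs(i) - 1, 0), max(abs(j) - 1, 0))
            if gap <= delta * (1 + 1e-9):   # small tolerance, errs on keeping cells
                region.add((g[0] + i, g[1] + j))
    return region

def prune(query, inverted, side, delta):
    """Indices of clusters touching the affect region of every query cell."""
    survivors = None
    for g in {cell_of(p, side) for p in query}:
        near = set()
        for h in affect_region(g, side, delta):
            near |= inverted.get(h, set())
        survivors = near if survivors is None else survivors & near
    return survivors or set()
```

The pruning is conservative: a cluster within the variation threshold of the query must have a point near every query point, so it always survives, while clusters that miss the affect region of even one query cell are dropped without any distance computation.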
In a preferred embodiment of the present invention, determining with the test-and-divide algorithm whether each closed crowd obtained in the previous step is a closed gathering or contains closed gatherings comprises the following steps:
Definition 3: given a crowd Cr, an object o is called a participator if and only if it appears in at least a threshold number of snapshot clusters of Cr. Considering, for each object o, the set of snapshot clusters of Cr that contain o, the participators of Cr, Par(Cr), are the objects for which this set reaches the threshold size;
The test-and-divide algorithm starts by testing the closed crowd as a whole. According to Definition 3, it determines for each object in the crowd Cr whether it is a participator, then counts the participators in each cluster of the crowd, and tests against the definition of a gathering whether the closed crowd is a gathering. If it is not, the invalid clusters, i.e. those without enough participators, are picked out and removed, which divides the crowd into several subsequences. Each subsequence that is still a crowd is tested again against the definition of a gathering, and the procedure repeats until no further crowds are found.
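The test-and-divide recursion above can be sketched in a few lines. The code is a minimal illustration under these assumptions: a crowd is a list of snapshot clusters, each represented as a set of object identifiers; k_p is a hypothetical name for the participator threshold of Definition 3; m_p and k_c are the gathering and lifetime thresholds; a subsequence counts as "still a crowd" here if it keeps at least k_c clusters.

```python
from collections import Counter

def participators(crowd, k_p):
    """Objects that appear in at least k_p snapshot clusters of the crowd."""
    counts = Counter(o for cluster in crowd for o in cluster)
    return {o for o, n in counts.items() if n >= k_p}

def tad(crowd, k_c, k_p, m_p):
    """Test-and-divide sketch: return the gatherings found inside `crowd`."""
    if len(crowd) < k_c:
        return []
    par = participators(crowd, k_p)
    # Test: a cluster is valid if it holds at least m_p participators.
    valid = [len(c & par) >= m_p for c in crowd]
    if all(valid):
        return [crowd]
    # Divide: drop invalid clusters and recurse on each remaining run.
    result, run = [], []
    for cluster, ok in zip(crowd + [set()], valid + [False]):
        if ok:
            run.append(cluster)
        else:
            if len(run) >= k_c:
                result += tad(run, k_c, k_p, m_p)
            run = []
    return result
```

Each recursive call works on strictly fewer clusters than its caller, because at least one invalid cluster has been removed, so the recursion terminates. Participators are recomputed per subsequence, since an object may lose participator status once clusters are cut away.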
In a preferred embodiment of the present invention, determining with the bit vector signature test-and-divide algorithm whether each closed crowd obtained in the previous step is a closed gathering or contains closed gatherings comprises the following steps:
a) For the trajectory of each moving object o in the crowd Cr, construct a bit vector signature (BVS). Each BVS is a bit vector of length n, and each bit indicates whether o is present in the corresponding cluster. The BVSs of all objects in Cr can be constructed in a single scan of the crowd, and a BVS needs to be constructed only once to be used in all recursive steps of TAD;
b) Test step: given the BVS of an object o, testing whether o is a participator of the crowd Cr amounts to counting the 1 bits in its BVS, i.e. computing the Hamming weight of the bit vector. The Hamming weight is obtained either by traversing all bits or by counting in a binary tree pattern. With the binary tree pattern, the number of 1s in every 2 bits is obtained first, then in every 4 bits, and so on until, at the m-th step with 2^m = n, the number of 1s in all n bits is obtained; the Hamming weight of any n-bit vector can thus be computed in log2(n) steps. A mask m is set, which is a bit vector of the same length as the BVS. The Hamming weight is compared with the participator threshold, and the result is used to decide whether the closed crowd is a closed gathering or contains closed gatherings;
c) Divide step: if the crowd is not a gathering, it is divided into a series of subsequences, and the bit vector of each moving object is correspondingly divided into sub-vectors, which are extracted from the original BVS with masks. In a mask, the bit positions belonging to the sub-crowd are 1 and all other positions are 0. ANDing the original BVS with the mask yields a new BVS in which the bits of the wanted sub-crowd are preserved and all other bits are 0. The divide step therefore returns a series of masks, with which the test step is applied to the subsequences; that is, the test step obtains the BVS of an object for a sub-crowd directly through the mask.
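The bit vector signature, the binary-tree bit count and the mask-based division can be sketched with Python integers serving as bit vectors. The code is illustrative only: clusters are assumed to be sets of object identifiers, bit i of a signature corresponds to the i-th cluster of the crowd, n is assumed to be a power of two that is at least the crowd length, and the names build_bvs, hamming_weight, range_mask, participators and k_p are hypothetical.

```python
def build_bvs(crowd):
    """One scan of the crowd: object id -> int whose bit i marks cluster i."""
    bvs = {}
    for i, cluster in enumerate(crowd):
        for o in cluster:
            bvs[o] = bvs.get(o, 0) | (1 << i)
    return bvs

def hamming_weight(v, n=64):
    """Count the 1 bits of an n-bit vector in log2(n) steps (n a power of two)."""
    width = 1
    while width < n:
        # Mask that keeps the low `width` bits of every 2*width-bit block.
        block = (1 << width) - 1
        mask = 0
        for shift in range(0, n, 2 * width):
            mask |= block << shift
        # Add neighbouring fields pairwise: counts per 2 bits, 4 bits, ...
        v = (v & mask) + ((v >> width) & mask)
        width *= 2
    return v

def range_mask(lo, hi):
    """Mask with 1s at the positions lo..hi-1 of a sub-crowd, 0s elsewhere."""
    return ((1 << (hi - lo)) - 1) << lo

def participators(bvs, mask, k_p, n=64):
    """Participators of the sub-crowd selected by `mask`."""
    return {o for o, sig in bvs.items()
            if hamming_weight(sig & mask, n) >= k_p}
```

Because a sub-crowd is described by a mask alone, the signatures built once for the whole closed crowd can be reused at every level of the TAD recursion; no cluster has to be rescanned after a division.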
In a preferred embodiment of the present invention, the trajectories of moving objects are added to the database periodically: given the trajectory database for the current time domain, the trajectories newly collected for a new time period are appended to it, which yields an expanded time domain and the corresponding updated database.
In a preferred embodiment of the present invention, after the database is updated, discovering the newly grown closed gatherings comprises the following steps:
1) Extension. Lemma 4: given a closed crowd in the original database, if its last cluster is not at the latest time point of the original database, then the crowd cannot be extended in the updated database. Lemma 4 shows that only some of the crowds and crowd candidates of the original database are extendable;
Determine whether the cluster sequences in the set CS, i.e. those ending at the last timestamp of the original database, can be extended to new crowds; these cluster sequences comprise the closed crowds of the original database and the crowd candidates whose length is less than k;
After step (2) has found the closed crowds, save the crowd candidates and closed crowds that end at the last timestamp. Then, after the new set of trajectories has been received and converted into a database of clusters, reset the time cursor to the new data and change the set of current crowd candidates from V to CS;
2) Gathering update: suppose a crowd of the original database is extended to a new closed crowd in the updated database. To find the closed gatherings in the new crowd, the TAD algorithm or the bit vector TAD algorithm can be applied to it directly;
Lemma 5 concerns the set IC(Cr) of invalid clusters in a crowd Cr. Because new clusters have been appended to the extended crowd, some non-participators of the original crowd may become participators, so gatherings in the original crowd may expand or merge with nearby gatherings. If a cluster of the original crowd is still found to be invalid in the extended crowd, then all closed gatherings before its time t_j remain unchanged. This gives Theorem 2: given such an invalid cluster, any closed gathering that ends before it remains closed in the extended crowd;
Test step: when the bit vector TAD algorithm is used, first construct a BVS for each moving object of the extended crowd and identify the invalid clusters;
After the test stage has produced the set IC of invalid clusters, find the latest invalid cluster that lies before the first new timestamp, i.e. an invalid cluster of the original part after which no other invalid cluster of the original part exists. Only the sub-crowds after this cluster need further examination, because they contain the new or updated gatherings.
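The extension step of the incremental update can be sketched by letting the discovery loop start from saved candidates instead of an empty set. The code below is an illustration under assumptions: clusters are lists of (x, y) points, the saved set CS is simply every cluster sequence still alive at the last processed timestamp, and the names discover, m_c, k_c and delta are hypothetical.

```python
from math import dist

def hausdorff(P, Q):
    d = lambda A, B: max(min(dist(a, b) for b in B) for a in A)
    return max(d(P, Q), d(Q, P))

def discover(cluster_db, m_c, k_c, delta, V=None):
    """Process a batch of timestamps, starting from saved candidates V.

    Returns (crowds closed inside this batch, sequences alive at its end).
    The second value plays the role of CS for the next batch.
    """
    closed, V = [], list(V or [])
    for clusters in cluster_db:
        big = [c for c in clusters if len(c) >= m_c]
        next_V, used = [], set()
        for cand in V:
            hits = [c for c in big if hausdorff(cand[-1], c) <= delta]
            next_V += [cand + [c] for c in hits]
            used |= {id(c) for c in hits}
            if not hits and len(cand) >= k_c:
                closed.append(cand)      # ended before the batch did
        V = next_V + [[c] for c in big if id(c) not in used]
    return closed, V
```

A possible usage is to call discover on the original database, keep the returned sequences as CS, and later call discover on the new batch with V=CS. Crowds that ended before the last original timestamp are never touched again, which is the saving that Lemma 4 justifies compared with rebuilding from scratch.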
The beneficial effects of the present invention are as follows: the precision and accuracy of gathering discovery are ensured, and the efficiency of data mining is greatly improved.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort, wherein:
Fig. 1 shows the experimental results of the described method with one day divided into three time periods;
Fig. 2 shows the experimental results of the described method grouped by weather;
Fig. 3 to Fig. 8 are running-time charts of the described method, each plotted against one parameter;
Fig. 9 is a schematic comparison of the time cost of the crowd extension algorithm and the rebuild method with respect to database size;
Fig. 10 is a schematic comparison of the time cost of the crowd extension algorithm and the rebuild method.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are evidently only some of the embodiments of the present invention rather than all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Figs. 1-10, the embodiments of the present invention include:
(1) The concept of a gathering is proposed, which can model a variety of interesting social events.
Definition of a gathering: a crowd is called a gathering if and only if every snapshot cluster in it (a snapshot cluster is a group of objects of arbitrary shape and size) contains at least m_p participators. A gathering is called closed if it has no super-crowd that is also a gathering.
A gathering has the following properties:
Scale: a gathering usually involves a fairly large number of individuals.
Density: these individuals form a dense group.
Durability: a gathering lasts for a certain period of time without interruption.
Stationariness: the geometric properties of the group (such as shape and position) are relatively stable.
Commitment: at any time during a gathering, several dedicated members stay in the group for a period of time (possibly non-consecutive).
Definition 1: given the set of trajectories of moving objects, a distance threshold and an integer m, a snapshot cluster at timestamp t is a non-empty subset of the object positions at time t that satisfies the following conditions:
1) any two of its members are density-connected with respect to the distance threshold and m;
2) it is maximal: if one object belongs to it and another object is density-reachable from the first with respect to the distance threshold and m, then the other object also belongs to it;
A snapshot cluster is thus a group of objects of arbitrary shape and size that are density-connected to each other at a given timestamp. Following the concept of DBSCAN, such snapshot clusters are maximal, so no two clusters with the same timestamp overlap. In what follows, snapshot cluster is abbreviated to cluster and the parameter m is omitted.
Definition 2: given the set of trajectories of moving objects, a support threshold, a variation threshold and a lifetime threshold k_c, a crowd is a sequence of snapshot clusters at consecutive timestamps that satisfies the following requirements:
1) the lifetime of the crowd is no less than the lifetime threshold;
2) at any time the crowd contains at least the number of objects given by the support threshold;
3) the distance between any two consecutive snapshot clusters is no greater than the variation threshold.
In addition, a subsequence of a crowd is called a sub-crowd of that crowd, and the crowd is a super-crowd of the subsequence. A crowd that has no super-crowd is called closed;
Because a snapshot cluster is essentially a set of points, the distance between snapshot clusters is measured with the Hausdorff distance. For two point sets P and Q, the Hausdorff distance d_H(P, Q) is defined as:
d_H(P, Q) = max{ max_{p in P} min_{q in Q} d(p, q), max_{q in Q} min_{p in P} d(p, q) }, where d(p, q) is the distance between points p and q.
Definition 3: given a crowd, an object is called a participator if and only if it appears in at least a threshold number of snapshot clusters of the crowd. Considering, for each object, the set of snapshot clusters of the crowd that contain it, the participators of the crowd are the objects for which this set reaches the threshold size.
(2) Several efficient algorithms are proposed:
A closed-crowd discovery algorithm (Algorithm 1), an R-tree index, a grid index, a test-and-divide algorithm (Test-and-Divide Algorithm, TAD), a TAD algorithm based on bit vector signatures (TAD*) and a growth algorithm are proposed to discover gatherings from trajectories efficiently and update them promptly.
After a large number of moving-object trajectories have been obtained, analysis of the data reveals interesting information, for which a new model, the gathering, is defined to model various group events. First the concept of a crowd is defined, which satisfies the first four properties of a gathering listed above; a gathering is then a special crowd that also satisfies the fifth property.
With these definitions, closed gatherings can be discovered from large amounts of trajectory data according to the concept and properties of a gathering. For this purpose we propose several new algorithms that speed up the discovery process, which is divided into the following three stages:
Snapshot clustering stage:
At each time point of the trajectory database, the objects are clustered with a density-based method to find all snapshot clusters. To reduce cost, the original trajectories are first simplified with the Douglas-Peucker algorithm and the resulting line segments are then clustered. The objects contained in each cluster of line segments may form snapshot clusters at some time points, and finding snapshot clusters within such an object set is more efficient than searching the whole object set directly. The output of this process is the database of snapshot clusters.
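The trajectory simplification named above can be sketched as follows. This is a generic Douglas-Peucker illustration rather than the patent's own code: a trajectory is assumed to be a list of (x, y) points, the deviation is measured to the chord segment rather than to the infinite line (a common variant), and the names douglas_peucker, point_segment_distance and tolerance are chosen for this sketch.

```python
from math import hypot

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Keep only the points needed to stay within `tolerance` of the input."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, far = max(
        ((i, point_segment_distance(points[i], a, b))
         for i in range(1, len(points) - 1)),
        key=lambda item: item[1],
    )
    if far <= tolerance:
        return [a, b]                      # the chord is a good enough fit
    left = douglas_peucker(points[:idx + 1], tolerance)
    right = douglas_peucker(points[idx:], tolerance)
    return left[:-1] + right               # do not repeat the split point
```

The simplified trajectory consists of far fewer line segments than the raw samples, which is what makes the subsequent segment clustering cheaper than clustering all object positions at every time point.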
Crowd discovery stage:
This stage aims to find all closed crowds in the snapshot cluster database.
Crowds clearly satisfy the downward closure property: any sufficiently long subsequence of a crowd is still a crowd, which makes it unnecessary to output all sub-crowds. More importantly, a gathering detected from a crowd that is not closed cannot be guaranteed to be closed. In this stage we therefore find only the closed crowds rather than all crowds. A straightforward way to find closed crowds would be to check every supersequence of a crowd to see whether the crowd is closed. In fact, according to the following lemma, it suffices to check whether one more snapshot cluster can be appended to the crowd.
Lemma 1: given a crowd, if there is no snapshot cluster at the next timestamp that can be appended to it to produce a new crowd, then the crowd is closed. Otherwise, it is not closed.
According to this lemma, closed crowds can be found by appending snapshot clusters at the next time point to the set of current crowd candidates (denoted V). This process is implemented by Algorithm 1, as follows:
Input:
; // the set of closed crowds
; // the set of current crowd candidates
At each timestamp, the last cluster of each crowd candidate is checked to see whether the candidate can be extended by appending one more cluster. If it can, the extended candidate is inserted into V as a new candidate. Otherwise, according to Lemma 1, the candidate is either a closed crowd (if its length is no less than k_c) or not a crowd at all. Note that at any timestamp, the clusters that cannot be appended to any existing crowd candidate (denoted R) are also treated as new candidates, because they may later grow into crowds.
The RangeSearch() procedure in Algorithm 1 is clearly time-consuming. At the current timestamp, RangeSearch() searches the set of clusters for those whose Hausdorff distance from the query cluster is no greater than the variation threshold. A naive implementation simply computes the Hausdorff distance to every cluster. A single naive Hausdorff computation already compares every pair of points of the two clusters, and the computation is needed for every pair of a current crowd candidate and a cluster at the current time point, which makes the cost prohibitive for large databases. To solve this problem, we devise spatial indexing techniques to organize the clusters and speed up the search.
R-tree indexing of clusters: we do not actually need the exact Hausdorff distance between two clusters; it is enough to know whether their distance is greater than or less than $\delta$. Let $M(c)$ denote the minimum bounding rectangle (MBR) of cluster $c$, and let $d_{min}(M(c_i), M(c_j))$ denote the minimum distance between two rectangles. Then the following Lemma 2 holds:
Lemma 2: Given two clusters $c_i$ and $c_j$, $d_H(c_i, c_j) \ge d_{min}(M(c_i), M(c_j))$.
Based on this lemma, we first retrieve from the cluster set the candidate clusters whose MBR is at minimum distance not greater than $\delta$ from the MBR of the query cluster, and then refine these candidates to obtain the final result. To make the candidate lookup more efficient, the MBRs of the clusters in C are indexed with an R-tree, and a window query is issued on the R-tree, where the window is the MBR of the query cluster enlarged by $\delta$. Clearly, the clusters contained in nodes that do not overlap the window cannot be candidates.
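The filter-and-refine idea behind Lemma 2 can be illustrated with the sketch below. It deliberately replaces the R-tree with a linear scan over the MBRs, so it shows only the pruning test, not the index; all function names and the tuple layout `(xmin, ymin, xmax, ymax)` are assumptions.

```python
from math import dist, hypot

def mbr(points):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def rect_min_dist(a, b):
    """Minimum distance between two axis-aligned rectangles (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)

def hausdorff(P, Q):
    def directed(A, B):
        return max(min(dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))

def range_search_mbr(query, clusters, delta):
    """Filter by the MBR lower bound, then refine with the exact distance."""
    q_box = mbr(query)
    # filter: d_H >= d_min, so a cluster with d_min > delta can never qualify
    survivors = [c for c in clusters if rect_min_dist(q_box, mbr(c)) <= delta]
    # refine: the lower bound admits false positives, so verify exactly
    return [c for c in survivors if hausdorff(query, c) <= delta]
```

The filter never discards a qualifying cluster, because the MBR distance is a lower bound on the Hausdorff distance; it only reduces the number of exact computations.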
However, $d_{min}$ is a rather loose lower bound of the Hausdorff distance. The next lemma provides a tighter lower bound.
Lemma 3: Let $M.l_a$ denote the a-th side (a=1,2,3,4) of rectangle M. Define the distance function $d_{side}$ as: $d_{side}(M_i, M_j) = \max_{a \in \{1,2,3,4\}} d_{min}(M_i.l_a, M_j)$. Then $d_H(c_i, c_j) \ge d_{side}(M(c_i), M(c_j))$.
To retrieve candidates from the R-tree with $d_{side}$, the window query described above needs a slight modification, as follows: each side of the MBR of the query cluster is first enlarged by $\delta$, which gives four rectangles, denoted $M_a$, a=1,2,3,4. During the traversal of the R-tree, a node is examined further only when it intersects all four rectangles.
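A small sketch of the side-based bound follows. It assumes, as the four-rectangle window query suggests, that the bound is the largest of the minimum distances from each side of one MBR to the other MBR; the names `sides` and `d_side` are illustrative and not taken from the source.

```python
from math import hypot

def rect_min_dist(a, b):
    """Minimum distance between rectangles given as (xmin, ymin, xmax, ymax)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)

def sides(box):
    """The four sides of a rectangle, each written as a degenerate rectangle."""
    xmin, ymin, xmax, ymax = box
    return [
        (xmin, ymin, xmax, ymin),  # bottom
        (xmin, ymax, xmax, ymax),  # top
        (xmin, ymin, xmin, ymax),  # left
        (xmax, ymin, xmax, ymax),  # right
    ]

def d_side(a, b):
    """Every side of an MBR touches a cluster point, so the farthest side bounds d_H."""
    return max(rect_min_dist(s, b) for s in sides(a))
```

For the boxes `(0, 0, 2, 2)` and `(3, 0, 5, 2)` the plain MBR distance is 1 while the side-based bound is 3, which shows why the second bound prunes more candidates.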
Grid indexing of clusters: although the R-tree index eliminates many unqualified nodes and improves the performance of the discovery process, it still has three shortcomings:
a) an R-tree has to be built or maintained at every time point, which may incur a high cost;
b) because density-based clusters have arbitrary shapes, a rectangular bounding box cannot always capture the distribution of the points in a cluster, which weakens the pruning effect;
c) the refinement step still needs a brute-force computation of the Hausdorff distance for the candidate clusters.
To solve these problems, we propose a grid-based index for clusters. As will be seen shortly, the grid index is easier to build, because the clusters at every timestamp can share the same grid. Pruning can be carried out more effectively at the granularity of grid cells, which follow the shape of a cluster more closely. In addition, the grid index allows a better refinement algorithm that can confirm whether a cluster is a candidate without computing the exact Hausdorff distance.
First, the whole space is partitioned by a grid in which each cell is a square with side length $\frac{\sqrt{2}}{2}\delta$, so that any two points in the same cell are at most $\delta$ apart. For each time point t, after a single scan of the cluster set, a grid index can be built from two data structures: the cell list of each cluster, which records the cells occupied by the cluster, and the inverted list of each cell, which stores the clusters that cover the cell. Before describing the algorithm, we first define the affect region of a cell.
Definition 1 (affect region): Given a cell $g_{a,b}$ located at row a and column b of a grid G, its affect region $AR(g_{a,b})$ is the set of cells whose minimum distance to $g_{a,b}$ is not greater than $\delta$. More precisely, $AR(g_{a,b}) = \{ g' \in G \mid d_{min}(g', g_{a,b}) \le \delta \}$.
Intuitively, the affect region of a cell g contains every cell that may hold a point whose distance to some point in g is not greater than $\delta$. Now, given a query cluster (that is, the last cluster of some crowd candidate) and the grid index of the next timestamp, the RangeSearch() procedure of Algorithm 1 works in a prune-and-refine manner, as described below:
In the pruning stage, we take each cell g from the cell list of the query cluster and find the clusters in the grid index whose cell lists intersect the affect region of g. Clearly, only the clusters that cover the affect region of every cell of the query cluster can become candidates, because otherwise the query cluster contains at least one point whose distance from the cluster is greater than $\delta$.
In the refinement stage, each candidate is verified to determine the final result. For a candidate cluster, we first intersect its cell list with the cell list of the query cluster to obtain their common cells. The underlying principle is that the distance between any two points in the same cell is not greater than $\delta$. In the extreme case where the two cell lists are identical, we can immediately conclude that the candidate qualifies. Therefore, only the cells that belong to one cell list but not to the other need to be checked. For a point p in such a cell of, without loss of generality, the query cluster, we compute its minimum distance to the candidate cluster. Note that only the distances between p and the points falling in the affect region need to be computed, because every other point is necessarily farther than $\delta$ from p.
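The prune-and-refine search described above can be sketched as follows. The sketch assumes a cell side of $\delta/\sqrt{2}$, so that the cell diagonal equals $\delta$, and it approximates the affect region conservatively by the 5x5 block of cells around a cell, which contains every cell within distance $\delta$. All names are illustrative.

```python
from collections import defaultdict
from math import dist, floor, sqrt

def cell_of(p, side):
    return (floor(p[0] / side), floor(p[1] / side))

def affect_region(g):
    """Conservative affect region: the 5x5 block of cells centred on g."""
    return {(g[0] + da, g[1] + db) for da in range(-2, 3) for db in range(-2, 3)}

def bucket(cluster, side):
    """Cell list of a cluster, mapping each occupied cell to its points."""
    cells = defaultdict(list)
    for p in cluster:
        cells[cell_of(p, side)].append(p)
    return cells

def covered(src, dst, delta):
    """True if every point of src has some point of dst within delta."""
    for g, pts in src.items():
        if g in dst:
            continue  # shared cell: two points in one cell are within delta
        nearby = [q for h in affect_region(g) if h in dst for q in dst[h]]
        for p in pts:
            if not any(dist(p, q) <= delta for q in nearby):
                return False
    return True

def range_search_grid(query, clusters, delta):
    side = delta / sqrt(2)
    q_cells = bucket(query, side)
    grid = [bucket(c, side) for c in clusters]
    inverted = defaultdict(set)  # inverted list: cell -> clusters covering it
    for idx, cells in enumerate(grid):
        for g in cells:
            inverted[g].add(idx)
    # prune: a candidate must cover the affect region of every query cell
    candidates = set(range(len(clusters)))
    for g in q_cells:
        near = set()
        for h in affect_region(g):
            near |= inverted.get(h, set())
        candidates &= near
    # refine: check both directions, touching only points in affect regions
    return [i for i in sorted(candidates)
            if covered(q_cells, grid[i], delta) and covered(grid[i], q_cells, delta)]
```

The function returns the indexes of the clusters within Hausdorff distance $\delta$ of the query; no exact Hausdorff distance is ever computed, only threshold checks against nearby points.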
(3) Gathering detection stage:
This stage confirms whether each closed crowd obtained in the previous step is, or contains, a closed gathering. The following algorithm is proposed for this stage:
Test-and-Divide algorithm (TAD): it efficiently detects all closed gatherings in a given crowd, as follows:
Algorithm 2 starts by testing whether the whole closed crowd is a gathering. If it is, then, as has been shown, it is a closed gathering and can be returned immediately as a result. Otherwise, we pick out the invalid clusters, that is, the clusters that do not contain enough participators, and remove them, which divides the crowd into several subsequences (a subsequence whose length is less than $k_c$ may no longer be a crowd). For each subsequence that is still a crowd, the above steps are repeated, because some objects may have become non-participators after the invalid clusters were removed. The process recurses until no further crowds can be found.
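The recursion described above can be sketched as follows. This is an illustrative sketch, not the patented Algorithm 2: a crowd is modelled as a list of sets of object identifiers, and the parameter names `k_p` (minimum number of clusters an object must appear in to be a participator) and `m_p` (minimum number of participators per cluster) are assumed names for the two gathering thresholds.

```python
def tad(crowd, k_c, k_p, m_p):
    """Return the closed gatherings (as lists of clusters) contained in a crowd."""
    if len(crowd) < k_c:
        return []  # too short to be a crowd any more
    # test: count in how many clusters each object appears
    counts = {}
    for cluster in crowd:
        for o in cluster:
            counts[o] = counts.get(o, 0) + 1
    participators = {o for o, n in counts.items() if n >= k_p}
    invalid = [i for i, cluster in enumerate(crowd)
               if len(cluster & participators) < m_p]
    if not invalid:
        return [crowd]  # every cluster has enough participators
    # divide: cut the crowd at the invalid clusters and recurse on the pieces
    results = []
    start = 0
    for i in invalid + [len(crowd)]:
        results.extend(tad(crowd[start:i], k_c, k_p, m_p))
        start = i + 1
    return results
```

For example, a five-cluster crowd whose middle cluster holds only one-off visitors is split into two gatherings of two clusters each, provided `k_c` is at most 2.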
Efficient implementation with bit vector signatures: a straightforward implementation of the TAD algorithm counts, for every object in the crowd, the number of clusters it appears in to decide whether it is a participator, and then checks the number of participators in every cluster of the crowd. Clearly, the time complexity of doing so is $O(mn)$, where m is the number of objects in Cr and n is the number of clusters in Cr. Even worse, these operations have to be repeated for every sub-crowd that is obtained.
To implement TAD more efficiently, we construct a bit vector signature (BVS) for each object in Cr, so that all subsequent steps can be carried out with faster bitwise operators. Specifically, given a crowd Cr consisting of n clusters, the BVS of each object o is a bit vector of length n, in which each bit indicates the presence or absence of o in the corresponding cluster. The BVSs of all objects can be constructed with a single scan of the crowd. More importantly, the BVSs need to be constructed only once and can then be used in all recursive calls of TAD.
Next, we explain how the test and divide procedures of Algorithm 2 are implemented and optimized with BVSs.
a) Test step. Let $B(o)$ denote the BVS of an object o. The test procedure essentially counts the number of 1 bits in $B(o)$, that is, the Hamming weight of a bit vector. A simple method is to traverse all bits of $B(o)$, but more efficient methods exist, the best of which is based on tree-pattern counting. With this method, we first obtain the number of 1s in each 2-bit slice, then the number of 1s in each 4-bit slice, and so on. The following example shows how the Hamming weight of an 8-bit vector x is obtained in only three steps.
Let x be the 8-bit vector,
Let m1=01010101, x = (x AND m1) + ((x >> 1) AND m1),
Let m2=00110011, x = (x AND m2) + ((x >> 2) AND m2),
Let m4=00001111, x = (x AND m4) + ((x >> 4) AND m4),
The decimal value of x is now 4, which is exactly the number of 1 bits in the original vector. In the operations above, m1, m2 and m4 are called masks, and they can be defined appropriately once the length of the bit vector is known. In general, the Hamming weight of any n-bit vector can be obtained within $\log_2 n$ steps.
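The three masking steps can be written out as a short function. The sketch below handles 8-bit vectors only, and the sample input `0b10110100` is an arbitrary illustration, not the vector used in the original example.

```python
def hamming_weight8(x):
    """Count the 1 bits of an 8-bit value with three mask-and-add steps."""
    m1 = 0b01010101
    m2 = 0b00110011
    m4 = 0b00001111
    x = (x & m1) + ((x >> 1) & m1)  # number of 1s in each 2-bit slice
    x = (x & m2) + ((x >> 2) & m2)  # number of 1s in each 4-bit slice
    x = (x & m4) + ((x >> 4) & m4)  # number of 1s in the whole byte
    return x

print(hamming_weight8(0b10110100))  # prints 4
```

Each step adds neighbouring partial counts in parallel, which is why the number of steps grows with the logarithm of the vector length rather than with the length itself.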
b) Divide step. In this step, if a crowd is not a gathering, it is divided into a series of subsequences, which essentially means dividing the bit vector of each object into a series of sub-vectors. Note that the BVSs of the non-participators of a crowd need not be processed, because a non-participator of a crowd remains a non-participator of any of its sub-crowds. Moreover, the BVSs need not be divided physically; instead, a mask can be used to extract the wanted part from the original BVS. The mask is also a bit vector of the same length as the BVS. Its bits at the positions that correspond to the sub-crowd are 1, and all other bits are 0. By applying an AND operation to the original BVS and the mask, a new BVS is obtained in which the bits of the wanted sub-crowd are preserved and all other bits are 0. In this way, the divide procedure only needs to return a series of masks, which are more compact than the subsequences of the crowd, and the masks are passed to the test procedure for the subsequences. The test procedure can then obtain the BVS of an object for a sub-crowd directly through the mask, which avoids reconstructing the BVSs for every sub-crowd.
In real-world applications, trajectories are often received incrementally, so the latest batch of trajectory data should be added to the database periodically (for example, daily, weekly or monthly). Specifically, consider a trajectory database with time domain $T_{DB}$. After a new batch of trajectories covering a subsequent time domain has been collected and added to the database, we obtain an expanded time domain and the corresponding updated database.
The major challenge raised by such incremental updates is that the closed crowds found in the database before the update may no longer be closed after the update, because they may be extended by clusters in the new batch. Consequently, if the crowd in which a closed gathering resides is extended, the gathering may change as well. To obtain correct results as the data change over time, a simple solution is to apply the techniques described above directly to the whole updated database. Obviously, the cost of this approach grows with the size of the database and eventually becomes unaffordable. To solve this problem, we propose an incremental algorithm that efficiently produces the new closed gatherings by taking full advantage of the crowds and gatherings already found in the old database.
1) Crowd extension. First, Lemma 4 shows that only some of the crowds (or crowd candidates) in the old database are extendable.
Lemma 4: Given a closed crowd Cr in the old database, if its last cluster $c_{t_j}$ is not at the latest time point of the old database, that is, $c_{t_j} \notin C_{t_n}$, where $C_{t_n}$ is the set of clusters at the latest time $t_n$, then Cr cannot be extended in the updated database.
Based on this lemma, we only need to consider the set CS of cluster sequences in the old database that end at the latest time point, and see whether they can be extended into new crowds. These cluster sequences include the closed crowds and the crowd candidates (whose length is still less than $k_c$) of the old database. To this end, Algorithm 1 is modified slightly so that it saves the crowd candidates and closed crowds that end at the last timestamp. Then, after the new trajectory set has been received and converted into a cluster database, Algorithm 1 is changed as follows: the time cursor is set to the first timestamp of the new batch, and the set V of current crowd candidates is initialized to CS.
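The resumed search can be sketched as a variant of the crowd discovery loop that starts from the saved set CS instead of an empty candidate set. The function name, the return of the surviving candidates as the next CS, and the naive Hausdorff test are assumptions for illustration.

```python
from math import dist

def hausdorff(P, Q):
    def directed(A, B):
        return max(min(dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))

def extend_crowds(CS, new_cluster_db, delta, k_c):
    """Resume crowd discovery on a new batch, starting from the saved sequences CS."""
    closed = []
    V = [list(seq) for seq in CS]  # V is initialised to CS, not to the empty set
    for clusters in new_cluster_db:  # the time cursor starts at the new batch
        next_V, used = [], set()
        for cand in V:
            grown = False
            for i, c in enumerate(clusters):
                if hausdorff(cand[-1], c) <= delta:
                    next_V.append(cand + [c])
                    used.add(i)
                    grown = True
            if not grown and len(cand) >= k_c:
                closed.append(cand)
        next_V += [[c] for i, c in enumerate(clusters) if i not in used]
        V = next_V
    return closed, V  # V can be saved as the CS of the next update
```

Only the sequences that end at the last old timestamp are touched, so the cost depends on the size of the new batch rather than on the size of the whole database.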
2) Gathering update. Suppose that a crowd $Cr_{old}$ in the old database is extended to a new closed crowd $Cr_{new}$ in the updated database. The goal now is to find the closed gatherings in $Cr_{new}$. The plain approach is to apply the TAD algorithm to $Cr_{new}$ again. However, some gatherings were already detected in $Cr_{old}$, and using them wisely can speed up the discovery process. This optimization brings more benefit when $Cr_{old}$ accounts for most of $Cr_{new}$. As stated before, we first construct a BVS for each object in $Cr_{new}$ and then run the test procedure to detect the invalid clusters. The next lemma shows that some invalid clusters of $Cr_{old}$ may become valid in $Cr_{new}$.
Lemma 5: Let $IC(Cr)$ denote the set of invalid clusters of a crowd Cr. Then $IC(Cr_{new}) \cap Cr_{old} \subseteq IC(Cr_{old})$. Because new clusters are appended in $Cr_{new}$, some non-participators of $Cr_{old}$ may become participators. In other words, the gatherings in $Cr_{old}$ may be expanded or merged with neighbouring gatherings in $Cr_{new}$. However, if we find invalid clusters of $Cr_{new}$ that also belong to $Cr_{old}$, then it is guaranteed that all the closed gatherings before them remain unchanged in $Cr_{new}$. More precisely, we have the following theorem:
Theorem 2: Given an invalid cluster $c_{t_j} \in IC(Cr_{new})$ with $c_{t_j} \in Cr_{old}$, any closed gathering of $Cr_{old}$ that ends before $t_j$ is still a closed gathering in $Cr_{new}$.
Accordingly, the original TAD algorithm (Algorithm 2) can be improved by optimizing the discovery of gatherings in $Cr_{new}$. After the test phase has produced the set of invalid clusters IC, we find the latest invalid cluster that belongs to $Cr_{old}$, that is, the cluster $c_{t_j} \in IC$ with $c_{t_j} \in Cr_{old}$ for which no invalid cluster of $Cr_{old}$ has a later timestamp. Theorem 2 guarantees that the closed gatherings before $t_j$ are the same as before. Therefore, only the sub-crowds after $t_j$ need to be examined further, because they contain the new or updated gatherings.
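The shortcut for the gathering update can be sketched as follows: after one test pass over the extended crowd, only the part after the latest invalid cluster of the old portion is examined again. The helper names, the `old_len` parameter (number of clusters that came from the old crowd) and the thresholds `k_p`, `m_p` are assumptions for illustration.

```python
def invalid_clusters(crowd, k_p, m_p):
    """Positions of clusters holding fewer than m_p participators."""
    counts = {}
    for cluster in crowd:
        for o in cluster:
            counts[o] = counts.get(o, 0) + 1
    participators = {o for o, n in counts.items() if n >= k_p}
    return [i for i, cluster in enumerate(crowd)
            if len(cluster & participators) < m_p]

def part_to_recheck(crowd_new, old_len, k_p, m_p):
    """Return the start index and the suffix of crowd_new that must be re-examined."""
    invalid = invalid_clusters(crowd_new, k_p, m_p)
    # invalid clusters lying in the old part of the crowd
    old_invalid = [i for i in invalid if i < old_len]
    # gatherings before the latest such cluster are unchanged, so skip them
    start = old_invalid[-1] + 1 if old_invalid else 0
    return start, crowd_new[start:]
```

If the old part contains no invalid cluster, nothing can be skipped and the whole extended crowd has to be examined, which the sketch expresses by returning a start index of 0.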
In practice, with appropriate use of the above algorithms, all closed gatherings can be found from a trajectory database, and the latest results can be computed efficiently after every update of the database.
The gathering model proposed by the invention overcomes the drawback of group patterns that must contain the same individuals throughout their lifetime; that is, members can join and leave a gathering at any time.
Meanwhile, extensive experiments were conducted on a real trajectory database to confirm the effectiveness and efficiency of the proposed concepts and algorithms. The experimental data comprise about 120K trajectories per month generated by more than 33,000 taxis in Beijing over three months (March, April and May 2009). In addition, the time domain is discretized in units of minutes, which gives 132,480 time points (60*24*92) in $T_{DB}$. The experimental results are as follows:
(1) Effectiveness
A day is divided into three periods: rush hours (6:00-10:00 a.m. and 5:00-8:00 p.m.), working hours (10:00 a.m.-5:00 p.m.) and rest hours (8:00 p.m.-5:00 a.m.). Fig. 1 shows the average numbers of four kinds of patterns, namely crowds, gatherings, swarms and convoys, in these three periods of a day. It can be seen that:
1) gatherings are most numerous in rush hours and fewer in the other periods;
2) although there are many crowds in rest hours, only a small fraction of them become gatherings;
3) there are more swarms and convoys in rush hours and rest hours than in working hours.
According to the weather conditions, the 92 days are divided into three groups: sunny days, rainy days and snowy days. The experimental results are shown in Fig. 2:
1) gatherings are fewest on sunny days and most numerous on snowy days;
2) the gap between the number of crowds and the number of gatherings is larger on snowy days;
3) the number of swarms is insensitive to weather changes, whereas the number of convoys is smallest on snowy days.
(2) Efficiency
Performance of the crowd discovery algorithm:
Three pruning schemes are compared: a) SR, pruning based on the simple R-tree (using the lower bound of Lemma 2); b) IR, pruning based on the improved R-tree (using the lower bound of Lemma 3); c) GRID, pruning based on the grid. The experimental results are as follows:
1) taken together, Figs. 3, 4 and 5 show that IR significantly improves the pruning effect of SR, and that GRID further improves on IR, outperforming SR by at least one order of magnitude;
2) Fig. 3 shows that the time cost of all algorithms decreases as the parameter varied in Fig. 3 increases;
3) Fig. 4 shows that the performance of all algorithms degrades as the parameter varied in Fig. 4 increases;
4) Fig. 5 shows that the time required by all algorithms increases as the database grows, but grid-based pruning is the least sensitive to the database size.
Performance of the gathering detection algorithm:
Three algorithms for detecting gatherings are compared: a) the brute-force method; b) the TAD algorithm; c) TAD*, the TAD algorithm implemented with bit vector signatures. The experimental results are as follows:
1) Figs. 6, 7 and 8 clearly show that TAD outperforms the brute-force method by one to two orders of magnitude, and that TAD* improves the performance of TAD by 30%;
2) Fig. 6 shows that, as the parameter varied in Fig. 6 increases, the time cost of the brute-force method increases significantly, whereas that of TAD and TAD* first increases slowly and then decreases;
3) Fig. 7 shows that, as the parameter varied in Fig. 7 increases, the time cost of the brute-force method increases significantly, whereas that of TAD and TAD* first increases slowly and then decreases;
4) Fig. 8 shows that, as the parameter varied in Fig. 8 increases, the time cost of the brute-force method grows almost exponentially; the performance of TAD and TAD* also degrades, but to a lesser extent, and TAD* shows more benefit when the parameter is larger.
Performance of the incremental algorithm:
The experimental results are as follows:
1) Fig. 9 shows that the time cost of the re-computation method increases as the time domain $|T_{DB}|$ expands; it can be predicted that the cost of re-computation becomes large as the database keeps growing, whereas the time spent by the crowd extension algorithm remains almost constant;
2) Fig. 10 shows that changes in the variable r do not affect the re-computation method, whereas the gathering update algorithm becomes more efficient as r increases. Here r is the proportion of the old crowd within the updated crowd, and the running time of both algorithms is measured for different values of r. The crowd extension algorithm itself does not depend on r, but r does affect the running time of the gathering update;
3) the other parameters in Fig. 2 and Fig. 3 were also tested, and the experimental results are similar.
The foregoing are only embodiments of the invention and do not thereby limit the patent scope of the invention. Any equivalent structure or equivalent process transformation made by using the description of the invention, or any direct or indirect use in other related technical fields, is likewise included in the scope of patent protection of the invention.

Claims (8)

1. A method for rapidly discovering gathering patterns from movement trajectory data, characterized in that the steps comprise:
(1) Snapshot clustering stage:
Predefinition of a gathering: a crowd Cr is called a gathering if and only if every snapshot cluster of Cr contains at least $m_p$ participators, that is, $|\{o \mid o(t) \in c_t, o \in Par(Cr)\}| \ge m_p$ for every snapshot cluster $c_t$ of Cr; if Cr has no super-crowd that is also a gathering, the gathering is said to be closed, wherein a snapshot cluster is a group of objects with arbitrary shape and size, Cr denotes a crowd, o is the trajectory of a moving object, t is a time point in the time domain $T_{DB}$ of the database, o(t) is the position of the moving object o at time t, and Par(Cr) is the set of participators of the crowd Cr;
Preset definition 2: given the set $O_{DB}$ of trajectories of moving objects, a support threshold $m_c$, a variation threshold $\delta$ and a lifetime threshold $k_c$, a crowd Cr is a sequence of snapshot clusters at consecutive timestamps, that is, $Cr = \langle c_{t_a}, c_{t_{a+1}}, \ldots, c_{t_b} \rangle$, which meets the following requirements: the lifetime of Cr, denoted $Cr.\tau$, is not less than $k_c$, that is, $Cr.\tau = b - a + 1 \ge k_c$; every snapshot cluster contains at least $m_c$ objects at any time, that is, $|c_t| \ge m_c$; the distance between any two consecutive snapshot clusters is not greater than $\delta$, that is, $d_H(c_{t_i}, c_{t_{i+1}}) \le \delta$,
Since a snapshot cluster is essentially a set of points, given two point sets P and Q, the Hausdorff distance $d_H(P, Q)$ between P and Q is defined as: $d_H(P, Q) = \max\{\max_{p \in P} \min_{q \in Q} d(p, q), \max_{q \in Q} \min_{p \in P} d(p, q)\}$,
At each time point of the time domain $T_{DB}$ of the database, density-based clustering is applied to the trajectories of the moving objects to find all snapshot clusters: the original trajectories are first simplified by a curve compression algorithm, clustering is then carried out on the line segments, the objects contained in each cluster of line segments possibly forming a snapshot cluster, and the database $C_{DB}$ of the snapshot clusters at the time points is output,
(2) Crowd discovery stage, namely finding all closed crowds from $C_{DB}$:
(2.1) Defining Lemma 1: for a crowd $Cr = \langle c_{t_i}, \ldots, c_{t_j} \rangle$, if there is no cluster $c_{t_{j+1}} \in C_{t_{j+1}}$ such that appending $c_{t_{j+1}}$ to Cr produces a new crowd, then Cr is a closed crowd; otherwise, Cr is not closed, wherein the cluster $c_{t_i}$ is the snapshot cluster of Cr at time $t_i$ and the cluster $c_{t_j}$ is the snapshot cluster of Cr at time $t_j$;
(2.2) Using the permutation-based cluster search method RangeSearch, the R-tree cluster indexing method or the grid cluster indexing method, closed crowds are found by appending snapshot clusters at the next time point to the set V of current crowd candidates, wherein a crowd candidate is a sequence of clusters that may grow into a crowd, which is equivalent to a candidate crowd;
Permutation-based cluster search method RangeSearch: at the current timestamp, RangeSearch() searches the cluster set for the clusters whose Hausdorff distance from the last cluster of a crowd candidate is not greater than $\delta$; it is implemented by computing $d_H$ for each cluster of the set, that is, by computing the distance for every pair of current crowd candidate and cluster at the current time point, and finding all the clusters whose $d_H$ is not greater than $\delta$;
(3) Gathering detection stage:
Using the Test-and-Divide algorithm TAD or the Test-and-Divide algorithm with bit vector signatures, it is confirmed whether each closed crowd obtained in the previous step is a closed gathering or contains a closed gathering.
2. The method for rapidly discovering gathering patterns from movement trajectory data according to claim 1, characterized in that the specific steps of using the permutation-based cluster search method to find closed crowds by appending snapshot clusters at the next time point to the set V of current crowd candidates comprise:
Obtaining the database $C_{DB}$ of snapshot clusters, presetting the support threshold $m_c$ of a crowd, presetting the lifetime threshold $k_c$ of a crowd, and presetting the variation threshold $\delta$ in the definition of a crowd;
At each timestamp, checking the last cluster of each crowd Cr and judging whether the crowd Cr can be extended by appending one more cluster: the current timestamp t is obtained, the Hausdorff distance formula is used to compute the distance from each cluster in the cluster set $C_t$ to the last cluster of Cr, and the clusters whose Hausdorff distance from the last cluster of Cr is not greater than $\delta$ are searched for, wherein C is a set of clusters and $C_t$ is the set of clusters at the current timestamp, with $C_t \subseteq C_{DB}$; if such a cluster is found, Cr can be extended, and the extended crowd is inserted into the set V of crowd candidates as a new candidate; if no such cluster is found, that is, Cr cannot be extended, and the lifetime of Cr is not less than $k_c$, then Cr is concluded to be a closed crowd according to Lemma 1; if no such cluster is found and the lifetime of Cr is less than $k_c$, then Cr is not a crowd; at any timestamp, a cluster R that cannot be appended to any existing crowd candidate is treated as a new crowd candidate.
3. The method for rapidly discovering gathering patterns from movement trajectory data according to claim 1, characterized in that the specific steps of using the R-tree cluster indexing method to find closed crowds by appending snapshot clusters at the next time point to the set V of current crowd candidates comprise:
Obtaining the database $C_{DB}$ of snapshot clusters, presetting the support threshold $m_c$ of a crowd, presetting the lifetime threshold $k_c$ of a crowd, and presetting the variation threshold $\delta$ in the definition of a crowd;
Letting $M(c)$ denote the minimum bounding rectangle (MBR) of a cluster c, and letting $d_{min}(M(c_i), M(c_j))$ denote the minimum distance between two rectangles, predefining Lemma 2: given two clusters $c_i$ and $c_j$, $d_H(c_i, c_j) \ge d_{min}(M(c_i), M(c_j))$, wherein c is a cluster in C;
Obtaining the last cluster of each crowd Cr as the query cluster, and, for any cluster of the crowd Cr, using the Hausdorff distance formula to compute the Hausdorff distance from each cluster in the cluster set to that cluster;
When searching with $d_{min}$, the cluster set is retrieved and the set of candidates is taken out, the candidates being the clusters whose MBR is at minimum distance not greater than $\delta$ from the MBR of the query cluster, and these candidates are then refined to find all the clusters satisfying Lemma 2, wherein the MBRs of the clusters in the cluster set C are indexed with an R-tree and a window query is set up on the R-tree, the window being the MBR of the query cluster enlarged by $\delta$, and the clusters contained in nodes that do not overlap the window are not candidates;
Predefining Lemma 3: letting $M.l_a$ denote the a-th side of a rectangle M, a=(1,2,3,4), the distance function $d_{side}$ is defined as:
$d_{side}(M_i, M_j) = \max_{a} d_{min}(M_i.l_a, M_j)$, and then $d_H(c_i, c_j) \ge d_{side}(M(c_i), M(c_j))$, whereby the set of snapshot clusters whose distance from Cr is less than $\delta$ is computed;
Using $d_{side}$ to retrieve the candidates in the R-tree and then refining these candidates to obtain the set of clusters satisfying Lemma 3, wherein the MBRs of the clusters in C are indexed with an R-tree and a window query is set up on the R-tree, the window being the MBR of the query cluster enlarged by $\delta$; each side of the MBR is enlarged by $\delta$ so as to obtain four rectangles, denoted $M_a$, a=(1,2,3,4), and during the traversal of the R-tree a node is examined further only when it intersects all four rectangles;
Checking the last cluster of each crowd candidate to see whether the candidate can be extended by appending one more cluster; if it can, the extended candidate is inserted into the set V of crowd candidates as a new candidate; if it cannot be extended and the lifetime of Cr is not less than $k_c$, then Cr is concluded to be a closed crowd according to Lemma 1; if it cannot be extended and the lifetime of Cr is less than $k_c$, then Cr is not a crowd; at any timestamp, a cluster R that cannot be appended to any existing crowd candidate is treated as a new crowd candidate.
4. The method for rapidly discovering gathering patterns from movement trajectory data according to claim 1, characterized in that the specific steps of using the grid cluster indexing method to find closed crowds by appending snapshot clusters at the next time point to the set V of current crowd candidates comprise:
Obtaining the database $C_{DB}$ of snapshot clusters, presetting the support threshold $m_c$ of a crowd, presetting the lifetime threshold $k_c$ of a crowd, and presetting the variation threshold $\delta$ in the definition of a crowd;
Defining the affect region: for a cell $g_{a,b}$ located at row a and column b of a grid G, its affect region $AR(g_{a,b})$ is the set of cells whose minimum distance to $g_{a,b}$ is not greater than $\delta$, namely $AR(g_{a,b}) = \{ g' \in G \mid d_{min}(g', g_{a,b}) \le \delta \}$;
First, the whole space of the crowd Cr is divided by the grid G into a plurality of cells g, each cell being a square with side length $\frac{\sqrt{2}}{2}\delta$; for each timestamp t, after a single scan of the cluster set $C_t$, a grid index is built from two data structures, the grid index containing the cell list of each cluster, which records the cells occupied by the cluster, and the inverted list of each cell, which stores the clusters covering that cell;
The affect region of a cell g contains the points whose distance to a point in g is not greater than $\delta$; the last cluster of the crowd Cr is given as the query cluster, together with the grid index of the timestamp following the time of the query cluster;
Pruning stage: each cell g is taken from the cell list of the query cluster, and the clusters in the grid index whose cell lists intersect the affect region of g are found, wherein only the clusters that cover the affect region of every cell of the query cluster can become candidates, because otherwise the query cluster contains at least one point whose distance from the cluster is greater than $\delta$;
Refinement stage: since the distance between any two points in the same cell is not greater than $\delta$, and since in the extreme case where the two cell lists are identical the candidate qualifies immediately, only the cells belonging to one cell list but not to the other need to be checked when searching the cluster set for the clusters whose Hausdorff distance from the query cluster is not greater than $\delta$; that is, for each candidate cluster, its cell list is first intersected with the cell list of the query cluster to obtain their common cells, and for a point p in a cell that is not common, the minimum distance between p and the other cluster is computed, wherein only the distances between p and the points falling in the affect region need to be computed;
If a cluster whose Hausdorff distance from the query cluster is not greater than $\delta$ is found, Cr can be extended, and the extended crowd is inserted into the set V of crowd candidates as a new candidate; if no such cluster is found, that is, Cr cannot be extended, and the lifetime of Cr is not less than $k_c$, then Cr is concluded to be a closed crowd according to Lemma 1.
5. The method for rapidly discovering gathering patterns from movement trajectory data according to claim 1, characterized in that, in using the Test-and-Divide algorithm to confirm whether each closed crowd obtained in the previous step is a closed gathering or contains a closed gathering:
Preset definition 3: given a crowd Cr, an object o is called a participator if and only if it appears in at least $k_p$ snapshot clusters of Cr; letting $Cr(o)$ denote the set of snapshot clusters of Cr that contain the object o, the set of participators of Cr is $Par(Cr) = \{ o \mid |Cr(o)| \ge k_p \}$;
Using the Test-and-Divide algorithm, the test starts from the whole closed crowd: according to definition 3, the number of snapshot clusters in which each object of the crowd Cr appears is computed to judge whether the object is a participator, the number of participators in each cluster of the crowd is then checked, and each closed crowd is tested against the predefinition of a gathering to decide whether it is a gathering; if it is not a gathering, the invalid clusters, which do not contain enough participators, are picked out and removed, which divides the crowd into several subsequences, wherein each subsequence that is still a crowd is tested again against the predefinition of a gathering, until no further crowds can be found.
6. The method for rapidly discovering gathering patterns from movement trajectory data according to claim 1, characterized in that the specific steps of using the Test-and-Divide algorithm with bit vector signatures to confirm whether each closed crowd obtained in the previous step is a closed gathering or contains a closed gathering comprise:
a) Constructing a bit vector signature (BVS) for the trajectory o of each moving object of the crowd Cr, each BVS being a bit vector of length n in which each bit indicates the presence or absence of o in the corresponding cluster, wherein the BVSs of all objects of the crowd Cr can be constructed with a single scan of the crowd, and the BVSs need to be constructed only once and can then be used in all recursive calls of TAD;
b) Test step: letting $B(o)$ denote the BVS of an object o, testing whether the crowd Cr is a closed gathering amounts to counting the number of 1 bits in $B(o)$, that is, the Hamming weight of a bit vector, which is obtained either by traversing all bits of $B(o)$ or by tree-pattern counting; when tree-pattern counting is used, the number of 1s in every 2 bits is obtained first, then the number of 1s in every 4 bits, that is, $2^2$ bits, and so on until the m-th step, where $2^m = n$ and the number of 1s in all n bits is obtained, so that the Hamming weight of any n-bit vector can be obtained within $\log_2 n$ steps; a mask m is set, the mask m being a bit vector of the same length as the BVS; when the Hamming weight is greater than or equal to $k_p$, the closed crowd is a closed gathering or contains a closed gathering;
c) Divide step: if a crowd is not a gathering, it is divided into a series of subsequences, that is, the bit vector of the trajectory of each moving object is divided into a series of sub-vectors, which are extracted from the original BVS with a mask whose bits at the positions corresponding to the sub-crowd are 1 and whose other bits are 0; by applying an AND operation to the original BVS and the mask, a new BVS is obtained in which the bits of the wanted sub-crowd are preserved and all other bits are 0; in this way, the divide step returns a series of masks, and the masks are passed to the test step for the subsequences, that is, the test step obtains the BVS of an object for a sub-crowd directly with the mask.
7. The method for rapidly discovering gathering patterns from movement trajectory data according to claim 1, characterized in that the trajectories of moving objects are periodically added to the database: given the database of trajectories of moving objects with time domain $T_{DB}$, after new trajectories covering a subsequent time domain have been collected and added to the database, the expanded time domain and the corresponding updated database are obtained.
8. a kind of method quickly finding accumulation mode from mobile trajectory data according to claim 6 or 7, it is characterised in that more after new database, it has been found that the concrete steps that the Guan Bi of new growth is assembled include:
1) extension: predefined lemma 4: lemma 4: provideIn one Guan Bi groupIf, its last bunch be notNearest time point, namely, whereinBeing the time isThe set being bunch, then Cr existsIn not expansible, lemma 4 shows in former data base that only some group or group participant are extendible;
JudgeIn a bunch sequence existWhether set CS during end can be extended to new group, and these bunch of sequence comprises group and the length group participant less than k of Guan Bi in former data base;
After the closed groups are found in step (2), the group participants at the end of the last timestamp and the closed groups are saved; then, after a new trajectory set is received and converted into a cluster database, the time cursor is reset and the current set of group participants is changed from V to CS;
2) Gathering update: suppose a group in the original database is extended into a new closed group in the updated database; then, to find the closed gatherings in the new closed group, the TAD algorithm or the bit-vector TAD algorithm is applied to it directly;
Lemma 5 is predefined in terms of IC(Cr), the set of invalid clusters in group Cr. Because new clusters have been added to the extended group, some non-participants of the original group may become participants, that is, gatherings in the original group may expand or merge with neighboring gatherings. If an invalid cluster belonging to the original group is found in the extended group, then all closed gatherings before tj remain unchanged in the extended group, which gives Theorem 2: given such an invalid cluster, any closed gathering before tj still remains closed in the extended group;
Test step: when the bit-vector TAD algorithm is used, a BVS is first constructed for each moving object of the extended group, and the invalid clusters are examined;
After the test phase yields a series of invalid clusters IC, the last invalid cluster before the timestamp in question is found, that is, an invalid cluster before that timestamp with no other invalid cluster between it and that timestamp; only the affected subgroups need to be examined further, because they contain the new or updated gatherings.
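The extension check of Lemma 4 can be sketched as a simple filter: only cluster sequences whose last cluster lies at the most recent time point of the original database are kept as candidates for extension. The sketch below is a minimal Python illustration under stated assumptions, not the claimed procedure: representing a cluster as a `(timestamp, cluster_id)` pair, and the names `extensible_sequences` and `last_time`, are hypothetical choices made here.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a cluster is (timestamp, cluster id), and a
# cluster sequence (a closed group or a shorter group participant) is a
# time-ordered sequence of such clusters.
Cluster = Tuple[int, str]
ClusterSequence = Sequence[Cluster]


def extensible_sequences(
    sequences: Sequence[ClusterSequence], last_time: int
) -> List[ClusterSequence]:
    """Keep the sequences whose last cluster is at `last_time`.

    By Lemma 4, a sequence that ends before the most recent time point of
    the original database cannot be extended by newly arrived clusters.
    """
    return [seq for seq in sequences if seq and seq[-1][0] == last_time]
```

Only the sequences returned by this filter would then be matched against the clusters built from the newly received trajectories; all other closed groups can be left as they are.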
CN201610144268.1A 2016-03-15 2016-03-15 Method for rapidly discovering accumulation mode from movement trajectory data Pending CN105808754A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610144268.1A CN105808754A (en) 2016-03-15 2016-03-15 Method for rapidly discovering accumulation mode from movement trajectory data


Publications (1)

Publication Number Publication Date
CN105808754A true CN105808754A (en) 2016-07-27

Family

ID=56468224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610144268.1A Pending CN105808754A (en) 2016-03-15 2016-03-15 Method for rapidly discovering accumulation mode from movement trajectory data

Country Status (1)

Country Link
CN (1) CN105808754A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108407A (en) * 2017-12-11 2018-06-01 南京师范大学 Group movement mobile cluster pattern sort method based on space-time track
CN109117433A (en) * 2017-06-23 2019-01-01 菜鸟智能物流控股有限公司 Index tree object creation method and index method and related device thereof
CN109800231A (en) * 2019-01-17 2019-05-24 浙江大学 A kind of real-time track co-movement motion pattern detection method based on Flink
CN110443287A (en) * 2019-07-19 2019-11-12 北京航空航天大学 A kind of mobile stream method for drafting of the crowd based on sparse track data
CN110457315A (en) * 2019-07-19 2019-11-15 国家计算机网络与信息安全管理中心 A kind of group's accumulation mode analysis method and system based on user trajectory data
CN110990722A (en) * 2019-12-19 2020-04-10 南京柏跃软件有限公司 Fuzzy co-station analysis algorithm model based on big data mining and analysis system thereof
CN111274864A (en) * 2019-12-06 2020-06-12 长沙千视通智能科技有限公司 Method and device for judging crowd aggregation
CN112633389A (en) * 2020-12-28 2021-04-09 西北工业大学 Method for calculating trend of hurricane motion track based on MDL and speed direction

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117433A (en) * 2017-06-23 2019-01-01 菜鸟智能物流控股有限公司 Index tree object creation method and index method and related device thereof
CN108108407A (en) * 2017-12-11 2018-06-01 南京师范大学 Group movement mobile cluster pattern sort method based on space-time track
CN109800231A (en) * 2019-01-17 2019-05-24 浙江大学 A kind of real-time track co-movement motion pattern detection method based on Flink
CN109800231B (en) * 2019-01-17 2020-12-08 浙江大学 Real-time co-movement motion mode detection method of track based on Flink
CN110443287A (en) * 2019-07-19 2019-11-12 北京航空航天大学 A kind of mobile stream method for drafting of the crowd based on sparse track data
CN110457315A (en) * 2019-07-19 2019-11-15 国家计算机网络与信息安全管理中心 A kind of group's accumulation mode analysis method and system based on user trajectory data
CN110443287B (en) * 2019-07-19 2022-01-14 北京航空航天大学 Crowd moving stream drawing method based on sparse trajectory data
CN111274864A (en) * 2019-12-06 2020-06-12 长沙千视通智能科技有限公司 Method and device for judging crowd aggregation
CN110990722A (en) * 2019-12-19 2020-04-10 南京柏跃软件有限公司 Fuzzy co-station analysis algorithm model based on big data mining and analysis system thereof
CN112633389A (en) * 2020-12-28 2021-04-09 西北工业大学 Method for calculating trend of hurricane motion track based on MDL and speed direction
CN112633389B (en) * 2020-12-28 2024-01-23 西北工业大学 Hurricane movement track trend calculation method based on MDL and speed direction


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160727