CN105808754A - Method for quickly discovering gathering patterns from movement trajectory data

Info

Publication number: CN105808754A
Application number: CN201610144268.1A
Authority: CN
Inventors: 郑凯; 贾梦迪
Original assignee: Suzhou University
Current assignee: Suzhou University
Priority date: 2016-03-15
Filing date: 2016-03-15
Publication date: 2016-07-27

Abstract

The invention discloses a method for quickly discovering gathering patterns from movement trajectory data. It introduces the concept of a gathering and proposes a series of algorithms, including a closed-crowd discovery algorithm, an R-tree index technique, a grid index technique, a test-and-divide (TAD) algorithm, a TAD algorithm based on bit vector signatures, and a growth algorithm, to discover gatherings from trajectories efficiently and update them in a timely manner. The method not only ensures the precision and accuracy of gathering discovery but also greatly improves the efficiency of data mining.

Description

A Method for Quickly Discovering Gathering Patterns from Movement Trajectory Data

Technical field

The invention relates to the fields of databases, data analysis, data mining, trajectory data analysis, and trajectory data mining, and in particular to a method for quickly discovering gathering patterns from movement trajectory data.

Background

The growing availability of location-acquisition technology makes it possible to collect the trajectories of almost any kind of moving object in large quantities. Discovering useful patterns in the behavior of these objects can provide valuable information to a variety of critical applications. To this end, we propose a new concept called a gathering: a trajectory pattern that models various group activities such as celebrations, parades, protests, and traffic jams. Discovering gathering patterns from trajectories poses two challenges:

(1) Defining a suitable model

First, previous work has always identified dense regions by overlaying a fixed grid on the geographic space, which may not match the actual shape of the congregation in a gathering. Although this problem can be mitigated to some extent by using a grid with a more suitable cell size, the resulting complexity grows exponentially, which makes that solution computationally infeasible.

Second, and more fundamentally, the only criterion previously used to identify a dense region is whether the number of individuals in it exceeds a given threshold, regardless of whether the individuals in the region share a common behavior.

Previously proposed concepts include the flock, the convoy, and the swarm. For a flock, a circular region does not reflect real-world groups, which may lead to the so-called group-loss problem. Both the flock and the convoy impose strict requirements on the continuity of the time period. In addition, all three concepts require the group to consist of the same set of individuals throughout its lifetime. In a real group activity such as a commercial promotion, however, members inevitably join and leave frequently, so this requirement is unrealistic.

Another concept, the moving cluster, requires any two groups at consecutive timestamps to share a sufficient number of common individuals, which is still hard to satisfy in real group activities. Moreover, two consecutive groups in a moving cluster can be far apart, whereas a gathering usually occurs in a relatively stable area.

(2) Efficient discovery algorithms

In previous work, the algorithm for discovering flocks can only find groups within a fixed circular region. The moving-cluster algorithm repeatedly appends a cluster at the next timestamp as long as it shares enough common individuals with the current cluster. The CuTS algorithm first collects simplified trajectories to obtain convoy candidates and then applies the moving-cluster algorithm to obtain the correct result.

None of these algorithms fits our problem, however, because we do not require any two consecutive clusters to share common objects.

In addition, the object-growth algorithm attempts to enumerate all subsets of the object set and checks whether each one is a swarm. To keep the computational cost tractable, it uses Apriori pruning, backward pruning, and forward closure checking to reduce the search space effectively.

We cannot borrow these techniques either, because the gathering pattern does not have the downward-closure property.

Summary of the invention

The main technical problem solved by the invention is to provide a method for quickly discovering gathering patterns from movement trajectory data. The method offers high reliability and convenient search, and has broad market prospects in the application and popularization of small CNC machine tools.

To solve the above technical problem, the invention adopts the following technical solution:

A method for quickly discovering gathering patterns from movement trajectory data, comprising the following steps:

(1) Snapshot clustering stage:

Definition of a gathering: a crowd Cr is called a gathering if and only if every snapshot cluster of Cr contains at least m_p participators. A gathering is closed if it has no super-crowd that is also a gathering. Here, a snapshot cluster is a group of objects of arbitrary shape and size, o is the trajectory of a moving object, t is a time point in the time domain of the database, o(t) is the position of the moving object o at time t, and Par(Cr) is the set of participators of the crowd Cr;

Definition 2: given a set of moving-object trajectories, a support threshold, a variation threshold, and a lifetime threshold, a crowd Cr is a sequence of snapshot clusters at consecutive timestamps that satisfies the following requirements: the lifetime of Cr is not less than the lifetime threshold; every snapshot cluster of Cr contains at least as many objects as the support threshold; and the distance between any two consecutive snapshot clusters is not greater than the variation threshold;

Since a snapshot cluster is essentially a set of points, the distance between clusters is measured with the Hausdorff distance. Given two point sets P and Q, the Hausdorff distance between them is defined as $d_H(P, Q) = \max\{\max_{p \in P} \min_{q \in Q} d(p, q),\ \max_{q \in Q} \min_{p \in P} d(p, q)\}$, where d(p, q) is the distance between the points p and q;
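The Hausdorff distance between two snapshot clusters can be illustrated with a minimal sketch. It assumes that clusters are non-empty collections of 2-D points and that d(p, q) is the Euclidean distance; the function name `hausdorff` and the sample points are illustrative only and do not come from the patent.

```python
import math

def hausdorff(P, Q):
    """Hausdorff distance between two non-empty sets of 2-D points."""
    def directed(A, B):
        # Largest distance from a point of A to its nearest point in B.
        return max(min(math.dist(a, b) for b in B) for a in A)
    # The distance is the larger of the two directed distances.
    return max(directed(P, Q), directed(Q, P))

P = [(0.0, 0.0), (1.0, 0.0)]
Q = [(0.0, 0.0), (4.0, 0.0)]
print(hausdorff(P, Q))  # 3.0: the point (4, 0) is 3 away from its nearest point in P
```

The brute-force evaluation costs |P| x |Q| point distances, which is why the index-based filters described later try to avoid computing it for most cluster pairs.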

At each time point in the time domain of the database, density-based clustering is applied to the moving objects to discover all snapshot clusters. The original trajectories are first simplified with a curve compression algorithm, and clustering is then performed on the resulting line segments; the objects contained in each cluster of line segments may form snapshot clusters at certain time points. The output is the database of snapshot clusters;

(2) Crowd discovery stage, i.e. finding all closed crowds in the database of snapshot clusters:

(2.1) Lemma 1: consider a crowd Cr whose first snapshot cluster is at time t_i and whose last snapshot cluster is at time t_j. If there is no snapshot cluster that can be appended to Cr to produce a new crowd, then Cr is a closed crowd; otherwise, Cr is not closed;

(2.2) Closed crowds are discovered by appending, at the next time point, snapshot clusters to the current set V of crowd candidates, using the range search method (RangeSearch), the R-tree cluster index method, or the grid cluster index method. A crowd candidate is a sequence of clusters that may grow into a crowd, i.e. a candidate crowd;

Range search method: at the current timestamp, RangeSearch() finds, in the set of clusters, the clusters whose Hausdorff distance from the last cluster of a crowd candidate is not greater than the variation threshold. It is implemented by brute force: the Hausdorff distance is computed for every pair consisting of a current crowd candidate and a cluster at the current time point, and all clusters within the threshold are returned;

(3) Gathering detection stage:

Using the test-and-divide (TAD) algorithm or the bit-vector-signature TAD algorithm, determine whether each closed crowd obtained in the previous step is a closed gathering or contains closed gatherings.

In a preferred embodiment of the invention, discovering closed crowds with the range search method, by appending snapshot clusters to the current set V of crowd candidates at the next time point, comprises the following steps:

Obtain the database of snapshot clusters, and preset the support threshold of a crowd, the lifetime threshold of a crowd, and the variation threshold used in the definition of a crowd;

At each timestamp, examine the last cluster of each crowd candidate Cr to decide whether Cr can be extended by appending one more cluster. Let C be the set of clusters at the current timestamp, a subset of the snapshot cluster database. Compute the Hausdorff distance from the last cluster of Cr to each cluster in C and select the clusters whose distance is not greater than the variation threshold. If such a cluster is found, Cr can be extended, and the extended sequence is inserted into V as a new candidate. If no such cluster is found, Cr cannot be extended: when the lifetime of Cr is not less than the lifetime threshold, Cr is a closed crowd by Lemma 1; when the lifetime of Cr is less than the lifetime threshold, Cr is not a crowd. At any timestamp, a cluster that cannot be appended to any existing crowd candidate becomes a new crowd candidate.
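The timestamp-by-timestamp procedure above can be sketched as follows. This is a simplified illustration, not the patented implementation: the parameter names `m_c` (support threshold), `k_c` (lifetime threshold) and `delta` (variation threshold) are assumed here because the original symbols are missing from the text, clusters are assumed to be tuples of 2-D points, and candidates still alive when the data ends are reported if they are long enough.

```python
import math

def hausdorff(P, Q):
    def directed(A, B):
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))

def range_search(query, clusters, delta):
    """Brute force: indices of clusters within Hausdorff distance delta of query."""
    return [i for i, c in enumerate(clusters) if hausdorff(query, c) <= delta]

def discover_closed_crowds(cluster_db, m_c, k_c, delta):
    """cluster_db[t] is the list of snapshot clusters at timestamp t."""
    closed, V = [], []                       # V holds the current crowd candidates
    for clusters in cluster_db:
        # Keep only clusters that meet the support threshold (Definition 2).
        clusters = [c for c in clusters if len(c) >= m_c]
        next_V, appended = [], set()
        for cand in V:
            hits = range_search(cand[-1], clusters, delta)
            for i in hits:
                next_V.append(cand + [clusters[i]])   # extended candidate
                appended.add(i)
            if not hits and len(cand) >= k_c:
                closed.append(cand)          # cannot grow any further: closed (Lemma 1)
        # Clusters that extended no candidate start new candidates.
        next_V += [[c] for i, c in enumerate(clusters) if i not in appended]
        V = next_V
    # Assumption: candidates alive at the end of the data are reported if long enough.
    closed += [cand for cand in V if len(cand) >= k_c]
    return closed

cluster_db = [
    [((0.0, 0.0), (1.0, 0.0))],
    [((0.0, 0.5), (1.0, 0.5))],
    [((0.0, 1.0), (1.0, 1.0))],
    [((50.0, 50.0), (51.0, 50.0))],   # far away: the first crowd stops here
]
crowds = discover_closed_crowds(cluster_db, m_c=2, k_c=3, delta=1.0)
print(len(crowds), len(crowds[0]))
```

In this example the first three clusters drift by 0.5 per timestamp and form one closed crowd of lifetime 3, while the distant cluster at the last timestamp only starts a new, too-short candidate.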

In a preferred embodiment of the invention, discovering closed crowds with the R-tree cluster index method, by appending snapshot clusters to the current set V of crowd candidates at the next time point, comprises the following steps:

Represent each cluster by its minimum bounding rectangle (MBR), and consider the minimum distance between two rectangles. Lemma 2: for two clusters, where c is a cluster in C, the minimum distance between their MBRs is not greater than the Hausdorff distance between the clusters;

Take the last cluster of each crowd candidate Cr as the query cluster; the goal is to obtain the Hausdorff distance from the query cluster to each cluster in the set of clusters;

When searching with Lemma 2, first retrieve from the set of clusters a candidate set whose MBR minimum distance to the query cluster is not greater than the variation threshold, and then refine these candidates to find all qualifying clusters. To do this, the MBRs of the clusters in the set C are indexed with an R-tree and a query window is built on the R-tree; the window is the MBR of the query cluster enlarged by the variation threshold. Clusters contained in nodes that do not overlap the window are not candidates;
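The filter-and-refine idea behind the MBR search can be sketched as below, assuming that Lemma 2 states the MBR minimum distance never exceeds the Hausdorff distance (a property that holds for any two point sets). For brevity the sketch scans all MBRs linearly instead of traversing an R-tree; the names `mbr`, `mbr_min_dist`, `filter_then_refine` and `delta` are illustrative assumptions.

```python
import math

def hausdorff(P, Q):
    def directed(A, B):
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))

def mbr(cluster):
    """Minimum bounding rectangle as (x_min, y_min, x_max, y_max)."""
    xs, ys = zip(*cluster)
    return (min(xs), min(ys), max(xs), max(ys))

def mbr_min_dist(a, b):
    """Smallest distance between two axis-aligned rectangles (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

def filter_then_refine(query, clusters, delta):
    q_box = mbr(query)
    # Filter: a cluster whose MBR is farther than delta cannot qualify.
    candidates = [c for c in clusters if mbr_min_dist(q_box, mbr(c)) <= delta]
    # Refine: compute the exact Hausdorff distance only for the survivors.
    return [c for c in candidates if hausdorff(query, c) <= delta]

query = [(0.0, 0.0), (1.0, 1.0)]
clusters = [
    [(0.5, 0.5), (1.5, 1.0)],      # close to the query: kept
    [(10.0, 10.0), (11.0, 11.0)],  # pruned by the MBR filter
    [(0.0, 0.0), (5.0, 0.0)],      # passes the filter, rejected by refinement
]
print(filter_then_refine(query, clusters, delta=1.0))
```

The third cluster shows why the refinement step is needed: its rectangle overlaps the query rectangle, yet one of its points is far from every query point.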

Lemma 3: consider the a-th side of a rectangle M, a = 1, 2, 3, 4, and define a side-based distance function between rectangles. This function is used, in the same way as the minimum distance of Lemma 2, to obtain the set of snapshot clusters whose distance from the last cluster of Cr is within the variation threshold;

Using this function, retrieve the candidates from the R-tree and then refine them to obtain the set of clusters satisfying Lemma 3. The MBRs of the clusters in C are indexed with an R-tree and a query window is built on the R-tree. The window is obtained by enlarging each side of the MBR of the query cluster by the variation threshold, which yields four rectangles, one for each side a = 1, 2, 3, 4. During the traversal of the R-tree, a node is examined further only if it intersects all four rectangles;

Examine the last cluster of each crowd candidate to see whether the candidate can be extended by appending one more cluster. If it can, the extended candidate is inserted into the set V of crowd candidates as a new candidate. If it cannot be extended and the lifetime of Cr is not less than the lifetime threshold, Cr is a closed crowd by Lemma 1. If it cannot be extended and the lifetime of Cr is less than the lifetime threshold, Cr is not a crowd. At any timestamp, a cluster that cannot be appended to any existing crowd candidate becomes a new crowd candidate.

In a preferred embodiment of the invention, discovering closed crowds with the grid cluster index method, by appending snapshot clusters to the current set V of crowd candidates at the next time point, comprises the following steps:

Definition of the affect region: for a cell located at row a and column b of a grid G, its affect region is the set of cells whose minimum distance to that cell is not greater than the variation threshold;

First, the whole space is partitioned by a grid G into cells g. Each cell is a square whose side length is chosen so that any two points in the same cell are no farther apart than the variation threshold. For each timestamp t, after one scan of the set of clusters, a grid index is built from two data structures: a cell list for each cluster, which records the cells occupied by the cluster, and an inverted list for each cell, which stores the clusters that cover the cell;
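The two data structures of the grid index can be sketched as follows. The sketch assumes a cell side of delta / sqrt(2), so that the cell diagonal equals the variation threshold `delta`; this value, the dictionary layout, and the name `build_grid_index` are assumptions made for illustration, since the side-length formula is missing from the text.

```python
import math
from collections import defaultdict

def build_grid_index(clusters, cell_size):
    """clusters: dict cluster_id -> list of (x, y) points at one timestamp."""
    cell_lists = {}                  # cluster id -> set of (row, col) cells it occupies
    inverted = defaultdict(set)      # (row, col) cell -> ids of clusters covering it
    for cid, points in clusters.items():
        cells = {(math.floor(y / cell_size), math.floor(x / cell_size))
                 for x, y in points}
        cell_lists[cid] = cells
        for cell in cells:
            inverted[cell].add(cid)
    return cell_lists, inverted

delta = 2.0
cell_size = delta / math.sqrt(2)     # assumed: diagonal of a cell equals delta
clusters = {
    "c1": [(0.1, 0.1), (1.5, 0.2)],
    "c2": [(0.2, 0.3)],
}
cell_lists, inverted = build_grid_index(clusters, cell_size)
print(cell_lists["c1"])    # cells (0, 0) and (0, 1)
print(inverted[(0, 0)])    # both clusters cover cell (0, 0)
```

Both structures are filled in a single pass over the points, which matches the description of building the index after one scan of the set of clusters.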

The affect region of a cell g contains every point whose distance from a point in g is not greater than the variation threshold. The query is given by the last cluster of the crowd candidate Cr, which serves as the query cluster, together with the grid index of the timestamp following that cluster's time;

Pruning phase: for each cell g in the cell list of the query cluster, find the clusters whose cell lists intersect the affect region of g. Only clusters that cover the affect region of every cell of the query cluster can be candidates; otherwise, the query cluster contains at least one point whose distance from the cluster is greater than the variation threshold;

Refinement phase: the distance between any two points in the same cell is never greater than the variation threshold, and in the limiting case where two clusters occupy exactly the same cells, their Hausdorff distance is likewise within the threshold. Therefore only the cells that the two clusters do not share need to be examined when searching the set of clusters for those whose Hausdorff distance from the query cluster is not greater than the variation threshold. For each cluster retrieved, the cell lists of the two clusters are first combined to obtain their common cells. For a point p in a cell that is not common, the minimum distance from p to the other cluster is computed, and only the points of the other cluster that fall within the affect region of p's cell need to be considered;

If a cluster whose Hausdorff distance from the query cluster is not greater than the variation threshold is found, Cr can be extended, and the extended sequence is inserted into V as a new candidate. If no such cluster is found, Cr cannot be extended, and when the lifetime of Cr is not less than the lifetime threshold, Cr is a closed crowd by Lemma 1.

In a preferred embodiment of the invention, using the test-and-divide algorithm to determine whether each closed crowd obtained in the previous step is a closed gathering or contains closed gatherings proceeds as follows:

Definition 3: given a crowd Cr, an object o is called a participator if and only if it appears in at least a threshold number of snapshot clusters of Cr. Considering the set of snapshot clusters of Cr that contain the object o, the participators of Cr are the set Par(Cr) of objects for which this set is large enough;

The test-and-divide algorithm starts by testing the whole closed crowd. Following Definition 3, it counts the occurrences of each object in the crowd Cr to decide whether the object is a participator, then checks the number of participators in each cluster of the crowd, and finally tests whether the closed crowd is a gathering according to the definition of a gathering. If it is not a gathering, the invalid clusters, i.e. those without enough participators, are identified, and the crowd is divided into several subsequences by removing these clusters. For each subsequence that is still a crowd, the gathering test is repeated, until no further crowds can be found.
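The test-and-divide recursion can be sketched as below, assuming that each snapshot cluster is a set of object identifiers and that a subsequence is still a crowd when its length reaches the lifetime threshold. The parameter names `k_c` (lifetime threshold) and `k_p` (minimum number of clusters an object must appear in) are assumed, since the original symbols are missing; `m_p` follows the definition of a gathering.

```python
def participators(crowd, k_p):
    """Objects appearing in at least k_p snapshot clusters of the crowd."""
    counts = {}
    for cluster in crowd:
        for o in cluster:
            counts[o] = counts.get(o, 0) + 1
    return {o for o, n in counts.items() if n >= k_p}

def tad(crowd, k_c, k_p, m_p):
    """Return the gatherings found in a crowd (a list of sets of object ids)."""
    if len(crowd) < k_c:
        return []                          # too short to be a crowd
    par = participators(crowd, k_p)
    # Test: every cluster needs at least m_p participators.
    valid = [len(c & par) >= m_p for c in crowd]
    if all(valid):
        return [crowd]                     # the whole sequence is a gathering
    # Divide: drop invalid clusters and recurse on the remaining runs.
    result, run = [], []
    for cluster, ok in zip(crowd, valid):
        if ok:
            run.append(cluster)
        else:
            result += tad(run, k_c, k_p, m_p)
            run = []
    return result + tad(run, k_c, k_p, m_p)

crowd = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}, {5, 6, 7},
         {1, 2, 3}, {1, 2, 3}, {1, 2, 8}]
gatherings = tad(crowd, k_c=2, k_p=3, m_p=2)
print(len(gatherings))  # 2: the cluster {5, 6, 7} splits the crowd in two
```

Each recursive call works on a strictly shorter sequence, so the recursion terminates. Participators are recomputed for every subsequence because an object may lose its participator status once clusters are removed.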

In a preferred embodiment of the invention, using the bit-vector-signature test-and-divide algorithm to determine whether each closed crowd obtained in the previous step is a closed gathering or contains closed gatherings comprises the following steps:

a) Construct a bit vector signature (BVS) for each moving object of the crowd Cr. Each BVS is a bit vector of length n, and each bit indicates whether the object o is present in the corresponding cluster. The BVSs of all objects in Cr can be constructed in a single scan of the crowd, and a BVS needs to be constructed only once to be used throughout the recursion of TAD;

b) Test step: take the BVS of an object o. Testing the crowd Cr then reduces to counting the number of 1 bits in the BVS, that is, the Hamming weight of the bit vector. The Hamming weight can be obtained either by traversing all bits of the BVS or by counting with a binary-tree pattern. With the binary-tree pattern, the number of 1s in every 2 bits is obtained first, then in every 4 (that is, 2^2) bits, and so on until the m-th round, where 2^m = n, which yields the number of 1s in all n bits. The Hamming weight of any n-bit vector can therefore be obtained within m = log2(n) steps. A mask is also set, where a mask is a bit vector of the same length as the BVS. When the Hamming weight is greater than or equal to the participation threshold of Definition 3, the object is a participator; the participators identified in this way determine whether the closed crowd is a closed gathering or contains closed gatherings;
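The binary-tree counting pattern of the test step can be illustrated for a fixed length of n = 64 bits. This is a generic sketch of the well-known pairwise bit-counting technique, not code from the patent; the function name `hamming_weight64` is an assumption, and the input is assumed to be a non-negative integer below 2^64.

```python
def hamming_weight64(x):
    """Count the 1 bits of a 64-bit value in log2(64) = 6 merging steps."""
    x = (x & 0x5555555555555555) + ((x >> 1) & 0x5555555555555555)    # 1s per 2 bits
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)    # 1s per 4 bits
    x = (x & 0x0F0F0F0F0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F0F0F0F0F)    # 1s per 8 bits
    x = (x & 0x00FF00FF00FF00FF) + ((x >> 8) & 0x00FF00FF00FF00FF)    # 1s per 16 bits
    x = (x & 0x0000FFFF0000FFFF) + ((x >> 16) & 0x0000FFFF0000FFFF)   # 1s per 32 bits
    x = (x & 0x00000000FFFFFFFF) + (x >> 32)                          # 1s in all 64 bits
    return x

bvs = 0b1101101          # object present in clusters 0, 2, 3, 5 and 6
print(hamming_weight64(bvs))  # 5
```

Each step adds neighbouring partial counts in parallel, so the number of steps grows with log2(n) rather than with n as in a bit-by-bit traversal.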

c) Divide step: if a crowd is not a gathering, it is divided into a series of subsequences, which means that the bit vector of each moving object is divided into a series of sub-vectors. These are extracted from the original BVS with masks: a mask has 1s at the positions that belong to the sub-crowd and 0s at all other positions. ANDing the original BVS with the mask yields a new BVS in which the bits of the desired sub-crowd are preserved and all other bits are 0. The divide step thus returns a series of masks, and each mask is passed to the test step of the corresponding subsequence, where it is used directly to obtain each object's BVS for that sub-crowd.
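The mask-based extraction can be sketched as follows, under the assumption that bit i of a BVS corresponds to the i-th cluster of the crowd. The helper names `make_mask` and `is_participator` and the threshold name `k_p` are illustrative and not taken from the patent.

```python
def make_mask(start, end):
    """Mask with 1s at cluster positions start .. end-1 and 0s elsewhere."""
    return ((1 << (end - start)) - 1) << start

def is_participator(bvs, mask, k_p):
    """True if the object appears in at least k_p clusters of the masked sub-crowd."""
    sub_bvs = bvs & mask               # keep only the bits of the sub-crowd
    return bin(sub_bvs).count("1") >= k_p

bvs = 0b1101101                        # present in clusters 0, 2, 3, 5 and 6
left = make_mask(0, 3)                 # sub-crowd made of clusters 0..2
right = make_mask(3, 7)                # sub-crowd made of clusters 3..6
print(bin(bvs & left), bin(bvs & right))
print(is_participator(bvs, left, k_p=3), is_participator(bvs, right, k_p=3))
```

Because the original BVS is never modified, the same signature can be reused with a different mask at every level of the recursion, which is the point of constructing it only once.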

In a preferred embodiment of the invention, the trajectories of moving objects are added to the database periodically. Suppose a trajectory database already exists for a given time domain. After new trajectories over a subsequent time domain have been collected and added to it, the time domain is enlarged and an updated database is obtained.

In a preferred embodiment of the invention, after the database is updated, discovering the newly grown closed gatherings comprises the following steps:

1) Extension: Lemma 4: given a closed crowd Cr in the old database, if its last cluster is not at the most recent time point of the old database, then Cr cannot be extended in the updated database. Lemma 4 shows that only some of the crowds or crowd candidates in the previous database are extendable;

Determine whether the set CS of cluster sequences that end at the last time point of the old database can be extended into new crowds; these cluster sequences comprise the closed crowds of the previous database and the crowd candidates whose length is less than k;

After the closed crowds have been discovered in step (2), save the crowd candidates and closed crowds that end at the last timestamp. Then, once a new set of trajectories has been received and converted into a database of clusters, set the time cursor to the first timestamp of the new data and change the current set of crowd candidates from V to CS;

2) Gathering update: suppose a crowd in the old database has been extended into a new closed crowd in the updated database. The closed gatherings in the new crowd can be found by applying the TAD algorithm or the bit-vector TAD algorithm to it directly;

Lemma 5: let IC(Cr) denote the set of invalid clusters of a crowd Cr. A cluster of the old crowd that is invalid in the extended crowd was already invalid in the old crowd: because new clusters are added, some non-participators of the old crowd may become participators, so gatherings in the old crowd may grow or merge with neighboring gatherings. If an invalid cluster of the extended crowd is found at a time tj within the old crowd, then all closed gatherings before tj remain unchanged in the extended crowd. This gives Theorem 2: given such an invalid cluster at time tj, any closed gathering of the old crowd that ends before tj remains closed in the extended crowd;

Test step: when the bit-vector TAD algorithm is used, first construct the BVS of each moving object of the extended crowd and detect the invalid clusters;

After the test step has produced the set IC of invalid clusters, find the latest invalid cluster that lies before the first new timestamp, i.e. the invalid cluster in the old part of the crowd that is not followed by any other invalid cluster of the old part. Only the subsequence that follows this cluster needs to be examined further, because it contains the new or updated gatherings.

The beneficial effect of the invention is that it not only ensures the precision and accuracy of gathering discovery but also greatly improves the efficiency of data mining.

Description of drawings

In order to explain the technical solutions of the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:

Fig. 1 shows the experimental results of the method when a day is divided into three time periods;

Fig. 2 shows the experimental results of the method grouped by weather;

Figs. 3 to 8 show the running time of the method, each with respect to a different parameter;

Fig. 9 compares the time cost of the crowd extension algorithm and the reconstruction method with respect to database size;

Fig. 10 compares the time cost of the crowd extension algorithm and the reconstruction method.

Detailed description

The technical solutions of the embodiments of the invention are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

Referring to Figs. 1 to 10, an embodiment of the invention includes:

(1) Introducing the concept of a gathering, which can model a variety of non-trivial group events.

Definition of a gathering: a crowd is called a gathering if and only if every snapshot cluster of the crowd contains at least m_p participators (a snapshot cluster is a group of objects of arbitrary shape and size). A gathering is closed if it has no super-crowd that is also a gathering.

A gathering has the following properties:

Scale: a gathering usually involves a relatively large number of individuals.

Density: the individuals form a dense group.

Durability: a gathering should last for a certain period of time without interruption.

Stationariness: the geometric properties of the group, such as its shape and location, are relatively stable.

Commitment: at any time during a gathering, there are several dedicated members who stay in the group for a period of time, possibly non-consecutive.

Definition 1: given a set of moving-object trajectories, a distance threshold, and an integer m, a snapshot cluster at timestamp t is a non-empty subset of the objects that satisfies the following conditions:

1) any two objects in the subset are density-connected at time t with respect to the distance threshold and m;

2) the subset is maximal, i.e. every object that is density-reachable at time t from an object in the subset, with respect to the distance threshold and m, also belongs to the subset;

A snapshot cluster is thus a group of objects of arbitrary shape and size that are density-connected to one another at a given timestamp. Following the notion of DBSCAN, such a snapshot cluster is maximal in spatial extent, so no two clusters at the same timestamp share an object. In what follows, "snapshot cluster" is abbreviated to "cluster" and the parameter m and the distance threshold are omitted;
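Definition 1 corresponds to density-based clustering of the object positions at one timestamp. The following is a minimal DBSCAN-style sketch under the assumption that positions are 2-D points, that `eps` plays the role of the distance threshold, and that an object counts itself among its neighbours; the function name `snapshot_clusters` is illustrative, and a production system would normally use a spatial index or an existing clustering library instead of the quadratic neighbour search used here.

```python
import math

def snapshot_clusters(positions, eps, m):
    """positions: dict object_id -> (x, y) at one timestamp.
    Returns a list of clusters, each a set of object ids."""
    ids = list(positions)

    def neighbours(o):
        return [p for p in ids if math.dist(positions[o], positions[p]) <= eps]

    assigned, clusters = set(), []
    for o in ids:
        if o in assigned or len(neighbours(o)) < m:
            continue                       # already clustered, or not a core object
        cluster, frontier = set(), [o]
        while frontier:
            p = frontier.pop()
            if p in assigned:
                continue
            assigned.add(p)
            cluster.add(p)
            nb = neighbours(p)
            if len(nb) >= m:               # only core objects expand the cluster
                frontier.extend(q for q in nb if q not in assigned)
        clusters.append(cluster)
    return clusters                        # objects left unassigned are noise

positions = {
    "a": (0.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 0.0),
    "d": (10.0, 10.0), "e": (10.0, 11.0), "f": (11.0, 10.0),
    "g": (50.0, 50.0),                     # isolated object: noise
}
found = snapshot_clusters(positions, eps=1.5, m=3)
print(sorted(sorted(c) for c in found))
```

Running the function once per timestamp yields the database of snapshot clusters that the crowd discovery stage takes as input.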

Preset Definition 2: Given the set $O_{DB}$ of trajectories of moving objects, a support threshold $m_c$, a variation threshold $\delta$, and a lifetime threshold $k_c$, a crowd $Cr$ is a sequence of snapshot clusters at consecutive timestamps, i.e. $Cr = \langle c_{t_i}, c_{t_{i+1}}, \ldots, c_{t_j} \rangle$, that satisfies the following requirements:

1) the lifetime of $Cr$, denoted $Cr.\tau$, is not less than $k_c$, i.e. $Cr.\tau \ge k_c$;

2) at any time there are at least $m_c$ objects, i.e. $\forall c_t \in Cr,\ |c_t| \ge m_c$;

3) the distance between any two consecutive snapshot clusters is not greater than $\delta$, i.e. $d_H(c_{t_l}, c_{t_{l+1}}) \le \delta$ for all $i \le l < j$.

In addition, a subsequence $Cr'$ of $Cr$ that is itself a crowd is called a sub-crowd of $Cr$, and $Cr$ is a super-crowd of $Cr'$; a crowd that has no super-crowd is said to be closed.

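Since a snapshot cluster is essentially a set of points, the distance between two clusters is measured with the Hausdorff distance. Given two point sets $P$ and $Q$, the Hausdorff distance $d_H(P, Q)$ is defined as:

$d_H(P, Q) = \max\{\max_{p \in P} \min_{q \in Q} d(p, q),\ \max_{q \in Q} \min_{p \in P} d(p, q)\}$, where $d(p, q)$ is the distance between points $p$ and $q$.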
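The Hausdorff distance between two point sets can be sketched in a few lines. This is a naive, quadratic-time illustration only; the function name `hausdorff` and the use of 2-D tuples with Euclidean distance are assumptions made for the example, not part of the patent text.

```python
import math

def hausdorff(P, Q):
    """Naive Hausdorff distance between two non-empty point sets."""
    def directed(A, B):
        # Largest distance from a point of A to its nearest point in B.
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))

P = [(0.0, 0.0), (1.0, 0.0)]
Q = [(0.0, 0.0), (4.0, 0.0)]
print(hausdorff(P, Q))  # 3.0: the point (4, 0) is 3 away from its nearest point in P
```

Taking the maximum of both directed distances makes the measure symmetric, which is why a single far-away point in either set is enough to separate two clusters.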

Preset Definition 3: Given a crowd $Cr$, an object $o$ is called a participant of $Cr$ if and only if it appears in at least $k_p$ snapshot clusters of $Cr$. Let $Cr(o)$ denote the set of snapshot clusters of $Cr$ that contain $o$, i.e. $Cr(o) = \{c_t \mid c_t \in Cr,\ o(t) \in c_t\}$; the participants of $Cr$ are then the set of objects $Par(Cr) = \{o \mid |Cr(o)| \ge k_p\}$.
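As a minimal sketch of the participant and gathering definitions, a crowd can be modelled as a list of sets of object identifiers, one set per snapshot cluster. The function names and the parameter names `k_p` (minimum number of clusters an object must appear in) and `m_p` (minimum number of participants per cluster) are illustrative assumptions.

```python
from collections import Counter

def participants(crowd, k_p):
    """Objects that appear in at least k_p snapshot clusters of the crowd."""
    counts = Counter(o for cluster in crowd for o in cluster)
    return {o for o, n in counts.items() if n >= k_p}

def is_gathering(crowd, k_p, m_p):
    """True if every snapshot cluster holds at least m_p participants."""
    par = participants(crowd, k_p)
    return all(len(cluster & par) >= m_p for cluster in crowd)

crowd = [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}]
print(sorted(participants(crowd, k_p=2)))   # [1, 2]
print(is_gathering(crowd, k_p=2, m_p=2))    # True
print(is_gathering(crowd, k_p=2, m_p=3))    # False
```

Objects 3, 4 and 5 each appear only once, so they come and go without being participants, which is exactly the flexibility the gathering model is meant to allow.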

(2) Several efficient algorithms are proposed:

A series of algorithms is proposed to discover gatherings from trajectories efficiently and to keep them up to date: a closed-crowd discovery algorithm (Algorithm 1), an R-tree index technique, a grid index technique, the Test-and-Divide (TAD) algorithm, a TAD algorithm based on bit vector signatures (TAD*), and an incremental algorithm.

After obtaining the trajectories of a large number of moving objects, we analyze the data and find that they contain interesting information. To capture it, we define a new model, the gathering, which models various group activities. We first define the concept of a crowd, which satisfies the first four properties of a gathering (see point 4); a gathering is a special crowd that also satisfies the fifth property.

With these definitions in place, closed gatherings can be discovered from large amounts of trajectory data according to the concept and characteristics of a gathering. To this end, we propose several new algorithms that speed up the discovery process, which is divided into the following three phases:

Snapshot clustering phase:

At each time point of the time domain of the database, the objects are clustered by density to discover all snapshot clusters. To reduce the cost, the original trajectories are first simplified with the Douglas-Peucker algorithm, and clustering is then performed on the resulting line segments. The objects contained in each cluster of line segments may form snapshot clusters at some time points, and discovering snapshot clusters within such a set of objects is more efficient than searching the entire object set directly. The output of this phase is the snapshot cluster database.
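The simplification step can be illustrated with a compact recursive Douglas-Peucker sketch. The function names and the `tolerance` parameter are assumptions for the example, and the variant below measures the distance from a point to the chord segment rather than to the infinite line.

```python
import math

def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all 2-D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Keep only the points needed to stay within `tolerance` of the track."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    # Split at the farthest point and simplify both halves.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

track = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
print(douglas_peucker(track, tolerance=0.5))
```

In this example the small wobble at `(1, 0.1)` is dropped while the sharp detour through `(3, 5)` is kept, so the simplified track consists of a few straight segments that can be clustered more cheaply.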

Crowd discovery phase:

This phase aims to find all closed crowds in the snapshot cluster database.

Crowds clearly satisfy the downward closure property: any sufficiently long subsequence of consecutive clusters of a crowd is itself a crowd, which makes outputting all sub-crowds redundant. More importantly, a gathering detected from a crowd that is not closed is not guaranteed to be closed. In this phase we therefore find only the closed crowds rather than all crowds. A first idea for finding closed crowds is to examine every super-sequence of a crowd to check whether the crowd is closed. In fact, by the following lemma, it suffices to try appending one more snapshot cluster.

Lemma 1: Given a crowd $Cr = \langle c_{t_i}, \ldots, c_{t_j} \rangle$, if there is no cluster $c_{t_{j+1}}$ such that appending $c_{t_{j+1}}$ to $Cr$ yields a new crowd, then $Cr$ is a closed crowd; otherwise, $Cr$ is not closed.

Based on this lemma, closed crowds can be discovered by appending the snapshot clusters of the next time point to the current set of crowd candidates, denoted $V$. This process is implemented by Algorithm 1, which uses the following data:

Input: the snapshot cluster database and the crowd thresholds;

the set of closed crowds (the result);

$V$, the set of current crowd candidates.

At each timestamp, the last cluster of every crowd candidate is checked to see whether the candidate can be extended by appending one more cluster. If so, the extended candidate is inserted into $V$ as a new candidate. Otherwise, by Lemma 1, the candidate is either a closed crowd (if its length is not less than $k_c$) or not a crowd at all. Note that at any timestamp, a cluster that cannot be appended to any existing crowd candidate (the set of such clusters is denoted $R$) must also be treated as a new candidate, because it may later grow into a crowd.
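The sweep described above can be sketched as follows. This is an illustrative reading of Algorithm 1, not the patent's listing: the data layout (one list of clusters per consecutive timestamp, each cluster a list of 2-D points), the names `closed_crowds`, `m_c`, `k_c` and `delta`, the brute-force range search, and the decision to report sufficiently long candidates that survive to the end of the data are all assumptions.

```python
import math

def hausdorff(P, Q):
    def directed(A, B):
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))

def closed_crowds(cluster_db, m_c, k_c, delta):
    """cluster_db[t] is the list of snapshot clusters at timestamp t."""
    closed, V = [], []              # closed crowds, current candidates
    for clusters in cluster_db:
        clusters = [c for c in clusters if len(c) >= m_c]
        matched, next_V = set(), []
        for cand in V:
            # Naive range search against the candidate's last cluster.
            hits = [i for i, c in enumerate(clusters)
                    if hausdorff(cand[-1], c) <= delta]
            if hits:
                matched.update(hits)
                next_V.extend(cand + [clusters[i]] for i in hits)
            elif len(cand) >= k_c:
                closed.append(cand)  # cannot be extended: closed by Lemma 1
        # Clusters that extend no candidate start new candidates.
        next_V.extend([c] for i, c in enumerate(clusters) if i not in matched)
        V = next_V
    # Assumption: long enough candidates alive at the end are reported too.
    closed.extend(cand for cand in V if len(cand) >= k_c)
    return closed

A0 = [(0.0, 0.0), (1.0, 0.0)]
A1 = [(0.5, 0.0), (1.5, 0.0)]
A2 = [(1.0, 0.0), (2.0, 0.0)]
B = [(100.0, 100.0), (101.0, 100.0)]
db = [[A0], [A1], [A2], [B]]
print(len(closed_crowds(db, m_c=2, k_c=3, delta=1.0)))  # 1
```

The slowly drifting cluster forms one closed crowd of length three; the distant cluster `B` starts a new candidate that never reaches the lifetime threshold.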

Clearly, the RangeSearch() procedure in Algorithm 1 is time-consuming. At the current timestamp, RangeSearch() searches the cluster set $C_t$ for the clusters whose Hausdorff distance to the query cluster is not greater than $\delta$. A naive implementation computes the Hausdorff distance to every cluster in $C_t$. Computing a single Hausdorff distance this way is already quadratic in the number of points, and the distance must be computed for every pair formed by a current crowd candidate and a cluster of the current time point, which is prohibitively expensive for large databases. To solve this problem, we devise spatial index techniques to organize the clusters and speed up the search.

R-tree index on clusters: we do not actually need the exact Hausdorff distance between two clusters; it is enough to know whether it is greater than $\delta$ or not. Let $M(c)$ denote the minimum bounding rectangle (MBR) of cluster $c$, and let $d_{min}(M_1, M_2)$ denote the minimum distance between two rectangles. Then the following lemma holds:

Lemma 2: Given two clusters $c_i$ and $c_j$, $d_{min}(M(c_i), M(c_j)) \le d_H(c_i, c_j)$.

Based on this lemma, we first retrieve the set of candidate clusters whose MBR lies at a minimum distance not greater than $\delta$ from the MBR of the query cluster, and then refine these candidates to obtain the final result. To find candidates more efficiently, we index the MBRs of the clusters in $C$ with an R-tree and issue a window query on it, where the window is the MBR of the query cluster enlarged by $\delta$. Clearly, clusters contained in nodes that do not overlap the window cannot be candidates.
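The filtering idea behind Lemma 2 can be sketched without an actual R-tree. The example below uses a linear scan as a stand-in for the window query; the helper names `mbr`, `mbr_min_distance` and `prune_by_mbr` are hypothetical, and the survivors would still need an exact Hausdorff check in the refinement step.

```python
import math

def mbr(points):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def mbr_min_distance(a, b):
    """Minimum distance between two axis-aligned rectangles."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

def prune_by_mbr(query, clusters, delta):
    """Keep clusters whose MBR is within delta of the query MBR."""
    q = mbr(query)
    return [c for c in clusters if mbr_min_distance(q, mbr(c)) <= delta]

query = [(0, 0), (1, 1)]
near = [(2, 0), (3, 1)]
far = [(10, 10), (11, 11)]
print(prune_by_mbr(query, [near, far], delta=1.5) == [near])  # True
```

Because the rectangle distance never exceeds the Hausdorff distance, any cluster rejected here is safely rejected; an R-tree simply lets whole subtrees be rejected with one such test.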

However, $d_{min}$ is a rather loose lower bound on the Hausdorff distance. The following lemma provides a tighter lower bound.

Lemma 3: Let $l_a(M)$ denote the $a$-th side of rectangle $M$ ($a = 1, 2, 3, 4$), and define the distance function $d_{side}$ as:

$d_{side}(M(c_i), M(c_j)) = \max_{a \in \{1,2,3,4\}} d_{min}(l_a(M(c_i)), M(c_j))$. Then $d_{side}(M(c_i), M(c_j)) \le d_H(c_i, c_j)$.

To retrieve candidates from the R-tree with $d_{side}$, the window query described above needs a slight modification: each side of the query MBR is enlarged by $\delta$, yielding four rectangles, one per side $a = 1, 2, 3, 4$. During the traversal of the R-tree, a node is examined further only if it intersects all four rectangles.

Grid index on clusters: although the R-tree index rules out many unqualified nodes and improves the performance of the discovery process, it still has three drawbacks:

a) an R-tree must be built or maintained at every time point, which can be costly;

b) because density-based clusters have arbitrary shapes, a rectangular bounding box cannot always capture the distribution of the points in a cluster, which weakens the pruning;

c) the refinement step still has to compute the Hausdorff distance by brute force for the candidate clusters.

To address these problems, we propose a grid-based index for clusters. As shown below, the grid index is easier to build because the clusters of all timestamps share the same grid structure. Pruning at the granularity of grid cells is more effective and follows the shape of the clusters more closely. In addition, the grid index enables a better refinement algorithm that can decide whether a cluster qualifies without computing the exact Hausdorff distance.

First, we partition the whole space with a grid in which every cell is a square with side length $\frac{\sqrt{2}}{2}\delta$. For each time point $t$, a single pass over the cluster set $C_t$ builds a grid index consisting of two data structures: a cell list for each cluster, which records the cells occupied by the cluster, and an inverted list for each cell, which stores the clusters that cover the cell. Before describing the algorithm, we define the affect region of a cell.

Definition 1 (affect region): Given the cell $g_{a,b}$ in row $a$ and column $b$ of grid $G$, its affect region is the set of cells whose minimum distance to $g_{a,b}$ is not greater than $\delta$. More precisely, for cells of side length $\frac{\sqrt{2}}{2}\delta$, $AR(g_{a,b}) = \{g_{a',b'} \mid |a' - a| \le 2,\ |b' - b| \le 2\}$.

Intuitively, the affect region of a cell $g$ may contain points whose distance to a point in $g$ is not greater than $\delta$. Given a query cluster $c_q$ (i.e. the last cluster of some crowd candidate) and the grid index of the next timestamp, the RangeSearch() procedure of Algorithm 1 works in a prune-and-refine manner, as follows:

In the pruning step, we take each cell $g$ in the cell list of $c_q$ and find the clusters whose cell lists intersect the affect region of $g$. Only clusters that cover the affect region of every cell of $c_q$ can be candidates; otherwise $c_q$ would contain at least one point that is farther than $\delta$ from the cluster.

In the refinement step, each candidate is verified to decide the final result. For a candidate $c$, we first join the cell lists of $c$ and $c_q$ to obtain their common cells. The rationale is that the distance between any two points in the same cell is not greater than $\delta$. In the extreme case where the two cell lists are identical, we can immediately conclude that $d_H(c, c_q) \le \delta$. We therefore only need to check the cells that appear in just one of the two cell lists. For a point $p$ in such a cell, say of $c_q$ without loss of generality, we compute its minimum distance to $c$. Only the distances between $p$ and the points that fall inside the affect region of the cell of $p$ need to be computed, because all other points are farther than $\delta$ from $p$.
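The grid index and its pruning step can be sketched as below. The sketch assumes a cell side of $\delta/\sqrt{2}$ (so that the diagonal of a cell equals $\delta$), which in turn makes the affect region the 5 by 5 block of cells around a cell; both choices, and all function names, are assumptions for illustration. Only the pruning step is shown, so the returned candidates would still have to pass the refinement step.

```python
import math
from collections import defaultdict

def cell_of(point, side):
    """Grid cell (column, row) that contains a 2-D point."""
    return (math.floor(point[0] / side), math.floor(point[1] / side))

def build_grid_index(clusters, delta):
    """Return per-cluster cell lists and the per-cell inverted lists."""
    side = delta * math.sqrt(2) / 2
    cell_lists = [{cell_of(p, side) for p in c} for c in clusters]
    inverted = defaultdict(set)
    for cid, cells in enumerate(cell_lists):
        for g in cells:
            inverted[g].add(cid)
    return cell_lists, inverted

def affect_region(cell):
    """Cells within two rows and two columns of `cell` (assumed radius)."""
    a, b = cell
    return {(a + da, b + db) for da in range(-2, 3) for db in range(-2, 3)}

def prune(query, delta, inverted):
    """Ids of clusters that touch the affect region of every query cell."""
    side = delta * math.sqrt(2) / 2
    candidates = None
    for g in {cell_of(p, side) for p in query}:
        near = set()
        for h in affect_region(g):
            near |= inverted.get(h, set())
        candidates = near if candidates is None else candidates & near
    return candidates or set()

clusters = [[(0.1, 0.1), (0.9, 0.2)], [(10.0, 10.0), (10.5, 10.2)]]
cell_lists, inverted = build_grid_index(clusters, delta=1.0)
query = [(0.2, 0.3), (1.1, 0.1)]
print(prune(query, 1.0, inverted))  # {0}
```

The same grid geometry is shared by all timestamps, so only the cell lists and inverted lists need to be rebuilt when the clusters change.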

(3) Gathering detection phase:

This phase checks whether each closed crowd obtained in the previous phase is, or contains, a closed gathering. The following algorithms are proposed for this phase:

Test-and-Divide algorithm (TAD): TAD efficiently detects all closed gatherings in a given crowd, as follows:

Algorithm 2 starts from the whole closed crowd and tests whether it is a gathering. If it is, it is a closed gathering, as has been proved, and can be returned immediately as a result. Otherwise, we identify the invalid clusters, which do not contain enough participants, and divide the crowd into several subsequences by removing these clusters (some subsequences are shorter than $k_c$ and are therefore not crowds). For every subsequence that is still a crowd, the steps are repeated, because some objects may have become non-participants once the invalid clusters were removed. The process recurses until no more crowds are found.
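The test-and-divide recursion can be sketched on plain sets before any bit-vector optimization. The crowd is modelled as a list of sets of object identifiers; the function name `tad` and the parameter names `k_c`, `k_p` and `m_p` are assumptions for the example.

```python
from collections import Counter

def tad(crowd, k_c, k_p, m_p):
    """Return the gatherings found by recursively testing and dividing."""
    if len(crowd) < k_c:
        return []                      # too short to be a crowd
    # Test: find participants, then the clusters with too few of them.
    counts = Counter(o for cluster in crowd for o in cluster)
    par = {o for o, n in counts.items() if n >= k_p}
    invalid = [i for i, c in enumerate(crowd) if len(c & par) < m_p]
    if not invalid:
        return [crowd]                 # every cluster is valid: a gathering
    # Divide: drop the invalid clusters and recurse on the pieces.
    results, start = [], 0
    for i in invalid + [len(crowd)]:
        results.extend(tad(crowd[start:i], k_c, k_p, m_p))
        start = i + 1
    return results

crowd = [{1, 2, 3}, {1, 2, 3}, {1, 2, 9}, {7, 8, 9}, {1, 2, 3}, {1, 2, 4}]
print(tad(crowd, k_c=2, k_p=3, m_p=2) == [crowd[:3]])  # True
```

In the example the fourth cluster has no participants and is removed; the first piece is a gathering, while the trailing piece is rejected because its objects no longer appear often enough once the crowd has been cut.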

Efficient implementation with bit vector signatures: a straightforward implementation of TAD counts the occurrences of every object in the crowd to decide whether it is a participant, and then checks the number of participants in every cluster of the crowd. The time complexity of doing so is $O(m \cdot n)$, where $m$ is the number of objects in the crowd and $n$ is the number of clusters. Worse still, these operations must be repeated for every sub-crowd produced along the way.

To implement TAD more efficiently, we build a bit vector signature (BVS) for every object of the crowd, so that all subsequent steps can be carried out with fast bitwise operations. Specifically, given a crowd $Cr$ of $n$ clusters, the BVS $B(o)$ of each object $o$ is a bit vector of length $n$ in which each bit indicates whether $o$ is present in the corresponding cluster. The BVSs of all objects in $Cr$ can be built in a single scan of the crowd. More importantly, the BVSs are built only once and reused throughout the recursion of TAD.

Next, we describe in detail how the test and divide procedures of Algorithm 2 are implemented with BVSs.

a) Test step. Let $B(o)$ denote the BVS of an object $o$. The test procedure essentially counts the number of 1 bits in $B(o)$, i.e. the Hamming weight of a bit vector. The simple approach is to iterate over all bits of $B(o)$, but there are more efficient ways, the best of which counts in a binary-tree pattern: we first obtain the number of 1s in every 2-bit slice of $B(o)$, then the number of 1s in every 4-bit slice, and so on. The following example obtains the Hamming weight of an 8-bit $B(o)$ in only three steps.

Let x = $B(o)$.

Let m1 = 01010101, and set x = (x AND m1) + ((x >> 1) AND m1).

Let m2 = 00110011, and set x = (x AND m2) + ((x >> 2) AND m2).

Let m4 = 00001111, and set x = (x AND m4) + ((x >> 4) AND m4).

The decimal value of x is now 4, which is exactly the number of 1 bits in $B(o)$. In the operations above, m1, m2 and m4 are called masks, and they can be defined appropriately once the length of the bit vector is known. In general, the Hamming weight of any $n$-bit vector can be obtained in no more than $\log_2(n)$ steps.
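The three masking steps can be checked with a short sketch. The input value `0b10110010` is a hypothetical 8-bit signature chosen for illustration (the original example value is not reproduced in the text), and `hamming_weight8` is an assumed name.

```python
def hamming_weight8(x):
    """Count the 1 bits of an 8-bit value with three mask-and-add steps."""
    m1, m2, m4 = 0b01010101, 0b00110011, 0b00001111
    x = (x & m1) + ((x >> 1) & m1)   # counts per 2-bit slice
    x = (x & m2) + ((x >> 2) & m2)   # counts per 4-bit slice
    x = (x & m4) + ((x >> 4) & m4)   # count for the whole byte
    return x

print(hamming_weight8(0b10110010))  # 4
```

Each step adds neighbouring partial counts in parallel, so the number of steps grows with the logarithm of the vector length rather than with the length itself.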

b) Divide step. In this step, a crowd that is not a gathering is divided into a series of subsequences, which essentially means dividing the bit vector of every object into a series of sub-vectors. It is not necessary to process the BVSs of non-participants, because a non-participant of a crowd remains a non-participant in every sub-crowd. Nor is it necessary to split a BVS physically; a mask can be used instead to extract the desired part of the original BVS. A mask is a bit vector of the same length as the BVS, with 1 at the positions that belong to the sub-crowd and 0 elsewhere. ANDing the original BVS with the mask yields a new BVS in which the bits of the desired sub-crowd keep their values and all other bits are 0. The divide procedure therefore only needs to return a series of masks, which are more compact than the subsequences themselves, and pass them to the test procedure for each subsequence. In this way, the test procedure obtains the BVSs of the objects restricted to each sub-crowd directly from the masks, which avoids rebuilding the BVSs for every sub-crowd.
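The masking idea in the divide step can be sketched with Python integers standing in for bit vector signatures. Here bit `i` of a signature is assumed to correspond to the `i`-th cluster of the crowd, and the helper names are hypothetical.

```python
def subcrowd_mask(start, end):
    """Mask with 1s at cluster positions start .. end-1 and 0s elsewhere."""
    return ((1 << (end - start)) - 1) << start

def restrict(bvs, mask):
    """Signature of the same object restricted to one sub-crowd."""
    return bvs & mask

bvs = 0b101101                      # object present in clusters 0, 2, 3 and 5
mask = subcrowd_mask(2, 5)          # sub-crowd made of clusters 2, 3 and 4
print(bin(restrict(bvs, mask)))     # 0b1100
print(bin(restrict(bvs, mask)).count("1"))  # 2 occurrences inside the sub-crowd
```

The original signature is never modified, so the same `bvs` can be combined with a different mask at every level of the recursion.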

In real applications, trajectories usually arrive incrementally, so the latest batch of trajectory data has to be added to the database periodically (e.g. daily, weekly or monthly). Specifically, consider a trajectory database over a given time domain. After a new batch of trajectories covering a subsequent time domain has been collected and added to the database, we obtain an extended time domain and the corresponding updated database.

A major challenge of such incremental updates is that closed crowds found in the database before the update may no longer be closed afterwards, because they may be extended by clusters of the new batch. Consequently, if the crowd in which a closed gathering resides is extended, the gathering may change as well. To obtain correct results over time, a simple solution is to apply the techniques described above to the entire updated database. As the database grows, however, the cost of this approach grows too and eventually becomes unaffordable. To address this, we propose an incremental algorithm that produces the new closed gatherings efficiently by making full use of the crowds and gatherings already found in the previous database.

1) Crowd extension. Lemma 4 shows that only some of the crowds (or crowd candidates) in the previous database can be extended.

Lemma 4: Given a closed crowd in the previous database, if its last cluster is not at the most recent time point of that database, i.e. the last cluster does not belong to the set of clusters at that time point, then the crowd cannot be extended in the updated database.

Based on this lemma, we only need to consider the set CS of cluster sequences of the previous database that end at its last time point, and check whether they can be extended into new crowds. These cluster sequences include both closed crowds and crowd candidates (whose length is still less than $k_c$) of the previous database. To this end, Algorithm 1 is modified slightly so that it saves the crowd candidates and closed crowds that end at the last timestamp. Then, after the new set of trajectories has been received and converted into a database of clusters, Algorithm 1 is run with the following modifications: the time cursor is set to the start of the new batch, and the current set of crowd candidates is initialized to CS instead of $V$.

2) Gathering update. Suppose a crowd of the previous database (the old crowd) has been extended into a new closed crowd of the updated database (the new crowd). The goal is to find the closed gatherings in the new crowd. The general approach is to run TAD on the new crowd again. However, some gatherings were already detected in the old crowd, and using them wisely speeds up the discovery; the optimization pays off most when the old crowd makes up a large part of the new crowd. As before, we first build the BVS for every object of the new crowd and then run the test procedure to detect invalid clusters. The following lemma shows that some clusters that were invalid in the old crowd may become valid in the new crowd.

Lemma 5: Let IC of a crowd denote the set of its invalid clusters. Then every cluster of the old crowd that is invalid in the new crowd is also invalid in the old crowd. Because new clusters have joined, some non-participants of the old crowd may become participants of the new crowd. In other words, gatherings of the old crowd may expand or merge with neighbouring gatherings in the new crowd. However, if some invalid clusters of the new crowd also belong to the old crowd, all closed gatherings of the old crowd that precede them are guaranteed to remain unchanged in the new crowd. More precisely, we have the following theorem:

Theorem 2: Given a cluster of the old crowd at time $t$ that is invalid in the new crowd, any closed gathering of the old crowd that ends before $t$ remains a closed gathering of the new crowd.

Building on Algorithm 2, the original TAD algorithm can be improved by optimizing the discovery of gatherings in the new crowd. After the test step has produced the set IC of invalid clusters, we find the latest invalid cluster that lies before the end of the old crowd, i.e. an invalid cluster of the old part such that no later invalid cluster exists in the old part. Theorem 2 ensures that the closed gatherings before this cluster are the same as before. Therefore only the sub-crowd after this cluster needs to be examined further, because it contains the new or updated gatherings.

Depending on the practical situation, the algorithms above can be combined to find all closed gatherings in a trajectory database and to compute the latest results efficiently after every database update.

The gathering model proposed by the present invention overcomes a shortcoming of earlier group patterns, which require the same individuals to be present throughout the lifetime of the pattern: members may join and leave a gathering at any time.

We also carried out extensive experiments on a real trajectory database to verify the effectiveness and efficiency of the proposed concepts and algorithms. The experimental data consist of about 120K trajectories generated by more than 33,000 taxis in Beijing over 3 months (March, April and May 2009). The time domain is discretized in minutes, giving 132,480 time points (60*24*92). The experimental results are as follows:

(1) Effectiveness

A day is divided into three periods: peak time (6:00-10:00 am and 5:00-8:00 pm), work time (10:00 am-5:00 pm), and rest time (8:00 pm-5:00 am). Figure 1 shows the average numbers of the four patterns (crowds, gatherings, swarms and convoys) in these three periods. It can be seen that:

1) gatherings are most frequent in peak time and less frequent in the other periods;

2) although there are many crowds in rest time, only a small fraction of them become gatherings;

3) there are more swarms and convoys in peak time and rest time than in work time.

According to the weather, the 92 days are divided into three groups: clear days, rainy days and snowy days. The results are shown in Figure 2:

1) gatherings are fewest on clear days and most frequent on snowy days;

2) on snowy days, the numbers of crowds and gatherings differ considerably;

3) the number of swarms is insensitive to the weather, whereas convoys are fewest on snowy days.

(2) Efficiency

Performance of the crowd discovery algorithm:

Three pruning schemes are compared: a) SR, simple R-tree based pruning (using the minimum MBR distance of Lemma 2); b) IR, improved R-tree based pruning (using the tighter bound of Lemma 3); c) GRID, grid-based pruning. The results are as follows:

1) Figures 3, 4 and 5 together show that IR clearly improves on the pruning of SR, and that GRID further improves on IR and outperforms SR by at least an order of magnitude;

2) Figure 3 shows that the running time of all schemes decreases as the parameter varied in that figure increases;

3) Figure 4 shows that the performance of all schemes degrades as the parameter varied in that figure increases;

4) Figure 5 shows that the running time of all schemes grows with the size of the database, but grid-based pruning is the least sensitive to the database size.

Performance of the gathering detection algorithm:

Three algorithms for detecting gatherings are compared: a) the brute-force method; b) TAD; c) TAD*, i.e. TAD implemented with bit vector signatures. The results are as follows:

1) Figures 6, 7 and 8 show that TAD outperforms the brute-force method by one to two orders of magnitude, and that TAD* improves on TAD by a further 30%;

2) Figure 6 shows that, as the parameter varied in that figure increases, the running time of the brute-force method rises sharply, whereas that of TAD and TAD* first rises slowly and then falls;

3) Figure 7 shows that, as the parameter varied in that figure increases, the running time of the brute-force method rises sharply, whereas that of TAD and TAD* first rises slowly and then falls;

4) Figure 8 shows that, as the parameter varied in that figure increases, the running time of the brute-force method rises almost exponentially; the performance of TAD and TAD* also degrades, but much less, and the benefit of TAD* is greater at larger values.

Performance of the incremental algorithm:

The results are as follows:

1) Figure 9 shows that the running time of the rebuild method grows as the time domain $|T_{DB}|$ expands; as the database keeps growing, the cost of rebuilding will eventually become very high, whereas the time spent by the crowd extension algorithm stays almost constant;

2) Figure 10 shows that changing the variable r does not affect the rebuild method, whereas the gathering update algorithm becomes more efficient as r increases. Here r is the proportion of the old crowd within the updated crowd, and the running times of the two algorithms are measured for different values of r. The crowd extension algorithm does not depend on r, but r does affect the running time of the gathering update;

3) we also experimented with the other parameters in Figure 2 and Figure 3, and the results are similar.

The above are merely embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made on the basis of this description, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims

1. A method for quickly discovering gathering patterns from movement trajectory data, characterized in that it comprises the following steps:

(1) Snapshot clustering phase:

Preset definition of a gathering: a crowd Cr is called a gathering if and only if every snapshot cluster of Cr contains at least m_p participants, i.e. $\forall c_t \in Cr,\ |\{o \mid o(t) \in c_t,\ o \in Par(Cr)\}| \ge m_p$; if Cr is a gathering and no super-crowd of Cr is a gathering, the gathering is said to be closed; here a snapshot cluster is a group of objects of arbitrary shape and size, o is the trajectory of a moving object, t is a time point in the time domain of the database, o(t) is the position of the moving object o at time t, and Par(Cr) is the set of participants of the crowd Cr;

Preset Definition 2: given the set of trajectories of moving objects, a support threshold $m_c$, a variation threshold $\delta$, and a lifetime threshold $k_c$, a crowd Cr is a sequence of snapshot clusters at consecutive timestamps, i.e. $Cr = \langle c_{t_i}, c_{t_{i+1}}, \ldots, c_{t_j} \rangle$, that meets the following requirements: the lifetime of Cr, denoted $Cr.\tau$, is not less than $k_c$, i.e. $Cr.\tau \ge k_c$; at any time there are at least $m_c$ objects, i.e. $\forall c_t \in Cr,\ |c_t| \ge m_c$; the distance between any two consecutive snapshot clusters is not greater than $\delta$, i.e. $d_H(c_{t_l}, c_{t_{l+1}}) \le \delta$ for all $i \le l < j$;

Since a snapshot cluster is essentially a set of points, given two point sets P and Q, the Hausdorff distance between P and Q is defined as $d_H(P, Q) = \max\{\max_{p \in P} \min_{q \in Q} d(p, q),\ \max_{q \in Q} \min_{p \in P} d(p, q)\}$;

At each time point of the time domain of the database, the trajectories of the moving objects are clustered by density to find all snapshot clusters: the original trajectories are first simplified with a curve data compression algorithm, clustering is then performed on the resulting line segments, the objects contained in each cluster of line segments may form snapshot clusters at some time points, and the snapshot cluster database is output;

(2) Crowd discovery phase, i.e. finding all closed crowds in the snapshot cluster database:

(2.1) Lemma 1: for a crowd $Cr = \langle c_{t_i}, \ldots, c_{t_j} \rangle$, if there is no cluster $c_{t_{j+1}}$ such that appending $c_{t_{j+1}}$ to Cr yields a new crowd, then Cr is a closed crowd; otherwise Cr is not closed; here $c_{t_i}$ is the snapshot cluster of Cr at time t_i and $c_{t_j}$ is the snapshot cluster of Cr at time t_j;

(2.2) Closed crowds are discovered by appending, at the next time point, snapshot clusters to the current set V of crowd candidates, using the basic RangeSearch method, the R-tree index method or the grid index method; a crowd candidate is a sequence of clusters that may grow into a crowd;

Basic RangeSearch method: at the current timestamp, RangeSearch() searches the cluster set for the clusters whose Hausdorff distance to the query cluster is not greater than the variation threshold; it is implemented by computing the Hausdorff distance for every pair formed by a current crowd candidate and a cluster of the current time point, and returning all clusters whose distance is not greater than the threshold;

(3) The aggregation detection phase:

Use the test-and-divide algorithm (TAD) or the bit-vector-signature TAD algorithm to determine whether each closed group obtained in the previous step is a closed aggregate or contains closed aggregates.

2. The method for quickly discovering aggregation patterns from moving trajectory data according to claim 1, wherein the specific steps of using the RangeSearch method to find closed groups by appending the snapshot clusters of the next time point to the current set V of group participants comprise:

Obtain the database of snapshot clusters $C_{DB}$, and preset the support threshold $m_c$ of a group, the lifetime threshold $k_c$ of a group, and the variation threshold $\delta$ used in the definition of a group;

At each timestamp $t$, check the last cluster $c_{t-1}$ of each group participant $Cr$ and judge whether $Cr$ can be extended by appending another cluster. Obtain the set of clusters $C_t$ at the current timestamp, use the Hausdorff distance formula to compute the distance from $c_{t-1}$ to each cluster in $C_t$, and collect the set $C'_t \subseteq C_t$ of clusters whose Hausdorff distance from $c_{t-1}$ is not greater than $\delta$. If a cluster $c_t \in C'_t$ is found, $Cr$ can be extended, and the extended group is inserted into the set V of group participants as a new participant. If no such cluster is found, $Cr$ cannot be extended: when the lifetime of $Cr$ is not less than $k_c$, then by Lemma 1 the group $Cr$ is a closed group; when the lifetime of $Cr$ is less than $k_c$, $Cr$ is not a group. At any timestamp, a cluster that cannot be appended to any existing group participant is treated as a new group participant.
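The extension loop of this claim can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: `cluster_db` is assumed to be a list with one entry per consecutive timestamp, each entry a list of clusters (lists of 2-D points); clusters below the support threshold are dropped up front; and candidates still alive when the data ends are reported as closed if they are long enough. All function names are hypothetical.

```python
import math

def hausdorff(P, Q):
    def directed(A, B):
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(directed(P, Q), directed(Q, P))

def range_search(last_cluster, clusters_t, delta):
    """Indices of clusters within Hausdorff distance delta (pairwise scan)."""
    return [i for i, c in enumerate(clusters_t)
            if hausdorff(last_cluster, c) <= delta]

def discover_closed_groups(cluster_db, m_c, delta, k_c):
    V, closed = [], []            # V: current group participants (candidates)
    for clusters_t in cluster_db:
        # Assumption: enforce the support threshold before matching.
        clusters_t = [c for c in clusters_t if len(c) >= m_c]
        next_V, attached = [], set()
        for cr in V:
            hits = range_search(cr[-1], clusters_t, delta)
            for i in hits:                       # extend the candidate
                next_V.append(cr + [clusters_t[i]])
                attached.add(i)
            if not hits and len(cr) >= k_c:      # closed by Lemma 1
                closed.append(cr)
        # Clusters attached to no candidate start new candidates.
        next_V.extend([c] for i, c in enumerate(clusters_t)
                      if i not in attached)
        V = next_V
    # Assumption: at the end of the data, long-enough candidates are closed.
    closed.extend(cr for cr in V if len(cr) >= k_c)
    return closed
```

Keeping the surviving candidates in `V` between timestamps is what later allows the search to resume when new trajectories arrive, instead of starting again from the first timestamp.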

3. The method for quickly discovering aggregation patterns from moving trajectory data according to claim 1, wherein the specific steps of using the R-tree index method to find closed groups by appending the snapshot clusters of the next time point to the current set V of group participants comprise:

Obtain the database of snapshot clusters $C_{DB}$, and preset the support threshold $m_c$ of a group, the lifetime threshold $k_c$ of a group, and the variation threshold $\delta$ used in the definition of a group;

Let $M(c)$ denote the minimum bounding rectangle (MBR) of a cluster $c$, and let $d_{min}(M_i, M_j)$ denote the minimum distance between two rectangles. Predefine Lemma 2: given two clusters $c_i$ and $c_j$ in C, $d_{min}(M(c_i), M(c_j)) \le d_H(c_i, c_j)$;

Obtain the last cluster $c_{t-1}$ of each group participant $Cr$; for each such cluster, the Hausdorff distance formula is used to compute the distance from $c_{t-1}$ to the clusters in the set of clusters $C_t$;

Using Lemma 2, when searching for the clusters within distance $\delta$ of $c_{t-1}$, first retrieve from the set of clusters $C_t$ a set of candidate clusters whose MBRs have a minimum distance to $M(c_{t-1})$ not greater than $\delta$, and then refine these candidates to find all clusters whose Hausdorff distance from $c_{t-1}$ is not greater than $\delta$. An R-tree is used to index the MBRs of the clusters in the set C of clusters, and a query window is built on the R-tree: the window is $M(c_{t-1})$ enlarged by $\delta$. Clusters contained in a node that does not overlap the window are not candidates;
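The filter step based on Lemma 2 can be illustrated without an actual R-tree. The sketch below assumes rectangles are stored as `(xmin, ymin, xmax, ymax)` tuples and replaces the R-tree traversal with a linear scan over MBRs, so it shows only the pruning condition, not the index; all names are hypothetical.

```python
import math

def mbr(points):
    """Minimum bounding rectangle of a point set as (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def mbr_min_dist(a, b):
    """Minimum distance between two axis-aligned rectangles (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)   # horizontal gap
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)   # vertical gap
    return math.hypot(dx, dy)

def candidate_clusters(last_cluster, clusters_t, delta):
    """Keep clusters whose MBR lies within delta of the query MBR.
    A linear scan stands in for the R-tree window query."""
    m = mbr(last_cluster)
    return [c for c in clusters_t if mbr_min_dist(m, mbr(c)) <= delta]
```

Because the MBR distance never exceeds the Hausdorff distance, a cluster rejected here can be discarded safely; the survivors still need the exact Hausdorff check.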

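Predefine Lemma 3: let $l_a(M)$ denote the a-th side of the rectangle $M$, $a \in \{1, 2, 3, 4\}$, and define the distance function $d_{side}$ as: $d_{side}(M_i, M_j) = \max_{a} d_{min}(l_a(M_i), M_j)$; then $d_{side}(M(c_i), M(c_j)) \le d_H(c_i, c_j)$. This bound is used to compute the set of snapshot clusters whose distance from the last cluster of $Cr$ is not greater than $\delta$;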

Use $d_{side}$ to retrieve candidate clusters from the R-tree, and then refine these candidates to obtain the set of clusters satisfying Lemma 3. The R-tree indexes the MBRs of the clusters in C, and the query window is built on the R-tree by expanding each side $l_a(M(c_{t-1}))$ of the query MBR by $\delta$, which gives four rectangles, one per side, $a \in \{1, 2, 3, 4\}$. During the traversal of the R-tree, a node is checked further only when it intersects all four rectangles;
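The side-based bound is tighter than the plain MBR distance because every side of an MBR touches at least one point of its cluster. The sketch below assumes that the side distance is the maximum, over the four sides of the first rectangle, of the minimum distance from that side to the second rectangle; that reading and the function names are assumptions made for illustration.

```python
import math

def rect_min_dist(a, b):
    """Minimum distance between rectangles given as (xmin, ymin, xmax, ymax)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

def sides(m):
    """The four sides of a rectangle, each as a degenerate rectangle."""
    xmin, ymin, xmax, ymax = m
    return [
        (xmin, ymin, xmin, ymax),  # left
        (xmax, ymin, xmax, ymax),  # right
        (xmin, ymin, xmax, ymin),  # bottom
        (xmin, ymax, xmax, ymax),  # top
    ]

def side_dist(mi, mj):
    """Assumed form of the side distance: the farthest side of mi from mj."""
    return max(rect_min_dist(s, mj) for s in sides(mi))

def passes_side_filter(mi, mj, delta):
    # Equivalent to requiring mj to intersect all four sides of mi
    # after each side has been expanded by delta.
    return side_dist(mi, mj) <= delta
```

For two unit squares whose nearest edges are 2 apart, the plain MBR distance is 2 while the side distance is 3, so a threshold of 2.5 prunes the pair only under the side-based test.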

Check the last cluster of each group participant to see whether it can be extended by appending another cluster. If it can, the extended group participant is inserted into the set V of group participants as a new participant. If it cannot be extended and the lifetime of $Cr$ is not less than the lifetime threshold $k_c$, then by Lemma 1 the group $Cr$ is a closed group; if it cannot be extended and the lifetime of $Cr$ is less than $k_c$, then $Cr$ is not a group. At any timestamp, a cluster that cannot be appended to any existing group participant is treated as a new group participant.

4. The method for quickly discovering aggregation patterns from moving trajectory data according to claim 1, wherein the specific steps of using the grid index method to find closed groups by appending the snapshot clusters of the next time point to the current set V of group participants comprise:

Define the area of influence: for a cell $g_{a,b}$ located in row a and column b of a grid G, its area of influence is the set of cells whose minimum distance to $g_{a,b}$ is not greater than the variation threshold $\delta$;

First, use the grid G to divide the entire space into cells g, each cell being a square with side length $\frac{\sqrt{2}}{2}\delta$. For each timestamp t, after scanning the set of clusters $C_t$, build a grid index with two data structures: a cell list for each cluster in $C_t$, which records the cells occupied by the cluster, and an inverted list for each cell g, which stores the clusters that cover the cell;
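The two structures of the grid index can be sketched with plain dictionaries. The sketch assumes a cell side of $\frac{\sqrt{2}}{2}\delta$, the largest side for which any two points in one cell are at most $\delta$ apart; with that side, the cells within minimum distance $\delta$ of a cell form the 5 x 5 block centred on it. Cluster identifiers, function names and the `(row, column)` cell key are illustrative choices.

```python
import math
from collections import defaultdict

def build_grid_index(clusters, delta):
    """clusters: dict mapping a cluster id to a list of (x, y) points.
    Returns (cell_lists, inverted): the cells occupied by each cluster,
    and for each cell the ids of the clusters covering it."""
    side = delta * math.sqrt(2) / 2           # assumed cell side length
    cell_lists = {}
    inverted = defaultdict(set)
    for cid, points in clusters.items():
        cells = {(math.floor(y / side), math.floor(x / side))
                 for x, y in points}          # (row, column) of each point
        cell_lists[cid] = cells
        for g in cells:
            inverted[g].add(cid)
    return cell_lists, inverted

def area_of_influence(cell, reach=2):
    """Cells whose minimum distance to `cell` is at most delta.
    With side sqrt(2)/2 * delta this is the block of cells at most
    `reach` = 2 rows and columns away."""
    a, b = cell
    return {(a + i, b + j)
            for i in range(-reach, reach + 1)
            for j in range(-reach, reach + 1)}
```

A query cluster can then be matched against candidates purely by set operations on cell keys, before any point-to-point distance is computed.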

The area of influence of a cell g consists of the cells containing points whose distance from some point in g is no greater than $\delta$. The inputs are the last cluster of the group $Cr$, used as the query cluster, and the grid index of the next timestamp;

Pruning phase: for each cell g in the cell list of the query cluster, find the clusters whose cell lists intersect the area of influence of g. Only a cluster that covers the area of influence of every cell of the query cluster can be a candidate; otherwise the query cluster contains at least one point whose distance to that cluster is greater than $\delta$;

Refinement stage: since the distance between any two points in the same cell is never greater than $\delta$, the cells shared by the two clusters need not be examined; only the cells that are not shared need to be examined to find the clusters in the set of clusters whose Hausdorff distance from the query cluster is no greater than $\delta$. For each candidate cluster, first intersect its cell list with the cell list of the query cluster to obtain their common cells. Then, for each point p outside the common cells, compute the minimum distance from p to the other cluster, considering only the points that fall within the area of influence of the cell containing p. The result gives the Hausdorff distance between the two clusters;

If a cluster is found whose Hausdorff distance from the last cluster of $Cr$ is not greater than $\delta$, then $Cr$ can be extended, and the extended group is inserted into the set V of group participants as a new participant. If no such cluster is found, $Cr$ cannot be extended, and when the lifetime of $Cr$ is not less than the lifetime threshold $k_c$, then by Lemma 1 the group $Cr$ is a closed group.

5. The method for quickly discovering aggregation patterns from moving trajectory data according to claim 1, wherein the specific steps of using the test-and-divide algorithm to determine whether each closed group obtained in the previous step is a closed aggregate or contains closed aggregates comprise:

Preset Definition 3: given a group $Cr$, an object o is called a participant if and only if it appears in at least $k_p$ snapshot clusters of $Cr$. Let $Cr(o)$ denote the collection of snapshot clusters in $Cr$ that contain the object o, i.e. $Cr(o) = \{c_t \mid c_t \in Cr,\ o \in c_t\}$; then the participants of $Cr$ are the collection of objects $Par(Cr) = \{o \mid |Cr(o)| \ge k_p\}$;

Using the test-and-divide algorithm, start the test from the whole closed group: determine for each object, according to Definition 3, whether it is a participant of the group $Cr$, and then count the participants in each snapshot cluster of the group. Each closed group is tested against the predefined definition of an aggregate. If it is not an aggregate, the invalid clusters, i.e. the clusters that do not have enough participants, are identified, and the group is divided into several subsequences by removing these clusters. For each subsequence that is still a group, the test is repeated, until no further groups are found.
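The test and divide steps can be written as a short recursion. The sketch assumes each snapshot cluster is a set of object identifiers, that a group is an aggregate when every cluster contains at least `m_p` participants (the threshold name `m_p` is an assumption, since the aggregate definition is not restated here), and that a subsequence counts as a group when it still has at least `k_c` clusters.

```python
def participants(group, k_p):
    """Objects that appear in at least k_p snapshot clusters of the group."""
    counts = {}
    for cluster in group:
        for o in cluster:
            counts[o] = counts.get(o, 0) + 1
    return {o for o, n in counts.items() if n >= k_p}

def tad(group, k_c, k_p, m_p):
    """Test-and-divide: return the maximal subsequences of `group` in which
    every cluster has at least m_p participants (assumed aggregate test)."""
    if len(group) < k_c:                  # too short to be a group
        return []
    par = participants(group, k_p)
    invalid = [i for i, c in enumerate(group) if len(par & c) < m_p]
    if not invalid:                       # test passed
        return [group]
    result, start = [], 0
    for i in invalid + [len(group)]:      # divide at the invalid clusters
        result.extend(tad(group[start:i], k_c, k_p, m_p))
        start = i + 1
    return result
```

The recursion is needed because removing clusters changes the appearance counts: an object that was a participant of the whole group may no longer be one in a shorter subsequence.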

6. The method for quickly discovering aggregation patterns from moving trajectory data according to claim 1, wherein the specific steps of using the bit-vector-signature TAD algorithm to determine whether each closed group obtained in the previous step is a closed aggregate or contains closed aggregates comprise:

a) Construct a bit vector signature (BVS) for the trajectory of each moving object o of the group $Cr$. Each BVS is a bit vector of length n, and each bit of the vector indicates whether o is present in the corresponding cluster. The BVSs of all objects in the group $Cr$ can be constructed in a single scan of the group, and a BVS needs to be constructed only once to be used in all recursive calls of TAD;
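A single scan of the group is enough to build all signatures. The sketch below assumes clusters are sets of object identifiers and stores each BVS in a Python integer, with bit i set when the object is present in the i-th snapshot cluster; the function name is hypothetical.

```python
def build_bvs(group):
    """Return a dict mapping each object to its bit vector signature.
    Bit i of the signature is 1 if the object is in the i-th cluster."""
    signatures = {}
    for i, cluster in enumerate(group):      # one pass over the group
        for o in cluster:
            signatures[o] = signatures.get(o, 0) | (1 << i)
    return signatures
```

For example, an object present in the first and third of three clusters gets the signature `0b101`.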

b) Test step: let $B(o)$ denote the BVS of an object o. To test the group $Cr$, count the number of 1 bits in $B(o)$, i.e. the Hamming weight of the bit vector. The Hamming weight is obtained either by traversing all bits of the vector or by a counting method based on a binary tree pattern. In the binary tree method, first obtain the number of 1s in every 2 bits, then in every 4 bits (i.e. $2^2$), and so on until the m-th step, where $2^m = n$, which gives the number of 1s in all n bits; the Hamming weight of any n-bit vector can therefore be obtained in $\log_2 n$ steps. A mask is also set, the mask being a bit vector of the same length as the BVS. When the Hamming weight is greater than or equal to the threshold $k_p$, the object is counted as a participant, and the participant counts determine whether the closed group is a closed aggregate or contains a closed aggregate;
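The binary tree counting scheme adds neighbouring bit fields pairwise, doubling the field width at each step. The following sketch assumes the vector is held in a non-negative Python integer and that n is a power of two; `hamming_weight` is an illustrative name.

```python
def hamming_weight(v, n=64):
    """Number of 1 bits in an n-bit vector, by pairwise (binary tree)
    summation. n must be a power of two and 0 <= v < 2**n."""
    width = 1
    while width < n:
        # Mask selecting the low `width` bits of every 2*width-bit field.
        mask = 0
        for shift in range(0, n, 2 * width):
            mask |= ((1 << width) - 1) << shift
        # Add the two halves of every field in parallel.
        v = (v & mask) + ((v >> width) & mask)
        width *= 2
    return v
```

For n = 64 the loop body runs six times, compared with 64 iterations for a bit-by-bit traversal; in a fixed-width implementation the masks would be precomputed constants.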

c) Division step: if a group is not an aggregate, it is divided into a series of subsequences, i.e. the bit vector of the trajectory of each moving object is divided into a series of sub-vectors, each extracted from the original BVS with a mask. In the mask, the bits corresponding to the subgroup are 1 and all other bits are 0. Performing an AND operation on the original BVS and the mask yields a new BVS in which the bits of the desired subgroup keep their values and all other bits are 0. The division step therefore returns a series of masks, and each mask is passed to the test step of its subgroup, which uses the mask directly to obtain the BVS of an object restricted to that subgroup.
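Restricting a signature to a subgroup is a single AND with the subgroup's mask. The sketch below assumes bit i corresponds to the i-th snapshot cluster and that a subgroup covers the cluster positions `start` to `end - 1`; the names and the use of `bin(...).count("1")` for the Hamming weight are illustrative choices.

```python
def subgroup_mask(start, end):
    """Mask with bits start..end-1 set to 1 and all other bits 0."""
    return ((1 << (end - start)) - 1) << start

def is_participant(bvs, mask, k_p):
    """Test an object against a subgroup without rebuilding its BVS."""
    restricted = bvs & mask                  # keep only the subgroup's bits
    return bin(restricted).count("1") >= k_p
```

With a signature `0b110111` (present in clusters 0, 1, 2, 4 and 5) and `k_p = 3`, the object is a participant of the subgroup covering clusters 0 to 2 but not of the one covering clusters 4 and 5.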

7. The method for quickly discovering aggregation patterns from moving trajectory data according to claim 1, wherein trajectories of moving objects are periodically added to the database: let $O_{DB}$ be the database of moving-object trajectories over its time domain; after the new trajectories collected over a later time domain are added to $O_{DB}$, the extended time domain is obtained, which corresponds to the updated database.

8. The method for quickly discovering aggregation patterns from moving trajectory data according to claim 6 or 7, wherein, after the database is updated, the specific steps of finding the newly grown closed aggregates comprise:

1) Extension: predefine Lemma 4: given a closed group $Cr$ in the previous cluster database, if its last cluster is not in the set of clusters at the most recent time point of that database, then $Cr$ cannot be extended in the updated database. Lemma 4 shows that only some of the groups and group participants of the previous database are extendable;

Determine whether the cluster sequences in the set CS, i.e. the sequences that end at the last time point of the previous database, can be extended into new groups. CS contains the closed groups of the previous database and the group participants whose length is less than k;

After the closed groups are found in step (2), save the group participants and the closed groups that end at the last timestamp. Then, after a new set of trajectories is received and converted into a database of snapshot clusters, move the time cursor to the first time point of the new data and change the current set of group participants from V to CS;

2) Aggregate update: suppose a group of the previous database has been extended into a new closed group in the updated database; to find the closed aggregates in the new closed group, the TAD algorithm or the bit-vector TAD algorithm can be applied to it directly;

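Predefine Lemma 5: let $IC(Cr)$ denote the set of invalid clusters of a group $Cr$. If $Cr$ is a group of the previous database and $Cr'$ is the group obtained by extending it in the updated database, then the clusters of $Cr$ that are invalid in $Cr'$ form a subset of $IC(Cr)$, because with the new clusters added in $Cr'$ some non-participants of $Cr$ may become participants; that is, the aggregates in $Cr$ may either expand or merge with neighbouring aggregates in $Cr'$. If a cluster $c_{t_j}$ of $Cr$ is found to be an invalid cluster of $Cr'$, then all closed aggregates before $t_j$ remain unchanged. That is, Theorem 2: given an invalid cluster $c_{t_j} \in IC(Cr')$ with $c_{t_j} \in Cr$, any closed aggregate of $Cr$ that lies before $t_j$ remains closed in $Cr'$;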

Test step: when using the bit-vector TAD algorithm, first construct a BVS for each moving object of the extended group and detect the invalid clusters;

After the set IC of invalid clusters is obtained in the test step, find the latest invalid cluster that lies within the previous database, i.e. the invalid cluster $c_{t_j}$ for which there is no invalid cluster $c_{t_{j'}}$ of the previous database with $t_{j'} > t_j$. Only the subgroups after $c_{t_j}$ need to be examined further, because only they can contain new or updated aggregates.