CN111459997A

CN111459997A - Frequent mode increment mining method of space-time trajectory data and electronic equipment

Info

Publication number: CN111459997A
Application number: CN202010183569.1A
Authority: CN
Inventors: 钱塘文; 徐勇军; 吴�琳; 邵泽志; 余泳
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2020-03-16
Filing date: 2020-03-16
Publication date: 2020-07-28
Anticipated expiration: 2040-03-16
Also published as: CN111459997B

Abstract

An embodiment of the invention provides an incremental frequent pattern mining method for spatiotemporal trajectory data, and an electronic device. The method is used to mine spatiotemporal trajectory data of a ship or an aircraft. Track points are first clustered into a plurality of clusters, each of which comprises a cluster center and a cluster range, and each cluster center is given an identifier. Track points whose longitude and latitude coordinates fall within the cluster range of a cluster are extracted and represented by the identifier of that cluster's center, yielding a mapped track expressed as a sequence of cluster-center identifiers. Frequent pattern mining is then performed on the mapped track. The method is therefore robust to interference, makes it easier to identify the routes a target repeats and to discover the regularities in its movement, and so supports accurate trajectory prediction or services for relevant users.

Description

Frequent mode increment mining method of space-time trajectory data and electronic equipment

Technical Field

The invention relates to the technical field of information, in particular to the processing of spatiotemporal trajectory data of discrete targets such as ships and aircraft, and more particularly to an incremental frequent pattern mining method for spatiotemporal trajectory data and an electronic device.

Background

Trajectory data is the set of points that a moving object occupies in space and time. The data for each point generally includes longitude and latitude, sampling time, and information about the object. Spatiotemporal trajectory data mining is a computational process that uses methods at the intersection of artificial intelligence, machine learning, statistics, and database systems to discover patterns in relatively large data sets. In general, the goal of mining is to extract information from a large data set and translate it into motion patterns of objects, revealing the laws that govern their motion. The mining results can also support subsequent trajectory prediction and anomaly detection.

Trajectory data mining is typically accomplished through frequent pattern mining. A frequent pattern is a set, sequence, or substructure of items (representing a subset of the data) that occurs frequently in a data set. Frequent pattern mining is an important foundation of data mining research: it identifies the sequences that often appear together in an ordered data set, and thereby provides support for decision making.

The frequent pattern of spatiotemporal trajectory data is the path a moving object often takes. This path is called the frequent pattern, classical trajectory, or classical trajectory template of the object. Classical frequent pattern mining algorithms include the Apriori association analysis algorithm (Apriori), the frequent pattern growth algorithm (FP-growth), and the prefix-projected sequential pattern mining algorithm (PrefixSpan). The three algorithms are briefly described below:

The Apriori algorithm uses a bottom-up approach: it first identifies the individual items in the database whose occurrence count satisfies a minimum support threshold, and then extends these frequent items step by step into larger item sets for as long as the extended sets remain sufficiently frequent.

When the Apriori algorithm generates candidate sets, it must traverse the database repeatedly, which creates an I/O bottleneck. To remove this bottleneck, the FP-growth algorithm stores frequent patterns in a purpose-built tree structure (the frequent pattern tree, or FP-tree). It can mine the frequent patterns with only two passes over the database, which is a substantial improvement over the Apriori algorithm.

Unlike the first two algorithms, the PrefixSpan algorithm takes the order of the frequent items into account. It does not generate candidate sequences during mining, and the suffix projected databases (Projected Database) it generates shrink continuously as mining proceeds. Because no candidate sequences are generated and the database I/O overhead is low, PrefixSpan is an efficient frequent pattern mining algorithm.
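As an illustration of the prefix-projection idea only, the following minimal Python sketch mines ordered patterns of single items. The function name `prefixspan`, the representation of a projected database as (sequence index, start offset) pairs, and the sample sequences are assumptions made for this sketch; it is not taken from any reference implementation.

```python
def prefixspan(sequences, min_support):
    """Return {pattern: support} for ordered patterns of single items.

    A pattern's support is the number of input sequences that contain it
    as a (not necessarily contiguous) subsequence.
    """
    patterns = {}

    def mine(prefix, projected):
        # projected holds one (sequence index, start offset) pair per sequence
        # that still contains the current prefix: the "projected database".
        occurrences = {}
        for seq_idx, start in projected:
            seq = sequences[seq_idx]
            for pos in range(start, len(seq)):
                # Keep only the first position of each item in each suffix.
                occurrences.setdefault(seq[pos], {}).setdefault(seq_idx, pos)
        for item, first_pos in sorted(occurrences.items()):
            if len(first_pos) < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = len(first_pos)
            # Recurse on the suffixes that follow the item: no candidates
            # are generated, and each projection is smaller than the last.
            mine(new_prefix, [(i, p + 1) for i, p in first_pos.items()])

    mine((), [(i, 0) for i in range(len(sequences))])
    return patterns


# Hypothetical sequences of cluster-style identifiers.
sample = [list("ABC"), list("ABD"), list("ACB")]
found = prefixspan(sample, min_support=2)
```

With these three sample sequences and a minimum support of 2, the patterns found are A, B, C, A then B, and A then C.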

In frequent pattern mining, a track sequence is considered a frequent pattern of the track, also called a frequent sequence, when its number of occurrences in the trajectory data reaches a preset number. However, for ships on the sea surface or aircraft in the air, it is difficult to obtain frequent patterns directly with the existing frequent pattern mining methods. Taking a ship as an example, a sea lane is a strip-shaped area, so repeated passages along the same route are spaced further apart, and their track points are more scattered, than those of pedestrians or vehicles on a road. Moreover, the ship's positioning device has errors, and longitude and latitude are represented by 28-bit floating point numbers, so track points expressed in longitude and latitude almost never coincide, even approximately. It is therefore rare for exactly the same track point to be passed more than once, frequent sequences in the track are difficult to find accurately, and accurate trajectory prediction or services cannot be provided to relevant users.

Disclosure of Invention

Therefore, the present invention is directed to overcome the above-mentioned drawbacks of the prior art, and to provide a method and an electronic device for frequent pattern incremental mining of spatiotemporal trajectory data.

The purpose of the invention is realized by the following technical scheme:

According to a first aspect of the present invention, there is provided an incremental frequent pattern mining method for spatiotemporal trajectory data, comprising the following steps:

S1, acquiring newly added spatiotemporal trajectory data of a target and preprocessing it;

S2, clustering the preprocessed spatiotemporal trajectory data to obtain a plurality of clusters, wherein each cluster comprises a cluster center and a cluster range, and each cluster center is provided with an identifier;

S3, mapping the track points in the preprocessed spatiotemporal trajectory data to the corresponding cluster centers in time order, so that the track points whose longitude and latitude coordinates fall within the cluster range of a cluster are extracted and represented by the identifier of that cluster's center, to obtain a mapped track expressed as a sequence of cluster-center identifiers, wherein the mapped track contains various sequences; and

S4, performing frequent pattern mining on the mapped track of the target to obtain the number of occurrences of the various sequences in the mapped track, and taking the sequences whose number of occurrences reaches a preset number as the frequent sequences of the target.

In some embodiments of the invention, the preprocessing comprises cleaning and cutting, wherein the cleaning comprises sorting track points in the spatiotemporal track data of the target according to time and removing track points with invalid longitude and latitude; the cutting comprises the step of cutting the space-time track data of the target from the position where the time interval of the adjacent track points exceeds the preset time interval.

In some embodiments of the invention, step S2 comprises:

S21, directly clustering the preprocessed spatiotemporal trajectory data of a target that has not yet undergone clustering; and/or

S22, incrementally clustering, on the basis of the historical clusters, the preprocessed spatiotemporal trajectory data of a target that has already undergone clustering, wherein this data is the newly added trajectory data obtained by preprocessing the spatiotemporal trajectory data stored since the previous clustering.

Preferably, step S21 comprises:

S211, calculating the number of track points in the neighborhood of each track point in the preprocessed spatiotemporal trajectory data, wherein the neighborhood of a track point is the area enclosed by a circle centered on the track point with a preset radius;

S212, taking the track points whose neighborhoods contain a number of track points greater than or equal to the density threshold as core points; and

S213, merging the neighborhoods of core points whose neighborhoods intersect to form a cluster; taking as the cluster center the track point in the cluster whose sum of distances to all other track points in the cluster is smallest; assigning an identifier to each cluster center; and taking as the cluster range the area covered by a circle centered on the cluster center whose radius is the distance from the cluster center to the track point in the cluster farthest from it.

Preferably, step S22 comprises:

S221, when a newly added track point in the newly added trajectory data lies within the cluster range of a historical cluster and can serve as the cluster center of a cluster in the newly added trajectory data, merging the cluster containing the newly added track point with the historical cluster into one cluster and re-determining the cluster center and cluster range of the merged cluster, wherein the re-determined cluster center uses the identifier of the original cluster center of the historical cluster; and/or

when a newly added track point in the newly added trajectory data lies within the cluster range of a historical cluster but cannot serve as the cluster center of a cluster in the newly added trajectory data, adding all track points in the neighborhood of the newly added track point to the historical cluster and re-determining the cluster center and cluster range of that cluster, wherein the re-determined cluster center uses the identifier of the original cluster center of the historical cluster; and

S222, clustering the newly added track points in the newly added trajectory data that have not been merged into a cluster, together with the historical track points in the earlier historical trajectory data that do not belong to any historical cluster, to obtain new clusters; determining the cluster center and cluster range of each new cluster; and assigning an identifier to each new cluster center.

Preferably, step S222 comprises:

S2221, calculating the number of track points in the neighborhood of each track point among the newly added track points that have not been merged into a cluster and the earlier historical track points that do not belong to any historical cluster, wherein the neighborhood of a track point is the area enclosed by a circle centered on the track point with a preset radius;

S2222, taking the track points whose neighborhoods contain a number of track points greater than or equal to the density threshold as core points; and

S2223, merging the neighborhoods of core points whose neighborhoods intersect to form a cluster; taking as the cluster center the track point in the cluster whose sum of distances to all other track points in the cluster is smallest; assigning an identifier to each cluster center; and taking as the cluster range the area covered by a circle centered on the cluster center whose radius is the distance from the cluster center to the track point in the cluster farthest from it.

In some embodiments of the invention, in step S4, performing frequent pattern mining comprises: performing frequent pattern mining directly when the target has not previously undergone frequent pattern mining, and/or performing incremental frequent pattern mining on the basis of previously mined frequent sequences when the target has previously undergone frequent pattern mining.

Preferably, performing frequent pattern mining directly, when the target has not previously undergone frequent pattern mining, means obtaining the number of occurrences of the various sequences in the current mapped track of the target and taking the sequences whose number of occurrences reaches the preset number as the frequent sequences of the target. Performing incremental frequent pattern mining, when the target has previously undergone frequent pattern mining, means merging the numbers of occurrences of the various sequences obtained from the current mapped track of the target with the numbers of occurrences of the sequences mined historically, and taking the sequences whose merged number of occurrences reaches the preset number as the frequent sequences of the target.

According to a second aspect of the present invention, there is provided an electronic apparatus comprising: one or more processors; and a memory, wherein the memory is to store one or more executable instructions; the one or more processors are configured to implement the method as described in the first aspect via execution of the one or more executable instructions.

Compared with the prior art, the invention has the advantages that:

The invention first clusters the track points into a plurality of clusters, each comprising a cluster center and a cluster range, with each cluster center given an identifier. Track points whose longitude and latitude coordinates fall within the cluster range of a cluster are extracted and represented by the identifier of that cluster's center, yielding a mapped track expressed as a sequence of cluster-center identifiers, and frequent pattern mining is performed on the mapped track. Moreover, for a target that has already undergone frequent pattern mining, incremental frequent pattern mining is used, which reduces the amount of data that is recomputed and improves the timeliness of frequent pattern mining. Overall, the invention is insensitive to noise, robust to interference, and well suited to discrete data.

Drawings

Embodiments of the invention are further described below with reference to the accompanying drawings, in which:

FIG. 1 is a simplified flow diagram of a prior art method of mining frequent sequences;

FIG. 2 is a simplified flow diagram of a method for frequent pattern incremental mining of spatiotemporal trajectory data in accordance with an embodiment of the present invention;

FIG. 3 is a simplified flowchart of a method for mining frequent sequences according to yet another embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As mentioned in the background section, ships and aircraft travel in wide lanes. As shown in FIG. 1, a prior art method of mining frequent sequences includes: K1, reading the spatiotemporal trajectory data of a target from a database; K2, cleaning the spatiotemporal trajectory data of the target; K3, compressing the cleaned spatiotemporal trajectory data; K4, performing frequent pattern mining on the compressed spatiotemporal trajectory data to generate frequent sequences; and K5, storing the generated frequent sequences in the database. Because the trajectory data is relatively discrete, it is difficult to extract the frequent sequences of the target directly from it. The invention provides a new frequent pattern mining algorithm that clusters the track points into a plurality of clusters (Cluster), each comprising a cluster center and a cluster range, with each cluster center given an identifier. Track points whose longitude and latitude coordinates fall within the cluster range of a cluster are extracted and represented by the identifier of that cluster's center, yielding a mapped track expressed as a sequence of cluster-center identifiers, and frequent pattern mining is performed on the mapped track. Moreover, for a target that has already undergone frequent pattern mining, incremental frequent pattern mining is used, which reduces the amount of data that is recomputed and improves the timeliness of frequent pattern mining. Overall, the invention is insensitive to noise, robust to interference, and well suited to discrete data.

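According to an embodiment of the present invention, an incremental frequent pattern mining method for spatiotemporal trajectory data is provided which, with reference to FIG. 2, mainly includes the following steps:

S1, acquiring newly added spatiotemporal trajectory data of a target and preprocessing it;

S2, clustering the preprocessed spatiotemporal trajectory data to obtain a plurality of clusters, wherein each cluster comprises a cluster center and a cluster range, and each cluster center is provided with an identifier;

S3, mapping the track points in the preprocessed spatiotemporal trajectory data to the corresponding cluster centers in time order, so that the track points whose longitude and latitude coordinates fall within the cluster range of a cluster are extracted and represented by the identifier of that cluster's center, to obtain a mapped track expressed as a sequence of cluster-center identifiers, wherein the mapped track contains various sequences; and

S4, performing frequent pattern mining on the mapped track of the target to obtain the number of occurrences of the various sequences in the mapped track, and taking the sequences whose number of occurrences reaches a preset number as the frequent sequences of the target.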

Preferably, the method is mainly used for mining the space-time trajectory data of the ship or the aircraft.

For a better understanding of the present invention, each step is described in detail below with reference to specific examples or illustrations.

In step S1, new spatiotemporal trajectory data of the target is acquired and preprocessed.

For example, the newly added spatiotemporal trajectory data of the target can be read by means of a latest processing record table (Last Handled Record) of the database. If the target has already undergone frequent pattern mining, only the spatiotemporal trajectory data added since the last frequent pattern mining is obtained; if the target has not undergone frequent pattern mining, all of the target's spatiotemporal trajectory data stored in the database is obtained.

Preferably, the present invention can divide all targets into processed targets and unprocessed targets, where processed targets are those that have undergone frequent pattern mining and unprocessed targets are those that have not, and then generate the corresponding SQL query statements to obtain data from the database. First, the unique identifiers of all targets are obtained from the database; these form the set of all targets. The processed targets are the targets for which a last processing time has been recorded, and the unprocessed targets are the remaining targets. The SQL query statements for processed and unprocessed targets differ, so different statements must be generated to obtain the trajectory data of each kind of target. For example, for a processed target, the generated SQL query selects the track points of the target, identified by its unique identifier, whose time is later than the target's last processing time; for an unprocessed target, the generated SQL query selects all track points of the target identified by its unique identifier.
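The difference between the two kinds of query can be sketched as follows. This is an illustration only: the table name `track_point`, its column names, the function name `trajectory_query`, and the use of SQLite are all assumptions made for the sketch and are not taken from the embodiment.

```python
import sqlite3


def trajectory_query(target_id, last_handled_time=None):
    """Build (sql, params) for the track points of one target.

    The table and column names are hypothetical. A processed target passes
    its recorded last processing time; an unprocessed target passes None.
    """
    sql = "SELECT time, longitude, latitude FROM track_point WHERE target_id = ?"
    params = [target_id]
    if last_handled_time is not None:
        # Processed target: only points newer than the last processing time.
        sql += " AND time > ?"
        params.append(last_handled_time)
    return sql + " ORDER BY time", params


# In-memory demonstration database with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track_point (target_id TEXT, time TEXT, longitude REAL, latitude REAL)"
)
conn.executemany(
    "INSERT INTO track_point VALUES (?, ?, ?, ?)",
    [
        ("ship-1", "2020-03-08 21:00:00", 1.1, 1.1),
        ("ship-1", "2020-03-08 21:30:00", 5.5, 5.3),
        ("ship-1", "2020-03-08 22:00:00", 9.9, 4.5),
    ],
)

# Processed target: last handled at 21:00, so two newer points are returned.
new_points = conn.execute(*trajectory_query("ship-1", "2020-03-08 21:00:00")).fetchall()
# Unprocessed target: every stored point is returned.
all_points = conn.execute(*trajectory_query("ship-1")).fetchall()
```

Passing the identifier and time as bound parameters, rather than formatting them into the statement text, is a design choice of this sketch.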

With respect to preprocessing, it should be understood that the newly added spatiotemporal trajectory data of a target is preprocessed so that it meets the requirements of subsequent processing. Preprocessing may take various forms, such as cleaning, cutting, and/or trajectory compression, according to the needs of the user; in the present invention, cleaning, cutting, and trajectory compression may be performed in sequence. It should be noted that this is only exemplary and should not be construed as limiting the present invention in any way. Several forms of preprocessing are described below:

Preferably, the cleaning comprises sorting the track points in the spatiotemporal trajectory data of the target by time and removing track points with invalid longitude or latitude. Trajectory data is read from the database out of order and therefore needs to be sorted by time. Track points with invalid longitude or latitude are points whose longitude is not within (-180, 180) or whose latitude is not within (-90, 90).

Preferably, the cutting comprises cutting the spatiotemporal trajectory data of the target wherever the time interval between adjacent track points exceeds a preset time interval. Cutting splits the track at places where the time between adjacent points is too long, so as to obtain track segments with clear semantics. For example, if the preset time interval is one day, the spatiotemporal trajectory data of the target is cut between any two adjacent track points whose time difference exceeds one day.
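The cleaning and cutting described above can be sketched in a few lines of Python. The function names `clean` and `cut`, the tuple layout (longitude, latitude, time), the sample points, and the treatment of the range bounds as inclusive are assumptions made for this sketch.

```python
from datetime import datetime, timedelta


def clean(points):
    """Drop points with invalid coordinates, then sort the rest by time.

    Each point is a (longitude, latitude, time) tuple. Treating the bounds
    as inclusive is an assumption of this sketch.
    """
    valid = [p for p in points if -180 <= p[0] <= 180 and -90 <= p[1] <= 90]
    return sorted(valid, key=lambda p: p[2])


def cut(points, max_gap=timedelta(days=1)):
    """Split a time-ordered track wherever adjacent points are more than max_gap apart."""
    segments = []
    for p in points:
        if segments and p[2] - segments[-1][-1][2] <= max_gap:
            segments[-1].append(p)  # gap is small enough: same segment
        else:
            segments.append([p])    # first point, or gap too long: new segment
    return segments


# Illustrative data: one invalid longitude, unsorted, and one two-day gap.
raw = [
    (1.2, 1.2, datetime(2020, 3, 8, 21, 1)),
    (200.0, 1.0, datetime(2020, 3, 8, 21, 2)),  # invalid longitude, removed
    (1.1, 1.1, datetime(2020, 3, 8, 21, 0)),
    (5.5, 5.3, datetime(2020, 3, 10, 21, 30)),  # more than one day later
]
segments = cut(clean(raw))
```

Here the cleaned track has three points and is cut into two segments, the first holding the two points from 8 March and the second the point from 10 March.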

Preferably, the trajectory compression uses one of a dynamic programming algorithm (DP algorithm), a Huffman compression algorithm (Huffman), and differential encoding compression.

In step S2, clustering the preprocessed spatio-temporal trajectory data to obtain a plurality of cluster clusters, where each cluster includes a cluster center and a cluster range, and each cluster center is provided with an identity.

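Preferably, step S2 includes sub-step S21 (directly clustering a target that has not yet undergone clustering) and/or sub-step S22 (incrementally clustering on the basis of the historical clusters), as set out above.

Preferably, step S21 includes sub-steps S211 to S213 set out above: counting the track points in the neighborhood of each track point, taking the track points whose count is greater than or equal to the density threshold as core points, and merging the intersecting neighborhoods of core points into clusters, each with a cluster center, an identifier, and a cluster range.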

Preferably, after the cluster clusters are formed, the cluster center, the cluster range, the identity of the cluster center and all the track points forming the cluster clusters are stored in the database. After the data is stored in the database, the information of the historical clustering clusters can be called according to the needs when incremental clustering processing is carried out on the basis of the historical clustering clusters, and the historical spatiotemporal trajectory data does not need to be called out completely and then calculated, so that the processing efficiency is improved.

For ease of understanding, the working principle of step S21 is described below from another angle:

The spatiotemporal trajectory data of the target comprises a plurality of track points. A circle is drawn with each track point as its center and a set clustering radius (EPS, the preset radius mentioned above) as its radius; this circle is called the neighborhood of the track point (EPS neighborhood).

The track points contained in the circle are counted. If the number of points in the circle is greater than or equal to a density threshold (MinPts), the center of the circle is marked as a core point (also called a core object). If the number of points in the neighborhood of a point is smaller than the density threshold but the point falls in the neighborhood of a core point, the point is marked as a boundary point. Points that are neither core points nor boundary points are noise points.

All points in the neighborhood of a core point are directly density-reachable from that core point. If a core point xj is directly density-reachable from a core point xi, a core point xk is directly density-reachable from xj, and a core point xn is directly density-reachable from xk, then xn is density-reachable from xi. In other words, density-reachability follows from chaining direct density-reachability.

If a core point xi and a core point xj are both density-reachable from a core point xk, then xi and xj are said to be density-connected, and the neighborhoods of density-connected core points are joined together to form a cluster. Points whose longitude and latitude coordinates do not fall in any cluster are not extracted during mapping, which is equivalent to discarding noise points.

According to an example of the present invention, the preset radius for a ship may be, for example, 0.5 to 1 km, and the preset radius for an aircraft may be, for example, 8 to 30 km. The density threshold may be, for example, 5 to 10 points per neighborhood. For example, suppose the target is a ship, the preset radius is set to 0.8 km, and the density threshold is set to 6 points per neighborhood. When the newly added spatiotemporal trajectory data of the target is clustered, if the neighborhood of a track point (the area within a radius of 0.8 km centered on the track point) contains 11 points, then because 11 is greater than 6 the track point is a core point; if the neighborhood of a track point contains 3 track points, then because 3 is less than 6 the track point is not a core point. It should be noted that the above value ranges are only illustrative, and the values can be set according to the actual needs of the user, the width of the target's lane, and the volume of the spatiotemporal trajectory data. For example, if the lane of a ship is actually wider, say 3 km, the preset radius may be set to 3 km. As another example, if the volume of spatiotemporal trajectory data for a ship is large, the density threshold may be set higher, for example 30 points per neighborhood or more, in order to mine deeper frequent patterns.
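The density-based procedure of step S21 resembles the well-known DBSCAN algorithm, and a minimal sketch of it is given below. Several things here are assumptions of the sketch rather than statements of the embodiment: the function name `density_cluster`, the use of planar (x, y) coordinates with Euclidean distance instead of longitude and latitude, counting a point as a member of its own neighborhood, and the sample data.

```python
import math


def density_cluster(points, eps, min_pts):
    """Label each point with a cluster index, or -1 for noise.

    points are planar (x, y) tuples; eps plays the role of the preset radius
    and min_pts the role of the density threshold.
    """
    n = len(points)
    # Neighborhood of each point: all points within eps, itself included.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Grow a cluster from this core point through density-reachable points.
        labels[i] = cluster
        stack = [i]
        while stack:
            cur = stack.pop()
            for j in neighbors[cur]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if is_core[j]:
                        stack.append(j)  # only core points extend the cluster
        cluster += 1
    return labels


# Two dense groups and one isolated point (illustrative coordinates in km).
sample = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 10.0),
]
labels = density_cluster(sample, eps=0.5, min_pts=3)
```

For real longitude and latitude data, a great-circle distance would be needed in place of `math.dist`, and a spatial index would avoid the quadratic neighborhood search used here for brevity.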

Preferably, step S22 includes sub-step S221 set out above, in which newly added track points that lie within the cluster range of a historical cluster are merged into that cluster, and the following sub-step:

S222, clustering the newly added track points in the newly added trajectory data that have not been merged into a cluster, together with the historical track points in the earlier historical trajectory data that do not belong to any historical cluster, to obtain new clusters; determining the cluster center and cluster range of each new cluster; and assigning an identifier to each new cluster center. The technical solution of this embodiment can achieve at least the following beneficial technical effects: clustering can make use of the historical clusters generated earlier, and the track point data of a cluster is retrieved to re-determine the cluster center and cluster range only when that cluster is determined to be merged, which reduces clustering time and improves timeliness.

Preferably, in step S221, the cluster center and cluster range of a merged cluster are re-determined in the same way as the cluster center and cluster range of a new cluster are determined. That is, in both cases, the track point in the cluster whose sum of distances to all other track points in the cluster is smallest is taken as the cluster center, and the area covered by a circle centered on the cluster center, whose radius is the distance from the cluster center to the farthest track point in the cluster, is taken as the cluster range. The difference is that the cluster center of a merged cluster continues to use the identifier of the original cluster center of the historical cluster, whereas a new cluster receives a newly assigned identifier.
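A minimal sketch of how a cluster center and cluster range might be determined, and of how a merged cluster keeps its historical identifier, is given below. The function names `center_and_range` and `merge_into_history`, the dictionary layout of a cluster, and the planar coordinates with Euclidean distance are assumptions made for this sketch.

```python
import math


def center_and_range(points):
    """Return (center, radius) for a cluster of planar (x, y) points.

    The center is the member point with the smallest sum of distances to the
    other members; the radius is the distance from it to the farthest member.
    """
    center = min(points, key=lambda p: sum(math.dist(p, q) for q in points))
    radius = max(math.dist(center, q) for q in points)
    return center, radius


def merge_into_history(history, new_points):
    """Merge new points into a historical cluster, keeping its identifier.

    history is a hypothetical record with the keys 'id' and 'points'.
    """
    merged = history["points"] + new_points
    center, radius = center_and_range(merged)
    return {"id": history["id"], "points": merged, "center": center, "radius": radius}


# A historical cluster "A" on a line, then two new points extending it.
history = {"id": "A", "points": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]}
old_center, old_radius = center_and_range(history["points"])
merged = merge_into_history(history, [(3.0, 0.0), (4.0, 0.0)])
```

In this example the center moves from (1.0, 0.0) to (2.0, 0.0) and the radius grows from 1.0 to 2.0, while the identifier remains "A".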

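Preferably, step S222 includes sub-steps S2221 to S2223 set out above, which apply the same density-based procedure as sub-steps S211 to S213 to the track points that do not yet belong to any cluster.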

In step S3, the trajectory points in the preprocessed spatio-temporal trajectory data are mapped to the corresponding clustering centers according to the time sequence to extract the trajectory points whose longitude and latitude coordinates fall in the clustering range of the corresponding clustering class clusters and are represented by the identifiers of the clustering centers, so as to obtain mapped trajectories represented by the identifiers of the clustering centers in sequence, wherein the mapped trajectories contain various sequences.

Preferably, mapping the track points in the preprocessed space-time trajectory data to the corresponding clustering centers according to the time sequence means that the track points whose longitude and latitude coordinates fall in the clustering range of the corresponding clustering class cluster are extracted according to the time sequence of the track points in the space-time trajectory data and are represented by the identity of the clustering center, the track points which do not fall in the clustering range of any clustering class cluster are discarded (the track points are regarded as noise points), and the mapped tracks represented by the identity of the clustering centers in sequence are obtained.

According to an example of the present invention, it is assumed that there are 5 cluster clusters, the identifiers of the cluster centers are A, B, C, D, E respectively, and it is assumed that a track contains 4 track points, and the information of the track points includes longitude, latitude and timestamp, which are:

Track point 1: (1.1, 1.1, 2020-03-08 21:00:00),

Track point 2: (1.2, 1.2, 2020-03-08 21:01:00),

Track point 3: (5.5, 5.3, 2020-03-08 21:30:00),

Track point 4: (9.9, 4.5, 2020-03-08 22:00:00);

where the 1st and 2nd track points are mapped to cluster center A, the 3rd track point is mapped to cluster center B, and the 4th track point is mapped to cluster center C. The mapped track is then represented as:

(A, 2020-03-08 21:00:00),

(A, 2020-03-08 21:01:00),

(B, 2020-03-08 21:30:00),

(C, 2020-03-08 22:00:00).
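The mapping of step S3 can be sketched as follows, using the four track points of the example. The cluster centers and radii, the extra fifth point, the function name `map_trajectory`, the Euclidean distance on raw coordinates, and the rule of choosing the nearest center when several ranges contain a point are all assumptions made for this sketch.

```python
import math


def map_trajectory(points, clusters):
    """Replace each track point by the identifier of the cluster containing it.

    points are (longitude, latitude, timestamp) tuples in time order;
    clusters maps an identifier to a (center, radius) pair. Points that lie
    outside every cluster range are dropped as noise.
    """
    mapped = []
    for lon, lat, ts in points:
        hits = []
        for cid, (center, radius) in clusters.items():
            d = math.dist((lon, lat), center)
            if d <= radius:
                hits.append((d, cid))
        if hits:
            # If several ranges contain the point, take the nearest center.
            mapped.append((min(hits)[1], ts))
    return mapped


# Hypothetical centers and radii for three of the five clusters.
clusters = {
    "A": ((1.0, 1.0), 0.5),
    "B": ((5.5, 5.5), 0.5),
    "C": ((10.0, 4.5), 0.5),
}
track = [
    (1.1, 1.1, "2020-03-08 21:00:00"),
    (1.2, 1.2, "2020-03-08 21:01:00"),
    (5.5, 5.3, "2020-03-08 21:30:00"),
    (9.9, 4.5, "2020-03-08 22:00:00"),
    (20.0, 20.0, "2020-03-08 23:00:00"),  # extra point outside every range
]
mapped = map_trajectory(track, clusters)
```

The result contains the four mapped points A, A, B, C with their timestamps; the extra fifth point is discarded because it falls in no cluster range.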

In step S4, based on the mapped trajectory of the target, frequent pattern mining is performed to obtain the number of occurrences of various sequences in the mapped trajectory of the target and to take the sequence whose number of occurrences reaches a preset number as the frequent sequence of the target.

Preferably, performing frequent pattern mining includes: the frequent pattern mining is performed directly when the target has not undergone frequent pattern mining, and/or incrementally on the basis of a frequent sequence of prior mining when the target has undergone frequent pattern mining.

Preferably, when the target has not undergone frequent pattern mining, the step of directly performing frequent pattern mining is to obtain the occurrence frequency of various sequences in the current mapped track based on the current mapped track of the target and take the sequence with the occurrence frequency reaching the preset frequency as the frequent sequence of the target.

Preferably, the incremental frequent pattern mining is performed on the basis of a frequent sequence mined in the previous period when the target has undergone frequent pattern mining, and the incremental frequent pattern mining is performed by merging the occurrence number of various sequences obtained based on the mapped trajectory of the target at this time with the occurrence number of various sequences mined in the history, and taking a sequence whose occurrence number reaches a preset number after merging as the frequent sequence of the target. The value range of the preset times can be, for example, 3-10 or 3-100. It should be noted that this is only an example, and in an actual application scenario, the setting may be performed according to the needs of the user. For example, if the preset number of times is 10, the corresponding sequence appears 10 times or more, and is considered as a frequent sequence of the target.
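The merging of historical and current occurrence counts can be illustrated with a short Python sketch. The function name `merge_and_select` and the sample counts are assumptions made for illustration; the sketch presumes that the stored history keeps the counts of all mined sequences, not only those already frequent, since otherwise a sequence could not cross the threshold after merging.

```python
from collections import Counter


def merge_and_select(history_counts, new_counts, preset_times):
    """Merge occurrence counts and return (merged counts, frequent sequences)."""
    merged = Counter(history_counts)
    merged.update(new_counts)  # adds the counts key by key
    frequent = {seq for seq, n in merged.items() if n >= preset_times}
    return merged, frequent


# Hypothetical counts from an earlier run and from the current run.
history = Counter({("A", "B", "C"): 2, ("A", "B", "E"): 1})
current = Counter({("A", "B", "C"): 1, ("A", "C", "D"): 1})

merged, frequent = merge_and_select(history, current, preset_times=3)
```

In this made-up case the sequence (A, B, C) was seen twice before and once now, so it reaches the preset number of 3 only after merging.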

Preferably, in step S4, performing incremental frequent pattern mining on the basis of the previously mined frequent sequences when the target has previously undergone frequent pattern mining includes: merging the numbers of occurrences of the various sequences obtained from the current mapped trajectory of the target with the numbers of occurrences of the various sequences mined previously; when the time interval between the last track point of the previous mapped trajectory and the first track point of the current mapped trajectory is less than a preset time interval, connecting the previously mined sequence that contains that last track point with the currently mined sequence that contains that first track point, and updating the number of occurrences of the resulting connected sequence; and thereby obtaining the numbers of occurrences of the various sequences in the mapped trajectory of the target and taking each sequence whose number of occurrences reaches the preset number as a frequent sequence of the target.

Preferably, the algorithm used for frequent pattern mining is the Apriori association analysis algorithm, the frequent pattern growth (FP-Growth) algorithm or the PrefixSpan algorithm.

Preferably, when frequent pattern mining is performed, the minimum granularity of the mined sequences may be set by the user. The minimum granularity is the minimum number of track points that a mined sequence must contain. For example, if the minimum granularity is 3, then for the trajectory A, A, B, C only the sequences (A, A, B), (A, A, C), (A, B, C) and (A, A, B, C) are considered during mining, and sequences with fewer than 3 track points are not considered.

A simplified example is given below to illustrate how the number of occurrences of the various sequences in the mapped trajectories can be calculated.

According to an example of the present invention, assume for simplicity that the preset number is 3, that the minimum granularity is set to 3, that the spatio-temporal trajectory data of the target has not been mined before, and that the mapped trajectories of the target comprise the following five trajectories (when counting occurrences, only the identifiers corresponding to the track points are analyzed and the time parameter is ignored, so the time parameter of each point is not shown below):

First trajectory: A, B, C;

Second trajectory: A, B, C, D;

Third trajectory: A, B, C, E;

Fourth trajectory: A, B, E;

Fifth trajectory: A, C, D;

Counting the occurrences of the various sequences in the mapped trajectories of the target, where a sequence is counted once for each trajectory that contains its elements in order but not necessarily adjacently (for instance, A, C, D is counted for the second and fifth trajectories), gives:

Sequence A, B, C, occurring 3 times;

Sequence A, C, D, occurring 2 times;

Sequence A, B, D, occurring 1 time;

Sequence B, C, D, occurring 1 time;

Sequence A, B, E, occurring 2 times;

Sequence A, C, E, occurring 1 time;

Sequence B, C, E, occurring 1 time;

Sequence A, B, C, D, occurring 1 time;

Sequence A, B, C, E, occurring 1 time.

Thus only the sequence A, B, C reaches the preset number of occurrences, and it is the frequent sequence of the target.
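The counts in this example can be reproduced with a brute-force Python sketch. It enumerates every ordered subsequence of at least the minimum granularity with `itertools.combinations` and counts each distinct sequence once per trajectory. The function name `count_sequences` is hypothetical, and exhaustive enumeration is used only because the example is tiny; it is not one of the mining algorithms named in this description and grows exponentially with trajectory length.

```python
from collections import Counter
from itertools import combinations


def count_sequences(tracks, min_len):
    """Count, per sequence, how many tracks contain it as an ordered subsequence."""
    counts = Counter()
    for track in tracks:
        seen = set()  # distinct sequences of this track, counted once each
        for k in range(min_len, len(track) + 1):
            seen.update(combinations(track, k))
        counts.update(seen)
    return counts


tracks = [list("ABC"), list("ABCD"), list("ABCE"), list("ABE"), list("ACD")]
counts = count_sequences(tracks, min_len=3)

preset_times = 3
frequent = [seq for seq, n in counts.items() if n >= preset_times]
```

Running the same function on the single trajectory A, A, B, C with a minimum granularity of 3 yields exactly the four sequences (A, A, B), (A, A, C), (A, B, C) and (A, A, B, C).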

According to still another embodiment of the present invention, referring to Fig. 3, there is provided a frequent pattern incremental mining method of spatio-temporal trajectory data, including:

T1, reading the record of the most recent processing of the target from the database, and determining the time of the spatio-temporal trajectory data most recently processed for the target;

T2, dividing all targets into processed targets and unprocessed targets according to the time of the most recently processed spatio-temporal trajectory data, and then generating the corresponding SQL query statements and sending them to the database to obtain the spatio-temporal trajectory data of the targets;

T3, cleaning and cutting the acquired spatio-temporal trajectory data of the target;

T4, compressing the cleaned and cut spatio-temporal trajectory data;

T5, directly clustering the compressed spatio-temporal trajectory data of each unprocessed target, and performing incremental clustering on the compressed spatio-temporal trajectory data of each processed target on the basis of its historical clusters;

T6, mapping the compressed spatio-temporal trajectory data to the cluster centers of the clusters to obtain a mapped trajectory expressed as an ordered series of cluster-center identifiers;

T7, based on the mapped trajectory of the target, performing frequent pattern mining directly when the target has not previously undergone frequent pattern mining, and performing incremental frequent pattern mining on the basis of the previously mined frequent sequences when the target has previously undergone frequent pattern mining, to generate the frequent sequences of the target;

T8, updating the frequent sequences and the most recent processing record of the target saved in the database.
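Step T3 can be sketched as follows, using the characterization of cleaning and cutting given in claim 2: sort the track points by time, reject points with invalid longitude or latitude, and cut the trajectory wherever adjacent points are separated by more than a preset time interval. The function name `clean_and_cut`, the validity bounds of ±180° and ±90°, and the one-hour gap are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def clean_and_cut(points, max_gap):
    """Clean (lon, lat, datetime) points and cut them into trajectory segments."""
    # Cleaning: keep only points with plausible coordinates (assumed bounds),
    # then sort them by timestamp.
    valid = [p for p in points
             if -180.0 <= p[0] <= 180.0 and -90.0 <= p[1] <= 90.0]
    valid.sort(key=lambda p: p[2])

    # Cutting: start a new segment whenever the gap to the previous
    # point exceeds max_gap.
    segments = []
    for p in valid:
        if segments and p[2] - segments[-1][-1][2] <= max_gap:
            segments[-1].append(p)
        else:
            segments.append([p])
    return segments


raw = [
    (1.2, 1.2, datetime(2020, 3, 8, 21, 1)),
    (1.1, 1.1, datetime(2020, 3, 8, 21, 0)),
    (999.0, 1.0, datetime(2020, 3, 8, 21, 2)),  # invalid longitude, rejected
    (9.9, 4.5, datetime(2020, 3, 8, 23, 30)),   # more than 1 h later, new segment
]
segments = clean_and_cut(raw, max_gap=timedelta(hours=1))
```

Each resulting segment can then be compressed, clustered and mapped independently in the later steps.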

Preferably, the steps T4, T5 in the example and the steps S2, S3 in the method embodiment are illustrated below by a piece of pseudo code.

The general interpretation of the meaning of the lines of the above pseudo code is as follows:

Lines 1-7: finding the clusters, the cluster centers and the clustering ranges, mapping the compressed track points to the cluster centers, and expressing the points that fall within the clusters, in order, as a series of cluster centers to form the mapped trajectory;

Lines 8-17: an exemplary specific way of obtaining the cluster center and the clustering range, namely: for each point in the cluster, calculating the sum of its distances to the remaining points in the cluster and its maximum distance to the remaining points in the cluster, taking the point with the minimum sum of distances as the cluster center, and taking the maximum distance from that point to the remaining points in the cluster as the clustering range;

Lines 18-24: an exemplary specific way of mapping the compressed track points to a trajectory composed of cluster centers, namely: if a track point falls within a clustering range, the point is represented by the cluster center of that clustering range, and these points are extracted to obtain the mapped trajectory.
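The center-and-range rule described for lines 8-17 amounts to picking the point with the smallest summed distance to the other members of the cluster and using its largest distance as the range. A minimal Python sketch follows; the function name `center_and_range` and the use of Euclidean distance on (longitude, latitude) pairs are assumptions of this illustration.

```python
from math import hypot


def center_and_range(cluster_points):
    """Return (center, clustering_range) for a non-empty list of (lon, lat) points."""
    if not cluster_points:
        raise ValueError("cluster must contain at least one point")
    best = None
    for p in cluster_points:
        # Distances from p to every member; the distance to itself is 0
        # and affects neither the sum nor the maximum.
        dists = [hypot(p[0] - q[0], p[1] - q[1]) for q in cluster_points]
        total, radius = sum(dists), max(dists)
        if best is None or total < best[0]:
            best = (total, p, radius)
    _, center, radius = best
    return center, radius


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
center, radius = center_and_range(points)
```

For these four sample points, (1.0, 0.0) has the smallest summed distance to the others, and its farthest neighbor lies at distance 1.0, which becomes the clustering range.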

Preferably, step T7 in the example and step S4 in the method embodiment are illustrated by a piece of pseudo code below.

Lines 1-6: performing incremental frequent pattern mining, namely: taking the previously mined clusters as the initial cluster set, performing incremental clustering to expand the cluster set, updating the cluster centers by finding, in each cluster, the point with the minimum average distance, or the minimum sum of distances, to the remaining points in the cluster as the cluster center and taking the maximum distance between that point and the remaining points as the clustering range, mining frequent sequences based on the expanded clusters, and storing the mined results to complete the extension of the frequent patterns;

Lines 7-20: an exemplary embodiment of incremental clustering, comprising: first traversing the set of newly added points once and comparing each point with the current cluster set; if the distance between the point and a cluster center is less than the clustering range of that cluster center stored in the DM field, marking the point as visited and searching its neighborhood; if the point can serve as a cluster center within the newly added point set, performing cluster expansion from the point and finally adding all points of its cluster in the newly added point set to the matched original cluster; otherwise, adding only the points in the neighborhood of the point to the matched original cluster; then updating the cluster centers, performing cluster mining on the points not matched by any original cluster, and adding the points that can become cluster centers to the cluster set;

Lines 21-26: an exemplary specific way of updating the cluster center and the clustering range, namely: for each point in the cluster, calculating the sum of its distances to the remaining points in the cluster and its maximum distance to the remaining points in the cluster, taking the point with the minimum sum of distances as the cluster center, and taking the maximum distance from that point to the remaining points in the cluster as the clustering range.
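A deliberately simplified Python sketch of the incremental clustering idea is given below. It keeps only two parts of the procedure described for lines 7-26: a new point joins an existing cluster when it lies within that cluster's stored range, and the center and range of every cluster that received points are then recomputed. The neighborhood search, the cluster expansion and the clustering of unmatched points are omitted. The dictionary layout, the function names and the Euclidean distance are all assumptions of this illustration.

```python
from math import hypot


def dist(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])


def medoid_and_range(points):
    """Center = point with the minimum summed distance; range = its maximum distance."""
    center = min(points, key=lambda p: sum(dist(p, q) for q in points))
    return center, max(dist(center, q) for q in points)


def incremental_assign(clusters, new_points):
    """Add new points to matching clusters in place; return the unmatched points.

    clusters: dict id -> {"points": [...], "center": (lon, lat), "range": float}
    """
    unmatched, touched = [], set()
    for p in new_points:
        for cid, c in clusters.items():
            # Matching uses the stored (historical) center and range.
            if dist(p, c["center"]) <= c["range"]:
                c["points"].append(p)
                touched.add(cid)
                break
        else:
            unmatched.append(p)  # left for a fresh clustering pass
    for cid in touched:
        c = clusters[cid]
        c["center"], c["range"] = medoid_and_range(c["points"])
    return unmatched


clusters = {"A": {"points": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
                  "center": (1.0, 0.0), "range": 1.0}}
leftover = incremental_assign(clusters, [(1.0, 0.5), (9.0, 9.0)])
```

In this sample run the point (1.0, 0.5) is absorbed by cluster A, while (9.0, 9.0) is returned as unmatched and would be clustered separately.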

Preferably, step T5 includes: T51, determining whether the target has been processed; if not, going to step T52; if so, going to step T53; T52, directly clustering the compressed spatio-temporal trajectory data of the unprocessed target; and T53, performing incremental clustering on the compressed spatio-temporal trajectory data of the processed target on the basis of its historical clusters.

Preferably, step T7 includes: T71, determining whether the target has previously undergone frequent pattern mining; if not, going to step T72; if so, going to step T73; T72, performing frequent pattern mining directly when the target has not previously undergone frequent pattern mining; and T73, performing incremental frequent pattern mining on the basis of the previously mined frequent sequences when the target has previously undergone frequent pattern mining.

Preferably, in step T7, performing incremental frequent pattern mining on the basis of the previously mined frequent sequences when the target has previously undergone frequent pattern mining includes: merging the numbers of occurrences of the various sequences obtained from the current mapped trajectory of the target with the numbers of occurrences of the various sequences mined previously; when the time interval between the last track point of the previous mapped trajectory and the first track point of the current mapped trajectory is less than a preset time interval, connecting the previously mined sequence that contains that last track point with the currently mined sequence that contains that first track point, and updating the number of occurrences of the resulting connected sequence; and thereby obtaining the numbers of occurrences of the various sequences in the mapped trajectory of the target and taking each sequence whose number of occurrences reaches the preset number as a frequent sequence of the target.

Preferably, in step T8, updating the database means, for example, updating the most recent processing record and the list of frequent sequences of the target stored in a MySQL database.

There is also provided, in accordance with an embodiment of the present invention, electronic apparatus including: one or more processors; and a memory, wherein the memory is to store one or more executable instructions; the one or more processors are configured to implement the methods of the foregoing embodiments via execution of the one or more executable instructions.

It should be noted that, although the steps are described in a specific order, the steps are not necessarily performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order as long as the required functions are achieved.

The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.

The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing.

Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A frequent mode increment mining method of spatio-temporal trajectory data is characterized by comprising the following steps:

2. The frequent pattern incremental mining method of spatiotemporal trajectory data according to claim 1, characterized in that said preprocessing comprises cleaning and cutting, wherein said cleaning comprises sorting trajectory points in the spatiotemporal trajectory data of the target by time and rejecting trajectory points whose longitude and latitude are invalid; the cutting comprises the step of cutting the space-time track data of the target from the position where the time interval of the adjacent track points exceeds the preset time interval.

3. The method for frequent pattern incremental mining of spatiotemporal trajectory data according to claim 1, wherein said step S2 comprises:

4. The method for frequent pattern incremental mining of spatiotemporal trajectory data as claimed in claim 3, wherein said step S21 comprises:

5. The method for frequent pattern incremental mining of spatiotemporal trajectory data as claimed in claim 3, wherein said step S22 comprises:

6. The method for frequent pattern incremental mining of spatiotemporal trajectory data as claimed in claim 5, wherein said step S222 comprises:

7. The method for frequent pattern incremental mining of spatiotemporal trajectory data as claimed in claim 1, wherein in said step S4, said performing frequent pattern mining comprises: the frequent pattern mining is performed directly when the target has not undergone frequent pattern mining, and/or incrementally on the basis of a frequent sequence of prior mining when the target has undergone frequent pattern mining.

8. The method of frequent pattern incremental mining of spatiotemporal trajectory data according to claim 7,

the direct frequent pattern mining when the target is not subjected to frequent pattern mining is to obtain the occurrence times of various sequences in the current mapped track based on the current mapped track of the target and take the sequence with the occurrence times reaching the preset times as the frequent sequence of the target;

the incremental frequent pattern mining is carried out on the basis of the frequent sequences mined in the previous period when the target is subjected to frequent pattern mining, wherein the occurrence times of various sequences obtained on the basis of the mapped track of the target at this time are merged with the occurrence times of various sequences mined in the history, and the sequences of which the occurrence times reach the preset times after merging are used as the frequent sequences of the target.

9. A computer-readable storage medium, having embodied thereon a computer program, the computer program being executable by a processor to implement the method of any one of claims 1 to 8.

10. An electronic device, comprising:

one or more processors; and

a memory, wherein the memory is to store one or more executable instructions;

the one or more processors are configured to implement the method of any of claims 1-8 via execution of the one or more executable instructions.