CN113032391B - Distributed sub-track connection query processing method - Google Patents

Distributed sub-track connection query processing method

Publication number
CN113032391B
Authority
CN
China
Prior art keywords
track
time
partition
query
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110162264.7A
Other languages
Chinese (zh)
Other versions
CN113032391A (en)
Inventor
陈刚
常志豪
张东祥
陈珂
寿黎但
伍赛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-02-05
Filing date
2021-02-05
Publication date
2022-04-12
Application filed by Zhejiang University ZJU
Priority to CN202110162264.7A
Publication of CN113032391A
Application granted
Publication of CN113032391B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24558 Binary matching operations
    • G06F16/2456 Join operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Navigation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed sub-track connection query processing method. Track data first undergoes hybrid partitioning: it is partitioned by time based on its time information, and the track data within each time partition is then partitioned by space based on spatial position information. An index is built in each time partition. During a query, the query track is split using the same time interval, and the resulting sub-queries run in parallel in the corresponding time partitions to produce a set of candidate tracks. The space partition data corresponding to each candidate track is then loaded into memory and the candidates are verified one by one. Finally, the results from all time partitions are merged. The method supports queries over city-scale GPS point data, reduces I/O and CPU processing overhead, speeds up query processing, and performs well.

Description

Distributed sub-track connection query processing method
Technical Field
The invention belongs to the technical field of spatial database systems, and in particular relates to a distributed sub-track connection query processing method for GPS track data.
Background
In the public health field, close-contact tracing is the process of identifying persons who have been in close contact with infected patients, and it plays a key role in preventing the further spread of infectious diseases. Track-based tracing is widely used to find close contacts between healthy people and confirmed patients in infectious disease prevention and control because of its high identification accuracy. To find persons who have been in prolonged contact with an infected patient, close-contact tracing can be formalized as a sub-track connection. To support close-contact tracing at the scale of a modern city, sub-track connection queries must run over a large-scale track database holding weeks of GPS data for millions of users.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a distributed sub-track connection query processing method. The query handled by the method is as follows: given an input track, return all tracks in the track database that were in close contact with it for a certain continuous period of time. The method supports queries over city-scale track GPS points, effectively reduces disk I/O and CPU computation cost, and speeds up processing.
The purpose of the invention is achieved by the following technical scheme: a distributed sub-track connection query processing method, comprising the following steps.
First, the design steps of the storage part are as follows:
(1) Perform hybrid partitioning on the raw track data. The track data is first partitioned by time, based on the time information of the tracks, to obtain a series of time partitions; the time partitions are then processed in parallel in a distributed manner.
(2) Each time partition is further partitioned by space, based on the spatial position information of the track points. The minimum bounding rectangle (MBR) of each track within the time period is computed, the center point of each MBR is obtained, and the center points are ordered along a Hilbert space-filling curve and split accordingly into a series of space partitions. Each space partition is an independent storage file.
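The two partitioning steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the point layout `(track_id, t, x, y)`, the function names, the curve resolution `order`, and the equal-size chunking are all hypothetical choices, and the Hilbert index uses the standard bit-manipulation mapping from grid cell to curve position.

```python
from collections import defaultdict


def xy_to_hilbert(order, x, y):
    """Map integer cell (x, y) on a 2^order x 2^order grid to its Hilbert curve position."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def time_partition(points, interval_s):
    """Group (track_id, t, x, y) points into time partitions keyed by t // interval_s."""
    parts = defaultdict(lambda: defaultdict(list))
    for tid, t, x, y in points:
        parts[t // interval_s][tid].append((t, x, y))
    return parts


def mbr_center(samples):
    """Center of the minimum bounding rectangle of one track's samples."""
    xs = [x for _, x, _ in samples]
    ys = [y for _, _, y in samples]
    return (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2


def space_partition(tracks, bounds, num_parts, order=8):
    """Order tracks by the Hilbert position of their MBR center, then cut into chunks."""
    min_x, min_y, max_x, max_y = bounds
    n = 1 << order

    def hilbert_key(tid):
        cx, cy = mbr_center(tracks[tid])
        gx = min(n - 1, int((cx - min_x) / (max_x - min_x) * n))
        gy = min(n - 1, int((cy - min_y) / (max_y - min_y) * n))
        return xy_to_hilbert(order, gx, gy), tid

    ordered = sorted(tracks, key=hilbert_key)
    size = max(1, -(-len(ordered) // num_parts))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Toy data: four tracks in the first 30-minute partition, one sample in the second.
points = [
    ("a", 0, 0.1, 0.1), ("a", 10, 0.2, 0.1),
    ("b", 0, 0.1, 0.9), ("b", 10, 0.1, 0.8),
    ("c", 0, 0.9, 0.9), ("c", 10, 0.8, 0.9),
    ("d", 0, 0.9, 0.1), ("d", 10, 0.9, 0.2),
    ("a", 1800, 0.5, 0.5),
]
parts = time_partition(points, 1800)
chunks = space_partition(parts[0], (0.0, 0.0, 1.0, 1.0), num_parts=2, order=1)
```

Because nearby positions on a Hilbert curve tend to be nearby in space, each chunk groups tracks from roughly the same area, which is what allows a later stage to load only the files that hold candidate tracks.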
Next, the index is constructed as follows:
(1) A precise index is built at the edges of each time partition. For each time partition, an R-Tree index is built over the exact longitude and latitude of all tracks at the first and last moments of the partition.
(2) A coarse index is built inside each time partition. At each moment in the time partition, a grid index divides the whole two-dimensional space uniformly, and a high-dimensional vector is obtained from the spatial distribution of all track points at that moment. A parameter N is introduced, and the K-Means clustering algorithm groups the high-dimensional vectors of all moments in the time partition into N classes, giving grid indexes at N moments.
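The coarse index step can be sketched as below. This is a hedged, dependency-free illustration: the function names are hypothetical, the K-Means is a small Lloyd iteration with a deterministic farthest-first initialization (the text does not specify an initialization), and choosing the moment nearest each cluster center as the representative moment is an assumption about how "grid indexes at N moments" are selected.

```python
def grid_vector(points, bounds, m):
    """Count the points falling in each cell of an m x m grid, flattened to length m*m."""
    min_x, min_y, max_x, max_y = bounds
    vec = [0] * (m * m)
    for x, y in points:
        col = min(m - 1, int((x - min_x) / (max_x - min_x) * m))
        row = min(m - 1, int((y - min_y) / (max_y - min_y) * m))
        vec[row * m + col] += 1
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(vectors, n_clusters, iters=20):
    """Lloyd's algorithm with farthest-first initialization (an assumed choice)."""
    centers = [list(vectors[0])]
    while len(centers) < n_clusters:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(n_clusters), key=lambda c: sq_dist(v, centers[c]))
                  for v in vectors]
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def representative_moments(vectors, n_clusters):
    """For each cluster, pick the moment whose vector is closest to the cluster center."""
    centers, labels = kmeans(vectors, n_clusters)
    reps = []
    for c in range(n_clusters):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            reps.append(min(idx, key=lambda i: sq_dist(vectors[i], centers[c])))
    return sorted(reps), labels


# Toy partition with six moments on a 2 x 2 grid: the crowd moves from the
# lower-left cell (moments 0-2) to the upper-right cell (moments 3-5).
bounds = (0.0, 0.0, 1.0, 1.0)
moments = [
    [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1)],
    [(0.1, 0.1), (0.2, 0.2), (0.7, 0.1)],
    [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1)],
    [(0.9, 0.9), (0.8, 0.8), (0.7, 0.9)],
    [(0.9, 0.9), (0.8, 0.8), (0.2, 0.9)],
    [(0.9, 0.9), (0.8, 0.8), (0.7, 0.9)],
]
vectors = [grid_vector(pts, bounds, m=2) for pts in moments]
reps, labels = representative_moments(vectors, n_clusters=2)
```

Moments whose point distributions look alike end up in the same cluster, so one grid index per cluster can stand in for all of them, which keeps the number of in-memory grid indexes at N rather than one per sampled moment.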
Finally, the query steps are as follows:
(1) Filtering. The input query track is split along the time dimension using the same time interval, and each resulting piece is queried in parallel in the time partition that covers its time period.
(2) Verification. The space partitions corresponding to the candidate tracks obtained in the filtering stage are loaded into memory, and the candidates are verified in order of candidate track ID.
(3) Merging. The results of the verification stage are processed. A track that can be determined to be in close contact within a single partition is reported as a close contact directly; a track whose close contact cannot be determined within a single partition is judged further using the two adjacent time partitions before and after that partition.
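A minimal sketch of the filtering idea inside one time partition follows. It assumes, for simplicity, that a grid index is available at every moment shown, that each grid maps a cell to the set of track IDs in it, and that the range query is a square of half-width `eps` around the query point; the names `build_grid`, `grid_range_query`, `filter_candidates`, and the parameters `cell`, `eps`, and `window` are hypothetical.

```python
def build_grid(positions, cell):
    """Index one moment: map each grid cell to the set of track IDs located in it."""
    grid = {}
    for tid, (x, y) in positions.items():
        grid.setdefault((int(x // cell), int(y // cell)), set()).add(tid)
    return grid


def grid_range_query(grid, q, eps, cell):
    """Return IDs in every cell overlapping the square of half-width eps around q.

    The result is a superset of the true neighbors; exact checks are left to verification.
    """
    qx, qy = q
    c0, c1 = int((qx - eps) // cell), int((qx + eps) // cell)
    r0, r1 = int((qy - eps) // cell), int((qy + eps) // cell)
    hits = set()
    for c in range(c0, c1 + 1):
        for r in range(r0, r1 + 1):
            hits |= grid.get((c, r), set())
    return hits


def filter_candidates(per_moment_hits, window):
    """Intersect hits over each run of `window` consecutive moments, then union the runs."""
    candidates = set()
    for i in range(len(per_moment_hits) - window + 1):
        candidates |= set.intersection(*per_moment_hits[i:i + window])
    return candidates


# Toy example: the query point stays near (5, 5) for three consecutive moments.
cell, eps = 2.0, 1.0
snapshots = [
    {"a": (5.5, 5.0), "b": (4.5, 5.5), "c": (20.0, 20.0), "d": (50.0, 50.0)},
    {"a": (5.0, 5.5), "b": (30.0, 30.0), "c": (5.5, 4.5), "d": (50.0, 50.0)},
    {"a": (5.0, 5.0), "b": (30.0, 30.0), "c": (5.0, 5.2), "d": (50.0, 50.0)},
]
query_points = [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0)]
hits = [grid_range_query(build_grid(s, cell), q, eps, cell)
        for s, q in zip(snapshots, query_points)]
candidates = filter_candidates(hits, window=2)
```

Intersecting over consecutive moments discards tracks that were near the query only momentarily, while the union keeps any track that stayed near it for at least one full run; only those survivors need their space partition files loaded for verification.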
Compared with the prior art, the invention has the following beneficial effects. The processing method is designed for a distributed system, is naturally parallel, and prunes quickly and effectively in the filtering stage, which gives it the following advantages:
1) The method can run in parallel across a large-scale cluster and is highly scalable.
2) The method follows the filter-and-verify approach, prunes well, effectively reduces the disk I/O and CPU load, and achieves better system performance.
Drawings
FIG. 1 is a flow diagram of the hybrid partitioning part;
FIG. 2 is a schematic diagram of the index construction within each partition;
FIG. 3 is an overall process flow diagram of the present invention.
Detailed Description
The technical solutions of the present invention are further described below with reference to the accompanying drawings. It should be understood that the specific examples described herein only explain the present invention and are not intended to limit it.
The accompanying drawings show the processing flow of the invention. The method specifically comprises the following steps.
First, the storage design and the index construction process are as follows:
(1) Perform hybrid partitioning on the raw track data. All track data is first partitioned by time, based on the time information of the tracks, to obtain a series of time partitions. Suppose there is one day of raw track data for 10000 users. With a partition length of 30 minutes, all track data is divided into 48 time partitions: one partition for the 10000 users over 00:00:00-00:30:00, one for the 10000 users over 00:30:00-01:00:00, …, and one for the 10000 users over 23:30:00-24:00:00.
(2) Each time partition is further partitioned by space, based on the spatial position information of the track points. The minimum bounding rectangle (MBR) of each track within the time period is computed and its center point obtained. An ordering of the center points is then derived from their two-dimensional coordinates using a Hilbert space-filling curve, and the tracks are split according to that ordering into a series of space partitions, each of which is an independent storage file. For example, take the single time partition 00:00:00-00:30:00 of the 10000 users obtained in step (1). Each of the 10000 users has an MBR for its track in this time period, which gives 10000 center points. Connecting these 10000 two-dimensional points along a Hilbert space-filling curve gives the order of the center points. If 10 space partitions are wanted, the 10000 tracks are cut into 10 parts in that order, giving 10 space partitions: users 0-999 over 00:00:00-00:30:00, users 1000-1999 over 00:00:00-00:30:00, …, and users 9000-9999 over 00:00:00-00:30:00.
(3) Build a precise index at the edges of each partition. For each time partition, an R-Tree index is built over the exact longitude and latitude of all track points at the first and last moments of the partition. For the 00:00:00-00:30:00 partition of the 10000 users obtained in step (1), an R-Tree is built over the 10000 track points at each of the two moments 00:00:00 and 00:30:00. Once built, the indexes are cached in memory so that later queries run directly in memory.
(4) Build a coarse index inside each partition. At each moment in the time partition, a grid index divides the whole two-dimensional space uniformly, and a high-dimensional vector is obtained from the spatial distribution of all track points at that moment. A parameter N is introduced, and the K-Means clustering algorithm groups the high-dimensional vectors of all moments in the time partition into N classes, giving grid indexes at N moments. For the 00:00:00-00:30:00 partition of the 10000 users obtained in step (1), suppose the GPS sampling interval of the tracks is 10 s and the grid index size is 100 × 100. A 100 × 100 grid index is then established every 10 s, and each track corresponds to one grid number at each moment. Each moment therefore yields a high-dimensional vector of 100 × 100 = 10000 elements, where each element is the number of tracks falling into the corresponding grid cell at that moment. The parameter N is then introduced, K-Means clusters the high-dimensional vectors of all moments into N classes, and two-dimensional grid indexes are built at the N cluster-center moments. Once built, the indexes are cached in memory so that later queries run directly in memory.
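The numbers in the worked example above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, the moment count assumes that GPS samples align with partition boundaries so that both edge moments are included, and the row-major grid numbering is an assumed convention.

```python
# Illustrative constants taken from the example in the text.
DAY_S = 24 * 3600        # one day of track data, in seconds
INTERVAL_S = 30 * 60     # length of one time partition
SAMPLE_S = 10            # GPS sampling interval
GRID_M = 100             # the grid index is GRID_M x GRID_M

NUM_TIME_PARTITIONS = DAY_S // INTERVAL_S            # 48
# Assumption: samples fall exactly on partition boundaries, so the first
# and last moments of a partition are both counted.
MOMENTS_PER_PARTITION = INTERVAL_S // SAMPLE_S + 1   # 181
VECTOR_DIM = GRID_M * GRID_M                         # 10000


def hms(seconds):
    """Format a second-of-day offset as HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def partition_label(index):
    """Human-readable time range of the time partition with the given index."""
    start = index * INTERVAL_S
    return f"{hms(start)}-{hms(start + INTERVAL_S)}"


def partition_of(t):
    """Index of the time partition that a second-of-day timestamp falls into."""
    return min(t // INTERVAL_S, NUM_TIME_PARTITIONS - 1)


def grid_number(x, y, bounds):
    """Row-major grid number of a point on the GRID_M x GRID_M grid (assumed numbering)."""
    min_x, min_y, max_x, max_y = bounds
    col = min(GRID_M - 1, int((x - min_x) / (max_x - min_x) * GRID_M))
    row = min(GRID_M - 1, int((y - min_y) / (max_y - min_y) * GRID_M))
    return row * GRID_M + col
```

Under these assumptions a single partition holds 181 candidate moments for grid indexing, which is why clustering them down to N representative moments matters for memory use.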
The query processing steps are implemented as follows:
(1) Filtering. The input query track is split along the time dimension using the same time interval, and each resulting piece is queried in parallel in the time partition that covers its time period. Suppose there is a query track that lasts one day. The track is first cut along the time dimension into 30-minute pieces, giving 48 sub-queries. The sub-queries are then processed in parallel in a distributed manner: the 1st sub-query is sent to time partition 00:00:00-00:30:00, the 2nd sub-query to time partition 00:30:00-01:00:00, …, and the 48th sub-query to time partition 23:30:00-24:00:00. Within each partition, a range query is first run on the R-Trees at the first and last moments, and the results are taken directly as candidate tracks. For the other moments in the partition, a range query is run on the two-dimensional grid index of each moment, the results of the range queries at several consecutive moments are intersected, and the union of all such intersections in the partition is taken, which yields the candidate tracks for that partition.
(2) Verification. The space partitions corresponding to the candidate tracks obtained in the filtering stage are loaded into memory, and the candidates are verified in order of candidate track ID. Specifically, all space partitions are processed in parallel on the cluster. When a single partition is processed, the track points of each track that the filtering stage marked for verification are read for the time period by track ID, and they are compared with the query track points one by one to obtain the verification result.
(3) Merging. The results of the verification stage are processed. A track that can be determined to be in close contact within a single partition is reported as a close contact directly; a track whose close contact cannot be determined within a single partition is judged further using the two adjacent time partitions before and after that partition. Suppose the query time window is 20 minutes. If the verification stage finds that the track with ID 1 was in close contact with the query track for 25 minutes within partition 00:30:00-01:00:00, that track is determined to be a close-contact track directly. If the verification stage finds that the track with ID 2 was in close contact with the query track during 00:30:00-00:45:00 in partition 00:30:00-01:00:00, close contact cannot be determined from this partition alone, so the contact during 00:25:00-00:30:00 in partition 00:00:00-00:30:00 must also be examined to reach a decision.
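The merging rule can be sketched by treating each verified contact as a time interval and stitching intervals that meet at a partition boundary. This is an illustrative sketch, not the patented procedure: the name `merge_contacts`, the interval representation in seconds of the day, and the exact interval for track 1 (the text only states 25 minutes within the partition) are assumptions, and track 3 is a hypothetical counter-example.

```python
def merge_contacts(intervals_by_track, window_s):
    """Return the IDs of tracks whose stitched contact time reaches window_s.

    intervals_by_track maps a track ID to the (start, end) contact intervals,
    in seconds, reported by the verification stage across all time partitions.
    """
    close = set()
    for tid, intervals in intervals_by_track.items():
        run_start = run_end = None
        for start, end in sorted(intervals):
            if run_end is not None and start <= run_end:
                # The interval touches or overlaps the current run: extend it.
                run_end = max(run_end, end)
            else:
                run_start, run_end = start, end
            if run_end - run_start >= window_s:
                close.add(tid)
                break
    return close


def t(hours, minutes):
    """Second-of-day offset for HH:MM:00."""
    return hours * 3600 + minutes * 60


verified = {
    # Track 1: 25 minutes of contact inside partition 00:30:00-01:00:00.
    1: [(t(0, 32), t(0, 57))],
    # Track 2: 15 minutes at the start of partition 00:30:00-01:00:00 plus
    # 5 minutes at the end of the previous partition 00:00:00-00:30:00.
    2: [(t(0, 30), t(0, 45)), (t(0, 25), t(0, 30))],
    # Track 3 (hypothetical): the same 15 minutes with no contact before 00:30:00.
    3: [(t(0, 30), t(0, 45))],
}
close_contacts = merge_contacts(verified, window_s=20 * 60)
```

Track 1 qualifies from a single partition, track 2 qualifies only after its two boundary-adjacent intervals are stitched into one 20-minute run, and track 3 stays below the window.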
Compared with the prior art, the method provided by the invention can run in parallel across a large-scale cluster and scales well. It follows the filter-and-verify approach, prunes well, effectively reduces the disk I/O and CPU load, and achieves better system performance.

Claims (1)

1. A distributed sub-track connection query processing method, characterized by comprising the following steps:
(1) a storage part design step, comprising the following substeps:
(1.1) partitioning the track data by time based on the time information of the track points, cutting the tracks into segments at equal time intervals to obtain a series of time partitions;
(1.2) computing the minimum bounding rectangle of each track in each time partition, and computing the center point of the minimum bounding rectangle of each track;
(1.3) within each time partition, ordering the center points of the minimum bounding rectangles of the tracks using a Hilbert space-filling curve, and splitting according to the ordering to obtain a series of space partitions, each space partition being an independent storage file;
(2) an index part construction step, comprising the following substeps:
(2.1) for each time partition, building an R-Tree index over the exact longitude and latitude of all tracks at the first and last moments of the time partition;
(2.2) at each moment of the time partition, uniformly dividing the whole two-dimensional space into an m × m two-dimensional grid using a grid index, and obtaining an (m × m)-dimensional vector from the spatial distribution of all track points at that moment;
(2.3) introducing a parameter N, and clustering the (m × m)-dimensional vectors of all moments in the time partition into N classes using the K-Means clustering algorithm, to finally obtain grid indexes at N moments;
(3) a query step, comprising the following substeps:
(3.1) filtering: splitting the input query track along the time dimension using the same time interval, and querying each piece in parallel in the time partition corresponding to its time period;
(3.2) verification: loading the corresponding space partition data into memory according to the candidate track IDs obtained in the filtering stage, and verifying the candidates in order of candidate track ID;
(3.3) merging: processing the results of the verification stage, wherein a track that can be determined to be in close contact within a single partition is determined to be a close contact directly, and a track whose close contact cannot be determined within a single partition is judged further using the two adjacent time partitions before and after that partition.
CN202110162264.7A 2021-02-05 2021-02-05 Distributed sub-track connection query processing method Active CN113032391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110162264.7A CN113032391B (en) 2021-02-05 2021-02-05 Distributed sub-track connection query processing method

Publications (2)

Publication Number Publication Date
CN113032391A CN113032391A (en) 2021-06-25
CN113032391B true CN113032391B (en) 2022-04-12

Family

ID=76460107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110162264.7A Active CN113032391B (en) 2021-02-05 2021-02-05 Distributed sub-track connection query processing method

Country Status (1)

Country Link
CN (1) CN113032391B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2426217A (en) * 1942-09-14 1947-08-26 Standard Telephones Cables Ltd Direction and distance indicating system
CN102567497A (en) * 2011-12-23 2012-07-11 浙江大学 Inquiring method of best matching with fuzzy trajectory problems
CN111652446A (en) * 2020-06-15 2020-09-11 深圳前海微众银行股份有限公司 Method, apparatus and storage medium for predicting risk of infection of infectious disease

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
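A Smart Low-consumption IoT Framework for Location Tracking and Its Real Application; Hao Tang et al.; 2016 6th International Conference on Electronics Information and Emergency Communication (ICEIEC); 2016-10-13; full text *
面向室内空间的语义轨迹提取框架 (A semantic trajectory extraction framework for indoor spaces); 骆歆远 et al.; 清华大学学报(自然科学版) (Journal of Tsinghua University (Science and Technology)); 2019-12-31; full text *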

Also Published As

Publication number Publication date
CN113032391A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
Pelanis et al. Indexing the past, present, and anticipated future positions of moving objects
Uddin et al. Finding regions of interest from trajectory data
CN111523577A (en) Mass trajectory similarity calculation method based on improved LCSS algorithm
CN109241126A (en) A kind of space-time trajectory accumulation mode mining algorithm based on R* tree index
CN106528793A (en) Spatial-temporal fragment storage method for distributed spatial database
CN106156528A (en) A kind of track data stops recognition methods and system
CN112131325A (en) Track determination method, device and equipment and storage medium
CN105760548A (en) Vehicle first appearance analysis method and system based on big data cross-domain comparison
CN102004771B (en) Method for querying reverse neighbors of moving object based on dynamic cutting
CN111611900B (en) Target point cloud identification method and device, electronic equipment and storage medium
CN117893383B (en) Urban functional area identification method, system, terminal equipment and medium
CN114238491B (en) Heterogeneous graph-based multi-mode traffic operation situation association rule mining method
CN108566620A (en) A kind of indoor orientation method based on WIFI
CN113722415B (en) Point cloud data processing method and device, electronic equipment and storage medium
CN111833224A (en) Urban main and auxiliary center boundary identification method based on population grid data
CN113779105B (en) Distributed track flow accompanying mode mining method
CN104778355B (en) The abnormal track-detecting method of traffic system is distributed based on wide area
CN113032391B (en) Distributed sub-track connection query processing method
CN112052405B (en) Passenger searching area recommendation method based on driver experience
CN112307286B (en) Vehicle track clustering method based on parallel ST-AGNES algorithm
CN109800231A (en) A kind of real-time track co-movement motion pattern detection method based on Flink
Rslan et al. Spatial R-tree index based on grid division for query processing
CN114564521A (en) Method and system for determining working time period of agricultural machine based on clustering algorithm
Chen et al. Detecting trajectory outliers based on spark
CN110222022B (en) Intelligent algorithm optimized data library construction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant