CN115905242A

CN115905242A - Mass ship historical track data storage system and query method

Info

Publication number: CN115905242A
Application number: CN202211659538.4A
Authority: CN
Inventors: 覃基伟; 马良荔; 何智勇; 牛敬华; 李永杰
Original assignee: Naval University of Engineering PLA
Current assignee: Naval University of Engineering PLA
Priority date: 2022-12-22
Filing date: 2022-12-22
Publication date: 2023-04-04

Abstract

The invention provides a mass ship historical track data storage system comprising a track storage module, a local index module and a data maintenance module. Ship historical track data is organized and stored in order according to the structural characteristics of ship track data. A B+ tree, an R tree and a hash table are combined according to common query types to build a local index structure that supports multiple query types, and each local index is maintained in the same node as its corresponding data storage partition, which reduces communication overhead. On the basis of this storage and index structure, a parallel query method is provided that optimizes queries by time, space and ship identification. Because the data of the same ship is stored on the same node, communication overhead among nodes is effectively reduced; the row keys and the index structure ensure fast search on all three types of keywords, so query latency can be effectively reduced.

Description

Mass ship historical track data storage system and query method

Technical Field

The invention relates to the technical field of ship data processing, in particular to a mass ship historical track data storage system and a query method.

Background

In recent years, with rapid economic development, the number of ships, ship traffic volume and ship traffic density have increased rapidly, the marine traffic environment has become increasingly complex, and marine traffic accidents have become more frequent. To reduce marine traffic accidents, more and more ships are being equipped with the Automatic Identification System (AIS). AIS is a digital navigation aid system and equipment integrating network technology, modern communication technology, computer technology and electronic information display technology. Working together with GPS, it broadcasts dynamic ship information (such as position, speed, rate of turn and course) and static ship information (such as ship name, call sign, draught and dangerous cargo) over very high frequency (VHF) radio to ships and shore stations in nearby waters, so that nearby ships and shore stations can keep track of the dynamic and static information of all ships in the area in good time, which helps ensure maritime traffic safety.

Today, hundreds of thousands of ships worldwide are equipped with AIS, and the worldwide monthly AIS data volume can reach hundreds of GB. Each AIS message that records position information can be regarded as one track point of a ship, so the accumulated AIS data contains a large amount of historical ship track data. Analyzing and mining historical ship track data can support route optimization, abnormal behavior monitoring, false target screening, port throughput statistics and similar applications.

Effective analysis and mining of massive historical ship track data cannot be achieved without efficient storage and query technologies. However, traditional centralized storage and query schemes fall short in computation, read/write performance, communication and scalability, and cannot meet the query requirements of massive historical ship track data. They have the following problems:

1. The storage model is outdated. Existing historical ship track data is usually stored in a relational database, or in a relational database with spatial extensions, deployed on a single computer. Because the capacity of a single computer is limited and traditional relational databases are difficult to scale out, this approach is not suitable for large-scale data storage.

2. The query efficiency is low. Because existing historical ship track data is large in scale and varied in type, traditional query methods are difficult to parallelize, are inefficient, and cannot meet real-time query requirements.

As can be seen from the above, an outdated storage model and low query efficiency are the two major problems to be solved in existing methods for storing and querying ship historical track data.

Disclosure of Invention

The invention aims to provide a mass ship historical track data storage system and a query method that solve the problems of an outdated storage model and low query efficiency in the storage and query of ship historical track data, and that achieve efficient storage and query of massive ship historical track data.

To achieve this purpose, the mass ship historical track data storage system comprises a track storage module, a local index module and a data maintenance module. The track storage module comprises track storage partitions distributed across a plurality of nodes of a cluster, and each track storage partition stores the track data of a plurality of ships;

the local index module comprises a plurality of local index partitions, all of which are held in memory; each local index partition corresponds to one track storage partition, and a local index partition and its corresponding track storage partition are stored in the same node;

each local index partition comprises a spatio-temporal index and a ship identification time index; during a query, the spatio-temporal index locates the query range according to the time and space query keywords, and the ship identification time index locates the query range according to the ship identification keywords and the time keywords; the data maintenance module synchronously handles the splitting and migration of a track storage partition and its corresponding local index partition;

when a track storage partition is split because the amount of stored data exceeds the storage threshold, forming a new track storage partition, the data maintenance module synchronously splits off a new local index partition from the corresponding local index partition and associates it with the new track storage partition; when a track storage partition is migrated from one node to another node, the data maintenance module synchronously migrates the corresponding local index partition to the same node.

A method for performing parallel queries on massive ship historical track data using the system comprises the following steps:

step 1: send the query time keyword qtk, the query space keyword qsk and the query ship identification set qms to each local index partition of the local index module, wherein, when a user initiates a query, at least one of the three query conditions qtk, qsk and qms is not empty; if qtk is empty, the query covers the whole time range; if qsk is empty, the query covers the whole space range; if qms is empty, the query covers all ships;

step 2: start the parallel queries, the query process at each local index partition being as follows:

step 201: for each local index partition, first query the spatio-temporal index with the query time keyword qtk and the query space keyword qsk to obtain the set of candidate row keys $crks_0 = \{rk_{0,1}, rk_{0,2}, \dots, rk_{m,n}\}$, where $rk_{m,n}$ is the row key of the track segment generated by the ship whose MMSI is m in the time interval $ti_n$;

step 202: query the ship identification time index with the query time keyword qtk and the query ship identification set qms to obtain the set of candidate row keys $crks_1 = \{rk_{0,1}, rk_{0,2}, \dots, rk_{i,j}\}$, where $rk_{i,j}$ is the row key of the track segment generated by the ship whose MMSI is i in the time interval $ti_j$;

step 203: compute the intersection $crks_2 = crks_0 \cap crks_1$ of the candidate row key sets $crks_0$ and $crks_1$, and obtain the candidate row key set $mcrks_0$ from $crks_2$;

step 204: read the elements of the candidate row key set $mcrks_0$ in sequence; for the current element $mcrk_g$, use the candidate row keys belonging to $mcrk_g$ to query the track storage partition corresponding to the local index partition; once the track data of the rows corresponding to all candidate row keys in $mcrks_0$ has been obtained, filter it by the query time keyword qtk and the query space keyword qsk to obtain the query result of the partition;

step 3: aggregate the query results obtained from each partition to obtain the query result set.

The invention has the beneficial effects that:

To address the outdated storage model and low query efficiency of existing devices and methods for storing and querying ship historical track data, the invention organizes and stores ship historical track data in order according to the structural characteristics of ship track data, combines a B+ tree, an R tree and a hash table according to common query types to build a local index structure that supports multiple query types, and maintains each local index in the same node as its corresponding data storage partition, which reduces communication overhead. On the basis of this storage and index structure, a parallel query method is provided that optimizes queries by time, space and ship identification. Because the data of the same ship is stored on the same node, communication overhead among nodes is effectively reduced; the row keys and the index structure ensure fast search on all three types of keywords, so query latency can be effectively reduced.

Drawings

Fig. 1 is a schematic structural diagram of a mass ship historical track data storage system in the invention.

Reference numerals: 1-track storage module; 2-local index module; 3-data maintenance module; 4-track storage partition; 5-local index partition; 6-spatio-temporal index; 7-ship identification time index.

Detailed Description

The invention is described in further detail below with reference to the following figures and specific examples:

As shown in Fig. 1, the mass ship historical track data storage system comprises a track storage module 1, a local index module 2 and a data maintenance module 3. The track storage module 1 comprises track storage partitions 4 distributed across a plurality of nodes of a cluster (the cluster is formed by connecting a plurality of computers, and a node is a single computer in the cluster), and each track storage partition 4 stores the track data of a plurality of ships;

the local index module 2 comprises a plurality of local index partitions 5, all of which are held in memory. Each local index partition 5 corresponds to one track storage partition 4; correspondence means that a local index partition indexes only the data in one track storage partition, so local index partitions and track storage partitions are in one-to-one correspondence. A local index partition 5 and its corresponding track storage partition 4 are stored in the same node, so communication between them does not cross nodes, which avoids the message encapsulation, network transmission and message parsing overhead that cross-node communication would require;

each local index partition 5 comprises a spatio-temporal index 6 and a ship identification time index 7. When ship historical track data is queried, the spatio-temporal index 6 locates the queried time and space range according to the time and space query keywords, and the ship identification time index 7 locates the queried time range and ship identifications according to the ship identification keywords and the time keywords. The data maintenance module 3 synchronously handles the splitting and migration of a track storage partition 4 and its corresponding local index partition 5 (if so much track data is written into a partition that the space it occupies exceeds a threshold, which hampers system maintenance, the oversized partition is split into two partitions and one of them is moved to another node of the cluster to keep the load of the whole cluster balanced);

when a track storage partition 4 is split because the amount of stored data exceeds the storage threshold, forming a new track storage partition 4 (the split keeps the storage occupied by the two resulting partitions as equal as possible, and the data of any one ship must remain in a single partition), the data maintenance module 3 synchronously splits off a new local index partition 5 from the corresponding local index partition 5 and associates it with the new track storage partition 4. When a track storage partition 4 is migrated from one node to another node, the data maintenance module 3 synchronously migrates the corresponding local index partition 5 to the same node. Moving a partition from one node to another keeps the load balanced. Migration takes place after a split: the load of each node in the cluster is calculated, and the partition is migrated to the node with the smallest load. Because an index partition and its data partition reside on the same node, the processing overhead of cross-node queries is avoided and query efficiency is improved.
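The split and migration behaviour described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Partition` class, the per-ship byte counts and the `node_load` dictionary are hypothetical stand-ins chosen for illustration, and "load" is assumed here to mean stored bytes. The sketch only shows the two stated constraints: a split happens at a ship boundary with the two halves as balanced as possible, and the index moves together with the data to the least-loaded node.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical stand-in for one track storage partition plus its local index partition."""
    node: str
    ship_bytes: dict                            # mmsi -> bytes stored for that ship
    index: dict = field(default_factory=dict)   # mmsi -> index entries for that ship

    def size(self) -> int:
        return sum(self.ship_bytes.values())


def split_partition(part: Partition) -> tuple:
    """Split at a ship boundary so that both halves are as equal in size as possible."""
    ships = sorted(part.ship_bytes)             # row keys are ordered by their MMSI prefix
    total, running = part.size(), 0
    cut, best_gap = 1, None
    for i, mmsi in enumerate(ships[:-1], start=1):
        running += part.ship_bytes[mmsi]
        gap = abs(total - 2 * running)          # size difference between the two halves
        if best_gap is None or gap < best_gap:
            best_gap, cut = gap, i

    def make(ids):
        # Data and index entries of one ship always end up in the same half.
        return Partition(part.node,
                         {m: part.ship_bytes[m] for m in ids},
                         {m: part.index[m] for m in ids if m in part.index})

    return make(ships[:cut]), make(ships[cut:])


def migrate(part: Partition, node_load: dict) -> None:
    """Move a partition (data and index together) to the least-loaded node."""
    target = min(node_load, key=node_load.get)
    node_load[part.node] -= part.size()
    node_load[target] += part.size()
    part.node = target
```

For example, a partition holding ships of 40, 30 and 30 units is cut after the first ship, and the second half is then moved to whichever node currently reports the smallest load.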

In the above technical solution, the cluster is a group of loosely or tightly connected computers (loosely connected means that the smallest data unit stored in a node is a file; tightly connected means that the smallest data unit in a node is a data block, that is, a file is divided into several data blocks that are stored on different nodes). A single computer in the cluster is regarded as one node, and the nodes are connected through a local area network. The purpose of the cluster is to coordinate a plurality of computers to complete tasks and to improve efficiency through distributed parallel processing; compared with a single computer, a plurality of computers provides greater computing and storage capacity.

In the above technical solution, the track storage module 1 is implemented on the non-relational database HBase. Data in HBase is stored in tables composed of rows and columns, and columns of the same type can form a column family. A single HBase table $Tab_t$ is used to store the ship track data. The HBase table $Tab_t$ is divided into a plurality of track storage partitions distributed across the cluster, and each track storage partition stores the track data of a plurality of ships. The track data of the same ship is stored contiguously and in order in one track storage partition 4 in the form of track segments: contiguous means that the data of the same ship occupies a continuous region of the storage space with no data of other ships in between, and in order means that the track data of the same ship is stored in time order;

a track segment is the continuous, ordered ship track data generated by a ship within a certain time interval. Time intervals are obtained by dividing the time dimension into equal parts, and every time interval is 2 hours long. For a time interval $ti_i$, the track segment $TS_{i,j} = \{p_1, p_2, \dots, p_k\}$ is the sequence of track points sampled from ship $sh_j$ within $ti_i$, with the track points $p_1, p_2, \dots, p_k$ sorted in time order;

a track segment is uniquely identified by the combination of its ship identification and time interval attributes. Because the HBase table $Tab_t$ uses the row key (Row Key) to determine where data is placed within a storage partition, when track data is stored in the form of track segments the row key rk is formed by combining the ship identification mmsi and the time interval ti:

$rk = mmsi + ti$

where MMSI (Maritime Mobile Service Identity) is the identification code of the maritime mobile service and serves as the ship identification of a ship, and ti is the time interval. Because the MMSI is the prefix of rk and HBase orders row keys lexicographically, the track data of the same ship is stored contiguously and in order in $Tab_t$;

To match the row key design of the HBase table $Tab_t$, a single column family stores all track points of a track segment. Each column in the column family uses the sampling time of a track point as its column key, which ensures that the track points in a track segment are stored in time order. The spatial position of the track point, together with the remaining AIS information other than MMSI, time, longitude and latitude (such as course, speed and destination), is recorded in the corresponding cell of the HBase table; a cell is the basic data storage unit in HBase and is uniquely determined by row key, column family and column key. This design reduces cross-node communication overhead and sorting overhead during queries. Because the data of the same ship is stored contiguously in the same partition, only one node needs to be queried for the data of a single ship; when the data of several ships is queried, no ship's data is spread over several nodes, so there is no need to coordinate several nodes to assemble the data of one ship, which reduces communication overhead. In addition, because the data of the same ship is stored in order, the results do not need to be re-sorted after the query.
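The storage layout described above (2-hour track segments, the row key rk = mmsi + ti, and one column per track point keyed by sampling time) can be sketched in plain Python as follows. The sketch does not use any HBase client; a nested dictionary stands in for the table. The fixed field widths (9 digits for the MMSI, 10 digits for the interval number and for the column key) and the use of Unix seconds are assumptions made so that string order matches (ship, time) order; the source does not specify an encoding.

```python
from collections import defaultdict

INTERVAL_SECONDS = 2 * 60 * 60   # the 2-hour time interval stated in the text


def row_key(mmsi: int, timestamp: int) -> str:
    """rk = mmsi + ti, zero-padded (assumed widths) so lexicographic order is (ship, time)."""
    ti = timestamp // INTERVAL_SECONDS
    return f"{mmsi:09d}{ti:010d}"


def build_rows(points):
    """points: iterable of (mmsi, unix_time, lon, lat, other_fields) tuples.

    Returns {row_key: {column_key: cell}} with one row per track segment and
    one column per track point, both ordered lexicographically by key.
    """
    rows = defaultdict(dict)
    for mmsi, ts, lon, lat, other in points:
        column_key = f"{ts:010d}"                       # sampling time as column key
        rows[row_key(mmsi, ts)][column_key] = {"lon": lon, "lat": lat, **other}
    return {rk: dict(sorted(cols.items())) for rk, cols in sorted(rows.items())}


if __name__ == "__main__":
    pts = [
        (413000002, 7300, 114.3, 30.5, {"sog": 9.1}),
        (413000001, 7200, 114.1, 30.6, {"sog": 10.0}),
        (413000001, 100, 114.0, 30.6, {"sog": 10.2}),
        (413000001, 50, 113.9, 30.6, {"sog": 10.1}),
    ]
    for rk, cols in build_rows(pts).items():
        print(rk, list(cols))
```

With the MMSI as prefix, all segments of ship 413000001 sort before those of ship 413000002, and within one ship the segments sort by interval number, which is the contiguity and ordering property the text relies on.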

In the above technical solution, the spatio-temporal index 6 in the local index partition 5 has a layered hybrid structure divided into an upper layer and a lower layer. The upper layer is a classification index based on time periods; time periods are obtained by dividing the time dimension into equal-length periods, each time period contains several time intervals, and the length of a time period is 24 hours. The lower layer is a track segment index consisting of a plurality of R trees that use track segments as index objects. Each R tree in the lower layer corresponds one-to-one to a time period of the upper layer, so each R tree only needs to maintain the track segments within its time period;

to suit the indexing of ship track data, the R trees in the spatio-temporal index 6 index the time attribute in addition to the spatial position attribute. The index entries of the classification index in the spatio-temporal index 6 are recorded as two-tuples A(tc, rtree), where tc is the time period of the index entry and rtree is a pointer to the corresponding lower-layer R tree. The intermediate nodes of an R tree in the spatio-temporal index 6 are recorded as triples A(tp, mbr, rcns), where tp is the time range of the node, mbr (Minimum Bounding Rectangle, MBR) is the spatial minimum bounding rectangle of the node, and rcns is the set of pointers to child nodes. The leaf nodes of an R tree in the spatio-temporal index 6 are recorded as triples B(tp, mbr, tses), where tp is the time range of the node, mbr is the spatial minimum bounding rectangle of the node, and tses is the set of pointers to track segment index entries; within tses, the pointers are arranged in ascending order of the row key of the track segment index entry they point to. The track segment index entries are recorded as two-tuples B(rk, mbr1), where rk is the row key of the track segment and mbr1 is the spatial minimum bounding rectangle of the track segment; the time interval of the track segment is already contained in the row key, so it does not need to be recorded separately. Because the spatio-temporal index 6 takes time and space attributes as index objects and is held in memory, access is fast and queries by time keywords and space keywords are supported.
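The record layouts of the spatio-temporal index can be written down as Python dataclasses to make the nesting explicit. This is only an illustrative sketch: the class names `ClassEntry`, `InnerNode`, `LeafNode` and `SegmentEntry`, the tuple representation of rectangles and time ranges, and the helper functions are assumptions introduced here, and no R tree insertion or balancing logic is shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Rect = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)
TimeRange = Tuple[int, int]                # (start, end)


@dataclass
class SegmentEntry:                        # two-tuple B(rk, mbr1)
    rk: str
    mbr1: Rect


@dataclass
class LeafNode:                            # triple B(tp, mbr, tses)
    tp: TimeRange
    mbr: Rect
    tses: List[SegmentEntry] = field(default_factory=list)


@dataclass
class InnerNode:                           # triple A(tp, mbr, rcns)
    tp: TimeRange
    mbr: Rect
    rcns: List[Union["InnerNode", LeafNode]] = field(default_factory=list)


@dataclass
class ClassEntry:                          # two-tuple A(tc, rtree)
    tc: TimeRange
    rtree: Union[InnerNode, LeafNode]


def mbr_of(points) -> Rect:
    """Minimum bounding rectangle of a sequence of (lon, lat) points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def make_leaf(tp: TimeRange, entries: List[SegmentEntry]) -> LeafNode:
    """Build a leaf whose entries are kept in ascending row key order, as the text requires."""
    ordered = sorted(entries, key=lambda e: e.rk)
    box = ordered[0].mbr1
    for entry in ordered[1:]:
        box = union(box, entry.mbr1)
    return LeafNode(tp, box, ordered)
```

Note that `SegmentEntry` carries no time field: as in the text, the time interval of a segment is recovered from its row key.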

In the above technical solution, the ship identification time index 7 in the local index partition 5 has a layered hybrid structure divided into an upper layer and a lower layer. The upper layer is a ship identification index that uses a hash table to index the ship identification attribute. The lower layer is a time index consisting of a plurality of B+ trees; each B+ tree corresponds to one ship identification index entry of the upper layer and indexes the time intervals of all track segments of the corresponding ship. Because the ship identification time index 7 takes ship identification and time as index objects and is held in memory, access is fast and queries by ship and time keywords are supported.

The ship identification index entries in the ship identification time index 7 are recorded as two-tuples C(mmsi, bptree), where mmsi is the ship identification and bptree is a pointer to the corresponding lower-layer B+ tree. The intermediate nodes of a B+ tree in the ship identification time index 7 are recorded as two-tuples D(tp, bcns), where tp is the time range of the node and bcns is the set of pointers to child nodes. This arrangement matches the data distribution of the track storage partition (in the row key, the MMSI comes first and the time follows), which is why the upper layer of the ship identification time index 7 takes the ship identification attribute as its index object.

The leaf nodes of a B+ tree in the ship identification time index 7 are recorded as quadruples (pre, next, tp, mes), where pre is a pointer to the previous leaf node, next is a pointer to the next leaf node, tp is the time range of the node, and mes is the set of row keys of the track segments within the time range of the node. For the same reason (in the row key, the MMSI comes first and the time follows), the lower layer takes the time attribute as its index object. In addition, the leaf nodes of a B+ tree are linked to one another, so time range searches can be performed quickly.
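The two-level shape of the ship identification time index (hash table on top, ordered time structure below) can be illustrated with the Python standard library. In this sketch a `dict` plays the role of the hash table, and a sorted list searched with `bisect` is swapped in for the B+ tree; that substitution is an assumption made to keep the example short, not the structure the patent describes. The class name `ShipTimeIndex` and the integer interval numbers are likewise hypothetical.

```python
import bisect
from collections import defaultdict


class ShipTimeIndex:
    """Upper level: dict (hash table) keyed by MMSI.

    Lower level: per ship, a sorted list of (interval_number, row_key) pairs,
    used here as a simple stand-in for the B+ tree over time intervals.
    """

    def __init__(self):
        self._by_ship = defaultdict(list)

    def insert(self, mmsi: str, ti: int, rk: str) -> None:
        bisect.insort(self._by_ship[mmsi], (ti, rk))   # keeps the list ordered by time

    def range_query(self, mmsi: str, ti_lo: int, ti_hi: int) -> list:
        """Row keys of the ship's track segments with ti_lo <= ti <= ti_hi."""
        entries = self._by_ship.get(mmsi)
        if not entries:
            return []                                   # hash lookup miss
        lo = bisect.bisect_left(entries, (ti_lo, ""))
        hi = bisect.bisect_right(entries, (ti_hi, chr(0x10FFFF)))
        return [rk for _, rk in entries[lo:hi]]


if __name__ == "__main__":
    idx = ShipTimeIndex()
    for ti in (5, 1, 3, 2):
        idx.insert("413000001", ti, f"413000001-{ti:04d}")
    print(idx.range_query("413000001", 2, 3))
```

The hash lookup narrows the search to one ship in constant time on average, and the ordered lower level then answers the time range, mirroring the MMSI-then-time order of the row key.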

A method for carrying out parallelization mass ship historical track data query by utilizing the system comprises the following steps:

step 1: send the query time keyword qtk, the query space keyword qsk and the query ship identification set qms to each local index partition 5 of the local index module 2, wherein, when a user initiates a query, at least one of the three query conditions qtk, qsk and qms is not empty; if qtk is empty, the query covers the whole time range; if qsk is empty, the query covers the whole space range; if qms is empty, the query covers all ships. The query conditions involved in the method are ship identification, time and space. If the user's query specifies all three conditions, the query keywords are sent directly to each partition to start the next stage of processing; if the query specifies only one or two conditions, the range of each unspecified condition must be set to cover everything, so that the subsequent processing steps produce the correct query result;

step 2: parallel queries are started, and the query process at each local index partition 5 is as follows:

step 201: for each local index partition, first query the spatio-temporal index 6 with the query time keyword qtk and the query space keyword qsk to obtain the set of candidate row keys $crks_0 = \{rk_{0,1}, rk_{0,2}, \dots, rk_{m,n}\}$, where $rk_{m,n}$ is the row key of the track segment generated by the ship whose MMSI is m in the time interval $ti_n$;

step 202: query the ship identification time index 7 with the query time keyword qtk and the query ship identification set qms to obtain the set of candidate row keys $crks_1 = \{rk_{0,1}, rk_{0,2}, \dots, rk_{i,j}\}$, where $rk_{i,j}$ is the row key of the track segment generated by the ship whose MMSI is i in the time interval $ti_j$;

step 203: compute the intersection of the candidate row key sets $crks_0$ and $crks_1$,

$crks_2 = crks_0 \cap crks_1$

to obtain the candidate row key set $crks_2$, and obtain the candidate row key set $mcrks_0$ from $crks_2$;

step 204: read the candidate row key set $mcrks_0$ in lexicographic order of row key value (letters and digits ordered from smallest to largest); for each current element $mcrk_g$, use $mcrk_g$ to query the track storage partition 4 corresponding to the local index partition 5; once the track data of the rows corresponding to all candidate row keys in $mcrks_0$ has been obtained from the track storage partition, filter it by the query time keyword qtk and the query space keyword qsk to obtain the query result of the track storage partition (a data set of track points in which, for every track point, the MMSI attribute appears in qms, the time attribute lies within the range of the time keyword qtk, and the spatial position lies within the range of the space keyword qsk);

step 3: aggregate the query results obtained from each partition to obtain the query result set, return the query result set to the user, and finish the query.
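The overall flow of steps 1 to 3 can be sketched in Python as follows. This is a simplified illustration under several assumptions: each partition is modelled as a dictionary from row key to a list of `(mmsi, time, lon, lat)` track points, the index lookups of steps 201 to 203 are replaced by a plain scan of the partition, and threads on one machine stand in for cluster nodes. The names `normalise`, `query_partition` and `parallel_query` are hypothetical. What the sketch does show is the defaulting of empty conditions, the final per-point filter of step 204, and the aggregation of step 3.

```python
from concurrent.futures import ThreadPoolExecutor

ALL_TIME = (float("-inf"), float("inf"))
ALL_SPACE = (-180.0, -90.0, 180.0, 90.0)   # (min_lon, min_lat, max_lon, max_lat)


def normalise(qtk=None, qsk=None, qms=None):
    """Step 1: at least one condition is required; empty conditions cover everything."""
    if qtk is None and qsk is None and not qms:
        raise ValueError("at least one query condition is required")
    return qtk or ALL_TIME, qsk or ALL_SPACE, set(qms) if qms else None


def query_partition(partition, qtk, qsk, qms):
    """Step 2 for one partition (index lookup replaced by a scan in this sketch)."""
    hits = []
    for rk in sorted(partition):                 # row keys are read in lexicographic order
        for mmsi, t, lon, lat in partition[rk]:
            if qms is not None and mmsi not in qms:
                continue                         # ship not requested
            if not (qtk[0] <= t <= qtk[1]):
                continue                         # outside the time keyword
            if not (qsk[0] <= lon <= qsk[2] and qsk[1] <= lat <= qsk[3]):
                continue                         # outside the space keyword
            hits.append((mmsi, t, lon, lat))
    return hits


def parallel_query(partitions, qtk=None, qsk=None, qms=None):
    """Steps 1 to 3: normalise, query all partitions in parallel, aggregate."""
    qtk, qsk, qms = normalise(qtk, qsk, qms)
    with ThreadPoolExecutor() as pool:
        per_partition = pool.map(lambda p: query_partition(p, qtk, qsk, qms), partitions)
        return [point for result in per_partition for point in result]
```

Because each ship's data lives in exactly one partition, the aggregation in the last line is a plain concatenation; no per-ship merging across partitions is needed.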

In the above technical solution, the specific process of step 201 is as follows. For each local index partition 5, the spatio-temporal index 6 is queried with the query time keyword qtk and the query space keyword qsk. The search first examines the classification index in the upper layer of the spatio-temporal index 6 using qtk. If an index entry two-tuple A(tc, rtree) of the classification index satisfies

$qtk \cap tc \neq \emptyset$

that is, the time range of qtk intersects the time period of the index entry, the R tree rtree corresponding to the index entry is searched top-down (starting at the root node of the tree and ending at the leaf nodes). Whenever an intermediate node triple A(tp, mbr, rcns) of rtree is reached, if the intermediate node satisfies

$qtk \cap tp \neq \emptyset$ and $qsk \cap mbr \neq \emptyset$

that is, the time range of qtk intersects the time range of the intermediate node and the space range of qsk intersects the space range of the intermediate node, the child nodes of the node are searched through the set rcns of child node pointers, and so on. Whenever a leaf node triple B(tp, mbr, tses) of rtree is reached, if the leaf node satisfies

$qtk \cap tp \neq \emptyset$ and $qsk \cap mbr \neq \emptyset$

that is, the time range of qtk intersects the time range of the leaf node and the space range of qsk intersects the space range of the leaf node, the index entries contained in the node are searched through the set tses of pointers to track segment index entries. Whenever an index entry two-tuple B(rk, mbr1) is reached, the time interval ti of the indexed track segment is obtained from the row key rk, and if the index entry satisfies

$qtk \cap ti \neq \emptyset$ and $qsk \cap mbr1 \neq \emptyset$

the row key of the index entry is extracted and treated as a candidate row key, and the index entry is treated as a candidate index entry. When the search of the spatio-temporal index is finished, the set of candidate row keys is obtained, denoted $crks_0 = \{rk_{0,1}, rk_{0,2}, \dots, rk_{m,n}\}$, where $rk_{m,n}$ is the row key of the track segment generated by the ship whose MMSI is m in the time interval $ti_n$. The purpose of step 201 is to query the spatio-temporal index 6 with the time keyword and the space keyword in order to obtain the row keys of all track segments whose time range intersects the time keyword and whose space range intersects the space keyword. Because the spatio-temporal index 6 is held in memory and uses time and space attributes as index objects, the track segments that intersect the query in both space and time can be obtained quickly.
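The top-down search of step 201 can be sketched in Python as a recursive descent that prunes any node whose time range or bounding rectangle does not intersect the query. The sketch makes several assumptions for illustration: nodes are plain dictionaries, intervals are treated as closed, the row key format `MMSI|interval_number` and the decoder `interval_of` are hypothetical, and time is counted in hours.

```python
def t_overlap(a, b) -> bool:
    """True if two closed time ranges (start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def r_overlap(a, b) -> bool:
    """True if two rectangles (min_x, min_y, max_x, max_y) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def interval_of(rk: str):
    """Hypothetical decoder: recover the 2-hour interval (in hours) from a row key."""
    ti = int(rk.split("|")[1])
    return (2 * ti, 2 * ti + 2)


def search_node(node, qtk, qsk, out) -> None:
    """Search one R tree node; prune it unless both time and space intersect."""
    if not (t_overlap(node["tp"], qtk) and r_overlap(node["mbr"], qsk)):
        return
    if "rcns" in node:                            # intermediate node A(tp, mbr, rcns)
        for child in node["rcns"]:
            search_node(child, qtk, qsk, out)
    else:                                         # leaf node B(tp, mbr, tses)
        for rk, mbr1 in node["tses"]:             # index entries B(rk, mbr1)
            if t_overlap(interval_of(rk), qtk) and r_overlap(mbr1, qsk):
                out.append(rk)


def search_spatiotemporal(class_index, qtk, qsk) -> list:
    """class_index: list of (tc, rtree) pairs; returns the candidate row keys crks_0."""
    crks0 = []
    for tc, rtree in class_index:
        if t_overlap(tc, qtk):                    # upper layer: time period check
            search_node(rtree, qtk, qsk, crks0)
    return crks0
```

A query whose time keyword falls inside a single 24-hour period touches only that period's R tree, and inside the tree every subtree that misses the query in either time or space is skipped without visiting its entries.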

In the above technical solution, the specific process of step 202 is as follows. The ship identification time index 7 is queried with the query time keyword qtk and the query ship identification set qms. For each query ship identification $mmsi_q$ in qms, $mmsi_q$ is used to search the ship identification time index 7. If the search for $mmsi_q$ hits, let the hit index item be $(mmsi_q, bptree_q)$, where $bptree_q$ denotes the lower-layer B+ tree corresponding to $mmsi_q$ in the ship identification time index 7. $bptree_q$ is then searched top-down with qtk. Each time an intermediate node (tp, bcns) of $bptree_q$ is reached, if the node satisfies

$$qtk \cap tp \neq \varnothing$$

that is, the time range of qtk intersects the time range tp of the intermediate node, the child nodes of the node are searched through the set bcns of pointers to child nodes, and so on. Each time a leaf node (pre, next, tp, mes) of $bptree_q$ is reached, if the leaf node satisfies

$$qtk \cap tp \neq \varnothing$$

that is, the time range of qtk intersects the time range tp of the leaf node, the row keys contained in mes are extracted as candidate row keys.

For the first leaf node found by the top-down search whose time range intersects qtk, once that leaf node has been searched, the leaf nodes linked to it are searched further through the pointer pre to the previous leaf node and the pointer next to the next leaf node. Let $(pre_p, next_p, tp_p, mes_p)$ be the leaf node linked to (pre, next, tp, mes) through the pre pointer, where $pre_p$, $next_p$, $tp_p$ and $mes_p$ denote, respectively, that node's pointer to the previous leaf node, its pointer to the next leaf node, its time range, and the set of row keys corresponding to the track segments within its time range. It is determined whether

$$qtk \cap tp_p \neq \varnothing$$

If so, the row keys contained in $mes_p$ are extracted as candidate row keys, and the search continues through $pre_p$ to the next linked leaf node, and so on, until the time range of the currently searched leaf node no longer intersects qtk. Likewise, let $(pre_n, next_n, tp_n, mes_n)$ be the leaf node linked to (pre, next, tp, mes) through the next pointer, where $pre_n$, $next_n$, $tp_n$ and $mes_n$ denote, respectively, that node's pointer to the previous leaf node, its pointer to the next leaf node, its time range, and the set of row keys corresponding to the track segments within its time range. It is determined whether

$$qtk \cap tp_n \neq \varnothing$$

If so, the row keys contained in $mes_n$ are extracted as candidate row keys, and the search continues through $next_n$ to the next linked leaf node, and so on, until the time range of the currently searched leaf node no longer intersects qtk. When the leaf nodes reached through $pre_p$ and through $next_n$ no longer intersect qtk, the search of $bptree_q$ stops.

After the current query ship identification has been searched, the next query ship identification in qms is read and processed in the same way. After all elements of qms have been searched, the set of candidate row keys $crks_1 = \{rk_{0,1}, rk_{0,2}, \dots, rk_{i,j}\}$ is obtained, where $rk_{i,j}$ denotes the row key of the track segment generated by the ship whose MMSI is i in the time interval $ti_j$. Step 202 thus queries the ship identification time index 7 with the query time keyword and the query ship identification set, and obtains the row keys of all track segments whose time range intersects the query time keyword and whose ship identification belongs to the query ship identification set. The ship identification time index 7 is kept in memory and indexes the time and ship identification attributes, so track segments that intersect the query time and belong to the specified ships can be obtained quickly.
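The search in step 202 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the hash-table upper layer is modelled as a plain `dict`, time ranges are closed integer intervals, row keys are strings, and the names `Leaf`, `Inner`, `intersects`, `first_hit_leaf` and `search_ship_time_index` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

TimeRange = Tuple[int, int]  # assumption: closed interval [start, end]


def intersects(a: TimeRange, b: TimeRange) -> bool:
    """True when two closed time ranges share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


@dataclass(eq=False)
class Leaf:
    """Leaf node (pre, next, tp, mes) of a per-ship B+ tree."""
    tp: TimeRange
    mes: List[str]
    pre: Optional["Leaf"] = field(default=None, repr=False)
    next: Optional["Leaf"] = field(default=None, repr=False)


@dataclass(eq=False)
class Inner:
    """Intermediate node (tp, bcns) of a per-ship B+ tree."""
    tp: TimeRange
    bcns: List[Union["Inner", Leaf]] = field(default_factory=list)


def first_hit_leaf(node: Union[Inner, Leaf], qtk: TimeRange) -> Optional[Leaf]:
    """Top-down descent to the first leaf whose time range intersects qtk."""
    if not intersects(node.tp, qtk):
        return None
    if isinstance(node, Leaf):
        return node
    for child in node.bcns:
        hit = first_hit_leaf(child, qtk)
        if hit is not None:
            return hit
    return None


def search_ship_time_index(index: Dict[str, Union[Inner, Leaf]],
                           qms: List[str], qtk: TimeRange) -> List[str]:
    """Collect candidate row keys (crks_1) for the ships in qms."""
    crks1: List[str] = []
    for mmsi_q in qms:
        root = index.get(mmsi_q)  # upper layer: hash lookup by ship identification
        if root is None:
            continue
        leaf = first_hit_leaf(root, qtk)
        if leaf is None:
            continue
        crks1.extend(leaf.mes)
        cur = leaf.pre  # walk backwards while leaves still intersect qtk
        while cur is not None and intersects(cur.tp, qtk):
            crks1.extend(cur.mes)
            cur = cur.pre
        cur = leaf.next  # walk forwards while leaves still intersect qtk
        while cur is not None and intersects(cur.tp, qtk):
            crks1.extend(cur.mes)
            cur = cur.next
    return crks1
```

The returned list is in visit order (hit leaf, then previous leaves, then next leaves), so it is not necessarily sorted; ordering is left to step 203.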

In the above technical solution, the specific process in step 203 of obtaining the candidate row key set $mcrks_0$ from the candidate row key set $crks_2$ is as follows: the row keys in $crks_2$ are sorted in ascending order of row key value and, at the same time, grouped by MMSI attribute, giving the sorted and grouped row key set $mcrks_0 = \{mcrk_0, mcrk_1, \dots, mcrk_k\}$, where $mcrk_k$ denotes the set of candidate row keys of the ship whose MMSI is k, and the row keys within $mcrk_k$ are arranged in ascending order. This design avoids a later sorting and grouping pass over the query results: because the data in the track storage partition are stored contiguously and in order, if the candidate row keys are already sorted and grouped, each result subset returned by the query necessarily belongs to a single ship and is arranged in ascending time order.
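Step 203 can be illustrated with a short Python sketch. It assumes, for illustration only, that a row key is a string made of a 9-digit MMSI prefix followed by a fixed-width time interval identifier, so that lexicographic order equals (ship, time) order; `MMSI_LEN` and `build_mcrks0` are hypothetical names.

```python
from itertools import groupby
from typing import Dict, List, Set

MMSI_LEN = 9  # assumption: row key = 9-digit MMSI prefix + fixed-width interval id


def build_mcrks0(crks0: Set[str], crks1: Set[str]) -> Dict[str, List[str]]:
    """Intersect the two candidate sets, then sort and group by MMSI."""
    crks2 = crks0 & crks1        # candidates confirmed by both indexes
    ordered = sorted(crks2)      # ascending row key value
    # groupby only merges adjacent items, which is sufficient after sorting
    return {
        mmsi: list(keys)
        for mmsi, keys in groupby(ordered, key=lambda rk: rk[:MMSI_LEN])
    }
```

Because Python dictionaries preserve insertion order, iterating over the result visits ships in ascending MMSI order and, within each ship, row keys in ascending time order.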

In the above technical solution, the specific process of step 204 is as follows. The elements of the candidate row key set $mcrks_0$ are read in sequence. For the current element $mcrk_g$, the candidate row keys belonging to $mcrk_g$ are used to query the track storage partition 4 corresponding to the local index partition 5, and the track data of the rows corresponding to all candidate row keys in $mcrk_g$ are obtained, denoted $cTs_g = \{p_{g,1}, p_{g,2}, \dots, p_{g,f}\}$, where $p_{g,f}$ denotes the f-th track point, in ascending time order, of the ship whose MMSI is g. The track points in $cTs_g$ are then filtered with the query time keyword qtk and the query space keyword qsk: for a track point $p_{g,d} \in cTs_g$, if its time is not within the range of qtk or its spatial position is not within the range of qsk, $p_{g,d}$ is deleted from $cTs_g$. After filtering, the track query result $Ts_g$ of ship g is obtained. Because the candidate row keys in $mcrk_g$ are ordered and the track points in the track storage partition 4 are stored in time order, the track points in $Ts_g$ are automatically in time order. $Ts_g$ is added to the partition query result $Tss_p$, which is the collection of all query results in one track storage partition and consists of the track query results $Ts_g$ of multiple ships. The next element of $mcrks_0$ is then accessed and processed in the same way, until all elements of $mcrks_0$ have been processed.
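The filtering in step 204 can be sketched as follows. The sketch makes several assumptions that are not in the specification: the track storage partition is modelled as an in-memory `dict` from row key to time-ordered track points (standing in for HBase rows), qsk is an axis-aligned longitude/latitude rectangle, and ships with an empty result are omitted from the partition result. `Point` and `query_partition` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class Point(NamedTuple):
    t: int      # sampling time
    lon: float  # longitude
    lat: float  # latitude


TimeRange = Tuple[int, int]               # closed interval [start, end]
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def query_partition(partition: Dict[str, List[Point]],
                    mcrks0: Dict[str, List[str]],
                    qtk: TimeRange, qsk: BBox) -> Dict[str, List[Point]]:
    """Read candidate rows ship by ship and filter points by qtk and qsk."""
    tss_p: Dict[str, List[Point]] = {}
    for mmsi_g, mcrk_g in mcrks0.items():
        # cTs_g: all points of the candidate rows, already in time order
        cts_g = [p for rk in mcrk_g for p in partition.get(rk, [])]
        ts_g = [
            p for p in cts_g
            if qtk[0] <= p.t <= qtk[1]
            and qsk[0] <= p.lon <= qsk[2]
            and qsk[1] <= p.lat <= qsk[3]
        ]
        if ts_g:  # assumption: ships with no surviving points are skipped
            tss_p[mmsi_g] = ts_g
    return tss_p
```

No sort is performed on the filtered points: their order follows from the ordered row keys and the time-ordered rows, which is the property the paragraph above relies on.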

Matters not described in detail in this specification are prior art well known to those skilled in the art.

Claims

1. A mass ship historical track data storage system, characterized in that: the system comprises a track storage module (1), a local index module (2) and a data maintenance module (3), wherein the track storage module (1) comprises track storage partitions (4) distributed across a plurality of nodes of a cluster, and each track storage partition (4) stores track data of a plurality of ships;

the local index module (2) comprises a plurality of local index partitions (5), all the local index partitions (5) are stored in the memory, each local index partition (5) corresponds to one track storage partition (4), and the local index partitions (5) and the track storage partitions (4) which have corresponding relations are stored in the same node;

each local index partition (5) comprises a space-time index (6) and a ship identification time index (7); during a query of ship historical track data, the space-time index (6) locates the queried space-time range according to the time and space query keywords, and the ship identification time index (7) locates the queried time range and ship identifications according to the ship identification keyword and the time keyword; the data maintenance module (3) is used for synchronously handling the splitting and migration of a track storage partition (4) and its corresponding local index partition (5);

when a certain track storage partition (4) is split due to the fact that the stored data amount exceeds the storage threshold value to form a new track storage partition (4), the data maintenance module (3) is used for synchronously splitting the corresponding local index partition (5) into a new local index partition (5), so that the new local index partition (5) corresponds to the new track storage partition (4); when the track storage partition (4) is migrated from one node to another node, the data maintenance module (3) synchronously migrates the corresponding local index partition (5) to the corresponding node.

2. The mass ship historical track data storage system according to claim 1, wherein: the cluster is a group of computers which are loosely or tightly connected to work together, a single computer in the cluster is regarded as a node, and the nodes are connected through a local area network.

3. The mass ship historical track data storage system according to claim 1, wherein: the track storage module (1) is implemented on the non-relational database HBase; data in HBase are stored in tables, a table consists of rows and columns, and columns of the same type can form a column family; a single HBase table $Tab_t$ is used to store the ship track data; the HBase table $Tab_t$ is divided into a plurality of track storage partitions (4) distributed in the cluster, each track storage partition (4) stores track data of a plurality of ships, and the track data of the same ship are stored contiguously and in order in one track storage partition (4) in the form of track segments, where contiguous means that the data of the same ship occupy a continuous region of storage space with no data of other ships interleaved, and in order means that the track data of the same ship are stored in time order;

a track segment is the continuous, ordered ship track data generated by one ship within a certain time interval, the time intervals being obtained by dividing the time dimension equally; for a time interval $ti_i$, the track segment $TS_{i,j} = \{p_1, p_2, \dots, p_k\}$ denotes the sequence of track points sampled from ship $sh_j$ within $ti_i$, with the track points $p_1, p_2, \dots, p_k$ sorted in time order;

each track segment is uniquely identified by the combination of its ship identification and time interval attributes; because the HBase table $Tab_t$ uses the row key to determine where data are located among the storage partitions, when track data are stored in the form of track segments the row key rk is formed by combining the ship identification mmsi and the time interval ti:

rk＝mmsi+ti

wherein mmsi is the Maritime Mobile Service Identity and ti denotes the time interval; because mmsi is the prefix of rk, the lexicographic ordering of HBase row keys ensures that the track data of the same ship are stored contiguously and in order in $Tab_t$;

correspondingly, for the column families and columns of the HBase table $Tab_t$, a single column family stores all track points of a track segment; each column in the column family uses the sampling time of a track point as its column key, which ensures that the track points within a track segment are stored in time order; and the data cell corresponding to each column records the spatial position of the track point together with the other AIS information apart from the MMSI, time, longitude and latitude.

4. The mass ship historical track data storage system according to claim 1, wherein: the space-time index (6) in the local index partition (5) adopts a layered mixed structure and is divided into an upper layer and a lower layer, the upper layer is a classification index based on a time period, the time period is obtained by dividing a time dimension into a plurality of time periods with equal length, each time period comprises a plurality of time intervals, the lower layer is a track segment index and is composed of a plurality of R trees, the track segments are used as index objects by the R trees, and each R tree in the lower layer corresponds to the time period of the upper layer one by one;

to suit the indexing of ship track data, the R trees in the space-time index (6) do not index the spatial position attribute alone; the time attribute is also brought into the index range; index items of the classification index in the space-time index (6) are recorded as two-tuple A (tc, rtree), where tc denotes the time period of the index item and rtree denotes a pointer to the corresponding lower-layer R tree; intermediate nodes of the R trees in the space-time index (6) are recorded as triple A (tp, mbr, rcns), where tp denotes the time range of the node, mbr denotes the spatial minimum bounding rectangle of the node, and rcns denotes a set of pointers to child nodes; leaf nodes of the R trees in the space-time index (6) are recorded as triple B (tp, mbr, tses), where tp denotes the time range of the node, mbr denotes the spatial minimum bounding rectangle of the node, and tses denotes a set of pointers to track segment index items, the index items in tses being arranged by row key value; and track segment index items are recorded as two-tuple B (rk, mbr1), where rk denotes the row key of the indexed track segment and mbr1 denotes the spatial minimum bounding rectangle of the track segment.

5. The mass ship historical track data storage system according to claim 1, wherein: the ship identification time index (7) in the local index partition (5) adopts a layered mixed structure and is divided into an upper layer and a lower layer, the upper layer is a ship identification index and adopts a hash table structure for indexing the ship identification attribute; the lower layer is a time index and consists of a plurality of B + trees, each B + tree corresponds to a ship identification index item of the upper layer and is used for indexing time intervals corresponding to all track sections of the corresponding ship;

index items of the ship identification index in the ship identification time index (7) are recorded as two-tuple C (mmsi, bptree), where mmsi denotes a ship identification and bptree denotes a pointer to the corresponding lower-layer B+ tree; intermediate nodes of the B+ trees in the ship identification time index (7) are recorded as two-tuple D (tp, bcns), where tp denotes the time range of the node and bcns denotes a set of pointers to child nodes;

the B + tree leaf nodes in the ship identification time index (7) are recorded in a quadruplet (pre, next, tp, mes) mode, wherein pre is a pointer pointing to a previous leaf node, next is a pointer pointing to a next leaf node, tp is a time range of the nodes, and mes represents a set of row keys corresponding to track segments in the node time range.

6. A method for performing parallelized massive ship historical track data query by using the system of claim 1, which comprises the following steps:

step 1: sending the query time keyword qtk, the query space keyword qsk and the query ship identification set qms to each local index partition (5) of the local index module (2), wherein when a user initiates a query, at least one of three query conditions of the query time keyword qtk, the query space keyword qsk and the query ship identification set qms is not empty, and if the query time keyword qtk is empty, the query covers the whole time range; if the query space keyword qsk is empty, querying to cover the whole space range; if the query ship identification set qms is empty, querying to cover all ships;

step 2: parallel queries are started, and the query process at each local index partition (5) is as follows:

step 201: for each local index partition (5), the space-time index (6) is first queried with the query time keyword qtk and the query space keyword qsk to obtain the set of candidate row keys $crks_0 = \{rk_{0,1}, rk_{0,2}, \dots, rk_{m,n}\}$, where $rk_{m,n}$ denotes the row key of the track segment generated by the ship whose MMSI is m in the time interval $ti_n$;

step 202: the ship identification time index (7) is queried with the query time keyword qtk and the query ship identification set qms to obtain the set of candidate row keys $crks_1 = \{rk_{0,1}, rk_{0,2}, \dots, rk_{i,j}\}$, where $rk_{i,j}$ denotes the row key of the track segment generated by the ship whose MMSI is i in the time interval $ti_j$;

step 203: the intersection of the candidate row key set $crks_0$ and the candidate row key set $crks_1$ is computed to obtain the candidate row key set $crks_2$, and the candidate row key set $mcrks_0$ is obtained from $crks_2$;

step 204: the elements of the candidate row key set $mcrks_0$ are read in lexicographic order of row key value; for the current element $mcrk_g$, the track storage partition (4) corresponding to the local index partition (5) is queried with $mcrk_g$ to obtain the track data of the rows corresponding to the candidate row keys in $mcrks_0$, one row key corresponding to one row of track data in the track storage partition (4); the track data are then filtered with the query time keyword qtk and the query space keyword qsk to obtain the query result of the track storage partition (4).

7. The method for performing parallelized massive ship historical track data query according to claim 6, characterized in that: the specific process of step 201 is as follows: for each local index partition (5), the space-time index (6) is first queried with the query time keyword qtk and the query space keyword qsk; when the space-time index (6) is searched, the classification index in its upper layer is searched first with qtk, and if an index item two-tuple A (tc, rtree) of the classification index satisfies

$$qtk \cap tc \neq \varnothing$$

the R tree rtree corresponding to the index item is searched top-down; each time an intermediate node triple A (tp, mbr, rcns) of rtree is reached, if the intermediate node satisfies

$$qtk \cap tp \neq \varnothing$$

and

$$qsk \cap mbr \neq \varnothing$$

the child nodes of the node are searched through the set rcns of pointers to child nodes; each time a leaf node triple B (tp, mbr, tses) of rtree is reached, if the leaf node satisfies

$$qtk \cap tp \neq \varnothing$$

and

$$qsk \cap mbr \neq \varnothing$$

the index items contained in the node are searched through the set tses of pointers to track segment index items; each time an index item two-tuple B (rk, mbr1) is reached, the time interval ti of the indexed track segment is obtained from the row key rk, and if the index item satisfies

$$qtk \cap ti \neq \varnothing$$

and

$$qsk \cap mbr1 \neq \varnothing$$

the index item is regarded as a candidate index item and its row key is extracted as a candidate row key; after the space-time index search is finished, the set of candidate row keys $crks_0 = \{rk_{0,1}, rk_{0,2}, \dots, rk_{m,n}\}$ is obtained, where $rk_{m,n}$ denotes the row key of the track segment generated by the ship whose MMSI is m in the time interval $ti_n$.

8. The method for performing parallelized massive ship historical track data query according to claim 7, wherein: the specific process of step 202 is as follows: the ship identification time index (7) is queried with the query time keyword qtk and the query ship identification set qms; for each query ship identification $mmsi_q$ in qms, $mmsi_q$ is used to search the ship identification time index (7); if the search for $mmsi_q$ hits, the hit index item is denoted $(mmsi_q, bptree_q)$, where $bptree_q$ denotes the lower-layer B+ tree corresponding to $mmsi_q$ in the ship identification time index (7); $bptree_q$ is then searched top-down with qtk; each time an intermediate node (tp, bcns) of $bptree_q$ is reached, if the node satisfies

$$qtk \cap tp \neq \varnothing$$

the child nodes of the node are searched through the set bcns of pointers to child nodes; each time a leaf node quadruple (pre, next, tp, mes) of $bptree_q$ is reached, if the leaf node satisfies

$$qtk \cap tp \neq \varnothing$$

the row keys contained in mes are extracted as candidate row keys; for the first leaf node found by the top-down search whose time range intersects qtk, once that leaf node has been searched, the leaf nodes linked to it are searched further through the pointer pre to the previous leaf node and the pointer next to the next leaf node; let $(pre_p, next_p, tp_p, mes_p)$ be the leaf node linked to (pre, next, tp, mes) through the pre pointer, where $pre_p$, $next_p$, $tp_p$ and $mes_p$ denote, respectively, that node's pointer to the previous leaf node, its pointer to the next leaf node, its time range, and the set of row keys corresponding to the track segments within its time range; it is determined whether

$$qtk \cap tp_p \neq \varnothing$$

if so, the row keys contained in $mes_p$ are extracted as candidate row keys, and the search continues through $pre_p$ to the next linked leaf node, and so on, until the time range of the currently searched leaf node no longer intersects qtk; let $(pre_n, next_n, tp_n, mes_n)$ be the leaf node linked to (pre, next, tp, mes) through the next pointer, where $pre_n$, $next_n$, $tp_n$ and $mes_n$ denote, respectively, that node's pointer to the previous leaf node, its pointer to the next leaf node, its time range, and the set of row keys corresponding to the track segments within its time range; it is determined whether

$$qtk \cap tp_n \neq \varnothing$$

if so, the row keys contained in $mes_n$ are extracted as candidate row keys, and the search continues through $next_n$ to the next linked leaf node, and so on, until the time range of the currently searched leaf node no longer intersects qtk; when the leaf nodes reached through $pre_p$ and through $next_n$ no longer intersect qtk, the search of $bptree_q$ stops; after the current query ship identification has been searched, the next query ship identification in qms is read; after all elements of qms have been searched, the set of candidate row keys $crks_1 = \{rk_{0,1}, rk_{0,2}, \dots, rk_{i,j}\}$ is obtained, where $rk_{i,j}$ denotes the row key of the track segment generated by the ship whose MMSI is i in the time interval $ti_j$.

9. The method for performing parallelized massive ship historical track data query according to claim 8, characterized in that: in step 203, the specific process of obtaining the candidate row key set $mcrks_0$ from the candidate row key set $crks_2$ is as follows: the row keys in $crks_2$ are sorted in ascending order of row key value and, at the same time, grouped by MMSI attribute, giving the sorted and grouped row key set $mcrks_0 = \{mcrk_0, mcrk_1, \dots, mcrk_k\}$, where $mcrk_k$ denotes the set of candidate row keys of the ship whose MMSI is k, and the row keys within $mcrk_k$ are arranged in ascending order.

10. The method for performing parallelized massive ship historical track data query according to claim 9, wherein: the specific process of step 204 is as follows: the elements of the candidate row key set $mcrks_0$ are read in sequence; for the current element $mcrk_g$, the candidate row keys belonging to $mcrk_g$ are used to query the track storage partition (4) corresponding to the local index partition (5), and the track data of the rows corresponding to all candidate row keys in $mcrk_g$ are obtained, denoted $cTs_g = \{p_{g,1}, p_{g,2}, \dots, p_{g,f}\}$, where $p_{g,f}$ denotes the f-th track point, in ascending time order, of the ship whose MMSI is g; the track points in $cTs_g$ are then filtered with the query time keyword qtk and the query space keyword qsk: for a track point $p_{g,d} \in cTs_g$, if its time is not within the range of qtk or its spatial position is not within the range of qsk, $p_{g,d}$ is deleted from $cTs_g$; after filtering, the track query result $Ts_g$ of ship g is obtained; because the candidate row keys in $mcrk_g$ are ordered and the track points in the track storage partition (4) are stored in time order, the track points in $Ts_g$ are automatically in time order; $Ts_g$ is added to the partition query result $Tss_p$; the next element of $mcrks_0$ is then accessed and processed in the same way, until all elements of $mcrks_0$ have been processed.