CN106777093B

CN106777093B - Skyline inquiry system based on space time sequence data flow application

Info

Publication number: CN106777093B
Application number: CN201611150565.3A
Authority: CN
Inventors: 季长清; 王宝凤; 谢雨婧; 李媛媛
Original assignee: Dalian University
Current assignee: Dalian University
Priority date: 2016-12-14
Filing date: 2016-12-14
Publication date: 2021-01-01
Anticipated expiration: 2036-12-14
Also published as: CN106777093A

Abstract

A Skyline query system based on spatial time-series data stream applications belongs to the field of dynamic Skyline queries over data streams and addresses the problem of processing real-time queries over massive data. Key technical points: the system includes a cloud center service system comprising a dividing module, which performs spatial time-series partitioning by dividing a continuous time series into a plurality of time segments by time window; an inverted grid index generation module, which generates an inverted grid index for each time segment; and a computing module, which maps the query point at a given moment to the corresponding Skyline grid cell, obtains the global Skyline grid cells as a candidate set using a global Skyline grid computation method, and then performs a dynamic Skyline query on the network node data in the candidate set in time order to compute a valid global Skyline result. Effect: results are evaluated at the moment query execution ends, which makes them more accurate and closer to the actual situation.

Description

Skyline inquiry system based on space time sequence data flow application

Technical Field

The invention relates to the field of dynamic Skyline queries over data streams, in particular to a Skyline query system based on spatial time-series data stream applications, and involves large-scale data analysis, processing of massive spatial time-series data, and global Skyline computation.

Background

With the rapid development of the Internet and the Internet of Things, and the wide adoption of technologies such as social networks and cloud computing, massive-data technologies have developed rapidly. Massive amounts of data are collected, recorded, and used for research and analysis in science, engineering, commerce, and other fields. Recent studies have shown that data sources such as the global Internet, the mobile Internet, and GPS networks generate more than 2.5 × 10¹⁸ bytes of data per day, and these sources are highly diverse. The volume of data on the Internet doubles every two years, and massive data are continually added by the Internet of Things, the mobile Internet, the Internet of Vehicles, and various sensor networks. This explosive growth has made traditional stand-alone data analysis and processing techniques increasingly unable to meet current data-intensive analysis and processing requirements. To reduce cost and provide distributed processing frameworks for storing and computing over large-scale data, technologies such as cloud computing, big data, cloud storage, MapReduce, and BigTable have been proposed.

Cisco predicted that in 2016, 79% of the world's data centers would host cloud computing platforms. Massive data are stored on these platforms, and because the data volume is so large, massive-data processing places very high demands on software and hardware, consumes a great deal of system resources, and leads to low algorithm efficiency. Many researchers have proposed new, efficient massive-data processing algorithms built on cloud computing platforms. The Skyline algorithm is one such efficient data query and extraction method: it can quickly extract key information from massive data, greatly reducing the data volume, lowering the software and hardware requirements of massive-data processing, and improving processing efficiency. As an effective data extraction and processing method, the Skyline algorithm is mainly concerned with finding the information users are most interested in from huge data sets, and it is widely applied in massive-data analysis and processing, for example in multi-objective decision making, shop site selection, environmental monitoring, image retrieval, personalized recommendation, and data mining. A Skyline query provides users with a multi-attribute evaluation principle during decision making, and the evaluation function can adopt different metrics (such as Euclidean distance or spatial distance) depending on the application, improving the user's quality of experience. Skyline computation can help market analysts set prices and market strategies from massive business transaction records, and in environmental monitoring, potential natural disasters and risks can be analyzed and evaluated from the massive data accumulated by sensor networks.

Skyline algorithms have many variants with a wide range of application scenarios, and each variant has its own characteristics and open problems. Most existing MapReduce-based Skyline algorithms are static Skyline algorithms and cannot solve the problems of Skyline variants in general, so MapReduce-based Skyline algorithms need further research and extension. Beyond the need for MapReduce support, the variants themselves still face unsolved problems. For example, the subspace Skyline handles the heavy computation caused by high-dimensional data well, but its result set is too large and most results are not needed by users, which does not suit the current trend of queries from mobile Internet terminals. In the dynamic Skyline, the attribute values of the queried objects change with the query object, so the computation is heavy and the demands on real-time performance, response time, and user experience are high; the partitioning or indexing schemes used by current MapReduce-based Skyline algorithms cannot meet these requirements. The metric-space Skyline suffers from problems of metric-space modeling and high query complexity, which affect query precision and increase computation. Because all attribute values in the dynamic Skyline change with the query point, processing massive data involves heavy computation and strict real-time requirements. For example, a dynamic Skyline query from a mobile phone user has very high real-time requirements, and data generated by mobile terminals has become a main source of data growth in the big-data era.
Given this trend, dynamic Skyline algorithms designed for centralized environments cannot process massive data, and the partitioning schemes generally adopted by existing MapReduce-based Skyline algorithms do not meet this requirement either. A parallel dynamic reverse-Skyline query implemented with MapReduce has been proposed in the literature; it relies on quadtree (rsky-quadtree) partitioning, and its drawback is that for each query point q, an extra step is required to transform the coordinates p of every data point into p', after which the quadtree must be rebuilt. On large data, coordinate transformation and quadtree reconstruction incur heavy overhead. To solve these problems, the definitions of Skyline lattices and global Skyline lattices are proposed, and on that basis a dynamic Skyline query algorithm for spatial time-series data stream applications is provided. The main idea is to divide the dynamically changing data space, in units of time windows, into time-stamped non-uniform Skyline lattices, that is, to build a time-ordered inverted lattice index. When a query point arrives, the current query time is used to predict the query end time (the prediction can be made from the average system execution time or by sampling, and is represented by the lower limit of an execution time window). The dominance relations among the Skyline lattices in the four quadrants around the query point at the end time are then computed by polling, the global Skyline lattices are obtained by comparing these dominance relations, and the data in the global Skyline lattices form a candidate set for the subsequent dynamic Skyline computation.
The method not only prunes effectively in real time and avoids a large amount of unnecessary computation, but also adjusts dynamically as time changes, which speeds up dynamic Skyline queries and makes the results more accurate. To verify the proposed algorithm, a system prototype was designed and applied to detecting abnormal conditions in network monitoring.
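The end-time prediction step can be sketched as follows. This is a minimal illustration, assuming the predicted execution time is simply the mean of previously observed execution times (one reading of "average system execution time"); the function name `predict_end_time` is hypothetical, and a sampling-based estimate could be substituted.

```python
from statistics import mean


def predict_end_time(t_query: float, past_exec_times: list[float]) -> float:
    """Predict the end time of a query that arrives at t_query.

    Hypothetical helper: the expected execution time is assumed to be the
    mean of previously observed execution times.
    """
    if not past_exec_times:
        raise ValueError("at least one observed execution time is required")
    return t_query + mean(past_exec_times)


# A query arriving at t = 100.0, with past runs of 2, 4 and 3 time units.
end = predict_end_time(100.0, [2.0, 4.0, 3.0])
print(end)  # 103.0
```

In this sketch, the subsequent lattice comparison would use the data as of `end` rather than as of the arrival time, matching the idea of evaluating results at the moment execution finishes.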

Existing MapReduce-based Skyline algorithms offer little support for time-based subspace Skyline queries or for time-ordered dynamic Skyline queries in parallel environments. For example, some MapReduce-based Skyline algorithms modify the Hadoop framework but still suffer from poor scalability and poor generality. The MapReduce-based dynamic Skyline query methods studied and designed in the prior art can only process non-real-time data in offline batch mode and are poorly suited to real-time data queries. These methods are not suitable for today's explosively growing data queries, and this is the starting point from which the invention was designed and implemented.

Disclosure of Invention

In view of the defects and shortcomings in the background art, the invention provides a Skyline query system based on spatial time-series data stream applications in a cloud computing environment, so as to remedy the defects of existing dynamic Skyline query methods for data streams and to improve accuracy, processing efficiency, and user experience.

A Skyline query system based on spatial time-series data stream applications comprises a cloud center service system, wherein the cloud center service system comprises: a dividing module for performing spatial time-series partitioning, which divides a continuous time series into a plurality of time segments by time window; an inverted grid index generation module for generating an inverted grid index for each time segment; and a computing module for mapping the query point at a given moment to the corresponding Skyline grid cell, obtaining the global Skyline grid cells as a candidate set using a global Skyline grid computation method, and then performing a dynamic Skyline query on the network node data in the candidate set in time order to compute a valid global Skyline result.

The dividing module performs spatial time-series partitioning as follows. Given a set of objects P, the time attribute of each data point p_k lies in a bounded interval [T_min, T_max]. Construct a uniform partition {t₀, …, t_B}, where

$$t_i = T_{\min} + l \times i, \qquad l = \frac{T_{\max} - T_{\min}}{B}, \qquad i = 0, \dots, B,$$

and B is the number of equal sub-intervals into which the bounded interval is divided. This forms a set of time slices {b₀, …, b_{B−1}}, each time slice b_i = [t_i, t_{i+1}) having fixed length l. Each point, whose time attribute value is t, is mapped to the time slice b_{s(t)} ∈ {b₀, …, b_{B−1}}, where s(t) is the index of the slice containing t:

$$s(t) = \left\lfloor \frac{t - T_{\min}}{l} \right\rfloor, \qquad T_{\min} \le t < T_{\max}.$$
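A minimal Python sketch of the uniform partition and of the slice mapping, assuming s(t) = ⌊(t − T_min)/l⌋, i.e. the index of the half-open slice that contains t. The function names are hypothetical. A point with t = T_max falls outside every half-open slice, so the sketch assigns it to the last slice; that choice is an added assumption.

```python
def build_partition(t_min: float, t_max: float, B: int) -> list[float]:
    """Return the uniform partition points t_0, ..., t_B of [t_min, t_max]."""
    l = (t_max - t_min) / B
    return [t_min + l * i for i in range(B + 1)]


def slice_index(t: float, t_min: float, t_max: float, B: int) -> int:
    """Return s(t), the index of the time slice b_i = [t_i, t_{i+1}) containing t.

    Assumption: t == t_max is assigned to the last slice b_{B-1}.
    """
    if not t_min <= t <= t_max:
        raise ValueError("t lies outside the bounded interval [t_min, t_max]")
    l = (t_max - t_min) / B
    return min(int((t - t_min) // l), B - 1)


print(build_partition(0.0, 10.0, 4))  # [0.0, 2.5, 5.0, 7.5, 10.0]
print([slice_index(t, 0.0, 10.0, 4) for t in (0.0, 2.4, 2.5, 9.9, 10.0)])  # [0, 0, 1, 3, 3]
```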

In the inverted grid index generation module, the inverted grid index for each time slice is generated as follows. Let P = {p₁, …, p_n} be a given set of d-dimensional spatial objects, where each data point p_k ∈ P has d attributes {p_k.x₁, …, p_k.x_d}. The d-dimensional data space is divided into a grid of equal-width cells with widths (δ₁, …, δ_d). The width in each dimension is chosen according to the values in that dimension so that the mapped data points are distributed as evenly as possible over the cells. All points in the same time slice are scanned, and each point p_k is mapped to its grid coordinates.
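The per-slice index construction can be sketched as below. The text does not spell out the coordinate mapping, so the sketch assumes the common equal-width form ⌊(x_j − low_j)/width_j⌋, where `lows` (a per-dimension lower bound) is a hypothetical parameter; `cell_of` and `build_inverted_grid_index` are illustrative names.

```python
from collections import defaultdict
from typing import Sequence


def cell_of(point: Sequence[float], lows: Sequence[float],
            widths: Sequence[float]) -> tuple[int, ...]:
    """Map a d-dimensional point to integer grid coordinates (equal-width cells, assumed form)."""
    return tuple(int((x - lo) // w) for x, lo, w in zip(point, lows, widths))


def build_inverted_grid_index(points: dict[str, Sequence[float]],
                              lows: Sequence[float],
                              widths: Sequence[float]) -> dict[tuple[int, ...], list[str]]:
    """Build an inverted index {grid cell -> [point ids]} for the points of one time slice."""
    index: dict[tuple[int, ...], list[str]] = defaultdict(list)
    for pid, coords in points.items():
        index[cell_of(coords, lows, widths)].append(pid)
    return dict(index)


slice_points = {"a": (1.0, 2.0), "b": (1.5, 2.5), "c": (7.0, 0.5)}
print(build_inverted_grid_index(slice_points, lows=(0.0, 0.0), widths=(5.0, 5.0)))
# {(0, 0): ['a', 'b'], (1, 0): ['c']}
```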

In the computing module, the global Skyline grid computation method is as follows: the query point q is mapped to its grid cell c_q, and the whole grid area is divided into an influence region and a dominated region. The influence region comprises the non-empty cells surrounding c_q and the cells on the same horizontal or vertical line as c_q; the dominated region is the region dominated by the influence region. The influence region is searched by quadrant polling: the dominance relations among the non-empty Skyline lattices in each quadrant around the query point are computed by progressive expansion, and the global Skyline lattices, together with the data points they contain, are obtained by comparing these dominance relations.

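The Skyline lattice dominance relation is defined as follows. Given a query point q in d-dimensional space and any two non-empty Skyline lattices c_i, c_j in the Skyline lattice set C, where c(t) and q(t) denote the coordinates of lattice c and of q in dimension t, Skyline lattice c_i dominates Skyline lattice c_j with respect to q if the following conditions are satisfied simultaneously:

① $(c_i(t) - q(t))\,(c_j(t) - q(t)) > 0$ for every dimension $t$;

② $|c_i(t) - q(t)| \le |c_j(t) - q(t)|$ for every dimension $t$;

③ $|c_i(t) - q(t)| < |c_j(t) - q(t)|$ for at least one dimension $t$.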
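Conditions ① to ③ translate directly into a predicate. In this sketch (hypothetical name `cell_dominates`), ① and ② are read as holding in every dimension and ③ in at least one, and each lattice and the query point are represented by integer grid coordinates; representing q by the coordinates of its own cell c_q is an assumption.

```python
from typing import Sequence


def cell_dominates(ci: Sequence[int], cj: Sequence[int], q: Sequence[int]) -> bool:
    """Return True if Skyline lattice ci dominates cj with respect to q.

    ci, cj and q are grid coordinates of equal length d.
    """
    # (1) Same side of q in every dimension (same quadrant, not on an axis line).
    same_side = all((a - x) * (b - x) > 0 for a, b, x in zip(ci, cj, q))
    # (2) ci is no farther from q than cj in every dimension.
    no_farther = all(abs(a - x) <= abs(b - x) for a, b, x in zip(ci, cj, q))
    # (3) ci is strictly closer to q in at least one dimension.
    closer = any(abs(a - x) < abs(b - x) for a, b, x in zip(ci, cj, q))
    return same_side and no_farther and closer


q_cell = (0, 0)
print(cell_dominates((1, 1), (2, 3), q_cell))   # True
print(cell_dominates((1, 1), (-2, 3), q_cell))  # False: different quadrants
print(cell_dominates((0, 1), (0, 2), q_cell))   # False: on the same vertical line as q
```

Because condition ① requires a nonzero offset in every dimension, lattices lying on the same horizontal or vertical line as q can never be dominated in this sketch.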

Global Skyline lattice: given a lattice set C, the global Skyline lattices of C are the lattices in C that are not dominated by any other lattice in C, i.e.

$$\mathrm{GSC}(C) = \{\, c \in C \mid \nexists\, c' \in C : c' \text{ dominates } c \text{ with respect to } q \,\}.$$
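A brute-force sketch of this definition follows: each non-empty lattice is compared against every other one, which is quadratic in the number of lattices (the quadrant polling with progressive expansion described for the computing module is meant to avoid a full comparison). Function names are hypothetical, and lattices and q are represented by grid coordinates.

```python
from typing import Sequence


def cell_dominates(ci, cj, q) -> bool:
    """Conditions (1)-(3): same side of q, no farther in every dimension, closer in one."""
    triples = list(zip(ci, cj, q))
    return (all((a - x) * (b - x) > 0 for a, b, x in triples)
            and all(abs(a - x) <= abs(b - x) for a, b, x in triples)
            and any(abs(a - x) < abs(b - x) for a, b, x in triples))


def global_skyline_cells(cells: Sequence[tuple[int, ...]],
                         q: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Return the lattices not dominated by any other lattice (brute force, O(|C|^2))."""
    return [c for c in cells if not any(cell_dominates(o, c, q) for o in cells if o != c)]


nonempty = [(1, 1), (2, 3), (-1, 2), (-3, 4), (0, 5)]
print(global_skyline_cells(nonempty, (0, 0)))  # [(1, 1), (-1, 2), (0, 5)]
```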

When the index is built, a MapReduce processing flow is used: several Map tasks are started to read the stream data concurrently, each Map reading a different HDFS data fragment and emitting <key, value> pairs, where the key is a spatio-temporal index and the value is a hashmap data structure that stores the data points assigned to that key by the partitioning. The intermediate output of each Map is a sub-index over part of the data; it is sorted by key, and a Reduce is then invoked to merge the sub-indexes into the final index.
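The Map/Reduce division of labour can be illustrated without a cluster. The sketch below is a single-process simulation in plain Python, not Hadoop or Spark API code: the hypothetical `map_fragment` plays the role of one Map reading one data fragment, and `reduce_merge` plays the Reduce that merges the key-sorted sub-indexes. The key layout (time-slice index, grid cell) and the floor-based cell mapping are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_fragment(fragment, t_min, t_max, B, lows, widths):
    """Map phase for one data fragment.

    fragment: list of (point_id, t, coords). Returns (key, value) pairs sorted by key,
    where key = (time-slice index, grid cell) and value = {point_id: coords}.
    """
    l = (t_max - t_min) / B
    out = defaultdict(dict)
    for pid, t, coords in fragment:
        s = min(int((t - t_min) // l), B - 1)
        cell = tuple(int((x - lo) // w) for x, lo, w in zip(coords, lows, widths))
        out[(s, cell)][pid] = coords
    return sorted(out.items())  # this Map's sub-index, sorted by key


def reduce_merge(map_outputs):
    """Reduce phase: merge all sub-indexes, key by key, into one index."""
    merged = {}
    for key, value in sorted(chain.from_iterable(map_outputs), key=lambda kv: kv[0]):
        merged.setdefault(key, {}).update(value)
    return merged


frag1 = [("a", 1.0, (1.0, 1.0)), ("b", 6.0, (6.0, 1.0))]
frag2 = [("c", 1.5, (2.0, 2.0))]
params = dict(t_min=0.0, t_max=10.0, B=2, lows=(0.0, 0.0), widths=(5.0, 5.0))
index = reduce_merge([map_fragment(frag1, **params), map_fragment(frag2, **params)])
print(index)
# {(0, (0, 0)): {'a': (1.0, 1.0), 'c': (2.0, 2.0)}, (1, (1, 0)): {'b': (6.0, 1.0)}}
```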

During spatial time-series partitioning, a monitoring time range is set and a corresponding threshold is defined. If the query time exceeds the specified time range, the query must span several time windows; the number of time windows to be spanned is evaluated, and if it exceeds the time threshold, the query fails immediately.
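A sketch of this admission check, under the assumptions that time windows are aligned to the start of the monitoring range and that the threshold is expressed as a maximum number of windows; `admit_query`, `windows_spanned`, and `max_windows` are hypothetical names.

```python
from typing import Optional


def windows_spanned(t_start: float, t_end: float, t_min: float, window_len: float) -> int:
    """Number of consecutive time windows touched by the interval [t_start, t_end]."""
    return int((t_end - t_min) // window_len) - int((t_start - t_min) // window_len) + 1


def admit_query(t_start: float, t_end: float, t_min: float, t_max: float,
                window_len: float, max_windows: int) -> Optional[int]:
    """Return the number of windows a query spans, or None if the query fails.

    The query fails if it leaves the monitoring range [t_min, t_max] or if it
    spans more windows than the threshold max_windows.
    """
    if t_start > t_end or t_start < t_min or t_end > t_max:
        return None
    n = windows_spanned(t_start, t_end, t_min, window_len)
    return n if n <= max_windows else None


print(admit_query(1.0, 4.0, 0.0, 100.0, 5.0, 2))     # 1
print(admit_query(1.0, 12.0, 0.0, 100.0, 5.0, 2))    # None: spans 3 windows
print(admit_query(90.0, 120.0, 0.0, 100.0, 5.0, 2))  # None: outside the monitoring range
```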

Beneficial effects: using the techniques described, the spatial time-series data stream system can process large amounts of information accurately and efficiently according to user requirements, upload it to a cloud server for analysis, and return the final conclusion to the user.

Drawings

FIG. 1 shows time-series-based partitioning;

FIG. 2 shows the time-series-based inverted index structure;

FIG. 3 shows the grid-based inverted index creation process;

FIG. 4 shows an example of index generation with MapReduce;

FIG. 5 shows global Skyline lattices.

Detailed Description

Example 1:

The Skyline query method based on spatial time-series data stream applications comprises the following steps:

S1. Spatial time-series partitioning:

A continuous time series is divided into several time segments by time window, as shown in FIG. 1. Given a set of objects P, the time attribute of each data point p_k lies in a bounded interval [T_min, T_max]. Construct a uniform partition {t₀, …, t_B}, where

$$t_i = T_{\min} + l \times i, \qquad l = \frac{T_{\max} - T_{\min}}{B}, \qquad i = 0, \dots, B,$$

forming a set of time slices {b₀, …, b_{B−1}}, where each time slice b_i = [t_i, t_{i+1}) has fixed length l. Each point, whose time attribute value is t, is mapped to the time slice b_{s(t)} ∈ {b₀, …, b_{B−1}}, where s(t) is the index of the slice containing t, i.e. $s(t) = \lfloor (t - T_{\min}) / l \rfloor$ for $T_{\min} \le t < T_{\max}$.

B is the number of equal sub-intervals into which the bounded interval is divided.

The interval length l, and hence the time granularity, is chosen according to the practical application. To reduce computation, a monitoring time range and a threshold are set: if a query exceeds the specified time range, it must span several time windows; the number of windows to be spanned is evaluated, and if it exceeds the threshold, the query fails immediately. Because time windows are introduced, the monitoring range must be further constrained. If the time window is small and little data has accumulated, a batch-flow caching method is used: the data stream is cached and then sent periodically in batches. If the time window is large and the data volume is large, the data stream is split by window, with the split granularity determined by the actual application scenario. The monitoring range therefore has an upper and a lower limit, and queries outside it are handled as failures. This treatment also matches the needs of real query applications: for example, if a vehicle is moving fast, there is no need to continue a query once it has left a given application area. Experimental results show that the method works relatively well when the computation follows the sampled distribution probability.
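The two stream-handling modes can be sketched as follows. The time-based flush rule in the hypothetical `BatchBuffer` class and the window alignment in the hypothetical `split_by_window` function are assumptions chosen for illustration; the text leaves the flush period and the split granularity to the application.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BatchBuffer:
    """Batch-flow cache for small windows: buffer records and flush them periodically."""
    flush_period: float
    last_flush: float = 0.0
    pending: list = field(default_factory=list)

    def add(self, t: float, record) -> Optional[list]:
        """Cache a record; return the whole cached batch once flush_period has elapsed."""
        self.pending.append(record)
        if t - self.last_flush >= self.flush_period:
            batch, self.pending = self.pending, []
            self.last_flush = t
            return batch
        return None


def split_by_window(records, t_min: float, window_len: float) -> dict:
    """For large windows: split (t, payload) records into per-window groups."""
    groups = defaultdict(list)
    for t, payload in records:
        groups[int((t - t_min) // window_len)].append(payload)
    return dict(sorted(groups.items()))


buf = BatchBuffer(flush_period=10.0)
print([buf.add(t, r) for t, r in [(1.0, "a"), (5.0, "b"), (10.0, "c")]])
# [None, None, ['a', 'b', 'c']]
print(split_by_window([(0.5, "x"), (1.5, "y"), (0.9, "z")], t_min=0.0, window_len=1.0))
# {0: ['x', 'z'], 1: ['y']}
```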

S2. Building an inverted grid index for each time segment:

In this step, a time-series-based inverted grid index data structure is designed, as shown in FIG. 2. For each time slice, the time is determined, the end time (i.e. the lower limit of the execution time window) is estimated, and an inverted grid index is built; the index generation process is shown in FIG. 3. Let P = {p₁, …, p_n} be a given set of d-dimensional spatial objects, where each data point p_k ∈ P has d attributes {p_k.x₁, …, p_k.x_d}. The d-dimensional data space is divided into a grid of equal-width cells with widths (δ₁, …, δ_d). The width in each dimension is determined from the values in that dimension so that the mapped data points are distributed as evenly as possible. All points in the same time slice are scanned, and each point p_k is mapped to its grid coordinates.

In steps S1 and S2, both time-series partitioning and grid index generation use a MapReduce processing flow: several Map tasks are started to read the stream data concurrently, and each Map reads a different HDFS data fragment and emits <key, value> pairs, where the key is a spatio-temporal index and the value is a hashmap data structure that stores the data points assigned to that key by the partitioning. The intermediate output of each Map is a sub-index over part of the data and is sorted automatically by key. To ensure data integrity and consistency, a Reduce is finally invoked to merge the sub-indexes into the complete index. Generating the time-series-based inverted index is a preprocessing step: the index is built in advance for subsequent queries and does not add to query time, which makes it an effective way to manage data. MapReduce's capacity for parallel processing of big data makes it well suited to this work.

In addition, a Spark streaming system is used to start several Map tasks concurrently to read the time-stream data; each Map reads a different HDFS data fragment and emits <key, value> pairs, where the key is a spatio-temporal index and the value is a hashmap data structure storing the data points assigned by the partitioning. With the number of time slices B set to n and the grid width set to 15, each Map produces intermediate data as shown in FIG. 4, i.e. sub-indexes over part of the data, which are sorted automatically by key.

Compared with existing work, the method has two optimizations. The first is that results are evaluated at the time query execution ends, which is more representative. For example, if a vehicle driving fast on a highway issues a query, the results should be filtered according to the time at which the query ends, so that they are more accurate and match the actual situation. The second is that a Spark streaming system is used, and the Map results are cached in a distributed manner as streams instead of being written to HDFS, which can greatly speed up computation.

S3. Computing the global Skyline lattices:

For massive data, to reduce computation, a coarse-grained method for computing global Skyline lattices is proposed, and the data in the global Skyline lattices obtained by polling are used as the candidate set. Compared with the original data set, the candidate set contains far less data, so fewer dominance comparisons are needed in the subsequent dynamic Skyline computation; the process is similar to pruning. The definitions of Skyline lattice dominance and of the global Skyline lattice are given below.

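Definition (Skyline lattice dominance): given a query point q in d-dimensional space and any two non-empty Skyline lattices c_i, c_j in the Skyline lattice set C, where c(t) and q(t) denote the coordinates of lattice c and of q in dimension t, Skyline lattice c_i dominates Skyline lattice c_j with respect to q if the following conditions are satisfied simultaneously:

① $(c_i(t) - q(t))\,(c_j(t) - q(t)) > 0$ for every dimension $t$;

② $|c_i(t) - q(t)| \le |c_j(t) - q(t)|$ for every dimension $t$;

③ $|c_i(t) - q(t)| < |c_j(t) - q(t)|$ for at least one dimension $t$.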

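Definition (global Skyline lattice): for a given lattice set C, the global Skyline lattices of C, GSC(C), are the set of all lattices in C that are not dominated by any other lattice:

$$\mathrm{GSC}(C) = \{\, c \in C \mid \nexists\, c' \in C : c' \text{ dominates } c \text{ with respect to } q \,\}$$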

The cost of a dynamic Skyline query is directly related to the size of the data set; in particular, determining dominance relations among massive data in real time is expensive, and every query must be recomputed. The global Skyline lattice concept enables effective coarse-grained pruning, and the candidate set obtained on this basis is the input to the subsequent dynamic Skyline query computation. The coarse-grained pruning process is described in detail below.

As shown in FIG. 5, the query point q is mapped to its grid cell c_q, and the entire grid area is divided into the influence region and the dominated region. The influence region comprises the non-empty cells c₁, c₂, c₃, …, c₈ surrounding c_q and the cells on the same horizontal or vertical line as c_q, such as c₉; the dominated region is the region dominated by the influence region, such as cell c₁₀ in the second quadrant. The influence region is searched by 2^d-quadrant polling (d is the dimensionality of the data set): the dominance relations among the non-empty Skyline lattices in each quadrant around the query point are computed by progressive expansion, and the global Skyline lattices and the data points they contain are obtained by comparing these dominance relations, so the data points in the influence region are found without traversing all the data. Traversing a very small number of Skyline lattices, instead of all the raw data, significantly reduces computational cost.
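One way to realize the quadrant polling (one pass per orthant, 2^d of them in d dimensions) is sketched below. The L1-distance expansion order and the per-orthant lists are assumptions about how "progressive expansion" might be implemented, not a procedure stated in the text, and all names are hypothetical. Lattices that share a coordinate with c_q (including c_q itself) can never satisfy condition ①, so the sketch keeps them directly, as the influence region does.

```python
from collections import defaultdict
from typing import Optional, Sequence

Cell = tuple[int, ...]


def orthant(cell: Cell, qc: Cell) -> Optional[Cell]:
    """Sign pattern of a lattice relative to q's lattice, or None if they share a coordinate."""
    if any(a == x for a, x in zip(cell, qc)):
        return None
    return tuple(1 if a > x else -1 for a, x in zip(cell, qc))


def dominates(ci: Cell, cj: Cell, qc: Cell) -> bool:
    """Conditions (2)-(3); condition (1) holds because callers compare within one orthant."""
    return (all(abs(a - x) <= abs(b - x) for a, b, x in zip(ci, cj, qc))
            and any(abs(a - x) < abs(b - x) for a, b, x in zip(ci, cj, qc)))


def quadrant_polling(nonempty: Sequence[Cell], qc: Cell) -> list[Cell]:
    """Global Skyline lattices via per-orthant polling with progressive expansion."""
    axis_cells: list[Cell] = []                     # c_q and lattices on its axis lines
    kept: dict[Cell, list[Cell]] = defaultdict(list)  # orthant -> surviving lattices
    # Expand outward by L1 distance from c_q: a dominator is strictly closer in L1
    # distance, so it is always visited before any lattice it dominates.
    for cell in sorted(nonempty, key=lambda c: sum(abs(a - x) for a, x in zip(c, qc))):
        o = orthant(cell, qc)
        if o is None:
            axis_cells.append(cell)                 # condition (1) never holds: keep
        elif not any(dominates(k, cell, qc) for k in kept[o]):
            kept[o].append(cell)                    # not dominated within its orthant
    return sorted(axis_cells + [c for cells in kept.values() for c in cells])


cells = [(0, 0), (1, 1), (2, 3), (-1, 2), (-3, 4), (0, 5), (3, -1), (2, -4)]
print(quadrant_polling(cells, (0, 0)))
# [(-1, 2), (0, 0), (0, 5), (1, 1), (2, -4), (3, -1)]
```

Checking each lattice only against the survivors of its orthant is enough because dominance is transitive: a lattice dominated by a discarded lattice is also dominated by whichever survivor discarded it.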

In this step, the global Skyline lattices are applied to network monitoring data: the query point q at the current moment is mapped to its Skyline lattice, the global Skyline lattices are obtained with the global Skyline lattice computation method and used as the candidate set, a dynamic Skyline query is then performed on the network node data in the candidate set in time order, and finally a valid global Skyline result is computed, namely the nodes in the monitored network that are close to the query threshold.
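An end-to-end sketch of this step, under several assumptions: node states are points in a small attribute space, the grid uses a floor-based mapping with hypothetical `lows`/`widths` parameters, q's own lattice is used for the lattice-level comparison, and `skyline_query` and the other names are hypothetical. Because the lattice-level pruning is coarse-grained, the sketch is not guaranteed to return exactly the point-level dynamic Skyline of the full data set in every case; in the example, node n2 is discarded at the lattice level before any point comparison.

```python
from collections import defaultdict


def to_cell(p, lows, widths):
    """Equal-width grid mapping (assumed form: floor((x - low) / width))."""
    return tuple(int((x - lo) // w) for x, lo, w in zip(p, lows, widths))


def cell_dominates(ci, cj, qc):
    """Lattice dominance with respect to q's lattice, conditions (1)-(3)."""
    t = list(zip(ci, cj, qc))
    return (all((a - x) * (b - x) > 0 for a, b, x in t)
            and all(abs(a - x) <= abs(b - x) for a, b, x in t)
            and any(abs(a - x) < abs(b - x) for a, b, x in t))


def dyn_dominates(b, a, q):
    """b dynamically dominates a: no farther from q in every attribute, closer in one."""
    return (all(abs(bi - qi) <= abs(ai - qi) for bi, ai, qi in zip(b, a, q))
            and any(abs(bi - qi) < abs(ai - qi) for bi, ai, qi in zip(b, a, q)))


def skyline_query(nodes, q, lows, widths):
    """Coarse pruning with global Skyline lattices, then a point-level dynamic Skyline."""
    index = defaultdict(list)
    for nid, p in nodes.items():
        index[to_cell(p, lows, widths)].append(nid)
    qc = to_cell(q, lows, widths)
    cells = list(index)
    gsc = [c for c in cells if not any(cell_dominates(o, c, qc) for o in cells if o != c)]
    cand = {nid: nodes[nid] for c in gsc for nid in index[c]}  # candidate set
    return sorted(n for n, a in cand.items()
                  if not any(dyn_dominates(b, a, q) for m, b in cand.items() if m != n))


nodes = {"n1": (1.2, 0.6), "n2": (2.5, 3.5), "n3": (-0.8, 2.2),
         "n4": (0.2, 0.9), "n5": (1.5, 1.5)}
print(skyline_query(nodes, q=(0.5, 0.5), lows=(0.0, 0.0), widths=(1.0, 1.0)))  # ['n1', 'n4']
```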

The corresponding system or device obtained from the above method is as follows.

In the inverted grid index generation module, the inverted grid index for each time slice is generated as follows: let P = {p₁, …, p_n} be a given set of d-dimensional spatial objects, where each data point p_k ∈ P has d attributes {p_k.x₁, …, p_k.x_d}. The d-dimensional data space is divided into a grid of equal-width cells with widths (δ₁, …, δ_d), where the width in each dimension is chosen from the values in that dimension so that the mapped data points are distributed evenly over the cells. All points in the same time slice are scanned, and each point p_k is mapped to its grid coordinates.

In the computing module, Skyline lattice c_i dominates Skyline lattice c_j with respect to the query point q when the following conditions are satisfied simultaneously:

① $(c_i(t) - q(t))\,(c_j(t) - q(t)) > 0$ for every dimension $t$;

② $|c_i(t) - q(t)| \le |c_j(t) - q(t)|$ for every dimension $t$;

③ $|c_i(t) - q(t)| < |c_j(t) - q(t)|$ for at least one dimension $t$.

Example 2:

This embodiment is a specific application of the Skyline query method based on spatial time-series data stream applications described in Example 1:

The Skyline query system based on spatial time-series data stream applications is used for mobile medical calling. The cloud center service system provides a spatial grid pruning strategy and continuous monitoring of network medical data in order to execute the dynamic Skyline and global Skyline algorithms; the thresholds of all attributes are input, and query results are sent according to the time at which execution ends, so that the hospital information returned is more accurate. Specifically, the system executes the following steps:

S1. Using the distributed dynamic Skyline and global Skyline algorithms, the cloud center service system provides the modular index data structure; at the same time, a Spark streaming system starts several Map tasks to read the time-stream data, each Map reading a different HDFS data fragment and emitting <key, value> pairs, where the key is a spatio-temporal index and the value is a hashmap data structure storing the data points assigned by the partitioning, so as to screen large-scale medical institution data.

S2. The intelligent mobile client first locates the terminal device via GPS and determines the user's location and personalized requirements. It then runs the medical calling program, communicates with the cloud server, sends a query instruction, and, with the user's participation, exchanges information with the spatial filtering results and the continuous spatial monitoring data fed back by the cloud center service system.

Example 3:

The Skyline query method based on spatial time-series data stream applications in Example 1 is used for epidemic detection. First, the time series used for epidemic monitoring is divided into several time segments by time window, and a static Skyline query is then performed on the data of each time segment. For a set of epidemic-related objects P, the time attribute of each data point p_k lies in a bounded interval [T_min, T_max]. Construct a uniform partition {t₀, …, t_B}, where $t_i = T_{\min} + l \times i$, $l = (T_{\max} - T_{\min})/B$, $i = 0, \dots, B$. This forms a set of time slices {b₀, …, b_{B−1}}, each time slice b_i = [t_i, t_{i+1}) having fixed length l. Each point, whose time attribute value is t, is mapped to the time slice b_{s(t)} ∈ {b₀, …, b_{B−1}}, where $s(t) = \lfloor (t - T_{\min}) / l \rfloor$ is the index of the slice containing t.

The interval length l, and hence the granularity, is chosen according to the actual monitoring period. To reduce computation, a time range for epidemic monitoring and a threshold are set: if a query exceeds the specified time range, it must span several time windows; the number of windows to be spanned is evaluated, and if it exceeds the threshold, the query fails immediately. The states of the network nodes are monitored dynamically in real time, and each node continuously reports its state to the server as of the time execution ends, so that the results are more accurate and match the actual situation.

Example 4:

The Skyline query method based on spatial time-series data stream applications in Example 1 is used to analyze historical medical data. For a given medical history data set, the static Skyline results can be determined. If real-time medical data are continuously added and a query request is specified, so that the dominance relations between the objects in the data set and the query point must be considered, the Skyline result is no longer fixed: for a dynamic Skyline query, the result differs with the query reference object. Because a user's query may change, the medical history data being queried also change, and a multi-factor query over data with dominance relations is a Skyline query. When the accumulated historical medical data, especially multi-dimensional information such as condition, etiology, time of onset, and treatment, become so large that a single computing node cannot process them, cloud computing is used for parallel processing. Dynamic Skyline query: let S = {s₁, s₂, …, s_d} be a d-dimensional data space, P = {p₁, p₂, …, p_n} a data set on S, and ref a query object; dominance is computed over time according to the dynamic dominance relation, yielding the Skyline result set. A data object b dynamically dominates a if and only if b is no farther from ref than a in every attribute and strictly closer in at least one. If the query point changes over time, the indexing and query operations must also be processed dynamically in time-stream order. This embodiment has two advantages. First, results are evaluated at the time execution ends, which makes them more accurate and closer to the actual situation.
Second, a Spark streaming system caches the Map results in a distributed manner as streams, which can greatly speed up computation. The invention can be applied to mobile disease early-warning monitoring, mobile medical calling, medical history data retrieval, and similar tasks. To address the difficulty of incrementally maintaining query results, the method is applied to dynamic global Skyline queries in network monitoring, with a focus on discovering abnormal conditions.

The above description is only a preferred embodiment of the invention, but the protection scope of the invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art within the technical scope disclosed by the invention, according to the technical solution and inventive concept of the invention, shall fall within the protection scope of the invention.

Claims

1. A Skyline query system based on spatial time-series data stream applications, characterized in that

it comprises a cloud center service system, the cloud center service system comprising:

a dividing module for performing spatial time-series partitioning, which divides a continuous time series into a plurality of time segments by time window;

an inverted grid index generation module for generating an inverted grid index for each time segment;

a computing module, configured to map the query point at a given moment to the corresponding Skyline grid and obtain global Skyline grids as a candidate set by using a global Skyline grid computing method. The computing module then performs a dynamic Skyline query, in time order, on the network node data in the candidate set so as to compute a valid global Skyline result. Parallel processing is performed using cloud computing technology. Given a d-dimensional data space $S = \{s_1, s_2, \ldots, s_d\}$, a data set $P = \{p_1, p_2, \ldots, p_n\}$ over S and a query object ref, dynamic dominance is evaluated between the vectors over time according to the dynamic dominance relation, and the Skyline result set is computed. A data object b dynamically dominates a if and only if b is no farther than a from ref in every attribute and strictly closer than a in at least one dimension. If the query point changes dynamically over time, the index and query operations are also processed dynamically in time-stream order;

wherein a monitoring time range and a threshold are set. If a query exceeds the specified time range, it must span several time windows, and the extent of the time windows to be spanned is evaluated. If that extent exceeds the threshold, the query fails immediately. If the time windows are small and the data volume has not yet accumulated, a batch-stream caching method is adopted: the data stream is cached and then sent periodically in batches. If the time windows are large and the data volume is large, the data stream is split by window, the splitting granularity being determined by the actual application scenario.
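The window-span policy in this claim can be sketched as a decision function plus a per-window splitter. All thresholds (`monitor_range`, `max_windows`, `small_window`, `large_volume`) and the function names are hypothetical parameters chosen for illustration, since the claim leaves their values to the application scenario. The `"direct"` branch covers cases the claim does not address.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

def plan_query(query_span: float, monitor_range: float, window_size: float,
               max_windows: int, small_window: float,
               pending_records: int, large_volume: int) -> str:
    """Decide how to serve a query: 'fail', 'batch', 'split' or 'direct'."""
    # A query longer than the monitoring range has to span several windows.
    if query_span > monitor_range:
        spanned = math.ceil(query_span / window_size)
        if spanned > max_windows:
            return "fail"  # too many windows to span: reject immediately
    # Small windows and little accumulated data: cache, then send in batches.
    if window_size <= small_window and pending_records < large_volume:
        return "batch"
    # Large windows and a large data volume: split the stream per window.
    if window_size > small_window and pending_records >= large_volume:
        return "split"
    return "direct"  # hypothetical default, not specified by the claim

def split_by_window(records: List[Tuple[float, str]], window_size: float) -> Dict[int, List[str]]:
    """Group (timestamp, payload) records by the time window they fall into."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in records:
        groups[int(ts // window_size)].append(payload)
    return dict(groups)

if __name__ == "__main__":
    print(plan_query(100, 60, 10, 5, 5, 10, 1000))    # fail: 10 windows > 5
    print(plan_query(30, 60, 2, 5, 5, 10, 1000))      # batch
    print(plan_query(30, 60, 20, 5, 5, 5000, 1000))   # split
    print(split_by_window([(0.5, "a"), (1.2, "b"), (10.0, "c")], 10.0))
```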

2. The Skyline query system based on spatial time-series data stream application of claim 1, wherein, in the inverted grid index generating module, the inverted grid index is generated for each time slice as follows. Let $P = \{p_1, \ldots, p_n\}$ be a given set of d-dimensional spatial objects, where every data point $p_k \in P$ has d attributes $p_k.x_1, \ldots, p_k.x_d$. The d-dimensional data space is divided into equal-width grid cells, the width of a unit cell being $(\delta_1, \ldots, \delta_d)$. The cell width in each dimension is chosen according to the values in that dimension so that the mapped data points are distributed evenly over the cells. All points in the same time slice are scanned, and each point $p_k$ is mapped into a grid cell whose grid coordinates are given by the following coordinate mapping:
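A minimal Python sketch of the per-slice inverted grid index follows. The coordinate-mapping formula itself is not reproduced in this text, so the sketch assumes the common floor mapping floor(p_k.x_i / δ_i) in each dimension. The function names and the layout `{slice_id: {cell: [point ids]}}` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]

def to_cell(point: Sequence[float], widths: Sequence[float]) -> Cell:
    """Map a point to grid coordinates, assuming floor(x_i / delta_i) per dimension."""
    return tuple(math.floor(x / w) for x, w in zip(point, widths))

def build_slice_index(points: Dict[str, Sequence[float]],
                      widths: Sequence[float]) -> Dict[Cell, List[str]]:
    """Inverted grid index for one time slice: cell -> ids of the points inside it."""
    index: Dict[Cell, List[str]] = defaultdict(list)
    for pid, p in points.items():  # scan every point of the slice once
        index[to_cell(p, widths)].append(pid)
    return dict(index)

def build_all(slices: Dict[int, Dict[str, Sequence[float]]],
              widths: Sequence[float]) -> Dict[int, Dict[Cell, List[str]]]:
    """Build one inverted grid index per time slice."""
    return {sid: build_slice_index(pts, widths) for sid, pts in slices.items()}

if __name__ == "__main__":
    slices = {0: {"p1": (3.0, 7.0), "p2": (12.0, 5.0), "p3": (8.0, 1.0)},
              1: {"p4": (25.0, 25.0)}}
    print(build_all(slices, widths=(10.0, 10.0)))
```

Because each cell maps to the list of points it contains, a later query can skip empty cells entirely and read only the points in candidate cells.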

3. The Skyline query system based on spatial time-series data stream application of claim 1, wherein, in the computing module, the global Skyline grid computing method is as follows. The query point q is mapped to the corresponding grid cell $c_q$, and the whole grid area is divided into an influence region and a dominated region. The influence region comprises the non-empty cells around $c_q$ and the cells on the same horizontal or vertical line as $c_q$. The dominated region is the region dominated by the influence region. The influence region is searched with a quadrant polling method: expanding step by step, the dominance relationships among the non-empty Skyline grids in each quadrant around the query point are computed. The global Skyline grids, and the data points within them, are then obtained by comparison according to these dominance relationships.
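The quadrant polling step can be pictured with the following two-dimensional Python sketch. It assumes that "gradual expansion" means visiting rings of cells at increasing Chebyshev distance from $c_q$. It only records, ring by ring, which non-empty cells fall in each quadrant, with group 0 collecting $c_q$ and the cells on its horizontal or vertical line. The dominance comparison itself is left out, and all names are hypothetical.

```python
from typing import Dict, Iterator, List, Set, Tuple

Cell = Tuple[int, int]

def quadrant(cell: Cell, cq: Cell) -> int:
    """0 for c_q's own row or column, otherwise quadrants 1-4 counter-clockwise."""
    dx, dy = cell[0] - cq[0], cell[1] - cq[1]
    if dx == 0 or dy == 0:
        return 0
    if dx > 0 and dy > 0:
        return 1
    if dx < 0 and dy > 0:
        return 2
    if dx < 0 and dy < 0:
        return 3
    return 4

def ring(cq: Cell, r: int) -> Iterator[Cell]:
    """Yield the cells at Chebyshev distance exactly r from c_q."""
    x0, y0 = cq
    for x in range(x0 - r, x0 + r + 1):
        for y in range(y0 - r, y0 + r + 1):
            if max(abs(x - x0), abs(y - y0)) == r:
                yield (x, y)

def poll_quadrants(nonempty: Set[Cell], cq: Cell, max_r: int) -> Dict[int, List[Cell]]:
    """Expand ring by ring and group the non-empty cells found by quadrant."""
    groups: Dict[int, List[Cell]] = {k: [] for k in range(5)}
    if cq in nonempty:
        groups[0].append(cq)
    for r in range(1, max_r + 1):
        for cell in ring(cq, r):
            if cell in nonempty:
                groups[quadrant(cell, cq)].append(cell)
    return groups

if __name__ == "__main__":
    cells = {(1, 1), (-1, 2), (0, 3), (2, -2), (-1, -1), (5, 5)}
    print(poll_quadrants(cells, (0, 0), max_r=3))
```

Cells found in an inner ring can never be dominated by cells in an outer ring, because a dominating cell must be no farther from $c_q$ in every dimension. This is what makes visiting inner rings first a natural order.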

4. The Skyline query system based on spatial time-series data stream application of claim 3, wherein the Skyline grid dominance method is as follows. Given a query point q in d-dimensional space and any two non-empty Skyline grids $c_i, c_j$ in the Skyline grid set C, $c_i \prec_q c_j$ holds when the following conditions are satisfied simultaneously:

① for every dimension t, $(c_i(t)-q(t))(c_j(t)-q(t)) > 0$;

② for every dimension t, $|c_i(t)-q(t)| \le |c_j(t)-q(t)|$;

③ for at least one dimension t, $|c_i(t)-q(t)| < |c_j(t)-q(t)|$;

Skyline grid $c_i$ is then said to dominate Skyline grid $c_j$ with respect to q.

5. The Skyline query system based on spatial time-series data stream application of claim 3, wherein the global Skyline grids are defined as follows: given a grid set C, the global Skyline grids of C are the set of all grids in C that are not globally dominated by any other grid, defined as:
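Claims 4 and 5 together suggest a direct, if naive, way to compute the global Skyline grids, sketched below in Python. The sketch makes three assumptions. First, conditions ① and ② must hold in every dimension t and condition ③ in at least one. Second, q is a coordinate vector in the same grid space as the cells. Third, every pair of grids is compared, so the code illustrates the definition rather than the quadrant-polling search of claim 3. The names are hypothetical.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, ...]

def grid_dominates(ci: Sequence[int], cj: Sequence[int], q: Sequence[int]) -> bool:
    """Conditions (1)-(3) of claim 4: does c_i dominate c_j with respect to q?"""
    triples = list(zip(ci, cj, q))
    same_side = all((a - x) * (b - x) > 0 for a, b, x in triples)       # (1), every t
    no_farther = all(abs(a - x) <= abs(b - x) for a, b, x in triples)   # (2), every t
    closer_once = any(abs(a - x) < abs(b - x) for a, b, x in triples)   # (3), some t
    return same_side and no_farther and closer_once

def global_skyline_grids(cells: List[Cell], q: Sequence[int]) -> List[Cell]:
    """Keep every grid that no other grid in the set dominates."""
    return [c for c in cells if not any(grid_dominates(o, c, q) for o in cells)]

if __name__ == "__main__":
    grids = [(1, 1), (2, 2), (1, 3), (-1, 2), (0, 4)]
    print(global_skyline_grids(grids, (0, 0)))  # [(1, 1), (-1, 2), (0, 4)]
```

Under condition ①, grids on the query point's own row or column (such as `(0, 4)` above) give a zero product and therefore can neither dominate nor be dominated. This matches claim 3 placing them in the influence region.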

6. The Skyline query system based on spatial time-series data stream application of claim 1, wherein the index is built using a MapReduce processing flow. Multiple Map tasks are started to read the stream data simultaneously. Each Map reads a different HDFS data segment and generates <key, value> pairs, where the key is a spatio-temporal index and the value is a hashmap data structure storing the corresponding data points obtained from the division. The intermediate data produced by each Map is a sub-index of part of the data. The pairs are sorted by key, and a Reduce is called to merge them and generate the index.
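The MapReduce index-building flow of this claim can be simulated in plain Python to show the data shapes involved. The sketch does not use Hadoop or HDFS: each "segment" is an in-memory list standing in for an HDFS data segment. The spatio-temporal key is assumed to be (time-slice id, cell x, cell y) in two dimensions, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, List, Tuple

Key = Tuple[int, int, int]          # hypothetical (time slice, cell x, cell y)
Point = Tuple[float, float]
Record = Tuple[str, float, Point]   # (point id, timestamp, coordinates)

def map_segment(segment: List[Record], slice_len: float,
                width: float) -> List[Tuple[Key, Dict[str, Point]]]:
    """Map task: turn one data segment into <key, hashmap> pairs (a partial sub-index)."""
    partial: Dict[Key, Dict[str, Point]] = defaultdict(dict)
    for pid, ts, (x, y) in segment:
        key = (int(ts // slice_len), int(x // width), int(y // width))
        partial[key][pid] = (x, y)
    return list(partial.items())

def reduce_index(pairs: List[Tuple[Key, Dict[str, Point]]]) -> Dict[Key, Dict[str, Point]]:
    """Sort the intermediate pairs by key, then merge hashmaps that share a key."""
    index: Dict[Key, Dict[str, Point]] = {}
    ordered = sorted(pairs, key=lambda kv: kv[0])
    for key, group in groupby(ordered, key=lambda kv: kv[0]):
        merged: Dict[str, Point] = {}
        for _, hashmap in group:
            merged.update(hashmap)
        index[key] = merged
    return index

if __name__ == "__main__":
    seg1 = [("a", 0.5, (1.0, 2.0)), ("b", 1.5, (12.0, 3.0))]
    seg2 = [("c", 0.7, (3.0, 4.0))]
    pairs = [kv for seg in (seg1, seg2) for kv in map_segment(seg, 1.0, 10.0)]
    print(reduce_index(pairs))
```

In a real cluster the sort-by-key step is performed by the framework's shuffle phase rather than by an explicit `sorted` call.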