CN114267211A - Multidimensional space-time data thinning and restoring algorithm - Google Patents


Info

Publication number
CN114267211A
Authority
CN
China
Prior art keywords
data
time
point
restored
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111643342.1A
Other languages
Chinese (zh)
Other versions
CN114267211B (en)
Inventor
张重阳
宣彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Variflight Technology Co ltd
Original Assignee
Variflight Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-12-29
Filing date
2021-12-29
Publication date
2022-04-01
Application filed by Variflight Technology Co ltd
Priority to CN202111643342.1A
Publication of CN114267211A
Application granted
Publication of CN114267211B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to data management, in particular to a multidimensional spatio-temporal data thinning and restoring algorithm. The multidimensional spatio-temporal data are classified by correlation and divided into a plurality of data buckets; the key time values in each data bucket are solved to obtain the time value set of all data buckets; the corresponding measured point values are extracted from the original data based on the time value set to obtain a compressed result; and, to restore the data, the interval in which a time point to be restored lies is found in the time value set, the measured point value of each data bucket at that time point is calculated, and the restoration result of each dimension is obtained. The technical scheme overcomes the shortcomings of the prior art, which finds it difficult to reduce resource consumption while limiting the loss of data characteristics, and which cannot effectively restore thinned data.

Description

Multidimensional space-time data thinning and restoring algorithm
Technical Field
The invention relates to data management, in particular to a multidimensional space-time data thinning and restoring algorithm.
Background
With the explosive growth of Internet of Things (IoT) devices, collecting, storing, processing and applying IoT data consumes ever more network bandwidth, storage, graphics rendering and other resources.
With the rapid development of satellite navigation, Internet technology and the sharing economy, the demand for high-precision positioning keeps growing. Meeting it requires acquiring a large amount of monitoring data and converting it into travel tracks, so that a supervisor can follow the running state in time through a monitoring view.
Monitoring data may be collected at a fine granularity, for example one sample per second. When a supervisor checks the monitoring data over a period of time, the large data volume greatly degrades the performance of drawing the monitoring curve. Moreover, most monitoring data reflect normal operation, do not trigger alarms and need no attention from the supervisor, so the monitoring data contain a large amount of redundancy. The monitoring view then has to display a very large number of data points, which reduces its clarity.
To solve this problem, the conventional art may process the monitoring data with a real-time data processing workflow (Spark Streaming). Specifically, the monitoring data are divided into multiple segments, an aggregation is computed for each segment, and the aggregated data are plotted as a curve; for example, the average value of each segment is calculated and the averages are drawn as a curve. Although aggregation reduces the data volume and improves the drawing performance and clarity of the curve, the aggregated data cannot accurately reflect the data characteristics. How to reduce the volume of monitoring data while losing as few data characteristics as possible is therefore an urgent problem.
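To illustrate why segment averaging loses data characteristics, here is a minimal Python sketch of prior-art style mean aggregation. The signal, the segment length and the function name `segment_means` are made up for illustration and do not come from the patent; a single spike is largely averaged away:

```python
from statistics import mean


def segment_means(values: list[float], segment_len: int) -> list[float]:
    """Prior-art style aggregation: replace each fixed-length segment by its mean."""
    return [mean(values[i:i + segment_len]) for i in range(0, len(values), segment_len)]


# Hypothetical flat signal at 10.0 with one spike of 70.0 at index 7 (an alarm-worthy event).
signal = [10.0] * 20
signal[7] = 70.0

means = segment_means(signal, segment_len=5)
print(means)        # [10.0, 22.0, 10.0, 10.0]
print(max(signal))  # 70.0, the true peak
print(max(means))   # 22.0, the peak after aggregation
```

The data volume drops by a factor of five, but the peak that a supervisor would care about shrinks from 70.0 to 22.0.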
Disclosure of Invention
(I) Technical problem to be solved
In view of the shortcomings of the prior art, the invention provides a multidimensional space-time data thinning and restoring algorithm that reduces resource consumption while limiting the loss of data characteristics, and that can effectively restore the thinned data.
(II) Technical scheme
To achieve the above purpose, the invention adopts the following technical scheme:
A multidimensional space-time data thinning and restoring algorithm comprises the following steps:
performing relevance classification on the multi-dimensional space-time data, and dividing the multi-dimensional space-time data into a plurality of data buckets;
solving the key time value in each data bucket to obtain the time value set of all the data buckets;
extracting corresponding measured point values from the original data based on the time value set to obtain a compression result;
acquiring the interval of the time value set in which the time point to be restored lies, and calculating the measured point value of each data bucket at that time point to obtain the restoration result of each dimension.
Preferably, performing relevance classification on the multidimensional space-time data and dividing them into a plurality of data buckets includes:
dividing the multidimensional space-time data into a plurality of data buckets, denoted B, according to time + 3D/4D spatial data, or time + any number of measured values.
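A minimal Python sketch of the bucket division, assuming each sample is a dictionary with a time field `t` plus measurement fields. The field names (`lon`, `lat`, `alt`, `speed`), the grouping in `BUCKET_FIELDS` and the function `split_into_buckets` are hypothetical; the patent does not specify how fields are grouped beyond "time + space" or "time + measured values":

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]

# Hypothetical grouping of measurement fields into buckets (not specified by the patent).
BUCKET_FIELDS: Dict[str, Tuple[str, ...]] = {
    "position": ("lon", "lat"),   # time + planar position
    "altitude": ("alt",),         # time + altitude
    "speed": ("speed",),          # time + ground speed
}


def split_into_buckets(
    records: Sequence[Record],
    bucket_fields: Dict[str, Tuple[str, ...]] = BUCKET_FIELDS,
) -> Dict[str, List[Tuple[float, Tuple[float, ...]]]]:
    """Return {bucket name: [(t, (field values...)), ...]} ordered by time."""
    ordered = sorted(records, key=lambda r: r["t"])
    return {
        name: [(r["t"], tuple(r[f] for f in fields)) for r in ordered]
        for name, fields in bucket_fields.items()
    }


# Two made-up samples of a departing aircraft.
track = [
    {"t": 0.0, "lon": 121.80, "lat": 31.14, "alt": 0.0, "speed": 0.0},
    {"t": 3.0, "lon": 121.81, "lat": 31.15, "alt": 120.0, "speed": 280.0},
]
buckets = split_into_buckets(track)
print(buckets["altitude"])  # [(0.0, (0.0,)), (3.0, (120.0,))]
```

Each bucket can then be thinned independently with its own distance function and tolerance.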
Preferably, solving the key time value in each data bucket to obtain the time value set of all data buckets includes:
for each data bucket, setting a distance function $d = f(x_s, x_e, x_m)$ on the metric values corresponding to the data characteristics, which calculates the distance from point $x_m$ to the straight line formed by points $x_s$ and $x_e$;
setting an error tolerance threshold $v$ based on the service attributes of each data bucket's metric values;
calculating each bucket recursively: denoting the start point of the bucket $B_s$, the end point $B_e$ and each intermediate point $B_m$, substituting them into the distance function to obtain $d = f(B_s, B_e, B_m)$, finding the key time value $t$ with the greatest change rate, and recording the error distance there as $D_t$;
if $D_t > v$, dividing the data bucket at $t$ into two sub-data buckets $B_{[1,t]}$ and $B_{[t,n]}$ covering the intervals $[1, t]$ and $[t, n]$ and repeating the recursive calculation on each, and stopping the calculation of a sub-data bucket when $D_t \le v$;
finding all key time values $t$ whose error distance exceeds the error tolerance threshold $v$, and recording them in the time value set $T$.
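The recursive split reads like a Ramer-Douglas-Peucker style line simplification. The Python sketch below is a hedged illustration under stated assumptions: it takes the "greatest change rate" point to be the interior point with the largest distance, uses the vertical (time-synchronized) distance for a one-dimensional bucket, always keeps the bucket's two endpoints so later restorations are bracketed, and replaces recursion with an explicit stack (same splitting order of work, no Python recursion limit). The names `vertical_distance` and `key_times` are hypothetical:

```python
from typing import Callable, Sequence, Set

# d = f(x_s, x_e, x_m), with each point given as (time, value).
Distance = Callable[[float, float, float, float, float, float], float]


def vertical_distance(ts: float, xs: float, te: float, xe: float,
                      tm: float, xm: float) -> float:
    """|x_m - line(t_m)| for the straight line through (t_s, x_s) and (t_e, x_e)."""
    if te == ts:
        return abs(xm - xs)
    return abs(xm - (xs + (xe - xs) * (tm - ts) / (te - ts)))


def key_times(times: Sequence[float], values: Sequence[float], v: float,
              dist: Distance = vertical_distance) -> Set[float]:
    """Key time values of one bucket: split wherever the max error D_t exceeds v.

    Assumption: the bucket endpoints are always kept.
    """
    keep = {times[0], times[-1]}
    stack = [(0, len(times) - 1)]          # explicit stack instead of recursion
    while stack:
        s, e = stack.pop()
        if e - s < 2:                      # no intermediate point left
            continue
        # D_t and its position: the largest distance among intermediate points.
        d_t, k = max(
            (dist(times[s], values[s], times[e], values[e], times[m], values[m]), m)
            for m in range(s + 1, e)
        )
        if d_t > v:                        # split into sub-buckets [s, k] and [k, e]
            keep.add(times[k])
            stack.extend([(s, k), (k, e)])
    return keep


times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
values = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
print(sorted(key_times(times, values, v=1.0)))  # [0.0, 2.0, 3.0, 4.0, 6.0]
```

With a looser tolerance such as `v=10.0`, no split happens and only the two endpoints remain.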
Preferably, the distance function is one of a planar geospatial distance, a three-dimensional geospatial distance, a euclidean distance, and a mahalanobis distance.
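Two of the four listed distance functions can be sketched in Python as follows, assuming points are plain coordinate tuples. `euclidean_to_line` and `mahalanobis` are illustrative names; in practice the inverse covariance matrix would have to be estimated from the bucket's data, and the geospatial variants would normally project longitude and latitude to metres first (not shown):

```python
import math
from typing import Sequence


def euclidean_to_line(xs: Sequence[float], xe: Sequence[float],
                      xm: Sequence[float]) -> float:
    """Perpendicular Euclidean distance from x_m to the line through x_s and x_e."""
    d = [b - a for a, b in zip(xs, xe)]           # direction x_e - x_s
    w = [c - a for a, c in zip(xs, xm)]           # offset x_m - x_s
    dd = sum(c * c for c in d)
    if dd == 0.0:                                 # degenerate line: point distance
        return math.sqrt(sum(c * c for c in w))
    proj = sum(a * b for a, b in zip(w, d)) / dd  # projection parameter along d
    return math.sqrt(sum((wi - proj * di) ** 2 for wi, di in zip(w, d)))


def mahalanobis(u: Sequence[float], v: Sequence[float],
                inv_cov: Sequence[Sequence[float]]) -> float:
    """sqrt((u - v)^T S^-1 (u - v)) for a given inverse covariance matrix S^-1."""
    r = [a - b for a, b in zip(u, v)]
    n = len(r)
    return math.sqrt(sum(r[i] * inv_cov[i][j] * r[j]
                         for i in range(n) for j in range(n)))


print(euclidean_to_line((0.0, 0.0), (4.0, 0.0), (2.0, 3.0)))        # 3.0
print(mahalanobis((2.0, 3.0), (2.0, 0.0), [[1.0, 0.0], [0.0, 0.25]]))  # 1.5
```

The Mahalanobis form lets dimensions with different units or noise levels share one tolerance: here a 3-unit deviation in a dimension with standard deviation 2 counts as 1.5.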
Preferably, extracting the corresponding measured point values from the original data based on the time value set to obtain the compressed result includes:
deduplicating the time value set $T$, sorting it in ascending order, and using it as an index to extract the corresponding measured point values from the original data, obtaining the compressed result $R$.
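A Python sketch of the compression step, assuming the original data are held in a dictionary keyed by timestamp and each bucket's key time values are already available as sets. `compress` and the sample values are hypothetical:

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[float, ...]


def compress(key_time_sets: Iterable[Iterable[float]],
             original: Dict[float, Sample]) -> List[Tuple[float, Sample]]:
    """Merge per-bucket key times into T (deduplicated, ascending) and extract R."""
    T = sorted(set().union(*key_time_sets))   # remove duplicates, then sort ascending
    return [(t, original[t]) for t in T]      # T acts as the index into the raw samples


# Made-up raw samples keyed by time: (altitude in m, speed in km/h).
original = {0.0: (0.0, 0.0), 1.0: (60.0, 140.0), 2.0: (120.0, 280.0), 3.0: (120.0, 300.0)}
altitude_keys = {0.0, 2.0, 3.0}
speed_keys = {0.0, 3.0}
R = compress([altitude_keys, speed_keys], original)
print(R)  # [(0.0, (0.0, 0.0)), (2.0, (120.0, 280.0)), (3.0, (120.0, 300.0))]
```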
Preferably, acquiring the interval of the time value set in which the time point to be restored lies, and calculating the measured point value of each data bucket at that time point to obtain the restoration result of each dimension, includes:
inputting a time point $t_i$ to be restored and a reduction function $x_m = f^{-1}(x_s, x_e, m)$, and finding the interval $[t_n, t_m]$ of the time value set $T$ in which $t_i$ lies, $n \le i \le m$;
if no such interval exists for $t_i$, terminating the calculation;
substituting the measured point values of each sub-data bucket at $t_n$ and at $t_m$ into the reduction function to calculate that sub-data bucket's measured point value at $t_i$;
fusing the measured point values of all sub-data buckets at $t_i$ to obtain the restoration result corresponding to $t_i$.
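A minimal Python sketch of the restoration, assuming the reduction function is straight-line interpolation (consistent with the straight-line distance used for thinning), that the compressed result R is a time-sorted list of `(t, {field: value})` pairs, and that "fusing" simply means merging the per-field values into one record. `reduce_linear` and `restore` are hypothetical names:

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Record = Dict[str, float]


def reduce_linear(ts: float, xs: float, te: float, xe: float, ti: float) -> float:
    """Assumed reduction function: value at t_i of the line through (t_s, x_s), (t_e, x_e)."""
    if te == ts:
        return xs
    return xs + (xe - xs) * (ti - ts) / (te - ts)


def restore(R: List[Tuple[float, Record]], ti: float) -> Optional[Record]:
    """Restore every sub-bucket at t_i from the compressed result R and fuse them."""
    T = [t for t, _ in R]
    if not T or ti < T[0] or ti > T[-1]:
        return None                        # t_i lies outside T: terminate
    m = bisect_right(T, ti)                # first key time strictly after t_i
    n = m - 1                              # last key time at or before t_i
    if m == len(T):                        # t_i equals the last key time
        return dict(R[n][1])
    (tn, rec_n), (tm, rec_m) = R[n], R[m]
    # Restore each field on [t_n, t_m], then fuse by merging into one record.
    return {k: reduce_linear(tn, rec_n[k], tm, rec_m[k], ti) for k in rec_n}


R = [(0.0, {"alt": 0.0, "speed": 0.0}),
     (2.0, {"alt": 120.0, "speed": 280.0}),
     (3.0, {"alt": 120.0, "speed": 300.0})]
print(restore(R, 1.0))  # {'alt': 60.0, 'speed': 140.0}
print(restore(R, 5.0))  # None
```

Binary search keeps each lookup at O(log |T|), which matters when many target time points are restored from a long compressed series.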
(III) Advantageous effects
Compared with the prior art, the multidimensional space-time data thinning and restoring algorithm provided by the invention greatly reduces the number of sampling points while keeping the loss of data characteristics of time-series data under control, and effectively restores the data at a target time point according to service requirements. On this basis, applications such as real-time data distribution services, large-scale time-series data storage services and real-time IoT data display can be built, and IoT data can be managed at a larger scale on low-specification hardware and networks.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of a travel track obtained by multipoint sampling of an airplane flight in the prior art;
FIG. 3 is a schematic diagram of a travel track obtained by thinning the data of an airplane flight according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A multidimensional spatio-temporal data thinning and restoring algorithm is disclosed. As shown in FIG. 1, relevance classification is first performed on the multidimensional spatio-temporal data, which are divided into a plurality of data buckets, as follows:
the multidimensional space-time data are divided into a plurality of data buckets, denoted B, according to time + 3D/4D spatial data, or time + any number of measured values.
In the process of dividing the original data into a plurality of data buckets, correlation classification can be performed according to logic, change rate and the like.
Solving the key time value in each data bucket to obtain the time value set of all data buckets comprises the following steps:
for each data bucket, setting a distance function $d = f(x_s, x_e, x_m)$ on the metric values corresponding to the data characteristics, which calculates the distance from point $x_m$ to the straight line formed by points $x_s$ and $x_e$;
setting an error tolerance threshold $v$ based on the service attributes of each data bucket's metric values;
calculating each bucket recursively: denoting the start point of the bucket $B_s$, the end point $B_e$ and each intermediate point $B_m$, substituting them into the distance function to obtain $d = f(B_s, B_e, B_m)$, finding the key time value $t$ with the greatest change rate, and recording the error distance there as $D_t$;
if $D_t > v$, dividing the data bucket at $t$ into two sub-data buckets $B_{[1,t]}$ and $B_{[t,n]}$ covering the intervals $[1, t]$ and $[t, n]$ and repeating the recursive calculation on each, and stopping the calculation of a sub-data bucket when $D_t \le v$;
finding all key time values $t$ whose error distance exceeds the error tolerance threshold $v$, and recording them in the time value set $T$.
The distance function is one of a plane geospatial distance, a three-dimensional geospatial distance, a Euclidean distance and a Mahalanobis distance.
Extracting the corresponding measured point values from the original data based on the time value set to obtain the compressed result comprises the following step:
deduplicating the time value set $T$, sorting it in ascending order, and using it as an index to extract the corresponding measured point values from the original data, obtaining the compressed result $R$.
Acquiring the interval of the time value set in which the time point to be restored lies, and calculating the measured point value of each data bucket at that time point to obtain the restoration result of each dimension, comprises the following steps:
inputting a time point $t_i$ to be restored and a reduction function $x_m = f^{-1}(x_s, x_e, m)$, and finding the interval $[t_n, t_m]$ of the time value set $T$ in which $t_i$ lies, $n \le i \le m$;
if no such interval exists for $t_i$, terminating the calculation;
substituting the measured point values of each sub-data bucket at $t_n$ and at $t_m$ into the reduction function to calculate that sub-data bucket's measured point value at $t_i$;
fusing the measured point values of all sub-data buckets at $t_i$ to obtain the restoration result corresponding to $t_i$.
The reduction function follows the logic of each sub-data bucket and computes $x_m$, the point at which the straight line formed by $x_s$ and $x_e$ crosses the time $t_i$ on its time axis.
The technical scheme thus has two main parts, data thinning and data restoring. Data thinning divides the time-series data into a number of steady or linearly changing segments, deletes the intermediate point values and stores the key point values in time order. Data restoring finds, in the time value set, the interval in which the time point to be restored lies and calculates the measured value of each data bucket at that time point, thereby recovering the original value at that time point.
FIG. 2 shows the travel track of flight CZ3677 from Pudong Airport to Kunming Airport: the flight lasted 3 hours 24 minutes, the sampling rate was 0.3 Hz, and there were 4386 sampling points. FIG. 3 shows the thinning result of the present method with tolerances of 500 m planar geographic distance error, 50 m altitude difference and 50 km/h speed deviation, in which special states are marked as target points; the number of key sampling points is 46. The technical scheme of the application can therefore greatly reduce the number of sampling points while preserving the characteristics of the original data.
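Assuming the straight-line reading of the reduction function described above, and writing $x_s$ and $x_e$ for a sub-data bucket's measured point values at the interval ends $t_n$ and $t_m$, the restored value is a linear interpolation (an assumed closed form, not quoted from the patent):

$$x_m = x_s + (x_e - x_s)\,\frac{t_i - t_n}{t_m - t_n}, \qquad t_n \le t_i \le t_m$$

At $t_i = t_n$ this returns $x_s$ and at $t_i = t_m$ it returns $x_e$, so the stored key points are reproduced exactly.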
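The two parts can be checked together in a short Python round trip. This sketch assumes a one-dimensional bucket, strictly increasing timestamps and the vertical (time-synchronized) distance; with that distance, every discarded point lies within $v$ of the line between its neighbouring key points, so linear restoration reproduces every original sample to within $v$. The synthetic altitude-like profile and the names `line_at`, `thin` and `restore` are illustrative only:

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def line_at(ts: float, xs: float, te: float, xe: float, t: float) -> float:
    """Value at time t of the straight line through (ts, xs) and (te, xe); te != ts."""
    return xs + (xe - xs) * (t - ts) / (te - ts)


def thin(ts: Sequence[float], xs: Sequence[float], v: float) -> List[int]:
    """Indices of key points; each dropped point is within v of its span's line."""
    keep, stack = {0, len(ts) - 1}, [(0, len(ts) - 1)]
    while stack:
        s, e = stack.pop()
        if e - s < 2:
            continue
        d_t, k = max((abs(xs[m] - line_at(ts[s], xs[s], ts[e], xs[e], ts[m])), m)
                     for m in range(s + 1, e))
        if d_t > v:                       # split the span at the worst point
            keep.add(k)
            stack += [(s, k), (k, e)]
    return sorted(keep)


def restore(kt: Sequence[float], kx: Sequence[float], t: float) -> float:
    """Linear restoration at t; assumes kt[0] <= t <= kt[-1] and len(kt) >= 2."""
    m = min(bisect_right(kt, t), len(kt) - 1)
    return line_at(kt[m - 1], kx[m - 1], kt[m], kx[m], t)


# Made-up profile: a slow wave plus a 50-unit step between t = 80 and t = 89.
ts = [float(i) for i in range(200)]
xs = [1000.0 * math.sin(i / 30.0) + (50.0 if 80 <= i < 90 else 0.0) for i in range(200)]
v = 20.0

idx = thin(ts, xs, v)
kt, kx = [ts[i] for i in idx], [xs[i] for i in idx]
max_err = max(abs(restore(kt, kx, t) - x) for t, x in zip(ts, xs))
print(len(idx), "key points of", len(ts), "| max error within v:", max_err <= v)
```

For a perpendicular or geospatial distance the bound applies to that distance rather than to the per-dimension value error, so the check above is specific to the vertical distance.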
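As a worked check of the figures quoted above, the reduction factor and the retained fraction of sampling points are:

$$\frac{4386}{46} \approx 95.3, \qquad \frac{46}{4386} \approx 1.05\,\%$$

that is, roughly a 95-fold reduction, with about 1.05 % of the original sampling points kept.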
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (6)

1. A multidimensional space-time data thinning and restoring algorithm, characterized by comprising the following steps:
performing relevance classification on the multi-dimensional space-time data, and dividing the multi-dimensional space-time data into a plurality of data buckets;
solving the key time value in each data bucket to obtain the time value set of all the data buckets;
extracting corresponding measured point values from the original data based on the time value set to obtain a compression result;
acquiring the interval of the time value set in which the time point to be restored lies, and calculating the measured point value of each data bucket at that time point to obtain the restoration result of each dimension.
2. The multidimensional spatio-temporal data thinning and restoring algorithm of claim 1, wherein performing relevance classification on the multidimensional space-time data and dividing them into a plurality of data buckets comprises:
dividing the multidimensional space-time data into a plurality of data buckets, denoted B, according to time + 3D/4D spatial data, or time + any number of measured values.
3. The multidimensional spatio-temporal data thinning and restoring algorithm of claim 1, wherein solving the key time value in each data bucket to obtain the time value set of all data buckets comprises:
for each data bucket, setting a distance function $d = f(x_s, x_e, x_m)$ on the metric values corresponding to the data characteristics, which calculates the distance from point $x_m$ to the straight line formed by points $x_s$ and $x_e$;
setting an error tolerance threshold $v$ based on the service attributes of each data bucket's metric values;
calculating each bucket recursively: denoting the start point of the bucket $B_s$, the end point $B_e$ and each intermediate point $B_m$, substituting them into the distance function to obtain $d = f(B_s, B_e, B_m)$, finding the key time value $t$ with the greatest change rate, and recording the error distance there as $D_t$;
if $D_t > v$, dividing the data bucket at $t$ into two sub-data buckets $B_{[1,t]}$ and $B_{[t,n]}$ covering the intervals $[1, t]$ and $[t, n]$ and repeating the recursive calculation on each, and stopping the calculation of a sub-data bucket when $D_t \le v$;
finding all key time values $t$ whose error distance exceeds the error tolerance threshold $v$, and recording them in the time value set $T$.
4. The multi-dimensional spatiotemporal data thinning and restoring algorithm of claim 3, wherein: the distance function is one of a plane geospatial distance, a three-dimensional geospatial distance, a Euclidean distance and a Mahalanobis distance.
5. The multidimensional spatio-temporal data thinning and restoring algorithm of claim 1, wherein extracting the corresponding measured point values from the original data based on the time value set to obtain the compressed result comprises:
deduplicating the time value set $T$, sorting it in ascending order, and using it as an index to extract the corresponding measured point values from the original data, obtaining the compressed result $R$.
6. The multidimensional spatio-temporal data thinning and restoring algorithm of claim 1, wherein acquiring the interval of the time value set in which the time point to be restored lies, and calculating the measured point value of each data bucket at that time point to obtain the restoration result of each dimension, comprises:
inputting a time point $t_i$ to be restored and a reduction function $x_m = f^{-1}(x_s, x_e, m)$, and finding the interval $[t_n, t_m]$ of the time value set $T$ in which $t_i$ lies, $n \le i \le m$;
if no such interval exists for $t_i$, terminating the calculation;
substituting the measured point values of each sub-data bucket at $t_n$ and at $t_m$ into the reduction function to calculate that sub-data bucket's measured point value at $t_i$;
fusing the measured point values of all sub-data buckets at $t_i$ to obtain the restoration result corresponding to $t_i$.
CN202111643342.1A 2021-12-29 2021-12-29 Multi-dimensional space-time data thinning and restoring algorithm Active CN114267211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111643342.1A CN114267211B (en) 2021-12-29 2021-12-29 Multi-dimensional space-time data thinning and restoring algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111643342.1A CN114267211B (en) 2021-12-29 2021-12-29 Multi-dimensional space-time data thinning and restoring algorithm

Publications (2)

Publication Number Publication Date
CN114267211A true CN114267211A (en) 2022-04-01
CN114267211B CN114267211B (en) 2024-04-05

Family

ID=80831479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111643342.1A Active CN114267211B (en) 2021-12-29 2021-12-29 Multi-dimensional space-time data thinning and restoring algorithm

Country Status (1)

Country Link
CN (1) CN114267211B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8363961B1 (en) * 2008-10-14 2013-01-29 Adobe Systems Incorporated Clustering techniques for large, high-dimensionality data sets
CN106354760A (en) * 2016-08-18 2017-01-25 北京工商大学 Deforming statistical map based multi-view spatio-temporal data visualization method and application
CN109767483A (en) * 2017-11-09 2019-05-17 中交上海航道勘察设计研究院有限公司 A kind of three-dimensional point cloud quickly vacuates De-weight method
CN110660133A (en) * 2018-06-29 2020-01-07 百度在线网络技术(北京)有限公司 Three-dimensional rarefying method and device for electronic map
CN110826183A (en) * 2019-10-08 2020-02-21 广州博进信息技术有限公司 Construction interaction method for multidimensional dynamic marine environment scalar field
CN112419483A (en) * 2020-11-24 2021-02-26 中电科新型智慧城市研究院有限公司 Three-dimensional model data transmission method and server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
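Chen Hong (陈虹): "Simulation of an Intelligent Interpolation Method for Multidimensional Spatial Data in Large Databases", Computer Simulation (计算机仿真), no. 10, 15 October 2017 (2017-10-15) *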

Also Published As

Publication number Publication date
CN114267211B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
CN113723452A (en) Large-scale anomaly detection system based on KPI clustering
CN112100435B (en) Automatic labeling method based on edge traffic audio and video synchronization samples
Qian et al. Grid-based Data Stream Clustering for Intrusion Detection.
CN115293236A (en) Hybrid clustering-based parallel fault diagnosis method and device for power equipment
CN115794578A (en) Data management method, device, equipment and medium for power system
CN114003596A (en) Multi-source heterogeneous data processing system and method based on industrial system
CN114267211A (en) Multidimensional space-time data thinning and restoring algorithm
CN116744006B (en) Video monitoring data storage method based on block chain
CN117372552A (en) Three-dimensional point cloud data compression method of industrial product facing complex surface
CN115907159B (en) Method, device, equipment and medium for determining typhoons in similar paths
CN117116096A (en) Airport delay prediction method and system based on multichannel traffic image and depth CNN
CN106708876B (en) Similar video retrieval method and system based on Lucene
CN115905983A (en) Artificial intelligence data classification system
CN111815449B (en) Abnormality detection method and system of multi-host quotation system based on stream computing
CN117135037A (en) Method and device for defining network traffic performance abnormality
CN108596220A (en) A kind of bridge node recognition methods based on hypergraph entropy
CN108846543B (en) Computing method and device for non-overlapping community set quality metric index
Qin et al. A trajectory abnormal detection method based on segmentation and clustering
Liang Research and implementation of compression algorithm for large-scale point cloud data
Jiang et al. Time synchronized velocity error for trajectory compression
CN115361584B (en) Video data processing method and device, electronic equipment and readable storage medium
CN117112815B (en) Personal attention video event retrieval method and system, storage medium and electronic device
CN113887718B (en) Channel pruning method and device based on relative activation rate and lightweight flow characteristic extraction network model simplification method
CN118193645B (en) Track retrieval method, system and medium based on Hbase and vector database
CN113536078B (en) Method, apparatus and computer storage medium for screening data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant