CN112699601B - Space-time reconstruction method for sensor network data - Google Patents


Info

Publication number
CN112699601B
CN112699601B (application CN202011576661.0A)
Authority
CN
China
Prior art keywords
data
time
space
sensor
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011576661.0A
Other languages
Chinese (zh)
Other versions
CN112699601A (en)
Inventor
段锐
张健
徐家晨
侯依雯
骆飞
刘泽菡
陈祝明
李沫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202011576661.0A
Publication of CN112699601A
Application granted
Publication of CN112699601B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a space-time reconstruction method for sensor network data, comprising the following steps: S1, collecting the space-time data of the sensors; S2, preprocessing the acquired space-time data; S3, determining the length of the training data; S4, constructing a training data set and a test data set; S5, establishing a space-time data model; S6, training the space-time data model; and S7, performing space-time data reconstruction with the model. The method can reconstruct data at positions without sensors and at unsampled times from the sensor data of a limited number of measurement positions and measurement times. By estimating the temporal decorrelation length of each sensor's data and using it to determine the time length of the training data, the amount of computation during model training and data testing is reduced.

Description

Space-time reconstruction method for sensor network data
Technical Field
The invention relates to a data processing method for sensor networks, and in particular to a space-time reconstruction method for sensor network data.
Background
The rapid popularization of sensor network technologies, such as the Internet of Things (IoT), optical sensor networks, and radio/radar sensor networks, provides new capabilities and opportunities for the large-scale collection, analysis, and use of spatio-temporal sensing big data, making human social life, industrial manufacturing, security, and other fields more intelligent, safe, and efficient.
According to the motion characteristics of the sensor platform, sensor data acquisition can be roughly divided into two modes. In the first, the sensors are fixed, installed in the infrastructure of the sensing area, so that the coverage of each sensor node is determined in advance. In the second, the sensors are mobile, carried on a moving platform or by a person; the coverage of each node changes dynamically, and the platform or carrier must travel through the sensing area to acquire measurements at all positions. However, limited by the number of sensor nodes, the data transmission rate, energy consumption, and other factors, the collected sensor data are in either mode local and discontinuous in time and space: the network can only sense or sample a portion of the locations in the sensing region at a series of discrete time instants. Both sensing modes therefore carry an inherent risk of missed detection or false alarm for events occurring at unsampled times and positions. This is a serious safety hazard for sensor network applications aimed at disaster, security, and hazard monitoring, such as mine production, structural health monitoring of buildings, forest fire or geological disaster early warning, environmental pollution monitoring, and intrusion detection.
Given the above defects of sensor networks, a data processing method that can recover the state of the sensed object at any time and position, using only the limited space-time sensing data actually collected, would have very important application value.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a space-time reconstruction method for sensor network data that can reconstruct data at positions without sensors and at unsampled times from the sensor data of a limited number of measurement positions and measurement times.
The purpose of the invention is realized by the following technical scheme: a space-time reconstruction method of sensor network data comprises the following steps:
s1, collecting space-time data of the sensor;
s2, preprocessing the acquired space-time data;
s3, determining the length of training data;
s4, constructing a training data set and a testing data set;
s5, establishing a space-time data model;
s6, training a space-time data model;
and S7, performing space-time data reconstruction by using the model.
Further, the specific implementation of step S1 is as follows: the data collected by sensor s_i is represented by the time series d_i = [d_i(1), d_i(2), ..., d_i(K_i)]^T, where d_i(k), k = 1, 2, ..., K_i, denotes the data collected by sensor s_i at time k, K_i is the data length of s_i, and T is the transpose operator.
The set of all sensor data collected by the sensor network is denoted

$$D^{(0)} = \left\{\mathbf{d}_i^{(0)}\right\}_{i=1}^{N},$$

where the superscript (0) denotes the raw acquired data, d_i^{(0)} is the time series acquired by sensor s_i, i = 1, ..., N, and N is the number of sensors.
Further, the specific implementation of step S2 is as follows:
S21, remove outliers from the sensor data. An outlier of a sensor is defined as a data value deviating by more than n times the standard deviation of that sensor's data, where n ranges from 2 to 10; each outlier is replaced by the value of the nearest normal data point.
S22, fill missing data: the mean of the time series segment immediately preceding the gap, of the same length as the gap, is used as the fill value for the missing segment.
S23, aggregate the data of each sensor into time series data with time interval ΔT, keeping the start time of each aggregation window as the timestamp of the aggregated data point; the aggregation operation takes the mean, the median, or a specified quantile of the data within the window.
S24, truncate all sensor data to start and end at the same timestamps.
After preprocessing, the time series of all sensors have equal length and identical timestamps and sampling intervals, and the data set is denoted

$$D^{(1)} = \left\{\mathbf{d}_i^{(1)}\right\}_{i=1}^{N},$$

where the superscript (1) denotes preprocessed data.
Further, the specific implementation of step S3 is as follows:
S31, compute the temporal correlation sequence of each sensor:

$$r_i(\tau) = \frac{E\!\left[\left(d_i^{(1)}(k) - \mu_i\right)\left(d_i^{(1)}(k+\tau) - \mu_{i,\tau}\right)\right]}{\sigma_i\,\sigma_{i,\tau}} \qquad (1)$$

where E[·] denotes the expectation operator; τ is the time delay length, τ = 0, 1, ..., K_τ, with K_τ the maximum delay; μ_i and μ_{i,τ} are the means, and σ_i and σ_{i,τ} the standard deviations, of the time series d_i^{(1)}(k) and d_i^{(1)}(k+τ), respectively, where d_i^{(1)}(k+τ) denotes the time series obtained by delaying d_i^{(1)} by τ time units; the normalization makes r_i(0) = 1.
S32, estimate the decorrelation length τ_{c,i} of the time series data of each sensor s_i; there are two cases:
Case 1: if the temporal correlation sequence r_i(τ) is aperiodic, take as the decorrelation length τ_{c,i} the maximum delay length satisfying r_i(τ) ≤ 0.05.
Case 2: if r_i(τ) is periodic, take the interval between peaks, i.e. the period, as the decorrelation length τ_{c,i}.
S33, determine the maximum decorrelation length of the sensor data:

$$\tau_{c,\max} = \max_{i = 1, \ldots, N} \tau_{c,i} \qquad (2)$$

S34, set the length L of the training data to L = n · τ_{c,max}, where n ranges from 1.5 to 10.
Further, the specific implementation of step S4 is as follows:
S41, construct the training data set: the data of sensor s_i is denoted d_i^{(1)}(k), abbreviated as d^{(1)}(x_i, y_i; k), i.e. the data at space-time position (x_i, y_i; k); the time series data of the N sensors, each of length K, D^{(1)} = {d_i^{(1)}}_{i=1}^{N}, contain NK data points in total.
The sensor data are normalized as follows:

$$d_i^{(2)}(k) = \frac{d_i^{(1)}(k) - \mu_i}{\sigma_i} \qquad (3)$$

where μ_i is the mean and σ_i the standard deviation of the data sequence d_i^{(1)}, and the superscript (2) denotes normalized data.
The training data set Ω is constructed from the normalized sensor data:

$$\Omega = \left\{ d^{(2)}(x_i, y_i; k) : i = 1, \ldots, N;\ k = 1, \ldots, K \right\} \qquad (4)$$

where d^{(2)}(x_i, y_i; k) denotes the normalized datum obtained at time k by the sensor s_i located at spatial coordinates (x_i, y_i); the training data set Ω has NK data points.
S42, construct the test data set: the test data set is the collection of data points to be reconstructed, whose test positions (x*, y*, k*) lie on the space-time grid (s_g; t_g), i.e. (x*, y*) ∈ s_g and k* ∈ t_g, where s_g is the spatial grid and t_g the temporal grid.
The test data set Ω* is defined as:

$$\Omega^{*} = \left\{ d^{*}\!\left(x^{*}_{m_x}, y^{*}_{m_y}; k^{*}\right) : m_x = 1, \ldots, M_x;\ m_y = 1, \ldots, M_y;\ k^{*} = 1, \ldots, M_t \right\} \qquad (5)$$

where d* is the data sequence to be reconstructed, (x*_{m_x}, y*_{m_y}) is the spatial position and k* the temporal position of the data to be reconstructed, and d*(x*_{m_x}, y*_{m_y}; k*) is the datum at a point to be reconstructed. M_x, M_y, and M_t denote the maximum indices of the test data on the x and y axes and the time axis t, so the total number of test data points is M_x M_y M_t. When 1 ≤ k* ≤ K, the reconstructed data are regarded as a temporal interpolation of the training data; when k* > K, the reconstructed data are regarded as a prediction from the training data.
To reduce the computational complexity, the sensor data of L time sections are selected from Ω as training data in each training pass, so the total number of training data points is NL; from Ω*, the spatial grids of L* time sections are selected as the test points to be reconstructed, so the total number of test data points is M_x M_y L*. Reconstructing the data of all M_x M_y M_t test points therefore requires ⌈M_t / L*⌉ test passes, where ⌈·⌉ denotes the ceiling operation and L* is the reconstructed data length.
Further, the specific implementation of step S5 is as follows:
S51, construct the space-time covariance matrix of the data. The space-time kernel function ker(v_p, v_q) encodes the space-time correlation of the data, where v_p = (x_p, y_p, k_p) and v_q = (x_q, y_q, k_q) denote two different space-time positions p and q. The space-time data are three-dimensional in the x, y, and k dimensions, so ker(v_p, v_q) is represented with a product tensor kernel:

$$\mathrm{ker}(v_p, v_q) = \mathrm{ker}_x(x_p, x_q)\,\mathrm{ker}_y(y_p, y_q)\,\mathrm{ker}_k(k_p, k_q) \qquad (6)$$

where ker_x(·,·), ker_y(·,·), and ker_k(·,·) are the kernel functions in the x, y, and k dimensions, respectively.
The space-time covariance matrix of the training and test data is constructed with ker(v_p, v_q):

$$\Sigma = \begin{bmatrix} \Sigma(V, V) & \Sigma(V, V^{*}) \\ \Sigma(V^{*}, V) & \Sigma(V^{*}, V^{*}) \end{bmatrix} \qquad (7)$$

where V denotes the positions of the NL training data points, so Σ(V, V) is the NL × NL training covariance matrix; V* denotes the positions of the M_x M_y L* test points, so Σ(V*, V*) is the M_x M_y L* × M_x M_y L* test covariance matrix; Σ(V, V*) and Σ(V*, V) are the NL × M_x M_y L* and M_x M_y L* × NL cross-covariance matrices.
S52, establish the space-time data model with a Gaussian process: under the Gaussian process model, the training data and the test data obey a joint Gaussian distribution:

$$\begin{bmatrix} \mathbf{d}_{NL\times 1} \\ \mathbf{d}^{*} \end{bmatrix} \sim N\!\left(\mathbf{0},\ \begin{bmatrix} \Sigma(V, V) & \Sigma(V, V^{*}) \\ \Sigma(V^{*}, V) & \Sigma(V^{*}, V^{*}) \end{bmatrix}\right) \qquad (8)$$

where d_{NL×1} is the training data vector, d_{NL×1} = Vec([d_1^{(2)}, d_2^{(2)}, ..., d_N^{(2)}]), with Vec(·) the vectorization operator and d_i^{(2)} the training data vector formed by the L data points of sensor s_i; d* is the test data vector, formed by the M_x M_y L* space-time data points on the test grid.
Further, the specific implementation of step S6 is as follows: compute the hyper-parameters in the kernel function. By the Gaussian prior, the training data d_{NL×1} is Gaussian distributed: d_{NL×1} | V ~ N(0, Σ(V, V)). The log marginal likelihood of d_{NL×1} is

$$\log p\left(\mathbf{d}_{NL\times 1} \mid V\right) = -\frac{1}{2}\,\mathbf{d}_{NL\times 1}^{T}\,\Sigma(V, V)^{-1}\,\mathbf{d}_{NL\times 1} - \frac{1}{2}\log\left|\Sigma(V, V)\right| - \frac{NL}{2}\log 2\pi \qquad (9)$$

where [·]^{-1} denotes the matrix inversion operator.
The hyper-parameters are solved by a numerical method, i.e. as the hyper-parameters maximizing the log marginal likelihood:

$$\hat{\theta} = \arg\max_{\theta}\ \log p\left(\mathbf{d}_{NL\times 1} \mid V, \theta\right) \qquad (10)$$
further, the specific implementation method of step S7 is as follows: according to the space-time data model, the space-time data reconstructed or predicted at the test point position is as follows:
Figure BDA00028640674000000510
the covariance of the reconstructed data is:
Figure BDA00028640674000000511
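To make the modeling and reconstruction of steps S5 to S7 concrete, the following sketch builds a tiny Gaussian-process model over a few hand-picked space-time points and reconstructs the value at a position with no sensor. The SE product kernel, the coordinates, the data values, and the jitter term are illustrative assumptions, not data or choices from the embodiment.

```python
import numpy as np

def se_product_kernel(p, q, l=1.0):
    """Product of SE kernels over (x, y, k), one simple instance of formula (6)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.exp(-np.sum((p - q) ** 2) / (2 * l ** 2))

def cov(A, B, l=1.0):
    """Covariance block Sigma(A, B) built element-wise from the kernel."""
    return np.array([[se_product_kernel(p, q, l) for q in B] for p in A])

# Training positions V (sensor samples) and values d; one test position V*.
V = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)]
d = np.array([1.0, 0.0, 0.0, 1.0])
Vs = [(1, 1, 0)]                            # a position with no sensor

K = cov(V, V) + 1e-6 * np.eye(len(V))       # Sigma(V,V) with a small jitter
Ks = cov(Vs, V)                             # Sigma(V*,V)
d_hat = Ks @ np.linalg.solve(K, d)          # reconstruction, formula (11)
```

The reconstructed value lies between the surrounding sensor readings, as the GP posterior mean interpolates the training data.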
the invention has the beneficial effects that:
1. the space-time reconstruction method can reconstruct the data of the position without the sensor and the sampling moment according to the sensor data of a limited number of measuring positions and measuring time;
2. the effect of data reconstruction may be interpolation of training data, or extrapolation or prediction of training data, based on selected data samples from the training data set;
3. the space-time data model utilizes the space-time correlation of the training data and the test data, and has good interpretability;
4. by using the product tensor kernel model, a high-dimensional space-time kernel function can be calculated by using a low-dimensional kernel function;
5. by estimating the time decorrelation length of each sensor data and determining the time length of the training data, the computation amount during model training and data testing can be reduced.
Drawings
Fig. 1 is a flowchart of the space-time reconstruction method for sensor network data according to the present invention;
Fig. 2 shows the arrangement of the Internet of Things sensors in a room;
Fig. 3 is a schematic diagram of the sensor network data structure according to the present invention;
Fig. 4 shows the spatial panel data collected by the sensor network at time t = 5 h in this embodiment;
Fig. 5 shows the spatial panel data reconstructed by the present method at time t = 5 h in this embodiment;
Fig. 6 compares the predicted and actual temperature values at the positions of sensors 1, 4, 5, and 12 for t = 6-15 h and t = 15-20 h in this embodiment;
Fig. 7 shows the training data slices for t = 0-15 h and the space-time field data slices reconstructed at the test points for t = 2 h and t = 7.5 h in this embodiment;
Fig. 8 shows the training data slices for t = 0-15 h and the space-time field data slices predicted at the test points for t = 16 h and t = 18.5 h in this embodiment.
Detailed Description
The technical scheme of the invention is further explained below in combination with the drawings.
It is assumed that the sensor network is laid out in a two-dimensional space described by a Cartesian coordinate system O-xy, containing N sensor nodes s_i, i = 1, ..., N, where node i has position coordinates (x_i, y_i). Each sensor node may simultaneously contain several sensors of different types; to simplify the description of the proposed method, each node is taken to contain a single sensor of one type.
The space and time to be sensed by the sensor network are divided by a grid method, with the cell size determined by the specific application requirements. Specifically, the spatial grid s_g has M_x × M_y cells of size Δ_x × Δ_y, where Δ_x and Δ_y are the cell lengths in the x and y dimensions. Typically, the number of spatial grid cells satisfies M_x M_y >> N. The temporal grid t_g has M_t cells of size Δ_t. The space-time grid (s_g; t_g) thus specifies the space-time positions of the data points to be reconstructed by the invention: each space-time grid cell represents one space-time data point to be reconstructed, for a total of M_x M_y M_t reconstructed data points.
As shown in fig. 1, a method for space-time reconstruction of sensor network data according to the present invention includes the following steps:
S1, collect the space-time data of the sensors.
The specific implementation is as follows: the data collected by sensor s_i is represented by the time series d_i = [d_i(1), d_i(2), ..., d_i(K_i)]^T, where d_i(k), k = 1, 2, ..., K_i, denotes the data collected by sensor s_i at time k, K_i is the data length of s_i, and T is the transpose operator.
The set of all sensor data collected by the sensor network is denoted

$$D^{(0)} = \left\{\mathbf{d}_i^{(0)}\right\}_{i=1}^{N},$$

where the superscript (0) denotes the raw acquired data, d_i^{(0)} is the time series acquired by sensor s_i, i = 1, ..., N, and N is the number of sensors. D^{(0)} also contains the position information of the sensor network, such as the position coordinates and timestamps.
In this embodiment, the working scene of the Internet of Things sensors is a room of overall size 40 m × 32 m in which 54 sensors are arranged; the layout is shown in Fig. 2. The space-time grid sizes are Δ_x = 2 m, Δ_y = 2 m, and the temporal grid size is Δ_t = 1 h. The space-time position of the data of sensor s_i is (x_i, y_i; k), i = 1, 2, ..., 54, k = 1, 2, ..., 15; the data structure is shown in Fig. 3.
S2, preprocess the acquired space-time data.
The specific implementation is as follows:
S21, remove outliers from the sensor data. An outlier of a sensor is defined as a data value deviating by more than n times the standard deviation of that sensor's data, where n ranges from 2 to 10; in this embodiment n = 5. Each outlier is replaced by the value of the nearest normal data point.
S22, fill missing data: the mean of the time series segment immediately preceding the gap, of the same length as the gap, is used as the fill value for the missing segment.
S23, aggregate the data of each sensor into time series data with time interval ΔT, keeping the start time of each aggregation window as the timestamp of the aggregated data point; the aggregation operation takes the mean, the median, or a specified quantile of the data within the window.
S24, truncate all sensor data to start and end at the same timestamps.
After preprocessing, the time series of all sensors have equal length and identical timestamps and sampling intervals, and the data set is denoted D^{(1)} = {d_i^{(1)}}_{i=1}^{N}, where the superscript (1) denotes preprocessed data; the time series d_i^{(1)} of every sensor has length K.
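The preprocessing of step S2 can be sketched as follows. The function names, the NaN convention for missing data, and the fixed per-window aggregation factor are illustrative assumptions, not part of the patent.

```python
import numpy as np

def remove_outliers(x, n=5):
    """Step S21: replace points deviating from the mean by more than n standard
    deviations with the value of the nearest non-outlier sample."""
    x = np.asarray(x, dtype=float).copy()
    mu, sigma = x.mean(), x.std()
    bad = np.abs(x - mu) > n * sigma
    good_idx = np.flatnonzero(~bad)
    for i in np.flatnonzero(bad):
        nearest = good_idx[np.argmin(np.abs(good_idx - i))]
        x[i] = x[nearest]
    return x

def fill_missing(x):
    """Step S22: fill each NaN run with the mean of the preceding window
    of the same length as the gap."""
    x = np.asarray(x, dtype=float).copy()
    i = 0
    while i < len(x):
        if np.isnan(x[i]):
            j = i
            while j < len(x) and np.isnan(x[j]):
                j += 1
            gap = j - i
            prev = x[max(0, i - gap):i]
            x[i:j] = np.nanmean(prev) if prev.size else 0.0
            i = j
        else:
            i += 1
    return x

def aggregate(x, m):
    """Step S23: aggregate consecutive groups of m samples by their mean;
    the timestamp of each aggregated point is the start of its window."""
    x = np.asarray(x, dtype=float)
    k = len(x) // m
    return x[:k * m].reshape(k, m).mean(axis=1)
```

Step S24 then amounts to slicing all series to a common start and end index.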
S3, determine the length of the training data.
The specific implementation is as follows:
S31, compute the temporal correlation sequence of each sensor:

$$r_i(\tau) = \frac{E\!\left[\left(d_i^{(1)}(k) - \mu_i\right)\left(d_i^{(1)}(k+\tau) - \mu_{i,\tau}\right)\right]}{\sigma_i\,\sigma_{i,\tau}} \qquad (1)$$

where E[·] denotes the expectation operator; τ is the time delay length, τ = 0, 1, ..., K_τ, with K_τ the maximum delay; μ_i and μ_{i,τ} are the means, and σ_i and σ_{i,τ} the standard deviations, of the time series d_i^{(1)}(k) and d_i^{(1)}(k+τ), respectively, where d_i^{(1)}(k+τ) denotes the time series obtained by delaying d_i^{(1)} by τ time units; the normalization makes r_i(0) = 1.
S32, estimate the decorrelation length τ_{c,i} of the time series data of each sensor s_i; there are two cases:
Case 1: if the temporal correlation sequence r_i(τ) is aperiodic, take as the decorrelation length τ_{c,i} the maximum delay length satisfying r_i(τ) ≤ 0.05.
Case 2: if r_i(τ) is periodic, take the interval between peaks, i.e. the period, as the decorrelation length τ_{c,i}.
S33, determine the maximum decorrelation length of the sensor data:

$$\tau_{c,\max} = \max_{i = 1, \ldots, N} \tau_{c,i} \qquad (2)$$

By calculation, the correlation sequences r_i(τ) in this embodiment are aperiodic, so the decorrelation lengths τ_{c,i} are found by Case 1, and the maximum decorrelation length obtained with formula (2) is τ_{c,max} = 1.
S34, set the length L of the training data to L = n · τ_{c,max}, where n ranges from 1.5 to 10; in this embodiment n = 10, so L = 10.
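The decorrelation-length estimate of step S3 can be sketched as below. Treating the first lag at which r_i(τ) falls to the 0.05 threshold as the decorrelation length is one reading of Case 1 (the aperiodic case); that choice and the function names are assumptions.

```python
import numpy as np

def autocorr(x, max_lag):
    """Normalized temporal correlation sequence r_i(tau) of step S31 (r(0) = 1)."""
    x = np.asarray(x, dtype=float)
    r = []
    for tau in range(max_lag + 1):
        a, b = x[:len(x) - tau], x[tau:]
        num = np.mean((a - a.mean()) * (b - b.mean()))
        den = a.std() * b.std()
        r.append(num / den if den > 0 else 0.0)
    return np.array(r)

def decorrelation_length(r, threshold=0.05):
    """Aperiodic case of step S32: first lag at which r(tau) reaches the threshold."""
    below = np.flatnonzero(r <= threshold)
    return int(below[0]) if below.size else len(r) - 1

def training_length(taus, n=10):
    """Steps S33-S34: training data length L = n * tau_c,max."""
    return n * max(taus)
```

For the periodic case of step S32, the peak-to-peak interval of r(τ) would be used instead.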
S4, construct the training data set and the test data set.
The specific implementation is as follows:
S41, construct the training data set: the data of sensor s_i is denoted d_i^{(1)}(k), abbreviated as d^{(1)}(x_i, y_i; k), i.e. the data at space-time position (x_i, y_i; k); the time series data of the 54 sensors, each of length 15, contain 810 data points in total.
The sensor data are normalized as follows:

$$d_i^{(2)}(k) = \frac{d_i^{(1)}(k) - \mu_i}{\sigma_i} \qquad (3)$$

where μ_i is the mean and σ_i the standard deviation of the data sequence d_i^{(1)}, and the superscript (2) denotes normalized data.
The training data set Ω is constructed from the normalized sensor data:

$$\Omega = \left\{ d^{(2)}(x_i, y_i; k) : i = 1, \ldots, N;\ k = 1, \ldots, K \right\} \qquad (4)$$

where d^{(2)}(x_i, y_i; k) denotes the normalized datum obtained at time k by the sensor s_i located at spatial coordinates (x_i, y_i); the training data set Ω has NK data points.
S42, construct the test data set: the test data set is the collection of data points to be reconstructed, whose test positions (x*, y*, k*) lie on the space-time grid (s_g; t_g), i.e. (x*, y*) ∈ s_g and k* ∈ t_g, where s_g is the spatial grid and t_g the temporal grid.
The test data set Ω* is defined as:

$$\Omega^{*} = \left\{ d^{*}\!\left(x^{*}_{m_x}, y^{*}_{m_y}; k^{*}\right) : m_x = 1, \ldots, M_x;\ m_y = 1, \ldots, M_y;\ k^{*} = 1, \ldots, M_t \right\} \qquad (5)$$

where d* is the data sequence to be reconstructed, (x*_{m_x}, y*_{m_y}) is the spatial position and k* the temporal position of the data to be reconstructed, and d*(x*_{m_x}, y*_{m_y}; k*) is the datum at a point to be reconstructed. M_x, M_y, and M_t denote the maximum indices of the test data on the x and y axes and the time axis t, so the total number of test data points is M_x M_y M_t. When 1 ≤ k* ≤ K, the reconstructed data are regarded as a temporal interpolation of the training data, with m_x = 1, 2, ..., M_x, m_y = 1, 2, ..., M_y, k* = 1, 2, ..., 15; when k* > K, the reconstructed data are regarded as a prediction from the training data, with m_x = 1, 2, ..., M_x, m_y = 1, 2, ..., M_y, k* = 16, 17, ..., 20.
To reduce the computational complexity, the sensor data of L time sections are selected from Ω as training data in each training pass, so the total number of training data points is NL; from Ω*, the spatial grids of L* time sections are selected as the test points to be reconstructed, so the total number of test data points is M_x M_y L*. Reconstructing the data of all M_x M_y M_t test points therefore requires ⌈M_t / L*⌉ test passes, where ⌈·⌉ denotes the ceiling operation and L* is the reconstructed data length.
In the training process, the sensor data of k = 6, 7, ..., 15, i.e. L = 10 time sections, are selected from Ω as training data, so the total number of training data points is NL = 540. For the reconstruction operation in this embodiment, k* = 6, 7, ..., 15, i.e. the spatial grids of L* = 10 time sections, are selected from Ω* as the test points to be reconstructed, giving M_x M_y L* = 2660 test data points in total. For the final prediction operation in this embodiment, k* = 16, 17, ..., 20, i.e. the spatial grids of L* = 5 time sections, are selected from Ω* as the test points to be reconstructed, giving M_x M_y L* = 270 test data points in total.
S5, establish the space-time data model.
The specific implementation is as follows:
S51, construct the space-time covariance matrix of the data. The space-time kernel function ker(v_p, v_q) encodes the space-time correlation of the data, where v_p = (x_p, y_p, k_p) and v_q = (x_q, y_q, k_q) denote two different space-time positions p and q. The space-time data are three-dimensional in the x, y, and k dimensions, so ker(v_p, v_q) is represented with a product tensor kernel:

$$\mathrm{ker}(v_p, v_q) = \mathrm{ker}_x(x_p, x_q)\,\mathrm{ker}_y(y_p, y_q)\,\mathrm{ker}_k(k_p, k_q) \qquad (6)$$

where ker_x(·,·), ker_y(·,·), and ker_k(·,·) are the kernel functions in the x, y, and k dimensions, respectively. The specific form of each kernel function can be determined in advance from the correlation of historical data, e.g. a squared exponential kernel, a Matérn kernel, or a periodic kernel; or it can be estimated from the acquired data, e.g. a spectral mixture kernel. The hyper-parameters of the kernel function are collected in a parameter vector θ.
In this embodiment, a squared exponential (SE) kernel is chosen empirically in the x and y dimensions:

$$\mathrm{ker}_x(x_p, x_q) = s_x^2 \exp\!\left(-\frac{(x_p - x_q)^2}{2 l_x^2}\right)$$

$$\mathrm{ker}_y(y_p, y_q) = s_y^2 \exp\!\left(-\frac{(y_p - y_q)^2}{2 l_y^2}\right)$$

where the hyper-parameters s² and l denote the variance of the training data and the length scale, respectively; under the weight-space view, the length scale sets the ratio between sample distances before and after the feature-space mapping.
In the k dimension, a spectral mixture (SM) kernel is chosen empirically; a common form of the SM kernel is

$$\mathrm{ker}_k(k_p, k_q) = \sum_{e=1}^{Q} w_e \exp\!\left(-2\pi^2 \tau^2 \sigma_e^2\right) \cos\!\left(2\pi \tau \mu_e\right), \qquad \tau = k_p - k_q,$$

where w_e, σ_e², and μ_e denote the weight, variance, and frequency parameters of the e-th mixture term, and Q is the number of terms; in this embodiment the empirical value Q = 5 is used. Before use, the SM kernel is initialized as follows: all weights w_e are initialized to the data variance; the frequencies μ_e are initialized to the reciprocal of the minimum sampling interval; the variances σ_e² are initialized to the reciprocal of the length scale.
The space-time covariance matrix of the training and test data is constructed with ker(v_p, v_q):

$$\Sigma = \begin{bmatrix} \Sigma(V, V) & \Sigma(V, V^{*}) \\ \Sigma(V^{*}, V) & \Sigma(V^{*}, V^{*}) \end{bmatrix} \qquad (7)$$

where V denotes the positions of the NL training data points, so Σ(V, V) is the NL × NL training covariance matrix; V* denotes the positions of the M_x M_y L* test points, so Σ(V*, V*) is the M_x M_y L* × M_x M_y L* test covariance matrix; Σ(V, V*) and Σ(V*, V) are the NL × M_x M_y L* and M_x M_y L* × NL cross-covariance matrices.
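A minimal rendering of the product tensor kernel of formula (6) and the resulting covariance blocks, assuming SE kernels in x and y and a spectral-mixture kernel in k as in the embodiment; the parameter dictionary layout and function names are assumptions.

```python
import numpy as np

def ker_se(a, b, s2=1.0, l=1.0):
    """Squared exponential kernel used in the x and y dimensions."""
    return s2 * np.exp(-(a - b) ** 2 / (2 * l ** 2))

def ker_sm(a, b, w, sig2, mu):
    """Spectral mixture kernel (one common form) used in the k dimension:
    sum_e w_e * exp(-2 pi^2 tau^2 sig2_e) * cos(2 pi tau mu_e)."""
    tau = a - b
    return sum(we * np.exp(-2 * np.pi ** 2 * tau ** 2 * se) * np.cos(2 * np.pi * tau * me)
               for we, se, me in zip(w, sig2, mu))

def ker_spacetime(vp, vq, theta):
    """Product tensor kernel of formula (6): ker_x * ker_y * ker_k."""
    (xp, yp, kp), (xq, yq, kq) = vp, vq
    return (ker_se(xp, xq, theta['sx2'], theta['lx'])
            * ker_se(yp, yq, theta['sy2'], theta['ly'])
            * ker_sm(kp, kq, theta['w'], theta['sig2'], theta['mu']))

def cov_matrix(V1, V2, theta):
    """Covariance block Sigma(V1, V2), built element-wise from the kernel."""
    return np.array([[ker_spacetime(p, q, theta) for q in V2] for p in V1])
```

The product structure is what lets a three-dimensional space-time kernel be computed from three one-dimensional kernels.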
s52, establishing a space-time data model by using a Gaussian process: under the gaussian process model, the training data and the test data obey a joint gaussian distribution, namely:
[d_{NL×1}; d*_{M_xM_yL*×1}] ~ N(0, [Σ(V, V)  Σ(V, V*); Σ(V*, V)  Σ(V*, V*)])

In the formula, d_{NL×1} is the training data vector, i.e.

d_{NL×1} = Vec([d̄_1, d̄_2, …, d̄_N])

where Vec(·) is the vectorization operator, and d̄_i denotes the training data vector consisting of the L data points selected from sensor s_i's normalized series d_i^(2);
d*_{M_xM_yL*×1} is the test data vector, i.e. the vector formed by stacking the test data points on the M_x × M_y spatial grid at each of the L* test time points.
S6, training a space-time data model;
The specific implementation method comprises the following steps: calculate the hyperparameters in the kernel function. From the Gaussian process prior, the training data d_{NL×1} follow a Gaussian distribution, i.e. d_{NL×1}|V ~ N(0, Σ(V, V)); the log marginal likelihood of d_{NL×1} is:

log p(d_{NL×1}|V) = −(1/2) d_{NL×1}^T [Σ(V, V)]^(−1) d_{NL×1} − (1/2) log|Σ(V, V)| − (NL/2) log 2π
In the formula, [·]^(−1) denotes the matrix inversion operator;
The hyperparameters are solved by a numerical method, i.e. as the hyperparameters that maximize the log marginal likelihood:

θ̂ = argmax_θ log p(d_{NL×1}|V)
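The maximization of the log marginal likelihood can be sketched with SciPy's L-BFGS-B optimizer; the negative log likelihood below mirrors the three terms of the expression above, with a small jitter added for numerical stability (the function names, the log-space parameterization and the `cov_fn` interface are our choices):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marglik(log_th, V, d, cov_fn):
    """Negative log marginal likelihood of a zero-mean GP.

    log_th : log-hyperparameters (optimized in log space so they stay
             positive); cov_fn(V, V, th) builds the training covariance.
    """
    th = np.exp(log_th)
    K = cov_fn(V, V, th) + 1e-6 * np.eye(len(d))  # jitter for stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, d))
    return (0.5 * d @ alpha + np.sum(np.log(np.diag(L)))
            + 0.5 * len(d) * np.log(2.0 * np.pi))

def fit(V, d, cov_fn, th0):
    """Maximize the log marginal likelihood numerically (L-BFGS-B)."""
    res = minimize(neg_log_marglik, np.log(th0), args=(V, d, cov_fn),
                   method="L-BFGS-B")
    return np.exp(res.x)
```

The Cholesky factorization supplies both the quadratic form and log|Σ| (as twice the sum of the log-diagonal of L) without forming an explicit inverse.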
s7, performing space-time data reconstruction by using the model;
the specific implementation method comprises the following steps: according to the space-time data model, the space-time data reconstructed or predicted at the test point position is as follows:
d̂*_{M_xM_yL*×1} = Σ(V*, V)[Σ(V, V)]^(−1) d_{NL×1}   (11)
Fig. 4 shows the sensor network data slice at time t = 5 h in this embodiment; Fig. 5 shows the space-time field slice reconstructed by the proposed method at t = 5 h.
The space-time data model can likewise be applied via formula (11) to predict the space-time data d̂* at test point positions beyond the training interval.
For this embodiment, Fig. 6 shows comparison curves of the temperature values predicted by the proposed method against the actual values at the positions of sensors No. 1, No. 4, No. 5 and No. 12, for t = 6–15 h and t = 15–20 h.
Fig. 7 shows the training data slices for t = 0–15 h and the space-time field data slices reconstructed at the test points for t = 2 h and t = 7.5 h in this embodiment; Fig. 8 shows the training data slices for t = 0–15 h and the space-time field data slices predicted at the test points for t = 16 h and t = 18.5 h.
As can be seen from the figures, the Gaussian-process-based space-time field reconstruction method provided by the invention can reconstruct the space-time field from the known node data without relying on a physical model.
The covariance of the reconstructed data is:
cov(d̂*) = Σ(V*, V*) − Σ(V*, V)[Σ(V, V)]^(−1) Σ(V, V*)
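The reconstruction mean and its covariance can be computed together; the sketch below evaluates the two expressions above via a Cholesky factorization rather than an explicit matrix inverse (a standard numerical practice, not mandated by the patent; names are illustrative):

```python
import numpy as np

def gp_posterior(K_train, K_cross, K_test, d_train):
    """GP posterior at the test points.

    K_train : Sigma(V, V),   NL x NL
    K_cross : Sigma(V*, V),  M x NL
    K_test  : Sigma(V*, V*), M x M
    Returns mean = Sigma(V*,V) Sigma(V,V)^{-1} d  and
    cov = Sigma(V*,V*) - Sigma(V*,V) Sigma(V,V)^{-1} Sigma(V,V*).
    """
    n = K_train.shape[0]
    L = np.linalg.cholesky(K_train + 1e-6 * np.eye(n))  # jitter
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, d_train))
    mean = K_cross @ alpha
    v = np.linalg.solve(L, K_cross.T)
    cov = K_test - v.T @ v
    return mean, cov
```

The diagonal of `cov` gives per-point reconstruction uncertainty, which can be plotted as confidence bands around the curves of Fig. 6.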
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of the invention is not limited to the specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (2)

1. A space-time reconstruction method of sensor network data is characterized by comprising the following steps:
S1, collecting space-time data of the sensors; the specific implementation method comprises the following steps: sensor s_i's collected time series data are represented as d_i = [d_i(1), d_i(2), …, d_i(K_i)]^T, where d_i(k), k = 1, 2, …, K_i denotes the data collected by sensor s_i at time k, K_i is the data length of s_i, and T is the transpose operator;
the set of all sensor data collected by the sensor network is denoted D^(0) = {d_1^(0), d_2^(0), …, d_N^(0)}, where the superscript (0) indicates raw acquisition data, d_i^(0) denotes the time series data acquired by sensor s_i, i = 1, …, N, and N denotes the number of sensors;
s2, preprocessing the acquired space-time data; the specific implementation method comprises the following steps:
S21, removing outliers in the sensor data: the outliers of each sensor are defined as data values larger than n times the standard deviation of that sensor's data, with n ranging from 2 to 10; outliers are removed by replacing them with the value of the nearest normal data point;
S22, filling missing data: for each missing segment, the mean of the preceding time series segment of the same length as the missing segment is taken as the fill value;
s23, data aggregation processing, namely aggregating the data of each sensor into time sequence data with a time interval delta T, and keeping the starting time of an aggregation time period as a time stamp of the aggregated data points; the aggregation operation is to take the average value, the median value or a specific quantile numerical value of the data in the aggregation time period;
s24, truncating all sensor data to start from the same timestamp and end by the same timestamp;
After data preprocessing, the time series data of all sensors are of equal length, and the timestamps and time intervals of the data points are identical; the data set is represented as D^(1) = {d_1^(1), d_2^(1), …, d_N^(1)}, where the superscript (1) denotes the preprocessed data;
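Steps S21–S23 can be sketched for a single sensor with pandas (a simplification: S22 is approximated here by a forward fill rather than the equal-length-segment mean described above; all names and parameters are illustrative):

```python
import numpy as np
import pandas as pd

def preprocess(series, n=3, interval="30min"):
    """Sketch of steps S21-S23; `series` is a pandas Series indexed by
    timestamp.

    S21: values deviating from the mean by more than n standard
         deviations are treated as outliers and replaced by the nearest
         preceding normal value.
    S22: remaining gaps are filled the same way (simplified).
    S23: aggregation to the fixed interval Delta T, keeping the start of
         each period as the timestamp of the aggregated point.
    """
    s = series.copy()
    s[(s - s.mean()).abs() > n * s.std()] = np.nan   # S21: flag outliers
    s = s.ffill()                                    # S21/S22: fill
    return s.resample(interval, label="left").mean() # S23: mean aggregation
```

Step S24 then amounts to truncating all per-sensor series to their common timestamp range.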
s3, determining the length of training data;
s4, constructing a training data set and a testing data set; the specific implementation method comprises the following steps:
S41, constructing the training data set: sensor s_i's preprocessed data d_i^(1) are briefly denoted d_i, corresponding to the data at space-time position (x_i, y_i; k); the time series data of length K from the N sensors, {d_i(k), i = 1, …, N, k = 1, …, K}, comprise NK data points in total;
The sensor data are normalized as follows:

d_i^(2)(k) = (d_i^(1)(k) − m_i) / σ_i

In the formula, m_i is the mean of the data sequence d_i^(1), σ_i is the standard deviation of d_i^(1), and the superscript (2) denotes the normalized data;
The normalized sensor data are used to construct the training data set Ω:

Ω = {(x_i, y_i, k; d_i^(2)(k)) | i = 1, …, N; k = 1, …, K}

where d_i^(2)(k) denotes the normalized datum obtained from the data acquired at time k by sensor s_i located at spatial coordinates (x_i, y_i); the training data set Ω has NK data points;
S42, constructing the test data set: the test data set is the collection of data points to be reconstructed, whose test positions (x*, y*, k*) lie on the space-time grid (s_g; t_g), i.e. (x*, y*) ∈ s_g and k* ∈ t_g, where s_g is the spatial grid and t_g the time grid;

the test data set Ω* is defined as:

Ω* = {(x*, y*, k*; d*(x*, y*, k*)) | (x*, y*) ∈ s_g, k* ∈ t_g}

In the formula, d* is the data sequence to be reconstructed, (x*, y*) is the spatial position of the data to be reconstructed, k* is the temporal position of the data to be reconstructed, and d*(x*, y*, k*) is the datum at a data point to be reconstructed; M_x, M_y and M_t denote the maximum indices of the test data on the x axis, y axis and time axis t, respectively, so there are M_xM_yM_t test data points in total; when 1 ≤ k* ≤ K, the reconstructed data are regarded as a temporal interpolation of the training data; when k* > K, the reconstructed data are regarded as a prediction from the training data;
In order to reduce computational complexity, sensor data of L time sections are selected from Ω as training data in the training process, for a total of NL training data points; L* time sections of the spatial grid are selected from Ω* as the test data points to be reconstructed, for a total of M_xM_yL* test data points; reconstructing all M_xM_yM_t test points therefore requires ⌈M_t/L*⌉ tests, where ⌈·⌉ denotes the ceiling operation and L* denotes the reconstructed data length;
s5, establishing a space-time data model; the specific implementation method comprises the following steps:
S51, constructing the space-time covariance matrix of the data: the space-time kernel function ker(v_p, v_q) contains the space-time correlation of the data, where v_p = (x_p, y_p, k_p) and v_q = (x_q, y_q, k_q) respectively represent two different space-time positions p and q; the space-time data are three-dimensional data in the x, y and k dimensions, and ker(v_p, v_q) is represented by a product tensor kernel, namely:
ker(vp,vq)=kerx(xp,xq)kery(yp,yq)kerk(kp,kq) (6)
In the formula, ker_x(·,·), ker_y(·,·) and ker_k(·,·) are the kernel functions in the x, y and k dimensions, respectively;
using the space-time kernel function ker(v_p, v_q), the space-time covariance matrices of the training and test data are constructed, expressed as:
[Σ(V, V)  Σ(V, V*); Σ(V*, V)  Σ(V*, V*)]

where V denotes the positions of the NL training data points, so Σ(V, V) is the NL × NL training covariance matrix; V* denotes the positions of the M_xM_yL* test points, so Σ(V*, V*) is the M_xM_yL* × M_xM_yL* test covariance matrix; Σ(V, V*) and Σ(V*, V) are the NL × M_xM_yL* and M_xM_yL* × NL cross-covariance matrices, respectively;
s52, establishing a space-time data model by using a Gaussian process: under the gaussian process model, the training data and the test data obey a joint gaussian distribution, namely:
[d_{NL×1}; d*_{M_xM_yL*×1}] ~ N(0, [Σ(V, V)  Σ(V, V*); Σ(V*, V)  Σ(V*, V*)])

In the formula, d_{NL×1} is the training data vector, i.e.

d_{NL×1} = Vec([d̄_1, d̄_2, …, d̄_N])

where Vec(·) is the vectorization operator, and d̄_i denotes the training data vector consisting of the L data points selected from sensor s_i's normalized series d_i^(2);
d*_{M_xM_yL*×1} is the test data vector, i.e. the vector formed by stacking the test data points on the M_x × M_y spatial grid at each of the L* test time points;
S6, training the space-time data model; the specific implementation method comprises the following steps: calculating the hyperparameters in the kernel function; from the Gaussian process prior, the training data d_{NL×1} follow a Gaussian distribution, i.e. d_{NL×1}|V ~ N(0, Σ(V, V)); the log marginal likelihood of d_{NL×1} is:

log p(d_{NL×1}|V) = −(1/2) d_{NL×1}^T [Σ(V, V)]^(−1) d_{NL×1} − (1/2) log|Σ(V, V)| − (NL/2) log 2π
In the formula, [·]^(−1) denotes the matrix inversion operator;
the hyperparameters are solved by a numerical method, i.e. as the hyperparameters that maximize the log marginal likelihood:

θ̂ = argmax_θ log p(d_{NL×1}|V)
s7, performing space-time data reconstruction by using the model; the specific implementation method comprises the following steps: according to the space-time data model, the space-time data reconstructed or predicted at the test point position is as follows:
d̂*_{M_xM_yL*×1} = Σ(V*, V)[Σ(V, V)]^(−1) d_{NL×1}
the covariance of the reconstructed data is:
cov(d̂*) = Σ(V*, V*) − Σ(V*, V)[Σ(V, V)]^(−1) Σ(V, V*)
2. a space-time reconstruction method of sensor network data according to claim 1, wherein the step S3 is specifically implemented by:
S31, calculating the time correlation sequence of each sensor:

r_i(τ) = E[(d_i^(1)(k) − m_i)(d_i^(1,τ)(k) − m_i^(τ))] / (σ_i σ_i^(τ))

In the formula, E[·] denotes the expectation operator; τ is the time delay length, τ = 0, 1, …, K_τ, and K_τ denotes the maximum delay time; m_i and m_i^(τ) are the means of the time series d_i^(1) and d_i^(1,τ), respectively, σ_i and σ_i^(τ) are the corresponding standard deviations, and d_i^(1,τ) denotes the time series obtained by delaying d_i^(1) by τ time units;
S32, estimating the decorrelation length τ_{c,i} of each sensor s_i's time series data; there are two cases:

Case 1: if the time correlation sequence r_i(τ) is aperiodic, the maximum time delay length satisfying r_i(τ) ≤ 0.05 is taken as the decorrelation length τ_{c,i};

Case 2: if the time correlation sequence r_i(τ) is periodic, the interval between peaks, i.e. the period, is taken as the decorrelation length τ_{c,i};
S33, determining the maximum decorrelation length of the sensor data as:

τ_{c,max} = max{τ_{c,1}, τ_{c,2}, …, τ_{c,N}};
S34, taking the length L of the training data as L = n·τ_{c,max}, where n ranges from 1.5 to 10.
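Steps S31–S34 can be sketched for a single aperiodic sensor as follows (the normalization of r(τ) and the use of the first lag crossing the 0.05 threshold are our assumptions; names are illustrative):

```python
import numpy as np

def time_corr(x, max_lag):
    """Normalized time-correlation sequence r(tau) of step S31."""
    x = np.asarray(x, dtype=float)
    r = []
    for tau in range(max_lag + 1):
        a, b = x[:len(x) - tau], x[tau:]
        r.append(np.mean((a - a.mean()) * (b - b.mean())) / (a.std() * b.std()))
    return np.array(r)

def training_length(x, n=2.0, max_lag=None, thresh=0.05):
    """Steps S32-S34 for one aperiodic sensor: the first lag at which
    r(tau) drops to the threshold is used as the decorrelation length
    tau_c, and the training length is L = n * tau_c, n in [1.5, 10]."""
    if max_lag is None:
        max_lag = len(x) // 2
    r = time_corr(x, max_lag)
    below = np.where(r <= thresh)[0]
    tau_c = int(below[0]) if below.size else max_lag
    return int(np.ceil(n * tau_c))
```

For the periodic case (Case 2), the peak spacing of r(τ) would be measured instead, e.g. with a peak-finding routine.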
CN202011576661.0A 2020-12-28 2020-12-28 Space-time reconstruction method for sensor network data Active CN112699601B (en)


Publications (2)

Publication Number Publication Date
CN112699601A CN112699601A (en) 2021-04-23
CN112699601B true CN112699601B (en) 2022-05-31





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant