CN109684314B

CN109684314B - Wireless sensor network missing value estimation method based on space structure

Info

Publication number: CN109684314B
Application number: CN201811541830.XA
Authority: CN
Inventors: 李微微; 马卫
Original assignee: NANJING INSTITUTE OF TOURISM & HOSPITALITY
Current assignee: NANJING INSTITUTE OF TOURISM & HOSPITALITY
Priority date: 2018-12-17
Filing date: 2018-12-17
Publication date: 2023-04-18
Anticipated expiration: 2038-12-17
Also published as: CN109684314A

Abstract

The invention discloses a missing value estimation method for wireless sensor networks based on spatial structure, belonging to the field of wireless sensor networks. Besides the temporal correlation of WSN data, the method exploits the spatio-temporal correlation of sensor node data. It adds a regularization term that constrains the solution space of matrix completion, and uses this constrained completion to estimate the missing data values.

Description

Wireless sensor network missing value estimation method based on space structure

Technical Field

The invention belongs to the field of wireless sensor networks, and in particular relates to a method for estimating missing values in a wireless sensor network based on spatial structure.

Background

Wireless Sensor Networks (WSNs) are a new information acquisition and processing technology. They are widely used in fields such as military applications, environmental monitoring, disaster relief, industrial control and smart homes, and they are a research focus in the information field. Wireless sensor nodes are usually exposed directly to the external environment. Weather conditions, unstable sensor devices, human factors and the like can cause frequent communication link failures, so the collected sensing data may be lost or corrupted during transmission.

Conventional approaches to missing data fall into three types:

- deleting the missing data directly;
- leaving the data unprocessed and applying existing algorithms as they are;
- filling in the missing data.

The first method is simple and easy to use. However, with the arrival of the "big data era", data are becoming increasingly sparse and the amount of missing data keeps growing. Discarding incomplete records not only alters the overall characteristics of the data but can also seriously distort data mining results. This can lead operators to wrong judgments and cause significant losses.

The second method must contend with the fact that traditional data mining algorithms assume complete data, so classical algorithms have to be modified to handle missing data. This modification is laborious, and for some methods it is not feasible. Some analysis algorithms for incomplete data now exist, but they generally suffer from high complexity and poor results.

Filling in missing data is therefore the most desirable way to process incomplete data. Incomplete data filling uses other known auxiliary information together with a specific method or model to obtain one or more predicted values closest to the missing data. These predictions then fill the gaps, giving a complete data set that is as close as possible to the original.

In recent years, researchers have proposed a series of models and algorithms for WSN data recovery with notable success. For example, Nan proposed a sensor data recovery method based on association rule mining. Li et al. proposed a method for recovering sensing data based on physical and statistical models. In 2010, Pan Lijiang et al. proposed a sensing data estimation algorithm based on spatio-temporal correlation.

Notably, WSN data are strongly correlated across adjacent time points. For example, successive readings of a temperature sensor or a light intensity sensor are correlated, and in a mathematical model such smooth evolution along the time axis appears as a typical low-rank structure. This low-rank structure is usually exploited in WSN data through matrix factorization. For instance, Crash et al. compressed data collected by WSNs using non-negative matrix factorization and obtained good results.

However, WSN data also exhibit spatial correlation between sensor nodes, in addition to temporal correlation. For example, the temperatures recorded by nearby temperature sensors vary more similarly than those recorded by sensors far apart. So when a temperature sensor is missing data, the readings of its nearby sensors are a more informative reference.

Matrix completion is an effective method for estimating missing values, but existing approaches do not consider the continuity between data, so their estimation errors are large. Ensuring data integrity in wireless sensor networks therefore raises an important open problem: sensing data are lost during transmission, and the missing values need to be estimated under spatial structural constraints.

Disclosure of Invention

1. Technical problem to be solved by the invention

To address the loss of sensing data during transmission, the invention provides a missing value estimation method for wireless sensor networks based on spatial structure.

2. Technical scheme

In order to achieve the purpose, the technical scheme provided by the invention is as follows:

A wireless sensor network missing value estimation method based on spatial structure comprises the following steps:

step 1, recovering the original matrix from a subset of its known elements by exploiting the sparsity of the matrix singular values (i.e., the low rank of the matrix);

step 2, converting the structural information in the sensor network data into a mathematical graph structure and performing structure-constrained matrix completion on the matrix from step 1. A regularization term constrains the solution space of the matrix completion, which yields the optimal solution for the missing values of the matrix.

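Further, step 1 comprises the following sub-steps:

step 1.1, given partial data $M_{ij}$, $(i,j) \in \Omega \subseteq \{1,\dots,m\} \times \{1,\dots,n\}$, of a sparse data set $\Omega$, find the values of the $m \times n$ matrix $M$, with $M_{ij} \in M$. The sensor data are abstracted as a low-rank matrix, and low-rank matrix completion is achieved by solving a minimization problem. The standard matrix completion problem is described as:

$$\min_{X} \operatorname{rank}(X) \quad \text{s.t.} \quad \mathcal{A}_\Omega(X) = \mathcal{A}_\Omega(M)$$

where $\mathcal{A}_\Omega(\cdot)$ keeps the entries indexed by $\Omega$, so that $\mathcal{A}_\Omega(M) = (M_{ij})_{(i,j)\in\Omega}$ is the observation matrix;

step 1.2, replacing the matrix rank with the matrix nuclear norm:

$$\|X\|_* = \sum_{k} \sigma_k(X)$$

where $\sigma_k(X)$ is the $k$-th singular value of $X$;

accordingly, the formula in step 1.1 is relaxed to:

$$\min_{X} \|X\|_* \quad \text{s.t.} \quad \mathcal{A}_\Omega(X) = \mathcal{A}_\Omega(M)$$

when the observed data are affected by noise, the above formula becomes:

$$\min_{X} \gamma_n \|X\|_* + l\big(\mathcal{A}_\Omega(X), \mathcal{A}_\Omega(M)\big)$$

where $\gamma_n$ is a coefficient and the loss function $l$ is determined by the noise type.

Further, the formula in step 1.2 is fitted by least squares, i.e., $l\big(\mathcal{A}_\Omega(X), \mathcal{A}_\Omega(M)\big) = \|A_\Omega \circ (X - M)\|_F^2$. Here $A_\Omega$ is the 0/1 indicator matrix of the observed entries and $\circ$ is the Hadamard product.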

Further, step 2 comprises the following sub-steps:

step 2.2, abstracting the sensor network into the following mathematical model. Consider an undirected weighted network $G = (V, E, W)$ with vertex set $V = \{1, \dots, n\}$, whose edge set $E \subseteq V \times V$ is represented by a non-negative weight matrix $W$. Let $X \in \mathbb{R}^{m \times n}$ be a matrix whose column vectors are $m$-dimensional, denoted $X = (x_1, \dots, x_n)$, and whose row vectors are $n$-dimensional, denoted $X = \begin{pmatrix} (x^1)^T \\ \vdots \\ (x^m)^T \end{pmatrix}$;

step 2.3, taking the column values $x_1, \dots, x_n$ as the values on the vertices $V$. The smoothness assumption requires $x_j \approx x_{j'}$ whenever $(j, j') \in E$, i.e., the following quantity should be small:

$$\frac{1}{2}\sum_{j,j'=1}^{n} w_{jj'} \|x_j - x_{j'}\|_2^2 = \operatorname{tr}\big(X L X^T\big)$$

where $L = D - W$, with degree matrix $D = \operatorname{diag}\big(\sum_{j'} w_{jj'}\big)$, is the Laplacian matrix of graph $G$;

step 2.4, according to the above formula, when the observed data in step 1.2 are affected by noise, the standard matrix completion problem is converted into:

$$\min_{X} \gamma_n \|X\|_* + l\big(\mathcal{A}_\Omega(X), \mathcal{A}_\Omega(M)\big) + \gamma_r \operatorname{tr}\big(X L X^T\big)$$

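Further, the problem in step 2.4 is written in the separable form:

$$\min_{X, Y} F(X) + G(Y) \quad \text{s.t.} \quad X = Y$$

where $F(X) = \gamma_n \|X\|_*$ and $G(Y) = l\big(\mathcal{A}_\Omega(Y), \mathcal{A}_\Omega(M)\big) + \gamma_r \operatorname{tr}\big(Y L Y^T\big)$. The quantities $\gamma_n$, $\gamma_r$ and $\gamma_c$ are all coefficients, and the augmented Lagrangian function is:

$$\mathcal{L}_\rho(X, Y, Z) = F(X) + G(Y) + \langle Z, X - Y \rangle + \frac{\rho}{2}\|X - Y\|_F^2$$

Further, because the problem in step 2.4 contains a non-differentiable term (the nuclear norm), it is solved with the alternating direction method of multipliers (ADMM).

Further, ADMM solves the problem iteratively by alternately updating the variables, as follows:

$$X^{k+1} = \arg\min_{X} F(X) + \frac{\rho}{2}\big\|X - Y^k + \rho^{-1} Z^k\big\|_F^2$$

$$Y^{k+1} = \arg\min_{Y} G(Y) + \frac{\rho}{2}\big\|X^{k+1} - Y + \rho^{-1} Z^k\big\|_F^2$$

$$Z^{k+1} = Z^k + \rho\big(X^{k+1} - Y^{k+1}\big)$$

The $X$-update has the closed-form solution

$$X^{k+1} = U\big(\Lambda - \rho^{-1}\gamma_n I\big)_+ V^T$$

where $H = Y^k - \rho^{-1} Z^k$ and $H = U \Lambda V^T$ is its singular value decomposition. The operator $(\cdot)_+$ replaces negative entries by zero. According to this iterative solution, the $Y$-update for the formula in step 2.4 becomes:

$$Y^{k+1} = \arg\min_{Y} G(Y) + \frac{\rho}{2}\|Y - H\|_F^2$$

where $H = X^{k+1} + \rho^{-1} Z^k$.

Iterating these updates yields the optimal solution for the missing values of the matrix.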

3. Advantageous effects

Compared with the prior known technology, the invention has the beneficial effects that:

(1) The spatial-structure-based missing value estimation method for wireless sensor networks adds structural feature constraints of the sensor network to traditional low-rank matrix decomposition in order to estimate missing sensing data. During recovery by matrix completion, the network structure is taken as the basis. The temporal relationship is added as a constraint, so the temporal correlation of WSN data is taken into account, and a spatial structural constraint is added as well. Together these can significantly improve the accuracy of data recovery;

(2) The method treats the missing interval time of the sensing data as a main factor affecting algorithm performance, and tests its influence with missing time intervals of 1 to 30 min. The error of the algorithm stays small under different missing interval times, so the missing time interval does not greatly affect performance. Because the method takes the spatial structure of WSNs into account, its accuracy depends more on the spatial correlation of the data. In addition, once the interval exceeds 15 min, the estimation accuracy for missing temperature values does not improve further, which indicates that the results stabilize once the sample size reaches 30 min;

(3) The method treats the number of consecutive missing values as another factor affecting algorithm performance, and compares the algorithms with 1 to 30 consecutive missing sensing values. The error grows as the number of consecutive missing values increases, because the method relies on non-missing data at the time instants adjacent to the missing values. As the time gap between a missing value and its nearest non-missing readings increases, their temporal correlation decreases, and the estimation error increases. The accuracy of the method likewise drops as the correlation weakens. However, because of the spatial structure constraint of WSNs, it maintains better accuracy and stability regardless of the number of consecutive missing values.

Drawings

FIG. 1 is a diagram of the experimental scenario of the data set used by the invention;

FIG. 2 is a graph comparing algorithm performance for different missing time intervals;

FIG. 3 is a graph comparing algorithm performance for different numbers of consecutive missing values.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. The embodiments described herein are part of the embodiments of the present invention and not all of the embodiments. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.

The invention discloses a wireless sensor network missing value estimation method based on a space structure, which comprises the following steps:

Step 1, matrix completion. Matrix completion aims to reconstruct the original complete matrix from a matrix with missing elements. In many practical problems data are organized as matrices, but sampling limitations, noise contamination and the like often leave these matrices with missing or abnormal entries. To address this, compressed sensing theory is extended to the matrix space. The original matrix is then recovered from a subset of its known elements by exploiting the sparsity of its singular values, i.e., its low rank;

Step 2, matrix completion based on structural constraints. A low-rank structure means that the rows and columns of the matrix $M$ are linearly correlated, but this correlation is generally unstructured. When $M$ has a known structural relationship, a regularization term can therefore constrain the solution space of the matrix completion and bring the solution closer to the true values. The sensor network data considered in the invention naturally carry spatial structural constraints, and this structural information can be represented mathematically as a graph;

Further, step 1 is performed as follows:

step 1.1, the standard matrix completion problem is generally described as follows. Given partial data $M_{ij}$, $(i,j) \in \Omega \subseteq \{1,\dots,m\} \times \{1,\dots,n\}$, of a sparse data set $\Omega$, find the values of the $m \times n$ matrix $M$, with $M_{ij} \in M$. In WSN data each sensor records continuously, so data at adjacent time points are highly similar and together form a low-rank matrix. Low-rank matrix completion fills in the unknown elements by solving a minimization problem, generally described as:

$$\min_{X} \operatorname{rank}(X) \quad \text{s.t.} \quad \mathcal{A}_\Omega(X) = \mathcal{A}_\Omega(M) \tag{1}$$

where $\mathcal{A}_\Omega(\cdot)$ keeps the entries indexed by $\Omega$, so that $\mathcal{A}_\Omega(M) = (M_{ij})_{(i,j)\in\Omega}$ is the observation matrix. Because the rank function is non-convex, problem (1) is NP-hard, and its solution must be approximated using a convex surrogate function.
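The claim that continuously recorded sensor readings form an (approximately) low-rank matrix can be checked numerically. The following Python/NumPy sketch builds a synthetic temperature-like matrix with rows as time steps and columns as sensors. It combines one shared cycle with hypothetical per-sensor offsets and amplitudes, plus small noise, and then measures how much of the matrix energy the leading singular values capture. The signal model and all parameter values are assumptions for illustration; this is not the Intel Berkeley data used later.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 54                                   # hypothetical: 200 time steps, 54 sensors
t = np.linspace(0.0, 4.0 * np.pi, m)[:, None]    # two "daily" cycles

offsets = rng.uniform(-1.0, 1.0, size=(1, n))    # per-sensor mean offset
amps = rng.uniform(0.8, 1.2, size=(1, n))        # per-sensor cycle amplitude
noise = 0.01 * rng.normal(size=(m, n))

# Every column mixes the same two temporal patterns (a constant and a
# sinusoid), so the noise-free part of M has rank 2.
M = 20.0 + offsets + amps * np.sin(t) + noise

s = np.linalg.svd(M, compute_uv=False)           # singular values, descending
energy = np.cumsum(s ** 2) / np.sum(s ** 2)      # cumulative energy fraction
print("rank of noise-free part:", np.linalg.matrix_rank(M - noise))
print("energy in top 2 singular values:", energy[1])
```

Real sensor data are only approximately low rank. In this sketch the decay of the singular values is the property that the nuclear-norm relaxation below relies on.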

(2) The invention replaces the rank of the matrix with the nuclear norm proposed by Candès et al., defined as

$$\|X\|_* = \sum_{k} \sigma_k(X)$$

where $\sigma_k(X)$ is the $k$-th singular value of $X$. Thus, problem (1) can be relaxed as:

$$\min_{X} \|X\|_* \quad \text{s.t.} \quad \mathcal{A}_\Omega(X) = \mathcal{A}_\Omega(M) \tag{2}$$

Suppose $M$ is noiseless or contains only Gaussian noise, and $\Omega$ is uniformly distributed and sufficiently large. Then the minimizer of (2) is unique and coincides with that of (1), so the target matrix $M$ can be recovered accurately. However, if the observed data are affected by noise, with some values far outside the normal range, problem (2) becomes:

$$\min_{X} \gamma_n \|X\|_* + l\big(\mathcal{A}_\Omega(X), \mathcal{A}_\Omega(M)\big) \tag{3}$$

Formula (3) models noise in matrix completion, i.e., it allows the observed elements to contain noise, which reduces the influence of that noise to some extent. Instead of imposing the equality constraint of (2), it fits the observed entries by least squares. Here $\gamma_n$ is a coefficient, and the loss $l$ is determined by the noise type; it is set to the squared Frobenius norm:

$$l\big(\mathcal{A}_\Omega(X), \mathcal{A}_\Omega(M)\big) = \|A_\Omega \circ (X - M)\|_F^2$$

Here $A_\Omega$ is the observation matrix, taken as the 0/1 indicator of the observed entries, and $\circ$ is the Hadamard product.
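The objective of formula (3) with the Frobenius loss can be evaluated directly. The following NumPy sketch computes the nuclear norm as the sum of singular values, and the masked loss using a 0/1 array `mask` in the role of $A_\Omega$ (1 = observed). The function names are illustrative and do not come from the source.

```python
import numpy as np

def nuclear_norm(X: np.ndarray) -> float:
    """||X||_*: sum of singular values, the convex surrogate for rank(X)."""
    return float(np.linalg.svd(X, compute_uv=False).sum())

def masked_frobenius_loss(X: np.ndarray, M: np.ndarray, mask: np.ndarray) -> float:
    """||A_Omega o (X - M)||_F^2: only observed entries (mask == 1) contribute."""
    residual = mask * (X - M)        # Hadamard product zeroes unobserved entries
    return float(np.sum(residual ** 2))

def noisy_mc_objective(X, M, mask, gamma_n: float) -> float:
    """Objective of formula (3): gamma_n * ||X||_* + l(A_Omega(X), A_Omega(M))."""
    return gamma_n * nuclear_norm(X) + masked_frobenius_loss(X, M, mask)

M = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[1.0, 0.0], [1.0, 1.0]])   # entry (0, 1) treated as missing
X = np.array([[1.0, 99.0], [3.0, 5.0]])     # differs from M at (0, 1) and (1, 1)
print(masked_frobenius_loss(X, M, mask))    # 1.0: the 99 at the missing entry is ignored
```

Only the observed mismatch at (1, 1) is penalized. This is why the noisy formulation leaves missing entries free to be shaped by the nuclear norm.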

(3) Non-convex problems are very difficult to solve. Iterative algorithms easily become trapped in poor local optima or saddle points, or they take a long time to reach a good local optimum. To address this, Salakhutdinov and Srebro proposed the weighted nuclear norm

$$\|X\|_{*(p,q)} = \big\|\operatorname{diag}(\sqrt{p})\, X \, \operatorname{diag}(\sqrt{q})\big\|_*$$

where $p$ and $q$ are the distributions of the observed data over the $m$ rows and the $n$ columns, respectively. Compared with the unweighted nuclear norm, the weighted nuclear norm gives a clear improvement and converges quickly to a local optimum;
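Below is a sketch of the weighted nuclear norm. It assumes the Salakhutdinov and Srebro form written above, with $p$ and $q$ estimated as the fractions of observed entries that fall in each row and each column. This estimation of $p$ and $q$ is an assumption, since the source only says they follow the row and column distributions.

```python
import numpy as np

def weighted_nuclear_norm(X: np.ndarray, mask: np.ndarray) -> float:
    """||diag(sqrt(p)) X diag(sqrt(q))||_*, with p and q the empirical row and
    column distributions of the observed entries (mask == 1)."""
    total = mask.sum()
    p = mask.sum(axis=1) / total            # share of observations in each row
    q = mask.sum(axis=0) / total            # share of observations in each column
    X_weighted = np.sqrt(p)[:, None] * X * np.sqrt(q)[None, :]
    return float(np.linalg.svd(X_weighted, compute_uv=False).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
full = np.ones_like(X)
# With every entry observed, p = 1/m and q = 1/n, so the weighted norm
# reduces to ||X||_* / sqrt(m * n).
print(weighted_nuclear_norm(X, full))
```

Under a non-uniform `mask`, rows and columns with many observations receive larger weights. This is how the weighting adapts the penalty to the sampling pattern.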

Further, the structure-constrained matrix completion of step 2 proceeds as follows:

(1) Consider a sensor network of $n$ sensors. Each sensor is a vertex of the network, and a relationship between sensors, such as an energy or topological relationship, is an edge. The invention selects the simplest of these, the spatial relationship, as the edge connection. The rows or columns of the matrix $M$ are assumed to be indexed by the vertices of the graph. In a sensor network, the sensors (i.e., the columns of $M$) are the vertices, and the relationship between sensors is their spatial relationship.

(2) This is abstracted as the following mathematical model. Consider an undirected weighted network $G = (V, E, W)$ with vertex set $V = \{1, \dots, n\}$, whose edge set $E \subseteq V \times V$ is represented by a non-negative weight matrix $W$. Let $X \in \mathbb{R}^{m \times n}$ be a matrix with $m$-dimensional column vectors, denoted $X = (x_1, \dots, x_n)$, and $n$-dimensional row vectors, denoted $X = \begin{pmatrix} (x^1)^T \\ \vdots \\ (x^m)^T \end{pmatrix}$.

(3) Take the column values $x_1, \dots, x_n$ as the values on the vertices $V$. The smoothness assumption requires $x_j \approx x_{j'}$ whenever $(j, j') \in E$, i.e., the following quantity should be small:

$$\frac{1}{2}\sum_{j,j'=1}^{n} w_{jj'} \|x_j - x_{j'}\|_2^2 = \operatorname{tr}\big(X L X^T\big)$$

where $D = \operatorname{diag}\big(\sum_{j'} w_{jj'}\big)$ is the degree matrix and $L = D - W$ is the Laplacian matrix of graph $G$.
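The smoothness identity can be verified numerically. The following NumPy sketch builds $L = D - W$ for a small hypothetical path graph of three sensors. It then checks that $\operatorname{tr}(X L X^T)$ equals half the weighted sum of squared differences between the columns of $X$ on connected vertices. The graph and the data are made up for illustration.

```python
import numpy as np

def graph_laplacian(W: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - W, with D the diagonal degree matrix."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def smoothness(X: np.ndarray, W: np.ndarray) -> float:
    """tr(X L X^T) for an m x n matrix X whose columns live on the n vertices."""
    return float(np.trace(X @ graph_laplacian(W) @ X.T))

W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])          # path graph: sensor 0 - 1 - 2
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))              # 4 time steps, 3 sensors
pairwise = 0.5 * sum(W[j, k] * np.sum((X[:, j] - X[:, k]) ** 2)
                     for j in range(3) for k in range(3))
print(np.isclose(smoothness(X, W), pairwise))   # True
```

The penalty is zero when all connected sensors report identical time series. It grows as neighboring columns diverge, which is the behavior the regularization term is meant to encode.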

(4) This smoothness term is added to the matrix completion problem as a regularization term, with $l$ the squared Frobenius norm:

$$\min_{X} \gamma_n \|X\|_* + \|A_\Omega \circ (X - M)\|_F^2 + \gamma_r \operatorname{tr}\big(X L X^T\big) \tag{5}$$

(5) The above formula contains a non-differentiable term (the nuclear norm), so it is solved with the Alternating Direction Method of Multipliers (ADMM). ADMM equivalently decomposes the objective function of the original problem into several subproblems whose local solutions are easy to find, and from these it obtains the global solution of the original problem. It is a simple and effective method [10] for separable convex programming problems.

The separable form of formula (5) is:

$$\min_{X, Y} F(X) + G(Y) \quad \text{s.t.} \quad X = Y$$

where $F(X) = \gamma_n \|X\|_*$. The quantities $\gamma_n$, $\gamma_r$ and $\gamma_c$ are all coefficients, and the augmented Lagrangian function is:

$$\mathcal{L}_\rho(X, Y, Z) = F(X) + G(Y) + \langle Z, X - Y \rangle + \frac{\rho}{2}\|X - Y\|_F^2$$

(6) Here $F(X) = \gamma_n \|X\|_*$ and $G(Y) = l\big(\mathcal{A}_\Omega(Y), \mathcal{A}_\Omega(M)\big) + \gamma_r \operatorname{tr}\big(Y L Y^T\big)$. After this conversion, ADMM solves the problem iteratively by alternately updating the variables in three main steps:

$$X^{k+1} = \arg\min_{X} F(X) + \frac{\rho}{2}\big\|X - Y^k + \rho^{-1} Z^k\big\|_F^2 \tag{8}$$

$$Y^{k+1} = \arg\min_{Y} G(Y) + \frac{\rho}{2}\big\|X^{k+1} - Y + \rho^{-1} Z^k\big\|_F^2 \tag{9}$$

$$Z^{k+1} = Z^k + \rho\big(X^{k+1} - Y^{k+1}\big) \tag{10}$$

Equation (8) has the closed-form solution

$$X^{k+1} = U\big(\Lambda - \rho^{-1}\gamma_n I\big)_+ V^T$$

where $H = Y^k - \rho^{-1} Z^k$ and $H = U \Lambda V^T$ is its singular value decomposition. The operator $(\cdot)_+$ replaces negative entries by zero, so this step is singular value thresholding. Equation (9) finds $Y^{k+1}$ by minimizing

$$G(Y) + \frac{\rho}{2}\|Y - H\|_F^2, \qquad H = X^{k+1} + \rho^{-1} Z^k,$$

which, with the Frobenius loss of formula (5), is a quadratic problem in $Y$. By alternating these updates, the iteration converges to the optimal solution of formula (5) after a sufficient number of iterations.
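The three ADMM updates can be sketched in NumPy as follows. The sketch assumes the Frobenius loss of formula (5) and a symmetric weight matrix `W`. The $X$-update is the singular value thresholding step of equation (8). For the $Y$-update, setting the gradient of $G(Y) + \frac{\rho}{2}\|Y - H\|_F^2$ to zero gives one linear system per row $i$ (one time step): $(2\operatorname{diag}(a^i) + 2\gamma_r L + \rho I)\, y^i = 2\, a^i \circ m^i + \rho\, h^i$. Here $a^i$, $m^i$ and $h^i$ are the $i$-th rows of the mask, of $M$ and of $H$. This row-wise solve is our own derivation; the source does not spell it out. The parameter values and the fixed iteration count are illustrative, and no convergence test is implemented.

```python
import numpy as np

def svt(H: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: argmin_X tau*||X||_* + 0.5*||X - H||_F^2."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def graph_mc_admm(M, mask, W, gamma_n=0.1, gamma_r=0.1, rho=1.0, iters=300):
    """ADMM for min_X gamma_n*||X||_* + ||mask o (X - M)||_F^2 + gamma_r*tr(X L X^T)."""
    m, n = M.shape
    L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian L = D - W
    M0 = np.where(mask > 0, M, 0.0)                # unobserved entries carry no data
    X, Y, Z = M0.copy(), M0.copy(), np.zeros_like(M0)
    for _ in range(iters):
        # (8) X-update: SVT of H = Y^k - Z^k / rho with threshold gamma_n / rho
        X = svt(Y - Z / rho, gamma_n / rho)
        # (9) Y-update: quadratic in Y, solved one row (time step) at a time
        H = X + Z / rho
        for i in range(m):
            A = 2.0 * np.diag(mask[i]) + 2.0 * gamma_r * L + rho * np.eye(n)
            b = 2.0 * mask[i] * M0[i] + rho * H[i]
            Y[i] = np.linalg.solve(A, b)
        # (10) dual update
        Z = Z + rho * (X - Y)
    return X

# Hypothetical example: a smooth rank-1 field on 6 sensors along a line.
t = np.linspace(0.0, 3.0, 20)
M_true = np.outer(np.sin(t) + 2.0, np.linspace(1.0, 2.0, 6))
ii, jj = np.indices(M_true.shape)
mask = ((ii + 2 * jj) % 5 != 0).astype(float)          # 20% of entries hidden
W = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)   # path graph 0-1-...-5
X_hat = graph_mc_admm(M_true, mask, W)
missing = mask == 0
print("RMSE on hidden entries:", np.sqrt(np.mean((X_hat[missing] - M_true[missing]) ** 2)))
```

Each row system is positive definite as long as $\rho > 0$, because $L$ is positive semidefinite. A practical version would stop once $\|X^{k} - Y^{k}\|_F$ falls below a tolerance rather than running a fixed number of iterations.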

Solving in this way yields the optimal solution for the missing matrix values and completes the sensor network data. In other words, the signal on each of the $n$ vertices is recovered as an $m$-dimensional column (an $m$-step time series).

(1) Introduction of data

To measure the effectiveness of the method, experiments are carried out on the real sensor data set (http://db.lcs.mit.edu/labdata.html) collected in the Intel Berkeley lab project. FIG. 1 shows the experimental scenario of the data set. In this project, 54 Mica2Dot sensors were deployed in a 40 m × 30 m room, collecting sensing data every 30 s. The data samples collected by the Berkeley lab have eight attributes, including temperature, humidity, light, voltage and date. The invention selects the temperature data for the experiments; the experimental data are described in Table 1:

Table 1: Description of the Intel Berkeley lab data

The original sensing data set contains missing values that cannot be restored. For the experiments, a segment of data from nodes with few missing values is therefore selected from the original data set, i.e., a relatively complete part of the data serves as the test data set. Its remaining missing values are replaced by the average of the sensing data at the adjacent time instants, which forms a complete test data set.
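The gap-filling used to build the complete test set can be sketched as follows. The sketch assumes that "adjacent time instants" means the nearest non-missing readings before and after each gap, and that at the ends of the series only the available side is used. Both are our reading of the text rather than a stated rule, and `fill_with_neighbor_mean` is a hypothetical helper.

```python
import numpy as np

def fill_with_neighbor_mean(series):
    """Replace each NaN by the mean of the nearest non-missing readings
    before and after it (only one side is used at the series ends)."""
    x = np.asarray(series, dtype=float).copy()
    valid = np.flatnonzero(~np.isnan(x))          # indices of original readings
    for t in np.flatnonzero(np.isnan(x)):
        before = valid[valid < t]
        after = valid[valid > t]
        neighbors = []
        if before.size:
            neighbors.append(x[before[-1]])
        if after.size:
            neighbors.append(x[after[0]])
        if neighbors:                             # leave all-NaN series untouched
            x[t] = np.mean(neighbors)
    return x

print(fill_with_neighbor_mean([20.0, np.nan, np.nan, 23.0, np.nan]))
```

Only original readings are used as neighbors, so a run of consecutive gaps is filled with a single constant value rather than with propagated estimates.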

(2) Structure matrix construction

The structure matrix reflects how closely the nodes are connected; in the invention, closeness is defined by the natural positional relationship. The sensor data used contain position information for each sensor, so the positional relationship can be represented by the Euclidean distance in space. Note that the distance matrix is fully connected, which adds noise to the graph Laplacian and thus degrades the optimization result. The distance matrix must therefore be sparsified.

Two sparsification methods are used in the invention:

- The threshold method treats two nodes whose distance exceeds a given threshold (0.35) as unconnected, so only closely related node pairs remain connected.
- The nearest-neighbor method uses the distance matrix to connect each node to the K nodes closest to it, so every node in the resulting structure matrix has degree K.

In addition, to simplify optimization, the structure matrix is unweighted: an entry is 1 if the two nodes are connected and 0 otherwise.
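Both sparsification rules are sketched below in NumPy. Two assumptions are made. First, distances are normalized to [0, 1] by the largest pairwise distance before applying the 0.35 threshold, since the source does not state the units. Second, the K-nearest-neighbor graph is built row by row, so each node has exactly K outgoing links, matching "degree K". Because an undirected Laplacian needs a symmetric W, the example symmetrizes it. That step can raise some degrees above K.

```python
import numpy as np

def distance_matrix(coords: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between sensor positions (one row per sensor)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def threshold_graph(coords: np.ndarray, threshold: float = 0.35) -> np.ndarray:
    """Unweighted adjacency: connect nodes whose normalized distance <= threshold."""
    D = distance_matrix(coords)
    D = D / D.max()                      # assumption: distances scaled to [0, 1]
    W = (D <= threshold).astype(float)
    np.fill_diagonal(W, 0.0)             # no self-loops
    return W

def knn_graph(coords: np.ndarray, k: int) -> np.ndarray:
    """Unweighted adjacency: row i has ones at the k nodes closest to node i."""
    D = distance_matrix(coords)
    np.fill_diagonal(D, np.inf)          # a node is not its own neighbor
    nearest = np.argsort(D, axis=1)[:, :k]
    W = np.zeros_like(D)
    W[np.arange(len(coords))[:, None], nearest] = 1.0
    return W

coords = np.array([[0.0], [1.0], [3.0], [7.0]])   # hypothetical 1-D positions
W_thr = threshold_graph(coords)
W_knn = knn_graph(coords, k=1)
W_sym = np.maximum(W_knn, W_knn.T)   # symmetrize before forming L = D - W
print(W_thr)
print(W_knn.sum(axis=1))             # each node has exactly k outgoing links
```

Real sensor coordinates would be passed as an n x 2 array. The same functions work unchanged because the distance computation sums over the last axis.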

(3) Results of the implementation

The invention also compares the proposed algorithm experimentally with two baselines: the typical LM (Linear Interpolation Model) algorithm based on spatio-temporal correlation, and the conventional NNI (Nearest Neighbor Interpolation) algorithm. In the experiment, known temperature values in the test data set are randomly marked as missing, and each of the three algorithms estimates them. The estimates are then used to evaluate the effectiveness of each algorithm.

The problem addressed is how to estimate missing data accurately, so the accuracy of missing value estimation is used to measure algorithm performance. The evaluation metric is the Root Mean Square Error (RMSE) between the estimated and original values; the smaller the RMSE, the better the recovery of the missing data.

$$\mathrm{RMSE} = \sqrt{\operatorname{mean}\big[(y_{it} - \hat{y}_{it})^2\big]}$$

Here $y_{it}$ is a true (non-missing) data value, and $\hat{y}_{it}$ is the value the algorithm estimates after $y_{it}$ has been marked as missing. The operator mean averages the squared residuals over all entries marked as missing.
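The metric can be computed as in the following sketch. It evaluates the residuals only on entries that were marked missing, using a boolean array `missing`. The function name and the small example values are illustrative.

```python
import numpy as np

def rmse_on_missing(estimate, truth, missing):
    """RMSE between estimates and true values over the entries marked missing."""
    estimate, truth = np.asarray(estimate, float), np.asarray(truth, float)
    residual = estimate[missing] - truth[missing]
    return float(np.sqrt(np.mean(residual ** 2)))

truth = np.array([[20.0, 21.0], [22.0, 23.0]])
estimate = np.array([[20.0, 21.5], [21.0, 23.0]])
missing = np.array([[False, True], [True, False]])   # entries hidden from the algorithm
print(rmse_on_missing(estimate, truth, missing))
```

Restricting the average to the hidden entries keeps an algorithm from being credited for reproducing values it was given.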

On the one hand, the missing interval time of the sensing data is a main factor affecting algorithm performance. To test its influence, the algorithms were evaluated with missing time intervals of 1 to 30 min; FIG. 2 shows the experimental results. Under all missing interval times, the error of the algorithm of the invention is smaller than that of the NNI and LM algorithms, so the missing time interval does not greatly affect its performance. Because the algorithm takes the spatial structure of WSNs into account, its accuracy depends more on the spatial correlation of the data. In addition, once the interval exceeds 15 min, the gain in estimation accuracy for missing temperature values drops markedly, which indicates that the algorithm's results stabilize once the sample size reaches 30 min.

On the other hand, the number of consecutive missing values is another factor affecting algorithm performance. The experiments compare each algorithm with 1 to 30 consecutive missing sensing values; FIG. 3 shows the results. As FIG. 3 shows, the error of every algorithm increases with the number of consecutive missing values, because all three algorithms rely on non-missing data at the time instants adjacent to the missing value. As the time gap between a missing value and its nearest non-missing readings grows, their temporal correlation decreases, which increases the estimation error of the NNI and LM algorithms. The accuracy of the algorithm of the invention likewise drops as the correlation weakens. However, because it incorporates the spatial structure constraint of WSNs, it maintains better accuracy and stability regardless of the number of consecutive missing values.

The data loss problem is an inherent problem in sensor networks. In order to reduce the influence of the missing value on the wireless sensor network, the invention provides a missing value estimation method based on a space structure, and structural feature constraints of the sensor network are increased on the basis of traditional matrix low-rank decomposition, so that the missing value of sensing data is estimated. Simulation experiments for estimating the temperature value on a data set acquired by a real sensor prove that the method has higher accuracy and stability.

The present invention and its embodiments have been described above schematically and without limitation. What is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited thereto. Accordingly, a person skilled in the art may be inspired by this teaching and, without departing from the spirit of the invention and without inventive effort, design structures and embodiments similar to this technical solution. Such structures and embodiments shall fall within the scope of protection of the invention.

Claims

1. A wireless sensor network missing value estimation method based on a space structure is characterized in that: the method comprises the following steps:

step 1, restoring an original matrix according to partial elements of a known matrix by using the sparsity of singular values of the matrix;

step 2, converting the structural information in the sensor network data into a mathematical graph structure, and performing structure-constrained matrix completion on the original matrix recovered in step 1:

in the sensor network, the sensors are the vertices, and the relationship among the sensors is their spatial relationship. This is abstracted as the following mathematical model. Consider an undirected weighted network $G = (V, E, W)$ with vertex set $V = \{1, \dots, n\}$, whose edge set $E \subseteq V \times V$ is represented by a non-negative weight matrix $W$. Let $X \in \mathbb{R}^{m \times n}$ be a matrix with $m$-dimensional column vectors, denoted $X = (x_1, \dots, x_n)$, and $n$-dimensional row vectors, denoted $X = \begin{pmatrix} (x^1)^T \\ \vdots \\ (x^m)^T \end{pmatrix}$. The column values $x_1, \dots, x_j, \dots, x_n$ are defined as the values on the vertices $V$, and if $(j, j') \in E$, then $x_j \approx x_{j'}$, giving:

$$\frac{1}{2}\sum_{j,j'=1}^{n} w_{jj'} \|x_j - x_{j'}\|_2^2 = \operatorname{tr}\big(X L X^T\big)$$

where $D = \operatorname{diag}\big(\sum_{j'} w_{jj'}\big)$ and $L = D - W$ is the Laplacian matrix of graph $G$;

the resulting smoothness term is added as a regularization term to the matrix solving problem:

$$\min_{X} \gamma_n \|X\|_* + l\big(\mathcal{A}_\Omega(X), \mathcal{A}_\Omega(M)\big) + \gamma_r \operatorname{tr}\big(X L X^T\big)$$

In this problem:

- $\gamma_n$ and $\gamma_r$ are coefficients, and the loss $l$ is determined by the noise type;
- $\mathcal{A}_\Omega(\cdot)$ keeps the entries indexed by $\Omega$, so that $\mathcal{A}_\Omega(M)$ is the observation matrix;
- $M_{ij}$, $(i, j) \in \Omega \subseteq \{1, \dots, m\} \times \{1, \dots, n\}$, are the partial data in a given sparse data set $\Omega$, with $M_{ij} \in M$;

solving this problem to obtain the optimal solution for the missing matrix values, thereby completing the sensor network data.

2. The method of estimating missing values in a wireless sensor network according to claim 1, wherein step 1 comprises:

step 1.1, given partial data $M_{ij}$, $(i,j) \in \Omega \subseteq \{1,\dots,m\} \times \{1,\dots,n\}$, of a sparse data set $\Omega$, finding the values of the $m \times n$ matrix $M$, with $M_{ij} \in M$. The sensor data are abstracted as a low-rank matrix, and the unknown elements are completed by solving a minimization problem. The standard matrix completion problem is described as:

$$\min_{X} \operatorname{rank}(X) \quad \text{s.t.} \quad \mathcal{A}_\Omega(X) = \mathcal{A}_\Omega(M)$$

where $\mathcal{A}_\Omega(M) = (M_{ij})_{(i,j)\in\Omega}$ represents the observation matrix;

step 1.2, replacing the matrix rank with the nuclear norm:

$$\|X\|_* = \sum_{k} \sigma_k(X)$$

where $\sigma_k(X)$ is the $k$-th singular value of $X$;

the formula in step 1.1 is relaxed accordingly to:

$$\min_{X} \|X\|_* \quad \text{s.t.} \quad \mathcal{A}_\Omega(X) = \mathcal{A}_\Omega(M)$$

when the observed data are affected by noise, the above formula becomes:

$$\min_{X} \gamma_n \|X\|_* + l\big(\mathcal{A}_\Omega(X), \mathcal{A}_\Omega(M)\big)$$

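3. The wireless sensor network missing value estimation method according to claim 2, characterized in that the loss in the formula of step 1.2 is fitted by least squares, namely $l\big(\mathcal{A}_\Omega(X), \mathcal{A}_\Omega(M)\big) = \|A_\Omega \circ (X - M)\|_F^2$. Here $A_\Omega$ is the 0/1 indicator matrix of the observed entries and $\circ$ is the Hadamard product.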

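4. The method for estimating missing values in a wireless sensor network according to claim 1, wherein the non-differentiable term in the matrix solving problem of step 2 is handled by the alternating direction method of multipliers. This uses the separable form:

$$\min_{X, Y} F(X) + G(Y) \quad \text{s.t.} \quad X = Y$$

where $F(X) = \gamma_n \|X\|_*$. The quantities $\gamma_n$, $\gamma_r$ and $\gamma_c$ are all coefficients, and the augmented Lagrangian function is:

$$\mathcal{L}_\rho(X, Y, Z) = F(X) + G(Y) + \langle Z, X - Y \rangle + \frac{\rho}{2}\|X - Y\|_F^2$$

where $G(Y) = l\big(\mathcal{A}_\Omega(Y), \mathcal{A}_\Omega(M)\big) + \gamma_r \operatorname{tr}\big(Y L Y^T\big)$. The alternating direction method of multipliers solves the problem iteratively by alternately updating the variables, as follows:

$$X^{k+1} = \arg\min_{X} F(X) + \frac{\rho}{2}\big\|X - Y^k + \rho^{-1} Z^k\big\|_F^2$$

$$Y^{k+1} = \arg\min_{Y} G(Y) + \frac{\rho}{2}\big\|X^{k+1} - Y + \rho^{-1} Z^k\big\|_F^2$$

$$Z^{k+1} = Z^k + \rho\big(X^{k+1} - Y^{k+1}\big)$$

the $X$-update has the closed-form solution

$$X^{k+1} = U\big(\Lambda - \rho^{-1}\gamma_n I\big)_+ V^T$$

where $H = Y^k - \rho^{-1} Z^k$ and $H = U \Lambda V^T$ is its singular value decomposition. The operator $(\cdot)_+$ replaces negative entries by zero. According to this iterative solution, the matrix solving problem is converted into:

$$Y^{k+1} = \arg\min_{Y} G(Y) + \frac{\rho}{2}\|Y - H\|_F^2$$

where $H = X^{k+1} + \rho^{-1} Z^k$.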