Background
Wireless sensor networks (WSNs) are a new information acquisition and processing technology widely used in military applications, environmental monitoring, disaster relief, industrial control, smart homes and similar fields, and are a key research topic in the information field. Wireless sensor nodes are usually directly exposed to the external environment. Weather conditions, the stability of the sensor devices, human factors and the like can cause frequent disconnection of communication links, so that the acquired sensing data may be lost or corrupted in transmission.
In general, conventional methods for handling missing data fall into three types: first, deleting the missing data directly; second, leaving the data unprocessed and applying the existing algorithm directly; and third, filling in the missing data. The first method is simple and easy to use. However, in the big data era data are increasingly sparse and the amount of missing data keeps growing, so discarding the incomplete data items distorts the overall characteristics of the data, can seriously degrade data mining results, and can lead operators to wrong judgments and avoidable losses. The second method faces the fact that traditional data mining algorithms are designed for complete data, so the classical algorithms must be modified to suit missing data; the modification effort is heavy and for some methods it is not feasible. In addition, although some analysis algorithms for incomplete data have appeared, they generally suffer from high algorithmic complexity and poor processing results. Filling in missing data is therefore the most desirable way of processing incomplete data. Incomplete data filling means using other known auxiliary information together with a specific method or model to obtain one or more predicted values closest to the missing data, and then filling the missing data with those predicted values to obtain a complete data set that is as close as possible to the original data set.
In recent years, researchers have proposed a series of models and algorithms for the data recovery problem in WSNs and have achieved notable success. For example, Nan proposed a sensing data recovery method based on association rule mining; Li et al. proposed a sensing data recovery method based on physical and statistical models; in 2010, Panli et al. proposed a sensing data estimation algorithm based on spatio-temporal correlation. It is worth mentioning that WSN data at adjacent time points are strongly correlated; for example, consecutive readings of a temperature sensor or an illumination intensity sensor are correlated, and this smooth evolution along the time axis appears in the mathematical model as a typical low-rank structure. Such a low-rank structure can generally be exploited in WSN data by matrix decomposition; for example, the method of crash et al. compresses data collected by WSNs by non-negative matrix decomposition and obtains good results. However, in addition to temporal correlation, WSNs also exhibit spatial correlation between sensor nodes: for example, the temperature readings of nearby sensors vary more similarly than those of sensors spaced farther apart, so when a temperature sensor has missing data, the values of nearby sensors are the more informative reference.
The matrix completion method is an effective method for estimating missing values, but existing approaches do not consider the continuity between data, so the estimation error is large. Therefore, to ensure the integrity of data in the wireless sensor network, how to estimate missing values under spatial structural constraints, given the data loss that occurs when sensing data are transmitted, is a major problem to be solved.
Disclosure of Invention
1. Technical problem to be solved by the invention
The invention provides a wireless sensor network missing value estimation method based on a space structure aiming at the problem of data loss in the transmission of sensing data.
2. Technical scheme
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
a wireless sensor network missing value estimation method based on a space structure comprises the following steps:
step 1, restoring an original matrix according to partial elements of a known matrix by using the sparsity of matrix singular values;
step 2, converting the structural information in the sensor network data into a mathematical graph structure, performing structure-constrained matrix completion on the original matrix of step 1, and adding a regularization term that constrains the solution space of the matrix completion, to obtain the optimal solution for the missing values of the matrix.
Further, the step of step 1 is as follows:
step 1.1, given the partial data $M_{ij}$, $(i,j) \in \Omega \subseteq \{1,\dots,m\} \times \{1,\dots,n\}$, observed on a sparse index set $\Omega$, finding the values of the $m \times n$ matrix $M$, where $M_{ij} \in M$; the sensor data are abstracted into a low-rank matrix, low-rank matrix completion fills in the unknown elements by solving a minimization problem, and the standard matrix completion problem is described as follows:
$$\min_X \ \operatorname{rank}(X) \quad \text{s.t.} \quad A_\Omega(X) = A_\Omega(M)$$
wherein $A_\Omega$ is the observation operator that keeps the entries indexed by $\Omega$, so that $A_\Omega(M) = (M_{ij})_{(i,j)\in\Omega}$ represents the observed part of the matrix $M$;
step 1.2, replacing the matrix rank with the matrix nuclear norm, the formula is as follows:
$$\|X\|_* = \sum_k \sigma_k(X)$$
wherein $\sigma_k$ is the k-th singular value, arranged from small to large;
according to the above formula, the formula in step 1.1 is relaxed to:
$$\min_X \ \|X\|_* \quad \text{s.t.} \quad A_\Omega(X) = A_\Omega(M)$$
when the observation data are affected by noise, the above formula becomes:
$$\min_X \ \gamma_n \|X\|_* + l\big(A_\Omega(X), A_\Omega(M)\big)$$
wherein $\gamma_n$ is a coefficient, and the term $l$ is determined according to the noise class.
Further, the formula in step 1.2 is fitted by the least squares method.
Further, the step 2 comprises the following steps:
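step 2.2, abstracting the sensor network into the following mathematical model: given an undirected weighted network $G = (V, E, W)$, with vertices $V = \{1,\dots,n\}$ and edges $E \subseteq V \times V$ represented by a non-negative weight matrix $W \in \mathbb{R}^{n \times n}$. $X \in \mathbb{R}^{m \times n}$ is a matrix whose column vectors are m-dimensional, written $X = (x_1,\dots,x_n)$, and whose row vectors are n-dimensional, written $X = ((x^1)^T,\dots,(x^m)^T)^T$;
Step 2.3, regarding the columns $x_1,\dots,x_n$ as values on the vertices $V$, the smoothness assumption requires $x_j \approx x_{j'}$ whenever $(j,j') \in E$, namely:
$$\frac{1}{2}\sum_{j,j'} w_{jj'}\,\|x_j - x_{j'}\|_2^2 = \operatorname{tr}(X L X^T)$$
wherein $L = D - W$ is the Laplacian matrix of graph $G$, and $D = \operatorname{diag}\big(\sum_{j'} w_{jj'}\big)$ is the degree matrix;
step 2.4, according to the above formula, when the observation data in step 1.2 are affected by noise, the standard matrix completion problem is converted into:
$$\min_X \ \gamma_n \|X\|_* + l\big(A_\Omega(X), A_\Omega(M)\big) + \gamma_r \operatorname{tr}(X L X^T)$$
further, the separable form of the formula in step 2.4 is:
$$\min_{X,Y} \ F(X) + G(Y) \quad \text{s.t.} \quad X = Y$$
wherein $F(X) = \gamma_n \|X\|_*$, $G(Y) = l\big(A_\Omega(Y), A_\Omega(M)\big) + \gamma_r \operatorname{tr}(Y L Y^T)$, and $\gamma_n$, $\gamma_r$, $\gamma_c$ are all coefficients; its augmented Lagrangian function is:
$$\mathcal{L}_\rho(X,Y,Z) = F(X) + G(Y) + \langle Z, X - Y\rangle + \frac{\rho}{2}\|X - Y\|_F^2$$
further, the non-differentiable term in step 2.4 is handled by solving with the alternating direction method of multipliers (ADMM).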
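Further, the alternating direction method of multipliers solves iteratively by updating the variables alternately, where $\mathcal{L}_\rho$ denotes the augmented Lagrangian function of the separable form, as follows:
$$X^{k+1} = \arg\min_X \ \mathcal{L}_\rho(X, Y^k, Z^k)$$
$$Y^{k+1} = \arg\min_Y \ \mathcal{L}_\rho(X^{k+1}, Y, Z^k)$$
$$Z^{k+1} = Z^k + \rho\,(X^{k+1} - Y^{k+1})$$
The update of $X^{k+1}$ has the closed-form solution
$$X^{k+1} = U\, S_{\gamma_n/\rho}(\Lambda)\, V^T, \qquad S_\tau(\lambda) = \max(\lambda - \tau, 0)$$
Wherein $U$, $\Lambda$, $V$ are given by the singular value decomposition of $H$, expressed as $H = U \Lambda V^T$, and $H = Y^k - \rho^{-1} Z^k$; according to the iterative solution, the update of $Y^{k+1}$ for the formula in step 2.4 is converted into:
$$Y^{k+1} = \arg\min_Y \ l\big(A_\Omega(Y), A_\Omega(M)\big) + \gamma_r \operatorname{tr}(Y L Y^T) + \frac{\rho}{2}\|Y - H\|_F^2$$
wherein $H = X^{k+1} + \rho^{-1} Z^k$.
The above updates are iterated to obtain the optimal solution for the missing values of the matrix.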
3. Advantageous effects
Compared with the prior known technology, the invention has the beneficial effects that:
(1) according to the method for estimating the missing value of the wireless sensor network based on the space structure, structural feature constraints of the sensor network are added on the basis of traditional matrix low-rank decomposition, so that the missing value of sensing data is estimated. When data recovery is carried out by a matrix completion method, a network structure is taken as a basis, a time relation is added as a constraint, the correlation of the WSNs in time is considered, and a structural constraint in space is added, so that the accuracy of data recovery can be obviously improved;
(2) the method for estimating the wireless sensor network missing value based on the spatial structure considers that the interval over which sensing data are missing is a main factor affecting the performance of the algorithm. To test the influence of the data missing interval, the performance of the algorithm was tested with time intervals of 1 to 30 min. The error of the algorithm is small under all data missing interval times, so the missing interval does not greatly affect its performance; this is because the method considers the spatial structure characteristics of WSNs, and its accuracy depends more on the spatial correlation of the data. In addition, when the interval time exceeds 15 min, the estimation precision of the missing temperature values cannot be further improved, which shows that the result tends to be stable after the sample capacity reaches 30 min;
(3) the invention discloses a wireless sensor network missing value estimation method based on a space structure, which considers that the quantity of continuous missing values is another factor influencing the performance of an algorithm. The invention compares the performance of each algorithm when the number of continuous missing of the sensing data is 1 to 30. It can be seen that the error increases as the number of consecutive missing values increases, since the invention needs to take into account the non-missing data at the time instants adjacent to the missing values. When the time interval between a missing value and its neighboring non-missing perceptual data increases, the temporal correlation between the missing value and its neighboring non-missing perceptual data decreases, and thus the estimation error of the algorithm increases. Similarly, the accuracy of the algorithm of the invention is reduced due to the reduction of the correlation, but the invention has the spatial structure constraint of WSNs, so that the invention always has better accuracy and stability no matter how the continuous missing value changes.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. The described embodiments are a subset of the embodiments of the invention and are not all embodiments of the invention. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.
The invention discloses a wireless sensor network missing value estimation method based on a space structure, which comprises the following steps:
step 1, matrix completion. The aim of matrix completion is to reconstruct the original complete matrix from a matrix with missing elements. In many practical problems data are organized as a matrix, but because of sampling limitations, noise contamination and the like, these matrices often suffer from missing or abnormal data. To solve these problems, compressed sensing theory is extended to the matrix space, and the original matrix is restored from the known part of its elements by using the sparsity of the matrix singular values, that is, the low rank of the matrix;
step 2, matrix completion based on structural constraints. A low-rank structure means that the rows and columns of the matrix M are linearly correlated, but this correlation is generally unstructured. Therefore, when the matrix M has a certain structural relationship, the solution space of the matrix completion can be constrained by adding a regularization term, so that the solution is closer to the true value. The sensor network data addressed by the present invention naturally carry spatial structural constraints, and mathematically this structural information can be represented as a graph;
further, step 1 is performed as follows:
step 1.1, the standard matrix completion problem is generally described as: given the partial data $M_{ij}$, $(i,j) \in \Omega \subseteq \{1,\dots,m\} \times \{1,\dots,n\}$, observed on a sparse index set $\Omega$, find the values of the $m \times n$ matrix $M$, $M_{ij} \in M$. In WSN data, the sensor readings are recorded continuously, so the data at adjacent time points are strongly similar, which yields a low-rank matrix. Low-rank matrix completion fills in the unknown elements by solving a minimization problem, which is generally described as:
$$\min_X \ \operatorname{rank}(X) \quad \text{s.t.} \quad A_\Omega(X) = A_\Omega(M) \tag{1}$$
wherein $A_\Omega$ is the observation operator that keeps the entries indexed by $\Omega$, so that $A_\Omega(M) = (M_{ij})_{(i,j)\in\Omega}$ represents the observed part of the matrix $M$. Because the rank of a matrix is a non-convex function, problem (1) is NP-hard, and a convex surrogate is needed to approximate its solution.
(2) The present invention replaces the rank of the matrix with the nuclear norm of the matrix proposed by Candes et al., defined as $\|X\|_* = \sum_k \sigma_k(X)$, wherein $\sigma_k$ is the k-th singular value arranged from small to large. Thus, equation (1) can be relaxed as:
$$\min_X \ \|X\|_* \quad \text{s.t.} \quad A_\Omega(X) = A_\Omega(M) \tag{2}$$
If M is noiseless or contains only Gaussian noise, and $\Omega$ is uniformly distributed and sufficiently large, the minimizer of formula (2) is unique and coincides with the minimizer of formula (1), so the target matrix M can be accurately recovered. However, if the observation data are affected by noise, so that some data lie far outside the normal range, formula (2) becomes:
$$\min_X \ \gamma_n \|X\|_* + l\big(A_\Omega(X), A_\Omega(M)\big) \tag{3}$$
Equation (3) can be regarded as matrix completion that accounts for noise, i.e., the observed elements may contain noise; replacing the equality constraint of formula (2) with a least-squares fit reduces the influence of noise to some extent. Here $\gamma_n$ is a coefficient and $l$ is determined according to the noise category; it is set to the Frobenius norm $l\big(A_\Omega(X), A_\Omega(M)\big) = \|A_\Omega \circ (X - M)\|_F^2$ (in the present invention $A_\Omega$ is the observation matrix and $\circ$ is the Hadamard product).
(3) Since solving the non-convex problem is very difficult, an iterative algorithm is prone to falling into a relatively poor local optimum or a saddle point, or needs a lot of time to find a relatively good local optimum. To address this, Salakhutdinov and Srebro proposed the weighted nuclear norm $\|X\|_{*(p,q)} = \|\operatorname{diag}(\sqrt{p})\, X \,\operatorname{diag}(\sqrt{q})\|_*$, wherein p and q are the distributions of the observed data over the m rows and the n columns respectively. Compared with the unweighted nuclear norm, the weighted nuclear norm gives a clear improvement and converges quickly to the local optimum;
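The nuclear-norm relaxation with a Frobenius-norm loss can be illustrated with a short sketch. It minimizes $\gamma_n\|X\|_* + \|A_\Omega \circ (X - M)\|_F^2$ by proximal gradient descent with singular value soft-thresholding. The solver choice, the function names `svt`, `objective` and `complete_nuclear`, and the parameter values are assumptions made for illustration; the text does not prescribe how this intermediate problem is solved.

```python
import numpy as np

def svt(H, tau):
    """Soft-threshold the singular values of H by tau
    (the proximal operator of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(X, M, mask, gamma_n):
    """gamma_n * ||X||_* + ||mask o (X - M)||_F^2."""
    nuc = np.linalg.svd(X, compute_uv=False).sum()
    return gamma_n * nuc + np.sum((mask * (X - M)) ** 2)

def complete_nuclear(M, mask, gamma_n=0.1, n_iter=300):
    """Hypothetical helper: proximal gradient on the noisy completion problem.
    mask is 1/True where an entry of M was observed."""
    mask = np.asarray(mask, dtype=float)
    M = np.where(mask > 0, M, 0.0)       # ignore whatever is stored at missing entries
    X = np.zeros_like(M, dtype=float)
    step = 0.5                           # 1 / Lipschitz constant of the gradient below
    for _ in range(n_iter):
        grad = 2.0 * mask * (X - M)      # gradient of the Frobenius data term
        X = svt(X - step * grad, step * gamma_n)
    return X

# Toy usage: a rank-1 "time x sensor" matrix with about 30% of entries hidden.
rng = np.random.default_rng(0)
M_true = np.outer(rng.normal(size=8), rng.normal(size=6))
observed = rng.random(M_true.shape) < 0.7
X_hat = complete_nuclear(M_true, observed)
```

A larger `gamma_n` pushes the estimate towards lower rank at the cost of fitting the observed entries less closely.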
further, the process of matrix completion based on structural constraint in step 2 is as follows:
(1) Assume a sensor network composed of n sensors; each sensor is a vertex of the network, and a relationship between sensors, such as an energy relationship or a topological relationship, is an edge of the network. In the present invention, the simplest relationship, the spatial relationship, is selected as the edges of the network. Assume that the rows and columns of the matrix M depend on the vertices of the graph. In the sensor network, the sensors (i.e., the columns of the matrix M) are the vertices of the network, and the relationship between them is their spatial relationship.
(2) This is abstracted into the following mathematical model: given an undirected weighted network $G = (V, E, W)$, with vertices $V = \{1,\dots,n\}$ and edges $E \subseteq V \times V$ represented by a non-negative weight matrix $W \in \mathbb{R}^{n \times n}$. $X \in \mathbb{R}^{m \times n}$ is a matrix whose column vectors are m-dimensional, written $X = (x_1,\dots,x_n)$, and whose row vectors are n-dimensional, written $X = ((x^1)^T,\dots,(x^m)^T)^T$.
(3) Regarding the columns $x_1,\dots,x_n$ as values on the vertices $V$, the smoothness assumption requires $x_j \approx x_{j'}$ whenever $(j,j') \in E$, namely:
$$\frac{1}{2}\sum_{j,j'} w_{jj'}\,\|x_j - x_{j'}\|_2^2 = \operatorname{tr}(X L X^T) \tag{4}$$
wherein $L = D - W$ is the Laplacian matrix of graph $G$, and $D = \operatorname{diag}\big(\sum_{j'} w_{jj'}\big)$ is the degree matrix.
(4) These smoothing terms are added to the matrix completion problem as regularization terms (letting $l$ be the Frobenius norm):
$$\min_X \ \gamma_n \|X\|_* + l\big(A_\Omega(X), A_\Omega(M)\big) + \gamma_r \operatorname{tr}(X L X^T) \tag{5}$$
(5) The above formula contains a non-differentiable term, so it is solved by the alternating direction method of multipliers (ADMM). ADMM equivalently decomposes the objective function of the original problem into several subproblems whose solutions are easy to find, and thereby obtains a global solution of the original problem; it is a simple and effective method [10] for solving separable convex programming problems.
The separable form of formula (5) is:
$$\min_{X,Y} \ F(X) + G(Y) \quad \text{s.t.} \quad X = Y \tag{6}$$
wherein $F(X) = \gamma_n \|X\|_*$, $G(Y) = l\big(A_\Omega(Y), A_\Omega(M)\big) + \gamma_r \operatorname{tr}(Y L Y^T)$, and $\gamma_n$, $\gamma_r$, $\gamma_c$ are all coefficients; its augmented Lagrangian function is:
$$\mathcal{L}_\rho(X,Y,Z) = F(X) + G(Y) + \langle Z, X - Y\rangle + \frac{\rho}{2}\|X - Y\|_F^2 \tag{7}$$
(6) With the problem in this form, ADMM solves iteratively by updating the variables alternately; the main iteration consists of the following three steps:
$$X^{k+1} = \arg\min_X \ \mathcal{L}_\rho(X, Y^k, Z^k) \tag{8}$$
$$Y^{k+1} = \arg\min_Y \ \mathcal{L}_\rho(X^{k+1}, Y, Z^k) \tag{9}$$
$$Z^{k+1} = Z^k + \rho\,(X^{k+1} - Y^{k+1}) \tag{10}$$
Equation (8) has the closed-form solution $X^{k+1} = U\, S_{\gamma_n/\rho}(\Lambda)\, V^T$ with $S_\tau(\lambda) = \max(\lambda - \tau, 0)$, where $U$, $\Lambda$, $V$ are given by the singular value decomposition of $H$, denoted $H = U \Lambda V^T$, and $H = Y^k - \rho^{-1} Z^k$. Formula (9) finds the $Y^{k+1}$ that minimizes $\mathcal{L}_\rho(X^{k+1}, Y, Z^k)$, which can be rewritten as:
$$Y^{k+1} = \arg\min_Y \ G(Y) + \frac{\rho}{2}\|Y - H\|_F^2, \qquad H = X^{k+1} + \rho^{-1} Z^k \tag{11}$$
The updates are alternated according to the above steps, and the iterates converge to the optimal solution of formula (5) after a certain number of iterations.
The optimal solution for the missing values of the matrix is thus obtained and the sensor network data are completed; that is, each of the n vertices of the network carries an m-dimensional column signal (an m-dimensional time series).
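The alternating updates described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the reference implementation: it assumes the loss is $\|A_\Omega \circ (Y - M)\|_F^2$ and a single column graph with symmetric Laplacian $L$, solves the Y subproblem exactly row by row through its linear optimality condition, and uses hypothetical names (`svt`, `admm_graph_completion`) and default parameter values.

```python
import numpy as np

def svt(H, tau):
    """Singular value soft-thresholding: the closed-form X update."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def admm_graph_completion(M, mask, L, gamma_n=0.1, gamma_r=0.5, rho=1.0, n_iter=200):
    """Sketch of ADMM for
    gamma_n*||X||_* + ||mask o (Y - M)||_F^2 + gamma_r*tr(Y L Y^T)  s.t. X = Y.
    M: m x n data (rows = time, columns = sensors); mask: True where observed;
    L: n x n symmetric graph Laplacian of the sensor graph."""
    A = np.asarray(mask, dtype=float)
    M = np.where(A > 0, M, 0.0)              # values at missing entries are never used
    m, n = M.shape
    Y = np.zeros((m, n))
    Z = np.zeros((m, n))
    I = np.eye(n)
    for _ in range(n_iter):
        X = svt(Y - Z / rho, gamma_n / rho)  # X update with H = Y - Z/rho
        H = X + Z / rho                      # H for the Y update
        for i in range(m):
            # Setting the gradient of the row-i subproblem to zero gives
            # (2*diag(a_i) + 2*gamma_r*L + rho*I) y_i = 2*a_i*m_i + rho*h_i
            lhs = 2.0 * np.diag(A[i]) + 2.0 * gamma_r * L + rho * I
            rhs = 2.0 * A[i] * M[i] + rho * H[i]
            Y[i] = np.linalg.solve(lhs, rhs)
        Z = Z + rho * (X - Y)                # multiplier update
    return Y                                 # X and Y agree at convergence
```

The per-row solve keeps the example exact and short; for large networks an iterative solver such as conjugate gradient on the same linear system would be a natural substitute.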
(1) Introduction of data
To measure the effectiveness of the method, experiments are carried out on the real sensor data set (http://db.lcs.mit.edu/labdata.html) collected by the Intel Indoor project of the Intel Berkeley laboratory; fig. 1 shows the experimental scene of the data set. In the Intel Indoor project, 54 Mica2Dot sensors are deployed in a 40 m × 30 m room, and sensing data are collected every 30 s. The data samples collected by the Berkeley laboratory have eight attributes, including temperature, humidity, light, voltage and date. The present invention selects the temperature data for the experiments; the experimental data are described in table 1:
Table 1 Description of the Intel Berkeley laboratory data
Because the original sensing data set contains the missing values which cannot be restored, a section of data of the nodes with few missing values needs to be selected from the original data set in the experimental process, namely, the relatively complete part of the data is selected as the test data set, and the missing values are replaced by the average values of the sensing data at the adjacent moments, so that a complete test data set is formed.
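The preprocessing step described above, replacing a residual missing value with the average of the sensing data at adjacent moments, might look like the following sketch. It assumes missing readings are encoded as NaN and that "adjacent" means the nearest non-missing reading before and after the gap; the function name `fill_with_adjacent_mean` is hypothetical.

```python
import numpy as np

def fill_with_adjacent_mean(series):
    """Replace each NaN in a 1-D series by the mean of the nearest
    non-missing values before and after it (only one side at the ends)."""
    x = np.asarray(series, dtype=float).copy()
    known = np.flatnonzero(~np.isnan(x))
    if known.size == 0:
        raise ValueError("series has no non-missing values")
    for t in np.flatnonzero(np.isnan(x)):
        before = known[known < t]
        after = known[known > t]
        neighbours = []
        if before.size:
            neighbours.append(x[before[-1]])   # last known value before t
        if after.size:
            neighbours.append(x[after[0]])     # first known value after t
        x[t] = np.mean(neighbours)
    return x

temps = [20.0, np.nan, 22.0]
filled = fill_with_adjacent_mean(temps)
```

Applied column by column to the selected node data, this yields a complete test matrix whose true values are known at every position.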
(2) Structure matrix construction
The structural matrix reflects the closeness of the connection between the nodes, and the closeness is specified in the invention as the natural position relation. Since the sensor data used in the present invention contains respective position information, the positional relationship thereof can be represented by the euclidean distance in space. It should be noted that the distance matrix is a fully connected matrix, which will increase the noise of the laplacian matrix of the graph, and thus affect the optimization result. Therefore, this distance matrix must be thinned.
In the present invention, two sparsification methods are used: (1) the threshold method treats two nodes whose distance in the distance matrix is larger than a certain threshold as unconnected (the threshold is 0.35), so that only the connections between closely related nodes are retained; (2) the nearest neighbor method selects, for each node, the K most closely connected surrounding nodes according to the distance matrix and connects them; it is worth noting that in a structure matrix constructed in this way the degree of each node is K. In addition, to simplify the optimization process, the structure matrices constructed in the present invention are unweighted, that is, an entry is 1 where there is a connecting edge and 0 otherwise.
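The two sparsification methods and the resulting graph Laplacian can be sketched as follows. Several details are assumptions rather than statements of the text: the function names are hypothetical, the nearest neighbor graph is symmetrized so that it is undirected (node degrees are then at least K rather than exactly K), and distances are divided by their maximum before the threshold 0.35 is applied, since the text does not say how distances are scaled.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance matrix from an (n, d) array of sensor positions."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def threshold_graph(dist, threshold):
    """Unweighted adjacency: 1 where the distance is at most the threshold."""
    W = (dist <= threshold).astype(float)
    np.fill_diagonal(W, 0.0)                 # no self-loops
    return W

def knn_graph(dist, k):
    """Unweighted adjacency linking each node to its k nearest nodes,
    symmetrized (an assumption) so the graph is undirected."""
    n = dist.shape[0]
    d = dist.copy()
    np.fill_diagonal(d, np.inf)              # a node is not its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)

def laplacian(W):
    """Graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def smoothness(X, L):
    """tr(X L X^T): small when columns of X on connected nodes are similar."""
    return float(np.trace(X @ L @ X.T))

coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [10.0, 0.0]])
dist = pairwise_distances(coords)
W_thr = threshold_graph(dist / dist.max(), 0.35)   # normalized distances (assumption)
W_knn = knn_graph(dist, k=1)
L_knn = laplacian(W_knn)
```

Either adjacency matrix can be passed through `laplacian` to obtain the matrix L used in the regularization term.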
(3) Results of the implementation
The invention also compares the experimental results of the proposed algorithm with the typical LM (Linear orthogonal model) algorithm based on spatio-temporal correlation and with the traditional nearest neighbor interpolation (NNI) method. In the experiment, known temperature data in the test data set are randomly marked as missing values, and the missing values are then estimated with each of the three algorithms to evaluate their effectiveness.
Since the problem to be solved by the present invention is how to estimate missing data accurately, the accuracy of the missing value estimates is used as the standard for measuring algorithm performance, and the root mean square error (RMSE) between the estimated values and the original values is used as the evaluation metric. The smaller the RMSE, the better the recovery of the missing data.
$$\mathrm{RMSE} = \sqrt{\operatorname{mean}\big((y_{it} - \hat{y}_{it})^2\big)}$$
Wherein $y_{it}$ is a true non-missing data value, $\hat{y}_{it}$ is the estimate obtained by the algorithm after $y_{it}$ is assumed to be missing, and mean denotes averaging the squared residuals over all data marked as missing values.
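The evaluation protocol, marking known values as missing at random and scoring the estimates by RMSE on those positions only, can be sketched as below. The function names, the fixed seed, and the use of a boolean mask to record which entries were hidden are illustrative assumptions.

```python
import numpy as np

def hide_random_entries(shape, fraction, seed=0):
    """Boolean mask that is True for entries marked as missing for the test."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) < fraction

def rmse(y_true, y_est, hidden):
    """Root mean square error over the entries that were marked as missing."""
    diff = (np.asarray(y_true, dtype=float) - np.asarray(y_est, dtype=float))[hidden]
    return float(np.sqrt(np.mean(diff ** 2)))

y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_est = np.array([[1.0, 0.0], [3.0, 0.0]])
hidden = np.array([[False, True], [False, True]])
score = rmse(y_true, y_est, hidden)   # residuals 2 and 4 give sqrt(10)
```

Restricting the average to the hidden positions keeps the observed entries, which every method reproduces trivially, from diluting the score.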
On the one hand, the interval over which sensing data are missing is a main factor affecting algorithm performance. To test the influence of the missing time interval on performance, the algorithms were tested with time intervals of 1 to 30 min; fig. 2 shows the experimental results. Under all missing interval times the error of the algorithm of the present invention is smaller than that of the NNI and LM algorithms, and the missing time interval has no great influence on its performance, because the algorithm takes the spatial structure characteristics of WSNs into account and its accuracy depends more on the spatial correlation of the data. In addition, when the interval time exceeds 15 min, the improvement in the estimation precision of the missing temperature values diminishes markedly, which shows that the result of the algorithm tends to be stable after the sample capacity reaches 30 min.
On the other hand, the number of consecutive missing values is another factor that affects the performance of the algorithm. The performance of each algorithm when the number of continuous missing of sensing data is 1 to 30 is compared in the experiment of the invention, and fig. 3 is the experiment result. As can be seen from fig. 3, as the number of consecutive missing values increases, the error of each algorithm increases. The reason is that all three algorithms need to take into account non-missing data at the time instants adjacent to the missing value. As the time interval between a missing value and its neighboring non-missing perceptual data increases, the temporal correlation between the missing value and its neighboring non-missing perceptual data decreases, resulting in an increase in estimation error for the NNI and LM algorithms. Similarly, the accuracy of the algorithm of the invention is reduced due to the reduction of the correlation, but the algorithm of the invention has the spatial structure constraint of the WSNs, so that the algorithm of the invention always has better accuracy and stability no matter how the continuous missing value changes.
The data loss problem is an inherent problem in sensor networks. In order to reduce the influence of the missing value on the wireless sensor network, the invention provides a missing value estimation method based on a space structure, and structural feature constraints of the sensor network are added on the basis of traditional matrix low-rank decomposition, so that the missing value of the sensing data is estimated. Simulation experiments for estimating the temperature value on a data set acquired by a real sensor prove that the method has higher accuracy and stability.
The present invention and its embodiments have been described above schematically and without limitation; what is shown in the drawings is only one of the embodiments of the present invention, and the actual structure is not limited thereto. Therefore, if a person skilled in the art, inspired by this teaching and without departing from the spirit of the invention, devises structural modes and embodiments similar to this technical solution without inventive effort, they shall fall within the scope of protection of the invention.