CN108289285B - Method for recovering and reconstructing lost data of ocean wireless sensor network

Info

Publication number: CN108289285B
Application number: CN201810030395.8A
Authority: CN
Inventors: 吴华锋; 鲜江峰
Original assignee: Shanghai Maritime University
Current assignee: Shanghai Maritime University
Priority date: 2018-01-12
Filing date: 2018-01-12
Publication date: 2020-11-13
Anticipated expiration: 2038-01-12
Also published as: CN108289285A

Abstract

The invention discloses a method for recovering and reconstructing lost data in a marine wireless sensor network, comprising two stages: clustering the nodes and recovering the lost data. Clustering the nodes comprises: first deploying, in the ocean area to be monitored, nodes that meet the communication requirement and the real-time monitoring requirement of the ocean wireless sensor network, completing the topology construction of the network, and then clustering the deployed nodes with an improved K-means algorithm. Recovering the lost data comprises: when node data are lost, mining the spatiotemporal correlation of the node data in the cluster where the loss occurred by using an RBF neural network optimized by a PSO algorithm, and recovering the lost values from the historical-round data and the current-round data in that cluster. The invention adapts to the highly dynamic topology of the marine wireless sensor network and reduces the energy consumed in transmitting data between nodes, thereby prolonging the service life of the network.

Description

Method for recovering and reconstructing lost data of ocean wireless sensor network

Technical Field

The invention relates to data prediction and transmission technology for marine wireless sensor networks, and in particular to a method for recovering and reconstructing lost data in ocean wireless sensor networks (OWSNs).

Background

The real-time acquisition of ocean data is a prerequisite for comprehensively understanding the ocean, developing ocean resources and protecting the ocean in the 21st century. Wireless sensor networks (WSNs) are widely used in environmental monitoring because they are low-power, low-cost, distributed and self-organizing. In marine ecological environment monitoring, a large number of sensor nodes dropped into a target sea area can rapidly self-organize, in a wireless multi-hop manner, into a monitoring network with good adaptability, which meets the requirement of acquiring marine data in real time.

In recent years, marine wireless sensor networks have been widely used for marine environmental monitoring, such as marine oil pollution diffusion monitoring, marine water quality real-time monitoring, marine information acquisition and the like. The marine wireless sensor network mainly uses a wireless sensor network technology to realize real-time monitoring of marine ecological environment and marine data acquisition.

However, data in marine wireless sensor networks is susceptible to large-scale loss due to various reasons (hardware failures, packet collisions, signal attenuation, insufficient energy, time asynchrony, malicious attacks). This requires recovery of the missing data to obtain a complete set of environmental data. Recovery of lost data is a fundamental operation in the data collection process.

In an ocean wireless sensor network, an important problem is to prolong the life cycle of the network as far as possible while ensuring that the information of every node can be transmitted to the sink node, so that network monitoring information reaches the control center in a timely and effective manner for use by the relevant departments and users. Because the nodes of a marine wireless sensor network are deployed densely and on a large scale, the data acquired by the network have high spatiotemporal redundancy. If the nodes send all of the collected ocean data to the sink node, a large amount of energy is consumed and the data transmission channel becomes congested. Using data recovery and reconstruction techniques to reduce the amount of data transmitted in OWSNs is an effective means of reducing network energy consumption.

Existing data recovery and prediction methods mainly recover data based on a time correlation model or a probability model, but both models produce large recovery errors when the monitored data change substantially over time. Taking into account the real-time dynamic changes of the topology of the marine wireless sensor network and the influence of the wave shielding effect on data transmission between nodes, the invention combines the temporal correlation and the spatial correlation of marine environment data to provide a new method for recovering and reconstructing lost data of the marine wireless sensor network.

Disclosure of Invention

The invention aims to provide a method for recovering and reconstructing lost data of a marine wireless sensor network, which solves the problem of recovering the lost data of nodes of the marine wireless sensor network, adapts to the high dynamics of a topological structure of the marine wireless sensor network, and can reduce the energy consumption for data transmission among the nodes, thereby achieving the purpose of prolonging the service life of the network.

In order to achieve the purpose, the invention is realized by the following technical scheme:

a method for recovering and reconstructing lost data of a marine wireless sensor network is characterized by comprising the following steps: clustering nodes and recovering lost data;

wherein the clustering of the nodes into clusters comprises: firstly, arranging nodes meeting the communication requirement and the real-time monitoring requirement of the ocean wireless sensor network in an ocean area to be monitored, completing topology construction of the ocean wireless sensor network, and effectively clustering the deployed nodes into clusters according to an improved K-means algorithm;

wherein the lost data recovery comprises: when node data are lost, mining the spatiotemporal correlation of the node data in the cluster where the loss occurred by using an RBF neural network optimized by a PSO algorithm, and further recovering the lost data value according to the historical-round data and the current-round data in that cluster.

The ocean wireless sensor network communication requirement is that sensor nodes self-organize in the ocean monitoring area to form a wireless sensor network and complete the initialization of a topological routing structure, and the network completely covers the ocean area to be monitored.

The specific process of clustering the nodes is as follows:

selecting a clustering center:

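considering a sensor node data set to be clustered

$$S = \{x_i\}_{i=1}^{N},$$

with $I_S = \{1, 2, \ldots, N\}$ as the corresponding index set, $d_{ij} = \mathrm{dist}(x_i, x_j)$ represents the distance between nodes $x_i$ and $x_j$; for any sensor node $x_i$, two quantities, the local density $\rho_i$ and the distance $\delta_i$, are defined to characterize the cluster centers;

computing the local density $\rho_i$ of $x_i$ with a Gaussian kernel:

$$\rho_i = \sum_{j \in I_S \setminus \{i\}} \exp\left[-\left(\frac{d_{ij}}{d_c}\right)^{2}\right]$$

wherein the cutoff distance $d_c$ is selected so that the average number of neighbors of each node is 1%-2% of the number of all nodes;

computing the distance $\delta_i$:

$$\delta_{q_i} = \begin{cases} \min\limits_{j < i} \, d_{q_i q_j}, & i \ge 2 \\ \max\limits_{j \ge 2} \, \delta_{q_j}, & i = 1 \end{cases}$$

wherein $\{q_i\}_{i=1}^{N}$ is the index sequence of $\{\rho_i\}_{i=1}^{N}$ in descending order, that is, $\rho_{q_1} \ge \rho_{q_2} \ge \cdots \ge \rho_{q_N}$;

thus, for any node $x_i$ in the node set $S$, the pair $(\rho_i, \delta_i)$, $i \in I_S$, can be calculated; the points for which both $\rho$ and $\delta$ are large are the cluster centers; $\gamma_i = \rho_i \delta_i$ can be calculated at the same time, and the larger the value of $\gamma$, the more likely the node is a cluster center.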
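The center-selection step above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function names `density_peak_scores` and `pick_centers`, the use of 2-D Euclidean positions, and the rule "take the k nodes with the largest γ" are assumptions made for the example.

```python
import math

def density_peak_scores(points, d_c):
    """Return (rho, delta, gamma) lists for a list of node positions."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Gaussian-kernel local density, excluding the node itself.
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Node indices sorted by decreasing local density.
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * n
    for rank in range(1, n):
        i = order[rank]
        # Distance to the nearest node that ranks higher in density.
        delta[i] = min(dist[i][order[r]] for r in range(rank))
    # The densest node takes the largest delta found among the other nodes.
    delta[order[0]] = max(delta[order[r]] for r in range(1, n)) if n > 1 else 0.0
    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma

def pick_centers(points, d_c, k):
    """Assumed selection rule: the k nodes with the largest gamma."""
    _, _, gamma = density_peak_scores(points, d_c)
    return sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:k]
```

For two well-separated groups of nodes, the node in the middle of each group has both a high density and a large distance to any denser node, so it receives the largest γ.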

The inputs of the improved K-means algorithm are:

the node data set

$$S = \{x_i\}_{i=1}^{N}$$

and the k determined initial cluster centers;

the output of the improved K-means algorithm is k clusters for which the criterion function

$$E = \sum_{i=1}^{k} \sum_{j=1}^{k_i} \left\| w_{ij} - \bar{w}_i \right\|^{2}$$

converges;

wherein: $k$ is the number of clusters, $k_i$ is the number of nodes in the i-th cluster, $w_{ij}$ is the j-th node in the i-th cluster, and $\bar{w}_i$ is the cluster center of the i-th cluster, defined as

$$\bar{w}_i = \frac{1}{k_i} \sum_{j=1}^{k_i} w_{ij}$$

The improved K-means algorithm executes the following steps:

A) assigning each node to the cluster whose center is nearest to it;

B) recalculating the mean of the nodes in each cluster and updating the cluster center of that cluster;

C) repeating A and B until the criterion function E no longer changes.
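Steps A to C can be sketched as the loop below, which starts from cluster centers that have already been chosen. It is a minimal illustration under stated assumptions: the function name `kmeans_from_centers`, the Euclidean distance, the iteration cap `max_iter`, and the choice to leave an empty cluster's center unchanged are not taken from the patent.

```python
import math

def kmeans_from_centers(points, centers, max_iter=100):
    """K-means seeded with predetermined initial cluster centers."""
    centers = [tuple(c) for c in centers]
    prev_e = None
    for _ in range(max_iter):
        # Step A: assign each node to the nearest cluster center.
        labels = [min(range(len(centers)),
                      key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Step B: move each center to the mean of its member nodes.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
        # Criterion E: summed squared distance of every node to its center.
        e = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
        # Step C: stop once E no longer changes.
        if prev_e is not None and e == prev_e:
            break
        prev_e = e
    return labels, centers, e
```

Seeding the loop with well-placed centers means that, on clearly separated node groups, the assignment is already stable after the first pass and the loop ends on the second.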

The specific process of mining the time-space correlation of the node data in the data loss cluster by using the RBF neural network optimized by the PSO algorithm comprises the following steps:

determining the RBF neural network parameters by using an improved adaptive particle swarm algorithm to obtain an optimal network structure, then mining the spatiotemporal correlation of the marine sensing data by using the adaptive particle swarm algorithm and the obtained optimal network structure, and finally realizing accurate recovery and reconstruction of the lost data in the cluster.

In the lost node data recovery process, if the cluster contains only a single node, the historical-round data of that node are used for data recovery; if the cluster contains a plurality of nodes, the historical-round data of the node that lost data and the current-round data of the other nodes in the cluster are used for data recovery.
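The rule for choosing the recovery inputs can be illustrated as follows. This is a sketch only: the function name `recovery_inputs`, the dictionary layout, and the window length `n_hist` (how many historical rounds are used) are hypothetical and are not specified in the text.

```python
def recovery_inputs(history, current, lost_node, cluster, n_hist=3):
    """Assemble the input vector used to recover one lost reading.

    history: dict mapping node id -> list of past-round readings, oldest first
    current: dict mapping node id -> this round's reading (lost node absent)
    cluster: list of node ids in the cluster of the lost node
    """
    # Temporal part: the lost node's own most recent historical rounds.
    features = list(history[lost_node][-n_hist:])
    # Spatial part: current-round readings of the other cluster members,
    # used only when the cluster has more than one node.
    if len(cluster) > 1:
        features += [current[n] for n in cluster if n != lost_node]
    return features
```

With a single-node cluster the vector contains only the node's own history; with several nodes it also carries the neighbors' readings from the current round.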

The method for determining RBF neural network parameters by using the improved adaptive particle swarm optimization to obtain the optimal network structure specifically comprises the following steps:

if the RBF neural network has k centers and each center is m-dimensional, the position of a particle is k × (m + 1)-dimensional and the velocity of the particle is likewise k × (m + 1)-dimensional; together with the fitness of the particle, the coding structure of a particle is as follows:

$$
\begin{array}{l}
X_{11} X_{12} \ldots X_{1m}\,\sigma_1 \;\ldots\; X_{i1} X_{i2} \ldots X_{im}\,\sigma_i \;\ldots\; X_{k1} X_{k2} \ldots X_{km}\,\sigma_k \\
V_1 V_2 \ldots V_{k \times (m+1)} \\
f(x)
\end{array}
$$

wherein: $X_{i1} X_{i2} \ldots X_{im}$ is the position of the i-th (i = 1, …, k) neural network center; $\sigma_i$ is the width of the i-th basis function; $V_1 V_2 \ldots V_{k \times (m+1)}$ is the velocity of the particle; and $f(x)$ is the fitness function of the particle;

the purpose of RBF neural network training is to search for the optimal parameter values that minimize the mean square error $ERR_i$ of the recovered lost data; $ERR_i$ is therefore used to define the fitness function, that is, the RBF neural network structure is optimal when the fitness value $f_i$ takes its maximum; in the fitness function $f_i$ of the i-th individual, $ERR_i$ is the mean square error of the recovered values of the node's lost data, $y_k$ is the actual marine environment monitoring data value, and $\hat{y}_k$ is the recovered value of the node's lost data.
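The particle coding and the fitness evaluation can be sketched as below. Several details are assumptions made only for illustration, because the original formula is not reproduced in this text: the Gaussian basis function, the single linear output neuron, the separately supplied output weights `weights`, and the mapping `1 / (1 + ERR)` from error to fitness. The names `decode_particle`, `rbf_output` and `fitness` are hypothetical.

```python
import math

def decode_particle(position, k, m):
    """Split a flat k*(m+1) position vector into (center, width) pairs."""
    units = []
    for i in range(k):
        chunk = position[i * (m + 1):(i + 1) * (m + 1)]
        units.append((chunk[:m], chunk[m]))  # m center coordinates, then sigma
    return units

def rbf_output(x, units, weights):
    """Assumed Gaussian RBF network with one linear output neuron."""
    total = 0.0
    for (center, sigma), w in zip(units, weights):
        sq = sum((a - b) ** 2 for a, b in zip(x, center))
        total += w * math.exp(-sq / (2.0 * sigma ** 2))
    return total

def fitness(position, weights, samples, targets, k, m):
    """Assumed mapping: a smaller mean square error gives a larger fitness."""
    units = decode_particle(position, k, m)
    err = sum((y - rbf_output(x, units, weights)) ** 2
              for x, y in zip(samples, targets)) / len(targets)
    return 1.0 / (1.0 + err)
```

Any monotonically decreasing function of the error would preserve the property that the best network is the one with the maximum fitness; the reciprocal form here is just one convenient choice.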

Compared with the prior art, the invention has the following advantages:

the lost data recovery and reconstruction method for the marine wireless sensor network uses the improved K-means algorithm to cluster OWSN nodes in real time, which suits the real-time dynamic changes of the topology of the marine wireless sensor network; at the same time, the good nonlinear mapping capability of the RBF neural network optimized by adaptive particle swarm optimization (APSO-RBFNN) is used to mine the spatiotemporal correlation of the nodes in the same cluster, so that the lost data values of the nodes can be recovered accurately.

If all network nodes send their sensed environmental data to the sink node, a large amount of energy is consumed, the limited bandwidth of the OWSN is wasted, packet collisions occur, communication efficiency drops, and the energy consumption and delay of data transmission increase. The lost data recovery algorithm is therefore run at both ends of each node data transmission: if the difference between the recovered data and the actual monitoring data is within the error threshold given by the user, the recovered data replace the monitoring data, which reduces the amount of data transmitted in the network, lowers the energy consumption of the network, and improves the efficiency of acquiring ocean big data.
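The threshold rule described above can be sketched as a loop that both ends of a link could run with the same predictor. This is an illustration under assumptions: the function name `rounds_to_transmit`, the warm-up of `n_hist` rounds that are always sent, and the pluggable `predict` callable (standing in for the trained recovery model) are hypothetical.

```python
def rounds_to_transmit(readings, predict, threshold, n_hist=3):
    """Return (indices of rounds actually sent, series known at the sink)."""
    known = []  # the series as reconstructed at the receiving end
    sent = []
    for t, actual in enumerate(readings):
        if len(known) < n_hist:
            # Warm-up: not enough history yet, so the reading is sent.
            known.append(actual)
            sent.append(t)
            continue
        estimate = predict(known[-n_hist:])
        if abs(estimate - actual) <= threshold:
            # Within the user's error threshold: the recovered value
            # replaces the reading and nothing is transmitted.
            known.append(estimate)
        else:
            known.append(actual)
            sent.append(t)
    return sent, known
```

For example, with a simple mean-of-history predictor and a threshold of 0.1, a reading that drifts by 0.05 is suppressed, while a jump of one degree is transmitted.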

Drawings

Fig. 1 is a flowchart of a method for recovering and reconstructing lost data of a marine wireless sensor network according to the present invention.

FIG. 2 is a schematic diagram of clustering the nodes of the wireless sensor network by using the improved K-means algorithm.

FIG. 3 is a schematic diagram of location topology of 54 nodes in an Intel laboratory and clustering using the improved K-means algorithm.

Fig. 4A and 4B are graphs comparing the prediction result and the real data of the data recovery and reconstruction method of the present invention.

Fig. 4C is a data recovery error diagram for the data recovery and reconstruction method of the present invention.

FIG. 5 is a graph comparing the recovery errors of the method of the present invention with NCGP and MASTER.

Fig. 6 is a graph comparing the data transmission amounts of the method of the present invention, NGCP, and the case in which no data recovery algorithm is used.

Detailed Description

The present invention will now be further described by way of the following detailed description of a preferred embodiment thereof, taken in conjunction with the accompanying drawings.

As shown in fig. 1, a method for recovering and reconstructing lost data of a marine wireless sensor network (OWSNs) includes clustering nodes and recovering lost data.

The clustering of the nodes into clusters comprises:

First, nodes meeting the OWSN communication requirements and real-time monitoring requirements are deployed in the ocean area to be monitored, the topology and routing of the ocean wireless sensor network are constructed, and the deployed nodes are then clustered according to the improved K-means algorithm. Specifically, n sensor nodes are deployed in the sea area to be monitored such that the area is completely covered by the signals of the ocean wireless sensor network. The wireless sensor network composed of the n nodes is $P = \{p_1, p_2, \ldots, p_n\}$, and the data generated by the network are $G = \{g_1, g_2, \ldots, g_n\}$, where $g_i = \{h_i(t_1), h_i(t_2), h_i(t_3), \ldots\}$ is the time series data set acquired by node $p_i$ with period T. The data recovered for the n nodes in the network are $R = \{R_1, R_2, \ldots, R_n\}$, where the recovered data values of node $p_i$ are $R_i = \{R_i(t_1), R_i(t_2), R_i(t_3), \ldots\}$. The OWSN communication requirements are as follows: the sensor nodes self-organize in the monitored sea area to form a wireless sensor network and complete the initialization of the topological routing structure, and the network fully covers the target sea area.

Next, the local density ρ and the relative distance δ of each sensor node are calculated using equations (1) and (2). Fig. 2 is a schematic diagram of clustering the sensor nodes deployed in the monitored sea area; four clusters, CH1, CH2, CH3 and CH4, are finally formed. Each ordinary node sends the environmental information it collects to its cluster head node, and the cluster head node then sends the information received from the ordinary nodes, together with the information it collected itself, to the base station.

Clustering the nodes of the Intel indoor project: the project was developed by the Intel Berkeley Research Lab. 54 nodes running the TinyOS operating system were deployed in a test room; every 31 seconds the nodes collected data including temperature, humidity, illumination and voltage, and the data were gathered with the TinyDB in-network query processing system. The collection ran from February 28, 2004 to April 5, 2004. The location topology of the 54 nodes in the Intel laboratory and the clustering result obtained with the improved K-means algorithm are shown in FIG. 3. The clustering process is as follows:

1) Calculate the local density $\rho_i$ of each node. Because the cut-off kernel yields discrete values whereas the Gaussian kernel yields continuous values, different sensor nodes are less likely to have the same local density when the Gaussian kernel is used; the local density of each sensor node is therefore calculated with the Gaussian kernel:

$$\rho_i = \sum_{j \in I_S \setminus \{i\}} \exp\left[-\left(\frac{d_{ij}}{d_c}\right)^{2}\right] \qquad (1)$$

2) Calculate the relative distance $\delta_i$ of each node:

$$\delta_{q_i} = \begin{cases} \min\limits_{j < i} \, d_{q_i q_j}, & i \ge 2 \\ \max\limits_{j \ge 2} \, \delta_{q_j}, & i = 1 \end{cases} \qquad (2)$$

where $\{q_i\}_{i=1}^{N}$ is the index sequence of $\{\rho_i\}_{i=1}^{N}$ in descending order.

3) For any node $x_i$ in the node set $S$, the pair $(\rho_i, \delta_i)$, $i \in I_S$, can now be calculated. The points for which both $\rho$ and $\delta$ are large are the cluster centers. $\gamma_i = \rho_i \delta_i$ can be calculated at the same time; the larger the value of $\gamma$, the more likely the node is a cluster center.

4) Assign every other, non-cluster-head node to the nearest cluster.

5) Recalculate the mean of the nodes in each cluster and update the cluster center of that cluster.

6) Repeat 4)-5) until the criterion function

$$E = \sum_{i=1}^{k} \sum_{j=1}^{k_i} \left\| w_{ij} - \bar{w}_i \right\|^{2}$$

converges, where $k_i$ is the number of nodes in the i-th cluster, $w_{ij}$ is the j-th node in the i-th cluster, and $\bar{w}_i$ is the cluster center of the i-th cluster, defined as

$$\bar{w}_i = \frac{1}{k_i} \sum_{j=1}^{k_i} w_{ij}$$

The inputs of the improved K-means algorithm are the node data set

$$S = \{x_i\}_{i=1}^{N}$$

and the k determined initial cluster centers;

the output of the improved K-means algorithm is k clusters for which the criterion function converges.

From the above calculations, the 54 nodes deployed in the Intel lab are divided into 7 clusters. Two temperature data sets, from cluster 1 (containing nodes 1-5) and cluster 3 (containing nodes 20-27), are selected here to validate the proposed lost data recovery and reconstruction method, with nodes 4 and 22 as the respective cluster heads.

The lost data recovery comprises: when node data are lost, mining the spatiotemporal correlation of the node data in the cluster where the loss occurred by using an RBF neural network optimized by a PSO algorithm, and further recovering the lost data values of the node according to the historical-round data and the current-round data in that cluster.

The specific process of mining the spatiotemporal correlation of the node data in the data loss cluster by using the RBF neural network optimized by the PSO algorithm is as follows. The coding structure of a particle is:

$$
\begin{array}{l}
X_{11} X_{12} \ldots X_{1m}\,\sigma_1 \;\ldots\; X_{21} X_{22} \ldots X_{2m}\,\sigma_2 \;\ldots\; X_{k1} X_{k2} \ldots X_{km}\,\sigma_k \\
V_1 V_2 \ldots V_{k \times (m+1)} \\
f(x)
\end{array}
$$

wherein: $X_{i1} X_{i2} \ldots X_{im}$ is the position of the i-th (i = 1, …, k) neural network center; $\sigma_i$ is the width of the i-th basis function; $V_1 V_2 \ldots V_{k \times (m+1)}$ is the velocity of the particle; and $f(x)$ is the fitness function of the particle.

The purpose of neural network training is to search for the optimal parameter values that minimize the mean square error $ERR_i$ of the recovered lost data; $ERR_i$ is therefore used to define the fitness function, that is, the RBF neural network structure is optimal when the fitness value $f_i$ takes its maximum. The fitness function $f_i$ of the i-th individual is given by equation (3), in which $\hat{y}_k$ is the recovered value of the node's lost data.

In a specific embodiment, the node lost data recovery comprises the following steps:

Step 1: Initialize the population. Randomly generate a population of N particles and initialize the positions and velocities of the particles. The initial parameters of the RBF neural network (the centers and widths of the basis functions, and the weights from the hidden layer to the output layer) take arbitrary values in the interval (-8, 8).

Step 2: Sort. Calculate the fitness value of each particle using equation (3) and sort the particles in descending order of fitness.

Step 3: Calculate and update the velocity and position of each particle, and then update the Pbest and Gbest values. Here the inertia weight $w$ and the acceleration coefficients $c_1$ and $c_2$ are determined adaptively by equations (4)-(6).

To determine the optimal combination of the three parameter values α, β and γ (α, β, γ ∈ {0.5, 1, 1.5, 2, 2.5}), every combination would have to be tested. There are $5^3 = 125$ possible combinations of α, β and γ, and testing all of them would consume a significant amount of computing resources. An orthogonal design technique is therefore used to sample a small, representative set of combinations of α, β and γ. $L_{25}(5^6)$ is an orthogonal array with 25 runs that accommodates up to six variables with five levels each, so only 25 simulation experiments are needed to determine the optimal combination of α, β and γ. The simulation finally gives the optimal combination α = 2, β = 0.5 and γ = 1.5.

Step 4: Generate a new population.

Step 5: If the updated fitness values of the particles do not meet the termination condition of the algorithm, repeat steps 2-4. Otherwise, output the optimal parameter scheme of the RBF neural network.

Step 6: Predict and recover the lost data of the marine wireless sensor network by using the optimal RBFNN structure obtained in step 5.
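The overall loop of steps 1 to 5 can be sketched with a basic particle swarm optimizer that maximizes a fitness function. Equations (4)-(6) are not reproduced in this text, so the sketch substitutes common defaults that are assumptions, not the patented adaptive rules: an inertia weight falling linearly from 0.9 to 0.4, fixed `c1 = c2 = 2.0`, a fixed iteration count as the termination condition, clamping of positions to the initial interval, and a seeded random generator. The function name `pso_maximize` is hypothetical.

```python
import random

def pso_maximize(fitness, dim, n_particles=20, n_iter=50,
                 lo=-8.0, hi=8.0, seed=0):
    rng = random.Random(seed)
    # Step 1: random positions in (lo, hi) and zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    # Step 2: evaluate every particle and find the best one.
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    trace = [gbest_fit]
    for it in range(n_iter):
        # Assumed schedule in place of equations (4)-(6).
        w = 0.9 - 0.5 * it / max(1, n_iter - 1)
        c1 = c2 = 2.0
        for i in range(n_particles):
            # Step 3: update velocity and position, then Pbest and Gbest.
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        # Steps 4-5: the moved swarm is the new population; record progress.
        trace.append(gbest_fit)
    return gbest, gbest_fit, trace
```

In use, `fitness` would wrap the evaluation of an RBF network decoded from the particle position, and the returned `gbest` would be the parameter vector handed to step 6.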

Inspection of the data set shows that 23% of the raw data collected by the 54 nodes of the Intel indoor project over 84600 time slices is lost. The goal is to estimate the lost temperature values of cluster head node N4 from the historical-round data of node N4 and the current-round data of the other nodes in the cluster (N1, N2, N3, N5). Historical sensor readings are used as the input for predicting the missing information. 1200 temperature readings of node N4 were chosen for the simulated prediction. Figs. 4A, 4B and 4C show the results of the simulation experiment for node N4. It can be seen from Figs. 4A, 4B and 4C that the present invention captures all of the temperature data information except for some peaks, and the recovery error is within 0.1.

We next compare the performance of the inventive method with different data recovery algorithms in case of data loss. The data loss rate is set to be between 10% and 90%. Figure 5 shows the data recovery performance of several algorithms. The X-axis represents the loss probability and the Y-axis represents the recovery error RE (recovery error). In general, RE increases with increasing loss rate. The method of the invention achieves the best performance under the ocean temperature data set. Even if 80% of data is lost, the recovery error of the method is less than 25%, while the error of the corresponding MASTER algorithm is more than 60% and the error of the NGCP algorithm is 40%.
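An experiment of this kind can be set up with two small helpers: one that removes a given fraction of readings and one that scores the recovered values. The text does not define RE, so the relative form used below (summed absolute error over the lost readings divided by the summed magnitude of the true values) is an assumption, as are the function names `drop_readings` and `recovery_error` and the seeded random selection.

```python
import random

def drop_readings(series, loss_rate, seed=0):
    """Mark a random fraction of readings as lost (None)."""
    rng = random.Random(seed)
    n_lost = round(loss_rate * len(series))
    lost = set(rng.sample(range(len(series)), n_lost))
    masked = [None if i in lost else v for i, v in enumerate(series)]
    return masked, sorted(lost)

def recovery_error(actual, recovered, lost_idx):
    """Assumed RE: relative absolute error over the lost readings only."""
    num = sum(abs(recovered[i] - actual[i]) for i in lost_idx)
    den = sum(abs(actual[i]) for i in lost_idx)
    return num / den if den else 0.0
```

Sweeping `loss_rate` from 0.1 to 0.9 and plotting `recovery_error` for each recovery algorithm would give a curve of the kind described for Fig. 5.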

If all nodes send their monitored environmental data to the sink node, a large amount of communication energy is consumed, limited network bandwidth is wasted, channel conflicts occur, communication efficiency drops, and the energy consumption and delay of data transmission increase. The data reconstruction technique proposed herein can anticipate the trend of the data in advance. The lost data recovery algorithm is run at both ends of each node data transmission: if the difference between the recovered data and the actual monitoring data is within the error threshold given by the user, the recovered data replace the actual sensed data of the node, which reduces the amount of data transmitted in the network, lowers the energy consumption of the network, and improves the efficiency of acquiring ocean big data.

Fig. 6 compares the amount of data transmitted with the method of the present invention, with NGCP, and when no data recovery algorithm is used. It can be seen from Fig. 6 that the method of the invention reduces the data transmission amount by 50% compared with the NCGP algorithm. In the data recovery stage, the method of the invention is more accurate than the NCGP algorithm, and its adaptive calculation of the parameters takes less time. All OWSN monitoring data can therefore be obtained by receiving only a small amount of data, so the data recovery and reconstruction method can greatly improve the efficiency of acquiring ocean big data.

While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims

1. A method for recovering and reconstructing lost data of a marine wireless sensor network is characterized by comprising the following steps:

clustering nodes and recovering lost data;

wherein the lost data recovery comprises: when node data are lost, mining the spatiotemporal correlation of the node data in the cluster where the loss occurred by using an RBF neural network optimized by a PSO algorithm, and further recovering the lost data value according to the historical-round data and the current-round data in that cluster;

the specific process of clustering the nodes is as follows:

selecting a clustering center:

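considering a sensor node data set to be clustered

$$S = \{x_i\}_{i=1}^{N},$$

$I_S = \{1, 2, \ldots, N\}$ is the corresponding index set, $d_{ij} = \mathrm{dist}(x_i, x_j)$ represents the distance between nodes $x_i$ and $x_j$; for any sensor node $x_i$, two quantities, the local density $\rho_i$ and the distance $\delta_i$, are defined to characterize the cluster centers;

computing the local density $\rho_i$ of $x_i$ with a Gaussian kernel:

$$\rho_i = \sum_{j \in I_S \setminus \{i\}} \exp\left[-\left(\frac{d_{ij}}{d_c}\right)^{2}\right]$$

computing the distance $\delta_i$:

$$\delta_{q_i} = \begin{cases} \min\limits_{j < i} \, d_{q_i q_j}, & i \ge 2 \\ \max\limits_{j \ge 2} \, \delta_{q_j}, & i = 1 \end{cases}$$

wherein $\{q_i\}_{i=1}^{N}$ is the index sequence of $\{\rho_i\}_{i=1}^{N}$ in descending order;

thus, for any node $x_i$ in the node set $S$, the pair $(\rho_i, \delta_i)$, $i \in I_S$, can be calculated; the points for which both $\rho$ and $\delta$ are large are the cluster centers; $\gamma_i = \rho_i \delta_i$ can be calculated at the same time, and the larger the value of $\gamma$, the more likely the node is a cluster center;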

determining the RBF neural network parameters by using an improved adaptive particle swarm algorithm to obtain an optimal network structure, then mining the spatiotemporal correlation of the marine sensing data by using the adaptive particle swarm algorithm and the obtained optimal network structure, and finally realizing accurate recovery and reconstruction of the lost data in the cluster;

the coding structure of a particle being:

$$
\begin{array}{l}
X_{11} X_{12} \ldots X_{1m}\,\sigma_1 \;\ldots\; X_{i1} X_{i2} \ldots X_{im}\,\sigma_i \;\ldots\; X_{k1} X_{k2} \ldots X_{km}\,\sigma_k \\
V_1 V_2 \ldots V_{k \times (m+1)} \\
f(x)
\end{array}
$$

wherein $\hat{y}_k$ is the recovered value of the node's lost data.

2. The method for recovering and reconstructing lost data of a marine wireless sensor network according to claim 1, wherein the marine wireless sensor network communication requirement is that the sensor nodes self-organize in the marine monitoring area to form a wireless sensor network and complete the initialization of the topological routing structure, and that the network fully covers the marine area to be monitored.

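3. The method for recovering and reconstructing lost data of a marine wireless sensor network according to claim 1, wherein the inputs of the improved K-means algorithm are:

the node data set

$$S = \{x_i\}_{i=1}^{N}$$

and the k determined initial cluster centers;

the output of the improved K-means algorithm is k clusters for which the criterion function

$$E = \sum_{i=1}^{k} \sum_{j=1}^{k_i} \left\| w_{ij} - \bar{w}_i \right\|^{2}$$

converges;

wherein: $k$ is the number of clusters, $k_i$ is the number of nodes in the i-th cluster, $w_{ij}$ is the j-th node in the i-th cluster, and $\bar{w}_i$ is the cluster center of the i-th cluster, defined as

$$\bar{w}_i = \frac{1}{k_i} \sum_{j=1}^{k_i} w_{ij}$$

the improved K-means algorithm executes the following steps:

A) assigning each node to the cluster whose center is nearest to it;

B) recalculating the mean of the nodes in each cluster and updating the cluster center of that cluster;

C) repeating A and B until the criterion function E no longer changes.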

4. The method for recovering and reconstructing lost data of a marine wireless sensor network according to claim 1, wherein in the process of recovering lost node data, if the cluster contains only a single node, the historical-round data of that node are used for data recovery, and if the cluster contains a plurality of nodes, the historical-round data of the node that lost data and the current-round data of the other nodes in the cluster are used for data recovery.