CN111601358B - Multi-stage hierarchical clustering spatial correlation temperature perception data redundancy removing method - Google Patents
- Publication number
- CN111601358B (application CN202010361344.0A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- data
- node
- clustering
- redundancy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W40/00—Communication routing or communication path finding
- H04W40/02—Communication route or path selection, e.g. power-based or shortest path routing
- H04W40/20—Communication route or path selection, e.g. power-based or shortest path routing based on geographic position or location
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W40/00—Communication routing or communication path finding
- H04W40/24—Connectivity information management, e.g. connectivity discovery or connectivity update
- H04W40/248—Connectivity information update
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W84/00—Network topologies
- H04W84/18—Self-organising networks, e.g. ad-hoc networks or sensor networks
Abstract
The invention discloses a multi-stage hierarchical clustering spatial-correlation temperature perception data redundancy removing method, which comprises the following steps. Step 1: acquire a large amount of temperature sensing data collected by a temperature sensor network; on the Sink node, improve the k-Means method with the Euclidean distance and the Pearson distance, and perform node similarity analysis according to the node position coordinates to obtain redundant node clusters. Step 2: at the cluster head (CHs) node of each redundant node cluster, judge the similarity of the data in the cluster with a Gaussian mixture clustering method, thereby further dividing the nodes in the cluster into data redundancy clusters. Step 3: after the data redundancy clusters are obtained, randomly weight the data in each data redundancy cluster to obtain the final redundancy removing result. Step 4: transmit the de-redundant temperature data to the Sink node. The method judges redundant nodes more accurately, so that the judgment of redundant data is more accurate and the error of the de-redundant result is smaller.
Description
Technical Field
The invention relates to the technical field of wireless sensor networks, and in particular to a multi-stage hierarchical clustering spatial-correlation temperature perception data redundancy removing method.
Background
Wireless sensor networks (WSNs) are deployed in an area to monitor physical phenomena such as temperature, humidity and seismic events. To obtain accurate information about the environment or events, a large number of sensing nodes are deployed to collect data and report it to the aggregation node (Sink) at high frequency. The data generated by the sensor nodes generally has high spatio-temporal correlation and contains a large amount of redundant data, and transmitting redundant data causes unnecessary power consumption. Therefore, reducing the energy consumed in transmitting redundant data and extending the lifetime of WSNs are very important issues.
One approach that exploits spatio-temporal correlation runs two synchronized predictors, one on the sensor node and one on the Sink. If the prediction error is smaller than a given threshold, the sensor node does not send data to the Sink; the Sink uses the predicted value as the sensing data, which reduces the cost of data transmission and communication energy and prolongs the service life of the network. However, this method increases the computational complexity of each sensor and cannot guarantee the reliability of the predicted values. Likewise, judging redundant nodes from node position alone lacks accuracy.
To address the inaccurate judgment of redundant nodes in WSNs caused by insufficient judgment conditions, a staged hierarchical clustering similarity redundancy removal method (TSDA) is proposed. The method comprises three stages. In the first stage, the Sink judges node similarity with an improved k-Means algorithm based on node position information and clusters all the nodes. In the second stage, the cluster heads (CHs) judge the similarity of the sensing data generated at the same moment by the nodes in each cluster with a Gaussian mixture clustering algorithm, so as to judge the similarity of the nodes in the cluster accurately. In the third stage, the sensing data of similar nodes in a cluster are randomly weighted as the redundancy removal result, which is transmitted and stored. The algorithm suits clustered networks and consists of a k-Means classification model, a Gaussian mixture classification model and a random-weighting redundancy removal model. Removing redundant data according to both node position and the similarity of the sensing data of the nodes in a cluster effectively improves the accuracy of node similarity, improves the judgment of redundant data, and further extends the life cycle of the network.
Disclosure of Invention
The invention aims to solve the technical problem of providing a multi-stage hierarchical clustering space correlation temperature perception data redundancy removing method aiming at the defect that in the prior art, the judgment of redundant nodes is inaccurate due to insufficient judgment conditions of the redundant nodes in a mode of judging the redundant nodes only according to node positions in WSNs.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the invention provides a multi-stage hierarchical clustering space correlation temperature perception data redundancy removing method, which comprises the following steps:
step 1: acquiring a large amount of temperature sensing data acquired by a temperature sensor network, improving a k-Means method by using Euclidean distance and Pearson distance on a Sink node, and performing node similarity analysis on the node according to a node position coordinate to obtain a redundant node cluster;
step 2: performing similarity judgment on data in the cluster by using a Gaussian mixture clustering method at a cluster head CHs node of redundant node clustering, thereby further performing data redundancy clustering on the nodes in the cluster;
step 3: after the data redundancy clustering is obtained, carrying out random weighting on the data in the data redundancy clustering to obtain a final redundancy removing result;
step 4: transmitting the de-redundant temperature data to the Sink node.
Further, the k-Means method is improved by the Euclidean distance and the Pearson distance in step 1 of the invention as follows:
the spatial similarity distance D(i, j) of two nodes is:
D(i, j) = D_E(i, j) + β·D_P(i, j)
wherein the Euclidean distance D_E(i, j) is:
D_E(i, j) = √((x_i − x_j)² + (y_i − y_j)²)
and the Pearson correlation distance D_P(i, j) is:
D_P(i, j) = 1 − ρ(i, j)
wherein ρ(i, j) is the Pearson correlation coefficient and β is a scale factor representing the weight of D_P(i, j) in D(i, j). The spatial position coordinates of the n SNs nodes in the sensor network S1 are (x_i, y_i), 1 ≤ i ≤ n, and the nodes are represented as the set S = {s_1, s_2, …, s_n}. The Sink node runs the improved k-Means algorithm according to the coordinate position set L = {l_1, l_2, …, l_n}, with l_i = (x_i, y_i) and 1 ≤ i ≤ n, corresponding to the nodes of S = {s_1, s_2, …, s_n}, and classifies the n nodes of S into K mutually disjoint subsets C_i, wherein C = {C_1, C_2, …, C_K}, C_1 ∪ C_2 ∪ … ∪ C_K = S, and C_i ∩ C_j = ∅ for i ≠ j. Through the improved k-Means algorithm, S = {s_1, s_2, …, s_n} is clustered, and the cluster partition C = {C_1, C_2, …, C_K} is obtained.
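As an illustration, the combined distance can be sketched in Python as below. The function names are not from the patent, and the Pearson term is shown on per-node data series as an assumption for the example (the patent applies D(i, j) to position coordinates and cluster means):

```python
import math

def pearson_distance(a, b):
    # D_P(i, j) = 1 - r, with r the Pearson correlation of two data series
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    # treat a constant series as uncorrelated rather than divide by zero
    return 1.0 - cov / (sa * sb) if sa > 0 and sb > 0 else 1.0

def similarity_distance(p, q, a, b, beta=0.5):
    # D(i, j) = D_E(i, j) + beta * D_P(i, j)
    return math.dist(p, q) + beta * pearson_distance(a, b)
```

For perfectly correlated series the Pearson term vanishes, so D(i, j) reduces to the Euclidean part; for anti-correlated series it adds up to 2β.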
Further, the improved k-Means algorithm in step 1 of the present invention comprises the following specific steps:
step 1.1, setting the number k of clustering centers of the improved k-Means algorithm;
step 1.2, randomly selecting k nodes from the sensor network S1 as the initial mean vectors {μ_1, μ_2, …, μ_k};
step 1.3, computing the spatial similarity distance D(i, j) between each position coordinate l_j and each mean vector μ_i (1 ≤ i ≤ k): D(i, j) ← D_E(i, j) + β·D_P(i, j);
step 1.4, assigning each node position l_j to the cluster whose mean vector μ_i gives the minimum distance D(i, j);
step 1.5, updating each mean vector μ_i to the mean of the node positions assigned to cluster C_i;
step 1.6, repeating steps 1.3 to 1.5 until the clustering result is obtained.
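Steps 1.1 to 1.6 can be sketched as follows. This is a hedged illustration, not the patent's implementation: each node is assumed to carry a position and a short data series for the Pearson term, and the `init` parameter replaces the random selection of step 1.2 only to make the example reproducible:

```python
import math
import random

def pearson_dist(a, b):
    # D_P: 1 - Pearson correlation of two data series (illustrative form)
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (sa * sb) if sa > 0 and sb > 0 else 1.0

def improved_kmeans(positions, series, k, beta=0.5, iters=20, init=None):
    """Steps 1.1-1.6: cluster nodes by position plus data-shape similarity."""
    idx = init if init is not None else random.sample(range(len(positions)), k)  # step 1.2
    mu_pos = [positions[c] for c in idx]
    mu_ser = [list(series[c]) for c in idx]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):                                   # step 1.6: repeat
        clusters = [[] for _ in range(k)]
        for j, (p, s) in enumerate(zip(positions, series)):
            # step 1.3: D(i, j) = D_E(i, j) + beta * D_P(i, j)
            d = [math.dist(p, mu_pos[i]) + beta * pearson_dist(s, mu_ser[i])
                 for i in range(k)]
            clusters[d.index(min(d))].append(j)              # step 1.4: nearest mean
        for i, members in enumerate(clusters):               # step 1.5: update means
            if members:
                mu_pos[i] = tuple(sum(positions[j][a] for j in members) / len(members)
                                  for a in range(2))
                mu_ser[i] = [sum(series[j][t] for j in members) / len(members)
                             for t in range(len(series[0]))]
    return clusters
```

With two well-separated groups of nodes the loop converges in a few iterations and returns one cluster of indices per group.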
Further, the method in step 2 of the present invention specifically comprises:
wireless sensor network S1 consists of K clusters, where all data produced by a cluster is represented as the set X = {X_1, X_2, …, X_n}; X_i = {x_i(t_1), x_i(t_2), …, x_i(t_m)}, 1 ≤ i ≤ n, is the time series generated by sensor node s_i every T seconds. Each cluster head CH in the wireless sensor network continues to classify and cluster the nodes in its cluster by data correlation: the Gaussian mixture clustering algorithm divides the data sensed at the same time within a spatially similar node cluster, D_CHh(t_j) = {x_1(t_j), x_2(t_j), …, x_z(t_j)}, 1 ≤ j, 1 ≤ h ≤ K_1, into K_1 clusters. The sample set D_CHh(t_j) is divided into K_1 clusters C = {C_i1, C_i2, C_i3, …, C_iK1}, 0 < i ≤ K_1.
Further, the Gaussian mixture clustering method adopted in step 2 of the present invention specifically comprises:
Let the random variable z_j ∈ {1, 2, …, K_1} denote the Gaussian mixture component that generated the sensed data x_j of node j; its value is random and unknown. The prior probability P(z_j = i) corresponds to α_i (i = 1, 2, …, K_1). According to Bayes' theorem, the posterior distribution of z_j corresponds to:
p_M(z_j = i | x_j) = α_i·p(x_j | μ_i, Σ_i) / Σ_{l=1}^{K_1} α_l·p(x_j | μ_l, Σ_l)
where p_M(z_j = i | x_j) is the posterior probability that sample x_j was generated by the i-th Gaussian mixture component, recorded as γ_ji.
After the Gaussian mixture distribution is obtained, Gaussian mixture clustering divides the sample set D_CHh(t_j) into K_1 clusters C = {C_i1, C_i2, C_i3, …, C_iK1}, 0 < i ≤ K_1, and the cluster label λ_j of each sample x_j is:
λ_j = argmax_{i ∈ {1, 2, …, K_1}} γ_ji
The EM algorithm is used for iterative optimization to obtain the partition of the sample set D_CHh(t_j).
Further, the method in step 3 of the present invention specifically comprises:
According to the cluster partition obtained in step 2, the CHs perform a random weighted average on the data generated by the nodes in each data similarity cluster; the redundancy removal result x̂(t_j) is:
x̂(t_j) = β_1·x_w(t_j) + β_2·x_a(t_j) + … + β_v·x_b(t_j)
wherein β_1, β_2, …, β_v are weighting factors with β_1 + β_2 + … + β_v = 1, and x_w(t_j), x_a(t_j), …, x_b(t_j) are the sensing data generated at time t_j by the nodes s_w, s_a, …, s_b of the same data similarity cluster.
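A minimal sketch of the random-weighting step; the function name and the equal-weight fallback when no weights are supplied are illustrative assumptions, not from the patent:

```python
def weighted_dedup(readings, weights=None):
    """Random-weighted fusion of one redundancy cluster's readings at time t_j.
    With no weights given, equal weights beta_1 = ... = beta_v = 1/v are used."""
    v = len(readings)
    if weights is None:
        weights = [1.0 / v] * v
    assert abs(sum(weights) - 1.0) < 1e-9   # beta_1 + ... + beta_v = 1
    return sum(b * x for b, x in zip(weights, readings))
```

The constraint that the weights sum to 1 keeps the fused value inside the range of the cluster's readings, so the result stays a plausible temperature.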
the invention has the following beneficial effects: the multi-stage hierarchical clustering spatial-correlation temperature perception data redundancy removing method performs de-redundancy processing of spatially redundant nodes in three stages. In the process of removing redundancy from the sensing data, redundant nodes are judged more accurately, so the judgment of redundant data is more accurate and the error of the de-redundant result is smaller. The improved algorithm removes redundant data more reasonably and effectively reduces network energy consumption; experiments show that 70% of spatially redundant data can be removed with an average data error of 0.2 °C, while a further 1.25% of energy consumption is saved.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flowchart of a method for removing redundancy of multi-stage hierarchical clustering temperature-sensing data based on spatial correlation according to an embodiment of the present invention;
FIG. 2 is an algorithmic flow diagram of an embodiment of the present invention;
FIG. 3 is a system model of an embodiment of the invention;
FIG. 4 is a raw data presentation of a node of an embodiment of the present invention;
FIG. 5 is a block diagram of an embodiment of the present invention for implementing node classification clustering by improved k-Means;
FIG. 6 shows the data similarity distribution of cluster C_1 in an embodiment of the present invention;
FIG. 7 compares the data of cluster C_13 before and after redundancy removal in an embodiment of the present invention;
FIG. 8 compares the errors of cluster C_13 before and after redundancy removal in an embodiment of the present invention;
FIG. 9 shows the impact of K_1 on the data de-redundancy rate in an embodiment of the present invention;
FIG. 10 shows the impact of K_1 on network energy consumption in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The examples of the present invention use the temperature sensing data of the Intel Berkeley laboratory for analysis. This dataset was collected for temperature sensing: about forty thousand readings were collected per sensor node from 54 sensor nodes in total, a data volume of about two million readings. This embodiment carries out the work on the laboratory's temperature sensing data; the overall flow of the algorithm is shown in FIGS. 1 and 2, and the system model in FIG. 3.
Step 1: first, node similarity analysis is performed on the Sink according to the node position coordinates. Accurate clustering requires a precise definition of the closeness between samples, based on pairwise similarity or distance. Among the various distances, the Euclidean distance is probably the most common for numerical data. However, the Euclidean distance describes only the difference in magnitude of two feature-vector components: the Euclidean distance between two differently shaped feature vectors may be smaller than that between similarly shaped ones. The correlation distance, by contrast, measures the difference in direction, rather than magnitude, of the two vectors. Therefore, the spatial similarity distance D(i, j) between two nodes is:
D(i, j) = D_E(i, j) + β·D_P(i, j)    (1)
The Euclidean distance D_E(i, j) is:
D_E(i, j) = √((x_i − x_j)² + (y_i − y_j)²)    (2)
The Pearson correlation distance D_P(i, j) is:
D_P(i, j) = 1 − ρ(i, j)    (3)
where ρ(i, j) is the Pearson correlation coefficient and β is a scale factor representing the weight of D_P(i, j) in D(i, j). The dual-metric distance satisfies three distance properties: non-negativity, symmetry and reflexivity. Under the dual-metric distance, any pair of feature vectors can be compared both in magnitude, through the Euclidean measure, and in shape change, through the correlation measure.
The spatial position coordinates of the n SNs nodes in sensor network S1 are (x_i, y_i), 1 ≤ i ≤ n, and the nodes are represented as the set S = {s_1, s_2, …, s_n}. The Sink runs the improved k-Means algorithm on the coordinate position set L = {l_1, l_2, …, l_n}, l_i = (x_i, y_i), 1 ≤ i ≤ n, corresponding to the nodes of S, and partitions the n nodes of S into K mutually disjoint subsets C_i, where C = {C_1, C_2, …, C_K}, C_1 ∪ C_2 ∪ … ∪ C_K = S and C_i ∩ C_j = ∅ for i ≠ j. The improved k-Means algorithm thus clusters S = {s_1, s_2, …, s_n} into the partition C = {C_1, C_2, …, C_K}, where the minimized squared error e of the clustering is:
e = Σ_{i=1}^{K} Σ_{l_j ∈ C_i} D(i, j)²    (4)
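The squared-error objective can be evaluated for a given partition as in this sketch; for brevity it uses only the Euclidean part of the distance, and the function name is an assumption of the example:

```python
def cluster_sq_error(positions, clusters, means):
    """e: sum over clusters C_i and member positions l_j of the squared
    distance to the cluster mean mu_i (Euclidean part only, for brevity)."""
    return sum((positions[j][0] - means[i][0]) ** 2 +
               (positions[j][1] - means[i][1]) ** 2
               for i, members in enumerate(clusters) for j in members)
```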
Step 2: after the similar cluster division is carried out on the spatial node positions in the first step, because the Gaussian mixture clustering can accurately quantize objects, the similarity analysis is further carried out on the data acquired at the same time in the same cluster by adopting the Gaussian mixture clustering algorithm in the second stage, so that the redundancy of the nodes on the spatial correlation is more accurate. The core of Gaussian mixture clustering is a probability model, prototype data are analyzed and described by adopting the probability model, and cluster division is mainly determined by posterior probability corresponding to a prototype.
The Gaussian distribution is defined for a random variable x in an n-dimensional sample space X; if x follows a Gaussian distribution, its probability density function p(x) is:
p(x) = 1 / ((2π)^(n/2) |Σ|^(1/2)) · exp(−(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ))
where μ is the n-dimensional mean vector and Σ is the n × n covariance matrix. The Gaussian distribution is completely determined by these two parameters, μ and Σ. For convenience of description, the dependence of the density on its parameters is written as p(x | μ, Σ). The Gaussian mixture distribution p_M is:
p_M(x) = Σ_{i=1}^{K_1} α_i · p(x | μ_i, Σ_i)
The distribution consists of K_1 mixture components, each corresponding to one Gaussian distribution, where μ_i and Σ_i are the parameters of the i-th Gaussian mixture component and α_i > 0 is the corresponding "mixing coefficient", with Σ_{i=1}^{K_1} α_i = 1.
the wireless sensor network S1 is composed of K clusters, where all data generated by a cluster can be represented as a set X ═ X1,X2,…,Xn}。Xi={xi(t1),xi(t2),…,xi(t2) Wherein i is more than or equal to 1 and less than or equal to n is a sensor node s per T secondsiThe generated time series set. Each cluster head CH in the whole wireless sensor network continues to classify and cluster the data correlation of the nodes in the cluster, and the Gaussian mixture clustering algorithm is adopted to collect the data sensed at the same time in the similar cluster of the same spatial nodeIs divided into component K1A cluster of 1 ≦ j&&1≤h≤K1。
Let random variableRepresents node j1Is sensed dataThe gaussian mixture component of (a), which is a random value.Prior probability of (2)Corresponds to alphai(i=1,2,…,K1). According to the Bayes' theorem,the posterior distribution of (a) corresponds to:
is expressed as a sampleThe posterior probability generated from the ith Gaussian mixture component is recorded as
After the Gaussian mixture distribution is obtained, the Gaussian mixture clustering will collect the sample setIs divided into K1Each cluster C ═ Ci1,Ci2,Ci3,…,CiK1}(0<i≤K1) Each sample ofCluster mark ofComprises the following steps:
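The posterior computation and cluster labelling can be sketched for the one-dimensional case as follows; function names are illustrative assumptions:

```python
import math

def _pdf(x, mu, var):
    # one-dimensional Gaussian density for scalar temperature readings
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def responsibilities(x, alphas, mus, vars_):
    # Bayes: gamma_ji = alpha_i p(x|mu_i) / sum_l alpha_l p(x|mu_l)
    num = [a * _pdf(x, m, v) for a, m, v in zip(alphas, mus, vars_)]
    total = sum(num)
    return [n / total for n in num]

def cluster_label(x, alphas, mus, vars_):
    # lambda_j = argmax_i gamma_ji
    g = responsibilities(x, alphas, mus, vars_)
    return g.index(max(g))
```

A reading near a component's mean receives nearly all of the posterior mass, so the argmax assigns it to that component's cluster.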
and carrying out iterative optimization solution by using an EM algorithm.
The mean μ_i of each mixture component can be estimated by a sample weighted average, the sample weight being the posterior probability γ_ji that each sample belongs to that component:
μ_i = ( Σ_{j=1}^{m} γ_ji·x_j ) / ( Σ_{j=1}^{m} γ_ji )
Similarly, the covariance Σ_i can be obtained as:
Σ_i = ( Σ_{j=1}^{m} γ_ji (x_j − μ_i)(x_j − μ_i)ᵀ ) / ( Σ_{j=1}^{m} γ_ji )
For the mixing coefficients, the Lagrangian form of LL(D) is:
LL(D) − ρ( Σ_{i=1}^{K_1} α_i − 1 )
where ρ is the Lagrange multiplier. Setting the derivative with respect to α_i to 0 gives
Σ_{j=1}^{m} γ_ji / α_i − ρ = 0
Multiplying both sides by α_i and summing over all mixture components yields ρ = m, so that
α_i = (1/m) Σ_{j=1}^{m} γ_ji
That is, the mixing coefficient of each Gaussian component is the average posterior probability that a sample belongs to that component.
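These EM updates (posterior-weighted means and variances, and mixing coefficients as average posteriors) can be sketched for one-dimensional data as below; the initialisation with evenly spaced means and unit variances is an assumption of the example:

```python
import math

def em_gmm_1d(data, k, iters=50):
    """EM for a one-dimensional Gaussian mixture: mu_i and the variance are
    posterior-weighted averages, and alpha_i is the average posterior
    probability of membership."""
    lo, hi = min(data), max(data)
    mus = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    vars_ = [1.0] * k
    alphas = [1.0 / k] * k
    m = len(data)
    for _ in range(iters):
        # E-step: gamma[j][i], posterior of component i for sample j (Bayes)
        gamma = []
        for x in data:
            num = [a * math.exp(-(x - mu) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                   for a, mu, v in zip(alphas, mus, vars_)]
            s = sum(num)
            gamma.append([n / s for n in num])
        # M-step: weighted mean, weighted variance, average posterior
        for i in range(k):
            gi = sum(g[i] for g in gamma)
            if gi > 0:
                mus[i] = sum(g[i] * x for g, x in zip(gamma, data)) / gi
                vars_[i] = max(sum(g[i] * (x - mus[i]) ** 2
                                   for g, x in zip(gamma, data)) / gi, 1e-6)
            alphas[i] = gi / m   # alpha_i = (1/m) * sum_j gamma_ji
    return alphas, mus, vars_
```

The variance floor of 1e-6 is a common practical guard against a component collapsing onto a single sample.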
Step 3: according to the cluster partition obtained in step 2, the CHs perform a random weighted average on the data generated by the nodes in each data similarity cluster; the redundancy removal result x̂(t_j) is:
x̂(t_j) = β_1·x_w(t_j) + β_2·x_a(t_j) + … + β_v·x_b(t_j)
wherein β_1, β_2, …, β_v are weighting factors with β_1 + β_2 + … + β_v = 1, and x_w(t_j), x_a(t_j), …, x_b(t_j) are the sensing data generated at time t_j by the nodes s_w, s_a, …, s_b of the same data similarity cluster.
Step 4: the redundancy removal result x̂(t_j) of step 3 is transmitted to the aggregation node Sink.
And (3) redundancy removal and energy consumption analysis of experimental data:
in this experimental part, data from the Intel Berkeley Research Laboratory are used for the analysis. The laboratory deployed 54 sensor nodes in total, each monitoring the temperature changes at a different position in the laboratory. The sensor nodes collected data every 0.5 minutes over about one month, giving roughly forty thousand readings per node and, across all 54 nodes, a total of about two million readings, a huge data volume. In the preliminary test, the forty thousand readings of one node are used for analysis and improvement. The raw data of node 1 is shown in FIG. 4.
In FIG. 4, the x-axis represents time and the y-axis represents temperature. The temperature varies dramatically with time: it reaches a minimum near 430 minutes, then increases to a maximum peak around 750 minutes, decreases again to a minimum at 1800 minutes, then rises once more to a maximum at 2250 minutes before falling again. Because the variability of the data is very large, the extrema around 430, 750, 1800 and 2250 minutes are the most distinctive points of the data. Therefore, in the data de-redundancy process, particular attention must be paid to the de-redundancy of these four positions.
To verify the performance of the proposed method, the study was simulated in Python 3.6. The model transmits data in a single-hop manner. To verify the data transmission algorithm and the network life cycle, a data transmission model and a node energy consumption model are adopted to describe the data transmission and node energy consumption in the network. The model parameters are shown in Table 1.
Table 1 parameter set-up for simulation experiments
(1) In the first stage, the Sink classifies and clusters the nodes by coordinate position with the improved k-Means clustering algorithm. Assuming k = 4 and β ∈ {0, 0.3, 0.5, 0.7, 1}, the four classification clusters clearly change as β changes. The diamonds in the figure represent the cluster centers of the four classification clusters; the classification clustering results are shown in FIG. 5.
As can be seen from FIG. 5, the classification of the 54 nodes mainly shows two cases: 1) the node classification varies with β without altering the cluster classification; 2) the node classification varies with β and alters the cluster classification. The nodes that change noticeably in the figure are S = {0, 2, 5, 9, 10, 19, 20, 32, 33, 45, 46}. Labelling the classification clusters clockwise from the top left, they are marked cluster C_1, cluster C_2, cluster C_3 and cluster C_4. For each noticeably changing node, the probability of belonging to each of the four categories is computed, and the node is assigned to the cluster of maximum probability. The corresponding probability distribution is shown in Table 2.
Table 2. Node distribution cluster probabilities
From the results in Table 2, the Sink assigns each easily changing node to the corresponding class according to its probability ratio, so that the final classification results of all nodes into clusters C_1, C_2, C_3 and C_4 are respectively expressed as:
C1={0,2,21,22,23,24,25,26,27,28,29,30,31,32,33};
C2={1,34,35,36,37,38,39,40,41,42,43,44};
C3={3,4,5,6,7,8,9,45,46,47,48,49,50,51,52,53};
C4={10,11,12,13,14,15,16,17,18,19,20}。
(2) In the second stage, the cluster heads CHs of clusters C_1, C_2, C_3 and C_4 each run the Gaussian mixture clustering algorithm, continuously acquiring the sensing data of the in-cluster nodes at two different moments to analyze the data similarity between nodes. After a period of data similarity judgment, the final classification results of the nodes in the four clusters C_1, C_2, C_3 and C_4 are computed. The similar classification result between two consecutive sensing data of each node in cluster C_1 is shown in FIG. 6.
In FIG. 6, the abscissa represents the earlier of two consecutively sensed data and the ordinate represents the later; the similar classification result clusters among the nodes in cluster C_1 are evident from the figure. Cluster C_1 is divided into C_11 = {22, 25, 28, 30, 32}, C_12 = {23, 24, 26}, C_13 = {27, 29, 31, 33} and C_14 = {0, 2, 21}.
In the same way, cluster C_2 is divided into C_21 = {1, 34, 35, 36}, C_22 = {37, 38, 39} and C_23 = {40, 41, 42, 43, 44}; cluster C_3 is divided into C_31 = {3, 4, 5, 6, 7}, C_32 = {8, 9, 45, 46} and C_33 = {47, 48, 49, 50, 51, 52, 53}; cluster C_4 is divided into C_41 = {10, 11, 12}, C_42 = {13, 14, 15, 16} and C_43 = {17, 18, 19, 20}.
(3) In the third stage, the data in each data similarity cluster are randomly weighted to obtain the final de-redundancy result. This stage takes sub-cluster C_13 = {27, 29, 31, 33} of cluster C_1 as an example. For convenience of calculation, the random weighting factors are set so that β_1 + β_2 + … + β_v = 1 with β_1 = β_2 = … = β_v. The de-redundant data result is then analyzed, as shown in FIG. 7, together with the error between the result and the original data, as shown in FIG. 8.
Table 3. Comparison of the mean errors of the nodes in cluster C_13
As can be seen from FIG. 7, the result data of cluster C_13 after redundancy removal tends toward the center of the redundant nodes' sensing data; the sensing data of nodes 29 and 33 lie close to the result data, whereas nodes 27 and 31 are relatively far from it. FIG. 8 shows the error between each node's sensing data in cluster C_13 and the de-redundant result data; it is obvious from the figure that the errors of nodes 29 and 33 are much lower than those of nodes 27 and 31. From the mean errors of the individual sensing data of each node in cluster C_13 in Table 3, the mean error of node 27 is 0.348, that of node 29 is 0.043, that of node 31 is 0.337 and that of node 33 is 0.056. This indicates that even data belonging to the same similarity class can still differ considerably. A method that performs similarity analysis using only node coordinate positions therefore lacks the spatial correlation analysis between the data, which can cause larger errors between the data. The staged, hierarchical clustering guarantees the accuracy of the similarity between node data.
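The per-node mean error against the fused result can be computed as in this sketch, assuming equal weights β_i = 1/v; function names are illustrative:

```python
def fuse_equal(cluster_series):
    # equal-weight fusion (beta_i = 1/v) of per-node series in one similarity cluster
    v = len(cluster_series)
    return [sum(col) / v for col in zip(*cluster_series)]

def mean_abs_error(node_series, fused):
    # per-node mean error against the fused de-redundancy result
    return sum(abs(a - b) for a, b in zip(node_series, fused)) / len(fused)
```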
(4) The data de-redundancy rate of the TSDA algorithm is related to the number K_1 of data similarity clusters in the second stage, and the value of K_1 in turn affects the accuracy of the data correlation: the larger K_1 is, the more accurate the data correlation. The influence of varying K_1 on the de-redundancy rate (0 < K_1 ≤ number of nodes in cluster C_i) is shown in FIG. 9, and its effect on energy consumption in FIG. 10.
As can be seen from FIG. 9, the de-redundancy rate of the system gradually decreases as K_1 grows. When K_1 = 1, clusters C_1, C_2, C_3 and C_4 are not divided into similar sub-clusters; instead, all nodes of each cluster form a single similarity cluster, every node is regarded as redundant, and random weighted de-redundancy over them yields the result data, so the de-redundancy rate is maximal. However, K_1 = 1 amounts to judging only node position similarity without judging data similarity, so the error of the result data is also the largest. When K_1 = 10, each of the clusters C_1, C_2, C_3 and C_4 is divided into 10 data-similar sub-clusters, and random weighting is applied to each of the 10 sub-clusters separately; the data finally retained are the data of the 10 sub-clusters of each of C_1, C_2, C_3 and C_4, so the data de-redundancy rate is the lowest while the accuracy of the de-redundant result data is best guaranteed. As a compromise between data accuracy and de-redundancy rate, K_1 = 4 is selected for the network energy consumption analysis.
As can be seen from FIG. 10, the TSDA algorithm curve lies mainly between 97.50% and 98.0%, the TCDA algorithm curve mainly between 96.26% and 96.75%, and the gap between the two curves stays at about 1.25%. Therefore, compared with the TCDA algorithm, the TSDA algorithm can further remove 70% of the redundant data and reduce network energy consumption by a further 1.25%, while the error of the redundant nodes remains between 0.043 and 0.35. Within the error range a user allows, the de-redundancy rate can thus be made the highest.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (1)
1. A multi-stage hierarchical clustering spatial correlation temperature sensing data de-redundancy method is characterized by comprising the following steps:
step 1: acquiring a large amount of temperature sensing data acquired by a temperature sensor network, improving a k-Means method by using Euclidean distance and Pearson distance on a Sink node, and performing node similarity analysis on the node according to a node position coordinate to obtain a redundant node cluster;
the method for improving the k-Means method by using the Euclidean distance and the Pearson distance in the step 1 comprises the following steps:
the spatial similarity distance D(i, j) of two nodes is:
D(i, j) = D_E(i, j) + β·D_P(i, j)
wherein the Euclidean distance D_E(i, j) is:
D_E(i, j) = √((x_i − x_j)² + (y_i − y_j)²)
and the Pearson correlation distance D_P(i, j) is:
D_P(i, j) = 1 − ρ(i, j)
wherein ρ(i, j) is the Pearson correlation coefficient and β is a scale factor representing the weight of D_P(i, j) in D(i, j); the spatial position coordinates of the n SNs nodes in the sensor network S1 are (x_i, y_i), 1 ≤ i ≤ n, and the nodes are represented as the set S = {s_1, s_2, …, s_n}; the Sink node runs the improved k-Means algorithm according to the coordinate position set L = {l_1, l_2, …, l_n}, with l_i = (x_i, y_i) and 1 ≤ i ≤ n, corresponding to the nodes of the set S = {s_1, s_2, …, s_n}, and classifies the n nodes of S into K mutually disjoint subsets C_i, wherein C = {C_1, C_2, …, C_K}, C_1 ∪ C_2 ∪ … ∪ C_K = S, and C_i ∩ C_j = ∅ for i ≠ j; through the improved k-Means algorithm, S = {s_1, s_2, …, s_n} is clustered, and the cluster partition C = {C_1, C_2, …, C_K} is obtained;
The improved k-Means algorithm in the step 1 comprises the following specific steps:
step 1.1, setting the number k of clustering centers of an improved k-Means algorithm;
step 1.2, randomly selecting k nodes from the sensor network S1 as the initial mean vectors {μ_1, μ_2, …, μ_k};
step 1.3, respectively computing, for each node position coordinate l_j, the spatial similarity distance D(i, j) to each mean vector μ_i (1 ≤ i ≤ k): D(i, j) ← D_E(i, j) + β·D_P(i, j);
step 1.4, assigning each node position l_j to the cluster whose mean vector μ_i has the minimum distance D(i, j), i.e. λ_j = argmin_i D(i, j), and adding l_j to cluster C_{λ_j};
step 1.5, updating each mean vector μ_i to the mean of the position coordinates currently assigned to cluster C_i;
step 1.6, repeatedly executing step 1.3 to step 1.5 until the cluster division no longer changes, obtaining the clustering result;
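Steps 1.1 to 1.6 can be sketched as a small k-Means loop over the combined distance. The function name, the fixed iteration count, and the exact forms of D_E and D_P (standard Euclidean and 1 − r Pearson distance) are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def improved_kmeans(positions, series, k, beta=0.5, iters=20, seed=0):
    """Sketch of steps 1.1-1.6: cluster n nodes by the combined
    distance D(i, j) = D_E(i, j) + beta * D_P(i, j)."""
    rng = np.random.default_rng(seed)
    n = len(positions)
    # step 1.2: randomly pick k nodes as initial means
    centers = rng.choice(n, size=k, replace=False)
    mu_pos = positions[centers].astype(float)
    mu_ser = series[centers].astype(float)
    labels = np.zeros(n, dtype=int)
    for _ in range(iters):                      # step 1.6: repeat 1.3-1.5
        for j in range(n):                      # steps 1.3-1.4: nearest mean
            d_e = np.linalg.norm(positions[j] - mu_pos, axis=1)
            r = np.array([np.corrcoef(series[j], m)[0, 1] for m in mu_ser])
            labels[j] = int(np.argmin(d_e + beta * (1.0 - r)))
        for i in range(k):                      # step 1.5: update cluster means
            members = labels == i
            if members.any():
                mu_pos[i] = positions[members].mean(axis=0)
                mu_ser[i] = series[members].mean(axis=0)
    return labels
```

A production version would also track the position-coordinate set L explicitly and stop early once the division stabilizes, as step 1.6 describes.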
step 2: performing similarity judgment on data in the cluster by using a Gaussian mixture clustering method at a cluster head CHs node of redundant node clustering, thereby further performing data redundancy clustering on the nodes in the cluster;
the method in the step 2 specifically comprises the following steps:
the wireless sensor network S1 is composed of K clusters, where all data generated by one cluster is represented as the set X = {X_1, X_2, …, X_n}, and X_i = {x_i(t_1), x_i(t_2), …, x_i(t_T)}, 1 ≤ i ≤ n, is the time series generated by sensor node s_i every T seconds; each cluster head CH in the whole wireless sensor network continues to classify and cluster the intra-cluster nodes according to data correlation: a Gaussian mixture clustering algorithm divides the data D_CHh(t_j) = {x_1(t_j), x_2(t_j), …, x_z(t_j)}, sensed at the same instant t_j within the same spatial node-similarity cluster, into K_1 clusters, where 1 ≤ j ≤ T and 1 ≤ h ≤ K_1; the division result of the sample set is the K_1 clusters C = {C_i1, C_i2, C_i3, …, C_iK1}, 0 < i ≤ K_1;
The Gaussian mixture clustering method adopted in the step 2 specifically comprises the following steps:
let the random variable z_j1 ∈ {1, 2, …, K_1} denote the Gaussian mixture component that generated the sensing datum x_j1 of node j1; z_j1 is an unobserved random value whose prior probability P(z_j1 = i) corresponds to α_i (i = 1, 2, …, K_1); according to Bayes' theorem, the posterior distribution of z_j1 corresponds to:
p_M(z_j1 = i | x_j1) = α_i · p(x_j1 | μ_i, Σ_i) / Σ_{l=1}^{K_1} α_l · p(x_j1 | μ_l, Σ_l)
wherein p_M(z_j1 = i | x_j1) is the posterior probability that sample x_j1 was generated by the i-th Gaussian mixture component, recorded as γ_j1,i;
after the Gaussian mixture distribution is obtained, Gaussian mixture clustering divides the sample set into the K_1 clusters C = {C_i1, C_i2, C_i3, …, C_iK1}, 0 < i ≤ K_1, and the cluster label λ_j1 of each sample x_j1 is:
λ_j1 = argmax_{i ∈ {1, 2, …, K_1}} γ_j1,i;
the EM algorithm is used to iteratively optimize and solve the mixture parameters, yielding the division result of the sample set;
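The Gaussian mixture clustering of step 2 can be illustrated with a minimal one-dimensional EM loop, since the readings x_1(t_j)…x_z(t_j) sensed at one instant are scalars. The quantile-based initialization and the small variance floor are illustrative choices, not taken from the patent.

```python
import numpy as np

def gmm_cluster_1d(x, k, iters=50):
    """Sketch of step 2: 1-D Gaussian mixture clustering via EM.
    Each reading is assigned to the mixture component with the
    largest posterior gamma[j, i] (Bayes' theorem), i.e. the
    cluster label lambda_j = argmax_i gamma[j, i]."""
    x = np.asarray(x, dtype=float)
    alpha = np.full(k, 1.0 / k)                  # mixing coefficients alpha_i
    mu = np.quantile(x, np.linspace(0, 1, k))    # spread initial means
    var = np.full(k, x.var() + 1e-6)             # component variances
    for _ in range(iters):
        # E-step: gamma[j, i] proportional to alpha_i * N(x_j | mu_i, var_i)
        pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        gamma = alpha * pdf
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate alpha_i, mu_i, var_i from responsibilities
        nk = gamma.sum(axis=0)
        alpha = nk / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return gamma.argmax(axis=1)                  # cluster labels lambda_j
```

Readings that land in the same component form one data-redundancy cluster, which step 3 then collapses into a single value.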
and step 3: after the data redundancy clustering is obtained, carrying out random weighting on the data in the data redundancy clustering to obtain a final redundancy removing result;
the method in the step 3 specifically comprises the following steps:
according to the cluster division result obtained in step 2, the CHs perform a random weighted average on the data generated by the nodes in each data-similarity cluster; the redundancy removing result x̂(t_j) is:
x̂(t_j) = β_1·x_w(t_j) + β_2·x_a(t_j) + … + β_v·x_b(t_j)
wherein β_1, β_2, …, β_v are weighting factors with β_1 + β_2 + … + β_v = 1, and x_w(t_j), x_a(t_j), …, x_b(t_j) are the sensing data generated at time t_j by nodes s_w, s_a, …, s_b, respectively;
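Step 3's random weighted average can be sketched as follows. Drawing the weights β_1…β_v at random and normalizing them so they sum to 1 is an assumption about how the claim's "random weighting" is realized; the function name is hypothetical.

```python
import numpy as np

def deredundant_value(readings, weights=None, seed=0):
    """Sketch of step 3: collapse one data-redundancy cluster into a
    single value by weighted averaging. If no weights are given, random
    weights beta_1..beta_v are drawn and normalized to sum to 1."""
    readings = np.asarray(readings, dtype=float)
    if weights is None:
        rng = np.random.default_rng(seed)
        weights = rng.random(len(readings))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()   # enforce beta_1 + ... + beta_v = 1
    return float(np.dot(weights, readings))
```

Because the weights are non-negative and sum to 1, the result is always a convex combination of the cluster's readings, so it stays inside their range.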
and 4, step 4: and transmitting the temperature data with the redundancy removed to the Sink node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010361344.0A CN111601358B (en) | 2020-04-30 | 2020-04-30 | Multi-stage hierarchical clustering spatial correlation temperature perception data redundancy removing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111601358A CN111601358A (en) | 2020-08-28 |
CN111601358B true CN111601358B (en) | 2021-05-18 |
Family
ID=72190901
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010361344.0A Active CN111601358B (en) | 2020-04-30 | 2020-04-30 | Multi-stage hierarchical clustering spatial correlation temperature perception data redundancy removing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111601358B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112994991B (en) * | 2021-05-20 | 2021-07-16 | 中南大学 | Redundant node discrimination method, device and equipment and readable storage medium |
CN116992220B (en) * | 2023-09-25 | 2023-12-19 | 国网北京市电力公司 | Low-redundancy electricity consumption data intelligent acquisition method |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2502775A (en) * | 2012-05-31 | 2013-12-11 | Toshiba Res Europ Ltd | Selecting routes between nodes in a network based on node processing gain and lifetime |
CN107241776B (en) * | 2017-07-18 | 2019-03-22 | 中南民族大学 | A kind of wireless sensor network data fusion method mixing delay sensitive sub-clustering |
CN109446028B (en) * | 2018-10-26 | 2022-05-03 | 中国人民解放军火箭军工程大学 | Method for monitoring state of refrigeration dehumidifier based on genetic fuzzy C-mean clustering |
CN110830946B (en) * | 2019-11-15 | 2020-11-06 | 江南大学 | Mixed type online data anomaly detection method |
2020-04-30 — CN202010361344.0A, patent CN111601358B — Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||