CN111601358B - Multi-stage hierarchical clustering spatial correlation temperature perception data redundancy removing method - Google Patents


Info

Publication number: CN111601358B
Authority: CN (China)
Prior art keywords: cluster, data, node, clustering, redundancy
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN202010361344.0A
Other languages: Chinese (zh)
Other versions: CN111601358A
Inventors: 朱容波, 王俊, 李媛丽
Current assignee: South Central Minzu University
Original assignee: South Central University for Nationalities
Application filed by South Central University for Nationalities; priority to CN202010361344.0A
Publication of CN111601358A; application granted; publication of CN111601358B


Classifications

    • H04W 40/20 — Communication route or path selection, e.g. power-based or shortest path routing, based on geographic position or location (under H: Electricity; H04: Electric communication technique; H04W: Wireless communication networks; H04W 40/00: Communication routing or communication path finding; H04W 40/02: Communication route or path selection)
    • H04W 40/248 — Connectivity information update (under H04W 40/24: Connectivity information management, e.g. connectivity discovery or connectivity update)
    • H04W 84/18 — Self-organising networks, e.g. ad-hoc networks or sensor networks (under H04W 84/00: Network topologies)

Abstract

The invention discloses a multi-stage hierarchical clustering spatially correlated temperature sensing data redundancy removal method, which comprises the following steps. Step 1: acquire a large amount of temperature sensing data collected by a temperature sensor network; on the Sink node, improve the k-Means method with the Euclidean distance and the Pearson distance, and perform node similarity analysis according to the node position coordinates to obtain redundant node clusters. Step 2: at the cluster head (CHs) node of each redundant node cluster, judge the similarity of the intra-cluster data with a Gaussian mixture clustering method, thereby further clustering the intra-cluster nodes by data redundancy. Step 3: after the data-redundancy clusters are obtained, apply random weighting to the data within each cluster to obtain the final redundancy-removal result. Step 4: transmit the redundancy-removed temperature data to the Sink node. The method judges redundant nodes more accurately, so the judgment of redundant data is more accurate and the error of the redundancy-removal result is smaller.

Description

Multi-stage hierarchical clustering spatial correlation temperature perception data redundancy removing method
Technical Field
The invention relates to the technical field of wireless sensor networks, and in particular to a multi-stage hierarchical clustering spatially correlated temperature sensing data redundancy removal method.
Background
Wireless sensor networks (WSNs) are deployed in an area to monitor physical phenomena such as temperature, humidity and seismic events. To obtain accurate information about the environment or events, a large number of sensing nodes are deployed to collect data, which they report to the aggregation node (Sink) at high frequency. The data generated by the sensor nodes generally have high spatio-temporal correlation and contain a large amount of redundancy, and transmitting redundant data causes unnecessary power consumption. How to reduce the energy consumed transmitting redundant data and extend the lifetime of WSNs is therefore a very important issue.
One line of work studies the spatio-temporal correlation by running two synchronized predictors, one on the sensor node and one on the Sink. If the prediction error is smaller than a given threshold, the sensor node does not send data to the Sink, and the Sink takes the predicted value as the sensed data; this reduces the cost of data transmission and communication energy and prolongs the lifetime of the network. However, this approach increases the computational complexity of every sensor and cannot guarantee the true reliability of the predicted values. Likewise, judging redundant nodes only by node position lacks accuracy.
To address the inaccurate judgment of redundant nodes in WSNs caused by insufficient judgment conditions, a staged hierarchical clustering similarity redundancy removal method (TSDA) is proposed. The method comprises three stages. In the first stage, the Sink judges node similarity with an improved k-Means algorithm based on node position information and clusters all nodes. In the second stage, the cluster heads (CHs) judge the similarity of the sensed data generated at the same moment by the nodes within a cluster using a Gaussian mixture clustering algorithm, so that intra-cluster node similarity is judged accurately. In the third stage, the sensed data of similar nodes within a cluster are randomly weighted as the redundancy-removal result, which is then transmitted and stored. The algorithm is suitable for clustered networks and mainly comprises a k-Means classification model, a Gaussian mixture classification model and a random-weighting redundancy-removal model. Removing redundant data according to both the node positions and the similarity of the sensed data of intra-cluster nodes can effectively improve the accuracy of node similarity, improve the judgment of redundant data, and further extend the life cycle of the network.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a multi-stage hierarchical clustering spatially correlated temperature sensing data redundancy removal method, addressing the defect of the prior art that judging redundant nodes in WSNs only by node position provides insufficient judgment conditions and therefore judges redundant nodes inaccurately.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the invention provides a multi-stage hierarchical clustering space correlation temperature perception data redundancy removing method, which comprises the following steps:
Step 1: acquire a large amount of temperature sensing data collected by a temperature sensor network; on the Sink node, improve the k-Means method with the Euclidean distance and the Pearson distance, and perform node similarity analysis according to the node position coordinates to obtain redundant node clusters;
Step 2: at the cluster head (CHs) node of each redundant node cluster, judge the similarity of the intra-cluster data with a Gaussian mixture clustering method, thereby further clustering the intra-cluster nodes by data redundancy;
Step 3: after the data-redundancy clusters are obtained, apply random weighting to the data within each data-redundancy cluster to obtain the final redundancy-removal result;
Step 4: transmit the redundancy-removed temperature data to the Sink node.
Further, in step 1 of the invention, the k-Means method is improved with the Euclidean distance and the Pearson distance as follows:
The spatial similarity distance D(i, j) of two nodes is:
D(i, j) = D_E(i, j) + β · D_P(i, j)
where the Euclidean distance D_E(i, j) is:
D_E(i, j) = √( (x_i − x_j)² + (y_i − y_j)² )
and the Pearson correlation distance D_P(i, j) is:
D_P(i, j) = 1 − ρ(i, j)
where ρ(i, j) is the Pearson correlation coefficient between the feature vectors of nodes s_i and s_j, and β is a scale factor that controls the weight of D_P(i, j) in D(i, j). The spatial position coordinates of the n SNs nodes in the sensor network S1 are (x_i, y_i), 1 ≤ i ≤ n, and the nodes are represented as the set S = {s_1, s_2, …, s_n}. The Sink node runs the improved k-Means algorithm on the coordinate set L = {l_1, l_2, …, l_n} corresponding to the nodes of S, where l_i = (x_i, y_i) and 1 ≤ i ≤ n, and divides the n nodes of S = {s_1, s_2, …, s_n} into K mutually disjoint subsets C_i, where C = {C_1, C_2, …, C_K}, C_1 ∪ C_2 ∪ … ∪ C_K = S, and C_i ∩ C_j = ∅ for i ≠ j. The improved k-Means algorithm thus clusters S = {s_1, s_2, …, s_n} and obtains the cluster partition C = {C_1, C_2, …, C_K}.
Further, the improved k-Means algorithm in step 1 of the present invention comprises the following specific steps:
Step 1.1: set the number k of cluster centers of the improved k-Means algorithm;
Step 1.2: randomly select k nodes from the sensor network S1 as the initial means {μ_1, μ_2, …, μ_k};
Step 1.3: compute the spatial similarity distance D(i, j) between each position coordinate l_j and each mean vector μ_i (1 ≤ i ≤ k): D(i, j) ← D_E(i, j) + β · D_P(i, j);
Step 1.4: determine the cluster of node position l_j by the μ_i at minimum distance: λ_j = argmin_{i ∈ {1, …, k}} D(i, j), and add l_j to cluster C_λj;
Step 1.5: update each μ_i: μ_i = (1 / |C_i|) Σ_{l ∈ C_i} l;
Step 1.6: repeat steps 1.3 to 1.5 until the mean vectors no longer change, which yields the clustering result.
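The loop of steps 1.1 to 1.6 can be sketched in Python with NumPy (the experiments below use Python 3.6). The function names, the default β, and the fallback for constant vectors are illustrative assumptions; the distance follows the dual metric D = D_E + β·D_P described above.

```python
import numpy as np

def pearson_distance(a, b):
    """D_P = 1 - Pearson correlation; treated as 0 when a vector is constant."""
    if np.std(a) == 0 or np.std(b) == 0:
        return 0.0
    return 1.0 - np.corrcoef(a, b)[0, 1]

def dual_distance(a, b, beta=0.5):
    """Spatial similarity distance D = D_E + beta * D_P."""
    return np.linalg.norm(a - b) + beta * pearson_distance(a, b)

def improved_kmeans(points, k, beta=0.5, iters=100, seed=0):
    """Steps 1.1-1.6: k-Means under the dual-metric distance."""
    rng = np.random.default_rng(seed)
    # Step 1.2: pick k distinct points as initial means
    means = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Steps 1.3-1.4: assign each point to the nearest mean under D
        labels = np.array([
            int(np.argmin([dual_distance(p, m, beta) for m in means]))
            for p in points
        ])
        # Step 1.5: recompute each cluster mean (keep old mean if cluster empty)
        new_means = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
            for j in range(k)
        ])
        # Step 1.6: stop once the means no longer change
        if np.allclose(new_means, means):
            break
        means = new_means
    return labels, means
```

On 2-D coordinates the Pearson term is coarse (the correlation of two 2-vectors is ±1), so the Euclidean component dominates the assignment, as in the position-based first stage.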
Further, the method in step 2 of the present invention is specifically:
The wireless sensor network S1 consists of K clusters, where all data produced by a cluster is represented as the set X = {X_1, X_2, …, X_n}; X_i = {x_i(t_1), x_i(t_2), …, x_i(t_m)}, 1 ≤ i ≤ n, is the time-series set generated by sensor node s_i every T seconds. Each cluster head CH in the wireless sensor network continues to classify and cluster the intra-cluster nodes by data correlation: the Gaussian mixture clustering algorithm divides the sample set D of data sensed at the same time by the nodes of one spatial-similarity cluster into K1 clusters, and the division result is the K1 clusters C = {C_i1, C_i2, C_i3, …, C_iK1}, 0 < i ≤ K1.
Further, the Gaussian mixture clustering method adopted in step 2 of the present invention is specifically:
Let the random variable z_j ∈ {1, 2, …, K1} denote the Gaussian mixture component of the sensed data x_j of node j; z_j is an unknown random value. The prior probability P(z_j = i) corresponds to α_i (i = 1, 2, …, K1). According to Bayes' theorem, the posterior distribution of z_j corresponds to:
γ_ji = p_M(z_j = i | x_j) = α_i · p(x_j | μ_i, Σ_i) / Σ_{l=1}^{K1} α_l · p(x_j | μ_l, Σ_l)
that is, γ_ji is the posterior probability that sample x_j was generated by the i-th Gaussian mixture component. After the Gaussian mixture distribution is obtained, Gaussian mixture clustering divides the sample set D into K1 clusters C = {C_i1, C_i2, C_i3, …, C_iK1}, 0 < i ≤ K1, and the cluster label λ_j of each sample x_j is:
λ_j = argmax_{i ∈ {1, …, K1}} γ_ji
The model parameters {(α_i, μ_i, Σ_i) | 1 ≤ i ≤ K1} are solved by maximizing the log-likelihood:
LL(D) = Σ_{j=1}^{m} ln( Σ_{i=1}^{K1} α_i · p(x_j | μ_i, Σ_i) )
Iterative optimization with the EM algorithm then yields the partition of the sample set D.
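Given learned mixture parameters, the posterior γ_ji and the cluster label λ_j = argmax_i γ_ji can be computed as below. This is a minimal univariate sketch (temperature samples are scalar); the function names are hypothetical.

```python
import numpy as np

def gauss_pdf(x, mu, var):
    """Univariate Gaussian density p(x | mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def posterior(x, alphas, mus, vars_):
    """gamma_ji: posterior probability of each mixture component given sample x."""
    weighted = np.array([a * gauss_pdf(x, m, v)
                         for a, m, v in zip(alphas, mus, vars_)])
    return weighted / weighted.sum()

def cluster_labels(samples, alphas, mus, vars_):
    """lambda_j = argmax_i gamma_ji for every sample."""
    return np.array([int(np.argmax(posterior(x, alphas, mus, vars_)))
                     for x in samples])
```

With two components centered at 20 °C and 25 °C, readings near 20 °C are labeled with the first component and readings near 25 °C with the second, which is exactly the per-time-slot similarity grouping the CHs perform.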
Further, the method in step 3 of the present invention is specifically:
According to the cluster partition C = {C_i1, C_i2, C_i3, …, C_iK1} obtained in step 2, the CHs perform a random weighted average of the data generated by the nodes within each data-similarity cluster; the redundancy-removal result x'(t_j) is:
x'(t_j) = β_1 · x_w(t_j) + β_2 · x_a(t_j) + … + β_v · x_b(t_j)
where β_1, β_2, …, β_v are weighting factors with β_1 + β_2 + … + β_v = 1, and x_w(t_j), x_a(t_j), …, x_b(t_j) are the sensed data generated at time t_j by nodes s_w, s_a, …, s_b, which belong to the same data-similarity cluster.
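The random weighted average can be sketched as follows. The name `deredundancy` is hypothetical, and drawing the weights uniformly at random before normalising them is an illustrative choice not fixed by the text; only the constraint Σβ = 1 comes from the method.

```python
import numpy as np

def deredundancy(samples, weights=None, rng=None):
    """Random weighted average of the sensed data in one similarity cluster.

    If weights is omitted, random weights beta_1..beta_v are drawn and
    normalised so that they sum to 1, as the method requires.
    """
    samples = np.asarray(samples, dtype=float)
    if weights is None:
        rng = rng or np.random.default_rng()
        weights = rng.random(len(samples))
        weights = weights / weights.sum()   # enforce sum(beta) == 1
    return float(np.dot(weights, samples))
```

With equal weights β_1 = … = β_v = 1/v (the choice used in the experiments), the result reduces to the plain mean of the cluster's readings; any valid weight vector yields a convex combination lying between the minimum and maximum reading.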
the invention has the following beneficial effects: the invention discloses a multi-stage hierarchical clustering space correlation temperature perception data redundancy removing method, which comprises the steps of performing redundancy removing processing on space redundancy nodes in three stages; in the process of removing redundancy of the sensing data, the redundant node can be judged more accurately, so that the judgment of the redundant data is more accurate, and the error of the result after removing redundancy is smaller. The invention improves the algorithm, so that the redundant data is removed more reasonably, and the network energy consumption is effectively reduced; experiments show that 70% of spatial redundancy data can be reduced, the data error is 0.2 ℃ on average, and meanwhile, 1.25% of energy consumption can be further reduced.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flowchart of the multi-stage hierarchical clustering spatial-correlation temperature sensing data redundancy removal method according to an embodiment of the present invention;
FIG. 2 is an algorithm flow diagram of an embodiment of the present invention;
FIG. 3 is the system model of an embodiment of the invention;
FIG. 4 shows the raw data of a node in an embodiment of the present invention;
FIG. 5 shows node classification clustering by the improved k-Means in an embodiment of the present invention;
FIG. 6 shows the data similarity distribution of cluster C1 in an embodiment of the present invention;
FIG. 7 compares the data of cluster C13 before and after redundancy removal in an embodiment of the present invention;
FIG. 8 compares the errors of cluster C13 before and after redundancy removal in an embodiment of the present invention;
FIG. 9 shows the impact of K1 on the data redundancy-removal rate in an embodiment of the present invention;
FIG. 10 shows the impact of K1 on network energy consumption in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The temperature sensing data of the Intel Berkeley laboratory are used for analysis in the embodiments of the present invention. This temperature dataset contains about 40,000 readings per sensor node for 54 sensor nodes in total, a data volume of roughly 2,000,000 readings. The embodiment carries out the related work on the laboratory's temperature sensing data; the overall flow of the algorithm is shown in FIG. 1 and FIG. 2, and the system model in FIG. 3.
Step 1: First, node similarity analysis is performed on the Sink according to the node position coordinates. Accurate clustering requires a precise definition of the closeness between samples, based on pairwise similarity or distance. Among the various distances, the Euclidean distance is probably the most common for numerical data. However, the Euclidean distance describes only the magnitude difference of two feature vectors: the Euclidean distance between two differently shaped feature vectors may be smaller than that between similarly shaped ones. The correlation distance, in contrast, measures the difference in direction of two vectors rather than their magnitude. Therefore, the spatial similarity distance D(i, j) between two nodes is:
D(i, j) = D_E(i, j) + β · D_P(i, j)    (1)
where the Euclidean distance D_E(i, j) is:
D_E(i, j) = √( (x_i − x_j)² + (y_i − y_j)² )
and the Pearson correlation distance D_P(i, j) is:
D_P(i, j) = 1 − ρ(i, j)
where ρ(i, j) is the Pearson correlation coefficient and β is a scale factor that controls the weight of D_P(i, j) in D(i, j). This dual-metric distance satisfies the three distance properties: positivity, symmetry, and reflexivity. Under the dual-metric distance, any pair of feature vectors can be compared both in magnitude, via the Euclidean component, and in shape, via the correlation component.
The spatial position coordinates of the n SNs nodes in the sensor network S1 are (x_i, y_i), 1 ≤ i ≤ n, and the nodes are represented as the set S = {s_1, s_2, …, s_n}. The Sink runs the improved k-Means algorithm on the coordinate set L = {l_1, l_2, …, l_n} corresponding to the nodes of S, where l_i = (x_i, y_i) and 1 ≤ i ≤ n, and divides the n nodes of S into K mutually disjoint subsets C_i, where C = {C_1, C_2, …, C_K}, C_1 ∪ C_2 ∪ … ∪ C_K = S, and C_i ∩ C_j = ∅ for i ≠ j. The improved k-Means algorithm clusters S = {s_1, s_2, …, s_n} and obtains the cluster partition C = {C_1, C_2, …, C_K}, where the minimized squared error E of the clustering is:
E = Σ_{i=1}^{K} Σ_{l ∈ C_i} || l − μ_i ||²
where μ_i = (1 / |C_i|) Σ_{l ∈ C_i} l is the mean vector of cluster C_i.
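The minimized squared error E can be computed directly from a partition; a small NumPy sketch with assumed names:

```python
import numpy as np

def squared_error(points, labels, k):
    """E = sum_i sum_{l in C_i} ||l - mu_i||^2, the k-Means objective."""
    e = 0.0
    for i in range(k):
        cluster = points[labels == i]
        if len(cluster):
            mu = cluster.mean(axis=0)            # mu_i: mean vector of cluster C_i
            e += ((cluster - mu) ** 2).sum()     # squared distances to mu_i
    return e
```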
Step 2: After the similar-cluster division of the spatial node positions in the first step, and because Gaussian mixture clustering can quantify objects accurately, the second stage further performs similarity analysis on the data collected at the same time within the same cluster using the Gaussian mixture clustering algorithm, so that the spatial-correlation redundancy of the nodes is judged more accurately. The core of Gaussian mixture clustering is a probabilistic model: the prototype data are analysed and described with a probability model, and the cluster division is determined mainly by the posterior probability corresponding to each prototype.
The Gaussian distribution is defined for a random variable x in an n-dimensional sample space X; if x follows a Gaussian distribution, its probability density function p(x) is:
p(x) = 1 / ( (2π)^{n/2} |Σ|^{1/2} ) · exp( −(1/2) (x − μ)^T Σ^{−1} (x − μ) )
where μ is an n-dimensional mean vector and Σ is an n × n covariance matrix. The Gaussian distribution is completely determined by these two parameters, the mean vector μ and the covariance matrix Σ. For convenience of description, the dependence of the Gaussian density on its parameters is written p(x | μ, Σ).
The Gaussian mixture distribution p_M is:
p_M(x) = Σ_{i=1}^{K1} α_i · p(x | μ_i, Σ_i)
This distribution consists of K1 mixture components, each corresponding to one Gaussian distribution, where μ_i and Σ_i are the parameters of the i-th Gaussian mixture component and α_i > 0 is the corresponding "mixing coefficient", with Σ_{i=1}^{K1} α_i = 1.
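The density p(x | μ, Σ) and the mixture p_M(x) can be evaluated as below; a NumPy sketch with assumed function names.

```python
import numpy as np

def mvn_pdf(x, mu, cov):
    """n-dimensional Gaussian density p(x | mu, Sigma)."""
    n = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))
    # solve(cov, diff) computes Sigma^{-1} (x - mu) without an explicit inverse
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def mixture_pdf(x, alphas, mus, covs):
    """p_M(x) = sum_i alpha_i * p(x | mu_i, Sigma_i)."""
    return sum(a * mvn_pdf(x, m, c) for a, m, c in zip(alphas, mus, covs))
```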
the wireless sensor network S1 is composed of K clusters, where all data generated by a cluster can be represented as a set X ═ X1,X2,…,Xn}。Xi={xi(t1),xi(t2),…,xi(t2) Wherein i is more than or equal to 1 and less than or equal to n is a sensor node s per T secondsiThe generated time series set. Each cluster head CH in the whole wireless sensor network continues to classify and cluster the data correlation of the nodes in the cluster, and the Gaussian mixture clustering algorithm is adopted to collect the data sensed at the same time in the similar cluster of the same spatial node
Figure GDA0002957597100000079
Is divided into component K1A cluster of 1 ≦ j&&1≤h≤K1
Let the random variable z_j ∈ {1, 2, …, K1} denote the Gaussian mixture component of the sensed data x_j of node j; z_j is an unknown random value. The prior probability P(z_j = i) corresponds to α_i (i = 1, 2, …, K1). According to Bayes' theorem, the posterior distribution of z_j corresponds to:
γ_ji = p_M(z_j = i | x_j) = α_i · p(x_j | μ_i, Σ_i) / Σ_{l=1}^{K1} α_l · p(x_j | μ_l, Σ_l)
that is, γ_ji is the posterior probability that sample x_j was generated by the i-th Gaussian mixture component. After the Gaussian mixture distribution is obtained, Gaussian mixture clustering divides the sample set D into K1 clusters C = {C_i1, C_i2, C_i3, …, C_iK1} (0 < i ≤ K1), and the cluster label λ_j of each sample x_j is:
λ_j = argmax_{i ∈ {1, …, K1}} γ_ji
The model parameters {(α_i, μ_i, Σ_i) | 1 ≤ i ≤ K1} are solved by maximizing the log-likelihood:
LL(D) = Σ_{j=1}^{m} ln( Σ_{i=1}^{K1} α_i · p(x_j | μ_i, Σ_i) )
with iterative optimization by the EM algorithm.
If the parameters (α_i, μ_i, Σ_i) maximize LL(D), then setting ∂LL(D)/∂μ_i = 0 gives
μ_i = Σ_{j=1}^{m} γ_ji · x_j / Σ_{j=1}^{m} γ_ji
that is, the mean of each mixture component can be estimated as a weighted average of the samples, where the weight of each sample is the posterior probability γ_ji that it belongs to that component. Similarly, setting ∂LL(D)/∂Σ_i = 0 gives
Σ_i = Σ_{j=1}^{m} γ_ji (x_j − μ_i)(x_j − μ_i)^T / Σ_{j=1}^{m} γ_ji
For the mixing coefficients α_i, in addition to maximizing LL(D), the constraints α_i ≥ 0 and Σ_{i=1}^{K1} α_i = 1 must be satisfied. Consider the Lagrangian form of LL(D):
LL(D) + ρ ( Σ_{i=1}^{K1} α_i − 1 )
where ρ is the Lagrange multiplier. Setting the derivative of this Lagrangian with respect to α_i to 0 gives
Σ_{j=1}^{m} p(x_j | μ_i, Σ_i) / ( Σ_{l=1}^{K1} α_l · p(x_j | μ_l, Σ_l) ) + ρ = 0
Multiplying both sides by α_i and summing over all mixture components yields ρ = −m, and therefore
α_i = (1/m) Σ_{j=1}^{m} γ_ji
i.e., the mixing coefficient of each Gaussian component is the average posterior probability that a sample belongs to that component.
Thus, the sample set D is divided into K1 clusters C = {C_i1, C_i2, C_i3, …, C_iK1} (0 < i ≤ K1).
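The closed-form updates for μ_i, Σ_i and α_i give the EM iteration. The sketch below specialises to univariate data, as the temperature samples here are scalar; the quantile initialisation and the small variance floor are illustrative choices, not part of the described method.

```python
import numpy as np

def em_gmm_1d(x, k, iters=100):
    """EM for a univariate Gaussian mixture, implementing the updates
    mu_i    = sum_j g_ji * x_j / sum_j g_ji
    var_i   = sum_j g_ji * (x_j - mu_i)^2 / sum_j g_ji
    alpha_i = (1/m) * sum_j g_ji
    """
    x = np.asarray(x, dtype=float)
    m = len(x)
    # simple deterministic initialisation: spread the means over the data range
    mus = np.quantile(x, np.linspace(0.0, 1.0, k))
    vars_ = np.full(k, x.var() + 1e-6)
    alphas = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities gamma[i, j] = P(z_j = i | x_j)
        dens = np.array([a * np.exp(-(x - mu) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
                         for a, mu, v in zip(alphas, mus, vars_)])
        gamma = dens / dens.sum(axis=0)
        # M-step: closed-form parameter updates derived above
        nk = gamma.sum(axis=1)
        mus = (gamma @ x) / nk
        vars_ = np.array([(g * (x - mu) ** 2).sum()
                          for g, mu in zip(gamma, mus)]) / nk + 1e-6
        alphas = nk / m
    labels = gamma.argmax(axis=0)    # lambda_j = argmax_i gamma_ji
    return labels, mus, vars_, alphas
```

On readings drawn from two well-separated temperature groups, the iteration recovers one component per group and the mixing coefficients sum to 1, as the derivation requires.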
Step 3: According to the cluster partition C = {C_i1, C_i2, C_i3, …, C_iK1} obtained in step 2, the CHs perform a random weighted average of the data generated by the nodes within each data-similarity cluster; the redundancy-removal result x'(t_j) is:
x'(t_j) = β_1 · x_w(t_j) + β_2 · x_a(t_j) + … + β_v · x_b(t_j)
where β_1, β_2, …, β_v are weighting factors with β_1 + β_2 + … + β_v = 1, and x_w(t_j), x_a(t_j), …, x_b(t_j) are the sensed data generated at time t_j by nodes s_w, s_a, …, s_b of the same data-similarity cluster.
Step 4: The redundancy-removal result x'(t_j) from step 3 is transmitted to the aggregation node Sink.
And (3) redundancy removal and energy consumption analysis of experimental data:
In this experimental part, data from the Intel Berkeley Research Lab are used for the analysis. The laboratory deployed 54 sensor nodes in total, each monitoring the temperature changes at a different position in the laboratory. The sensor nodes collect data every 0.5 minutes and collected data for about one month, so each node contributes about 40,000 readings and the 54 nodes together about 2,000,000, a huge data volume. In the preliminary test, the 40,000 readings of one node are mainly used for analysis and improvement. The raw data for node 1 are shown in FIG. 4.
In FIG. 4, the x-axis represents time and the y-axis temperature. The temperature varies dramatically with time: it reaches a minimum near 430 minutes, increases to a maximum peak around 750 minutes, decreases again to a minimum at 1800 minutes, then rises once more to a maximum at 2250 minutes before decreasing again. Because the variability of the data is so large, the extrema around 430, 750, 1800 and 2250 minutes are the most distinctive points in the data. Therefore, during data redundancy removal, particular attention must be paid to the redundancy-removal behaviour at these four positions.
To verify the performance of the proposed method, the study was simulated using python 3.6. The model uses a single hop approach to transmit data. In order to verify a data transmission algorithm and a network life cycle in the network, a data transmission model and a node energy consumption model are adopted to describe data transmission and node energy consumption conditions in the network. The model parameters are shown in table 1.
Table 1 parameter set-up for simulation experiments
[Table 1 appears as an image in the original and is not reproduced here.]
(1) In the first stage, the Sink runs the improved k-Means clustering algorithm to classify and cluster the nodes according to their coordinate positions. Assuming k = 4 and β ∈ {0, 0.3, 0.5, 0.7, 1}, the four classification clustering results change noticeably with β. The diamonds in the figure represent the cluster centers of the four clusters; the classification clustering results are shown in FIG. 5.
As can be seen from FIG. 5, the classification of the 54 nodes mainly shows two cases: 1) the node classification varies with β without altering the cluster assignment; 2) the node classification varies with β and alters the cluster assignment. The nodes that change noticeably in the figure are S = {0, 2, 5, 9, 10, 19, 20, 32, 33, 45, 46}. Labelling the classification clusters clockwise from the top left as cluster C1, cluster C2, cluster C3 and cluster C4, the probability of each of the four classes is computed for every node that changes noticeably, and each such node is then assigned to the cluster with the maximum probability. The corresponding probability distribution is shown in Table 2.
TABLE 2 node distribution Cluster probability
[Table 2 appears as images in the original and is not reproduced here.]
Based on the results in Table 2, the Sink assigns each node that changes easily to its corresponding class according to the probability ratio, so that the final classification of all nodes into cluster C1, cluster C2, cluster C3 and cluster C4 is:
C1={0,2,21,22,23,24,25,26,27,28,29,30,31,32,33};
C2={1,34,35,36,37,38,39,40,41,42,43,44};
C3={3,4,5,6,7,8,9,45,46,47,48,49,50,51,52,53};
C4={10,11,12,13,14,15,16,17,18,19,20}。
(2) In the second stage, the cluster heads CHs of clusters C1, C2, C3 and C4 each run the Gaussian mixture clustering algorithm, continuously acquiring the sensed data of the intra-cluster nodes at two different moments to analyse the data similarity between nodes. After a period of data-similarity judgment, the final classification of the nodes in the four clusters C1, C2, C3 and C4 is computed. The similarity classification result between two consecutive sensed data for each node of cluster C1 is shown in FIG. 6.
In FIG. 6, the abscissa represents the earlier of two consecutively sensed data and the ordinate the later one, and the similarity clusters among the nodes of C1 are evident from the figure. Cluster C1 is divided into C11 = {22, 25, 28, 30, 32}, C12 = {23, 24, 26}, C13 = {27, 29, 31, 33} and C14 = {0, 2, 21}.
In the same way, cluster C2 is divided into C21 = {1, 34, 35, 36}, C22 = {37, 38, 39} and C23 = {40, 41, 42, 43, 44}; cluster C3 is divided into C31 = {3, 4, 5, 6, 7}, C32 = {8, 9, 45, 46} and C33 = {47, 48, 49, 50, 51, 52, 53}; cluster C4 is divided into C41 = {10, 11, 12}, C42 = {13, 14, 15, 16} and C43 = {17, 18, 19, 20}.
(3) In the third stage, the data in each data-similarity cluster are randomly weighted to obtain the final redundancy-removal result. This stage mainly takes sub-cluster C13 = {27, 29, 31, 33} of cluster C1 as an example. For convenience of calculation, the random weighting factors are set to β_1 + β_2 + … + β_v = 1 with β_1 = β_2 = … = β_v. The data after redundancy removal are then analysed, as shown in FIG. 7, together with their error relative to the original data, as shown in FIG. 8.
TABLE 3 Cluster C13Mean error comparison of middle nodes
[Table 3 appears as an image in the original and is not reproduced here.]
As seen in FIG. 7, the result data of cluster C13 after redundancy removal tend toward the center of the sensed data of the redundant nodes: the sensed data of nodes 29 and 33 lie close to the result data, whereas nodes 27 and 31 are comparatively far from them. FIG. 8 shows the error between each sensed datum of every node in cluster C13 and the corresponding result datum after redundancy removal; clearly, the errors of nodes 29 and 33 are much lower than those of nodes 27 and 31. From the mean error of the sensed data of each node of cluster C13 in Table 3, the mean error of node 27 is 0.348, of node 29 is 0.043, of node 31 is 0.337, and of node 33 is 0.056. This indicates that even data belonging to the same similarity class can still differ considerably. Therefore, a method that performs similarity analysis using only the coordinate positions of the nodes lacks the spatial correlation analysis between the data, which in turn may cause larger errors between the data. The staged, hierarchical clustering approach guarantees the accuracy of the similarity between node data.
(4) The data redundancy-removal rate of the TSDA algorithm is related to the data-similarity cluster number K1 of the second stage, and the value of K1 in turn affects the accuracy of the data correlation: the larger K1, the more accurate the data correlation. The influence of varying K1 (0 < K1 ≤ the number of nodes in cluster C_i) on the redundancy-removal rate is further analysed, as shown in FIG. 9, and its effect on energy consumption, as shown in FIG. 10.
As can be seen from FIG. 9, the de-redundancy rate of the system gradually decreases as K1 grows. When K1 = 1, the clusters C1, C2, C3 and C4 are not divided into similar sub-clusters; instead, all nodes in each of C1, C2, C3 and C4 are placed into a single similar cluster, every node in each cluster is regarded as a redundant node, and random weighted de-redundancy is performed over them to obtain the result data, so the de-redundancy rate is at its maximum. However, K1 = 1 is equivalent to performing only the node position similarity judgment on all nodes without any data similarity judgment, so the error of the result data is also the largest. When K1 = 10, the nodes in each of the clusters C1, C2, C3 and C4 are divided into 10 data similarity sub-clusters, and the 10 sub-clusters of each of C1, C2, C3 and C4 are randomly weighted separately to obtain the de-redundancy result; the data finally retained are those of the 10 sub-clusters of each of C1, C2, C3 and C4, so the data de-redundancy rate is the lowest while the accuracy of the de-redundant result data is best guaranteed. As a compromise between data accuracy and data de-redundancy rate, K1 = 4 is selected for the network energy consumption analysis.
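The trade-off above can be expressed with a minimal model: per cluster, only the K1 sub-cluster results are retained at each sampling instant, so the de-redundancy rate falls as K1 grows. This sketch assumes every node in a cluster contributes one reading per instant (an assumption of the illustration, not a formula stated in the patent):

```python
def deredundancy_rate(nodes_per_cluster: int, k1: int) -> float:
    """Fraction of per-instant readings removed when a cluster's nodes
    are grouped into k1 data-similarity sub-clusters, each of which
    keeps a single randomly weighted result value."""
    if not 0 < k1 <= nodes_per_cluster:
        raise ValueError("require 0 < K1 <= nodes per cluster")
    return 1.0 - k1 / nodes_per_cluster

# K1 = 1 keeps one value per cluster (maximum removal);
# K1 = nodes_per_cluster keeps every reading (no removal).
for k1 in (1, 4, 10):
    print(k1, deredundancy_rate(10, k1))
```

This reproduces the monotone behavior of FIG. 9: the rate is maximal at K1 = 1 and drops to its minimum when K1 equals the cluster size.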
As can be seen from FIG. 10, the TSDA algorithm curve lies mainly between 97.50% and 98.0%, the TCDA algorithm curve lies mainly between 96.26% and 96.75%, and the gap between the two curves stays at about 1.25%. Therefore, combining the TSDA algorithm with the TCDA algorithm can further remove 70% of the redundant data and further improve the network energy performance by about 1.25%, while keeping the error of the redundant nodes between 0.043 and 0.35. The de-redundancy rate can thus be maximized within the error range allowed by the user.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (1)

1. A multi-stage hierarchical clustering spatial correlation temperature sensing data de-redundancy method is characterized by comprising the following steps:
step 1: acquiring a large amount of temperature sensing data collected by the temperature sensor network, improving the k-Means method on the Sink node using the Euclidean distance and the Pearson distance, and performing node similarity analysis on the nodes according to their position coordinates to obtain redundant node clusters;
the method of improving the k-Means method using the Euclidean distance and the Pearson distance in step 1 is as follows:
the spatial similarity distance D(i, j) of two nodes is:
D(i, j) = DE(i, j) + βDP(i, j)
wherein the Euclidean distance DE(i, j) is:
DE(i, j) = √((xi − xj)² + (yi − yj)²)
and the Pearson correlation distance DP(i, j) is:
DP(i, j) = 1 − ρ(i, j)
where ρ(i, j) is the Pearson correlation coefficient between nodes i and j;
wherein β is a scale factor representing the influence of DP(i, j) on the weight of D(i, j); the spatial position coordinates of the n SN nodes in the sensor network S1 are (xi, yi), 1 ≤ i ≤ n, and the nodes are represented as the set S = {s1, s2, …, sn}; the Sink node runs the improved k-Means algorithm on the coordinate position set L = {l1, l2, …, ln} corresponding to the nodes of S = {s1, s2, …, sn}, where li = (xi, yi) and 1 ≤ i ≤ n, and divides the n nodes of the set S into K mutually disjoint subsets Ci, where C = {C1, C2, …, CK}, C1 ∪ C2 ∪ … ∪ CK = S, Ci ≠ ∅ for 1 ≤ i ≤ K, and Ci ∩ Cj = ∅ for i ≠ j;
through the improved k-Means algorithm, S = {s1, s2, …, sn} is clustered to obtain the cluster division C = {C1, C2, …, CK};
the specific steps of the improved k-Means algorithm in step 1 are:
step 1.1, setting the number k of clustering centers of the improved k-Means algorithm;
step 1.2, randomly selecting k nodes from the sensor network S1 as the initial means {μ1, μ2, …, μk};
step 1.3, computing the spatial similarity distance D(i, j) between each position coordinate lj and each mean vector μi (1 ≤ i ≤ k): D(i, j) ← DE(i, j) + βDP(i, j);
step 1.4, assigning the node position lj to the cluster whose mean μi gives the minimum distance D(i, j):
λj = argmin(1 ≤ i ≤ k) D(i, j), and C(λj) = C(λj) ∪ {lj};
step 1.5, updating each mean: μi = (1 / |Ci|) Σ(l ∈ Ci) l;
step 1.6, repeating steps 1.3 to 1.5 until the clustering result converges;
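Steps 1.1 to 1.6 can be sketched as below: a minimal illustration assuming 2-D node coordinates, with the Pearson distance taken in its standard form 1 − (Pearson correlation coefficient), since the patent's exact DP formula appears only as an image; the sample coordinates are hypothetical:

```python
import numpy as np

def pearson_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Standard Pearson correlation distance: 1 - correlation coefficient."""
    if np.std(a) == 0 or np.std(b) == 0:
        return 0.0  # degenerate (constant) vector: no correlation term
    return 1.0 - np.corrcoef(a, b)[0, 1]

def spatial_distance(a: np.ndarray, b: np.ndarray, beta: float = 0.5) -> float:
    """D(i,j) = DE(i,j) + beta * DP(i,j) from step 1.3."""
    return float(np.linalg.norm(a - b)) + beta * pearson_distance(a, b)

def improved_kmeans(coords: np.ndarray, k: int, beta: float = 0.5,
                    iters: int = 20, seed: int = 0):
    """Cluster node positions with the combined Euclidean+Pearson distance."""
    rng = np.random.default_rng(seed)
    means = coords[rng.choice(len(coords), size=k, replace=False)]  # step 1.2
    for _ in range(iters):
        # steps 1.3-1.4: assign each node to the nearest mean under D(i,j)
        labels = np.array([
            np.argmin([spatial_distance(c, m, beta) for m in means])
            for c in coords
        ])
        # step 1.5: recompute each mean from its current members
        for i in range(k):
            if np.any(labels == i):
                means[i] = coords[labels == i].mean(axis=0)
    return labels, means

# Four hypothetical node positions forming two well-separated groups.
coords = np.array([[0.0, 0.0], [0.5, 0.2], [9.0, 9.0], [9.5, 8.8]])
labels, means = improved_kmeans(coords, k=2)
print(labels)
```

With two well-separated groups, the assignment of step 1.4 places each pair of nearby nodes into the same cluster regardless of the random initialization.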
step 2: performing similarity judgment on data in the cluster by using a Gaussian mixture clustering method at a cluster head CHs node of redundant node clustering, thereby further performing data redundancy clustering on the nodes in the cluster;
the method in the step 2 specifically comprises the following steps:
the wireless sensor network S1 is composed of K clusters, where all data generated by a cluster is represented as a set X ═ X1,X2,…,Xn};Xi={xi(t1),xi(t2),…,xi(t2) Wherein i is more than or equal to 1 and less than or equal to n is a sensor node s per T secondsiA generated time series set; each cluster head CH in the whole wireless sensor network continues to classify and cluster the intra-cluster nodes according to the data correlation, and a Gaussian mixture clustering algorithm is adopted to collect data D sensed at the same time in the same spatial node similar clusterCHh(tj)={x1(tj),x2(tj),…,xz(tj) Is divided into K1A cluster of 1 ≦ j&&1≤h≤K1(ii) a Sample set
Figure FDA0002957597090000024
The division result is K1Each cluster C ═ Ci1,Ci2,Ci3,...,CiK1},0<i≤K1
the Gaussian mixture clustering method adopted in step 2 is specifically:
let the random variable zj ∈ {1, 2, …, K1} represent the Gaussian mixture component that generates the sensed data xj of node j, whose value is unknown; the prior probability P(zj = i) corresponds to αi (i = 1, 2, …, K1); according to Bayes' theorem, the posterior distribution of zj is:
pM(zj = i | xj) = αi · p(xj | μi, Σi) / Σ(l = 1, …, K1) αl · p(xj | μl, Σl)
the right-hand side gives the posterior probability that sample xj is generated by the i-th Gaussian mixture component, denoted γji;
after the Gaussian mixture distribution is obtained, Gaussian mixture clustering divides the sample set DCHh(tj) into K1 clusters C = {Ci1, Ci2, Ci3, …, CiK1}, 0 < i ≤ K1, where the cluster label λj of each sample xj is:
λj = argmax(i ∈ {1, 2, …, K1}) γji
the model parameters {(αi, μi, Σi) | 1 ≤ i ≤ K1} are solved by maximizing the log-likelihood:
LL(D) = Σ(j) ln( Σ(i = 1, …, K1) αi · p(xj | μi, Σi) )
and iterative optimization with the EM algorithm yields the division result of the sample set DCHh(tj);
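The Gaussian mixture step can be sketched for scalar temperature readings as below: a minimal 1-D EM implementation, not the patent's exact procedure; the value of K1, the deterministic mean initialization, and the sample readings are all assumptions of the illustration:

```python
import numpy as np

def gmm_cluster(x: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """1-D Gaussian mixture clustering via EM; returns a label per sample."""
    alpha = np.full(k, 1.0 / k)                 # mixture weights alpha_i
    mu = np.linspace(x.min(), x.max(), k)       # component means, spread out
    var = np.full(k, np.var(x) + 1e-6)          # component variances
    for _ in range(iters):
        # E-step: posterior gamma[j, i] = P(z_j = i | x_j) (Bayes' theorem)
        dens = (np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                / np.sqrt(2 * np.pi * var))
        gamma = alpha * dens
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate alpha, mu, var from the posteriors
        nk = gamma.sum(axis=0)
        alpha = nk / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    # cluster label: lambda_j = argmax_i gamma_ji
    return np.argmax(gamma, axis=1)

# Readings sensed at one instant t_j in one position-similarity cluster:
# two clear temperature groups, around 25.5 and 28.0 degrees.
x = np.array([25.4, 25.5, 25.6, 27.9, 28.0, 28.1])
labels = gmm_cluster(x, k=2)
print(labels)  # → [0 0 0 1 1 1]
```

The E-step computes exactly the posterior γji of the claim, and the label rule is the argmax over components, so the two temperature groups end up in separate data similarity clusters.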
step 3: after the data redundancy clustering is obtained, randomly weighting the data in each data redundancy cluster to obtain the final de-redundancy result;
the method in step 3 is specifically:
according to the cluster division C = {Ci1, Ci2, Ci3, …, CiK1} obtained in step 2, the CH performs a random weighted average over the data generated by the nodes in each data similarity cluster, and the de-redundancy result x̄(tj) is:
x̄(tj) = β1 · xw(tj) + β2 · xa(tj) + … + βv · xb(tj)
wherein β1, β2, …, βv are weighting factors with β1 + β2 + … + βv = 1; xw(tj), xa(tj), …, xb(tj) are the sensing data generated at time tj by the nodes sw, sa, …, sb respectively, and {sw, sa, …, sb} are the nodes of the data similarity cluster;
and 4, step 4: and transmitting the temperature data with the redundancy removed to the Sink node.
CN202010361344.0A 2020-04-30 2020-04-30 Multi-stage hierarchical clustering spatial correlation temperature perception data redundancy removing method Active CN111601358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010361344.0A CN111601358B (en) 2020-04-30 2020-04-30 Multi-stage hierarchical clustering spatial correlation temperature perception data redundancy removing method


Publications (2)

Publication Number Publication Date
CN111601358A CN111601358A (en) 2020-08-28
CN111601358B true CN111601358B (en) 2021-05-18

Family

ID=72190901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010361344.0A Active CN111601358B (en) 2020-04-30 2020-04-30 Multi-stage hierarchical clustering spatial correlation temperature perception data redundancy removing method

Country Status (1)

Country Link
CN (1) CN111601358B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112994991B (en) * 2021-05-20 2021-07-16 中南大学 Redundant node discrimination method, device and equipment and readable storage medium
CN116992220B (en) * 2023-09-25 2023-12-19 国网北京市电力公司 Low-redundancy electricity consumption data intelligent acquisition method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2502775A (en) * 2012-05-31 2013-12-11 Toshiba Res Europ Ltd Selecting routes between nodes in a network based on node processing gain and lifetime
CN107241776B (en) * 2017-07-18 2019-03-22 中南民族大学 A kind of wireless sensor network data fusion method mixing delay sensitive sub-clustering
CN109446028B (en) * 2018-10-26 2022-05-03 中国人民解放军火箭军工程大学 Method for monitoring state of refrigeration dehumidifier based on genetic fuzzy C-mean clustering
CN110830946B (en) * 2019-11-15 2020-11-06 江南大学 Mixed type online data anomaly detection method

Also Published As

Publication number Publication date
CN111601358A (en) 2020-08-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant