CN104994170A - Distributed clustering method based on mixed factor analysis model in sensor network - Google Patents

Distributed clustering method based on mixed factor analysis model in sensor network

Info

Publication number
CN104994170A
Authority
CN
China
Prior art keywords
node
data
css
Prior art date
Legal status
Granted
Application number
CN201510414218.6A
Other languages
Chinese (zh)
Other versions
CN104994170B (en)
Inventor
魏昕
周亮
周全
陈建新
王磊
赵力
Current Assignee
Nanjing Tian Gu Information Technology Co ltd
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Nanjing Post and Telecommunication University
Priority date
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201510414218.6A
Publication of CN104994170A
Application granted
Publication of CN104994170B
Legal status: Expired - Fee Related (current)
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W84/00: Network topologies
    • H04W84/18: Self-organising networks, e.g. ad-hoc networks or sensor networks


Abstract

The invention discloses a distributed clustering method based on a mixed factor analysis model in a sensor network. The method uses a mixed factor analysis model to describe the data to be clustered at the nodes of the sensor network. Each node computes local sufficient statistics from its own data and broadcasts them to its neighbour nodes; once a node has received the local sufficient statistics of all its neighbour nodes, it obtains combined sufficient statistics, estimates the parameters of the mixed factor analysis model from them, and finally completes the clustering with the estimated model. Because a mixed factor analysis model is established, the dimensionality of the data is reduced while they are clustered; because a distributed clustering method is adopted, the network collapse that a failing centre node can cause in the traditional centralized processing method is avoided. Since the nodes exchange sufficient statistics rather than data, the communication overhead is greatly reduced and the private information in the data is better protected.

Description

Distributed clustering method based on a mixed factor analysis model in a sensor network
Technical field
The present invention relates to a distributed clustering method based on a mixed factor analysis model in a sensor network, and belongs to the technical field of parallel and distributed data processing and its applications.
Background technology
A sensor network is made up of a large number of static or mobile micro sensor nodes deployed in a monitored area, and a single sensor node has very limited capabilities for collecting, storing, processing and transmitting data. Data processing in a sensor network therefore has to improve on traditional data processing. At present there are two main processing modes in sensor networks: centralized processing and distributed processing. In centralized processing, one node is designated as the centre node, the other nodes transmit the raw data they have collected to the centre node, the data are processed at the centre node, and the results are then returned to each node. The shortcoming of this mode is that once the centre node fails, the whole network suffers. The other mode is distributed processing, in which all nodes have equal status and the data processing task is completed through communication and cooperation between neighbouring nodes. Compared with centralized processing, distributed processing avoids the adverse effects of a centre-node failure, and the whole network is more robust. The present invention solves the above problem well.
Summary of the invention
The object of the invention is to overcome the defects of the prior art by proposing a distributed clustering method based on a mixed factor analysis model in a sensor network. Clustering is the process of dividing data into several classes by some method; a class produced by clustering is a set of data objects that are similar to one another within the same cluster and different from the objects in other clusters. Since the class labels of the data are unknown in clustering, clustering is an unsupervised learning process in the field of machine learning. Many data clustering methods exist, but most assume that all the data are clustered at a single processing centre, whereas in a sensor network distributed processing is crucial. The present method is designed precisely to solve this problem: a distributed clustering method based on a mixed factor analysis model. Its main advantages are: (1) the mixed factor analysis model can process high-dimensional data effectively; (2) by designing the cooperation between nodes so that only intermediate results are transmitted, a satisfactory clustering result is obtained; compared with transmitting the raw data, this both reduces the communication overhead and protects the private information in the data, ensuring data security in the network.
The technical solution adopted by the present invention to solve its technical problem is a distributed clustering method based on a mixed factor analysis model in a sensor network, comprising the following steps:
Suppose there are M sensor nodes in the sensor network, and node m collects N_m data, denoted Y_m = {y_{m,n}}_{n=1,...,N_m}, where y_{m,n} is the n-th datum at node m and its dimension is p. A mixed factor analysis model (MFA) is used to describe the distribution of Y_m (m = 1, ..., M); note that the data of all nodes share the same MFA. The MFA is a mixture model with I components; each datum y_{m,n} can be expressed as
y_{m,n} = μ_i + A_i u_{m,n} + e_{m,n,i}   with probability π_i (i = 1, ..., I),
where μ_i is the p-dimensional mean vector of the i-th mixture component; u_{m,n} is the factor in the low-dimensional space corresponding to the datum y_{m,n}, its dimension is q (q << p), it follows the Gaussian distribution N(u_{m,n} | 0, I_q), and the value of q is chosen according to the size of p in the particular problem, generally any integer between p/6 and p/2; A_i is the (p × q) factor loading matrix; the error e_{m,n,i} follows the Gaussian distribution N(e_{m,n,i} | 0, D_i), where D_i is a (p × p) diagonal matrix; and the probabilities π_i satisfy Σ_{i=1}^{I} π_i = 1. The parameter set of the MFA is therefore Θ = {π_i, A_i, μ_i, D_i}_{i=1,...,I}. Note that, for all nodes, the parameter values to be estimated in the MFA are identical.
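The model above implies that each datum is marginally distributed as the Gaussian mixture Σ_i π_i N(y | μ_i, A_i A_i^T + D_i). The following NumPy sketch draws data from such an MFA purely for illustration; the dimensions p, q, I are borrowed from the wine example later in the text, and the parameter values and function name are assumptions, not values prescribed by the invention.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, I = 13, 3, 3   # illustrative sizes only (taken from the wine example below)

# Illustrative MFA parameters: weights pi_i, means mu_i, loadings A_i (p x q), diagonal D_i (p x p)
pi = np.full(I, 1.0 / I)
mu = rng.normal(size=(I, p))
A = rng.normal(size=(I, p, q))
D = np.stack([np.diag(rng.uniform(0.5, 1.5, size=p)) for _ in range(I)])

def sample_mfa(n):
    """Draw n samples y = mu_i + A_i u + e, where component i is chosen with probability pi_i."""
    comps = rng.choice(I, size=n, p=pi)
    u = rng.normal(size=(n, q))                                   # factor u ~ N(0, I_q)
    e = np.stack([rng.multivariate_normal(np.zeros(p), D[i]) for i in comps])
    y = mu[comps] + np.einsum('npq,nq->np', A[comps], u) + e      # y = mu_i + A_i u + e
    return y, comps

Y, components = sample_mfa(200)
print(Y.shape)   # (200, 13); marginally y ~ sum_i pi_i N(mu_i, A_i A_i^T + D_i)
```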
In addition, the data transmission range of each node is set to W; for the current node m, all nodes whose distance to it is less than W are its neighbour nodes, and the neighbour-node set of node m is denoted R_m. Fig. 1 illustrates the relations among the nodes of a sensor network: circles represent nodes, and an edge between two nodes means that the two nodes can communicate with each other and exchange information. The dashed box in Fig. 1 marks the set R_m of node m. In the present invention the network topology is determined before distributed clustering is carried out, and it must be guaranteed that any two nodes can communicate either directly or over multiple hops.
Once the sensor network topology of the present invention and the MFA describing the data distribution have been established, the distributed clustering process starts; as shown in Fig. 2, its concrete steps are:
Step 1: initialization. There are M sensor nodes in the sensor network; node m collects N_m data, denoted Y_m = {y_{m,n}}_{n=1,...,N_m}, where y_{m,n} is the n-th datum at node m and its dimension is p. The network topology is determined in advance, and the neighbour-node set of node m is denoted R_m. A mixed factor analysis model (MFA) describes the distribution of Y_m (m = 1, ..., M), and the data of all nodes share the same MFA. The parameter set of the MFA is {π_i, A_i, μ_i, D_i}_{i=1,...,I}, where π_i is the weight of the i-th mixture component, A_i is the (p × q) factor loading matrix of the i-th mixture component, q is the dimension of the low-dimensional factor and is any integer between p/6 and p/2, μ_i is the p-dimensional mean vector of the i-th mixture component, and D_i is the covariance matrix of the error of the i-th mixture component.
First, the number of mixture components I of the MFA, which is also the number of classes to be clustered, is set, and the initial values of the parameters of the MFA are set according to I, p and q. At each node, (π_1^0, ..., π_i^0, ..., π_I^0) = (1/I, ..., 1/I, ..., 1/I); the initial mean vectors μ_i^0 are selected at random from the data collected at that node, and each element of the initial factor loading matrices A_i^0 is generated from the standard normal distribution N(0, 1). In addition, each node l broadcasts the number N_l of data it has collected to its neighbour nodes. When a node m has received the data counts broadcast by all of its neighbour nodes l (l ∈ R_m), it computes the weights c_{lm} according to
c_{lm} = N_l / Σ_{l'∈R_m} N_{l'};
After initialization is complete, the iteration counter is set to iter = 1 and the iterative process starts.
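For illustration, the weight computation of this step can be sketched in plain Python; the node counts N_l and neighbour sets R_m below are taken from the wine embodiment described later, and the function name is hypothetical.

```python
# Data counts N_l and neighbour sets R_m of the 8-node wine example (1-indexed node ids)
N = {1: 21, 2: 22, 3: 21, 4: 21, 5: 22, 6: 22, 7: 21, 8: 28}
R = {1: [3, 5, 6], 2: [3, 5], 3: [1, 4, 2], 4: [3],
     5: [1, 2], 6: [1, 7], 7: [6, 8], 8: [3, 7]}

def neighbour_weights(m):
    """c_lm = N_l / sum_{l' in R_m} N_l' for every neighbour l of node m."""
    total = sum(N[l] for l in R[m])
    return {l: N[l] / total for l in R[m]}

print(neighbour_weights(1))   # {3: 21/65, 5: 22/65, 6: 22/65}
```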
Step 2: local computation. At each node l, based on the data Y_l it has collected, first compute the intermediate variables g_i, Ω_i, ū_{l,n,i} and <z_{l,n,i}> (n = 1, ..., N_l; i = 1, ..., I):
g_i = [A_i^{old} (A_i^{old})^T + D_i^{old}]^{-1} A_i^{old},
Ω_i = I_q - g_i^T A_i^{old},
ū_{l,n,i} = g_i^T (y_{l,n} - μ_i^{old}),
<z_{l,n,i}> = π_i^{old} N(y_{l,n} | μ_i^{old}, A_i^{old}(A_i^{old})^T + D_i^{old}) / Σ_{i'=1}^{I} π_{i'}^{old} N(y_{l,n} | μ_{i'}^{old}, A_{i'}^{old}(A_{i'}^{old})^T + D_{i'}^{old}),
where π_i^{old}, A_i^{old}, μ_i^{old} and D_i^{old} are the parameter values obtained after the previous iteration (in the first iteration they are the initial parameter values), and <z_{l,n,i}> is the probability that the n-th datum y_{l,n} at node l belongs to the i-th class (mixture component).
Then node l computes the local sufficient statistics LSS_l = {LSS_l^{(1)}[i], LSS_l^{(2)}[i], LSS_l^{(3)}[i]}_{i=1,...,I}, which comprise:
LSS_l^{(1)}[i] = Σ_{n=1}^{N_l} <z_{l,n,i}>,  LSS_l^{(2)}[i] = Σ_{n=1}^{N_l} <z_{l,n,i}> · y_{l,n},  LSS_l^{(3)}[i] = Σ_{n=1}^{N_l} <z_{l,n,i}> · y_{l,n} y_{l,n}^T;
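A compact sketch of this local step in Python/NumPy follows; the helper and variable names are assumptions, and pi, mu, A, D stand for the current estimates Θ^{old} held at node l.

```python
import numpy as np

def gaussian_pdf(y, mean, cov):
    """Density of N(y | mean, cov) for a single p-dimensional point y."""
    p = len(mean)
    diff = y - mean
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, diff)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return np.exp(-0.5 * (p * np.log(2 * np.pi) + log_det + sol @ sol))

def local_step(Y_l, pi, mu, A, D):
    """Local computation at node l: responsibilities <z_{l,n,i}> and local sufficient statistics LSS_l."""
    N_l, p = Y_l.shape
    I, q = A.shape[0], A.shape[2]
    g = np.zeros((I, p, q))
    Omega = np.zeros((I, q, q))
    z = np.zeros((N_l, I))
    for i in range(I):
        Sigma_i = A[i] @ A[i].T + D[i]                 # component covariance A_i A_i^T + D_i
        g[i] = np.linalg.solve(Sigma_i, A[i])          # g_i = [A_i A_i^T + D_i]^{-1} A_i
        Omega[i] = np.eye(q) - g[i].T @ A[i]           # Omega_i = I_q - g_i^T A_i
        for n in range(N_l):
            z[n, i] = pi[i] * gaussian_pdf(Y_l[n], mu[i], Sigma_i)
    z /= z.sum(axis=1, keepdims=True)                  # normalise over the I components
    LSS1 = z.sum(axis=0)                               # LSS^(1)[i]
    LSS2 = z.T @ Y_l                                   # LSS^(2)[i], shape (I, p)
    LSS3 = np.einsum('ni,np,nq->ipq', z, Y_l, Y_l)     # LSS^(3)[i] = sum_n z * y y^T
    return z, g, Omega, (LSS1, LSS2, LSS3)
```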
Step 3: broadcast diffusion. Each node l in the sensor network broadcasts the local sufficient statistics LSS_l it has computed to its neighbour nodes.
Step 4: combined computation. When node m (m = 1, ..., M) has received LSS_l from all of its neighbour nodes l (l ∈ R_m), node m computes the combined sufficient statistics CSS_m = {CSS_m^{(1)}[i], CSS_m^{(2)}[i], CSS_m^{(3)}[i]}_{i=1,...,I}:
CSS_m^{(1)}[i] = Σ_{l∈R_m} c_{lm} · LSS_l^{(1)}[i],
CSS_m^{(2)}[i] = Σ_{l∈R_m} c_{lm} · LSS_l^{(2)}[i],
CSS_m^{(3)}[i] = Σ_{l∈R_m} c_{lm} · LSS_l^{(3)}[i];
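In code, the combination is simply a weighted sum of the received statistics; a minimal sketch (hypothetical function, assuming the LSS of every neighbour have been collected into a dictionary keyed by node id):

```python
def combine_statistics(received, weights):
    """CSS_m^(k)[i] = sum_{l in R_m} c_lm * LSS_l^(k)[i].

    received: {l: (LSS1, LSS2, LSS3)} from every neighbour l of node m
    weights:  {l: c_lm} computed during initialization
    """
    CSS1 = sum(weights[l] * lss[0] for l, lss in received.items())
    CSS2 = sum(weights[l] * lss[1] for l, lss in received.items())
    CSS3 = sum(weights[l] * lss[2] for l, lss in received.items())
    return CSS1, CSS2, CSS3
```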
Step 5: parameter estimation. Node m (m = 1, ..., M) estimates Θ = {π_i, A_i, μ_i, D_i}_{i=1,...,I} from the CSS_m computed in the previous step. The estimation of {π_i, μ_i}_{i=1,...,I} proceeds as follows:
π_i = CSS_m^{(1)}[i] / Σ_{i'=1}^{I} CSS_m^{(1)}[i'],
μ_i = CSS_m^{(2)}[i] / CSS_m^{(1)}[i];
For the estimation of {A_i, D_i}_{i=1,...,I}, the process is as follows:
V_i = (CSS_m^{(3)}[i] - 2 CSS_m^{(2)}[i] μ_i^T + CSS_m^{(1)}[i] μ_i μ_i^T) / CSS_m^{(1)}[i],
A_i = V_i g_i (g_i^T V_i g_i + Ω_i)^{-1},
D_i = diag{V_i - A_i (g_i^T V_i g_i + Ω_i) A_i^T};
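A sketch of the corresponding parameter update in NumPy follows; the function name is hypothetical, and g and Omega are the g_i and Ω_i computed in the local step.

```python
import numpy as np

def update_parameters(CSS, g, Omega):
    """Parameter estimation at node m from the combined sufficient statistics CSS_m."""
    CSS1, CSS2, CSS3 = CSS                   # shapes (I,), (I, p), (I, p, p)
    I, p = CSS2.shape
    pi = CSS1 / CSS1.sum()                   # pi_i
    mu = CSS2 / CSS1[:, None]                # mu_i
    A_new = np.zeros_like(g)
    D_new = np.zeros((I, p, p))
    for i in range(I):
        V_i = (CSS3[i] - 2 * np.outer(CSS2[i], mu[i])
               + CSS1[i] * np.outer(mu[i], mu[i])) / CSS1[i]
        M_i = g[i].T @ V_i @ g[i] + Omega[i]                             # q x q
        A_new[i] = V_i @ g[i] @ np.linalg.inv(M_i)                       # A_i
        D_new[i] = np.diag(np.diag(V_i - A_new[i] @ M_i @ A_new[i].T))   # keep only the diagonal
    return pi, mu, A_new, D_new
```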
Step 6: convergence test. Node m (m = 1, ..., M) computes the log-likelihood in the current iteration:
log p(Y_m | Θ) = Σ_{n=1}^{N_m} log(Σ_{i=1}^{I} π_i N(y_{m,n} | μ_i, A_i A_i^T + D_i));
If log p(Y_m | Θ) - log p(Y_m | Θ^{old}) < ε, the algorithm has converged and the iteration stops; otherwise Step 2 is executed and the next iteration starts (iter = iter + 1). Here Θ denotes the parameter values estimated in the current iteration and Θ^{old} the parameter values estimated in the previous iteration; that is, the algorithm converges when the difference between the log-likelihoods of two adjacent iterations is smaller than the threshold ε, where ε is any value between 10^{-5} and 10^{-6}. Since the nodes in the network process their data in parallel, they cannot all converge in the same iteration. If node l has converged while node m has not, node l no longer sends LSS_l and no longer receives information from its neighbour nodes, and node m updates its CSS_m with the last LSS_l it received from node l. The nodes that have not converged continue iterating until all nodes in the network have converged.
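The convergence test only needs the per-node log-likelihood; a minimal sketch with hypothetical helper names:

```python
import numpy as np

def log_gaussian_pdf(y, mean, cov):
    """log N(y | mean, cov) for a single p-dimensional point y."""
    p = len(mean)
    diff = y - mean
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, diff)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (p * np.log(2 * np.pi) + log_det + sol @ sol)

def log_likelihood(Y_m, pi, mu, A, D):
    """log p(Y_m | Theta) = sum_n log sum_i pi_i N(y_{m,n} | mu_i, A_i A_i^T + D_i)."""
    covs = [A[i] @ A[i].T + D[i] for i in range(len(pi))]
    ll = 0.0
    for y in Y_m:
        terms = [np.log(pi[i]) + log_gaussian_pdf(y, mu[i], covs[i]) for i in range(len(pi))]
        ll += np.logaddexp.reduce(terms)          # log-sum-exp for numerical stability
    return ll

def has_converged(ll_new, ll_old, eps=1e-5):
    """Converged when the log-likelihood increase between adjacent iterations is below eps."""
    return ll_new - ll_old < eps
```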
Step 7: clustering output. After Steps 1-6, node m (m = 1, ..., M) has obtained the <z_{m,n,i}> (n = 1, ..., N_m; i = 1, ..., I) corresponding to each of its data; the index of the largest of <z_{m,n,i}> (i = 1, ..., I) is taken as the class C_{m,n} to which y_{m,n} is finally assigned, that is:
C_{m,n} = argmax_{i=1,...,I} <z_{m,n,i}>;
In this way the clustering result C = {C_{m,n}}_{m=1,...,M; n=1,...,N_m} of all data at all nodes is obtained.
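The final hard assignment is an argmax over the responsibilities computed in the last local step; a one-line sketch (hypothetical function name; indices are 0-based in code versus 1-based classes in the text):

```python
import numpy as np

def assign_clusters(z):
    """C_{m,n} = argmax_i <z_{m,n,i}> for the (N_m x I) responsibility matrix z of node m."""
    return np.argmax(z, axis=1)
```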
In the present invention, a mixed factor analysis model is used to model the data to be clustered at each node of the sensor network. Each node computes local sufficient statistics from its own data and then broadcasts them to its neighbour nodes; after a node has received all the local sufficient statistics of its neighbour nodes, it obtains combined sufficient statistics, estimates the parameters of the mixed factor analysis model from them, and finally completes the clustering with the estimated model. The mixed factor analysis model established by the invention reduces the dimensionality of the data while clustering; the distributed clustering mode avoids the network breakdown caused by the centre node in the traditional centralized processing mode; and, because the nodes transmit sufficient statistics rather than data, the communication overhead is greatly reduced, the private information in the data is better protected, and the security of a system adopting the method is greatly increased.
Beneficial effects:
1. The mixed factor analyzers adopted in the present invention can reduce the dimensionality of high-dimensional data, so that clustering is completed smoothly while the dimensionality is reduced and better clustering performance is obtained.
2. With the distributed clustering method based on the mixed factor analysis model adopted in the present invention, each node in the sensor network can make full use of the information contained in the data of the other nodes, and the clustering performance is better than that of a centralized approach.
3. In the distributed clustering method based on the mixed factor analysis model adopted in the present invention, the nodes exchange local sufficient statistics during cooperation instead of transmitting the raw data directly. Because the number and dimension of the local sufficient statistics are much smaller than those of the data, this on the one hand saves communication overhead and on the other hand helps to fully protect the private information in the data, improving the security of a system adopting the method.
Brief description of the drawings
Fig. 1 is a schematic diagram of the neighbour-node set R_m of node m in the sensor network of the present invention and of the transmission and reception of the local sufficient statistics (LSS) between nodes.
Fig. 2 is a flow chart of the distributed clustering method based on a mixed factor analysis model in a sensor network according to the present invention.
Fig. 3 is a schematic diagram of the data clustering results at each node in an embodiment of the invention.
Embodiment
The invention is described in further detail below with reference to the accompanying drawings.
To better illustrate the distributed clustering method based on a mixed factor analysis model in a sensor network according to the present invention, the method is applied to the clustering of wine-composition data. In some countries, measuring stations are distributed over different regions to detect the content of each component in wine, and the kinds of wine delivered to a measuring station differ, so wines of similar kinds need to be clustered. If a measuring station forms a sensor network with other measuring stations, the wine data at the other stations can be fully exploited through mutual cooperation, and the clustering accuracy improved. The wine data to be clustered here come from the UCI machine learning repository; there are 178 data in total, from 3 classes. The dimension of each datum is 13, representing the content of each constituent of the wine. The sensor network has 8 nodes, the average number of neighbour nodes per node is 2, and the network is connected (there is a direct or indirect path between any two nodes). In this example, therefore, M = 8, p = 13, I = 3, q = 3. The numbers of data at the nodes are N_1 = 21, N_2 = 22, N_3 = 21, N_4 = 21, N_5 = 22, N_6 = 22, N_7 = 21, N_8 = 28, and the neighbour sets are R_1 = {3, 5, 6}, R_2 = {3, 5}, R_3 = {1, 4, 2}, R_4 = {3}, R_5 = {1, 2}, R_6 = {1, 7}, R_7 = {6, 8}, R_8 = {3, 7}.
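The setup of this example can be reproduced in outline with the following sketch; it is hypothetical in that the UCI wine data are assumed to be available in a local "wine.data" file with the class label in the first column, and the random split of the 178 samples across the 8 nodes is an assumption (the patent does not specify how the samples were distributed).

```python
import numpy as np

# Assumed local copy of the UCI wine data set: first column = class label (1..3),
# remaining 13 columns = constituent measurements.
raw = np.loadtxt("wine.data", delimiter=",")
labels, Y = raw[:, 0].astype(int), raw[:, 1:]
assert Y.shape == (178, 13)

M, p, I, q = 8, 13, 3, 3
counts = [21, 22, 21, 21, 22, 22, 21, 28]          # N_1 ... N_8 from the embodiment
R = {1: [3, 5, 6], 2: [3, 5], 3: [1, 4, 2], 4: [3],
     5: [1, 2], 6: [1, 7], 7: [6, 8], 8: [3, 7]}   # neighbour sets from the embodiment

# Partition the 178 samples over the 8 nodes (random split is an assumption).
rng = np.random.default_rng(0)
perm = rng.permutation(len(Y))
node_data, start = {}, 0
for m, n_m in enumerate(counts, start=1):
    idx = perm[start:start + n_m]
    node_data[m] = Y[idx]
    start += n_m
print({m: d.shape for m, d in node_data.items()})
```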
Following the flow given in the summary of the invention (shown in Fig. 2), the distributed clustering starts:
(1) Initialization: the initial values of the parameters of the MFA are set. At each node, (π_1^0, ..., π_i^0, ..., π_I^0) = (1/I, ..., 1/I, ..., 1/I); the initial mean vectors μ_i^0 are selected at random from the data of the node, and each element of the initial factor loading matrices A_i^0 is generated from the standard normal distribution N(0, 1). In addition, each node l (l = 1, ..., M) broadcasts the number N_l of data it has collected to its neighbour nodes. When a node m has received the data counts broadcast by all of its neighbour nodes, it computes the weights c_{lm}:
c_{lm} = N_l / Σ_{l'∈R_m} N_{l'}
The meaning of this weight is to measure the importance, at node m, of the information transmitted by each neighbour node l (l ∈ R_m). After initialization is complete, the iteration counter is set to iter = 1 and the iterative process starts.
(2) Local computation: this step does not require any information from the neighbour nodes. At each node l, based on the data Y_l it has collected, first compute g_i, Ω_i, ū_{l,n,i} and <z_{l,n,i}> (n = 1, ..., N_l; i = 1, ..., I):
g_i = [A_i^{old} (A_i^{old})^T + D_i^{old}]^{-1} A_i^{old},
Ω_i = I_q - g_i^T A_i^{old},
ū_{l,n,i} = g_i^T (y_{l,n} - μ_i^{old}),
<z_{l,n,i}> = π_i^{old} N(y_{l,n} | μ_i^{old}, A_i^{old}(A_i^{old})^T + D_i^{old}) / Σ_{i'=1}^{I} π_{i'}^{old} N(y_{l,n} | μ_{i'}^{old}, A_{i'}^{old}(A_{i'}^{old})^T + D_{i'}^{old}),
where π_i^{old}, A_i^{old}, μ_i^{old} and D_i^{old} are the parameter values obtained after the previous iteration (in the first iteration they are the initial parameter values), and <z_{l,n,i}> is the probability that the n-th datum y_{l,n} at node l belongs to the i-th class (mixture component).
Next, node l computes the local sufficient statistics as follows:
LSS_l^{(1)}[i] = Σ_{n=1}^{N_l} <z_{l,n,i}>,  LSS_l^{(2)}[i] = Σ_{n=1}^{N_l} <z_{l,n,i}> · y_{l,n},  LSS_l^{(3)}[i] = Σ_{n=1}^{N_l} <z_{l,n,i}> · y_{l,n} y_{l,n}^T
(3) Broadcast diffusion: each node l in the sensor network broadcasts the local sufficient statistics LSS_l it has computed to its neighbour nodes, as shown in Fig. 1.
(4) Combined computation: when node m (m = 1, ..., M) has received LSS_l from all of its neighbour nodes l (l ∈ R_m), node m computes the combined sufficient statistics CSS_m = {CSS_m^{(1)}[i], CSS_m^{(2)}[i], CSS_m^{(3)}[i]}_{i=1,...,I}:
CSS_m^{(1)}[i] = Σ_{l∈R_m} c_{lm} · LSS_l^{(1)}[i],
CSS_m^{(2)}[i] = Σ_{l∈R_m} c_{lm} · LSS_l^{(2)}[i],
CSS_m^{(3)}[i] = Σ_{l∈R_m} c_{lm} · LSS_l^{(3)}[i].
(5) Parameter estimation: node m (m = 1, ..., M) estimates Θ = {π_i, A_i, μ_i, D_i}_{i=1,...,I} from the CSS_m computed in the previous step. The estimation of {π_i, μ_i}_{i=1,...,I} proceeds as follows:
π_i = CSS_m^{(1)}[i] / Σ_{i'=1}^{I} CSS_m^{(1)}[i'],
μ_i = CSS_m^{(2)}[i] / CSS_m^{(1)}[i],
For the estimation of {A_i, D_i}_{i=1,...,I}, the process is as follows:
V_i = (CSS_m^{(3)}[i] - 2 CSS_m^{(2)}[i] μ_i^T + CSS_m^{(1)}[i] μ_i μ_i^T) / CSS_m^{(1)}[i],
A_i = V_i g_i (g_i^T V_i g_i + Ω_i)^{-1},
D_i = diag{V_i - A_i (g_i^T V_i g_i + Ω_i) A_i^T}.
(6) Convergence test: node m (m = 1, ..., M) computes the log-likelihood in the current iteration:
log p(Y_m | Θ) = Σ_{n=1}^{N_m} log(Σ_{i=1}^{I} π_i N(y_{m,n} | μ_i, A_i A_i^T + D_i)),
If log p(Y_m | Θ) - log p(Y_m | Θ^{old}) < ε, the algorithm has converged and the iteration stops; otherwise step (2) is executed and the next iteration starts (iter = iter + 1). Here Θ denotes the parameter values estimated in the current iteration and Θ^{old} the parameter values estimated in the previous iteration; that is, the algorithm converges when the difference between the log-likelihoods of two adjacent iterations is smaller than the threshold ε, where ε is any value between 10^{-5} and 10^{-6}. It should be noted that, since the nodes in the network process their data in parallel, they cannot all converge in the same iteration. For example, if node l has converged while node m has not, node l no longer sends LSS_l and no longer receives information from its neighbour nodes, and node m updates its CSS_m with the last LSS_l it received from node l. The nodes that have not converged continue iterating until all nodes in the network have converged.
(7) Clustering output. After steps (1)-(6), node m (m = 1, ..., M) has obtained the <z_{m,n,i}> (n = 1, ..., N_m; i = 1, ..., I) corresponding to each of its data {y_{m,n}}_{n=1,...,N_m}; the index of the largest of <z_{m,n,i}>, i = 1, ..., I, is taken as the class C_{m,n} to which y_{m,n} is finally assigned, that is:
C_{m,n} = argmax_{i=1,...,I} <z_{m,n,i}>;
In this way the clustering result C = {C_{m,n}}_{m=1,...,M; n=1,...,N_m} of all data at all nodes is obtained.
Performance evaluation:
By comparing the results obtained with the clustering method of the present invention against the correct class memberships, the validity and accuracy of the method can be evaluated. The clustering results at the nodes are shown in Fig. 3: the abscissa indexes the 178 data, the non-vacant positions indicate to which node each datum was assigned, and the ordinate gives the class index (3 classes in total) to which the datum was assigned. In the figure, "o" marks correctly clustered data and "x" marks wrongly clustered data. As Fig. 3 shows, the clustering accuracies at the 8 nodes are 100%, 100%, 95.2%, 95.5%, 100%, 95.5%, 100% and 92.9%. Only five data in total are clustered incorrectly, and the average accuracy over the whole network is 97.2%. Compared with the result obtained with the centralized-processing approach (98%), the accuracy is essentially the same, while the drawbacks of centralized processing are obvious: first, once the centre node fails, the whole network breaks down; second, every node transmits its raw data directly to the centre node, which not only increases the communication burden in the network but also easily leaks the private information in the data. The method of the present invention therefore overcomes these drawbacks and achieves good distributed clustering performance.
The scope of protection claimed by the present invention is not limited to the description of this embodiment.

Claims (2)

1. A distributed clustering method based on a mixed factor analysis model in a sensor network, characterized in that the method comprises the following steps:
Step 1: initialization; there are M sensor nodes in the sensor network, and node m collects N_m data, denoted Y_m = {y_{m,n}}_{n=1,...,N_m}, where y_{m,n} is the n-th datum at node m and its dimension is p; the network topology is determined in advance, and the neighbour-node set of node m is denoted R_m; a mixed factor analysis model (MFA) describes the distribution of Y_m (m = 1, ..., M), and the data of all nodes share the same MFA; the parameter set of the MFA is {π_i, A_i, μ_i, D_i}_{i=1,...,I}, where π_i is the weight of the i-th mixture component, A_i is the (p × q) factor loading matrix of the i-th mixture component, q is the dimension of the low-dimensional factor and is any integer between p/6 and p/2, μ_i is the p-dimensional mean vector of the i-th mixture component, and D_i is the covariance matrix of the error of the i-th mixture component;
first, the number of mixture components I of the MFA, which is also the number of classes to be clustered, is set, and the initial values of the parameters of the MFA are set according to I, p and q; at each node, (π_1^0, ..., π_i^0, ..., π_I^0) = (1/I, ..., 1/I, ..., 1/I), the initial mean vectors μ_i^0 are selected at random from the data collected at that node, and each element of the initial factor loading matrices A_i^0 is generated from the standard normal distribution N(0, 1); in addition, each node l broadcasts the number N_l of data it has collected to its neighbour nodes; when a node m has received the data counts broadcast by all of its neighbour nodes l (l ∈ R_m), it computes the weights c_{lm} according to
c_{lm} = N_l / Σ_{l'∈R_m} N_{l'};
after initialization is complete, the iteration counter is set to iter = 1 and the iterative process starts;
Step 2: local computation; at each node l, based on the data Y_l it has collected, first compute the intermediate variables g_i, Ω_i, ū_{l,n,i} and <z_{l,n,i}> (n = 1, ..., N_l; i = 1, ..., I):
g_i = [A_i^{old} (A_i^{old})^T + D_i^{old}]^{-1} A_i^{old},
Ω_i = I_q - g_i^T A_i^{old},
ū_{l,n,i} = g_i^T (y_{l,n} - μ_i^{old}),
<z_{l,n,i}> = π_i^{old} N(y_{l,n} | μ_i^{old}, A_i^{old}(A_i^{old})^T + D_i^{old}) / Σ_{i'=1}^{I} π_{i'}^{old} N(y_{l,n} | μ_{i'}^{old}, A_{i'}^{old}(A_{i'}^{old})^T + D_{i'}^{old}),
where π_i^{old}, A_i^{old}, μ_i^{old} and D_i^{old} are the parameter values obtained after the previous iteration (in the first iteration they are the initial parameter values), and <z_{l,n,i}> is the probability that the n-th datum y_{l,n} at node l belongs to the i-th class (mixture component);
then node l computes the local sufficient statistics LSS_l = {LSS_l^{(1)}[i], LSS_l^{(2)}[i], LSS_l^{(3)}[i]}_{i=1,...,I}, comprising:
LSS_l^{(1)}[i] = Σ_{n=1}^{N_l} <z_{l,n,i}>,  LSS_l^{(2)}[i] = Σ_{n=1}^{N_l} <z_{l,n,i}> · y_{l,n},  LSS_l^{(3)}[i] = Σ_{n=1}^{N_l} <z_{l,n,i}> · y_{l,n} y_{l,n}^T;
Step 3: broadcast diffusion; each node l in the sensor network broadcasts the local sufficient statistics LSS_l it has computed to its neighbour nodes;
Step 4: combined computation; when node m (m = 1, ..., M) has received LSS_l from all of its neighbour nodes l (l ∈ R_m), node m computes the combined sufficient statistics CSS_m = {CSS_m^{(1)}[i], CSS_m^{(2)}[i], CSS_m^{(3)}[i]}_{i=1,...,I}:
CSS_m^{(1)}[i] = Σ_{l∈R_m} c_{lm} · LSS_l^{(1)}[i],
CSS_m^{(2)}[i] = Σ_{l∈R_m} c_{lm} · LSS_l^{(2)}[i],
CSS_m^{(3)}[i] = Σ_{l∈R_m} c_{lm} · LSS_l^{(3)}[i];
Step 5: parameter estimation; node m (m = 1, ..., M) estimates Θ = {π_i, A_i, μ_i, D_i}_{i=1,...,I} from the CSS_m computed in the previous step, where the estimation of {π_i, μ_i}_{i=1,...,I} is:
π_i = CSS_m^{(1)}[i] / Σ_{i'=1}^{I} CSS_m^{(1)}[i'],
μ_i = CSS_m^{(2)}[i] / CSS_m^{(1)}[i];
for the estimation of {A_i, D_i}_{i=1,...,I}, the process is as follows:
V_i = (CSS_m^{(3)}[i] - 2 CSS_m^{(2)}[i] μ_i^T + CSS_m^{(1)}[i] μ_i μ_i^T) / CSS_m^{(1)}[i],
A_i = V_i g_i (g_i^T V_i g_i + Ω_i)^{-1},
D_i = diag{V_i - A_i (g_i^T V_i g_i + Ω_i) A_i^T};
Step 6: convergence test; node m (m = 1, ..., M) computes the log-likelihood in the current iteration:
log p(Y_m | Θ) = Σ_{n=1}^{N_m} log(Σ_{i=1}^{I} π_i N(y_{m,n} | μ_i, A_i A_i^T + D_i));
if log p(Y_m | Θ) - log p(Y_m | Θ^{old}) < ε, the algorithm has converged and the iteration stops; otherwise Step 2 is executed and the next iteration starts (iter = iter + 1); here Θ denotes the parameter values estimated in the current iteration and Θ^{old} the parameter values estimated in the previous iteration, that is, the algorithm converges when the difference between the log-likelihoods of two adjacent iterations is smaller than the threshold ε, where ε is any value between 10^{-5} and 10^{-6}; since the nodes in the network process their data in parallel, they cannot all converge in the same iteration; if node l has converged while node m has not, node l no longer sends LSS_l and no longer receives information from its neighbour nodes, and node m updates its CSS_m with the last LSS_l it received from node l; the nodes that have not converged continue iterating until all nodes in the network have converged;
Step 7: clustering output; after Steps 1-6, node m (m = 1, ..., M) has obtained the <z_{m,n,i}> (n = 1, ..., N_m; i = 1, ..., I) corresponding to each of its data, and the index of the largest of <z_{m,n,i}> (i = 1, ..., I) is taken as the class C_{m,n} to which y_{m,n} is finally assigned, that is:
C_{m,n} = argmax_{i=1,...,I} <z_{m,n,i}>;
in this way the clustering result C = {C_{m,n}}_{m=1,...,M; n=1,...,N_m} of all data at all nodes is obtained.
2. The distributed clustering method based on a mixed factor analysis model in a sensor network according to claim 1, characterized in that the method is applied to the clustering of wine-composition data.
CN201510414218.6A 2015-07-15 2015-07-15 Distributed clustering method based on mixed factor analysis model in sensor network Expired - Fee Related CN104994170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510414218.6A CN104994170B (en) 2015-07-15 2015-07-15 Distributed clustering method based on mixed factor analysis model in sensor network


Publications (2)

Publication Number Publication Date
CN104994170A true CN104994170A (en) 2015-10-21
CN104994170B CN104994170B (en) 2018-06-05

Family

ID=54305921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510414218.6A Expired - Fee Related CN104994170B (en) 2015-07-15 2015-07-15 Distributed clustering method based on mixed factor analysis model in sensor network

Country Status (1)

Country Link
CN (1) CN104994170B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404664A (en) * 2008-11-05 2009-04-08 湖南大学 Network positioning and optimizing algorithm based on node clustering
WO2013036892A1 (en) * 2011-09-08 2013-03-14 Attagene, Inc. Systems and methods for assessment of biosimilarity
CN102752784A (en) * 2012-06-19 2012-10-24 电子科技大学 Detection method of distribution type event domain based on graph theory in wireless sensor network
CN103226595A (en) * 2013-04-17 2013-07-31 南京邮电大学 Clustering method for high dimensional data based on Bayes mixed common factor analyzer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨晔宏 et al.: "An estimation of distribution algorithm based on mixed factor analysis", Information and Control (《信息与控制》) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550704A (en) * 2015-12-10 2016-05-04 南京邮电大学 Hybrid common factor analyzer-based distributed high-dimensional data classification method
CN105550704B (en) * 2015-12-10 2019-01-01 南京邮电大学 Distributed high dimensional data classification method based on mixing common factor analyzer

Also Published As

Publication number Publication date
CN104994170B (en) 2018-06-05


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201207

Address after: Room 214, building D5, No. 9, Kechuang Avenue, Zhongshan Science and Technology Park, Jiangbei new district, Nanjing, Jiangsu Province

Patentee after: Nanjing Tian Gu Information Technology Co.,Ltd.

Address before: Yuen Road Qixia District of Nanjing City, Jiangsu Province, No. 9 210023

Patentee before: NANJING University OF POSTS AND TELECOMMUNICATIONS

Effective date of registration: 20201207

Address after: Gulou District of Nanjing City, Jiangsu Province, Beijing Road No. 20 210024

Patentee after: STATE GRID JIANGSU ELECTRIC POWER Co.,Ltd. INFORMATION & TELECOMMUNICATION BRANCH

Address before: Room 214, building D5, No. 9, Kechuang Avenue, Zhongshan Science and Technology Park, Jiangbei new district, Nanjing, Jiangsu Province

Patentee before: Nanjing Tian Gu Information Technology Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180605