CN104994170A - Distributed clustering method based on mixed factor analysis model in sensor network

Info

Publication number: CN104994170A
Application number: CN201510414218.6A
Authority: CN
Inventors: 魏昕; 周亮; 周全; 陈建新; 王磊; 赵力
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Tian Gu Information Technology Co ltd; Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2015-07-15
Filing date: 2015-07-15
Publication date: 2015-10-21
Anticipated expiration: 2035-07-15
Also published as: CN104994170B

Abstract

The invention discloses a distributed clustering method based on a mixed factor analysis model in a sensor network. The method uses the mixed factor analysis model to model the data to be clustered at each node of the sensor network. Each node calculates local sufficient statistics from its own data and broadcasts them to its neighbor nodes. Once a node has received the local sufficient statistics from all of its neighbor nodes, it computes joint sufficient statistics, estimates the parameters of the mixed factor analysis model from them, and finally completes the clustering based on the estimated model. The mixed factor analysis model reduces the dimensionality of the data while clustering, and the distributed approach avoids the network collapse that a central-node failure causes in traditional centralized processing. Because sufficient statistics rather than data are transmitted among the nodes, the communication overhead is greatly reduced and private information in the data is better protected.

Description

Distributed clustering method based on a mixed factor analysis model in a sensor network

Technical field

The present invention relates to a distributed clustering method based on a mixed factor analysis model in a sensor network, and belongs to the technical field of parallel and distributed data processing methods and their applications.

Background technology

A sensor network consists of a large number of static or mobile micro sensor nodes deployed in a monitored area. Each individual sensor node has very limited capability for collecting, storing, processing and transmitting data, so traditional data processing must be adapted for use in sensor networks. At present, data in sensor networks are processed in one of two ways: centralized processing or distributed processing. In centralized processing, one node is designated as the central node; the other nodes transmit the raw data they collect to the central node, which processes the data and then returns the result to each node. The drawback of this approach is that once the central node fails, the whole network is adversely affected. In distributed processing, all nodes have equal status and complete the data processing task through communication and cooperation between neighbor nodes. Compared with centralized processing, distributed processing avoids the adverse effect of a central-node failure and makes the whole network more robust. The present invention addresses the above problem.

Summary of the invention

The object of the invention is to overcome the defects of the prior art by proposing a distributed clustering method based on a mixed factor analysis model in a sensor network. Clustering is the process of dividing data into multiple classes by some method. A class generated by clustering is a set of data objects that are similar to the other objects in the same cluster and different from the objects in other clusters. Because the class labels of the data are unknown, clustering is an unsupervised learning process in the field of machine learning. Many data clustering methods exist, but most assume that all data are clustered at a single processing center, whereas distributed processing is critical in sensor networks. The present method is designed to address this problem. Its main advantages are: (1) the mixed factor analysis model can effectively handle high-dimensional data; (2) through the designed cooperation between nodes, a satisfactory clustering result is obtained by transmitting only intermediate results, which, compared with transmitting raw data, both reduces the communication overhead and protects the private information in the data, helping to keep the data in the network secure.

The technical solution adopted by the present invention to solve its technical problem is a distributed clustering method based on a mixed factor analysis model in a sensor network. The method comprises the following steps:

Suppose the sensor network has M sensor nodes, and the m-th node collects N_m data, denoted Y_m = {y_{m,n}}_{n=1,...,N_m}, where y_{m,n} is the n-th datum at node m and has dimension p. The distribution of Y_m (m = 1, ..., M) is described by a mixed factor analysis model (MFA); note that the data of all nodes share the same MFA. The MFA is a mixture model with I components. Each datum y_{m,n} can be expressed as:

y_{m,n} = μ_i + A_i u_{m,n} + e_{m,n,i}, with probability π_i (i = 1, ..., I),

where μ_i is the p-dimensional mean vector of the i-th mixture component; u_{m,n} is the factor in the low-dimensional space corresponding to the datum y_{m,n}, with dimension q (q ≪ p) and Gaussian distribution N(u_{m,n} | 0, I_q). The value of q is chosen according to the size of p in the particular problem, and is generally any integer between p/6 and p/2. A_i is the (p × q) factor loading matrix. The error variable e_{m,n,i} follows the Gaussian distribution N(e_{m,n,i} | 0, D_i), where D_i is a (p × p) diagonal matrix. The probabilities π_i satisfy Σ_{i=1}^{I} π_i = 1. The parameter set Θ of the MFA is therefore {π_i, A_i, μ_i, D_i}_{i=1,...,I}. Note that the parameter values in the MFA parameter set to be estimated are the same for all nodes.
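As a worked step (not quoted from the original text), integrating the factor u_{m,n} out of the model above gives the marginal density of an observation. Because u_{m,n} and e_{m,n,i} are independent zero-mean Gaussians, A_i u_{m,n} + e_{m,n,i} has covariance A_i A_i^T + D_i, which is the Gaussian mixture used in the responsibility and log-likelihood formulas of the method:

```latex
p(y_{m,n} \mid \Theta)
  = \sum_{i=1}^{I} \pi_i \int N(y_{m,n} \mid \mu_i + A_i u,\; D_i)\, N(u \mid 0,\; I_q)\, du
  = \sum_{i=1}^{I} \pi_i \, N\!\left(y_{m,n} \mid \mu_i,\; A_i A_i^{T} + D_i\right)
```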

In addition, the data transmission range of each node is set to W. For a given node m, all nodes whose distance from it is less than W are its neighbor nodes, and the neighbor node set of node m is denoted R_m. Fig. 1 illustrates the relationship between nodes in a sensor network: circles represent nodes, and an edge between two nodes indicates that they can communicate with each other and exchange information. The dashed box in Fig. 1 indicates R_m for node m. In the present invention, the network topology is determined before distributed clustering is carried out, and it must ensure that any two nodes can communicate with each other either directly or over multiple hops.
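The neighbor rule and the connectivity requirement can be sketched as follows. This is a minimal illustration only: the function names `neighbor_sets` and `is_connected`, the use of 2-D coordinates, and Euclidean distance are assumptions that the text does not specify.

```python
import math

def neighbor_sets(positions, W):
    """Return {m: R_m}, where R_m holds every other node closer than W to node m.

    positions: {node_id: (x, y)} -- hypothetical coordinates, for illustration.
    """
    R = {m: set() for m in positions}
    for m, pm in positions.items():
        for l, pl in positions.items():
            if l != m and math.dist(pm, pl) < W:
                R[m].add(l)
    return R

def is_connected(R):
    """True if every node can reach every other node directly or over multiple hops."""
    if not R:
        return True
    start = next(iter(R))
    seen, stack = {start}, [start]
    while stack:
        m = stack.pop()
        for l in R[m]:
            if l not in seen:
                seen.add(l)
                stack.append(l)
    return len(seen) == len(R)
```

Checking `is_connected` before clustering starts mirrors the requirement that the topology be fixed in advance and allow communication between any two nodes.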

Once the sensor network topology and the MFA describing the data distribution have been established, the distributed clustering process starts. As shown in Fig. 2, its specific steps are:

Step 1: initialization. The sensor network has M sensor nodes, and the m-th node collects N_m data, denoted Y_m = {y_{m,n}}_{n=1,...,N_m}, where y_{m,n} is the n-th datum at node m and has dimension p. The network topology is determined in advance, and the neighbor node set of node m is denoted R_m. The distribution of Y_m (m = 1, ..., M) is described by a mixed factor analysis model (MFA), and the data of all nodes share the same MFA. The parameter set of the MFA is {π_i, A_i, μ_i, D_i}_{i=1,...,I}, where π_i is the weight of the i-th mixture component; A_i is the (p × q) factor loading matrix of the i-th mixture component; q is the dimension of the low-dimensional factor, taken as any integer between p/6 and p/2; μ_i is the p-dimensional mean vector of the i-th mixture component; and D_i is the covariance matrix of the error of the i-th mixture component;

First, set the number of mixture components I in the MFA, which is also the number of classes to be clustered. Set the initial value of each parameter in the MFA according to I, p and q. At each node,

(π_{1}^{0}, ..., π_{i}^{0}, ..., π_{I}^{0}) = (1 / I, ..., 1 / I, ..., 1 / I),

μ_i^0 is randomly selected from the data collected by the node, and each element of A_i^0 and D_i^0 is generated from the standard normal distribution N(0, 1). In addition, each node l broadcasts the number N_l of data it has collected to its neighbor nodes. After a node m has received the data counts broadcast by all of its neighbor nodes l (l ∈ R_m), it calculates the weights c_{lm} according to the following formula:

c_{lm} = \frac{N_l}{\sum_{l' \in R_m} N_{l'}};

After initialization is complete, the iteration counter is set to iter = 1 and the iterative process starts;

Step 2: local computation. At each node l, based on the data Y_l it has collected, first calculate the intermediate variables g_i, Ω_i, \bar{u}_{l,n,i} and <z_{l,n,i}> (n = 1, ..., N_l; i = 1, ..., I):

g_i = [A_i^{old} (A_i^{old})^T + D_i^{old}]^{-1} A_i^{old},

Ω_i = I_q - g_i^T A_i^{old},

\bar{u}_{l,n,i} = g_i^T (y_{l,n} - μ_i^{old}),

\langle z_{l,n,i} \rangle = \frac{π_i^{old} N(y_{l,n} | μ_i^{old}, A_i^{old} (A_i^{old})^T + D_i^{old})}{\sum_{i'=1}^{I} π_{i'}^{old} N(y_{l,n} | μ_{i'}^{old}, A_{i'}^{old} (A_{i'}^{old})^T + D_{i'}^{old})},

where π_i^{old}, A_i^{old}, μ_i^{old} and D_i^{old} are the parameter values obtained after the previous iteration (in the first iteration they are the initial values of the parameters), and <z_{l,n,i}> is the probability that the n-th datum y_{l,n} at node l belongs to the i-th class (mixture component);

Then, the node calculates the local sufficient statistics LSS_l = {LSS_l^{(1)}[i], LSS_l^{(2)}[i], LSS_l^{(3)}[i]}_{i=1,...,I}, comprising:

LSS_l^{(1)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle,

LSS_l^{(2)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \cdot y_{l,n},

LSS_l^{(3)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \cdot y_{l,n} \cdot y_{l,n}^T;
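The local computation can be sketched in NumPy as below. This is an illustrative sketch, not the patent's implementation: the function name, the array layout (D_i stored as a vector of diagonal entries) and the log-domain normalization are assumptions, and the diagonal entries of D_i are assumed positive so that the Gaussian density is well defined.

```python
import numpy as np

def local_sufficient_statistics(Y, pi, mu, A, D):
    """Responsibilities <z_{l,n,i}> and LSS for one node.

    Y: (N, p) data; pi: (I,); mu: (I, p); A: (I, p, q); D: (I, p) diagonals of D_i.
    Returns z (N, I), LSS1 (I,), LSS2 (I, p), LSS3 (I, p, p).
    """
    N, p = Y.shape
    I = pi.shape[0]
    log_w = np.empty((N, I))
    for i in range(I):
        S = A[i] @ A[i].T + np.diag(D[i])            # A_i A_i^T + D_i
        diff = Y - mu[i]
        _, logdet = np.linalg.slogdet(S)
        maha = np.einsum("np,np->n", diff @ np.linalg.inv(S), diff)
        log_w[:, i] = np.log(pi[i]) - 0.5 * (p * np.log(2 * np.pi) + logdet + maha)
    log_w -= log_w.max(axis=1, keepdims=True)         # stabilise before exponentiating
    z = np.exp(log_w)
    z /= z.sum(axis=1, keepdims=True)                 # normalise over components
    LSS1 = z.sum(axis=0)                              # sum_n <z>
    LSS2 = z.T @ Y                                    # sum_n <z> y
    LSS3 = np.einsum("ni,np,nr->ipr", z, Y, Y)        # sum_n <z> y y^T
    return z, LSS1, LSS2, LSS3
```

Each node sends I scalars, I vectors of length p and I matrices of size p × p per iteration, regardless of how many data it holds.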

Step 3: broadcast diffusion. Each node l in the sensor network broadcasts the local sufficient statistics LSS_l it has calculated to its neighbor nodes;

Step 4: joint computation. After node m (m = 1, ..., M) has received LSS_l from all of its neighbor nodes l (l ∈ R_m), node m calculates the joint sufficient statistics

CSS_m = \{CSS_m^{(1)}[i], CSS_m^{(2)}[i], CSS_m^{(3)}[i]\}_{i = 1, ..., I}:

CSS_m^{(1)}[i] = \sum_{l \in R_m} c_{lm} \cdot LSS_l^{(1)}[i],

CSS_m^{(2)}[i] = \sum_{l \in R_m} c_{lm} \cdot LSS_l^{(2)}[i],

CSS_m^{(3)}[i] = \sum_{l \in R_m} c_{lm} \cdot LSS_l^{(3)}[i];
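The weighted combination at node m can be sketched as follows. The function name `combine` and the dictionary-based message format are hypothetical; the sum runs over exactly the neighbor set passed in, following the formulas as written.

```python
import numpy as np

def combine(lss_by_node, counts, neighbors):
    """Joint sufficient statistics CSS_m from the neighbors' LSS.

    lss_by_node: {l: (LSS1, LSS2, LSS3)} as received from each neighbor l.
    counts:      {l: N_l} data counts broadcast during initialization.
    neighbors:   iterable of the node ids in R_m.
    """
    neighbors = list(neighbors)
    total = sum(counts[l] for l in neighbors)
    css = None
    for l in neighbors:
        c_lm = counts[l] / total                      # weight of neighbor l at node m
        terms = [c_lm * np.asarray(s, dtype=float) for s in lss_by_node[l]]
        css = terms if css is None else [a + b for a, b in zip(css, terms)]
    return tuple(css)                                 # (CSS1, CSS2, CSS3)
```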

Step 5: parameter estimation. Node m (m = 1, ..., M) estimates Θ = {π_i, A_i, μ_i, D_i}_{i=1,...,I} from the CSS_m calculated in the previous step. The estimation of {π_i, μ_i}_{i=1,...,I} is as follows:

π_i = \frac{CSS_m^{(1)}[i]}{\sum_{i'=1}^{I} CSS_m^{(1)}[i']},

μ_i = \frac{CSS_m^{(2)}[i]}{CSS_m^{(1)}[i]};

The estimation of {A_i, D_i}_{i=1,...,I} is as follows:

V_i = \frac{CSS_m^{(3)}[i] - 2 CSS_m^{(2)}[i] \cdot μ_i + CSS_m^{(1)}[i] \cdot μ_i \cdot μ_i^T}{CSS_m^{(1)}[i]},

A_i = V_i g_i (g_i^T V_i g_i + Ω_i)^{-1},

D_i = diag\{V_i - A_i (g_i^T V_i g_i + Ω_i) A_i^T\};
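A NumPy sketch of this estimation step follows. It rests on stated assumptions: the product CSS_m^{(2)}[i] · μ_i in V_i is read as the outer product CSS_m^{(2)}[i] μ_i^T so that every term is a p × p matrix, D_i is stored as its diagonal, and the previous-iteration D_i has positive entries. The function name is hypothetical.

```python
import numpy as np

def estimate_parameters(CSS1, CSS2, CSS3, A_old, D_old):
    """Update (pi, mu, A, D) from joint sufficient statistics.

    CSS1: (I,); CSS2: (I, p); CSS3: (I, p, p);
    A_old: (I, p, q); D_old: (I, p) diagonals from the previous iteration.
    """
    I, p, q = A_old.shape
    pi = CSS1 / CSS1.sum()
    mu = CSS2 / CSS1[:, None]
    A = np.empty_like(A_old, dtype=float)
    D = np.empty((I, p))
    for i in range(I):
        S_old = A_old[i] @ A_old[i].T + np.diag(D_old[i])
        g = np.linalg.solve(S_old, A_old[i])          # g_i, shape (p, q)
        Omega = np.eye(q) - g.T @ A_old[i]            # Omega_i, shape (q, q)
        # V_i, with CSS2 . mu_i interpreted as an outer product (assumption)
        V = (CSS3[i] - 2.0 * np.outer(CSS2[i], mu[i])
             + CSS1[i] * np.outer(mu[i], mu[i])) / CSS1[i]
        M = g.T @ V @ g + Omega
        A[i] = V @ g @ np.linalg.inv(M)
        D[i] = np.diag(V - A[i] @ M @ A[i].T)         # keep the diagonal only
    return pi, mu, A, D
```

Because π_i, μ_i and V_i are ratios of the statistics, a common scaling of CSS_m by the weights c_{lm} does not change these estimates.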

Step 6: convergence check. Node m (m = 1, ..., M) calculates the log-likelihood under the current iteration:

\log p(Y_m | Θ) = \sum_{n=1}^{N_m} \log \left( \sum_{i=1}^{I} π_i N(y_{m,n} | μ_i, A_i A_i^T + D_i) \right);

If log p(Y_m | Θ) - log p(Y_m | Θ^{old}) < ε, the node has converged and stops iterating; otherwise it returns to step 2 and starts the next iteration (iter = iter + 1). Here Θ denotes the parameter values estimated in the current iteration and Θ^{old} the parameter values estimated in the previous iteration; that is, the algorithm has converged when the difference between the log-likelihoods of two adjacent iterations is less than the threshold ε. ε takes any value between 10^{-5} and 10^{-6}. Because the nodes in the network process data in parallel, they will not all converge in the same iteration. When node l has converged and node m has not, node l no longer sends LSS_l and no longer receives information sent by its neighbor nodes; node m then uses the last LSS_l it received from node l to update its CSS_m. The nodes that have not converged continue iterating until all nodes in the network have converged;
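The stopping rule and the handling of neighbors that have already converged can be sketched as below. The names `has_converged` and `NeighborCache` are hypothetical, and the default ε = 1e-5 is just one value from the stated range.

```python
def has_converged(loglik_new, loglik_old, eps=1e-5):
    """Stopping rule as stated: the log-likelihood gain is below eps."""
    return (loglik_new - loglik_old) < eps

class NeighborCache:
    """Keeps the most recent LSS received from each neighbor.

    A neighbor that has converged stops sending, so the node keeps
    reusing the last message stored here when it rebuilds CSS_m.
    """
    def __init__(self):
        self._last = {}

    def receive(self, node_id, lss):
        self._last[node_id] = lss

    def latest(self, node_id):
        return self._last[node_id]
```

A node would call `receive` whenever a message arrives and read `latest` for every neighbor in R_m when recomputing CSS_m, so a silent neighbor contributes its final statistics unchanged.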

Step 7: clustering output. After steps 1 to 6, node m (m = 1, ..., M) has obtained the <z_{m,n,i}> (n = 1, ..., N_m; i = 1, ..., I) corresponding to each of its data. The index of the maximum among <z_{m,n,i}> (i = 1, ..., I) is taken as the class C_{m,n} to which y_{m,n} is finally assigned, that is:

C_{m,n} = \arg\max_{i = 1, ..., I} \langle z_{m,n,i} \rangle;

The clustering result of all data on all nodes is obtained in this way.
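The assignment rule is a per-datum argmax over the responsibilities. A minimal sketch, with a hypothetical function name and 1-based class indices to match i = 1, ..., I:

```python
def assign_clusters(z):
    """z: list of rows; row n holds <z_{m,n,i}> for i = 1..I.

    Returns the 1-based index of the largest responsibility per datum
    (the first index wins on ties).
    """
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in z]
```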

The present invention uses a mixed factor analysis model to model the data to be clustered at each node of the sensor network. Each node calculates local sufficient statistics from its own data and broadcasts them to its neighbor nodes. After a node has received the local sufficient statistics from all of its neighbor nodes, it obtains the joint sufficient statistics, estimates the parameters of the mixed factor analysis model from them, and finally completes the clustering based on the estimated model. The mixed factor analysis model established by the present invention reduces the dimensionality of the data while clustering, and the distributed clustering approach avoids the network collapse caused by the central node in traditional centralized processing. In addition, in the distributed clustering method of the present invention the nodes exchange sufficient statistics rather than data, which both greatly reduces the communication overhead and better protects the private information in the data, improving the security of a system that adopts the method.

Beneficial effects:

1. The mixed factor analyzer adopted in the present invention can reduce the dimensionality of high-dimensional data, so that clustering is completed together with dimensionality reduction and better clustering performance is obtained.

2. The distributed clustering method based on a mixed factor analysis model adopted in the present invention enables each node in the sensor network to make full use of the information contained in the data of other nodes, and its clustering performance is better than that of the centralized approach.

3. In the distributed clustering method based on a mixed factor analysis model adopted in the present invention, nodes cooperate by exchanging local sufficient statistics instead of directly transmitting raw data. Because the quantity and dimension of the local sufficient statistics are much smaller than those of the data, this saves communication overhead on the one hand and, on the other hand, helps protect the private information in the data, improving the security of a system that adopts the method.

Brief description of the drawings

Fig. 1 is a schematic diagram of the neighbor node set R_m of node m in the sensor network of the present invention, and of the sending and receiving of local sufficient statistics (LSS) between nodes.

Fig. 2 is a flow chart of the distributed clustering method based on a mixed factor analysis model in a sensor network according to the present invention.

Fig. 3 is a schematic diagram of the data clustering results at each node in an embodiment of the present invention.

Embodiment

The invention is described in further detail below with reference to the accompanying drawings.

To better illustrate the distributed clustering method based on a mixed factor analysis model in a sensor network, the method is applied to the clustering of wine composition data. In some countries, measuring stations are distributed across different regions to measure the content of each component in wine. The wines sent to a measuring station are of different kinds, so wines of similar kinds need to be clustered. If a measuring station forms a sensor network with other measuring stations, then by cooperating with them it can make full use of the wine data at the other stations and thereby improve clustering accuracy. Here, the wine data to be clustered come from the UCI machine learning repository; there are 178 data in total, from 3 classes. Each datum has dimension 13, representing the content of each component in the wine. The sensor network has 8 nodes, each node has 2 neighbor nodes on average, and the network is connected (a direct or indirect path exists between any two nodes). Therefore, in this example, M = 8, p = 13, I = 3 and q = 3. In addition, the number of data at each node is N₁ = 21, N₂ = 22, N₃ = 21, N₄ = 21, N₅ = 22, N₆ = 22, N₇ = 21, N₈ = 28; the neighbor nodes of each node are R₁ = {3, 5, 6}, R₂ = {3, 5}, R₃ = {1, 4, 2}, R₄ = {3}, R₅ = {1, 2}, R₆ = {1, 7}, R₇ = {6, 8}, R₈ = {3, 7}.

Distributed clustering proceeds according to the flow described in the summary of the invention (shown in Fig. 2):

(1) Initialization: set the initial values of the parameters in the MFA. At each node,

(π_{1}^{0}, ..., π_{i}^{0}, ..., π_{I}^{0}) = (1 / I, ..., 1 / I, ..., 1 / I),

μ_i^0 is randomly selected from the data of the node, and each element of A_i^0 and D_i^0 is generated from the standard normal distribution N(0, 1). In addition, each node l (l = 1, ..., M) broadcasts the number N_l of data it has collected to its neighbor nodes. After a node m has received the data counts broadcast by all of its neighbor nodes, it calculates the weights c_{lm}:

c_{lm} = \frac{N_l}{\sum_{l' \in R_m} N_{l'}}

This weight measures the importance, at node m, of the information transmitted by each neighbor node l (l ∈ R_m) of node m. After initialization is complete, the iteration counter is set to iter = 1 and the iterative process starts.
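As a worked check of the weight formula with the data counts of this embodiment, the sketch below computes c_{lm} for nodes 1 and 7 using exact fractions. The helper name `weights` is hypothetical; the counts and the two neighbor sets are taken from the embodiment.

```python
from fractions import Fraction

# Data counts N_l of the embodiment
N = {1: 21, 2: 22, 3: 21, 4: 21, 5: 22, 6: 22, 7: 21, 8: 28}
# Neighbor sets of nodes 1 and 7 in the embodiment
R = {1: {3, 5, 6}, 7: {6, 8}}

def weights(m):
    """c_{lm} = N_l / sum of N_{l'} over l' in R_m, for every l in R_m."""
    total = sum(N[l] for l in R[m])
    return {l: Fraction(N[l], total) for l in R[m]}

w1 = weights(1)   # neighbors 3, 5, 6 hold 21 + 22 + 22 = 65 data
w7 = weights(7)   # neighbors 6, 8 hold 22 + 28 = 50 data
```

For node 1 this gives c_{3,1} = 21/65 and c_{5,1} = c_{6,1} = 22/65; for node 7 it gives c_{6,7} = 11/25 and c_{8,7} = 14/25. The weights at each node sum to 1.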

(2) Local computation: this step does not need any information from neighbor nodes. At each node l, based on the data Y_l it has collected, first calculate g_i, Ω_i, \bar{u}_{l,n,i} and <z_{l,n,i}> (n = 1, ..., N_l; i = 1, ..., I):

g_i = [A_i^{old} (A_i^{old})^T + D_i^{old}]^{-1} A_i^{old},

Ω_i = I_q - g_i^T A_i^{old},

\bar{u}_{l,n,i} = g_i^T (y_{l,n} - μ_i^{old}),

\langle z_{l,n,i} \rangle = \frac{π_i^{old} N(y_{l,n} | μ_i^{old}, A_i^{old} (A_i^{old})^T + D_i^{old})}{\sum_{i'=1}^{I} π_{i'}^{old} N(y_{l,n} | μ_{i'}^{old}, A_{i'}^{old} (A_{i'}^{old})^T + D_{i'}^{old})},

where π_i^{old}, A_i^{old}, μ_i^{old} and D_i^{old} are the parameter values obtained after the previous iteration (in the first iteration they are the initial values of the parameters), and <z_{l,n,i}> is the probability that the n-th datum y_{l,n} at node l belongs to the i-th class (mixture component).

Next, the node calculates the local sufficient statistics as follows:

LSS_l^{(1)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle,

LSS_l^{(2)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \cdot y_{l,n},

LSS_l^{(3)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \cdot y_{l,n} \cdot y_{l,n}^T

(3) Broadcast diffusion: each node l in the sensor network broadcasts the local sufficient statistics LSS_l it has calculated to its neighbor nodes, as shown in Fig. 1.

(4) Joint computation: after node m (m = 1, ..., M) has received LSS_l from all of its neighbor nodes l (l ∈ R_m), node m calculates the joint sufficient statistics

CSS_m = \{CSS_m^{(1)}[i], CSS_m^{(2)}[i], CSS_m^{(3)}[i]\}_{i = 1, ..., I}:

CSS_m^{(1)}[i] = \sum_{l \in R_m} c_{lm} \cdot LSS_l^{(1)}[i],

CSS_m^{(2)}[i] = \sum_{l \in R_m} c_{lm} \cdot LSS_l^{(2)}[i],

CSS_m^{(3)}[i] = \sum_{l \in R_m} c_{lm} \cdot LSS_l^{(3)}[i].

(5) Parameter estimation: node m (m = 1, ..., M) estimates Θ = {π_i, A_i, μ_i, D_i}_{i=1,...,I} from the CSS_m calculated in the previous step. The estimation of {π_i, μ_i}_{i=1,...,I} is as follows:

π_i = \frac{CSS_m^{(1)}[i]}{\sum_{i'=1}^{I} CSS_m^{(1)}[i']},

μ_i = \frac{CSS_m^{(2)}[i]}{CSS_m^{(1)}[i]},

The estimation of {A_i, D_i}_{i=1,...,I} is as follows:

V_i = \frac{CSS_m^{(3)}[i] - 2 CSS_m^{(2)}[i] \cdot μ_i + CSS_m^{(1)}[i] \cdot μ_i \cdot μ_i^T}{CSS_m^{(1)}[i]},

A_i = V_i g_i (g_i^T V_i g_i + Ω_i)^{-1},

D_i = diag\{V_i - A_i (g_i^T V_i g_i + Ω_i) A_i^T\}.

(6) Convergence check: node m (m = 1, ..., M) calculates the log-likelihood under the current iteration:

\log p(Y_m | Θ) = \sum_{n=1}^{N_m} \log \left( \sum_{i=1}^{I} π_i N(y_{m,n} | μ_i, A_i A_i^T + D_i) \right),

If log p(Y_m | Θ) - log p(Y_m | Θ^{old}) < ε, the node has converged and stops iterating; otherwise it returns to step (2) and starts the next iteration (iter = iter + 1). Here Θ denotes the parameter values estimated in the current iteration and Θ^{old} the parameter values estimated in the previous iteration; that is, the algorithm has converged when the difference between the log-likelihoods of two adjacent iterations is less than the threshold ε. ε takes any value between 10^{-5} and 10^{-6}. It should be noted that, because the nodes in the network process data in parallel, they will not all converge in the same iteration. For example, when node l has converged and node m has not, node l no longer sends LSS_l and no longer receives information sent by its neighbor nodes; node m then uses the last LSS_l it received from node l to update its CSS_m. The nodes that have not converged continue iterating until all nodes in the network have converged.

(7) Clustering output. After steps (1) to (6), node m (m = 1, ..., M) has obtained the <z_{m,n,i}> (n = 1, ..., N_m; i = 1, ..., I) corresponding to each of its data {y_{m,n}}_{n=1,...,N_m}. The index of the maximum among <z_{m,n,i}>, i = 1, ..., I, is taken as the class C_{m,n} to which y_{m,n} is finally assigned, that is:

C_{m,n} = \arg\max_{i = 1, ..., I} \langle z_{m,n,i} \rangle;

The clustering result of all data on all nodes is obtained in this way.

Performance evaluation:

The validity and accuracy of the method of the present invention can be evaluated by comparing the results it produces with the correct class labels. The clustering results at each node are shown in Fig. 3. The abscissa of the figure represents the 178 data, and a non-empty position indicates that the datum was assigned to that node; the ordinate represents the class index (3 classes in total) to which the datum was assigned. In the figure, "o" marks correctly clustered data and "x" marks wrongly clustered data. From Fig. 3, the clustering accuracies at the 8 nodes are 100%, 100%, 95.2%, 95.5%, 100%, 95.5%, 100% and 92.9%. Only five data in total are wrongly clustered, and the average accuracy over the whole network is 97.2%. This is essentially the same as the accuracy (98%) obtained with the centralized method. The drawbacks of the centralized method, however, are obvious: first, once the central node fails, the whole network collapses; second, each node transmits its raw data directly to the central node, which not only increases the communication burden in the network but also easily leaks the private information in the data. The method of the present invention therefore overcomes these drawbacks and achieves good distributed clustering performance.
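The network-wide figure can be checked from the numbers in this embodiment: the eight nodes hold 178 data in total, of which five are wrongly clustered. The helper name below is hypothetical.

```python
node_sizes = [21, 22, 21, 21, 22, 22, 21, 28]   # N_1 .. N_8 from the embodiment
total = sum(node_sizes)                          # 178
misclustered = 5                                 # stated total number of errors

def accuracy_percent(n, errors):
    """Percentage of correctly clustered data, rounded to one decimal."""
    return round(100.0 * (n - errors) / n, 1)

overall = accuracy_percent(total, misclustered)  # 173 / 178 -> 97.2
```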

The scope of protection claimed by the present invention is not limited to the description of this embodiment.

Claims

1. A distributed clustering method based on a mixed factor analysis model in a sensor network, characterized in that the method comprises the following steps:

(π_{1}^{0}, ..., π_{i}^{0}, ..., π_{I}^{0}) = (1 / I, ..., 1 / I, ..., 1 / I),

c_{lm} = \frac{N_l}{\sum_{l' \in R_m} N_{l'}};

g_i = [A_i^{old} (A_i^{old})^T + D_i^{old}]^{-1} A_i^{old},

Ω_i = I_q - g_i^T A_i^{old},

\bar{u}_{l,n,i} = g_i^T (y_{l,n} - μ_i^{old}),

\langle z_{l,n,i} \rangle = \frac{π_i^{old} N(y_{l,n} | μ_i^{old}, A_i^{old} (A_i^{old})^T + D_i^{old})}{\sum_{i'=1}^{I} π_{i'}^{old} N(y_{l,n} | μ_{i'}^{old}, A_{i'}^{old} (A_{i'}^{old})^T + D_{i'}^{old})},

where π_i^{old}, A_i^{old}, μ_i^{old} and D_i^{old} are the parameter values obtained after the previous iteration (in the first iteration they are the initial values of the parameters), and <z_{l,n,i}> is the probability that the n-th datum y_{l,n} at node l belongs to the i-th class (mixture component);

Then, the node calculates the local sufficient statistics

LSS_l = \{LSS_l^{(1)}[i], LSS_l^{(2)}[i], LSS_l^{(3)}[i]\}_{i = 1, ..., I},

comprising:

LSS_l^{(1)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle,

LSS_l^{(2)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \cdot y_{l,n},

LSS_l^{(3)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \cdot y_{l,n} \cdot y_{l,n}^T;

{CSS}_{m} = {{CSS}_{m}^{(1)} [i], {CSS}_{m}^{(2)} [i], {CSS}_{m}^{(3)} [i]}_{i = 1, ..., I} :

CSS_m^{(1)}[i] = \sum_{l \in R_m} c_{lm} \cdot LSS_l^{(1)}[i],

CSS_m^{(2)}[i] = \sum_{l \in R_m} c_{lm} \cdot LSS_l^{(2)}[i],

CSS_m^{(3)}[i] = \sum_{l \in R_m} c_{lm} \cdot LSS_l^{(3)}[i];

π_{i} = \frac{{CSS}_{m}^{(1)} [i]}{Σ_{i^{'} = 1}^{I} {CSS}_{m}^{(1)} [i^{'}]},

μ_{i} = \frac{{CSS}_{m}^{(2)} [i]}{{CSS}_{m}^{(1)} [i]};

The estimation of {A_i, D_i}_{i=1,...,I} is as follows:

V_i = \frac{CSS_m^{(3)}[i] - 2 CSS_m^{(2)}[i] \cdot μ_i + CSS_m^{(1)}[i] \cdot μ_i \cdot μ_i^T}{CSS_m^{(1)}[i]},

A_i = V_i g_i (g_i^T V_i g_i + Ω_i)^{-1},

D_i = diag\{V_i - A_i (g_i^T V_i g_i + Ω_i) A_i^T\};

\log p(Y_m | Θ) = \sum_{n=1}^{N_m} \log \left( \sum_{i=1}^{I} π_i N(y_{m,n} | μ_i, A_i A_i^T + D_i) \right);

C_{m, n} = {argmax}_{i = 1}^{I} < z_{m, n, i} >;

The clustering result of all data on all nodes is obtained as

C = \{C_{m,n}\}_{n = 1, ..., N_m}^{m = 1, ..., M}.

2. The distributed clustering method based on a mixed factor analysis model in a sensor network according to claim 1, characterized in that the method is applied to the clustering of wine composition data.