Summary of the invention
The object of the invention is to overcome the defects of the prior art by proposing a distributed clustering method, based on a mixture factor analysis model, for sensor networks.
The technical scheme adopted by the present invention to solve this technical problem is a distributed handwritten digit recognition method based on t mixture factor analysis, comprising the following steps:
Step 1: collection of data and feature extraction.
Suppose there are M computers/computing nodes (nodes for short) forming a network. Node m collects raw data for the digits 0 to 9 (10 classes in total) through a handwriting pad attached to it. The handwriting pad automatically records the two-dimensional coordinates of each point on the writing track of each character; the coordinates of 8 equally spaced points along the track are taken as the feature data s corresponding to each raw datum, 16 dimensions in total. For convenience of notation, let the training data set of digit d obtained at node m after feature extraction be S_m^(d) = {s_mn^(d)}, n = 1, ..., N_m^(d), where s_mn^(d) denotes the n-th feature datum of handwritten digit d used for training at node m, with dimension p, and N_m^(d) is the number of training data for digit d.
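The feature extraction in Step 1 can be sketched as follows. This is a minimal illustration, assuming that "equally spaced" means equal spacing over the recorded sample indices of the track (the original may instead intend arc-length spacing); the trajectory array and function name are hypothetical.

```python
import numpy as np

def extract_features(trajectory, n_points=8):
    """Sample n_points at equal intervals along a writing trajectory
    (an array of (x, y) coordinates recorded by the handwriting pad)
    and flatten them into one feature vector of dimension 2 * n_points."""
    trajectory = np.asarray(trajectory, dtype=float)
    # indices of n_points equally spaced samples along the recorded track
    idx = np.linspace(0, len(trajectory) - 1, n_points).round().astype(int)
    return trajectory[idx].reshape(-1)   # shape: (16,) when n_points=8

# example: a synthetic 50-point trajectory
traj = np.column_stack([np.linspace(0, 1, 50), np.sin(np.linspace(0, 3, 50))])
s = extract_features(traj)
print(s.shape)   # (16,)
```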
The distribution of S_m^(d) is described by a t mixture factor analysis model (tMFA). Note that all nodes model the training data of digit d with the same common tMFA. A tMFA is a mixture model with I components; each datum s_mn^(d) can be expressed, with probability π_i (i = 1, ..., I), as

s_mn^(d) = μ_i + A_i z + e,

where μ_i is the mean vector of the i-th mixture component, with dimension p; z is the factor in the low-dimensional space corresponding to s_mn^(d), with dimension q (q << p), and obeys a t distribution. The value of q is chosen according to the size of p in the particular problem, generally any integer between q = p/5 and q = p/3. A_i is the (p × q) factor loading matrix of the i-th mixture component. The error e obeys a t distribution, where D_i is a (p × p) diagonal matrix and ν_i is the degrees of freedom of the i-th mixture component. The weights π_i of the mixture components satisfy Σ_{i=1}^{I} π_i = 1. The parameter set of the tMFA is therefore Θ = {π_i, A_i, μ_i, D_i, ν_i}_{i=1,...,I}. Note that for all nodes, the values of the parameters in the tMFA parameter set to be estimated are identical. Here it should be noted that the t distribution can be expanded as an integral of a Gaussian distribution and a Gamma distribution:
t(s | μ, Σ, ν) = ∫ N(s | μ, Σ/u) · Gamma(u | ν/2, ν/2) du,

where u is the integration hidden variable corresponding to s.
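The generative model above can be sketched as follows. The shared Gamma scale u for both the factor and the error follows the usual hierarchical construction of the t distribution; all parameter values in the example are illustrative assumptions, not values from the patent.

```python
import numpy as np

def sample_tmfa(pi, A, mu, D, nu, n, rng):
    """Draw n samples from a t mixture of factor analyzers, using the
    representation of the t distribution as a Gaussian scaled by a
    Gamma-distributed hidden variable u (hierarchical construction)."""
    I, p, q = len(pi), mu.shape[1], A.shape[2]
    out = np.empty((n, p))
    for j in range(n):
        i = rng.choice(I, p=pi)                      # pick component i with prob pi_i
        u = rng.gamma(nu[i] / 2, 2 / nu[i])          # u ~ Gamma(nu_i/2, rate nu_i/2)
        z = rng.normal(size=q) / np.sqrt(u)          # factor z ~ t_q(0, I_q, nu_i)
        e = rng.normal(size=p) * np.sqrt(np.diag(D[i]) / u)  # error ~ t_p(0, D_i, nu_i)
        out[j] = mu[i] + A[i] @ z + e
    return out

# toy example with I = 2 components, p = 4, q = 2 (illustrative values)
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
mu = np.zeros((2, 4)); mu[1] += 5.0
A  = rng.normal(size=(2, 4, 2))
D  = np.stack([np.eye(4) * 0.1, np.eye(4) * 0.1])
nu = np.array([4.0, 4.0])
X = sample_tmfa(pi, A, mu, D, nu, 200, rng)
print(X.shape)   # (200, 4)
```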
In addition, the data transmission range of each node is set to Dis. For the current node m, all nodes whose distance to m is less than Dis are its neighbor nodes, and the set of neighbor nodes of node m is denoted R_m. Fig. 1 illustrates the relations among the nodes in a network, where each computer icon represents a node; an edge between two nodes indicates that the two nodes can communicate with each other and exchange information. The dashed box in Fig. 1 represents R_m for node m. In the present invention the network topology is determined in advance; it is only necessary to ensure that between any two nodes there exists at least one path, direct or reachable through multiple hops.
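The construction of the neighbor sets R_m from the transmission range Dis can be sketched as follows; node positions are assumed to be known two-dimensional coordinates, which is an assumption for illustration.

```python
import numpy as np

def neighbor_sets(positions, Dis):
    """Return, for every node m, the set R_m of nodes whose Euclidean
    distance to node m is less than the transmission range Dis."""
    positions = np.asarray(positions, dtype=float)
    M = len(positions)
    R = []
    for m in range(M):
        d = np.linalg.norm(positions - positions[m], axis=1)
        R.append({l for l in range(M) if l != m and d[l] < Dis})
    return R

pos = [(0, 0), (1, 0), (3, 0)]
print(neighbor_sets(pos, Dis=1.5))  # [{1}, {0}, set()]
```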
Step 2: distributed training. The data S_m^(d) are used for distributed training to obtain the tMFA parameter set corresponding to each digit class d.
After the network topology and the tMFA describing the data distribution have been established, distributed training starts. Taking the training of digit d as an example, the training process is shown in Fig. 2, and its concrete steps are as follows:
Step 2-1: initialization. Set the number of mixture components I of the tMFA. Here I determines the complexity of the tMFA model; I can take any integer in 3 to 8, and I = 5 gives good performance in handwritten digit recognition. The initial values of the parameters of the tMFA are set according to I and the dimension p of the data. At each node, μ_i is randomly selected from the data collected by that node; each element of A_i and D_i is generated from the standard normal distribution N(0, 1); each ν_i takes any integer between 1 and 5. In addition, each node l (l = 1, ..., M) broadcasts the number of data it has collected, N_l^(d), to its neighbor nodes. After a node m receives the data counts broadcast by all its neighbor nodes, it computes the weights c_lm. The meaning of these weights is to measure, at node m, the importance of the information transmitted by each neighbor node l (l ∈ R_m). After initialization is complete, the iteration counter is set to iter = 1 and the iterative process starts.
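The initialization step can be sketched as follows. Two details are assumptions not stated in this text: the diagonal of D_i is squared to keep the error variances positive (a practical tweak on the N(0, 1) draw), and the weight c_lm is taken proportional to the data count of each node, since the exact formula for c_lm is not reproduced here.

```python
import numpy as np

def init_node(data, I, q, rng):
    """Initialise one node's tMFA parameters as described above."""
    N, p = data.shape
    mu = data[rng.choice(N, size=I, replace=False)]   # mu_i drawn from local data
    A  = rng.normal(size=(I, p, q))                   # A_i entries ~ N(0, 1)
    # squaring keeps the error variances positive (assumption)
    D  = np.stack([np.diag(rng.normal(size=p) ** 2) for _ in range(I)])
    nu = rng.integers(1, 6, size=I).astype(float)     # nu_i: integer in 1..5
    pi = np.full(I, 1.0 / I)                          # uniform mixing weights
    return pi, A, mu, D, nu

def neighbor_weights(counts, R_m, m):
    """One plausible choice (assumption): c_lm proportional to the
    data count N_l of each neighbour l and of node m itself."""
    idx = sorted(R_m | {m})
    total = sum(counts[l] for l in idx)
    return {l: counts[l] / total for l in idx}

rng = np.random.default_rng(42)
data = rng.normal(size=(40, 16))      # 40 local feature vectors, p = 16
pi, A, mu, D, nu = init_node(data, I=5, q=4, rng=rng)
print(mu.shape)   # (5, 16)

counts = {0: 10, 1: 30, 2: 20}
c = neighbor_weights(counts, R_m={1, 2}, m=0)
print(round(sum(c.values()), 6))   # 1.0
```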
Step 2-2: compute local statistics and broadcast. This step does not need information from neighbor nodes. At each node l, based on the data it has collected, the intermediate variables g_i, Ω_i and the related expectation terms are computed first, using the parameter values obtained after the previous iteration (at the first iteration, the initial values of the parameters); here g_i denotes the probability that the n-th datum at node l belongs to the i-th class (i.e., mixture component), and the remaining terms are the corresponding expectation values. With these intermediate results, the node computes the local statistics (LS). Finally, each node l broadcasts the computed local statistics LS_l to its neighbor nodes, as shown in Fig. 1.
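The exact form of the intermediate variables is not reproduced in this text. As an illustration of how g_i is typically obtained in t-mixture EM, the sketch below computes the responsibility of each component from the component t densities, with the component covariance taken as A_i A_i^T + D_i; this is a common construction and an assumption here, not the patent's verbatim formula.

```python
import math
import numpy as np

def log_t_pdf(s, mu, Sigma, nu):
    """Log density of a p-variate t distribution t_p(mu, Sigma, nu)."""
    p = len(mu)
    diff = s - mu
    delta = float(diff @ np.linalg.solve(Sigma, diff))   # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(Sigma)
    return (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
            - 0.5 * (p * math.log(nu * math.pi) + logdet)
            - 0.5 * (nu + p) * math.log1p(delta / nu))

def responsibilities(s, pi, A, mu, D, nu):
    """Probability g_i that datum s belongs to mixture component i."""
    logp = np.array([math.log(pi[i])
                     + log_t_pdf(s, mu[i], A[i] @ A[i].T + D[i], nu[i])
                     for i in range(len(pi))])
    logp -= logp.max()                 # numerical stabilisation
    g = np.exp(logp)
    return g / g.sum()

# toy check: two well-separated components (illustrative parameters)
pi = np.array([0.5, 0.5])
A  = np.zeros((2, 4, 2))               # zero loadings: covariance reduces to D_i
mu = np.stack([np.zeros(4), np.full(4, 5.0)])
D  = np.stack([np.eye(4), np.eye(4)])
nu = np.array([4.0, 4.0])
g = responsibilities(np.full(4, 0.1), pi, A, mu, D, nu)
print(round(g.sum(), 6))   # 1.0
```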
Step 2-3: compute the combined statistics. When node m (m = 1, ..., M) has received the local statistics from all its neighbor nodes l (l ∈ R_m), node m computes the combined statistics CS_m.
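The combination step can be sketched as follows. The weighted-sum rule using the weights c_lm from initialization is an assumption about the exact formula, which is not reproduced in this text.

```python
import numpy as np

def combine_statistics(LS, c, R_m, m):
    """Combined statistic CS_m at node m: a weighted sum of the local
    statistics of its neighbours and of node m itself (assumed rule)."""
    idx = sorted(R_m | {m})
    return sum(c[l] * LS[l] for l in idx)

# toy example with vector-valued local statistics
LS = {0: np.array([1.0, 2.0]), 1: np.array([3.0, 4.0]), 2: np.array([5.0, 6.0])}
c  = {0: 0.2, 1: 0.5, 2: 0.3}
CS = combine_statistics(LS, c, R_m={1, 2}, m=0)
print(CS)
```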
Step 2-4: estimate the parameters of the model. Node m (m = 1, ..., M) estimates the parameters Θ = {π_i, A_i, μ_i, D_i, ν_i}_{i=1,...,I} according to the CS_m computed in the previous step. The estimation procedure for {π_i, μ_i}_{i=1,...,I} is as follows:
For the estimation of {A_i, D_i}_{i=1,...,I}, the process is as follows:
In addition, {ν_i}_{i=1,...,I} is obtained by solving the following equation, where ψ(·) is the standard digamma function; the equation is solved by Newton's method.
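The Newton solve for the degrees of freedom can be sketched as follows. Since the patent's exact equation is not reproduced in this text, a representative form of the degrees-of-freedom update in t-mixture EM is used, with a constant c standing in for the E-step expectation terms; the numerical digamma is a lightweight stand-in for a library implementation. All of these are labeled assumptions.

```python
import math

def digamma(x, h=1e-5):
    """Numerical digamma via a central difference of math.lgamma
    (an approximation standing in for a library digamma)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def solve_nu(c, nu0=4.0, tol=1e-10, max_iter=100):
    """Newton's method for f(nu) = log(nu/2) - digamma(nu/2) + 1 + c = 0,
    a representative form of the nu update in t-mixture EM; c collects
    the E-step expectations (assumption, not the patent's exact equation)."""
    g = lambda v: math.log(v / 2) - digamma(v / 2)
    nu = nu0
    for _ in range(max_iter):
        f = g(nu) + 1 + c
        h = 1e-6 * nu
        fp = (g(nu + h) - g(nu - h)) / (2 * h)   # numerical derivative of f
        nu_new = nu - f / fp
        if nu_new <= 0:
            nu_new = nu / 2                      # guard: keep nu positive
        if abs(nu_new - nu) < tol:
            return nu_new
        nu = nu_new
    return nu

nu_hat = solve_nu(c=-1.2)
print(round(nu_hat, 2))
```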
Step 2-5: judge convergence. Node m (m = 1, ..., M) computes the log-likelihood under the current iteration iter. If |L(Θ) − L(Θ_old)| < ε, the algorithm has converged and iteration stops; otherwise step 2-2 is executed to start the next iteration (iter = iter + 1). Here Θ denotes the parameter values estimated in the current iteration and Θ_old the parameter values estimated in the previous iteration; that is, the algorithm converges when the difference in log-likelihood between two adjacent iterations is less than the threshold ε. ε takes any value in 10^-5 to 10^-6. It should be noted that, because the nodes in the network process data in parallel, they cannot all converge in the same iteration. For example, if node l has converged while node m has not, node l no longer sends LS_l and no longer receives information transmitted by its neighbor nodes; node m then updates its CS_m with the last LS_l received from node l. The unconverged nodes continue to iterate until all nodes in the network have converged.
After the above steps 2-1 to 2-5, the model tMFA (i.e., represented by the parameters Θ at training convergence) corresponding to the training data of handwritten digit d is obtained. The above steps are repeated 10 times, once per digit, to obtain the tMFA model corresponding to each of the 10 digits; for convenience of notation and to distinguish them, Θ^(d) is used to represent the tMFA model corresponding to digit d. Distributed training is then complete.
Step 3: distributed recognition. When any computer in the network collects a new handwritten digit to be recognized, its corresponding features are first obtained by step 1 and denoted s'; then the log-likelihoods log p(s' | Θ^(d)) (d = 0, 1, ..., 9) are computed, and the label d' corresponding to the maximum log-likelihood value is taken as the recognition result for s':

d' = argmax_d log p(s' | Θ^(d)).

The specific flow of the distributed recognition method of the present invention is shown in Fig. 2.
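The recognition rule above can be sketched as follows. The toy "models" and scoring function are stand-ins (a class mean with a negative squared distance, which is not the tMFA likelihood); they only illustrate the argmax decision rule.

```python
import numpy as np

def recognise(s_prime, models, loglik):
    """Return d' = argmax_d loglik(s' | model_d)."""
    scores = [loglik(s_prime, models[d]) for d in range(len(models))]
    return int(np.argmax(scores))

# toy stand-in: each "model" is just a class mean (assumption for
# illustration; in the method each model is a trained tMFA)
means = [np.full(16, d) for d in range(10)]
toy_loglik = lambda s, m: -float(np.sum((s - m) ** 2))
s_new = np.full(16, 6.9)
print(recognise(s_new, means, toy_loglik))   # 7
```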
Beneficial effect:
1. the t hybrid cytokine analysis adopted in the present invention, to the outlier existed in data, there is higher robustness, and high dimensional data can be described better, thus better obtain the model corresponding with data, thus also can obtain better training and recognition performance.
2. the distributed training process based on t hybrid cytokine analytical model adopted in the present invention, the information comprised in each computer node in network can be made full use of data that other computer node collects, thus the more accurate model of training place.
3. the distributed training process based on t hybrid cytokine analytical model adopted in the present invention; in computer node cooperating process; exchange local statistic instead of directly transmit raw data; because the quantity of local statistic and dimension are much smaller than data; therefore this mode saves the expense of communication on the one hand; on the other hand, be conducive to the privacy information adequately protected in data, improve the security performance of the system adopting this method.
4. the Distributed identification process based on t hybrid cytokine analytical model adopted in the present invention, can gather new data in any computer node place in a network, can obtain identical recognition result.
Embodiment
The invention is described in further detail below in conjunction with the accompanying drawings.
To better illustrate the distributed handwritten digit recognition method based on t mixture factor analysis to which the present invention relates, we describe it with a concrete application example.
(1) Collection of data and feature extraction: suppose there are 44 people in total and each person writes each digit by hand 25 times, for a total of 25*10*44 = 11000 raw data. There are 20 computers/nodes in the network (M = 20), each node has 3 neighbor nodes, and any two nodes can communicate through a direct or multi-hop path. The raw data of the handwritten digits of 30 of the people (250*30 = 7500) are used for distributed training, divided into 20 parts and assigned randomly to the 20 nodes. For each raw datum, the coordinates of 8 equally spaced points on its track are taken as its corresponding feature data s, 16 dimensions in total. For convenience of notation, let the training data set of digit d obtained at node m after feature extraction be S_m^(d), where s_mn^(d) denotes the n-th feature datum of handwritten digit d used for training at node m, with dimension p = 16, and N_m^(d) is the number of training data for digit d.
(2) Distributed training: after step (1) is completed, distributed training starts. Taking the training of digit d as an example, the distribution of the feature data is modeled with a tMFA; the training process is shown in Fig. 2, and its concrete steps are as follows:
(2-1) Initialization: set the number of mixture components of the tMFA to I = 5. The initial values of the parameters of the tMFA are set according to I and the dimension p of the data. At each node, μ_i is randomly selected from the data collected by that node, and each element of A_i and D_i is generated from the standard normal distribution N(0, 1). In addition, each node l (l = 1, ..., M) broadcasts the number of data it has collected, N_l^(d), to its neighbor nodes. After a node m receives the data counts broadcast by all its neighbor nodes, it computes the weights c_lm, whose meaning is to measure, at node m, the importance of the information transmitted by each neighbor node l (l ∈ R_m). After initialization is complete, the iteration counter is set to iter = 1 and the iterative process starts.
(2-2) Compute local statistics and broadcast: this step does not need information from neighbor nodes. At each node l, based on the data it has collected, the intermediate variables g_i, Ω_i and the related expectation terms are computed first, using the parameter values obtained after the previous iteration (at the first iteration, the initial values of the parameters); here g_i denotes the probability that the n-th datum at node l belongs to the i-th class (mixture component), and the remaining terms are the corresponding expectation values. With these intermediate results, the node computes the local statistics (LS). Finally, each node l broadcasts the computed local statistics LS_l to its neighbor nodes, as shown in Fig. 1.
(2-3) Compute the combined statistics: when node m (m = 1, ..., M) has received the local statistics from all its neighbor nodes l (l ∈ R_m), node m computes the combined statistics CS_m.
(2-4) Estimate the parameters of the model: node m (m = 1, ..., M) estimates Θ = {π_i, A_i, μ_i, D_i, ν_i}_{i=1,...,I} according to the CS_m computed in the previous step. The estimation procedure for {π_i, μ_i}_{i=1,...,I} is as follows:
For the estimation of {A_i, D_i}_{i=1,...,I}, the process is as follows:
In addition, {ν_i}_{i=1,...,I} is obtained by solving the following equation, where ψ(·) is the standard digamma function; the equation is generally solved by Newton's method.
(2-5) Judge convergence: node m (m = 1, ..., M) computes the log-likelihood under the current iteration iter. If |L(Θ) − L(Θ_old)| < ε, the algorithm has converged and iteration stops; otherwise step (2-2) is executed to start the next iteration (iter = iter + 1). Here Θ denotes the parameter values estimated in the current iteration and Θ_old the parameter values estimated in the previous iteration; that is, the algorithm converges when the difference in log-likelihood between two adjacent iterations is less than the threshold ε. ε takes any value in 10^-5 to 10^-6. It should be noted that, because the nodes in the network process data in parallel, they cannot all converge in the same iteration. For example, if node l has converged while node m has not, node l no longer sends LS_l and no longer receives information transmitted by its neighbor nodes; node m then updates its CS_m with the last LS_l received from node l. The unconverged nodes continue to iterate until all nodes in the network have converged.
After the above steps (2-1) to (2-5), the model tMFA (represented by the parameters Θ at training convergence) corresponding to the training data of handwritten digit d is obtained. The above steps are repeated 10 times to obtain the tMFA model corresponding to each of the 10 digits; for convenience of notation and to distinguish them, Θ^(d) is used to represent the tMFA model corresponding to digit d. Distributed training is then complete.
(3) Distributed recognition: when any computer in the network collects a new handwritten digit to be recognized, its corresponding features are first obtained by step (1) and denoted s'; then the log-likelihoods log p(s' | Θ^(d)) (d = 0, 1, ..., 9) are computed, and the label d' corresponding to the maximum log-likelihood value is taken as the recognition result for s':

d' = argmax_d log p(s' | Θ^(d)).

The distributed recognition flow of the present invention is shown in Fig. 2.
Performance evaluation:
Because the true values of the digits to be recognized are known, comparing the results of the recognition method involved in the present invention with the true values gives the recognition accuracy (i.e., recognition accuracy = number of handwritten digits correctly recognized by all nodes / (20*3500)), so the validity and accuracy of the method involved in the present invention can be evaluated and measured. To compare the performance of the tMFA-based distributed handwritten digit recognition method proposed by the present invention (distributed tMFA for short) with other methods, it is compared here with a tMFA-based centralized handwritten digit recognition method (centralized tMFA for short) and a tMFA-based handwritten digit recognition method without cooperation among the nodes (non-cooperative tMFA for short). It should be noted that in centralized tMFA all nodes must transmit their raw data to a central node, which carries out training and recognition with traditional MFA and then returns the results to each node. This approach is of little use in practice: first, transmitting raw data incurs very large communication overhead, and packet loss or packet damage greatly degrades the final recognition performance; second, it is unfavorable to the privacy protection of the data and raises network security concerns. The purpose here is to examine whether the recognition method of the distributed tMFA proposed by the present invention can reach the same performance as the centralized tMFA. The recognition results are presented both qualitatively and quantitatively. For the qualitative presentation, Hinton diagrams of the confusion matrices are used, as shown in Fig. 3. In these diagrams, the rows represent the recognition results for digits 0 to 9 and the columns represent the true values of digits 0 to 9. A block on the main diagonal represents correct recognition, and the larger the block, the more digits are recognized correctly; blocks at other positions indicate cases of wrong recognition. As can be seen from the figure, the centralized tMFA and the distributed tMFA of the present invention (only node 3 is shown for reasons of space; the other nodes give the same results) perform well, while the non-cooperative tMFA performs poorly. For the quantitative presentation, the mean and variance of the recognition accuracy are used, as shown in Fig. 4. The mean recognition accuracy of the distributed tMFA designed by the present invention is essentially identical to that of the centralized tMFA, while that of the non-cooperative tMFA is worse; in addition, the variance of the recognition accuracy of the distributed tMFA is much smaller than that of the non-cooperative tMFA. The method of the present invention therefore overcomes the shortcomings of traditional centralized tMFA handwritten digit recognition and achieves distributed recognition with good performance.
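The evaluation described above can be sketched as follows; the confusion-matrix layout (rows = recognized digits, columns = true digits) follows the description of Fig. 3, while the function name and toy data are illustrative.

```python
import numpy as np

def evaluate(pred, truth, n_classes=10):
    """Recognition accuracy and a confusion matrix whose rows are the
    recognised digits and whose columns are the true digits."""
    pred = np.asarray(pred); truth = np.asarray(truth)
    acc = float((pred == truth).mean())
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for p_, t_ in zip(pred, truth):
        cm[p_, t_] += 1
    return acc, cm

# toy example: one "1" is wrongly recognised as "2"
acc, cm = evaluate([0, 1, 2, 2], [0, 1, 1, 2])
print(acc)        # 0.75
print(cm[2, 1])   # 1
```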
The scope of protection claimed by the present invention is not limited to the description of this embodiment.