CN104598720B - CMP time setting method based on clustering and multi-task learning - Google Patents


Publication number
CN104598720B
CN104598720B (application CN201410805040.3A)
Authority
CN
China
Legal status
Active
Application number
CN201410805040.3A
Other languages
Chinese (zh)
Other versions
CN104598720A (en)
Inventor
刘民 (Liu Min)
段运强 (Duan Yunqiang)
董明宇 (Dong Mingyu)
郝井华 (Hao Jinghua)
Current Assignee
Zhengda Industrial Biotechnology (Shanghai) Co., Ltd.
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201410805040.3A priority Critical patent/CN104598720B/en
Publication of CN104598720A publication Critical patent/CN104598720A/en
Application granted granted Critical
Publication of CN104598720B publication Critical patent/CN104598720B/en

Landscapes

  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A CMP time setting method based on clustering and multi-task learning, belonging to the fields of automatic control, information technology, and advanced manufacturing. To optimally set the chemical mechanical polishing time in multi-variety mixed-flow microelectronics manufacturing, the method takes process indices and product status as inputs and the polishing time, the key factor influencing the process indices, as output, establishing an inverse model used to optimize the polishing-time setting. Because production involves many varieties with little data per variety, similar varieties are clustered by their product features when building the inverse model, and the varieties within each class are modeled with a multi-task learning method based on shared-parameter extraction; the resulting model parameters are divided into a part shared by all varieties in the class and a part private to each individual variety.

Description

CMP time setting method based on clustering and multi-task learning
Technical field
The invention belongs to the fields of automatic control, information technology, and advanced manufacturing. Under the multi-variety mixed-flow production mode of modern microelectronics manufacturing, manually setting the chemical mechanical polishing (CMP) time leads to a high rework rate and relatively low production efficiency for the whole line. To solve this problem, the invention adopts an optimal setting method based on an inverse model: process indices and product status are taken as the inputs, and the polishing time, the key factor influencing the process indices, is taken as the output, so that the inverse model can be used to optimize the polishing-time setting. To cope with the many product varieties present in the production data and the small amount of data per variety, a CMP time setting method based on clustering and multi-task learning is proposed, which achieves optimized setting of the CMP time and improves the production efficiency of the CMP process.
Background technology
Chemical mechanical polishing (CMP) is a critical process in microelectronics manufacturing and affects the production efficiency of the whole line. Because modern microelectronics manufacturing is characterized by many varieties in small batches, enterprises frequently adopt a multi-variety mixed-flow production mode, under which the traditional Run-to-Run (RtR) optimal control method struggles to obtain ideal results. The main idea of RtR is to guide the production of a newly arriving batch with the production information of the last batch or last few batches; it generally assumes that a piece of equipment continuously processes products of the same variety, or that a small number of varieties cycle regularly through the equipment. Under multi-variety mixed-flow production, however, a piece of equipment may continuously process products of arbitrary varieties, and there are differences between varieties and between pieces of equipment, so traditional RtR methods perform poorly. At present, therefore, the polishing time still relies on manual experience and pilot-wafer testing; once an unqualified product appears, the whole batch must be reworked, making high production efficiency hard to achieve, so optimal setting of the CMP processing time is urgently needed. Traditional operating-parameter optimization methods first model the process indices and then optimize the operating parameters on the basis of the index model. The process index model is a forward model that takes the product status and operating parameters as inputs and the process indices as outputs; with little data, however, it is hard to build an accurate index model, and the optimization performs poorly. Borrowing RtR's idea of guiding future production with historical batches, and considering that CMP data cover many varieties with few samples per variety, the invention takes the process indices and product status as inputs and the key factor influencing the process indices, the polishing time, as output, establishing an inverse model used to optimize the polishing-time setting; on this basis a CMP time setting method based on clustering and multi-task learning is proposed.
Summary of the invention
To solve the problem of optimally setting the CMP time in a multi-variety mixed-flow production environment, the invention takes the process indices and product status as inputs and the key factor influencing the process indices, the polishing time, as output, establishing an inverse model used to optimize the polishing-time setting; on this basis a CMP time setting method based on clustering and multi-task learning is proposed. To handle the many product varieties and few samples per variety in actual production data, a two-step modeling method based on clustering and multi-task learning is proposed, which separates the clustering step from the parameter-learning step that traditional clustered multi-task learning treats jointly. The variety-clustering step assumes that the data within each variety follow a single multivariate Gaussian distribution and estimates the mean vector and covariance matrix of that distribution by maximum likelihood. Based on the estimates, the similarity of two multivariate Gaussian distributions is expressed with the Bhattacharyya distance, and the similarity matrix is obtained by computing the similarity between the probability distributions of all pairs of varieties. With the similarity matrix as input, variety clustering is completed by the affinity propagation algorithm. Meanwhile, to ensure that each resulting class contains a sufficiently large number of samples under small-sample conditions, the sample count of each variety is embedded into the clustering process as prior knowledge.
Within each class, the model parameters are computed with the proposed multi-task learning algorithm based on shared-parameter extraction. To address the shortage of samples in an individual task, the algorithm decomposes the model parameters into a shared part and a private part and learns both parts simultaneously; the shared part compensates for the model bias brought by insufficient sample data.
A CMP time setting method based on clustering and multi-task learning, characterized in that the method is realized on a computer by the following steps:
Step (1): Data preparation

The optimal setting model established in this method takes 4 process indices and product-status values as the model input row vector x, comprising: the grinding-material removal rate, the incoming wafer thickness, the sampled pilot-wafer post-polish thickness, and the lot-sampled post-polish thickness; the chemical mechanical polishing time is the model output y. The relation between the model input and output is assumed to satisfy:

y = xw + δ

where the column vector w denotes the model parameters to be determined and δ is noise.

Without loss of generality, assume there are m product varieties. The $N_i$ sample inputs of the i-th variety are recorded as the matrix $X_i \in \mathbb{R}^{N_i \times d}$, whose j-th row $x_{i(j)}$ is the input vector of the j-th sample of the i-th variety; d is the number of model input variables, d = 4 in this method. The $N_i$ sample outputs of the i-th variety are recorded as the column vector $y_i \in \mathbb{R}^{N_i}$, whose j-th element $y_i^{(j)}$ is the output, i.e. the CMP time, of the j-th sample of the i-th variety. For ease of later exposition, X denotes the matrix formed by vertically stacking the input matrices $[X_1, X_2, \dots, X_{m-1}, X_m]$, and Y likewise denotes the column vector formed by vertically stacking the output vectors $[y_1, y_2, \dots, y_{m-1}, y_m]$; X and Y both have $N = \sum_{i=1}^{m} N_i$ rows, N being the total number of samples. The column vector $I \in \mathbb{R}^{N}$ records the variety label of each sample, taking values in {1, 2, …, m−1, m}.
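The data layout of step (1) can be sketched as follows; the sample counts and random values here are purely illustrative assumptions, not taken from the patent's data:

```python
import numpy as np

# Hypothetical sample counts for m = 3 varieties; all numbers are illustrative.
rng = np.random.default_rng(0)
d = 4                                  # removal rate, incoming, pilot-wafer, lot-sampled thickness
counts = [5, 3, 7]                     # N_i per variety
X_blocks = [rng.normal(size=(n, d)) for n in counts]
y_blocks = [rng.normal(size=n) for n in counts]

X = np.vstack(X_blocks)                # (sum N_i) x d stacked inputs
Y = np.concatenate(y_blocks)           # stacked outputs, one row per sample
I = np.concatenate([np.full(n, i + 1) for i, n in enumerate(counts)])  # variety labels 1..m
```

Stacking keeps each variety's rows contiguous, so the label vector I is all that is needed to recover the per-variety blocks later.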
Step (2): Compute the similarity matrix between varieties

Assume the data of each variety follow a distinct multivariate Gaussian distribution, and estimate each variety's probability distribution by maximum likelihood. For the i-th variety, the estimates of the mean vector $\hat{\mu}_i$ and covariance matrix $\hat{\Sigma}_i$ of its multivariate Gaussian are:

$$\hat{\mu}_i=\frac{1}{N_i}\sum_{j=1}^{N_i} z_{i(j)},\qquad \hat{\Sigma}_i=\frac{1}{N_i-1}\sum_{j=1}^{N_i}\left(z_{i(j)}-\hat{\mu}_i\right)\left(z_{i(j)}-\hat{\mu}_i\right)',\qquad i=1,2,\dots,m$$

where the row vector $z_{i(j)}$ denotes the j-th row of the matrix $Z_i=[X_i, y_i]\in\mathbb{R}^{N_i\times(d+1)}$, containing the 4 input variables and 1 output variable of the j-th sample;
The similarity between varieties is compared with the Bhattacharyya distance, i.e. by comparing the similarity of the varieties' multivariate Gaussian distributions:

$$d_B(p(x),q(x))=-\ln\left(\int\sqrt{p(x)\,q(x)}\,dx\right)$$

Under the multivariate Gaussian assumption the Bhattacharyya distance has a closed form. For two multivariate Gaussians $G_1\sim N(\mu_1,\Sigma_1)$ and $G_2\sim N(\mu_2,\Sigma_2)$ the distance is computed as:

$$d_B(G_1,G_2)=\frac{1}{8}(\mu_1-\mu_2)^T\,\Gamma^{-1}(\mu_1-\mu_2)+\frac{1}{2}\ln\left(|\Sigma_1|^{-\frac{1}{2}}\,|\Sigma_2|^{-\frac{1}{2}}\,|\Gamma|\right)$$

where $\Gamma=\frac{\Sigma_1+\Sigma_2}{2}$ and |A| denotes the determinant of the matrix A;
Using the above dissimilarity computation, the pairwise distances are computed from the estimated mean vectors $\hat{\mu}_i$ and covariance matrices $\hat{\Sigma}_i$ of the varieties' multivariate Gaussian distributions; since the subsequent affinity propagation clustering algorithm takes similarities as input, the dissimilarities are negated to obtain the similarity matrix;
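The closed-form distance above can be sketched in code; the toy per-variety data below are an assumption for illustration (in the patent each row would hold the 4 process variables plus the polishing time):

```python
import numpy as np

def bhattacharyya(mu1, S1, mu2, S2):
    """Closed-form Bhattacharyya distance between two multivariate Gaussians."""
    G = (S1 + S2) / 2.0                        # the Gamma matrix in the text
    diff = mu1 - mu2
    term1 = diff @ np.linalg.solve(G, diff) / 8.0
    _, ldG = np.linalg.slogdet(G)              # log-determinants for numerical stability
    _, ld1 = np.linalg.slogdet(S1)
    _, ld2 = np.linalg.slogdet(S2)
    term2 = 0.5 * (ldG - 0.5 * ld1 - 0.5 * ld2)
    return term1 + term2

# Maximum-likelihood fit per variety, then the similarity matrix is the
# negated pairwise distance, as in step (2).
rng = np.random.default_rng(1)
Z = [rng.normal(loc=c, size=(50, 5)) for c in (0.0, 2.0)]   # rows: 4 inputs + 1 output
mus = [z.mean(axis=0) for z in Z]
covs = [np.cov(z, rowvar=False) for z in Z]                  # unbiased 1/(N-1), as in the text
m = len(Z)
S = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        S[i, j] = -bhattacharyya(mus[i], covs[i], mus[j], covs[j])
```

Using `slogdet` instead of forming the determinants directly avoids overflow for badly scaled covariance matrices.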
Step (3): Product-feature clustering based on affinity propagation

Affinity propagation is a clustering algorithm based on message accumulation: the cluster centers are determined from the accumulated messages, two kinds of which, the responsibility r(i, k) and the availability a(i, k), are computed from the similarity matrix:

$$r(i,k)\leftarrow s(i,k)-\max_{k^{*}\ne k}\left\{a(i,k^{*})+s(i,k^{*})\right\}$$

$$a(i,k)\leftarrow\sum_{i^{*}\ne k}\max\left\{0,\,r(i^{*},k)\right\},\qquad(i=k)$$

Starting from a(i, k) = 0, r(i, k) and a(i, k) are updated iteratively according to the formulas above until convergence;
In affinity propagation, the preference vector expresses the prior likelihood of each task becoming a cluster center; in each iteration the diagonal entries of the similarity matrix are replaced by the preference vector, thereby influencing the choice of cluster centers. Since varieties with more samples yield more accurate models and are better suited to serve as cluster centers, this prior knowledge about sample counts is incorporated into the clustering by setting the preference vector of the affinity propagation algorithm as follows.

Let the preference vector be p = [p₁, p₂, …, p_{m−1}, p_m] with

$$p_i=\begin{cases}\left(N_i-\max_{j} N_j\right)\times a, & N_i> L\\[2pt] -L\times a, & N_i\le L\end{cases}\qquad i=1,2,\dots,m$$

where $N_i$ is the sample count of each task and L expresses that varieties with more than L samples are preferred as cluster centers; typical values are a = 0.005, b = 2000, L = 50;
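A minimal NumPy sketch of the responsibility/availability message passing of step (3); the similarity values, constant preference, and damping factor below are illustrative choices, not the patent's (which would place the sample-count-based preference vector p on the diagonal):

```python
import numpy as np

def affinity_propagation(S, preference, max_iter=200, damping=0.5):
    """Responsibility/availability message passing on a similarity matrix S."""
    S = S.astype(float).copy()
    n = S.shape[0]
    np.fill_diagonal(S, preference)            # preference vector on the diagonal
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max_{k* != k} { a(i,k*) + s(i,k*) }
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf        # mask to find the second-largest value
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * R_new
        # a(k,k) <- sum_{i* != k} max{0, r(i*,k)}; off-diagonal availabilities add
        # r(k,k) and are clipped at zero, as in standard affinity propagation
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, 0)
        col = Rp.sum(axis=0)
        A_new = np.minimum(0.0, R.diagonal() + col[None, :] - Rp)
        A_new[np.diag_indices(n)] = col
        A = damping * A + (1 - damping) * A_new
    return np.argmax(A + R, axis=1)            # exemplar index for every point

# Two well-separated groups on a line; similarity = negative squared distance.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
S = -(x[:, None] - x[None, :]) ** 2
labels = affinity_propagation(S, preference=np.full(len(x), -1.0))
```

With the preference set between the within-group and between-group similarities, the two groups each elect one exemplar; raising the preference produces more, smaller clusters.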
Step (4): Multi-task learning based on shared-parameter extraction

Clustering yields L classes. The varieties within each class are modeled with the proposed multi-task learning algorithm based on shared-parameter extraction, whose main idea is to split the model parameters of each variety into two parts: a shared parameter and a private parameter. The shared parameter is the part common to the data models of all varieties in a class, denoted by the column vector $u\in\mathbb{R}^{d}$; the private parameter is the part that differs between the data models of the varieties in a class, denoted by the column vectors $v^{(i)}\in\mathbb{R}^{d}$. If a class contains r varieties, the r column vectors $v^{(i)}$ form a matrix V with r columns. The multi-task learning algorithm based on shared-parameter extraction learns both parts from the data in each class to obtain the final model parameters; for the i-th task, the final model parameter is:

w_i = u + v^{(i)}
Before model learning, the data are first normalized; then the model parameters λ₁, λ₂, λ₃ are set, and the shared parameter vector u and the private parameter matrix V are randomly initialized;
The iterative process is as follows, where X denotes the matrix formed by vertically stacking the input matrices $[X_1, X_2, \dots, X_{r-1}, X_r]$ of the r varieties in a class, and Y likewise denotes the column vector formed by vertically stacking their output vectors $[y_1, y_2, \dots, y_{r-1}, y_r]$. The k-th iteration computes:

$$p_k=u_{k-1}-\frac{1}{l_{k-1}}\nabla_u f(u_{k-1},V_{k-1})$$

$$q_k^{(i)}=\max\!\left(0,\;1-\frac{\lambda_3}{l_{k-1}\,\big\|s_k^{(i)}\big\|_2}\right)s_k^{(i)}$$

In the formulas above:

$$S_k=V_{k-1}-\frac{1}{l_{k-1}}\nabla_V f(u_{k-1},V_{k-1})$$

$$\nabla_u f(u_k,V_k)=X^T\frac{Y-Xu_k}{\|Y-Xu_k\|_2}-2\lambda_1\sum_{i=1}^{r}X_i^T\left(y_i-X_i\left(u_k+v_k^{(i)}\right)\right)+2\lambda_2\,u_k$$

where $s_k^{(i)}$ denotes the i-th column of $S_k$. Then $u_k$ and $V_k$ are updated from $p_k$, $u_{k-1}$ and $Q_k$, $V_{k-1}$, where α ∈ [0, 1] with initial values α₀ = 0 and t₀ = 1.

During the iteration, the step length $l_k$ is determined as $l_k = 2^{j_k}\, l_{k-1}$, where $j_k$ is the smallest non-negative integer for which the backtracking line-search condition holds.
Applying the above method to each of the L classes obtained by clustering yields a library of L models covering all m varieties.
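The shared/private decomposition w_i = u + v^{(i)} of step (4) can be illustrated with a toy alternating scheme; note that plain alternating ridge updates stand in here for the patent's accelerated proximal-gradient iteration, and all data sizes, noise levels, and λ values are made-up assumptions:

```python
import numpy as np

# Synthetic multi-task data generated from a true shared part and small private parts.
rng = np.random.default_rng(2)
d, tasks, n = 4, 3, 40
u_true = rng.normal(size=d)
V_true = 0.3 * rng.normal(size=(d, tasks))
Xs = [rng.normal(size=(n, d)) for _ in range(tasks)]
ys = [X @ (u_true + V_true[:, i]) + 0.01 * rng.normal(size=n) for i, X in enumerate(Xs)]

lam1, lam2 = 1e-3, 1e-3                 # illustrative regularization weights
u = np.zeros(d)
V = np.zeros((d, tasks))
for _ in range(50):
    # update the shared u against residuals after the private parts (ridge solution)
    A = sum(X.T @ X for X in Xs) + lam1 * np.eye(d)
    b = sum(X.T @ (y - X @ V[:, i]) for i, (X, y) in enumerate(zip(Xs, ys)))
    u = np.linalg.solve(A, b)
    # update each private v_i against residuals after the shared part
    for i, (X, y) in enumerate(zip(Xs, ys)):
        V[:, i] = np.linalg.solve(X.T @ X + lam2 * np.eye(d), X.T @ (y - X @ u))

W = u[:, None] + V                      # final per-task parameters w_i = u + v_i
err = max(np.linalg.norm(W[:, i] - (u_true + V_true[:, i])) for i in range(tasks))
```

Only the sum u + v_i is identifiable from a single task's data; it is the regularization that decides how much of each w_i is attributed to the shared part.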
Brief description of the drawings
Fig. 1: Flow chart of the CMP time setting method based on clustering and multi-task learning.
Fig. 2: Software and hardware composition of the CMP time setting method based on clustering and multi-task learning.
Embodiment
The present invention proposes a CMP time setting method based on clustering and multi-task learning. Its main advantage is its applicability to multi-variety mixed-flow production, where it improves production efficiency compared with manual setting. In practical application, when a new production lot arrives, the polishing time can be computed from its variety, processed layer, and other lot information. The clustering and multi-task learning algorithms of the invention rely on the related hardware, including a data acquisition system, a computation server, and user clients, and are realized by the optimization software running on them. The software and hardware composition is shown in Fig. 2.
Step (1): Data acquisition

The collected production information includes the product variety of the lot, the processing level, the material removal rate, the incoming wafer thickness, the pilot-wafer post-polish thickness, the sampled post-polish thickness, the incoming thickness range, and the post-polish thickness range; this information is stored into the production process database;
Step (2): Data preparation

On the model training server, abnormal history records are removed first, including records with obvious manual input errors and records with missing fields. The data are then arranged as required for model learning: the product variety and the processing level are jointly treated as one generalized variety, so that different levels of the same variety are regarded as different varieties. The resulting data set is $\{X_i, y_i\}, i = 1, 2, \dots, m$, where the j-th row $x_{i(j)}$ consists of 4 values: the material removal rate, the incoming wafer thickness, the pilot-wafer post-polish thickness, and the lot-sampled post-polish thickness; the j-th element $y_i^{(j)}$ is the CMP time; the variety labels form the vector I.
Step (3): Model training

On the model training server, model learning is carried out from the prepared data set {X_i, y_i}, i = {1, 2, …, m−1, m} and the variety label vector I. It comprises two steps: product-feature clustering and multi-task learning with shared-parameter extraction. The whole model training follows the computation flow described in the invention; the method's flow chart is shown in Fig. 1.
Step (4): Parameter optimization

In product-feature clustering and multi-task learning with shared-parameter extraction, the main model parameters are the sample-count threshold L, the clustering parameter a, and the multi-task learning parameters λ₁, λ₂, λ₃. The threshold L can be determined by counting the sample number of each variety in the whole data set. The other three parameters are optimized by cross-validation: the whole data set is split into training and test sets. For example, for 3-fold cross-validation, the data of each variety are divided into 3 parts, of which 1 part serves as test data and 2 parts as training data. The model is trained on the training data, and its optimal-setting performance is then evaluated on the test data; the best-performing values of a, λ₁, λ₂, λ₃ are selected by trying different settings. Finally, the optimal model is transferred to the on-site server.
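The 3-fold tuning loop can be sketched as below; the per-variety splitting mirrors the text, while the stand-in ridge model and the grid values are assumptions for illustration (the real method would train the full clustering + multi-task pipeline inside the loop):

```python
import numpy as np
from itertools import product

def cv_score(params, data, folds_per_variety):
    """Mean 3-fold test MSE; a, lam2, lam3 are unused by the stand-in ridge model."""
    a, lam1, lam2, lam3 = params
    scores = []
    for fold in range(3):
        mse = 0.0
        for (X, y), folds in zip(data, folds_per_variety):
            te = folds[fold]                                   # 1 part for testing
            tr = np.concatenate([folds[f] for f in range(3) if f != fold])
            w = np.linalg.solve(X[tr].T @ X[tr] + lam1 * np.eye(X.shape[1]),
                                X[tr].T @ y[tr])
            mse += np.mean((X[te] @ w - y[te]) ** 2)
        scores.append(mse / len(data))
    return float(np.mean(scores))

rng = np.random.default_rng(3)
data = [(rng.normal(size=(30, 4)), rng.normal(size=30)) for _ in range(2)]
# split every variety's data into 3 fixed folds, as described in the text
folds_per_variety = [np.array_split(rng.permutation(len(y)), 3) for _, y in data]
grid = product([0.005], [0.1, 1.0, 10.0], [0.1], [0.1])        # (a, lam1, lam2, lam3)
best = min(grid, key=lambda p: cv_score(p, data, folds_per_variety))
```

Fixing the folds before the grid search keeps the comparison between parameter settings fair; re-splitting per setting would add noise to the selection.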
Step (5): On-site application

On the on-site server, the model parameters are selected according to the variety and level of the current processing lot, as transmitted from the actual production data. The input vector x of the current lot comprises the material removal rate, the incoming wafer thickness, the pilot-wafer post-polish thickness, and the lot-sampled post-polish thickness; here the pilot-wafer and lot-sampled post-polish thicknesses are replaced by the standard post-polish thickness of this variety and level. The dot product of the model parameters with the vector x gives the optimal setting value of the CMP time.
Based on the proposed multi-task learning method with product-feature clustering and shared-parameter extraction, a large number of simulation experiments were carried out; owing to space limitations, only the simulation results of applying the invention to the optimal setting of the CMP time are given here. The input data consist of 4 values: the material removal rate, the incoming wafer thickness, the pilot-wafer post-polish thickness, and the lot-sampled post-polish thickness. The data are industrial field data collected between 2011-1-1 and 2013-7-25, covering 441 varieties and 11250 records in total; the model parameters were determined by 3-fold cross-validation.
The invention is compared with the typical multi-task learning algorithm CMTL-convex (Clustered multi-task learning: a convex formulation) and with the nonlinear modeling methods kernel extreme learning machine (KELM) and support vector machine (SVM). To demonstrate the effectiveness of the variety-clustering algorithm, KELM and SVM without variety clustering are compared against Cluster-KELM and Cluster-SVM with variety clustering. Both KELM and SVM use the Gaussian kernel function

exp(−γ‖u − v‖²)

The other parameter is the regularization-term weight in the model. The parameter values are listed in the table below.
Table 1: Algorithm parameter settings
Data in different proportions (0.2, 0.4, 0.6) are randomly selected as test data for the performance comparison. The performance indices are the normalized mean squared error (nMSE), the averaged mean squared error (aMSE), and the proportion of predictions with error beyond 10% (error > 10%). The comparison results are shown in Table 2.

Table 2: Algorithm performance comparison results
As can be seen from the table, the proposed TwCMTL achieves better accuracy and better generalization ability than CMTL-convex, SVM, KELM, Cluster-SVM, and Cluster-KELM.
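For reference, the normalized mean squared error used in the comparison above can be computed as below; this particular normalization (MSE divided by the target variance) is a common convention and an assumption here, since the text does not spell out the exact formula:

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized MSE: mean squared error divided by the variance of the target."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = nmse(y, y)                       # 0.0 for a perfect prediction
baseline = nmse(y, np.full(4, y.mean()))   # 1.0 for the constant-mean predictor
```

Values below 1 therefore indicate that a model beats the trivial predict-the-mean baseline, which makes nMSE comparable across varieties with different time scales.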

Claims (1)

1. A CMP time setting method based on clustering and multi-task learning, characterized in that the method is realized on a computer by the following steps:

Step (1): Data preparation

The optimal setting model established in this method takes 4 process indices and product-status values as the model input row vector x, comprising: the grinding-material removal rate, the incoming wafer thickness, the sampled pilot-wafer post-polish thickness, and the lot-sampled post-polish thickness; the chemical mechanical polishing time is the model output y; the relation between the model input and output is assumed to satisfy:

y = xw + δ

where the column vector w denotes the model parameters to be determined and δ is noise;

Assume there are m product varieties; the $N_i$ sample inputs of the i-th variety are recorded as the matrix $X_i \in \mathbb{R}^{N_i\times d}$, whose j-th row $x_{i(j)}$ is the input vector of the j-th sample of the i-th variety, d being the number of model input variables, d = 4; the $N_i$ sample outputs of the i-th variety are recorded as the column vector $y_i \in \mathbb{R}^{N_i}$, whose j-th element $y_i^{(j)}$ is the output, i.e. the CMP time, of the j-th sample of the i-th variety; for ease of later exposition, X denotes the matrix formed by vertically stacking the input matrices $[X_1, X_2, \dots, X_{m-1}, X_m]$, and Y likewise denotes the column vector formed by vertically stacking the output vectors $[y_1, y_2, \dots, y_{m-1}, y_m]$; X and Y both have $N=\sum_{i=1}^{m} N_i$ rows, N being the total number of samples; the column vector $I \in \mathbb{R}^{N}$ records the variety label of each sample, taking values in {1, 2, …, m−1, m};
Step (2): Compute the similarity matrix between varieties

The probability distribution of each variety is estimated by maximum likelihood; under the multivariate Gaussian assumption, the estimates of the mean vector $\hat{\mu}_i$ and covariance matrix $\hat{\Sigma}_i$ corresponding to each variety are:

$$\hat{\mu}_i=\frac{1}{N_i}\sum_{j=1}^{N_i} z_{i(j)},\qquad i=1,2,\dots,m$$

$$\hat{\Sigma}_i=\frac{1}{N_i-1}\sum_{j=1}^{N_i}\left(z_{i(j)}-\hat{\mu}_i\right)\left(z_{i(j)}-\hat{\mu}_i\right)',\qquad i=1,2,\dots,m$$

where the row vector $z_{i(j)}$ denotes the j-th row of the matrix $Z_i=[X_i, y_i]\in\mathbb{R}^{N_i\times(d+1)}$, containing the 4 input variables and 1 output variable;
The dissimilarity between varieties is compared with the Bhattacharyya distance, i.e. by comparing the dissimilarity of the varieties' multivariate Gaussian distributions; the Bhattacharyya distance is defined as:

$$d_B(p(x),q(x))=-\ln\left(\int\sqrt{p(x)\,q(x)}\,dx\right)$$

Under the multivariate Gaussian assumption the Bhattacharyya distance has a closed form; for two multivariate Gaussians $G_1\sim N(\mu_1,\Sigma_1)$ and $G_2\sim N(\mu_2,\Sigma_2)$ the Bhattacharyya distance is computed as:

$$d_B(G_1,G_2)=\frac{1}{8}(\mu_1-\mu_2)^T\,\Gamma^{-1}(\mu_1-\mu_2)+\frac{1}{2}\ln\left(|\Sigma_1|^{-\frac{1}{2}}\,|\Sigma_2|^{-\frac{1}{2}}\,|\Gamma|\right)$$

where $\Gamma=\frac{\Sigma_1+\Sigma_2}{2}$ and |A| denotes the determinant of the matrix A;
Since the subsequent affinity propagation clustering method takes similarities as input, the dissimilarities are negated to obtain similarities; the similarity matrix is computed from the estimated mean vectors $\hat{\mu}_i$ and covariance matrices $\hat{\Sigma}_i$ of the varieties' multivariate Gaussian distributions;
Step (3): Product-feature clustering based on affinity propagation

Affinity propagation is a clustering method based on message accumulation: the cluster centers are determined from the accumulated messages, two kinds of which are computed from the similarity matrix; for points i and k, the two messages r(i, k) and a(i, k) are:

$$r(i,k)\leftarrow s(i,k)-\max_{k^{*}\,\mathrm{s.t.}\,k^{*}\ne k}\left\{a(i,k^{*})+s(i,k^{*})\right\}$$

$$a(i,k)\leftarrow\sum_{i^{*}\ne k}\max\left\{0,\,r(i^{*},k)\right\},\qquad(i=k)$$

The iteration starts from a(i, k) = 0, and then r(i, k) and a(i, k) are updated according to the formulas above until convergence;
In affinity propagation, the preference vector expresses the prior likelihood of each task becoming a cluster center; in each iteration the diagonal entries of the similarity matrix are replaced by the preference vector, thereby influencing the choice of cluster centers; since varieties with more samples yield more accurate models and are better suited to serve as cluster centers, this prior knowledge about sample counts is incorporated into the clustering by setting the preference vector of the affinity propagation method as follows;

Let the preference vector be p = [p₁, p₂, …, p_{m−1}, p_m]:

$$p_i=\begin{cases}\left(N_i-\max_{j} N_j\right)\times a, & N_i> L_1\\[2pt] -L_1\times a, & N_i\le L_1\end{cases}\qquad i=1,2,\dots,m$$

where $L_1$ expresses that varieties with more than $L_1$ samples are more likely to become cluster centers; set a = 0.005, b = 2000, L = 50;
Step (4): Multi-task learning based on shared-parameter extraction

For the L classes obtained after clustering, the varieties within each class are modeled by the multi-task learning method based on shared-parameter extraction, whose main idea is to split the model parameters of each variety into two parts: a shared parameter and a private parameter; the shared parameter is the part common to the data models of all varieties in a class, denoted by the column vector $u\in\mathbb{R}^{d}$; the private parameter is the part that differs between the data models of the varieties in a class, denoted by the column vectors $v^{(i)}\in\mathbb{R}^{d}$; if a class contains r varieties, the r column vectors $v^{(i)}$ form a matrix V with r columns; the multi-task learning method based on shared-parameter extraction learns both parts from the data in each class to obtain the final model parameters; for the i-th task, the final model parameter is:

w_i = u + v^{(i)}

Before model learning, the data are first normalized; then the model parameters λ₁, λ₂, λ₃ are set, and the shared parameter vector u and the private parameter matrix V are randomly initialized;

The iterative process is as follows, where X denotes the matrix formed by vertically stacking the input matrices $[X_1, X_2, \dots, X_{r-1}, X_r]$ of the r varieties in a class, and Y likewise denotes the column vector formed by vertically stacking their output vectors $[y_1, y_2, \dots, y_{r-1}, y_r]$.
For the k-th iteration:

p_k = u_{k-1} - \frac{1}{l_{k-1}} \nabla_u f(u_{k-1}, V_{k-1})

q_k^{(i)} = \max\!\left(0,\; 1 - \frac{\lambda_3}{l_{k-1}\,\|s_k^{(i)}\|_2}\right) s_k^{(i)}

In the above formulas, s_k^{(i)} is the i-th column of

S_k = V_{k-1} - \frac{1}{l_{k-1}} \nabla_V f(u_{k-1}, V_{k-1})

and the vectors q_k^{(i)} form the columns of the matrix Q_k. The gradients are

\nabla_u f(u_k, V_k) = -X^T \frac{Y - X u_k}{\|Y - X u_k\|_2} - 2\lambda_1 \sum_{i=1}^{r} X_i^T \left( y_i - X_i \left( u_k + v_k^{(i)} \right) \right) + 2\lambda_2 u_k

\left[ \nabla_V f(u_k, V_k) \right]^{(i)} = -2\lambda_1 X_i^T \left( y_i - X_i \left( u_k + v_k^{(i)} \right) \right), \quad i = 1, \dots, r
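The two gradients can be read directly off the formulas above; a sketch (with `lam1`, `lam2` standing for λ_1, λ_2, and the per-variety data passed as lists; this assumes the stacked residual Y − Xu is nonzero, since the norm appears in a denominator):

```python
import numpy as np

def grad_f(u, V, X_list, y_list, lam1, lam2):
    """Gradients of f with respect to u and to V (one column of V per variety)."""
    X = np.vstack(X_list)
    Y = np.concatenate(y_list)
    resid = Y - X @ u
    # Gradient of ||Y - X u||_2 plus the ridge term 2*lam2*u.
    grad_u = -X.T @ resid / np.linalg.norm(resid) + 2.0 * lam2 * u
    grad_V = np.zeros_like(V)
    for i, (Xi, yi) in enumerate(zip(X_list, y_list)):
        ri = yi - Xi @ (u + V[:, i])          # per-variety residual
        grad_u += -2.0 * lam1 * (Xi.T @ ri)   # summed over varieties
        grad_V[:, i] = -2.0 * lam1 * (Xi.T @ ri)
    return grad_u, grad_V
```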
Then u_k and V_k are updated from p_k, p_{k-1} and Q_k, Q_{k-1}:

u_k = p_k + \alpha_k (p_k - p_{k-1})

V_k = Q_k + \alpha_k (Q_k - Q_{k-1})
where α_k ∈ [0, 1], with α_0 = 0 and t_0 = 1.
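The recursion defining t_k and α_k appears truncated in the source; assuming the standard FISTA momentum schedule (t_0 = 1, t_k = (1 + √(1 + 4t_{k-1}²))/2, α_k = (t_{k-1} − 1)/t_k, per the Beck et al. reference in the citations — an assumption, not confirmed by the text), it can be sketched as:

```python
import math

def fista_schedule(num_iters):
    """Momentum coefficients alpha_1..alpha_{num_iters} under the
    standard FISTA schedule (assumed): t_0 = 1,
    t_k = (1 + sqrt(1 + 4 t_{k-1}^2)) / 2, alpha_k = (t_{k-1} - 1) / t_k."""
    alphas, t_prev = [], 1.0
    for _ in range(num_iters):
        t = (1.0 + math.sqrt(1.0 + 4.0 * t_prev * t_prev)) / 2.0
        alphas.append((t_prev - 1.0) / t)
        t_prev = t
    return alphas
```

With this schedule α_1 = 0, so the first update uses no momentum, and every α_k stays in [0, 1), matching the constraint stated above.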
During the iteration, the step length l_k is determined as follows: j_k is taken as the smallest non-negative integer for which the following inequality holds:
f(u_{k+1}, V_{k+1}) \le F_{u_k, V_k, l_k}(u_{k+1}, V_{k+1})

where

f(u, V) = \|Y - X u\|_2 + \lambda_1 \sum_{i=1}^{r} \left\| y_i - X_i \left( u + v^{(i)} \right) \right\|_2^2 + \lambda_2 \|u\|_2^2

F_{u_k, V_k, l_k}(u, V) = f(u_k, V_k) + \left\langle u - u_k, \nabla_u f(u_k, V_k) \right\rangle + \frac{l_k}{2} \|u - u_k\|_2^2 + \left\langle V - V_k, \nabla_V f(u_k, V_k) \right\rangle + \frac{l_k}{2} \|V - V_k\|_F^2
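The objective f and the quadratic upper bound F used in this step-length test can be sketched as (helper names are hypothetical):

```python
import numpy as np

def f_obj(u, V, X_list, y_list, lam1, lam2):
    """Objective f(u, V): stacked-residual norm + per-variety squared
    losses weighted by lam1 + ridge penalty on u weighted by lam2."""
    X = np.vstack(X_list)
    Y = np.concatenate(y_list)
    val = np.linalg.norm(Y - X @ u)
    for i, (Xi, yi) in enumerate(zip(X_list, y_list)):
        ri = yi - Xi @ (u + V[:, i])
        val += lam1 * float(ri @ ri)
    return val + lam2 * float(u @ u)

def quad_upper(u, V, u_k, V_k, g_u, g_V, f_k, l_k):
    """Quadratic model F_{u_k, V_k, l_k}(u, V) built around (u_k, V_k),
    given f_k = f(u_k, V_k) and the two gradients g_u, g_V."""
    du, dV = u - u_k, V - V_k
    return (f_k + float(du @ g_u) + 0.5 * l_k * float(du @ du)
            + float(np.sum(dV * g_V)) + 0.5 * l_k * float(np.sum(dV * dV)))
```

Backtracking then increases l_k (e.g. doubling it j_k times) until the candidate iterate satisfies f ≤ F, so l_k acts as a local Lipschitz estimate for the smooth part of the objective.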
Applying the above method to each of the L classes obtained by clustering yields a library of L models, which covers the models of all m varieties.
CN201410805040.3A 2014-12-23 2014-12-23 Cmp time setting method based on cluster and multi-task learning Active CN104598720B (en)

Publications (2)

Publication Number Publication Date
CN104598720A CN104598720A (en) 2015-05-06
CN104598720B true CN104598720B (en) 2018-04-10

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6845278B2 (en) * 2002-08-07 2005-01-18 Kimberly-Clark Worldwide, Inc. Product attribute data mining in connection with a web converting manufacturing process
CN101853507A (en) * 2010-06-03 2010-10-06 浙江工业大学 Cell sorting method for affine propagation clustering
CN103093078A (en) * 2012-12-18 2013-05-08 湖南大唐先一科技有限公司 Data inspection method for improved 53H algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems; A. Beck et al.; SIAM Journal on Imaging Sciences; 2009; pp. 183-202 *
A Few Useful Things to Know about Machine Learning; Pedro Domingos; Communications of the ACM; Oct. 2012; pp. 1-9 *
Clustering by Passing Messages Between Data Points; Brendan J. Frey; Science; Feb. 16, 2007; pp. 942-951 *
Regularized multi-task learning; T. Evgeniou; Tenth ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2004; pp. 109-117 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20181031

Address after: 200120 Shanghai free trade pilot area 707 Zhang Yang road two floor West 205 room

Patentee after: Zhengda Industrial Biotechnology (Shanghai) Co., Ltd.

Address before: 100084 Tsinghua Yuan, Beijing, Haidian District

Patentee before: Tsinghua University

DD01 Delivery of document by public notice

Addressee: Zhengda Industrial Biotechnology (Shanghai) Co., Ltd.

Document name: Notification to Pay the Fees

DD01 Delivery of document by public notice
DD01 Delivery of document by public notice

Addressee: Zhang Wei

Document name: payment instructions

DD01 Delivery of document by public notice