CN104598720B - CMP time setting method based on clustering and multi-task learning - Google Patents


Publication number
CN104598720B
CN104598720B (application CN201410805040.3A)
Authority
CN
China
Legal status
Active
Application number
CN201410805040.3A
Other languages
Chinese (zh)
Other versions
CN104598720A (en)
Inventor
刘民 (Liu Min)
段运强 (Duan Yunqiang)
董明宇 (Dong Mingyu)
郝井华 (Hao Jinghua)
Current Assignee
Zhengda Industrial Biotechnology (Shanghai) Co., Ltd.
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201410805040.3A priority Critical patent/CN104598720B/en
Publication of CN104598720A publication Critical patent/CN104598720A/en
Application granted granted Critical
Publication of CN104598720B publication Critical patent/CN104598720B/en

Landscapes

  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A CMP time setting method based on clustering and multi-task learning, belonging to the fields of automatic control, information technology, and advanced manufacturing. To optimally set the chemical mechanical polishing time in multi-variety mixed-flow microelectronics manufacturing, the method takes process indices and product status as inputs and the polishing time, the key factor influencing the process indices, as output, establishing an inverse model used to optimize the polishing-time setting. Because production involves many varieties with little data per variety, similar varieties are clustered by their product features when building the inverse model, and the varieties within each class are modeled with a multi-task learning method based on shared-parameter extraction; the resulting model parameters are divided into a part shared by all varieties in the class and a part private to each individual variety.

Description

CMP time setting method based on clustering and multi-task learning
Technical field
The invention belongs to the fields of automatic control, information technology, and advanced manufacturing. Under the multi-variety mixed-flow production mode of modern microelectronics manufacturing, manually setting the chemical mechanical polishing (CMP) time leads to a high rework rate and relatively low production efficiency for the whole line. To solve this problem, the invention adopts an optimal setting method based on an inverse model: process indices and product status are taken as the inputs, and the polishing time, the key factor influencing the process indices, is taken as the output, so that the inverse model can be used to optimize the polishing-time setting. To cope with the many product varieties present in the production data and the small amount of data per variety, a CMP time setting method based on clustering and multi-task learning is proposed, which achieves optimized setting of the CMP time and improves the production efficiency of the CMP process.
Background technology
Chemical mechanical polishing (CMP) is a critical process in microelectronics manufacturing and affects the production efficiency of the whole line. Because modern microelectronics manufacturing is characterized by many varieties in small batches, enterprises frequently adopt a multi-variety mixed-flow production mode, under which the traditional Run-to-Run (RtR) optimal control method struggles to obtain ideal results. The main idea of RtR is to guide the production of a newly arriving batch with the production information of the last batch or last few batches; it generally assumes that a piece of equipment continuously processes products of the same variety, or that a small number of varieties cycle regularly through the equipment. Under multi-variety mixed-flow production, however, a piece of equipment may continuously process products of arbitrary varieties, and there are differences between varieties and between pieces of equipment, so traditional RtR methods perform poorly. At present, therefore, the polishing time still relies on manual experience and pilot-wafer testing; once an unqualified product appears, the whole batch must be reworked, making high production efficiency hard to achieve, so optimal setting of the CMP processing time is urgently needed. Traditional operating-parameter optimization methods first model the process indices and then optimize the operating parameters on the basis of the index model. The process index model is a forward model that takes the product status and operating parameters as inputs and the process indices as outputs; with little data, however, it is hard to build an accurate index model, and the optimization performs poorly. Borrowing RtR's idea of guiding future production with historical batches, and considering that CMP data cover many varieties with few samples per variety, the invention takes the process indices and product status as inputs and the key factor influencing the process indices, the polishing time, as output, establishing an inverse model used to optimize the polishing-time setting; on this basis a CMP time setting method based on clustering and multi-task learning is proposed.
Summary of the invention
To solve the problem of optimally setting the CMP time in a multi-variety mixed-flow production environment, the invention takes the process indices and product status as inputs and the key factor influencing the process indices, the polishing time, as output, establishing an inverse model used to optimize the polishing-time setting; on this basis a CMP time setting method based on clustering and multi-task learning is proposed. To handle the many product varieties and few samples per variety in actual production data, a two-step modeling method based on clustering and multi-task learning is proposed, which separates the clustering step from the parameter-learning step that traditional clustered multi-task learning treats jointly. The variety-clustering step assumes that the data within each variety follow a single multivariate Gaussian distribution and estimates the mean vector and covariance matrix of that distribution by maximum likelihood. Based on the estimates, the similarity of two multivariate Gaussian distributions is expressed with the Bhattacharyya distance, and the similarity matrix is obtained by computing the similarity between the probability distributions of all pairs of varieties. With the similarity matrix as input, variety clustering is completed by the affinity propagation algorithm. Meanwhile, to ensure that each resulting class contains a sufficiently large number of samples under small-sample conditions, the sample count of each variety is embedded into the clustering process as prior knowledge.
Within each class, the model parameters are computed with the proposed multi-task learning algorithm based on shared-parameter extraction. To address the shortage of samples in an individual task, the algorithm decomposes the model parameters into a shared part and a private part and learns both parts simultaneously; the shared part compensates for the model bias brought by insufficient sample data.
A CMP time setting method based on clustering and multi-task learning, characterized in that the method is realized on a computer by the following steps:
Step (1): Data preparation

The optimal setting model established in this method takes 4 process indices and product-status values as the model input row vector x, comprising: the grinding-material removal rate, the incoming wafer thickness, the sampled pilot-wafer post-polish thickness, and the lot-sampled post-polish thickness; the chemical mechanical polishing time is the model output y. The relation between the model input and output is assumed to satisfy:

y = xw + δ

where the column vector w denotes the model parameters to be determined and δ is noise.

Without loss of generality, assume there are m product varieties. The $N_i$ sample inputs of the i-th variety are recorded as the matrix $X_i \in \mathbb{R}^{N_i \times d}$, whose j-th row $x_{i(j)}$ is the input vector of the j-th sample of the i-th variety; d is the number of model input variables, d = 4 in this method. The $N_i$ sample outputs of the i-th variety are recorded as the column vector $y_i \in \mathbb{R}^{N_i}$, whose j-th element $y_i^{(j)}$ is the output, i.e. the CMP time, of the j-th sample of the i-th variety. For ease of later exposition, X denotes the matrix formed by vertically stacking the input matrices $[X_1, X_2, \dots, X_{m-1}, X_m]$, and Y likewise denotes the column vector formed by vertically stacking the output vectors $[y_1, y_2, \dots, y_{m-1}, y_m]$; X and Y both have $N = \sum_{i=1}^{m} N_i$ rows, N being the total number of samples. The column vector $I \in \mathbb{R}^{N}$ records the variety label of each sample, taking values in {1, 2, …, m−1, m}.
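The data layout of step (1) can be sketched as follows; the sample counts and random values here are purely illustrative assumptions, not taken from the patent's data:

```python
import numpy as np

# Hypothetical sample counts for m = 3 varieties; all numbers are illustrative.
rng = np.random.default_rng(0)
d = 4                                  # removal rate, incoming, pilot-wafer, lot-sampled thickness
counts = [5, 3, 7]                     # N_i per variety
X_blocks = [rng.normal(size=(n, d)) for n in counts]
y_blocks = [rng.normal(size=n) for n in counts]

X = np.vstack(X_blocks)                # (sum N_i) x d stacked inputs
Y = np.concatenate(y_blocks)           # stacked outputs, one row per sample
I = np.concatenate([np.full(n, i + 1) for i, n in enumerate(counts)])  # variety labels 1..m
```

Stacking keeps each variety's rows contiguous, so the label vector I is all that is needed to recover the per-variety blocks later.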
Step (2): Compute the similarity matrix between varieties

Assume the data of each variety follow a distinct multivariate Gaussian distribution, and estimate each variety's probability distribution by maximum likelihood. For the i-th variety, the estimates of the mean vector $\hat{\mu}_i$ and covariance matrix $\hat{\Sigma}_i$ of its multivariate Gaussian are:

$$\hat{\mu}_i=\frac{1}{N_i}\sum_{j=1}^{N_i} z_{i(j)},\qquad \hat{\Sigma}_i=\frac{1}{N_i-1}\sum_{j=1}^{N_i}\left(z_{i(j)}-\hat{\mu}_i\right)\left(z_{i(j)}-\hat{\mu}_i\right)',\qquad i=1,2,\dots,m$$

where the row vector $z_{i(j)}$ denotes the j-th row of the matrix $Z_i=[X_i, y_i]\in\mathbb{R}^{N_i\times(d+1)}$, containing the 4 input variables and 1 output variable of the j-th sample;
The similarity between varieties is compared with the Bhattacharyya distance, i.e. by comparing the similarity of the varieties' multivariate Gaussian distributions:

$$d_B(p(x),q(x))=-\ln\left(\int\sqrt{p(x)\,q(x)}\,dx\right)$$

Under the multivariate Gaussian assumption the Bhattacharyya distance has a closed form. For two multivariate Gaussians $G_1\sim N(\mu_1,\Sigma_1)$ and $G_2\sim N(\mu_2,\Sigma_2)$ the distance is computed as:

$$d_B(G_1,G_2)=\frac{1}{8}(\mu_1-\mu_2)^T\,\Gamma^{-1}(\mu_1-\mu_2)+\frac{1}{2}\ln\left(|\Sigma_1|^{-\frac{1}{2}}\,|\Sigma_2|^{-\frac{1}{2}}\,|\Gamma|\right)$$

where $\Gamma=\frac{\Sigma_1+\Sigma_2}{2}$ and |A| denotes the determinant of the matrix A;
Using the above dissimilarity computation, the pairwise distances are computed from the estimated mean vectors $\hat{\mu}_i$ and covariance matrices $\hat{\Sigma}_i$ of the varieties' multivariate Gaussian distributions; since the subsequent affinity propagation clustering algorithm takes similarities as input, the dissimilarities are negated to obtain the similarity matrix;
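The closed-form distance above can be sketched in code; the toy per-variety data below are an assumption for illustration (in the patent each row would hold the 4 process variables plus the polishing time):

```python
import numpy as np

def bhattacharyya(mu1, S1, mu2, S2):
    """Closed-form Bhattacharyya distance between two multivariate Gaussians."""
    G = (S1 + S2) / 2.0                        # the Gamma matrix in the text
    diff = mu1 - mu2
    term1 = diff @ np.linalg.solve(G, diff) / 8.0
    _, ldG = np.linalg.slogdet(G)              # log-determinants for numerical stability
    _, ld1 = np.linalg.slogdet(S1)
    _, ld2 = np.linalg.slogdet(S2)
    term2 = 0.5 * (ldG - 0.5 * ld1 - 0.5 * ld2)
    return term1 + term2

# Maximum-likelihood fit per variety, then the similarity matrix is the
# negated pairwise distance, as in step (2).
rng = np.random.default_rng(1)
Z = [rng.normal(loc=c, size=(50, 5)) for c in (0.0, 2.0)]   # rows: 4 inputs + 1 output
mus = [z.mean(axis=0) for z in Z]
covs = [np.cov(z, rowvar=False) for z in Z]                  # unbiased 1/(N-1), as in the text
m = len(Z)
S = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        S[i, j] = -bhattacharyya(mus[i], covs[i], mus[j], covs[j])
```

Using `slogdet` instead of forming the determinants directly avoids overflow for badly scaled covariance matrices.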
Step (3): Product-feature clustering based on affinity propagation

Affinity propagation is a clustering algorithm based on message accumulation: the cluster centers are determined from the accumulated messages, two kinds of which, the responsibility r(i, k) and the availability a(i, k), are computed from the similarity matrix:

$$r(i,k)\leftarrow s(i,k)-\max_{k^{*}\ne k}\left\{a(i,k^{*})+s(i,k^{*})\right\}$$

$$a(i,k)\leftarrow\sum_{i^{*}\ne k}\max\left\{0,\,r(i^{*},k)\right\},\qquad(i=k)$$

Starting from a(i, k) = 0, r(i, k) and a(i, k) are updated iteratively according to the formulas above until convergence;
In affinity propagation, the preference vector expresses the prior likelihood of each task becoming a cluster center; in each iteration the diagonal entries of the similarity matrix are replaced by the preference vector, thereby influencing the choice of cluster centers. Since varieties with more samples yield more accurate models and are better suited to serve as cluster centers, this prior knowledge about sample counts is incorporated into the clustering by setting the preference vector of the affinity propagation algorithm as follows.

Let the preference vector be p = [p₁, p₂, …, p_{m−1}, p_m] with

$$p_i=\begin{cases}\left(N_i-\max_{j} N_j\right)\times a, & N_i> L\\[2pt] -L\times a, & N_i\le L\end{cases}\qquad i=1,2,\dots,m$$

where $N_i$ is the sample count of each task and L expresses that varieties with more than L samples are preferred as cluster centers; typical values are a = 0.005, b = 2000, L = 50;
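A minimal NumPy sketch of the responsibility/availability message passing of step (3); the similarity values, constant preference, and damping factor below are illustrative choices, not the patent's (which would place the sample-count-based preference vector p on the diagonal):

```python
import numpy as np

def affinity_propagation(S, preference, max_iter=200, damping=0.5):
    """Responsibility/availability message passing on a similarity matrix S."""
    S = S.astype(float).copy()
    n = S.shape[0]
    np.fill_diagonal(S, preference)            # preference vector on the diagonal
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max_{k* != k} { a(i,k*) + s(i,k*) }
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf        # mask to find the second-largest value
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * R_new
        # a(k,k) <- sum_{i* != k} max{0, r(i*,k)}; off-diagonal availabilities add
        # r(k,k) and are clipped at zero, as in standard affinity propagation
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, 0)
        col = Rp.sum(axis=0)
        A_new = np.minimum(0.0, R.diagonal() + col[None, :] - Rp)
        A_new[np.diag_indices(n)] = col
        A = damping * A + (1 - damping) * A_new
    return np.argmax(A + R, axis=1)            # exemplar index for every point

# Two well-separated groups on a line; similarity = negative squared distance.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
S = -(x[:, None] - x[None, :]) ** 2
labels = affinity_propagation(S, preference=np.full(len(x), -1.0))
```

With the preference set between the within-group and between-group similarities, the two groups each elect one exemplar; raising the preference produces more, smaller clusters.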
Step (4): Multi-task learning based on shared-parameter extraction

Clustering yields L classes. The varieties within each class are modeled with the proposed multi-task learning algorithm based on shared-parameter extraction, whose main idea is to split the model parameters of each variety into two parts: a shared parameter and a private parameter. The shared parameter is the part common to the data models of all varieties in a class, denoted by the column vector $u\in\mathbb{R}^{d}$; the private parameter is the part that differs between the data models of the varieties in a class, denoted by the column vectors $v^{(i)}\in\mathbb{R}^{d}$. If a class contains r varieties, the r column vectors $v^{(i)}$ form a matrix V with r columns. The multi-task learning algorithm based on shared-parameter extraction learns both parts from the data in each class to obtain the final model parameters; for the i-th task, the final model parameter is:

w_i = u + v^{(i)}
Before model learning, the data are first normalized; then the model parameters λ₁, λ₂, λ₃ are set, and the shared parameter vector u and the private parameter matrix V are randomly initialized;
The iterative process is as follows, where X denotes the matrix formed by vertically stacking the input matrices $[X_1, X_2, \dots, X_{r-1}, X_r]$ of the r varieties in a class, and Y likewise denotes the column vector formed by vertically stacking their output vectors $[y_1, y_2, \dots, y_{r-1}, y_r]$. The k-th iteration computes:

$$p_k=u_{k-1}-\frac{1}{l_{k-1}}\nabla_u f(u_{k-1},V_{k-1})$$

$$q_k^{(i)}=\max\!\left(0,\;1-\frac{\lambda_3}{l_{k-1}\,\big\|s_k^{(i)}\big\|_2}\right)s_k^{(i)}$$

In the formulas above:

$$S_k=V_{k-1}-\frac{1}{l_{k-1}}\nabla_V f(u_{k-1},V_{k-1})$$

$$\nabla_u f(u_k,V_k)=X^T\frac{Y-Xu_k}{\|Y-Xu_k\|_2}-2\lambda_1\sum_{i=1}^{r}X_i^T\left(y_i-X_i\left(u_k+v_k^{(i)}\right)\right)+2\lambda_2\,u_k$$

where $s_k^{(i)}$ denotes the i-th column of $S_k$. Then $u_k$ and $V_k$ are updated from $p_k$, $u_{k-1}$ and $Q_k$, $V_{k-1}$, where α ∈ [0, 1] with initial values α₀ = 0 and t₀ = 1.

During the iteration, the step length $l_k$ is determined as $l_k = 2^{j_k}\, l_{k-1}$, where $j_k$ is the smallest non-negative integer for which the backtracking line-search condition holds.
Applying the above method to each of the L classes obtained by clustering yields a library of L models covering all m varieties.
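The shared/private decomposition w_i = u + v^{(i)} of step (4) can be illustrated with a toy alternating scheme; note that plain alternating ridge updates stand in here for the patent's accelerated proximal-gradient iteration, and all data sizes, noise levels, and λ values are made-up assumptions:

```python
import numpy as np

# Synthetic multi-task data generated from a true shared part and small private parts.
rng = np.random.default_rng(2)
d, tasks, n = 4, 3, 40
u_true = rng.normal(size=d)
V_true = 0.3 * rng.normal(size=(d, tasks))
Xs = [rng.normal(size=(n, d)) for _ in range(tasks)]
ys = [X @ (u_true + V_true[:, i]) + 0.01 * rng.normal(size=n) for i, X in enumerate(Xs)]

lam1, lam2 = 1e-3, 1e-3                 # illustrative regularization weights
u = np.zeros(d)
V = np.zeros((d, tasks))
for _ in range(50):
    # update the shared u against residuals after the private parts (ridge solution)
    A = sum(X.T @ X for X in Xs) + lam1 * np.eye(d)
    b = sum(X.T @ (y - X @ V[:, i]) for i, (X, y) in enumerate(zip(Xs, ys)))
    u = np.linalg.solve(A, b)
    # update each private v_i against residuals after the shared part
    for i, (X, y) in enumerate(zip(Xs, ys)):
        V[:, i] = np.linalg.solve(X.T @ X + lam2 * np.eye(d), X.T @ (y - X @ u))

W = u[:, None] + V                      # final per-task parameters w_i = u + v_i
err = max(np.linalg.norm(W[:, i] - (u_true + V_true[:, i])) for i in range(tasks))
```

Only the sum u + v_i is identifiable from a single task's data; it is the regularization that decides how much of each w_i is attributed to the shared part.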
Brief description of the drawings
Fig. 1: Flow chart of the CMP time setting method based on clustering and multi-task learning.
Fig. 2: Software and hardware composition of the CMP time setting method based on clustering and multi-task learning.
Embodiment
The present invention proposes a CMP time setting method based on clustering and multi-task learning. Its main advantage is its applicability to multi-variety mixed-flow production, where it improves production efficiency compared with manual setting. In practical application, when a new production lot arrives, the polishing time can be computed from its variety, processed layer, and other lot information. The clustering and multi-task learning algorithms of the invention rely on the related hardware, including a data acquisition system, a computation server, and user clients, and are realized by the optimization software running on them. The software and hardware composition is shown in Fig. 2.
Step (1): Data acquisition

The collected production information includes the product variety of the lot, the processing level, the material removal rate, the incoming wafer thickness, the pilot-wafer post-polish thickness, the sampled post-polish thickness, the incoming thickness range, and the post-polish thickness range; this information is stored into the production process database;
Step (2): Data preparation

On the model training server, abnormal history records are removed first, including records with obvious manual input errors and records with missing fields. The data are then arranged as required for model learning: the product variety and the processing level are jointly treated as one generalized variety, so that different levels of the same variety are regarded as different varieties. The resulting data set is $\{X_i, y_i\}, i = 1, 2, \dots, m$, where the j-th row $x_{i(j)}$ consists of 4 values: the material removal rate, the incoming wafer thickness, the pilot-wafer post-polish thickness, and the lot-sampled post-polish thickness; the j-th element $y_i^{(j)}$ is the CMP time; the variety labels form the vector I.
Step (3): Model training

On the model training server, model learning is carried out from the prepared data set {X_i, y_i}, i = {1, 2, …, m−1, m} and the variety label vector I. It comprises two steps: product-feature clustering and multi-task learning with shared-parameter extraction. The whole model training follows the computation flow described in the invention; the method's flow chart is shown in Fig. 1.
Step (4): Parameter optimization

In product-feature clustering and multi-task learning with shared-parameter extraction, the main model parameters are the sample-count threshold L, the clustering parameter a, and the multi-task learning parameters λ₁, λ₂, λ₃. The threshold L can be determined by counting the sample number of each variety in the whole data set. The other three parameters are optimized by cross-validation: the whole data set is split into training and test sets. For example, for 3-fold cross-validation, the data of each variety are divided into 3 parts, of which 1 part serves as test data and 2 parts as training data. The model is trained on the training data, and its optimal-setting performance is then evaluated on the test data; the best-performing values of a, λ₁, λ₂, λ₃ are selected by trying different settings. Finally, the optimal model is transferred to the on-site server.
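The 3-fold tuning loop can be sketched as below; the per-variety splitting mirrors the text, while the stand-in ridge model and the grid values are assumptions for illustration (the real method would train the full clustering + multi-task pipeline inside the loop):

```python
import numpy as np
from itertools import product

def cv_score(params, data, folds_per_variety):
    """Mean 3-fold test MSE; a, lam2, lam3 are unused by the stand-in ridge model."""
    a, lam1, lam2, lam3 = params
    scores = []
    for fold in range(3):
        mse = 0.0
        for (X, y), folds in zip(data, folds_per_variety):
            te = folds[fold]                                   # 1 part for testing
            tr = np.concatenate([folds[f] for f in range(3) if f != fold])
            w = np.linalg.solve(X[tr].T @ X[tr] + lam1 * np.eye(X.shape[1]),
                                X[tr].T @ y[tr])
            mse += np.mean((X[te] @ w - y[te]) ** 2)
        scores.append(mse / len(data))
    return float(np.mean(scores))

rng = np.random.default_rng(3)
data = [(rng.normal(size=(30, 4)), rng.normal(size=30)) for _ in range(2)]
# split every variety's data into 3 fixed folds, as described in the text
folds_per_variety = [np.array_split(rng.permutation(len(y)), 3) for _, y in data]
grid = product([0.005], [0.1, 1.0, 10.0], [0.1], [0.1])        # (a, lam1, lam2, lam3)
best = min(grid, key=lambda p: cv_score(p, data, folds_per_variety))
```

Fixing the folds before the grid search keeps the comparison between parameter settings fair; re-splitting per setting would add noise to the selection.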
Step (5): On-site application

On the on-site server, the model parameters are selected according to the variety and level of the current processing lot, as transmitted from the actual production data. The input vector x of the current lot comprises the material removal rate, the incoming wafer thickness, the pilot-wafer post-polish thickness, and the lot-sampled post-polish thickness; here the pilot-wafer and lot-sampled post-polish thicknesses are replaced by the standard post-polish thickness of this variety and level. The dot product of the model parameters with the vector x gives the optimal setting value of the CMP time.
Based on the proposed multi-task learning method with product-feature clustering and shared-parameter extraction, a large number of simulation experiments were carried out; owing to space limitations, only the simulation results of applying the invention to the optimal setting of the CMP time are given here. The input data consist of 4 values: the material removal rate, the incoming wafer thickness, the pilot-wafer post-polish thickness, and the lot-sampled post-polish thickness. The data are industrial field data collected between 2011-1-1 and 2013-7-25, covering 441 varieties and 11250 records in total; the model parameters were determined by 3-fold cross-validation.
The invention is compared with the typical multi-task learning algorithm CMTL-convex (Clustered multi-task learning: a convex formulation) and with the nonlinear modeling methods kernel extreme learning machine (KELM) and support vector machine (SVM). To demonstrate the effectiveness of the variety-clustering algorithm, KELM and SVM without variety clustering are compared against Cluster-KELM and Cluster-SVM with variety clustering. Both KELM and SVM use the Gaussian kernel function

exp(−γ‖u − v‖²)

The other parameter is the regularization-term weight in the model. The parameter values are listed in the table below.
Table 1: Algorithm parameter settings
Data in different proportions (0.2, 0.4, 0.6) are randomly selected as test data for the performance comparison. The performance indices are the normalized mean squared error (nMSE), the averaged mean squared error (aMSE), and the proportion of predictions with error beyond 10% (error > 10%). The comparison results are shown in Table 2.

Table 2: Algorithm performance comparison results
As can be seen from the table, the proposed TwCMTL achieves better accuracy and better generalization ability than CMTL-convex, SVM, KELM, Cluster-SVM, and Cluster-KELM.
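For reference, the normalized mean squared error used in the comparison above can be computed as below; this particular normalization (MSE divided by the target variance) is a common convention and an assumption here, since the text does not spell out the exact formula:

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized MSE: mean squared error divided by the variance of the target."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = nmse(y, y)                       # 0.0 for a perfect prediction
baseline = nmse(y, np.full(4, y.mean()))   # 1.0 for the constant-mean predictor
```

Values below 1 therefore indicate that a model beats the trivial predict-the-mean baseline, which makes nMSE comparable across varieties with different time scales.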

Claims (1)

1. A CMP time setting method based on clustering and multi-task learning, characterized in that the method is realized on a computer by the following steps:

Step (1): Data preparation

The optimal setting model established in this method takes 4 process indices and product-status values as the model input row vector x, comprising: the grinding-material removal rate, the incoming wafer thickness, the sampled pilot-wafer post-polish thickness, and the lot-sampled post-polish thickness; the chemical mechanical polishing time is the model output y; the relation between the model input and output is assumed to satisfy:

y = xw + δ

where the column vector w denotes the model parameters to be determined and δ is noise;

Assume there are m product varieties; the $N_i$ sample inputs of the i-th variety are recorded as the matrix $X_i \in \mathbb{R}^{N_i\times d}$, whose j-th row $x_{i(j)}$ is the input vector of the j-th sample of the i-th variety, d being the number of model input variables, d = 4; the $N_i$ sample outputs of the i-th variety are recorded as the column vector $y_i \in \mathbb{R}^{N_i}$, whose j-th element $y_i^{(j)}$ is the output, i.e. the CMP time, of the j-th sample of the i-th variety; for ease of later exposition, X denotes the matrix formed by vertically stacking the input matrices $[X_1, X_2, \dots, X_{m-1}, X_m]$, and Y likewise denotes the column vector formed by vertically stacking the output vectors $[y_1, y_2, \dots, y_{m-1}, y_m]$; X and Y both have $N=\sum_{i=1}^{m} N_i$ rows, N being the total number of samples; the column vector $I \in \mathbb{R}^{N}$ records the variety label of each sample, taking values in {1, 2, …, m−1, m};
Step (2): Compute the similarity matrix between varieties

The probability distribution of each variety is estimated by maximum likelihood; under the multivariate Gaussian assumption, the estimates of the mean vector $\hat{\mu}_i$ and covariance matrix $\hat{\Sigma}_i$ corresponding to each variety are:

$$\hat{\mu}_i=\frac{1}{N_i}\sum_{j=1}^{N_i} z_{i(j)},\qquad i=1,2,\dots,m$$

$$\hat{\Sigma}_i=\frac{1}{N_i-1}\sum_{j=1}^{N_i}\left(z_{i(j)}-\hat{\mu}_i\right)\left(z_{i(j)}-\hat{\mu}_i\right)',\qquad i=1,2,\dots,m$$

where the row vector $z_{i(j)}$ denotes the j-th row of the matrix $Z_i=[X_i, y_i]\in\mathbb{R}^{N_i\times(d+1)}$, containing the 4 input variables and 1 output variable;
The dissimilarity between varieties is compared with the Bhattacharyya distance, i.e. by comparing the dissimilarity of the varieties' multivariate Gaussian distributions; the Bhattacharyya distance is defined as:

$$d_B(p(x),q(x))=-\ln\left(\int\sqrt{p(x)\,q(x)}\,dx\right)$$

Under the multivariate Gaussian assumption the Bhattacharyya distance has a closed form; for two multivariate Gaussians $G_1\sim N(\mu_1,\Sigma_1)$ and $G_2\sim N(\mu_2,\Sigma_2)$ the Bhattacharyya distance is computed as:

$$d_B(G_1,G_2)=\frac{1}{8}(\mu_1-\mu_2)^T\,\Gamma^{-1}(\mu_1-\mu_2)+\frac{1}{2}\ln\left(|\Sigma_1|^{-\frac{1}{2}}\,|\Sigma_2|^{-\frac{1}{2}}\,|\Gamma|\right)$$

where $\Gamma=\frac{\Sigma_1+\Sigma_2}{2}$ and |A| denotes the determinant of the matrix A;
Since the subsequent affinity propagation clustering method takes similarities as input, the dissimilarities are negated to obtain similarities; the similarity matrix is computed from the estimated mean vectors $\hat{\mu}_i$ and covariance matrices $\hat{\Sigma}_i$ of the varieties' multivariate Gaussian distributions;
Step (3): Product-feature clustering based on affinity propagation

Affinity propagation is a clustering method based on message accumulation: the cluster centers are determined from the accumulated messages, two kinds of which are computed from the similarity matrix; for points i and k, the two messages r(i, k) and a(i, k) are:

$$r(i,k)\leftarrow s(i,k)-\max_{k^{*}\,\mathrm{s.t.}\,k^{*}\ne k}\left\{a(i,k^{*})+s(i,k^{*})\right\}$$

$$a(i,k)\leftarrow\sum_{i^{*}\ne k}\max\left\{0,\,r(i^{*},k)\right\},\qquad(i=k)$$

The iteration starts from a(i, k) = 0, and then r(i, k) and a(i, k) are updated according to the formulas above until convergence;
In affinity propagation, the preference vector expresses the prior likelihood of each task becoming a cluster center; in each iteration the diagonal entries of the similarity matrix are replaced by the preference vector, thereby influencing the choice of cluster centers; since varieties with more samples yield more accurate models and are better suited to serve as cluster centers, this prior knowledge about sample counts is incorporated into the clustering by setting the preference vector of the affinity propagation method as follows;

Let the preference vector be p = [p₁, p₂, …, p_{m−1}, p_m]:

$$p_i=\begin{cases}\left(N_i-\max_{j} N_j\right)\times a, & N_i> L_1\\[2pt] -L_1\times a, & N_i\le L_1\end{cases}\qquad i=1,2,\dots,m$$

where $L_1$ expresses that varieties with more than $L_1$ samples are more likely to become cluster centers; set a = 0.005, b = 2000, L = 50;
Step (4): Multi-task learning based on shared-parameter extraction

For the L classes obtained after clustering, the varieties within each class are modeled by the multi-task learning method based on shared-parameter extraction, whose main idea is to split the model parameters of each variety into two parts: a shared parameter and a private parameter; the shared parameter is the part common to the data models of all varieties in a class, denoted by the column vector $u\in\mathbb{R}^{d}$; the private parameter is the part that differs between the data models of the varieties in a class, denoted by the column vectors $v^{(i)}\in\mathbb{R}^{d}$; if a class contains r varieties, the r column vectors $v^{(i)}$ form a matrix V with r columns; the multi-task learning method based on shared-parameter extraction learns both parts from the data in each class to obtain the final model parameters; for the i-th task, the final model parameter is:

w_i = u + v^{(i)}

Before model learning, the data are first normalized; then the model parameters λ₁, λ₂, λ₃ are set, and the shared parameter vector u and the private parameter matrix V are randomly initialized;

The iterative process is as follows, where X denotes the matrix formed by vertically stacking the input matrices $[X_1, X_2, \dots, X_{r-1}, X_r]$ of the r varieties in a class, and Y likewise denotes the column vector formed by vertically stacking their output vectors $[y_1, y_2, \dots, y_{r-1}, y_r]$.
For the k-th iteration:

p_k = u_{k-1} - \frac{1}{l_{k-1}} \nabla_u f(u_{k-1}, V_{k-1})

q_k^{(i)} = \max\!\left(0,\; 1 - \frac{\lambda_3}{l_{k-1}\,\|s_k^{(i)}\|_2}\right) s_k^{(i)}

In the above formulas, s_k^{(i)} is the i-th column of

S_k = V_{k-1} - \frac{1}{l_{k-1}} \nabla_V f(u_{k-1}, V_{k-1})

and the vectors q_k^{(i)} form the columns of the matrix Q_k. The gradients are

\nabla_u f(u_k, V_k) = -X^T \frac{Y - X u_k}{\|Y - X u_k\|_2} - 2\lambda_1 \sum_{i=1}^{r} X_i^T \left( y_i - X_i \left( u_k + v_k^{(i)} \right) \right) + 2\lambda_2 u_k

\left[ \nabla_V f(u_k, V_k) \right]^{(i)} = -2\lambda_1 X_i^T \left( y_i - X_i \left( u_k + v_k^{(i)} \right) \right), \quad i = 1, \dots, r
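The two gradients can be read directly off the formulas above; a sketch (with `lam1`, `lam2` standing for λ_1, λ_2, and the per-variety data passed as lists; this assumes the stacked residual Y − Xu is nonzero, since the norm appears in a denominator):

```python
import numpy as np

def grad_f(u, V, X_list, y_list, lam1, lam2):
    """Gradients of f with respect to u and to V (one column of V per variety)."""
    X = np.vstack(X_list)
    Y = np.concatenate(y_list)
    resid = Y - X @ u
    # Gradient of ||Y - X u||_2 plus the ridge term 2*lam2*u.
    grad_u = -X.T @ resid / np.linalg.norm(resid) + 2.0 * lam2 * u
    grad_V = np.zeros_like(V)
    for i, (Xi, yi) in enumerate(zip(X_list, y_list)):
        ri = yi - Xi @ (u + V[:, i])          # per-variety residual
        grad_u += -2.0 * lam1 * (Xi.T @ ri)   # summed over varieties
        grad_V[:, i] = -2.0 * lam1 * (Xi.T @ ri)
    return grad_u, grad_V
```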
Then u_k and V_k are updated from p_k, p_{k-1} and Q_k, Q_{k-1}:

u_k = p_k + \alpha_k (p_k - p_{k-1})

V_k = Q_k + \alpha_k (Q_k - Q_{k-1})
where α_k ∈ [0, 1], with α_0 = 0 and t_0 = 1.
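The recursion defining t_k and α_k appears truncated in the source; assuming the standard FISTA momentum schedule (t_0 = 1, t_k = (1 + √(1 + 4t_{k-1}²))/2, α_k = (t_{k-1} − 1)/t_k, per the Beck et al. reference in the citations — an assumption, not confirmed by the text), it can be sketched as:

```python
import math

def fista_schedule(num_iters):
    """Momentum coefficients alpha_1..alpha_{num_iters} under the
    standard FISTA schedule (assumed): t_0 = 1,
    t_k = (1 + sqrt(1 + 4 t_{k-1}^2)) / 2, alpha_k = (t_{k-1} - 1) / t_k."""
    alphas, t_prev = [], 1.0
    for _ in range(num_iters):
        t = (1.0 + math.sqrt(1.0 + 4.0 * t_prev * t_prev)) / 2.0
        alphas.append((t_prev - 1.0) / t)
        t_prev = t
    return alphas
```

With this schedule α_1 = 0, so the first update uses no momentum, and every α_k stays in [0, 1), matching the constraint stated above.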
During the iteration, the step length l_k is determined as follows: j_k is taken as the smallest non-negative integer for which the following inequality holds:
f(u_{k+1}, V_{k+1}) \le F_{u_k, V_k, l_k}(u_{k+1}, V_{k+1})

where

f(u, V) = \|Y - X u\|_2 + \lambda_1 \sum_{i=1}^{r} \left\| y_i - X_i \left( u + v^{(i)} \right) \right\|_2^2 + \lambda_2 \|u\|_2^2

F_{u_k, V_k, l_k}(u, V) = f(u_k, V_k) + \left\langle u - u_k, \nabla_u f(u_k, V_k) \right\rangle + \frac{l_k}{2} \|u - u_k\|_2^2 + \left\langle V - V_k, \nabla_V f(u_k, V_k) \right\rangle + \frac{l_k}{2} \|V - V_k\|_F^2
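The objective f and the quadratic upper bound F used in this step-length test can be sketched as (helper names are hypothetical):

```python
import numpy as np

def f_obj(u, V, X_list, y_list, lam1, lam2):
    """Objective f(u, V): stacked-residual norm + per-variety squared
    losses weighted by lam1 + ridge penalty on u weighted by lam2."""
    X = np.vstack(X_list)
    Y = np.concatenate(y_list)
    val = np.linalg.norm(Y - X @ u)
    for i, (Xi, yi) in enumerate(zip(X_list, y_list)):
        ri = yi - Xi @ (u + V[:, i])
        val += lam1 * float(ri @ ri)
    return val + lam2 * float(u @ u)

def quad_upper(u, V, u_k, V_k, g_u, g_V, f_k, l_k):
    """Quadratic model F_{u_k, V_k, l_k}(u, V) built around (u_k, V_k),
    given f_k = f(u_k, V_k) and the two gradients g_u, g_V."""
    du, dV = u - u_k, V - V_k
    return (f_k + float(du @ g_u) + 0.5 * l_k * float(du @ du)
            + float(np.sum(dV * g_V)) + 0.5 * l_k * float(np.sum(dV * dV)))
```

Backtracking then increases l_k (e.g. doubling it j_k times) until the candidate iterate satisfies f ≤ F, so l_k acts as a local Lipschitz estimate for the smooth part of the objective.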
Applying the above method to each of the L classes obtained by clustering yields a library of L models, which covers the models of all m varieties.
CN201410805040.3A 2014-12-23 2014-12-23 Cmp time setting method based on cluster and multi-task learning Active CN104598720B (en)

Publications (2)

Publication Number Publication Date
CN104598720A CN104598720A (en) 2015-05-06
CN104598720B true CN104598720B (en) 2018-04-10

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6845278B2 (en) * 2002-08-07 2005-01-18 Kimberly-Clark Worldwide, Inc. Product attribute data mining in connection with a web converting manufacturing process
CN101853507A (en) * 2010-06-03 2010-10-06 浙江工业大学 Cell sorting method for affine propagation clustering
CN103093078A (en) * 2012-12-18 2013-05-08 湖南大唐先一科技有限公司 Data inspection method for improved 53H algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems; A. Beck et al.; SIAM Journal on Imaging Sciences; 2009; pp. 183-202 *
A Few Useful Things to Know about Machine Learning; Pedro Domingos; Communications of the ACM; Oct. 2012; pp. 1-9 *
Clustering by Passing Messages Between Data Points; Brendan J. Frey; Science; Feb. 16, 2007; pp. 942-951 *
Regularized multi-task learning; T. Evgeniou; Tenth ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2004; pp. 109-117 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20181031

Address after: 200120 Shanghai free trade pilot area 707 Zhang Yang road two floor West 205 room

Patentee after: Zhengda Industrial Biotechnology (Shanghai) Co., Ltd.

Address before: 100084 Tsinghua Yuan, Beijing, Haidian District

Patentee before: Tsinghua University

DD01 Delivery of document by public notice

Addressee: Zhengda Industrial Biotechnology (Shanghai) Co., Ltd.

Document name: Notification to Pay the Fees

DD01 Delivery of document by public notice
DD01 Delivery of document by public notice

Addressee: Zhang Wei

Document name: payment instructions

DD01 Delivery of document by public notice