CN103403724B

CN103403724B - Accurate and fast neural network training for library-based critical dimension (CD) metrology

Info

Publication number: CN103403724B
Application number: CN201280010987.4A
Authority: CN
Inventors: Wen Jin; Vi Vuong; Junwei Bao; Lie-Quan Lee; Leonid Poslavsky
Original assignee: KLA Tencor Corp
Current assignee: KLA Corp
Priority date: 2011-03-04
Filing date: 2012-02-28
Publication date: 2016-11-09
Anticipated expiration: 2032-02-28
Also published as: CN107092958B; US20140032463A1; EP2681684A2; WO2012150993A2; KR20140017588A; US20120226644A1; CN103403724A; TW201243738A; CN107092958A; JP2014514533A; US9607265B2; WO2012150993A3; US8577820B2; KR20180125056A; KR101958161B1; KR101992500B1

Abstract

Methods of accurate neural network training for library-based critical dimension (CD) metrology are described. Methods of fast neural network training for library-based CD metrology are also described.

Description

Accurate and fast neural network training for library-based critical dimension (CD) metrology

Technical field

Embodiments of the present invention are in the field of optical metrology and, more particularly, relate to accurate and fast neural network training for library-based critical dimension (CD) metrology.

Background technology

Optical metrology techniques generally referred to as scatterometry offer the potential to characterize parameters of a workpiece during a manufacturing process. In practice, light is directed onto a periodic grating formed in a workpiece, and the spectrum of the reflected light is measured and analyzed to characterize the grating parameters. The characterized parameters may include critical dimension (CD), sidewall angle (SWA), feature height (HT), and so on, which affect the reflectivity and refractive index of a material. Characterization of the grating may thereby characterize the workpiece as well as the manufacturing process employed in forming the grating and the workpiece.

For the past several years, rigorous coupled-wave analysis (RCWA) and similar algorithms have been widely used for the study and design of diffraction structures. In the RCWA approach, the profile of a periodic structure is approximated by a given number of sufficiently thin planar grating slabs. Specifically, RCWA involves three main operations: the Fourier expansion of the field inside the grating, the calculation of the eigenvalues and eigenvectors of a constant-coefficient matrix that characterizes the diffracted signal, and the solution of a linear system deduced from the boundary matching conditions. RCWA divides the problem into three distinct spatial regions: 1) the ambient region supporting the incident plane wave field and a summation over all reflected diffraction orders, 2) the grating structure and underlying non-patterned layers, in which the wave field is treated as a superposition of modes associated with each diffraction order, and 3) the substrate containing the transmitted wave field.

The accuracy of the RCWA solution depends, in part, on the number of terms retained in the space-harmonic expansion of the wave fields, with conservation of energy generally being satisfied. The number of terms retained is a function of the number of diffraction orders considered during the calculation. Efficient generation of a simulated diffraction signal for a given hypothetical profile involves selecting the optimal set of diffraction orders at each wavelength for the transverse-magnetic (TM) and/or transverse-electric (TE) components of the diffraction signal. Mathematically, the more diffraction orders selected, the more accurate the simulation. However, the higher the number of diffraction orders, the more computation is required to calculate the simulated diffraction signal. Moreover, the computation time is a nonlinear function of the number of orders used.

Summary of the invention

Embodiments of the present invention include methods of accurate and fast neural network training for library-based CD metrology.

In an embodiment, a method of accurate neural network training for library-based critical dimension (CD) metrology includes optimizing a threshold for a principal component analysis (PCA) of a spectrum data set to provide a principal component (PC) value. A training target for one or more neural networks is estimated. The one or more neural networks are trained based on the training target and on the PC value provided from optimizing the threshold for the PCA. A spectral library is provided based on the one or more trained neural networks.

In another embodiment, a machine-accessible storage medium has instructions stored thereon which cause a data processing system to perform a method of accurate neural network training for library-based critical dimension (CD) metrology. The method includes optimizing a threshold for a principal component analysis (PCA) of a spectrum data set to provide a principal component (PC) value. A training target for one or more neural networks is estimated. The one or more neural networks are trained based on the training target and on the PC value provided from optimizing the threshold for the PCA. A spectral library is provided based on the one or more trained neural networks.

In an embodiment, a method of fast neural network training for library-based critical dimension (CD) metrology includes providing a training target for a first neural network. The first neural network is trained. The training includes starting with a predetermined number of neurons and iteratively increasing the number of neurons until an optimized total number of neurons is reached. A second neural network is generated based on the training and the optimized total number of neurons. A spectral library is provided based on the second neural network.

In another embodiment, a machine-accessible storage medium has instructions stored thereon which cause a data processing system to perform a method of fast neural network training for library-based critical dimension (CD) metrology. The method includes providing a training target for a first neural network. The first neural network is trained. The training includes starting with a predetermined number of neurons and iteratively increasing the number of neurons until an optimized total number of neurons is reached. A second neural network is generated based on the training and the optimized total number of neurons. A spectral library is provided based on the second neural network.

Brief description of the drawings

Fig. 1 depicts a flowchart representing an exemplary series of operations for accurate neural network training for library-based CD metrology, in accordance with an embodiment of the present invention;

Fig. 2A is a plot demonstrating library-regression matching with a dynamic increasing sample library method in an OFF state;

Fig. 2B is a plot demonstrating library-regression matching with a dynamic increasing sample library method in an ON state, in accordance with an embodiment of the present invention;

Fig. 2C includes a pair of plots comparing the 3σ error range against increasing library sample size for a conventional approach and for an approach in accordance with an embodiment of the present invention;

Fig. 3 depicts a flowchart representing an exemplary series of operations for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention;

Fig. 4A illustrates a two-hidden-layer neural network, in accordance with an embodiment of the present invention;

Fig. 4B illustrates a Matlab plot of an actual response surface for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention;

Fig. 4C is a plot comparing the convergence history of the incremental algorithm and of the one-shot Levenberg-Marquardt algorithm, in accordance with an embodiment of the present invention;

Fig. 4D is a plot illustrating a performance comparison for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention;

Fig. 4E includes a pair of plots comparing the results for a first spectrum for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention;

Fig. 4F includes a pair of plots comparing the results for a second spectrum for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention;

Fig. 5 depicts select elements of an exemplary neural network for generating a library of spectral information, in accordance with an embodiment of the present invention;

Fig. 6A depicts a periodic grating having a profile that varies in the x-y plane, in accordance with an embodiment of the present invention;

Fig. 6B depicts a periodic grating having a profile that varies in the x-direction but not in the y-direction, in accordance with an embodiment of the present invention;

Fig. 7 depicts a flowchart representing an exemplary series of operations for determining and utilizing structural parameters for automated process and equipment control, in accordance with an embodiment of the present invention;

Fig. 8 is a block diagram of a system for determining and utilizing structural parameters for automated process and equipment control, in accordance with an embodiment of the present invention;

Fig. 9 is an architectural diagram illustrating the utilization of optical metrology to determine the profiles of structures on a semiconductor wafer, in accordance with an embodiment of the present invention;

Fig. 10 illustrates a block diagram of an exemplary computer system, in accordance with an embodiment of the present invention;

Fig. 11 is a flowchart representing operations in a method for building a parameterized model and a spectral library beginning with sample spectra, in accordance with an embodiment of the present invention.

Detailed description of the invention

Methods of accurate and fast neural network training for library-based CD metrology are described herein. In the following description, numerous specific details are set forth, such as examples of neural networks, in order to provide a thorough understanding of embodiments of the present invention. It will be apparent to one skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known processing operations, such as operations involved in fabricating grating structures, are not described in detail in order to avoid obscuring embodiments of the present invention. Furthermore, it is to be understood that the various embodiments shown in the figures are illustrative representations and are not necessarily drawn to scale.

As the complexity of semiconductor and related structures increases, library formation for optical critical dimension simulation faces the challenge of obtaining good accuracy for many recent applications. For example, a user may spend weeks building different libraries and still have difficulty achieving good library regression matching and good total measurement uncertainty (TMU) relative to reference metrology. Aspects of the present invention may provide a high-accuracy library with a small library size and a fast solution for the user. The user may not need to spend weeks building a large number of different libraries in order to obtain good accuracy.

Many training methods of an iterative nature have been implemented for library development and comparison. These include variations of the Levenberg-Marquardt, backpropagation, and N2X algorithms. The problem with such approaches may be that they are very time-consuming. If the number of neurons is guessed correctly, the algorithm converges only after a large number of iterations. If the number of neurons is too small, the algorithm does not converge and stops only when it hits the maximum number of iterations.

In an aspect of the present invention, an accurate neural network training approach is provided. Fig. 1 depicts a flowchart 100 representing an exemplary series of operations for accurate neural network training for library-based CD metrology, in accordance with an embodiment of the present invention.

Referring to operation 102 of flowchart 100, the method includes optimizing a threshold for a principal component analysis (PCA) of a spectrum data set. In an embodiment, the optimizing provides a principal component (PC) value. In an embodiment, the optimizing minimizes the error introduced by the PCA. As described in more detail below, the spectrum data set may be based on measured or simulated spectra derived from a diffraction measurement of a grating structure.

In accordance with an embodiment of the present invention, the PCA threshold is optimized automatically. The optimization may minimize the error that the PCA introduces into subsequent neural network training. For example, conventional approaches typically use a constant value for the PCA threshold, e.g., with a PCA-introduced error on the order of approximately 10^-5. In one embodiment, optimizing the threshold of the PCA includes determining a lowest-level spectrum domain. In one specific such embodiment, the PCA-introduced error is smaller than 10^-5, e.g., on the order of approximately 10^-8 to 10^-9.

In an embodiment, optimizing the threshold of the PCA includes determining a first PCA threshold. For example, a given PCA threshold or fractional value may be set at t = 10^-5. The PCA is then applied to the spectrum data set. For example, PCA is applied to a spectrum data set S to obtain PC values P = T*S, where T is a matrix. The spectrum error introduced by applying the PCA is calculated, e.g., ΔS = S - T'*P, where T' is the transpose of T. The spectrum error is then compared with a spectrum noise level. In one embodiment, the spectrum noise level is based on optical critical dimension (OCD) hardware specification information. The hardware specification information may be associated with hardware such as the system described below in association with Fig. 9.

When comparing the spectrum error with the spectrum noise level, the following criterion may be applied: with ε a given or default spectrum noise level, if ΔS < ε, output t; otherwise set t = t/10 and repeat the optimization. Thus, in one embodiment, if the spectrum error is less than the spectrum noise level, the first PCA threshold is set for the PC value. In another embodiment, if the spectrum error is greater than or equal to the spectrum noise level, a second PCA threshold is determined, and the applying, calculating, and comparing are repeated.
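The threshold search described above can be sketched as a short loop. This is only an illustration under stated assumptions: the text does not define exactly what the threshold t controls, so here t is taken to be the fraction of total variance that the truncated PCA may discard; the function name, the use of mean-centered spectra, and the worst-case (maximum absolute) error measure are likewise hypothetical choices.

```python
import numpy as np

def optimize_pca_threshold(S, noise_level, t=1e-5, max_rounds=12):
    """Shrink the PCA threshold t by 10x until the spectrum error is below noise.

    S is an (n_samples, n_wavelengths) array of spectra. Assumption: t is the
    fraction of total variance that the truncated PCA is allowed to discard.
    """
    centered = S - S.mean(axis=0)
    # Rows of Vt are principal directions; sv**2 is proportional to variance.
    _, sv, Vt = np.linalg.svd(centered, full_matrices=False)
    var = sv ** 2
    # discarded[k] = variance fraction lost when only k components are kept.
    discarded = np.concatenate(([1.0], 1.0 - np.cumsum(var) / var.sum()))
    discarded[-1] = 0.0  # keeping every component loses nothing
    for _ in range(max_rounds):
        k = int(np.argmax(discarded <= t))  # fewest components that satisfy t
        T = Vt[:k]                          # PCA matrix, k x n_wavelengths
        P = centered @ T.T                  # PC values (P = T*S in the text)
        delta_S = np.max(np.abs(centered - P @ T))  # worst case of S - T'*P
        if delta_S < noise_level:
            return t, k, delta_S
        t /= 10.0                           # otherwise t = t/10 and repeat
    raise RuntimeError("noise level not reached within max_rounds")
```

In practice `noise_level` would come from the hardware specification mentioned above; the sketch simply takes it as an argument.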

In an embodiment, optimizing the training target for the PCA includes using a Mueller domain error tolerance. For example, in current techniques for library training, the error target for each PCA may be set to 10^-5. There is no clear reason for setting the error target at this value, and there is no relationship between the 10^-5 PCA error target and the associated spectrum error. Furthermore, setting every PCA to the same error target is not necessarily advantageous, since their contributions to the spectra may differ. In the following exemplary approach, the Mueller domain error tolerance is converted into an error tolerance in each PCA domain and set as the PCA training target.

Prior to training, the neural network training profiles are converted to Mueller elements. For the Mueller elements (ME), normalization is performed at each wavelength based on the training sample data. PCA is then performed on the normalized Mueller elements (NME) to obtain the PC signals used for training. Thus, the Mueller element M_ij of the i-th sample at the j-th wavelength can be written as follows:

$$M_{ij} = \sum_{p=0}^{PC\#} \left(PC_{ip} \cdot t_{pj} \cdot Std_{j}\right) + Mean_{j}$$

PC#: total number of PCs

Std_j, Mean_j: standard deviation and mean at the j-th wavelength

For each sample i, PC_ip is always multiplied by the same factor to form the Mueller element. If the error tolerance for M_ij is set to 0.001, then the p-th PC has the following error budget:

$$ET_{p} = \frac{0.001 / PC\#}{\max_{j} \left(t_{pj} \cdot Std_{j}\right)},$$

where ET_p is the error tolerance of the p-th principal component. During training, each PC has its own training target, and the number of neurons in the network may be increased in order to meet the training error target.
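As a small numerical illustration of the error budget above, the per-component tolerances can be computed in one vectorized step. The function and argument names are hypothetical, and taking the absolute value of the loadings is an added assumption (the formula in the text uses t_pj * Std_j directly) so that negative loadings do not produce negative budgets.

```python
import numpy as np

def pc_error_budgets(T, std, mueller_tol=0.001):
    """ET_p = (mueller_tol / PC#) / max_j(t_pj * Std_j) for each component p.

    T is the (n_pc, n_wavelengths) PCA matrix with entries t_pj, and std
    holds the per-wavelength standard deviations Std_j.
    """
    n_pc = T.shape[0]
    # Assumption: use |t_pj| so the worst-case gain is always positive.
    worst_gain = np.max(np.abs(T) * std, axis=1)
    return (mueller_tol / n_pc) / worst_gain

# Two components, two wavelengths (made-up numbers for illustration only).
T = np.array([[0.5, -1.0],
              [0.2, 0.1]])
std = np.array([2.0, 1.0])
budgets = pc_error_budgets(T, std)
```

A component whose loadings are small at every wavelength receives a looser budget, which matches the observation that the PCs contribute differently to the spectra.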

Referring to operation 104 of flowchart 100, the method also includes estimating a training target for one or more neural networks.

In accordance with an embodiment of the present invention, a more accurate training target is used for each neural network. In one such embodiment, the PCA and the normalization are taken into account when estimating the training target for each neural network. In an embodiment, the training target is estimated based on the PCA transformation and the hardware signal noise level.

Referring to operation 106 of flowchart 100, the method also includes training the one or more neural networks based on the training target and on the PC value.

In accordance with an embodiment of the present invention, overtraining detection and control is performed. The approach may be used to detect and control overtraining both when the number of neurons is increased and during Levenberg-Marquardt (LM) iterations. In an embodiment, the training is also based on the training target of operation 104 above. It is to be understood that the training may be based on more than one optimized PC value, and possibly on many optimized PC values. A combination of unoptimized and optimized PC values may also be used.

Referring to operation 108 of flowchart 100, the method also includes providing a spectral library based on the one or more trained neural networks. In an embodiment, the spectral library is a high-accuracy spectral library.

In accordance with an embodiment of the present invention, a high-accuracy library with good generalization is provided. In one such embodiment, based on an individual error target for each training output domain, a method of dynamically increasing the number of neurons, which checks both the training set error and the validation set error, is developed to produce a neural-network-based library with high accuracy and good generalization. The weights from the previous neuron-number iteration are used as the initial weights for the current neural network structure in order to speed up training.

In an exemplary embodiment, a method has been developed to improve the neural network training in several different areas of the overall library training approach for CD metrology. In testing, a substantial improvement in library regression matching has been obtained. For example, the library error range (e.g., the 3σ error range) may be more than 10 times smaller than for libraries generated by previous training approaches. With such a smaller training set implemented, the error range may approach the precision level. The dynamic increasing sample library approach may provide improved convergence behavior compared with conventional methods, and may require a much smaller sample set to build an improved library. A benefit of the dynamic increasing sample library approach is high accuracy to the precision level, a smaller library size, and a fast solution that lets the user obtain a very good library as the final solution.

Fig. 2A is a plot demonstrating library-regression matching with the dynamic increasing sample library method in an OFF state. Fig. 2B is a plot demonstrating library-regression matching with the dynamic increasing sample library method in an ON state, in accordance with an embodiment of the present invention. Referring to Figs. 2A and 2B, with the dynamic increasing sample library method ON, significantly fewer samples (e.g., 8000 versus 30000) are needed to achieve a good match.

Fig. 2C includes a pair of plots 220 and 230 comparing the 3σ (sigma) error range against increasing library sample size, in accordance with an embodiment of the present invention. Referring to Fig. 2C, the 3σ error range changes in response to the sample size (e.g., conventional approach, dynamic increasing sample library method OFF, 5 hours; versus an embodiment of the present invention, dynamic increasing sample library method ON, 3 hours). Thus, using the dynamic increasing sample library method, the time needed to achieve a desired result may be significantly reduced.

In an embodiment, referring again to flowchart 100, the high-accuracy spectral library includes a simulated spectrum, and the method described in association with flowchart 100 further includes an operation of comparing the simulated spectrum to a sample spectrum. In one embodiment, the simulated spectrum is obtained from a set of spatial harmonic orders. In one embodiment, the sample spectrum is collected from a structure such as, but not limited to, a physical reference sample or a physical production sample. In an embodiment, the comparison is performed by using a regression calculation. In one embodiment, one or more non-differential signals are used simultaneously in the calculation. The one or more non-differential signals may be, for example but not limited to, azimuth angles, angles of incidence, polarizer/analyzer angles, or additional measurement targets.

In another aspect of the present invention, a fast neural network training approach is provided. Fig. 3 depicts a flowchart 300 representing an exemplary series of operations for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention. Figs. 4A-4F further illustrate aspects of the method depicted in Fig. 3.

Referring to operation 302 of flowchart 300, the method includes providing a training target for a first neural network. Referring to operation 304 of flowchart 300, the first neural network is trained; the training includes starting with a predetermined number of neurons and iteratively increasing the number of neurons until an optimized total number of neurons is reached. Referring to operation 306 of flowchart 300, a second neural network is generated based on the training and the optimized total number of neurons. It is to be understood that many such iterations may be performed to reach the optimized total number of neurons. Referring to operation 308 of flowchart 300, a spectral library is provided based on the second neural network.

In an embodiment, with regard to "fast" training, a feed-forward neural network is used to implement a nonlinear mapping function F such that y ≈ F(p). The nonlinear mapping function F is used as a meta-model to determine the spectrum or spectra associated with a given profile. The determination can be performed relatively quickly in terms of computation time and cost. In one embodiment, the function is determined in a training procedure with a set of training data (p_i, y_i). A neural network can be used to approximate any such arbitrary nonlinear function. For example, Fig. 4A illustrates a two-hidden-layer neural network, in accordance with an embodiment of the present invention.

Referring to Fig. 4A, the true mapping function F_true(p) from input p to output y can be approximated mathematically by the two-hidden-layer neural network 400, according to Equation 1:

$$y = F_{true}(p) \approx F(p) = v^{T} G_{1}\left[h\, G_{2}(W p + d) + e\right] + q \qquad (1)$$

where G1 and G2 are nonlinear functions. Given a set of training data (p_i, y_i), finding the set of W, h, d, e, v^T, and q such that F(p) best represents F_true(p) is referred to as training. The training can be viewed as solving an optimization problem that minimizes the mean squared error, according to Equation 2:

$$Cost = \frac{1}{2N} \sum_{i=1}^{N} \left(y_{i} - F(p_{i})\right)^{2} = \frac{1}{2N} E^{T} E \qquad (2)$$
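Equations 1 and 2 can be made concrete with a few lines of array code. This is a minimal sketch, not the patented implementation: the text only says that G1 and G2 are nonlinear functions, so `tanh` is an assumption, and the function names and the scalar-output shape are illustrative.

```python
import numpy as np

def forward(p, W, d, h, e, v, q, G1=np.tanh, G2=np.tanh):
    """F(p) = v^T G1[h G2(W p + d) + e] + q for a single input vector p."""
    a2 = G2(W @ p + d)    # first hidden layer
    a1 = G1(h @ a2 + e)   # second hidden layer
    return v @ a1 + q     # linear output

def cost(P, y, params):
    """Cost = (1 / 2N) * sum_i (y_i - F(p_i))^2 = E^T E / (2N)."""
    E = np.array([y_i - forward(p_i, *params) for p_i, y_i in zip(P, y)])
    return float(E @ E) / (2 * len(y))

# With all-zero weights and q = 1 the network outputs 1 for every input.
params = (np.zeros((3, 2)), np.zeros(3), np.zeros((4, 3)), np.zeros(4),
          np.zeros(4), 1.0)
P = np.array([[0.1, 0.2], [0.3, 0.4]])
y = np.array([1.0, 3.0])
example_cost = cost(P, y, params)
```

Here the residual vector is E = [0, 2], so the cost evaluates to 4 / (2 * 2) = 1.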

Follow-up questions or determinations include: (1) how many neurons should be used in the hidden layers, and (2) how should the neural network be trained to reach a specified accuracy?

With regard to determining the number of neurons, there are two approaches to the first question above. The first is a heuristic approach. The heuristic sets two numbers, 18 and 30, as the minimum and maximum number of neurons to use. When the number of principal components for the training data is less than or equal to 25, 18 neurons are used. When the number of principal components is greater than or equal to 80, 30 neurons are used. When the number of principal components falls in between, linear interpolation is used to determine the appropriate number of neurons. One potential problem with the heuristic approach is that the number of principal components may be unrelated to the number of neurons that should be used.
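The heuristic above amounts to a clamped linear interpolation, sketched below. The function name is hypothetical, and rounding the interpolated value to the nearest integer is an assumption, since the text does not say how fractional neuron counts are resolved.

```python
def heuristic_neuron_count(n_pc, lo_pc=25, hi_pc=80, lo_n=18, hi_n=30):
    """Map the number of principal components to a neuron count.

    18 neurons at or below 25 PCs, 30 neurons at or above 80 PCs, and
    linear interpolation in between (rounded, which is an assumption).
    """
    if n_pc <= lo_pc:
        return lo_n
    if n_pc >= hi_pc:
        return hi_n
    frac = (n_pc - lo_pc) / (hi_pc - lo_pc)
    return round(lo_n + frac * (hi_n - lo_n))

counts = [heuristic_neuron_count(n) for n in (20, 25, 47, 80, 100)]
```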

The second approach to the first question above is coupled with the various training methods. First, the maximum number of neurons that can be used is estimated, e.g., denoted Mmax. Then the following iterative procedure is used to determine the number of neurons and to train the corresponding network: set m = 10, then (1) train the network with m neurons; if the method converges, stop, otherwise (2) if (m+5) is greater than Mmax, stop, otherwise (3) increase m by 5 and go to operation 1. However, this approach may be very time-consuming.
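The iterative procedure above can be sketched as a small search loop. The callback `train_converges` is hypothetical: it stands in for a full training run that reports whether a network with m neurons converged, which is exactly the expensive step that makes this approach slow.

```python
def search_neuron_count(train_converges, m_max, m_start=10, step=5):
    """Train with m neurons, growing m by `step` until convergence or m_max.

    train_converges(m) is a hypothetical callback that trains a network
    with m neurons from scratch and returns True if training converged.
    """
    m = m_start
    while True:
        if train_converges(m):   # operation 1: train, stop on convergence
            return m, True
        if m + step > m_max:     # operation 2: stop at the neuron limit
            return m, False
        m += step                # operation 3: add neurons and retry

# Toy stand-in: pretend any network with at least 22 neurons converges.
found = search_neuron_count(lambda m: m >= 22, m_max=40)
gave_up = search_neuron_count(lambda m: False, m_max=20)
```

Each call to the callback retrains from scratch, so the total cost grows with every unsuccessful guess; the incremental algorithm described later reuses the trained weights instead.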

Guessing the optimal number of neurons in a feed-forward neural network to fit a nonlinear function is an NP-complete problem (i.e., a class of problems with no known solution of polynomial time complexity). Thus, in accordance with an embodiment of the present invention, and as described in greater detail below, a fast approach to the optimization includes gradually increasing the number of neurons in the network during training until an optimal number of neurons that provides a specified accuracy is determined.

With regard to the algorithm description, in an embodiment, the incremental training algorithm is a composite method that utilizes an algorithm to train the neural network. In one embodiment, a modified Levenberg-Marquardt algorithm is used. The original Levenberg-Marquardt algorithm for this problem is briefly described as follows: denote the Jacobian by J, where w denotes the elements of W, h, d, e, v^T, and q for the neural network. At iteration i, J is evaluated and δw is found such that:

$$(J^{T} J + \mu I)\, \delta w = J^{T} E,$$

where I is the identity matrix and μ is a scaling constant adjusted at each iteration. w is updated using w^(i+1) = w^i + δw. The cost is evaluated with the new w; if the cost is less than a specified value (e.g., 10^-5), the algorithm stops. Otherwise, it continues with the next iteration until the number of iterations exceeds a predetermined number (e.g., 200). In other words, there are two possible cases in which the algorithm stops iterating. The first is that the cost function falls below the specified value. The second is that the number of iterations exceeds the maximum number of iterations. One observation is that the Levenberg-Marquardt algorithm is very effective at reducing the cost (mean squared value) in the first tens of iterations. After that, the rate of reduction slows significantly.
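The iteration described above can be sketched on a toy least-squares problem. Several choices here are assumptions rather than statements from the text: J is taken to be the Jacobian of the model output with respect to w and E = y - F, which makes the sign of the update consistent with the equation as written; μ is divided or multiplied by 10 depending on whether a step reduces the cost; and all function names are illustrative.

```python
import numpy as np

def lm_fit(w, residual_fn, jacobian_fn, mu=1e-3, cost_tol=1e-5, max_iter=200):
    """Basic Levenberg-Marquardt loop with the two stopping cases in the text."""
    def cost(weights):
        E = residual_fn(weights)
        return float(E @ E) / (2 * E.size)

    current = cost(w)
    for iteration in range(max_iter):
        if current < cost_tol:          # stop case 1: cost below target
            return w, current, iteration
        E, J = residual_fn(w), jacobian_fn(w)
        # Solve (J^T J + mu I) dw = J^T E for the update dw.
        delta_w = np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ E)
        trial = cost(w + delta_w)
        if trial < current:             # accept the step and trust it more
            w, current, mu = w + delta_w, trial, mu / 10.0
        else:                           # reject the step and damp harder
            mu *= 10.0
    return w, current, max_iter         # stop case 2: iteration limit

# Toy model F(x) = w0 * x + w1 fitted to y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
residual = lambda w: y - (w[0] * x + w[1])
jacobian = lambda w: np.column_stack([x, np.ones_like(x)])
w_fit, final_cost, n_iter = lm_fit(np.zeros(2), residual, jacobian)
```

For a neural network the Jacobian would be obtained by backpropagation or automatic differentiation rather than written by hand as in this linear example.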

Alternatively, in an embodiment, the Levenberg-Marquardt algorithm above is modified in its stopping criteria for this application. That is, one more criterion is added: stop if the cost does not decrease by x% compared with the cost of the previous iteration for "r" consecutive iterations. The extra criterion is applied to detect underfitting. The incremental training algorithm is then as follows: a given training set (p_i, y_i) is provided as input, and the number of neurons n and the weights of the two-hidden-layer network are provided as output. From there, the following operations are performed: (1) the maximum number of neurons is estimated, e.g., denoted Nmax, (2) the number of neurons n is set to 4, (3) the Nguyen-Widrow algorithm is used to initialize the weights w, (4) while (n < Nmax), (a) the network is trained using the modified Levenberg-Marquardt method, (b) if the cost is less than the specified value, the determination stops, (c) if the cost does not decrease by x% compared with the previous cost for "r" consecutive trials, the determination stops, (d) optionally, a validation data set is used: if the error on the validation data increases for t consecutive trials, the determination stops, (e) n is set to n+2, and a new neural network is constructed using the trained weights from the old neural network. Random numbers are then assigned to the new weights, and operations 4a-e are repeated.

It is to be understood that the modification of Levenberg-Marquardt may be important for the fast training approach. The modification allows the algorithm to stop when the rate of reduction is small and to increase the number of neurons instead. That is, the modification is equivalent to detecting underfitting. In practice, in one embodiment, x = 10 and r = 4 were found to be a good choice. Algorithms other than Levenberg-Marquardt may also be used, provided that they are suitably modified. With regard to operation 4e, the operation allows the search to jump out of the local minima that often result from gradient-based optimization algorithms such as Levenberg-Marquardt. The Levenberg-Marquardt method provides a good set of initial values for the weights. Operations 4c and 4d above are methods of preventing overtraining with a large number of neurons.
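The stall criterion and the outer growth loop can be sketched as follows. This reflects one reading of the operations above, namely that criteria (c) and (d) end the current training stage, after which the network is grown by two neurons; the callback `train_stage`, which is expected to train n neurons starting from the previous weights and to apply `stalled` internally, is hypothetical.

```python
def stalled(costs, x_percent=10.0, r=4):
    """True if each of the last r iterations cut the cost by less than x%."""
    if len(costs) < r + 1:
        return False
    recent = costs[-(r + 1):]
    floor = 1.0 - x_percent / 100.0
    return all(cur > prev * floor for prev, cur in zip(recent, recent[1:]))

def incremental_train(train_stage, n_max, cost_target=1e-5, n_start=4, n_step=2):
    """Grow the network until the cost target is met or n reaches n_max.

    train_stage(n, previous_weights) is a hypothetical callback that returns
    (weights, cost) after one training stage with n neurons.
    """
    n, weights, cost = n_start, None, float("inf")
    while n < n_max:
        weights, cost = train_stage(n, weights)  # operation 4a
        if cost < cost_target:                   # operation 4b
            break
        n += n_step                              # operation 4e
    return n, weights, cost

# Toy stage: the achievable cost shrinks as neurons are added.
fake_stage = lambda n, w: ([n], 10.0 ** (-n / 2))
result = incremental_train(fake_stage, n_max=30, cost_target=5e-5)
```

With the values x = 10 and r = 4 reported above, `stalled` flags a stage once four consecutive iterations each improve the cost by less than 10%.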

In an embodiment, a further extension of the incremental algorithm may be implemented as follows, assuming for simplicity that the raw data, which includes the Mueller elements of different profiles rather than principal components, is trained. The further extension is described in the following set of operations: (1) given a set of profiles, denoted Np, a neural network is trained using the new algorithm above to represent the nonlinear mapping from the profiles to one Mueller element, (2) the Np += δNp method is used to train the network with the Np profiles, (a) if the Np += δNp method stagnates, the number of neurons is increased to represent the nonlinear mapping more accurately, (b) if the Np += δNp method converges, the current network is used, (3) the accuracy of the current neural network model is evaluated for the nonlinear mapping, (a) if the required accuracy is met, the method stops here, (b) otherwise, the number of profiles is increased by δNp, and the method returns to operation 2.

Fig. 4B illustrates a Matlab plot 410 of an actual response surface for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention. Referring to Fig. 4B, a function of two unknowns is used to generate 1000 training samples and 100 validation samples with a Halton quasi-random number generator.
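For readers unfamiliar with Halton sampling, a two-dimensional Halton sequence can be generated in a few lines. This is a generic textbook construction, not the generator used for Fig. 4B; the choice of bases 2 and 3 and the function names are assumptions.

```python
def halton(index, base):
    """Radical inverse of a positive integer index in the given base."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def halton_points(n, bases=(2, 3)):
    """First n points of a Halton sequence, one coprime base per dimension."""
    return [tuple(halton(i, b) for b in bases) for i in range(1, n + 1)]

# 1000 quasi-random training inputs in the unit square.
training_inputs = halton_points(1000)
```

The points fill the unit square more evenly than pseudo-random draws, which is why such sequences are popular for sampling parameter spaces; they would still need to be scaled to the actual parameter ranges.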

Fig. 4C is a plot 420 comparing the convergence history of the incremental algorithm and the one-shot Levenberg-Marquardt algorithm, in accordance with an embodiment of the present invention. Referring to Fig. 4C, the convergence history of the incremental training algorithm is compared with the original Levenberg-Marquardt algorithm (labeled as the one-shot method, since this method provides one-shot training with an estimated number of neurons, e.g., 12 neurons in each hidden layer). The incremental training in plot 420 shows spikes due to the operation in 4e, but the algorithm is very effective at reducing the cost back to the previous level. The one-shot method stagnates and does not converge within 200 iterations. The computational complexity of one iteration of Levenberg-Marquardt is O(n⁶), where n is the number of neurons in a two-hidden-layer network. For the incremental algorithm, the execution time is dominated by the training at the final, largest number of neurons. Thus, by comparing the number of iterations in the final stage of incremental training with the number of iterations used in the one-shot method, the projected performance of the incremental algorithm may be an order of magnitude better than that of the one-shot method.
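The comparison in the preceding paragraph can be written out as a rough cost estimate. The symbols below are assumptions introduced only for this sketch: c is a constant per-iteration cost factor, k denotes an iteration count, and the earlier, smaller stages of incremental training are neglected because the final stage is stated to dominate.

```latex
% Assumption: one Levenberg-Marquardt iteration at n neurons costs about c n^6.
T_{\text{one-shot}} \approx k_{\text{one-shot}}\, c\, n^{6},
\qquad
T_{\text{incremental}} \approx k_{\text{final}}\, c\, n_{\text{final}}^{6}
\quad\Longrightarrow\quad
\frac{T_{\text{one-shot}}}{T_{\text{incremental}}}
\approx
\frac{k_{\text{one-shot}}}{k_{\text{final}}}
\left(\frac{n}{n_{\text{final}}}\right)^{6}
```

When both methods end at a similar number of neurons, the second factor is close to one and the speed-up reduces to the ratio of iteration counts, which is the comparison the text makes.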

Fig. 4D shows a plot 430 of a performance comparison for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention. Referring to Fig. 4D, in a two-dimensional test example there are four degrees of freedom: one for the top dimension, one for the bottom dimension, and two for heights. The ranges of these parameters are also shown. An automatic truncation order is used for the RCWA simulation. There are 251 wavelengths. The dynamically increasing sample library approach is enabled, and more than 14,000 profiles are generated to build the library.

Fig. 4E includes a pair of plots 440 and 450 comparing results for one spectrum for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention. Referring to Fig. 4E, for clarity, only the cost value at the end of each stage is plotted. With incremental training, the application converges in 127 total iterations and the final number of neurons is 22. Note that the final stage, which is the dominating part, has only 7 iterations. With the training algorithm that uses the dynamically increasing sample library, the method converges in 712 iterations and the final number of neurons is 25. Note that this method uses 111 iterations in the final stage.

Fig. 4F includes a pair of plots 460 and 470 comparing results for a second spectrum for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention. Referring to Fig. 4F, results are presented for another spectrum in the same test example as Fig. 4E. With incremental training, the method needs 240 total iterations for a final number of 40 neurons, of which the final stage uses only 9 iterations. With the conventional training method, the method has not converged after 1200 iterations.

More generally, pertaining to at least some embodiments of the present invention, a new training method for neural networks has been found. During execution of the algorithm, an optimized number of hidden-layer neurons is determined. The algorithm can be an order of magnitude faster than one-shot Levenberg-Marquardt, especially when the correct number of neurons can be estimated. The algorithm can be an order of magnitude faster than conventional approaches. In one embodiment, an optimized number of neurons in the hidden layer is determined. In one embodiment, the above approach trains the network in a very fast manner.

In an embodiment, referring again to flowchart 300, the generated spectral library includes a simulated spectrum, and the method described in association with flowchart 300 further includes an operation of comparing the simulated spectrum to a sample spectrum. In one embodiment, the simulated spectrum is obtained from a set of spatial harmonic orders. In one embodiment, the sample spectrum is collected from a structure such as, but not limited to, a physical reference sample or a physical production sample. In an embodiment, the comparison is performed by using a regression calculation. In one embodiment, one or more non-differential signals are used simultaneously in the calculation. The one or more non-differential signals may be, but are not limited to, azimuth angles, angles of incidence, polarizer/analyzer angles, or additional measurement targets.

Any suitable neural network may be used to perform one or more of the methods described in association with flowcharts 100 and 300. As an example, Fig. 5 depicts select elements of an exemplary neural network for generating a library of spectral information, in accordance with an embodiment of the present invention.

Referring to Fig. 5, neural network 500 uses a back-propagation algorithm. Neural network 500 includes an input layer 502, an output layer 504, and a hidden layer 506 between input layer 502 and output layer 504. Input layer 502 and hidden layer 506 are connected using links 508. Hidden layer 506 and output layer 504 are connected using links 510. It should be recognized, however, that neural network 500 can include any number of layers connected in various configurations commonly known in the neural network art.

As depicted in Figure 5, input layer 502 includes one or more input nodes 512. In the present exemplary embodiment, an input node 512 in input layer 502 corresponds to a profile parameter of the profile model that is input into neural network 500. Thus, the number of input nodes 512 corresponds to the number of profile parameters used to characterize the profile model. For example, if a profile model is characterized using two profile parameters (e.g., top and bottom critical dimensions), input layer 502 includes two input nodes 512, where a first input node 512 corresponds to the first profile parameter (e.g., the top critical dimension) and a second input node 512 corresponds to the second profile parameter (e.g., the bottom critical dimension).

In neural network 500, output layer 504 includes one or more output nodes 514. In the present exemplary embodiment, each output node 514 is a linear function. It should be recognized, however, that each output node 514 can be any of various types of functions. Additionally, in the present exemplary embodiment, an output node 514 in output layer 504 corresponds to a dimension of the simulated diffraction signal that is output from neural network 500. Thus, the number of output nodes 514 corresponds to the number of dimensions used to characterize the simulated diffraction signal. For example, if a simulated diffraction signal is characterized using five dimensions corresponding to, for example, five different wavelengths, output layer 504 includes five output nodes 514, where a first output node 514 corresponds to a first dimension (e.g., a first wavelength), a second output node 514 corresponds to a second dimension (e.g., a second wavelength), and so on. Additionally, for increased performance, neural network 500 can be separated into a plurality of sub-networks based on separate components of the simulated diffraction signal and/or dimensions of the components of the simulated diffraction signal.

In neural network 500, hidden layer 506 includes one or more hidden nodes 516. In the present exemplary embodiment, each hidden node 516 is a sigmoidal transfer function or a radial basis function. It should be recognized, however, that each hidden node 516 can be any of various types of functions. Additionally, in the present exemplary embodiment, the number of hidden nodes 516 is determined based on the number of output nodes 514. More particularly, the number (m) of hidden nodes 516 is related to the number (n) of output nodes 514 by a predetermined ratio (r=m/n). For example, when r=10, there are 10 hidden nodes 516 for each output node 514. It should be recognized, however, that the predetermined ratio can instead be the ratio of the number of output nodes 514 to the number of hidden nodes 516 (i.e., r=n/m). Additionally, it should be recognized that the number of hidden nodes 516 in neural network 500 can be adjusted after the initial number of hidden nodes 516 is determined based on the predetermined ratio. Furthermore, the number of hidden nodes 516 in neural network 500 can be determined based on experience and/or experimentation rather than based on the predetermined ratio.
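The layer sizing described for neural network 500 can be illustrated with a small forward pass. This is a sketch under stated assumptions, not the network of the embodiment: the random weight range, the zero biases, and the helper names (`hidden_node_count`, `make_layer`, `forward`) are hypothetical, and no training step is shown.

```python
import math
import random
from typing import List, Tuple

# A layer is (weights[node][input], biases[node]).
Layer = Tuple[List[List[float]], List[float]]


def hidden_node_count(n_outputs: int, ratio: int) -> int:
    """m = r * n, using the ratio r = m / n from the text."""
    return ratio * n_outputs


def make_layer(n_inputs: int, n_nodes: int, rng: random.Random) -> Layer:
    """Assumed initialization: small uniform random weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_nodes)]
    return weights, [0.0] * n_nodes


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def forward(profile: List[float], hidden: Layer, output: Layer) -> List[float]:
    """Sigmoidal hidden nodes followed by linear output nodes."""
    hidden_w, hidden_b = hidden
    activations = [
        sigmoid(sum(w * x for w, x in zip(row, profile)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    output_w, output_b = output
    return [
        sum(w * a for w, a in zip(row, activations)) + b
        for row, b in zip(output_w, output_b)
    ]


rng = random.Random(0)
n_inputs, n_outputs = 2, 5                      # top CD and bottom CD in, five wavelengths out
n_hidden = hidden_node_count(n_outputs, ratio=10)
hidden_layer = make_layer(n_inputs, n_hidden, rng)
output_layer = make_layer(n_hidden, n_outputs, rng)
signal = forward([0.45, 0.55], hidden_layer, output_layer)  # untrained, illustrative values only
```

With two profile parameters, five signal dimensions, and r=10, the sketch yields two input nodes, 50 hidden nodes, and five output nodes, matching the counts used as examples in the text.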

In an embodiment, the library described above may include one or more parameters of an individual feature of a two-dimensional or three-dimensional grating structure. The term "three-dimensional grating structure" is used herein to refer to a structure having an x-y profile that varies in two dimensions in addition to a depth in the z-direction. For example, Fig. 6A depicts a periodic grating 600 having a profile that varies in the x-y plane, in accordance with an embodiment of the present invention. The profile of the periodic grating varies in the z-direction as a function of the x-y profile.

The term "two-dimensional grating structure" is used herein to refer to a structure having an x-y profile that varies in only one dimension in addition to a depth in the z-direction. For example, Fig. 6B depicts a periodic grating 602 having a profile that varies in the x-direction but not in the y-direction, in accordance with an embodiment of the present invention. The profile of the periodic grating varies in the z-direction as a function of the x profile. It is to be understood that the lack of variation in the y-direction for a two-dimensional structure need not be infinite, but any breaks in the pattern are considered long range; for example, any breaks in the pattern in the y-direction are spaced substantially further apart than the breaks in the pattern in the x-direction.

In an embodiment where the individual feature is of a two-dimensional or three-dimensional grating structure, the first parameter is one such as, but not limited to, a width, height, length, top corner rounding, bottom footing, or sidewall angle of the individual feature. Optical properties of materials in wafer structures, such as the index of refraction and the extinction coefficient (n and k), may also be modeled for use in optical metrology.

With regard to the use of the spectral library provided by the methods of flowcharts 100 and 300, in an embodiment, such a method includes altering parameters of a process tool based on agreement or non-agreement with simulated parameters in the spectral library. Altering the parameters of the process tool may be performed by using a technique such as, but not limited to, a feedback technique, a feed-forward technique, and an in situ control technique. In an embodiment, the spectral library can be used to more accurately set up a device structure profile and geometry in a CD metrology tool recipe. In an embodiment, the spectral library is used as a part of CD metrology tool validation, diagnostics, and characterization.

As described above, use of the spectral library can include comparing a simulated spectrum to a sample spectrum. In one embodiment, a set of diffraction orders is simulated to represent diffraction signals generated from a two- or three-dimensional grating structure by an ellipsometric optical metrology system. Such an optical metrology system is described below in association with Fig. 9. However, it is to be understood that the same concepts and principles apply equally to other optical metrology systems, such as reflectometric systems. The diffraction signals represented may account for features of the two- or three-dimensional grating structure such as, but not limited to, profile, dimensions, or material composition.

Calculation-based simulated diffraction orders may be indicative of profile parameters for a patterned film, such as a patterned semiconductor film or photoresist layer, and may be used for calibrating automated processes or equipment control. Fig. 7 depicts a flowchart 700 representing an exemplary series of operations for determining and utilizing structural parameters, such as profile parameters, for automated process and equipment control, in accordance with an embodiment of the present invention.

Referring to operation 702 of flowchart 700, a spectral library or a trained machine learning system (MLS) is developed to extract profile parameters from a set of measured diffraction signals. In operation 704, at least one profile parameter of a structure is determined using the spectral library or the trained MLS. In operation 706, the at least one profile parameter is transmitted to a fabrication cluster configured to perform a processing operation, where the processing operation may be executed in the semiconductor manufacturing process flow either before or after measurement operation 704 is completed. In operation 708, the at least one transmitted profile parameter is used to modify a process variable or equipment setting for the processing operation performed by the fabrication cluster.

For a more detailed description of machine learning systems and algorithms, see U.S. Patent No. 7,831,528, entitled OPTICAL METROLOGY OF STRUCTURES FORMED ON SEMICONDUCTOR WAFERS USING MACHINE LEARNING SYSTEMS, filed on June 27, 2003, which is incorporated herein by reference in its entirety. For a description of diffraction order optimization for two-dimensional repeating structures, see U.S. Patent No. 7,428,060, entitled OPTIMIZATION OF DIFFRACTION ORDER SELECTION FOR TWO-DIMENSIONAL STRUCTURES, filed on March 24, 2006, which is incorporated herein by reference in its entirety.

Fig. 8 is a block diagram of a system 800 for determining and utilizing structural parameters, such as profile parameters, for automated process and equipment control, in accordance with an embodiment of the present invention. System 800 includes a first fabrication cluster 802 and an optical metrology system 804. System 800 also includes a second fabrication cluster 806. Although second fabrication cluster 806 is depicted in Fig. 8 as being subsequent to first fabrication cluster 802, it should be recognized that second fabrication cluster 806 can be located prior to first fabrication cluster 802 in system 800 (and, for example, in the manufacturing process flow).

A photolithographic process, such as exposing and developing a photoresist layer applied to a wafer, can be performed using first fabrication cluster 802. In one exemplary embodiment, optical metrology system 804 includes an optical metrology tool 808 and a processor 810. Optical metrology tool 808 is configured to measure a diffraction signal obtained from the structure. If the measured diffraction signal and a simulated diffraction signal match, one or more values of the profile parameters are determined to be the one or more values of the profile parameters associated with the simulated diffraction signal.

In one exemplary embodiment, optical metrology system 804 can also include a spectral library 812 with a plurality of simulated diffraction signals and a plurality of values of one or more profile parameters associated with the plurality of simulated diffraction signals. As described above, the spectral library can be generated in advance. Metrology processor 810 can compare a measured diffraction signal obtained from a structure to the plurality of simulated diffraction signals in the spectral library. When a matching simulated diffraction signal is found, the one or more values of the profile parameters associated with the matching simulated diffraction signal in the spectral library are assumed to be the one or more values of the profile parameters used in the wafer application to fabricate the structure.
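A library lookup of this kind can be sketched as a nearest-signal search. The sum-of-squares distance and the `tolerance` parameter below are assumptions standing in for whatever matching criterion a given embodiment uses, and the entry layout (`LibraryEntry`) and parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each entry pairs profile parameter values with a simulated diffraction signal.
LibraryEntry = Tuple[Dict[str, float], List[float]]


def squared_distance(first: List[float], second: List[float]) -> float:
    """Sum of squared differences between two signals of equal length."""
    return sum((a - b) ** 2 for a, b in zip(first, second))


def match_profile(
    measured: List[float],
    library: List[LibraryEntry],
    tolerance: float,  # assumed matching criterion, not specified in the text
) -> Optional[Dict[str, float]]:
    """Return the profile parameters of the closest library signal, or None if no match."""
    best_params, best_signal = min(
        library, key=lambda entry: squared_distance(measured, entry[1])
    )
    if squared_distance(measured, best_signal) <= tolerance:
        return best_params
    return None


library: List[LibraryEntry] = [
    ({"top_cd": 40.0, "bottom_cd": 50.0}, [0.10, 0.20, 0.30]),
    ({"top_cd": 45.0, "bottom_cd": 55.0}, [0.40, 0.50, 0.60]),
]
matched = match_profile([0.39, 0.51, 0.60], library, tolerance=0.01)
```

Here the measured signal lies closest to the second entry, so its profile parameters are returned; a signal far from every entry yields `None` rather than a forced match. The library must be non-empty.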

System 800 also includes a metrology processor 816. In one exemplary embodiment, processor 810 can transmit the one or more values of the one or more profile parameters to metrology processor 816. Metrology processor 816 can then adjust one or more process parameters or equipment settings of first fabrication cluster 802 based on the one or more values of the one or more profile parameters determined using optical metrology system 804. Metrology processor 816 can also adjust one or more process parameters or equipment settings of second fabrication cluster 806 based on the one or more values of the one or more profile parameters determined using optical metrology system 804. As noted above, fabrication cluster 806 can process the wafer before or after fabrication cluster 802. In another exemplary embodiment, processor 810 is configured to train machine learning system 814 using the set of measured diffraction signals as inputs to machine learning system 814 and the profile parameters as the expected outputs of machine learning system 814.

Fig. 9 is an architectural diagram illustrating the use of optical metrology to determine the profiles of structures on a semiconductor wafer, in accordance with an embodiment of the present invention. Optical metrology system 900 includes a metrology beam source 902 that projects a metrology beam 904 at a target structure 906 of a wafer 908. Metrology beam 904 is projected at an incidence angle θ toward target structure 906. A diffraction beam 910 is measured by a metrology beam receiver 912. Diffraction beam data 914 is transmitted to a profile application server 916. Profile application server 916 compares the measured diffraction beam data 914 against a spectral library 918 of simulated diffraction beam data representing varying combinations of critical dimensions of the target structure and resolution.

In accordance with an embodiment of the present invention, at least a portion of the simulated diffraction beam data is based on a difference determined for two or more azimuth angles. In accordance with another embodiment of the present invention, at least a portion of the simulated diffraction beam data is based on a difference determined for two or more angles of incidence. In one exemplary embodiment, the spectral library 918 instance that best matches the measured diffraction beam data 914 is selected. It is to be understood that although a spectral library of diffraction spectra or signals and associated hypothetical profiles is frequently used to illustrate concepts and principles, the present invention applies equally to a spectral data space that includes simulated diffraction signals and associated sets of profile parameters, such as in regression, neural network, and similar methods used for profile extraction. The hypothetical profile and associated critical dimensions of the selected spectral library 918 instance are assumed to correspond to the actual cross-sectional profile and critical dimensions of the features of target structure 906. Optical metrology system 900 may utilize a reflectometer, an ellipsometer, or another optical metrology device to measure the diffraction beam or signal.

The set of profile models stored in spectral library 918 can be generated by characterizing a profile model using a set of profile parameters and then varying the set of profile parameters to generate profile models of varying shapes and dimensions. The process of characterizing a profile model using a set of profile parameters is referred to as parameterizing. For example, assume that a profile model can be characterized by profile parameters h1 and w1 that define its height and width, respectively. Additional shapes and features of the profile model can be characterized by increasing the number of profile parameters. For example, the profile model can be characterized by profile parameters h1, w1, and w2 that define its height, bottom width, and top width, respectively. Note that the width of the profile model can be referred to as the critical dimension (CD). For example, profile parameters w1 and w2 can be described as defining the bottom CD and top CD, respectively, of the profile model. It should be understood that various types of profile parameters can be used to characterize the profile model, including but not limited to angle of incidence (AOI), pitch, n and k, and hardware parameters (e.g., polarizer angle).

As described above, the set of profile models stored in spectral library 918 can be generated by varying the profile parameters that characterize the profile model. For example, by varying profile parameters h1, w1, and w2, profile models of varying shapes and dimensions can be generated. Note that one, two, or all three profile parameters can be varied relative to one another. As such, the profile parameters of the profile model associated with a matching simulated diffraction signal can be used to determine a feature of the structure being examined. For example, a profile parameter of the profile model corresponding to the bottom CD can be used to determine the bottom CD of the structure being examined.
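The variation of h1, w1, and w2 described above can be sketched as a full grid over the three parameters. The ranges, step sizes, and units (nanometers) below are hypothetical, and a regular grid is only one possible way of varying the parameters; the helper names are introduced for illustration.

```python
from itertools import product
from typing import Dict, List


def axis_values(start: float, step: float, count: int) -> List[float]:
    """Evenly spaced values, computed by index to avoid accumulated rounding."""
    return [start + i * step for i in range(count)]


def profile_models(axes: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Return every combination of the supplied profile parameter values."""
    names = list(axes)
    return [
        dict(zip(names, combination))
        for combination in product(*(axes[name] for name in names))
    ]


# Hypothetical ranges in nanometers: height h1, bottom CD w1, top CD w2.
axes = {
    "h1": axis_values(100.0, 10.0, 3),
    "w1": axis_values(40.0, 5.0, 4),
    "w2": axis_values(30.0, 5.0, 4),
}
models = profile_models(axes)
```

The grid grows multiplicatively with the number of parameters (3 x 4 x 4 = 48 models here), which is why the size of the profile set matters so much for the cost of generating a library.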

Embodiments of the present invention may be suitable for a variety of film stacks. For example, in an embodiment, a film stack includes a single layer or multiple layers. Also, in an embodiment, an analyzed or measured grating structure includes both a three-dimensional component and a two-dimensional component. For example, the efficiency of a computation based on simulated diffraction data may be optimized by taking advantage of the simpler contribution of the two-dimensional component to the overall structure and its diffraction data.

For ease of describing embodiments of the present invention, an ellipsometric optical metrology system is used to illustrate the above concepts and principles. It is to be understood that the same concepts and principles apply equally to other optical metrology systems, such as reflectometric systems. In a similar manner, a semiconductor wafer may be used to illustrate an application of the concept. Again, the methods and processes apply equally to other workpieces that have repeating structures. In an embodiment, the optical scatterometry is a technique such as, but not limited to, optical spectroscopic ellipsometry (SE), beam profile reflectometry (BPR), and enhanced ultraviolet reflectometry (eUVR).

The present invention may be provided as a computer program product, or software, that may include a machine-readable medium having instructions stored thereon, which may be used to program a computer system (or other electronic devices) to perform a process according to the present invention. A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable (e.g., computer-readable) medium includes a machine (e.g., computer) readable storage medium (e.g., read-only memory ("ROM"), random access memory ("RAM"), magnetic disk storage media, optical storage media, flash memory devices, etc.), a machine (e.g., computer) readable transmission medium (electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.)), and so on.

Figure 10 shows a diagrammatic representation of a machine in the exemplary form of a computer system 1000 within which a set of instructions may be executed for causing the machine to perform any one or more of the methods discussed herein. In alternative embodiments, the machine may be connected (e.g., networked) to other machines in a local area network (LAN), an intranet, an extranet, or the Internet. The machine may operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch, or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines (e.g., computers) that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methods discussed herein.

The exemplary computer system 1000 includes a processor 1002, a main memory 1004 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 1006 (e.g., flash memory, static random access memory (SRAM), etc.), and a secondary memory 1018 (e.g., a data storage device), which communicate with each other via a bus 1030.

Processor 1002 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, processor 1002 may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processor 1002 may also be one or more special-purpose processing devices such as an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), a network processor, or the like. Processor 1002 is configured to execute the processing logic 1026 for performing the operations described herein.

Computer system 1000 may further include a network interface device 1008. Computer system 1000 may also include a video display unit 1010 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 1012 (e.g., a keyboard), a cursor control device 1014 (e.g., a mouse), and a signal generation device 1016 (e.g., a speaker).

Secondary memory 1018 may include a machine-accessible storage medium (or more specifically a computer-readable storage medium) 1031 on which is stored one or more sets of instructions (e.g., software 1022) embodying any one or more of the methods or functions described herein. Software 1022 may also reside, completely or at least partially, within main memory 1004 and/or within processor 1002 during execution thereof by computer system 1000, with main memory 1004 and processor 1002 also constituting machine-readable storage media. Software 1022 may further be transmitted or received over a network 1020 via network interface device 1008.

While machine-accessible storage medium 1031 is shown in an exemplary embodiment to be a single medium, the term "machine-readable storage medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "machine-readable storage medium" shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methods of the present invention. The term "machine-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories and optical and magnetic media.

In accordance with an embodiment of the present invention, a machine-accessible storage medium has instructions stored thereon that cause a data processing system to perform a method of accurate neural network training for library-based CD metrology. The method includes optimizing a threshold for a principal component analysis (PCA) of a spectrum data set to provide a principal component (PC) value. A training target for one or more neural networks is estimated based on the PC value. The one or more neural networks are trained based on the training target and the PC value. A spectral library is provided based on the one or more trained neural networks.

In an embodiment, optimizing the threshold for the PCA includes determining a lowest level of the spectrum domain.

In an embodiment, optimizing the threshold for the PCA includes determining a first PCA threshold, applying the PCA to the spectrum data set, calculating the spectrum error introduced by applying the PCA, and comparing the spectrum error to a spectrum noise level. In one such embodiment, if the spectrum error is less than the spectrum noise level, the first PCA threshold is set to the PC value. In another such embodiment, if the spectrum error is greater than or equal to the spectrum noise level, a second PCA threshold is determined, and the applying, calculating, and comparing are repeated.
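The determine, apply, calculate, and compare loop can be sketched as follows. This is one possible reading under explicit assumptions, not the method of the embodiment: the PCA threshold is taken to be a retained-variance fraction, the candidate thresholds are supplied by the caller, and the spectrum error is taken to be the RMS truncation error per spectral point, computed from the full, descending list of eigenvalues of the spectrum covariance matrix (normalized by the number of spectra). All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def components_for_threshold(eigenvalues: Sequence[float], threshold: float) -> int:
    """Smallest number of leading components whose variance share reaches `threshold`."""
    total = sum(eigenvalues)
    running = 0.0
    for kept, value in enumerate(eigenvalues, start=1):
        running += value
        if running >= threshold * total:
            return kept
    return len(eigenvalues)


def truncation_rms_error(eigenvalues: Sequence[float], kept: int) -> float:
    """RMS error per spectral point after discarding all but `kept` components.

    Assumes one eigenvalue per spectral point, i.e. the full spectrum of the
    covariance matrix, so the discarded variance is averaged over all points.
    """
    return math.sqrt(sum(eigenvalues[kept:]) / len(eigenvalues))


def optimize_pca_threshold(
    eigenvalues: Sequence[float],           # assumed sorted in descending order
    candidate_thresholds: Sequence[float],  # assumed: first, second, ... thresholds to try
    noise_level: float,
) -> Tuple[float, int]:
    """Return the first threshold whose spectrum error is below the noise level."""
    for threshold in candidate_thresholds:
        kept = components_for_threshold(eigenvalues, threshold)
        if truncation_rms_error(eigenvalues, kept) < noise_level:
            return threshold, kept
    raise ValueError("no candidate threshold brings the error below the noise level")


eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.25, 0.25]
chosen = optimize_pca_threshold(eigenvalues, [0.5, 0.75, 0.9, 0.99], noise_level=0.3)
```

With these illustrative eigenvalues, the thresholds 0.5 and 0.75 leave RMS errors of about 0.82 and 0.58, both above the assumed noise level of 0.3, while 0.9 keeps four components and leaves an error of about 0.29, so the loop stops there.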

In an embodiment, optimizing the threshold for the PCA includes using a Mueller domain error tolerance.

In an embodiment, the high-accuracy spectral library includes a simulated spectrum, and the method further includes comparing the simulated spectrum to a sample spectrum.

In accordance with another embodiment of the present invention, a machine-accessible storage medium has instructions stored thereon that cause a data processing system to perform a method of fast neural network training for library-based CD metrology. The method includes providing a training target for a first neural network. The first neural network is trained. The training includes starting with a predetermined number of neurons and iteratively increasing the number of neurons until an optimized total number of neurons is reached. A second neural network is generated based on the training and the optimized total number of neurons. A spectral library is provided based on the second neural network.

In an embodiment, iteratively increasing the number of neurons until the optimized total number of neurons is reached includes using a modified Levenberg-Marquardt method.

In an embodiment, iteratively increasing the number of neurons includes increasing the number of neurons in a hidden layer of the first neural network.

In an embodiment, the spectral library includes a simulated spectrum, and the method further includes comparing the simulated spectrum to a sample spectrum.

Analysis of measured spectra generally involves comparing the measured sample spectra to simulated spectra to deduce the model parameter values that best describe the measured sample. Figure 11 is a flowchart representing operations in a method for building a parameterized model and a spectral library beginning with sample spectra (e.g., originating from one or more workpieces), in accordance with an embodiment of the present invention.

In operation 1101, a set of material files is defined by a user to specify the characteristics (e.g., n, k values) of the material or materials from which the measured sample feature is formed.

In operation 1102, a scatterometry user defines a nominal model of the expected sample structure by selecting one or more of the material files to assemble a stack of materials corresponding to those present in the periodic grating features to be measured. This user-defined model may be further parameterized through the definition of nominal values of model parameters, such as thicknesses, critical dimension (CD), sidewall angle (SWA), height (HT), edge roughness, corner rounding radius, etc., which characterize the shape of the feature being measured. Depending on whether a 2D model (i.e., a profile) or a 3D model is defined, it is not uncommon to have 30-50, or more, such model parameters.

From the parameterized model, simulated spectra for a given set of grating parameter values may be computed using a rigorous diffraction modeling algorithm, such as rigorous coupled wave analysis (RCWA). Regression analysis is then performed at operation 1103 until the parameterized model converges on a set of model parameter values characterizing a final profile model (for 2D). This final profile model corresponds to a simulated spectrum that matches the measured diffraction spectrum to within a predefined matching criterion. The final profile model associated with the matching simulated diffraction signal is presumed to represent the actual profile of the structure from which the model was generated.
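The regression of operation 1103 can be sketched as a nonlinear least-squares fit. This is only an illustration under stated assumptions: `simulate` is a hypothetical two-parameter toy model standing in for a rigorous solver such as RCWA, the damped Gauss-Newton (Levenberg-Marquardt style) update is one common choice that the text does not prescribe for this step, and the `criterion` value plays the role of the predefined matching criterion.

```python
import math


def simulate(params, wavelengths):
    # Toy forward model (assumption), NOT a rigorous diffraction solver.
    cd, ht = params
    return [(cd / wl) * math.cos(2.0 * math.pi * ht / wl) for wl in wavelengths]


def residuals(params, measured, wavelengths):
    return [s - m for s, m in zip(simulate(params, wavelengths), measured)]


def jacobian(params, measured, wavelengths, rel_step=1e-6):
    # Forward-difference derivative of the residuals, one column per parameter.
    base = residuals(params, measured, wavelengths)
    cols = []
    for k in range(len(params)):
        step = rel_step * max(1.0, abs(params[k]))
        shifted = list(params)
        shifted[k] += step
        moved = residuals(shifted, measured, wavelengths)
        cols.append([(m - b) / step for m, b in zip(moved, base)])
    return base, cols


def regress(measured, wavelengths, start, criterion=1e-16, max_iter=100):
    # Iterate until the sum of squared residuals meets the matching criterion.
    p = list(start)
    damping = 1e-3
    cost = sum(r * r for r in residuals(p, measured, wavelengths))
    for _ in range(max_iter):
        if cost <= criterion:
            break
        r, (j1, j2) = jacobian(p, measured, wavelengths)
        a11 = sum(x * x for x in j1) * (1.0 + damping)
        a22 = sum(x * x for x in j2) * (1.0 + damping)
        a12 = sum(x * y for x, y in zip(j1, j2))
        g1 = sum(x * y for x, y in zip(j1, r))
        g2 = sum(x * y for x, y in zip(j2, r))
        det = a11 * a22 - a12 * a12
        # Solve the damped 2x2 normal equations for the parameter update.
        trial = [p[0] + (-g1 * a22 + g2 * a12) / det,
                 p[1] + (-g2 * a11 + g1 * a12) / det]
        trial_cost = sum(x * x for x in residuals(trial, measured, wavelengths))
        if trial_cost < cost:
            p, cost, damping = trial, trial_cost, damping / 10.0
        else:
            damping *= 10.0
    return p, cost


if __name__ == "__main__":
    wavelengths = [200.0 + 20.0 * i for i in range(31)]
    measured = simulate([50.0, 100.0], wavelengths)  # synthetic "measurement"
    print(regress(measured, wavelengths, start=[45.0, 110.0]))
```

In this sketch the nominal model supplies the starting values, and the returned parameters correspond to the final profile model.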

The matching simulated spectra and/or the associated optimized profile model can then be used at operation 1104 to generate a library of simulated diffraction spectra by perturbing the values of the parameterized final profile model. The resulting library of simulated diffraction spectra may then be employed by a scatterometry measurement system operating in a production environment to determine whether subsequently measured grating structures have been fabricated according to specification. Library generation 1104 may include a machine learning system, such as a neural network, generating simulated spectral information for each of a number of profiles, each profile including a set of one or more modeled profile parameters. In order to generate the library, the machine learning system itself may have to undergo some training based on a training data set of spectral information. This training may be computationally intensive and/or may have to be repeated for different models and/or profile parameter domains. Considerable inefficiency in the computational load of generating a library may be introduced by a user's decisions regarding the size of the training data set. For example, selecting an overly large training data set may result in unnecessary training computation, while training with a data set of insufficient size may necessitate retraining to generate the library.
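As a rough illustration of library generation by perturbation, the sketch below builds a grid of profiles around a final profile model and looks up the best match for a measured spectrum. The toy `simulate` function, the parameter names `cd` and `ht`, the grid spacing, and the sum-of-squares match are all assumptions; in the embodiment described above, a trained machine learning system may produce the simulated spectra instead of a direct simulator.

```python
import itertools
import math


def simulate(cd, ht, wavelengths):
    # Toy forward model (assumption), not a rigorous diffraction solver.
    return [(cd / wl) * math.cos(2.0 * math.pi * ht / wl) for wl in wavelengths]


def build_library(final_profile, deltas, steps, wavelengths):
    # Perturb each parameter of the final profile model on a regular grid
    # and store one simulated spectrum per perturbed profile.
    names = sorted(final_profile)
    axes = [[final_profile[n] + k * deltas[n] for k in range(-steps, steps + 1)]
            for n in names]
    library = []
    for combo in itertools.product(*axes):
        profile = dict(zip(names, combo))
        library.append((profile, simulate(wavelengths=wavelengths, **profile)))
    return library


def best_match(library, measured):
    # Return the profile whose simulated spectrum is closest to the measurement.
    def distance(entry):
        return sum((s - m) ** 2 for s, m in zip(entry[1], measured))
    return min(library, key=distance)[0]


if __name__ == "__main__":
    wavelengths = [200.0 + 20.0 * i for i in range(31)]
    library = build_library({"cd": 50.0, "ht": 100.0},
                            {"cd": 1.0, "ht": 1.0}, 2, wavelengths)
    probe = simulate(51.0, 98.0, wavelengths)
    print(len(library), best_match(library, probe))
```

The grid size grows as (2 * steps + 1) raised to the number of parameters, which is one reason a trained network is attractive for models with many parameters.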

Some embodiments described herein include automatically determining the size of the training data set used to train a machine learning system. Generally, the training data set is sized based on the convergence of a data set characterization metric, and may be sized further based on an estimate of the final solution error. The training data set is incrementally expanded and tested to identify convergence and, in some embodiments, to estimate the final solution error that such a sample size will provide. The incremental expansion and testing are performed until a convergence criterion is met and/or the estimate of the final solution error reaches a threshold.
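The incremental expand-and-test loop can be sketched as follows. The characterization metric used here (variance summed over wavelength channels), the batch size, the tolerance, and the toy spectrum generator are all assumptions chosen for illustration; the text does not specify the metric, and the final-solution-error estimate it mentions is not shown.

```python
import math
import random


def sample_spectrum(rng, wavelengths):
    # Hypothetical training sample: a random profile pushed through a toy model.
    cd = rng.uniform(45.0, 55.0)
    ht = rng.uniform(90.0, 110.0)
    return [(cd / wl) * math.cos(2.0 * math.pi * ht / wl) for wl in wavelengths]


def total_variance(spectra):
    # Assumed characterization metric: variance summed over wavelength channels.
    n = len(spectra)
    total = 0.0
    for channel in zip(*spectra):
        mean = sum(channel) / n
        total += sum((x - mean) ** 2 for x in channel) / n
    return total


def size_training_set(draw, batch=50, tol=0.01, max_size=2000):
    # Expand the set one batch at a time until the metric stops changing
    # by more than `tol` (relative), or until max_size is reached.
    data = [draw() for _ in range(batch)]
    previous = total_variance(data)
    while len(data) < max_size:
        data.extend(draw() for _ in range(batch))
        current = total_variance(data)
        change = abs(current - previous) / max(abs(previous), 1e-12)
        if change < tol:
            break
        previous = current
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    wavelengths = [200.0 + 50.0 * i for i in range(13)]
    data = size_training_set(lambda: sample_spectrum(rng, wavelengths))
    print(len(data))
```

Because the loop only evaluates a metric on the data set, no network is trained while the size is being chosen.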

Because the methods for determining training matrix size described herein do not require a separate training, a good training data sample set for neural network training is identified quickly and efficiently, and the final solution error can be well controlled. With the training data sample set identified, the machine learning system may then be trained to generate the desired target function information. In one particular embodiment, the machine learning system is trained to generate a library of simulated spectral information (e.g., diffraction signals), which may be used to deduce the parameters of an unknown sample (e.g., a diffraction grating or a wafer periodic structure) measured with a scatterometry system.

It is to be understood that the above methods may be applied under a variety of circumstances within the spirit and scope of embodiments of the present invention. For example, in an embodiment, a method described above is performed in a semiconductor, solar, light-emitting diode (LED), or related fabrication process. In an embodiment, a method described above is used in a stand-alone or an integrated metrology tool. In an embodiment, a method described above is used in single-measurement or multiple-measurement target regression.

Thus, methods of accurate and fast neural network training for library-based CD metrology have been disclosed. According to an embodiment of the present invention, a method of accurate neural network training for library-based CD metrology includes optimizing a threshold for a principal component analysis (PCA) of a spectrum data set to provide a principal component (PC) value. The method also includes estimating a training target for one or more neural networks. The method also includes training the one or more neural networks based on the PC value and the training target. The method also includes providing a spectral library based on the one or more trained neural networks. In one embodiment, optimizing the threshold for the PCA includes determining a lowest-level spectrum domain. According to an embodiment of the present invention, a method of fast neural network training for library-based CD metrology includes providing a training target for a first neural network. The method also includes training the first neural network, the training including starting with a predetermined number of neurons and iteratively increasing the number of neurons until an optimized total number of neurons is reached. The method also includes generating a second neural network based on the training and the optimized total number of neurons. The method also includes providing a spectral library based on the second neural network. In one embodiment, iteratively increasing the number of neurons until the optimized total number of neurons is reached includes using a modified Levenberg-Marquardt method.
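The PCA threshold optimization summarized above (determine a threshold, apply the PCA, compute the spectrum error it introduces, compare that error with the spectrum noise level, and tighten the threshold if needed) can be sketched in a few functions. Several points are assumptions: the threshold is interpreted as the fraction of variance that may be discarded, the spectrum error is taken as an RMS reconstruction error, each new threshold is one tenth of the previous one, and PCA is computed by power iteration purely to keep the sketch dependency-free.

```python
import math
import random


def eigenpairs(spectra, iters=200):
    # PCA by power iteration with deflation (adequate for a small sketch).
    n, d = len(spectra), len(spectra[0])
    mean = [sum(s[j] for s in spectra) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in spectra]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    pairs = []
    for _component in range(d):
        start = [1.0 / (j + 1) for j in range(d)]
        scale = math.sqrt(sum(x * x for x in start))
        v = [x / scale for x in start]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eig = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((eig, v))
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eig * v[i] * v[j]  # deflate the found component
    return centered, pairs


def components_kept(pairs, threshold):
    # Keep components until the discarded variance fraction is below threshold.
    total = sum(e for e, _ in pairs)
    if total <= 0.0:
        return 1
    for k in range(1, len(pairs) + 1):
        if sum(e for e, _ in pairs[k:]) / total < threshold:
            return k
    return len(pairs)


def spectrum_error(centered, pairs, k):
    # RMS difference between the spectra and their k-component reconstruction.
    sq, count = 0.0, 0
    for r in centered:
        resid = list(r)
        for _, v in pairs[:k]:
            coeff = sum(a * b for a, b in zip(r, v))
            resid = [x - coeff * b for x, b in zip(resid, v)]
        sq += sum(x * x for x in resid)
        count += len(resid)
    return math.sqrt(sq / count)


def optimize_pca_threshold(spectra, noise_level, first_threshold=0.1,
                           shrink=0.1, max_tries=8):
    centered, pairs = eigenpairs(spectra)
    threshold = first_threshold
    for attempt in range(max_tries):
        k = components_kept(pairs, threshold)
        error = spectrum_error(centered, pairs, k)
        if error < noise_level or attempt == max_tries - 1:
            return threshold, k, error
        threshold *= shrink  # determine a tighter threshold and repeat


def toy_spectra(seed=0, count=40):
    # Hypothetical rank-2 spectrum data set over six channels.
    rng = random.Random(seed)
    u = [1.0] * 6
    v = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
    out = []
    for _ in range(count):
        a, b = rng.uniform(-1.0, 1.0), rng.uniform(-0.2, 0.2)
        out.append([a * x + b * y for x, y in zip(u, v)])
    return out


if __name__ == "__main__":
    print(optimize_pca_threshold(toy_spectra(), noise_level=1e-3))
```

The returned component count is what would feed the subsequent network training in this reading; the lowest-level spectrum domain and Mueller-domain tolerance variants are not modeled here.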

Claims

1. A method of accurate neural network training for library-based critical dimension metrology, the method comprising:

Optimizing a threshold for a principal component analysis of a spectrum data set to provide a principal component value, wherein optimizing the threshold for the principal component analysis of the spectrum data set comprises:

Determining a first principal component analysis threshold;

Applying the principal component analysis to the spectrum data set using the first principal component analysis threshold;

Calculating a spectrum error introduced by applying the principal component analysis using the first principal component analysis threshold; and

Comparing the spectrum error to a spectrum noise level;

Estimating a training target for one or more neural networks;

Training the one or more neural networks based on the training target and on the principal component value provided from optimizing the threshold for the principal component analysis; and

Providing a spectral library based on the one or more trained neural networks.

2. The method according to claim 1, wherein optimizing the threshold for the principal component analysis comprises determining a lowest-level spectrum domain.

3. The method according to claim 1, further comprising:

If the spectrum error is less than the spectrum noise level, setting the first principal component analysis threshold to the principal component value.

4. The method according to claim 1, further comprising:

If the spectrum error is greater than or equal to the spectrum noise level, determining a second principal component analysis threshold, and repeating the applying, the calculating, and the comparing.

5. The method according to claim 1, wherein optimizing the threshold for the principal component analysis comprises using a Mueller-domain error tolerance.

6. The method according to claim 1, wherein the spectral library comprises a simulated spectrum, the method further comprising:

Comparing the simulated spectrum to a sample spectrum.

7. An apparatus for accurate neural network training for library-based critical dimension metrology, the apparatus comprising:

Means for optimizing a threshold for a principal component analysis of a spectrum data set to provide a principal component value, wherein the means for optimizing the threshold for the principal component analysis of the spectrum data set comprises:

Means for determining a first principal component analysis threshold;

Means for applying the principal component analysis to the spectrum data set using the first principal component analysis threshold;

Means for calculating a spectrum error introduced by applying the principal component analysis using the first principal component analysis threshold; and

Means for comparing the spectrum error to a spectrum noise level;

Means for estimating a training target for one or more neural networks;

Means for training the one or more neural networks based on the training target and on the principal component value provided from optimizing the threshold for the principal component analysis; and

Means for providing a spectral library based on the one or more trained neural networks.

8. The apparatus according to claim 7, wherein the means for optimizing the threshold for the principal component analysis of the spectrum data set further comprises means for determining a lowest-level spectrum domain.

9. The apparatus according to claim 7, further comprising:

Means for setting the first principal component analysis threshold to the principal component value if the spectrum error is less than the spectrum noise level.

10. The apparatus according to claim 7, further comprising:

Means for determining a second principal component analysis threshold, and repeating the applying, the calculating, and the comparing, if the spectrum error is greater than or equal to the spectrum noise level.

11. The apparatus according to claim 7, wherein the means for optimizing the threshold for the principal component analysis of the spectrum data set further comprises means for using a Mueller-domain error tolerance.

12. The apparatus according to claim 7, wherein the spectral library comprises a simulated spectrum, the apparatus further comprising:

Means for comparing the simulated spectrum to a sample spectrum.

13. A method of fast neural network training for library-based critical dimension metrology, the method comprising:

Providing a training target for a first neural network;

Training the first neural network, the training comprising starting with a predetermined number of neurons and iteratively increasing the number of neurons until an optimized total number of neurons is reached, wherein the optimized total number of neurons is greater than the predetermined number of neurons;

Generating a second neural network based on the training and the optimized total number of neurons; and

Providing a spectral library based on the second neural network.

14. The method according to claim 13, wherein iteratively increasing the number of neurons until the optimized total number of neurons is reached comprises using a modified Levenberg-Marquardt method.

15. The method according to claim 13, wherein iteratively increasing the number of neurons comprises increasing the number of neurons in a hidden layer of the first neural network.

16. The method according to claim 13, wherein the spectral library comprises a simulated spectrum, the method further comprising:

Comparing the simulated spectrum to a sample spectrum.

17. An apparatus for fast neural network training for library-based critical dimension metrology, the apparatus comprising:

Means for providing a training target for a first neural network;

Means for training the first neural network, the means for training comprising means for starting with a predetermined number of neurons and iteratively increasing the number of neurons until an optimized total number of neurons is reached, wherein the optimized total number of neurons is greater than the predetermined number of neurons;

Means for generating a second neural network based on the training and the optimized total number of neurons; and

Means for providing a spectral library based on the second neural network.

18. The apparatus according to claim 17, wherein the means for iteratively increasing the number of neurons until the optimized total number of neurons is reached comprises means for using a modified Levenberg-Marquardt method.

19. The apparatus according to claim 17, wherein the means for iteratively increasing the number of neurons comprises means for increasing the number of neurons in a hidden layer of the first neural network.

20. The apparatus according to claim 17, wherein the spectral library comprises a simulated spectrum, the apparatus further comprising:

Means for comparing the simulated spectrum to a sample spectrum.