Detailed description of the invention
Methods for accurate and fast neural network training for library-based CD metrology are described herein. In the following description, numerous specific details, such as examples of neural networks, are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent to one skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known processing operations, such as operations involved in fabricating grating structures, are not described in detail in order not to unnecessarily obscure embodiments of the present invention. Furthermore, it is to be understood that the various embodiments shown in the figures are illustrative representations and are not necessarily drawn to scale.
As semiconductor devices and their associated structures grow more complex, libraries for optical CD simulation face the challenge of achieving good accuracy for many recent applications. For example, a user may spend several weeks building different libraries and still have difficulty achieving both a good library-to-regression match and a good total measurement uncertainty (TMU) for the metrology application being introduced.
Aspects of the present invention may provide a high-accuracy library with a small library size and a fast solution. Good accuracy may thus be obtained without spending several weeks building many different libraries.
Many training methods of an iterative nature have been developed and compared for library generation. These include variants of the Levenberg-Marquardt algorithm, back-propagation, and the N2X algorithm. A problem with such methods is that they can be very time-consuming. If the number of neurons is guessed correctly, the algorithm converges, but only after a large number of iterations. If the number of neurons is too small, the algorithm fails to converge and stops only when the number of iterations reaches the maximum.
In one aspect of the present invention, an accurate neural network training method is provided. Fig. 1 depicts a flowchart 100 representing an exemplary sequence of operations for accurate neural network training for library-based CD metrology, in accordance with an embodiment of the present invention.
Referring to operation 102 of flowchart 100, the method includes optimizing a threshold for principal component analysis (PCA) of a spectral data set. In an embodiment, the optimization provides principal component (PC) values. In an embodiment, the optimization minimizes the error introduced by PCA. As described in more detail below, the spectral data set may be based on measured or simulated spectra obtained from diffraction measurements of a grating structure.
In accordance with an embodiment of the present invention, the PCA threshold is optimized automatically. The optimization may minimize the error that PCA introduces into the subsequent neural network training. For example, conventional approaches typically use a constant value for the PCA threshold, for which the error introduced by PCA is on the order of 10⁻⁵. In one embodiment, optimizing the PCA threshold includes determining the lowest level in the spectral domain. In one specific such embodiment, the error introduced by PCA is smaller than 10⁻⁵, e.g., on the order of 10⁻⁸ to 10⁻⁹.
In an embodiment, optimizing the PCA threshold includes determining a first PCA threshold. For example, a given PCA threshold, or fraction value, may be set at t = 10⁻⁵. PCA is then applied to the spectral data set. For example, PCA is applied to the spectral data set S to obtain the PC values P = T*S, where T is a matrix. The spectral error introduced by applying PCA is then calculated, e.g., ΔS = S − T′*P, where T′ is the transpose of T. The spectral error is then compared with a spectral noise level. In one embodiment, the spectral noise level is based on optical CD (OCD) hardware specification information. The hardware specification information may relate to hardware such as the system described below in association with Fig. 9.
When comparing the spectral error with the spectral noise level, the following criterion may be applied, where ε is a given or default spectral noise level: if ΔS < ε, output t; otherwise, set t = t/10 and repeat the optimization. Thus, in one embodiment, if the spectral error is less than the spectral noise level, the first PCA threshold is retained and used to set the PC values. In another embodiment, if the spectral error is greater than or equal to the spectral noise level, a second PCA threshold is determined, and the applying, calculating and comparing operations are repeated.
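The threshold search described above can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the method as implemented in the embodiments: spectra are stored one per row, PCA is computed by SVD of mean-centred data, a PC is kept when its share of the total variance is at least t (one possible reading of "threshold or fraction value"), and the largest absolute reconstruction error stands in for ΔS.

```python
import numpy as np

def optimize_pca_threshold(S, noise_level, t0=1e-5, t_min=1e-15):
    """Lower the PCA threshold t by 10x until the PCA error is below the noise level.

    S: (n_samples, n_wavelengths) array, one spectrum per row.
    noise_level: epsilon, e.g. taken from the OCD hardware specification.
    Assumption: a PC is kept when its share of the total variance is >= t.
    Returns the accepted threshold t and the number k of PCs it keeps.
    """
    Sc = S - S.mean(axis=0)                     # centre each wavelength
    _, s, Vt = np.linalg.svd(Sc, full_matrices=False)
    var_fraction = s**2 / np.sum(s**2)          # variance share of each PC
    t = t0
    while t >= t_min:
        k = max(1, int(np.sum(var_fraction >= t)))
        T = Vt[:k]                              # (k, n_wavelengths) PCA matrix
        P = Sc @ T.T                            # PC values (P = T*S, row-wise)
        dS = Sc - P @ T                         # spectral error introduced by PCA
        if np.max(np.abs(dS)) < noise_level:    # criterion: dS < epsilon -> output t
            return t, k
        t /= 10.0                               # otherwise t = t/10 and repeat
    raise ValueError("no PCA threshold reached the spectral noise level")
```

The lower bound t_min only keeps the loop finite; with a realistic noise level the search normally stops after a few divisions by 10.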
In an embodiment, optimizing the PCA training target includes using a Mueller-domain error tolerance. For example, in current techniques for library training, the error target of each PC may be set to 10⁻⁵. There is no clear reason for setting the error target to this value, and there is no relationship between the 10⁻⁵ PC error target and the associated spectral error. Furthermore, setting every PC to the same error target is not necessarily advantageous, since the PCs may contribute differently to the spectrum. In the following exemplary method, the Mueller-domain error tolerance is converted into an error tolerance for each PC domain, which is set as the PC training target.
Before training, the neural network training profiles are converted to Mueller elements (ME). The Mueller elements are normalized for each wavelength based on the training sample data. PCA is then performed on the normalized Mueller elements (NME) to obtain the PC signals used for training. Therefore, the Mueller element M_ij of the i-th sample at the j-th wavelength can be expressed as follows:
$$ M_{ij} = \mathrm{Std}_j \sum_{p=1}^{PC\#} PC_{ip}\, T_{pj} + \mathrm{Mean}_j $$
PC#: total number of PCs
Std_j, Mean_j: standard deviation and mean value at the j-th wavelength
T_pj: element (p, j) of the PCA transformation matrix T
For each sample i, PC_ip is always multiplied by the same factors, Std_j·T_pj, to form the Mueller elements. If the error tolerance of M_ij is set to 0.001, the p-th PC will have the following error budget:
$$ ET_p = \frac{0.001}{\max_j \left| \mathrm{Std}_j\, T_{pj} \right|} $$
where ET_p is the error tolerance of the p-th principal component. During training, each PC has its own training target, and the number of neurons in the network may be increased in order to meet the training error targets.
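The conversion from a Mueller-element tolerance to per-PC tolerances can be checked numerically with the sketch below. It assumes M_ij is rebuilt as Std_j·Σ_p PC_ip·T_pj + Mean_j, with T obtained from an SVD of the normalized Mueller elements (one training sample per row); the 0.001 tolerance is the value given in the text, and the function name is hypothetical.

```python
import numpy as np

def pc_error_budgets(M, n_pc, me_tolerance=1e-3):
    """Per-PC error tolerances ET_p derived from a Mueller-element tolerance.

    M: (n_samples, n_wavelengths) Mueller elements of the training samples.
    Assumes M_ij = Std_j * sum_p PC_ip * T_pj + Mean_j, with T from an SVD of
    the normalized Mueller elements (NME).
    """
    mean = M.mean(axis=0)
    std = M.std(axis=0)
    std[std == 0] = 1.0                          # avoid dividing by zero for flat wavelengths
    NME = (M - mean) / std                       # per-wavelength normalization
    _, _, Vt = np.linalg.svd(NME, full_matrices=False)
    T = Vt[:n_pc]                                # T[p, j]
    PC = NME @ T.T                               # PC[i, p], the training signals
    # An error dPC in PC p moves M_ij by dPC * Std_j * T_pj, so the largest
    # allowed dPC keeps |dPC * Std_j * T_pj| <= tolerance at every wavelength j.
    ET = me_tolerance / np.max(np.abs(T * std), axis=1)
    return PC, T, ET
```

PCs whose loadings are large at high-variance wavelengths receive tighter budgets, which is the motivation for not giving every PC the same 10⁻⁵ target.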
Referring to operation 104 of flowchart 100, the method also includes estimating a training target for one or more neural networks.
In accordance with an embodiment of the present invention, a more accurate training target is used for each neural network. In one such embodiment, PCA and normalization are taken into account in estimating the training target for each neural network. In an embodiment, the training target is estimated based on the PCA transformation and the hardware signal noise level.
Referring to operation 106 of flowchart 100, the method also includes training the one or more neural networks based on the training target and on the PC values.
In accordance with an embodiment of the present invention, over-training detection and control are performed. The method may be used to detect and control over-training while the number of neurons is increased and during Levenberg-Marquardt (LM) iterations. In an embodiment, the training is also based on the training target of operation 104 above. It is to be understood that the training may be based on more than one optimized PC value, and possibly on many optimized PC values. A combination of non-optimized and optimized PC values may also be used.
Referring to operation 108 of flowchart 100, the method also includes providing a spectral library based on the one or more trained neural networks. In an embodiment, the spectral library is a high-accuracy spectral library.
In accordance with an embodiment of the present invention, a high-accuracy library with good generalization capability is provided. In one such embodiment, a method that dynamically increases the number of neurons, based on individual error targets for each training output domain and by checking both the training-set error and the validation-set error, has been developed to build a library based on neural networks with high accuracy and good generalization capability. The weights from the previous neuron-count iteration are used as the initial weights of the current neural network structure to accelerate training.
In an exemplary embodiment, the method has been developed to improve neural network training in several different areas of the overall library training approach for CD metrology. In testing, a substantial improvement in library-to-regression matching has been obtained. For example, the library error range (e.g., the 3σ error range) may be more than 10 times smaller than that of a library produced by previous training methods. With a much smaller training set, the error range may approach the precision level. The dynamically increasing sample library method may provide improved convergence behavior compared with conventional methods, and may require only a smaller sample set to build an improved library. The dynamically increasing sample library method offers high accuracy approaching the precision level, a smaller library size, and a fast solution, so that the user obtains a very good library as the final solution.
Fig. 2A is a plot illustrating library-to-regression matching with the dynamically increasing sample library method turned OFF. Fig. 2B is a plot illustrating library-to-regression matching with the dynamically increasing sample library method turned ON, in accordance with an embodiment of the present invention. Referring to Figs. 2A and 2B, with the dynamically increasing sample library method ON, far fewer samples are needed (e.g., 8000 versus 30000) to obtain a good match.
Fig. 2C includes a pair of plots 220 and 230 comparing the 3σ (sigma) error range versus library sample size, in accordance with an embodiment of the present invention. Referring to Fig. 2C, the 3σ error range is plotted as a function of sample size for the conventional approach with the dynamically increasing sample library method OFF (5 hours) and for an embodiment of the present invention with the method ON (3 hours). Thus, using the dynamically increasing sample library method considerably reduces the time needed to achieve the desired result.
In an embodiment, referring again to flowchart 100, the high-accuracy spectral library includes simulated spectra, and the method described in association with flowchart 100 further includes comparing a simulated spectrum with a sample spectrum. In one embodiment, the simulated spectra are obtained from a set of spatial harmonic orders. In one embodiment, the sample spectrum is collected from a structure such as, but not limited to, a physical reference sample or a physical production sample. In an embodiment, the comparison is performed by using a regression calculation. In one embodiment, one or more non-differential signals are used simultaneously in the calculation. The one or more non-differential signals may be, for example, but are not limited to, azimuth angle, angle of incidence, polarizer/analyzer angle, or additional measurement targets.
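One simple way to carry out such a comparison is sketched below: pick the closest library entry by sum of squared differences, then refine the profile parameters by Gauss-Newton least-squares regression against a forward model. This is an illustration under assumptions, not the regression used in the embodiments; `model` is a hypothetical stand-in for a library interpolator or a trained network, the Jacobian is approximated by forward differences, and the non-differential signals are not modelled.

```python
import numpy as np

def library_regression(measured, model, lib_params, lib_spectra, n_iter=20, h=1e-6):
    """Fit profile parameters p so that model(p) matches the measured spectrum.

    model(p) -> simulated spectrum (hypothetical forward model).
    lib_params: (n_entries, n_params); lib_spectra: (n_entries, n_wavelengths).
    """
    # Step 1: start from the closest pre-computed library entry.
    sse = np.sum((lib_spectra - measured) ** 2, axis=1)
    p = np.array(lib_params[int(np.argmin(sse))], dtype=float)
    # Step 2: Gauss-Newton refinement with a forward-difference Jacobian.
    for _ in range(n_iter):
        f0 = model(p)
        r = measured - f0                              # residual spectrum
        J = np.empty((r.size, p.size))
        for k in range(p.size):
            dp = np.zeros_like(p)
            dp[k] = h
            J[:, k] = (model(p + dp) - f0) / h
        step, *_ = np.linalg.lstsq(J, r, rcond=None)   # least-squares update
        p = p + step
        if np.linalg.norm(step) < 1e-10:               # converged
            break
    return p
```

Starting from the best library match keeps the regression near the global minimum; the library alone would only return the nearest grid point.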
In another aspect of the present invention, a fast neural network training method is provided. Fig. 3 depicts a flowchart 300 representing an exemplary sequence of operations for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention. Figs. 4A-4F further illustrate aspects of the method depicted in Fig. 3.
Referring to operation 302 of flowchart 300, the method includes providing a training target for a first neural network. Referring to operation 304 of flowchart 300, the first neural network is trained, the training including starting with a predetermined number of neurons and iteratively increasing the number of neurons until an optimized total number of neurons is reached. Referring to operation 306 of flowchart 300, a second neural network is generated based on the training and on the optimized total number of neurons. It is to be understood that many such iterations may be performed to reach the optimized total number of neurons. Referring to operation 308 of flowchart 300, a spectral library is provided based on the second neural network.
In an embodiment, with respect to "fast" training, a feed-forward neural network is used to implement a nonlinear mapping function F such that y ≈ F(p). The nonlinear mapping function F serves as a meta-model that determines the spectrum or spectra associated with a given profile. This determination can be performed relatively quickly in terms of computation time and cost. In one embodiment, the function is determined during training with a set of training data (p_i, y_i). Neural networks can be used to approximate such arbitrary nonlinear functions. For example, Fig. 4A shows a two-hidden-layer neural network in accordance with an embodiment of the present invention.
Referring to Fig. 4A, the true mapping function F_true(p) from input p to output y can be approximated mathematically by the two-hidden-layer neural network 400, according to equation (1):
$$ y = F_{\mathrm{true}}(p) \approx F(p) = v^{T} G_1\left[ h\, G_2(W p + d) + e \right] + q \qquad (1) $$
where G1 and G2 are nonlinear functions. Given a set of training data (p_i, y_i), finding the set of W, h, d, e, v^T and q such that F(p) best represents F_true(p) is referred to as training. The training can be regarded as solving an optimization problem that minimizes the mean squared error over the N training samples, according to equation (2):
$$ \min_{W,\,h,\,d,\,e,\,v,\,q} \; \frac{1}{N} \sum_{i=1}^{N} \left\lVert y_i - F(p_i) \right\rVert^2 \qquad (2) $$
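Equations (1) and (2) can be sketched directly in NumPy as follows. Using tanh for both G1 and G2 is an assumption (the text only states that they are nonlinear), and the array shapes in the docstring are illustrative.

```python
import numpy as np

def forward(p, W, d, h, e, v, q, G1=np.tanh, G2=np.tanh):
    """Equation (1): F(p) = v^T G1[h G2(W p + d) + e] + q.

    Shapes: p (n_in,), W (n2, n_in), d (n2,), h (n1, n2), e (n1,),
    v (n1, n_out), q (n_out,).
    """
    z2 = G2(W @ p + d)        # first hidden layer, fed by the profile p
    z1 = G1(h @ z2 + e)       # second hidden layer
    return v.T @ z1 + q       # linear output layer

def mse(params, P, Y):
    """Equation (2): mean squared error over the training pairs (p_i, y_i)."""
    errs = [np.sum((y - forward(p, *params)) ** 2) for p, y in zip(P, Y)]
    return float(np.mean(errs))
```

Training, in the sense of equation (2), means searching for the tuple `params = (W, d, h, e, v, q)` that minimizes `mse`.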
Two questions follow: (1) how many neurons should be used in the hidden layers, and (2) how should the neural network be trained to achieve a specified accuracy?
With regard to determining the number of neurons, there are two approaches to the first question above. The first approach is heuristic. The heuristic approach sets two numbers, 18 and 30, as the minimum and maximum numbers of neurons to be used. When the number of principal components of the training data is less than or equal to 25, 18 neurons are used. When the number of principal components is greater than or equal to 80, 30 neurons are used. When the number of principal components lies in between, linear interpolation is used to determine a suitable number of neurons. One potential problem with the heuristic approach is that the number of principal components may be unrelated to the number of neurons that should be used.
The second approach to the first question above combines a neuron-count search with any of various training methods. First, the maximum number of neurons that can be used is estimated, e.g., denoted as Mmax. The following iterative process is then used to determine the number of neurons and to train the corresponding network: set m = 10; then (1) train a network with m neurons; if the method converges, stop; otherwise, (2) if (m + 5) is greater than Mmax, stop; otherwise, (3) increase m by 5 and go to operation (1). However, this approach may be very time-consuming.
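The second approach is a simple outer search loop, sketched below. `train` is a hypothetical callback standing in for any of the training methods; it is assumed to return True when training a network with m neurons converges.

```python
def search_neuron_count(train, m_max, m_start=10, step=5):
    """Return the first neuron count m whose training converges, or None."""
    m = m_start
    while True:
        if train(m):              # (1) train with m neurons; stop if it converges
            return m
        if m + step > m_max:      # (2) stop when the next size would exceed Mmax
            return None
        m += step                 # (3) add 5 neurons and retrain from scratch
```

Each candidate size is trained from scratch, which is why this approach can be slow when many sizes are tried.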
Finding the optimal number of neurons in a feed-forward neural network to fit a nonlinear function is an NP-complete problem (i.e., a class of problems for which no solution with polynomial time complexity is known). Thus, in accordance with an embodiment of the present invention, and as described in greater detail below, a fast optimization method includes gradually increasing the number of neurons in the network during training until the optimal number of neurons providing a specified accuracy is determined.
With respect to the algorithm description, in an embodiment, the incremental training algorithm is a hybrid method that trains the neural network using one underlying algorithm. In one embodiment, a modified Levenberg-Marquardt algorithm is used. The original Levenberg-Marquardt algorithm for this problem is briefly described as follows. Let J denote the Jacobian of the network outputs with respect to w, where w denotes the elements of W, h, d, e, v^T and q of the neural network, and let E denote the vector of training errors y_i − F(p_i). J is evaluated at iteration i, and δw is found such that:
$$ (J^{T} J + \mu I)\, \delta w = J^{T} E $$
where I is the identity matrix and μ is a scaling constant adjusted at each iteration. w is updated using w_{i+1} = w_i + δw, and the cost is evaluated with the new weights. If the cost is less than a specified value (e.g., 10⁻⁵), the algorithm stops. Otherwise, it continues with the next iteration until the number of iterations exceeds a predetermined number (e.g., 200). In other words, the algorithm stops iterating in one of two situations: either the cost function is less than the specified value, or the number of iterations exceeds the maximum. One observation is that the Levenberg-Marquardt algorithm is very effective at reducing the cost (mean squared value) during the first few tens of iterations. After that, the rate of reduction slows down considerably.
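A minimal Levenberg-Marquardt loop matching the update above might look as follows. It assumes E = y − F(w) and J = ∂F/∂w (so that w + δw reduces the error), uses the mean squared residual as the cost, and divides or multiplies μ by 10 after each accepted or rejected step; that μ schedule is a common convention, not one stated in the text. The test fits a two-parameter exponential rather than a network to keep it short.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, w, mu=1e-2, cost_tol=1e-5, max_iter=200):
    """residual(w) -> E = y - F(w); jacobian(w) -> J = dF/dw (one row per residual)."""
    E = residual(w)
    cost = np.mean(E ** 2)
    for _ in range(max_iter):             # stop 2: iteration limit (e.g. 200)
        if cost < cost_tol:               # stop 1: cost below the specified value
            break
        J = jacobian(w)
        A = J.T @ J + mu * np.eye(w.size)
        dw = np.linalg.solve(A, J.T @ E)  # (J^T J + mu I) dw = J^T E
        E_new = residual(w + dw)
        new_cost = np.mean(E_new ** 2)
        if new_cost < cost:               # accept: w_{i+1} = w_i + dw, trust the model more
            w, E, cost = w + dw, E_new, new_cost
            mu /= 10.0
        else:                             # reject: move towards gradient descent
            mu *= 10.0
    return w, cost
```

For the network of equation (1), w would be the flattened W, h, d, e, v and q, and J would have one row per output of every training sample.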
Accordingly, in an embodiment, the above Levenberg-Marquardt algorithm is modified for this application at its stopping criterion. That is, one more criterion is added: stop if the cost has not decreased by x% compared with the cost "r" consecutive iterations earlier. This additional criterion is implemented to detect under-fitting. The incremental training algorithm is then proposed as follows. A training set (p_i, y_i) is provided as input, and the number n of neurons and the weights of the two-hidden-layer network are provided as output. The following operations are performed: (1) estimate the maximum number of neurons, e.g., denoted as Nmax; (2) set the number of neurons n to 4; (3) initialize the weights w using the Nguyen-Widrow algorithm; (4) while (n < Nmax): (a) train the network using the modified Levenberg-Marquardt method; (b) if the cost is less than the specified value, stop; (c) if the cost has not decreased by x% compared with the cost "r" consecutive trials earlier, stop; (d) optionally, using validation data: if the error on the validation data increases for t consecutive trials, stop; (e) set n to n + 2 and construct a new neural network by using the trained weights of the old neural network, assign random numbers to the new weights, and repeat operations 4a-4e.
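The control flow of operations (2) through (4e) can be sketched as below. `train_stage` is a hypothetical callback that trains a network with n neurons per hidden layer using the modified Levenberg-Marquardt method, starting from the previous weights, and returns the weights, the training cost and the validation cost. `grow_layer` illustrates operation 4e for one hidden layer: trained weights are kept and the new neurons receive small uniform random weights (the ±0.1 range is an assumption; biases are omitted). Nguyen-Widrow initialization (operation 3) is not shown. `stagnated` is the x%/r criterion, which the modified Levenberg-Marquardt method can also apply to its own iteration history.

```python
import numpy as np

def stagnated(costs, x=10.0, r=4):
    """True if the latest cost is not at least x% below the cost r entries earlier."""
    if len(costs) <= r:
        return False
    return costs[-1] > (1.0 - x / 100.0) * costs[-1 - r]

def grow_layer(W_in, W_out, k, rng, scale=0.1):
    """Operation 4e for one hidden layer: keep trained weights, add k new neurons.

    W_in: (n_hidden, n_prev) incoming weights; W_out: (n_next, n_hidden) outgoing.
    """
    new_in = rng.uniform(-scale, scale, size=(k, W_in.shape[1]))
    new_out = rng.uniform(-scale, scale, size=(W_out.shape[0], k))
    return np.vstack([W_in, new_in]), np.hstack([W_out, new_out])

def incremental_training(train_stage, n_max, n_start=4, step=2,
                         target=1e-5, x=10.0, r=4, t=3):
    """Outer loop of the incremental algorithm; returns (n, weights) of the last stage."""
    n, weights = n_start, None
    stage_costs, val_costs = [], []
    while True:
        weights, cost, val_cost = train_stage(n, weights)       # (4a)
        stage_costs.append(cost)
        val_costs.append(val_cost)
        if cost < target:                                        # (4b) target met
            break
        if stagnated(stage_costs, x, r):                         # (4c) no x% gain over r trials
            break
        if len(val_costs) > t and all(val_costs[-k] > val_costs[-k - 1]
                                      for k in range(1, t + 1)):  # (4d) validation error rising
            break
        if n + step >= n_max:                                    # keep n below Nmax
            break
        n += step                                                # (4e) add two neurons
    return n, weights
```

Warm-starting each stage from the previous weights is what keeps the total cost low: most iterations run on small networks, and the expensive final stage starts close to a good solution.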
It is to be understood that the modification of Levenberg-Marquardt may be important for the fast training method. The modification allows the algorithm to stop when the rate of decrease is small and to increase the number of neurons instead. That is, the modification is equivalent to detecting under-fitting. In practice, in an embodiment, x = 10 and r = 4 have been found to be a good choice. Besides Levenberg-Marquardt, other algorithms may also be used, provided that a suitable modification is made. With respect to operation 4e, such an operation enables the search to escape the local minima frequently encountered by gradient-based optimization algorithms such as Levenberg-Marquardt. The weights already trained by the Levenberg-Marquardt method provide a good set of initial values. Operations 4c and 4d above serve to prevent over-training with a large number of neurons.
In an embodiment, the incremental algorithm may be further extended as follows. For simplicity, it is assumed that the raw data, consisting of the Mueller elements of different profiles rather than principal components, are trained. The extension is described by the following set of operations: (1) given a set of profiles, referred to as Np, a neural network is trained using the new algorithm above to represent the nonlinear mapping from the profiles to a Mueller element; (2) an Np += δNp method is used to train a network defined over the Np profiles: (a) if the Np += δNp method stagnates, the number of neurons is increased to represent the nonlinear mapping more accurately, and (b) if the Np += δNp method converges, the current network is used; (3) the accuracy of the current neural network model in representing the nonlinear mapping is estimated: (a) if the required accuracy is met, the method stops here, and (b) otherwise, the number of profiles is increased by δNp and the method returns to operation 2.
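The outer loop of this extension, which grows the training set by δNp until the mapping is accurate enough, can be sketched as below. `train` and `evaluate` are hypothetical callbacks: `train` retrains the network on the first n_p profiles, warm-starting from the previous network and adding neurons internally if it stagnates (operation 2a), and `evaluate` returns the model error on held-out profiles. The `n_max` profile budget is an added assumption so that the loop always terminates.

```python
def dynamic_sample_library(train, evaluate, n_p, delta_np, n_max, tol):
    """Grow the training set until evaluate(net) <= tol or n_max profiles are used."""
    net = train(n_p, None)              # (1) initial network on Np profiles
    while True:
        err = evaluate(net)             # (3) accuracy of the current model
        if err <= tol:
            return net, n_p             # (3a) required accuracy met
        if n_p + delta_np > n_max:
            return net, n_p             # profile budget exhausted (assumption)
        n_p += delta_np                 # (3b) add delta_np profiles
        net = train(n_p, net)           # (2) retrain, warm-starting from net
```

Because each retraining starts from the previous network, adding profiles in increments is cheaper than training a large sample set from scratch.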
Fig. 4B shows a Matlab plot 410 of a true response surface used to test fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention. Referring to Fig. 4B, a function of two unknowns is used, with a Halton quasi-random number generator producing 1000 training samples and 100 validation samples.
Fig. 4C is a plot 420 comparing the convergence histories of the incremental algorithm and the single-stage Levenberg-Marquardt algorithm, in accordance with an embodiment of the present invention. Referring to Fig. 4C, the convergence history of the incremental training algorithm is compared with that of the original Levenberg-Marquardt algorithm (labeled the single-stage method, because it performs single-stage training with an estimated number of neurons, e.g., 12 neurons in each hidden layer). The incremental training in plot 420 exhibits spikes, partly because of operation 4e, but the algorithm is very effective at bringing the cost back down to the previous level. The single-stage method stagnates and does not converge within 200 iterations. The computational complexity of one Levenberg-Marquardt iteration is O(n⁶), where n is the number of neurons in the two-hidden-layer network. For the incremental algorithm, the execution time is therefore dominated by the training at the largest number of neurons. Thus, comparing the number of iterations in the final stage of incremental training with the number of iterations in the single-stage method shows that the incremental algorithm can perform an order of magnitude better than the single-stage method.
Fig. 4D shows a plot 430 of a performance comparison for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention. Referring to Fig. 4D, a two-dimensional test example has four degrees of freedom: one top dimension, one bottom dimension, and two heights. The ranges of these parameters are also shown. Automatic truncation order is used for the RCWA simulation. There are 251 wavelengths. The dynamically increasing sample library method is enabled, and more than 14000 profiles are generated to build the library.
Fig. 4E includes a pair of plots 440 and 450 comparing results for a first spectrum for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention. Referring to Fig. 4E, for clarity, only the cost value at the end of each stage is plotted. Using incremental training, this application converges after 127 total iterations, with a final number of neurons of 22. It is to be noted that the final stage, which dominates the computation, takes only 7 iterations. Using the training algorithm with the dynamically increasing sample library, the method converges after 712 iterations, with a final number of neurons of 25. It is to be noted that this method uses 111 iterations in its final stage.
Fig. 4F includes a pair of plots 460 and 470 comparing results for a second spectrum for fast neural network training for library-based CD metrology, in accordance with an embodiment of the present invention. Referring to Fig. 4F, results for another spectrum in the same test example as Fig. 4E are shown. Using incremental training, the method requires 240 total iterations for a final number of neurons of 40, of which the final stage uses only 9 iterations. Using the conventional training method, the method has not converged after 1200 iterations.
More generally, in accordance with at least some embodiments of the present invention, a new training method for neural networks has been developed. The optimized number of hidden-layer neurons is determined during execution of the algorithm. The algorithm may be an order of magnitude faster than single-stage Levenberg-Marquardt, especially when the correct number of neurons can be estimated, and may be orders of magnitude faster than conventional methods. In one embodiment, the optimized number of neurons in the hidden layers is determined. In one embodiment, the above method trains the network very quickly.
In an embodiment, referring again to flowchart 300, the generated spectral library includes simulated spectra, and the method described in association with flowchart 300 further includes comparing a simulated spectrum with a sample spectrum. In one embodiment, the simulated spectra are obtained from a set of spatial harmonic orders. In one embodiment, the sample spectrum is collected from a structure such as, but not limited to, a physical reference sample or a physical production sample. In an embodiment, the comparison is performed by using a regression calculation. In one embodiment, one or more non-differential signals are used simultaneously in the calculation. The one or more non-differential signals may be, for example, but are not limited to, azimuth angle, angle of incidence, polarizer/analyzer angle, or additional measurement targets.
Any suitable neural network may be used to perform one or more of the methods described in association with flowcharts 100 and 300. As an example, Fig. 5 depicts selected elements of an exemplary neural network for generating a library of spectral information, in accordance with an embodiment of the present invention.
Referring to Fig. 5, a neural network 500 uses a back-propagation algorithm. Neural network 500 includes an input layer 502, an output layer 504, and a hidden layer 506 between input layer 502 and output layer 504. Input layer 502 and hidden layer 506 are connected by links 508. Hidden layer 506 and output layer 504 are connected by links 510. It is to be understood, however, that neural network 500 may include any number of layers connected in various configurations commonly known in the neural network art.
As depicted in Fig. 5, input layer 502 includes one or more input nodes 512. In this exemplary embodiment, the input nodes 512 in input layer 502 correspond to the profile parameters of a profile model that are input into neural network 500. Thus, the number of input nodes 512 corresponds to the number of profile parameters used to characterize the profile model. For example, if the profile model is characterized using two profile parameters (e.g., top and bottom critical dimensions), input layer 502 includes two input nodes 512, where a first input node 512 corresponds to the first profile parameter (e.g., the top critical dimension) and a second input node 512 corresponds to the second profile parameter (e.g., the bottom critical dimension).
In neural network 500, output layer 504 includes one or more output nodes 514. In this exemplary embodiment, each output node 514 is a linear function. It is to be understood, however, that each output node 514 may be any of various types of functions. Additionally, in this exemplary embodiment, the output nodes 514 in output layer 504 correspond to dimensions of the simulated diffraction signal output from neural network 500. Thus, the number of output nodes 514 corresponds to the number of dimensions used to characterize the simulated diffraction signal. For example, if the simulated diffraction signal is characterized using five dimensions corresponding, for example, to five different wavelengths, output layer 504 includes five output nodes 514, where a first output node 514 corresponds to the first dimension (e.g., the first wavelength), a second output node 514 corresponds to the second dimension (e.g., the second wavelength), and so on. Additionally, to increase performance, neural network 500 may be divided into a plurality of sub-networks based on separate components of the simulated diffraction signal and/or on portions of the dimensions of the simulated diffraction signal.
In neural network 500, hidden layer 506 includes one or more hidden nodes 516. In this exemplary embodiment, each hidden node 516 is a sigmoidal transfer function or a radial basis function. It is to be understood, however, that each hidden node 516 may be any of various types of functions. Additionally, in this exemplary embodiment, the number of hidden nodes 516 is determined based on the number of output nodes 514. More specifically, the number (m) of hidden nodes 516 is related to the number (n) of output nodes 514 by a predetermined ratio (r = m/n). For example, when r = 10, there are 10 hidden nodes 516 for each output node 514. It is to be understood, however, that the predetermined ratio may instead be the ratio of the number of output nodes 514 to the number of hidden nodes 516 (i.e., r = n/m). It is further to be understood that the number of hidden nodes 516 in neural network 500 may be adjusted after the initial number of hidden nodes 516 has been determined based on the predetermined ratio. Furthermore, the number of hidden nodes 516 in neural network 500 may be determined based on experience and/or experimentation rather than on a predetermined ratio.
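The sizing rules for neural network 500 can be sketched as follows, assuming r = m/n = 10, sigmoid hidden nodes 516 and linear output nodes 514 as in this exemplary embodiment. The function names are hypothetical and the random weights are placeholders; in practice they would be set by back-propagation training.

```python
import numpy as np

def build_network_500(n_profile_params, n_outputs, r=10, seed=0):
    """Inputs = profile parameters, outputs = signal dimensions, hidden = r * outputs."""
    m = r * n_outputs                                   # hidden nodes 516 (r = m/n)
    rng = np.random.default_rng(seed)
    return {
        "W_hidden": rng.normal(scale=0.1, size=(m, n_profile_params)),  # links 508
        "b_hidden": np.zeros(m),
        "W_out": rng.normal(scale=0.1, size=(n_outputs, m)),            # links 510
        "b_out": np.zeros(n_outputs),
    }

def simulate_signal(net, profile):
    """Sigmoid hidden nodes 516 followed by linear output nodes 514."""
    hidden = 1.0 / (1.0 + np.exp(-(net["W_hidden"] @ profile + net["b_hidden"])))
    return net["W_out"] @ hidden + net["b_out"]
```

With two profile parameters (top and bottom CD) and five wavelengths, this gives 2 input nodes, 50 hidden nodes and 5 output nodes.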
In an embodiment, the library described above may include one or more parameters of an individual feature of a two-dimensional or three-dimensional grating structure. The term "three-dimensional grating structure" is used herein to refer to a structure having an x-y profile that varies in two dimensions in addition to a depth in the z-direction. For example, Fig. 6A depicts a periodic grating 600 having a profile that varies in the x-y plane, in accordance with an embodiment of the present invention. The profile of the periodic grating varies in the z-direction as a function of the x-y profile.
The term "two-dimensional grating structure" is used herein to refer to a structure having an x-y profile that varies in only one dimension in addition to a depth in the z-direction. For example, Fig. 6B depicts a periodic grating 602 having a profile that varies in the x-direction but not in the y-direction, in accordance with an embodiment of the present invention. The profile of the periodic grating varies in the z-direction as a function of the x profile. It is to be understood that the lack of variation in the y-direction for a two-dimensional structure need not be infinite, but any breaks in the pattern are considered long range, e.g., any breaks in the pattern in the y-direction are spaced substantially farther apart than the breaks in the pattern in the x-direction.
In an embodiment in which the individual feature belongs to a two-dimensional or three-dimensional grating structure, the first parameter is, for example, but not limited to, the width, height, length, top corner rounding, bottom footing, or sidewall angle of the individual feature. Optical properties of materials in the wafer structure, such as the index of refraction and the extinction coefficient (n and k), may also be modeled for use in optical metrology.
With regard to use of the spectral library provided by the methods of flowcharts 100 and 300, in an embodiment, such a method includes altering a parameter of a process tool based on agreement or disagreement with the simulation parameters in the spectral library. Altering the parameter of the process tool may be performed by using a technique such as, but not limited to, a feedback technique, a feed-forward technique, or an in-situ control technique. In an embodiment, the spectral library may be used to set up device structure profiles and geometries more accurately in a CD metrology tool recipe. In an embodiment, the spectral library is used as part of CD metrology tool validation, diagnostics and characterization.
As described above, use of the spectral library can include comparing a simulated spectrum with a sample spectrum. In one embodiment, a set of diffraction orders is simulated to represent the diffraction signals generated by an ellipsometric optical metrology system from a two- or three-dimensional grating structure. Such an optical metrology system is described below with reference to Fig. 9. However, it is to be understood that the same concepts and principles apply equally to other optical metrology systems, such as reflectometric systems. The represented diffraction signals may account for features of the two- or three-dimensional grating structure such as, but not limited to, profile, dimensions, or material composition.
Calculated simulated diffraction orders can be indicative of the profile parameters of a patterned film, such as a patterned semiconductor film or photoresist layer, and can be used to calibrate automated processes or equipment control. Fig. 7 depicts a flowchart 700 representing an exemplary series of operations for determining and utilizing structural parameters, such as profile parameters, for automated process and equipment control, in accordance with an embodiment of the present invention.
Referring to operation 702 of flowchart 700, a spectral library or a trained machine learning system (MLS) is developed to extract profile parameters from a set of measured diffraction signals. In operation 704, at least one profile parameter of a structure is determined using the spectral library or the trained MLS. In operation 706, the at least one profile parameter is transmitted to a fabrication cluster configured to perform a processing step, where the processing step may be executed in the semiconductor manufacturing process flow either before or after measurement operation 704 is performed. In operation 708, the at least one transmitted profile parameter is used to modify a process variable or equipment setting for the processing step performed by the fabrication cluster.
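Operation 708 can be pictured as a simple run-to-run feedback correction. The sketch below is illustrative only: the single exposure-dose setting, the class `FabClusterSettings`, and the linear CD-versus-dose sensitivity are hypothetical assumptions, not part of the method described above.

```python
from dataclasses import dataclass


@dataclass
class FabClusterSettings:
    """Hypothetical process setting of a fabrication cluster."""
    exposure_dose_mj: float  # exposure dose in mJ/cm^2 (illustrative only)


def adjust_dose(settings: FabClusterSettings,
                measured_cd_nm: float,
                target_cd_nm: float,
                sensitivity_nm_per_mj: float) -> FabClusterSettings:
    """Proportional feedback: shift the dose so the predicted CD change
    cancels the measured CD error.

    Assumes a linear sensitivity dCD/dDose (nm per mJ/cm^2), which is a
    hypothetical model chosen for this sketch.
    """
    if sensitivity_nm_per_mj == 0:
        raise ValueError("sensitivity must be non-zero")
    cd_error = measured_cd_nm - target_cd_nm
    correction = -cd_error / sensitivity_nm_per_mj
    return FabClusterSettings(settings.exposure_dose_mj + correction)
```

For example, with an assumed sensitivity of -2 nm per mJ/cm^2, a measured CD of 52 nm against a 50 nm target raises the dose from 30 to 31 mJ/cm^2. A real controller would typically add filtering and limits on the size of each correction.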
For a more detailed description of machine learning systems and algorithms, see U.S. Patent No. 7,831,528, entitled OPTICAL METROLOGY OF STRUCTURES FORMED ON SEMICONDUCTOR WAFERS USING MACHINE LEARNING SYSTEMS, filed on June 27, 2003, which is incorporated herein by reference in its entirety. For a description of diffraction order optimization for two-dimensional repeating structures, see U.S. Patent No. 7,428,060, entitled OPTIMIZATION OF DIFFRACTION ORDER SELECTION FOR TWO-DIMENSIONAL STRUCTURES, filed on March 24, 2006, which is incorporated herein by reference in its entirety.
Fig. 8 is a block diagram of a system 800 for determining and utilizing structural parameters, such as profile parameters, for automated process and equipment control, in accordance with an embodiment of the present invention. System 800 includes a first fabrication cluster 802 and an optical metrology system 804. System 800 also includes a second fabrication cluster 806. Although the second fabrication cluster 806 is depicted in Fig. 8 as following the first fabrication cluster 802, it should be recognized that in system 800 (and, for example, in the process flow) the second fabrication cluster 806 can be located before the first fabrication cluster 802.
A photolithographic process, such as exposing and developing a photoresist layer applied to a wafer, can be performed using the first fabrication cluster 802. In one exemplary embodiment, optical metrology system 804 includes an optical metrology tool 808 and a processor 810. Optical metrology tool 808 is configured to measure a diffraction signal obtained from the structure. If the measured diffraction signal matches a simulated diffraction signal, the one or more values of the profile parameters are determined to be the one or more values of the profile parameters associated with that simulated diffraction signal.
In one exemplary embodiment, optical metrology system 804 can also include a spectral library 812 having a plurality of simulated diffraction signals and a plurality of values of one or more profile parameters associated with those simulated diffraction signals. As described above, the spectral library can be generated in advance. Metrology processor 810 can compare a measured diffraction signal obtained from a structure with the plurality of simulated diffraction signals in the spectral library. When a matching simulated diffraction signal is found, the one or more values of the profile parameters associated with the matching simulated diffraction signal in the spectral library are assumed to be the one or more values of the profile parameters used in the wafer application to fabricate the structure.
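The library comparison performed by metrology processor 810 can be sketched as a nearest-match search. This is a minimal sketch under stated assumptions: the RMS difference metric, the `tolerance` acceptance criterion, and the in-memory list layout of library entries are illustrative choices, not the matching criterion of the described system.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A library entry pairs a simulated diffraction signal with the profile
# parameter values that produced it (illustrative layout).
LibraryEntry = Tuple[Sequence[float], Dict[str, float]]


def rms_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference between two equally sampled signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same number of samples")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def best_match(measured: Sequence[float],
               library: List[LibraryEntry],
               tolerance: float) -> Dict[str, float]:
    """Return the profile parameters of the closest simulated signal.

    Raises LookupError if even the closest signal differs by more than
    `tolerance` (a hypothetical matching criterion).
    """
    signal, params = min(library, key=lambda entry: rms_difference(measured, entry[0]))
    if rms_difference(measured, signal) > tolerance:
        raise LookupError("no library signal matches within tolerance")
    return params
```

For instance, with two entries whose signals are `[0.10, 0.20, 0.30]` and `[0.12, 0.25, 0.33]`, a measured signal `[0.121, 0.249, 0.331]` returns the profile parameters stored with the second entry.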
System 800 also includes a metrology processor 816. In one exemplary embodiment, processor 810 can transmit the one or more values of the one or more profile parameters to metrology processor 816. Metrology processor 816 can then adjust one or more process parameters or equipment settings of the first fabrication cluster 802 based on the one or more values of the one or more profile parameters determined using optical metrology system 804. Metrology processor 816 can also adjust one or more process parameters or equipment settings of the second fabrication cluster 806 based on the one or more values of the one or more profile parameters determined using optical metrology system 804. As noted above, fabrication cluster 806 can process the wafer before or after fabrication cluster 802. In another exemplary embodiment, processor 810 is configured to train machine learning system 814 using the set of measured diffraction signals as inputs to machine learning system 814 and the profile parameters as the expected outputs of machine learning system 814.
Fig. 9 is an architectural diagram illustrating the use of optical metrology to determine the profiles of structures on a semiconductor wafer, in accordance with an embodiment of the present invention. Optical metrology system 900 includes a metrology beam source 902 projecting a metrology beam 904 at the target structure 906 of a wafer 908. The metrology beam 904 is projected at an incidence angle θ toward the target structure 906. A metrology beam receiver 912 measures the diffracted beam 910. The diffracted beam data 914 is transmitted to a profile application server 916. The profile application server 916 compares the measured diffracted beam data 914 against a spectral library 918 of simulated diffracted beam data representing varying combinations of critical dimensions of the target structure and resolution.
In accordance with an embodiment of the present invention, at least a portion of the simulated diffracted beam data is based on a difference determined for two or more azimuth angles. In accordance with another embodiment of the present invention, at least a portion of the simulated diffracted beam data is based on a difference determined for two or more angles of incidence. In one exemplary embodiment, the instance in spectral library 918 that best matches the measured diffracted beam data 914 is selected. It is to be understood that although a spectral library of diffraction spectra or signals and associated hypothetical profiles is frequently used to illustrate concepts and principles, the present invention applies equally to a data space comprising simulated diffraction signals and associated sets of profile parameters, such as in regression, neural network, and similar methods used for profile extraction. The hypothetical profile and associated critical dimensions of the selected spectral library 918 instance are assumed to correspond to the actual cross-sectional profile and critical dimensions of the features of the target structure 906. Optical metrology system 900 may use a reflectometer, an ellipsometer, or another optical metrology device to measure the diffracted beam or signal.
The set of profile models stored in spectral library 918 can be generated by characterizing a profile model with a set of profile parameters and then varying that set of profile parameters to generate profile models of varying shapes and dimensions. The process of characterizing a profile model with a set of profile parameters is referred to as parameterization. For example, assume that a profile model can be characterized by profile parameters h1 and w1 that define its height and width, respectively. Additional shapes and features of the profile model can be characterized by increasing the number of profile parameters. For example, the profile model can be characterized by profile parameters h1, w1, and w2 that define its height, bottom width, and top width, respectively. Note that the width of the profile model can be referred to as the critical dimension (CD). For example, profile parameters w1 and w2 can be described as defining the bottom CD and top CD of the profile model, respectively. It should be recognized that various types of profile parameters can be used to characterize the profile model, including but not limited to angle of incidence (AOI), pitch, n and k, and hardware parameters (e.g., polarizer angle).
As described above, the set of profile models stored in spectral library 918 can be generated by varying the profile parameters that characterize the profile model. For example, by varying profile parameters h1, w1, and w2, profile models of varying shapes and dimensions can be generated. Note that one, two, or all three profile parameters can be varied relative to one another. Accordingly, the profile parameters of the profile model associated with a matching simulated diffraction signal can be used to determine a feature of the structure being examined. For example, a profile parameter of the profile model corresponding to a bottom CD can be used to determine the bottom CD of the structure being examined.
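Parameterizing a trapezoidal profile by h1, w1 (bottom CD) and w2 (top CD), and then varying those parameters independently, can be sketched as follows. The class and function names are hypothetical, the sidewall angle is simply the geometric angle of a symmetric trapezoid derived from h1, w1 and w2, and each generated profile would still have to be passed to a rigorous solver (not shown) to obtain its simulated diffraction signal.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TrapezoidProfile:
    """Profile model characterized by h1 (height), w1 (bottom CD), w2 (top CD)."""
    h1: float
    w1: float
    w2: float

    @property
    def sidewall_angle_deg(self) -> float:
        """Sidewall angle of a symmetric trapezoid, derived from h1, w1, w2.

        90 degrees is vertical; values above 90 indicate a re-entrant
        profile (top wider than bottom).
        """
        undercut = (self.w1 - self.w2) / 2.0
        return math.degrees(math.atan2(self.h1, undercut))


def profile_grid(h1_values: Iterable[float],
                 w1_values: Iterable[float],
                 w2_values: Iterable[float]) -> List[TrapezoidProfile]:
    """Vary h1, w1 and w2 independently to enumerate profile models."""
    return [TrapezoidProfile(h, bottom, top)
            for h, bottom, top in itertools.product(h1_values, w1_values, w2_values)]
```

Two heights, two bottom CDs and three top CDs give 12 profile models. A full grid grows multiplicatively with the number of parameters, which is one reason library generation benefits from trained machine learning systems.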
Embodiments of the present invention may be applicable to a variety of film stacks. For example, in an embodiment, a film stack includes a single layer or multiple layers. Also, in an embodiment, the analyzed or measured grating structure includes both a three-dimensional component and a two-dimensional component. For example, the efficiency of computations based on simulated diffraction data can be optimized by taking advantage of the simpler contribution of the two-dimensional component to the overall structure and its diffraction data.
For ease of describing embodiments of the present invention, an ellipsometric optical metrology system is used to illustrate the above concepts and principles. It is to be understood that the same concepts and principles apply equally to other optical metrology systems, such as reflectometric systems. Similarly, a semiconductor wafer may be used to illustrate an application of the concepts; the methods and processes apply equally to other workpieces having repeating structures. In an embodiment, the optical scatterometry is a technique such as, but not limited to, optical spectroscopic ellipsometry (SE), beam profile reflectometry (BPR), or enhanced ultraviolet reflectometry (eUVR).
The present invention may be provided as a computer program product, or software, that may include a machine-readable medium having stored thereon instructions that may be used to program a computer system (or other electronic devices) to perform a process according to the present invention. A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable (e.g., computer-readable) medium includes a machine- (e.g., computer-) readable storage medium (e.g., read-only memory ("ROM"), random access memory ("RAM"), magnetic disk storage media, optical storage media, flash memory devices, etc.), a machine- (e.g., computer-) readable transmission medium (electrical, optical, acoustical, or other forms of propagated signals, e.g., carrier waves, infrared signals, digital signals, etc.), and so on.
Figure 10 illustrates a diagrammatic representation of a machine in the exemplary form of a computer system 1000 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine may be connected (e.g., networked) to other machines in a local area network (LAN), an intranet, an extranet, or the Internet. The machine may operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines (e.g., computers) that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The exemplary computer system 1000 includes a processor 1002, a main memory 1004 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 1006 (e.g., flash memory, static random access memory (SRAM), etc.), and a secondary memory 1018 (e.g., a data storage device), which communicate with each other via a bus 1030.
Processor 1002 represents one or more general-purpose processing devices such as a microprocessor, a central processing unit, or the like. More particularly, processor 1002 may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processor 1002 may also be one or more special-purpose processing devices such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), a network processor, or the like. Processor 1002 is configured to execute the processing logic 1026 for performing the operations discussed herein.
Computer system 1000 may further include a network interface device 1008. Computer system 1000 may also include a video display unit 1010 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 1012 (e.g., a keyboard), a cursor control device 1014 (e.g., a mouse), and a signal generation device 1016 (e.g., a speaker).
The secondary memory 1018 may include a machine-accessible storage medium (or, more specifically, a computer-readable storage medium) 1031 on which is stored one or more sets of instructions (e.g., software 1022) embodying any one or more of the methodologies or functions described herein. The software 1022 may also reside, completely or at least partially, within the main memory 1004 and/or within the processor 1002 during its execution by computer system 1000, the main memory 1004 and the processor 1002 also constituting machine-readable storage media. The software 1022 may further be transmitted or received over a network 1020 via the network interface device 1008.
While the machine-accessible storage medium 1031 is shown in an exemplary embodiment to be a single medium, the term "machine-readable storage medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "machine-readable storage medium" shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present invention. The term "machine-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories and optical and magnetic media.
In accordance with an embodiment of the present invention, a machine-accessible storage medium has instructions stored thereon that cause a data processing system to perform a method of accurate neural network training for library-based CD metrology. The method includes optimizing a threshold for a principal component analysis (PCA) of a spectrum data set to provide a principal component (PC) value. Training targets for one or more neural networks are estimated based on the PC value. The one or more neural networks are trained based on the training targets and the PC value. A spectral library is provided based on the one or more trained neural networks.
In an embodiment, optimizing the threshold for the PCA includes determining the lowest-level spectrum domain.
In an embodiment, optimizing the threshold for the PCA includes determining a first PCA threshold, applying the PCA to the spectrum data set, calculating the spectrum error introduced by applying the PCA, and comparing the spectrum error with a spectrum noise level. In one such embodiment, if the spectrum error is less than the spectrum noise level, the first PCA threshold is set as the PC value. In another such embodiment, if the spectrum error is greater than or equal to the spectrum noise level, a second PCA threshold is determined, and the applying, calculating, and comparing operations are repeated.
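The threshold loop above (choose a threshold, apply PCA, compute the error introduced, compare with the noise level, and otherwise try the next threshold) can be sketched with NumPy. Two interpretations here are assumptions rather than details from the method: the "PCA threshold" is treated as the number of retained principal components, and the spectrum error is taken as the worst-case absolute reconstruction error over all samples and wavelengths.

```python
import numpy as np


def choose_pca_components(spectra: np.ndarray, noise_level: float, start: int = 1):
    """Return (k, pc_values): the smallest number k of principal components
    whose reconstruction error stays below `noise_level`, plus the
    projected PC values of each spectrum.

    spectra has shape (n_samples, n_wavelengths).
    """
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # Rows of vt are principal directions ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    n_max = vt.shape[0]
    for k in range(start, n_max + 1):            # candidate thresholds
        basis = vt[:k]                           # (k, n_wavelengths)
        pc_values = centered @ basis.T           # apply PCA: (n_samples, k)
        reconstructed = pc_values @ basis + mean
        error = np.max(np.abs(reconstructed - spectra))  # spectrum error
        if error < noise_level:                  # below the noise: accept
            return k, pc_values
    # Noise level never reached: keep every component.
    return n_max, centered @ vt.T
```

For spectra built from two underlying shapes plus a constant offset, a tight noise level yields two components, while a very loose noise level is already satisfied by one. A Mueller-domain error tolerance, as in another embodiment, would change only how `error` is computed.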
In an embodiment, optimizing the threshold for the PCA includes using a Mueller-domain error tolerance.
In an embodiment, the high-accuracy spectral library includes simulated spectra, and the method further includes comparing a simulated spectrum with a sample spectrum.
In accordance with another embodiment of the present invention, a machine-accessible storage medium has instructions stored thereon that cause a data processing system to perform a method of fast neural network training for library-based CD metrology. The method includes providing a training target for a first neural network. The first neural network is trained, where the training includes starting with a predetermined number of neurons and iteratively increasing the number of neurons until an optimized total number of neurons is reached. A second neural network is generated based on the training and the optimized total number of neurons. A spectral library is provided based on the second neural network.
In an embodiment, iteratively increasing the number of neurons until the optimized total number of neurons is reached includes using a modified Levenberg-Marquardt method.
In an embodiment, iteratively increasing the number of neurons includes increasing the number of neurons in a hidden layer of the first neural network.
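The growth strategy described above (start with a predetermined number of hidden neurons and add neurons until the training target is met) can be sketched as follows. The inner trainer is a deliberate stand-in, not the modified Levenberg-Marquardt method: hidden-layer weights are drawn at random and only the output layer is solved by linear least squares, which keeps the sketch short. The names `train_fixed_size` and `grow_network` and the values of `start`, `step` and `max_neurons` are hypothetical.

```python
import numpy as np


def train_fixed_size(x, y, n_hidden, rng):
    """Fit a one-hidden-layer tanh network with `n_hidden` neurons.

    Stand-in trainer: random hidden weights, output layer by least squares.
    Returns ((w, b, out), training RMSE).
    """
    w = rng.normal(size=(x.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    hidden = np.tanh(x @ w + b)
    design = np.hstack([hidden, np.ones((x.shape[0], 1))])  # bias column
    out, *_ = np.linalg.lstsq(design, y, rcond=None)
    rmse = float(np.sqrt(np.mean((design @ out - y) ** 2)))
    return (w, b, out), rmse


def grow_network(x, y, target_rmse, start=2, step=2, max_neurons=64, seed=0):
    """Start with `start` hidden neurons and add `step` at a time until the
    training error meets `target_rmse` or `max_neurons` is reached.

    Returns (n_neurons, params, rmse).
    """
    rng = np.random.default_rng(seed)
    n = start
    while True:
        params, rmse = train_fixed_size(x, y, n, rng)
        if rmse <= target_rmse or n >= max_neurons:
            return n, params, rmse
        n = min(n + step, max_neurons)
```

The returned neuron count plays the role of the optimized total number of neurons; in the method above, a second network would then be built with that size. Growing from a small network avoids guessing the neuron count up front, at the cost of repeated training passes.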
In an embodiment, the spectral library includes simulated spectra, and the method further includes comparing a simulated spectrum with a sample spectrum.
Analysis of measured spectra generally involves comparing the measured sample spectra with simulated spectra to deduce the model parameter values that best describe the measured sample. Figure 11 is a flowchart representing operations in a method for building a parameterized model and a spectral library, beginning with sample spectra (e.g., originating from one or more workpieces), in accordance with an embodiment of the present invention.
At operation 1101, a set of user-defined material files specifies the characteristics (e.g., n and k values) of the materials from which the measured sample feature is formed.
Levy (such as n, k value).
In operation 1102, scatterometry user by selecting one or more material document comes integrated corresponding to be measured
Current material heap in periodic feature defines the nominal model of the expected composition of sample.This user-defined model
Can be parameterized by the definition of the nominal value of model parameter further, such as characterize just in the shape of measured characteristic
Thickness, critical dimension (CD), side wall angle (SWA), highly (HT), edge roughness, radius of corner etc..According to 2D model (i.e.
Profile) or 3D model whether be defined, there is 30-50 or more this model parameter be not uncommon for.
From a parameterized model, simulated spectra for a given set of grating parameter values may be computed using a rigorous diffraction modeling algorithm, such as the rigorous coupled wave analysis (RCWA) method. Regression analysis is then performed at operation 1103 until the parameterized model converges on a set of parameter values characterizing a final profile model (for 2D) that corresponds to a simulated spectrum matching the measured diffraction spectra to a predefined matching criterion. The final profile model associated with the matching simulated diffraction signal is presumed to represent the actual profile of the structure from which the model was generated.
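The regression of operation 1103 can be illustrated with a damped Gauss-Newton (Levenberg-Marquardt-style) fit. This is a minimal sketch under stated assumptions: `toy_spectrum` is a hypothetical, non-physical stand-in for an RCWA solver, the Jacobian is approximated by forward differences, and the matching criterion is a threshold `tol` on the sum of squared residuals.

```python
import numpy as np


def toy_spectrum(params, wavelengths):
    """Hypothetical stand-in for an RCWA solver (not a physical model).

    params = (width, height) in nm: the width sets the peak amplitude and
    the height sets the peak position of a Gaussian-shaped reflectance.
    """
    width, height = params
    amplitude = 0.5 * width / (width + 50.0)
    return 0.1 + amplitude * np.exp(-((wavelengths - 2.0 * height) / 80.0) ** 2)


def fit_profile(measured, wavelengths, initial, tol=1e-18, max_iter=200):
    """Regress (width, height) until the simulated spectrum matches `measured`.

    Returns (params, cost) where cost is the sum of squared residuals.
    """
    params = np.asarray(initial, dtype=float)
    damping = 1e-3

    def residual(p):
        return toy_spectrum(p, wavelengths) - measured

    r = residual(params)
    cost = float(r @ r)
    for _ in range(max_iter):
        if cost < tol:                 # matching criterion met
            break
        # Forward-difference Jacobian of the residual vector.
        jac = np.empty((r.size, params.size))
        for j in range(params.size):
            step = 1e-6 * max(1.0, abs(params[j]))
            shifted = params.copy()
            shifted[j] += step
            jac[:, j] = (residual(shifted) - r) / step
        jtj = jac.T @ jac
        # Marquardt scaling: damp each parameter by its own curvature.
        delta = np.linalg.solve(jtj + damping * np.diag(np.diag(jtj)), -jac.T @ r)
        trial = params + delta
        r_trial = residual(trial)
        cost_trial = float(r_trial @ r_trial)
        if cost_trial < cost:          # accept: trust the linear model more
            params, r, cost = trial, r_trial, cost_trial
            damping *= 0.3
        else:                          # reject: lean toward gradient descent
            damping *= 10.0
    return params, cost
```

Starting from a nominal guess of (35, 140) nm, the fit recovers a spectrum generated with (40, 150) nm. In practice each residual evaluation calls the rigorous solver, which is why the number of solver calls per regression matters and why precomputed libraries are attractive.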
The matching simulated spectra and/or the associated optimized profile model can then be used at operation 1104 to generate a library of simulated diffraction spectra by perturbing the values of the parameterized final profile model. The resulting library of simulated diffraction spectra may then be employed by a scatterometry measurement system operating in a production environment to determine whether subsequently measured grating structures have been fabricated according to specifications. Library generation 1104 may include a machine learning system, such as a neural network, generating simulated spectral information for each of a number of profiles, each profile including a set of one or more modeled profile parameters. To generate the library, the machine learning system itself may have to undergo training based on a training data set of spectral information. Such training may be computationally intensive and/or may have to be repeated for different models and/or profile parameter domains. A user's decision about the size of the training data set can introduce considerable inefficiency into the computational load of generating a library. For example, selecting an overly large training data set may result in unnecessary training computations, while training with a training data set of insufficient size may require retraining to generate the library.
Some embodiments described herein include automatically determining the size of the training data set to be used in training a machine learning system. Generally, the training data set is sized based on the convergence of a data set characterization metric, and may be further sized based on an estimate of the final solution error. The training data set is incrementally expanded and tested to identify convergence and, in some embodiments, to estimate the final solution error that such a sample size will provide. The incremental expansion and testing are performed until the convergence criteria are met and/or the estimate of the final solution error meets a threshold.
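The incremental expansion and convergence test can be sketched in a few lines of standard-library Python. The characterization metric used here, the population standard deviation of scalar samples, is a hypothetical stand-in for whatever metric a real implementation would compute over spectra; the doubling schedule, `rel_tol`, and `max_size` are likewise illustrative, and the final-solution-error estimate is not modeled.

```python
import random
import statistics
from typing import Callable, List


def size_training_set(draw_sample: Callable[[random.Random], float],
                      initial_size: int = 50,
                      growth: float = 2.0,
                      rel_tol: float = 0.01,
                      max_size: int = 100_000,
                      seed: int = 0) -> List[float]:
    """Grow a training sample until a characterization metric converges.

    After each expansion the metric (here: population standard deviation)
    is recomputed; growth stops once its relative change falls within
    `rel_tol`, or when `max_size` samples have been drawn.
    """
    rng = random.Random(seed)
    samples = [draw_sample(rng) for _ in range(initial_size)]
    previous = statistics.pstdev(samples)
    while len(samples) < max_size:
        target = min(int(len(samples) * growth), max_size)
        samples.extend(draw_sample(rng) for _ in range(target - len(samples)))
        current = statistics.pstdev(samples)
        if abs(current - previous) <= rel_tol * abs(previous):
            break                      # metric has converged
        previous = current
    return samples
```

Because each test only recomputes a cheap statistic, no network has to be trained while the sample size is being chosen, which mirrors the motivation given below for avoiding a separate training pass.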
Because the training matrix sizing methods described herein do not require a separate training pass, a good training data sample set for neural network training is identified quickly and efficiently, and the final solution error can be well controlled. Once the training data sample set is identified, a machine learning system can be trained to generate the desired target function information. In one particular embodiment, the machine learning system is trained to generate a library of simulated spectral information (e.g., diffraction signals) that can be used to deduce the parameters of an unknown sample (e.g., a diffraction grating or a wafer periodic structure) measured with a scatterometry system.
It is to be understood that the above methodologies may be applied under a variety of circumstances within the spirit and scope of embodiments of the present invention. For example, in an embodiment, the processes described above are performed in semiconductor, solar, light-emitting diode (LED), or related fabrication processes. In an embodiment, the processes described above are used in standalone or integrated metrology tools. In an embodiment, the processes described above are used in single- or multiple-measurement-target regressions.
Thus, methods for accurate and fast neural network training for library-based CD metrology have been disclosed. In accordance with an embodiment of the present invention, a method of accurate neural network training for library-based CD metrology includes optimizing a threshold for a principal component analysis (PCA) of a spectrum data set to provide a principal component (PC) value. The method also includes estimating training targets for one or more neural networks. The method also includes training the one or more neural networks based on the PC value and the training targets. The method also includes providing a spectral library based on the one or more trained neural networks. In one embodiment, optimizing the threshold for the PCA includes determining the lowest-level spectrum domain. In accordance with an embodiment of the present invention, a method of fast neural network training for library-based CD metrology includes providing a training target for a first neural network. The method also includes training the first neural network, the training including starting with a predetermined number of neurons and iteratively increasing the number of neurons until an optimized total number of neurons is reached. The method also includes generating a second neural network based on the training and the optimized total number of neurons. The method also includes providing a spectral library based on the second neural network. In one embodiment, iteratively increasing the number of neurons until the optimized total number of neurons is reached includes using a modified Levenberg-Marquardt method.