CN116802647A - Modular automatic encoder model for manufacturing process parameter estimation

Info

Publication number: CN116802647A
Authority: CN (China)
Legal status: Pending
Application number: CN202180088188.8A
Other languages: Chinese (zh)
Inventors: A·奥纳斯, B·J·M·铁默斯马, N·弗赫尔, R·德克斯
Assignee (original and current): ASML Holding NV
Application filed by ASML Holding NV
Priority claimed from PCT/EP2021/086776 (published as WO2022144203A1)
Publication of CN116802647A

Abstract

A modular automatic encoder model is described. The modular automatic encoder model includes: an input model configured to process one or more inputs into a first-level dimension suitable for combination with other inputs; a common model configured to: reduce the dimensions of the combined processed inputs to produce low-dimensional data in a potential space, and expand the low-dimensional data in the potential space into one or more expanded versions of the one or more inputs adapted to produce one or more different outputs; an output model configured to generate the one or more different outputs using the one or more expanded versions of the one or more inputs, the one or more different outputs being approximations of the one or more inputs; and a predictive model configured to estimate one or more parameters based on the low-dimensional data in the potential space.

Description

Modular automatic encoder model for manufacturing process parameter estimation
Cross Reference to Related Applications
The present application claims priority from European application 20217886.9 filed on 30 December 2020, European application 21168585.4 filed on 15 April 2021, European application 20217883.6 filed on 30 December 2020, European application 21169035.9 filed on 18 April 2021, European application 21187893.9 filed on 27 July 2021, European application 20217888.5 filed on 30 December 2020, and European application 21168592.0 filed on 15 April 2021, all of which are incorporated herein by reference.
Technical Field
The present specification relates to methods and systems for estimating manufacturing process parameters through a modular automatic encoder model.
Background
A lithographic apparatus is a machine that is configured to apply a desired pattern onto a substrate. Lithographic apparatus can be used, for example, in the manufacture of Integrated Circuits (ICs). The lithographic apparatus may, for example, project a pattern (also commonly referred to as a "design layout" or "design") at a patterning device (e.g., a mask) onto a layer of radiation-sensitive material (resist) disposed on a substrate (e.g., a wafer).
To project a pattern onto a substrate, a lithographic apparatus may use electromagnetic radiation. The wavelength of this radiation determines the smallest dimension of the features that can be formed on the substrate. Typical wavelengths currently used are 365nm (i-line), 248nm, 193nm and 13.5nm. A lithographic apparatus having Extreme Ultraviolet (EUV) radiation with a wavelength in the range of 4nm to 20nm (e.g., 6.7nm or 13.5 nm) may be used to form smaller features on a substrate than a lithographic apparatus using radiation, for example, with a wavelength of 193 nm.
Low-k1 photolithography may be used to process features that are smaller in size than the typical resolution limit of a lithographic apparatus. In such a process, the resolution formula may be expressed as CD = k1 × λ/NA, where λ is the wavelength of the radiation used, NA is the numerical aperture of the projection optics in the lithographic apparatus, CD is the "critical dimension" (typically the size of the smallest feature printed, but in this case half-pitch) and k1 is an empirical resolution factor. Generally, the smaller k1, the more difficult it is to reproduce on the substrate a pattern that resembles the shape and dimensions planned by the circuit designer in order to achieve a particular electrical functionality and performance.
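As a purely illustrative calculation (not taken from the application): with λ = 13.5 nm (EUV), NA = 0.33 and k1 = 0.4, the formula gives CD = 0.4 × 13.5 nm / 0.33 ≈ 16 nm half-pitch, whereas with λ = 193 nm, NA = 1.35 and the same k1 it gives CD ≈ 57 nm, which illustrates why shorter wavelengths and larger numerical apertures enable smaller features.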
To overcome these difficulties, complex fine-tuning steps may be applied to the lithographic projection apparatus and/or the design layout. These include, for example, but are not limited to, optimization of NA, customized illumination schemes, use of phase-shifting patterning devices, various optimizations of the design layout such as optical proximity correction (OPC, sometimes also referred to as "optical and process correction") in the design layout, or other methods generally defined as "resolution enhancement techniques" (RET). Alternatively, a tight control loop for controlling the stability of the lithographic apparatus may be used to improve the reproduction of the pattern at low k1.
Disclosure of Invention
The automatic encoder may be configured for metrology, for parameter inference, and/or for other purposes. Such a deep learning model architecture is generic and scalable to arbitrary sizes and complexities. The automatic encoder is configured to compress a high-dimensional signal (e.g., a pupil image in a semiconductor manufacturing process) to an efficient low-dimensional representation of the same signal. Parameter inference (i.e., regression) is then performed from the low-dimensional representation against a set of known labels. By compressing the signal first, the inference problem is significantly simplified compared to performing regression directly on the high-dimensional signal.
However, it is often difficult to understand the information flow inside a typical automatic encoder. Information at the input, at the level or stage of the compressed low-dimensional representation, and at the output can be interpreted; information between these points cannot be easily interpreted.
Compared with a traditional monolithic automatic encoder model, the modular automatic encoder model is less rigid. The present modular automatic encoder model has a greater number of trainable and/or otherwise adjustable components. The modularity of the present model makes it easier to interpret, define, and extend. The complexity of the present model is easily adjusted: it is high enough to model the process that generates the data provided to the model, but low enough to avoid modeling noise or other unwanted characteristics (e.g., the present model is configured to avoid overfitting the provided data). Since the process (or at least aspects of the process) that generates the data is often unknown, selecting an appropriate network complexity typically involves some intuition and trial and error. For this reason, it is particularly desirable to provide a model architecture that is modular, easy to understand, and easy to scale up and down in complexity.
It should be noted that the term "auto encoder" as used in connection with the present modular auto encoder model may generally refer to one or more auto encoders and/or other auto encoders configured for partially supervised learning using potential space for parameter estimation. This may also include, for example, a single automatic encoder trained using semi-supervised learning.
According to an embodiment, a non-transitory computer-readable medium having instructions thereon is provided. The instructions are configured to cause a computer to execute a modular automatic encoder model for parameter estimation. The modular automatic encoder model includes one or more input models configured to process one or more inputs into a first-level dimension suitable for combination with other inputs. The modular automatic encoder model includes a common model configured to: combine the processed inputs and reduce the dimensions of the combined processed inputs to generate low-dimensional data in a potential space, the low-dimensional data in the potential space having a resulting reduced second-level dimension that is smaller than the first-level dimension; and expand the low-dimensional data in the potential space into one or more expanded versions of the one or more inputs, the one or more expanded versions of the one or more inputs having an increased dimension as compared to the low-dimensional data in the potential space, the one or more expanded versions of the one or more inputs being adapted to produce one or more different outputs. (Note that the expanded versions do not have to approximate the input of the common model, since the approximation is enforced on the final output.) The modular automatic encoder model includes one or more output models configured to use the one or more expanded versions of the one or more inputs to generate the one or more different outputs, the one or more different outputs being approximations of the one or more inputs, the one or more different outputs having the same or increased dimensions as compared to the expanded versions of the one or more inputs. The modular automatic encoder model includes a prediction model configured to estimate one or more parameters based on the low-dimensional data in the potential space and/or the one or more different outputs. In some embodiments, the modular automatic encoder model (and/or any of the individual components of the model described herein) may be configured before and/or after the training data is seen.
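For illustration only, the modular arrangement described above may be sketched in code roughly as follows. The sketch assumes PyTorch, and all class names, layer choices, and dimensions (InputModel, CommonModel, OutputModel, PredictionModel, first_level_dim, latent_dim, and so on) are assumptions made for readability rather than elements of the application:

    # Illustrative sketch only; not the implementation described in the application.
    import torch
    import torch.nn as nn

    class InputModel(nn.Module):
        """Processes one raw input (e.g., one metrology channel) into the common first-level dimension."""
        def __init__(self, in_dim, first_level_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, first_level_dim), nn.ReLU())

        def forward(self, x):
            return self.net(x)

    class CommonModel(nn.Module):
        """Combines the processed inputs, compresses them into the low-dimensional (potential/latent)
        space, and expands them again into per-input expanded versions."""
        def __init__(self, first_level_dim, n_inputs, latent_dim):
            super().__init__()
            combined = first_level_dim * n_inputs
            self.encoder = nn.Linear(combined, latent_dim)
            self.decoder = nn.Sequential(nn.Linear(latent_dim, combined), nn.ReLU())

        def encode(self, processed_inputs):
            return self.encoder(torch.cat(list(processed_inputs), dim=-1))  # low-dimensional data

        def decode(self, z):
            return self.decoder(z)  # expanded versions (not forced to match the encoder input)

    class OutputModel(nn.Module):
        """Maps one expanded version back to an approximation of the corresponding original input."""
        def __init__(self, first_level_dim, out_dim):
            super().__init__()
            self.net = nn.Linear(first_level_dim, out_dim)

        def forward(self, expanded_slice):
            return self.net(expanded_slice)

    class PredictionModel(nn.Module):
        """Estimates one or more parameters of interest from the low-dimensional data."""
        def __init__(self, latent_dim, n_params):
            super().__init__()
            self.net = nn.Linear(latent_dim, n_params)

        def forward(self, z):
            return self.net(z)

A complete model would instantiate one InputModel and one OutputModel per input (e.g., per metrology channel), split the decoder output into per-input slices (for example with tensor.chunk), and attach the PredictionModel to the latent representation; the common model is shared across all inputs.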
In some embodiments, the separate input and/or output models include two or more sub-models that are associated with different portions of the sensing operation and/or manufacturing process. In some embodiments, the single output model includes the two or more sub-models, and the two or more sub-models include a sensor model and a stack model for semiconductor sensor operation.
In some embodiments, the one or more input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of a manufacturing process and/or sensing operation such that each of the one or more input models, the common model, and/or the one or more output models, among other models in the modular automatic encoder model, may be trained together and/or separately based on the process physical properties of the respective portions of the manufacturing process and/or sensing operation, but configured separately.
In some embodiments, the number of the one or more input models and the number of the one or more output models are determined based on process physical property differences in different portions of the manufacturing process and/or sensing operation.
In some embodiments, the number of input models is different from the number of output models.
In some embodiments, the common model includes an encoder-decoder architecture and/or a variational encoder-decoder architecture; processing the one or more inputs into the first-level dimension, and reducing the dimension of the combined processed inputs includes encoding; and expanding the low-dimensional data in the potential space into the one or more expanded versions of the one or more inputs includes decoding.
In some embodiments, the modular automatic encoder model is trained by comparing the one or more different outputs to respective inputs, and adjusting parameterization of the one or more input models, the common model, and/or the one or more output models to reduce or minimize differences between outputs and respective inputs.
In some embodiments, the common model includes an encoder and a decoder, and the modular automatic encoder model is trained by: applying a change to the low-dimensional data in the potential space such that the common model decodes a relatively more continuous potential space to produce a decoder signal; recursively providing the decoder signal to the encoder to generate new low-dimensional data; comparing the new low-dimensional data with the low-dimensional data; and adjusting one or more components of the modular automatic encoder model based on the comparison to reduce or minimize differences between the new low-dimensional data and the low-dimensional data.
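For illustration only, the training procedure described above (reconstruction of each input, plus the recursive latent-consistency step in which the decoder signal is re-encoded and compared) may be sketched roughly as follows, reusing the assumed module names from the earlier sketch. The perturbation scale, the loss weights, and the choice to compare the re-encoded data against the perturbed latent point are assumptions of this sketch, not statements about the application:

    # Illustrative training step; loss weights and perturbation scale are arbitrary assumptions.
    import torch
    import torch.nn.functional as F

    def training_step(input_models, common, output_models, prediction, batch, labels, opt,
                      jitter=0.1, w_latent=1.0, w_pred=1.0):
        """batch: list of per-input tensors; labels: known parameter values for the prediction model."""
        processed = [m(x) for m, x in zip(input_models, batch)]
        z = common.encode(processed)
        expanded = common.decode(z).chunk(len(output_models), dim=-1)
        outputs = [m(e) for m, e in zip(output_models, expanded)]

        # 1) Reconstruction: each output should approximate the corresponding input.
        recon = sum(F.mse_loss(o, x) for o, x in zip(outputs, batch))

        # 2) Latent consistency: perturb the low-dimensional data, decode, re-encode, and penalize
        #    the mismatch, encouraging decoding of a relatively more continuous potential space.
        z_perturbed = z + jitter * torch.randn_like(z)
        decoder_signal = common.decode(z_perturbed).chunk(len(output_models), dim=-1)
        z_new = common.encode(decoder_signal)
        latent = F.mse_loss(z_new, z_perturbed)

        # 3) Supervised parameter estimation from the low-dimensional data.
        pred = F.mse_loss(prediction(z), labels)

        loss = recon + w_latent * latent + w_pred * pred
        opt.zero_grad()
        loss.backward()
        opt.step()
        return float(loss.detach())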
In some embodiments, the one or more parameters are semiconductor manufacturing process parameters; the one or more input models and/or the one or more output models may include, by way of non-limiting example only, dense feed-forward layer, convolutional layer, and/or residual network architecture of the modular automatic encoder model; the common model may include, by way of non-limiting example only, a feed-forward layer and/or a residual layer; and the prediction model may include, by way of non-limiting example only, a feed-forward layer and/or a residual layer.
According to another embodiment, a method for parameter estimation is provided. The method includes processing, by one or more input models of a modular automatic encoder model, one or more inputs into a first-level dimension suitable for combination with other inputs; combining the processed inputs by a common model of the modular automatic encoder model and reducing the dimensions of the combined processed inputs to produce low-dimensional data in a potential space, the low-dimensional data in the potential space having a resulting reduced second-level dimension that is smaller than the first-level dimension; expanding, by the common model, the low-dimensional data in the potential space into one or more expanded versions of the one or more inputs, the one or more expanded versions of the one or more inputs having an increased dimension compared to the low-dimensional data in the potential space, the one or more expanded versions of the one or more inputs adapted to produce one or more different outputs; using, by one or more output models of the modular automatic encoder model, the one or more expanded versions of the one or more inputs to generate the one or more different outputs, the one or more different outputs being approximations of the one or more inputs, the one or more different outputs having the same or increased dimensions as the expanded version of the one or more inputs; and estimating, by a predictive model of the modular automatic encoder model, one or more parameters based on the low-dimensional data and/or the one or more outputs in the potential space. In some embodiments, the separate input and/or output models include two or more sub-models that are associated with different portions of the sensing operation and/or manufacturing process.
In some embodiments, the single output model includes the two or more sub-models, and the two or more sub-models include a sensor model and a stack model for semiconductor sensor operation.
In some embodiments, the one or more input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of a manufacturing process and/or sensing operation such that each of the one or more input models, the common model, and/or the one or more output models, among other models in the modular automatic encoder model, may be trained together and/or separately based on the process physical properties of the respective portions of the manufacturing process and/or sensing operation, but configured separately.
In some embodiments, the method further comprises: the number of the one or more input models and/or the number of the one or more output models is determined based on process physical property differences in different portions of the manufacturing process and/or the sensing operation.
In some embodiments, the number of input models is different from the number of output models.
In some embodiments, the common model includes an encoder-decoder architecture and/or a variational encoder-decoder architecture; processing the one or more inputs into the first-level dimension, and reducing the dimension of the combined processed inputs includes encoding; and expanding the low-dimensional data in the potential space into the one or more expanded versions of the one or more inputs includes decoding.
In some embodiments, the method further comprises: the modular automatic encoder model is trained by comparing the one or more distinct outputs with respective inputs, and adjusting parameterization of the one or more input models, the common model, and/or the one or more output models to reduce or minimize differences between outputs and respective inputs.
In some embodiments, the common model includes an encoder and a decoder, and the method further includes training the modular automatic encoder model by: applying a change to the low-dimensional data in the potential space such that the common model decodes a relatively more continuous potential space to produce a decoder signal; recursively providing the decoder signal to the encoder to generate new low-dimensional data; comparing the new low-dimensional data with the low-dimensional data; and adjusting one or more components of the modular automatic encoder model based on the comparison to reduce or minimize differences between the new low-dimensional data and the low-dimensional data.
In some embodiments, the one or more parameters are semiconductor manufacturing process parameters; the one or more input models and/or the one or more output models may include, by way of non-limiting example only, dense feed-forward layer, convolutional layer, and/or residual network architecture of the modular automatic encoder model; the common model may include, by way of non-limiting example only, a feed-forward layer and/or a residual layer; and the prediction model may include, by way of non-limiting example only, a feed-forward layer and/or a residual layer.
According to another embodiment, there is provided a system comprising: one or more input models of the modular automatic encoder model, the one or more input models configured to process one or more inputs into a first-level dimension suitable for combination with other inputs; a common model of the modular automatic encoder model, the common model configured to: combine the processed inputs and reduce the dimensions of the combined processed inputs to generate low-dimensional data in a potential space, the low-dimensional data in the potential space having a resulting reduced second-level dimension that is smaller than the first-level dimension; and expand the low-dimensional data in the potential space into one or more expanded versions of the one or more inputs, the one or more expanded versions of the one or more inputs having an increased dimension as compared to the low-dimensional data in the potential space, the one or more expanded versions of the one or more inputs being adapted to produce one or more different outputs; one or more output models of the modular automatic encoder model, the one or more output models configured to use the one or more expanded versions of the one or more inputs to generate the one or more different outputs, the one or more different outputs being approximations of the one or more inputs, the one or more different outputs having the same or increased dimensions as the expanded versions of the one or more inputs; and a predictive model of the modular automatic encoder model, the predictive model configured to estimate one or more parameters based on the low-dimensional data in the potential space and/or the one or more different outputs.
In some embodiments, the separate input and/or output models include two or more sub-models that are associated with different portions of the sensing operation and/or manufacturing process. In some embodiments, the single output model includes the two or more sub-models, and the two or more sub-models include a sensor model and a stack model for semiconductor sensor operation. In some embodiments, the one or more input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of a manufacturing process and/or sensing operation such that each of the one or more input models, the common model, and/or the one or more output models, among other models in the modular automatic encoder model, may be trained together and/or separately based on the process physical properties of the respective portions of the manufacturing process and/or sensing operation, but configured separately.
In some embodiments, the number of the one or more input models and the number of the one or more output models are determined based on process physical property differences in different portions of the manufacturing process and/or sensing operation.
In some embodiments, the number of input models is different from the number of output models.
In some embodiments, the common model includes an encoder-decoder architecture and/or a variational encoder-decoder architecture; processing the one or more inputs into the first-level dimension, and reducing the dimension of the combined processed inputs includes encoding; and expanding the low-dimensional data in the potential space into the one or more expanded versions of the one or more inputs includes decoding.
In some embodiments, the modular automatic encoder model is trained by comparing the one or more different outputs to respective inputs, and adjusting parameterization of the one or more input models, the common model, and/or the one or more output models to reduce or minimize differences between outputs and respective inputs.
In some embodiments, the common model includes an encoder and a decoder, and the modular automatic encoder model is trained by: applying a change to the low-dimensional data in the potential space such that the common model decodes a relatively more continuous potential space to produce a decoder signal; recursively providing the decoder signal to the encoder to generate new low-dimensional data; comparing the new low-dimensional data with the low-dimensional data; and adjusting one or more components of the modular automatic encoder model based on the comparison to reduce or minimize differences between the new low-dimensional data and the low-dimensional data.
In some embodiments, the one or more parameters are semiconductor manufacturing process parameters; the one or more input models and/or the one or more output models may include, by way of non-limiting example only, dense feed-forward layer, convolutional layer, and/or residual network architecture of the modular automatic encoder model; the common model may include, by way of non-limiting example only, a feed-forward layer and/or a residual layer; and the prediction model may include, by way of non-limiting example only, a feed-forward layer and/or a residual layer.
According to another embodiment, a non-transitory computer-readable medium having instructions thereon is provided. The instructions are configured to cause a computer to execute a machine learning model for parameter estimation. The machine learning model includes: one or more first models configured to process one or more inputs into a first-level dimension suitable for combination with other inputs; a second model configured to: combine the processed one or more inputs and reduce the dimension of the combined processed one or more inputs; and expand the combined processed one or more inputs into one or more recovered versions of the one or more inputs, the one or more recovered versions of the one or more inputs being adapted to produce one or more different outputs; one or more third models configured to use the one or more recovered versions of the one or more inputs to generate the one or more different outputs; and a fourth model configured to estimate parameters based on the dimension-reduced combined compressed inputs and/or the one or more different outputs. In some embodiments, the separate models of the one or more third models include two or more sub-models that are associated with different portions of the manufacturing process and/or sensing operation.
In some embodiments, the two or more sub-models include a sensor model and a stack model for a semiconductor manufacturing process.
In some embodiments, the one or more first models, the second model, and the one or more third models are separate from each other and correspond to process physical property differences in different portions of the manufacturing process and/or sensing operation such that each of the one or more first models, the second model, and/or the one or more third models, among other models in the machine learning model, may be trained together and/or separately, but configured separately, based on the process physical properties of the respective portions of the manufacturing process and/or sensing operation.
In some embodiments, the number of the one or more first models and the number of the one or more third models are determined based on differences in process physical properties in different portions of the manufacturing process and/or sensing operation.
In some embodiments, the number of first models is different from the number of third models.
In some embodiments, the second model includes an encoder-decoder architecture and/or a variational encoder-decoder architecture; compressing the one or more inputs includes encoding; and expanding the combined compressed one or more inputs into one or more recovered versions of the one or more inputs includes decoding.
In some embodiments, the machine learning model is trained by comparing the one or more different outputs to respective inputs, and adjusting the one or more first models, the second model, and/or the one or more third models to reduce or minimize differences between the outputs and the respective inputs.
In some embodiments, the second model includes an encoder and a decoder, and the second model is trained by: applying a variation to the low-dimensional data in the potential space such that the second model decodes a relatively more continuous potential space to produce a decoder signal; recursively providing the decoder signal to the encoder to generate new low-dimensional data; comparing the new low-dimensional data with the low-dimensional data; and adjusting the second model based on the comparison to reduce or minimize differences between the new low-dimensional data and the low-dimensional data.
In some embodiments, the parameter is a semiconductor manufacturing process parameter; the one or more first models and/or the one or more third models comprise dense feed-forward layer, convolutional layer, and/or residual network architecture of the machine learning model; the second model comprises a feedforward layer and/or a residual layer; and the fourth model comprises a feed forward layer and/or a residual layer.
Data-driven inference methods have been proposed for semiconductor metrology operations and for the task of parameter estimation. Data-driven inference methods rely on a large collection of measurements and on models that map measured features to parameters of interest, where labels for these parameters are obtained via carefully designed targets on the wafer or from third-party measurements. Current metrology platforms are capable of measuring a significant number of channels (multiple wavelengths, observations at multiple wafer rotations, four light polarization schemes, etc.). However, due to practical timing constraints, the number of channels used to generate measurements needs to be limited to a subset of those available. To select the best channels, a brute-force approach that tests all possible channel combinations is typically used. This is time consuming, resulting in longer measurement and/or process recipe generation times. In addition, brute-force methods may be prone to overfitting, introducing a different bias and/or other drawbacks for each channel.
Advantageously, the present modular automatic encoder model is configured to estimate the parameter of interest from a combination of available channels of measurement data from an optical metrology platform, by using a subset of the plurality of input models to estimate the amount of information content obtainable from the available channels. The present model is configured to be trained by randomly or otherwise iteratively varying (e.g., sub-selecting) the number of channels used to approximate the input during the iterative training steps. This iterative variation/sub-selection ensures that the model remains predictive/consistent for any combination of input channels. Furthermore, since the information content present in the input is representative of all channels (e.g., since each channel is part of the subset of selected channels for at least one training iteration), the resulting model will not include a bias specific to one particular channel.
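For illustration only, the channel sub-selection described above may be sketched roughly as follows, again reusing the assumed module names from the earlier sketches. Zero-masking the dropped channels is one simple way to keep the combined dimensionality fixed; it is an assumption of this sketch, not a requirement stated in the application:

    # Illustrative channel sub-selection during training; zero-masking is an assumption of this sketch.
    import random
    import torch
    import torch.nn.functional as F

    def subselect_step(input_models, common, output_models, prediction, batch, labels, opt,
                       min_channels=1):
        """Randomly keep only a subset of channels for this iteration, so that the trained model
        remains predictive/consistent for any combination of input channels."""
        n = len(batch)
        keep = set(random.sample(range(n), random.randint(min_channels, n)))

        processed = []
        for i, (m, x) in enumerate(zip(input_models, batch)):
            p = m(x)
            processed.append(p if i in keep else torch.zeros_like(p))  # dropped channels are masked out

        z = common.encode(processed)
        expanded = common.decode(z).chunk(n, dim=-1)
        outputs = [m(e) for m, e in zip(output_models, expanded)]

        # Only the channels kept in this iteration contribute to the reconstruction target.
        recon = sum(F.mse_loss(outputs[i], batch[i]) for i in keep)
        pred = F.mse_loss(prediction(z), labels)

        loss = recon + pred
        opt.zero_grad()
        loss.backward()
        opt.step()
        return sorted(keep), float(loss.detach())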
It should be noted that the term "auto encoder" used in connection with the present modular auto encoder model may generally refer to one or more auto encoders and/or other auto encoders configured for partially supervised learning using latent space for parameter estimation.
According to an embodiment, a non-transitory computer-readable medium having instructions thereon is provided. The instructions are configured to cause a computer to execute a modular automatic encoder model for estimating a parameter of interest from a combination of available channels of measurement data from an optical metrology platform, by using a subset of a plurality of input models to estimate the amount of information content obtainable from the available channels. The instructions cause operations comprising: causing the plurality of input models to compress a plurality of inputs based on the available channels such that the plurality of inputs are adapted to be combined with one another; and causing a common model to combine the compressed inputs and generate low-dimensional data in a potential space based on the combined compressed inputs, the low-dimensional data estimating the obtainable amount of information content, and the low-dimensional data in the potential space being configured to be used by one or more additional models to generate approximations of the plurality of inputs and/or to estimate parameters based on the low-dimensional data.
In some embodiments, the instructions cause other operations comprising: training the modular automatic encoder model by: iteratively changing a subset of the compressed inputs to be combined by the common model and used to generate low-dimensional training data; comparing one or more training approximations and/or training parameters generated or predicted based on the low-dimensional training data to respective references; and adjusting one or more of the plurality of input models, the common model, and/or one or more of the additional models to reduce or minimize differences between the one or more training approximations and/or the training parameters and the reference based on the comparison; such that the common model is configured to combine the compressed inputs and generate the low-dimensional data for generating the approximations and/or estimated parameters, regardless of which of the plurality of inputs are combined by the common model.
In some embodiments, the variation of individual iterations is random, or the variation of individual iterations varies in a statistically significant manner.
In some embodiments, the variation of the individual iterations is configured such that after a target number of iterations, each of the compressed inputs has been included at least once in the subset of compressed inputs.
In some embodiments, iteratively changing the subset of compressed inputs combined by the common model and used to generate low-dimensional training data includes channel selection from among a set of possible available channels associated with the optical metrology platform.
In some embodiments, the iteratively changing, comparing, and adjusting are repeated until the target converges.
In some embodiments, the iteratively varying, comparing, and adjusting are configured to reduce or eliminate bias that may occur with a brute-force search over all channel combinations.
In some embodiments, the one or more additional models include: one or more output models configured to generate approximations of the one or more inputs; and a predictive model configured to estimate the parameters based on the low-dimensional data, and one or more of the plurality of input models, the common model, and/or the additional model is configured to be adjusted to reduce or minimize differences between one or more training approximations and/or training manufacturing process parameters and corresponding references.
In some embodiments, the plurality of input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of a manufacturing process and/or sensing operation such that each of the plurality of input models, the common model, and/or the one or more output models may be trained together and/or separately, but configured separately, based on the process physical properties of the respective portions of the manufacturing process and/or sensing operation, among other models in the modular automatic encoder model.
In some embodiments, the separate input model includes a neural network block comprising dense feed-forward layers, convolutional layers, and/or a residual network architecture of the modular automatic encoder model; and the common model comprises a neural network block comprising feed-forward layers and/or residual layers.
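For illustration only, a residual building block of the kind referred to above (a dense feed-forward block with a skip connection) may be sketched as follows; the dimensions and layer choices are assumptions:

    # Illustrative residual block; one possible realization of a "residual layer".
    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """A dense feed-forward block with a skip connection."""
        def __init__(self, dim, hidden):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

        def forward(self, x):
            return x + self.body(x)  # skip connection: output = input + learned correction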
According to another embodiment, a method is provided for estimating a parameter of interest from a combination of available channels of measurement data from an optical metrology platform, by using a subset of a plurality of input models of a modular automatic encoder model to estimate the amount of information content obtainable from the available channels. The method comprises: causing the plurality of input models to compress a plurality of inputs based on the available channels such that the plurality of inputs are adapted to be combined with one another; and causing a common model of the modular automatic encoder model to combine the compressed inputs and generate low-dimensional data in a potential space based on the combined compressed inputs, the low-dimensional data estimating the obtainable amount of information content, and the low-dimensional data in the potential space being configured to be used by one or more additional models to generate approximations of the plurality of inputs and/or to estimate parameters based on the low-dimensional data.
In some embodiments, the method further comprises training the modular automatic encoder model by: iteratively changing a subset of compressed inputs combined by the common model and used to generate low-dimensional training data; comparing one or more training approximations and/or training parameters generated or predicted based on the low-dimensional training data to respective references; and adjusting one or more of the plurality of input models, the common model, and/or one or more of the additional models to reduce or minimize differences between the one or more training approximations and/or the training parameters and the reference based on the comparison; such that the common model is configured to combine the compressed inputs and generate the low-dimensional data for generating the approximations and/or estimated parameters, regardless of which of the plurality of inputs are combined by the common model.
In some embodiments, the variation of individual iterations is random, or the variation of individual iterations varies in a statistically significant manner.
In some embodiments, the variation of the individual iterations is configured such that after a target number of iterations, each of the compressed inputs has been included at least once in the subset of compressed inputs.
In some embodiments, iteratively changing the subset of compressed inputs combined by the common model and used to generate low-dimensional training data includes channel selection from among a set of possible available channels associated with the optical metrology platform.
In some embodiments, the iteratively changing, comparing, and adjusting are repeated until the target converges.
In some embodiments, the iteratively varying, comparing, and adjusting are configured to reduce or eliminate bias that may occur with a brute-force search over all channel combinations.
In some embodiments, the one or more additional models include: one or more output models configured to generate approximations of the one or more inputs; and a predictive model configured to estimate the parameters based on the low-dimensional data, and one or more of the plurality of input models, the common model, and/or the additional model is configured to be adjusted to reduce or minimize differences between one or more training approximations and/or training manufacturing process parameters and corresponding references.
In some embodiments, the plurality of input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of a manufacturing process and/or sensing operation such that each of the plurality of input models, the common model, and/or the one or more output models may be trained together and/or separately, but configured separately, based on the process physical properties of the respective portions of the manufacturing process and/or sensing operation, among other models in the modular automatic encoder model.
In some embodiments, the separate input model includes: a neural network block comprising a dense feed-forward layer, convolutional layer, and/or residual network architecture of the modular automatic encoder model; and the common model comprises a neural network block comprising a feed-forward layer and/or a residual layer.
According to another embodiment, a system is provided for estimating a parameter of interest from a combination of available channels of measurement data from an optical metrology platform, by using a subset of a plurality of input models of a modular automatic encoder model to estimate the amount of information content obtainable from the available channels. The system comprises: the plurality of input models, configured to compress a plurality of inputs based on the available channels such that the plurality of inputs are adapted to be combined with each other; and a common model of the modular automatic encoder model, the common model configured to combine the compressed inputs and generate low-dimensional data in a potential space based on the combined compressed inputs, the low-dimensional data estimating the obtainable amount of information content, and the low-dimensional data in the potential space being configured to be used by one or more additional models to generate approximations of the plurality of inputs and/or to estimate parameters based on the low-dimensional data.
In some embodiments, the modular automatic encoder model is configured to be trained by: iteratively changing a subset of compressed inputs combined by the common model and used to generate low-dimensional training data; comparing one or more training approximations and/or training parameters generated or predicted based on the low-dimensional training data to respective references; and adjusting one or more of the plurality of input models, the common model, and/or one or more of the additional models to reduce or minimize differences between the one or more training approximations and/or the training parameters and the reference based on the comparison; such that the common model is configured to combine the compressed inputs and generate the low-dimensional data for generating the approximations and/or estimated parameters, regardless of which of the plurality of inputs are combined by the common model.
In some embodiments, the variation of individual iterations is random, or the variation of individual iterations varies in a statistically significant manner.
In some embodiments, the variation of the individual iterations is configured such that after a target number of iterations, each of the compressed inputs has been included at least once in the subset of compressed inputs.
In some embodiments, iteratively changing the subset of compressed inputs combined by the common model and used to generate low-dimensional training data includes channel selection from among a set of possible available channels associated with the optical metrology platform.
In some embodiments, the iteratively changing, comparing, and adjusting are repeated until the target converges.
In some embodiments, the iteratively varying, comparing, and adjusting are configured to reduce or eliminate bias that may occur with a brute-force search over all channel combinations.
In some embodiments, the one or more additional models include one or more output models configured to generate approximations of the one or more inputs, and a predictive model configured to estimate the parameters based on the low-dimensional data, and one or more of the plurality of input models, the common model, and/or the additional models are configured to be adjusted to reduce or minimize one or more training approximations and/or differences between the training manufacturing process parameters and the respective references.
In some embodiments, the plurality of input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of a manufacturing process and/or sensing operation such that each of the plurality of input models, the common model, and/or the one or more output models may be trained together and/or separately, but configured separately, based on the process physical properties of the respective portions of the manufacturing process and/or sensing operation, among other models in the modular automatic encoder model.
In some embodiments, the separate input model includes: a neural network block comprising a dense feed-forward layer, convolutional layer, and/or residual network architecture of the modular automatic encoder model; and the common model comprises a neural network block comprising a feed-forward layer and/or a residual layer.
According to another embodiment, a non-transitory computer-readable medium having instructions thereon configured to cause a computer to execute a modular automatic encoder model for parameter estimation is provided. The instructions cause operations comprising: causing a plurality of input models to compress a plurality of inputs such that the plurality of inputs are adapted to be combined with each other; and causing a common model to combine the compressed inputs and generate low-dimensional data in a potential space based on the combined compressed inputs, the low-dimensional data in the potential space configured to be used by one or more additional models to generate approximations of the one or more inputs and/or predict the parameters based on the low-dimensional data, wherein the common model is configured to combine the compressed inputs and generate the low-dimensional data regardless of which of the plurality of inputs are combined by the common model.
In some embodiments, the instructions cause other operations comprising: the modular automatic encoder is trained by: iteratively changing a subset of compressed inputs combined by the common model and used to generate low-dimensional training data; comparing one or more training approximations and/or training parameters generated or estimated based on the low-dimensional training data to corresponding references; and adjusting one or more of the plurality of input models, the common model, and/or the additional model to reduce or minimize the one or more training approximations and/or differences between the training parameters and the reference based on the comparison; such that the common model is configured to combine the compressed inputs and generate the low-dimensional data for use in generating the approximations and/or estimating process parameters, regardless of which of the plurality of inputs are combined by the common model.
In some embodiments, the variation of individual iterations is random, or the variation of individual iterations varies in a statistically significant manner. In some embodiments, the variation of the individual iterations is configured such that after a target number of iterations, each of the compressed inputs has been included at least once in the subset of compressed inputs.
In some embodiments, the one or more additional models include one or more output models configured to generate approximations of the one or more inputs, and a predictive model configured to estimate parameters based on the low-dimensional data, and adjust one or more of the plurality of input models, the common model, and/or the additional model based on the comparison to reduce or minimize the one or more training approximations and/or differences between the training parameters and the reference includes adjusting at least one output model and/or the predictive model.
In some embodiments, the plurality of input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of a manufacturing process and/or sensing operation such that each of the plurality of input models, the common model, and/or the one or more output models may be trained together and/or separately, but configured separately, based on the process physical properties of the respective portions of the manufacturing process and/or sensing operation, among other models in the modular automatic encoder model.
In some embodiments, iteratively changing the subset of compressed inputs combined by the common model and used to generate low-dimensional training data includes channel selection from among a set of possible channels associated with one or more aspects of a semiconductor manufacturing process and/or sensing operation.
In some embodiments, the iteratively changing, comparing, and adjusting are repeated until the target converges.
In some embodiments, the iteratively varying, comparing, and adjusting are configured to reduce or eliminate bias, relative to the bias that may occur with a brute-force search over all channel combinations.
In some embodiments, the parameter is a semiconductor manufacturing process parameter; the individual input models include a neural network block comprising dense feed-forward layer, convolutional layer, and/or residual network architecture of the modular automatic encoder model; and the common model comprises a neural network block comprising a feed-forward layer and/or a residual layer.
In semiconductor manufacturing, optical metrology can be used to measure critical stack parameters directly on a product (e.g., patterned wafer) structure. Machine learning methods are typically applied to optical scatterometry data acquired using a metrology platform. These machine learning methods are conceptually equivalent to supervised learning methods, i.e., learning from a labeled dataset. The success of such a method depends largely on the quality of the label. Typically, the labeled dataset is generated by measuring and labeling known targets on the wafer.
One of the main challenges in using targets in this way is the fact that the targets only provide very accurate relative labels. This means that, within one cluster of targets, the exact labels on the targets are known only up to some unknown cluster bias. Determining such unknown cluster biases, and thus obtaining absolute labels, is critical to the accuracy of the target-based recipe. The step of estimating the cluster bias is commonly referred to as label correction.
Advantageously, the present modular automatic encoder model is configured such that known properties of the input (e.g., domain knowledge) can be embedded into the model during the training phase, which reduces or eliminates any such bias in subsequent inferences made by the model. In other words, the present modular automatic encoder model is configured such that known (e.g., symmetry) properties of the input are embedded into the decoder portion of the model, and these embedded known properties allow the model to make unbiased inferences.
It should be noted that the term "auto encoder" used in connection with the present modular auto encoder model may generally refer to one or more auto encoders and/or other auto encoders configured for partially supervised learning using potential space for parameter estimation.
According to an embodiment, a non-transitory computer-readable medium having instructions thereon is provided. The instructions are configured to cause a computer to execute a modular automatic encoder model having an extended range of applicability for estimating a parameter of interest of an optical metrology operation by enforcing known properties of an input of the modular automatic encoder model in a decoder of the modular automatic encoder model. The instructions cause operations comprising: causing an encoder of the modular automatic encoder model to encode an input to produce a low-dimensional representation of the input in potential space; and causing the decoder of the modular automatic encoder model to generate an output corresponding to the input by decoding the low-dimensional representation. The decoder is configured to force known properties of the encoded input during decoding to produce the output. The known property is associated with a known physical relationship between the low-dimensional representation in the potential space and the output. A parameter of interest is estimated based on the output and/or the low-dimensional representation of the input in the potential space.
In some embodiments, enforcing includes using a penalty term in a cost function associated with the decoder to penalize differences between the output and an output that should be generated according to the known property.
In some embodiments, the penalty term comprises a difference between decoded versions of the low-dimensional representation of the input that are related to each other via a physical prior.
In some embodiments, the known property is a known symmetry property, and the penalty term includes a difference between decoded versions of the low-dimensional representation of the input that are reflected with respect to each other across or rotated about a point of symmetry.
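For illustration only, such a symmetry penalty may be sketched roughly as follows, reusing the assumed common model from the earlier sketches. The choice of which latent coordinate carries the symmetric parameter and how the reflection acts on the decoded signal are assumptions of this sketch, not statements about the application:

    # Illustrative symmetry penalty; one way to penalize decoded versions that should mirror each other.
    import torch
    import torch.nn.functional as F

    def symmetry_penalty(common, z, sym_index=0):
        """Penalize the decoder when reflecting the symmetric latent coordinate does not produce the
        correspondingly mirrored decoded signal."""
        sign = torch.ones_like(z)
        sign[:, sym_index] = -1.0
        z_reflected = z * sign  # reflect the latent coordinate carrying the symmetric parameter

        decoded = common.decode(z)
        decoded_reflected = common.decode(z_reflected)

        # For a signal with point symmetry, reflecting the parameter should mirror the decoded signal;
        # here the mirror is modeled as flipping the decoded signal along its last axis.
        mirrored = torch.flip(decoded, dims=[decoded.dim() - 1])
        return F.mse_loss(decoded_reflected, mirrored)

During training, this term would be added, with some weight, to the reconstruction and prediction losses.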
In some embodiments, the encoder and/or the decoder are configured to adjust based on any differences between the decoded versions of the low-dimensional representation, and adjusting includes adjusting at least one weight associated with a layer of the encoder and/or the decoder.
In some embodiments, the input includes a sensor signal associated with a sensing operation in a semiconductor manufacturing process, the low-dimensional representation of the input is a compressed representation of the sensor signal, and the output is an approximation of the input sensor signal.
In some embodiments, the sensor signal comprises a pupil image, and the encoded representation of the pupil image is configured for estimating overlay (as one example of many possible parameters of interest).
In some embodiments, the instructions cause other operations comprising: processing the input into a first-level dimension suitable for combination with other inputs by an input model of the modular automatic encoder model, and providing the processed input to the encoder; receiving an expanded version of the input from the decoder by an output model of the modular automatic encoder model, and generating an approximation of the input based on the expanded version; and estimating, by a predictive model of the modular automatic encoder model, a parameter of interest based on the low-dimensional representation of the input and/or the output (the output comprising and/or being related to the approximation of the input) in the potential space.
In some embodiments, the input model, the encoder/decoder, and the output model are separate from each other and correspond to process physical property differences in different portions of a manufacturing process and/or sensing operation such that each of the input model, the encoder/decoder, and/or the output model may be trained together and/or separately, but configured separately, based on the process physical properties of the respective portions of the manufacturing process and/or sensing operation, in addition to other models in the modular automatic encoder model.
In some embodiments, the decoder is configured to enforce known symmetry properties of the encoded input during a training phase such that the modular automatic encoder model complies with the enforced known symmetry properties during an inference phase.
In some embodiments, a method is provided for estimating a parameter of interest of an optical metrology operation by a modular automatic encoder model having an extended range of applicability by enforcing known properties of an input of the modular automatic encoder model in a decoder of the modular automatic encoder model. The method comprises the following steps: causing an encoder of the modular automatic encoder model to encode an input to produce a low-dimensional representation of the input in potential space; and causing the decoder of the modular automatic encoder model to generate an output corresponding to the input by decoding the low-dimensional representation. The decoder is configured to force known properties of the encoded input during decoding to produce the output. The known property is associated with a known physical relationship between the low-dimensional representation in the potential space and the output. A parameter of interest is estimated based on the output and/or the low-dimensional representation of the input in the potential space.
In some embodiments, enforcing includes using a penalty term in a cost function associated with the decoder to penalize differences between the output and an output that should be generated according to the known property.
In some embodiments, the penalty term comprises a difference between decoded versions of the low-dimensional representation of the input that are related to each other via a physical prior.
In some embodiments, the known property is a known symmetry property, and the penalty term includes a difference between decoded versions of the low-dimensional representation of the input that are reflected with respect to each other across or rotated about a point of symmetry.
In some embodiments, the encoder and/or the decoder are configured to adjust based on any differences between the decoded versions of the low-dimensional representation, and adjusting includes adjusting at least one weight associated with a layer of the encoder and/or the decoder.
In some embodiments, the input includes a sensor signal associated with a sensing operation in a semiconductor manufacturing process, the low-dimensional representation of the input is a compressed representation of the sensor signal, and the output is an approximation of the input sensor signal.
In some embodiments, the sensor signal comprises a pupil image, and the encoded representation of the pupil image is configured for estimating overlay (as one example of many possible parameters of interest).
In some embodiments, the method further comprises processing the input into a first-level dimension suitable for combination with other inputs by an input model of the modular automatic encoder model, and providing the processed input to the encoder; receiving an expanded version of the input from the decoder by an output model of the modular automatic encoder model, and generating an approximation of the input based on the expanded version; and estimating, by a predictive model of the modular automatic encoder model, a parameter of interest based on the low-dimensional representation of the input and/or the output (the output comprising and/or being related to the approximation of the input) in the potential space.
In some embodiments, the input model, the encoder/decoder, and the output model are separate from each other and correspond to process physical property differences in different portions of a manufacturing process and/or sensing operation, such that each of the input model, the encoder/decoder, and/or the output model may be trained together with and/or separately from the other models in the modular automatic encoder model, but configured individually based on the process physical properties of the respective portion of the manufacturing process and/or sensing operation.
In some embodiments, the decoder is configured to enforce known symmetry properties of the encoded input during a training phase such that the modular automatic encoder model complies with the enforced known symmetry properties during an inference phase.
According to another embodiment, a system is provided that is configured to execute a modular automatic encoder model with an extended range of applicability for estimating a parameter of interest of an optical metrology operation by enforcing known properties of an input of the modular automatic encoder model in a decoder of the modular automatic encoder model. The system comprises: an encoder of the modular automatic encoder model, the encoder configured to encode an input to produce a low-dimensional representation of the input in potential space; and the decoder of the modular automatic encoder model, the decoder configured to generate an output corresponding to the input by decoding the low-dimensional representation. The decoder is configured to force known properties of the encoded input during decoding to produce the output. The known property is associated with a known physical relationship between the low-dimensional representation in the potential space and the output. A parameter of interest is estimated based on the output and/or the low-dimensional representation of the input in the potential space.
In some embodiments, enforcing includes using a penalty term in a cost function associated with the decoder to penalize differences between the output and an output that should be generated according to the known property.
In some embodiments, the penalty term comprises a difference between decoded versions of the low-dimensional representation of the input that are related to each other via a physical prior.
In some embodiments, the known property is a known symmetry property, and the penalty term includes a difference between decoded versions of the low-dimensional representation of the input that are reflected with respect to each other across or rotated about a point of symmetry.
In some embodiments, the encoder and/or the decoder are configured to adjust based on any differences between the decoded versions of the low-dimensional representation, and adjusting includes adjusting at least one weight associated with a layer of the encoder and/or the decoder.
In some embodiments, the input includes a sensor signal associated with a sensing operation in a semiconductor manufacturing process, the low-dimensional representation of the input is a compressed representation of the sensor signal, and the output is an approximation of the input sensor signal.
In some embodiments, the sensor signal comprises a pupil image, and the encoded representation of the pupil image is configured for estimating overlay (as one example of many possible parameters of interest).
In some embodiments, the system further comprises an input model of the modular automatic encoder model, the input model configured to process the input into a first-level dimension suitable for combination with other inputs, and to provide the processed input to the encoder; an output model of the modular automatic encoder model, the output model configured to receive an expanded version of the input from the decoder and to generate an approximation of the input based on the expanded version; and a predictive model of the modular automatic encoder model, the predictive model configured to estimate a parameter of interest based on the low-dimensional representation of the input in the potential space.
In some embodiments, the input model, the encoder/decoder, and the output model are separate from each other and correspond to process physical property differences in different portions of a manufacturing process and/or sensing operation, such that each of the input model, the encoder/decoder, and/or the output model may be trained together with and/or separately from the other models in the modular automatic encoder model, but configured individually based on the process physical properties of the respective portion of the manufacturing process and/or sensing operation.
In some embodiments, the decoder is configured to enforce known symmetry properties of the encoded input during a training phase such that the modular automatic encoder model complies with the enforced known symmetry properties during an inference phase.
In some embodiments, a non-transitory computer-readable medium having instructions thereon configured to cause a computer to execute a modular auto-encoder model configured to generate an output based on an input is provided. The instructions cause operations comprising: causing an encoder of the modular automatic encoder model to encode the input to produce a low-dimensional representation of the input in potential space; and causing a decoder of the modular automatic encoder model to generate the output by decoding the low-dimensional representation. The decoder is configured to force, during decoding, a known property of the encoded input to produce the output, the known property being associated with a known physical relationship between the low-dimensional representation in the potential space and the output.
In some embodiments, enforcing includes using a penalty term in a cost function associated with the decoder to penalize differences between the output and an output that should be generated according to the known property.
In some embodiments, the penalty term comprises a difference between decoded versions of the low-dimensional representation of the input that are related to each other via a physical prior.
In some embodiments, the encoder and/or the decoder are configured to adjust based on any differences between the decoded versions of the low-dimensional representation, and adjusting includes adjusting at least one weight associated with a layer of the encoder and/or the decoder.
In some embodiments, the input includes a sensor signal associated with a sensing operation in a semiconductor manufacturing process, the low-dimensional representation of the input is a compressed representation of the sensor signal, and the output is an approximation of the input sensor signal.
In some embodiments, the sensor signal comprises a pupil image, and the encoded representation of the pupil image is configured for estimating overlay (as one example of many possible parameters of interest).
In some embodiments, the modular automatic encoder model further comprises: an input model configured to process the input into a first-level dimension suitable for combination with other inputs and to provide the processed input to the encoder; an output model configured to receive an expanded version of the input from the decoder and to generate the approximation of the input based on the expanded version; and a predictive model configured to estimate a manufacturing process parameter based on the low-dimensional representation of the input in the potential space.
In some embodiments, the parameter is a semiconductor manufacturing process parameter; the input model includes a neural network block including dense feed-forward layer, convolutional layer, and/or residual network architecture of the modular automatic encoder model; the encoder and/or decoder comprises a neural network block comprising a feed-forward layer and/or a residual layer; and the prediction model comprises a neural network block comprising a feed-forward layer and/or a residual layer.
In some embodiments, the input model, the encoder/decoder, and the output model are separate from each other and correspond to process physical property differences in different portions of a manufacturing process and/or sensing operation, such that each of the input model, the encoder/decoder, and/or the output model may be trained together with and/or separately from the other models in the modular automatic encoder model, but configured individually based on the process physical properties of the respective portion of the manufacturing process and/or sensing operation.
In some embodiments, the decoder is configured to enforce known symmetry properties of the encoded input during a training phase such that the modular automatic encoder model complies with the enforced known symmetry properties during an inference phase.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate one or more embodiments and, together with the description, explain these embodiments. Embodiments of the invention will now be described, by way of example only, with reference to the accompanying schematic drawings in which corresponding reference symbols indicate corresponding parts, and in which:
FIG. 1 depicts a schematic overview of a lithographic apparatus according to an embodiment.
FIG. 2 depicts a schematic overview of a lithographic unit according to an embodiment.
Fig. 3 depicts a schematic representation of holistic lithography showing cooperation between three techniques for optimizing semiconductor manufacturing, according to an embodiment.
FIG. 4 illustrates an exemplary metrology apparatus, such as a scatterometer, in accordance with an embodiment.
Fig. 5 illustrates an encoder-decoder architecture according to an embodiment.
Fig. 6 illustrates an encoder-decoder architecture within a neural network, according to an embodiment.
Fig. 7 illustrates an embodiment of the present modular automatic encoder model, according to an embodiment.
FIG. 8 illustrates an output model of a modular automatic encoder model including two or more sub-models, according to an embodiment.
Fig. 9 illustrates an embodiment of a modular automatic encoder model that may be used during parameter inference (e.g., estimation and/or prediction) in accordance with an embodiment.
FIG. 10 illustrates how a modular automatic encoder model is configured to estimate a parameter of interest using a combination of available channels of measurement data from one or more sensing (e.g., optical metrology and/or other sensing) platforms, the combination being selected, using a subset of the multiple input models, based on an estimate of the information content the available channels provide for estimating the parameter of interest, according to an embodiment.
Fig. 11 illustrates a common model, an output model (in this example, a neural network block corresponding to each input channel) and other components of a modular automatic encoder model according to an embodiment.
FIG. 12 illustrates a graphical interpretation of forcing known properties of an encoded input to produce an output, according to an embodiment.
Fig. 13 illustrates an application of a modular automatic encoder model for semi-supervised learning, according to an embodiment.
FIG. 14 illustrates how the modular automatic encoder model is configured to include a recursive deep learning automatic encoder structure in some embodiments.
FIG. 15 also illustrates how in some embodiments the modular automatic encoder model is configured to include a recursive deep learning automatic encoder structure.
Fig. 16 illustrates a method for parameter estimation according to an embodiment.
FIG. 17 is a block diagram of an exemplary computer system, according to an embodiment.
FIG. 18 is an alternative design of the lithographic apparatus of FIG. 1, according to an embodiment.
Detailed Description
As described above, the automatic encoder may be configured for metrology, for parameter inference, and/or for other solutions for other purposes. Such a deep learning model architecture is generic and scalable to arbitrary sizes and complexities. The automatic encoder is configured to compress a high-dimensional signal (e.g., a pupil image in a semiconductor metrology platform) to an efficient low-dimensional representation of the same signal. Parameter inference (i.e., regression) is then performed from the low-dimensional representation against a set of known labels. By compressing the signal first, the inference problem is significantly simplified compared to performing regression directly on the high-dimensional signal.
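Purely as an illustration of this compress-then-infer pattern (and not an implementation of the model disclosed herein), a minimal PyTorch-style sketch is given below; the layer widths, the pupil dimension, and the single linear regression head are assumptions chosen for brevity.

import torch
import torch.nn as nn

class CompressThenInfer(nn.Module):
    """Toy autoencoder with a regression head on the low-dimensional code."""
    def __init__(self, pupil_dim=1024, latent_dim=8, n_params=1):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(pupil_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, pupil_dim))
        self.regressor = nn.Linear(latent_dim, n_params)  # inference from the compressed code

    def forward(self, x):
        z = self.encoder(x)        # compress the high-dimensional signal
        x_hat = self.decoder(z)    # reconstruct an approximation of the input
        y_hat = self.regressor(z)  # estimate the parameter of interest from z
        return x_hat, y_hat, z

Training such a sketch would typically minimize a reconstruction loss between x_hat and x plus a regression loss between y_hat and the known labels.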
However, it is often difficult to understand the information flow inside a typical automatic encoder. Information at the input, at the level or stage of the compressed low-dimensional representation, and at the output can be deduced. Information between these points cannot be easily interpreted.
Data-driven inference methods have been proposed for semiconductor metrology operations and for the task of parameter estimation. Data-driven inference methods rely on a large collection of measurements and on models that map measured features to parameters of interest, where labels for these parameters are obtained via carefully designed targets on the wafer or from third-party measurements. Current methods are capable of measuring a significant number of channels (multiple wavelengths, observations at multiple wafer rotations, four light polarization schemes, etc.). However, due to practical timing constraints, the number of channels needs to be limited to a subset of those available for producing the measurements. To select the best channels, a brute-force approach that tests all possible channel combinations is typically used. This is time consuming and results in longer measurement and/or process recipe generation times. In addition, brute-force methods may be prone to over-fitting, may introduce a different bias per channel, and/or may have other drawbacks.
In semiconductor manufacturing, optical metrology can be used to measure critical stack parameters directly on a product (e.g., patterned wafer) structure. Machine learning methods are typically applied to optical scatterometry data acquired using a metrology platform. These machine learning methods are conceptually equivalent to supervised learning methods, i.e., learning from a labeled dataset. The success of such a method depends largely on the quality of the labels. Typically, the labeled dataset is generated by measuring and labeling known targets on the wafer. One of the main challenges in using targets in this way is the fact that the targets only provide very accurate relative labels. This means that, within one cluster of targets, the labels relative to each other are known exactly, but each cluster carries some unknown cluster bias. Determining such unknown cluster bias, and thus obtaining absolute labels, is critical to the accuracy of a target-based recipe. The step of estimating the cluster bias is commonly referred to as label correction.
Compared with a traditional monolithic automatic encoder model, the present modular automatic encoder model is less rigid. The present modular automatic encoder model has a greater number of trainable and/or otherwise adjustable components. The modularity of the present model makes it easier to interpret, define and expand. The complexity of the present model is high enough to model the process that generates the data provided to the model, but low enough to avoid modeling noise or other unwanted characteristics (e.g., the present model is configured to avoid overfitting the data provided). Since the process (or at least aspects of the process) that generates the data is often unknown, selecting the appropriate network complexity typically involves some intuition and trial-and-error. For this reason, it is particularly desirable to provide a model architecture that is modular, easy to understand, and easy to scale up and down in complexity.
In addition, the present modular automatic encoder model is configured to estimate the parameter of interest using a combination of available channels of measurement data from the optical metrology platform, the combination being selected, using a subset of the plurality of input models, based on an estimate of the information content the available channels provide. The present model is configured to be trained by randomly or otherwise iteratively varying (e.g., sub-selecting) the number of channels used to approximate the input during the iterative training steps. This iterative variation/sub-selection ensures that the model remains predictive/consistent for any combination of input channels. Furthermore, since the information content present in the input represents all channels (e.g., since each channel is part of the selected subset of channels for at least one training iteration), the resulting model will not include a bias specific to one particular channel.
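As an illustration only, one way such channel sub-selection during training could look is sketched below; the dictionary-of-channels interface, the equal weighting of the loss terms, and the assumption that the model accepts an arbitrary subset of channels are hypothetical choices made for this sketch.

import random
import torch.nn.functional as F

def training_step(model, optimizer, batch_by_channel, labels):
    """One training iteration using a randomly sub-selected set of channels."""
    channels = list(batch_by_channel.keys())
    k = random.randint(1, len(channels))   # vary the number of channels per iteration
    subset = random.sample(channels, k)    # sub-select which channels are used

    inputs = {c: batch_by_channel[c] for c in subset}
    recon, y_hat = model(inputs)           # model must handle any channel combination

    loss = sum(F.mse_loss(recon[c], inputs[c]) for c in subset)  # reconstruction terms
    loss = loss + F.mse_loss(y_hat, labels)                      # parameter-estimation term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Because every channel appears in the selected subset over the course of many iterations, no single channel dominates the learned representation.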
The present modular automatic encoder model is also configured such that known properties of the input (e.g., domain knowledge) can be embedded into the model during the training phase, which reduces or eliminates (e.g., cluster) bias in subsequent inferences made through the model. In other words, the present modular automatic encoder is configured such that known (e.g., symmetric) properties of the input are embedded into the decoding portion of the model, and these embedded known properties allow the model to make unbiased inferences.
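For illustration, a known symmetry could be embedded into the decoding portion through an extra loss term along the following lines; the assumption that negating a latent coordinate should yield the mirrored decoded signal, and the hypothetical mirror function, are choices made only for this sketch.

import torch.nn.functional as F

def symmetry_penalty(decoder, z, mirror):
    """Penalize deviations from a known symmetry of the decoded signal.

    Sketch assumption: decoding the negated latent code -z should equal the
    mirrored decoding of z, i.e. decoder(-z) == mirror(decoder(z)), where
    mirror is a hypothetical reflection about the point of symmetry.
    """
    return F.mse_loss(decoder(-z), mirror(decoder(z)))

# During training, e.g.: total_loss = reconstruction_loss + lam * symmetry_penalty(decoder, z, mirror)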
It should be noted that the term "auto encoder" as used in connection with the present modular auto encoder model may generally refer to one or more auto encoders, or one or more portions of an auto encoder, that are configured for partially supervised learning using potential space for parameter estimation and/or other operations. In addition, the drawbacks described above (e.g., of previous systems) and the advantages of the present modular automatic encoder model are examples of many other possible drawbacks and advantages, and should not be considered limiting.
Finally, while specific reference may be made herein to the fabrication of integrated circuits, the description herein has many other possible applications. For example, the description may be used to fabricate integrated optical systems, guidance and detection patterns for magnetic domain memories, liquid crystal display panels, thin film magnetic heads, and the like. In these alternative applications, it will be appreciated by those skilled in the art that any use of the terms "reticle," "wafer," or "die" herein should be considered interchangeable with the more generic terms "mask," "substrate," and "target portion," respectively, in the context of these alternative applications. In addition, it should be noted that the methods described herein may have many other possible applications in a variety of fields, such as language processing systems, autonomous driving, medical imaging and diagnostics, semantic segmentation, denoising, chip design, electronic design automation, and the like. The method can be applied to any field where it is advantageous to quantify uncertainty in machine learning model predictions.
In this document, the terms "radiation" and "beam" are used to encompass all types of electromagnetic radiation, including ultraviolet radiation (e.g., where the wavelength is 365nm, 248nm, 193nm, 157nm, or 126 nm) and EUV (extreme ultra-violet radiation, e.g., having a wavelength in the range of about 5nm to 100 nm).
The patterning device may include or may form one or more design layouts. CAD (computer aided design) programs can be utilized to generate the design layout. This process is often referred to as EDA (electronic design automation). Most CAD programs follow a set of predetermined design rules in order to produce a functional design layout/patterning device. These rules are set based on processing and design constraints. For example, design rules define the spatial tolerances between devices (such as gates, capacitors, etc.) or interconnect lines to ensure that the devices or lines do not interact with each other in an undesirable manner. One or more of the design rule limits may be referred to as a "critical dimension" (CD). The critical dimension of a device may be defined as the minimum width of a line or hole or the minimum space between two lines or holes. Thus, CD adjusts the overall size and density of the designed device. One of the goals in device fabrication is to faithfully reproduce the original design intent (via patterning means) on the substrate.
The terms "reticle," "mask," or "patterning device" as used herein may be broadly interpreted as referring to a generic patterning device that can be used to impart an incoming radiation beam with a patterned cross-section that corresponds to a pattern to be created in a target portion of the substrate. The term "light valve" may also be used herein. Examples of other such patterning devices include programmable mirror arrays, in addition to classical masks (transmissive or reflective, binary, phase-shifted, hybrid, etc.).
As a brief introduction, FIG. 1 schematically depicts a lithographic apparatus LA. The lithographic apparatus LA comprises: an illumination system (also referred to as an illuminator) IL configured to condition a radiation beam B (e.g. UV radiation, DUV radiation or EUV radiation); a mask support (e.g. a mask table) MT configured to support a patterning device (e.g. a mask) MA and connected to a first positioner PM configured to accurately position the patterning device MA in accordance with certain parameters; a substrate support (e.g., a wafer table) WT configured to hold a substrate (e.g., a resist-coated wafer) W and coupled to a second positioner PW configured to accurately position the substrate support in accordance with certain parameters; and a projection system (e.g., a refractive projection lens system) PS configured to project a pattern imparted to the radiation beam B by patterning device MA onto a target portion C (e.g., comprising one or more dies) of the substrate W.
In operation, the illumination system IL receives a radiation beam from a radiation source SO, for example via a beam delivery system BD. The illumination system IL may include various types of optical components, such as refractive, reflective, magnetic, electromagnetic, electrostatic and/or other types of optical components, or any combination thereof, for directing, shaping, and/or controlling radiation. The illuminator IL may be used to condition the radiation beam B to have a desired spatial and angular intensity distribution in its cross-section at the plane of the patterning device MA.
The term "projection system" PS used herein should be broadly interpreted as encompassing various types of projection system, including refractive, reflective, catadioptric, anamorphic, magnetic, electromagnetic and/or electrostatic optical systems, or any combination thereof, as appropriate for the exposure radiation being used, and/or for other factors such as the use of an immersion liquid or the use of a vacuum. Any use of the term "projection lens" herein may be considered as synonymous with the more general term "projection system" PS.
The lithographic apparatus LA may be of a type wherein at least a portion of the substrate may be covered by a liquid having a relatively high refractive index, e.g. water, so as to fill a space between the projection system PS and the substrate W, which is also referred to as immersion lithography. Further information about immersion techniques is given in US6952253, which is incorporated herein by reference.
The lithographic apparatus LA may also be of a type having two or more substrate supports WT (also referred to as "dual stage"). In such a "multi-stage" machine, the substrate supports WT may be used in parallel, and/or steps in preparation of a subsequent exposure of a substrate W may be carried out on the substrate W located on one of the substrate supports WT while another substrate W on the other substrate support WT is being used for exposing a pattern on that other substrate W.
In addition to the substrate support WT, the lithographic apparatus LA may also comprise a measurement table. The measurement table is arranged to hold the sensor and/or the cleaning device. The sensor may be arranged to measure a property of the projection system PS or a property of the radiation beam B. The measurement table may hold a plurality of sensors. The cleaning device may be arranged to clean a part of the lithographic apparatus, for example a part of the projection system PS or a part of the system providing the immersion liquid. The measurement table may be moved under the projection system PS when the substrate support WT is away from the projection system PS.
In operation, the radiation beam B is incident on the patterning device (e.g., mask) MA, which is held on the mask support MT, and is patterned by a pattern (design layout) present on the patterning device MA. Having traversed the mask MA, the radiation beam B passes through the projection system PS, which focuses the beam onto a target portion C of the substrate W. By means of the second positioner PW and position measurement system IF, the substrate support WT can be moved accurately, e.g. so as to position different target portions C in the path of the radiation beam B at focus and alignment positions. Similarly, the first positioner PM and possibly another position sensor (which is not explicitly depicted in fig. 1) can be used to accurately position the patterning device MA with respect to the path of the radiation beam B. The patterning device MA and the substrate W may be aligned using the mask alignment marks M1, M2 and the substrate alignment marks P1, P2. Although the substrate alignment marks P1, P2 occupy dedicated target portions as illustrated, the substrate alignment marks P1, P2 may be located in spaces between target portions. When the substrate alignment marks P1, P2 are located between the target portions C, the substrate alignment marks P1, P2 are referred to as scribe-lane alignment marks.
Fig. 2 depicts a schematic overview of a lithography unit LC. As shown in fig. 2, the lithographic apparatus LA may form part of a lithographic cell LC, sometimes also referred to as a lithographic cell or (lithography element) cluster, which often also comprises an apparatus for performing pre-exposure and post-exposure processes on the substrate W. Conventionally, these devices include a spin coater SC configured to deposit a resist layer, a developer DE to develop the exposed resist, and a chill plate CH and a bake plate BK for conditioning, for example, the temperature of the substrate W (e.g., for conditioning the solvent in the resist layer). A substrate transport apparatus, or robot, RO picks up substrates W from the input/output ports I/O1, I/O2, moves the substrates between the different process devices, and delivers the substrates W to the loading bay LB of the lithographic apparatus LA. The devices in the lithography unit, which are also often referred to collectively as a track or coating and development system, are typically under the control of a track or coating and development system control unit TCU, which may itself be controlled by a supervisory control system SCS, which may also control the lithographic apparatus LA, for example via a lithography control unit LACU.
In order for the substrates W (fig. 1) exposed by the lithographic apparatus LA to be exposed correctly and consistently, it is desirable to inspect the substrates to measure properties of the patterned structures, such as overlay error between subsequent layers, line thickness, critical dimension (CD), etc. For this purpose, an inspection tool (not shown) may be included in the lithography unit LC. If errors are detected, adjustments may be made, for example, to the exposure of subsequent substrates or to other processing steps to be performed on the substrates W, especially if the inspection is performed before other substrates W of the same batch or lot still remain to be exposed or processed.
An inspection apparatus, which may also be referred to as a metrology apparatus, is used to determine the properties of the substrate W (fig. 1), and in particular how the properties of different substrates W change or how properties associated with different layers of the same substrate W change between different layers. The inspection apparatus is alternatively configured to identify defects on the substrate W and may be, for example, part of the lithographic cell LC, or may be integrated into the lithographic apparatus LA, or may even be a separate device. The inspection apparatus may measure properties on the latent image (the image in the resist layer after exposure), or on the semi-latent image (the image in the resist layer after the post-exposure bake step PEB), or on the developed resist image (where the exposed or unexposed portions of the resist have been removed) or even on the etched image (after a pattern transfer step such as etching).
Fig. 3 depicts a schematic representation of holistic lithography, which represents the collaboration between three techniques for optimizing semiconductor fabrication. In general, the patterning process in the lithographic apparatus LA is one of the most critical steps in the processing, requiring high accuracy in the sizing and placement of structures on the substrate W (fig. 1). To ensure such high accuracy, three systems (in this example) may be combined in a so-called "holistic" control environment, as schematically depicted in fig. 3. One of these systems is a lithographic apparatus LA that is (virtually) connected to a metrology apparatus (e.g. a metrology tool) MT (second system) and to a computer system CL (third system). The "holistic" environment may be configured to optimize the cooperation between the three systems to enhance the overall process window and provide a tight control loop to ensure that the patterning performed by the lithographic apparatus LA remains within the process window. The process window defines a range of process parameters (e.g., dose, focus, overlay) within which a particular manufacturing process produces defined results (e.g., functional semiconductor devices) - it is the range within which the process parameters of the lithographic process or patterning process are typically allowed to vary.
The computer system CL can use (part of) the design layout to be patterned to predict which resolution enhancement techniques to use and to perform computational lithography simulations and calculations to determine which mask layout and lithographic apparatus settings achieve the largest overall process window for the patterning process (depicted in fig. 3 by the double arrow in the first scale SC 1). Typically, resolution enhancement techniques are arranged to match patterning possibilities of the lithographic apparatus LA. The computer system CL may also be used to detect (e.g., using input from the metrology tool MT) where the lithographic apparatus LA is currently operating within the process window to predict whether a defect is likely to exist due to, for example, sub-optimal processing (depicted in fig. 3 by the arrow pointing to "0" in the second scale SC 2).
The metrology apparatus (tool) MT may provide input to the computer system CL to enable accurate simulation and prediction, and may provide feedback to the lithographic apparatus LA to identify, for example, possible drift in the calibration state of the lithographic apparatus LA (depicted in fig. 3 by the plurality of arrows in the third scale SC 3).
In a lithographic process, it is desirable to frequently measure the resulting structure, for example, for process control and verification. The means for making such measurements include a metrology tool (device) MT. Different types of metrology tools MT for making such measurements are known, including scanning electron microscopes or various forms of scatterometry metrology tools MT. Scatterometers are multifunctional instruments that allow measurement of parameters of a lithographic process by having a sensor in the pupil or in a plane conjugate to the pupil of the objective lens of the scatterometer, typically referred to as pupil-based measurement, or by having a sensor in the image plane or in a plane conjugate to the image plane, in which case the measurement is typically referred to as image- or field-based measurement. Such scatterometers and associated measurement techniques are further described in patent applications US20100328655, US2011102753A1, US20120044470A, US20110249244, US20110026032, or EP1,628,164A, which are incorporated herein by reference in their entirety. For example, the aforementioned scatterometers may use light from the soft x-ray and visible to near-IR wavelength ranges to measure features of the substrate, such as gratings.
In some embodiments, the scatterometer MT is an angle resolved scatterometer. In these embodiments, a scatterometer reconstruction method may be applied to the measurement signal to reconstruct or calculate the properties of the grating and/or other features in the substrate. Such reconstruction may be performed, for example, by simulating the interaction of the scattered radiation with a mathematical model of the target structure and comparing the simulation results with the measurement results. The parameters of the mathematical model are adjusted until the simulated interactions produce a diffraction pattern similar to that observed from a real target.
In some embodiments, the scatterometer MT is a spectroscopic scatterometer MT. In these embodiments, the spectroscopic scatterometer MT may be configured such that radiation emitted by the radiation source is directed onto a target feature of the substrate and the reflected or scattered radiation from the target is directed to a spectrometer detector that measures the spectrum of the specularly reflected radiation (i.e. a measurement of intensity as a function of wavelength). From such data, the structure or profile of the target that produced the detected spectrum can be reconstructed, for example, by rigorous coupled wave analysis and nonlinear regression or by comparison with a library of simulated spectra.
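To illustrate the reconstruction-by-regression idea in the abstract (this is not the metrology platform's actual algorithm), a sketch is given below; simulate_spectrum stands for a hypothetical forward model (e.g., based on rigorous coupled wave analysis), and the content of the parameter vector is an assumption.

from scipy.optimize import least_squares

def reconstruct_profile(measured_spectrum, wavelengths, simulate_spectrum, p0):
    """Fit structural parameters so a simulated spectrum matches the measured one.

    simulate_spectrum(params, wavelengths) -> modeled intensity versus wavelength
    (hypothetical forward model); p0 is an initial guess for the parameters.
    """
    def residuals(params):
        return simulate_spectrum(params, wavelengths) - measured_spectrum

    fit = least_squares(residuals, x0=p0)  # nonlinear regression on the profile parameters
    return fit.x                           # e.g. estimates of CD, height, sidewall angle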
In some embodiments, the scatterometer MT is an ellipsometric scatterometer. Ellipsometry scatterometers allow the parameters of a lithographic process to be determined by measuring the scattered radiation for each polarization state. Such a metrology device (MT) emits polarized light (such as linear, circular or elliptical) by using, for example, suitable polarization filters in the illumination section of the metrology device. Sources suitable for metrology equipment may also provide polarized radiation. Various embodiments of existing ellipsometric scatterometers are described in U.S. patent applications 11/451,599, 11/708,678, 12/256,780, 12/486,449, 12/920,968, 12/922,587, 13/000,229, 13/033,135, 13/533,110, and 13/891,410, which are incorporated herein by reference in their entirety.
In some embodiments, the scatterometer MT is adapted to measure the overlay of two misaligned gratings or periodic structures (and/or other target features of the substrate) by measuring the reflectance spectrum and/or detecting an asymmetry in the configuration, which is related to the degree of overlay. Two (typically stacked) grating structures may be applied in two different layers (not necessarily consecutive layers) and may be formed at substantially the same location on the wafer. The scatterometer may have a symmetric detection configuration as described, for example, in patent application EP1,628,164A, such that any asymmetry is clearly distinguishable. This provides a way to measure misalignment in the gratings. Further examples of measuring overlay may be found in PCT patent application publication No. WO 2011/012624 or US patent application US 20160161863, which are incorporated herein by reference in their entirety.
Other parameters of interest may be focus and dose. The focus and dose may be determined simultaneously by scatterometry (or alternatively by scanning electron microscopy) as described in U.S. patent application 2011-0249244, which is incorporated herein by reference in its entirety. A single structure (e.g., a feature in a substrate) may be used that has a unique combination of critical dimension and sidewall angle measurements for each point in a focus energy matrix (FEM, also referred to as a focus exposure matrix). If these unique combinations of critical dimension and sidewall angle are available, the focus and dose values can be uniquely determined from these measurements.
The metrology target may be a set of composite gratings and/or other features in the substrate, which are typically formed in the resist by a lithographic process, but may also be formed, for example, after an etching process. In some embodiments, one or more sets of targets may be clustered at different locations around the wafer. In general, the pitch and linewidth of the structures in the grating depend on the measurement optics (especially the NA of the optics) in order to be able to capture the diffraction orders from the measurement target. The diffraction signal may be used to determine a shift (also referred to as "overlay") between the two layers or may be used to reconstruct at least a portion of the original grating as produced by the lithographic process. Such reconstruction may be used to provide guidance on the quality of the lithographic process and may be used to control at least a portion of the lithographic process. The target may have smaller subsections that are configured to mimic the dimensions of the functional portions of the design layout in the target. Due to such sub-segmentation, the target will behave more like the functional part of the design layout, so that the overall process parameter measurement better resembles that of the functional part of the design layout. The target may be measured in an underfill mode or in an overfill mode. In the underfill mode, the measurement beam produces a spot that is smaller than the overall target. In the overfill mode, the measurement beam produces a spot that is larger than the overall target. In such an overfill mode, it is also possible to measure different targets simultaneously, thus determining different process parameters simultaneously.
The overall measurement quality of a lithographic parameter using a particular target is determined, at least in part, by the measurement recipe used to measure such a lithographic parameter. The term "substrate measurement recipe" may include one or more parameters of the measurement itself, one or more parameters of the one or more patterns measured, or both. For example, if the measurement used in a substrate measurement recipe is diffraction-based optical metrology, one or more of the measurement parameters may include the wavelength of the radiation, the polarization of the radiation, the angle of incidence of the radiation with respect to the substrate, the orientation of the radiation with respect to the pattern on the substrate, and so forth. One of the criteria for selecting a measurement recipe may be, for example, the sensitivity of one of the measurement parameters to process variations. Further examples are described in U.S. patent application 2016-0161863 and published U.S. patent application 2016/0370717A1, which are incorporated herein by reference in their entirety.
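As a purely illustrative sketch (not any actual recipe format), such measurement-side recipe parameters could be bundled in a simple data structure; every field name and default value below is invented for the example.

from dataclasses import dataclass, field

@dataclass
class SubstrateMeasurementRecipe:
    """Hypothetical container for measurement-side parameters of a recipe."""
    wavelength_nm: float = 532.0           # wavelength of the measurement radiation
    polarization: str = "H"                # polarization of the radiation
    angle_of_incidence_deg: float = 70.0   # angle of incidence relative to the substrate
    target_orientation_deg: float = 0.0    # orientation of the radiation relative to the pattern
    target_ids: list = field(default_factory=list)  # identifiers of the measured patterns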
Fig. 4 illustrates an exemplary metrology device (tool or platform) MT, such as a scatterometer. The metrology device MT includes a broadband (white light) radiation projector 40 that projects radiation onto a substrate 42. The reflected or scattered radiation is passed to a spectrometer detector 44, which measures the spectrum 46 of the specularly reflected radiation (i.e. a measurement of intensity as a function of wavelength). From this data, the structure or profile giving rise to the detected spectrum may be reconstructed 48 by the processing unit PU, for example by rigorous coupled wave analysis and nonlinear regression or by comparison with a library of simulated spectra as shown at the bottom of fig. 3. In general, for the reconstruction, the general form of the structure is known, and some parameters are assumed from knowledge of the process used to fabricate the structure, leaving only a few parameters of the structure to be determined from the scatterometry data. For example, such a scatterometer may be configured as a normal incidence scatterometer or an oblique incidence scatterometer.
It is often desirable to be able to computationally determine how a patterning process will produce a desired pattern on a substrate. The computational determination may include, for example, simulation and/or modeling. The model and/or simulation may be provided for one or more portions of the manufacturing process. For example, it is desirable to be able to simulate a lithographic process that transfers a patterning device pattern onto a resist layer of a substrate, as well as patterns that are generated in the resist layer after development of the resist, simulate metrology operations (such as overlay determination), and/or perform other simulations. The purpose of the simulation may be to accurately predict, for example, metrology metrics (e.g., overlay, critical dimensions, reconstruction of the three-dimensional profile of a feature of the substrate, dose or focus of the lithographic apparatus when the feature of the substrate is printed with the lithographic apparatus, etc.), manufacturing process parameters (e.g., edge placement, aerial image intensity slope, sub-resolution assist features (SRAF), etc.), and/or other information that may then be used to determine whether the desired or target design has been achieved. The desired design is typically defined as a pre-optical proximity correction design layout, which may be provided in a standardized digital file format such as GDSII, OASIS, or another file format.
The simulation and/or modeling may be used to determine one or more metrology metrics (e.g., perform overlay and/or other metrology measurements), configure one or more features of the patterning device pattern (e.g., perform optical proximity effect correction), configure one or more features of the illumination (e.g., change one or more characteristics of the spatial/angular intensity distribution of the illumination, such as change shape), configure one or more features of the projection optics (e.g., numerical aperture, etc.), and/or for other purposes. Such determination and/or configuration may be generally referred to as mask optimization, source optimization, and/or projection optimization, for example. Such optimizations may be performed independently or combined in different combinations. One such example is source-mask optimization (SMO), which involves configuring one or more features of a patterning device pattern along with one or more features of illumination. The optimization may, for example, use the parameterized models described herein to predict values of various parameters (including images, etc.).
In some embodiments, the optimization process of the system may be expressed as a cost function. The optimization process may include finding a set of parameters (design variables, process variables, inspection operation variables, etc.) of the system that minimizes the cost function. The cost function may have any suitable form depending on the objective of the optimization. For example, the cost function may be a weighted Root Mean Square (RMS) of the deviation of certain characteristics (evaluation points) of the system from the expected values (e.g., ideal values) of those characteristics. The cost function may also be the maximum of these deviations (i.e., the worst deviation). The term "evaluation point" should be interpreted broadly to include any characteristic of the system or method of manufacture. The design variables and/or process variables of the system may be confined to finite ranges and/or may be interdependent due to practicalities of implementations of the system and/or method. In the case of lithographic projection and/or inspection equipment, limitations are often associated with physical properties and characteristics of the hardware, such as tunable range and/or patterning device manufacturability design rules. The evaluation points may include physical points on the resist image on the substrate, as well as non-physical properties such as, for example, dose and focus.
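Purely for illustration (the notation below is ours, not the application's), a weighted RMS cost over P evaluation points, with the design/process variables collected in a vector v, could be written as

CF(\mathbf{v}) = \sqrt{\sum_{p=1}^{P} w_p \left( f_p(\mathbf{v}) - t_p \right)^2}

where f_p(\mathbf{v}) is the predicted characteristic at evaluation point p, t_p its intended value, and w_p a weight; the worst-deviation variant replaces the weighted sum by \max_p \left| f_p(\mathbf{v}) - t_p \right|.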
In some embodiments, the present system(s) and method(s) may include executing one or more empirical models that perform the operations described herein. The empirical model may predict the output based on correlations between various inputs (e.g., one or more characteristics of a pupil image, one or more characteristics of a complex electric field image, one or more characteristics of a design layout, one or more characteristics of a patterning device, one or more characteristics of illumination used in a lithographic process (such as wavelength), etc.).
As an example, the empirical model may be a parameterized model and/or other model. The parameterized model may be a machine learning model and/or any other parameterized model. In some embodiments, the machine learning model may be and/or include mathematical equations, algorithms, curves, graphs, networks (e.g., neural networks), and/or other tool and machine learning model components, for example. For example, the machine learning model may be and/or include one or more neural networks (e.g., neural network blocks) having an input layer, an output layer, and one or more intermediate or hidden layers. In some embodiments, the one or more neural networks may be and/or include deep neural networks (e.g., neural networks having one or more intermediate or hidden layers between an input layer and an output layer).
As an example, one or more neural networks may be based on a larger set of neural units (or artificial neurons). The one or more neural networks may not closely mimic the manner in which a biological brain works (e.g., via a larger cluster of biological neurons connected by axons). Each neural unit of the neural network may be connected to many other neural units of the neural network. Such a connection may enhance or inhibit its effect on the activation state of the connected neural unit. In some embodiments, each individual neural unit may have a summation function that combines all of its input values together. In some embodiments, each connection (or the neural unit itself) may have a threshold function such that the signal must exceed a threshold before the signal is allowed to propagate to other neural units. These neural network systems may be self-learning and trained, rather than explicitly programmed, and perform significantly better in some problem-solving areas than traditional computer programs. In some embodiments, one or more neural networks may include multiple layers (e.g., where the signal path traverses from a front-end layer to a back-end layer). In some embodiments, a back propagation technique may be utilized by the neural network, wherein forward stimulation is used to weight "front-end" neural units. In some embodiments, stimulation and inhibition of one or more neural networks may flow more freely, with connections interacting in a more chaotic and complex manner. In some embodiments, the intermediate layers of the one or more neural networks include one or more convolutional layers, one or more recursive layers, and/or other layers.
One or more neural networks may be trained (i.e., parameters of the one or more neural networks are determined) using a set of training data (e.g., ground truth). The training data may include a set of training samples. Each sample may be a pair comprising an input object (typically an image, a measurement, a tensor or vector, which may be referred to as a feature tensor or vector) and a desired output value (also referred to as a supervisory signal). The training algorithm analyzes the training data and adjusts the behavior of the neural network by adjusting the parameters (e.g., weights of one or more layers) of the neural network based on the training data. For example, given a set of N training samples of the form {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, such that x_i is the feature tensor/vector of the i-th example and y_i is its supervisory signal, the training algorithm seeks a neural network g: X → Y, where X is the input space and Y is the output space. A feature tensor/vector is an n-dimensional tensor/vector of numerical features that represents some object (e.g., a complex electric field image). The tensor/vector space associated with these vectors is often referred to as the feature or potential space. After training, the neural network may be used to make predictions using new samples.
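Illustratively (again in our notation, not the application's), the adjustment of the network parameters \theta described above can be viewed as empirical risk minimization over the training set:

\theta^{*} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L\left( g_{\theta}(x_i), y_i \right)

where L is a chosen loss function (e.g., the squared error) and the weights of the layers of g_{\theta} are updated, for example, by backpropagating the gradient of this loss.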
As described herein, the present modular automatic encoder model includes one or more parameterized models (e.g., machine learning models, such as neural networks) that use an encoder-decoder architecture and/or other models. In the middle (e.g., a middle layer) of a model (e.g., a neural network), the present model formulates a low-dimensional encoding (e.g., in potential space) that encapsulates the information contained in an input (e.g., a pupil image and/or other input associated with a pattern or other feature of a semiconductor manufacturing and/or metrology (and/or other sensing) process) to the model. The present modular automatic encoder model exploits the low dimensionality and compactness of the potential space for parameter estimation and/or prediction.
By way of non-limiting example, fig. 5 illustrates a generic encoder-decoder architecture 50. The encoder-decoder architecture 50 has an encoding portion 52 (encoder) and a decoding portion 54 (decoder). In the example shown in fig. 5, the encoder-decoder architecture 50 may output, for example, a predicted pupil image 56 and/or other outputs.
By way of another non-limiting example, fig. 6 illustrates an encoder-decoder architecture 50 within a neural network 62. The encoder-decoder architecture 50 includes an encoding portion 52 and a decoding portion 54. In fig. 6, x represents the encoder input (e.g., the input pupil image and/or the extracted features of the input pupil image) and x' represents the decoder output (e.g., the predicted output image and/or the predicted features of the output image). In some embodiments, x' may represent, for example, the output from the middle layer of the neural network (as compared to the final output of the overall model) and/or other outputs. In fig. 6, z represents potential space 64 and/or low-dimensional coding (tensors/vectors). In some embodiments, z is or is related to a latent variable.
In some embodiments, the low-dimensional code z represents one or more features of an input (e.g., a pupil image). The one or more features of the input may be considered key or critical features of the input. Features may be considered key or critical features of an input because, for example, they are relatively more predictive of the desired output than other features and/or have other characteristics. The one or more features (dimensions) represented in the low-dimensional encoding may be predetermined (e.g., by a programmer at the time of building the present modular automatic encoder model), determined by a previous layer of the neural network, adjusted by a user via a user interface associated with the system described herein, and/or may be determined by other methods. In some embodiments, the number of features (dimensions) represented by the low-dimensional encoding may be predetermined (e.g., by a programmer at the time of building the present modular automatic encoder model), determined based on output from previous layers of the neural network, adjusted by a user via a user interface associated with the systems described herein, and/or determined by other methods.
It should be noted that while machine learning models, neural networks, and/or encoder-decoder architectures are referred to throughout this specification, machine learning models, neural networks, and encoder-decoder architectures are merely examples, and the operations described herein may be applied to different parameterized models.
As described above, process information (e.g., images, measurements, process parameters, metrology indicators, etc.) may be used to guide various manufacturing operations. Utilizing the relatively lower dimension of the potential space to predict and/or otherwise determine the process information may be faster, more efficient, require less computing resources, and/or have other advantages than previous methods of determining the process information.
Fig. 7 illustrates an embodiment of the present modular automatic encoder model 700. In general, the automatic encoder model may be adapted for metrology, for parameter inference, and/or for other solutions for other purposes. Inference can include estimating parameters of interest from data and/or other operations. For example, this may include finding the potential representation in a forward manner by evaluating the encoder, or in a reverse manner by solving an inverse problem using the decoder (as described herein). After finding the potential representation, the parameters of interest may be found by evaluating the prediction/estimation model (also as described herein). In addition, the potential representation provides a set of outputs (since the decoder can be evaluated given the potential representation), and this set can, for example, be compared to the data. Essentially, within the present context, inference and estimation (of the parameter of interest) can be used interchangeably. The automatic encoder model architecture is generic and can be extended to any size and complexity. The automatic encoder model is configured to compress high-dimensional signals (inputs) into an efficient low-dimensional representation of the same signal. Parameter inference (e.g., parameter inference may include regression and/or other operations) is performed from the low-dimensional representation, one or more outputs, and/or other information against a set of known labels. A label may be a "reference" used in supervised learning. Within such a context, this may mean the design of external references or carefully crafted metrology targets that are intended to be reproduced. Measuring carefully crafted metrology targets may include measuring known targets having known (absolute/relative) properties (e.g., overlay and/or other properties). By first compressing the (input) signal, the inference problem is significantly simplified, as compared to performing regression and/or other operations directly on the high-dimensional signal.
However, it is difficult to understand the information flow inside a typical automatic encoder. The architecture of an automatic encoder is often opaque, and typically only the information at the model input, at the model output, and at the compression point (i.e., in potential space) can be deduced. The information between these points is not easily interpreted. In practice, during semiconductor manufacturing, there may be ancillary information (in addition to the inputs), such as the physical properties of the target on the wafer and of the corresponding sensor. Such side information may be used as a priori knowledge (e.g., a "prior") to ensure that the model predictions match physical reality, to improve the performance of the automatic encoder model, or to extend the applicability of the automatic encoder model. However, in a typical automatic encoder model with a rigid architecture comprising an input, a compression point, and an output, it is unclear how any such information could be incorporated (e.g., where and how any such information may be inserted into or used by the model).
The modular automatic encoder model 700 has a modular structure. This allows intermediate levels of abstraction to be constructed that can be used to utilize the side information. Instructions stored on a non-transitory computer-readable medium may cause a computer (e.g., one or more processors) to execute (e.g., train and/or evaluate) model 700 for, e.g., parameter estimation and/or prediction. In some embodiments, the model 700 (and/or any of the individual components of the model 700 described below) may be configured a priori, before the training data is seen. In some embodiments, the estimated and/or predicted parameters include one or more of an image (e.g., pupil image, electric field image, etc.), a process measurement (e.g., an index value), and/or other information. In some embodiments, the process measurements include one or more of the following: measurement index, intensity, xyz position, size, electric field, wavelength, illumination and/or detection pupil, bandwidth, illumination and/or detection polarization angle, illumination and/or detection retardation angle, and/or other process measurements. The modular automatic encoder model 700 is configured to use potential space for partially supervised learning of parameter estimates (as further described below).
As shown in fig. 7, the modular automatic encoder model 700 is formed from four types of sub-models: input model 702, common model 704, output model 706, and predictive model 708 (although any number, type, and/or arrangement of sub-models is possible). The input model 702 is configured to process input data into a level suitable for combination with other inputs. The common model 704 combines the inputs, compresses the information to the bottleneck (e.g., the compression point or potential space in the model 700), and expands the information again to a stage suitable for splitting into multiple outputs. The output model 706 processes information from this common stage into a plurality of outputs that approximate the corresponding inputs. The predictive model 708 is used to estimate parameters of interest from the information that passes through the bottleneck. Finally, it should be noted that, in contrast to typical automatic encoder models, the modular automatic encoder model 700 is configured for a number of different inputs and a number of different outputs.
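By way of illustration only, the composition of these four sub-model types could be sketched as follows. The sketch assumes PyTorch and arbitrary layer counts, widths, and activations; the class and variable names and all dimensions are illustrative assumptions and are not taken from the disclosed parameterization.

    import torch
    import torch.nn as nn

    class ModularAutoencoder(nn.Module):
        # Illustrative sketch only: sub-model internals are assumptions.
        def __init__(self, in_dims, first_level_dim=32, latent_dim=8, n_params=1):
            super().__init__()
            # One input model per input/channel (702): process each input to a
            # "first-level" dimension suitable for combination with the others.
            self.input_models = nn.ModuleList(
                [nn.Sequential(nn.Linear(d, first_level_dim), nn.ReLU()) for d in in_dims])
            # Common model (704): combine, compress to the latent space (707),
            # then expand again to a stage suitable for splitting into outputs.
            self.encoder = nn.Sequential(
                nn.Linear(first_level_dim * len(in_dims), 64), nn.ReLU(),
                nn.Linear(64, latent_dim))
            self.expander = nn.Sequential(
                nn.Linear(latent_dim, 64), nn.ReLU(),
                nn.Linear(64, first_level_dim * len(in_dims)))
            # One output model per output (706): approximate the corresponding input.
            self.output_models = nn.ModuleList(
                [nn.Linear(first_level_dim * len(in_dims), d) for d in in_dims])
            # Prediction model (708): estimate parameters of interest from the latent data.
            self.prediction = nn.Linear(latent_dim, n_params)

        def forward(self, inputs):
            processed = [m(x) for m, x in zip(self.input_models, inputs)]
            combined = torch.cat(processed, dim=-1)   # combination step
            latent = self.encoder(combined)           # low-dimensional data (potential space)
            expanded = self.expander(latent)          # expanded version(s) of the inputs
            outputs = [m(expanded) for m in self.output_models]
            params = self.prediction(latent)          # parameter-of-interest estimate
            return outputs, params, latent

In this sketch, the encoder portion 705 corresponds to "encoder", the decoder portion 709 to "expander" plus the output models, and the bottleneck "latent" plays the role of the potential space 707.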
In some embodiments, the modular automatic encoder model 700 includes one or more input models 702 (a, b, …, n), a common model 704, one or more output models 706 (a, b, …, n), a predictive model 708, and/or other components. In general, the modular automatic encoder model 700 may be more complex (in terms of the number of free parameters) than the typical overall model discussed above. In exchange, however, such a more complex model is easier to interpret, define, and expand. For any neural network, the complexity of the network must be chosen. This complexity should be high enough to model the process that underlies the data, but low enough not to model the noise realization (which is often described as a form of overfitting). For example, the model may be configured to model the manner in which the sensor views the results of the manufacturing process on the wafer. Since the process that generates the data is typically unknown (or has unknown aspects), selecting the appropriate network complexity typically involves some intuition and trial-and-error. For this reason, it is desirable to provide a model architecture that is easy to understand and in which it is clear how to scale model complexity up and down, by means of the modular automatic encoder model 700.
Here, the one or more input models 702, the common model 704, the one or more output models 706, and/or the predictive model 708 are separate from each other and may be configured to correspond to differences in the process physics of different portions of the manufacturing process and/or sensing operation. The model 700 is configured such that each of the one or more input models 702, the common model 704, the one or more output models 706, and/or the predictive model 708 may be trained together with, and/or separately from, the other models in the modular automatic encoder model 700, but is configured separately, based on the process physics of the respective portion of the manufacturing process and/or sensing operation. By way of non-limiting example, the target contribution and the sensor contribution in an optical metrology device (tool, platform, etc.) are physically separable: different targets may be measured by the same sensor. For this reason, the target contribution and the sensor contribution can be modeled separately. In other words, one or more of the input models 702, the common model 704, the output models 706, and/or the predictive model 708 may be associated with the physical properties of light as it propagates through the sensor or through the stack.
The one or more input models 702 are configured to process one or more inputs 711 (e.g., 711a, 711b, …, 711n) into a first-level dimension suitable for combination with other inputs. Processing may include filtering and/or otherwise converting an input into a model-friendly format, compressing an input, projecting data onto a lower-dimensional subspace to accelerate the training steps, data normalization, processing out signal contributions from sensors (e.g., source fluctuations, sensor dose configuration (amount of light produced), etc.), and/or other processing operations. The processing may be considered preprocessing, for example, to ensure that the input, or data associated with the input, is appropriate for the model 700, appropriate for combination with other inputs, and so on. The first-level dimension may be the same as or lower than the dimension of a given input 711. In some embodiments, the one or more input models 702 include dense feed-forward layers (e.g., linear and/or dense layers with different activations), convolutional layers, and/or residual network architectures of the modular automatic encoder model 700. These structures are merely examples and should not be considered limiting.
In some embodiments, the input 711 is associated with a pupil, an object, and/or other component of the semiconductor manufacturing process, and is received from one or more of a plurality of characterization devices configured to generate the input 711. The characterization device may include various sensors and/or tools configured to generate data about the target. In some embodiments, the characterization device may include, for example, an optical metrology platform, such as the optical metrology platform shown in fig. 4. The data may include images, values of various indicators, and/or other information. In some embodiments, the input 711 includes one or more of an input image, an input process measurement, and/or a series of process measurements, and/or other information. In some embodiments, the input 711 may be a signal associated with a channel of measurement data from one or more sensing (e.g., optical metrology and/or other sensing) platforms. The channel may be a mode in which stacks are observed, such as a machine/physical configuration used when making measurements. By way of non-limiting example, the input 711 can include an image (e.g., any image associated with semiconductor manufacturing or any image generated during semiconductor manufacturing). The image may be preprocessed by the input model 702 and encoded by an encoder portion 705 of the common model 704 (described below) into low-dimensional data representing the image in the potential space 707 (described below). It should be noted that in some embodiments, the input model(s) 702 may be or be considered part of the encoder portion 705. The low-dimensional data may then be decoded for use in estimating and/or predicting process information and/or for other purposes.
Common model 704 includes an encoder-decoder architecture, a variational encoder-decoder architecture, and/or other architectures. In some embodiments, the common model 704 is configured to determine a potential spatial representation of a given input 711 in the potential space 707 (there are fewer degrees of freedom to analyze in the potential space 707 than in the raw input data from the different sensors and/or tools). Process information may be estimated and/or predicted based on the potential spatial representation of the given input 711, and/or other operations may be performed.
In some embodiments, common model 704 includes an encoder portion 705, a latent space 707, a decoder portion 709, and/or other components. It should be noted that in some embodiments, decoder portion 709 may include or be considered to include output model(s) 706. In some embodiments, the common model includes a feed forward layer and/or a residual layer and/or other components, although these exemplary structures should not be considered limiting. The encoder portion 705 of the common model 704 is configured to combine (e.g., through the input model 702) the processed inputs 711 and reduce the dimensions of the combined processed inputs to produce low-dimensional data in the potential space 707. In some embodiments, the input model 702 may perform at least some of the encoding. For example, encoding may include processing (e.g., by the input model 702) one or more inputs 711 into a first-level dimension, and reducing (e.g., by the encoder portion 705) the dimension of the combined processed inputs. This may include reducing the dimensions of the input 711 to form low-dimensional data and/or any amount of dimensional reduction in the potential space 707 before actually reaching the low-dimensional level in the potential space 707 (e.g., through one or more layers of the encoder section 705). It should be noted that such a dimension reduction need not be monotonic. For example, a combination of inputs (by means of concatenation) may be regarded as an increase in dimension.
The low-dimensional data in the potential space 707 has a resulting reduced second-level dimension that is less than the first-level dimension (e.g., the level of the dimension of the processed input). In other words, the resulting dimension after the reduction is smaller than the dimension before the reduction. In some embodiments, the low-dimensional data in the potential space may have one or more different forms, such as tensors, vectors, and/or other potential spatial representations (e.g., something having fewer dimensions than the number of dimensions associated with the given input 711).
The common model 704 is configured to expand the low-dimensional data in the potential space into one or more expanded versions of the one or more inputs 711. Expanding the low-dimensional data in the potential space 707 into one or more expanded versions of the one or more inputs 711 includes, for example, decoding, generating decoder signals, and/or other operations. Typically, the one or more expanded versions of the one or more inputs include an output from (e.g., the last layer of) the common model 704 or an input of the output model 706. However, the one or more expanded versions of the one or more inputs 711 may include any expanded version from any layer of the decoder section 709 and/or any output passed from the common model 704 to the output model 706. The one or more expanded versions of the one or more inputs 711 have an increased dimension compared to the low-dimensional data in the potential space 707. The one or more expanded versions of the one or more inputs 711 are configured to be adapted to produce one or more different outputs 713 (e.g., a, b, …, n). It should be noted that the inputs of the common model 704 need not be recovered at its outputs; this is intended to describe only the interface. Recovery may, however, hold globally from input 711 to output 713.
The one or more output models 706 are configured to use one or more expanded versions of the one or more inputs 711 to generate one or more different outputs 713. The one or more different outputs 713 include approximations of the one or more inputs 711, the one or more different outputs 713 having the same or increased dimensions as compared to an expanded version of the one or more inputs 711 (e.g., output from the common model 704). In some embodiments, the one or more output models 706 include dense feed-forward layer, convolutional layer, and/or residual network architecture of the modular automatic encoder model, although these example structures are not intended to be limiting. By way of non-limiting example, the input 711 may include a sensor signal associated with a sensing operation in a semiconductor manufacturing process, the low-dimensional representation of the input 711 may be a compressed representation of the sensor signal, and the corresponding output 713 may be an approximation of the input sensor signal.
The predictive model 708 is configured to estimate one or more parameters (parameters of interest) 715 based on the low-dimensional data in the potential space 707, the one or more different outputs 713, and/or other information. In some embodiments, for example, the one or more parameters may be semiconductor manufacturing process parameters (as described herein). In some embodiments, the predictive model 708 includes a feed forward layer, a residual layer, and/or other components, although these example structures are not intended to be limiting. By way of non-limiting example, the input 711 sensor signal may include a pupil image, and the encoded representation of the pupil image may be configured for use by the predictive model 708 to estimate overlap and/or other parameters.
In some embodiments, the modular automatic encoder model 700 is trained by comparing one or more different outputs 713 to corresponding inputs 711, and adjusting the parameterization of the one or more input models 702, the common model 704, the one or more output models 706, and/or the predictive model 708 to reduce or minimize the difference between the outputs 713 and the corresponding inputs 711. In some embodiments, training may include: applying changes to the low-dimensional data in the potential space 707 such that the common model 704 decodes a relatively more continuous potential space to produce decoder signals (e.g., output from the common model 704, output 713 from one or more output models 706, or both); providing the decoder signal recursively to an encoder (e.g., one or more input models 702, the encoder portion 705 of the common model 704, or both) to generate new low-dimensional data; comparing the new low-dimensional data with the prior low-dimensional data; and adjusting (e.g., changing weights, changing constants, changing architecture, etc.) one or more components (702, 704, 706, 708) of the modular automatic encoder model 700 based on the comparison to reduce or minimize the differences between the new low-dimensional data and the prior low-dimensional data. Training is performed across all sub-models 702-708 in a joint fashion (although training may also be separate for each model). In other words, changing the data in the potential space 707 affects the other components of the modular automatic encoder model 700. In some embodiments, adjusting includes adjusting at least one weight, constant, and/or architecture (e.g., number of layers, etc.) associated with layers of one or more input models 702, the common model 704, one or more output models 706, the predictive model 708, and/or other components of model 700. These and other aspects of training the modular automatic encoder model 700 are described in more detail with respect to other figures.
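As one possible reading of the training just described, a reconstruction loss between the outputs 713 and inputs 711, plus a prediction loss against available references, could be minimized jointly over all sub-models. The sketch below assumes the ModularAutoencoder class from the earlier sketch, PyTorch, mean-squared-error terms, and an arbitrary weighting, none of which is mandated by the disclosure.

    import torch.nn.functional as F
    from torch.optim import Adam

    def train_step(model, inputs, labels, optimizer, w_pred=1.0):
        # One joint (end-to-end) update of input models, common model,
        # output models, and prediction model.
        outputs, params, latent = model(inputs)
        recon = sum(F.mse_loss(o, x) for o, x in zip(outputs, inputs))  # outputs 713 vs inputs 711
        pred = F.mse_loss(params, labels)                               # parameter 715 vs reference
        loss = recon + w_pred * pred
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Example usage: optimizer = Adam(model.parameters(), lr=1e-3), then call
    # train_step(model, inputs, labels, optimizer) for each batch.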
In some embodiments, the number of one or more input models 702, the number of one or more output models 706, and/or other characteristics of model 700 are determined based on data needs (e.g., preprocessing input data may be necessary to filter and/or otherwise convert the data into a model friendly format), differences in process physical properties of different portions of the manufacturing process and/or sensing operation, and/or other information. For example, the number of input models may be the same as or different from the number of output models. In some embodiments, the individual input models 702 and/or output models 706 include two or more sub-models. Two or more sub-models are associated with different portions of the sensing operation and/or manufacturing process.
For example, the number of available data channels may be associated with possible configuration states of the sensor. The number of input models 702 and/or output models 706, whether to use a certain input model 702 and/or output model 706, and/or other characteristics of model 700 may be determined based on such information and/or other manufacturing and/or sensing operation information.
By way of non-limiting example, fig. 8 illustrates an output model 706 of a modular automatic encoder model 700 that includes two or more sub-models. In some embodiments, as shown in fig. 8, the individual output models 706 include two or more sub-models 720a, 720b, …,720 n, 722, and the like. In some embodiments, for example, the two or more sub-models may include a stack model (e.g., 720a, 720b, …,720 n) and a sensor model (e.g., 722) for semiconductor sensor operation. As described above, the target contribution and the sensor contribution in the metrology device are separable. Because of this situation, model 700 is configured to model the target contribution and the sensor contribution separately.
In fig. 8, a modular automatic encoder model 700 is shown with an integrated sensor model 722 for a particular sensor. Such an exemplary automatic encoder model may be trained by using data gathered by sensors associated with sensor model 722. It should be noted that this choice is to simplify the discussion. The principle applies to any number of sensors. It should also be noted that even though not shown in fig. 8, in some embodiments, a single input model 702 (e.g., 702 a) may include two or more sub-models. For example, the input model 702 sub-model may be used for data preprocessing (e.g., on singular value decomposition projections) and/or for other purposes.
Fig. 9 illustrates an embodiment of the modular automatic encoder model 700 that may be used during parameter inference (e.g., estimation and/or prediction). During inference, the sensor associated with sensor model 722 may be swapped for any arbitrary sensor modeled by a sensor model "72i". This sub-model configuration is configured to solve the following problem:
(This is the manner in which inference is performed by solving the inverse problem.)
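A plausible form of this minimization, assuming a least-squares fit of the measured data through the decoder path (the norm and exact terms are assumptions made for illustration), is:

    \theta^{*} = \arg\min_{\theta}\; \big\lVert P - \mathrm{dec}(\theta) \big\rVert^{2}

where dec(·) denotes the decoding path through the common model 704 and the (swapped-in) output/sensor model, and P denotes the measured data.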
In this equation, θ represents the compressed low-dimensional parameterization of the input in the potential space, and θ* represents the resulting target parameterization. From the resulting target parameterization, a forward evaluation of the predictive model 708 may be used to find the corresponding parameter of interest 715.
As shown in fig. 10, the modular automatic encoder model 700 (see also fig. 7) is configured to estimate a parameter of interest by estimating the information content available in a subset of the available channels, using a plurality of input models 702 (fig. 7), for any combination of available channels P_1, …, P_n of measurement data from one or more sensing (e.g., optical metrology and/or other sensing devices and/or tools) platforms. In some embodiments, the input models 702 are configured to process the plurality of inputs 711 based on the available channels such that the plurality of inputs are suitable for combining with each other. As described above, processing may include filtering and/or otherwise converting inputs into a model-friendly format, compressing the inputs, and/or other processing operations. The processing may be considered preprocessing, for example, to ensure that the input, or data associated with the input, is appropriate for the model 700, appropriate for combination with other inputs, and so on. As also described above, the common model 704 (e.g., encoder portion 705) is configured to combine the processed inputs and generate low-dimensional data in the potential space 707 (fig. 7) based on the combined processed inputs. The low-dimensional data estimates the available information content, and the low-dimensional data in the potential space is configured to be used by one or more additional models (e.g., one or more output models 706 and/or the predictive model 708) to generate a plurality of outputs 713 that approximate the inputs 711 and/or to estimate a parameter (of interest) 715 based on the low-dimensional data, as described herein.
In some embodiments, the modular automatic encoder model 700 (fig. 7) is trained by iteratively changing the subset (e.g., sub-selection) of the processed inputs 711 that is combined and used (e.g., compressed) by the common model 704 to produce low-dimensional training data. In other words, the set of inputs 711 (processed, compressed, or otherwise) that is passed to the first compression layer is changed. One or more training approximations and/or training parameters generated or predicted based on the low-dimensional training data are compared to corresponding references (e.g., known and/or otherwise predetermined reference approximations and/or parameters to which the training approximations and/or training parameters should be matched); and one or more of the plurality of input models 702, the common model 704, the one or more output models 706, and/or the predictive model 708 are adjusted, based on the comparison, to reduce or minimize the differences between the one or more training approximations and/or training parameters and the respective references. It should be noted that there are no reference values in the potential space. Alternatively, model 700 may be trained by iteratively dropping inputs and requiring the remainder of the network to produce all of the desired outputs (i.e., both 713 and 715). The modular automatic encoder model 700 is trained in such a way that the common model 704 is configured to combine the processed inputs 711 and generate low-dimensional data for generating the approximation(s) and/or estimated parameter(s), regardless of which of the plurality of inputs 711 are ultimately combined by the common model 704. It should be noted that in fig. 10, P_i → φ_i represents the input models 702 and the expectation operator E is part of the common model 704, but it need not be true that the output of the expectation operator generates the potential representation (as described herein).
In some embodiments, the variation between individual iterations is random, or the variation between individual iterations is made in a statistically meaningful manner. For example, the number of channels that are enabled at any particular iteration is generally similar to the number of channels that would be available during actual inference, i.e., it represents typical use. Uniform sampling of the set of channels may be performed with probabilities that match the actual application. In some embodiments, the variation between individual iterations is configured such that, after a target number of iterations, each of the processed inputs 711 has been included at least once in the processed input subset. In some embodiments, iteratively changing the subset of processed inputs that is combined by the common model and used to generate the low-dimensional training data includes channel selection among a set of possible available channels. For example, the set of possible available channels is associated with a sensing (e.g., optical metrology) platform. The iterative changing, comparing, and adjusting is repeated until the model and/or the objective (cost function) converges. In some embodiments, the iterative changing, comparing, and adjusting is configured to reduce or eliminate a bias that may occur for a combinatorial search across channels.
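As an illustration of such iterative sub-selection, a per-iteration channel mask could be drawn as follows; the uniform keep probability and the "keep at least one channel" rule are assumptions chosen to mimic typical use, not the disclosed sampling scheme.

    import random

    def sample_channel_mask(n_channels, keep_prob=0.5):
        # One 0/1 value per channel: which processed inputs phi_i take part
        # in this training iteration (at least one channel is always kept).
        mask = [1 if random.random() < keep_prob else 0 for _ in range(n_channels)]
        if sum(mask) == 0:
            mask[random.randrange(n_channels)] = 1
        return mask

During a training step, only the processed inputs φ_i with mask value 1 would be combined by the common model, while all outputs 713 and the parameter estimate 715 must still be produced.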
By way of non-limiting example, in optical metrology for semiconductor fabrication, polarized light is used to excite a given feature on a wafer, and the response (raw scattered light intensity and/or phase) is used to infer/measure a parameter of interest for the given feature. Data driven inference methods have been used for the task of parameter estimation. Data driven inference methods rely on a large collection of measurements and models that map the measured pupil to the parameters of interest, where the labels for these parameters are obtained via carefully designed targets on the wafer and/or from third party measurements. However, these approaches have been shown to lack the ability to handle process variations.
An optical metrology platform (e.g., tool, equipment, etc.) has the capability to measure a significant number of channels (e.g., inputs 711 shown in fig. 7, such as multiple wavelengths, observations at multiple wafer rotations, multiple light polarization schemes, etc.). However, due to practical timing constraints, the number of channels actually used (inputs 711) when making measurements in a production setting is typically limited to a subset of the available channels (typically up to a maximum of two incident light channels). Heretofore, in order to select the best channels, a brute-force approach of testing all possible channel combinations was used. This is time consuming, resulting in longer recipe generation times. In addition, brute-force methods may be prone to overfitting, introducing different biases for different channels.
The modular automatic encoder model 700 (e.g., input models 702 and/or common model 704) is configured to provide a framework for statistical modeling of pupil data (as one possible example of an input) that utilizes combinations from all available channels, enabling direct and fast channel selection relative to previous systems. As shown in fig. 10, for measurement channels P_1 to P_n, the modular automatic encoder model 700 is configured to be able to use all available data (all channels) and also to evaluate using only a subset of those channels, for example, the inputs 711 shown in fig. 7. Model 700 is configured to extract, in a coherent manner across all channels, the information content φ_i from each acquired channel P_i of each target, using models (e.g., 702) f_i(P_i) → φ_i, such that the expected information content per channel is the same, i.e., E[φ_i] = E[φ_j] for all channels i, j. Thus, the coherently parameterized (modular automatic encoder) model 700 is configured to extract information that can be used to predict a parameter of interest via another model g(Φ), where Φ denotes a joint estimate of the hypothetical complete information content, as could be measured through all channels. It should be noted that such information content can be distributed over multiple channels, i.e., it may not be possible to observe the complete Φ from a single channel/measurement result.
Given that each φ_i is estimated with per-channel noise/imperfections, model 700 is configured such that, by using the limited number of channels available, the asymptotic information content obtainable from the stack is approximately:
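A plausible reading of this approximation, assuming the expectation operator E of fig. 10 is realized as an average over the set C of channels that happen to be available (an assumption made here for illustration), is:

    \Phi \;\approx\; \mathrm{E}[\varphi_i] \;\approx\; \frac{1}{|C|} \sum_{i \in C} \varphi_i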
With this expression, model 700 is configured to search for a set of parameterizations φ_i that comply with E[φ_i] = E[φ_j]. This quantity is later used to predict the parameter of interest o (e.g., 715 in fig. 7). Since g (e.g., the encoder portion 705 of the common model 704 in fig. 7, together with the predictive model 708, apart from the expectation operator) takes the information content Φ as input, model 700 may therefore estimate the parameter of interest o using any subset and possible combination of the channels indicated by Φ. Note that o is the real label, and ô is the estimate produced by the predictive model. The quality of the estimate depends on the quality of the information provided by each channel that enters Φ: when fewer channels are available, the estimate of Φ is of lower quality. After training the model defined by the f_i and g, model 700 estimates the quantity Φ using a subset of channels in order to evaluate the predicted parameter of interest for any combination of channels. Examples for two (e.g., 1050) and three (e.g., 1052) input channels are presented in fig. 10, but many other possible examples are contemplated.
In some embodiments, an input model (e.g., a neural network block) 702 (fig. 7) is associated with each input channel. The input models 702 are configured to be trained and may represent the per-channel functions f_i presented above. To ensure good model performance, model 700 includes a common model 704, with the common model 704 being configured to combine (via each input model 702) the information content generated from each channel, producing the modular automatic encoder structure shown in fig. 7.
Fig. 11 also illustrates the modular automatic encoder model 700, but with additional details related to the discussion of fig. 10 above. Fig. 11 illustrates the common model 704, the output models 706 (neural network blocks corresponding, in this example, to each input channel), and other components of the model 700. In this example, model 700 is configured to be trained to estimate and/or predict, for example, both a pupil (pupil image) and a parameter of interest. Model 700 shown in fig. 11 (and fig. 7) is configured to converge with respect to the expected information content Φ, because model 700 is configured to iteratively change/sub-select (e.g., randomly or in a statistically meaningful manner), during each step of training (indicated by 1100 in fig. 11), the number of channels used to approximate Φ. This iterative variation/sub-selection ensures that model 700 remains predictive/consistent for any combination of input channels. Furthermore, because the information content in Φ needs to represent all channels (i.e., E[φ_i] = E[φ_j]), the resulting model will not reproduce a bias that is specific to a particular channel. Mathematically, training can be specified as a minimization of the cost function 1102 shown in fig. 11. In the cost function 1102, the function r(·) is used as a regularization of the potential parameterization (or another type of regularization), and the numbers ξ_t,i for the different measurement targets t are (in this example) randomly selected from the set {0, 1}.
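Combining the elements named above, one possible shape of such an objective, with the specific terms, weights, and norms assumed here for illustration rather than reproduced from the figure, is:

    \min \; \sum_{t} \Big[ \sum_{i} \big\lVert P_{t,i} - \hat{P}_{t,i} \big\rVert^{2}
    + \big\lVert \hat{o}_{t} - o_{t} \big\rVert^{2} + r(\Phi_{t}) \Big],
    \qquad \Phi_{t} = \frac{\sum_{i} \xi_{t,i}\,\varphi_{t,i}}{\sum_{i} \xi_{t,i}},
    \quad \xi_{t,i} \in \{0,1\}

That is, per measurement target t, the reconstruction error of the approximated pupils, the error of the predicted parameter of interest against its label, and the regularization r(·) of the potential parameterization would be penalized, while Φ_t is formed only from the sub-selected channels.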
To reiterate, this approach allows a single model (e.g., 700) to be trained using all, or substantially all, of the available data, rather than performing a brute-force combinatorial search for the optimal model/channels. This approach reduces the time of the selection scheme compared to the combinatorial search in the previous approach, since the training computational complexity depends linearly on the number of channels. Furthermore, the present method reduces the bias that can occur in a combinatorial search across channels, because the present method ensures that all channel information is used during training. Since the entire model 700 is trained to take into account all the different sub-selections of channels, the resulting model produces consistent results with respect to channel selection.
Fig. 12 illustrates aspects of the modular automatic encoder model 700 (see fig. 7) having an extended range of applicability for estimating parameters of interest for manufacturing and/or sensing (e.g., optical metrology) operations. The modular automatic encoder model 700 (see fig. 7) has an extended range of applicability for estimating parameters of interest because the model is configured to enforce a known property of the inputs 711 (fig. 7) in the decoder portion 709 (fig. 7), which may include the one or more output models 706 (as described above). In some embodiments, the decoder portion 709 is configured to enforce the known property of the encoded inputs 711 during decoding (the result of the enforcement performed during training) while generating the outputs 713 (fig. 7) corresponding to the inputs 711 by decoding the low-dimensional representation of the inputs 711. In practice, the enforcement initially occurs during training; after training, it becomes a property of the model. Strictly speaking, however, decoding is also performed during training. The known property is associated with a known physical relationship between the low-dimensional representation in the potential space 707 (fig. 7) for an input 711 and the output 713. In some embodiments, the known property is a known symmetry property, a known asymmetry property, and/or another known property. In some embodiments, the decoder portion 709 may be configured to utilize the modularity of model 700 to enforce known properties at some intermediate decoding level or stage (e.g., at the interface between the common model 704 and the output models 706). Parameters of interest may be estimated based on the outputs 713 and/or the low-dimensional representations of the inputs 711 in the potential space 707 (as described herein). For example, in some embodiments, for use with respect to symmetry, the predictive model may be a selection mask (e.g., selecting the parameters from the potential space that are to be associated with the parameter of interest). This can still be expressed as a neural network layer; however, the layer remains fixed during training (it becomes a fixed linear layer σ(Wx + b), where each column of W includes only a single value of 1 and the other elements are set to 0, b includes only elements equal to 0, and σ(·) is the identity).
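For illustration, such a fixed selection-mask layer could be set up as follows; the latent dimension, the selected indices, and the use of PyTorch are assumptions, and whether the single 1 sits in each row or each column of W is a matter of convention.

    import torch
    import torch.nn as nn

    def selection_mask(latent_dim, selected_indices):
        # Fixed linear layer sigma(Wx + b): one entry of 1 per selected element
        # (rest 0), bias b = 0, activation sigma = identity; frozen during training.
        layer = nn.Linear(latent_dim, len(selected_indices))
        with torch.no_grad():
            layer.weight.zero_()
            layer.bias.zero_()
            for row, idx in enumerate(selected_indices):
                layer.weight[row, idx] = 1.0
        for p in layer.parameters():
            p.requires_grad = False
        return layer

    # Example: selection_mask(8, [0]) picks the first latent element as the
    # parameter of interest (e.g., the element representing overlap).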
In some embodiments, the decoder portion 709 (which, in some embodiments, may include the one or more output models 706) is configured to enforce the known symmetry properties and/or other properties of the encoded input during the training phase, such that the modular automatic encoder model 700 complies with the enforced known symmetry (and/or other) properties during the inference phase to produce an output. Enforcement includes using a penalty term in a cost function associated with the decoder portion 709 (which may include the one or more output models 706) to penalize differences between the outputs 713 and the outputs that should be generated according to the known property. The penalty term comprises a difference between decoded versions of low-dimensional representations of the inputs that are related to each other via the physical prior. In some embodiments, the known property is a known symmetry property, and the penalty term includes a difference between decoded versions of the low-dimensional representation of the input 711 that are reflected across, or rotated about, the symmetry point with respect to each other. In some embodiments, one or more of the input models 702, the encoder portion 705, the decoder portion 709, the one or more output models 706, the predictive model 708, and/or other components of the model 700 (see fig. 7) are configured to be adjusted (e.g., trained or further trained) based on any differences between the decoded versions of the low-dimensional representation.
By way of non-limiting example, an optical metrology platform (e.g., equipment, tools, etc.) is configured to measure critical semiconductor stack parameters located directly on a product structure. For this purpose, machine learning methods are typically applied to the optical scatterometry data acquired using the optical metrology platform. These machine learning methods are conceptually equivalent to supervised learning methods, i.e., learning from a labeled dataset. The success of such a method depends on the quality of the label.
There are common methods for obtaining labels. One method uses self-referencing targets, which are specifically designed targets for obtaining labeled data. A second method relies on recording tools in a semiconductor factory (typically scanning electron microscopes). The self-referencing target approach is generally preferred due to the competitive advantage of freedom in the design of the self-referencing targets and due to independence from competing metrology solutions.
One of the main challenges in using self-referencing targets is the fact that they only provide very accurate relative labels. This means that within a target cluster, for which the exact relative labels are known, there is some unknown cluster bias. Determining such an unknown cluster bias, and thus obtaining absolute labels, is critical to the accuracy of manufacturing and/or inspection parameter selection schemes based on self-referencing targets. The step of estimating the cluster bias is commonly referred to as label correction.
This label correction problem is not solvable for signals (e.g., input 711 shown in fig. 7, such as pupil images, etc.) that are linear as a function of the parameter of interest. Thus, methods for exploiting nonlinearities in the signal (e.g., the pupil image and/or other input 711) are being investigated. Currently, no methods are known that make use of physical assumptions about the signal nonlinearities and/or about directions in signal space.
A signal of interest (e.g., input 711), such as an asymmetric cross-polarized pupil signal caused by overlap (e.g., from the metrology platform), changes sign with respect to the stack parameterization when all asymmetry parameters are negated simultaneously (an odd, i.e., anti-symmetric, function). More specifically, when all other asymmetry parameters are zero, the signal may be anti-symmetric around zero overlap (an odd function). Such domain knowledge may be embedded into model 700 (see fig. 7) during the training phase, which adds physical interpretability to model 700. Furthermore, symmetry points are important because they define an origin (zero) for the parameterization of the model that can be used to calibrate absolute accuracy, so that the proper label correction can be found. Model 700 is configured to take advantage of this and other physical understanding and to embed it into model 700. In this example, the general pupil property utilized is as follows:
I_a^DE(−θ_a) = −I_a^DE(θ_a), where I_a^DE represents an asymmetric normalized pupil and θ_a is the set of asymmetry parameters.
Referring to the modular automatic encoder model 700 shown in figs. 10 and 11 (and fig. 7), P (e.g., input 711) in this example may be a pupil image (P = I_a^DE for ease of notation). f(P) (e.g., one or more input models 702 and/or the common model 704) encodes such a pupil image into a compressed representation, which is finally decoded to produce an approximation of the pupil P. Such a model is trained in such a manner that the prediction approximates the correct overlap ov, i.e., one of the elements of the compressed representation represents the overlap. For self-referencing targets, such a model may be trained using the following objective (e.g., cost function):
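Based on the description above (reconstruct the pupil and make the predicted overlap match the true overlap), a plausible form of this objective, with the exact terms and weights assumed here for illustration, is:

    \min \; \sum_{t} \Big[ \big\lVert P_{t} - \hat{P}_{t} \big\rVert^{2}
    + \big\lVert \widehat{ov}_{t} - ov_{t} \big\rVert^{2} \Big]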
where the true overlap is set to ov = L + B, with a known label L and an unknown cluster bias B. In practice, this approach may be inadequate because of a certain degree of freedom in selecting the cluster bias B; this effectively corresponds to shifting the parameterization, which can be problematic because an absolute overlap estimate is desired. To reduce this ambiguity, another term is added to the objective (cost function) to embed the symmetry property of the signal (e.g., input 711) into the decoding model (e.g., the common model 704 and/or the one or more output models 706):
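With the symmetry prior I_a^DE(−θ_a) = −I_a^DE(θ_a) added as a third term, the augmented objective can plausibly be written as follows (again a sketch with assumed terms, where dec(·) denotes the decoding from the compressed parameterization back to the pupil):

    \min \; \sum_{t} \Big[ \big\lVert P_{t} - \hat{P}_{t} \big\rVert^{2}
    + \big\lVert \widehat{ov}_{t} - (L_{t} + B) \big\rVert^{2} \Big]
    \;+\; \sum_{\theta_{a}} \big\lVert \mathrm{dec}(\theta_{a}) + \mathrm{dec}(-\theta_{a}) \big\rVert^{2}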
In practice, it cannot be ensured that this third (symmetry) term vanishes for arbitrary θ_a; however, points from the process window may be sampled to ensure that the third term is small for any larger sample.
Fig. 12 illustrates a graphical interpretation of enforcing the known property of the encoded input 711 (fig. 7) to produce the output 713 (fig. 7). The known property is associated with a known physical relationship between the low-dimensional representation in the potential space 707 (fig. 7) for the input 711 and the output 713. In this example, the known property is a known symmetry property (e.g., a "symmetry prior"). Fig. 12 illustrates samples of a signal (e.g., input 711) that may be available (points 1201) and that poorly sample the evolution of the semiconductor manufacturing and/or sensing process 1202, shown as curve 1203 relating the (input) signal 1205 and the parameter 1207. If knowledge of the symmetry of process 1202 is not embedded, model 700 may end up estimating and/or predicting parameters 1207 following line 1209 in fig. 12. While line 1209 fits the data (points 1201) very well, line 1209 does not adequately represent the process 1202 outside the sampling range. As shown by line 1211, embedding the known symmetry property into model 700 (fig. 7) enables model 700 to estimate and/or predict parameters 1207 matching process 1202 over a much wider range. Furthermore, as mentioned previously, the zero crossing 1213, or symmetry point, is important. Clearly, in this example, after adding the known symmetry property (prior), the fit of model 700 to the data is significantly closer to the true origin.
Fig. 13 illustrates an application of the modular automatic encoder model 700 (shown in fig. 7) for semi-supervised learning. This may be used, for example, for in-device metrology and/or for other applications. Optical metrology platforms (e.g., equipment, tools, etc.) are often configured to infer physical parameters of structures on a semiconductor wafer from corresponding pupil images. Models associated with optical metrology platforms are typically trained and then used for inference (e.g., estimating and/or predicting parameters of interest). During training, pupils are acquired and labeled using self-referencing targets or using critical-dimension scanning electron microscope (SEM) data. From these labeled pupils, the model learns a mapping from pupil to label, which is then applied during inference. The availability of labeled pupils is limited, since obtaining SEM data is often expensive. This is due in part to the fact that SEM measurements can be destructive to semiconductor stacks, and due to the fact that SEM is a slow metrology technique. Thus, only a limited and expensive training data set is available.
The pupil image is composed of a large number of pixels. Currently, the training step requires learning a mapping from such high-dimensional signals (e.g., input 711 shown in fig. 7) to one or several parameters of interest (e.g., 715 shown in fig. 7). Due to the high dimensionality of the signals, a considerable amount of training images is required, which means that a considerable amount of SEM measurements is also required. Regarding signal noise: the stack response signal spans a low-dimensional space that becomes high-dimensional when the observation is contaminated with noise (noise spans the complete space). The noise does not carry any information about the stack and thus only acts as a disturbance. This is why the auto-encoder structure can be used to learn a low-dimensional representation of the stack contribution while also functioning as a noise filter. The process changes the stack response in a meaningful way and thus, many places in the process window need to be sampled to be able to learn the behavior of the parameters throughout the process window.
As one exemplary input, the pupil image (e.g., input 711) has a low underlying signal complexity. This is due to the fact that a limited set of physical parameters may be used to describe the semiconductor stack. Advantageously, model 700 is configured to be trained in two or more phases with different training data sets. In some embodiments, the pupil image signal and/or other inputs 711 are compressed in an unsupervised manner, resulting in a mapping from the pupil (or from any other input) to an arbitrary low-dimensional subspace (e.g., potential space 707 shown in fig. 7). Next, using a smaller number of labeled pupils and/or other inputs 711, the mapping from the low-dimensional subspace to the parameter(s) of interest is learned. This can be performed with a reduced number of targets, since the mapping is simpler (lower in dimension), which helps alleviate the problems described above. This can be seen as an application of semi-supervised learning. Fig. 13 depicts the general concept of a compression step 1301, followed by an embedding 1303, a regression step 1305, and an inference 1307 (e.g., 715 shown in fig. 7). As also depicted in fig. 13, the compression step is trained on the unlabeled dataset 1311 and the regression step is trained on the smaller labeled dataset 1313.
Two main methods for training the structure shown in fig. 13 (and in fig. 7 and/or other figures) can be distinguished. First, the components of model 700 (e.g., one or more input models 702, the common model 704, one or more output models 706, and/or the predictive model 708) may be trained separately, in a sequential manner. Second, the components may be trained simultaneously. If the components of model 700 are trained sequentially, any unsupervised dimension-reduction technique may be applied for the compression. For example, linear techniques (principal component analysis - PCA, independent component analysis - ICA, …) or nonlinear techniques (auto-encoders, t-distributed stochastic neighbor embedding - t-SNE, uniform manifold approximation and projection - UMAP, …) may be used. After the compression step, any regression technique may be applied to the embedding (e.g., linear regression, a neural network, …). When training (e.g., two or more) components simultaneously, a neural network may be used for both steps. This is because most unsupervised learning techniques are not well suited to being modified into such a semi-supervised structure. For example, an automatic encoder may be used in the compression step and a feed-forward neural network may be used in the regression step. These can be trained simultaneously by selecting an optimization objective (cost function) such that the compression step trains on every element of the dataset while the regression step trains (i.e., is penalized) only on the labeled elements of the dataset.
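One way to realize this split objective in a single training loop is to mask the regression penalty for unlabeled samples. The sketch below assumes PyTorch, mean-squared-error terms, and a 0/1 label mask per sample, none of which is prescribed by the disclosure.

    import torch
    import torch.nn.functional as F

    def semi_supervised_loss(outputs, inputs, params_hat, labels, label_mask, w_pred=1.0):
        # Compression/reconstruction term: trained on every element of the dataset.
        recon = sum(F.mse_loss(o, x) for o, x in zip(outputs, inputs))
        # Regression term: penalized only for labeled elements
        # (label_mask is a tensor of 0s and 1s, one per sample).
        diff = (params_hat - labels) ** 2 * label_mask
        pred = diff.sum() / label_mask.sum().clamp(min=1)
        return recon + w_pred * pred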
In some embodiments, the modular automatic encoder model 700 (fig. 7) is configured to include a recursive deep-learning automatic encoder structure. Figs. 14 and 15 illustrate examples of such structures. For example, in optical metrology for semiconductor devices, features on a wafer are excited using polarized light, and the response (raw scattered light intensity and/or phase) is used to infer/measure a parameter of interest for a given feature. Two types of methods are commonly applied for parameter inference. As described above, data-driven methods rely on a fairly large number of gathered measurements and a simplified model that maps the pupil to the parameter of interest, where the labels are obtained via carefully designed targets on the wafer or from third-party measurements. The second class explicitly models the target response under the sensor (e.g., using the Jones model); the stack parameterization that best fits the measurement results is determined using physical, electronic, and/or hybrid physical/electronic methods.
The auto-encoder may be used in a data-driven method (as described herein). The auto-encoder approach has the benefit of producing a richer model that is able to model complex signals (inputs) while also performing inference of complex parameters. Coupling the auto-encoder model with a variational Bayesian prior (e.g., with respect to known properties of the input) may also ensure continuity of the potential space (i.e., the reduced-dimension space at the bottleneck of the auto-encoder) and of the resulting generative model. Schematic diagrams of such concepts are shown in figs. 7, 11, etc. and described herein.
Fig. 14 follows the concepts described above. The mapping from (in this example) a set of intensities over several channels (I_ch1, …, I_chi) (e.g., input 711) to a compact representation c is performed by an encoding layer (e.g., one or more input models 702 and/or the common model 704). Returning from the compact representation c (e.g., in the potential space 707) to the intensity space occurs through a decoding layer (e.g., the common model 704 and/or one or more output models 706), producing intensity estimates (e.g., output 713). This builds a model (e.g., modular automatic encoder model 700) configured to extract the relevant information from, for example, a large number of pixels (in the range of several 1000s) and to compress this into a space of a few 10s of parameters. From this compressed representation, a further network (e.g., the predictive model 708) provides the link to the parameter of interest.
Model 700 may be trained with a Bayesian prior applied to the potential representation c (e.g., regarding known properties of the input), to ensure that c follows a given distribution (such as a multivariate Gaussian), such that the representation c becomes continuous rather than a point estimate. In practice, this prior is also encoded mathematically: small changes in the parameterization c are required to be reflected in similarly small changes in the estimated intensities. Thus, if for a given input 711 (I_ch1, …, I_chi) a certain parameterization of the potential space can be obtained, and the given estimate Î_chk is approximately equal to I_chk, then any change δc in the potential space should be reflected in a proportional change of the estimates. Creating such a mapping over a continuous potential space may prevent a model, such as model 700, from effectively learning to classify the data, a problem often encountered with neural networks having a discrete potential space.
Especially where a variational prior (a known property of the input) is used, the decoding layer (e.g., the common model 704 and/or one or more output models 706) in an automatic encoder model such as model 700 can provide a generative representation of the signal (input) that is continuous and generalizes well (from potential space to pupil space). In some embodiments, the prior is used to adjust the distribution of the potential space and primarily affects the generative portion of the model. The manifold-compression portion of the model (the encoder, formed by one or more input models 702 and/or the common model 704, from pupil space to potential space) is not affected by the prior in a significant way. Thus, model 700 may be suboptimal in terms of generalization ability when applied to the task of direct parameter inference, as the encoder portion of model 700 may not be trained to account for a continuous input space (although model 700 may be trained to do so).
In some embodiments, model 700 includes a recursive model scheme in which the training of both the encoding layers (702, 704) and the decoding layers (704, 706) benefits from one or more variational priors (prior knowledge about the input) placed on the potential space c (e.g., 707). In fig. 14, the encoding portion (702, 704) of the model 700 comprises a function f(I_ch1, …, I_chi) that maps to a parameterization c of the potential space 707. Similarly, the decoding portion (704, 706) may be regarded as a function that approximates the inverse of f. The variational prior placed on the potential space 707 (e.g., prior knowledge of the inputs) ensures that the model 700 learns a distribution for each of the potential variables, rather than point estimates. Thus, the model 700 also learns the distribution of the output data in view of the potential distribution.
In some embodiments, model 700 is configured to use a variational scheme (capable of yielding a continuous potential space in which smaller changes in c map to smaller changes in the predicted intensities) such that the encoding part f maps smaller changes in the intensities I_ch1, …, I_chi (e.g., input 711) to similar changes in the potential representation c. This may be done by training the modular automatic encoder model 700 in a recursive manner, ensuring that an output 713 (e.g., an intensity estimate), if passed as an input 711 to the same model 700, again yields a valid potential representation c and a valid decoded output 713 (e.g., an intensity estimate).
Fig. 15 illustrates an unrolled version of this recursive scheme. The scheme can be extended to any number of recursive passes. (Note that this recursive scheme is different from the iterative operation described with respect to figs. 10 and 11.) Fig. 15 illustrates a model 700 that includes two (or, in general, r) different passes through the same model 700. The first pass takes physical data (measured data, a realization of the data) and maps the data to a given distribution in the potential space. From this distribution in the potential space, samples used to generate the output estimates can be drawn. These samples of the output estimates are then passed through the model 700 again, as synthetic inputs, to ensure that the encoder portion (702, 704) of the model 700 maps them to a similar distribution in the potential space 707.
In general, for training of the unrolled embodiment of model 700 shown in fig. 15, the same input-output cost function 1500 as for a conventional (variational) automatic encoder can be used (see 1500 in fig. 15). In cost function 1500, g is a regularization term that encodes the variational prior, o is the predicted value that is desired to be found, and p is the given norm. Finer cost functions can also be designed for training by concatenating the internal states of the data between recursions. These cost functions may include the cost function 1502 shown in fig. 15 and/or other cost functions.
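Written out under the notation of fig. 15, a recursive training objective along these lines could look as follows; the superscript (j) indexes the recursion passes, and the exact terms of cost functions 1500 and 1502 are assumed here rather than reproduced:

    \mathcal{L} \;=\; \sum_{k} \big\lVert I_{\mathrm{ch}k} - \hat{I}^{(1)}_{\mathrm{ch}k} \big\rVert_{p}
    \;+\; \big\lVert o - \hat{o}^{(1)} \big\rVert_{p} \;+\; g\big(c^{(1)}\big)
    \;+\; \sum_{j>1} \Big[ \big\lVert \hat{I}^{(j)} - \hat{I}^{(j-1)} \big\rVert_{p}
    + \big\lVert c^{(j)} - c^{(j-1)} \big\rVert_{p} \Big]

The first three terms correspond to a conventional (variational) automatic encoder objective in the norm p with regularization g; the terms coupling consecutive passes reflect the finer cost functions that link internal states between recursions.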
It should be noted that while the description herein often refers to (a single) potential space, such should not be viewed as limiting. The principles described herein may be applied by and/or to any non-zero number of potential spaces. One or more potential spaces may be used in series (e.g., for analyzing data and/or making a first prediction followed by a second prediction), in parallel (e.g., for simultaneously analyzing data and/or making predictions), and/or otherwise.
In some embodiments, one or more of the operations described herein may be combined into one or more particular methods. An example of one of these methods is illustrated in fig. 16. Fig. 16 illustrates a method 1600 for parameter estimation. The method 1600 includes training 1602 a modular automatic encoder model (e.g., model 700 shown in fig. 7 and described herein) for parameter estimation and/or prediction. This may include programming components of the model, inference, and/or other operations. For example, training may be performed by one or more of the operations described herein. The method 1600 includes processing 1604 one or more inputs (e.g., 711) into a first-level dimension suitable for combination with other inputs by one or more input models (e.g., 702) of the modular automatic encoder model. The method 1600 includes combining 1606 the processed inputs by a common model (e.g., 704) of the modular automatic encoder model and reducing the dimensions of the combined processed inputs to produce low-dimensional data in the potential space. The low-dimensional data in the potential space has a resulting reduced second-level dimension that is less than the first-level dimension. The method 1600 includes expanding 1608 the low-dimensional data in the potential space into one or more expanded versions of the one or more inputs through the common model. The one or more expanded versions of the one or more inputs have an increased dimension as compared to the low-dimensional data in the potential space. The one or more expanded versions of the one or more inputs are adapted to produce one or more different outputs (e.g., 713). The method 1600 includes using 1610 the one or more expanded versions of the one or more inputs with one or more output models (e.g., 706) of the modular automatic encoder model to generate the one or more different outputs. The one or more different outputs are approximations of the one or more inputs. The one or more different outputs have the same or increased dimensions as compared to the expanded versions of the one or more inputs. The method 1600 includes estimating 1612, by a predictive model (e.g., 708) of the modular automatic encoder model, one or more parameters based on the low-dimensional data in the potential space and/or the one or more outputs.
Other operations described herein may form a separate method, or other operations described herein may be included in one or more steps (1602 to 1612) of method 1600. The operations described herein are intended to be illustrative. In some embodiments, the method may be implemented by one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order in which the operations of a given method are combined and otherwise described herein is not intended to be limiting. In some embodiments, one or more portions of a given method may be implemented (e.g., by simulation, modeling, etc.) in one or more processing devices (e.g., one or more processors). The one or more processing devices may include one or more means for performing some or all of the operations described herein in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured via hardware, firmware, and/or software specifically designed to perform, for example, one or more of the operations of a given method.
The principles described herein (e.g., utilizing relatively low dimensionality of potential space in a trained parameterized model to predict and/or otherwise determine process information) may have a number of additional applications (e.g., in addition to and/or in lieu of the applications described above). For example, the present system(s) and method(s) may be used to coordinate data from different process sensors and/or tools, which may be different even for the same measured or imaged target.
Fig. 17 is a block diagram illustrating a computer system 100 that may perform and/or assist in implementing the methods, processes, systems, or devices disclosed herein. Computer system 100 includes a bus 102 or other communication mechanism for communicating information, and a processor 104 (or multiple processors 104 and 105) coupled with bus 102 for processing information. Computer system 100 also includes a main memory 106, such as a Random Access Memory (RAM) or other dynamic storage device, coupled to bus 102 for storing information and instructions to be executed by processor 104. Main memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104. Computer system 100 also includes a Read Only Memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104. A storage device 110, such as a magnetic disk or optical disk, is provided and the storage device 110 is coupled to the bus 102 for storing information and instructions.
Computer system 100 may be coupled via bus 102 to a display 112, such as a Cathode Ray Tube (CRT) or flat panel display or touch panel display, for displaying information to a computer user. An input device 114, including alphanumeric and other keys, is coupled to bus 102 for communicating information and command selections to processor 104. Another type of user input device is a cursor control 116, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112. Such input devices typically have two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allow the device to specify positions in a plane. A touch panel (screen) display may also be used as an input device.
According to one embodiment, portions of one or more methods described herein may be performed by computer system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another computer-readable medium, such as storage device 110. Execution of the sequences of instructions contained in main memory 106 causes processor 104 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in main memory 106. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, the description herein is not limited to any specific combination of hardware circuitry and software.
The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to processor 104 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 110. Volatile media includes volatile memory, such as main memory 106. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 102. Transmission media can also take the form of acoustic or light waves, such as those generated during Radio Frequency (RF) and Infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its volatile memory and send the instructions over a telephone line using a modem. A modem local to computer system 100 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector coupled to bus 102 can receive the data carried in the infrared signal and place the data on bus 102. Bus 102 carries the data to main memory 106, from which processor 104 retrieves and executes the instructions. The instructions received by main memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.
Computer system 100 may also include a communication interface 118 coupled to bus 102. Communication interface 118 provides a two-way data communication coupling to a network link 120 that is connected to a local area network 122. For example, communication interface 118 may be an Integrated Services Digital Network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 118 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 120 typically provides data communication through one or more networks to other data devices. For example, network link 120 may provide a connection through local network 122 to a host computer 124 or to data equipment operated by an Internet Service Provider (ISP) 126. ISP 126 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 128. Both local network 122 and internet 128 use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 120 and through communication interface 118, which carry the digital data to computer system 100 and carry the digital data from computer system 100, are exemplary forms of carrier waves transporting the information.
Computer system 100 can send messages and receive data, including program code, through the network(s), network link 120 and communication interface 118. In the Internet example, a server 130 might transmit a requested code for an application program through Internet 128, ISP 126, local network 122 and communication interface 118. For example, one such downloaded application may provide all or part of the methods described herein. The received code may be executed by processor 104 as it is received, and/or stored in storage device 110, or other non-volatile storage for later execution. In this manner, computer system 100 may obtain application code in the form of a carrier wave.
FIG. 18 is a detailed view of an alternative design of the lithographic projection apparatus LA shown in FIG. 1. (FIG. 1 relates to DUV radiation because lenses and transmissive reticles are used, while FIG. 18 relates to a lithographic apparatus that uses EUV radiation, because mirrors and reflective reticles are used.) As shown in FIG. 18, a lithographic projection apparatus may comprise a source SO, an illumination system IL, and a projection system PS. The source SO is configured such that a vacuum environment may be maintained in the enclosure 220 of the source SO. An EUV radiation emitting plasma 210 may be formed, for example, by a discharge-produced plasma radiation source. EUV radiation may be generated from a gas or vapor, such as Xe gas, Li vapor, or Sn vapor, in which the plasma 210 is generated to emit radiation in the EUV range of the electromagnetic spectrum. The plasma 210 is generated, for example, by an electrical discharge that produces an at least partially ionized plasma. For efficient generation of the radiation, a partial pressure of, for example, 10 Pa of Xe, Li, Sn vapor or any other suitable gas or vapor may be required. In some embodiments, an excited plasma of tin (Sn) is provided to generate EUV radiation.
Radiation emitted by the plasma 210 is passed from the source chamber 211 into the collector chamber 212 via an optional gas barrier or contaminant trap 230 (in some cases also referred to as a contaminant barrier or foil trap) positioned in or behind an opening in the source chamber 211. The contaminant trap 230 may include a channel structure. The collector chamber 212 may include a radiation collector CO, which may be, for example, a grazing incidence collector. The radiation collector CO has an upstream radiation collector side 251 and a downstream radiation collector side 252. Radiation traversing the collector CO may be reflected off a grating spectral filter 240 to be focused at a virtual source point IF along the optical axis indicated by the line "O". The virtual source point IF is commonly referred to as the intermediate focus, and the source SO is arranged such that the intermediate focus IF is located at or near an opening 221 in the enclosure 220. The virtual source point IF is an image of the radiation-emitting plasma 210.
The radiation then traverses the illumination system IL, which may include a facet field mirror device 22 and a facet pupil mirror device 24, the facet field mirror device 22 and the facet pupil mirror device 24 being arranged to provide a desired angular distribution of the radiation beam 21 at the patterning device MA, as well as a desired uniformity of the radiation intensity at the patterning device MA. Upon reflection of the radiation beam 21 at the patterning device MA, which is held by the support structure (table) T, a patterned beam 26 is formed, and the patterned beam 26 is imaged by the projection system PS via reflective elements 28, 30 onto a substrate W held by the substrate table WT. More elements than shown may generally be present in the illumination optics unit IL and the projection system PS. A grating spectral filter 240 may optionally be present, depending, for example, on the type of lithographic apparatus. Furthermore, there may be more mirrors present than those shown in the figures; for example, there may be one to six additional reflective elements present in the projection system PS compared to those shown in fig. 18.
The collector optics CO, as illustrated in fig. 18, are depicted as a nested collector with grazing incidence reflectors 253, 254 and 255, just as an example of a collector (or collector mirror). The grazing incidence reflectors 253, 254 and 255 are disposed axially symmetrically around the optical axis O, and collector optics CO of this type may be used in combination with a discharge-produced plasma source, often called a DPP source.
Other embodiments are disclosed in the subsequent list of numbered aspects:
1. a non-transitory computer-readable medium having instructions thereon configured to cause a computer to execute a modular automatic encoder model for parameter estimation, the modular automatic encoder model comprising:
one or more input models configured to process one or more inputs into a first-level dimension suitable for combination with other inputs;
a common model configured to:
combining the processed inputs and reducing the dimensions of the combined processed inputs to generate low-dimensional data in a potential space, the low-dimensional data in the potential space having a resulting reduced second-level dimension that is smaller than the first-level dimension;
Expanding the low-dimensional data in the potential space into one or more expanded versions of one or more inputs, the one or more expanded versions of one or more inputs having an increased dimension as compared to the low-dimensional data in the potential space, the one or more expanded versions of one or more inputs adapted to produce one or more different outputs;
one or more output models configured to use one or more expanded versions of one or more inputs to generate one or more different outputs, the one or more different outputs being approximations of the one or more inputs, the one or more different outputs having the same or increased dimensions as compared to the expanded version of the one or more inputs; and
a predictive model configured to estimate one or more parameters based on the low-dimensional data in the potential space and/or the one or more different outputs.
2. The medium of aspect 1, wherein the separate input and/or output models include two or more sub-models, the two or more sub-models being associated with different portions of the sensing operation and/or manufacturing process.
3. The medium of any of the preceding aspects, wherein the separate output model comprises two or more sub-models, and the two or more sub-models comprise a sensor model and a stack model for semiconductor sensor operation.
4. The medium of any of the preceding aspects, wherein the one or more input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of the manufacturing process and/or the sensing operation, such that each of the one or more input models, the common model, and/or the one or more output models may be trained, together with and/or separately from the other models in the modular automatic encoder model, but configured separately, based on the process physical properties of the respective portion of the manufacturing process and/or the sensing operation.
5. The medium of any of the preceding aspects, wherein the number of one or more input models and the number of one or more output models are determined based on process physical property differences in different portions of the manufacturing process and/or sensing operation.
6. The medium of any of the preceding aspects, wherein the number of input models is different from the number of output models.
7. The medium of any one of the preceding aspects, wherein:
the common model includes an encoder-decoder architecture and/or a variational encoder-decoder architecture;
processing one or more inputs into a first-level dimension, and reducing the dimension of the combined processed inputs, includes encoding; and
Expanding the low-dimensional data in the potential space into one or more expanded versions of the one or more inputs includes decoding.
8. The medium of any of the preceding aspects, wherein the modular automatic encoder model is trained by comparing one or more different outputs to respective inputs, and adjusting parameterization of one or more input models, a common model, and/or one or more output models to reduce or minimize differences between the outputs and the respective inputs.
9. The medium of any of the preceding aspects, wherein the common model comprises an encoder and a decoder, and wherein the modular automatic encoder model is trained by:
applying variation to the low-dimensional data in the potential space such that the common model decodes a relatively more continuous potential space to produce a decoder signal;
Providing the decoder signal to the encoder in a recursive manner to generate new low-dimensional data;
comparing the new low-dimensional data with the low-dimensional data; and
one or more components of the modular automatic encoder model are adjusted based on the comparison to reduce or minimize differences between the new low-dimensional data and the low-dimensional data.
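For illustration only, the following sketch (reusing the hypothetical ModularAutoencoder class given after the description of method 1600) combines a reconstruction loss in the sense of aspect 8 with the recursive latent-consistency term of aspect 9. The perturbation scale, the loss weights, and the choice of comparing the new low-dimensional data with the perturbed low-dimensional data are assumptions, not requirements of the aspects.

```python
import torch
import torch.nn.functional as F

def training_step(model, inputs, optimizer, noise_scale=0.1, consistency_weight=1.0):
    # Forward pass through the modular automatic encoder model.
    processed = [m(x) for m, x in zip(model.input_models, inputs)]
    z = model.encoder(torch.cat(processed, dim=-1))
    expanded = model.decoder(z).chunk(len(inputs), dim=-1)
    outputs = [m(e) for m, e in zip(model.output_models, expanded)]

    # Aspect 8: compare each different output with its respective input.
    reconstruction_loss = sum(F.mse_loss(out, x) for out, x in zip(outputs, inputs))

    # Aspect 9: apply a change to the low-dimensional data, decode it, feed the
    # decoder signal back to the encoder, and compare the resulting new
    # low-dimensional data with the (perturbed) low-dimensional data.
    z_perturbed = z + noise_scale * torch.randn_like(z)
    decoder_signal = model.decoder(z_perturbed)
    z_new = model.encoder(decoder_signal)
    consistency_loss = F.mse_loss(z_new, z_perturbed)

    loss = reconstruction_loss + consistency_weight * consistency_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```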
10. The medium of any one of the preceding aspects, wherein:
the one or more parameters are semiconductor manufacturing process parameters;
the one or more input models and/or the one or more output models include dense feed-forward layer, convolutional layer, and/or residual network architecture of the modular automatic encoder model;
the common model includes a feed forward layer and/or a residual layer; and
The prediction model includes a feed forward layer and/or a residual layer.
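For illustration only, a hypothetical input model combining the layer types named in aspect 10 (a convolutional layer with a residual connection feeding a dense feed-forward layer) might look as follows; the shapes and kernel size are placeholders, not values from the disclosure.

```python
import torch
import torch.nn as nn

class ConvResidualInputModel(nn.Module):
    """Hypothetical input model mixing convolutional, residual, and dense feed-forward layers."""

    def __init__(self, channels, signal_length, first_level_dim):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.dense = nn.Linear(channels * signal_length, first_level_dim)

    def forward(self, x):                     # x: (batch, channels, signal_length)
        h = x + torch.relu(self.conv(x))      # residual connection around the convolution
        return self.dense(h.flatten(start_dim=1))
```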
11. A method for parameter estimation, the method comprising:
processing, by one or more input models of the modular automatic encoder model, the one or more inputs into a first-level dimension suitable for combination with other inputs;
combining the processed inputs by a common model of the modular automatic encoder model and reducing the dimensions of the combined processed inputs to produce low-dimensional data in a potential space, the low-dimensional data in the potential space having a resulting reduced second-level dimension that is smaller than the first-level dimension;
Expanding, by the common model, the low-dimensional data in the potential space into one or more expanded versions of one or more inputs, the one or more expanded versions of one or more inputs having an increased dimension as compared to the low-dimensional data in the potential space, the one or more expanded versions of one or more inputs adapted to produce one or more different outputs;
using, by one or more output models of the modular automatic encoder model, one or more expanded versions of the one or more inputs to generate one or more different outputs, the one or more different outputs being approximations of the one or more inputs, the one or more different outputs having the same or increased dimensions as compared to the expanded version of the one or more inputs; and
estimating, by a predictive model of the modular automatic encoder model, one or more parameters based on the low-dimensional data in the potential space and/or the one or more different outputs.
12. The method of any of the preceding aspects, wherein the separate input and/or output models comprise two or more sub-models, the two or more sub-models being associated with different portions of the sensing operation and/or manufacturing process.
13. The method of any of the preceding aspects, wherein the separate output model comprises two or more sub-models, and the two or more sub-models comprise a sensor model and a stack model for semiconductor sensor operation.
14. The method of any of the preceding aspects, wherein the one or more input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of the manufacturing process and/or the sensing operation, such that each of the one or more input models, the common model, and/or the one or more output models may be trained, together with and/or separately from the other models in the modular automatic encoder model, but configured separately, based on the process physical properties of the respective portion of the manufacturing process and/or the sensing operation.
15. The method of any of the preceding aspects, further comprising: the number of one or more input models and/or the number of one or more output models is determined based on differences in process physical properties in different portions of the manufacturing process and/or sensing operation.
16. The method of any of the preceding aspects, wherein the number of input models is different from the number of output models.
17. The method of any of the preceding aspects, wherein:
the common model includes an encoder-decoder architecture and/or a variational encoder-decoder architecture;
processing one or more inputs into a first-level dimension, and reducing the dimension of the combined processed inputs, includes encoding; and
Expanding the low-dimensional data in the potential space into one or more expanded versions of the one or more inputs includes decoding.
18. The method of any of the preceding aspects, further comprising training the modular automatic encoder model by comparing one or more different outputs to respective inputs, and adjusting parameterization of one or more input models, common models, and/or one or more output models to reduce or minimize differences between the outputs and the respective inputs.
19. The method of any of the preceding aspects, wherein the common model comprises an encoder and a decoder, the method further comprising training a modular automatic encoder model by:
applying variation to the low-dimensional data in the potential space such that the common model decodes a relatively more continuous potential space to produce a decoder signal;
Providing the decoder signal to the encoder in a recursive manner to generate new low-dimensional data;
comparing the new low-dimensional data with the low-dimensional data; and
one or more components of the modular automatic encoder model are adjusted based on the comparison to reduce or minimize differences between the new low-dimensional data and the low-dimensional data.
20. The method of any of the preceding aspects, wherein:
the one or more parameters are semiconductor manufacturing process parameters;
the one or more input models and/or the one or more output models include dense feed-forward layer, convolutional layer, and/or residual network architecture of the modular automatic encoder model;
the common model includes a feed forward layer and/or a residual layer; and
The prediction model includes a feed forward layer and/or a residual layer.
21. A system, comprising:
one or more input models of the modular automatic encoder model, the one or more input models configured to process one or more inputs into a first-level dimension suitable for combination with other inputs;
a common model of modular automatic encoder models, the common model configured to:
combining the processed inputs and reducing the dimensions of the combined processed inputs to generate low-dimensional data in a potential space, the low-dimensional data in the potential space having a resulting reduced second-level dimension that is smaller than the first-level dimension;
Expanding the low-dimensional data in the potential space into one or more expanded versions of one or more inputs, the one or more expanded versions of one or more inputs having an increased dimension as compared to the low-dimensional data in the potential space, the one or more expanded versions of one or more inputs adapted to produce one or more different outputs;
one or more output models of the modular automatic encoder model, the one or more output models configured to use one or more expanded versions of the one or more inputs to generate one or more different outputs, the one or more different outputs being approximations of the one or more inputs, the one or more different outputs having the same or increased dimensions as compared to the expanded version of the one or more inputs; and
a predictive model of the modular automatic encoder model, the predictive model configured to estimate one or more parameters based on the low-dimensional data in the potential space and/or the one or more different outputs.
22. The system of any of the preceding aspects, wherein the separate input and/or output models comprise two or more sub-models, the two or more sub-models being associated with different portions of the sensing operation and/or manufacturing process.
23. The system of any of the preceding aspects, wherein the separate output model comprises two or more sub-models, and the two or more sub-models comprise a sensor model and a stack model for semiconductor sensor operation.
24. The system of any of the preceding aspects, wherein the one or more input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of the manufacturing process and/or the sensing operation, such that each of the one or more input models, the common model, and/or the one or more output models may be trained, together with and/or separately from the other models in the modular automatic encoder model, but configured separately, based on the process physical properties of the respective portion of the manufacturing process and/or the sensing operation.
25. The system of any of the preceding aspects, wherein the number of one or more input models and the number of one or more output models are determined based on process physical property differences in different portions of the manufacturing process and/or sensing operation.
26. The system of any of the preceding aspects, wherein the number of input models is different from the number of output models.
27. The system of any one of the preceding aspects, wherein:
the common model includes an encoder-decoder architecture and/or a variational encoder-decoder architecture;
processing one or more inputs into a first-level dimension, and reducing the dimension of the combined processed inputs, includes encoding; and
Expanding the low-dimensional data in the potential space into one or more expanded versions of the one or more inputs includes decoding.
28. The system of any of the preceding aspects, wherein the modular automatic encoder model is trained by comparing one or more different outputs to respective inputs, and adjusting parameterization of one or more input models, a common model, and/or one or more output models to reduce or minimize differences between the outputs and the respective inputs.
29. The system of any of the preceding aspects, wherein the common model comprises an encoder and a decoder, and wherein the modular automatic encoder model is trained by:
applying variation to the low-dimensional data in the potential space such that the common model decodes a relatively more continuous potential space to produce a decoder signal;
Providing the decoder signal to the encoder in a recursive manner to generate new low-dimensional data;
comparing the new low-dimensional data with the low-dimensional data; and
one or more components of the modular automatic encoder model are adjusted based on the comparison to reduce or minimize differences between the new low-dimensional data and the low-dimensional data.
30. The system of any one of the preceding aspects, wherein:
the one or more parameters are semiconductor manufacturing process parameters;
the one or more input models and/or the one or more output models include dense feed-forward layer, convolutional layer, and/or residual network architecture of the modular automatic encoder model;
the common model includes a feed forward layer and/or a residual layer; and
The prediction model includes a feed forward layer and/or a residual layer.
31. A non-transitory computer-readable medium having instructions thereon configured to cause a computer to execute a machine learning model for parameter estimation, the machine learning model comprising:
one or more first models configured to process one or more inputs into a first-level dimension suitable for combination with other inputs;
A second model configured to:
combining the processed one or more inputs and reducing the dimension of the combined processed one or more inputs;
expanding the combined processed one or more inputs into one or more recovered versions of the one or more inputs, the one or more recovered versions of the one or more inputs adapted to produce one or more different outputs;
one or more third models configured to use one or more recovered versions of one or more inputs to produce one or more different outputs; and
a fourth model configured to estimate parameters based on the reduced-dimension combined compressed inputs and the one or more different outputs.
32. The medium of any of the preceding aspects, wherein the individual models of the one or more third models comprise two or more sub-models, the two or more sub-models being associated with different portions of the manufacturing process and/or the sensing operation.
33. The medium of any of the preceding aspects, wherein the two or more sub-models comprise a sensor model and a stack model for a semiconductor manufacturing process.
34. The medium of any of the preceding aspects, wherein the one or more first models, the second model, and the one or more third models are separate from each other and correspond to process physical property differences in different portions of the manufacturing process and/or the sensing operation, such that each of the one or more first models, the second model, and/or the one or more third models may be trained, together with and/or separately from the other models in the machine learning model, but configured separately, based on the process physical properties of the respective portion of the manufacturing process and/or the sensing operation.
35. The medium of any of the preceding aspects, wherein the number of one or more first models and the number of one or more third models are determined based on process physical property differences in different portions of the manufacturing process and/or sensing operation.
36. The medium of any of the preceding aspects, wherein the number of first models is different from the number of third models.
37. The medium of any one of the preceding aspects, wherein:
the second model includes an encoder-decoder architecture and/or a variational encoder-decoder architecture; compressing one or more inputs includes encoding; and
Expanding the combined compressed one or more inputs into one or more recovered versions of the one or more inputs includes decoding.
38. The medium of any of the preceding aspects, wherein the machine learning model is trained by comparing one or more different outputs to respective inputs, and adjusting one or more first models, second models, and/or one or more third models to reduce or minimize differences between the outputs and the respective inputs.
39. The medium of any of the preceding aspects, wherein the second model comprises an encoder and a decoder, and wherein the second model is trained by:
applying variation to the low-dimensional data in the potential space such that the second model decodes a relatively more continuous potential space to produce a decoder signal;
providing the decoder signal to the encoder in a recursive manner to generate new low-dimensional data;
comparing the new low-dimensional data with the low-dimensional data; and
the second model is adjusted based on the comparison to reduce or minimize differences between the new low-dimensional data and the low-dimensional data.
40. The medium of any one of the preceding aspects, wherein:
The parameter is a semiconductor manufacturing process parameter;
the one or more first models and/or the one or more third models comprise dense feed-forward layer, convolutional layer, and/or residual network architecture of machine learning models;
the second model comprises a feedforward layer and/or a residual layer; and
The fourth model includes a feed forward layer and/or a residual layer.
41. A non-transitory computer-readable medium having instructions thereon, the instructions configured to cause a computer to execute a modular automatic encoder model that estimates an obtainable amount of information content, based on available channels of measurement data from an optical metrology platform and using a subset of a plurality of input models, in order to estimate a parameter of interest from a combination of the available channels of measurement data from the optical metrology platform, the instructions causing operations comprising:
causing the plurality of input models to compress the plurality of inputs based on the available channels such that the plurality of inputs are adapted to be combined with each other; and
causing the common model to combine the compressed inputs and generate low-dimensional data in the potential space based on the combined compressed inputs, wherein the low-dimensional data estimates the obtainable amount of information content, and wherein the low-dimensional data in the potential space is configured to be used by one or more additional models to generate approximations of the plurality of inputs and/or to estimate the parameter of interest based on the low-dimensional data.
42. The medium of any of the preceding aspects, the instructions causing further operations comprising:
the modular automatic encoder model is trained by:
iteratively changing a subset of the compressed inputs combined by the common model and used to generate the low-dimensional training data;
comparing one or more training approximations and/or training parameters generated or predicted based on the low-dimensional training data to corresponding references; and
adjusting one or more of the plurality of input models, the common model, and/or one or more of the additional models to reduce or minimize differences between the one or more training approximations and/or training parameters and the reference based on the comparison;
such that the common model is configured to combine the compressed inputs and generate the low-dimensional data for use in generating the approximation and/or the estimated parameters, regardless of which of the plurality of inputs are combined by the common model.
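For illustration only, the following sketch shows one way to realize the iterative channel-subset variation of aspect 42, reusing the hypothetical model class sketched earlier in this document. Zero-masking the compressed inputs of unselected channels is an assumption about how a subset is combined by the common model, and the uniform random subset size is only one of the contemplated selection schemes (random or statistically structured variation).

```python
import random
import torch

def channel_subset_step(model, inputs, reference_params, optimizer, loss_fn):
    """One training iteration over a randomly selected subset of available channels."""
    n_channels = len(inputs)
    subset_size = random.randint(1, n_channels)              # vary the subset each iteration
    selected = set(random.sample(range(n_channels), subset_size))

    processed = []
    for i, (input_model, x) in enumerate(zip(model.input_models, inputs)):
        compressed = input_model(x)
        if i not in selected:
            compressed = torch.zeros_like(compressed)         # channel not available this iteration
        processed.append(compressed)

    # The common model combines whichever compressed inputs are present.
    z = model.encoder(torch.cat(processed, dim=-1))           # low-dimensional training data
    predicted_params = model.predictor(z)

    # Compare the training parameters with the corresponding reference and adjust.
    loss = loss_fn(predicted_params, reference_params)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```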
43. The medium of any of the preceding aspects, wherein the variation of individual iterations is random, or wherein the variation of individual iterations is varied in a statistically significant manner.
44. The medium of any of the preceding aspects, wherein the variation of individual iterations is configured such that after a target number of iterations, each of the compressed inputs has been included at least once in a subset of the compressed inputs.
45. The medium of any of the preceding aspects, wherein iteratively changing the subset of compressed inputs combined by the common model and used to generate the low-dimensional training data comprises channel selection from among a set of possible available channels associated with the optical metrology platform.
46. The medium of any of the preceding aspects, wherein iteratively changing, comparing, and adjusting are repeated until the target converges.
47. The medium of any of the preceding aspects, wherein the iterative changing, comparing, and adjusting are configured to reduce or eliminate bias that may occur with a combinatorial search across the channels.
48. The medium of any of the preceding aspects, wherein the one or more additional models comprise: one or more output models configured to generate approximations of one or more inputs; and a predictive model configured to estimate parameters based on the low-dimensional data, and
wherein one or more of the plurality of input models, the common model, and/or the one or more additional models are configured to be adjusted to reduce or minimize differences between the one or more training approximations and/or training manufacturing process parameters and the respective references.
49. The medium of any of the preceding aspects, wherein the plurality of input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of the manufacturing process and/or the sensing operation, such that each of the plurality of input models, the common model, and/or the one or more output models may be trained, together with and/or separately from the other models in the modular automatic encoder model, but configured separately, based on the process physical properties of the respective portion of the manufacturing process and/or the sensing operation.
50. The medium of any one of the preceding aspects, wherein:
the individual input models include neural network blocks including dense feed-forward, convolutional, and/or residual network architectures of the modular automatic encoder model; and
The common model includes a neural network block that includes a feed-forward layer and/or a residual layer.
51. A method for estimating an obtainable amount of information content, based on available channels of measurement data from an optical metrology platform and using a subset of a plurality of input models of a modular automatic encoder model, in order to estimate a parameter of interest from a combination of the available channels of measurement data from the optical metrology platform, the method comprising:
causing the plurality of input models to compress the plurality of inputs based on the available channels such that the plurality of inputs are adapted to be combined with each other; and
causing the common model of the modular automatic encoder model to combine the compressed inputs and generate low-dimensional data in the potential space based on the combined compressed inputs, wherein the low-dimensional data estimates the obtainable amount of information content, and wherein the low-dimensional data in the potential space is configured to be used by one or more additional models to generate approximations of the plurality of inputs and/or to estimate the parameter of interest based on the low-dimensional data.
52. The method of any of the preceding aspects, the method further comprising:
the modular automatic encoder model is trained by:
iteratively changing a subset of the compressed inputs combined by the common model and used to generate the low-dimensional training data;
comparing one or more training approximations and/or training parameters generated or predicted based on the low-dimensional training data to corresponding references; and
adjusting one or more of the plurality of input models, the common model, and/or one or more of the additional models to reduce or minimize differences between the one or more training approximations and/or training parameters and the reference based on the comparison;
Such that the common model is configured to combine the compressed inputs and generate low-dimensional data for generating approximations and/or estimated parameters, regardless of which of the plurality of inputs are combined by the common model.
53. The method of any of the preceding aspects, wherein the variation of individual iterations is random, or wherein the variation of individual iterations is varied in a statistically significant manner.
54. The method of any of the preceding aspects, wherein the variation of individual iterations is configured such that after a target number of iterations, each of the compressed inputs has been included at least once in a subset of the compressed inputs.
55. The method of any of the preceding aspects, wherein iteratively changing the subset of compressed inputs combined by the common model and used to generate the low-dimensional training data comprises channel selection from among a set of possible available channels associated with the optical metrology platform.
56. The method of any of the preceding aspects, wherein iteratively changing, comparing, and adjusting are repeated until the target converges.
57. The method of any of the preceding aspects, wherein the iterative changing, comparing, and adjusting are configured to reduce or eliminate bias that may occur with a combinatorial search across the channels.
58. The method of any of the preceding aspects, wherein the one or more additional models comprise: one or more output models configured to generate approximations of one or more inputs; and a predictive model configured to estimate parameters based on the low-dimensional data, and
wherein one or more of the plurality of input models, the common model, and/or the one or more additional models are configured to be adjusted to reduce or minimize differences between the one or more training approximations and/or training manufacturing process parameters and the respective references.
59. The method of any of the preceding aspects, wherein the plurality of input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of the manufacturing process and/or the sensing operation, such that each of the plurality of input models, the common model, and/or the one or more output models may be trained, together with and/or separately from the other models in the modular automatic encoder model, but configured separately, based on the process physical properties of the respective portion of the manufacturing process and/or the sensing operation.
60. The method of any of the preceding aspects, wherein:
the individual input models include neural network blocks including dense feed-forward, convolutional, and/or residual network architectures of the modular automatic encoder model; and
The common model includes a neural network block that includes a feed-forward layer and/or a residual layer.
61. A system for estimating an obtainable amount of information content, based on available channels of measurement data from an optical metrology platform and using a subset of a plurality of input models of a modular automatic encoder model, in order to estimate a parameter of interest from a combination of the available channels of measurement data from the optical metrology platform, the system comprising:
a plurality of input models configured to compress a plurality of inputs based on available channels such that the plurality of inputs are adapted to be combined with one another; and
a common model of the modular automatic encoder model, the common model configured to combine the compressed inputs and generate low-dimensional data in the potential space based on the combined compressed inputs, wherein the low-dimensional data estimates the obtainable amount of information content, and wherein the low-dimensional data in the potential space is configured to be used by one or more additional models to generate approximations of the plurality of inputs and/or estimate the parameter of interest based on the low-dimensional data.
62. The system of any of the preceding aspects, wherein the modular automatic encoder model is configured to be trained by:
iteratively changing a subset of the compressed inputs combined by the common model and used to generate the low-dimensional training data;
comparing one or more training approximations and/or training parameters generated or predicted based on the low-dimensional training data to corresponding references; and
adjusting one or more of the plurality of input models, the common model, and/or one or more of the additional models to reduce or minimize differences between the one or more training approximations and/or training parameters and the reference based on the comparison;
such that the common model is configured to combine the compressed inputs and generate low-dimensional data for generating approximations and/or estimated parameters, regardless of which of the plurality of inputs are combined by the common model.
63. The system of any of the preceding aspects, wherein the variation of individual iterations is random, or wherein the variation of individual iterations is varied in a statistically significant manner.
64. The system of any of the preceding aspects, wherein the variation of individual iterations is configured such that after a target number of iterations, each of the compressed inputs has been included at least once in a subset of the compressed inputs.
65. The system of any of the preceding aspects, wherein iteratively changing the subset of compressed inputs combined by the common model and used to generate the low-dimensional training data comprises channel selection from among a set of possible available channels associated with the optical metrology platform.
66. The system of any of the preceding aspects, wherein iteratively changing, comparing, and adjusting are repeated until the target converges.
67. The system of any of the preceding aspects, wherein the iterative changing, comparing, and adjusting are configured to reduce or eliminate bias that may occur with a combinatorial search across the channels.
68. The system of any of the preceding aspects, wherein the one or more additional models comprise: one or more output models configured to generate approximations of one or more inputs; and a predictive model configured to estimate parameters based on the low-dimensional data, and
wherein one or more of the plurality of input models, the common model, and/or the one or more additional models are configured to be adjusted to reduce or minimize differences between the one or more training approximations and/or training manufacturing process parameters and the respective references.
69. The system of any of the preceding aspects, wherein the plurality of input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of the manufacturing process and/or the sensing operation, such that each of the plurality of input models, the common model, and/or the one or more output models may be trained, together with and/or separately from the other models in the modular automatic encoder model, but configured separately, based on the process physical properties of the respective portion of the manufacturing process and/or the sensing operation.
70. The system of any one of the preceding aspects, wherein:
the individual input models include neural network blocks including dense feed-forward, convolutional, and/or residual network architectures of the modular automatic encoder model; and
The common model includes a neural network block that includes a feed-forward layer and/or a residual layer.
71. A non-transitory computer-readable medium having instructions thereon configured to cause a computer to execute a modular automatic encoder model for parameter estimation, the instructions causing operations comprising:
compressing the plurality of inputs by the plurality of input models such that the plurality of inputs are adapted to be combined with each other; and
combining, by the common model, the compressed inputs and generating low-dimensional data in a potential space based on the combined compressed inputs, the low-dimensional data in the potential space being configured to be used by one or more additional models to generate approximations of the one or more inputs and/or predict parameters based on the low-dimensional data,
wherein the common model is configured to combine the compressed inputs and produce low-dimensional data regardless of which of the plurality of inputs are combined by the common model.
72. The medium of any of the preceding aspects, the instructions causing further operations comprising:
the modular automatic encoder model is trained by:
iteratively changing a subset of the compressed inputs combined by the common model and used to generate the low-dimensional training data;
comparing one or more training approximations and/or training parameters generated or estimated based on the low-dimensional training data to corresponding references; and
adjusting one or more of the plurality of input models, the common model, and/or the additional models to reduce or minimize differences between the one or more training approximations and/or training parameters and the reference based on the comparison;
Such that the common model is configured to combine the compressed inputs and generate low-dimensional data for use in generating approximations and/or estimating process parameters, regardless of which of the plurality of inputs are combined by the common model.
73. The medium of any of the preceding aspects, wherein the variation of individual iterations is random, or wherein the variation of individual iterations is varied in a statistically significant manner.
74. The medium of any of the preceding aspects, wherein the variation of individual iterations is configured such that after a target number of iterations, each of the compressed inputs has been included at least once in a subset of the compressed inputs.
75. The medium of any of the preceding aspects, wherein the one or more additional models comprise: one or more output models configured to generate approximations of one or more inputs; and a predictive model configured to estimate parameters based on the low-dimensional data, and
wherein adjusting one or more of the plurality of input models, the common model, and/or the additional models based on the comparison to reduce or minimize differences between the one or more training approximations and/or training parameters and the reference comprises adjusting at least one output model and/or the predictive model.
76. The medium of any of the preceding aspects, wherein the plurality of input models, the common model, and the one or more output models are separate from each other and correspond to process physical property differences in different portions of the manufacturing process and/or the sensing operation, such that each of the plurality of input models, the common model, and/or the one or more output models may be trained, together with and/or separately from the other models in the modular automatic encoder model, but configured separately, based on the process physical properties of the respective portion of the manufacturing process and/or the sensing operation.
77. The medium of any of the preceding aspects, wherein iteratively changing the subset of compressed inputs combined by the common model and used to generate the low-dimensional training data comprises channel selection from among a set of possible channels associated with one or more aspects of a semiconductor manufacturing process and/or sensing operation.
78. The medium of any of the preceding aspects, wherein iteratively changing, comparing, and adjusting are repeated until the target converges.
79. The medium of any of the preceding aspects, wherein the iterative changing, comparing, and adjusting are configured to reduce or eliminate bias relative to the bias that may occur with a combinatorial search across the channels.
80. The medium of any one of the preceding aspects, wherein:
the parameter is a semiconductor manufacturing process parameter;
the individual input models include neural network blocks including dense feed-forward, convolutional, and/or residual network architectures of the modular automatic encoder model; and is also provided with
The common model includes a neural network block that includes a feed-forward layer and/or a residual layer.
81. A non-transitory computer-readable medium having instructions thereon configured to cause a computer to execute a modular automatic encoder model having an extended range of applicability for estimating a parameter of interest of an optical metrology operation by enforcing known properties of an input of the modular automatic encoder model in a decoder of the modular automatic encoder model, the instructions causing operations comprising:
causing an encoder of the modular automatic encoder model to encode the input to produce a low-dimensional representation of the input in potential space; and
such that a decoder of the modular automatic encoder model generates an output corresponding to the input by decoding the low-dimensional representation, wherein the decoder is configured to enforce known properties of the encoded input during decoding to generate the output, wherein the known properties are associated with a known physical relationship between the low-dimensional representation in the potential space and the output, and wherein the parameter of interest is estimated based on the output and/or the low-dimensional representation of the input in the potential space.
82. The medium of any of the preceding aspects, wherein enforcing comprises using a penalty term in a cost function associated with the decoder to penalize a difference between the generated output and an output that should be generated according to the known properties.
83. The medium of any of the preceding aspects, wherein the penalty term comprises a difference between decoded versions of the low-dimensional representations of the inputs that are related to each other via a physical prior.
84. The medium of any of the preceding aspects, wherein the known property is a known symmetry property, and wherein the penalty term comprises a difference between decoded versions of the low-dimensional representation of the input, the decoded versions being reflections of each other across, or rotations of each other around, a point of symmetry.
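For illustration only, the following sketch shows one possible penalty term of the kind described in aspects 82 to 84. It assumes, hypothetically, that negating one latent coordinate (for example an overlay-like parameter) should correspond to reflecting the decoded signal about its point of symmetry, and it penalizes the difference between the two decoded versions after undoing that reflection; such a term would be added to the decoder's cost function during training.

```python
import torch
import torch.nn.functional as F

def symmetry_penalty(decoder, z, flip_index=0):
    """Hypothetical penalty enforcing a known (anti)symmetry property in the decoder."""
    z_reflected = z.clone()
    z_reflected[..., flip_index] = -z_reflected[..., flip_index]   # reflect across the symmetry point

    decoded = decoder(z)                     # decoded version of the low-dimensional representation
    decoded_reflected = decoder(z_reflected)

    # Reverse the signal axis of the second decoded version; if the assumed symmetry
    # holds, the two decoded versions should then coincide, and any difference is penalized.
    last_dim = decoded_reflected.dim() - 1
    mirrored = torch.flip(decoded_reflected, dims=[last_dim])
    return F.mse_loss(decoded, mirrored)
```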
85. The medium of any of the preceding aspects, wherein the encoder and/or decoder are configured to be adjusted based on any differences between decoded versions of the low-dimensional representation, wherein adjusting comprises adjusting at least one weight associated with a layer of the encoder and/or decoder.
86. The medium of any of the preceding aspects, wherein the input comprises a sensor signal associated with a sensing operation in a semiconductor manufacturing process, the low-dimensional representation of the input is a compressed representation of the sensor signal, and the output is an approximation of the input sensor signal.
87. The medium of any of the preceding aspects, wherein the sensor signal comprises a pupil image, and wherein the encoded representation of the pupil image is configured for estimating overlay (as one example of a number of possible parameters of interest).
88. The medium of any of the preceding aspects, wherein the instructions cause other operations comprising:
processing the input into a first-level dimension suitable for combination with other inputs by an input model of a modular automatic encoder model, and providing the processed input to an encoder;
receiving an expanded version of the input from the decoder through an output model of the modular automatic encoder model, and generating an approximation of the input based on the expanded version; and
estimating, by a predictive model of the modular automatic encoder model, the parameter of interest based on the output and/or the low-dimensional representation of the input in the potential space (the output including and/or being associated with the approximation of the input).
89. The medium of any of the preceding aspects, wherein the input model, the encoder/decoder, and the output model are separate from each other and correspond to process physical property differences in different portions of the manufacturing process and/or the sensing operation, such that each of the input model, the encoder/decoder, and/or the output model may be trained, together with and/or separately from the other models in the modular automatic encoder model, but configured separately, based on the process physical properties of the respective portion of the manufacturing process and/or the sensing operation.
90. The medium of any of the preceding aspects, wherein the decoder is configured to enforce known symmetry properties of the encoded input during the training phase such that the modular automatic encoder model complies with the enforced known symmetry properties during the inference phase.
91. A method for estimating a parameter of interest of an optical metrology operation using a modular automatic encoder model having an extended range of applicability, by enforcing known properties of an input of the modular automatic encoder model in a decoder of the modular automatic encoder model, the method comprising:
causing an encoder of the modular automatic encoder model to encode the input to produce a low-dimensional representation of the input in potential space; and
such that a decoder of the modular automatic encoder model generates an output corresponding to the input by decoding the low-dimensional representation, wherein the decoder is configured to enforce known properties of the encoded input during decoding to generate the output, wherein the known properties are associated with a known physical relationship between the low-dimensional representation in the potential space and the output, and wherein the parameter of interest is estimated based on the output and/or the low-dimensional representation of the input in the potential space.
92. The method of any of the preceding aspects, wherein enforcing comprises using a penalty term in a cost function associated with the decoder to penalize a difference between the generated output and an output that should be generated according to the known properties.
93. The method of any of the preceding aspects, wherein the penalty term comprises a difference between decoded versions of the low-dimensional representations of the inputs that are related to each other via a physical prior.
94. The method of any of the preceding aspects, wherein the known property is a known symmetry property, and wherein the penalty term comprises a difference between decoded versions of the low-dimensional representation of the input, the decoded versions being reflections of each other across, or rotations of each other around, a point of symmetry.
95. The method of any of the preceding aspects, wherein the encoder and/or decoder are configured to be adjusted based on any differences between decoded versions of the low-dimensional representation, wherein adjusting comprises adjusting at least one weight associated with a layer of the encoder and/or decoder.
96. The method of any of the preceding aspects, wherein the input comprises a sensor signal associated with a sensing operation in a semiconductor manufacturing process, the low-dimensional representation of the input is a compressed representation of the sensor signal, and the output is an approximation of the input sensor signal.
97. The method of any of the preceding aspects, wherein the sensor signal comprises a pupil image, and wherein the encoded representation of the pupil image is configured for estimating overlay (as one example of a number of possible parameters of interest).
98. The method of any of the preceding aspects, the method further comprising:
processing, by an input model of the modular automatic encoder model, the input into a first-level dimension suitable for combination with other inputs, and providing the processed input to the encoder;
receiving, by an output model of the modular automatic encoder model, an expanded version of the input from the decoder, and generating an approximation of the input based on the expanded version; and
estimating, by a predictive model of the modular automatic encoder model, the parameter of interest based on the low-dimensional representation of the input in the potential space and/or the output, the output including and/or being related to the approximation of the input.
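As a purely illustrative aid (an assumption-laden sketch, not the claimed implementation), the modular arrangement recited in aspect 98, with an input model, encoder, decoder, output model, and a predictive model reading the potential (latent) space, could be composed as follows; all layer sizes, class names, and activation choices are hypothetical.

```python
# Hypothetical composition of the modular automatic encoder model described in
# aspect 98. Dimensions and module choices are illustrative assumptions.
import torch
from torch import nn


class ModularAutoencoder(nn.Module):
    def __init__(self, in_dim=256, first_level_dim=64, latent_dim=8, n_params=1):
        super().__init__()
        # input model: processes an input into a first-level dimension
        self.input_model = nn.Sequential(nn.Linear(in_dim, first_level_dim), nn.ReLU())
        # encoder/decoder: compress to and expand from the low-dimensional space
        self.encoder = nn.Linear(first_level_dim, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, first_level_dim), nn.ReLU())
        # output model: turns the expanded version back into an approximation of the input
        self.output_model = nn.Linear(first_level_dim, in_dim)
        # predictive model: estimates a parameter of interest from the latent vector
        self.prediction_model = nn.Linear(latent_dim, n_params)

    def forward(self, x):
        processed = self.input_model(x)
        z = self.encoder(processed)                   # low-dimensional representation
        expanded = self.decoder(z)                    # expanded version of the input
        approximation = self.output_model(expanded)   # approximation of the input
        parameter = self.prediction_model(z)          # estimated parameter of interest
        return approximation, parameter, z
```

In such a sketch, the reconstruction loss on the approximation and the loss on the estimated parameter can be trained jointly, while each sub-module remains separately configurable.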
99. The method of any of the preceding aspects, wherein the input model, encoder/decoder, and output model are separate from each other and correspond to different process physical properties of different portions of the manufacturing process and/or sensing operation, such that each of the input model, encoder/decoder, and/or output model can be trained together with, and/or separately from, the other models in the modular automatic encoder model, but is configured separately based on the process physical properties of the corresponding portion of the manufacturing process and/or sensing operation.
100. The method of any of the preceding aspects, wherein the decoder is configured to enforce known symmetry properties of the encoded input during the training phase such that the modular automatic encoder model complies with the enforced known symmetry properties during the inference phase.
101. A system configured to execute a modular automatic encoder model having an extended range of applicability for estimating a parameter of interest of an optical metrology operation by enforcing known properties of an input of the modular automatic encoder model in a decoder of the modular automatic encoder model, the system comprising:
an encoder of a modular automatic encoder model, the encoder configured to encode an input to produce a low-dimensional representation of the input in potential space; and
a decoder of a modular automatic encoder model configured to generate an output corresponding to an input by decoding a low-dimensional representation, wherein the decoder is configured to enforce known properties of the encoded input during decoding to generate the output, wherein the known properties are associated with a known physical relationship between the low-dimensional representation in potential space and the output, and wherein a parameter of interest is estimated based on the output and/or the low-dimensional representation of the input in potential space.
102. The system of any of the preceding aspects, wherein enforcing comprises using a penalty term in a cost function associated with the decoder to penalize differences between the generated outputs and the outputs that should be generated according to the known properties.
103. The system of any of the preceding aspects, wherein the penalty term comprises a difference between decoded versions of the low-dimensional representations of the inputs that are related to each other via a physical prior.
104. The system of any of the preceding aspects, wherein the known property is a known symmetry property, and wherein the penalty term comprises a difference between decoded versions of the low-dimensional representation of the input, the decoded versions being reflections of each other across, or rotations of each other around, a point of symmetry.
105. The system of any of the preceding aspects, wherein the encoder and/or decoder are configured to adjust based on any differences between decoded versions of the low-dimensional representation, wherein adjusting comprises adjusting at least one weight associated with a layer of the encoder and/or decoder.
106. The system of any of the preceding aspects, wherein the input comprises a sensor signal associated with a sensing operation in a semiconductor manufacturing process, the low-dimensional representation of the input is a compressed representation of the sensor signal, and the output is an approximation of the input sensor signal.
107. The system of any of the preceding aspects, wherein the sensor signal comprises a pupil image, and wherein the encoded representation of the pupil image is configured for estimating overlay (as one example of many possible parameters of interest).
108. The system of any of the preceding aspects, further comprising:
an input model of a modular automatic encoder model, the input model configured to process input into a first-level dimension suitable for combination with other inputs, and to provide the processed input to an encoder;
an output model of the modular automatic encoder model, the output model configured to receive an expanded version of the input from the decoder and to generate an approximation of the input based on the expanded version; and
a predictive model of a modular automatic encoder model, the predictive model configured to estimate a parameter of interest based on a low-dimensional representation of an input in a potential space.
109. The system of any of the preceding aspects, wherein the input model, encoder/decoder, and output model are separate from each other and correspond to different process physical properties of different portions of the manufacturing process and/or sensing operation, such that each of the input model, encoder/decoder, and/or output model can be trained together with, and/or separately from, the other models in the modular automatic encoder model, but is configured separately based on the process physical properties of the corresponding portion of the manufacturing process and/or sensing operation.
110. The system of any of the preceding aspects, wherein the decoder is configured to enforce known symmetry properties of the encoded input during the training phase such that the modular automatic encoder model complies with the enforced known symmetry properties during the inference phase.
111. A non-transitory computer-readable medium having instructions thereon configured to cause a computer to execute a modular automatic encoder model configured to generate an output based on an input, the instructions causing operations comprising:
causing an encoder of the modular automatic encoder model to encode the input to produce a low-dimensional representation of the input in potential space; and
causing a decoder of the modular automatic encoder model to generate an output by decoding the low-dimensional representation, wherein the decoder is configured to enforce a known property of the encoded input during decoding to generate the output, the known property being associated with a known physical relationship between the low-dimensional representation in the potential space and the output.
112. The medium of any of the preceding aspects, wherein enforcing comprises using a penalty term in a cost function associated with the decoder to penalize differences between the generated outputs and the outputs that should be generated according to the known properties.
113. The medium of any of the preceding aspects, wherein the penalty term comprises a difference between decoded versions of the low-dimensional representations of the inputs that are related to each other via a physical prior.
114. The medium of any of the preceding aspects, wherein the encoder and/or decoder are configured to adjust based on any differences between decoded versions of the low-dimensional representation, wherein adjusting comprises adjusting at least one weight associated with a layer of the encoder and/or decoder.
115. The medium of any of the preceding aspects, wherein the input comprises a sensor signal associated with a sensing operation in a semiconductor manufacturing process, the low-dimensional representation of the input is a compressed representation of the sensor signal, and the output is an approximation of the input sensor signal.
116. The medium of any of the preceding aspects, wherein the sensor signal comprises a pupil image, and wherein the encoded representation of the pupil image is configured for estimating overlay (as one example of many possible parameters of interest).
117. The medium of any of the preceding aspects, wherein the modular automatic encoder model further comprises:
an input model configured to process an input into a first-level dimension suitable for combination with other inputs and to provide the processed input to an encoder;
an output model configured to receive the expanded version of the input from the decoder and to generate an approximation of the input based on the expanded version; and
a predictive model configured to estimate manufacturing process parameters based on a low-dimensional representation of an input in a potential space.
118. The medium of any one of the preceding aspects, wherein:
the parameter is a semiconductor manufacturing process parameter;
the input model comprises a neural network block comprising a dense feed-forward layer, a convolutional layer, and/or a residual network architecture of the modular automatic encoder model;
the encoder and/or decoder comprises a neural network block comprising a feed-forward layer and/or a residual layer; and
the prediction model comprises a neural network block comprising a feed-forward layer and/or a residual layer.
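By way of a hedged illustration of aspect 118 only, the block types named there (dense feed-forward layers, convolutional layers, and a residual connection) could be assembled as in the sketch below; the class names, kernel sizes, and pooling choice are assumptions of this example, not features disclosed here.

```python
# Hypothetical realization of the neural network blocks named in aspect 118.
# All sizes and names are illustrative assumptions.
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Dense feed-forward layers wrapped in a residual (skip) connection."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.body(x)


class PupilInputModel(nn.Module):
    """Assumed input model for a pupil image: convolutional feature extraction,
    a dense projection to the first-level dimension, then a residual block."""
    def __init__(self, first_level_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.dense = nn.Sequential(nn.Flatten(), nn.Linear(8 * 4 * 4, first_level_dim))
        self.residual = ResidualBlock(first_level_dim)

    def forward(self, pupil):            # pupil: (batch, 1, height, width)
        return self.residual(self.dense(self.conv(pupil)))
```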
119. The medium of any of the preceding aspects, wherein the input model, encoder/decoder, and output model are separate from each other and correspond to different process physical properties of different portions of the manufacturing process and/or sensing operation, such that each of the input model, encoder/decoder, and/or output model can be trained together with, and/or separately from, the other models in the modular automatic encoder model, but is configured separately based on the process physical properties of the corresponding portion of the manufacturing process and/or sensing operation.
120. The medium of any of the preceding aspects, wherein the decoder is configured to enforce known symmetry properties of the encoded input during the training phase such that the modular automatic encoder model complies with the enforced known symmetry properties during the inference phase.
The concepts disclosed herein may be used to simulate or mathematically model any general imaging system for imaging sub-wavelength features, and may be used in particular with emerging imaging technologies capable of producing increasingly shorter wavelengths. Emerging technologies already in use include EUV (extreme ultraviolet) lithography and DUV lithography, which can produce a 193 nm wavelength by using an ArF laser and even a 157 nm wavelength by using a fluorine laser. Furthermore, EUV lithography can produce wavelengths in the range of 20 nm to 5 nm by using a synchrotron or by striking a material (solid or plasma) with high-energy electrons in order to produce photons in this range.
While the concepts disclosed herein may be used for imaging on substrates such as silicon wafers, it should be understood that the disclosed concepts may be used with any type of lithographic imaging system, for example lithographic imaging systems and/or metrology systems used for imaging on substrates other than silicon wafers. In addition, combinations and sub-combinations of the disclosed elements may comprise separate embodiments. For example, predicting complex electric field images and determining metrology metrics such as overlay may be performed by the same parameterized model and/or by different parameterized models. These features may comprise separate embodiments and/or may be used together in the same embodiment.
Although specific reference may be made herein to embodiments of the invention in the context of a lithographic apparatus, embodiments of the invention may be used in other apparatuses. Embodiments of the invention may form part of a mask inspection apparatus, a lithographic apparatus, or any apparatus that measures or processes an object such as a wafer (or other substrate) or a mask (or other patterning device). These devices may be generally referred to as lithographic tools. Such a lithographic tool may use vacuum conditions or ambient (non-vacuum) conditions.
While the foregoing may have made specific reference to the use of embodiments of the invention in the context of optical lithography, it will be appreciated that, where the context allows, the invention is not limited to optical lithography and may be used in other applications, for example imprint lithography. While specific embodiments of the invention have been described above, it should be appreciated that the invention may be practiced otherwise than as described. The above description is intended to be illustrative, and not restrictive. Accordingly, it will be apparent to those skilled in the art that modifications may be made to the invention as described without departing from the scope of the claims set out below.

Claims (15)

1. A non-transitory computer-readable medium having instructions thereon configured to cause a computer to execute a modular automatic encoder model for estimating an obtainable amount of information content in available channels of metrology data from an optical metrology platform, by using a subset of a plurality of input models, so as to estimate a parameter of interest from a combination of the available channels, the instructions causing operations comprising:
causing the plurality of input models to compress a plurality of inputs based on the available channels such that the plurality of inputs are adapted to be combined with one another; and
combining the compressed inputs with a common model and generating low-dimensional data in a potential space based on the combined compressed inputs, wherein the low-dimensional data provides an estimate of the obtainable amount of information content, and wherein the low-dimensional data in the potential space is configured to be used by one or more additional models to generate approximations of the plurality of inputs and/or to estimate the parameter based on the low-dimensional data.
2. The medium of claim 1, the instructions to cause further operations comprising:
training the modular automatic encoder model by:
iteratively changing a subset of the compressed inputs combined by the common model and used to generate low-dimensional training data;
comparing training parameters, and/or one or more training approximations, generated or predicted based on the low-dimensional training data to respective references; and
adjusting one or more of the plurality of input models, the common model, and/or one or more of the additional models based on the comparison to reduce or minimize differences between the one or more training approximations and/or the training parameters and the reference;
such that the common model is configured to combine the compressed inputs and generate the low-dimensional data for generating the approximations and/or estimated parameters, regardless of which of the plurality of inputs are combined by the common model.
3. The medium of claim 2, wherein the change of individual iterations is random, or wherein the change of individual iterations is varied in a statistically significant manner.
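Purely as an illustrative sketch of the training behaviour described in claims 2 and 3, and not as the claimed method, the loop below randomly varies which channel subset is compressed and combined at each iteration; the sub-model interfaces, the sum-based combination in the common model, and all names are assumptions of this example.

```python
# Hypothetical training step: a random subset of channels is compressed by the
# corresponding input models, combined by the common model, and the losses are
# used to adjust all sub-models, so that the common model learns to produce
# usable low-dimensional data regardless of which inputs are combined.
# Interfaces (encode/decode methods, dict-of-tensors data) are assumptions.
import random
import torch


def train_step(input_models, common_model, output_models, prediction_model,
               channel_data, reference_parameter, optimizer):
    channels = list(channel_data.keys())
    subset = random.sample(channels, k=random.randint(1, len(channels)))  # iterative variation

    # compress the selected inputs so they are adapted to be combined (same width)
    compressed = [input_models[c](channel_data[c]) for c in subset]
    combined = torch.stack(compressed, dim=0).sum(dim=0)    # assumed combination rule
    z = common_model.encode(combined)                        # low-dimensional training data

    # compare training approximations and the training parameter to references
    recon_loss = sum(
        torch.mean((output_models[c](common_model.decode(z)) - channel_data[c]) ** 2)
        for c in subset
    )
    param_loss = torch.mean((prediction_model(z) - reference_parameter) ** 2)

    loss = recon_loss + param_loss
    optimizer.zero_grad()
    loss.backward()        # adjust input models, common model, and additional models
    optimizer.step()
    return loss.item()
```

Repeating this over many iterations, for example until each channel has appeared in the subset at least once, is one way to avoid the cost and bias of an exhaustive search over channel combinations.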
4. A method for estimating an obtainable amount of information content in available channels of metrology data from an optical metrology platform, by using a subset of a plurality of input models of a modular automatic encoder model, so as to estimate a parameter of interest from a combination of the available channels, the method comprising:
causing the plurality of input models to compress a plurality of inputs based on the available channels such that the plurality of inputs are adapted to be combined with one another; and
combining the compressed inputs with a common model of the modular automatic encoder model and generating low-dimensional data in a potential space based on the combined compressed inputs, wherein the low-dimensional data provides an estimate of the obtainable amount of information content, and wherein the low-dimensional data in the potential space is configured to be used by one or more additional models to generate approximations of the plurality of inputs and/or to estimate the parameter based on the low-dimensional data.
5. The method of claim 4, the method further comprising:
training the modular automatic encoder model by:
iteratively changing a subset of the compressed inputs combined by the common model and used to generate low-dimensional training data;
comparing training parameters, and/or one or more training approximations, generated or predicted based on the low-dimensional training data to respective references; and
adjusting one or more of the plurality of input models, the common model, and/or one or more of the additional models based on the comparison to reduce or minimize differences between the one or more training approximations and/or the training parameters and the reference;
such that the common model is configured to combine the compressed inputs and generate the low-dimensional data for generating the approximations and/or estimated parameters, regardless of which of the plurality of inputs are combined by the common model.
6. The method of claim 5, wherein the change of individual iterations is random, or wherein the change of individual iterations is varied in a statistically significant manner.
7. The method of claim 5 or 6, wherein the variation of individual iterations is configured such that after a target number of iterations, each of the compressed inputs has been included at least once in the subset of compressed inputs.
8. The method of any of claims 5 to 7, wherein iteratively changing the subset of compressed inputs combined by the common model and used to generate the low-dimensional training data comprises: making a channel selection from among a set of possible available channels associated with the optical metrology platform.
9. The method of any of claims 5 to 8, wherein the iteratively varying, comparing, and adjusting are repeated until an objective converges.
10. The method of any of claims 5 to 9, wherein the iteratively varying, comparing, and adjusting are configured to reduce or eliminate bias that may otherwise occur for a full search over channel combinations.
11. The method of any of claims 4 to 10, wherein the one or more additional models comprise: one or more output models configured to generate approximations of the one or more inputs; and a predictive model configured to estimate the parameters based on the low-dimensional data, and
wherein one or more of the plurality of input models, the common model, and/or the additional model are configured to be adjusted to reduce or minimize differences between one or more training approximations and/or training manufacturing process parameters and respective references.
12. The method of claim 11, wherein the plurality of input models, the common model, and the one or more output models are separate from each other and correspond to different process physical properties of different portions of a manufacturing process and/or sensing operation, such that each of the plurality of input models, the common model, and/or the one or more output models can be trained together with the other models in the modular automatic encoder model, but is configured separately based on the process physical properties of the corresponding portion of the manufacturing process and/or sensing operation.
13. The method of any one of claims 4 to 12, wherein:
the individual input models include neural network blocks including dense feed-forward layers, convolutional layers, and/or residual network architecture of the modular automatic encoder model; and
the common model includes a neural network block including a feed-forward layer and/or a residual layer.
14. A system for estimating an obtainable amount of information content in available channels of metrology data from an optical metrology platform, by using a subset of a plurality of input models of a modular automatic encoder model, so as to estimate a parameter of interest from a combination of the available channels, the system comprising:
the plurality of input models configured to compress a plurality of inputs based on the available channels such that the plurality of inputs are adapted to be combined with each other; and
a common model of the modular automatic encoder model, the common model configured to combine the compressed inputs and generate low-dimensional data in a potential space based on the combined compressed inputs, wherein the low-dimensional data provides an estimate of the obtainable amount of information content, and wherein the low-dimensional data in the potential space is configured to be used by one or more additional models to generate approximations of the plurality of inputs and/or to estimate the parameter based on the low-dimensional data.
15. A non-transitory computer-readable medium having instructions thereon configured to cause a computer to execute a modular automatic encoder model for parameter estimation, the instructions causing operations comprising:
compressing a plurality of inputs by a plurality of input models such that the plurality of inputs are adapted to be combined with each other; and
combining, by a common model, the compressed inputs and generating low-dimensional data in a potential space based on the combined compressed inputs, the low-dimensional data in the potential space being configured to be used by one or more additional models to generate approximations of the plurality of inputs and/or to predict the parameters based on the low-dimensional data,
wherein the common model is configured to combine the compressed inputs and generate the low-dimensional data regardless of which of the plurality of inputs are combined by the common model.
CN202180088188.8A 2020-12-30 2021-12-20 Modular automatic encoder model for manufacturing process parameter estimation Pending CN116802647A (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
EP20217888.5 2020-12-30
EP20217886.9 2020-12-30
EP20217883.6 2020-12-30
EP21168592.0 2021-04-15
EP21168585.4 2021-04-15
EP21169035.9 2021-04-18
EP21187893.9 2021-07-27
PCT/EP2021/086776 WO2022144203A1 (en) 2020-12-30 2021-12-20 Modular autoencoder model for manufacturing process parameter estimation

Publications (1)

Publication Number Publication Date
CN116802647A true CN116802647A (en) 2023-09-22

Family

ID=77071348

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202180088590.6A Pending CN116710928A (en) 2020-12-30 2021-12-20 Modular automatic encoder model for manufacturing process parameter estimation
CN202180088237.8A Pending CN116710933A (en) 2020-12-30 2021-12-20 Modular automatic encoder model for manufacturing process parameter estimation
CN202180088188.8A Pending CN116802647A (en) 2020-12-30 2021-12-20 Modular automatic encoder model for manufacturing process parameter estimation

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN202180088590.6A Pending CN116710928A (en) 2020-12-30 2021-12-20 Modular automatic encoder model for manufacturing process parameter estimation
CN202180088237.8A Pending CN116710933A (en) 2020-12-30 2021-12-20 Modular automatic encoder model for manufacturing process parameter estimation

Country Status (1)

Country Link
CN (3) CN116710928A (en)

Also Published As

Publication number Publication date
CN116710928A (en) 2023-09-05
CN116710933A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
KR102499656B1 (en) Deep Learning for Semantic Segmentation of Patterns
TWI764339B (en) Method and system for predicting process information with a parameterized model
TWI757855B (en) Method for increasing certainty in parameterized model predictions
EP3789923A1 (en) Method for increasing certainty in parameterized model predictions
US20240152060A1 (en) Method and system for predicting process information with a parameterized model
TWI807563B (en) Modular autoencoder model for manufacturing process parameter estimation
EP4075340A1 (en) Modular autoencoder model for manufacturing process parameter estimation
EP4075341A1 (en) Modular autoencoder model for manufacturing process parameter estimation
EP4075339A1 (en) Modular autoencoder model for manufacturing process parameter estimation
CN116802647A (en) Modular automatic encoder model for manufacturing process parameter estimation
EP4254266A1 (en) Methods related to an autoencoder model or similar for manufacturing process parameter estimation
EP4181018A1 (en) Latent space synchronization of machine learning models for in-device metrology inference
EP3828632A1 (en) Method and system for predicting electric field images with a parameterized model
TWI807819B (en) System and method to ensure parameter measurement matching across metrology tools
EP4184250A1 (en) Obtaining a parameter characterizing a fabrication process
TW202418147A (en) Deep learning models for determining mask designs associated with semiconductor manufacturing
CN111316169A (en) Data estimation in metrology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination