CN116615750A - Apparatus and method for determining three-dimensional data based on an image of a patterned substrate - Google Patents


Info

Publication number
CN116615750A
Authority
CN
China
Prior art keywords
image
model
sem
data
depth data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180084294.9A
Other languages
Chinese (zh)
Inventor
T·厚本
T·J·胡伊斯曼
M·皮萨伦科
S·A·米德尔布鲁克
C·巴蒂斯塔基斯
曹宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ASML Holding NV
Original Assignee
ASML Holding NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ASML Holding NV filed Critical ASML Holding NV
Priority claimed from PCT/EP2021/082756 (published as WO2022128373A1)
Publication of CN116615750A

Landscapes

  • Analysing Materials By The Use Of Radiation (AREA)

Abstract

Systems, methods, and apparatuses for determining three-dimensional (3D) information of structures of a patterned substrate are described herein. The 3D information may be determined using one or more models configured to generate 3D information (e.g., depth information) using only a single image of the patterned substrate. In one approach, the model is trained by acquiring a pair of stereoscopic images of the structure of the patterned substrate. The model generates disparity data between the first image and the second image of the pair, using the first image as an input; the disparity data indicates depth information associated with the first image. The parallax data is combined with the second image to generate a reconstructed image corresponding to the first image. Further, one or more model parameters are adjusted based on the parallax data, the reconstructed image, and the first image.

Description

Apparatus and method for determining three-dimensional data based on an image of a patterned substrate
Cross Reference to Related Applications
The present application claims priority from U.S. application 63/125,522, filed on December 15, 2020, and U.S. application 63/132,053, filed on December 30, 2020, which are incorporated herein by reference in their entirety.
Technical Field
The description herein relates generally to improved metrology systems and methods. More specifically, it relates to methods for determining and employing a model configured to determine three-dimensional data of structures patterned on a substrate using a single image (e.g., an SEM image).
Background
For example, a lithographic projection apparatus may be used to manufacture an Integrated Circuit (IC). In this case, the patterning device (e.g., mask) may contain or provide a pattern corresponding to an individual layer of the IC (the "design layout"), and this pattern may be transferred to a target portion (e.g., comprising one or more dies) on a substrate (e.g., a silicon wafer) that has been coated with a layer of radiation-sensitive material (the "resist"), by a method such as irradiating the target portion through the pattern on the patterning device. Typically, a single substrate contains a plurality of adjacent target portions to which a pattern is sequentially transferred one target portion at a time by a lithographic projection apparatus. In one type of lithographic projection apparatus, the pattern on the entire patterning device is transferred onto a target portion in one go; such an apparatus is commonly referred to as a stepper. In an alternative apparatus, commonly referred to as a step-and-scan apparatus, the projection beam is scanned over the patterning device in a given reference direction (the "scanning" direction) while synchronously moving the substrate parallel or anti-parallel to the reference direction. Different portions of the pattern on the patterning device are gradually transferred to one target portion. Typically, since the lithographic projection apparatus will have a reduction ratio M (e.g., 4), the speed F at which the substrate is moved will be 1/M times the speed at which the projection beam scans the patterning device. For more information on the lithographic apparatus described herein, see, for example, US 6,046,792, which is incorporated herein by reference.
The substrate may undergo various processes such as priming, resist coating, and soft baking before transferring the pattern from the patterning device to the substrate. After exposure, the substrate may be subjected to other processes ("post-exposure processes") such as post-exposure bake (PEB), development, hard bake, and measurement/inspection of the transferred pattern. This series of processes is used as a basis for fabricating individual layers of a device (e.g., an IC). The substrate may then undergo various processes such as etching, ion implantation (doping), metallization, oxidation, chemical mechanical polishing, etc., all of which are intended to complete a single layer of the device. If several layers are required in the device, the entire process or variants thereof are repeated for each layer. Eventually, a device will be present in each target portion on the substrate. These devices are then separated from each other by techniques such as dicing or sawing, whereby the individual devices are mounted on a carrier, connected to pins, etc.
Thus, manufacturing devices such as semiconductor devices typically involves processing a substrate (e.g., a semiconductor wafer) using a variety of manufacturing processes to form various features and layers of the device. Such layers and features are typically fabricated and processed using, for example, deposition, photolithography, etching, chemical mechanical polishing, and ion implantation. Multiple devices may be fabricated on multiple dies on a substrate and then separated into individual devices. The device manufacturing process may be considered a patterning process. The patterning process involves a patterning step for transferring a pattern on the patterning device to the substrate, such as optical and/or nanoimprint lithography using the patterning device in a lithographic apparatus, and typically, but optionally, one or more associated pattern processing steps, such as resist development by a developing apparatus, baking of the substrate using a baking tool, etching using the pattern with an etching apparatus, and so forth.
Disclosure of Invention
As photolithography and other patterning technologies advance, the size of functional elements continues to decrease, while the number of functional elements (such as transistors) per device has steadily increased over several decades. At the same time, the requirements on the precision of Critical Dimension (CD), height, and other parameters are becoming more and more stringent. Errors in the shape and size of structures may cause problems with device function, including failure of the device or one or more electronic problems in the functioning device. Accordingly, it is desirable to be able to measure the three-dimensional structure of a functional element in order to characterize, reduce, or minimize one or more defects in the device. However, measuring the three-dimensional (3D) structure of a functional element using existing metrology tools and methods is time consuming and inaccurate, which can negatively impact the yield of the patterning process.
In the present disclosure, a system for determining 3D data (e.g., depth information) from a single captured image of a patterned substrate is provided. In one example, the system includes: an image capture device, such as a Scanning Electron Microscope (SEM), having electron beam optics configured to capture images of the patterned substrate; and one or more processors with a trained model stored in memory, the processors being operable to receive the captured image and execute the model to determine depth information from the captured image. The one or more processors may be configured to: input a captured image of the patterned substrate to a trained model configured to generate depth-related data from the single image; and extract depth information from the captured image by executing the trained model. The trained model may be trained by one or more methods of the present disclosure.
According to one embodiment, a method for determining a model configured to generate data for estimating depth information of a structure of a patterned substrate is provided. The method includes acquiring a pair of images (e.g., SEM1 and SEM 2) of a structure of a patterned substrate, the pair of images including a first image (e.g., SEM 1) captured at a first angle (e.g., 90 ° or perpendicular to the substrate) relative to the patterned substrate and a second image (e.g., SEM 2) captured at a second angle (e.g., 10 ° relative to the vertical of the substrate) different from the first angle. Parallax data between the first image and the second image is generated using the first image as an input of the model. The disparity data indicates depth information associated with the first image. The parallax data is combined with the second image to generate a reconstructed image corresponding to the first image. Further, one or more model parameters of the model are adjusted based on the performance function such that the adjustment causes the performance function to be within a specified performance threshold. The performance function is a function of the parallax data, the reconstructed image, and the first image. The model is configured to generate data that is convertible to depth information of a structure of the patterned substrate.
In one embodiment, another method is provided for generating a model configured to estimate depth data of a structure of a patterned substrate. The method comprises the following steps: acquiring, via a simulator (e.g., SEM simulator), a plurality of simulated metrology images (e.g., simulated SEM images) of the structure, each simulated metrology image of the plurality of simulated metrology images being associated with depth data used by the simulator; generating a model (e.g., CNN) configured to predict depth data from the input image based on the plurality of simulated metrology images and the corresponding simulated depth data; acquiring a captured image (e.g., SEM image) and observed depth data (e.g., measured height map) of a structure patterned on a substrate; and calibrating the model (e.g., CNN) based on the captured image and the observed depth data such that the predicted depth data is within a specified matching threshold of the observed depth data. In one embodiment, calibration of the model involves inputting a captured image (e.g., SEM image) to the model (e.g., CNN) to predict depth data; adjusting the predicted depth data by comparing the predicted depth data with the observed depth data; and adjusting model parameters of the model based on the adjusted predicted depth data such that the model generates depth data that is within a matching threshold of the observed depth data. For example, the predicted height map is adjusted to match the observed height map of the structure.
Furthermore, in one embodiment, another method for generating a model configured to estimate depth data of a structure of a patterned substrate is provided. The method employs training data generated by a process model (e.g., a deterministic process model). The method comprises the following steps: acquiring (i) a plurality of SEM images of a structure associated with programming changes in the mask pattern and (ii) a simulated contour of the structure based on the programming changes, each SEM image of the plurality of SEM images paired with a simulated contour corresponding to the programming changes in the mask pattern; and generating a model for estimating depth data of the structure based on the plurality of SEM images paired with the corresponding simulated contour such that the estimated depth data is within acceptable thresholds of the depth data associated with the simulated contour. In one embodiment, the programming changes include changes in one or more of the following: a change in assist features associated with the mask pattern, a change in primary features associated with the mask pattern, or a change in resist coating thickness. In one embodiment, the simulated contour is a 3D resist contour generated from a calibrated deterministic process model (e.g., a deterministic resist model). In one embodiment, the calibrated deterministic process model is a process model calibrated to meet the Critical Dimension (CD) of the structure but not to meet LCDU, LER, LWR or random variations associated with the structure.
Furthermore, in one embodiment, another method for generating a model configured to estimate depth data of a structure of a patterned substrate is provided. The method employs training data generated by a process model (e.g., a stochastic process model). The method comprises the following steps: acquiring (i) a plurality of SEM images of a structure, (ii) a simulated contour of the structure, and (iii) Key Performance Indicators (KPIs) associated with the simulated contour; and generating a model for estimating depth data of the structure based on the plurality of SEM images, the simulated contours, and the KPIs such that the KPIs associated with the estimated depth data are within acceptable thresholds of the KPIs associated with the simulated contour. In one embodiment, the simulated contours of the structure are generated by a calibrated stochastic process model associated with patterning. In one embodiment, the calibrated stochastic process model is a process model calibrated to satisfy one or more KPIs, the KPIs comprising: Critical Dimension (CD) of the structure, local CD uniformity (LCDU) associated with the structure, Line Edge Roughness (LER) associated with the structure, defect rate associated with the structure, Line Width Roughness (LWR) associated with a line-space pattern, contact edge roughness associated with a contact hole, stochastic edge placement error (SEPE), or stochastic variation associated with the geometry of the structure.
In one embodiment, one or more non-transitory computer-readable media are provided that include instructions corresponding to the processes of the methods herein. In one embodiment, one or more non-transitory computer readable media are used to store a model configured to determine three-dimensional (3D) information (e.g., depth data) of a structure from a single image (e.g., SEM image) of the structure formed on a substrate. In one embodiment, one or more non-transitory computer-readable media are configured to generate 3D information via a stored model. In particular, one or more non-transitory computer-readable media store instructions that, when executed by one or more processors, provide the model. In one embodiment, the model is generated by the process of the methods herein.
Drawings
The above-described and other aspects and features will become apparent to those of ordinary skill in the art upon reading the following description of the specific embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 depicts a block diagram of various subsystems of a lithography system, according to one embodiment;
FIG. 2A illustrates an example SEM image of a line having a depth of 10nm in the z-direction from the top of a substrate, according to one embodiment;
FIG. 2B illustrates another example SEM image of lines having a depth of 100nm in the z-direction from the top of the substrate, according to one embodiment;
FIG. 2C illustrates an SEM image and corresponding depth map of contact holes patterned on a substrate, according to one embodiment;
FIG. 2D illustrates an SEM image of a set of lines patterned on a substrate and a corresponding depth map, according to one embodiment;
FIG. 2E illustrates an SEM image and corresponding depth map of another set of lines patterned on a substrate, according to one embodiment;
FIG. 3 is a block diagram of a training process of a model according to one embodiment;
FIG. 4 illustrates a correlation between height associated with a stereoscopic image of a patterned substrate and parallax data, according to one embodiment;
FIG. 5 is a flow chart of a method for training a model for estimating depth information of structures of a patterned substrate, according to one embodiment;
FIG. 6A is a block diagram of an example of determining depth information for a structure formed on a substrate using a model trained according to FIG. 5, according to one embodiment;
FIG. 6B is a flowchart of a method for determining depth information for a structure formed on a substrate using a model trained according to FIG. 5, according to one embodiment;
FIG. 7 is a flowchart of a method for generating a model configured to estimate depth data of a structure of a patterned substrate, according to one embodiment;
FIG. 8A illustrates exemplary training of a model using simulated SEM data, according to one embodiment;
FIG. 8B illustrates exemplary fine-tuning of the model (of FIG. 8A) using metrology data in accordance with one embodiment;
FIG. 8C illustrates an exemplary scaling of estimated depth data generated by the model of FIG. 8A based on observed metrology data, the scaled estimated data may be further used to fine tune the model, according to one embodiment;
FIGS. 9A, 9B, and 9C illustrate an SEM image with charging effect, predicted depth data with charging effect, and predicted depth data without charging effect, respectively, according to one embodiment;
FIG. 10A is a flowchart of a method for estimating depth data using a trained model (of FIG. 7) using a single SEM image as input, according to one embodiment;
FIG. 10B illustrates exemplary depth data estimated using a trained model according to one embodiment;
FIG. 11 is a flow chart of another method for training a model for estimating depth information of structures of a patterned substrate, according to one embodiment;
FIG. 12 illustrates exemplary training of a CNN based on the method of FIG. 11, using training data generated by a deterministic process simulator, according to one embodiment;
FIG. 13 is a flow chart of another method for training a model for estimating depth information of structures of a patterned substrate, according to one embodiment;
FIG. 14 illustrates exemplary training of the CNN based on the method of FIG. 13, using training data generated by a deterministic process simulator, according to one embodiment;
FIG. 15 is a flowchart of a method for estimating depth data using a trained model (of FIG. 11 or FIG. 13) and using a single SEM image as input, according to one embodiment;
FIG. 16 schematically depicts one embodiment of a Scanning Electron Microscope (SEM) according to one embodiment;
FIG. 17 schematically depicts an embodiment of an electron beam inspection device according to an embodiment;
FIG. 18 is a block diagram of an example computer system, according to one embodiment;
FIG. 19 is a schematic view of a lithographic projection apparatus according to an embodiment;
FIG. 20 is a schematic diagram of another lithographic projection apparatus according to an embodiment;
FIG. 21 is a more detailed view of the apparatus of FIG. 19, according to one embodiment; and
FIG. 22 is a more detailed view of the source collector module SO of the apparatus of FIGS. 20 and 21, according to one embodiment.
Detailed Description
Integrated Circuit (IC) chips used in devices (e.g., telephones, notebook computers, computer memories, etc.) include complex circuit patterns. During the manufacture of such circuit patterns, an image of the printed circuit pattern is captured to determine whether the desired circuit pattern is accurately printed. The final performance of the fabricated device is largely dependent on the accuracy of positioning and sizing of the various features of the product structure formed via photolithography and other processing steps. These features are three-dimensional (3D) structures having predetermined depths and shapes on the nanometer scale. Product structures made by imperfect lithographic processes or other processing steps will result in structures that differ slightly from the ideal or nominally desired structure.
To examine the dimensions of various features, three-dimensional information (e.g., the height of the features) is very beneficial for ensuring that features at one layer are connected to features at another layer. However, acquiring 3D information of a structure on the nanometer scale is not a trivial task. In the prior art, 3D information may be acquired via a tilted beam Scanning Electron Microscope (SEM), wherein two or more images targeting the same location are required in order to infer the appropriate depth information. However, there are several limitations to using multiple images for 3D measurement. For example, capturing a pair of stereo images reduces the throughput of the patterning process or metrology process because of the need to switch between beam tilt angles. Proper alignment between images is required to capture a stereoscopic image. Processing stereo images to determine depth information can be computationally expensive and is susceptible to noise in the images and to drift in the metrology hardware. Thus, extracting 3D information using prior art techniques may slow down the chip manufacturing and metrology process.
Although specific reference may be made in this text to the manufacture of ICs, it should be clearly understood that the description herein has many other possible applications. For example, it may be used to manufacture integrated optical systems, guidance and detection patterns for magnetic domain memories, liquid crystal display panels, thin film magnetic heads, and the like. Those skilled in the art will appreciate that any use of the terms "reticle," "wafer," or "die" herein, in the context of such alternative applications, should be considered interchangeable with the more general terms "mask," "substrate," and "target portion," respectively. The substrate referred to herein may be processed, before or after exposure, in for example a track (a tool that typically applies a layer of resist to a substrate and develops the exposed resist) or a metrology or inspection tool. Where applicable, the disclosure herein may be applied to such and other substrate processing tools. Furthermore, the substrate may be processed more than once, for example in order to create a multi-layer IC, so that the term substrate used herein may also refer to a substrate that already contains multiple processed layers.
The Critical Dimension (CD) of a device refers to the minimum width of a line or hole, or the minimum space between two lines or holes. Thus, the CD determines the overall size and density of the device being designed. Of course, one of the goals of device fabrication is to faithfully reproduce the original design intent on the substrate (via the patterning device).
In this document, the terms "radiation" and "beam" may be used to encompass all types of electromagnetic radiation, including ultraviolet radiation (e.g., having a wavelength of 365nm, 248nm, 193nm, 157nm, or 126 nm) and EUV (extreme ultra-violet radiation, e.g., having a wavelength in the range of about 5-100 nm).
The term "mask" or "patterning device" used herein can be broadly interpreted as referring to a generic patterning device that can be used to impart an incoming radiation beam with a patterned cross-section that corresponds to a pattern to be created in a target portion of the substrate; the term "light valve" may also be used in this context. Examples of other such patterning means, besides classical masks (transmissive or reflective; binary, phase-shifted, hybrid, etc.), include:
-a programmable mirror array. One example of such a device is a matrix-addressable surface having a viscoelastic control layer and a reflective surface. The basic principle behind such a device is that addressed areas of the reflective surface reflect incident radiation as diffracted radiation, whereas unaddressed areas reflect input radiation as undiffracted radiation, for example. Using an appropriate filter, the above undiffracted radiation can be filtered out of the reflected beam, leaving only the diffracted radiation; in this way, the beam is patterned according to the addressing pattern of the matrix-addressable surface. The required matrix addressing may be performed using suitable electronic means.
-a programmable LCD array. An example of such a configuration is given in U.S. Pat. No. 5,229,872, which is incorporated herein by reference.
As a brief introduction, FIG. 1 depicts an exemplary lithographic projection apparatus 10A. The major components are: a radiation source 12A, which may be a deep ultraviolet excimer laser source or another type of source, including an Extreme Ultraviolet (EUV) source (as described above, the lithographic projection apparatus itself need not have a radiation source); illumination optics, which define, for example, the partial coherence (denoted sigma) and may include optics 14A, 16Aa, and 16Ab that shape the radiation from source 12A; patterning device 18A; and transmissive optics 16Ac that project an image of the patterning device pattern onto substrate plane 22A. An adjustable filter or aperture 20A at the pupil plane of the projection optics may limit the range of beam angles incident on the substrate plane 22A, where the maximum possible angle defines the numerical aperture NA = n sin(Θ_max), where n is the refractive index of the medium between the substrate and the final element of the projection optics, and Θ_max is the maximum angle of the beam emerging from the projection optics that can still be incident on the substrate plane 22A.
In a lithographic projection apparatus, a source provides illumination (i.e., radiation) to a patterning device, and projection optics direct and shape the illumination onto a substrate via the patterning device. The projection optics may include at least some of the components 14A, 16Aa, 16Ab, and 16Ac. The Aerial Image (AI) is the radiation intensity distribution at the substrate level. The resist layer on the substrate is exposed and the aerial image is transferred to the resist layer as a latent "resist image" (RI) therein. The Resist Image (RI) may be defined as the spatial distribution of the solubility of the resist in the resist layer. A resist model may be used to calculate a resist image from a aerial image, an example of which may be found in U.S. patent application publication No. US 2009-0157360, the disclosure of which is incorporated herein by reference in its entirety. The resist model is only related to the properties of the resist layer (e.g., the effects of chemical processes that occur during exposure, PEB, and development). The optical properties of the lithographic projection apparatus (e.g. the properties of the source, patterning device and projection optics) determine the aerial image. Since the patterning device used in the lithographic projection apparatus may be changed, it may be desirable to separate the optical properties of the patterning device from the optical properties of the rest of the lithographic projection apparatus, including at least the source and the projection optics.
Once the semiconductor chip is fabricated, measurements may be performed to determine the dimensions of structures fabricated on the substrate. For example, inspection of the fabricated structure may be based on Critical Dimension (CD) measurements obtained using metrology tools (e.g., SEM, AFM, optical tools, etc.). In one embodiment, the lateral dimensions of the structure (e.g., in the x, y plane) may be extracted from the CD measurements. In addition to the lateral dimensions of the features, the measurements (e.g., CD-SEM) may also include information about the 3D information of the features (e.g., height, CD values at different heights, height profile, etc.). In one embodiment, the height of a feature refers to the depth in the z direction perpendicular to the x-y plane.
Fig. 2A and 2B illustrate two different SEM images of lines having the same CD but different heights. Fig. 2A illustrates an SEM image I1 of a line having a depth of 10nm in the z-direction (perpendicular to the plane of figs. 2A and 2B) from the top of the substrate, and fig. 2B illustrates another SEM image I2 of a line having a depth of 100nm in the z-direction from the top of the substrate. These figures show that depth has a significant effect on the captured SEM images. However, although 3D information (e.g., height information) is of interest, it is rarely extracted. In the prior art, only lateral information such as CD can be extracted, but not the height.
For many of the problems associated with semiconductor manufacturing, there may be a correlation between input and output, but such correlation may be too complex to be modeled or identified by humans. In accordance with the present disclosure, the model is trained to extract depth information from a single image (e.g., SEM image) of the substrate. For example, a deep learning algorithm may be applied to train a machine learning model, such as a convolutional neural network (e.g., CNN), using the training data and training methods described herein. In one embodiment, the methods described herein enable the extraction of the height of a feature from a single image of a substrate. For example, fig. 2C-2E illustrate different SEM images I3, I4, and I5 according to the present disclosure, for each of which a depth map may be extracted from the SEM images. In fig. 2C-2E, the input SEM images I3, I4, and I5 are depicted in the top row, and the corresponding depth maps 3D3, 3D4, and 3D5 are depicted in the bottom row, respectively.
In one embodiment, the beam tilt function of the SEM tool may be used to retrieve depth information from a single CD-SEM image. The unsupervised training of the model may be performed using stereo image pairs (e.g., acquired from beam-tilt SEM images) as a training dataset. When measuring semiconductor topology with a CD-SEM, the electron beam is typically incident normal to the substrate, producing a top-down image. This measurement method can resolve substrate measurements with lateral resolution near 1nm, while axial resolution is typically ignored, due to the ill-posed problem of how to convert SEM signals to topological depth. To address this ill-posedness, a tilted beam SEM may be used to generate information about the depth of the structure. For example, "C. Valade, et al., Tilted beam SEM, 3D metrology for industry, Proc. SPIE 10959, Metrology, Inspection, and Process Control for Microlithography XXXIII, 109590Y (26 March 2019)" describes an example tilted beam SEM for performing 3D metrology. In such systems, a stereoscopic image pair created by oblique electron beams from different directions is typically used.
For example, the SEM tool can direct the electron beam at a desired angle (e.g., up to 12 ° relative to the vertical of the substrate) to capture images at different angles to obtain more information. From analysis of such images, pattern height and sidewall angle can thus be determined using geometric factors.
However, in a tilted beam SEM, two or more images targeting the same location are required to infer the appropriate depth information of the structure. For example, for the same structure on a substrate, two images are acquired: a first image with the beam at normal incidence and a second image with the beam incident at an oblique angle. There are several limitations to using multiple images for 3D measurement. For example, capturing a pair of stereo images reduces the throughput of the patterning process or metrology process because of the need to switch between beam tilt angles. Proper alignment between images is required to capture a stereoscopic image. Processing stereo images to determine depth information can be computationally expensive and is susceptible to noise in the images and to drift in the metrology hardware. Capturing stereo images requires metrology hardware that facilitates changing the tilt direction of the beam. Capturing images at oblique angles reduces the effective field of view because only the area present in all images can be used for depth estimation.
In some embodiments, the present disclosure describes mechanisms for estimating the depth of a structure using a single image (e.g., SEM image) captured at a first tilt angle (e.g., perpendicular to a substrate). Thus, this potentially alleviates most of the disadvantages associated with the stereoscopic images described above. For example, once a model is trained in accordance with the present disclosure, depth information may be determined using a single SEM image, thereby increasing yield, reducing metrology time, and reducing computational resources, as compared to using a stereo image based method to determine depth information.
In some embodiments, the present disclosure describes an unsupervised training method for training a machine learning model (e.g., CNN). In one embodiment, the term "unsupervised" refers to a training method in which no ground-truth depth data is used in training the model and no manual intervention is involved during the training process. The method itself is capable of learning how to perform depth estimation. In one embodiment, a Convolutional Neural Network (CNN) with a U-shaped network architecture (e.g., an encoder, a decoder, and skip connections between the two) may be employed.
FIG. 3 is a block diagram of a training process for model M0 according to one embodiment of the present disclosure. After training, the trained model may be referred to as model M1. As shown in fig. 3, a model M0 (e.g., CNN) receives an image 301 of a substrate as an input, and outputs parallax data 311 associated with the input image. The input image 301 may be a vertical SEM image or a tilted SEM image of the patterned substrate. In this embodiment, the term "normal image" or "vertical SEM image" refers to a top view image of the substrate capturing a first topology of the structure as viewed from the top. In one embodiment, the normal image may be acquired via a metrology tool configured to direct an electron beam perpendicular to the substrate. The term "tilted image" or "tilted SEM image" refers to a tilted view image of a substrate that captures a second topology of structures viewed at an angle relative to the vertical of the substrate. In one embodiment, the oblique image may be acquired via a metrology tool configured to orient the electron beam at an angle relative to the substrate.
In fig. 3, the model M0 is configured to output data characterizing the difference between two images of the same structure. From such difference data, depth information of the structure can be determined. In one embodiment, model M0 generates disparity data 311, which can be converted to depth information 315 for structures on the substrate. For example, the parallax data 311 may be converted into the depth information 315 by multiplying the magnitude of the parallax data 311 by a conversion function such as a scaling factor k. In one embodiment, the conversion function may be a linear function, a non-linear function, or a constant, determined by comparing the result of applying the conversion function to one of the two images. In fig. 3, as one example, the parallax data 311 is visually represented as a parallax map or image. In the present disclosure, the parallax data 311 refers to the difference in coordinates of similar features within two stereoscopic images (e.g., two SEM images of the same structure). In one embodiment, the magnitude of the displacement vectors in the disparity map is proportional to the depth of the structure.
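As a purely illustrative sketch of the conversion described above (the function name and the value of the scaling factor k are assumptions, not part of the original disclosure), the disparity-to-depth conversion may be expressed as follows:

```python
import numpy as np

def disparity_to_depth(disparity: np.ndarray, k: float) -> np.ndarray:
    """Convert a disparity map (pixel shifts between a stereo pair) into a
    depth map by applying a linear conversion function.

    k is an assumed scaling factor (e.g., nm of depth per pixel of disparity);
    a non-linear conversion function could be substituted here."""
    return k * disparity

# Example usage with a synthetic disparity map (values in pixels).
disparity = np.random.uniform(0.0, 3.0, size=(128, 128))
depth = disparity_to_depth(disparity, k=35.0)  # k chosen for illustration only
```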
To train the model M0, the parallax data 311 may be transformed into another image 321 corresponding to the input image 301 of the model M0 via a transformation operation T1. The transformation operation T1 represents any transformation function that modifies the oblique image 302 using the parallax data 311. For example, the transformation function may be a composition or convolution operation between the parallax data 311 and the oblique image 302, which produces another image that should correspond to the normal image.
In this example, the parallax data 311 is combined with the oblique image 302 (measured at a specified, non-vertical beam tilt) to map the pixels of the oblique image 302 to the vertical SEM image. In one embodiment, the oblique image 302 may be represented as a function m, and the disparity data 311 (e.g., a disparity map) may be represented as another function Φ. By combining the parallax data 311 with the oblique image 302, a reconstructed image can be acquired. For example, the reconstructed image is represented by the composition of the functions m and Φ, written m ∘ Φ, where the symbol ∘ denotes composition between functions, e.g., (m ∘ Φ)(x) = m(Φ(x)), and x is the vector of xy coordinates of the image. If the estimated parallax data 311 is accurate, the reconstructed image is expected to be very similar to the input SEM image. For example, the reconstructed image has a similarity of greater than 95% with the vertical SEM image input to model M0. If the reconstructed image is not similar, the model M0 is modified or trained to make the reconstructed image similar to the input SEM image. Modification of the model M0 may involve adjusting one or more parameters (e.g., weights and biases) of the model M0 until the model generates a satisfactory reconstructed image. In one embodiment, the adjustment of one or more parameters is based on differences between the reconstructed image and the input SEM image. In one embodiment, a performance function based on the difference between the reconstructed image and the input SEM image may be used to guide the adjustment of the one or more parameters of model M0.
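The composition (m ∘ Φ) above can be illustrated with a minimal image-warping sketch, assuming (as a simplification not stated in the original text) that the disparity acts only along the x (column) axis and that nearest-neighbour sampling suffices:

```python
import numpy as np

def reconstruct_from_tilted(tilted: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Approximate the composition (m o Phi): sample the tilted image m at
    coordinates shifted by the disparity Phi, yielding a reconstruction that
    should resemble the normal (top-down) image when Phi is accurate."""
    h, w = tilted.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    shifted_x = np.clip(np.round(xs + disparity).astype(int), 0, w - 1)
    return tilted[ys, shifted_x]  # nearest-neighbour warp along x only
```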
In one embodiment, training of model M0 involves adjusting model parameters to achieve minimization of the performance function. At the end of the training process, model M0 is referred to as model M1 or trained model M1. In one embodiment, adjusting includes determining a gradient map of the performance function by taking derivatives of the performance function with respect to one or more model parameters. The gradient map directs the adjustment of one or more model parameters in a direction that minimizes or brings the performance function value within a specified threshold. For example, the specified threshold may be such that the difference in the performance function value between the current iteration and the subsequent iteration is less than 1%.
In one embodiment, the performance function includes a similarity loss indicating the similarity between the reconstructed image (m ∘ Φ) and the input SEM image (e.g., represented as a function f), for example L_similarity(f, m ∘ Φ). In one embodiment, a similarity loss function may be calculated for a plurality of images of a structure, each image acquired at a different angle. For example, the performance function may be modified to include a loss function that is calculated as a sum of similarities between images of the plurality of images and corresponding reconstructed images.
Additionally, the performance function may include another loss function L_prior(Φ) determined based on prior information about the parallax characteristics of a pair of stereoscopic SEM images. The a priori information includes, but is not limited to, disparity characterized as a piecewise smooth function, disparity characterized as a piecewise constant function, or disparity characterized as a function that is allowed to jump at feature edges in the normal image. For example, edges of features may be detected by applying gradient operations to an image (e.g., the normal image). The gradient operation identifies abrupt changes in the slope of the intensity profile of the image (e.g., the normal image) at one or more locations; such locations may be characterized as edges where the disparity may jump from one function type to another.
As previously described, the parallax function may be determined based on previous stereoscopic images of one or more previously patterned substrates. For example, the disparity may be a piecewise smoothing function, wherein the derivative of the disparity is piecewise continuous. For example, the parallax associated with structures having non-perpendicular walls will be piecewise continuous. In another example, the disparity may be piecewise constant. For example, the parallax associated with a structure having vertical walls will be piecewise constant. In yet another example, the disparity may be a function of having jumps at edges of structures within the image, which edges are detected based on gradients of intensity profiles within the image. For example, parallax relates to structures having both vertical and non-vertical walls, where the location of the jump is determined from SEM images. Thus, the present method can incorporate parallax data from previously patterned substrates in the form of parallax functions to train a machine learning model. Using such a disparity function enables faster convergence or faster training of the model M0 and more accurate results.
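As one illustration of the gradient-based edge detection described above (a minimal sketch; the threshold is an assumed tuning parameter, not a value from the disclosure), edges at which the disparity is allowed to jump could be located as follows:

```python
import numpy as np

def detect_edges(image: np.ndarray, threshold: float) -> np.ndarray:
    """Flag feature edges in a normal (top-down) SEM image by thresholding the
    magnitude of the intensity gradient; at these locations the disparity may
    jump from one function type to another."""
    gy, gx = np.gradient(image.astype(float))   # slope of the intensity profile
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold                # boolean edge map
```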
In one embodiment, the model M0 (e.g., CNN) may be trained based on a performance function expressed as a sum of the similarity loss function and the loss function associated with the parallax characteristics of one or more previously patterned substrates (see the equation below). In one example, the performance function L is minimized during training by modifying one or more model parameters of the CNN, as follows: L = L_similarity(f, m ∘ Φ) + L_prior(Φ).
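A minimal sketch of such a performance function is given below; the mean-squared similarity term, the gradient-based smoothness prior, and the weight lam are illustrative assumptions rather than the specific losses of the disclosure:

```python
import numpy as np

def performance_function(normal_img, reconstructed, disparity, lam=0.1):
    """Illustrative performance function L = L_similarity + lam * L_prior.

    L_similarity penalizes differences between the input (normal) SEM image f
    and the reconstruction m o Phi; L_prior encodes a piecewise-smoothness
    assumption on the disparity Phi via its spatial gradients."""
    l_similarity = np.mean((normal_img - reconstructed) ** 2)
    dx = np.diff(disparity, axis=1)
    dy = np.diff(disparity, axis=0)
    l_prior = np.mean(np.abs(dx)) + np.mean(np.abs(dy))
    return l_similarity + lam * l_prior
```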
in accordance with the present disclosure, referring to fig. 4, the inventors determined based on experiments that depth information of structures patterned on a substrate can be inferred from differences between two or more SEM images captured at different tilt angles. For example, the experiment involves generating a depth map and feeding it into a monte carlo SEM simulator to calculate stereo pairs of denoised SEM images: one SEM image is 0 degrees from normal (top view image) and one SEM image is 5.7 degrees from normal (oblique image). An example Monte Carlo simulator configured to generate SEM images is discussed in "L.van Kessel, and C.W. Hagen, nebula: monte Carlo simulator of electron-matter interaction, softwarex, volume 12,2020,100605,ISSN 2352-7110". In the experiment, in both SEM images, a set of points was identified and the relevant parallax data was analyzed. For example, the set of points corresponds to a step transition in the height profile of the structure that produces a peak in the SEM signal. For the set of points, parallax data (e.g., displacement in x-coordinate) for each point between the two SEM images is calculated. The height or depth of each point is plotted according to the parallax as shown in fig. 4. Based on the line 401 fitted between the parallax data and the height at the set of points, it can be concluded that there is a strong correlation between the parallax between the two SEM images and the height of the structure.
In one embodiment, once model M1 is trained, prediction of depth information may be performed based on a single image (e.g., SEM image) of any patterned substrate. Fig. 6A illustrates the use of a single SEM image 351 to estimate height or depth information associated with a structure. In fig. 6A, a model M1 (e.g., trained as shown in fig. 3) uses a vertical SEM image 351 of a structure as input to estimate parallax data 355. The parallax data 355 is further converted into height data 360 of the structures patterned on the substrate by applying a conversion function k to the parallax data 355. For example, the magnitude of the estimated disparity data may be multiplied by a constant to generate depth information. Therefore, no tilted image is required to infer the height information of the structure.
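A minimal inference sketch corresponding to FIG. 6A is shown below; trained_model is assumed to be a callable returning a disparity map of the same shape as the input image, and k is the conversion factor discussed above:

```python
import numpy as np

def estimate_depth(sem_image: np.ndarray, trained_model, k: float) -> np.ndarray:
    """Single-image depth estimation: a top-down SEM image is mapped by the
    trained model M1 to disparity data, which is then converted into height
    data with the conversion function (here a simple scale factor k)."""
    disparity = trained_model(sem_image)   # model M1 predicts parallax data
    return k * disparity                   # conversion function applied
```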
Fig. 5 is an exemplary flow chart of a method 500 for determining a model for estimating depth information of a structure of a patterned substrate. In one example implementation, the method 500 includes the following processes P502, P504, P506, and P508, discussed in detail below.
Process P502 involves acquiring a pair of images IM1 and IM2 of the structure of the patterned substrate. The pair of images includes a first image IM1 captured at a first angle relative to the patterned substrate and a second image IM2 captured at a second angle different from the first angle. As one example, the first image IM1 captures top view details of the structure and the second image IM2 captures angular view details of the same structure from another angle (e.g., an angle between 1 and 15 ° from the vertical of the substrate). One example of the first image IM1 may be a normal image 301 (see fig. 3), and the second image IM2 may be a tilted image 302 (see fig. 3). In one embodiment, the first image IM1 is a normal image associated with an electron beam oriented perpendicular to the patterned substrate and the second image IM2 is associated with an electron beam oriented at an angle greater than 90 ° or less than 90 ° relative to the patterned substrate. In one embodiment, multiple pairs of images may be acquired and used as a training dataset. In one embodiment, the pair of images acquired via the metrology tool includes a plurality of pairs of SEM images of the patterned substrate. Each pair includes a first SEM image associated with a first beam tilt setting of the metrology tool and a second SEM image associated with a second beam tilt setting of the metrology tool. In one embodiment, each image of the pair of images is captured from a different SEM tool. For example, a first SEM image is captured by a first SEM tool configured to capture a normal image of a structure, and a second SEM image is captured by a second SEM tool configured to capture an image of the same structure at a second electron beam tilt setting.
The process P504 involves generating disparity data DD1 between the first image IM1 and the second image IM2 via a model M0 using the first image IM1 as an input, the disparity data DD1 indicating depth information associated with the first image IM 1. In one embodiment, the disparity data DD1 includes a coordinate difference of similar features within the first image IM1 and the second image IM 2. In one embodiment, after the training process, model M1 is considered a trained model of M0. In one embodiment, the model M0 or M1 may be a machine learning model configured to predict parallax data using a single image of the substrate. For example, the model M0 or M1 may be a convolutional neural network CNN, deep CNN, or other machine learning model.
The process P506 involves applying the parallax data DD1 to the second image IM2 to generate a reconstructed image corresponding to the first image IM1. In one embodiment, the reconstructed image is generated by performing a composition operation between the disparity data DD1 and the second image IM2. For example, fig. 3 illustrates a reconstructed image 321 generated based on parallax data (represented as a parallax map 311) and the oblique image 302.
Process P508 involves adjusting one or more parameters of model M1 based on the performance function such that the performance function is within a specified performance threshold. The performance function may be a function of the parallax data DD1, the reconstructed image and the first image IM 1. The model M1 is configured to generate data that can be converted into depth information of the structure of the patterned substrate.
In one embodiment, the performance function further includes a loss function calculated based on parallax characteristics associated with a pair of stereoscopic images of the previous patterned substrate or substrates and the parallax data DD1 predicted by the model M1. In one example, the disparity characteristic may include a disparity characterized as a piecewise smoothing function, wherein derivatives of the disparity are piecewise continuous. In another example, the disparity characteristic may include piecewise constant disparity. In yet another example, the disparity characteristic may include a disparity characterized as a function of having jumps at edges of structures within the image, the edges being detected based on gradients of intensity profiles within the image. An exemplary performance function L is discussed above with respect to fig. 3.
In one embodiment, the training is based on multiple images of the structure acquired at different angles. Accordingly, the performance function may include a loss function related to a sum of similarities between images of the plurality of images and corresponding reconstructed images.
In one example, the performance function is minimized during training by modifying one or more model parameters of the CNN. For example, the performance function L is calculated as a sum of a loss function relating the reconstructed image and the first image and a loss function related to parallax characteristics of previous stereoscopic images of previously patterned substrates. For example, the performance function may be calculated as L = L_similarity(f, m ∘ Φ) + L_prior(Φ), as previously described.
In one embodiment, adjusting the one or more parameters of the model M0 is an iterative process, each iteration comprising determining the performance function based on the disparity data DD1 and the reconstructed image. The iteration further includes: determining whether the performance function is within a specified performance threshold; and, responsive to the performance function not being within the specified performance threshold, adjusting one or more parameters of the model M0 such that the performance function is within the specified performance threshold. The adjustment may be based on a gradient of the performance function relative to the one or more parameters. Once the model M0 is trained, the model M1 may be applied to determine depth information for any structure based on a single image (e.g., vertical SEM image) of any patterned substrate.
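A toy sketch of this iterative adjustment is shown below; the finite-difference gradient, learning rate, and 1% stopping criterion are assumptions for illustration, and a real model M0 would use backpropagation instead:

```python
import numpy as np

def train_iteratively(loss_fn, params, lr=1e-2, tol=0.01, max_iter=500):
    """Adjust a parameter vector along the (numerically estimated) gradient of
    the performance function until the relative change between iterations is
    below the threshold tol (e.g., 1%). loss_fn maps parameters to a scalar."""
    params = np.asarray(params, dtype=float)
    prev = loss_fn(params)
    eps = 1e-6
    for _ in range(max_iter):
        grad = np.zeros_like(params)
        for i in range(params.size):              # finite-difference gradient
            p = params.copy()
            p[i] += eps
            grad[i] = (loss_fn(p) - prev) / eps
        params = params - lr * grad               # step against the gradient
        curr = loss_fn(params)
        if abs(prev - curr) / max(abs(prev), 1e-12) < tol:
            break                                 # performance threshold reached
        prev = curr
    return params
```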
In one embodiment, the method 500 may further include a step for applying the trained model M1 during measurement or inspection of the patterned substrate. In one embodiment, the method 500 may include a process for acquiring an SEM image of a patterned substrate via a metrology tool at a first electron beam tilt setting of the metrology tool. The SEM image may be a normal image obtained by directing an electron beam substantially perpendicular to the patterned substrate. The method 500 may further include: executing a model M1 using the SEM image as an input to generate parallax data associated with the SEM image; and applying a conversion function (e.g., a linear function, a constant conversion factor, or a nonlinear function) to the parallax data to generate depth information of the structure in the SEM image. The method 500 includes determining physical properties of a structure of the patterned substrate based on the depth information. In one embodiment, the physical characteristics may include the relative positioning of shapes, sizes, or polygonal shapes with respect to each other at one or more depths of features of the structure.
In one embodiment, the processes described herein may be stored on a non-transitory computer readable medium in the form of instructions that, when executed by one or more processors, cause the steps of the present methods to be performed. For example, the medium includes instructions for receiving a vertical SEM image of a structure on a patterned substrate via a metrology tool (e.g., SEM tool). The vertical SEM image is associated with an electron beam directed perpendicular to the patterned substrate. Further, the medium includes instructions for executing a model using the vertical SEM image to determine depth information of structures on the patterned substrate. As discussed herein, the model may be trained to estimate depth information from only a single SEM image and stored on the medium. For example, model M1 (e.g., CNN) may be trained using the method of FIG. 5.
In one embodiment, the medium is further configured to determine a physical property of the structure of the patterned substrate based on the depth information. For example, physical characteristics include, but are not limited to, the relative positioning of shapes, sizes, or polygonal shapes with respect to one another at one or more depths of features of the structure.
In one embodiment, the medium is further configured to determine a defect in the structure based on the physical characteristic, the defect indicating that the structure does not meet the design specification. Based on the defects, one or more parameters of the patterning process (e.g., resist or etch-related parameters) may be adjusted to eliminate the defects during subsequent stages of the patterning process.
Fig. 6B is a flowchart of a method 600 for determining depth information for a structure formed on a substrate using a model trained in accordance with fig. 5, according to one embodiment. Depth information is determined based on scanning electron microscope ("SEM") images of the structure. Depth information is determined using a Convolutional Neural Network (CNN) that is trained to simulate parallax effects present in paired stereo SEM images. Method 600 includes processes P602, P604, and P606. Process P602 involves receiving SEM image SEM1 of structures patterned on a substrate via an SEM tool.
Process P604 involves inputting SEM image SEM1 into CNN1 (one example of model M1 trained according to method 500) to predict parallax data 605 associated with SEM image SEM1. In one embodiment, CNN1 may be trained according to FIG. 5. For example, CNN1 may be trained by: acquiring a stereoscopic pair of SEM images of the patterned substrate via the SEM tool, the stereoscopic pair comprising a first SEM image acquired at a first beam tilt setting of the SEM tool and a second SEM image acquired at a second beam tilt setting of the SEM tool; generating, using the first SEM image as an input, parallax data between the first SEM image and the second SEM image; combining the parallax data with the second SEM image to generate a reconstructed image of the first SEM image; and comparing the reconstructed image with the first SEM image.
Process P606 involves generating depth information 610 associated with structures patterned on the substrate based on the predicted disparity data 605.
In one embodiment, the depth information may also be used to determine defects of the structure by comparing CDs of features at different depths and determining whether the CDs meet design specifications. In one embodiment, the depth information may be used to determine one or more parameters of a patterning process, such as a resist process, an etching process, or other patterning related process, such that the 3D structure of the feature is within a desired design specification.
In some embodiments, described herein are methods and systems for semi-supervised training of depth estimation models of structures formed on a substrate and employing the trained models to predict depth data from a single SEM image. In some systems, to acquire depth information related to a structure, two images of the structure are acquired, the first image being at normal beam incidence and the second image being at beam tilt. However, adjusting the alignment is time consuming in addition to being difficult and inaccurate, which can negatively impact manufacturing yield.
The present disclosure overcomes these problems by predicting depth data associated with SEM images using a machine learning model, such as a convolutional neural network ("CNN"), that uses an SEM image as input. However, the use of machine learning models presents an additional technical problem in that the training data may be sparse and not of adequate diversity, resulting in poor predictions. To overcome this problem, the machine learning model described herein is trained to predict depth data of an input SEM image based on a plurality of simulated SEM images labeled with simulated depth data. For example, by using simulated depth data, the system can fully control the number and diversity of training samples.
While the use of simulated SEM images overcomes the technical problem of sparse training data, and using simulated SEM images as input to a model provides qualitatively reasonable depth information, the depth information generally still does not match the observed depth information. To overcome this second technical problem, the machine learning model described herein includes a second supervised calibration step. In particular, the machine learning model is calibrated to scale the predicted depth data to correspond to observed depth data of an actual SEM image determined using the SEM tool. For example, to correct this problem, the system scales the predicted depth information to that observed using SEM tools or other metrology tools, and uses the scaled predicted depth information as a new ground truth for the subsequent training process.
FIG. 7 is a flow chart of a method for generating a model 710, the model 710 configured to estimate depth data of a structure of a patterned substrate, according to one embodiment. In one embodiment, method 700 is a semi-supervised training of depth estimation model 710 for structures formed on a substrate from SEM images of the structures. For example, in a semi-supervised training approach, the depth estimation model 710 is initially trained using simulation data, further, model predicted depth data is modified using observed depth data of the structure, and the modified depth data is used as a reference to further fine tune the depth estimation model 710. Method 700 is discussed in further detail with respect to processes P702, P704, P706, and P710.
Process P702 includes acquiring a plurality of analog metrology images 703 of a structure via simulator 701. Each of the plurality of analog metrology images 703 is associated with depth data used by the simulator 701. For example, the depth data used in simulator 701 (also referred to as simulator depth data) may be a shape, size, sidewall angle, relative position of polygons, a material, or a depth-related parameter associated with each layer of the substrate. The simulator 701 is configured to perturb the simulator depth data to generate a simulated metrology image. For example, simulator depth data that may be adjusted includes, but is not limited to, the shape of the resist profile of the structures in the resist layer, and the sidewall angles of the resist profile. For each adjusted simulator depth data, a corresponding metrology image (e.g., SEM image) may be generated.
In one embodiment, the simulator 701 is a Monte Carlo simulator configured to generate a simulated metrology image 703 of the structure by varying depth-related parameters defined in the Monte Carlo simulator. An example Monte Carlo simulator configured to generate SEM images is discussed in L. van Kessel and C. W. Hagen, "Nebula: Monte Carlo simulator of electron-matter interaction," SoftwareX, Volume 12, 2020, 100605, ISSN 2352-7110. The Monte Carlo simulator is provided as an example, but the present disclosure is not limited to a particular simulator.
Process P704 includes generating a model M70 based on the plurality of simulated metrology images 703 and the corresponding simulated depth data, the model M70 configured to predict the depth data from the input image.
In one embodiment, generating model M70 is an iterative process. Each iteration includes predicting depth data, comparing the predicted data to the simulator data, and updating the model based on the comparison. For example, process P704 includes inputting a plurality of simulated metrology images 703 into model M70 to predict depth data associated with each of the plurality of simulated metrology images 703. The predicted depth data is compared to the simulator depth data. Based on the comparison, model parameters of model M70 are adjusted such that the predicted depth data is within a specified matching threshold of the simulator depth data. For example, the predicted depth data may be required to match the simulator depth data by more than 90%, e.g., the similarity between the height profile of the predicted depth data and the height profile of the simulator depth data exceeds 90%. Model M70 is further fine-tuned based on the observed data, which is part of the semi-supervised training.
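A minimal training-loop sketch of this supervised pre-training in process P704 is given below, assuming PyTorch and a dataset that yields pairs of simulated metrology images 703 and simulator depth data; the module and parameter names are illustrative, and the mean-squared-error loss stands in for whatever comparison metric defines the matching threshold.

```python
# Minimal sketch of the supervised pre-training in process P704, assuming PyTorch and
# a dataset yielding (simulated metrology image 703, simulator depth data) tensor pairs.
import torch
from torch.utils.data import DataLoader

def pretrain_m70(model: torch.nn.Module, dataset, epochs: int = 50, lr: float = 1e-4):
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for sim_image, sim_depth in loader:
            pred_depth = model(sim_image)              # predict depth data
            loss = loss_fn(pred_depth, sim_depth)      # compare to simulator depth data
            optimizer.zero_grad()
            loss.backward()                            # adjust weights and biases of M70
            optimizer.step()
    return model
```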
Process P706 includes acquiring a captured image 707 and observed depth data 708 of a structure patterned on a substrate. In one embodiment, the captured image 707 is acquired via an image capture tool and the observed depth data 708 is acquired from one or more metrology tools. In one embodiment, the image capture tool is an SEM tool and the captured image is an SEM image. In one embodiment, the metrology tool is one or more of an Atomic Force Microscope (AFM) or an optical metrology tool, the optical metrology tool configured to measure a structure of the patterned substrate and extract depth information based on diffraction-based measurements of the patterned substrate. For example, the observed depth data 708 includes a height profile of a structure captured by an AFM, or shape parameter data captured by an optical scatterometry tool (e.g., Yieldstar).
In one embodiment, the observed depth data is from one-dimensional height data of structures tracked from the captured image 707. Similarly, simulator depth data may include one-dimensional height data determined from a simulated image of structures on a patterned substrate. In one embodiment, the observed depth data includes two-dimensional height data of structures tracked from the captured image 707. Similarly, simulator depth data may include two-dimensional height data extracted from a simulated image of structures on a patterned substrate. In another embodiment, the observed depth data includes shape parameters acquired from an optical metrology tool used to measure structures of the patterned substrate. Similarly, shape parameters may be extracted from simulator depth data. As one example, the one-dimensional height data includes a height profile of the structure along a cut line of the captured image 707 (e.g., height profile 3DD1 shown in fig. 8B). In one embodiment, the two-dimensional height data includes the height of the structure along the first and second directions on the captured image 707. In one embodiment, the shape parameters include one or more of the following: top CD measured at the top of the structure, bottom CD measured at the bottom of the structure, or side wall angle of the structure. For example, the shape parameter data 3DD2 is shown in fig. 8B.
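For illustration, the following helpers sketch how one-dimensional height data along a cut line, and an average height used later for scaling, might be extracted from a two-dimensional height map; the row-indexing convention and units are assumptions of the example.

```python
# Illustrative helpers for the 1D and average height data discussed above.
import numpy as np

def height_profile_along_cutline(height_map: np.ndarray, row: int) -> np.ndarray:
    """1D height profile (e.g., 3DD1-style) along one horizontal cut line."""
    return height_map[row, :].copy()

def mean_height(height_map: np.ndarray) -> float:
    """Average height over the field of view, used later for scaling."""
    return float(height_map.mean())
```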
Process P710 includes calibrating model M70 based on captured image 707 and observed depth data 708 such that the predicted depth data is within a specified match threshold of the observed depth data 708. After calibration, model M70 may be referred to as model 710, trained model 710, or calibrated model 710.
In one embodiment, the calibration model M70 is an iterative process. Each iteration includes adjusting predicted depth data based on the observed data to generate modified predicted data for training. For example, each iteration includes inputting the captured image 707 to the model M70 to predict depth data. The predicted depth data is adjusted by comparing the predicted depth data with the observed depth data 708. Based on the adjusted predicted depth data, model parameters of model M70 may be adjusted such that model M70 generates depth data that is within a matching threshold of observed depth data 708.
In one embodiment, the adjustment of the predicted depth data comprises: extracting a one-dimensional height profile of the structure along a given direction from the predicted depth data; comparing the predicted height profile with a one-dimensional height profile of the observed depth data 708 of the structure along the given direction; and modifying the predicted height profile to match the height profile of the observed depth data 708 of the structure.
In one embodiment, the adjustment of the predicted depth data comprises: extracting predicted shape parameters of the structure from the predicted depth data, and extracting real shape parameters from the observed depth data 708; comparing the predicted shape parameters with the actual shape parameters of the structure; and modifying the predicted shape parameters to match the real shape parameters.
In one embodiment, the adjustment of the predicted depth data comprises: deriving a predicted average height of the structure from the predicted depth data of the structure, and deriving a true average height of the structure from the observed depth data 708; and scaling the predicted average height to match the true average height. For example, the scaling factor (e.g., factor SF1 in fig. 8C) is calculated as the ratio of the average height obtained from the optical scatterometry tool (e.g., Yieldstar) to the average height calculated from the predicted depth data.
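A short sketch of this average-height scaling is given below; it follows the observed-over-predicted convention of fig. 8C, and the variable names (e.g., `observed_mean_height` from an optical scatterometer or AFM) are assumptions of the example.

```python
# Sketch of the average-height scaling used during calibration; the scaling factor
# follows the observed-over-predicted convention of fig. 8C, and
# `observed_mean_height` is an assumed input from an optical scatterometer or AFM.
import numpy as np

def scale_predicted_height(pred_height_map: np.ndarray, observed_mean_height: float) -> np.ndarray:
    sf1 = observed_mean_height / float(pred_height_map.mean())   # scaling factor SF1
    return pred_height_map * sf1   # scaled map, usable as new ground truth for fine tuning
```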
FIG. 8A illustrates one example of generating an initial model M70 using simulated metrology images (e.g., generated via a Monte Carlo SEM simulator) and simulated depth data. In fig. 8A, convolutional neural network M70 (e.g., with encoder-decoder, U-net, or Res-net architecture) is trained based on a sufficiently large data set generated according to the Monte Carlo SEM model. The advantage of using simulation data in this step is that it provides complete control over the number and diversity of training examples (e.g., SEM settings, feature geometry, and feature materials).
As one example, the synthetic or simulated SEM image SSEM1 and corresponding simulator data SDEP generated by the SEM model may be used to train the initial CNN M70. In one embodiment, CNN M70 predicts depth data that is compared to the simulated data SDEP. Based on the comparison, one or more weights and biases of one or more layers of the CNN are adjusted such that the predicted data matches the simulated data SDEP. In one embodiment, the difference between the predicted depth data and the depth data extracted from the simulated data SDEP may be calculated. During training of CNN M70, the weights and biases are modified in each iteration, thereby reducing the difference between the predicted and simulated data. In one embodiment, the training process terminates when the difference cannot be further reduced in subsequent iterations. At the end of training, model M70 is trained sufficiently to predict depth data from an input SEM image. However, as previously described, the predicted data of CNN M70 may not accurately correspond to the observed data. Thus, CNN M70 is further fine-tuned, as discussed with respect to figs. 8B and 8C.
FIG. 8B illustrates observed depth data obtained from different metrology tools (e.g., AFM, SEM, or Yieldstar) that may be used to fine tune the model CNN M70. In this example, pairs of experimental SEM images SEM80 and subsets of metrology depth data 3DD1 or 3DD2 (e.g., from metrology tools Yieldstar or AFM) are generated. Experimental SEM image SEM80 serves as an input to CNN M70 to provide qualitatively reasonable depth data 3DM, such as a height map. However, in one embodiment, the predicted height map 3DM may not match the observed height maps 3DD1 or 3DD2. For example, height map 3DD1 is a 1D height profile along a cut line across the AFM data, and height map 3DD2 is a 2D height map taken from an optical scatterometry tool (e.g., Yieldstar). To correct for this, the predicted height map may be scaled to the observed height data 3DD1 or 3DD2. The scaled predicted height map is used as a new ground truth for the subsequent training process.
In one embodiment, CNN M70 may include, for example, a layer of convolutions whose kernels model the finite size of the AFM tool tip. In one embodiment, additional layers corresponding to metrology tools may be added to CNN M70. Thus, during fine tuning, model parameters associated with such tool-specific layers added to CNN M70 may be adjusted. In one embodiment, the experimental data set is typically limited in size by the metrology tool (e.g., AFM or Yieldstar), so that only a subset of the degrees of freedom (e.g., layers) of CNN M70 are optimized, while the remaining layers remain unchanged. In one embodiment, a correction bias in the latent space variable may be used to fine tune CNN M70. For example, the corrected latent space variable z′ may be calculated as follows:
z′=z+Δz,
in the above equation, Δz is a vector having the same dimension as z. Δz (or its norm) may be used as a measure of the difference between the synthetically trained model (e.g., CNN M70) and the calibrated model (e.g., 710). Ideally, this distance is relatively small. In another embodiment, the weights and biases of the last encoder layer are used as the degrees of freedom.
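The following sketch illustrates one possible realization of this limited-degree-of-freedom fine tuning in PyTorch: a learnable latent-space bias Δz is added while the pretrained encoder and decoder stay frozen. The attribute names (`encoder`, `decoder`, `latent_dim`) are assumptions about the network layout rather than a prescribed architecture.

```python
# Illustrative PyTorch sketch of fine tuning via a latent-space correction bias:
# only delta_z (z' = z + delta_z) is optimized, while the synthetically trained
# encoder/decoder of M70 stay frozen.
import torch

class CalibratedDepthNet(torch.nn.Module):
    def __init__(self, pretrained):                      # `pretrained` is the synthetically trained M70
        super().__init__()
        self.encoder = pretrained.encoder
        self.decoder = pretrained.decoder
        self.delta_z = torch.nn.Parameter(torch.zeros(pretrained.latent_dim))
        for p in list(self.encoder.parameters()) + list(self.decoder.parameters()):
            p.requires_grad = False                       # freeze all layers except delta_z

    def forward(self, x):
        z = self.encoder(x)                               # latent space variable z
        return self.decoder(z + self.delta_z)             # corrected latent variable z' = z + delta_z

    def calibration_distance(self) -> float:
        # norm of delta_z: a measure of the distance between the synthetically
        # trained model and the calibrated model (ideally small)
        return float(self.delta_z.norm())
```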
FIG. 8C illustrates an example adjustment to predicted depth data based on observed data from an optical metrology tool (e.g., Yieldstar) configured to measure a patterned substrate. In this example, experimental SEM image SSEM2 is input to CNN M70 to generate a qualitatively reasonable height map PDD2, but the average height does not match the observed height YSh provided by the optical tool (e.g., Yieldstar). To correct for this, the predicted height map PDD2 may be scaled to the observed height YSh using a scaling factor SF1. For example, the scaling factor may be calculated as a ratio of the average of the observed height YSh to the average height of the structure calculated from the predicted height map PDD2. Multiplying the predicted height map PDD2 by the scaling factor SF1 generates a scaled predicted height map SPD2, which height map SPD2 is used as a new ground truth for a subsequent training process to generate a fully trained CNN 710 (e.g., fig. 8B).
In one embodiment, the results of the acquired/set-up experiments indicate that the predicted depth data using the calibrated model 710 closely follows the programmed depth data used in the experiments. For example, the depth of a single contact hole (the acquired depth) predicted by model 710 closely follows the depth programmed in the geometry used to generate the simulated SEM image (the set depth). Similarly, the sidewall angle of the edge extracted from the simulation data closely follows the simulated SEM image. In one embodiment, the brightness and contrast settings in the simulated image may be varied during the training process of model 710 such that predictions from the deep learning network are independent of the brightness and contrast settings.
The calibrated model 710 is capable of handling various physical effects that occur during SEM image capture or during patterning processes. For example, model 710 can make good predictions even in the presence of charging effects in SEM images, as shown in fig. 9A-9C.
Fig. 9A is an exemplary SEM image 902 including charging artifacts. For example, in SEM image 902 of line space features, SEM charging artifacts can be seen to the right of the trench, where the SEM signal is reduced compared to the signal at the edge of the trench. When such an SEM image 902 with charging effects is input to the initial model M70 (e.g., in fig. 7), the predicted height map is affected by this SEM intensity decrease, as shown in the predicted height map 904 in fig. 9B (see right side of trench). To correct this, a fine tuning or further calibration step is employed during the training of the model M70, as previously described. In such a calibration step, the predicted data for the new training set is adjusted under the assumption that the lines have a flat top surface. When such adjusted prediction data is used to generate a calibrated model 710 (e.g., in fig. 7), the predicted height map (via model 710) is no longer affected by charging, but rather has a flat surface on top of the line, as shown by predicted height map 906 in fig. 9C. This example serves as proof of principle, i.e., in one embodiment, a deep learning network (e.g., CNN) is taught to identify artifacts that are not present in the simulated data set, and to perform nonlinear correction based on the observed data.
Fig. 10A is a flow chart of a method for estimating depth data using the trained model 710 (of fig. 7), the model 710 using a single SEM image as input. Method 800 includes processes P802 and P804. In one embodiment, process P802 includes receiving SEM image SEM1 of structures patterned on a substrate. In one embodiment, process P804 includes predicting depth data 810 associated with the SEM image via a CNN (e.g., model 710) using the SEM image as input. As previously described, CNNs (e.g., model 710) may be trained to predict depth data of an input SEM image based on a plurality of simulated SEM images labeled with simulated depth data; and further calibrated using the scaled predicted depth data, wherein the scaled predicted data corresponds to observed depth data of a captured SEM image of the structure patterned on the substrate as determined using the SEM tool.
For example, training of CNN 710 includes receiving, via an SEM tool, a captured SEM image and observed depth data associated with the captured SEM image of a patterned structure on a substrate; inputting the captured SEM image into a model to predict depth data; adjusting the predicted depth data by comparing the predicted depth data with the observed depth data; and tuning the model based on the adjusted predicted depth data such that the predicted depth data is within a specified matching threshold of the observed depth data.
In one embodiment, the estimated depth data includes physical characteristics of a structure of the patterned substrate. For example, the physical characteristics include a shape (e.g., a resist profile), a size, a sidewall angle (e.g., an angle of a resist profile), or a relative positioning of polygonal shapes with respect to each other at one or more depths of a feature of the structure.
Fig. 10B illustrates an example of predicting depth data 825 (e.g., a height map) using a trained model 710 using SEM image 821 as input. As shown, model 710 predicts 3D information from 2D data in SEM images.
In one embodiment, rather than using SEM simulators to train depth estimation models with paired data, calibrated deterministic or random process models (e.g., resist models) may be employed to generate training data. In cases where the accuracy of the process model may be inadequate, a limited set of experimental depth data (e.g., from AFM or Yieldstar) may be used to fine tune the process model. In one embodiment, two separate methods are described herein, depending on the type of process model (deterministic or stochastic).
In one embodiment, using a resist model instead of an SEM model for generating training data (e.g., depth map and corresponding SEM images) has several advantages. For example, the SEM image corresponding to the simulated resist profile is an experimental image, while the SEM image generated from the SEM model is synthetic. The process model (e.g., resist model) may be well calibrated to wafer data with or without the inclusion of an SEM simulator.
FIG. 11 is a flow chart of a method 1100 for generating a model configured to estimate depth data of a structure of a patterned substrate. For example, the structure includes one or more features, such as lines, bars, contact holes, other features, or combinations of features. In one embodiment, the model is trained using training data comprising pairs of SEM images and corresponding simulated 3D contours that may be formed on the substrate. The simulated contours may be generated via a calibrated process model (e.g., a calibrated resist model). In one embodiment, calibration is performed to meet critical dimension criteria associated with the structure. In one embodiment, process model calibration may involve comparing the height of the 3D simulated geometry to a height extracted from Atomic Force Microscope (AFM) data or optical metrology data. In one embodiment, the model is a CNN trained using a Generative Adversarial Network (GAN), an encoder-decoder network, or other machine learning related training methods. For example, the model includes a first model (e.g., a generator model or an encoder model) and a second model (e.g., a discriminator model or a decoder model) that are trained together. In one embodiment, both the first model and the second model may be CNN models. One example implementation of method 1100 includes a process P1102 for acquiring training data and a process P1104 for generating a model based on the training data. Processes P1102 and P1104 are discussed in detail below.
Process P1102 includes acquiring (i) a plurality of SEM images 1101 of a structure associated with a programming change PV1 in a mask pattern and (ii) a simulated contour 1103 of the structure based on the programming change PV1, each SEM image of the plurality of SEM images 1101 being paired with a simulated contour corresponding to the programming change in the mask pattern.
In one embodiment, the programming changes PV1 include changes in one or more of the following: a change in assist features associated with the mask pattern, a change in primary features associated with the mask pattern, or a change in resist coating thickness. In one embodiment, the assist feature variation includes modifying a dimension of an assist feature of the mask pattern, a distance of the assist feature from a primary feature of the mask pattern, or both. In one example, the first programming variation may be a variation in a dimension of the assist feature, and a mask including the first programming variation may be fabricated. The substrate may be patterned (e.g., via a lithographic apparatus) using a mask having first programming variations. Further, SEM images of the patterned substrate may be captured via an SEM tool. In this way, the captured SEM image includes features associated with the first programmed variation in the mask. In another example, the second programming change in the mask pattern may be a modification of the shape of the primary feature of the mask pattern. A mask having a second programming variation may be used to pattern the substrate, and another SEM image of the patterned substrate may be captured such that the SEM image includes variations related to the second programming variation. Such SEM images corresponding to the first, second, third, etc. programming variations PV1 may be acquired and included in the training data.
In one embodiment, the training data may also include a simulated contour 1103 of the structure. In one embodiment, the simulated contour 1103 may be a 3D contour (e.g., a resist contour) that may be formed on the substrate. Such 3D contours may be generated by a process model (e.g., a resist model). In one embodiment, the simulated contour 1103 is generated by a calibrated deterministic process model DPM associated with a patterning process that uses the programmed variations PV1 in the mask pattern. For example, a first programming variation (e.g., a variation in assist feature size), a second programming variation (e.g., a variation in shape of a main feature), or other programming variation may be input to the calibrated process model, thereby producing the simulated contour 1103. Each of these simulated contours 1103 is a 3D contour, which may include variations in the 3D contour caused by corresponding programmed variations in the mask pattern. Thus, the SEM image 1101 and the simulated contour 1103 can be paired according to the programming variation PV1.
In one embodiment, the calibrated deterministic process model DPM is a process model calibrated using inspection data (also referred to as wafer data) of the patterned substrate such that the simulated process model generates a simulated 3D profile of the structure within a desired range of measured Critical Dimensions (CDs) of the structure of the patterned substrate. In one embodiment, the deterministic process model DPM may not be calibrated to meet local CD uniformity, line Edge Roughness (LER), line Width Roughness (LWR) associated with line space features, and contact edge roughness associated with contact hole features, or random variations associated with structures. In other words, the calibrated deterministic process model DPM can simulate the contours of structures that meet CD requirements. In another embodiment, the process model may be calibrated to meet LCDU, LER, LWR or other random variations associated with the physical characteristics of the structure.
Process P1104 includes generating model M110 based on a plurality of SEM images 1101 paired with corresponding simulated contours 1103 to estimate depth data of the structure such that the estimated depth data is within acceptable thresholds of the depth data associated with the simulated contours 1103. For example, model M110 may be trained such that the estimated height map is more than 90% similar to the height map extracted from the simulated contour 1103.
In one embodiment, the generated depth data may include height data of the structure, shape parameters of the structure, voxel maps of overhanging features, or other depth features of the structure. In one embodiment, the shape parameters include one or more of the following: top CD measured at the top of the structure, bottom CD measured at the bottom of the structure, or side wall angle of the structure.
In one embodiment, the generation of model M110 may be based on a generative adversarial network (GAN) architecture, an encoder-decoder network, Res-net, or other machine learning architecture. For example, a conditional GAN learns a mapping from an observed image x (e.g., SEM image 1101) and a random noise vector z to an output y (e.g., simulated contour 1103), denoted G: {x, z} → y. In one embodiment, the generator G is trained to produce outputs that cannot be distinguished from reference ("real") images by an adversarially trained discriminator D, which in turn is trained to classify the generated images as "fake" as well as possible. For example, in GAN training, G attempts to minimize this objective, while the adversarial discriminator D attempts to maximize it. The generator may be an encoder-decoder or a U-net.
In one embodiment, the generation of model M110 includes: a first model (e.g., CNN 1) is trained in conjunction with a second model (e.g., CNN 2) such that the first model generates depth data using SEM images of the plurality of images as input, and the second model is used to classify the generated depth data as either a first class (e.g., real) or a second class (e.g., false) based on paired simulated contours. In one embodiment, the first model and the second model are Convolutional Neural Networks (CNNs) or deep CNNs. After the training process, a first model (e.g., CNN 1) may be used as model M110 to generate depth data for any incoming SEM images.
In one embodiment, the generation of model M110 includes: inputting SEM images of the plurality of SEM images 1101 to the first model; estimating depth data of the input SEM image by executing the first model; classifying the estimated depth data as indicating that the estimated depth data corresponds to a first class of references (e.g., real) or as indicating that the estimated depth data does not correspond to a second class of references (e.g., fake) via a second model using the simulated contour 1103 as a reference; and updating model parameters of both the first model and the second model such that the first model estimates depth data such that the second model classifies the estimated depth into the first category. In one embodiment, the model parameters are weights and biases of one or more layers of the CNN or DCNN.
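By way of example, a single iteration of this paired adversarial training might look as follows in PyTorch (pix2pix-style conditional GAN); the module names, the channel-wise concatenation of image and depth map, and the L1 weighting are illustrative assumptions rather than the exact training recipe.

```python
# Hedged single-iteration sketch of the paired adversarial training of P1104
# (pix2pix-style conditional GAN); names and loss weighting are illustrative.
import torch

bce = torch.nn.BCEWithLogitsLoss()
l1 = torch.nn.L1Loss()

def gan_step(generator, discriminator, opt_g, opt_d, sem_image, sim_depth, lambda_l1=100.0):
    # discriminator update: real = (SEM, paired simulated-contour depth), fake = (SEM, generated depth)
    fake_depth = generator(sem_image).detach()
    d_real = discriminator(torch.cat([sem_image, sim_depth], dim=1))
    d_fake = discriminator(torch.cat([sem_image, fake_depth], dim=1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # generator update: fool the discriminator while staying close to the paired contour
    fake_depth = generator(sem_image)
    d_fake = discriminator(torch.cat([sem_image, fake_depth], dim=1))
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + lambda_l1 * l1(fake_depth, sim_depth)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```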
Fig. 12 illustrates an exemplary training of model 1205 using encoder-decoder architecture and training data, including paired SEM images 1202 and simulated contour data 1210. The simulated contour 1210 is only one example herein, and may not correspond to exactly the same mask pattern as the SEM image, nor is the simulated contour 1210 calibrated. In one embodiment, the model 1205 may be trained using a GAN architecture. In one embodiment, the simulated contour data 1210 represents an average 3D resist contour 1210, the resist contour 1210 describing an average 3D behavior of an inspected feature (e.g., the feature may be an array of contact holes) of the patterned structure. In the present case, a deterministic resist model (one example of a calibrated deterministic process model DPM) may be used to generate a simulated 3D resist profile 1210 of the structure.
To introduce local profile variations for training the model 1205, programmed variations of the mask pattern (e.g., sub-resolution assist feature (SRAF) variations) within the field of view may be performed to cause variations in feature profiles (e.g., contact hole profiles) on the substrate. In one embodiment, the programming change refers to a physical change to the mask pattern. Such physical changes in the mask pattern may cause changes in the 3D profile patterned on the physical substrate. Thus, for each programming change, a corresponding 3D profile may be acquired. In one embodiment, the programming changes and the data generated thereby are used as training data for training the model as discussed herein. In one embodiment, the programming changes include changes in the size of the SRAF that will affect the CD and 3D profiles, changes in the distance to the primary pattern, or both. Based on such programming variations, a patterned substrate may be generated and SEM images of the patterned substrate captured. In one embodiment, programming changes in the mask pattern are used to generate training data and are different from changes (e.g., OPC related) to the mask pattern performed to improve the lithography process.
In one embodiment, the programmed variations in the mask pattern may be used in a simulation of a lithographic process configured to generate a simulated 3D profile 1210. For example, mask pattern data (e.g., GDS format) may be changed according to programming changes, and corresponding mask data (e.g., modified GDS) may be employed in simulations using deterministic model DPM to generate simulated 3D contours. For example, as described above, programming changes implemented on the physical mask (such as changes in size, shape, distance to the primary pattern, etc. of the SRAF) may be used in the simulator to generate a corresponding simulated 3D profile. In this way, the correspondence between the observed data and the analog data can be established based on the programming changes. In one example, a Convolutional Neural Network (CNN) may be trained to map SEM data 1202 directly to simulated 3D resist profile 1210. It will be appreciated that the ADI image is for illustration purposes only.
Referring again to FIG. 11, in one embodiment, the method 1100 may further include a process P1110 for applying the trained model 1110. For example, applications of model 1110 include: receiving an SEM image of a structure patterned on a substrate via an SEM tool; and determining depth data associated with the SEM image via a model using the received SEM image as input.
In one embodiment, based on the depth data, a physical characteristic of a structure of the patterned substrate may be determined. The physical properties may include the relative positioning of shapes, sizes, or polygonal shapes with respect to each other at one or more depths of features of the structure. In one embodiment, the physical characteristics may be compared to desired physical characteristics to determine any defects in the structure. Defects may indicate that the structure does not meet the design specifications (e.g., the CD is within desired CD limits). Based on the defects, one or more parameters of the patterning process (e.g., dose, focus, resist characteristics, etc.) may be adjusted to eliminate the defects during subsequent processing of the patterning process.
Fig. 13 is a flow chart of a method 1300 for generating a model M130, the model M130 configured to estimate depth data of a structure of a patterned substrate. In one embodiment, the model is trained using training data comprising unpaired multiple SEM images and multiple simulated contours. For example, a particular SEM image is not paired with a particular simulated contour. In one embodiment, each of the plurality of SEM images may correspond to one or more simulated contours. In the present method, the simulated contours are generated via a calibrated process model (e.g., a calibrated stochastic model) configured to account for random variations in physical properties associated with the structure. In one embodiment, method 1300 may include a process for acquiring training data P1302 and a process for generating a model for estimating depth data P1304. These processes P1302 and P1304 are discussed in further detail below.
Process P1302 includes: a plurality of SEM images 1301 of (i) a structure, (ii) a simulated contour 1303 of the structure, and (iii) a Key Performance Indicator (KPI) associated with the simulated contour 1303 are acquired. In one embodiment, the plurality of SEM images may be acquired from an image capture tool (such as an SEM tool) configured to capture an image of the structure on the patterned substrate. The SEM image may be a 2D image, such as a top-down SEM image. In one embodiment, the simulated contour 1303 of the structure may be a 3D contour of the structure generated via a calibrated process model SPM associated with patterning. In one embodiment, the plurality of SEM images 1301 and the simulated contour 1303 are unpaired.
In one embodiment, the calibrated process model SPM is a process model calibrated to generate a 3D profile of the structure such that the 3D structure satisfies one or more KPIs extracted from the 3D structure. In one embodiment, KPIs may include, but are not limited to, Critical Dimensions (CDs) of structures, local CD uniformity (LCDU) associated with structures, Line Edge Roughness (LER) associated with structures, defect rates associated with structures, Line Width Roughness (LWR) associated with line space patterns, contact edge roughness associated with contact holes, stochastic edge placement errors (SEPEs) associated with structures, other stochastic variations associated with the geometry of structures, or combinations thereof. In one embodiment, the calibration may be based on a comparison between the inspected wafer data and the model-generated data.
Process P1304 includes generating model M130 to estimate depth data of a structure based on (i) a plurality of SEM images 1301, (ii) a simulated contour 1303, and (iii) KPIs such that KPIs associated with the estimated depth data are within acceptable thresholds of KPIs associated with the simulated contour 1303. In one embodiment, the estimated depth data includes at least one of: height data of the structure, shape parameters of the structure, or voxel map associated with the overhang structure. In one embodiment, the shape parameters include one or more of the following: top CD measured at the top of the structure, bottom CD measured at the bottom of the structure, or side wall angle of the structure.
In one embodiment, the generation of model M130 may be based on a generative adversarial network, an encoder-decoder network, or other machine learning related network. For example, the generation of model M130 includes training a first model (e.g., G1 in FIG. 14) and a second model (e.g., DS of FIG. 14). In one embodiment, for example, referring to the GAN architecture, the first model may be a generator model and the second model may be a discriminator model. Training of the first model is performed in conjunction with the second model such that the first model generates depth data using SEM images of the plurality of SEM images as input. The second model classifies the generated depth data into a first class (e.g., real) or a second class (e.g., fake) using the simulated contours 1303 and the simulated KPIs associated with the simulated contours 1303 as reference data.
In one embodiment, the generation of the model is an iterative process. Each iteration includes: inputting SEM images of the plurality of SEM images 1301 to the first model; estimating depth data of the input SEM image using the first model; extracting KPIs from the estimated depth data; the estimated depth data is classified into a first class or a second class via a second model using (i) the extracted KPIs and (ii) the plurality of simulated contours 1303 and the simulated KPIs of the plurality of simulated contours 1303 as references. In one embodiment, the first category indicates that the estimated depth data corresponds to a reference, and the second category indicates that the estimated depth data does not correspond to a reference. Further, model parameters of both the first model and the second model may be updated such that the first model estimates depth data such that the second model classifies the estimated depth as a first class (e.g., true).
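For illustration, the sketch below shows one simplified way to extract CD and LCDU KPIs from an estimated height map of a contact-hole array by thresholding at half the resist height; the threshold, the equivalent-circle CD definition, and the use of scipy are assumptions of the example rather than the prescribed KPI extraction.

```python
# Simplified, illustrative extraction of CD and LCDU KPIs from an estimated height map
# of a contact-hole array: holes are segmented by thresholding at half the resist height
# and an equivalent-circle CD is computed per hole.
import numpy as np
from scipy import ndimage

def contact_hole_kpis(height_map: np.ndarray, pixel_size_nm: float) -> dict:
    open_area = height_map < 0.5 * height_map.max()       # assumed half-height threshold
    labels, n_holes = ndimage.label(open_area)
    if n_holes == 0:
        return {"mean_CD_nm": float("nan"), "LCDU_3sigma_nm": float("nan")}
    cds = np.array([2.0 * np.sqrt(np.sum(labels == i) / np.pi) * pixel_size_nm
                    for i in range(1, n_holes + 1)])      # equivalent-circle CD per hole
    return {"mean_CD_nm": float(cds.mean()), "LCDU_3sigma_nm": float(3.0 * cds.std())}
```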
In one embodiment, for example, using CycleGAN training, the first model (e.g., G1 of fig. 14) may be further trained in conjunction with a third model (e.g., another generator model G2 of fig. 14). The third model may be configured to generate SEM images from the estimated depth data. In other words, the first model and the third model are cycle-consistent. For example, the first model and the third model are trained such that the input of the first model closely matches the output of the third model, by minimizing a loss function (e.g., the difference between images 1402 and 1406 in fig. 14).
The training of the first model and the third model may be an iterative process. Each iteration includes inputting an SEM image of the plurality of SEM images 1301 to the first model; estimating depth data of the input SEM image using the first model; generating a predicted SEM image via the third model using the estimated depth data as input; and updating model parameters of both the first model and the third model such that a difference between the input SEM image and the predicted SEM image is within a specified difference threshold. In one embodiment, the first model, the second model, and the third model are Convolutional Neural Networks (CNNs) or deep CNNs. In one embodiment, the model parameters are weights and biases of one or more layers of the CNN or DCNN.
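A hedged sketch of one such iteration is shown below, combining the adversarial term against the unpaired simulated contours 1414 with the cycle-consistency term between images 1402 and 1406; the loss weights and module names are illustrative assumptions.

```python
# Hedged sketch of one unpaired (CycleGAN-style) training iteration for G1 (SEM -> depth),
# G2 (depth -> SEM) and the discriminator DS.
import torch

bce = torch.nn.BCEWithLogitsLoss()
l1 = torch.nn.L1Loss()

def cycle_step(g1, g2, ds, opt_g, opt_d, sem_image, sim_contour, lambda_cyc=10.0):
    # generator pass: SEM 1402 -> depth 1404 -> reconstructed SEM 1406
    fake_depth = g1(sem_image)
    recon_sem = g2(fake_depth)

    # adversarial term: DS should rate the generated depth as "real"
    pred_fake = ds(fake_depth)
    loss_adv = bce(pred_fake, torch.ones_like(pred_fake))
    loss_cyc = l1(recon_sem, sem_image)                 # cycle-consistency loss (1402 vs. 1406)
    loss_g = loss_adv + lambda_cyc * loss_cyc
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # discriminator update against the unpaired simulated contours 1414
    d_real = ds(sim_contour)
    d_fake = ds(g1(sem_image).detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return loss_g.item(), loss_d.item()
```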
Fig. 14 illustrates an exemplary determination of a model for estimating depth data according to method 1300. In fig. 14, model G1 may be trained according to method 1300 to generate model M130. In this embodiment, a stochastic process model (e.g., a stochastic resist model SPM) may be employed, which eliminates the need for specific target profile variations (e.g., programmed variations). According to one example, there may not be a one-to-one correlation between the contact holes (CHs) generated by the stochastic resist model and the CHs formed on the wafer. In one embodiment, to avoid this problem, a cycle-consistent generative adversarial network (CycleGAN) may be employed to learn the unpaired image-to-image translation between SEM image 1402, simulated depth data 1414, and predicted SEM image 1406 by minimizing a loss function. In one embodiment, the 3D profile shown is merely an example and may not correspond to an actual calibrated profile for the SEM image shown. For example, the loss function may be the difference between SEM image 1402 and predicted SEM image 1406. An example of CycleGAN is discussed in J. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017, pp. 2242-2251, which is incorporated herein by reference in its entirety.
Referring to fig. 14, metrology data or inspection data (e.g., SEM 1402) for CH arrays may be acquired from a metrology tool (e.g., SEM) by capturing images or inspecting patterned substrates. In one embodiment, multiple CH arrays are acquired from the ADI for multiple situations, for example using a focus-exposure matrix (FEM).
In one embodiment, a physical, depth-resolved stochastic resist model (not shown) may be calibrated so that the stochastic model predicts well the CD distribution, LCDU, LER, LWR, defect rate, or other stochastic variations under the conditions of interest, as well as contrast curves during flat exposure. The stochastic model may be used to generate simulated contours with multiple examples of CD and resist height variations (e.g., due to semi-open CHs), similar to inspection data for patterned substrates.
Further, the network (e.g., GAN) is trained using the unpaired data sets of depth resist profiles 1414 and SEM images 1402. For example, the GAN network includes a first model G1, a second model DS, and a third model G2. In one embodiment, the trained GAN network learns an efficient way to transform SEM image 1402 into a realistic depth resist profile 1404 and then back into a realistic SEM image 1406. For example, the trained model G1 maps SEM image 1402 to a prediction of the depth resist profile 1404.
In this training example, SEM images 1402 captured by the SEM tool may be input to a first model G1 (e.g., a generator of GAN) to generate depth data 1404. Depth data 1404 may be represented as a 3D profile of a structure that may be formed on a substrate. For example, depth data 1404 may be a resist profile formed on a resist layer of a substrate. In one embodiment, KPIs (e.g., CD, LCDU, LER, LWR) may be extracted from depth data 1404. In one embodiment, the depth data 1404 and KPIs may be input to a second model DS (e.g., a discriminator of GAN). In one embodiment, the reference profile 1414 may be generated using a stochastic process model, and the reference KPIs may be extracted from the reference profile 1414.
The depth data 1404 and related KPIs are input to a second model DS configured to classify the depth data 1404 into a first class (e.g., real) or a second class (e.g., false) based on the reference KPIs and the reference simulation profile 1414. In one embodiment, the first class indicates that the depth data 1404 is true or similar to the reference profile 1414, while the second class indicates that the depth data 1404 is not true or similar to the reference profile 1414.
In one embodiment, the first model G1 and the second model DS are trained to improve upon each other. In this way, the first model G1 gradually generates such real depth data that the second model DS classifies the model-generated depth data as belonging to the first class (e.g., real).
Alternatively or additionally, the first model G1 and the second model DS are further trained in cooperation with a third model G2, the third model G2 generating a predicted SEM image 1406 using the depth data 1404 as input. During training, model parameters of the first model G1 and the third model G2 are adjusted to reduce or minimize the difference between the predicted SEM image 1406 and the real SEM image 1402. Thus, the third model G2 may generate a true SEM image 1406 for any depth data. In one embodiment, the model parameters of the second model DS may also be adjusted, as modifying G1 may result in a change in the G1 prediction, which in turn may affect the classification of the second model DS. In other words, the three models, when trained together, may generate real depth data that is further validated by generating predicted SEM images that are also real (similar to real SEM). Thus, according to one embodiment, pairing between training data may not be required.
Referring again to fig. 13, in one embodiment, the method 1300 may further include a process P1310 for applying the trained model 1310. For example, applying the model includes: receiving an SEM image of a structure patterned on a substrate via an SEM tool; and determining depth data associated with the SEM image via the model using the received SEM image as input. In one embodiment, based on the depth data, a physical characteristic of a structure of the patterned substrate may be determined. Physical characteristics include the relative position of shapes, sizes, or polygonal shapes with respect to each other at one or more depths of a feature of a structure.
In one embodiment, the physical characteristics are compared to a desired range and defects in the structure can be detected. In one embodiment, KPIs associated with physical characteristics may be compared to a range of expected KPIs to determine defects. Thus, a defect may indicate that the structure does not meet design specifications or performance specifications. In one embodiment, based on the defects, one or more parameters of the patterning process may be adjusted to eliminate the defects during subsequent processing of the patterning process.
Fig. 15 is a flow chart of a method 1500 of applying a model 1110 or model 1310 trained in accordance with the methods 1100 or 1300 discussed herein. In one example, according to one embodiment, the method 1500 for estimating depth data of a structure formed on a substrate uses a convolutional neural network ("CNN") configured to estimate depth data based on a scanning electron microscope ("SEM") image of the structure. For example, CNNs may be trained according to 1100 or 1300. One example implementation of method 1500 may include processes P1502 and P1504, which will be discussed further below.
Process P1502 includes receiving SEM image 1501 of structures on a patterned substrate. For example, by projecting an electron beam perpendicular to the substrate, a vertical SEM image may be captured by the SEM tool. Alternatively, in process P1502, an optical image of the substrate may be captured via an optical tool and received.
Process P1504 includes estimating depth data 1510 associated with the structure of the patterned substrate via CNN using SEM image 1501 as input. In one embodiment, the CNN may be trained using training data that includes multiple SEM images of the structure and multiple simulated contours of the structure. In one embodiment, the simulated contours may be generated by a process model associated with patterning.
In some embodiments, the CNN may be a combination of a generator model (e.g., a first model or CNN 1) and a discriminator model (e.g., a second model or CNN 2) trained together (e.g., as discussed with respect to fig. 11). For example, training of CNNs is an iterative process, each iteration comprising: acquiring a plurality of SEM images and a plurality of simulated contours, wherein each SEM image is paired with a simulated contour; inputting an SEM image of the plurality of SEM images into a generator model; estimating depth data of the input SEM image using the generator model; classifying the estimated depth data into a first class indicating that the estimated depth data corresponds to a reference or a second class indicating that the estimated depth data does not correspond to a reference via a discriminator model using a simulated contour paired with the input SEM image as a reference; and updating model parameters of both the generator model and the discriminator model such that the generator model estimates depth data such that the discriminator model classifies the estimated depth into a first category.
In some embodiments, the CNN may be a combination of a generator model (e.g., a first model or CNN 1) and a discriminator model (e.g., a second model or CNN 2) trained together (e.g., as discussed with respect to fig. 13). For example, training of CNNs is an iterative process; each iteration includes: inputting an SEM image of the plurality of SEM images into a generator model; estimating depth data of the input SEM image using the generator model; extracting key performance indicators, KPIs, associated with the structure based on intensity values associated with the structure in the estimated depth data; classifying the estimated depth data into a first class indicating that the estimated depth data corresponds to a reference or a second class indicating that the estimated depth data does not correspond to a reference, via a discriminator model using (i) the extracted KPIs and (ii) the plurality of simulated contours and the simulated KPIs of the plurality of simulated contours as references; and updating model parameters of both the generator model and the discriminator model such that the generator model estimates depth data such that the discriminator model classifies the estimated depth into a first category.
In one embodiment, the CNN may be further trained in conjunction with a second generator model (e.g., CNN 3) configured to generate SEM images from the estimated depth data. Training also includes: inputting SEM images of the plurality of SEM images to the CNN; estimating depth data of the input SEM image using the CNN; generating a predicted SEM image via a second generator model using the estimated depth data as input; and updating model parameters of both the CNN and the second generator model such that a difference between the input SEM image and the predicted SEM image is within a specified difference threshold.
In one embodiment, the process model is a calibrated deterministic process model DPM that is calibrated such that the Critical Dimension (CD) of the simulated contour of the structure is within the CD threshold of observed data associated with the patterned substrate, but does not necessarily meet the random variation specifications associated with the observed data. In one embodiment, the process model is a calibrated stochastic process model that is calibrated such that Key Performance Indicators (KPIs) extracted from depth data generated by the model are within specified thresholds of observed KPIs extracted from inspection data associated with the patterned substrate. In one embodiment, the inspection data is obtained from a plurality of features of the patterned substrate, the plurality of features being formed using a range of dose and focus conditions. In one embodiment, the KPIs include one or more of the following: a CD of a structure, a local CD uniformity (LCDU) associated with the structure, a Line Edge Roughness (LER) associated with the structure, a defect rate associated with the structure, other KPIs indicative of performance of a patterning process.
As previously described, depth information may also be used to determine defects in the structure by comparing CDs of features at different depths and determining whether the CDs meet design specifications. In one embodiment, the depth information may be used to determine one or more parameters of a patterning process, such as a resist process, an etching process, or other patterning related process, such that the 3D structure of the feature is within a desired design specification.
In one embodiment, the base model (e.g., an untrained model) and the trained model may be machine learning models that include weights and biases as model parameters. During the training process, the weights and biases of the base model are continuously adjusted based on the training data. At the end of training, the base model is referred to as a trained model. In one embodiment, the trained model is a convolutional neural network (CNN) or a deep convolutional neural network (DCNN). Model parameters include weights and biases of one or more layers of the deep convolutional network.
Without limiting the scope of this disclosure, the application of an example supervised machine learning algorithm is described below.
Supervised learning is a machine learning task that infers a function from labeled training data. The training data includes a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also referred to as a supervisory signal). The supervised learning algorithm analyzes the training data and produces an inferred function that can be used to map new examples. An optimal scenario would allow the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.
Given a set of N training examples {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} such that x_i is the feature vector of the i-th example and y_i is its label (i.e., class), the learning algorithm seeks a function g: X → Y, where X is the input space and Y is the output space. A feature vector is an n-dimensional vector representing the numerical features of a certain object. Many algorithms in machine learning require a numerical representation of the object, as such a representation facilitates processing and statistical analysis. The feature values may correspond to the pixels of an image when representing an image, and may correspond to term occurrence frequencies when representing text. The vector space associated with these vectors is often referred to as the feature space. The function g is an element of some space G of possible functions (commonly referred to as the hypothesis space). It is sometimes convenient to represent g using a scoring function f: X × Y → ℝ such that g is defined as returning the y value that gives the highest score: g(x) = arg max_y f(x, y). Let F denote the space of scoring functions.
Although G and F may be any spaces of functions, many learning algorithms are probabilistic models, where g takes the form of a conditional probability model g(x) = P(y | x), or f takes the form of a joint probability model f(x, y) = P(x, y). For example, naive Bayes and linear discriminant analysis are joint probability models, whereas logistic regression is a conditional probability model.
There are two basic approaches to choosing f or g: empirical risk minimization and structural risk minimization. Empirical risk minimization seeks the function that best fits the training data. Structural risk minimization includes a penalty function that controls the bias/variance tradeoff.
In both cases, it is assumed that the training set consists of independent and identically distributed pairs of samples (x_i, y_i). In order to measure how well a function fits the training data, a loss function L: Y × Y → ℝ≥0 is defined. For a training example (x_i, y_i), the loss of predicting the value ŷ is L(y_i, ŷ).
The risk R(g) of the function g is defined as the expected loss of g. This can be estimated from the training data as the empirical risk R_emp(g) = (1/N) Σ_{i=1..N} L(y_i, g(x_i)).
Example models for supervised learning include decision trees, ensemble methods (bagging, boosting, random forests), k-NN, linear regression, naive Bayes, neural networks, logistic regression, perceptrons, Support Vector Machines (SVMs), Relevance Vector Machines (RVMs), and deep learning.
SVM is an example of a supervised learning model that analyzes data and identifies patterns, and may be used for classification and regression analysis. Given a set of training examples, each labeled as belonging to one of two classes, the SVM training algorithm builds a model, assigning the new example to one class or the other, making it a non-probabilistic binary linear classifier. The SVM model is a model that represents examples as points in space, mapped such that the examples of individual classes are partitioned with as wide a distinct gap as possible. The new examples are then mapped into the same space and are predicted to belong to a certain class based on which side of the gap they fall on.
In addition to performing linear classification, the SVM may also efficiently perform non-linear classification using a so-called kernel method, thereby implicitly mapping its input into a high-dimensional feature space.
The kernel approach involves a user-specified kernel, i.e., a similarity function over pairs of data points in the original representation. The name of kernel methods derives from the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. This approach is known as the "kernel trick."
The effectiveness of the SVM depends on the choice of kernel, the kernel's parameters, and the soft margin parameter C. One common option is a Gaussian kernel, which has a single parameter γ. The best combination of C and γ is typically selected by a grid search (also known as a "parameter sweep") with exponentially growing sequences of C and γ, e.g., C ∈ {2^-5, 2^-4, ..., 2^15, 2^16}; γ ∈ {2^-15, 2^-14, ..., 2^4, 2^5}.
Grid search is an exhaustive search through a manually specified subset of the hyperparameter space of the learning algorithm. A grid search algorithm is guided by some performance metric, typically measured by cross-validation on the training set or evaluation on a held-out validation set.
Cross-validation may be used to examine each combination of parameter selections and select the parameter with the best cross-validation accuracy.
Cross-validation (sometimes referred to as rotation estimation) is a model validation technique that is used to evaluate how the results of statistical analysis will generalize to independent datasets. It is mainly used in environments where the goal is prediction and one wants to estimate the accuracy of the prediction model in practice. In the prediction problem, a model is typically given a known data set (training data set) on which training is run and an unknown data set (or first seen data) for which the model is tested (test data set). The goal of cross-validation is to define the data set to "test" the model (i.e., validate the data set) during the training phase in order to limit problems such as overfitting, give insight into how the model will generalize to independent data sets (i.e., unknown data sets, e.g., unknown data sets from real problems), etc. One round of cross-validation involves partitioning a data sample into complementary subsets, performing an analysis on one subset (referred to as a training set), and validating the analysis on the other subset (referred to as a validation set or test set). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over multiple rounds.
The final model, which can be used to test and classify the new data, is then trained over the entire training set using the selected parameters.
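Purely as an illustration of the grid search, cross-validation, and final refit described above, the following sketch scans exponentially increasing sequences of C and γ for a Gaussian-kernel SVM; scikit-learn and the synthetic classification data are assumed, non-limiting choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class data standing in for labeled feature vectors.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Exponentially increasing sequences of C and gamma, as in the parameter sweep above.
param_grid = {
    "C": [2.0 ** k for k in range(-5, 17)],       # 2^-5 ... 2^16
    "gamma": [2.0 ** k for k in range(-15, 6)],   # 2^-15 ... 2^5
}

# Cross-validation scores every (C, gamma) combination; refit=True then retrains
# the best Gaussian-kernel SVM on the entire training set (the "final model").
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, refit=True)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```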
Another example of supervised learning is regression. Regression derives a relationship between a dependent variable and one or more independent variables from a set of values for the dependent variable and corresponding values for the independent variable. Regression can estimate the conditional expectation of a dependent variable given the independent variable. The inferred relationship may be referred to as a regression function. The inferred relationship may be probabilistic.
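As a minimal illustration, ordinary least squares is one way such a regression function may be inferred; the library and the synthetic data below are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Independent variable x and dependent variable y with an approximately linear relationship.
x = np.arange(0.0, 10.0, 0.5).reshape(-1, 1)
y = 3.0 * x.ravel() + 1.0 + 0.2 * np.random.randn(x.shape[0])

# The fitted model approximates the conditional expectation E[y | x].
reg = LinearRegression().fit(x, y)
print(reg.coef_, reg.intercept_, reg.predict([[12.0]]))
```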
In one embodiment, a system is provided that can use a model to generate 3D data (e.g., depth data) after the system captures an image of a patterned substrate. In one embodiment, for example, the system may be the SEM tool of FIG. 16 or the inspection tool of FIG. 17 configured to include the model DBM discussed herein. For example, the metrology tool includes: an electron beam generator for capturing an image of the patterned substrate; and one or more processors including a trained model as described herein. The one or more processors are configured to execute the trained model using the captured images to generate depth data. As previously described, the model may be a convolutional neural network.
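The following sketch illustrates, under assumed names and with a placeholder network, how the one or more processors of such a metrology tool might apply a trained convolutional neural network to a single captured SEM image to obtain depth data; it is not the actual tool software, and the stand-in model and file-free loading are assumptions for the example.

```python
import numpy as np
import torch

def depth_from_single_sem_image(model: torch.nn.Module, sem_image: np.ndarray) -> np.ndarray:
    """Apply a trained depth/disparity CNN to one grayscale SEM image and return a per-pixel map."""
    model.eval()
    with torch.no_grad():
        x = torch.from_numpy(sem_image).float()[None, None]   # shape (1, 1, H, W)
        depth = model(x)                                       # model output: depth/disparity data
    return depth.squeeze().numpy()

# Usage: in the tool, `model` would be the trained CNN and `image` the captured SEM image
# held by the image processing system; both are placeholders here.
model = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)       # stand-in for the trained CNN
image = np.random.rand(512, 512).astype(np.float32)
print(depth_from_single_sem_image(model, image).shape)
```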
In one embodiment, model-generated 3D data (e.g., depth data) may be used to improve the patterning process. For example, the depth data may be used in simulation of the patterning process, e.g., to predict contours, CDs, edge placement (e.g., edge placement errors), etc., in the resist and/or etch image. The purpose of the simulation is to accurately predict the edge position and/or aerial image intensity slope and/or CD, etc. of the printed pattern. These values may be compared to an expected design, for example, to correct a patterning process, identify locations where defects are expected to occur, and so forth. The desired design is typically defined as a pre-OPC design layout, which may be provided in a standardized digital file format (such as GDSII or OASIS) or other file format.
In some embodiments, the inspection device or metrology device may be a scanning electron microscope (SEM) that produces images of structures (e.g., some or all of the structures of a device) exposed or transferred onto the substrate. Fig. 16 depicts one embodiment of an SEM tool. A primary electron beam EBP emitted from an electron source ESO is condensed by a condensing lens CL and then passes through a beam deflector EBD1, an E x B deflector EBD2, and an objective lens OL to irradiate the substrate PSub on the substrate table ST at a focal point.
When the substrate PSub is irradiated with the electron beam EBP, secondary electrons are generated from the substrate PSub. The secondary electrons are deflected by the E x B deflector EBD2 and detected by the secondary electron detector SED. The two-dimensional electron beam image may be acquired by detecting electrons generated from the sample in synchronization with, for example, a two-dimensional scan of the electron beam by the beam deflector EBD1 or a repeated scan of the electron beam EBP in the X or Y direction by the beam deflector EBD1 and a continuous movement of the substrate PSub in the other of the X or Y direction by the substrate table ST.
The signal detected by the secondary electronic detector SED is converted into a digital signal by an analog/digital (a/D) converter ADC and the digital signal is sent to the image processing system IPU. In an embodiment the image processing system IPU may have a memory MEM for storing all or part of the digital image for processing by the processing unit PU. The processing unit PU (e.g. specially designed hardware or a combination of hardware and software) is configured to convert or process the digital image into a dataset representing the digital image. Furthermore, the image processing system IPU may have a storage medium STOR configured to store the digital image and the corresponding data set in a reference database. The display device DIS may be connected to the image processing system IPU so that an operator may perform the necessary operations on the device with the help of a graphical user interface.
As described above, SEM images may be processed to extract contours that describe the edges of objects representing device structures in the image. These contours are then quantified via metrics such as CD. Thus, typically, images of device structures are compared and quantified via simplistic metrics, such as an edge-to-edge distance (CD) or a simple pixel difference between images. Typical contour models that detect the edges of objects in an image in order to measure CD use image gradients. Indeed, those models rely on strong image gradients. However, in practice, the image is typically noisy and has discontinuous boundaries. Techniques such as smoothing, adaptive thresholding, edge detection, erosion and dilation may be used to process the results of the image-gradient contour model to handle noisy and discontinuous images, but this ultimately results in a low-resolution quantification of a high-resolution image. Thus, in most instances, mathematical manipulation of images of device structures to reduce noise and automate edge detection results in a loss of resolution of the image, and thereby a loss of information. Consequently, the result is a low-resolution quantification that amounts to a simplistic representation of a complicated, high-resolution structure.
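To make the gradient-based contour extraction described above concrete, the following sketch smooths a noisy grayscale SEM image, applies adaptive thresholding and morphological clean-up, and extracts contours from which a crude edge-to-edge distance could be measured; OpenCV and all parameter values are assumed, illustrative choices and not part of the embodiments.

```python
import cv2
import numpy as np

def extract_contours(sem_image: np.ndarray):
    """Gradient/threshold-based contour extraction from a noisy 8-bit grayscale SEM image."""
    smoothed = cv2.GaussianBlur(sem_image, (5, 5), sigmaX=1.5)            # suppress noise
    binary = cv2.adaptiveThreshold(smoothed, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, blockSize=31, C=2)  # adaptive thresholding
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                              np.ones((3, 3), np.uint8))                  # erosion followed by dilation
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    return contours

# Example: take the largest contour's bounding-box width as a crude edge-to-edge distance (CD proxy).
image = (np.random.rand(256, 256) * 255).astype(np.uint8)                 # placeholder for a real SEM image
contours = extract_contours(image)
if contours:
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    print("approximate feature width (pixels):", w)
```

These post-processing steps also illustrate why such gradient-based extraction tends to reduce the effective resolution of the resulting quantification.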
It is therefore desirable to have a mathematical representation of structures produced or expected to be produced using a patterning process (e.g., circuit features, alignment marks or metrology target portions (e.g., grating features), etc.) that can preserve the resolution and yet describe the general shape of the structures, whether those structures are in a latent resist image, in a developed resist image, or transferred to a layer of the substrate, e.g., by etching. In the context of photolithography or other patterning processes, the structure may be the device being fabricated or a portion thereof, and the image may be an SEM image of the structure. In some cases, the structure may be a feature of a semiconductor device, such as an integrated circuit. In this case, the structure may be referred to as a pattern or a desired pattern comprising a plurality of features of the semiconductor device. In some cases, the structure may be an alignment mark, or a portion thereof (e.g., a grating of the alignment mark), used in an alignment measurement process to determine alignment of an object (e.g., a substrate) with another object (e.g., a patterning device), or a metrology target, or a portion thereof (e.g., a grating of the metrology target), used to measure a parameter of the patterning process (e.g., overlay, focus, dose, etc.). In one embodiment, the metrology target is a diffraction grating used to measure, for example, overlay.
Fig. 17 schematically illustrates another embodiment of the examination apparatus. The system is for inspecting a sample 90 (such as a substrate) on a sample stage 88 and includes a charged particle beam generator 81, a converging lens module 82, a probe forming objective lens module 83, a charged particle beam deflection module 84, a secondary charged particle detector module 85, and an image forming module 86.
The charged particle beam generator 81 generates a primary charged particle beam 91. The converging lens module 82 converges the generated primary charged particle beam 91. The probe forming objective lens module 83 focuses the converging primary charged particle beam into a charged particle beam probe 92. Charged particle beam deflection module 84 scans a formed charged particle beam probe 92 across the surface of a region of interest on a sample 90 fixed on sample stage 88. In one embodiment, the charged particle beam generator 81, the converging lens module 82 and the probe forming objective lens module 83, or equivalent designs, alternatives or any combination thereof, together form a charged particle beam probe generator that generates a scanned charged particle beam probe 92.
The secondary charged particle detector module 85 detects secondary charged particles 93 (which may also be together with other reflected or scattered charged particles on the sample surface) emitted from the sample surface when bombarded by the charged particle beam probe 92 to generate a secondary charged particle detection signal 94. An image forming module 86 (e.g., a computing device) is coupled with the secondary charged particle detector module 85 to receive the secondary charged particle detection signal 94 from the secondary charged particle detector module 85 and to form at least one scanned image accordingly. In one embodiment, the secondary charged particle detector module 85 and the image forming module 86, or equivalent designs, alternatives, or any combination thereof, together form an image forming device that forms a scanned image from detected secondary charged particles emitted from the sample 90 bombarded by the charged particle beam probe 92.
In one embodiment, the monitoring module 87 is coupled to the image forming module 86 of the image forming apparatus to monitor, control, etc. the patterning process and/or to derive parameters for patterning process design, control, monitoring, etc. using the scanned image of the sample 90 received from the image forming module 86. Thus, in one embodiment, the monitoring module 87 is configured or programmed to cause performance of the methods described herein. In one embodiment, the monitoring module 87 includes a computing device. In one embodiment, the monitoring module 87 includes a computer program for providing the functionality herein, and the computer program is encoded on a computer readable medium forming the monitoring module 87 or disposed within the monitoring module 87.
In one embodiment, compared to an electron beam inspection tool such as that of fig. 16 that uses a probe to inspect a substrate, the electron current in the system of fig. 17 is significantly larger than in a CD SEM such as that depicted in fig. 16, so that the probe spot is large enough that the inspection speed can be fast. However, because of the large probe spot, the resolution may not be as high as that of a CD SEM. In one embodiment, the inspection apparatus may be a single-beam or multi-beam apparatus, without limiting the scope of the present disclosure.
SEM images from the systems of fig. 16 and/or 17 may be processed to extract contours describing edges of objects representing device structures in the images. These contours are then typically quantified via a metric such as CD on a user defined cut line. Thus, typically, the images of the device structures are compared and quantified via metrics, such as edge-to-edge distances (CDs) measured on the extracted contours or simple pixel differences between the images.
In one embodiment, one or more processes of the methods may be implemented as instructions (e.g., program code) in a processor of a computer system (e.g., processor 104 of computer system 100). In one embodiment, the processes may be distributed across multiple processors (e.g., parallel computation) to improve computing efficiency. In one embodiment, a computer program product comprising a non-transitory computer readable medium has instructions recorded thereon that, when executed by a computer hardware system, implement the methods described herein.
As discussed herein, different methods (e.g., methods 500, 700, 1100, and 1300) are provided for training a model configured to generate depth information from a single SEM image of a patterned substrate. Thus, using the model described herein, depth information may be estimated using only one measurement, saving metrology time. The depth information may be further employed to configure the lithography process to improve yield or minimize defects.
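Purely as a hedged illustration of the training approach summarized above (predicting parallax/disparity data from the first image of a stereo SEM pair, combining it with the second image to reconstruct the first image, and adjusting model parameters based on a performance function), the following PyTorch-style sketch shows one training step; the network interface, the warping details, and the smoothness weight are assumptions made for the example and do not represent the specific methods 500, 700, 1100 or 1300.

```python
import torch
import torch.nn.functional as F

def training_step(model, first_image, second_image, optimizer, smooth_weight=0.1):
    """One self-supervised step: predict disparity, warp the second image, compare to the first."""
    disparity = model(first_image)                          # (B, 1, H, W), horizontal shift in pixels

    # Sampling grid that shifts the second image by the predicted disparity (x normalized to [-1, 1]).
    b, _, h, w = first_image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base_x = xs.unsqueeze(0).expand(b, -1, -1)
    base_y = ys.unsqueeze(0).expand(b, -1, -1)
    shifted_x = base_x + 2.0 * disparity.squeeze(1) / max(w - 1, 1)
    grid = torch.stack((shifted_x, base_y), dim=-1)         # (B, H, W, 2)
    reconstructed = F.grid_sample(second_image, grid, align_corners=True)

    # Performance function: reconstruction error plus a piecewise-smoothness penalty on the disparity.
    loss = F.l1_loss(reconstructed, first_image) + smooth_weight * disparity.diff(dim=-1).abs().mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, such a step would be repeated over many stereo pairs acquired at the two electron beam tilt settings until the performance function falls within the specified performance threshold.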
Combinations and sub-combinations of the disclosed elements constitute separate embodiments in accordance with the present disclosure. For example, a first combination includes determining a model configured to estimate depth data using a single image (e.g., an SEM image, AFM data, an optical image, etc.). A sub-combination may include determining depth data using the trained model. In another combination, the depth data may be used in an inspection process to determine OPC or SMO based on the model-generated disparity data. In another example, a combination includes determining a process adjustment to a lithographic process, a resist process, or an etching process based on inspection data (the inspection data being based on the depth data) to increase the yield of the patterning process.
Fig. 18 is a block diagram illustrating a computer system 100, which computer system 100 may facilitate implementing the methods, processes, or apparatus disclosed herein. Computer system 100 includes a bus 102 or other communication mechanism for communicating information, and a processor 104 (or multiple processors 104 and 105) coupled with bus 102 for processing information. Computer system 100 also includes a main memory 106, such as a Random Access Memory (RAM) or other dynamic storage device, main memory 106 being coupled to bus 102 for storing information and instructions to be executed by processor 104. Main memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104. Computer system 100 also includes a Read Only Memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104. A storage device 110, such as a magnetic disk or optical disk, is provided and coupled to bus 102 for storing information and instructions.
Computer system 100 may be coupled via bus 102 to a display 112, such as a Cathode Ray Tube (CRT) or flat panel or touch pad display, to display information to a computer user. An input device 114, including alphanumeric and other keys, is coupled to bus 102 for communicating information and command selections to processor 104. Another type of user input device is cursor control 116, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112. The input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), which allows the device to specify positions in a plane. A touch pad (screen) display may also be used as an input device.
According to one embodiment, portions of one or more methods described herein may be performed by computer system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another computer-readable medium, such as storage device 110. Execution of the sequences of instructions contained in main memory 106 causes processor 104 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in main memory 106. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, the description herein is not limited to any specific combination of hardware circuitry and software.
The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to processor 104 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 110. Volatile media includes dynamic memory, such as main memory 106. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 102. Transmission media can also take the form of acoustic or light waves, such as those generated during Radio Frequency (RF) and Infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 100 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector coupled to bus 102 can receive the data carried in the infrared signal and place the data on bus 102. Bus 102 carries the data to main memory 106, and processor 104 retrieves and executes the instructions from main memory 106. The instructions received by main memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.
Computer system 100 may also include a communication interface 118 coupled to bus 102. Communication interface 118 provides a two-way data communication coupling to a network link 120 that is connected to a local network 122. For example, communication interface 118 may be an Integrated Services Digital Network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 118 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 120 typically provides data communication through one or more networks to other data devices. For example, network link 120 may provide a connection through local network 122 to a host computer 124 or to data equipment operated by an Internet Service Provider (ISP) 126. ISP 126 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 128. Local network 122 and internet 128 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 120 and through communication interface 118, which carry the digital data to computer system 100 and from computer system 100, are exemplary forms of carrier waves transporting the information.
Computer system 100 can send messages and receive data, including program code, through the network(s), network link 120 and communication interface 118. In the Internet example, a server 130 might transmit a requested code for an application program through Internet 128, ISP 126, local network 122 and communication interface 118. For example, one such downloaded application may provide all or part of the methods described herein. The received code may be executed by processor 104 as it is received, and/or stored in storage device 110, or other non-volatile storage for later execution. In this manner, computer system 100 may obtain application code in the form of a carrier wave.
FIG. 19 schematically depicts an exemplary lithographic projection apparatus that may be used in connection with the techniques described herein. The apparatus includes:
an illumination system IL for conditioning a radiation beam B. In this particular case, the illumination system further comprises a radiation source SO;
a first stage (e.g. patterning device table) MT provided with a patterning device holder to hold a patterning device MA (e.g. a reticle), and connected to a first positioner to accurately position the patterning device with respect to the article PS;
A second stage (substrate table) WT provided with a substrate holder to hold a substrate W (e.g. a resist coated silicon wafer) and connected to a second positioner to accurately position the substrate with respect to the article PS;
a projection system ("lens") PS (e.g., a refractive, reflective or catadioptric optical system) for imaging an illumination portion of the patterning device MA onto a target portion C (e.g., comprising one or more dies) of the substrate W.
As depicted herein, the apparatus is of a transmissive type (i.e., has a transmissive patterning device). However, in general, it may also be of a reflective type, for example (with a reflective patterning device). The apparatus may employ a different kind of patterning device than a conventional mask; examples include a programmable mirror array or an LCD matrix.
A source SO (e.g. a mercury lamp or an excimer laser, or an LPP (laser produced plasma) EUV source) produces a radiation beam. This beam is fed into an illumination system (illuminator) IL, either directly or after having traversed conditioning means, such as a beam expander Ex. The illuminator IL may comprise an adjuster AD for setting the outer and/or inner radial extent (commonly referred to as σ-outer and σ-inner, respectively) of the intensity distribution in the beam. In addition, it will generally comprise various other components, such as an integrator IN and a condenser CO. In this way, the beam B incident on the patterning device MA has a desired uniformity and intensity distribution in its cross-section.
It should be noted with respect to FIG. 19 that the source SO may be within the housing of the lithographic projection apparatus (as is often the case, for example, when the source SO is a mercury lamp), but it may also be remote from the lithographic projection apparatus, the radiation beam that it produces being led into the apparatus (e.g., with the aid of suitable directing mirrors); this latter scenario is often the case when the source SO is an excimer laser (e.g., a laser based on KrF, ArF or F2).
The beam PB subsequently intercepts the patterning device MA, which is held on a patterning device table MT. After passing through patterning device MA, beam B passes through lens PL, which focuses beam B onto target portion C of substrate W. With the aid of the second positioning device (and interferometric measuring device IF), the substrate table WT can be moved accurately, e.g. so as to position different target portions C in the path of the beam PB. Similarly, the first positioning device may be used to accurately position patterning device MA with respect to the path of beam B, e.g. after mechanical retrieval of patterning device MA from a patterning device library, or during a scan. In general, movement of the object tables MT, WT will be realized with the aid of a long-stroke module (coarse positioning) and a short-stroke module (fine positioning), which are not explicitly depicted in FIG. 19. However, in the case of a stepper (as opposed to a step-and-scan tool) the patterning device table MT may be connected to a short-stroke actuator only, or may be fixed.
The illustrated tool can be used in two different modes:
in step mode, the patterning device table MT is kept essentially stationary, and an entire patterning device image is projected onto a target portion C at once (i.e. a single "flash"). Then, the substrate table WT is shifted in the x and/or y direction so that different target portions C can be irradiated by the beam PB;
in scan mode, essentially the same scenario applies, except that a given target portion C is not exposed in a single "flash". Instead, the patterning device table MT may be moved in a given direction (the so-called "scan direction", e.g., the y direction) with a speed v, so that the projection beam B is caused to scan over the patterning device image; meanwhile, the substrate table WT is simultaneously moved in the same or opposite direction at a speed V = Mv, in which M is the magnification of the lens PL (typically, M = 1/4 or 1/5). In this manner, a relatively large target portion C can be exposed, without having to compromise on resolution.
FIG. 20 schematically depicts another exemplary lithographic projection apparatus LA that may be used in conjunction with the techniques described herein.
The lithographic projection apparatus LA comprises:
source collector module SO
An illumination system (illuminator) IL configured to condition a radiation beam B (e.g. EUV radiation).
A support structure (e.g. a patterning device table) MT constructed to support a patterning device (e.g. a mask or a reticle) MA and connected to a first positioner PM configured to accurately position the patterning device;
a substrate table (e.g., a wafer table) WT configured to hold a substrate (e.g., a resist-coated wafer) W and connected to a second positioner PW configured to accurately position the substrate; and
a projection system (e.g. a reflective projection system) PS configured to project a pattern imparted to the radiation beam B by patterning device MA onto a target portion C (e.g. comprising one or more dies) of the substrate W.
As shown, the device LA is reflective (e.g., employing a reflective patterning device). It should be noted that since most materials are absorptive in the EUV wavelength range, the patterning device may have a multilayer reflector comprising a multi-stack of e.g. molybdenum and silicon. In one example, the multi-stack reflector has 40 layers of molybdenum and silicon pairs, where each layer is one quarter wavelength thick. Smaller wavelengths can be produced using X-ray lithography. Since most materials are absorptive in the EUV and x-ray wavelength ranges, a thin sheet of patterned absorptive material (e.g., a TaN absorber over a multilayer reflector) on the patterning device topology defines where features will print (positive resist) or not print (negative resist).
Referring to fig. 20, the illuminator IL receives an EUV radiation beam from the source collector module SO. Methods for generating EUV radiation include, but are not necessarily limited to, converting a material into a plasma state, the material having at least one element, e.g., xenon, lithium or tin, with one or more emission lines in the EUV range. In one such method, often termed laser produced plasma ("LPP"), the plasma may be produced by irradiating a fuel, such as a droplet, stream or cluster of material having the line-emitting element, with a laser beam. The source collector module SO may be part of an EUV radiation system including a laser (not shown in fig. 20) for providing the laser beam exciting the fuel. The resulting plasma emits output radiation, e.g., EUV radiation, which is collected using a radiation collector disposed in the source collector module. The laser and the source collector module may be separate entities, for example when a CO2 laser is used to provide the laser beam for fuel excitation.
In this case, the laser is not considered part of the lithographic apparatus and the radiation beam is transmitted from the laser to the source collector module with the aid of a beam delivery system comprising, for example, suitable directing mirrors and/or a beam expander. In other cases, the source may be an integral part of a source collector module, for example when the source is a discharge-generated plasma EUV generator, commonly referred to as a DPP source.
The illuminator IL may comprise an adjuster for adjusting the angular intensity distribution of the radiation beam. Typically, at least the outer and/or inner radial extent (commonly referred to as σ -outer and σ -inner, respectively) of the intensity distribution in a pupil plane of the illuminator can be adjusted. In addition, the illuminator IL may comprise various other components, such as a facet field and a pupil mirror device. The illuminator may be used to condition the radiation beam, to have a desired uniformity and intensity distribution in its cross-section.
The radiation beam B is incident on the patterning device (e.g., mask) MA, which is held on the support structure (e.g., patterning device table) MT, and is patterned by the patterning device. After being reflected from the patterning device (e.g., mask) MA, the radiation beam B passes through the projection system PS, which focuses the beam onto a target portion C of the substrate W. With the aid of the second positioner PW and position sensor PS2 (e.g. an interferometric device, linear encoder or capacitive sensor), the substrate table WT can be moved accurately, e.g. so as to position different target portions C in the path of the radiation beam B. Similarly, the first positioner PM and another position sensor PS1 can be used to accurately position the patterning device (e.g. mask) MA with respect to the path of the radiation beam B. Patterning device (e.g., mask) MA and substrate W may be aligned using patterning device alignment marks M1, M2 and substrate alignment marks P1, P2.
The described device LA may be used in at least one of the following modes:
1. in step mode, the support structure (e.g., patterning device table) MT and the substrate table WT are kept essentially stationary, while an entire pattern imparted to the radiation beam is projected onto a target portion C at one time (i.e., a single static exposure). The substrate table WT is then shifted in the X and/or Y direction so that a different target portion C can be exposed.
2. In scan mode, the support structure (e.g., patterning device table) MT and the substrate table WT are scanned synchronously while a pattern imparted to the radiation beam is projected onto a target portion C (i.e., a single dynamic exposure). The velocity and direction of the substrate table WT relative to the support structure (e.g. patterning device table) MT may be determined by the (de) magnification and image reversal characteristics of the projection system PS.
3. In another mode, the support structure (e.g., patterning device table) MT is kept essentially stationary, so as to hold a programmable patterning device, and the substrate table WT is moved or scanned while a pattern imparted to the radiation beam is projected onto a target portion C. In this mode, a pulsed radiation source is typically employed, and the programmable patterning device is updated as required after each movement of the substrate table WT or in between successive radiation pulses during a scan. This mode of operation can be readily applied to maskless lithography that utilizes programmable patterning device, such as a programmable mirror array of a type as referred to above.
Fig. 21 shows in more detail the apparatus LA, comprising the source collector module SO, the illumination system IL and the projection system PS. The source collector module SO is constructed and arranged such that a vacuum environment can be maintained in the enclosure 220 of the source collector module SO. The EUV radiation emitting plasma 210 may be formed by a discharge produced plasma source. EUV radiation may be generated from a gas or vapor, such as Xe gas, Li vapor or Sn vapor, wherein a very hot plasma 210 is generated to emit radiation in the EUV range of the electromagnetic spectrum. The very hot plasma 210 is generated by, for example, an electrical discharge causing an at least partially ionized plasma. For efficient generation of the radiation, a partial pressure of, for example, 10 Pa of Xe, Li, Sn vapor or any other suitable gas or vapor may be required. In one embodiment, a plasma of excited tin (Sn) is provided to generate EUV radiation.
Radiation emitted by the thermal plasma 210 enters the collection chamber 212 from the source chamber 211 via an optional gas barrier or contaminant trap 230 (also referred to as a contaminant barrier or foil trap in some cases) positioned in or after the opening of the source chamber 211. Contaminant trap 230 may include a channel structure. Contaminant trap 230 may also include a gas barrier or a combination of a gas barrier and a channel structure. The contaminant trap or contaminant barrier 230 further indicated herein comprises at least a channel structure known in the art.
The collector chamber 212 may comprise a radiation collector CO, which may be a so-called grazing incidence collector. The radiation collector CO has an upstream radiation collector side 251 and a downstream radiation collector side 252. Radiation passing through the collector CO may be reflected off a grating spectral filter 240 to be focused at a virtual source point IF along the optical axis indicated by the dash-dotted line "O". The virtual source point IF is commonly referred to as the intermediate focus, and the source collector module is arranged such that the intermediate focus IF is located at or near the opening 221 in the enclosure 220. The virtual source point IF is an image of the radiation-emitting plasma 210.
The radiation then passes through an illumination system IL, which may include a facet field lens device 22 and a facet pupil lens device 24, the facet pupil lens device 24 being arranged to provide a desired angular distribution of the radiation beam 21 at the patterning device MA and a desired uniformity of the radiation intensity at the patterning device MA. When the radiation beam 21 is reflected at the patterning device MA, which is held by the support structure MT, a patterning beam 26 is formed, and the patterning beam 26 is imaged by the projection system PS via reflective elements 28, 30 onto a substrate W held by the substrate table WT.
There may generally be more elements in the illumination optical unit IL and the projection system PS than shown. Grating spectral filter 240 may optionally be present, depending on the type of lithographic apparatus. Furthermore, there may be more mirrors than shown in the figures, for example, there may be 1-6 more reflective elements in the projection system PS than shown in fig. 21.
As shown in fig. 21, collector optics CO are depicted as nested collectors with grazing incidence reflectors 253, 254, and 255, just one example of a collector (or collector mirror). The grazing incidence reflectors 253, 254 and 255 are arranged axially symmetrically about the optical axis O, and this type of collector optics CO may be used in combination with a discharge-generated plasma source (commonly referred to as DPP source).
Alternatively, the source collector module SO may be part of an LPP radiation system, as shown in fig. 22. The laser LA is arranged to deposit laser energy into a fuel, such as xenon (Xe), tin (Sn) or lithium (Li), to produce a highly ionized plasma 210 with an electron temperature of tens of eV. The high energy radiation generated during de-excitation and recombination of these ions is emitted from the plasma, collected by near normal incidence collector optics CO, and focused onto an opening 221 in the enclosed structure 220.
Embodiments may be further described using the following clauses:
1. a non-transitory computer-readable medium for determining depth information for a structure formed on a substrate based on a single scanning electron microscope, SEM, image of the structure, the depth information determined using a convolutional neural network, CNN, trained to simulate parallax effects present in paired stereo SEM images, the medium comprising instructions stored therein, which when executed by one or more processors, cause operations comprising:
Receiving a single SEM image of a structure patterned on a substrate via an SEM tool;
inputting the SEM image to the CNN to predict parallax data associated with the SEM image, the CNN being trained by:
acquiring a stereoscopic pair of SEM images of a patterned substrate via the SEM tool, the stereoscopic pair comprising a first SEM image acquired at a first electron beam tilt setting of the SEM tool and a second SEM image acquired at a second electron beam tilt setting of the SEM tool;
generating parallax data between the first SEM image and the second SEM image using the CNN;
combining the parallax data with the second SEM image to generate a reconstructed image of the first SEM image; and
comparing the reconstructed image to the first SEM image; and
generating depth information associated with the structure patterned on the substrate based on the predicted parallax data.
2. A non-transitory computer-readable medium for determining a model configured to generate data for estimating depth information of a structure of a patterned substrate, the medium comprising instructions stored therein, which when executed by one or more processors, cause operations comprising:
Acquiring a pair of images of a structure of a patterned substrate, the pair of images including a first image captured at a first angle relative to the patterned substrate and a second image captured at a second angle different from the first angle;
generating, via a model using the first image as an input, disparity data between the first image and the second image, the disparity data being indicative of depth information associated with the first image;
combining the parallax data with the second image to generate a reconstructed image corresponding to the first image; and
adjusting one or more parameters of the model based on a performance function such that the performance function is within a specified performance threshold, the performance function being a function of the parallax data, the reconstructed image, and the first image, the model being configured to generate data that is convertible to depth information of a structure of a patterned substrate.
3. The medium of clause 2, wherein the parallax data comprises coordinate differences of similar features within the first image and the second image.
4. The medium of clause 3, wherein the reconstructed image is generated by:
performing a synthesis operation between the parallax data and the second image to generate the reconstructed image.
5. The medium of any one of clauses 2-4, wherein the performance function further comprises a loss function calculated based on parallax characteristics associated with a pair of stereoscopic images of a previous patterned substrate or substrates and the parallax data predicted by the model.
6. The medium of clause 5, wherein the parallax characteristic based on the previously patterned substrate comprises parallax expressed as a piecewise smoothing function, wherein derivatives of the parallax are piecewise continuous.
7. The medium of clause 5, wherein the parallax characteristic based on the previously patterned substrate comprises parallax expressed as piecewise constant.
8. The medium of clause 5, wherein the parallax characteristic based on the previously patterned substrate comprises parallax expressed as a function of having jumps at edges of structures within an image, the edges being detected based on gradients of intensity profiles within the image.
9. The medium of any one of the preceding clauses, wherein adjusting the one or more parameters of the model is an iterative process, each iteration comprising:
Determining the performance function based on the parallax data and the reconstructed image;
determining whether the performance function is within the specified performance threshold; and
in response to the performance function not being within the specified performance threshold, adjusting the one or more parameters of the model such that the performance function is within the specified performance threshold, the adjusting being based on a gradient of the performance function with respect to the one or more parameters.
10. The medium of any of the preceding clauses wherein the first image is a normal image associated with an electron beam oriented perpendicular to the patterned substrate and the second image is associated with an electron beam oriented at an angle greater than 90 ° or less than 90 ° relative to the patterned substrate.
11. The medium of clause 2, wherein acquiring the pair of images comprises acquiring multiple pairs of SEM images of the patterned substrate, each pair comprising a first SEM image associated with a first beam tilt setting of a metrology tool and a second SEM image associated with a second beam tilt setting of the metrology tool.
12. The medium of clause 2, wherein determining the model comprises:
acquiring a plurality of images of the structure, each image being acquired at a different angle.
13. The medium of clause 12, wherein the performance function further comprises another loss function calculated as a sum of similarities between images of the plurality of images and corresponding reconstructed images.
14. The medium of any one of the preceding clauses, wherein the operations further comprise:
acquiring, via the metrology tool, a single SEM image of a patterned substrate at the first electron beam tilt setting of the metrology tool.
15. The medium of clause 14, wherein the single SEM image is a normal image acquired by directing an electron beam substantially perpendicular to the patterned substrate.
16. The medium of clause 14, wherein the operations further comprise:
executing the model using the single SEM image as input to generate parallax data associated with the single SEM image; and
applying a conversion function to the parallax data to generate depth information for the structure in the SEM image.
17. The medium of clause 16, wherein the operations further comprise:
determining a physical characteristic of the structure of the patterned substrate based on the depth information, the physical characteristic including a shape, a dimension, or a relative positioning of polygonal shapes with respect to each other at one or more depths of a feature of the structure.
18. A non-transitory computer-readable medium for determining depth information from an image of a patterned substrate, the medium comprising instructions stored therein, which when executed by one or more processors, cause operations comprising:
receiving, via the system, an image of a structure on a patterned substrate; and
executing a model using the image to determine depth information for the structure on the patterned substrate, the model being trained to estimate depth information from only a single image of a structure and being stored on the medium.
19. The medium of clause 18, wherein the operations further comprise:
determining a physical characteristic of the structure of the patterned substrate based on the depth information, the physical characteristic including a shape, a dimension, or a relative positioning of polygonal shapes with respect to each other at one or more depths of a feature of the structure.
20. The medium of clause 19, wherein the operations further comprise:
determining a defect in the structure based on the physical characteristic, the defect indicating that the structure does not meet a design specification; and
adjusting one or more parameters of the patterning process based on the defect to eliminate the defect during subsequent processing of the patterning process.
21. The medium of clause 19, wherein the image is a vertical SEM image or an optical image, and wherein the system is any one of a computer system, SEM tool, or atomic force microscope.
22. The medium of any one of the preceding clauses, further comprising an operation for determining the model, the operation comprising:
acquiring a pair of images of a structure of a patterned substrate, the pair of images including a first image captured at a first angle relative to the patterned substrate and a second image captured at a second angle different from the first angle;
generating, via a model using the first image as an input, disparity data between the first image and the second image, the disparity data being indicative of depth information associated with the first image;
combining the parallax data with the second image to generate a reconstructed image corresponding to the first image; and
adjusting one or more parameters of the model based on a performance function such that the performance function is within a specified performance threshold, the performance function being a function of the parallax data, the reconstructed image, and the first image, the model being configured to generate data that is convertible to depth information of a structure of a patterned substrate.
23. The medium of clause 22, wherein the parallax data comprises coordinate differences of similar features within the first image and the second image.
24. The medium of clause 23, wherein the reconstructed image is generated by:
performing a synthesis operation between the parallax data and the second image to generate the reconstructed image.
25. The medium of any one of clauses 22 to 24, wherein the performance function further comprises a loss function calculated based on parallax characteristics associated with a pair of stereoscopic images of a previous patterned substrate or substrates and the parallax data predicted by the model.
26. The medium of clause 25, wherein the parallax characteristic based on the previously patterned substrate comprises parallax expressed as a piecewise smoothing function, wherein derivatives of the parallax are piecewise continuous.
27. The medium of clause 25, wherein the parallax characteristic based on the previously patterned substrate comprises parallax expressed as a piecewise constant.
28. The medium of clause 25, wherein the parallax characteristic based on the previously patterned substrate comprises parallax expressed as a function of having jumps at edges of structures within an image, the edges being detected based on gradients of intensity profiles within the image.
29. The medium of any one of the preceding clauses, wherein adjusting the one or more parameters of the model is an iterative process, each iteration comprising:
determining the performance function based on the parallax data and the reconstructed image;
determining whether the performance function is within the specified performance threshold; and
in response to the performance function not being within the specified performance threshold, adjusting the one or more parameters of the model such that the performance function is within the specified performance threshold, the adjusting being based on a gradient of the performance function with respect to the one or more parameters.
30. The medium of any of the preceding clauses wherein the first image is a normal image associated with an electron beam oriented perpendicular to the patterned substrate and the second image is associated with an electron beam oriented at an angle greater than 90 ° or less than 90 ° relative to the patterned substrate.
31. A method for determining a model configured to generate data for estimating depth information of a structure of a patterned substrate, the method comprising:
acquiring a pair of images of a structure of a patterned substrate, the pair of images including a first image captured at a first angle relative to the patterned substrate and a second image captured at a second angle different from the first angle;
Generating, via a model using the first image as an input, disparity data between the first image and the second image, the disparity data being indicative of depth information associated with the first image;
combining the parallax data with the second image to generate a reconstructed image corresponding to the first image; and
adjusting one or more parameters of the model based on a performance function such that the performance function is within a specified performance threshold, the performance function being a function of the parallax data, the reconstructed image, and the first image, the model being configured to generate data that is convertible to depth information of a structure of a patterned substrate.
32. The method of clause 31, wherein the parallax data comprises coordinate differences of similar features within the first image and the second image.
33. The method of clause 32, wherein the reconstructed image is generated by:
performing a synthesis operation between the parallax data and the second image to generate the reconstructed image.
34. The method of any of the preceding clauses, wherein the performance function further comprises a loss function calculated based on parallax characteristics associated with a pair of stereoscopic images of a previous patterned substrate or substrates and the parallax data predicted by the model.
35. The method of clause 34, wherein the parallax characteristics include at least one of:
the disparity is a piecewise smoothing function, wherein the derivative of the disparity is piecewise continuous;
parallax is piecewise constant; and
parallax is a function of having jumps at edges of structures within an image, which edges are detected based on gradients of intensity profiles within the image.
36. The method of any of clauses 31-35, wherein adjusting the one or more parameters of the model is an iterative process, each iteration comprising:
determining the performance function based on the parallax data and the reconstructed image;
determining whether the performance function is within the specified performance threshold; and
in response to the performance function not being within the specified performance threshold, adjusting the one or more parameters of the model such that the performance function is within the specified performance threshold, the adjusting being based on a gradient of the performance function with respect to the one or more parameters.
37. The method of any of clauses 31 to 36, wherein the first image is a normal image associated with an electron beam directed perpendicular to the patterned substrate and the second image is associated with an electron beam directed at an angle of greater than 90 ° or less than 90 ° relative to the patterned substrate.
38. The method of clause 31, wherein acquiring the pair of images comprises:
acquiring, via a metrology tool, a plurality of pairs of SEM images of a patterned substrate, each pair including a first SEM image associated with a first beam tilt setting of the metrology tool and a second SEM image associated with a second beam tilt setting of the metrology tool.
39. The method of clause 31, wherein the first SEM image is captured by a first SEM tool and the second SEM image is captured by a second SEM tool.
40. The method of clause 38, wherein the operations further comprise:
acquiring, via the metrology tool, a single SEM image of a patterned substrate at the first electron beam tilt setting of the metrology tool.
41. The method of clause 38, wherein the single SEM image is a normal image acquired by directing an electron beam substantially perpendicular to the patterned substrate.
42. The method of clause 38, wherein the operations further comprise:
executing the model using the single SEM image as input to generate parallax data associated with the SEM image; and
applying a conversion function to the parallax data to generate depth information for the structure in the SEM image.
43. The method of clause 42, wherein the operations further comprise:
determining a physical characteristic of the structure of the patterned substrate based on the depth information, the physical characteristic including a shape, a dimension, or a relative positioning of polygonal shapes with respect to each other at one or more depths of a feature of the structure.
44. A method for determining depth information from SEM images, the method comprising:
receiving, via a system, a vertical SEM image of a structure on a patterned substrate, the vertical SEM image associated with an electron beam directed perpendicular to the patterned substrate; and
executing a model using the vertical SEM image to determine depth information for the structure on the patterned substrate, the model being trained to estimate depth information from only a single SEM image.
45. The method of clause 44, wherein the operations further comprise:
determining a physical characteristic of the structure of the patterned substrate based on the depth information, the physical characteristic including a shape, a dimension, or a relative positioning of polygonal shapes with respect to each other at one or more depths of a feature of the structure.
46. The method of clause 45, wherein the operations further comprise:
Determining a defect in the structure based on the physical characteristic, the defect indicating that the structure does not meet a design specification; and
adjusting one or more parameters of the patterning process based on the defect to eliminate the defect during subsequent processing of the patterning process.
47. A non-transitory computer-readable medium for semi-supervised training of a depth estimation model for determining depth data of a structure formed on a substrate from a single scanning electron microscope, SEM, image of the structure, the medium comprising instructions stored therein, which when executed by one or more processors, cause operations comprising:
receiving a single SEM image of a structure patterned on a substrate; and
predicting depth data associated with the SEM image via a convolutional neural network CNN using the SEM image as input, wherein the CNN:
trained to predict depth data of an input SEM image based on a plurality of simulated SEM images marked with simulated depth data; and
calibrated to adjust the predicted depth data to correspond to observed depth data of SEM images of the structures patterned on the substrate as determined based on data from a metrology tool.
48. The medium of clause 47, wherein the calibrating of the CNN comprises:
receiving an SEM image and observed depth data associated with the SEM image of the structure patterned on the substrate;
inputting the SEM image to the model to predict depth data;
adjusting the predicted depth data by comparing the predicted depth data with the observed depth data; and
tuning the model based on the adjusted predicted depth data such that the predicted depth data is within a specified matching threshold of the observed depth data.
49. A non-transitory computer-readable medium for generating a model configured to estimate depth data of a structure of a patterned substrate, the medium comprising instructions stored therein, which when executed by one or more processors, cause operations comprising:
obtaining, via a simulator, a plurality of simulated metrology images of a structure, each simulated metrology image of the plurality of simulated metrology images being associated with depth data used by the simulator;
generating a model configured to predict depth data from an input image based on the plurality of simulated metrology images and corresponding simulated depth data;
Acquiring an image of the structure patterned on the substrate and observed depth data; and
the model is calibrated based on the image and the observed depth data such that the predicted depth data is within a specified matching threshold of the observed depth data.
50. The medium of clause 49, wherein the image is acquired via an image capture tool and the observed depth data is acquired from a metrology tool configured to provide depth information of the structures on the patterned substrate.
51. The medium of clause 50, wherein the image capture tool is an SEM tool and the image is an SEM image.
52. The medium of clause 50, wherein the observed depth data is obtained from the metrology tool, the metrology tool being one or more of an optical metrology tool, an SEM tool, or an atomic force microscope AFM, the optical metrology tool configured to measure a structure of the patterned substrate and extract depth information based on diffraction-based measurements of the patterned substrate.
53. The medium of any one of clauses 49-52, wherein the simulator is a Monte Carlo simulator configured to generate the simulated metrology image of the structure by changing depth-dependent parameters defined in the Monte Carlo simulator.
54. The medium of any one of clauses 49 to 53, wherein generating the model is an iterative process, each iteration comprising:
inputting the plurality of simulated metrology images to the model to predict depth data associated with each of the plurality of simulated metrology images;
comparing the predicted depth data with the simulator depth data; and
model parameters of the model are adjusted such that the predicted depth data is within specified matching thresholds of the simulator depth data.
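A minimal sketch of the iterative fitting described in clauses 53 and 54, assuming the simulated metrology images and the simulator depth maps are available as tensors; the optimizer, learning rate, and stopping threshold are illustrative assumptions and not values from the disclosure.

```python
# Hypothetical training loop for clauses 53-54: fit the model to simulated
# SEM/metrology images labeled with the simulator's depth maps.
import torch
import torch.nn.functional as F

def train_on_simulated(model, sim_images, sim_depths, lr=1e-4, threshold=1e-3, max_iter=1000):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_iter):
        pred = model(sim_images)               # predict depth for each simulated image
        loss = F.mse_loss(pred, sim_depths)    # compare with the simulator depth data
        if loss.item() < threshold:            # "within the specified matching threshold"
            break
        opt.zero_grad()
        loss.backward()
        opt.step()                             # adjust model parameters
    return model
```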
55. The medium of any one of clauses 49 to 54, wherein calibrating the model is an iterative process, each iteration comprising:
inputting the image to the model to predict depth data;
adjusting the predicted depth data by comparing the predicted depth data with the observed depth data; and
model parameters of the model are adjusted based on the adjusted predicted depth data such that the model generates depth data that is within a matching threshold of observed depth data.
56. The medium of clause 55, wherein the adjusting the predicted depth data comprises:
Extracting a one-dimensional height profile of the structure along a given direction from the predicted depth data;
comparing the predicted height profile with a one-dimensional height profile of the observed depth data of the structure along the given direction; and
the predicted height profile is modified to match the height profile of the observed depth data of the structure.
57. The medium of clause 55, wherein the adjusting the predicted depth data comprises:
extracting predicted shape parameters of the structure from the predicted depth data and extracting true shape parameters from the observed depth data;
comparing the predicted shape parameter with the real shape parameter of the structure; and
modifying the predicted shape parameters to match the real shape parameters.
58. The medium of clause 55, wherein the adjusting the predicted depth data comprises:
deriving a predicted average height of the structure from the predicted depth data of the structure, and deriving a true average height of the structure from the observed depth data;
scaling the predicted average height to match the true average height.
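A minimal sketch of the clause 58 adjustment, assuming the predicted and observed depth data are aligned two-dimensional height maps and that a mask marking the structure is available; the clause 56 cut-line comparison can reuse the same arrays row by row. The function and variable names are hypothetical.

```python
# Hypothetical illustration of clause 58: scale the predicted depth map so that
# the predicted average height of the structure matches the true average height
# obtained from the observed data (e.g. from an AFM measurement).
import numpy as np

def scale_to_observed_height(pred_depth, observed_depth, mask):
    """mask is a boolean array selecting pixels belonging to the structure."""
    pred_avg = pred_depth[mask].mean()          # predicted average height
    true_avg = observed_depth[mask].mean()      # true average height from observed data
    return pred_depth * (true_avg / pred_avg)   # scaled prediction

# A 1D cut-line profile (clause 56) can be compared in the same way, e.g.:
# profile_pred = pred_depth[row_index, :]; profile_obs = observed_depth[row_index, :]
```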
59. The medium of any one of clauses 49 to 58, wherein the depth data comprises at least one of:
one-dimensional height data of the structure tracked from the image or simulated image of the structure on the patterned substrate;
two-dimensional height data of the structure tracked from the image or the simulated image of the structure on the patterned substrate; or alternatively
A shape parameter obtained from the metrology tool used to measure a structure of the patterned substrate.
60. The medium of any one of clauses 49 to 59, wherein the one-dimensional height data comprises a height of the structure along a cut line of the image.
61. The medium of any one of clauses 49 to 60, wherein the two-dimensional height data comprises a height of the structure on the image along a first direction and a second direction.
62. The medium of any one of clauses 49 to 61, wherein the shape parameter comprises one or more of a top CD measured at a top of the structure, a bottom CD measured at a bottom of the structure, or a sidewall angle of the structure.
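For illustration only, the sketch below shows one way the clause 62 shape parameters could be read off a one-dimensional height profile of a single raised line feature; the 90 %/10 % height levels and the pixel size are assumptions of this example, not values from the disclosure.

```python
# Hypothetical extraction of top CD, bottom CD, and sidewall angle from a 1D
# height profile (in nm) of a raised line feature.
import numpy as np

def shape_parameters(profile_nm, pixel_nm, top_frac=0.9, bot_frac=0.1):
    h = profile_nm.max() - profile_nm.min()
    top_level = profile_nm.min() + top_frac * h
    bot_level = profile_nm.min() + bot_frac * h
    top_cd = np.count_nonzero(profile_nm >= top_level) * pixel_nm   # width near the top
    bot_cd = np.count_nonzero(profile_nm >= bot_level) * pixel_nm   # width near the bottom
    # Sidewall angle from the vertical rise between the two levels and the
    # average lateral offset per side.
    lateral = max((bot_cd - top_cd) / 2.0, 1e-6)
    swa_deg = np.degrees(np.arctan((top_frac - bot_frac) * h / lateral))
    return top_cd, bot_cd, swa_deg
```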
63. The medium of clause 62, wherein the depth data comprises physical properties of the structures of the patterned substrate, the physical properties comprising a shape, a size, a sidewall angle, or a relative positioning of polygonal shapes with respect to each other at one or more depths of features of the structures.
64. A non-transitory computer-readable medium for determining depth data of a structure formed on a substrate, the medium comprising instructions stored therein, which when executed by one or more processors, cause operations comprising:
receiving an image of a structure patterned on a substrate via an image capture tool; and
depth data associated with the image is determined via a model using the image as input, wherein the model is trained to predict depth data based on a simulated image and corresponding simulated depth data, and calibrated using image data and observed depth data, the calibrating comprising scaling the predicted depth data to correspond to the observed depth data.
65. The medium of clause 64, wherein the operations further comprise:
a physical characteristic of the structure of the patterned substrate is determined based on the depth data, the physical characteristic including a shape, a dimension, or a relative positioning of polygonal shapes with respect to each other at one or more depths of a feature of the structure.
66. The medium of clause 65, wherein the operations further comprise:
determining a defect in the structure based on the physical characteristic, the defect indicating that the structure does not meet a design specification; and
One or more parameters of the patterning process are adjusted based on the defects to eliminate the defects during subsequent processing of the patterning process.
67. The medium of clause 64, wherein the image capture tool is an SEM tool and the image is an SEM image.
68. A method for generating a model configured to estimate depth data of a structure of a patterned substrate, the method comprising:
obtaining, via a simulator, a plurality of simulated metrology images of a structure, each simulated metrology image of the plurality of simulated metrology images being associated with depth data used by the simulator;
generating a model configured to predict depth data from the input image based on the plurality of simulated metrology images and the corresponding simulated depth data;
acquiring an image of the structure patterned on the substrate and observed depth data; and
the model is calibrated based on the image and the observed depth data such that the predicted depth data is within a specified matching threshold of the observed depth data.
69. The method of clause 68, wherein the image is acquired via an image capture tool, and the observed depth data is acquired from a metrology tool configured to provide depth information of the structures on the patterned substrate.
70. The method of clause 68, wherein the image capture tool is an SEM tool and the image is an SEM image.
71. The method of clause 68, wherein the observed depth data is obtained from the metrology tool, the metrology tool being one or more of an optical metrology tool, an SEM tool, or an atomic force microscope AFM, the optical metrology tool being configured to measure the structure of the patterned substrate and extract depth information based on diffraction-based measurements of the patterned substrate.
72. The method of any one of clauses 68 to 71, wherein the simulator is a Monte Carlo simulator configured to generate the simulated metrology image of the structure by varying depth-dependent parameters defined in the Monte Carlo simulator.
73. The method of any of clauses 68 to 72, wherein generating the model is an iterative process, each iteration comprising:
inputting the plurality of simulated metrology images to the model to predict depth data associated with each of the plurality of simulated metrology images;
comparing the predicted depth data with the simulator depth data; and
Model parameters of the model are adjusted such that the predicted depth data is within specified matching thresholds of the simulator depth data.
74. The method of any one of clauses 68 to 73, wherein calibrating the model is an iterative process, each iteration comprising:
inputting the image to the model to predict depth data;
adjusting the predicted depth data by comparing the predicted depth data with the observed depth data; and
model parameters of the model are adjusted based on the adjusted predicted depth data such that the model generates depth data that is within a matching threshold of observed depth data.
75. The method of clause 74, wherein the adjusting the predicted depth data comprises:
extracting a one-dimensional height profile of the structure along a given direction from the predicted depth data;
comparing the predicted height profile with a one-dimensional height profile of the observed depth data of the structure along the direction; and
the predicted height profile is modified to match the height profile of the observed depth data of the structure.
76. The method of clause 74, wherein the adjusting the predicted depth data comprises:
extracting predicted shape parameters of the structure from the predicted depth data and extracting true shape parameters from the observed depth data;
comparing the predicted shape parameter with the real shape parameter of the structure; and
modifying the predicted shape parameters to match the real shape parameters.
77. The method of clause 74, wherein the adjusting the predicted depth data comprises:
deriving a predicted average height of the structure from the predicted depth data of the structure, and deriving a true average height of the structure from the observed depth data; and
scaling the predicted average height to match the true average height.
78. The method of any one of clauses 68 to 77, wherein the depth data comprises at least one of:
one-dimensional height data of the structure tracked from the image or simulated image of the structure on the patterned substrate;
two-dimensional height data of the structure tracked from the image or the simulated image of the structure on the patterned substrate; or alternatively
A shape parameter obtained from the metrology tool used to measure a structure of the patterned substrate.
79. The method of any of clauses 68 to 78, wherein the one-dimensional height data comprises a height of the structure along a cut line of the image.
80. The method of any of clauses 68 to 79, wherein the two-dimensional height data comprises a height of the structure on the image along a first direction and a second direction.
81. The method of any one of clauses 68 to 80, wherein the shape parameter comprises one or more of a top CD measured at a top of the structure, a bottom CD measured at a bottom of the structure, or a sidewall angle of the structure.
82. The method of clause 81, wherein the depth data comprises physical properties of the structures of the patterned substrate, the physical properties comprising a shape, a size, a sidewall angle, or a relative positioning of polygonal shapes with respect to each other at one or more depths of features of the structures.
83. A non-transitory computer-readable medium for estimating depth data of a structure formed on a substrate using a convolutional neural network CNN, the CNN configured to estimate depth data based on a scanning electron microscope SEM image of the structure, the medium comprising instructions stored therein that when executed by one or more processors cause operations comprising:
Receiving a single SEM image of a structure on a patterned substrate; and
depth data associated with the structure of the patterned substrate is estimated via the CNN using the SEM images as input, wherein the CNN is trained using training data comprising a plurality of SEM images of structures and a plurality of simulated contours of the structures, the simulated contours generated by a process model associated with patterning.
84. The medium of clause 83, wherein the CNN is a combination of a generator model and a discriminator model trained together, the training being an iterative process, each iteration comprising:
acquiring the plurality of SEM images and the plurality of simulated contours, each SEM image paired with a simulated contour;
inputting SEM images of the plurality of SEM images to the generator model;
estimating the depth data of the input SEM image using the generator model;
classifying, via a discriminator model using a simulated contour paired with the input SEM image as a reference, the estimated depth data into a first class indicating that the estimated depth data corresponds to the reference or a second class indicating that the estimated depth data does not correspond to the reference; and
model parameters of both the generator model and the discriminator model are updated such that the generator model estimates depth data that the discriminator model classifies into the first class.
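For illustration of the adversarial training recited in clause 84, the sketch below shows a single hypothetical update step for a generator that estimates a depth map from an SEM image and a discriminator that judges the estimate against the paired simulated contour (assumed here to be rasterized as a tensor). PyTorch, the binary cross-entropy losses, and a sigmoid-output discriminator are assumptions for this example, not details taken from the disclosure.

```python
# Hypothetical single training iteration for the generator/discriminator setup
# of clause 84.
import torch
import torch.nn.functional as F

def adversarial_step(generator, discriminator, opt_g, opt_d, sem_image, sim_contour):
    # Discriminator update: reference (simulated contour) -> first class (label 1),
    # generated depth -> second class (label 0).
    with torch.no_grad():
        fake_depth = generator(sem_image)
    real_score = discriminator(sim_contour)     # discriminator assumed to output probabilities
    fake_score = discriminator(fake_depth)
    d_loss = F.binary_cross_entropy(real_score, torch.ones_like(real_score)) + \
             F.binary_cross_entropy(fake_score, torch.zeros_like(fake_score))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: estimate depth that the discriminator places in the first class.
    fake_depth = generator(sem_image)
    fake_score = discriminator(fake_depth)
    g_loss = F.binary_cross_entropy(fake_score, torch.ones_like(fake_score))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```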
85. The medium of clause 83, wherein the CNN is a combination of a generator model and a discriminator model that are trained together, the training being an iterative process, each iteration comprising:
inputting SEM images of the plurality of SEM images to the generator model;
estimating the depth data of the input SEM image using the generator model;
extracting key performance indicators, KPIs, associated with the structure based on intensity values associated with the structure in the estimated depth data;
classifying, via the discriminator model using (i) the extracted KPIs and (ii) the plurality of simulated contours and simulated KPIs of the plurality of simulated contours as references, the estimated depth data into a first class indicating that the estimated depth data corresponds to the references or a second class indicating that the estimated depth data does not correspond to the references; and
model parameters of both the generator model and the discriminator model are updated such that the generator model estimates depth data that the discriminator model classifies into the first class.
86. The medium of clause 85, wherein the CNN is further trained in conjunction with a second generator model configured to generate SEM images from the estimated depth data, the training being an iterative process, each iteration comprising:
inputting SEM images of the plurality of SEM images to the CNN;
estimating the depth data of the input SEM image using the CNN;
generating a predicted SEM image via the second generator model using the estimated depth data as input; and
model parameters of both the CNN and the second generator model are updated such that a difference between the input SEM image and the predicted SEM image is within a specified difference threshold.
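As an illustration of the reconstruction constraint in clause 86, the sketch below adds a second generator that maps the estimated depth back to an SEM image; the difference to the input SEM image is the quantity driven below the specified difference threshold. The L1 choice and the PyTorch API are assumptions for this example.

```python
# Hypothetical reconstruction term for clause 86.
import torch.nn.functional as F

def reconstruction_loss(depth_cnn, sem_generator, sem_image):
    est_depth = depth_cnn(sem_image)        # SEM image -> estimated depth data
    sem_pred = sem_generator(est_depth)     # estimated depth -> predicted SEM image
    return F.l1_loss(sem_pred, sem_image)   # difference vs. the input SEM image

# During training this loss is minimized together with the adversarial terms
# until it falls within the "specified difference threshold" of clause 86.
```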
87. The medium of clause 83, wherein the process model is a calibrated deterministic process model calibrated such that a critical dimension CD of a simulated contour of the structure is within a CD threshold of observed data associated with the patterned substrate, but does not necessarily meet a stochastic variation specification associated with the observed data.
88. The medium of clause 83, wherein the process model is a calibrated stochastic process model calibrated such that key performance indicators KPIs extracted from the depth data generated by the model are within specified thresholds of observed KPIs extracted from inspection data associated with the patterned substrate.
89. The medium of clause 88, wherein the inspection data is obtained from a plurality of features of the patterned substrate, the plurality of features being formed using a range of dose and focus conditions.
90. The medium of clause 88, wherein the KPIs include one or more of the following: the CD of the structure, the local CD uniformity LCDU associated with the structure, the line edge roughness LER associated with the structure, the line width roughness LWR associated with a line space pattern, the contact edge roughness associated with a contact hole, the stochastic edge placement error SEPE, and the defect rate associated with the structure.
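The following sketch illustrates, purely as an example, how two of the clause 90 KPIs could be computed once the left and right edge positions of a line feature have been detected; edge detection itself is assumed to be done elsewhere (for example by thresholding each image row), and the 3-sigma definitions are common conventions adopted here as assumptions.

```python
# Hypothetical computation of mean CD, LCDU, and LER from detected edge
# positions (in nm) of a single line feature.
import numpy as np

def line_kpis(left_edges_nm, right_edges_nm):
    cd_per_row = right_edges_nm - left_edges_nm   # local CD for every image row
    mean_cd = cd_per_row.mean()
    lcdu = 3.0 * cd_per_row.std()                 # local CD uniformity as 3 sigma
    ler_left = 3.0 * left_edges_nm.std()          # line edge roughness of the left edge
    return mean_cd, lcdu, ler_left
```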
91. A non-transitory computer-readable medium for generating a model configured to estimate depth data of a structure of a patterned substrate, the medium comprising instructions stored therein, which when executed by one or more processors, cause operations comprising:
acquiring (i) a plurality of SEM images of a structure associated with programming changes in a mask pattern and (ii) a simulated contour of the structure based on the programming changes, each SEM image of the plurality of SEM images paired with a simulated contour corresponding to a programming change in the mask pattern; and
A model for estimating depth data of a structure is generated based on the plurality of SEM images paired with the corresponding simulated contour such that the estimated depth data is within acceptable thresholds of depth data associated with the simulated contour.
92. The medium of clause 91, wherein the programming changes include changes in one or more of the following: a variation in assist features associated with the mask pattern, a variation in primary features associated with the mask pattern, or a variation in resist coating thickness.
93. The medium of clause 92, wherein the assist feature variation comprises a variation in a dimension of an assist feature of the mask pattern, a variation in a distance of the assist feature from a main feature of the mask pattern, or both.
94. The medium of clause 91, wherein the simulated contour of the structure is generated by a calibrated deterministic process model associated with patterning processes that use the programming changes in the mask pattern.
95. The medium of clause 94, wherein the calibrated deterministic process model is a process model calibrated to meet a critical dimension CD of a structure but not to meet a local CD uniformity LCDU associated with the structure, a line edge roughness LER associated with the structure, a line width roughness LWR associated with the structure, or stochastic variations associated with the structure.
96. The medium of clause 91, wherein the depth data generated comprises at least one of: height data of the structure, shape parameters of the structure, or a voxel map of the overhanging structure.
97. The medium of clause 96, wherein the shape parameters comprise one or more of a top CD measured at a top of the structure, a bottom CD measured at a bottom of the structure, or a sidewall angle of the structure.
98. The medium of clause 91, wherein the generating of the model is based on a generative adversarial network or an encoder-decoder network with a specific objective function.
99. The medium of clause 98, wherein the generating of the model comprises: training a first model in conjunction with a second model such that the first model generates depth data using SEM images of the plurality of SEM images as input, and the second model classifies the generated depth data into a first class or a second class based on the paired simulated contours.
100. The medium of clause 99, wherein the generating of the model comprises:
inputting an SEM image of the plurality of SEM images to the first model;
estimating the depth data of the input SEM image using the first model;
classifying, via the second model using the simulated contour as a reference, the estimated depth data into a first class indicating that the estimated depth data corresponds to the reference or a second class indicating that the estimated depth data does not correspond to the reference; and
model parameters of both the first model and the second model are updated such that the first model estimates depth data that the second model classifies into the first class.
101. The medium of clause 100, wherein the first model and the second model are convolutional neural networks CNN or deep CNNs DCNN.
102. The medium of clause 101, wherein the model parameters are weights and biases of one or more layers of the CNN or DCNN.
103. The medium of clause 91, wherein the operations further comprise:
receiving a single SEM image of a structure patterned on a substrate via an SEM tool;
depth data associated with the SEM image is determined via the model using the received SEM image as input.
104. The medium of clause 103, wherein the operations further comprise:
a physical characteristic of the structure of the patterned substrate is determined based on the depth data, the physical characteristic including a shape, a dimension, or a relative positioning of polygonal shapes with respect to each other at one or more depths of a feature of the structure.
105. The medium of clause 104, wherein the operations further comprise:
determining a defect in the structure based on the physical characteristic, the defect indicating that the structure does not meet a design specification; and
one or more parameters of the patterning process are adjusted based on the defects to eliminate the defects during subsequent processing of the patterning process.
106. A non-transitory computer-readable medium for generating a model configured to estimate depth data of a structure of a patterned substrate, the medium comprising instructions stored therein, which when executed by one or more processors, cause operations comprising:
acquiring (i) a plurality of SEM images of a structure, (ii) a simulated contour of the structure, and (iii) a Key Performance Indicator (KPI) associated with the simulated contour; and
a model for estimating depth data of a structure is generated based on the plurality of SEM images, the simulated contour, and the KPI such that a KPI associated with the estimated depth data is within an acceptable threshold of the KPI associated with the simulated contour.
107. The medium of clause 106, wherein the simulated contour of the structure is generated by a calibrated stochastic process model associated with patterning.
108. The medium of clause 107, wherein the calibrated stochastic process model is a process model calibrated to meet one or more KPI specifications, the KPI being one or more of the following: critical dimension CD of a structure, local CD uniformity LCDU associated with the structure, line edge roughness LER associated with the structure, defect rate associated with the structure, line width roughness LWR associated with a line space pattern, contact edge roughness associated with a contact hole, stochastic edge placement error SEPE, or stochastic variation associated with the geometry of the structure.
109. The medium of clause 108, wherein the depth data generated comprises at least one of: height data of the structure, shape parameters of the structure, or a voxel map associated with the overhang structure.
110. The medium of clause 109, wherein the shape parameters comprise one or more of a top CD measured at a top of the structure, a bottom CD measured at a bottom of the structure, or a sidewall angle of the structure.
111. The medium of clause 108, wherein the generating of the model is based on a generative adversarial network or an encoder-decoder network.
112. The medium of clause 111, wherein the generating of the model comprises: training a first model in conjunction with a second model such that the first model generates depth data using SEM images of the plurality of images as input, and classifying the generated depth data into a first class or a second class based on the KPIs associated with the simulated contours and the generated depth data using the second model.
113. The medium of clause 112, wherein the generating of the model is an iterative process, each iteration comprising:
inputting SEM images of the plurality of SEM images to the first model;
estimating the depth data of the input SEM image using the first model;
extracting KPIs from the estimated depth data;
classifying, via the second model using (i) the extracted KPIs and (ii) the plurality of simulated contours and simulated KPIs of the plurality of simulated contours as references, the estimated depth data into a first class indicating that the estimated depth data corresponds to the references or a second class indicating that the estimated depth data does not correspond to the references; and
model parameters of both the first model and the second model are updated such that the first model estimates depth data that the second model classifies into the first class.
114. The medium of clause 113, wherein the first model is further trained in conjunction with a third model, a generator model configured to generate SEM images from the estimated depth data, the training being an iterative process, each iteration comprising:
inputting SEM images of the plurality of SEM images to the first model;
estimating the depth data of the input SEM image using the first model;
generating a predicted SEM image via the third model using the estimated depth data as input; and
model parameters of both the first model and the third model are updated such that a difference between the input SEM image and the predicted SEM image is within a specified difference threshold.
115. The medium of clause 114, wherein the first model, the second model, and the third model are convolutional neural networks CNN or deep CNNs DCNN.
116. The medium of clause 115, wherein the model parameters are weights and biases of one or more layers of the CNN or DCNN.
117. The medium of clause 108, wherein the generating of the model is based on a generative adversarial network or an encoder-decoder network.
118. The medium of clause 106, wherein the operations further comprise:
receiving a single SEM image of a structure patterned on a substrate via an SEM tool;
depth data associated with the SEM image is determined via the model using the received SEM image as input.
119. The medium of clause 118, wherein the operations further comprise:
a physical characteristic of the structure of the patterned substrate is determined based on the depth data, the physical characteristic including a shape, a dimension, or a relative positioning of polygonal shapes with respect to each other at one or more depths of a feature of the structure.
120. The medium of clause 119, wherein the operations further comprise:
determining a defect in the structure based on the physical characteristic, the defect indicating that the structure does not meet a design specification; and
one or more parameters of the patterning process are adjusted based on the defects to eliminate the defects during subsequent processing of the patterning process.
121. A method for generating a model configured to estimate depth data of a structure of a patterned substrate, the method comprising:
Acquiring (i) a plurality of SEM images of a structure associated with programming changes in a mask pattern and (ii) a simulated contour of the structure based on the programming changes, each SEM image of the plurality of SEM images paired with a simulated contour corresponding to a programming change in the mask pattern; and
a model for estimating depth data of a structure is generated based on the plurality of SEM images paired with the corresponding simulated contour such that the estimated depth data is within acceptable thresholds of depth data associated with the simulated contour.
122. The method of clause 121, wherein the programming changes include changes in one or more of the following: a variation in assist features associated with the mask pattern, a variation in primary features associated with the mask pattern, or a variation in resist coating thickness.
123. The method of clause 122, wherein the assist feature variation comprises a variation in a dimension of an assist feature of the mask pattern, a variation in a distance of the assist feature to a main feature of the mask pattern, or both.
124. The method of clause 121, wherein the simulated contour of the structure is generated by a calibrated deterministic process model associated with patterning processes that use the programming changes in the mask pattern.
125. The method of clause 124, wherein the calibrated deterministic process model is a process model calibrated to meet a critical dimension CD of a structure but not to meet a local CD uniformity LCDU associated with the structure, a line edge roughness LER associated with the structure, a line width roughness LWR associated with the structure, or stochastic variations associated with the structure.
126. The method of clause 121, wherein the depth data generated comprises at least one of: height data of the structure, shape parameters of the structure, or a voxel map of the overhanging structure.
127. The method of clause 126, wherein the shape parameters comprise one or more of a top CD measured at a top of the structure, a bottom CD measured at a bottom of the structure, or a sidewall angle of the structure.
128. The method of clause 121, wherein the generating of the model is based on a generative adversarial network or an encoder-decoder network with a specific objective function.
129. The method of clause 128, wherein the generating of the model comprises: training a first model in conjunction with a second model such that the first model generates depth data using SEM images of the plurality of SEM images as input, and the second model classifies the generated depth data into a first class or a second class based on the paired simulated contours.
130. The method of clause 129, wherein the generating of the model comprises:
inputting an SEM image of the plurality of SEM images to the first model;
estimating the depth data of the input SEM image using the first model;
classifying, via the second model using the simulated contour as a reference, the estimated depth data into a first class indicating that the estimated depth data corresponds to the reference or a second class indicating that the estimated depth data does not correspond to the reference; and
model parameters of both the first model and the second model are updated such that the first model estimates depth data that the second model classifies into the first class.
131. The method of clause 130, wherein the first model and the second model are convolutional neural networks CNN or deep CNNs DCNN.
132. The method of clause 131, wherein the model parameters are weights and biases of one or more layers of the CNN or DCNN.
133. The method of clause 121, wherein the method further comprises:
receiving a single SEM image of a structure patterned on a substrate via an SEM tool;
Depth data associated with the SEM image is determined via the model using the received SEM image as input.
134. The method of clause 133, wherein the method further comprises:
a physical characteristic of the structure of the patterned substrate is determined based on the depth data, the physical characteristic including a shape, a dimension, or a relative positioning of polygonal shapes with respect to each other at one or more depths of a feature of the structure.
135. The method of clause 134, wherein the method further comprises:
determining a defect in the structure based on the physical characteristic, the defect indicating that the structure does not meet a design specification; and
one or more parameters of the patterning process are adjusted based on the defects to eliminate the defects during subsequent processing of the patterning process.
136. A method for generating a model configured to estimate depth data of a structure of a patterned substrate, the method comprising:
acquiring (i) a plurality of SEM images of a structure, (ii) a simulated contour of the structure, and (iii) a key performance indicator KPI associated with the simulated contour; and
a model for estimating depth data of a structure is generated based on the plurality of SEM images, the simulated contour, and the KPI such that a KPI associated with the estimated depth data is within an acceptable threshold of the KPI associated with the simulated contour.
137. The method of clause 136, wherein the simulated contour of the structure is generated by a calibrated stochastic process model associated with patterning.
138. The method of clause 137, wherein the calibrated stochastic process model is a process model calibrated to satisfy one or more KPIs comprising: critical dimension CD of a structure, local CD uniformity LCDU associated with the structure, line edge roughness LER associated with the structure, defect rate associated with the structure, line width roughness LWR associated with a line space pattern, contact edge roughness associated with a contact hole, stochastic edge placement error SEPE, or stochastic variation associated with the geometry of the structure.
139. The method of clause 138, wherein the depth data generated comprises at least one of: height data of the structure, shape parameters of the structure, or a voxel map associated with the overhang structure.
140. The method of clause 139, wherein the shape parameters comprise one or more of a top CD measured at a top of the structure, a bottom CD measured at a bottom of the structure, or a sidewall angle of the structure.
141. The method of clause 138, wherein the generating of the model is based on a generative adversarial network or an encoder-decoder network.
142. The method of clause 141, wherein the generating of the model comprises: training a first model in conjunction with a second model such that the first model generates depth data using SEM images of the plurality of images as input, and classifying the generated depth data into a first class or a second class based on the KPIs associated with the simulated contours and the generated depth data using the second model.
143. The method of clause 142, wherein the generating of the model is an iterative process, each iteration comprising:
inputting SEM images of the plurality of SEM images to the first model;
estimating the depth data of the input SEM image using the first model;
extracting KPIs from the estimated depth data;
classifying, via the second model using (i) the extracted KPIs and (ii) the plurality of simulated contours and simulated KPIs of the plurality of simulated contours as references, the estimated depth data into a first class indicating that the estimated depth data corresponds to the references or a second class indicating that the estimated depth data does not correspond to the references; and
model parameters of both the first model and the second model are updated such that the first model estimates depth data that the second model classifies into the first class.
144. The method of clause 143, wherein the first model is further trained in conjunction with a third model, a generator model configured to generate SEM images from the estimated depth data, the training being an iterative process, each iteration comprising:
inputting SEM images of the plurality of SEM images to the first model;
estimating the depth data of the input SEM image using the first model;
generating a predicted SEM image via the third model using the estimated depth data as input; and
model parameters of both the first model and the third model are updated so that a difference between the input SEM image and the predicted SEM image is within a specified difference threshold.
145. The method of clause 144, wherein the first model, the second model, and the third model are convolutional neural networks CNN or deep CNNs DCNN.
146. The method of clause 145, wherein the model parameters are weights and biases of one or more layers of the CNN or DCNN.
147. The method of clause 138, wherein the generating of the model is based on a generative adversarial network or an encoder-decoder network.
148. The method of clause 136, wherein the method further comprises:
receiving a single SEM image of a structure patterned on a substrate via an SEM tool;
depth data associated with the SEM image is determined via the model using the received SEM image as input.
149. The method of clause 148, wherein the method further comprises:
a physical characteristic of the structure of the patterned substrate is determined based on the depth data, the physical characteristic including a shape, a dimension, or a relative positioning of polygonal shapes with respect to each other at one or more depths of a feature of the structure.
150. The method of clause 149, wherein the method further comprises:
determining a defect in the structure based on the physical characteristic, the defect indicating that the structure does not meet a design specification; and
one or more parameters of the patterning process are adjusted based on the defects to eliminate the defects during subsequent processing of the patterning process.
151. A system for determining depth information from an image of a patterned substrate, the system comprising:
Electron beam optics configured to capture an image of the patterned substrate; and
one or more processors configured to:
inputting the image of the patterned substrate to a trained model configured to generate depth related data from a single image; and
depth information is extracted from the image by executing the trained model.
The concepts disclosed herein may be used to simulate or mathematically model any generic imaging system for imaging sub-wavelength features, and may be particularly useful with emerging imaging technologies capable of producing increasingly shorter wavelengths. Emerging technologies already in use include DUV (deep ultraviolet) lithography, which can produce a 193 nm wavelength using an ArF laser and even a 157 nm wavelength using a fluorine laser, and EUV (extreme ultraviolet) lithography. EUV lithography can produce wavelengths in the range of 20 nm to 5 nm by using a synchrotron or by striking a material (solid or plasma) with high-energy electrons in order to produce photons in this range.
While the concepts disclosed herein may be used for imaging on a substrate such as a silicon wafer, it should be understood that the disclosed concepts may be used with any type of lithographic imaging system, for example, for imaging on substrates other than silicon wafers.
As used herein, unless specifically stated otherwise, the term "or" encompasses all possible combinations, except where infeasible. For example, if it is stated that a database may include A or B, then, unless specifically stated otherwise or infeasible, the database may include A, or B, or A and B. As a second example, if it is stated that a database may include A, B, or C, then, unless specifically stated otherwise or infeasible, the database may include A, or B, or C, or A and B, or A and C, or B and C, or A and B and C.
The above description is intended to be illustrative, and not restrictive. It will therefore be clear to a person skilled in the art that modifications may be made as described without departing from the scope of the claims set out below.

Claims (15)

1. A non-transitory computer-readable medium for determining depth information for a structure formed on a substrate based on a single scanning electron microscope, SEM, image of the structure, the depth information determined using a convolutional neural network, CNN, trained to simulate parallax effects present in paired stereoscopic SEM images, the medium comprising instructions stored therein, which when executed by one or more processors, cause operations comprising:
Receiving a single SEM image of a structure patterned on a substrate via an SEM tool;
inputting the SEM image to the CNN to predict parallax data associated with the SEM image, the CNN being trained by:
acquiring a stereoscopic pair of SEM images of a patterned substrate via the SEM tool, the stereoscopic pair comprising a first SEM image acquired at a first electron beam tilt setting of the SEM tool and a second SEM image acquired at a second electron beam tilt setting of the SEM tool;
generating parallax data between the first SEM image and the second SEM image using the CNN;
combining the parallax data with the second SEM image to generate a reconstructed image of the first SEM image; and
comparing the reconstructed image to the first SEM image; and
generating depth information associated with the structure patterned on the substrate based on the predicted parallax data.
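By way of illustration of the training principle in claim 1, the sketch below warps the second (tilted-beam) SEM image by the parallax predicted from the first image and compares the reconstruction with the first image. The grid_sample-based warping, the purely horizontal disparity, and the L1 comparison are assumptions made for this example, not details taken from the disclosure.

```python
# Hypothetical reconstruction step for claim 1: warp the second SEM image by
# the parallax (disparity, in pixels) predicted from the first image.
import torch
import torch.nn.functional as F

def reconstruct_first_image(second_image, disparity):
    n, _, h, w = second_image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    shift = 2.0 * disparity.squeeze(1) / w               # pixel disparity -> normalized coords
    grid = torch.stack((base[..., 0] + shift, base[..., 1]), dim=-1)
    return F.grid_sample(second_image, grid, align_corners=True)

def training_loss(cnn, first_image, second_image):
    disparity = cnn(first_image)                          # parallax predicted from the first image only
    reconstructed = reconstruct_first_image(second_image, disparity)
    return F.l1_loss(reconstructed, first_image)          # compare reconstruction with the first image
```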
2. A non-transitory computer-readable medium for determining a model configured to generate data for estimating depth information of a structure of a patterned substrate, the medium comprising instructions stored therein, which when executed by one or more processors, cause operations comprising:
Acquiring a pair of images of a structure of a patterned substrate, the pair of images including a first image captured at a first angle relative to the patterned substrate and a second image captured at a second angle different from the first angle;
generating, via a model using the first image as an input, parallax data between the first image and the second image, the parallax data being indicative of depth information associated with the first image;
combining the parallax data with the second image to generate a reconstructed image corresponding to the first image; and
one or more parameters of the model are adjusted based on a performance function such that the performance function is within a specified performance threshold, the performance function being a function of the parallax data, the reconstructed image, and the first image, the model being configured to generate data that is convertible to depth information of a structure of a patterned substrate.
3. The medium of claim 2, wherein the parallax data comprises coordinate differences of similar features within the first image and the second image.
4. A medium according to claim 3, wherein the reconstructed image is generated by:
performing a synthesis operation between the parallax data and the second image to generate the reconstructed image.
5. The medium of any one of claims 2 to 4, wherein the performance function further comprises a loss function calculated based on parallax characteristics associated with a pair of stereoscopic images of a previously patterned substrate or substrates and the parallax data predicted by the model.
6. The medium of claim 5, wherein the parallax characteristic based on the previously patterned substrate comprises a parallax expressed as a piecewise smooth function, wherein a derivative of the parallax is piecewise continuous.
7. The medium of claim 5, wherein the parallax characteristic based on the previously patterned substrate comprises a parallax expressed as piecewise constant.
8. The medium of claim 5, wherein the parallax characteristic based on the previously patterned substrate comprises a parallax expressed as a function having jumps at edges of structures within an image, the edges being detected based on gradients of intensity profiles within the image.
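The prior of claims 6 to 8 (piecewise-smooth parallax, with jumps tolerated at feature edges) is often encoded as an edge-aware smoothness penalty. The sketch below is one such hypothetical loss term; the exponential weighting by the image gradient is an assumption of this example.

```python
# Hypothetical edge-aware smoothness loss: penalize parallax gradients except
# where the SEM image itself has strong intensity gradients (feature edges).
import torch

def edge_aware_smoothness(disparity, image):
    d_dx = torch.abs(disparity[..., :, 1:] - disparity[..., :, :-1])
    d_dy = torch.abs(disparity[..., 1:, :] - disparity[..., :-1, :])
    i_dx = torch.abs(image[..., :, 1:] - image[..., :, :-1]).mean(1, keepdim=True)
    i_dy = torch.abs(image[..., 1:, :] - image[..., :-1, :]).mean(1, keepdim=True)
    # Down-weight the smoothness penalty where the image gradient is large.
    return (d_dx * torch.exp(-i_dx)).mean() + (d_dy * torch.exp(-i_dy)).mean()
```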
9. The medium of any one of the preceding claims, wherein adjusting the one or more parameters of the model is an iterative process, each iteration comprising:
Determining the performance function based on the parallax data and the reconstructed image;
determining whether the performance function is within the specified performance threshold; and
in response to the performance function not being within the specified performance threshold, the one or more parameters of the model are adjusted such that the performance function is within the specified performance threshold, the adjusting based on a gradient of the performance function relative to the one or more parameters.
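A minimal sketch of the claim 9 iteration, assuming the performance function is differentiable with respect to the model parameters; the SGD optimizer, learning rate, and iteration cap are illustrative assumptions.

```python
# Hypothetical gradient-based adjustment of the model parameters (claim 9).
import torch

def adjust_parameters(model, performance_fn, data, threshold, lr=1e-4, max_iter=500):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_iter):
        value = performance_fn(model, data)   # function of parallax data, reconstruction, first image
        if value.item() < threshold:          # within the specified performance threshold
            break
        opt.zero_grad()
        value.backward()                      # gradient w.r.t. the one or more parameters
        opt.step()
    return model
```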
10. The medium of any of the preceding claims, wherein the first image is a normal image associated with an electron beam oriented perpendicular to the patterned substrate and the second image is associated with an electron beam oriented at an angle greater than 90 ° or less than 90 ° relative to the patterned substrate.
11. The medium of claim 2, wherein acquiring the pair of images comprises acquiring a plurality of pairs of SEM images of a patterned substrate, each pair comprising a first SEM image associated with a first beam tilt setting of a metrology tool and a second SEM image associated with a second beam tilt setting of the metrology tool.
12. The medium of claim 2, wherein determining the model comprises:
acquiring a plurality of images of the structure, each image acquired at a different angle.
13. The medium of claim 12, wherein the performance function further comprises another loss function calculated as a sum of similarities between images of the plurality of images and corresponding reconstructed images.
14. The medium of any one of the preceding claims, wherein the operations further comprise:
a single SEM image of a patterned substrate is acquired via the metrology tool at the first electron beam tilt setting of the metrology tool.
15. The medium of claim 14, wherein the single SEM image is a normal image acquired by directing an electron beam substantially perpendicular to the patterned substrate.
CN202180084294.9A 2020-12-15 2021-11-24 Apparatus and method for determining three-dimensional data based on an image of a patterned substrate Pending CN116615750A (en)

Applications Claiming Priority (4)

Application Number | Priority Date | Filing Date | Title
US63/125,522 | 2020-12-15 | |
US202063132053P | 2020-12-30 | 2020-12-30 |
US63/132,053 | 2020-12-30 | |
PCT/EP2021/082756 (WO2022128373A1) | 2020-12-15 | 2021-11-24 | Apparatus and method for determining three dimensional data based on an image of a patterned substrate



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination