WO2022268433A1 - Training a machine learning model for determining burl issues - Google Patents


Info

Publication number
WO2022268433A1
Authority
WO
WIPO (PCT)
Prior art keywords
burl
data
condition
image
burls
Prior art date
Application number
PCT/EP2022/064473
Other languages
French (fr)
Inventor
Mir Farrokh SHAYEGAN SALEK
Robert Renier KERSTEN
Jorge GUTIERREZ SAN SIMON
Heiko Jacob Anthonius ENGWERDA
Original Assignee
Asml Netherlands B.V.
Application filed by Asml Netherlands B.V. filed Critical Asml Netherlands B.V.
Priority to CN202280044076.7A (published as CN117529692A)
Publication of WO2022268433A1


Classifications

    • G PHYSICS
    • G03 PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03F PHOTOMECHANICAL PRODUCTION OF TEXTURED OR PATTERNED SURFACES, e.g. FOR PRINTING, FOR PROCESSING OF SEMICONDUCTOR DEVICES; MATERIALS THEREFOR; ORIGINALS THEREFOR; APPARATUS SPECIALLY ADAPTED THEREFOR
    • G03F7/00Photomechanical, e.g. photolithographic, production of textured or patterned surfaces, e.g. printing surfaces; Materials therefor, e.g. comprising photoresists; Apparatus specially adapted therefor
    • G03F7/70Microphotolithographic exposure; Apparatus therefor
    • G03F7/70691Handling of masks or workpieces
    • G03F7/707Chucks, e.g. chucking or un-chucking operations or structural details
    • G PHYSICS
    • G03 PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03F PHOTOMECHANICAL PRODUCTION OF TEXTURED OR PATTERNED SURFACES, e.g. FOR PRINTING, FOR PROCESSING OF SEMICONDUCTOR DEVICES; MATERIALS THEREFOR; ORIGINALS THEREFOR; APPARATUS SPECIALLY ADAPTED THEREFOR
    • G03F7/00Photomechanical, e.g. photolithographic, production of textured or patterned surfaces, e.g. printing surfaces; Materials therefor, e.g. comprising photoresists; Apparatus specially adapted therefor
    • G03F7/70Microphotolithographic exposure; Apparatus therefor
    • G03F7/70483Information management; Active and passive control; Testing; Wafer monitoring, e.g. pattern monitoring
    • G03F7/70491Information management, e.g. software; Active and passive control, e.g. details of controlling exposure processes or exposure tool monitoring processes
    • G03F7/705Modelling or simulating from physical phenomena up to complete wafer processes or whole workflow in wafer productions
    • H ELECTRICITY
    • H01 ELECTRIC ELEMENTS
    • H01L SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L21/00Processes or apparatus adapted for the manufacture or treatment of semiconductor or solid state devices or of parts thereof
    • H01L21/67Apparatus specially adapted for handling semiconductor or electric solid state devices during manufacture or treatment thereof; Apparatus specially adapted for handling wafers during manufacture or treatment of semiconductor or electric solid state devices or components ; Apparatus not specifically provided for elsewhere
    • H01L21/683Apparatus specially adapted for handling semiconductor or electric solid state devices during manufacture or treatment thereof; Apparatus specially adapted for handling wafers during manufacture or treatment of semiconductor or electric solid state devices or components ; Apparatus not specifically provided for elsewhere for supporting or gripping
    • H01L21/687Apparatus specially adapted for handling semiconductor or electric solid state devices during manufacture or treatment thereof; Apparatus specially adapted for handling wafers during manufacture or treatment of semiconductor or electric solid state devices or components ; Apparatus not specifically provided for elsewhere for supporting or gripping using mechanical means, e.g. chucks, clamps or pinches
    • H01L21/68714Apparatus specially adapted for handling semiconductor or electric solid state devices during manufacture or treatment thereof; Apparatus specially adapted for handling wafers during manufacture or treatment of semiconductor or electric solid state devices or components ; Apparatus not specifically provided for elsewhere for supporting or gripping using mechanical means, e.g. chucks, clamps or pinches the wafers being placed on a susceptor, stage or support
    • H01L21/6875Apparatus specially adapted for handling semiconductor or electric solid state devices during manufacture or treatment thereof; Apparatus specially adapted for handling wafers during manufacture or treatment of semiconductor or electric solid state devices or components ; Apparatus not specifically provided for elsewhere for supporting or gripping using mechanical means, e.g. chucks, clamps or pinches the wafers being placed on a susceptor, stage or support characterised by a plurality of individual support members, e.g. support posts or protrusions

Definitions

  • the description herein relates to object holders for use in a lithographic apparatus, and more particularly to determining issues with the surface of the object holders.
  • a lithographic apparatus is a machine constructed to apply a desired pattern onto a substrate.
  • a lithographic apparatus can be used, for example, in the manufacture of integrated circuits (ICs).
  • a lithographic apparatus may, for example, project a pattern (also often referred to as “design layout” or “design”) of a patterning device (e.g., a mask) onto a layer of radiation-sensitive material (resist) provided on a substrate (e.g., a wafer).
  • a lithographic apparatus may use electromagnetic radiation.
  • the wavelength of this radiation determines the minimum size of features which are patterned on the substrate. Typical wavelengths currently in use are 365 nm (i-line), 248 nm, 193 nm and 13.5 nm.
  • an object such as a substrate to be exposed (which may be referred to as a production substrate) is held on an object holder such as a substrate holder (sometimes referred to as a wafer table).
  • the substrate holder may be moveable with respect to the projection system.
  • the substrate holder usually comprises a solid body made of a rigid material and having similar dimensions in plan to the production substrate to be supported.
  • the substrate-facing surface of the solid body is provided with multiple projections (referred to as burls). The distal surfaces of the burls conform to a flat plane and support the substrate.
  • the burls provide several advantages: a contaminant particle on the substrate holder or on the substrate is likely to fall between burls and therefore does not cause a deformation of the substrate; it is easier to machine the burls so their ends conform to a plane than to make the surface of the solid body flat; and the properties of the burls can be adjusted, e.g., to control the clamping of the substrate.
  • the burls of the substrate holder wear during use, e.g., due to the repeated loading and unloading of substrates.
  • the burls may also be prone to other issues, e.g., cracks, contamination, scratches, etc. Burls that are damaged or otherwise not considered "healthy" may be problematic for lithographic processes. For example, uneven wear of the burls leads to unflatness of the substrate during exposure, which can lead to a reduction of the process window and, in extreme cases, to imaging errors. Given the very precise manufacturing specifications, it is desirable to identify burl issues so that appropriate actions may be taken to repair, refurbish, replace, or improve the life of a substrate holder.
  • a non-transitory computer readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus.
  • the method includes: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • a non-transitory computer readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus.
  • the method includes: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced; providing a second plurality of first type of burl data associated with the burls as input to the first machine learning model to obtain a second plurality of condition data as second condition data that is representative of physical conditions of the burls; obtaining second training data comprising (a) a plurality of second type of burl data associated with the burls, wherein the plurality of second type of burl data includes second burl data, the
  • a non-transitory computer readable medium having instructions that, when executed by a computer, cause the computer to execute a method for determining a physical condition of an object holder of a lithographic apparatus.
  • the method includes: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
  • a non-transitory computer readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus.
  • the method includes: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • a non-transitory computer readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a feature associated with a subject of an image.
  • the method includes: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of feature data, a second machine learning model to predict feature data representative of a feature of a subject of the subjects.
  • a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus includes: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus includes: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced; providing a second plurality of first type of burl data associated with the burls as input to the first machine learning model to obtain a second plurality of condition data as second condition data that is representative of physical conditions of the burls; obtaining second training data comprising
  • a method for determining a physical condition of an object holder of a lithographic apparatus includes: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
  • a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus includes: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • a method for training a machine learning model to determine a feature associated with a subject of an image includes: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of feature data, a second machine learning model to predict feature data representative of a feature of a subject of the subjects.
  • an apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus includes a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • an apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus includes a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced; providing a second plurality of first type of burl data associated with the burls as input to the first machine learning model to obtain
  • an apparatus for determining a physical condition of an object holder of a lithographic apparatus includes a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
  • an apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus includes a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • an apparatus for training a machine learning model to determine a feature associated with a subject of an image includes a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of feature data, a second machine learning model to predict feature data representative of a feature of a subject of the subjects.
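The multi-image variant in the claims above (first image data comprising multiple images of a first burl, all associated with one condition label) implies a training set keyed by burl. The sketch below is an illustrative data-model assumption, not the disclosed implementation; the function name, record layout, and labels are invented for the example.

```python
from collections import defaultdict

def build_multi_image_dataset(records):
    """Group multiple images of the same burl under one condition label.

    records: iterable of (burl_id, image, condition_label) tuples.
    Returns {burl_id: (list_of_images, condition_label)}; all records
    for one burl must agree on the label.
    """
    images = defaultdict(list)
    labels = {}
    for burl_id, image, label in records:
        images[burl_id].append(image)
        # setdefault stores the first label seen and flags conflicts.
        if labels.setdefault(burl_id, label) != label:
            raise ValueError(f"conflicting labels for burl {burl_id}")
    return {bid: (imgs, labels[bid]) for bid, imgs in images.items()}

# Example: two images of burl 1, one image of burl 2.
dataset = build_multi_image_dataset([
    (1, "intensity_img_1a", "healthy"),
    (1, "height_img_1b", "healthy"),
    (2, "intensity_img_2a", "worn"),
])
```

A model can then be trained on each burl's image list against its single label, matching the claim's "first image data comprising multiple images of a first burl".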
  • FIG. 1 is a schematic diagram of a lithographic projection apparatus, according to an embodiment.
  • FIG. 2A is a schematic depiction of a substrate holder according to an embodiment.
  • FIG. 2B is a top view of the surface of the substrate holder according to an embodiment.
  • FIG. 3A is a block diagram of a system for generating a prediction of a physical condition of a burl using a first type of burl data, in accordance with an embodiment.
  • FIG. 3B shows examples of the first type of burl data that may be used to generate a prediction of a physical condition of a burl, in accordance with an embodiment.
  • FIG. 4A is a block diagram of a system for generating a prediction of a physical condition of a burl using a second type of burl data, in accordance with an embodiment.
  • FIG. 4B is a block diagram of a system for generating a prediction of a physical condition of a burl using the second type of burl data, in accordance with an embodiment.
  • FIG. 5 is a system for training a first burl-condition predictor using the first type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
  • FIG. 6A is a flow chart of a process of training the first burl-condition predictor using the first type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
  • FIG. 6B is a flow chart of another process of training the first burl-condition predictor to predict a physical condition of a burl, in accordance with an embodiment.
  • FIG. 7 is a system for training a second burl-condition predictor using the second type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
  • FIG. 8A is a flow chart of a process of training the second burl-condition predictor using the second type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
  • FIG. 8B is a flow chart of another process of training the second burl-condition predictor to predict a physical condition of one or more burls, in accordance with an embodiment.
  • FIG. 9 is a block diagram of a system for generating a prediction of a physical condition of a burl using a third type of burl data, in accordance with an embodiment.
  • FIG. 10 is a system for training a third burl-condition predictor using a third type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
  • FIG. 11A is a flow chart of a process of training the third burl-condition predictor using a third type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
  • FIG. 11B is a flow chart of another process of training the third burl-condition predictor to predict a physical condition of a burl, in accordance with an embodiment.
  • FIG. 12 is a block diagram that illustrates a computer system which can assist in implementing the systems and methods disclosed herein.
  • a patterning device (e.g., a mask) may provide a mask pattern (e.g., a mask design layout) corresponding to a target pattern (e.g., a target design layout); this mask pattern may be transferred onto a substrate by transmitting light through the mask pattern.
  • the substrate to be exposed is held on an object holder such as a substrate holder.
  • the substrate-facing surface of the substrate holder is provided with multiple projections (referred to as burls). The distal surfaces of the burls conform to a flat plane and support the substrate.
  • the burls provide several advantages: for example, a contaminant particle on the substrate holder or on the substrate is likely to fall between burls and therefore does not cause a deformation of the substrate.
  • the burls of the substrate holder wear during use, e.g., due to the repeated loading and unloading of substrates.
  • the burls may also be prone to other issues, e.g., cracks, contamination, scratches, etc. Burls that are damaged or otherwise not considered "healthy" may be problematic for lithographic processes (e.g., they may lead to a reduction of the process window or cause imaging errors).
  • currently, to determine burl issues (e.g., a physical condition of a burl), users manually inspect high resolution images of a burl (e.g., an intensity image or a height image) and identify an issue.
  • obtaining high resolution images of a burl and visually inspecting the images is a time-consuming process, especially considering that there are thousands of burls on a substrate holder (e.g., 10,000, 20,000, 30,000 or another number). Further, such a process may also be prone to human error, resulting in an incorrect determination of an issue with a burl.
  • ML models are trained to predict or determine a physical condition of a burl based on one or more images of a burl.
  • a first ML model may be trained using high resolution images of a burl (e.g., intensity image and height image obtained using an image capture device such as a white-light interferometric microscope) to predict a physical condition of the burl.
  • a second ML model may be trained using (a) a subset of the images used to train the first ML model, (b) a low resolution image of the burls, or (c) an image of the burl that is faster to obtain than the images used to train the first ML model to predict the physical condition of one or more burls.
  • the second ML model may be trained using the output of the first ML model. That is, physical conditions of burls predicted by a trained first ML model using the high resolution images (e.g., intensity image and height image) of the burls may be used in training the second ML model.
  • physical conditions of the burls obtained using the first ML model may be associated with a subset of the high resolution images of the burls, such as the intensity images (e.g., as opposed to intensity and height images used in the first ML), which may then be used as training data to train the second ML model to predict a physical condition of a burl.
  • the second ML model may generate a prediction of a physical condition using a single image, e.g., an intensity image, of a burl, as opposed to the multiple images required by the first ML model, thereby minimizing the time or computing resources that may otherwise be consumed in obtaining or processing multiple high resolution images of a burl.
  • obtaining an intensity image consumes less time and fewer computing resources than obtaining a height image.
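The teacher-student scheme above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the disclosed implementation: the rule-based `teacher_predict` stands in for the trained first ML model, the per-burl summary features and toy measurements are invented for the example, and a logistic-regression student stands in for the second ML model.

```python
import math

# Stand-in for the trained first ML model: labels a burl using BOTH
# signals (here reduced to per-burl mean intensity and mean height).
def teacher_predict(intensity_mean, height_mean):
    return 1 if height_mean < 0.5 else 0  # 1 = worn, 0 = healthy

# Toy per-burl measurements (mean intensity, mean height); in this
# synthetic data, worn-down burls also show low intensity.
burls = [(0.82, 0.91), (0.78, 0.88), (0.75, 0.84),
         (0.21, 0.32), (0.25, 0.28), (0.30, 0.35)]

# Step 1: pseudo-label every burl with the teacher's condition output.
pseudo_labels = [teacher_predict(i, h) for i, h in burls]

# Step 2: train the student on the INTENSITY value only (logistic
# regression fitted by stochastic gradient descent).
w, b, lr = 0.0, 0.0, 1.0
for _ in range(2000):
    for (i, _h), y in zip(burls, pseudo_labels):
        p = 1.0 / (1.0 + math.exp(-(w * i + b)))
        w -= lr * (p - y) * i
        b -= lr * (p - y)

def student_predict(intensity_mean):
    """Predict a burl's condition from a single intensity value."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * intensity_mean + b))) > 0.5 else 0
```

After training, the student reproduces the teacher's condition labels from the intensity signal alone, avoiding the cost of acquiring a height image for every burl.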
  • the physical condition of the burls obtained using the first ML model may be associated with each of the burls in a low resolution image of the burls (e.g., an image with a lower resolution or of lower magnification than the images used in training the first ML model and that captures multiple burls in a single image), which may then be used as training data to train the second ML model to predict physical conditions of the burls in any given low resolution image.
  • the second ML model may generate a prediction of physical conditions of multiple burls using a single low resolution image as opposed to the multiple images required by the first ML model, thereby minimizing the time or computing resources that may otherwise be consumed in obtaining multiple high resolution images of a burl.
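One way to build the training data described above is to crop a per-burl patch out of the single low-resolution image and pair it with the condition the first ML model predicted for that burl. The sketch below assumes burl centres are known in pixel coordinates (e.g., from the holder design); the function names and the list-of-rows image representation are illustrative.

```python
def extract_patch(image, cx, cy, half=1):
    """Crop a (2*half+1)-square patch centred on (cx, cy) from a
    low-resolution image given as a list of pixel rows."""
    return [row[cx - half: cx + half + 1]
            for row in image[cy - half: cy + half + 1]]

def build_training_pairs(low_res_image, burl_centres, condition_labels):
    """Associate each burl's patch in one low-resolution image with the
    condition data obtained for that burl from the first ML model."""
    return [(extract_patch(low_res_image, x, y), label)
            for (x, y), label in zip(burl_centres, condition_labels)]

# Example: a 6x6 toy image containing two burls.
image = [[r * 10 + c for c in range(6)] for r in range(6)]
pairs = build_training_pairs(image, [(1, 1), (4, 4)], ["healthy", "worn"])
```

The resulting (patch, condition) pairs form the second training set; at inference time the same cropping yields per-burl predictions from one low-resolution image of the whole holder.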
  • a third ML model may be trained using roughness parameters of a burl (e.g., metrics that are indicative of physical condition of the burl) to predict a physical condition of the burl.
  • the roughness parameters may be obtained by processing images of the burl (e.g., intensity and height images) using any of a number of image processing tools.
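As a concrete example of such metrics, the standard surface-texture parameters Ra (arithmetic mean deviation) and Rq (root-mean-square deviation) can be computed from sampled burl heights. This is a generic sketch over a 1-D height profile, not the specific roughness parameters or image processing tools of the disclosure.

```python
import math

def roughness_parameters(heights):
    """Compute Ra and Rq from a profile of sampled heights, measured
    relative to the profile's mean line."""
    n = len(heights)
    mean = sum(heights) / n
    # Ra: arithmetic mean of absolute deviations from the mean line.
    ra = sum(abs(h - mean) for h in heights) / n
    # Rq: root-mean-square of deviations from the mean line.
    rq = math.sqrt(sum((h - mean) ** 2 for h in heights) / n)
    return {"Ra": ra, "Rq": rq}
```

A worn or scratched burl would typically show larger Ra/Rq values than a healthy one, which is why such parameters can serve as input features for a condition predictor.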
  • a physical condition of a burl may be determined as a function of the physical conditions predicted for that burl by the various ML models.
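The document does not specify the combining function; one simple possibility is a weighted average of the per-model condition scores, sketched below with hypothetical names (scores on a 0 = healthy to 1 = degraded scale).

```python
def combine_conditions(scores, weights=None):
    """Fuse per-model condition scores for one burl into a single score
    via a weighted average (uniform weights by default)."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Example: three ML models score the same burl; the first model's
# output is trusted twice as much as the others.
fused = combine_conditions([0.9, 0.7, 0.8], weights=[2.0, 1.0, 1.0])
```

Other choices, such as a majority vote over discrete condition labels or taking the worst-case score, fit the same "function of the physical conditions" wording.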
  • a high resolution image may mean that the image is obtained at a first resolution or magnification above a specified threshold (e.g., user defined threshold).
  • a low resolution image may mean that the image is obtained at a second resolution or magnification that is lower than the first resolution.
  • FIG. 1 schematically depicts a lithographic apparatus in accordance with one or more embodiments.
  • the apparatus comprises: an illumination system (illuminator) IL configured to condition a radiation beam B (e.g., UV radiation or DUV radiation); a first object holder or a support structure (e.g., a mask table) MT constructed to hold a patterning device (e.g., a mask) MA and connected to a first positioner PM configured to accurately position the patterning device in accordance with certain parameters; a second object holder such as a substrate holder or substrate table (e.g., a wafer table) WT constructed to hold a substrate (e.g., a resist coated wafer) W and connected to a second positioner PW configured to accurately position the substrate in accordance with certain parameters; and a projection system (e.g., a refractive projection lens system) PS configured to project a pattern imparted to the radiation beam B by patterning device MA onto a target portion C (e.g., comprising one or more dies) of the substrate W.
  • the support structure holds the patterning device in a manner that depends on the orientation of the patterning device, the design of the lithographic apparatus, and other conditions, such as for example whether or not the patterning device is held in a vacuum environment.
  • the support structure can use mechanical, vacuum, electrostatic or other clamping techniques to hold the patterning device.
  • the support structure may be a frame or a table, for example, which may be fixed or movable as required.
  • the support structure may ensure that the patterning device is at a desired position, for example with respect to the projection system. Any use of the terms “reticle” or “mask” herein may be considered synonymous with the more general term “patterning device.”
  • patterning device used herein should be broadly interpreted as referring to any device that can be used to impart a radiation beam with a pattern in its cross-section such as to create a pattern in a target portion of the substrate. It should be noted that the pattern imparted to the radiation beam may not exactly correspond to the desired pattern in the target portion of the substrate, for example if the pattern includes phase-shifting features or so called assist features. Generally, the pattern imparted to the radiation beam will correspond to a particular functional layer in a device being created in the target portion, such as an integrated circuit.
  • the patterning device may be transmissive or reflective.
  • Examples of patterning devices include masks, programmable mirror arrays, and programmable LCD panels.
  • Masks are well known in lithography, and include mask types such as binary, alternating phase-shift, and attenuated phase-shift, as well as various hybrid mask types.
  • An example of a programmable mirror array employs a matrix arrangement of small mirrors, each of which can be individually tilted so as to reflect an incoming radiation beam in different directions. The tilted mirrors impart a pattern in a radiation beam which is reflected by the mirror matrix.
  • projection system used herein should be broadly interpreted as encompassing any type of projection system, including refractive, reflective, catadioptric, magnetic, electromagnetic and electrostatic optical systems, or any combination thereof, as appropriate for the exposure radiation being used, or for other factors such as the use of an immersion liquid or the use of a vacuum. Any use of the term “projection lens” herein may be considered as synonymous with the more general term “projection system”.
  • the apparatus is of a transmissive type (e.g., employing a transmissive mask).
  • the apparatus may be of a reflective type (e.g., employing a programmable mirror array of a type as referred to above, or employing a reflective mask).
  • the lithographic apparatus may be of a type having two (dual stage) or more substrate tables (and/or two or more support structures).
  • the additional tables / support structures may be used in parallel, or preparatory steps may be carried out on one or more tables / support structures while one or more other tables / support structures are being used for exposure.
  • the illuminator IL receives a radiation beam from a radiation source SO.
  • the source and the lithographic apparatus may be separate entities, for example when the source is an excimer laser. In such cases, the source is not considered to form part of the lithographic apparatus and the radiation beam is passed from the source SO to the illuminator IL with the aid of a beam delivery system BD comprising, for example, suitable directing mirrors and/or a beam expander. In other cases, the source may be an integral part of the lithographic apparatus, for example when the source is a mercury lamp.
  • the source SO and the illuminator IL, together with the beam delivery system BD if required, may be referred to as a radiation system.
  • the illuminator IL may comprise an adjuster AD configured to adjust the angular intensity distribution of the radiation beam. Generally, at least the outer and/or inner radial extent (commonly referred to as σ-outer and σ-inner, respectively) of the intensity distribution in a pupil plane of the illuminator can be adjusted.
  • the illuminator IL may comprise various other components, such as an integrator IN and a condenser CO. The illuminator may be used to condition the radiation beam, to have a desired uniformity and intensity distribution in its cross section.
  • the radiation beam B is incident on the patterning device (e.g., mask) MA, which is held on the support structure (e.g., mask table) MT, and is patterned by the patterning device. Having traversed the patterning device MA, the radiation beam B passes through the projection system PS, which focuses the beam onto a target portion C of the substrate W.
  • the substrate table WT can be moved accurately, e.g., so as to position different target portions C in the path of the radiation beam B.
  • with the aid of the first positioner PM and another position sensor (which is not explicitly depicted in FIG. 1), the support structure MT may be used to accurately position the patterning device MA with respect to the path of the radiation beam B, e.g., after mechanical retrieval from a mask library, or during a scan.
  • movement of the support structure MT may be realized with the aid of a long-stroke module (coarse positioning) and a short-stroke module (fine positioning), which form part of the first positioner PM.
  • movement of the substrate table WT may be realized using a long-stroke module and a short-stroke module, which form part of the second positioner PW.
  • the support structure MT may be connected to a short-stroke actuator only, or may be fixed.
  • Patterning device MA and substrate W may be aligned using patterning device alignment marks M1, M2 and substrate alignment marks P1, P2. Although the substrate alignment marks as illustrated occupy dedicated target portions, they may be located in spaces between target portions (these are known as scribe-lane alignment marks). Similarly, in situations in which more than one die is provided on the patterning device MA, the patterning device alignment marks may be located between the dies.
  • the depicted apparatus could be used in at least one of the following modes:
  • 1. In step mode, the support structure MT and the substrate table WT are kept essentially stationary, while an entire pattern imparted to the radiation beam is projected onto a target portion C at one time (i.e., a single static exposure). The substrate table WT is then shifted in the X and/or Y direction so that a different target portion C can be exposed. In step mode, the maximum size of the exposure field limits the size of the target portion C imaged in a single static exposure.
  • 2. In scan mode, the support structure MT and the substrate table WT are scanned synchronously while a pattern imparted to the radiation beam is projected onto a target portion C (i.e., a single dynamic exposure).
  • the velocity and direction of the substrate table WT relative to the support structure MT may be determined by the (de-)magnification and image reversal characteristics of the projection system PS.
  • the maximum size of the exposure field limits the width (in the non-scanning direction) of the target portion in a single dynamic exposure, whereas the length of the scanning motion determines the height (in the scanning direction) of the target portion.
  • the support structure MT is kept essentially stationary holding a programmable patterning device, and the substrate table WT is moved or scanned while a pattern imparted to the radiation beam is projected onto a target portion C.
  • a pulsed radiation source is employed and the programmable patterning device is updated as required after each movement of the substrate table WT or in between successive radiation pulses during a scan.
  • This mode of operation can be readily applied to maskless lithography that utilizes programmable patterning device, such as a programmable mirror array of a type as referred to above.
  • Combinations and/or variations on the above-described modes of use or entirely different modes of use may also be employed.
  • FIG. 2A is a schematic depiction of a substrate holder, in accordance with an embodiment.
  • FIG. 2B is a top view of the surface of the substrate holder, in accordance with an embodiment.
  • an object holder such as a substrate table WT supports an object such as a substrate W.
  • the substrate table WT is configured to provide one or more support surfaces to directly contact and support the substrate W.
  • the substrate table WT has one or more projections 20 (e.g., burls or protrusions) protruding or extending substantially perpendicularly from a surface of the substrate table WT. For example, more than 10,000 (e.g., 20,000, 30,000 or other number) burls may be provided on top of the substrate table WT.
  • the substrate table WT may be referred to as a pimple table or a burl table.
  • a lower surface or backside of the substrate W may be supported on upper face(s) of the one or more burls 20.
  • the top(s) of the one or more burls define an effective support plane for the substrate W.
  • the substrate table WT is configured such that, when the substrate W is positioned on the substrate table WT, i.e., at least on the one or more burls 20, an upper surface of the substrate W lies in a predetermined plane in relation to a propagation direction of the exposure radiation.
  • the surface of the substrate W is oriented transverse to the propagation direction of the beam.
  • the arrangement of one or more burls 20 is not limiting.
  • the burls are arranged in an array, such as shown in the top view of the table surface in FIG. 2B.
  • the surface area of the substrate table WT or its burls that are in contact with the substrate W is not intended to be limiting.
  • the above-described arrangement of the burls 20 may minimize or reduce the total area of the substrate W in contact with the substrate table WT and so generally may reduce the overall amount of contaminants between the substrate W and a corresponding surface of the substrate table WT contacting the substrate W.
  • the machine learning models may predict a physical condition of a burl (e.g., a healthy burl, or a defective burl) using one or more images of a burl at one or more resolutions, or using roughness parameters associated with the surface of the burl.
  • FIG. 3A is a block diagram of a system 300 for generating a prediction of a physical condition of a burl using a first type of burl data, in accordance with an embodiment.
  • the system 300 includes a first burl-condition predictor 350 that is configured to generate condition data 314 of a burl 302 based on image data 304 of the burl 302.
  • the burl 302 may be similar to the burl 20 of FIGs. 2A or 2B.
  • the condition data 314 may be indicative of a physical condition of the burl 302.
  • the physical condition may include healthy, contaminated, cracked, scratched, hot (e.g., partial protrusion on the surface), edge wear, or various other conditions.
  • the image data 304 input to the first burl-condition predictor 350 may be a first type of burl data, which includes multiple images of the burl 302 obtained at a high resolution (e.g., a first resolution or magnification above a specified threshold).
  • the image data 304 may include different types of images such as an intensity image 324a and height image 324b of the burl 302.
  • the intensity image 324a and height image 324b may be obtained using an image capture device such as a white-light interferometric microscope.
  • the intensity image 324a of the burl 302 is representative of intensities in a range (e.g., each pixel may have a value between 0 and 1 indicating an intensity of the pixel compared to the other pixels in the image).
  • the height image 324b is representative of height related (e.g., z-axis value) properties of the burl 302.
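The 0-to-1 per-pixel intensity scale described above can be sketched as a simple min-max normalization; the `normalize_intensity` helper and the min-max scheme are illustrative assumptions, not taken from the disclosure:

```python
import numpy as np

def normalize_intensity(raw):
    """Scale raw pixel values into the 0-1 range, each pixel expressed
    relative to the other pixels in the same image."""
    raw = np.asarray(raw, dtype=float)
    lo, hi = raw.min(), raw.max()
    if hi == lo:                      # flat image: avoid division by zero
        return np.zeros_like(raw)
    return (raw - lo) / (hi - lo)

# Example: a 2x2 patch of raw microscope counts
patch = np.array([[10.0, 20.0], [30.0, 50.0]])
norm = normalize_intensity(patch)
# norm now spans exactly 0.0 to 1.0
```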
  • the image data 304 may include a composite image 334 of multiple images of the burl 302.
  • the composite image 334 may be a red, green, blue (RGB) image of a combination of the intensity image 324a, the height image 324b and a delta image 324c.
  • the delta image 324c is representative of a difference between a first image of the burl 302 obtained prior to performing a process (e.g., a patterning process to print a target pattern on the substrate or a process in which there is multiple collision between the burl and a substrate) on a substrate held by the substrate holder and a second image of the burl 302 obtained after performing the process.
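A minimal sketch of assembling such a composite image, assuming numpy arrays, a delta computed as the after-minus-before difference, and the channel order intensity → R, height → G, delta → B (the function name and channel assignment are hypothetical):

```python
import numpy as np

def composite_rgb(intensity, height, before, after):
    """Combine the intensity, height, and delta images into one
    three-channel (RGB-like) array; channel order is an assumption."""
    delta = after - before            # change in the burl across the process
    return np.stack([intensity, height, delta], axis=-1)

intensity = np.full((4, 4), 0.8)
height    = np.full((4, 4), 0.3)
before    = np.zeros((4, 4))
after     = np.full((4, 4), 0.1)
rgb = composite_rgb(intensity, height, before, after)
# rgb has shape (4, 4, 3): one channel per constituent image
```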
  • the images in the image data 304 may be high resolution images.
  • the image used in the image data 304 may also be a low resolution surface measurement of a different image type. It should be noted that the higher the resolution, the more time and computing resources are consumed in obtaining an image.
  • While the first type of burl data is described as having an intensity image and a height image of a burl, the first type of burl data is not restricted to the foregoing image types and other image types may be used (e.g., scanning electron microscopy (SEM) image, atomic force microscopy (AFM) image, or another image type).
  • the first burl-condition predictor 350 may be a machine learning model (e.g., a convolutional neural network (CNN)) that is trained to predict the condition data of a burl.
  • the present disclosure is not limited to any specific type of neural network of the machine learning model.
  • the first burl-condition predictor 350 may be trained using a number of images of a number of burls and their associated condition data as training data.
  • a type of input to be provided to the first burl-condition predictor 350 during a prediction process may be similar to the type of input provided during the training process.
  • the image data to be input to the first burl-condition predictor 350 may include the image pair of an intensity image and a height image of a burl as well. Additional details with respect to the training process are described below at least with reference to Figures 5 and 6.
  • FIG. 4A and FIG. 4B illustrate generating a prediction of a physical condition of a burl using two different examples of a second type of burl data.
  • the second type of burl data may include a subset of the high resolution images of the first type of burl data, or a low resolution image of multiple burls (e.g., image obtained at a second resolution or magnification that is lower than the first resolution used in the first type of burl data).
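The relationship between the two data types can be illustrated by block-averaging a high resolution image down to a lower resolution; the `downsample` helper is a stand-in for acquiring the image at a lower magnification, not the disclosed acquisition method:

```python
import numpy as np

def downsample(image, factor):
    """Reduce resolution by averaging factor x factor pixel blocks,
    mimicking an image obtained at a lower magnification."""
    h, w = image.shape
    image = image[:h - h % factor, :w - w % factor]   # trim to a multiple
    h, w = image.shape
    return image.reshape(h // factor, factor,
                         w // factor, factor).mean(axis=(1, 3))

hi_res = np.arange(16, dtype=float).reshape(4, 4)     # 4x4 "high resolution"
lo_res = downsample(hi_res, 2)                        # 2x2 "low resolution"
# each low resolution pixel is the mean of a 2x2 high resolution block
```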
  • FIG. 4A is a block diagram of a system 400 for generating a prediction of a physical condition of a burl using a second type of burl data, in accordance with an embodiment.
  • the system 400 includes a second burl-condition predictor 450 that is configured to generate condition data 414 of a burl 402a based on image data 404 of the burl 402a.
  • the burl 402a may be similar to the burl 20 of FIGs. 2A or 2B.
  • the condition data 414 may be indicative of a physical condition of the burl 402a.
  • the image data 404 input to the second burl-condition predictor 450 may be a second type of burl data, which includes a single image of the burl 402a (e.g., unlike multiple images of the burl used in the first type of burl data described above at least with reference to FIG. 3A or FIG. 3B).
  • the image data 404 may include an intensity image 424a of the burl 402a.
  • the intensity image may be obtained at a high resolution (e.g., a first resolution or magnification above a specified threshold). However, in some embodiments, obtaining the intensity image at a high resolution consumes significantly less time than obtaining both the intensity image and the height image at a high resolution.
  • While the second type of burl data is described as having an intensity image of a burl, the second type of burl data is not restricted to the foregoing image type and other image types may be used (e.g., SEM image, AFM image, or another image type).
  • the image types used in the second type of burl data are different from or a subset of the image types used in the first type of burl data.
  • the second burl-condition predictor 450 may be a machine learning model (e.g., a CNN) that is trained to predict the condition data of a burl.
  • the second burl-condition predictor 450 may be trained using a number of images of a number of burls and their associated condition data as training data. Additional details with respect to the training process are described below at least with reference to Figures 7 and 8.
  • FIG. 4B is a block diagram of a system 425 for generating a prediction of a physical condition of a burl using another second type of burl data, in accordance with an embodiment.
  • the system 425 includes the second burl-condition predictor 450 that is configured to generate condition data 414 of burls 402a-402n based on image data 404 of the burls 402a-402n.
  • the burls 402a-402n may be similar to the burls 20 of FIG. 2A or FIG. 2B.
  • the condition data 414, which includes condition data 414a-414n for the burls 402a-402n, may be indicative of a physical condition of a respective burl.
  • the image data 404 may be another example of a second type of burl data, which includes a single image 434 of multiple burls 402a-402n (e.g., unlike multiple images of a single burl used in the first type of burl data described above at least with reference to FIG. 3A or FIG. 3B).
  • the image 434 may be obtained at a low resolution (e.g., a second resolution or magnification that is lower than the first resolution used in the first type of burl data).
  • the image 434 may be obtained using an image capture device such as a white-light interferometric microscope.
  • obtaining a low resolution image of multiple burls consumes significantly less time than obtaining high resolution images (e.g., the intensity image or the height image).
  • the second burl-condition predictor 450 may be trained using (a) a number of intensity images of a burl, or (b) a number of low resolution images, and their associated condition data as training data.
  • the condition data to be used as the training data may be obtained using the first burl-condition predictor 350. That is, the output of the first burl-condition predictor 350 may be used as an input for training the second burl-condition predictor 450.
  • image data having high resolution images (e.g., intensity image and height image) of a number of burls may be input to the first burl-condition predictor 350 to obtain the condition data for each of the burls.
  • Thereafter, (a) each intensity image of a burl may be associated with the corresponding physical condition, or (b) each burl in a low resolution image may be associated with the corresponding physical condition, which may then be used as the training data for the second burl-condition predictor 450. Additional details with respect to the training process are described below at least with reference to Figures 7 and 8. The following description illustrates training of the first burl-condition predictor 350 with reference to Figures 5, 6A and 6B.
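The teacher-student style labeling described above (output of the first predictor used as training labels for the second predictor) can be sketched as follows; the stub teacher, the record format, and the function names are assumptions for illustration only:

```python
def build_training_set(hi_res_images, cheap_inputs, first_predictor):
    """Label each burl with the trained first (high resolution) predictor,
    then pair those labels with the cheaper second-type inputs."""
    labels = [first_predictor(img) for img in hi_res_images]
    return list(zip(cheap_inputs, labels))

# Stub teacher: classifies from a fake "mean_height" field (hypothetical)
teacher = lambda img: "healthy" if img["mean_height"] > 0.5 else "contaminated"

hi_res = [{"mean_height": 0.9}, {"mean_height": 0.2}]
cheap  = ["intensity_a", "intensity_b"]   # e.g., intensity-only images
pairs = build_training_set(hi_res, cheap, teacher)
# pairs: labeled training data for the second burl-condition predictor
```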
  • FIG. 5 is a system 500 for training a first burl-condition predictor 350 using a first type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
  • FIG. 6A is a flow chart of a process 600 of training the first burl-condition predictor 350 using a first type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
  • the training is based on a first type of burl data including multiple high resolution images for a burl.
  • a first type of burl data such as image data 504a-504n and associated condition data 514a-514n are obtained for a number of burls 502a-502n.
  • the burls 502a-502n may be similar to the burls 20 of FIG. 2A or FIG. 2B.
  • Each image data 504a-504n (e.g., a first image data 504a) may be similar to image data 304 of FIG. 3A or FIG. 3B and may include multiple high resolution images (e.g., a first resolution or magnification above a specified threshold) of the respective burl (e.g., a first burl 502a).
  • the first image data 504a associated with the first burl 502a may include an intensity image (e.g., similar to intensity image 324a) and a height image (e.g., similar to height image 324b) of the first burl 502a.
  • the first image data 504a may include a RGB image (e.g., similar to image 334) generated using a combination of the intensity image, the height image and a delta image (e.g., similar to the delta image 324c).
  • the first image data 504a may include a distorted image or a transformed image of the intensity image, height image or the delta image.
  • a number of types of distortion (e.g., optical distortion or other optical aberration that retains the characteristics of the physical conditions of the burls) or transformation (e.g., cropping, rotating, or other transformation) may be applied to generate the distorted or transformed images.
  • distorted or transformed images may be used to improve the coverage or accuracy of the ML model.
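A minimal sketch of generating such transformed variants, assuming numpy arrays; rotations and a center crop stand in for the general distortions and transformations named above:

```python
import numpy as np

def augment(image):
    """Generate rotated and cropped variants of a burl image that retain
    the characteristics of its physical condition."""
    variants = [np.rot90(image, k) for k in range(4)]   # 0/90/180/270 degrees
    h, w = image.shape
    variants.append(image[1:h - 1, 1:w - 1])            # center crop
    return variants

img = np.arange(16, dtype=float).reshape(4, 4)
augmented = augment(img)
# 4 rotations plus one crop -> 5 training variants per image
```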
  • the high resolution images may be obtained using an image capture device, such as a white-light interferometric microscope.
  • the first image data 504a is associated with a first condition data 514a that is indicative of a physical condition of the first burl 502a.
  • the physical condition may include healthy, contaminated, cracked, scratched, hot, edge wear, or another condition.
  • the first condition data 514a may be obtained in a number of ways.
  • the first condition data 514a may be obtained by visually inspecting the multiple images in the first image data 504a of the first burl 502a.
  • the first burl-condition predictor 350 may be trained using the image data 504a-504n and associated condition data 514a-514n to predict a physical condition of a burl.
  • the first burl-condition predictor 350 is a machine learning model.
  • the machine learning model is implemented as a neural network (e.g., CNN).
  • FIG. 6B is a flow chart of another process 650 of training the first burl-condition predictor 350 to predict a physical condition of a burl, in accordance with an embodiment.
  • the process 650 is executed as part of the operation P602 of the process 600 of FIG. 6A.
  • the first image data 504a and the first condition data 514a are provided as input to the first burl-condition predictor 350.
  • the first burl-condition predictor 350 generates, based on the first image data 504a, a prediction of condition data 524a that is indicative of a physical condition of the first burl 502a.
  • a cost function 604 that is indicative of a difference between the predicted condition data 524a and the first condition data 514a is determined.
  • parameters of the first burl-condition predictor 350 are adjusted such that the cost function 604 is reduced.
  • the parameters may be adjusted in various ways. For example, the parameters may be adjusted based on a gradient descent method.
  • At an operation P606, a determination is made as to whether a training condition is satisfied. If the training condition is not satisfied, the process 650 is executed again with the same image data or a next image data (e.g., a second image data 504b) and an associated condition data.
  • the process 650 is executed with the same or a different image data and associated condition data iteratively until the training condition is satisfied.
  • the training condition may be satisfied when the cost function 604 is minimized, the rate at which the cost function 604 reduces is below a threshold value, the process 650 (e.g., operations P603-P606) is executed for a predefined number of iterations, or other such condition.
  • the process 650 may conclude when the training condition is satisfied.
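Operations P603-P606 amount to an iterative gradient-descent loop with a stopping condition; the one-parameter model and squared-error cost below are a toy stand-in for the CNN and the cost function 604, used only to illustrate the loop structure:

```python
def train(examples, lr=0.1, tol=1e-6, max_iters=1000):
    """Iteratively adjust a single model parameter so the cost is reduced,
    stopping when the cost is minimized, improvement stalls, or the
    iteration budget is exhausted (the training condition)."""
    w = 0.0                                    # model parameter
    prev_cost = float("inf")
    for _ in range(max_iters):
        # cost: mean squared difference between prediction and label
        cost = sum((w * x - y) ** 2 for x, y in examples) / len(examples)
        if cost < tol or prev_cost - cost < tol:
            break                              # training condition satisfied
        # gradient-descent update reduces the cost function
        grad = sum(2 * x * (w * x - y) for x, y in examples) / len(examples)
        w -= lr * grad
        prev_cost = cost
    return w

w = train([(1.0, 2.0), (2.0, 4.0)])            # learns y = 2x
# w converges close to 2.0
```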
  • the first burl-condition predictor 350 may be used as a trained first burl-condition predictor 350, and may be used to predict a physical condition for any burl using any high resolution images of the burl (e.g., high resolution images that are not previously input to the first burl-condition predictor 350).
  • An example of employing the trained first burl-condition predictor 350 is discussed with respect to FIG. 3A and FIG. 3B above.
  • FIG. 7 is a system 700 for training a second burl-condition predictor 450 using a second type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
  • FIG. 8A is a flow chart of a process 800 of training the second burl-condition predictor 450 using a second type of burl data to predict a physical condition of a burl, in accordance with an embodiment. The training is based on a second type of burl data including a low resolution image of multiple burls.
  • the second burl-condition predictor 450 is trained using the condition data predicted by the first burl-condition predictor 350.
  • a first type of burl data, which includes image data 704a-704n, is obtained for a number of burls 702a-702n.
  • the burls 702a-702n may be similar to the burls 20 of FIG. 2A or FIG. 2B.
  • Each image data 704a-704n (e.g., a first image data 704a) may be similar to image data 504 of FIG. 5 and may include multiple high resolution images (e.g., a first resolution or magnification above a specified threshold) of the respective burl (e.g., a first burl 702a).
  • the high resolution images may be obtained using an image capture device, such as a white-light interferometric microscope.
  • the image data 704a-704n are input to a trained first burl-condition predictor 350 to obtain predicted condition data 714a-714n of the burls 702a-702n.
  • a first condition data 714a may be indicative of a physical condition of the first burl 702a.
  • image data 724, which is a second type of burl data including a low resolution image of the burls 702a-702n, is obtained.
  • the image data 724 is a single image of multiple burls 702a-702n obtained at a low resolution (e.g., a second resolution or magnification that is lower than the first resolution used in the first type of burl data).
  • the image data 724 is similar to the image data 404 of FIG. 4A and FIG. 4B.
  • the second burl-condition predictor 450 may be trained using the image data 724 and associated condition data 714a-714n as the training data to predict a physical condition of one or more burls.
  • the second burl-condition predictor 450 is a machine learning model.
  • the machine learning model is implemented as a neural network (e.g., CNN).
  • FIG. 8B is a flow chart of another process 850 of training the second burl-condition predictor 450 to predict a physical condition of one or more burls, in accordance with an embodiment.
  • the process 850 is executed as part of the operation P803 of the process 800 of FIG. 8A.
  • condition data 714a-714n predicted by the first burl-condition predictor 350 is associated with the burls in the low resolution image 724.
  • Each image of a burl either in the first type of data or the second type of data may be associated with location data that is representative of a location of the burl on the substrate holder.
  • the condition data 714a-714n predicted by the first burl-condition predictor 350 may be associated with the burls in the low resolution image 724 based on the location data.
  • location data of the first burl 702a may be obtained from the first image data 704a associated with the first burl 702a and used to locate the first burl 702a in the low resolution image 724.
  • the first condition data 714a is associated with the first burl 702a in the low resolution image 724.
  • the other burls in the low resolution image 724 may be associated with the corresponding condition data 714a-714n.
  • While the above describes associating condition data 714a-714n with the burls in a single low resolution image, different low resolution images of the burls 702a-702n may be obtained and associated with the condition data 714a-714n accordingly for preparing multiple low resolution images for training the second burl-condition predictor 450.
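The location-based association described above can be sketched as a simple lookup keyed on burl location; the coordinate keys and dictionary record format are assumptions for illustration:

```python
def label_low_res_burls(low_res_burl_locations, hi_res_records):
    """Attach the condition predicted from each burl's high resolution
    images to the same burl in the low resolution image, matched by its
    location on the substrate holder."""
    return {loc: hi_res_records[loc] for loc in low_res_burl_locations}

# Conditions predicted by the first predictor, keyed by (x, y) location
hi_res_records = {(0, 0): "healthy", (0, 1): "cracked", (1, 0): "healthy"}
# Burl locations detected in the low resolution image
low_res_locations = [(0, 0), (0, 1), (1, 0)]
labels = label_low_res_burls(low_res_locations, hi_res_records)
# labels now pairs each burl in the low resolution image with its condition
```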
  • the image data 724 and the condition data 714a-714n are provided as input to the second burl-condition predictor 450.
  • the second burl-condition predictor 450 generates, based on the image data 724, a prediction of condition data 722a-722n that is indicative of a physical condition of the burls 702a-702n.
  • condition data 722a-722n includes a predicted first condition data 722a that is indicative of a physical condition of the first burl 702a.
  • a cost function 806 that is indicative of a difference between the predicted first condition data 722a and the first condition data 714a is determined.
  • parameters of the second burl-condition predictor 450 are adjusted such that the cost function 806 is reduced.
  • the parameters may be adjusted in various ways. For example, the parameters may be adjusted based on a gradient descent method.
  • At an operation P808, a determination is made as to whether a training condition is satisfied. If the training condition is not satisfied, the process 850 is executed again with the same image data or a next image data (e.g., a second low resolution image data) and associated condition data. The process 850 is executed with the same or a different image data and associated condition data iteratively until the training condition is satisfied.
  • the training condition may be satisfied when the cost function 806 is minimized, the rate at which the cost function 806 reduces is below a threshold value, the process 850 (e.g., operations P805-P808) is executed for a predefined number of iterations, or other such condition. The process 850 may conclude when the training condition is satisfied.
  • the second burl-condition predictor 450 may be used as a trained second burl-condition predictor 450, and may be used to predict a physical condition for one or more burls using any low resolution image of the burls (e.g., low resolution images that are not previously input to the second burl-condition predictor 450).
  • An example of employing the trained second burl-condition predictor 450 is discussed with respect to FIG. 4A and FIG. 4B above.
  • While FIGs. 7, 8A and 8B describe training the second burl-condition predictor 450 using a low resolution image of multiple burls, the second type of burl data may also include multiple low resolution images in which each low resolution image has a single burl instead of multiple burls like the low resolution image 724.
  • each low resolution image of one of the burls 702a-702n may be associated with the corresponding condition data from the condition data 714a-714n predicted by the first burl-condition predictor 350 to generate a labeled set of low resolution images, which may be used as training data to train the second burl-condition predictor 450.
  • the second type of burl data may include multiple high resolution intensity images in which each intensity image has a single burl unlike the low resolution image 724.
  • each intensity image of one of the burls 702a-702n may be associated with the corresponding condition data from the condition data 714a-714n predicted by the first burl-condition predictor 350 to generate a labeled set of intensity images, which may be used as training data to train the second burl-condition predictor 450.
  • FIG. 9 is a block diagram of a system 900 for generating a prediction of a physical condition of a burl using a third type of burl data, in accordance with an embodiment.
  • the system 900 includes a third burl-condition predictor 950 that is configured to predict condition data 914 of a burl 902 based on burl data 904 of the burl 902.
  • the burl 902 may be similar to the burl 20 of FIG. 2A or FIG. 2B.
  • the condition data 914 may be indicative of a physical condition of the burl 902.
  • the burl data 904 input to the third burl-condition predictor 950 may be a third type of burl data, which includes surface descriptive parameters of the burl 902 that are indicative of physical properties of a surface of the substrate holder, such as a surface roughness.
  • a number of surface descriptive parameters may be used to characterize a burl. For example, a first surface descriptive parameter may be indicative of a maximum height difference between a burl and the remaining burls. In another example, a second surface descriptive parameter may be indicative of an arithmetic average of burl heights measured as a line across a surface.
  • a third surface descriptive parameter may be indicative of a measure of symmetry, or lack of symmetry, of burls with reference to a reference point on the surface.
  • Each surface descriptive parameter may have a value in a specified range (e.g., 0 to 1, or another range) and different parameters may have values in different ranges.
  • the surface descriptive parameters may be obtained in a number of ways. For example, an image processing application may be used to analyze images of a burl (e.g., high resolution images of the burl such as intensity image and height image) to derive the values of various surface descriptive parameters.
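The derivation of surface descriptive parameters from a high resolution height image can be sketched as follows. This is a minimal illustration, not the patent's implementation: the three parameters computed here (peak-to-valley height difference, arithmetic average of absolute deviations, and skewness as a symmetry measure) are hypothetical analogues of the three parameters described above, and the function name is an assumption.

```python
import numpy as np

def surface_descriptive_parameters(height_map: np.ndarray) -> dict:
    """Derive illustrative surface descriptive parameters from a 2-D
    height image of a burl region (hypothetical analogues of the
    parameters described in the text)."""
    deviations = height_map - height_map.mean()
    # Maximum peak-to-valley height difference across the measured surface.
    max_height_diff = float(height_map.max() - height_map.min())
    # Arithmetic average of absolute height deviations (an Ra-like roughness).
    arithmetic_average = float(np.abs(deviations).mean())
    # Skewness of the height distribution as a measure of (a)symmetry.
    std = deviations.std()
    symmetry = float((deviations ** 3).mean() / std ** 3) if std > 0 else 0.0
    return {
        "max_height_diff": max_height_diff,
        "arithmetic_average": arithmetic_average,
        "symmetry": symmetry,
    }
```

In practice such values would be computed per burl from the intensity and height images by the image processing application, then normalized into the specified ranges.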
  • the third burl-condition predictor 950 may be a machine learning model (e.g., a CNN) that is trained to predict the condition data of a burl based on the surface descriptive parameters.
  • the present disclosure is not limited to any specific type of neural network of the machine learning model.
  • the third burl-condition predictor 950 may be trained using surface descriptive parameters of a number of burls and their associated condition data as training data.
  • the following description illustrates training of the third burl-condition predictor 950 with reference to FIGs. 10, 11A and 11B.
  • FIG. 10 is a system 1000 for training a third burl-condition predictor 950 using a third type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
  • FIG. 11A is a flow chart of a process 1100 of training the third burl-condition predictor 950 using a third type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
  • the training is based on a third type of burl data including multiple surface descriptive parameters for a burl.
  • a third type of burl data 1004a-1004n and associated condition data 1014a-1014n are obtained for a number of burls 1002a-1002n.
  • the burls 1002a-1002n may be similar to the burls 20 of FIG. 2A or FIG. 2B.
  • Each burl data 1004a-1004n (e.g., a first burl data 1004a) may be similar to the burl data 904 of FIG. 9 and may include values of multiple surface descriptive parameters of the respective burl (e.g., a first burl 1002a).
  • the first burl data 1004a associated with the first burl 1002a may include values of multiple surface descriptive parameters of the first burl 1002a.
  • the surface descriptive parameters may be obtained in a number of ways.
  • an image processing application may be used to analyze images of a burl (e.g., high resolution images such as intensity image and height image of the burl) to derive the values of various surface descriptive parameters.
  • the first burl data 1004a is associated with a first condition data 1014a that is indicative of a physical condition of the first burl 1002a.
  • the physical condition may include healthy, contaminated, cracked, scratched, hot, edge wear, blister, or another condition.
  • the first condition data 1014a may be obtained in a number of ways. For example, different ranges of values of different parameters may be representative of different physical conditions. For example, a user may define the “healthy” condition to include values in a first set of ranges for a first set of surface descriptive parameters, a contaminated condition to include values in a second set of ranges for a second set of surface descriptive parameters and so on.
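The range-based labeling described above can be sketched as a small rule table. The condition names, parameter names, and numeric ranges below are all hypothetical placeholders for the user-defined ranges mentioned in the text:

```python
# Hypothetical per-condition ranges for surface descriptive parameters;
# a burl matches a condition when every listed value falls in its range.
CONDITION_RANGES = {
    "healthy": {"max_height_diff": (0.0, 0.2), "arithmetic_average": (0.0, 0.1)},
    "contaminated": {"max_height_diff": (0.2, 1.0), "arithmetic_average": (0.0, 1.0)},
}

def label_condition(parameters: dict) -> str:
    """Assign condition data to a burl from its surface descriptive
    parameter values using user-defined ranges (illustrative only)."""
    for condition, ranges in CONDITION_RANGES.items():
        if all(lo <= parameters.get(name, 0.0) <= hi
               for name, (lo, hi) in ranges.items()):
            return condition
    return "unknown"
```

Labels produced this way would pair each burl data 1004a-1004n with its condition data 1014a-1014n for training.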
  • the third burl-condition predictor 950 may be trained using the burl data 1004a-1004n and associated condition data 1014a-1014n to predict a physical condition of a burl.
  • the third burl-condition predictor 950 is a machine learning model.
  • the machine learning model is implemented as a neural network (e.g., CNN).
  • FIG. 11B is a flow chart of another process 1150 of training the third burl-condition predictor 950 to predict a physical condition of a burl, in accordance with an embodiment.
  • the process 1150 is executed as part of the operation P1102 of the process 1100 of FIG. 11A.
  • the first burl data 1004a and the first condition data 1014a are provided as input to the third burl-condition predictor 950.
  • the third burl-condition predictor 950 generates, based on the first burl data 1004a, a prediction of condition data 1024a that is indicative of a physical condition of the first burl 1002a.
  • a cost function 1104 that is indicative of a difference between the predicted condition data 1024a and the first condition data 1014a is determined.
  • parameters of the third burl-condition predictor 950 are adjusted such that the cost function 1104 is reduced.
  • the parameters may be adjusted in various ways. For example, the parameters may be adjusted based on a gradient descent method.
  • at an operation P1106, a determination is made as to whether a training condition is satisfied. If the training condition is not satisfied, the process 1150 is executed again with the same burl data or a next burl data (e.g., a second burl data 1004b) and an associated condition data. The process 1150 is executed with the same or a different burl data and associated condition data iteratively until the training condition is satisfied.
  • the training condition may be satisfied when the cost function 1104 is minimized, the rate at which the cost function 1104 reduces is below a threshold value, the process 1150 (e.g., operations P1103-P1106) is executed for a predefined number of iterations, or other such condition.
  • the process 1150 may conclude when the training condition is satisfied.
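The iterative loop above (predict, evaluate the cost function, adjust parameters, check the training condition) can be sketched with a simple gradient-descent update. This is a hedged stand-in: a logistic-regression predictor replaces the neural-network burl-condition predictor, and the function name, learning rate, and stopping tolerance are assumptions, not values from the disclosure.

```python
import numpy as np

def train_predictor(X, y, lr=0.1, max_iters=500, tol=1e-6):
    """Illustrative version of the loop of process 1150: X holds one row of
    surface descriptive parameter values per burl, y holds binary condition
    labels. Returns learned weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    prev_cost = np.inf
    for _ in range(max_iters):
        # Generate predicted condition data from the burl data.
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        # Cost function indicative of the prediction/label difference.
        cost = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        # Training condition: stop once the cost no longer reduces meaningfully.
        if prev_cost - cost < tol:
            break
        prev_cost = cost
        # Adjust parameters by gradient descent so the cost is reduced.
        grad_w = X.T @ (p - y) / len(y)
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

A real burl-condition predictor would be a multi-class neural network trained the same way, with the gradient step delegated to an optimizer.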
  • the third burl-condition predictor 950 may be used as a trained third burl-condition predictor 950, and may be used to predict a physical condition for any burl using its surface descriptive parameter values.
  • An example of employing the trained third burl-condition predictor 950 is discussed with respect to FIG. 9 above.
  • a physical condition of a burl may be determined as a function of a physical condition of the burl determined using various burl-condition predictors.
  • a physical condition of a first burl may be determined as a function of a first physical condition of the first burl determined using the surface descriptive parameters and a second physical condition determined using the low resolution image or the high resolution images of the first burl.
  • the physical condition determined using a burl-condition predictor may be associated with a confidence score, which is indicative of a likelihood or a probability of the burl having the predicted physical condition.
  • the manner in which the confidence scores are used to determine the resultant physical condition of the burl may be defined in a number of ways. For example, the physical condition associated with the highest confidence score among the physical conditions predicted using various burl-condition predictors may be determined to be the resultant physical condition of a burl. In another example, the physical condition associated with a confidence score that satisfies a confidence score threshold (e.g., exceeds a confidence score threshold) among the physical conditions predicted using various burl-condition predictors may be determined to be the resultant physical condition.
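Both combination policies described above can be sketched in a few lines. The function name and the `(condition, confidence)` pair interface are hypothetical; only the two selection rules come from the text:

```python
def resolve_condition(predictions, threshold=None):
    """Combine per-predictor (condition, confidence) pairs into a resultant
    physical condition. With no threshold, the highest-confidence prediction
    wins; with a threshold, the first condition whose confidence satisfies
    the threshold is returned (None if no prediction qualifies)."""
    if threshold is not None:
        for condition, score in predictions:
            if score >= threshold:
                return condition
        return None
    return max(predictions, key=lambda cs: cs[1])[0]
```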
  • the physical conditions of the burls may be used for various purposes. For example, since the physical conditions of the burls may be indicative of a wear and tear of the substrate holder, the type of damage caused to the substrate holder may be determined and an appropriate remedial action (e.g., repair, refurbish, or replace) may be performed with respect to the substrate holder to improve the life or performance of the substrate holder, thereby minimizing any errors that may be caused due to a faulty substrate holder in patterning a substrate.
  • the physical conditions may be used in determination of a wafer load grid (WLG) associated with the substrate holder, which is typically indicative of a performance of the substrate holder.
  • a WLG error may be a shift in a direction generally parallel to a support plane/surface of the substrate holder. Where the object is supported by one or more burls, the WLG error may be due to a relatively high friction between the substrate and the top of the one or more burls of the substrate holder. Contamination may be a cause for issues associated with WLG error, but WLG error may also arise from other causes (e.g., overly smooth surfaces).
  • WLG error is determined based on one or more metrology metrics associated with the substrate table. The burl physical condition may also be used as one of the metrics to determine the WLG error.
  • the embodiments may also be used to determine physical conditions of the surface of other object holders of the lithographic apparatus (e.g., a mask holder that is used to hold a mask of a pattern to be printed on the substrate, or other object holder).
  • a first ML model (e.g., like the first burl-condition predictor 350) may be trained to predict a first feature (e.g., number on a license plate) of a subject (e.g., a car) using a high resolution image of (a) the subject or (b) an environment having the subject.
  • a second ML model (e.g., like the second burl-condition predictor 450) may be trained to predict the first feature of the subject (e.g., the number on the license plate of the car) using a low resolution image of (a) the subject or (b) an environment having the subject.
  • the training data for the second ML model (e.g., low resolution images labeled with the number on the license plate) may be generated based on the features predicted by the first ML model. For example, a first set of high resolution images of the subject may be input to a trained first ML model to obtain a prediction of the first set of features (the first ML model, like the first burl-condition predictor 350, may be trained with labeled high resolution images of the subject).
  • a second set of low resolution images of the subject may be obtained and associated with the first set of features obtained from the first ML model to generate the training data for training the second ML model.
  • the second ML model, like the second burl-condition predictor 450, may be trained using the training data to predict a feature of a subject using a low resolution image of the subject.
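The two-stage pipeline above, in which the trained first model labels low resolution images for the second model, can be sketched as follows. The function name and the callable-model interface are assumptions for illustration; strings stand in for images in the usage example:

```python
def build_second_model_training_data(high_res_images, low_res_images, first_model):
    """Generate training data for the second ML model: run the trained
    first model on high resolution images and pair each predicted feature
    with the corresponding low resolution image of the same subject."""
    features = [first_model(img) for img in high_res_images]
    # Each low resolution image is associated with the feature predicted
    # from the paired high resolution image.
    return list(zip(low_res_images, features))
```

The resulting labeled pairs would then be fed into the second model's training loop, avoiding the need to hand-label any low resolution images.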
  • FIG. 12 is a block diagram that illustrates a computer system 100 which can assist in implementing the systems and methods disclosed herein.
  • Computer system 100 includes a bus 102 or other communication mechanism for communicating information, and a processor 104 (or multiple processors 104 and 105) coupled with bus 102 for processing information.
  • Computer system 100 also includes a main memory 106, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 102 for storing information and instructions to be executed by processor 104.
  • Main memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104.
  • Computer system 100 further includes a read only memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104.
  • a storage device 110, such as a magnetic disk or optical disk, is provided and coupled to bus 102 for storing information and instructions.
  • Computer system 100 may be coupled via bus 102 to a display 112, such as a cathode ray tube (CRT) or flat panel or touch panel display for displaying information to a computer user.
  • An input device 114 is coupled to bus 102 for communicating information and command selections to processor 104.
  • Another type of user input device is cursor control 116, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112.
  • This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • a touch panel (screen) display may also be used as an input device.
  • portions of the optimization process may be performed by computer system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another computer-readable medium, such as storage device 110. Execution of the sequences of instructions contained in main memory 106 causes processor 104 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in main memory 106. In an alternative embodiment, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, the description herein is not limited to any specific combination of hardware circuitry and software.
  • Non-volatile media include, for example, optical or magnetic disks, such as storage device 110.
  • Volatile media include dynamic memory, such as main memory 106.
  • Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus 102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications.
  • Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
  • Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution.
  • the instructions may initially be borne on a magnetic disk of a remote computer.
  • the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • a modem local to computer system 100 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal.
  • An infrared detector coupled to bus 102 can receive the data carried in the infrared signal and place the data on bus 102.
  • Bus 102 carries the data to main memory 106, from which processor 104 retrieves and executes the instructions.
  • the instructions received by main memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.
  • Computer system 100 also preferably includes a communication interface 118 coupled to bus 102.
  • Communication interface 118 provides a two-way data communication coupling to a network link 120 that is connected to a local network 122.
  • communication interface 118 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line.
  • communication interface 118 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links may also be implemented.
  • communication interface 118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • Network link 120 typically provides data communication through one or more networks to other data devices.
  • network link 120 may provide a connection through local network 122 to a host computer 124 or to data equipment operated by an Internet Service Provider (ISP) 126.
  • ISP 126 in turn provides data communication services through the worldwide packet data communication network, now commonly referred to as the “Internet” 128.
  • Internet 128 uses electrical, electromagnetic or optical signals that carry digital data streams.
  • the signals through the various networks and the signals on network link 120 and through communication interface 118, which carry the digital data to and from computer system 100, are exemplary forms of carrier waves transporting the information.
  • Computer system 100 can send messages and receive data, including program code, through the network(s), network link 120, and communication interface 118.
  • a server 130 might transmit a requested code for an application program through Internet 128, ISP 126, local network 122 and communication interface 118.
  • One such downloaded application may provide for the illumination optimization of the embodiment, for example.
  • the received code may be executed by processor 104 as it is received, and/or stored in storage device 110, or other non-volatile storage for later execution. In this manner, computer system 100 may obtain application code in the form of a carrier wave.
  • a non-transitory computer-readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • the second type of burl data includes an image of one or more of the multiple burls, wherein the image is of an image type, the image type being different from or a subset of image types of the images in the first type of burl data.
  • the images of the first burl in the first type of burl data include an intensity image, a height image, and a delta image of the first burl, wherein the delta image is representative of a difference between a first image of the first burl obtained prior to performing a process using the lithographic apparatus on an object held by the object holder and a second image of the first burl obtained after performing the process.
  • obtaining the first plurality of condition data includes: providing the plurality of first type of burl data to the first machine learning model that is trained to predict condition data representative of a physical condition of a burl of the burls, wherein the plurality of first type of burl data includes first burl data of a first burl of the burls, the first burl data including multiple images of the first burl, wherein the images are of a first resolution; and obtaining the first plurality of condition data, the first plurality of condition data including first condition data that is representative of a physical condition of the first burl.
  • obtaining the first plurality of condition data further includes: obtaining second burl data of the plurality of second type of burl data, wherein the second burl data includes an image of one or more of the multiple burls, wherein the image is of a second resolution lower than the first resolution; and associating the first plurality of condition data with the corresponding burls in the image.
  • associating the first plurality of condition data includes: obtaining, using the first burl data, location data representative of a location of the first burl on the object holder; determining the first burl in the image of the second burl data based on the location data; and associating the first condition data with the first burl in the image of the second burl data.
  • obtaining the first plurality of condition data includes: obtaining a set of first type of burl data associated with the burls, wherein the set of first type of burl data includes third burl data of a third burl of the burls, wherein the third burl data is associated with third condition data that is representative of the physical condition of the third burl; and training, based on the set of first type of burl data, the first machine learning model to predict condition data for the third burl data such that a cost function that is indicative of a difference between the predicted condition data and the third condition data is reduced.
  • training the second machine learning model is an iterative process in which each iteration includes:
  • the specified burl data includes an intensity image of a specified burl of the one or more burls, and wherein the specified condition data is representative of a physical condition of the specified burl.
  • the specified burl data includes a specified image of multiple burls of the one or more burls, and wherein the specified condition data is representative of physical conditions of the burls.
  • the computer-readable medium of clause 1 further comprising: obtaining a plurality of third type of burl data associated with the burls, wherein the plurality of third type of burl data includes fourth burl data of a fourth burl of the burls, the fourth burl data associated with fourth condition data that is representative of a physical condition of the fourth burl; and training, based on the plurality of third type of burl data, a third machine learning model to predict condition data such that a cost function that is indicative of a difference between the predicted condition data and the fourth condition data is reduced.
  • a non-transitory computer-readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced
  • a non-transitory computer-readable medium having instructions that, when executed by a computer, cause the computer to execute a method for determining a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
  • a non-transitory computer-readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • a non-transitory computer-readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a feature associated with a subject of an image, the method comprising: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of
  • a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus comprising: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus comprising: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced; providing a second plurality of first type of burl data associated with the burls as input to the first machine learning model to obtain a second plurality of condition data as second condition data that is representative of physical conditions of the burls; obtaining second training data comprising (a) a plurality of second type of burl data
  • a method for determining a physical condition of an object holder of a lithographic apparatus comprising: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
  • a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus comprising: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • a method for training a machine learning model to determine a feature associated with a subject of an image comprising: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of feature data, a second machine learning model to predict feature data representative of a feature of a subject of the subjects.
  • An apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus comprising: a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • An apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus comprising: a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced; providing a second plurality of first type of burl data associated with the burls as input to the first machine learning model to obtain a second plurality of condition data as second condition data that is representative of physical conditions of the burls; obtaining second training data comprising a plurality of second type of burl data associated with the burls, wherein the plurality of second type of burl data includes second burl data, the second burl data associated with the second condition data; and training, based on the second training data, a second machine learning model to predict condition data for the second burl data such that a cost function that is indicative of a difference between the predicted condition data and the second condition data is reduced.
  • An apparatus for determining a physical condition of an object holder of a lithographic apparatus comprising: a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
  • An apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus comprising: a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
  • An apparatus for training a machine learning model to determine a feature associated with a subject of an image comprising: a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of feature data, a second machine learning model to predict feature data representative of a feature of a subject of the subjects.
  • while the concepts disclosed herein may be used for imaging on a substrate such as a silicon wafer, it shall be understood that the disclosed concepts may be used with any type of lithographic imaging system, e.g., those used for imaging on substrates other than silicon wafers.
  • the terms “optimizing” and “optimization” as used herein refer to or mean adjusting a patterning apparatus (e.g., a lithography apparatus), a patterning process, etc. such that results and/or processes have more desirable characteristics, such as higher accuracy of projection of a design pattern on a substrate, a larger process window, etc.
  • optimization refers to or means a process that identifies one or more values for one or more parameters that provide an improvement, e.g., a local optimum, in at least one relevant metric, compared to an initial set of one or more values for those one or more parameters. “Optimum” and other related terms should be construed accordingly. In an embodiment, optimization steps can be applied iteratively to provide further improvements in one or more metrics.
  • an embodiment may be implemented by one or more appropriate computer programs which may be carried on an appropriate carrier medium which may be a tangible carrier medium (e.g., a disk) or an intangible carrier medium (e.g., a communications signal).
  • Embodiments of the invention may be implemented using suitable apparatus which may specifically take the form of a programmable computer running a computer program arranged to implement a method as described herein.
  • embodiments of the disclosure may be implemented in hardware, firmware, software, or any combination thereof.
  • Embodiments of the disclosure may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors.
  • a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device).
  • a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.
  • firmware, software, routines, or instructions may be described herein as performing certain actions. However, it should be appreciated that such descriptions are merely for convenience and that such actions in fact result from computing devices, processors, controllers, or other devices executing the firmware, software, routines, instructions, etc.
  • illustrated components are depicted as discrete functional blocks, but embodiments are not limited to systems in which the functionality described herein is organized as illustrated.
  • the functionality provided by each of the components may be provided by software or hardware modules that are differently organized than is presently depicted, for example such software or hardware may be intermingled, conjoined, replicated, broken up, distributed (e.g., within a data center or geographically), or otherwise differently organized.
  • the functionality described herein may be provided by one or more processors of one or more computers executing code stored on a tangible, non-transitory, machine readable medium.
  • third party content delivery networks may host some or all of the information conveyed over networks, in which case, to the extent information (e.g., content) is said to be supplied or otherwise provided, the information may be provided by sending instructions to retrieve that information from a content delivery network.
  • a component may include A or B, then, unless specifically stated otherwise or infeasible, the component may include A, or B, or A and B.
  • the component may include A, or B, or C, or A and B, or A and C, or B and C, or A and B and C.
  • conditional relationships are not limited to consequences that instantly follow the antecedent obtaining, as some consequences may be delayed, and in conditional statements, antecedents are connected to their consequents, e.g., the antecedent is relevant to the likelihood of the consequent occurring.
  • Statements in which a plurality of attributes or functions are mapped to a plurality of objects encompass both all such attributes or functions being mapped to all such objects and subsets of the attributes or functions being mapped to subsets of the objects (e.g., both all processors each performing steps A-D, and a case in which processor 1 performs step A, processor 2 performs step B and part of step C, and processor 3 performs part of step C and step D), unless otherwise indicated.
  • statements that one value or action is “based on” another condition or value encompass both instances in which the condition or value is the sole factor and instances in which the condition or value is one factor among a plurality of factors.
  • statements that “each” instance of some collection has some property should not be read to exclude cases where some otherwise identical or similar members of a larger collection do not have the property, i.e., each does not necessarily mean each and every. References to selection from a range include the end points of the range.
  • any processes, descriptions or blocks in flowcharts should be understood as representing modules, segments or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of the exemplary embodiments of the present advancements in which functions can be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending upon the functionality involved, as would be understood by those skilled in the art.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Condensed Matter Physics & Semiconductors (AREA)
  • Manufacturing & Machinery (AREA)
  • Computer Hardware Design (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Power Engineering (AREA)
  • Exposure And Positioning Against Photoresist Photosensitive Materials (AREA)

Abstract

Described are embodiments for predicting a physical condition of a burl of an object holder of a lithographic apparatus. Images of the burls are provided as an input to a machine learning (ML) model to generate a prediction of a physical condition of the burls. Various ML models may be used to generate the prediction. A first ML model generates the prediction using a first type of burl data, e.g., high-resolution images of a burl. A second ML model generates the prediction using a second type of burl data, e.g., a subset of the high-resolution images of a burl or a low-resolution image of the burls. A third ML model generates the prediction using a third type of burl data, e.g., surface descriptive parameters of a burl that are indicative of properties of a surface of the object holder.

Description

TRAINING A MACHINE LEARNING MODEL FOR DETERMINING BURL ISSUES
TECHNICAL FIELD
[0001] The description herein relates to object holders for use in a lithographic apparatus, and more particularly to determining issues with the surface of the object holders.
BACKGROUND
[0002] A lithographic apparatus is a machine constructed to apply a desired pattern onto a substrate. A lithographic apparatus can be used, for example, in the manufacture of integrated circuits (ICs). A lithographic apparatus may, for example, project a pattern (also often referred to as “design layout” or “design”) of a patterning device (e.g., a mask) onto a layer of radiation-sensitive material (resist) provided on a substrate (e.g., a wafer).
[0003] As semiconductor manufacturing processes continue to advance, the dimensions of circuit elements have continually been reduced while the amount of functional elements, such as transistors, per device has been steadily increasing over decades, following a trend commonly referred to as “Moore’s law”. To keep up with Moore’s law the semiconductor industry is chasing technologies that enable the creation of increasingly smaller features. To project a pattern on a substrate a lithographic apparatus may use electromagnetic radiation. The wavelength of this radiation determines the minimum size of features which are patterned on the substrate. Typical wavelengths currently in use are 365 nm (i-line), 248 nm, 193 nm and 13.5 nm.
[0004] In a lithographic apparatus, an object such as a substrate to be exposed (which may be referred to as a production substrate) is held on an object holder such as a substrate holder (sometimes referred to as a wafer table). The substrate holder may be moveable with respect to the projection system. The substrate holder usually comprises a solid body made of a rigid material and having similar dimensions in plan to the production substrate to be supported. The substrate-facing surface of the solid body is provided with multiple projections (referred to as burls). The distal surfaces of the burls conform to a flat plane and support the substrate. The burls provide several advantages: a contaminant particle on the substrate holder or on the substrate is likely to fall between burls and therefore does not cause a deformation of the substrate; it is easier to machine the burls so their ends conform to a plane than to make the surface of the solid body flat; and the properties of the burls can be adjusted, e.g., to control the clamping of the substrate.
[0005] However, the burls of the substrate holder wear during use, e.g., due to the repeated loading and unloading of substrates. The burls may also be prone to other issues, e.g., cracks, contamination, scratches, etc. Burls that are damaged or otherwise not considered “healthy” may be problematic to lithographic processes. For example, uneven wear of the burls leads to unflatness of the substrate during exposure which can lead to a reduction of the process window and, in extreme cases, to imaging errors. Due to the very precise manufacturing specifications, it is desirable to identify burl issues so that appropriate actions may be employed to repair, refurbish, replace, or improve the life of a substrate holder.
SUMMARY
[0006] In some embodiments, there is provided a non-transitory computer readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus. The method includes: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
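The two-stage approach of this paragraph resembles pseudo-labeling: the first model's predictions serve as training labels for a second model that consumes a cheaper type of burl data. A minimal sketch under invented assumptions (the image sizes, summary features, damage threshold, and logistic-regression second model are all hypothetical stand-ins, not the disclosed implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 100 burls, the first 50 "healthy" (bright tops)
# and the last 50 "worn" (dark tops), imaged at 16x16 resolution.
offsets = np.repeat([0.6, 0.0], 50)
high_res = rng.random((100, 16, 16)) * 0.4 + offsets[:, None, None]

# Stand-in for the first (already-trained) machine learning model: it maps
# high-resolution burl images to condition data (0 = healthy, 1 = worn).
def first_model_predict(images):
    return (images.mean(axis=(1, 2)) < 0.5).astype(int)

# Step 1: obtain the first plurality of condition data from the first model.
condition = first_model_predict(high_res)

# Step 2: obtain a second type of burl data, here cheap per-burl summary
# statistics instead of the full high-resolution images.
second_type = np.stack([high_res.mean(axis=(1, 2)), high_res.std(axis=(1, 2)),
                        high_res.min(axis=(1, 2)), high_res.max(axis=(1, 2))],
                       axis=1)
second_type = (second_type - second_type.mean(axis=0)) / (second_type.std(axis=0) + 1e-9)

# Step 3: train a second model (a tiny logistic regression fit by gradient
# descent) on the second type of burl data against the first model's labels.
w, b = np.zeros(second_type.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(second_type @ w + b)))
    w -= 0.1 * second_type.T @ (p - condition) / len(condition)
    b -= 0.1 * (p - condition).mean()

pred = ((second_type @ w + b) > 0).astype(int)
accuracy = (pred == condition).mean()
```

The second model never sees a manually labeled example; its supervision comes entirely from the first model's output, which is what makes the cheaper second type of burl data usable for prediction.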
[0007] In some embodiments, there is provided a non-transitory computer readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus. The method includes: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced; providing a second plurality of first type of burl data associated with the burls as input to the first machine learning model to obtain a second plurality of condition data as second condition data that is representative of physical conditions of the burls; obtaining second training data comprising a plurality of second type of burl data associated with the burls, wherein the plurality of second type of burl data includes second burl data, the second burl data associated with the second condition data; and training, based on the second training data, a second machine learning model to predict condition data for the second burl data such that a cost function that is indicative of a difference between the predicted condition data and the second condition data is reduced.
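The training criterion described here, reducing a cost function indicative of the difference between predicted and reference condition data, can be sketched with a toy gradient-descent loop (the linear model, per-burl features, and continuous wear scores below are hypothetical illustrations, not the disclosed model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical first type of burl data: 3 measured features for each of
# 200 burls (e.g. height, diameter, roughness proxies).
X = rng.normal(size=(200, 3))
# Hypothetical first condition data: a continuous wear score per burl.
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Train a linear model so that the cost function (mean squared error
# between predicted condition data and first condition data) is reduced.
w = np.zeros(3)
costs = []
for _ in range(200):
    pred = X @ w
    costs.append(((pred - y) ** 2).mean())       # cost before this update
    w -= 0.1 * 2.0 * X.T @ (pred - y) / len(y)   # gradient step on MSE
```

Each iteration moves the parameters along the negative gradient of the cost, so the recorded cost shrinks from its initial value toward the irreducible noise floor of the data.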
[0008] In some embodiments, there is provided a non-transitory computer readable medium having instructions that, when executed by a computer, cause the computer to execute a method for determining a physical condition of an object holder of a lithographic apparatus. The method includes: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
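At inference time, the trained model simply maps specified burl data (images) to condition data. A hypothetical sketch, with hand-picked weights standing in for a trained burl-condition predictor (a production model would be a trained network, not these fixed numbers):

```python
import numpy as np

# Hand-picked parameters standing in for a trained burl-condition predictor.
W = np.array([[-4.0], [0.5]])
B = np.array([2.0])

def predict_condition(burl_images):
    """Map burl images of shape (n, h, w) to condition data: per-burl
    probability that the burl is damaged."""
    feats = np.stack([burl_images.mean(axis=(1, 2)),
                      burl_images.std(axis=(1, 2))], axis=1)
    p = 1.0 / (1.0 + np.exp(-(feats @ W + B)))
    return p.ravel()

# Executing the model on specified burl data generates condition data.
healthy_prob = predict_condition(np.full((1, 8, 8), 0.9))[0]  # bright, intact top
worn_prob = predict_condition(np.full((1, 8, 8), 0.1))[0]     # dark, worn top
```

With these illustrative weights, a dark (worn) burl top receives a high damage probability and a bright (intact) top a low one, which is the shape of output the method's "specified condition data" describes.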
[0009] In some embodiments, there is provided a non-transitory computer readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus. The method includes: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
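A dataset containing multiple images per burl can be handled in at least two common ways: pooling per-view features into one vector per burl, or pairing every view with the burl's condition label. A sketch with invented shapes, features, and labels (the embedding and view counts are illustrative, not the disclosed design):

```python
import numpy as np

rng = np.random.default_rng(2)

# Plurality of image data: 5 images per burl (e.g. different focus or
# illumination settings), 40 burls, 8x8 pixels. All values are synthetic.
images = rng.random((40, 5, 8, 8))
condition = rng.integers(0, 2, size=40)   # toy labels: 0 healthy, 1 damaged

def embed(view):
    # Per-view features; a real system might use a CNN encoder instead.
    return np.array([view.mean(), view.std()])

# Option 1: mean-pool the per-view embeddings so the model sees a single
# feature vector per burl regardless of how many images it has.
pooled = np.array([[embed(v) for v in views] for views in images]).mean(axis=1)

# Option 2: pair each individual view with its burl's condition label,
# multiplying the number of training examples (a simple augmentation).
flat_views = images.reshape(-1, 8, 8)
flat_labels = np.repeat(condition, images.shape[1])
```

Option 1 keeps one training example per burl; option 2 trades that for five times as many (weaker) examples, which can help when labeled burls are scarce.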
[0010] In some embodiments, there is provided a non-transitory computer readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a feature associated with a subject of an image. The method includes: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of feature data, a second machine learning model to predict feature data representative of a feature of a subject of the subjects.
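The association step of this paragraph can be sketched by deriving feature data from high-resolution images and pairing it with lower-resolution images of the same subjects (the downsampling factor, feature definitions, and shapes below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# First type of image data: 16x16 high-resolution images of 10 subjects.
high_res = rng.random((10, 16, 16))

def first_model_features(img):
    # Stand-in for the first model's feature output at high resolution.
    return np.array([img.mean(), img.max() - img.min()])

features = np.array([first_model_features(img) for img in high_res])

# Second type of image data: the same subjects at a lower resolution
# (4x downsampling by block averaging).
low_res = high_res.reshape(10, 4, 4, 4, 4).mean(axis=(2, 4))

# Associate each low-resolution image with the feature data obtained from
# its high-resolution counterpart; these pairs train the second model.
training_pairs = list(zip(low_res, features))
```

The key point is that the second model is trained only on the low-resolution inputs, yet its targets encode information that was only observable at high resolution.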
[0011] In some embodiments, there is provided a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus. The method includes: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
[0012] In some embodiments, there is provided a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus.
The method includes: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced; providing a second plurality of first type of burl data associated with the burls as input to the first machine learning model to obtain a second plurality of condition data as second condition data that is representative of physical conditions of the burls; obtaining second training data comprising a plurality of second type of burl data associated with the burls, wherein the plurality of second type of burl data includes second burl data, the second burl data associated with the second condition data; and training, based on the second training data, a second machine learning model to predict condition data for the second burl data such that a cost function that is indicative of a difference between the predicted condition data and the second condition data is reduced.
[0013] In some embodiments, there is provided a method for determining a physical condition of an object holder of a lithographic apparatus. The method includes: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
[0014] In some embodiments, there is provided a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus. The method includes: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
[0015] In some embodiments, there is provided a method for training a machine learning model to determine a feature associated with a subject of an image. The method includes: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of feature data, a second machine learning model to predict feature data representative of a feature of a subject of the subjects.
[0016] In some embodiments, there is provided an apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus.
The apparatus includes a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
[0017] In some embodiments, there is provided an apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus. The apparatus includes a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced; providing a second plurality of first type of burl data associated with the burls as input to the first machine learning model to obtain a second plurality of condition data as second condition data that is representative of physical conditions of the burls; obtaining second training data comprising a plurality of second type of burl data associated with the burls, wherein the plurality of second type of burl data includes second burl data, the second burl data associated with the second condition data; and training, based on the second training data, a second machine learning model to predict condition data for the second burl data such that a cost function that is indicative of a difference between the predicted condition data and the second condition data is reduced.
[0018] In some embodiments, there is provided an apparatus for determining a physical condition of an object holder of a lithographic apparatus.
The apparatus includes a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
[0019] In some embodiments, there is provided an apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus. The apparatus includes a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
[0020] In some embodiments, there is provided an apparatus for training a machine learning model to determine a feature associated with a subject of an image. The apparatus includes a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of feature data, a second machine learning model to predict feature data representative of a feature of a subject of the subjects.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] FIG. 1 is a schematic diagram of a lithographic projection apparatus, according to an embodiment.
[0022] FIG. 2A is a schematic depiction of a substrate holder according to an embodiment.
[0023] FIG. 2B is a top view of the surface of the substrate holder according to an embodiment.
[0024] FIG. 3A is a block diagram of a system for generating a prediction of a physical condition of a burl using a first type of burl data, in accordance with an embodiment.
[0025] FIG. 3B shows examples of the first type of burl data that may be used to generate a prediction of a physical condition of a burl, in accordance with an embodiment.
[0026] FIG. 4A is a block diagram of a system for generating a prediction of a physical condition of a burl using a second type of burl data, in accordance with an embodiment.
[0027] FIG. 4B is a block diagram of a system for generating a prediction of a physical condition of a burl using the second type of burl data, in accordance with an embodiment.
[0028] FIG. 5 is a system for training a first burl-condition predictor using the first type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
[0029] FIG. 6A is a flow chart of a process of training the first burl-condition predictor using the first type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
[0030] FIG. 6B is a flow chart of another process of training the first burl-condition predictor to predict a physical condition of a burl, in accordance with an embodiment.
[0031] FIG. 7 is a system for training a second burl-condition predictor using the second type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
[0032] FIG. 8A is a flow chart of a process of training the second burl-condition predictor using the second type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
[0033] FIG. 8B is a flow chart of another process of training the second burl-condition predictor to predict a physical condition of one or more burls, in accordance with an embodiment.
[0034] FIG. 9 is a block diagram of a system for generating a prediction of a physical condition of a burl using a third type of burl data, in accordance with an embodiment.
[0035] FIG. 10 is a system for training a third burl-condition predictor using a third type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
[0036] FIG. 11A is a flow chart of a process of training the third burl-condition predictor using a third type of burl data to predict a physical condition of a burl, in accordance with an embodiment.
[0037] FIG. 11B is a flow chart of another process of training the third burl-condition predictor to predict a physical condition of a burl, in accordance with an embodiment.
[0038] FIG. 12 is a block diagram that illustrates a computer system which can assist in implementing the systems and methods disclosed herein.
DETAILED DESCRIPTION
[0039] In lithography, a patterning device (e.g., a mask) may provide a mask pattern (e.g., a mask design layout) corresponding to a target pattern (e.g., a target design layout), and this mask pattern may be transferred onto a substrate by transmitting light through the mask pattern. In a lithographic apparatus, the substrate to be exposed is held on an object holder such as a substrate holder. The substrate-facing surface of the substrate holder is provided with multiple projections (referred to as burls). The distal surfaces of the burls conform to a flat plane and support the substrate. The burls provide several advantages: for example, a contaminant particle on the substrate holder or on the substrate is likely to fall between burls and therefore does not cause a deformation of the substrate. However, the burls of the substrate holder wear during use, e.g., due to the repeated loading and unloading of substrates. The burls may also be prone to other issues, e.g., cracks, contamination, scratches, etc. Burls that are damaged or otherwise not considered to be “healthy” may be problematic to lithographic processes (e.g., may lead to a reduction of the process window, or may cause imaging errors). Due to the very precise manufacturing specifications, it is desirable to identify burl issues so that appropriate actions (e.g., maintenance related) may be employed to repair, refurbish, replace, or improve the life of a substrate holder. Conventional techniques identify burl issues (e.g., a physical condition) by visual inspection of high resolution images of a burl. For example, users manually inspect high resolution images of a burl (e.g., an intensity image or a height image) and identify an issue. However, obtaining high resolution images of a burl or visually inspecting the images is a time consuming process, especially considering that there are thousands of burls on a substrate holder (e.g., 10,000, 20,000, 30,000 or another number).
Further, such a process may also be prone to human error, resulting in an incorrect determination of an issue with a burl. These and other drawbacks exist with conventional burl issue determination methods.
[0040] According to the present disclosure, machine learning (ML) models are trained to predict or determine a physical condition of a burl based on one or more images of the burl. For example, a first ML model may be trained using high resolution images of a burl (e.g., an intensity image and a height image obtained using an image capture device such as a white-light interferometric microscope) to predict a physical condition of the burl. In another example, a second ML model may be trained to predict the physical condition of one or more burls using (a) a subset of the images used to train the first ML model, (b) a low resolution image of the burls, or (c) an image of the burl that is faster to obtain than the images used to train the first ML model. In some embodiments, the second ML model may be trained using the output of the first ML model. That is, physical conditions of burls predicted by a trained first ML model using the high resolution images (e.g., intensity image and height image) of the burls may be used in training the second ML model. For example, physical conditions of the burls obtained using the first ML model may be associated with a subset of the high resolution images of the burls, such as the intensity images (e.g., as opposed to the intensity and height images used in the first ML model), which may then be used as training data to train the second ML model to predict a physical condition of a burl. In this case, the second ML model may generate a prediction of a physical condition using a single image, e.g., an intensity image, of a burl, as opposed to the multiple images required by the first ML model, thereby minimizing the time or computing resources that may otherwise be consumed in obtaining or processing multiple high resolution images of a burl. In some embodiments, obtaining an intensity image consumes less time or fewer computing resources than obtaining a height image.
In another example, the physical condition of the burls obtained using the first ML model may be associated with each of the burls in a low resolution image of the burls (e.g., an image with a lower resolution or of lower magnification than the images used in training the first ML model and that captures multiple burls in a single image), which may then be used as training data to train the second ML model to predict physical conditions of the burls in any given low resolution image. In this case, the second ML model may generate a prediction of physical conditions of multiple burls using a single low resolution image as opposed to the multiple images required by the first ML model, thereby minimizing the time or computing resources that may otherwise be consumed in obtaining multiple high resolution images of a burl.
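As an illustration of how a single low resolution image covering multiple burls can yield one prediction input per burl, the following sketch crops one patch per burl from a regular grid. The grid pitch, patch size, and function name are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def extract_burl_patches(low_res_image, pitch=16, patch=8):
    """Crop one small patch per burl from a low resolution image of the
    holder surface, assuming burls sit on a regular grid with the given
    pixel pitch. Pitch and patch size here are hypothetical."""
    h, w = low_res_image.shape
    patches = []
    for y in range(0, h - patch + 1, pitch):
        for x in range(0, w - patch + 1, pitch):
            patches.append(low_res_image[y:y + patch, x:x + patch])
    return patches

# A 32x32 image with a 16-pixel pitch covers a 2x2 grid of burls,
# so four patches are produced, one per burl.
image = np.zeros((32, 32))
patches = extract_burl_patches(image)
```

Each patch could then be classified individually, so a single low resolution acquisition produces condition predictions for many burls at once.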
[0041] In some embodiments, a third ML model may be trained using roughness parameters of a burl (e.g., metrics that are indicative of physical condition of the burl) to predict a physical condition of the burl. The roughness parameters may be obtained by processing images of the burl (e.g., intensity and height images) using any of a number of image processing tools. In some embodiments, a physical condition of a burl may be determined as a function of the physical conditions of the burl determined using various ML models.
[0042] Note that throughout the description, terms like high resolution and low resolution are used. A high resolution image may mean that the image is obtained at a first resolution or magnification above a specified threshold (e.g., user defined threshold). A low resolution image may mean that the image is obtained at a second resolution or magnification that is lower than the first resolution.
[0043] FIG. 1 schematically depicts a lithographic apparatus in accordance with one or more embodiments. The apparatus comprises: an illumination system (illuminator) IL configured to condition a radiation beam B (e.g., UV radiation or DUV radiation); a first object holder or a support structure (e.g., a mask table) MT constructed to hold a patterning device (e.g., a mask) MA and connected to a first positioner PM configured to accurately position the patterning device in accordance with certain parameters; a second object holder such as a substrate holder or substrate table (e.g., a wafer table) WT constructed to hold a substrate (e.g., a resist coated wafer) W and connected to a second positioner PW configured to accurately position the substrate in accordance with certain parameters; and a projection system (e.g., a refractive projection lens system) PS configured to project a pattern imparted to the radiation beam B by patterning device MA onto a target portion C (e.g., comprising one or more dies) of the substrate W.
[0044] The illumination system may include various types of optical components, such as refractive, reflective, magnetic, electromagnetic, electrostatic or other types of optical components, or any combination thereof, for directing, shaping, or controlling radiation.
[0045] The support structure holds the patterning device in a manner that depends on the orientation of the patterning device, the design of the lithographic apparatus, and other conditions, such as for example whether or not the patterning device is held in a vacuum environment. The support structure can use mechanical, vacuum, electrostatic or other clamping techniques to hold the patterning device. The support structure may be a frame or a table, for example, which may be fixed or movable as required. The support structure may ensure that the patterning device is at a desired position, for example with respect to the projection system. Any use of the terms “reticle” or “mask” herein may be considered synonymous with the more general term “patterning device.”
[0046] The term “patterning device” used herein should be broadly interpreted as referring to any device that can be used to impart a radiation beam with a pattern in its cross-section such as to create a pattern in a target portion of the substrate. It should be noted that the pattern imparted to the radiation beam may not exactly correspond to the desired pattern in the target portion of the substrate, for example if the pattern includes phase-shifting features or so called assist features. Generally, the pattern imparted to the radiation beam will correspond to a particular functional layer in a device being created in the target portion, such as an integrated circuit.
[0047] The patterning device may be transmissive or reflective. Examples of patterning devices include masks, programmable mirror arrays, and programmable LCD panels. Masks are well known in lithography, and include mask types such as binary, alternating phase-shift, and attenuated phase-shift, as well as various hybrid mask types. An example of a programmable mirror array employs a matrix arrangement of small mirrors, each of which can be individually tilted so as to reflect an incoming radiation beam in different directions. The tilted mirrors impart a pattern in a radiation beam which is reflected by the mirror matrix.
[0048] The term “projection system” used herein should be broadly interpreted as encompassing any type of projection system, including refractive, reflective, catadioptric, magnetic, electromagnetic and electrostatic optical systems, or any combination thereof, as appropriate for the exposure radiation being used, or for other factors such as the use of an immersion liquid or the use of a vacuum. Any use of the term “projection lens” herein may be considered as synonymous with the more general term “projection system”.
[0049] As here depicted, the apparatus is of a transmissive type (e.g., employing a transmissive mask). Alternatively, the apparatus may be of a reflective type (e.g., employing a programmable mirror array of a type as referred to above, or employing a reflective mask).
[0050] The lithographic apparatus may be of a type having two (dual stage) or more substrate tables (and/or two or more support structures). In such “multiple stage” machines the additional tables/support structures may be used in parallel, or preparatory steps may be carried out on one or more tables/support structures while one or more other tables/support structures are being used for exposure.
[0051] Referring to FIG. 1, the illuminator IL receives a radiation beam from a radiation source SO. The source and the lithographic apparatus may be separate entities, for example when the source is an excimer laser. In such cases, the source is not considered to form part of the lithographic apparatus and the radiation beam is passed from the source SO to the illuminator IL with the aid of a beam delivery system BD comprising, for example, suitable directing mirrors and/or a beam expander. In other cases, the source may be an integral part of the lithographic apparatus, for example when the source is a mercury lamp. The source SO and the illuminator IL, together with the beam delivery system BD if required, may be referred to as a radiation system.
[0052] The illuminator IL may comprise an adjuster AD configured to adjust the angular intensity distribution of the radiation beam. Generally, at least the outer and/or inner radial extent (commonly referred to as σ-outer and σ-inner, respectively) of the intensity distribution in a pupil plane of the illuminator can be adjusted. In addition, the illuminator IL may comprise various other components, such as an integrator IN and a condenser CO. The illuminator may be used to condition the radiation beam, to have a desired uniformity and intensity distribution in its cross-section.
[0053] The radiation beam B is incident on the patterning device (e.g., mask) MA, which is held on the support structure (e.g., mask table) MT, and is patterned by the patterning device. Having traversed the patterning device MA, the radiation beam B passes through the projection system PS, which focuses the beam onto a target portion C of the substrate W. With the aid of the second positioner PW and position sensor IF (e.g., an interferometric device, linear encoder or capacitive sensor), the substrate table WT can be moved accurately, e.g., so as to position different target portions C in the path of the radiation beam B. Similarly, the first positioner PM and another position sensor (which is not explicitly depicted in FIG. 1) can be used to accurately position the patterning device MA with respect to the path of the radiation beam B, e.g., after mechanical retrieval from a mask library, or during a scan. In general, movement of the support structure MT may be realized with the aid of a long-stroke module (coarse positioning) and a short-stroke module (fine positioning), which form part of the first positioner PM. Similarly, movement of the substrate table WT may be realized using a long-stroke module and a short-stroke module, which form part of the second positioner PW. In the case of a stepper (as opposed to a scanner) the support structure MT may be connected to a short-stroke actuator only, or may be fixed. Patterning device MA and substrate W may be aligned using patterning device alignment marks M1, M2 and substrate alignment marks P1, P2. Although the substrate alignment marks as illustrated occupy dedicated target portions, they may be located in spaces between target portions (these are known as scribe-lane alignment marks). Similarly, in situations in which more than one die is provided on the patterning device MA, the patterning device alignment marks may be located between the dies.
[0054] The depicted apparatus could be used in at least one of the following modes:
[0055] 1. In step mode, the support structure MT and the substrate table WT are kept essentially stationary, while an entire pattern imparted to the radiation beam is projected onto a target portion C at one time (i.e., a single static exposure). The substrate table WT is then shifted in the X and/or Y direction so that a different target portion C can be exposed. In step mode, the maximum size of the exposure field limits the size of the target portion C imaged in a single static exposure.
[0056] 2. In scan mode, the support structure MT and the substrate table WT are scanned synchronously while a pattern imparted to the radiation beam is projected onto a target portion C (i.e., a single dynamic exposure). The velocity and direction of the substrate table WT relative to the support structure MT may be determined by the (de-)magnification and image reversal characteristics of the projection system PS. In scan mode, the maximum size of the exposure field limits the width (in the non-scanning direction) of the target portion in a single dynamic exposure, whereas the length of the scanning motion determines the height (in the scanning direction) of the target portion.
[0057] 3. In another mode, the support structure MT is kept essentially stationary holding a programmable patterning device, and the substrate table WT is moved or scanned while a pattern imparted to the radiation beam is projected onto a target portion C. In this mode, generally a pulsed radiation source is employed and the programmable patterning device is updated as required after each movement of the substrate table WT or in between successive radiation pulses during a scan. This mode of operation can be readily applied to maskless lithography that utilizes a programmable patterning device, such as a programmable mirror array of a type as referred to above.
[0058] Combinations and/or variations on the above described modes of use or entirely different modes of use may also be employed.
[0059] FIG. 2A is a schematic depiction of a substrate holder, in accordance with an embodiment. FIG. 2B is a top view of the surface of the substrate holder, in accordance with an embodiment. In this example, an object holder such as a substrate table WT supports an object such as a substrate W. As shown, the substrate table WT is configured to provide one or more support surfaces to directly contact and support the substrate W. In some embodiments, the substrate table WT has one or more projections 20 (e.g., burls or protrusions) protruding or extending substantially perpendicularly from a surface of the substrate table WT. For example, more than 10,000 (e.g., 20,000, 30,000 or other number) burls may be provided on top of the substrate table WT.
Accordingly, in this situation, the substrate table WT may be referred to as a pimple table or a burl table. In particular, during operation, a lower surface or backside of the substrate W may be supported on upper face(s) of the one or more burls 20. Thus, the top(s) of the one or more burls define an effective support plane for the substrate W.
[0060] So, the substrate table WT is configured such that, when the substrate W is positioned on the substrate table WT, i.e., at least on the one or more burls 20, an upper surface of the substrate W lies in a predetermined plane in relation to a propagation direction of the exposure radiation. In an embodiment, the surface of the substrate W is oriented transverse to the propagation direction of the beam.
[0061] The arrangement of one or more burls 20 is not limiting. In some embodiments, the burls are arranged in an array, such as shown in the top view of the table surface in FIG. 2B. Also, the surface area of the substrate table WT or its burls that are in contact with the substrate W is not intended to be limiting. The above-described arrangement of the burls 20 may minimize or reduce the total area of the substrate W in contact with the substrate table WT and so generally may reduce the overall amount of contaminants between the substrate W and a corresponding surface of the substrate table WT contacting the substrate W.
[0062] In the present disclosure, methods and systems are disclosed for determination of burl issues using machine learning models. The machine learning models may predict a physical condition of a burl (e.g., a healthy burl, or a defective burl) using one or more images of a burl at one or more resolutions, or using roughness parameters associated with the surface of the burl.
[0063] FIG. 3A is a block diagram of a system 300 for generating a prediction of a physical condition of a burl using a first type of burl data, in accordance with an embodiment. The system 300 includes a first burl-condition predictor 350 that is configured to generate condition data 314 of a burl 302 based on image data 304 of the burl 302. In some embodiments, the burl 302 may be similar to the burl 20 of FIGs. 2A or 2B. The condition data 314 may be indicative of a physical condition of the burl 302. For example, the physical condition may include “healthy,” contaminated, cracked, scratched, hot (e.g., partial protrusion on the surface), edge wear, or various other conditions.
[0064] The image data 304 input to the first burl-condition predictor 350 may be a first type of burl data, which includes multiple images of the burl 302 obtained at a high resolution (e.g., a first resolution or magnification above a specified threshold). For example, as illustrated in FIG. 3B, the image data 304 may include different types of images such as an intensity image 324a and a height image 324b of the burl 302. The intensity image 324a and the height image 324b may be obtained using an image capture device such as a white-light interferometric microscope. The intensity image 324a of the burl 302 is representative of intensities in a range (e.g., each pixel may have a value between 0 and 1 indicating an intensity of the pixel compared to the other pixels in the image). The height image 324b is representative of height related (e.g., z-axis value) properties of the burl 302.
[0065] In another example, the image data 304 may include a composite image 334 of multiple images of the burl 302. For example, the composite image 334 may be a red, green, blue (RGB) image of a combination of the intensity image 324a, the height image 324b and a delta image 324c. The delta image 324c is representative of a difference between a first image of the burl 302 obtained prior to performing a process (e.g., a patterning process to print a target pattern on the substrate, or a process in which there are multiple collisions between the burl and a substrate) on a substrate held by the substrate holder and a second image of the burl 302 obtained after performing the process. Regardless of the number of images used in the image data 304, the images in the image data 304 may be high resolution images. In some embodiments, the image used in the image data 304 may also be a low resolution surface measurement of a different image type. It should be noted that the higher the resolution of an image, the more time and computing resources are consumed in obtaining it. While the first type of burl data is described as having an intensity image and a height image of a burl, the first type of burl data is not restricted to the foregoing image types and other image types may be used (e.g., a scanning electron microscopy (SEM) image, an Atomic Force Microscopy (AFM) image, or another image type).
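A minimal sketch of building such a composite image is given below: the intensity, height, and delta images are stacked into the R, G, and B channels of a single array. The per-channel normalization and the channel ordering are assumptions for illustration only, not specifics of the disclosure.

```python
import numpy as np

def delta_image(before, after):
    """Per-pixel difference between images of the same burl taken
    before and after a process on the substrate holder."""
    return after - before

def composite_rgb(intensity, height, delta):
    """Stack intensity, height, and delta images into the R, G, and B
    channels of one composite image. Each channel is rescaled to [0, 1];
    the scaling and channel order are illustrative assumptions."""
    def norm(img):
        rng = img.max() - img.min()
        return (img - img.min()) / rng if rng > 0 else np.zeros_like(img)
    return np.stack([norm(intensity), norm(height), norm(delta)], axis=-1)

rng = np.random.default_rng(0)
intensity = rng.random((64, 64))
height = rng.random((64, 64))
delta = delta_image(rng.random((64, 64)), rng.random((64, 64)))
rgb = composite_rgb(intensity, height, delta)
```

The resulting three-channel array can be fed to an image classifier exactly as an ordinary RGB photograph would be.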
[0066] In some embodiments, the first burl-condition predictor 350 may be a machine learning model (e.g., a convolutional neural network (CNN)) that is trained to predict the condition data of a burl. The present disclosure is not limited to any specific type of neural network for the machine learning model. The first burl-condition predictor 350 may be trained using a number of images of a number of burls and their associated condition data as training data. The type of input to be provided to the first burl-condition predictor 350 during a prediction process may be similar to the type of input provided during the training process. For example, if the first burl-condition predictor 350 is trained with the image pair of an intensity image and a height image as the input, then for generating the prediction of the physical condition, the image data to be input to the first burl-condition predictor 350 may include the image pair of an intensity image and a height image of a burl as well. Additional details with respect to the training process are described below at least with reference to Figures 5 and 6.
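The input/output contract described above (prediction-time input matching training-time input) can be illustrated with a stand-in classifier that accepts a stacked (intensity, height) pair and outputs a distribution over condition labels. A real first burl-condition predictor would be a trained CNN; the linear model, label set, and shapes here are illustrative assumptions.

```python
import numpy as np

# Hypothetical label set based on the conditions named in the disclosure.
CONDITIONS = ["healthy", "contaminated", "cracked", "scratched", "hot", "edge_wear"]

def stack_input(intensity, height):
    """Combine the intensity/height image pair into one 2-channel input,
    mirroring the pairing used during training."""
    return np.stack([intensity, height], axis=0)

def predict_condition(image_pair, weights):
    """Stand-in for the trained predictor: a linear map plus softmax over
    condition classes. Only the input/output shapes are the point here."""
    logits = weights @ image_pair.reshape(-1)
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    return CONDITIONS[int(np.argmax(probs))], probs

rng = np.random.default_rng(0)
pair = stack_input(rng.random((16, 16)), rng.random((16, 16)))
weights = rng.random((len(CONDITIONS), pair.size))
label, probs = predict_condition(pair, weights)
```

If the model were trained on (intensity, height) pairs, supplying only one channel at prediction time would violate this contract, which is why the same image types must be provided.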
[0067] FIG. 4A and FIG. 4B illustrate generating a prediction of a physical condition of a burl using two different examples of a second type of burl data. The second type of burl data may include a subset of the high resolution images of the first type of burl data, or a low resolution image of multiple burls (e.g., image obtained at a second resolution or magnification that is lower than the first resolution used in the first type of burl data). FIG. 4A is a block diagram of a system 400 for generating a prediction of a physical condition of a burl using a second type of burl data, in accordance with an embodiment. The system 400 includes a second burl-condition predictor 450 that is configured to generate condition data 414 of a burl 402a based on image data 404 of the burl 402a. In some embodiments, the burl 402a may be similar to the burl 20 of FIGs. 2A or 2B. The condition data 414 may be indicative of a physical condition of the burl 402a.
[0068] The image data 404 input to the second burl-condition predictor 450 may be a second type of burl data, which includes a single image of the burl 402a (e.g., unlike the multiple images of the burl used in the first type of burl data described above at least with reference to FIG. 3A or FIG. 3B). For example, the image data 404 may include an intensity image 424a of the burl 402a. The intensity image may be obtained at a high resolution (e.g., a first resolution or magnification above a specified threshold). However, in some embodiments, obtaining the intensity image at a high resolution consumes significantly less time than obtaining both the intensity image and the height image at a high resolution. While the second type of burl data is described as having an intensity image of a burl, the second type of burl data is not restricted to the foregoing image type and other image types may be used (e.g., an SEM image, an AFM image, or another image type). In some embodiments, the image types used in the second type of burl data are different from or a subset of the image types used in the first type of burl data.
[0069] In some embodiments, the second burl-condition predictor 450 may be a machine learning model (e.g., a CNN) that is trained to predict the condition data of a burl. The second burl-condition predictor 450 may be trained using a number of images of a number of burls and their associated condition data as training data. Additional details with respect to the training process are described below at least with reference to Figures 7 and 8.
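One way the training data for the second burl-condition predictor can be assembled, assuming (as the disclosure describes) that conditions predicted by the first predictor serve as labels for the intensity images, is sketched below. The record structure, function name, and the toy predictor are illustrative assumptions.

```python
def build_second_training_set(burl_records, first_predictor):
    """Label each burl's intensity image with the condition the first
    predictor infers from the full (intensity, height) pair, yielding
    (intensity_image, condition) pairs for training the second predictor.
    `burl_records` and the predictor callable are hypothetical."""
    training_pairs = []
    for record in burl_records:
        condition = first_predictor(record["intensity"], record["height"])
        training_pairs.append((record["intensity"], condition))
    return training_pairs

# Toy example: a stand-in predictor that flags burls with uneven height.
records = [
    {"intensity": [0.9, 0.9], "height": [1.0, 1.0]},
    {"intensity": [0.4, 0.5], "height": [0.2, 1.0]},
]
predictor = lambda i, h: "healthy" if min(h) == max(h) else "edge_wear"
pairs = build_second_training_set(records, predictor)
```

Once trained on such pairs, the second predictor needs only the cheaper intensity image at prediction time, while its labels inherit the accuracy of the two-image first predictor.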
[0070] FIG. 4B is a block diagram of a system 425 for generating a prediction of a physical condition of a burl using another second type of burl data, in accordance with an embodiment. The system 425 includes the second burl-condition predictor 450 that is configured to generate condition data 414 of burls 402a-402n based on image data 404 of the burls 402a-402n. In some embodiments, the burls 402a-402n may be similar to the burls 20 of FIG. 2A or FIG. 2B. The condition data 414, which includes condition data 414a-414n for the burls 402a-402n, may be indicative of a physical condition of a respective burl.
[0071] The image data 404 may be another example of a second type of burl data, which includes a single image 434 of multiple burls 402a-402n (e.g., unlike multiple images of a single burl used in the first type of burl data described above at least with reference to FIG. 3A or FIG. 3B). The image 434 may be obtained at a low resolution (e.g., a second resolution or magnification that is lower than the first resolution used in the first type of burl data). The image 434 may be obtained using an image capture device such as a white-light interferometric microscope. In some embodiments, obtaining a low resolution image of multiple burls consumes significantly less time than obtaining high resolution images (e.g., the intensity image or the height image).
[0072] The second burl-condition predictor 450 may be trained using (a) a number of intensity images of a burl, or (b) a number of low resolution images, and their associated condition data as training data. In some embodiments, the condition data to be used as the training data may be obtained using the first burl-condition predictor 350. That is, the output of the first burl-condition predictor 350 may be used as an input for training the second burl-condition predictor 450. For example, image data having high resolution images (e.g., the intensity image and the height image) for each of a number of burls may be input to the first burl-condition predictor 350 to obtain the condition data for each of the burls. After obtaining the condition data, (a) each intensity image of a burl may be associated with the corresponding physical condition, or (b) each burl in a low resolution image may be associated with the corresponding physical condition, which may then be used as the training data for the second burl-condition predictor 450. Additional details with respect to the training process are described below at least with reference to Figures 7 and 8.
[0073] The following description illustrates training of the first burl-condition predictor 350 with reference to Figures 5, 6A and 6B. FIG. 5 is a system 500 for training the first burl-condition predictor 350 using a first type of burl data to predict a physical condition of a burl, in accordance with an embodiment. FIG. 6A is a flow chart of a process 600 of training the first burl-condition predictor 350 using a first type of burl data to predict a physical condition of a burl, in accordance with an embodiment. The training is based on a first type of burl data including multiple high resolution images for a burl.
[0074] In an operation P601, a first type of burl data such as image data 504a-504n and associated condition data 514a-514n are obtained for a number of burls 502a-502n. In some embodiments, the burls 502a-502n may be similar to the burls 20 of FIG. 2A or FIG. 2B. Each of the image data 504a-504n (e.g., a first image data 504a) may be similar to the image data 304 of FIG. 3A or FIG. 3B and may include multiple high resolution images (e.g., a first resolution or magnification above a specified threshold) of the respective burl (e.g., a first burl 502a). For example, the first image data 504a associated with the first burl 502a may include an intensity image (e.g., similar to intensity image 324a) and a height image (e.g., similar to height image 324b) of the first burl 502a. In another example, the first image data 504a may include an RGB image (e.g., similar to image 334) generated using a combination of the intensity image, the height image and a delta image (e.g., similar to the delta image 324c). In another example, the first image data 504a may include a distorted image or a transformed image of the intensity image, height image or the delta image. A number of types of distortion (e.g., optical distortion or other optical aberration that retains the characteristics of the physical conditions of the burls) or transformation (e.g., cropping, rotating, or other transformation) may be applied to a candidate image to generate the distorted image or transformed image, respectively. In some embodiments, distorted or transformed images may be used to improve the coverage or accuracy of the ML model. The high resolution images may be obtained using an image capture device, such as a white-light interferometric microscope.
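The transformed copies mentioned above (e.g., rotations and crops that preserve the burl's condition label) can be generated with a simple sketch such as the following; the specific set of transforms is an assumption for illustration, not the disclosure's own augmentation scheme.

```python
import numpy as np

def augment(image):
    """Generate label-preserving transformed copies of a burl image
    (three rotations plus a center crop) to enlarge the training set.
    The exact transforms used are illustrative assumptions."""
    variants = [np.rot90(image, k) for k in range(1, 4)]  # 90/180/270 degrees
    h, w = image.shape
    variants.append(image[h // 4: 3 * h // 4, w // 4: 3 * w // 4])  # center crop
    return variants

image = np.arange(64.0).reshape(8, 8)
variants = augment(image)
```

Each variant would be paired with the same condition data as the original image, since the transforms retain the characteristics of the burl's physical condition.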
[0075] The first image data 504a is associated with a first condition data 514a that is indicative of a physical condition of the first burl 502a. For example, the physical condition may include healthy, contaminated, cracked, scratched, hot, edge wear, or another condition. The first condition data 514a may be obtained in a number of ways. For example, the first condition data 514a may be obtained by visually inspecting the multiple images in the first image data 504a of the first burl 502a.
[0076] In an operation P602, the first burl-condition predictor 350 may be trained using the image data 504a-504n and associated condition data 514a-514n to predict a physical condition of a burl. In some embodiments, the first burl-condition predictor 350 is a machine learning model. In some embodiments, the machine learning model is implemented as a neural network (e.g., CNN).
[0077] FIG. 6B is a flow chart of another process 650 of training the first burl-condition predictor 350 to predict a physical condition of a burl, in accordance with an embodiment. In some embodiments, the process 650 is executed as part of the operation P602 of the process 600 of FIG. 6A.
[0078] In an operation P603, the first image data 504a and the first condition data 514a are provided as input to the first burl-condition predictor 350. The first burl-condition predictor 350 generates, based on the first image data 504a, a prediction of condition data 524a that is indicative of a physical condition of the first burl 502a.
[0079] In an operation P604, a cost function 604 that is indicative of a difference between the predicted condition data 524a and the first condition data 514a is determined.
[0080] In an operation P605, parameters of the first burl-condition predictor 350 (e.g., weights or biases of the machine learning model) are adjusted such that the cost function 604 is reduced. The parameters may be adjusted in various ways. For example, the parameters may be adjusted based on a gradient descent method.
[0081] In an operation P606, a determination is made as to whether a training condition is satisfied. If the training condition is not satisfied, the process 650 is executed again with the same image data or a next image data (e.g., a second image data 504b) and an associated condition data.
The process 650 is executed with the same or a different image data and associated condition data iteratively until the training condition is satisfied. The training condition may be satisfied when the cost function 604 is minimized, the rate at which the cost function 604 reduces is below a threshold value, the process 650 (e.g., operations P603-P606) is executed for a predefined number of iterations, or other such condition. The process 650 may conclude when the training condition is satisfied.
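Operations P603 through P606 together form a standard iterative training loop. The following sketch illustrates that loop with a deliberately tiny one-parameter model trained by gradient descent; the learning rate, tolerance, and iteration budget are illustrative values, not values from the disclosure:

```python
# Minimal numeric sketch of operations P603-P606: predict, compute a
# cost, adjust a parameter by gradient descent, and stop when the
# training condition is satisfied (here: mean cost below a tolerance,
# or an iteration budget exhausted). The one-parameter "model" is a
# hypothetical stand-in for the neural network's weights and biases.
def train(samples, lr=0.1, tol=1e-6, max_iters=1000):
    w = 0.0                                  # model parameter (weight)
    for _ in range(max_iters):
        cost = 0.0
        grad = 0.0
        for x, target in samples:            # P603: generate predictions
            pred = w * x
            cost += (pred - target) ** 2     # P604: cost function
            grad += 2 * (pred - target) * x
        w -= lr * grad / len(samples)        # P605: gradient descent step
        if cost / len(samples) < tol:        # P606: training condition
            break
    return w

# Targets follow target = 2 * x, so training should drive w toward 2.
w = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(round(w, 3))
```

In the actual predictors the cost would compare predicted condition data against the associated condition data labels, and the update would adjust all weights and biases of the network rather than a single scalar.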
[0082] At the end of the training process (e.g., when the training condition is satisfied), the first burl-condition predictor 350 may be used as a trained first burl-condition predictor 350, and may be used to predict a physical condition for any burl using any high resolution images of the burl (e.g., high resolution images that are not previously input to the first burl-condition predictor 350). An example of employing the trained first burl-condition predictor 350 is discussed with respect to FIG. 3A and FIG. 3B above.
[0083] The following description illustrates training of the second burl-condition predictor
450 with reference to Figures 7, 8A and 8B. FIG. 7 is a system 700 for training a second burl-condition predictor 450 using a second type of burl data to predict a physical condition of a burl, in accordance with an embodiment. FIG. 8A is a flow chart of a process 800 of training the second burl-condition predictor 450 using a second type of burl data to predict a physical condition of a burl, in accordance with an embodiment. The training is based on a second type of burl data including a low resolution image of multiple burls. In some embodiments, the second burl-condition predictor 450 is trained using the condition data predicted by the first burl-condition predictor 350.
[0084] In an operation P801, a first type of burl data, which includes image data 704a-704n, is obtained for a number of burls 702a-702n. In some embodiments, the burls 702a-702n may be similar to the burls 20 of FIG. 2A or FIG. 2B. Each image data 704a-704n (e.g., a first image data 704a) may be similar to the image data 504a-504n of FIG. 5 and may include multiple high resolution images (e.g., a first resolution or magnification above a specified threshold) of the respective burl (e.g., a first burl 702a). The high resolution images may be obtained using an image capture device, such as a white-light interferometric microscope.
[0085] The image data 704a-704n are input to a trained first burl-condition predictor 350 to obtain predicted condition data 714a-714n of the burls 702a-702n. For example, a first condition data 714a may be indicative of a physical condition of the first burl 702a.
[0086] In an operation P802, image data 724, which is a second type of burl data including a low resolution image of the burls 702a-702n, is obtained. The image data 724 is a single image of multiple burls 702a-702n obtained at a low resolution (e.g., a second resolution or magnification that is lower than the first resolution used in the first type of burl data). In some embodiments, the image data 724 is similar to the image data 404 of FIG. 4A and FIG. 4B.
[0087] In an operation P803, the second burl-condition predictor 450 may be trained using the image data 724 and associated condition data 714a-714n as the training data to predict a physical condition of one or more burls. In some embodiments, the second burl-condition predictor 450 is a machine learning model. In some embodiments, the machine learning model is implemented as a neural network (e.g., CNN).
[0088] FIG. 8B is a flow chart of another process 850 of training the second burl-condition predictor 450 to predict a physical condition of one or more burls, in accordance with an embodiment. In some embodiments, the process 850 is executed as part of the operation P803 of the process 800 of FIG. 8A.
[0089] In an operation P804, the condition data 714a-714n predicted by the first burl-condition predictor 350 is associated with the burls in the low resolution image 724. Each image of a burl either in the first type of data or the second type of data may be associated with location data that is representative of a location of the burl on the substrate holder. Accordingly, the condition data 714a-714n predicted by the first burl-condition predictor 350 may be associated with the burls in the low resolution image 724 based on the location data. For example, location data of the first burl 702a may be obtained from the first image data 704a associated with the first burl 702a and used to locate the first burl 702a in the low resolution image 724. After the first burl 702a is located in the low resolution image 724, the first condition data 714a is associated with the first burl 702a in the low resolution image 724. Similarly, the other burls in the low resolution image 724 may be associated with the corresponding condition data 714a-714n.
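The location-based association in operation P804 can be sketched as follows, assuming each burl record carries hypothetical x and y coordinates on the substrate holder; the field names and matching tolerance are illustrative, not part of the disclosure:

```python
# Sketch of operation P804: condition data predicted from the high
# resolution images is attached to burls detected in the low
# resolution image by matching their locations on the holder. The
# record layout and the 0.5 tolerance are hypothetical choices.
def associate_conditions(high_res_records, low_res_burls, tol=0.5):
    """Label each burl found in the low resolution image with the
    condition of the high resolution record at the same location."""
    labeled = []
    for burl in low_res_burls:
        for rec in high_res_records:
            if (abs(rec["x"] - burl["x"]) <= tol
                    and abs(rec["y"] - burl["y"]) <= tol):
                labeled.append({**burl, "condition": rec["condition"]})
                break
    return labeled

high_res_records = [
    {"x": 0.0, "y": 0.0, "condition": "healthy"},
    {"x": 5.0, "y": 5.0, "condition": "cracked"},
]
low_res_burls = [{"x": 0.1, "y": -0.1}, {"x": 4.9, "y": 5.2}]
labeled = associate_conditions(high_res_records, low_res_burls)
print(labeled)
```

The labeled burls in the low resolution image then serve as training data for the second burl-condition predictor 450.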
[0090] While the above describes associating condition data 714a-714n with the burls in a single low resolution image, different low resolution images of the burls 702a-702n may be obtained and associated with the condition data 714a-714n accordingly for preparing multiple low resolution images for training the second burl-condition predictor 450.
[0091] In an operation P805, the image data 724 and the condition data 714a-714n are provided as input to the second burl-condition predictor 450. The second burl-condition predictor 450 generates, based on the image data 724, a prediction of condition data 722a-722n that is indicative of a physical condition of the burls 702a-702n. For example, the condition data 722a-722n includes a predicted first condition data 722a that is indicative of a physical condition of the first burl 702a.
[0092] In an operation P806, a cost function 806 that is indicative of a difference between the predicted first condition data 722a and the first condition data 714a is determined.
[0093] In an operation P807, parameters of the second burl-condition predictor 450 (e.g., weights or biases of the machine learning model) are adjusted such that the cost function 806 is reduced. The parameters may be adjusted in various ways. For example, the parameters may be adjusted based on a gradient descent method.
[0094] In an operation P808, a determination is made as to whether a training condition is satisfied. If the training condition is not satisfied, the process 850 is executed again with the same image data or a next image data (e.g., a second low resolution image data) and associated condition data. The process 850 is executed with the same or a different image data and associated condition data iteratively until the training condition is satisfied. The training condition may be satisfied when the cost function 806 is minimized, the rate at which the cost function 806 reduces is below a threshold value, the process 850 (e.g., operations P805-P808) is executed for a predefined number of iterations, or other such condition. The process 850 may conclude when the training condition is satisfied.
[0095] At the end of the training process (e.g., when the training condition is satisfied), the second burl-condition predictor 450 may be used as a trained second burl-condition predictor 450, and may be used to predict a physical condition for one or more burls using any low resolution image of the burls (e.g., low resolution images that are not previously input to the second burl-condition predictor 450). An example of employing the trained second burl-condition predictor 450 is discussed with respect to FIG. 4A and FIG. 4B above.
[0096] While FIGs. 7, 8A and 8B describe training the second burl-condition predictor 450 using a second type of burl data such as a single low resolution image of multiple burls 702a-702n, the second type of burl data may also include multiple low resolution images in which each low resolution image has a single burl instead of multiple burls like the low resolution image 724. For example, each low resolution image of one of the burls 702a-702n may be associated with the corresponding condition data from the condition data 714a-714n predicted by the first burl-condition predictor 350 to generate a labeled set of low resolution images, which may be used as training data to train the second burl-condition predictor 450. Similarly, the second type of burl data may include multiple high resolution intensity images in which each intensity image has a single burl unlike the low resolution image 724. For example, each intensity image of one of the burls 702a-702n may be associated with the corresponding condition data from the condition data 714a-714n predicted by the first burl-condition predictor 350 to generate a labeled set of intensity images, which may be used as training data to train the second burl-condition predictor 450.
[0097] FIG. 9 is a block diagram of a system 900 for generating a prediction of a physical condition of a burl using a third type of burl data, in accordance with an embodiment. The system 900 includes a third burl-condition predictor 950 that is configured to predict condition data 914 of a burl 902 based on burl data 904 of the burl 902. In some embodiments, the burl 902 may be similar to the burl 20 of FIG. 2A or FIG. 2B. The condition data 914 may be indicative of a physical condition of the burl 902.
[0098] The burl data 904 input to the third burl-condition predictor 950 may be a third type of burl data, which includes surface descriptive parameters of the burl 902 that are indicative of physical properties of a surface of the substrate holder, such as a surface roughness. A number of surface descriptive parameters may be used to characterize a burl. For example, a first surface descriptive parameter may be indicative of a maximum height difference between a burl and the remaining burls. In another example, a second surface descriptive parameter may be indicative of an arithmetic average of burl heights measured as a line across a surface. In another example, a third surface descriptive parameter may be indicative of a measure of symmetry, or the lack of symmetry of burls with reference to a reference point on the surface. Each surface descriptive parameter may have a value in a specified range (e.g., 0 to 1, or another range) and different parameters may have values in different ranges. The surface descriptive parameters may be obtained in a number of ways. For example, an image processing application may be used to analyze images of a burl (e.g., high resolution images of the burl such as intensity image and height image) to derive the values of various surface descriptive parameters.
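Simplified versions of the three example parameters above (a maximum height difference across burls, an arithmetic average along a line profile similar to the common roughness parameter Ra, and a skewness-style symmetry measure) might be computed as in the following sketch; the formulas are standard surface-metrology definitions, not formulas taken from the disclosure:

```python
# Illustrative surface descriptive parameters derived from burl height
# data, as an image processing application might compute them from the
# high resolution images. The parameter definitions follow common
# surface-texture conventions and are assumptions for this sketch.
import numpy as np

def surface_parameters(burl_heights, profile):
    """Toy versions of the three parameters mentioned above, from a set
    of burl heights and a single line profile across the surface."""
    heights = np.asarray(burl_heights, dtype=float)
    z = np.asarray(profile, dtype=float) - np.mean(profile)
    max_diff = float(np.max(heights) - np.min(heights))     # height difference
    ra = float(np.mean(np.abs(z)))                          # arithmetic average
    rq = float(np.sqrt(np.mean(z ** 2)))
    skew = float(np.mean(z ** 3) / rq ** 3) if rq else 0.0  # symmetry measure
    return {"max_height_diff": max_diff, "ra": ra, "skew": skew}

params = surface_parameters([1.00, 1.02, 0.97], [0.1, -0.1, 0.1, -0.1])
print(params)
```

Values computed this way, each falling in its own characteristic range, form the third type of burl data input to the third burl-condition predictor 950.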
[0099] In some embodiments, the third burl-condition predictor 950 may be a machine learning model (e.g., a CNN) that is trained to predict the condition data of a burl based on the surface descriptive parameters. The present disclosure is not limited to any specific type of neural network of the machine learning model. The third burl-condition predictor 950 may be trained using surface descriptive parameters of a number of burls and their associated condition data as training data.
[00100] The following description illustrates training of the third burl-condition predictor 950 with reference to Figures 10, 11A and 11B. FIG. 10 is a system 1000 for training a third burl-condition predictor 950 using a third type of burl data to predict a physical condition of a burl, in accordance with an embodiment. FIG. 11A is a flow chart of a process 1100 of training the third burl-condition predictor 950 using a third type of burl data to predict a physical condition of a burl, in accordance with an embodiment. The training is based on a third type of burl data including multiple surface descriptive parameters for a burl.
[00101] In an operation P1101, a third type of burl data 1004a-1004n and associated condition data 1014a-1014n are obtained for a number of burls 1002a-1002n. In some embodiments, the burls 1002a-1002n may be similar to the burls 20 of FIG. 2A or FIG. 2B. Each burl data 1004a-1004n (e.g., a first burl data 1004a) may be similar to the burl data 904 of FIG. 9 and may include values of multiple surface descriptive parameters of the respective burl (e.g., a first burl 1002a). For example, the first burl data 1004a associated with the first burl 1002a may include values of multiple surface descriptive parameters of the first burl 1002a. As described above, the surface descriptive parameters may be obtained in a number of ways. For example, an image processing application may be used to analyze images of a burl (e.g., high resolution images such as intensity image and height image of the burl) to derive the values of various surface descriptive parameters.
[00102] The first burl data 1004a is associated with a first condition data 1014a that is indicative of a physical condition of the first burl 1002a. For example, the physical condition may include healthy, contaminated, cracked, scratched, hot, edge wear, blister, or another condition. The first condition data 1014a may be obtained in a number of ways. For example, different ranges of values of different parameters may be representative of different physical conditions. For example, a user may define the “healthy” condition to include values in a first set of ranges for a first set of surface descriptive parameters, a contaminated condition to include values in a second set of ranges for a second set of surface descriptive parameters and so on.
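The range-based labeling described above might be expressed as a lookup over user-defined parameter ranges; the parameter names and the ranges below are hypothetical, chosen only to illustrate the mapping:

```python
# Illustrative mapping from surface descriptive parameter values to
# condition data, as paragraph [00102] describes: each condition is
# defined by value ranges for a set of parameters. All names and
# ranges here are hypothetical user-defined examples.
CONDITION_RANGES = {
    "healthy": {"roughness": (0.0, 0.3), "height_dev": (0.0, 0.2)},
    "contaminated": {"roughness": (0.3, 1.0), "height_dev": (0.0, 1.0)},
}

def label_burl(params):
    """Return the first condition whose ranges contain all values."""
    for condition, ranges in CONDITION_RANGES.items():
        if all(lo <= params[name] <= hi
               for name, (lo, hi) in ranges.items()):
            return condition
    return "unknown"

label_a = label_burl({"roughness": 0.1, "height_dev": 0.05})
label_b = label_burl({"roughness": 0.6, "height_dev": 0.4})
print(label_a, label_b)  # healthy contaminated
```

Condition data obtained this way can serve as the labels paired with the burl data 1004a-1004n for training.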
[00103] In an operation P1102, the third burl-condition predictor 950 may be trained using the burl data 1004a-1004n and associated condition data 1014a-1014n to predict a physical condition of a burl. In some embodiments, the third burl-condition predictor 950 is a machine learning model. In some embodiments, the machine learning model is implemented as a neural network (e.g., CNN).
[00104] FIG. 11B is a flow chart of another process 1150 of training the third burl-condition predictor 950 to predict a physical condition of a burl, in accordance with an embodiment. In some embodiments, the process 1150 is executed as part of the operation P1102 of the process 1100 of FIG. 11A.
[00105] In an operation P1103, the first burl data 1004a and the first condition data 1014a are provided as input to the third burl-condition predictor 950. The third burl-condition predictor 950 generates, based on the first burl data 1004a, a prediction of condition data 1024a that is indicative of a physical condition of the first burl 1002a.
[00106] In an operation P1104, a cost function 1104 that is indicative of a difference between the predicted condition data 1024a and the first condition data 1014a is determined.
[00107] In an operation P1105, parameters of the third burl-condition predictor 950 (e.g., weights or biases of the machine learning model) are adjusted such that the cost function 1104 is reduced. The parameters may be adjusted in various ways. For example, the parameters may be adjusted based on a gradient descent method.
[00108] In an operation P1106, a determination is made as to whether a training condition is satisfied. If the training condition is not satisfied, the process 1150 is executed again with the same burl data or a next burl data (e.g., a second burl data 1004b) and an associated condition data. The process 1150 is executed with the same or a different burl data and associated condition data iteratively until the training condition is satisfied. The training condition may be satisfied when the cost function 1104 is minimized, the rate at which the cost function 1104 reduces is below a threshold value, the process 1150 (e.g., operations P1103-P1106) is executed for a predefined number of iterations, or other such condition. The process 1150 may conclude when the training condition is satisfied.
[00109] At the end of the training process (e.g., when the training condition is satisfied), the third burl-condition predictor 950 may be used as a trained third burl-condition predictor 950, and may be used to predict a physical condition for any burl using its surface descriptive parameter values. An example of employing the trained third burl-condition predictor 950 is discussed with respect to FIG. 9 above.
[00110] In some embodiments, a physical condition of a burl may be determined as a function of a physical condition of the burl determined using various burl-condition predictors. For example, a physical condition of a first burl may be determined as a function of a first physical condition of the first burl determined using the surface descriptive parameters and a second physical condition determined using the low resolution image or the high resolution images of the first burl. In some embodiments, the physical condition determined using a burl-condition predictor may be associated with a confidence score, which is indicative of a likelihood or a probability of the burl having the predicted physical condition. For example, if the predicted physical condition of “healthy” has a confidence score of “0.865” (e.g., 0 to 1 being the range in which “0” is least confident and “1” is most confident), then the confidence score indicates that there is a “86.5%” likelihood that the burl is healthy. The function to determine the resultant physical condition of the burl may be defined in a number of ways. For example, the physical condition associated with the highest confidence score among the physical conditions predicted using various burl-condition predictors may be determined to be the resultant physical condition of a burl. In another example, the physical condition associated with a confidence score that satisfies a confidence score threshold (e.g., exceeds a confidence score threshold) among the physical conditions predicted using various burl-condition predictors may be determined to be the resultant physical condition.
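The confidence-based combination described in paragraph [00110] can be sketched as follows; both the highest-confidence rule and the threshold rule from the text are shown, with illustrative values:

```python
# Sketch of combining the outputs of several burl-condition predictors:
# each predictor returns a (condition, confidence) pair, and the
# resultant condition is the one with the highest confidence,
# optionally gated by a confidence score threshold. The example
# predictions are illustrative values only.
def combine_predictions(predictions, threshold=None):
    """predictions: list of (condition, confidence) pairs produced by
    the various burl-condition predictors for the same burl."""
    condition, confidence = max(predictions, key=lambda p: p[1])
    if threshold is not None and confidence < threshold:
        return None                 # no prediction clears the threshold
    return condition

preds = [("healthy", 0.865), ("contaminated", 0.4), ("healthy", 0.7)]
best = combine_predictions(preds)                      # highest confidence
gated = combine_predictions(preds, threshold=0.9)      # threshold rule
print(best, gated)  # healthy None
```

More elaborate functions (e.g., averaging confidences per condition across predictors) could equally be used; the disclosure leaves the exact function open.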
[00111] In some embodiments, the physical conditions of the burls may be used for various purposes. For example, since the physical conditions of the burls may be indicative of a wear and tear of the substrate holder, the type of damage caused to the substrate holder may be determined and an appropriate remedial action (e.g., repair, refurbish, or replace) may be performed with respect to the substrate holder to improve the life or performance of the substrate holder, thereby minimizing any errors that may be caused due to a faulty substrate holder in patterning a substrate. In another example, the physical conditions may be used in determination of a wafer load grid (WLG) associated with the substrate holder, which is typically indicative of a performance of the substrate holder. A WLG error may be a shift in a direction generally parallel to a support plane/surface of the substrate holder. Where the object is supported by one or more burls, the WLG error may be due to a relatively high friction between the substrate and the top of the one or more burls of the substrate holder. Contamination may be a cause for issues associated with WLG error, but WLG error may also arise from other causes (e.g., overly smooth surfaces). Typically, WLG error is determined based on one or more metrology metrics associated with the substrate table. The burl physical condition may also be used as one of the metrics to determine the WLG error.
[00112] While the foregoing embodiments are described with reference to a substrate holder, the embodiments may also be used to determine physical conditions of the surface of other object holders of the lithographic apparatus (e.g., a mask holder that is used to hold a mask of a pattern to be printed on the substrate, or other object holder).
[00113] Further, while the foregoing embodiments are described with reference to determining physical conditions of a burl using high resolution and low resolution images of a burl, the embodiments may also be used to train ML models to determine various features of a subject in an image using a combination of low resolution and high resolution images of the subject. For example, a first ML model (e.g., like the first burl-condition predictor 350) may be trained to predict a first feature (e.g., number on a license plate) of a subject (e.g., a car) using a high resolution image of (a) the subject or (b) an environment having the subject. A second ML model (e.g., like the second burl-condition predictor 450) may be trained to predict the first feature of the subject (e.g., the number on the license plate of the car) using a low resolution image of (a) the subject or (b) an environment having the subject. In some embodiments, the training data for the second ML model (e.g., low resolution images labeled with the number on the license plate) may be generated based on the features predicted by the first ML model. For example, a first set of high resolution images of the subject may be input to a trained first ML model to obtain a prediction of the first set of features (the first ML model, like the first burl-condition predictor 350, may be trained with labeled high resolution images of the subject). A second set of low resolution images of the subject may be obtained and associated with the first set of features obtained from the first ML model to generate the training data for training the second ML model. The second ML model, like the second burl-condition predictor 450, may be trained using the training data to predict a feature of a subject using a low resolution image of the subject.
[00114] FIG. 12 is a block diagram that illustrates a computer system 100 which can assist in implementing the systems and methods disclosed herein. Computer system 100 includes a bus 102 or other communication mechanism for communicating information, and a processor 104 (or multiple processors 104 and 105) coupled with bus 102 for processing information. Computer system 100 also includes a main memory 106, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 102 for storing information and instructions to be executed by processor 104. Main memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104. Computer system 100 further includes a read only memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104. A storage device 110, such as a magnetic disk or optical disk, is provided and coupled to bus 102 for storing information and instructions.
[00115] Computer system 100 may be coupled via bus 102 to a display 112, such as a cathode ray tube (CRT) or flat panel or touch panel display for displaying information to a computer user. An input device 114, including alphanumeric and other keys, is coupled to bus 102 for communicating information and command selections to processor 104. Another type of user input device is cursor control 116, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane. A touch panel (screen) display may also be used as an input device.
[00116] According to one embodiment, portions of the optimization process may be performed by computer system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another computer-readable medium, such as storage device 110. Execution of the sequences of instructions contained in main memory 106 causes processor 104 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in main memory 106. In an alternative embodiment, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, the description herein is not limited to any specific combination of hardware circuitry and software.
[00117] The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to processor 104 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 110. Volatile media include dynamic memory, such as main memory 106. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus 102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
[00118] Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be borne on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 100 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector coupled to bus 102 can receive the data carried in the infrared signal and place the data on bus 102. Bus 102 carries the data to main memory 106, from which processor 104 retrieves and executes the instructions. The instructions received by main memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.
[00119] Computer system 100 also preferably includes a communication interface 118 coupled to bus 102. Communication interface 118 provides a two-way data communication coupling to a network link 120 that is connected to a local network 122. For example, communication interface 118 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 118 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
[00120] Network link 120 typically provides data communication through one or more networks to other data devices. For example, network link 120 may provide a connection through local network 122 to a host computer 124 or to data equipment operated by an Internet Service Provider (ISP) 126. ISP 126 in turn provides data communication services through the worldwide packet data communication network, now commonly referred to as the “Internet” 128. Local network 122 and Internet 128 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 120 and through communication interface 118, which carry the digital data to and from computer system 100, are exemplary forms of carrier waves transporting the information.
[00121] Computer system 100 can send messages and receive data, including program code, through the network(s), network link 120, and communication interface 118. In the Internet example, a server 130 might transmit a requested code for an application program through Internet 128, ISP 126, local network 122 and communication interface 118. One such downloaded application may provide for the illumination optimization of the embodiment, for example. The received code may be executed by processor 104 as it is received, and/or stored in storage device 110, or other non-volatile storage for later execution. In this manner, computer system 100 may obtain application code in the form of a carrier wave.
[00122] The embodiments may further be described using the following clauses:
1. A non-transitory computer-readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
2. The computer-readable medium of clause 1, wherein the first type of burl data includes multiple images of a first burl of the burls, wherein the images are of different image types.
3. The computer-readable medium of clause 2, wherein the second type of burl data includes an image of one or more of the multiple burls, wherein the image is of an image type, the image type being different from or a subset of image types of the images in the first type of burl data.
4. The computer-readable medium of clause 3, wherein the images of the first burl in the first type of burl data are of a first resolution, and wherein the image of the one or more of the multiple burls in the second type of burl data is of a second resolution lower than the first resolution.
5. The computer-readable medium of clause 2, wherein the images of the first burl in the first type of burl data include an intensity image and a height image of the first burl.
6. The computer-readable medium of clause 2, wherein the images of the first burl in the first type of burl data include an intensity image, a height image, and a delta image of the first burl, wherein the delta image is representative of a difference between a first image of the first burl obtained prior to performing a process using the lithographic apparatus on an object held by the object holder and a second image of the first burl obtained after performing the process.
7. The computer-readable medium of clause 2, wherein the images of the first burl in the first type of burl data include a distorted image of any of the images of the first burl.
8. The computer-readable medium of clause 2, wherein the second type of burl data includes one of the images of the first burl.
9. The computer-readable medium of clause 8, wherein the second type of burl data includes an intensity image of the first burl.
10. The computer-readable medium of clause 1, wherein obtaining the first plurality of condition data includes: providing the plurality of first type of burl data to the first machine learning model that is trained to predict condition data representative of a physical condition of a burl of the burls, wherein the plurality of first type of burl data includes first burl data of a first burl of the burls, the first burl data including multiple images of the first burl, wherein the images are of a first resolution; and obtaining the first plurality of condition data, the first plurality of condition data including first condition data that is representative of a physical condition of the first burl.
11. The computer-readable medium of clause 10, wherein obtaining the first plurality of condition data further includes: obtaining second burl data of the plurality of second type of burl data, wherein the second burl data includes an image of one or more of the multiple burls, wherein the image is of a second resolution lower than the first resolution; and associating the first plurality of condition data with the corresponding burls in the image.
12. The computer-readable medium of clause 11, wherein associating the first plurality of condition data includes: obtaining, using the first burl data, location data representative of a location of the first burl on the object holder; determining the first burl in the image of the second burl data based on the location data; and associating the first condition data with the first burl in the image of the second burl data.
13. The computer-readable medium of clause 10, wherein obtaining the first plurality of condition data includes: obtaining a set of first type of burl data associated with the burls, wherein the set of first type of burl data includes third burl data of a third burl of the burls, wherein the third burl data is associated with third condition data that is representative of the physical condition of the third burl; and training, based on the set of first type of burl data, the first machine learning model to predict condition data for the third burl data such that a cost function that is indicative of a difference between the predicted condition data and the third condition data is reduced.
14. The computer-readable medium of clause 1, wherein training the second machine learning model is an iterative process in which each iteration includes:
(a) executing the second machine learning model, using the plurality of second type of burl data and the first plurality of condition data, to output the predicted condition data,
(b) determining a cost function as the difference between the predicted condition data and first condition data of the first plurality of condition data, the first condition data representative of a physical condition of the burl,
(c) adjusting parameters of the second machine learning model,
(d) determining whether the cost function is reduced as a result of the adjusting, and
(e) responsive to the cost function not being reduced, repeating steps (a), (b), (c) and (d).
15. The computer-readable medium of clause 1 further comprising: inputting specified burl data of one or more burls on the object holder to the second machine learning model; and executing the second machine learning model to generate specified condition data of the one or more burls, wherein the specified condition data is representative of physical conditions of the one or more burls.
16. The computer-readable medium of clause 15, wherein the specified burl data includes an intensity image of a specified burl of the one or more burls, and wherein the specified condition data is representative of a physical condition of the specified burl.
17. The computer-readable medium of clause 15, wherein the specified burl data includes a specified image of multiple burls of the one or more burls, and wherein the specified condition data is representative of physical conditions of the burls.
18. The computer-readable medium of clause 15 further comprising: determining a maintenance related action to be performed in association with the object holder based on the specified condition data.
19. The computer-readable medium of clause 1 further comprising: obtaining a plurality of third type of burl data associated with the burls, wherein the plurality of third type of burl data includes fourth burl data of a fourth burl of the burls, the fourth burl data associated with fourth condition data that is representative of a physical condition of the fourth burl; and training, based on the plurality of third type of burl data, a third machine learning model to predict condition data such that a cost function that is indicative of a difference between the predicted condition data and the fourth condition data is reduced.
20. The computer-readable medium of clause 19, wherein the third type of burl data includes surface descriptive parameters associated with a burl of the burls, wherein the surface descriptive parameters are representative of various properties of the burl.
21. The computer-readable medium of clause 20, wherein the surface descriptive parameters are obtained using one or more images of the burl.
22. The computer-readable medium of clause 19 further comprising: inputting specified burl data of a specified burl of the burls to the third machine learning model, wherein the specified burl data is of the third type of burl data; and executing the third machine learning model to generate specified condition data of the specified burl, wherein the specified condition data is representative of a physical condition of the specified burl.
23. The computer-readable medium of clause 1, wherein the object holder is a wafer table or a wafer clamp on which a substrate is positioned.
24. The computer-readable medium of clause 1, wherein the object holder is a mask holder on which a mask associated with a pattern to be printed on a substrate is positioned.
25. A non-transitory computer-readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced; providing a second plurality of first type of burl data associated with the burls as input to the first machine learning model to obtain a second plurality of condition data as second condition data that is representative of physical conditions of the burls; obtaining second training data comprising (a) a plurality of second type of burl data associated with the burls, wherein the plurality of second type of burl data includes second burl data, the second burl data associated with the second condition data; and training, based on the second training data, a second machine learning model to predict condition data for the second burl data such that a cost function that is indicative of a difference between the predicted condition data and the second condition data is reduced.
26. The computer-readable medium of clause 25 further comprising: inputting specified burl data of one or more burls on the object holder to the second machine learning model, wherein the specified burl data is of the second type; and executing the second machine learning model to generate specified condition data of the one or more burls, wherein the specified condition data is representative of physical conditions of the one or more burls.
27. A non-transitory computer-readable medium having instructions that, when executed by a computer, cause the computer to execute a method for determining a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
28. The computer-readable medium of clause 27 further comprising: determining a maintenance related action to be performed in association with the object holder based on the specified condition data.
29. A non-transitory computer-readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
30. A non-transitory computer-readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a feature associated with a subject of an image, the method comprising: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of first type of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of feature data, a second machine learning model to predict feature data representative of a feature of a subject of the subjects.
31. A method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
32. A method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced; providing a second plurality of first type of burl data associated with the burls as input to the first machine learning model to obtain a second plurality of condition data as second condition data that is representative of physical conditions of the burls; obtaining second training data comprising (a) a plurality of second type of burl data associated with the burls, wherein the plurality of second type of burl data includes second burl data, the second burl data associated with the second condition data; and training, based on the second training data, a second machine learning model to predict condition data for the second burl data such that a cost function that is indicative of a difference between the predicted condition data and the second condition data is reduced.
33. A method for determining a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
34. A method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the method comprising: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
35. A method for training a machine learning model to determine a feature associated with a subject of an image, the method comprising: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of first type of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of feature data, a second machine learning model to predict feature data representative of a feature of a subject of the subjects.
36. An apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the apparatus comprising: a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls; obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
37. An apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the apparatus comprising: a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining first training data having a first plurality of first type of burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of first type of burl data includes first burl data of a first burl of the burls, wherein the first burl data is associated with first condition data that is representative of a physical condition of the first burl; training, based on the first training data, a first machine learning model to predict condition data for the first burl data such that a cost function that is indicative of a difference between the predicted condition data and the first condition data is reduced; providing a second plurality of first type of burl data associated with the burls as input to the first machine learning model to obtain a second plurality of condition data as second condition data that is representative of physical conditions of the burls; obtaining second training data comprising (a) a plurality of second type of burl data associated with the burls, wherein the plurality of second type of burl data includes second burl data, the second burl data associated with the second condition data; and training, based on the second training data, a second machine learning model to predict condition data for the second burl data such that a cost function that is indicative of a difference between the predicted condition data and the second condition data is reduced.
38. An apparatus for determining a physical condition of an object holder of a lithographic apparatus, the apparatus comprising: a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining specified burl data associated with multiple burls on a surface of an object holder of a lithographic apparatus, wherein the specified burl data includes an image of one or more of the multiple burls; providing the specified burl data to a machine learning model; and executing the machine learning model to generate specified condition data of the burls, wherein the specified condition data is representative of physical conditions of the burls.
39. An apparatus for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the apparatus comprising: a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining a plurality of image data of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the plurality of image data includes first image data comprising multiple images of a first burl of the burls, wherein the first image data is associated with first condition data that is representative of a physical condition of the first burl; and training, based on the plurality of image data, a machine learning model to predict condition data representative of a physical condition of a burl of the burls.
40. An apparatus for training a machine learning model to determine a feature associated with a subject of an image, the apparatus comprising: a memory storing a set of instructions; and a processor configured to execute the set of instructions to cause the apparatus to perform a method of: obtaining, from a first machine learning model, a plurality of feature data using a plurality of first type of image data associated with a plurality of subjects, the plurality of feature data including first feature data that is representative of a feature of a first subject of the subjects, the plurality of first type of image data including first image data comprising an image of the first subject at a first resolution; obtaining a plurality of second type of image data associated with the subjects, the plurality of second type of image data including second image data comprising an image of the first subject at a second resolution lower than the first resolution; associating the plurality of second type of image data with the plurality of feature data, wherein the associating includes associating the second image data with the first feature data; and training, based on the plurality of second type of image data and the plurality of feature data, a second machine learning model to predict feature data representative of a feature of a subject of the subjects.
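The clause listing above repeatedly recites one pipeline: a first machine learning model is trained on rich first-type burl data (e.g., high-resolution intensity and height images) with known condition data; its predictions then serve as condition labels for burls observed only through cheaper second-type data; and a second model is trained on that pairing (see clauses 1, 14, and 25). As a rough illustration only — logistic-regression models and synthetic feature vectors standing in for the claimed machine learning models and burl images, with every variable name hypothetical — the flow might be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_logistic(X, y, lr=0.5, max_iter=500):
    """Mirrors steps (a)-(e) of clause 14: execute the model, compute the
    cost, adjust parameters, and repeat while the cost keeps being reduced."""
    w = np.zeros(X.shape[1])
    prev_cost = np.inf
    for _ in range(max_iter):
        p = sigmoid(X @ w)                                 # (a) execute the model
        cost = -np.mean(y * np.log(p + 1e-9)
                        + (1 - y) * np.log(1 - p + 1e-9))  # (b) cost vs. labels
        if cost >= prev_cost:                              # (d)/(e) stop once the
            break                                          #     cost is not reduced
        prev_cost = cost
        w -= lr * X.T @ (p - y) / len(y)                   # (c) adjust parameters
    return w

def predict(w, X):
    return (sigmoid(X @ w) > 0.5).astype(int)

# Synthetic stand-ins (hypothetical): rows are burls, columns are features
# extracted from first-type burl data (e.g. high-res intensity + height images).
n_labeled = 400
high_res = rng.normal(size=(n_labeled, 8))
true_w = np.linspace(1.0, 0.3, 8)                  # hidden "wear" direction
condition = (high_res @ true_w > 0).astype(int)    # 1 = degraded burl, 0 = healthy

# Stage 1: train the first ("teacher") model on labeled first-type burl data.
teacher_w = train_logistic(high_res, condition)

# Use the teacher to produce condition labels for burls without manual labels.
unlabeled_high = rng.normal(size=(2000, 8))
pseudo_condition = predict(teacher_w, unlabeled_high)

# Second-type burl data: a cheaper, lower-resolution view of the same burls
# (simulated here by keeping only the first four feature dimensions).
low_res = unlabeled_high[:, :4]

# Stage 2: train the second ("student") model on second-type data plus the
# teacher-generated condition data.
student_w = train_logistic(low_res, pseudo_condition)

# How often the low-res student reproduces the teacher's condition calls.
test_high = rng.normal(size=(500, 8))
agreement = np.mean(predict(student_w, test_high[:, :4])
                    == predict(teacher_w, test_high))
print(f"student/teacher agreement on low-res data: {agreement:.2f}")
```

The early-exit check in `train_logistic` corresponds to the iterative scheme of clause 14, where the loop repeats only while the adjustment reduces the cost function; the two training stages correspond to the teacher/student structure of clauses 1 and 25, in which the second model learns to estimate burl condition from data too coarse to label directly.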
[00123] While the concepts disclosed herein may be used for imaging on a substrate such as a silicon wafer, it shall be understood that the disclosed concepts may be used with any type of lithographic imaging systems, e.g., those used for imaging on substrates other than silicon wafers.
[00124] The terms “optimizing” and “optimization” as used herein refer to or mean adjusting a patterning apparatus (e.g., a lithography apparatus), a patterning process, etc. such that results and/or processes have more desirable characteristics, such as higher accuracy of projection of a design pattern on a substrate, a larger process window, etc. Thus, the terms “optimizing” and “optimization” as used herein refer to or mean a process that identifies one or more values for one or more parameters that provide an improvement, e.g., a local optimum, in at least one relevant metric, compared to an initial set of one or more values for those one or more parameters. “Optimum” and other related terms should be construed accordingly. In an embodiment, optimization steps can be applied iteratively to provide further improvements in one or more metrics.
[00125] Aspects of the invention can be implemented in any convenient form. For example, an embodiment may be implemented by one or more appropriate computer programs which may be carried on an appropriate carrier medium which may be a tangible carrier medium (e.g., a disk) or an intangible carrier medium (e.g., a communications signal). Embodiments of the invention may be implemented using suitable apparatus which may specifically take the form of a programmable computer running a computer program arranged to implement a method as described herein. Thus, embodiments of the disclosure may be implemented in hardware, firmware, software, or any combination thereof. Embodiments of the disclosure may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others. Further, firmware, software, routines, or instructions may be described herein as performing certain actions. However, it should be appreciated that such descriptions are merely for convenience and that such actions in fact result from computing devices, processors, controllers, or other devices executing the firmware, software, routines, instructions, etc.
[00126] In block diagrams, illustrated components are depicted as discrete functional blocks, but embodiments are not limited to systems in which the functionality described herein is organized as illustrated. The functionality provided by each of the components may be provided by software or hardware modules that are differently organized than is presently depicted, for example such software or hardware may be intermingled, conjoined, replicated, broken up, distributed (e.g., within a data center or geographically), or otherwise differently organized. The functionality described herein may be provided by one or more processors of one or more computers executing code stored on a tangible, non-transitory, machine-readable medium. In some cases, third party content delivery networks may host some or all of the information conveyed over networks, in which case, to the extent information (e.g., content) is said to be supplied or otherwise provided, the information may be provided by sending instructions to retrieve that information from a content delivery network.
[00127] Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining” or the like refer to actions or processes of a specific apparatus, such as a special purpose computer or a similar special purpose electronic processing/computing device.
[00128] The reader should appreciate that the present application describes several inventions. Rather than separating those inventions into multiple isolated patent applications, these inventions have been grouped into a single document because their related subject matter lends itself to economies in the application process. But the distinct advantages and aspects of such inventions should not be conflated. In some cases, embodiments address all of the deficiencies noted herein, but it should be understood that the inventions are independently useful, and some embodiments address only a subset of such problems or offer other, unmentioned benefits that will be apparent to those of skill in the art reviewing the present disclosure. Due to cost constraints, some inventions disclosed herein may not be presently claimed and may be claimed in later filings, such as continuation applications or by amending the present claims. Similarly, due to space constraints, neither the Abstract nor the Summary sections of the present document should be taken as containing a comprehensive listing of all such inventions or all aspects of such inventions.
[00129] It should be understood that the description and the drawings are not intended to limit the present disclosure to the particular form disclosed, but to the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the inventions as defined by the appended claims.
[00130] Modifications and alternative embodiments of various aspects of the inventions will be apparent to those skilled in the art in view of this description. Accordingly, this description and the drawings are to be construed as illustrative only and are for the purpose of teaching those skilled in the art the general manner of carrying out the inventions. It is to be understood that the forms of the inventions shown and described herein are to be taken as examples of embodiments. Elements and materials may be substituted for those illustrated and described herein, parts and processes may be reversed or omitted, certain features may be utilized independently, and embodiments or features of embodiments may be combined, all as would be apparent to one skilled in the art after having the benefit of this description. Changes may be made in the elements described herein without departing from the spirit and scope of the invention as described in the following claims. Headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description.
[00131] As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). The words “include”, “including”, and “includes” and the like mean including, but not limited to. As used throughout this application, the singular forms “a,” “an,” and “the” include plural referents unless the content explicitly indicates otherwise. Thus, for example, reference to “an” element or “a” element includes a combination of two or more elements, notwithstanding use of other terms and phrases for one or more elements, such as “one or more.” As used herein, unless specifically stated otherwise, the term “or” encompasses all possible combinations, except where infeasible. For example, if it is stated that a component may include A or B, then, unless specifically stated otherwise or infeasible, the component may include A, or B, or A and B. As a second example, if it is stated that a component may include A, B, or C, then, unless specifically stated otherwise or infeasible, the component may include A, or B, or C, or A and B, or A and C, or B and C, or A and B and C.
[00132] Terms describing conditional relationships, e.g., “in response to X, Y,” “upon X, Y,” “if X, Y,” “when X, Y,” and the like, encompass causal relationships in which the antecedent is a necessary causal condition, the antecedent is a sufficient causal condition, or the antecedent is a contributory causal condition of the consequent, e.g., “state X occurs upon condition Y obtaining” is generic to “X occurs solely upon Y” and “X occurs upon Y and Z.” Such conditional relationships are not limited to consequences that instantly follow the antecedent obtaining, as some consequences may be delayed, and in conditional statements, antecedents are connected to their consequents, e.g., the antecedent is relevant to the likelihood of the consequent occurring. Statements in which a plurality of attributes or functions are mapped to a plurality of objects (e.g., one or more processors performing steps A, B, C, and D) encompass both all such attributes or functions being mapped to all such objects and subsets of the attributes or functions being mapped to subsets of the objects (e.g., both all processors each performing steps A-D, and a case in which processor 1 performs step A, processor 2 performs step B and part of step C, and processor 3 performs part of step C and step D), unless otherwise indicated. Further, unless otherwise indicated, statements that one value or action is “based on” another condition or value encompass both instances in which the condition or value is the sole factor and instances in which the condition or value is one factor among a plurality of factors. Unless otherwise indicated, statements that “each” instance of some collection has some property should not be read to exclude cases where some otherwise identical or similar members of a larger collection do not have the property, i.e., each does not necessarily mean each and every. References to selection from a range include the end points of the range.
[00133] In the above description, any processes, descriptions, or blocks in flowcharts should be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations, in which functions can be executed out of order from that shown or discussed, including substantially concurrently or in reverse order depending upon the functionality involved, are included within the scope of the exemplary embodiments of the present advancements, as would be understood by those skilled in the art.
[00134] To the extent certain U.S. patents, U.S. patent applications, or other materials (e.g., articles) have been incorporated by reference, the text of such U.S. patents, U.S. patent applications, and other materials is only incorporated by reference to the extent that no conflict exists between such material and the statements and drawings set forth herein. In the event of such conflict, any such conflicting text in such incorporated by reference U.S. patents, U.S. patent applications, and other materials is specifically not incorporated by reference herein.
[00135] While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the present disclosures. Indeed, the novel methods, apparatuses and systems described herein can be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods, apparatuses and systems described herein can be made without departing from the spirit of the present disclosures. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the present disclosures.

Claims

1. A non-transitory computer-readable medium having instructions that, when executed by a computer, cause the computer to execute a method for training a machine learning model to determine a physical condition of an object holder of a lithographic apparatus, the method comprising:
obtaining, from a first machine learning model, a first plurality of condition data that is representative of a physical condition of multiple burls on a surface of an object holder of a lithographic apparatus, wherein the first plurality of condition data is obtained using a plurality of first type of burl data associated with the burls;
obtaining a plurality of second type of burl data associated with the burls, the second type of burl data being different from the first type of burl data; and
training, based on the plurality of second type of burl data and the first plurality of condition data, a second machine learning model to predict condition data representative of a physical condition of a burl of the burls.
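The two-model arrangement of claim 1 resembles a pseudo-labeling (teacher-student) pipeline: an already-trained first model produces condition data from one type of burl data, and a second model is trained to predict that condition data from a different type of burl data. The following is a minimal sketch only; the models, data shapes, and the thresholding damage criterion are all assumptions made for illustration, not the claimed implementation:

```python
import numpy as np

class ThresholdTeacher:
    """Stand-in for the first (trained) model: labels each burl as
    damaged (1) or healthy (0) from high-resolution image statistics."""
    def predict(self, images):
        # images: (n_burls, H, W); a dark mean intensity flags damage here.
        return (images.mean(axis=(1, 2)) < 0.5).astype(int)

class CentroidStudent:
    """Stand-in for the second model: a nearest-class-mean classifier
    over low-resolution burl features."""
    def fit(self, x, y):
        self.centroids = {int(c): x[y == c].mean(axis=0) for c in np.unique(y)}
        return self

    def predict(self, x):
        classes = sorted(self.centroids)
        dists = np.stack([np.linalg.norm(x - self.centroids[c], axis=1)
                          for c in classes])
        return np.array(classes)[dists.argmin(axis=0)]

def train_second_model(teacher, student, first_type_data, second_type_data):
    # Stage 1: the first model turns first-type burl data into condition data.
    condition_labels = teacher.predict(first_type_data)
    # Stage 2: the second model is trained on second-type burl data
    # paired with those condition labels.
    return student.fit(second_type_data, condition_labels)
```

In practice both models would be far richer (e.g., networks over multi-channel burl images), but the data flow is the same: the first model's outputs serve as training labels for the second.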
2. The computer-readable medium of claim 1, wherein the first type of burl data includes multiple images of a first burl of the burls, wherein the images are of different image types.
3. The computer-readable medium of claim 2, wherein the second type of burl data includes an image of one or more of the multiple burls, wherein the image is of an image type, the image type being different from or a subset of image types of the images in the first type of burl data.
4. The computer-readable medium of claim 3, wherein the images of the first burl in the first type of burl data are of a first resolution, and wherein the image of the one or more of the multiple burls in the second type of burl data is of a second resolution lower than the first resolution.
5. The computer-readable medium of claim 2, wherein the images of the first burl in the first type of burl data include an intensity image and a height image of the first burl.
6. The computer-readable medium of claim 2, wherein the images of the first burl in the first type of burl data include an intensity image, a height image, and a delta image of the first burl, wherein the delta image is representative of a difference between a first image of the first burl obtained prior to performing a process using the lithographic apparatus on an object held by the object holder and a second image of the first burl obtained after performing the process.
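The delta image of claim 6 is, in essence, a per-pixel difference between images of the same burl captured before and after a process run. A minimal sketch, with array shapes and dtypes assumed for illustration:

```python
import numpy as np

def delta_image(before: np.ndarray, after: np.ndarray) -> np.ndarray:
    """Per-pixel difference between a burl image captured before a
    process run and one captured after it; nonzero values indicate
    a change in the imaged burl surface."""
    if before.shape != after.shape:
        raise ValueError("before/after images must share a shape")
    # Cast to float so the subtraction cannot wrap around for
    # unsigned integer sensor data.
    return after.astype(np.float64) - before.astype(np.float64)
```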
7. The computer-readable medium of claim 2, wherein the images of the first burl in the first type of burl data include a distorted image of any of the images of the first burl.
8. The computer-readable medium of claim 2, wherein the second type of burl data includes one of the images of the first burl.
9. The computer-readable medium of claim 8, wherein the second type of burl data includes an intensity image of the first burl.
10. The computer-readable medium of claim 1, wherein obtaining the first plurality of condition data includes:
providing the plurality of first type of burl data to the first machine learning model that is trained to predict condition data representative of a physical condition of a burl of the burls, wherein the plurality of first type of burl data includes first burl data of a first burl of the burls, the first burl data including multiple images of the first burl, wherein the images are of a first resolution; and
obtaining the first plurality of condition data, the first plurality of condition data including first condition data that is representative of a physical condition of the first burl.
11. The computer-readable medium of claim 10, wherein obtaining the first plurality of condition data further includes:
obtaining second burl data of the plurality of second type of burl data, wherein the second burl data includes an image of one or more of the multiple burls, wherein the image is of a second resolution lower than the first resolution; and
associating the first plurality of condition data with the corresponding burls in the image.
12. The computer-readable medium of claim 11, wherein associating the first plurality of condition data includes:
obtaining, using the first burl data, location data representative of a location of the first burl on the object holder;
determining the first burl in the image of the second burl data based on the location data; and
associating the first condition data with the first burl in the image of the second burl data.
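The association of claim 12 can be pictured as mapping each burl's physical location on the object holder to pixel coordinates in the lower-resolution image and attaching that burl's condition data there. A sketch under assumed coordinate conventions; the origin and millimeters-per-pixel parameters are illustrative, not from the claims:

```python
def associate_condition_data(burl_locations_mm, condition_data,
                             image_origin_mm, mm_per_pixel):
    """Map each burl's (x, y) location in millimeters to pixel
    coordinates in a lower-resolution holder image, and attach the
    burl's condition data to those coordinates."""
    associated = {}
    for (x_mm, y_mm), condition in zip(burl_locations_mm, condition_data):
        # Convert physical location to a pixel index in the image.
        px = round((x_mm - image_origin_mm[0]) / mm_per_pixel)
        py = round((y_mm - image_origin_mm[1]) / mm_per_pixel)
        associated[(px, py)] = condition
    return associated
```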
13. The computer-readable medium of claim 10, wherein obtaining the first plurality of condition data includes:
obtaining a set of first type of burl data associated with the burls, wherein the set of first type of burl data includes third burl data of a third burl of the burls, wherein the third burl data is associated with third condition data that is representative of the physical condition of the third burl; and
training, based on the set of first type of burl data, the first machine learning model to predict condition data for the third burl data such that a cost function that is indicative of a difference between the predicted condition data and the third condition data is reduced.
14. The computer-readable medium of claim 1, wherein training the second machine learning model is an iterative process in which each iteration includes:
(a) executing the second machine learning model, using the plurality of second type of burl data and the first plurality of condition data, to output the predicted condition data,
(b) determining a cost function as the difference between the predicted condition data and first condition data of the first plurality of condition data, the first condition data being representative of a physical condition of the burl,
(c) adjusting parameters of the second machine learning model,
(d) determining whether the cost function is reduced as a result of the adjusting, and
(e) responsive to the cost function not being reduced, repeating steps (a), (b), (c) and (d).
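The iterative training of claim 14 is a generic reduce-the-cost loop. A sketch in which the model, cost function, and parameter-adjustment rule are placeholder callables supplied by the caller, not specific implementations from the disclosure:

```python
def train_iteratively(params, predict, cost, adjust, inputs, targets,
                      max_iters=100):
    """Sketch of the claim-14 loop: (a) execute the model, (b) compute
    the cost against the reference condition data, (c) adjust the
    parameters, (d) check whether the cost was reduced, and (e) repeat
    while it keeps improving."""
    prev_cost = float("inf")
    for _ in range(max_iters):
        predictions = predict(params, inputs)    # (a) execute the model
        current = cost(predictions, targets)     # (b) cost vs. condition data
        if current >= prev_cost:                 # (d)/(e) stop once not reduced
            break
        prev_cost = current
        params = adjust(params, inputs, targets) # (c) adjust parameters
    return params
```

With a squared-error cost and a gradient step as the adjustment rule, this loop reduces to plain gradient descent with an early stop once the cost no longer decreases.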
15. The computer-readable medium of claim 1, further comprising:
inputting specified burl data of one or more burls on the object holder to the second machine learning model; and
executing the second machine learning model to generate specified condition data of the one or more burls, wherein the specified condition data is representative of physical conditions of the one or more burls.

Priority Applications (1)

CN202280044076.7A (priority 2021-06-23, filed 2022-05-27): Training machine learning models for determining burl problems

Applications Claiming Priority (2)

US202163213789P (priority 2021-06-23, filed 2021-06-23)
US 63/213,789 (2021-06-23)

Publications (1)

WO2022268433A1, published 2022-12-29

Family

ID=82218430

Family Applications (1)

PCT/EP2022/064473 (WO2022268433A1), filed 2022-05-27, priority 2021-06-23: Training a machine learning model for determining burl issues

Country Status (2)

CN: CN117529692A
WO: WO2022268433A1

Citations (3)

* Cited by examiner, † Cited by third party

US20200292932A1 * (Taiwan Semiconductor Manufacturing Co., Ltd.; priority 2019-03-14, published 2020-09-17): Reticle and method of detecting intactness of reticle stage using the same
WO2020259930A1 * (ASML Netherlands B.V.; priority 2019-06-28, published 2020-12-30): Substrate handling system of a lithography apparatus and method thereof
WO2021022291A1 * (Lam Research Corporation; priority 2019-07-26, published 2021-02-04): Integrated adaptive positioning systems and routines for automated wafer-handling robot teach and health check


Also Published As

CN117529692A, published 2024-02-06


Legal Events

121 (EP): the EPO has been informed by WIPO that EP was designated in this application (ref document number 22733882, country EP, kind code A1)
WWE (WIPO information: entry into national phase): ref document number 202280044076.7, country CN
NENP (non-entry into the national phase): ref country code DE
122 (EP): PCT application non-entry in European phase (ref document number 22733882, country EP, kind code A1)