CN117561539A - Deep learning for unsupervised or self-supervised semiconductor-based applications - Google Patents

Deep learning for unsupervised or self-supervised semiconductor-based applications

Info

Publication number
CN117561539A
Authority
CN
China
Prior art keywords
sample
sample image
image
information
data generated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280043897.9A
Other languages
Chinese (zh)
Inventor
Jing Zhang
R. Seagarajan
Yujie Dong
J. Qiang Song
K. Bhaskar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
KLA Corp
Original Assignee
KLA Tencor Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 17/671,519 (published as US 2024/0013365 A9)
Application filed by KLA Tencor Corp
Publication of CN117561539A


Abstract

Methods and systems for determining information for a sample are provided. One system includes a computer subsystem and one or more components executed by the computer subsystem, the one or more components including a deep learning (DL) model trained without labeled data (e.g., in an unsupervised or self-supervised manner) and configured to generate a reference for a sample from one or more inputs that include at least a sample image or data generated from the sample image. The computer subsystem is configured for determining information for the sample from the reference and at least the sample image or the data generated from the sample image.

Description

Deep learning for unsupervised or self-supervised semiconductor-based applications
Technical Field
The present invention generally relates to methods and systems for determining information of a sample. Certain embodiments relate to a deep learning model that is trained without labeled data (e.g., in an unsupervised or self-supervised manner) and is configured to generate a reference for a sample from one or more inputs including at least one sample image or data generated from the sample image.
Background
The following description and examples are not admitted to be prior art by virtue of their inclusion in this section.
Manufacturing semiconductor devices such as logic and memory devices typically involves processing a substrate such as a semiconductor wafer using a large number of semiconductor manufacturing processes to form various features and multiple levels of the semiconductor devices. For example, photolithography is a semiconductor manufacturing process that involves transferring a pattern from a reticle to a photoresist disposed on a semiconductor wafer. Additional examples of semiconductor manufacturing processes include, but are not limited to, chemical mechanical polishing (CMP), etching, deposition, and ion implantation. Multiple semiconductor devices may be fabricated in an arrangement on a single semiconductor wafer and then separated into individual semiconductor devices.
Inspection processes are used at various steps during semiconductor manufacturing to detect defects on samples to drive higher yields, and thus higher profits, in the manufacturing process. Inspection has always been an important part of manufacturing semiconductor devices. However, as the dimensions of semiconductor devices decrease, inspection becomes even more important to the successful manufacture of acceptable semiconductor devices because smaller defects can cause the devices to fail.
Defect review typically involves re-detecting defects that were detected by an inspection process and generating additional, higher-resolution information about the defects using either a high-magnification optical system or a scanning electron microscope (SEM). Defect review is therefore performed at discrete locations on the sample where defects have been detected by inspection. The higher-resolution data generated by defect review is better suited for determining attributes of the defects such as profile, roughness, and more accurate size information. Defects can generally be classified into defect types more accurately based on information determined by defect review than by inspection.
The metrology process is also used at various steps during the semiconductor manufacturing process to monitor and control the process. The metrology process is different from the inspection process in that, unlike the inspection process in which defects are detected on a sample, the metrology process is used to measure one or more characteristics of a sample that cannot be determined using currently used inspection tools. For example, metrology processes are used to measure one or more characteristics of a sample, such as the dimensions (e.g., line width, thickness, etc.) of features formed on the sample during the process, so that the performance of the process can be determined from the one or more characteristics. Additionally, if one or more characteristics of the sample are unacceptable (e.g., outside a predetermined range of characteristics), the measurement of the one or more characteristics of the sample may be used to alter one or more parameters of the process such that additional samples manufactured by the process have acceptable characteristics.
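The feedback loop described above, where a measured characteristic outside a predetermined range is used to alter a process parameter for subsequent samples, can be sketched in a few lines. This is purely illustrative: the function name, the proportional correction, and its sign convention are assumptions, not details from this disclosure.

```python
def metrology_feedback(measured_nm, target_nm, tol_nm):
    """If a measured characteristic (e.g., line width) falls outside the
    predetermined range [target - tol, target + tol], return a corrective
    offset for a process parameter; otherwise return None.
    (Illustrative sketch only; the correction rule is an assumption.)"""
    if target_nm - tol_nm <= measured_nm <= target_nm + tol_nm:
        return None  # within spec: no process change needed
    # Proportional correction intended to drive subsequent samples
    # back toward the target dimension.
    return target_nm - measured_nm
```

For example, a 55 nm line width measured against a 50 nm target with a 2 nm tolerance would yield a -5 nm correction, while a 50.5 nm measurement would yield no change.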
Metrology processes are also different from defect review processes in that, unlike defect review, in which defects detected by inspection are revisited, metrology processes may be performed at locations where no defect has been detected. In other words, unlike defect review, the locations at which a metrology process is performed on a sample may be independent of the results of an inspection process performed on the sample. In particular, the locations at which metrology is performed may be selected independently of inspection results. In addition, because these locations may be selected independently of inspection results, the locations at which metrology will be performed can be determined before an inspection process has even been performed on the sample, whereas the locations at which defect review is to be performed cannot be determined until inspection results for the sample have been generated and are available for use.
Many different kinds of algorithms are currently used with the processes described above, varying with the process itself, the sample, and the information being determined. These algorithms can be divided into categories in various ways, for example, into deep learning (DL) based approaches and non-DL based approaches. In the inspection example, some non-DL defect detection algorithms are unsupervised and use frequency-based measures of marginal or joint probabilities. One example of a non-DL defect detection algorithm used by some inspection tools commercially available from KLA Corp., Milpitas, California, is the multi-die auto-thresholding (MDAT) algorithm. In contrast to such algorithms, machine learning or DL enabled supervised detection may be performed via convolutional neural networks (CNNs) or object detection networks.
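The general idea behind multi-die comparison with an automatically derived threshold can be sketched as follows. The actual MDAT algorithm is proprietary; every detail below (a per-pixel median reference, a k-sigma cutoff on the difference statistics) is an assumption chosen only to illustrate the unsupervised, statistics-driven character of such algorithms.

```python
from statistics import median, pstdev

def multi_die_detect(die_images, k=3.0):
    """Illustrative multi-die defect detection (NOT KLA's MDAT).

    die_images: list of equally sized 2-D grids (lists of lists of gray
    levels) from nominally identical dies. The per-pixel median across
    dies serves as a computed reference; a pixel is flagged when its
    difference from the reference exceeds k standard deviations of all
    observed differences (the "auto-threshold")."""
    rows, cols = len(die_images[0]), len(die_images[0][0])
    # Per-pixel median across dies -> computed reference image.
    ref = [[median(img[r][c] for img in die_images) for c in range(cols)]
           for r in range(rows)]
    # Difference of every die image against the reference.
    diffs = [[[img[r][c] - ref[r][c] for c in range(cols)]
              for r in range(rows)] for img in die_images]
    # Derive the detection threshold from the difference statistics.
    flat = [d for img in diffs for row in img for d in row]
    thresh = k * pstdev(flat)
    events = [(i, r, c)
              for i, img in enumerate(diffs)
              for r in range(rows) for c in range(cols)
              if abs(img[r][c]) > thresh]
    return ref, events
```

With three nominally identical dies in which one die has a single strongly deviating pixel, the median reference suppresses the outlier and only that pixel is flagged, with no labeled training data involved.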
While many of the algorithms described above have proven useful in the field to varying degrees, these approaches still have drawbacks that can be improved upon. For example, many non-DL defect detection algorithms are difficult to apply to multi-mode or multi-angle data input. The ability to utilize multi-mode or multi-angle data input is becoming increasingly important as tools are pushed beyond the best performance that can be achieved using single-mode data alone. In another example, the machine learning or DL defect detection methods described above may require substantially large training data sets, which are not always available in practice or may incur a substantially high cost of ownership, in terms of time and physical resources (such as wafers or other samples), to obtain.
Accordingly, it would be advantageous to develop systems and methods for determining information of a sample that do not have one or more of the above-described drawbacks.
Disclosure of Invention
The following description of various embodiments should in no way be construed as limiting the scope of the appended claims.
One embodiment relates to a system configured to determine information for a sample. The system includes a computer subsystem and one or more components executed by the computer subsystem, the one or more components including a deep learning (DL) model trained without labeled data and configured to generate a reference for a sample from one or more inputs including at least a sample image or data generated from the sample image. The computer subsystem is configured for determining information for the sample from the reference and at least the sample image or the data generated from the sample image. The system may be further configured as described herein.
Another embodiment relates to a computer-implemented method for determining information for a sample. The method includes generating a reference for the sample by inputting one or more inputs into a DL model trained without labeled data. The one or more inputs include at least a sample image or data generated from the sample image. The method also includes determining information for the sample from the reference and at least the sample image or the data generated from the sample image. The inputting and determining steps are performed by a computer subsystem. Each of the steps of the method may be performed as described further herein. The method may include any other step(s) of any other method(s) described herein. The method may be performed by any of the systems described herein.
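As a toy illustration of what "trained without labeled data" can mean in practice, the sketch below trains a tiny linear autoencoder with a purely self-supervised reconstruction objective: the input itself is the target, so no labels are required. Every architectural and numerical choice here (two-element patches, a one-unit bottleneck, the learning rate and epoch count) is an arbitrary assumption for illustration, not a detail of the disclosed embodiments.

```python
import random

def train_toy_autoencoder(patches, dim, hidden=1, epochs=300, lr=0.01):
    """Linear autoencoder trained by SGD on a self-supervised
    reconstruction loss ||decode(encode(x)) - x||^2 (no labels used).
    Returns the mean squared reconstruction error before and after
    training. (Illustrative sketch only.)"""
    rng = random.Random(0)
    # Encoder weights W (hidden x dim) and decoder weights V (dim x hidden).
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(hidden)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def reconstruct(x):
        h = [sum(W[j][i] * x[i] for i in range(dim)) for j in range(hidden)]
        xhat = [sum(V[i][j] * h[j] for j in range(hidden)) for i in range(dim)]
        return xhat, h

    def mse():
        return sum(sum((r - xi) ** 2 for r, xi in zip(reconstruct(x)[0], x))
                   for x in patches) / len(patches)

    before = mse()
    for _ in range(epochs):
        for x in patches:
            xhat, h = reconstruct(x)
            err = [xhat[i] - x[i] for i in range(dim)]
            # Gradient step for the decoder: dL/dV[i][j] = 2*err[i]*h[j].
            for i in range(dim):
                for j in range(hidden):
                    V[i][j] -= lr * 2 * err[i] * h[j]
            # Gradient step for the encoder: dL/dW[j][i] = 2*(V^T err)[j]*x[i].
            for j in range(hidden):
                back = 2 * sum(err[i] * V[i][j] for i in range(dim))
                for i in range(dim):
                    W[j][i] -= lr * back * x[i]
    return before, mse()
```

When the training patches lie near a low-dimensional structure (here, a line through the origin), the bottleneck learns that structure and the reconstruction error drops sharply; a patch that deviates from the learned structure would then reconstruct poorly, which is one way such a model can serve as a learned reference for detection.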
Another embodiment relates to a non-transitory computer-readable medium storing program instructions executable on a computer system to perform a computer-implemented method for determining information of a sample. The computer-implemented method comprises the steps of the method described above. The computer-readable medium may be further configured as described herein. The steps of the computer-implemented method may be performed as further described herein. Additionally, a computer-implemented method for which program instructions may be executed may include any other steps of any other method described herein.
Drawings
Further advantages of the present invention will become apparent to those skilled in the art upon reading the following detailed description of the preferred embodiments and upon reference to the accompanying drawings in which:
FIGS. 1 and 1a are schematic diagrams illustrating side views of embodiments of systems configured as described herein;
FIGS. 2-3 are flowcharts illustrating embodiments of steps that may be performed for determining information of a sample; and
FIG. 4 is a block diagram illustrating one embodiment of a non-transitory computer-readable medium storing program instructions for causing a computer system to perform the computer-implemented methods described herein.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. The drawings may not be to scale. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
Detailed Description
Referring now to the drawings, it is noted that the figures are not drawn to scale. In particular, the scale of some of the elements of the figures is greatly exaggerated to emphasize characteristics of the elements. It is also noted that the figures are not drawn to the same scale. Elements shown in more than one figure that may be similarly configured are indicated using the same reference numerals. Unless otherwise noted herein, any of the elements described and shown may include any suitable commercially available elements.
In general, the embodiments described herein are configured for determining information for a sample, for inspection applications (e.g., detecting defects on the sample) and/or other semiconductor-based applications such as metrology and defect review, via learning a reference (e.g., a reference image or the structural noise of the sample).
In some embodiments, the sample is a wafer. The wafer may include any wafer known in the semiconductor arts. Although some embodiments may be described herein with respect to a wafer or wafers, the embodiments are not limited in the samples for which they may be used. For example, the embodiments described herein may be used for samples such as reticles, flat panels, personal computer (PC) boards, and other semiconductor samples.
One embodiment of a system configured for determining information of a sample is shown in fig. 1. In some embodiments, system 10 includes an imaging subsystem, such as imaging subsystem 100. The imaging subsystem includes and/or is coupled to a computer subsystem, such as computer subsystem 36 and/or one or more computer systems 102.
In general, the imaging subsystem described herein includes at least one energy source, a detector, and a scanning subsystem. The energy source is configured to generate energy directed to the sample by the imaging subsystem. The detector is configured to detect energy from the sample and to generate an output in response to the detected energy. The scanning subsystem is configured to change the location on the sample to which energy is directed and from which energy is detected. In one embodiment, as shown in fig. 1, the imaging subsystem is configured as a light-based imaging subsystem. In this way, the sample images described herein may be generated by a light-based imaging subsystem.
In the light-based imaging subsystem described herein, the energy directed to the sample includes light, and the energy detected from the sample includes light. For example, in the embodiment of the system shown in fig. 1, the imaging subsystem includes an illumination subsystem configured to direct light to the sample 14. The illumination subsystem includes at least one light source. For example, as shown in fig. 1, the illumination subsystem includes a light source 16. The illumination subsystem is configured to direct light to the sample at one or more angles of incidence, which may include one or more tilt angles and/or one or more normal angles. For example, as shown in fig. 1, light from the light source 16 is directed through the optical element 18 and then through the lens 20 to the sample 14 at an oblique angle of incidence. The oblique angle of incidence may include any suitable oblique angle of incidence, which may vary depending on, for example, the characteristics of the sample and the process to be performed on the sample.
The illumination subsystem may be configured to direct light to the sample at different angles of incidence at different times. For example, the imaging subsystem may be configured to alter one or more characteristics of one or more elements of the illumination subsystem such that light may be directed to the sample at an angle of incidence different than that shown in fig. 1. In one such example, the imaging subsystem may be configured to move light source 16, optical element 18, and lens 20 such that light is directed to the sample at different oblique or normal (or near normal) angles of incidence.
In some examples, the imaging subsystem may be configured to direct light to the sample at more than one angle of incidence at the same time. For example, the illumination subsystem may include more than one illumination channel, one of the illumination channels may include light source 16, optical element 18, and lens 20 as shown in fig. 1, and another of the illumination channels (not shown) may include similar elements, which may be configured differently or the same, or may include at least a light source and possibly one or more other components such as those described further herein. If such light is directed to the sample at the same time as the other light, one or more characteristics (e.g., wavelength, polarization, etc.) of the light directed to the sample at the different angles of incidence may be different, such that the light resulting from illuminating the sample at the different angles of incidence can be distinguished from each other at the detector(s).
In another example, the illumination subsystem may include only one light source (e.g., source 16 shown in fig. 1) and light from the light source may be separated into different optical paths (e.g., based on wavelength, polarization, etc.) by one or more optical elements (not shown) of the illumination subsystem. Light in each of the different optical paths may then be directed to the sample. Multiple illumination channels may be configured to direct light to the sample at the same time or at different times (e.g., when different illumination channels are used to sequentially illuminate the sample). In another example, the same illumination channel may be configured to direct light having different characteristics to the sample at different times. For example, optical element 18 may be configured as a spectral filter, and the properties of the spectral filter may be changed in a variety of different ways (e.g., by swapping out one spectral filter for another) such that light of different wavelengths can be directed to the sample at different times. The illumination subsystem may have any other suitable configuration known in the art for directing light having different or the same characteristics to the sample at different or the same angles of incidence, sequentially or simultaneously.
Light source 16 may include a broadband plasma (BBP) light source. In this manner, the light generated by the light source and directed to the sample may include broadband light. However, the light source may include any other suitable light source such as any suitable laser known in the art configured to generate light at any suitable wavelength(s). The laser may be configured to generate light that is monochromatic or nearly monochromatic. In this manner, the laser may be a narrowband laser. The light source may also include a polychromatic light source that generates light at multiple discrete wavelengths or wavebands.
Light from the optical element 18 may be focused onto the sample 14 by a lens 20. Although lens 20 is shown in fig. 1 as a single refractive optical element, in practice lens 20 may include several refractive and/or reflective optical elements that focus light from the optical elements in combination to the sample. The illumination subsystem shown in fig. 1 and described herein may include any other suitable optical elements (not shown). Examples of such optical elements include, but are not limited to, polarizing components, spectral filters, spatial filters, reflective optical elements, apodizers, beam splitters, apertures, and the like, which may include any such suitable optical elements known in the art. Additionally, the system may be configured to alter one or more elements of the illumination subsystem based on the type of illumination used for imaging.
The imaging subsystem may also include a scanning subsystem configured to change the location on the sample to which the light is directed and from which light is detected, and possibly to cause the light to be scanned over the sample. For example, the imaging subsystem may include stage 22 on which sample 14 is disposed during imaging. The scanning subsystem may include any suitable mechanical and/or robotic assembly (including stage 22) that may be configured to move the sample such that the light may be directed to and detected from different locations on the sample. Additionally or alternatively, the imaging subsystem may be configured such that one or more optical elements of the imaging subsystem perform some scanning of the light over the sample such that the light may be directed to and detected from different locations on the sample. In examples in which the light is scanned over the sample, the light may be scanned over the sample in any suitable manner, such as in a serpentine-like path or in a spiral path.
The imaging subsystem further includes one or more detection channels. At least one of the detection channels includes a detector configured to detect light from the sample due to illumination of the sample by the imaging subsystem and to generate output responsive to the detected light. For example, the imaging subsystem shown in fig. 1 includes two detection channels, one formed by collector 24, element 26, and detector 28 and another formed by collector 30, element 32, and detector 34. As shown in fig. 1, the two detection channels are configured to collect and detect light at different collection angles. In some examples, both detection channels are configured to detect scattered light, and the detection channels are configured to detect light scattered from the sample at different angles. However, one or more of the detection channels may be configured to detect another type of light from the sample (e.g., reflected light).
As further shown in fig. 1, both detection channels are shown positioned in the plane of the paper, and the illumination subsystem is also shown positioned in the plane of the paper. Therefore, in this embodiment, both detection channels are positioned (e.g., centered) in the plane of incidence. However, one or more of the detection channels may be positioned out of the plane of incidence. For example, the detection channel formed by collector 30, element 32, and detector 34 may be configured to collect and detect light that is scattered out of the plane of incidence. Such a detection channel may therefore be commonly referred to as a "side" channel, and such a side channel may be centered in a plane that is substantially perpendicular to the plane of incidence.
Although fig. 1 shows an embodiment of the imaging subsystem that includes two detection channels, the imaging subsystem may include a different number of detection channels (e.g., only one detection channel, or two or more detection channels). In one such example, the detection channel formed by collector 30, element 32, and detector 34 may form one side channel as described above, and the imaging subsystem may include an additional detection channel (not shown) formed as another side channel positioned on the opposite side of the plane of incidence. The imaging subsystem may also include the detection channel that includes collector 24, element 26, and detector 28, which is centered in the plane of incidence and configured to collect and detect light at scattering angles that are at or near normal to the sample surface. Such a detection channel may therefore be commonly referred to as a "top" channel, and the imaging subsystem may also include two or more side channels configured as described above. As such, the imaging subsystem may include at least three channels (i.e., one top channel and two side channels), and each of the at least three channels has its own collector, each of which is configured to collect light at different scattering angles than each of the other collectors.
As further described above, each of the detection channels included in the imaging subsystem may be configured to detect scattered light. Thus, the imaging subsystem shown in fig. 1 may be configured for Dark Field (DF) imaging of a sample. However, the imaging subsystem may also or alternatively include a detection channel configured for Bright Field (BF) imaging of the sample. In other words, the imaging subsystem may include at least one detection channel configured to detect light specularly reflected from the sample. Thus, the imaging subsystem described herein may be configured for DF imaging alone, BF imaging alone, or both DF imaging and BF imaging. Although each of the collectors is shown as a single refractive optical element in fig. 1, each of the collectors may include one or more refractive optical elements and/or one or more reflective optical elements.
The one or more detection channels may include any suitable detector known in the art, such as a photomultiplier tube (PMT), a Charge Coupled Device (CCD), and a Time Delay Integration (TDI) camera. The detector may also include a non-imaging detector or an imaging detector. If the detectors are non-imaging detectors, each of the detectors may be configured to detect certain characteristics (e.g., intensity) of scattered light but may not be configured to detect such characteristics as a function of position within the imaging plane. Thus, the output generated by each of the detectors included in each of the detection channels of the imaging subsystem may be a signal or data, rather than an image signal or image data. In such examples, a computer subsystem (e.g., computer subsystem 36) may be configured to generate an image of the sample from the non-imaging output of the detector. However, in other examples, the detector may be configured as an imaging detector configured to generate imaging signals or image data. Thus, the imaging subsystem may be configured to generate images in several ways.
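The last point, a computer subsystem forming an image from the non-imaging output of a detector, can be illustrated with a minimal sketch that folds a stream of per-position intensity samples into a 2-D image grid. The function name and the simple raster/serpentine scan model are assumptions for illustration only.

```python
def assemble_image(samples, row_length, serpentine=False):
    """Fold a 1-D stream of detector intensity samples, ordered by scan
    position, into a 2-D image (list of rows). With serpentine=True,
    every other row is reversed to account for a back-and-forth scan.
    (Illustrative sketch of image assembly from non-imaging output.)"""
    rows = [list(samples[i:i + row_length])
            for i in range(0, len(samples), row_length)]
    if serpentine:
        for r in range(1, len(rows), 2):
            rows[r].reverse()
    return rows
```

For example, six samples with three positions per scan line yield a 2x3 image, with the second row reversed when the scan direction alternates.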
It is noted that fig. 1 is provided herein to generally illustrate a configuration of an imaging subsystem that may be included in the system embodiments described herein. The imaging subsystem configuration described herein may be altered to optimize the performance of the imaging subsystem, as is typically performed when designing a commercial imaging system. In addition, the systems described herein may be implemented using an existing system (e.g., by adding the functionality described herein to an existing inspection system) such as the 29xx/39xx series of tools commercially available from KLA Corp., Milpitas, California. For some such systems, the methods described herein may be provided as optional functionality of the system (e.g., in addition to other functionality of the system). Alternatively, the systems described herein may be designed "from scratch" to provide a completely new system.
Computer subsystem 36 may be coupled to the detectors of the imaging subsystem in any suitable manner (e.g., via one or more transmission media, which may include "wired" and/or "wireless" transmission media) such that the computer subsystem may receive the output generated by the detectors. Computer subsystem 36 may be configured to perform several functions using the output of the detector. For example, if the system is configured as an inspection system, the computer subsystem may be configured to detect events (e.g., defects and potential defects) on the sample using the output of the detector. Detecting events on a sample may be performed as further described herein.
The computer subsystem 36 may be further configured as described herein. For example, computer subsystem 36 may be configured to perform the steps described herein. Thus, the steps described herein may be performed "on-tool" by a computer subsystem coupled to or part of an imaging subsystem. Additionally or alternatively, computer system 102 may perform one or more steps described herein. Accordingly, one or more steps described herein may be performed "off-tool" by a computer system that is not directly coupled to the imaging subsystem.
Computer subsystem 36 (as well as the other computer subsystems described herein) may also be referred to herein as a computer system. Each of the computer subsystems or systems described herein may take various forms, including a personal computer system, image computer, mainframe computer system, workstation, network appliance, internet appliance, or other device. In general, the term "computer system" may be broadly defined to encompass any device having one or more processors that executes instructions from a memory medium. The computer subsystem(s) or system(s) may also include any suitable processor known in the art, such as a parallel processor. In addition, the computer subsystem(s) or system(s) may include a computer platform with high-speed processing and software, either as a standalone or a networked tool.
If a system includes more than one computer subsystem, the different computer subsystems may be coupled to each other such that images, data, information, instructions, etc. may be sent between the computer subsystems. For example, computer subsystem 36 may be coupled to computer system 102 by any suitable transmission media, which may include any suitable wired and/or wireless transmission media known in the art (as shown by the dashed lines in FIG. 1). Two or more such computer subsystems may also be operatively coupled through a shared computer-readable storage medium (not shown).
Although the imaging subsystem is described above as an optical or light-based imaging subsystem, in another embodiment, the imaging subsystem is configured as an electron beam imaging subsystem. In this way, the sample images described herein may be generated by an electron beam imaging subsystem. In the electron beam imaging subsystem, the energy directed to the sample includes electrons, and the energy detected from the sample includes electrons. In one such embodiment shown in fig. 1a, the imaging subsystem includes an electron column 122, and the system includes a computer subsystem 124 coupled to the imaging subsystem. The computer subsystem 124 may be configured as described above. In addition, such an imaging subsystem may be coupled to another computer system or systems in the same manner described above and shown in FIG. 1.
As also shown in fig. 1a, the electron column includes an electron beam source 126 configured to generate electrons that are focused by one or more elements 130 to a sample 128. The electron beam source may include, for example, a cathode source or an emitter tip, and the one or more elements 130 may include, for example, a gun lens, an anode, a beam limiting aperture, a gate valve, a beam current selection aperture, an objective lens, and a scanning subsystem, all of which may include any such suitable elements known in the art.
Electrons (e.g., secondary electrons) returned from the sample may be focused by one or more elements 132 to a detector 134. One or more of the elements 132 may include, for example, a scanning subsystem, which may be the same scanning subsystem included in element 130.
The electron column may include any other suitable elements known in the art. In addition, the electron column may be further configured as described in U.S. Patent No. 8,664,594 to Jiang et al., U.S. Patent No. 8,692,204 to Kojima et al., U.S. Patent No. 8,698,093 to Gubbens et al., and U.S. Patent No. 8,716,662 to MacDonald et al., which are incorporated by reference as if fully set forth herein.
Although the electron column is shown in fig. 1a as being configured such that electrons are directed to the sample at an oblique angle of incidence and scattered from the sample at another oblique angle, the electron beam may be directed to and scattered from the sample at any suitable angles. In addition, the electron beam imaging subsystem may be configured to use multiple modes (e.g., with different illumination angles, collection angles, etc.) to generate output for the sample, as described further herein. The multiple modes of the electron beam imaging subsystem may be different in any output generation parameters of the imaging subsystem.
The computer subsystem 124 may be coupled to the detector 134, as described above. The detector may detect electrons returned from the surface of the sample, thereby forming an electron beam image of the sample (or other output of the sample). The electron beam image may comprise any suitable electron beam image. The computer subsystem 124 may be configured to detect events on the sample using the output generated by the detector 134, which may be performed as described further herein. Computer subsystem 124 may be configured to perform any of the additional steps described herein. The system including the imaging subsystem shown in fig. 1a may be further configured as described herein.
It should be noted that fig. 1a is provided herein to generally illustrate a configuration of an electron beam imaging subsystem that may be included in the embodiments described herein. As with the optical imaging subsystem described above, the electron beam imaging subsystem configuration described herein may be altered to optimize the performance of the imaging subsystem as is typically performed when designing a commercial system. In addition, the systems described herein may be implemented using existing systems such as tools commercially available from KLA (e.g., by adding the functionality described herein to existing systems). For some such systems, the methods described herein may be provided as optional functionality of the system (e.g., in addition to other functionality of the system). Alternatively, the system described herein may be designed "from scratch" to provide an entirely new system.
Although the imaging subsystem is described above as an optical or electron beam imaging subsystem, the imaging subsystem may be an ion beam imaging subsystem. This imaging subsystem may be configured as shown in fig. 1a, except that the electron beam source may be replaced with any suitable ion beam source known in the art. In addition, the imaging subsystem may include any other suitable ion beam imaging system, such as those included in commercial Focused Ion Beam (FIB) systems, helium Ion Microscope (HIM) systems, and Secondary Ion Mass Spectrometer (SIMS) systems.
As further mentioned above, the imaging subsystem may be configured to have multiple modes. In general, a "mode" is defined by the values of the parameters of the imaging subsystem used to generate the output of the sample. Thus, the different modes (other than the location on the sample where the output is generated) may differ in the value of at least one imaging parameter of the imaging subsystem. For example, for a light-based imaging subsystem, different modes may use different wavelengths of light. The modes may differ in the wavelength of light directed to the sample (e.g., by using different light sources, different spectral filters, etc. for the different modes), as further described herein. In another embodiment, different modes may use different illumination channels. For example, as mentioned above, the imaging subsystem may include more than one illumination channel. Thus, different illumination channels may be used for different modes.
The multiple modes may also differ in illumination and/or collection/detection. For example, as described further above, the imaging subsystem may include multiple detectors. Thus, one of the detectors may be used for one mode and another of the detectors may be used for another mode. Furthermore, the modes may differ from each other in more than one of the ways described herein (e.g., different modes may have one or more different illumination parameters and one or more different detection parameters). The multiple modes may also differ in angle, meaning having either or both of different incidence angles and different collection angles, which may be achieved as described further herein. The imaging subsystem may be configured to scan the sample with the different modes in the same scan or in different scans, e.g., depending on the capability of using the multiple modes to scan the sample at the same time.
In some examples, the systems described herein may be configured as inspection systems. However, the systems described herein may be configured as another type of semiconductor-related quality control type system, such as a defect review system or a metrology system. For example, the embodiments of the imaging subsystem described herein and shown in figs. 1 and 1a may be modified in one or more parameters to provide different imaging capabilities depending on the application for which they are to be used. In one embodiment, the imaging subsystem is configured as an electron beam defect review subsystem. For example, the imaging subsystem shown in fig. 1a may be configured to have a higher resolution if it is to be used for defect review or metrology rather than for inspection. In other words, the embodiments of the imaging subsystem shown in figs. 1 and 1a describe some general and various configurations of an imaging subsystem that may be tailored in a number of ways, which will be apparent to those skilled in the art, to produce imaging subsystems having different imaging capabilities more or less suitable for different applications.
As mentioned above, the imaging subsystem may be configured for directing energy (e.g., light, electrons) to and/or scanning energy over a physical version of the sample, thereby generating actual images for the physical version of the sample. In this manner, the imaging subsystem may be configured as a "real" imaging system, rather than a "virtual" system. However, the storage medium (not shown) and computer subsystem 102 shown in fig. 1 may be configured as a "virtual" system. In particular, the storage medium and computer subsystem are not part of imaging subsystem 100 and do not have any capability for handling the physical version of the sample, but may be configured to use stored detector output as a virtual inspector that performs inspection functions, a virtual metrology system that performs metrology functions, a virtual defect review tool that performs defect review functions, etc. Systems and methods configured as "virtual" systems are described in commonly assigned U.S. Patent No. 8,126,255 issued on February 28, 2012 to Bhaskar et al., U.S. Patent No. 9,222,895 issued on December 29, 2015 to Duffy et al., and U.S. Patent No. 9,816,939 issued on November 14, 2017 to Duffy et al., which are incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in these patents. For example, the computer subsystems described herein may be further configured as described in these patents.
The system includes a computer subsystem (which may include any configuration of any of the computer subsystems or systems described above) and one or more components executed by the computer subsystem. For example, as shown in fig. 1, the system may include computer subsystem 36 and one or more components 104 executed by the computer subsystem. The one or more components may be executed by the computer subsystem as described further herein or in any other suitable manner known in the art. Executing at least part of the one or more components may include inputting one or more inputs (e.g., images, data, etc.) into the one or more components. The computer subsystem may be configured to input any images, data, etc. into the one or more components in any suitable manner.
The one or more components include a Deep Learning (DL) model trained without labeled data and configured to generate a reference for a sample from one or more inputs including at least a sample image or data generated from the sample image. The phrase "trained without labeled data" as used herein is defined as training that is performed, at least initially or even entirely, without labeled data of any kind. For example, the first step of training may be a type of pre-training based only on unlabeled images, meaning that training is performed based only on information contained in the data itself.
This first step of training may also be referred to as a pretext or auxiliary task, which is different from the task for which the DL model will ultimately be used (i.e., its "downstream task"). In one such example, the pretext or auxiliary task may be to acquire an unlabeled image, select and crop two or more tiles from the image, and then "learn" the relative positions of the tiles in the original image. In this way, the labels learned during this training step come from the data itself (i.e., where the cropped tiles are located in the image) rather than from a source external to the data (e.g., human-generated labels).
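The relative-position pretext task described above can be sketched as follows. This is a hedged illustration, not the patent's implementation: the function name, tile size, and the 8-neighbor label scheme are assumptions chosen to show how the "label" is derived from the data itself rather than from human annotation.

```python
import numpy as np

# The 8 possible neighbor offsets (row, col) around an anchor tile; the
# offset index 0..7 serves as a self-generated class label.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def make_pretext_example(image, tile=8, rng=None):
    """Crop an anchor tile and one of its 8 neighbors from an unlabeled
    image; return (anchor, neighbor, label) where label encodes the
    neighbor's relative position -- the 'free' supervision signal."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    # Anchor position chosen so every neighbor tile stays inside the image.
    r = int(rng.integers(tile, h - 2 * tile))
    c = int(rng.integers(tile, w - 2 * tile))
    label = int(rng.integers(len(OFFSETS)))
    dr, dc = OFFSETS[label]
    anchor = image[r:r + tile, c:c + tile]
    neighbor = image[r + dr * tile:r + (dr + 1) * tile,
                     c + dc * tile:c + (dc + 1) * tile]
    return anchor, neighbor, label

rng = np.random.default_rng(0)
img = rng.random((64, 64))                      # stands in for an unlabeled sample image
anchor, neighbor, label = make_pretext_example(img, tile=8, rng=rng)
print(anchor.shape, neighbor.shape, 0 <= label < 8)
```

A model trained to predict `label` from `(anchor, neighbor)` pairs never sees an external annotation, which is the defining property of the pretext step.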
Features learned during this stage may then be used to train the DL model for the task for which the DL model is configured (e.g., object detection or semantic segmentation). This second step of training (a transfer learning or fine-tuning step) may also be performed without labeled data (i.e., unsupervised learning) or based on a substantially smaller (10-100 times smaller) labeled data set (i.e., self-supervised learning) than would be required if the entire training of the DL model were supervised. Being able to train with substantially smaller data sets is particularly important for the embodiments described herein because, unlike in consumer-based applications (such as learning to distinguish people from automobiles), suitably large training data sets can often be difficult to generate due to a general lack of good exemplary images (e.g., when defects of interest (DOIs) are few and far between, especially during the setup phase of an inspection process).
In one embodiment, the DL model is trained in an unsupervised manner. For example, when all training steps are performed without labeled data, the training described above and further described herein is unsupervised. In another embodiment, the DL model is trained in a self-supervised manner. Self-supervised training is a branch of Machine Learning (ML) that trains DL models using unlabeled data. For example, when at least the initial training step is performed without labeled data, the training described above and further described herein is self-supervised. Algorithm X (and algorithm Z) described further herein and shown in figs. 2 and 3, respectively, may be a generative adversarial network (GAN), a pixel convolutional neural network (PixelCNN), a generative model, etc. The PixelCNN can be trained in a self-supervised manner, and the autoencoder or generative model can be trained in a self-supervised or unsupervised manner.
A GAN can be generally defined as a deep neural network architecture that includes two networks competing with each other. Additional description of the general architecture and configuration of GANs and conditional GANs (cGANs) can be found in: U.S. Patent Application Publication No. 2021/0272273 by Brauer, published September 2, 2021; U.S. Patent Application Serial No. 17/308,878 by Brauer et al., filed in May 2021; "Generative Adversarial Nets," Goodfellow et al., arXiv:1406.2661, June 10, 2014, 9 pages; "Semi-supervised Learning with Deep Generative Models," Kingma et al., NIPS 2014, October 31, 2014, pages 1-9; "Conditional Generative Adversarial Nets," Mirza et al., arXiv:1411.1784, November 6, 2014, 7 pages; "Adversarial Autoencoders," Makhzani et al., arXiv:1511.05644v2, May 25, 2016, 16 pages; and "Image-to-Image Translation with Conditional Adversarial Networks," Isola et al., arXiv:1611.07004v2, November 22, 2017, 17 pages, which are incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in these references.
A PixelCNN is an architecture that is a fully convolutional network of layers that preserves the spatial resolution of its input throughout the layers and outputs a conditional distribution at each location. Examples of PixelCNNs that may be used in the embodiments described herein are included in "Pixel Recurrent Neural Networks," van den Oord et al., arXiv:1601.06759, August 19, 2016, 11 pages, which is incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in this reference.
A "generative" model may be defined generally as a model that is probabilistic in nature. In other words, the "generation" model is not a model of the forward simulation or rule-based method and thus, a physical model of the process involved in generating the actual image is unnecessary. Alternatively, as described further herein, the generative model (where its parameters may be learned) may be learned based on a suitable training data set. The generative model may be configured to have a DL architecture that may include multiple layers that perform several algorithms or transformations. The number of layers included in the generative model may be use case dependent. Suitable ranges for layers are from 2 layers to tens of layers for practical purposes. A deep-generated model generally configured with a joint probability distribution (mean and variance) between learning inputs and outputs described herein may be described further herein and in U.S. patent No. 10,395,356 issued to Zhang et al, 8.27, 2019, which is incorporated herein as if fully set forth herein. The embodiments described herein may be further configured as described in this patent.
In one configuration, the DL model is trained in an independent manner to learn the low frequency structure of the data. Given input data X, latent space vector H(X), and output reconstructed data X_R, a self-supervised loss function L(X, X_R) (e.g., any one or a combination of Mean Square Error (MSE) loss, Siamese loss, contrastive loss, etc.) trains the generator in a self-supervised or unsupervised manner. For example, during training, inputs as described further herein may be input to the DL model so that the DL model learns to predict the reference. The reference or a derivative thereof may be in both the input and the expected output, as is common in self-supervised or unsupervised algorithms. In one such example, when the predicted reference is a reference image, the training input may be any of the following: a sample test image; a sample test image and a corresponding reference image; or design information for the sample along with the sample test image or reference image. The input may then be used to predict itself in a self-supervised or unsupervised manner. Similar to Principal Component Analysis (PCA), additional constraints may be added to the latent space vectors to ensure that the learned features are all orthogonal to each other. This may be achieved via an MSE loss that takes the latent space vector multiplied by its transpose as input and compares it to the identity matrix (I): L_Orth = MSE(H(X)^T * H(X), I). Training may be stopped (early stopping) if the loss function and other validation metrics do not improve after N epochs.
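The PCA-like orthogonality constraint above can be illustrated with a minimal sketch. This is an assumption-laden stand-in, not the patent's implementation: `orthogonality_loss` simply evaluates the penalty MSE(H(X)^T * H(X), I) on a given latent matrix.

```python
import numpy as np

def mse(a, b):
    """Mean square error between two arrays."""
    return float(np.mean((a - b) ** 2))

def orthogonality_loss(H):
    """L_Orth = MSE(H^T H, I): zero when the learned latent features are
    exactly orthonormal. H has shape (n_samples, n_features)."""
    gram = H.T @ H
    return mse(gram, np.eye(H.shape[1]))

# A perfectly orthonormal latent basis incurs zero penalty ...
H_ortho = np.eye(4)[:, :2]          # two orthonormal columns
# ... while fully correlated latent features are penalized.
H_corr = np.ones((4, 2))

print(orthogonality_loss(H_ortho))  # 0.0
print(orthogonality_loss(H_corr) > 0)
```

In training, this term would be added to the reconstruction loss L(X, X_R), and early stopping would monitor the combined loss and validation metrics over epochs.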
Any of the training described above may be performed by the one or more computer subsystems included in the embodiments described herein. In this way, the embodiments described herein may be configured for performing one or more setup or training functions of the DL model. However, any of the training described above may also be performed by another method or system (not shown), and that other method or system may make the trained DL model accessible to the embodiments described herein. In this way, the embodiments described herein may be configured for training the DL models described further herein and/or for performing runtime functions, such as using the trained DL model to determine information for one or more runtime samples, which may be the same as or different from the setup sample.
In one embodiment, the reference comprises a learned reference image when the one or more inputs comprise the sample image. In this way, the DL model can learn the reference directly via self-supervised or unsupervised learning. In particular, the embodiments described herein may be configured for directly learning a non-defective pattern for defect detection on a wafer or reticle image (or for another application described herein). One such embodiment is shown in fig. 2. For example, sample image (also referred to herein as "data 1A") 200 is input to learn reference via self-supervised or unsupervised method (also referred to herein as "algorithm X") step 202. In this embodiment, therefore, the DL model is also referred to as "algorithm X."
The sample image 200 may be an image of a wafer or reticle or of another sample described herein. The image may be generated by one of the imaging subsystems described herein and acquired by the computer subsystem in any suitable manner. The computer subsystem may input the sample image into reference learning step 202 in any suitable manner. In most inspection use cases, this image will contain relatively sparse defect signals. In other words, if there are defects in the region on the sample for which the sample image was generated, the sample image will contain defect signals corresponding to those defects. Thus, the defect signals in the sample image will vary depending on the defects present on the sample. Other signals in the sample image may also vary depending on any patterned features formed on the sample, any nuisances or noise sources on the sample, etc.
Via algorithm X, a sample learned reference (also referred to herein as "data 1B") 204 may be learned and computed from data 1A. The difference between data 1A and data 1B is that, from a statistical perspective, a properly learned data 1B does not contain the dominant defect signals. Data 1A and data 1B may then be input to supervised or unsupervised information determination step 206 (also referred to herein as "algorithm Y"), which generates determined information 208 (also referred to herein as "data 1C"). Step 206, algorithm Y, and data 1C may be further configured as described herein.
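The data flow of fig. 2 (data 1A → algorithm X → data 1B, then algorithm Y → data 1C) can be sketched with simple stand-ins. Both functions below are illustrative assumptions, not the patent's DL model: a local mean filter plays the role of the learned algorithm X (it keeps low-frequency structure and suppresses sparse defect signals), and a threshold on the difference image plays the role of algorithm Y.

```python
import numpy as np

def fake_algorithm_x(data_1a):
    """Stand-in 'learned reference' (data 1B): a 3x3 box filter built from
    shifted-array averaging, which smooths out sparse defect pixels."""
    h, w = data_1a.shape
    padded = np.pad(data_1a, 1, mode="edge")
    return sum(padded[i:i + h, j:j + w]
               for i in range(3) for j in range(3)) / 9.0

def fake_algorithm_y(data_1a, data_1b, threshold=0.5):
    """Stand-in information determination (data 1C): flag pixels where the
    test image deviates strongly from the learned reference."""
    return np.abs(data_1a - data_1b) > threshold

data_1a = np.zeros((16, 16))
data_1a[5, 7] = 1.0                    # one sparse 'defect' pixel
data_1b = fake_algorithm_x(data_1a)    # learned-reference stand-in
data_1c = fake_algorithm_y(data_1a, data_1b)
print(int(data_1c.sum()), bool(data_1c[5, 7]))
```

The sparse defect survives in the 1A-1B difference because the reference stand-in contains only the low-frequency structure, which is the property the learned data 1B is described as having.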
In another embodiment, the reference includes learned structural noise when the one or more inputs include data generated from the sample image and the data generated from the sample image includes structural noise. In this way, embodiments described herein may be configured for learning structural noise via self-supervised learning. In particular, embodiments described herein may be configured for learning non-defective structural noise for applications such as defect detection on wafer or reticle images. In one such embodiment shown in fig. 3, a sample image 300 (also referred to herein as "data 2A") and a sample reference 302 (also referred to herein as "data 2B") may be input to a calculate structural noise step 304, and the calculate structural noise step 304 may calculate structural noise 306 (also referred to herein as "data 2C") from data 2A and data 2B.
The sample image 300 may be an image of a wafer or reticle or of another sample described herein. The image may be generated and acquired as described further herein. The computer subsystem may input the sample image into calculate structural noise step 304 in any suitable manner. In most inspection use cases, this image will contain relatively sparse defect signals, and the signals in this image may vary as described above.
The sample reference 302 may be any suitable reference image, which may be generated as shown in fig. 2 or by any other suitable (DL or non-DL) method known in the art. For example, the sample reference 302 may be an image of only a region on the sample corresponding to the region for which the sample image 300 was generated. The sample reference 302 may be generated by modifying or combining (e.g., by filtering, averaging, etc.) one or more images corresponding to (and possibly including) the sample image. In another example, the sample reference 302 may be generated by the DL reference learning step shown in fig. 2 or by another suitable DL or ML method known in the art. For example, a DL or ML method may be configured to generate a reference image from design information for the sample. When the sample reference 302 is generated as shown in fig. 2, the embodiment shown in fig. 3 essentially adds a structural noise calculation prior to those steps. The computer subsystem may input the sample reference image into calculate structural noise step 304 in any suitable manner.
Generating or acquiring the sample reference is typically performed in a manner that minimizes any defect signals in the image used as or for generating the sample reference. For example, a sample reference may be obtained by taking the mean or median (or other equivalent) of images of two or more adjacent dies/cells, which may advantageously suppress the intensity of (but may not eliminate) high frequency defect components in the images. In another example, a noise suppression technique such as those currently used with computed references may be used to generate the sample reference.
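The median-of-adjacent-dies approach above can be sketched in a few lines. This is a hedged illustration with made-up data; `median_reference` is a hypothetical helper name, and the "defect" is a single bright pixel injected into one die.

```python
import numpy as np

def median_reference(die_images):
    """die_images: array-like of shape (n_dies, h, w); returns the (h, w)
    pixel-wise median, which suppresses sparse single-die defect signals."""
    return np.median(np.asarray(die_images), axis=0)

rng = np.random.default_rng(1)
base = rng.random((8, 8))              # shared pattern across dies
dies = np.stack([base.copy() for _ in range(3)])
dies[0, 4, 4] += 5.0                   # defect present in one die only
ref = median_reference(dies)
# The median rejects the single-die outlier, so the reference matches the
# defect-free pattern everywhere.
print(np.allclose(ref, base))
```

With three or more dies, a defect appearing in only one of them cannot shift the pixel-wise median, which is why this suppresses (though, with fewer samples or coincident defects, may not eliminate) high frequency defect components.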
The high frequency defect noise components that cannot be eliminated by this step represent the local structural noise that can be learned by algorithm Z, described further below. Learning the high frequency defect noise components may be important for the embodiments described herein. In general, any measured intensity from the optics is the additive intensity of both signal and noise. For example, consider a relatively small local signal and broad/diffuse noise at the same location. When the noise is relatively small, a peak signal with relatively little background noise is observed. However, when the noise is relatively high, a relatively small signal with relatively high background noise is observed. The same applies to high frequency noise. Thus, by constructing/learning the noise component at/around a defect location, higher sensitivity can be achieved by subtracting it from the combined intensity.
Structural noise is defined herein as the variation in optical (the same holds for other types of imaging) response or intensity relative to nominal optical imaging, as opposed to random noise. Whereas the reference image is an approximate representation of nominal optical imaging, one approximate representation of structural noise is a difference image. In this way, this embodiment can be thought of as learning a non-defective image from a "difference" image by calculating structural noise, instead of directly learning the reference image. By including the "structural noise calculation" step prior to algorithm Z in this embodiment, a priori information is given to the DL model, which may help it better suppress high frequency components in the reference it produces than in the reference produced by the embodiment shown in fig. 2.
The calculate structural noise step 304 may be performed in various ways. As mentioned above, data 2A may be a wafer/reticle image of a test die/reticle, and data 2B may be a reference image from an adjacent die/cell or a reference simulated via physical modeling or ML/DL based modeling, including the reference shown in fig. 2. The structural noise may then be determined in step 304 by subtraction between the two inputs. As another option, the structural noise may be calculated in step 304 by taking the ratio between the two inputs. For example, the calculate structural noise step 304 may include: subtracting the sample image from the sample reference (2B-2A) or vice versa (2A-2B); dividing the sample image by the sample reference (2A/2B) or vice versa (2B/2A); etc. The output of this step is structural noise 306. The numbers along the various axes of the structural noise 306 shown in fig. 3 are not relevant to an understanding of the embodiments described herein and are shown in fig. 3 only to convey the nature of the visual representation of the calculated structural noise shown in this figure.
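The difference and ratio variants of step 304 amount to simple elementwise operations. The sketch below is illustrative (the function name, the zero-division guard, and the toy 2x2 images are assumptions, not from the patent).

```python
import numpy as np

def structural_noise(data_2a, data_2b, method="difference"):
    """Data 2C as either 2A - 2B or 2A / 2B (swap arguments for the
    2B - 2A and 2B / 2A variants)."""
    if method == "difference":
        return data_2a - data_2b
    if method == "ratio":
        # Guard against division by zero in dark reference pixels
        # (an added assumption; the patent does not specify this).
        return data_2a / np.where(data_2b == 0, 1e-12, data_2b)
    raise ValueError(f"unknown method: {method}")

data_2a = np.array([[1.0, 2.0], [3.0, 4.0]])   # test image with a 'defect'
data_2b = np.array([[1.0, 2.0], [3.0, 2.0]])   # reference image
print(structural_noise(data_2a, data_2b).tolist())          # difference 2C
print(structural_noise(data_2a, data_2b, "ratio").tolist()) # ratio 2C
```

Either form yields a data 2C in which nominally matching pixels are flat (0 for difference, 1 for ratio) and deviations, whether from defects or process variation, stand out.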
The calculated structural noise may be input by the computer subsystem to learn structural noise via self-supervised or unsupervised method step 308 (also referred to herein as "algorithm Z") in any suitable manner. In this embodiment, therefore, the DL model is also referred to as "algorithm Z." Other data described herein may also be input to algorithm Z along with the calculated structural noise. For example, the input may include any of the structural noise calculated as described above (e.g., 2A-2B, 2B-2A, 2A/2B, 2B/2A, etc.) in combination with data 2B (the sample reference). Algorithm Z produces learned structural noise 310 (also referred to herein as "data 2D"). In this way, the non-defect-related structural noise in the calculated structural noise (data 2C) may be learned via algorithm Z, and the learned non-defective structural noise presented as data 2D. As with the calculated structural noise, the numbers along the various axes of the learned structural noise 310 are not relevant to an understanding of the embodiments described herein and are shown in fig. 3 only to convey the nature of the visual representation of the structural noise shown in this figure.
The calculated structural noise (data 2C) and the learned structural noise (data 2D) differ in an important and perhaps subtle way. For example, the test image (data 2A) of the wafer or reticle contains defect signals from any defects in the area on the sample for which the test image was generated. Data 2B (the nominal or reference image) ideally contains no defect signals. Thus, data 2C (the calculated structural noise) contains information from both process variation and defects. In contrast, the data 2D learned by algorithm Z retains mostly the information/noise related to process variation but not to defects. Because of this, data 2C and data 2D may be used in combination to extract a cleaner defect signal from the unwanted process variation signals. In other words, unlike many inspection processes that focus mainly on how to "clean up" the noise, the embodiments described herein improve the defect signal. Importantly, by further separating the non-defective structural noise from 2C via the DL model, better detection sensitivity can be achieved. The results of the other processes described herein may be enhanced in a similar manner.
To restate the above from a mathematical perspective, output 2C is obtained by a predetermined, non-predictive method (e.g., subtraction/mean/median of images of different dies). Data 2D is predicted by algorithm Z, which takes 2C as input. In addition, as mentioned above, data 2D contains the learnable non-defective structural noise and few defect signals, while data 2C contains both.
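A toy calculation shows why combining 2C and 2D (e.g., as 2C - 2D) sharpens the defect signal. This is a hedged illustration: here data 2D is simulated directly as the process-variation component rather than produced by a DL model, which is the idealized outcome the text ascribes to algorithm Z.

```python
import numpy as np

rng = np.random.default_rng(2)
process_variation = 0.2 * rng.standard_normal((8, 8))
defect = np.zeros((8, 8))
defect[3, 3] = 1.0                     # sparse defect signal

data_2c = process_variation + defect   # calculated structural noise: both components
data_2d = process_variation            # what algorithm Z ideally learns: no defect

residual = data_2c - data_2d           # cleaner defect signal
peak = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
print(peak == (3, 3), round(float(residual[3, 3]), 3))
```

In data 2C alone the defect competes with process-variation noise of comparable amplitude; in the 2C - 2D residual the process-variation term cancels, leaving the defect as the dominant peak.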
Data 2C and data 2D may then be input to a supervised or unsupervised information determination step 312 (also referred to herein as "algorithm Y") that generates determined information 314 (also referred to herein as "data 2E"). Data 2C and data 2D may be input to algorithm Y in a number of different ways in this embodiment, e.g., as 2C and 2D, as 2C-2D, as 2C/2D, etc. In this embodiment, the input to algorithm Y may also include any of the above inputs in combination with design information and/or a sample reference (data 2B). Step 312, algorithm Y, and data 2E may also be further configured as described herein.
In one embodiment, the one or more inputs to the DL model (e.g., algorithm X or algorithm Z shown in figs. 2 and 3, respectively) also include design information for the sample in addition to at least the sample image or the data generated from the sample image. For example, in the embodiment shown in fig. 3, in addition to the design information, the input may include data generated from the sample image, i.e., any of the structural noise calculated as described above (e.g., 2A-2B, 2B-2A, 2A/2B, 2B/2A, etc.), possibly in combination with data 2B (the sample reference). Design or Computer Aided Design (CAD) information may be critical to reference learning or structural noise learning. A design image rendered at the same pixel size as the images collected by/from the imaging subsystem, or at smaller pixel sizes (e.g., 2X, 4X, 8X scaled designs), may be used as an input to algorithm X and algorithm Z. In both of these examples, the design may also be input to algorithm Y. In other such examples, the design may be input only to algorithm Y (and not to algorithm X or algorithm Z, as the case may be).
The terms "design," "design data," and "design information," as used interchangeably herein, generally refer to the physical design (layout) of an IC or other semiconductor device and data derived from the physical design by complex simulation or simple geometric and Boolean (Boolean) calculations. The design may include any other design data or design data agent described in commonly owned U.S. patent No. 7,570,796 to zafire (Zafar) et al, 8.4, and commonly owned U.S. patent No. 7,676,077 to kukulkarni (Kulkarni) et al, 9.3, 2010, both of which are incorporated herein by reference as if fully set forth herein. In addition, the design data may be standard cell library data, integrated layout data, design data for one or more layers, derivatives of design data, and full or partial chip design data. Furthermore, "design," "design data," and "design information" described herein refer to information and data that is generated by a semiconductor device designer during the design process and thus may be well used in the embodiments described herein before the design is printed on any physical sample, such as reticles and wafers.
In one such embodiment, the one or more inputs, the design information, and the data generated from the sample image do not include care area information for the sample. For example, the embodiments described herein may incorporate design information directly into the information determination process (e.g., as one of the inputs to one or more of algorithm X, algorithm Y, and algorithm Z) without generating care areas from the design. For many of the processes described herein, this may provide higher sensitivity (because the other inputs and/or the determined information may be directly aligned and correlated with the design information) and better time to results (e.g., by eliminating the care area generation process).
The "region of interest" as commonly referred to in the art is the region of interest on the sample for testing purposes. Sometimes, the region of interest is used to distinguish between an inspected region on the sample and a region on the sample that was not inspected during the inspection process. In addition, regions of interest are sometimes used to distinguish between regions on a sample that are examined using one or more different parameters. For example, if a first region of the sample is more critical than a second region on the sample, the first region may be inspected using a higher sensitivity than the second region such that defects are detected in the first region using the higher sensitivity. Other parameters of the inspection process may be altered with the region of interest in a similar manner.
In another embodiment, the one or more inputs also include care area information for the sample in addition to at least the sample image or the data generated from the sample image. For example, the design information may be converted into care areas in any suitable manner, such as the NanoPoint or PixelPoint care areas used by some tools commercially available from KLA. The care area information may be used as an input to one or both of algorithm X (or algorithm Z) and algorithm Y. In this way, when care area information is available to the embodiments described herein, that information may be input to any of the algorithms described herein in combination with the other inputs to those algorithms.
The data input to the DL model in any of the embodiments described herein may be single mode data or multi-mode data. For example, the data 1A shown in fig. 2 may be single mode or multi-mode imaging data. In another example, data 2A and data 2B shown in fig. 3 may be single mode or multi-mode imaging data. The single or multiple modes may include any of the modes further described herein (including multi-angle modes), and single or multi-mode data may be generated and acquired as further described herein. As described below, when the data input to the DL model includes multi-mode data, data from different modes may be input in various ways depending on the configuration of the DL model.
In some embodiments, the sample image is generated using a first mode of the imaging subsystem, the DL model is configured to generate additional references for the sample from one or more additional inputs including at least additional sample images generated using a second mode of the imaging subsystem or data generated from the additional sample images, and the computer subsystem is configured for determining additional information for the sample from the additional references and at least the additional sample images or the data generated from the additional sample images. For example, in an embodiment in which a sample reference is learned, in a multi-mode setting, each mode will have a different 1A and 1B. In another example, in an embodiment in which the sample reference is learned structural noise, in a multi-mode setting, each mode will have a different 2C and 2D. Thus, in essence, each of the steps shown in figs. 2 and 3 may be performed multiple times on a per-mode basis. In this way, the DL model may generate output 1 from the mode 1 input, output 2 from the mode 2 input, and so on for the N modes of interest.
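The per-mode flow above can be sketched as a simple loop in which one reference-generating model is run per mode. This is illustrative only: the "models" below are toy callables standing in for trained DL models, and the learned reference is crudely approximated by the mean image level.

```python
import numpy as np

def run_per_mode(models, mode_images):
    """Generate a learned reference and a difference map for each mode."""
    results = {}
    for mode, image in mode_images.items():
        reference = models[mode](image)      # output N from the mode N input
        results[mode] = image - reference    # per-mode difference
    return results

# Toy stand-ins for per-mode trained reference generators (hypothetical).
models = {
    "mode1": lambda img: np.full_like(img, img.mean()),
    "mode2": lambda img: np.full_like(img, img.mean()),
}
images = {
    "mode1": np.array([1.0, 1.0, 4.0]),
    "mode2": np.array([2.0, 2.0, 2.0]),
}
diffs = run_per_mode(models, images)  # one output per mode of interest
```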
In one such embodiment, the sample image and the additional sample image, or the data generated from the sample image and the data generated from the additional sample image, are separately input to the DL model at different times. For example, in this case, learning may be performed separately for each mode in a multi-mode setting. In this way, the DL model can be run independently for each optical mode in a multi-mode setup. In another such embodiment, the sample image and the additional sample image, or the data generated from the sample image and the data generated from the additional sample image, are jointly input to the DL model. In this way, in a multi-mode setting, learning can be performed jointly for all modes by a single DL model. The DL model can then be run jointly on the multi-mode data at run time. In this case, the different 2C images may be stacked together as input.
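The joint-input case, in which the different per-mode images are stacked together, amounts to concatenating the per-mode arrays along a channel axis so a single model receives all modes at once. A minimal sketch (shapes are arbitrary examples, not from the disclosure):

```python
import numpy as np

# Stack per-mode arrays (e.g., the different 2C structural-noise images)
# along a channel axis so that one DL model sees all modes jointly.
mode_a = np.zeros((4, 4))   # structural noise from mode A (toy data)
mode_b = np.ones((4, 4))    # structural noise from mode B (toy data)
joint_input = np.stack([mode_a, mode_b], axis=-1)
# joint_input has shape (height, width, n_modes) = (4, 4, 2)
```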
The computer subsystem may acquire or generate an input multi-mode image 200 (or multi-mode structural noise 306 generated from the multi-mode sample image 300 and the multi-mode reference image 302) as further described herein and input it to the multi-mode DL model. The input multi-mode image (or input multi-mode structural noise) may be generated by the imaging subsystem and/or the computer subsystem, as further described herein.
The computer subsystem is configured for determining information of the sample from the reference and at least the sample image or data generated from the sample image. In this way, the computer subsystem is configured for determining the information from the learned reference image and the sample image, or from the learned structural noise and the calculated structural noise. The determined information, and the manner in which the reference and at least the sample image or data generated from the sample image are used, may vary depending on the process performed on the sample. In the embodiments shown in figs. 2 and 3, the step of determining the information may be performed by the computer subsystem using algorithm Y. Such an algorithm may be part of the one or more components executed by the computer subsystem or may be separate from those components.
In one embodiment, the computer subsystem is not configured for determining the information from a reference for any other sample. For example, embodiments described herein may be configured for generating references on an as-needed basis for any one or more samples for which information is determined. In this way, for any sample that is inspected, measured, defect inspected, etc., a different reference can be generated by one DL model described herein and used only for that sample. In other words, reference 1 may be generated for sample 1 and used only to determine information for sample 1, reference 2 may be generated for sample 2 and used only to determine information for sample 2, and so on. Generating different references for different samples using one DL model described herein may be performed in the same manner as described above with respect to multiple modes. Because samples (even samples manufactured in the same process and having the same layers formed thereon) may have different, and sometimes significantly different, noise characteristics, it may be useful and advantageous to generate and use different predicted references for different samples. In this way, the embodiments described herein may be more stable to sample and process variations than embodiments that use the same reference for multiple samples.
In another embodiment, the computer subsystem is configured for determining information of the sample from the reference and only the sample image or data generated from the sample image. For example, embodiments described herein may be configured for generating references on an as-needed basis for any one or more sample images for which information is determined. In this way, for any sample image used for inspection, measurement, defect inspection, etc., a different reference may be generated by one DL model described herein and used only for that sample image. In other words, reference 1 may be generated for sample image 1 and used only to determine information from sample image 1, reference 2 may be generated for sample image 2 and used only to determine information from sample image 2, and so on. Generating different references for different sample images using one DL model described herein may be performed in the same manner as described above with respect to multiple modes. Because sample images may have different, and sometimes significantly different, noise characteristics (even sample images acquired from different regions on the same sample, each region having the same design information, and/or sample images acquired from different samples fabricated in the same process and having the same layers formed thereon), it may be useful and advantageous to generate and use different predicted references for different sample images. In this way, the embodiments described herein may be more stable to sample and within-process variations than embodiments that use the same reference for multiple sample images.
In some embodiments, the computer subsystem is configured for determining the information of the sample by inputting the reference and at least the sample image or data generated from the sample image into a supervised DL model. For example, as shown in fig. 2, in the case of inspection, data 1A and data 1B may be input to algorithm Y to perform supervised defect detection. In a similar manner, as shown in fig. 3, in the case of inspection, data 2C and data 2D may be input to algorithm Y to perform supervised defect detection. The supervised defect detection may be performed: in the case of single mode inspection, as described in U.S. patent application publication 2020/0327354 by Zhang et al., published October 15, 2020; in the case of multi-mode inspection, as described in U.S. patent application publication 2021/0366103 by Zhang et al., published October 25, 2021; or in any other suitable manner known in the art. Both of these U.S. patent application publications are incorporated herein by reference as if fully set forth herein. The embodiments described herein may be further configured as described in these publications.
In another embodiment, the computer subsystem is configured for determining information of the sample by inputting the reference and at least the sample image or data generated from the sample image to an unsupervised DL model. For example, if an unsupervised DL model can be used to determine any of the information further described herein, the computer subsystem may input reference and sample images or calculated structural noise into the unsupervised DL model for use in determining the information. The unsupervised DL model may include any suitable such model known in the art.
In a further embodiment, the computer subsystem is configured for determining information of the sample by inputting the reference and at least the sample image or data generated from the sample image to an unsupervised algorithm. In this embodiment, the unsupervised algorithm may be a non-DL algorithm. For example, as shown in FIG. 2, in the case of inspection, data 1A and data 1B may be input to algorithm Y to perform unsupervised defect detection. In another example, as shown in fig. 3, in the case of inspection, data 2C and data 2D may be input to algorithm Y to perform unsupervised defect detection. In both examples, algorithm Y may comprise any suitable unsupervised defect detection algorithm, such as the MCAT algorithm used by some inspection tools commercially available from KLA.
In some embodiments, the information determined for the sample includes predicted defect locations on the sample. For example, embodiments described herein may use a DL-based CNN, another DL model, or a non-DL method for predicting the locations of defects in BBP or other images. Each of these models, methods, or algorithms may be supervised or unsupervised. In the most general sense, predicting the locations of defects on a sample involves subtracting a non-defective (or as nearly defect-free as the reference allows) image or data from the test image or data and then determining whether any differences between them are likely to be defects. In the simplest case, this determination may involve applying a threshold to the differences that separates differences indicative of defects from differences that are not. Obviously, the algorithms described above may be far more complex than this simple example, which is provided here merely to convey the nature of predicting defect locations on a sample. In general, references generated as described herein and any other inputs described herein may be used for defect detection in the same manner as any other reference image/data and test image/data. In this way, the reference image/data and test image/data are not specific to any particular defect detection algorithm or method.
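The simplest case described above, subtracting a reference and thresholding the difference, can be sketched as follows (illustrative only, with toy data; production detection algorithms are far more complex):

```python
import numpy as np

def predict_defect_locations(test_image, reference_image, threshold):
    """Subtract the (learned) reference from the test image and apply a
    threshold to the absolute difference, returning predicted (row, col)
    defect locations."""
    diff = np.abs(test_image.astype(float) - reference_image.astype(float))
    return np.argwhere(diff > threshold)

test = np.array([[10, 10, 10],
                 [10, 55, 10]])
reference = np.full((2, 3), 10)  # a defect-free reference image
locations = predict_defect_locations(test, reference, threshold=20)
# locations -> [[1, 1]]: the one pixel whose difference exceeds the threshold
```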
The predicted defect locations may be determined in an inspection process in which a relatively large area on the sample is scanned by the imaging subsystem and the images produced by such scanning are then examined for potential defects. In addition to predicted defect locations, algorithm Y (in each of the embodiments described herein) may also be configured for determining other information for the predicted defect locations, such as defect classifications and possibly defect attributes. In general, determining the information may include generating one or more inspection results for the sample. Thus, basically, the step of determining the information may have multiple output channels, each for a different type of information. The outputs from the multiple channels may then be combined into a single inspection results file for the sample (e.g., a KLARF file generated by some KLA inspection tools). In this way, multiple types of information may be present in the inspection results file for any location on the sample.
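Combining multiple output channels into a single per-defect record can be sketched as below. This is an illustration only; the field names are hypothetical and do not represent the actual KLARF schema.

```python
# Merge three output channels (predicted locations, classifications,
# attributes) into one record per defect, analogous to combining channel
# outputs into a single results file for the sample.
def merge_channels(locations, classes, attributes):
    return [
        {"location": loc, "class": cls, "attributes": attr}
        for loc, cls, attr in zip(locations, classes, attributes)
    ]

results = merge_channels(
    locations=[(120, 45), (300, 12)],            # toy (x, y) locations
    classes=["bridge", "missing_feature"],       # toy class labels
    attributes=[{"size": 3.2}, {"size": 1.1}],   # toy per-defect attributes
)
# results[0] -> {'location': (120, 45), 'class': 'bridge', 'attributes': {'size': 3.2}}
```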
In a similar manner, the process may be a defect inspection process. Unlike an inspection process, a defect inspection process typically revisits the discrete locations of defects detected on the sample. An imaging subsystem configured for defect inspection may generate a sample image as described herein, which may be input to a DL model as described herein. The DL model may be trained and configured for generating a sample reference that may then be used with the sample image to determine whether a defect is actually present at a defect location identified by inspection, to determine one or more attributes of the defect (e.g., defect shape, size, roughness, background pattern information, etc.), and/or to determine a defect classification (e.g., bridging type defect, missing feature defect, etc.). For defect inspection applications, algorithm Y may be any suitable defect inspection method or algorithm used on any suitable defect inspection tool. While algorithm Y and the various inputs and outputs may differ between the defect inspection and inspection use cases, the same DL model may be used for both defect inspection and inspection (after any retraining the application requires). The DL model may otherwise be trained and configured as described above.
As described above, in some embodiments, the imaging subsystem may be configured for metrology of samples. In one such embodiment, determining the information includes determining one or more characteristics of a sample structure in the input image. For example, the DL models described herein may be configured for generating a sample reference that may be used with a sample image to determine metrology information for the sample. The metrology information may include any metrology information of interest, which may vary depending on the structures on the sample. Examples of such metrology information include, but are not limited to, critical dimensions (CDs), such as line widths and other dimensions of sample structures. The sample image may include any image produced by any metrology tool, which may have a configuration such as described herein or any other suitable configuration known in the art. In this way, embodiments described herein may advantageously use a sample image generated by a metrology tool together with a sample reference generated as described herein for predicting metrology information for the sample and any one or more sample structures included in the input image. For metrology applications, algorithm Y may be any suitable metrology method or algorithm used on any suitable metrology tool. While algorithm Y and the various inputs and outputs may differ between the metrology and inspection use cases, the same DL model may be used for both metrology and inspection (after any retraining the application requires). The DL model may otherwise be trained and configured as described above.
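As a toy illustration of a critical dimension (CD) measurement such as line width, the sketch below counts above-threshold pixels in a 1-D intensity profile and scales by the pixel size. All values are hypothetical, and real metrology algorithms are far more sophisticated.

```python
import numpy as np

def estimate_line_width(intensity_profile, threshold, pixel_size_nm):
    """Toy CD estimate: count pixels of a 1-D intensity profile above a
    threshold and scale by the pixel size to get a width in nm."""
    return (intensity_profile > threshold).sum() * pixel_size_nm

# A bright line against a dark background, 5 nm per pixel (hypothetical):
profile = np.array([0.1, 0.2, 0.9, 0.95, 0.9, 0.2, 0.1])
width_nm = estimate_line_width(profile, threshold=0.5, pixel_size_nm=5.0)
# width_nm -> 15.0 (three above-threshold pixels times 5 nm)
```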
The computer subsystem may also be configured for generating results that include the determined information, which may include any of the results or information described herein. Such results may be generated by the computer subsystem in any suitable manner. All of the embodiments described herein may be configured for storing the results of one or more steps of the embodiments in a computer-readable storage medium. The results may include any of the results described herein and may be stored in any manner known in the art. The results including the determined information may have any suitable form or format, such as a standard file type. The storage medium may include any storage medium described herein or any other suitable storage medium known in the art.
After the information has been stored, the information may be accessed in the storage medium and used by any of the method or system embodiments described herein, formatted for display to a user, used by another software module, method, or system, etc., to perform one or more functions for the sample or another sample of the same type. For example, the results generated by the computer subsystem may include information for any defects detected on the sample (e.g., the locations of bounding boxes of detected defects, etc.), detection scores, information about defect classifications (e.g., class labels or IDs, any defect attributes determined from any image, etc.), predicted sample structure measurements, sizes, shapes, etc., or any other such suitable information known in the art. The information may be used by the computer subsystem or another system or method for performing additional functions for the sample and/or the detected defects, such as sampling the defects for defect inspection or other analysis, determining root causes of the defects, etc.
Such functions also include, but are not limited to, altering a process, such as a manufacturing process or step performed on or to be performed on the sample, in a feedback or feedforward manner, etc. For example, the computer subsystem may be configured to determine one or more changes to a process that was performed on the sample and/or a process to be performed on the sample based on the determined information. The changes to the process may include any suitable changes to one or more parameters of the process. In one such example, the computer subsystem preferably determines the changes such that defects can be reduced or prevented on other samples on which the revised process is performed, defects can be corrected or eliminated in another process performed on the sample, defects can be compensated for in another process performed on the sample, etc. The computer subsystem may determine such changes in any suitable manner known in the art.
The changes may then be sent to a semiconductor manufacturing system (not shown) or to a storage medium (not shown) accessible to both the computer subsystem and the semiconductor manufacturing system. The semiconductor manufacturing system may or may not be part of the system embodiments described herein. For example, the imaging subsystem and/or computer subsystem described herein may be coupled to a semiconductor manufacturing system, such as via one or more common components (e.g., a housing, a power supply, a sample handling device or mechanism, etc.). The semiconductor manufacturing system may include any semiconductor manufacturing system known in the art, such as a photolithography tool, an etching tool, a chemical-mechanical polishing (CMP) tool, a deposition tool, and the like.
In addition to the advantages already described, the embodiments described herein have several advantages. For example, the embodiments have advantages over currently used methods (e.g., unsupervised defect detection algorithms that use marginal or joint probability frequencies), including the ability to directly incorporate multi-mode and multi-angle data, which enables higher sensitivity. In another example, the embodiments may incorporate design data directly without generating regions of interest, which enables higher sensitivity and better time to results. In a further example, the embodiments described herein may learn and remove learned defect-free structural noise, which enables higher sensitivity.
The embodiments have advantages over currently used supervised ML or DL models, including a 10- to 100-fold reduction in the number of labeled data points required, which provides lower cost of ownership and better time to results. In particular, the embodiments will be easier, cheaper, and faster to set up because the embodiments described herein have significantly lower requirements for labeled data than other ML- and DL-based detectors.
Additional advantages of the embodiments over typical sample inspection, metrology, defect inspection, and the like include higher signal-to-noise ratio and sensitivity than all existing solutions. In addition, the embodiments described herein are particularly applicable to high volume manufacturing (HVM) use cases, as well as much leading-edge research and development in which process control is limited. For example, the embodiments described herein may be the only ML/DL detection methods applicable to HVM use cases. Moreover, the embodiments described herein may have potentially more stable sensitivity to process variations than other process control methods and systems.
The embodiments described herein are also broadly applicable to any process control method requiring a sample reference. For example, the embodiments may be used in next generation BBP tools to address the multi-mode defect detection complexity of current and future process nodes. Likewise, the embodiments may be used in light scattering inspection tools to provide better performance of those tools. The embodiments described herein may be used to push the upper sensitivity limits of these and other tools described herein beyond the limits currently achievable.
Each of the above-described embodiments may be combined together into one single embodiment. In other words, no embodiment is mutually exclusive of any other embodiment, unless mentioned otherwise herein.
Another embodiment relates to a computer-implemented method for determining information of a sample. The method includes generating a reference for the sample by inputting one or more inputs into a DL model trained without labeled data. The one or more inputs include at least a sample image or data generated from the sample image. The method also includes determining information for the sample from the reference and at least the sample image or the data generated from the sample image. The inputting and determining steps are performed by a computer subsystem, which may be configured according to any of the embodiments described herein.
Each of the steps of the method may be performed as further described herein. The method may also include any other steps that may be performed by the imaging subsystem and/or the computer subsystem described herein. Additionally, the method may be performed by any of the system embodiments described herein.
Additional embodiments relate to a non-transitory computer-readable medium storing program instructions executable on a computer system to perform a computer-implemented method for determining information of a sample. One such embodiment is shown in fig. 4. In particular, as shown in FIG. 4, non-transitory computer-readable medium 400 includes program instructions 402 executable on computer system 404. A computer-implemented method may include any step of any method described herein.
Program instructions 402 implementing methods such as those described herein may be stored on computer-readable medium 400. The computer-readable medium may be a storage medium such as a magnetic or optical disk, magnetic tape, or any other suitable non-transitory computer-readable medium known in the art.
The program instructions may be implemented in any of various ways, including procedure-based techniques, component-based techniques, and/or object-oriented techniques, among others. For example, the program instructions may be implemented using ActiveX controls, C++ objects, JavaBeans, Microsoft Foundation Classes ("MFC"), SSE (Streaming SIMD Extensions), Python, TensorFlow, or other technologies or methodologies, as desired.
The computer system 404 may be configured according to any of the embodiments described herein.
Further modifications and alternative embodiments of various aspects of the invention will be apparent to those skilled in the art in view of this description. For example, methods and systems for determining information of a sample are provided. Accordingly, this description is to be construed as illustrative only and is for the purpose of teaching those skilled in the art the general manner of carrying out the invention. It is to be understood that the forms of the invention shown and described herein are to be taken as the presently preferred embodiments. Elements and materials may be substituted for those illustrated and described herein, parts and processes may be reversed, and certain attributes of the invention may be utilized independently, as would be apparent to one of ordinary skill in the art having had the benefit of this description of the invention. Changes may be made in the elements described herein without departing from the spirit and scope of the invention as described in the following claims.

Claims (20)

1. A system configured for determining information of a sample, comprising:
a computer subsystem; and
One or more components executed by the computer subsystem;
wherein the one or more components comprise a deep learning model trained without labeled data and configured to generate a reference for a sample from one or more inputs comprising at least a sample image or data generated from the sample image; and
Wherein the computer subsystem is configured for determining information of the sample from the reference and at least the sample image or the data generated from the sample image.
2. The system of claim 1, wherein the deep learning model is further trained in an unsupervised manner.
3. The system of claim 1, wherein the deep learning model is further trained in a self-supervised manner.
4. The system of claim 1, wherein when the one or more inputs comprise the sample image, the reference comprises a learned reference image.
5. The system of claim 1, wherein when the one or more inputs comprise the data generated from the sample image and the data generated from the sample image comprises structural noise, the reference comprises learned structural noise.
6. The system of claim 1, wherein the one or more inputs further comprise design information for the sample and at least the sample image or the data generated from the sample image.
7. The system of claim 1, wherein the one or more inputs further comprise design information for the sample and at least the sample image or the data generated from the sample image, and wherein the one or more inputs, the design information, and the data generated from the sample image do not comprise region of interest information for the sample.
8. The system of claim 1, wherein the one or more inputs further comprise region of interest information for the sample and at least the sample image or the data generated from the sample image.
9. The system of claim 1, wherein the sample image is generated using a first mode of an imaging subsystem, wherein the deep learning model is further configured to generate additional references to the sample from one or more additional inputs including at least additional sample images generated using a second mode of the imaging subsystem or data generated from the additional sample images, and wherein the computer subsystem is further configured to determine additional information for the sample from the additional references and at least the additional sample images or the data generated from the additional sample images.
10. The system of claim 9, wherein the sample image and the additional sample image or the data generated from the sample image and the data generated from the additional sample image are input separately to the deep learning model at different times.
11. The system of claim 9, wherein the sample image and the additional sample image or the data generated from the sample image and the data generated from the additional sample image are jointly input to the deep learning model.
12. The system of claim 1, wherein the computer subsystem is not configured for determining information from a reference of any other sample.
13. The system of claim 1, wherein the computer subsystem is further configured for determining information of the sample from the reference and only the sample image or the data generated from the sample image.
14. The system of claim 1, wherein the computer subsystem is further configured for determining the information of the sample by inputting the reference and at least the sample image or the data generated from the sample image into a supervised deep learning model.
15. The system of claim 1, wherein the computer subsystem is further configured for determining the information of the sample by inputting the reference and at least the sample image or the data generated from the sample image into an unsupervised deep learning model.
16. The system of claim 1, wherein the computer subsystem is further configured for determining the information of the sample by inputting the reference and at least the sample image or the data generated from the sample image into an unsupervised algorithm.
17. The system of claim 1, wherein the information determined for the sample comprises predicted defect locations on the sample.
18. The system of claim 1, wherein the sample image is generated by a light-based imaging subsystem.
19. A non-transitory computer readable medium storing program instructions executable on a computer system to perform a computer-implemented method for determining information of a sample, wherein the computer-implemented method comprises:
generating a reference for a sample by inputting one or more inputs into a deep learning model trained without using labeled data, wherein the one or more inputs include at least a sample image or data generated from the sample image; and
determining information for the sample from the reference and at least the sample image or the data generated from the sample image.
20. A computer-implemented method for determining information of a sample, comprising:
generating a reference for a sample by inputting one or more inputs into a deep learning model trained without using labeled data, wherein the one or more inputs include at least a sample image or data generated from the sample image; and
determining information for the sample from the reference and at least the sample image or the data generated from the sample image, wherein the inputting and the determining are performed by a computer subsystem.
CN202280043897.9A 2021-10-04 2022-10-03 Deep learning for unsupervised or self-supervised semiconductor-based applications Pending CN117561539A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/251,705 2021-10-04
US17/671,519 US20240013365A9 (en) 2021-10-04 2022-02-14 Unsupervised or self-supervised deep learning for semiconductor-based applications
US17/671,519 2022-02-14
PCT/US2022/045481 WO2023059524A1 (en) 2021-10-04 2022-10-03 Unsupervised or self-supervised deep learning for semiconductor-based applications

Publications (1)

Publication Number Publication Date
CN117561539A true CN117561539A (en) 2024-02-13

Family

ID=89819029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280043897.9A Pending CN117561539A (en) 2021-10-04 2022-10-03 Deep learning for unsupervised or self-supervised semiconductor-based applications

Country Status (1)

Country Link
CN (1) CN117561539A (en)

Similar Documents

Publication Publication Date Title
EP3465174B1 (en) Generating simulated images from input images for semiconductor applications
CN108463876B (en) Generating an analog output for a sample
US10186026B2 (en) Single image detection
US9830421B2 (en) Alignment of inspection to design using built in targets
TW201734439A (en) Hybrid inspectors
KR20180091940A (en) Generating high-resolution images from low-resolution images for semiconductor applications
KR20210138116A (en) Learnable defect detection for semiconductor applications
US10714366B2 (en) Shape metric based scoring of wafer locations
KR102525830B1 (en) Design file selection for alignment of test image to design
US20210343001A1 (en) Training a machine learning model to generate higher resolution images from inspection images
US10151706B1 (en) Inspection for specimens with extensive die to die process variation
US20230260100A1 (en) Unsupervised or self-supervised deep learning for semiconductor-based applications
CN117546202A (en) Machine learning using global texture characteristics for semiconductor-based applications
CN117561539A (en) Deep learning for unsupervised or self-supervised semiconductor-based applications
US20220318986A1 (en) Semantic image segmentation for semiconductor-based applications
US20240095935A1 (en) Deep learning model-based alignment for semiconductor applications
US20230136110A1 (en) Knowledge distillation for semiconductor-based applications
KR20240015060A (en) Deep learning image denoising for semiconductor-based applications

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication