WO2023137212A1

WO2023137212A1 - Anomaly detection in manufacturing processes using hidden markov model-based segmentation error correction of time-series sensor data

Info

Publication number: WO2023137212A1
Application number: PCT/US2023/010871
Authority: WO
Inventors: Roberto DAILEY; Dragan Djurdjanovic
Original assignee: Board Of Regents, The University Of Texas System
Priority date: 2022-01-16
Filing date: 2023-01-16
Publication date: 2023-07-20
Also published as: TW202343240A

Abstract

An exemplary anomaly detection system is disclosed for a feature-based assessment of a semiconductor fabrication equipment or process, as well as other manufacturing equipment and processes, that employs Hidden Markov Model-based segmentation error correction of time-series sensor data in the assessment. Notably, the feature-based assessment and segmentation error correction have been observed to provide a high detection rate of defects in a fabricated device and associated fabricated techniques and with a low false alarm rate.

Description

Anomaly Detection in Manufacturing Processes using Hidden Markov Model-based Segmentation Error Correction of Time-Series Sensor Data

Related Application

This application claims priority to, and the benefit of, U.S. Provisional Patent Application No. 63/300,020, filed January 16, 2022, which is incorporated by reference herein in its entirety.

Technical Field

[0001] The present disclosure generally relates to error detection in articles of manufactures from sensor readings of manufacturing, metrology, or inspection systems, in particular, anomaly detection and error correction in the analysis of the sensor readings and/or other associated data of such systems.

Background

[0002] Semiconductor manufacturing involves many complex processes being executed sequentially on a workpiece to create components critical to the functioning of electronic devices. Depending on the complexity of the fabricated number, the number of operations can vary from tens to hundreds, many of which are performed in ultra-precise operations in the micrometers, nanometers, or sub-nanometer scale.

[0003] Progress in semiconductor technology has been enabled by extremely tight control and consistent execution of the underlying manufacturing processes. As the dimensions and allowable tolerances on semiconductor components become restrictively small, the ability to reliably manufacture them requires that manufacturing equipment performs in a near-perfect manner in repeating their operations among batches of operations. While semiconductor fabrication equipment and metrology systems have long been equipped with a large array of sensors, these sensors have traditionally collected data at relatively low sampling rates, 1 Hz or lower. More advanced systems provide higher sampling rates, e.g., between 3-10 Hz, for improved data-driven decisions and control.

[0004] There is a benefit in the detection of anomalies and defects in fabricated devices during fabrication and in the quick and early detection of errors in fabrication processes and equipment.

Summary

[0005] An exemplary anomaly detection system is disclosed for a feature-based assessment of a semiconductor fabrication equipment or process, as well as other manufacturing equipment and processes, that employs Hidden Markov Model-based segmentation error correction of time-series sensor data in the assessment. Notably, the feature-based assessment and segmentation error correction have been observed to provide a high detection rate of defects in a fabricated device and associated fabricated techniques and with a low false alarm rate.

[0006] The segmentation error correction generates for a set of manufacturing equipment or processes and subsequently employs a Hidden Markov Model template that is then used to correct a number of segmentation errors to improve the accuracy and reduce false positives in the assessment. The anomaly detection system can be used for any number of manufacturing processes, e.g., for semiconductor fabrication equipment or processes, such as plasma etching system, liquid solution-etching system (wet etching), plasma-enhanced chemical vapor deposition system, thin-film deposition system, molecular-beam epitaxy (MBE) system, electron beam melting (EBM) system, chemical vapor deposition (CVD) system, and roll-to-roll web coating system.

[0007] In some embodiments, the segmentation correction operation can determine the presence of extraneous features in a data set due to segmentation error and remove the extraneous features or adjust the index of segments to correct for the extraneous feature. In some embodiments, the segmentation correction operation can determine the presence of incorrect classification of segments and correct for such misclassification to apply the correct feature operations to those segments. In some embodiments, the segmentation correction operation can determine the misclassification of segments after the feature operations are applied and correctly apply feature operations to segments temporally similar in time to one another.

[0008] The Hidden Markov Model can be generated at a given equipment and distributed to other equipment that can then use the shared Hidden Markov Model in combination with its own HHMs to perform signal segmentation. The shared and local HMMs are each selfcorrecting in their data parsing and alignment of data segments for data correct! on/curati on, e.g., for other analytics/ data mining (i.e., mining in the compressed domain), e.g., virtual metrology operation, tool matching operation, among other applications described herein.

[0009] In an aspect, a method is disclosed to detect an anomaly in a fabrication process for semiconductor devices, the method comprising a) generating a template hidden Markov model to align a first time series data collected from a sensor and associated with the fabrication process of a fabricated semiconductor device by: (i) retrieving, by a processor, a plurality of training sensor data sets associated with a plurality of fabricated semiconductor devices, wherein each of the plurality of training sensor data set comprises a training time-series data that is associated with a fabricated semiconductor device of the plurality of fabricated semiconductor devices; (ii) segmenting, by the processor, the each of the training time-series data to generate a plurality of segment data for the plurality of sensor data; (iii) performing, by the processor, a hidden Markov model analysis of the plurality of segment data to generate a template hidden Markov model that describes the hidden states of the plurality of fabricated semiconductor devices; and (iv) generating, by the processor, an ordered sequence of states of the plurality of segment data using parameters of the template hidden Markov model; (b) retrieving, by the processor, the first time series data associated with the fabrication process of the fabricated semiconductor device; and (c) aligning, by the processor, the first time series data to a second time series data associated with the same fabrication process using the generated ordered sequence of states, wherein the first time series data is compared to the second time series data to determine an analytical output (e.g., an anomaly, mismatch tool determination, virtual metrology output) in the fabrication process for the fabricated semiconductor device.

[0010] In some embodiments, the method further includes comparing, by the processor, the first time-series data to the second time-series data to determine the anomaly in the fabrication process for the fabricated semiconductor device (e.g., using a comparison operation or a correlation operation).

[0011] In some embodiments, the first time-series data is acquired from a first semiconductor device, wherein the second time-series data is acquired from a second semiconductor device, wherein the first semiconductor device and the second semiconductor device are in the same fabrication batch of fabricated semiconductor devices, wherein a batch is subjected to a same or similar process of fabrication for a given device pattern on a wafer.

[0012] In some embodiments, the first time-series data is acquired from a first semiconductor device, wherein the second time-series data is acquired from a second semiconductor device, wherein the first semiconductor device and the second semiconductor device are in different fabrication batches of fabricated semiconductor devices, wherein a batch is subjected to a same or similar process of fabrication for a given device pattern on a wafer.

[0013] In some embodiments, the step of aligning is performed using a Viterbi algorithm or a max-sum algorithm.

[0014] In some embodiments, the step of segmenting the first time-series data to generate a plurality of segment data comprises segmenting the first time-series data into a plurality of steady-state segments by determining, using a moving window of a predetermined size, along with the first time-series data, a set of regions of the first time-series data having values within a pre-defined threshold profile (e.g., within a 2AT range until more than 10% of the signal is outside the range); and segmenting the first time-series data into a plurality of transient state segments by labeling regions outside the plurality of steady-state segments as a plurality of transient state segments.

[0015] In some embodiments, the Hidden Markov Model template and thresholds may be employed as a proxy, or to determine, a virtual metrology measurement (e.g., layer thickness from chemical vapor deposition, layer width in etching, critical dimensions in photolithography). Virtual metrology measurement can predict or estimate the properties of a wafer based on machine parameters and sensor data in the production equipment, without performing the costly, destructive physical measurement of the wafer properties.

[0016] In some embodiments, the sensor that collected the first time-series data is a part of manufacturing equipment of the fabricated semiconductor device, wherein the manufacturing equipment is selected from the group consisting of a plasma etching system, a liquid solutionetching system (wet etching), a plasma-enhanced chemical vapor deposition system, a thin-film deposition system, a molecular-beam epitaxy (MBE) system, an electron beam melting (EBM) system, a chemical vapor deposition (CVD) system, and a roll-to-roll web coating system.

[0017] In some embodiments, the sensor that collected the first time-series data is a metrology or inspection equipment selected from the group consisting of: a wafer prober, imaging station, ellipsometer, CD-SEM, ion mill, C-V system, interferometer, source measure units (SME) magnetometer, optical and imaging system, profilometer, reflectometer, resistance probe, resistance high-energy electron diffraction (RHEED) system, and X-ray diffractometer. [0018] In some embodiments, the first time-series data is retrieved from a controller of manufacturing equipment of the fabricated semiconductor device, wherein the controller of the manufacturing equipment is operatively connected to the sensor.

[0019] In some embodiments, the first time-series data comprises observed measurements of a metrology signal associated with a device pattern on a wafer.

[0020] In some embodiments, the first time-series data comprises observed measurements of a power signal, a pressure signal, a temperature signal, a volume signal, a flow rate signal, a voltage signal, and an optical signal, any of which is associated with a fabrication process.

[0021] In some embodiments, the first time-series data is compared to the second timeseries data to determine accurate tool matching (e.g., chamber matching) between a piece of first fabrication equipment and a piece of second fabrication equipment employed in the same fabrication process. [0022] In some embodiments, the first time-series data is compared to the second timeseries data to generate an indication of a quality of a fabrication process or an associated fabrication equipment (e.g., product detect level prediction or product quality characteristic prediction).

[0023] In some embodiments, for a given wafer k, each sensor i collects a signal 6_k of length p_k ^l .

[0024] In some embodiments, the method further includes retrieving, by the processor, a set of second time-series data associated with the fabrication process of the fabricated semiconductor device; and aligning, by the processor, the set of second time-series data to a set of third time-series data associated with the same fabrication process based on the hidden Markov model analysis, wherein the set of second time-series data comprises more than 50 sensors sampled at 1 Hz, 5 Hz, 10 Hz, or at a sampling rate in between.

[0025] In some embodiments, the step of generating the template hidden Markov model comprises segmenting, by the processor, the time-series data to generate the plurality of segment data and determining alignment statistics of the plurality of segment data; clustering the plurality of segments based on alignment statistics; and determining a transition matrix and an emission parameter matrix based on the clustering.

[0026] In some embodiments, the steps of generating the template hidden Markov model are performed for over 100 sensor readings for a given fabrication process, wherein the operation is performed near real-time in between batch processing.

[0027] In some embodiments, the method further includes generating an alert when an anomaly in the given fabrication process is detected.

[0028] In some embodiments, the method is performed at a remote analysis system for a plurality of semiconductor fabrication equipment.

[0029] In some embodiments, the method is performed at an analysis system for a semiconductor fabrication equipment.

[0030] In some embodiments, the analysis system is a part of the semiconductor fabrication equipment.

[0031] In some embodiments, the analysis system is a part of a controller of a semiconductor fabrication equipment.

[0032] In some embodiments, the method further includes transmitting the template hidden Markov model of a first semiconductor fabrication equipment to a second semiconductor fabrication equipment configured to generate a second template hidden Markov model, wherein the template hidden Markov model of the first semiconductor fabrication equipment and the second template hidden Markov model are combined at the second semiconductor fabrication equipment for a tool matching operation or virtual metrology operation performed at the second semiconductor fabrication equipment.

[0033] In some embodiments, the method further includes transmitting the template hidden Markov model of a first semiconductor fabrication equipment to an analysis system, wherein the analysis system is configured to the template hidden Markov model of the first semiconductor fabrication equipment and the template hidden Markov model of other semiconductor fabrication equipment to determine an anomaly in a fabrication process of first semiconductor fabrication equipment.

[0034] In another aspect, a metrology system (e.g., semiconductor metrology or inspection system) is disclosed comprising a processing unit configured by computer-readable instructions to detect an anomaly in a fabrication process for semiconductor devices by: [0035] (a) generating a template hidden Markov model to align a first time-series data collected from a sensor and associated with the fabrication process of a fabricated semiconductor device by: (b) retrieving the first time-series data associated with the fabrication process of the fabricated semiconductor device; (c) aligning the first time-series data to a second time-series data associated with the same fabrication process using the generated ordered sequence of states; and (d) comparing the first time-series data to the second time-series data to determine the anomaly in the fabrication process for the fabricated semiconductor device.

[0036] In some embodiments, the instructions to generate the template hidden Markov model comprises (i) instructions to retrieve a plurality of training sensor data sets associated with a plurality of fabricated semiconductor devices, wherein each of the plurality of training sensor data set comprises a training time-series data that is associated with a fabricated semiconductor device of the plurality of fabricated semiconductor devices; (ii) instructions to segment the timeseries data to generate a plurality of segment data for the plurality of sensor data; (iii) instructions to perform a hidden Markov model analysis of the plurality of segment data to generate a template hidden Markov model that describes the hidden states of the plurality of fabricated semiconductor devices; and (iv) instructions to generate an ordered sequence of states of the plurality of segment data using parameters of the template hidden Markov model.

[0037] In some embodiments, the first time-series data is acquired from a first semiconductor device, wherein the second time-series data is acquired from a second semiconductor device, wherein the first semiconductor device and the second semiconductor device are in the same fabrication batch of fabricated semiconductor devices.

[0038] In some embodiments, the first time-series data is acquired from a first semiconductor device, wherein the second time-series data is acquired from a second semiconductor device, wherein the first semiconductor device and the second semiconductor device are in different fabrication batches of fabricated semiconductor devices.

[0039] In some embodiments, the instructions to align the first time-series data to the second time-series data comprise a Viterbi algorithm or a max-sum algorithm.

[0040] In some embodiments, the instructions to segment the time-series data to generate a plurality of segment data comprises instructions to segment the first time-series data into a plurality of steady-state segments by determining, using a moving window of a predetermined size, along with the first time-series data, a set of regions of the first time-series data having values within a pre-defined threshold profile (e.g., within a 2 AT range until more than 10% of the signal is outside the range); and instructions to segment the first time-series data into a plurality of transient state segments by labeling regions outside the plurality of steady-state segments as a plurality of transient state segments.

[0041] In some embodiments, the sensor that collected the first time-series data is a part of manufacturing equipment of the fabricated semiconductor device, wherein the manufacturing equipment is selected from the group consisting of a plasma etching system, a liquid solutionetching system (wet etching), a plasma-enhanced chemical vapor deposition system, a thin-film deposition system, a molecular-beam epitaxy (MBE) system, an electron beam melting (EBM) system, a chemical vapor deposition (CVD) system, and a roll-to-roll web coating system.

[0042] In some embodiments, the sensor that collected the first time-series data is a metrology or inspection equipment selected from the group consisting of: a wafer prober, imaging station, ellipsometer, CD-SEM, ion mill, C-V system, interferometer, source measure units (SME) magnetometer, optical and imaging system, profilometer, reflectometer, resistance probe, resistance high-energy electron diffraction (RHEED) system, and X-ray diffractometer. [0043] In some embodiments, the first time-series data is retrieved from a controller of manufacturing equipment of the fabricated semiconductor device, wherein the controller of the manufacturing equipment is operatively connected to the sensor.

[0044] In some embodiments, the first time-series data comprises observed measurements of a metrology signal associated with a device pattern on a wafer. [0045] In some embodiments, the first time-series data comprises observed measurements of a power signal, a pressure signal, a temperature signal, a volume signal, a flow rate signal, a voltage signal, and an optical signal, any of which is associated with a fabrication process.

[0046] In some embodiments, the processing unit is configured by instructions to compare the first time-series data to the second time-series data to determine accurate tool matching (e.g., chamber matching) between a piece of first fabrication equipment and a piece of second fabrication equipment employed in the same fabrication process.

[0047] In some embodiments, the first time-series data is compared to the second timeseries data to determine virtual metrology output.

[0048] In some embodiments, the processing unit is configured to compare the first timeseries data to the second time-series data to generate an indication of a quality of a fabrication process or an associated fabrication equipment (e.g., product detect level prediction or product quality characteristic prediction).

[0049] In some embodiments, the processing unit is configured by computer-readable instructions to further retrieve a set of second time-series data associated with the fabrication process of the fabricated semiconductor device; and align the set of second time-series data to a set of third time-series data associated with the same fabrication process based on the hidden Markov model analysis, wherein the set of second time-series data comprises more than 50 sensors sampled at 1 Hz, 5 Hz, 10 Hz, or at a sampling rate in between.

[0050] In some embodiments, the instructions to generate the template hidden Markov model comprises the instructions to segment the time-series data to generate the plurality of segment data and determine alignment statistics of the plurality of segment data; instructions to cluster the plurality of segments based on alignment statistics; and instructions to determine a transition matrix and an emission parameter matrix based on the clustering.

[0051] In some embodiments, the system further includes a metrology sensor system comprising a plurality of sensors configured to acquire a plurality of sensor data.

[0052] In another aspect, a non-transitory computer-readable medium is disclosed having instructions stored thereon, wherein execution of the instructions by a processor causes the processor to perform any of the above-discussed methods or above-discussed systems.

[0053] In another aspect, a method is disclosed to detect an anomaly in a manufacturing process for an article, the method comprising (a) generating a template hidden Markov model to align a first time series data collected from a sensor and associated with the manufacturing process of the article by: (i) retrieving, by a processor, a plurality of training sensor data sets associated with a plurality of manufactured articles, wherein each of the plurality of training sensor data set comprises a training time-series data that is associated with a manufactured article of the plurality of manufactured articles; (ii) segmenting, by the processor, the time-series data to generate a plurality of segment data for the plurality of sensor data; (iii) performing, by the processor, a hidden Markov model analysis of the plurality of segment data to generate a template hidden Markov model that describes the hidden states of the plurality of fabricated semiconductor devices; and (iv) generating, by the processor, an ordered sequence of states of the plurality of segment data using parameters of the template hidden Markov model; (b) retrieving, by the processor, the first time series data associated with the manufacturing process of the manufactured article; and (c) aligning, by the processor, the first time series data to a second time series data associated with the same manufacturing process using the generated ordered sequence of states, wherein the first time series data is compared to the second time series data to determine an anomaly in the manufacturing process for the manufactured article.

Brief Description of the Drawings

[0054] The skilled person in the art will understand that the drawings described below are for illustration purposes only.

[0055] Figs. 1A, IB, and 1C each show an example analysis system (e.g., anomaly detection system) for anomaly detection of defects or errors, or tool matching or virtual metrology, in manufacturing processes in accordance with an illustrative embodiment.

[0056] Fig. 2 shows an example method of operation to determine or detect an anomaly, e.g., the presence of a defect or error in a fabricated workpiece or in a fabrication process in accordance with an illustrative embodiment.

[0057] Fig. 3 shows an example method of operation to generate the HMM template for use in the operation of Fig. 2 to determine an anomaly in accordance with an illustrative embodiment.

[0058] Fig. 4A shows an example method of Fig. 2 in accordance with an illustrative embodiment.

[0059] Fig. 4B is a diagram showing the analytical features for the dynamic-based analysis and/or static-based analysis of Fig. 2 in accordance with an illustrative embodiment. [0060] Fig. 4C is a diagram showing the alignment feature vector used to determine the emission parameters of the HMM template of Fig. 2 in accordance with an illustrative embodiment.

[0061] Fig. 4D shows an example of a first error type corrected by the segmentation error correction module of Fig. 1A in accordance with an illustrative embodiment.

[0062] Fig. 4E shows an example of a second error type corrected by the segmentation error correction module of Fig. 1A in accordance with an illustrative embodiment.

[0063] Figs. 4F and 4G each show an example of a third error type corrected by the segmentation error correction module of Fig. 1A in accordance with an illustrative embodiment. [0064] Fig. 4H shows an example of a fourth error type corrected by the segmentation error correction module of Fig. 1A in accordance with an illustrative embodiment.

[0065] Fig. 5 A shows an example method to generate the HMM template of Fig. 3 in accordance with an illustrative embodiment.

[0066] Fig. 5B shows example emission parameters of the HMM template of Fig. 3 in accordance with an illustrative embodiment.

[0067] Fig. 5C shows an example transition matrix of the HMM template of Fig. 3 in accordance with an illustrative embodiment.

[0068] Fig. 5D shows a method of clustering to generate the HMM template of Fig. 3 in accordance with an illustrative embodiment.

[0069] Fig. 5E shows a method to determine the state statistics of Fig. 3 in accordance with an illustrative embodiment.

[0070] Fig. 6 shows an example semiconductor fabrication system from which time series data can be evaluated using the method of operation to determine or detect an anomaly in accordance with an illustrative embodiment.

[0071] Fig. 7 shows an example operation of Hidden Markov model matching or comparison in accordance with an illustrative embodiment.

Detailed Specification

[0072] Each and every feature described herein, and each and every combination of two or more of such features, is included within the scope of the present invention provided that the features included in such a combination are not mutually inconsistent.

[0073] Some references, which may include various patents, patent applications, and publications, are cited in a reference list and discussed in the disclosure provided herein. The citation and/or discussion of such references is provided merely to clarify the description of the present disclosure and is not an admission that any such reference is “prior art” to any aspects of the present disclosure described herein. In terms of notation, “[n]” corresponds to the n^th reference in the list. All references cited and discussed in this specification are incorporated herein by reference in their entirety and to the same extent as if each reference was individually incorporated by reference.

[0074] Example System #1

[0075] Figs. 1A, IB, and 1C each show an example equipment analysis system 100(shown as 100a, 100b, 100c), for anomaly detection, tool matching, or virtual metrology of defects or errors in manufacturing processes in accordance with an illustrative embodiment. The analysis system 100 may be implemented for a set of equipment via a central analysis system or on individual local equipment. The analysis may generate equipment-specific parameters that can be transmitted and/or shared with other analysis systems.

[0076] In the example shown in Fig. 1A, the equipment analysis system 100a includes a machine analysis system 102 (shown as “Analysis System (Central)” 102a) configured to receive a stream 104 of time series data from a set of manufacturing or fabrication equipment 106 (shown as “Semiconductor Fabrication Equipment” 106a) and associated metrology 108 or instruction equipment 110 to determine the presence or non-presence of an anomaly in the signal corresponding to defects in a fabricated device or non-compliant operations of the manufacturing or fabrication equipment 106a. The time-series data can be one-dimensional data, two- dimensional data, or three-dimensional data. The analysis system 102 (e.g., 102a) or device/module (e.g., 102b, 102c) is configured to segment each time-series data of stream 104 into a plurality of data segments corresponding to a fabricated feature or process control parameter to which the analytical features are applied. In the example shown in Fig. 1 A, timeseries data relating to the processing performed by the manufacturing or fabrication equipment 106 are provided to one or more data stores 112 and are made available to the analysis system 102a. To correctly segment the time series data stream 104, the analysis system 102 (e.g., 102a) or device/module (e.g., 102b, 102c) performs a segmentation error correction based on a Hidden Markov Model-based model (also referred to as an HMM-based template) from other time-series data of the same process or fabricated device that models the time series data as a set of hidden Markov processes. The HMM-based template is then used to adjust the lengths of the segments initially defined in the time series data or their classification.

[0077] In the example shown in Fig. 1A, the analysis system 102a (shown as 102a’) includes a segmentation module 116 (shown as “Segmentation” 116), a segmentation error correction module 118 (shown as “Segmentation Error Correction” 118), a feature assessment module 120 (shown as “Features” 120), an anomaly detector 122, and a Hidden Markov Model module 124 (shown as “Hidden Markov Model” 124). The segmentation module 116 receives the stream 104 of time-series data (shown as 104a) from any one of the manufacturing or fabrication equipment 106a, inspection metrology equipment 108, and/or equipment 110 to generate a set of segmented data 117. The data can correspond to a given workpiece 125 from a batch of workpieces 126 (shown as “Workpiece Batch 1” 126a, “Workpiece Batch 2” 126b, “Workpiece Batch n” 126c) and/or associated processing 128 (shown as “Processing Batch 1” 128a, “Processing Batch 2” 128b, and “Processing Batch n” 128c) performed to fabricate the workpiece. The Hidden Markov Model module 124 generates a template HMM 130 from a set of batch data 104b as a hidden Markov process 132 to be used by the segmentation error correction module 118 to perform a number of segmentation corrections. The template HMM 130 includes the probabilities of hidden state transitions (shown as “State Transition Matrix” 136) and the probabilities of each hidden state being present (shown as “Emission Distribution” 134). The segmentation error correction module 118 then employs a maximum likelihood estimator (e.g., in the Viterbi algorithm) to estimate a sequence of states (shown as the “Template States” 138) that likely caused the time-series signal. The segmentation error correction module 118 compares and realigns the segmented data 117 to generate realigned or corrected segment data 119. The feature assessment module 120 performs dynamical-based analysis on transient portions of the realigned or corrected segment data 119 that are then evaluated by the anomaly detector 122 to determine the presence of a defect or error in a fabricated workpiece or in a fabrication process using prior features 140 calculated from feature assessment module 142 using the batch data 104b.

[0078] In some embodiments, the segmentation correction module 118 can (i) determine the presence of extraneous features in a data set due to segmentation error and remove the extraneous features or adjust the index of segments to correct for the extraneous feature; (ii) determine the presence of incorrect classification of segments and correct for such misclassification to apply the correct feature operations to those segments; and (iii) determine misclassification of segments after the feature operations are applied and to correctly apply feature operations to segments temporally similar in time to one another.

[0079] The analysis system (e.g., anomaly detection) can be performed following fabrication processes, e.g., dry etching and deposition, to early identify defects in a wafer or the processing prior to the wafer being subjected to additional processing. The analysis system (e.g., anomaly detection) can be performed in real-time or near real-time in parallel or in-between wafer processing operations without adding to the processing time. The analysis system (e.g., anomaly detection) can be performed in conjunction with metrology or inspection.

[0080] Example System #2

[0081] Figs. IB and 1C each show an example machine analysis system 100 (shown as 100b and 100c) for machine-specific anomaly detection of defects or errors in manufacturing processes in accordance with an illustrative embodiment. In the example shown in Figs. IB, the machine analysis system 100b includes an analysis device 102b implemented as a part of the semiconductor fabrication equipment 106b. In the example shown in Figs. 1C, the machine analysis system 100c includes an analysis module 102c implemented as a part of the controller 107 (shown as 107a) of the semiconductor fabrication equipment 106c.

[0082] The analysis device 102b or analysis module 102c are each configured to receive a stream 104 of time series data from a controller 107 or plant control 109, respectively, of the manufacturing or fabrication equipment (e.g., 106b, 106c) to determine the presence or nonpresence of an anomaly in the signal corresponding to defects in a fabricated device or non- compliant operations of the manufacturing or fabrication equipment (e.g., 106b, 106c). The analysis device 102b may be a computing device, microprocessors (MCUs), microcontrollers, graphical processing units (GPUs), a logical circuit implemented via a CPLD or FPGA, or an application-specific circuit (ASICs), as described herein. The analysis module 102c may be an instruction for a computing device, microprocessors (MCUs), microcontrollers, graphical processing units (GPUs), a logical circuit implemented via a CPLD or FPGA, or an applicationspecific circuit (ASICs) that can execute with the plant control 109.

[0083] The time-series data can be one-dimensional data, two-dimensional data, or three- dimensional data. The analysis device 102b is configured to segment each time-series data of stream 104 into a plurality of data segments corresponding to a fabricated feature or process control parameter to which the analytical features are applied. In the example shown in Fig. IB, time-series data relating to the processing performed by the manufacturing or fabrication equipment 106b are provided to one or more data stores 112 (shown as 112a) of the equipment 106b and are made available to the analysis device 102b. To correctly segment the time series data stream 104, the analysis device 102 b performs a segmentation error correction based on a Hidden Markov Model-based model (also referred to as an HMM-based template) from other time-series data of the same process or fabricated device that models the time series data as a set of hidden Markov processes. The HMM-based template is then used to adjust the lengths of the segments initially defined in the time series data or their classification.

[0084] In the example shown in Figs. IB or 1C, the analysis device 102b or analysis module 102c, similar to the analysis device 102a, includes a segmentation module 116 (shown as “Segmentation” 116), a segmentation error correction module 118 (shown as “Segmentation Error Correction” 118), a feature assessment module 120 (shown as “Features” 120), an anomaly detector 122, and a Hidden Markov Model module 124 (shown as “Hidden Markov Model” 124). The segmentation module 116 receives the stream 104 of time-series data (shown as 104a) from any one of the manufacturing or fabrication equipment 106a, inspection metrology equipment 108, and/or equipment 110 to generate a set of segmented data 117. The data can correspond to a given workpiece 125 from a batch of workpieces 126 (shown as “Workpiece Batch 1” 126a, “Workpiece Batch 2” 126b, “Workpiece Batch n” 126c) and/or associated processing 128 (shown as “Processing Batch 1” 128a, “Processing Batch 2” 128b, and “Processing Batch n” 128c) performed to fabricate the workpiece. The Hidden Markov Model module 124 generates a template HMM 130 from a set of batch data 104b as a hidden Markov process 132 to be used by the segmentation error correction module 118 to perform a number of segmentation corrections. The template HMM 130 includes the probabilities of hidden state transitions (shown as “State Transition Matrix” 136) and the probabilities of each hidden state being present (shown as “Emission Distribution” 134). The segmentation error correction module 118 then employs a maximum likelihood estimator (e.g., in the Viterbi algorithm) to estimate a sequence of states (shown as the “Template States” 138) that likely caused the timeseries signal. The segmentation error correction module 118 compares and realigns the segmented data 117 to the segmented data 117 to generate realigned or corrected segment data 119. The feature assessment module 120 performs dynamical-based analysis on transient portions of the realigned or corrected segment data 119 that are then evaluated by the anomaly detector 122 to determine the presence of a defect or error in a fabricated workpiece or in a fabrication process using prior features 140 calculated from feature assessment module 142 using the batch data 104b.

[0085] In some embodiments, the segmentation correction module 118 can (i) determine the presence of extraneous features in a data set due to segmentation error and remove the extraneous features or adjust the index of segments to correct for the extraneous feature; (ii) determine the presence of incorrect classification of segments and correct for such misclassification to apply the correct feature operations to those segments; and (iii) determine misclassification of segments after the feature operations are applied and to correctly apply feature operations to segments temporally similar in time to one another.

[0086] The analysis system can be performed following fabrication processes, e.g., dry etching and deposition, to early identify defects in a wafer or the processing prior to the wafer being subjected to additional processing. The analysis system can be performed in real-time or near real-time in parallel or in-between wafer processing operations without adding to the processing time. The analysis system can be performed in conjunction with metrology or inspection.

[0087] Method of Determining an Anomaly in a Manufacturing Process

[0088] Fig. 2 shows an example method 200 of operation to determine an anomaly, e.g., the presence of a defect or error in a fabricated workpiece or in a fabrication process in accordance with an illustrative embodiment. Fig. 4A shows an example of method 200 (shown as 400) of Fig. 2 in accordance with an illustrative embodiment.

[0089] Method 200 includes receiving (202) a time series data 104 (shown as 104c in Fig. 4A) from a piece of manufacturing or fabrication, metrology, or inspection equipment (e.g., 106, 108, 110) or a data store (e.g., 112) associated therewith.

[0090] Method 200 includes segmenting (204) the time-series data (e.g., 104c) and classifying and labeling (204) the segments as being associated with a transient state part of the signal (402) and steady-state part of the signal (404). Each segment includes fiduciaries or alignment features, including a segment number (“index”), a “start-time” value, an “end-time” value, a “level” value, a “type” value, a “range” value, and a “difference” value.

[0091] An example of sets of segmentation operations that may be performed is described in U1 Haq, A., Djurdjanovic, D., “Dynamics-Inspired Feature Extraction in Semiconductor Manufacturing Processes,” which is incorporated by reference herein in its entirety. Another example set of segmentation operations is described in Tian, R., “An Enhanced Approach using Time Series Segmentation for Fault Detection of Semiconductor Manufacturing Process,” which is also incorporated by reference herein in its entirety.

[0092] The method can entail filtering the signal (e.g., via an FIR filter) and determining the gradient (e.g., using the different based method) of the filtered signal. The maximum of the standard deviations (o) of data points in the regions with gradients below a predefined threshold are then used to specify the noise threshold (e.g., noise threshold Ar = 5o). To parse the signal into steady-state and transient segments, a moving window of length 'M' (size of the window corresponds to the shortest portion of a signal that could be considered a steady-state) slides along the signal until at least 90% of the points in the window are contained within a range of 2Ar. The initial point of the window is locked, while the other end is moved forward through the signal to expand the window until more than 10% of the signal readings he outside the 2 Ar range to define the steady-state portion. The window is then reset to its original length, while the initial point of the window is shifted across the steady-state segment that has just been recognized. The process is repeated until the edge of the window reaches the end of the signal. The remaining portions of the signal are then classified and labeled as the transient portion of the signals. Method 200 then applies labels consequentially to each identified steady-state and transient-state portion of the signals.

[0093] Following the classification and state labeling (from step 204), Method 200 includes performing (208) a set of dynamic-based analyses (via dynamical-based analytical features) and static-based analyses (via static-based analytical features). Prior to the analysis, Method 200 includes performing (206) a segmentation error correction by evaluating the segments against an HMM-based template to address mislabeled segments or misclassified portions of the time series data by applying determined hidden states from the HMM-based template as the labels for the subsequent analysis. In some embodiments, the HMM-based template can be used to label portions of the time series for subsequent analysis. Stated differently, the HMM-based template can be performed in such embodiments as the classifier and labeling operation of step 204.

[0094] The segmentation error correction module 118 performs segmentation error correction 206 by aligning and classifying a given sensor reading using an HMM template using the Viterbi algorithm.

[0095] HMM model. The HMM template (e.g., 130) comprises a Hidden Markov Model that includes a set of the transition matrix (e.g., 136) and a set of emission parameters (e.g., 134). The HMM model may be configured with an even initial state distribution across all the states. [0096] Transition Matrix. The transition matrix (e.g., 136) of the HMM template (e.g., 130) represents the probability of moving from one hidden state to another. The HMM model may enforce a left-to-right transition matrix. That is, once in a state, the state can be repeated multiple times, but once the model has transitioned to the next state, it cannot return to a prior state (i.e., moving back left within the matrix). The transition matrix (e.g., 136) has the form:

[0097] where a_Lj represents the probability of transitioning from state i to state j. In the transition matrix, the row numbers represent the current state, and the column numbers represent the state being transitioned to. The probability is bounded between “0” and “1”, and each row must be summed to 1. Fig. 5C shows an example transition matrix for a fabricated device. It can be observed that this matrix implies a high probability of simply moving to the next state, except for state 0. For state 0, a value of -50% indicates that the state is likely to return to itself, implying that the first segment may be split by the segmentation in some signals but not others. [0098] To get initial parameters, individual segments can be first labeled via a clustering operation. When the labels are applied, transitions are counted from nearby labels (e.g., when one label sequentially follows another) and divided by the total count of transitions from state i to any other state forms the initial value of atj. The initial values are used in a Gibbs sampler that estimates the transmission and emission parameters simultaneously.

[0099] Emission Parameters. The HMM model includes the emissions parameters 134 (shown as 134b in Fig. 4A) that define the distribution of statistics produced by each hidden state. From the state emissions 134b, the HMM model attempts to accurately model properties of segments extracted from the hidden states by separating probabilistic models for different alignment statistics such as level, range, difference, start, and end per Table 1. The emission parameters may be used as an alignment feature vector for an individual sensing reading. Fig. 4C is a diagram showing the alignment feature vector of Table 1. Each segment has a range, difference, and level.

Table 1

[0100] The statistics were assumed to be independent of each other in which the level of a segment within a hidden state is not correlated with its segment type. The likelihood for a given state is calculated as the product of the likelihood of each statistic using the state distribution parameters.

[0101] At every segment, the “start” and “end” parameters can be estimated as the minimum and maximum sample location a segment with this hidden state label contains - with A’ being the sample minimum, B’ being the sample maximum. The posterior predictive distribution p(Start,End|A’,B’), while not having an analytical form, when observing it with a Bayesian sampling schema, it exhibits a uniform shape defined between two exponential distributions. This distribution can be estimated with a piecewise distribution per Equation 2 with a, P, estimated by fitting exponentials to the tails of sampled data from the true posterior predictive distribution:

(Eq. 2) [0102] In Equation 2, A’ refers to the sample “min” of starts calculated for a state, and B’ refers to the sample “max” of the Ends. With this distribution, segments will have a high probability of originating from a state if their “start” and “end” points he within the min and max points. Otherwise, the probability will lower exponentially the further the start and end are out of these bounds. The distribution can handle segments split by poor segmentation. With this distribution, if a segment is split in half or thirds, each split segment will have the same likelihood as the full segment. These rules allow the HMM model to merge segments split incorrectly by segmentation and label them as the correct state.

[0103] Fig. 5B shows example emission parameters of an HMM template (e.g., 130) that was previously generated by the Hidden Markov Model module 124 from a prior batch of fabricated devices or processes and stored (shown in datastore 406) for the analysis. The emission parameters 134b may be used as an alignment feature vector for an individual sensing reading 124b. Each column of the emission parameters represents the alignment statistics for one segment, and the index numbering is the order of the segments in the sensor reading. Each hidden state has 9 hidden emission parameters (407 - not shown): min start, max end, the probability of transient or steady-state, the three means, and three standard deviations for the normal distributions describing level, range, and difference. Each segment can be matched to a hidden state by comparing statistics for a given segment to each of these parameters. The Viterbi algorithm then combines the information with the transition matrix to estimate the most likely hidden states for every segment.

[0104] Alignment. Once segmented and analyzed, the segmentation error correction module 118 is configured to align the segmented data (e.g., 117) using the Viterbi algorithm. The Viterbi algorithm takes the alignment feature vector (e.g., of Fig. 5B) to return a most likely path taken by the segmentation (410), along with a likelihood for that path (412), using the transition matrix (e.g., 136) and state emission parameters (407 - not shown) from the HMM template (e.g., 130).

[0105] The Viterbi algorithm employs a maximum-likelihood detector to recursively search over all possible input sequences through a trellis (132) (of the hidden Markov chain) that represents all possible input sequences. Each path through the trellis is represented as a different binary input sequence. Each branch in the trellis has a transition probability p, taken from the transition matrix, and each node between branches has a likelihood lj corresponding to the likelihood for a specific hidden state. The product of all branch probabilities and state likelihoods for a given path represents the likelihood associated with that path. Maximizing that likelihood can be represented as maximizing Equation 3. (m represents the list of nodes in the trellis, n represents the list of branches)

(Eq. 3) [0106] The Viterbi algorithm can eliminate those paths that cannot be part of the most likely path because they diverge and remerge with another path that has a larger likelihood. An ML detector can be used to keep track of the maximum likelihood path leading to each state at the current sampling time. When a current sample is received, likelihoods for the two paths leaving each state at the previous sampling time are calculated by multiplying the transition probability and state likelihood with the likelihood of the maximum likelihood paths. The two- path likelihoods entering each state at the current sampling time are then compared, and the path with the maximum likelihood is selected as the template path 138.

[0107] This template path 138 corresponds to state labels, much like the labels added with the initial clustering when generating the template. The path analysis further improves the consistency of segmentation and alignment. As discussed above, when two states are visited in parallel to each other, the two-path metrics entering each state at the current sampling time are compared, and the path with the maximum likelihood is selected as the template path 138. Additionally, if paths are returned that represent unlikely scenarios segmentation (either by a new path or by having low statistical likelihood), the Viterbi algorithm can be rerun with the modified parameters.

[0108] Once modifications to the path are completed, adjacent segments with the same label are merged to provide the final segments/labels. At this point, dynamics-based analysis can be performed to provide the final analysis of the anomaly detection for the sensor reading.

[0109] Dynamic-based and Steady-State-based Analysis . Once a signal (e.g., 104c) is correctly segmented into steady-state and transient segments, the feature assessment module 120 can perform dynamical-based analysis and steady-state-based analysis on the respective transient portions and steady-state portions of the signal. Table 2 shows a list of analytical features that can be employed for the feature assessment module 120. Fig. 4B is a diagram showing the analytical features of Table 2.

Table 2

[0110] Additional examples and descriptions of the analytical features are described in IEEE Std 181TM - 2011, “IEEE Standard for Transitions, Pulses, and Related Waveforms IEEE Instrumentation and Measurement Society,” New York, 2011, which is incorporated by reference herein in its entirety.

[0111] Correction Scenarios. Figs. 4D - 4H each shows example corrections that can be performed using the segmentation error correction operation 206 of Fig. 2.

[0112] Fig. 4D shows an example of a first error in the alignment labeling of segments. In Fig. 4D, segments “1” and “6” are each shown split into 3 segments (440), e.g., as generated by initial segmentation 117. This error can cause subsequent labels to be misapplied, resulting in the outputs of the subsequent feature analysis being incorrectly compared to those of the prior batches. The segmentation error correction corrects the error (in 440) by merging the extra segments together.

[0113] Fig. 4E shows an example of a second error in which features may be present on only a subset of sensor readings (442) that may be classified as an extra feature by the segmentation module 116. This error can cause a change to the index numbering of the segments and can also disrupt the features of adjacent segments. The segmentation error correction 206 employs hidden states to represent individual features that can address this issue. When alignment is performed with the Viterbi algorithm, not all states need to be visited; thus, states can represent segments that are present for a subset of sensor readings.

[0114] Fig. 4F (similar to Fig. 4D) shows an example of a third error in which the initial segmentation 116 had incorrectly labeled segments (444) in the time series data because a transient segment was detected where none existed. Fig. 4G shows another example of a third error in which the initial segmentation 116 had incorrectly labeled (446) a short steady-state captured in reading “1” is not registered in reading “2.” It can be difficult to tune segmentation methods to work at large scales on a variety of sensor types and features. To avoid adjusting segmentation for every variety of sensor/ time series feature, a certain amount of inconstancy in labeling must be accepted. These inconstancies can be due to several reasons, but brief excursions as shown in Figs. 4D, 4F, and 4G or steady states close to a minimum length are common. The segmentation error correction 206 can address this issue through two operations using the Hidden Markov Model. For the first of these two operations, emissions for hidden states can be designed to represent both full segments as well as broken or incomplete segments coming from the same sensor feature. That is, to represent model emissions from states.

HMM’s with emissions model emissions can be modeled, e.g., as simple multivariate normal distributions. In some embodiment, the model emissions are modeled as a set of independent distributions in which the distributions correspond to each parameter. These distributions have been set to match the true type of distribution for each parameter. For the second of these two operations, the error correction 206 can take paths through hidden states representing standard sensor reading behavior and rerun the segmentation with modified parameters when an unusual path is taken, or the likelihood of that path is significantly lower than usual. In some embodiment, modifications are made to the noise threshold parameters or to the minimum stead- state lengths parameters when rerunning segmentation based on the path. For noise threshold modification, the noise threshold determined for steady states defines the expected amount of noise in steady states can be adjusted if the threshold is too low (e.g., causing the segmentation to incorrectly split the steady states) or if the threshold is too high (e.g., causing the segmentation to incorrectly measure when a steady state stops and a dynamic state begins or may even miss dynamic states).

[0115] For the minimum steady-state lengths modification (which determines the minimum length a steady state can take), the minimum steady-state lengths can be increased to reduce the frequency of noisy segmentation while not causing misses on shorter segments. Of the noise threshold and minimum steady-state lengths adjustments, the one that meets the highest likelihood path is accepted. When segmentation is rerun, multiple potential adjustments (e.g., 4) to these parameters can be assessed.

[0116] Fig. 4H shows an example of a fourth error due to the length of the sensor time series data being long and containing a high number of segments. This can lead to many segments (448) having very similar statistical features and being difficult to separate with standard distance or clustering methods. The segmentation error correction 206 can address this issue via the template order provided in the Hidden Markov Model. The HMM builds a transition matrix that identifies the order in which states occur. Thus, segments are not only identified by their features but also by the labels and order of other segments within the sensor readings. This leads to more accurate labels that can better handle the ambiguity, as shown in Fig. 4H.

[0117] HMM Template Generation (124), Hidden Markov Model module 124 is configured to generate an HMM template that may include a Hidden Markov Model (referred to herein as “HMM model”) from an initial segmentation and statistics determined from a batch set of time series; clustering segments based on alignment statistics; and building the HMM model with priors and emission distributions.

[0118] Fig. 3 shows an example method of operation to generate the HMM template for use in the operation of Fig. 2 to determine an anomaly, e.g., the presence of a defect or error in a fabricated workpiece or in a fabrication process in accordance with an illustrative embodiment. Fig. 5 A shows an example of method 300 (shown as 500) of Fig. 3 in accordance with an illustrative embodiment.

[0119] Method 300 includes receiving (302) a batch of time series data 104b (e.g., -200 time-series data from prior batches or the same batch of the same fabricated device or associated processes), segmenting (304) the received batch of time series data 104b and extracting statistics from the segments, (ii) clustering (306) segments based on the statistics, and (iii) constructing (308) the HMM model 132 including an emission distribution 134 and state transition matrix 136 based on the clustering. The emission distribution 134 and state transition matrix 136 are used to generate template states 138 in the Viterbi algorithm to correct the segment errors in module 118 as described in relation to Fig. 2.

[0120] Initial Segmentation (304). Segmentation (304) may be performed via the process laid out in U1 Haq, A., Djurdjanovic, D., “Dynamics-Inspired Feature Extraction in Semiconductor Manufacturing Processes” or Tian, R., “An Enhanced Approach using Time Series Segmentation for Fault Detection of Semiconductor Manufacturing Process.” [0121] The method (304) can entail filtering the signal (e.g., via an FIR filter) and determining the gradient (e.g., using the different based method) of the filtered signal. The maximum of the standard deviations (o) of data points in the regions with gradients below a predefined threshold are then used to specify the noise threshold (e.g., noise threshold Ar = 5o). To parse the signal into steady-state and transient segments, a moving window of length 'M' (size of the window corresponds to the shortest portion of a signal that could be considered a steadystate) slides along the signal until at least 90% of the points in the window are contained within a range of 2Ar. The initial point of the window is locked, while the other end is moved forward through the signal to expand the window until more than 10% of the signal readings he outside the 2Ar range to define the steady-state portion. The window is then reset to its original length, while the initial point of the window is shifted across the steady-state segment that has just been recognized. The process is repeated until the edge of the window reaches the end of the signal. The remaining portions of the signal are then classified and labeled as transient portions of the signal. Method 200 then applies labels consequentially to each identified steady-state and transient-state portion of the signals.

[0122] The segmentation can generate a set of parameters for each of the segments, as shown in Table 3.

Table 3

[0123] Clustering. To provide a quality initialization for the Gibbs sampling algorithm (and ensure the HMM represents true features well), a hierarchical clustering operation may be run on the alignment features of the template sensor set by grouping similar objects into clusters to generate a set of clusters. The hierarchical clustering operation may be performed by first running the segmentation with the parameters fixed on the “template” dataset. The operation then collects alignment parameters for the time series data and determines each segment for the data. The parameters are then normalized, and clustering is run on all of them together.

Agglomerative hierarchical clustering may be performed, which can provide a robust initial identification of segments. The number of classes set for the clustering can be set as the average number of segments found in each time series plus a constant (e.g., three).

[0124] After clustering is performed, summary statistics may be performed by sorting segments into the class they were assigned by the clustering. Those classes can then be sorted by the average segment start time, and the statistics for each state can then serve as the initial emission parameters. The class labeling order for each time series can then be checked to provide initial guesses for the parameters of the transition matrix.

[0125] Fig. 5D shows a method of clustering by labeling segments from all template sensor readings.

[0126] Once segments are labeled, all similarly labeled segments are averaged to determine the state statistics that serve as the initial emission data for the Hidden Markov Model (see Fig. 5E). The labeled states are then reordered by their average start from low to high. The ordered labels serve as the initial values for the sample paths of the HMM. Statistics on the states are employed as the prior data for the emission parameters in the HMM. Clusters can provide initial labeling and alignment of the states to which the HMM templates can be employed with the Viterbi algorithm to consider the state order.

[0127] The HMM model. Once clustering is performed, Hidden Markov Model module

124 is configured to generate the states of the HMM model using the Gibbs sampling procedure that performs an iterative conditional sampling operation. Gibbs sampling procedure calculates an initial value 0¹ = (ui, 112) and samples 0® from the conditional distribution (0₁|0₂ = 2 ¹'⁾) and fl^from the conditional distribution for a bivariate case having a joint

distribution

0₂). The Hidden Markov Model includes a set of initial state distribution parameters, a transition matrix, and a set of emission parameters. The HMM model may be configured with an even initial state distribution across all the states.

[0128] Transition Matrix. The HMM model includes the transition matrix to represent the probability of moving from one hidden state to another. The HMM model may enforce a left-to-right transition matrix. That is, once in a state, the state can be repeated multiple times, but once the model has transitioned to the next state, it cannot return to a prior state (i. e. , moving back left within the matrix). In some embodiments, the constraint could be relaxed, e.g., if the sensors could possibly be monitoring processes that repeated a common subset of actions with no set order. The transition matrix has the form of Equation 1.

[0129] Fig. 5B shows an example of the transition matrix. In the transition matrix, row numbers represent the current state, and column numbers represent the state being transitioned too. Each element (i,j) corresponds to the probability of moving from state i to state j. In the example shown in Fig. 5B, the matrix implies a high probability of simply moving to the next state as the probability in each matrix cell is generally greater than 0.5, except for state “0,” which has ~0.5 of the time the state returns to itself.

[0130] Emission Parameters. The HMM model includes the emissions parameters to define the distribution of statistics produced by each hidden state. For the state emissions, the HMM model attempts to accurately model properties of segments extracted from the hidden states by separating probabilistic models for different alignment statistics such as “level,” “range,” “difference,” “start,” and “end” per Table 2.

[0131] The exemplary machine analysis system 100 (e.g., 100a, 100b, 100c) is configured to perform virtual metrology (VM) by collecting data from equipment sensors during a manufacturing process to predict a product quality characteristic of interest. The segmentation module 116 and feature assessment module 120 extract informative signatures from the raw data. The anomaly detector 122 then uses a VM classification or regression on a VM model to predict the quality characteristics of interest. The VM model can be determined from a subset of the features that are selected, e.g., by Genetic Algorithms [11] that consider multiple performance criteria and ease of implementation. A multi-fold cross-validation policy (e.g., 5-fold) may be employed within the Genetic Algorithms to perform feature selection.

[0132] The exemplary machine analysis system 100 (e.g., 100a, 100b, 100c) can be used to augment metrology analysis to detect defects early in between fabrication processes. During the execution of semiconductor manufacturing processes, like dry etching and deposition, certain defects may form on the wafer that can impact the quality and functionality of the final product. Because of the time-consuming nature of their operation (often taking longer than the corresponding manufacturing process), metrology -based inspection is currently performed on about 5-10% of products.

[0133] A product is considered defective if it contains more defects than a manufacturer- specified threshold dT. Genetic algorithm may be used to select a subset of the extracted features to inform, e.g., a Support Vector Machine (SVM) [12] classifier that can then assign a predicted class to each wafer. The chosen SVM input set may include 10 or fewer features, many of which correspond to transient-based features.

[0134] The segmentation operation using the Hidden Markov Model-based template 114 can be applied to any number of time-series data such as those from metrology or inspection equipment for semiconductor manufacturing and fabrication devices such as a wafer prober, imaging station, ellipsometer, CD-SEM, ion mill, C-V system, interferometer, source measure units (SME) magnetometer, optical and imaging system, profilometer, reflectometer, resistance probe, resistance high-energy electron diffraction (RHEED) system, and X-ray diffractometer, among other equipment disclosed herein.

[0135] The analysis system 102 can update the Hidden Markov Model -based based template or feature 114 in real-time by employing data from previous one or more sets of batches from the manufacturing or fabrication equipment 106.

[0136] Example Fabrication System

[0137] Fig. 6 shows an example semiconductor fabrication system 106a (shown as “Etching System / Station” 600). The system 600 can include a number of equipment 602 (shown as “Photoresist processing” 602a), “Lithography” 602b, “Etch Bath” 602c, and “Wafer Processing” 602d). Each of these equipment 602 can include individual set of sensors 104 (shown as 604) and controller 606 that generates time-series data. The equipment 602 can be instrumented with external sensors 104 (shown as 606a, 606b, and 606c) that are connected to a data acquisition system 608.

[0138] The analysis system 102 can receive time-series data from any of these equipment sensors 604 (through their controller 604) or external sensors 606 through the data acquisition system. Time series data 104 may also include metrics generated by the controller 604 or data acquisition system 608, as well as data received from the inspection system 110 or metrology system 108. The semiconductor fabrication system may include other manufacturing equipment, e.g., for semiconductor fabrication equipment or processes, such as plasma etching system, liquid solution-etching system (wet etching), plasma-enhanced chemical vapor deposition system, thin-film deposition system, molecular-beam epitaxy (MBE) system, electron beam melting (EBM) system, chemical vapor deposition (CVD) system, and roll-to-roll web coating system.

[0139] Example Hidden Markov Model Matching or Comparison

[0140] Fig. 7 shows example operation of Hidden Markov model matching or comparison. In some embodiments, the operation may be employed to update the Hidden Markov model of other fabrication systems. In other embodiments, the operation may be employed for virtual metrology.

[0141] In the example shown in Fig. 7, the analysis system 102 (shown as 102d) transfers the Hidden Markov model parameters and/or threshold, e.g., as generated in relation to Figs. 2 - 5, through a network 702, to other analysis systems 102 (shown as 102e). Indeed, while the analysis system 102d of a given semiconductor fabrication equipment (also referred to as a “tool”) determines its own Hidden Markov model and/or thresholds, similar operations may be performed by other analysis systems 102e of other tools. Each of the individual tool or groups thereof can then shares their respective Hidden Markov model parameters and/or threshold with one another, e.g., for virtual metrology, virtual modeling, or monitoring. While the example shown in Fig. 7 is shown in relation to the implementation of Fig. 1C, it is contemplated that similar operations may be employed with the implementations of Figs. 1A and IB.

[0142] Comparison Tool with Distance Measure. Once the Hidden Markov model and/or thresholds are shared between tools (e.g., 102d, 102e), the respective analysis system 102d can perform tool matching based on functionalized distances (e.g., Wasserstein distance) between HMMs, e.g., to detect outlier tools that are different from other tools with statistical significance. The tool-matching operation is employed to determine when a given tool needs service maintenance or has reached the end of its operational life. The tool matching output can indicate if two tools from a set of tools have matching operations.

[0143] In some embodiments, the transferred and the local Hidden Markov models may be evaluated using clustering operation, e.g., as described above, or via Statistical Process Control (SPC), or SPC charts, to determine those that are otherwise outside a cluster (e.g., via hypothetical testing) or outside a pre-defined standard deviation.

[0144] An example of computing a dissimilarity measure or distance between two Hidden Markov Models is provided by Chen, Yukun, Jianbo Ye, and Jia Li. "Aggregated Wasserstein Metric and State Registration for Hidden Markov Models." arXiv preprint arXiv:1711.05792 (2017), which is incorporated by reference herein.

[0145] Virtual Metrology. In some embodiments, the shared Hidden Markov model and/or thresholds may be employed in a machine-learning environment for virtual metrology. The Hidden Markov model and/or thresholds can be employed to generate inputs to train a neural network or machine learning algorithm. The trained neural network or machine learning algorithm can then be used to create output that serves as a virtual metrology measurement, e.g., film thickness from chemical vapor deposition, critical dimensions in etching (e.g., trench width, trench depth), critical dimensions in photolithography (e.g., overlay errors).

[0146] In some embodiments, once the Hidden Markov model and/or thresholds are shared between tools (e.g., 102d, 102e), the respective analysis system 102d aggregates the models of other tools. While the implementation of edge analysis of individual tools reduces the complexity of the system implementation, aggregation of analysis of data from different individual tools is not trivial.

[0147] To combine the Hidden Markov models and/or thresholds from multiple individual tools, the analysis may consider the application of each transmitted HMM and thresholds to parsing of a signal s, yielding likelihood £;(■$) of that signal’s parsing under the Hidden Markov Model and thresholds from tool i. The term TM(s) denotes the aggregated virtual metrology model evaluated for signal 5 and developed using HMM and thresholds enabling parsing of signal 5 on each of those tools i and can be determined per Equation 6.

(Eq. 6) [0148] In Equation 6, the EM₍(s) is a virtual metrology model for a tool i evaluated for signal 5 (e.g., parsed by the HMM and thresholds enabling parsing of that signal on tool /), and Tj(s) is the likelihood signal 5 when it is parsed using HMM and/or threshold i from tool i. Each of these Virtual Metrology models can be realized using various AI/ML tools, such as artificial neural networks, support vector machine regression, Lasso regression, and others.

[0149] Machine Learning. In addition to the machine learning features described above, the various analysis system can be implemented using one or more artificial intelligence and machine learning operations. The term “artificial intelligence” can include any technique that enables one or more computing devices or computing systems (i.e., a machine) to mimic human intelligence. Artificial intelligence (Al) includes but is not limited to knowledge bases, machine learning, representation learning, and deep learning. The term “machine learning” is defined herein to be a subset of Al that enables a machine to acquire knowledge by extracting patterns from raw data. Machine learning techniques include, but are not limited to, logistic regression, support vector machines (SVMs), decision trees, Naive Bayes classifiers, and artificial neural networks. The term “representation learning” is defined herein to be a subset of machine learning that enables a machine to automatically discover representations needed for feature detection, prediction, or classification from raw data. Representation learning techniques include, but are not limited to, autoencoders and embeddings. The term “deep learning” is defined herein to be a subset of machine learning that enables a machine to automatically discover representations needed for feature detection, prediction, classification, etc., using layers of processing. Deep learning techniques include but are not limited to artificial neural networks or multilayer perceptron (MLP).

[0150] Machine learning models include supervised, semi-supervised, and unsupervised learning models. In a supervised learning model, the model learns a function that maps an input (also known as feature or features) to an output (also known as target) during training with a labeled data set (or dataset). In an unsupervised learning model, the algorithm discovers patterns among data. In a semi-supervised model, the model leams a function that maps an input (also known as a feature or features) to an output (also known as a target) during training with both labeled and unlabeled data.

[0151] Neural Networks. An artificial neural network (ANN) is a computing system including a plurality of interconnected neurons (e.g., also referred to as “nodes”). This disclosure contemplates that the nodes can be implemented using a computing device (e.g., a processing unit and memory as described herein). The nodes can be arranged in a plurality of layers, such as an input layer, an output layer, and optionally one or more hidden layers with different activation functions. An ANN having hidden layers can be referred to as a deep neural network or multilayer perceptron (MLP). Each node is connected to one or more other nodes in the ANN. For example, each layer is made of a plurality of nodes, where each node is connected to all nodes in the previous layer. The nodes in a given layer are not interconnected with one another, i.e., the nodes in a given layer function independently of one another. As used herein, nodes in the input layer receive data from outside of the ANN, nodes in the hidden layer(s) modify the data between the input and output layers, and nodes in the output layer provide the results. Each node is configured to receive an input, implement an activation function (e.g., binary step, linear, sigmoid, tanh, or rectified linear unit (ReLU), and provide an output in accordance with the activation function. Additionally, each node is associated with a respective weight. ANNs are trained with a dataset to maximize or minimize an objective function. In some implementations, the objective function is a cost function, which is a measure of the ANN’S performance (e.g., error such as LI or L2 loss) during training, and the training algorithm tunes the node weights and/or bias to minimize the cost function. This disclosure contemplates that any algorithm that finds the maximum or minimum of the objective function can be used for training the ANN. Training algorithms for ANNs include but are not limited to backpropagation. It should be understood that an ANN is provided only as an example machine learning model. This disclosure contemplates that the machine learning model can be any supervised learning model, semisupervised learning model, or unsupervised learning model. Optionally, the machine learning model is a deep learning model. Machine learning models are known in the art and are therefore not described in further detail herein.

[0152] A convolutional neural network (CNN) is a type of deep neural network that has been applied, for example, to image analysis applications. Unlike traditional neural networks, each layer in a CNN has a plurality of nodes arranged in three dimensions (width, height, and depth). CNNs can include different types of layers, e.g., convolutional, pooling, and fully- connected (also referred to herein as “dense”) layers. A convolutional layer includes a set of filters and performs the bulk of the computations. A pooling layer is optionally inserted between convolutional layers to reduce the computational power and/or control overfitting (e.g., by downsampling). A fully-connected layer includes neurons, where each neuron is connected to all of the neurons in the previous layer. The layers are stacked similarly to traditional neural networks. GCNNs are CNNs that have been adapted to work on structured datasets such as graphs.

[0153] Other Supervised Learning Models. A logistic regression (LR) classifier is a supervised classification model that uses the logistic function to predict the probability of a target, which can be used for classification. LR classifiers are trained with a data set (also referred to herein as a “dataset”) to maximize or minimize an objective function, for example, a measure of the LR classifier’s performance (e.g., an error such as LI or L2 loss), during training. This disclosure contemplates that any algorithm that finds the minimum of the cost function can be used. LR classifiers are known in the art and are therefore not described in further detail herein.

[0154] A Naive Bayes’ (NB) classifier is a supervised classification model that is based on Bayes’ Theorem, which assumes independence among features (i.e., the presence of one feature in a class is unrelated to the presence of any other features). NB classifiers are trained with a data set by computing the conditional probability distribution of each feature given a label and applying Bayes’ Theorem to compute the conditional probability distribution of a label given an observation. NB classifiers are known in the art and are therefore not described in further detail herein.

[0155] A k-NN classifier is an unsupervised classification model that classifies new data points based on similarity measures (e.g., distance functions). The k-NN classifiers are trained with a data set (also referred to herein as a “dataset”) to maximize or minimize a measure of the k-NN classifier’s performance during training. This disclosure contemplates any algorithm that finds the maximum or minimum. The k-NN classifiers are known in the art and are therefore not described in further detail herein.

[0156] Experimental Results and Examples

[0157] A study was conducted to evaluate the exemplary method. In one experiment, the method was shown to be able to evaluate and perform corrections on 60 days of data collected at a 1Hz sampling rate (over 600GB of data). The dataset could be evaluated at a rate of about 1 data equivalent day per hour. Process control methods were applied to a 30-day subsection of the data.

[0158] The 60-day data set was evaluated according to the exemplary segmentation error correction and was observed to have 197 excursions (182 in the steady state issues). The data set includes 927 different combinations of sensor and recipe types, covering different forms of measurement of gas flow, pressure, angle, and temperature. No adjustments to parameters or methodology were made manually for this dataset.

[0159] A second analysis was conducted that ran a Plasma Enhanced Chemical Vapor Deposition (PECVD) dataset. This dataset included 116,000 wafers from 4 recipes, with 12 sensors recorded at lOhz for each wafer. Compression rates of the analysis were observed on average at 5 to 1. The analysis was performed using a computer with an i7-6700 @2.60GHz 4-core processor and was completed in 5 hours.

[0160] Discussion

[0161] The ability to exploit data-driven process control and decision-making frameworks is rapidly becoming critical to success in semiconductor manufacturing. At the same time, advances in manufacturing equipment sensors have seen dramatic increases in sampling rates in recent years, which has led to the ability to capture transient effects in signals with higher fidelity than previously possible. It is known that data-driven process control and decision-making methodologies rely on the process of extraction of useful information from raw data signals. To that end, the current manuscript presents a novel methodology for the extraction of information from data in the form of a feature set that faithfully and reliably depicts both the transient and stationary portions of the signals. The solution proposed is an automated dynamics- inspired approach that looks to segment a signal into steady-state and transient components before summarizing each segment into a set of relevant signatures. The steady-state segments are summarized through a set of statistics, and each transient is reduced to a set of parameters relating to the underlying system dynamics, such as settling time, rise time, overshoots, etc. The impactful novel information content of the resulting dynamics-inspired feature set is evaluated by application to chamber matching, product defect level prediction, and product quality characteristic prediction in etching and deposition processes executed in various tools across several modem 300mm fabs.

[0162] Semiconductor manufacturing involves a large number of complex processes being executed sequentially on a workpiece to create components critical to the functioning of electronic devices. The evolution in quality and functionality of these devices has been driven by rapid improvements in semiconductor technology, with ever-smaller component dimensions being demanded and produced. This progression in semiconductor technology has been enabled by extremely tight control and consistent execution of the underlying manufacturing processes. As the dimensions and allowable tolerances on semiconductor components become restrictively small, the ability to reliably and profitably manufacture them is a growing challenge. This requires that manufacturing equipment must function in near-perfect conditions, changes in behavior must be recognized and localized in a timely manner, variations in product quality must be caught as they are generated, and advanced process control must be realized. Consequently, recognition and handling of phenomena that cannot be observed by simple human intervention due to the scale and volume of the data, as well as the need to handle data in high dimensions, has become an urgent need of this industry.

[0163] Naturally, there have also been developments in manufacturing and metrology equipment technologies employed in this industry. While the equipment has long been equipped with a large array of sensors 1, these sensors have traditionally collected data at relatively low sampling rates, 1 Hz or lower. A recognition of the need for data-driven decisions and control has seen these sampling rates rise in recent years, now commonly residing between 3-10 Hz [1] [2] . These densely sampled signals are able to capture short-lived effects with higher fidelity, as shown in Figure 1.

[0164] At the same time, data-driven methods that facilitate process control and decisionmaking without the need for expert knowledge or deep physics-based understanding of the underlying processes have also been a focus of much work [3], These methods are inherently founded on the informational content available from the collected data. The criticality of feature extraction and analysis is highlighted in [15], However, the process of extraction of useful information from the raw data has not progressed at the same rate. Some important aspects of data mining for manufacturing environments are discussed in [16], However, the time-series analysis methods mentioned here are primarily useful for predictive purposes related to a sensor itself but do not necessarily contain the information required for modeling process performance. On the other hand, the classification methodologies discussed in [16] are leveraged in this work. In [17], Cheng et. al discusses the increasingly important role big data analytics are playing in production environments. The work presented in this manuscript contributes to the domains of quality improvement, defect analysis, and fault diagnosis.

[0165] In fact, in the realm of semiconductor manufacturing, the process of extracting informative signatures from raw data remains almost identical to that employed while the sampling rates were much lower. These traditional methods focus on the statistical characteristics of the signals, such as mean values, standard deviations, peak-to-peak values, and occasionally even higher-order statistics, such as skewness, kurtosis, and entropy. In practice, the statistics are determined either for the entire signal, or for portions of the signal that are specified by user-defined windows, which usually require expert knowledge or large amounts of manual investigation of historical data. Inevitably, these characteristics are limited to steady-state portions of the signals and are not equipped to access the novel information content in densely sampled signals, where transient phenomena are more faithfully depicted than what could be observed at lower sampling rates. Consequently, studies have shown that even the traditional information extracted from the data is leveraged to a very limited extent, with 2%-5% utilization [4], This leads to poor utilization of the data, with vast amounts of raw data stored temporarily in data lakes, or similar environments, before being discarded in order to free storage resources for newer data. Let us note that when it comes to densely sampled signals, the frequently utilized frequency [5] [6] and time-frequency analysis-based methods [7] offer great value in handling data from rotating machinery, where some underlying harmonic phenomena exist. However, they are inadequate for the analysis of many signals in semiconductor manufacturing applications, which are often driven by phenomena that are nonharmonic and non-cyclic in nature. It is evident that there is a need for an automated and repeatable method for the extraction of information from the raw data collected during semiconductor manufacturing processes and that this method must be capable of capturing the characteristics of not only the steady-state portions of the signal but also transient phenomena.

[0166] The current manuscript proposes such a methodology, utilizing an approach for the automatic segmentation of signals into steady-state and transient portions before summarizing each of these segments into a set of informative features. The steady-state segments are represented by traditional statistics-inspired characteristics, such as mean, standard deviation, peak-to-peak values, and maximum or minimum values.

[0167] On the other hand, the transient phenomena are summarized using a set of characteristics that depict the underlying system dynamics, as stipulated by the IEEE standards [8], These characteristics include, for instance, the settling time, rise time, and overshoots. This set of signatures then represents what shall be referred to as a “dynamics-inspired” feature set2, as it incorporates manifestations of the underlying dynamics of the system and process. This solution overcomes some of the major limitations associated with the currently available technologies, enabling access to information about the underlying system’s dynamics characteristics, avoiding the need for manually specified portions of the signal for analysis, and enabling detection and monitoring of unprecedented phenomena.

[0168] Naturally, the importance of this contribution must be highlighted by its ability to inform critical decisions in semiconductor manufacturing fabs. We shall establish the usefulness of this information extraction tool by application to inform various important decision-making tasks faced in modem semiconductor fabrication facilities. The data used for these tests is acquired in operation at multiple leading 300 mm fabs.

[0169] The remainder of this paper is organized as follows. Section 2 presents the methodology for signal parsing and construction of the features from the signals, while Section 3 presents the results of utilizing the newly available sensory signatures for chamber matching, prediction of product defect levels, and virtual metrology for characteristic quality prediction. Finally, Section 4 discusses the implications of this work and also mentions potential avenues for future work.

[0170] Etching background (from prior art manuscript). Due to the fast advancement of technology nowadays, electronic devices play an increasingly important role in both daily life and industrial manufacturing. No matter if it is consumer electronic products, such as personal computers and smartphones, or electronic devices for industry, such as integrated circuits and medical equipment, all modem electronic devices contain semiconductors. During the mid-20th century, semiconductor device fabrication was introduced, and integrated circuits were thus manufactured in mass. Semiconductors are manufactured with a highly complex process, which involves 250-500 steps during the wafer fabrication process. A very important step in semiconductor device fabrication is the etching process, during which layers from the surface of wafers are chemically removed, and then the wafers would be ready to be modified and further processed to define circuit elements. [0171] The two fundamental types of etching processes are categorized as liquid-phase (“wet”) and plasma-phase (“dry”), and each type has different variations. Between these two etching types, plasma etching is the main focus of this thesis due to its wide application in the industry.

[0172] The dry etching process is the process in which plasma removes the masked pattern on the surface of semiconductor wafers in a vacuum chamber. Dry etching is most commonly used for semiconductors that are difficult to be wet-etched and has the advantage of low chemical material consumption and high etching speed. Usually, the dry etching hardware includes a gas delivery system, a waveform generator, and an exhaust system besides the main chamber. During the dry etching process, there is always an accumulation of byproducts on the parts or side walls of the chamber. As the byproducts accumulate during the etching process, they might drop on the wafer and cause damage to the wafer. This situation is one of the reasons for the change in the data. Other cases, such as changes in upstream processes and data drift, could also cause changes in data [1], The degradation of this process is unobservable and extremely difficult to monitor.

[0173] Advanced process control (APC) was then developed in the 1990s, and it was a key component in improving the maintenance of semiconductor fabrication [2], There are two parts contained in APC: run-to-run control (R2R) and fault detection and classification (FDC). While the disclosure discussed the exemplary anomaly detection from an FDC implementation (to detect faulty data and classify faulty data from healthy data during wafer processing), R2R implementations may be used as well.

[0174] In recent years, there has been increasing interest in developing and improving technologies to detect faulty products, evaluate mechanical degradation, and predict the future failure of machines. With the development of modem manufacturing and the introduction of information systems, manufacturing has become more and more automated. It is expected that the machine degradation and future failure prediction could be made automatically by adding sensors and information systems into the manufacturing process to monitor the machine's health condition and make decisions accordingly. To achieve this, fault detection is the key to machine health assessment. Thus, “Industry 4.0” was proposed, which introduced Prognostics and Health Management (PHM) techniques. Instead of traditional fail-and-fix type maintenance, PHM aims to bring advanced health monitoring models to evaluate and manage machine asset’s health and provide state-of-art predict-and-prevent type maintenance to modem manufacturing. To ensure product quality, conventional information systems monitor the machine's performance by reading values from add-on sensors and setting up thresholds based on expertise and experience. Later, machine health status could be inferred based on the references mentioned above; however, due to varied experiences, biased inferences, and multiple parameters, this information system, and related decisions may fail to point out the machine defects or predict the machine failure. On the other hand, PHM could be able to detect sudden malfunctions and hidden degradation. Then the intelligent model could perform assets health evaluation and management and predict remaining useful life (RUL) to make maintenance decisions and optimize the spare part inventory.

[0175] In PHM techniques, fault detection is the precondition for faulty diagnosis and RUL prediction. Fault detection is mainly about inspecting the machine's health status and detecting faulty conditions of manufacturing processes or products. There are two fundamental types of modeling in fault detection: the physical-based model and the data-driven model. However, the former model requires complex mathematical models, is time-consuming, and results in higher costs than the latter model. This results in a trend of adopting the data-driven model or a combined model in the development of fault detection methods. Thus, this thesis focuses on the pure data-driven models and different types of features used in each model to detect faulty condition during the semiconductor etching process.

[0176] Although Prognostics and Health Management (PHM) provides different methods to perform fault detection by building pure data-driven models that would help save maintenance costs, there are three major issues remaining. One issue is data quality. Quality is a characteristic of data, and it also indicates that data need to fulfill certain requirements. Several indicators are used to evaluate data quality, including completeness, conformity, consistency, accuracy, and uniqueness. In the data-driven method, data is the core of the whole method. To identify faulty behavior from normal behavior, the decision is usually made based on the data itself. The vector or data set represented by “abnormal” data could be seen as faulty or as an anomaly and thus should be distinguished. However, if the data quality is not good (such as mislabeled data or missing values), there might be problems in identifying the faulty behavior of a machine. Thus, certain data pre-processing steps are required to be followed to improve data quality. In case of missing values or data with different timestamps, data synchronization is the first step to take. Secondly, regime separation is a significant method for separating data into different working regimes, which makes it easier for the next step of feature extraction. Working regimes could be separated by categorizing data into different groups of different working parameters. For example, for data extracted from a rotary machine element, the data could be separated by the rotation speed. The third step is data cleaning to delete outliers and remove the incorrect measurements, which requires expert knowledge and known distribution of data samples. The last step is data partitioning, which partitions data into training data sets, validation data sets, and testing data sets.

[0177] After the data pre-processing steps mentioned above, data quality could be improved. However, there are still remaining challenges, such as training data insufficiency and large background noise due to the data acquisition system. Therefore, in order to further improve data quality, feature extraction is another important step to take before starting to build a model. Features are compact information extracted from the raw data, and they could be either context or numerical. They usually have two characteristics that are relevant to the problem and uncorrelated to each other. Thus, feature extraction reduces data dimensionality and data volume. The purpose of extracting features is that it is more efficient to use features than raw data in modeling. Furthermore, there are three basic types of features: time-domain features, frequencydomain features, and time-frequency domain features. Among all three types of features, timedomain features are most commonly used because they could represent the distribution of raw data and thus are extremely useful for building models. In the case of fault detection of the semiconductor etching process, time-domain features are also preferred due to their ease of extraction.

[0178] Another issue is model robustness. Model is a key factor that affects overall fault detection effectiveness, and a suitable model is essential to the problem as well. Since the semiconductor etching process is an unobservable process, fault detection is extremely hard to perform since there is no explicit definition of a “faulty” state, and there are a great many anomaly data points. Furthermore, the performance of inappropriate models could be inconsistent or unstable, encountering this unobservable process. To select an effective model, not only basic knowledge but also a deep understanding of the characteristics of the model is required.

[0179] In a word, these three issues about data quality, feature quality, and model robustness are consistent challenges in the design of every system. Other issues such as insufficient data quantity and complex wafer processing procedures could not be ignored as well. [0180] Example Computing System [0181] It should be appreciated that the logical operations described above can be implemented (1) as a sequence of computer-implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as state operations, acts, or modules. These operations, acts, and/or modules can be implemented in software, in firmware, in special purpose digital logic, in hardware, and any combination thereof. It should also be appreciated that more or fewer operations can be performed than shown in the figures and described herein. These operations can also be performed in a different order than those described herein.

[0182] The computer system is capable of executing the software components described herein for the exemplary method or systems. In an embodiment, the computing device may comprise two or more computers in communication with each other that collaborate to perform a task. For example, but not by way of limitation, an application may be partitioned in such a way as to permit concurrent and/or parallel processing of the instructions of the application.

Alternatively, the data processed by the application may be partitioned in such a way as to permit concurrent and/or parallel processing of different portions of a data set by the two or more computers. In an embodiment, virtualization software may be employed by the computing device to provide the functionality of a number of servers that are not directly bound to the number of computers in the computing device. For example, virtualization software may provide twenty virtual servers on four physical computers. In an embodiment, the functionality disclosed above may be provided by executing the application and/or applications in a cloud computing environment. Cloud computing may comprise providing computing services via a network connection using dynamically scalable computing resources. Cloud computing may be supported, at least in part, by virtualization software. A cloud computing environment may be established by an enterprise and/or can be hired on an as-needed basis from a third-party provider. Some cloud computing environments may comprise cloud computing resources owned and operated by the enterprise as well as cloud computing resources hired and/or leased from a third-party provider.

[0183] In its most basic configuration, a computing device includes at least one processing unit and system memory. Depending on the exact configuration and type of computing device, system memory may be volatile (such as random-access memory (RAM)), non-volatile (such as read-only memory (ROM), flash memory, etc.), or some combination of the two.

[0184] The processing unit may be a standard programmable processor that performs arithmetic and logic operations necessary for the operation of the computing device. While only one processing unit is shown, multiple processors may be present. As used herein, processing unit and processor refers to a physical hardware device that executes encoded instructions for performing functions on inputs and creating outputs, including, for example, but not limited to, microprocessors (MCUs), microcontrollers, graphical processing units (GPUs), and applicationspecific circuits (ASICs). Thus, while instructions may be discussed as executed by a processor, the instructions may be executed simultaneously, serially, or otherwise executed by one or multiple processors. The computing device may also include a bus or other communication mechanism for communicating information among various components of the computing device. [0185] Computing devices may have additional features/functionality. For example, the computing device may include additional storage such as removable storage and non-removable storage, including, but not limited to, magnetic or optical disks or tapes. Computing devices may also contain network connection(s) that allow the device to communicate with other devices, such as over the communication pathways described herein. The network connection(s) may take the form of modems, modem banks, Ethernet cards, universal serial bus (USB) interface cards, serial interfaces, token ring cards, fiber distributed data interface (FDDI) cards, wireless local area network (WLAN) cards, radio transceiver cards such as code division multiple access (CDMA), global system for mobile communications (GSM), long-term evolution (LTE), worldwide interoperability for micro wave access (WiMAX), and/or other air interface protocol radio transceiver cards, and other well-known network devices. Computing devices may also have input device(s) such as keyboards, keypads, switches, dials, mice, trackballs, touch screens, voice recognizers, card readers, paper tape readers, or other well-known input devices. Output device(s) such as printers, video monitors, liquid crystal displays (LCDs), touch screen displays, displays, speakers, etc., may also be included. The additional devices may be connected to the bus in order to facilitate the communication of data among the components of the computing device. All these devices are well-known in the art and need not be discussed at length here. [0186] The processing unit may be configured to execute program code encoded in tangible, computer-readable media. Tangible, computer-readable media refers to any media that is capable of providing data that causes the computing device (i.e., a machine) to operate in a particular fashion. Various computer-readable media may be utilized to provide instructions to the processing unit for execution. Example tangible, computer-readable media may include but is not limited to volatile media, non-volatile media, removable media, and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. System memory 230, removable storage, and non-removable storage are all examples of tangible computer storage media. Examples of tangible, computer-readable recording media include, but are not limited to, an integrated circuit (e.g., field-programmable gate array or application-specific IC), a hard disk, an optical disk, a magneto-optical disk, a floppy disk, a magnetic tape, a holographic storage medium, a solid-state device, RAM, ROM, electrically erasable program read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices.

[0187] In light of the above, it should be appreciated that many types of physical transformations take place in the computer architecture in order to store and execute the software components presented herein. It also should be appreciated that the computer architecture may include other types of computing devices, including hand-held computers, embedded computer systems, personal digital assistants, and other types of computing devices known to those skilled in the art.

[0188] In an example implementation, the processing unit may execute program code stored in the system memory. For example, the bus may carry data to the system memory 230, from which the processing unit receives and executes instructions. The data received by the system memory may optionally be stored on the removable storage or the non-removable storage before or after execution by the processing unit.

[0189] It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination thereof. Thus, the methods and apparatuses of the presently disclosed subject matter, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computing device, the machine becomes an apparatus for practicing the presently disclosed subject matter. In the case of program code execution on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs may implement or utilize the processes described in connection with the presently disclosed subject matter, e.g., through the use of an application programming interface (API), reusable controls, or the like. Such programs may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language if desired. In any case, the language may be a compiled or interpreted language, and it may be combined with hardware implementations.

[0190] Although example embodiments of the present disclosure are explained in some instances in detail herein, it is to be understood that other embodiments are contemplated. Accordingly, it is not intended that the present disclosure be limited in its scope to the details of construction and arrangement of components set forth in the following description or illustrated in the drawings. The present disclosure is capable of other embodiments and of being practiced or carried out in various ways.

[0191] It must also be noted that, as used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Ranges may be expressed herein as from “about” or “5 approximately” one particular value and/or to “about” or “approximately” another particular value. When such a range is expressed, other exemplary embodiments include one particular value and/or the other particular value.

[0192] By “comprising” or “containing” or “including,” it meant that at least the name compound, element, particle, or method step is present in the composition or article or method, but does not exclude the presence of other compounds, materials, particles, method steps, even if the other such compounds, material, particles, method steps have the same function as what is named.

[0193] In describing example embodiments, terminology will be resorted to for the sake of clarity. It is intended that each term contemplates its broadest meaning as understood by those skilled in the art and includes all technical equivalents that operate in a similar manner to accomplish a similar purpose. It is also to be understood that the mention of one or more steps of a method does not preclude the presence of additional method steps or intervening method steps between those steps expressly identified. Steps of a method may be performed in a different order than those described herein without departing from the scope of the present disclosure. Similarly, it is also to be understood that the mention of one or more components in a device or system does not preclude the presence of additional components or intervening components between those components expressly identified.

[0194] The term “about,” as used herein, means approximately, in the region of, roughly, or around. When the term “about” is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term “about” is used herein to modify a numerical value above and below the stated value by a variance of 10%. In one aspect, the term “about” means plus or minus 10% of the numerical value of the number with which it is being used. Therefore, about 50% means in the range of 45%-55%. Numerical ranges recited herein by endpoints include all numbers and fractions subsumed within that range (e.g., 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.90, 4, 4.24, and 5). [0195] Similarly, numerical ranges recited herein by endpoints include subranges subsumed within that range (e.g., 1 to 5 includes 1-1.5, 1.5-2, 2-2.75, 2.75-3, 3-3.90, 3.90-4, 4- 4.24, 4.24-5, 2-5, 3-5, 1-4, and 2-4). It is also to be understood that all numbers and fractions thereof are presumed to be modified by the term “about.”

[0196] The following patents, applications, and publications, as listed below and throughout this document, are hereby incorporated by reference in their entirety herein.

[1 ’] Haq, Asad Arsalan Ul, and Dragan Djurdjanovic. "Dynamics-inspired feature extraction in semiconductor manufacturing processes." Journal of Industrial Information Integration 13 (2019): 22-31.

[2’] R. Dailey and D. Djurdjanovic,” Software for signal segmentation and extraction of informative time-domain features,” software disclosure at The University of Texas at Austin, disclosure identification number 7751 DJU, 2021.

[3’] P. Kosir, R. DeWall, R. Mitchell, Feature alignment for pattern recognition, Proc, of the IEEE 1994 National Aerospace and Electronics Conference, 1994, pp. 128-132.

[4/] Tian, Runfeng. "An Enhanced Approach using Time Series Segmentation for Fault Detection of Semiconductor Manufacturing Process." PhD diss., University of Cincinnati, 2019. Second Set of References

[1] J. Dietz and F. Ducrot, “New Vita controller demonstrates performance on 1 million wafers,” 2014, [Online], Available: http://www.appliedmaterials.com/nanochip/nanochip-fab- solutions/December-2014/new-vita-controller-demonstrates-performance-on-l-million-wafers

[2] Lam Research Corporation, “enabling Chipmakers to Create the Future,” [Online], Available:http://www.lamresearch.com/products/products-overview

[3] J. Moyne and J. Iskander, “Big data analytics for smart manufacturing: case studies in semiconductor manufacturing,” Processes, vol. 5, no. 39, 2017. doi:10.3390/pr5030039.

[4] Tom DiChristopher, “Oil firms are swimming in data they don’t use,” 2015, [Online], Available: http:// www.cnbc.com/2015/03/05/us-energy-industry-collects-a-lot-of-operational- data-but-doesntuse-it.html [5] P. Arun, S.A. Lincon, and N. Prabhakaran, “Detection and Characterization of Bearing Faults from the Frequency Domain Features of Vibration,” IETE Journal of Research, 2017. DOI: 10.1080/03772063.2017.1369369.

[6] I. Romero and L. Serrano, “ECG frequency domain features extraction: a new characteristic for arrhythmias classification,” in Proc, of the 23rd Annual Int. Conf, of the IEEE Engineering in Medicine and Biology Society, vol.22001, pp. 2006-2008. doi:

10.1109/IEMBS.2001.1020624.

[7] L. Cohen, Time-frequency Analysis. Upper Saddle River, NJ: Prentice Hall, 1994.

[8] IEEE Std 181TM - 2011, “IEEE Standard for Transitions, Pulses, and Related Waveforms IEEE Instrumentation and Measurement Society,” New York, 2011.

[9] R. Rakotomalala and F. Mhamdi, “Using the text categorization framework for protein classification,” in Handbook of Research of Text and Web Mining Technologies, 2009, ch. 8, pp. 128-141.

[10] P. Kosir, R. DeWall, and R. Mitchell, “Feature alignment for pattern recognition,” in Proc, of the IEEE 1994 National Aerospace and Electronics Conference, 1994, pp. 128-132.

[11] J. Yang and V. Honavar, “Feature subset selection using a genetic algorithm,” IEEE Intelligent Systems, vol. 13, no. 2, 1998, pp. 44-49.

[12] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and other Kemelbased Learning Methods. Cambridge, UK: Cambridge University Press, 2000.

[13] A. Bleakie and D. Djurdjanovic, “Growing Structure Multiple Model System for quality estimation in manufacturing processes,” IEEE Transactions on Semiconductor Manufacturing, vol. 29, no. 2, 2016, pp. 79-97.

[14] B. Lu, J. Stuber, and T. F. Edgar, “Data-driven adaptive multiple model system utilizing growing self-organizing maps,” Journal of Process Control, 2017. [Online] Available: http://dx.doi.Org/10.1016/j.jprocont.2017.06.006

[15] H. X. Li and L. D. Xu, “Feature Space Theory-A Mathematical Foundation for Data Mining,” Knowledge-Based Systems, 14, 253-257, 2001. http://doi.org/10.1016/S0950- 7051(01)00103-4.

[16] M. S. Packianather, A. Davies, S. Harraden, S. Soman, and J. White, “Data mining techniques applied to a manufacturing SME,” in 10th CIRP Conference on Intelligent Computation in Manufacturing Engineering, 2016.

[17] Y. Chen, K. Chen, H. Sun, Y. Zhang, and F. Tao, “Data and knowledge mining with big data towards smart production,” Journal of Industrial Informatics, no. 9, 2018, pp. 1-13. doi:

[18] Chen, Yukun, Jianbo Ye, and Jia Li. "Aggregated Wasserstein Metric and State Registration for Hidden Markov Models." arXiv preprint arXiv:1711.05792 (2017).

Claims

What is claimed is:

1. A method to detect an anomaly or other analysis in a fabrication process for semiconductor devices, the method comprising:

(a) generating a template hidden Markov model to align a first time-series data collected from a sensor and associated with the fabrication process of a fabricated semiconductor device by:

(i) retrieving, by a processor, a plurality of training sensor data sets associated with a plurality of fabricated semiconductor devices or associated processes, wherein each of the plurality of training sensor data set comprises a training time-series data that is associated with a fabricated semiconductor device of the plurality of fabricated semiconductor devices;

(ii) segmenting, by the processor, the each of the training time-series data to generate a plurality of segment data for the plurality of sensor data;

(iii) performing, by the processor, a hidden Markov model analysis of the plurality of segment data to generate a template hidden Markov model that describes the hidden states of the plurality of fabricated semiconductor devices; and

(iv) generating, by the processor, an ordered sequence of states of the plurality of segment data using parameters of the template hidden Markov model;

(b) retrieving, by the processor, the first time-series data associated with the fabrication process of the fabricated semiconductor device; and

(c) aligning, by the processor, the first time-series data to a second time-series data associated with the same fabrication process using the generated ordered sequence of states, wherein the first time-series data is compared to the second time-series data to determine an analytical output in the fabrication process for the fabricated semiconductor device.

2. The method of claim 1, further comprising: comparing, by the processor, the first time-series data to the second time-series data to determine the anomaly in the fabrication process for the fabricated semiconductor device.

3. The method of claim 1 or 2, wherein the first time-series data is acquired from a first semiconductor device, wherein the second time-series data is acquired from a second semiconductor device, wherein the first semiconductor device and the second semiconductor device are in the same fabrication batch of fabricated semiconductor devices, wherein a batch is subjected to a same or similar process of fabrication for a given device pattern on a wafer.

46

4. The method of claim 1 or 2, wherein the first time-series data is acquired from a first semiconductor device, wherein the second time-series data is acquired from a second semiconductor device, wherein the first semiconductor device and the second semiconductor device are in different fabrication batches of fabricated semiconductor devices, wherein a batch is subjected to a same or similar process of fabrication for a given device pattern on a wafer.

5. The method of any one of claims 1-4, wherein the step of aligning is performed using a Viterbi algorithm or a max-sum algorithm.

6. The method of any one of claims 1-5, wherein the step of segmenting the first time-series data to generate a plurality of segment data comprises: segmenting the first time-series data into a plurality of steady-state segments by determining, using a moving window of a predetermined size, along with the first time-series data, a set of regions of the first time-series data having values within a pre-defined threshold profile; and segmenting the first time-series data into a plurality of transient state segments by labeling regions outside the plurality of steady-state segments as a plurality of transient state segments.

7. The method of any one of claims 1-6, wherein the sensor that collected the first timeseries data is a part of manufacturing equipment of the fabricated semiconductor device, wherein the manufacturing equipment is selected from the group consisting of a plasma etching system, a liquid solution-etching system, a plasma-enhanced chemical vapor deposition system, a thin-film deposition system, a molecular-beam epitaxy (MBE) system, an electron beam melting (EBM) system, a chemical vapor deposition (CVD) system, and a roll-to-roll web coating system.

8. The method of any one of claims 1-6, wherein the sensor that collected the first timeseries data is a metrology or inspection equipment selected from the group consisting of: a wafer prober, imaging station, ellipsometer, CD-SEM, ion mill, C-V system, interferometer, source measure units (SME) magnetometer, optical and imaging system, profilometer, reflectometer, resistance probe, resistance high-energy electron diffraction (RHEED) system, and X-ray diffractometer.

47

9. The method of any one of claims 1-6, wherein the first time-series data is retrieved from a controller of a manufacturing equipment of the fabricated semiconductor device, wherein the controller of the manufacturing equipment is operatively connected to the sensor.

10. The method of any one of claims 1-9, wherein the first time-series data comprises observed measurements of a metrology signal associated with a device pattern on a wafer.

11. The method of any one of claims 1-9, wherein the first time-series data comprises observed measurements of a power signal, a pressure signal, a temperature signal, a volume signal, a flow rate signal, a voltage signal, an optical signal, any of which is associated with a fabrication process.

12. The method of any one of claims 1-11, wherein the first time-series data is compared to the second time-series data to determine accurate tool matching between a first fabrication equipment and a second fabrication equipment employed in a same fabrication process.

13. The method of any one of claims 1-12, wherein the first time-series data is compared to the second time-series data to generate an indication of a quality of a fabrication process or an associated fabrication equipment.

14. The method of any one of claims 1-13, wherein for a given wafer k, each sensor i collects a signal 6_k of length p_k ^l .

15. The method of any one of claims 1-14 further comprising: retrieving, by the processor, a set of second time-series data associated with the fabrication process of the fabricated semiconductor device; and aligning, by the processor, the set of second time-series data to a set of third time-series data associated with the same fabrication process based on the hidden Markov model analysis, wherein the set of second time-series data comprises more than 50 sensors sampled at 1 Hz, 5 Hz, 10 Hz, or at a sampling rate in between.

48

16. The method of any one of claims 1-15, wherein the step of generating the template hidden Markov model comprises: segmenting, by the processor, the time-series data to generate the plurality of segment data and determining alignment statistics of the plurality of segment data; clustering the plurality of segments based on alignment statistics; and determining a transition matrix and an emission parameter matrix based on the clustering.

17. The method of any one of claims 1-16, wherein the steps of generating the template hidden Markov model is performed for over 100 sensor readings for a given fabrication process, wherein the operation is performed near real-time in between batch processing.

18. The method of claim 17 further comprising: generating an alert when an anomaly in the given fabrication process is detected.

19. The method of any one of claims 1-18, wherein the method is performed at a remote analysis system for a plurality of semiconductor fabrication equipment.

20. The method of any one of claims 1-18, wherein the method is performed at an analysis system for a semiconductor fabrication equipment.

21. The method of any one of claims 1-20, wherein the analysis system is a part of the semiconductor fabrication equipment.

22. The method of any one of claims 1-20, wherein the analysis system is a part of a controller of a semiconductor fabrication equipment.

23. The method of any one of claims 1-22 further comprising: transmitting the template hidden Markov model of a first semiconductor fabrication equipment to a second semiconductor fabrication equipment configured to generate a second template hidden Markov model, wherein the template hidden Markov model of the first semiconductor fabrication equipment and the second template hidden Markov model are combined at the second semiconductor fabrication equipment for a tool matching operation or virtual metrology operation performed at the second semiconductor fabrication equipment.

24. The method of any one of claims 1 -23 further comprising: transmitting the template hidden Markov model of a first semiconductor fabrication equipment to an analysis system, wherein the analysis system is configured to the template hidden Markov model of the first semiconductor fabrication equipment and the template hidden Markov model of other semiconductor fabrication equipment to determine an anomaly in a fabrication process of first semiconductor fabrication equipment.

25. A metrology system comprising: a processing unit configured by computer-readable instructions to detect an anomaly in a fabrication process for semiconductor devices by:

(b) retrieving the first time-series data associated with the fabrication process of the fabricated semiconductor device;

(c) aligning the first time-series data to a second time-series data associated with the same fabrication process using the generated ordered sequence of states; and

(d) comparing the first time-series data to the second time-series data to determine the anomaly in the fabrication process for the fabricated semiconductor device.

26. The system of claim 25, wherein the instructions to generate the template hidden Markov model comprises:

(i) instructions to retrieve a plurality of training sensor data sets associated with a plurality of fabricated semiconductor devices, wherein each of the plurality of training sensor data set comprises a training time-series data that is associated with a fabricated semiconductor device of the plurality of fabricated semiconductor devices;

(ii) instructions to segment the time-series data to generate a plurality of segment data for the plurality of sensor data;

(iii) instructions to perform a hidden Markov model analysis of the plurality of segment data to generate a template hidden Markov model that describes the hidden states of the plurality of fabricated semiconductor devices; and (iv) instructions to generate an ordered sequence of states of the plurality of segment data using parameters of the template hidden Markov model.

27. Th system of claim 25 or 26, wherein the first time-series data is acquired from a first semiconductor device, wherein the second time-series data is acquired from a second semiconductor device, wherein the first semiconductor device and the second semiconductor device are in the same fabrication batch of fabricated semiconductor devices.

28. The system of claim 25 or 26, wherein the first time-series data is acquired from a first semiconductor device, wherein the second time-series data is acquired from a second semiconductor device, wherein the first semiconductor device and the second semiconductor device are in different fabrication batches of fabricated semiconductor devices.

29. The system of any one of claims 25-28, wherein the instructions to align the first timeseries data to the second time-series data comprise a Viterbi algorithm or a max-sum algorithm.

30. The method of any one of claims 25-29, wherein the instructions to segment the timeseries data to generate a plurality of segment data comprises: instructions to segment the first time-series data into a plurality of steady-state segments by determining, using a moving window of a predetermined size, along with the first time-series data, a set of regions of the first time-series data having values within a pre-defined threshold profile; and instructions to segment the first time-series data into a plurality of transient state segments by labeling regions outside the plurality of steady-state segments as a plurality of transient state segments.

31. The system of any one of claims 25-30, wherein the sensor that collected the first timeseries data is a part of a manufacturing equipment of the fabricated semiconductor device, wherein the manufacturing equipment is selected from the group consisting of a plasma etching system, a liquid solution-etching system, a plasma-enhanced chemical vapor deposition system, a thin-film deposition system, a molecular-beam epitaxy (MBE) system, an electron beam melting (EBM) system, a chemical vapor deposition (CVD) system, and a roll-to-roll web coating system.

32. The system of any one of claims 25-30 wherein the sensor that collected the first timeseries data is a metrology or inspection equipment selected from the group consisting of: a wafer prober, imaging station, ellipsometer, CD-SEM, ion mill, C-V system, interferometer, source measure units (SME) magnetometer, optical and imaging system, profilometer, reflectometer, resistance probe, resistance high-energy electron diffraction (RHEED) system, and X-ray diffractometer.

33. The system of any one of claims 25-30, wherein the first time-series data is retrieved from a controller of a manufacturing equipment of the fabricated semiconductor device, wherein the controller of the manufacturing equipment is operatively connected to the sensor.

34. The system of any one of claims 25-33, wherein the first time-series data comprises observed measurements of a metrology signal associated with a device pattern on a wafer.

35. The system of any one of claims 25-34, wherein the first time-series data comprises observed measurements of a power signal, a pressure signal, a temperature signal, a volume signal, a flow rate signal, a voltage signal, an optical signal, any of which is associated with a fabrication process.

36. The system of any one of claims 25-35, wherein the processing unit is configured by instructions to compare the first time-series data to the second time-series data to determine accurate tool matching between a first fabrication equipment and a second fabrication equipment employed in the same fabrication process.

37. The system of any one of claims 25-36, wherein the processing unit is configured to compare the first time-series data to the second time-series data to generate an indication of a quality of a fabrication process or an associated fabrication equipment.

38. The system of any one of claims 25-37, wherein the processing unit is configured by computer-readable instructions to further retrieve a set of second time-series data associated with the fabrication process of the fabricated semiconductor device; and

52 align the set of second time-series data to a set of third time-series data associated with the same fabrication process based on the hidden Markov model analysis, wherein the set of second time-series data comprises more than 50 sensors sampled at 1 Hz, 5 Hz, 10 Hz, or at a sampling rate in between.

39. The system of any one of claims 25-38, wherein the instructions to generate the template hidden Markov model comprises: the instructions to segment the time-series data to generate the plurality of segment data and determining alignment statistics of the plurality of segment data; instructions to cluster the plurality of segments based on alignment statistics; and instructions to determine a transition matrix and an emission parameter matrix based on the clustering.

40. The system of any one of claims 25-39 further comprising: a metrology sensor system comprising a plurality of sensors configured to acquire a plurality of sensor data.

41. A non-transitory computer-readable medium having instructions stored thereon, wherein execution of the instructions by a processor causes the processor to perform any one of the methods of claims 1-24 or execute the system of claims 25-40.

42. A method to perform operations of any one of the methods of claims 1-24 or execute the system of claims 25-40.

53