CN116340872A

CN116340872A - Method for determining abnormality based on combination of reconstruction and prediction

Info

Publication number: CN116340872A
Application number: CN202310321626.1A
Authority: CN
Inventors: 文成武; 夏敏; 易丛文
Original assignee: Shenzhen Zhixian Future Industrial Software Co ltd
Current assignee: Shenzhen Zhixian Future Industrial Software Co ltd
Priority date: 2023-03-29
Filing date: 2023-03-29
Publication date: 2023-06-27

Abstract

The invention relates to a method for determining anomalies based on a combination of reconstruction and prediction, comprising: acquiring multivariate time series data generated by a plurality of sensors of equipment in a semiconductor manufacturing process; inputting a first target subsequence, corresponding to a first time window in the time series data, into a reconstruction model to obtain reconstruction data for the first time window, wherein the first time window ends at the (n+1)th time point; inputting a second target subsequence of the time series data, ending at the nth time point, into a prediction model to obtain prediction data for the (n+1)th time point; aggregating the target data corresponding to the (n+1)th time point in the reconstruction data with the prediction data to obtain total prediction data for the (n+1)th time point; and comparing the total prediction data with the measured data of the time series data at the (n+1)th time point to determine whether the (n+1)th time point is abnormal.

Description

Method for determining abnormality based on combination of reconstruction and prediction

Technical Field

The invention relates to the field of semiconductor manufacturing, in particular to a method for determining abnormality based on combination of reconstruction and prediction.

Background

In semiconductor manufacturing, a single machine usually carries multiple sensors that monitor the production status simultaneously. The sensors on one machine are correlated with one another: when an abnormal condition occurs, several correlated sensors may produce abnormal data at the same time and thereby trigger an alarm.

In the prior art, each sensor's data is modeled separately with univariate time series prediction: the data generated by each sensor is predicted independently and compared with the real data, and the comparison result decides whether an alarm is raised. However, this approach can produce false alarms, that is, a sensor may raise an alarm because of its own malfunction or environmental noise even though no abnormality has actually occurred. Such alarms add redundancy to the whole monitoring system and increase the maintenance cost of the production equipment.

The choice of modeling mode also matters. If the model is trained using prediction alone, it is sensitive to the randomness of the time series and lacks robustness to disturbance and noise. If the model is trained using reconstruction alone, it may ignore individual data features and produce false negatives, particularly when the data as a whole still conforms to the distribution of the original window.

Disclosure of Invention

One or more embodiments of the present disclosure describe a method for determining anomalies based on a combination of reconstruction and prediction, where data generated by multiple sensors is modeled simultaneously using a multivariate sequential anomaly reconstruction model, and the reconstructed data of all sensors at any point in time is combined to be compared with the real data to obtain a better prediction result. Meanwhile, when a modeling mode is selected, training data is trained by using a prediction mode and a reconstruction mode at the same time, so that the internal relation of multi-element time sequence data is better mined, and a better prediction result is obtained.

The present specification provides a method of determining anomalies based on a combination of reconstruction and prediction, comprising:

acquiring multivariable time sequence data, wherein the time sequence data is generated by a plurality of sensors of equipment in the semiconductor manufacturing process;

inputting a first target subsequence corresponding to a first time window in the time sequence data into a reconstruction model to obtain reconstruction data of the first time window, wherein the cut-off time point of the first time window is an n+1th time point;

inputting a second target subsequence of the time sequence data, ending at the nth time point, into a prediction model to obtain prediction data for the (n+1)th time point;

aggregating target data corresponding to the n+1th time point in the reconstruction data with the prediction data to obtain total prediction data of the n+1th time point;

and comparing the total predicted data with measured data corresponding to the time sequence data at the n+1th time point, and determining whether the n+1th time point is abnormal.

In one possible embodiment, the method further comprises:

and when the n+1th time point is abnormal, determining an abnormal sensor and a corresponding abnormal type according to the total predicted data and the measured data.

In one possible embodiment, the method further comprises:

and determining corresponding knowledge points according to the anomaly sensor, the anomaly type, the equipment number of the equipment and the number of the wafer being processed by the equipment, wherein the knowledge points are used for generating or updating a knowledge graph in the semiconductor field.

In one possible implementation manner, comparing the total predicted data with the measured data corresponding to the time sequence data at the n+1th time point to determine whether the n+1th time point is abnormal, including:

and calculating errors of the total predicted data and the actually measured data, and determining whether an abnormality occurs at an n+1th time point according to a comparison result of the errors and a preset first threshold value.

In one possible embodiment, determining the anomaly sensor and the corresponding anomaly type from the total predicted data and the measured data includes:

and at an n+1th time point, calculating the error between the total predicted data and the data corresponding to the measured data of any one of the plurality of sensors, and determining whether the target sensor is abnormal and the corresponding abnormal type according to the comparison result of the error and the preset threshold corresponding to the target sensor.

In one possible implementation, the reconstruction model is a variational autoencoder (VAE); inputting a first target subsequence corresponding to a first time window in the time sequence data into a reconstruction model to obtain reconstruction data of the first time window comprises:

encoding the first target subsequence using the encoder of the variational autoencoder (VAE) to obtain a hidden space variable sequence;

and decoding the hidden space variable sequence using the decoder of the variational autoencoder (VAE) to obtain the reconstruction data.

In one possible implementation, the predictive model includes a self-attention model, a graph attention network GAT, and a fully connected neural network; inputting a second target subsequence of the time sequence data, which is cut to an nth time point, into a prediction model to obtain prediction data of an (n+1) th time point, wherein the method comprises the following steps of:

coding the second target subsequence based on a self-attention mechanism in a time dimension by using a self-attention model to obtain a coding sequence;

inputting the second target subsequence into the graph attention network GAT, and determining graph relation information among the plurality of sensors, wherein each sensor corresponds to one node in the graph;

and after the coding sequence and the graph relation information are spliced, inputting the information into the fully-connected neural network to obtain the prediction data.

In a possible implementation manner, the reconstruction model and the prediction model are obtained through training of sample time sequence data, and the training includes:

dividing a data sequence in the sample time sequence data into a plurality of sequence fragments with the same size;

and performing multiple rounds of training on the reconstruction model and the prediction model by using the plurality of sequence fragments, wherein each round of training uses one sample sequence fragment in the plurality of sequence fragments, and any round of training comprises:

inputting a third target subsequence corresponding to a third time window in the sample sequence fragment into a reconstruction model to obtain reconstruction data of the third time window, wherein the cut-off time point of the third time window is an (m+1) th time point;

inputting a fourth target subsequence of the sample sequence segment, ending at the mth time point, into a prediction model to obtain prediction data for the (m+1)th time point;

determining a reconstruction error according to the reconstruction data and the data corresponding to the sample sequence segment at the m+1th time point;

determining a prediction error according to the prediction data and the data corresponding to the sample sequence segment at the (m+1) th time point;

and combining the reconstruction error and the prediction error to obtain a total error, and adjusting the values of parameters in the reconstruction model and the prediction model by minimizing the total error.

In one possible embodiment, combining the reconstruction error with the prediction error results in a total error, including:

adding the reconstruction error and the prediction error to obtain a total error; or

averaging the reconstruction error and the prediction error to obtain a total error; or

taking the maximum of the reconstruction error and the prediction error to obtain a total error; or

taking the minimum of the reconstruction error and the prediction error to obtain a total error.

In a possible embodiment, the types of errors include at least: root mean square error RMSE, mean square error MSE and mean absolute error MAE.

According to the method for determining the abnormality based on the combination of reconstruction and prediction, disclosed by the invention, the data generated by a plurality of sensors are simultaneously modeled by using a multivariate time sequence abnormal reconstruction model, and the reconstruction data of all the sensors at any time point are comprehensively compared with the real data so as to obtain a better prediction result. Meanwhile, when a modeling mode is selected, training data is trained by using a prediction mode and a reconstruction mode at the same time, so that the internal relation of multi-element time sequence data is better mined, and a better prediction result is obtained.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments disclosed in the present specification, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only examples of the embodiments disclosed in the present specification, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a block diagram of a method for determining anomalies based on a combination of reconstruction and prediction in accordance with an embodiment of the present invention;

FIG. 2 is a flow chart of a method for determining anomalies based on a combination of reconstruction and prediction in accordance with an embodiment of the present invention;

FIG. 3 is a flow chart of a method of training a reconstruction model and a predictive model in accordance with an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

FIG. 1 illustrates a framework for a method of determining anomalies based on a combination of reconstruction and prediction, according to one embodiment. As shown in FIG. 1, the framework mainly comprises three parts: a reconstruction model, a prediction model and an anomaly evaluation layer. The reconstruction model may be a variational autoencoder (VAE), including an encoder and a decoder; the prediction model may be composed of a self-attention model, a graph attention network (GAT), and a fully connected neural network.

In the training stage, sample multivariable time sequence data for training are respectively input into two models, reconstruction errors and prediction errors of the two models are respectively calculated, then a combined training mode is adopted, the reconstruction errors and the prediction errors are minimized, and parameters of the two models are adjusted.

In the prediction stage, the multivariate time series data is input into both models, each of which gives a predicted value of every sensor for the next time point. The anomaly evaluation layer aggregates the two models' predicted values for each sensor to obtain a more accurate overall predicted value per sensor. The overall predicted value is then compared with the true value to judge whether that time point is abnormal.

The following description will proceed with reference being made to the drawings, which are not intended to limit the scope of embodiments of the invention.

FIG. 2 is a flow chart of a method for determining anomalies based on a combination of reconstruction and prediction in accordance with an embodiment of the present invention. As shown in fig. 2, the method at least includes: step 201, acquiring multi-variable time sequence data, wherein the time sequence data is generated by a plurality of sensors of equipment in the semiconductor manufacturing process; step 202, inputting a first target subsequence corresponding to a first time window in the time sequence data into a reconstruction model to obtain reconstruction data of the first time window, wherein a cut-off time point of the first time window is an n+1th time point; step 203, inputting the second target subsequence of the time sequence data up to the nth time point into a prediction model to obtain prediction data of the (n+1) th time point; step 204, aggregating the target data corresponding to the n+1th time point in the reconstructed data with the predicted data to obtain total predicted data for the n+1th time point; step 205, comparing the total predicted data with the measured data corresponding to the time series data at the n+1th time point, and determining whether the n+1th time point is abnormal.

At step 201, multivariate time series data is obtained, the time series data being data generated by a plurality of sensors of an apparatus in a semiconductor manufacturing process.

The time series data is data obtained after raw data generated by a sensor of equipment in the semiconductor manufacturing process is processed by an FDC system. A semiconductor manufacturing tool is provided with a plurality of sensors, each of which monitors a parameter, such as temperature, humidity, voltage, current, pressure, etc., on the tool. After the values of any one of the parameters output by the sensor over a period of time are processed by the FDC system, a set of univariate time series data is obtained. After integrating the time series data of all the sensors, the multi-variable time series data can be obtained.

The multivariate time series data can be expressed as $D = \{X^1, X^2, \dots, X^n\}$, where $X^i \in \mathbb{R}^k$ is a k-dimensional vector holding the values of the k sensors at the i-th time point, k is the number of sensors, and $i = 1, 2, \dots, n$.
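As an illustration of this representation, the sketch below stacks per-sensor univariate series into a list of k-dimensional vectors. The function name `stack_sensors`, the sensor names and the readings are hypothetical, and the sketch assumes the per-sensor series are already aligned on the same time points.

```python
from typing import Dict, List


def stack_sensors(series: Dict[str, List[float]]) -> List[List[float]]:
    """Combine per-sensor univariate series into D = [X^1, ..., X^n].

    Each X^i is a k-dimensional vector holding the reading of every
    sensor at time point i. Sensor order follows the dict's key order.
    """
    lengths = {len(values) for values in series.values()}
    if len(lengths) != 1:
        raise ValueError("all sensors must cover the same time points")
    return [list(point) for point in zip(*series.values())]


# Hypothetical readings from k = 3 sensors over n = 4 time points.
readings = {
    "temperature": [20.0, 20.5, 21.0, 20.8],
    "pressure": [1.00, 1.01, 0.99, 1.02],
    "voltage": [5.0, 5.1, 5.0, 4.9],
}
D = stack_sensors(readings)
```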

In step 202, a first target subsequence corresponding to a first time window in the time sequence data is input into a reconstruction model, so as to obtain reconstruction data of the first time window, wherein a cut-off time point of the first time window is an n+1th time point.

Specifically, a first time window of length ω is preset, and the first target subsequence ending at the (n+1)th time point may be represented as $S^1 = \{X^{n-\omega+2}, X^{n-\omega+3}, \dots, X^{n+1}\}$. The first target subsequence $S^1$ is input into the reconstruction model to obtain the reconstruction data $R^1 = \{\hat{X}_{re}^{n-\omega+2}, \hat{X}_{re}^{n-\omega+3}, \dots, \hat{X}_{re}^{n+1}\}$ for the first time window.

Each element of the reconstruction data $R^1$ is the reconstruction of the corresponding element of the first target subsequence $S^1$. For example, $\hat{X}_{re}^{n-\omega+2}$ is the reconstruction data corresponding to $X^{n-\omega+2}$, and $\hat{X}_{re}^{n+1}$ is the reconstruction data corresponding to $X^{n+1}$.
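The window indexing can be sketched as follows. This is only an illustration of how a window of length ω ending at a given time point maps onto list slices. The helper `window_ending_at` and the toy data are hypothetical, and giving the second subsequence the same length as the first is an assumption, since the text only requires that it end at the nth time point.

```python
from typing import List, Sequence


def window_ending_at(
    data: Sequence[Sequence[float]], end: int, omega: int
) -> List[Sequence[float]]:
    """Return the omega points X^{end-omega+1}, ..., X^{end}.

    Time points are 1-based, so X^i is stored at data[i - 1].
    """
    if omega < 1 or end > len(data) or end - omega < 0:
        raise ValueError("window does not fit inside the data")
    return list(data[end - omega:end])


# Toy data: X^i = [i, 10 * i] for i = 1..6 (k = 2 sensors).
data = [[float(i), 10.0 * i] for i in range(1, 7)]
n, omega = 5, 3

# Window for the reconstruction model: X^{n-omega+2}, ..., X^{n+1}.
S1 = window_ending_at(data, end=n + 1, omega=omega)
# Subsequence for the prediction model: its last element is X^n.
S2 = window_ending_at(data, end=n, omega=omega)
```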

In one embodiment, the reconstruction model is a variational autoencoder (VAE). In this case, step 202 specifically includes: encoding the first target subsequence $S^1$ using the encoder of the VAE to obtain a hidden space variable sequence $V^1$; and decoding the hidden space variable sequence $V^1$ using the decoder of the VAE to obtain the reconstruction data $R^1$.

In step 203, a second target subsequence of the time series data, ending at the nth time point, is input into a prediction model to obtain prediction data $\hat{X}_{pr}^{n+1}$ for the (n+1)th time point.

Specifically, the second target subsequence ending at the nth time point may be represented as $S^2$. The starting data of $S^2$ may be the same as or different from that of $S^1$, provided that the last data of $S^2$ is $X^n$. The second target subsequence $S^2$ is input into the prediction model to obtain the prediction data $\hat{X}_{pr}^{n+1}$ for the (n+1)th time point.

In one embodiment, the prediction model includes a self-attention model, a graph attention network (GAT), and a fully connected neural network. In this case, step 203 specifically includes: encoding the second target subsequence $S^2$ with the self-attention model, based on a self-attention mechanism in the time dimension, to obtain a coding sequence; inputting the second target subsequence into the graph attention network GAT and determining graph relation information among the plurality of sensors, wherein each sensor corresponds to one node in the graph; and splicing the coding sequence with the graph relation information and inputting the result into the fully connected neural network to obtain the prediction data $\hat{X}_{pr}^{n+1}$.

In a more specific embodiment, the self-attention model is the self-attention module of a Transformer model.
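To make the idea of self-attention along the time dimension concrete, the following is a minimal sketch of scaled dot-product self-attention over a short sequence. It is a simplification and not the model described here: queries, keys and values are all taken to be the input itself (no learned projections), and the graph attention network and the fully connected network are omitted. The function names and the toy input are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention over time with Q = K = V = seq.

    Every output vector is a weighted average of all time points, with
    weights derived from the similarity between time points.
    """
    dim = len(seq[0])
    encoded = []
    for query in seq:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in seq
        ]
        weights = softmax(scores)
        encoded.append(
            [sum(w * value[j] for w, value in zip(weights, seq)) for j in range(dim)]
        )
    return encoded


# Toy subsequence with three time points and k = 2 sensors.
S2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded = self_attention(S2)
```

In a trained model the queries, keys and values would come from learned linear projections, and the resulting coding sequence would be spliced with the graph relation information before the fully connected layers.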

In step 204, the target data $\hat{X}_{re}^{n+1}$ corresponding to the (n+1)th time point in the reconstruction data and the prediction data $\hat{X}_{pr}^{n+1}$ are aggregated to obtain total prediction data $\hat{X}^{n+1}$ for the (n+1)th time point.

The aggregation of $\hat{X}_{re}^{n+1}$ and $\hat{X}_{pr}^{n+1}$ may be performed in a variety of ways, for example, by arithmetic mean, geometric mean, or direct summation, without limitation.
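The three aggregation options named above can be sketched as below. The function `aggregate`, its `how` parameter and the sample values are hypothetical, and restricting the geometric mean to non-negative values is an assumption of this sketch.

```python
import math
from typing import List


def aggregate(recon: List[float], pred: List[float], how: str = "mean") -> List[float]:
    """Combine two per-sensor estimates for the same time point."""
    if len(recon) != len(pred):
        raise ValueError("both estimates must cover the same sensors")
    pairs = list(zip(recon, pred))
    if how == "mean":  # arithmetic mean
        return [(r + p) / 2.0 for r, p in pairs]
    if how == "geometric":  # geometric mean, non-negative values only
        if any(r < 0 or p < 0 for r, p in pairs):
            raise ValueError("geometric mean needs non-negative values")
        return [math.sqrt(r * p) for r, p in pairs]
    if how == "sum":  # direct summation
        return [r + p for r, p in pairs]
    raise ValueError(f"unknown aggregation: {how}")


recon_target = [2.0, 8.0]  # hypothetical reconstruction of the (n+1)th point
prediction = [4.0, 2.0]    # hypothetical prediction of the (n+1)th point
total = aggregate(recon_target, prediction, how="mean")
```

Note that direct summation yields values on roughly twice the scale of a single estimate, so whichever aggregation is chosen, the comparison against the measured data would have to be set up on a matching scale.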

In step 205, the total prediction data $\hat{X}^{n+1}$ is compared with the measured data $X^{n+1}$ of the time series data at the (n+1)th time point to determine whether the (n+1)th time point is abnormal.

In one embodiment, the error between the total prediction data $\hat{X}^{n+1}$ and the measured data $X^{n+1}$ is calculated, and whether the (n+1)th time point is abnormal is determined according to the result of comparing the error with a preset first threshold.

There are various methods for calculating the error, for example, using the root mean square error RMSE, the mean square error MSE, or the mean absolute error MAE, which are not limited herein.
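A minimal sketch of the three error measures and the threshold comparison follows. The helper names, the sample vectors and the threshold value are hypothetical, and treating "error greater than threshold" as abnormal is an assumption about the direction of the comparison.

```python
import math
from typing import Callable, Sequence


def mse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean square error across sensors."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def rmse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error across sensors."""
    return math.sqrt(mse(pred, actual))


def mae(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error across sensors."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def is_abnormal(
    total_pred: Sequence[float],
    measured: Sequence[float],
    threshold: float,
    metric: Callable[[Sequence[float], Sequence[float]], float] = rmse,
) -> bool:
    """Flag the time point when the chosen error exceeds the threshold."""
    return metric(total_pred, measured) > threshold


total_pred = [1.0, 2.0, 3.0]  # hypothetical total prediction data
measured = [1.0, 2.0, 7.0]    # hypothetical measured data
flag = is_abnormal(total_pred, measured, threshold=2.0)
```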

In some possible embodiments, the method further comprises: step 206, when the (n+1)th time point is abnormal, determining the abnormal sensor and the corresponding anomaly type according to the total prediction data and the measured data.

In one embodiment, at the (n+1)th time point, for any target sensor among the plurality of sensors, the error between the data corresponding to that sensor in the total prediction data $\hat{X}^{n+1}$ and in the measured data $X^{n+1}$ is calculated, that is, the error between the component of $\hat{X}^{n+1}$ and the component of $X^{n+1}$ belonging to the target sensor. Whether the target sensor is abnormal, and the corresponding anomaly type, are then determined according to the result of comparing the error with the preset threshold corresponding to the target sensor. For example, when the error is greater than the threshold, it is determined that the target sensor is abnormal at the (n+1)th time point, and the corresponding anomaly type is determined according to the type of machine parameter monitored by the target sensor.

The preset threshold corresponding to the target sensor can be set by an engineer according to actual conditions or experience, and can also be calculated by some data models according to historical data. Different thresholds may be set for different sensors according to actual conditions.

Since the error described in step 206 is an error between scalars, the difference between the two may be directly calculated or the absolute value of the difference may be calculated when calculating the error, which is not limited herein.
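The per-sensor check can be sketched as below, assuming one threshold per sensor and a signed difference as the scalar error. The function `abnormal_sensors`, the sensor names and all values are hypothetical. Deriving a "too high" or "too low" label from the sign of the difference is an assumption of this sketch, since the text only states that the anomaly type follows from the monitored parameter.

```python
from typing import Dict


def abnormal_sensors(
    total_pred: Dict[str, float],
    measured: Dict[str, float],
    thresholds: Dict[str, float],
) -> Dict[str, str]:
    """Return {sensor: anomaly type} for sensors whose error exceeds their threshold."""
    findings = {}
    for name, limit in thresholds.items():
        diff = measured[name] - total_pred[name]  # signed scalar error
        if abs(diff) > limit:
            direction = "too high" if diff > 0 else "too low"
            findings[name] = f"{name} {direction}"
    return findings


total_pred = {"temperature": 20.0, "pressure": 1.0, "voltage": 5.0}
measured = {"temperature": 26.0, "pressure": 1.02, "voltage": 4.0}
thresholds = {"temperature": 3.0, "pressure": 0.1, "voltage": 0.5}
findings = abnormal_sensors(total_pred, measured, thresholds)
```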

In some possible embodiments, the method further comprises: step 207, determining corresponding knowledge points according to the anomaly sensor, the anomaly type, the equipment number of the equipment and the number of the wafer being processed by the equipment, wherein the knowledge points are used for generating or updating a knowledge graph in the semiconductor field.

In some embodiments, multiple target sensors may be abnormal at the same point in time, so multiple anomaly types may exist simultaneously. In that case the knowledge point may take the form of a tuple, specifically (anomaly type 1, …, anomaly type m, equipment number, wafer number). For example, a specific knowledge point may be (temperature too high, pressure too high, machine 2, wafer 3).
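The tuple form of a knowledge point can be illustrated with a short sketch. The function `knowledge_point` and the identifiers passed to it are hypothetical, and how such tuples are written into a knowledge graph is not covered here.

```python
from typing import Iterable, Tuple


def knowledge_point(
    anomaly_types: Iterable[str], equipment_id: str, wafer_id: str
) -> Tuple[str, ...]:
    """Build (anomaly type 1, ..., anomaly type m, equipment number, wafer number)."""
    return (*anomaly_types, equipment_id, wafer_id)


point = knowledge_point(
    ["temperature too high", "pressure too high"], "machine 2", "wafer 3"
)
```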

The steps included in fig. 2 are steps for anomaly detection of data using a trained model; the method steps of training the model are shown in fig. 3.

FIG. 3 is a flow chart of a method of training a reconstruction model and a predictive model in accordance with an embodiment of the present invention. As shown in fig. 3, the reconstruction model and the prediction model are obtained through training of sample time series data, and the training includes: step 310, segmenting the data sequence in the sample time sequence data into a plurality of sequence fragments with the same size; step 320, performing multiple rounds of training on the reconstruction model and the prediction model by using the plurality of sequence segments, wherein each round of training uses one sample sequence segment of the plurality of sequence segments, and any round of training includes: step 321, inputting a third target subsequence corresponding to a third time window in the sample sequence segment into a reconstruction model to obtain reconstruction data of the third time window, wherein a cut-off time point of the third time window is an (m+1) th time point; step 322, inputting the fourth target subsequence of the sample sequence segment up to the mth time point into a prediction model to obtain prediction data for the (m+1) th time point; step 323, determining a reconstruction error according to the reconstruction data and the data corresponding to the sample sequence segment at the (m+1) th time point; step 324, determining a prediction error according to the prediction data and the data corresponding to the sample sequence segment at the (m+1) th time point; step 325, combining the reconstruction error with the prediction error to obtain a total error, and adjusting the values of the parameters in the reconstruction model and the prediction model by minimizing the total error.

The sample time series data used to train the model can be expressed as $D = \{X^1, X^2, \dots, X^n\}$, where $X^i \in \mathbb{R}^k$ is a k-dimensional vector holding the values of the k sensors at the i-th time point, k is the number of sensors, and $i = 1, 2, \dots, n$. The sample time series data used for training contains no abnormal data.

In step 310, the data sequence in the sample time series data is segmented into a plurality of sequence segments of the same size, denoted $D^1, D^2, \dots, D^m$.
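A possible way to cut a sequence into equally sized segments is sketched below. The function `split_into_segments` is hypothetical, and dropping an incomplete trailing segment is an assumption, since the text does not say how a remainder is handled.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_segments(data: Sequence[T], size: int) -> List[Sequence[T]]:
    """Cut data into consecutive, non-overlapping segments of equal size.

    Any trailing points that do not fill a whole segment are dropped.
    """
    if size < 1:
        raise ValueError("segment size must be positive")
    return [data[i:i + size] for i in range(0, len(data) - size + 1, size)]


# Ten time points split into segments of three; the last point is dropped.
segments = split_into_segments(list(range(10)), size=3)
```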

In step 320, the reconstruction model and the prediction model are trained for multiple rounds using the plurality of sequence segments. Each round of training uses one sequence segment $D^i$ of the plurality of sequence segments, and any round of training includes steps 321 to 325.

In step 321, a third target subsequence corresponding to a third time window in the sample sequence segment is input into a reconstruction model, so as to obtain reconstruction data of the third time window, wherein a cut-off time point of the third time window is an m+1th time point.

Specifically, a third time window of length μ is preset, and the third target subsequence ending at the (m+1)th time point may be denoted as $S^3 = \{X^{m-\mu+2}, X^{m-\mu+3}, \dots, X^{m+1}\}$. The third target subsequence $S^3$ is input into the reconstruction model to obtain the reconstruction data $R^3$ for the third time window.

Each element of the reconstruction data $R^3$ is the reconstruction of the corresponding element of the third target subsequence $S^3$.

In one embodiment, the reconstruction model is a variational autoencoder (VAE). In this case, step 321 may be implemented as described for step 202, which is not repeated here.

In step 322, a fourth target subsequence $S^4$ of the sample sequence segment, ending at the mth time point, is input into the prediction model to obtain prediction data $\hat{X}_{pr}^{m+1}$ for the (m+1)th time point.

In one embodiment, the predictive models include a self-attention model, a graph attention network GAT, and a fully connected neural network. At this time, the implementation method in step 322 may refer to step 203, which will not be described in detail herein.

In step 323, a reconstruction error is determined according to the reconstruction data and the data corresponding to the sample sequence segment at the m+1th time point.

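Specifically, the reconstruction error $\mathit{loss}_{re}$ is determined according to $\hat{X}_{re}^{m+1}$, the element of the reconstruction data corresponding to the (m+1)th time point, and $X^{m+1}$.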

In step 324, a prediction error is determined according to the prediction data and the data corresponding to the sample sequence segment at the m+1th time point.

Specifically, the prediction error $\mathit{loss}_{pr}$ is determined according to the prediction data $\hat{X}_{pr}^{m+1}$ for the (m+1)th time point and $X^{m+1}$.

In step 325, the reconstruction error $\mathit{loss}_{re}$ and the prediction error $\mathit{loss}_{pr}$ are combined to obtain the total error $\mathit{loss}_{total}$, and the values of the parameters in the reconstruction model and the prediction model are adjusted by minimizing the total error.

There are several ways to combine the reconstruction error with the prediction error to get the total error, for example: adding the reconstruction error and the prediction error to obtain a total error; or average the reconstruction error and the prediction error to obtain a total error; or solving the maximum value of the reconstruction error and the prediction error to obtain a total error; or the minimum value is calculated for the reconstruction error and the prediction error, so as to obtain the total error. The description is not intended to be limiting.

The above calculation of the error may be performed by various methods, for example, using a root mean square error RMSE, a mean square error MSE, or a mean absolute error MAE, which is not limited herein.
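The four ways of combining the two errors can be written compactly, as in the sketch below. The function `total_loss`, its `mode` parameter and the sample error values are hypothetical. In an actual training loop the two inputs would be differentiable loss values from a deep learning framework rather than plain floats.

```python
def total_loss(loss_re: float, loss_pr: float, mode: str = "sum") -> float:
    """Combine reconstruction and prediction errors into one training objective."""
    if mode == "sum":
        return loss_re + loss_pr
    if mode == "mean":
        return (loss_re + loss_pr) / 2.0
    if mode == "max":
        return max(loss_re, loss_pr)
    if mode == "min":
        return min(loss_re, loss_pr)
    raise ValueError(f"unknown combination mode: {mode}")


# Hypothetical errors from one training round.
loss_re, loss_pr = 0.4, 0.2
loss_total = total_loss(loss_re, loss_pr, mode="sum")
```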

After the reconstruction model and the prediction model have been trained for multiple rounds using the sequence segments $D^1, D^2, \dots, D^m$, a trained reconstruction model and prediction model are obtained, which can be used for anomaly detection in the steps shown in FIG. 2.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, and the program may be stored in a computer readable storage medium, where the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The foregoing description of the embodiments has been provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments described. Any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. A method of determining anomalies based on a combination of reconstruction and prediction, comprising:

2. The method as recited in claim 1, further comprising:

3. The method as recited in claim 2, further comprising:

4. The method of claim 1, wherein comparing the total predicted data with measured data corresponding to the time series data at an n+1th time point to determine whether an abnormality occurs at the n+1th time point, comprises:

5. The method of claim 2, wherein determining an anomaly sensor and a corresponding anomaly type from the total predicted data and the measured data comprises:

6. The method of claim 1, wherein the reconstruction model is a variational self-encoder VAE; inputting a first target subsequence corresponding to a first time window in the time sequence data into a reconstruction model to obtain reconstruction data of the first time window, wherein the method comprises the following steps:

7. The method of claim 1, wherein the predictive models include a self-attention model, a graph attention network GAT, and a fully connected neural network; inputting a second target subsequence of the time sequence data, which is cut to an nth time point, into a prediction model to obtain prediction data of an (n+1) th time point, wherein the method comprises the following steps of:

8. The method of claim 1, wherein the reconstruction model and the predictive model are trained from sample timing data, the training comprising:

9. The method of claim 8, wherein combining the reconstruction error with a prediction error yields a total error, comprising:

10. The method according to claim 4 or 8, characterized in that the type of error comprises at least: root mean square error RMSE, mean square error MSE and mean absolute error MAE.