CN113221932A

CN113221932A - Method for synchronizing unequal-length batch data in intermittent processes based on kernel dynamic time warping

Info

Publication number: CN113221932A
Application number: CN202011571610.9A
Authority: CN
Inventors: 王建林; 邱科鹏; 周新杰; 王汝童
Original assignee: Beijing University of Chemical Technology
Current assignee: Beijing University of Chemical Technology
Priority date: 2020-12-27
Filing date: 2020-12-27
Publication date: 2021-08-06

Abstract

The invention discloses a method, based on kernel dynamic time warping, for synchronizing unequal-length batch data in an intermittent process, and belongs to the field of intermittent process data processing. First, the unequal-length batch data are projected into a high-dimensional feature space by a kernel method. Then, a synchronization performance evaluation index is constructed and used to obtain the synchronization path under the optimal kernel parameter. Finally, the unequal-length batch data are synchronized by dynamic time warping. The method jointly considers the path deviation degree and the feature retention degree, improves the accuracy of the synchronization result for unequal-length batch data, and provides consistent process data for intermittent process modeling.

Description

Method for synchronizing unequal-length batch data in intermittent processes based on kernel dynamic time warping

Technical Field

The invention belongs to the field of intermittent process data processing, and particularly relates to a Kernel Dynamic Time Warping (KDTW) based method for synchronizing unequal-length batch data in an intermittent process.

Background

An intermittent process is a batch production process. Because operating conditions and raw material quality vary, batch production cannot be repeated exactly, so the data lengths of different batches are inconsistent. The resulting unequal-length batch data are irregular and fail to meet the requirement of data-driven intermittent process modeling that all batches have the same length, which seriously degrades the accuracy of data-driven intermittent process models. Solving the unequal-length batch data problem therefore provides consistent data for intermittent process modeling. Dynamic Time Warping (DTW) is widely used to synchronize unequal-length batch data in intermittent processes. However, the Euclidean-distance-based batch similarity measure used in DTW cannot accurately capture the nonlinear and high-dimensional characteristics of batch data and easily produces similarity values with large deviations, so the synchronization results for unequal-length batch data have low accuracy.

The invention therefore provides a method, based on Kernel Dynamic Time Warping (KDTW), for synchronizing unequal-length batch data in an intermittent process. First, the unequal-length batch data are projected into a high-dimensional feature space by a kernel method. Then, a Synchronization Performance Combination Index (SPCI) is constructed and used to obtain the synchronization path under the optimal kernel parameter. Finally, the unequal-length batch data are synchronized by dynamic time warping. The method jointly considers the path deviation degree and the feature retention degree, improves the accuracy of the synchronization result for unequal-length batch data, and provides consistent process data for intermittent process modeling.

Disclosure of Invention

The invention provides a KDTW-based method for synchronizing unequal-length batch data in an intermittent process, aimed at improving the accuracy of the synchronization result. The method comprises the following steps:

Step one: collect multi-batch process data of the intermittent process, construct the intermittent process batch data set, and normalize the batch data set;

Step two: select a reference batch based on the Max Principal Similarity Criterion (MPSC), and divide the batch data set of the intermittent process into a reference batch and a test batch set;

Step three: set the kernel parameter range according to the data lengths of the reference batch and the test batch set;

Step four: obtain the synchronization results of the test batch set under different kernel parameter settings using the KDTW method;

Step five: determine the optimal kernel parameter of the KDTW method using the proposed SPCI;

Step six: under the optimal kernel parameter, synchronize the test batch set against the reference batch using the KDTW method to obtain the optimal synchronization result of the test batch set.

Step one specifically comprises:

Suppose that X = {X₁, X₂, …, X_I} is the batch data set of the intermittent process, where I is the number of batches. Because different process variables differ in units and amplitudes, the batch data set must be normalized, i.e., each process variable is centered and made dimensionless. In what follows, X_i denotes the i-th normalized batch.
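The normalization of step one can be sketched in Python with NumPy as follows. This is a minimal sketch, assuming each batch is stored as a (samples × variables) array and that the centering and scaling statistics of each process variable are pooled over all samples of all batches; the text does not say which statistics are used, and the function name `normalize_batches` is hypothetical.

```python
import numpy as np

def normalize_batches(batches):
    """Center and scale each process variable with statistics pooled over all batches.

    batches: list of arrays of shape (length_i, n_vars); the lengths may differ.
    Returns a list of normalized arrays with the same shapes.
    """
    stacked = np.vstack(batches)            # pool every sample of every batch
    mean = stacked.mean(axis=0)             # per-variable mean (centering)
    std = stacked.std(axis=0, ddof=1)       # per-variable standard deviation (scaling)
    std[std == 0] = 1.0                     # guard against constant variables
    return [(b - mean) / std for b in batches]
```

Pooling keeps the relative offsets between batches intact; standardizing each batch separately would instead erase batch-to-batch level differences that later steps may need.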

Step two specifically comprises:

Let S_ij denote the similarity between the i-th and j-th normalized batches X_i and X_j. It is calculated from λ_h^(i) and λ_h^(j), the h-th eigenvalues of the covariance matrices of X_i and X_j, respectively, where a is the number of principal components, and from the weighted loading matrices of X_i and X_j.

On this basis, the sum of the similarities between each batch and the other batches is calculated, and the batch with the largest similarity sum is selected as the reference batch R = {r₁, r₂, …, r_r}, where r is the data length of the reference batch. All batches other than the reference batch R are treated as test batches, forming the test batch set T₁, T₂, …, T_{I−1}.
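The reference batch selection of step two can be sketched as follows. The exact similarity formula is not reproduced in the text, so this sketch substitutes, as an assumption, an eigenvalue-weighted PCA similarity factor: the sum over principal component pairs (h, l) of λ_h^(i) λ_l^(j) cos²θ_hl, divided by Σ_h λ_h^(i) λ_h^(j), where θ_hl is the angle between the h-th loading vector of X_i and the l-th loading vector of X_j. The function names and the default a = 2 are hypothetical; only the "largest similarity sum wins" rule comes from the text.

```python
import numpy as np

def principal_components(X, a):
    """Largest a covariance eigenvalues and their loading vectors for one batch (samples x variables)."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:a]
    return eigvals[order], eigvecs[:, order]

def pca_similarity(Xi, Xj, a=2):
    """Hypothetical stand-in for the batch similarity of step two (eigenvalue-weighted PCA similarity)."""
    li, Pi = principal_components(Xi, a)
    lj, Pj = principal_components(Xj, a)
    Li = Pi * np.sqrt(li)          # weighted loading matrix: each loading vector scaled by sqrt(eigenvalue)
    Lj = Pj * np.sqrt(lj)
    M = Li.T @ Lj                  # M[h, l] = sqrt(li[h] * lj[l]) * cos(angle between loadings h and l)
    return np.sum(M ** 2) / np.sum(li * lj)

def select_reference(batches, a=2):
    """Pick the batch whose summed similarity to all other batches is largest."""
    n = len(batches)
    sums = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                sums[i] += pca_similarity(batches[i], batches[j], a)
    ref = int(np.argmax(sums))
    tests = [k for k in range(n) if k != ref]    # indices of the test batch set
    return ref, tests, sums
```

Because the similarity depends only on covariance structure, batches of different lengths can be compared directly, which is why a PCA-based measure suits the unequal-length setting.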

Step three specifically comprises:

The value interval of the kernel width parameter s of the Gaussian kernel function is set to [1, s_max], where s_max is calculated by formula (6) from the data length of the reference batch and the data lengths of the test batches, t_i being the data length of the i-th test batch T_i.

Step four specifically comprises:

Let T = {t₁, t₂, …, t_t} be a test batch in the test batch set, where t is the data length of the test batch. First, the local similarity between batch data R and T in the high-dimensional feature space is calculated based on the Gaussian kernel function (formula (7)), where s ∈ [1, s_max] is the kernel parameter. Formula (7) is further transformed into formula (8).

Then, the cumulative distance is calculated by formula (9).

Finally, the optimal synchronization path p = {p₁, p₂, …, p_h, …, p_H}, where H is the number of steps of the synchronization path, is obtained by minimizing the cumulative distance (formula (10)).

The test batch T is matched point to point along the optimal synchronization path p, so that T has the same data length as the reference batch R. On this basis, all other test batches in the test batch set are synchronized by formulas (7) to (10), giving the synchronization result set of all test batches, whose i-th element is the synchronization result of test batch T_i.
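Step four can be sketched as follows. This is a minimal sketch under stated assumptions: the local distance is taken as the squared distance between Gaussian-kernel feature maps, 2 − 2k(x, y), which is one common way to turn a kernel similarity into a distance but is not confirmed to match formula (8); the cumulative distance uses the standard DTW recursion with steps (1, 0), (0, 1) and (1, 1); and when several test samples are matched to one reference sample they are averaged, which is only one possible point-to-point matching rule. All function names are hypothetical.

```python
import numpy as np

def kernel_distance(R, T, s):
    """Local distance between every reference sample and every test sample in kernel feature space.

    Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 s^2)). Since k(x, x) = 1,
    ||phi(x) - phi(y)||^2 = 2 - 2 k(x, y). Assumed form; not confirmed to match formula (8).
    """
    sq = ((R[:, None, :] - T[None, :, :]) ** 2).sum(axis=2)   # (r, t) squared Euclidean distances
    return 2.0 - 2.0 * np.exp(-sq / (2.0 * s ** 2))

def dtw_path(d):
    """Standard DTW on a local-distance matrix d of shape (r, t).

    Returns the optimal path as (reference index, test index) pairs and the total cumulative distance.
    """
    r, t = d.shape
    D = np.full((r + 1, t + 1), np.inf)   # D[m, n]: cumulative distance up to samples m-1 and n-1
    D[0, 0] = 0.0
    for m in range(1, r + 1):
        for n in range(1, t + 1):
            D[m, n] = d[m - 1, n - 1] + min(D[m - 1, n - 1], D[m - 1, n], D[m, n - 1])
    path, m, n = [], r, t
    while m > 0 and n > 0:                # backtrack from the last sample pair to (0, 0)
        path.append((m - 1, n - 1))
        step = int(np.argmin([D[m - 1, n - 1], D[m - 1, n], D[m, n - 1]]))
        if step == 0:
            m, n = m - 1, n - 1
        elif step == 1:
            m -= 1
        else:
            n -= 1
    return path[::-1], float(D[r, t])

def kdtw_synchronize(R, T, s):
    """Warp test batch T onto the time axis of reference batch R (hypothetical averaging rule)."""
    path, _ = dtw_path(kernel_distance(R, T, s))
    out = np.zeros(R.shape)
    counts = np.zeros(len(R))
    for m, n in path:
        out[m] += T[n]                    # test samples matched to the same reference sample are averaged
        counts[m] += 1
    return out / counts[:, None], path
```

Because k(x, x) = 1 for the Gaussian kernel, each local distance lies in [0, 2), so a single outlying sample cannot dominate the cumulative distance the way it can under the unbounded Euclidean distance of plain DTW; the kernel width s controls how quickly the distance saturates.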

Step five specifically comprises:

For the synchronization result of test batch T_i in the synchronization result set, the path deviation ratio Z and the feature retention degree Q are defined in terms of the total number of steps of the synchronization path and of two similarities: that between the synchronization result and the reference batch R, and that between the synchronization result and the test batch T_i. On this basis, the average path deviation ratio and the average feature retention degree over the synchronization results of all test batches are calculated.

For each value of the kernel parameter s in the range [1, s_max], the synchronization performance evaluation index SPCI_s is calculated from the set of average path deviation ratios and the set of average feature retention degrees obtained for the different kernel parameters. The optimal kernel parameter s* is selected by maximizing the synchronization performance evaluation index.
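The kernel-parameter search of step five can be sketched as follows. The SPCI formula is not reproduced in the text, so the combination below is a hypothetical stand-in: the average path deviation ratios and average feature retention degrees for all candidate values of s are min-max scaled over the candidate set, lower deviation and higher retention are rewarded, and the two terms are summed. Only the selection rule, taking s* as the maximizer of the SPCI over the candidate set, comes from the text; the function names are hypothetical.

```python
import numpy as np

def spci(avg_Z, avg_Q):
    """Hypothetical SPCI: reward low average path deviation and high average feature retention.

    avg_Z, avg_Q: sequences indexed by candidate kernel parameter.
    Each term is min-max scaled over the candidate set, then the two are summed.
    """
    Z = np.asarray(avg_Z, dtype=float)
    Q = np.asarray(avg_Q, dtype=float)
    z_span = (Z.max() - Z.min()) or 1.0   # avoid division by zero if all values are equal
    q_span = (Q.max() - Q.min()) or 1.0
    return (Z.max() - Z) / z_span + (Q - Q.min()) / q_span

def select_kernel_parameter(candidates, avg_Z, avg_Q):
    """Return the candidate s with the largest SPCI, together with all SPCI values."""
    scores = spci(avg_Z, avg_Q)
    return candidates[int(np.argmax(scores))], scores
```

Assuming integer candidates, a call such as `select_kernel_parameter(list(range(1, 10)), avg_Z, avg_Q)` would scan a range like the [1, 9] obtained in the embodiment, where `avg_Z` and `avg_Q` hold the averages computed from the step-four results for each s. Min-max scaling puts the two criteria on a common [0, 1] scale so neither dominates merely because of its units.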

Step six specifically comprises:

Under the optimal kernel parameter s*, all test batches in the test batch set are synchronized by formulas (7) to (10), giving the synchronization result set of all test batches, whose i-th element is the optimal synchronization result of test batch T_i. The synchronized batch data set of the intermittent process is thereby obtained.

The advantages of the invention are as follows. For the problem of unequal-length batch data in intermittent processes, the process data are mapped from the original feature space to a high-dimensional feature space by a kernel method, and the optimal kernel parameter is obtained with the proposed SPCI, which considers both the path deviation degree and the feature retention degree. This improves the accuracy of the synchronization result for unequal-length batch data and provides consistent process data for intermittent process modeling.

Drawings

Fig. 1 is a flowchart of the KDTW-based method for synchronizing unequal-length batch data in an intermittent process according to the present invention;

Fig. 2 shows the process variable trajectories of all batches, where panels (a) to (o) correspond to process variables v1 to v15, respectively;

Fig. 3 shows the sum of the similarities between each batch and all other batches;

Fig. 4 shows the process variable trajectories of the reference batch, where panels (a) to (o) correspond to process variables v1 to v15, respectively;

Fig. 5 illustrates the selection of the optimal kernel parameter;

Fig. 6 shows the DTW-based synchronization result for unequal-length batch data of the intermittent process: (a) synchronization path, (b) reference batch, (c) test batch, and (d) synchronization result;

Fig. 7 shows the KDTW-based synchronization result for unequal-length batch data of the intermittent process: (a) synchronization path, (b) reference batch, (c) test batch, and (d) synchronization result;

Fig. 8 compares the SPCI of the synchronization results of all unequal-length batches obtained with DTW and with KDTW, respectively.

Detailed Description

The present invention is further described with reference to the following examples and the accompanying drawings, which are not intended to limit the scope of the invention as claimed.

Examples

The semiconductor etching process is a typical batch process and an important part of semiconductor manufacturing. A process data set from a LAM9600TCP metal etcher at Texas Instruments (USA) was used; it contains 107 batches of unequal length, and some of the process variables of each batch are listed in Table 1.

Table 1 Semiconductor etching process variables

The flow of applying the present invention to the semiconductor etching process is shown in Fig. 1. The specific steps are as follows:

Step one: the collected process data are normalized; the normalized process variable data are shown in Fig. 2;

Step two: MPSC-based reference batch selection is performed with formulas (1) to (5); the selection result and the process variable data of the reference batch are shown in Fig. 3 and Fig. 4, respectively;

Step three: the value range of the kernel parameter is first obtained as [1, 9] with formula (6), and the optimal kernel parameter of the KDTW method is then obtained with formulas (7) to (17); as shown in Fig. 5, its value is 7;

Step four: with the kernel parameter set to 7, all test batches are synchronized by KDTW using formulas (7) to (10).

The method of the present invention is compared with the DTW-based method for synchronizing unequal-length batch data in an intermittent process; Fig. 6 and Fig. 7 show the respective synchronization results for the TCP top power process variable of test batch 1. Fig. 6(a) shows that the DTW synchronization path has a large total number of steps and contains runs of consecutive vertical steps, so that part of the original data characteristics are lost in the synchronization result, as marked by the ovals in Fig. 6(b). By contrast, Fig. 7 shows that the KDTW synchronization path has a smaller total number of steps and that the synchronization result retains most of the original data characteristics. Together with the SPCI comparison in Fig. 8 of the synchronization results of all unequal-length batches obtained with DTW and with KDTW, this shows that the method of the present invention synchronizes unequal-length batch data more accurately.

Claims

1. A method for synchronizing unequal-length batch data in an intermittent process based on kernel dynamic time warping, characterized by comprising the following steps:

Step one: collect multi-batch process data of the intermittent process, X = {X₁, X₂, …, X_I}, where I is the number of batches; center each process variable and make it dimensionless, to obtain the normalized batch data set of the intermittent process;

Step two: let S_ij denote the similarity between the i-th and j-th normalized batches X_i and X_j, calculated from λ_h^(i) and λ_h^(j), the h-th eigenvalues of the covariance matrices of X_i and X_j, respectively, and from the weighted loading matrices of X_i and X_j; on this basis, calculate the sum of the similarities, and select the batch with the largest similarity sum as the reference batch R = {r₁, r₂, …, r_r}, where r is the data length of the reference batch; treat all batches other than the reference batch R as test batches, forming the test batch set;

Step three: set the value interval of the kernel width parameter s of the Gaussian kernel function to [1, s_max], where s_max is calculated by formula (6) from the data lengths of the reference batch and the test batches, t_i being the data length of the i-th test batch T_i;

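Step four: let T = {t₁, t₂, …, t_t} be a test batch in the test batch set, where t is the data length of the test batch; calculate the local similarity between batch data R and T in the high-dimensional feature space based on the Gaussian kernel function (formula (7)), where s ∈ [1, s_max] is the kernel parameter; further transform formula (7) into formula (8); then calculate the cumulative distance (formula (9)); obtain the optimal synchronization path p = {p₁, p₂, …, p_H}, where H is the number of steps of the synchronization path, by minimizing the cumulative distance (formula (10)); match the test batch T point to point along the optimal synchronization path p, so that the data length of T is consistent with that of the reference batch R; on this basis, synchronize all other test batches in the test batch set by formulas (7) to (10), giving the synchronization result set of all test batches, whose i-th element is the synchronization result of test batch T_i;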

Step five: for the synchronization result of test batch T_i in the synchronization result set, define the path deviation ratio Z and the feature retention degree Q in terms of the total number of steps of the synchronization path and of the similarities between the synchronization result and the reference batch R and between the synchronization result and the test batch T_i; on this basis, calculate the average path deviation ratio and the average feature retention degree of the synchronization results of all test batches; for each value of the kernel parameter s in [1, s_max], calculate the synchronization performance evaluation index SPCI_s from the set of average path deviation ratios and the set of average feature retention degrees corresponding to the different kernel parameters; and select the optimal kernel parameter s* by maximizing the synchronization performance evaluation index;

Step six: under the optimal kernel parameter s*, synchronize all test batches in the test batch set by formulas (7) to (10), giving the synchronization result set of all test batches, whose i-th element is the optimal synchronization result of test batch T_i, and thereby obtain the synchronized batch data set of the intermittent process.