CN113221932A - Batch data synchronization method with unequal intermittent process lengths based on kernel dynamic time warping - Google Patents


Info

Publication number
CN113221932A
Authority
CN
China
Prior art keywords
batch
data
synchronization
test
kernel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011571610.9A
Other languages
Chinese (zh)
Inventor
王建林
邱科鹏
周新杰
王汝童
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology
Priority to CN202011571610.9A
Publication of CN113221932A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/21355 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis nonlinear criteria, e.g. embedding a manifold in a Euclidean space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Abstract

The invention discloses a method for synchronizing unequal-length batch data of an intermittent process based on kernel dynamic time warping, and belongs to the field of intermittent process data processing. First, the unequal-length batch data are projected into a high-dimensional feature space by a kernel method; then, a synchronization performance evaluation index is constructed and the synchronization path under the optimal kernel parameter is obtained; finally, the unequal-length batch data are synchronized by dynamic time warping. The method jointly considers the path deviation degree and the feature retention degree, improves the accuracy of the synchronization result for unequal-length batch data, and provides consistent process data for intermittent process modeling.

Description

Batch data synchronization method with unequal intermittent process lengths based on kernel dynamic time warping
Technical Field
The invention belongs to the field of intermittent process data processing, and particularly relates to a Kernel Dynamic Time Warping (KDTW) based method for synchronizing unequal-length batch data in an intermittent process.
Background
An intermittent process is a batch production process. Because operating conditions and raw material quality vary, intermittent production cannot be repeated exactly, so the data lengths of different batches are inconsistent. The resulting unequal-length batch data are irregular and cannot meet the requirement of data-driven intermittent process modeling that all batches have the same data length, which seriously affects the accuracy of data-driven intermittent process models. Solving the unequal-length batch data problem therefore provides consistent data for intermittent process modeling. Dynamic Time Warping (DTW) is widely applied to synchronize unequal-length batch data in intermittent processes. However, the Euclidean-distance-based similarity measure used by DTW cannot accurately reflect the nonlinear and high-dimensional characteristics of batch data and easily produces similarity values with large deviations, so the accuracy of the synchronization result is low.
Therefore, the invention provides a Kernel Dynamic Time Warping (KDTW) based method for synchronizing unequal-length batch data in an intermittent process. First, the unequal-length batch data are projected into a high-dimensional feature space by a kernel method; then, a Synchronization Performance Combination Index (SPCI) is constructed and the synchronization path under the optimal kernel parameter is obtained; finally, the unequal-length batch data are synchronized by dynamic time warping. The method jointly considers the path deviation degree and the feature retention degree, improves the accuracy of the synchronization result, and provides consistent process data for intermittent process modeling.
Disclosure of Invention
To improve the accuracy of the synchronization result for unequal-length batch data in an intermittent process, the invention provides a KDTW-based method for synchronizing such data, comprising the following steps:
Step one: collect multi-batch process data of the intermittent process, construct the batch data set, and standardize it;
Step two: select a reference batch based on the Max Principal Similarity Criterion (MPSC), and divide the batch data set into the reference batch and a test batch set;
Step three: set the kernel parameter range according to the data lengths of the reference batch and the test batch set;
Step four: obtain the synchronization results of the test batch set under different kernel parameter settings using the proposed KDTW method;
Step five: determine the optimal kernel parameter of the proposed KDTW method using the proposed SPCI;
Step six: under the optimal kernel parameter, synchronize the test batch set with the reference batch using the KDTW method to obtain the optimal synchronization result of the test batch set.
The first step specifically comprises:
Suppose that
Figure RE-GDA0003113999940000021
is the batch data set of the intermittent process, where I is the number of batches. Because process variables differ in units and amplitude, the batch data set must be standardized, i.e., each process variable is centered and made dimensionless. Let the standardized batch data set of the intermittent process be
Figure RE-GDA0003113999940000022
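The standardization in step one can be sketched as follows. This is a minimal illustration, not the patent's own formula: it assumes that the mean and standard deviation of each process variable are pooled over the samples of all batches, and the function name standardize_batches is hypothetical.

```python
import numpy as np

def standardize_batches(batches):
    """Center and scale each process variable to zero mean and unit variance.

    batches: list of arrays, batch i having shape (K_i, J), where the
    length K_i may differ between batches and J is the number of variables.
    ASSUMPTION: statistics are pooled over all samples of all batches.
    """
    stacked = np.vstack(batches)              # shape (sum of K_i, J)
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0, ddof=1)
    std[std == 0] = 1.0                       # leave constant variables unscaled
    scaled = [(b - mean) / std for b in batches]
    return scaled, mean, std
```

Each batch keeps its own length after scaling; only the units and amplitudes of the variables are made comparable.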
The second step specifically comprises:
Let
Figure RE-GDA0003113999940000023
be the similarity between batch data
Figure RE-GDA0003113999940000024
and batch data
Figure RE-GDA0003113999940000025
which is calculated as
Figure RE-GDA0003113999940000026
where
Figure RE-GDA0003113999940000027
and
Figure RE-GDA0003113999940000028
are the h-th covariance-matrix eigenvalues of
Figure RE-GDA0003113999940000029
and
Figure RE-GDA00031139999400000210
respectively, with h running up to the number of principal components, and
Figure RE-GDA00031139999400000211
and
Figure RE-GDA00031139999400000212
are the weighted loading matrices of
Figure RE-GDA00031139999400000213
and
Figure RE-GDA00031139999400000214
respectively, calculated as:
Figure RE-GDA00031139999400000215
Figure RE-GDA00031139999400000216
On this basis, the sum of similarities is calculated for each batch, i.e.
Figure RE-GDA00031139999400000217
The batch with the largest sum of similarities is selected as the reference batch R = {r_1, r_2, …, r_r}, i.e.
Figure RE-GDA00031139999400000218
where r is the data length of the reference batch. All batch data other than the reference batch R are regarded as test batches, and the test batch set is constructed as
Figure RE-GDA00031139999400000219
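The reference batch selection of step two can be illustrated with the sketch below. Equations (1) to (5) are not reproduced in the text, so the code substitutes a commonly used eigenvalue-weighted PCA similarity factor, which matches the ingredients named above (covariance eigenvalues, number of principal components, weighted loading matrices) but is an assumption rather than the patent's exact formula. The function names and the default of two principal components are hypothetical.

```python
import numpy as np

def weighted_loadings(batch, n_components):
    """Top principal directions of one batch, scaled by sqrt(eigenvalue)."""
    cov = np.cov(batch, rowvar=False)             # (J, J) covariance, J >= 2
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)      # guard tiny negative values
    return eigvecs[:, order] * np.sqrt(lam), lam

def pca_similarity(batch_a, batch_b, n_components=2):
    """ASSUMED similarity: eigenvalue-weighted PCA similarity factor."""
    wa, la = weighted_loadings(batch_a, n_components)
    wb, lb = weighted_loadings(batch_b, n_components)
    numerator = np.sum((wa.T @ wb) ** 2)          # trace(Wa^T Wb Wb^T Wa)
    return numerator / np.sum(la * lb)

def select_reference(batches, n_components=2):
    """Return the index of the batch with the largest similarity sum."""
    n = len(batches)
    sums = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                sums[i] += pca_similarity(batches[i], batches[j], n_components)
    return int(np.argmax(sums)), sums
```

Because the similarity is computed from covariance matrices, it does not require the two batches to have the same length, which is why it can be applied before synchronization.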
The third step specifically comprises:
The value interval of the kernel width parameter s of the Gaussian kernel function is set to [1, s_max], where s_max is calculated as
Figure RE-GDA0003113999940000031
where t_i is the data length of the i-th test batch T_i.
The fourth step specifically comprises:
Let T = {t_1, t_2, …, t_t} be a test batch in the test batch set
Figure RE-GDA0003113999940000032
where t is the data length of the test batch. First, the local similarity between the batch data R and T in the high-dimensional feature space is calculated based on the Gaussian kernel function, i.e.
Figure RE-GDA0003113999940000033
where s ∈ [1, s_max] is the kernel parameter. Equation (7) is further processed into
Figure RE-GDA0003113999940000034
Then, the cumulative distance is calculated:
Figure RE-GDA0003113999940000035
Finally, the optimal synchronization path p = {p_1, p_2, …, p_h, …, p_H} is calculated by minimizing the cumulative distance, where H is the number of steps of the synchronization path, i.e.
Figure RE-GDA0003113999940000036
The optimal synchronization path p is used to match the test batch T point to point, so that the data lengths of the test batch T and the reference batch R are consistent. On this basis, all other test batches in the test batch set are synchronized using equations (7) to (10), and the synchronization result set of all test batches is
Figure RE-GDA0003113999940000041
where
Figure RE-GDA0003113999940000042
is the synchronization result of test batch T_i.
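The procedure of step four can be sketched in Python under stated assumptions. The kernel-induced squared distance 2 - 2*exp(-||r - t||^2 / (2 s^2)) follows from a Gaussian kernel and is assumed here to play the role of equations (7) and (8). The three-direction recursion, the backtracking, and the averaging of test samples matched to the same reference sample are conventional DTW choices rather than details taken from the patent. The names kernel_distance, kdtw_path and synchronize are hypothetical.

```python
import numpy as np

def kernel_distance(ref, test, s):
    """Squared feature-space distance under a Gaussian kernel of width s."""
    sq = ((ref[:, None, :] - test[None, :, :]) ** 2).sum(axis=2)
    return 2.0 - 2.0 * np.exp(-sq / (2.0 * s ** 2))   # k(x, x) = 1

def kdtw_path(ref, test, s):
    """Minimum cumulative-distance path from (0, 0) to (r-1, t-1)."""
    d = kernel_distance(ref, test, s)
    r, t = d.shape
    acc = np.full((r, t), np.inf)
    acc[0, 0] = d[0, 0]
    for i in range(r):
        for j in range(t):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = d[i, j] + best
    # Backtrack from the end point to recover the path.
    i, j = r - 1, t - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path, acc[-1, -1]

def synchronize(ref, test, s):
    """Warp test to the length of ref.

    ASSUMPTION: test samples matched to the same reference index are averaged.
    """
    path, _ = kdtw_path(ref, test, s)
    out = np.zeros((ref.shape[0], test.shape[1]))
    counts = np.zeros(ref.shape[0])
    for i, j in path:
        out[i] += test[j]
        counts[i] += 1
    return out / counts[:, None], path
```

Calling synchronize(ref, test, s) with ref of shape (r, J) and test of shape (t, J) returns an array of shape (r, J) together with the path, whose length corresponds to the number of steps H.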
The fifth step specifically comprises:
Based on the synchronization result set
Figure RE-GDA0003113999940000043
for the synchronization result
Figure RE-GDA0003113999940000044
of test batch T_i, the path deviation ratio Z and the feature retention degree Q are respectively defined as:
Figure RE-GDA0003113999940000045
Figure RE-GDA0003113999940000046
where
Figure RE-GDA0003113999940000047
is the total number of steps of the synchronization path, and
Figure RE-GDA0003113999940000048
and
Figure RE-GDA0003113999940000049
are, respectively, the similarities between the synchronization result
Figure RE-GDA00031139999400000410
and the reference batch R and the test batch T_i. On this basis, the average path deviation ratio
Figure RE-GDA00031139999400000411
and the average feature retention degree
Figure RE-GDA00031139999400000412
of the synchronization results of all test batches are calculated, i.e.
Figure RE-GDA00031139999400000413
Figure RE-GDA00031139999400000414
For a kernel parameter s in the value range [1, s_max], the synchronization performance evaluation index SPCI_s corresponding to each value of s is calculated as
Figure RE-GDA00031139999400000415
Figure RE-GDA00031139999400000416
where
Figure RE-GDA00031139999400000417
is the set of average path deviation ratios corresponding to the different kernel parameters, and
Figure RE-GDA00031139999400000418
is the set of average feature retention degrees corresponding to the different kernel parameters. The optimal kernel parameter s* is selected by maximizing the synchronization performance evaluation index, i.e.
Figure RE-GDA00031139999400000419
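The selection of the optimal kernel parameter can be illustrated as a search over the candidate values of s. The exact forms of equations (11) to (17) are not reproduced in the text, so the scoring rule below is purely hypothetical: it min-max normalizes the average path deviation ratios and the average feature retention degrees over the candidate set and rewards high retention and low deviation with equal weight. Only the final step, choosing the s that maximizes the index, is taken from the description.

```python
import numpy as np

def min_max(values):
    """Scale values to [0, 1] over the candidate set (zeros if all equal)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

def select_kernel_parameter(s_grid, mean_deviation, mean_retention):
    """Pick the kernel parameter with the largest combined index.

    HYPOTHETICAL index: normalized retention minus normalized deviation.
    """
    score = min_max(mean_retention) - min_max(mean_deviation)
    best = int(np.argmax(score))
    return s_grid[best], score
```

For example, with s_grid = [1, 2, 3], mean_deviation = [0.5, 0.2, 0.4] and mean_retention = [0.6, 0.9, 0.7], the candidate s = 2 has both the lowest deviation and the highest retention and is therefore selected.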
The sixth step specifically includes:
Under the optimal kernel parameter s*, all test batches in the test batch set are synchronized using equations (7) to (10), and the synchronization result set of all test batches is
Figure RE-GDA00031139999400000420
where
Figure RE-GDA0003113999940000051
is the optimal synchronization result of test batch T_i. The synchronized unequal-length batch data set of the intermittent process is
Figure RE-GDA0003113999940000052
The advantages of the invention are as follows: to address the unequal-length batch data problem in intermittent processes, the intermittent process data are mapped from the original feature space to a high-dimensional feature space by a kernel method, and the optimal kernel parameter is obtained using the proposed SPCI, which considers the path deviation degree and the feature retention degree together. This improves the accuracy of the synchronization result for unequal-length batch data and provides consistent process data for intermittent process modeling.
Drawings
FIG. 1 is a flowchart of the KDTW-based method for synchronizing unequal-length batch data in an intermittent process according to the present invention;
FIG. 2 shows the process variable trajectories of all batch data: (a) to (o) are process variables v1 to v15, respectively;
FIG. 3 shows the sum of similarities between each batch and the other batches;
FIG. 4 shows the process variable trajectories of the reference batch data: (a) to (o) are process variables v1 to v15, respectively;
FIG. 5 shows the selection of the optimal kernel parameter;
FIG. 6 shows the DTW-based synchronization result for unequal-length batch data of the intermittent process: (a) synchronization path, (b) reference batch, (c) test batch, and (d) synchronization result;
FIG. 7 shows the KDTW-based synchronization result for unequal-length batch data of the intermittent process: (a) synchronization path, (b) reference batch, (c) test batch, and (d) synchronization result;
FIG. 8 compares the SPCI of the DTW-based and KDTW-based synchronization results for all unequal-length batch data.
Detailed Description
The present invention is further described with reference to the following examples and the accompanying drawings, which are not intended to limit the scope of the invention as claimed.
Examples
The semiconductor etching process is a typical batch process and an important part of semiconductor manufacturing. A process data set from a LAM 9600 TCP metal etcher at Texas Instruments (USA) is used. It contains 107 batches of unequal-length data, and some of the process variables of each batch are described in Table 1.
TABLE 1 semiconductor etch Process variables
Figure RE-GDA0003113999940000053
Figure RE-GDA0003113999940000061
The flow of applying the present invention to a semiconductor etching process is shown in fig. 1, and the specific steps are as follows:
Step one: the collected process data are
Figure RE-GDA0003113999940000062
and the standardized process variable data are shown in FIG. 2;
Step two: the reference batch is selected based on the MPSC using equations (1) to (5); the selection result and the corresponding process variable data are shown in FIG. 3 and FIG. 4, respectively;
Step three: the value range of the kernel parameter is first obtained as [1, 9] using equation (6), and the optimal kernel parameter of the KDTW method is then obtained using equations (7) to (17); as shown in FIG. 5, its value is 7;
Step four: with the kernel parameter set to 7, all test batches are synchronized based on KDTW using equations (7) to (10).
The method of the present invention is compared with the DTW-based method for synchronizing unequal-length batch data in an intermittent process. FIG. 6 and FIG. 7 show the respective synchronization results for the TCP top power process variable of test batch 1. As can be seen from FIG. 6(a), the DTW-based synchronization path has a large total number of steps, and runs of consecutive data sample points appear in the vertical direction, so the synchronization result loses part of the original data characteristics, as marked by the ellipses in FIG. 6(b). In contrast, FIG. 7 shows that the KDTW synchronization path has fewer total steps and that the synchronization result retains most of the original data characteristics. Together with the SPCI comparison of the DTW-based and KDTW-based synchronization results for all unequal-length batches in FIG. 8, this shows that the method of the present invention synchronizes unequal-length batch data more accurately.

Claims (1)

1. A method for synchronizing unequal-length batch data in an intermittent process based on kernel dynamic time warping, characterized in that the method comprises the following steps:
Step one: multi-batch process data of the intermittent process are collected as
Figure FDA0002862898580000011
where I is the number of batches; each process variable is centered and made dimensionless, and the standardized batch data set of the intermittent process is
Figure FDA0002862898580000012
Step two: let
Figure FDA0002862898580000013
be the similarity between batch data
Figure FDA0002862898580000014
and batch data
Figure FDA0002862898580000015
which is calculated as
Figure FDA0002862898580000016
where
Figure FDA0002862898580000017
and
Figure FDA0002862898580000018
are the h-th covariance-matrix eigenvalues of
Figure FDA0002862898580000019
and
Figure FDA00028628985800000110
respectively, with h running up to the number of principal components, and
Figure FDA00028628985800000111
and
Figure FDA00028628985800000112
are the weighted loading matrices of
Figure FDA00028628985800000113
and
Figure FDA00028628985800000114
respectively, calculated as:
Figure FDA00028628985800000115
Figure FDA00028628985800000116
on this basis, the sum of similarities is calculated for each batch, i.e.
Figure FDA00028628985800000117
the batch with the largest sum of similarities is selected as the reference batch R = {r_1, r_2, …, r_r}, i.e.
Figure FDA00028628985800000118
where r is the data length of the reference batch; all batch data other than the reference batch R are regarded as test batches, and the test batch set is constructed as
Figure FDA00028628985800000119
Step three: the value interval of the kernel width parameter s of the Gaussian kernel function is set to [1, s_max], where s_max is calculated as
Figure FDA00028628985800000120
where t_i is the data length of the i-th test batch T_i;
Step four: let T = {t_1, t_2, …, t_t} be a test batch in the test batch set
Figure FDA00028628985800000121
where t is the data length of the test batch; first, the local similarity between the batch data R and T in the high-dimensional feature space is calculated based on the Gaussian kernel function, i.e.
Figure FDA0002862898580000021
where s ∈ [1, s_max] is the kernel parameter; equation (7) is further processed into
Figure FDA0002862898580000022
then, the cumulative distance is calculated:
Figure FDA0002862898580000023
finally, the optimal synchronization path p = {p_1, p_2, …, p_h, …, p_H} is calculated by minimizing the cumulative distance, where H is the number of steps of the synchronization path, i.e.
Figure FDA0002862898580000024
the optimal synchronization path p is used to match the test batch T point to point, so that the data lengths of the test batch T and the reference batch R are consistent; on this basis, all other test batches in the test batch set are synchronized using equations (7) to (10), and the synchronization result set of all test batches is
Figure FDA0002862898580000025
where
Figure FDA0002862898580000026
is the synchronization result of test batch T_i;
Step five: based on the synchronization result set
Figure FDA0002862898580000027
for the synchronization result of test batch T_i, the path deviation ratio Z and the feature retention degree Q are respectively defined as:
Figure FDA0002862898580000028
Figure FDA0002862898580000029
where
Figure FDA00028628985800000210
is the total number of steps of the synchronization path, and
Figure FDA00028628985800000211
and
Figure FDA00028628985800000212
are, respectively, the similarities between the synchronization result
Figure FDA00028628985800000213
and the reference batch R and the test batch T_i; on this basis, the average path deviation ratio
Figure FDA00028628985800000214
and the average feature retention degree
Figure FDA00028628985800000215
of the synchronization results of all test batches are calculated, i.e.
Figure FDA0002862898580000031
Figure FDA0002862898580000032
for a kernel parameter s in the value range [1, s_max], the synchronization performance evaluation index SPCI_s corresponding to each value of s is calculated as
Figure FDA0002862898580000033
Figure FDA0002862898580000034
where
Figure FDA0002862898580000035
is the set of average path deviation ratios corresponding to the different kernel parameters, and
Figure FDA0002862898580000036
is the set of average feature retention degrees corresponding to the different kernel parameters; the optimal kernel parameter s* is selected by maximizing the synchronization performance evaluation index, i.e.
Figure FDA0002862898580000037
Step six: under the optimal kernel parameter s*, all test batches in the test batch set are synchronized using equations (7) to (10), and the synchronization result set of all test batches is
Figure FDA0002862898580000038
where
Figure FDA0002862898580000039
is the optimal synchronization result of test batch T_i; the synchronized unequal-length batch data set of the intermittent process is
Figure FDA00028628985800000310

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011571610.9A CN113221932A (en) 2020-12-27 2020-12-27 Batch data synchronization method with unequal intermittent process lengths based on kernel dynamic time warping

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011571610.9A CN113221932A (en) 2020-12-27 2020-12-27 Batch data synchronization method with unequal intermittent process lengths based on kernel dynamic time warping

Publications (1)

Publication Number Publication Date
CN113221932A (en) 2021-08-06

Family

ID=77085843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011571610.9A Pending CN113221932A (en) 2020-12-27 2020-12-27 Batch data synchronization method with unequal intermittent process lengths based on kernel dynamic time warping

Country Status (1)

Country Link
CN (1) CN113221932A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070288101A1 (en) * 2006-06-08 2007-12-13 Liu Hugh H T Method, system and computer program for generic synchronized motion control for multiple dynamic systems
CN106354889A (en) * 2016-11-07 2017-01-25 北京化工大学 Batch process unequal-length time period synchronization method based on LWPT-DTW (lifting wavelet package transform-dynamic time warping)
CN106990768A (en) * 2017-05-21 2017-07-28 北京工业大学 MKPCA batch process fault monitoring methods based on Limited DTW

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
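Qiu Kepeng (邱科鹏): "Research on soft sensor modeling methods for batch processes with irregular data", China Doctoral Dissertations Full-text Database *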

Similar Documents

Publication Publication Date Title
CN101458522A (en) Multi-behavior process monitoring method based on pivot analysis and vectorial data description support
CN111638707B (en) Intermittent process fault monitoring method based on SOM clustering and MPCA
US7313454B2 (en) Method and apparatus for classifying manufacturing outputs
CN104834923B (en) Fingerprint image method for registering based on global information
CN116167640B (en) LCP film production quality detection data analysis method and system
CN109670687B (en) Quality analysis method based on particle swarm optimization support vector machine
CN108241925A (en) A kind of discrete manufacture mechanical product quality source tracing method based on outlier detection
CN114700587B (en) Missing welding defect real-time detection method and system based on fuzzy inference and edge calculation
CN103218837B (en) The histogrammic method for drafting of a kind of unequal interval based on empirical distribution function
CN108537249B (en) Industrial process data clustering method for density peak clustering
CN116821832A (en) Abnormal data identification and correction method for high-voltage industrial and commercial user power load
CN113935535A (en) Principal component analysis method for medium-and-long-term prediction model
CN111783336A (en) Uncertain structure frequency response dynamic model correction method based on deep learning theory
CN113221932A (en) Batch data synchronization method with unequal intermittent process lengths based on kernel dynamic time warping
CN116740053B (en) Management system of intelligent forging processing production line
CN112200252A (en) Joint dimension reduction method based on probability box global sensitivity analysis and active subspace
CN110020680B (en) PMU data classification method based on random matrix theory and fuzzy C-means clustering algorithm
CN111599348A (en) Automatic segmentation method and system for machine tool machining process monitoring signals
CN109212751A (en) A kind of analysis method of free form surface tolerance
CN114580503A (en) DP-SVM-based large-scale instrument man-hour calculation method
CN113190728A (en) Oil-immersed transformer fault diagnosis method based on cluster optimization
CN117708691B (en) Intermittent process monitoring method, storage medium and computer equipment
CN112183569A (en) FDA and SOM based intermittent industrial process reaction phase clustering and fault classification visualization
CN113051810B (en) Space division process PWA model identification method based on constrained grid hierarchical clustering
CN117828464B (en) Fan fault diagnosis method and diagnosis module based on local linear embedding algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination