CN113221932A - Batch data synchronization method with unequal intermittent process lengths based on kernel dynamic time warping - Google Patents
- Publication number
- CN113221932A
- Authority
- CN
- China
- Prior art keywords
- batch
- data
- synchronization
- test
- kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
- G06F18/21355—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis nonlinear criteria, e.g. embedding a manifold in a Euclidean space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Abstract
The invention discloses a kernel dynamic time warping (KDTW) based method for synchronizing unequal-length batch data in batch (intermittent) processes, and belongs to the field of batch process data processing. First, the unequal-length batch data are projected into a high-dimensional feature space by a kernel method; then, a synchronization performance evaluation index is constructed and used to obtain the synchronization path under the optimal kernel parameter; finally, the unequal-length batch data are synchronized by dynamic time warping. The method jointly considers the path deviation degree and the feature retention degree, improves the accuracy of the synchronization of unequal-length batch data, and provides length-consistent process data for batch process modeling.
Description
Technical Field
The invention belongs to the field of batch (intermittent) process data processing, and in particular relates to a Kernel Dynamic Time Warping (KDTW) based method for synchronizing unequal-length batch data in a batch process.
Background
A batch (intermittent) process manufactures its product in repeated batches. However, because operating conditions and raw-material quality vary, batch production cannot be repeated exactly, so the data lengths of different batches differ. Such unequal-length batch data are a form of irregular data: they violate the requirement of data-driven batch process modeling that all modeling data have the same length, which seriously degrades the accuracy of data-driven batch process models. Solving the unequal-length problem therefore supplies consistent data for batch process modeling. Dynamic Time Warping (DTW) is widely used to synchronize unequal-length batch data. In DTW, however, the similarity between batches is measured by the Euclidean distance, which cannot accurately reflect the nonlinear and high-dimensional characteristics of batch data; the resulting similarity values can deviate considerably, which lowers the accuracy of the synchronization result.
To address this, the invention provides a KDTW-based method for synchronizing unequal-length batch data in a batch process. First, the unequal-length batch data are projected into a high-dimensional feature space through a kernel method; then, a Synchronization Performance Combination Index (SPCI) is constructed and used to obtain the synchronization path under the optimal kernel parameter; finally, the unequal-length batch data are synchronized by dynamic time warping. The method jointly considers the path deviation degree and the feature retention degree, improves the accuracy of the synchronization of unequal-length batch data, and provides length-consistent process data for batch process modeling.
Disclosure of Invention
The invention provides a KDTW-based method for synchronizing unequal-length batch data in a batch process, with the aim of improving the accuracy of the synchronization result. The method comprises the following steps:
Step one: collect multi-batch process data from the batch process, construct the batch data set, and standardize it;
Step two: select a reference batch based on the Max Principal Similarity Criterion (MPSC), and divide the batch data set into the reference batch and a test batch set;
Step three: set the kernel parameter range according to the data lengths of the reference batch and the test batches;
Step four: use KDTW to obtain the synchronization results of the test batch set under different kernel parameter settings;
Step five: determine the optimal kernel parameter of KDTW using the proposed SPCI;
Step six: under the optimal kernel parameter, synchronize the test batch set to the reference batch by KDTW to obtain the optimal synchronization result of the test batch set.
The first step specifically comprises:
Suppose the batch data set of a batch process contains I batches, where I is the number of batches. Because different process variables differ in units and magnitudes, the batch data set must be standardized, i.e., each process variable is mean-centered and made dimensionless, which yields the standardized batch data set of the batch process.
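Step one does not state the exact scaling formula. A minimal Python sketch, assuming z-score standardization of each process variable with the mean and standard deviation pooled over every sample of every batch (one common choice for unequal-length batches), might look as follows. The function name `standardize_batches` is hypothetical.

```python
import numpy as np


def standardize_batches(batches):
    """Z-score every process variable with statistics pooled over all batches.

    batches: list of arrays, each of shape (n_samples_i, n_variables); the
    n_samples_i may differ between batches (unequal-length batch data).
    ASSUMPTION: pooled mean/standard deviation is one common choice; the
    patent only says each variable is centered and made dimensionless.
    """
    pooled = np.vstack(batches)                 # every sample of every batch
    mean = pooled.mean(axis=0)
    std = pooled.std(axis=0, ddof=1)
    std[std == 0] = 1.0                         # constant variables: avoid dividing by 0, they become all zeros
    return [(batch - mean) / std for batch in batches]
```

Pooling the statistics keeps every batch on the same scale, so distances computed in later steps compare like with like.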
The second step specifically comprises:
In formula (1), the two eigenvalue terms are the h-th covariance-matrix eigenvalues of the two batches being compared, $a$ is the number of principal components, and the weighted loading matrices of the two batches are calculated by formula (2).
On this basis, the similarity sum of each batch with respect to all other batches is calculated.
The batch with the largest similarity sum is selected as the reference batch $R = \{r_1, r_2, \ldots, r_r\}$,
where $r$ is the data length of the reference batch. All batches other than the reference batch $R$ are treated as test batches and together form the test batch set.
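Formulas (1) to (5) are not reproduced here. Their description (covariance eigenvalues, $a$ principal components, weighted loading matrices) resembles the eigenvalue-weighted PCA similarity factor, so the sketch below assumes that form: with each loading vector scaled by the square root of its eigenvalue to give $W_i$, the similarity is $\|W_i^\top W_j\|_F^2 / \sum_h \lambda_{i,h}\lambda_{j,h}$, which equals 1 for identical PCA models. The notation and all function names are hypothetical, and the patent's actual formula may differ.

```python
import numpy as np


def pca_model(batch, a):
    """Top-a eigenvalues and loading vectors of one batch's covariance matrix."""
    cov = np.cov(batch, rowvar=False)               # variables are columns
    eigvals, eigvecs = np.linalg.eigh(cov)          # symmetric matrix, ascending eigenvalues
    eigvals = np.clip(eigvals, 0.0, None)           # remove tiny negative round-off
    order = np.argsort(eigvals)[::-1][:a]           # a largest principal components
    return eigvals[order], eigvecs[:, order]


def weighted_pca_similarity(batch_i, batch_j, a):
    """ASSUMED form of formula (1): eigenvalue-weighted PCA similarity factor."""
    lam_i, L_i = pca_model(batch_i, a)
    lam_j, L_j = pca_model(batch_j, a)
    W_i = L_i * np.sqrt(lam_i)                      # weighted loading matrices (hypothetical form of formula (2))
    W_j = L_j * np.sqrt(lam_j)
    return np.sum((W_i.T @ W_j) ** 2) / np.sum(lam_i * lam_j)


def select_reference_batch(batches, a):
    """Index of the batch whose similarity sum to all other batches is largest."""
    n = len(batches)
    sums = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                sums[i] += weighted_pca_similarity(batches[i], batches[j], a)
    return int(np.argmax(sums)), sums
```

Under this assumed similarity, the selected reference is the batch whose principal subspace agrees most, in aggregate, with those of the other batches; PCA works on each batch separately, so unequal lengths pose no problem here.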
The third step specifically comprises:
The value interval of the kernel width parameter $s$ of the Gaussian kernel function is set to $[1, s_{\max}]$, where $s_{\max}$ is calculated by formula (6),
in which $t_i$ is the data length of the $i$-th test batch $T_i$.
The fourth step specifically comprises:
Let $T = \{t_1, t_2, \ldots, t_t\}$ be a test batch in the test batch set, where $t$ is the data length of the test batch. First, the local similarity between batch data $R$ and $T$ in the high-dimensional feature space is calculated based on the Gaussian kernel function, formula (7),
where $s \in [1, s_{\max}]$ is the kernel parameter. Formula (7) is then further transformed into formula (8).
Next, the cumulative distance is calculated by formula (9).
Finally, the optimal synchronization path $p = \{p_1, p_2, \ldots, p_h, \ldots, p_H\}$, where $H$ is the number of steps of the synchronization path, is obtained by minimizing the cumulative distance, as in formula (10).
The optimal synchronization path $p$ is used to match the test batch $T$ to the reference batch $R$ point by point, so that $T$ has the same data length as $R$. On this basis, all other test batches in the test batch set are synchronized in the same way by formulas (7) to (10), which yields the synchronization result set of all test batches; each element of this set is the synchronization result of the corresponding test batch $T_i$.
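As a rough illustration of step four, the sketch below assumes the Gaussian kernel $k(x, y) = \exp(-\|x-y\|^2 / (2s^2))$ and converts it into the squared feature-space distance $\|\phi(x)-\phi(y)\|^2 = k(x,x) + k(y,y) - 2k(x,y) = 2 - 2k(x,y)$; whether formula (8) takes exactly this form is an assumption. The cumulative distance uses the classic DTW recurrence with horizontal, vertical, and diagonal steps. The patent does not say how point-to-point matching resolves several test samples mapped to one reference sample, so the sketch averages them; that choice and all function names are hypothetical.

```python
import numpy as np


def gaussian_kernel_distance(R, T, s):
    """Squared feature-space distance 2 - 2*k(r_i, t_j), shape (len(R), len(T)).

    ASSUMPTION: k(x, y) = exp(-||x - y||^2 / (2 s^2)), so k(x, x) = 1.
    """
    sq = ((R[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1)   # squared Euclidean distances
    return 2.0 - 2.0 * np.exp(-sq / (2.0 * s ** 2))


def dtw_path(local):
    """Classic DTW: cumulative distance with (1,0), (0,1), (1,1) steps, then backtracking."""
    r, t = local.shape
    D = np.full((r + 1, t + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, r + 1):
        for j in range(1, t + 1):
            D[i, j] = local[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    i, j, path = r, t, []
    while i > 0 and j > 0:                           # walk back from (r, t) to (1, 1)
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): D[i - 1, j - 1], (i - 1, j): D[i - 1, j], (i, j - 1): D[i, j - 1]}
        i, j = min(steps, key=steps.get)             # ties favor the diagonal step
    return path[::-1], D[r, t]


def kdtw_synchronize(R, T, s):
    """Warp test batch T onto the time axis of reference batch R (result has len(R) rows)."""
    path, cost = dtw_path(gaussian_kernel_distance(R, T, s))
    aligned = np.zeros_like(R, dtype=float)
    counts = np.zeros(len(R))
    for i, j in path:                                # HYPOTHETICAL rule: average test samples matched to index i
        aligned[i] += T[j]
        counts[i] += 1
    return aligned / counts[:, None], path, cost
```

For example, a test batch that lingers at its start and end, `T = [0, 0, 1, 2, 3, 3]`, is warped onto the four-sample reference `R = [0, 1, 2, 3]` with zero cumulative distance, and averaging the repeated samples reproduces `R`. The double loop costs O(rt) time and memory.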
The fifth step specifically comprises:
Given the synchronization result set, the path deviation ratio $Z$ and the feature retention degree $Q$ are defined for the synchronization result of test batch $T_i$.
In these definitions, the quantities involved are the total number of steps of the synchronization path and the similarities between the synchronization result and, respectively, the reference batch $R$ and the test batch $T_i$. On this basis, the average path deviation ratio and the average feature retention degree over the synchronization results of all test batches are calculated.
For each value of the kernel parameter $s$ in the range $[1, s_{\max}]$, the corresponding synchronization performance evaluation index $\mathrm{SPCI}_s$ is calculated.
This calculation uses the set of average path deviation ratios and the set of average feature retention degrees obtained under the different kernel parameters. The optimal kernel parameter $s^*$ is then selected by maximizing the synchronization performance evaluation index.
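The exact $Z$, $Q$, and SPCI formulas are not reproduced here, so the sketch below only fixes the parts the text states: per-batch $Z$ and $Q$ values are averaged over the test batches for each kernel parameter, the averages are combined into one SPCI value per $s$, and $s^*$ is the maximizer. The combination `hypothetical_spci` (min-max normalizing both averages across the kernel grid, then subtracting normalized deviation from normalized retention) is an illustrative placeholder, not the patent's index; both function names are hypothetical.

```python
import numpy as np


def select_kernel_parameter(s_values, per_batch_scores, spci):
    """Return s* = argmax_s SPCI_s and the SPCI value of every candidate s.

    s_values: candidate kernel parameters, e.g. list(range(1, s_max + 1)).
    per_batch_scores: dict mapping s -> list of (Z_i, Q_i), one pair per test batch.
    spci: callable(z_bar_set, q_bar_set) -> one SPCI value per candidate s.
    """
    z_bar = np.array([np.mean([z for z, _ in per_batch_scores[s]]) for s in s_values])
    q_bar = np.array([np.mean([q for _, q in per_batch_scores[s]]) for s in s_values])
    scores = np.asarray(spci(z_bar, q_bar))
    return s_values[int(np.argmax(scores))], scores


def hypothetical_spci(z_bar, q_bar):
    """HYPOTHETICAL placeholder, NOT the patent's SPCI formula:
    high average feature retention is rewarded, high average path deviation is penalized."""
    def norm(v):
        span = v.max() - v.min()
        return np.zeros_like(v) if span == 0 else (v - v.min()) / span
    return norm(q_bar) - norm(z_bar)
```

Passing a different `spci` callable swaps in another combination without touching the averaging and selection logic.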
The sixth step specifically includes:
Under the optimal kernel parameter $s^*$, all test batches in the test batch set are synchronized by formulas (7) to (10), which yields the optimal synchronization result of each test batch $T_i$. From these results, the synchronized, equal-length batch data set of the batch process is obtained.
The advantages of the invention are as follows. To handle unequal-length batch data in batch processes, the invention uses a kernel method to map the process data from the original feature space into a high-dimensional feature space, and obtains the optimal kernel parameter with the proposed SPCI, which accounts for both the path deviation degree and the feature retention degree. This improves the accuracy of the synchronization of unequal-length batch data and provides length-consistent process data for batch process modeling.
Drawings
Fig. 1 is a flowchart of the KDTW-based method for synchronizing unequal-length batch data in a batch process according to the present invention;
Fig. 2 shows the process-variable trajectories of all batches; panels (a) to (o) correspond to process variables v1 to v15, respectively;
Fig. 3 shows the sum of the similarities between each batch and all other batches;
Fig. 4 shows the process-variable trajectories of the reference batch; panels (a) to (o) correspond to process variables v1 to v15, respectively;
Fig. 5 illustrates the selection of the optimal kernel parameter;
Fig. 6 shows the DTW-based synchronization result of unequal-length batch data: (a) synchronization path, (b) reference batch, (c) test batch, and (d) synchronization result;
Fig. 7 shows the KDTW-based synchronization result of unequal-length batch data: (a) synchronization path, (b) reference batch, (c) test batch, and (d) synchronization result;
Fig. 8 compares the SPCI of the synchronization results of all unequal-length batches obtained by DTW and by KDTW.
Detailed Description
The present invention is further described with reference to the following examples and the accompanying drawings, which are not intended to limit the scope of the invention as claimed.
Examples
Semiconductor etching is a typical batch process and an important part of semiconductor manufacturing. A process data set from the LAM9600TCP metal etcher at Texas Instruments (USA) was used. It contains 107 batches of unequal length; some of the process variables recorded for each batch are listed in Table 1.
Table 1. Semiconductor etch process variables
The procedure for applying the invention to the semiconductor etching process is shown in Fig. 1; the specific steps are as follows:
Step one: the collected process data are standardized; the standardized process-variable data are shown in Fig. 2;
Step two: the reference batch is selected based on the MPSC using formulas (1) to (5); the selection result and the process-variable data of the reference batch are shown in Fig. 3 and Fig. 4, respectively;
Step three: first, the value range of the kernel parameter is obtained as [1, 9] by formula (6); then, the optimal kernel parameter of KDTW is obtained by formulas (7) to (17), and its value is 7, as shown in Fig. 5;
Step four: with the kernel parameter set to 7, all test batches are synchronized by KDTW using formulas (7) to (10).
The method of the invention is compared with DTW-based synchronization of unequal-length batch data. Figs. 6 and 7 show the synchronization results of the TCP top power process variable of test batch 1 obtained by DTW and by KDTW, respectively. As Fig. 6(a) shows, the DTW synchronization path has a large total number of steps, and runs of consecutive data samples lie along the vertical direction, so the synchronization result loses part of the original data characteristics, as marked by the ellipses in Fig. 6(b). By contrast, Fig. 7 shows that the KDTW synchronization path has fewer total steps and that its synchronization result retains most of the original data characteristics. Together with the SPCI comparison of the synchronization results of all unequal-length batches obtained by DTW and KDTW in Fig. 8, this shows that the method of the invention synchronizes unequal-length batch data more accurately.
Claims (1)
1. A kernel dynamic time warping based method for synchronizing unequal-length batch data in a batch process, characterized in that the method comprises the following steps:
Step one: collect multi-batch process data of the batch process, where I is the number of batches; mean-center each process variable and make it dimensionless, which yields the standardized batch data set of the batch process;
in formula (1), the two eigenvalue terms are the h-th covariance-matrix eigenvalues of the two batches being compared, $a$ is the number of principal components, and the weighted loading matrices of the two batches are calculated by formula (2);
on this basis, the similarity sum of each batch with respect to all other batches is calculated;
the batch with the largest similarity sum is selected as the reference batch $R = \{r_1, r_2, \ldots, r_r\}$,
where $r$ is the data length of the reference batch; all batches other than the reference batch $R$ are treated as test batches and form the test batch set;
Step three: set the value interval of the kernel width parameter $s$ of the Gaussian kernel function to $[1, s_{\max}]$, where $s_{\max}$ is calculated by formula (6),
in which $t_i$ is the data length of the $i$-th test batch $T_i$;
Step four: let $T = \{t_1, t_2, \ldots, t_t\}$ be a test batch in the test batch set, where $t$ is its data length; first, calculate the local similarity between batch data $R$ and $T$ in the high-dimensional feature space based on the Gaussian kernel function, formula (7),
where $s \in [1, s_{\max}]$ is the kernel parameter; further transform formula (7) into formula (8);
then, calculate the cumulative distance by formula (9);
finally, obtain the optimal synchronization path $p = \{p_1, p_2, \ldots, p_h, \ldots, p_H\}$, where $H$ is the number of steps of the synchronization path, by minimizing the cumulative distance, formula (10);
use the optimal synchronization path $p$ to match the test batch $T$ point by point, so that $T$ has the same data length as the reference batch $R$; on this basis, synchronize all other test batches in the test batch set by formulas (7) to (10), yielding the synchronization result set of all test batches, each element of which is the synchronization result of the corresponding test batch $T_i$;
Step five: given the synchronization result set, define the path deviation ratio $Z$ and the feature retention degree $Q$ for the synchronization result of test batch $T_i$;
in these definitions, the quantities involved are the total number of steps of the synchronization path and the similarities between the synchronization result and, respectively, the reference batch $R$ and the test batch $T_i$; on this basis, calculate the average path deviation ratio and the average feature retention degree over the synchronization results of all test batches;
for each value of the kernel parameter $s$ in the range $[1, s_{\max}]$, calculate the corresponding synchronization performance evaluation index $\mathrm{SPCI}_s$
from the set of average path deviation ratios and the set of average feature retention degrees obtained under the different kernel parameters; select the optimal kernel parameter $s^*$ by maximizing the synchronization performance evaluation index;
Step six: under the optimal kernel parameter $s^*$, synchronize all test batches in the test batch set by formulas (7) to (10), yielding the optimal synchronization result of each test batch $T_i$ and, from these results, the synchronized, equal-length batch data set of the batch process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011571610.9A CN113221932A (en) | 2020-12-27 | 2020-12-27 | Batch data synchronization method with unequal intermittent process lengths based on kernel dynamic time warping |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113221932A (en) | 2021-08-06 |
Family
ID=77085843
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011571610.9A Pending CN113221932A (en) | 2020-12-27 | 2020-12-27 | Batch data synchronization method with unequal intermittent process lengths based on kernel dynamic time warping |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113221932A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070288101A1 (en) * | 2006-06-08 | 2007-12-13 | Liu Hugh H T | Method, system and computer program for generic synchronized motion control for multiple dynamic systems |
CN106354889A (en) * | 2016-11-07 | 2017-01-25 | 北京化工大学 | Batch process unequal-length time period synchronization method based on LWPT-DTW (lifting wavelet package transform-dynamic time warping) |
CN106990768A (en) * | 2017-05-21 | 2017-07-28 | 北京工业大学 | MKPCA batch process fault monitoring methods based on Limited DTW |
- 2020-12-27: application CN202011571610.9A filed (CN); CN113221932A pending
Non-Patent Citations (1)
Title |
---|
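Qiu Kepeng (邱科鹏): "Research on Soft-Sensor Modeling Methods for Batch Processes with Irregular Data" (非规则数据下的间歇过程软测量建模方法研究), China Doctoral Dissertations Full-text Database * |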
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101458522A (en) | Multi-behavior process monitoring method based on pivot analysis and vectorial data description support | |
CN111638707B (en) | Intermittent process fault monitoring method based on SOM clustering and MPCA | |
US7313454B2 (en) | Method and apparatus for classifying manufacturing outputs | |
CN104834923B (en) | Fingerprint image method for registering based on global information | |
CN116167640B (en) | LCP film production quality detection data analysis method and system | |
CN109670687B (en) | Quality analysis method based on particle swarm optimization support vector machine | |
CN108241925A (en) | A kind of discrete manufacture mechanical product quality source tracing method based on outlier detection | |
CN114700587B (en) | Missing welding defect real-time detection method and system based on fuzzy inference and edge calculation | |
CN103218837B (en) | The histogrammic method for drafting of a kind of unequal interval based on empirical distribution function | |
CN108537249B (en) | Industrial process data clustering method for density peak clustering | |
CN116821832A (en) | Abnormal data identification and correction method for high-voltage industrial and commercial user power load | |
CN113935535A (en) | Principal component analysis method for medium-and-long-term prediction model | |
CN111783336A (en) | Uncertain structure frequency response dynamic model correction method based on deep learning theory | |
CN113221932A (en) | Batch data synchronization method with unequal intermittent process lengths based on kernel dynamic time warping | |
CN116740053B (en) | Management system of intelligent forging processing production line | |
CN112200252A (en) | Joint dimension reduction method based on probability box global sensitivity analysis and active subspace | |
CN110020680B (en) | PMU data classification method based on random matrix theory and fuzzy C-means clustering algorithm | |
CN111599348A (en) | Automatic segmentation method and system for machine tool machining process monitoring signals | |
CN109212751A (en) | A kind of analysis method of free form surface tolerance | |
CN114580503A (en) | DP-SVM-based large-scale instrument man-hour calculation method | |
CN113190728A (en) | Oil-immersed transformer fault diagnosis method based on cluster optimization | |
CN117708691B (en) | Intermittent process monitoring method, storage medium and computer equipment | |
CN112183569A (en) | FDA and SOM based intermittent industrial process reaction phase clustering and fault classification visualization | |
CN113051810B (en) | Space division process PWA model identification method based on constrained grid hierarchical clustering | |
CN117828464B (en) | Fan fault diagnosis method and diagnosis module based on local linear embedding algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |