CN116361722A - Multi-fault classification method based on an improved linear local tangent space alignment model
- Publication number
- CN116361722A (CN202310314884.7A)
- Authority
- CN
- China
- Prior art keywords
- samples
- local
- space
- neighborhood
- fault
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24143—Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
Abstract
The invention relates to the technical field of industrial process monitoring, and in particular to a multi-fault classification method based on an improved linear local tangent space alignment model. A new weight matrix is introduced to represent the local positional relationship, on the latent process manifold, between samples in a neighborhood that carry different class labels and samples that carry the same class label. The improved WLLTSA preserves the local geometric characteristics of the multi-fault process data manifold and corrects the distorted manifold structure that the original WLLTSA produces because it does not distinguish the class labels of the samples within a neighborhood. Building on WLLTSA, the method fuses the locality of the manifold geometry with the global discriminability of the data to build a fault classification model that captures the intrinsic characteristics of the different operating modes of the process and reflects the discriminative feature information of multi-fault process data. The fault type of a test sample is identified by minimizing the Euclidean distance, on the low-dimensional latent manifold, from the test sample to the known training samples of each class. The method is suitable for multi-fault classification and detection in high-dimensional, multi-modal industrial processes.
Description
Technical Field
The invention relates to the technical field of industrial process monitoring, and in particular to a multi-fault classification method based on an improved linear local tangent space alignment model.
Background
Process monitoring plays an important role in modern industrial production, for example in guaranteeing production safety and improving yield. With the development of distributed control systems, production scale and operational complexity have increased dramatically, and industrial processes now collect large amounts of high-dimensional process data. Moreover, because the grade and output of the products are continually adjusted to follow market demand and seasonal effects, process parameters such as product composition, process set points and feed ratios also fluctuate, and modern industrial processes switch among several different operating modes. These random variations in the manufacturing process cause the process data to exhibit non-Gaussian and multi-modal characteristics. Engineers need to classify the fault data generated during production in order to identify the different fault types and then locate the source of the fault when the process is out of control. Effective fault classification and detection therefore helps keep the industrial process stable and the product quality consistent. Although data-driven multivariate statistical process control (MSPC) has been applied successfully to process monitoring, the mean and covariance of multi-modal, non-Gaussian process data change significantly, and conventional MSPC methods ignore the non-Gaussian distribution and multi-modal characteristics of the process variables, which may degrade monitoring performance. Manifold learning methods have recently been used in process monitoring because they explain the local relationships among data samples well and allow a more accurate monitoring model to be built.
Local tangent space alignment (LTSA) is widely used because of its simple geometric intuition and ease of implementation. LTSA first approximates the local tangent space in the neighborhood of each sample using principal component analysis (PCA), then aligns the resulting local coordinates into a global coordinate system, and finally constructs monitoring indices, achieving good monitoring performance. LTSA is an effective dimensionality reduction method for a training set, but it does not yield an explicit mapping for a new test set. Linear local tangent space alignment (LLTSA) is a linearized variant of LTSA that establishes an explicit mapping between the original data space and the feature space and achieves better monitoring performance than LTSA.
However, when the process data are sparse, non-Gaussian or noisy, the local coordinate system obtained by approximating the local tangent space with PCA is not accurate enough. The resulting tangent space approximation cannot describe the geometric features of the latent manifold or preserve the local structure information in the low-dimensional space, which reduces the effectiveness of fault detection.
Recently, Zohaib et al. (IEEE Transactions on Industrial Informatics, 2023 (1)) proposed weighted linear local tangent space alignment (WLLTSA), which uses a heat kernel matrix as weights to describe the importance of the different samples in a neighborhood: neighbors far from the center sample receive small weights and neighbors close to it receive large weights. This yields more reliable local tangent space coordinates, provides an explicit mapping for the test set, and gives a more accurate fault detection model with better monitoring performance. However, when multiple fault types occur successively in the production process, assigning the same weight to same-class and different-class neighbors causes both to be projected to the same position in the tangent space, so the extracted local tangent space coordinates are no longer reliable. In addition, WLLTSA focuses mainly on the local relationships within a class and ignores the similarity measure between classes, so the trained model does not necessarily classify the test set accurately. Extracting reliable process operation information and important process data features is therefore particularly important for identifying the class to which a fault belongs.
Disclosure of Invention
To overcome the above shortcomings of the prior art, the invention provides a multi-fault classification method based on an improved linear local tangent space alignment model. The proposed supervised weighted linear local tangent space alignment (SWLLTSA) model fully mines the characteristic information of the various classes of data collected in the production process, introduces class label information, and extracts the important multi-class operation information of the process from the local structural relationships of the samples and the divergence relationships between samples of different classes. Compared with the WLLTSA monitoring method, it achieves higher classification accuracy and stronger fault discrimination capability.
The technical scheme adopted by the invention is as follows: a multi-fault classification method based on an improved linear local tangent space alignment model, comprising the following steps:
Step 1: identify the k nearest neighbors of each center sample by Euclidean distance to form the neighborhood of that center sample, and build a neighborhood library; define a weight matrix for the samples in the neighborhood, in which the added class label information reflects the likelihood that a sample in the neighborhood belongs to a given class;
Step 2: use the defined weights to establish the local coordinate system of the tangent space and construct the improved WLLTSA method. Within a class, neighbors far from the center sample and neighbors close to it receive different weights, which refines the positional relationships of the neighbors in the neighborhood; the dissimilarity measure between neighbors of different classes is also taken into account, so that same-class and different-class samples in the neighborhood occupy distinct positions in the local tangent space;
Step 3: establish an intra-class objective function and an inter-class objective function, preserving the local structural relationships of the production process data and mining the global divergence information between samples of different classes;
Step 4: construct the SWLLTSA fault classification model, fuse the global divergence information and the local structural relationship information of the process, design the multi-fault classification index of the process, evaluate the operating mode of the production process, and identify the various fault types.
As a preferred technical scheme of the invention, the specific process of step 1 is as follows:
Step 1.1: sample N labeled samples to form a high-dimensional multi-modal process data set $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{P \times N}$. These samples lie on a latent manifold of the low-dimensional feature space $\mathbb{R}^{p}$, where $P$ and $p$ ($p < P$) denote the dimensions of the high-dimensional original input space and the low-dimensional feature mapping space, respectively. For each sample, its k nearest neighbors by Euclidean distance are determined, forming the neighborhood of the center sample $x_i$: $X_i = [x_{i1}, x_{i2}, \ldots, x_{ik}] \in \mathbb{R}^{P \times k}$.
Step 1.2: the process data set has C+1 class labels $\{l_1, l_2, \ldots, l_{C+1}\}$, which correspond to the normal operation samples collected in the industrial process and to C different process fault classes, respectively.
For a sample $x_i$, let $l(x_i) \in \{1, 2, \ldots, C+1\}$ denote its class label. The weight $w_{ij}$ between two samples $x_i$ and $x_j$ is defined as:
where the tuning parameter $\beta$ is set to the average Euclidean distance between the sample pairs in the neighborhood. With this weight, same-class neighbors close to the center sample receive larger weights than distant ones, so the neighborhood structure of the local tangent space is determined mainly by the nearby neighbors of the same class as the center sample, and the original process data are mapped appropriately onto the local tangent space. In addition, a dissimilarity measure between neighbors of different classes is included; it describes the dissimilarity between samples of different classes and stretches the distances between them, so that same-class and different-class samples in the neighborhood occupy different positions in the local tangent space. In short, the weight compresses the distances between same-class samples in the neighborhood and stretches the distances between different-class samples, reflecting their correct positions in the local tangent space.
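The neighborhood construction and weighting of step 1 can be illustrated with a minimal NumPy sketch. It is not the weight formula of the invention, which is not reproduced in this text. The sketch assumes a heat kernel $\exp(-d^2/\beta)$ for same-class neighbors and damps it by a hypothetical factor `gamma` for different-class neighbors; the function names, the `gamma` parameter, and the choice to exclude the center sample from its own neighborhood are likewise assumptions. Only the choice of $\beta$ as the mean pairwise Euclidean distance in the neighborhood is taken from the text.

```python
import numpy as np

def knn_neighborhoods(X, k):
    """Return an (N, k) index array with the k nearest neighbors of each sample.

    X has shape (P, N), one sample per column. The center sample is excluded
    from its own neighborhood (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def supervised_heat_kernel_weights(X, labels, i, nbrs, gamma=0.1):
    """Weights of the neighbors `nbrs` of center sample i (requires k >= 2).

    ASSUMED form: exp(-d^2 / beta) for same-class neighbors and
    gamma * exp(-d^2 / beta) for different-class neighbors.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Xi = X[:, nbrs]                                    # (P, k) neighborhood
    d = np.linalg.norm(Xi - X[:, [i]], axis=0)         # distances to the center
    # beta: mean Euclidean distance over all sample pairs in the neighborhood
    pair = np.linalg.norm(Xi[:, :, None] - Xi[:, None, :], axis=0)
    beta = pair[np.triu_indices(len(nbrs), k=1)].mean()
    w = np.exp(-d**2 / beta)
    same_class = labels[nbrs] == labels[i]
    return np.where(same_class, w, gamma * w)
```

With `gamma < 1`, a different-class neighbor at the same distance as a same-class neighbor contributes less to the local tangent space, which is one simple way to make the two kinds of neighbors land in different positions.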
As a preferred technical scheme of the invention, the improved WLLTSA method in step 2 is constructed as follows:
Step a: based on the newly defined weights, apply local PCA to approximate the tangent space of each neighborhood; introduce a local transformation matrix $Q_i$ that maps the neighborhood of each $x_i$ onto the local tangent space, establish the optimization function of the local tangent space, extract the local coordinate information, and solve for the local tangent space coordinates $\Theta_i$:
where $H_k = I - ee^{T}/k$ is the centering matrix, $I$ is the identity matrix, $e$ is the column vector whose elements are all 1, and $w_i = [w_{i1}, w_{i2}, \ldots, w_{ik}]$ holds the weights of the neighbor samples in the neighborhood; $Q_i$ is an orthonormal basis matrix of the tangent space, composed of the eigenvectors corresponding to the p largest eigenvalues obtained from the decomposition of the matrix $X_i H_k W_i$, where $W_i$ is the $k \times k$ diagonal matrix with $W_i(j,j) = w_{ij}$; $\Theta_i$ is the local coordinate information of $X_i$:
Step b: after the local structure information has been extracted, align the local coordinates of all samples into the global low-dimensional feature space and solve for the global coordinates $Y$ of all samples $X$.
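Step a can be illustrated with a short NumPy sketch. The optimization function and the expression for $\Theta_i$ are not reproduced in this text, so the sketch assumes the standard LTSA form: minimize $\lVert X_i H_k W_i - Q_i \Theta_i \rVert_F^2$ subject to $Q_i^{T} Q_i = I$. Under that assumption $Q_i$ consists of the p leading left singular vectors of $X_i H_k W_i$ and $\Theta_i = Q_i^{T} X_i H_k W_i$. The function name is hypothetical.

```python
import numpy as np

def local_tangent_coordinates(Xi, w, p):
    """Weighted local PCA on one neighborhood.

    Xi: (P, k) neighborhood matrix, w: (k,) neighbor weights,
    p: tangent space dimension (p <= min(P, k)).
    Returns Q (P, p) and Theta (p, k).
    """
    k = Xi.shape[1]
    H = np.eye(k) - np.ones((k, k)) / k   # centering matrix H_k = I - ee^T / k
    W = np.diag(w)                         # diagonal weight matrix, W(j, j) = w_ij
    M = Xi @ H @ W                         # centered, weighted neighborhood
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    Q = U[:, :p]                           # p leading left singular vectors
    Theta = Q.T @ M                        # ASSUMED form of the local coordinates
    return Q, Theta
```

The left singular vectors of $X_i H_k W_i$ are the eigenvectors of $X_i H_k W_i (X_i H_k W_i)^{T}$, which is one reading of the "eigenvectors of the decomposition" mentioned in the text.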
As a preferred technical scheme of the invention, the intra-class and inter-class objective functions in step 3 are established as follows:
Step A: align the local tangent space coordinates $\Theta_i$ to solve for the global coordinates $Y$, and establish the intra-class objective function, which preserves the local geometric relationships:
where a selection matrix $S = [S_1, \ldots, S_N]$ is introduced whose elements $S_i$ ($i = 1, \ldots, N$) are 0-1 selection matrices, so that $Y_i = Y S_i$ gives the global coordinates of the neighborhood of $x_i$; $L_i$ is a global transformation matrix whose optimal value is expressed through $\Theta_i^{+}$, the Moore-Penrose generalized inverse of $\Theta_i$; and $F = \mathrm{diag}(F_1, \ldots, F_N)$, where each $F_i$ is obtained by solving the local alignment problem of the $i$-th neighborhood;
Step B: the improved WLLTSA method first uses the improved weights to describe accurately the local structure information of each sample neighborhood in its local tangent space, then aligns the local tangent spaces of all samples into the global low-dimensional feature space, and finally finds a projection matrix $A$ that maps the high-dimensional process data set $X$ to the low-dimensional data set $Y$:
$Y = A^{T} X H_N$ (6);
Step C: minimize the local structure information of the process data to preserve the local relationships among same-class samples in the low-dimensional feature space, so that similar original samples are mapped onto a more compact low-dimensional manifold;
according to equation (6), equation (5) can be transformed into:
where $B = S F F^{T} S^{T}$, and the constraint $A^{T} X H_N X^{T} A = I_p$, i.e. $Y Y^{T} = I_p$ (the centering matrix $H_N$ is symmetric and idempotent), is used to determine $Y$ uniquely;
Step D: maximize the global separation of the data to separate the samples with different class labels in the low-dimensional feature space and enlarge the spacing between samples of different classes in the feature space, establishing the inter-class objective function:
where the objective is built on the global inter-class divergence matrix of the low-dimensional feature space data set $Y$, which is expressed as:
where the two mean vectors appearing in this expression are the mean of the samples of class $g$ and the mean of all samples in the low-dimensional feature space, respectively; the elements of the matrix $G$ satisfy:
according to $Y Y^{T} = I_p$ and $Y = A^{T} X H_N$, the divergence matrix becomes:
maximizing the global inter-class divergence matrix, formula (11) is rewritten as:
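The role of the matrix $G$ in step D can be checked numerically. The element-wise definition of $G$ is not reproduced in this text, so the sketch below assumes the common Fisher-type inter-class divergence matrix weighted by class size, $\sum_g N_g (m_g - m)(m_g - m)^{T}$, which can be written as $Y G Y^{T}$ with $G_{ij} = 1/N_g - 1/N$ when samples $i$ and $j$ both belong to class $g$ and $G_{ij} = -1/N$ otherwise. The function names are hypothetical and the invention's own $G$ may differ.

```python
import numpy as np

def class_indicator_G(labels):
    """Build the (N, N) matrix G of the ASSUMED Fisher-type form."""
    labels = np.asarray(labels)
    N = labels.size
    G = np.full((N, N), -1.0 / N)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        G[np.ix_(idx, idx)] += 1.0 / idx.size   # add 1/N_g inside each class block
    return G

def between_class_scatter(Y, labels):
    """Direct computation: sum over classes of N_g (m_g - m)(m_g - m)^T.

    Y has shape (p, N), one low-dimensional sample per column.
    """
    labels = np.asarray(labels)
    m = Y.mean(axis=1, keepdims=True)
    Sb = np.zeros((Y.shape[0], Y.shape[0]))
    for c in np.unique(labels):
        Yc = Y[:, labels == c]
        d = Yc.mean(axis=1, keepdims=True) - m
        Sb += Yc.shape[1] * (d @ d.T)
    return Sb
```

Under this assumption every row of $G$ sums to zero, so $H_N G H_N = G$ and substituting $Y = A^{T} X H_N$ leaves the matrix form unchanged.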
As a preferred technical scheme of the invention, the SWLLTSA fault classification model in step 4 is constructed as follows:
S1: construct the SWLLTSA model, fuse the local structure preservation information and the global separation information, extract effective low-dimensional feature information, and solve the following optimization function:
the optimization problem of equation (13) is converted into the following generalized eigenvalue problem:
$X H_N (G - B) H_N X^{T} \alpha = \lambda X H_N X^{T} \alpha$ (14)
according to $Y = A^{T} X H_N$, the constrained optimization problem of equation (13) is restated as:
i.e. solving the following eigenvalue problem:
$(G - B) y^{T} = \lambda y^{T}$ (16)
where $y^{T}$ is the eigenvector corresponding to the eigenvalue $\lambda$ of the eigenvalue problem (16); if $\alpha^{T} X H_N = y$, then $\alpha$ is the eigenvector of the eigenvalue problem (14) corresponding to the same eigenvalue $\lambda$;
if solving the eigenvalue problem (16) yields the sorted eigenvalues $\lambda_1 > \lambda_2 > \ldots > \lambda_p$, the corresponding eigenvectors are $y_1^{T}, y_2^{T}, \ldots, y_p^{T}$, and each $\alpha_i$ is then computed from:
where $\delta \geq 0$ is a regularization parameter;
$A = [\alpha_1, \alpha_2, \ldots, \alpha_p] = (X H_N X^{T} + \delta I)^{-1} X H_N Y_p$ (18)
$A$ determines the class directions of the SWLLTSA model, along which the separation between different classes decreases in order; for a test set $X_{new}$, the low-dimensional representation in the feature space is:
$Z = A^{T} X_{new} H_N$ (19);
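The two-stage solution of S1 (eigenvectors of $G - B$ from equation (16), then the regularized solve of equation (18), then the projection of equation (19)) can be sketched in NumPy as follows. The function names and the default value of `delta` are assumptions. The sketch also assumes that the centering matrix in equation (19) is sized to the number of columns of $X_{new}$, which the text does not state; with a single new sample that centering maps everything to zero, so a practical implementation might center with the training mean instead.

```python
import numpy as np

def fit_projection(X, G, B, p, delta=1e-3):
    """Return the projection matrix A (P, p) and the p largest eigenvalues.

    X: (P, N) training data; G, B: (N, N) symmetric matrices.
    """
    P, N = X.shape
    H = np.eye(N) - np.ones((N, N)) / N         # centering matrix H_N
    M = G - B
    M = (M + M.T) / 2.0                          # guard against round-off asymmetry
    vals, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:p]           # indices of the p largest
    Yp = vecs[:, order]                          # (N, p) eigenvectors, eq. (16)
    # Regularized solve of eq. (18): A = (X H X^T + delta I)^{-1} X H Y_p
    A = np.linalg.solve(X @ H @ X.T + delta * np.eye(P), X @ H @ Yp)
    return A, vals[order]

def project(A, X_new):
    """Low-dimensional representation Z = A^T X_new H, following eq. (19)."""
    n = X_new.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n          # ASSUMED: sized to X_new
    return A.T @ X_new @ H
```

Solving the $N \times N$ symmetric problem first and then a $P \times P$ linear system avoids a direct generalized eigendecomposition, and $\delta > 0$ keeps the system solvable when $X H_N X^{T}$ is singular.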
S2: design the multi-fault classification index of the SWLLTSA model to realize process monitoring;
In the modeling process, normal operation samples and fault samples of several different classes are collected from the production process to form the training set $X$. In the offline modeling stage, the proposed SWLLTSA model is used to preserve the local structure information of the original data and to separate the process data of different classes in the mapped space; the training set is projected onto the optimized low-dimensional feature space, and the normal operation samples and the different process fault classes are identified. Solving the objective optimization function constructed according to equations (13) to (17) gives the transformation matrix $A$ of the original space and the low-dimensional representation $Y$ of the training set $X$. In the real-time fault detection stage, a new measurement data set $X_{new}$ is first normalized, and the transformation matrix is then used to obtain its low-dimensional representation $Z = A^{T} X_{new} H_N$. To identify the operation type of the new data, the Euclidean distance between a new low-dimensional sample $Z_{new}$ and the low-dimensional representation of the training set in the mapped space is computed, expressed as:
where $y_i^{l_r}$ denotes the low-dimensional representation of the i-th data sample with class label $l_r$; to determine the fault type of $Z_{new}$, the following discrimination function is designed:
if and only if the distance between $Z_{new}$ and $y_i^{l_r}$ is the smallest, the new observation belongs to class label $l_r$. From the operation type associated with each class label, the process sample is identified as a normal operation sample or as a specific fault type, realizing multi-fault classification and detection of real-time data in the production process.
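The decision rule of S2 can be sketched as a nearest-training-sample classifier in the low-dimensional space. The distance formula and the discrimination function are not reproduced in this text, so this is a minimal sketch assuming the rule described in words above: each new sample receives the class label of the training sample at the smallest Euclidean distance. The function name is hypothetical.

```python
import numpy as np

def classify_nearest(Z_new, Y_train, train_labels):
    """Assign each new sample the label of its nearest training sample.

    Z_new: (p, n_new) new low-dimensional samples, one per column.
    Y_train: (p, N) low-dimensional training samples, one per column.
    train_labels: length-N class labels of the training samples.
    Returns (predicted labels, distance to the nearest training sample).
    """
    train_labels = np.asarray(train_labels)
    diff = Z_new[:, :, None] - Y_train[:, None, :]   # (p, n_new, N)
    dist = np.linalg.norm(diff, axis=0)              # (n_new, N) Euclidean distances
    nearest = np.argmin(dist, axis=1)
    rows = np.arange(dist.shape[0])
    return train_labels[nearest], dist[rows, nearest]
```

If label 0 is reserved for normal operation, a predicted label of 0 flags a normal sample and any other label names the fault class.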
As a preferred technical scheme of the invention, the neighborhood size is set to k = 12.
Compared with the prior art, the invention has the following beneficial effects. A new weight matrix is introduced to represent the local positional relationship, on the latent process manifold, between samples in a neighborhood that carry different class labels and samples that carry the same class label. The improved WLLTSA preserves the local geometric characteristics of the multi-fault process data manifold and corrects the distorted manifold structure that the original WLLTSA produces because it does not distinguish the class labels of the samples within a neighborhood. Building on WLLTSA, the method fuses the locality of the manifold geometry with the global discriminability of the data to build the SWLLTSA fault classification model, which captures the intrinsic characteristics of the different operating modes of the process and reflects the discriminative feature information of multi-fault process data. The fault type of a test sample is identified by minimizing the Euclidean distance, on the low-dimensional latent manifold, from the test sample to the known training samples of each class. The method is suitable for multi-fault classification and detection in high-dimensional, multi-modal processes.
Description of the Drawings
The accompanying drawings provide a further understanding of the invention and constitute a part of this specification; together with the embodiments, they serve to explain the invention.
FIG. 1 is a schematic diagram of the Tennessee Eastman industrial process;
FIG. 2 is a schematic diagram of an embodiment of the method of the present invention;
FIG. 3 shows the test results obtained on the TEP by the SWLLTSA method of the present invention;
FIG. 4 shows the test results obtained on the TEP by the WLLTSA method from the literature;
FIG. 5 shows the test results obtained on the TEP by the classical FDA method.
Detailed Description
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only illustrate the invention and are not intended to limit it.
As shown in FIG. 2, a multi-fault classification method based on an improved linear local tangent space alignment model comprises the following steps:
Step 1: identify the k nearest neighbors of each center sample by Euclidean distance to form the neighborhood of that center sample, and build a neighborhood library; define a weight matrix for the samples in the neighborhood, in which the added class label information reflects the likelihood that a sample in the neighborhood belongs to a given class;
Step 2: use the defined weights to establish the local coordinate system of the tangent space and construct the improved WLLTSA method. Within a class, neighbors far from the center sample and neighbors close to it receive different weights, which refines the positional relationships of the neighbors in the neighborhood; the dissimilarity measure between neighbors of different classes is also taken into account, so that same-class and different-class samples in the neighborhood occupy distinct positions in the local tangent space;
Step 3: establish an intra-class objective function and an inter-class objective function, preserving the local structural relationships of the production process data and mining the global divergence information between samples of different classes;
Step 4: construct the SWLLTSA fault classification model, fuse the global divergence information and the local structural relationship information of the process, design the multi-fault classification index of the process, evaluate the operating mode of the production process, and identify the various fault types.
The specific process of step 1 is as follows: Step 1.1: sample N labeled samples to form a high-dimensional multi-modal process data set $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{P \times N}$. These samples lie on a latent manifold of the low-dimensional feature space $\mathbb{R}^{p}$, where $P$ and $p$ ($p < P$) denote the dimensions of the high-dimensional original input space and the low-dimensional feature mapping space, respectively. For each sample, its k nearest neighbors by Euclidean distance are determined, forming the neighborhood of the center sample $x_i$: $X_i = [x_{i1}, x_{i2}, \ldots, x_{ik}] \in \mathbb{R}^{P \times k}$.
Step 1.2: the process data set has C+1 class labels $\{l_1, l_2, \ldots, l_{C+1}\}$, which correspond to the normal operation samples collected in the industrial process and to C different process fault classes, respectively.
For a sample $x_i$, let $l(x_i) \in \{1, 2, \ldots, C+1\}$ denote its class label. The weight $w_{ij}$ between two samples $x_i$ and $x_j$ is defined as:
where the tuning parameter $\beta$ is set to the average Euclidean distance between the sample pairs in the neighborhood. With this weight, same-class neighbors close to the center sample receive larger weights than distant ones, so the neighborhood structure of the local tangent space is determined mainly by the nearby neighbors of the same class as the center sample, and the original process data are mapped appropriately onto the local tangent space. In addition, a dissimilarity measure between neighbors of different classes is included; it describes the dissimilarity between samples of different classes and stretches the distances between them, so that same-class and different-class samples in the neighborhood occupy different positions in the local tangent space. In short, the weight compresses the distances between same-class samples in the neighborhood and stretches the distances between different-class samples, reflecting their correct positions in the local tangent space.
To preserve the local structure information of each neighborhood, an improved WLLTSA method is provided. The improved WLLTSA method in step 2 is constructed as follows:
Step a: based on the newly defined weights, apply local PCA to approximate the tangent space of each neighborhood; introduce a local transformation matrix $Q_i$ that maps the neighborhood of each $x_i$ onto the local tangent space, establish the optimization function of the local tangent space, extract the local coordinate information, and solve for the local tangent space coordinates $\Theta_i$:
where $H_k = I - ee^{T}/k$ is the centering matrix, $I$ is the identity matrix, $e$ is the column vector whose elements are all 1, and $w_i = [w_{i1}, w_{i2}, \ldots, w_{ik}]$ holds the weights of the neighbor samples in the neighborhood; $Q_i$ is an orthonormal basis matrix of the tangent space, composed of the eigenvectors corresponding to the p largest eigenvalues obtained from the decomposition of the matrix $X_i H_k W_i$, where $W_i$ is the $k \times k$ diagonal matrix with $W_i(j,j) = w_{ij}$; $\Theta_i$ is the local coordinate information of $X_i$:
Step b: after the local structure information has been extracted, align the local coordinates of all samples into the global low-dimensional feature space and solve for the global coordinates $Y$ of all samples $X$.
The intra-class and inter-class objective functions in step 3 are established as follows:
Step A: align the local tangent-space coordinates $\Theta_i$, solve for the global coordinates $Y$, and establish the intra-class objective function, which preserves the local geometric relationship:
where a selection matrix $S = [S_1, \ldots, S_N]$ is introduced, whose element $S_i$ ($i = 1, \ldots, N$) is a 0-1 selection matrix, so that $Y_i = Y S_i$ holds the global coordinates associated with $x_i$; $L_i$ is a global transformation matrix whose optimal value is expressed through $\Theta_i^{+}$, the Moore-Penrose generalized inverse of $\Theta_i$; and $F = \mathrm{diag}(F_1, \ldots, F_N)$, where each $F_i$ is obtained by solving the corresponding local problem;
Step B: the improved WLLTSA method first uses the improved weights to describe accurately the local structure information of each sample neighborhood in the local tangent space, then aligns the local tangent spaces of all samples into a global low-dimensional feature space, and finally finds a projection matrix $A$ that maps the high-dimensional process data set $X$ to the low-dimensional data set $Y$:
$$Y = A^T X H_N \qquad (6)$$
Step C: the local relations among samples of the same class are preserved in the low-dimensional feature space by minimizing the local-structure objective of the process data, so that original samples of the same class are mapped onto a more compact low-dimensional manifold;
According to equation (6), equation (5) can be transformed into:
where $B = S F F^T S^T$, and the constraint $A^T X H_N X^T A = I_p$, i.e. $Y Y^T = I_p$ (since $H_N$ is symmetric and idempotent), is imposed to determine $Y$ uniquely;
Step D: maximize the global separation of the data so that samples with different class labels are separated in the low-dimensional feature space and the margins between samples of different classes are enlarged, and establish the inter-class objective function:
where the matrix being maximized is the global inter-class divergence (scatter) matrix of the low-dimensional feature-space data set $Y$, expressed as:
where the two mean vectors are, respectively, the mean of the class-$g$ samples and the mean of all samples in the low-dimensional feature space; the elements of the matrix $G$ satisfy:
According to $Y Y^T = I_p$ and $Y = A^T X H_N$, the divergence matrix becomes:
Maximizing the global inter-class divergence matrix, formula (11) is rewritten as:
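The two matrices that enter the objective functions can be sketched as follows. Because the corresponding formulas are not reproduced in this text, the sketch rests on assumptions: the alignment matrix uses the standard LLTSA form $F_i = H_k (I - \Theta_i^{+} \Theta_i)$ with $B = S F F^T S^T$, and $G$ uses the usual Fisher weights ($1/n_g - 1/N$ for two samples of the same class $g$, $-1/N$ otherwise), chosen so that $Y G Y^T$ equals the inter-class divergence matrix. The function names are hypothetical.

```python
import numpy as np

def alignment_matrix(neighbors, thetas, N):
    """Accumulate B = sum_i S_i F_i F_i^T S_i^T (assumed standard LLTSA form)."""
    B = np.zeros((N, N))
    for idx, Theta in zip(neighbors, thetas):
        k = len(idx)
        H_k = np.eye(k) - np.ones((k, k)) / k
        # F_i removes the part of the centered global coordinates that a
        # linear map of the local coordinates Theta can already explain.
        F_i = H_k @ (np.eye(k) - np.linalg.pinv(Theta) @ Theta)
        # S_i places the k x k block at the rows/columns of the neighbors.
        B[np.ix_(idx, idx)] += F_i @ F_i.T
    return B

def between_class_matrix(labels):
    """G such that Y G Y^T is the between-class scatter of the columns of Y."""
    labels = np.asarray(labels)
    N = labels.size
    G = -np.ones((N, N)) / N
    for g in np.unique(labels):
        members = np.flatnonzero(labels == g)
        G[np.ix_(members, members)] += 1.0 / members.size
    return G

# Small synthetic check with random neighborhoods and local coordinates.
rng = np.random.default_rng(1)
N, p, k = 8, 2, 4
neighbors = [rng.choice(N, size=k, replace=False) for _ in range(N)]
thetas = [rng.normal(size=(p, k)) for _ in range(N)]
B = alignment_matrix(neighbors, thetas, N)

labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])
G = between_class_matrix(labels)
Y = rng.normal(size=(p, N))

# Direct computation of sum_g n_g (m_g - m)(m_g - m)^T for comparison.
S_b = np.zeros((p, p))
for g in np.unique(labels):
    Y_g = Y[:, labels == g]
    d = Y_g.mean(axis=1) - Y.mean(axis=1)
    S_b += Y_g.shape[1] * np.outer(d, d)
```

Under these assumptions `B` is symmetric positive semidefinite with the all-ones vector in its null space, and `Y @ G @ Y.T` reproduces `S_b`, which is why the inter-class term can be written as a trace over $G$.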
The SWLLTSA fault classification model in step 4 is constructed as follows:
Step S1: construct the SWLLTSA model, which fuses the local-structure preservation information with the global separation information, extracts effective low-dimensional feature information, and solves the following optimization function:
The optimization problem of equation (13) is converted into the following generalized eigenvalue problem:
$$X H_N (G - B) H_N X^T \alpha = \lambda X H_N X^T \alpha \qquad (14)$$
According to $Y = A^T X H_N$, the constrained optimization problem (13) is restated as:
that is, the following eigenvalue problem is solved:
$$(G - B)\, y^T = \lambda\, y^T \qquad (16)$$
where $y^T$ is the eigenvector corresponding to the eigenvalue $\lambda$ of problem (16); if $\alpha^T X H_N = y$, then $\alpha$ is the eigenvector of problem (14) corresponding to the same eigenvalue $\lambda$;
If the eigenvalues obtained by solving problem (16) are sorted as $\lambda_1 > \lambda_2 > \cdots > \lambda_p$, the corresponding eigenvectors are used to calculate:
where $\delta \geq 0$ is a regularization parameter;
$$A = [\alpha_1, \alpha_2, \ldots, \alpha_p] = (X H_N X^T + \delta I)^{-1} X H_N Y_p \qquad (18)$$
$A$ determines the class directions of the SWLLTSA model, along which the separation between different class labels decreases in order; for a test set $X_{new}$, the low-dimensional representation in the feature space is:
$$Z = A^T X_{new} H_N \qquad (19)$$
Step S2: design the multi-fault classification index of the SWLLTSA model to realize process monitoring;
In the modeling process, normal operation samples and fault samples with several different class labels are collected from the production process to form a training set. In the off-line modeling stage, the proposed SWLLTSA model is used to preserve the local structure information in the original data and to separate process samples of different classes in the mapping space: the training set is projected into the optimized low-dimensional feature space so that normal operation samples and the different process fault classes can be identified, and solving the objective function constructed according to formulas (13) to (17) yields the transformation matrix $A$ of the original space and the low-dimensional representation $Y$ of the training set $X$. In the real-time fault detection stage, a new measurement data set $X_{new}$ is first normalized, and the transformation matrix is then used to obtain its low-dimensional representation $Z = A^T X_{new} H_N$. To identify the operation type of the new data, the Euclidean distance between $Z_{new}$ and the low-dimensional representation of the training set in the mapping space is calculated, expressed as:
where the training-set term denotes the $i$-th data sample with class label $l_r$; to determine the fault type of $Z_{new}$, the following discriminant function is designed:
If and only if the distance between $Z_{new}$ and the samples with class label $l_r$ is the smallest, the new observation sample belongs to class label $l_r$. The operation type associated with each class label then identifies whether the process sample is a normal operation sample or a specific fault type, which realizes multi-fault classification detection of real-time data in the production process.
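The off-line and on-line stages described above can be sketched end to end. This is a minimal illustration under stated assumptions, not the patented implementation: the projection follows formulas (16) and (18) as printed; new data are centered with the training mean instead of applying $H_N$ to the test set; the distance index is taken to be the mean Euclidean distance to the training samples of each class, because the distance and discriminant formulas are not reproduced in this text; and the demo sets $B = 0$ for brevity, so only the inter-class term is exercised. The names `fit_projection` and `classify` are hypothetical.

```python
import numpy as np

def fit_projection(X, M, p, delta=1e-3):
    """Top-p eigenvectors of M = G - B (16), then the ridge solve of (18)."""
    P, N = X.shape
    H_N = np.eye(N) - np.ones((N, N)) / N
    M = (M + M.T) / 2                        # symmetrize against round-off
    evals, evecs = np.linalg.eigh(M)         # eigenvalues in ascending order
    Y_p = evecs[:, ::-1][:, :p]              # N x p, largest eigenvalues first
    XH = X @ H_N
    return np.linalg.solve(XH @ X.T + delta * np.eye(P), XH @ Y_p)

def classify(A, X_train, labels, X_new):
    """Assign each new sample to the class with the smallest mean distance."""
    mean = X_train.mean(axis=1, keepdims=True)
    Y = A.T @ (X_train - mean)               # equals A^T X H_N
    Z = A.T @ (X_new - mean)                 # training mean reused (assumption)
    classes = np.unique(labels)
    # dist[c, j]: mean Euclidean distance from new sample j to class c samples
    dist = np.stack([
        np.linalg.norm(Z[:, None, :] - Y[:, labels == c][:, :, None],
                       axis=0).mean(axis=0)
        for c in classes
    ])
    return classes[dist.argmin(axis=0)]

# Synthetic demo: three well-separated classes in a 4-dimensional space.
rng = np.random.default_rng(2)
means = np.array([[0, 0, 0, 0], [10, 0, 0, 0], [0, 10, 0, 0]], dtype=float).T
labels = np.repeat([0, 1, 2], 20)
N = labels.size
X = means[:, labels] + 0.5 * rng.normal(size=(4, N))
G = (labels[:, None] == labels[None, :]) / 20.0 - 1.0 / N   # assumed Fisher weights
A = fit_projection(X, G, p=2)                # B = 0 in this demo

labels_new = np.repeat([0, 1, 2], 5)
X_new = means[:, labels_new] + 0.5 * rng.normal(size=(4, labels_new.size))
pred = classify(A, X, labels, X_new)
```

In a full model the argument `M` would be $G - B$ with $B$ built from the weighted local tangent coordinates, and class label 0 could stand for normal operation while the remaining labels stand for individual fault types.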
The neighborhood size is set to k = 12.
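A neighborhood search with this setting can be sketched as below. It is an illustration only: the data matrix is assumed to be $P \times N$ with one sample per column, and the center sample is excluded from its own neighbor list, which this text does not specify. The function name `knn_neighborhoods` is hypothetical.

```python
import numpy as np

def knn_neighborhoods(X, k=12):
    """Return an (N, k) index array: the k nearest neighbors of every column
    of X by Euclidean distance, nearest first, the sample itself excluded."""
    sq = (X ** 2).sum(axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X   # squared distances
    np.fill_diagonal(d2, np.inf)                     # exclude the center sample
    return np.argsort(d2, axis=1)[:, :k]

# Thirty points on a line: the neighbors of point 15 are its adjacent points.
X_line = np.arange(30, dtype=float)[None, :]
idx_default = knn_neighborhoods(X_line)              # k = 12
idx_small = knn_neighborhoods(X_line, k=4)
```

Each row of the returned array can be used to select the columns of the data matrix that form one neighborhood.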
The effectiveness of the invention is described below using the Tennessee Eastman process (TEP) as an example. The platform is a simulation test bench developed by the Eastman Chemical Company (USA) on the basis of an actual chemical reaction process. It comprises five main operating units: a continuous stirred-tank reactor, a condenser, a centrifugal compressor, a vapor/liquid separator, and a stripper. A schematic diagram of the TEP is shown in figure 1. In the experiment, 22 continuous measurement variables and 11 manipulated variables (excluding the agitator speed) were selected as process monitoring variables, and the sampling interval of the data set was 3 minutes.
TABLE 1 TEP data collected and description of faults
From the TEP scenarios listed in table 1, the typical faults F4, F8, F13 and F15 and normal operation samples are selected. Two operation modes are chosen, two types of fault data are collected in each mode, and the collected process data are used to evaluate and compare the classification performance of different monitoring models. For each mode, 1600 data samples were collected for the training set and 800 data samples for the test set.
Table 2 presents the classification results on the TEP for the proposed SWLLTSA, the WLLTSA from the literature, the conventional LLTSA, and FDA (Fisher discriminant analysis).
TABLE 2 classification results of TEP by SWLLTSA, WLLTSA, LLTSA and FDA
Larger values in the table indicate better classification performance. The process data are non-Gaussian, multi-modal and high-dimensional, and the FDA method classifies poorly when identifying these different types of faults. By comparison, SWLLTSA and WLLTSA classify better because they preserve the local structural relationship of samples within the same class, which enhances the classification effect. Moreover, SWLLTSA measures the divergence between samples of different classes while also refining the geometric relationship between near and distant neighbor samples within a same-class neighborhood, so it has stronger discriminating ability than WLLTSA and achieves a better classification result. Fig. 3 shows the test results of SWLLTSA, WLLTSA and FDA. Overall, the SWLLTSA method misclassifies fewer samples.
According to the invention, a new weight matrix is introduced to represent the local position relation, on the latent process manifold, between neighborhood samples with different class labels and with the same class label. The improved WLLTSA preserves the local geometric characteristics of the multi-fault process data manifold and thereby corrects the distorted manifold structure that the original WLLTSA produces because it does not distinguish the class labels of the samples in a neighborhood. Building on WLLTSA, the method integrates the locality of the manifold geometry with the global discriminability of the data to build a fault classification model that captures the intrinsic characteristics of the different operation modes of the process, reflects the discriminative feature information of multi-fault process data, and identifies the fault type of a test sample by minimizing its Euclidean distance to the known training samples of each class on the low-dimensional latent manifold. The method is therefore suitable for multi-fault classification detection in high-dimensional, multi-mode processes.
The method measures not only neighbor samples within the same class but also the dissimilarity between neighbor samples of different classes. In particular, within a neighborhood it assigns different weight values to distant samples far from the center sample and to near samples close to it, which refines the geometric relationship between neighbor samples and matches the nonlinear, multi-mode character of multi-fault production process data. Neighbor samples of different classes are given larger weights, which enlarges the margin between them so that the positions of same-class and different-class samples of a neighborhood can be distinguished in the local tangent space. At the same time, a divergence measure between different class labels is introduced, so that the discriminative information of different types of process data is fully used and the multi-fault identification capability of the low-dimensional mapping space is improved. Overall, the method uses class-label information to generate a weight matrix in each local neighborhood, assigns different weight values to the different kinds of neighbor samples, captures the characteristic information of each sample, strengthens the extraction of local structure information, and fuses the class-label information of the data with the separation between different class labels, which improves the sensitivity and discriminability of multi-fault classification in the low-dimensional projection space. Compared with the WLLTSA monitoring method, it can achieve higher diagnostic accuracy and stronger fault discrimination capability.
While the foregoing is directed to embodiments of the present invention, other and further details of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
Claims (6)
1. A multi-fault classification method based on an improved linear local tangent space alignment model, comprising the following steps:
Step 1: identify the k nearest neighbors of a center sample according to Euclidean distance to form the neighborhood of the center sample, and construct a neighborhood library; define a weight matrix for the samples in the neighborhood, in which the added class-label information reflects the likelihood that a neighborhood sample belongs to a given class;
Step 2: establish a local coordinate system of the tangent space using the defined weights and construct the improved WLLTSA method, in which, within a class, neighborhood samples far from the center sample and neighborhood samples close to it receive different weight values, which refines the positional relationship of the neighbor samples in the neighborhood, and in which the dissimilarity measure between neighbor samples of different classes is taken into account, so that the positions of same-class and different-class samples of the neighborhood are distinguished in the local tangent space;
Step 3: establish an intra-class objective function and an inter-class objective function, preserving the local structural relationship of the production process data and mining the global divergence information between samples of different classes;
Step 4: construct the SWLLTSA fault classification model, fuse the global divergence information and the local structure information of the process, design a multi-fault classification index for the process, evaluate the operation mode of the production process, and identify the various fault types.
2. The multi-fault classification method based on an improved linear local tangent space alignment model according to claim 1, wherein step 1 is as follows:
Step 1.1: N labeled samples are collected to form a high-dimensional multi-mode process data set; these samples lie on a latent manifold in a low-dimensional feature space, where $P$ and $p$ ($p < P$) denote the dimensions of the high-dimensional original input space and the low-dimensional feature mapping space, respectively; for each sample, its $k$ neighbor samples are determined by the smallest Euclidean distances, forming the neighborhood of the center sample $x_i$:
Step 1.2: the process data set has $C+1$ class labels $\{l_1, l_2, \ldots, l_{C+1}\}$, which correspond respectively to the normal operation samples collected in the industrial process and to $C$ different kinds of process faults:
For a sample $x_i$, $l(x_i) \in \{1, 2, \ldots, C+1\}$ denotes its class label, and the weight $w_{ij}$ between two samples $x_i$ and $x_j$ is defined as:
where the tuning parameter $\beta$ is set to the average Euclidean distance between pairs of samples in the neighborhood.
3. The multi-fault classification method based on an improved linear local tangent space alignment model according to claim 2, wherein the improved WLLTSA method in step 2 is constructed as follows:
Step a: based on the newly established weights, local PCA is applied to approximate the tangent space of each neighborhood; a local transformation matrix $Q_i$ is introduced to map the neighborhood of each $x_i$ to the local tangent space, an optimization function over the local tangent space is established, local coordinate information is extracted, and the local tangent-space coordinates $\Theta_i$ are solved:
where $H_k = I - ee^T/k$ is the centering matrix, $I$ is the identity matrix, $e$ is the column vector whose elements are all 1, and $w_j = [w_1, w_2, \ldots, w_k]$ holds the weight of each neighbor sample in the neighborhood; $Q_i$ is an orthonormal basis matrix of the tangent space, composed of the eigenvectors corresponding to the $p$ largest eigenvalues of the decomposition of the matrix $X_i H_k W_i$; $W_i$ is a $k \times k$ diagonal matrix with $W_i(j,j) = w_{ij}$; and $\Theta_i$ holds the local coordinates of the neighborhood of $x_i$, which describe the local coordinate system information of the process data;
Step b: after the local structure information has been extracted, the local coordinates of all samples are aligned into a global low-dimensional feature space, and the global coordinates $Y$ of all samples $X$ are solved.
4. The multi-fault classification method based on an improved linear local tangent space alignment model according to claim 3, wherein the intra-class objective function and the inter-class objective function in step 3 are established as follows:
Step A: align the local tangent-space coordinates $\Theta_i$, solve for the global coordinates $Y$, and establish the intra-class objective function, which preserves the local geometric relationship:
where a selection matrix $S = [S_1, \ldots, S_N]$ is introduced, whose element $S_i$ ($i = 1, \ldots, N$) is a 0-1 selection matrix, so that $Y_i = Y S_i$ holds the global coordinates associated with $x_i$; $L_i$ is a global transformation matrix whose optimal value is expressed through $\Theta_i^{+}$, the Moore-Penrose generalized inverse of $\Theta_i$; and $F = \mathrm{diag}(F_1, \ldots, F_N)$, where each $F_i$ is obtained by solving the corresponding local problem;
Step B: the improved WLLTSA method first uses the improved weights to describe accurately the local structure information of each sample neighborhood in the local tangent space, then aligns the local tangent spaces of all samples into a global low-dimensional feature space, and finally finds a projection matrix $A$ that maps the high-dimensional process data set $X$ to the low-dimensional data set $Y$:
$$Y = A^T X H_N \qquad (6)$$
Step C: the local relations among samples of the same class are preserved in the low-dimensional feature space by minimizing the local-structure objective of the process data, so that original samples of the same class are mapped onto a more compact low-dimensional manifold;
According to equation (6), equation (5) can be transformed into:
where $B = S F F^T S^T$, and the constraint $A^T X H_N X^T A = I_p$, i.e. $Y Y^T = I_p$ (since $H_N$ is symmetric and idempotent), is imposed to determine $Y$ uniquely;
Step D: maximize the global separation of the data so that samples with different class labels are separated in the low-dimensional feature space and the margins between samples of different classes are enlarged, and establish the inter-class objective function:
where the matrix being maximized is the global inter-class divergence (scatter) matrix of the low-dimensional feature-space data set $Y$, expressed as:
where the two mean vectors are, respectively, the mean of the class-$g$ samples and the mean of all samples in the low-dimensional feature space; the elements of the matrix $G$ satisfy:
According to $Y Y^T = I_p$ and $Y = A^T X H_N$, the divergence matrix becomes:
Maximizing the global inter-class divergence matrix, formula (11) is rewritten as:
5. The multi-fault classification method based on an improved linear local tangent space alignment model according to claim 4, wherein the SWLLTSA fault classification model in step 4 is constructed as follows:
Step S1: construct the SWLLTSA model, which fuses the local-structure preservation information with the global separation information, extracts effective low-dimensional feature information, and solves the following optimization function:
The optimization problem of equation (13) is converted into the following generalized eigenvalue problem:
$$X H_N (G - B) H_N X^T \alpha = \lambda X H_N X^T \alpha \qquad (14)$$
According to $Y = A^T X H_N$, the constrained optimization problem (13) is restated as:
that is, the following eigenvalue problem is solved:
$$(G - B)\, y^T = \lambda\, y^T \qquad (16)$$
where $y^T$ is the eigenvector corresponding to the eigenvalue $\lambda$ of problem (16); if $\alpha^T X H_N = y$, then $\alpha$ is the eigenvector of problem (14) corresponding to the same eigenvalue $\lambda$;
If the eigenvalues obtained by solving problem (16) are sorted as $\lambda_1 > \lambda_2 > \cdots > \lambda_p$, the corresponding eigenvectors are used to calculate:
where $\delta \geq 0$ is a regularization parameter;
Given $Y_p$, the projection matrix $A$ of the model is obtained as: $$A = [\alpha_1, \alpha_2, \ldots, \alpha_p] = (X H_N X^T + \delta I)^{-1} X H_N Y_p \qquad (18)$$
$A$ determines the class directions of the SWLLTSA model, along which the separation between different class labels decreases in order; for a test set $X_{new}$, the low-dimensional representation in the feature space is:
$$Z = A^T X_{new} H_N \qquad (19)$$
Step S2: design the multi-fault classification index of the SWLLTSA model to realize process monitoring;
In the modeling process, normal operation samples and fault samples with several different class labels are collected from the production process to form a training set. In the off-line modeling stage, the proposed SWLLTSA model is used to preserve the local structure information in the original data and to separate process samples of different classes in the mapping space: the training set is projected into the optimized low-dimensional feature space so that normal operation samples and the different process fault classes can be identified, and solving the objective function constructed according to formulas (13) to (17) yields the transformation matrix $A$ of the original space and the low-dimensional representation $Y$ of the training set $X$. In the real-time fault detection stage, a new measurement data set $X_{new}$ is first normalized, and the transformation matrix is then used to obtain its low-dimensional representation $Z = A^T X_{new} H_N$. To identify the operation type of the new data, the Euclidean distance between $Z_{new}$ and the low-dimensional representation of the training set in the mapping space is calculated, expressed as:
where the training-set term denotes the $i$-th data sample with class label $l_r$; to determine the fault type of $Z_{new}$, the following discriminant function is designed:
If and only if the distance between $Z_{new}$ and the samples with class label $l_r$ is the smallest, the new observation sample belongs to class label $l_r$. The operation type associated with each class label then identifies whether the process sample is a normal operation sample or a specific fault type, which realizes multi-fault classification detection of real-time data in the production process.
6. The method of claim 2, wherein the neighborhood size is k = 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310314884.7A CN116361722A (en) | 2023-03-28 | 2023-03-28 | Multi-fault classification method for improving linear local cut space arrangement model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116361722A true CN116361722A (en) | 2023-06-30 |
Family
ID=86935848
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310314884.7A Pending CN116361722A (en) | 2023-03-28 | 2023-03-28 | Multi-fault classification method for improving linear local cut space arrangement model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116361722A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116610927A (en) * | 2023-07-21 | 2023-08-18 | 傲拓科技股份有限公司 | Fan gear box bearing fault diagnosis method and diagnosis module based on FPGA |
CN116610927B (en) * | 2023-07-21 | 2023-10-13 | 傲拓科技股份有限公司 | Fan gear box bearing fault diagnosis method and diagnosis module based on FPGA |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107515895B (en) | Visual target retrieval method and system based on target detection | |
CN110008584B (en) | GitHub-based semi-supervised heterogeneous software defect prediction method | |
Paclík et al. | Building road-sign classifiers using a trainable similarity measure | |
CN101140624A (en) | Image matching method | |
Chen et al. | Using improved self-organizing map for fault diagnosis in chemical industry process | |
US20080212880A1 (en) | Identification and Classification of Virus Particles in Textured Electron Micrographs | |
CN111160401A (en) | Abnormal electricity utilization judging method based on mean shift and XGboost | |
CN116361722A (en) | Multi-fault classification method for improving linear local cut space arrangement model | |
CN101738998B (en) | System and method for monitoring industrial process based on local discriminatory analysis | |
CN110765587A (en) | Complex petrochemical process fault diagnosis method based on dynamic regularization judgment local retention projection | |
CN110136779B (en) | Sample feature extraction and prediction method for key difference nodes of biological network | |
CN111667135B (en) | Load structure analysis method based on typical feature extraction | |
CN114564982A (en) | Automatic identification method for radar signal modulation type | |
CN103616889B (en) | A kind of chemical process Fault Classification of reconstructed sample center | |
CN109784142B (en) | Hyperspectral target detection method based on conditional random projection | |
Luqman et al. | Subgraph spotting through explicit graph embedding: An application to content spotting in graphic document images | |
CN102930291B (en) | Automatic K adjacent local search heredity clustering method for graphic image | |
CN111796576B (en) | Process monitoring visualization method based on dual-core t-distribution random neighbor embedding | |
CN111426657B (en) | Identification comparison method of three-dimensional fluorescence spectrogram of soluble organic matter | |
Song et al. | A multi-SOM with canonical variate analysis for chemical process monitoring and fault diagnosis | |
CN113033683B (en) | Industrial system working condition monitoring method and system based on static and dynamic joint analysis | |
CN114118292B (en) | Fault classification method based on linear discriminant neighborhood preserving embedding | |
CN110647922B (en) | Layered non-Gaussian process monitoring method based on public and special feature extraction | |
Yang et al. | Adaptive density peak clustering for determinging cluster center | |
CN112183569A (en) | FDA and SOM based intermittent industrial process reaction phase clustering and fault classification visualization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||