CN111610768A

CN111610768A - Intermittent process quality prediction method based on similarity multi-source domain transfer learning strategy

Info

Publication number: CN111610768A
Application number: CN202010523586.5A
Authority: CN
Inventors: 褚菲; 彭闯; 王嘉琛; 陆宁云; 王福利; 高富荣; 马小平
Original assignee: China University of Mining and Technology CUMT
Current assignee: China University of Mining and Technology CUMT
Priority date: 2020-06-10
Filing date: 2020-06-10
Publication date: 2020-09-01
Anticipated expiration: 2040-06-10
Also published as: CN111610768B

Abstract

A method for predicting the quality of an intermittent (batch) process based on a similarity multi-source domain transfer learning strategy. Input and output data of the target domain process and the source domain processes are collected; the three-dimensional input data of the source domain old processes and the target domain new process are unfolded into two-dimensional data matrices along the batch direction; and the input and output data of all processes are standardized. The similarity between each source domain old process and the target domain new process is calculated from the Euclidean distance between their data, and the number of samples of each source domain old process is counted, which fixes the two main factors that influence the migration effect. Based on these two factors, three specific choices and their criteria are given: rejecting migration, preferred single migration, and multi-source integrated migration. The method avoids negative transfer as far as possible while using the data information of several similar source domain old processes, which reduces the waste of data resources, improves the efficiency and flexibility of transfer learning, better assists and accelerates the modeling of the target domain new process, and thereby improves the accuracy of quality prediction.

Description

Intermittent process quality prediction method based on similarity multi-source domain transfer learning strategy

Technical Field

The invention relates to a quality prediction method, in particular to an intermittent process quality prediction method based on a similarity multi-source domain transfer learning strategy, and belongs to the technical field of industrial production process quality prediction.

Background

With rapid economic development and increasingly fierce competition in the international product market, the standards and requirements for product quality keep rising. In intermittent production processes in particular, stable product quality not only bears directly on the economic benefit of an enterprise but is also the precondition for optimizing the production of the intermittent process.

Accurate quality prediction is necessary for the safe operation of a batch process and for obtaining high-quality products. With the rapid development of data technology, data-driven methods are becoming the mainstream of process modeling because they are fast to build, accurate and cost-effective, and they are widely applied to product quality prediction in intermittent processes. Data-driven modeling presupposes sufficient process data: the prediction becomes accurate only when the latent information of the intermittent process is extracted as fully as possible from a large amount of process data. In actual intermittent production, different product specifications are produced under specific operating conditions or even on specific equipment, and the operating state changes frequently. This degrades the performance of an existing data-driven model, so a model has to be rebuilt for the new process. However, a new process has run only for a short time and cannot supply rich process data, which makes it difficult to build an accurate and reliable data-driven model.

In the era of big data, modern intermittent industries contain many similar processes that produce products of the same or similar specifications using the same or similar process principles, and the large amount of similar historical data in these processes is not fully used, which wastes resources. Data migration is a form of transfer learning: it makes full use of the data and models of an old process and migrates the useful data information into a new process to assist the modeling and control of the new process.

Jaeckle and MacGregor proposed an extended principal component regression (EPCR) method for data migration (Jaeckle C M, MacGregor J F. Product transfer between plants using historical process data [J]. AIChE Journal, 2000, 46(10)). The method builds an EPCR model by combining the output data matrices of two similar processes and can use the data information of both processes to predict product quality effectively. However, EPCR migrates only the output data of the similar processes and ignores the process information contained in the input data, which is very important for modeling.

Subsequently, García-Muñoz et al. proposed the Joint-Y partial least squares (JYPLS) method for data migration (García-Muñoz S, MacGregor J F, Kourti T. Product transfer between sites using Joint-Y PLS [J]. Chemometrics and Intelligent Laboratory Systems, 2005, 79(1-2): 101-114). It constructs a joint quality index space and models the data matrices of all similar processes together. The JYPLS model requires only that the similar processes have the same quality index composition and places no restriction on the input variable matrices.

Because the JYPLS method is not suitable for strongly nonlinear intermittent processes, a kernel function was introduced into the model, giving an improved Joint-Y kernel partial least squares (JYKPLS) process migration method (Chu F, Cheng X, Jia R, et al. Final quality prediction method for new batch process based on improved JYKPLS process transfer model [J]. Chemometrics and Intelligent Laboratory Systems, 2018, 183: 1-10), which was successfully applied to product quality prediction for nonlinear new intermittent processes.

More recently, Luo et al. proposed a nonparametric approach to multi-process data analysis (Luo L, Yao Y, Gao F, et al. Mixed-effects Gaussian process modeling approach with application in injection molding processes [J]. Journal of Process Control, 2018, 62: 37-43). Each process is modeled as a combination of a fixed-effect and a random-effect Gaussian process (GP) regression model, that is, a mixed-effect Gaussian process (ME-GP) model. This provides a flexible way to combine the aspects common to all processes with a description of the heterogeneity between processes, and it can predict multiple processes by modeling the probability density distribution of the migration.

The above analysis shows that data-driven modeling with transfer learning can solve a new problem by using a large amount of previously acquired knowledge and experience, which greatly reduces the difficulty of learning a new task and gives the approach broad application prospects. However, existing data-driven modeling methods based on transfer learning simply fix, by default, the single source domain or the several source domains to be migrated, and do not explicitly analyze the specific factors that influence the migration effect. The two key questions that no migration modeling can avoid, "when to migrate" and "how to migrate", are therefore not addressed in practical application. An improper migration time or migration method not only fails to help the learning task but can also cause "negative transfer".

Current transfer learning methods therefore suffer, in data-driven modeling, from low efficiency, from wasting the available data resources of multiple source domains, from inefficient migration of data and knowledge from the source domains to the target domain, and from "negative transfer".

Disclosure of Invention

The invention aims to provide an intermittent process quality prediction method based on a similarity multi-source domain transfer learning strategy, which fully utilizes data information of old processes in a plurality of similar source domains while avoiding negative transfer as much as possible, reduces the waste of data resources, improves the efficiency and flexibility of transfer learning, better assists and accelerates the modeling of a new process in a target domain, and thus improves the accuracy of quality prediction; and field operators can adjust production operation in time according to the quality prediction result, optimize the intermittent industrial production process in real time, ensure the product quality and improve the production efficiency and comprehensive economic benefit of enterprises.

To achieve this aim, the invention provides an intermittent process quality prediction method based on a similarity multi-source domain transfer learning strategy. For a target domain new process T there exist several similar source domain old processes S_i (i = 1, 2, ..., M). The production equipment is identical, but the internal parameter settings differ. The target domain new process T has only just been put into operation, so its production data are scarce, whereas the source domain old processes have sufficient data because they were put into production earlier. The three-dimensional input data matrix and the output matrix of the intermittent production process are X ∈ R^(I×J×K) and Y ∈ R^(I×K), respectively, where I is the number of process batches, K is the number of sampling instants, and J is the number of process variables. The method comprises the following steps:

Step 1, data acquisition: for a given target domain new process T, find several source domain old processes S_i (i = 1, 2, ..., M) similar to it on the basis of process similarity and prior knowledge, and collect the input and output data of each target domain and source domain process;

Step 2, data preprocessing: unfold the three-dimensional input data of the source domain old processes S_i (i = 1, 2, ..., M) and of the target domain new process T into two-dimensional data matrices along the batch direction, and then standardize the input and output data of all processes;

Step 3, similarity evaluation and source domain sample size statistics: calculate the similarity between each source domain old process and the target domain new process from the Euclidean distance between their data, recorded as θ₁, θ₂, ..., θ_M with θ ∈ (0,1), and count the number of samples of each source domain old process, recorded as N₁, N₂, ..., N_M. The similarity is given by formula (1),

in the formula: S_i is the i-th source domain old process;

T is the target domain new process;

the data center of the target domain new process and the data center of each source domain old process are the quantities compared;

d(S_i, T) is the Euclidean distance between each source domain old process and the target domain new process;

θ_i is the similarity;

Step 4, according to the discriminant θ_i ≥ α, determine whether to execute migration, where α ∈ (0,1) is a preset empirical constant: if θ_i ≥ α, the source domains satisfying the condition are selected; otherwise migration is rejected, the whole migration procedure is terminated, and the method enters the modeling process of step 6;

Step 5, selecting a migration method: the specific migration method is determined according to a discriminant involving λ, where λ ∈ (0, 1−α) is a preset empirical constant that represents the ability to accommodate source domains and is inversely related to both θ_max and N_θmax; θ_max is the maximum similarity between a source domain old process and the target domain new process; N_θmax is the number of samples of the source domain old process with the maximum similarity;

if the discriminant holds, the source domains satisfying the condition are retained and integrated migration of these multiple source domains is executed; otherwise the source domain old process with the highest similarity θ_max is selected directly and preferred single migration is carried out;

Step 6, according to the migration method selected in step 5, establish a suitable multi-source transfer learning model for the target domain new process, start a new batch, obtain its input data x_new, and predict the product quality of the new batch;

Step 7, after the new batch has finished, obtain the actual quality index y_new and calculate the similarity θ_new between the new data and the original target domain data;

Step 8, according to the discriminant θ_new > β, where β is a preset constant for judging how far the newly generated data deviate from the original data, make a preliminary determination of whether the target domain and the migration strategy need to be updated simultaneously: if θ_new > β, go to step 9; otherwise execute step 1;

Step 9, according to whether the accumulated number n of newly produced batches satisfies the discriminant n > m, where m is a preset constant, determine again whether the target domain and the migration strategy need to be updated simultaneously: if n > m, update both, with the target domain updated according to formula (2), and then execute step 1; otherwise execute step 10;

in the formula: x_new and y_new are the newly generated new-process data;

X_T,old and Y_T,old are the existing modeling data of the target domain new process;

Step 10, first calculate the latest quality prediction error error_new from the actual quality value y and the model-predicted quality value, then judge whether the latest quality prediction error has remained stable within a certain interval k consecutive times, where k is a preset constant; that is, judge whether the difference between the quality prediction errors of successive batches has been below a threshold k consecutive times, the threshold being a preset small positive constant close to 0. The result determines whether data elimination is needed. If it is, calculate the data similarity according to formula (3), remove the several groups of data with the lowest similarity from the migrated source domain data set, and then add the new batch data to update the modeling data set; otherwise add the new batch data directly to update the target domain modeling data set without updating the migration strategy;

in the formula: (x_Si, y_Si) is the source domain old process data;

the center term is the data center of the target domain new process;

the distance term is the Euclidean distance between the new and old process data;

θ(x_Si, y_Si) is the similarity between the new and old process data;

Step 11, after the model has been updated, predict the next new batch and return to step 7, until all batches have been predicted and the migration ends.
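The selection logic of steps 3 to 5 can be sketched as follows. This is a minimal illustration, not the patented formulas: the similarity mapping `1 / (1 + d)` and the step 5 test `theta_max - theta_i <= lam` are assumptions, because formula (1) and the step 5 discriminant are not reproduced in this text. The function names `similarity` and `choose_migration` are hypothetical.

```python
import numpy as np

def similarity(source_xy, target_xy):
    """Similarity from the Euclidean distance between the two data centers.

    ASSUMPTION: theta = 1 / (1 + d). Formula (1) of the patent is not
    available here; this mapping only shares its stated properties
    (based on Euclidean distance, larger when the data are closer).
    """
    d = np.linalg.norm(source_xy.mean(axis=0) - target_xy.mean(axis=0))
    return 1.0 / (1.0 + d)

def choose_migration(thetas, alpha, lam):
    """Return (mode, indices of the selected source domains)."""
    thetas = np.asarray(thetas, dtype=float)
    # Step 4: keep only source domains with theta_i >= alpha.
    keep = np.flatnonzero(thetas >= alpha)
    if keep.size == 0:
        return "reject", []
    best = int(keep[np.argmax(thetas[keep])])
    # Step 5 (ASSUMED discriminant): other retained domains whose similarity
    # lies within the tolerance lam of theta_max join an integrated migration.
    close = [int(i) for i in keep
             if i != best and thetas[best] - thetas[i] <= lam]
    if close:
        return "multi", [best] + close
    return "single", [best]
```

With the similarities used later in the example (0.91, 0.82, 0.68) and hypothetical constants `alpha=0.7`, `lam=0.1`, the sketch selects the first two source domains for integrated migration; a tighter `lam=0.05` falls back to single migration from the most similar one.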

Compared with the prior art, the invention collects the input and output data of each target domain and source domain process, unfolds the three-dimensional input data of the source domain old processes and the target domain new process into two-dimensional data matrices along the batch direction, and then standardizes the input and output data of all processes. It calculates the similarity between each source domain old process and the target domain new process from the Euclidean distance between their data and counts the number of samples of each source domain old process, which fixes the two main factors that influence the migration effect. Based on these two factors it gives three specific choices and their criteria: rejecting migration, preferred single migration, and multi-source integrated migration. The method avoids negative transfer as far as possible while making full use of the data information of several similar source domain old processes, which reduces the waste of data resources, improves the efficiency and flexibility of transfer learning, better assists and accelerates the modeling of the target domain new process, and thereby improves the accuracy of quality prediction. In addition, for multi-source integrated migration, the concept of multi-source domain migration modeling adaptability is introduced, so that the number of source domains to migrate can be chosen reasonably and flexibly, which further improves data utilization. The invention also provides a method for updating the prediction model with online data and for updating the migration strategy promptly when working conditions change, which keeps the transfer learning strategy timely and reliable. Field operators can adjust production operation in time according to the quality prediction result, optimize the intermittent industrial production process in real time, ensure product quality, and improve the production efficiency and overall economic benefit of the enterprise.
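The batch-direction unfolding and standardization mentioned above can be sketched with NumPy. The column ordering (all J variables at the first sampling instant, then the second, and so on) and the helper names `unfold_batchwise` and `standardize` are assumptions made for illustration; the text only states that the I×J×K array is unfolded along the batch direction and then standardized.

```python
import numpy as np

def unfold_batchwise(X):
    """Unfold X of shape (I, J, K) into an I x (K*J) matrix, one row per batch."""
    I, J, K = X.shape
    # Move time to the middle axis so each row lists the J variables
    # at k = 0, then the J variables at k = 1, and so on.
    return X.transpose(0, 2, 1).reshape(I, K * J)

def standardize(A, eps=1e-12):
    """Column-wise zero-mean, unit-variance scaling; returns (Z, mean, std)."""
    mean = A.mean(axis=0)
    std = A.std(axis=0, ddof=1)
    safe_std = np.where(std < eps, 1.0, std)  # leave constant columns unscaled
    return (A - mean) / safe_std, mean, std
```

The returned mean and standard deviation would be reused to scale the input data of a new batch before prediction.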

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 is a schematic diagram of a process for synthesizing cobalt oxalate in an embodiment of the present invention;

FIG. 3 is a comparison of predicted results and actual values for a non-migrated model of a new process;

FIG. 4 is a comparison of predicted results and actual values of different batch numbers under the same similarity of a single-source domain migration model;

FIG. 5 is a comparison of predicted values and actual values for multiple source domain migration models and a single source domain migration model.

Detailed Description

The invention will be further explained with reference to the drawings.

As shown in FIG. 1, the intermittent process quality prediction method based on a similarity multi-source domain transfer learning strategy addresses a target domain new process T for which several similar source domain old processes S_i (i = 1, 2, ..., M) exist. The production equipment is identical, but the internal parameter settings differ. The target domain new process T has only just been put into operation, so its production data are scarce, whereas the source domain old processes have sufficient data because they were put into production earlier. The three-dimensional input data matrix and the output matrix of the intermittent production process are X ∈ R^(I×J×K) and Y ∈ R^(I×K), respectively, where I is the number of process batches, K is the number of sampling instants, and J is the number of process variables. The method comprises steps 1 to 11 as set out in the Disclosure of Invention above.
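The online update checks of steps 8 to 10 amount to a small decision function. The sketch below follows the branch conditions stated in those steps (θ_new > β, n > m, and k consecutive error differences below a threshold); the function name, the action labels and the use of the absolute difference between successive errors are assumptions for illustration.

```python
def update_action(theta_new, beta, n_new, m, errors, k, eps):
    """Map the checks of steps 8-10 to an action label.

    errors: quality prediction errors of the batches run so far, oldest first.
    k is assumed to be at least 1.
    """
    # Step 8: new data deviate too much from the original target data.
    if theta_new <= beta:
        return "restart_from_step_1"
    # Step 9: enough new batches accumulated to refresh domain and strategy.
    if n_new > m:
        return "update_target_and_strategy"
    # Step 10: error differences below eps for k consecutive batches
    # (ASSUMPTION: absolute difference of successive errors).
    diffs = [abs(b - a) for a, b in zip(errors, errors[1:])]
    stable = len(diffs) >= k and all(d < eps for d in diffs[-k:])
    return "cull_and_append" if stable else "append_only"
```

Returning a label rather than performing the update keeps the branching logic separate from the modeling code, so each branch can be checked on its own.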

The multi-source transfer learning model is established as follows.

Assume there are M similar processes, each contributing a set of input-output data, where N_m is the number of data samples of the m-th process and the sum of N_m over the M processes is the total number of samples.

The mixed-effect Gaussian process model gives the response of the m-th process as the sum of two components, a fixed effect and a random effect, plus random noise distributed as N(0, σ²). The fixed and random effects are regarded as completely independent, and the mean function of each is assumed to be zero.

As in an ordinary Gaussian process, the squared exponential covariance function is chosen for both the fixed effect and the random effect, and their hyperparameters are usually learned jointly by maximum likelihood estimation. Under these assumptions, for any processes m, n ∈ {1, ..., M} and sample indices i ∈ {1, ..., N_m}, j ∈ {1, ..., N_n}, the covariance between any two similar processes m and n can be obtained, where:

the resulting covariance is the mixed-effect covariance of all data;

δ_mn is the Kronecker delta: δ_mn = 1 if m = n, and δ_mn = 0 otherwise;
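For orientation, a mixed-effect Gaussian process of the kind described here is commonly written as below. This is a sketch in standard notation, not a reproduction of the patent's own formulas: the symbols f, g_m, k_f, k_g and the scale and length parameters s and ℓ are assumed names.

```latex
\begin{aligned}
y_m(\mathbf{x}) &= f(\mathbf{x}) + g_m(\mathbf{x}) + \varepsilon,
  \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2),\\
f &\sim \mathcal{GP}(0, k_f), \qquad g_m \sim \mathcal{GP}(0, k_g),\\
k_\bullet(\mathbf{x}, \mathbf{x}') &= s_\bullet^2
  \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\ell_\bullet^2}\right),
  \qquad \bullet \in \{f, g\},\\
\operatorname{cov}\!\left(y_{m,i},\, y_{n,j}\right) &=
  k_f(\mathbf{x}_{m,i}, \mathbf{x}_{n,j})
  + \delta_{mn}\, k_g(\mathbf{x}_{m,i}, \mathbf{x}_{n,j})
  + \delta_{mn}\,\delta_{ij}\,\sigma^2 .
\end{aligned}
```

The fixed-effect kernel k_f couples samples from every process, while the Kronecker delta restricts the random-effect kernel k_g and the noise term to samples of the same process, which is how information shared across processes is separated from process-specific behavior.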

All data of the similar processes can be collected into a single training set D made up of the set of all known inputs and the set of all response points.

The mixed-effect Gaussian process (ME-GP) model can predict a new test point of any process. Given a new test point of process q, with q ∈ {1, ..., M}, the goal is to predict its response.

The training data comprise all data of the other similar processes together with the data of process q. For process q, under the Gaussian assumption, the training outputs and the test output follow a joint Gaussian distribution, which is expressed in terms of the covariance matrix between the training and test inputs and the autocovariance matrix of the test sample.

The posterior probability distribution of the output value is then obtained from Bayes' rule. Because it uses information from all processes, prediction performance can be improved. The covariance consists of two parts: a fixed-effect covariance structure containing the information common to all M similar processes, and a random-effect covariance structure containing the information specific to process q. The model is parameterized by the kernel hyperparameters and the noise variance σ², whose values strongly influence the prediction performance of the model.

The hyperparameters are usually estimated by minimizing the negative log-likelihood function, which can be calculated from the training data D. First, the partial derivatives of the negative log-likelihood with respect to the hyperparameters are found; then the optimal hyperparameters are obtained by conjugate gradient iteration. With the optimal hyperparameters, the predicted output for the test sample and the predicted variance can be obtained.

N_q is the sample size of the data of process q. If q is a new process, N_q is typically very small, which makes accurate modeling from its own data difficult. With transfer learning, a small amount of new process data suffices to improve prediction performance. This again shows the advantage of the mixed-effect Gaussian process (ME-GP) method, which is both efficient and cost-effective.
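A compact NumPy sketch of ME-GP prediction under the standard Gaussian process posterior is given below. It is an illustration under stated assumptions rather than the patent's implementation: hyperparameters are passed in as fixed values instead of being learned by conjugate gradient, the kernel parameterization (scale, length) is assumed, and the names `se_kernel` and `megp_predict` are hypothetical.

```python
import numpy as np

def se_kernel(A, B, scale, length):
    """Squared exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return scale**2 * np.exp(-0.5 * d2 / length**2)

def megp_predict(X, y, proc, x_star, q, hp_f, hp_g, noise_var):
    """Posterior mean and variance at test inputs x_star of process q.

    X: (N, d) training inputs of all processes; y: (N,) responses;
    proc: (N,) integer process label of every training sample;
    hp_f, hp_g: (scale, length) of the fixed- and random-effect kernels.
    """
    same = (proc[:, None] == proc[None, :]).astype(float)  # Kronecker delta
    C = se_kernel(X, X, *hp_f) + same * se_kernel(X, X, *hp_g)
    C = C + noise_var * np.eye(len(X))
    # The random-effect kernel only links test points to samples of process q.
    k_star = (se_kernel(X, x_star, *hp_f)
              + (proc == q)[:, None] * se_kernel(X, x_star, *hp_g))
    k_ss = se_kernel(x_star, x_star, *hp_f) + se_kernel(x_star, x_star, *hp_g)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, k_star)
    mean = k_star.T @ alpha
    var = np.diag(k_ss) - (v**2).sum(axis=0)
    return mean, var
```

In use, `X` and `y` would stack the standardized data of the selected source domains and the few target domain batches, with `proc` marking which process each row came from and `q` set to the label of the target domain process.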

Examples

The following is a specific example of the cobalt oxalate synthesis process:

The synthesis of cobalt oxalate is a typical intermittent production process, and the method of the invention is used to predict its product quality so that the quality can be tracked in time. In the simulation, a mechanism model of the cobalt oxalate synthesis process replaces the actual production process and supplies reasonable modeling data for the data model, so a mechanism analysis of the production process is required. In the synthesis, the chemical reaction of ammonium oxalate with cobalt chloride in solution is the most important step for obtaining the required cobalt oxalate crystals. The liquid-phase reaction equation is:

CoCl₂+(NH₄)₂C₂O₄→CoC₂O₄↓+2NH₄Cl

Because the crystallization of cobalt oxalate is relatively complex and a purely batch operation easily leads to reaction runaway, a fed-batch operation is generally adopted. As shown in FIG. 2, the synthesis mainly comprises two processes: ammonium oxalate dissolution, and cobalt oxalate crystallization and drying. First, pure water is put into the ammonium oxalate dissolving kettle, a certain amount of solid oxalic acid is added, and the mixture is heated until the oxalic acid is completely dissolved; ammonia gas is then fed into the kettle containing the oxalic acid solution, and the mixture is heated to a certain temperature to form an ammonium oxalate solution. Next, a cobalt chloride solution of fixed concentration and volume is put into the cobalt oxalate crystallization reactor and heated with steam to a suitable reaction temperature; the ammonium oxalate solution is fed in at a certain rate, and feeding continues for a period of time until the reaction is complete, giving a cobalt oxalate suspension. The suspension is put into a filter press, filtered under pressure three times, washed three times, and finally dried to obtain the finished cobalt oxalate. During operation, a PI controller keeps the reaction temperature constant, and the reactor stirring rate is generally constant.

A mechanism model of the cobalt oxalate crystallization process is used to generate the data of the source domain and target domain processes: several cobalt oxalate synthesis processes are simulated with the same kinetic equations and similar but different parameters, which yields several similar production processes. One of these processes, T, is selected as the target domain new process, and the remaining similar processes S serve as the source domain old processes. To this end, the simulation parameters corresponding to the working environment and the process level are varied. From an in-depth analysis of the cobalt oxalate production process and consideration of actual on-site production, 6 process variables are selected for cobalt oxalate quality prediction: the reaction temperature, the stirring rate, the ammonium oxalate flow rate, the ammonium oxalate concentration, the cobalt chloride concentration, and the initial volume of cobalt chloride. The single output variable is the cobalt oxalate particle size. The parameters and variables are shown in Table 1:

TABLE 1 production Process parameter variables

1) Acquisition of simulated process data

MATLAB is used to simulate the reaction temperature, stirring rate, ammonium oxalate flow rate, ammonium oxalate concentration, cobalt chloride concentration, initial cobalt chloride volume and so on in the cobalt oxalate production process under different operating conditions, and a mechanism model is established to generate the data of each process. According to the simulation parameter settings, 4 batches of process data are randomly generated in the target domain as the modeling data set of the new process T, and 40 batches are generated as its test data set. In each of the source domains S, 100 batches of old process data are randomly generated as the old process data set.

The data obtained from the mechanism model for each source domain old process are then partitioned. First 10 batches are selected at random; then 20 more batches at a time are selected at random from the remaining data and added, giving five data sets of 10, 30, 50, 70 and 90 batches. A mixed-effect Gaussian process model is established using each of the five old-process data sets of each source domain together with the 4 batches of target domain new-process data as the training set, and the 40 batches of new-process data generated earlier serve as the test set for comparing prediction performance. The influence of the amount of migrated data on the prediction is verified first at fixed similarity, and then the influence of the similarity between the source domain old process and the target domain new process is verified at a fixed amount of migrated data.
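The cumulative construction of the five source data sets can be sketched as follows. Taking prefixes of one random permutation is equivalent to selecting 10 batches at random and then repeatedly adding 20 more from the remainder; the function name and the seed parameter are assumptions for illustration.

```python
import numpy as np

def nested_batch_sets(n_batches=100, sizes=(10, 30, 50, 70, 90), seed=0):
    """Return {size: sorted batch indices}, each set containing the previous one."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_batches)  # a single random order of all batches
    return {s: np.sort(order[:s]) for s in sizes}
```

Because every larger set contains the smaller ones, differences in prediction error between the sets reflect the added data rather than a different random draw.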

2) Single-source domain migration modeling and quality prediction results

To verify the effectiveness of the multi-source domain transfer learning strategy, prediction with single-source domain transfer learning is carried out first, and the influence of the source domain data volume and of the similarity between the source domain old process and the target domain new process on the migration effect is studied.

To study the influence of the amount of migrated source domain data on the prediction, one source domain is fixed. Migration modeling is performed with the old process data of source domain S2, whose similarity is 0.82, and the comparisons of predicted and measured values for 10 batches and 90 batches are taken as representative, as shown in FIG. 4. Comparing FIG. 3 with FIG. 4 shows that modeling with data migrated from a similar old process effectively assists and accelerates the modeling of the new process. FIG. 4 also shows that, at a given similarity, the amount of migrated data has a large influence on the prediction: migrating 90 batches gives clearly better predictions than migrating 10 batches. The influence of similarity was verified in the same way: at a given amount of migrated data, similarity has a large influence on the prediction, and migrating an old process with high similarity gives clearly better predictions than migrating one with low similarity.

To study more closely how the prediction of the single-source domain migration model depends on the data volume and the similarity of the source domain old processes to be migrated, migration modeling is carried out with different numbers of batches of old process data from each source domain, the corresponding predictions are obtained, and the root mean square error is calculated. The results show that both the number of migrated batches and the similarity of the migrated source domain are positively correlated with prediction accuracy. However, the modeling data tend to saturate as the data volume keeps growing, and once the data volume reaches a certain threshold the prediction accuracy no longer improves noticeably. Furthermore, when the similarity between the source domain old process data and the target domain new process data falls below a certain threshold, the prediction accuracy of migration modeling is lower than that of modeling without migration, which is "negative transfer".
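The root mean square error used for these comparisons, and the check for negative transfer described above, can be written as a short sketch. The helper names are hypothetical, and the comparison simply encodes the statement that negative transfer occurs when the migration model predicts worse than the model without migration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted quality values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def is_negative_transfer(rmse_with_migration, rmse_without_migration):
    """True when migration modeling is less accurate than no migration."""
    return rmse_with_migration > rmse_without_migration
```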

3) Multi-source domain migration modeling and quality prediction results

Based on the influence of these two major factors on the prediction effect, the feasibility and effectiveness of multi-source domain migration under certain conditions are further verified. Specifically, when the source domain with the highest similarity to the target domain has too little old process data, and other source domains with relatively high similarity to the target domain exist, it must be decided whether to migrate from multiple source domains and how to do so.

As shown in FIG. 5, 10 batches of old process data from the source domains whose similarity to the target domain new process is 0.91 and 0.82 are selected as a representative case for quality prediction by multi-source domain migration modeling. The predicted values of the single-source migration model and of the multi-source migration model are compared with the actual values, as shown in FIG. 1; the multi-source domain migration modeling method can improve the accuracy of model prediction. To further examine the applicability of multi-source domain migration modeling, three old processes with similarity 0.91, 0.82 and 0.68 to the target domain new process are selected for a comparison experiment between multi-source domain migration and single-source domain migration. Migration modeling is carried out with different numbers of batches from the different source domains, and the results verify the effectiveness of the proposed strategy: the accuracy of quality prediction can be further improved by performing appropriate multi-source migration modeling that takes both major influence factors into account.
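The choice between rejecting migration, preferential single migration and multi-source integrated migration can be illustrated with a short sketch. The thresholds sim_min and batches_enough, the source labels, and the decision rule itself are assumptions made for illustration; the actual discriminants are those given in the claims.

```python
def choose_migration(sources, sim_min, batches_enough):
    """Pick a migration option from {name: (similarity, n_batches)}.

    sim_min and batches_enough are hypothetical thresholds, not values
    taken from the source text.
    """
    # Drop source domains whose similarity is too low, to avoid negative migration.
    usable = {k: v for k, v in sources.items() if v[0] >= sim_min}
    if not usable:
        return ("reject", [])
    best = max(usable, key=lambda k: usable[k][0])
    # Enough data in the most similar source domain: migrate from it alone.
    if usable[best][1] >= batches_enough or len(usable) == 1:
        return ("single", [best])
    # Otherwise combine all sufficiently similar source domains.
    ranked = sorted(usable, key=lambda k: usable[k][0], reverse=True)
    return ("multi", ranked)


# Similarities 0.91, 0.82 and 0.68 with 10 batches each, as in the experiment above.
example = {"src_091": (0.91, 10), "src_082": (0.82, 10), "src_068": (0.68, 10)}
decision = choose_migration(example, sim_min=0.7, batches_enough=50)
```

With these assumed thresholds the sketch selects multi-source migration from the two most similar source domains and leaves out the one with similarity 0.68.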

4) Model updating and culling of old process data

As production continues and new process data accumulate, the model and the migration strategy need to be updated. In addition, because the new and old processes differ, the old process data already used for modeling may degrade the prediction effect. Since differences necessarily exist even among similar processes, once the new process data have grown to a certain amount, the source domain old process data that differ most from the new process need to be gradually eliminated so that prediction accuracy can continue to improve.
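A minimal sketch of this gradual elimination is given below. The function name cull_least_similar and the parameter min_new_batches are hypothetical, and the rule of removing one source domain at a time is an assumption used only to illustrate the idea.

```python
def cull_least_similar(similarities, n_new_batches, min_new_batches):
    """Return the source domains to keep, given {name: similarity}.

    Hypothetical rule: once at least min_new_batches new-process batches
    have accumulated, drop the single least similar source domain.
    """
    # Not enough new data yet, or only one source left: keep everything.
    if n_new_batches < min_new_batches or len(similarities) <= 1:
        return dict(similarities)
    worst = min(similarities, key=similarities.get)
    return {k: v for k, v in similarities.items() if k != worst}
```

Calling the function again after each further accumulation of new batches would remove source domains one by one, starting with the least similar.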

According to the simulation results, the strategy can more efficiently predict online the product quality indices that are difficult to measure in real time in the actual new production process. On the premise of avoiding negative migration as far as possible, it uses the data of old processes in multiple source domains to assist the modeling of the new process reasonably and effectively, which addresses low data resource utilization and low migration efficiency, and also addresses the scarcity of data and the difficulty of accurate modeling in the initial stage of the cobalt oxalate production process. When the method is used to predict the quality of the cobalt oxalate product, offline modeling is greatly accelerated; as the number of new production batches increases, the model is updated with newly generated process data while the interfering old process data with the lowest similarity are gradually eliminated, so that the accuracy of the prediction model improves continuously and a better prediction effect is achieved. Based on the predicted product quality, an operator can adjust the production plan in time, optimize the production process and improve production efficiency, so the strategy has important practical significance.

Claims

1. An intermittent process quality prediction method based on a similarity multi-source domain transfer learning strategy, characterized in that, for a target domain new process T, a plurality of similar source domain old processes S_i (i = 1, 2, ..., M) exist, whose production equipment is identical but whose internal parameters are set differently; the target domain new process T has just been put into operation and has little production data, while the source domain old processes have sufficient data because they were put into production earlier; the three-dimensional input data matrix and the output matrix of the intermittent production process are respectively X ∈ R^{I×J×K} and Y ∈ R^{I×K}, wherein I represents the number of process batches, K represents the number of sampling times, and J represents the number of process variables; the specific method comprises the following steps:

in the formula: S_i is the i-th source domain old process;

T is the target domain new process;

and

a data center for a new process for the target domain;

and

a data center for a source domain old process;

θ_i represents the similarity;

step 4, according to the discriminant

step 5, selecting a migration method: according to the discriminant

if it is not

Selecting to satisfy the condition

and retaining them, then executing integrated migration of the plurality of source domains; otherwise, directly selecting the source domain old process with the highest similarity θ_max and carrying out preferential single migration;

step 8, according to the discriminant θ_new > β, wherein β is a preset constant used to judge the degree of deviation of the newly generated data from the original data, preliminarily determining whether the target domain and the migration strategy need to be updated simultaneously; if θ_new > β, going to step 9, otherwise executing step 1;

step 9, according to whether the total number n of accumulated newly produced batches satisfies the discriminant n > m, wherein m is a preset constant, determining again whether the target domain and the migration strategy need to be updated simultaneously; if n > m holds, updating the target domain and the migration strategy simultaneously, the target domain updating formula being shown as formula (2), and then executing step 1; otherwise, executing step 10;

Where y is the actual quality value,

Wherein the threshold value

in the formula: (x_Si, y_Si) represents source domain old process data;

a data center representing a new process for the target domain;

representing the Euclidean distance between the new and old process data;

θ(x_Si, y_Si) represents the similarity between the new and old process data;